E-Discovery Search Blog

E-Discovery, My How You’ve Grown!

I got an email the other day from Justin, our business development rep in Chicago. He was writing about a case we had been helping with for maybe a year now. It was an interesting matter but nothing out of the ordinary. I believe they have about 150,000 documents on the site.

Justin wrote to tell me that our client would be adding data to the site, also nothing out of the ordinary. But then he surprised me. “They expect to be adding another 28 million pages to the site,” he reported. That did get my attention.

“Did you mean 28,000 pages?” I wrote back, mostly just to kid him. “Perhaps that was a typo,” I continued.

“Nope,” he answered. “That was not a typo. Our partner says to expect another 28 million pages on the site.” I had to laugh when I thought about that volume. We have come a long way in this industry when 28 million pages isn’t all that unusual.

When Discovery Had No ‘E’ in Front

We started Catalyst in 2000, more than 10 years ago. At the time, there was no “e” in front of discovery. Digital mostly meant scanned images of the paper originals. Even the email that existed was printed out to paper and then scanned. I remember former partners of mine who would get a CD in production. The first thing they asked for was to print it out. Often we would stamp the pages and scan them back into the system.

I smile when I think about those days. Back then a “big case” had 30,000 documents in it. (Of course some were bigger than that but plenty were smaller as well.) When we were getting started, I took pride in the fact that we had a dedicated storage device called a Net Appliance Filer. It was an industrial-strength set of hard drives that were striped using the RAID 5 protocol. I confess to bragging about it because it really seemed special to me.

Most important–and this is what makes me smile–the device had a whole 600 gigabytes of usable space on it. We thought we might never need another one. Ever. Indeed, I used to look forward to the day when we had a million pages on our system. That seemed like a huge reach to me. The thought of reaching the million-document mark seemed even more formidable. You need a lot of cases at 30,000 documents each to reach a million. Thirty-three seemed like a lot of cases to me back then.

A Million Documents is Now Routine

Fast forward to 2011 when you rarely see the word discovery without an “e” in front of it. We buy SANs these days rather than network-attached storage and they hold a lot more data. I believe our standard buying unit is now 48 terabytes which, as most know, is like 48,000 of those gigabytes we originally thought about. We now have a number of these devices and just bought another.

Volumes have risen incredibly. Several years ago, we automated the loading process just to keep up with the demand. Clients and partners send us data directly via a specialized FTP system for automated processing and loading. These days, loading a million documents in a day is not unusual. Indeed, just checking as I write this sitting on a porch on a sparkling afternoon in Nashville, Tenn., there are 162 separate automated loading tickets in our system. It just boggles my mind to think of that much data.

Taking it further, it is not just the number of documents being sent to us that has grown dramatically, but also the size of cases. Four or five years ago, our average case size was about 16 gigabytes of documents. If you figured 5,000-10,000 documents per gigabyte (a number some people throw around but we believe is high), that would come to between 80,000 and 160,000 documents per case. That seemed big enough to me.

Today, after removing a number of really large outlier cases, our average has jumped to about 120 gigabytes a case. Using the same base figures as before, that suggests that cases have grown substantially, to somewhere between 600,000 and more than a million documents per case. That is just amazing when you think about it.
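For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes nothing beyond the 5,000-10,000 documents-per-gigabyte rule of thumb quoted above (which, as noted, may run high); the function and constant names are made up for illustration and are not part of any Catalyst system.

```python
# Rule-of-thumb range quoted in the post: 5,000 to 10,000 documents per gigabyte.
# These figures are rough assumptions, not measured values.
DOCS_PER_GB_LOW = 5_000
DOCS_PER_GB_HIGH = 10_000


def estimate_documents(gigabytes):
    """Return a (low, high) document-count estimate for a case of the given size."""
    return gigabytes * DOCS_PER_GB_LOW, gigabytes * DOCS_PER_GB_HIGH


earlier_case = estimate_documents(16)    # average case four or five years ago
current_case = estimate_documents(120)   # average case today, outliers removed

print(earlier_case)  # (80000, 160000)
print(current_case)  # (600000, 1200000)
```

The low and high ends move in lockstep with case size, so a jump from 16 to 120 gigabytes is a 7.5-fold increase in estimated documents whichever per-gigabyte figure you prefer.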

And speaking of outliers, we have clients with 20 terabytes of litigation documents on our system (and one prospect talking about 60 terabytes of case data). I have wondered how many total documents that represents but I am not even going to pull out my calculator to figure that one. We see cases with as many as 8 million documents in them under review. It just boggles the mind.

So, that is what I was thinking about when I got Justin’s email. My point isn’t to say that Catalyst is big or that our numbers are anything special. Rather, like many of you, I remember what litigation was like in the 1990s, a time when we thought paper discovery was getting out of hand. In those days, we moved from red-well folders, to filing cabinets to war rooms and even warehouses. But we never contemplated having even 100,000 documents to review, let alone millions. It just didn’t happen, at least not in my practice.

The volume of digital data is growing at an explosive rate, as we all know. Some claim the world is creating more than 988 exabytes of data a year in new content. That comes to 988 billion gigabytes. (It goes exabyte to petabyte to terabyte to gigabyte.)
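The conversion can be sketched in a few lines of Python. This assumes decimal units, where each step down the ladder is a factor of 1,000 (storage vendors sometimes use 1,024 instead, which would give slightly different numbers). The names below are illustrative only.

```python
# Number of factor-of-1,000 steps from each unit down to gigabytes (decimal units assumed).
STEPS_TO_GIGABYTE = {"gigabyte": 0, "terabyte": 1, "petabyte": 2, "exabyte": 3}


def to_gigabytes(amount, unit):
    """Convert an amount in the given unit to gigabytes, using decimal steps."""
    return amount * 1000 ** STEPS_TO_GIGABYTE[unit]


print(to_gigabytes(988, "exabyte"))   # 988000000000, i.e. 988 billion gigabytes
print(to_gigabytes(48, "terabyte"))   # 48000, the size of one storage unit mentioned earlier
```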

The great bulk of this data consists of video and audio files but written data keeps expanding too. And a lot of that stuff will be discoverable, which is what drives this industry.

The e-discovery market is all grown up now, with trade shows, industry magazines, experts and analysts. My how you’ve grown! My how we all have grown!

About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000 and is responsible for its overall direction, voice and vision.

Well before founding Catalyst, John was a pioneer in the field of legal technology. He was editor-in-chief of the multi-author, two-book series, Winning With Computers: Trial Practice in the Twenty-First Century (ABA Press 1990, 1991). Both were ABA best sellers focusing on using computers in litigation technology. At the same time, he wrote How to Prepare for, Take and Use a Deposition at Trial (James Publishing 1990), which he and his co-author continued to supplement for several years. He also wrote Lawyer's Guide to Spreadsheets (Glasser Publishing 2000) and Lawyer's Guide to Microsoft Excel 2007 (ABA Press 2009).

John is the former chair of the ABA's Law Practice Management Section. For many years, he was editor-in-chief of the ABA's Law Practice Management magazine, a monthly publication focusing on legal technology and law office management. More recently, he founded and edited Law Practice Today, a monthly ABA webzine that focuses on legal technology and management. Over two decades, John has written scores of articles on legal technology and spoken on legal technology to audiences on four of the five continents.

Comments

  1. Craig Ball says:

    Good post, John! Thanks for the walk down memory lane, and more, for the metrics on change.

    My challenge in dealing with the leap in volumetrics ties to my sense that human productivity has not grown since the ’90s. If anything, I think that we may be less productive when one considers the impact of all the distractions at hand. Who hasn’t sailed off into the net and found hours melt away?

    If we are not generating more unique work product, then what is the composition of the leap in volume? Music, photos and video certainly play a crucial role on local machines, but that’s unlikely to be the case in your repositories (unless those collecting ESI are truly asleep at the wheel). So, my suspicion is that the jump in volume must be primarily a function of reiteration and fracturing.

    There needs to be work done on this point. Surely we are not writing many more items of substance. Perhaps we are, Tweet-like, just breaking them into more-and-more pieces that each consume more space than they should and then scattering many copies of them into the ether.

    Best regards, Craig

  2. Good points as always, Craig.

    Here is my view. Back in the old days when I received a request for production I would send it off to my client for response. I might make a few comments about what is requested and/or required but I wasn’t conducting facilities inspections or necessarily interviewing each potential custodian.

    In due course, the client would send me a package of documents, perhaps an envelope or a box. And, yes, there were the times that there were more than a few boxes of documents.

    But nobody endeavored to check every nook and cranny for documents and particularly for every copy of the documents. Sure it would be interesting to know if the witness had seen a key document before. We found that out at the deposition by placing the document before the witness and asking the question. We just didn’t focus on whether we had *every* possible copy of each document. It was instead about whether we had the key documents needed to prove our case.

    So, the starting point for your answer is that we just didn’t look as hard and didn’t care if we had every possible copy of every document. Putting aside the trial lawyer reasons why I didn’t need all those copies, it would have been hard to find them. We went to the correspondence files and got relevant stuff there. We didn’t then go to every employee to see if they might have more copies of these documents or whether there might be something that didn’t make it to the correspondence file. And so on.

    Add to that two facts of digital life.

    First, it is a lot easier to create and distribute documents. I remember in the early nineties pushing my firm to change from a printed “Daily Memo” to one delivered via email to members in our ten offices. We had a lot of people upset when we started the program. They were used to having their copy handed to them as the mail person came around with the cart. Then, they sort of got used to the electronic version. What I remember was the conclusion that we saved over $100,000 by making this change. That represented paper, ink, copy machines and people to ship and distribute all that stuff around.

    This just grew as email and electronic files became the norm. Putting aside whether we actually write more content today than back then, we can sure spread it around more easily. The infernal “Reply all” is a key part of our problem but that isn’t the only culprit. Saving stuff in multiple places, backing it up multiple times and emailing it around at next to no cost caused at least a good part of the explosion. When it doesn’t cost to print, copy or store paper, you know volumes are going to explode.

    Plus, we do write more. A lot more. Email has taken the place of many phone calls. When making a PowerPoint presentation is easy, we all do it. Even writing documents is now a lot easier than it was. And, I was pretty good with a Dictaphone back then. So, I think we write more and copy more. That means more files waiting to be discovered.

    The second fact of digital life is that it is easier to collect the stuff. When you can store a million files on a disk and search them all in milliseconds, then the game has changed. When they can be transported on a thumb drive or sent quickly as an email attachment, the point goes further. Add to that the host of collection vendors and software makers who keep honing their craft. And, a judicial mindset that is more focused on finding *every* copy of every document than making sure the relevant stuff gets produced. Without question, the practices have changed since I started trying cases.

    So, most of the digital explosion is about video and audio as we might expect. But, the volume of digital content that hits the net during the discovery process has increased dramatically as well, for these and more reasons. I don’t see it ever going back to the good old days again.

    Best regards to you too, old friend.

    John Tredennick

Trackbacks

  1. [...] Article Published By: Catalyst E-Discovery Search Blog [...]

  2. [...] My How You’ve Grown! http://tinyurl.com/3mduqhr (John [...]

  3. [...] his full post click here. Posted in Electronic Discovery 101: Where to Start « Can Technology [...]

  4. [...] In his article, “Accounting for the Costs of Electronic Discovery,” David Degnan states that conducting electronic discovery “may cost upwards of $30,000 per gigabyte.” (You can read Bob Ambrogi’s post about it here.) That is a lot of money for discovery, particularly considering that the number of gigabytes we are seeing per case seems to keep increasing. [...]

  5. [...] April, Catalyst CEO John Tredennick published a post here, E-Discovery, My How You’ve Grown! In it, he talked about the growth of the e-discovery industry since he founded Catalyst in 2000, [...]
