E-Discovery Search Blog

What’s Wrong with the EDRM Model?

[Note: In response to the thoughtful comments on this post, I created a revised version of my EDRM chart. See the revised chart and my explanation of the changes in this post: A Revised Version of My Alternative EDRM Model.]

[Image: the original EDRM diagram]

The Electronic Discovery Reference Model (EDRM) has been around for several years now. It is widely used both in the U.S. and around the world when people talk about electronic discovery. Lawyers, consultants, analysts, vendors and yes, investors, all talk about the EDRM as a standard way to describe the e-discovery process.

But there is something wrong with the model. We could certainly debate whether records management belongs in the model, or the order of the boxes. My point is different. George Socha, a good friend for years, Tom Gelbmann and the other members of the EDRM committee did a great job when they drew up the model. But they made one mistake, and it is time to point it out.

Look at the triangles that create shading for the model. The left triangle is meant to reflect data volume. The right triangle reflects the volume of relevant documents.

The left triangle slopes downward to reflect the fact that as we move through the model, we reduce the total volume of documents being considered. That seems logical enough. The whole point of the EDRM process is to take a large volume of documents (and data of course) and cull it down to those few documents that are relevant.

Likewise, the right triangle has an upward slope. The reason is to show how the volume of relevant documents increases during the culling process. Actually, the volume doesn’t increase, but the percentage of relevant versus non-relevant documents increases as the non-relevant documents are culled out. This is both logical and a matter of common sense.

As a lawyer, my job is to plow through volumes of documents—whether paper or electronic—and ferret out what might be helpful to my case. In the old days, I used to put yellow stickies on the documents that interested me and send them for copying. In the end, the relevant documents went into notebooks for my review. The proportion of relevant to non-relevant documents got larger and larger until I finally marked my exhibits for trial. At that point, they were all relevant. That’s why I marked them as exhibits.

But here’s the rub. Look closely at the triangles in the EDRM. Notice that the triangle reflecting “Data Volume” starts high on the Y axis (top of the chart). It then drops down to the X axis (Y=0) at the right side of the chart, reflecting the end of the process—the trial or the hearing.

And, look closely at the triangle reflecting “Data Relevance.” It starts on the X axis at the beginning of the process (Y=0) and rises high up the Y axis by the end of the process—trial/hearing.

What’s wrong with that? Well, just one thing. The triangles’ values should never drop to zero (intersecting the X axis), certainly not at the beginning and usually not at the end of the process either. They need to stay above the X axis: Y has to be positive at both the beginning and the end of the process.

What do I mean? Quite simply this. When you start the process of collection, there certainly are a lot of non-relevant documents. The initial line should be at the top of the axis. But there is also some quantity of relevant documents in the mix. That’s why we do the collection in the first place: to find the relevant documents.

Thus the triangle for “Data Relevance” is off. It should never be at the X axis because you wouldn’t do the collection if you didn’t believe that some of the documents you were collecting were relevant. The line has to begin above zero.

The same point applies to the Data Volume triangle. While the goal is to pare down the documents as much as possible, you always end up with some documents. That is the point of the effort. It might only be trial exhibits (relevant equals 100%) or the pile might be a mix of relevant and not relevant. But either way, data volume ends up as a positive number. Otherwise the process failed.
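To make the arithmetic concrete, here is a small Python sketch with made-up numbers. The stage names loosely follow the EDRM boxes, but the document counts and the fixed pool of 2,000 relevant documents are assumptions for illustration only, not figures from the EDRM or from any real case. It holds the number of relevant documents constant, as the culling point above assumes, and shows the relevant percentage climbing as total volume falls, without either figure ever reaching zero.

```python
# Hypothetical figures for illustration only; not taken from the EDRM
# or from any real matter.
RELEVANT = 2_000  # assumed number of relevant documents, fixed throughout

# (stage name, total documents remaining at that stage); assumed figures
STAGES = [
    ("Collection", 1_000_000),
    ("Processing", 200_000),
    ("Review", 20_000),
    ("Production", 5_000),
    ("Presentation", 2_000),
]


def relevance_percentage(total: int, relevant: int = RELEVANT) -> float:
    """Share of the remaining documents that are relevant, as a percentage."""
    if total <= 0:
        # Ending with zero documents means the process failed.
        raise ValueError("volume must stay positive")
    if relevant > total:
        raise ValueError("cannot keep more relevant documents than remain")
    return 100.0 * relevant / total


for stage, total in STAGES:
    pct = relevance_percentage(total)
    print(f"{stage:<12} volume={total:>9,}  relevant={pct:6.2f}%")
```

With these assumed counts, the relevant share runs 0.2%, 1%, 10%, 40% and 100% across the five stages: volume starts high and ends low but positive, and relevance starts low but above zero.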

So, we goofed on the drawing. Both of the triangles—Data Volume and Data Relevance—need to be raised so that neither intersects the X axis. Here is how I would do it.

[Image: the revised EDRM chart]

I created this in PowerPoint, and it took a while to figure out how to create a triangle that wasn’t a triangle. I did it by extending the triangle beyond the slide frame so that the end points didn’t show. I then cropped them off when I made a screen shot.

This is how the process should be displayed. At the start of the process, document volume is high but the percentage of relevant documents is relatively low. We could debate what that percentage might be but I think we could all agree that it would never be zero.

At the end of the process, data volume is greatly reduced, reflecting the success of our processing, culling, search, analytics and review work. The volume will vary based on the issues in the case but it should never be zero, as I am sure you will readily agree.

Why did this happen? I suspect it was just a matter of the graphics program the EDRM drafters used. I have never heard anyone talk about it but it has bugged me since I first saw the chart. Maybe it’s the trial lawyer in me acting out.

So, here is my chart. I freely offer it to anyone who wants it. No copyright, no claims. I think it is a bit prettier than the original but, as they say, art is in the eye of the beholder. What it is, however, is more reflective of the EDRM process. You start with a high volume of documents and a low percentage of relevant ones. You end with a low volume of documents but a high percentage of relevant ones. That’s the EDRM process. That’s the trial lawyer’s objective.

Forgive me, George, and all of the respected members of the original EDRM committee. I am not offering a serious criticism, and I commend you for your efforts and the success of the model. I just thought I would have a little fun on the blog and perhaps make my own contribution to the process. You have to smile a little, even with the serious business of electronic discovery.

About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000 and is responsible for its overall direction, voice and vision.

Well before founding Catalyst, John was a pioneer in the field of legal technology. He was editor-in-chief of the multi-author, two-book series, Winning With Computers: Trial Practice in the Twenty-First Century (ABA Press 1990, 1991). Both were ABA best sellers focusing on the use of computers in litigation. At the same time, he wrote How to Prepare for Take and Use a Deposition at Trial (James Publishing 1990), which he and his co-author continued to supplement for several years. He also wrote Lawyer's Guide to Spreadsheets (Glasser Publishing 2000) and Lawyer's Guide to Microsoft Excel 2007 (ABA Press 2009).

John is the former chair of the ABA's Law Practice Management Section. For many years, he was editor-in-chief of the ABA's Law Practice Management magazine, a monthly publication focusing on legal technology and law office management. More recently, he founded and edited Law Practice Today, a monthly ABA webzine that focuses on legal technology and management. Over two decades, John has written scores of articles on legal technology and spoken on legal technology to audiences on four of the five continents.

Comments

  1. John,

    As we discussed, there is a reason behind the differently scaled triangles. The idea is that if you start with a terabyte of data, you should not end up with a terabyte of data. It is highly unlikely that the entire terabyte is relevant, however you define relevance. And who would want to put a terabyte in front of a jury, even if a judge were to allow such a folly?

    By the way, you lost some arrows along the way, so that your version never allows for disposition of data, which is an information management task.

    George

  2. Whilst I agree with the logic in what you say, your model does not explain that percentages are involved. It therefore looks to the untutored eye as though there is as much relevant data at the end of the process as there was volume of data at the start. This suggests that there is purely a metamorphosis from “volume” to “relevant”. Help!

    The relevant data of course remains fairly static throughout the process. It is in there somewhere from the outset. The processes allow us to bring the cream to the surface so that we only have to skim the surface to take to trial.

    My artistic skills prevent me from trying to demonstrate this behind the model but perhaps if the model was rotated through 90 degrees and the two triangles replaced by a milk churn (with the cream at the top) or an iceberg (with the relevant tip above the surface) then we get nearer to the point of showing the processes in the first place!

  3. Tom Gelbmann says:

    John,

    You are right, it did bring a little smile. We do need a few light moments in this business.

    Regards, Tom

  4. Luz G. Banda says:

    Hello. Besides noting, as Mr. George Socha Jr. did, that some arrows are missing from the Revised EDRM, and that the direction of the arrows at the Processing, Analysis, and Review boxes has changed, I would like to ask whether the point where the triangles meet in this new model has any significance. The triangles serve as a reference to depict the transition (if there is one) from Data Volume to Data Relevance and give some insight into the possible data ratios at a particular point in the process. But it is unclear whether this transition varies in such a way that the point where the triangles meet corresponds to a particular step in the process.

    • Hello to you. You ask a good question with respect to the triangles.

      I didn’t create the original model, nor am I any kind of spokesperson for EDRM. However, I believe the intent of the triangles was to show two things: 1) that data volumes decrease as the process moves forward; and 2) that the percentage of relevant content increases in like fashion. (On the latter point, it has to be a relative percentage: the actual volume of relevant documents does not change unless something is added or destroyed, which I do not think the model contemplates.)

      Both points make perfect sense to me and model what we do as trial lawyers. What no model could do is show actual figures or percentages. Data volumes vary by case: small cases have few documents, larger cases usually have more. Likewise, the number of relevant documents varies both with the method of collection and with the issues in the case. So, while the model can show trends and directions, it cannot do any more than that.

      Where the triangles meet is an interesting concept. I suspect it will vary in accordance with the methods used throughout the model. If I were a corporate GC, I would want volume to decrease as quickly as possible. Actually, I would want the collection to be as surgical as possible. The fewer documents collected, the lower my processing and review costs.

      As a trial lawyer, I hate looking at non-relevant documents. So my goal there would be to have the relevant percentage start high and continue rising across the process. That lets me focus on preparing my witnesses and opening statement.

      The triangles will always intersect, but in my judgment the model can’t do much more than it does. It will vary by the case and circumstances.

      Hope that helps.

  5. swati chaudhary says:

    Hello Sir,

    I read your article and found it very interesting. The axes, i.e. X and Y, that you are talking about are still not clear to me from this model because you have not labeled them.

    But even if I assume the two axes are on either side of the model, I still cannot understand how the triangles can be similar. There has to be a distinction between the triangles, because only then can we distinguish the total data volume from the relevant data.

    You are right that the volume of relevant data never changes, but the relevant data in document review projects is never the same; sometimes the volume of irrelevant data can also increase.

    According to my understanding, Mr. Socha and Mr. Gelbmann have tried to show the maximum volume of data (relevant plus not relevant), but we cannot fix the percentage of data that is relevant and not relevant.

    Thank You
    Swati Chaudhary

Trackbacks

  1. [...] This post was mentioned on Twitter by Bob Ambrogi and Rob Robinson, Catalyst. Catalyst said: What’s Wrong with the EDRM Model? http://bit.ly/c83q3U [...]

  2. [...] you think it is for the lawyers? Would you really trust a hard drive to a lawyer? Josh Tredennick has a problem with an EDRM [...]

  3. [...] you think it is for the lawyers? Would you really trust a hard drive to a lawyer? Josh Tredennick has a problem with an EDRM model. [...]

  4. [...] Tredennick wrote an interesting article entitled “What is Wrong with the EDRM Model?” discussing what he felt was missing from the EDRM diagram. I thought you might find it a [...]
