Catalyst Repository Systems - Powering Complex Legal Matters

E-Discovery Search Blog

Technology, Techniques and Best Practices

My Key Word Searches are Better than Your Predictive Ranking Technology!

I recently got a distress call from an e-discovery partner of ours with an unhappy client. “It seems like there is something wrong with your predictive ranking technology,” our partner said on the Google Hangout. “It’s proposing that the client team review too many documents–more than we got with key word searching. Our client is upset. We need to do something to explain this fast.”

In this case the client team had not used technology assisted review (TAR) before; this was their first try at the process. They wanted proof that it was worth the extra cost for the technology. Specifically, they wanted to see whether it actually cut down on review costs, like everyone claimed.

The problem was that the system didn’t seem to work, at least in their eyes. They had started the review process by running a series of key word searches, which was their normal practice. The searches hit on a total of about 11,000 documents, out of a test set of roughly 50,000 documents. This suggested they had only 11,000 documents to review for the production, about 22% of the total collected.

Our partner had recommended that the client try our Predictive Ranking technology as a better means to find responsive documents. Everyone’s initial expectation was that doing so would reduce the review population even further than the 11,000 documents that the key word searches hit. Somehow they got the impression that the review population might go down to more like 7,000. That would certainly justify the extra expense of this new technology.

Unfortunately, the opposite turned out to be the case. Instead of recommending that the team review 7,000 documents, or even 11,000 documents, our system suggested reviewing more than 18,500 documents.

You can imagine the client’s consternation. “You want us to pay you for these results? You just increased rather than reduced my review costs. I like it better the old way.”

It was time to go to work to figure out what happened. Fortunately, our team had analyzed the key word search results and compared them to the documents identified through Predictive Ranking. My job was to explain the difference between the two approaches. My hope was to show that the system worked well and provided a better outcome than key word search, at least if the goal was to identify and review potentially relevant documents.

To be sure, our Predictive Ranking system came up with more documents to review than did the key word searches. However, we quickly concluded that the key word searches–while finding many potentially responsive documents–missed a lot of others that should be considered as well. Here is how we reached that conclusion.

The Numbers

Let me start by giving you some basic information about the two processes. From there it becomes a bit easier to explain the difference in the results. You can then see how we got to our ultimate conclusion.

The client collected just over 51,000 documents for this production. As a first step, counsel created a set of key word searches and asked our partner to run them using our PowerSearch utility. As I mentioned earlier, the searches hit on about 11,000 documents.

We then started the Predictive Ranking process, which is the name we started using years ago for our TAR methods. The first step was to take an initial random sample for reference purposes and to estimate the overall richness of the population. We came up with an estimate of 22%, which would suggest that there were about 11,000 relevant documents in the collection.

Hmmm. That was pretty close to the number of documents found through the key word searches. Did they nail it this time?
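For readers who want to see the arithmetic behind a richness estimate, here is a minimal Python sketch. It assumes a simple random sample and uses the normal approximation for a proportion; the function name and the 400-document sample are hypothetical, since the post does not say how large our initial sample was. The interval it reports is the reason to speak of "about 11,000" relevant documents rather than an exact count.

```python
import math

def estimate_richness(sample_size: int, responsive_in_sample: int,
                      population: int, z: float = 1.96):
    """Project collection richness from a simple random sample.

    Uses the normal (Wald) approximation for a binomial proportion,
    which is reasonable when the sample is large and richness is not
    close to 0% or 100%. z = 1.96 corresponds to 95% confidence.
    """
    p = responsive_in_sample / sample_size               # observed richness
    margin = z * math.sqrt(p * (1 - p) / sample_size)    # half-width of the interval
    low, high = max(0.0, p - margin), min(1.0, p + margin)
    return {
        "richness": p,
        "interval": (low, high),
        "projected_responsive": p * population,
        "projected_range": (low * population, high * population),
    }

# Hypothetical sample: 88 responsive documents out of 400 sampled (22%).
result = estimate_richness(400, 88, population=51_534)
print(round(result["projected_responsive"]))  # 11337
```

With this hypothetical sample, the 95% range runs from roughly 9,200 to 13,400 responsive documents; a larger sample would tighten it.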

We next worked with the client’s legal experts to review and tag seed documents and then to undertake several rounds of system training. At the end of the process, the system suggested a review cutoff at about 36% of the ranked population. That meant that the review team would look at the top 36% of the documents and ignore (after a confirmatory sample) the remaining 64%.

The resulting numbers looked like this:

  • Likely responsive and need review: 18,552.
  • Likely non-responsive and don’t need review: 32,982.

Our sampling also suggested that the top documents above the cutoff had a richness of about 50% (which meant half of these were likely responsive). The documents below the cutoff had a richness of about 7% (which meant that only 7 out of 100 were likely responsive). Seven percent seemed like a good number for the discard pile, one that most courts would accept.
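The two richness figures also make it possible to estimate recall, which the post does not report directly. Here is a rough Python sketch, assuming the 50% and 7% sample estimates hold across each set; the function name is hypothetical, and because the inputs are rounded, so is the result.

```python
def project_cutoff(above: int, richness_above: float,
                   below: int, richness_below: float):
    """Estimate responsive documents on each side of a review cutoff
    and the recall implied by reviewing only the documents above it."""
    found = above * richness_above      # responsive docs expected in the review set
    missed = below * richness_below     # responsive docs expected in the discard pile
    recall = found / (found + missed)   # share of all responsive docs that get reviewed
    return found, missed, recall

# Counts and richness estimates taken from the post.
found, missed, recall = project_cutoff(18_552, 0.50, 32_982, 0.07)
print(f"{found:,.0f} found, {missed:,.0f} missed, recall ≈ {recall:.0%}")
# 9,276 found, 2,309 missed, recall ≈ 80%
```

On these figures, the implied total of about 11,600 responsive documents is in the same range as the 22% richness estimate from the initial random sample.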

The Question

As mentioned earlier, our results caused heartburn for our partner and its client. The key word search approach seemed to require that the team review only 11,318 documents. Why pay for Predictive Ranking if it requires that the team review more than 7,000 additional documents? That’s about 64% more than the key word results.

Understanding the Numbers

The answer requires that we better understand what all these figures mean. Unless we can compare apples to apples, we have no way to judge the efficacy of the two approaches. Fortunately we had an easy way to do just that.

We compared the document IDs for the files returned from the key word searches with the files returned from our Predictive Ranking process. What we found was pretty interesting. Let me show you with this simple diagram:

I created this diagram to map the comparative results of our Predictive Ranking and the key word searches. The circle represents the total documents in the population. If you add up the numbers in the four quadrants, they total the 51,534 files at issue.

The four quadrants represent the different states of the document population.

  • The top left quadrant represents documents that our Predictive Ranking system found likely responsive but did not return from the key word searches. There were 11,285 documents in this category.
  • The top right quadrant represents documents that hit under both approaches. These documents were returned by the key word searches and were also designated by our Predictive Ranking system as potentially responsive. There were 7,267 documents in this category.
  • The bottom left quadrant represents documents that did not hit under either approach. Neither our Predictive Ranking system nor the key word searches deemed them likely responsive. There were 28,931 documents in this category.
  • The bottom right quadrant represents documents that were returned from the key word searches but were not deemed likely responsive by our Predictive Ranking system. There were 4,051 documents in this category.
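The comparison itself is just set arithmetic on document IDs. Here is a minimal Python sketch, assuming each approach's results are available as a set of IDs; the function name, cell names and toy IDs are hypothetical.

```python
def quadrants(population: set, ranked_responsive: set, keyword_hits: set) -> dict:
    """Split a document population into the four cells of the diagram."""
    cells = {
        "ranking_only": ranked_responsive - keyword_hits,             # top left
        "both": ranked_responsive & keyword_hits,                     # top right
        "neither": population - ranked_responsive - keyword_hits,     # bottom left
        "keywords_only": keyword_hits - ranked_responsive,            # bottom right
    }
    # Sanity check: the four cells must account for every document.
    assert sum(len(ids) for ids in cells.values()) == len(population)
    return cells

# Toy IDs for illustration only.
docs = set(range(1, 11))
ranked = {1, 2, 3, 4, 5}
hits = {4, 5, 6}
cells = quadrants(docs, ranked, hits)
print({name: len(ids) for name, ids in cells.items()})
# {'ranking_only': 3, 'both': 2, 'neither': 4, 'keywords_only': 1}
```

With the real document IDs, the same four operations would yield the 11,285, 7,267, 28,931 and 4,051 counts above, and the two key word cells add back up to the 11,318 search hits.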

So, what can we say about all this? First, we can say that the team should probably review the 7,267 documents found in the top right quadrant. Both approaches tagged them as likely responsive. That does not mean that they will all be responsive but it is a good bet that a lot of them are.

Second, we can suggest that the 28,931 documents in the bottom left quadrant include few responsive documents. Neither the key word searches nor our Predictive Ranking system hit on these documents. There is still a need for confirmatory sampling but we can be pretty sure that there are not a lot of responsive documents hiding in this quadrant.

The Two Key Quadrants

That leaves us with two quadrants to consider, and this is where we find the answer to our puzzle. Together, these two quadrants represent about 15,000 documents. Here is what we can say about each:

  • The top left quadrant represents 11,285 documents that our Predictive Ranking system found likely responsive. The keyword searches provide no information about these documents other than to say that the searches did not return them.
  • The bottom right quadrant represents 4,051 documents that hit on counsel’s keyword searches but our Predictive Ranking system found to be likely non-responsive.

If counsel only reviewed documents that returned from the key word searches, they would be ignoring the 11,285 documents identified in the top left quadrant. Many of them had already been tagged during the Predictive Ranking training session and thus we knew that there were responsive documents in this quadrant. Our richness estimate went so far as to suggest that 50% of them were likely responsive, which meant that counsel might be missing 5,000-6,000 responsive documents using their key word approach. It quickly became evident that counsel would have to at least test additional documents in this quadrant before dismissing them as not responsive.

Conversely, our Predictive Ranking system led us to question how many of the 4,051 documents in the lower right quadrant were responsive. In fact, we knew from training that many of the documents in that quadrant were not responsive. At the least, that is what the reviewers concluded when they addressed them during the sampling.

We suggested that the client test the documents in this quadrant before engaging in review. Our suspicion was that these were false hits from the key word searches and not likely of interest. Our estimate was that they would find a richness of about 7%.
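To put the two disputed quadrants side by side, here is a back-of-the-envelope Python sketch using the post's own estimates: 50% richness for the top left quadrant and 7% for the bottom right. The function name is hypothetical, and the results are only as good as those sampling estimates.

```python
def expected_responsive(doc_count: int, richness: float) -> float:
    """Expected responsive documents in a set, given its estimated richness."""
    return doc_count * richness

# Counts and richness estimates taken from the post.
disputed = {
    "top left (Predictive Ranking only)": (11_285, 0.50),
    "bottom right (key word hits only)": (4_051, 0.07),
}

for name, (docs, richness) in disputed.items():
    expected = expected_responsive(docs, richness)
    docs_per_hit = 1 / richness  # documents reviewed per responsive document found
    print(f"{name}: ~{expected:,.0f} responsive, "
          f"~{docs_per_hit:.0f} documents reviewed per responsive one")
```

On these estimates, reviewing the top left quadrant turns up a responsive document for roughly every two reviewed, while the bottom right yields roughly one for every fourteen.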

Answering the Question

By now you have already figured out how to respond to the client’s concerns. Simply put, the key word searches–while effective at finding some of the potentially responsive documents–missed a lot of others that should be reviewed. The Predictive Ranking system found many of the documents returned from the key word searches but it also found a lot of other potentially responsive documents. The total numbers were higher but there was good reason for that outcome. There were more documents that needed to be reviewed.

Put another way, search quality has two standard measures: precision and recall. Precision is the proportion of the documents returned by your search that are true hits (actually responsive documents). Recall is the proportion of all the true hits in the population that your search returns.
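Both measures are easy to compute once you know which returned documents are responsive. Here is a minimal Python sketch; the function name and toy IDs are hypothetical. The toy search shows exactly the pattern described below: good precision, poor recall.

```python
def precision_recall(returned: set, responsive: set):
    """Precision and recall of a search, given the IDs it returned
    and the IDs that are actually responsive."""
    true_hits = returned & responsive
    precision = len(true_hits) / len(returned) if returned else 0.0
    recall = len(true_hits) / len(responsive) if responsive else 0.0
    return precision, recall

# Toy example: the search returns 4 documents, 3 of them responsive,
# but the collection holds 10 responsive documents in total.
returned = {1, 2, 3, 99}
responsive = set(range(1, 11))
p, r = precision_recall(returned, responsive)
print(f"precision {p:.0%}, recall {r:.0%}")  # precision 75%, recall 30%
```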

In our case, the key word searches may have been good on precision (assuming that the documents in the top right quadrant were, in fact, responsive). However, they seemed to miss the boat on recall. The searches missed a lot of the other responsive documents. That is not a good thing if your opponent chooses to challenge your production in court.

My explanation proved helpful to our partner and the client team. They moved forward with their review using the documents ranked by our Predictive Ranking system. As it happened, the key word searches had missed a lot of responsive documents, and many of the search hits in the lower right quadrant were false hits. The explanation and diagram helped to clear up the mystery. I thought they might be helpful to others as they grapple with the mysteries of technology assisted review.

There may also be a moral to this story, so to speak. Discussion of technology assisted review often focuses on its ability to reduce document populations. But review is not just a numbers game—it’s also about getting it right. It does neither lawyers nor their clients any good to cut document populations if they are cutting a large number of potentially responsive documents in the process. As my story above illustrates, fewer is not always better. Here, Predictive Ranking proved itself superior to key word searching at getting it right. That may have saved counsel some grief further down the road.

Use of Predictive Coding and the Cloud in E-Discovery Rose in 2012, Survey Says

Use of predictive coding and Internet-based electronic discovery tools rose in 2012, according to the recently published 2012 ABA Legal Technology Survey Report on litigation and courtroom technology.

Of lawyers whose firm had handled an e-discovery case, 44 percent said they had used Internet-based e-discovery tools, up from 31 percent in 2011. Thirty-five percent said they had used Internet-based litigation-support software, up from 24 percent in 2011. Of those same lawyers, 23 percent said they had used predictive coding to process or review e-discovery materials, up from 15 percent the prior year.

By comparison, lawyers’ use of desktop-based e-discovery tools rose only slightly, from 46 percent to 48 percent, and their use of desktop-based litigation support software held steady, at 46 percent.

Not surprisingly, the use of these types of e-discovery tools is far more common among lawyers in larger firms than among those in solo and small firms. Among lawyers whose firm has handled e-discovery matters, only 5 percent of solo lawyers and 6 percent of lawyers in firms of 2-9 lawyers say they’ve used predictive coding. By contrast, in firms of 500 or more lawyers, 43.5 percent report having used predictive coding.

A similar but less-dramatic gap exists when lawyers who have handled e-discovery matters were asked if they ever use Internet-based e-discovery tools. Among lawyers in firms of 500 or more, 67.3 percent say they’ve used these tools. Among lawyers in solo firms, 33.3 percent say they have.

In fact, solo and small-firm lawyers are far less likely than their larger-firm counterparts to have ever handled an e-discovery matter. When asked how often they had made an e-discovery request on behalf of a client, 64.2 percent of solo lawyers said never. At firms of 500 or more, only 31.3 percent answered never.

Along the same lines, lawyers were asked how often they had received e-discovery requests on behalf of clients. Of solo lawyers, 56.1 percent said never. At firms of 500 or more, 27.9 percent said never.

Another question asked whether the lawyer’s firm (as opposed to the lawyer directly) had ever been involved in a case that required the processing or review of e-discovery materials. Only 12.8 percent of solos and 34.3 percent of lawyers in firms of 2-9 lawyers answered yes. Of lawyers in firms of 500 or more, 71 percent said yes. Among all respondents in all sized firms, 43.8 percent said that their firms had been involved in an e-discovery matter.

On the topic of outsourcing, the survey asked lawyers whether they outsource e-discovery processing or review. The results show little change in outsourcing to e-discovery consultants and companies — 44 percent in 2012 compared to 45 percent in 2011. Likewise, the percentage of outsourcing to computer forensics specialists remained steady at 42 percent from 2011 to 2012.

However, the survey indicates that outsourcing to lawyers outside their own firm is on the rise. Outsourcing to lawyers within the United States rose from 16 percent in 2011 to 25 percent in 2012. Outsourcing to lawyers outside the United States rose from 3 percent in 2011 to 8 percent in 2012. Here again, the larger the firm, the more likely the lawyer is to outsource.

Something that surprised me in the survey is that there has been virtually no change over the past three years in the number of firms reporting that they have a distinct e-discovery initiative (such as a practice group). In 2012, 25 percent of respondents said their firms had such an initiative, down from 27 percent in 2011 and equal to 2010’s 25 percent. Also notable is that, among firms that have such an initiative, fewer of them report having a partner heading it up. Increasingly, the firm’s CIO is taking on primary responsibility for its e-discovery initiative.

The 2012 ABA Legal Technology Survey Report consists of six volumes, covering a range of topics from technology basics to mobile lawyering. The e-discovery results are contained in Volume III, which covers litigation and courtroom technology. Volume III is available for purchase from the ABA for $350 (or $300 for ABA members). An abbreviated trend report on litigation and courtroom technology can be purchased for $55 (or $45 for ABA members).

Achieving Efficiency in E-Discovery: Argyle’s Interview with John Tredennick

Recently, Scott Robbin of the Argyle Executive Forum interviewed Catalyst’s founder and CEO John Tredennick about how to make e-discovery more efficient and the advantages of using one vendor to provide a central repository to manage e-discovery documents. Below is their conversation.

Trends and Best Practices in E-Discovery Privilege Review

Recent cases have presented several examples of the inadvertent production of privileged, “smoking gun” documents. In some of those cases, even with a clawback agreement in place, the court would not allow clawback. In others, even when clawback was allowed, the damage was done.

Below are the slides from a recent Catalyst Webinar, Trends and Best Practices in E-Discovery Privilege Review. (Follow that link to see and hear the entire webinar, with audio.) Presented by members of the Catalyst Search & Analytics Consulting Team, the webinar explores recent e-discovery trends and best practices regarding privileged documents.


Webinar: Trends and Best Practices in E-Discovery Privilege Review

You cannot unring a bell, as the saying goes, and nowhere in e-discovery is that more true than when privileged documents are inadvertently produced to an opposing party. Even if you have a clawback agreement and even if the court enforces that agreement (not a certainty), the damage may already be done.

On Thursday, June 28, at noon Eastern time, Catalyst will present a free webinar, Trends and Best Practices in E-Discovery Privilege Review, presented by three members of the Catalyst Search & Analytics Consulting team. John Hokkanen, director of Search and Analytics Consulting, and senior consultants Jim Eidelman and Ron Tienzo will explore recent e-discovery trends and best practices regarding privileged documents.

Topics will include:

  • A brief overview of recent court decisions regarding attorney-client privilege waiver issues and best ways to use FRE 502.
  • Best ways to search for potentially privileged documents.
  • Using advanced analytics to identify social networking among in-house counsel, outside counsel, management and others to categorize privileged documents.
  • Creating a privilege log automatically as you review.
  • Best ways to QC for privilege using analytics, sampling, and rule-based software.

To attend this free event, register here.

Federal Court Affirms Judge Peck’s Predictive Coding Order


When last we left the case of Da Silva Moore v. Publicis Groupe–the groundbreaking case in which U.S. Magistrate Judge Andrew J. Peck issued the first judicial opinion to endorse the use of computer-assisted review and predictive coding–it was headed for review by U.S. District Judge Andrew L. Carter Jr. Now, thanks to a heads-up from Evan Koblentz at Law Technology News, we learn that Judge Carter has issued his ruling and has adopted Judge Peck’s opinion.

“The Court adopts Judge Peck’s rulings because they are well reasoned and they consider the potential advantages and pitfalls of the predictive coding software,” Judge Carter wrote in an opinion filed today.

In challenging Judge Peck’s order, the plaintiffs had argued that he had mischaracterized and confused the issue of whether they had consented to the use of predictive coding. Judge Carter concluded that any such confusion was immaterial.

The confusion is immaterial because the ESI protocol contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs. It provides that the search methods will be carefully crafted and tested for quality assurance, with Plaintiffs participating in their implementation. For example, Plaintiffs’ counsel may provide keywords and review the documents and the issue coding before the production is made. If there is a concern with the relevance of the culled documents, the parties may raise the issue before Judge Peck before the final production. Further, upon the receipt of the production, if Plaintiffs determine that they are missing relevant documents, they may revisit the issue of whether the software is the best method.

Plaintiffs also challenged Judge Peck’s order on the ground that predictive coding is not a reliable method. Judge Carter ruled that this issue is also premature. As the litigation continues, if the parties believe the predictive coding software is flawed or that the process produces incomplete results, they can raise their concerns with Judge Peck and ask him to reconsider, Judge Carter noted. “To call the method unreliable at this stage is speculative.”

There simply is no review tool that guarantees perfection. The parties and Judge Peck have acknowledged that there are risks inherent in any method of reviewing electronic documents. Manual review with keyword searches is costly, though appropriate in certain situations. However, even if all parties here were willing to entertain the notion of manually reviewing the documents, such review is prone to human error and marred with inconsistencies from the various attorneys’ determination of whether a document is responsive. Judge Peck concluded that under the circumstances of this particular case, the use of the predictive coding software as specified in the ESI protocol is more appropriate than keyword searching. The Court does not find a basis to hold that his conclusion is clearly erroneous or contrary to law.

As to that secondary issue I mentioned in an earlier blog post–whether Rule 702 and Daubert apply to a court’s acceptance of a predictive-coding protocol–Judge Carter made short work of that. In a footnote, he wrote: “The Court adopts Judge Peck’s analysis of Rule 26(g) and Fed. R. Evidence 702 for similar reasons provided in his written opinion.”

Thus, Judge Peck’s predictive coding order has stood its ground and, with Judge Carter’s adoption of his reasoning, the use of predictive coding has taken another giant step towards the mainstream.

Should the ‘Daubert’ Standard Apply to Predictive Coding? We May Know Soon

It’s been a month since U.S. Magistrate Judge Andrew J. Peck issued his seminal opinion on predictive coding, Da Silva Moore v. Publicis Groupe, and it continues to make waves. Notably, it appears that U.S. District Judge Andrew L. Carter Jr. will weigh in on the issue. On March 13, he entered an order granting plaintiffs’ request to submit additional briefing on their objections to Judge Peck’s order.

A key issue Judge Carter may need to address is one given short shrift in coverage of and commentary on Judge Peck’s opinion. Understandably, most of the commentary focused on the fact that Judge Peck’s opinion marked a milestone — the first judicial opinion to recognize that computer-assisted review is an acceptable way to search for electronically stored information.

But in the course of that opinion, Judge Peck made another significant ruling. He concluded that Federal Rule of Evidence 702 and the Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals do not apply to a court’s acceptance of a predictive-coding protocol.

Rule 702 and Daubert give trial judges the responsibility to act as “gatekeepers” to exclude unreliable scientific and technical expert testimony. Judge Peck reasoned that these did not apply to the Da Silva Moore case because no one was trying to put anything into evidence. Here is how he explained it:

If MSL sought to have its expert testify at trial and introduce the results of its ESI protocol into evidence, Daubert and Rule 702 would apply. Here, in contrast, the tens of thousands of emails that will be produced in discovery are not being offered into evidence at trial as the result of a scientific process or otherwise. The admissibility of specific emails at trial will depend upon each email itself (for example, whether it is hearsay, or a business record or party admission), not how it was found during discovery.

Rule 702 and Daubert simply are not applicable to how documents are searched for and found in discovery.

You may recall that before Judge Peck issued his written opinion in this case on Feb. 22, he made oral rulings at the motion hearing on Feb. 8. On Feb. 22, just as Judge Peck was issuing his written opinion, the plaintiffs filed objections to his Feb. 8 rulings. One of their central arguments was that Judge Peck erred in disregarding his gatekeeper role under Daubert.

Because predictive coding is a new and novel technology, they argued, Judge Peck should have required expert testimony regarding its reliability or appropriateness. They cite Magistrate Judge Paul Grimm’s well-known ruling in Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 260 n.10 (D. Md. 2008), where he said, “[R]esolving contested issues of whether a particular search and information retrieval method was appropriate … involves scientific, technical or specialized information.” Relying on this, the plaintiffs argued:

[A]t no point did the Magistrate review any evidence to support his decision. The Magistrate took no judicial notice of any documents or studies that support the reliability of MSL’s method, nor did he receive any affidavits or declarations from purported experts that supported the methodology of MSL’s method. To his credit, the Magistrate did ask the parties to bring the ESI experts they had hired to advise them regarding the creation of an ESI protocol. These experts, however, were never sworn in, and thus the statements they made in court at the hearings were not sworn testimony made under penalty of perjury. The Magistrate judge never asked for or evaluated the qualifications of these experts, nor were the parties given an opportunity to question or cross-examine the experts in order for the Court to make a finding regarding the reliability of the experts’ opinions. Thus, the Magistrate’s decision relies only on the arguments made by counsel.

On March 7, the defendants responded to plaintiffs’ objections. With regard to the Daubert issue, they took the same position as Judge Peck–that Rule 702 and Daubert do not apply to the methods used to take discovery, but only kick in when evidence is presented at trial.

Plaintiffs simply are incorrect in their assertion that Victor Stanley requires expert testimony regarding the methodology selected by a party to search for electronically stored information. Rather, this case only requires that the selected methodology was carefully planned by qualified persons, contains provisions for quality assurance, and is supported by persons with the requisite qualifications and experience.

After receiving the defendants’ response, the plaintiffs wrote to Judge Carter on March 9 asking for leave to file their own response.

[W]hile Plaintiffs were denied an opportunity to respond to Magistrate Judge Peck’s written opinion, MSL had the benefit of filing its opposition approximately two weeks after the written opinion had been issued. Indeed, MSL’s opposition brief and supporting expert declarations not only reference, but also largely rely upon Magistrate Judge Peck’s observations regarding Plaintiffs’ Objection.

Judge Carter granted the plaintiffs’ request on March 13. (The brief was due March 19.) That means that we can expect him to issue a ruling of his own. It seems unavoidable that any ruling he issues will address the core issue of the appropriateness of computer-assisted review, at least in this case. Most likely, he will also have to address this secondary issue of the applicability of Daubert. If he does, in fact, squarely address these issues–and regardless of whether he agrees with Judge Peck–his ruling will be yet another milestone for predictive coding.

Judge Peck Provides a Primer on Computer-Assisted Review

Magistrate Judge Andrew J. Peck issued a landmark decision in Monique Da Silva Moore v. MSL Group, filed on Feb. 24, 2012. This much-blogged-about decision made headlines as being the first judicial opinion to approve the process of “predictive coding,” which is one of the many terms people use to describe computer-assisted coding.

Well, Judge Peck did just that. As he hinted during his presentations at LegalTech, this was the first time a court had the opportunity to consider the propriety of computer-assisted coding. Without hesitation, Judge Peck ushered us into the next generation of e-discovery review—people assisted by a friendly robot. That set the e-discovery blogosphere buzzing, as Bob Ambrogi pointed out in an earlier post.

I recommend reading the decision (and its accompanying predictive-coding protocol) not for its result but for its reasoning. This is one of the best sources I have seen on the reasons for and processes underlying predictive coding. Indeed, Judge Peck provided a primer on how to conduct predictive coding that is must reading for anyone wanting to get up to speed on this process.

What is Computer-Assisted Review?

Judge Peck started by quoting from his earlier article in Law Technology News:

By computer-assisted coding, I mean tools (different vendors use different names) that use sophisticated algorithms to enable the computer to determine relevance, based on interaction with (i.e. training by) a human reviewer.

As Judge Peck concluded: “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”

Why Do We Need Computer-Assisted Review?

The answer for Judge Peck was simple: Other methods of finding relevant documents are expensive and less effective. As he explained:

  • The objective of e-discovery is to identify as many relevant documents as possible while reviewing as few non-relevant documents as possible.
  • Linear review is often too expensive. Although it is seen as the “gold standard,” studies show that computerized searches underlying predictive coding are at least as accurate as human review, if not more accurate.
  • Studies also show a high rate of disagreement among human reviewers as to whether a document is relevant. In most cases, the difference is attributable to human error or fatigue.
  • Key word searches used to cull data sets also miss a large percentage of relevant documents. The typical practice of opposing parties choosing keywords resembles a game of “Go Fish,” as Ralph Losey once pointed out.
  • Key word searches are often over-inclusive, finding large numbers of irrelevant documents that increase review costs. They can also be under-inclusive, missing relevant documents. In one key study the recall rate was just 20%.

Ultimately, Judge Peck reminded us of the goals underlying the Federal Rules of Civil Procedure. Perfection is not required. The goal is the “just, speedy, and inexpensive determination” of lawsuits.

Judge Peck concluded that the use of predictive coding was appropriate in this case for the following reasons:

  1. The parties’ agreement.
  2. The vast amount of ESI (over 3 million documents).
  3. The superiority of computer-assisted review over manual review or keyword searches.
  4. The need for cost effectiveness and proportionality.
  5. The transparent process proposed by the parties.

The last point was perhaps the most important factor leading to the decision: “MSL’s transparency in its proposed ESI search protocol made it easier for the Court to approve the use of predictive coding.”

How Does the Process Work?

The court attached the parties’ proposed protocol to the opinion. While it does not represent the only way to do computer-assisted review, it provides a helpful look into how the process works.

  1. The process in this case began with attorneys developing an understanding of the files and identifying a small number that will function as an initial seed set representative of the categories to be reviewed and coded. There are a number of ways to develop the seed set, including the use of search tools and other filters, interviews, key custodian review, etc. You can see more on this subject below.
  2. Opposing counsel should be advised of the hit counts and keyword searches used to develop the seed set and invited to submit their own keywords. They should also be provided with the resulting seed documents and allowed to review and comment on the coding done on the seed documents.
  3. The seed sets are then used to begin the predictive coding process. Each seed set (one per issue being reviewed) is used to begin training the software.
  4. The software uses each seed set to identify and prioritize all similar documents across the complete corpus under review. The attorneys then review at least 500 of the computer-selected documents to confirm that the computer is categorizing them properly. This is a calibration process.
  5. Transparency requires that opposing counsel be given a chance to review all non-privileged documents used in the calibration process. If the parties disagree on tagging, they meet and confer to resolve the dispute.
  6. At the conclusion of the training process, the system then identifies relevant documents from the larger set. These documents are reviewed manually for production. In this case, the producing party reserved the right to seek relief should too many documents be identified.
  7. Accuracy during the process should be tested and quality controlled by both judgmental and statistical sampling.
  8. Statistical sampling involves a small set of documents randomly selected from the total files to be tested. That allows the parties to project error rates from the sample.
  9. Here, the parties agreed on a series of issues that will, of necessity, vary in other cases. The key point is that the parties agree on the issues and test the coding during the process.
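The calibration in steps 4 and 5 boils down to comparing the attorneys' coding with the system's calls on the same documents and flagging any disagreements. Here is a minimal sketch, assuming each calibration document carries both calls. The `CalibrationDoc` fields and the document IDs are hypothetical and do not come from the protocol:

```python
from dataclasses import dataclass


@dataclass
class CalibrationDoc:
    doc_id: str
    predicted_relevant: bool  # the system's call (hypothetical field)
    attorney_relevant: bool   # the reviewing attorney's call (hypothetical field)


def calibration_report(sample: list[CalibrationDoc]) -> tuple[float, list[str]]:
    """Return the agreement rate between attorney and system, plus the IDs
    of documents where they disagree (candidates for further discussion)."""
    disagreements = [d.doc_id for d in sample
                     if d.predicted_relevant != d.attorney_relevant]
    agreement = 1 - len(disagreements) / len(sample)
    return agreement, disagreements


sample = [
    CalibrationDoc("DOC-001", True, True),
    CalibrationDoc("DOC-002", True, False),
    CalibrationDoc("DOC-003", False, False),
    CalibrationDoc("DOC-004", False, True),
]
rate, to_discuss = calibration_report(sample)
print(rate, to_discuss)  # 0.5 ['DOC-002', 'DOC-004']
```

In practice the flagged documents would feed the next training round. Under this protocol, they would also be the natural starting point for any meet-and-confer with opposing counsel.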

Random Samples

It is important to create an initial random sample from the entire document set. The parties used a 95% confidence level with an error margin of 2%. They determined that the sample size should be 2,399 documents. You can figure this out using one of the publicly available sample-size calculators, such as Raosoft, which we often use.
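For the curious, the 2,399 figure falls out of the standard sample-size formula with a finite-population correction. Here is a minimal sketch in Python. It assumes a collection of exactly 3,000,000 documents, since the opinion says only "over 3 million." It also makes the conservative assumption that half the documents are relevant (p = 0.5). This is the textbook calculation, not necessarily Raosoft's exact code:

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence: float, margin: float,
                p: float = 0.5) -> int:
    """Documents to sample for a given confidence level and margin of error.

    Uses the normal approximation with a finite-population correction.
    p = 0.5 is the most conservative assumed proportion of relevant documents.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z * z * p * (1 - p) / (margin * margin)        # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                # finite-population correction
    return math.ceil(n)


print(sample_size(3_000_000, 0.95, 0.02))  # 2399
```

At this scale the correction barely matters. The answer stays close to 2,401 for any collection of several million documents. The population size matters far more for small collections.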

Seed Sets

The protocol goes on to describe a number of ways to generate seed sets including:

  • Agreed-upon search terms.
  • Judgmental analysis.
  • Concept search.

The parties frequently sampled the results from searches to evaluate their effectiveness.

There is at least a good blog post to be written about seed sets. Some computer-assisted coding systems like the one used for this case start their process with seed sets. The notion is that attorneys understand the cases, know what is and is not relevant and can train the system to recognize more relevant documents more effectively than starting with no seed documents.

Others think this is a mistake. They believe that however well meaning, the attorneys will bias the system to find what they think is relevant and get self-reinforcing results. In this regard, they are suggesting that the attorneys will make the same mistakes found in key word searches—thinking that you know which words will be most effective at finding your documents.

Systems following this logic urge the user to start from scratch, telling the system what is and is not relevant based on reviewing documents. As you do that, the system begins developing its own profile of relevant documents and builds out the searches. The belief is that the system may create a better search through this process than it might if you bias it with your seed documents.

There is a middle ground here as well. Many of the latter systems (no seed) will allow you to submit a limited number of seed documents as part of the training process. That may represent the best of both worlds or it may not, depending on your beliefs. The important point is that there are different approaches to computer-assisted processing. This protocol shows you one approach only.

Training Iterations

The process involves a number of computer runs to find responsive documents. The parties started with a first set of potentially relevant documents based on analysis of the seed set. After that review, the computer was asked to consider the new tagging and find a second set for testing. Then a third and a fourth.

The protocol suggested that the parties run through this process seven times. The key is to watch the change in the number of relevant documents predicted by the system after each round of testing. Once that number dropped below a delta of 5%, the parties had the option to stop. The notion is that the system has become stable by that time, with further review unlikely to uncover many more relevant documents.
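The stopping rule is easy to express as a loop. Here is a minimal sketch, assuming a hypothetical `train_round` callback. The callback stands in for one round of review and retraining on whatever platform is used, and it returns the number of relevant documents the system now predicts. The counts are invented:

```python
from typing import Callable


def train_until_stable(train_round: Callable[[int], int],
                       max_rounds: int = 7,
                       delta: float = 0.05) -> tuple[int, list[int]]:
    """Run training rounds until the predicted count of relevant documents
    changes by less than `delta` (as a fraction of the previous round's count)
    or `max_rounds` rounds have been run.

    `train_round(i)` is a hypothetical callback standing in for round i of
    review and retraining; it returns the system's predicted relevant count.
    Returns the number of rounds run and the history of predicted counts.
    """
    history: list[int] = []
    for i in range(1, max_rounds + 1):
        history.append(train_round(i))
        if len(history) >= 2:
            prev, curr = history[-2], history[-1]
            if prev > 0 and abs(curr - prev) / prev < delta:
                break  # the prediction has stabilized
    return len(history), history


# Invented predicted counts, for illustration only.
simulated = [40_000, 52_000, 57_000, 58_500, 59_000]
rounds, history = train_until_stable(lambda i: simulated[i - 1])
print(rounds, history)  # 4 [40000, 52000, 57000, 58500]
```

The protocol gave the parties the option to stop once the change fell below 5%. The sketch simply stops at the first such round, or otherwise after the seven agreed rounds.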

Finishing the Process

Once the training has completed and the system is “stable,” we move from computer-assisted to human-powered review. At that point, the producing party reviews all of the potentially responsive documents and produces accordingly.

Final QC Protocol

As a final stage, the parties need to focus on the potentially non-responsive documents—the ones the system says to ignore. The parties select a random sample (2,399 documents again) to see how many were, in fact, responsive.
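Projecting that sample onto the full set of documents the system says to ignore is straightforward arithmetic. Here is a minimal sketch using a normal-approximation (Wald) confidence interval. The protocol does not prescribe this formula, and the discard-pile size and sample counts are hypothetical:

```python
import math
from statistics import NormalDist


def project_elusion(sample_size: int, responsive_in_sample: int,
                    discard_pile: int, confidence: float = 0.95):
    """Estimate how many responsive documents remain in the discard pile.

    Uses the sample proportion and a normal-approximation (Wald) interval.
    Returns (point estimate, low, high) as document counts.
    """
    rate = responsive_in_sample / sample_size
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(rate * (1 - rate) / sample_size)
    low = max(0.0, rate - half_width)
    high = min(1.0, rate + half_width)
    return (round(rate * discard_pile),
            round(low * discard_pile),
            round(high * discard_pile))


# Hypothetical: 24 responsive documents found in a 2,399-document sample
# drawn from a discard pile of 2,500,000 documents.
estimate, low, high = project_elusion(2_399, 24, 2_500_000)
print(estimate, low, high)
```

The Wald interval is the simplest choice, but it behaves poorly when very few responsive documents turn up in the sample. With zero hits, it collapses to zero width. An exact interval such as Clopper-Pearson is the safer choice in that situation.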

These same documents (non-privileged ones) must be produced to the opposing party for review. If that party finds too many responsive documents in the sample or otherwise objects, it is time for a meet-and-confer to resolve the dispute. Failing that, you can always go to the court and fight it out.

Is This the Bible on Predictive Coding?

Certainly not. There are a lot of ways to approach this process. However, first opinions on any topic carry a lot of weight. We chose a profession that is guided by precedent, and this opinion lays down the first tracks on this new and exciting subject. The suggested procedures make sense to me and provide a starting point for your predictive coding efforts. This opinion and its accompanying protocol are important reading whether you are proposing or opposing the process for your next case.

 

Check for Privilege Before Turning Over Your Database: The Lesson in Thorncreek Apartments

Before you give opposing counsel the keys to your production database, run at least one check on the privilege field to see if any of your documents are marked “privileged.” That is the lesson a federal judge taught a hapless defense counsel in Thorncreek Apartments III v. Village of Park Forest, 2011 U.S. Dist. Lexis 88281 (N.D. Ill. Aug. 9, 2011). If you don’t, you may be deemed to have waived the privilege. I hate when that happens!

“What’s going on here?” you might ask. Can anyone be that sloppy? “Maybe,” I say in response. At least that’s what it seemed like here. Counsel literally made a production database available for more than seven months without once checking to see if it included privileged documents. Calling a production inadvertent won’t save the privilege if you were hopelessly sloppy about it. Here is the story.

The Facts

The plaintiffs filed a motion before the District Court arguing that privilege was waived for six documents included in a production database provided by defendant Village of Park Forest. The Village argued that the documents were inadvertently produced in a production database hosted by its online vendor Kroll Ontrack. (I don’t think Kroll did anything wrong here.)

Defense counsel went through what seemed like a reasonable process in pulling files off of a number of tape backups. First, they conducted a key word search to pull back documents that might be responsive. The key words had been agreed upon by opposing counsel and in some cases ordered by the court. (Looks like there may have been some controversy around the key words and no, people weren’t talking about predictive coding in this case.)

As a second step, Kroll put the potentially responsive documents in an online database for defense counsel to review. Counsel did so, marking documents as responsive, non-responsive and privileged.

The third step was to place the documents released by defense counsel in another online database made accessible to plaintiffs’ counsel. As the court noted, documents that the Village elected to withhold from production were not placed in this database. So far it all makes sense.

Producing the Privileged Documents

Here is where the rub came. At some point, plaintiffs complained that they were not able to see the documents returned from the agreed-upon searches but marked non-responsive. The Village had previously said it would include non-responsive documents in the production database in order to show how many its review had identified. In an attempt to be magnanimous, the Village elected to begin placing all of the documents in the production database—responsive and non-responsive.

As the court noted, the parties’ briefs left it a bit “murky” as to how the Village intended to handle the documents counsel had reviewed and marked “privileged.” Why they were not pulled out of the population before the production database was marked live is simply beyond me.

But they were not, and 159 privileged documents went online, easily available to plaintiffs’ counsel. Even more surprising, during the seven months the production database was live, Village counsel did not bother to produce a privilege log. At one point, counsel claimed that there were no privileged documents to withhold. Somebody tell me how that happened.

So, as you probably guessed by now, depositions started and some wiseacre slapped two of the juicy privileged documents in front of a witness. Village counsel erupted, claiming privilege and inadvertent production. The game was on.

Game On: Reel Those Privileged Documents Back In

Actually, the game was over, at least for defense counsel. It appears that the parties came to agreement with respect to most of the privileged documents (probably the non-important ones) but disagreed with respect to six of them. Quickly concluding that at least some of the six were privileged, the court thus was required to review the doctrine of inadvertent waiver.

The test for inadvertent waiver is pretty simple, made more so by the recent enactment of Federal Rule of Evidence 502:

  1. Were the documents privileged?
  2. Was the production inadvertent?
  3. Should privilege be waived nonetheless?

The privilege discussion didn’t interest me much. I did all that in law school. So, my attention was focused on the second and third prongs of the test.

Was the Disclosure Inadvertent?

As the court noted, this issue could be wrapped up with the third element, which goes to the heart of forgiveness. Rather than do that, the court espoused a simple analysis for this element.

It simply presumed based on the evidence before it that counsel didn’t really mean to include all of those privileged documents in the production database. And who would?

Certainly counsel wasn’t using those documents in an affirmative way, which was the original reason courts held that the full privilege would be waived. The old, “I did it because counsel told me to do it,” was a key way to waive privilege for that entire subject matter. That didn’t happen here.

Nor did the Village sit silent as opposing counsel was cramming those two juicy privileged documents down their witness’s throat. They got up and objected, allowing the deposition to go forward only under protest.

Surprisingly (and this seemed important to the court), it then took more than four months before Village counsel came up with a proper privilege log to join the dispute. While the court did not treat this as a dispositive fact, you can just tell it didn’t like counsel’s lackadaisical approach. Neither would I; I wonder what happened there.

Anyway, the court let them off the hook and ruled that the production of the privileged documents was inadvertent. On to the next step in the test. (But don’t you take four months to produce your privilege log.)

Reasonable Steps to Prevent Disclosure?

Once the court concluded the production was inadvertent, the Village had two more hurdles to cross:

  1. Did the holder of the privilege take reasonable steps to prevent production of privileged documents?
  2. Did the holder of the privilege promptly attempt to rectify the error after it became known?

This case turned mainly on the first prong of that test. The court complained that the Village provided “precious little” about the steps taken to find and isolate privileged documents. Counsel for the Village wrote an email stating that he spent “countless hours” reviewing a “relatively large amount” of documents to find those with privileged content. Why, the court asked, was there no affidavit to support this allegation? Why indeed.

The court had no sympathy for the next argument—counsel thought that marking the document “privileged” would keep it out of the Kroll database. As the court astutely pointed out:

It would have been a simple matter for the Village to check the production database created by Kroll—before it went live online and became available to [plaintiffs]—to verify that privileged documents were not disclosed.

Duh! I am not privy to the Kroll software but I bet there was a simple way to search to see if anything was privileged—either by privilege tag, if that was included in the production database, or by the reference number given to the privileged documents. What happened here?

In a somewhat gratuitous fashion, the court piled on by noting that not a single privileged document was withheld and that no privilege log was produced.

I confess that I am puzzled myself as to how this happened. Counsel went to the trouble of marking 159 documents privileged. Why in the seven months that followed did counsel not ask for a printout from the Kroll system sufficient to produce a simple privilege log? Alas, we case readers don’t get to ask these questions.

As the court concluded—ironically citing yet another case as precedent for the point:

It is axiomatic that a screening procedure that fails to detect confidential documents that are actually listed as privileged is patently inadequate.

Sorry Charlie. You lose on that one.

Failing to Notice and Rectify

The court didn’t stop there. It went on to fail the Village on the second element of the test. It held that the Village failed to rectify the error in a timely way. Again, ignorance in using the Kroll database seemed to be at the heart of the finding.

For starters, the court forgave the Village for taking another four months after the deposition to issue the privilege log. Methinks the court didn’t like that fact at all but just didn’t want to say so.

Instead, the court jumped on counsel for failing to find its own error for over nine months, the period from March to December 2009 when the production database was available to the plaintiffs. According to the court, defense counsel should have logged in and run a privilege search during this period. The 150+ hits that came back would have been a dead giveaway.

As the court said:

Yet for some nine months, the Village apparently had no inkling that the production database contained documents that the Village wished to withhold as privileged, or that [plaintiffs] were reviewing and obtaining those documents. If that is true (and we accept that it is), that means the Village was not paying any attention whatsoever to what documents its opponent in the litigation was selecting from the database. Perhaps [plaintiff] simply selected all of them; the parties’ briefs do not tell us if this is so. But, even if that were the case, a single visit to the production database could have alerted the Village to the problem.

This seems like piling on to me. Counsel clearly didn’t know much about databases or pay much attention to the process. The process was sloppy or non-existent. The client paid the price.

The court did go on to make one last point that brings us full circle. It noted that the problem might have come to light earlier had Village counsel provided a privilege log, which was its duty in the first place. Doing that might have forced plaintiffs’ counsel to acknowledge what it knew: that the database had a bunch of privileged documents. But with no privilege log and a seeming statement that there were no documents being withheld on privilege grounds, all bets were off. Plaintiffs’ counsel could sit quietly until the deposition and then drop the bombshell on a hapless witness.

What Can We Learn From This?

This is a simple-enough case with a simple-enough message: Don’t produce documents in an online database without doing some basic checking first. Assuming, as the case does, that there was a field containing a privilege designation, it would take counsel milliseconds to realize that something bad was about to happen. If lead counsel wasn’t comfortable running the search, how about that tech-savvy associate or legal assistant? If not them, how about your friendly vendor? If asked, the Kroll people could have spotted the mistake. Some might say they should have but not me.

We built an automated production system that can be run by our clients with no Catalyst intervention. Rather than allow these kinds of mistakes to happen, we added a QC rule-set that will not allow a document marked “privileged” or “potentially privileged” to be produced without a specific override. Even if foldered for production, these documents are pulled out into a special folder that must be addressed by the client before a production can go through.
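A rule of that kind is simple to express. The sketch below is not Catalyst's implementation. It assumes each document carries a set of tags, and the tag names, document IDs and `overrides` parameter are hypothetical. It splits a production folder into documents that can go out and documents held for privilege review:

```python
from dataclasses import dataclass, field

# Hypothetical tag names; a real platform's coding fields will differ.
BLOCKING_TAGS = {"privileged", "potentially privileged"}


@dataclass
class Document:
    doc_id: str
    tags: set[str] = field(default_factory=set)


def privilege_gate(folder: list[Document],
                   overrides: frozenset[str] = frozenset()):
    """Split a production folder into (releasable, held) lists.

    Any document carrying a blocking tag (case-insensitive) is held unless
    its ID appears in `overrides`, meaning someone deliberately cleared it.
    """
    releasable, held = [], []
    for doc in folder:
        tags = {t.lower() for t in doc.tags}
        if tags & BLOCKING_TAGS and doc.doc_id not in overrides:
            held.append(doc)
        else:
            releasable.append(doc)
    return releasable, held


folder = [
    Document("DOC-0001", {"responsive"}),
    Document("DOC-0002", {"responsive", "Privileged"}),
    Document("DOC-0003", {"non-responsive", "potentially privileged"}),
]
ok, held = privilege_gate(folder)
print([d.doc_id for d in ok], [d.doc_id for d in held])
# ['DOC-0001'] ['DOC-0002', 'DOC-0003']
```

Run before a production database goes live, even a check this simple would likely have flagged the 159 documents at issue in Thorncreek, assuming their privilege tags traveled with them into the production database.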

These rules are just another step in trying to make the process easier and more foolproof for lawyers who are not comfortable with technology. They don’t guarantee that a privileged document will never be produced (we have seen cases where documents are marked privileged after they are produced), but they can cut down on mistakes. With the stakes (and the volumes) this high, you have to do everything you can to avoid an inadvertent waiver.

This is not a case where counsel took dozens of steps to avoid privilege yet something slipped through, as I have written about before. (See, Bad Facts Make Bad Law: ‘Mt. Hawley’ A Step Backward for Rule 502(b).) Rather, it is about a simple mistake that anyone could have caught with just a smidgen of effort. I don’t feel as bad for Village counsel as I do for some of the other victims. This case is no “derelict on the waters of the law,” it is a fair ruling on somewhat extreme circumstances. Counsel were sleeping on the job (or so it seems to me) and paid the price.

So, the lesson here? Don’t produce those documents without checking to see if privileged files might have snuck through. Run some searches, sample some documents and for God’s sake check the privilege field. You will sleep better, and pay lower malpractice premiums, if you do.

 

Guest Post: Deploy Review Attorneys as a SWAT Team, Using their Separate Skills to Reach a Common Goal

[The following guest post is by Jeffrey Parkhurst, project consultant at Studeo Legal, a Catalyst partner.]

A March 2011 article in The New York Times, “Armies of Expensive Lawyers, Replaced by Cheaper Software,” touched off a firestorm of discussion and debate within the e-discovery industry. It led some to suggest that algorithms, artificial intelligence and predictive coding could put lawyers out of business. I don’t believe that. In my opinion, the most successful litigators will be those who know how to combine cutting-edge technology with proven processes and “smart” review strategy.

There is no doubt that advances in technology are changing the role of lawyers in e-discovery. However, no software, no matter how sophisticated, will ever replace the need for intelligent, experienced legal talent to oversee the process and examine the results. Some people seem to have lost sight of the importance of those with “boots on the ground”—the actual document review staff.  Document reviewers remain a key (if often underpaid and unappreciated) asset to successful litigation.

Even though advances in e-discovery technology will never fully replace review teams, they are already leading us to rethink the traditional review paradigm. At Studeo Legal, we firmly believe that the most effective way to organize a document review team is to consider the members to be “review specialists,” with separate areas of expertise. Think of them as similar to a police SWAT team, made up of people with specialized skills all focused on a common mission. In the same way, members of a legal SWAT team are highly trained, intelligent lawyers, each with different skill sets that can help make the mission a success.

Making a SWAT team the core of your document review offers many advantages over a standard review approach.

What Can a SWAT Team Do For Your Review?

There are essentially two types of review attorney that you can involve in a project. The first type is the attorney who is often recruited at the last minute, with minimal review experience and no stake in the outcome of the case. This attorney must be familiar with different document review software, have some solid background and experience in the subject matter and understand the definitions of relevance and privilege.

While attorneys of this first type can usually be counted on to “get through” the document population and make determinations about relevance and privilege, there is usually a disconnect between them and the litigation team. Often, the knowledge that they gain over time is not transferred to the litigation team due to their “temporary” status on the case. This can be especially detrimental when the issues change or the matter expands beyond the original case boundaries.

The second type of review attorney is one who can provide all the same services as the first-level reviewers but who brings additional skills useful to the project manager and trial counsel. This attorney is easily able to get through the basic review work error free while also having the ability to provide the litigators with invaluable insight and interpretation about a document set.

While skill sets will differ from attorney to attorney, among them you will often find skills that give them the ability to:

  • Identify new, related areas of concern that were not considered when the document review was started.
  • Perform iterative searches on the original data set to identify further key, related documents.
  • Assist in the preparation of a solid document review plan.
  • Provide valuable insight into the relevant document population.
  • Interpret additional concepts and areas of litigation.
  • Assist with the creation of deposition kits.
  • Prepare motions.
  • Prepare discovery responses.
  • Employ quality control procedures throughout the process.

By identifying these skills and making use of them, you can turn your review team into a SWAT team.

What Benefits Does a SWAT Approach Offer to Clients?

A SWAT team approach to the review allows the client to receive far more than a linear review of documents during the first pass with technology. This approach results in:

  • More accurate document interpretation.
  • More accurate determination of whether documents are hot or responsive.
  • Greater insight into how documents fit into your litigation strategy.
  • Identification of new areas of concern, based on an intelligent review of the core documents with related legal issues and correct privilege identification.
  • The ability to perform defensible sampling techniques on the original document population, as new information is identified during the document review. This can identify additional related documents not found by the original search protocols.
  • New document threads, language nuances and matter concepts, uncovered through intelligent review, that technology is still not able to capture.

If the need arises to expand the size of the review team due to deadlines or increased document population, the SWAT team acts as the core group of reviewers, quickly training new attorneys, supervising their work and bringing them up to speed on the case to date.

The Role of Technology

Use of the most advanced technology available is key to handling litigation in today’s market.  In fact, it is the only way to manage the terabytes of information in a case. Technology and tools exist to help distill your data and parse it into any number of different folders and categories. Search technology can include Bayesian concept searching, predictive coding, linguistic-based search, artificial intelligence or all of the above.

Choosing the correct advanced technology can help you significantly decrease the volume of data that requires direct attorney review. It will create manageable sets of potentially relevant documents. Using a sophisticated tool such as Catalyst CR, along with Catalyst’s knowledgeable consulting and project management staff, will assure your clients that the best technology is being used to parse the data and create a manageable document population.

SWAT Teams at Studeo Legal

At Studeo Legal, we employ this SWAT team approach to document review and have found it to be highly effective. We help clients assemble the people, processes and technology best suited to the project. We have used SWAT teams for:

  • Privilege review, based on a group of defined documents.
  • Relevancy/responsiveness review, identifying documents related to the discovery request.
  • Subject matter review, identifying documents related to a particular issue.
  • Quality control, tracking document tagging for accuracy against client criteria.
  • Project reporting, for quality, status, progress and exception reporting.

Of course, we also provide other e-discovery support, as needed for high-volume or complex document projects.

Conclusion

Jeffrey Parkhurst

Despite the inflammatory headline in The New York Times, technology is not about to put lawyers out of business. Technology can make it possible to reduce your document population from millions of pages to a few hundred thousand. But thorough discovery still requires human review and interpretation. Even technology benefits from human interaction to perform iterative searches, revisit the database and uncover new evidence trails.

It has been our clients’ experience that document review often changes the nature or strategy of the case. While time and money are important, discovery is not a contest to see how fast or how few documents can be reviewed and produced.  It is about locating those documents that can make or break your case. Implementing a SWAT team approach for document review is the most effective way to supplement your use of top technology.

The above post is adapted from an in-depth Studeo Legal white paper on document review strategy that offers a unique perspective on this increasingly controversial area of litigation support. To download the full white paper, please click here.  Please direct any follow up questions to Studeo Legal at (817) 529-6900.