Catalyst Repository Systems - Powering Complex Legal Matters

E-Discovery Search Blog

Technology, Techniques and Best Practices

In Praise of Proportionality: Judge OKs Predictive Coding After Keyword Search

Predictive coding purists might argue that the process is tainted if it is preceded by the use of keyword searching to reduce the document set. As a matter of fact, that was exactly what the plaintiffs argued in the multi-district litigation against Biomet over its M2a Magnum hip implant. But in a ruling last week, U.S. District Judge Robert L. Miller Jr. said that proportionality trumped purity, and that even if predictive coding might unearth additional relevant documents, the cost would far outweigh the likely benefits.

The ruling came in a highly contentious matter in which Biomet has already produced 2.5 million documents out of a universe of 19.5 million. The plaintiffs say Biomet has not produced enough and that the production should total closer to 10 million documents.

Biomet began the process of producing documents in the summer of 2012, before the multiple lawsuits over its hip implant were centralized under the Judicial Panel on Multidistrict Litigation. It started by using keyword culling, reducing the number of documents from 19.5 million to 3.9 million, or 1.5 terabytes of data. Deduplication further reduced the number of documents to 2.5 million.
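The culling figures above show how much each stage removed. The following minimal Python sketch computes those ratios from the counts given in the ruling. The stage labels and the `funnel_report` helper are our own illustration.

```python
# Document counts reported in the ruling for Biomet's culling stages.
stages = [
    ("Collected universe", 19_500_000),
    ("After keyword culling", 3_900_000),
    ("After deduplication", 2_500_000),
]

def funnel_report(stages):
    """Return (stage, count, share of previous stage, share of universe) rows."""
    universe = stages[0][1]
    previous = universe
    rows = []
    for name, count in stages:
        rows.append((name, count, count / previous, count / universe))
        previous = count
    return rows

for name, count, step_share, total_share in funnel_report(stages):
    print(f"{name:<24}{count:>12,}{step_share:>8.1%}{total_share:>8.1%}")
```

Keyword culling kept 20% of the universe. Deduplication kept about 64% of that remainder, so roughly 12.8% of the original 19.5 million documents went on to predictive coding.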

Biomet then employed predictive coding to identify relevant and privileged documents from among the 2.5 million that remained after keyword searching and deduplication. It also invited the plaintiffs to suggest additional search terms and offered to produce the rest of the non-privileged documents so that plaintiffs could verify that it was producing the relevant documents.

This process has cost Biomet $1.07 million so far and will end up costing between $2 million and $3.25 million, according to Judge Miller’s ruling.

The plaintiffs objected that Biomet’s initial use of keyword searching tainted the predictive coding process. They argued that keyword searching is less accurate than predictive coding, citing a recent article that said that keyword searches generate only a 20 percent responsive rate, compared to at least a 75 percent responsive rate for predictive coding. Because Biomet started with the less-accurate keyword searches, they contended, the entire process was flawed. The only remedy, they asserted, was for Biomet to go back to where it started and employ predictive coding against the original set of 19.5 million documents, with plaintiffs and defendants sitting together and jointly going through the process.

Balancing Cost against Benefits

Against this backdrop, Judge Miller framed the issue this way:

The issue before me today isn’t whether predictive coding is a better way of doing things than keyword searching prior to predictive coding. I must decide whether Biomet’s procedure satisfies its discovery obligations and, if so, whether it must also do what the Steering Committee seeks.

Judge Miller easily decides that Biomet’s process complied with the Federal Rules of Civil Procedure and with the Seventh Circuit Principles Relating to the Discovery of Electronically Stored Information, as well as with principles set forth by The Sedona Conference.

By contrast, he says that plaintiffs’ request that Biomet go back to Square One “sits uneasily with the proportionality standard in Rule 26(b)(2)(C).” The cost of this, he notes, would run in the “low seven figures,” but Biomet’s testing suggests it would find only a “comparatively modest number of documents.”

It might well be that predictive coding, instead of a keyword search, at Stage Two of the process would unearth additional relevant documents. But it would cost Biomet a million, or millions, of dollars to test the Steering Committee’s theory that predictive coding would produce a significantly greater number of relevant documents. Even in light of the needs of the hundreds of plaintiffs in this case, the very large amount in controversy, the parties’ resources, the importance of the issues at stake, and the importance of this discovery in resolving the issues, I can’t find that the likely benefits of the discovery proposed by the Steering Committee equals or outweighs its additional burden on, and additional expense to, Biomet.

Judge Miller concludes by stating that he assumes Biomet will remain open to meeting and conferring on additional, reasonable search terms and to producing the non-privileged documents included in the statistical sample. “Beyond that,” he writes, “if the Steering Committee wishes production of documents that can be identified only through re-commenced processing, predictive coding, review, and production, the Steering Committee will have to bear the expense.”



Court Awards $2.8M to Cover Cost of Technology Assisted Review

Among e-discovery practitioners, it was a major milestone last year when U.S. Magistrate Judge Andrew J. Peck issued Da Silva Moore v. Publicis Groupe, the first judicial opinion expressly approving the use of technology-assisted review. In the context of real-world litigation, however, using TAR may be only half the battle — there is also the issue of having to pay for it.


Thus, another judicial milestone may have been reached recently when a federal judge in San Diego awarded $2.8 million for costs associated with the use of TAR in a complex patent lawsuit that involved “voluminous” quantities of electronically stored information.

The fee award in Gabriel Technologies Corp. v. Qualcomm Inc. was part of a much larger $12.4 million attorneys’ fees award in favor of Qualcomm. U.S. District Judge Anthony Battaglia entered the fee award after determining that the plaintiffs’ claims “were objectively baseless and brought in subjective bad faith.”

As part of their fee request, Qualcomm and its lead counsel, Cooley LLP, sought to recover $391,928.91 for use of an outside document-review vendor and $2,829,349.10 for fees associated with a second outside vendor’s use of TAR.

Turning first to the fees for the review vendor, Judge Battaglia found that Cooley’s use of the vendor, Black Letter, was reasonable and that the fees charged by the vendor were reasonable.

The Black Letter attorneys billed a total of 6,949.5 hours of document review at rates of $55 to $67 per hour. As Defendants note in their motion, Plaintiffs’ claims involved 92 patents resulting in voluminous document production. For this reason, Cooley reasonably decided to have Black Letter perform document review in this matter. Had Cooley performed the document review themselves, the resulting attorneys’ fees would have undoubtedly been exponentially higher than those charged on behalf of Black Letter. In light of the circumstances and the amount of discovery required, the Court concludes that the rates charged and hours spent by Black Letter are reasonable and, thus, finds the resulting lodestar amount of $391,928.91 to be reasonable as well.
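A lodestar is reasonable hours multiplied by a reasonable hourly rate. The order gives the total hours and a rate range, but it does not say how the hours were split across rates. The following sketch therefore only bounds the figure and backs out the implied blended rate. The constant names and the `lodestar` helper are ours.

```python
# Figures quoted in the order: total review hours and the range of hourly rates.
HOURS = 6_949.5
RATE_LOW, RATE_HIGH = 55.00, 67.00
AWARDED_LODESTAR = 391_928.91

def lodestar(hours: float, rate: float) -> float:
    """Lodestar = reasonable hours worked x reasonable hourly rate."""
    return hours * rate

low = lodestar(HOURS, RATE_LOW)      # every hour billed at the lowest rate
high = lodestar(HOURS, RATE_HIGH)    # every hour billed at the highest rate
blended_rate = AWARDED_LODESTAR / HOURS  # implied average rate across reviewers

print(f"Bounds: ${low:,.2f} to ${high:,.2f}")
print(f"Implied blended rate: ${blended_rate:.2f}/hour")
```

The awarded amount falls near the low end of the range, which suggests that most hours were billed at or close to $55.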

Turning next to the use of technology-assisted review supplied by a different vendor, H5, Judge Battaglia begins his analysis by quoting Qualcomm’s explanation for the use of TAR:

Over the course of this litigation, Defendants collected almost 12,000,000 records — mostly in the form of Electronically Stored Information (ESI). . . . Rather than manually reviewing the huge volume of resultant records, Defendants paid H5 to employ its proprietary technology to sort these records into responsive and non-responsive documents.

Here again, Judge Battaglia found both the use and the cost of TAR to be reasonable:

The review performed by H5 and Black Letter accomplished different objectives with the H5 electronic process minimizing the overall work for Black Letter. Again, the Court finds Cooley’s decision to undertake a more efficient and less time-consuming method of document review to be reasonable under the circumstances. In this case, the nature of Plaintiffs’ claims resulted in significant discovery and document production, and Cooley seemingly reduced the overall fees and attorney hours required by performing electronic document review at the outset. Thus, the Court finds the requested amount of $2,829,349.10 to be reasonable.

From what I have been able to find, this is the first published opinion in which a federal judge expressly awarded fees to cover the cost of technology-assisted review. Just as Da Silva Moore opened the door to other cases endorsing the use of TAR in e-discovery, perhaps so will this case open the door to other cases awarding fees for TAR.


Report: Predictive Coding Replaces Sanctions as the Big News in E-Discovery

For the last several years, year-end reports on e-discovery have highlighted sanctions as the lead headline. (For examples from this blog, see Report: Sanction Requests Rise But Awards Hold Steady for 2011 and E-Discovery Sanctions Reach an All-Time High, Survey Finds.) For 2012, however, a different story took the lead spot — the rise of predictive coding.

Such is the conclusion of the 2012 Year-End Electronic Discovery and Information Law Update published by the law firm Gibson Dunn.

In our prior electronic discovery mid-year and year-end reports, the lead story was sanctions, as numerous decisions imposing onerous penalties for real or perceived e-discovery failures caught the attention of the legal community. By contrast, 2012 was the year of predictive coding, and of meaningful rules reform becoming an important step closer.

What made predictive coding the story of 2012, says the report, were the several court decisions that discussed and even endorsed it.

In the absence of judicial approval, many litigants were unwilling to use this technology. That may well change now, following several decisions approving review methodologies involving predictive coding.

Of course, the increasing acceptance of predictive coding and other forms of technology-assisted review was not the only big e-discovery story last year. Among others cited by Gibson Dunn in its report were:

  • Proposed amendments to the Federal Rules of Civil Procedure that would limit the most serious sanctions for failures to preserve to cases where the court finds that the failure was willful or in bad faith, or that it “irreparably deprived a party of any meaningful opportunity to present a claim or defense.”
  • The rise of international e-discovery and the corollary need to deal with foreign data protection and privacy law. “Foreign data protection and privacy laws have become pervasive and foreign data protection authorities more active in their enforcement of such laws,” says the report.
  • The European Commission’s proposal to replace the 27 data protection laws of the EU member states with a single data privacy regulation — a proposal that has good news and bad news for companies, according to the report.

Chart: Sanctions awarded by type, and percentage of cases in which sanctions were granted

Even though the sanctions story is no longer the lead in Gibson Dunn’s year-end report, it has by no means gone away. Rather than focusing on punitive sanctions, however, courts have shifted toward pragmatic remedies.

“Decisions have increasingly noted that remedial monetary sanctions, as well as other measures such as reopening discovery and hiring forensic analysts to search for spoliated data, are generally fairer and better at making the aggrieved party whole than punitive sanctions such as a default judgment,” the report says.

An area that gained increasing attention in 2012 among e-discovery professionals and courts is the discoverability of social networking information, the report finds. Courts increasingly face difficult questions about the extent to which parties are required to preserve social media and about whether changes to social media sites constitute spoliation. “One court this year even required a defendant in a trademark infringement case to recreate a Facebook page as it had previously existed, so that the Facebook page showed plaintiff in a photo that displayed ‘infringing trade dress,’” the report explains.

Additional key issues from 2012 identified by the report are:

  • Parties’ preservation obligations in advance of and at the outset of litigation.
  • The scope and meaning of “cooperation” in e-discovery, consistent with the Cooperation Proclamation of The Sedona Conference.
  • The emergence of proportionality as an increasingly important concept in e-discovery.
  • The continued lack of clarity from the courts about what constitutes reasonable efforts to prevent the disclosure of privileged information.
  • Greater emphasis by courts on the government’s e-discovery obligations and a greater willingness to sanction the government for failure to live up to those obligations.

The full Gibson Dunn year-end report is available on the website and in PDF format.

Article: Predictive Coding Helps Companies Reduce Discovery Costs

Computer Technology Review has published an article by John Tredennick, Catalyst’s founder and CEO, “Predictive Coding Helps Companies Reduce Discovery Costs.” In it, John discusses how recent court decisions have opened the door to wider use of technology-assisted review to cut costs in document discovery.

In particular, he focuses on Magistrate Judge Andrew J. Peck’s 2012 decision, Da Silva Moore v. Publicis Groupe, writing:

Without doubt, Judge Peck ushered in the next generation of e-discovery search and review. In the months since his decision, the business world has shown a heightened interest in predictive coding. Other judges have picked up the mantle and considered its use in their cases. Industry vendors have scrambled to add predictive coding to their product rosters.

As important as was the ruling’s outcome, so too was its reasoning. For business leaders, IT professionals and in-house lawyers, the opinion provides a primer on the reasons for using predictive coding and the processes underlying it.

From there, John goes on to explain how predictive coding works and how it is used in e-discovery. In the end, he concludes:

For business leaders, the significance of Judge Peck’s ruling is its seal of approval for predictive coding technology. That should lead to its wider acceptance and use. With data volumes increasing dramatically, predictive coding is the best weapon businesses have to cut the cost and time of review. In their battle to conquer big data, that is a significant victory.

Read the full article at Computer Technology Review.

Catalyst Participates in EDI-Oracle Computer Assisted Review Study

Anyone who attended the recent LegalTech show in New York can attest to the fact that there is no shortage of e-discovery vendors offering some form of technology assisted review. In fact, Catalyst unveiled its own predictive-ranking tool, Insight Predict, at LegalTech.

However, e-discovery professionals have few independent metrics by which to evaluate the quality and costs of these various TAR tools against manual review. For that reason, the non-profit Electronic Discovery Institute (EDI) has launched the EDI-Oracle Computer Assisted Review Study, which will evaluate multiple human and technology-assisted analysis systems for responding to discovery in litigation.

Catalyst is one of dozens of companies that EDI has invited to participate in the study. The goal of the study, as described in EDI’s announcement, is to obtain an unbiased assessment of technology to help the legal community navigate the computer assisted review process.

The study will provide independently reviewed metrics on the quality and costs of various technology assisted review solutions compared to manual review. Its results could speed the acceptance of technology assisted review methodologies by courts and by corporations.

Members of Oracle’s legal team are participating in the study, in collaboration with chief scientists Peter Glynn and Gerd Infanger from Stanford University. Patrick Oot, co-founder of EDI, is the project’s editor-in-chief.

“This sort of research is what started the non-profit in its quest for technology education in the legal sector,” Oot said in the announcement. “It has taken almost two years for the Oracle-EDI team to find the right scientists, participants, and protocol to make this incredibly complicated undertaking a success.”

As the study moves forward, you will be able to follow its status via Twitter using the hashtag #EDIOracleStudy.


Survey of GC and CIOs Predicts a Major Role for Predictive Technology

Increasingly in legal matters involving large enterprises, decisions about e-discovery technology are driven not by litigators, but by general counsel and chief information officers. Now, a survey published this week offers insights into how GCs and CIOs view predictive technology.

The survey’s key finding might come as a surprise to some: The vast majority of both GCs (85%) and CIOs (90%) believe that predictive technology will play a major role in increasing litigation-related worker productivity and reducing the overall cost of e-discovery.

The survey of GCs and CIOs at the Global 250 was conducted in December and January by the eDiscovery Solutions Group (eDSG), which published preliminary results earlier this week. More than half of those polled responded. The full results will be published in the spring.

Other highlights of the survey:

  • 80% of GCs believe predictive technology could be valuable throughout the e-discovery lifecycle.
  • 75% of GCs say that predictive technology vendors do not do a good job of differentiating their products.
  • 66% of GCs think that predictive technology would be key to enabling e-discovery to be brought in-house.

Only 15% of general counsel believe that predictive technology is not ready for the main legal-processing market, the survey found.

You can read more of the preliminary results at eDiscoveryTimes. eDSG describes itself as an international thought leader and global provider of market intelligence, advisory services and information management consulting specializing in information governance, e-discovery and advanced cloud computing.

Predictive Ranking: Technology Assisted Review Designed for the Real World

Why Predictive Ranking?

Most articles about technology assisted review (TAR) start with dire warnings about the explosion in electronic data. In most legal matters, however, the quantity of data is big, but it is no explosion. Even half a million documents (a relatively small number compared to the “big data” of the web) pose a significant and serious challenge to a review team. That is a lot of documents, and it can cost a lot of money to review them, especially if you have to go through them in a manual, linear fashion. Catalyst’s Predictive Ranking bypasses that linearity, helping you zero in on the documents that matter most. But that is only part of what it does.

In the real world of e-discovery search and review, the challenges lawyers face come not merely from the explosion of data, but also from the constraints imposed by rolling collection, immediate deadlines, and non-standardized (and at times confusing) validation procedures. Overcoming these challenges is as much about process and workflow as it is about the technology that can be specifically crafted to enable that workflow. For these real-world challenges, Catalyst’s Predictive Ranking provides solutions that no other TAR process can offer.

In this article, we will give an overview of Catalyst’s Predictive Ranking and discuss how it differs from other TAR systems in its ability to respond to the dynamics of real-world litigation. But first, we will start with an overview of the TAR process and discuss some concepts that are key to understanding how it works.

What is Predictive Ranking?

Predictive Ranking is Catalyst’s proprietary TAR process. We developed it more than four years ago and have continued to refine and improve it ever since. It is the process used in our newly released product, Insight Predict.

In general, all the various forms of TAR share common denominators: machine learning, sampling, subjective coding of documents, and refinement. But at the end of the day, the basic concept of TAR is simple, in that it must accomplish only two essential tasks:

  1. Finding all (or “proportionally all”) responsive documents.
  2. Verifying that all (or “proportionally all”) responsive documents have been found.

That is it. For short, let us call these two goals “finding” and “validating.”

Finding Responsive Documents

Finding consists of two parts:

  1. Locating and selecting documents to label. By “label,” we mean manually marking them as responsive or nonresponsive.
  2. Propagating (via an algorithmic inference engine) these labels onto unseen documents.
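This post does not describe Catalyst's inference engine, so the following is only a toy stand-in for step 2. It is a nearest-centroid scorer over bag-of-words vectors that ranks unseen documents by their similarity to documents already labeled responsive, minus their similarity to documents labeled nonresponsive. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Tiny bag-of-words representation: lowercase term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity; 0.0 if either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors) -> Counter:
    """Summed term counts; cosine ignores scale, so no need to divide by the count."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def propagate_labels(labeled, unlabeled):
    """Rank unseen docs by (similarity to responsive) - (similarity to nonresponsive).

    labeled: (text, is_responsive) pairs coded by a reviewer (step 1).
    unlabeled: texts the engine has not seen (step 2).
    """
    pos = centroid(vectorize(t) for t, r in labeled if r)
    neg = centroid(vectorize(t) for t, r in labeled if not r)
    scored = [(t, cosine(vectorize(t), pos) - cosine(vectorize(t), neg)) for t in unlabeled]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

labeled = [
    ("hip implant failure report", True),
    ("implant revision surgery complaint", True),
    ("holiday party catering invoice", False),
]
unlabeled = ["catering menu for party", "patient complaint about implant failure"]
for text, score in propagate_labels(labeled, unlabeled):
    print(f"{score:+.2f}  {text}")
```

Production engines use far richer features and learning methods. The point here is only the shape of the step: a few human labels go in, and a score for every unseen document comes out.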

This process of finding or searching for responsive documents is typically evaluated using two quantitative measures: precision and recall. Precision is the proportion of the documents returned by a search that are true hits (actually responsive documents). Recall is the proportion of all the responsive documents in the population that the search returns.
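Both measures can be computed directly from two sets of document IDs. In the following sketch, the IDs and counts are made up for illustration.

```python
def precision_recall(retrieved: set, responsive: set) -> tuple:
    """Precision = true hits / hits returned; recall = true hits / all responsive docs."""
    true_hits = len(retrieved & responsive)
    precision = true_hits / len(retrieved) if retrieved else 0.0
    recall = true_hits / len(responsive) if responsive else 0.0
    return precision, recall

# Hypothetical search: 200 documents returned, 150 of them responsive,
# in a population that actually contains 300 responsive documents.
retrieved = set(range(200))        # doc IDs 0-199 returned by the search
responsive = set(range(50, 350))   # doc IDs 50-349 are truly responsive
p, r = precision_recall(retrieved, responsive)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=75% recall=50%
```

A search can score well on one measure and poorly on the other. Returning everything gives perfect recall and low precision, which is why the two are reported together.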

One area of contention and disagreement among vendors is step 1, the sampling procedures used to train the algorithm in step 2. Vendors’ philosophies generally fall into one of two camps, which loosely can be described as “judgmentalists” and “randomists.”

The judgmentalist approach assumes that litigation counsel (or the review manager) has the most insightful knowledge about the domain and matter and is therefore going to be the most effective at choosing training documents. The randomist approach, on the other hand, is concerned about bias. Expertise can help the system quickly find certain pockets of responsive information, the randomists concede, but the problem they see is that even experts do not know what they do not know. By focusing the attention of the system on some documents and not others, the judgmental approach potentially ignores large swaths of responsive information even while it does exceptionally well at finding others.

The random approach therefore samples every document in the collection with equal probability. This even-handed approach mitigates the problem of human bias and ensures that a wide set of starting points is selected. However, there is still no guarantee that a simple random sample will find those known pockets of responsive information about which the human assessor has more intimate knowledge.

At Catalyst, we recognize merits in both approaches. An ideal process would be one that combines the strengths of each to overcome the weakness of the other. One straightforward solution is to take the “more is more” approach and do both judgmental and random sampling. A combined sample not only has the advantage of human expertise, but also avoids some of the issues of bias.
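The “more is more” combination described above amounts to the union of counsel's picks and a simple random sample of everything else. The following is a minimal sketch. The function name, the collection size, and the chosen document IDs are hypothetical.

```python
import random

def combined_seed_set(collection_ids, judgmental_ids, random_size, seed=None):
    """Union of counsel-chosen documents and a simple random sample of the rest.

    Every document not already picked by counsel has the same chance of being
    drawn, which is what keeps the random portion free of human selection bias.
    """
    rng = random.Random(seed)
    judgmental = set(judgmental_ids)
    remaining = [d for d in collection_ids if d not in judgmental]
    random_part = rng.sample(remaining, min(random_size, len(remaining)))
    return judgmental | set(random_part)

collection = list(range(10_000))
counsel_picks = [17, 42, 4_096]   # hypothetical documents counsel knows are key
seed_set = combined_seed_set(collection, counsel_picks, random_size=500, seed=7)
print(len(seed_set))  # 503
```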

However, while it is important to avoid bias, simple random sampling misses the point. Random sampling is good for estimating counts; it does not do as well at guaranteeing topical coverage (sussing out all pockets). The best way to avoid bias is not to pick “random” documents, but to select documents about which you know that you know very little. Let’s call it “diverse topical coverage.”

Remember the difference between the two goals: finding vs. validating. For validation, a statistically valid random sample is required. But for finding, we can be more intelligent than that. We can use intelligent algorithms to explicitly detect which documents we know the least about, regardless of which other documents we already know something about. This goes beyond simple random sampling, which offers no guarantee of topical coverage of a collection. These algorithms explicitly seek out the documents about which we know nothing or next to nothing. The Catalyst approach, therefore, is not to stand in the way of our clients by shoehorning them into a single sampling regimen for the purpose of finding. Rather, our clients may pick whatever documents they want to judge, for whatever reason, and “contextual diversity sampling” will detect any imbalances and help select the rest.
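Catalyst does not disclose how contextual diversity sampling works. To make the idea of “documents we know the least about” concrete, the following sketch swaps in a well-known, much simpler heuristic: greedy farthest-first selection. It repeatedly picks the unjudged document whose nearest judged document (by Jaccard distance over word sets) is farthest away. The function names and documents are hypothetical.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 1.0 means the two documents share no terms."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def novelty(doc_terms: frozenset, known_terms: list) -> float:
    """Distance from a document to the most similar document already judged."""
    if not known_terms:
        return 1.0
    return min(jaccard_distance(doc_terms, k) for k in known_terms)

def least_known(docs: dict, judged: set, k: int) -> list:
    """Greedy farthest-first pick of k unjudged docs we know the least about.

    docs maps doc_id -> text; judged holds ids already coded, however chosen.
    """
    terms = {d: frozenset(text.lower().split()) for d, text in docs.items()}
    known = set(judged)
    picks = []
    for _ in range(k):
        candidates = [d for d in docs if d not in known]
        if not candidates:
            break
        known_terms = [terms[j] for j in known]
        best = max(candidates, key=lambda d: novelty(terms[d], known_terms))
        picks.append(best)
        known.add(best)  # count the pick as covered before choosing the next one
    return picks

docs = {
    "a": "implant failure complaint",
    "b": "implant failure report",
    "c": "quarterly sales forecast",
    "d": "implant revision surgery",
}
print(least_known(docs, judged={"a"}, k=2))  # ['c', 'd']
```

Because `judged` can hold documents chosen judgmentally, at random, or both, the selection step fills gaps around whatever the reviewer has already coded. That is the property the paragraph describes.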

Examples of Finding

The following examples illustrate the performance of Catalyst’s intelligent algorithms with respect to the various points that were made in the previous section about random, judgmental, and contextual diversity sampling. In each of these examples, the horizontal x-axis represents the percentage of the collection that must be reviewed in order to find (on the y-axis) the given recall level using Catalyst’s Predictive Ranking algorithms.
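Each point on such a curve answers one question: how far down the ranked list must review go to reach a given recall? The following sketch computes that from a ranking whose true labels are fully known, as in an evaluation collection. In a live matter those labels are not known in advance. The function name and the toy ranking are hypothetical.

```python
def review_depth_for_recall(ranked_labels, target_recall):
    """Fraction of a ranked collection that must be reviewed to reach target recall.

    ranked_labels: truth labels (True = responsive) in ranked order, best first.
    """
    total_responsive = sum(ranked_labels)
    if total_responsive == 0:
        raise ValueError("no responsive documents to recall")
    found = 0
    for position, is_responsive in enumerate(ranked_labels, start=1):
        found += is_responsive
        if found / total_responsive >= target_recall:
            return position / len(ranked_labels)
    return 1.0

# Toy ranking of 100 docs: 10 responsive, 8 of them at the very top.
ranked = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 88
for target in (0.8, 1.0):
    depth = review_depth_for_recall(ranked, target)
    print(f"{target:.0%} recall after reviewing {depth:.0%} of the collection")
```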

For example, in this first graph we have a Predictive Ranking task with a significant number of responsive documents, a high richness. There are two lines, each representing a different initial seed condition: random versus judgmental. The first thing to note is that judgmental sampling starts slightly “ahead” of random sampling. The difference is not huge; the judgmental approach finds perhaps 2-3% more documents initially. That is to be expected, because the whole point of judgmental sampling is that the human can use his or her intelligence and insight into the case or domain to find documents that the computer is not capable of finding by strictly random sampling.

That brings us to the concern that judgmental sampling is biased and will not allow TAR algorithms to find all the documents. However, this chart shows that by using Catalyst’s intelligent iterative Predictive Ranking algorithms, both the judgmental and random initial sampling get to the same place. They both get about 80% of the available responsive documents after reviewing only 6% of the collection, 90% after reviewing about 12% of the collection, and so forth. Initial differences and biases are swallowed up by Catalyst’s intelligent Predictive Ranking algorithms.

In the second graph, we have a different matter in which the number of available responsive documents is more than an order of magnitude smaller than in the previous example; the collection is very sparse. In this case, random sampling is not enough. A random sample does not find any responsive documents, so nothing can be learned by any algorithm. However, the judgmental sample does find a number of responsive documents, and even with this sparse matter, 85% of the available responsive documents may be found by examining only a little more than 6% of the collection.
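Random samples come back empty in sparse collections for a simple arithmetic reason. If a fraction p of the documents is responsive, a random sample of n documents contains none of them with probability of about (1 − p)^n. The richness levels and the 500-document sample size below are illustrative assumptions, not figures from the matters graphed in this post.

```python
def prob_sample_misses_all(richness: float, sample_size: int) -> float:
    """Chance that a simple random sample contains zero responsive documents.

    Uses the binomial approximation (sampling with replacement), which is
    close enough when the collection is much larger than the sample.
    """
    return (1.0 - richness) ** sample_size

# Hypothetical richness levels for a 500-document random seed sample.
for richness in (0.10, 0.01, 0.001):
    miss = prob_sample_misses_all(richness, 500)
    print(f"richness {richness:.1%}: P(sample finds nothing) = {miss:.3f}")
```

At 0.1% richness, a 500-document random sample misses every responsive document roughly 60% of the time. A learning algorithm seeded only that way has nothing positive to learn from.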

However, a different story emerges when the user chooses to switch on contextual diversity sampling as part of the algorithmic learning process. In the previous example, contextual diversity was not needed. In this case, especially with the failure of the random sampling approach, it is. The following graph shows the results of both random sampling and judgmental sampling with contextual diversity activated, alongside the original results with no contextual diversity:

Adding contextual diversity to the judgmental seed has the effect of slowing learning in the initial phases. However, only about 3.5% of the way through the collection, it catches up to the judgmental-only approach and then surpasses it. A 95% recall may be achieved a little less than 8% of the way through the collection. The results for adding contextual diversity to the random sampling are even more striking. It also catches up to judgmental sampling about 4% of the way through the collection and also surpasses it by the end, reaching just over 90% recall a little less than 8% of the way through the collection.

These examples serve two primary purposes. First, they demonstrate that Catalyst’s iterative Predictive Ranking algorithms work, and work well. The vast majority of a collection does not need to be reviewed, because the Predictive Ranking algorithm finds 85%, 90%, or even 95% of all available responsive documents within only a few percent of the entire collection.

Second, these examples demonstrate that, no matter how you start, you will attain that good result. It is this second point that bears repeating and further consideration. Real-world e-discovery is messy. Collection is rolling. Deadlines are imminent. Experts are not always available when you need them. It is not always feasible to start a TAR project in the clean, perfect, step-by-step manner that a vendor might require. For a TAR project, it is critically important to know that one can start with either judgmental or random samples, and that the option to add contextual diversity ensures that early shortcomings are not merely mitigated but overcome.

Validating What You Have Found

Validating is an essential step in ensuring legal defensibility. There are multiple ways of doing it. Yes, there needs to be random sampling. Yes, the sample needs to be large enough to support statistically valid estimates. But there are different ways of structuring the random samples. The most common method is to draw a simple random sample of the collection as a whole, and then another simple random sample of the documents that the machine has labeled as nonresponsive. If the richness of responsive documents in the latter sample is significantly lower than the richness estimated for the whole population from the initial sample, then the process is considered to be valid.
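The post does not say which statistical test decides whether richness has “significantly decreased.” One common choice is a one-sided two-proportion z-test, which the following sketch uses as an assumption. The function name, the 5% significance level, and the sample counts are all hypothetical.

```python
import math

def richness_drop_is_significant(pop_hits, pop_n, null_hits, null_n, z_crit=1.645):
    """One-sided two-proportion z-test: is null-set richness below the baseline?

    pop_hits/pop_n: responsive count and size of the initial whole-collection sample.
    null_hits/null_n: the same for the sample of machine-labeled nonresponsive docs.
    z_crit=1.645 corresponds to a one-sided test at the 5% level.
    """
    p_pop, p_null = pop_hits / pop_n, null_hits / null_n
    pooled = (pop_hits + null_hits) / (pop_n + null_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pop_n + 1 / null_n))
    z = (p_pop - p_null) / se if se else 0.0
    return z > z_crit, z

# Hypothetical samples: 10% richness at the start, 1% among documents labeled nonresponsive.
ok, z = richness_drop_is_significant(150, 1_500, 15, 1_500)
print(f"significant drop: {ok} (z = {z:.1f})")
```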

However, at Catalyst we use a different procedure, one that we think is better at validating results. Like other methods, it also relies on random sampling. However, instead of doing a simple random sample of a set of documents, we use a systematic random sample of a ranking of documents. Instead of labeling documents first and sampling for richness second, the Catalyst procedure ranks all documents by their likelihood of being responsive. Only then is a random sample—a systematic random sample—taken.

Samples are drawn at equal intervals across the entire ranked list. This lets Catalyst estimate the concentration of responsive documents at every point in the list better than an approach based on unordered simple random sampling can. With this better estimate, a smarter decision boundary can be drawn between the responsive and nonresponsive documents. In addition, because the documents on either side of that boundary have already been systematically sampled, there is no need for a two-stage sampling procedure.
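A systematic random sample of a ranking uses one random starting point and then takes every k-th document down the list. The following sketch draws such a sample, estimates richness in consecutive segments of the ranking, and applies a cutoff. Catalyst does not publish how it sets its decision boundary, so the threshold rule in `cutoff_segment` is purely hypothetical, as are the toy ranking, segment size, and function names.

```python
import random

def systematic_sample(n_docs: int, interval: int, seed=None) -> list:
    """Ranked positions chosen by a systematic random sample.

    One random start in [0, interval), then every `interval`-th position, so
    each stretch of the ranking is represented in proportion to its length.
    """
    start = random.Random(seed).randrange(interval)
    return list(range(start, n_docs, interval))

def richness_by_segment(positions, labels, segment_size: int) -> dict:
    """Estimated share of responsive docs in each consecutive block of the ranking."""
    counts = {}
    for pos, is_responsive in zip(positions, labels):
        seg = pos // segment_size
        hits, n = counts.get(seg, (0, 0))
        counts[seg] = (hits + int(is_responsive), n + 1)
    return {seg: hits / n for seg, (hits, n) in sorted(counts.items())}

def cutoff_segment(richness: dict, threshold: float):
    """Hypothetical rule: keep everything through the last segment at or above threshold."""
    keep = [seg for seg, r in richness.items() if r >= threshold]
    return max(keep) if keep else None

# Toy ranking of 10,000 docs in which the top 1,200 happen to be responsive.
truth = [pos < 1_200 for pos in range(10_000)]
positions = systematic_sample(len(truth), interval=100, seed=3)
labels = [truth[p] for p in positions]   # stands in for reviewer calls on the sample
richness = richness_by_segment(positions, labels, segment_size=1_000)
print(richness)
print("cut after segment", cutoff_segment(richness, threshold=0.1))
```

The same single sample supplies estimates both above and below the cutoff. This mirrors the point that no second sampling stage is needed.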

Workflow: Putting Finding and Validating Together

In the previous section, we introduced the two primary tasks involved in TAR: finding and validation. If machines (and humans, for that matter) were perfect, there would be no need for these two stages. There would only be a need for a single stage. For example, if a machine algorithm were known to perfectly find every responsive document in the collection, there would be no need to validate the algorithm’s output. And if a validation process could perfectly detect when all documents are correctly labeled, there would be no need to use an algorithm to find all the responsive ones; all possible configurations (combinatorial issues aside) could be tested until the correct one is found.

But no perfect solution exists for either task, nor will one in the future. Thus, the reason for having a two-stage TAR process is that each stage provides checks and balances on the other. Validation ensures that finding is working, and finding ensures that validation will succeed.

Therefore, TAR requires some combination of both tasks. The manner in which both finding and validation are symbiotically combined is known as the e-discovery “workflow.” Workflow is a non-standard process that varies from vendor to vendor. For the most part, every vendor’s technology combines these tasks in a way that, ultimately, is defensible. However, defensibility is the minimum bar that must be cleared.

Some combinations might work more efficiently than others. Some combinations might work more effectively than others. And some workflows allow for more flexibility to meet the challenges of real-world e-discovery, such as rolling collection.

We’ll discuss a standard model, typical of the industry, then review Catalyst’s approach, and finally conclude with the reason Catalyst’s approach is better. Hint: It’s not (only) about effectiveness, although we will show that it is effective as well. Rather, it is about flexibility, which is crucial in the work environments in which lawyers and review teams use this technology.

Standard TAR Workflow

Most TAR technologies follow the same essential workflow. As we will explain, this standard workflow suffers from two weaknesses when applied in the context of real-world litigation. Here are the steps it entails:

  1. Estimate via simple random sampling how many responsive and nonresponsive docs there are in the collection (aka estimate whole population richness).
  2. Sample (and manually, subjectively code) documents.
  3. Feed those documents to a predictive coding engine to label the remainder of the collection.
  4. If manual intervention is needed to assist in the labeling (for example via threshold or rank-cutoff setting), do so at this point.
  5. Estimate via simple random sampling how many responsive documents there are in the set of documents that have been labeled in steps 3 and 4 as nonresponsive.
  6. Compare the estimate in step 5 with the estimate in step 1. If there has been a significant decrease in responsive richness, then the process as a whole is valid.
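Steps 1, 5 and 6 can be pictured with the following hedged Python sketch. It compares the two simple-random-sample richness estimates using a one-sided two-proportion z-test. The source does not say which statistical test the standard workflow uses. The choice of test, the 1.645 critical value (roughly a one-sided 5% level), the function name and the sample counts in the usage line are all assumptions made for illustration.

```python
import math


def significant_decrease(base_hits, base_n, discard_hits, discard_n, z_crit=1.645):
    """One-sided two-proportion z-test (an assumed choice of test).

    base_*: responsive count and size of the step 1 whole-population sample.
    discard_*: responsive count and size of the step 5 sample drawn from the
    documents labeled nonresponsive. Returns True if richness in the discard
    set is significantly lower than the baseline.
    """
    p_base = base_hits / base_n
    p_discard = discard_hits / discard_n
    pooled = (base_hits + discard_hits) / (base_n + discard_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / discard_n))
    if se == 0:
        return False  # both samples entirely responsive or entirely nonresponsive
    return (p_base - p_discard) / se > z_crit


# Hypothetical counts: 88 of 400 responsive at step 1, 28 of 400 at step 5.
print(significant_decrease(88, 400, 28, 400))  # True
```

The step 1 baseline describes the collection only as it stood when that sample was drawn. That dependency is the source of the first weakness discussed below.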

TAR as a whole relies on these six steps working as a harmonious process. However, the steps do not all serve the same purpose. Steps 2-4 are for finding and labeling. Steps 1, 5, and 6 are for validation.

The first potential weakness in this standard workflow stems from the fact that the validation step is split into two parts, one at the very beginning and one at the very end. It is the relative comparison between the beginning and the end that gives this simple random-sampling-based workflow its validity. However, that also means that in order to establish validity, no new documents may arrive at any point after the workflow has started. Collection must be finished.

In real-world settings, collection is rarely complete at the outset. If new documents arrive after the whole-population richness estimate (step 1) is already done, then that estimate will no longer be statistically valid. And if that initial estimate is no longer valid, then the final estimates (step 5), which compare themselves to that initial estimate, will also not be valid. Thus, the process falls apart.

The second potential weakness in the standard workflow is that the manual intervention for threshold setting (step 4) occurs before the second (and final) random sampling (step 5). This is crucial to the manner in which the standard workflow operates. In order to compare before and after richness estimates (step 1 vs. step 5), concrete decisions will have had to be made about labels and decision boundaries. But in real-world settings, it may be premature to make concrete decisions at this point in the overall review.

How Catalyst’s Workflow Differs

In order to circumvent these weaknesses and match our process more closely to real-world litigation, Catalyst’s Predictive Ranking uses a proprietary, four-step workflow:

  1. Sample (and manually, subjectively code) documents.
  2. Feed those documents to our Predictive Ranking engine to rank the remainder of the collection.
  3. Estimate via a systematic random sample the relative concentration of responsive documents throughout the ranking created in step 2.
  4. Based on the concentration estimate from step 3, select a threshold or rank-cutoff setting which gives the desired recall and/or precision.
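Steps 3 and 4 might look something like the following minimal Python sketch. It is our own illustration, not Catalyst's proprietary method. It takes reviewer-coded documents from a systematic sample of the ranking, estimates recall and precision at each sampled rank position, and returns the first cutoff that meets a target recall. The function name and parameters are hypothetical. The sketch uses point estimates only, and a production process would also need to account for sampling error.

```python
def choose_cutoff(sample_labels, target_recall):
    """Pick a rank cutoff from systematically sampled, reviewer-coded documents.

    sample_labels: (rank_position, is_responsive) pairs. In a systematic
    sample each sampled document stands in for the same number of ranked
    documents, so that number cancels out of the recall ratio.
    Returns (cutoff, est_recall, est_precision); documents at positions
    below `cutoff` would go to review.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    labels = sorted(sample_labels)
    total_hits = sum(1 for _, responsive in labels if responsive)
    if total_hits == 0:
        raise ValueError("the sample contains no responsive documents")
    hits = 0
    for seen, (pos, responsive) in enumerate(labels, start=1):
        hits += int(responsive)
        est_recall = hits / total_hits
        if est_recall >= target_recall:
            # Cut just after the sampled document where the target is first met;
            # a more conservative choice would extend to the next sample point.
            return pos + 1, est_recall, hits / seen


# Hypothetical coded sample, one document every 50 ranks.
coded = [(0, True), (50, True), (100, False), (150, True), (200, False), (250, False)]
print(choose_cutoff(coded, target_recall=0.66))  # cutoff 51, recall 2/3, precision 1.0
```

Because the cutoff is chosen from concentration estimates that were themselves drawn from a valid sample, the threshold decision can come last in the workflow.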

As with the standard predictive coding workflow, our Predictive Ranking as a whole relies on these four steps working as a harmonious process, and again the steps do not all serve the same purpose. Steps 1 and 2 are for finding and labeling. Steps 3 and 4 are for validation.

Two important points should be noted about Catalyst’s workflow. The first is that the validation step is not split into two parts. Validation only happens at the very end of the entire workflow. If more documents arrive while documents are being found and labeled during steps 1 and 2 (i.e. if collection is rolling), the addition of new documents does not interfere with anything critical to the validation of the process. (Additional documents might make finding more difficult; finding is a separate issue from validating, one which Catalyst’s contextual diversity sampling algorithms are designed to address.)

The fact that validation in our workflow is not hampered by collections that are fluid and dynamic is significant. In real-world e-discovery situations, rolling collection is the norm. Our ability to handle this fluidity natively—by which we mean central to the way the workflow normally works, rather than as a tacked-on exception—is highly valuable to lawyers and review teams.

The second important point to note about Catalyst’s workflow is that the manual intervention for threshold setting (step 4) happens after the systematic random sample. At first it may seem counterintuitive that this is defensible, because choices about the labeling of documents are being made after a random sample has been taken. But the purpose of the systematic random sample is to estimate concentrations in a statistically valid manner. Since the concentration estimates themselves are valid, decisions made based on those concentrations are also valid.

Consequences and Benefits of the Catalyst Workflow

We have already touched on two key ways in which the Catalyst Predictive Ranking workflow differs from the industry-standard workflow. It is important to understand what our workflow allows us (and you) to do:

  1. Get good results. Catalyst Predictive Ranking consistently demonstrates high scores for both precision and recall.
  2. Add more training samples, of any kind, at any time. That gives you the flexibility to use judgmental samples without biasing validation.
  3. Add more documents, of any kind, at any time. You don’t have to wait 158 days until all documents are collected. And you don’t have to repeat step 1 of the standard workflow when those additional documents arrive.
  4. Go through multiple stages of culling and filtering without hampering validation. In the standard workflow, that would destroy your baseline. This is not a concern with the Catalyst approach, which saves the validation to the very end, via the systematic sample.

Catalyst has more than four years of experience using Predictive Ranking techniques to target review and reduce document populations. Our algorithms are highly refined and highly effective. Even more important, however, is that our Predictive Ranking workflow has what other vendors’ workflows do not—the flexibility to accommodate real-world e-discovery. Out there in the trenches of litigation, e-discovery is a dynamic process. Whereas other vendors’ TAR workflows require a static collection, ours flows with the dynamics of your case.

Law Technology News Features Catalyst’s New ‘Insight Predict’

Those attending LegalTech New York this week can expect to see some new approaches to predictive coding, writes Law Technology News reporter Evan Koblentz in his article, LegalTech to Debut Novel ‘Predictive Coding’ Strategies. Featured prominently in Koblentz’s article is Insight Predict, a new tool for technology-assisted review that Catalyst is introducing at LegalTech.

In his report, Koblentz writes:

[A]t the show, Catalyst will introduce its own predictive coding system, calling Insight Predict. “It’s probably as different as you can get,” and is based on query technology developed in a joint venture between Japan’s Fuji and Norwalk, Conn.-based Xerox Corp., CEO John Tredennick said.

“The system’s built in a big cloud, where we’re ranking all your millions of documents, every time,” said Tredennick, who revealed the project in August. “Some other systems use 10,000 documents and call that representative, which we question,” he continued. “The whole key is we can do this against any volume of documents quickly, and you just can’t with the appliances.”

Read the full article at Law.com.

My Key Word Searches are Better than Your Predictive Ranking Technology!

I recently got a distress call from an e-discovery partner of ours with an unhappy client. “It seems like there is something wrong with your predictive ranking technology,” our partner said on the Google Hangout. “It’s proposing that the client team review too many documents–more than we got with key word searching. Our client is upset. We need to do something to explain this fast.”

In this case the client team had not used technology assisted review (TAR) before; this was their first try at the process. They wanted proof that it was worth the extra cost for the technology. Specifically, they wanted to see whether it actually cut down on review costs, like everyone claimed.

The problem was that the system didn’t seem to work–at least in their eyes. They had started the review process by running a series of key word searches, which was their normal practice. The searches hit on a total of about 11,000 documents, out of a test set of 50,000 documents. This suggested they had only 11,000 documents to review for the production, about 20% of the total collected.

Our partner had recommended that the client try our Predictive Ranking technology as a better means to find responsive documents. Everyone’s initial expectation was that doing so would reduce the review population even further than the 11,000 documents that the key word searches hit. Somehow they got the impression that the review population might go down to more like 7,000. That would certainly justify the extra expense of this new technology.

Unfortunately, the opposite turned out to be the case. Instead of recommending that the team review 7,000 documents, or even 11,000 documents, our system suggested reviewing more than 18,500 documents.

You can imagine the client’s consternation. “You want us to pay you for these results? You just increased rather than reduced my review costs. I like it better the old way.”

It was time to go to work to figure out what happened. Fortunately, our team had analyzed the key word search results and compared them to the documents identified through Predictive Ranking. My job was to explain the difference between the two approaches. My hope was to show that the system worked well and provided a better outcome than key word search, at least if the goal was to identify and review potentially relevant documents.

To be sure, our Predictive Ranking system came up with more documents to review than did the key word searches. However, we quickly concluded that the key word searches–while finding many potentially responsive documents–missed a lot of others that should be considered as well. Here is how we reached that conclusion.

The Numbers

Let me start by giving you some basic information about the two processes. From there it becomes a bit easier to explain the difference in the results. You can then see how we got to our ultimate conclusion.

The client collected just over 51,000 documents for this production. As a first step, counsel created a set of key word searches and asked our partner to run them using our PowerSearch utility. As I mentioned earlier, the searches hit on about 11,000 documents.

We then started the Predictive Ranking process, which is the name we started using years ago for our TAR methods. The first step was to take an initial random sample for reference purposes and to estimate the overall richness of the population. We came up with an estimate of 22%, which would suggest that there were about 11,000 relevant documents in the collection.

Hmmm. That was pretty close to the number of documents found through the key word searches. Did they nail it this time?

We next worked with the client’s legal experts to review and tag seed documents and then to undertake several rounds of system training. At the end of the process, the system suggested a review cutoff just below the top 36% of the ranking. That meant that the review team would look at the top 36% of the documents and ignore (after a confirmatory sample) the remaining 64%.

The resulting numbers looked like this:

  • Likely responsive and need review: 18,552
  • Likely non-responsive and don’t need review: 32,982

Our sampling also suggested that the top documents above the cutoff had a richness of about 50% (which meant half of these were likely responsive). The documents below the cutoff had a richness of about 7% (which meant that only 7 out of 100 were likely responsive). Seven percent seemed like a good number for the discard pile, one that most courts would accept.
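The two richness figures also imply a rough recall estimate. The following back-of-the-envelope Python sketch treats the rounded 50% and 7% estimates as exact and ignores sampling error. Both are simplifications of ours.

```python
# Figures reported above (rounded sampling estimates).
above_cutoff_docs, below_cutoff_docs = 18_552, 32_982
above_richness, below_richness = 0.50, 0.07

est_responsive_above = above_cutoff_docs * above_richness  # about 9,276
est_responsive_below = below_cutoff_docs * below_richness  # about 2,309
est_total_responsive = est_responsive_above + est_responsive_below

est_recall = est_responsive_above / est_total_responsive
est_overall_richness = est_total_responsive / (above_cutoff_docs + below_cutoff_docs)

print(f"estimated recall: {est_recall:.0%}")             # about 80%
print(f"implied richness: {est_overall_richness:.1%}")   # about 22.5%
```

The implied overall richness of about 22.5% is in line with the initial 22% estimate. The estimated precision of the review set is simply the 50% richness above the cutoff.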

The Question

As mentioned earlier, our results caused heartburn for our partner and its client. The key word search approach seemed to require that the team review only 11,318 documents. Why pay for Predictive Ranking if it requires that the team review more than 7,000 additional documents? That’s about 64% more than the key word results.

Understanding the Numbers

The answer requires that we better understand what all these figures mean. Unless we can compare apples to apples, we have no way to judge the efficacy of the two approaches. Fortunately we had an easy way to do just that.

We compared the document IDs for the files returned from the key word searches with the files returned from our Predictive Ranking process. What we found was pretty interesting. Let me show you with this simple diagram:

I created this diagram to map the comparative results of our Predictive Ranking and the key word searches. The circle represents the total documents in the population. If you add up the numbers in the four quadrants, the total comes to the 51,534 files at issue.

The four quadrants represent the different states of the document population.

  • The top left quadrant represents documents that our Predictive Ranking system found likely responsive but did not return from the key word searches. There were 11,285 documents in this category.
  • The top right quadrant represents documents that hit under both approaches. These documents were returned by the key word searches and were also designated by our Predictive Ranking system as potentially responsive. There were 7,267 documents in this category.
  • The bottom left quadrant represents documents that did not hit under either approach. Neither our Predictive Ranking system nor the key word searches deemed them likely responsive. There were 28,931 documents in this category.
  • The bottom right quadrant represents documents that were returned from the key word searches but were not deemed likely responsive by our Predictive Ranking system. There were 4,051 documents in this category.
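Comparing document IDs this way amounts to a few set operations. The following minimal Python sketch assumes each method's results are available as collections of document IDs. The function and key names are our own.

```python
def quadrants(all_ids, keyword_hits, ranked_responsive):
    """Split a document population into the four quadrants of the diagram."""
    population = set(all_ids)
    kw = set(keyword_hits) & population
    pr = set(ranked_responsive) & population
    return {
        "top_left_ranking_only": len(pr - kw),
        "top_right_both": len(pr & kw),
        "bottom_left_neither": len(population - pr - kw),
        "bottom_right_keyword_only": len(kw - pr),
    }


# Toy example with ten hypothetical document IDs.
counts = quadrants(range(10), keyword_hits={1, 2, 3}, ranked_responsive={3, 4, 5, 6})
print(counts)
# -> {'top_left_ranking_only': 3, 'top_right_both': 1,
#     'bottom_left_neither': 4, 'bottom_right_keyword_only': 2}
```

Summing the quadrants is a quick integrity check on the real numbers. The two top quadrants give the Predictive Ranking total (11,285 + 7,267 = 18,552). The two right quadrants give the key word total (7,267 + 4,051 = 11,318). All four give the population (51,534).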

So, what can we say about all this? First, we can say that the team should probably review the 7,267 documents found in the top right quadrant. Both approaches tagged them as likely responsive. That does not mean that they will all be responsive but it is a good bet that a lot of them are.

Second, we can suggest that the 28,931 documents in the bottom left quadrant include few responsive documents. Neither the key word searches nor our Predictive Ranking system hit on these documents. There is still a need for confirmatory sampling but we can be pretty sure that there are not a lot of responsive documents hiding in this quadrant.

The Two Key Quadrants

That leaves us with two quadrants to consider and this is where we find the answer to our puzzle. Together, these two quadrants represent about 15,000 documents. Here is what we can say about each:

  • The top left quadrant represents 11,285 documents that our Predictive Ranking system identified as likely responsive. The keyword searches provide no information about these documents other than that the searches did not return them.
  • The bottom right quadrant represents 4,051 documents that hit on counsel’s keyword searches but our Predictive Ranking system found to be likely non-responsive.

If counsel only reviewed documents that returned from the key word searches, they would be ignoring the 11,285 documents identified in the top left quadrant. Many of them had already been tagged during the Predictive Ranking training session and thus we knew that there were responsive documents in this quadrant. Our richness estimate went so far as to suggest that 50% of them were likely responsive, which meant that counsel might be missing 5,000-6,000 responsive documents using their key word approach. It quickly became evident that counsel would have to at least test additional documents in this quadrant before dismissing them as not responsive.

Conversely, our Predictive Ranking system led us to question how many of the 4,051 documents in the lower right quadrant were responsive. In fact, we knew from training that many of the documents in that quadrant were not responsive. At least, that is what the reviewers concluded when they reviewed them during sampling.

We suggested that the client test the documents in this quadrant before engaging in review. Our suspicion was that these were false hits from the key word searches and not likely of interest. Our estimate was that they would find a richness of about 7%.
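Applying those richness estimates to the two key quadrants gives rough expected counts. The following Python sketch makes the same assumptions the text does. The 50% richness above the cutoff is taken to apply to the top left quadrant, and the bottom right quadrant is taken to come in near the expected 7%. Both values are estimates, not measured counts.

```python
def expected_responsive(doc_count, est_richness):
    """Point estimate of responsive documents in a set, given a richness estimate."""
    return doc_count * est_richness


# Richness values are the estimates quoted in the text.
missed_by_keywords = expected_responsive(11_285, 0.50)  # top left, about 5,640
keyword_only_hits = expected_responsive(4_051, 0.07)    # bottom right, about 284

print(round(missed_by_keywords / keyword_only_hits))    # about 20
```

Under these assumptions, the documents the key word searches missed would contain roughly twenty times as many responsive documents as the key-word-only hits.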

Answering the Question

By now you have already figured out how to respond to the client’s concerns. Simply put, the key word searches–while effective at finding some of the potentially responsive documents–missed a lot of others that should be reviewed. The Predictive Ranking system found many of the documents returned from the key word searches but it also found a lot of other potentially responsive documents. The total numbers were higher but there was good reason for that outcome. There were more documents that needed to be reviewed.

Put another way, search has two measures of quality: precision and recall. Precision measures the number of true hits (actually responsive documents) returned by your search against the total number of documents returned. Recall measures the number of true hits returned by your search against the actual number of responsive documents in the population.
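Once you know which documents are truly responsive, both measures reduce to simple ratios. The following minimal Python sketch assumes the set of responsive documents is known. In practice that set is only estimated by sampling. The function name and toy IDs are illustrative.

```python
def precision_recall(returned, responsive):
    """Precision and recall of a search, given the IDs it returned and the
    IDs of all truly responsive documents in the population."""
    returned, responsive = set(returned), set(responsive)
    true_hits = len(returned & responsive)
    precision = true_hits / len(returned) if returned else 0.0
    recall = true_hits / len(responsive) if responsive else 0.0
    return precision, recall


# Toy example: the search returns IDs 0-99; IDs 20-219 are responsive.
print(precision_recall(range(100), range(20, 220)))  # (0.8, 0.4)
```

A search can score well on one measure and poorly on the other. In this toy example, 80% of what the search returned is responsive, yet it found only 40% of the responsive documents.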

In our case, the key word searches may have been good on precision (assuming that the documents in the top right quadrant were, in fact, responsive). However, they seemed to miss the boat on recall. The searches missed a lot of the other responsive documents. That is not a good thing if your opponent chooses to challenge your production in court.

My explanation proved helpful to our partner and the client team. They moved forward with their review using the documents ranked by our Predictive Ranking system. As it turned out, there were a lot of responsive documents missed by the key word searches, and many of the documents in the lower right quadrant, returned only by the searches, were false hits. The explanation and diagram helped to clear up the mystery. I thought they might be helpful to others as well as they grapple with the mysteries of technology assisted review.

There may also be a moral to this story, so to speak. Discussion of technology assisted review often focuses on its ability to reduce document populations. But review is not just a numbers game—it’s also about getting it right. It does neither lawyers nor their clients any good to cut document populations if they are cutting a large number of potentially responsive documents in the process. As my story above illustrates, fewer is not always better. Here, Predictive Ranking proved itself superior to key word searching at getting it right. That may have saved counsel some grief further down the road.

The Next Big Predictive Coding Case that Wasn’t

The case that many believed might be the next big bang in predictive coding jurisprudence instead has ended with barely a whimper.

As I noted here last month, in the wake of Magistrate Judge Andrew J. Peck’s ruling in Da Silva Moore v. Publicis Groupe affirming the use of predictive coding, many in the e-discovery field turned their attention to Kleen Products LLC v. Packaging Corporation of America, believing that it might be the Next Big Case on predictive coding.

The plaintiffs in Kleen Products had asked U.S. Magistrate Judge Nan Nolan to require the defendants to use predictive coding and Judge Nolan had conducted two days of evidentiary hearings on the request as well as several status conferences.

Although the case continues, predictive coding is off the table, at least for the time being. Last week, Judge Nolan approved a stipulation submitted by the parties in which plaintiffs withdrew their demand to apply predictive coding to any documents relating to any request for production filed prior to Oct. 1, 2013.

As to any requests for production filed after that date, the parties stipulated that they will meet and confer regarding the appropriate search methodology. “If the parties fail to agree on a search methodology,” the stipulation says, “either party may file a motion with the Court seeking resolution.”

That suggests that we may not have heard the last of Kleen Products in the context of computer-assisted search. But with any further possible rulings on the issue well over a year away, we can safely write it off as the next big case.