Measuring Recall in E-Discovery Review, Part Two: No Easy Answers

In Part One of this two-part post, I introduced readers to statistical problems inherent in proving the level of recall reached in a Technology Assisted Review (TAR) project. Specifically, I showed that the confidence intervals around an asserted recall percentage could be sufficiently large with typical sample sizes as to undercut the basic assertion used to justify your TAR cutoff.

In our hypothetical example, we had to acknowledge that while our point estimate suggested we had found 75% of the relevant documents in the collection, the true figure could be far lower. For example, with a sample size of 600 documents, the lower bound of our confidence interval was 40%. If we increased the sample size to 2,400 documents, the lower bound only rose to 54%. And if we upped our sample to 9,500 documents, we got the lower bound to 63%.

Even assuming a 63% lower bound is good enough, we would have a lot of documents to sample. Using basic assumptions about cost and productivity, we concluded that we might spend 95 hours reviewing our sample, at a cost of about $20,000. If the sample didn’t prove out our hoped-for recall level (or if we received more documents to review), we might have to run the sample several times. That is a problem.
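The relationship between sample size and the width of these bounds can be sketched with a quick calculation. The post does not state the collection’s richness or the interval method behind its figures, so the sketch below is a hypothetical reconstruction: it assumes richness of roughly 1% (so a 600-document sample contains only a handful of relevant documents) and uses a Wilson score interval. The key point it illustrates is that the effective sample size for a recall estimate is the number of *relevant* documents in the sample, not the total sample size.

```python
import math

def wilson_lower(p_hat, n, z=1.96):
    """Lower bound of the Wilson score confidence interval for a proportion."""
    if n == 0:
        return 0.0
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half

# Hypothetical: recall is estimated from the relevant documents that happen
# to fall in a random sample of the collection. At an assumed ~1% richness,
# a 600-document sample contains only ~6 relevant documents.
richness = 0.01  # assumption, not stated in the post
for sample_size in (600, 2400, 9500):
    n_relevant = round(sample_size * richness)
    lb = wilson_lower(0.75, n_relevant)
    print(f"sample={sample_size:>5}  relevant in sample={n_relevant:>3}  "
          f"recall lower bound = {lb:.0%}")
```

Under these assumed inputs, the lower bounds come out in the same neighborhood as the figures above, which is why even large total samples leave wide uncertainty when richness is low.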

Is there a better and cheaper way to prove recall in a statistically sound manner? In this Part Two, I will take a look at some of the other approaches people have put forward and see how they match up. However, as Maura Grossman and Gordon Cormack warned in “Comments on ‘The Implications of Rule 26(g) on the Use of Technology-Assisted Review’” and Bill Dimm amplified in a later post on the subject, there is no free lunch. Continue reading

TAR 2.0 Capabilities Allow Use in Even More E-Discovery Tasks

Recent advances in Technology Assisted Review (“TAR 2.0”) include the ability to deal with low richness, rolling collections, and flexible inputs in addition to vast improvements in speed. [1] These improvements now allow TAR to be used effectively in many more discovery workflows than its traditional “TAR 1.0” use in classifying large numbers of documents for production.

To better understand this, it helps to begin by examining in more detail the kinds of tasks we face. Broadly speaking, document review tasks fall into three categories:[2]

  • Classification. This is the most common form of document review, in which documents are sorted into buckets such as responsive or non-responsive so that we can do something different with each class of document. The most common example here is a review for production.
  • Protection. This is a higher level of review in which the purpose is to protect certain types of information from disclosure. The most common example is privilege review, but this also encompasses trade secrets and other forms of confidential, protected, or even embarrassing information, such as personally identifiable information (PII) or confidential supervisory information (CSI).
  • Knowledge Generation. The goal here is learning what stories the documents can tell us and discovering information that could prove useful to our case. A common example of this is searching and reviewing documents received in a production from an opposing party or searching a collection for documents related to specific issues or deposition witnesses. Continue reading

TAR in the Courts: A Compendium of Case Law about Technology Assisted Review

Magistrate Judge Andrew Peck

It is less than three years since the first court decision approving the use of technology assisted review in e-discovery. “Counsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review,” U.S. Magistrate Judge Andrew J. Peck declared in his groundbreaking opinion in Da Silva Moore v. Publicis Groupe.

Judge Peck did not open a floodgate of judicial decisions on TAR. To date, there have been fewer than 20 such decisions and not one from an appellate court.

However, what he did do — just as he said — was to set the stage for judicial acceptance of TAR. Not a single court since has questioned the soundness of Judge Peck’s decision. To the contrary, courts uniformly cite his ruling with approval.

That does not mean that every court orders TAR in every case. The one overarching lesson of the TAR decisions to date is that each case stands on its own merits. Courts look not only to the efficiency and effectiveness of TAR, but also to issues of proportionality and cooperation.

What follows is a summary of the cases to date involving TAR. Each includes a link to the full-text decision, so that you can read for yourself what the court said. Continue reading

How Corporate Counsel are Integrating E-Discovery Technologies to Help Manage Litigation Costs

The newsletter Digital Discovery & e-Evidence just published an article by Catalyst founder and CEO John Tredennick, “Taking Control: How Corporate Counsel are Integrating eDiscovery Technologies to Help Manage Litigation Costs.” In the article, John explains why savvy corporate counsel are using the multi-matter repository and technology assisted review to manage cases and control costs. Continue reading

How Much Can I Save with CAL? A Closer Look at the Grossman/Cormack Research Results

As most e-discovery professionals know, two leading experts in technology assisted review, Maura R. Grossman and Gordon V. Cormack, recently presented the first peer-reviewed scientific study on the effectiveness of several TAR protocols, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery,” to the annual conference of the Special Interest Group on Information Retrieval, a part of the Association for Computing Machinery (ACM).

Perhaps the most important conclusion of the study was that an advanced TAR 2.0 protocol, continuous active learning (CAL), proved to be far more effective than the two standard TAR 1.0 protocols used by most of the early products on the market today—simple passive learning (SPL) and simple active learning (SAL). Continue reading

The Seven Percent Solution: The Case of the Confounding TAR Savings

“Which is it to-day,” [Watson] asked, “morphine or cocaine?”

[Sherlock] raised his eyes languidly from the old black-letter volume which he had opened. 
“It is cocaine,” he said, “a seven-per-cent solution. Would you care to try it?”

- The Sign of the Four, Sir Arthur Conan Doyle (1890)

Back in the mid-to-late 1800s, many touted cocaine as a wonder drug, providing not only stimulation but a wonderful feeling of clarity as well. Doctors prescribed the drug in a seven percent solution of water. Although Watson did not approve, Sherlock Holmes felt the drug helped him focus and shut out the distractions of the real world. He came to regret his addiction in later novels, as cocaine moved out of the mainstream.

This story is about a different type of seven percent solution, with no cocaine involved. Rather, we will be talking about the impact of another kind of stimulant, one that saves a surprising amount of review time and costs. This is the story of how a seemingly small improvement in review richness can make a big difference for your e-discovery budget. Continue reading
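The economics behind that claim can be sketched in a few lines. The figures below are purely illustrative assumptions (the target count, richness levels, and per-document cost are not from the post): to find a fixed number of relevant documents, the number of documents you must review scales inversely with the richness of the review queue, so even a modest richness improvement compounds into substantial savings.

```python
# Hypothetical illustration: review cost to find a fixed number of relevant
# documents varies inversely with the richness of the documents being reviewed.
target_relevant = 10_000  # assumed number of relevant docs we need to find
cost_per_doc = 1.00       # assumed review cost per document, in dollars

for richness in (0.07, 0.14):
    docs_to_review = target_relevant / richness
    print(f"richness {richness:.0%}: review {docs_to_review:,.0f} docs, "
          f"cost ${docs_to_review * cost_per_doc:,.0f}")
```

Doubling richness from 7% to 14% halves the documents reviewed, and therefore the cost, under these assumptions.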

Measuring Recall in E-Discovery Review, Part One: A Tougher Problem Than You Might Realize

A critical metric in Technology Assisted Review (TAR) is recall, which is the percentage of relevant documents actually found from the collection. One of the most compelling reasons for using TAR is the promise that a review team can achieve a desired level of recall (say 75% of the relevant documents) after reviewing only a small portion of the total document population (say 5%). The savings come from not having to review the remaining 95% of the documents. The argument is that the remaining documents (the “discard pile”) include so few that are relevant (against so many irrelevant documents) that further review is not economically justified. Continue reading
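The discard-pile argument can be made concrete with a worked example. The numbers below are illustrative assumptions, not figures from the post: a one-million-document collection at an assumed 1% richness, reviewed to 75% recall after examining 5% of the documents.

```python
# Hypothetical numbers, chosen only to illustrate the discard-pile arithmetic:
collection = 1_000_000
relevant = 10_000                      # assumed 1% richness
reviewed = int(collection * 0.05)      # review stops after 5% of the collection
recall = 0.75
found = int(relevant * recall)         # relevant documents actually located

discard_pile = collection - reviewed   # documents never reviewed
missed = relevant - found              # relevant documents left behind
elusion = missed / discard_pile        # fraction of the discard pile that is relevant

print(f"reviewed {reviewed:,} docs, found {found:,} of {relevant:,} relevant")
print(f"discard pile: {discard_pile:,} docs, {missed:,} relevant ({elusion:.2%})")
```

Under these assumptions the discard pile of 950,000 documents holds only 2,500 relevant ones (about a quarter of one percent), which is the sense in which further review is hard to justify economically.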

Another Court Formally Endorses the Use of Technology Assisted Review

Given the increasing prevalence of technology assisted review in e-discovery, it seems hard to believe that it was just 19 months ago that TAR received its first judicial endorsement. That endorsement came, of course, from U.S. Magistrate Judge Andrew J. Peck in his landmark ruling in Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), adopted sub nom. Moore v. Publicis Groupe SA, No. 11 Civ. 1279 (ALC)(AJP), 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012), in which he stated, “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”

Other courts have since followed suit, and now there is another to add to the list: the U.S. Tax Court. Continue reading

National Law Journal Article on Using TAR for Non-English Documents

In a recent memorandum, a U.S. Department of Justice attorney questioned the effectiveness of using technology assisted review with non-English documents. While the DOJ “would be open to discussion” about using TAR in such cases, it is not ready to adopt it as a standard procedure, the memo said.

In an article published Sept. 1 in The National Law Journal, Catalyst founder and CEO John Tredennick responds to that DOJ memo. In the article, Yes, Predictive Coding Works in Non-Western Languages, Tredennick explains that TAR, when done properly, can be just as effective for non-English as it is for English documents. This is true even for the so-called “CJK languages” — Asian languages including Chinese, Japanese and Korean.

Read the full article at the NLJ website: Yes, Predictive Coding Works in Non-Western Languages.


RIP Browning Marean, 1942-2014

I am sad to report that Browning Marean passed away last Friday. He will be sorely missed by his partners at DLA Piper, his clients and his many friends and colleagues. I am proud to say that I have been friends with Browning for many years and count myself in his fan club. We go back to the early days of Catalyst and before that even. The time was too short.

Browning served on the Catalyst Advisory Board for the past two years and was always quick to help whenever I asked. You couldn’t ask for a better sounding board or friend.

Many have already posted their thoughts and regrets about the loss of Browning, including our friend Craig Ball, who as usual made the case as eloquently as possible (Browning Marean 1942-2014). Thanks to Chris Dale as well for his comments, Goodbye Old Friend: Farewell to Browning Marean, and photo gallery. And to Ralph Losey: Browning Marean: The Life and Death of a Great Lawyer. And Tom O’Connor: Browning Marean: A Remembrance.

Browning and I go back to the early days, before there was an “E” in front of discovery. He told me once that he got his start on the speaking circuit after hearing one of my talks. It inspired him to see a lawyer up there talking about litigation technology, he said. Having watched Browning leave me in the dirt with his speaking prowess, I was both honored and pleased to have played a small part in getting him going.

I had the privilege of being with Browning on the dais, at conferences and in quiet evening meals from Hong Kong to London and many places in between. Had I realized time was short, there are so many things I would have wanted to say. Alas, that seldom happens and it didn’t here. He wrote me a few weeks ago to say he expected to be back on his feet in September. How I wish that were still true.

Browning: You touched a lot of people over your too few years and made the world a better place. We carry on in your honor.

Rest in peace old friend.