Readers of our blog will know that I have a continuing interest in answering the perennial e-discovery question: “How many native documents are in a gigabyte?” I started thinking about this in 2011 and published my first article on the subject based on analysis of 18 million files. Challenging industry assumptions, which ran from 5,000 to as many as 15,000, I concluded that the average across all files, based on that sample, was closer to 2,500.
An old friend called me recently to talk about a beef he had with his e-discovery provider. “What’s up?” I asked when I realized who it was. He told me he thought he had done everything right in setting up his last e-discovery project. He sent out an RFP to several vendors, asked all the right questions and then picked the bidder with the lowest per-gigabyte price to host the documents. Everything seemed like it was on track.
“So what’s wrong with that?” I asked. “You went for the low bidder and locked them in with an ironclad contract. Getting hosting for that kind of per-gigabyte price seems like a steal.”
My friend sighed in response. “What happened was that I didn’t read the fine print.”
Taking his lead from the seminal U.S. case, Da Silva Moore v. Publicis Groupe, a master of Britain’s High Court of Justice has approved the use of technology assisted review, making this the first case to do so in the United Kingdom and only the second case outside the U.S. to approve TAR.
In a written decision issued Feb. 16, 2016, in the case Pyrrho Investments Ltd. v. MWB Property Ltd., Master Matthews, who is similar in responsibility to a magistrate judge in the U.S. federal court system, provided his reasons for approving the parties’ request to use TAR in a case involving some 3.1 million electronic documents.
In the user interface (UI) and user experience (UX) world, one of the ways people design successful software is through the creation of a “mental model” of the underlying processes. Mental models have been around since the 1940s and have been used for different processes, but the concept caught hold in software because it gave designers a framework to understand user needs and the problems they were trying to solve.
According to one of the pioneers of Internet usability, Jakob Nielsen, “mental models are one of the most important concepts in human-computer interaction.” We use them to inform our software design and we wanted to share one that we created to model the e-discovery process.
Last March, we wrote about U.S. Magistrate Judge Andrew J. Peck’s decision in Rio Tinto PLC v. Vale SA (S.D.N.Y. March 3, 2015). The decision focused on the types of disputes over process that can arise when parties negotiate a TAR 1.0 protocol. In that post, we noted with approval Judge Peck’s acknowledgment that one common bone of contention in TAR 1.0 negotiations, transparency around training and the seed set, becomes less of an issue when the TAR methodology uses continuous active learning.
If the TAR methodology uses ‘continuous active learning’ (CAL) (as opposed to simple passive learning (SPL) or simple active learning (SAL)), the contents of the seed set is much less significant.
After issuing his opinion, and doubtless facing continuing squabbles among the parties, Judge Peck appointed Maura Grossman to serve as a special master to resolve discovery disputes relating to the parties’ use of TAR. Several months later, she issued a “Stipulation and Order re: Revised Validation and Audit Protocols for the Use of Predictive Coding in Discovery,” which is the subject of this blog post.
Can keyword search be as effective as, or more effective than, technology assisted review at finding relevant documents?
A client recently asked me this question and it is one I frequently hear from lawyers. The issue underlying the question is whether a TAR platform such as our Insight Predict is worth the fee we charge for it.
The question is a fair one and it can apply to a range of cases. The short answer, drawing on my 20-plus years of experience as a lawyer, is unequivocally, “It depends.”
Whether at Sedona, Georgetown, Legaltech or any other of the many discovery conferences one might attend, a common debate centers on the efficacy of keyword search. “Keyword search is dead,” some argue, touting the effectiveness of the newer predictive analytics engines. “Long live keyword search,” comes back in return from lawyers who have relied on it for decades both to find legal precedent and, more recently, relevant documents for their cases.
Often, the critics of keyword searching cite the 1985 Blair and Maron study for the Association for Computing Machinery, which suggested that full-text retrieval systems brought back only 20 percent of the relevant documents. That assertion is true, but I wonder how many of the debaters have ever read the study itself. My guess is not many, and until now that included me. So I decided to give it a read.
Recently, Bob Ambrogi, our director of communications, published a post called “Our 10 Most Popular Blog Posts of 2015 (So Far).” To my surprise, one of my 2011 posts topped the list: “Shedding Light on an E-Discovery Mystery: How Many Documents in a Gigabyte?” Another on the same topic ranked fourth: “How Many Documents in a Gigabyte? An Updated Answer to that Vexing Question.”
Hmmm. Clearly, a lot of us are interested in knowing the answer to this question. I have received a number of comments on both posts (both in writing and in conversation), which always makes the writing worthwhile. The RAND people told me they also found my findings of interest when they were putting together their study on e-discovery costs.
A key debate in the battle between TAR 1.0 (one-time training) and TAR 2.0 (continuous active learning) is whether you need a “subject matter expert” (SME) to do the training. With first-generation TAR engines, this was considered a given. Training had to be done by an SME, which many interpreted as a senior lawyer intimately familiar with the underlying case. Indeed, the big question in the TAR 1.0 world was whether you could use several SMEs to spread the training load and get the work done more quickly.
SME training presented practical problems for TAR 1.0 users, primarily because the SME had to look at a lot of documents before review could begin. You started with a “control” set, often 500 documents or more, to be used as a reference for training. Then, the SME needed to review thousands of additional documents to train the system. After that, the SME had to review and tag another 500 documents to document the effectiveness of the training. All told, the SME could expect to look at and judge 3,000 to 5,000 or more documents before the review could start.
I do not know if any leprechauns appeared in this case, but the Irish High Court found the proverbial pot of gold under the TAR rainbow in Irish Bank Resolution Corp. vs. Quinn—the first decision outside the U.S. to approve the use of Technology Assisted Review for civil discovery.
The protocol at issue in the March 3, 2015, decision was TAR 1.0 (Clearwell). For that reason, some of the points addressed by the court will be immaterial for legal professionals who use the more-advanced TAR 2.0 and Continuous Active Learning (CAL). Even so, the case makes for an interesting read, both for its description of the TAR process at issue and for its ultimate outcome.