Predictive coding purists might argue that the process is tainted if it is preceded by the use of keyword searching to reduce the document set. As a matter of fact, that was exactly what the plaintiffs argued in the multi-district litigation against Biomet over its M2a Magnum hip implant. But in a ruling last week, U.S. District Judge Robert L. Miller Jr. said that proportionality trumped purity, and that even if predictive coding might unearth additional relevant documents, the cost would far outweigh the likely benefits.
The ruling came in a highly contentious matter in which Biomet has already produced 2.5 million documents out of a universe of 19.5 million. The plaintiffs say Biomet has not produced enough and that the production should total closer to 10 million documents.
Biomet began the process of producing documents in the summer of 2012, before the multiple lawsuits over its hip implant were centralized under the Judicial Panel on Multidistrict Litigation. It started by using keyword culling, reducing the number of documents from 19.5 million to 3.9 million, or 1.5 terabytes of data. Deduplication further reduced the number of documents to 2.5 million.
Biomet then employed predictive coding to identify relevant and privileged documents from among the 2.5 million that remained after keyword searching and deduplication. It also invited the plaintiffs to suggest additional search terms and offered to produce the rest of the non-privileged documents so that plaintiffs could verify that it was producing the relevant documents.
This process has cost Biomet $1.07 million so far and will end up costing between $2 million and $3.25 million, according to Judge Miller’s ruling.
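To put those figures in proportion, the short Python sketch below works through the arithmetic of the culling funnel and the projected cost. The document counts and dollar amounts come from the account above; the variable names are illustrative, and the per-document cost assumes the full projected cost is spread evenly over the 2.5 million documents that went into predictive coding, which is a simplification the ruling itself does not make.

```python
# Figures reported in the article; names are illustrative only.
universe = 19_500_000        # original document universe
after_keywords = 3_900_000   # remaining after keyword culling
after_dedup = 2_500_000      # remaining after deduplication
projected_cost = (2_000_000, 3_250_000)  # projected total cost range, USD


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Share of the original universe that survives each culling stage.
keyword_share = percent(after_keywords, universe)
dedup_share = percent(after_dedup, universe)

# ASSUMPTION: the whole projected cost is attributed evenly to the
# documents that entered predictive coding. The ruling gives no such
# breakdown; this is only a rough per-document yardstick.
cost_per_doc_low = projected_cost[0] / after_dedup
cost_per_doc_high = projected_cost[1] / after_dedup

print(f"After keyword culling: {keyword_share:.1f}% of the universe")
print(f"After deduplication:   {dedup_share:.1f}% of the universe")
print(f"Projected cost per document: ${cost_per_doc_low:.2f} to ${cost_per_doc_high:.2f}")
```

On these numbers, keyword culling kept about one document in five, and roughly one in eight went on to predictive coding. Rerunning predictive coding against the full 19.5 million, as the plaintiffs asked, would mean processing nearly eight times as many documents.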
The plaintiffs objected that Biomet’s initial use of keyword searching tainted the predictive coding process. They argued that keyword searching is less accurate than predictive coding, citing a recent article that said keyword searches generate only a 20 percent responsive rate, compared to at least a 75 percent responsive rate for predictive coding. Because Biomet started with the less-accurate keyword searches, they contended, the entire process was flawed. The only remedy, they asserted, was for Biomet to go back to where it started and employ predictive coding against the original set of 19.5 million documents, with plaintiffs and defendants sitting together and jointly going through the process.
Balancing Cost against Benefits
Against this backdrop, Judge Miller framed the issue this way:
The issue before me today isn’t whether predictive coding is a better way of doing things than keyword searching prior to predictive coding. I must decide whether Biomet’s procedure satisfies its discovery obligations and, if so, whether it must also do what the Steering Committee seeks.
Judge Miller easily decides that Biomet’s process complied with the Federal Rules of Civil Procedure and with the Seventh Circuit Principles Relating to the Discovery of Electronically Stored Information, as well as with principles set forth by The Sedona Conference.
By contrast, he says that plaintiffs’ request that Biomet go back to Square One “sits uneasily with the proportionality standard in Rule 26(b)(2)(C).” The cost of this, he notes, would run in the “low seven figures,” but Biomet’s testing suggests it would find only a “comparatively modest number of documents.”
It might well be that predictive coding, instead of a keyword search, at Stage Two of the process would unearth additional relevant documents. But it would cost Biomet a million, or millions, of dollars to test the Steering Committee’s theory that predictive coding would produce a significantly greater number of relevant documents. Even in light of the needs of the hundreds of plaintiffs in this case, the very large amount in controversy, the parties’ resources, the importance of the issues at stake, and the importance of this discovery in resolving the issues, I can’t find that the likely benefits of the discovery proposed by the Steering Committee equals or outweighs its additional burden on, and additional expense to, Biomet.
Judge Miller concludes by stating that he assumes Biomet will remain open to meeting and conferring on additional, reasonable search terms and to producing the non-privileged documents included in the statistical sample. “Beyond that,” he writes, “if the Steering Committee wishes production of documents that can be identified only through re-commenced processing, predictive coding, review, and production, the Steering Committee will have to bear the expense.”