I read a fascinating blog post from Andrew McAfee for the Harvard Business Review. Titled “Big Data’s Biggest Challenge? Convincing People NOT to Trust Their Judgment,” the article’s primary thesis is that as the amount of data goes up, the importance of human judgment should go down.
Even though it may seem counterintuitive, support for this proposition is piling up rapidly. McAfee cites numerous examples to back his argument. For one, parole boards have been shown to do much worse than algorithms at assessing which prisoners should be sent home. Pathologists are not as good as image analysis software at diagnosing breast cancer. And, apparently, a number of top American legal scholars were beaten at predicting Supreme Court votes by a data-driven decision rule.
Have you heard how they finally taught computers to translate? For many years, humans tried to create more and more complicated rules to govern the translation of grammar in different languages. Microsoft and many others struggled with the problem, finding they could only get so far by having humans articulate the rules of the road. The resulting translations were sometimes passable but more often comical.
Franz Josef Och, a research scientist at Google, tried a different approach. Rather than try to define language through rules and grammar, he simply tossed a couple billion translations at the computer to see what would happen. The result was a huge leap forward in the accuracy of computerized translation and a model that most other companies (including Microsoft) follow today. You can read more here and more about these kinds of stories in the book, Big Data: A Revolution That Will Transform How We Live, Work and Think.
What’s this have to do with legal search?
It turns out a lot. McAfee reaches the surprising conclusion that humans need to play second fiddle to the algorithms when it comes to big data. Counterintuitive as it may be, the purpose of data analytics is not to assist humans in exercising their judgment. As much as that makes sense to us carbon-based units, we simply don't do as good a job in many situations, even when presented with the insights that algorithms can provide. We tend to dismiss those insights in favor of our emotions and biases. At best:
What you usually see is [that] the judgment of the aided experts is somewhere in between the model and the unaided expert. So the experts get better if you give them the model. But still the model by itself performs better.
(Citing sociologist Chris Snijders, quoted in the Ian Ayres book, Super Crunchers: Why Thinking-by-Numbers is the New Way to be Smart.)
We need to flip our bias on its head, McAfee argues. Rather than have the algorithm aid the expert, the better approach is to have the expert assist the algorithm. It turns out that the results get better when the expert lends his or her judgment to the computer algorithm rather than vice versa. As Ayres put it in Super Crunchers:
Instead of having the statistics as a servant to expert choice, the expert becomes a servant of the statistical machine.
Here is the fun part. It turns out that lawyers are at the forefront of this trend. “How so?” you ask. Easy, I respond. Technology-assisted review.
Although TAR vendors use different algorithms and even different approaches, the lawyer serves the algorithm and not vice versa. For traditional TAR, we look to subject matter experts to train the algorithm: they review documents and tag them as relevant or not, and the algorithm uses those judgments to build its rankings. But the order is clear: The experts are serving the algorithm and not the other way around.
Even if we use review teams instead of experts for training, as I have considered in several recent articles (see here and here), the pecking order doesn't change. The reviewers are working for the algorithm to support its efforts to analyze big data. The algorithm, and not the reviewers, is ultimately the decision maker for the ranking; we play only a supporting role.
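To make the pecking order concrete, here is a deliberately simplified sketch of the training loop described above: reviewers tag a seed set of documents, and the algorithm, not the reviewers, ranks everything else. Real TAR engines use far more sophisticated classifiers; this toy centroid ranker, with made-up document snippets, is an illustration of the division of labor only.

```python
# Toy illustration of TAR-style training (NOT any vendor's actual method):
# reviewers supply relevance judgments; the algorithm builds a model of
# "relevant" from those judgments and does the ranking itself.
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term frequencies for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroid(tagged_docs):
    """Sum the vectors of the documents the reviewers tagged relevant."""
    centroid = Counter()
    for doc, label in tagged_docs:
        if label == 1:
            centroid.update(vectorize(doc))
    return centroid

def rank(unreviewed, centroid):
    """Order unreviewed documents by similarity to the relevance model."""
    scored = [(doc, cosine(vectorize(doc), centroid)) for doc in unreviewed]
    return sorted(scored, key=lambda pair: -pair[1])

# Reviewer judgments (1 = relevant, 0 = not) train the model...
seed_set = [
    ("merger agreement signed with acme corp", 1),
    ("merger negotiations and due diligence notes", 1),
    ("office holiday party planning committee", 0),
]
# ...and then the model, not the reviewer, ranks the rest of the collection.
unreviewed = [
    "draft merger term sheet from acme",
    "parking garage maintenance schedule",
]
for doc, score in rank(unreviewed, train_centroid(seed_set)):
    print(f"{score:.2f}  {doc}")
```

The point of the sketch is the direction of service: the human judgments are raw material fed into `train_centroid`, and the ranking decision comes out of `rank`, exactly the expert-serves-the-machine arrangement McAfee and Ayres describe.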
Several studies have documented the superiority of TAR over human judgment. In a study published in 2011, Maura Grossman of Wachtell, Lipton, Rosen & Katz and Prof. Gordon Cormack of the University of Waterloo, concluded, “[T]he myth that exhaustive manual review is the most effective—and therefore the most defensible—approach to document review is strongly refuted. TAR can (and does) yield more accurate results than exhaustive manual review, with much lower effort.”
Earlier, in their 2009 study for the Electronic Discovery Institute, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Herbert L. Roitblat, Anne Kershaw and Patrick Oot compared human review to TAR. “On every measure,” they concluded, “the performance of the two computer systems was at least as accurate … as that of a human re-review.”
But while these studies only hinted at McAfee's thesis, the further evolution of TAR technology and the continued growth of big data have made it plain. It turns out that even the legal profession, with its reverence for tradition, is no longer immune from these evolutionary trends. Big data demands new methods and new masters, and legal is no exception. All we can do is listen and learn and move with the times.
Long ago a wit proclaimed that the law office of the future would have a lawyer, a dog and a computer. The lawyer would be there to turn on the computer in the morning. The dog was there to keep the lawyer away from the computer for the rest of the day.
I wonder if that fellow was thinking about Big Data and the evolution of our information society. If not, he came pretty close in his prediction. The dog just gave way to a smart algorithm. We call ours Fido.