Author Archives: Jeremy Pickens

About Jeremy Pickens

Jeremy Pickens is one of the world’s leading information retrieval scientists and a pioneer in the field of collaborative exploratory search, a form of information seeking in which a group of people who share a common information need actively collaborate to achieve it. Dr. Pickens has seven patents and patents pending in the field of search and information retrieval. As senior applied research scientist at Catalyst, Dr. Pickens has spearheaded the development of Insight Predict. His ongoing research and development focuses on methods for continuous learning, and the variety of real world technology assisted review workflows that are only possible with this approach. Dr. Pickens earned his doctoral degree at the University of Massachusetts, Amherst, Center for Intelligent Information Retrieval. He conducted his post-doctoral work at King’s College, London. Before joining Catalyst, he spent five years as a research scientist at FX Palo Alto Lab, Inc. In addition to his Catalyst responsibilities, he continues to organize research workshops and speak at scientific conferences around the world.

Thinking Through the Implications of CAL: Who Does the Training?

Before joining Catalyst in 2010, my entire academic and professional career revolved around basic research. I spent my time coming up with new and interesting algorithms, ways of improving document rankings and classification. However, in much of my research, it was not always clear which algorithms would have immediate application. It is not that the algorithms were not useful; they were. They just did not always have immediate application to a live, deployed system.

Since joining Catalyst, however, my research has become much more applied. I have come to discover that this doesn’t just mean that the algorithms I design have to be more narrowly focused on the existing task. It also means that I have to design those algorithms to be aware of the larger real-world contexts in which they will be deployed and the limitations that may exist therein.

So it is with keen interest that I have been watching the eDiscovery world react to the recent (SIGIR 2014) paper from Maura Grossman and Gordon Cormack on the CAL (continuous active learning) protocol, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery. Continue reading

The Five Myths of Technology Assisted Review, Revisited

On Jan. 24, Law Technology News published John’s article, “Five Myths about Technology Assisted Review.” The article challenged several conventional assumptions about the predictive coding process and generated a lot of interest and a bit of dyspepsia too. At the least, it got some good discussions going and perhaps nudged the status quo a bit in the balance.

One writer, Roe Frazer, took issue with our views in a blog post he wrote. Apparently, he tried to post his comments with Law Technology News but was unsuccessful. Instead, he posted his reaction on the blog of his company, Cicayda. We would have responded there but we don’t see a spot for replies on that blog either. Continue reading

In TAR, Wrong Decisions Can Lead to the Right Documents (A Response to Ralph Losey)

In a recent blog post, Ralph Losey tackles the issue of expertise and TAR algorithm training.  The post, as is characteristic of Losey’s writing, is densely packed.  He raises a number of different objections to doing any sort of training using a reviewer who is not a subject matter expert (SME).  I will not attempt to unpack every single one of those objections.  Rather, I wish to cut directly to the fundamental point that underlies the belief in the absolute necessity that an SME, and only an SME, should provide the judgments, the document codings, that get used for training: Continue reading

Predictive Ranking: Technology Assisted Review Designed for the Real World

Why Predictive Ranking?

Most articles about technology assisted review (TAR) start with dire warnings about the explosion in electronic data. In most legal matters, however, the reality is that the quantity of data is big, but it is no explosion. The fact of the matter is that even a half million documents, a relatively small number in comparison to the “big data” of the web, pose a significant and serious challenge to a review team. That is a lot of documents and can cost a lot of money to review, especially if you have to go through them in a manual, linear fashion. Catalyst’s Predictive Ranking bypasses that linearity, helping you zero in on the documents that matter most. But that is only part of what it does.

In the real world of e-discovery search and review, the challenges lawyers face come not merely from the explosion of data, but also from the constraints imposed by rolling collection, immediate deadlines, and non-standardized (and at times confusing) Continue reading

Search Q&A: How ECA is ‘Broken’ and the Solution that will Fix It

[This is another in a series of search Q&As between Bruce Kiefer, Catalyst's director of research and development, and Dr. Jeremy Pickens, Catalyst's senior applied research scientist.]

BRUCE KIEFER: Early case assessment is a hot topic in electronic discovery. You believe that it may be flawed and cause additional errors. Why is that?

DR. JEREMY PICKENS: We’ve all heard the expression, “Don’t throw out the baby with the bath water.” Unfortunately, many e-discovery professionals risk doing exactly that in the way they are conducting ECA. Continue reading

Search Q&A: Learning to Read the ‘Signals’ Within Document Collections

[This is another in a series of search Q&As between Bruce Kiefer, Catalyst's director of research and development, and Dr. Jeremy Pickens, Catalyst's senior applied research scientist.]

BRUCE KIEFER: What are “signals” and how can they improve search?

DR. JEREMY PICKENS: Signals are objectively measurable and quantifiable properties of a document or collection (or even user). Signals could come from the document itself (data) or from information surrounding the document, such as lists of users who have edited a document, Continue reading

Search Q&A: The Six Blind Men and the E-Discovery Elephant

[This is another in a series of search Q&As between Bruce Kiefer, Catalyst's director of research and development, and Dr. Jeremy Pickens, Catalyst's senior applied research scientist.]

BRUCE KIEFER: There are a lot of search algorithms out there. Why do you feel that collaboration is a better way to search?

DR. JEREMY PICKENS: Collaboration is a better way to search because e-discovery is not all about the algorithms. Algorithms also involve people.

In a previous post (Q&A: Is There a Google-like ‘Magic Bullet’ in E-Discovery Search?), I talked about why there will never be a magic bullet for e-discovery. That primarily has to do with the fact that an information Continue reading

Q&A: Collaborative Information Seeking: Smarter Search for E-Discovery

[This is one in a series of search Q&As between Bruce Kiefer, Catalyst's director of research and development, and Dr. Jeremy Pickens, Catalyst's senior applied research scientist.]

BRUCE KIEFER: In our last Q&A post (Q&A: How Can Various Methods of Machine Learning Be Used in E-Discovery?), you talked about machine learning and collaboration. More than a decade ago, collaborative filtering and recommendations became a distinguishing part of the online shopping experience. You’ve been interested in collaborative seeking. What is collaborative seeking and how does it compare to receiving a recommendation?

DR. JEREMY PICKENS: Search (seeking) and recommendation are really two edges Continue reading

The Recommind Patent and the Need to Better Define ‘Predictive Coding’

Last week, I attended the DESI IV workshop at the International Conference on AI and Law (ICAIL). This workshop brought together a diverse array of lawyers, vendors and academics, and even featured a special guest appearance by the courts (Magistrate Judge Paul W. Grimm). The purpose of the workshop was, in part:

…to provide a platform for discussion of an open standard governing the elements of a state-of-the-art search for electronic evidence in the context of civil discovery. Continue reading