Catalyst Repository Systems - Powering Complex Legal Matters

E-Discovery Search Blog

Technology, Techniques and Best Practices
Nirupama Bhatt

About Nirupama Bhatt

Nirupama Bhatt is a litigation support professional with more than 10 years of wide-ranging experience in technology, project management and business operations. As a member of the Search and Analytics Consulting team at Catalyst, she focuses on developing and refining e-discovery searches and strategies for law firms, corporations and government entities.

One of Rupa's key strengths is analyzing and documenting our clients' e-discovery practices and then recommending techniques they can adopt to reduce costs and enhance effectiveness. In her work with clients, she helps develop and refine multilingual search strategies, coordinates attorney review teams for document-review projects, and recommends best practices for achieving defensible, repeatable results. She is multilingual and is currently studying Japanese.

Rupa is a member of the Denver chapter of Women in E-Discovery and is a Catalyst-certified search specialist and trainer.

The Slippery Issue of Relevance in Document Review

By Rupa Bhatt*

Electronic discovery is a multi-stage process of collecting, processing, analyzing, searching and reviewing electronically stored information. Yet all these stages are directed towards two basic tasks: identifying relevant information and preventing the disclosure of privileged information.

For cases involving significant collections of ESI, review is a key stage in accomplishing these two tasks. Even as search and analytics enable more sophisticated, computer-assisted identification of documents, virtually every case still requires eyes-on review by trained reviewers.

But if, as the saying goes, “Beauty is in the eyes of the beholder,” the same can be said for relevance. It is a slippery concept and its application is somewhat subjective. As reviewers move from document to document, they make judgment calls about each one’s relevance. Their judgment calls are not always perfect or consistent.

The fact of the matter is, when it comes to e-discovery document review, a certain number of inconsistent review calls are not only to be expected, but are unavoidable. Given this, how can you prepare so that these inconsistencies do not undermine your outcomes?

Later in this post, I will offer some suggestions on best practices to follow in order to minimize the impact of inconsistent relevance determinations. But first, some background.

Guideposts for Determining Relevance

The federal rules provide the two primary guideposts that we follow in defining relevance in discovery and at trial:

FRCP Rule 26: Duty to Disclose; General Provisions Governing Discovery. (b) Discovery Scope and Limits. (1) Scope in General. Unless otherwise limited by court order, the scope of discovery is as follows: Parties may obtain discovery regarding any nonprivileged matter that is relevant to any party’s claim or defense. … Relevant information need not be admissible at the trial if the discovery appears reasonably calculated to lead to the discovery of admissible evidence.

FRE Rule 401: Definition of “Relevant Evidence.” “Relevant evidence” means evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.

Given the breadth of these definitions, determining relevance can be a highly subjective judgment call. At the very least, it depends on the case and on the counsel or reviewer. The impact of that judgment call can be wide and varied, affecting every subsequent stage of the e-discovery and document review process.

Often, a reviewer’s subjective determination on one document carries over to the relevance determinations for other, similar documents. Given this, the relevance assessment criteria applied at every stage of document review, whether human and subjective or electronic and objective, are always open to question. Judgment calls in coding documents that go astray are often referred to as “coding inconsistencies” or “bad coding calls.” Such inconsistencies can have serious consequences, ranging from production of irrelevant or privileged documents to the imposition of judicial sanctions such as an adverse inference.

How Reviewers Determine Relevance

Relevance does not behave. People behave. Although various factors can contribute to deviations in relevance determinations, the single most important factor is the subjectivity of the reviewer.

A paper produced by the TREC 2008 Legal Track described it this way:

While the ultimate determination of responsiveness (and whether or not to produce a given document) is a binary decision, the breadth or narrowness with which “responsiveness” is defined is often dependent on numerous subjective determinations involving, among other things, the nature of the risk posed by production, the party requesting the information, the willingness of the producing party to face a challenge for underproduction, and the level of knowledge that the producing party has about the matter at a particular point in time. Lawyers can and do draw these lines differently for different types of opponents, on different matters, and at different times on the same matter. This makes it exceedingly difficult to establish a “gold standard” against which to measure relevance/responsiveness and explains why document review cannot be completely automated.

Hence, the interplay of all these factors defines and directs the outcome of the review itself. Some examples of “influencers” that may affect relevance determinations include:

Human/subjective factors. These factors can include the prior experience and knowledge of the review team as a whole (e.g., Is this an offshore review team unfamiliar with U.S. litigation? Is the review team trained on the review platform?), as well as individual reviewers’ ability to grasp ideas and concepts, their perceptions, emotional state, cultural norms and so on.

Instead of “subjective,” it may be more appropriate to say that discovery involves judgment about the situation as well as about the documents and their contents. Some judgments bias the reviewer to be more inclusive and some bias the reviewer to be less inclusive, but these judgments are not made willy-nilly. As opposed to pure errors, which are random, these judgment calls are based on a systematic interpretation of the evidence and the situation.

Objective factors. These factors can include the relevance criteria as presented by lead counsel and the logistics of the review (e.g., How complex is the review in terms of the number of issues and parties? How many reviewers are there? What is the timeline? Is the review happening in different geographical locations and time zones? What is the review workflow, and is it well formulated?).

Technology factors. These can include the review platform (e.g., How easy is it for counsel and reviewers to navigate through documents? How fast is the tool? Are there technological limitations hindering the review?).

Figure 1: Relevance triangle

Ultimately, the determination of document relevance involves the interplay among these three factors and can be represented in many ways, as illustrated by Figure 1 and Figure 2.

Figure 2: Relevance strata

Apart from these factors, the reviewer is also working out patterns while trawling through the documents, intuitively and often subconsciously, drawing on the matter, their training and the information provided before the review began.

Research indicates that reviewers look for clues and triggers in the document. While there is no broad consensus, these triggers can generally be grouped into the following categories:

  • Overt and latent semantic content: Topic, quality, depth, scope, treatment, clarity.
  • Object: Characteristics of the document, e.g. type, organization, representation, format, availability, accessibility, information flow, threads.
  • Validity: Accuracy of information provided, authority, trustworthiness of sources, verifiability with coding docket.
  • Situational match: Appropriateness to situation or tasks, urgency.
  • Belief match: Credence given to information, acceptance as to truth, reality, confidence.

Reviewers’ Learning Curve as a Factor

As a reviewer progresses through documents, the reviewer’s knowledge of the matter and comfort with the review platform improve. As they do, the accuracy of document relevance determinations also improves.

We typically see more coding inconsistencies within the first five days of a review project. During this time, the reviewers are learning the matter as well as getting a feel for the underlying documents. A reviewer’s understanding and knowledge of the review platform can either benefit or hinder the ability to accurately assess document relevance.

By way of illustration, many of you have probably answered market research surveys. You may have noticed that the same question was asked three different times, using different terms and in different sections of the questionnaire. Ever wondered why? The researcher is checking the reliability and consistency of each respondent’s answers over time. Invariably, some inconsistency shows up in the responses.

Another example is the study conducted by Ellen M. Voorhees (2000) with TREC data. One set of reviewers was asked to review a random sample of documents and assess them for relevance under pre-defined criteria. A second team then assessed the same set of documents (already coded for relevance). Analysis of the results showed that, of the documents the first set of reviewers considered relevant, only 80% were considered relevant by the second set. So what changed? The documents and the pre-defined criteria were the same, but the reviewers were different.
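The Voorhees comparison can be made concrete with a few lines of code. The minimal Python sketch below scores two review passes over the same documents in three ways: the share of the first team’s relevant calls that the second team confirms (the kind of figure behind the 80 percent result above), the overlap between the two relevant sets (intersection divided by union, a stricter measure used in assessor-agreement research), and raw agreement on every call. The function name and the ten-document data set are hypothetical.

```python
def agreement(first: dict[str, bool], second: dict[str, bool]) -> dict[str, float]:
    """Compare two relevance passes; each maps a document ID to True (relevant) or False."""
    if first.keys() != second.keys():
        raise ValueError("both passes must cover the same documents")
    rel1 = {doc for doc, relevant in first.items() if relevant}
    rel2 = {doc for doc, relevant in second.items() if relevant}
    both, either = rel1 & rel2, rel1 | rel2
    same_call = sum(first[doc] == second[doc] for doc in first)
    return {
        # Share of first-pass relevant calls that the second pass confirmed.
        "confirmed": len(both) / len(rel1) if rel1 else float("nan"),
        # Relevant to both passes / relevant to either pass.
        "overlap": len(both) / len(either) if either else float("nan"),
        # Agreement on every call, relevant or not.
        "percent_agreement": same_call / len(first),
    }


# Hypothetical ten-document sample coded by two teams.
first_pass = {f"D{i}": i <= 5 for i in range(1, 11)}               # D1-D5 relevant
second_pass = {f"D{i}": i in (1, 2, 3, 4, 6) for i in range(1, 11)}

for name, value in agreement(first_pass, second_pass).items():
    print(f"{name}: {value:.2f}")
# confirmed: 0.80
# overlap: 0.67
# percent_agreement: 0.80
```

Here the second team confirms 80 percent of the first team’s relevant calls, yet the two relevant sets overlap by only about two-thirds, so a QC report should always state which measure it uses.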

Best Practices to Prevent Inconsistent Relevance Assessments

By now you get the picture. Relevance is a slippery and subjective concept. As I said at the outset, a certain number of inconsistent review calls are not only to be expected, but are unavoidable. That does not mean, however, that you should throw your hands up in defeat. There are various best practices you should employ in review in order to minimize the number and impact of relevance miscalls.

  • Plan, plan, plan. It is imperative to walk through a review workflow and to integrate counsel into each stage of the quality control and assurance process.
  • Provide repeatable, detailed case training and easily accessible case documents. If you use an outside review team, consider recording their training session. A recording lets the team revisit issues that come up repeatedly and can double as a training tool for new reviewers.
  • Choose a good team. In most cases, the review team can make or break the case. Hence, choose wisely.
  • Implement a thorough review workflow. The workflow should include escalation points. Supplement it with a detailed review journal outlining your expectations for reviewers. Clearly identify the issues and the criteria for relevance. Make certain to find example documents to use for training. Address the logistics of the review, including how quickly reviewers should move through documents and how they should handle parent–child relationships, email threads, near duplicates, redactions and annotations. Our suggestion is to create “cheat sheets” for easy reviewer reference throughout the review project.
  • Identify a senior attorney as a dedicated subject-matter expert. During the initial days of a review, strict quality control procedures are essential for catching coding inconsistencies. Involving counsel in those first days, to answer questions, offer insight, clarify what may seem like minor points, take part in hands-on quality control and provide ad hoc, on-demand training, can greatly improve the accuracy and consistency of document coding.
  • Build in tools for quality control. Your review process should include QC checks on coding calls for issues, privilege, responsiveness and so on. The process should provide timely reports on review quality, accuracy, speed, code reversals and similar measures. The system should alert you to inconsistent coding within document families; for example, if a parent document is tagged responsive, the tool should be able to apply the same tag (or a related family-level tag) to all family members. It is good practice to use automated rules and intelligent auto-coding methodologies, as well as bulk updates. To ensure uniformity and prioritize review, consider leveraging software analytics such as classifiers, clustering and e-mail threading.
  • Second-level QC plan. This should be developed by senior attorneys specifically to address the case at hand.
  • Review audits. Conduct these periodically and document findings, successes and failures.
  • Employ sampling. Sample documents extensively for relevance and privilege.
  • Post-review audit. Conduct an extensive post-review audit and QC check.
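The family-level alert described under the quality-control tools above can be sketched in a few lines. This is a minimal illustration, not any review platform’s actual feature: it assumes a hypothetical data model in which each document records its parent’s ID (one level deep, such as an e-mail and its attachments) and a responsiveness call that may still be blank. One function flags families whose coded members disagree; the other applies a responsive parent’s tag to its children and returns what it changed so the change can be audited.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    parent_id: Optional[str]    # None for a top-level document such as an e-mail
    responsive: Optional[bool]  # None means not yet coded


def family_of(doc: Doc) -> str:
    # Assumes one level of nesting: attachments point directly at their parent.
    return doc.parent_id if doc.parent_id is not None else doc.doc_id


def inconsistent_families(docs: list[Doc]) -> dict[str, set[bool]]:
    """Flag families whose coded members disagree on responsiveness."""
    calls: dict[str, set[bool]] = defaultdict(set)
    for d in docs:
        if d.responsive is not None:
            calls[family_of(d)].add(d.responsive)
    return {fam: c for fam, c in calls.items() if len(c) > 1}


def propagate_responsive_parent(docs: list[Doc]) -> list[str]:
    """Tag every child of a responsive parent as responsive; return the changed IDs."""
    responsive_parents = {d.doc_id for d in docs if d.parent_id is None and d.responsive}
    changed = []
    for d in docs:
        if d.parent_id in responsive_parents and d.responsive is not True:
            d.responsive = True
            changed.append(d.doc_id)
    return changed


docs = [
    Doc("E1", None, True),
    Doc("E1.1", "E1", False),  # attachment coded differently from its parent
    Doc("E1.2", "E1", None),   # attachment not yet coded
    Doc("E2", None, False),
    Doc("E2.1", "E2", False),
]
print(sorted(inconsistent_families(docs)))  # ['E1']
print(propagate_responsive_parent(docs))    # ['E1.1', 'E1.2']
print(inconsistent_families(docs))          # {}
```

Whether a tool should propagate tags automatically or only raise the alert is a decision for counsel; an attachment may be non-responsive in its own right, so any propagation should leave an audit trail like the returned list of IDs.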

These best practices won’t make relevance any less slippery a concept to nail down. But they can help prevent the review process from slipping out of your control.

*Rupa Bhatt is employed by Catalyst as a member of the Catalyst Consulting team.

A Primer on the Use of Sampling in E-Discovery, Part II

By Rupa Bhatt*

In the first of my two posts on the use of sampling in e-discovery, I described generally the methods used in sampling and defined some of the key terminology used in the field. In this post, I will discuss more specifically the types of statistically valid sampling methods and explain how a sample set is selected.

In deciding which type of sample to use, let the objectives of the review guide you. When it is not necessary to make a statement about an entire population, a judgmental sample is appropriate, as discussed in Part I. When judgmental sampling does not adequately support the review’s conclusions, one of the following four statistical sampling methods, which support more reliable conclusions, should be used:

  • Discovery sampling is appropriate when the population to be reviewed has a high potential for integrity breakdowns or the review objective is to detect significant inconsistencies in coding calls. It is used to identify at least one suspected occurrence, assuming a given number of such occurrences exist in the population.
  • Stop-or-go sampling involves sampling a document universe in increments and examining each incremental sample before deciding whether to stop or keep sampling. Appropriate for preliminary sampling and for testing search results, it lets reviewers determine, with the smallest possible sample, whether an error rate exceeds a predetermined level. This method provides assurance, at a fixed degree of confidence, that the error rate in a population is below a predetermined acceptable error rate. While it does not provide an estimate of the actual error rate, it can readily be converted into attribute sampling, which does.
  • Attribute sampling estimates how often a particular characteristic (or trend) appears in a document population, at a given confidence level and within a specified range of precision. It is useful for testing systems of internal quality control because it estimates the number of occurrences (errors) in a population with a known degree of reliability. It can be used, for example, to check the coding calls of a reviewer or a set of reviewers.
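    Let’s say you have a document set that consists mainly of e-mails between Person A and Person B within a certain date range. After reviewing a random sample of 100 of these documents, the reviewers have classified 95 as privileged. That result does not show that all e-mail between Person A and Person B in that date range is privileged; the five non-privileged documents in the sample already show otherwise. What we can say is that roughly 95 percent of the population is likely privileged, with a range of precision that depends on the sample size and the confidence level: at a 95 percent confidence level, a sample of 100 supports a range of roughly 89 to 98 percent (using the Wilson score interval).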
  • Variable sampling is used for resource planning and to “guesstimate” dollar values, time spans and other variables at given confidence levels and ranges of precision.
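Two of these methods reduce to short formulas. The Python sketch below assumes a large population, so the binomial approximation applies; neither formula comes from this post, both are standard textbook results, and the function names are hypothetical. `discovery_sample_size` returns the smallest random sample that has at least the stated probability of containing one or more occurrences when the true rate is at least `rate`. The same number is the zero-error sample for stop-or-go sampling: if that many documents turn up no errors, you can conclude at that confidence level that the error rate is below `rate`. `wilson_interval` gives an attribute-sampling estimate together with its range of precision.

```python
import math
from statistics import NormalDist


def discovery_sample_size(rate: float, confidence: float) -> int:
    """Smallest n with P(at least one occurrence in n draws) >= confidence.

    Binomial (large-population) approximation: 1 - (1 - rate)**n >= confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))


def wilson_interval(hits: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for the share of the population with the attribute."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


print(discovery_sample_size(rate=0.01, confidence=0.95))  # 299
print(discovery_sample_size(rate=0.05, confidence=0.95))  # 59
low, high = wilson_interval(hits=95, n=100)
print(f"{low:.3f} to {high:.3f}")  # 0.888 to 0.978
```

The last line applies attribute sampling to the Person A/Person B scenario: 95 privileged documents in a random sample of 100 supports an estimated privilege rate of roughly 89 to 98 percent at 95 percent confidence, not a finding that every document is privileged.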

Sample Size

Sample sizes should be determined after the review objectives and tests are established and the sample type is chosen. For judgmental samples, reviewers should set the sample size based on the type of review and the review circumstances. A sample of 50 or more items can, in many cases, be evaluated statistically if desired. Therefore, before choosing a judgmental sample of more than 50 items, reviewers should consider statistical sampling, which yields more reliable, valid and, most importantly, defensible conclusions.
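For a statistical (attribute) sample, one common way to set the size is the normal-approximation formula n = z²·p(1 − p)/e², where z is the critical value for the confidence level, p the expected rate and e the desired plus-or-minus precision, optionally reduced by the finite population correction. The sketch below illustrates that formula under those assumptions rather than prescribing a method; `expected_rate = 0.5` is the conservative default when nothing is known about the population, and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def attribute_sample_size(margin: float, confidence: float = 0.95,
                          expected_rate: float = 0.5,
                          population: Optional[int] = None) -> int:
    """Documents to sample to estimate a rate within +/- margin at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z**2 * expected_rate * (1 - expected_rate) / margin**2
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return math.ceil(n)


print(attribute_sample_size(margin=0.05))                     # 385
print(attribute_sample_size(margin=0.05, population=10_000))  # 370
print(attribute_sample_size(margin=0.05, confidence=0.90))    # 271
```

Notice how little the population size matters once it is large: about 370 documents for a 10,000-document population versus 385 for an effectively unlimited one, which is part of why sampling scales well to large ESI collections.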

Sample Selection

After determining sample type and size, the objective is to select an unbiased sample of review documents. Unbiased samples are normally selected by the following methods:

  • Random number selection is generally considered the best method for selecting a true random sample. As an example, if we give every document in a legal repository a unique ID number, we can, through a random document ID generator, pull out a sample set of random documents equal to the desired sample size.
  • Interval selection means choosing the first item at random and then selecting items at a fixed interval from there. This method assumes the population is randomly ordered, with no pattern that repeats at the same interval. For example, if Docid 7 is picked first and the interval is seven, every seventh document from there onward (Docid 14, 21, 28 and so on) is included in the sample set.
  • Stratified selection is accomplished by dividing a population into two or more segments (strata). Samples are selected from each segment using random number or interval selection and analyzed, and the results are then combined into an estimate for the entire population. Stratifying can help reduce sample size and the distorting effect of wide numerical variances. This method of selection is also effective for determining the characteristics of a particular portion of a population. This is the methodology we recommend for privilege review.
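The three selection methods can be sketched with Python’s standard `random` module. This is a minimal illustration assuming each document has a unique ID; the DOC00001-style IDs and the two strata (documents from an attorney e-mail domain versus everything else) are hypothetical. Passing a seed makes a sample reproducible, which helps when the selection has to be documented and defended. The stratified version uses proportional allocation; in privilege review you might instead deliberately oversample the higher-risk stratum, and because each stratum’s share is rounded, the total can differ slightly from `n`.

```python
import random
from typing import Optional


def random_selection(doc_ids: list[str], n: int, seed: Optional[int] = None) -> list[str]:
    """Simple random sample: every document has the same chance of selection."""
    return random.Random(seed).sample(doc_ids, n)


def interval_selection(doc_ids: list[str], n: int, seed: Optional[int] = None) -> list[str]:
    """Interval (systematic) sample: random start, then every k-th document."""
    if not 0 < n <= len(doc_ids):
        raise ValueError("n must be between 1 and the population size")
    k = len(doc_ids) // n                     # fixed sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return doc_ids[start::k][:n]


def stratified_selection(strata: dict[str, list[str]], n: int,
                         seed: Optional[int] = None) -> dict[str, list[str]]:
    """Random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(ids) for ids in strata.values())
    picks = {}
    for name, ids in strata.items():
        share = max(1, round(n * len(ids) / total))  # at least one per stratum
        picks[name] = rng.sample(ids, min(share, len(ids)))
    return picks


doc_ids = [f"DOC{i:05d}" for i in range(1, 10_001)]
strata = {"attorney_domain": doc_ids[:1_000], "other": doc_ids[1_000:]}

print(len(random_selection(doc_ids, 370, seed=1)))    # 370
print(len(interval_selection(doc_ids, 370, seed=1)))  # 370
picks = stratified_selection(strata, 370, seed=1)
print({name: len(ids) for name, ids in picks.items()})  # {'attorney_domain': 37, 'other': 333}
```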

The use of sampling in e-discovery is still in its early stages of adoption. But with document-by-document review becoming logistically and financially infeasible, sampling will increasingly be a key tool for lawyers and e-discovery professionals.

*Rupa Bhatt is employed by Catalyst as a member of the Catalyst Consulting team.

A Primer on the Use of Sampling in E-Discovery, Part I

By Rupa Bhatt*

The traditional approach to privilege review is to look at each document individually and tag it with the appropriate code. But with the quantity of electronically stored information exploding exponentially, it is increasingly impractical, if not impossible, to conduct document-by-document review.

This state of affairs has spurred more frequent use of sampling to exclude documents unlikely to be privileged and to narrow the number of documents needing eyes-on review. The concept that underlies this was authoritatively described by U.S. Magistrate Judge John M. Facciola and Nixon Peabody partner Jonathan M. Redgrave in their November 2009 article, Asserting and Challenging Privilege Claims in Modern Litigation: The Facciola-Redgrave Framework. Several court decisions have endorsed the use of sampling in e-discovery, most famously Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008), and e-discovery guides such as the Maryland protocol expressly refer to the use of “sampling to search rather than searching all of the records.”

So what, exactly, is sampling all about? The concept comes from the world of statistics and is broadly applied in any number of common circumstances. Generally, sampling follows one of three methods:

  • Judgmental sampling is a method in which not every item has an equal chance of being selected. Typically, judgmental samples are used when staff or time resources are limited or there is no need to generalize about a population. For example, a judgmental sample may be sufficient to show a control weakness or to prompt management to take corrective action.
  • Random sampling is a method in which each member of the population has an equal chance of being selected. This guards against bias in the sample selection. However, a random sample does not by itself amount to a “statistical sample”: without a defined confidence level and precision, its results cannot be projected to the population. A random, non-statistical sample would typically be used to show that the results were not biased or exaggerated by, for example, selecting known cases of noncompliance.
  • Statistically valid sampling is a method that combines random sampling with additional criteria, including confidence level, confidence intervals, expected error rate and precision. This method is used to make a statement about the population from which the sample was selected. Using this method, outcome measures can be projected to the population.

Sampling has its own terminology. Some common terms include:

  • Confidence level is the degree of probability that sample results are representative of actual conditions in the population. For example, a confidence level of 90 percent means that if the same sampling procedure were repeated many times, about 90 out of every 100 resulting estimates would capture the true population characteristic.
  • Precision is the range within which the estimate of the population characteristic is expected to fall at the stipulated confidence level. It is expressed as a plus-or-minus figure: in percentages for attribute samples and in units for variable samples. Confidence level and precision are integral parts of the same mechanism, and each affects the other: for a given sample size, a higher confidence level means a wider range. For example, in the statement, “We estimate the error rate in performing this activity is between 10 and 15 percent,” the range spans 5 percentage points, so the precision is +/- 2.5 percentage points.
  • Sample reliability is the confidence level and precision achieved by a statistical sample. The statement, “We are 90 percent confident the actual error rate is between 10 and 15 percent,” is an example of sample reliability.
  • Stratified selection means dividing the population into two or more segments (strata) and taking a sample from each stratum. Sample results from the separate strata may be combined into an estimate for the entire population.
  • Standard deviation is the measure of the variability of a particular population or of a sample from that population.
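To see how these terms interact, the short Python sketch below uses a hypothetical QC sample of 400 documents containing 50 coding errors. It computes the plus-or-minus margin (the precision) around the observed 12.5 percent error rate at two confidence levels using the normal approximation, then the standard deviation of a hypothetical variable sample of review times in minutes per document. The function name and all numbers are invented for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def margin_for_rate(rate: float, n: int, confidence: float) -> float:
    """Normal-approximation +/- margin (precision) for a rate observed in n documents."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z * math.sqrt(rate * (1 - rate) / n)


errors, n = 50, 400  # hypothetical QC sample
for confidence in (0.90, 0.95):
    m = margin_for_rate(errors / n, n, confidence)
    print(f"{confidence:.0%} confident: {errors / n:.1%} +/- {m:.1%}")
# 90% confident: 12.5% +/- 2.7%
# 95% confident: 12.5% +/- 3.2%

minutes_per_doc = [1.5, 2.0, 2.5, 3.0, 1.0, 2.0]  # hypothetical variable sample
print(f"mean {mean(minutes_per_doc):.2f}, sd {stdev(minutes_per_doc):.2f}")
# mean 2.00, sd 0.71
```

Holding the sample fixed, raising the confidence level from 90 to 95 percent widens the range, which is the sense in which confidence level and precision each affect the other.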

These are some general concepts about sampling. In the second half of this two-part post, I will go into more detail on the primary methods of sampling and on the mechanics of selecting a sample set.

*Rupa Bhatt is employed by Catalyst as a member of the Catalyst Consulting team.