By Rupa Bhatt*
The traditional approach to privilege review is to look at each document individually and tag it with the appropriate code. But with the volume of electronically stored information growing so rapidly, document-by-document review is increasingly impractical, if not impossible.
This state of affairs has spurred more frequent use of sampling to exclude documents unlikely to be privileged and to narrow the number of documents needing eyes-on review. The underlying concept was authoritatively described by U.S. Magistrate Judge John M. Facciola and Nixon Peabody partner Jonathan M. Redgrave in their November 2009 article, Asserting and Challenging Privilege Claims in Modern Litigation: The Facciola-Redgrave Framework. Several court decisions have endorsed the use of sampling in e-discovery, most famously Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008), and e-discovery guides such as the Maryland protocol expressly refer to the use of “sampling to search rather than searching all of the records.”
So what, exactly, is sampling all about? The concept comes from the world of statistics and is broadly applied in any number of common circumstances. Generally, sampling occurs based on one of three methods:
- Judgmental sampling is a method in which not every item has an equal chance of being selected. Typically, judgmental samples are used when staff or time resources are limited or there is no need to generalize about a population. For example, a judgmental sample may be sufficient to show a control weakness or to prompt management to take corrective action.
- Random sampling is a method in which each member of the population has an equal chance of being selected. This ensures that no bias enters the sample selection. However, a random sample is not, by itself, a “statistical sample,” so the results of a random, non-statistical sample cannot be projected to the population. Such a sample would typically be used as a way of emphasizing that the results were not biased or exaggerated by selecting, for example, known cases of noncompliance.
- Statistically valid sampling is a method that combines random sampling with additional criteria, including confidence level, confidence intervals, expected error rate and precision. This method is used to make a statement about the population from which the sample was selected. Using this method, outcome measures can be projected to the population.
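As a rough illustration of how a statistically valid sample might be sized and drawn, the Python sketch below uses the standard normal-approximation formula for estimating a proportion, with a finite population correction. The function names, the default 50 percent expected rate, and the made-up document IDs are assumptions for illustration only; they do not come from any particular review platform. Here `margin` means the plus-or-minus half-width of the estimate.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, expected_rate=0.5):
    """Approximate sample size needed to estimate a proportion.

    Uses the normal approximation with a finite population correction.
    expected_rate=0.5 is the most conservative (largest-sample) assumption.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2
    # Shrink the sample for a finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def draw_random_sample(doc_ids, n, seed=None):
    """Simple random sample without replacement: every ID is equally likely."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, n)


# Hypothetical collection of 100,000 document IDs.
doc_ids = list(range(1, 100_001))
n = sample_size(len(doc_ids), confidence=0.95, margin=0.05)
review_set = draw_random_sample(doc_ids, n, seed=42)
print(n, len(review_set))
```

Fixing the seed makes the selection repeatable, which may help if the sampling method later has to be explained or defended. A tighter margin or a higher confidence level increases the required sample size.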
Sampling has its own terminology. Some common terms include:
- Confidence level is the degree of probability that sample results are representative of actual conditions in the population. For example, a confidence level of 90 percent means that there are 90 chances in 100 that sample results represent the true population characteristics.
- Precision is the range within which the estimate of the population characteristic will fall at the stipulated confidence level. It is expressed as a plus-or-minus figure: in percentages for attribute samples and in units for variable samples. Confidence level and precision are integral parts of the same mechanism, and each has an effect on the other. For example, in the statement, “We estimate the error rate in performing this activity is between 10 and 15 percent,” the precision is 5 percent (or +/- 2.5 percent).
- Sample reliability is the confidence level and precision achieved by a statistical sample. The statement, “We are 90 percent confident the actual error rate is between 10 and 15 percent,” is an example of sample reliability.
- Stratified selection means dividing the population into two or more segments (strata) and taking a sample from each stratum. Sample results from the separate strata may be combined into an estimate for the entire population.
- Standard deviation is the measure of the variability of a particular population or of a sample from that population.
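To make these terms concrete, here is a minimal Python sketch that turns sample results into a reliability statement and combines stratified results into one estimate. It assumes a simple normal-approximation interval for a proportion; the function names and the sample counts are hypothetical, chosen only to echo the 10-to-15-percent example above.

```python
import math
from statistics import NormalDist


def proportion_interval(hits, n, confidence=0.90):
    """Return (estimate, low, high) for a rate observed in a random sample."""
    p = hits / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard deviation of a yes/no attribute is sqrt(p * (1 - p));
    # dividing by sqrt(n) gives the standard error of the sample rate.
    std_dev = math.sqrt(p * (1 - p))
    half_width = z * std_dev / math.sqrt(n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def stratified_estimate(strata):
    """Combine per-stratum sample rates into one population-wide rate.

    strata: list of (stratum_size, sample_size, hits) tuples.
    Each stratum's rate is weighted by its share of the population.
    """
    total = sum(size for size, _, _ in strata)
    return sum(size / total * (hits / n) for size, n, hits in strata)


# Hypothetical result: 50 errors found in a random sample of 400 items.
rate, low, high = proportion_interval(50, 400, confidence=0.90)
print(f"90% confident the rate is between {low:.1%} and {high:.1%}")

# Hypothetical strata: (stratum size, sample size, hits in sample).
print(stratified_estimate([(8000, 200, 10), (2000, 100, 30)]))
```

The width of the interval (`high - low`) corresponds to precision as defined above, and the confidence level together with that interval is the sample reliability. For very small samples or rates near zero, an exact or Wilson interval would likely be a better choice than this approximation.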
These are some general concepts about sampling. In the second half of this two-part post, I will go into more detail on the primary methods of sampling and on the mechanics of selecting a sample set.
*Rupa Bhatt is employed by Catalyst as a member of the Catalyst Consulting team.