You don’t have to read too many legal documents to know that lawyers love to use acronyms and abbreviations. Why is it, then, that lawyers sometimes forget about acronyms and abbreviations when constructing keyword searches? If you’re going to search for “Federal Trade Commission,” shouldn’t you also search for “FTC”?
A recent order by a U.S. magistrate judge in the Southern District of California is a reminder of why lawyers should never forget about acronyms and abbreviations in e-discovery. Further, the ruling underscores the importance of thinking about these terms early on in a case.
In this antitrust matter, discovery was fairly far along before the plaintiffs realized something was missing. As the defendants began to produce documents in response to plaintiffs’ discovery requests, the plaintiffs noticed that some of the defendants routinely referred to each other using abbreviations or acronyms. Upon learning this, they filed a motion seeking to compel the defendants to run document searches containing these abbreviations and acronyms.
That might seem to be a reasonable request. But U.S. Magistrate Judge Louisa S. Porter denied the motion. To understand why, we need to go back to the beginning of discovery in this case.
The Perils of Cooperation
Before the defendants responded to plaintiffs’ discovery requests, they notified the plaintiffs that they intended to use keyword searches to find relevant documents. They asked the plaintiffs to provide search terms, but the plaintiffs contended they could not do this because they lacked sufficient information at that point to construct meaningful searches. The defendants went ahead and created their own list of search terms, which they then showed to plaintiffs. Upon seeing it, plaintiffs protested that the terms were too restrictive and were unlikely to capture some highly relevant documents. At that point, the two sides sat down and negotiated a list of agreed-upon search terms. The negotiated list included several terms specifically targeted to capturing defendant-to-defendant communications.
With the parties having agreed on search terms, the defendants began to produce documents. From the court’s opinion, it appears that it was through review of those produced documents that plaintiffs discovered the frequent use of abbreviations and acronyms. Thus came plaintiffs’ motion and thus we arrive back at the magistrate judge’s denial of that motion.
Plaintiffs had ‘Ample Opportunity’
As I began to read the judge’s reasoning, I was sure she was heading towards ruling in the plaintiffs’ favor. She starts out talking about the Sedona principles of cooperation. She talks about the need for keyword searching to be “a cooperative and informed process.” She emphasizes the importance of “a full and transparent discussion among counsel of the search terminology.” All of this had me thinking she was about to chastise the defendants for their collective failure to mention the acronyms and abbreviations.
Instead, it was the plaintiffs whom the judge chastised. Her message to them–these are my words, not hers–was this: “You had plenty of chances to ask about abbreviations and acronyms and you should have thought to do so.”
Here is how she put it:
Here, the Court finds Plaintiffs had ample opportunity to obtain discovery regarding abbreviations and acronyms of Defendant companies, and the burden or expense to Defendants in having to comply with Plaintiffs’ request regarding abbreviations and acronyms outweighs its likely benefit. … First, Plaintiffs had two separate opportunities to suggest that Defendants search for abbreviations and acronyms of the Defendant companies; initially, before Defendant’s produced documents; and second, during negotiations between the parties on agreed-upon expanded search terms. In the spirit of the conclusions made at the Sedona Conference, and in light of the transparent discussion among counsel of the search terminology and subsequent agreement on the search method, the Court finds it unreasonable for Defendant to re-search documents they have already searched and produced.
Second, after meeting and conferring with Plaintiffs, and relying on their agreement with Plaintiffs regarding search terms, Defendants have already searched and produced a significant number of documents, thereby incurring significant expenses during this limited discovery period. Further, as articulated by Defendants, the new search terms Plaintiffs have proposed would require some Defendants to review tens of thousands of additional documents that would likely yield only a very small number of additional responsive documents. Therefore, the Court finds a re-search of documents Defendants have already searched and produced is overly burdensome.
Without knowing more about the precise abbreviations and acronyms involved in this case, it’s hard to judge whether this was a fair outcome. But it sure strikes me the wrong way.
In the normal course, search queries should use stemming, fuzzy searching, phonic searching and similar techniques to find similar or related words. But what if a term is off the map, so to speak? What if internal emails regularly referred to the company CEO as “Col.” because he had long ago been an officer in the military? How would the opposing party know, in advance of any discovery, to search for this? No stemming or fuzzy searching would pick that up. On the other hand, if you’re suing General Motors and neglect to search for GM, then you have no one to blame but yourself.
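To make the General Motors point concrete, here is a minimal sketch of how a term list might be expanded before negotiations begin. It is an illustration only: the `initialism` and `expand_terms` helpers, the stopword list, and the `known_aliases` argument are all hypothetical names, not features of any particular review platform. Note what it cannot do: a nickname like “Col.” can never be derived mechanically, so it has to come in through `known_aliases`, which is to say by asking the other side.

```python
import re

# Small words usually skipped when forming an initialism (an assumed, incomplete list).
STOPWORDS = {"of", "the", "and", "for", "in"}


def initialism(name):
    """Build a naive initialism, e.g. 'Federal Trade Commission' -> 'FTC'."""
    words = re.findall(r"[A-Za-z]+", name)
    return "".join(w[0].upper() for w in words if w.lower() not in STOPWORDS)


def expand_terms(names, known_aliases=None):
    """Map each name to itself, its initialism, and any aliases supplied by hand."""
    known_aliases = known_aliases or {}
    expanded = {}
    for name in names:
        variants = [name]
        short = initialism(name)
        if len(short) > 1:  # a one-letter initialism is useless as a search term
            variants.append(short)
        # Aliases that no algorithm could guess (nicknames, internal shorthand).
        variants.extend(known_aliases.get(name, []))
        expanded[name] = variants
    return expanded


terms = expand_terms(
    ["Federal Trade Commission", "General Motors"],
    known_aliases={"General Motors": ["G.M."]},
)
for name, variants in terms.items():
    print(name, "->", variants)
```

A list generated this way is only a starting point for discussion with opposing counsel; short initialisms can also match unrelated text, so they still need to be tested against the data.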
The lesson here, I suppose, is to remember that you don’t know what you don’t know. In the case of acronyms and abbreviations, anticipate and ask–so you don’t find yourself having to C.Y.A. later on.
In Re: National Association of Music Merchants, Musical Instruments and Equipment Antitrust Litigation, MDL No. 2121 (Dec. 19, 2011).
Another great post, Bob. Thanks.
You write, “In the normal course, search queries should use stemming, fuzzy searching, phonic searching and similar techniques to find similar or related words.” I used to say much the same thing, but lately find that the methods which mechanically vary spellings–especially fuzzy searching–inject such a landslide of false hits that I’ve reconsidered my position.
I suggest that, while the search methods you mention have their place, the superior approach is to test your searches on a small-but-sensibly-selected sample of the data to be searched and assess whether the terms are sufficiently precise and achieving the recall of data sought. This better reflects the ideal of pairing the power of mechanized search with the insight of human reviewers.
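As a rough illustration of that kind of sample testing, the sketch below scores candidate term lists against a handful of documents that a reviewer has already coded. Everything in it is made up for the example: the four sample documents, the reviewer’s coding, and the `search` and `precision_recall` helpers. Precision is the share of hits that are actually relevant; recall is the share of relevant documents that the search found.

```python
import re


def search(docs, terms):
    """Return the ids of documents containing any term as a whole word or phrase."""
    pattern = re.compile(
        "|".join(r"\b" + re.escape(t) + r"\b" for t in terms), re.IGNORECASE
    )
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


def precision_recall(hits, relevant):
    """Compute (precision, recall) for a set of hits against reviewer-coded ids."""
    true_hits = hits & relevant
    precision = len(true_hits) / len(hits) if hits else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample; a human reviewer coded documents 1 and 2 as relevant.
sample = {
    1: "The Federal Trade Commission opened an inquiry.",
    2: "FTC staff asked about our pricing policy.",
    3: "Lunch order for the trade show booth.",
    4: "Per the commission schedule, sales reps get 5%.",
}
relevant = {1, 2}

for terms in (["Federal Trade Commission"],
              ["Federal Trade Commission", "FTC"],
              ["commission"]):
    p, r = precision_recall(search(sample, terms), relevant)
    print(terms, "precision:", p, "recall:", r)
```

In this toy sample the full name alone misses half of the relevant documents, adding the acronym recovers them, and the bare word “commission” pulls in a false hit. A real sample would need to be far larger and chosen with care before the numbers mean anything.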
I also recommend anticipating common misspellings rather than employing fuzzy search. Certain transpositions and spelling errors are far more common (e.g., “i” before “e” errors, hitting adjacent letters on a keyboard), and so there’s no need to search for every variant when you can search for the variants that occur with the greatest frequency. Here again, testing a sensible sample allows for the identification of the variants most pertinent to the information sought.
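One way to act on this, sketched under simplifying assumptions: generate the adjacent-letter transpositions of a term, then count which of them actually appear in a sample, so that only the variants that occur get added to the search. The helper names and the three sample sentences are invented for illustration, and transposition is only one class of error; adjacent-key typos would need a keyboard layout table that is not shown here.

```python
import re
from collections import Counter


def transpositions(word):
    """All spellings produced by swapping one pair of adjacent letters."""
    swapped = {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    }
    return swapped - {word}  # swapping identical letters reproduces the word


def count_variants(word, texts):
    """Count how often each transposed spelling of `word` occurs in the sample."""
    variants = transpositions(word.lower())
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in variants:
                counts[token] += 1
    return counts


# Hypothetical sample text standing in for a slice of the collection.
sample_texts = [
    "Did you recieve the invoice?",
    "We will receive it Monday.",
    "Please confirm you recieve this.",
]
print(count_variants("receive", sample_texts))
```

Only “recieve” turns up in this sample, so it is the one misspelling worth adding; the other transpositions can be left out rather than swept in by a fuzzy search.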
Thanks again for the useful post.
Thank you, Mr. Ball, for the excellent recommendations. I completely agree with you.
In research, we normally implement a survey on a sample population to establish the content validity of the questions, and based on the results, tweak the questions accordingly and re-apply to a different population until we establish both content validity and reliability. This is what is known as a “pilot study.” What Craig is suggesting is an equivalent of that…in terms of precision and recall.
His second recommendation is what we already practice, especially with names that may be spelled any number of ways. We can always extend it to include common misspellings.
Both recommendations are time-intensive and place importance on pre-planning and establishing the search methodology before taking a deep dive into review.
Thanks again for your valuable comments.
Excellent points, Craig. I agree. A few additional thoughts:
1. Analytics. Use of many analytical tools might prevent this problem. For example, many of the “predictive coding” tools would likely have found documents with the acronyms.
2. Ethics and Stipulations. It is an adversary system. Nevertheless, I have felt queasy when the producing party “sandbags” the requesting party into stipulating to terms that are incomplete, which seems to happen often. I think that, at a minimum, the plaintiff should have required a clause in the stipulation that if the defendant finds alternate terms for people, entities or other terms on the list, those additional terms will be disclosed and added to the list. If there is to be a stipulated list, it shouldn’t be cast in concrete.
3. Tiered approach. At Sedona meetings and other conferences, judges recommend an iterative approach, with terms discussed again after preliminary searching and initial production of a representative sample. Requesting parties shouldn’t get locked in to an initial list, which is sure to be both under-inclusive and over-inclusive.