Catalyst Repository Systems - Powering Complex Legal Matters

E-Discovery Search Blog

Technology, Techniques and Best Practices
Jim Eidelman

About Jim Eidelman

A veteran trial lawyer and pioneer in legal technology, Jim Eidelman is the senior consultant on Catalyst's Search and Analytics Consulting team. With more than three decades of hands-on experience providing technology advice to law firms and legal departments, he helps clients develop strategies to overcome their toughest e-discovery challenges.

After starting out as a securities litigator, Jim launched a consulting firm in 1980 dedicated to helping lawyers enhance their practices through technology. He became recognized nationwide as a speaker and writer on legal technology. The book he co-edited with John Tredennick, Winning with Computers, became an ABA bestseller. Software he developed for legal-document assembly was praised for having the ability to "think like a lawyer."

After Plaintiff Takes Sledgehammer to Computer, Case Dismissed for Spoliation

Could you hear me laughing from down the hall?

The plaintiff in an employment case, when ordered to turn over his computers for inspection, destroyed all evidence.

He protested that the evidence was “irrelevant,” but the judge wouldn’t buy that.

Among the more interesting aspects:

  1. The plaintiff first wiped the hard drive of his failing desktop, took a sledgehammer to the hard drive and to the computer, and then threw them both in the landfill.
  2. The plaintiff also installed the programs Evidence Eliminator and CCleaner and removed all evidence from his laptop.
  3. All of this happened after he retained counsel, who told him he must preserve all emails and other documents.
  4. The plaintiff’s wiping of data from the laptop occurred a few days after the magistrate ordered him to submit it for inspection within seven days.
  5. In an email, the plaintiff told counsel that if the court ordered him to turn over the laptop, he would copy the data to a CD and wipe the hard drive.  According to the findings of fact, “At the conclusion of the e-mail he jokes that ‘an electrical surge just fried my computer and a 50 pound anvil fell over and landed on it’ and asks ‘what penalties [he would] suffer from a contempt of court citation.’”

I was laughing out loud as I read this, but neither the magistrate nor the district judge was amused.  The plaintiff’s case was thrown out, and he was ordered to pay the defendant’s costs and attorneys’ fees.

The case is Taylor v. The Mitre Corp., No. 1:11-cv-1247, 2012 WL 5473573 (E.D. Va. Nov. 8, 2012), adopting the recommendations of Taylor v. Mitre Corp., No. 1:11-cv-01247 (LO/IDD), 2012 WL 5473715 (E.D. Va. Sept. 10, 2012).

Thanks to K&L Gates for calling attention to this case on its Electronic Discovery Law blog.

Best Practices in Predictive Coding: When are Pre-Culling and Keyword Searching Defensible?

Predictive coding is an effective e-discovery tool for ranking large sets of documents. However, it is commonly performed in a manner that may be severely under-inclusive–and therefore raise concerns about its defensibility.

In the use of predictive coding, it is a common practice for the producing party to run keyword searches first, and then sample and rank the resulting documents.  The documents that don’t hit on the searches are culled out before reaching the predictive coding process.

The reasons for doing it this way are:

  • Keyword searching is an accepted standard in e-discovery.
  • The client can avoid the per-document cost for the predictive coding software.
  • It reduces the number of documents that need to be reviewed and produced, which reduces time, cost, and risk.

However, this approach ignores the “dirty little secret” of e-discovery search—that keyword searches leave behind a large set of responsive/relevant documents.

Rank ALL (or Most) Documents, Not Just the Hits

The landmark e-discovery study about keyword searching was published in 1985 by Blair and Maron. David C. Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 COMMUNC’NS. OF THE ACM 289, 295 (1985). In that study, the attorneys were confident that their searches had found more than 75% of the responsive documents.  But they were wrong.  In fact, the searches had only found 20% of the relevant documents.

This was the only study on the subject for many years. Keyword searching nevertheless became the accepted practice, largely because it was the best approach available.  More recently, however, studies by TREC and others have shown that Blair and Maron were right. TREC’s 2008 study found that keyword searching returned an average of just 24% of responsive documents. In its 2007 study, the result was 22%. Other studies found even lower recall.

Sampling the leave-behinds and searching iteratively, adding terms discovered during the review, can solve part of the problem, but plenty of relevant documents will still be omitted from the review and production.

This is the reason that going through the predictive coding process against ALL documents, rather than just the keyword search hits, is generally the most defensible practice.  Predictive coding based on sampling ALL the documents will find documents that the keyword searches miss.
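One way to put a number on the leave-behinds is to review a random sample of the documents that did not hit on the keywords and project the rate of relevant documents found there onto the whole non-hit set. The sketch below is illustrative only: the function name and every count in it are hypothetical, not figures from any study, and a real protocol would also report a margin of error.

```python
def estimate_recall(relevant_in_hits, non_hit_count, sample_size, relevant_in_sample):
    """Estimate keyword-search recall from a random sample of the non-hits.

    All inputs are hypothetical review counts, not figures from any study.
    """
    # Share of the sampled non-hits that reviewers coded as relevant.
    elusion_rate = relevant_in_sample / sample_size
    # Project that rate onto every document the keywords left behind.
    est_missed = elusion_rate * non_hit_count
    # Recall = relevant documents found / all relevant documents (found + missed).
    recall = relevant_in_hits / (relevant_in_hits + est_missed)
    return est_missed, recall


missed, recall = estimate_recall(
    relevant_in_hits=20_000,   # relevant documents among the keyword hits
    non_hit_count=900_000,     # documents that did not hit on any keyword
    sample_size=1_500,         # non-hits pulled at random for review
    relevant_in_sample=100,    # sampled non-hits coded relevant
)
print(f"Estimated relevant documents left behind: {missed:,.0f}")
print(f"Estimated recall of the keyword search: {recall:.0%}")
```

With these made-up counts, the keyword hits hold only about a quarter of the relevant documents, even though the rate of relevant documents among the non-hits looks small.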

How to use Keyword Searching with Predictive Coding

Does this mean that you don’t need keyword searching and other pre-culling techniques?  Of course not. Here are some of the ways we use keyword searching in conjunction with predictive coding:

  • Junk removal. We always analyze the documents for “junk” that can obviously be removed.  For example, in the Enron collection, it certainly makes sense to cull out the fantasy football documents first.  In a securities case, there are usually huge numbers of email market letters and irrelevant stock recommendations that can defensibly be removed.
  • Boosting “richness.” In most cases, the predictive coding software won’t work as well if the ratio of relevant documents to irrelevant documents (“richness percent”) is too low, meaning that the relevant documents are “sparse.”  It is legitimate and defensible to use keyword searching to boost the “richness” to a reasonable ratio before starting the predictive coding exercise.  Note, however, that it may make sense later to have the software rank the documents that didn’t hit on the keyword searches, as if it were a later rolling upload, to see if the software finds additional relevant documents.
  • Targeted searches. Often certain terms (or combinations of terms) will serve as a “rifle shot” to find important documents.  For example, in a patent case, a search for a technical term, such as a chemical name, may be important, especially when paired with the name of the inventor or the opposing party.  Targeted searching can be used at the beginning for sampling to find seed documents.  And, of course, it should be used throughout the review to find “rifle shot” documents and documents on sparsely populated issues that do not lend themselves to predictive coding. 
  • Metadata. Many predictive coding applications only analyze text and not metadata.  Obviously, there are many times when searching metadata is a key to finding relevant documents or filtering out irrelevant ones.  For example, in finding privileged communications, it helps to look in the TO and FROM fields to see if there are attorneys and clients in them.  Similarly, date filters are critical in filtering out documents that may contain “relevant” terms but are irrelevant to the issues in the case because of the time frame. 
  • Sampling and discrepancy analysis. While predictive coding applications typically include sampling methodologies, it is nevertheless a good idea to do additional sampling outside of the application, which can be done by searching for documents likely to be relevant. In particular, the discrepancy analysis, which compares software predictions with actual coding, will find documents that were given a low rank by the software.  Once that happens, you can go search for similar documents using “More Like This” and keyword searching. 
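The discrepancy analysis described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not how any particular predictive coding product does it: the field names, the 0-to-1 score scale and the 0.5 cutoff are all assumptions made for the example.

```python
def find_discrepancies(documents, cutoff=0.5):
    """Return IDs of documents where reviewer coding and software score disagree.

    `documents` is a list of dicts with hypothetical keys: 'id', 'score'
    (software relevance score from 0 to 1) and 'coded_relevant' (reviewer call).
    """
    missed_by_software = []     # coded relevant but scored below the cutoff
    overrated_by_software = []  # coded not relevant but scored at or above it
    for doc in documents:
        predicted_relevant = doc["score"] >= cutoff
        if doc["coded_relevant"] and not predicted_relevant:
            missed_by_software.append(doc["id"])
        elif predicted_relevant and not doc["coded_relevant"]:
            overrated_by_software.append(doc["id"])
    return missed_by_software, overrated_by_software


# Hypothetical reviewed sample.
reviewed = [
    {"id": "DOC-001", "score": 0.92, "coded_relevant": True},
    {"id": "DOC-002", "score": 0.08, "coded_relevant": True},
    {"id": "DOC-003", "score": 0.71, "coded_relevant": False},
    {"id": "DOC-004", "score": 0.15, "coded_relevant": False},
]
low_ranked_relevant, high_ranked_irrelevant = find_discrepancies(reviewed)
print("Relevant but ranked low:", low_ranked_relevant)
print("Not relevant but ranked high:", high_ranked_irrelevant)
```

The "relevant but ranked low" list is the natural starting point for the “More Like This” and keyword follow-up searches.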

The Bottom Line

If you run predictive coding with ALL the documents, and not just those that hit on keyword searches, you will find more relevant documents, so the process is more defensible.  But keyword searching and searching metadata are still critical tools in the e-discovery toolbox.

 

In E-Discovery, Even Google Needs Help with Search: Oracle Case is Lesson in the Complexity of Privilege Search

Lest anyone underestimate the complexity of privilege searching in e-discovery, consider the recent case in which none other than search giant Google got tripped up. Although the company’s privilege searches screened out an email labeled “Attorney Work Product,” the searches failed to catch eight of the nine drafts of the same email that the author’s email program had autosaved.

Worse yet, the email was a true “smoking gun.” In fact, the email was so potentially inculpatory that the judge remarked that it and the Magna Carta (common law) would be all that the opponent’s counsel would need to win its case. (It is interesting to note that this was not the only incriminating document, and this document is even more damaging to Google when paired with a 2005 email written by Andy Rubin, Google vice president in charge of Android.)

In this post, we’ll give you some background on the case and then discuss the lessons it teaches about privilege searching in e-discovery.

A Dispute Over Java

Since August 2010, Google has been locked in litigation with Oracle America over whether its Android operating system violates Oracle’s Java patents and copyrights. Within the month before Oracle filed the lawsuit, lawyers for both companies met to discuss the alleged infringement. Ten days later, Google General Counsel Kent Walker convened an internal staff meeting to discuss the matter further. One of the attendees was Google software engineer Tim Lindholm.

A week later, as a follow-up to that meeting and a prelude to another, Lindholm wrote an email addressed to Google senior counsel Ben Lee and Andy Rubin. The email was labeled “Attorney Work Product” and “Google Confidential.” In part, it said:

What we’ve actually been asked to do (by Larry and Sergey) is to investigate what technical alternatives exist to Java for Android and Chrome. We’ve been over a bunch of these, and think they all suck. We conclude that we need to negotiate a license for Java under the terms we need.

During the four minutes it took Lindholm to write this email, his computer auto-saved it nine times. Each time, the “to” and “cc” fields were still blank and the email was not labeled with the footer. Only the tenth and final version identified the recipients and included the labels for “work product” and “confidential.”

During discovery after the lawsuit commenced, Google produced the first eight drafts of the email to Oracle. It held back the ninth draft and the final version, listing both on its privilege log. Google later told the court that its electronic scanning mechanisms “did not catch those drafts before production” because the drafts did not contain the confidentiality or privilege headings and did not list any addressees.

In court documents, Google said, “Due to the volume and speed of production in this case, Google has been forced to rely on electronic screening mechanisms, which identify potentially privileged documents based in part on sender and recipient information, as well as privilege-related keywords.”  It appears that these drafts slipped through because there was no “eyes on” review of the documents unless they hit on privilege terms.

The Emails Come to Light

On July 21, 2011, in two separate hearings in the case – one a telephonic hearing before U.S. Magistrate Judge Donna M. Ryu to compel Lindholm’s deposition and the other a Daubert hearing before U.S. District Judge William Alsup – Oracle referenced one of the email drafts. In the Daubert hearing, Oracle read part of it into the record. Google’s attorneys responded by addressing the substance of the email, but they did not object to it as privileged or confidential.

Judge Alsup readily picked up on the email’s significance. At one point during the hearing, he said to Google, “You are going to be on the losing end of this document” with “profound implications for a permanent injunction.” At another point, he said that a good trial lawyer could use the simple combination of the Lindholm email and the Magna Carta to win Oracle’s case and get an injunction. And to add insult to injury, all this took place with several news reporters in attendance, without objection.

Judge William Alsup

After those hearings, later the same day, Google sent notice to Oracle that the draft email constituted “protected material” under a protective order/clawback agreement and asked that it no longer refer to it in public. The next day, asserting that the draft was “unintentionally produced privileged material,” Google clawed it back. Soon after, Google clawed back all draft versions of the email.

Google’s next move was to ask Judge Alsup to redact the transcript of the Daubert hearing to remove all references to the Lindholm document. The judge denied the request, ruling that the email was protected neither by the attorney-client privilege nor the work-product doctrine.

Meanwhile, Oracle filed a motion with Judge Ryu to compel Google to re-produce the emails it clawed back. Judge Ryu granted Oracle’s motion, finding that the email was neither privileged nor otherwise protected. Google appealed that finding to Judge Alsup, who entered an order on Oct. 20, 2011, affirming the magistrate judge. “The Lindholm email and drafts will not be treated as protected by attorney-client privilege or work-product immunity,” he concluded.

Google now says it will appeal Judge Alsup’s ruling.

What This Means for Privilege Search

Although much can be said about whether or not these documents were privileged, whether the privilege was waived, and whether the actions of the attorneys in the hearings were well thought out, we will focus on the implications for search.

Producing Autosave Documents. One big question this case raises is whether producing parties should ever include “autosave” drafts of documents. Many software packages delete the autosave information when the user saves the document or sends the email. The answer will depend on the circumstances of the case, but it is a question counsel should always consider.

Clawback Agreements under Rule 502. The only sure way to protect privileged documents is to review them. While it was the goal of Rule 502 of the Federal Rules of Evidence to obviate the need to review every document for privilege, this case once again shows that “you can’t un-ring a bell.” Once the other side sees a damaging privileged document, it’s almost impossible to control the damage.

Keyword Searching and Highlighting. Keyword searching would have flagged this document as Potentially Privileged only if the search terms had included “negotiat*” or “licens*”. Many firms don’t like to use such broad terms because they flag too many documents as Potentially Privileged. At Catalyst, we have developed a list of several hundred words commonly used in legal documents, and many clients like to use them in their “Potentially Privileged” searches. However, doing so will identify a large number of “false hit” documents that are not privileged. One approach we recommend is to use term highlighting so that all such terms are easy to see as the document is reviewed. Even that wouldn’t have helped in this case because it appears that only documents that hit on privilege search terms were reviewed for privilege.
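To make the point concrete, here is a minimal sketch of a "Potentially Privileged" screen that looks at the parties and at privilege terms, where a trailing * stands for any word ending. Everything in it is hypothetical: the addresses, the message text and the term lists are invented for illustration and are not Google's screen or Catalyst's term list.

```python
import re

# Hypothetical attorney address list; a real screen would use the matter's own list.
ATTORNEY_ADDRESSES = {"counsel@example.com"}


def build_pattern(terms):
    """Compile search terms into one regex; a trailing * matches any word ending."""
    parts = []
    for term in terms:
        if term.endswith("*"):
            parts.append(re.escape(term[:-1]) + r"\w*")
        else:
            parts.append(re.escape(term))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def potentially_privileged(email, pattern):
    """Flag an email if an attorney is a party or the text hits a privilege term."""
    parties = {email["from"], *email["to"], *email["cc"]}
    return bool(parties & ATTORNEY_ADDRESSES) or bool(pattern.search(email["body"]))


# Invented messages: the draft has no recipients and no privilege label yet.
final_version = {
    "from": "engineer@example.com", "to": ["counsel@example.com"], "cc": [],
    "body": "ATTORNEY WORK PRODUCT. We need to negotiate a license.",
}
autosaved_draft = {
    "from": "engineer@example.com", "to": [], "cc": [],
    "body": "We need to negotiate a license.",
}

narrow_terms = build_pattern(["attorney work product", "privileged"])
broad_terms = build_pattern(["attorney work product", "privileged", "negotiat*", "licens*"])

for name, email in [("final", final_version), ("draft", autosaved_draft)]:
    print(name,
          "narrow:", potentially_privileged(email, narrow_terms),
          "broad:", potentially_privileged(email, broad_terms))
```

The draft passes the narrow screen untouched and is flagged only once the broad terms are added, which is exactly the trade-off between missed documents and false hits.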

Near Dupe Analysis. We generally recommend using email threading and near-dupe analysis software. While the principal benefit is speeding the review, a second important benefit is to prevent inconsistent coding. In this case, when the attorneys were reviewing the final version of the emails for privilege, they would also have seen the autosaved drafts and could “tag all” as privileged. Further, we always recommend an “inconsistent coding” analysis, part of which is to identify documents in the same set of near dupes/email threads that are inconsistently coded for privilege. This would have caught the autosaved drafts before production in this case, where non-hits weren’t reviewed.
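An "inconsistent coding" check of this kind can be sketched in a few lines once every document carries a near-dupe or thread group ID. The sketch assumes the grouping was done upstream by near-dupe or threading software; the field names and sample records are hypothetical.

```python
from collections import defaultdict


def inconsistent_groups(documents):
    """Return near-dupe groups whose members do not share one privilege call.

    Each document is a dict with hypothetical keys 'id', 'group' (near-dupe or
    thread group ID assigned upstream) and 'privileged' (True/False).
    """
    calls_by_group = defaultdict(dict)
    for doc in documents:
        calls_by_group[doc["group"]][doc["id"]] = doc["privileged"]
    # A group is inconsistent when it holds both privileged and non-privileged calls.
    return {
        group: calls
        for group, calls in calls_by_group.items()
        if len(set(calls.values())) > 1
    }


# Hypothetical coding: the final email was withheld, its drafts were not.
coded = [
    {"id": "EMAIL-FINAL", "group": "G1", "privileged": True},
    {"id": "EMAIL-DRAFT-1", "group": "G1", "privileged": False},
    {"id": "EMAIL-DRAFT-2", "group": "G1", "privileged": False},
    {"id": "MEMO-A", "group": "G2", "privileged": False},
    {"id": "MEMO-B", "group": "G2", "privileged": False},
]
flagged = inconsistent_groups(coded)
for group, calls in flagged.items():
    print(group, calls)
```

Any group that comes back from this check should get a second look before production.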

(It is interesting to note that in the most famous case of inadvertent production of a privileged, “smoking gun” document, Mt. Hawley Insurance Co. v. Felman Production, it appears that the damaging document was a dupe of a document withheld and on the privilege log. This underscores the importance of doing an inconsistent coding analysis! On this point, see also our earlier post: Bad Facts Make Bad Law: ‘Mt. Hawley’ A Step Backward for Rule 502(b)).

Other inconsistent coding checks. In addition to duplicate, near-dupe and attachment family inconsistent coding checks, other  methods we recommend to look for similar documents are “More Like This” and Key Document Clustering. In this case, this was such an important document that the attorneys should have used analytics to find similar documents as a QC measure.

Predictive Coding. Would predictive coding have helped? It’s hard to say. If the final versions of the emails had been used as seed documents, or tagged as privileged in the privilege sample set, there’s a good chance that the autosaved versions would have been highly ranked for privilege. On the other hand, if they were not in the set chosen to be sampled and not among the seed documents, then they might have slipped through.

The Honor System. It is also worth noting that in most cases the claims of privilege for documents listed on the privilege log are very seldom challenged.  The attorneys producing the documents are essentially on the honor system.  Had the “autosave” emails not slipped through, Oracle would not have had any way to challenge the claim of privilege for the withheld emails.  The claim of privilege in this case was certainly arguable, but it is the practice of many parties to withhold whole families of documents even when only one document is privileged and otherwise to err on the side of claiming privilege.

The Bottom Line

Recently, IDG News Service reporter James Niccolai wrote a thoughtful examination of the Google-Oracle privilege fiasco, How Google Was Tripped Up By a Bad Search. As he noted, the irony in the case is that “the e-mail might never have seen the light of day if the search tools used to identify documents covered by attorney-client privilege had done their job.” In any other case, that might not be considered ironic. But this, after all, is Google.

Whether these draft emails slipped through as a result of human or technological error, Google attributed the error to “electronic scanning tools.” Google has not provided further details and Judge Alsup’s opinion contained no further explanation.

Even so, as we’ve reviewed in this post, the case is not without lessons for e-discovery professionals. No human or technology can offer 100 percent protection against inadvertent production of privileged documents. However, some of the steps we’ve outlined above will certainly help minimize the risk.

Common E-Discovery Error #7: Searching for Non-Indexed Documents

This is the seventh in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

No matter how diligently you search, you can’t find something if it isn’t there. Such is the case with document text. If the text is not indexed, your search will not find it. Forgetting this simple fact is an all-too-common error in e-discovery searches.

Search engines don’t actually search the text of the documents. Rather, they search an index of the text of the documents. If the document contains no readable text to be indexed, or if the document is not properly indexed, then the document will not be found.

The danger in this is that the searcher may believe the search was complete when, in fact, it was not. The search may have found all matching text, but it may have overlooked matching documents whose text was not readable.
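A toy inverted index shows why. The sketch below is a simplified illustration, not how any particular e-discovery engine is built; the file names and text are hypothetical, and the empty string stands in for an image-only scan whose page would show the word "merger" to a human reader.

```python
from collections import defaultdict


def build_index(extracted_text):
    """Build a tiny inverted index: term -> set of document IDs.

    `extracted_text` maps document ID to whatever text extraction produced;
    an empty string stands in for an image-only file that was never OCR'd.
    """
    index = defaultdict(set)
    for doc_id, text in extracted_text.items():
        for term in text.lower().split():
            index[term.strip(".,;:!?\"'()")].add(doc_id)
    return index


def search(index, term):
    """Look the term up in the index; the documents themselves are never read."""
    return index.get(term.lower(), set())


collection = {
    "memo.docx": "The merger agreement was signed Friday.",
    "scan.pdf": "",  # image-only scan: no text was extracted, so nothing is indexed
}
index = build_index(collection)
hits = search(index, "merger")
non_indexed = [doc_id for doc_id, text in collection.items() if not text.strip()]
print("Hits:", hits)
print("Documents with no indexed text:", non_indexed)
```

The search reports one hit and looks complete, yet the scan was never in the running.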

Why Would Documents not be Indexed?

Even though a document or file contains text that you and I can see, the text may not be viewable by the software that creates the search index. The simplest example is a photograph of a sign. If we look at the photograph, our eyes can read the text on the sign. But indexing software, without an OCR component, is unable to distinguish that text from any other part of the photo.

There are several reasons why a document’s text may not be indexed:

  • No text due to file type. Typically, text can be extracted only from standard applications. The document may have been created in an application from which text cannot easily be extracted, or it may be a photo or other file that has no text.
  • No OCR (or bad OCR). A file that is scanned to an image-only format, such as TIFF without text or image-only PDF, requires OCR software to create the indexable text. If the file is not run through OCR, or if the OCR program or process is deficient, the file will have no text that can be indexed, or text too garbled to match search terms.
  • Hardware or software error. As with any computer application, sometimes there can be “hiccups” in the indexing process, requiring that the documents be re-indexed. To avoid this situation, QC following indexing is essential.
  • Human error. A technician can make mistakes that prevent proper indexing, such as incorrect mapping to the text in the load file or incorrect copying of the data so that the mapping is incorrect. Again, QC should uncover these errors.
  • Password protection. Indexing will skip password-protected documents. You need to identify them and either get the password or break the password (if possible).
  • Office 2007/2010. When Microsoft first released Office 2007 and its new document format, many systems were unable to read it. By now, most systems have caught up and have the ability to extract text from Office 2007/2010 files. However, a document collection may include Office 2007 documents that contain non-indexable text (or text that hasn’t been indexed).
  • Non-Western language issues. As mentioned in a prior blog post (Common E-Discovery Error #6: No Comprende), languages that are not standard Western languages–such as Japanese, Chinese, Korean and Arabic–can raise indexing issues. In the UTF-8 form of Unicode supported by most e-discovery search engines, accented Latin characters as well as Greek, Cyrillic, Arabic and Hebrew characters occupy 2 bytes each. All other characters, including Chinese, Japanese, Korean and Indian, occupy 3 bytes each. Supplementary characters occupy 4 bytes each. Each language must use the proper encoding to index correctly. If a document in Japanese is saved as ASCII or incorrectly converted, it won’t index correctly.
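The byte counts in the last bullet are easy to check. The short sketch below encodes one sample character per script as UTF-8 and then shows what happens when Japanese text is forced into ASCII. The sample characters are arbitrary picks for illustration.

```python
# One arbitrary sample character per script.
samples = {
    "Latin (ASCII)": "e",
    "Accented Latin": "\u00e9",   # e with acute accent, precomposed
    "Greek": "\u03bb",            # lambda
    "Cyrillic": "\u0434",         # de
    "Arabic": "\u0639",           # ain
    "Hebrew": "\u05e9",           # shin
    "Chinese": "\u4e2d",
    "Japanese (hiragana)": "\u3042",
    "Korean": "\ud55c",
    "Hindi (Devanagari)": "\u0915",
    "Supplementary (emoji)": "\U0001F600",
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
for name, length in byte_lengths.items():
    print(f"{name}: {length} byte(s)")

# ASCII has no code for these characters, so each one is replaced with "?".
japanese = "\u5951\u7d04\u66f8"  # three kanji
as_ascii = japanese.encode("ascii", errors="replace")
print(as_ascii)
```

Once the characters have been replaced this way, the original text is gone and no search term in Japanese will ever match it.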

There is one other tricky aspect to this that can trip you up if you’re not careful. Sometimes a document that has no readable text in its body will not be reported as a non-indexed document because there was text in the document properties that was indexed. For example, if there is a scanned PDF that was not OCR’d, but that has any text in the document properties, it will not be reported as a “non-indexed” document, even though the text of the PDF is not indexed.

The Danger of Non-Indexed Documents

There is an old proverb, “What you don’t know can’t hurt you.” Unfortunately, that is not the case in e-discovery.

One danger when the search index omits document text is that counsel will fail to produce a responsive document. If this happens by a good faith mistake, it is unlikely to result in sanctions. For one, the other side is unlikely to know about the omission. For another, the attorneys for the producing party may not even know about it.

The greater danger in this scenario is that counsel will inadvertently produce a privileged document–and that IS a big problem. That was part of what led to the disastrous outcome in Mt. Hawley Insurance Company v. Felman Production Inc., 2010 WL 1990555 (S.D.W.Va., May 18, 2010), a case in which counsel inadvertently produced a “smoking gun” document. (Even if the court had enforced the clawback agreement there – see Bad Facts Make Bad Law: ‘Mt. Hawley’ A Step Backward for Rule 502(b) – turning over the smoking gun documents that didn’t hit on the privilege search would still have been a disaster. This bell could not be unrung.)

Consider what happens in a responsiveness search when text is not included in the search index:

  • The review only includes search “hits,” so the non-hits aren’t reviewed and aren’t tagged as responsive.
  • If it is a “produce everything that isn’t privileged” review, only the documents that hit on the search terms are produced. There may be responsive documents that are not produced.

Now compare the possible outcomes in a privilege search:

  • In cases where every responsive document is being reviewed for privilege with help from searches to find “potentially privileged” documents, more privileged documents are likely to slip through undetected if they are not tagged as “potentially privileged.”
  • Worse, in cases where all non-privileged documents are being produced, non-indexed privileged documents, if any, would be produced because they didn’t hit on the privilege searches.

Best Practices for Avoiding Indexing Disasters

Non-indexed documents pose dangers, but they are dangers you can minimize or avoid altogether. Here are best practices we at Catalyst’s Search & Analytics Consulting Group recommend:

  • After data is loaded, get exception reports that show non-indexed data by file type, and look carefully at the counts.
  • Sample the non-indexed data by file type. Does any of the data need OCR?
  • When you run searches, be sure to sample the non-hits as well as the hits.
  • Remember to do sweep searches later to look for any documents that become searchable but that were not searchable when searches were previously run.
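The first of these practices, an exception report by file type, can be approximated with a short script when the platform does not provide one. This is a sketch under assumptions: the 'path' and 'body_text' field names and the sample records are hypothetical. The check looks only at body text because, as noted earlier, text in the document properties can keep a file off a "non-indexed" report.

```python
from collections import Counter
from pathlib import PurePath


def non_indexed_by_type(documents):
    """Count documents with no indexed body text, grouped by file extension.

    Each document is a dict with hypothetical keys 'path' and 'body_text'.
    """
    counts = Counter()
    for doc in documents:
        if not doc["body_text"].strip():
            counts[PurePath(doc["path"]).suffix.lower() or "(none)"] += 1
    # Highest counts first, so the biggest gaps are reviewed first.
    return counts.most_common()


# Hypothetical load.
loaded = [
    {"path": "scans/contract_001.pdf", "body_text": ""},
    {"path": "scans/contract_002.PDF", "body_text": "   "},
    {"path": "images/whiteboard.tif", "body_text": ""},
    {"path": "memos/status.docx", "body_text": "Status update attached."},
]
exception_report = non_indexed_by_type(loaded)
for extension, count in exception_report:
    print(extension, count)
```

File types with high counts are the ones to sample first when deciding what needs OCR.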

The rules don’t require perfection – just reasonableness. Take reasonable steps to avoid these non-indexed document issues.

Text that is visible to the human eye is sometimes invisible to the computer. Forget that and you may end up with productions that are under-inclusive or over-inclusive. It is a common mistake, but one you can easily avoid.

Common E-Discovery Error #6: No Comprende

This is the sixth in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

Businesses operate in a global economy that spans borders, cultures and languages. One increasingly common result of that is that e-discovery collections contain documents in multiple languages. Search and review of multiple-language collections is fraught with potential pitfalls.

The Catalyst resource library includes several case studies and white papers addressing the challenges of multi-language search and review. The challenges for search can be complex, particularly for the so-called CJK languages—symbol-based languages such as Chinese, Japanese and Korean.

That said, there are a handful of mistakes we commonly see made with regard to multi-language document collections. We’ll briefly review them here.

1. Ignoring non-English documents.

You’ve no doubt heard that ancient Japanese maxim, “Hear no evil, see no evil, speak no evil.” But you’d be shocked by how commonly that maxim gets applied in the context of e-discovery.


This is not an advisable approach to e-discovery.

Wittingly or not, lawyers sometimes simply ignore non-English documents. They run English-language search terms against a multi-language database and accept the results as final. Or if questions do arise during the search or review, they are never resolved before production.

Bottom line: English-language search terms will not find foreign-language results.

2. Inaccurate search translation.

Your friend’s daughter studied Japanese in college and offers to translate your search terms. You can’t beat the price. But does she know that Japanese uses four different alphabets? If so, does she know all four? Is she familiar with Japanese usage of business and legal terms? What we call a “contract” or “agreement,” the Japanese often refer to as a “promise.”

Common E-Discovery Error #5: Blind Dates

This is the fifth in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

Often, you want to limit a search to a specified date range. Sounds easy enough. But like so much in e-discovery, it is often easier said than done.

The truth is, it is sometimes difficult to determine the precise date of a document—especially from the metadata. There are a number of reasons why this is so. Some document files may be missing key dates altogether, often because the dates were lost or corrupted somewhere in the course of the document’s processing. Other documents may show incorrect dates, such as when errors in processing or when copying alters a document’s “last modified” date.

And then there is the question of which date to use. A document’s metadata can carry a string of dates, relating to when it was created, modified, printed, e-mailed and the like. Or a document can be linked to another—such as in the case of an e-mail and an attachment—where each linked document has a different creation date. If the attachment is within the search range but the e-mail is not, what do you do?

You need to carefully QC.  Are the dates you are searching for correctly populated?  Are you searching for the correct fields in your date range search?
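A quick way to start that QC is to count, for each date field you might search on, how many values are missing and how many fall outside any plausible window for the matter. The sketch below is only an illustration: the field names, the sample records and the date window are all hypothetical.

```python
from datetime import date


def date_field_report(documents, fields, earliest, latest):
    """For each date field, count values that are missing or outside a plausible window."""
    report = {}
    for field in fields:
        missing = sum(1 for doc in documents if doc.get(field) is None)
        implausible = sum(
            1 for doc in documents
            if doc.get(field) is not None and not (earliest <= doc[field] <= latest)
        )
        report[field] = {"missing": missing, "implausible": implausible}
    return report


# Hypothetical records with two candidate date fields.
records = [
    {"id": "DOC-001", "sent": date(2010, 8, 6), "last_modified": date(2010, 8, 6)},
    {"id": "DOC-002", "sent": None, "last_modified": date(2012, 3, 1)},
    {"id": "DOC-003", "sent": date(1980, 1, 1), "last_modified": None},
]
report = date_field_report(
    records,
    fields=["sent", "last_modified"],
    earliest=date(2005, 1, 1),   # assumed start of the plausible window
    latest=date(2011, 12, 31),   # assumed end of the plausible window
)
for field, counts in report.items():
    print(field, counts)
```

A field with many missing or implausible values is a poor basis for a date-range search until the underlying problem is fixed.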

Common E-Discovery Error #4: Overly Broad Searches

This is the fourth in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

An earlier post talked about the problem of constructing searches too narrowly, with the result that responsive documents are overlooked. A similar issue and a leading cause of e-discovery inefficiency is searches that are overbroad. If your searches are retrieving far more documents than are likely to be responsive, you are wasting time and money. After all, the most expensive part of any review is the cost of attorney review, and overly broad searches mean higher legal fees.

Overly broad searches commonly result from the failure to use Boolean operators or the use of the wrong Boolean operators. If you search for the terms “william” and “clinton,” you will retrieve every document that hits on either word. If you search for “william AND clinton,” you will retrieve every document in which both words appear, regardless of their proximity to each other. But if you search “william NEAR/2 clinton,” you get only the documents in which the two words appear within two words of each other.
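The difference between the three searches can be seen in a small sketch. It assumes that NEAR/2 means the two words sit no more than two word positions apart in either order; real search engines define proximity in slightly different ways, so check your platform's syntax. The sample documents are invented.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_or(text, a, b):
    words = tokens(text)
    return a in words or b in words


def match_and(text, a, b):
    words = tokens(text)
    return a in words and b in words


def match_near(text, a, b, distance):
    """True if a and b occur within `distance` word positions of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


# Invented sample documents.
docs = {
    "A": "William J. Clinton signed the bill.",
    "B": "William Smith flew to Clinton, Iowa, for the meeting.",
    "C": "Clinton County records were requested.",
}
or_hits = [d for d, t in docs.items() if match_or(t, "william", "clinton")]
and_hits = [d for d, t in docs.items() if match_and(t, "william", "clinton")]
near_hits = [d for d, t in docs.items() if match_near(t, "william", "clinton", 2)]
print("OR:", or_hits)
print("AND:", and_hits)
print("NEAR/2:", near_hits)
```

Each step up in precision drops a document that mentions the words without being about the person.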

Wild cards can also lead to overly inclusive results. In one case I know of, the client used the wild card search “don*” to search for variations on the name of a chemical. The problem was that the search also brought back a large number of hits on a variety of unrelated words, such as “donate” and “don’t.” By reconstructing the search more precisely, we were able to help the client cut out some 200,000 documents.
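Before running a wild card search across a whole collection, it helps to list the distinct words the wild card actually expands to and how often each appears. The sketch below does this with Python's standard `fnmatch` module. The sample text is invented, and "donexil" is a made-up stand-in for a chemical name, not the term from the case described above.

```python
from collections import Counter
from fnmatch import fnmatchcase
import re


def wildcard_expansions(texts, pattern):
    """Count each distinct word in the collection that a wildcard term would match."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if fnmatchcase(word, pattern):
                counts[word] += 1
    return counts


# Invented sample text; "donexil" is a made-up chemical name.
texts = [
    "Please donate to the blood drive.",
    "Don't forget the meeting; I don't have the file.",
    "The shipment of donexil arrived.",
]
expansions = wildcard_expansions(texts, "don*")
for word, count in expansions.most_common():
    print(word, count)
```

A list like this makes it obvious which expansions to keep as specific search terms and which to leave out.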

There are dozens of ways searches can be overly broad, and these are just a few of the most common. Use proximity operators. And remember that using wild cards is easier, but using specific searches is more accurate and reduces the number of irrelevant hits.

Common E-Discovery Error #3: Failure to Clean out the Junk

This is the third in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

The fewer the documents you have to review, the less time and money it will take to review them. To understand this bit of common sense, you don’t need a rocket scientist. But to achieve it, it helps to have a junk remover. That is because one easy – but often overlooked – way to reduce the document pool is to first clean out all the junk.

By “junk,” I mean all those documents clogging your data collection that are the corporate equivalent of spam. In the Enron case, for example, there were thousands of documents pertaining to an employee fantasy football league. E-mail files can contain hundreds of daily cafeteria menus, blood-drive solicitations, job postings, and HR reminders. Harder to filter out are industry newsletters and updates that hit on search terms but have nothing to do with the case.

These junk documents are not remotely related to the matter in dispute and should be hauled out at the earliest opportunity. The best way to do this is through early case assessment (ECA). ECA is a critical step that can help save time and money at every subsequent stage.

We like to look at document lists grouped and sorted by count, looking at authors, subject lines and titles, file types, attachment counts, and near-dupe counts. You can quickly sample the ones that look like they may be junk (by author or title) or that have the highest counts. Often, doing this will let you find the “low hanging fruit” and remove the junk from your review right at the beginning.
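The grouping described above can be sketched in a few lines of Python. This assumes the document metadata has been exported as a list of records. The field names ("author," "subject"), the addresses and the sample records are assumptions made up for the example, not the output of any specific tool.

```python
from collections import Counter

# Hypothetical metadata export; a real collection would have many more fields.
documents = [
    {"id": 1, "author": "cafeteria@example.com", "subject": "Daily cafeteria menu"},
    {"id": 2, "author": "cafeteria@example.com", "subject": "Daily cafeteria menu"},
    {"id": 3, "author": "cafeteria@example.com", "subject": "Daily cafeteria menu"},
    {"id": 4, "author": "hr@example.com", "subject": "Blood drive reminder"},
    {"id": 5, "author": "hr@example.com", "subject": "Blood drive reminder"},
    {"id": 6, "author": "j.smith@example.com", "subject": "Q3 pricing proposal"},
]

def counts_by(field, docs):
    """Group documents by one metadata field and sort the groups by count, largest first."""
    return Counter(doc[field] for doc in docs).most_common()

def sample_group(field, value, docs, limit=2):
    """Pull the first few documents from one group so a person can eyeball them."""
    return [doc for doc in docs if doc[field] == value][:limit]

by_author = counts_by("author", documents)
by_subject = counts_by("subject", documents)

top_author, top_count = by_author[0]
sample = sample_group("author", top_author, documents)

print(by_author)
print(by_subject)
print("sample from", top_author, "->", [doc["id"] for doc in sample])
```

The counts only point you to candidates. A person still needs to look at the sample before the whole group is excluded from review.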

Of course, there are also analytics that can help find the junk.

In any case, don’t waste time reviewing documents that can be filtered out with some intelligent ECA.

Common E-Discovery Error #2: Unbridled Contract Reviewers

This is the second in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

Using outside contractors for document review is common in e-discovery and often even necessary for particularly large matters. But the firm’s attorneys must take the lead to make sure that contract reviewers (1) are properly trained and supervised and (2) have the correct skill sets.

In one case we know of, an outside review company was hired to help go through a collection of more than two million documents, including e-mails, that contained highly technical information about electronics and were full of industry-specific jargon and acronyms. The law firm was reluctant to use contract reviewers in the case, but the client insisted in order to control costs.

The law firm’s reluctance to use contract reviewers resulted in its taking a stand-offish approach to working with them, at least at first. It did little to ensure the reviewers had the qualifications or training to understand and review the documents. It provided little supervision, instead telling reviewers to mark a document as responsive whenever they were in doubt about it.

Lacking the proper technical training and background, the reviewers turned out to be in doubt about most of the documents because of all the jargon and acronyms. By the time they were done, they had marked almost 80 percent of everything they reviewed as responsive. By all other indications, no more than 20 percent should have come back as responsive.

Whereas the client had urged the use of contract reviewers as a cost-savings measure, the law firm’s lack of oversight caused the opposite result, turning the process into an enormous waste of money and time. The bulk of the review would have to be repeated.

The lessons to be learned from this are obvious but not always followed. When an e-discovery project uses contract reviewers or other unfamiliar reviewers, the law firm must work with them and supervise them closely. If the project requires technical or specialized knowledge, the lawyers need to make sure that the reviewers are equipped, by experience or training, with the knowledge they need to make sense of the documents. When questions arise, a process should be in place for the lawyers to answer them quickly and in a way that lets all of the reviewers learn from the answers. Sometimes, it is even a good idea to use engineers or other technical professionals instead of attorneys for the first-level review.

Whether the reviewers are on staff or on contract, their work is ultimately the responsibility of the lawyers in charge of the case. As long as the lawyers remember that, they can avoid problems, large and small.

Common E-Discovery Error #1: Searches too Narrow

This is the first in a series of posts on common e-discovery errors and how to avoid them. For background on the series, see the introduction.

One of the most common errors in e-discovery is a poorly constructed search. The results can be overly narrow or overly broad. In one particular case we observed, the problem of too narrow a search was compounded by the lack of cooperation between the parties and flaws in the data.

The case was complex and involved a large collection of documents. The original team of lawyers, at a previous firm, constructed its searches far too narrowly. By way of illustration, they searched people by name, e.g., “william clinton,” but not by alternate constructions or proximity, e.g., “(william OR bill) NEAR/2 clinton.” A search for a building was by its formal name, e.g., “government center complex,” but not by its common usage, “government center.”
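A small Python sketch shows what the narrow versions of these searches leave behind. The five documents are invented, the helper functions are hypothetical, and NEAR/2 is again assumed to mean "at most two token positions apart, in either order." This illustrates the idea only and does not reconstruct the searches run in the actual case.

```python
import re

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase(text, words):
    """Exact phrase search: the words must appear consecutively and in order."""
    toks = tokens(text)
    n = len(words)
    return any(toks[i:i + n] == words for i in range(len(toks) - n + 1))

def any_near(text, first_names, last_name, distance):
    """(name1 OR name2 ...) NEAR/distance last_name, in either order."""
    toks = tokens(text)
    pos_first = [i for i, t in enumerate(toks) if t in first_names]
    pos_last = [i for i, t in enumerate(toks) if t == last_name]
    return any(abs(i - j) <= distance for i in pos_first for j in pos_last)

# Hypothetical documents, made up for this example.
docs = {
    "a": "Meeting with William Clinton at noon.",
    "b": "Bill Clinton approved the lease.",
    "c": "Clinton, William J. - travel itinerary",
    "d": "Deliver to the Government Center Complex lobby.",
    "e": "Parking at Government Center is closed.",
}

def run(matcher, *args):
    """Return the set of document keys that satisfy the matcher."""
    return {key for key, text in docs.items() if matcher(text, *args)}

narrow_name = run(phrase, ["william", "clinton"])
broad_name = run(any_near, {"william", "bill"}, "clinton", 2)
narrow_building = run(phrase, ["government", "center", "complex"])
broad_building = run(phrase, ["government", "center"])

print("missed by the name phrase:", sorted(broad_name - narrow_name))
print("missed by the formal building name:", sorted(broad_building - narrow_building))
```

The broader versions pick up the nickname, the "last name, first name" form and the everyday name of the building. The trade-off is that a term like "bill" can also bring in unrelated hits, so the broader search still needs to be sampled and tested.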

To make matters worse, the vendor that had processed the data had done a disastrous job. In many cases, key metadata about dates and custodians was missing entirely. The situation was compounded by the fact that the legal team had an acrimonious relationship with opposing counsel and never bothered to consult or seek agreement with them on search parameters.

Overly narrow searches and missing metadata proved to be a toxic combination. It later came to light that the original team’s searches had missed at least 30 percent of responsive documents.

Once a new team of lawyers was in place, it had to reload all the documents and redo all the searches. The second time around, the new firm consulted and cooperated with its opponents in conducting the searches. Even so, the process ended up costing far more in dollars and time than it ever should have.

Three lessons can be learned from this story:

1. Employ best practices in search techniques. Proficiency in search is a distinct form of expertise. You do not learn it by studying legal research your first year of law school. If your team does not include someone who has this expertise, then your team is incomplete.

2. Choose vendors carefully. Flawed data can be invisible data when it comes to search. If text or metadata are missing, then even the most fastidious search may miss responsive documents. In the case we discussed above, the vendor who processed the data went out of business not long after. That suggests there may have been warning signs that should have caused the e-discovery team to look elsewhere.

3. Cooperate with opposing counsel. Two brains are better than one when it comes to constructing and executing searches. If both sides cooperate, they are likely to get better results. This is why the Federal Rules of Civil Procedure encourage litigants to cooperatively develop search methodologies. And it is, in part, the thinking behind The Sedona Conference Cooperation Proclamation. Research shows that both parties are happier with the results when they cooperate.