Catalyst Repository Systems - Powering Complex Legal Matters

E-Discovery Search Blog

Technology, Techniques and Best Practices
Ron Tienzo

About Ron Tienzo

Ron Tienzo is that rare lawyer who is also fluent in technology and adept at search. As a Senior Consultant at Catalyst, Ron draws on his unique background as both a lawyer and a software engineer. With a degree from Sturm College of Law at the University of Denver and extensive training in a range of legal software applications and programming languages, Ron provides litigation consulting to corporate law departments and law firms.

Prior to joining Catalyst, Ron was the software specialist at a full-service Los Angeles law firm. There, he was the firm's lead person on the use of technology in complex litigation matters. In addition, he was the firm's principal advisor on all software matters and was responsible for software training for all professionals and staff.

Friday Fun: Obi-Wan Kenobi, The Most Demanded E-Discovery Expert in the Industry

A bit of fun for a Friday afternoon. To hear the Star Wars clip that inspired this cartoon, click here.

Treading Past Angels: Finding the Right Search Expert for Your Case

Last February, Assured Guaranty Municipal Corp. sued UBS Real Estate Securities Inc. for breach of contract, accusing the company of failing to meet obligations related to the pooling of residential mortgage-backed securities. As the case moved along, disputes arose over discovery and both sides filed motions to compel.

One of the discovery issues in dispute was the adequacy of the search terms that Assured proposed to apply to electronic documents. Ruling on this issue in a Nov. 21, 2012, memorandum, U.S. Magistrate Judge James C. Francis IV began by quoting the oft-cited words of another U.S. magistrate judge, John M. Facciola, in U.S. v. O’Keefe:

Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics. … Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.

Because neither party provided the expert affidavits that would have enabled him to decide on the most efficient search protocol, Judge Francis laid out three options for counsel:

  1. They can cooperate (along with their technical consultants) and attempt to agree on an appropriate set of search criteria.
  2. They can refile a motion to compel, supported by expert testimony.
  3. Or, they can request the appointment of a neutral consultant who will design a search strategy.

Note that in each of the three options, Judge Francis requires that an expert be involved in any determination, even when both counsel come to an agreement out of court.

So, if counsel are required to consult an expert and if search creation and testing is truly an area where “angels fear to tread,” how do you make sure to select the right search expert for your case? Below are some tips you may find helpful in finding the right consultant.

E-Discovery Expert  ≠ Search Expert

Electronic discovery is an extremely broad field of knowledge covering the entire gamut of the EDRM [link: http://www.edrm.net/resources/edrm-stages-explained]. While there may be a few e-discovery experts out there qualified to serve as search experts, Judge Francis makes clear that the expert should have a solid understanding of the interplay between “the sciences of computer technology, statistics and linguistics.”  In the world of academia, this field of study is referred to as “Information Retrieval” (IR). [link: http://en.wikipedia.org/wiki/Information_retrieval] Those specializing in IR seek to use searches to identify the relevant information from a larger sea of data. When selecting your expert, be sure she has expert knowledge in at least one of the areas of expertise listed by Judge Francis and foundational knowledge of the others. She should be able to understand how each science plays into the ultimate goal of creating defensible searches.

One Size Does Not Fit All

Like electronic discovery, IR is an extremely broad area of study. In 1992, the Text Retrieval Conference (TREC) [link: http://trec.nist.gov/ ] was created to support the IR community by providing the infrastructure to test and evaluate different text retrieval methodologies.  Since its inception, there have been 21 different research areas, or tracks. These tracks range from web search to spam filtering to gene sequencing. While the Legal Track has been a mainstay for TREC since 2006, it is far from the only IR area of study.

Much of IR focuses on returning the best results from an extremely large dataset. A good example of this would be web search. In web search, the user enters a query and the IR system compiles a list of matching results ranked by likelihood of responsiveness. Users typically find what they are looking for within the first few pages of results and have no need to proceed further.

In contrast, an ideal e-discovery search methodology would need to make an accurate relevance assessment on each and every document in the collection. Instead of finding just the best documents, we need to identify those with even the slightest traces of relevant content. This is a unique task and, I believe, a significantly harder one than the majority of IR research addresses.

When selecting a search expert, be sure to find one who is familiar with the intricacies of legal search. While the ideal candidate does not need to have a JD, the expert should be familiar with what is required under the Federal Rules of Evidence and be able to conform her processes to that standard.

Those Who Can, Do; Those Who Can’t, Teach; Those Wanting to Be My Expert, Do and Teach!

It’s obvious that you want to find an expert who has done similar work before and has the references to confirm it. Supporting documentation can be in the form of scholarly articles, prepared client reports, and documents submitted to the court. You want to read through these documents to make sure the information is presented clearly, that a structured workflow was followed, and that the results are supported by iterative testing. You also want to make sure that throughout the expert’s career, she has been consistent in her work. Prior publications represent a minefield for prior inconsistent opinions and will likely be used by opposing counsel to discredit your expert.

You also want to find an expert who can clearly explain the craft. Most experts are used to operating in a highly specialized academic community with shared foundational knowledge and common vocabulary. A good expert needs to be a teacher at heart. She should enjoy educating others by explaining complex principles in simplified terms.

Have the expert walk you through one of her reports. Make sure she can detail in basic vocabulary each step that was taken, why choices were made, and how the outcome was determined. For any part of the process that requires foundational knowledge, such as statistical sampling, she should be able to break down the process enough that the user can follow along and then return to the bigger picture. Be sure to ask questions, keeping a close eye to make sure she does not appear frustrated or inconvenienced.

Our Catalyst Consulting team of linguists, statisticians, attorneys, and technical experts has been helping clients construct repeatable and defensible search methodologies since 2008. For assistance with your case, or to find out more on selecting the perfect expert, be sure to contact them at consulting@catalystsecure.com.

Automatic Footers: Toothless Legal Verbiage Causes Search Headaches

“The contents of this email may be privileged and confidential and are intended for the use of the intended addressee(s) only. Unless you are the addressee, you may not use, copy or disclose to anyone the message or any information contained in the message. Under penalty of death, public ridicule, and death a second time you are legally obligated to: 1) Delete this email and all copies. 2) Destroy your computer and email server using fire, sledge hammer, and/or atomic weapon. 3) Bury the remains of step two in a haunted pet cemetery. 4) Confess to a religious leader of your choosing that you read an unintended electronic communication and promise never to do it again.”

If you have ever generated or received an email from a law firm or major corporation, you’ve become accustomed to the obligatory paragraph of legal jargon that follows even the most rudimentary of emails. You’ve seen how that added paragraph, when multiplied during the course of normal correspondence, takes what should be a well-formatted email reply and turns the whole string into an endless chain that even M.C. Escher would be proud of.

This added text is something we put up with, believing it is a necessary evil. We are under the belief that this paragraph could someday come to the rescue should we accidentally send a confidential email to Craigslist instead of Craig in accounting. Well, world, prepare to be shattered. According to a recent article in The Economist, “Spare us the e-mail yada-yada,” these footers are probably pointless.

They are assumed to be a wise precaution. But they are mostly, legally speaking, pointless. Lawyers and experts on internet policy say no court case has ever turned on the presence or absence of such an automatic e-mail footer in America, the most litigious of rich countries.

Many disclaimers are, in effect, seeking to impose a contractual obligation unilaterally, and thus are probably unenforceable. This is clear in Europe, where a directive from the European Commission tells the courts to strike out any unreasonable contractual obligation on a consumer if he has not freely negotiated it. And a footer stating that nothing in the e-mail should be used to break the law would be of no protection to a lawyer or financial adviser sending a message that did suggest something illegal.

How effective can something be that is automatically generated and unilaterally imposed? An unintended recipient could just as easily argue that the notice does not actually represent any sort of subjective intent to claim privilege since it is being used on emails that range from afternoon pizza orders to communications with opposing counsel.

What Does This Have to Do with Search?

Despite their likely futility, these blocks of text are not going anywhere. Instead, those of us in e-discovery need to accept that they are going to be in our collections and find solutions for how to work around them.

While “filler” text has implications for many culling and review analytics, it is felt most in the identification of privileged documents. When every email in your collection contains the words “Privileged and Confidential” in the footer, a simple Boolean search is not going to cut it.

At Catalyst, we’ve approached this problem in a few different ways. The most effective is to temporarily alter the index in a way that excludes these likely “false positives.” In order for this to work, you need to devote some time to sampling your collection. Are there style encodings that designate footer text? Or is the footer text consistent across the collection so that it can be easily identified? Once you know what is to be removed, run your searches or other analytics, code the results as Potentially Privileged, and restore the original index.
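The idea of searching a footer-free copy of the text can be sketched in a few lines of Python. This is only an illustration, not Catalyst's implementation: the footer string, the function names, and the keyword test are all assumptions, and a real platform would rebuild its search index rather than scan strings.

```python
import re

# Assumed footer text; in practice this comes from sampling the collection.
FOOTER = (
    "The contents of this email may be privileged and confidential and are "
    "intended for the use of the intended addressee(s) only."
)

def strip_footer(body, footer=FOOTER):
    """Remove every copy of the footer, tolerating re-wrapped lines and case changes."""
    # Any run of whitespace in the footer may match any run of whitespace in the body.
    pattern = r"\s+".join(re.escape(word) for word in footer.split())
    return re.sub(pattern, "", body, flags=re.IGNORECASE).strip()

def is_potentially_privileged(body):
    """Hypothetical keyword test, run against the footer-free text only."""
    return "privileged" in strip_footer(body).lower()

emails = {
    "pizza": "Lunch order: two large pies.\n\n" + FOOTER,
    "advice": "This memo is privileged attorney-client advice.\n\n" + FOOTER,
}
flagged = [doc_id for doc_id, body in emails.items() if is_potentially_privileged(body)]
print(flagged)  # ['advice']
```

The original bodies are never modified, which mirrors the "restore the original index" step: the stripped text exists only for the duration of the search.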

A less text-intensive approach would be to focus on the metadata in your collection. Go beyond your standard privilege search and look at who the author and recipients are in your collection. Ninety percent of privilege comes down to who sent it and who received it. Focus on communications sent solely between privileged parties. Other documents can be further classified according to likelihood of being privileged. For long communications spanning varied recipients, use email threading tools to identify where privilege breaks or is created.
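As a rough sketch of the metadata approach, the snippet below sorts messages into tiers based only on sender and recipients. The address list, tier names, and function name are hypothetical, and the rule is deliberately simplistic; it is a triage aid, not a privilege determination.

```python
# Assumed list of privileged parties (for example, counsel and their clients).
PRIVILEGED_PARTIES = {
    "counsel@lawfirm.example",
    "gc@client.example",
    "ceo@client.example",
}

def privilege_tier(sender, recipients):
    """Classify a message by who sent it and who received it."""
    parties = {sender.lower(), *(r.lower() for r in recipients)}
    if parties <= PRIVILEGED_PARTIES:
        # Sent solely between privileged parties.
        return "likely privileged"
    if parties & PRIVILEGED_PARTIES:
        # At least one privileged party, but an outsider is also on the message.
        return "needs review"
    return "unlikely privileged"

print(privilege_tier("GC@client.example", ["counsel@lawfirm.example"]))
print(privilege_tier("gc@client.example", ["vendor@other.example"]))
```

Running the same function on each message in a thread, in date order, gives a crude view of where privilege may break as recipients are added.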

With the right amount of planning and forethought when embarking on a document review strategy, even if these automated footers are useless, you won’t be.

A Free Trick to Quickly Categorize Documents During Review

It slices, it dices, it groups documents together for lightning quick review!

Each grouping is masterfully handcrafted by the original author!

Guaranteed to increase in value or your money back!

And that’s not all … call in the next 10 minutes and your order is completely free!

I’ve finally recovered from my yearly “Brand Bowl” hangover. For those not familiar with the event, Brand Bowl is where the best and the brightest companies march on the field of battle, your television, to try to win consumers’ praises and pocketbooks. The action is intense, broken up only by intermittent bouts of football and halftime shows. Costing $3 million for a 30-second spot, the stakes are high. To the victors, a viral buzz worth 10 times the money spent; to the losers, a costly blemish to their brand identity.

A relatively new type of entrant to the competition is the company that gives away its product for free. Seven of the advertisers this year offered some type of free service. There was a time when free was all the marketing you needed. In fact, I once saw a video of people pushing and crowding to get an unappetizing plate of cold beans and hot dogs simply because it was free. Technology has changed the playing field. Free email, free data storage, free open source software: the list of free products goes on and on. Now even free needs a multi-million dollar commercial.

Not to be left out, I wanted to share with you one of my favorite free tricks when it comes to document review. Inherent in every document is a telling piece of metadata that provides a human-based classification system created by the document author him- or herself. Many of you know this valuable piece of data as “Original File Path.”

Why is this important? Just think of your own personal computer. When you save a document, the first question asked by the system is where it should be saved. In our Microsoft-centric world, we are most accustomed to using a folder tree. The folder choice helps us remember where to go when we need to find this document in the future. To save time, like documents are usually foldered together around a central topic.

Examples of How You Can Use Folder Trees

This built-in organization structure is a seldom used but powerful tool to accelerate document assessment. How so? Here are just some of the possibilities for how you can make use of folder trees:

  • Accelerated Review: Short on time? Identify the largest folders first to quickly move them through review. In my experience, I’ve seen thousands of documents moved by a single reviewer in just a few hours by sampling and bulk coding similar documents.
  • Opposing Party Production: Start your review by searching for folder names that are most relevant to your case. A great way to prepare for rapidly approaching depositions.
  • Hot/Privileged Review: Found a really hot document or a crucial privileged attorney-client communication? Take a look at the folder that the document came from. Chances are you will find more where that came from.
  • Low-Hanging Fruit: If reviewing for an intellectual-property case, it would make sense to pay special attention to the folder named “Patent Revisions” while offering just a cursory review of the “Employee Picnic 2009” folder. This low-hanging fruit is easy to identify and provides a great jump start to any case.
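For the "largest folders first" idea above, a minimal sketch in Python might look like this. It assumes you can export the Original File Path field as a list of Windows-style paths; the sample paths and the function name are made up for illustration.

```python
from collections import Counter
from pathlib import PureWindowsPath

def folder_counts(original_paths):
    """Count documents per parent folder, largest folder first."""
    folders = (str(PureWindowsPath(p).parent) for p in original_paths)
    return Counter(folders).most_common()

# Hypothetical export of the Original File Path field.
paths = [
    r"C:\Users\jdoe\Patent Revisions\claim_draft_v1.doc",
    r"C:\Users\jdoe\Patent Revisions\claim_draft_v2.doc",
    r"C:\Users\jdoe\Patent Revisions\prior_art.xls",
    r"C:\Users\jdoe\Employee Picnic 2009\menu.doc",
]
for folder, count in folder_counts(paths):
    print(count, folder)
```

PureWindowsPath is used so the backslash-separated paths parse the same way regardless of the operating system running the script.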

Review is an ever-growing cost of litigation. Any device that you can use for free should be a welcome change. While technologies such as clustering, semantic indexing and predictive coding have an important part to play in modern document review, at times their cost can be prohibitive. Depending on the size of the case and the amount of financial resources available, free might just be exactly the right price for your client.

The Bead, the Cupcake and the Million Dollar Sanction

10:14 pm. Crying and screaming. Get out of the car. Grab my daughter. Run through the blistering cold. Check in to the emergency room. Pray everything is going to be alright.

That was my Tuesday night. The kids were put to bed early and my wife and I were enjoying the precious few moments that we get to spend alone together. Then the screaming started. From upstairs we could hear my girl shrieking at the top of her lungs.

I ran up stairs faster than I thought possible. As I entered her room, she looked up at me with her sad doe eyes. Between the sobs, I heard, “Papa, it’s stuck in my nose.”

While we thought she was sleeping, she had grabbed a glass bead off of a broken necklace and stuck it in her nose. Each attempt to remove the bead just pushed it in further. It got pushed so far back that it necessitated a trip to the emergency room to get it out.

After lots of tears, some very compassionate nurses, and a late night cupcake run, my daughter is now bead free and doing well.

Mark Pappas of Creative Pipe is like my Three Year Old

As I recount the events of that night, I can’t help but make the correlation to the sanction order recently handed down by U.S. Magistrate Judge Paul W. Grimm in Victor Stanley v. Creative Pipe.

On Jan. 24, 2011, Judge Grimm ordered Mark Pappas, owner of Creative Pipe, to pay $1.05 million for costs “associated with all discovery that would not have been undertaken but for Defendants’ spoliation, as well as the briefings and hearings regarding Plaintiff’s Motion for Sanctions.”

Judge Grimm’s Jan. 24 sanctions order follows from his Sept. 9, 2010, opinion detailing the “background and intricacies” of the case. The details of the decision that most refer to as Victor Stanley II are far too numerous to go over in a blog post. To learn more (and I suggest everyone should), read Judge Grimm’s 89-page memorandum, Victor Stanley, Inc. v. Creative Pipe, Inc., 269 F.R.D. 497 (D. Md. 2010). You’ll laugh, you’ll cry, you’ll be very disturbed.

Judge Grimm does offer up an itemized list that briefly scratches the surface of Pappas’s misconduct:

  1. Pappas’s failure to implement a litigation hold.
  2. Pappas’s deletions of ESI soon after VSI filed suit.
  3. Pappas’s failure to preserve his external hard drive after Plaintiff demanded preservation of ESI.
  4. Pappas’s failure to preserve files and emails after Plaintiff demanded their preservation.
  5. Pappas’s deletion of ESI after the Court issued its first preservation order.
  6. Pappas’s continued deletion of ESI and use of programs to permanently remove files after the Court admonished the parties of their duty to preserve evidence and issued its second preservation order.
  7. Pappas’s failure to preserve ESI when he replaced the CPI server.
  8. Pappas’s further use of programs to permanently delete ESI after the Court issued numerous production orders.

With every offense, Pappas pushed the bead farther up his own nose. Even when he knew he had been caught, he offered such gems of excuses as, “he would put e-mails into a ‘Deleted Items’ folder on his laptop [for] … storage purposes.” Or, even better, that he had a backup drive but he returned it to “Bob from Office Max.”

Like my daughter and her bead, Pappas got himself into a situation that he could no longer get out of. The bead is now so deeply lodged in this case that it will take a million dollar sanction and threats of jail time to remove. I’m fairly certain it is going to take more than a cupcake to make Pappas feel better once this case is over.

Investigating the E-Discovery Word on the Street: ‘Unitization’

“Papa, there is a noise in the fireplace. We need to investigate!” – My 3-year-old

These are the first words out of my daughter’s mouth when I come home. While my wife and I try to introduce her to new words, “investigate” was not one of them. I should be concerned about the family of squirrels living in my chimney. Instead, I want to know where she had learned such a big word.

Turns out she learned it on the street. The very same street where I learned a lot of the words I still use today, Sesame Street. Despite the 30-year age difference between us, this kids’ program has transcended generations.

We sit on the couch and pull up the episode on the DVR. I am instantly transported to my childhood. Big Bird, Cookie Monster, the Count and I spent many mornings together, me in my footie pajamas and bowl of cereal and them meeting outside Hooper’s store.

During the episode, a furry red puppet named Murray introduces us to the Word on the Street: “investigate.” Twenty minutes later, Colin Farrell shows up with Elmo to show how they investigate.

The Need to Keep Learning

Now that I’m well beyond Sesame Street’s recommended viewing age, I have to wonder why I stopped seeking out new words. I haven’t checked, but I’m pretty sure that I haven’t learned them all yet.

New industries and new terminology go hand in hand. Before the iPhone, did anyone use the term “appstore”? Electronic discovery is one of those new industries that is slowly building a lexicon of industry-specific words. Some are unique to us while others have been in existence for some time but are being appropriated to what we do.

Words are power and it is time for us in e-discovery to have a common language base. With common language, we can save time and effort and reduce costly mistakes when dealing with each other. So I present to you: The E-Discovery Word on the Street.

E-Discovery Word on the Street: Unitization

Today’s E-Discovery Word on the Street comes from The Sedona Conference® Glossary: E-Discovery & Digital Information Management (Third Edition):

Unitization – Physical and Logical: The assembly of individually scanned pages into documents. Physical Unitization utilizes actual objects such as staples, paper clips, and folders to determine pages that belong together as documents for archival and retrieval purposes. Logical unitization is the process of human review of each individual page in an image collection using logical cues to determine pages that belong together as documents. Such cues can be consecutive page numbering, report titles, similar headers and footers and other logical indicators. This process should also capture document relationships, such as parent and child attachments.

Obviously, we’re not dealing with paper clips and staples in e-discovery. Our work deals with the logical unitization of documents. Most of us are aware of the common parent/child family relationships among documents, with the most typical example being e-mails and attachments. What should be noted, though, is that unitization applies to all document relationships. E-mail threads, near duplicates, date ranges, custodians: any grouping that creates a logical category and helps the reviewer build relationships from one document to the next qualifies.

A good real-world example is a deck of cards. Each card carries with it numerous properties that link it to other cards within a deck. For example, the ace of diamonds can be unitized with aces, diamonds or red cards. Each group creates its own logical category.
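The card analogy translates directly into a small grouping routine. The sketch below is just one way to express it in Python; the function and key names are illustrative only. The same pattern applies if you swap rank, suit and color for custodian, thread and date range.

```python
from collections import defaultdict

RED_SUITS = {"diamonds", "hearts"}

def unitize(cards):
    """Place each card into every logical group it belongs to."""
    groups = defaultdict(list)
    for rank, suit in cards:
        color = "red" if suit in RED_SUITS else "black"
        # One card lands in three groups: its rank, its suit, and its color.
        for key in (("rank", rank), ("suit", suit), ("color", color)):
            groups[key].append((rank, suit))
    return groups

cards = [("ace", "diamonds"), ("ace", "spades"), ("king", "diamonds"), ("two", "hearts")]
groups = unitize(cards)
print(groups[("rank", "ace")])    # both aces
print(groups[("suit", "diamonds")])  # both diamonds
print(len(groups[("color", "red")]))  # 3 red cards
```

Note that the groups overlap on purpose: the ace of diamonds appears under aces, diamonds, and red cards at the same time.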

For more e-discovery terms and definitions, check out The Sedona Conference® Glossary: E-Discovery & Digital Information Management (Third Edition).

Have any suggestions for industry words we should all know? Post them in the comments section.

E-Discovery, You Say? But What Do You Actually Do for a Living?

This year I was invited to a New Year’s Eve party by an old college friend. Being the introverted person that I am, I dread these types of social gatherings. Still, in hopes of turning over a new leaf, I decided to go.

At the party, I knew only a handful of people, so I was forced to do the question dance for most of the night. For those of you who don’t know what I’m talking about, it usually goes like this:

Q: So what do you do for a living?
A: I’m a teacher.
Q: That’s great, what do you teach?

This question-answer-question format is the staple for new interactions. It provides commonality between the two parties and offers a platform to branch off into other areas of discussion. This pattern is as old as time. If you could go back a million years you would see two cavemen awkwardly standing around a fire asking, “So you hunter or gatherer?”

Unfortunately, for those of us in e-discovery, it is never that simple.

Q: So what do you do for a living?
A: I work in electronic discovery.
blank stare

While I received a few different responses at the party, “blank stare” was definitely the most common. My favorite though was, “So is electronics discovery like when you help people find their lost remote control?”

Throughout the night I came up with numerous definitions to explain what I do. Each time though, my definition was a conversation stopper. Later that evening, I thought about what I was doing wrong.

[Read more...]

Tangled: Taking the Knots Out of E-Mail Threads and Near Duplicates

I was recently asked out on a daddy-daughter date by my three-year-old girl. Although I was the one paying (she has yet to find a job, I blame the recession), she got to pick the activity. For our date, we went out to the matinée showing of the Disney movie, Tangled. For those of you without young children, Tangled is the modern interpretation of the classic fairytale, Rapunzel.

At work the following Monday I was asked how to effectively untangle e-mail threading and near duplicates. I immediately thought of the movie I just saw and realized that I have knots of my own – or, should I say, of my clients – to work through.

For example: Say you have an e-mail with an attachment, a real estate contract. This e-mail was sent to five recipients. Each of them then forwarded it to five others. In the collection, there exist multiple duplicates and near-duplicates of the attachment that were themselves sent in multiple different e-mails. These e-mails were also forwarded, modified and re-forwarded.

How can the review team be sure these documents are being reviewed efficiently and coded consistently? With multiple reviewers and a large document collection, the chain of duplicates and near-duplicates can easily lead to confusion and inconsistencies.

Consider another example: When company employees use a graphical logo in their e-mail signatures, that logo travels as an attachment. If I look for near-duplicates of such an e-mail, I can end up looping in easily half the e-mails from that site. Where do the near-dupes begin and end?

You see why I referenced Tangled? Instead of 70 feet of hair, I had what seemed like a mile of documents to straighten out. While threading and near-dupe technology is great for increasing review speed and maintaining consistent coding, it can quickly branch out into a big mess.

To solve my dilemma I needed to find a way to add structure to these content groupings. I thought back to my daughter and the movie and came up with a way to keep both her hair and my documents from getting out of control.

[Read more...]

Setting Up Review Workflows for Multi-Language Documents

The world is getting smaller. For large corporations, it is virtually certain that their operations span multiple countries. But it is no longer just large corporations that operate globally. These days, even small- and mid-sized businesses are likely to have international components.

When a business is international, then any legal matters involving that business are also likely to be international in scope. In the context of litigation or a government investigation, that means the matter is likely to involve documents in more than one language. Often, such cases will involve collections of documents in a number of different languages – or even single documents containing multiple languages.

In other words, multi-language documents are a fact of e-discovery life these days. For e-discovery professionals, processing and review of multi-language collections raise a number of issues. In this post, I want to talk about one – review workflow.

Language Identification

Any successful multi-language review begins with computerized language identification. While most platforms support language identification, they tend to vary greatly in efficiency. Language identification uses built-in dictionaries to identify the primary (and sometimes secondary) language present in a document. This information can be used to route documents to the appropriate reviewer or to distinguish which documents need to be sent out for translation before they can be reviewed.

In the case of Chinese/Japanese/Korean (CJK) documents, language identification is less precise than when dealing with a Western Language character set. Frequently, document headers, email formatting, and email signatures contain CJK text while the substantial portion of the record would be in a Western language. This minimal amount of text causes the entire document to be coded as CJK. To navigate this problem, use a search tool that can both tokenize and count Western versus CJK words within a given document. These numbers can help establish a baseline to determine the true language of a document.
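A minimal sketch of the counting approach follows. It makes two loud assumptions: each CJK character is counted as one token (a crude stand-in for real CJK tokenization), and the 50 percent threshold is an arbitrary starting point you would tune against a sample of your own collection. The function names are hypothetical.

```python
import re

# Hiragana, Katakana, CJK unified ideographs, and Hangul syllables.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")
# Runs of basic and accented Latin letters, treated as Western words.
WESTERN_WORD = re.compile(r"[A-Za-z\u00c0-\u024f]+")

def cjk_share(text):
    """Fraction of counted tokens that are CJK (0.0 for empty text)."""
    cjk = len(CJK_CHAR.findall(text))
    western = len(WESTERN_WORD.findall(text))
    total = cjk + western
    return cjk / total if total else 0.0

def primary_script(text, threshold=0.5):
    """Label a document CJK only when CJK tokens reach the assumed threshold."""
    return "CJK" if cjk_share(text) >= threshold else "Western"

email = "株式会社 Example\nPlease review the attached contract before Friday."
print(primary_script(email))  # Western: 4 CJK characters against 8 Western words
```

A short CJK header or signature no longer flips the whole document, because the label depends on the balance of the text rather than the mere presence of CJK characters.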

Language Specific Batching

With the limited number of foreign language reviewers and the often high cost associated with obtaining their services, it is important to have a clear system for assigning foreign language documents.

While most foreign language reviewers can review both their native language and English, you don’t want them wasting their time reviewing a document that 90% of your other reviewers can read. English documents, documents with an unknown language, and documents with no text should go to your English reviewers. Only documents containing a non-English language should go to your foreign language reviewers.

Keep in mind though, when reviewing by document families, if any document contains a foreign language, the entire document set should go to your foreign language reviewer.
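The batching rules in this section, including the family rule, can be sketched as follows. The record layout (id, family, language) and the queue names are assumptions for illustration. When a family contains more than one foreign language, this sketch simply picks the first one alphabetically, which is a placeholder for whatever tie-break your team prefers.

```python
def assign_queues(docs):
    """Map each document id to a review queue, keeping families together.

    Each doc is a dict with 'id', 'family', and 'language'
    (language is None when unknown or when the document has no text).
    """
    # First pass: collect the foreign languages found in each family.
    family_langs = {}
    for doc in docs:
        lang = doc["language"]
        if lang not in (None, "en"):
            family_langs.setdefault(doc["family"], set()).add(lang)

    # Second pass: a family with any foreign-language member goes, as a whole,
    # to a foreign-language queue; everything else goes to English reviewers.
    queues = {}
    for doc in docs:
        langs = family_langs.get(doc["family"])
        queues[doc["id"]] = sorted(langs)[0] if langs else "en"
    return queues

docs = [
    {"id": "e1", "family": "A", "language": "en"},
    {"id": "a1", "family": "A", "language": "ru"},
    {"id": "e2", "family": "B", "language": "en"},
    {"id": "e3", "family": "C", "language": None},
]
print(assign_queues(docs))
```

Here the English cover e-mail e1 travels with its Russian attachment a1, while the unknown-language document e3 defaults to the English queue.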

Flexible Workflows

No language identification is perfect. Inevitably reviewers are going to come across documents they can’t read. That’s why it is important to have a flexible workflow when setting up your review.

Take advantage of a rule-based review platform to re-route documents to the appropriate reviewer. If an English reviewer comes across a Russian document, the platform should provide the ability to reassign that document without it being lost in the shuffle. If no reviewer exists with that language proficiency, incorporate a system where a coder can tag a document for translation.

Objective Indexing: Facciola-Redgrave Framework

To follow up on my recent blog post on the Facciola-Redgrave Framework, I was asked to go into more detail regarding what, if anything, should be provided to the opposing party.

As I described in my earlier post, U.S. Magistrate Judge John M. Facciola and Nixon Peabody partner Jonathan M. Redgrave have proposed a system of dealing with privilege by using categories. They outlined their proposal in a November 2009 article published in The Federal Courts Law Review, Asserting and Challenging Privilege Claims in Modern Litigation: The Facciola-Redgrave Framework.

Facciola and Redgrave propose that instead of a traditional privilege review, large segments of documents can be grouped together by categories with a high likelihood of privilege. These categories of documents can be excluded from entry in a privilege log and, in some cases, from collection and review entirely.

So what should be handed to the other side for these categorized documents? According to the journal authors, in lieu of a traditional privilege log, an objective index can be provided for some or all of the categories. An objective index is comprised of “readily available information [that] can be prepared to aid the parties in understanding the universe of documents at issue.”

Facciola and Redgrave stress that this is not intended to be a substitute for a privilege log and should not be an undue burden to the producing party. The data should come from information already stored in a database, such as authors, recipients, dates, assigned Bates or production number, and file size.
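To show how little work such an index can be when the fields already exist, here is a hedged Python sketch that writes one as CSV. The field names, the category label, and the record layout are hypothetical; they are not taken from the Facciola-Redgrave article or from any particular review platform.

```python
import csv
import io

# Assumed objective fields already stored in the review database.
FIELDS = ["bates_number", "author", "recipients", "date", "file_size"]

def objective_index(records, category):
    """Return CSV text listing only objective fields for one privilege category."""
    buf = io.StringIO()
    # extrasaction="ignore" drops everything not in FIELDS, such as body text.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        if rec.get("category") == category:
            writer.writerow(rec)
    return buf.getvalue()

records = [
    {"bates_number": "ABC0001", "author": "gc@client.example",
     "recipients": "counsel@lawfirm.example", "date": "2010-03-01",
     "file_size": 2048, "category": "outside counsel", "body": "secret"},
    {"bates_number": "ABC0002", "author": "sales@client.example",
     "recipients": "buyer@other.example", "date": "2010-03-02",
     "file_size": 512, "category": "none", "body": "quote attached"},
]
print(objective_index(records, "outside counsel"))
```

Because the writer is restricted to a fixed list of fields, document content cannot leak into the index by accident.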

When creating an objective index, be sure to work with someone who is familiar with databases and your data in particular. With the right technical counsel, you can make short work of a process that, when done on a document level, could take a room full of attorneys days to accomplish.

(Ron Tienzo is assistant director of Catalyst Consulting.)