E-Discovery Search Blog

Bob Ambrogi

About Bob Ambrogi

A lawyer and veteran legal journalist, Bob advises Catalyst on strategic communications and marketing matters. He is also a practicing lawyer in Massachusetts and is the former editor-in-chief of The National Law Journal, Lawyers USA and Massachusetts Lawyers Weekly. A fellow of the College of Law Practice Management, he also writes the blog LawSites.

Was Samsung Deal a Watershed for Use of Machine Translation in FTC Second Requests?

Another e-discovery technology may recently have had its watershed moment, much like the one marked by U.S. Magistrate Judge Andrew J. Peck’s endorsement of predictive coding. This time, the milestone involved the use of machines in place of humans to translate foreign-language documents in a massive Federal Trade Commission Second Request.

The deal that was the subject of the Second Request was remarkable by any measure. It involved the nearly $1.4 billion sale by South Korea-based Samsung Electronics Co. Ltd., a world leader in digital consumer electronics and information technology, of its hard disk drive operations to Seagate Technology Plc, a major manufacturer of disk drives and storage products. After the deal closed in December, Samsung reported its largest-ever quarterly profit.

The deal was also momentous for the law firm that represented Samsung, Paul Hastings. Last month, the antitrust publication Global Competition Review named the Samsung/Seagate transaction as “Matter of the Year,” recognizing both Paul Hastings for its representation of Samsung and Wilson Sonsini Goodrich & Rosati for its representation of Seagate.

But before the deal could be consummated, it had to await Second Request review by the FTC, a fast-track, discovery-like process that gives the FTC the documents and information it needs to evaluate whether the proposed transaction may violate antitrust laws.

One Million Pages to Translate

For the attorneys at Paul Hastings, the Second Request process was further complicated by one simple fact—the vast majority of the documents they had to produce were in Korean, but the Second Request process requires all documents to be produced in English. Based on the firm’s preliminary estimates, of roughly 350,000 documents they had collected, 200,000 were in Korean. At an estimated five pages per document, Paul Hastings estimated that they could be facing as many as a million pages of Korean-language text.

The attorneys faced two hurdles. Time was the first; as long as the parties have not complied with the Second Request, they are prohibited from closing the transaction, potentially resulting in significant business losses while the two companies continue to operate independently. Expense was the other; by their estimates, the cost to translate all 200,000 documents by hand could easily exceed $20 million.

Given these hurdles, the Paul Hastings attorneys decided early in the process to try machine translation (MT) as an alternative to hand translation. MT held the promise of both expediting the translation process and saving their client substantial expense.

In the field of e-discovery, MT is a technology that is gaining recognition and use. Sophisticated MT software uses linguistic and statistical techniques to translate text from one language to another. When the software is seeded with key terms and phrases and the dictionary is updated as the translation progresses, results can be impressive. While machine translation is far from perfect, in the e-discovery context, it can often eliminate or greatly reduce the need for human translation.

Using MT for the Second Request

In a Second Request scenario, each party must certify that it has “substantially complied” with the government’s requests, including the request that documents be provided in English.  With this substantial compliance standard in mind, the Paul Hastings team went forward with its plan, selecting an MT vendor, providing core terms and names to seed the dictionary, and working with the vendor to refine results. Simultaneously, certain documents identified as particularly relevant or important were singled out for hand translation to ensure maximum accuracy for this subset. When the translation process was complete, the lawyers delivered the translated documents to the FTC.

As the FTC was quick to note, the machine-translated documents did not have the same level of quality as the hand-translated materials.  However, Paul Hastings took the position that the translations were sufficient to allow the agency to identify relevant materials for further review and to seek hand translation of particular documents as necessary. In the end, the Second Request process ran its course and the FTC allowed the merger to close.

In this case, at least, machine translation was sufficient to allow the process to reach its end. Not only that, but by using MT, Paul Hastings saved its client a huge expense. Remember that estimated human translation cost of over $20 million? By contrast, the final cost of machine translation was less than 5% of that amount.
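The arithmetic behind these figures can be laid out in a few lines. What follows is only a rough sketch built from the estimates reported above (200,000 Korean documents, five pages each, a hand-translation cost that could exceed $20 million, and an MT cost of less than 5% of that). The per-page rate is derived from those numbers rather than quoted by the firm, and the function and variable names are illustrative.

```python
# Rough check of the translation cost figures reported in this post.
# All inputs are the firm's estimates; the per-page rate is derived, not quoted.

KOREAN_DOCS = 200_000          # Korean-language documents (estimate)
PAGES_PER_DOC = 5              # estimated average pages per document
HUMAN_COST_USD = 20_000_000    # hand translation "could easily exceed" this
MT_PERCENT_CEILING = 5         # MT cost was "less than 5%" of the human estimate


def estimate_costs(docs=KOREAN_DOCS, pages_per_doc=PAGES_PER_DOC,
                   human_cost=HUMAN_COST_USD, mt_percent=MT_PERCENT_CEILING):
    """Return page count, implied per-page rate, MT ceiling and minimum savings."""
    pages = docs * pages_per_doc
    human_per_page = human_cost / pages           # implied hand-translation rate
    mt_ceiling = human_cost * mt_percent // 100   # upper bound on the MT cost
    min_savings = human_cost - mt_ceiling         # lower bound on the savings
    return {
        "pages": pages,
        "human_per_page": human_per_page,
        "mt_ceiling": mt_ceiling,
        "min_savings": min_savings,
    }


if __name__ == "__main__":
    result = estimate_costs()
    print(f"Pages to translate:       {result['pages']:,}")
    print(f"Implied human rate:       ${result['human_per_page']:.2f} per page")
    print(f"MT cost ceiling:          ${result['mt_ceiling']:,}")
    print(f"Savings of at least:      ${result['min_savings']:,}")
```

On these assumptions, the million pages imply a hand-translation rate of about $20 per page, an MT bill of under $1 million, and savings of at least $19 million.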

What Does this Mean?

Recently, I interviewed Michael S. Wise, a Paul Hastings litigation associate who was heavily involved in the Samsung deal and the translation project.

There is no definitive way to determine whether MT has ever before been used to such an extent in an FTC Second Request, Wise said. And the circumstances of the case prevent any solid inference that the FTC now endorses MT.

Still, the implications are significant for large corporations headquartered outside the United States, he believes. “We hear multiple war stories of companies paying millions for hand translations of materials in the Second Request process. That seems ludicrous from a cost perspective.”

Despite his firm’s success with the process in the Samsung case, Wise cautions that MT remains far from perfect and therefore is not always a substitute for human translations, particularly not in an adversarial context where precision may be demanded by the opposing party.

“But in the agency context, where there’s no wrongful conduct being alleged, where it only involves the economic effects of the transaction, you have a strong argument that the government has an obligation to employ reasonableness in the costs it imposes on a company,” Wise said. “With MT, you can control the cost and still be able to run search terms and find what you’re looking for.”

Wise says he would use MT again when the circumstances warrant it. He also says he would refine the approach based on the lessons he learned here. For example, he found that having separate vendors for translation and document review/hosting resulted in technical compatibility issues; in the future, he would prefer a vendor that offers a single platform to handle both MT and review. He would also budget more time for the MT process and start with a larger seed set than the vendor recommended in this case. To maximize accuracy in his next case, he believes, he would want double the number of seed documents and an extra month built into the timeline, which could be used to work with the vendor on fine-tuning its translation engine.

So was this a watershed moment for MT? Wise believes it was.

“Five years ago, maybe less, you had to hand-translate everything. We are getting to the point now where, done properly, there is the opportunity to employ these technologies to substantially reduce costs.”

The story of how lawyers can use technology to streamline e-discovery is not just about predictive coding. With machine translation, the story now has another chapter.

(Note: Although Catalyst offers a full range of language services, including MT and human translation, it was not involved in this case.)

Two Catalyst Researchers Coauthor Book on Next-Generation Search

Two leaders of research and development here at Catalyst have helped write a seminal new book on search and information retrieval, Next Generation Search Engines: Advanced Models for Information Retrieval, released in March by IGI Global.

Bruce Kiefer, leader of the platform group at Catalyst, and Reed Esau, a platform architect focused on research and development at Catalyst, together with Michael W. Berry, associate director of the Center for Intelligent Systems & Machine Learning at the University of Tennessee, co-authored the book’s chapter, “The Use of Text Mining Techniques in Electronic Discovery for Legal Matters.”

As volumes of e-discovery data have outgrown the manual processes long used to make relevance judgments, Kiefer and Esau explain how methods of text mining and information retrieval, including predictive coding, can be used to help reduce data volumes. Acknowledging that text-mining techniques have so far delivered uneven results, they start the chapter by looking at the historical bias of the collection process. They then examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization can deal with nuances of the collection process.

Their chapter is part of a book intended for scientists and decision-makers who wish to gain working knowledge about search in order to evaluate available options and to engage in a dialogue with software and data providers. The aim of the book is to give readers a better understanding of the latest trends in applied research.

The book is available for purchase in hardcover and e-book editions from IGI Global (www.igi-global.com). The individual chapter can be purchased separately as a PDF download.

For more information about the book and the authors, see the full announcement.

 

Catalyst’s Jim Eidelman Discusses Predictive Coding in ‘Law Technology News’

Now that U.S. District Judge Andrew L. Carter Jr. has affirmed the groundbreaking predictive coding order issued by U.S. Magistrate Judge Andrew J. Peck in Da Silva Moore v. Publicis Groupe, Law Technology News reporter Evan Koblentz went back and spoke to leading professionals in the legal technology field for their reactions. You can read his story here: Take Two: Reactions to ‘Da Silva Moore’ Predictive Coding Order.

One of the people Koblentz quotes is Catalyst’s own Jim Eidelman, senior search and analytics consultant on the Catalyst Search & Analytics Consulting team. These court decisions gave predictive coding “a legitimacy that was needed,” Eidelman told Koblentz. But before predictive coding can fully enter the mainstream, engineers need to work out some of the technology’s limitations, he said.

“Obviously it is all about the process, the sampling, and the use of common sense,” Eidelman said. “Some documents can only be found other ways, and predictive coding isn’t a universal solution. Clearly multi-mode searching and review is required in every case, with or without da Silva.”

Eidelman goes on to discuss what he says is “one of the big defensibility issues nobody is talking about.” That issue is pre-culling using keyword searching — something that can leave relevant documents behind and taint the process.

“Other big issues are sampling methodologies, how multiple issues are handled, and attorney-client privilege,” Eidelman says in the article. “There is still so much to be worked out. We are just at the infancy of the machine learning applied to e-discovery documents, even though ‘relevance feedback’ has been used in other areas for decades.”

Read the full article with reactions from a number of technology professionals at Law Technology News.

‘Big Data’ Keynote at MarkLogic Conference Features Catalyst Insight

MarkLogic calls itself “the operational database for big data,” and “big data” was the theme of the executive keynote yesterday that kicked off the three-day MarkLogic World 2012 conference in Washington, D.C. And to help illustrate that theme and the power of MarkLogic to handle big data, the keynote presenters invited one company–you guessed it, Catalyst–to share the stage and highlight our revolutionary new e-discovery platform built using MarkLogic, Catalyst Insight.

MarkLogic’s COO Keith Carlson and SVP Products Gary Lang presented the keynote, where they featured customers that are using big data in big ways — customers as diverse as Warner Bros., which uses MarkLogic to manage its digital assets, and the San Francisco Giants, which uses it to manage and analyze thousands of play-by-play images of every game.

The keynote speakers then highlighted three companies that had integrated the MarkLogic database into their own products. This time, they talked about companies such as Tableau Software and CQ Roll Call. But of the companies they chose to cover, they invited just one to join them on stage.

In front of an audience of 800-plus attendees, T.J. Gill, Catalyst’s vice president of business development, joined Carlson and Lang on stage to talk about Insight and how it uses the MarkLogic platform to handle hundreds of terabytes of data for some of the world’s largest cases and corporations. For T.J., it was both thrilling and nerve-wracking, but for Catalyst, it was an honor to see Insight recognized in this way.

Throughout the keynote, an illustrator was graphically journaling the presentation. By the end, he had created a single picture that pretty much encapsulated the entire presentation. As you can see from the images below, Catalyst Insight is shown prominently (just below the MarkLogic name and to the left of “big data”).

 

 

 

Federal Court Affirms Judge Peck’s Predictive Coding Order

When last we left the case of Da Silva Moore v. Publicis Groupe–the groundbreaking case in which U.S. Magistrate Judge Andrew J. Peck issued the first judicial opinion to endorse the use of computer-assisted review and predictive coding–it was headed for review by U.S. District Judge Andrew L. Carter Jr. Now, thanks to a heads-up from Evan Koblentz at Law Technology News, we learn that Judge Carter has issued his ruling and has adopted Judge Peck’s opinion.

“The Court adopts Judge Peck’s rulings because they are well reasoned and they consider the potential advantages and pitfalls of the predictive coding software,” Judge Carter wrote in an opinion filed today.

In challenging Judge Peck’s order, the plaintiffs had argued that he had mischaracterized and confused the issue of whether they had consented to the use of predictive coding. Judge Carter concluded that any such confusion was immaterial.

The confusion is immaterial because the ESI protocol contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs. It provides that the search methods will be carefully crafted and tested for quality assurance, with Plaintiffs participating in their implementation. For example, Plaintiffs’ counsel may provide keywords and review the documents and the issue coding before the production is made. If there is a concern with the relevance of the culled documents, the parties may raise the issue before Judge Peck before the final production. Further, upon the receipt of the production, if Plaintiffs determine that they are missing relevant documents, they may revisit the issue of whether the software is the best method.

Plaintiffs also challenged Judge Peck’s order on the ground that predictive coding is not a reliable method. Judge Carter ruled that this issue is also premature. As the litigation continues, if the parties believe the predictive coding software is flawed or that the process produces incomplete results, they can raise their concerns with Judge Peck and ask him to reconsider, Judge Carter noted. “To call the method unreliable at this stage is speculative.”

There simply is no review tool that guarantees perfection. The parties and Judge Peck have acknowledged that there are risks inherent in any method of reviewing electronic documents. Manual review with keyword searches is costly, though appropriate in certain situations. However, even if all parties here were willing to entertain the notion of manually reviewing the documents, such review is prone to human error and marred with inconsistencies from the various attorneys’ determination of whether a document is responsive. Judge Peck concluded that under the circumstances of this particular case, the use of the predictive coding software as specified in the ESI protocol is more appropriate than keyword searching. The Court does not find a basis to hold that his conclusion is clearly erroneous or contrary to law.

As to that secondary issue I mentioned in an earlier blog post–whether Rule 702 and Daubert apply to a court’s acceptance of a predictive-coding protocol–Judge Carter made short work of that. In a footnote, he wrote: “The Court adopts Judge Peck’s analysis of Rule 26(g) and Fed. R. Evidence 702 for similar reasons provided in his written opinion.”

Thus, Judge Peck’s predictive coding order has stood its ground and, with Judge Carter’s adoption of his reasoning, the use of predictive coding has taken another giant step towards the mainstream.

Catalyst Secures $32 Million Growth Equity Investment

Major news at Catalyst today: The company announced that FTV Capital has acquired a significant stake in Catalyst, investing a total of $32 million.

FTV Capital is a multi-stage growth equity firm that invests in high-growth companies offering a range of innovative solutions in business services, financial services, payments and transaction processing, and technology. FTV Capital provides entrepreneurs with unique access to its Global Partner Network, a group of the world’s foremost financial institutions that have invested in FTV Capital and its portfolio companies for more than a decade. Founded in 1998, FTV Capital has more than $1 billion across three funds, and has offices in San Francisco and New York.

“We are highly enthusiastic about partnering with Catalyst, given the significant achievements and momentum of this company,” said Eric Byunn, FTV partner and new Catalyst board member. “Catalyst’s long-tenured, proven management team has built a differentiated, proprietary technology offering that has attracted an impressive client base of blue chip companies and large law firms. Global demand for e-discovery is huge and growing rapidly, and Catalyst is well-positioned to capitalize on this multi-billion dollar market.”

“FTV’s new investment represents a powerful endorsement of Catalyst’s accomplishments and strong outlook, bolstered by the continued support from our initial investors, who have retained a stake in the company,” said John Tredennick, CEO of Catalyst. “FTV brings domain expertise and a strong network of industry contacts which will help contribute to Catalyst’s continued strong growth, especially as we expand into new markets.”

“We have worked with this management team since our initial investment in 2005 and are proud of the growth Catalyst has achieved,” said Tyler Newton, a partner at Catalyst Investors and former Catalyst board member. “Management’s vision and the company’s capabilities will enable it to continue to attract enterprises and law firms worldwide who are looking for an e-discovery solution with speed, scalability and features that attack complex legal issues.” Catalyst Investors is selling the majority of its ownership in Catalyst in conjunction with this transaction.

Read the full announcement here or download the PDF version.

Spoliation Meets the Backyard Barbecue: The Bonfire of Veracity

We don’t often write about spoliation cases here, but some cases are so burning hot that they ignite our interest. Such is the recent ruling in an employment discrimination case from U.S. District Chief Judge William H. Steele of the Southern District of Alabama, Evans v. Mobile County Health Department.

The plaintiff, Sandra Evans, appealed to Judge Steele from a finding of a U.S. magistrate judge. The magistrate concluded that she had purposefully destroyed electronic evidence contained on her personal computer during the pendency of the litigation and that the destruction amounted to spoliation, deserving of sanctions. The plaintiff was clearly hot under the collar about this, asserting that her PC contained no discoverable evidence.

Judge Steele made short work of plaintiff’s assertion. There was abundant evidence that her PC contained information that was potentially discoverable, he concluded. This included emails she had admitted forwarding from her work computer to her home computer and a daily work diary she kept on her home computer.

But the one fact that seemed to most burn the judge–and that turned the plaintiff’s arguments to ashes–was the method by which she destroyed the evidence.

She took her home computer out into her yard and set it ablaze.

How did she explain this rather unusual act? Her computer had crashed, she contended, so she set it on fire in order to keep tax information it contained out of the hands of third parties.

Given all this, the judge’s response was rather cool. Her explanation was “facially implausible,” he said, “leaving only the destruction of discoverable information as a likely motive.” Sanctions affirmed.

Had I been the plaintiff’s counsel, I would have cited that famous Sedona Conference e-discovery document in her defense. You know the one, The Conflagration Proclamation.

Catalyst Named Leading Provider of Predictive Coding Technology

Complex Discovery, a blog focused on the intersection of electronic discovery and social media, has named Catalyst a leading provider of predictive coding technology.

The blog’s author, Rob Robinson, compiled the list of 22 predictive coding providers based on his review of information from leading providers of e-discovery technology.

See Rob’s full list here: 20+ Predictive Coding Technology Providers.

‘Patent Troll’ Lawsuits Target E-Discovery Vendors, Including Catalyst

A company called Lone Star Document Management has filed a string of patent-infringement lawsuits against a number of e-discovery companies, including Catalyst. The lawsuits, most of which were filed this month, claim infringement of U.S. patent 6,918,082, which describes a system for proofing PDF documents delivered over a network.

In an interview with reporter Evan Koblentz at Law Technology News, Catalyst’s founder and CEO John Tredennick calls the lawsuit “baloney.” Some of the patent claims do not apply to e-discovery software, Tredennick says, and those that do–such as the ability for users to add comments to documents–were standard methods well before this patent was filed in 1998.

The company appears to be a shell company formed to acquire and enforce patents, Koblentz reports. The company has no website and its Plano, Texas, address is merely a mail-forwarding location that forwards mail to one of the patent’s inventors, Jeffrey M. Gross of Brooklyn, N.Y.

Other e-discovery companies that have been sued include Atalasoft; Breeze LLC; Business Intelligence Associates; CAP Digisoft Solution Inc.; CaseCentral; Digital Reef; Gallivan, Gallivan & O’Melia; Lexbe; and Trial Solutions of Texas (now known as CloudNine Discovery).

Tredennick tells Koblentz that he has not yet been served with the complaint, but that it looks like a case of patent trolls. “The first thing I’m going to do is try to contact all the other defendants and see if we can’t fight it together using a common defense fund, which cuts down the expenses dramatically,” Tredennick says. “The second thing is we’re going to go after these guys for filing a frivolous suit. We’re going to go after them hard for sanctions.”

Should the ‘Daubert’ Standard Apply to Predictive Coding? We May Know Soon

It’s been a month since U.S. Magistrate Judge Andrew J. Peck issued his seminal opinion on predictive coding, Da Silva Moore v. Publicis Groupe, and it continues to make waves. Notably, it appears that U.S. District Judge Andrew L. Carter Jr. will weigh in on the issue. On March 13, he entered an order granting plaintiffs’ request to submit additional briefing on their objections to Judge Peck’s order.

A key issue Judge Carter may need to address is one given short shrift in coverage of and commentary on Judge Peck’s opinion. Understandably, most of the commentary focused on the fact that Judge Peck’s opinion marked a milestone — the first judicial opinion to recognize that computer-assisted review is an acceptable way to search for electronically stored information.

But in the course of that opinion, Judge Peck made another significant ruling. He concluded that Federal Rule of Evidence 702 and the Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals do not apply to a court’s acceptance of a predictive-coding protocol.

Rule 702 and Daubert give trial judges the responsibility to act as “gatekeepers” to exclude unreliable scientific and technical expert testimony. Judge Peck reasoned that these did not apply to the Da Silva Moore case because no one was trying to put anything into evidence. Here is how he explained it:

If MSL sought to have its expert testify at trial and introduce the results of its ESI protocol into evidence, Daubert and Rule 702 would apply. Here, in contrast, the tens of thousands of emails that will be produced in discovery are not being offered into evidence at trial as the result of a scientific process or otherwise. The admissibility of specific emails at trial will depend upon each email itself (for example, whether it is hearsay, or a business record or party admission), not how it was found during discovery.

Rule 702 and Daubert simply are not applicable to how documents are searched for and found in discovery.

You may recall that before Judge Peck issued his written opinion in this case on Feb. 22, he made oral rulings at the motion hearing on Feb. 8. On Feb. 22, just as Judge Peck was issuing his written opinion, the plaintiffs filed objections to his Feb. 8 rulings. One of their central arguments was that Judge Peck erred in disregarding his gatekeeper role under Daubert.

Because predictive coding is a novel technology, they argued, Judge Peck should have required expert testimony regarding its reliability or appropriateness. They cited Magistrate Judge Paul Grimm’s well-known ruling in Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 260 n.10 (D. Md. 2008), where he said, “[R]esolving contested issues of whether a particular search and information retrieval method was appropriate … involves scientific, technical or specialized information.” Relying on this, the plaintiffs argued:

[A]t no point did the Magistrate review any evidence to support his decision. The Magistrate took no judicial notice of any documents or studies that support the reliability of MSL’s method, nor did he receive any affidavits or declarations from purported experts that supported the methodology of MSL’s method. To his credit, the Magistrate did ask the parties to bring the ESI experts they had hired to advise them regarding the creation of an ESI protocol. These experts, however, were never sworn in, and thus the statements they made in court at the hearings were not sworn testimony made under penalty of perjury. The Magistrate judge never asked for or evaluated the qualifications of these experts, nor were the parties given an opportunity to question or cross-examine the experts in order for the Court to make a finding regarding the reliability of the experts’ opinions. Thus, the Magistrate’s decision relies only on the arguments made by counsel.

On March 7, the defendants responded to plaintiffs’ objections. With regard to the Daubert issue, they took the same position as Judge Peck–that Rule 702 and Daubert do not apply to the methods used to take discovery, but only kick in when evidence is presented at trial.

Plaintiffs simply are incorrect in their assertion that Victor Stanley requires expert testimony regarding the methodology selected by a party to search for electronically stored information. Rather, this case only requires that the selected methodology was carefully planned by qualified persons, contains provisions for quality assurance, and is supported by persons with the requisite qualifications and experience.

After receiving the defendants’ response, the plaintiffs wrote to Judge Carter on March 9 asking for leave to file their own response.

[W]hile Plaintiffs were denied an opportunity to respond to Magistrate Judge Peck’s written opinion, MSL had the benefit of filing its opposition approximately two weeks after the written opinion had been issued. Indeed, MSL’s opposition brief and supporting expert declarations not only reference, but also largely rely upon Magistrate Judge Peck’s observations regarding Plaintiffs’ Objection.

Judge Carter granted the plaintiffs’ request on March 13. (The brief was due March 19.) That means that we can expect him to issue a ruling of his own. It seems unavoidable that any ruling he issues will address the core issue of the appropriateness of computer-assisted review, at least in this case. Most likely, he will also have to address this secondary issue of the applicability of Daubert. If he does, in fact, squarely address these issues–and regardless of whether he agrees with Judge Peck–his ruling will be yet another milestone for predictive coding.