Catalyst Repository Systems - Powering Complex Legal Matters

E-Discovery Search Blog

Technology, Techniques and Best Practices

Using TAR in International Litigation: Does Predictive Coding Work for Non-English Languages?

[This article originally appeared in the Winter 2014 issue of EDDE Journal, a publication of the E-Discovery and Digital Evidence Committee of the ABA Section of Science and Technology Law.]

Although still relatively new, technology-assisted review (TAR) has become a game changer for electronic discovery. This is no surprise. With digital content exploding at unimagined rates, the cost of review has skyrocketed, now accounting for over 70% of discovery costs. In this environment, a process that promises to cut review costs is sure to draw interest, as TAR, indeed, has.

Called by various names (including predictive coding, predictive ranking, and computer-assisted review), TAR has become a central consideration for clients facing large-scale document review. It originally gained favor for use in pre-production reviews, providing a statistical basis to cut review time by half or more. It gained further momentum in 2012, when federal and state courts first recognized the legal validity of the process.

More recently, lawyers have realized that TAR also has great value for purposes other than preparing a production. For one, it can help you quickly find the most relevant documents in productions you receive from an opposing party. TAR can be useful for early case assessment, for regulatory investigations and even in situations where your goal is only to speed up the production process through prioritized review. In each case, TAR has proven to save time and money, often in substantial amounts.

But what about non-English language documents? For TAR to be useful in international litigation, it needs to work for languages other than English. Although English is used widely around the world,[1] it is not the only language you will see if you get involved in multi-national litigation, arbitration or regulatory investigations. Chinese, Japanese and Korean will be common for Asian transactions; German, French, Spanish, Russian, Arabic and Hebrew will be found for matters involving European or Middle Eastern nations. Will TAR work for documents in these languages?

Many industry professionals doubted that TAR would work on non-English documents. They reasoned that the TAR process was about “understanding” the meaning of documents. It followed that unless the system could understand the documents (and presumably computers understand English), the process wouldn’t be effective.

The doubters were wrong. Computers don’t actually understand documents; they simply catalog the words in documents. More accurately, we call what they recognize “tokens,” because often the fragments (numbers, misspellings, acronyms and simple gibberish) are not even words. The question, then, is whether computers can recognize tokens (words or otherwise) when they appear in other languages.

The simple answer is yes. If the documents are processed properly, TAR can be just as effective for non-English as it is for English documents. After a brief introduction to TAR and how it works, I will show you how this can be the case. We will close with a case study using TAR for Japanese documents.

What is TAR?

TAR is a process through which one or more humans interact with a computer to train it to find relevant documents. Just as there are many names for the process, there are many variations of it. For simplicity’s sake, I will use Magistrate Judge Andrew J. Peck’s definition in Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012), the first case to approve TAR as a method to shape document review:

By computer assisted review, I mean tools that use sophisticated algorithms to enable the computer to determine relevance, based on interaction with a human reviewer.

It is about as simple as that:

  1. A human (subject matter expert, often a lawyer) sits down at a computer and looks at a subset of documents.
  2. For each, the lawyer records a thumbs-up or thumbs-down decision (tagging the document). The TAR algorithm watches carefully, learning during this training.
  3. When the training session is complete, we let the system rank and divide the full set of documents between (predicted) relevant and irrelevant.[2]
  4. We then review the relevant documents, ignoring the rest.

The benefits from this process are easy to see. Let’s say you started with a million documents that otherwise would have to be reviewed by your team. If the computer algorithm predicted with the requisite degree of confidence that 700,000 are likely not-relevant, you could then exclude them from the review for a huge savings in review costs. That is a great result, particularly if you are the one paying the bills. At four dollars a document for review (to pick a figure), you just saved $2.8 million. And the courts say this is permissible.
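The arithmetic behind that figure is simple enough to sketch. The function and parameter names below are illustrative only; the inputs are the example numbers used above, not data from a real matter.

```python
def review_savings(excluded_docs: int, cost_per_doc: float) -> float:
    """Return the review cost avoided by not reviewing excluded_docs."""
    return excluded_docs * cost_per_doc


# Example figures from the text: 700,000 documents excluded at $4 each.
savings = review_savings(excluded_docs=700_000, cost_per_doc=4.00)
print(f"${savings:,.0f}")  # prints $2,800,000
```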

How is TAR Used?

TAR can be used for several purposes. The classic use is to prioritize the review process, typically in anticipation of an outgoing production. You use TAR to sort the documents in order of likely relevance. The reviewers do their work in that order, presumably reviewing the most likely relevant ones first. When they get to a point where the number of relevant documents drops significantly, suggesting that they have seen most of them, the review stops. Somebody then samples the unreviewed documents to confirm that the number of relevant documents remaining is sufficiently low to justify discontinuing further, often expensive, review.

We can see the benefits of a TAR process through the following chart, which is known as a yield curve:

[Figure: yield curve comparing a TAR review (blue line) with a linear review (gray diagonal line)]

A yield curve presents the results of a ranking process and is a handy way to visualize the difference between two processes. The X axis shows the percentage of documents that are available for review. The Y axis shows the percentage of relevant documents found at each point in the review.

As a baseline, I created a gray diagonal line to show the progress of a linear review (which essentially moves through the documents in random order). Without a better means for ordering the documents by relevance, the recall rates for a linear review typically match the percentage of documents actually reviewed, hence the straight line. By the time you have seen 80% of the documents, you probably have seen 80% of the relevant documents.

The blue line shows the progress of a TAR review. Because the documents are ranked in order of likely relevance, you see more relevant documents at the front end of your review. Following the blue line up the Y axis, you can see that you would reach 50% recall (have viewed 50% of the relevant documents) after about 5% of your review. You would have seen 80% of the relevant documents after reviewing just 10% of the total review population.
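For readers who want to see how the points on a yield curve are computed, here is a minimal sketch. It assumes you already know, for each document in review order, whether it turned out to be relevant. The function names and the 20-document toy collection are hypothetical and are not drawn from the chart above.

```python
def recall_curve(ranked_labels):
    """ranked_labels: list of bools in review order, True meaning relevant.

    Returns a list of (fraction_reviewed, recall) points, one per document.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return []
    points = []
    found = 0
    for position, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        points.append((position / len(ranked_labels), found / total_relevant))
    return points


def depth_for_recall(ranked_labels, target):
    """Return the fraction of the collection reviewed when recall first reaches target."""
    for fraction_reviewed, recall in recall_curve(ranked_labels):
        if recall >= target:
            return fraction_reviewed
    return None


# Toy collection: 20 documents, 5 of them relevant.
tar_order = [True, True, False, True, True, False, False, True, False, False] + [False] * 10
linear_order = ([False] * 3 + [True]) * 5  # relevant documents spread evenly

print(depth_for_recall(tar_order, 0.8))     # prints 0.25
print(depth_for_recall(linear_order, 0.8))  # prints 0.8
```

In this toy case the ranked order reaches 80% recall after a quarter of the documents, while the evenly spread order needs 80% of them, which is the gap between the two lines on a yield curve.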

This is a big deal. If you use TAR to organize your review, you can dramatically improve the speed at which you find relevant documents over a linear review process. Assuming the judge will let you stop your review after you find 80% of the relevant documents (and some courts have indicated this is a valid stopping point), review savings can be substantial.

You can also use this process for other purposes. Analyzing inbound productions is one good example. These are often received shortly before depositions begin. If you receive a million or so documents in a production, how are you to quickly find which ones are important and which are not?

Here is an example where counsel reviewed about 200,000 documents received not long before depositions commenced and found about 5,700 which were “hot.” Using a small set of their own judgments about the documents for training, we were able to demonstrate that they would have found the same number of hot documents after reviewing only 38,000 documents. They could have stopped there and avoided the costs of reviewing the remaining 120,000 documents.

[Figure: yield curve for the inbound production example]

You can also use this process for early case assessment, using the ranking engine to place a higher number of relevant documents at the front of the stack.

What about non-English Documents?

To understand why TAR can work with non-English documents, you need to know two basic points:

  1. TAR doesn’t understand English or any other language. It uses an algorithm to associate words with relevant or irrelevant documents.
  2. To use the process for non-English documents, particularly those in Chinese and Japanese, the system has to first tokenize the document text so it can identify individual words.

We will hit these topics in order.

1. TAR Doesn’t Understand English

It is beyond the province of this article to provide a detailed explanation of how TAR works, but a basic explanation will suffice for our purposes. Let me start with this: TAR doesn’t understand English or the actual meaning of documents. Rather, it simply analyzes words algorithmically according to their frequency in relevant documents compared to their frequency in irrelevant documents.

Think of it this way. We train the system by marking documents as relevant or irrelevant. When I mark a document relevant, the computer algorithm analyzes the words in that document and ranks them based on frequency, proximity or some other such basis. When I mark a document irrelevant, the algorithm does the same, this time giving the words a negative score. At the end of the training process, the computer sums up the analysis from the individual training documents and uses that information to build a search against a larger set of documents.

While different algorithms work differently, think of the TAR system as creating huge searches using the words developed during training. It might use 10,000 positive terms, with each ranked for importance. It might similarly use 10,000 negative terms, with each ranked in a similar way. The search results would come up in an ordered fashion sorted by importance, with the most likely relevant ones coming first.

None of this requires that the computer know English or the meaning of the documents or even the words in them. All the computer needs to know is which words are contained in which documents.
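To make the idea concrete, here is a toy sketch of frequency-based ranking. It is not Catalyst's algorithm or any vendor's; it assumes a simple smoothed log-ratio of how often each token appears in relevant versus irrelevant training documents, and every name and sample document in it is hypothetical.

```python
import math
from collections import Counter


def train_weights(training_docs):
    """training_docs: list of (tokens, is_relevant) pairs.

    Returns {token: weight}. Positive weights lean relevant, negative lean irrelevant.
    """
    relevant_counts, irrelevant_counts = Counter(), Counter()
    for tokens, is_relevant in training_docs:
        (relevant_counts if is_relevant else irrelevant_counts).update(tokens)
    vocabulary = set(relevant_counts) | set(irrelevant_counts)
    relevant_total = sum(relevant_counts.values()) + len(vocabulary)
    irrelevant_total = sum(irrelevant_counts.values()) + len(vocabulary)
    weights = {}
    for token in vocabulary:
        # Add-one smoothing keeps the ratio finite for tokens seen on one side only.
        p_relevant = (relevant_counts[token] + 1) / relevant_total
        p_irrelevant = (irrelevant_counts[token] + 1) / irrelevant_total
        weights[token] = math.log(p_relevant / p_irrelevant)
    return weights


def rank(documents, weights):
    """documents: {doc_id: tokens}. Returns doc ids, most likely relevant first."""
    def score(tokens):
        # Tokens never seen in training contribute nothing.
        return sum(weights.get(token, 0.0) for token in tokens)
    return sorted(documents, key=lambda doc_id: score(documents[doc_id]), reverse=True)


training = [
    (["payment", "official", "gift"], True),
    (["gift", "official", "approval"], True),
    (["lunch", "menu", "friday"], False),
    (["menu", "parking", "friday"], False),
]
weights = train_weights(training)
unreviewed = {
    "doc_a": ["friday", "lunch", "parking"],
    "doc_b": ["official", "gift", "invoice"],
    "doc_c": ["menu", "approval"],
}
print(rank(unreviewed, weights))  # prints ['doc_b', 'doc_c', 'doc_a']
```

Nothing in the sketch depends on the tokens being English. Any strings would do, so long as the same token always looks the same to the computer.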

2. If Documents are Properly Tokenized, the TAR Process Will Work.

Tokenization may be an unfamiliar term to you but it is not difficult to understand. When a computer processes documents for search, it pulls out all of the words and places them in a combined index. When you run a search, the computer doesn’t go through all of your documents one by one. Rather, it goes to an ordered index of terms to find out which documents contain which terms. That’s why search works so quickly. Even Google works this way, using huge indexes of words.
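A minimal sketch of such an index, assuming the documents have already been broken into tokens, might look like the following. The function names and sample documents are hypothetical, and production search engines store far more than this (positions, frequencies, compression).

```python
from collections import defaultdict


def build_index(documents):
    """documents: {doc_id: list of tokens}. Returns {token: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, tokens in documents.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def search_all(index, terms):
    """Return ids of documents containing every term, using set intersection."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {
    1: ["bank", "loan", "approval"],
    2: ["bank", "holiday"],
    3: ["loan", "payment"],
}
index = build_index(docs)
print(sorted(search_all(index, ["bank", "loan"])))  # prints [1]
```

The search never opens the documents themselves. It looks up each term in the index and intersects the resulting sets of document ids.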

As I mentioned, however, the computer doesn’t understand words or even that a word is a word. Rather, for English documents it identifies a word as a series of characters separated by spaces or punctuation marks. Thus, it recognizes the words in this sentence because each has a space (or a comma) before and after it. Because not every group of characters is necessarily an actual “word,” information retrieval scientists call these groupings “tokens,” and they call the act of identifying these tokens for the index “tokenization.”

All of these are tokens:

  • Bank
  • door
  • 12345
  • barnyard
  • mixxpelling

And so on. All of these will be kept in a token index for fast search and retrieval.
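For space-delimited text, a tokenizer can be as small as the sketch below. It assumes a token is any unbroken run of letters, digits or underscores and lowercases everything; real indexing engines apply more elaborate rules, and the names here are illustrative.

```python
import re

# A token is any unbroken run of "word" characters (letters, digits, underscore).
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Split text on spaces and punctuation and lowercase each token."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


print(tokenize("Bank door, 12345; barnyard mixxpelling."))
# prints ['bank', 'door', '12345', 'barnyard', 'mixxpelling']
```

Note that the tokenizer happily keeps the number and the misspelling. It has no idea which tokens are real words.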

Certain languages, such as Chinese and Japanese, don’t delineate words with spaces or western punctuation. Rather, their characters run continuously to the end of the line, often with no breaks at all. It is up to the reader to tokenize the sentences in order to understand their meaning.

Many early English-language search systems couldn’t tokenize Asian text, resulting in search results that often were less than desirable. More advanced search systems, like the one we chose for Catalyst, had special tokenization engines which were designed to index these Asian languages and many others that don’t follow the Western conventions. They provided more accurate search results than did their less-advanced counterparts.

Similarly, the first TAR systems were focused on English-language documents and could not process Asian text. At Catalyst, we added a text tokenizer to make sure that we handled these languages properly. As a result, our TAR system can analyze Chinese and Japanese documents just as if they were in English. Word frequency counts are just as effective for these documents and the resulting rankings are as effective as well.
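Our own tokenizer is proprietary, so the sketch below is not it. As a stand-in, it shows one simple, widely used baseline for text without spaces: emitting overlapping two-character tokens (bigrams) from each unbroken run of Chinese or Japanese characters. The character ranges are a rough approximation and all names are hypothetical.

```python
def is_cjk(char):
    """Rough check for common CJK ideographs, hiragana and katakana."""
    code = ord(char)
    return (0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= code <= 0x309F   # Hiragana
            or 0x30A0 <= code <= 0x30FF)  # Katakana


def cjk_bigrams(text):
    """Return overlapping two-character tokens from each unbroken run of CJK characters."""
    tokens, run = [], []
    for char in text + " ":  # the trailing space flushes the final run
        if is_cjk(char):
            run.append(char)
            continue
        if len(run) == 1:
            tokens.append(run[0])  # a lone character becomes its own token
        tokens.extend(first + second for first, second in zip(run, run[1:]))
        run = []
    return tokens


print(cjk_bigrams("東京都庁"))  # prints ['東京', '京都', '都庁']
```

The middle token, 京都, happens to spell "Kyoto" even though the text is about a Tokyo government office. Bigrams over-generate in this way, which is one reason dictionary-based or statistical segmenters are often preferred. Either way, the output is a list of tokens that can be counted and indexed like any other.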

A Case Study to Prove the Point.

Let me illustrate this with an example from a matter we handled not long ago. We were contacted by a major U.S. law firm that was facing review of a set of mixed Japanese and English language documents. It wanted to use TAR on the Japanese documents, with the goal of cutting both the cost and time of the review, but was uncertain whether TAR would work with Japanese.

Our solution to this problem was to first tokenize the Japanese documents before beginning the TAR process. Our method of tokenization—also called segmentation—extracts the Japanese text and then uses language-identification software to break it into words and phrases that the TAR engine can identify.

To achieve this, we loaded the Japanese documents into our review platform. As we loaded the documents, we performed language detection and extracted the Japanese text. Then, using our proprietary technology and methods, we tokenized the text so the system would be able to analyze the Japanese words and phrases.

With tokenization complete, we could begin the TAR process. In this case, senior lawyers from the firm reviewed 500 documents to create a reference set to be used by the system for its analysis. Next, they reviewed a sample set of 600 documents, marking them relevant or non-relevant. These documents were then used to train the system so it could distinguish between likely relevant and likely non-relevant documents and use that information for ranking.

After the initial review, and based on the training set, we directed the system to rank the remainder of the documents for relevance. The results were compelling:

  • The system was able to identify a high percentage of likely relevant documents (98%) and place them at the front of the review queue through its ranking process. As a result, the review team would need to review only about half of the total document population (48%) to cover the bulk of the likely relevant documents.
  • The remaining portion of the documents (52%) contained a small percentage of likely relevant documents. The review team reviewed a random sample from this portion and found only 3% were likely relevant. This low percentage suggested that these documents did not need to be reviewed, thus saving the cost of reviewing over half the documents.
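The sampling step in the second point can be sketched as follows. The size of the sample drawn in this matter is not stated above, so the 500-document sample below is purely hypothetical, chosen only so that the rate comes out to 3%. The interval uses a basic normal approximation, and the function name is illustrative.

```python
import math


def sample_relevance_rate(sample_size, relevant_in_sample, z=1.96):
    """Return (rate, low, high) for a random sample of the unreviewed documents.

    low and high bound an approximate 95% interval (normal approximation).
    """
    rate = relevant_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical sample: 500 documents drawn at random, 15 judged relevant.
rate, low, high = sample_relevance_rate(sample_size=500, relevant_in_sample=15)
print(f"{rate:.1%} relevant (approximate 95% interval: {low:.1%} to {high:.1%})")
```

A larger sample narrows the interval, which is worth weighing against the cost of reviewing the sample itself.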

By applying tokenization before beginning the TAR process, the law firm was able to target its review toward the most-likely relevant documents and to reduce the total number of documents that needed to be reviewed or translated by more than half.

Conclusion

As corporations grow increasingly global, legal matters are increasingly likely to involve non-English language documents. Many believed that TAR was not up to the task of analyzing non-English documents. The truth, however, is that with the proper technology and expertise, TAR can be used with any language, even difficult Asian languages such as Chinese and Japanese.

Whether for English or non-English documents, the benefits of TAR are the same. By using computer algorithms to rank documents by relevance, lawyers can review the most important documents first, review far fewer documents overall, and ultimately cut both the cost and time of review. In the end, that is something their clients will understand, no matter what language they speak.

 


[1] It is, for example, the language used in almost every commercial deal involving more than one country.

[2] Relevant in this case means relevant to the issues under review. TAR systems are often used to find responsive documents but they can be used for other inquiries such as privileged, hot or relevant to a particular issue.

Head of Catalyst’s Korea Office Publishes Book on E-Discovery for Businesses


Youngsoo Park presents Catalyst CEO John Tredennick with a copy of his new book during a meeting this week in Seoul.

The head of Catalyst’s South Korea office, Youngsoo Park, is the coauthor with Jeongho Yoo of a just-published Korean-language book about e-discovery for business leaders. The book, What Every Business Person Should Know about eDiscovery, provides a comprehensive overview of all aspects of e-discovery.

The book is only the second ever about e-discovery published in Korea and the first in which hands-on professionals explore the topic in depth. The book covers the history and basics of e-discovery and then examines key topics and legal issues in e-discovery practice, both in the United States and Korea. It also explains several of the leading technology platforms for e-discovery, including Catalyst Insight. The book was published earlier this month in Seoul by InfoTheBooks.com.

Park, who is considered one of the leading e-discovery experts in Korea, joined Catalyst in 2013, when the company opened its first office in Seoul. He oversees the office and the expansion of Catalyst’s Asia-Pacific operations into South Korea.

His co-author, Jeongho Yoo, is the head of e-discovery, digital forensics and information security for a major corporation in Korea. He has served as an instructor in digital forensics for the Korea Police Investigation Agency, the Cyber Terror Response Center and the Korea Customs Service.

The book includes chapters on e-discovery in the U.S. and the evolution of e-discovery practice under the Federal Rules of Civil Procedure. It also discusses e-discovery in Korea, including an overview of applicable Korean laws and a discussion of the country’s corporate culture and environment.

Other sections of the book cover digital forensics, the stages of e-discovery under the EDRM, and the preparation and implementation of an e-discovery strategy.

The book, which is in Korean, can be ordered here.

 

Article: Meeting the Challenges of Asian Language E-Discovery

As e-discovery reaches across borders into Asia, global companies face new and often unfamiliar challenges. Whatever the nature of the case, if it involves electronic information stored in China, Japan, Korea or elsewhere in Asia, be advised: You’ll be managing case files differently than you would be if you were in the United States.

The challenges presented in managing electronic files in Asia stem from many causes—some geographical, some technical and some cultural.

In Asian countries, the laws governing data and privacy are quite different than in the U.S. For example, in China, collecting and exporting data involving “state secrets” can get you thrown in jail. In Japan, taking data out and hosting it in the U.S. may cause you to lose your client.

Language, too, presents multiple challenges. The so-called CJK languages (Chinese, Japanese and Korean) are the most difficult to process, search and review. Mangle the processing and you lose your data. Mess up the search and you may as well have lost your data. Either way, your review becomes costly and ineffective.

In an article published in the February/March 2013 issue of Today’s General Counsel magazine, “Challenges of Asian Language E-Discovery,” John Tredennick, President and CEO of Catalyst, and W. Peter Cladouhos, Esq., firm-wide Practice Support Electronic Discovery Consultant for Paul Hastings LLP, outline some of the most common, and the most critical, challenges companies face when handling Asian data and keeping Asian e-discovery on track and on budget.

Catalyst Expands in Asia with New Offices, Added Staff

Rapid growth of Catalyst’s Asia division has resulted in relocation of our Tokyo offices to a larger, more central location and the hiring of three additional professionals in the office to help meet the expanding demand for project consulting, data processing and network engineering.

Since we formally established our Catalyst Asia division just 18 months ago, demand for our services throughout Asia has soared. Headquartered in Hong Kong and with operations throughout the region, the division is led by Asia e-discovery and forensics veteran Richard Kershaw. Earlier this year, we added our first Asia data center—a full-featured, high security facility located in Tokyo.

With this latest move, Catalyst relocates its Tokyo offices to the prestigious Nishi Shinjuku district, near the Tokyo Metropolitan Government buildings. The new offices provide space for expansion now and into the future. The move will also allow the installation of additional IT infrastructure to provide a more robust connection to the data center and more options for data handling.

To help meet the increasing demand for its services in Asia, Catalyst added three professionals to its Tokyo staff, all highly experienced and knowledgeable in their respective specialties:

  • Etsuko Akimoto is a lawyer who joins Catalyst as a project consultant. She comes to Catalyst from the Tokyo office of the international law firm Freshfields Bruckhaus Deringer. A member of the New York bar, Ms. Akimoto is a graduate of Tokyo University.
  • Gaby Banda joins Catalyst as a specialist in processing and productions. An experienced data processing technician, Ms. Banda formerly worked for Document Technologies Inc. in San Diego, Calif.
  • Luca da Col joins Catalyst as a network engineer in the infrastructure group. Formerly employed by Eire Systems in Tokyo, he has substantial experience as a data center network engineer in the telecommunications industry.

Read the full press release for more details.

Webinar Offers Guidance on Mitigating Risk in Cross-Border E-Discovery

Join two leading authorities in international e-discovery for a free, one-hour webinar, Cross-Border E-Discovery: Meeting the Challenges and Mitigating the Risks, to be held on Wednesday, Sept. 21, 2011, at noon Eastern time.

The webinar will explore the challenges for multinational corporations engaged in cross-border e-discovery–from data privacy laws and discovery-blocking statutes to language and cultural issues–and offer tips for mitigating risk.

Panelists for the webinar will be:

  • Maura R. Grossman, counsel at Wachtell, Lipton, Rosen & Katz. Ms. Grossman’s practice focuses on advising lawyers and clients on legal, technical, and strategic issues involving electronic discovery and information management, both domestically and abroad, as well as on matters of legal ethics. Ms. Grossman speaks and writes frequently on e-discovery and legal ethics and is a member of several Sedona Conference working groups.
  • Richard Kershaw, Asia managing director for Catalyst Repository Systems. A fluent Japanese speaker, Mr. Kershaw has lived and worked in the Asia region since 1996. Over the years, he has successfully led forensic data management assignments in arbitration, litigation and regulatory investigations across the region, including matters in Saudi Arabia, India, Indonesia, Singapore, Malaysia, Hong Kong, China, Taiwan, the Philippines and Japan.

For more details about this webinar or to register, visit: Cross-Border E-Discovery: Meeting the Challenges and Mitigating the Risks.

LegalTech West Coast: Join Us for Sushi and Sake for a Good Cause

Will you be at LegalTech West Coast in Los Angeles this month? If so, here’s your chance to support Japan relief efforts and have some fun and good food at the same time.

On Tuesday, May 17, starting at 6 p.m., Catalyst is hosting an evening of sushi and sake to benefit the 2011 Japan Relief Fund. If you’ll be at LegalTech, we welcome you to join us. No financial commitment is necessary to attend, but guests will be invited to contribute whatever they can. Catalyst has pledged to match every dollar donated up to $20,000.

The Japan Relief Fund was set up by the Japan America Society of Southern California to help provide critical funds to the victims of the massive 9.0-magnitude earthquake and the resulting tsunami. All proceeds will be forwarded to experienced non-governmental disaster relief agencies in Japan that have a proven track record of emergency humanitarian relief and restoration and development of destroyed areas.

When: Tuesday, May 17, 2011, 6 p.m.
Where: Los Angeles Marriott Downtown, 333 South Figueroa Street, Los Angeles
Questions? Stop by the Catalyst booth (Booth 117) or email Chelsey Lehman, marketing manager, clehman@catalystsecure.com

With FCPA Actions on the Rise, Search Takes Center Stage

Corporate Counsel magazine recently issued a report that should cause multi-national corporations and their counsel to pay attention: Trend Watch: Foreign Bribery Actions Doubled Last Year.

Specifically, the magazine reported that enforcement actions under the Foreign Corrupt Practices Act (“FCPA”) nearly doubled in 2010, rising to 76 (with complaints against 23 companies and 53 individuals). In 2009, the SEC and Justice Department brought 45 actions (against 12 corporations and 33 individuals). That number was a significant jump again from 2008 when the government brought 37 actions against companies and individuals.

The pace seems to be continuing as well. This month, Paul Hastings, one of the leading international firms advising on FCPA investigations, issued its first Quarterly FCPA Report for 2011 [PDF]. So far this year, it reports, enforcement continues apace, with actions brought against four companies and seven individuals, along with a blockbuster forfeiture and a number of guilty pleas and settlements. The forfeiture amounted to nearly $149 million and related to a high-profile arms contract case involving 22 indicted defendants.

Another international law firm, Herbert Smith, in an article, Developments in Anti-Bribery Legislation: The UK Bribery Act and its Impact for Japanese Companies [PDF], reported that, of the 10 all-time largest FCPA settlements, eight were achieved in 2010 and eight (together totaling over US$ 2.25 billion) were settlements with non-U.S. companies.

A lot of the recent activity seems to relate to the changing of the guard after the 2008 election. Under the Bush administration, FCPA enforcement happened but was not a priority. Under Obama and the Democrats, FCPA investigations seem to be a priority. As Assistant Attorney General Lanny A. Breuer said in a speech at a recent national FCPA conference, “We are in a new era of FCPA enforcement [and] we are here to stay.” (Also see our earlier post, DOJ’s Breuer Vows Heightened FCPA Enforcement.)

Add to all that the recent enactment of the Dodd-Frank Wall Street Reform and Consumer Protection Act, which provides that “whistleblowers” who provide information to U.S. authorities leading to successful prosecutions under the FCPA may be entitled personally to huge sums as a result (up to 30% of the monetary recovery). (See Fried Frank’s client memorandum, New Incentives for Foreign Corrupt Practices Act Whistleblowers: Dodd-Frank Wall Street Reform and Consumer Protection Act [PDF].)

At the least, the government had over 140 prosecutions and investigations underway in 2010, according to EthicalCorp.com. That figure is dramatically higher than previous years under prior administrations.

All you can say is watch out.

What Does This Have to Do with Search?

A lot actually. FCPA investigations typically involve hundreds of thousands or even millions of documents collected from all over the world. Sometimes, the investigations are initiated after the government issues a complaint. In those cases, counsel have a starting place for their investigation. The government has a good-faith obligation to set forth the basis of its complaint and that should alert counsel as to the people to interview and the subject matter for their searches.

In other cases, it doesn’t work that way. The FCPA laws impose liability on an acquiring corporation when mergers occur. That means that the buyer of another company could be held liable for bribes and other corrupt activities that occurred even before the merger. That is true even if the buyer never did anything wrong.

In that regard, it is a bit like buying a U.S. company that has plants and property. If you later determine that the property you bought is contaminated with toxic chemicals, you may be facing expensive Superfund liability. It doesn’t matter that you didn’t release any pollutants at your company or at the new site you acquired. You’re stuck nonetheless.

Superfund’s broad environmental liability has led to a growing and lucrative practice for environmental audit companies. The same is true for the FCPA. Some of the largest law firms in the world, with the depth and geographic coverage to mount these investigations, offer specialized FCPA practices. Paul Weiss is one such firm but there are a number of others in the game.

The key difference is this: If you are doing an environmental audit, you know exactly where the property is. You can send out your teams to inspect the ground, review the chemical history of the plant, check where materials were dumped and even drill for problems. It is just a matter of money but you can certainly find problems if you are engaged to take a look.

FCPA Investigations are Not Easy

What about in an FCPA investigation? Well, the problem is a bit different. First, what kind of fraud are we looking for? Counsel can’t exactly assemble the staff and ask for a show of hands from anyone who has bribed a foreign official lately. What, nobody raised their hand? Well, bribery isn’t something you usually put on your resumé or Facebook page.

What to do? The law firms we work with often start by talking to people and collecting their documents. What can you learn about how they deal with government officials? What do the documents show? What about the expense accounts and other money transfers?

Search is a key part of the answer. Modern search engines allow you to search millions of pages in a variety of languages with the click of a mouse. But what do you search for? That’s the hard part.

Traditional Search Doesn’t Cut It

Traditional Boolean search certainly has a part to play in an FCPA case but it isn’t always the most effective method. The reason is that we don’t know exactly what we are looking for, let alone the terms that might elicit those documents. Searching for “bribe*” within 10 words of “government official” probably won’t do the trick. Searching for the names of the government officials in question (assuming you know even that) might help.
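To see why, here is a rough sketch of what a proximity query of that kind does. It assumes simple whitespace tokenization and measures distance from the matching term to the first word of the phrase; the function name and both sample sentences are hypothetical.

```python
def proximity_match(tokens, prefix, phrase, window=10):
    """True if a token starting with prefix occurs within window tokens of phrase.

    phrase is a list of tokens that must appear consecutively.
    """
    prefix_positions = [i for i, token in enumerate(tokens) if token.startswith(prefix)]
    length = len(phrase)
    phrase_positions = [
        i for i in range(len(tokens) - length + 1) if tokens[i:i + length] == phrase
    ]
    return any(abs(p - q) <= window for p in prefix_positions for q in phrase_positions)


explicit = "we offered a bribe to the government official last spring"
print(proximity_match(explicit.split(), "bribe", ["government", "official"]))  # prints True

euphemism = "the consulting fee for our friend at the ministry was wired today"
print(proximity_match(euphemism.split(), "bribe", ["government", "official"]))  # prints False
```

The second sentence is the kind of wording someone might actually use, and the query misses it entirely because none of the expected terms appear.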

Boolean search becomes even tougher in FCPA due diligence investigations because we are trying to prove a negative—that employees in the company to be acquired were not bribing public officials. That means counsel has to comb company and employee records to determine that nothing improper is going on. A tough assignment to say the least.

This is where a non-traditional form of search can be helpful.

Think about the traditional approach to search: You ask the documents specific questions and hope they answer with helpful information. This is a bit like the games of Fish or Battleship from our childhoods. We kind of know what we are looking for and the trick is to frame queries to find out if it is there. So, we think of key term variants and try to frame our searches to find good stuff.

“Give me all your schemes to convince government officials to give us the business.” Answer: “Go Fish.” Ugh, try again.

Let the Documents Speak to You

Now consider another approach to search, one that is more effective for these kinds of cases. Instead of questioning the documents, you let the documents speak to you and tell you their secrets. While the technique is still based on search, the approach is different. It can be far more effective when you are dealing with large volumes of documents and have no clear road map to follow.

“What does he mean?” you ask. “After all, documents don’t speak, they just sit there.”

I mean this. Modern search engines collect data about documents that can help shed light on what they contain and how they relate to one another. For example, in Catalyst CR we collect statistics about the metadata contained in the documents we index. Thus, if I were looking at files obtained from a particular office or custodian, I could quickly determine a lot of helpful information about their contents without running a search. Our Correlation Navigators feature allows you to see a wide range of information in a view that might look like this:

In this case our focus is on recipients after a search on the Enron documents. We can quickly see who is on the sending or receiving end of emails and better focus our review on those files.
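The general idea of counting metadata values can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `facet_counts` function, the field names and the email addresses are all hypothetical, and the sketch says nothing about how Correlation Navigators is actually built.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how often each value of a metadata field occurs,
    most frequent first. Multi-valued fields are given as lists."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common()


# Hypothetical email metadata for illustration only.
emails = [
    {"sender": "a.smith@example.com",
     "recipients": ["b.jones@example.com", "c.lee@example.com"]},
    {"sender": "a.smith@example.com",
     "recipients": ["b.jones@example.com"]},
    {"sender": "c.lee@example.com",
     "recipients": ["b.jones@example.com", "a.smith@example.com"]},
]

top_recipients = facet_counts(emails, "recipients")
top_senders = facet_counts(emails, "sender")
```

Nobody typed a query here. The counts alone show which mailbox receives the most traffic in the sample, and that tells a reviewer where to look first.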

Likewise, the system looks for information within the bodies of the documents and can show key concepts being discussed in the population. Here would be a view of some of the topics being discussed in a sample of the Enron population:

In this case, the words are placed in alphabetical order but sized by relevance. They can provide important clues as you try to home in on what matters in your investigation.
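A view of this kind, with terms listed alphabetically and sized by weight, can be approximated with a short Python sketch. I am assuming here that "relevance" is simply the number of documents a term appears in; the real scoring is not described in this post, and the `concept_cloud` function, stop-word list and sample subject lines are hypothetical.

```python
import re
from collections import Counter

# A tiny stop-word list, just enough for the sample below.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "for", "in", "on", "is", "at"}


def concept_cloud(texts, top_n=3, min_size=10, max_size=40):
    """Return (term, font_size) pairs in alphabetical order, where the
    size grows with the number of documents that contain the term."""
    doc_freq = Counter()
    for text in texts:
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(terms)
    top = doc_freq.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts tie
    sized = [(term, min_size + (count - lowest) * (max_size - min_size) // span)
             for term, count in top]
    return sorted(sized)


# Hypothetical subject lines for illustration only.
subjects = [
    "Gas pipeline contract signed",
    "Pipeline capacity and gas prices",
    "Contract review for the pipeline",
    "Trading desk update",
]

cloud = concept_cloud(subjects)
```

With these four subject lines, "pipeline" gets the largest size because it appears in three of them, while "contract" and "gas" appear in two each.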

With Catalyst Insight, the next release of our flagship product, we are taking investigation to a whole new level. Along with the field information we can provide about people and topics, we will provide investigators with new tools to allow the documents to speak to them.

Here is a timeline view, for example:

With the timeline and the various field facets, a searcher can interact with documents and allow them to “speak” about their contents. The searcher can interact with each of the facets or drill down into the timeline to see how the communications flowed between the parties.
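The timeline idea, counting documents per period and then narrowing the date range, can be sketched in Python as follows. The `timeline` function, the `sent` field and the sample dates are hypothetical and are not drawn from Insight itself.

```python
from collections import Counter
from datetime import date


def timeline(documents, start=None, end=None):
    """Count documents per calendar month, optionally restricted to
    the inclusive date range [start, end]."""
    counts = Counter()
    for doc in documents:
        sent = doc["sent"]
        if (start and sent < start) or (end and sent > end):
            continue
        counts[sent.strftime("%Y-%m")] += 1
    return sorted(counts.items())


# Hypothetical documents for illustration only.
docs = [
    {"id": 1, "sent": date(2001, 5, 3)},
    {"id": 2, "sent": date(2001, 5, 20)},
    {"id": 3, "sent": date(2001, 8, 14)},
]

by_month = timeline(docs)
after_june = timeline(docs, start=date(2001, 6, 1))
```

Calling the function again with a narrower `start` and `end` is the code equivalent of drilling down into one stretch of the timeline.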

Investigators can use similar tools to track communications between parties, which can help guide their investigation. Here is an example from Insight:

In this case, you can click on any individual to move that person to the center of the graph. Instantly, you can see who individuals are communicating with and how often. Click on the numbers and you can look at the actual communications. As you continue your investigation, you can go back and forth among individuals and documents.
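Underneath a graph like this is a simple tally of who exchanged messages with whom. Here is a rough Python sketch of putting one person at the center; the `contacts_of` function and the names are invented for illustration and do not describe how Insight computes its graph.

```python
from collections import Counter


def contacts_of(messages, person):
    """Count the messages exchanged between `person` and each other
    party, in either direction, most frequent contact first."""
    counts = Counter()
    for sender, recipients in messages:
        if sender == person:
            counts.update(r for r in recipients if r != person)
        elif person in recipients:
            counts[sender] += 1
    return counts.most_common()


# Hypothetical (sender, recipients) pairs for illustration only.
messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["alice", "bob"]),
    ("alice", ["bob"]),
]

around_alice = contacts_of(messages, "alice")
around_bob = contacts_of(messages, "bob")
```

Re-centering the graph on a different individual is just another call with a different name.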

There are a number of other techniques you can employ to speak with your documents. Clustering documents around themes can help you sort through large volumes of documents and focus on those that might matter. Finding “More Like These” from a key document set can take advantage of more complex queries than most researchers can hope to create on their own. These methods let the computer do that work based on complex algorithms and let the documents speak to you and help your investigation.
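One common way to approximate a "More Like These" search is to weight each document's terms by TF-IDF and rank the remaining documents by cosine similarity to the seed set. The Python sketch below takes that approach as an assumption; it is not necessarily the algorithm our products use, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(texts):
    """Build one {term: weight} vector per text using TF-IDF."""
    token_lists = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n = len(texts)
    return [{term: count * (math.log(n / doc_freq[term]) + 1.0)
             for term, count in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_these(texts, seed_indexes, top_n=3):
    """Rank the non-seed texts by similarity to the combined seed documents."""
    vectors = tfidf_vectors(texts)
    seed = {}
    for i in seed_indexes:
        for term, weight in vectors[i].items():
            seed[term] = seed.get(term, 0.0) + weight
    scored = [(i, cosine(seed, vec)) for i, vec in enumerate(vectors)
              if i not in seed_indexes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical documents for illustration only.
texts = [
    "Wire the facilitation payment to the customs agent before the shipment clears",
    "The customs agent expects another facilitation payment for the next shipment",
    "Lunch menu for the quarterly sales meeting is attached",
    "Please review the attached quarterly sales forecast",
]

ranked = more_like_these(texts, seed_indexes=[0])
```

Starting from the first document as the key document, the second ranks highest because it shares the distinctive terms. The searcher never had to write a query containing them.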

There are many more techniques we could discuss for FCPA investigations, but this is a start. Suffice it to say that search is at the heart of these investigations and that search is typically more complex than what we learned using Lexis or Westlaw. Mathematics, analytics and visual cues are important here as we both interrogate the documents and let them speak directly to us.

Regulating Bribes and Corruption is Catching on Globally

If you think this issue is of importance only to U.S.-based corporations, think again. It is true that, for many years, the United States stood alone in its efforts to police international corruption. The FCPA itself has been around for 30 or so years, although enforcement efforts have stepped up only recently. Some questioned our cheek in trying to tell people how to do business in other parts of the world.

Despite these misgivings, the idea of combating corruption is catching on globally. The United Kingdom recently enacted a similar law, the UK Bribery Act. It goes into effect July 1, 2011, and will affect U.S. as well as Asian companies. Recently, the U.K. Ministry of Justice published its guidance on procedures companies can put into place to protect themselves under the new act. (Also see our earlier post, Is Your Company Ready for the UK Bribery Act?)

Other European governments are following suit. In December 2010, Spain passed legislation allowing companies to be held criminally liable and making it a crime to bribe foreign officials. A DLA Piper publication provides background on this legislation.

The Asian region isn’t ignoring this issue either. Singapore has the Corrupt Practices Investigation Bureau (CPIB) which is dedicated to enforcing its Prevention of Corruption Act.

Darren Cerasi, director of I-Analysis in Singapore (one of Catalyst’s Asia Partners), reports that while the CPIB was set up primarily to prosecute government officials, its jurisdiction also extends to civilian bribery and includes the potential for both fines and jail time. Indonesia is another country that is focused on bribery and other anti-competitive acts.

Interestingly enough, China has an Anti-Bribery Law as well, which came into effect in December 2008. The law was amended in February of this year to make it an offense to bribe government officials outside of China, as well as non-government officials. The amendment is expected to come into force in May.

Our China hand, Richard Kershaw, director of Catalyst Asia, says that China seems to be interested in bringing more accountability to government and its people. In August of 2008, it also passed an Anti-Monopoly Law which allows citizens to sue for monopolistic practices along with government enforcement. These laws apply to both foreign and Chinese domestic companies.

To read more about China’s laws, see:

Japan is in the game as well. In a recent case, prosecutors in Japan got four former senior executives of a Japanese company to plead guilty to bribing a Vietnamese transport official. The guilty pleas are especially noteworthy given Japan’s historical reputation as a jurisdiction where anti-bribery enforcement has been relatively lax. (See the Baker Botts FCPA Update.)

Without question, international counsel will be faced with more and more of these tricky and high-stakes investigations. Documents will be at the center stage and search will be the key to making sense of them.

TransPerfect Legal and Catalyst Announce Partnership for Asia E-Discovery

This week, TransPerfect Legal Solutions, one of the world’s leading providers of global legal support services, and Catalyst announced their partnership to provide comprehensive litigation support throughout Asia. Through this partnership, clients in Asia will benefit from the combination of Catalyst’s premium hosting solution and TLS’s industry-leading multilingual translation and managed review capabilities.

In business since 1992, TLS has offices throughout Asia and the world. It offers a full suite of litigation support services, including translations and multi-language managed review.

“As data volumes increase, companies are looking for better ways to streamline reviews,” said Richard Kershaw, managing director of Catalyst Asia. “The Catalyst platform has always excelled in supporting multiple languages, and by partnering with TLS, we are now able to offer our clients full multilingual capabilities and document-focused litigation support services tailored to the needs of Asia-based projects.”

To learn more about the partnership between TLS and Catalyst, read or download the full announcement.

Another Catalyst Employee Reports on the Situation in Japan

[The following update on the situation in Japan was sent in earlier today by Ian Folkman, production supervisor in Catalyst's Tokyo office. Ian and another Catalyst employee, project consultant Abdel Basheer, just moved to Tokyo from Denver last November. For earlier updates from another Catalyst employee in Japan, Robert Ocampo, see here and here.]

Earthquakes have taken on a whole new meaning for me after the recent events. When I first arrived, the small earthquakes were very interesting to me because I had never experienced earthquakes like that before. The big earthquake was so noticeably different from what I had felt up until then that Robert and I knew right away this one was different. Now that I have felt a large earthquake, each new one (there have been hundreds in the weeks following), no matter the size, brings with it a slight sense of vertigo and nervousness, even though none since has been as large.

While the nuclear power plant has been the more interesting topic lately, I have yet to see Godzilla outside of my window and no one has gained any super powers. The concern about radiation in the media has been overstated, and there has been no real danger to the people in Tokyo.

The true effects of the disasters have been in the daily lives of some of the people. People here in Tokyo have had rolling blackouts to conserve electricity, cold weather, and a smaller supply of food and water to draw on. I have seen grocery and convenience stores low on food for some time, but they are slowly returning to normal, and many emergency supplies were bought up in the days after the earthquake. I also see department stores closing early and lights remaining off even during night hours. Despite all this, there has been no famine and everyone has enough food and water, and even the tap water is safe to drink.

Lastly, I would like to say that I am amazed at how well the people of Japan have survived one of the worst earthquakes in history. Their preparedness is an example for everyone.

Update from Tokyo: ‘Our Lives Have Changed’

[This is an update to the situation in Tokyo from Robert Ocampo, operations manager for Catalyst in Japan and a resident of Tokyo for more than 20 years. Also see his earlier post describing the day of the earthquake.]

Well, life in Tokyo has changed. Our mobile phones buzz at least three times a day warning of imminent earthquakes in one region of Japan or another. People are getting numb to the shakes that happen all the time. It has become a common sight to see irritated commuters waiting for the next train or pushing to get on a train filled to the brim.

It’s eerie, but it looks like Tokyo is becoming a ghost town. Big retail shops and department stores all open later and close earlier — it’s Friday today and many Uniqlo shops were closed at 3 p.m.

Of course, whether we are going to be hit with possible radiation may be on our minds, but compared to what is going on with the intermittent blackouts, food shortages and long lines for gasoline, the bigger worry is when is that next big earthquake going to hit us and where. A week after and living in Tokyo is different — we are all just coping and hoping for the best for those up north.

Hopefully next week we’ll have happy news.