Catalyst team devises method that gets rid of the bath water, keeps the baby.
Lawyers representing a U.S. multinational in an international contract dispute had to find the pins within a 350 GB haystack of documents. Not only that, but they wanted only the pins and not the pinheads. They needed to find every document that included a trade name. At the same time, they wanted to exclude any document that used the trade name as a domain name.
Every document that included the business name (call it "WidgetWizard") had to be found, but every document that used the same name as a domain (say, "widgetwizard.com") had to be excluded. This presented a document-search riddle: How could we find all valid occurrences of a name while excluding all invalid occurrences of the same name?
To find the answer to this riddle, the search professionals of Catalyst Consulting worked in tandem with our chief technology officer. Operating under tight deadlines set by the client, the Catalyst team came up with a way to solve the problem.
The size and condition of the data set made the riddle all the more difficult to solve. It arrived at Catalyst on a hard drive containing 3.36 million documents spread across 530 separate PST files. The documents were in multiple languages, primarily English and Chinese, and many were a mix of languages.
The solution we devised was based on on-the-fly extraction and analysis of document text. As we processed the documents, we extracted the text of each. From the extracted text, we deleted every occurrence of "widgetwizard.com." We then ran the search for "widgetwizard," knowing that every remaining hit would identify a valid document.
Notably, this process of extraction, deletion and searching was performed entirely within system RAM. In this way, the deletions of "widgetwizard.com" left the underlying data untouched. Only the temporary text files were altered, thereby allowing us to exclude all occurrences of the name in one form while retaining all occurrences in the preferred form.
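For readers who want to see the idea in concrete terms, the extract, delete and search steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Catalyst ran: the function names, the case-insensitive matching, and the dictionary of already-extracted text are all hypothetical, and the sketch replaces each domain occurrence with a space rather than deleting it outright so that the surrounding text cannot fuse into a new match.

```python
import re

# Illustrative placeholder names from the article; not the real trade name.
TRADE_NAME = "widgetwizard"

# Assumption: matching is case-insensitive, and the only form to exclude
# is the trade name followed by ".com".
DOMAIN_PATTERN = re.compile(re.escape(TRADE_NAME) + r"\.com", re.IGNORECASE)
NAME_PATTERN = re.compile(re.escape(TRADE_NAME), re.IGNORECASE)


def is_valid_hit(extracted_text: str) -> bool:
    """Return True if the text uses the trade name outside the domain form."""
    # Work on an in-memory copy only. Python strings are immutable, so the
    # caller's text (and the source document behind it) is never modified.
    scrubbed = DOMAIN_PATTERN.sub(" ", extracted_text)
    # Any match that survives the scrub is a non-domain use of the name.
    return NAME_PATTERN.search(scrubbed) is not None


def find_valid_documents(documents: dict[str, str]) -> list[str]:
    """Return the IDs of documents with at least one valid occurrence.

    `documents` is a hypothetical mapping of document ID to extracted text.
    """
    return [doc_id for doc_id, text in documents.items() if is_valid_hit(text)]


if __name__ == "__main__":
    sample = {
        "doc-1": "The contract with WidgetWizard was signed in March.",
        "doc-2": "Please visit widgetwizard.com for details.",
        "doc-3": "WidgetWizard (see www.widgetwizard.com) shipped the order.",
        "doc-4": "No mention of the company here.",
    }
    print(find_valid_documents(sample))
```

In this sketch a document that contains both forms, such as doc-3, is kept, because a valid occurrence remains after the domain form is removed. A document that contains only the domain form, such as doc-2, drops out.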
As document collections increase and searches grow more complex, clients and counsel increasingly turn to repository companies that provide enhanced technology needed for the legal process along with the expertise to put it to use. The Catalyst Consulting team includes certified search experts with many years of experience in dealing with complex searching, including privilege searching, workflow and QC. That’s one of the many Catalyst advantages.