
Equivio Near Duplicate Detection Helps Federal Agency Review 48,000 Documents Quickly

The Problem

A federal agency involved in a contract dispute with one of its suppliers received its opponent's discovery production: 48,747 documents in single-page TIFF image format, with document breaks but no digitized text.

The client had only weeks to search the production for key evidence, and only four attorneys who could work part time on this review. With limited resources and a very tight budget, they needed a strategy that would maximize the legal team's document review efficiency to prepare them for their imminent hearing.

Initial review of the TIFF images uncovered widespread duplication within the discovery materials. However, traditional duplicate identification technology didn't work in this situation because each image contained a unique control number (i.e., a Bates number). This meant that no two image files were exactly identical, even when the pages differed only by control number. In addition, when the production's 200,000+ TIFF images were run through an OCR engine to generate searchable text, the process introduced transcription errors. As a result, while many documents were near-duplicates, there were no exact duplicates.
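To illustrate why exact matching fails here, the following minimal Python sketch compares two made-up page texts that differ only by a control number and one OCR slip. The sample texts, the function names, and the choice of SHA-256 and `difflib` are all assumptions made for illustration; they are not taken from the case study and do not represent how any particular product works.

```python
import hashlib
from difflib import SequenceMatcher


def exact_fingerprint(text: str) -> str:
    """Hash the full text; any one-character change yields a different digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical OCR output for two copies of the same page.
# They differ by control number and by one OCR error ("suppl1er").
doc_a = "ABC000123 The supplier shall deliver all units by the agreed date."
doc_b = "ABC000456 The suppl1er shall deliver all units by the agreed date."

is_exact_duplicate = exact_fingerprint(doc_a) == exact_fingerprint(doc_b)
score = similarity(doc_a, doc_b)

print("exact duplicate:", is_exact_duplicate)
print("similarity above 90%:", score > 0.90)
```

The hash comparison treats the two texts as unrelated, while the similarity ratio still recognizes them as nearly the same document.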

The Solution

The client chose to use Equivio to reduce the burden of reviewing the entire production. Using a 90% similarity threshold, Equivio identified 13,787 documents (about 28% of the collection) as near-duplicates and put them into groups. Equivio then identified the pivot document for each group, that is, the document containing the greatest amount of text.
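The grouping and pivot selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the greedy grouping strategy, the use of `difflib` as the similarity measure, and the document IDs and texts are all hypothetical, and none of it is meant to describe Equivio's actual algorithm. Only the 90% threshold and the "most text" pivot rule come from the case study.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure (assumption): ratio of matching characters."""
    return SequenceMatcher(None, a, b).ratio()


def group_near_duplicates(docs: dict[str, str], threshold: float = 0.90) -> list[list[str]]:
    """Greedy grouping: a document joins the first group whose first member
    it matches at or above the threshold; otherwise it starts a new group."""
    groups: list[list[str]] = []
    for doc_id, text in docs.items():
        for group in groups:
            if similarity(text, docs[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


def pick_pivot(group: list[str], docs: dict[str, str]) -> str:
    """Pivot rule from the case study: the document with the most text."""
    return max(group, key=lambda doc_id: len(docs[doc_id]))


# Hypothetical document IDs and OCR text, for illustration only.
docs = {
    "D1": "ABC000123 The supplier shall deliver all units by the agreed date.",
    "D2": "ABC000456 The supplier shall deliver all units by the agreed date. Rev 2",
    "D3": "ABC000789 Invoice for consulting services rendered in March.",
}

groups = group_near_duplicates(docs)
pivots = [pick_pivot(group, docs) for group in groups]
print(groups)
print(pivots)
```

In this sketch, D1 and D2 land in the same group and D2 becomes the pivot because it carries the most text, while D3 forms a group of its own. A production system reviewing tens of thousands of documents would need a far more scalable comparison strategy than this pairwise loop.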

When a reviewer found that a cluster's pivot document was responsive, a single click would bring up the rest of the near-duplicate group for individual analysis. Conversely, if the reviewer found that the pivot document was irrelevant to the dispute, a single click could eliminate the entire cluster from further review.

The Results

Equivio near-duplicate analysis accelerated this review in two ways. First, identifying near-duplicates significantly reduced the number of documents that the legal team needed to review in its first pass. Second, Equivio groupings sped up the pace of the review: after reading a pivot document carefully, reviewers could quickly scan its near-duplicates for differences rather than reading each one from top to bottom, as they would have had to if the documents had come up scattered throughout the review.
