Applied Search Project

Using the TREC Problem and the Enron Collection

University of Virginia Law School
Monday-Thursday, October 12 to 22, 2009
Classes meet at WB 121 from 8:30 to 9:50 ET
Course Lead: John Tredennick

 


Project Rules

You may choose to do this project or take the class exam. The project is due by the end of the day on Tuesday, November 10th.

This project is an individual rather than a team effort. If you choose to work on this project, please contact me and I will assign you a coach to discuss search strategy. The coach will help you understand CR's capabilities but will not prepare your searches for you or discuss case law other than in general terms.

The Problem

You represent the defendants in the case Zocalo, et al. v. Enron Corporation, et al. The case is pending before the United States District Court in Maryland. The action involves a class of investors who have brought suit against Enron and many of its professional advisers for various misrepresentation claims, deceptive practices, omissions, etc.

Here is the mythical complaint in the case: Complaint for the TREC Legal Track.

The plaintiffs have submitted discovery requests to your client and your job is to address two specific claims:

205. All documents or communications that describe, discuss, refer to, report on, or relate to
energy schedules and bids, including but not limited to, estimates, forecasts, descriptions,
characterizations, analyses, evaluations, projections, plans, and reports on the volume(s) or
geographic location(s) of energy loads.

207. All documents or communications that describe, discuss, refer to, report on, or relate to
fantasy football, gambling on football, and related activities, including but not limited to,
football teams, football players, football games, football statistics, and football performance.

For this exercise, you are to come up with and execute a strategy to produce documents responsive to Request 205 and to segregate, as non-responsive, any documents described in Request 207. Documents should be placed in shared folders that will be available to you and the user account (jtdemo).

When you are finished, prepare a memo setting forth and defending your approach. The goal is to prepare your partner to defend your actions should they be challenged in court. This will include a discussion of the relevant case law and a showing of how you met the applicable requirements. Assume the cases and other materials posted to the web site will be persuasive to the local Magistrate. He/she will also be interested in the academic thinking on the subject and will find that information persuasive. You don't need to conduct further research.

The memo should be no longer than 5 pages double spaced and may be shorter if you don't need the length.

Feel free to contact me with questions.

Good luck.

John Tredennick

More About the TREC Program

The National Institute of Standards and Technology (NIST) created TREC (Text REtrieval Conference) several years ago to bring academics and vendors together to study ways to make search more effective. For the past couple of years, they have held a competition to see which approach was most effective against a controlled document population. First, they used scanned documents from the tobacco litigation. This year they have created a large (3.5 million) set of Enron documents in native format.

Here is more about TREC and the TREC Legal Track:

Here is the complaint and the request for production of documents that are used in the competition.

For this class we will focus on two requests for production, specifically requests 205 and 207. Here are the requests:

205. All documents or communications that describe, discuss, refer to, report on, or relate to
energy schedules and bids, including but not limited to, estimates, forecasts, descriptions,
characterizations, analyses, evaluations, projections, plans, and reports on the volume(s) or
geographic location(s) of energy loads.

207. All documents or communications that describe, discuss, refer to, report on, or relate to
fantasy football, gambling on football, and related activities, including but not limited to,
football teams, football players, football games, football statistics, and football performance.

These requests represent the two facets of electronic discovery: (1) finding the good stuff; and (2) culling out the irrelevant stuff.
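To make the two facets concrete, here is a minimal Python sketch of a keyword triage pass over plain-text emails. It is only an illustration: the term lists, the `triage` function, and the label names are assumptions made up for this example, not a vetted search strategy and not anything provided by Catalyst CR. Real searches for this project should be built and tested in CR itself.

```python
import re

# Illustrative term lists only (assumptions for this sketch, not a vetted strategy).
RESPONSIVE_TERMS = ["energy schedule", "bid", "forecast", "load"]    # Request 205
CULL_TERMS = ["fantasy football", "football", "nfl", "point spread"]  # Request 207


def _compile(terms):
    # Case-insensitive, whole-word match that also accepts a simple plural "s".
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")s?\b"
    return re.compile(pattern, re.IGNORECASE)


RESPONSIVE_RE = _compile(RESPONSIVE_TERMS)
CULL_RE = _compile(CULL_TERMS)


def triage(text):
    """Label one email body using the hypothetical term lists above."""
    hits_205 = bool(RESPONSIVE_RE.search(text))
    hits_207 = bool(CULL_RE.search(text))
    if hits_205 and hits_207:
        return "needs_review"    # mixed content: leave the call to a human reviewer
    if hits_205:
        return "responsive_205"  # candidate for production
    if hits_207:
        return "cull_207"        # candidate for segregation as non-responsive
    return "unclassified"        # no term hit; says nothing about responsiveness


if __name__ == "__main__":
    samples = [
        "Please send the day-ahead energy schedules and bids for tomorrow.",
        "Who are you starting in fantasy football this week?",
        "Load forecast attached; also, football pool picks are due Friday.",
    ]
    for body in samples:
        print(triage(body), "|", body)
```

A document that hits neither list is left unclassified rather than treated as non-responsive, and a document that hits both is routed to review. Simple term matching will miss responsive documents and flag irrelevant ones, which is part of what your memo will need to address.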

Document Population

We will be using what is known as the Enron set of emails. In this case about 250,000 emails have been rendered to basic text files (without attachments). This is a subset of the 3.5 million documents used in the TREC competition. I chose to load the smaller set because the TREC data contains information protected by privacy regulations and its grant of rights excludes public consumption. It is meant to be used to test search and analytics methods.

The emails we are using were earlier placed in the public domain. They too have been widely used over the years, both to get a glimpse into the Enron debacle and for search and analytics projects. Let me warn you that all kinds of topics are discussed and there are some porn emails in the population. There are also some privileged conversations between the clients and members of my old law firm.

This volume is enough to give you an idea of the problems you will face in practice as you engage in electronic discovery.

I also added, as a separate sub-collection, a group of foreign-language documents. We call this the IN8L collection for reasons I have forgotten. It does not pertain to the Enron issues, but I wanted to give you an opportunity to see and test foreign-language search and review. You will find Chinese, Japanese, Korean and a lot more languages there. The documents are things we could find on the web and download.

Tool Set: Catalyst CR

I have set up accounts for all of our students in Catalyst CR.

To access the site: https://demo.catalystapps.com/cr

You will find a search guide and user guide on the Search home page. You can also find manuals, training videos, tips and a lot of other information on our support site at: http://support.catalystsecure.com

There will be two sub-collections on the site: Enron and IN8L. Look for them under the Links section on the right side. Uncheck either one to search against only the other set of documents.

Coaching Assignments

If you want help from a coach, contact me and I will assign one of the following:

Jim Eidelman

Chris Toomey

Ron Tienzo

Patty Daly

 

 

