
Character Encoding: An Introduction for E-Discovery Professionals
Written by John Tredennick   

“There is something wrong with your system,” the angry lawyer on the phone said to Laura, the project consultant assigned to the case. “I am looking at the screen and all I see are a bunch of question marks and boxes,” the lawyer continued, growing more exasperated by the minute. “How am I supposed to review these documents if I can’t read the words?”

“Let me see if I can help,” Laura answered, trying to be as calm as possible. “Perhaps your computer is just using the wrong code page to display the text. If so, we can probably fix the problem with a mouse click,” she offered hopefully. “If not, there could be a problem with how your data was collected or processed.”

“A code page?” responded the caller. “What the heck is a code page?”

Our caller’s confusion was not unusual. After all, most of us went to school to study law, not technology. Many lawyers still have little interest in knowing more about technology than how to turn on their computers.

But legal-technology professionals do need to know about code pages and character encoding, particularly as multi-language discovery becomes more common. The good news is that the subject isn’t that difficult. It is just a matter of taking it step by step.

In this article, we provide a primer on what you need to know. We start by reviewing the development of ASCII, the basic character-encoding standard for English-language text, and discuss its limitations for non-English languages. Next, we introduce you to Unicode, the standard that evolved to address ASCII’s limitations. Finally, we explain why all this is important for you to understand.
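
For readers who like to see the problem in action, here is a minimal Python sketch of the kind of mismatch Laura suspects. It is only an illustration: the sample name and the choice of UTF-8 and Windows-1252 (cp1252) as the two encodings are our assumptions, not details from the caller's case.

```python
# Illustrative sketch only: the sample text and both encodings are assumed.
text = "Müller"

# The document is stored as bytes using UTF-8, a Unicode encoding.
raw = text.encode("utf-8")

# Reading those bytes with the wrong code page garbles the accented letter.
wrong = raw.decode("cp1252")

# Reading them with the encoding they were written in recovers the text.
right = raw.decode("utf-8")

# ASCII has no slot for "ü", so forcing the text into ASCII loses it.
ascii_view = text.encode("ascii", errors="replace").decode("ascii")

print(wrong)       # MÃ¼ller
print(right)       # Müller
print(ascii_view)  # M?ller
```

The stored bytes are identical in every case. Only the table used to interpret them changes, which is why a display problem of this kind can sometimes be fixed by switching the code page rather than by reprocessing the data.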

Read the Full Article.