About Us
Catalyst Gridlines
Catalyst was one of the first companies to use secure Internet technology to help corporations and their counsel manage large volumes of documents. Today, we help manage documents in multiple languages and work with people all over the world.
Searching Foreign Languages: A Primer, Featured in Law Technology Today
Written by John Tredennick   

In this second article of a two-part series, Tredennick offers additional advice on handling foreign language documents. Learn how your computer searches a document, what tokenization means and, most importantly, how to manage documents in pictorial languages, such as Chinese, and even documents in multiple languages.

Last issue, I began a series on dealing with foreign language documents. We started by looking at foreign language characters and the problems they cause for computers raised on ASCII. You may recall that ASCII is the basic character code of most early computers. Standard ASCII defines 128 characters in 7 bits, and the extended 8-bit sets built on it use a single byte to hold up to 256 combinations of ones and zeroes. That was enough to cover our basic alphabet, along with some punctuation, but not much else. If you were born speaking a language with thousands of characters, you were out of luck.

This problem led to the Unicode movement in the early nineties. The goal was to create a universal system to describe the characters in all the world's languages. Its designers did so by using more than one byte per character to describe the additional characters that were needed. This gave us a single encoding system that could cover thousands of language variants, even the 65,000 or so pictorial characters that people use in the Far East, namely China, Japan and Korea. It is also why many of these languages are called "double byte" languages.
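
To make the byte counts concrete, here is a minimal Python sketch that checks whether a character fits in ASCII and how many bytes it needs once encoded. The choice of UTF-8 and UTF-16 as the Unicode encodings is an assumption made for illustration (the article does not name a specific encoding), and the helper names fits_in_ascii and encoded_lengths are hypothetical.

```python
# Illustrative sketch: how many bytes does one character need?
# UTF-8 and UTF-16 are chosen here as example Unicode encodings;
# the article itself does not name a specific encoding.

def fits_in_ascii(ch):
    """Return True if the character can be stored as plain ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def encoded_lengths(ch):
    """Return the byte count for one character in each example encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-16-le" avoids counting the 2-byte byte-order mark
        "utf-16": len(ch.encode("utf-16-le")),
    }


# Latin "A", accented "e", a Chinese character, a Korean syllable
samples = ["A", "\u00e9", "\u4e2d", "\ud55c"]

for ch in samples:
    # Print the code point rather than the glyph so the output is
    # readable on any console.
    print(f"U+{ord(ch):04X}", fits_in_ascii(ch), encoded_lengths(ch))
```

In this sketch, only the plain Latin letter fits in ASCII. The Chinese and Korean characters each take two bytes in UTF-16, which matches the "double byte" label, and three bytes in UTF-8.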

So, now we have our computer systems ready to support foreign language characters. What's next? Why, search, of course. If you can't search all of these foreign language documents, how are you going to find the smoking gun you need to make your case? Search is the next step in handling foreign language documents.

Read full article
