Technology: Turning document review expense into assets

Creating reusable synonym-rings turns the expense of one document review into a reusable asset that can be leveraged in the next review

For many corporations (those with active litigation portfolios, frequent internal investigations, or ongoing regulatory scrutiny), document review has become a continuous activity. The cost of these reviews mounts quickly, often into the tens of millions of dollars per year. Ironically, one of the cost drivers is the reluctance of counsel to admit that document review is now a regular part of corporate life and to react accordingly. In this article we show how to substantially reduce document review costs by creating reusable assets in every review.

When discussing this topic, clients frequently say that although they do a lot of document review, each matter is different, so the work of one review cannot be leveraged in the next. That view is mistaken. From a language perspective, reviews are far more alike than different. If you think in terms of language instead of documents, much of the work done for one matter is reusable in the next.

Query: What makes a document relevant to a case issue?

Answer: The document contains language that communicates information about the subject matter of the issue.

Document review isn’t really about “documents”; it’s about analyzing the specific words in a document and how they are used. Words are how we communicate concepts. Different companies use words in different ways; for example, a company in the trash hauling business uses the word “collection” as a synonym for “hauling,” whereas a bank does not. Each company has a unique vocabulary that it uses to communicate the concepts that drive its business. Certainly, companies in the same industry have more similar vocabularies than companies in different industries, but even within the same industry, each company develops its own way of speaking. When you know how a company talks about issues, document review becomes much less expensive.

The first step in document review is to find all of the documents that are about a given topic or topics (often limited by custodian and date). If you know what words the company uses to talk about the topic, a simple Boolean search engine can retrieve the documents that are potentially of interest. Document review is expensive because we don’t know what words are being used, so we inevitably waste considerable time and money reading documents that couldn’t possibly be relevant.

But here’s the thing: although many word choices are possible, relatively few different words are actually used. Corporations (like all social groups) coalesce around relatively few words to communicate their activities. Over time, corporations create their own “dictionaries” of language. Those dictionaries are the Rosetta Stone of document review.

Suppose we are tasked with finding all documents about meetings with competitors. For a document to be about this topic, it must contain at least one word that communicates the concept of a meeting and at least one word that communicates the concept of a competitor. A document that lacks either kind of word cannot be about the topic. If I know which words the company uses to talk about meetings and competitors, then I can construct a series of Boolean queries to find all documents that might be about this topic.
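As a rough illustration of the idea, the sketch below turns two word lists into a single Boolean query string: the words for each concept are joined with OR, and the concepts are joined with AND. The word lists, the function names and the query syntax are all assumptions made for the example; real search engines differ in how they spell operators and quote phrases.

```python
def or_clause(terms):
    """Join the words for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in sorted(terms)]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concepts):
    """Require at least one word from every concept by AND-ing the clauses."""
    return " AND ".join(or_clause(terms) for terms in concepts)


# Hypothetical word lists, for illustration only.
meeting_terms = {"meeting", "lunch", "conference call"}
competitor_terms = {"acme", "competitor"}

query = build_query(meeting_terms, competitor_terms)
print(query)
# ("conference call" OR lunch OR meeting) AND (acme OR competitor)
```

A document that satisfies this query contains at least one meeting word and at least one competitor word, which is the minimum condition described above.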

Put another way, if I have two “synonym-rings,” one that lists all of the possible words the company uses to communicate the concept of meeting and another that lists all of the possible ways the company identifies competitors, then I am well on my way to identifying the documents worth reviewing. And how might I happen to have these synonym-rings? Perhaps in a previous matter these same concepts were at issue, so I built the rings then and was prescient enough to save my work. The trick to building a corporate dictionary of synonym-rings is to do it incrementally, case-by-case.
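One way to picture the incremental, case-by-case approach is a small dictionary file that is loaded at the start of a matter, extended with any new words, and saved again. The sketch below assumes a JSON file and hypothetical function names (`load_rings`, `merge_rings`, `save_rings`); it is an illustration of the bookkeeping, not a description of any particular product.

```python
import json
import tempfile
from pathlib import Path


def load_rings(path):
    """Load saved synonym-rings as {concept: set of words}; empty if no file yet."""
    p = Path(path)
    if not p.exists():
        return {}
    with p.open(encoding="utf-8") as f:
        return {concept: set(words) for concept, words in json.load(f).items()}


def merge_rings(saved, new):
    """Return the saved rings extended with the words found in the current matter."""
    merged = {concept: set(words) for concept, words in saved.items()}
    for concept, words in new.items():
        merged.setdefault(concept, set()).update(w.lower() for w in words)
    return merged


def save_rings(rings, path):
    """Write the rings back to disk with sorted word lists for easy reading."""
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump({c: sorted(w) for c, w in rings.items()}, f, indent=2)


# Two hypothetical matters, reviewed one after the other.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rings.json"

    matter_one = {"meeting": {"meeting", "lunch"}}
    save_rings(merge_rings(load_rings(path), matter_one), path)

    matter_two = {"meeting": {"Offsite"}, "competitor": {"acme"}}
    save_rings(merge_rings(load_rings(path), matter_two), path)

    dictionary = load_rings(path)

print(dictionary)
```

After the second matter, the "meeting" ring holds the words from both reviews, so the third matter starts with more than either of the first two did.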

Building synonym-rings is straightforward. There are only about 25,000 root words in use at any corporation. To build the corporate dictionary of synonym-rings, we extract the vocabulary from the documents collected for a matter, organize the root words by part of speech (nouns, verbs, and so on), and look through the word lists. If we are building a synonym-ring for the concept of “meeting,” we look through the word list and record every word we can imagine being used to communicate that a meeting took place. The process takes only a few minutes.
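The vocabulary extraction step can be sketched in a few lines. The example below is deliberately simplified: it lowercases the text, pulls out alphabetic tokens, and counts them, but it does not reduce words to their roots or group them by part of speech (both would normally need a natural-language library). The sample documents and the names `extract_vocabulary` and `word_list` are assumptions for illustration.

```python
import re
from collections import Counter

# Alphabetic tokens, optionally with an internal straight apostrophe (e.g. "don't").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def extract_vocabulary(documents):
    """Count how often each distinct word appears across the collection."""
    counts = Counter()
    for text in documents:
        counts.update(TOKEN.findall(text.lower()))
    return counts


def word_list(vocabulary, min_count=1):
    """Alphabetical list of words seen at least min_count times, for a person to scan."""
    return sorted(w for w, n in vocabulary.items() if n >= min_count)


# Hypothetical sample documents.
docs = [
    "Lunch with Acme on Friday.",
    "The meeting moved to Friday.",
]

vocabulary = extract_vocabulary(docs)
for word in word_list(vocabulary):
    print(word, vocabulary[word])
```

The output is the list a reviewer would scan, picking out every word that could signal a meeting ("lunch", "meeting") and recording it in the ring.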

Creating reusable synonym-rings turns the expense of one document review into a reusable asset that can be leveraged in the next review. If you already know how the company talks about a topic, it’s simple to find the documents that are potentially about that topic. Synonym-rings act as super keyword filters; they eliminate documents from the review that couldn’t possibly be about the issues. Our experience is that using well-constructed synonym-rings to identify potentially relevant documents eliminates about 80 percent of collected documents from the review process — a substantial savings in time and money.
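The filtering effect can be illustrated with a toy collection. In the sketch below, a document stays in the review only if it contains at least one word from every ring; everything else is set aside. The documents, the rings and the function names are made up for the example, the rings hold single words only, and the resulting fraction says nothing about what a real collection would yield.

```python
import re


def words_in(text):
    """Set of lowercase alphabetic words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def may_be_relevant(text, rings):
    """True only if the document shares at least one word with every ring."""
    words = words_in(text)
    return all(words & ring for ring in rings)


def filter_collection(documents, rings):
    """Return the documents kept for review and the fraction eliminated."""
    kept = [d for d in documents if may_be_relevant(d, rings)]
    eliminated = 1 - len(kept) / len(documents) if documents else 0.0
    return kept, eliminated


# Hypothetical rings and documents.
meeting_ring = {"meeting", "lunch", "call"}
competitor_ring = {"acme", "competitor"}

collection = [
    "Lunch with Acme next Tuesday to discuss pricing.",
    "Quarterly meeting agenda attached.",
    "Acme released a new product.",
    "Parking garage closed Friday.",
]

kept, eliminated = filter_collection(collection, [meeting_ring, competitor_ring])
print(kept)
print(f"{eliminated:.0%} of documents eliminated")
```

Only the first document mentions both a meeting word and a competitor word, so it is the only one that reaches a reviewer.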

So the next time the need to review documents comes along, take a deep breath and acknowledge that this time won’t be the last time, and start to work in a way that creates leverage and cost savings down the road.

Contributing Author

Andy Kraftsow

RenewData’s Chief Scientist Andy Kraftsow leads the company’s efforts to develop groundbreaking technologies. Trained as a mathematician (and a CPA), Kraftsow is one of the...
