Keyword Searching Has Limitations in E-Discovery

When systems provider Autonomy searched Enron's electronically stored information (ESI) in a demonstration of a new method it calls meaning-based computing, the search for the term "book loss" turned up code names the now-defunct energy company had used to hide its financial misdeeds.

Keyword Limitations

Although keyword searching remains the most common method of information retrieval for e-discovery, regulatory requests and other projects, it has limitations. It cannot use context to distinguish among the different meanings of a word. For example, if you were dealing with a spoliation claim and searching for the term "shredding," as in shredding paper, you might turn up irrelevant documents referring to shredding food or shredding the slopes on a snowboard.
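The ambiguity is easy to reproduce. The following is a minimal Python sketch of a literal keyword search, not any vendor's implementation. The document IDs, the sample text and the keyword_search function are all hypothetical, invented here for illustration.

```python
import re

def keyword_search(documents, term):
    """Return the IDs of documents containing the term as a whole word.

    The match is purely literal: nothing here looks at the surrounding
    words, so every sense of the term is treated the same way.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]

# Hypothetical four-document collection (illustrative only).
DOCUMENTS = {
    "memo-1": "Facilities confirmed the shredding of the audit files on Friday.",
    "recipe-2": "Start by shredding the cabbage and carrots for the slaw.",
    "email-3": "We spent the weekend shredding the slopes in Aspen.",
    "memo-4": "Please retain all audit files until further notice.",
}

hits = keyword_search(DOCUMENTS, "shredding")
print(hits)
```

All three documents that contain the word come back as hits, although only the first concerns paper. The fourth, which may matter to a spoliation claim, is missed because it never uses the term.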

System Training

The new systems are "trained" to locate relevant documents. A reviewer might read a sample of 100 documents from an ESI collection and determine that 80 are relevant. The reviewer returns the 100 coded documents to the system, which then searches the collection for other documents that address the same or similar issues as the 80 relevant ones. From those examples, the system can begin to determine a pattern.
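The training loop described above can be sketched in a few lines of Python. This is a deliberately simple word-weighting scheme chosen for illustration, and it should not be read as the method used by Autonomy or any other vendor. The function names, the four-document sample (standing in for the reviewer's 100) and the unreviewed documents are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_sample):
    """Learn one weight per word from (text, is_relevant) pairs.

    A positive weight means the word is more common in documents the
    reviewer coded as relevant; a negative weight means the opposite.
    Add-one smoothing avoids taking the log of zero.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in labeled_sample:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for word in vocab:
        p_rel = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irr = (counts[False][word] + 1) / (totals[False] + len(vocab))
        weights[word] = math.log(p_rel) - math.log(p_irr)
    return weights

def score(weights, text):
    """Sum the learned weights of the words in an unreviewed document."""
    return sum(weights.get(word, 0.0) for word in tokenize(text))

# Hypothetical reviewer-coded sample (a real sample would be far larger).
SAMPLE = [
    ("Destroy the audit files before the subpoena arrives", True),
    ("Shredding of the audit workpapers is scheduled for tonight", True),
    ("Shredding cabbage for the company picnic slaw", False),
    ("We were shredding the slopes all weekend", False),
]

weights = train(SAMPLE)
likely_relevant = score(weights, "Have the audit workpapers been destroyed yet")
likely_irrelevant = score(weights, "Great weekend on the slopes")
```

In this sketch the first unreviewed document scores above zero and the second below it, even though neither contains the word "shredding." The weights come from the surrounding vocabulary in the reviewer's coded examples, which is the sense in which the system "begins to determine a pattern."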

Market Acceptance

The eDiscovery Institute Survey on Predictive Coding, released in October 2010, asked, "Given the claimed advantages for predictive coding, why isn't everyone using it?" The most frequently cited reason was uncertainty about whether judges would accept predictive coding as a reasonable and defensible effort to identify responsive documents (see "Courtroom Commentary").

Michael Kozubek
