E-discovery: Change your thinking about e-discovery

Concept searching produces more relevant results than Boolean and keyword methods

We go through the exercise of electronic discovery to reduce volumes of data into useful trial evidence. The problem is we’ve been using the wrong tools to do it. Traditionally, counsel savvy about e-discovery have interviewed key information custodians with the goal of developing a list of words and phrases (hereinafter, “keywords”) to be applied against the data set. In theory, applying a keyword filter captures most (if not all) of the relevant information while screening out irrelevant information.

The problem with screening data with keywords is that it doesn’t do what we assume it does. Applying a keyword filter to a data set (i.e., a Boolean search) is simultaneously over- and under-inclusive, because language (and human beings’ use of it) is inconsistent and imprecise. Keyword/Boolean searches are over-inclusive in that a simple Boolean search lacks the “intelligence” to differentiate between different meanings of the same word (e.g., you search for documents related to an insect infestation at an apple orchard, and you end up with thousands of documents about Apple computers).

As a result, your screened data set usually contains thousands of documents that are completely irrelevant to the case, so your review costs soar. But scarier (for us litigator types) is the fact that keyword searches are under-inclusive. Keyword/Boolean searches are under-inclusive by nature because they assume:

  1. People use the same words to refer to the same or similar concepts.
  2. People spell things similarly.

. . . both of which we know are not necessarily true. (Blair & Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System, 28 Com. A.C.M. 289 [1985]). For example, attorneys and paralegals involved in a subway accident case used keyword methodology to search a 350,000-page (40,000-document) database. Id. The litigation team believed they had located 75 percent of the relevant documents, while a separate manual review of the documents found that they had identified only 20 percent of the relevant documents. Id.

Most of the “missed” internal communications about the subway accident included terms and phrases like the “unfortunate incident,” the “disaster,” the “event,” the “situation,” the “problem” and the “difficulty” and never mentioned the “subway” or the “accident.” Id. Your keyword-screened document set may also be missing close to 80 percent of the documents that are particularly relevant to your case.
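To make the over- and under-inclusiveness concrete, here is a minimal Python sketch of a Boolean keyword filter run against a tiny, invented document set. The documents, the relevance labels and the function names are all hypothetical and chosen only for illustration; they are not drawn from the Blair & Maron study. Precision measures how much of what the filter returned was relevant (over-inclusiveness lowers it), and recall measures how much of the relevant material the filter actually found (under-inclusiveness lowers it).

```python
import re

# Hypothetical toy corpus: (text, is_relevant) pairs invented for illustration.
DOCUMENTS = [
    ("Inspection report on the subway accident at the station", True),
    ("Memo: how do we respond to the unfortunate incident last week?", True),
    ("Engineering notes on the disaster and the brake failure", True),
    ("Status of the situation: claims are coming in", True),
    ("Lunch order: sandwiches from the Subway shop on Main Street", False),
    ("Quarterly parking budget", False),
]

# The keyword list a litigation team might plausibly start with (assumed).
KEYWORDS = {"subway", "accident"}


def keyword_hit(text, keywords):
    """Return True if any keyword appears as a whole word (a Boolean OR search)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & keywords)


def precision_recall(documents, keywords):
    """Compute precision and recall of the keyword filter against the labels."""
    hits = [(text, relevant) for text, relevant in documents
            if keyword_hit(text, keywords)]
    relevant_hits = sum(1 for _, relevant in hits if relevant)
    total_relevant = sum(1 for _, relevant in documents if relevant)
    precision = relevant_hits / len(hits) if hits else 0.0
    recall = relevant_hits / total_relevant if total_relevant else 0.0
    return precision, recall


precision, recall = precision_recall(DOCUMENTS, KEYWORDS)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.25
```

In this toy set the filter pulls in the sandwich-shop lunch order (over-inclusive) and misses the three relevant documents that never use the words "subway" or "accident" (under-inclusive).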

Luckily, there’s a solution to this problem: concept searching. Concept searching software uses algorithms to build a language model unique to each document set. Once the algorithm is applied to the document set, it tells the user which concepts (rather than keywords) to look for. Applying the language algorithm reduces the problem of words with more than one meaning because the software is “intelligent” enough to determine whether a document is relevant based on the context in which its words are used.

Most importantly, however, the software informs the user about additional words and phrases (e.g., “disaster,” “difficulty” and “unfortunate incident” from the example above) that relate to the same concept the user is attempting to explore, thereby yielding a more usable data set.
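One simple way to picture how software can suggest related terms is co-occurrence analysis: words that tend to appear alongside the same neighboring words are treated as belonging to the same concept. The Python sketch below is only an illustration under that assumption. It is not any vendor's actual algorithm (commercial tools use considerably more sophisticated language models), and the document set, the similarity threshold and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, pre-cleaned documents invented for illustration.
DOCUMENTS = [
    "subway accident train brake failure injured passengers",
    "unfortunate incident train brake failure injured passengers",
    "disaster train derailed injured passengers claims",
    "lunch menu sandwiches pizza salad",
    "budget meeting parking salad lunch",
]


def context_vectors(documents):
    """For each term, count the other terms that share a document with it."""
    vectors = defaultdict(Counter)
    for doc in documents:
        terms = set(doc.split())
        for term in terms:
            for other in terms:
                if other != term:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_terms(seed, documents, min_similarity=0.5):
    """Return terms whose contexts resemble the seed term's, best match first."""
    vectors = context_vectors(documents)
    if seed not in vectors:
        return {}
    scores = {term: cosine(vectors[seed], vec)
              for term, vec in vectors.items() if term != seed}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term: score for term, score in ranked if score >= min_similarity}


suggestions = related_terms("accident", DOCUMENTS)
for term, score in suggestions.items():
    print(f"{term}: {score:.2f}")
```

Starting from the single seed word "accident," the sketch surfaces "incident" and "disaster" because they appear in the same company (trains, brakes, injured passengers), while "lunch" scores zero and is left out. A reviewer could then decide which suggested terms to fold into the search.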

Concept searching is not new. E-discovery vendors have been marketing some form of concept searching under the rubric of “clustering” and/or “data analytics” since the early 2000s. Since then, concept searching software has become much more sophisticated and identifies related concepts much more clearly than its predecessors.

Indeed, Herbert Blutenthal of OrcaTec has used his company’s concept searching technology on many data sets and reports that he “always learns things when [he] takes on a new collection.” Even the U.S. Department of Defense has recently licensed a type of clustering/data analytic software to help make sense of the volumes of unclassified information in its data archives.

The moral of this article: Although the cost of concept searching may be high, it has the potential to yield more relevant information than the “keyword”/Boolean method. Before you completely dismiss the idea because of the price tag, you ought to know what you’re giving up.
