We go through the exercise of electronic discovery to reduce large volumes of data to useful trial evidence. The problem is that we’ve been using the wrong tools to do it. Traditionally, counsel savvy about e-discovery have interviewed key information custodians to develop a list of words and phrases (hereinafter, “keywords”) to apply against the data set. In theory, a keyword filter captures most (if not all) of the relevant information while screening out the irrelevant information.
The problem with screening data with keywords is that it doesn’t do what we assume it does. Applying a keyword filter to a data set (i.e., running a Boolean search) is simultaneously over- and under-inclusive, because language (and human beings’ use of language) is inconsistent and imprecise. Keyword/Boolean searches are over-inclusive because a simple Boolean search lacks the “intelligence” to distinguish between different meanings of the same word (e.g., you search for documents related to an insect infestation at an apple orchard, and you end up with thousands of documents about Apple computers). They are under-inclusive because people routinely describe the same thing with different words (synonyms and euphemisms) that never appear on the keyword list.
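The apple-orchard example can be reproduced with a toy keyword filter. The sketch below is a minimal illustration, not any e-discovery product’s method: `keyword_filter` is a hypothetical helper that performs a case-insensitive, whole-word Boolean OR search, and the four documents are invented for illustration.

```python
import re


def keyword_filter(documents, keywords):
    """Return the documents that contain any keyword (a Boolean OR search).

    Matching is case-insensitive and on whole words only. The filter has
    no notion of what a word means in context.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [doc for doc in documents if pattern.search(doc)]


# Hypothetical documents, invented for illustration only.
documents = [
    "Codling moth infestation found in the north apple orchard.",
    "Apple released a new laptop; order ten for the sales team.",
    "IT ticket: Apple computer will not boot after the update.",
    "Pest control crew sprayed the orchard rows on Tuesday.",
]

hits = keyword_filter(documents, ["apple", "infestation"])
for doc in hits:
    print(doc)
```

Two of the three hits concern Apple computers (over-inclusion), while the pest-control note, which is plainly relevant to an orchard infestation, is missed because it uses neither keyword (under-inclusion).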
In one case involving a subway accident, most of the “missed” internal communications about the accident used terms and phrases like the “unfortunate incident,” the “disaster,” the “event,” the “situation,” the “problem” and the “difficulty” and never mentioned the “subway” or the “accident.” Id. Your keyword-screened document set may likewise be missing close to 80 percent of the documents that are particularly relevant to your case.
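The share of relevant documents a search actually finds is what information-retrieval practitioners call recall; a search that misses 80 percent of the relevant documents has a recall of 20 percent. The sketch below is a minimal illustration under stated assumptions: `matches_any` and `recall` are hypothetical helpers, and the five messages are invented (all treated as relevant), loosely modeled on the phrasing described above rather than taken from any real case.

```python
import re


def matches_any(text, keywords):
    """True if any single-word keyword appears in the text (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(k.lower() in words for k in keywords)


def recall(retrieved, relevant):
    """Share of the relevant documents that the search actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined with no relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical messages, all relevant to the accident (invented for illustration).
relevant = [
    "Preliminary report on the subway accident is attached.",
    "Legal wants every memo about the unfortunate incident preserved.",
    "Board call tomorrow to discuss the disaster.",
    "The press office is handling the situation.",
    "Maintenance logs may explain the problem.",
]

retrieved = [msg for msg in relevant if matches_any(msg, ["subway", "accident"])]
print(f"recall = {recall(retrieved, relevant):.0%}")  # prints: recall = 20%
```

Only the one message that literally says “subway accident” is retrieved; the four that refer to the “incident,” “disaster,” “situation” and “problem” slip through, even though each is relevant.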