Technology: Relative defensibility of Boolean and statistical document culling workflows

Technology is fallible, but that doesn’t mean we shouldn’t use it

Our prior articles have addressed statistical document sorting technologies, various workflows that leverage this technology, and the defensibility of these technologies and workflows in court. The Kleen Products v. Packaging Corp. of America case pending before Magistrate Judge Nan Nolan illustrates the baseline of defensibility for the more traditional Boolean searching technology.

The several defendants in Kleen, an antitrust case, used Boolean searches to generate a linear review set. The plaintiff argues that the process was inadequate and seeks to compel the defendants to try again using statistical document sorting technology. There is great interest in whether this case will become the first in which a party is affirmatively compelled to use statistical document sorting technology. But, leaving the question of the potential remedy aside, this case is a good illustration of the challenge of defending a Boolean process that is conducted unilaterally.

In their effort to defend the process, the defendants already have invested in preparations for two full days of expert testimony. It appears that the third day of hearings is currently on hold as the parties negotiate an amicable resolution. However these negotiations turn out in Kleen, most litigators know that it tends to be very difficult to hold firm to a unilaterally established set of Boolean search terms. Opponents identify overlooked words and raise questions about unknowable shorthand or even code words. Often, there are heated disputes over how broad or narrow the search terms ought to be. But, quite commonly, when the dust settles, at least some additional searches are run, either by agreement or by court order.

Anyone who seeks to hold firm to a unilaterally selected set of search terms should keep in mind what Magistrate Judge John Facciola famously said in U.S. v. O’Keefe:

“Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics...Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

The difficulty of selecting defensible search terms is well known to litigators who handle cases involving a large volume of documents. This difficulty is one reason lawyers often use Boolean technology to restrict document reviews but not document collections: The search may well need to be expanded later. If the collection is restricted to only documents that hit on certain search terms, then each expansion of the search parameters means a repeated collection, with the risk of serially inconveniencing employees, disrupting the client’s business and incurring duplicative expenses.
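For readers who want to see the mechanics, the following is a minimal sketch of a Boolean cull and of what an overlooked shorthand term does to it. Everything in it is an assumption made for illustration: the four documents are invented, the `boolean_hits` function is a hypothetical helper rather than any review platform's search syntax, and "PI" stands in for whatever shorthand the custodians might actually have used.

```python
import re


def boolean_hits(documents, any_terms, all_terms=()):
    """Return the ids of documents matching (any of any_terms) AND (all of all_terms).

    Hypothetical helper: whole-word, case-insensitive matching only.
    """
    def has(term, text):
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    hits = set()
    for doc_id, text in documents.items():
        if any(has(t, text) for t in any_terms) and all(has(t, text) for t in all_terms):
            hits.add(doc_id)
    return hits


# Invented sample documents, not drawn from any real matter.
documents = {
    "doc1": "Let's discuss the price increase for widgets next quarter.",
    "doc2": "Pricing call with the plant managers is set for Tuesday.",
    "doc3": "Same plan as before on the PI, keep it quiet.",
    "doc4": "Lunch menu for the holiday party.",
}

# Terms chosen unilaterally by the producing party (assumed).
initial_terms = ["price", "pricing"]
# Terms after the opponent points out the shorthand "PI" (assumed).
expanded_terms = initial_terms + ["PI"]

initial = boolean_hits(documents, initial_terms)
expanded = boolean_hits(documents, expanded_terms)

# Documents that only the expanded search reaches.
missed = expanded - initial
print(sorted(missed))  # ['doc3']
```

If the terms had been used only to narrow the review, doc3 would still be in the collected set and the expanded search could simply be rerun. If they had been used to restrict the collection itself, doc3 would never have been collected, and the expanded search would require going back to the custodians.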

Our point is not that Boolean technology is obsolete or indefensible. It is that all technologies and workflows are subject to attack, including even those that may be considered to be time-honored and traditional. So we should not resist using the newer statistical technologies and workflows simply because they too are subject to uncertainty and attack.

Some workflows that leverage statistical document sorting technology probably are more defensible than others. For example, a workflow in which humans analyze each and every concept cluster may be more difficult to attack than either Boolean workflows or predictive workflows in which an algorithm is trained on a sample set of documents. This is because, when each concept-cluster folder has been analyzed, a human being can take the stand and testify that he performed a reasonably diligent review of each folder and made informed and reasonable judgments. This would generally seem to leave little opening for attack absent some apparent gap in the production (which gap a producing party likely would seek to remedy before letting the dispute go to the judge).

This does not mean that Boolean or the various predictive workflows should not be tried. Each workflow potentially has a place in appropriate circumstances. Boolean and predictive workflows may more often require some flexibility and cooperation and, failing that, judicial involvement. But that is more familiar ground to most litigators than might at first be assumed.

Contributing Author

Thomas Lidbury

Thomas A. Lidbury is a partner in Drinker Biddle & Reath's Commercial Litigation practice and leads the electronic discovery and records management group. He advises clients in...

Contributing Author

Michael Boland

Michael J. Boland is managing director of Drinker Discovery Solutions LLC, a subsidiary of Drinker Biddle & Reath, which provides electronic discovery services including processing and advanced...
