Technology: Relative defensibility of Boolean and statistical document culling workflows

Technology is fallible, but that doesn’t mean we shouldn’t use it

Our prior articles have addressed statistical document sorting technologies, various workflows that leverage this technology, and the defensibility of these technologies and workflows in court. The Kleen Products v. Packaging Corp. of America case pending before Magistrate Judge Nan Nolan illustrates the baseline of defensibility for the more traditional Boolean searching technology.

The several defendants in Kleen, an antitrust case, used Boolean searches to generate a linear review set. The plaintiff argues that the process was inadequate and seeks to compel the defendants to try again using statistical document sorting technology. There is great interest in whether this case will become the first in which a party is affirmatively compelled to use statistical document sorting technology. But leaving the question of the potential remedy aside, this case illustrates well the challenge of defending a Boolean process that is conducted unilaterally.

In their effort to defend the process, the defendants have already invested in preparations for two full days of expert testimony. It appears that the third day of hearings is currently on hold as the parties negotiate an amicable resolution. However these negotiations turn out in Kleen, most litigators know that it tends to be very difficult to hold firm to a unilaterally established set of Boolean search terms. Opponents identify overlooked words and raise questions about unknowable shorthand or even code words. Often, there are heated disputes over how broad or narrow the search terms ought to be. But quite commonly, when the dust settles, at least some additional search terms are run, either by agreement or by court order.

Anyone who seeks to hold firm to a unilaterally selected set of search terms should keep in mind what Magistrate Judge John Facciola famously said in U.S. v. O’Keefe:

“Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics ... Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

The difficulty of selecting defensible search terms is well known to litigators who handle cases involving a large volume of documents. This difficulty is one reason lawyers often do not use Boolean technology to restrict document collections, as opposed to document reviews: there is a very real risk of needing to expand the search later. If the collection is restricted to documents that hit on certain search terms, the client may face serial inconvenience to its employees, disruption of its business and duplicative expenses for repeated collections with expanded search parameters.
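To make the risk concrete, here is a minimal sketch in Python of how a Boolean cull behaves when a term list is expanded after the fact. Everything in it is an assumption made for illustration: the four sample documents, the search terms and the helper names (tokenize, boolean_hits) are hypothetical and are not drawn from Kleen or from any particular review platform, which would also offer proximity operators, stemming and wildcards that this sketch omits.

```python
import re

# Hypothetical mini-collection; IDs and text are invented for illustration.
DOCUMENTS = {
    "doc1": "Please confirm the price increase before the meeting.",
    "doc2": "Let's hold the line on the numbers we talked about at dinner.",
    "doc3": "Widget pricing will go up next quarter.",
    "doc4": "Lunch menu for the company picnic.",
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def boolean_hits(documents, any_of, all_of=()):
    """Return sorted IDs of documents that contain at least one `any_of`
    term and every `all_of` term (a simple OR/AND Boolean query)."""
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        if any(t in words for t in any_of) and all(t in words for t in all_of):
            hits.append(doc_id)
    return sorted(hits)


# Terms chosen unilaterally at the outset (assumed for illustration).
initial_terms = ["price", "pricing"]
# Terms after the opponent points out shorthand the first list missed.
expanded_terms = initial_terms + ["numbers"]

first_pass = boolean_hits(DOCUMENTS, initial_terms)
second_pass = boolean_hits(DOCUMENTS, expanded_terms)
newly_found = sorted(set(second_pass) - set(first_pass))

print("First pass: ", first_pass)
print("Second pass:", second_pass)
print("Only found after expansion:", newly_found)
```

In this toy collection the second document discusses the same subject in shorthand and is reached only once the term list is expanded. If the collection itself had been limited to the first-pass hits, that document would have to be gathered in a second round of collection.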

Our point is not that Boolean technology is obsolete or indefensible. It is that all technologies and workflows are subject to attack, even those considered time-honored and traditional. So we should not resist using the newer statistical technologies and workflows simply because they too are subject to uncertainty and attack.

Some workflows that leverage statistical document sorting technology probably are more defensible than others. For example, a workflow in which humans analyze each and every concept cluster may be more difficult to attack than either Boolean workflows or predictive workflows in which an algorithm is trained on a sample set of documents. This is because, when each concept-cluster folder has been analyzed, a human being can take the stand and testify that he or she performed a reasonably diligent review of each folder and made informed and reasonable judgments. This would generally seem to leave little opening for attack absent some apparent gap in the production (a gap that a producing party likely would seek to remedy before letting the dispute go to the judge).

This does not mean that Boolean or the various predictive workflows should not be tried. Each workflow potentially has a place in appropriate circumstances. Boolean and predictive workflows may be more likely to require some flexibility and cooperation and, failing that, judicial involvement. But that is more familiar ground to most litigators than might at first be assumed.

About the Author
Thomas Lidbury

Thomas A. Lidbury is a partner in Drinker Biddle & Reath's Commercial Litigation practice and leads the electronic discovery and records management group. He advises clients in the design and implementation of their internal electronic discovery programs and handles electronic discovery in major litigation.

About the Author
Michael Boland

Michael J. Boland is managing director of Drinker Discovery Solutions LLC, a subsidiary of Drinker Biddle & Reath, which provides electronic discovery services including processing and advanced culling techniques, a review platform, and production of data.
