Technology: Getting defensive about predictive document sorting technology

The fight over predictive coding resembles Boolean disputes

The efficacy and defensibility of so-called “predictive coding” has been a hot topic in light of Magistrate Judge Andrew Peck stating that he “has approved of the use of computer-assisted review” in Da Silva Moore v. Publicis Groupe, and Magistrate Judge Nan Nolan conducting a still-ongoing evidentiary hearing in Kleen Products v. Packaging Corp. of America, in which the plaintiffs seek to force the defendants to start over with this technology after the defendants have already completed a review using a traditional Boolean methodology.

Judge Peck’s ruling in Da Silva Moore certainly is positive about statistical document sorting technology. These “predictive” workflows leverage new technology that learns the substantive relationships between documents from coding decisions made by human reviewers. But the ruling comes from a case in which the parties, at least initially, had already agreed to use the technology and were arguing only over the particulars of the protocol to be followed. Moreover, the order does not adopt or approve any particular protocol, tool or technology. Judge Peck resolved certain disputes about how to begin the process, but he reserved judgment on whether those initial steps would be sufficient until after they are completed. Contrary to what some have suggested, the holding is quite limited. But there is an important takeaway: Resolution of a dispute over how to use statistical document sorting technology and “predictive” workflows looks much like the resolution of a dispute over how to use Boolean technology.
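For readers unfamiliar with the underlying mechanics, the workflow is conceptually similar to ordinary supervised text classification. The Python sketch below (using the scikit-learn library) illustrates the basic idea; the sample documents, feature choices and model are hypothetical assumptions for illustration, not the protocol or software at issue in Da Silva Moore.

```python
# A minimal sketch of how "predictive" review learns from human coding
# decisions. The documents, labels and model choice are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set: documents a human reviewer has coded responsive (1) or not (0).
seed_docs = ["merger negotiation notes from counsel",
             "cafeteria menu for the holiday party",
             "draft merger agreement, privileged",
             "parking garage closure reminder"]
seed_codes = [1, 0, 1, 0]

# Turn document text into numeric features the algorithm can learn from.
vectorizer = TfidfVectorizer()
X_seed = vectorizer.fit_transform(seed_docs)

# Fit a classifier to the human coding decisions.
model = LogisticRegression().fit(X_seed, seed_codes)

# Score the uncoded collection; higher scores suggest responsiveness.
uncoded = ["merger term sheet attached", "holiday party RSVP"]
scores = model.predict_proba(vectorizer.transform(uncoded))[:, 1]
print(dict(zip(uncoded, scores.round(2))))
```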

Likewise, in Da Silva Moore the parties came into court with competing proposals to “stabilize the training of the software” and to create the initial seed set used to train it. Instead of hit rates, the arguments focused on “statistical confidence levels,” how many “iterative rounds” of human coding should be done to adequately teach the algorithm, how many documents humans should review and code in each of those rounds, and at what point the algorithm should be trusted to have found substantially all of the important documents, which the humans would then review and code. In addition to the quantitative metrics, there were qualitative arguments, such as whether the defendant should review all or only some of the documents the computer returned in the final round. Judge Peck exercised his wide discretion in discovery matters, split the baby and told the parties to come back if they still had disputes after doing what the court had ordered. So the first hotly contested ruling on the use of statistical document sorting technology and “predictive” workflows looks much like the more familiar disputes over Boolean methods. That should give comfort to new adopters of this technology.
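To make the “statistical confidence level” arguments concrete: one common way to test whether the algorithm can be trusted is to have humans review a random validation sample and estimate, with a stated margin of error, how much the process missed. The sketch below uses a textbook normal-approximation confidence interval; the sample size and counts are hypothetical, and real protocols vary by tool and by negotiated agreement.

```python
# Hypothetical validation step: humans review a random sample of documents
# the algorithm set aside, then estimate the overall miss rate with a
# ~95% confidence interval (normal approximation, z = 1.96).
import math

sample_size = 2399       # randomly drawn documents reviewed by humans
missed_responsive = 18   # of those, coded responsive despite being set aside

p = missed_responsive / sample_size
margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
print(f"Estimated miss rate: {p:.2%} +/- {margin:.2%}")
```

Disputes over how many iterative rounds to run are, in the end, disputes over when numbers like these are good enough, which is the same kind of reasonableness judgment courts have long made about keyword hit rates.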

One concern in this case is that the defendants agreed to allow the plaintiffs to review the documents that human reviewers coded as not responsive in each iterative round. If that becomes the price of judicial permission to use these methods, it may not be worth paying.

Contributing Author

Thomas Lidbury

Thomas A. Lidbury is a partner in Drinker Biddle & Reath's Commercial Litigation practice and leads the electronic discovery and records management group. He advises clients in...

Contributing Author

Michael Boland

Michael J. Boland is managing director of Drinker Discovery Solutions LLC, a subsidiary of Drinker Biddle & Reath, which provides electronic discovery services including processing and advanced...
