Many lawyers’ eyes glaze over when we hear of Bayesian networks, concept clustering, predictive coding, suggestive coding, machine-assisted review, meaning-based coding, latent semantic analysis, probabilistic latent semantic analysis, Shannon’s theory, the Markov blanket, de Finetti’s theorem, latent Dirichlet allocation, Gibbs sampling and so on. Like Chevy Chase, most of us believed “there would be no math.”
Can lawyers still hold to that view now that document collections are measured in gigabytes and terabytes, and sophisticated mathematical document sorting technology is going mainstream? Yes, we can. We only need a basic understanding of what the technology does so that we can know how to effectively use it in a defensible workflow.
Developers initially touted this technology for early case assessment because users had not yet devised workflows to use it for primary document culling. But users have since developed defensible workflows, which have been in use for most of the past decade. These workflows fall into two categories:
1. In the first approach, the software sorts the entire dataset into clusters before any human looks at the documents. Reviewers then examine the clusters (without reviewing each document) to separate those unlikely to contain relevant documents from those likely to, and review documents only in the latter clusters.
This technology is powerful and defensible when combined with the right process. Users have developed sound workflows that leverage it to cull relevant from irrelevant documents very efficiently. As new and mysterious as it may sound, it has seen real-world use in some of the biggest litigation in the country for at least a decade. And you do not need a Ph.D. to use it in your next case.
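To make the cluster-first workflow concrete, here is a deliberately simplified sketch. Real e-discovery platforms use far more sophisticated methods (latent semantic analysis, latent Dirichlet allocation and the like); this toy version merely groups documents by word overlap so a reviewer can inspect one sample per cluster instead of every document. All document texts, the similarity threshold, and the function names here are hypothetical illustrations, not any vendor's actual algorithm.

```python
def jaccard(a, b):
    """Word-overlap similarity between two documents (0.0 to 1.0)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster(docs, threshold=0.2):
    """Greedy one-pass clustering: put each document in the first
    cluster whose seed document is similar enough, else start a new
    cluster. Returns lists of document indices."""
    clusters = []
    for i, doc in enumerate(docs):
        for c in clusters:
            if jaccard(docs[c[0]], doc) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical mini-collection: two merger documents, two party-planning ones.
docs = [
    "merger agreement draft with purchase price terms",
    "revised merger agreement purchase price schedule",
    "office holiday party planning lunch menu",
    "holiday party menu and lunch planning notes",
]
groups = cluster(docs)
# groups -> [[0, 1], [2, 3]]: a reviewer could skim one document per
# cluster, discard the party-planning cluster, and review only the rest.
```

The point of the sketch is the workflow, not the math: clustering happens before any human review, and human judgment is applied at the cluster level to decide which groups are worth reviewing document by document.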