Those of us in the e-discovery industry recall that 2012 was declared the “year of predictive coding.” Given a similar prediction for 2013, many of us assumed predictive coding would remain the focus for corporate clients and become further ingrained in legal culture. In fact, adoption rates seem to have leveled off and widespread usage remains relatively low. Even so, several software companies are continuing to develop the next iteration of predictive coding technology for the discovery process. Although we are many years away from artificial intelligence replacing virtually all of the human labor involved in the review process, it is likely that 2014 will mark another defining moment in the evolution of predictive coding.
Cost considerations aside, the perceived difficulty of use is a legitimate barrier in the market. Even though there is strong support for predictive technology in some legal circles, many of these lawyer-advocates already have a good understanding of technology and are outliers, constituting a discrete minority in the profession. Attendees of e-discovery conferences will note that the audience is often very homogeneous. This is not a mere coincidence; it reflects the reality that e-discovery remains a niche practice, tangential to the merits of the case, and general interest in the topic among the Bar is limited. Naturally, e-discovery advocates will argue that this is a myopic view on the part of our colleagues, as the impact of e-discovery on the litigation process (and litigation budgets) cannot be overstated. Nonetheless, if predictive coding is to become standard practice, it must be more accessible to the majority of professionals who aren’t interested in calculating recall and precision, or who don’t know their confidence interval from their correlation coefficient.
To take predictive coding beyond a mere culling tool, it must be combined with other technologies and search methodologies so that it becomes more intuitive and, in turn, more useful to the user. Of course, for this approach to succeed, it must first overcome a point of view that is widespread in the market: the idea that random sampling is the only way to generate a training set for predictive coding.
The argument for using only random sampling to generate the initial seed set for predictive coding is often based on an assumption that the best results are unbiased. However, by treating the discovery process like a science experiment, litigators and vendors ignore the bias that occurs naturally, even outside traditional processes. By basing a search on preexisting information, rather than randomly combing through all the information, users can find the information they’re looking for more efficiently, cutting the time and cost of the process. This is not to say that a degree of randomness in the search process is unhelpful, but relying on nothing but random sampling creates a false sense of statistical certainty that can belie the complexity of a data management process influenced by outside factors. Outside counsel and legal departments must stop buying vendor claims based on lab work and instead favor solutions that reflect real-life needs and scenarios.