Large-scale document review poses significant challenges for an organization. A number of factors are responsible, but they generally relate to what I call the document review conundrum: the constant challenge of handling large amounts of electronically stored information as volumes keep growing and budgets and timelines keep shrinking. Worse still, the risks of a misstep are significant. The good news is that a solution is finally on the horizon, in the form of technology-assisted review. But as one problem is addressed, another takes its place. Today, many are trying to understand the differences among the various forms of this emerging technology category and to learn how to get the most out of their investment. This article addresses these issues head on.
Understanding technology-assisted review
A best practice is to build a seed set large enough to cover all your bases. A few years ago, the common practice was to select as few as 500 documents to provide this training. As data volumes have grown and understanding of how artificial intelligence works has improved, more organizations are building significantly larger seed sets (generally 10,000 or more documents). Why? Because computers do not understand context as humans do. For instance, a computer doesn’t understand that “Mitt is running for President” means the same thing as “Mitt threw his hat in the ring.” It is therefore important to capture as many concepts and semantic patterns as possible up front, to maximize the chances that the computer catches as many instances of the key expressions as possible. You’ll also want to run quality control on the results to monitor and limit what the computer may miss.
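To make the “Mitt” example concrete, here is a minimal Python sketch of how a purely word-based comparison sees those two sentences. It is an illustration only: the word-overlap measure (Jaccard similarity) and the helper names `tokens` and `jaccard` are my assumptions for the sake of the example, not a description of how any particular technology-assisted review product works.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# The two sentences from the example above; a human reads them as equivalent.
literal = "Mitt is running for President"
idiom = "Mitt threw his hat in the ring"

shared = tokens(literal) & tokens(idiom)   # words that appear in both sentences
score = jaccard(literal, idiom)            # overlap as a fraction of all distinct words

print(sorted(shared), round(score, 2))
```

The only word the two sentences share is the name itself, so a system relying on vocabulary alone has little reason to connect them unless the seed set contains examples of both phrasings.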
Best practices for language-based analytics methodologies