How many miles per gallon can I get using Insight Predict, the OpenText™ technology assisted review platform, which is based on continuous active learning (CAL)? And how does that fuel efficiency rating compare to what I might get driving a keyword search model?
While our clients don’t always use these automotive terms, this is a key question we are often asked: How does CAL review efficiency¹ compare to the review efficiency I have gotten using keyword search? Put another way, how many non-relevant documents will I have to look at to complete my review using CAL, versus the number of false hits that will likely come back from keyword searches?
This is an important question. The number of documents you have to review to get to your recall target directly correlates to the time and cost of doing that review. If you have to look at more non-relevant documents using either a CAL or keyword process, then perhaps you need to change your ride. If the miles per gallon are significantly lower with keyword search, then you are paying too much on your commute. Time to trade in for a new model.
How efficient are keywords?
This question might be tougher to answer than you think. While keyword searches can be effective in finding some relevant documents, the process rarely achieves a sufficient level of recall for a true comparison with a CAL process.
Anecdotally, we have asked a number of e-discovery professionals how efficient their keyword-based reviews typically are. While we aren’t claiming this was a statistical survey, the typical answer we received was ten to one: the typical reviewer has to look at ten documents for every relevant one found. That is a pretty low efficiency rate in our view. If these estimates are anywhere near correct, a CAL process like Insight Predict will be about five times more efficient than keyword review. Put another way, if you are basing your review on keyword search, you are reviewing too many documents.
To be sure, keyword search still has a place in e-discovery, to find relevant documents quickly or when searching for documents that you know have specific and not often-repeated terms. But when it comes to a general review, here’s what the data shows about keyword efficiency—and why Predict is a superior method for document review.
Apples-to-apples: The recall-precision tradeoff in keyword searching
There is not a lot of publicly available data on the true effectiveness of keyword search, but we do know one thing for certain. There will always be a tradeoff between recall and precision (or, in this case, efficiency). As a general matter, you can only increase recall levels by sacrificing precision. With that in mind, we can take a look at the keyword search data and understand, at least qualitatively, how to compare keyword search to TAR, given that TAR typically attains much higher recall levels than keyword search.
Blair-Maron: Even though it’s over 30 years old, the Blair-Maron study remains the most comprehensive analysis of keyword effectiveness in the legal realm. One of its key findings was that while attorneys were able to use keyword search to find a lot of relevant documents (indeed, sometimes achieving 80% precision on individual topics), they found only about 20% of the total relevant documents in the larger population. The logical conclusion is that, had the attorneys tried to develop additional keywords to push total recall from 20% to, say, 75% or 80%, they would have had to review a large number of non-relevant documents in the bargain.
Biomet: We do have one other publicly available data point that we can use to further evaluate the recall-precision tradeoff for keyword search: the Biomet decision, In re Biomet M2a Magnum Hip Implants Prods. Liab. Litig. (N.D. Ind. Apr. 18, 2013). In Biomet, the producing party used keywords to reduce the review population from 19.5 million to 2.5 million documents.
The statistics in the Biomet matter are not entirely consistent but do provide a reasonable basis for assessment. Using one set of statistics, we can estimate the recall of the keyword search to be roughly 60%, which means the parties only found 60% of the relevant documents in the entire population.
What about the precision or efficiency of their review? Based on information disclosed in the opinion, we calculate that the precision of their keyword search efforts was roughly 16% (an efficiency of 6.25:1). Using a second set of statistics from the case, the precision would only be about 9% (11:1 efficiency). Neither figure seems to account for the fact that the team likely reviewed many non-responsive family members, so we need to treat the Biomet precision values conservatively. Their actual efficiency may well have been worse than 11 to 1.
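The precision-to-efficiency conversion used above is simple arithmetic, and a short Python sketch makes it concrete. The 16% and 9% figures are the estimates discussed in the text, not exact counts from the opinion:

```python
def efficiency_ratio(precision: float) -> float:
    """Documents reviewed per relevant document found: efficiency = 1 / precision."""
    return 1.0 / precision

# First set of Biomet statistics: ~16% precision
print(round(efficiency_ratio(0.16), 2))  # 6.25, i.e., a 6.25:1 efficiency

# Second set of Biomet statistics: ~9% precision
print(round(efficiency_ratio(0.09), 1))  # 11.1, i.e., roughly 11:1
```

The same function also recovers the figures quoted later in this piece: 10% precision works out to a 10:1 efficiency, and 50% precision to 2:1.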
Since neither the Blair-Maron study nor the Biomet decision reflects the higher levels of recall typically seen with Predict (for example, we typically set our recall target at 80%), we need to account for the recall-precision tradeoff. Obviously, at 80% recall, the precision of keyword searching would be nowhere near the 80% level seen in the Blair-Maron study. Instead, keyword search precision at 80% recall would more likely be near (and probably below) the 16% precision calculated in Biomet. From these figures, we can estimate the precision of keyword search at 80% recall to be roughly 10%, which equates to a 10:1 efficiency.
Insight Predict: Achieving high efficiency and high recall
In our examination of efficiency rates across several simulated Insight Predict projects, we found that, on average, a Predict review has the potential to reach a 1.75:1 efficiency ratio at 75% recall. This means that, in the perfect world of a simulation, you would have to review just 1.75 documents to find each relevant document during the course of review.
We then took a look at how those same cases played out in the real world by examining the statistics for the actual review. On average, the efficiency of the actual review was 2.66:1. But the average recall in those cases was in the neighborhood of 90%, which is generally higher than necessary in litigation, and certainly higher than you would see with keyword search.
Looking at the statistics at the point at which those same cases achieved 80% recall (a more realistic litigation target), the efficiency improves to about 2:1. This means you will look at about two documents in a typical Predict review to find each responsive document. So if you need to find about 10% of a collection to reach your estimated recall goal, you will wind up looking at about 20% of the collection during the course of the review.
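The "find 10%, review 20%" arithmetic generalizes: the fraction of the collection you review is the fraction you need to find multiplied by the efficiency ratio. A minimal sketch, using the 10% figure from the example above:

```python
def fraction_reviewed(fraction_to_find: float, efficiency: float) -> float:
    """Share of the collection reviewed = share of the collection you must find
    (richness x recall target) times documents reviewed per relevant document."""
    return fraction_to_find * efficiency

# Need to find ~10% of the collection at a 2:1 Predict efficiency
print(fraction_reviewed(0.10, 2.0))  # 0.2 -> review about 20% of the collection
```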
Your mileage may vary (slightly)
So, why is Predict slightly less efficient in the real world, on average, as compared to the simulation environment? Let me offer another analogy to an automobile. Every new car carries a sticker that reads something like “32 mpg highway/24 mpg city”. Are these real numbers? Of course they are, but they are actually the result of tests performed in the laboratory, where the conditions are perfect.
In real life, your actual gas mileage will likely be lower, say 30 mpg on the highway and 21 in the city. You may be driving into a headwind, sitting in stop-and-go traffic, or running on tires that aren’t perfectly inflated. All of those real-world factors cause your gas mileage, your “efficiency”, to go down.
The same thing happens with Predict in the real world. The requirements of discovery and real-world review workflows introduce some inescapable inefficiencies along the way: reviewing family members for privilege, periodic sampling, and the like all decrease the average efficiency of Predict projects. That’s why we estimate Predict efficiency at 80% recall to be roughly 2:1, rather than the 1.75:1 seen in the simulations.
Maximizing fuel efficiency
Which option would you choose to maximize the fuel efficiency of your next document review? While mileage may vary in driving toward 80% recall, keyword search averages out at an efficiency of about 10:1, while Predict averages out at roughly 2:1. So, unless you know for sure that you will be coasting downhill for the entire review, choosing Predict will likely make your review about five times more efficient.
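To make the five-times difference concrete, here is a back-of-the-envelope comparison in Python. The 1,000,000-document collection size and 10% richness are purely illustrative assumptions; the 10:1 and 2:1 ratios are the averages discussed above:

```python
collection = 1_000_000    # hypothetical collection size (assumption)
richness = 0.10           # assumed share of relevant documents (assumption)
recall_target = 0.80      # the 80% recall target discussed above

# Relevant documents you must actually find to hit the recall target
relevant_to_find = int(collection * richness * recall_target)  # 80,000

keyword_reviewed = relevant_to_find * 10  # ~10:1 keyword efficiency
predict_reviewed = relevant_to_find * 2   # ~2:1 Predict efficiency

print(keyword_reviewed)                     # 800000
print(predict_reviewed)                     # 160000
print(keyword_reviewed / predict_reviewed)  # 5.0 -> about 5x fewer documents
```

On these assumptions, the keyword review touches 800,000 documents while the Predict review touches 160,000, the same one-fifth relationship no matter what collection size or richness you plug in.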
1. What we mean by “efficiency” is how many documents you need to review to find one relevant document. This is expressed as a ratio, such as 10:1 for keyword search versus 2:1 for a CAL-based process. It is really just another way to look at review precision, since an efficiency of 10:1 equates to 10% precision, and an efficiency of 2:1 equates to 50% precision.