Using TAR 2.0 for knowledge generation and protection

Lawyers search for documents for many different reasons. TAR 1.0 systems were primarily used to reduce review costs in outbound productions. As most know, modern TAR 2.0 protocols, which are based on continuous active learning (CAL), can support a wide range of review needs. In our last post, for example, we talked about how TAR 2.0 systems can be used effectively to support investigations.

That isn’t the end of the discussion. There are many ways to use a CAL predictive ranking algorithm to take on other types of document review projects. Here we explore various techniques for implementing a TAR 2.0 review for knowledge generation and protection tasks beyond investigations, including opposing party reviews, depo prep and issue analysis, and privilege QC.

Opposing party productions

Opposing party reviews are essentially knowledge generation tasks. The objective is to weed through a collection to find particularly relevant documents. Recall (i.e., finding all of the relevant documents) is not as critical as precision (i.e., seeing more relevant documents than irrelevant ones among those actually reviewed) and surfacing more hot documents in the process.
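To make the distinction concrete, here is a minimal sketch of the arithmetic behind the two measures. The function name and the review counts are hypothetical, chosen only for illustration.

```python
def precision_recall(reviewed_relevant, reviewed_total, relevant_in_collection):
    """Return (precision, recall) for a review effort.

    precision: share of reviewed documents that turned out to be relevant.
    recall:    share of all relevant documents in the collection that were found.
    """
    precision = reviewed_relevant / reviewed_total if reviewed_total else 0.0
    recall = reviewed_relevant / relevant_in_collection if relevant_in_collection else 0.0
    return precision, recall


# Hypothetical numbers: 1,000 documents reviewed, 600 of them relevant,
# out of an estimated 2,000 relevant documents in the whole production.
precision, recall = precision_recall(600, 1000, 2000)
print(f"precision={precision:.0%}, recall={recall:.0%}")
```

In an opposing party review, a team might be content with the modest recall in this example as long as precision stays high and the hot documents surface early.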

CAL is particularly suited for this task. First, CAL is efficient in the review of sparse collections. And, despite the general responsiveness of opposing party productions, the truly important documents are few and far between. Second, as discussed earlier, CAL is also a superior way to surface hot documents along the way.

There are a few different ways to initiate a CAL review of an opposing party production. With the caveat that the language used by opposing parties will typically differ, client documents may provide a reasonable starting point. Relevant opposing party documents provide even better seeds to initiate a CAL ranking.  Oftentimes, a handful of such documents are available through past communications, or can be found through a modest analytics assessment of the production—and only a handful of positive documents is enough to start a CAL review. Otherwise, a CAL review can be initiated with a single synthetic seed detailing precisely what is being sought from the opposing party production.
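As a rough illustration of the synthetic seed idea, the sketch below ranks a production by similarity to a single written description of what is being sought. It uses plain word counts and cosine similarity as a stand-in; this is an assumption for illustration, not a description of how any particular CAL product scores documents. The document IDs and texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if not norm_a or not norm_b:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_seed(seed_text, documents):
    """Return document IDs ordered from most to least similar to the seed."""
    seed_vec = Counter(tokenize(seed_text))
    scored = [
        (cosine(seed_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties broken by document ID for a stable order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical opposing party production.
production = {
    "OPP-001": "Lunch menu for the quarterly staff picnic",
    "OPP-002": "We should delay the recall notice until after the pricing announcement",
    "OPP-003": "Pricing spreadsheet attached for your review",
}
synthetic_seed = "Communications discussing a delay of the product recall notice"
ranking = rank_by_seed(synthetic_seed, production)
print(ranking)
```

This only produces the starting order. In a CAL review, the ranking would then be refreshed as reviewers code the documents they are served.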

Once the CAL review begins, there is no special workflow needed to effectively review an opposing party production. CAL will elevate relevant documents, including hot documents, and further minimize the number of irrelevant documents that need to be reviewed along the way. And contextual diversity will ensure coverage across the collection.
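The rank, review, retrain cycle described above can be sketched as a simple simulation. The additive token-weight scorer here is a deliberately crude stand-in chosen for brevity, the `oracle` function is a hypothetical placeholder for a human reviewer's coding decision, and contextual diversity is not modeled at all.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cal_review(documents, seed_ids, oracle, batch_size=2):
    """Simulate a continuous active learning loop.

    documents: dict mapping document ID to text
    seed_ids:  IDs already known to be relevant
    oracle:    function(doc_id) -> bool, standing in for the reviewer
    Returns the order in which unreviewed documents were served.
    """
    weights = Counter()  # token -> running relevance weight
    coded = {}           # doc_id -> reviewer decision

    def learn(doc_id, relevant):
        coded[doc_id] = relevant
        for token in tokenize(documents[doc_id]):
            weights[token] += 1 if relevant else -1

    for doc_id in seed_ids:
        learn(doc_id, True)

    review_order = []
    while len(coded) < len(documents):
        unreviewed = [d for d in documents if d not in coded]
        # Re-rank everything not yet reviewed using all decisions so far.
        unreviewed.sort(
            key=lambda d: (-sum(weights[t] for t in tokenize(documents[d])), d)
        )
        for doc_id in unreviewed[:batch_size]:
            learn(doc_id, oracle(doc_id))
            review_order.append(doc_id)
    return review_order


# Hypothetical collection; A is the seed, B and D are also relevant.
docs = {
    "A": "delay the recall notice",
    "B": "recall notice draft for legal",
    "C": "picnic lunch menu",
    "D": "notice of delay in shipment recall",
    "E": "staff picnic schedule",
}
relevant = {"A", "B", "D"}
review_order = cal_review(docs, ["A"], lambda doc_id: doc_id in relevant)
print(review_order)
```

In this toy run the two remaining relevant documents are served in the first batch, and the irrelevant ones are pushed to the end of the review.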

If, at any point, there is a desire to switch gears and truly focus on finding hot documents, it’s easy with Predict. Just spin up a new Predict project ranking on the HotDoc field. Every decision to that point will be used to train the CAL algorithm, and Predict will begin to surface hot documents preferentially over even generally responsive documents.

Depo prep and issue analysis

Preparing for multiple witness depositions and researching multiple issues are both knowledge generation tasks, and they follow a similar workflow. Both tasks often suffer from low richness within the larger responsive collection, which makes CAL particularly useful.

In both cases, the setup follows the same approach. The typical coding approach is to structure the witness list or issue list as a multivalue field to allow reviewers to select more than one value (witness or issue) for each document. To get even more granular in the coding schema, and even further improve the effectiveness of CAL, each witness or issue can be set up as a separate binary (yes-no) field.

Using either structure, creating a separate Predict project for each issue or witness will ensure multiple simultaneous, independent rankings. That way, ranking and review for every witness and issue will be focused. Each review can then be conducted simultaneously by multiple reviewers, or sequentially by a single reviewer.
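The relationship between the two coding structures can be sketched in a few lines: a multivalue witness (or issue) field is equivalent to one yes-no label set per value, and each label set can then train its own independent ranking. The function name, witness names and document IDs below are hypothetical.

```python
def split_multivalue(coding, values):
    """Turn multivalue coding into one yes-no label set per value.

    coding: dict mapping reviewed document ID to the set of values selected
    values: every witness or issue that should get its own ranking
    Returns a dict mapping each value to {doc_id: True/False}.
    """
    return {
        value: {doc_id: value in selected for doc_id, selected in coding.items()}
        for value in values
    }


# Hypothetical reviewer coding on a multivalue "Witness" field.
witnesses = ["Adams", "Baker"]
coding = {
    "DOC-1": {"Adams"},
    "DOC-2": {"Adams", "Baker"},
    "DOC-3": set(),  # reviewed, but pertinent to no witness
}
labels = split_multivalue(coding, witnesses)
print(labels["Baker"])
```

Because every reviewed document contributes a yes or a no to each label set, a decision made while reviewing for one witness also becomes training data for the others.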

This approach will not prevent reviewers from coding the full spectrum of witnesses and issues pertinent to a particular document. Rather, while every reviewer will see documents ranked independently given their specific objective, they will be able to code documents for other issues and witnesses when appropriate. Doing so will correspondingly improve those other rankings.

Privilege and privilege QC

Privilege assessment is a protection task, regardless of whether it is an initial privilege review to locate privileged documents among a group of unreviewed documents slated for production, or a quality control measure to ensure that documents coded as not being privileged are indeed not privileged. In both cases, CAL can be an effective tool in preventing inappropriate production and disclosure.

CAL has its primary utility as an initial privilege review technique when documents are being produced without an eyes-on review, in situations such as second requests and subpoenas. In that case, the goal is to effectively locate and withhold all of the privileged documents, without reviewing the bulk of the collection.

Certainly, in most instances, analytics will be used to isolate obviously privileged communications exchanged with counsel. Any privileged documents discovered in this analytics phase can then be used as seed documents to initiate the CAL ranking for further privilege review.

The effectiveness of a CAL tool during this initial privilege review will then depend largely on the features that are used to inform the CAL algorithm. If email header information (To, From, domains, etc.) is included in the feature set, the CAL algorithm may have the ability to discern the identity of individuals making and breaking privilege and rank documents for review accordingly. Otherwise, the algorithm will be constrained to ranking documents based purely on content text. Text-based ranking is critical to an effective privilege review nevertheless, because privileged communications may be subsequently distributed internally, without any reference to counsel. Assuming the general content of the text has been coded as privileged (presumably in the original communication with counsel), a CAL tool will then elevate similar documents as potentially privileged.
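To illustrate what including header information in the feature set might look like, the sketch below builds a feature set from an email's body text and, optionally, its From and To addresses and domains. The field names, the `from:` and `domain:` prefixes and the example addresses are all assumptions made for illustration; actual tools may represent these features quite differently.

```python
import re


def extract_features(email, include_headers=True):
    """Build a set of features for one email.

    email: dict with a "body" string and optional "from"/"to" address lists
    include_headers: when False, only body-text tokens are returned
    """
    features = set(re.findall(r"[a-z0-9]+", email["body"].lower()))
    if include_headers:
        for field in ("from", "to"):
            for address in email.get(field, []):
                address = address.lower()
                # Prefix header features so they cannot collide with body words.
                features.add(f"{field}:{address}")
                features.add(f"domain:{address.split('@')[-1]}")
    return features


# Hypothetical message from outside counsel.
message = {
    "from": ["counsel@lawfirm.example"],
    "to": ["ceo@client.example"],
    "body": "Please review the attached draft.",
}
with_headers = extract_features(message)
text_only = extract_features(message, include_headers=False)
print(sorted(with_headers - text_only))
```

With headers included, a learner can associate the law firm's domain with privilege. With text only, the same message is indistinguishable from any other request to review a draft.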

Beyond this, the QC algorithms incorporated into a Predict review provide one final defense against privilege disclosure, particularly in a traditional production review. In that situation, every document being produced has been reviewed and coded, inter alia, for privilege. Spinning up a Predict project on privilege, then, will rank the entire collection by the likelihood of each document being privileged. Further, algorithmic QC will rank every document coded as “not privileged” by the likelihood that it is, in fact, privileged. So, the top-ranked documents actually look like they are privileged even though they are coded as “not privileged.” Reviewing the top-ranked documents in this ranking will provide a final measure of assurance that privileged documents are not being produced.
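The QC ranking described above amounts to filtering for documents coded "not privileged" and sorting them by their predicted likelihood of privilege. A minimal sketch, assuming the scores have already been produced by some ranking model; the function name, scores and document IDs are hypothetical:

```python
def privilege_qc_queue(scores, coding, top_n=3):
    """Return the documents most likely to be miscoded as not privileged.

    scores: dict mapping document ID to predicted likelihood of privilege (0 to 1)
    coding: dict mapping document ID to the reviewer's privilege call
    top_n:  how many documents to send back for a second look
    """
    not_privileged = [d for d, call in coding.items() if call == "not privileged"]
    # Highest predicted likelihood first; ties broken by document ID.
    return sorted(not_privileged, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical model scores and reviewer coding.
scores = {"DOC-1": 0.92, "DOC-2": 0.15, "DOC-3": 0.78, "DOC-4": 0.88, "DOC-5": 0.05}
coding = {
    "DOC-1": "privileged",
    "DOC-2": "not privileged",
    "DOC-3": "not privileged",
    "DOC-4": "not privileged",
    "DOC-5": "not privileged",
}
qc_queue = privilege_qc_queue(scores, coding, top_n=2)
print(qc_queue)
```

Here DOC-4 and DOC-3 are coded as not privileged yet score high for privilege, so they are the first candidates for a second look before production.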

Other uses

We have no doubt that people will come up with other use cases for CAL-based predictive ranking. We have written about two-tailed reviews, where teams focus on both ends of the ranked spectrum.  We also believe CAL-like systems will prove useful for other kinds of searches, including government records inquiries and patent research, with the focus being on using good documents rather than assumed keywords to build out better searches against all kinds of documents.

To learn more about TAR 2.0, TAR for Smart People, Third Edition, is available in print and in a downloadable PDF format.


OpenText is the leader in Enterprise Information Management (EIM). Our EIM products enable businesses to grow faster, lower operational costs, and reduce information governance and security risks by improving business insight, impact and process speed.
