Insight Predict—Why It Is Different
Reasons Insight Predict produces superior results include:
Continuous Active Learning (CAL): Early TAR systems employed one-time or limited training. With Insight Predict, training is continuous and ongoing, allowing the engine to constantly get smarter about the documents.
An end to random training documents: CAL doesn’t require that an expert click through thousands of randomly selected documents.
Subject Matter Experts (SMEs) not required: CAL doesn’t require that a SME train the system before review can begin. Review teams can do the training just as effectively, particularly with an expert performing QC steps along the way.
Rolling uploads are not a problem: With CAL, the system integrates new documents as they are received, and actively selects training documents for the reviewers to fill in any gaps.
Excels at low richness collections: CAL excels when there are only a few relevant documents in the collection. Earlier TAR systems had issues with low richness.
Integration: Predict is Catalyst’s TAR system within Insight, and the algorithm is trained any time documents are reviewed. Predict is not separate from the search and review platform. As a result, you don’t have to worry about importing or exporting data into different systems. Review Projects are integrated with Predict.
Permission
Insight Predict must be enabled for your site in order to use it. To manage Predict projects, you must be assigned to a role with the PredictCreateDatabase permission.
Workflow
The first step in any Predict project is training the system to rank the documents properly. We give you flexibility to decide how to train the system. If you have documents that are already reviewed, they will be automatically sent to the system to teach Predict what is relevant and what is not. This allows Predict to rank the remaining documents. If you do not have documents already reviewed, you can start review with any documents you like. As documents are reviewed, Predict is trained.
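To make the idea of continuous training concrete, here is a minimal sketch of a continuous active learning loop. It is not Predict's algorithm: the term-count scoring model, the function names (train, score, cal_review), and the batch size are all assumptions chosen only to illustrate the cycle of ranking, reviewing, and retraining.

```python
from collections import Counter

def train(coded):
    """Toy model (an assumption, not Predict's): +1 per term seen in a
    relevant document, -1 per term seen in a non-relevant one."""
    weights = Counter()
    for text, is_relevant in coded.values():
        for term in set(text.lower().split()):
            weights[term] += 1 if is_relevant else -1
    return weights

def score(text, weights):
    # Unknown terms contribute 0 because Counter returns 0 for missing keys.
    return sum(weights[term] for term in set(text.lower().split()))

def cal_review(collection, reviewer, seed_ids, batch_size=2):
    """collection: dict of doc_id -> text.
    reviewer: callable(text) -> bool, standing in for a human coding decision.
    Returns the order in which documents were reviewed."""
    coded = {d: (collection[d], reviewer(collection[d])) for d in seed_ids}
    order = list(seed_ids)
    while len(coded) < len(collection):
        weights = train(coded)  # retrain on everything coded so far
        unreviewed = [d for d in collection if d not in coded]
        # Rank the remaining documents, highest score first.
        unreviewed.sort(key=lambda d: score(collection[d], weights), reverse=True)
        for d in unreviewed[:batch_size]:
            coded[d] = (collection[d], reviewer(collection[d]))
            order.append(d)
    return order

collection = {
    "d1": "merger agreement draft",
    "d2": "lunch menu friday",
    "d3": "merger pricing memo",
    "d4": "parking notice",
    "d5": "agreement on merger terms",
    "d6": "holiday schedule",
}
order = cal_review(collection, lambda text: "merger" in text, seed_ids=["d1"])
print(order)
```

In this toy run, a single reviewed document is enough to pull the other documents that share its terms to the front of the review queue. Every later coding decision feeds the next ranking in the same way.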
As the review progresses, we provide reports so that you can analyze progress. You are provided with the top-ranked terms, review statistics, custodian ranks, and progress charts. You can make decisions based on the information within these reports.
The QC stage gives you the opportunity to check documents that have potentially false positive or false negative coding. You can check the documents where the system ranking differs from the coded values. In other words, these are documents that the system thinks should have been tagged as positive even though reviewers tagged them as negative, and vice versa.
At this point, we recommend that you have your expert review these documents and either correct coding mistakes or confirm the reviewers' decisions where they differ from the system ranking.
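The QC check described above can be sketched as a simple comparison between a score and a coded value. This is an illustration only: the Doc fields, the 0-to-1 score scale, and the high and low thresholds are assumptions, not Predict's data model or settings.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    score: float          # hypothetical system score in [0, 1]
    coded_relevant: bool  # the reviewer's coding decision

def qc_candidates(docs, high=0.8, low=0.2):
    """Return (possible false negatives, possible false positives),
    each sorted so the strongest disagreement comes first."""
    # High score but coded negative: the reviewer may have missed it.
    false_negatives = [d for d in docs if d.score >= high and not d.coded_relevant]
    # Low score but coded positive: the reviewer may have over-tagged it.
    false_positives = [d for d in docs if d.score <= low and d.coded_relevant]
    false_negatives.sort(key=lambda d: d.score, reverse=True)
    false_positives.sort(key=lambda d: d.score)
    return false_negatives, false_positives

docs = [
    Doc("a", 0.95, False),
    Doc("b", 0.90, True),
    Doc("c", 0.05, True),
    Doc("d", 0.10, False),
    Doc("e", 0.85, False),
]
fn, fp = qc_candidates(docs)
print([d.doc_id for d in fn], [d.doc_id for d in fp])
```

Sorting by the size of the disagreement lets an expert start with the documents where the coding and the ranking are furthest apart.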
As the ranking continues and documents are reviewed, analyze the Progress Chart for stability of the collection. The collection is stable when the ranked list shows little or no change from one ranking to the next. In other words, the average document has not moved up or down the ranked list very much.
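One way to picture stability is as the average number of positions a document moves between two successive rankings. The sketch below computes that figure. It is an assumed measure for illustration only, with a hypothetical function name, and it is not necessarily how the Progress Chart is calculated.

```python
def mean_rank_shift(previous, current):
    """previous, current: lists of document ids, best-ranked first.
    Returns the average number of positions moved by documents
    that appear in both rankings."""
    prev_pos = {doc_id: i for i, doc_id in enumerate(previous)}
    shifts = [
        abs(prev_pos[doc_id] - i)
        for i, doc_id in enumerate(current)
        if doc_id in prev_pos  # skip documents added since the last ranking
    ]
    return sum(shifts) / len(shifts) if shifts else 0.0

before = ["a", "b", "c", "d"]
after = ["b", "a", "c", "d"]
print(mean_rank_shift(before, after))  # 0.5: two documents each moved one place
```

A value that stays near zero across several rankings suggests that additional review is no longer changing the order of the list very much.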
We provide a stage in Predict where you can evaluate the ranked documents. This produces a Yield Curve. By examining the Yield Curve, you can determine if there is a point at which you can discard a portion of the population as non-relevant.
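As a rough illustration of what a yield curve expresses, the sketch below walks down a ranked list of coded documents and records the share of relevant documents found at each review depth. The function names and the sample labels are hypothetical. In practice the labels would typically come from a reviewed sample rather than the full population, and this is not Predict's own calculation.

```python
def yield_curve(ranked_labels):
    """ranked_labels: relevance decisions (True/False), best-ranked first.
    Returns (fraction of documents reviewed, fraction of relevant found) pairs."""
    total_relevant = sum(ranked_labels)
    points, found = [], 0
    for depth, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        recall = found / total_relevant if total_relevant else 0.0
        points.append((depth / len(ranked_labels), recall))
    return points

def cutoff_for_recall(ranked_labels, target=0.8):
    """Smallest fraction of the ranked list that reaches the target recall."""
    for fraction_reviewed, recall in yield_curve(ranked_labels):
        if recall >= target:
            return fraction_reviewed
    return 1.0

labels = [True, True, False, True, False, False, False, True, False, False]
print(cutoff_for_recall(labels, target=0.75))  # 0.4
```

In this example, reviewing the top 40 percent of the ranked list finds three of the four relevant documents. The remaining 60 percent is the portion you would weigh discarding as non-relevant.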