LexisNexis® Legal Newsroom

July 2012


Predictive Coding: A Classic Battle Of Man Versus Machine?

American folk hero John Henry was what was known as a steel-driver. It was not an occupation for the meek: a steel-driver hammered and chiseled away at giant mountains of rock to carve out railroad tunnels. Carve out what you don't need, but leave the mountain.

 

With the advent of the steam-powered hammer came the question: Which is more effective--man or machine? The good news for John Henry was that he beat the machine in a contest. The bad news was that as soon as he won, his hammer still clutched in his aching grip, John Henry dropped dead.

 

In the case of Eli Whitney's famous engine-powered cotton separator, nicknamed the cotton gin ("gin" was short for engine, and had nothing to do with the essential ingredient of the classic Martini), the task was to separate the cotton fibers used to make fabric from the cottonseeds, which had other uses, like growing more cotton. Take what you need to make fabric, but leave the seeds.

 

In the 1970s, man harnessed machine to revolutionize legal research with the development of what is now a collection of data search solutions run by (ahem) LexisNexis.

 

In all of these cases, efficiency was a key driver of innovation: a better result at lower cost. We will address cost issues in another article in this report. But here we want to address man versus machine. Which is better?

 

Sadly, for those of us who like clear winners and losers, the answer is not that sexy: it's all about balance. The fact is that somebody had to point that steam-powered steel-driver in the right direction and somebody had to get the fiber-rich cottonseed pods to the gin.

 

How Much Can Man Do?

How many documents can a human (in this case an attorney) review effectively? In a recent study published by the Rand Corporation, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, some services claim they can review 100 documents an hour; that works out to an average of 36 seconds per document for decisions on relevance and privilege. But can we do better? Rand says probably not: it is unrealistic to expect human reviewers to get much faster. But grouping documents surely could make us go faster, right? Rand says that, "given the physical limitations of reading and comprehension," the answer is still no.

 

A common way of grouping documents is the use of keyword searching. This is great when you know what you're looking for, but discovery involves looking for what you don't know you're looking for.

 

Go Fish

U.S. Magistrate Judge Andrew J. Peck, who--with his decision in Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012) firmly established himself in the e-discovery vanguard--wrote that the way lawyers choose keywords is a lot like playing Go Fish. "The requesting party guesses which keywords might produce evidence to support its case without having much, if any, knowledge of the responding party's 'cards' (i.e., the terminology used by the responding party's custodians). Indeed, the responding party's counsel often does not know what is in its own client's 'cards.'" 

 

Judge Peck said another problem with keywords is that they do not narrow the field of documents enough, explaining that this type of search can be "over-inclusive," finding responsive documents but also sweeping in too many irrelevant ones.

 

Judge Peck pointed to a 1985 study in which scholars David Blair and M. Maron collected 40,000 documents from a Bay Area Rapid Transit accident. They instructed experienced attorney and paralegal searchers to use keywords and other review techniques to retrieve at least 75 percent of the documents relevant to 51 document requests.  "Searchers believed they met the goals, but their average recall was just 20 percent.  This result has been replicated in the TREC Legal Track studies over the past few years," Judge Peck wrote.

 

It Can Be Taught

In commenting on Judge Peck's ruling in an article they posted on martindale.com®, Proskauer Rose attorneys Mark W. Batten, Enzo Der Boghossian and Steven D. Hurd said that "When the system's predictions and reviewer's coding become sufficiently consistent, the system has learned enough to make confident predictions as to the remaining documents. Some systems produce a simple yes/no as to relevance, while others give a relevance score (e.g., on a 0 to 100 basis) that can be used to prioritize review."
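
To make the mechanics concrete, here is a minimal sketch, in Python, of the kind of workflow the Proskauer attorneys describe: a classifier is trained on attorney-coded seed documents and then assigns every unreviewed document a 0-to-100 relevance score that can be used to prioritize review. The library choice (scikit-learn), the features and every name below are illustrative assumptions, not a description of any particular vendor's product.

```python
# Illustrative sketch only: train a text classifier on attorney-coded seed
# documents, then score the rest of the collection on a 0-100 relevance scale.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def score_collection(seed_texts, seed_labels, unreviewed_texts):
    """Return 0-100 relevance scores for the unreviewed documents, plus an
    ordering that puts the highest-scoring (most likely relevant) ones first."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(seed_texts, seed_labels)               # labels: 1 = relevant, 0 = not
    probabilities = model.predict_proba(unreviewed_texts)[:, 1]
    scores = [round(100 * p) for p in probabilities]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return scores, order
```

A yes/no system would simply apply a cutoff to these scores; a scoring system leaves the numbers visible so counsel can decide where to draw the line.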

 

The example given by Judge Peck, they wrote, was that a score above 50 may produce 97 percent of the relevant documents, but constitute only 20 percent of the entire document set. Additionally, counsel may decide that documents below a certain score are so unlikely to be relevant that no human review is necessary.  In Da Silva Moore the parties agreed to use a 95 percent confidence level to create a random sample of the entire email collection.  The sample of 2,399 documents was to be reviewed to determine relevant documents for a "seed set" to use to train the predictive coding software, the Proskauer attorneys explained.
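
For readers who wonder where a number like 2,399 comes from, the usual way to size a random sample at a 95 percent confidence level is the standard proportion formula n = z^2 p(1 - p) / e^2. The plus-or-minus 2 percent margin of error and worst-case p = 0.5 below are our assumptions for illustration (the article above states only the confidence level), but the arithmetic lands in the same neighborhood:

```python
# Back-of-the-envelope sample-size check. Assumptions: 95% confidence level,
# +/- 2% margin of error, worst-case proportion p = 0.5. Only the 95% figure
# comes from the article; the rest is illustrative.
import math

z = 1.96   # z-score for a 95% confidence level
p = 0.5    # most conservative assumed proportion of relevant documents
e = 0.02   # assumed +/- 2% margin of error

n = (z ** 2) * p * (1 - p) / (e ** 2)
print(math.ceil(n))   # 2401, in the neighborhood of the 2,399 documents sampled
```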

 

Reasonable Minds Disagree

Another limitation on human reviewers is--and this will come as no surprise to anyone reading this article--that not all lawyers will agree on which documents are relevant or responsive to discovery requests.

 

In fact, "some rigorous studies ... found that human reviewers often disagree with one another when they review the same set of documents for relevance and responsiveness in large-scale reviews," the Rand authors wrote in their report. They cite a study in which seven review teams each assessed the same 28,000 documents, put into 12,000 topical groups, or families, to see whether those families were responsive. The teams "differed significantly" on how many documents were relevant: one team found 23 percent of the documents relevant, while another concluded that more than twice as many, 54 percent, were relevant.

 

Rand says predictive coding can meet the challenge of over-production and attorney-machine disagreement. "Such machine-learning techniques continually refine the computer's classifications with input from users ... until the ambiguous ratings disappear." The iterative process, in which attorneys review the software's decisions and the system continues to "learn" after a number of rounds of review, will minimize the level of disagreement between the software and the human reviewers.
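
As a rough illustration of that loop, and only as an illustration, the sketch below reuses the hypothetical score_collection function from the earlier example: each round retrains on everything the attorneys have coded so far, hands the next batch of high-scoring documents back for human review, and folds those decisions into the training set. The structure is a generic active-learning pattern built on our own assumptions, not Rand's protocol or any product's actual behavior.

```python
# Generic "train, review, retrain" loop; all names and structure are
# illustrative assumptions. attorney_review stands in for the human reviewers.

def iterative_review(coded_texts, coded_labels, uncoded, attorney_review,
                     rounds=5, batch_size=100):
    for _ in range(rounds):
        scores, order = score_collection(coded_texts, coded_labels, uncoded)
        batch_idx = order[:batch_size]                # highest-scoring uncoded docs
        batch_docs = [uncoded[i] for i in batch_idx]
        new_labels = attorney_review(batch_docs)      # human judgment goes here
        coded_texts = coded_texts + batch_docs        # fold the decisions back in
        coded_labels = coded_labels + new_labels
        keep = set(range(len(uncoded))) - set(batch_idx)
        uncoded = [uncoded[i] for i in sorted(keep)]
    return coded_texts, coded_labels, uncoded
```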

 

Rand was quick to point out that "attorney review is still very much in play with predictive coding, but generally only for the smaller subset of documents that the application has judged to be potentially relevant, responsive, or privileged."

 

Ask Me Again Later

Concerns will and should continue to keep the pressure on predictive coding systems to improve. Can the issue of under-production vs. over-production be resolved? Can these systems identify privileged, confidential or classic "smoking gun" documents? Can they handle highly technical documents? And what about their appropriateness for small document sets?

 

Judge Peck said, "These types of questions are better decided 'down the road,' when real information is available to the parties and the court."

 

Nobody Is Perfect

While Judge Peck has come out in favor of predictive coding, and apparently is the first judge to do so, he acknowledges its limits. "The Court recognizes that computer-assisted review is not a magic, Staples®-Easy-Button solution appropriate for all cases. The technology exists and should be used where appropriate, but it is not a case of machine replacing humans: it is the process used and the interaction of man and machine that the courts need to examine.

 

"The objective of review in e-discovery is to identify as many relevant documents as possible, while reviewing as few non-relevant documents as possible. Recall is the fraction of relevant documents identified during a review; precision is the fraction of identified documents that are relevant. Thus, recall is a measure of completeness, while precision is a measure of accuracy or correctness. The goal is for the review method to result in higher recall and higher precision than another review method, at a cost proportionate to the 'value' of the case."

 

In affirming Judge Peck's decision, U.S. District Judge Andrew L. Carter Jr. said "there is simply no review tool that guarantees perfection." He included human review in that category. "[E]ven if all parties here were willing to entertain the notion of manually reviewing documents, such review is prone to human error and marred with inconsistencies from the various attorneys' determination of whether a document is responsive."

 

You Can Always Object

Judge Carter also echoed Judge Peck's point that if the software is flawed or if a party is not receiving the types of documents that should be produced, the parties are always allowed to reconsider their methods and raise their concerns with the court.

 

Judge Peck noted that litigants seemed to be looking for a judge somewhere to come out in favor of predictive coding, which he did "where appropriate." And the Rand Corporation says the use and advancement of predictive coding is going to require "innovative, public-spirited litigants to take bold steps by using this technology for large-scale e-discovery efforts and to proclaim its use in [an] open and transparent manner."

 

Surely early cotton-separating machines failed by not cleanly separating the cotton fibers from the cottonseeds, or by smashing valuable cottonseeds in the process. Likewise, the first steam-powered steel-drivers surely didn't chip away enough rock quickly enough. Predictive coding is one of those inventions that will continue to improve over time, allowing human reviewers to spend less time on lower-level tasks and more time on higher-level functions.

 

martindale.com is a registered trademark of Reed Elsevier Properties, Inc., used under license. 

 

For more information about LexisNexis products and solutions, connect with us through our corporate site.