On Point: The power and possibilities of the emerging field of predictive coding

Mark St. Peter, Computing Source

One of the unavoidable realities of 21st-century litigation is that the nature of documentary evidence is changing: in an increasingly digital world, the term “paperwork” is already becoming an anachronism. The volume of electronic data is increasing rapidly, which in turn drives the costs of document collection and review in litigation higher and higher.

For attorneys and legal professionals, one of the biggest challenges in recent years has been to find ways to navigate this virtual world and effectively salvage, source, secure, store and ultimately use this paperwork and documentation. Of those challenges, separating the potentially significant wheat from the irrelevant chaff is arguably the most important. While overcoming the legal and logistical hurdles inherent to storing and using massive amounts of information is no small task, finding the right information at the right moment in a case is absolutely essential. Experienced litigators understand this reality, and appreciate that using the most sophisticated document management techniques is a critically important piece of the litigation toolkit.

In recent years, traditional methods of narrowing document collections (for instance, keyword searches) have been giving way to methods that more closely resemble a kind of artificial intelligence. Arguably the most effective (and certainly the fastest-growing) new method is predictive coding: a computer-assisted document review process that rapidly sifts through documents and data to identify specific concepts drawn from sample documents. Predictive coding combines familiar techniques such as keyword searches with proven human expertise to generate a flexible and powerful search mechanism. The more that these kinds of systems “learn” about document relevance, the faster and more accurately they can identify responsive informational “hits” and rule out non-responsive or irrelevant documents.

Predictive coding selects and suggests specific documents based on a myriad of preset rules, patterns and conditions, and those documents are then “accepted” or “rejected” by human review personnel. The resulting feedback loop gives the computer an increasingly detailed set of indications and parameters for how to handle similarly constructed documents. The more feedback the system gets, the more accurate the results ultimately become. One of the key advantages of predictive coding is its ability to accommodate a range of variables (document type, language, content, party, timeframe and conceptual meaning) in order to categorize and prioritize documents. Predictive coding programs can also rank documents according to their likely relevance or responsiveness, increasing legal professionals' chances for a targeted, timely and efficient review and production.
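The accept/reject feedback loop and the relevance ranking described above can be illustrated with a deliberately simplified sketch. It is not how any particular predictive coding product works: the `RelevanceRanker` class, its word-weight scoring and the sample documents are all hypothetical, and commercial tools rely on far richer models. The sketch only shows the general idea that each reviewer decision adjusts what the system considers relevant, and that unreviewed documents can then be ranked by likely relevance.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class RelevanceRanker:
    """Hypothetical toy model: one numeric weight per word."""

    def __init__(self):
        self.weights = defaultdict(float)

    def record_decision(self, text, accepted):
        """Feed one reviewer decision (accept or reject) back into the model."""
        delta = 1.0 if accepted else -1.0
        for token in set(tokenize(text)):
            self.weights[token] += delta

    def score(self, text):
        """Sum the learned weights of the distinct words in a document."""
        return sum(self.weights.get(token, 0.0) for token in set(tokenize(text)))

    def rank(self, documents):
        """Return documents ordered from most to least likely relevant."""
        return sorted(documents, key=self.score, reverse=True)


# Illustrative reviewer feedback (made-up documents).
ranker = RelevanceRanker()
ranker.record_decision("Pricing agreement with distributor for Q3 rebates", accepted=True)
ranker.record_decision("Office holiday party lunch menu", accepted=False)

# Unreviewed documents are then prioritized for human review.
unreviewed = [
    "Lunch menu for Friday",
    "Parking garage notice",
    "Draft rebates schedule for distributor",
]
ranked = ranker.rank(unreviewed)
for doc in ranked:
    print(f"{ranker.score(doc):+.1f}  {doc}")
```

In this toy version, every additional decision shifts the word weights, which is a crude stand-in for the way more reviewer feedback is meant to sharpen the system's results.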

The power and flexibility of predictive coding are especially valuable at a time when important (and potentially legally significant) information is not only available in unprecedented volume, but is also stored in many different formats (word processing documents, spreadsheets, presentations, emails, text messages and social media) and is located in more and more places (computers, laptops, cell phones, corporate servers, cloud storage and many more). Social networking content alone encompasses a daunting array of data, from online profiles and postings to messages, photos, videos and more. When it comes to sifting through such an abundance of digital data, well-established search methods like keyword searching have some evident drawbacks. For example, attorneys are usually not familiar with all of the acronyms, abbreviations and special terminology involved in a corporate activity, which means that many keyword searches can easily miss responsive documents or pieces of information. Keyword searching is, by contrast, far “better” at identifying irrelevant or unusable documents, such as privileged or confidential documents, because of the inherent rigidity of keyword terms. For instance, using keyword searching to locate attorney names is an excellent and targeted way to locate potentially privileged documents quickly.
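The attorney-name example can be sketched in a few lines. This is an illustrative assumption rather than a description of any review platform: the function names, document identifiers and attorney names below are invented, and a real privilege screen would still need human review of every flagged document.

```python
import re


def build_name_pattern(names):
    """Build one case-insensitive pattern that matches any listed name as a whole phrase."""
    escaped = [re.escape(name) for name in names]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_potentially_privileged(documents, attorney_names):
    """Return the IDs of documents whose text mentions any of the attorney names."""
    pattern = build_name_pattern(attorney_names)
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]


# Made-up documents and attorney names, for illustration only.
documents = {
    "DOC-001": "Email from Jane Roe re: settlement strategy",
    "DOC-002": "Quarterly sales spreadsheet",
    "DOC-003": "Forwarding advice from J. Doe, outside counsel",
}
flagged = flag_potentially_privileged(documents, ["Jane Roe", "J. Doe"])
print(flagged)
```

The same rigidity that makes this screen fast and targeted is what causes keyword searches to miss documents that use an unfamiliar acronym or abbreviation: the pattern only finds the exact terms it was given.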

One notable drawback of predictive coding techniques is that they are best suited to environments with extremely large amounts of electronically stored information (ESI), which many cases still do not involve. Because of this, the most effective results can sometimes be achieved by combining multiple search methods, a strategy that should increase the responsiveness of documents selected for review while also minimizing false positives. The value of working with ESI specialists, computer forensics professionals and digital evidence experts cannot be overstated. From specialized techniques and industry-specific insights to the ability to work with different systems and to determine the best case- and system-specific searching and digital data retrieval techniques, these professionals are an increasingly important link in the litigation chain.

The electronic discovery and document review phase has become, by far, the most expensive component of a litigated case, especially since the Federal Rules were amended to address ESI. Using the most efficient search and data management strategies possible is the best way to keep those costs to a minimum while successfully finding digital needles in a virtual haystack. Emerging methods like predictive coding promise to be the tools of the very near future that can help attorneys (and their clients) achieve those goals.

—————

Southfield, Mich.-based Computing Source specializes in electronic discovery, computer forensics, expert testimony, paper, media and traditional discovery and document review. For more information, please visit www.computingsource.com.