

Advanced Practices in Data Minimization

By Hemanth Salem and James Ramsey

For several years, general counsel, defense attorneys and the courts have been struggling with the ever-increasing volume of electronic data and the overwhelming associated costs. To alleviate this burden, Encore developed a series of data minimization methodologies to reduce the number of documents counsel would need to review, thereby lowering the cost and time of production.

Data minimization strategies and solutions need to be transparent, defensible and scalable. These characteristics represent the best defense against allegations of spoliation and/or failure to produce. They also support valid arguments against the waiver of attorney-client privilege or work product.

    • A transparent process is documented and comprehensible.

    • A defensible process is reasonably designed and conducted to find and produce relevant material and to identify privileged or work product material.

    • A scalable process is capable of processing large amounts of data within the court-established discovery deadlines.

Background: Although courts are sympathetic to the burdens imposed by the discovery of electronically stored information (ESI), most litigants cannot afford to review copies of every email and electronic file on their computers. Severe sanctions for spoliation, failure to produce or, in some matters, outright deception are usually the result of a sanctioned party’s substantial disregard for its preservation or production obligations. Getting a better handle on the amount of data to be processed and reviewed, while reducing it significantly, allows counsel to be better prepared throughout the course of discovery. The following proactive and defensible points should be considered.

An Experienced e-Discovery Expert Should:

  • Guide counsel from the very outset of a case to narrow the discovery scope and help prepare for Rule 26(f) meet-and-confer conferences.

  • Assist in determining which custodians should have all of their data forensically copied and which should have their records gathered on a more selective basis. To guard against spoliation allegations, and avoid subsequent collection efforts if the scope of the case were to change, early efforts will typically err on the side of over-inclusiveness. 

  • Utilize proven data minimization methodologies during the e-discovery processing phase to reduce the amount of information to be reviewed, with the assurance that all precautions have been taken to meet the necessary production requirements.

  • Consult and train counsel on the latest web reviewing solutions, taking advantage of advanced culling and review techniques.

Proven Data Minimization Methodologies:  Utilizing data minimization strategies and solutions lowers e-discovery processing costs by reducing the volume of data to be examined during each phase of the discovery process. In large, document-intensive cases, a 40% to 80% data reduction can be achieved, allowing counsel to receive data in a fraction of the time traditionally required. Furthermore, this process makes collected data available for pre-case assessment and analysis virtually as soon as it has been gathered. 

Collections and Forensics: If proper planning and precautions are not taken, the decisions made during this critical phase of the case can adversely affect data minimization efforts. Working with a data collection and forensics expert will reduce the likelihood of sanctions for spoliation and/or failure to produce. It will also greatly reduce the processing and review costs that result from collecting excessive amounts of information.

  • Strategic, Targeted Data Collection: Forensic experts work with attorneys to understand what information is relevant to the matter and then seek to collect that information in a targeted manner.  Most providers and legal professionals see preservation and collection as one integrated step. Since the obligation to preserve is usually very broad, they end up collecting and processing excessive amounts of non-relevant data.  It is very important to select a forensic expert who takes a precise approach in which data is preserved broadly, but only what is needed for processing is collected.  This results in substantial time and cost savings.

  • Exclusion of System and Non-Relevant File Types as Part of the Forensic Extraction Process: Many providers process all files found on disk images, including computer system or operating files, and the numerous files needed to run programs such as Microsoft Word, Excel and Outlook. Even though these files are usually removed during the process, many clients end up paying for the initial processing of large amounts of non-relevant information. We recommend excluding these files, in a defensible manner, before they are processed. This option is highly cost-effective and time-efficient, and can be applied in the vast majority of cases. Consult an expert to help determine the best options to fit your case-specific needs.

De-NISTing: Computer users may attempt to hide data by renaming files to resemble known software programs, or by placing them in system or program directories. For this reason, it is important to compare the size and hash value of all processed ESI against the reference lists published by the National Institute of Standards and Technology (NIST). These lists contain information about known commercial software programs and their individual files, providing a secure, defensible way to exclude software data while catching disguised files. This process, which is called de-NISTing, can reduce the need to forensically copy drives. Electronic discovery providers should possess tools to automate the de-NISTing process.
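
The comparison described above can be illustrated with a minimal Python sketch. It is not Encore's tooling: the function names, the in-memory `known_software` set and the sample files are all hypothetical, and a real matter would load the size and hash pairs from the NIST reference lists rather than build them by hand. The sketch only shows the principle that a file is excluded when both its size and its hash match a known software file, so a document renamed to look like a program is retained for review.

```python
import hashlib


def fingerprint(data: bytes) -> tuple:
    """Return the (size, SHA-1 hex digest) pair used for comparison."""
    return (len(data), hashlib.sha1(data).hexdigest())


def de_nist(files: dict, known_software: set) -> tuple:
    """Split files into (excluded, retained) lists of file names.

    files maps a file name to its bytes (hypothetical input shape).
    known_software is a set of (size, sha1) pairs for known program files.
    The file name is deliberately ignored: only content decides.
    """
    excluded, retained = [], []
    for name, data in files.items():
        if fingerprint(data) in known_software:
            excluded.append(name)
        else:
            retained.append(name)
    return excluded, retained


# Hypothetical reference entry standing in for a NIST-listed program file.
known_software = {fingerprint(b"genuine word processor binary")}

collected = {
    "winword.exe": b"genuine word processor binary",
    "excel.exe": b"secret ledger renamed to look like a program",
    "memo.doc": b"quarterly memo",
}

excluded, retained = de_nist(collected, known_software)
print(excluded)  # ['winword.exe']
print(retained)  # ['excel.exe', 'memo.doc']
```

Because the file name plays no part in the match, the renamed ledger stays in the review population even though it carries a program-style name.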

Data Minimization Processing Techniques: Once electronically stored information (ESI) is collected, a variety of technologies and processes can be applied to reduce data volume and provide counsel with a reasonable data population for early case assessment. This can include tools for counsel to further cull down data in a legally defensible way. 

  • Filter by Date Range, Keywords and Document Types: ESI can be reduced greatly by filtering documents to the exact criteria dictated by counsel’s request for production. This can be accomplished by retrieving only those documents that fall within the date range, match the specified file types and contain the keywords agreed upon by all parties. This is an efficient way of culling down the document population prior to removing duplicates and applying advanced culling techniques.

  • Keyword filtering has been around for many years. However, keywords are usually negotiated and selected without prior knowledge of how well they will work within the search.  It is very common to have some keywords return large volumes of non-relevant data (false positives), while others inadvertently leave out relevant documents (false negatives). False positives result in unnecessary expenses linked to the processing and review of non-relevant documents.  False negatives can lead to more drastic consequences, including missing documents critical to defending the case, sanctions, etc.

  • The courts have made it clear that it is incumbent upon counsel to understand and be able to defend the search methods used, including the selection of keywords. A more robust method is to select appropriate keywords interactively, testing them against the document population. Concept-based technologies can also suggest keywords attorneys may not have considered.

  • Deduping and Reduping: Duplicate computer files can be identified with an extremely high degree of confidence. Removing the duplicates saves large amounts of review time by presenting only one copy of each unique email and loose file for review. Simply deduping electronic files within the records of individual custodians can reduce the volume of data to be processed by 20% to 30% or more. Deduping across a case or project can go further, reducing volume by 50% to 75%. The duplicates can later be reinserted by sophisticated tracking programs in order to meet the client’s obligation to produce one copy for each custodian who had a copy. This reduping can take place after the review.
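
The deduping and reduping cycle can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any vendor's product: the record layout (custodian, document ID, content bytes), the function names and the sample data are hypothetical, and the sketch hashes whole file contents, whereas email deduplication in practice usually hashes a chosen set of message fields.

```python
import hashlib
from collections import defaultdict


def dedupe(records):
    """Collapse identical content across all custodians.

    records is an iterable of (custodian, doc_id, content_bytes).
    Returns (unique, holders):
      unique  maps content hash -> doc_id of the first copy seen
      holders maps content hash -> sorted list of custodians with a copy
    """
    unique = {}
    holders = defaultdict(set)
    for custodian, doc_id, content in records:
        digest = hashlib.sha256(content).hexdigest()
        unique.setdefault(digest, doc_id)  # keep only the first copy for review
        holders[digest].add(custodian)     # remember everyone who had it
    return unique, {digest: sorted(names) for digest, names in holders.items()}


def redupe(unique, holders, responsive_ids):
    """After review, expand each responsive document to one copy per custodian."""
    production = []
    for digest, doc_id in unique.items():
        if doc_id in responsive_ids:
            production.extend((custodian, doc_id) for custodian in holders[digest])
    return production


records = [
    ("alice", "A-1", b"budget forecast"),
    ("bob", "B-1", b"budget forecast"),
    ("bob", "B-2", b"lunch order"),
    ("carol", "C-1", b"budget forecast"),
]

unique, holders = dedupe(records)
print(len(records), "collected,", len(unique), "to review")  # 4 collected, 2 to review
print(redupe(unique, holders, {"A-1"}))
# [('alice', 'A-1'), ('bob', 'A-1'), ('carol', 'A-1')]
```

Tracking the holders at dedupe time is what makes reduping possible later: the reviewer sees each document once, and the production step restores a copy for every custodian who held it.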

Advanced Data Culling and Early Case Assessment: Once records are de-NISTed, filtered and deduped, advanced culling techniques can be applied to the greatly reduced data set, providing further data minimization. Tools such as Clearwell can be utilized to index and work with native files (including NSF files) without the expense of processing the entire population through more costly e-discovery tools. This offers users the ability to view data sooner, with visibility into the critical facts of the matter, resulting in a detailed early case assessment. The analytic capability of our technologies enables counsel to safely make individual decisions that impact many, sometimes thousands of, records at a time, without a reviewer having to examine each of the impacted records.

For example:

  • Early Case Assessment: Processing of ESI is still fairly expensive and time-consuming. Most of the techniques described above can be used to cull documents before the processing step. This reduces processing cost significantly, while making culled information available to attorneys much sooner.

  • Top-Level Examination of Email Domains: Reviewers can get a top-level view of email senders’ domains in a collection and readily determine which records are non-responsive and irrelevant, e.g. emails from Amazon.com, WSJ.com, Ticketmaster.com, etc. These records can be easily eliminated for all custodians from subsequent reviews. Of equal importance is the ability to identify all email addresses associated with a particular law firm and to tag or flag such emails as presumptively privileged, subject to confirmation in the final review phase.

  • Email Threading Context: Most emails are part of lengthy threads and often sent to multiple recipients. Recipients then reply to, or forward, their copy of the email. Before long, the email thread contains numerous emails related to the same topic of discussion, a copy of each being stored in each recipient’s mailbox.

  • Grouping unified email threads from various custodians, which can be viewed in the context of the entire thread or conversation, enables users to more quickly assess the relevance of all the objects in the thread. This process speeds the review and ensures consistent decision making. These email threads can then be reviewed and disposed of in a single operation. While this helps speed up review for production, it is even more beneficial to attorneys looking for evidence related to the case. By cutting through the clutter and putting information in context, attorneys can often bypass the traditional first review aimed at identifying ‘hot’ documents. With email thread sizes often averaging more than four emails per thread, the advantages of working with threads versus individual emails are obvious.

  • Eliminating Non-Relevant, Junk Emails and Files: A significant portion of emails and documents found on users’ computers can be easily recognized as non-relevant, based on the source of the email, its filing location, etc.   Powerful tools are available to rapidly eliminate non-relevant and junk data from final review and production, filtering emails by domain name, filing location, topic, etc.
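
The domain overview, junk elimination and thread grouping described in the points above can be sketched as follows. This is a simplified illustration, not how Clearwell or any other product works internally: the message dictionaries with `id`, `from` and `in_reply_to` keys, the function names and the example domains are hypothetical, and the thread grouping simply follows reply links back to the earliest message present in the collection.

```python
from collections import Counter, defaultdict
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Extract the lower-cased domain from a From header value."""
    address = parseaddr(from_header)[1]
    return address.rpartition("@")[2].lower()


def domain_counts(messages):
    """Top-level view: how many messages came from each sender domain."""
    return Counter(sender_domain(m["from"]) for m in messages)


def triage(messages, junk_domains, law_firm_domains):
    """Drop junk domains and flag law-firm senders for privilege review."""
    kept = []
    for m in messages:
        domain = sender_domain(m["from"])
        if domain in junk_domains:
            continue
        tag = "presumptively privileged" if domain in law_firm_domains else "review"
        kept.append((m["id"], tag))
    return kept


def group_threads(messages):
    """Group message ids under the earliest ancestor found in the collection."""
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root(msg_id):
        seen = set()  # guards against malformed, circular reply chains
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root(m["id"])].append(m["id"])
    return dict(threads)


messages = [
    {"id": "m1", "from": "Jane Doe <jane@client.example>"},
    {"id": "m2", "from": "orders@amazon.com"},
    {"id": "m3", "from": "Pat Lee <pat@lawfirm.example>", "in_reply_to": "m1"},
    {"id": "m4", "from": "jane@CLIENT.example", "in_reply_to": "m3"},
]

print(domain_counts(messages).most_common(1))  # [('client.example', 2)]
print(triage(messages, {"amazon.com"}, {"lawfirm.example"}))
# [('m1', 'review'), ('m3', 'presumptively privileged'), ('m4', 'review')]
print(group_threads(messages))  # {'m1': ['m1', 'm3', 'm4'], 'm2': ['m2']}
```

A single decision on the junk domain or on the thread rooted at `m1` then applies to every message grouped under it, which is the source of the review savings described above.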

Final Review: The reduced data set is typically placed in a separate review platform, optimized for document-specific tagging, redaction and creation. This is another area where a highly trained consultant can advise you on the best web hosting solution to meet your needs. Several choices are available, including Concordance FYI, iCONECT and Relativity, to name a few.  The functionality and performance of these web hosting solutions can be further enhanced with advanced concept-searching and record-grouping technologies, allowing greater minimization of the number of individual records to be examined.

  • Concept Clustering and Near Duplicates: In addition to containing exact duplicates of documents and emails, a significant volume of files often consists of slightly different versions of already existing documents: multiple drafts of the same document, for instance, or a Word document and its corresponding PDF version.

Technologies such as Syngence, Content Analyst and Equivio work in conjunction with a variety of review tools to manage near duplicates efficiently. Reviewers can use these concept clustering tools to search and examine records containing similar or related concepts. This can be invaluable when looking for similar documents that may be privileged or highly relevant. 

Grouping near-duplicates and reviewing them together reduces attorney review workload and increases accuracy. As an added benefit, this approach ensures that similar documents are treated consistently and can be safely included or excluded from further consideration. The benefits achieved from the use of data minimization techniques can be quite substantial.
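
To make the idea of near-duplicate grouping concrete, the following Python sketch compares documents by their overlapping three-word sequences (shingles) and groups those whose Jaccard similarity meets a threshold. This is a deliberately simple, swapped-in technique for illustration; it is not the method used by Syngence, Content Analyst or Equivio, and the function names, the 0.5 threshold and the sample texts are assumptions.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs: dict, threshold: float = 0.5) -> list:
    """Greedy grouping: each document joins the first group whose
    representative (its first member) is similar enough, else starts a group."""
    groups = []  # list of (representative shingles, member ids)
    for doc_id, text in docs.items():
        signature = shingles(text)
        for representative, members in groups:
            if jaccard(signature, representative) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((signature, [doc_id]))
    return [members for _, members in groups]


docs = {
    "draft1": "the parties agree to the terms set out in this supply agreement",
    "draft2": "the parties agree to the terms set out in this revised supply agreement",
    "memo": "lunch is at noon on friday in the main cafeteria",
}

print(group_near_duplicates(docs))  # [['draft1', 'draft2'], ['memo']]
```

Here the two drafts share 8 of 13 distinct shingles (a similarity of about 0.62), so they are presented together, while the unrelated memo forms its own group. Comparing every document with every group representative does not scale to large collections, which is one reason commercial tools rely on more sophisticated indexing.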

Call to Action:  Our e-discovery Volume Reduction Calculator is available at www.EncoreDiscovery.com/EVRC. This tool can help you determine how data minimization can reduce your discovery costs and shorten your electronic discovery processing time. Should you have any questions, please contact Encore Discovery Solutions at Marketing@EncoreDiscovery.com.

Hemanth Salem
Vice President of Professional Services
Hemanth Salem joined Encore Discovery Solutions as Vice President of Professional Services in January 2007.  He is one of the nation’s leading experts on electronic discovery and related services and consults with law firms and corporate law departments on litigation readiness, data minimization and electronic discovery across the United States. Mr. Salem oversees a team of professional consultants dedicated to serving the growing electronic discovery and Web repository needs of Encore’s clients.  He has more than 15 years of experience as a systems analyst, software engineer and litigation consultant.  Most recently, Mr. Salem was Vice President of Technology Operations at Esquire Litigation Solutions. He previously served as an Engineering Manager at Merrill Corporation and Trion Technology, Inc.  Mr. Salem holds an undergraduate degree from the International Institute of Technology in India and a Master of Science degree from Case Western Reserve University in Cleveland.

James Ramsey
Vice President, eDiscovery Solutions
In leading the global electronic discovery practice for Encore Litigation Solutions, Ramsey works with corporate legal departments and their outside counsel to develop and implement litigation readiness plans and to coordinate and facilitate large, complex, evidence-intensive cases.  For the past 30 years, Ramsey has been involved in virtually every aspect of the litigation process and has worked with a broad array of legal technologies designed to support complex litigation.  Most recently, he was National Accounts Manager for the electronic discovery and litigation solution practice of Esquire Litigation Solutions, where he worked with Fortune 500 corporate law departments and AmLaw 100 law firms.  Prior to Esquire, he served as Vice President of National Sales for SealedMedia, a digital rights management solution provider, Chief Executive Officer of Evoke, an interactive legal continuing education publisher, Senior Vice President of electronic publishing for Research Institute of America (a Thomson West company), and worked for Matthew Bender Company in releasing the first CD-ROM legal research products to the industry.  Ramsey frequently lectures on best practices in electronic discovery, as well as the use of technology to leverage successful litigation outcomes.  He received his undergraduate degree from the University of California, Berkeley and attended UC Berkeley for post-graduate studies in English Literature.  In recent years, he has attended strategic leadership and business programs at Columbia and Harvard universities.

 
©2008 Encore Discovery Solutions.