Around the world, the volume of electronic data is growing rapidly as everyday activities generate more of it. Electronic documents are easily duplicated, and the diversity of file types makes it difficult for companies to analyze their data without a unifying platform. As many companies have adopted work-from-home arrangements, electronic data has proliferated further, with employees relying more heavily on email and company chat applications in lieu of face-to-face communication.

The growing volume of data is a global problem in the Digital Age. Large volumes of electronic data are generated every day in both our personal and professional lives, through text messages to family, friends, and business contacts, spreadsheets of sales data, and long chains of emails populating inboxes across the globe. Such data multiplies quickly: sending a single email to a distribution list of a hundred recipients creates a separate copy of the email in each recipient's inbox.

With many employees switching to a work-from-home arrangement during the COVID-19 pandemic, organizations have become increasingly reliant on emails and electronic documentation to make up for a lack of in-person interaction. In addition, many companies have increased their usage of collaborative platforms that include a chat function, such as Microsoft Teams and Zoom, enabling employees to communicate in real time and simulate the office working environment. Such collaborative tools allow for fast and convenient communication and transfer of data. As a result, the volume of electronic data has increased dramatically.

The diversity of file types poses another challenge, as different files must be opened with different applications: Microsoft Word for Word documents, Adobe Acrobat for PDFs, and so on. This makes a comprehensive analysis of electronically stored evidence difficult in an investigation where facts may be located in different types of files. One would need a method to process the data into one unified, indexable database and a single platform on which to review the data.
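To make the idea of a unified, indexable database more concrete, the following is a minimal sketch in Python rather than a description of any particular review platform. It assumes that text has already been extracted from each native file by a separate processing step; the document IDs, file types, and function names are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical records: the text is assumed to have been extracted already
# from each native file (Word, PDF, email) by a separate processing step.
documents = [
    {"id": "DOC-001", "file_type": "docx", "text": "Quarterly sales forecast for the region"},
    {"id": "DOC-002", "file_type": "pdf", "text": "Signed invoice for consulting services"},
    {"id": "DOC-003", "file_type": "msg", "text": "Please revise the invoice before the sales call"},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for term in tokenize(doc["text"]):
            index[term].add(doc["id"])
    return index

def search_all(index, *terms):
    """Return the IDs of documents containing every term (an AND search)."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*hits)) if hits else []

index = build_index(documents)
print(search_all(index, "invoice"))           # the PDF and the email
print(search_all(index, "invoice", "sales"))  # only the email
```

Because every file type is reduced to the same record shape before indexing, a single query reaches Word documents, PDFs, and emails alike, which is the practical benefit of a unified database.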

Managing electronic data is not only a matter of housekeeping. Electronically stored information (ESI) may become discoverable evidence to be produced in a legal or regulatory proceeding, where there may be specific requirements for the format of production and types of metadata to be produced. In addition, there may be situations where an organization may consider internally investigating its electronic data, such as in response to a whistleblower's allegation of improper conduct, even if no legal or regulatory proceeding has been initiated.

Whether it is in the context of an official proceeding or for the purpose of an internal investigation, large volumes of data may need to be reviewed by a team of reviewers. Given the complexity of handling, processing, and reviewing electronic documents, substantial time and cost may be incurred. As a result, many companies have turned to technological solutions implemented by professionals through bespoke processes to reduce time and cost.

What are the challenges involved in reviewing electronic data?

Various jurisdictions around the world have specific rules regarding discovery of electronic data. For example, the Hong Kong High Court Practice Direction SL1.2 sets forth the rules for conducting discovery of electronic documents for certain cases. It provides definitions for important eDiscovery-related concepts such as concept searching, keyword search, and metadata, and it anticipates the use of advanced technologies such as technology-assisted review (TAR). As many jurisdictions have similar rules and definitions in place, it is critical for in-house counsel and law firms to be familiar with eDiscovery processes and technologies in preparation for discovery production in connection with a lawsuit or regulatory investigation. Similar processes and technologies are also useful for purely internal investigations where no production is anticipated.

There are challenges associated with handling electronic documents due to their unique features. But these features also open up opportunities for effective methods of fact-finding and data analysis.

  • Existence of metadata. In addition to the content of a document (e.g., the text in a Word document, sales data entered into a spreadsheet), an electronic document may also contain other data points, such as date of creation, author, sender, recipients, and file size. Metadata may be discoverable information in its own right, and analyzing it requires technical solutions that are not required for handling hardcopy documents. On the other hand, metadata may be useful for structuring a targeted search for documents.
  • Diversity of data types. When reviewing hard drive or email data to identify facts relevant to an issue, one may come across different file types that are normally opened in different applications. For example, evidence of business misconduct may be identified in the content of email communications as well as in information entered in Excel spreadsheets. Without a single platform that allows one to access and analyze different file types together, it may be extremely difficult to gather information that is spread across different types of files.
  • Large volume of documents. As electronic data is easy to generate and replicate, one may have to identify important facts from millions of electronic records, many of which are exact duplicates of each other. A solution is required to help filter out non-responsive materials and narrow down to a smaller set to be manually reviewed.
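As an illustration of how exact duplicates might be filtered out before manual review, the sketch below compares a SHA-256 hash of each record's content and keeps only the first record seen for each hash. This is a simplified assumption about how such a step could work: real platforms may hash a combination of content and metadata fields, and the records, custodian names, and function name here are hypothetical.

```python
import hashlib

# Hypothetical records standing in for processed files.
records = [
    {"id": "DOC-001", "custodian": "A. Chan", "content": b"Q3 price list attached."},
    {"id": "DOC-002", "custodian": "B. Lee", "content": b"Q3 price list attached."},
    {"id": "DOC-003", "custodian": "B. Lee", "content": b"Q3 price list attached. Revised."},
]

def deduplicate(items):
    """Keep the first record per content hash and note its exact duplicates."""
    unique = {}
    for item in items:
        digest = hashlib.sha256(item["content"]).hexdigest()
        if digest in unique:
            # Same bytes as an earlier record: log it instead of reviewing it again.
            unique[digest]["duplicate_ids"].append(item["id"])
        else:
            unique[digest] = {"id": item["id"], "duplicate_ids": []}
    return list(unique.values())

for entry in deduplicate(records):
    print(entry)
```

Only DOC-001 and DOC-003 would go forward for review in this example, while DOC-002 remains traceable as a duplicate of DOC-001. A document that differs by even one character, such as DOC-003, produces a different hash and is not treated as an exact duplicate.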

With organizations and individuals becoming increasingly reliant on electronic devices for communication and storage of information, the above obstacles are greatly amplified. By the same token, solutions for handling such obstacles are possible precisely because features unique to electronic data allow us to leverage structural and conceptual analytics and other tools to help streamline the analysis and review of large volumes of documents.

Leveraging technologies to enhance document review capabilities

Several technical solutions are available to resolve challenges posed by electronic data:

  • A unified indexable database and structured analytics. When handling a large volume of data with a diversity of data types, it is essential to process the data into a unified database that allows one to search across different file types and leverage metadata. This also allows one to identify documents that are exact duplicates, reducing the number of documents that require manual review. Structured analytics such as email threading and textual near-duplicate analysis may be applied to the data to allow for more streamlined and consistent analysis. Email domain analysis can be helpful for prioritizing review of emails from particular domains.
  • Keyword search and conceptual analytics. In an indexable database, one can create strings of keywords and connectors to help identify documents likely to contain important information. Such a search can be conducted on the extracted content of documents regardless of their file types, as well as on metadata such as subject lines and file names. To further enhance the searching process, one may leverage conceptual analytics such as keyword expansion to identify additional keywords that are conceptually related to existing keywords. Clustering is helpful for mapping out the data to help find groups of documents on which to focus one’s review. If some important documents have already been identified, one may use conceptual categorization to help find documents within the unreviewed database that are conceptually related to the important documents.
  • Linear review. Once a subset of documents has been identified from the wider database as the focus of the review, a team of reviewers may begin to manually review each document on a review platform. When reviewing a document, a reviewer will typically utilize a pre-rendered coding panel to indicate the relevancy of a document, what issues are identified in the document, and any other information within the scope of review – this is known as “coding” a document.
  • Predictive coding. Depending on the number of documents that require review, the size of the review team, and the amount of time available to complete a review, one may also consider expediting the review process by leveraging machine learning. Instead of having reviewers manually code each document, AI can attempt to learn the reviewers’ coding pattern and predict the relevancy of the unreviewed documents. The AI then “serves up” the documents it deems most likely to be relevant for manual review, and prediction continues while reviewers keep coding documents, creating a feedback loop that gradually improves the AI's predictions. Depending on the required final deliverables, predictive coding can often help save a significant amount of time and cost.
  • Knowledge transfer and working together. When an organization or law firm leverages a team of reviewers to conduct a review, it is critical for the in-house or law firm case team to regularly communicate with the review team as new information emerges during the review so the case team can react quickly and reformulate review strategies. As both the case team and the review team can access the same review workspace, the review team can escalate important documents to the case team, who can examine such documents in real-time and provide quick feedback. The case team can also perform an additional level of review on the same platform, which serves as a repository for the coding decisions of both levels of review to facilitate communications on findings.
  • Other bespoke workflows. In addition to document review, a review platform and its associated analytic tools may be useful in other workflows. For example, before, in lieu of, or alongside a linear or technology-assisted review project, one may utilize conceptual tools to take a deep dive into the data and extract insights into the case. These insights may be helpful for designing strategies to further narrow down the document set for review, or for performing substantive factual analysis to help identify important parties and timelines.
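The feedback loop described under predictive coding can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not how any commercial review platform works: it scores unreviewed documents using smoothed log-odds of the terms seen in already-coded documents, serves the highest-ranked document to a reviewer, and retrains once the reviewer's decision is added. The sample documents and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train(coded):
    """Count how many relevant and not-relevant documents contain each term."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, relevant in coded:
        counts[relevant].update(tokenize(text))
        totals[relevant] += 1
    return counts, totals

def score(text, model):
    """Sum smoothed log-odds per term; higher means more likely relevant."""
    counts, totals = model
    total = 0.0
    for term in tokenize(text):
        p_rel = (counts[True][term] + 1) / (totals[True] + 2)
        p_not = (counts[False][term] + 1) / (totals[False] + 2)
        total += math.log(p_rel / p_not)
    return total

def next_batch(unreviewed, coded, size=1):
    """Retrain on all coding decisions so far and return the top-ranked documents."""
    model = train(coded)
    return sorted(unreviewed, key=lambda text: score(text, model), reverse=True)[:size]

# Coding decisions already made by reviewers (True means relevant).
coded = [
    ("rebate paid to distributor off the books", True),
    ("lunch menu for the team offsite", False),
]
unreviewed = [
    "agenda for the team lunch",
    "second rebate to the distributor approved",
    "parking notice",
]

batch = next_batch(unreviewed, coded)
print(batch)

# A reviewer codes the served document; the decision feeds the next ranking.
coded.append((batch[0], True))
unreviewed.remove(batch[0])
print(next_batch(unreviewed, coded))
```

In this toy data set, the document about a second rebate is served first because it shares terms with the document already coded as relevant. Each new coding decision changes the term counts, so the ranking of the remaining documents is recalculated as the review progresses.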


The existence of electronic data, and the benefits and challenges that come with it, are facts of life that organizations and individuals may need to take into consideration both in the ordinary course of business and in the context of legal proceedings. Although issues specific to handling and reviewing electronic documents may be complex, state-of-the-art technologies, robust processes, and trained professionals are available to help you design a strategy tailored to your review needs.

Michael Yuen, Esq.
Epiq | Director, Document Review Services APAC


Michael is based in Hong Kong and oversees Epiq’s review teams in the APAC/Australia region, including Mainland China, Hong Kong, Japan, and Australia, to provide review services to clients that include law firms and corporations. He supervises the review managers in managing the review teams’ day-to-day operation and providing quality controls over the review deliverables. Michael works closely with clients and external counsel to accomplish their eDiscovery objectives by utilizing technologies and robust processes. In addition, he develops business plans for expanding Epiq’s review services’ regional presence as well as creating new professional service offerings.