Chapter 7

Determine Collection Requirements

Abstract

Focus is placed on the requirements for gathering digital evidence from identified data sources in support of all applicable business scenarios. Establishing collection requirements enables organizations and relevant stakeholders responsible for risk management to communicate what their requirements are and have them distributed to the necessary support teams for implementation.

Keywords

Collection factors; Data security; Infrastructure; Metadata; Requirements
 

Introduction

As the third stage, organizations must produce a collection requirements statement so that stakeholders responsible for managing business risk scenarios, as discussed in chapter “Define Business Risk Scenarios,” can effectively communicate to those who are responsible for operating and monitoring the systems where digital evidence will be sourced, as discussed in chapter “Identify Potential Data Sources.”
However, in addition to defining business risk scenarios and identifying data sources, before an organization can establish a statement around the proactive gathering of digital evidence it must perform a thorough assessment to ensure the requirements for collecting any digital evidence are justified.

Precollection Questions

Deciding on what the organization’s requirements are for proactively gathering digital evidence requires some preliminary activities to be completed before work can begin on creating an overall statement describing exactly what these requirements are. The moderating factor in producing a requirements statement is the need to complete a cost–benefit analysis (CBA).
Similar to how a CBA is used to determine if implementing a digital forensic readiness program is valuable to an organization, as discussed in chapter “Understanding Forensic Readiness,” this time around it is used to help organizations determine factors such as how much it will cost to gather the digital evidence and what benefit there is in collecting it. To determine if creating a requirements statement is beneficial, organizations have to answer several questions that focus on whether it can be done in a cost-effective manner.
To get an accurate comparison, organizations have to factor in all monetary aspects associated with conducting an investigation in reaction to an incident against the resulting impact of an incident. As a starting point, organizations can pull cost elements from their service catalog, discussed further in Appendix D: Service Catalog, to understand how administrative, technical, and physical security controls contribute to conducting a forensic investigation. Examples of cost elements that organizations must consider as part of this comparison include ongoing maintenance of governance documentation (ie, standard operating procedures (SOP)); resource allocation to facilitate both the incident management and continuous improvement activities; and the operational cost for all tools and technologies used to manage the business risk. With this initial analysis complete, a secondary comparison must be completed that includes all monetary aspects, tangible and intangible, associated with conducting an investigation having proactively gathered digital evidence against the resulting impact of an incident. Using results from the two comparative analyses, organizations can determine the quantitative benefits of creating a requirements statement.
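The two comparisons described above can be sketched as simple arithmetic. The following is a minimal illustration only: the `InvestigationCosts` class, its field names, the decision to sum cost elements with incident impact, and every figure are hypothetical assumptions, not values or a method taken from this chapter or its appendices.

```python
from dataclasses import dataclass


@dataclass
class InvestigationCosts:
    """Hypothetical annualized cost elements, all in the same currency unit."""
    governance: float       # e.g., ongoing maintenance of SOPs
    resources: float        # staff time for incident management and improvement
    tooling: float          # operating tools and technologies
    incident_impact: float  # estimated resulting impact of incidents

    def total(self) -> float:
        # Assumption: the comparison is modeled as a plain sum of all elements.
        return (self.governance + self.resources
                + self.tooling + self.incident_impact)


def quantitative_benefit(reactive: InvestigationCosts,
                         proactive: InvestigationCosts) -> float:
    """A positive result means proactive gathering costs less overall."""
    return reactive.total() - proactive.total()


# Figures invented purely for illustration.
reactive = InvestigationCosts(governance=10_000, resources=120_000,
                              tooling=40_000, incident_impact=300_000)
proactive = InvestigationCosts(governance=15_000, resources=60_000,
                               tooling=90_000, incident_impact=150_000)

print(quantitative_benefit(reactive, proactive))
```

In this made-up case the proactive approach spends more on governance and tooling but less on resources and incident impact, so the difference is positive. Intangible aspects would still need to be assessed qualitatively alongside a calculation like this.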
Question #2: Can the digital evidence be gathered without interfering with business functions and operations?
When conducted in reaction to an event, forensic investigations can require that organizations temporarily assign several support resources to assist in the gathering of digital evidence. In some instances, the organization might realize that their ability to effectively and efficiently gather digital evidence in reaction to an incident is challenged by some type of roadblock (ie, restoration time delay). Where potential digital evidence can be proactively gathered, organizations can benefit from having digital evidence readily available when needed and not having to re-allocate resources away from their day-to-day business operations to assist. This improvement in operational efficiencies can reduce the need for resources to be temporarily removed from their normal duties and avoid any lost productivity or degradation in service availability.
Question #3: Can a forensic investigation minimize the impact or interruption to business functions and operations?
The potential for an incident to result in the loss or degradation of day-to-day business operations is a realistic scenario that most organizations face. In reaction to these events, the organization’s ability to manage the incident has a direct dependency on their capability to quickly gather and process digital evidence to understand the content and context of the incident. By having digital evidence gathered and made readily available, the organization can not only reduce the amount of time needed to investigate but also gain the ability to conduct proactive investigations. In addition to supporting forensic investigations, the capability to perform proactive investigations in support of security control assessments or user behavior analytics can reduce the likelihood of an event resulting in impact or interruption to the business.
Producing digital evidence in support of legal matters requires that organizations ensure their electronically stored information (ESI)1 is admissible in a court of law. As discussed in chapter “Evidence Management,” the US Federal Rules of Evidence 803(6) describes that ESI is admissible as digital evidence in a court of law if it demonstrates business “records of regularly conducted activity”; such as an act, event, condition, opinion, or diagnosis. Determining the relevance and usefulness of ESI as digital evidence before creating a collection requirements statement ensures that organizations do not overcollect and incur unnecessary downstream processing and review expenses.
Question #5: Can the digital evidence be gathered in a manner that does not breach the compliance with legal or regulatory requirements?
Laws and regulations can be imposed against organizations depending on several factors such as the industry they operate within (ie, financial) or the countries in which they conduct business (ie, the United States, India, Great Britain). Organizations must have a good understanding of how these governing laws and regulations influence the way they conduct their business operations. To provide reasonable assurance there is adherence to these requirements, organizations may need to produce digital evidence of controls that demonstrate they are practicing a reasonable level of due care. Consideration must be given to how background and foreground digital evidence, as discussed in chapter “Identify Potential Data Sources,” will be proactively gathered and preserved in accordance with the compliance requirements.
Assessing the quantitative and qualitative implications of creating a collection requirements statement in advance helps organizations to determine if proactively gathering digital evidence will reduce investigative costs; such as selecting storage options, purchasing technologies, and developing SOPs. Appendix E: Cost-Benefit Analysis, further discusses how to perform a CBA in support of producing the digital evidence collection requirements statement.

Evidence Collection Factors

Traditionally, the majority of digital evidence is gathered from sources that contain the actual data content used to describe the “who, where, what, when, how” elements of a forensic investigation. In addition to the actual data content, there are several other factors that can be used to supplement the details about an event/incident and influence its meaningfulness, usefulness, and relevance during a forensic investigation.

Time

Using a centralized logging solution, such as an enterprise data warehouse (EDW),2 time stamps can be generated and recorded as data is collected. Additionally, using a consistent and verifiable time stamp uniformly across all distributed data sources will ensure that digital evidence collected will be much easier to correlate and corroborate during the analysis phase.
There are a number of mechanisms that can be used across many different platforms but are still considered a decentralized means of establishing time synchronization across distributed data sources. Alternatively, using the Network Time Protocol (NTP), which distributes Coordinated Universal Time (UTC, commonly equated with Greenwich Mean Time (GMT)), with time zone offsets configured on the local systems, is the best practice for establishing consistent time stamps in support of a forensic investigation. While NTP addresses the issue of centralized time synchronization, it does not account for the accuracy of time being published to connected data sources.
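To illustrate why a single reference time with local offsets makes correlation easier, the sketch below converts locally recorded time stamps to one reference zone (UTC here) before ordering them. The source names, time stamps, and offsets are hypothetical, and the helper `to_utc` is introduced only for this example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_timestamp: str, utc_offset_hours: float) -> datetime:
    """Attach the source's configured offset, then convert to UTC."""
    naive = datetime.fromisoformat(local_timestamp)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Hypothetical events recorded in each system's local time:
# (source name, local time stamp, offset from UTC in hours)
events = [
    ("firewall-nyc", "2024-03-01T09:15:00", -5),
    ("webserver-lon", "2024-03-01T14:14:30", 0),
    ("db-mumbai", "2024-03-01T19:45:10", 5.5),
]

# Sorting on the normalized time reveals the true order of events.
timeline = sorted((to_utc(stamp, offset), source)
                  for source, stamp, offset in events)

for moment, source in timeline:
    print(moment.isoformat(), source)
```

Sorted by their local clock readings the three events appear hours apart; once normalized they fall within a single minute, which is the kind of relationship an investigator needs to see during analysis.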
Originally developed for military use, global positioning system (GPS) provides accurate data about current position, elevation, and time. GPS receivers have a high rate of accuracy and are relatively simple to install because they only need an antenna with unobstructed line of sight to several satellites in order for them to work correctly. Connecting a GPS receiver to the NTP device is a cost-effective way of ensuring accurate time signals are being received.
Although organizations might only conduct business in a single time zone, an incident will most often produce digital evidence in data sources that span several time zones. Evidence time stamped by a centralized solution that provides these distributed data sources with accurate time synchronization is traditionally not easy to challenge in a court of law.

Metadata

On its own, the data content of digital evidence can be challenging for investigators to use because it lacks contextual awareness; discussed further in chapter “Identify Potential Data Sources.” Metadata, which is essentially “data about data,” is used to add a supplemental layer of contextual information to data content. It gives digital evidence meaning and relevance by providing corroborating information about the data itself, revealing information that was either hidden, deleted, or obscured, and also helps to automate the correlation of data from different data sources.
One of the most common uses of metadata during an investigation is to reduce the volume of ESI by adding meaning to the data content so that relevant digital evidence can be more accurately located. Additionally, metadata can be used to provide forensic investigators with the ability to identify additional evidence, associate different pieces of evidence, distinguish different pieces of evidence, and provide location details. Some of the most common types of metadata used during a forensic investigation include, but are not limited to, the following:
• Date and time when a file was modified, accessed, or created
• Location where a file is stored on an electronic storage medium
• Guide metadata is used to assist with locating and identifying information and objects, such as a document title, author, or keywords.
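As a rough illustration of the first two metadata types listed above, the sketch below reads file system metadata using only the Python standard library. The `file_metadata` helper and the temporary sample file are hypothetical; a real investigation would rely on validated forensic tooling and would work from a preserved copy, because reading a live file can itself update its access time.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_metadata(path: str) -> dict:
    """Collect basic file system metadata for one file."""
    info = os.stat(path)

    def as_utc(epoch_seconds: float) -> str:
        return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

    return {
        "location": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # st_ctime is creation time on Windows but the last metadata
        # change on most Unix systems, so it is labeled accordingly.
        "changed_or_created": as_utc(info.st_ctime),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
    handle.write(b"sample evidence")
    sample_path = handle.name

record = file_metadata(sample_path)
os.remove(sample_path)
print(record)
```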
However, because metadata is fundamentally just data, it is also subject to the same evidence management requirements imposed on digital evidence; discussed further in chapter “Evidence Management.” Safeguards must be taken to ensure that the authenticity and integrity of metadata are upheld so that it can be used effectively during a forensic investigation and meets the legal requirements for admissibility in a court of law.
Nevertheless, because metadata is not generally accessible or visible there is a need for greater skills and the use of specialized tools to properly gather, process, and preserve it. An organization’s capability to use metadata to contextualize a forensic investigation will significantly reduce the amount of resources spent manually analyzing digital evidence by improving its meaningfulness, usefulness, and relevance.

Cause and Effect

The “Pareto Principle,” also referred to as the “80/20 Rule,” states that approximately 80% of all effects come from roughly 20% of the causes. As a rule of thumb, for example, this rule can be used as a representation of the information security industry where 80% of security risks can be effectively managed by prioritizing the implementation of 20% of available security controls; reinforcing a very powerful point that distributions are very rarely equal in any scenario.
In 2002, Microsoft announced they had made initial progress on the Trustworthy Computing initiative which focused on improving the reliability, security, and privacy of their software.
As the initiative continued to develop over the year, Microsoft quickly realized that among all the bugs reported in their software a relatively small quantity of them resulted in some type of error.
Through further analysis, Microsoft learned that approximately 80% of the errors and crashes in their software were caused by 20% of all bugs detected.
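The kind of analysis described above can be sketched as a short calculation that ranks causes by their effect and keeps the smallest set reaching a chosen share of the total. The bug identifiers and crash counts below are invented to reproduce an 80/20 split; they are not Microsoft's data, and `vital_few` is a hypothetical helper.

```python
def vital_few(effect_counts: dict, threshold_percent: int = 80) -> list:
    """Return the smallest set of top causes covering the threshold share."""
    total = sum(effect_counts.values())
    ranked = sorted(effect_counts.items(), key=lambda item: item[1], reverse=True)

    selected, running = [], 0
    for cause, count in ranked:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if running * 100 >= threshold_percent * total:
            break
        selected.append(cause)
        running += count
    return selected


# Hypothetical crash counts attributed to ten bugs (1,000 crashes in total).
crashes_per_bug = {"bug-01": 450, "bug-02": 350}
crashes_per_bug.update({f"bug-{n:02d}": 25 for n in range(3, 11)})

top_causes = vital_few(crashes_per_bug)
share_of_causes = 100 * len(top_causes) / len(crashes_per_bug)
print(top_causes, f"{share_of_causes:.0f}% of bugs")
```

Applied to the business risk scenarios, the same ranking could help decide which cause and effect relationships deserve supplementary data sources first.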
It is not realistic for an organization to identify and understand every possible combination of cause and effect. Instead, by referring back to the business risk scenarios outlined in chapter “Define Business Risk Scenarios,” organizations can reduce the scope of which cause and effect events need to be considered based on their application to the organization and the business risk scenarios. By narrowing the scope of cause and effect down to only those events that are relevant to the organization, supplementary data sources can be identified and considered for inclusion in the collection requirements statement to enhance the analysis of digital evidence by further improving its contextual meaning and relevance.

Correlation and Association

Digital evidence gathered during a forensic investigation, which is traditionally considered the primary records or indication of an event, is used to indicate the details about what happened during an incident; including, but not limited to, system, audit, and application logs, network traffic captures, or metadata.
For quite some time, the scope of a digital crime scene was somewhat limited to only the computer system(s) directly involved in the incident itself. However, today most organizations have environments that are made up of interconnected and distributed resources where events on one system are frequently related to events on other systems. This requires that the scope of an event be broadened outwards to include all systems that would be—in some form or another—involved in the incident.
With the expansion of the investigative scope, establishing a link between the primary evidence sources is needed so investigators can determine how, when, where, and by whom events occurred. To provide this additional layer of details, consideration needs to be given to other supporting data sources that can be used to establish the links between the content and context of digital evidence.
Under the chain-of-evidence model methodology, illustrated in Figure 7.1 below, each set of discrete actions performed by a subject6 is placed into a group separate from each other based on the level of authority required to execute them. However, it is important that each group of actions in the different sources of digital evidence is linked to the adjacent action group in order to complete the entire chain of evidence link.
The ability to create a link between the various data sources is crucial for organizations to establish a complete chain of evidence and enhance their analytical capabilities by getting a better overall understanding of the incident. Using a chain-of-evidence model allows organizations to better plan for a complete trail of evidence across their entire environment. Following this model requires thinking in terms of gathering digital evidence in support of the entire chain of evidence instead of as individual data sources that may or may not be useful during the processing phase of the forensic investigations.
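One way to picture linking adjacent action groups is to follow a shared attribute from one data source to the next. The sketch below is a deliberately simplified illustration and not the chain-of-evidence model itself: the three log sources, their field names, and their records are all hypothetical.

```python
# Hypothetical records from three data sources, one per action group.
vpn_log = [{"user": "jdoe", "assigned_ip": "10.0.0.8"}]
firewall_log = [{"src_ip": "10.0.0.8", "dst_host": "fileserver01"}]
host_audit_log = [{"host": "fileserver01", "account": "jdoe",
                   "action": "read payroll.xlsx"}]


def find(records: list, field: str, value: str):
    """Return the first record whose field equals value, or None."""
    return next((r for r in records if r[field] == value), None)


def build_chain(user: str) -> list:
    """Follow the linking attributes from one action group to the next."""
    chain = []

    vpn = find(vpn_log, "user", user)
    if vpn is None:
        return chain
    chain.append(("vpn", vpn))

    # The VPN-assigned address links to the firewall's source address.
    firewall = find(firewall_log, "src_ip", vpn["assigned_ip"])
    if firewall is None:
        return chain
    chain.append(("firewall", firewall))

    # The firewall's destination host links to the host audit log.
    audit = find(host_audit_log, "host", firewall["dst_host"])
    if audit is None:
        return chain
    chain.append(("host_audit", audit))
    return chain


for source, entry in build_chain("jdoe"):
    print(source, entry)
```

If any link is missing the chain stops short, which is the gap that planning collection around the whole chain, rather than around individual data sources, is meant to prevent.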

Corroboration and Redundancy

Coupled together with how pervasive and distributed it has become in our personal lives, technology has also been so deeply embedded into business operations and functions that when it comes to investigating an incident, there is no shortage of digital evidence to be gathered and processed. However, when an incident does occur, organizations can be challenged with proving what happened because individual pieces of digital evidence on their own do not provide the context necessary to arrive at credible and factual conclusions.
With the aggregation of multiple data sources, there will most likely be some level of duplication in terms of the information content. This duplication of information should not be viewed negatively, but should instead be taken advantage of to confirm the details of an incident during the forensic investigation.
The strength of digital evidence collected will ultimately improve when it can be vetted across data sources. Generally, the goal of every forensic investigation is to use digital evidence as a means of providing credible answers to substantiate an event or incident. Achieving this requires that the same or similar digital evidence from multiple sources is gathered and processed as an entire chain of evidence because there will most likely be indicators of the same incident found elsewhere.
Over time, the continued gathering of data across multiple sources can provide a sufficient amount of digital evidence that minimizes the need for a complete forensic analysis of systems. Preserving digital evidence from multiple sources allows organizations to leverage a consistent toolset across the entire chain of evidence that can be used to support several investigative purposes such as incident response, digital forensics, or e-discovery.
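Vetting evidence across data sources can be sketched as counting how many independent sources report the same event. The source names, subjects, and events below are hypothetical, and treating two sources as the minimum for corroboration is an assumption made only for this example.

```python
from collections import defaultdict

# Hypothetical observations: (data source, subject, event description)
observations = [
    ("proxy", "jdoe", "upload to external host"),
    ("dlp", "jdoe", "upload to external host"),
    ("endpoint", "jdoe", "upload to external host"),
    ("proxy", "asmith", "upload to external host"),
]


def corroborated(records: list, minimum_sources: int = 2) -> dict:
    """Keep only events reported by at least minimum_sources distinct sources."""
    sources_by_event = defaultdict(set)
    for source, subject, event in records:
        sources_by_event[(subject, event)].add(source)

    return {
        key: sorted(sources)
        for key, sources in sources_by_event.items()
        if len(sources) >= minimum_sources
    }


print(corroborated(observations))
```

Here the duplicated information is what allows the first subject's activity to be confirmed, while the single-source observation remains a lead that still needs supporting evidence.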

Storage Duration

A common practice for many organizations, for example, is to retain in long-term storage digital information such as e-mail messages and security logs (ie, intrusion prevention systems, firewalls, etc.). Not only does retaining this digital information support regulatory and legal requirements, but it can also hold potential evidentiary value and might need to be recalled to support one of the business risk scenarios discussed in chapter “Define Business Risk Scenarios.”
Organizations must carefully plan which type of electronic storage medium will be used to support their long-term storage requirements. As an example, backups are commonly used for long-term storage; however, organizations should be diligent to ensure that the type of backup media selected is not susceptible to losing information each time it is used. To determine the most appropriate electronic storage medium, organizations should complete a CBA, as discussed in Appendix E: Cost-Benefit Analysis, to identify which solution best meets their needs for retention and recovery time objectives (RTO).

Storage Infrastructure

Even though there have been significant advancements in how digital forensic tools and techniques have helped to reduce the time required to work with digital evidence, there still remains the underlying issue of how organizations can efficiently manage the data volumes that need to be gathered and processed during a forensic investigation.
Foremost, there is a need to design a storage solution that can easily adapt to the continuously growing volumes of data that need to be accessed in both real time and near real time. Using storage solutions such as an EDW allows organizations to store both structured7 and unstructured8 data in a scalable manner that can easily and dynamically adapt to changing storage capacity requirements.
Second, as data volumes continue to increase, organizations can start to experience inefficiencies in their ability to effectively perform data mining and analytics. Integrating cataloging and indexing of metadata properties into the EDW solution allows organizations to quickly identify data and reduce the length of time it takes for data to be retrieved. Not only will organizations benefit from data being readily accessible as a result of cataloging and indexing, but the ease with which data processing can be performed will improve the overall evidence-based reporting, discussed in chapter “Maintain Evidence-Based Presentation,” during a forensic investigation.
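The benefit of cataloging and indexing metadata properties can be illustrated with a small in-memory inverted index, where each property and value pair points directly at the matching records instead of requiring a scan of everything collected. The `MetadataCatalog` class, the record identifiers, and the metadata fields are hypothetical and are not features of any particular EDW product.

```python
from collections import defaultdict


class MetadataCatalog:
    """Toy catalog that indexes records by each metadata property value."""

    def __init__(self) -> None:
        self._records = {}
        self._index = defaultdict(set)  # (property, value) -> record ids

    def add(self, record_id: str, metadata: dict) -> None:
        self._records[record_id] = metadata
        for prop, value in metadata.items():
            self._index[(prop, value)].add(record_id)

    def find(self, **criteria) -> list:
        """Return ids of records matching every given property value."""
        if not criteria:
            return []
        matches = None
        for prop, value in criteria.items():
            ids = self._index.get((prop, value), set())
            matches = ids if matches is None else matches & ids
        return sorted(matches)


catalog = MetadataCatalog()
catalog.add("rec-1", {"author": "jdoe", "type": "email", "year": 2024})
catalog.add("rec-2", {"author": "jdoe", "type": "document", "year": 2024})
catalog.add("rec-3", {"author": "asmith", "type": "email", "year": 2023})

print(catalog.find(author="jdoe", type="email"))
```

Each lookup touches only the index entries for the requested properties, which is why retrieval time can stay manageable as the volume of collected data grows.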
Appendix I: Data Warehouse Foundations, further discusses details on implementing a storage solution to support proactively gathering digital evidence.

Data Security Requirements

Having such a large amount of data located in a common centralized storage solution can become a problem if adequate security controls are not enforced. Securing the data repository depends on the organization’s diligence and attention to compliance regulations, awareness of potential threats, and the identification of both the risk and value of the ESI collected.
There is a significant amount of preliminary work that needs to be completed before data gathering and storage can take place. Complementary to the architectural design work that takes place, organizations must incorporate current best practices and standards for implementing a data repository to ensure adequate security and reliability are maintained throughout the solution’s lifetime. This requires that ongoing assessments of the centralized storage solution are completed to identify and understand the risks associated with each aspect of its eventual implementation, including:
• Analysis of requirements specific to:
  • the value of data being collected
  • the architectural design
• Interpreting security and compliance standards and guidelines
• Assessment of the effectiveness of security controls and designs
Analysis of security requirements begins with having an understanding of the business needs and desires for building the centralized storage solution. As described in the sections throughout this chapter, the capabilities and functionalities for the storage solution have been identified; security controls, countermeasures, and data protection now need to be established.
Integrity: Generating cryptographic hash values, such as the message-digest algorithm family10 (eg, MD5) or the Secure Hashing Algorithm family11 (eg, SHA-2), for collected data stored in the centralized repository.
Availability: Required backups are taken in support of disaster recovery capabilities.
Continuity: Building cold, warm, or hot sites in support of business continuity capabilities.
Authentication: Leveraging existing centralized directory services for subject identification.
Authorization: Implementing role-based access controls12 to objects.
Nonrepudiation: Use of cryptographic certificates to associate the actions or changes by a specific subject, or to establish the integrity and origin of information.
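The integrity control listed above can be sketched with Python's standard `hashlib` module. The example uses SHA-256, a member of the SHA-2 family; the `sha256_digest` helper, the chunk size, and the in-memory sample data are assumptions made for illustration rather than a prescribed procedure.

```python
import hashlib
import io


def sha256_digest(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Hash recorded when the (hypothetical) data enters the repository.
recorded_hash = sha256_digest(io.BytesIO(b"abc"))

# Later verification: unchanged data reproduces the recorded hash,
# while even a one-character change produces a different value.
unchanged = sha256_digest(io.BytesIO(b"abc")) == recorded_hash
tampered = sha256_digest(io.BytesIO(b"abd")) == recorded_hash

print(recorded_hash)
print(unchanged, tampered)
```

With a file on disk, the same helper could be called on a handle opened in binary mode, and the recorded hash would be stored separately from the data it protects.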
Appendix J: Requirements Analysis, further discusses how to perform a requirements assessment for gathering digital evidence.

Summary

Developing a requirements statement for the collection of digital evidence requires organizations to conduct thorough planning and preparation. Not only does the storage solution need to be functionally assessed in terms of its architectural design, it is critical that further security assessments are completed to ensure the collected digital evidence is safeguarded from unauthorized access.