Hard Disk Drives - Bigger is Not Better
Increasing Storage Capacities - The Computer Forensics Dilemma
By Michael R. Anderson
Computer Forensics deals with the preservation of computer evidence; the identification of leads and evidence; the extraction and segregation of relevant evidence; and the documentation of findings. Law enforcement standards in this relatively new field have evolved since the creation of the first formal computer forensics training courses at the Federal Law Enforcement Training Center (FLETC) by the Internal Revenue Service, Criminal Investigation Division in 1989. Prior to 1989 some computer forensics procedures and processes were used in US military and US intelligence agencies.
In 1996, computer forensics training and tools were introduced to the private sector by New Technologies, Inc. (NTI). NTI also currently supports over 3,700 law enforcement computer forensics specialists. The field of computer forensics is still evolving, and computer forensics software tools and procedures are currently used by both the public and private sectors in internal investigations, criminal investigations, electronic document discovery, internal audits, computer security reviews, data elimination certifications and as a follow-on to computer incident responses. All branches of the US military, most US government agencies (including federal law enforcement agencies), all of the Big 5 accounting firms and over 150 of the Fortune 500 corporations currently rely upon computer forensics tools and methodologies to preserve computer-related evidence; to identify risks; to identify evidence; and to create and analyze timelines of computer usage. Over ten universities in the United States have, or are developing, computer forensics degree programs. Clearly, computer forensics has become one of the fastest growing areas of focus in computer science and forensics.
The Problem - Exponential Storage Capacity Increases:
The effectiveness of computer forensic software tools and processes is directly tied to the volume of computer data involved and the number of computers to be processed in a specified period of time. Today, most private and public sector environments rely heavily upon portable notebook computers, desktop computers and network servers. IBM released the first IBM Personal Computer (PC) less than twenty years ago, in October of 1981. Since that time, the popularity of the PC has exploded beyond all expectations. As market demand for the PC increased, so did the need for more storage capacity. This paper deals specifically with current and predicted problems tied to the exponential growth of computer hard disk drive storage capacities. There are no easy answers, but NTI remains on the leading edge in the development of computer forensics tools for the benefit of its clients.
When the first law enforcement computer forensics training courses were created at FLETC in 1989 by NTI's founders, typical computer media storage capacities ranged from 360,000 bytes (on a floppy diskette) to approximately 20 million bytes (on a hard disk drive). In 1996, when NTI began its private sector training of Big 5 accounting firms and Fortune 500 corporations, typical computer media storage capacities ranged from 1.44 million bytes (on a floppy diskette) to approximately 1.2 billion bytes (on a hard disk drive). Today, a typical computer hard disk drive has the capacity to store over 40 billion bytes of data and, as of this writing, 80 billion byte computer hard disk drives are available for purchase in the marketplace. For the purposes of this paper one million bytes is a megabyte, one billion bytes is a gigabyte and one trillion bytes is a terabyte.
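The two dated capacity figures above are enough for a rough check of the "exponential" claim. The following is a minimal sketch, assuming a constant yearly growth rate between 1989 and 1996, which is a simplification; the function names are illustrative only. Under that assumption, typical hard disk drive capacity grew by close to 80 percent per year, which means it doubled a little more often than once every fifteen months.

```python
import math

# Capacities and years are the ones quoted in the text above.
START_YEAR, START_BYTES = 1989, 20_000_000       # ~20 megabyte hard disk drive
END_YEAR, END_BYTES = 1996, 1_200_000_000        # ~1.2 gigabyte hard disk drive


def annual_growth_factor(start_bytes, end_bytes, years):
    """Constant yearly multiplier that turns start_bytes into end_bytes."""
    return (end_bytes / start_bytes) ** (1 / years)


def doubling_time_years(growth_factor):
    """Years needed for capacity to double at the given yearly multiplier."""
    return math.log(2) / math.log(growth_factor)


factor = annual_growth_factor(START_BYTES, END_BYTES, END_YEAR - START_YEAR)
print(f"Yearly growth factor: {factor:.2f}")
print(f"Doubling time: {doubling_time_years(factor):.2f} years")
```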
We thought a 20 megabyte computer hard disk drive was huge back in 1989, and it was a real challenge to evaluate every byte of computer data for evidence and exculpatory information in criminal cases. Back then it was possible for us to manually evaluate every 512 byte sector of the data on the hard disk drives to identify encrypted data, compressed files and embedded text formats. Such data formats cannot be evaluated using standard computer forensic search utilities because search utilities can only identify strings of plain text. To overcome these limitations back then, we manually evaluated every sector in our search for headers and other identifying patterns indicative of encrypted or compressed data. Unfortunately, hard disk drives are too large today for the manual analysis of every sector. As a result, many computer forensic examinations today are limited to just the search of targeted plain text on the hard disk drive using a computer forensic search utility.
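The sector-by-sector review described above can be illustrated in software. The following is a minimal sketch and not the procedure or tooling actually used by the author: it walks a raw disk image in 512 byte sectors, flags sectors that begin with a few widely documented file signatures, and flags sectors whose byte distribution is close to random, which is typical of compressed or encrypted data. The function names, the short signature list and the entropy threshold of 7.0 bits per byte are all assumptions chosen for illustration, and the signature check only catches files that start on a sector boundary.

```python
import math
from collections import Counter

SECTOR_SIZE = 512

# A few well-known file signatures; a real examination would use a far longer list.
SIGNATURES = {
    b"PK\x03\x04": "ZIP local file header",
    b"\x1f\x8b\x08": "gzip stream",
    b"%PDF-": "PDF document",
}

# Hypothetical cutoff: plain text is usually well below this value.
ENTROPY_THRESHOLD = 7.0


def shannon_entropy(data: bytes) -> float:
    """Entropy of the byte distribution in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def flag_sectors(image: bytes, sector_size: int = SECTOR_SIZE):
    """Return (sector_number, reason) for every sector worth a closer look."""
    flagged = []
    for offset in range(0, len(image), sector_size):
        sector = image[offset:offset + sector_size]
        number = offset // sector_size
        for magic, description in SIGNATURES.items():
            if sector.startswith(magic):
                flagged.append((number, description))
                break
        else:
            # No known header: fall back to the statistical check.
            if shannon_entropy(sector) >= ENTROPY_THRESHOLD:
                flagged.append((number, "high entropy, possibly compressed or encrypted"))
    return flagged


# Synthetic three-sector image: plain text, a ZIP header, and evenly spread bytes.
text_sector = (b"The quick brown fox. " * 30)[:SECTOR_SIZE]
zip_sector = b"PK\x03\x04" + bytes(SECTOR_SIZE - 4)
dense_sector = bytes(range(256)) * 2
for number, reason in flag_sectors(text_sector + zip_sector + dense_sector):
    print(f"sector {number}: {reason}")
```

A scan of this kind narrows down which sectors deserve attention, but it does not replace the examiner's judgment. High entropy only suggests compression or encryption and cannot distinguish between the two.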
The exclusive use of a computer forensics search utility in the evaluation of computer evidence assumes that compressed data and/or encrypted data is not potentially involved. It also assumes that the computer forensics specialist knows what phrases and terms were used by the computer users to create the data stored on the subject computer. It is a difficult task to create an effective list of target search terms in a case where leads are limited to informant communication or probable cause. The unique language and grammar used by the computer user in the commission of a crime are usually unknown to the investigator. Thus, a thorough computer forensics examination extending beyond the mere use of a computer forensic search utility is required if all relevant computer evidence and exculpatory information is to be found. It was tough back in the days of the 20 megabyte hard disk drives. Today, it is all but impossible to completely analyze huge computer hard disk drives without specialized processes, computer file timeline analysis, computer forensics software utilities and statistical data sampling techniques. The problem is amplified by computer forensics software tools that claim to provide the computer forensics examiner with a complete forensics solution in a single software tool. In reality, they only deal with the tip of the iceberg when huge hard disk drives are involved, and they don't provide any solution concerning large RAID-configured servers.
Storage capacities of computer hard disk drives continue to increase at an exponential rate. Unfortunately, most law enforcement agencies are still using processing methodologies that grew out of the original FLETC protocols developed in 1989. Law enforcement agencies are also underfunded, and they cannot afford to outfit state-of-the-art computer forensics laboratories. Law enforcement agencies are behind on the technology curve for computer evidence processing, and the gap widens as computer hard disk drive storage capacities continue to increase. As of this writing, the Federal Bureau of Investigation is nearing completion of its Automated Computer Examination System (ACES), and this will provide law enforcement with better tools than they have had previously. However, even this new technology will not fully bring law enforcement up to speed. Hard disk drive capacities are the issue. They continue to grow at an exponential rate as hard disk drive technologies advance.
To put the computer forensics storage capacity dilemma into clear focus, consider the following. Just one megabyte of printed computer data represents about 312 printed sheets of 8½ x 11 inch paper, or about a 1.6 inch stack of printed paper. Thus, the contents of a 20 megabyte hard disk drive back in 1989 represented approximately a 32 inch stack of printed paper. A 1.2 gigabyte hard disk drive back in 1996 translates into approximately 1,920 inches or 160 feet of stacked printed pages. One of today's commercially available 80 gigabyte hard disk drives translates into a stack of printed paper over 10,000 feet tall! NTI's consulting team just completed a computer forensics litigation project involving more than four terabytes of computer data. If our computer forensics specialists had printed all of the data in that case, the printed output would have created a stack of paper over 100 miles high! Are you starting to understand why computer forensics specialists are tearing their hair out every time another "bigger and better" hard disk drive shows up in the marketplace? Remember, most of these cases involve multiple computer hard disk drives, and the US court system expects legal discovery to be conducted in a "reasonable" period of time. A typical case may involve the review of 10 hard disk drives, and unfortunately most lawyers and judges don't fully understand the magnitude of the problem.
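The paper stack figures above all follow from the single ratio of 1.6 inches of paper per megabyte. The short sketch below reproduces the arithmetic using the decimal units defined earlier in this paper; the function and constant names are illustrative only.

```python
# Ratio taken from the text: one megabyte of printed data is about a 1.6 inch stack.
INCHES_PER_MEGABYTE = 1.6
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280


def stack_height(megabytes):
    """Return the printed stack height as (inches, feet, miles)."""
    inches = megabytes * INCHES_PER_MEGABYTE
    feet = inches / INCHES_PER_FOOT
    miles = feet / FEET_PER_MILE
    return inches, feet, miles


# Capacities from the text, expressed in megabytes (1 GB = 1,000 MB, 1 TB = 1,000,000 MB).
EXAMPLES = [
    ("20 megabyte drive (1989)", 20),
    ("1.2 gigabyte drive (1996)", 1_200),
    ("80 gigabyte drive", 80_000),
    ("4 terabyte case", 4_000_000),
]

for label, megabytes in EXAMPLES:
    inches, feet, miles = stack_height(megabytes)
    print(f"{label}: {inches:,.0f} inches = {feet:,.1f} feet = {miles:,.2f} miles")
```

Multiplying the result for a single drive by the 10 drives of a typical case shows how quickly the review burden grows.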
One Solution - Random Data Sampling:
Statistical data sampling techniques have helped NTI deal with increased hard disk drive capacities and also identify the most relevant computer(s) to process first when multiple computers are involved in a single case. NTI's computer forensics specialists evaluate ambient data (e.g., swap files and file slack) first because such data storage areas contain random samples of data associated with computer usage. To assist in this important phase of processing, NTI has developed "intelligent" fuzzy logic software tools. Some of these tools help to identify English language sentence structure stored in the form of ambient data. Others identify the names of individuals, and still others identify past Internet activities. Some of the filtering tools are sold to our corporate and government clients. Some of the more specialized filtering tools are used exclusively by NTI's consulting team members. The intelligent data filtering process helps to identify relevant strings of text that might not otherwise be discovered by computer forensics specialists. In some cases, these filtering processes have identified critical evidence that has essentially made the case for NTI's civil litigation law firm clients. Many of NTI's intelligent filtering processes are patent pending.
As mentioned previously, several forms of ambient data exist on Microsoft-based systems. NTI's computer evidence processing methodologies place a high priority on Windows swap files (page files in Windows NT). Windows swap files are used in Microsoft Windows as an extension of random access memory. When more random access memory is required, Microsoft Windows uses the swap file as a temporary electronic scratch pad. This activity is conducted transparently by the operating system and without the knowledge of most computer users.
Typically, Windows swap files range in size from 20 megabytes to over 100 megabytes, and the contents are stored randomly by the operating system in the normal course of its operation. Thus, the contents of the Windows swap file should be thought of as a sampling of prior work performed on the subject computer. Through the process of "intelligent" filtering and data exclusion, relevant sentence structure, names, E-mail addresses and Internet browsing activity can be identified that might be overlooked using traditional computer forensics search methodologies. In this fashion, previously unknown search terms can be identified and added to the targeted list of search terms for use with traditional computer forensic search tools.
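The general idea of filtering a swap file for candidate search terms can be sketched with ordinary pattern matching. The example below is not NTI's filtering software and uses no fuzzy logic; it is a simplified illustration, with hypothetical function names, that pulls runs of printable ASCII text and E-mail-like strings out of a block of raw binary data. It also assumes single-byte text, so text stored as two-byte Unicode characters, which is common on Windows NT, would need an additional pass.

```python
import re

# Runs of at least six printable ASCII characters (the length is an arbitrary choice).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")

# Deliberately loose E-mail pattern; it favors recall over strict validity.
EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_strings(blob: bytes):
    """Return every printable ASCII run found in the raw data, in order."""
    return [match.group().decode("ascii") for match in PRINTABLE_RUN.finditer(blob)]


def extract_emails(blob: bytes):
    """Return the unique E-mail-like strings found in the raw data, lowercased and sorted."""
    return sorted({match.group().decode("ascii").lower() for match in EMAIL.finditer(blob)})


# Synthetic stand-in for a fragment of swap file contents.
sample = b"\x00\x01meet at noon\x00\xffJDoe@example.com\x00ab\x00jdoe@example.com\x07"
print(extract_strings(sample))
print(extract_emails(sample))
```

In practice the data would be read from a forensic copy of the swap file rather than from the original evidence, and the extracted strings would be reviewed by the examiner before any of them are added to the list of target search terms.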
Computer hard disk drives are getting bigger and it is clear that traditional computer evidence processing methods and procedures have become outdated. The traditional search of computer hard disk drives using a "best guess" approach is no longer effective because so much incriminating and exculpatory evidence is potentially overlooked. NTI has developed methods and processes to overcome these deficiencies. However, more needs to be done as hard disk drive capacities continue to grow in the future.

