One of my favorite IT Directors, Buzz Eyler of the Orcutt Unified School District, tells me that, “Most people have no clue how data is stored on a hard drive running Windows. A discussion of how it is written and marked for erasing would help a lot of people understand what’s happening under the hood of their computer.”
First, a little background: Inside your hard disk is a stack of one or more optically perfect platters where data is stored magnetically. When the drive is originally formatted, it is laid out in a pattern of concentric circles (“tracks”; the matching tracks on all the platters together make up a “cylinder”) and wedges. Try to imagine a hybrid of a record album and a pizza pie…or a dartboard. However, rather than 8 slices of pizza, or about 80 places big enough to land your dart, there may be hundreds of millions of extremely small “Sectors.” A Sector is 512 “bytes” in size, or big enough to hold about 256 characters at 2 bytes per character. Windows groups these into “Clusters.” The number of Sectors in a Cluster depends on the file system and the size of the drive; this article uses 64 Sectors per Cluster as its example. Every time you create a file, Windows sets aside (“allocates”) at least one Cluster, and then writes your data to it. Whenever a file exceeds one Cluster in size, the computer allocates another entire Cluster. So even if a file consists of one letter, which is 2 bytes in size, the computer allocates approximately 32,000 (actually 64 × 512 = 32,768) bytes of space. The file is then written to only the first 2 bytes of the Cluster, leaving the great majority of the Cluster unchanged, as “file slack.” The Cluster won’t be assigned to another file until the original file is deleted, that is, until the original is sent to the Recycle Bin and the Recycle Bin is emptied.
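To make the arithmetic concrete, here is a minimal sketch that works out how many Clusters a file occupies and how much file slack it leaves behind. It assumes the figures used above (512-byte Sectors, 64 Sectors per Cluster); the function names are ours, chosen for illustration, and real drives may use different Cluster sizes.

```python
SECTOR_SIZE = 512          # bytes per Sector, as stated in the article
SECTORS_PER_CLUSTER = 64   # the article's example figure; real volumes vary
CLUSTER_SIZE = SECTOR_SIZE * SECTORS_PER_CLUSTER  # 32,768 bytes


def clusters_needed(file_size: int) -> int:
    """Whole Clusters allocated for a file of file_size bytes."""
    # Ceiling division; following the article, a file gets at least one Cluster.
    return max(1, -(-file_size // CLUSTER_SIZE))


def file_slack(file_size: int) -> int:
    """Bytes allocated to the file but not occupied by its data."""
    return clusters_needed(file_size) * CLUSTER_SIZE - file_size


# A one-letter file, a file that exactly fills a Cluster, and one byte more.
for size in (2, 32_768, 32_769):
    print(f"{size:>6} bytes -> {clusters_needed(size)} Cluster(s), "
          f"{file_slack(size)} bytes of slack")
```

Note how the one-letter file leaves 32,766 bytes of slack, while adding a single byte to a full Cluster forces a second Cluster to be allocated.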
But this one Cluster isn’t the only place to which your data is written. Furthermore, where and in how many places data is written can be somewhat dependent upon the application writing it.
When a file is saved, there are several attributes saved with it. One is the date the file was created; one is the date the file was last changed, or modified; one is the date the file was last accessed. This information is kept as part of a file listing called a “directory.” This directory is viewed by the user as the contents of a folder.
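These dates can be read back from the directory by any program. The sketch below uses Python's standard `os.stat` call on a throwaway temporary file; the helper name `file_timestamps` is our own. One caveat, labeled in the code: the `st_ctime` field has historically reported the creation time on Windows, but on Unix-like systems it reports the last change to the file's metadata instead.

```python
import os
import tempfile
from datetime import datetime


def file_timestamps(path: str) -> dict:
    """Return the dates the file system keeps for the file at path."""
    info = os.stat(path)
    return {
        "modified": datetime.fromtimestamp(info.st_mtime),
        "accessed": datetime.fromtimestamp(info.st_atime),
        # Creation time on Windows; last metadata change on Unix-like systems.
        "created_or_changed": datetime.fromtimestamp(info.st_ctime),
    }


# Create a small throwaway file so the example has something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"A")
    demo_path = handle.name

stamps = file_timestamps(demo_path)
for label, when in stamps.items():
    print(f"{label:>20}: {when.isoformat()}")

os.remove(demo_path)  # clean up the throwaway file
```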
Let us take, for example, Microsoft Word, the leading word processing program for office computers. As soon as the user begins a Word document, an invisible, temporary work file is created (call it “Work File A”), and parts of the new document may get written to the virtual memory file (which, in Windows XP, is called pagefile.sys). We can call it the “VM file.” When the user saves the document, a file is created on the hard disk with the name the user gives it; call it “User Document.” We think we have created one document, but the data we’re typing is going into three separate files. If we close the document, “Work File A” is deleted, but it doesn’t go away – more on this later.
Now, suppose that at a later date, we open “User Document” to make some changes. Unbeknownst to us, a new invisible temporary work file is created, and more data gets written to the VM file. When we print the document, a print buffer file is created. So, in the act of making a document, then opening it later, making a change or two, and printing it, we’ve created the original User Document, two temporary invisible work files, one print buffer file, and entries in the VM file.
Email and other documents behave in much the same way, although the specifics differ somewhat from program to program. Email issues will have their own article.
When a file is deleted, the file does not simply go away. It remains on the hard disk, its name slightly changed, ignored by the operating system, and invisible to the user, as are the previously deleted Work Files already mentioned, and as is the VM file. The Cluster assigned to the file is deallocated, thereby becoming “unallocated space” even though it still has data sitting in it. Unallocated Clusters can then be assigned to a new file when the need arises. The directory entry that holds the file’s name is also made available for reuse, although the name itself is changed by only one character. Until another file is saved to that directory or folder, and saved at that same spot in the directory, the file name is not overwritten. Furthermore, if the name of the new file written to that spot in the directory is shorter than the original name, only part of the original name is overwritten.
Similarly, when a file is overwritten, much of the previous content of the file may remain intact. If, for instance, a file that took up 4 consecutive Clusters is deleted, and another file that takes up 2 consecutive Clusters overwrites the start of the original file, then half of the original remains, albeit in a raw form, and may be recoverable. Recovering such files and file remnants is an important part of the work a computer forensic examiner performs. When a file is simply deleted, and not overwritten, it is fairly trivial for a computer forensic specialist (or data recovery technician) to recover, or recreate, the file. When such recovery is performed to gather evidence for a legal matter, it often forms part of what is generally known as electronic discovery, or e-discovery.
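The 4-Cluster example can be played out on a toy model. The sketch below is not how Windows is implemented; `ToyDisk` is a hypothetical class of ours with tiny 8-byte Clusters, a dictionary standing in for the directory, and the simplifying assumption that free Clusters are handed out lowest-numbered first, so that the new file lands on top of the old one.

```python
CLUSTER_SIZE = 8  # tiny Clusters so the result is readable; real ones are far larger


class ToyDisk:
    """Hypothetical, greatly simplified model of Cluster allocation."""

    def __init__(self, cluster_count):
        self.clusters = [bytes(CLUSTER_SIZE) for _ in range(cluster_count)]
        self.directory = {}  # file name -> list of Cluster numbers

    def _free_clusters(self):
        used = {n for chain in self.directory.values() for n in chain}
        return [n for n in range(len(self.clusters)) if n not in used]

    def write(self, name, data):
        needed = max(1, -(-len(data) // CLUSTER_SIZE))  # ceiling division
        chain = self._free_clusters()[:needed]  # assumption: lowest free first
        if len(chain) < needed:
            raise OSError("toy disk is full")
        for i, number in enumerate(chain):
            chunk = data[i * CLUSTER_SIZE:(i + 1) * CLUSTER_SIZE]
            # Only the bytes written change; the rest of the Cluster keeps
            # whatever was there before (this is the file slack).
            self.clusters[number] = chunk + self.clusters[number][len(chunk):]
        self.directory[name] = chain

    def delete(self, name):
        # Deallocate only: the Clusters become free, but their data is untouched.
        del self.directory[name]

    def raw(self):
        return b"".join(self.clusters)


disk = ToyDisk(cluster_count=4)
disk.write("old.txt", b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD")  # fills 4 Clusters
disk.delete("old.txt")
raw_after_delete = disk.raw()   # every byte of old.txt is still present
disk.write("new.txt", b"1111111122222222")                  # fills 2 Clusters
raw_after_overwrite = disk.raw()
print(raw_after_delete)
print(raw_after_overwrite)
```

After the deletion, the raw contents are unchanged; after the overwrite, the first two Clusters hold the new file while the last two still hold the second half of the old one.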
So, until data is actually overwritten, it is likely to be recoverable, in whole or in part. Furthermore, even if the original file has been overwritten, it may be possible to search the hard disk for text from the original file, and thereby find complete or partial copies of it in former, deleted versions of the file, in the aforementioned temporary work files, or in snippets that remain in the virtual memory file. The result may be a rich lode of data useful to a computer forensic analysis, or simply recovered data for the end user.
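Searching a disk for leftover text amounts to scanning its raw bytes for a known phrase. Here is a minimal sketch of that idea, assuming the disk contents are already available as a `bytes` object (a “disk image”). The function name `find_text` and the sample image are hypothetical, and the phrase is tried in two encodings because the same text may be stored at 1 or 2 bytes per character.

```python
def find_text(image: bytes, phrase: str) -> list:
    """Return sorted (offset, encoding) pairs for each occurrence of phrase."""
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        try:
            needle = phrase.encode(encoding)
        except UnicodeEncodeError:
            continue  # phrase cannot be expressed in this encoding
        start = image.find(needle)
        while start != -1:
            hits.append((start, encoding))
            start = image.find(needle, start + 1)
    return sorted(hits)


# A made-up image: filler bytes with the phrase left behind in two encodings.
phrase = "merger terms"
image = (
    b"\x00" * 16
    + phrase.encode("ascii")       # e.g. a remnant in an old work file
    + b"\xff" * 8
    + phrase.encode("utf-16-le")   # e.g. a remnant in the virtual memory file
    + b"\x00" * 4
)

hits = find_text(image, phrase)
for offset, encoding in hits:
    print(f"found at byte {offset} ({encoding})")
```

Forensic tools do far more than this (they handle fragmented files, compression, and many more encodings), but the principle of looking past the directory at the raw data is the same.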
As end users, we see one file being created when we save it, and we see it go away when we trash it. But behind the scenes, there is a lot more going on. More than the one document we think we’ve saved is created, and very little goes away when we delete it. While data is not necessarily immortal, we now see that there is typically a lot more lying around after we’re done with it than we realize.
Steve Burgess is a freelance technology writer, a practicing computer forensics and e-discovery specialist as the principal of Burgess Forensics, and a contributor to the upcoming Scientific Evidence in Civil and Criminal Cases, 5th Ed., by Moenssens, et al. Mr. Burgess can be reached at http://www.burgessforensics.com, or via email at email@example.com