File deletion
The main purpose of any file system is organizing data on a disk.
Files should be safely stored and retrieved. File systems keep
structures to remember where free space is available, whether clusters are
bad and where files and directories are located.
The cluster is the smallest addressable unit that can be allocated
to a file. Cluster sizes can vary from 1 to 128 sectors (depending
on the size of the partition). Fixed variables like the cluster size are set
when a partition is formatted and kept in the boot sector of a volume. A volume is a
formatted portion of a hard disk that is recognized by Windows as a
separate drive.
Files are organized in directories or folders. On-disk structures are maintained to record the names,
creation, modification and access dates and various other attributes and
properties of directories and files.
Typically when a file is deleted only these structures are updated:
actual file contents remain untouched on the disk.
File deletion and file undelete strategies for FAT based file systems:
When a file is deleted on a FAT file system, the first character of
the file name in its directory entry is replaced by a special
character (E5h), which causes the operating system (Windows, DOS etc.)
to ignore the file. In addition, all clusters allocated to the file are
marked as 'available' in the File Allocation Table (FAT for short). In
the following example Myfile.doc is deleted. Assuming a 4 KB cluster size
(= 8 sectors of 512 bytes), the 12 clusters previously allocated to Myfile.doc are now marked
'available'.
From this point on the directory entry and clusters previously
allocated to the file can be used to store new data! If that
happens the file can no longer be undeleted or recovered.
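The effect of a delete on these two structures can be sketched in Python. This is an illustrative in-memory model, not real on-disk code: the `delete_file` and `clusters_needed` helpers, the dictionary used as a FAT and the `FREE` and `EOF` markers are assumptions made for the example. The file size, cluster size and start cluster are taken from the Myfile.doc example.

```python
import math

FREE = 0        # assumed marker for an 'available' FAT entry in this model
EOF = "EOF"     # assumed end-of-chain marker in this model
DELETED = 0xE5  # first name byte of a deleted directory entry


def clusters_needed(file_size, cluster_size):
    """Number of whole clusters a file of file_size bytes occupies."""
    return math.ceil(file_size / cluster_size)


def delete_file(entry, fat):
    """Mark the directory entry as deleted and free its cluster chain."""
    entry["name"] = bytes([DELETED]) + entry["name"][1:]
    cluster = entry["start_cluster"]
    while cluster != EOF:
        next_cluster = fat[cluster]
        fat[cluster] = FREE  # the link to the next cluster is lost here
        cluster = next_cluster


# Build a contiguous chain for a 45596 byte file with 4 KB clusters.
n = clusters_needed(45596, 4096)
fat = {7000 + i: 7000 + i + 1 for i in range(n - 1)}
fat[7000 + n - 1] = EOF

entry = {"name": b"MYFILE  ", "start_cluster": 7000, "size": 45596}
delete_file(entry, fat)
```

After the call, the directory entry still holds the start cluster and the file size, but every FAT entry of the chain reads as available.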
| Directory entry on FAT file system (simplified) |
| File Name | Extension | Attributes | Date | Start Cluster | File Size |
| ~yfile | doc | archive | 16.10.06 | 7000 | 45596 b |
| File Allocation Table (simplified) |
| First cluster in row | FAT entries for consecutive clusters (- = empty) | Description |
| ... | | |
| 6980 | -, -, -, -, -, 6986, 6987, 6988, EOF, 6990, 6991, EOF | just some files |
| 7000 | 7001, 7002, 7003, 7004, 7005, 7006, 7007, 7008, 7009, 7010, EOF, - | our deleted file |
| 7020 | all entries empty | |
| 7040 | 7041, 7042, 7064, -, -, -, -, -, -, - | 1st part of fragmented deleted file |
| 7060 | -, -, -, -, 7065, 7066, 7067, EOF, -, - | 2nd part of fragmented deleted file |
| 7080 | all entries empty | |
| ... | | |
(EOF = End Of File)
File undelete software scans the disk structures and locates directory entries marked by the special
delete character.
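A minimal sketch of such a scan, assuming the commonly documented 32-byte FAT short-name directory entry layout and a directory that has already been read into memory as `bytes`. The function name `find_deleted_entries` and the sample entries are hypothetical. Long-file-name entries are simply skipped, and the lost first character is shown as `?`.

```python
import struct

ENTRY_SIZE = 32
# name, extension, attributes, (8 bytes skipped), high word of start cluster,
# (4 bytes skipped), low word of start cluster, file size
ENTRY_FORMAT = "<8s3sB8xH4xHI"
DELETED = 0xE5
END_OF_DIRECTORY = 0x00
ATTR_LONG_NAME = 0x0F


def find_deleted_entries(directory):
    """Return a list of dicts describing deleted short-name entries."""
    found = []
    for offset in range(0, len(directory) - ENTRY_SIZE + 1, ENTRY_SIZE):
        raw = directory[offset:offset + ENTRY_SIZE]
        if raw[0] == END_OF_DIRECTORY:
            break  # no entries are in use beyond this point
        if raw[0] != DELETED or raw[11] == ATTR_LONG_NAME:
            continue
        name, ext, _attr, high, low, size = struct.unpack(ENTRY_FORMAT, raw)
        found.append({
            "name": "?" + name[1:].decode("ascii", "replace").rstrip(),
            "ext": ext.decode("ascii", "replace").rstrip(),
            "start_cluster": (high << 16) | low,
            "size": size,
        })
    return found


# Hypothetical directory: one live entry, one deleted entry, end marker.
live = struct.pack(ENTRY_FORMAT, b"README  ", b"TXT", 0x20, 0, 6985, 100)
gone = struct.pack(ENTRY_FORMAT, b"\xE5YFILE  ", b"DOC", 0x20, 0, 7000, 45596)
deleted = find_deleted_entries(live + gone + bytes(ENTRY_SIZE))
```

On FAT12 and FAT16 volumes the high word of the start cluster is not used, so only the low word carries the cluster number there.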
Old style undelete utilities like Norton Undelete would now prompt
the user to enter the first character of the file name of the
deleted file. It would then patch (edit) the directory entry and the FAT
to undelete the file 'in place'.
Modern undelete software does not modify the file
system structures as it is considered bad practice to do so.
Instead it will prompt the user for a location where it can
recover the file to. Information from the directory entry
is used to retrieve file name information. Data as it is found in
clusters previously allocated to the deleted file is copied in
binary form to a new file: the file is now recovered.
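On FAT, a deleted file can only be recovered intact if its data was stored in consecutive
clusters. The directory entry still holds the start cluster and the file size, but the
FAT entries for the clusters previously allocated to the file have been reset to
'available', so the undelete software has no way to tell which clusters (apart from
the very first one) were allocated to the file. It can only assume that the file
occupied the clusters directly following the first one.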
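The recovery strategy for FAT can be sketched as follows, assuming the volume is available as a `bytes` object and that the file was not fragmented. The function `recover_contiguous` and its parameters are hypothetical names for this illustration. `data_offset` stands for the byte offset of cluster 2, which is the first data cluster on a FAT volume.

```python
import math


def recover_contiguous(volume, data_offset, cluster_size, start_cluster, file_size):
    """Copy file_size bytes starting at start_cluster, assuming no fragmentation."""
    n_clusters = math.ceil(file_size / cluster_size)
    begin = data_offset + (start_cluster - 2) * cluster_size
    data = volume[begin:begin + n_clusters * cluster_size]
    # The last cluster is usually only partly used, so cut at the file size.
    return data[:file_size]


# Toy volume: 8 bytes of system area, then 4-byte clusters 2, 3, 4 and 5.
volume = bytes(8) + b"AAAABBBBCCCCDDDD"
recovered = recover_contiguous(volume, data_offset=8, cluster_size=4,
                               start_cluster=3, file_size=6)
```

If the deleted file was fragmented, the same code silently returns data from clusters that belonged to other files, which is why the result of a FAT undelete should always be checked.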
File deletion and file undelete strategies for the NTFS file system:
The central file system control structure of the NTFS file system
is the Master File Table, or MFT for short. For each file on an NTFS
volume an MFT entry, also called a File Record Segment (or FRS), is
created. All information needed to locate data allocated to the file
can be retrieved from this MFT entry. Because of this, fragmented
files that were deleted can usually be recovered intact.
A file consists of multiple attributes: a file name, a data
attribute etc. Attributes are dynamic; they do not have fixed
sizes. To determine which attributes a file has and where they are
located in the MFT entry, an attribute list is kept.
For small files the data can even be stored directly in the MFT
entry. The data of larger files is stored outside the MFT entry and is
located using so-called run lists. A run describes one fragment by its
start cluster and its length; the run list holds one run for each
fragment of the file.
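As an illustration, the sketch below decodes a run list, assuming the commonly documented NTFS on-disk encoding: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is signed and relative to the start of the previous run, and a zero byte ends the list. The function name `decode_run_list` and the sample bytes are made up for this example.

```python
def decode_run_list(raw):
    """Decode data runs into a list of (start_cluster, length) pairs."""
    runs = []
    pos = 0
    previous_start = 0
    while pos < len(raw) and raw[pos] != 0x00:
        header = raw[pos]
        length_size = header & 0x0F
        offset_size = header >> 4
        pos += 1
        length = int.from_bytes(raw[pos:pos + length_size], "little")
        pos += length_size
        offset = int.from_bytes(raw[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        if offset_size == 0:
            runs.append((None, length))  # sparse run: no clusters on disk
        else:
            previous_start += offset     # offsets are relative to the last run
            runs.append((previous_start, length))
    return runs


# Two fragments: 3 clusters starting at 7040, then 4 clusters starting at 7064.
sample = bytes.fromhex("21 03 80 1B 11 04 18 00")
runs = decode_run_list(sample)
```

Because every fragment is described explicitly, a recovery tool does not have to guess where the second part of a fragmented file starts.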
When a file is deleted, a flag in its MFT entry is cleared so that the
entry is no longer marked 'in use'. The entry itself remains largely intact, and the
actual data allocated to the file remains untouched.
From this point on the MFT entry and clusters previously
allocated to the file can be used to store new data! If that
happens the file can no longer be undeleted or recovered.
| MFT Entry or File Record Segment (simplified) |
| Standard information | Attribute List | File Name attribute | Data attribute | ... | ... |
File undelete software scans the MFT for entries
that are flagged 'no longer in use'. From such an entry the file name and the data runs
can be determined. Although the NTFS file system is more complex
and dynamic than FAT based file systems, recoverability of data is in
general far better.