Overview of Windows Server 2003 File Systems

A file system is a little like a commercial real estate agent. It acts as a broker between a lessor who has space available and a lessee who wants that space. In the case of a file system, the storage system determines what space is available, and applications are the lessees that want a piece of that space.

As administrators, we need to know enough about the elements of a file system transaction so we can spec out our storage needs and anticipate where problems might occur. This means we need to know details about certain disk structures that support file system operations:

  • Sectors. These form the basic divisions of data on a disk. Sector location is determined by the disk geometry and is fixed by the manufacturer. Most drives have 512-byte sectors.

  • Clusters. The file system assigns addresses to clusters, which represent groups of sectors. By lumping sectors into clusters, the file system reduces the size of the address space it must track.

  • Partitions and volumes. Raw storage on a drive is divided into partitions, each of which can be mounted by a separate file system. An IA32 system uses a Master Boot Record (MBR) to assign partitions on a basic disk. An IA64 system can use GUID Partition Table (GPT) or MBR partitioning for basic disks. Dynamic disks on either system are handled by the Logical Disk Manager (LDM), which uses a database to divide disk arrays into volumes. From the file system's perspective, these partitions and volumes all represent the same thing: a place to store files.

  • Partition boot sector. This is the first sector of a partition. It contains information about the file system and a small amount of bootstrap code to start loading the file system.

To see how a file system turns raw storage into a data repository, we need to know a little about the structures that hold critical information. They are as follows:

  • Files. These are addressable locations in a partition where discrete chunks of user data are stored. A file has a name and attributes that describe its contents.

  • Folders. A folder is an index of filenames. Folders give structure to a file system, creating a hierarchy that makes it easier to locate individual files.

  • File Allocation Table (FAT). This is a map of the clusters in a partition that has been formatted as FAT or FAT32. The file system uses the cluster map to locate files and folders.

  • Master File Table (MFT). This is a database containing information about the file system elements stored in a partition that has been formatted as NTFS.

  • MFT metadata records. These are special NTFS records that store information about the structure of the MFT itself and provide support for critical file system operations.

From an operational perspective, we need to know what each file system can do, what it can't do, and what to use as criteria when choosing between them. Let's start with storage details. (See the following sidebar, "More Information About File Systems.")

Sectors and Clusters

A hard drive stores data in concentric tracks that are divided into addressable units called sectors (see Figure). In most drives, a sector contains 512 bytes.

Figure. Diagram of sectors and clusters on a hard drive.


When a file system asks for data from a storage driver, it must specify the location of that data in relation to the start of the volume. The storage driver then works with the device controller to move the drive heads to the designated location, pick up the required information (plus a little extra for the cache), buffer the information, and deliver it to the file system driver.

A sector is the smallest addressable unit on a drive. Ideally, a file system would assign an address to every sector in a partition. This yields the best utilization, because any space left over between the end of a file and the end of the last sector holding the file is wasted.

At some point, though, a volume may contain so many sectors that the cost of maintaining addresses for all of them starts to become a burden and performance goes down. To improve performance, the file system clumps individual sectors into allocation units, or clusters.
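The broker's job is mostly address arithmetic. Here is a minimal Python sketch of the cluster-to-sector translation; the sector and cluster sizes are illustrative defaults, not values read from a real boot sector:

```python
# Illustrative sketch of cluster-to-sector address translation.
# Real values for these constants come from the partition boot sector.
BYTES_PER_SECTOR = 512
SECTORS_PER_CLUSTER = 8            # a 4K cluster

def cluster_to_sector(cluster, first_data_sector=0):
    """Return the first volume-relative sector of the given cluster."""
    return first_data_sector + cluster * SECTORS_PER_CLUSTER

def cluster_size():
    return BYTES_PER_SECTOR * SECTORS_PER_CLUSTER

print(cluster_to_sector(10))       # sector 80
print(cluster_size())              # 4096 bytes
```

Lumping eight sectors into one cluster means the file system tracks one-eighth as many addresses.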

More Information About File Systems

When I wrote about file systems for the predecessor to this book, Inside Windows 2000 Server, I did all of the investigative work myself. This involved many tedious hours studying hex dumps of disk structures. I was forced to do this because Microsoft has steadfastly refused to publish a specification for NTFS. Some information has dribbled out of Redmond over the years in the form of white papers and Resource Kit articles, but specific engineering details are tough to come by.

For this book, I benefited a great deal from work done in the Open Source community, specifically by the participants in the Linux-NTFS project. If you want to see details of NTFS data structures along with information about other exciting work in the cross-platform storage arena, visit linux-ntfs.sourceforge.net.

I also got great information about the on-disk structures of NTFS from the book Windows NT/2000 Native API Reference by Gary Nebbett. This is a great reference for anyone trying to understand the inner workings of Windows.

For details about the disk structures for FAT and FAT32, I recommend reading the Microsoft white paper, "FAT: General Overview of On-Disk Structure," available at www.nondot.org/sabre/os/files/FileSystems. This site is a good place to start when researching information about just about any file system.

For authoritative, high-level explanations of the workings of NTFS, I recommend reading "Inside Windows 2000" by David Solomon and Mark Russinovich, plus any of Mr. Russinovich's NTFS articles in Windows 2000 Magazine and other periodicals.

The Resource Kit also has a very good exposition on the Windows Server 2003 file systems. Someday perhaps Microsoft will allow the Resource Kit writers to include engineering specifications, as well.

A cluster contains an even multiple of sectors. This is called the cluster size. Clusters come in increasing powers of 2, yielding cluster sizes of 512 bytes, 1K, 2K, 4K, 8K, 16K, 32K, and 64K. The maximum cluster size supported by any file system that ships with Windows Server 2003 is 64K.

If the end of a file does not completely fill its assigned cluster, the excess space is wasted. Windows does not provide sub-allocation of sectors within a cluster. This means cluster size has a direct impact on disk utilization. For example, I've seen instances where nearly 25 percent of the available space on a volume was reclaimed by converting a large, heavily loaded FAT volume formatted with 32K clusters into an NTFS volume with 512-byte clusters.
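The effect of cluster size on utilization is easy to model. A quick Python sketch, using hypothetical file sizes:

```python
# Sketch: estimating slack (the wasted tail of each file's last cluster)
# for a set of hypothetical file sizes at two cluster sizes.
def slack_bytes(file_sizes, cluster_size):
    wasted = 0
    for size in file_sizes:
        remainder = size % cluster_size
        if remainder:
            wasted += cluster_size - remainder
    return wasted

files = [1000, 5000, 40000, 200]          # hypothetical small files
print(slack_bytes(files, 32 * 1024))      # 117640 bytes wasted at 32K clusters
print(slack_bytes(files, 512))            # 904 bytes wasted at 512-byte clusters
```

With small files, nearly every cluster is mostly slack at 32K, which is why shrinking the cluster size can reclaim so much space.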

It is beneficial in some instances to match cluster size to average file size. A volume that holds hundreds of thousands of small files should have a small cluster size. That seems obvious. But a volume that holds a few, very large files (database files, for example) can benefit from the improved efficiencies of large cluster sizes. For the most part, though, letting Windows decide on a cluster size when formatting a volume usually yields optimal performance.

Changing cluster sizes requires reformatting. If you decide to increase the cluster size on a big array for a database server, you'll need to back up your data, reformat the array with a different cluster size, and then restore the data from tape.

Each of the three file systems in Windows Server 2003 uses 512-byte clusters up to a certain volume size. Beyond that, behavior differs. For FAT, cluster size doubles each time volume size doubles. FAT32 and NTFS keep cluster sizes at 4K for as long as possible. The following table lists the default cluster sizes for each file system based on volume size.

The 4K plateau on NTFS cluster sizes is there because the compression API does not work with cluster sizes above 4K. FAT32 is seen as an intermediate stage prior to converting to NTFS, so FAT32 cluster sizes are also constrained to 4K as long as possible.

Cluster Sizes as a Function of File System and Volume Size

[Table: volume-size ranges with the default FAT, FAT32, and NTFS cluster size for each range]
Cluster Size and Stripe Size

Cluster sizes are not related to the underlying data buffers used by the Logical Disk Manager, often referred to as the stripe size in hardware RAID. LDM always moves data to and from a drive in 64K chunks regardless of the cluster size.

Historically, the defragmentation API was another reason for limiting the maximum NTFS cluster size to 4K. As we'll see in the section "Defragmentation," Windows Server 2003 and XP now permit defragmenting volumes with cluster sizes above 4K.

FAT File System Structure

Figure shows the layout of the first few sectors on a disk that is formatted with FAT. The partition boot sector has an entry that identifies the format type and the location of the FAT and the mirrored FAT. Ordinarily, the FAT is located near the front of the disk to benefit from the fast read times there. (Tracks at the outside of a disk pass under the head at a higher linear velocity.)

Figure. Diagram of FAT layout on a typical partition.


The Fastfat.sys driver in Windows Server 2003 supports three cluster numbering schemes:

  • FAT12. This format uses 12 bits to identify a cluster. This is the original FAT format. It is quite compact so it is used for formatting floppies and volumes smaller than 16MB.

  • FAT16. This format uses 16 bits for numbering clusters, which pegs the maximum volume size at 0xFFFF, or 65,535 clusters. (The actual count is 1 cluster smaller than the theoretical maximum of 2^16, or 65,536.) Windows Server 2003 supports a maximum 64K cluster size for FAT16, making the largest FAT volume 65,535 * 64K, or 4GB. (Windows 9x and DOS only support a 32K cluster size, one power of 2 smaller, for a maximum volume size of 2GB.)

  • FAT32. This format uses a long integer (32 bits) for cluster numbering. This would ordinarily permit a very large volume, but Windows Server 2003 limits FAT32 volumes to 32GB. Standard Windows 95 and classic NT cannot read FAT32 partitions.
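The 4GB and 2GB ceilings in the FAT16 entry fall out of straightforward arithmetic; a quick Python sketch:

```python
# Sketch: the arithmetic behind the FAT16 volume-size ceilings quoted above.
KB = 1024

def max_volume_bytes(cluster_count, cluster_size):
    return cluster_count * cluster_size

fat16_nt  = max_volume_bytes(65535, 64 * KB)   # Windows Server 2003: ~4GB
fat16_dos = max_volume_bytes(65535, 32 * KB)   # DOS / Windows 9x: ~2GB

print(round(fat16_nt / 1024**3, 2), "GB")
print(round(fat16_dos / 1024**3, 2), "GB")
```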

There is also a special version of FAT32 called FAT32x used by Windows 9x and ME when formatting drives larger than 8GB. FAT32x overcomes a limitation of traditional Cylinder/Head/Sector translation by forcing the operating system to use Logical Block Addressing (LBA), which assigns a number to each available sector reported by the drive. LBA ups the ante for partition sizes to whatever the file system can handle. For FAT32, that limit is 2TB (terabytes).

FAT32x volumes store their FAT tables at the end of the volume rather than the beginning. This signals the operating system to use LBA. Windows Server 2003 can read a FAT32x volume but does not use FAT32x formatting because the Fastfat driver always uses LBA unless specifically told not to. If you upgrade a Windows 9x/ME system to XP, there is no indication in any of the command-line utilities or in the Disk Management console that a volume is formatted as FAT32x rather than FAT32. For more information, visit www.win98private.net/fat32x.htm.

Location and Use of FAT Disk Structures

FAT12 and FAT16 file systems require the FAT to start at the first available sector following the boot sector. MBR partitions generally have a few hidden sectors between the partition boot sector and the start of the file system. These hidden sectors are often used for inscrutable purposes. GPT disks used on IA64 systems have no hidden sectors.

The mirrored copy of the FAT must follow immediately after the primary FAT. This fixed location of the FAT tables in FAT12 and FAT16 is a weakness. A failed sector can make the file system inaccessible. This weakness is overcome by FAT32, which can locate the FAT anywhere, although it is generally still located at the front of the disk right after the boot sector.

The first entry in the FAT represents the root directory of the partition. The root directory is special because it is exactly 32 sectors long, enough room for 512 entries. For this reason, you can only put about 500 files and directories at the root of a FAT partition. As you are no doubt aware, FAT and FAT32 support long filenames by robbing directory entries. This can quickly absorb many directory entries if you use long filenames in the root directory, greatly limiting the total number of files and folders you can store at root.

The size of the FAT table itself is determined by the number of clusters in the partition. The larger the partition, the more entries are needed in the FAT. For example, the FAT on a 2GB partition with the default 32K cluster size would take up 128K of disk space. The FAT mirror would also use 128K. This is a pretty efficient use of disk space when compared with FAT32 and NTFS, but the payoff comes in reduced reliability, slower performance for some drive operations, performance degradation in the face of even minimal fragmentation, and the lack of security and journaling features.
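That 128K figure is simple arithmetic; a Python sketch:

```python
# Sketch: sizing the FAT for the 2GB / 32K-cluster example above.
def fat16_table_bytes(volume_bytes, cluster_size):
    clusters = volume_bytes // cluster_size
    return clusters * 2            # each FAT16 entry is 2 bytes

table = fat16_table_bytes(2 * 1024**3, 32 * 1024)
print(table, "bytes =", table // 1024, "K")   # 131072 bytes = 128K
```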

Cluster Maps

The FAT is actually just a big cluster map where each cluster in the partition is represented by a 16-bit (2-byte) entry. If there are 65,535 clusters in the volume, there would be 65,535 2-byte entries in the FAT. (FAT12 packs the bits for storage economy.) Figure shows the layout of a few cluster entries in a FAT16 table.

Figure. Diagram of typical FAT cluster mappings for a FAT16 file system.


The first two FAT entries are reserved and represent the root of the partition. The next entry represents the first file or folder created in the volume. In the diagram, this cluster is empty because the original file or folder has been deleted.

An empty cluster has a value of 0000. When you create a file or folder, the file system selects an empty cluster and writes data to the disk in that location. If the file or folder spills over into another cluster, the value of the FAT entry for the first cluster contains the number of the next cluster. This is called a cluster chain. The final cluster assigned to a file is identified with an end-of-file marker, FFFF.

Ideally, each cluster used by a file comes directly after the preceding cluster on the disk. Such a file is said to be contiguous. When you delete a file or folder, the FAT entry is set to 0000. This indicates that the cluster is available. As you add and delete files, the file system reuses empty clusters. This results in fragmentation.

Figure shows a portion of the FAT with a fragmented file. The cluster map shows a file that starts at cluster location 08. The file is too big to fit in one cluster and the next contiguous cluster already has a file in it. The file system driver selected the next available empty cluster and continued the file from there. Remember that the number in a FAT entry points at the next cluster in the chain, not the current cluster.

Figure. Cluster map for a fragmented file.


When the Fastfat driver delivers a file to the I/O Manager in the Windows Server 2003 Executive, it must "walk the chain" of FAT entries to locate all the cluster numbers associated with the file. The drive head must then travel out across the disk and buffer up the data, put it in order, then spool off the results to the guy upstairs. If the files and folders in the partition are heavily fragmented, it takes much more effort from the disk subsystem to collect the clusters. This impacts performance.
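The chain walk itself is trivial; the cost is in the head movement it triggers. A Python sketch, where a dict stands in for the on-disk FAT16 table and the cluster numbers are hypothetical:

```python
# Sketch: "walking the chain" through a FAT16 cluster map. A dict stands in
# for the on-disk table; 0xFFFF is the end-of-file marker.
EOF16 = 0xFFFF

def walk_chain(fat, start_cluster):
    chain = [start_cluster]
    while fat[chain[-1]] != EOF16:
        chain.append(fat[chain[-1]])
    return chain

# A fragmented file starting at cluster 8: cluster 9 belongs to another
# file, so the chain jumps to cluster 10.
fat = {8: 10, 9: EOF16, 10: 11, 11: EOF16}
print(walk_chain(fat, 8))    # [8, 10, 11]
```

Every non-contiguous jump in that list costs a seek on a real disk.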

As we'll see, Windows 2000 introduced a built-in defragmentation utility that can put the FAT and the associated disk clusters back in apple-pie order. The same defragger is supplied in Windows Server 2003 and XP. You can schedule defragmentation in Windows Server 2003, something that required a third-party utility in Windows 2000.

FAT Directories

The file system cannot locate a file by its name simply by looking at the cluster map in the FAT. Finding a particular file by its name requires an index that shows the filename and the number of the first cluster in the file as listed in the FAT. That index is called a directory or a folder.

In addition to filenames, a FAT directory entry contains a single byte that defines the file's attributes (Read-Only, Hidden, System, and Archive) and a timestamp to show when the file was created. Directory entries are placed into a disk cluster just as if they were files. Figure shows a diagram of disk clusters that contains a set of directory entries.

Figure. Disk clusters with fragmented directory entries.


A directory can become fragmented, like the example in the figure, when the number of name entries exceeds the size of the cluster. This is another reason large FAT volumes need large cluster sizes.

If you add several files to a directory and the directory entry cannot grow into a contiguous cluster, the directory becomes fragmented. This significantly degrades performance. You can imagine the kind of work it takes for the file system to assemble fragmented directories and their linked files to display the results in Explorer.
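For the curious, the fixed fields of a 32-byte short-name directory entry can be decoded in a few lines of Python. The offsets follow Microsoft's "FAT: General Overview of On-Disk Structure" paper mentioned in the sidebar; the sample entry below is fabricated:

```python
# Sketch: decoding the fixed fields of a 32-byte FAT short-name directory
# entry. Offsets per Microsoft's FAT on-disk structure paper.
def parse_dirent(entry):
    assert len(entry) == 32
    return {
        "name": entry[0:11].decode("ascii"),   # 8-char name + 3-char extension
        "attributes": entry[11],               # Read-Only, Hidden, System, ...
        "first_cluster": int.from_bytes(entry[26:28], "little"),
        "size": int.from_bytes(entry[28:32], "little"),
    }

raw = (b"README  TXT"                   # "README.TXT", space-padded
       + bytes([0x20])                  # attribute byte: Archive
       + bytes(14)                      # timestamps and reserved fields
       + (8).to_bytes(2, "little")      # starting cluster
       + (1000).to_bytes(4, "little"))  # file size in bytes
info = parse_dirent(raw)
print(info)
```

The starting-cluster field is the link into the FAT chain described earlier.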

FAT Partition and File Sizes

Due to real-mode memory limits, the cluster size for a FAT partition under DOS and Windows 9x is limited to a power of 2 less than the address limit. Therefore, the maximum cluster size is 2^15 bytes, or 32KB. The maximum size of a FAT partition under DOS and Windows 95, then, is 65,535 * 32KB, or about 2GB.

FAT under Windows Server 2003 (which boots into protected mode before the Fastfat file system driver loads) has a cluster size limit of 2^16 bytes, or 64KB per cluster.

FAT supports 2^16 clusters on a volume, but it reserves 12 clusters for special use, leaving 65,524 clusters for storing files and folders. In practice, it is nearly impossible to select a partition size that would yield exactly the maximum size for a given cylinder alignment. Gaining a few clusters of theoretical capacity is not worth the effort.

The maximum size of a FAT partition on Windows Server 2003 (and any member of the NT family) is about 4GB (65,524 clusters at 64K per cluster). DOS and Windows 9x cannot access a partition with a cluster size larger than 32K, so avoid 64K clusters when running a dual-boot machine.

FAT file sizes are specified by a value in the directory entry. This value uses a 32-bit word, so file sizes are limited to 2^32 bytes, or 4GB. The actual limit is one byte shy of a full 4GB, or 4,294,967,295 bytes, because a 32-bit word filled with 1s is 0xFFFFFFFF.

You can verify this experimentally using Fsutil, a command-line utility included with Windows Server 2003. It permits you to create files of any length down to the nearest byte. In the experiment, create a FAT32 partition comfortably larger than 4GB, then issue the following command at the root of the partition (the filename is arbitrary):

fsutil file createnew testfile.txt 4294967296

You will get an error saying that insufficient disk space exists. Subtract one from the size and the file will be created with no errors.
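The same boundary can be checked with trivial arithmetic:

```python
# Sketch: the 32-bit size field caps FAT file sizes one byte shy of 4GB.
SIZE_FIELD_BITS = 32
max_fat_file = 2**SIZE_FIELD_BITS - 1
print(max_fat_file)          # 4294967295
print(hex(max_fat_file))     # 0xffffffff
```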


FAT32 Partition and File Sizes

FAT32 was introduced to overcome some of the more glaring deficiencies in FAT. The most significant difference is the FAT32 cluster map, which uses 32-bit words to identify clusters rather than 16-bit words. This significantly increases the number of clusters that can be addressed. The first 4 bits of each 32-bit cluster address are reserved, so the maximum number of clusters is 2^28. Coupled with the maximum FAT32 cluster size under Windows Server 2003 of 64K, this yields a theoretical volume size of 2^44 bytes, or 16TB (terabytes).

Now for practicalities. The size of any MBR-based disk partition is defined by a Volume Size value in the partition table. This value specifies the number of sectors assigned to the partition without regard to the file system that formats the partition. The Volume Size value is a 32-bit word, so the maximum size of an MBR-based partition is 2^32 sectors, or 2TB (terabytes).

If you use Dynamic disks in an IA32 system or GPT disks in an IA64 system, you avoid this partition table limit and a volume can grow to its theoretical limit. However, Windows Server 2003 will refuse to format a FAT32 volume larger than 32GB.

You can format a FAT32 volume under Windows 98 or ME to a size larger than 32GB and put it in a Windows Server 2003 machine, and the Fastfat driver can read and write to it. The maximum practical FAT32 volume size under ME is limited to 2TB because of the Volume Size entry in the boot sector.

FAT Size and Disk Efficiencies

The FAT itself is much, well, fatter under FAT32. The size of the FAT in FAT32 roughly doubles that of FAT16 because each cluster entry requires four bytes rather than two. Because the cluster chains in FAT are prone to breakage and corruption, a large FAT32 structure requires much care and maintenance. For the most part, you'll get equivalent performance and much better reliability using NTFS. If your previous experience with NTFS has been on older platforms, you should evaluate the performance improvements in NTFS 3.1. They are significant.

FAT and FAT32 Weaknesses

FAT and its cousin, FAT32, are kind of like a vaudeville act. They're notable for their longevity more than any remaining entertainment value they might have in them. Here are their primary weaknesses, most of which are corrected by NTFS:

  • The FAT location is fixed in FAT12 and FAT16 partitions. If a sector containing a piece of the FAT fails, the system must fall back on the mirrored FAT. If both FAT tables become inoperable, which could happen because of their proximity, the entire file system becomes unmountable. FAT32 avoids this fixed location weakness.

  • The number of files at the root of a FAT partition is limited to 512. The FAT32 limit is 65,534 files. Both FAT and FAT32 support long filenames by using additional directory entries, so you can exhaust the supply of root directory entries by using long filenames.

  • The limited addressing space of FAT16 wastes drive space by forcing the use of large cluster sizes. FAT32 increases the number of addressable clusters, permitting smaller clusters on big volumes, but not to the extent that NTFS does.

  • The chain of FAT entries for a file or directory can become broken or corrupted. FAT also lends itself to truly stupendous fragmentation problems. This is also true for FAT32. NTFS deals with fragmentation more gracefully so the performance degradation is much less severe.

  • The Fastfat driver is not as efficient as the NTFS driver for random file lookups and large files.

  • FAT and FAT32 have no security. Anyone with access to the machine can access any of the files.

  • The compression mechanism used for FAT and FAT32 partitions under Windows 9x is clumsy and a major source of support problems. DriveSpace volumes are not supported under Windows Server 2003. If you need to save space by compressing files on a Windows Server 2003 or XP machine, you'll need to convert to NTFS.

NTFS Cluster Addressing and Sizes

NTFS sets aside a 64-bit word for cluster numbering, but all implementations of NTFS limit the address to the first 32 bits. At the maximum cluster size of 64K, this yields a maximum volume size of 2^48 bytes, or 256TB (terabytes). The partition size limit for MBR disks remains at 2TB, even under NTFS, thanks to the maximum size specified in the partition table. This limit can be overcome with dynamic disks or by using GPT disks on IA64 machines.
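A quick Python sketch of that addressing arithmetic:

```python
# Sketch: NTFS and MBR addressing limits as arithmetic.
SECTOR_BYTES = 512
CLUSTER_64K = 64 * 1024

ntfs_max_volume = 2**32 * CLUSTER_64K      # 32-bit cluster numbers, 64K clusters
mbr_max_partition = 2**32 * SECTOR_BYTES   # 32-bit sector count in partition table

print(ntfs_max_volume // 2**40)    # 256 (terabytes)
print(mbr_max_partition // 2**40)  # 2 (terabytes)
```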

The maximum default cluster size for NTFS is 4K, but if you have no need for compression, you can select a larger cluster size when formatting a partition.

The maximum NTFS file size is artificially constrained from the theoretical maximum of 2^64 bytes to an actual maximum of 2^44 bytes, or about 16TB. This was considered an outrageously large file when NTFS was first developed, but if you extrapolate the growth of current storage solutions, it won't be long before some high-end Intel-based servers start to nibble at that 16TB limit. Microsoft has not stated what its strategy is for these types of files, but rumor has it that a future successor to Windows Server 2003 might sport a new file system capable of handling humongous files.

NTFS Structure

Unlike the cluster map used by FAT and FAT32 to locate files, NTFS uses a true database called the Master File Table, or MFT.

The MFT consists of a set of fixed-length records, 1KB apiece. Each record holds a set of attributes that, taken together, uniquely identify the location and contents of a corresponding file, folder, or file system component. (There are a few minor exceptions to this "one record, one file" rule, but they are encountered only when a file gets very large and very fragmented and are not typically a concern.)
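The record header can be sketched in a few lines of Python. The field offsets here follow the Linux-NTFS project's published layout (the community documentation mentioned in the sidebar), not an official Microsoft specification, and the sample record is fabricated:

```python
# Sketch: checking the header of a 1K MFT file record. Offsets per the
# Linux-NTFS project's layout: signature at offset 0, flags word at 0x16.
def parse_mft_header(record):
    flags = int.from_bytes(record[0x16:0x18], "little")
    return {
        "valid": record[0:4] == b"FILE",   # every live record starts "FILE"
        "in_use": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
    }

# A fabricated in-use directory record, padded to the full 1K record size.
fake = b"FILE" + bytes(0x12) + (0x03).to_bytes(2, "little") + bytes(1024 - 0x18)
print(parse_mft_header(fake))
```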

There are three classes of MFT records:

  • File records. These records store information typically thought of as "data" such as application files, driver files, system files, database files, and so forth.

  • Directory records. These records store and index filenames. Directory records are also used to index other attributes such as reparse points, security descriptors, and link tracking information.

  • Metadata records. These records control the structure and content of the Master File Table itself. Some of them use a file record structure. Others use a directory record structure. Still others use a unique set of attributes and a unique record structure.

Let's take a look at how each of these records is structured to get an idea of how the file system operates.

Metadata Records

Figure shows the hierarchy of the metadata records, if you think of them as representing files and folders. The record names start with a dollar sign, $.

Figure. MFT metadata records.


Metadata records are not exposed to the UI or to the command line. You can see the space set aside for them by running CHKDSK. Following is a sample listing. The indexes, system, and log file entries show the space taken up by the metadata files:

635008 kilobytes total disk space.
535691 kilobytes in 8206 user files.
1894 kilobytes in 592 indexes.
14176 kilobytes in use by the system.
7647 kilobytes in use by the log file.
83247 kilobytes available on disk.

As you can see, NTFS takes a significant chunk out of a small volume. On volumes in excess of 10GB or so, though, the percentage of total space consumed by NTFS is smaller than FAT32 when cluster sizes are equal.

The order and structure of the metadata records are rigorously controlled because of the impact their location has on performance and functionality. For example, the MFT contains a record representing the partition boot sector. The secondary bootstrap loader, Ntldr, loads the first few metadata records in the MFT and uses that information to locate and read the boot sector so it can mount the rest of the file system.

The MFT is a file, just like any other file. The first record in the MFT, then, is a file record representing the MFT itself. This is the $MFT metadata record. If the sector holding the $MFT record gets damaged or is otherwise unreadable, the operating system would fail to start. To prevent this from happening, the first four MFT records ($MFT, $MFTMirr, $LogFile, and $Volume) are copied to the middle of the volume. If Ntldr cannot open these records in their normal location, it loads the mirrored copies and uses information in the $MFTMirr record to learn the contents of the damaged sector so it can mount the file system.

NTFS 1.2, the version used in NT4, used the first 16 records of the MFT to hold metadata. Only the first 11 records actually contained any information, with the rest reserved for future use. NTFS 3.0 and later set aside the first 26 records for metadata and use 15 of them.

You may encounter the names of these metadata records in a variety of scenarios. They appear most often in error messages, especially when the file system gets gravely ill. For the most part, though, knowing their names and functions is like knowing the names of the bones in your body. It helps you pinpoint problems where otherwise you would only be able to give vague references.

Here is a quick list of the metadata records and their functions, with more detail offered later in the chapter:

  • $MFT. NTFS treats everything on the disk as a file, including the Master File Table itself. The $MFT record points at the MFT. The $MFT is a hybrid record type, containing both data and directory attributes. The directory attributes include a bitmap that lays out the MFT structure and shows which records are in use and which aren't.

  • $MFTMirr. NTFS mirrors the first few MFT records at the halfway point of the volume. If the cluster holding the primary $MFT record goes bad, the system uses the mirror to find the remaining portions of the MFT. The $MFTMirr is a standard MFT file record with a pointer at the location of the mirrored $MFT record.

  • $LogFile. This record contains the location of the log files used to support NTFS transaction tracking. NTFS writes updates to critical MFT records in a log prior to committing them to disk. The $LogFile record points at two log files, one primary and one mirror, located at the middle of the volume.

  • $Volume. This record contains the name of the NTFS volume, its size, and the version of NTFS used to format it. The $Volume record uses a unique format with specialized attributes. It also contains a flag used to indicate if the file system shut down abnormally.

  • $AttrDef. This record points at a file that lists the attributes supported by NTFS along with information about them, such as whether they must remain in the MFT or can be stored elsewhere on the disk. The $AttrDef record uses a standard file record format.

  • $\. This record contains the root directory of the file system. The root directory is crucial to the integrity of the file system because all other directories refer to it. The $\ record uses a standard directory record format.

  • $BitMap. Just like FAT and FAT32, NTFS maintains a cluster map of a volume so that it can quickly determine the location of unused clusters. The NTFS cluster map uses individual bits rather than bytes, so huge volumes can be mapped in a compact structure. The $BitMap record uses a standard file record format. The section, "Performance Enhancements," found later in the chapter, describes changes made to this record in NTFS 3.1.

  • $Boot. This file points at the partition boot sector at cluster 0. Ntldr uses this information to mount the file system. There are two data attributes in this record. One points at the main boot sector and one points at a mirrored boot sector at the end of the partition.

  • $BadClus. This file contains the location of any bad clusters identified during initial formatting or subsequent operation. (If you choose the quick format option, the system does not scan for bad clusters and nothing is initially written to this record.) The $BadClus record uses a standard file record format.

  • $Secure. This record was introduced in NTFS 3.0. It contains the security descriptors for all MFT records. Aggregating the security descriptors into one place rather than scattering them in the individual MFT records improves performance and enables features such as permission inheritance. The $Secure record uses a special format containing both data and index components.

  • $UpCase. This file contains a map of lowercase Unicode characters to their uppercase equivalents. The $UpCase record uses a standard file record format.

  • $Extend. This record was introduced in NTFS 3.0. It forms a folder that contains the additional metadata records added in NTFS 3.0: $Quota, $ObjID, $Reparse, and the Change Journal, $UsnJrnl. The $Extend record uses a standard directory record format.

  • $Quota. The $Quota record is a directory that holds records for user SIDs and the files they own. It supports assigning space on a volume based on quotas. The $Quota record has existed in NTFS since its inception but was not implemented until NTFS 3.0 (Windows 2000). The $Quota record uses a standard directory structure.

  • $ObjID. Another record introduced in NTFS 3.0, $ObjID contains an index of files and folders that contain Globally Unique Identifiers (GUIDs). This index is used by the Link Tracking Service to locate source files for OLE links such as shortcuts and compound documents.

  • $Reparse. This record was also introduced in NTFS 3.0. It stores information about reparse points. A reparse point redirects a calling process to an alternate data repository such as another folder or file, or even a separate file system such as a CD-ROM, DVD, or tape drive.

  • $UsnJrnl. Another new record in NTFS 3.0, this is a file record that contains the Change Journal. The Change Journal tracks when files are changed so that applications do not need to scan the entire file system. An example of an application that makes use of the Change Journal is the Content Indexing service.
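To make the role of $BitMap concrete, here is a minimal sketch in Python of how a one-bit-per-cluster map lets a file system locate free space quickly. This is an illustration of the concept only, not NTFS code or the on-disk $BitMap format; the function name and layout are hypothetical.

```python
# Toy cluster bitmap: one bit per cluster, set bit = cluster in use.
# (Illustrative only -- not the actual NTFS $BitMap on-disk format.)

def find_free_clusters(bitmap: bytes, count: int) -> list:
    """Return the first `count` cluster numbers whose bits are clear."""
    free = []
    for cluster in range(len(bitmap) * 8):
        byte, bit = divmod(cluster, 8)
        if not (bitmap[byte] >> bit) & 1:   # bit clear -> cluster unused
            free.append(cluster)
            if len(free) == count:
                return free
    raise OSError("volume full")

# Example: clusters 0-3 allocated (0b00001111), the rest free.
print(find_free_clusters(bytes([0b00001111, 0x00]), 3))  # [4, 5, 6]
```

Because each cluster costs only one bit, a 100GB volume with 4K clusters needs just over 3MB of bitmap, which is why even huge volumes can be mapped compactly.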

The "Dirty" Flag

NTFS keeps a lot of data in cache waiting for convenient times to commit changes to disk. If you interrupt power (or if the system locks up), there is a possibility that some critical data might not have been saved. This missing data could conceivably compromise the file system.

When there are uncommitted pages in memory, NTFS sets a flag in the $Volume record. This is commonly called the "dirty" flag. While the flag is set, the disk and the cache are not in a consistent state. When all uncommitted pages have been flushed to disk, NTFS clears the dirty flag.
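The protocol is simple to model. Here is a toy Python sketch of the dirty-flag idea; the class and method names are invented for illustration and greatly simplify what NTFS actually does.

```python
# Toy model of the "dirty" flag protocol: the flag is set while cached
# changes are pending and cleared only after every change reaches disk.
# (Simplified illustration -- not real NTFS internals.)

class Volume:
    def __init__(self):
        self.disk = {}       # committed state
        self.cache = {}      # pending (uncommitted) writes
        self.dirty = False   # the $Volume dirty flag

    def write(self, name, data):
        self.dirty = True    # flag goes on *before* the change sits in cache
        self.cache[name] = data

    def flush(self):
        self.disk.update(self.cache)
        self.cache.clear()
        self.dirty = False   # cache and disk agree again

    def needs_autochk(self):
        # After a crash, a set flag means disk and cache may disagree.
        return self.dirty
```

If the machine dies between `write` and `flush`, the flag is still set on disk, which is exactly the condition that triggers the boot-time check described next.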

If you start a system following a catastrophic event and the dirty flag is set, the system pauses before initializing the operating system and runs a special boot-time instance of CHKDSK called AUTOCHK. You can watch the results of this file system check in the console window. If the system finds errors, you'll be warned in the console and in the Event log, assuming that the system gets to the point where you can log on.

Bad Cluster Mapping

If NTFS tries to write to a cluster and fails, it marks the cluster as bad, maps the cluster number to a spare cluster somewhere else on the drive, and writes the data to the remapped cluster.

The cluster containing the bad sector is marked as bad by entering its Logical Cluster Number in the $BadClus metadata record. Each entry in $BadClus takes the form of a named data stream with a pointer to the bad cluster. In essence, the system blocks access to the cluster by assigning a file to it.

Bad cluster mapping is not required for SCSI drives, which are intrinsically self-repairing. If a sector goes bad (cannot be written) on a SCSI drive, the drive controller maps the sector address to one of a set of spare sectors set aside for this purpose. This sector sparing feature helps to keep the drive functioning normally as it ages. If NTFS is unable to write to a cluster on a SCSI drive, it first waits for the SCSI drive to handle the situation via sector sparing. If the drive runs out of spare sectors, NTFS uses bad cluster mapping. For IDE, USB, and FireWire drives, NTFS uses bad cluster mapping exclusively.
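The remapping logic amounts to a lookup table consulted before each I/O. The following Python sketch illustrates the idea; the structures and names are hypothetical and not the on-disk $BadClus format, which actually stores named data streams.

```python
# Sketch of bad cluster mapping: on a write failure, record the bad
# Logical Cluster Number (LCN) and redirect future I/O to a spare cluster.
# (Hypothetical structures -- not the on-disk $BadClus record format.)

class ClusterMapper:
    def __init__(self, spares):
        self.spares = list(spares)   # clusters reserved for remapping
        self.bad_map = {}            # bad LCN -> replacement LCN

    def resolve(self, lcn):
        """Translate a logical cluster number, following any remap."""
        return self.bad_map.get(lcn, lcn)

    def mark_bad(self, lcn):
        """Retire a failed cluster and return its replacement."""
        if not self.spares:
            raise OSError("no spare clusters left")
        self.bad_map[lcn] = self.spares.pop(0)
        return self.bad_map[lcn]
```

This mirrors the division of labor described above: a SCSI drive performs the equivalent of `mark_bad` in its own firmware via sector sparing, and NTFS only steps in when the drive cannot.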

NTFS and Removable Drives

You can format a removable drive such as a ZIP, Jaz, or Orb drive using NTFS, but only if the drive itself has been configured to require safe removal instead of permitting it to be jerked from the system. This option is a property of the device driver, and you can set it using the Properties page for the device. Select the Hardware tab, then highlight the removable storage device and click Properties to open the Properties window for the device. Select the Policies tab. Figure shows an example.

Figure. Properties window for a removable storage device showing the Policies tab where the write caching options are selected.


The Write Caching and Safe Removal field shows the caching setting for the device. The default setting is Optimize for Quick Removal. This disables write caching and permits you to snatch the drive from the system at any time. The Optimize for Performance option requires that you use the Safe Removal icon in the notification area of the taskbar to stop the device before removing it.

You're purely on the honor system, of course. If you jerk the USB or FireWire cable from the interface, how can the machine stop you? Still, this is a means of telling you to live by the rules, and it's why write caching is disabled by default: it limits the data you can lose.

Performance Enhancements

One of the advantages to merging the consumer and corporate code bases into Windows XP was that it forced Microsoft to confront performance issues head on, especially when it comes to disk I/O. Traditionally, NTFS has taken a back seat to FAT32 in raw performance. After all, with all the security and reliability overhead in NTFS, it's tough to compete against what is essentially a big table lookup engine.

To meet the performance expectations of FAT32 users while retaining the reliability of NTFS, Microsoft had to work on lots of little details. At a micro level, Microsoft reworked the MFT record headers to make them fit into even byte boundaries and reordered the information a little. This reduced the work necessary to read a record header. A small thing, to be sure, but the file system reads header information a bazillion times a day, so if you can shave a few clock ticks here and there, it adds up.

At a macro level, Microsoft modified the on-disk placement of two key metadata records: $Bitmap and $Logfile. Here's why their placement matters:

  • The $Bitmap record essentially performs the same function as a FAT. It keeps a map of used clusters so that the NTFS driver can quickly locate empty space. The location of this record is critical because the hard drive heads have to pick up information from the record frequently.

  • The $Logfile record is also critical because it supports file system journaling, a key reliability feature compared to non-journaled file systems such as FAT and FAT32.

Also, the MFT was moved from its traditional spot near the start of the volume to a location about one-third of the way in from the start of the volume. This gets it out of the way of application files, which NTFS 3.1 jockeys to the start of the volume to take advantage of the higher throughput. The MFT mirror remains at the middle of the drive.

If you format a partition as NTFS during Setup, or convert a FAT/FAT32 partition formatted by Windows Server 2003 or XP, these two critical metadata records are placed in their proper location. If you upgrade from a previous version of Windows and then convert, this performance enhancement is not applied.

Application file placement plays a role in perceived and actual performance. Windows Server 2003 monitors the applications run by the system and by the users. Every three days, the system defrags critical files and places them at the prime real estate near the start of the volume. This is done during idle times, so don't be surprised if you see a lot of commotion on a hard drive in the evening.

NTFS File Journaling

NTFS caches a considerable amount of data in memory while waiting for the opportunity to commit it to disk. If you've ever had the unfortunate experience of powering down a machine unexpectedly, you know what it's like to face a long and tedious wait at restart while the system runs AUTOCHK.

There is a possibility that AUTOCHK might be unable to reconstruct the contents of critical metadata records. If this were to happen, the file system would be unmountable and you would be spending a very long day restoring the volume from tape.

To prevent this from happening, the metadata files are journaled. That is, any changes made to MFT records are first written to a log file at the center of the disk. Then, at some later time, the entries are transferred from the log file to the main MFT.

This transfer from the log file to the MFT is handled by the Log File Service, or LFS. Each transfer is done as an atomic transaction so that if it is interrupted during the transfer, records are not left in an inconsistent state.

Journaling the file system makes it possible to recover the MFT quickly following an unexpected loss of power or a system lockup. All that AUTOCHK needs to do is verify the integrity of the MFT and then replay the uncommitted journal entries.
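The write-ahead pattern described above can be sketched in a few lines of Python. This is a conceptual model only; the class and method names are invented, and the real Log File Service record format is far more involved.

```python
# Simplified write-ahead-log sketch of metadata journaling: changes hit
# the log first, and recovery replays any entries that never reached the
# main table. (Illustration only -- not the real LFS/$LogFile format.)

class JournaledTable:
    def __init__(self):
        self.log = []        # write-ahead journal entries
        self.table = {}      # the "MFT": committed metadata

    def update(self, record, value):
        self.log.append((record, value))   # 1. journal the change first

    def checkpoint(self):
        for record, value in self.log:     # 2. apply logged entries
            self.table[record] = value
        self.log.clear()                   # 3. entries are now committed

    def recover(self):
        # After a crash, replaying the surviving log restores consistency.
        self.checkpoint()
```

The key property is that an update is never applied to the table before it exists in the log, so a crash at any point leaves either a replayable log entry or a fully committed record, never a half-written one.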
