Index: projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/filesystems/chapter.xml =================================================================== --- projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/filesystems/chapter.xml (revision 42472) +++ projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/filesystems/chapter.xml (revision 42473) @@ -1,1626 +1,1667 @@ Tom Rhodes Written by File Systems Support Synopsis File Systems File Systems Support File Systems File systems are an integral part of any operating system. They allow users to upload and store files, provide access to data, and make hard drives useful. Different operating systems differ in their native file system. Traditionally, the native &os; file system has been the Unix File System UFS which has been modernized as UFS2. Since &os; 7.0, the Z File System ZFS is also available as a native file system. In addition to its native file systems, &os; supports a multitude of other file systems so that data from other operating systems can be accessed locally, such as data stored on locally attached USB storage devices, flash drives, and hard disks. This includes support for the &linux; Extended File System (EXT) and the &microsoft; New Technology File System (NTFS). There are different levels of &os; support for the various file systems. Some require a kernel module to be loaded and others may require a toolset to be installed. Some non-native file system support is full read-write while others are read-only. After reading this chapter, you will know: The difference between native and supported file systems. Which file systems are supported by &os;. How to enable, configure, access, and make use of non-native file systems. Before reading this chapter, you should: Understand &unix; and &os; basics. Be familiar with the basics of kernel configuration and compilation. Feel comfortable installing software in &os;. Have some familiarity with disks, storage, and device names in &os;. 
The Z File System (ZFS) The Z file system, originally developed by &sun;, is designed to future-proof the file system by removing many of the arbitrary limits imposed on previous file systems. ZFS allows continuous growth of the pooled storage by adding additional devices. ZFS allows you to create many file systems (in addition to block devices) out of a single shared pool of storage. Space is allocated as needed, so all remaining free space is available to each file system in the pool. It is also designed for maximum data integrity, supporting data snapshots, multiple copies, and cryptographic checksums. It uses a software data replication model, known as RAID-Z. RAID-Z provides redundancy similar to hardware RAID, but is designed to prevent data write corruption and to overcome some of the limitations of hardware RAID. ZFS Features and Terminology ZFS is a fundamentally different file system because it is more than just a file system. ZFS combines the roles of file system and volume manager, enabling additional storage devices to be added to a live system and having the new space available on all of the existing file systems in that pool immediately. By combining the traditionally separate roles, ZFS is able to overcome previous limitations that prevented RAID groups from growing. Each top level device in a zpool is called a vdev, which can be a simple disk or a RAID transformation such as a mirror or RAID-Z array. ZFS file systems (called datasets) each have access to the combined free space of the entire pool. As blocks are allocated, the free space available to each file system in the pool decreases. This approach avoids the common pitfall with extensive partitioning where free space becomes fragmented across the partitions. - zpool + zpool A storage pool is the most basic building block of ZFS. A pool is made up of one or more vdevs, the underlying devices that store the data. 
A pool is then used to create one or more file systems (datasets) or block devices (volumes). These datasets and volumes share the pool of remaining free space. Each pool is uniquely identified by a name and a GUID. The zpool also controls the version number and therefore the features available for use with ZFS. &os; 9.0 and 9.1 include support for ZFS version 28. Future versions use ZFS version 5000 with feature flags. This allows greater cross-compatibility with other implementations of ZFS. - vdev Types + vdev Types A zpool is made up of one or more vdevs, which themselves can be a single disk or a group of disks, in the case of a RAID transform. When multiple vdevs are used, ZFS spreads data across the vdevs to increase performance and maximize usable space. - + Disk - The most basic type of vdev is a standard block device. This can be an entire disk (such as /dev/ada0 or /dev/da0) or a partition (/dev/ada0p3). Contrary to the Solaris documentation, on &os; there is no performance penalty for using a partition rather than an entire disk. - + File - In addition to disks, ZFS pools can be backed by regular files; this is especially useful for testing and experimentation. Use the full path to the file as the device path in the zpool create command. All vdevs must be at least 128 MB in size. - + Mirror - When creating a mirror, specify the mirror keyword followed by the list of member devices for the mirror. A mirror consists of two or more devices; all data is written to all member devices. A mirror vdev will only hold as much data as its smallest member. A mirror vdev can withstand the failure of all but one of its members without losing any data. A regular single disk vdev can be upgraded to a mirror vdev at any time using the zpool attach command. 
- + RAID-Z - ZFS implements RAID-Z, a variation on standard RAID-5 that offers better distribution of parity and eliminates the "RAID-5 write hole" in which the data and parity information become inconsistent after an unexpected restart. ZFS supports three levels of RAID-Z which provide varying levels of redundancy in exchange for decreasing levels of usable storage. The types are named RAID-Z1 through Z3 based on the number of parity devices in the array and the number of disks that the pool can operate without. In a RAID-Z1 configuration with 4 disks, each 1 TB, usable storage will be 3 TB and the pool will still be able to operate in degraded mode with one faulted disk. If an additional disk goes offline before the faulted disk is replaced and resilvered, all data in the pool can be lost. In a RAID-Z3 configuration with 8 disks of 1 TB, the volume would provide 5 TB of usable space and still be able to operate with three faulted disks. Sun recommends no more than 9 disks in a single vdev. If the configuration has more disks, it is recommended to divide them into separate vdevs and the pool data will be striped across them. A configuration of 2 RAID-Z2 vdevs consisting of 8 disks each would create something similar to a RAID 60 array. A RAID-Z group's storage capacity is approximately the size of the smallest disk, multiplied by the number of non-parity disks. 4x 1 TB disks in Z1 have an effective size of approximately 3 TB, and an 8x 1 TB array in Z3 will yield 5 TB of usable space. - + Spare - ZFS has a special pseudo-vdev type for keeping track of available hot spares. Note that installed hot spares are not deployed automatically; they must manually be configured to replace the failed device using the zpool replace command. - + Log - ZFS Log Devices, also known as ZFS Intent Log (ZIL), move the intent log from the regular pool devices to a dedicated device. 
The ZIL accelerates synchronous transactions by using storage devices (such as SSDs) that are faster than those used for the main pool. When data is being written and the application requests a guarantee that the data has been safely stored, the data is written to the faster ZIL storage, then later flushed out to the regular disks, greatly reducing the latency of synchronous writes. Log devices can be mirrored, but RAID-Z is not supported. When specifying multiple log devices, writes will be load balanced across all devices. - + Cache - Adding a cache vdev to a zpool will add the storage of the cache to the L2ARC. Cache devices cannot be mirrored. Since a cache device only stores additional copies of existing data, there is no risk of data loss. - Adaptive Replacement + Adaptive Replacement Cache (ARC) ZFS uses an Adaptive Replacement Cache (ARC), rather than a more traditional Least Recently Used (LRU) cache. An LRU cache is a simple list of items in the cache sorted by when each object was most recently used; new items are added to the top of the list and once the cache is full items from the bottom of the list are evicted to make room for more active objects. An ARC consists of four lists: the Most Recently Used (MRU) and Most Frequently Used (MFU) objects, plus a ghost list for each. These ghost lists track recently evicted objects to prevent them from being added back to the cache. This increases the cache hit ratio by avoiding objects that have a history of only being used occasionally. Another advantage of using both an MRU and MFU is that scanning an entire file system would normally evict all data from an MRU or LRU cache in favor of this freshly accessed content. In the case of ZFS, since there is also an MFU that only tracks the most frequently used objects, the cache of the most commonly accessed blocks remains. - L2ARC + L2ARC The L2ARC is the second level of the ZFS caching system. 
The primary ARC is stored in RAM, however since the amount of available RAM is often limited, ZFS can also make use of cache vdevs. Solid State Disks (SSDs) are often used as these cache devices due to their higher speed and lower latency compared to traditional spinning disks. An L2ARC is entirely optional, but having one will significantly increase read speeds for files that are cached on the SSD instead of having to be read from the regular spinning disks. The L2ARC can also speed up deduplication since a DDT that does not fit in RAM but does fit in the L2ARC will be much faster than if the DDT had to be read from disk. The rate at which data is added to the cache devices is limited to prevent prematurely wearing out the SSD with too many writes. Until the cache is full (the first block has been evicted to make room), writing to the L2ARC is limited to the sum of the write limit and the boost limit, then after that limited to the write limit. A pair of sysctl values control these rate limits; vfs.zfs.l2arc_write_max controls how many bytes are written to the cache per second, while vfs.zfs.l2arc_write_boost adds to this limit during the "Turbo Warmup Phase" (Write Boost). - Copy-On-Write + Copy-On-Write Unlike a traditional file system, when data is overwritten on ZFS the new data is written to a different block rather than overwriting the old data in place. Only once this write is complete is the metadata then updated to point to the new location of the data. This means that in the event of a shorn write (a system crash or power loss in the middle of writing a file) the entire original contents of the file are still available and the incomplete write is discarded. This also means that ZFS does not require a fsck after an unexpected shutdown. - Dataset + Dataset - + Dataset is the generic term for a ZFS file + system, volume, snapshot or clone. Each dataset will + have a unique name in the format: + poolname/path@snapshot. 
The root + of the pool is technically a dataset as well. Child + datasets are named hierarchically like directories; + for example, in mypool/home, the home + dataset is a child of mypool and inherits properties + from it. This can be expanded further by creating + mypool/home/user. This grandchild + dataset will inherit properties from the parent and + grandparent. It is also possible to set properties + on a child to override the defaults inherited from the + parents and grandparents. ZFS also allows + administration of datasets and their children to be + delegated. - Volume + Volume - In additional to regular file systems (datasets), + In addition to regular file system datasets, ZFS can also create volumes, which are block devices. Volumes have many of the same features, including copy-on-write, snapshots, clones and - checksumming. + checksumming. Volumes can be useful for running other + file system formats on top of ZFS, such as UFS, for + virtualization, or for exporting + iSCSI extents. - Snapshot + Snapshot The copy-on-write design of ZFS allows for nearly instantaneous consistent snapshots with arbitrary names. After taking a snapshot of a dataset (or a recursive snapshot of a parent dataset that will include all child datasets), new data is written to new blocks (as described above); however, the old blocks are not reclaimed as free space. There are then two versions of the file system, the snapshot (what the file system looked like before) and the live file system; however, no additional space is used. As new data is written to the live file system, new blocks are allocated to store this data. The apparent size of the snapshot will grow as the blocks are no longer used in the live file system, but only in the snapshot. These snapshots can be mounted (read only) to allow for the recovery of previous versions of files. 
It is also possible to roll back a live file system to a specific snapshot, undoing any changes that took place after the snapshot was taken. Each block in the zpool has a reference counter which indicates how many snapshots, clones, datasets or volumes make use of that block. As files and snapshots are deleted, the reference count is decremented; once a block is no longer referenced, it is reclaimed as free space. Snapshots can also be marked with a hold; once a snapshot is held, any attempt to destroy it will return an EBUSY error. Each snapshot can have multiple holds, each with a unique name. The release command removes the hold so the snapshot can then be deleted. Snapshots can be taken on volumes; however, they can only be cloned or rolled back, not mounted independently. - Clone + Clone Snapshots can also be cloned; a clone is a writable version of a snapshot, allowing the file system to be forked as a new dataset. As with a snapshot, a clone initially consumes no additional space; only as new data is written to a clone and new blocks are allocated does the apparent size of the clone grow. As blocks are overwritten in the cloned file system or volume, the reference count on the previous block is decremented. The snapshot upon which a clone is based cannot be deleted because the clone is dependent upon it (the snapshot is the parent, and the clone is the child). Clones can be promoted, reversing this dependency, making the clone the parent and the previous parent the child. This operation requires no additional space; however, it will change the way the used space is accounted. - Checksum + Checksum Every block that is allocated is also checksummed (the algorithm used is a per-dataset property; see zfs set). ZFS transparently validates the checksum of each block as it is read, allowing ZFS to detect silent corruption. 
If the data that is read does not match the expected checksum, ZFS will attempt to recover the data from any available redundancy (mirrors, RAID-Z). You can trigger the validation of all checksums using the scrub command. The available checksum algorithms include fletcher2, fletcher4, and sha256. The fletcher algorithms are faster, but sha256 is a strong cryptographic hash and has a much lower chance of collisions at the cost of some performance. Checksums can be disabled, but doing so is inadvisable. - Compression + Compression Each dataset in ZFS has a compression property, which defaults to off. This property can be set to one of a number of compression algorithms, which will cause all new data that is written to this dataset to be compressed as it is written. In addition to the reduction in disk usage, this can also increase read and write throughput, as only the smaller compressed version of the file needs to be read or written. LZ4 compression is only available after &os; 9.2. - Deduplication + Deduplication ZFS has the ability to detect duplicate blocks of data as they are written (thanks to the checksumming feature). If deduplication is enabled, instead of writing the block a second time, the reference count of the existing block will be increased, saving storage space. In order to do this, ZFS keeps a deduplication table (DDT) in memory, containing the list of unique checksums, the location of that block and a reference count. When new data is written, the checksum is calculated and compared to the list. If a match is found, the data is considered to be a duplicate. When deduplication is enabled, the checksum algorithm is changed to SHA256 to provide a secure cryptographic hash. ZFS deduplication is tunable; if dedup is on, then a matching checksum is assumed to mean that the data is identical. 
If dedup is set to verify, then the data in the two blocks will be checked byte-for-byte to ensure it is actually identical and if it is not, the hash collision will be noted by ZFS and the two blocks will be stored separately. Due to the nature of the DDT, having to store the hash of each unique block, it consumes a very large amount of memory (a general rule of thumb is 5-6 GB of RAM per 1 TB of deduplicated data). In situations where it is not practical to have enough RAM to keep the entire DDT in memory, performance will suffer greatly as the DDT will need to be read from disk before each new block is written. Deduplication can make use of the L2ARC to store the DDT, providing a middle ground between fast system memory and slower disks. It is advisable to consider using ZFS compression instead, which often provides nearly as much space savings without the additional memory requirement. - Scrub + Scrub In place of a consistency check like fsck, ZFS has the scrub command, which reads all data blocks stored on the pool and verifies their checksums against the known good checksums stored in the metadata. This periodic check of all the data stored on the pool ensures the recovery of any corrupted blocks before they are needed. A scrub is not required after an unclean shutdown, but it is recommended that you run a scrub at least once each quarter. ZFS compares the checksum for each block as it is read in the normal course of use, but a scrub operation makes sure even infrequently used blocks are checked for silent corruption. - Dataset - Quota + Dataset Quota ZFS provides very fast and accurate dataset, user and group space accounting in addition to quotas and space reservations. This gives the administrator fine-grained control over how space is allocated and allows critical file systems to reserve space to ensure other file systems do not take all of the free space. 
ZFS supports different types of quotas: the dataset quota, the reference quota (refquota), the user quota, and the group quota. Quotas limit the amount of space that a dataset and all of its descendants (snapshots of the dataset, child datasets and the snapshots of those datasets) can consume. Quotas cannot be set on volumes, as the volsize property acts as an implicit quota. - Reference + Reference Quota A reference quota limits the amount of space a dataset can consume by enforcing a hard limit on the space used. However, this hard limit includes only space that the dataset references and does not include space used by descendants, such as file systems or snapshots. - User - Quota + User + Quota User quotas limit the amount of space that can be used by the specified user. - - Group - Quota + Group + Quota The group quota limits the amount of space that a specified group can consume. - Dataset - Reservation + Dataset + Reservation The reservation property makes it possible to guarantee a minimum amount of space for the use of a specific dataset and its descendants. This means that if a 10 GB reservation is set on storage/home/bob and another dataset tries to use all of the free space, at least 10 GB of space remains reserved for this dataset. If a snapshot is taken of storage/home/bob, the space used by that snapshot is counted against the reservation. The refreservation property works in a similar way, except it excludes descendants, such as snapshots. Reservations of any sort are useful in many situations, such as planning and testing the suitability of disk space allocation in a new system, or ensuring that enough space is available on file systems for audio logs or system recovery procedures and files. - Reference - Reservation + Reference + Reservation The refreservation property makes it possible to guarantee a minimum amount of space for the use of a specific dataset excluding its descendants. 
This means that if a 10 GB reservation is set on storage/home/bob and another dataset tries to use all of the free space, at least 10 GB of space remains reserved for this dataset. In contrast to a regular reservation, space used by snapshots and descendant datasets is not counted against the reservation. As an example, if a snapshot was taken of storage/home/bob, enough disk space would have to exist outside of the refreservation amount for the operation to succeed because descendants of the main dataset are not counted by the refreservation amount and so do not encroach on the space set. - Resilver + Resilver - + When a disk fails and must be replaced, the new + disk must be filled with the data that was lost. This + process of calculating and writing the missing data + (using the parity information distributed across the + remaining drives) to the new drive is called + resilvering. What Makes ZFS Different - + ZFS is significantly different from any previous file + system because it is more than just a file + system. ZFS combines the traditionally separate roles of + volume manager and file system, which provides unique + advantages because the file system is now aware of the + underlying structure of the disks. Traditional file systems + could only be created on a single disk at a time; if there + were two disks, then two separate file systems would have to + be created. In a traditional hardware RAID + configuration, this problem was worked around by presenting + the operating system with a single logical disk made up of + the space provided by a number of disks, on top of which the + operating system placed its file system. Even in the case of + software RAID solutions like GEOM, the UFS + file system living on top of the RAID + transform believed that it was dealing with a single device. + ZFS's combination of the volume manager and the file system + solves this and allows the creation of many file systems all + sharing a pool of available storage. 
One of the biggest + advantages to ZFS's awareness of the physical layout of the + disks is that ZFS can grow the existing file systems + automatically when additional disks are added to the pool. + This new space is then made available to all of the file + systems. ZFS also has a number of different properties that + can be applied to each file system, creating many advantages + to creating a number of different filesystems and datasets + rather than a single monolithic filesystem. <acronym>ZFS</acronym> Quick Start Guide There is a start up mechanism that allows &os; to mount ZFS pools during system initialization. To set it, issue the following commands: &prompt.root; echo 'zfs_enable="YES"' >> /etc/rc.conf &prompt.root; service zfs start The examples in this section assume three SCSI disks with the device names da0, da1, and da2. Users of SATA hardware should instead use ada device names. Single Disk Pool To create a simple, non-redundant ZFS pool using a single disk device, use zpool: &prompt.root; zpool create example /dev/da0 To view the new pool, review the output of df: &prompt.root; df Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad0s1a 2026030 235230 1628718 13% / devfs 1 1 0 100% /dev /dev/ad0s1d 54098308 1032846 48737598 2% /usr example 17547136 0 17547136 0% /example This output shows that the example pool has been created and mounted. It is now accessible as a file system. Files may be created on it and users can browse it, as seen in the following example: &prompt.root; cd /example &prompt.root; ls &prompt.root; touch testfile &prompt.root; ls -al total 4 drwxr-xr-x 2 root wheel 3 Aug 29 23:15 . drwxr-xr-x 21 root wheel 512 Aug 29 23:12 .. -rw-r--r-- 1 root wheel 0 Aug 29 23:15 testfile However, this pool is not taking advantage of any ZFS features. 
To create a dataset on this pool with compression enabled: &prompt.root; zfs create example/compressed &prompt.root; zfs set compression=gzip example/compressed The example/compressed dataset is now a ZFS compressed file system. Try copying some large files to /example/compressed. Compression can be disabled with: &prompt.root; zfs set compression=off example/compressed To unmount a file system, issue the following command and then verify by using df: &prompt.root; zfs umount example/compressed &prompt.root; df Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad0s1a 2026030 235232 1628716 13% / devfs 1 1 0 100% /dev /dev/ad0s1d 54098308 1032864 48737580 2% /usr example 17547008 0 17547008 0% /example To re-mount the file system to make it accessible again, and verify with df: &prompt.root; zfs mount example/compressed &prompt.root; df Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad0s1a 2026030 235234 1628714 13% / devfs 1 1 0 100% /dev /dev/ad0s1d 54098308 1032864 48737580 2% /usr example 17547008 0 17547008 0% /example example/compressed 17547008 0 17547008 0% /example/compressed The pool and file system may also be observed by viewing the output from mount: &prompt.root; mount /dev/ad0s1a on / (ufs, local) devfs on /dev (devfs, local) /dev/ad0s1d on /usr (ufs, local, soft-updates) example on /example (zfs, local) example/data on /example/data (zfs, local) example/compressed on /example/compressed (zfs, local) ZFS datasets, after creation, may be used like any file systems. However, many other features are available which can be set on a per-dataset basis. In the following example, a new file system, data is created. 
Important files will be stored here, so the file system is set to keep two copies of each data block: &prompt.root; zfs create example/data &prompt.root; zfs set copies=2 example/data It is now possible to see the data and space utilization by issuing df: &prompt.root; df Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad0s1a 2026030 235234 1628714 13% / devfs 1 1 0 100% /dev /dev/ad0s1d 54098308 1032864 48737580 2% /usr example 17547008 0 17547008 0% /example example/compressed 17547008 0 17547008 0% /example/compressed example/data 17547008 0 17547008 0% /example/data Notice that each file system on the pool has the same amount of available space. This is the reason for using df in these examples, to show that the file systems use only the amount of space they need and all draw from the same pool. The ZFS file system does away with concepts such as volumes and partitions, and allows for several file systems to occupy the same pool. To destroy the file systems and then the pool once they are no longer needed: &prompt.root; zfs destroy example/compressed &prompt.root; zfs destroy example/data &prompt.root; zpool destroy example <acronym>ZFS</acronym> RAID-Z There is no way to prevent a disk from failing. One method of avoiding data loss due to a failed hard disk is to implement RAID. ZFS supports this feature in its pool design. RAID-Z pools require three or more disks but yield more usable space than mirrored pools. To create a RAID-Z pool, issue the following command and specify the disks to add to the pool: &prompt.root; zpool create storage raidz da0 da1 da2 &sun; recommends that the number of devices used in a RAID-Z configuration be between three and nine. For environments requiring a single pool consisting of 10 disks or more, consider breaking it up into smaller RAID-Z groups. If only two disks are available and redundancy is a requirement, consider using a ZFS mirror. Refer to &man.zpool.8; for more details. This command creates the storage zpool. 
This may be verified using &man.mount.8; and &man.df.1;. This command makes a new file system in the pool called home: &prompt.root; zfs create storage/home It is now possible to enable compression and keep extra copies of directories and files using the following commands: &prompt.root; zfs set copies=2 storage/home &prompt.root; zfs set compression=gzip storage/home To make this the new home directory for users, copy the user data to this directory, and create the appropriate symbolic links: &prompt.root; cp -rp /home/* /storage/home &prompt.root; rm -rf /home /usr/home &prompt.root; ln -s /storage/home /home &prompt.root; ln -s /storage/home /usr/home Users should now have their data stored on the freshly created /storage/home. Test by adding a new user and logging in as that user. Try creating a snapshot which may be rolled back later: &prompt.root; zfs snapshot storage/home@08-30-08 Note that the snapshot option will only capture a real file system, not a home directory or a file. The @ character is a delimiter between the file system or volume name and the snapshot name. When a user's home directory gets trashed, restore it with: &prompt.root; zfs rollback storage/home@08-30-08 To get a list of all available snapshots, run ls in the file system's .zfs/snapshot directory. For example, to see the previously taken snapshot: &prompt.root; ls /storage/home/.zfs/snapshot It is possible to write a script to perform regular snapshots on user data. However, over time, snapshots may consume a great deal of disk space. 
The previous snapshot may be removed using the following command: &prompt.root; zfs destroy storage/home@08-30-08 After testing, /storage/home can be made the real /home using this command: &prompt.root; zfs set mountpoint=/home storage/home Run df and mount to confirm that the system now treats the file system as the real /home: &prompt.root; mount /dev/ad0s1a on / (ufs, local) devfs on /dev (devfs, local) /dev/ad0s1d on /usr (ufs, local, soft-updates) storage on /storage (zfs, local) storage/home on /home (zfs, local) &prompt.root; df Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad0s1a 2026030 235240 1628708 13% / devfs 1 1 0 100% /dev /dev/ad0s1d 54098308 1032826 48737618 2% /usr storage 26320512 0 26320512 0% /storage storage/home 26320512 0 26320512 0% /home This completes the RAID-Z configuration. To get status updates about the file systems created during the nightly &man.periodic.8; runs, issue the following command: &prompt.root; echo 'daily_status_zfs_enable="YES"' >> /etc/periodic.conf Recovering <acronym>RAID</acronym>-Z Every software RAID has a method of monitoring its state. The status of RAID-Z devices may be viewed with the following command: &prompt.root; zpool status -x If all pools are healthy and everything is normal, the following message will be returned: all pools are healthy If there is an issue, perhaps a disk has gone offline, the pool state will look similar to: pool: storage state: DEGRADED status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. 
scrub: none requested config: NAME STATE READ WRITE CKSUM storage DEGRADED 0 0 0 raidz1 DEGRADED 0 0 0 da0 ONLINE 0 0 0 da1 OFFLINE 0 0 0 da2 ONLINE 0 0 0 errors: No known data errors This indicates that the device was previously taken offline by the administrator using the following command: &prompt.root; zpool offline storage da1 It is now possible to replace da1 after the system has been powered down. When the system is back online, the following command may be issued to replace the disk: &prompt.root; zpool replace storage da1 From here, the status may be checked again, this time without the -x flag, to get state information: &prompt.root; zpool status storage pool: storage state: ONLINE scrub: resilver completed with 0 errors on Sat Aug 30 19:44:11 2008 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz1 ONLINE 0 0 0 da0 ONLINE 0 0 0 da1 ONLINE 0 0 0 da2 ONLINE 0 0 0 errors: No known data errors As shown in this example, everything appears to be normal. Data Verification ZFS uses checksums to verify the integrity of stored data. These are enabled automatically upon creation of file systems and may be disabled using the following command: &prompt.root; zfs set checksum=off storage/home Doing so is not recommended as checksums take very little storage space and are used to check data integrity using checksum verification in a process known as scrubbing. To verify the data integrity of the storage pool, issue this command: &prompt.root; zpool scrub storage This process may take considerable time depending on the amount of data stored. It is also very I/O intensive, so much so that only one scrub may be run at any given time. 
After the scrub has completed, the status is updated and may be viewed by issuing a status request: &prompt.root; zpool status storage pool: storage state: ONLINE scrub: scrub completed with 0 errors on Sat Jan 26 19:57:37 2013 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz1 ONLINE 0 0 0 da0 ONLINE 0 0 0 da1 ONLINE 0 0 0 da2 ONLINE 0 0 0 errors: No known data errors The completion time of the scrub is displayed; scrubbing regularly helps to ensure data integrity over a long period of time. Refer to &man.zfs.8; and &man.zpool.8; for other ZFS options. <command>zpool</command> Administration Creating & Destroying Storage Pools Adding & Removing Devices Dealing with Failed Devices Importing & Exporting Pools Upgrading a Storage Pool Checking the Status of a Pool Performance Monitoring Splitting a Storage Pool <command>zfs</command> Administration Creating & Destroying Datasets Creating & Destroying Volumes Renaming a Dataset Setting Dataset Properties Managing Snapshots Managing Clones ZFS Replication Dataset, User and Group Quotas To enforce a dataset quota of 10 GB for storage/home/bob, use the following: &prompt.root; zfs set quota=10G storage/home/bob To enforce a reference quota of 10 GB for storage/home/bob, use the following: &prompt.root; zfs set refquota=10G storage/home/bob The general format for setting a user quota is userquota@user=size, and the user's name must be in one of the following formats: POSIX compatible name such as joe. POSIX numeric ID such as 789. SID name such as joe.bloggs@example.com. SID numeric ID such as S-1-123-456-789. For example, to enforce a user quota of 50 GB for a user named joe on storage/home, use the following: &prompt.root; zfs set userquota@joe=50G storage/home To remove the quota or make sure that one is not set, instead use: &prompt.root; zfs set userquota@joe=none storage/home User quota properties are not displayed by zfs get all. Non-root users can only see their own quotas unless they have been granted the userquota privilege. Users with this privilege are able to view and set everyone's quota.
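When the same user quota has to be applied to several accounts, the userquota@user=size format lends itself to a small loop. The sketch below only prints the commands instead of running them, so it can be reviewed first; quota_cmds is a hypothetical helper name, the second user name bob is an assumption made for illustration, and the dataset storage/home is taken from the earlier examples.

```shell
#!/bin/sh
# Sketch only: quota_cmds is a hypothetical helper, not part of ZFS.
# Usage: quota_cmds dataset size user...
# Prints one "zfs set userquota@user=size dataset" command per user.
quota_cmds() {
    dataset=$1
    size=$2
    shift 2
    for user in "$@"; do
        printf 'zfs set userquota@%s=%s %s\n' "$user" "$size" "$dataset"
    done
}

# Print the commands that would give joe and bob a 50 GB quota each.
quota_cmds storage/home 50G joe bob
```

After reviewing the printed commands, they can be run as root, for example by piping the output to sh. Passing none as the size prints the commands that remove the quotas again.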
The general format for setting a group quota is: groupquota@group=size. To set the quota for the group firstgroup to 50 GB on storage/home, use: &prompt.root; zfs set groupquota@firstgroup=50G storage/home To remove the quota for the group firstgroup, or to make sure that one is not set, instead use: &prompt.root; zfs set groupquota@firstgroup=none storage/home As with the user quota property, non-root users can only see the quotas associated with the groups that they belong to. However, root or a user with the groupquota privilege can view and set all quotas for all groups. To display the amount of space consumed by each user on the specified file system or snapshot, along with any specified quotas, use zfs userspace. For group information, use zfs groupspace. For more information about supported options or how to display only specific options, refer to &man.zfs.8;. root and users with sufficient privileges can list the quota for storage/home/bob using: &prompt.root; zfs get quota storage/home/bob Reservations The general format of the reservation property is reservation=size, so to set a reservation of 10 GB on storage/home/bob, use: &prompt.root; zfs set reservation=10G storage/home/bob To make sure that no reservation is set, or to remove a reservation, use: &prompt.root; zfs set reservation=none storage/home/bob The same principle can be applied to the refreservation property for setting a refreservation, with the general format refreservation=size. To check if any reservations or refreservations exist on storage/home/bob, execute one of the following commands: &prompt.root; zfs get reservation storage/home/bob &prompt.root; zfs get refreservation storage/home/bob Compression Deduplication Delegated Administration ZFS Advanced Topics ZFS Tuning Booting Root on ZFS ZFS Boot Environments Troubleshooting ZFS on i386 Some of the features provided by ZFS are RAM-intensive, so some tuning may be required to provide maximum efficiency on systems with limited RAM.
Memory The total system memory should be at least one gigabyte. The amount of recommended RAM depends upon the size of the pool and the ZFS features which are used. A general rule of thumb is 1 GB of RAM for every 1 TB of storage. If the deduplication feature is used, a general rule of thumb is 5 GB of RAM per TB of storage to be deduplicated. While some users successfully use ZFS with less RAM, a system under heavy load may panic due to memory exhaustion. Further tuning may be required for systems with less than the recommended amount of RAM. Kernel Configuration Due to the RAM limitations of the &i386; platform, users running ZFS on the &i386; architecture should add the following option to a custom kernel configuration file, rebuild the kernel, and reboot: options KVA_PAGES=512 This option expands the kernel address space, allowing the vm.kvm_size tunable to be pushed beyond the currently imposed limit of 1 GB, or the limit of 2 GB for PAE. To find the most suitable value for this option, divide the desired address space in megabytes by four (4). In this example, 2 GB is 2048 MB, which gives a value of 512. Loader Tunables The kmem address space can be increased on all &os; architectures. On a test system with one gigabyte of physical memory, success was achieved by adding the following options to /boot/loader.conf and restarting the system: vm.kmem_size="330M" vm.kmem_size_max="330M" vfs.zfs.arc_max="40M" vfs.zfs.vdev.cache.size="5M" For a more detailed list of recommendations for ZFS-related tuning, see . Additional Resources FreeBSD Wiki - ZFS FreeBSD Wiki - ZFS Tuning Illumos Wiki - ZFS Oracle Solaris ZFS Administration Guide ZFS Evil Tuning Guide ZFS Best Practices Guide &linux; Filesystems This section describes some of the &linux; filesystems supported by &os;. <acronym>ext2</acronym> The &man.ext2fs.5; file system kernel implementation has been available since &os; 2.2.
In &os; 8.x and earlier, the code is licensed under the GPL. Since &os; 9.0, the code has been rewritten and is now BSD licensed. The &man.ext2fs.5; driver allows the &os; kernel to both read and write to ext2 file systems. To access an ext2 file system, first load the kernel loadable module: &prompt.root; kldload ext2fs Then, to mount an &man.ext2fs.5; volume located on /dev/ad1s1: &prompt.root; mount -t ext2fs /dev/ad1s1 /mnt XFS XFS was originally written by SGI for the IRIX operating system and was then ported to &linux; and released under the GPL. See this page for more details. The &os; port was started by Russel Cattelan, &a.kan.email;, and &a.rodrigc.email;. To load XFS as a kernel-loadable module: &prompt.root; kldload xfs The &man.xfs.5; driver lets the &os; kernel access XFS filesystems. However, access is read-only; writing to a volume is not supported. To mount a &man.xfs.5; volume located on /dev/ad1s1: &prompt.root; mount -t xfs /dev/ad1s1 /mnt The sysutils/xfsprogs port includes mkfs.xfs, which enables the creation of XFS filesystems, plus utilities for analyzing and repairing them. The -p flag to mkfs.xfs can be used to create an &man.xfs.5; filesystem which is populated with files and other metadata. This can be used to quickly create a read-only filesystem which can be tested on &os;. ReiserFS The Reiser file system, ReiserFS, was ported to &os; by &a.dumbbell.email;, and has been released under the GPL. The ReiserFS driver permits the &os; kernel to access ReiserFS file systems and read their contents, but not write to them. First, the kernel-loadable module needs to be loaded: &prompt.root; kldload reiserfs Then, to mount a ReiserFS volume located on /dev/ad1s1: &prompt.root; mount -t reiserfs /dev/ad1s1 /mnt
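The three &linux; file systems in this section follow the same two-step pattern: load the kernel module, then mount the volume with the matching file system type. As a rough illustration of that pattern, the sketch below prints the two commands for a given type, device, and mount point without running them. The helper name mount_cmds is hypothetical, and it assumes that the module name and the mount type are the same string, which holds for the ext2fs, xfs, and reiserfs examples above.

```shell
#!/bin/sh
# Sketch only: mount_cmds is a hypothetical helper, not a system utility.
# Usage: mount_cmds fstype device mountpoint
# Prints the kldload and mount commands used in this section.
mount_cmds() {
    fstype=$1
    device=$2
    mountpoint=$3
    printf 'kldload %s\n' "$fstype"
    printf 'mount -t %s %s %s\n' "$fstype" "$device" "$mountpoint"
}

# Print the commands for the ReiserFS example above.
mount_cmds reiserfs /dev/ad1s1 /mnt
```

Printing the commands first makes it easy to check the device name before anything is mounted; the output can then be run as root.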