Index: en_US.ISO8859-1/books/handbook/zfs/chapter.xml =================================================================== --- en_US.ISO8859-1/books/handbook/zfs/chapter.xml +++ en_US.ISO8859-1/books/handbook/zfs/chapter.xml @@ -143,6 +143,13 @@ ada device names. + The first step in creating a new ZFS pool + is deciding on the disk layout. There are a number of options + and once the pool is created, the layout cannot be changed. + For more information see Advanced Topics - Pool + Layout. + Single Disk Pool @@ -460,11 +467,11 @@ upon creation of file systems. - Checksums can be disabled, but it is - not recommended! Checksums take very - little storage space and provide data integrity. Many - ZFS features will not work properly with - checksums disabled. There is no noticeable performance gain + Checksums can be, but should never be disabled. The + storage space for the checksum is a fixed part of the + metadata, so no space is saved by disabling checksums. + Many features will not work properly without checksums and + there is also no noticeable performance gain from disabling these checksums. @@ -715,13 +722,13 @@ as for RAID-Z, an alternative method is to add another vdev to the pool. Additional vdevs provide higher performance, distributing writes across the vdevs. Each vdev - is responsible for providing its own redundancy. It is - possible, but discouraged, to mix vdev types, like + is responsible for providing its own redundancy. Do not mix + different vdev types, like mirror and RAID-Z. Adding a non-redundant vdev to a pool containing mirror or RAID-Z vdevs risks the data on the entire pool. Writes are distributed, so the failure of the - non-redundant disk will result in the loss of a fraction of + non-redundant vdev will result in the loss of a fraction of every block that has been written to the pool. Data is striped across each of the vdevs. For example, @@ -730,8 +737,8 @@ of mirrors. 
Space is allocated so that each vdev reaches 100% full at the same time. There is a performance penalty if the vdevs have different amounts of free space, as a - disproportionate amount of the data is written to the less - full vdev. + disproportionate amount of the data is written to the vdev + that is less full. When attaching additional devices to a boot pool, remember to update the bootcode. @@ -2171,7 +2178,7 @@ ZFS will issue this warning: &prompt.root; zfs list -rt snapshot mypool/var/tmp -AME USED AVAIL REFER MOUNTPOINT +NAME USED AVAIL REFER MOUNTPOINT mypool/var/tmp@my_recursive_snapshot 88K - 152K - mypool/var/tmp@after_cp 53.5K - 118K - mypool/var/tmp@diff_snapshot 0 - 120K - @@ -2265,7 +2272,7 @@ cp: /var/tmp/.zfs/snapshot/after_cp/rc.conf: Read-only file system The error reminds the user that snapshots are read-only - and can not be changed after creation. No files can be + and cannot be changed after creation. No files can be copied into or removed from snapshot directories because that would change the state of the dataset they represent. @@ -2290,99 +2297,140 @@ Managing Clones - A clone is a copy of a snapshot that is treated more like - a regular dataset. Unlike a snapshot, a clone is not read - only, is mounted, and can have its own properties. Once a - clone has been created using zfs clone, the - snapshot it was created from cannot be destroyed. The - child/parent relationship between the clone and the snapshot - can be reversed using zfs promote. After a - clone has been promoted, the snapshot becomes a child of the - clone, rather than of the original parent dataset. This will - change how the space is accounted, but not actually change the - amount of space consumed. The clone can be mounted at any - point within the ZFS file system hierarchy, - not just below the original location of the snapshot. 
- - To demonstrate the clone feature, this example dataset is - used: - - &prompt.root; zfs list -rt all camino/home/joe -NAME USED AVAIL REFER MOUNTPOINT -camino/home/joe 108K 1.3G 87K /usr/home/joe -camino/home/joe@plans 21K - 85.5K - -camino/home/joe@backup 0K - 87K - - - A typical use for clones is to experiment with a specific - dataset while keeping the snapshot around to fall back to in - case something goes wrong. Since snapshots can not be - changed, a read/write clone of a snapshot is created. After - the desired result is achieved in the clone, the clone can be - promoted to a dataset and the old file system removed. This - is not strictly necessary, as the clone and dataset can - coexist without problems. - - &prompt.root; zfs clone camino/home/joe@backup camino/home/joenew -&prompt.root; ls /usr/home/joe* -/usr/home/joe: -backup.txz plans.txt - -/usr/home/joenew: -backup.txz plans.txt -&prompt.root; df -h /usr/home -Filesystem Size Used Avail Capacity Mounted on -usr/home/joe 1.3G 31k 1.3G 0% /usr/home/joe -usr/home/joenew 1.3G 31k 1.3G 0% /usr/home/joenew - - After a clone is created it is an exact copy of the state - the dataset was in when the snapshot was taken. The clone can - now be changed independently from its originating dataset. - The only connection between the two is the snapshot. - ZFS records this connection in the property - origin. Once the dependency between the - snapshot and the clone has been removed by promoting the clone - using zfs promote, the - origin of the clone is removed as it is now - an independent dataset. 
This example demonstrates it: - - &prompt.root; zfs get origin camino/home/joenew -NAME PROPERTY VALUE SOURCE -camino/home/joenew origin camino/home/joe@backup - -&prompt.root; zfs promote camino/home/joenew -&prompt.root; zfs get origin camino/home/joenew -NAME PROPERTY VALUE SOURCE -camino/home/joenew origin - - - - After making some changes like copying - loader.conf to the promoted clone, for - example, the old directory becomes obsolete in this case. - Instead, the promoted clone can replace it. This can be - achieved by two consecutive commands: zfs - destroy on the old dataset and zfs - rename on the clone to name it like the old - dataset (it could also get an entirely different name). - - &prompt.root; cp /boot/defaults/loader.conf /usr/home/joenew -&prompt.root; zfs destroy -f camino/home/joe -&prompt.root; zfs rename camino/home/joenew camino/home/joe -&prompt.root; ls /usr/home/joe -backup.txz loader.conf plans.txt -&prompt.root; df -h /usr/home -Filesystem Size Used Avail Capacity Mounted on -usr/home/joe 1.3G 128k 1.3G 0% /usr/home/joe - - The cloned snapshot is now handled like an ordinary - dataset. It contains all the data from the original snapshot - plus the files that were added to it like - loader.conf. Clones can be used in - different scenarios to provide useful features to ZFS users. - For example, jails could be provided as snapshots containing - different sets of installed applications. Users can clone - these snapshots and add their own applications as they see - fit. Once they are satisfied with the changes, the clones can - be promoted to full datasets and provided to end users to work - with like they would with a real dataset. This saves time and - administrative overhead when providing these jails. + A clone is an exact copy of a snapshot that is treated + like a regular dataset. Unlike a snapshot, the clone can + be changed independently from its originating dataset. 
A + clone can be written to, mounted, and have its own dataset + properties. Similar to how snapshots work, a clone shares + unmodified blocks with the origin snapshot + it was created from. This conserves space, as the clone only consumes additional space when it is + modified. A clone can only be created from a snapshot. + + Create a snapshot of a file system, then clone it: + + &prompt.root; echo "first message" > /var/tmp/my_message +&prompt.root; ls /var/tmp +my_message vi.recover +&prompt.root; zfs snapshot mypool/var/tmp@first_snapshot +&prompt.root; zfs list -rt all mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/tmp 249K 30.5G 249K /var/tmp +mypool/var/tmp@first_snapshot 0 - 249K - +&prompt.root; zfs clone mypool/var/tmp@first_snapshot mypool/var/clone +&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/clone 12.8K 30.5G 249K /var/clone +mypool/var/tmp 249K 30.5G 249K /var/tmp +mypool/var/tmp@first_snapshot 0 - 249K - +&prompt.root; ls /var/clone +my_message vi.recover + + A clone is essentially a fork of a file system, a common + base set of blocks that are shared by two file systems. When + a file is modified in a clone, additional space is consumed. + The original blocks are kept intact because they are still + being used by the first file system and any snapshots that belong to it. + When a file is modified in the first file system, additional + space is consumed again, this time allocated to the snapshot. + The original blocks are still in use, now only by the + snapshot. The system now contains all three versions of the + file. + + One common use case for clones is experimenting with a + dataset while preserving the original. + Clones can also be useful for + databases, jails, and virtual machines. Clones allow the + administrator to create multiple nearly identical versions of + the original without consuming additional space. Clones can be + kept indefinitely. 
If the clone achieves the desired result, + it can be promoted to be the parent dataset. The original + file system can then be destroyed. + + Make a change to the clone, and then the parent: + + &prompt.root; echo "clone message" > /var/clone/my_message +&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/clone 134K 30.5G 249K /var/clone +mypool/var/tmp 249K 30.5G 249K /var/tmp +mypool/var/tmp@first_snapshot 0 - 249K - +&prompt.root; echo "new message" > /var/tmp/my_message +&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/clone 134K 30.5G 249K /var/clone +mypool/var/tmp 383K 30.5G 249K /var/tmp +mypool/var/tmp@first_snapshot 134K - 249K - + + After a clone has been created, the snapshot it was + created from cannot be destroyed, because the clone only + contains the blocks that have been modified and depends on + the snapshot for the rest. The child/parent + relationship between the clone and the snapshot can be + reversed using zfs promote. The snapshot + then becomes a child of the clone, rather than of the original + parent dataset. The original dataset can then be destroyed if + desired. The way that space usage is recorded changes when a clone + is promoted. The same amount of space is used, but which of + the blocks are owned by the parent and the child + changes. + + The only connection between the clone and the original + dataset is the snapshot. The + connection is recorded in the origin property. + The dependency between the clone and the original dataset is + reversed by + zfs promote. The original dataset becomes + the clone. The origin property on the + clone will then be blank. The origin + property on the original dataset now points to the + snapshot under the dataset that was formerly the clone. 
+ + Promote the clone: + + &prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/clone 134K 30.5G 249K /var/clone +mypool/var/tmp 383K 30.5G 249K /var/tmp +mypool/var/tmp@first_snapshot 134K - 249K - +&prompt.root; zfs get origin mypool/var/clone +NAME PROPERTY VALUE SOURCE +mypool/var/clone origin mypool/var/tmp@first_snapshot - +&prompt.root; zfs promote mypool/var/clone +&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/clone 383K 30.5G 249K /var/clone +mypool/var/clone@first_snapshot 134K - 249K - +mypool/var/tmp 134K 30.5G 249K /var/tmp +&prompt.root; zfs get origin mypool/var/clone +NAME PROPERTY VALUE SOURCE +mypool/var/clone origin - - +&prompt.root; zfs get origin mypool/var/tmp +NAME PROPERTY VALUE SOURCE +mypool/var/tmp origin mypool/var/clone@first_snapshot - + + After making changes to the clone, it is now in + the state the administrator wanted. The old dataset is + now obsolete, and the administrator wants to replace it + with the clone. After the clone is promoted, this can be + achieved with two additional commands: zfs + destroy the old dataset and zfs + rename the clone to the name of the old dataset. + The clone could also keep its original name, and only change + its mountpoint property instead. + + &prompt.root; zfs destroy -f mypool/var/tmp +&prompt.root; zfs rename mypool/var/clone mypool/var/tmp +&prompt.root; zfs list -rt all mypool/var/tmp +NAME USED AVAIL REFER MOUNTPOINT +mypool/var/tmp 383K 30.5G 249K /var/tmp +mypool/var/tmp@first_snapshot 134K - 249K - + + The original clone is now an ordinary dataset. It + contains all the data from the original snapshot plus the + files that were added or modified. Any changes made to the + original dataset after the snapshot was created will be + destroyed along with it. Now that there are no other datasets + depending on the snapshot, it can be destroyed as + well. 
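The copy-on-write behavior behind clones and snapshots can be sketched with a small model. This is an illustrative Python sketch, not ZFS code; the Dataset, snapshot, clone, and unique_space names are hypothetical, and space is counted in whole blocks:

```python
# Illustrative copy-on-write model (not ZFS code): a clone shares
# blocks with its origin snapshot and only consumes space for the
# blocks it has modified. All names here are hypothetical.

class Dataset:
    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block name -> contents

def snapshot(ds):
    # A snapshot references the dataset's current blocks;
    # it consumes no additional space at creation time.
    return dict(ds.blocks)

def clone(snap):
    # A clone starts out sharing every block with its origin snapshot.
    return Dataset(snap)

def unique_space(ds, snap):
    # Blocks charged to the dataset: those that differ from the snapshot.
    return sum(1 for name, data in ds.blocks.items() if snap.get(name) != data)

fs = Dataset({"my_message": "first message"})
snap = snapshot(fs)
cl = clone(snap)
print(unique_space(cl, snap))   # 0: a fresh clone consumes no extra space

cl.blocks["my_message"] = "clone message"  # modify the clone
fs.blocks["my_message"] = "new message"    # modify the original

# The system now holds all three versions of the file: the snapshot's,
# the original file system's, and the clone's.
versions = {snap["my_message"], fs.blocks["my_message"], cl.blocks["my_message"]}
print(len(versions))  # 3
```

The mypool/var/tmp example above shows the same effect: each modification grows either the clone's or the snapshot's USED column, while unmodified blocks remain shared.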
@@ -3041,6 +3089,55 @@ Advanced Topics + + Pool Layout + + Choosing the type of vdevs to construct a + pool requires deciding which factors are most important. The + main considerations for a pool are: redundancy, capacity, and + performance. + + Mirrors + provide the best performance in terms of operations per second + (IOPS). With a mirror, every disk in a + vdev can be used to service reads, because each disk in the vdev + contains a complete copy of the data. Mirrors also provide good + redundancy, since a mirror vdev can consist of many disks, + each holding a complete copy of the data. The + downside to mirrors is that they provide the worst space efficiency and total capacity. + Each mirror vdev, no matter how many disks it contains, + provides only the capacity of the smallest disk. Multiple + mirror vdevs can be striped together (similar to RAID-10) to + provide more capacity, but the usable capacity will usually be + less than the same number of disks in RAID-Z. + + RAID-Z comes in + a number of levels of redundancy. RAID-Z1 provides enough + redundancy to withstand the failure of a single disk in each + vdev. RAID-Z2 can withstand two disks failing at the same time, and Z3 + can withstand three, without any data loss. + Choosing between these levels allows the + administrator to balance redundancy against + usable capacity. Each RAID-Z vdev will provide + storage capacity equal to the number of disks, less the level + of redundancy, multiplied by the size of the smallest disk. + Examples of the storage calculations are provided in the + RAID-Z definition + in the terminology section. Multiple RAID-Z vdevs can be + striped together to create an effective RAID-50 or RAID-60 + type array. + + Using more vdevs will increase performance. + Each vdev is operated as a unit. The effective speed of an + individual vdev is determined by the speed of the slowest + device. 
For the best performance, the recommended layout is + many mirror vdevs, but this provides the worst effective + capacity of the possible configurations. For increased + redundancy, an administrator can choose between using RAID-Z2, + Z3, or adding more member disks to each mirror vdev. + + Tuning @@ -3173,16 +3270,6 @@ vfs.zfs.vdev.max_pending - - Limit the number of pending I/O requests per device. - A higher value will keep the device command queue full - and may give higher throughput. A lower value will reduce - latency. This value can be adjusted at any time with - &man.sysctl.8;. - - - - vfs.zfs.top_maxinflight - Maxmimum number of outstanding I/Os per top-level vdev. Limits the @@ -3299,7 +3386,7 @@ vfs.zfs.txg.timeout - Maximum number of seconds between - transaction groups. + transaction groups. The current transaction group will be written to the pool and a fresh transaction group started if this amount of time has elapsed since the previous transaction group. A @@ -3608,6 +3695,14 @@ and an array of eight 1 TB disks in RAID-Z3 will yield 5 TB of usable space. + + For + optimal performance, it is best to have a power of + 2 (2, 4, 8) number of non-parity drives so that + writes can be distributed evenly. The recommended + configurations are: RAID-Z1: 3, 5, or 9 disks. + RAID-Z2: 6 or 10 disks. RAID-Z3: 5, 7 or 11 + disks. @@ -4065,10 +4160,13 @@ this property on important datasets provides additional redundancy from which to recover a block that does not match its checksum. In pools without redundancy, the - copies feature is the only form of redundancy. The - copies feature can recover from a single bad sector or + copies feature is the only form of redundancy. The + copies feature can recover from a single bad sector or other forms of minor corruption, but it does not protect - the pool from the loss of an entire disk. + the pool from the loss of an entire disk. 
Each + copy of a block consumes that much additional space, not + only in the file system, but also in any snapshots where that + block has been modified.
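The RAID-Z capacity rule stated in the Pool Layout section (number of disks, less the parity level, multiplied by the size of the smallest disk) can be checked with a short calculation. This is an illustrative Python sketch; raidz_capacity is a hypothetical helper, not a ZFS tool:

```python
# Hypothetical helper illustrating the RAID-Z capacity rule:
# usable space = (number of disks - parity level) * smallest disk.

def raidz_capacity(disk_sizes_tb, parity):
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)

# The example from the text: eight 1 TB disks in RAID-Z3 yield 5 TB.
print(raidz_capacity([1] * 8, parity=3))  # 5

# Mixed sizes: capacity is limited by the smallest disk in the vdev.
print(raidz_capacity([2, 2, 1], parity=1))  # 2

# The recommended layouts all keep a power-of-2 number of non-parity
# (data) disks so that writes can be distributed evenly.
for disks, parity in [(3, 1), (5, 1), (9, 1), (6, 2), (10, 2), (5, 3), (7, 3), (11, 3)]:
    data = disks - parity
    assert data & (data - 1) == 0  # data disk count is a power of 2
```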