Index: en_US.ISO8859-1/books/handbook/zfs/chapter.xml
===================================================================
--- en_US.ISO8859-1/books/handbook/zfs/chapter.xml
+++ en_US.ISO8859-1/books/handbook/zfs/chapter.xml
@@ -143,6 +143,13 @@
ada device
names.
+ The first step in creating a new ZFS pool
+ is deciding on the disk layout. There are a number of options,
+ and once the pool is created, the layout cannot be changed.
+ For more information see Advanced Topics - Pool
+ Layout.
+
Single Disk Pool
@@ -460,11 +467,11 @@
upon creation of file systems.
- Checksums can be disabled, but it is
- not recommended! Checksums take very
- little storage space and provide data integrity. Many
- ZFS features will not work properly with
- checksums disabled. There is no noticeable performance gain
+ Checksums can be, but should never be, disabled. The
+ storage space for the checksum is a fixed part of the
+ metadata, so no space is saved by disabling checksums.
+ Many features will not work properly without checksums and
+ there is also no noticeable performance gain
from disabling these checksums.
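As a sketch of the point above (the pool name mypool is an example), the checksum property can be inspected and tuned with zfs get and zfs set; it is on by default:

```shell
# Show the checksum algorithm in use; "on" selects the default
# algorithm (fletcher4 on current OpenZFS).
zfs get checksum mypool
# A stronger algorithm can be selected per dataset if desired;
# the dataset name here is hypothetical.
zfs set checksum=sha256 mypool/important
# checksum=off is accepted but, as noted above, never advised.
```
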
@@ -715,13 +722,13 @@
as for RAID-Z, an alternative method is to
add another vdev to the pool. Additional vdevs provide higher
performance, distributing writes across the vdevs. Each vdev
- is responsible for providing its own redundancy. It is
- possible, but discouraged, to mix vdev types, like
+ is responsible for providing its own redundancy. Do not mix
+ different vdev types, like
mirror and RAID-Z.
Adding a non-redundant vdev to a pool containing mirror or
RAID-Z vdevs risks the data on the entire
pool. Writes are distributed, so the failure of the
- non-redundant disk will result in the loss of a fraction of
+ non-redundant vdev will result in the loss of a fraction of
every block that has been written to the pool.
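The vdev-append step described above can be sketched as follows; the pool and device names are examples:

```shell
# Add a second mirror vdev to an existing pool of mirrors;
# writes will then be striped across both vdevs.
zpool add mypool mirror ada2 ada3
# zpool add refuses a vdev with mismatched redundancy unless -f
# is given; forcing a plain disk into a redundant pool risks a
# fraction of every block, as described above.
zpool status mypool
```
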
Data is striped across each of the vdevs. For example,
@@ -730,8 +737,8 @@
of mirrors. Space is allocated so that each vdev reaches 100%
full at the same time. There is a performance penalty if the
vdevs have different amounts of free space, as a
- disproportionate amount of the data is written to the less
- full vdev.
+ disproportionate amount of the data is written to the vdev
+ that is less full.
When attaching additional devices to a boot pool, remember
to update the bootcode.
@@ -2171,7 +2178,7 @@
ZFS will issue this warning:
&prompt.root; zfs list -rt snapshot mypool/var/tmp
-AME USED AVAIL REFER MOUNTPOINT
+NAME USED AVAIL REFER MOUNTPOINT
mypool/var/tmp@my_recursive_snapshot 88K - 152K -
mypool/var/tmp@after_cp 53.5K - 118K -
mypool/var/tmp@diff_snapshot 0 - 120K -
@@ -2265,7 +2272,7 @@
cp: /var/tmp/.zfs/snapshot/after_cp/rc.conf: Read-only file system
 The error reminds the user that snapshots are read-only
- and can not be changed after creation. No files can be
+ and cannot be changed after creation. No files can be
copied into or removed from snapshot directories because
that would change the state of the dataset they
represent.
@@ -2290,99 +2297,140 @@
Managing Clones
- A clone is a copy of a snapshot that is treated more like
- a regular dataset. Unlike a snapshot, a clone is not read
- only, is mounted, and can have its own properties. Once a
- clone has been created using zfs clone, the
- snapshot it was created from cannot be destroyed. The
- child/parent relationship between the clone and the snapshot
- can be reversed using zfs promote. After a
- clone has been promoted, the snapshot becomes a child of the
- clone, rather than of the original parent dataset. This will
- change how the space is accounted, but not actually change the
- amount of space consumed. The clone can be mounted at any
- point within the ZFS file system hierarchy,
- not just below the original location of the snapshot.
-
- To demonstrate the clone feature, this example dataset is
- used:
-
- &prompt.root; zfs list -rt all camino/home/joe
-NAME USED AVAIL REFER MOUNTPOINT
-camino/home/joe 108K 1.3G 87K /usr/home/joe
-camino/home/joe@plans 21K - 85.5K -
-camino/home/joe@backup 0K - 87K -
-
- A typical use for clones is to experiment with a specific
- dataset while keeping the snapshot around to fall back to in
- case something goes wrong. Since snapshots can not be
- changed, a read/write clone of a snapshot is created. After
- the desired result is achieved in the clone, the clone can be
- promoted to a dataset and the old file system removed. This
- is not strictly necessary, as the clone and dataset can
- coexist without problems.
-
- &prompt.root; zfs clone camino/home/joe@backupcamino/home/joenew
-&prompt.root; ls /usr/home/joe*
-/usr/home/joe:
-backup.txz plans.txt
-
-/usr/home/joenew:
-backup.txz plans.txt
-&prompt.root; df -h /usr/home
-Filesystem Size Used Avail Capacity Mounted on
-usr/home/joe 1.3G 31k 1.3G 0% /usr/home/joe
-usr/home/joenew 1.3G 31k 1.3G 0% /usr/home/joenew
-
- After a clone is created it is an exact copy of the state
- the dataset was in when the snapshot was taken. The clone can
- now be changed independently from its originating dataset.
- The only connection between the two is the snapshot.
- ZFS records this connection in the property
- origin. Once the dependency between the
- snapshot and the clone has been removed by promoting the clone
- using zfs promote, the
- origin of the clone is removed as it is now
- an independent dataset. This example demonstrates it:
-
- &prompt.root; zfs get origin camino/home/joenew
-NAME PROPERTY VALUE SOURCE
-camino/home/joenew origin camino/home/joe@backup -
-&prompt.root; zfs promote camino/home/joenew
-&prompt.root; zfs get origin camino/home/joenew
-NAME PROPERTY VALUE SOURCE
-camino/home/joenew origin - -
-
- After making some changes like copying
- loader.conf to the promoted clone, for
- example, the old directory becomes obsolete in this case.
- Instead, the promoted clone can replace it. This can be
- achieved by two consecutive commands: zfs
- destroy on the old dataset and zfs
- rename on the clone to name it like the old
- dataset (it could also get an entirely different name).
-
- &prompt.root; cp /boot/defaults/loader.conf/usr/home/joenew
-&prompt.root; zfs destroy -f camino/home/joe
-&prompt.root; zfs rename camino/home/joenewcamino/home/joe
-&prompt.root; ls /usr/home/joe
-backup.txz loader.conf plans.txt
-&prompt.root; df -h /usr/home
-Filesystem Size Used Avail Capacity Mounted on
-usr/home/joe 1.3G 128k 1.3G 0% /usr/home/joe
-
- The cloned snapshot is now handled like an ordinary
- dataset. It contains all the data from the original snapshot
- plus the files that were added to it like
- loader.conf. Clones can be used in
- different scenarios to provide useful features to ZFS users.
- For example, jails could be provided as snapshots containing
- different sets of installed applications. Users can clone
- these snapshots and add their own applications as they see
- fit. Once they are satisfied with the changes, the clones can
- be promoted to full datasets and provided to end users to work
- with like they would with a real dataset. This saves time and
- administrative overhead when providing these jails.
+ A clone is an exact copy of a snapshot that is treated
+ like a regular dataset. Unlike a snapshot, the clone can
+ be changed independently from its originating dataset. A
+ clone can be written to, mounted, and have its own dataset
+ properties. Similar to how snapshots work, a clone shares
+ unmodified blocks with the origin snapshot
+ it was created from. This conserves space, as the clone
+ only consumes additional space when it is
+ modified. A clone can only be created from a snapshot.
+
+ Create a snapshot of a file system, then clone it:
+
+ &prompt.root; echo "first message" > /var/tmp/my_message
+&prompt.root; ls /var/tmp
+my_message vi.recover
+&prompt.root; zfs snapshot mypool/var/tmp@first_snapshot
+&prompt.root; zfs list -rt all mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/tmp 249K 30.5G 249K /var/tmp
+mypool/var/tmp@first_snapshot 0 - 249K -
+&prompt.root; zfs clone mypool/var/tmp@first_snapshot mypool/var/clone
+&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/clone 12.8K 30.5G 249K /var/clone
+mypool/var/tmp 249K 30.5G 249K /var/tmp
+mypool/var/tmp@first_snapshot 0 - 249K -
+&prompt.root; ls /var/clone
+my_message vi.recover
+
+ A clone is essentially a fork of a file system, a common
+ base set of blocks that are shared by two file systems. When
+ a file is modified in a clone, additional space is consumed.
+ The original blocks are kept intact because they are still
+ being used by the first file system and any snapshots that belong to it.
+ When a file is modified in the first file system, additional
+ space is consumed again, this time allocated to the snapshot.
+ The original blocks are still in use, now only by the
+ snapshot. The system now contains all three versions of the
+ file.
+
+ One common use case for clones is experimenting with a
+ dataset while preserving the original.
+ Clones can also be useful for
+ databases, jails, and virtual machines. Clones allow the
+ administrator to create multiple nearly identical versions of
+ the original without consuming additional space. Clones can be
+ kept indefinitely. If the clone achieves the desired result,
+ it can be promoted to be the parent dataset. The original
+ file system can then be destroyed.
+
+ Make a change to the clone, and then the parent:
+
+ &prompt.root; echo "clone message" > /var/clone/my_message
+&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/clone 134K 30.5G 249K /var/clone
+mypool/var/tmp 249K 30.5G 249K /var/tmp
+mypool/var/tmp@first_snapshot 0 - 249K -
+&prompt.root; echo "new message" > /var/tmp/my_message
+&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/clone 134K 30.5G 249K /var/clone
+mypool/var/tmp 383K 30.5G 249K /var/tmp
+mypool/var/tmp@first_snapshot 134K - 249K -
+
+ After a clone has been created, the snapshot it was
+ created from cannot be destroyed because the clone only
+ contains the blocks that have been modified. The child/parent
+ relationship between the clone and the snapshot can be
+ reversed using zfs promote. The snapshot
+ then becomes a child of the clone, rather than of the original
+ parent dataset. The original dataset can then be destroyed if
+ desired. The way that space usage is recorded changes when a clone
+ is promoted. The same amount of space is used, but which of
+ the blocks are owned by the parent and the child
+ changes.
+
+ The only connection between the clone and the original
+ dataset is the snapshot. The
+ connection is recorded in the origin property.
+ The dependency between the clone and the original dataset is
+ reversed by
+ zfs promote. The original dataset becomes
+ the clone. The origin property on the
+ clone will then be blank. The origin
+ property on the original dataset now points to the
+ snapshot under the dataset that was formerly the clone.
+
+ Promote the clone:
+
+&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/clone 134K 30.5G 249K /var/clone
+mypool/var/tmp 383K 30.5G 249K /var/tmp
+mypool/var/tmp@first_snapshot 134K - 249K -
+&prompt.root; zfs get origin mypool/var/clone
+NAME PROPERTY VALUE SOURCE
+mypool/var/clone origin mypool/var/tmp@first_snapshot -
+&prompt.root; zfs promote mypool/var/clone
+&prompt.root; zfs list -rt all mypool/var/clone mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/clone 383K 30.5G 249K /var/clone
+mypool/var/clone@first_snapshot 134K - 249K -
+mypool/var/tmp 134K 30.5G 249K /var/tmp
+&prompt.root; zfs get origin mypool/var/clone
+NAME PROPERTY VALUE SOURCE
+mypool/var/clone origin - -
+&prompt.root; zfs get origin mypool/var/tmp
+NAME PROPERTY VALUE SOURCE
+mypool/var/tmp origin mypool/var/clone@first_snapshot -
+
+ After making changes to the clone, it is now in
+ the state the administrator wants. The old dataset is
+ now obsolete, and the administrator wants to replace it
+ with the clone. After the clone is promoted, this can be
+ achieved with two additional commands: zfs
+ destroy the old dataset and zfs
+ rename the clone to the name of the old dataset.
+ The clone could also keep its original name, and only change
+ its mountpoint property instead.
+
+ &prompt.root; zfs destroy -f mypool/var/tmp
+&prompt.root; zfs rename mypool/var/clone mypool/var/tmp
+&prompt.root; zfs list -rt all mypool/var/tmp
+NAME USED AVAIL REFER MOUNTPOINT
+mypool/var/tmp 383K 30.5G 249K /var/tmp
+mypool/var/tmp@first_snapshot 134K - 249K -
+
+ The original clone is now an ordinary dataset. It
+ contains all the data from the original snapshot plus the
+ files that were added or modified. Any changes made to the
+ original dataset after the snapshot was created will be
+ destroyed. Now that there are no other datasets
+ depending on the snapshot, it can be destroyed as
+ well.
@@ -3041,6 +3089,55 @@
Advanced Topics
+
+ Pool Layout
+
+ Choosing the type of vdevs from which to
+ construct a pool requires deciding which factors are most
+ important. The main considerations for a pool are redundancy,
+ capacity, and performance.
+
+ Mirrors
+ provide the best performance in terms of operations per second
+ (IOPS). With a mirror, every disk in a
+ vdev can be used to service reads, because each disk in the vdev
+ contains a complete copy of the data. Mirrors provide good
+ redundancy, since a mirror vdev can consist of many
+ disks, each holding a complete copy of the data. The
+ downside to mirrors is that they provide the worst space
+ efficiency and total capacity.
+ Each mirror vdev, no matter how many disks it contains,
+ provides only the capacity of the smallest disk. Multiple
+ mirror vdevs can be striped together (similar to RAID-10) to
+ provide more capacity, but the usable capacity will usually be
+ less than the same number of disks in RAID-Z.
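The striped-mirror layout described above can be sketched as follows; the pool and device names are examples:

```shell
# Create a RAID-10 style pool: two mirror vdevs striped together.
# Usable capacity is that of the two smallest disks (one per
# mirror), not all four.
zpool create mypool mirror ada0 ada1 mirror ada2 ada3
```
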
+
+ RAID-Z comes in
+ a number of levels of redundancy. RAID-Z1 provides enough
+ redundancy to withstand the failure of a single disk in each
+ vdev. RAID-Z2 can withstand two disks failing at the same time, and Z3
+ can withstand three, without any data loss.
+ Choosing between these levels allows the
+ administrator to balance redundancy against
+ usable capacity. Each RAID-Z vdev will provide
+ storage capacity equal to the number of disks, less the level
+ of redundancy, multiplied by the size of the smallest disk.
+ Examples of the storage calculations are provided in the
+ RAID-Z definition
+ in the terminology section. Multiple RAID-Z vdevs can be
+ striped together to create an effective RAID-50 or RAID-60
+ type array.
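Striping RAID-Z vdevs as described above can be sketched as follows; names are examples:

```shell
# Create a pool of two RAID-Z1 vdevs striped together (an
# effective RAID-50); each vdev tolerates one disk failure.
zpool create mypool raidz ada0 ada1 ada2 raidz ada3 ada4 ada5
```
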
+
+ Using more vdevs will increase performance.
+ Each vdev is operated as a unit. The effective speed of an
+ individual vdev is determined by the speed of the slowest
+ device. For the best performance, the recommended layout is
+ many mirror vdevs, but this provides the worst effective
+ capacity of the possible configurations. For increased
+ redundancy, an administrator can choose between using RAID-Z2,
+ Z3, or adding more member disks to each mirror vdev.
+
+
Tuning
@@ -3173,16 +3270,6 @@
vfs.zfs.vdev.max_pending
- - Limit the number of pending I/O requests per device.
- A higher value will keep the device command queue full
- and may give higher throughput. A lower value will reduce
- latency. This value can be adjusted at any time with
- &man.sysctl.8;.
-
-
-
- vfs.zfs.top_maxinflight
- Maxmimum number of outstanding I/Os per top-level
vdev. Limits the
@@ -3299,7 +3386,7 @@
vfs.zfs.txg.timeout
- Maximum number of seconds between
- transaction groups.
+ transaction groups.
The current transaction group will be written to the pool
and a fresh transaction group started if this amount of
time has elapsed since the previous transaction group. A
@@ -3608,6 +3695,14 @@
and an array of eight 1 TB disks in
RAID-Z3 will yield 5 TB of
usable space.
+
+ For
+ optimal performance, it is best to use a power of
+ two (2, 4, 8) non-parity drives so that
+ writes can be distributed evenly. The recommended
+ configurations are: RAID-Z1: 3, 5, or 9 disks.
+ RAID-Z2: 6 or 10 disks. RAID-Z3: 5, 7 or 11
+ disks.
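The capacity rule used above, (disks − parity level) × smallest disk, can be checked with shell arithmetic; the helper function name is invented for illustration:

```shell
# Usable RAID-Z capacity = (disks - parity_level) * smallest disk.
raidz_capacity() {
  local disks=$1 parity=$2 smallest_tb=$3
  echo $(( (disks - parity) * smallest_tb ))
}
raidz_capacity 8 3 1   # eight 1 TB disks in RAID-Z3: prints 5
raidz_capacity 6 2 1   # six 1 TB disks in RAID-Z2: prints 4
```
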
@@ -4065,10 +4160,13 @@
this property on important datasets provides additional
redundancy from which to recover a block that does not
match its checksum. In pools without redundancy, the
- copies feature is the only form of redundancy. The
- copies feature can recover from a single bad sector or
+ copies feature is the only form of redundancy. The
+ copies feature can recover from a single bad sector or
other forms of minor corruption, but it does not protect
- the pool from the loss of an entire disk.
+ the pool from the loss of an entire disk. Each
+ additional copy of a block consumes additional space, not
+ only in the file system, but also in any snapshots where
+ that block has been modified.
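The copies property described above is set per dataset; the dataset name here is hypothetical:

```shell
# Keep two copies of every block on this dataset; this protects
# against bad sectors and minor corruption, not whole-disk loss.
zfs set copies=2 mypool/important
zfs get copies mypool/important
```
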