diff --git a/man/man1/arcstat.1 b/man/man1/arcstat.1 index 9113b76af46c..7fe1e0bfb14a 100644 --- a/man/man1/arcstat.1 +++ b/man/man1/arcstat.1 @@ -1,502 +1,502 @@ .\" .\" This file and its contents are supplied under the terms of the .\" Common Development and Distribution License ("CDDL"), version 1.0. .\" You may only use this file in accordance with the terms of version .\" 1.0 of the CDDL. .\" .\" A full copy of the text of the CDDL should have accompanied this .\" source. A copy of the CDDL is also available via the Internet at .\" http://www.illumos.org/license/CDDL. .\" .\" .\" Copyright 2014 Adam Stevko. All rights reserved. .\" Copyright (c) 2015 by Delphix. All rights reserved. .\" Copyright (c) 2020 by AJ Jordan. All rights reserved. .\" -.TH ARCSTAT 1 "May 7, 2020" +.TH ARCSTAT 1 "Aug 24, 2020" OpenZFS .SH NAME arcstat \- report ZFS ARC and L2ARC statistics .SH SYNOPSIS .LP .nf \fBarcstat\fR [\fB-hvx\fR] [\fB-f field[,field]...\fR] [\fB-o file\fR] [\fB-s string\fR] [\fBinterval\fR [\fBcount\fR]] .fi .SH DESCRIPTION .LP The \fBarcstat\fR utility print various ZFS ARC and L2ARC statistics in vmstat-like fashion. .sp .sp .LP The \fBarcstat\fR command reports the following information: .sp .ne 2 .\" .sp .ne 1 .na \fBc \fR .ad .RS 14n ARC target size .RE .sp .ne 2 .na \fBdh% \fR .ad .RS 14n Demand data hit percentage .RE .sp .ne 2 .na \fBdm% \fR .ad .RS 14n Demand data miss percentage .RE .sp .ne 2 .na \fBmfu \fR .ad .RS 14n MFU list hits per second .RE .sp .ne 2 .na \fBmh% \fR .ad .RS 14n Metadata hit percentage .RE .sp .ne 2 .na \fBmm% \fR .ad .RS 14n Metadata miss percentage .RE .sp .ne 2 .na \fBmru \fR .ad .RS 14n MRU list hits per second .RE .sp .ne 2 .na \fBph% \fR .ad .RS 14n Prefetch hits percentage .RE .sp .ne 2 .na \fBpm% \fR .ad .RS 14n Prefetch miss percentage .RE .sp .ne 2 .na \fBdhit \fR .ad .RS 14n Demand data hits per second .RE .sp .ne 2 .na \fBdmis \fR .ad .RS 14n Demand data misses per second .RE .sp .ne 2 .na \fBhit% \fR .ad .RS 14n ARC hit percentage .RE .sp .ne 2 .na \fBhits \fR .ad .RS 14n ARC reads per second .RE .sp .ne 2 .na \fBmfug \fR .ad .RS 14n MFU ghost list hits per second .RE .sp .ne 2 .na \fBmhit \fR .ad .RS 14n Metadata hits per second .RE .sp .ne 2 .na \fBmiss \fR .ad .RS 14n ARC misses per second .RE .sp .ne 2 .na \fBmmis \fR .ad .RS 14n Metadata misses per second .RE .sp .ne 2 .na \fBmrug \fR .ad .RS 14n MRU ghost list hits per second .RE .sp .ne 2 .na \fBphit \fR .ad .RS 14n Prefetch hits per second .RE .sp .ne 2 .na \fBpmis \fR .ad .RS 14n Prefetch misses per second .RE .sp .ne 2 .na \fBread \fR .ad .RS 14n Total ARC accesses per second .RE .sp .ne 2 .na \fBtime \fR .ad .RS 14n Time .RE .sp .ne 2 .na \fBsize \fR .ad .RS 14n ARC size .RE .sp .ne 2 .na \fBarcsz \fR .ad .RS 14n Alias for \fBsize\fR .RE .sp .ne 2 .na \fBdread \fR .ad .RS 14n Demand data accesses per second .RE .sp .ne 2 .na \fBeskip \fR .ad .RS 14n evict_skip per second .RE .sp .ne 2 .na \fBmiss% \fR .ad .RS 14n ARC miss percentage .RE .sp .ne 2 .na \fBmread \fR .ad .RS 14n Metadata accesses per second .RE .sp .ne 2 .na \fBpread \fR .ad .RS 14n Prefetch accesses per second .RE .sp .ne 2 .na \fBl2hit% \fR .ad .RS 14n L2ARC access hit percentage .RE .sp .ne 2 .na \fBl2hits \fR .ad .RS 14n L2ARC hits per second .RE .sp .ne 2 .na \fBl2miss \fR .ad .RS 14n L2ARC misses per second .RE .sp .ne 2 .na \fBl2read \fR .ad .RS 14n Total L2ARC accesses per second .RE .sp .ne 2 .na \fBl2size \fR .ad .RS 14n Size of the L2ARC .RE .sp .ne 2 .na \fBmtxmis \fR .ad .RS 14n mutex_miss per second .RE 
.sp .ne 2 .na \fBl2bytes \fR .ad .RS 14n Bytes read per second from the L2ARC .RE .sp .ne 2 .na \fBl2miss% \fR .ad .RS 14n L2ARC access miss percentage .RE .sp .ne 2 .na \fBl2asize \fR .ad .RS 14n Actual (compressed) size of the L2ARC .RE .sp .ne 2 .na \fBgrow \fR .ad .RS 14n ARC grow disabled .RE .sp .ne 2 .na \fBneed \fR .ad .RS 14n ARC reclaim needed .RE .sp .ne 2 .na \fBfree \fR .ad .RS 14n The ARC's idea of how much free memory there is, which includes evictable memory in the page cache. Since the ARC tries to keep \fBavail\fR above zero, \fBavail\fR is usually more instructive to observe than \fBfree\fR. .RE .sp .ne 2 .na \fBavail \fR .ad .RS 14n The ARC's idea of how much free memory is available to it, which is a bit less than \fBfree\fR. May temporarily be negative, in which case the ARC will reduce the target size \fBc\fR. .RE .\" .SH OPTIONS .LP The following options are supported: .sp .ne 2 .na \fB\fB-f\fR\fR .ad .RS 12n Display only specific fields. See \fBDESCRIPTION\fR for supported statistics. .RE .sp .ne 2 .na \fB\fB-h\fR\fR .ad .RS 12n Display help message. .RE .sp .ne 2 .na \fB\fB-o\fR\fR .ad .RS 12n Report statistics to a file instead of the standard output. .RE .sp .ne 2 .na \fB\fB-s\fR\fR .ad .RS 12n Display data with a specified separator (default: 2 spaces). .RE .sp .ne 2 .na \fB\fB-x\fR\fR .ad .RS 12n Print extended stats (same as -f time,mfu,mru,mfug,mrug,eskip,mtxmis,dread,pread,read). .RE .sp .ne 2 .na \fB\fB-v\fR\fR .ad .RS 12n Show field headers and definitions .RE .SH OPERANDS .LP The following operands are supported: .sp .ne 2 .na \fB\fIcount\fR\fR .ad .RS 12n Display only \fIcount\fR reports. .RE .sp .ne 2 .na \fB\fIinterval\fR\fR .ad .RS 12n Specify the sampling interval in seconds. .RE .SH AUTHORS .LP arcstat was originally written in Perl by Neelakanth Nadgir and supported only ZFS ARC statistics. Mike Harsch updated it to support L2ARC statistics. John Hixson ported it to Python for FreeNAS over some beer, after which many individuals from the OpenZFS community continued to maintain and improve it. diff --git a/man/man1/cstyle.1 b/man/man1/cstyle.1 index f77d534507a4..14175838a4fd 100644 --- a/man/man1/cstyle.1 +++ b/man/man1/cstyle.1 @@ -1,167 +1,167 @@ .\" Copyright 2009 Sun Microsystems, Inc. All rights reserved. .\" Use is subject to license terms. .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" -.TH cstyle 1 "28 March 2005" +.TH CSTYLE 1 "Aug 24, 2020" OpenZFS .SH NAME .I cstyle \- check for some common stylistic errors in C source files .SH SYNOPSIS \fBcstyle [-chpvCP] [-o constructs] [file...]\fP .LP .SH DESCRIPTION .IX "OS-Net build tools" "cstyle" "" "\fBcstyle\fP" .LP .I cstyle inspects C source files (*.c and *.h) for common stylistic errors. 
It attempts to check for the cstyle documented in \fIhttp://www.cis.upenn.edu/~lee/06cse480/data/cstyle.ms.pdf\fP. Note that there is much in that document that .I cannot be checked for; just because your code is \fBcstyle(1)\fP clean does not mean that you've followed Sun's C style. \fICaveat emptor\fP. .LP .SH OPTIONS .LP The following options are supported: .TP 4 .B \-c Check continuation line indentation inside of functions. Sun's C style states that all statements must be indented to an appropriate tab stop, and any continuation lines after them must be indented \fIexactly\fP four spaces from the start line. This option enables a series of checks designed to find continuation line problems within functions only. The checks have some limitations; see CONTINUATION CHECKING, below. .LP .TP 4 .B \-h Performs heuristic checks that are sometimes wrong. Not generally used. .LP .TP 4 .B \-p Performs some of the more picky checks. Includes ANSI #else and #endif rules, and tries to detect spaces after casts. Used as part of the putback checks. .LP .TP 4 .B \-v Verbose output; includes the text of the line of error, and, for \fB-c\fP, the first statement in the current continuation block. .LP .TP 4 .B \-C Ignore errors in header comments (i.e. block comments starting in the first column). Not generally used. .LP .TP 4 .B \-P Check for use of non-POSIX types. Historically, types like "u_int" and "u_long" were used, but they are now deprecated in favor of the POSIX types uint_t, ulong_t, etc. This detects any use of the deprecated types. Used as part of the putback checks. .LP .TP 4 .B \-o \fIconstructs\fP Allow a comma-separated list of additional constructs. Available constructs include: .LP .TP 10 .B doxygen Allow doxygen-style block comments (\fB/**\fP and \fB/*!\fP) .LP .TP 10 .B splint Allow splint-style lint comments (\fB/*@...@*/\fP) .LP .SH NOTES .LP The cstyle rule for the OS/Net consolidation is that all new files must be \fB-pP\fP clean. For existing files, the following invocations are run against both the old and new files: .LP .TP 4 \fBcstyle file\fB .LP .TP 4 \fBcstyle -p file\fB .LP .TP 4 \fBcstyle -pP file\fB .LP If the old file gave no errors for one of the invocations, the new file must also give no errors. This way, files can only become more clean. .LP .SH CONTINUATION CHECKING .LP The continuation checker is a reasonably simple state machine that knows something about how C is laid out, and can match parenthesis, etc. over multiple lines. It does have some limitations: .LP .TP 4 .B 1. Preprocessor macros which cause unmatched parenthesis will confuse the checker for that line. To fix this, you'll need to make sure that each branch of the #if statement has balanced parenthesis. .LP .TP 4 .B 2. Some \fBcpp\fP macros do not require ;s after them. Any such macros *must* be ALL_CAPS; any lower case letters will cause bad output. .LP The bad output will generally be corrected after the next \fB;\fP, \fB{\fP, or \fB}\fP. .LP Some continuation error messages deserve some additional explanation .LP .TP 4 .B multiple statements continued over multiple lines A multi-line statement which is not broken at statement boundaries. For example: .RS 4 .HP 4 if (this_is_a_long_variable == another_variable) a = .br b + c; .LP Will trigger this error. Instead, do: .HP 8 if (this_is_a_long_variable == another_variable) .br a = b + c; .RE .LP .TP 4 .B empty if/for/while body not on its own line For visibility, empty bodies for if, for, and while statements should be on their own line. 
For example: .RS 4 .HP 4 while (do_something(&x) == 0); .LP Will trigger this error. Instead, do: .HP 8 while (do_something(&x) == 0) .br ; .RE diff --git a/man/man1/raidz_test.1 b/man/man1/raidz_test.1 index 423177a1b839..63e9144ad201 100644 --- a/man/man1/raidz_test.1 +++ b/man/man1/raidz_test.1 @@ -1,97 +1,97 @@ '\" t .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright (c) 2016 Gvozden Nešković. All rights reserved. .\" -.TH raidz_test 1 "2016" "ZFS on Linux" "User Commands" +.TH RAIDZ_TEST 1 "Aug 24, 2020" OpenZFS .SH NAME \fBraidz_test\fR \- raidz implementation verification and benchmarking tool .SH SYNOPSIS .LP .BI "raidz_test " .SH DESCRIPTION .LP This manual page briefly documents the \fBraidz_test\fR command. .LP The purpose of this tool is to run all supported raidz implementations and verify the results of all methods. The tool also contains a parameter sweep option in which all parameters affecting a RAIDZ block are verified (such as ashift size, data offset, and data size). The tool also supports a benchmarking mode using the -B option. .SH OPTIONS .HP .BI "\-h" "" .IP Print a help summary. .HP .BI "\-a" " ashift (default: 9)" .IP Ashift value. .HP .BI "\-o" " zio_off_shift" " (default: 0)" .IP Zio offset for the raidz block. The offset value is 1 << (zio_off_shift). .HP .BI "\-d" " raidz_data_disks" " (default: 8)" .IP Number of raidz data disks to use. Additional disks for parity will be used during testing. .HP .BI "\-s" " zio_size_shift" " (default: 19)" .IP Size of data for the raidz block. The size is 1 << (zio_size_shift). .HP .BI "\-S(weep)" .IP Sweep parameter space while verifying the raidz implementations. This option will exhaust most of the valid values for the -a, -o, -d and -s options. Runtime using this option will be long. .HP .BI "\-t(imeout)" .IP Wall time for the sweep test in seconds. The actual runtime could be longer. .HP .BI "\-B(enchmark)" .IP This option starts the benchmark mode. All implementations are benchmarked using increasing per-disk data sizes. Results are given as throughput per disk, measured in MiB/s. .HP .BI "\-v(erbose)" .IP Increase verbosity. .HP .BI "\-T(est the test)" .IP Debugging option. When this option is specified the tool is expected to fail all tests. This is to check whether the tests would properly verify bit-exactness. .HP .BI "\-D(ebug)" .IP Debugging option. Specify to attach gdb when SIGSEGV or SIGABRT are received.
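.SH "EXAMPLES" .LP The following invocations are illustrative sketches based on the options described above; the exact option forms accepted and the output produced may differ between versions. To benchmark all supported raidz implementations with the default 8 data disks and 512 KiB (1 << 19) records: .IP raidz_test -B -d 8 -s 19 .LP To sweep the parameter space, verifying the implementations for roughly 60 seconds of wall time: .IP raidz_test -S -t 60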
.HP .SH "SEE ALSO" .BR "ztest (1)" .SH "AUTHORS" vdev_raidz, created for ZFS on Linux by Gvozden Nešković diff --git a/man/man1/zhack.1 b/man/man1/zhack.1 index 11d300b70014..3126007a5e0d 100644 --- a/man/man1/zhack.1 +++ b/man/man1/zhack.1 @@ -1,98 +1,98 @@ '\" t .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright 2013 Darik Horn . All rights reserved. .\" -.TH zhack 1 "2013 MAR 16" "ZFS on Linux" "User Commands" +.TH ZHACK 1 "Aug 24, 2020" OpenZFS .SH NAME zhack \- libzpool debugging tool .SH DESCRIPTION This utility pokes configuration changes directly into a ZFS pool, which is dangerous and can cause data corruption. .SH SYNOPSIS .LP .BI "zhack [\-c " "cachefile" "] [\-d " "dir" "] <" "subcommand" "> [" "arguments" "]" .SH OPTIONS .HP .BI "\-c" " cachefile" .IP Read the \fIpool\fR configuration from the \fIcachefile\fR, which is /etc/zfs/zpool.cache by default. .HP .BI "\-d" " dir" .IP Search for \fIpool\fR members in the \fIdir\fR path. Can be specified more than once. .SH SUBCOMMANDS .LP .BI "feature stat " "pool" .IP List feature flags. .LP .BI "feature enable [\-d " "description" "] [\-r] " "pool guid" .IP Add a new feature to \fIpool\fR that is uniquely identified by \fIguid\fR, which is specified in the same form as a zfs(8) user property. .IP The \fIdescription\fR is a short human readable explanation of the new feature. .IP The \fB\-r\fR switch indicates that \fIpool\fR can be safely opened in read-only mode by a system that does not have the \fIguid\fR feature. .LP .BI "feature ref [\-d|\-m] " "pool guid" .IP Increment the reference count of the \fIguid\fR feature in \fIpool\fR. .IP The \fB\-d\fR switch decrements the reference count of the \fIguid\fR feature in \fIpool\fR. .IP The \fB\-m\fR switch indicates that the \fIguid\fR feature is now required to read the pool MOS. .SH EXAMPLES .LP .nf # zhack feature stat tank for_read_obj: org.illumos:lz4_compress = 0 for_write_obj: com.delphix:async_destroy = 0 com.delphix:empty_bpobj = 0 descriptions_obj: com.delphix:async_destroy = Destroy filesystems asynchronously. com.delphix:empty_bpobj = Snapshots use less space. org.illumos:lz4_compress = LZ4 compression algorithm support. .LP # zhack feature enable -d 'Predict future disk failures.' \\ tank com.example:clairvoyance .LP # zhack feature ref tank com.example:clairvoyance .SH AUTHORS This man page was written by Darik Horn . 
.SH SEE ALSO .BR zfs (8), .BR zpool-features (5), .BR ztest (1) diff --git a/man/man1/ztest.1 b/man/man1/ztest.1 index 84e56c822d13..68c978ca0968 100644 --- a/man/man1/ztest.1 +++ b/man/man1/ztest.1 @@ -1,179 +1,179 @@ '\" t .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright (c) 2009 Oracle and/or its affiliates. All rights reserved. .\" Copyright (c) 2009 Michael Gebetsroither . All rights .\" reserved. .\" -.TH ztest 1 "2009 NOV 01" "ZFS on Linux" "User Commands" +.TH ZTEST 1 "Aug 24, 2020" OpenZFS .SH NAME \fBztest\fR \- ZFS unit test tool written by the ZFS Developers .SH SYNOPSIS .LP .BI "ztest " .SH DESCRIPTION .LP This manual page briefly documents the \fBztest\fR command. .LP \fBztest\fR was written by the ZFS Developers as a ZFS unit test. The tool was developed in tandem with the ZFS functionality and was executed nightly as one of the many regression tests against the daily build. As features were added to ZFS, unit tests were also added to \fBztest\fR. In addition, a separate test development team wrote and executed more functional and stress tests. .LP By default \fBztest\fR runs for five minutes and uses block files (stored in /tmp) to create pools rather than using physical disks. Block files afford \fBztest\fR its flexibility to play around with zpool components without requiring large hardware configurations. However, storing the block files in /tmp may not work for you if you have a small tmp directory. .LP By default \fBztest\fR is non-verbose, which is why entering the command above will result in \fBztest\fR quietly executing for 5 minutes. The -V option can be used to increase the verbosity of the tool. Multiple -V options are allowed; the more you add, the chattier \fBztest\fR becomes. .LP After a \fBztest\fR run completes, you should notice many ztest.* files lying around. These can safely be removed once the run has completed, but should not be removed during a run. You can re-use these files in your next \fBztest\fR run by using the -E option. .SH OPTIONS .HP .BI "\-?" "" .IP Print a help summary. .HP .BI "\-v" " vdevs" " (default: 5)" .IP Number of vdevs. .HP .BI "\-s" " size_of_each_vdev" " (default: 64M)" .IP Size of each vdev. .HP .BI "\-a" " alignment_shift" " (default: 9) (use 0 for random)" .IP Alignment used in the test. .HP .BI "\-m" " mirror_copies" " (default: 2)" .IP Number of mirror copies. .HP .BI "\-r" " raidz_disks" " (default: 4)" .IP Number of raidz disks. .HP .BI "\-R" " raidz_parity" " (default: 1)" .IP Raidz parity. .HP .BI "\-d" " datasets" " (default: 7)" .IP Number of datasets. .HP .BI "\-t" " threads" " (default: 23)" .IP Number of threads. .HP .BI "\-g" " gang_block_threshold" " (default: 32K)" .IP Gang block threshold.
.HP .BI "\-i" " initialize_pool_i_times" " (default: 1)" .IP Number of pool initialisations. .HP .BI "\-k" " kill_percentage" " (default: 70%)" .IP Kill percentage. .HP .BI "\-p" " pool_name" " (default: ztest)" .IP Pool name. .HP .BI "\-V(erbose)" .IP Verbose (use multiple times for ever more blather). .HP .BI "\-E(xisting)" .IP Use existing pool (use existing pool instead of creating new one). .HP .BI "\-T" " time" " (default: 300 sec)" .IP Total test run time. .HP .BI "\-z" " zil_failure_rate" " (default: fail every 2^5 allocs) .IP Injected failure rate. .HP .BI "\-G" .IP Dump zfs_dbgmsg buffer before exiting. .SH "EXAMPLES" .LP To override /tmp as your location for block files, you can use the -f option: .IP ztest -f / .LP To get an idea of what ztest is actually testing try this: .IP ztest -f / -VVV .LP Maybe you'd like to run ztest for longer? To do so simply use the -T option and specify the runlength in seconds like so: .IP ztest -f / -V -T 120 .SH "ENVIRONMENT VARIABLES" .TP .B "ZFS_HOSTID=id" Use \fBid\fR instead of the SPL hostid to identify this host. Intended for use with ztest, but this environment variable will affect any utility which uses libzpool, including \fBzpool(8)\fR. Since the kernel is unaware of this setting results with utilities other than ztest are undefined. .TP .B "ZFS_STACK_SIZE=stacksize" Limit the default stack size to \fBstacksize\fR bytes for the purpose of detecting and debugging kernel stack overflows. This value defaults to \fB32K\fR which is double the default \fB16K\fR Linux kernel stack size. In practice, setting the stack size slightly higher is needed because differences in stack usage between kernel and user space can lead to spurious stack overflows (especially when debugging is enabled). The specified value will be rounded up to a floor of PTHREAD_STACK_MIN which is the minimum stack required for a NULL procedure in user space. By default the stack size is limited to 256K. .SH "SEE ALSO" .BR "spl-module-parameters (5)" "," .BR "zpool (1)" "," .BR "zfs (1)" "," .BR "zdb (1)" "," .SH "AUTHOR" This manual page was transferred to asciidoc by Michael Gebetsroither from http://opensolaris.org/os/community/zfs/ztest/ diff --git a/man/man5/spl-module-parameters.5 b/man/man5/spl-module-parameters.5 index 2dce5b2963d6..5e28e694e04c 100644 --- a/man/man5/spl-module-parameters.5 +++ b/man/man5/spl-module-parameters.5 @@ -1,326 +1,326 @@ '\" te .\" .\" Copyright 2013 Turbo Fredriksson . All rights reserved. .\" -.TH SPL-MODULE-PARAMETERS 5 "Oct 28, 2017" +.TH SPL-MODULE-PARAMETERS 5 "Aug 24, 2020" OpenZFS .SH NAME spl\-module\-parameters \- SPL module parameters .SH DESCRIPTION .sp .LP Description of the different parameters to the SPL module. .SS "Module parameters" .sp .LP .sp .ne 2 .na \fBspl_kmem_cache_expire\fR (uint) .ad .RS 12n Cache expiration is part of default Illumos cache behavior. The idea is that objects in magazines which have not been recently accessed should be returned to the slabs periodically. This is known as cache aging and when enabled objects will be typically returned after 15 seconds. .sp On the other hand Linux slabs are designed to never move objects back to the slabs unless there is memory pressure. This is possible because under Linux the cache will be notified when memory is low and objects can be released. .sp By default only the Linux method is enabled. It has been shown to improve responsiveness on low memory systems and not negatively impact the performance of systems with more memory. 
This policy may be changed by setting the \fBspl_kmem_cache_expire\fR bit mask as follows, both policies may be enabled concurrently. .sp 0x01 - Aging (Illumos), 0x02 - Low memory (Linux) .sp Default value: \fB0x02\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_kmem_threads\fR (uint) .ad .RS 12n The number of threads created for the spl_kmem_cache task queue. This task queue is responsible for allocating new slabs for use by the kmem caches. For the majority of systems and workloads only a small number of threads are required. .sp Default value: \fB4\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_reclaim\fR (uint) .ad .RS 12n When this is set it prevents Linux from being able to rapidly reclaim all the memory held by the kmem caches. This may be useful in circumstances where it's preferable that Linux reclaim memory from some other subsystem first. Setting this will increase the likelihood out of memory events on a memory constrained system. .sp Default value: \fB0\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_obj_per_slab\fR (uint) .ad .RS 12n The preferred number of objects per slab in the cache. In general, a larger value will increase the caches memory footprint while decreasing the time required to perform an allocation. Conversely, a smaller value will minimize the footprint and improve cache reclaim time but individual allocations may take longer. .sp Default value: \fB8\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_obj_per_slab_min\fR (uint) .ad .RS 12n The minimum number of objects allowed per slab. Normally slabs will contain \fBspl_kmem_cache_obj_per_slab\fR objects but for caches that contain very large objects it's desirable to only have a few, or even just one, object per slab. .sp Default value: \fB1\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_max_size\fR (uint) .ad .RS 12n The maximum size of a kmem cache slab in MiB. This effectively limits the maximum cache object size to \fBspl_kmem_cache_max_size\fR / \fBspl_kmem_cache_obj_per_slab\fR. Caches may not be created with object sized larger than this limit. .sp Default value: \fB32 (64-bit) or 4 (32-bit)\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_slab_limit\fR (uint) .ad .RS 12n For small objects the Linux slab allocator should be used to make the most efficient use of the memory. However, large objects are not supported by the Linux slab and therefore the SPL implementation is preferred. This value is used to determine the cutoff between a small and large object. .sp Objects of \fBspl_kmem_cache_slab_limit\fR or smaller will be allocated using the Linux slab allocator, large objects use the SPL allocator. A cutoff of 16K was determined to be optimal for architectures using 4K pages. .sp Default value: \fB16,384\fR .RE .sp .ne 2 .na \fBspl_kmem_alloc_warn\fR (uint) .ad .RS 12n As a general rule kmem_alloc() allocations should be small, preferably just a few pages since they must by physically contiguous. Therefore, a rate limited warning will be printed to the console for any kmem_alloc() which exceeds a reasonable threshold. .sp The default warning threshold is set to eight pages but capped at 32K to accommodate systems using large pages. This value was selected to be small enough to ensure the largest allocations are quickly noticed and fixed. But large enough to avoid logging any warnings when a allocation size is larger than optimal but not a serious concern. Since this value is tunable, developers are encouraged to set it lower when testing so any new largish allocations are quickly caught. These warnings may be disabled by setting the threshold to zero. 
.sp Default value: \fB32,768\fR .RE .sp .ne 2 .na \fBspl_kmem_alloc_max\fR (uint) .ad .RS 12n Large kmem_alloc() allocations will fail if they exceed KMALLOC_MAX_SIZE. Allocations which are marginally smaller than this limit may succeed but should still be avoided due to the expense of locating a contiguous range of free pages. Therefore, a maximum kmem size with reasonable safely margin of 4x is set. Kmem_alloc() allocations larger than this maximum will quickly fail. Vmem_alloc() allocations less than or equal to this value will use kmalloc(), but shift to vmalloc() when exceeding this value. .sp Default value: \fBKMALLOC_MAX_SIZE/4\fR .RE .sp .ne 2 .na \fBspl_kmem_cache_magazine_size\fR (uint) .ad .RS 12n Cache magazines are an optimization designed to minimize the cost of allocating memory. They do this by keeping a per-cpu cache of recently freed objects, which can then be reallocated without taking a lock. This can improve performance on highly contended caches. However, because objects in magazines will prevent otherwise empty slabs from being immediately released this may not be ideal for low memory machines. .sp For this reason \fBspl_kmem_cache_magazine_size\fR can be used to set a maximum magazine size. When this value is set to 0 the magazine size will be automatically determined based on the object size. Otherwise magazines will be limited to 2-256 objects per magazine (i.e per cpu). Magazines may never be entirely disabled in this implementation. .sp Default value: \fB0\fR .RE .sp .ne 2 .na \fBspl_hostid\fR (ulong) .ad .RS 12n The system hostid, when set this can be used to uniquely identify a system. By default this value is set to zero which indicates the hostid is disabled. It can be explicitly enabled by placing a unique non-zero value in \fB/etc/hostid/\fR. .sp Default value: \fB0\fR .RE .sp .ne 2 .na \fBspl_hostid_path\fR (charp) .ad .RS 12n The expected path to locate the system hostid when specified. This value may be overridden for non-standard configurations. .sp Default value: \fB/etc/hostid\fR .RE .sp .ne 2 .na \fBspl_panic_halt\fR (uint) .ad .RS 12n -Cause a kernel panic on assertion failures. When not enabled, the thread is +Cause a kernel panic on assertion failures. When not enabled, the thread is halted to facilitate further debugging. .sp Set to a non-zero value to enable. .sp Default value: \fB0\fR .RE .sp .ne 2 .na \fBspl_taskq_kick\fR (uint) .ad .RS 12n Kick stuck taskq to spawn threads. When writing a non-zero value to it, it will scan all the taskqs. If any of them have a pending task more than 5 seconds old, it will kick it to spawn more threads. This can be used if you find a rare deadlock occurs because one or more taskqs didn't spawn a thread when it should. .sp Default value: \fB0\fR .RE .sp .ne 2 .na \fBspl_taskq_thread_bind\fR (int) .ad .RS 12n Bind taskq threads to specific CPUs. When enabled all taskq threads will be distributed evenly over the available CPUs. By default, this behavior is disabled to allow the Linux scheduler the maximum flexibility to determine where a thread should run. .sp Default value: \fB0\fR .RE .sp .ne 2 .na \fBspl_taskq_thread_dynamic\fR (int) .ad .RS 12n Allow dynamic taskqs. When enabled taskqs which set the TASKQ_DYNAMIC flag will by default create only a single thread. New threads will be created on demand up to a maximum allowed number to facilitate the completion of outstanding tasks. Threads which are no longer needed will be promptly destroyed. 
By default this behavior is enabled but it can be disabled to aid performance analysis or troubleshooting. .sp Default value: \fB1\fR .RE .sp .ne 2 .na \fBspl_taskq_thread_priority\fR (int) .ad .RS 12n Allow newly created taskq threads to set a non-default scheduler priority. When enabled the priority specified when a taskq is created will be applied to all threads created by that taskq. When disabled all threads will use the default Linux kernel thread priority. By default, this behavior is enabled. .sp Default value: \fB1\fR .RE .sp .ne 2 .na \fBspl_taskq_thread_sequential\fR (int) .ad .RS 12n The number of items a taskq worker thread must handle without interruption before requesting a new worker thread be spawned. This is used to control how quickly taskqs ramp up the number of threads processing the queue. Because Linux thread creation and destruction are relatively inexpensive a small default value has been selected. This means that normally threads will be created aggressively which is desirable. Increasing this value will result in a slower thread creation rate which may be preferable for some configurations. .sp Default value: \fB4\fR .RE .sp .ne 2 .na \fBspl_max_show_tasks\fR (uint) .ad .RS 12n The maximum number of tasks per pending list in each taskq shown in /proc/spl/{taskq,taskq-all}. Write 0 to turn off the limit. The proc file will walk the lists with lock held, reading it could cause a lock up if the list grow too large without limiting the output. "(truncated)" will be shown if the list is larger than the limit. .sp Default value: \fB512\fR .RE diff --git a/man/man5/vdev_id.conf.5 b/man/man5/vdev_id.conf.5 index 89c5ee961094..9ae3865f7d3d 100644 --- a/man/man5/vdev_id.conf.5 +++ b/man/man5/vdev_id.conf.5 @@ -1,222 +1,222 @@ -.TH vdev_id.conf 5 +.TH VDEV_ID.CONF 5 "Aug 24, 2020" OpenZFS .SH NAME vdev_id.conf \- Configuration file for vdev_id .SH DESCRIPTION .I vdev_id.conf is the configuration file for .BR vdev_id (8). It controls the default behavior of .BR vdev_id (8) while it is mapping a disk device name to an alias. .PP The .I vdev_id.conf file uses a simple format consisting of a keyword followed by one or more values on a single line. Any line not beginning with a recognized keyword is ignored. Comments may optionally begin with a hash character. The following keywords and values are used. .TP \fIalias\fR Maps a device link in the /dev directory hierarchy to a new device name. The udev rule defining the device link must have run prior to .BR vdev_id (8). A defined alias takes precedence over a topology-derived name, but the two naming methods can otherwise coexist. For example, one might name drives in a JBOD with the sas_direct topology while naming an internal L2ARC device with an alias. \fIname\fR - the name of the link to the device that will by created in /dev/disk/by-vdev. \fIdevlink\fR - the name of the device link that has already been defined by udev. This may be an absolute path or the base filename. .TP \fIchannel\fR [pci_slot] Maps a physical path to a channel name (typically representing a single disk enclosure). .TP \fIenclosure_symlinks\fR Additionally create /dev/by-enclosure symlinks to the disk enclosure sg devices using the naming scheme from vdev_id.conf. \fIenclosure_symlinks\fR is only allowed for sas_direct mode. .TP \fIenclosure_symlinks_prefix\fR Specify the prefix for the enclosure symlinks in the form of: /dev/by-enclosure/- Defaults to "enc" if not specified. 
.TP \fIpci_slot\fR - specifies the PCI SLOT of the HBA hosting the disk enclosure being mapped, as found in the output of .BR lspci (8). This argument is not used in sas_switch mode. \fIport\fR - specifies the numeric identifier of the HBA or SAS switch port connected to the disk enclosure being mapped. \fIname\fR - specifies the name of the channel. .TP \fIslot\fR [channel] Maps a disk slot number as reported by the operating system to an alternative slot number. If the \fIchannel\fR parameter is specified then the mapping is only applied to slots in the named channel, otherwise the mapping is applied to all channels. The first-specified \fIslot\fR rule that can match a slot takes precedence. Therefore a channel-specific mapping for a given slot should generally appear before a generic mapping for the same slot. In this way a custom mapping may be applied to a particular channel and a default mapping applied to the others. .TP \fImultipath\fR Specifies whether .BR vdev_id (8) will handle only dm-multipath devices. If set to "yes" then .BR vdev_id (8) will examine the first running component disk of a dm-multipath device as listed by the .BR multipath (8) command to determine the physical path. .TP \fItopology\fR Identifies a physical topology that governs how physical paths are mapped to channels. \fIsas_direct\fR - in this mode a channel is uniquely identified by a PCI slot and a HBA port number \fIsas_switch\fR - in this mode a channel is uniquely identified by a SAS switch port number .TP \fIphys_per_port\fR Specifies the number of PHY devices associated with a SAS HBA port or SAS switch port. .BR vdev_id (8) internally uses this value to determine which HBA or switch port a device is connected to. The default is 4. .TP \fIslot\fR Specifies from which element of a SAS identifier the slot number is taken. The default is bay. \fIbay\fR - read the slot number from the bay identifier. \fIphy\fR - read the slot number from the phy identifier. \fIport\fR - use the SAS port as the slot number. \fIid\fR - use the scsi id as the slot number. \fIlun\fR - use the scsi lun as the slot number. \fIses\fR - use the SCSI Enclosure Services (SES) enclosure device slot number, as reported by .BR sg_ses (8). This is intended for use only on systems where \fIbay\fR is unsupported, noting that \fIport\fR and \fIid\fR may be unstable across disk replacement. .SH EXAMPLES A non-multipath configuration with direct-attached SAS enclosures and an arbitrary slot re-mapping. .P .nf multipath no topology sas_direct phys_per_port 4 slot bay # PCI_SLOT HBA PORT CHANNEL NAME channel 85:00.0 1 A channel 85:00.0 0 B channel 86:00.0 1 C channel 86:00.0 0 D # Custom mapping for Channel A # Linux Mapped # Slot Slot Channel slot 1 7 A slot 2 10 A slot 3 3 A slot 4 6 A # Default mapping for B, C, and D slot 1 4 slot 2 2 slot 3 1 slot 4 3 .fi .P A SAS-switch topology. Note that the .I channel keyword takes only two arguments in this example. .P .nf topology sas_switch # SWITCH PORT CHANNEL NAME channel 1 A channel 2 B channel 3 C channel 4 D .fi .P A multipath configuration. Note that channel names have multiple definitions - one per physical path. .P .nf multipath yes # PCI_SLOT HBA PORT CHANNEL NAME channel 85:00.0 1 A channel 85:00.0 0 B channel 86:00.0 1 A channel 86:00.0 0 B .fi .P A configuration with enclosure_symlinks enabled. 
.P .nf multipath yes enclosure_symlinks yes # PCI_ID HBA PORT CHANNEL NAME channel 05:00.0 1 U channel 05:00.0 0 L channel 06:00.0 1 U channel 06:00.0 0 L .fi In addition to the disks symlinks, this configuration will create: .P .nf /dev/by-enclosure/enc-L0 /dev/by-enclosure/enc-L1 /dev/by-enclosure/enc-U0 /dev/by-enclosure/enc-U1 .fi .P A configuration using device link aliases. .P .nf # by-vdev # name fully qualified or base name of device link alias d1 /dev/disk/by-id/wwn-0x5000c5002de3b9ca alias d2 wwn-0x5000c5002def789e .fi .P .SH FILES .TP .I /etc/zfs/vdev_id.conf The configuration file for .BR vdev_id (8). .SH SEE ALSO .BR vdev_id (8) diff --git a/man/man5/zfs-events.5 b/man/man5/zfs-events.5 index 4a28be71e685..0d0e1a9593d5 100644 --- a/man/man5/zfs-events.5 +++ b/man/man5/zfs-events.5 @@ -1,965 +1,965 @@ '\" te .\" Copyright (c) 2013 by Turbo Fredriksson . All rights reserved. .\" Portions Copyright 2018 by Richard Elling .\" The contents of this file are subject to the terms of the Common Development .\" and Distribution License (the "License"). You may not use this file except .\" in compliance with the License. You can obtain a copy of the license at .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. .\" .\" See the License for the specific language governing permissions and .\" limitations under the License. When distributing Covered Code, include this .\" CDDL HEADER in each file and include the License file at .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your .\" own identifying information: .\" Portions Copyright [yyyy] [name of copyright owner] -.TH ZFS-EVENTS 5 "Oct 24, 2018" +.TH ZFS-EVENTS 5 "Aug 24, 2020" OpenZFS .SH NAME zfs\-events \- Events created by the ZFS filesystem. .SH DESCRIPTION .sp .LP Description of the different events generated by the ZFS stack. .sp Most of these don't have any description. The events generated by ZFS have never been publicly documented. What is here is intended as a starting point to provide documentation for all possible events. .sp To view all events created since the loading of the ZFS infrastructure (i.e, "the module"), run .P .nf \fBzpool events\fR .fi .P to get a short list, and .P .nf \fBzpool events -v\fR .fi .P to get a full detail of the events and what information is available about it. .sp This man page lists the different subclasses that are issued in the case of an event. The full event name would be \fIereport.fs.zfs.SUBCLASS\fR, but we only list the last part here. .SS "EVENTS (SUBCLASS)" .sp .LP .sp .ne 2 .na \fBchecksum\fR .ad .RS 12n Issued when a checksum error has been detected. .RE .sp .ne 2 .na \fBio\fR .ad .RS 12n Issued when there is an I/O error in a vdev in the pool. .RE .sp .ne 2 .na \fBdata\fR .ad .RS 12n Issued when there have been data errors in the pool. .RE .sp .ne 2 .na \fBdeadman\fR .ad .RS 12n Issued when an I/O is determined to be "hung", this can be caused by lost completion events due to flaky hardware or drivers. See the \fBzfs_deadman_failmode\fR module option description for additional information regarding "hung" I/O detection and configuration. .RE .sp .ne 2 .na \fBdelay\fR .ad .RS 12n Issued when a completed I/O exceeds the maximum allowed time specified by the \fBzio_slow_io_ms\fR module option. This can be an indicator of problems with the underlying storage device. 
The number of delay events is ratelimited by the \fBzfs_slow_io_events_per_second\fR module parameter. .RE .sp .ne 2 .na \fBconfig.sync\fR .ad .RS 12n Issued every time a vdev change has been made to the pool. .RE .sp .ne 2 .na \fBzpool\fR .ad .RS 12n Issued when a pool cannot be imported. .RE .sp .ne 2 .na \fBzpool.destroy\fR .ad .RS 12n Issued when a pool is destroyed. .RE .sp .ne 2 .na \fBzpool.export\fR .ad .RS 12n Issued when a pool is exported. .RE .sp .ne 2 .na \fBzpool.import\fR .ad .RS 12n Issued when a pool is imported. .RE .sp .ne 2 .na \fBzpool.reguid\fR .ad .RS 12n Issued when a REGUID (a new unique identifier for the pool) has been regenerated. .RE .sp .ne 2 .na \fBvdev.unknown\fR .ad .RS 12n Issued when the vdev is unknown, such as when trying to clear device errors on a vdev that has failed or been removed from the system or pool and is no longer available. .RE .sp .ne 2 .na \fBvdev.open_failed\fR .ad .RS 12n Issued when a vdev could not be opened (because it didn't exist, for example). .RE .sp .ne 2 .na \fBvdev.corrupt_data\fR .ad .RS 12n Issued when corrupt data has been detected on a vdev. .RE .sp .ne 2 .na \fBvdev.no_replicas\fR .ad .RS 12n Issued when there are no more replicas to sustain the pool. This would lead to the pool being \fIDEGRADED\fR. .RE .sp .ne 2 .na \fBvdev.bad_guid_sum\fR .ad .RS 12n Issued when a missing device in the pool has been detected. .RE .sp .ne 2 .na \fBvdev.too_small\fR .ad .RS 12n Issued when the system (kernel) has removed a device, and ZFS notices that the device isn't there anymore. This is usually followed by a \fBprobe_failure\fR event. .RE .sp .ne 2 .na \fBvdev.bad_label\fR .ad .RS 12n Issued when the label is OK but invalid. .RE .sp .ne 2 .na \fBvdev.bad_ashift\fR .ad .RS 12n Issued when the ashift alignment requirement has increased. .RE .sp .ne 2 .na \fBvdev.remove\fR .ad .RS 12n Issued when a vdev is detached from a mirror (or a spare is detached from a vdev where it has been used to replace a failed drive; this only works if the original drive has been re-added). .RE .sp .ne 2 .na \fBvdev.clear\fR .ad .RS 12n Issued when clearing device errors in a pool, such as running \fBzpool clear\fR on a device in the pool. .RE .sp .ne 2 .na \fBvdev.check\fR .ad .RS 12n Issued when a check to see if a given vdev could be opened is started. .RE .sp .ne 2 .na \fBvdev.spare\fR .ad .RS 12n Issued when a spare has kicked in to replace a failed device. .RE .sp .ne 2 .na \fBvdev.autoexpand\fR .ad .RS 12n Issued when a vdev can be automatically expanded. .RE .sp .ne 2 .na \fBio_failure\fR .ad .RS 12n Issued when there is an I/O failure in a vdev in the pool. .RE .sp .ne 2 .na \fBprobe_failure\fR .ad .RS 12n Issued when a probe fails on a vdev. This would occur if a vdev has been removed from the system outside of ZFS (such as when the kernel has removed the device). .RE .sp .ne 2 .na \fBlog_replay\fR .ad .RS 12n Issued when the intent log cannot be replayed. This can occur in the case of a missing or damaged log device. .RE .sp .ne 2 .na \fBresilver.start\fR .ad .RS 12n Issued when a resilver is started. .RE .sp .ne 2 .na \fBresilver.finish\fR .ad .RS 12n Issued when the running resilver has finished. .RE .sp .ne 2 .na \fBscrub.start\fR .ad .RS 12n Issued when a scrub is started on a pool. .RE .sp .ne 2 .na \fBscrub.finish\fR .ad .RS 12n Issued when a pool has finished scrubbing. .RE .sp .ne 2 .na \fBscrub.abort\fR .ad .RS 12n Issued when a scrub is aborted on a pool.
.RE .sp .ne 2 .na \fBscrub.resume\fR .ad .RS 12n Issued when a scrub is resumed on a pool. .RE .sp .ne 2 .na \fBscrub.paused\fR .ad .RS 12n Issued when a scrub is paused on a pool. .RE .sp .ne 2 .na \fBbootfs.vdev.attach\fR .ad .RS 12n .RE .SS "PAYLOADS" .sp .LP This is the payload (data, information) that accompanies an event. .sp For .BR zed (8), these are set to uppercase and prefixed with \fBZEVENT_\fR. .sp .ne 2 .na \fBpool\fR .ad .RS 12n Pool name. .RE .sp .ne 2 .na \fBpool_failmode\fR .ad .RS 12n Failmode - \fBwait\fR, \fBcontinue\fR or \fBpanic\fR. See .BR zpool (8) (\fIfailmode\fR property) for more information. .RE .sp .ne 2 .na \fBpool_guid\fR .ad .RS 12n The GUID of the pool. .RE .sp .ne 2 .na \fBpool_context\fR .ad .RS 12n The load state for the pool (0=none, 1=open, 2=import, 3=tryimport, 4=recover 5=error). .RE .sp .ne 2 .na \fBvdev_guid\fR .ad .RS 12n The GUID of the vdev in question (the vdev failing or operated upon with \fBzpool clear\fR etc). .RE .sp .ne 2 .na \fBvdev_type\fR .ad .RS 12n Type of vdev - \fBdisk\fR, \fBfile\fR, \fBmirror\fR etc. See .BR zpool (8) under \fBVirtual Devices\fR for more information on possible values. .RE .sp .ne 2 .na \fBvdev_path\fR .ad .RS 12n Full path of the vdev, including any \fI-partX\fR. .RE .sp .ne 2 .na \fBvdev_devid\fR .ad .RS 12n ID of vdev (if any). .RE .sp .ne 2 .na \fBvdev_fru\fR .ad .RS 12n Physical FRU location. .RE .sp .ne 2 .na \fBvdev_state\fR .ad .RS 12n State of vdev (0=uninitialized, 1=closed, 2=offline, 3=removed, 4=failed to open, 5=faulted, 6=degraded, 7=healthy). .RE .sp .ne 2 .na \fBvdev_ashift\fR .ad .RS 12n The ashift value of the vdev. .RE .sp .ne 2 .na \fBvdev_complete_ts\fR .ad .RS 12n The time the last I/O completed for the specified vdev. .RE .sp .ne 2 .na \fBvdev_delta_ts\fR .ad .RS 12n The time since the last I/O completed for the specified vdev. .RE .sp .ne 2 .na \fBvdev_spare_paths\fR .ad .RS 12n List of spares, including full path and any \fI-partX\fR. .RE .sp .ne 2 .na \fBvdev_spare_guids\fR .ad .RS 12n GUID(s) of spares. .RE .sp .ne 2 .na \fBvdev_read_errors\fR .ad .RS 12n How many read errors that have been detected on the vdev. .RE .sp .ne 2 .na \fBvdev_write_errors\fR .ad .RS 12n How many write errors that have been detected on the vdev. .RE .sp .ne 2 .na \fBvdev_cksum_errors\fR .ad .RS 12n How many checksum errors that have been detected on the vdev. .RE .sp .ne 2 .na \fBparent_guid\fR .ad .RS 12n GUID of the vdev parent. .RE .sp .ne 2 .na \fBparent_type\fR .ad .RS 12n Type of parent. See \fBvdev_type\fR. .RE .sp .ne 2 .na \fBparent_path\fR .ad .RS 12n Path of the vdev parent (if any). .RE .sp .ne 2 .na \fBparent_devid\fR .ad .RS 12n ID of the vdev parent (if any). .RE .sp .ne 2 .na \fBzio_objset\fR .ad .RS 12n The object set number for a given I/O. .RE .sp .ne 2 .na \fBzio_object\fR .ad .RS 12n The object number for a given I/O. .RE .sp .ne 2 .na \fBzio_level\fR .ad .RS 12n The indirect level for the block. Level 0 is the lowest level and includes data blocks. Values > 0 indicate metadata blocks at the appropriate level. .RE .sp .ne 2 .na \fBzio_blkid\fR .ad .RS 12n The block ID for a given I/O. .RE .sp .ne 2 .na \fBzio_err\fR .ad .RS 12n The errno for a failure when handling a given I/O. The errno is compatible with \fBerrno\fR(3) with the value for EBADE (0x34) used to indicate ZFS checksum error. .RE .sp .ne 2 .na \fBzio_offset\fR .ad .RS 12n The offset in bytes of where to write the I/O for the specified vdev. .RE .sp .ne 2 .na \fBzio_size\fR .ad .RS 12n The size in bytes of the I/O. 
.RE .sp .ne 2 .na \fBzio_flags\fR .ad .RS 12n The current flags describing how the I/O should be handled. See the \fBI/O FLAGS\fR section for the full list of I/O flags. .RE .sp .ne 2 .na \fBzio_stage\fR .ad .RS 12n The current stage of the I/O in the pipeline. See the \fBI/O STAGES\fR section for a full list of all the I/O stages. .RE .sp .ne 2 .na \fBzio_pipeline\fR .ad .RS 12n The valid pipeline stages for the I/O. See the \fBI/O STAGES\fR section for a full list of all the I/O stages. .RE .sp .ne 2 .na \fBzio_delay\fR .ad .RS 12n The time elapsed (in nanoseconds) waiting for the block layer to complete the I/O. Unlike \fBzio_delta\fR this does not include any vdev queuing time and is therefore solely a measure of the block layer performance. .RE .sp .ne 2 .na \fBzio_timestamp\fR .ad .RS 12n The time when a given I/O was submitted. .RE .sp .ne 2 .na \fBzio_delta\fR .ad .RS 12n The time required to service a given I/O. .RE .sp .ne 2 .na \fBprev_state\fR .ad .RS 12n The previous state of the vdev. .RE .sp .ne 2 .na \fBcksum_expected\fR .ad .RS 12n The expected checksum value for the block. .RE .sp .ne 2 .na \fBcksum_actual\fR .ad .RS 12n The actual checksum value for an errant block. .RE .sp .ne 2 .na \fBcksum_algorithm\fR .ad .RS 12n Checksum algorithm used. See \fBzfs\fR(8) for more information on checksum algorithms available. .RE .sp .ne 2 .na \fBcksum_byteswap\fR .ad .RS 12n Whether or not the data is byteswapped. .RE .sp .ne 2 .na \fBbad_ranges\fR .ad .RS 12n [start, end) pairs of corruption offsets. Offsets are always aligned on a 64-bit boundary, and can include some gaps of non-corruption. (See \fBbad_ranges_min_gap\fR) .RE .sp .ne 2 .na \fBbad_ranges_min_gap\fR .ad .RS 12n In order to bound the size of the \fBbad_ranges\fR array, gaps of non-corruption less than or equal to \fBbad_ranges_min_gap\fR bytes have been merged with adjacent corruption. Always at least 8 bytes, since corruption is detected on a 64-bit word basis. .RE .sp .ne 2 .na \fBbad_range_sets\fR .ad .RS 12n This array has one element per range in \fBbad_ranges\fR. Each element contains the count of bits in that range which were clear in the good data and set in the bad data. .RE .sp .ne 2 .na \fBbad_range_clears\fR .ad .RS 12n This array has one element per range in \fBbad_ranges\fR. Each element contains the count of bits for that range which were set in the good data and clear in the bad data. .RE .sp .ne 2 .na \fBbad_set_bits\fR .ad .RS 12n If this field exists, it is an array of: (bad data & ~(good data)); that is, the bits set in the bad data which are cleared in the good data. Each element corresponds a byte whose offset is in a range in \fBbad_ranges\fR, and the array is ordered by offset. Thus, the first element is the first byte in the first \fBbad_ranges\fR range, and the last element is the last byte in the last \fBbad_ranges\fR range. .RE .sp .ne 2 .na \fBbad_cleared_bits\fR .ad .RS 12n Like \fBbad_set_bits\fR, but contains: (good data & ~(bad data)); that is, the bits set in the good data which are cleared in the bad data. .RE .sp .ne 2 .na \fBbad_set_histogram\fR .ad .RS 12n If this field exists, it is an array of counters. Each entry counts bits set in a particular bit of a big-endian uint64 type. The first entry counts bits set in the high-order bit of the first byte, the 9th byte, etc, and the last entry counts bits set of the low-order bit of the 8th byte, the 16th byte, etc. This information is useful for observing a stuck bit in a parallel data path, such as IDE or parallel SCSI. 
.RE .sp .ne 2 .na \fBbad_cleared_histogram\fR .ad .RS 12n If this field exists, it is an array of counters. Each entry counts bit clears in a particular bit of a big-endian uint64 type. The first entry counts bits clears of the high-order bit of the first byte, the 9th byte, etc, and the last entry counts clears of the low-order bit of the 8th byte, the 16th byte, etc. This information is useful for observing a stuck bit in a parallel data path, such as IDE or parallel SCSI. .RE .SS "I/O STAGES" .sp .LP The ZFS I/O pipeline is comprised of various stages which are defined below. The individual stages are used to construct these basic I/O operations: Read, Write, Free, Claim, and Ioctl. These stages may be set on an event to describe the life cycle of a given I/O. .TS tab(:); l l l . Stage:Bit Mask:Operations _:_:_ ZIO_STAGE_OPEN:0x00000001:RWFCI ZIO_STAGE_READ_BP_INIT:0x00000002:R---- ZIO_STAGE_WRITE_BP_INIT:0x00000004:-W--- ZIO_STAGE_FREE_BP_INIT:0x00000008:--F-- ZIO_STAGE_ISSUE_ASYNC:0x00000010:RWF-- ZIO_STAGE_WRITE_COMPRESS:0x00000020:-W--- ZIO_STAGE_ENCRYPT:0x00000040:-W--- ZIO_STAGE_CHECKSUM_GENERATE:0x00000080:-W--- ZIO_STAGE_NOP_WRITE:0x00000100:-W--- ZIO_STAGE_DDT_READ_START:0x00000200:R---- ZIO_STAGE_DDT_READ_DONE:0x00000400:R---- ZIO_STAGE_DDT_WRITE:0x00000800:-W--- ZIO_STAGE_DDT_FREE:0x00001000:--F-- ZIO_STAGE_GANG_ASSEMBLE:0x00002000:RWFC- ZIO_STAGE_GANG_ISSUE:0x00004000:RWFC- ZIO_STAGE_DVA_THROTTLE:0x00008000:-W--- ZIO_STAGE_DVA_ALLOCATE:0x00010000:-W--- ZIO_STAGE_DVA_FREE:0x00020000:--F-- ZIO_STAGE_DVA_CLAIM:0x00040000:---C- ZIO_STAGE_READY:0x00080000:RWFCI ZIO_STAGE_VDEV_IO_START:0x00100000:RW--I ZIO_STAGE_VDEV_IO_DONE:0x00200000:RW--I ZIO_STAGE_VDEV_IO_ASSESS:0x00400000:RW--I ZIO_STAGE_CHECKSUM_VERIFY:0x00800000:R---- ZIO_STAGE_DONE:0x01000000:RWFCI .TE .SS "I/O FLAGS" .sp .LP Every I/O in the pipeline contains a set of flags which describe its function and are used to govern its behavior. These flags will be set in an event as an \fBzio_flags\fR payload entry. .TS tab(:); l l . Flag:Bit Mask _:_ ZIO_FLAG_DONT_AGGREGATE:0x00000001 ZIO_FLAG_IO_REPAIR:0x00000002 ZIO_FLAG_SELF_HEAL:0x00000004 ZIO_FLAG_RESILVER:0x00000008 ZIO_FLAG_SCRUB:0x00000010 ZIO_FLAG_SCAN_THREAD:0x00000020 ZIO_FLAG_PHYSICAL:0x00000040 ZIO_FLAG_CANFAIL:0x00000080 ZIO_FLAG_SPECULATIVE:0x00000100 ZIO_FLAG_CONFIG_WRITER:0x00000200 ZIO_FLAG_DONT_RETRY:0x00000400 ZIO_FLAG_DONT_CACHE:0x00000800 ZIO_FLAG_NODATA:0x00001000 ZIO_FLAG_INDUCE_DAMAGE:0x00002000 ZIO_FLAG_IO_ALLOCATING:0x00004000 ZIO_FLAG_IO_RETRY:0x00008000 ZIO_FLAG_PROBE:0x00010000 ZIO_FLAG_TRYHARD:0x00020000 ZIO_FLAG_OPTIONAL:0x00040000 ZIO_FLAG_DONT_QUEUE:0x00080000 ZIO_FLAG_DONT_PROPAGATE:0x00100000 ZIO_FLAG_IO_BYPASS:0x00200000 ZIO_FLAG_IO_REWRITE:0x00400000 ZIO_FLAG_RAW_COMPRESS:0x00800000 ZIO_FLAG_RAW_ENCRYPT:0x01000000 ZIO_FLAG_GANG_CHILD:0x02000000 ZIO_FLAG_DDT_CHILD:0x04000000 ZIO_FLAG_GODFATHER:0x08000000 ZIO_FLAG_NOPWRITE:0x10000000 ZIO_FLAG_REEXECUTED:0x20000000 ZIO_FLAG_DELEGATED:0x40000000 ZIO_FLAG_FASTWRITE:0x80000000 .TE diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5 index 853e8fc9442c..444e747488e8 100644 --- a/man/man5/zfs-module-parameters.5 +++ b/man/man5/zfs-module-parameters.5 @@ -1,4084 +1,4084 @@ '\" te .\" Copyright (c) 2013 by Turbo Fredriksson . All rights reserved. .\" Copyright (c) 2019 by Delphix. All rights reserved. .\" Copyright (c) 2019 Datto Inc. .\" The contents of this file are subject to the terms of the Common Development .\" and Distribution License (the "License"). 
You may not use this file except .\" in compliance with the License. You can obtain a copy of the license at .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. .\" .\" See the License for the specific language governing permissions and .\" limitations under the License. When distributing Covered Code, include this .\" CDDL HEADER in each file and include the License file at .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your .\" own identifying information: .\" Portions Copyright [yyyy] [name of copyright owner] -.TH ZFS-MODULE-PARAMETERS 5 "Feb 15, 2019" +.TH ZFS-MODULE-PARAMETERS 5 "Aug 24, 2020" OpenZFS .SH NAME zfs\-module\-parameters \- ZFS module parameters .SH DESCRIPTION .sp .LP Description of the different parameters to the ZFS module. .SS "Module parameters" .sp .LP .sp .ne 2 .na \fBdbuf_cache_max_bytes\fR (ulong) .ad .RS 12n Maximum size in bytes of the dbuf cache. The target size is determined by the MIN versus \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size. The behavior of the dbuf cache and its associated settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR kstat. .sp Default value: \fBULONG_MAX\fR. .RE .sp .ne 2 .na \fBdbuf_metadata_cache_max_bytes\fR (ulong) .ad .RS 12n Maximum size in bytes of the metadata dbuf cache. The target size is determined by the MIN versus \fB1/2^dbuf_metadata_cache_shift\fR (1/64) of the target ARC size. The behavior of the metadata dbuf cache and its associated settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR kstat. .sp Default value: \fBULONG_MAX\fR. .RE .sp .ne 2 .na \fBdbuf_cache_hiwater_pct\fR (uint) .ad .RS 12n The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted directly. .sp Default value: \fB10\fR%. .RE .sp .ne 2 .na \fBdbuf_cache_lowater_pct\fR (uint) .ad .RS 12n The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops evicting dbufs. .sp Default value: \fB10\fR%. .RE .sp .ne 2 .na \fBdbuf_cache_shift\fR (int) .ad .RS 12n Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction of the target ARC size. .sp Default value: \fB5\fR. .RE .sp .ne 2 .na \fBdbuf_metadata_cache_shift\fR (int) .ad .RS 12n Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR, to a log2 fraction of the target ARC size. .sp Default value: \fB6\fR. .RE .sp .ne 2 .na \fBdmu_object_alloc_chunk_shift\fR (int) .ad .RS 12n dnode slots allocated in a single operation as a power of 2. The default value minimizes lock contention for the bulk operation performed. .sp Default value: \fB7\fR (128). .RE .sp .ne 2 .na \fBdmu_prefetch_max\fR (int) .ad .RS 12n Limit the amount we can prefetch with one call to this amount (in bytes). This helps to limit the amount of memory that can be used by prefetching. .sp Default value: \fB134,217,728\fR (128MB). .RE .sp .ne 2 .na \fBignore_hole_birth\fR (int) .ad .RS 12n This is an alias for \fBsend_holes_without_birth_time\fR. .RE .sp .ne 2 .na \fBl2arc_feed_again\fR (int) .ad .RS 12n Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as fast as possible. .sp Use \fB1\fR for yes (default) and \fB0\fR to disable. .RE .sp .ne 2 .na \fBl2arc_feed_min_ms\fR (ulong) .ad .RS 12n Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only applicable in related situations. .sp Default value: \fB200\fR. 
.RE .sp .ne 2 .na \fBl2arc_feed_secs\fR (ulong) .ad .RS 12n Seconds between L2ARC writing .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBl2arc_headroom\fR (ulong) .ad .RS 12n How far through the ARC lists to search for L2ARC cacheable content, expressed as a multiplier of \fBl2arc_write_max\fR. ARC persistence across reboots can be achieved with persistent L2ARC by setting this parameter to \fB0\fR allowing the full length of ARC lists to be searched for cacheable content. .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBl2arc_headroom_boost\fR (ulong) .ad .RS 12n Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being successfully compressed before writing. A value of \fB100\fR disables this feature. .sp Default value: \fB200\fR%. .RE .sp .ne 2 .na \fBl2arc_trim_ahead\fR (ulong) .ad .RS 12n Trims ahead of the current write size (\fBl2arc_write_max\fR) on L2ARC devices by this percentage of write size if we have filled the device. If set to \fB100\fR we TRIM twice the space required to accommodate upcoming writes. A minimum of 64MB will be trimmed. It also enables TRIM of the whole L2ARC device upon creation or addition to an existing pool or if the header of the device is invalid upon importing a pool or onlining a cache device. A value of \fB0\fR disables TRIM on L2ARC altogether and is the default as it can put significant stress on the underlying storage devices. This will vary depending of how well the specific device handles these commands. .sp Default value: \fB0\fR%. .RE .sp .ne 2 .na \fBl2arc_noprefetch\fR (int) .ad .RS 12n Do not write buffers to L2ARC if they were prefetched but not used by applications. .sp Use \fB1\fR for yes (default) and \fB0\fR to disable. .RE .sp .ne 2 .na \fBl2arc_norw\fR (int) .ad .RS 12n No reads during writes. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBl2arc_write_boost\fR (ulong) .ad .RS 12n Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount while they remain cold. .sp Default value: \fB8,388,608\fR. .RE .sp .ne 2 .na \fBl2arc_write_max\fR (ulong) .ad .RS 12n Max write bytes per interval. .sp Default value: \fB8,388,608\fR. .RE .sp .ne 2 .na \fBl2arc_rebuild_enabled\fR (int) .ad .RS 12n Rebuild the L2ARC when importing a pool (persistent L2ARC). This can be disabled if there are problems importing a pool or attaching an L2ARC device (e.g. the L2ARC device is slow in reading stored log metadata, or the metadata has become somehow fragmented/unusable). .sp Use \fB1\fR for yes (default) and \fB0\fR for no. .RE .sp .ne 2 .na \fBl2arc_rebuild_blocks_min_l2size\fR (ulong) .ad .RS 12n Min size (in bytes) of an L2ARC device required in order to write log blocks in it. The log blocks are used upon importing the pool to rebuild the L2ARC (persistent L2ARC). Rationale: for L2ARC devices less than 1GB, the amount of data l2arc_evict() evicts is significant compared to the amount of restored L2ARC data. In this case do not write log blocks in L2ARC in order not to waste space. .sp Default value: \fB1,073,741,824\fR (1GB). .RE .sp .ne 2 .na \fBmetaslab_aliquot\fR (ulong) .ad .RS 12n Metaslab granularity, in bytes. This is roughly similar to what would be referred to as the "stripe size" in traditional RAID arrays. In normal operation, ZFS will try to write this amount of data to a top-level vdev before moving on to the next one. .sp Default value: \fB524,288\fR. 
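.sp
As a sketch of how a parameter such as this is commonly made persistent across
module loads on Linux (the file name and the 1MB value are only examples; any
file under \fB/etc/modprobe.d\fR works), a module option line can be added and
takes effect the next time the module is loaded:
.sp
.nf
# cat /etc/modprobe.d/zfs.conf
options zfs metaslab_aliquot=1048576
.fi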
.RE .sp .ne 2 .na \fBmetaslab_bias_enabled\fR (int) .ad .RS 12n Enable metaslab group biasing based on its vdev's over- or under-utilization relative to the pool. .sp Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE .sp .ne 2 .na \fBmetaslab_force_ganging\fR (ulong) .ad .RS 12n Make some blocks above a certain size be gang blocks. This option is used by the test suite to facilitate testing. .sp Default value: \fB16,777,217\fR.
.RE .sp .ne 2 .na \fBzfs_keep_log_spacemaps_at_export\fR (int) .ad .RS 12n Prevent log spacemaps from being destroyed during pool exports and destroys. .sp Use \fB1\fR for yes and \fB0\fR for no (default).
.RE .sp .ne 2 .na \fBzfs_metaslab_segment_weight_enabled\fR (int) .ad .RS 12n Enable/disable segment-based metaslab selection. .sp Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE .sp .ne 2 .na \fBzfs_metaslab_switch_threshold\fR (int) .ad .RS 12n When using segment-based metaslab selection, continue allocating from the active metaslab until \fBzfs_metaslab_switch_threshold\fR worth of buckets have been exhausted. .sp Default value: \fB2\fR.
.RE .sp .ne 2 .na \fBmetaslab_debug_load\fR (int) .ad .RS 12n Load all metaslabs during pool import. .sp Use \fB1\fR for yes and \fB0\fR for no (default).
.RE .sp .ne 2 .na \fBmetaslab_debug_unload\fR (int) .ad .RS 12n Prevent metaslabs from being unloaded. .sp Use \fB1\fR for yes and \fB0\fR for no (default).
.RE .sp .ne 2 .na \fBmetaslab_fragmentation_factor_enabled\fR (int) .ad .RS 12n Enable use of the fragmentation metric in computing metaslab weights. .sp Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE .sp .ne 2 .na \fBmetaslab_df_max_search\fR (int) .ad .RS 12n Maximum distance to search forward from the last offset. Without this limit, fragmented pools can see >100,000 iterations and metaslab_block_picker() becomes the performance limiting factor on high-performance storage. With the default setting of 16MB, we typically see less than 500 iterations, even with very fragmented, ashift=9 pools. The maximum number of iterations possible is \fBmetaslab_df_max_search / (2 * (1<<ashift))\fR. .sp Default value: \fB16,777,216\fR (16MB).
.RE .sp .ne 2 .na \fBzfs_vdev_max_auto_ashift\fR (ulong) .ad .RS 12n Maximum ashift used when optimizing for logical -> physical sector size on new top-level vdevs. .sp Default value: \fBASHIFT_MAX\fR (16).
.RE .sp .ne 2 .na \fBzfs_vdev_min_auto_ashift\fR (ulong) .ad .RS 12n Minimum ashift used when creating new top-level vdevs. .sp Default value: \fBASHIFT_MIN\fR (9).
.RE .sp .ne 2 .na \fBzfs_vdev_min_ms_count\fR (int) .ad .RS 12n Minimum number of metaslabs to create in a top-level vdev. .sp Default value: \fB16\fR.
.RE .sp .ne 2 .na \fBvdev_validate_skip\fR (int) .ad .RS 12n Skip label validation steps during pool import. Changing is not recommended unless you know what you are doing and are recovering a damaged label. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_vdev_ms_count_limit\fR (int) .ad .RS 12n Practical upper limit of total metaslabs per top-level vdev. .sp Default value: \fB131,072\fR.
.RE .sp .ne 2 .na \fBmetaslab_preload_enabled\fR (int) .ad .RS 12n Enable metaslab group preloading. .sp Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE .sp .ne 2 .na \fBmetaslab_lba_weighting_enabled\fR (int) .ad .RS 12n Give more weight to metaslabs with lower LBAs, assuming they have greater bandwidth as is typically the case on a modern constant angular velocity disk drive. .sp Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE .sp .ne 2 .na \fBmetaslab_unload_delay\fR (int) .ad .RS 12n After a metaslab is used, we keep it loaded for this many txgs, to attempt to reduce unnecessary reloading.
Note that both this many txgs and \fBmetaslab_unload_delay_ms\fR milliseconds must pass before unloading will occur. .sp Default value: \fB32\fR. .RE .sp .ne 2 .na \fBmetaslab_unload_delay_ms\fR (int) .ad .RS 12n After a metaslab is used, we keep it loaded for this many milliseconds, to attempt to reduce unnecessary reloading. Note that both this many milliseconds and \fBmetaslab_unload_delay\fR txgs must pass before unloading will occur. .sp Default value: \fB600000\fR (ten minutes). .RE .sp .ne 2 .na \fBsend_holes_without_birth_time\fR (int) .ad .RS 12n When set, the hole_birth optimization will not be used, and all holes will always be sent on zfs send. This is useful if you suspect your datasets are affected by a bug in hole_birth. .sp Use \fB1\fR for on (default) and \fB0\fR for off. .RE .sp .ne 2 .na \fBspa_config_path\fR (charp) .ad .RS 12n SPA config file .sp Default value: \fB/etc/zfs/zpool.cache\fR. .RE .sp .ne 2 .na \fBspa_asize_inflation\fR (int) .ad .RS 12n Multiplication factor used to estimate actual disk consumption from the size of data being written. The default value is a worst case estimate, but lower values may be valid for a given pool depending on its configuration. Pool administrators who understand the factors involved may wish to specify a more realistic inflation factor, particularly if they operate close to quota or capacity limits. .sp Default value: \fB24\fR. .RE .sp .ne 2 .na \fBspa_load_print_vdev_tree\fR (int) .ad .RS 12n Whether to print the vdev tree in the debugging message buffer during pool import. Use 0 to disable and 1 to enable. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBspa_load_verify_data\fR (int) .ad .RS 12n Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR) import. Use 0 to disable and 1 to enable. An extreme rewind import normally performs a full traversal of all blocks in the pool for verification. If this parameter is set to 0, the traversal skips non-metadata blocks. It can be toggled once the import has started to stop or start the traversal of non-metadata blocks. .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBspa_load_verify_metadata\fR (int) .ad .RS 12n Whether to traverse blocks during an "extreme rewind" (\fB-X\fR) pool import. Use 0 to disable and 1 to enable. An extreme rewind import normally performs a full traversal of all blocks in the pool for verification. If this parameter is set to 0, the traversal is not performed. It can be toggled once the import has started to stop or start the traversal. .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBspa_load_verify_shift\fR (int) .ad .RS 12n Sets the maximum number of bytes to consume during pool import to the log2 fraction of the target ARC size. .sp Default value: \fB4\fR. .RE .sp .ne 2 .na \fBspa_slop_shift\fR (int) .ad .RS 12n Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space in the pool to be consumed. This ensures that we don't run the pool completely out of space, due to unaccounted changes (e.g. to the MOS). It also limits the worst-case time to allocate space. If we have less than this amount of free space, most ZPL operations (e.g. write, create) will return ENOSPC. .sp Default value: \fB5\fR. .RE .sp .ne 2 .na \fBvdev_removal_max_span\fR (int) .ad .RS 12n During top-level vdev removal, chunks of data are copied from the vdev which may include free space in order to trade bandwidth for IOPS. 
This parameter determines the maximum span of free space (in bytes) which will be included as "unnecessary" data in a chunk of copied data. The default value here was chosen to align with \fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing regular reads (but there's no reason it has to be the same). .sp Default value: \fB32,768\fR.
.RE .sp .ne 2 .na \fBzap_iterate_prefetch\fR (int) .ad .RS 12n If this is set, when we start iterating over a ZAP object, zfs will prefetch the entire object (all leaf blocks). However, this is limited by \fBdmu_prefetch_max\fR. .sp Use \fB1\fR for on (default) and \fB0\fR for off.
.RE .sp .ne 2 .na \fBzfetch_array_rd_sz\fR (ulong) .ad .RS 12n If prefetching is enabled, disable prefetching for reads larger than this size. .sp Default value: \fB1,048,576\fR.
.RE .sp .ne 2 .na \fBzfetch_max_distance\fR (uint) .ad .RS 12n Max bytes to prefetch per stream (default 8MB). .sp Default value: \fB8,388,608\fR.
.RE .sp .ne 2 .na \fBzfetch_max_streams\fR (uint) .ad .RS 12n Max number of streams per zfetch (prefetch streams per file). .sp Default value: \fB8\fR.
.RE .sp .ne 2 .na \fBzfetch_min_sec_reap\fR (uint) .ad .RS 12n Min time before an active prefetch stream can be reclaimed. .sp Default value: \fB2\fR.
.RE .sp .ne 2 .na \fBzfs_abd_scatter_enabled\fR (int) .ad .RS 12n Controls whether the ARC may use scatter/gather lists for its buffers. Disabling this option forces all ARC allocations to be linear in kernel memory, which can improve performance in some code paths at the expense of fragmented kernel memory. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_abd_scatter_max_order\fR (uint) .ad .RS 12n Maximum number of consecutive memory pages allocated in a single block for scatter/gather lists. Default value is specified by the kernel itself. .sp Default value: \fB10\fR at the time of this writing.
.RE .sp .ne 2 .na \fBzfs_abd_scatter_min_size\fR (uint) .ad .RS 12n This is the minimum allocation size that will use scatter (page-based) ABDs. Smaller allocations will use linear ABDs. .sp Default value: \fB1536\fR (512B and 1KB allocations will be linear).
.RE .sp .ne 2 .na \fBzfs_arc_dnode_limit\fR (ulong) .ad .RS 12n When the number of bytes consumed by dnodes in the ARC exceeds this number of bytes, try to unpin some of it in response to demand for non-metadata. This value acts as a ceiling to the amount of dnode metadata, and defaults to 0, which indicates that a percentage of the ARC meta buffers, given by \fBzfs_arc_dnode_limit_percent\fR, may be used for dnodes. See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather than in response to overall demand for non-metadata. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_dnode_limit_percent\fR (ulong) .ad .RS 12n Percentage that can be consumed by dnodes of ARC meta buffers. .sp See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a higher priority if set to a nonzero value. .sp Default value: \fB10\fR%.
.RE .sp .ne 2 .na \fBzfs_arc_dnode_reduce_percent\fR (ulong) .ad .RS 12n Percentage of ARC dnodes to try to scan in response to demand for non-metadata when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR. .sp Default value: \fB10\fR% of the number of dnodes in the ARC.
.RE .sp .ne 2 .na \fBzfs_arc_average_blocksize\fR (int) .ad .RS 12n The ARC's buffer hash table is sized based on the assumption of an average block size of \fBzfs_arc_average_blocksize\fR (default 8K).
This works out to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers. For configurations with a known larger average block size this value can be increased to reduce the memory footprint. .sp Default value: \fB8192\fR.
.RE .sp .ne 2 .na \fBzfs_arc_eviction_pct\fR (int) .ad .RS 12n When \fBarc_is_overflowing()\fR, \fBarc_get_data_impl()\fR waits for this percent of the requested amount of data to be evicted. For example, by default for every 2KB that's evicted, 1KB of it may be "reused" by a new allocation. Since this is above 100%, it ensures that progress is made towards getting \fBarc_size\fR under \fBarc_c\fR. Since this is finite, it ensures that allocations can still happen, even during the potentially long time that \fBarc_size\fR is more than \fBarc_c\fR. .sp Default value: \fB200\fR.
.RE .sp .ne 2 .na \fBzfs_arc_evict_batch_limit\fR (int) .ad .RS 12n Number of ARC headers to evict per sub-list before proceeding to another sub-list. This batch-style operation prevents entire sub-lists from being evicted at once but comes at a cost of additional unlocking and locking. .sp Default value: \fB10\fR.
.RE .sp .ne 2 .na \fBzfs_arc_grow_retry\fR (int) .ad .RS 12n If set to a non-zero value, it will replace the arc_grow_retry value with this value. The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before trying to resume growth after a memory pressure event. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_lotsfree_percent\fR (int) .ad .RS 12n Throttle I/O when free system memory drops below this percentage of total system memory. Setting this value to 0 will disable the throttle. .sp Default value: \fB10\fR%.
.RE .sp .ne 2 .na \fBzfs_arc_max\fR (ulong) .ad .RS 12n Max size of ARC in bytes. If set to 0 then the max size of ARC is determined by the amount of system memory installed. For Linux, 1/2 of system memory will be used as the limit. For FreeBSD, the larger of all system memory - 1GB or 5/8 of system memory will be used as the limit. This value must be at least 67108864 (64 megabytes). .sp This value can be changed dynamically with some caveats. It cannot be set back to 0 while running, and reducing it below the current ARC size will not cause the ARC to shrink without memory pressure to induce shrinking. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_meta_adjust_restarts\fR (ulong) .ad .RS 12n The number of restart passes to make while scanning the ARC attempting to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR. This value should not need to be tuned but is available to facilitate performance analysis. .sp Default value: \fB4096\fR.
.RE .sp .ne 2 .na \fBzfs_arc_meta_limit\fR (ulong) .ad .RS 12n The maximum allowed size in bytes that meta data buffers are allowed to consume in the ARC. When this limit is reached meta data buffers will be reclaimed even if the overall arc_c_max has not been reached. This value defaults to 0, which indicates that a percentage of the ARC, given by \fBzfs_arc_meta_limit_percent\fR, may be used for meta data. .sp This value may be changed dynamically except that it cannot be set back to 0 (the percentage-based default); it must be set to an explicit value. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_meta_limit_percent\fR (ulong) .ad .RS 12n Percentage of ARC buffers that can be used for meta data. See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a higher priority if set to a nonzero value. .sp Default value: \fB75\fR%.
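.sp
For example, assuming the Linux module parameter interface and parameters that
are writable at runtime, the percentage can be read back and either the
percentage or an explicit byte limit (via \fBzfs_arc_meta_limit\fR, described
above) can be set; the adjusted values below are only illustrative:
.sp
.nf
# cat /sys/module/zfs/parameters/zfs_arc_meta_limit_percent
75
# echo 50 > /sys/module/zfs/parameters/zfs_arc_meta_limit_percent
# echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_meta_limit
.fi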
.RE .sp .ne 2 .na \fBzfs_arc_meta_min\fR (ulong) .ad .RS 12n The minimum allowed size in bytes that meta data buffers may consume in the ARC. This value defaults to 0, which disables a floor on the amount of the ARC devoted to meta data. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_meta_prune\fR (int) .ad .RS 12n The number of dentries and inodes to be scanned looking for entries which can be dropped. This may be required when the ARC reaches the \fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers in the ARC. Increasing this value will cause the dentry and inode caches to be pruned more aggressively. Setting this value to 0 will disable pruning the inode and dentry caches. .sp Default value: \fB10,000\fR.
.RE .sp .ne 2 .na \fBzfs_arc_meta_strategy\fR (int) .ad .RS 12n Define the strategy for ARC meta data buffer eviction (meta reclaim strategy). A value of 0 (META_ONLY) will evict only the ARC meta data buffers. A value of 1 (BALANCED) indicates that additional data buffers may be evicted if that is required in order to evict the required number of meta data buffers. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_arc_min\fR (ulong) .ad .RS 12n Min size of ARC in bytes. If set to 0 then arc_c_min will default to consuming the larger of 32M or 1/32 of total system memory. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_min_prefetch_ms\fR (int) .ad .RS 12n Minimum time prefetched blocks are locked in the ARC, specified in ms. A value of \fB0\fR will default to 1000 ms. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_min_prescient_prefetch_ms\fR (int) .ad .RS 12n Minimum time "prescient prefetched" blocks are locked in the ARC, specified in ms. These blocks are meant to be prefetched fairly aggressively ahead of the code that may use them. A value of \fB0\fR will default to 6000 ms. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_max_missing_tvds\fR (int) .ad .RS 12n Number of missing top-level vdevs which will be allowed during pool import (only in read-only mode). .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_max_nvlist_src_size\fR (ulong) .ad .RS 12n Maximum size in bytes allowed to be passed as zc_nvlist_src_size for ioctls on /dev/zfs. This prevents a user from causing the kernel to allocate an excessive amount of memory. When the limit is exceeded, the ioctl fails with EINVAL and a description of the error is sent to the zfs-dbgmsg log. This parameter should not need to be touched under normal circumstances. On FreeBSD, the default is based on the system limit on user wired memory. On Linux, the default is \fBKMALLOC_MAX_SIZE\fR. .sp Default value: \fB0\fR (kernel decides).
.RE .sp .ne 2 .na \fBzfs_multilist_num_sublists\fR (int) .ad .RS 12n To allow more fine-grained locking, each ARC state contains a series of lists for both data and meta data objects. Locking is performed at the level of these "sub-lists". This parameter controls the number of sub-lists per ARC state, and also applies to other uses of the multilist data structure. .sp Default value: \fB4\fR or the number of online CPUs, whichever is greater.
.RE .sp .ne 2 .na \fBzfs_arc_overflow_shift\fR (int) .ad .RS 12n The ARC size is considered to be overflowing if it exceeds the current ARC target size (arc_c) by a threshold determined by this parameter. The threshold is calculated as a fraction of arc_c using the formula "arc_c >> \fBzfs_arc_overflow_shift\fR".
The default value of 8 causes the ARC to be considered to be overflowing if it exceeds the target size by 1/256th (about 0.4%) of the target size. When the ARC is overflowing, new buffer allocations are stalled until the reclaim thread catches up and the overflow condition no longer exists. .sp Default value: \fB8\fR.
.RE .sp .ne 2 .na \fBzfs_arc_p_min_shift\fR (int) .ad .RS 12n If set to a non-zero value, this will update arc_p_min_shift (default 4) with the new value. arc_p_min_shift is used as a shift of arc_c when calculating both the minimum and maximum arc_p. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_p_dampener_disable\fR (int) .ad .RS 12n Disable the arc_p adapt dampener. .sp Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE .sp .ne 2 .na \fBzfs_arc_shrink_shift\fR (int) .ad .RS 12n If set to a non-zero value, this will update arc_shrink_shift (default 7) with the new value. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_arc_pc_percent\fR (uint) .ad .RS 12n Percent of pagecache to reclaim the ARC to. This tunable allows the ZFS ARC to play more nicely with the kernel's LRU pagecache. It can guarantee that the ARC size won't collapse under scanning pressure on the pagecache, yet still allows the ARC to be reclaimed down to zfs_arc_min if necessary. This value is specified as a percent of pagecache size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This only operates during memory pressure/reclaim. .sp Default value: \fB0\fR% (disabled).
.RE .sp .ne 2 .na \fBzfs_arc_shrinker_limit\fR (int) .ad .RS 12n This is a limit on how many pages the ARC shrinker makes available for eviction in response to one page allocation attempt. Note that in practice, the kernel's shrinker can ask us to evict up to about 4x this for one allocation attempt. .sp The default limit of 10,000 (in practice, 160MB per allocation attempt with 4K pages) limits the amount of time spent attempting to reclaim ARC memory to less than 100ms per allocation attempt, even with a small average compressed block size of ~8KB. .sp The parameter can be set to 0 (zero) to disable the limit. .sp This parameter only applies on Linux. .sp Default value: \fB10,000\fR.
.RE .sp .ne 2 .na \fBzfs_arc_sys_free\fR (ulong) .ad .RS 12n The target number of bytes the ARC should leave as free memory on the system. Defaults to the larger of 1/64 of physical memory or 512K. Setting this option to a non-zero value will override the default. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_autoimport_disable\fR (int) .ad .RS 12n Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR). .sp Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE .sp .ne 2 .na \fBzfs_checksum_events_per_second\fR (uint) .ad .RS 12n Rate limit checksum events to this many per second. Note that this should not be set below the zed thresholds (currently 10 checksums over 10 sec) or else zed may not trigger any action. .sp Default value: 20.
.RE .sp .ne 2 .na \fBzfs_commit_timeout_pct\fR (int) .ad .RS 12n This controls the amount of time that a ZIL block (lwb) will remain "open" when it isn't "full", and it has a thread waiting for it to be committed to stable storage. The timeout is scaled based on a percentage of the last lwb latency to avoid significantly impacting the latency of each individual transaction record (itx). .sp Default value: \fB5\fR%.
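.sp
As a rough worked example (the 10ms figure is hypothetical), with the default of
\fB5\fR% an lwb whose predecessor took 10ms to reach stable storage would be held
open for roughly 0.5ms before being issued. Assuming the Linux module parameter
interface and a parameter that is writable at runtime, the value can be inspected
and changed as follows:
.sp
.nf
# cat /sys/module/zfs/parameters/zfs_commit_timeout_pct
5
# echo 10 > /sys/module/zfs/parameters/zfs_commit_timeout_pct
.fi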
.RE .sp .ne 2 .na \fBzfs_condense_indirect_commit_entry_delay_ms\fR (int) .ad .RS 12n Vdev indirection layer (used for device removal) sleeps for this many milliseconds during mapping generation. Intended for use with the test suite to throttle vdev removal speed. .sp Default value: \fB0\fR (no throttle).
.RE .sp .ne 2 .na \fBzfs_condense_indirect_vdevs_enable\fR (int) .ad .RS 12n Enable condensing indirect vdev mappings. When set to a non-zero value, attempt to condense indirect vdev mappings if the mapping uses more than \fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR bytes on-disk. The condensing process is an attempt to save memory by removing obsolete mappings. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_condense_max_obsolete_bytes\fR (ulong) .ad .RS 12n Only attempt to condense indirect vdev mappings if the on-disk size of the obsolete space map object is greater than this number of bytes (see \fBzfs_condense_indirect_vdevs_enable\fR). .sp Default value: \fB1,073,741,824\fR.
.RE .sp .ne 2 .na \fBzfs_condense_min_mapping_bytes\fR (ulong) .ad .RS 12n Minimum size vdev mapping to attempt to condense (see \fBzfs_condense_indirect_vdevs_enable\fR). .sp Default value: \fB131,072\fR.
.RE .sp .ne 2 .na \fBzfs_dbgmsg_enable\fR (int) .ad .RS 12n Internally ZFS keeps a small log to facilitate debugging. By default the log is disabled; to enable it, set this option to 1. The contents of the log can be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to this proc file clears the log. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_dbgmsg_maxsize\fR (int) .ad .RS 12n The maximum size in bytes of the internal ZFS debug log. .sp Default value: \fB4M\fR.
.RE .sp .ne 2 .na \fBzfs_dbuf_state_index\fR (int) .ad .RS 12n This feature is currently unused. It is normally used for controlling what reporting is available under /proc/spl/kstat/zfs. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_deadman_enabled\fR (int) .ad .RS 12n When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR milliseconds, or when an individual I/O takes longer than \fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is invoked as described by the \fBzfs_deadman_failmode\fR module option. By default the deadman is enabled and configured to \fBwait\fR which results in "hung" I/Os only being logged. The deadman is automatically disabled when a pool gets suspended. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_deadman_failmode\fR (charp) .ad .RS 12n Controls the failure behavior when the deadman detects a "hung" I/O. Valid values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR. .sp \fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a "deadman" event will be posted describing that I/O. .sp \fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it to the I/O pipeline if possible. .sp \fBpanic\fR - Panic the system. This can be used to facilitate an automatic fail-over to a properly configured fail-over partner. .sp Default value: \fBwait\fR.
.RE .sp .ne 2 .na \fBzfs_deadman_checktime_ms\fR (int) .ad .RS 12n Check time in milliseconds. This defines the frequency at which we check for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior. .sp Default value: \fB60,000\fR.
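.sp
For example, assuming the Linux module parameter interface and parameters that
are writable at runtime, the deadman tunables described above can be inspected
and adjusted together (the adjusted values are only illustrative):
.sp
.nf
# cat /sys/module/zfs/parameters/zfs_deadman_failmode
wait
# echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode
# echo 30000 > /sys/module/zfs/parameters/zfs_deadman_checktime_ms
.fi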
.RE .sp .ne 2 .na \fBzfs_deadman_synctime_ms\fR (ulong) .ad .RS 12n Interval in milliseconds after which the deadman is triggered and also the interval after which a pool sync operation is considered to be "hung". Once this limit is exceeded the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes. .sp Default value: \fB600,000\fR. .RE .sp .ne 2 .na \fBzfs_deadman_ziotime_ms\fR (ulong) .ad .RS 12n Interval in milliseconds after which the deadman is triggered and an individual I/O operation is considered to be "hung". As long as the I/O remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR milliseconds until the I/O completes. .sp Default value: \fB300,000\fR. .RE .sp .ne 2 .na \fBzfs_dedup_prefetch\fR (int) .ad .RS 12n Enable prefetching dedup-ed blks .sp Use \fB1\fR for yes and \fB0\fR to disable (default). .RE .sp .ne 2 .na \fBzfs_delay_min_dirty_percent\fR (int) .ad .RS 12n Start to delay each transaction once there is this amount of dirty data, expressed as a percentage of \fBzfs_dirty_data_max\fR. This value should be >= zfs_vdev_async_write_active_max_dirty_percent. See the section "ZFS TRANSACTION DELAY". .sp Default value: \fB60\fR%. .RE .sp .ne 2 .na \fBzfs_delay_scale\fR (int) .ad .RS 12n This controls how quickly the transaction delay approaches infinity. Larger values cause longer delays for a given amount of dirty data. .sp For the smoothest delay, this value should be about 1 billion divided by the maximum number of operations per second. This will smoothly handle between 10x and 1/10th this number. .sp See the section "ZFS TRANSACTION DELAY". .sp Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64. .sp Default value: \fB500,000\fR. .RE .sp .ne 2 .na \fBzfs_disable_ivset_guid_check\fR (int) .ad .RS 12n Disables requirement for IVset guids to be present and match when doing a raw receive of encrypted datasets. Intended for users whose pools were created with ZFS on Linux pre-release versions and now have compatibility issues. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_key_max_salt_uses\fR (ulong) .ad .RS 12n Maximum number of uses of a single salt value before generating a new one for encrypted datasets. The default value is also the maximum that will be accepted. .sp Default value: \fB400,000,000\fR. .RE .sp .ne 2 .na \fBzfs_object_mutex_size\fR (uint) .ad .RS 12n Size of the znode hashtable used for holds. Due to the need to hold locks on objects that may not exist yet, kernel mutexes are not created per-object and instead a hashtable is used where collisions will result in objects waiting when there is not actually contention on the same object. .sp Default value: \fB64\fR. .RE .sp .ne 2 .na \fBzfs_slow_io_events_per_second\fR (int) .ad .RS 12n Rate limit delay zevents (which report slow I/Os) to this many per second. .sp Default value: 20 .RE .sp .ne 2 .na \fBzfs_unflushed_max_mem_amt\fR (ulong) .ad .RS 12n Upper-bound limit for unflushed metadata changes to be held by the log spacemap in memory (in bytes). .sp Default value: \fB1,073,741,824\fR (1GB). .RE .sp .ne 2 .na \fBzfs_unflushed_max_mem_ppm\fR (ulong) .ad .RS 12n Percentage of the overall system memory that ZFS allows to be used for unflushed metadata changes by the log spacemap. (value is calculated over 1000000 for finer granularity). 
.sp Default value: \fB1000\fR (which is divided by 1,000,000, resulting in a limit of \fB0.1\fR% of memory).
.RE .sp .ne 2 .na \fBzfs_unflushed_log_block_max\fR (ulong) .ad .RS 12n Describes the maximum number of log spacemap blocks allowed for each pool. The default value of 262144 means that the space in all the log spacemaps can add up to no more than 262144 blocks (which means 32GB of logical space before compression and ditto blocks, assuming that blocksize is 128k). .sp This tunable is important because it involves a trade-off between import time after an unclean export and the frequency of flushing metaslabs. The higher this number is, the more log blocks we allow when the pool is active, which means that we flush metaslabs less often and thus decrease the number of I/Os for spacemap updates per TXG. At the same time though, that means that in the event of an unclean export, there will be more log spacemap blocks for us to read, inducing overhead in the import time of the pool. The lower the number, the more often we flush, destroying log blocks sooner as they become obsolete, which leaves fewer blocks to be read during import time after a crash. .sp Each log spacemap block existing during pool import leads to approximately one extra logical I/O issued. This is the reason why this tunable is exposed in terms of blocks rather than space used. .sp Default value: \fB262144\fR (256K).
.RE .sp .ne 2 .na \fBzfs_unflushed_log_block_min\fR (ulong) .ad .RS 12n If the number of metaslabs is small and our incoming rate is high, we could get into a situation in which we are flushing all our metaslabs every TXG. Thus we always allow at least this many log blocks. .sp Default value: \fB1000\fR.
.RE .sp .ne 2 .na \fBzfs_unflushed_log_block_pct\fR (ulong) .ad .RS 12n Tunable used to determine the number of blocks that can be used for the spacemap log, expressed as a percentage of the total number of metaslabs in the pool. .sp Default value: \fB400\fR (read as \fB400\fR% - meaning that the number of log spacemap blocks is capped at 4 times the number of metaslabs in the pool).
.RE .sp .ne 2 .na \fBzfs_unlink_suspend_progress\fR (uint) .ad .RS 12n When enabled, files will not be asynchronously removed from the list of pending unlinks and the space they consume will be leaked. Once this option has been disabled and the dataset is remounted, the pending unlinks will be processed and the freed space returned to the pool. This option is used by the test suite to facilitate testing. .sp Use \fB0\fR (default) to allow progress and \fB1\fR to pause progress.
.RE .sp .ne 2 .na \fBzfs_delete_blocks\fR (ulong) .ad .RS 12n This value is used to define a large file for the purposes of deletion. Files containing more than \fBzfs_delete_blocks\fR blocks will be deleted asynchronously while smaller files are deleted synchronously. Decreasing this value will reduce the time spent in an unlink(2) system call at the expense of a longer delay before the freed space is available. .sp Default value: \fB20,480\fR.
.RE .sp .ne 2 .na \fBzfs_dirty_data_max\fR (int) .ad .RS 12n Determines the dirty space limit in bytes. Once this limit is exceeded, new writes are halted until space frees up. This parameter takes precedence over \fBzfs_dirty_data_max_percent\fR. See the section "ZFS TRANSACTION DELAY". .sp Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
.RE .sp .ne 2 .na \fBzfs_dirty_data_max_max\fR (int) .ad .RS 12n Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
This limit is only enforced at module load time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed. This parameter takes precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section "ZFS TRANSACTION DELAY". .sp Default value: \fB25\fR% of physical RAM. .RE .sp .ne 2 .na \fBzfs_dirty_data_max_max_percent\fR (int) .ad .RS 12n Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a percentage of physical RAM. This limit is only enforced at module load time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed. The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this one. See the section "ZFS TRANSACTION DELAY". .sp Default value: \fB25\fR%. .RE .sp .ne 2 .na \fBzfs_dirty_data_max_percent\fR (int) .ad .RS 12n Determines the dirty space limit, expressed as a percentage of all memory. Once this limit is exceeded, new writes are halted until space frees up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this one. See the section "ZFS TRANSACTION DELAY". .sp Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR. .RE .sp .ne 2 .na \fBzfs_dirty_data_sync_percent\fR (int) .ad .RS 12n Start syncing out a transaction group if there's at least this much dirty data as a percentage of \fBzfs_dirty_data_max\fR. This should be less than \fBzfs_vdev_async_write_active_min_dirty_percent\fR. .sp Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR. .RE .sp .ne 2 .na \fBzfs_fallocate_reserve_percent\fR (uint) .ad .RS 12n Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be preallocated for a file in order to guarantee that later writes will not run out of space. Instead, fallocate() space preallocation only checks that sufficient space is currently available in the pool or the user's project quota allocation, and then creates a sparse file of the requested size. The requested space is multiplied by \fBzfs_fallocate_reserve_percent\fR to allow additional space for indirect blocks and other internal metadata. Setting this value to 0 disables support for fallocate(2) and returns EOPNOTSUPP for fallocate() space preallocation again. .sp Default value: \fB110\fR% .RE .sp .ne 2 .na \fBzfs_fletcher_4_impl\fR (string) .ad .RS 12n Select a fletcher 4 implementation. .sp Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR, \fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR. All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction set extensions to be available and will only appear if ZFS detects that they are present at runtime. If multiple implementations of fletcher 4 are available, the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR results in the original, CPU based calculation, being used. Selecting any option other than \fBfastest\fR and \fBscalar\fR results in vector instructions from the respective CPU instruction set being used. .sp Default value: \fBfastest\fR. .RE .sp .ne 2 .na \fBzfs_free_bpobj_enabled\fR (int) .ad .RS 12n Enable/disable the processing of the free_bpobj object. .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_async_block_max_blocks\fR (ulong) .ad .RS 12n Maximum number of blocks freed in a single txg. .sp Default value: \fBULONG_MAX\fR (unlimited). .RE .sp .ne 2 .na \fBzfs_max_async_dedup_frees\fR (ulong) .ad .RS 12n Maximum number of dedup blocks freed in a single txg. .sp Default value: \fB100,000\fR. 
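.sp
For example, assuming the Linux module parameter interface and that the
parameter is writable at runtime, the per-txg limit can be reduced on a heavily
deduplicated pool (the value below is only illustrative):
.sp
.nf
# cat /sys/module/zfs/parameters/zfs_max_async_dedup_frees
100000
# echo 50000 > /sys/module/zfs/parameters/zfs_max_async_dedup_frees
.fi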
.RE .sp .ne 2 .na \fBzfs_override_estimate_recordsize\fR (ulong) .ad .RS 12n Record size calculation override for zfs send estimates. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_vdev_async_read_max_active\fR (int) .ad .RS 12n Maximum asynchronous read I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB3\fR. .RE .sp .ne 2 .na \fBzfs_vdev_async_read_min_active\fR (int) .ad .RS 12n Minimum asynchronous read I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_async_write_active_max_dirty_percent\fR (int) .ad .RS 12n When the pool has more than \fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use \fBzfs_vdev_async_write_max_active\fR to limit active async writes. If the dirty data is between min and max, the active I/O limit is linearly interpolated. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB60\fR%. .RE .sp .ne 2 .na \fBzfs_vdev_async_write_active_min_dirty_percent\fR (int) .ad .RS 12n When the pool has less than \fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use \fBzfs_vdev_async_write_min_active\fR to limit active async writes. If the dirty data is between min and max, the active I/O limit is linearly interpolated. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB30\fR%. .RE .sp .ne 2 .na \fBzfs_vdev_async_write_max_active\fR (int) .ad .RS 12n Maximum asynchronous write I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_vdev_async_write_min_active\fR (int) .ad .RS 12n Minimum asynchronous write I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Lower values are associated with better latency on rotational media but poorer resilver performance. The default value of 2 was chosen as a compromise. A value of 3 has been shown to improve resilver performance further at a cost of further increasing latency. .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBzfs_vdev_initializing_max_active\fR (int) .ad .RS 12n Maximum initializing I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_initializing_min_active\fR (int) .ad .RS 12n Minimum initializing I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_max_active\fR (int) .ad .RS 12n The maximum number of I/Os active to each device. Ideally, this will be >= the sum of each queue's max_active. It must be at least the sum of each queue's min_active. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1,000\fR. .RE .sp .ne 2 .na \fBzfs_vdev_rebuild_max_active\fR (int) .ad .RS 12n Maximum sequential resilver I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB3\fR. .RE .sp .ne 2 .na \fBzfs_vdev_rebuild_min_active\fR (int) .ad .RS 12n Minimum sequential resilver I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_removal_max_active\fR (int) .ad .RS 12n Maximum removal I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBzfs_vdev_removal_min_active\fR (int) .ad .RS 12n Minimum removal I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_scrub_max_active\fR (int) .ad .RS 12n Maximum scrub I/Os active to each device. 
See the section "ZFS I/O SCHEDULER". .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBzfs_vdev_scrub_min_active\fR (int) .ad .RS 12n Minimum scrub I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_sync_read_max_active\fR (int) .ad .RS 12n Maximum synchronous read I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_vdev_sync_read_min_active\fR (int) .ad .RS 12n Minimum synchronous read I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_vdev_sync_write_max_active\fR (int) .ad .RS 12n Maximum synchronous write I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_vdev_sync_write_min_active\fR (int) .ad .RS 12n Minimum synchronous write I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_vdev_trim_max_active\fR (int) .ad .RS 12n Maximum trim/discard I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBzfs_vdev_trim_min_active\fR (int) .ad .RS 12n Minimum trim/discard I/Os active to each device. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_queue_depth_pct\fR (int) .ad .RS 12n Maximum number of queued allocations per top-level vdev expressed as a percentage of \fBzfs_vdev_async_write_max_active\fR which allows the system to detect devices that are more capable of handling allocations and to allocate more blocks to those devices. It allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. See also \fBzio_dva_throttle_enabled\fR. .sp Default value: \fB1000\fR%. .RE .sp .ne 2 .na \fBzfs_expire_snapshot\fR (int) .ad .RS 12n Seconds to expire .zfs/snapshot .sp Default value: \fB300\fR. .RE .sp .ne 2 .na \fBzfs_admin_snapshot\fR (int) .ad .RS 12n Allow the creation, removal, or renaming of entries in the .zfs/snapshot directory to cause the creation, destruction, or renaming of snapshots. When enabled this functionality works both locally and over NFS exports which have the 'no_root_squash' option set. This functionality is disabled by default. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_flags\fR (int) .ad .RS 12n Set additional debugging flags. The following flags may be bitwise-or'd together. .sp .TS box; rB lB lB lB r l. Value Symbolic Name Description _ 1 ZFS_DEBUG_DPRINTF Enable dprintf entries in the debug log. _ 2 ZFS_DEBUG_DBUF_VERIFY * Enable extra dbuf verifications. _ 4 ZFS_DEBUG_DNODE_VERIFY * Enable extra dnode verifications. _ 8 ZFS_DEBUG_SNAPNAMES Enable snapshot name verification. _ 16 ZFS_DEBUG_MODIFY Check for illegally modified ARC buffers. _ 64 ZFS_DEBUG_ZIO_FREE Enable verification of block frees. _ 128 ZFS_DEBUG_HISTOGRAM_VERIFY Enable extra spacemap histogram verifications. _ 256 ZFS_DEBUG_METASLAB_VERIFY Verify space accounting on disk matches in-core range_trees. _ 512 ZFS_DEBUG_SET_ERROR Enable SET_ERROR and dprintf entries in the debug log. _ 1024 ZFS_DEBUG_INDIRECT_REMAP Verify split blocks created by device removal. _ 2048 ZFS_DEBUG_TRIM Verify TRIM ranges are always within the allocatable range tree. 
_ 4096 ZFS_DEBUG_LOG_SPACEMAP Verify that the log summary is consistent with the spacemap log and enable zfs_dbgmsgs for metaslab loading and flushing. .TE .sp * Requires debug build. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_free_leak_on_eio\fR (int) .ad .RS 12n If destroy encounters an EIO while reading metadata (e.g. indirect blocks), space referenced by the missing metadata can not be freed. Normally this causes the background destroy to become "stalled", as it is unable to make forward progress. While in this stalled state, all remaining space to free from the error-encountering filesystem is "temporarily leaked". Set this flag to cause it to ignore the EIO, permanently leak the space from indirect blocks that can not be read, and continue to free everything else that it can. The default, "stalling" behavior is useful if the storage partially fails (i.e. some but not all i/os fail), and then later recovers. In this case, we will be able to continue pool operations while it is partially failed, and when it recovers, we can continue to free the space, with no leaks. However, note that this case is actually fairly rare. Typically pools either (a) fail completely (but perhaps temporarily, e.g. a top-level vdev going offline), or (b) have localized, permanent errors (e.g. disk returns the wrong data due to bit flip or firmware bug). In case (a), this setting does not matter because the pool will be suspended and the sync thread will not be able to make forward progress regardless. In case (b), because the error is permanent, the best we can do is leak the minimum amount of space, which is what setting this flag will do. Therefore, it is reasonable for this flag to normally be set, but we chose the more conservative approach of not setting it, so that there is no possibility of leaking space in the "partial temporary" failure case. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_free_min_time_ms\fR (int) .ad .RS 12n During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum of this much time will be spent working on freeing blocks per txg. .sp Default value: \fB1,000\fR. .RE .sp .ne 2 .na \fBzfs_obsolete_min_time_ms\fR (int) .ad .RS 12n Similar to \fBzfs_free_min_time_ms\fR but for cleanup of old indirection records for removed vdevs. .sp Default value: \fB500\fR. .RE .sp .ne 2 .na \fBzfs_immediate_write_sz\fR (long) .ad .RS 12n Largest data block to write to zil. Larger blocks will be treated as if the dataset being written to had the property setting \fBlogbias=throughput\fR. .sp Default value: \fB32,768\fR. .RE .sp .ne 2 .na \fBzfs_initialize_value\fR (ulong) .ad .RS 12n Pattern written to vdev free space by \fBzpool initialize\fR. .sp Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee). .RE .sp .ne 2 .na \fBzfs_initialize_chunk_size\fR (ulong) .ad .RS 12n Size of writes used by \fBzpool initialize\fR. This option is used by the test suite to facilitate testing. .sp Default value: \fB1,048,576\fR .RE .sp .ne 2 .na \fBzfs_livelist_max_entries\fR (ulong) .ad .RS 12n The threshold size (in block pointers) at which we create a new sub-livelist. Larger sublists are more costly from a memory perspective but the fewer sublists there are, the lower the cost of insertion. .sp Default value: \fB500,000\fR. .RE .sp .ne 2 .na \fBzfs_livelist_min_percent_shared\fR (int) .ad .RS 12n If the amount of shared space between a snapshot and its clone drops below this threshold, the clone turns off the livelist and reverts to the old deletion method. 
This is in place because once a clone has been overwritten enough livelists no long give us a benefit. .sp Default value: \fB75\fR. .RE .sp .ne 2 .na \fBzfs_livelist_condense_new_alloc\fR (int) .ad .RS 12n Incremented each time an extra ALLOC blkptr is added to a livelist entry while it is being condensed. This option is used by the test suite to track race conditions. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_livelist_condense_sync_cancel\fR (int) .ad .RS 12n Incremented each time livelist condensing is canceled while in spa_livelist_condense_sync. This option is used by the test suite to track race conditions. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_livelist_condense_sync_pause\fR (int) .ad .RS 12n When set, the livelist condense process pauses indefinitely before executing the synctask - spa_livelist_condense_sync. This option is used by the test suite to trigger race conditions. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_livelist_condense_zthr_cancel\fR (int) .ad .RS 12n Incremented each time livelist condensing is canceled while in spa_livelist_condense_cb. This option is used by the test suite to track race conditions. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_livelist_condense_zthr_pause\fR (int) .ad .RS 12n When set, the livelist condense process pauses indefinitely before executing the open context condensing work in spa_livelist_condense_cb. This option is used by the test suite to trigger race conditions. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_lua_max_instrlimit\fR (ulong) .ad .RS 12n The maximum execution time limit that can be set for a ZFS channel program, specified as a number of Lua instructions. .sp Default value: \fB100,000,000\fR. .RE .sp .ne 2 .na \fBzfs_lua_max_memlimit\fR (ulong) .ad .RS 12n The maximum memory limit that can be set for a ZFS channel program, specified in bytes. .sp Default value: \fB104,857,600\fR. .RE .sp .ne 2 .na \fBzfs_max_dataset_nesting\fR (int) .ad .RS 12n The maximum depth of nested datasets. This value can be tuned temporarily to fix existing datasets that exceed the predefined limit. .sp Default value: \fB50\fR. .RE .sp .ne 2 .na \fBzfs_max_log_walking\fR (ulong) .ad .RS 12n The number of past TXGs that the flushing algorithm of the log spacemap feature uses to estimate incoming log blocks. .sp Default value: \fB5\fR. .RE .sp .ne 2 .na \fBzfs_max_logsm_summary_length\fR (ulong) .ad .RS 12n Maximum number of rows allowed in the summary of the spacemap log. .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_max_recordsize\fR (int) .ad .RS 12n We currently support block sizes from 512 bytes to 16MB. The benefits of larger blocks, and thus larger I/O, need to be weighed against the cost of COWing a giant block to modify one byte. Additionally, very large blocks can have an impact on i/o latency, and also potentially on the memory allocator. Therefore, we do not allow the recordsize to be set larger than zfs_max_recordsize (default 1MB). Larger blocks can be created by changing this tunable, and pools with larger blocks can always be imported and used, regardless of this setting. .sp Default value: \fB1,048,576\fR. .RE .sp .ne 2 .na \fBzfs_allow_redacted_dataset_mount\fR (int) .ad .RS 12n Allow datasets received with redacted send/receive to be mounted. Normally disabled because these datasets may be missing key data. .sp Default value: \fB0\fR. 
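.sp
For example, assuming the Linux module parameter interface and that the
parameter is writable at runtime, mounting of redacted datasets can be
temporarily allowed and then disallowed again:
.sp
.nf
# echo 1 > /sys/module/zfs/parameters/zfs_allow_redacted_dataset_mount
# echo 0 > /sys/module/zfs/parameters/zfs_allow_redacted_dataset_mount
.fi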
.RE .sp .ne 2 .na \fBzfs_min_metaslabs_to_flush\fR (ulong) .ad .RS 12n Minimum number of metaslabs to flush per dirty TXG. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_metaslab_fragmentation_threshold\fR (int) .ad .RS 12n Allow metaslabs to keep their active state as long as their fragmentation percentage is less than or equal to this value. An active metaslab that exceeds this threshold will no longer keep its active status, allowing better metaslabs to be selected. .sp Default value: \fB70\fR.
.RE .sp .ne 2 .na \fBzfs_mg_fragmentation_threshold\fR (int) .ad .RS 12n Metaslab groups are considered eligible for allocations if their fragmentation metric (measured as a percentage) is less than or equal to this value. If a metaslab group exceeds this threshold then it will be skipped unless all metaslab groups within the metaslab class have also crossed this threshold. .sp Default value: \fB95\fR.
.RE .sp .ne 2 .na \fBzfs_mg_noalloc_threshold\fR (int) .ad .RS 12n Defines a threshold at which metaslab groups should be eligible for allocations. The value is expressed as a percentage of free space beyond which a metaslab group is always eligible for allocations. If a metaslab group's free space is less than or equal to the threshold, the allocator will avoid allocating to that group unless all groups in the pool have reached the threshold. Once all groups have reached the threshold, all groups are allowed to accept allocations. The default value of 0 disables the feature and causes all metaslab groups to be eligible for allocations. This parameter allows one to deal with pools having heavily imbalanced vdevs such as would be the case when a new vdev has been added. Setting the threshold to a non-zero percentage will stop allocations from being made to vdevs that aren't filled to the specified percentage and allow lesser filled vdevs to acquire more allocations than they otherwise would under the old \fBzfs_mg_alloc_failures\fR facility. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_ddt_data_is_special\fR (int) .ad .RS 12n If enabled, ZFS will place DDT data into the special allocation class. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_user_indirect_is_special\fR (int) .ad .RS 12n If enabled, ZFS will place user data (both file and zvol) indirect blocks into the special allocation class. .sp Default value: \fB1\fR.
.RE .sp .ne 2 .na \fBzfs_multihost_history\fR (int) .ad .RS 12n Historical statistics for the last N multihost updates will be available in \fB/proc/spl/kstat/zfs/<pool>/multihost\fR. .sp Default value: \fB0\fR.
.RE .sp .ne 2 .na \fBzfs_multihost_interval\fR (ulong) .ad .RS 12n Used to control the frequency of multihost writes which are performed when the \fBmultihost\fR pool property is on. This is one factor used to determine the length of the activity check during import. .sp The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR milliseconds. On average a multihost write will be issued for each leaf vdev every \fBzfs_multihost_interval\fR milliseconds. In practice, the observed period can vary with the I/O load and this observed value is the delay which is stored in the uberblock. .sp Default value: \fB1000\fR.
.RE .sp .ne 2 .na \fBzfs_multihost_import_intervals\fR (uint) .ad .RS 12n Used to control the duration of the activity test on import. Smaller values of \fBzfs_multihost_import_intervals\fR will reduce the import time but increase the risk of failing to detect an active pool.
The total activity check time is never allowed to drop below one second. .sp On import the activity check waits a minimum amount of time determined by \fBzfs_multihost_interval * zfs_multihost_import_intervals\fR, or the same product computed on the host which last had the pool imported (whichever is greater). The activity check time may be further extended if the value of mmp delay found in the best uberblock indicates actual multihost updates happened at longer intervals than \fBzfs_multihost_interval\fR. A minimum value of \fB100ms\fR is enforced. .sp A value of 0 is ignored and treated as if it was set to 1. .sp Default value: \fB20\fR. .RE .sp .ne 2 .na \fBzfs_multihost_fail_intervals\fR (uint) .ad .RS 12n Controls the behavior of the pool when multihost write failures or delays are detected. .sp When \fBzfs_multihost_fail_intervals = 0\fR, multihost write failures or delays are ignored. The failures will still be reported to the ZED which depending on its configuration may take action such as suspending the pool or offlining a device. .sp When \fBzfs_multihost_fail_intervals > 0\fR, the pool will be suspended if \fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds pass without a successful mmp write. This guarantees the activity test will see mmp writes if the pool is imported. A value of 1 is ignored and treated as if it was set to 2. This is necessary to prevent the pool from being suspended due to normal, small I/O latency variations. .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_no_scrub_io\fR (int) .ad .RS 12n Set for no scrub I/O. This results in scrubs not actually scrubbing data and simply doing a metadata crawl of the pool instead. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_no_scrub_prefetch\fR (int) .ad .RS 12n Set to disable block prefetching for scrubs. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_nocacheflush\fR (int) .ad .RS 12n Disable cache flush operations on disks when writing. Setting this will cause pool corruption on power loss if a volatile out-of-order write cache is enabled. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_nopwrite_enabled\fR (int) .ad .RS 12n Enable NOP writes .sp Use \fB1\fR for yes (default) and \fB0\fR to disable. .RE .sp .ne 2 .na \fBzfs_dmu_offset_next_sync\fR (int) .ad .RS 12n Enable forcing txg sync to find holes. When enabled forces ZFS to act like prior versions when SEEK_HOLE or SEEK_DATA flags are used, which when a dnode is dirty causes txg's to be synced so that this data can be found. .sp Use \fB1\fR for yes and \fB0\fR to disable (default). .RE .sp .ne 2 .na \fBzfs_pd_bytes_max\fR (int) .ad .RS 12n The number of bytes which should be prefetched during a pool traversal (eg: \fBzfs send\fR or other data crawling operations) .sp Default value: \fB52,428,800\fR. .RE .sp .ne 2 .na \fBzfs_per_txg_dirty_frees_percent \fR (ulong) .ad .RS 12n Tunable to control percentage of dirtied indirect blocks from frees allowed into one TXG. After this threshold is crossed, additional frees will wait until the next TXG. A value of zero will disable this throttle. .sp Default value: \fB5\fR, set to \fB0\fR to disable. .RE .sp .ne 2 .na \fBzfs_prefetch_disable\fR (int) .ad .RS 12n This tunable disables predictive prefetch. Note that it leaves "prescient" prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch, prescient prefetch never issues i/os that end up not being needed, so it can't hurt performance. 
.sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_qat_checksum_disable\fR (int) .ad .RS 12n This tunable disables qat hardware acceleration for sha256 checksums. It may be set after the zfs modules have been loaded to initialize the qat hardware as long as support is compiled in and the qat driver is present. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_qat_compress_disable\fR (int) .ad .RS 12n This tunable disables qat hardware acceleration for gzip compression. It may be set after the zfs modules have been loaded to initialize the qat hardware as long as support is compiled in and the qat driver is present. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_qat_encrypt_disable\fR (int) .ad .RS 12n This tunable disables qat hardware acceleration for AES-GCM encryption. It may be set after the zfs modules have been loaded to initialize the qat hardware as long as support is compiled in and the qat driver is present. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_read_chunk_size\fR (long) .ad .RS 12n Bytes to read per chunk .sp Default value: \fB1,048,576\fR. .RE .sp .ne 2 .na \fBzfs_read_history\fR (int) .ad .RS 12n Historical statistics for the last N reads will be available in \fB/proc/spl/kstat/zfs//reads\fR .sp Default value: \fB0\fR (no data is kept). .RE .sp .ne 2 .na \fBzfs_read_history_hits\fR (int) .ad .RS 12n Include cache hits in read history .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_rebuild_max_segment\fR (ulong) .ad .RS 12n Maximum read segment size to issue when sequentially resilvering a top-level vdev. .sp Default value: \fB1,048,576\fR. .RE .sp .ne 2 .na \fBzfs_reconstruct_indirect_combinations_max\fR (int) .ad .RS 12na If an indirect split block contains more than this many possible unique combinations when being reconstructed, consider it too computationally expensive to check them all. Instead, try at most \fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected combinations each time the block is accessed. This allows all segment copies to participate fairly in the reconstruction when all combinations cannot be checked and prevents repeated use of one bad copy. .sp Default value: \fB4096\fR. .RE .sp .ne 2 .na \fBzfs_recover\fR (int) .ad .RS 12n Set to attempt to recover from fatal errors. This should only be used as a last resort, as it typically results in leaked space, or worse. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_removal_ignore_errors\fR (int) .ad .RS 12n .sp Ignore hard IO errors during device removal. When set, if a device encounters a hard IO error during the removal process the removal will not be cancelled. This can result in a normally recoverable block becoming permanently damaged and is not recommended. This should only be used as a last resort when the pool cannot be returned to a healthy state prior to removing the device. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_removal_suspend_progress\fR (int) .ad .RS 12n .sp This is used by the test suite so that it can ensure that certain actions happen while in the middle of a removal. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_remove_max_segment\fR (int) .ad .RS 12n .sp The largest contiguous segment that we will attempt to allocate when removing a device. This can be no larger than 16MB. 
If there is a performance problem with attempting to allocate large blocks, consider decreasing this. .sp Default value: \fB16,777,216\fR (16MB). .RE .sp .ne 2 .na \fBzfs_resilver_disable_defer\fR (int) .ad .RS 12n Disables the \fBresilver_defer\fR feature, causing an operation that would start a resilver to restart one in progress immediately. .sp Default value: \fB0\fR (feature enabled). .RE .sp .ne 2 .na \fBzfs_resilver_min_time_ms\fR (int) .ad .RS 12n Resilvers are processed by the sync thread. While resilvering it will spend at least this much time working on a resilver between txg flushes. .sp Default value: \fB3,000\fR. .RE .sp .ne 2 .na \fBzfs_scan_ignore_errors\fR (int) .ad .RS 12n If set to a nonzero value, remove the DTL (dirty time list) upon completion of a pool scan (scrub) even if there were unrepairable errors. It is intended to be used during pool repair or recovery to stop resilvering when the pool is next imported. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_scrub_min_time_ms\fR (int) .ad .RS 12n Scrubs are processed by the sync thread. While scrubbing it will spend at least this much time working on a scrub between txg flushes. .sp Default value: \fB1,000\fR. .RE .sp .ne 2 .na \fBzfs_scan_checkpoint_intval\fR (int) .ad .RS 12n To preserve progress across reboots the sequential scan algorithm periodically needs to stop metadata scanning and issue all the verification I/Os to disk. The frequency of this flushing is determined by the \fBzfs_scan_checkpoint_intval\fR tunable. .sp Default value: \fB7200\fR seconds (every 2 hours). .RE .sp .ne 2 .na \fBzfs_scan_fill_weight\fR (int) .ad .RS 12n This tunable affects how scrub and resilver I/O segments are ordered. A higher number indicates that we care more about how filled in a segment is, while a lower number indicates we care more about the size of the extent without considering the gaps within a segment. This value is only tunable upon module insertion. Changing the value afterwards will have no effect on scrub or resilver performance. .sp Default value: \fB3\fR. .RE .sp .ne 2 .na \fBzfs_scan_issue_strategy\fR (int) .ad .RS 12n Determines the order that data will be verified while scrubbing or resilvering. If set to \fB1\fR, data will be verified as sequentially as possible, given the amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This may improve scrub performance if the pool's data is very fragmented. If set to \fB2\fR, the largest mostly-contiguous chunk of found data will be verified first. By deferring scrubbing of small segments, we may later find adjacent data to coalesce and increase the segment size. If set to \fB0\fR, zfs will use strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a checkpoint. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_scan_legacy\fR (int) .ad .RS 12n A value of 0 indicates that scrubs and resilvers will gather metadata in memory before issuing sequential I/O. A value of 1 indicates that the legacy algorithm will be used where I/O is initiated as soon as it is discovered. Changing this value to 0 will not affect scrubs or resilvers that are already in progress. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_scan_max_ext_gap\fR (int) .ad .RS 12n Indicates the largest gap in bytes between scrub / resilver I/Os that will still be considered sequential for sorting purposes. Changing this value will not affect scrubs or resilvers that are already in progress. .sp Default value: \fB2097152 (2 MB)\fR.
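.sp
As a brief illustration (not part of the original description; the 4 MB
value below is an arbitrary example, not a recommendation), this tunable,
like most parameters in this manual, is exposed under
\fB/sys/module/zfs/parameters\fR on Linux and can be changed at runtime
when it is writable on your build:
.sp
.nf
	# cat /sys/module/zfs/parameters/zfs_scan_max_ext_gap
	2097152
	# echo 4194304 > /sys/module/zfs/parameters/zfs_scan_max_ext_gap
.fi
.sp
To persist such a change across reboots, the usual approach is an
\fBoptions zfs zfs_scan_max_ext_gap=4194304\fR line in
\fB/etc/modprobe.d/zfs.conf\fR; as noted above, scrubs or resilvers
already in progress are not affected.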
.RE .sp .ne 2 .na \fBzfs_scan_mem_lim_fact\fR (int) .ad .RS 12n Maximum fraction of RAM used for I/O sorting by the sequential scan algorithm. This tunable determines the hard limit for I/O sorting memory usage. When the hard limit is reached we stop scanning metadata and start issuing data verification I/O. This is done until we get below the soft limit. .sp Default value: \fB20\fR which is 5% of RAM (1/20). .RE .sp .ne 2 .na \fBzfs_scan_mem_lim_soft_fact\fR (int) .ad .RS 12n The fraction of the hard limit used to determine the soft limit for I/O sorting by the sequential scan algorithm. When we cross this limit from below no action is taken. When we cross this limit from above it is because we are issuing verification I/O. In this case (unless the metadata scan is done) we stop issuing verification I/O and start scanning metadata again until we get to the hard limit. .sp Default value: \fB20\fR which is 5% of the hard limit (1/20). .RE .sp .ne 2 .na \fBzfs_scan_strict_mem_lim\fR (int) .ad .RS 12n Enforces tight memory limits on pool scans when a sequential scan is in progress. When disabled the memory limit may be exceeded by fast disks. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_scan_suspend_progress\fR (int) .ad .RS 12n Freezes a scrub/resilver in progress without actually pausing it. Intended for testing/debugging. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_scan_vdev_limit\fR (int) .ad .RS 12n Maximum amount of data that can be concurrently issued for scrubs and resilvers per leaf device, given in bytes. .sp Default value: \fB41943040\fR. .RE .sp .ne 2 .na \fBzfs_send_corrupt_data\fR (int) .ad .RS 12n Allow sending of corrupt data (ignore read/checksum errors when sending data) .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_send_unmodified_spill_blocks\fR (int) .ad .RS 12n Include unmodified spill blocks in the send stream. Under certain circumstances previous versions of ZFS could incorrectly remove the spill block from an existing object. Including unmodified copies of the spill blocks creates a backwards compatible stream which will recreate a spill block if it was incorrectly removed. .sp Use \fB1\fR for yes (default) and \fB0\fR for no. .RE .sp .ne 2 .na \fBzfs_send_no_prefetch_queue_ff\fR (int) .ad .RS 12n The fill fraction of the \fBzfs send\fR internal queues. The fill fraction controls the timing with which internal threads are woken up. .sp Default value: \fB20\fR. .RE .sp .ne 2 .na \fBzfs_send_no_prefetch_queue_length\fR (int) .ad .RS 12n The maximum number of bytes allowed in \fBzfs send\fR's internal queues. .sp Default value: \fB1,048,576\fR. .RE .sp .ne 2 .na \fBzfs_send_queue_ff\fR (int) .ad .RS 12n The fill fraction of the \fBzfs send\fR prefetch queue. The fill fraction controls the timing with which internal threads are woken up. .sp Default value: \fB20\fR. .RE .sp .ne 2 .na \fBzfs_send_queue_length\fR (int) .ad .RS 12n The maximum number of bytes allowed that will be prefetched by \fBzfs send\fR. This value must be at least twice the maximum block size in use. .sp Default value: \fB16,777,216\fR. .RE .sp .ne 2 .na \fBzfs_recv_queue_ff\fR (int) .ad .RS 12n The fill fraction of the \fBzfs receive\fR queue. The fill fraction controls the timing with which internal threads are woken up. .sp Default value: \fB20\fR. .RE .sp .ne 2 .na \fBzfs_recv_queue_length\fR (int) .ad .RS 12n The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value must be at least twice the maximum block size in use.
.sp Default value: \fB16,777,216\fR. .RE .sp .ne 2 .na \fBzfs_recv_write_batch_size\fR (int) .ad .RS 12n The maximum amount of data (in bytes) that \fBzfs receive\fR will write in one DMU transaction. This is the uncompressed size, even when receiving a compressed send stream. This setting will not reduce the write size below a single block. Capped at a maximum of 32MB .sp Default value: \fB1MB\fR. .RE .sp .ne 2 .na \fBzfs_override_estimate_recordsize\fR (ulong) .ad .RS 12n Setting this variable overrides the default logic for estimating block sizes when doing a zfs send. The default heuristic is that the average block size will be the current recordsize. Override this value if most data in your dataset is not of that size and you require accurate zfs send size estimates. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_sync_pass_deferred_free\fR (int) .ad .RS 12n Flushing of data to disk is done in passes. Defer frees starting in this pass .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBzfs_spa_discard_memory_limit\fR (int) .ad .RS 12n Maximum memory used for prefetching a checkpoint's space map on each vdev while discarding the checkpoint. .sp Default value: \fB16,777,216\fR. .RE .sp .ne 2 .na \fBzfs_special_class_metadata_reserve_pct\fR (int) .ad .RS 12n Only allow small data blocks to be allocated on the special and dedup vdev types when the available free space percentage on these vdevs exceeds this value. This ensures reserved space is available for pool meta data as the special vdevs approach capacity. .sp Default value: \fB25\fR. .RE .sp .ne 2 .na \fBzfs_sync_pass_dont_compress\fR (int) .ad .RS 12n -Starting in this sync pass, we disable compression (including of metadata). +Starting in this sync pass, we disable compression (including of metadata). With the default setting, in practice, we don't have this many sync passes, so this has no effect. .sp The original intent was that disabling compression would help the sync passes to converge. However, in practice disabling compression increases the average number of sync passes, because when we turn compression off, a lot of block's size will change and thus we have to re-allocate (not overwrite) them. It also increases the number of 128KB allocations (e.g. for indirect blocks and spacemaps) because these will not be compressed. The 128K allocations are especially detrimental to performance on highly fragmented systems, which may have very few free segments of this size, and may need to load new metaslabs to satisfy 128K allocations. .sp Default value: \fB8\fR. .RE .sp .ne 2 .na \fBzfs_sync_pass_rewrite\fR (int) .ad .RS 12n Rewrite new block pointers starting in this pass .sp Default value: \fB2\fR. .RE .sp .ne 2 .na \fBzfs_sync_taskq_batch_pct\fR (int) .ad .RS 12n This controls the number of threads used by the dp_sync_taskq. The default value of 75% will create a maximum of one thread per cpu. .sp Default value: \fB75\fR%. .RE .sp .ne 2 .na \fBzfs_trim_extent_bytes_max\fR (uint) .ad .RS 12n Maximum size of TRIM command. Ranges larger than this will be split in to chunks no larger than \fBzfs_trim_extent_bytes_max\fR bytes before being issued to the device. .sp Default value: \fB134,217,728\fR. .RE .sp .ne 2 .na \fBzfs_trim_extent_bytes_min\fR (uint) .ad .RS 12n Minimum size of TRIM commands. TRIM ranges smaller than this will be skipped unless they're part of a larger range which was broken in to chunks. This is done because it's common for these small TRIMs to negatively impact overall performance. 
This value can be set to 0 to TRIM all unallocated space. .sp Default value: \fB32,768\fR. .RE .sp .ne 2 .na \fBzfs_trim_metaslab_skip\fR (uint) .ad .RS 12n Skip uninitialized metaslabs during the TRIM process. This option is useful for pools constructed from large thinly-provisioned devices where TRIM operations are slow. As a pool ages an increasing fraction of the pool's metaslabs will be initialized, progressively degrading the usefulness of this option. This setting is stored when starting a manual TRIM and will persist for the duration of the requested TRIM. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_trim_queue_limit\fR (uint) .ad .RS 12n Maximum number of queued TRIMs outstanding per leaf vdev. The number of concurrent TRIM commands issued to the device is controlled by the \fBzfs_vdev_trim_min_active\fR and \fBzfs_vdev_trim_max_active\fR module options. .sp Default value: \fB10\fR. .RE .sp .ne 2 .na \fBzfs_trim_txg_batch\fR (uint) .ad .RS 12n The number of transaction groups worth of frees which should be aggregated before TRIM operations are issued to the device. This setting represents a trade-off between issuing larger, more efficient TRIM operations and the delay before the recently trimmed space is available for use by the device. .sp Increasing this value will allow frees to be aggregated for a longer time. This will result in larger TRIM operations and potentially increased memory usage. Decreasing this value will have the opposite effect. The default value of 32 was determined to be a reasonable compromise. .sp Default value: \fB32\fR. .RE .sp .ne 2 .na \fBzfs_txg_history\fR (int) .ad .RS 12n Historical statistics for the last N txgs will be available in \fB/proc/spl/kstat/zfs//txgs\fR .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_txg_timeout\fR (int) .ad .RS 12n Flush dirty data to disk at least every N seconds (maximum txg duration) .sp Default value: \fB5\fR. .RE .sp .ne 2 .na \fBzfs_vdev_aggregate_trim\fR (int) .ad .RS 12n Allow TRIM I/Os to be aggregated. This is normally not helpful because the extents to be trimmed will already have been aggregated by the metaslab. This option is provided for debugging and performance analysis. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_vdev_aggregation_limit\fR (int) .ad .RS 12n Max vdev I/O aggregation size .sp Default value: \fB1,048,576\fR. .RE .sp .ne 2 .na \fBzfs_vdev_aggregation_limit_non_rotating\fR (int) .ad .RS 12n Max vdev I/O aggregation size for non-rotating media .sp Default value: \fB131,072\fR. .RE .sp .ne 2 .na \fBzfs_vdev_cache_bshift\fR (int) .ad .RS 12n Shift size to inflate reads to .sp Default value: \fB16\fR (effectively 65536). .RE .sp .ne 2 .na \fBzfs_vdev_cache_max\fR (int) .ad .RS 12n Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR size (default 64k). .sp Default value: \fB16384\fR. .RE .sp .ne 2 .na \fBzfs_vdev_cache_size\fR (int) .ad .RS 12n Total size of the per-disk cache in bytes. .sp Currently this feature is disabled as it has been found to not be helpful for performance and in some cases harmful. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_vdev_mirror_rotating_inc\fR (int) .ad .RS 12n A number by which the balancing algorithm increments the load calculation when an I/O immediately follows its predecessor on rotational vdevs, for the purpose of selecting the least busy mirror member. .sp Default value: \fB0\fR.
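.sp
As a hedged sketch (the command below is a convenience, not taken from
the original text), the mirror load-balancing increments form a small
family of related tunables that can be inspected together before
adjusting any of them:
.sp
.nf
	# grep . /sys/module/zfs/parameters/zfs_vdev_mirror_*
.fi
.sp
A higher increment makes a member look more heavily loaded under the
corresponding condition, so the balancer becomes less likely to send the
I/O there; the defaults favor members that can service the I/O without a
seek.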
.RE .sp .ne 2 .na \fBzfs_vdev_mirror_rotating_seek_inc\fR (int) .ad .RS 12n A number by which the balancing algorithm increments the load calculation for the purpose of selecting the least busy mirror member when an I/O lacks locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within this that are not immediately following the previous I/O are incremented by half. .sp Default value: \fB5\fR. .RE .sp .ne 2 .na \fBzfs_vdev_mirror_rotating_seek_offset\fR (int) .ad .RS 12n The maximum distance for the last queued I/O in which the balancing algorithm considers an I/O to have locality. See the section "ZFS I/O SCHEDULER". .sp Default value: \fB1048576\fR. .RE .sp .ne 2 .na \fBzfs_vdev_mirror_non_rotating_inc\fR (int) .ad .RS 12n A number by which the balancing algorithm increments the load calculation for the purpose of selecting the least busy mirror member on non-rotational vdevs when I/Os do not immediately follow one another. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int) .ad .RS 12n A number by which the balancing algorithm increments the load calculation for the purpose of selecting the least busy mirror member when an I/O lacks locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within this that are not immediately following the previous I/O are incremented by half. .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzfs_vdev_read_gap_limit\fR (int) .ad .RS 12n Aggregate read I/O operations if the gap on-disk between them is within this threshold. .sp Default value: \fB32,768\fR. .RE .sp .ne 2 .na \fBzfs_vdev_write_gap_limit\fR (int) .ad .RS 12n Aggregate write I/O over gap .sp Default value: \fB4,096\fR. .RE .sp .ne 2 .na \fBzfs_vdev_raidz_impl\fR (string) .ad .RS 12n Parameter for selecting raidz parity implementation to use. Options marked (always) below may be selected on module load as they are supported on all systems. The remaining options may only be set after the module is loaded, as they are available only if the implementations are compiled in and supported on the running system. Once the module is loaded, the content of /sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options with the currently selected one enclosed in []. Possible options are: fastest - (always) implementation selected using built-in benchmark original - (always) original raidz implementation scalar - (always) scalar raidz implementation sse2 - implementation using SSE2 instruction set (64bit x86 only) ssse3 - implementation using SSSE3 instruction set (64bit x86 only) avx2 - implementation using AVX2 instruction set (64bit x86 only) avx512f - implementation using AVX512F instruction set (64bit x86 only) avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only) aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only) aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only) powerpc_altivec - implementation using Altivec (PowerPC only) .sp Default value: \fBfastest\fR. .RE .sp .ne 2 .na \fBzfs_vdev_scheduler\fR (charp) .ad .RS 12n \fBDEPRECATED\fR: This option exists for compatibility with older user configurations. It does nothing except print a warning to the kernel log if set. .sp .RE .sp .ne 2 .na \fBzfs_zevent_cols\fR (int) .ad .RS 12n When zevents are logged to the console use this as the word wrap width. .sp Default value: \fB80\fR. 
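.sp
For example (a sketch only; it assumes a Linux host with the zfs module
loaded), the wrap width only matters when console logging is turned on
via \fBzfs_zevent_console\fR, described below; the same events can always
be examined from user space:
.sp
.nf
	# echo 1 > /sys/module/zfs/parameters/zfs_zevent_console
	# echo 132 > /sys/module/zfs/parameters/zfs_zevent_cols
	# zpool events -v
.fi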
.RE .sp .ne 2 .na \fBzfs_zevent_console\fR (int) .ad .RS 12n Log events to the console .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzfs_zevent_len_max\fR (int) .ad .RS 12n Max event queue length. A value of 0 will result in a calculated value which increases with the number of CPUs in the system (minimum 64 events). Events in the queue can be viewed with the \fBzpool events\fR command. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzfs_zil_clean_taskq_maxalloc\fR (int) .ad .RS 12n The maximum number of taskq entries that are allowed to be cached. When this limit is exceeded transaction records (itxs) will be cleaned synchronously. .sp Default value: \fB1048576\fR. .RE .sp .ne 2 .na \fBzfs_zil_clean_taskq_minalloc\fR (int) .ad .RS 12n The number of taskq entries that are pre-populated when the taskq is first created and are immediately available for use. .sp Default value: \fB1024\fR. .RE .sp .ne 2 .na \fBzfs_zil_clean_taskq_nthr_pct\fR (int) .ad .RS 12n This controls the number of threads used by the dp_zil_clean_taskq. The default value of 100% will create a maximum of one thread per cpu. .sp Default value: \fB100\fR%. .RE .sp .ne 2 .na \fBzil_maxblocksize\fR (int) .ad .RS 12n This sets the maximum block size used by the ZIL. On very fragmented pools, lowering this (typically to 36KB) can improve performance. .sp Default value: \fB131072\fR (128KB). .RE .sp .ne 2 .na \fBzil_nocacheflush\fR (int) .ad .RS 12n Disable the cache flush commands that are normally sent to the disk(s) by the ZIL after an LWB write has completed. Setting this will cause ZIL corruption on power loss if a volatile out-of-order write cache is enabled. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzil_replay_disable\fR (int) .ad .RS 12n Disable intent logging replay. Replay can be disabled to recover from a corrupted ZIL .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzil_slog_bulk\fR (ulong) .ad .RS 12n Limit SLOG write size per commit executed with synchronous priority. Any writes above that will be executed with lower (asynchronous) priority to limit potential SLOG device abuse by a single active ZIL writer. .sp Default value: \fB786,432\fR. .RE .sp .ne 2 .na \fBzio_deadman_log_all\fR (int) .ad .RS 12n If non-zero, the zio deadman will produce debugging messages (see \fBzfs_dbgmsg_enable\fR) for all zios, rather than only for leaf zios possessing a vdev. This is meant to be used by developers to gain diagnostic information for hang conditions which don't involve a mutex or other locking primitive; typically conditions in which a thread in the zio pipeline is looping indefinitely. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzio_decompress_fail_fraction\fR (int) .ad .RS 12n If non-zero, this value represents the denominator of the probability that zfs should induce a decompression failure. For instance, for a 5% decompression failure rate, this value should be set to 20. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzio_slow_io_ms\fR (int) .ad .RS 12n When an I/O operation takes more than \fBzio_slow_io_ms\fR milliseconds to complete, it is marked as a slow I/O. Each slow I/O causes a delay zevent. Slow I/O counters can be seen with "zpool status -s". .sp Default value: \fB30,000\fR. .RE .sp .ne 2 .na \fBzio_dva_throttle_enabled\fR (int) .ad .RS 12n Throttle block allocations in the I/O pipeline. This allows for dynamic allocation distribution when devices are imbalanced.
When enabled, the maximum number of pending allocations per top-level vdev is limited by \fBzfs_vdev_queue_depth_pct\fR. .sp Default value: \fB1\fR. .RE .sp .ne 2 .na \fBzio_requeue_io_start_cut_in_line\fR (int) .ad .RS 12n Prioritize requeued I/O .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzio_taskq_batch_pct\fR (uint) .ad .RS 12n Percentage of online CPUs (or CPU cores, etc) which will run a worker thread for I/O. These workers are responsible for I/O work such as compression and checksum calculations. A fractional number of CPUs will be rounded down. .sp The default value of 75 was chosen to avoid using all CPUs which can result in latency issues and inconsistent application performance, especially when high compression is enabled. .sp Default value: \fB75\fR. .RE .sp .ne 2 .na \fBzvol_inhibit_dev\fR (uint) .ad .RS 12n Do not create zvol device nodes. This may slightly improve startup time on systems with a very large number of zvols. .sp Use \fB1\fR for yes and \fB0\fR for no (default). .RE .sp .ne 2 .na \fBzvol_major\fR (uint) .ad .RS 12n Major number for zvol block devices .sp Default value: \fB230\fR. .RE .sp .ne 2 .na \fBzvol_max_discard_blocks\fR (ulong) .ad .RS 12n Discard (aka TRIM) operations done on zvols will be done in batches of this many blocks, where block size is determined by the \fBvolblocksize\fR property of a zvol. .sp Default value: \fB16,384\fR. .RE .sp .ne 2 .na \fBzvol_prefetch_bytes\fR (uint) .ad .RS 12n When adding a zvol to the system prefetch \fBzvol_prefetch_bytes\fR from the start and end of the volume. Prefetching these regions of the volume is desirable because they are likely to be accessed immediately by \fBblkid(8)\fR or by the kernel scanning for a partition table. .sp Default value: \fB131,072\fR. .RE .sp .ne 2 .na \fBzvol_request_sync\fR (uint) .ad .RS 12n When processing I/O requests for a zvol submit them synchronously. This effectively limits the queue depth to 1 for each I/O submitter. When set to 0 requests are handled asynchronously by a thread pool. The number of requests which can be handled concurrently is controlled by \fBzvol_threads\fR. .sp Default value: \fB0\fR. .RE .sp .ne 2 .na \fBzvol_threads\fR (uint) .ad .RS 12n Max number of threads which can handle zvol I/O requests concurrently. .sp Default value: \fB32\fR. .RE .sp .ne 2 .na \fBzvol_volmode\fR (uint) .ad .RS 12n Defines the behaviour of zvol block devices when \fBvolmode\fR is set to \fBdefault\fR. Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none). .sp Default value: \fB1\fR. .RE .SH ZFS I/O SCHEDULER ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os. The I/O scheduler determines when and in what order those operations are issued. The I/O scheduler divides operations into five I/O classes prioritized in the following order: sync read, sync write, async read, async write, and scrub/resilver. Each queue defines the minimum and maximum number of concurrent operations that may be issued to the device. In addition, the device has an aggregate maximum, \fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums must not exceed the aggregate maximum. If the sum of the per-queue maximums exceeds the aggregate maximum, then the number of active I/Os may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will be issued regardless of whether all per-queue minimums have been met. .sp For many physical devices, throughput increases with the number of concurrent operations, but latency typically suffers.
Further, physical devices typically have a limit at which more concurrent operations have no effect on throughput or can actually cause it to decrease. .sp The scheduler selects the next operation to issue by first looking for an I/O class whose minimum has not been satisfied. Once all are satisfied and the aggregate maximum has not been hit, the scheduler looks for classes whose maximum has not been satisfied. Iteration through the I/O classes is done in the order specified above. No further operations are issued if the aggregate maximum number of concurrent operations has been hit or if there are no operations queued for an I/O class that has not hit its maximum. Every time an I/O is queued or an operation completes, the I/O scheduler looks for new operations to issue. .sp In general, smaller max_active's will lead to lower latency of synchronous operations. Larger max_active's may lead to higher overall throughput, depending on underlying storage. .sp The ratio of the queues' max_actives determines the balance of performance between reads, writes, and scrubs. E.g., increasing \fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete more quickly, but reads and writes to have higher latency and lower throughput. .sp All I/O classes have a fixed maximum number of outstanding operations except for the async write class. Asynchronous writes represent the data that is committed to stable storage during the syncing stage for transaction groups. Transaction groups enter the syncing state periodically so the number of queued async writes will quickly burst up and then bleed down to zero. Rather than servicing them as quickly as possible, the I/O scheduler changes the maximum number of active async write I/Os according to the amount of dirty data in the pool. Since both throughput and latency typically increase with the number of concurrent operations issued to physical devices, reducing the burstiness in the number of concurrent operations also stabilizes the response time of operations from other -- and in particular synchronous -- queues. In broad strokes, the I/O scheduler will issue more concurrent operations from the async write queue as there's more dirty data in the pool. .sp Async Writes .sp The number of concurrent operations issued for the async write I/O class follows a piece-wise linear function defined by a few adjustable points. .nf | o---------| <-- zfs_vdev_async_write_max_active ^ | /^ | | | / | | active | / | | I/O | / | | count | / | | | / | | |-------o | | <-- zfs_vdev_async_write_min_active 0|_______^______|_________| 0% | | 100% of zfs_dirty_data_max | | | `-- zfs_vdev_async_write_active_max_dirty_percent `--------- zfs_vdev_async_write_active_min_dirty_percent .fi Until the amount of dirty data exceeds a minimum percentage of the dirty data allowed in the pool, the I/O scheduler will limit the number of concurrent operations to the minimum. As that threshold is crossed, the number of concurrent operations issued increases linearly to the maximum at the specified maximum percentage of the dirty data allowed in the pool. .sp Ideally, the amount of dirty data on a busy pool will stay in the sloped part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the maximum percentage, this indicates that the rate of incoming data is greater than the rate that the backend storage can handle. 
In this case, we must further throttle incoming writes, as described in the next section. .SH ZFS TRANSACTION DELAY We delay transactions when we've determined that the backend storage isn't able to accommodate the rate of incoming writes. .sp If there is already a transaction waiting, we delay relative to when that transaction will finish waiting. This way the calculated delay time is independent of the number of threads concurrently executing transactions. .sp If we are the only waiter, wait relative to when the transaction started, rather than the current time. This credits the transaction for "time already served", e.g. reading indirect blocks. .sp The minimum time for a transaction to take is calculated as: .nf min_time = zfs_delay_scale * (dirty - min) / (max - dirty) min_time is then capped at 100 milliseconds. .fi .sp The delay has two degrees of freedom that can be adjusted via tunables. The percentage of dirty data at which we start to delay is defined by \fBzfs_delay_min_dirty_percent\fR. This should typically be at or above \fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to delay after writing at full speed has failed to keep up with the incoming write rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking, this variable determines the amount of delay at the midpoint of the curve. .sp .nf delay 10ms +-------------------------------------------------------------*+ | *| 9ms + *+ | *| 8ms + *+ | * | 7ms + * + | * | 6ms + * + | * | 5ms + * + | * | 4ms + * + | * | 3ms + * + | * | 2ms + (midpoint) * + | | ** | 1ms + v *** + | zfs_delay_scale ----------> ******** | 0 +-------------------------------------*********----------------+ 0% <- zfs_dirty_data_max -> 100% .fi .sp Note that since the delay is added to the outstanding time remaining on the most recent transaction, the delay is effectively the inverse of IOPS. Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve was chosen such that small changes in the amount of accumulated dirty data in the first 3/4 of the curve yield relatively small differences in the amount of delay. .sp The effects can be easier to understand when the amount of delay is represented on a log scale: .sp .nf delay 100ms +-------------------------------------------------------------++ + + | | + *+ 10ms + *+ + ** + | (midpoint) ** | + | ** + 1ms + v **** + + zfs_delay_scale ----------> ***** + | **** | + **** + 100us + ** + + * + | * | + * + 10us + * + + + | | + + +--------------------------------------------------------------+ 0% <- zfs_dirty_data_max -> 100% .fi .sp Note here that only as the amount of dirty data approaches its limit does the delay start to increase rapidly. The goal of a properly tuned system should be to keep the amount of dirty data out of that range by first ensuring that the appropriate limits are set for the I/O scheduler to reach optimal throughput on the backend storage, and then by changing the value of \fBzfs_delay_scale\fR to increase the steepness of the curve. diff --git a/man/man5/zpool-features.5 b/man/man5/zpool-features.5 index f65ef40a7e79..36c4343a1388 100644 --- a/man/man5/zpool-features.5 +++ b/man/man5/zpool-features.5 @@ -1,985 +1,985 @@ '\" te .\" Copyright (c) 2012, 2018 by Delphix. All rights reserved. .\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved. .\" Copyright (c) 2014, Joyent, Inc. All rights reserved. 
.\" The contents of this file are subject to the terms of the Common Development .\" and Distribution License (the "License"). You may not use this file except .\" in compliance with the License. You can obtain a copy of the license at .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. .\" .\" See the License for the specific language governing permissions and .\" limitations under the License. When distributing Covered Code, include this .\" CDDL HEADER in each file and include the License file at .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your .\" own identifying information: .\" Portions Copyright [yyyy] [name of copyright owner] .\" Copyright (c) 2019, Klara Inc. .\" Copyright (c) 2019, Allan Jude -.TH ZPOOL-FEATURES 5 "Jun 8, 2018" +.TH ZPOOL-FEATURES 5 "Aug 24, 2020" OpenZFS .SH NAME zpool\-features \- ZFS pool feature descriptions .SH DESCRIPTION .sp .LP ZFS pool on\-disk format versions are specified via "features" which replace the old on\-disk format numbers (the last supported on\-disk format number is 28). To enable a feature on a pool use the \fBupgrade\fR subcommand of the zpool(8) command, or set the \fBfeature@\fR\fIfeature_name\fR property to \fBenabled\fR. .sp .LP The pool format does not affect file system version compatibility or the ability to send file systems between pools. .sp .LP Since most features can be enabled independently of each other the on\-disk format of the pool is specified by the set of all features marked as \fBactive\fR on the pool. If the pool was created by another software version this set may include unsupported features. .SS "Identifying features" .sp .LP Every feature has a GUID of the form \fIcom.example:feature_name\fR. The reversed DNS name ensures that the feature's GUID is unique across all ZFS implementations. When unsupported features are encountered on a pool they will be identified by their GUIDs. Refer to the documentation for the ZFS implementation that created the pool for information about those features. .sp .LP Each supported feature also has a short name. By convention a feature's short name is the portion of its GUID which follows the ':' (e.g. \fIcom.example:feature_name\fR would have the short name \fIfeature_name\fR), however a feature's short name may differ across ZFS implementations if following the convention would result in name conflicts. .SS "Feature states" .sp .LP Features can be in one of three states: .sp .ne 2 .na \fBactive\fR .ad .RS 12n This feature's on\-disk format changes are in effect on the pool. Support for this feature is required to import the pool in read\-write mode. If this feature is not read-only compatible, support is also required to import the pool in read\-only mode (see "Read\-only compatibility"). .RE .sp .ne 2 .na \fBenabled\fR .ad .RS 12n An administrator has marked this feature as enabled on the pool, but the feature's on\-disk format changes have not been made yet. The pool can still be imported by software that does not support this feature, but changes may be made to the on\-disk format at any time which will move the feature to the \fBactive\fR state. Some features may support returning to the \fBenabled\fR state after becoming \fBactive\fR. See feature\-specific documentation for details. 
.RE .sp .ne 2 .na \fBdisabled\fR .ad .RS 12n This feature's on\-disk format changes have not been made and will not be made unless an administrator moves the feature to the \fBenabled\fR state. Features cannot be disabled once they have been enabled. .RE .sp .LP The state of supported features is exposed through pool properties of the form \fIfeature@short_name\fR. .SS "Read\-only compatibility" .sp .LP Some features may make on\-disk format changes that do not interfere with other software's ability to read from the pool. These features are referred to as "read\-only compatible". If all unsupported features on a pool are read\-only compatible, the pool can be imported in read\-only mode by setting the \fBreadonly\fR property during import (see zpool(8) for details on importing pools). .SS "Unsupported features" .sp .LP For each unsupported feature enabled on an imported pool a pool property named \fIunsupported@feature_name\fR will indicate why the import was allowed despite the unsupported feature. Possible values for this property are: .sp .ne 2 .na \fBinactive\fR .ad .RS 12n The feature is in the \fBenabled\fR state and therefore the pool's on\-disk format is still compatible with software that does not support this feature. .RE .sp .ne 2 .na \fBreadonly\fR .ad .RS 12n The feature is read\-only compatible and the pool has been imported in read\-only mode. .RE .SS "Feature dependencies" .sp .LP Some features depend on other features being enabled in order to function properly. Enabling a feature will automatically enable any features it depends on. .SH FEATURES .sp .LP The following features are supported on this system: .sp .ne 2 .na \fBallocation_classes\fR .ad .RS 4n .TS l l . GUID org.zfsonlinux:allocation_classes READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature enables support for separate allocation classes. This feature becomes \fBactive\fR when a dedicated allocation class vdev (dedup or special) is created with the \fBzpool create\fR or \fBzpool add\fR subcommands. With device removal, it can be returned to the \fBenabled\fR state if all the dedicated allocation class vdevs are removed. .RE .sp .ne 2 .na \fBasync_destroy\fR .ad .RS 4n .TS l l . GUID com.delphix:async_destroy READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE Destroying a file system requires traversing all of its data in order to return its used space to the pool. Without \fBasync_destroy\fR the file system is not fully removed until all space has been reclaimed. If the destroy operation is interrupted by a reboot or power outage the next attempt to open the pool will need to complete the destroy operation synchronously. When \fBasync_destroy\fR is enabled the file system's data will be reclaimed by a background process, allowing the destroy operation to complete without traversing the entire file system. The background process is able to resume interrupted destroys after the pool has been opened, eliminating the need to finish interrupted destroys as part of the open operation. The amount of space remaining to be reclaimed by the background process is available through the \fBfreeing\fR property. This feature is only \fBactive\fR while \fBfreeing\fR is non\-zero. .RE .sp .ne 2 .na \fBbookmarks\fR .ad .RS 4n .TS l l . GUID com.delphix:bookmarks READ\-ONLY COMPATIBLE yes DEPENDENCIES extensible_dataset .TE This feature enables use of the \fBzfs bookmark\fR subcommand. This feature is \fBactive\fR while any bookmarks exist in the pool. 
All bookmarks in the pool can be listed by running \fBzfs list -t bookmark -r \fIpoolname\fR\fR. .RE .sp .ne 2 .na \fBbookmark_v2\fR .ad .RS 4n .TS l l . GUID com.datto:bookmark_v2 READ\-ONLY COMPATIBLE no DEPENDENCIES bookmark, extensible_dataset .TE This feature enables the creation and management of larger bookmarks which are needed for other features in ZFS. This feature becomes \fBactive\fR when a v2 bookmark is created and will be returned to the \fBenabled\fR state when all v2 bookmarks are destroyed. .RE .sp .ne 2 .na \fBbookmark_written\fR .ad .RS 4n .TS l l . GUID com.delphix:bookmark_written READ\-ONLY COMPATIBLE no DEPENDENCIES bookmark, extensible_dataset, bookmark_v2 .TE This feature enables additional bookmark accounting fields, enabling the written# property (space written since a bookmark) and estimates of send stream sizes for incrementals from bookmarks. This feature becomes \fBactive\fR when a bookmark is created and will be returned to the \fBenabled\fR state when all bookmarks with these fields are destroyed. .RE .sp .ne 2 .na \fBdevice_rebuild\fR .ad .RS 4n .TS l l . GUID org.openzfs:device_rebuild READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature enables the ability for the \fBzpool attach\fR and \fBzpool replace\fR subcommands to perform sequential reconstruction (instead of healing reconstruction) when resilvering. Sequential reconstruction resilvers a device in LBA order without immediately verifying the checksums. Once complete a scrub is started which then verifies the checksums. This approach allows full redundancy to be restored to the pool in the minimum amount of time. This two phase approach will take longer than a healing resilver when the time to verify the checksums is included. However, unless there is additional pool damage no checksum errors should be reported by the scrub. This feature is incompatible with raidz configurations. This feature becomes \fBactive\fR while a sequential resilver is in progress, and returns to \fBenabled\fR when the resilver completes. .RE .sp .ne 2 .na \fBdevice_removal\fR .ad .RS 4n .TS l l . GUID com.delphix:device_removal READ\-ONLY COMPATIBLE no DEPENDENCIES none .TE This feature enables the \fBzpool remove\fR subcommand to remove top-level vdevs, evacuating them to reduce the total size of the pool. This feature becomes \fBactive\fR when the \fBzpool remove\fR subcommand is used on a top-level vdev, and will never return to being \fBenabled\fR. .RE .sp .ne 2 .na \fBedonr\fR .ad .RS 4n .TS l l . GUID org.illumos:edonr READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE This feature enables the use of the Edon-R hash algorithm for checksum, including for nopwrite (if compression is also enabled, an overwrite of a block whose checksum matches the data being written will be ignored). In an abundance of caution, Edon-R requires verification when used with dedup: \fBzfs set dedup=edonr,verify\fR. See \fBzfs\fR(8). Edon-R is a very high-performance hash algorithm that was part of the NIST SHA-3 competition. It provides extremely high hash performance (over 350% faster than SHA-256), but was not selected because of its unsuitability as a general purpose secure hash algorithm. This implementation utilizes the new salted checksumming functionality in ZFS, which means that the checksum is pre-seeded with a secret 256-bit random key (stored on the pool) before being fed the data block to be checksummed. Thus the produced checksums are unique to a given pool. 
When the \fBedonr\fR feature is set to \fBenabled\fR, the administrator can turn on the \fBedonr\fR checksum on any dataset using the \fBzfs set checksum=edonr\fR. See zfs(8). This feature becomes \fBactive\fR once a \fBchecksum\fR property has been set to \fBedonr\fR, and will return to being \fBenabled\fR once all filesystems that have ever had their checksum set to \fBedonr\fR are destroyed. FreeBSD does not support the \fBedonr\fR feature. .RE .sp .ne 2 .na \fBembedded_data\fR .ad .RS 4n .TS l l . GUID com.delphix:embedded_data READ\-ONLY COMPATIBLE no DEPENDENCIES none .TE This feature improves the performance and compression ratio of highly-compressible blocks. Blocks whose contents can compress to 112 bytes or smaller can take advantage of this feature. When this feature is enabled, the contents of highly-compressible blocks are stored in the block "pointer" itself (a misnomer in this case, as it contains the compressed data, rather than a pointer to its location on disk). Thus the space of the block (one sector, typically 512 bytes or 4KB) is saved, and no additional i/o is needed to read and write the data block. This feature becomes \fBactive\fR as soon as it is enabled and will never return to being \fBenabled\fR. .RE .sp .ne 2 .na \fBempty_bpobj\fR .ad .RS 4n .TS l l . GUID com.delphix:empty_bpobj READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature increases the performance of creating and using a large number of snapshots of a single filesystem or volume, and also reduces the disk space required. When there are many snapshots, each snapshot uses many Block Pointer Objects (bpobj's) to track blocks associated with that snapshot. However, in common use cases, most of these bpobj's are empty. This feature allows us to create each bpobj on-demand, thus eliminating the empty bpobjs. This feature is \fBactive\fR while there are any filesystems, volumes, or snapshots which were created after enabling this feature. .RE .sp .ne 2 .na \fBenabled_txg\fR .ad .RS 4n .TS l l . GUID com.delphix:enabled_txg READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE Once this feature is enabled ZFS records the transaction group number in which new features are enabled. This has no user-visible impact, but other features may depend on this feature. This feature becomes \fBactive\fR as soon as it is enabled and will never return to being \fBenabled\fB. .RE .sp .ne 2 .na \fBencryption\fR .ad .RS 4n .TS l l . GUID com.datto:encryption READ\-ONLY COMPATIBLE no DEPENDENCIES bookmark_v2, extensible_dataset .TE This feature enables the creation and management of natively encrypted datasets. This feature becomes \fBactive\fR when an encrypted dataset is created and will be returned to the \fBenabled\fR state when all datasets that use this feature are destroyed. .RE .sp .ne 2 .na \fBextensible_dataset\fR .ad .RS 4n .TS l l . GUID com.delphix:extensible_dataset READ\-ONLY COMPATIBLE no DEPENDENCIES none .TE This feature allows more flexible use of internal ZFS data structures, and exists for other features to depend on. This feature will be \fBactive\fR when the first dependent feature uses it, and will be returned to the \fBenabled\fR state when all datasets that use this feature are destroyed. .RE .sp .ne 2 .na \fBfilesystem_limits\fR .ad .RS 4n .TS l l . GUID com.joyent:filesystem_limits READ\-ONLY COMPATIBLE yes DEPENDENCIES extensible_dataset .TE This feature enables filesystem and snapshot limits. 
These limits can be used to control how many filesystems and/or snapshots can be created at the point in the tree on which the limits are set. This feature is \fBactive\fR once either of the limit properties has been set on a dataset. Once activated the feature is never deactivated. .RE .sp .ne 2 .na \fBhole_birth\fR .ad .RS 4n .TS l l . GUID com.delphix:hole_birth READ\-ONLY COMPATIBLE no DEPENDENCIES enabled_txg .TE This feature has/had bugs, the result of which is that, if you do a \fBzfs send -i\fR (or \fB-R\fR, since it uses \fB-i\fR) from an affected dataset, the receiver will not see any checksum or other errors, but the resulting destination snapshot will not match the source. Its use by \fBzfs send -i\fR has been disabled by default. See the \fBsend_holes_without_birth_time\fR module parameter in zfs-module-parameters(5). This feature improves performance of incremental sends (\fBzfs send -i\fR) and receives for objects with many holes. The most common case of hole-filled objects is zvols. An incremental send stream from snapshot \fBA\fR to snapshot \fBB\fR contains information about every block that changed between \fBA\fR and \fBB\fR. Blocks which did not change between those snapshots can be identified and omitted from the stream using a piece of metadata called the 'block birth time', but birth times are not recorded for holes (blocks filled only with zeroes). Since holes created after \fBA\fR cannot be distinguished from holes created before \fBA\fR, information about every hole in the entire filesystem or zvol is included in the send stream. For workloads where holes are rare this is not a problem. However, when incrementally replicating filesystems or zvols with many holes (for example a zvol formatted with another filesystem) a lot of time will be spent sending and receiving unnecessary information about holes that already exist on the receiving side. Once the \fBhole_birth\fR feature has been enabled the block birth times of all new holes will be recorded. Incremental sends between snapshots created after this feature is enabled will use this new metadata to avoid sending information about holes that already exist on the receiving side. This feature becomes \fBactive\fR as soon as it is enabled and will never return to being \fBenabled\fB. .RE .sp .ne 2 .na \fBlarge_blocks\fR .ad .RS 4n .TS l l . GUID org.open-zfs:large_blocks READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE The \fBlarge_block\fR feature allows the record size on a dataset to be set larger than 128KB. This feature becomes \fBactive\fR once a dataset contains a file with a block size larger than 128KB, and will return to being \fBenabled\fR once all filesystems that have ever had their recordsize larger than 128KB are destroyed. .RE .sp .ne 2 .na \fBlarge_dnode\fR .ad .RS 4n .TS l l . GUID org.zfsonlinux:large_dnode READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE The \fBlarge_dnode\fR feature allows the size of dnodes in a dataset to be set larger than 512B. This feature becomes \fBactive\fR once a dataset contains an object with a dnode larger than 512B, which occurs as a result of setting the \fBdnodesize\fR dataset property to a value other than \fBlegacy\fR. The feature will return to being \fBenabled\fR once all filesystems that have ever contained a dnode larger than 512B are destroyed. Large dnodes allow more data to be stored in the bonus buffer, thus potentially improving performance by avoiding the use of spill blocks. 
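.sp
As a hedged illustration (the dataset name \fItank/fs\fR is hypothetical),
the feature is normally activated indirectly by setting the
\fBdnodesize\fR property once the feature itself is enabled on the pool;
only objects created afterwards use larger dnodes:
.sp
.nf
	# zpool set feature@large_dnode=enabled tank
	# zfs set dnodesize=auto tank/fs
.fi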
.RE .sp .ne 2 .na \fB\fBlivelist\fR\fR .ad .RS 4n .TS l l . GUID com.delphix:livelist READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature allows clones to be deleted faster than the traditional method when a large number of random/sparse writes have been made to the clone. All blocks allocated and freed after a clone is created are tracked by the clone's livelist which is referenced during the deletion of the clone. The feature is activated when a clone is created and remains active until all clones have been destroyed. .RE .sp .ne 2 .na \fBlog_spacemap\fR .ad .RS 4n .TS l l . GUID com.delphix:log_spacemap READ\-ONLY COMPATIBLE yes DEPENDENCIES com.delphix:spacemap_v2 .TE This feature improves performance for heavily-fragmented pools, especially when workloads are heavy in random-writes. It does so by logging all the metaslab changes on a single spacemap every TXG instead of scattering multiple writes to all the metaslab spacemaps. This feature becomes \fBactive\fR as soon as it is enabled and will never return to being \fBenabled\fR. .RE .sp .ne 2 .na \fBlz4_compress\fR .ad .RS 4n .TS l l . GUID org.illumos:lz4_compress READ\-ONLY COMPATIBLE no DEPENDENCIES none .TE \fBlz4\fR is a high-performance real-time compression algorithm that features significantly faster compression and decompression as well as a higher compression ratio than the older \fBlzjb\fR compression. Typically, \fBlz4\fR compression is approximately 50% faster on compressible data and 200% faster on incompressible data than \fBlzjb\fR. It is also approximately 80% faster on decompression, while giving approximately 10% better compression ratio. When the \fBlz4_compress\fR feature is set to \fBenabled\fR, the administrator can turn on \fBlz4\fR compression on any dataset on the pool using the zfs(8) command. Please note that doing so will immediately activate the \fBlz4_compress\fR feature on the underlying pool. Also, all newly written metadata will be compressed with the \fBlz4\fR algorithm. Since this feature is not read-only compatible, this operation will render the pool unimportable on systems without support for the \fBlz4_compress\fR feature. Booting off of \fBlz4\fR-compressed root pools is supported. This feature becomes \fBactive\fR as soon as it is enabled and will never return to being \fBenabled\fR. .RE .sp .ne 2 .na \fBmulti_vdev_crash_dump\fR .ad .RS 4n .TS l l . GUID com.joyent:multi_vdev_crash_dump READ\-ONLY COMPATIBLE no DEPENDENCIES none .TE This feature allows a dump device to be configured with a pool comprised of multiple vdevs. Those vdevs may be arranged in any mirrored or raidz configuration. When the \fBmulti_vdev_crash_dump\fR feature is set to \fBenabled\fR, the administrator can use the \fBdumpadm\fR(1M) command to configure a dump device on a pool comprised of multiple vdevs. Under Linux this feature is registered for compatibility but not used. New pools created under Linux will have the feature \fBenabled\fR but will never transition to \fBactive\fR. This functionality is not required in order to support crash dumps under Linux. Existing pools where this feature is \fBactive\fR can be imported. .RE .sp .ne 2 .na \fBobsolete_counts\fR .ad .RS 4n .TS l l . GUID com.delphix:obsolete_counts READ\-ONLY COMPATIBLE yes DEPENDENCIES device_removal .TE This feature is an enhancement of device_removal, which will over time reduce the memory used to track removed devices.
When indirect blocks are freed or remapped, we note that their part of the indirect mapping is "obsolete", i.e. no longer needed. This feature becomes \fBactive\fR when the \fBzpool remove\fR subcommand is used on a top-level vdev, and will never return to being \fBenabled\fR. .RE .sp .ne 2 .na \fBproject_quota\fR .ad .RS 4n .TS l l . GUID org.zfsonlinux:project_quota READ\-ONLY COMPATIBLE yes DEPENDENCIES extensible_dataset .TE This feature allows administrators to account the spaces and objects usage information against the project identifier (ID). The project ID is new object-based attribute. When upgrading an existing filesystem, object without project ID attribute will be assigned a zero project ID. After this feature is enabled, newly created object will inherit its parent directory's project ID if the parent inherit flag is set (via \fBchattr +/-P\fR or \fBzfs project [-s|-C]\fR). Otherwise, the new object's project ID will be set as zero. An object's project ID can be changed at anytime by the owner (or privileged user) via \fBchattr -p $prjid\fR or \fBzfs project -p $prjid\fR. This feature will become \fBactive\fR as soon as it is enabled and will never return to being \fBdisabled\fR. Each filesystem will be upgraded automatically when remounted or when new file is created under that filesystem. The upgrade can also be triggered on filesystems via `zfs set version=current `. The upgrade process runs in the background and may take a while to complete for the filesystems containing a large number of files. .RE .sp .ne 2 .na \fB\fBredaction_bookmarks\fR\fR .ad .RS 4n .TS l l . GUID com.delphix:redaction_bookmarks READ\-ONLY COMPATIBLE no DEPENDENCIES bookmarks, extensible_dataset .TE This feature enables the use of the redacted zfs send. Redacted \fBzfs send\fR creates redaction bookmarks, which store the list of blocks redacted by the send that created them. For more information about redacted send, see \fBzfs\fR(8). .RE .sp .ne 2 .na \fB\fBredacted_datasets\fR\fR .ad .RS 4n .TS l l . GUID com.delphix:redacted_datasets READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE This feature enables the receiving of redacted zfs send streams. Redacted zfs send streams create redacted datasets when received. These datasets are missing some of their blocks, and so cannot be safely mounted, and their contents cannot be safely read. For more information about redacted receive, see \fBzfs\fR(8). .RE .sp .ne 2 .na \fBresilver_defer\fR .ad .RS 4n .TS l l . GUID com.datto:resilver_defer READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature allows zfs to postpone new resilvers if an existing one is already in progress. Without this feature, any new resilvers will cause the currently running one to be immediately restarted from the beginning. This feature becomes \fBactive\fR once a resilver has been deferred, and returns to being \fBenabled\fR when the deferred resilver begins. .RE .sp .ne 2 .na \fBsha512\fR .ad .RS 4n .TS l l . GUID org.illumos:sha512 READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE This feature enables the use of the SHA-512/256 truncated hash algorithm (FIPS 180-4) for checksum and dedup. The native 64-bit arithmetic of SHA-512 provides an approximate 50% performance boost over SHA-256 on 64-bit hardware and is thus a good minimum-change replacement candidate for systems where hash performance is important, but these systems cannot for whatever reason utilize the faster \fBskein\fR and \fBedonr\fR algorithms. 
When the \fBsha512\fR feature is set to \fBenabled\fR, the administrator can turn on the \fBsha512\fR checksum on any dataset using \fBzfs set checksum=sha512\fR. See zfs(8). This feature becomes \fBactive\fR once a \fBchecksum\fR property has been set to \fBsha512\fR, and will return to being \fBenabled\fR once all filesystems that have ever had their checksum set to \fBsha512\fR are destroyed. .RE .sp .ne 2 .na \fBskein\fR .ad .RS 4n .TS l l . GUID org.illumos:skein READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE This feature enables the use of the Skein hash algorithm for checksum and dedup. Skein is a high-performance secure hash algorithm that was a finalist in the NIST SHA-3 competition. It provides a very high security margin and high performance on 64-bit hardware (80% faster than SHA-256). This implementation also utilizes the new salted checksumming functionality in ZFS, which means that the checksum is pre-seeded with a secret 256-bit random key (stored on the pool) before being fed the data block to be checksummed. Thus the produced checksums are unique to a given pool, preventing hash collision attacks on systems with dedup. When the \fBskein\fR feature is set to \fBenabled\fR, the administrator can turn on the \fBskein\fR checksum on any dataset using \fBzfs set checksum=skein\fR. See zfs(8). This feature becomes \fBactive\fR once a \fBchecksum\fR property has been set to \fBskein\fR, and will return to being \fBenabled\fR once all filesystems that have ever had their checksum set to \fBskein\fR are destroyed. .RE .sp .ne 2 .na \fBspacemap_histogram\fR .ad .RS 4n .TS l l . GUID com.delphix:spacemap_histogram READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature allows ZFS to maintain more information about how free space is organized within the pool. If this feature is \fBenabled\fR, ZFS will set this feature to \fBactive\fR when a new space map object is created or an existing space map is upgraded to the new format. Once the feature is \fBactive\fR, it will remain in that state until the pool is destroyed. .RE .sp .ne 2 .na \fBspacemap_v2\fR .ad .RS 4n .TS l l . GUID com.delphix:spacemap_v2 READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature enables the use of the new space map encoding which consists of two words (instead of one) whenever it is advantageous. The new encoding allows space maps to represent large regions of space more efficiently on-disk while also increasing their maximum addressable offset. This feature becomes \fBactive\fR once it is \fBenabled\fR, and never returns to being \fBenabled\fR. .RE .sp .ne 2 .na \fBuserobj_accounting\fR .ad .RS 4n .TS l l . GUID org.zfsonlinux:userobj_accounting READ\-ONLY COMPATIBLE yes DEPENDENCIES extensible_dataset .TE This feature allows administrators to account object usage information by user and group. This feature becomes \fBactive\fR as soon as it is enabled and will never return to being \fBenabled\fR. Each filesystem will be upgraded automatically when remounted, or when new files are created under that filesystem. The upgrade can also be started manually on a filesystem via `zfs set version=current`. The upgrade process runs in the background and may take a while to complete for filesystems containing a large number of files. .RE .sp .ne 2 .na \fBzpool_checkpoint\fR .ad .RS 4n .TS l l .
GUID com.delphix:zpool_checkpoint READ\-ONLY COMPATIBLE yes DEPENDENCIES none .TE This feature enables the \fBzpool checkpoint\fR subcommand that can checkpoint the state of the pool at the time it was issued and later rewind to it or discard it. This feature becomes \fBactive\fR when the \fBzpool checkpoint\fR subcommand is used to checkpoint the pool. The feature will only return to being \fBenabled\fR when the pool is rewound or the checkpoint has been discarded. .RE .sp .ne 2 .na \fBzstd_compress\fR .ad .RS 4n .TS l l . GUID org.freebsd:zstd_compress READ\-ONLY COMPATIBLE no DEPENDENCIES extensible_dataset .TE \fBzstd\fR is a high-performance compression algorithm that features a combination of high compression ratios and high speed. Compared to \fBgzip\fR, \fBzstd\fR offers slightly better compression at much higher speeds. Compared to \fBlz4\fR, \fBzstd\fR offers much better compression while being only modestly slower. Typically, \fBzstd\fR compression speed ranges from 250 to 500 MB/s per thread and decompression speed is over 1 GB/s per thread. When the \fBzstd\fR feature is set to \fBenabled\fR, the administrator can turn on \fBzstd\fR compression on any dataset by running `zfs set compress=zstd` on that dataset. This feature becomes \fBactive\fR once a \fBcompress\fR property has been set to \fBzstd\fR, and will return to being \fBenabled\fR once all filesystems that have ever had their compress property set to \fBzstd\fR are destroyed. Booting off of \fBzstd\fR-compressed root pools is not yet supported. .RE .SH "SEE ALSO" zpool(8) diff --git a/man/man8/fsck.zfs.8 b/man/man8/fsck.zfs.8 index 80e7a1ebb35f..f681c2502ebe 100644 --- a/man/man8/fsck.zfs.8 +++ b/man/man8/fsck.zfs.8 @@ -1,67 +1,67 @@ '\" t .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright 2013 Darik Horn . All rights reserved. .\" -.TH fsck.zfs 8 "2013 MAR 16" "ZFS on Linux" "System Administration Commands" +.TH FSCK.ZFS 8 "Aug 24, 2020" OpenZFS .SH NAME fsck.zfs \- Dummy ZFS filesystem checker. .SH SYNOPSIS .LP .BI "fsck.zfs [" "options" "] <" "dataset" ">" .SH DESCRIPTION .LP \fBfsck.zfs\fR is a shell stub that does nothing and always returns true. It is installed by ZoL because some Linux distributions expect a fsck helper for all filesystems. .SH OPTIONS .HP All \fIoptions\fR and the \fIdataset\fR are ignored. .SH "NOTES" .LP ZFS datasets are checked by running \fBzpool scrub\fR on the containing pool. An individual ZFS dataset is never checked independently of its pool, which is unlike a regular filesystem. .SH "BUGS" .LP On some systems, if the \fIdataset\fR is in a degraded pool, then it might be appropriate for \fBfsck.zfs\fR to return exit code 4 to indicate an uncorrected filesystem error.
.LP Similarly, if the \fIdataset\fR is in a faulted pool and has a legacy /etc/fstab record, then \fBfsck.zfs\fR should return exit code 8 to indicate a fatal operational error. .SH "AUTHORS" .LP Darik Horn . .SH "SEE ALSO" .BR fsck (8), .BR fstab (5), .BR zpool-scrub (8) diff --git a/man/man8/mount.zfs.8 b/man/man8/mount.zfs.8 index 4b71367e23e9..016a909c26a0 100644 --- a/man/man8/mount.zfs.8 +++ b/man/man8/mount.zfs.8 @@ -1,144 +1,144 @@ '\" t .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright 2013 Darik Horn . All rights reserved. .\" -.TH mount.zfs 8 "2013 FEB 28" "ZFS on Linux" "System Administration Commands" +.TH MOUNT.ZFS 8 "Aug 24, 2020" OpenZFS .SH NAME mount.zfs \- mount a ZFS filesystem .SH SYNOPSIS .LP .BI "mount.zfs [\-sfnvh] [\-o " options "]" " dataset mountpoint" .SH DESCRIPTION .BR mount.zfs is part of the zfsutils package for Linux. It is a helper program that is usually invoked by the .BR mount (8) or .BR zfs (8) commands to mount a ZFS dataset. All .I options are handled according to the FILESYSTEM INDEPENDENT MOUNT OPTIONS section in the .BR mount (8) manual, except for those described below. The .I dataset parameter is a ZFS filesystem name, as output by the .B "zfs list -H -o name command. This parameter never has a leading slash character and is not a device name. The .I mountpoint parameter is the path name of a directory. .SH OPTIONS .TP .BI "\-s" Ignore bad or sloppy mount options. .TP .BI "\-f" Do a fake mount; do not perform the mount operation. .TP .BI "\-n" Do not update the /etc/mtab file. .TP .BI "\-v" Increase verbosity. .TP .BI "\-h" Print the usage message. .TP .BI "\-o context" This flag sets the SELinux context for all files in the filesystem under that mountpoint. .TP .BI "\-o fscontext" This flag sets the SELinux context for the filesystem being mounted. .TP .BI "\-o defcontext" This flag sets the SELinux context for unlabeled files. .TP .BI "\-o rootcontext" This flag sets the SELinux context for the root inode of the filesystem. .TP .BI "\-o legacy" This private flag indicates that the .I dataset has an entry in the /etc/fstab file. .TP .BI "\-o noxattr" This private flag disables extended attributes. .TP .BI "\-o xattr" This private flag enables directory-based extended attributes and, if appropriate, adds a ZFS context to the SELinux system policy. .TP .BI "\-o saxattr" This private flag enables system attribute-based extended attributes and, if appropriate, adds a ZFS context to the SELinux system policy. .TP .BI "\-o dirxattr" Equivalent to .BR xattr . .TP .BI "\-o zfsutil" This private flag indicates that .BR mount (8) is being called by the .BR zfs (8) command.
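.SH EXAMPLES
.LP
Direct invocation is rarely needed, since
.BR mount (8)
and
.BR zfs (8)
normally call this helper themselves. A minimal sketch of a direct call
follows; the dataset and mountpoint names are placeholders, not defaults:
.LP
.nf
# Mount a legacy-mountpoint dataset read-only (illustrative names only)
mount.zfs -o ro tank/backup /mnt/backup
.fi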
.SH NOTES ZFS conventionally requires that the .I mountpoint be an empty directory, but the Linux implementation inconsistently enforces the requirement. The .BR mount.zfs helper does not mount the contents of zvols. .SH FILES .TP 18n .I /etc/fstab The static filesystem table. .TP .I /etc/mtab The mounted filesystem table. .SH "AUTHORS" The primary author of .BR mount.zfs is Brian Behlendorf . This man page was written by Darik Horn . .SH "SEE ALSO" .BR fstab (5), .BR mount (8), .BR zfs (8) diff --git a/man/man8/vdev_id.8 b/man/man8/vdev_id.8 index 70956c634f03..6de3d18fe575 100644 --- a/man/man8/vdev_id.8 +++ b/man/man8/vdev_id.8 @@ -1,77 +1,77 @@ -.TH vdev_id 8 +.TH VDEV_ID 8 "Aug 24, 2020" OpenZFS .SH NAME vdev_id \- generate user-friendly names for JBOD disks .SH SYNOPSIS .LP .nf \fBvdev_id\fR <-d dev> [-c config_file] [-g sas_direct|sas_switch] [-m] [-p phys_per_port] \fBvdev_id\fR -h .fi .SH DESCRIPTION The \fBvdev_id\fR command is a udev helper which parses the file .BR /etc/zfs/vdev_id.conf (5) to map a physical path in a storage topology to a channel name. The channel name is combined with a disk enclosure slot number to create an alias that reflects the physical location of the drive. This is particularly helpful when replacing failed drives. Slot numbers may also be re-mapped in case the default numbering is unsatisfactory. The drive aliases will be created as symbolic links in /dev/disk/by-vdev. The currently supported topologies are sas_direct and sas_switch. A multipath mode is supported in which dm-mpath devices are handled by examining the first-listed running component disk as reported by the .BR multipath (8) command. In multipath mode the configuration file should contain a channel definition with the same name for each path to a given enclosure. .BR vdev_id also supports creating aliases based on existing udev links in the /dev hierarchy using the \fIalias\fR configuration file keyword. See the .BR vdev_id.conf (5) man page for details. .SH OPTIONS .TP \fB\-c\fR Specifies the path to an alternate configuration file. The default is /etc/zfs/vdev_id.conf. .TP \fB\-d\fR This is the only mandatory argument. Specifies the name of a device in /dev, e.g. "sda". .TP \fB\-g\fR Identifies a physical topology that governs how physical paths are mapped to channels. \fIsas_direct\fR - in this mode a channel is uniquely identified by a PCI slot and an HBA port number \fIsas_switch\fR - in this mode a channel is uniquely identified by a SAS switch port number .TP \fB\-m\fR Specifies that .BR vdev_id (8) will handle only dm-multipath devices. It will examine the first running component disk of a dm-multipath device as listed by the .BR multipath (8) command to determine the physical path. .TP \fB\-p\fR Specifies the number of PHY devices associated with a SAS HBA port or SAS switch port. .BR vdev_id (8) internally uses this value to determine which HBA or switch port a device is connected to. The default is 4. .TP \fB\-h\fR Print a usage summary. .SH SEE ALSO .LP \fBvdev_id.conf\fR(5) diff --git a/man/man8/zed.8.in b/man/man8/zed.8.in index 2ca3935724d7..9d494d5e8ff4 100644 --- a/man/man8/zed.8.in +++ b/man/man8/zed.8.in @@ -1,268 +1,268 @@ .\" .\" This file is part of the ZFS Event Daemon (ZED) .\" for ZFS on Linux (ZoL) . .\" Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). .\" Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
.\" Refer to the ZoL git commit log for authoritative copyright attribution. .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License Version 1.0 (CDDL-1.0). .\" You can obtain a copy of the license from the top-level file .\" "OPENSOLARIS.LICENSE" or at . .\" You may not use this file except in compliance with the license. .\" -.TH ZED 8 "Octember 1, 2013" "ZFS on Linux" "System Administration Commands" +.TH ZED 8 "Aug 24, 2020" OpenZFS .SH NAME ZED \- ZFS Event Daemon .SH SYNOPSIS .HP .B zed .\" [\fB\-c\fR \fIconfigfile\fR] [\fB\-d\fR \fIzedletdir\fR] [\fB\-f\fR] [\fB\-F\fR] [\fB\-h\fR] [\fB\-I\fR] [\fB\-L\fR] [\fB\-M\fR] [\fB\-p\fR \fIpidfile\fR] [\fB\-P\fR \fIpath\fR] [\fB\-s\fR \fIstatefile\fR] [\fB\-v\fR] [\fB\-V\fR] [\fB\-Z\fR] .SH DESCRIPTION .PP \fBZED\fR (ZFS Event Daemon) monitors events generated by the ZFS kernel module. When a zevent (ZFS Event) is posted, \fBZED\fR will run any ZEDLETs (ZFS Event Daemon Linkage for Executable Tasks) that have been enabled for the corresponding zevent class. .SH OPTIONS .TP .BI \-h Display a summary of the command-line options. .TP .BI \-L Display license information. .TP .BI \-V Display version information. .TP .BI \-v Be verbose. .TP .BI \-f Force the daemon to run if at all possible, disabling security checks and throwing caution to the wind. Not recommended for use in production. .TP .BI \-F Run the daemon in the foreground. .TP .BI \-M Lock all current and future pages in the virtual memory address space. This may help the daemon remain responsive when the system is under heavy memory pressure. .TP .BI \-I Request that the daemon idle rather than exit when the kernel modules are not loaded. Processing of events will start, or resume, when the kernel modules are (re)loaded. Under Linux the kernel modules cannot be unloaded while the daemon is running. .TP .BI \-Z Zero the daemon's state, thereby allowing zevents still within the kernel to be reprocessed. .\" .TP .\" .BI \-c\ configfile .\" Read the configuration from the specified file. .TP .BI \-d\ zedletdir Read the enabled ZEDLETs from the specified directory. .TP .BI \-p\ pidfile Write the daemon's process ID to the specified file. .TP .BI \-P\ path Custom $PATH for zedlets to use. Normally zedlets run in a locked-down environment, with hardcoded paths to the ZFS commands ($ZFS, $ZPOOL, $ZED, ...), and a hardcoded $PATH. This is done for security reasons. However, the ZFS test suite uses a custom PATH for its ZFS commands, and passes it to zed with -P. In short, -P is only to be used by the ZFS test suite; never use it in production! .TP .BI \-s\ statefile Write the daemon's state to the specified file. .SH ZEVENTS .PP A zevent is comprised of a list of nvpairs (name/value pairs). Each zevent contains an EID (Event IDentifier) that uniquely identifies it throughout the lifetime of the loaded ZFS kernel module; this EID is a monotonically increasing integer that resets to 1 each time the kernel module is loaded. Each zevent also contains a class string that identifies the type of event. For brevity, a subclass string is defined that omits the leading components of the class string. Additional nvpairs exist to provide event details. .PP The kernel maintains a list of recent zevents that can be viewed (along with their associated lists of nvpairs) using the "\fBzpool events \-v\fR" command. .SH CONFIGURATION .PP ZEDLETs to be invoked in response to zevents are located in the \fIenabled-zedlets\fR directory. 
These can be symlinked or copied from the \fIinstalled-zedlets\fR directory; symlinks allow for automatic updates from the installed ZEDLETs, whereas copies preserve local modifications. As a security measure, ZEDLETs must be owned by root. They must have execute permissions for the user, but they must not have write permissions for group or other. Dotfiles are ignored. .PP ZEDLETs are named after the zevent class for which they should be invoked. In particular, a ZEDLET will be invoked for a given zevent if either its class or subclass string is a prefix of its filename (and is followed by a non-alphabetic character). As a special case, the prefix "all" matches all zevents. Multiple ZEDLETs may be invoked for a given zevent. .SH ZEDLETS .PP ZEDLETs are executables invoked by the ZED in response to a given zevent. They should be written under the presumption they can be invoked concurrently, and they should use appropriate locking to access any shared resources. Common variables used by ZEDLETs can be stored in the default rc file which is sourced by scripts; these variables should be prefixed with "ZED_". .PP The zevent nvpairs are passed to ZEDLETs as environment variables. Each nvpair name is converted to an environment variable in the following manner: 1) it is prefixed with "ZEVENT_", 2) it is converted to uppercase, and 3) each non-alphanumeric character is converted to an underscore. Some additional environment variables have been defined to present certain nvpair values in a more convenient form. An incomplete list of zevent environment variables is as follows: .TP .B ZEVENT_EID The Event IDentifier. .TP .B ZEVENT_CLASS The zevent class string. .TP .B ZEVENT_SUBCLASS The zevent subclass string. .TP .B ZEVENT_TIME The time at which the zevent was posted as "\fIseconds\fR\ \fInanoseconds\fR" since the Epoch. .TP .B ZEVENT_TIME_SECS The \fIseconds\fR component of ZEVENT_TIME. .TP .B ZEVENT_TIME_NSECS The \fInanoseconds\fR component of ZEVENT_TIME. .TP .B ZEVENT_TIME_STRING An almost-RFC3339-compliant string for ZEVENT_TIME. .PP Additionally, the following ZED & ZFS variables are defined: .TP .B ZED_PID The daemon's process ID. .TP .B ZED_ZEDLET_DIR The daemon's current \fIenabled-zedlets\fR directory. .TP .B ZFS_ALIAS The ZFS alias (\fIname-version-release\fR) string used to build the daemon. .TP .B ZFS_VERSION The ZFS version used to build the daemon. .TP .B ZFS_RELEASE The ZFS release used to build the daemon. .PP ZEDLETs may need to call other ZFS commands. The installation paths of the following executables are defined: \fBZDB\fR, \fBZED\fR, \fBZFS\fR, \fBZINJECT\fR, and \fBZPOOL\fR. These variables can be overridden in the rc file if needed. .SH FILES .\" .TP .\" @sysconfdir@/zfs/zed.conf .\" The default configuration file for the daemon. .TP .I @sysconfdir@/zfs/zed.d The default directory for enabled ZEDLETs. .TP .I @sysconfdir@/zfs/zed.d/zed.rc The default rc file for common variables used by ZEDLETs. .TP .I @zfsexecdir@/zed.d The default directory for installed ZEDLETs. .TP .I @runstatedir@/zed.pid The default file containing the daemon's process ID. .TP .I @runstatedir@/zed.state The default file containing the daemon's state. .SH SIGNALS .TP .B HUP Reconfigure the daemon and rescan the directory for enabled ZEDLETs. .TP .B TERM Terminate the daemon. .SH NOTES .PP \fBZED\fR requires root privileges. .\" Do not taunt zed. .SH BUGS .PP Events are processed synchronously by a single thread. This can delay the processing of simultaneous zevents. 
.PP There is no maximum timeout for ZEDLET execution. Consequently, a misbehaving ZEDLET can delay the processing of subsequent zevents. .PP The ownership and permissions of the \fIenabled-zedlets\fR directory (along with all parent directories) are not checked. If any of these directories are improperly owned or permissioned, an unprivileged user could insert a ZEDLET to be executed as root. The requirement that ZEDLETs be owned by root mitigates this to some extent. .PP ZEDLETs are unable to return state/status information to the kernel. .PP Some zevent nvpair types are not handled. These are denoted by zevent environment variables having a "_NOT_IMPLEMENTED_" value. .PP Internationalization support via gettext has not been added. .PP The configuration file is not yet implemented. .PP The diagnosis engine is not yet implemented. .SH LICENSE .PP \fBZED\fR (ZFS Event Daemon) is distributed under the terms of the Common Development and Distribution License Version 1.0 (CDDL\-1.0). .PP Developed at Lawrence Livermore National Laboratory (LLNL\-CODE\-403049). .SH SEE ALSO .BR zfs (8), .BR zpool (8), .BR zpool-events (8) diff --git a/man/man8/zfs-mount-generator.8.in b/man/man8/zfs-mount-generator.8.in index 41a2999f0f0a..3b8c9c3ae246 100644 --- a/man/man8/zfs-mount-generator.8.in +++ b/man/man8/zfs-mount-generator.8.in @@ -1,248 +1,248 @@ .\" .\" Copyright 2018 Antonio Russo .\" Copyright 2019 Kjeld Schouten-Lebbing .\" Copyright 2020 InsanePrawn .\" .\" Permission is hereby granted, free of charge, to any person obtaining .\" a copy of this software and associated documentation files (the .\" "Software"), to deal in the Software without restriction, including .\" without limitation the rights to use, copy, modify, merge, publish, .\" distribute, sublicense, and/or sell copies of the Software, and to .\" permit persons to whom the Software is furnished to do so, subject to .\" the following conditions: .\" .\" The above copyright notice and this permission notice shall be .\" included in all copies or substantial portions of the Software. .\" .\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, .\" EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF .\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND .\" NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE .\" LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION .\" OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION .\" WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -.TH "ZFS\-MOUNT\-GENERATOR" "8" "2020-01-19" "ZFS" "zfs-mount-generator" "\"" +.TH ZFS-MOUNT-GENERATOR 8 "Aug 24, 2020" OpenZFS .SH "NAME" zfs\-mount\-generator \- generates systemd mount units for ZFS .SH SYNOPSIS .B @systemdgeneratordir@/zfs\-mount\-generator .sp .SH DESCRIPTION zfs\-mount\-generator implements the \fBGenerators Specification\fP of .BR systemd (1), and is called during early boot to generate .BR systemd.mount (5) units for automatically mounted datasets. Mount ordering and dependencies are created for all tracked pools (see below). .SS ENCRYPTION KEYS If the dataset is an encryption root, a service that loads the associated key (either from file or through a .BR systemd\-ask\-password (1) prompt) will be created. This service .BR RequiresMountsFor the path of the key (if file-based) and also copies the mount unit's .BR After , .BR Before and .BR Requires .
All mount units of encrypted datasets add the key\-load service for their encryption root to their .BR Wants and .BR After . The service will not be .BR Want ed or .BR Require d by .BR local-fs.target directly, and so will only be started manually or as a dependency of a started mount unit. .SS UNIT ORDERING AND DEPENDENCIES mount unit's .BR Before \-> key\-load service (if any) \-> mount unit \-> mount unit's .BR After It is worth noting that when a mount unit is activated, it activates all available mount units for parent paths to its mountpoint, i.e. activating the mount unit for /tmp/foo/1/2/3 automatically activates all available mount units for /tmp, /tmp/foo, /tmp/foo/1, and /tmp/foo/1/2. This is true for any combination of mount units from any sources, not just ZFS. .SS CACHE FILE Because ZFS pools may not be available very early in the boot process, information on ZFS mountpoints must be stored separately. The output of the command .PP .RS 4 zfs list -H -o name,mountpoint,canmount,atime,relatime,devices,exec,readonly,setuid,nbmand,encroot,keylocation,org.openzfs.systemd:requires,org.openzfs.systemd:requires-mounts-for,org.openzfs.systemd:before,org.openzfs.systemd:after,org.openzfs.systemd:wanted-by,org.openzfs.systemd:required-by,org.openzfs.systemd:nofail,org.openzfs.systemd:ignore .RE .PP for datasets that should be mounted by systemd, should be kept separate from the pool, at .PP .RS 4 .RI @sysconfdir@/zfs/zfs-list.cache/ POOLNAME . .RE .PP The cache file, if writeable, will be kept synchronized with the pool state by the ZEDLET .PP .RS 4 history_event-zfs-list-cacher.sh . .RE .PP .sp .SS PROPERTIES The behavior of the generator script can be influenced by the following dataset properties: .sp .TP 4 .BR canmount = on | off | noauto If a dataset has .BR mountpoint set and .BR canmount is not .BR off , a mount unit will be generated. Additionally, if .BR canmount is .BR on , .BR local-fs.target will gain a dependency on the mount unit. This behavior is equivalent to the .BR auto and .BR noauto legacy mount options, see .BR systemd.mount (5). Encryption roots always generate a key-load service, even for .BR canmount=off . .TP 4 .BR org.openzfs.systemd:requires\-mounts\-for = \fIpath\fR... Space\-separated list of mountpoints to require to be mounted for this mount unit. .TP 4 .BR org.openzfs.systemd:before = \fIunit\fR... The mount unit and associated key\-load service will be ordered before this space\-separated list of units. .TP 4 .BR org.openzfs.systemd:after = \fIunit\fR... The mount unit and associated key\-load service will be ordered after this space\-separated list of units. .TP 4 .BR org.openzfs.systemd:wanted\-by = \fIunit\fR... Space-separated list of units that will gain a .BR Wants dependency on this mount unit. Setting this property implies .BR noauto . .TP 4 .BR org.openzfs.systemd:required\-by = \fIunit\fR... Space-separated list of units that will gain a .BR Requires dependency on this mount unit. Setting this property implies .BR noauto . .TP 4 .BR org.openzfs.systemd:nofail = unset | on | off Toggles between a .BR Wants and .BR Requires type of dependency between the mount unit and .BR local-fs.target , if .BR noauto isn't set or implied. .BR on : Mount will be .BR WantedBy local-fs.target .BR off : Mount will be .BR Before and .BR RequiredBy local-fs.target .BR unset : Mount will be .BR Before and .BR WantedBy local-fs.target .TP 4 .BR org.openzfs.systemd:ignore = on | off If set to .BR on , do not generate a mount unit for this dataset.
.RE See also .BR systemd.mount (5) .PP .SH EXAMPLE To begin, enable tracking for the pool: .PP .RS 4 touch .RI @sysconfdir@/zfs/zfs-list.cache/ POOLNAME .RE .PP Then, enable the tracking ZEDLET: .PP .RS 4 ln -s "@zfsexecdir@/zed.d/history_event-zfs-list-cacher.sh" "@sysconfdir@/zfs/zed.d" systemctl enable zfs-zed.service systemctl restart zfs-zed.service .RE .PP Force the running of the ZEDLET by setting a monitored property, e.g. .BR canmount , for at least one dataset in the pool: .PP .RS 4 zfs set canmount=on .I DATASET .RE .PP This forces an update to the stale cache file. To test the generator output, run .PP .RS 4 @systemdgeneratordir@/zfs-mount-generator /tmp/zfs-mount-generator . . .RE .PP This will generate units and dependencies in .I /tmp/zfs-mount-generator for you to inspect. The second and third arguments are ignored. If you're satisfied with the generated units, instruct systemd to re-run all generators: .PP .RS 4 systemctl daemon-reload .RE .PP .sp .SH SEE ALSO .BR zfs (5) .BR zfs-events (5) .BR zed (8) .BR zpool (5) .BR systemd (1) .BR systemd.target (5) .BR systemd.special (7) .BR systemd.mount (7) diff --git a/man/man8/zinject.8 b/man/man8/zinject.8 index f02e78ca207e..ee6776fe7641 100644 --- a/man/man8/zinject.8 +++ b/man/man8/zinject.8 @@ -1,198 +1,198 @@ '\" t .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright 2013 Darik Horn . All rights reserved. .\" -.TH zinject 8 "2013 FEB 28" "ZFS on Linux" "System Administration Commands" +.TH ZINJECT 8 "Aug 24, 2020" OpenZFS .SH NAME zinject \- ZFS Fault Injector .SH DESCRIPTION .BR zinject creates artificial problems in a ZFS pool by simulating data corruption or device failures. This program is dangerous. .SH SYNOPSIS .TP .B "zinject" List injection records. .TP .B "zinject \-b \fIobjset:object:level:blkd\fB [\-f \fIfrequency\fB] [\-amu] \fIpool\fB" Force an error into the pool at a bookmark. .TP .B "zinject \-c <\fIid\fB | all> Cancel injection records. .TP .B "zinject \-d \fIvdev\fB \-A \fIpool\fB Force a vdev into the DEGRADED or FAULTED state. .TP .B "zinject -d \fIvdev\fB -D latency:lanes \fIpool\fB Add an artificial delay to IO requests on a particular device, such that the requests take a minimum of 'latency' milliseconds to complete. Each delay has an associated number of 'lanes' which defines the number of concurrent IO requests that can be processed. For example, with a single lane delay of 10 ms (-D 10:1), the device will only be able to service a single IO request at a time with each request taking 10 ms to complete. So, if only a single request is submitted every 10 ms, the average latency will be 10 ms; but if more than one request is submitted every 10 ms, the average latency will be more than 10 ms.
Similarly, if a delay of 10 ms is specified to have two lanes (-D 10:2), then the device will be able to service two requests at a time, each with a minimum latency of 10 ms. So, if two requests are submitted every 10 ms, then the average latency will be 10 ms; but if more than two requests are submitted every 10 ms, the average latency will be more than 10 ms. Also note that these delays are additive. So two invocations of '-D 10:1' are roughly equivalent to a single invocation of '-D 10:2'. This also means one can specify multiple lanes with differing target latencies. For example, an invocation of '-D 10:1' followed by '-D 25:2' will create 3 lanes on the device: one lane with a latency of 10 ms and two lanes with a 25 ms latency. .TP .B "zinject \-d \fIvdev\fB [\-e \fIdevice_error\fB] [\-L \fIlabel_error\fB] [\-T \fIfailure\fB] [\-f \fIfrequency\fB] [\-F] \fIpool\fB" Force a vdev error. .TP .B "zinject \-I [\-s \fIseconds\fB | \-g \fItxgs\fB] \fIpool\fB" Simulate a hardware failure that fails to honor a cache flush. .TP .B "zinject \-p \fIfunction\fB \fIpool\fB Panic inside the specified function. .TP .B "zinject \-t data [\-C \fIdvas\fB] [\-e \fIdevice_error\fB] [\-f \fIfrequency\fB] [\-l \fIlevel\fB] [\-r \fIrange\fB] [\-amq] \fIpath\fB" Force an error into the contents of a file. .TP .B "zinject \-t dnode [\-C \fIdvas\fB] [\-e \fIdevice_error\fB] [\-f \fIfrequency\fB] [\-l \fIlevel\fB] [\-amq] \fIpath\fB" Force an error into the metadnode for a file or directory. .TP .B "zinject \-t \fImos_type\fB [\-C \fIdvas\fB] [\-e \fIdevice_error\fB] [\-f \fIfrequency\fB] [\-l \fIlevel\fB] [\-r \fIrange\fB] [\-amqu] \fIpool\fB" Force an error into the MOS of a pool. .SH OPTIONS .TP .BI "\-a" Flush the ARC before injection. .TP .BI "\-b" " objset:object:level:start:end" Force an error into the pool at this bookmark tuple. Each number is in hexadecimal, and only one block can be specified. .TP .BI "\-C" " dvas" Inject the given error only into specific DVAs. The mask should be specified as a list of 0-indexed DVAs separated by commas (e.g. '0,2'). This option is not applicable to logical data errors such as .BR "decompress" and .BR "decrypt" . .TP .BI "\-d" " vdev" A vdev specified by path or GUID. .TP .BI "\-e" " device_error" Specify .BR "checksum" " for an ECKSUM error," .BR "decompress" " for a data decompression error," .BR "decrypt" " for a data decryption error," .BR "corrupt" " to flip a bit in the data after a read," .BR "dtl" " for an ECHILD error," .BR "io" " for an EIO error where reopening the device will succeed, or" .BR "nxio" " for an ENXIO error where reopening the device will fail." For EIO and ENXIO, the "failed" reads or writes still occur. The probe simply sets the error value reported by the I/O pipeline so it appears the read or write failed. Decryption errors only currently work with file data. .TP .BI "\-f" " frequency" Only inject errors a fraction of the time. Expressed as a real number percentage between 0.0001 and 100. .TP .BI "\-F" Fail faster. Do fewer checks. .TP .BI "\-g" " txgs" Run for this many transaction groups before reporting failure. .TP .BI "\-h" Print the usage message. .TP .BI "\-l" " level" Inject an error at a particular block level. The default is 0. .TP .BI "\-L" " label_error" Set the label error region to one of .BR " nvlist" "," .BR " pad1" "," .BR " pad2" ", or" .BR " uber" "." .TP .BI "\-m" Automatically remount the underlying filesystem. .TP .BI "\-q" Quiet mode. Only print the handler number added.
.TP .BI "\-r" " range" Inject an error over a particular logical range of an object, which will be translated to the appropriate blkid range according to the object's properties. .TP .BI "\-s" " seconds" Run for this many seconds before reporting failure. .TP .BI "\-T" " failure" Set the failure type to one of .BR " all" "," .BR " claim" "," .BR " free" "," .BR " read" ", or" .BR " write" "." .TP .BI "\-t" " mos_type" Set this to .BR "mos " "for any data in the MOS," .BR "mosdir " "for an object directory," .BR "config " "for the pool configuration," .BR "bpobj " "for the block pointer list," .BR "spacemap " "for the space map," .BR "metaslab " "for the metaslab, or" .BR "errlog " "for the persistent error log." .TP .BI "\-u" Unload the pool after injection. .SH "ENVIRONMENT VARIABLES" .TP .B "ZINJECT_DEBUG" Run \fBzinject\fR in debug mode. .SH "AUTHORS" This man page was written by Darik Horn excerpting the \fBzinject\fR usage message and source code. .SH "SEE ALSO" .BR zpool (8), .BR zfs (8) diff --git a/man/man8/zpool-iostat.8 b/man/man8/zpool-iostat.8 index c318dcd7478c..f91e55c3b01a 100644 --- a/man/man8/zpool-iostat.8 +++ b/man/man8/zpool-iostat.8 @@ -1,247 +1,247 @@ .\" .\" CDDL HEADER START .\" .\" The contents of this file are subject to the terms of the .\" Common Development and Distribution License (the "License"). .\" You may not use this file except in compliance with the License. .\" .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE .\" or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions .\" and limitations under the License. .\" .\" When distributing Covered Code, include this CDDL HEADER in each .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. .\" If applicable, add the following below this CDDL HEADER, with the .\" fields enclosed by brackets "[]" replaced with your own identifying .\" information: Portions Copyright [yyyy] [name of copyright owner] .\" .\" CDDL HEADER END .\" .\" .\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved. .\" Copyright (c) 2012, 2018 by Delphix. All rights reserved. .\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved. .\" Copyright (c) 2017 Datto Inc. .\" Copyright (c) 2018 George Melikov. All Rights Reserved. .\" Copyright 2017 Nexenta Systems, Inc. .\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved. .\" .Dd August 9, 2019 .Dt ZPOOL-IOSTAT 8 .Os .Sh NAME .Nm zpool Ns Pf - Cm iostat .Nd Display logical I/O statistics for the given ZFS storage pools/vdevs .Sh SYNOPSIS .Nm .Cm iostat .Op Oo Oo Fl c Ar SCRIPT Oc Oo Fl lq Oc Oc Ns | Ns Fl rw .Op Fl T Sy u Ns | Ns Sy d .Op Fl ghHLnpPvy .Oo Oo Ar pool Ns ... Oc Ns | Ns Oo Ar pool vdev Ns ... Oc Ns | Ns Oo Ar vdev Ns ... Oc Oc .Op Ar interval Op Ar count .Sh DESCRIPTION .Bl -tag -width Ds .It Xo .Nm .Cm iostat .Op Oo Oo Fl c Ar SCRIPT Oc Oo Fl lq Oc Oc Ns | Ns Fl rw .Op Fl T Sy u Ns | Ns Sy d .Op Fl ghHLnpPvy .Oo Oo Ar pool Ns ... Oc Ns | Ns Oo Ar pool vdev Ns ... Oc Ns | Ns Oo Ar vdev Ns ... Oc Oc .Op Ar interval Op Ar count .Xc Displays logical I/O statistics for the given pools/vdevs. Physical I/Os may be observed via .Xr iostat 1 . If writes are located nearby, they may be merged into a single larger operation. Additional I/O may be generated depending on the level of vdev redundancy. To filter output, you may pass in a list of pools, a pool and list of vdevs in that pool, or a list of any vdevs from any pool. 
If no items are specified, statistics for every pool in the system are shown. When given an .Ar interval , the statistics are printed every .Ar interval -seconds until ^C is pressed. If +seconds until ^C is pressed. If .Fl n -flag is specified the headers are displayed only once, otherwise they are +flag is specified the headers are displayed only once, otherwise they are displayed periodically. If count is specified, the command exits after count reports are printed. The first report printed is always the statistics since boot regardless of whether .Ar interval and .Ar count are passed. However, this behavior can be suppressed with the .Fl y flag. Also note that the units of .Sy K , .Sy M , .Sy G ... that are printed in the report are in base 1024. To get the raw values, use the .Fl p flag. .Bl -tag -width Ds .It Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ... Run a script (or scripts) on each vdev and include the output as a new column in the .Nm zpool Cm iostat output. Users can run any script found in their .Pa ~/.zpool.d directory or from the system .Pa /etc/zfs/zpool.d directory. Script names containing the slash (/) character are not allowed. The default search path can be overridden by setting the ZPOOL_SCRIPTS_PATH environment variable. A privileged user can run .Fl c if they have the ZPOOL_SCRIPTS_AS_ROOT environment variable set. If a script requires the use of a privileged command, like .Xr smartctl 8 , then it's recommended you allow the user access to it in .Pa /etc/sudoers or add the user to the .Pa /etc/sudoers.d/zfs file. .Pp If .Fl c is passed without a script name, it prints a list of all scripts. .Fl c also sets verbose mode .No \&( Ns Fl v Ns No \&). .Pp Script output should be in the form of "name=value". The column name is set to "name" and the value is set to "value". Multiple lines can be used to output multiple columns. The first line of output not in the "name=value" format is displayed without a column title, and no more output after that is displayed. This can be useful for printing error messages. Blank or NULL values are printed as a '-' to make output awk-able. .Pp The following environment variables are set before running each script: .Bl -tag -width "VDEV_PATH" .It Sy VDEV_PATH Full path to the vdev .El .Bl -tag -width "VDEV_UPATH" .It Sy VDEV_UPATH Underlying path to the vdev (/dev/sd*). For use with device mapper, multipath, or partitioned vdevs. .El .Bl -tag -width "VDEV_ENC_SYSFS_PATH" .It Sy VDEV_ENC_SYSFS_PATH The sysfs path to the enclosure for the vdev (if any). .El .It Fl T Sy u Ns | Ns Sy d Display a time stamp. Specify .Sy u for a printed representation of the internal representation of time. See .Xr time 2 . Specify .Sy d for standard date format. See .Xr date 1 . .It Fl g Display vdev GUIDs instead of the normal device names. These GUIDs can be used in place of device names for the zpool detach/offline/remove/replace commands. .It Fl H Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. .It Fl L Display real paths for vdevs resolving all symbolic links. This can be used to look up the current block device name regardless of the .Pa /dev/disk/ path used to open it. .It Fl n Print headers only once when passed .It Fl p Display numbers in parsable (exact) values. Time values are in nanoseconds. .It Fl P Display full paths for vdevs instead of only the last component of the path. This can be used in conjunction with the .Fl L flag. .It Fl r Print request size histograms for the leaf vdev's IO. 
This includes histograms of individual IOs (ind) and aggregate IOs (agg). These stats can be useful for observing how well IO aggregation is working. Note that TRIM IOs may exceed 16M, but will be counted as 16M. .It Fl v Verbose statistics. Reports usage statistics for individual vdevs within the pool, in addition to the pool-wide statistics. .It Fl y Omit statistics since boot. Normally the first line of output reports the statistics since boot. This option suppresses that first line of output. .It Fl w Display latency histograms: .Pp .Ar total_wait : Total IO time (queuing + disk IO time). .Ar disk_wait : Disk IO time (time reading/writing the disk). .Ar syncq_wait : Amount of time IO spent in synchronous priority queues. Does not include disk time. .Ar asyncq_wait : Amount of time IO spent in asynchronous priority queues. Does not include disk time. .Ar scrub : Amount of time IO spent in scrub queue. Does not include disk time. .It Fl l Include average latency statistics: .Pp .Ar total_wait : Average total IO time (queuing + disk IO time). .Ar disk_wait : Average disk IO time (time reading/writing the disk). .Ar syncq_wait : Average amount of time IO spent in synchronous priority queues. Does not include disk time. .Ar asyncq_wait : Average amount of time IO spent in asynchronous priority queues. Does not include disk time. .Ar scrub : Average queuing time in scrub queue. Does not include disk time. .Ar trim : Average queuing time in trim queue. Does not include disk time. .It Fl q Include active queue statistics. Each priority queue has both pending ( .Ar pend ) and active ( .Ar activ ) IOs. Pending IOs are waiting to be issued to the disk, and active IOs have been issued to disk and are waiting for completion. These stats are broken out by priority queue: .Pp .Ar syncq_read/write : Current number of entries in synchronous priority queues. .Ar asyncq_read/write : Current number of entries in asynchronous priority queues. .Ar scrubq_read : Current number of entries in scrub queue. .Ar trimq_write : Current number of entries in trim queue. .Pp All queue statistics are instantaneous measurements of the number of entries in the queues. If you specify an interval, the measurements will be sampled from the end of the interval. .El .El .Sh SEE ALSO .Xr zpool-list 8 , .Xr zpool-status 8 , .Xr iostat 1 , .Xr smartctl 8 diff --git a/man/man8/zstreamdump.8 b/man/man8/zstreamdump.8 index 33cd047f5d78..f499be442a47 100644 --- a/man/man8/zstreamdump.8 +++ b/man/man8/zstreamdump.8 @@ -1,58 +1,58 @@ '\" te .\" Copyright (c) 2009, Sun Microsystems, Inc. All Rights Reserved .\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License. You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. .\" See the License for the specific language governing permissions and limitations under the License. When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE.
If applicable, add the following below this CDDL HEADER, with .\" the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner] -.TH zstreamdump 8 "29 Aug 2012" "ZFS pool 28, filesystem 5" "System Administration Commands" +.TH ZSTREAMDUMP 8 "Aug 24, 2020" OpenZFS .SH NAME zstreamdump \- filter data in zfs send stream .SH SYNOPSIS .LP .nf \fBzstreamdump\fR [\fB-C\fR] [\fB-v\fR] [\fB-d\fR] .fi .SH DESCRIPTION .sp .LP The \fBzstreamdump\fR utility reads from the output of the \fBzfs send\fR command, then displays headers and some statistics from that output. See \fBzfs\fR(8). .SH OPTIONS .sp .LP The following options are supported: .sp .ne 2 .na \fB-C\fR .ad .sp .6 .RS 4n Suppress the validation of checksums. .RE .sp .ne 2 .na \fB-v\fR .ad .sp .6 .RS 4n Verbose. Dump all headers, not only begin and end headers. .RE .sp .ne 2 .na \fB-d\fR .ad .sp .6 .RS 4n Dump contents of blocks modified. Implies verbose. .RE .SH SEE ALSO .sp .LP \fBzfs\fR(8)
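.SH EXAMPLES
.sp
.LP
A typical invocation pipes a send stream into \fBzstreamdump\fR. The pool and
snapshot names below are placeholders, not defaults:
.sp
.nf
# Dump all headers from a send stream (illustrative names only)
zfs send tank/home@snap | zstreamdump -v
.fi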