diff --git a/sbin/fsck_ffs/fsck_ffs.8 b/sbin/fsck_ffs/fsck_ffs.8 index f100686e70e8..8288216c0681 100644 --- a/sbin/fsck_ffs/fsck_ffs.8 +++ b/sbin/fsck_ffs/fsck_ffs.8 @@ -1,442 +1,443 @@ .\" .\" Copyright (c) 1980, 1989, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" @(#)fsck.8 8.4 (Berkeley) 5/9/95 .\" -.Dd May 3, 2019 +.Dd November 17, 2023 .Dt FSCK_FFS 8 .Os .Sh NAME .Nm fsck_ffs , .Nm fsck_ufs .Nd file system consistency check and interactive repair .Sh SYNOPSIS .Nm .Op Fl BCdEFfnpRrSyZz .Op Fl b Ar block .Op Fl c Ar level .Op Fl m Ar mode .Ar filesystem .Ar ... .Sh DESCRIPTION The specified disk partitions and/or file systems are checked. In "preen" or "check clean" mode the clean flag of each file system's superblock is examined and only those file systems that are not marked clean are checked. File systems are marked clean when they are unmounted, when they have been mounted read-only, or when .Nm runs on them successfully. If the .Fl f option is specified, the file systems will be checked regardless of the state of their clean flag. .Pp The kernel takes care that only a restricted class of innocuous file system inconsistencies can happen unless hardware or software failures intervene. These are limited to the following: .Pp .Bl -item -compact -offset indent .It Unreferenced inodes .It Link counts in inodes too large .It Missing blocks in the free map .It Blocks in the free map also in files .It Counts in the super-block wrong .El .Pp These are the only inconsistencies that .Nm with the .Fl p option will correct; if it encounters other inconsistencies, it exits with an abnormal return status and an automatic reboot will then fail. For each corrected inconsistency one or more lines will be printed identifying the file system on which the correction will take place, and the nature of the correction. After successfully correcting a file system, .Nm will print the number of files on that file system, the number of used and free blocks, and the percentage of fragmentation. .Pp If sent a .Dv QUIT signal, .Nm will finish the file system checks, then exit with an abnormal return status that causes an automatic reboot to fail. 
This is useful when you want to finish the file system checks during an automatic reboot, but do not want the machine to come up multiuser after the checks complete. .Pp If .Nm receives a .Dv SIGINFO (see the .Dq status argument for .Xr stty 1 ) signal, a line will be written to the standard output indicating the name of the device currently being checked, the current phase number and phase-specific progress information. .Pp Without the .Fl p option, .Nm audits and interactively repairs inconsistent conditions for file systems. If the file system is inconsistent the operator is prompted for concurrence before each correction is attempted. It should be noted that some of the corrective actions which are not correctable under the .Fl p option will result in some loss of data. The amount and severity of data lost may be determined from the diagnostic output. The default action for each consistency correction is to wait for the operator to respond .Li yes or .Li no . If the operator does not have write permission on the file system .Nm will default to a .Fl n action. .Pp The following flags are interpreted by .Nm : .Bl -tag -width indent .It Fl B A check is done on the specified and possibly active file system. The set of corrections that can be done is limited to those done when running in preen mode (see the .Fl p flag). If unexpected errors are found, the file system is marked as needing a foreground check and .Nm exits without attempting any further cleaning. .It Fl b Use the block specified immediately after the flag as the super block for the file system. An alternate super block is usually located at block 32 for UFS1, and block 192 for UFS2. .Pp See the .Fl N flag of .Xr newfs 8 . .It Fl C Check if file system was dismounted cleanly. If so, skip file system checks (like "preen"). However, if the file system was not cleanly dismounted, do full checks, as if .Nm was invoked without .Fl C . .It Fl c Convert the file system to the specified level. 
Note that the level of a file system can only be raised. There are currently four levels defined: .Bl -tag -width indent .It 0 The file system is in the old (static table) format. .It 1 The file system is in the new (dynamic table) format. .It 2 The file system supports 32-bit uid's and gid's, short symbolic links are stored in the inode, and directories have an added field showing the file type. .It 3 If maxcontig is greater than one, build the free segment maps to aid in finding contiguous sets of blocks. If maxcontig is equal to one, delete any existing segment maps. .El .Pp In interactive mode, .Nm will list the conversion to be made and ask whether the conversion should be done. If a negative answer is given, no further operations are done on the file system. In preen mode, the conversion is listed and done if possible without user interaction. Conversion in preen mode is best used when all the file systems are being converted at once. The format of a file system can be determined from the first line of output from .Xr dumpfs 8 . .Pp This option implies the .Fl f flag. .It Fl d Enable debugging messages. .It Fl E Clear unallocated blocks, notifying the underlying device that they are not used and that their contents may be discarded. This is useful for filesystems which have been mounted on systems without TRIM support, or with TRIM support disabled, as well as filesystems which have been copied from one device to another. .Pp See the .Fl E and .Fl t flags of .Xr newfs 8 , and the .Fl t flag of .Xr tunefs 8 . .It Fl F Determine whether the file system needs to be cleaned immediately in foreground, or if its cleaning can be deferred to background. To be eligible for background cleaning it must have been running with soft updates, not have been marked as needing a foreground check, and be mounted and writable when the background check is to be done. If these conditions are met, then .Nm exits with a zero exit status. 
Otherwise it exits with a non-zero exit status. If the file system is clean, it will exit with a non-zero exit status so that the clean status of the file system can be verified and reported during the foreground checks. Note that when invoked with the .Fl F flag, no cleanups are done. The only thing that .Nm does is to determine whether a foreground or background check is needed and exit with an appropriate status code. .It Fl f Force .Nm to check .Sq clean file systems when preening. .It Fl m Use the mode specified in octal immediately after the flag as the permission bits to use when creating the .Pa lost+found directory rather than the default 1777. In particular, systems that do not wish to have lost files accessible by all users on the system should use a more restrictive set of permissions such as 700. .It Fl n Assume a no response to all questions asked by .Nm except for .Ql CONTINUE? , which is assumed to be affirmative; do not open the file system for writing. .It Fl p Preen file systems (see above). .It Fl R Instruct fsck_ffs to restart itself if it encounters certain errors that warrant another run. It will limit itself to a maximum of 10 restarts in a given run in order to avoid an endless loop with extremely corrupted filesystems. .It Fl r Free up excess unused inodes. Decreasing the number of preallocated inodes reduces the running time of future runs of .Nm and frees up space that can be allocated to files. The .Fl r option is ignored when running in preen mode. .It Fl S Surrender on error. With this flag enabled, a hard error returned on disk I/O will cause .Nm to abort instead of continuing on and possibly tripping over more I/O errors. .It Fl y Assume a yes response to all questions asked by .Nm ; this should be used with great caution as this is a free license to continue after essentially unlimited trouble has been encountered. .It Fl Z Similar to .Fl E , but overwrites unused blocks with zeroes.
If both .Fl E and .Fl Z are specified, blocks are first zeroed and then erased. .It Fl z Clear unused directory space. The cleared space includes deleted file names and name padding. .El .Pp Inconsistencies checked are as follows: .Pp .Bl -enum -compact .It Blocks claimed by more than one inode or the free map. .It Blocks claimed by an inode outside the range of the file system. .It Incorrect link counts. .It Size checks: .Bl -item -offset indent -compact .It Directory size not a multiple of DIRBLKSIZ. .It Partially truncated file. .El .It Bad inode format. .It Blocks not accounted for anywhere. .It Directory checks: .Bl -item -offset indent -compact .It File pointing to unallocated inode. .It Inode number out of range. .It Directories with unallocated blocks (holes). .It Dot or dot-dot not the first two entries of a directory or having the wrong inode number. .El .It Super Block checks: .Bl -item -offset indent -compact .It More blocks for inodes than there are in the file system. .It Bad free block map format. .It Total free block and/or free inode count incorrect. .El .El .Pp Orphaned files and directories (allocated but unreferenced) are, with the operator's concurrence, reconnected by placing them in the .Pa lost+found directory. The name assigned is the inode number. If the .Pa lost+found directory does not exist, it is created. If there is insufficient space its size is increased. .Pp The full foreground .Nm checks for many more problems that may occur after an unrecoverable disk write error. Thus, it is recommended that you perform foreground .Nm on your systems periodically and whenever you encounter unrecoverable disk write errors or file-system\-related panics. .Sh FILES .Bl -tag -width /etc/fstab -compact .It Pa /etc/fstab contains default list of file systems to check. .El .Sh EXIT STATUS .Ex -std .Pp Specific non-zero exit status values used are: .Bl -tag -width indent .It 1 Usage error (missing or invalid command arguments). 
.It 2 The .Fl p option was used and a .Dv SIGQUIT was received, indicating that the system should be returned to single user mode after the file system check. .It 3 The file system superblock cannot be read. This could indicate that the file system device does not exist or is not yet ready. .It 4 A mounted file system was modified; the system should be rebooted. .It 5 The .Fl B option was used and soft updates are not enabled on the file system. .It 6 The .Fl B option was used and the kernel lacks needed support. .It 7 The .Fl F option was used and the file system is clean. .It 8 General error exit. .It 16 The file system could not be completely repaired. The file system may be able to be repaired by running .Nm on the file system again. .El .Sh DIAGNOSTICS The diagnostics produced by .Nm are fully enumerated and explained in Appendix A of .Rs .%T "Fsck \- The UNIX File System Check Program" .Re .Sh SEE ALSO .Xr fs 5 , .Xr fstab 5 , +.Xr ffs 7 , .Xr fsck 8 , .Xr fsdb 8 , .Xr newfs 8 , .Xr reboot 8 .Sh HISTORY A .Nm fsck utility appeared in .Bx 4.0 . It became .Nm in .Fx 5.0 with the introduction of the filesystem independent wrapper as .Nm fsck . diff --git a/sbin/tunefs/tunefs.8 b/sbin/tunefs/tunefs.8 index bda39462a272..19059e335834 100644 --- a/sbin/tunefs/tunefs.8 +++ b/sbin/tunefs/tunefs.8 @@ -1,249 +1,251 @@ .\" Copyright (c) 1983, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. 
Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)tunefs.8 8.2 (Berkeley) 12/11/93 .\" -.Dd August 16, 2022 +.Dd November 17, 2023 .Dt TUNEFS 8 .Os .Sh NAME .Nm tunefs .Nd tune up an existing UFS file system .Sh SYNOPSIS .Nm .Op Fl A .Op Fl a Cm enable | disable .Op Fl e Ar maxbpg .Op Fl f Ar avgfilesize .Op Fl j Cm enable | disable .Op Fl J Cm enable | disable .Op Fl k Ar held-for-metadata-blocks .Op Fl L Ar volname .Op Fl l Cm enable | disable .Op Fl m Ar minfree .Op Fl N Cm enable | disable .Op Fl n Cm enable | disable .Op Fl o Cm space | time .Op Fl p .Op Fl s Ar avgfpdir .Op Fl S Ar size .Op Fl t Cm enable | disable .Ar special | filesystem .Sh DESCRIPTION The .Nm utility is designed to change the dynamic parameters of a UFS file system which affect the layout policies. The .Nm utility cannot be run on an active file system. To change an active file system, it must be downgraded to read-only or unmounted. 
.Pp The parameters which are to be changed are indicated by the flags given below: .Bl -tag -width indent .It Fl A The file system has several backups of the super-block. Specifying this option will cause all backups to be modified as well as the primary super-block. This is potentially dangerous - use with caution. .It Fl a Cm enable | disable Turn on/off the administrative POSIX.1e ACL enable flag. .It Fl e Ar maxbpg Indicate the maximum number of blocks any single file can allocate out of a cylinder group before it is forced to begin allocating blocks from another cylinder group. Typically this value is set to about one quarter of the total blocks in a cylinder group. The intent is to prevent any single file from using up all the blocks in a single cylinder group, thus degrading access times for all files subsequently allocated in that cylinder group. The effect of this limit is to cause big files to do long seeks more frequently than if they were allowed to allocate all the blocks in a cylinder group before seeking elsewhere. For file systems with exclusively large files, this parameter should be set higher. .It Fl f Ar avgfilesize Specify the expected average file size. .It Fl j Cm enable | disable Turn on/off soft updates journaling. .Pp Enabling journaling reduces the time spent by .Xr fsck_ffs 8 cleaning up a filesystem after a crash from minutes or hours to a few seconds. Without journaling, the time to recover after a crash is a function of the number of files in the filesystem and the size of the filesystem. With journaling, the time to recover after a crash is a function of the amount of activity in the filesystem in the minute before the crash. Journaled recovery time is usually only a few seconds and never exceeds a minute. .Pp The drawback to using journaling is that the writes to its log add an extra write load to the media containing the filesystem.
Thus a write-intensive workload will have reduced throughput on a filesystem running with journaling. .Pp Like all journaling filesystems, the journal recovery will only fix issues known to the journal. Specifically if a media error occurs, the journal will not know about it and hence will not fix it. Thus when using journaling, it is still necessary to run a full fsck every few months or after a filesystem panic to check for and fix any errors brought on by media failure. A full fsck can be done by running a background fsck on a live filesystem or by running with the .Fl f flag on an unmounted filesystem. When running .Xr fsck_ffs 8 in background on a live filesystem the filesystem performance will be about half of normal during the time that the background .Xr fsck_ffs 8 is running. Running a full fsck on a UFS filesystem is the equivalent of running a scrub on a ZFS filesystem. .It Fl J Cm enable | disable Turn on/off gjournal flag. .It Fl k Ar held-for-metadata-blocks Set the amount of space to be held for metadata blocks. When set, the file system preference routines will try to save the specified amount of space immediately following the inode blocks in each cylinder group for use by metadata blocks. Clustering the metadata blocks speeds up random file access and decreases the running time of .Xr fsck 8 . While this option can be set at any time, it is most effective if set before any data is loaded into the file system. By default .Xr newfs 8 sets it to half of the space reserved to minfree. .It Fl L Ar volname Add/modify an optional file system volume label. Legal characters are alphanumerics, dashes, and underscores. .It Fl l Cm enable | disable Turn on/off MAC multilabel flag. .It Fl m Ar minfree Specify the percentage of space held back from normal users; the minimum free space threshold. The default value used is 8%. 
Note that lowering the threshold can adversely affect performance: .Bl -bullet .It Settings of 5% and less force space optimization to always be used which will greatly increase the overhead for file writes. .It The file system's ability to avoid fragmentation will be reduced when the total free space, including the reserve, drops below 15%. As free space approaches zero, throughput can degrade by up to a factor of three over the performance obtained at a 10% threshold. .El .Pp If the value is raised above the current usage level, users will be unable to allocate files until enough files have been deleted to get under the higher threshold. .It Fl N Cm enable | disable Turn on/off the administrative NFSv4 ACL enable flag. .It Fl n Cm enable | disable Turn on/off soft updates. .It Fl o Cm space | time The file system can either try to minimize the time spent allocating blocks, or it can attempt to minimize the space fragmentation on the disk. Optimization for space has much higher overhead for file writes. The kernel normally changes the preference automatically as the percent fragmentation changes on the file system. .It Fl p Show a summary of what the current tunable settings are on the selected file system. More detailed information can be obtained from the .Xr dumpfs 8 utility. .It Fl s Ar avgfpdir Specify the expected number of files per directory. .It Fl S Ar size Specify the softdep journal size in bytes. The minimum is 4M. .It Fl t Cm enable | disable Turn on/off the TRIM enable flag. If enabled, and if the underlying device supports the BIO_DELETE command, the file system will send a delete request to the underlying device for each freed block. The trim enable flag is typically set when the underlying device uses flash-memory as the device can use the delete command to pre-zero or at least avoid copying blocks that have been deleted. .Pp Note that this does not trim blocks that are already free. See the .Xr fsck_ffs 8 .Fl E flag. 
.El .Pp At least one of these flags is required. .Sh FILES .Bl -tag -width ".Pa /etc/fstab" .It Pa /etc/fstab read this to determine the device file for a specified mount point. .El .Sh SEE ALSO .Xr fs 5 , +.Xr ffs 7 , +.Xr tuning 7 , .Xr dumpfs 8 , .Xr gjournal 8 , .Xr growfs 8 , .Xr newfs 8 .Rs .%A M. McKusick .%A W. Joy .%A S. Leffler .%A R. Fabry .%T "A Fast File System for UNIX" .%J "ACM Transactions on Computer Systems 2" .%N 3 .%P pp 181-197 .%D August 1984 .%O "(reprinted in the BSD System Manager's Manual, SMM:5)" .Re .Sh HISTORY The .Nm utility appeared in .Bx 4.2 . .Sh BUGS This utility does not work on active file systems. To change the root file system, the system must be rebooted after the file system is tuned. .\" Take this out and a Unix Daemon will dog your steps from now until .\" the time_t's wrap around. .Pp You can tune a file system, but you cannot tune a fish. diff --git a/share/man/man7/tuning.7 b/share/man/man7/tuning.7 index 0756598e79e2..f04500d0f0dc 100644 --- a/share/man/man7/tuning.7 +++ b/share/man/man7/tuning.7 @@ -1,731 +1,732 @@ .\" Copyright (C) 2001 Matthew Dillon. All rights reserved. .\" Copyright (C) 2012 Eitan Adler. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd October 11, 2022 +.Dd November 17, 2023 .Dt TUNING 7 .Os .Sh NAME .Nm tuning .Nd performance tuning under FreeBSD .Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP The swap partition should typically be approximately 2x the size of main memory for systems with less than 4GB of RAM, or approximately equal to the size of main memory if you have more. Keep in mind future memory expansion when sizing the swap partition. Configuring too little swap can lead to inefficiencies in the VM page scanning code as well as create issues later on if you add more memory to your machine. On larger systems with multiple disks, configure swap on each drive. The swap partitions on the drives should be approximately the same size. The kernel can handle arbitrary sizes but internal data structures scale to 4 times the largest swap partition. Keeping the swap partitions near the same size will allow the kernel to optimally stripe swap space across the N disks. Do not worry about overdoing it a little, swap space is the saving grace of .Ux and even if you do not normally use much swap, it can give you more time to recover from a runaway program before being forced to reboot. .Pp It is not a good idea to make one large partition. First, each partition has different operational characteristics and separating them allows the file system to tune itself to those characteristics. 
For example, the root and .Pa /usr partitions are read-mostly, with very little writing, while a lot of reading and writing could occur in .Pa /var/tmp . By properly partitioning your system, fragmentation introduced in the smaller, more heavily write-loaded partitions will not bleed over into the mostly-read partitions. .Pp Properly partitioning your system also allows you to tune .Xr newfs 8 , and .Xr tunefs 8 parameters. The only .Xr tunefs 8 option worth turning on is .Em softupdates with .Dq Li "tunefs -n enable /filesystem" . Softupdates drastically improves meta-data performance, mainly file creation and deletion. We recommend enabling softupdates on most file systems; however, there are two limitations to softupdates that you should be aware of when determining whether to use it on a file system. First, softupdates guarantees file system consistency in the case of a crash but could very easily be several seconds (even a minute!\&) behind on pending writes to the physical disk. If you crash you may lose more work than otherwise. Secondly, softupdates delays the freeing of file system blocks. If you have a file system (such as the root file system) which is close to full, doing a major update of it, e.g.,\& .Dq Li "make installworld" , can run it out of space and cause the update to fail. For this reason, softupdates will not be enabled on the root file system during a typical install. There is no loss of performance since the root file system is rarely written to. .Pp A number of run-time .Xr mount 8 options exist that can help you tune the system. The most obvious and most dangerous one is .Cm async . Only use this option in conjunction with .Xr gjournal 8 , as it is far too dangerous on a normal file system. A less dangerous and more useful .Xr mount 8 option is called .Cm noatime . .Ux file systems normally update the last-accessed time of a file or directory whenever it is accessed.
This operation is handled in .Fx with a delayed write and normally does not create a burden on the system. However, if your system is accessing a huge number of files on a continuing basis the buffer cache can wind up getting polluted with atime updates, creating a burden on the system. For example, if you are running a heavily loaded web site, or a news server with lots of readers, you might want to consider turning off atime updates on your larger partitions with this .Xr mount 8 option. However, you should not gratuitously turn off atime updates everywhere. For example, the .Pa /var file system customarily holds mailboxes, and atime (in combination with mtime) is used to determine whether a mailbox has new mail. You might as well leave atime turned on for mostly read-only partitions such as .Pa / and .Pa /usr as well. This is especially useful for .Pa / since some system utilities use the atime field for reporting. .Sh STRIPING DISKS In larger systems you can stripe partitions from several drives together to create a much larger overall partition. Striping can also improve the performance of a file system by splitting I/O operations across two or more disks. The .Xr gstripe 8 , .Xr gvinum 8 , and .Xr ccdconfig 8 utilities may be used to create simple striped file systems. Generally speaking, striping smaller partitions such as the root and .Pa /var/tmp , or essentially read-only partitions such as .Pa /usr is a complete waste of time. You should only stripe partitions that require serious I/O performance, typically .Pa /var , /home , or custom partitions used to hold databases and web pages. Choosing the proper stripe size is also important. File systems tend to store meta-data on power-of-2 boundaries and you usually want to reduce seeking rather than increase seeking. 
This means you want to use a large off-center stripe size such as 1152 sectors so sequential I/O does not seek both disks and so meta-data is distributed across both disks rather than concentrated on a single disk. .Sh SYSCTL TUNING .Xr sysctl 8 variables permit system behavior to be monitored and controlled at run-time. Some sysctls simply report on the behavior of the system; others allow the system behavior to be modified; some may be set at boot time using .Xr rc.conf 5 , but most will be set via .Xr sysctl.conf 5 . There are several hundred sysctls in the system, including many that appear to be candidates for tuning but actually are not. In this document we will only cover the ones that have the greatest effect on the system. .Pp The .Va vm.overcommit sysctl defines the overcommit behaviour of the vm subsystem. The virtual memory system always does accounting of the swap space reservation, both in total for the system and per user. Corresponding values are available through sysctl .Va vm.swap_total , that gives the total bytes available for swapping, and .Va vm.swap_reserved , that gives the number of bytes that may be needed to back all currently allocated anonymous memory. .Pp Setting bit 0 of the .Va vm.overcommit sysctl causes the virtual memory system to return failure to the process when allocation of memory causes .Va vm.swap_reserved to exceed .Va vm.swap_total . Bit 1 of the sysctl enforces .Dv RLIMIT_SWAP limit (see .Xr getrlimit 2 ) . Root is exempt from this limit. Bit 2 allows most of the physical memory to be counted as allocatable, except wired and free reserved pages (accounted by .Va vm.stats.vm.v_free_target and .Va vm.stats.vm.v_wire_count sysctls, respectively). .Pp The .Va kern.ipc.maxpipekva loader tunable is used to set a hard limit on the amount of kernel address space allocated to mapping of pipe buffers.
Use of the mapping allows the kernel to eliminate a copy of the data from writer address space into the kernel, directly copying the content of mapped buffer to the reader. Increasing this value to a higher setting, such as `25165824', might improve performance on systems where space for mapping pipe buffers is quickly exhausted. This exhaustion is not fatal, however, and it will only cause pipes to fall back to using double-copy. .Pp The .Va kern.ipc.shm_use_phys sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on). Setting this parameter to 1 will cause all System V shared memory segments to be mapped to unpageable physical RAM. This feature only has an effect if you are either (A) mapping small amounts of shared memory across many (hundreds of) processes, or (B) mapping large amounts of shared memory across any number of processes. This feature allows the kernel to remove a great deal of internal memory management page-tracking overhead at the cost of wiring the shared memory into core, making it unswappable. .Pp The .Va vfs.vmiodirenable sysctl defaults to 1 (on). This parameter controls how directories are cached by the system. Most directories are small and use but a single fragment (typically 2K) in the file system and even less (typically 512 bytes) in the buffer cache. However, when operating in the default mode the buffer cache will only cache a fixed number of directories even if you have a huge amount of memory. Turning on this sysctl allows the buffer cache to use the VM Page Cache to cache the directories. The advantage is that all of memory is now available for caching directories. The disadvantage is that the minimum in-core memory used to cache a directory is the physical page size (typically 4K) rather than 512 bytes. We recommend turning this option off in memory-constrained environments; however, when on, it will substantially improve the performance of services that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems. Turning on this option will generally not reduce performance even with the wasted memory but you should experiment to find out. .Pp The .Va vfs.write_behind sysctl defaults to 1 (on). This tells the file system to issue media writes as full clusters are collected, which typically occurs when writing large sequential files. The idea is to avoid saturating the buffer cache with dirty buffers when it would not benefit I/O performance. However, this may stall processes and under certain circumstances you may wish to turn it off. .Pp The .Va vfs.hirunningspace sysctl determines how much outstanding write I/O may be queued to disk controllers system-wide at any given time. It is used by the UFS file system. The default is self-tuned and usually sufficient but on machines with advanced controllers and lots of disks this may be tuned up to match what the controllers buffer. Configuring this setting to match tagged queuing capabilities of controllers or drives with average IO size used in production works best (for example: 16 MiB will use 128 tags with IO requests of 128 KiB). Note that setting too high a value (exceeding the buffer cache's write threshold) can lead to extremely bad clustering performance. Do not set this value arbitrarily high! Higher write queuing values may also add latency to reads occurring at the same time. .Pp The .Va vfs.read_max sysctl governs VFS read-ahead and is expressed as the number of blocks to pre-read if the heuristics algorithm decides that the reads are issued sequentially. It is used by the UFS, ext2fs and msdosfs file systems. With the default UFS block size of 32 KiB, a setting of 64 will allow speculatively reading up to 2 MiB. This setting may be increased to get around disk I/O latencies, especially where these latencies are large such as in virtual machine emulated environments. 
It may be tuned down in specific cases where the I/O load is such that read-ahead adversely affects performance or where system memory is very low. .Pp The .Va vfs.ncsizefactor sysctl defines how large the VFS namecache may grow. The number of currently allocated entries in the namecache is provided by the .Va debug.numcache sysctl and the condition debug.numcache < kern.maxvnodes * vfs.ncsizefactor is adhered to. .Pp The .Va vfs.ncnegfactor sysctl defines how many negative entries the VFS namecache is allowed to create. The number of currently allocated negative entries is provided by the .Va debug.numneg sysctl and the condition vfs.ncnegfactor * debug.numneg < debug.numcache is adhered to. .Pp There are various other buffer-cache and VM page cache related sysctls. We do not recommend modifying these values. .Pp The .Va net.inet.tcp.sendspace and .Va net.inet.tcp.recvspace sysctls are of particular interest if you are running network intensive applications. They control the amount of send and receive buffer space allowed for any given TCP connection. The default sending buffer is 32K; the default receiving buffer is 64K. You can often improve bandwidth utilization by increasing the default at the cost of eating up more kernel memory for each connection. We do not recommend increasing the defaults if you are serving hundreds or thousands of simultaneous connections because it is possible to quickly run the system out of memory due to stalled connections building up. But if you need high bandwidth over fewer connections, especially if you have gigabit Ethernet, increasing these defaults can make a huge difference. You can adjust the buffer size for incoming and outgoing data separately. For example, if your machine is primarily doing web serving you may want to decrease the recvspace in order to be able to increase the sendspace without eating too much kernel memory.
Note that the routing table (see .Xr route 8 ) can be used to introduce route-specific send and receive buffer size defaults. .Pp As an additional management tool you can use pipes in your firewall rules (see .Xr ipfw 8 ) to limit the bandwidth going to or from particular IP blocks or ports. For example, if you have a T1 you might want to limit your web traffic to 70% of the T1's bandwidth in order to leave the remainder available for mail and interactive use. Normally a heavily loaded web server will not introduce significant latencies into other services even if the network link is maxed out, but enforcing a limit can smooth things out and lead to longer term stability. Many people also enforce artificial bandwidth limitations in order to ensure that they are not charged for using too much bandwidth. .Pp Setting the send or receive TCP buffer to values larger than 65535 will result in only a marginal performance improvement unless both hosts support the window scaling extension of the TCP protocol, which is controlled by the .Va net.inet.tcp.rfc1323 sysctl. These extensions should be enabled and the TCP buffer size should be set to a value larger than 65536 in order to obtain good performance from certain types of network links; specifically, gigabit WAN links and high-latency satellite links. RFC1323 support is enabled by default. .Pp The .Va net.inet.tcp.always_keepalive sysctl determines whether or not the TCP implementation should attempt to detect dead TCP connections by intermittently delivering .Dq keepalives on the connection. By default, this is enabled for all applications; by setting this sysctl to 0, only applications that specifically request keepalives will use them. In most environments, TCP keepalives will improve the management of system state by expiring dead TCP connections, particularly for systems serving dialup users who may not always terminate individual TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be incorrectly identified as dead sessions, resulting in unexpectedly terminated TCP connections. In such environments, setting the sysctl to 0 may reduce the occurrence of TCP session disconnections. .Pp The .Va net.inet.tcp.delayed_ack TCP feature is largely misunderstood. Historically speaking, this feature was designed to allow the acknowledgement of transmitted data to be returned along with the response. For example, when you type over a remote shell, the acknowledgement of the character you send can be returned along with the data representing the echo of the character. With delayed acks turned off, the acknowledgement may be sent in its own packet, before the remote service has a chance to echo the data it just received. This same concept also applies to any interactive protocol (e.g.,\& SMTP, WWW, POP3), and can cut the number of tiny packets flowing across the network in half. The .Fx delayed ACK implementation also follows the TCP protocol rule that at least every other packet be acknowledged even if the standard 40ms timeout has not yet passed. Normally the worst a delayed ACK can do is slightly delay the teardown of a connection, or slightly delay the ramp-up of a slow-start TCP connection. While we are not sure, we believe that the several FAQs related to packages such as SAMBA and SQUID that advise turning off delayed acks may be referring to the slow-start issue. .Pp The .Va net.inet.ip.portrange.* sysctls control the port number ranges automatically bound to TCP and UDP sockets. There are three ranges: a low range, a default range, and a high range, selectable via the .Dv IP_PORTRANGE .Xr setsockopt 2 call. Most network programs use the default range which is controlled by .Va net.inet.ip.portrange.first and .Va net.inet.ip.portrange.last , which default to 49152 and 65535, respectively.
Bound port ranges are used for outgoing connections, and it is possible to run the system out of ports under certain circumstances. This most commonly occurs when you are running a heavily loaded web proxy. The port range is not an issue when running a server which handles mainly incoming connections, such as a normal web server, or has a limited number of outgoing connections, such as a mail relay. For situations where you may run out of ports, we recommend decreasing .Va net.inet.ip.portrange.first modestly. A range of 10000 to 30000 ports may be reasonable. You should also consider firewall effects when changing the port range. Some firewalls may block large ranges of ports (usually low-numbered ports) and expect systems to use higher ranges of ports for outgoing connections. By default .Va net.inet.ip.portrange.last is set at the maximum allowable port number. .Pp The .Va kern.ipc.soacceptqueue sysctl limits the size of the listen queue for accepting new TCP connections. The default value of 128 is typically too low for robust handling of new connections in a heavily loaded web server environment. For such environments, we recommend increasing this value to 1024 or higher. The service daemon may itself limit the listen queue size (e.g.,\& .Xr sendmail 8 , apache) but will often have a directive in its configuration file to adjust the queue size up. Larger listen queues also do a better job of fending off denial of service attacks. .Pp The .Va kern.maxfiles sysctl determines how many open files the system supports. The default is typically a few thousand but you may need to bump this up to ten or twenty thousand if you are running databases or large descriptor-heavy daemons. The read-only .Va kern.openfiles sysctl may be interrogated to determine the current number of open files on the system. .Pp The .Va vm.swap_idle_enabled sysctl is useful in large multi-user systems where you have lots of users entering and leaving the system and lots of idle processes. 
Such systems tend to generate a great deal of continuous pressure on free memory reserves. Turning this feature on and adjusting the swapout hysteresis (in idle seconds) via .Va vm.swap_idle_threshold1 and .Va vm.swap_idle_threshold2 allows you to depress the priority of pages associated with idle processes more quickly than the normal pageout algorithm. This gives a helping hand to the pageout daemon. Do not turn this option on unless you need it, because the tradeoff you are making is to essentially pre-page memory sooner rather than later, eating more swap and disk bandwidth. In a small system this option will have a detrimental effect but in a large system that is already doing moderate paging this option allows the VM system to stage whole processes into and out of memory more easily. .Sh LOADER TUNABLES Some aspects of the system behavior may not be tunable at runtime because memory allocations they perform must occur early in the boot process. To change loader tunables, you must set their values in .Xr loader.conf 5 and reboot the system. .Pp .Va kern.maxusers controls the scaling of a number of static system tables, including defaults for the maximum number of open files, sizing of network memory resources, etc. .Va kern.maxusers is automatically sized at boot based on the amount of memory available in the system, and may be determined at run-time by inspecting the value of the read-only .Va kern.maxusers sysctl. .Pp The .Va kern.dfldsiz and .Va kern.dflssiz tunables set the default soft limits for process data and stack size respectively. Processes may increase these up to the hard limits by calling .Xr setrlimit 2 . The .Va kern.maxdsiz , .Va kern.maxssiz , and .Va kern.maxtsiz tunables set the hard limits for process data, stack, and text size respectively; processes may not exceed these limits. The .Va kern.sgrowsiz tunable controls how much the stack segment will grow when a process needs to allocate more stack.
.Pp .Va kern.ipc.nmbclusters may be adjusted to increase the number of network mbufs the system is willing to allocate. Each cluster represents approximately 2K of memory, so a value of 1024 represents 2M of kernel memory reserved for network buffers. You can do a simple calculation to figure out how many you need. If you have a web server which maxes out at 1000 simultaneous connections, and each connection eats a 16K receive and 16K send buffer, you need approximately 32MB worth of network buffers to deal with it. A good rule of thumb is to multiply by 2, so 32MB x 2 = 64MB, and 64MB / 2K = 32768. So for this case you would want to set .Va kern.ipc.nmbclusters to 32768. We recommend values between 1024 and 4096 for machines with moderate amounts of memory, and between 4096 and 32768 for machines with greater amounts of memory. Under no circumstances should you specify an arbitrarily high value for this parameter, because it could lead to a boot-time crash. The .Fl m option to .Xr netstat 1 may be used to observe network cluster use. .Pp More and more programs are using the .Xr sendfile 2 system call to transmit files over the network. The .Va kern.ipc.nsfbufs sysctl controls the number of file system buffers .Xr sendfile 2 is allowed to use to perform its work. This parameter nominally scales with .Va kern.maxusers so you should not need to modify this parameter except under extreme circumstances. See the .Sx TUNING section in the .Xr sendfile 2 manual page for details. .Sh KERNEL CONFIG TUNING There are a number of kernel options that you may have to fiddle with in a large-scale system. In order to change these options you need to be able to compile a new kernel from source. The .Xr config 8 manual page and the handbook are good starting points for learning how to do this. Generally the first thing you do when creating your own custom kernel is to strip out all the drivers and services you do not use.
Removing things like .Dv INET6 and drivers you do not have will reduce the size of your kernel, sometimes by a megabyte or more, leaving more memory available for applications. .Pp .Dv SCSI_DELAY may be used to reduce system boot times. The defaults are fairly high and can be responsible for 5+ seconds of delay in the boot process. Reducing .Dv SCSI_DELAY to something below 5 seconds could work (especially with modern drives). .Pp There are a number of .Dv *_CPU options that can be commented out. If you only want the kernel to run on a Pentium class CPU, you can easily remove .Dv I486_CPU , but only remove .Dv I586_CPU if you are sure your CPU is being recognized as a Pentium II or better. Some clones may be recognized as a Pentium or even a 486 and not be able to boot without those options. If it works, great! The operating system will be able to better use higher-end CPU features for MMU, task switching, timebase, and even device operations. Additionally, higher-end CPUs support 4MB MMU pages, which the kernel uses to map the kernel itself into memory, increasing its efficiency under heavy syscall loads. .Sh CPU, MEMORY, DISK, NETWORK The type of tuning you do depends heavily on where your system begins to bottleneck as load increases. If your system runs out of CPU (idle times are perpetually 0%) then you need to consider upgrading the CPU or perhaps you need to revisit the programs that are causing the load and try to optimize them. If your system is paging to swap a lot you need to consider adding more memory. If your system is saturating the disk you typically see high CPU idle times and total disk saturation. .Xr systat 1 can be used to monitor this. There are many solutions to saturated disks: increasing memory for caching, mirroring disks, distributing operations across several machines, and so forth. .Pp Finally, you might run out of network suds. Optimize the network path as much as possible. 
For example, in .Xr firewall 7 we describe a firewall protecting internal hosts with a topology where the externally visible hosts are not routed through it. Most bottlenecks occur at the WAN link. If expanding the link is not an option it may be possible to use the .Xr dummynet 4 feature to implement peak shaving or other forms of traffic shaping to prevent the overloaded service (such as web services) from affecting other services (such as email), or vice versa. In home installations this could be used to give interactive traffic (your browser, .Xr ssh 1 logins) priority over services you export from your box (web services, email). .Sh SEE ALSO .Xr netstat 1 , .Xr systat 1 , .Xr sendfile 2 , .Xr ata 4 , .Xr dummynet 4 , .Xr eventtimers 4 , .Xr login.conf 5 , .Xr rc.conf 5 , .Xr sysctl.conf 5 , +.Xr ffs 7 , .Xr firewall 7 , .Xr hier 7 , .Xr ports 7 , .Xr boot 8 , .Xr bsdinstall 8 , .Xr ccdconfig 8 , .Xr config 8 , .Xr fsck 8 , .Xr gjournal 8 , .Xr gpart 8 , .Xr gstripe 8 , .Xr gvinum 8 , .Xr ifconfig 8 , .Xr ipfw 8 , .Xr loader 8 , .Xr mount 8 , .Xr newfs 8 , .Xr route 8 , .Xr sysctl 8 , .Xr tunefs 8 .Sh HISTORY The .Nm manual page was originally written by .An Matthew Dillon and first appeared in .Fx 4.3 , May 2001. The manual page was greatly modified by .An Eitan Adler Aq Mt eadler@FreeBSD.org .