diff --git a/en_US.ISO8859-1/books/handbook/config/chapter.sgml b/en_US.ISO8859-1/books/handbook/config/chapter.sgml index 14cdcae0dd..45b0e0e19d 100644 --- a/en_US.ISO8859-1/books/handbook/config/chapter.sgml +++ b/en_US.ISO8859-1/books/handbook/config/chapter.sgml @@ -1,3100 +1,3100 @@ Chern Lee Written by Mike Smith Based on a tutorial written by Matt Dillon Also based on tuning(7) written by Configuration and Tuning Synopsis system configuration system optimization One of the important aspects of &os; is system configuration. Correct system configuration will help prevent headaches during future upgrades. This chapter will explain much of the &os; configuration process, including some of the parameters which can be set to tune a &os; system. After reading this chapter, you will know: How to efficiently work with file systems and swap partitions. The basics of rc.conf configuration and /usr/local/etc/rc.d startup systems. How to configure and test a network card. How to configure virtual hosts on your network devices. How to use the various configuration files in /etc. How to tune &os; using sysctl variables. How to tune disk performance and modify kernel limitations. Before reading this chapter, you should: Understand &unix; and &os; basics (). Be familiar with the basics of kernel configuration/compilation (). Initial Configuration Partition Layout partition layout /etc /var /usr Base Partitions When laying out file systems with &man.disklabel.8; or &man.sysinstall.8;, remember that hard drives transfer data faster from the outer tracks to the inner. Thus smaller and heavier-accessed file systems should be closer to the outside of the drive, while larger partitions like /usr should be placed toward the inner. It is a good idea to create partitions in a similar order to: root, swap, /var, /usr. The size of /var reflects the intended machine usage. /var is used to hold mailboxes, log files, and printer spools. Mailboxes and log files can grow to unexpected sizes depending on how many users exist and how long log files are kept. Most users would never require a gigabyte, but remember that /var/tmp must be large enough to contain packages. The /usr partition holds much of the files required to support the system, the &man.ports.7; collection (recommended) and the source code (optional). Both of which are optional at install time. At least 2 gigabytes would be recommended for this partition. When selecting partition sizes, keep the space requirements in mind. Running out of space in one partition while barely using another can be a hassle. Some users have found that &man.sysinstall.8;'s Auto-defaults partition sizer will sometimes select smaller than adequate /var and / partitions. Partition wisely and generously. Swap Partition swap sizing swap partition As a rule of thumb, the swap partition should be about double the size of system memory (RAM). For example, if the machine has 128 megabytes of memory, the swap file should be 256 megabytes. Systems with less memory may perform better with more swap. Less than 256 megabytes of swap is not recommended and memory expansion should be considered. The kernel's VM paging algorithms are tuned to perform best when the swap partition is at least two times the size of main memory. Configuring too little swap can lead to inefficiencies in the VM page scanning code and might create issues later if more memory is added. On larger systems with multiple SCSI disks (or multiple IDE disks operating on different controllers), it is recommend that a swap is configured on each drive (up to four drives). The swap partitions should be approximately the same size. The kernel can handle arbitrary sizes but internal data structures scale to 4 times the largest swap partition. Keeping the swap partitions near the same size will allow the kernel to optimally stripe swap space across disks. Large swap sizes are fine, even if swap is not used much. It might be easier to recover from a runaway program before being forced to reboot. Why Partition? Several users think a single large partition will be fine, but there are several reasons why this is a bad idea. First, each partition has different operational characteristics and separating them allows the file system to tune accordingly. For example, the root and /usr partitions are read-mostly, without much writing. While a lot of reading and writing could occur in /var and /var/tmp. By properly partitioning a system, fragmentation introduced in the smaller write heavy partitions will not bleed over into the mostly-read partitions. Keeping the write-loaded partitions closer to the disk's edge, will increase I/O performance in the partitions where it occurs the most. Now while I/O performance in the larger partitions may be needed, shifting them more toward the edge of the disk will not lead to a significant performance improvement over moving /var to the edge. Finally, there are safety concerns. A smaller, neater root partition which is mostly read-only has a greater chance of surviving a bad crash. Core Configuration rc files rc.conf The principal location for system configuration information is within /etc/rc.conf. This file contains a wide range of configuration information, principally used at system startup to configure the system. Its name directly implies this; it is configuration information for the rc* files. An administrator should make entries in the rc.conf file to override the default settings from /etc/defaults/rc.conf. The defaults file should not be copied verbatim to /etc - it contains default values, not examples. All system-specific changes should be made in the rc.conf file itself. A number of strategies may be applied in clustered applications to separate site-wide configuration from system-specific configuration in order to keep administration overhead down. The recommended approach is to place site-wide configuration into another file, such as /etc/rc.conf.site, and then include this file into /etc/rc.conf, which will contain only system-specific information. As rc.conf is read by &man.sh.1; it is trivial to achieve this. For example: rc.conf: . rc.conf.site hostname="node15.example.com" network_interfaces="fxp0 lo0" ifconfig_fxp0="inet 10.1.1.1" rc.conf.site: defaultrouter="10.1.1.254" saver="daemon" blanktime="100" The rc.conf.site file can then be distributed to every system using rsync or a similar program, while the rc.conf file remains unique. Upgrading the system using &man.sysinstall.8; or make world will not overwrite the rc.conf file, so system configuration information will not be lost. Application Configuration Typically, installed applications have their own configuration files, with their own syntax, etc. It is important that these files be kept separate from the base system, so that they may be easily located and managed by the package management tools. /usr/local/etc Typically, these files are installed in /usr/local/etc. In the case where an application has a large number of configuration files, a subdirectory will be created to hold them. Normally, when a port or package is installed, sample configuration files are also installed. These are usually identified with a .default suffix. If there are no existing configuration files for the application, they will be created by copying the .default files. For example, consider the contents of the directory /usr/local/etc/apache: -rw-r--r-- 1 root wheel 2184 May 20 1998 access.conf -rw-r--r-- 1 root wheel 2184 May 20 1998 access.conf.default -rw-r--r-- 1 root wheel 9555 May 20 1998 httpd.conf -rw-r--r-- 1 root wheel 9555 May 20 1998 httpd.conf.default -rw-r--r-- 1 root wheel 12205 May 20 1998 magic -rw-r--r-- 1 root wheel 12205 May 20 1998 magic.default -rw-r--r-- 1 root wheel 2700 May 20 1998 mime.types -rw-r--r-- 1 root wheel 2700 May 20 1998 mime.types.default -rw-r--r-- 1 root wheel 7980 May 20 1998 srm.conf -rw-r--r-- 1 root wheel 7933 May 20 1998 srm.conf.default The file sizes show that only the srm.conf file has been changed. A later update of the Apache port would not overwrite this changed file. Tom Rhodes Contributed by Starting Services services Many users choose to install third party software on &os; from the Ports Collection. In many of these situations it may be necessary to configure the software in a manner which will allow it to be started upon system initialization. Services, such as mail/postfix or www/apache13 are just two of the many software packages which may be started during system initialization. This section explains the procedures available for starting third party software. In &os;, most included services, such as &man.cron.8;, are started through the system start up scripts. These scripts may differ depending on &os; or vendor version; however, the most important aspect to consider is that their start up configuration can be handled through simple startup scripts. Before the advent of rcNG, applications would drop a simple start up script into the /usr/local/etc/rc.d directory which would be read by the system initialization scripts. These scripts would then be executed during the latter stages of system start up. While many individuals have spent hours trying to merge the old configuration style into the new system, the fact remains that some third party utilities still require a script simply dropped into the aforementioned directory. The subtle differences in the scripts depend whether or not rcNG is being used. Prior to &os; 5.1 the old configuration style is used and in almost all cases a new style script would do just fine. While every script must meet some minimal requirements, most of the time these requirements are &os; version agnostic. Each script must have a .sh extension appended to the end and every script must be executable by the system. The latter may be achieved by using the chmod command and setting the unique permissions of 755. There should also be, at minimal, an option to start the application and an option to stop the application. The simplest start up script would probably look a little bit like this one: #!/bin/sh echo -n ' utility' case "$1" in start) /usr/local/bin/utility ;; stop) kill -9 `cat /var/run/utility.pid` ;; *) echo "Usage: `basename $0` {start|stop}" >&2 exit 64 ;; esac exit 0 This script provides for a stop and start option for the application hereto referred simply as utility. Could be started manually with: &prompt.root; /usr/local/etc/rc.d/utility.sh start While not all third party software requires the line in rc.conf, almost every day a new port will be modified to accept this configuration. Check the final output of the installation for more information on a specific application. Some third party software will provide start up scripts which permit the application to be used with rcNG; although, this will be discussed in the next section. Extended Application Configuration Now that &os; includes rcNG, configuration of application start up has become more optimal; indeed, it has become a bit more in depth. Using the key words discussed in the rcNG section, applications may now be set to start after certain other services for example DNS; may permit extra flags to be passed through rc.conf in place of hard coded flags in the start up script, etc. A basic script may look similar to the following: #!/bin/sh # # PROVIDE: utility # REQUIRE: DAEMON # BEFORE: LOGIN # KEYWORD: FreeBSD shutdown # # DO NOT CHANGE THESE DEFAULT VALUES HERE # SET THEM IN THE /etc/rc.conf FILE # utility_enable=${utility_enable-"NO"} utility_flags=${utility_flags-""} utility_pidfile=${utility_pidfile-"/var/run/utility.pid"} . /etc/rc.subr name="utility" rcvar=`set_rcvar` command="/usr/local/sbin/utility" load_rc_config $name pidfile="${utility_pidfile}" start_cmd="echo \"Starting ${name}.\"; /usr/bin/nice -5 ${command} ${utility_flags} ${command_args}" run_rc_command "$1" This script will ensure that the provided utility will be started before the login service but after the daemon service. It also provides a method for setting and tracking the PID, or process ID file. This application could then have the following line placed in /etc/rc.conf: utility_enable="YES" This new method also allows for easier manipulation of the command line arguments, inclusion of the default functions provided in /etc/rc.subr, compatibility with the &man.rcorder.8; utility and provide for easier configuration via the rc.conf file. In essence, this script could even be placed in /etc/rc.d directory. Yet, that has the potential to upset the &man.mergemaster.8; utility when used in conjunction with software upgrades. Using Services to Start Services Other services, such as POP3 server daemons, IMAP, etc. could be started using the &man.inetd.8;. This involves installing the service utility from the Ports Collection with a configuration line appended to the /etc/inetd.conf file, or uncommenting one of the current configuration lines. Working with inetd and its configuration is described in depth in the inetd section. In some cases, it may be more plausible to use the &man.cron.8; daemon to start system services. This approach has a number of advantages because cron runs these processes as the crontab's file owner. This allows regular users to start and maintain some applications. The cron utility provides a unique feature, @reboot, which may be used in place of the time specification. This will cause the job to be run when &man.cron.8; is started, normally during system initialization. Tom Rhodes Contributed by Configuring the <command>cron</command> Utility cron configuration One of the most useful utilities in &os; is &man.cron.8;. The cron utility runs in the background and constantly checks the /etc/crontab file. The cron utility also checks the /var/cron/tabs directory, in search of new crontab files. These crontab files store information about specific functions which cron is supposed to perform at certain times. The cron utility uses two different types of configuration files, the system crontab and user crontabs. The only difference between these two formats is the sixth field. In the system crontab, the sixth field is the name of a user for the command to run as. This gives the system crontab the ability to run commands as any user. In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature. User crontabs allow individual users to schedule tasks without the need for root privileges. Commands in a user's crontab run with the permissions of the user who owns the crontab. The root user can have a user crontab just like any other user. This one is different from /etc/crontab (the system crontab). Because of the system crontab, there is usually no need to create a user crontab for root. Let us take a look at the /etc/crontab file (the system crontab): # /etc/crontab - root's crontab for &os; # # $&os;: src/etc/crontab,v 1.32 2002/11/22 16:13:39 tom Exp $ # # SHELL=/bin/sh PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin HOME=/var/log # # #minute hour mday month wday who command # # */5 * * * * root /usr/libexec/atrun Like most &os; configuration files, the # character represents a comment. A comment can be placed in the file as a reminder of what and why a desired action is performed. Comments cannot be on the same line as a command or else they will be interpreted as part of the command; they must be on a new line. Blank lines are ignored. First, the environment must be defined. The equals (=) character is used to define any environment settings, as with this example where it is used for the SHELL, PATH, and HOME options. If the shell line is omitted, cron will use the default, which is sh. If the PATH variable is omitted, no default will be used and file locations will need to be absolute. If HOME is omitted, cron will use the invoking users home directory. This line defines a total of seven fields. Listed here are the values minute, hour, mday, month, wday, who, and command. These are almost all self explanatory. minute is the time in minutes the command will be run. hour is similar to the minute option, just in hours. mday stands for day of the month. month is similar to hour and minute, as it designates the month. The wday option stands for day of the week. All these fields must be numeric values, and follow the twenty-four hour clock. The who field is special, and only exists in the /etc/crontab file. This field specifies which user the command should be run as. When a user installs his or her crontab file, they will not have this option. Finally, the command option is listed. This is the last field, so naturally it should designate the command to be executed. This last line will define the values discussed above. Notice here we have a */5 listing, followed by several more * characters. These * characters mean first-last, and can be interpreted as every time. So, judging by this line, it is apparent that the atrun command is to be invoked by root every five minutes regardless of what day or month it is. For more information on the atrun command, see the &man.atrun.8; manual page. Commands can have any number of flags passed to them; however, commands which extend to multiple lines need to be broken with the backslash \ continuation character. This is the basic set up for every crontab file, although there is one thing different about this one. Field number six, where we specified the username, only exists in the system /etc/crontab file. This field should be omitted for individual user crontab files. Installing a Crontab You must not use the procedure described here to edit/install the system crontab. Simply use your favorite editor: the cron utility will notice that the file has changed and immediately begin using the updated version. See this FAQ entry for more information. To install a freshly written user crontab, first use your favorite editor to create a file in the proper format, and then use the crontab utility. The most common usage is: &prompt.user; crontab crontab-file In this example, crontab-file is the filename of a crontab that was previously created. There is also an option to list installed crontab files: just pass the option to crontab and look over the output. For users who wish to begin their own crontab file from scratch, without the use of a template, the crontab -e option is available. This will invoke the selected editor with an empty file. When the file is saved, it will be automatically installed by the crontab command. If you later want to remove your user crontab completely, use crontab with the option. Tom Rhodes Contributed by Using rc under FreeBSD 5.X rcNG &os; has recently integrated the NetBSD rc.d system for system initialization. Users should notice the files listed in the /etc/rc.d directory. Many of these files are for basic services which can be controlled with the , , and options. For instance, &man.sshd.8; can be restarted with the following command: &prompt.root; /etc/rc.d/sshd restart This procedure is similar for other services. Of course, services are usually started automatically as specified in &man.rc.conf.5;. For example, enabling the Network Address Translation daemon at startup is as simple as adding the following line to /etc/rc.conf: natd_enable="YES" If a line is already present, then simply change the to . The rc scripts will automatically load any other dependent services during the next reboot, as described below. Since the rc.d system is primarily intended to start/stop services at system startup/shutdown time, the standard , and options will only perform their action if the appropriate /etc/rc.conf variables are set. For instance the above sshd restart command will only work if sshd_enable is set to in /etc/rc.conf. To , or a service regardless of the settings in /etc/rc.conf, the commands should be prefixed with force. For instance to restart sshd regardless of the current /etc/rc.conf setting, execute the following command: &prompt.root; /etc/rc.d/sshd forcerestart It is easy to check if a service is enabled in /etc/rc.conf by running the appropriate rc.d script with the option . Thus, an administrator can check that sshd is in fact enabled in /etc/rc.conf by running: &prompt.root; /etc/rc.d/sshd rcvar # sshd $sshd_enable=YES The second line (# sshd) is the output from the sshd command, not a root console. To determine if a service is running, a option is available. For instance to verify that sshd is actually started: &prompt.root; /etc/rc.d/sshd status sshd is running as pid 433. It is also possible to a service. This will attempt to send a signal to an individual service, forcing the service to reload its configuration files. In most cases this means sending the service a SIGHUP signal. The rcNG structure is not only used for network services, it also contributes to most of the system initialization. For instance, consider the bgfsck file. When this script is executed, it will print out the following message: Starting background file system checks in 60 seconds. Therefore this file is used for background file system checks, which are done only during system initialization. Many system services depend on other services to function properly. For example, NIS and other RPC-based services may fail to start until after the rpcbind (portmapper) service has started. To resolve this issue, information about dependencies and other meta-data is included in the comments at the top of each startup script. The &man.rcorder.8; program is then used to parse these comments during system initialization to determine the order in which system services should be invoked to satisfy the dependencies. The following words may be included at the top of each startup file: PROVIDE: Specifies the services this file provides. REQUIRE: Lists services which are required for this service. This file will run after the specified services. BEFORE: Lists services which depend on this service. This file will run before the specified services. KEYWORD: &os; or NetBSD. This is used for *BSD dependent features. By using this method, an administrator can easily control system services without the hassle of runlevels like some other &unix; operating systems. Additional information about the &os; 5.X rc.d system can be found in the &man.rc.8; and &man.rc.subr.8; manual pages. Marc Fonvieille Contributed by Setting Up Network Interface Cards network cards configuration Nowadays we can not think about a computer without thinking about a network connection. Adding and configuring a network card is a common task for any &os; administrator. Locating the Correct Driver network cards driver Before you begin, you should know the model of the card you have, the chip it uses, and whether it is a PCI or ISA card. &os; supports a wide variety of both PCI and ISA cards. Check the Hardware Compatibility List for your release to see if your card is supported. Once you are sure your card is supported, you need to determine the proper driver for the card. /usr/src/sys/conf/NOTES and /usr/src/sys/arch/conf/NOTES will give you the list of network interface drivers with some information about the supported chipsets/cards. If you have doubts about which driver is the correct one, read the manual page of the driver. The manual page will give you more information about the supported hardware and even the possible problems that could occur. NOTES does not exist on &os; 4.X. Instead, check the LINT file for information about various network interfaces. See for a more detailed summary of NOTES versus LINT. If you own a common card, most of the time you will not have to look very hard for a driver. Drivers for common network cards are present in the GENERIC kernel, so your card should show up during boot, like so: dc0: <82c169 PNIC 10/100BaseTX> port 0xa000-0xa0ff mem 0xd3800000-0xd38 000ff irq 15 at device 11.0 on pci0 dc0: Ethernet address: 00:a0:cc:da:da:da miibus0: <MII bus> on dc0 ukphy0: <Generic IEEE 802.3u media interface> on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto dc1: <82c169 PNIC 10/100BaseTX> port 0x9800-0x98ff mem 0xd3000000-0xd30 000ff irq 11 at device 12.0 on pci0 dc1: Ethernet address: 00:a0:cc:da:da:db miibus1: <MII bus> on dc1 ukphy1: <Generic IEEE 802.3u media interface> on miibus1 ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto In this example, we see that two cards using the &man.dc.4; driver are present on the system. If the driver for your NIC is not present in GENERIC, you will need to load the proper driver to use your NIC. This may be accomplished in one of two ways: The easiest way is to simply load a kernel module for your network card with &man.kldload.8;. Not all NIC drivers are available as modules; notable examples of devices for which modules do not exist are ISA cards. Alternatively, you may statically compile the support for your card into your kernel. Check /usr/src/sys/conf/NOTES, /usr/src/sys/arch/conf/NOTES and the manual page of the driver to know what to add in your kernel configuration file. For more information about recompiling your kernel, please see . If your card was detected at boot by your kernel (GENERIC) you do not have to build a new kernel. Configuring the Network Card network cards configuration Once the right driver is loaded for the network card, the card needs to be configured. As with many other things, the network card may have been configured at installation time by sysinstall. To display the configuration for the network interfaces on your system, enter the following command: &prompt.user; ifconfig dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 inet 192.168.1.3 netmask 0xffffff00 broadcast 192.168.1.255 ether 00:a0:cc:da:da:da media: Ethernet autoselect (100baseTX <full-duplex>) status: active dc1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255 ether 00:a0:cc:da:da:db media: Ethernet 10baseT/UTP status: no carrier lp0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> mtu 1500 lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 inet 127.0.0.1 netmask 0xff000000 tun0: flags=8010<POINTOPOINT,MULTICAST> mtu 1500 Old versions of &os; may require the option following &man.ifconfig.8;, for more details about the correct syntax of &man.ifconfig.8;, please refer to the manual page. Note also that entries concerning IPv6 (inet6 etc.) were omitted in this example. In this example, the following devices were displayed: dc0: The first Ethernet interface dc1: The second Ethernet interface lp0: The parallel port interface lo0: The loopback device tun0: The tunnel device used by ppp &os; uses the driver name followed by the order in which one the card is detected at the kernel boot to name the network card. For example sis2 would be the third network card on the system using the &man.sis.4; driver. In this example, the dc0 device is up and running. The key indicators are: UP means that the card is configured and ready. The card has an Internet (inet) address (in this case 192.168.1.3). It has a valid subnet mask (netmask; 0xffffff00 is the same as 255.255.255.0). It has a valid broadcast address (in this case, 192.168.1.255). The MAC address of the card (ether) is 00:a0:cc:da:da:da The physical media selection is on autoselection mode (media: Ethernet autoselect (100baseTX <full-duplex>)). We see that dc1 was configured to run with 10baseT/UTP media. For more information on available media types for a driver, please refer to its manual page. The status of the link (status) is active, i.e. the carrier is detected. For dc1, we see status: no carrier. This is normal when an Ethernet cable is not plugged into the card. If the &man.ifconfig.8; output had shown something similar to: dc0: flags=8843<BROADCAST,SIMPLEX,MULTICAST> mtu 1500 ether 00:a0:cc:da:da:da it would indicate the card has not been configured. To configure your card, you need root privileges. The network card configuration can be done from the command line with &man.ifconfig.8; but you would have to do it after each reboot of the system. The file /etc/rc.conf is where to add the network card's configuration. Open /etc/rc.conf in your favorite editor. You need to add a line for each network card present on the system, for example in our case, we added these lines: ifconfig_dc0="inet 192.168.1.3 netmask 255.255.255.0" ifconfig_dc1="inet 10.0.0.1 netmask 255.255.255.0 media 10baseT/UTP" You have to replace dc0, dc1, and so on, with the correct device for your cards, and the addresses with the proper ones. You should read the card driver and &man.ifconfig.8; manual pages for more details about the allowed options and also &man.rc.conf.5; manual page for more information on the syntax of /etc/rc.conf. If you configured the network during installation, some lines about the network card(s) may be already present. Double check /etc/rc.conf before adding any lines. You will also have to edit the file /etc/hosts to add the names and the IP addresses of various machines of the LAN, if they are not already there. For more information please refer to &man.hosts.5; and to /usr/share/examples/etc/hosts. Testing and Troubleshooting Once you have made the necessary changes in /etc/rc.conf, you should reboot your system. This will allow the change(s) to the interface(s) to be applied, and verify that the system restarts without any configuration errors. Once the system has been rebooted, you should test the network interfaces. Testing the Ethernet Card network cards testing To verify that an Ethernet card is configured correctly, you have to try two things. First, ping the interface itself, and then ping another machine on the LAN. First test the local interface: &prompt.user; ping -c5 192.168.1.3 PING 192.168.1.3 (192.168.1.3): 56 data bytes 64 bytes from 192.168.1.3: icmp_seq=0 ttl=64 time=0.082 ms 64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.074 ms 64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.076 ms 64 bytes from 192.168.1.3: icmp_seq=3 ttl=64 time=0.108 ms 64 bytes from 192.168.1.3: icmp_seq=4 ttl=64 time=0.076 ms --- 192.168.1.3 ping statistics --- 5 packets transmitted, 5 packets received, 0% packet loss round-trip min/avg/max/stddev = 0.074/0.083/0.108/0.013 ms Now we have to ping another machine on the LAN: &prompt.user; ping -c5 192.168.1.2 PING 192.168.1.2 (192.168.1.2): 56 data bytes 64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=0.726 ms 64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.766 ms 64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.700 ms 64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.747 ms 64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.704 ms --- 192.168.1.2 ping statistics --- 5 packets transmitted, 5 packets received, 0% packet loss round-trip min/avg/max/stddev = 0.700/0.729/0.766/0.025 ms You could also use the machine name instead of 192.168.1.2 if you have set up the /etc/hosts file. Troubleshooting network cards troubleshooting Troubleshooting hardware and software configurations is always a pain, and a pain which can be alleviated by checking the simple things first. Is your network cable plugged in? Have you properly configured the network services? Did you configure the firewall correctly? Is the card you are using supported by &os;? Always check the hardware notes before sending off a bug report. Update your version of &os; to the latest STABLE version. Check the mailing list archives, or perhaps search the Internet. If the card works, yet performance is poor, it would be worthwhile to read over the &man.tuning.7; manual page. You can also check the network configuration as incorrect network settings can cause slow connections. Some users experience one or two device timeout messages, which is normal for some cards. If they continue, or are bothersome, you may wish to be sure the device is not conflicting with another device. Double check the cable connections. Perhaps you may just need to get another card. At times, users see a few watchdog timeout errors. The first thing to do here is to check your network cable. Many cards require a PCI slot which supports Bus Mastering. On some old motherboards, only one PCI slot allows it (usually slot 0). Check the network card and the motherboard documentation to determine if that may be the problem. No route to host messages occur if the system is unable to route a packet to the destination host. This can happen if no default route is specified, or if a cable is unplugged. Check the output of netstat -rn and make sure there is a valid route to the host you are trying to reach. If there is not, read on to . ping: sendto: Permission denied error messages are often caused by a misconfigured firewall. If ipfw is enabled in the kernel but no rules have been defined, then the default policy is to deny all traffic, even ping requests! Read on to for more information. Sometimes performance of the card is poor, or below average. In these cases it is best to set the media selection mode from autoselect to the correct media selection. While this usually works for most hardware, it may not resolve this issue for everyone. Again, check all the network settings, and read over the &man.tuning.7; manual page. Virtual Hosts virtual hosts IP aliases A very common use of &os; is virtual site hosting, where one server appears to the network as many servers. This is achieved by assigning multiple network addresses to a single interface. A given network interface has one real address, and may have any number of alias addresses. These aliases are normally added by placing alias entries in /etc/rc.conf. An alias entry for the interface fxp0 looks like: ifconfig_fxp0_alias0="inet xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx" Note that alias entries must start with alias0 and proceed upwards in order, (for example, _alias1, _alias2, and so on). The configuration process will stop at the first missing number. The calculation of alias netmasks is important, but fortunately quite simple. For a given interface, there must be one address which correctly represents the network's netmask. Any other addresses which fall within this network must have a netmask of all 1s (expressed as either 255.255.255.255 or 0xffffffff). For example, consider the case where the fxp0 interface is connected to two networks, the 10.1.1.0 network with a netmask of 255.255.255.0 and the 202.0.75.16 network with a netmask of 255.255.255.240. We want the system to appear at 10.1.1.1 through 10.1.1.5 and at 202.0.75.17 through 202.0.75.20. As noted above, only the first address in a given network range (in this case, 10.0.1.1 and 202.0.75.17) should have a real netmask; all the rest (10.1.1.2 through 10.1.1.5 and 202.0.75.18 through 202.0.75.20) must be configured with a netmask of 255.255.255.255. - The following entries configure the adapter correctly for - this arrangement: - - ifconfig_fxp0="inet 10.1.1.1 netmask 255.255.255.0" - ifconfig_fxp0_alias0="inet 10.1.1.2 netmask 255.255.255.255" - ifconfig_fxp0_alias1="inet 10.1.1.3 netmask 255.255.255.255" - ifconfig_fxp0_alias2="inet 10.1.1.4 netmask 255.255.255.255" - ifconfig_fxp0_alias3="inet 10.1.1.5 netmask 255.255.255.255" - ifconfig_fxp0_alias4="inet 202.0.75.17 netmask 255.255.255.240" - ifconfig_fxp0_alias5="inet 202.0.75.18 netmask 255.255.255.255" - ifconfig_fxp0_alias6="inet 202.0.75.19 netmask 255.255.255.255" - ifconfig_fxp0_alias7="inet 202.0.75.20 netmask 255.255.255.255" + The following /etc/rc.conf entries + configure the adapter correctly for this arrangement: + + ifconfig_fxp0="inet 10.1.1.1 netmask 255.255.255.0" +ifconfig_fxp0_alias0="inet 10.1.1.2 netmask 255.255.255.255" +ifconfig_fxp0_alias1="inet 10.1.1.3 netmask 255.255.255.255" +ifconfig_fxp0_alias2="inet 10.1.1.4 netmask 255.255.255.255" +ifconfig_fxp0_alias3="inet 10.1.1.5 netmask 255.255.255.255" +ifconfig_fxp0_alias4="inet 202.0.75.17 netmask 255.255.255.240" +ifconfig_fxp0_alias5="inet 202.0.75.18 netmask 255.255.255.255" +ifconfig_fxp0_alias6="inet 202.0.75.19 netmask 255.255.255.255" +ifconfig_fxp0_alias7="inet 202.0.75.20 netmask 255.255.255.255" Configuration Files <filename>/etc</filename> Layout There are a number of directories in which configuration information is kept. These include: /etc Generic system configuration information; data here is system-specific. /etc/defaults Default versions of system configuration files. /etc/mail Extra &man.sendmail.8; configuration, other MTA configuration files. /etc/ppp Configuration for both user- and kernel-ppp programs. /etc/namedb Default location for &man.named.8; data. Normally named.conf and zone files are stored here. /usr/local/etc Configuration files for installed applications. May contain per-application subdirectories. /usr/local/etc/rc.d Start/stop scripts for installed applications. /var/db Automatically generated system-specific database files, such as the package database, the locate database, and so on Hostnames hostname DNS <filename>/etc/resolv.conf</filename> resolv.conf /etc/resolv.conf dictates how &os;'s resolver accesses the Internet Domain Name System (DNS). The most common entries to resolv.conf are: nameserver The IP address of a name server the resolver should query. The servers are queried in the order listed with a maximum of three. search Search list for hostname lookup. This is normally determined by the domain of the local hostname. domain The local domain name. A typical resolv.conf: search example.com nameserver 147.11.1.11 nameserver 147.11.100.30 Only one of the search and domain options should be used. If you are using DHCP, &man.dhclient.8; usually rewrites resolv.conf with information received from the DHCP server. <filename>/etc/hosts</filename> hosts /etc/hosts is a simple text database reminiscent of the old Internet. It works in conjunction with DNS and NIS providing name to IP address mappings. Local computers connected via a LAN can be placed in here for simplistic naming purposes instead of setting up a &man.named.8; server. Additionally, /etc/hosts can be used to provide a local record of Internet names, reducing the need to query externally for commonly accessed names. # $&os;$ # # Host Database # This file should contain the addresses and aliases # for local hosts that share this file. # In the presence of the domain name service or NIS, this file may # not be consulted at all; see /etc/nsswitch.conf for the resolution order. # # ::1 localhost localhost.my.domain myname.my.domain 127.0.0.1 localhost localhost.my.domain myname.my.domain # # Imaginary network. #10.0.0.2 myname.my.domain myname #10.0.0.3 myfriend.my.domain myfriend # # According to RFC 1918, you can use the following IP networks for # private nets which will never be connected to the Internet: # # 10.0.0.0 - 10.255.255.255 # 172.16.0.0 - 172.31.255.255 # 192.168.0.0 - 192.168.255.255 # # In case you want to be able to connect to the Internet, you need # real official assigned numbers. PLEASE PLEASE PLEASE do not try # to invent your own network numbers but instead get one from your # network provider (if any) or from the Internet Registry (ftp to # rs.internic.net, directory `/templates'). # /etc/hosts takes on the simple format of: [Internet address] [official hostname] [alias1] [alias2] ... For example: 10.0.0.1 myRealHostname.example.com myRealHostname foobar1 foobar2 Consult &man.hosts.5; for more information. Log File Configuration log files <filename>syslog.conf</filename> syslog.conf syslog.conf is the configuration file for the &man.syslogd.8; program. It indicates which types of syslog messages are logged to particular log files. # $&os;$ # # Spaces ARE valid field separators in this file. However, # other *nix-like systems still insist on using tabs as field # separators. If you are sharing this file between systems, you # may want to use only tabs as field separators here. # Consult the syslog.conf(5) manual page. *.err;kern.debug;auth.notice;mail.crit /dev/console *.notice;kern.debug;lpr.info;mail.crit;news.err /var/log/messages security.* /var/log/security mail.info /var/log/maillog lpr.info /var/log/lpd-errs cron.* /var/log/cron *.err root *.notice;news.err root *.alert root *.emerg * # uncomment this to log all writes to /dev/console to /var/log/console.log #console.info /var/log/console.log # uncomment this to enable logging of all log messages to /var/log/all.log #*.* /var/log/all.log # uncomment this to enable logging to a remote log host named loghost #*.* @loghost # uncomment these if you're running inn # news.crit /var/log/news/news.crit # news.err /var/log/news/news.err # news.notice /var/log/news/news.notice !startslip *.* /var/log/slip.log !ppp *.* /var/log/ppp.log Consult the &man.syslog.conf.5; manual page for more information. <filename>newsyslog.conf</filename> newsyslog.conf newsyslog.conf is the configuration file for &man.newsyslog.8;, a program that is normally scheduled to run by &man.cron.8;. &man.newsyslog.8; determines when log files require archiving or rearranging. logfile is moved to logfile.0, logfile.0 is moved to logfile.1, and so on. Alternatively, the log files may be archived in &man.gzip.1; format causing them to be named: logfile.0.gz, logfile.1.gz, and so on. newsyslog.conf indicates which log files are to be managed, how many are to be kept, and when they are to be touched. Log files can be rearranged and/or archived when they have either reached a certain size, or at a certain periodic time/date. # configuration file for newsyslog # $&os;$ # # filename [owner:group] mode count size when [ZB] [/pid_file] [sig_num] /var/log/cron 600 3 100 * Z /var/log/amd.log 644 7 100 * Z /var/log/kerberos.log 644 7 100 * Z /var/log/lpd-errs 644 7 100 * Z /var/log/maillog 644 7 * @T00 Z /var/log/sendmail.st 644 10 * 168 B /var/log/messages 644 5 100 * Z /var/log/all.log 600 7 * @T00 Z /var/log/slip.log 600 3 100 * Z /var/log/ppp.log 600 3 100 * Z /var/log/security 600 10 100 * Z /var/log/wtmp 644 3 * @01T05 B /var/log/daily.log 640 7 * @T00 Z /var/log/weekly.log 640 5 1 $W6D0 Z /var/log/monthly.log 640 12 * $M1D0 Z /var/log/console.log 640 5 100 * Z Consult the &man.newsyslog.8; manual page for more information. <filename>sysctl.conf</filename> sysctl.conf sysctl sysctl.conf looks much like rc.conf. Values are set in a variable=value form. The specified values are set after the system goes into multi-user mode. Not all variables are settable in this mode. A sample sysctl.conf turning off logging of fatal signal exits and letting Linux programs know they are really running under &os;: kern.logsigexit=0 # Do not log fatal signal exits (e.g. sig 11) compat.linux.osname=&os; compat.linux.osrelease=4.3-STABLE Tuning with sysctl sysctl tuning with sysctl &man.sysctl.8; is an interface that allows you to make changes to a running &os; system. This includes many advanced options of the TCP/IP stack and virtual memory system that can dramatically improve performance for an experienced system administrator. Over five hundred system variables can be read and set using &man.sysctl.8;. At its core, &man.sysctl.8; serves two functions: to read and to modify system settings. To view all readable variables: &prompt.user; sysctl -a To read a particular variable, for example, kern.maxproc: &prompt.user; sysctl kern.maxproc kern.maxproc: 1044 To set a particular variable, use the intuitive variable=value syntax: &prompt.root; sysctl kern.maxfiles=5000 kern.maxfiles: 2088 -> 5000 Settings of sysctl variables are usually either strings, numbers, or booleans (a boolean being 1 for yes or a 0 for no). If you want to set automatically some variables each time the machine boots, add them to the /etc/sysctl.conf file. For more information see the &man.sysctl.conf.5; manual page and the . Tom Rhodes Contributed by &man.sysctl.8; Read-only In some cases it may be desirable to modify read-only &man.sysctl.8; values. While this is not recommended, it is also sometimes unavoidable. For instance on some laptop models the &man.cardbus.4; device will not probe memory ranges, and fail with errors which look similar to: cbb0: Could not map register memory device_probe_and_attach: cbb0 attach returned 12 Cases like the one above usually require the modification of some default &man.sysctl.8; settings which are set read only. To overcome these situations a user can put &man.sysctl.8; OIDs in their local /boot/loader.conf. Default settings are located in the /boot/defaults/loader.conf file. Fixing the problem mentioned above would require a user to set in the aforementioned file. Now &man.cardbus.4; will work properly. Tuning Disks Sysctl Variables <varname>vfs.vmiodirenable</varname> vfs.vmiodirenable The vfs.vmiodirenable sysctl variable may be set to either 0 (off) or 1 (on); it is 1 by default. This variable controls how directories are cached by the system. Most directories are small, using just a single fragment (typically 1 K) in the file system and less (typically 512 bytes) in the buffer cache. With this variable turned off (to 0), the buffer cache will only cache a fixed number of directories even if you have a huge amount of memory. When turned on (to 1), this sysctl allows the buffer cache to use the VM Page Cache to cache the directories, making all the memory available for caching directories. However, the minimum in-core memory used to cache a directory is the physical page size (typically 4 K) rather than 512  bytes. We recommend keeping this option on if you are running any services which manipulate large numbers of files. Such services can include web caches, large mail systems, and news systems. Keeping this option on will generally not reduce performance even with the wasted memory but you should experiment to find out. <varname>vfs.write_behind</varname> vfs.write_behind The vfs.write_behind sysctl variable defaults to 1 (on). This tells the file system to issue media writes as full clusters are collected, which typically occurs when writing large sequential files. The idea is to avoid saturating the buffer cache with dirty buffers when it would not benefit I/O performance. However, this may stall processes and under certain circumstances you may wish to turn it off. <varname>vfs.hirunningspace</varname> vfs.hirunningspace The vfs.hirunningspace sysctl variable determines how much outstanding write I/O may be queued to disk controllers system-wide at any given instance. The default is usually sufficient but on machines with lots of disks you may want to bump it up to four or five megabytes. Note that setting too high a value (exceeding the buffer cache's write threshold) can lead to extremely bad clustering performance. Do not set this value arbitrarily high! Higher write values may add latency to reads occurring at the same time. There are various other buffer-cache and VM page cache related sysctls. We do not recommend modifying these values. As of &os; 4.3, the VM system does an extremely good job of automatically tuning itself. <varname>vm.swap_idle_enabled</varname> vm.swap_idle_enabled The vm.swap_idle_enabled sysctl variable is useful in large multi-user systems where you have lots of users entering and leaving the system and lots of idle processes. Such systems tend to generate a great deal of continuous pressure on free memory reserves. Turning this feature on and tweaking the swapout hysteresis (in idle seconds) via vm.swap_idle_threshold1 and vm.swap_idle_threshold2 allows you to depress the priority of memory pages associated with idle processes more quickly then the normal pageout algorithm. This gives a helping hand to the pageout daemon. Do not turn this option on unless you need it, because the tradeoff you are making is essentially pre-page memory sooner rather than later; thus eating more swap and disk bandwidth. In a small system this option will have a determinable effect but in a large system that is already doing moderate paging this option allows the VM system to stage whole processes into and out of memory easily. <varname>hw.ata.wc</varname> hw.ata.wc &os; 4.3 flirted with turning off IDE write caching. This reduced write bandwidth to IDE disks but was considered necessary due to serious data consistency issues introduced by hard drive vendors. The problem is that IDE drives lie about when a write completes. With IDE write caching turned on, IDE hard drives not only write data to disk out of order, but will sometimes delay writing some blocks indefinitely when under heavy disk loads. A crash or power failure may cause serious file system corruption. &os;'s default was changed to be safe. Unfortunately, the result was such a huge performance loss that we changed write caching back to on by default after the release. You should check the default on your system by observing the hw.ata.wc sysctl variable. If IDE write caching is turned off, you can turn it back on by setting the kernel variable back to 1. This must be done from the boot loader at boot time. Attempting to do it after the kernel boots will have no effect. For more information, please see &man.ata.4;. <literal>SCSI_DELAY</literal> (<varname>kern.cam.scsi_delay</varname>) kern.cam.scsi_delay kernel options SCSI_DELAY The SCSI_DELAY kernel config may be used to reduce system boot times. The defaults are fairly high and can be responsible for 15 seconds of delay in the boot process. Reducing it to 5 seconds usually works (especially with modern drives). Newer versions of &os; (5.0 and higher) should use the kern.cam.scsi_delay boot time tunable. The tunable, and kernel config option accept values in terms of milliseconds and not seconds. Soft Updates Soft Updates tunefs The &man.tunefs.8; program can be used to fine-tune a file system. This program has many different options, but for now we are only concerned with toggling Soft Updates on and off, which is done by: &prompt.root; tunefs -n enable /filesystem &prompt.root; tunefs -n disable /filesystem A filesystem cannot be modified with &man.tunefs.8; while it is mounted. A good time to enable Soft Updates is before any partitions have been mounted, in single-user mode. As of &os; 4.5, it is possible to enable Soft Updates at filesystem creation time, through use of the -U option to &man.newfs.8;. Soft Updates drastically improves meta-data performance, mainly file creation and deletion, through the use of a memory cache. We recommend to use Soft Updates on all of your file systems. There are two downsides to Soft Updates that you should be aware of: First, Soft Updates guarantees filesystem consistency in the case of a crash but could very easily be several seconds (even a minute!) behind updating the physical disk. If your system crashes you may lose more work than otherwise. Secondly, Soft Updates delays the freeing of filesystem blocks. If you have a filesystem (such as the root filesystem) which is almost full, performing a major update, such as make installworld, can cause the filesystem to run out of space and the update to fail. More Details about Soft Updates Soft Updates details There are two traditional approaches to writing a file systems meta-data back to disk. (Meta-data updates are updates to non-content data like inodes or directories.) Historically, the default behavior was to write out meta-data updates synchronously. If a directory had been changed, the system waited until the change was actually written to disk. The file data buffers (file contents) were passed through the buffer cache and backed up to disk later on asynchronously. The advantage of this implementation is that it operates safely. If there is a failure during an update, the meta-data are always in a consistent state. A file is either created completely or not at all. If the data blocks of a file did not find their way out of the buffer cache onto the disk by the time of the crash, &man.fsck.8; is able to recognize this and repair the filesystem by setting the file length to 0. Additionally, the implementation is clear and simple. The disadvantage is that meta-data changes are slow. An rm -r, for instance, touches all the files in a directory sequentially, but each directory change (deletion of a file) will be written synchronously to the disk. This includes updates to the directory itself, to the inode table, and possibly to indirect blocks allocated by the file. Similar considerations apply for unrolling large hierarchies (tar -x). The second case is asynchronous meta-data updates. This is the default for Linux/ext2fs and mount -o async for *BSD ufs. All meta-data updates are simply being passed through the buffer cache too, that is, they will be intermixed with the updates of the file content data. The advantage of this implementation is there is no need to wait until each meta-data update has been written to disk, so all operations which cause huge amounts of meta-data updates work much faster than in the synchronous case. Also, the implementation is still clear and simple, so there is a low risk for bugs creeping into the code. The disadvantage is that there is no guarantee at all for a consistent state of the filesystem. If there is a failure during an operation that updated large amounts of meta-data (like a power failure, or someone pressing the reset button), the filesystem will be left in an unpredictable state. There is no opportunity to examine the state of the filesystem when the system comes up again; the data blocks of a file could already have been written to the disk while the updates of the inode table or the associated directory were not. It is actually impossible to implement a fsck which is able to clean up the resulting chaos (because the necessary information is not available on the disk). If the filesystem has been damaged beyond repair, the only choice is to use &man.newfs.8; on it and restore it from backup. The usual solution for this problem was to implement dirty region logging, which is also referred to as journaling, although that term is not used consistently and is occasionally applied to other forms of transaction logging as well. Meta-data updates are still written synchronously, but only into a small region of the disk. Later on they will be moved to their proper location. Because the logging area is a small, contiguous region on the disk, there are no long distances for the disk heads to move, even during heavy operations, so these operations are quicker than synchronous updates. Additionally the complexity of the implementation is fairly limited, so the risk of bugs being present is low. A disadvantage is that all meta-data are written twice (once into the logging region and once to the proper location) so for normal work, a performance pessimization might result. On the other hand, in case of a crash, all pending meta-data operations can be quickly either rolled-back or completed from the logging area after the system comes up again, resulting in a fast filesystem startup. Kirk McKusick, the developer of Berkeley FFS, solved this problem with Soft Updates: all pending meta-data updates are kept in memory and written out to disk in a sorted sequence (ordered meta-data updates). This has the effect that, in case of heavy meta-data operations, later updates to an item catch the earlier ones if the earlier ones are still in memory and have not already been written to disk. So all operations on, say, a directory are generally performed in memory before the update is written to disk (the data blocks are sorted according to their position so that they will not be on the disk ahead of their meta-data). If the system crashes, this causes an implicit log rewind: all operations which did not find their way to the disk appear as if they had never happened. A consistent filesystem state is maintained that appears to be the one of 30 to 60 seconds earlier. The algorithm used guarantees that all resources in use are marked as such in their appropriate bitmaps: blocks and inodes. After a crash, the only resource allocation error that occurs is that resources are marked as used which are actually free. &man.fsck.8; recognizes this situation, and frees the resources that are no longer used. It is safe to ignore the dirty state of the filesystem after a crash by forcibly mounting it with mount -f. In order to free resources that may be unused, &man.fsck.8; needs to be run at a later time. This is the idea behind the background fsck: at system startup time, only a snapshot of the filesystem is recorded. The fsck can be run later on. All file systems can then be mounted dirty, so the system startup proceeds in multiuser mode. Then, background fscks will be scheduled for all file systems where this is required, to free resources that may be unused. (File systems that do not use Soft Updates still need the usual foreground fsck though.) The advantage is that meta-data operations are nearly as fast as asynchronous updates (i.e. faster than with logging, which has to write the meta-data twice). The disadvantages are the complexity of the code (implying a higher risk for bugs in an area that is highly sensitive regarding loss of user data), and a higher memory consumption. Additionally there are some idiosyncrasies one has to get used to. After a crash, the state of the filesystem appears to be somewhat older. In situations where the standard synchronous approach would have caused some zero-length files to remain after the fsck, these files do not exist at all with a Soft Updates filesystem because neither the meta-data nor the file contents have ever been written to disk. Disk space is not released until the updates have been written to disk, which may take place some time after running rm. This may cause problems when installing large amounts of data on a filesystem that does not have enough free space to hold all the files twice. Tuning Kernel Limits tuning kernel limits File/Process Limits <varname>kern.maxfiles</varname> kern.maxfiles kern.maxfiles can be raised or lowered based upon your system requirements. This variable indicates the maximum number of file descriptors on your system. When the file descriptor table is full, file: table is full will show up repeatedly in the system message buffer, which can be viewed with the dmesg command. Each open file, socket, or fifo uses one file descriptor. A large-scale production server may easily require many thousands of file descriptors, depending on the kind and number of services running concurrently. kern.maxfile's default value is dictated by the option in your kernel configuration file. kern.maxfiles grows proportionally to the value of . When compiling a custom kernel, it is a good idea to set this kernel configuration option according to the uses of your system. From this number, the kernel is given most of its pre-defined limits. Even though a production machine may not actually have 256 users connected at once, the resources needed may be similar to a high-scale web server. As of &os; 4.5, setting to 0 in your kernel configuration file will choose a reasonable default value based on the amount of RAM present in your system. <varname>kern.ipc.somaxconn</varname> kern.ipc.somaxconn The kern.ipc.somaxconn sysctl variable limits the size of the listen queue for accepting new TCP connections. The default value of 128 is typically too low for robust handling of new connections in a heavily loaded web server environment. For such environments, it is recommended to increase this value to 1024 or higher. The service daemon may itself limit the listen queue size (e.g. &man.sendmail.8;, or Apache) but will often have a directive in its configuration file to adjust the queue size. Large listen queues also do a better job of avoiding Denial of Service (DoS) attacks. Network Limits The NMBCLUSTERS kernel configuration option dictates the amount of network Mbufs available to the system. A heavily-trafficked server with a low number of Mbufs will hinder &os;'s ability. Each cluster represents approximately 2 K of memory, so a value of 1024 represents 2 megabytes of kernel memory reserved for network buffers. A simple calculation can be done to figure out how many are needed. If you have a web server which maxes out at 1000 simultaneous connections, and each connection eats a 16 K receive and 16 K send buffer, you need approximately 32 MB worth of network buffers to cover the web server. A good rule of thumb is to multiply by 2, so 2x32 MB / 2 KB = 64 MB / 2 kB = 32768. We recommend values between 4096 and 32768 for machines with greater amounts of memory. Under no circumstances should you specify an arbitrarily high value for this parameter as it could lead to a boot time crash. The option to &man.netstat.1; may be used to observe network cluster use. kern.ipc.nmbclusters loader tunable should be used to tune this at boot time. Only older versions of &os; will require you to use the NMBCLUSTERS kernel &man.config.8; option. For busy servers that make extensive use of the &man.sendfile.2; system call, it may be necessary to increase the number of &man.sendfile.2; buffers via the NSFBUFS kernel configuration option or by setting its value in /boot/loader.conf (see &man.loader.8; for details). A common indicator that this parameter needs to be adjusted is when processes are seen in the sfbufa state. The sysctl variable kern.ipc.nsfbufs is a read-only glimpse at the kernel configured variable. This parameter nominally scales with kern.maxusers, however it may be necessary to tune accordingly. Even though a socket has been marked as non-blocking, calling &man.sendfile.2; on the non-blocking socket may result in the &man.sendfile.2; call blocking until enough struct sf_buf's are made available. <varname>net.inet.ip.portrange.*</varname> net.inet.ip.portrange.* The net.inet.ip.portrange.* sysctl variables control the port number ranges automatically bound to TCP and UDP sockets. There are three ranges: a low range, a default range, and a high range. Most network programs use the default range which is controlled by the net.inet.ip.portrange.first and net.inet.ip.portrange.last, which default to 1024 and 5000, respectively. Bound port ranges are used for outgoing connections, and it is possible to run the system out of ports under certain circumstances. This most commonly occurs when you are running a heavily loaded web proxy. The port range is not an issue when running servers which handle mainly incoming connections, such as a normal web server, or has a limited number of outgoing connections, such as a mail relay. For situations where you may run yourself out of ports, it is recommended to increase net.inet.ip.portrange.last modestly. A value of 10000, 20000 or 30000 may be reasonable. You should also consider firewall effects when changing the port range. Some firewalls may block large ranges of ports (usually low-numbered ports) and expect systems to use higher ranges of ports for outgoing connections — for this reason it is recommended that net.inet.ip.portrange.first be lowered. TCP Bandwidth Delay Product TCP Bandwidth Delay Product Limiting net.inet.tcp.inflight.enable The TCP Bandwidth Delay Product Limiting is similar to TCP/Vegas in NetBSD. It can be enabled by setting net.inet.tcp.inflight.enable sysctl variable to 1. The system will attempt to calculate the bandwidth delay product for each connection and limit the amount of data queued to the network to just the amount required to maintain optimum throughput. This feature is useful if you are serving data over modems, Gigabit Ethernet, or even high speed WAN links (or any other link with a high bandwidth delay product), especially if you are also using window scaling or have configured a large send window. If you enable this option, you should also be sure to set net.inet.tcp.inflight.debug to 0 (disable debugging), and for production use setting net.inet.tcp.inflight.min to at least 6144 may be beneficial. However, note that setting high minimums may effectively disable bandwidth limiting depending on the link. The limiting feature reduces the amount of data built up in intermediate route and switch packet queues as well as reduces the amount of data built up in the local host's interface queue. With fewer packets queued up, interactive connections, especially over slow modems, will also be able to operate with lower Round Trip Times. However, note that this feature only effects data transmission (uploading / server side). It has no effect on data reception (downloading). Adjusting net.inet.tcp.inflight.stab is not recommended. This parameter defaults to 20, representing 2 maximal packets added to the bandwidth delay product window calculation. The additional window is required to stabilize the algorithm and improve responsiveness to changing conditions, but it can also result in higher ping times over slow links (though still much lower than you would get without the inflight algorithm). In such cases, you may wish to try reducing this parameter to 15, 10, or 5; and may also have to reduce net.inet.tcp.inflight.min (for example, to 3500) to get the desired effect. Reducing these parameters should be done as a last resort only. In 4.X and earlier releases of &os; the inflight sysctl variables are directly under net.inet.tcp. Their names were (in alphabetic order): net.inet.tcp.inflight_debug, net.inet.tcp.inflight_enable, net.inet.tcp.inflight_max, net.inet.tcp.inflight_min, net.inet.tcp.inflight_stab. Adding Swap Space No matter how well you plan, sometimes a system does not run as you expect. If you find you need more swap space, it is simple enough to add. You have three ways to increase swap space: adding a new hard drive, enabling swap over NFS, and creating a swap file on an existing partition. Swap on a New Hard Drive The best way to add swap, of course, is to use this as an excuse to add another hard drive. You can always use another hard drive, after all. If you can do this, go reread the discussion of swap space in of the Handbook for some suggestions on how to best arrange your swap. Swapping over NFS Swapping over NFS is only recommended if you do not have a local hard disk to swap to. Swapping over NFS is slow and inefficient in versions of &os; prior to 4.X. It is reasonably fast and efficient in 4.0-RELEASE and newer. Even with newer versions of &os;, NFS swapping will be limited by the available network bandwidth and puts an additional burden on the NFS server. Swapfiles You can create a file of a specified size to use as a swap file. In our example here we will use a 64MB file called /usr/swap0. You can use any name you want, of course. Creating a Swapfile on &os; 4.X Be certain that your kernel configuration includes the vnode driver. It is not in recent versions of GENERIC. pseudo-device vn 1 #Vnode driver (turns a file into a device) Create a vn-device: &prompt.root; cd /dev &prompt.root; sh MAKEDEV vn0 Create a swapfile (/usr/swap0): &prompt.root; dd if=/dev/zero of=/usr/swap0 bs=1024k count=64 Set proper permissions on (/usr/swap0): &prompt.root; chmod 0600 /usr/swap0 Enable the swap file in /etc/rc.conf: swapfile="/usr/swap0" # Set to name of swapfile if aux swapfile desired. Reboot the machine or to enable the swap file immediately, type: &prompt.root; vnconfig -e /dev/vn0b /usr/swap0 swap Creating a Swapfile on &os; 5.X Be certain that your kernel configuration includes the memory disk driver (&man.md.4;). It is default in GENERIC kernel. device md # Memory "disks" Create a swapfile (/usr/swap0): &prompt.root; dd if=/dev/zero of=/usr/swap0 bs=1024k count=64 Set proper permissions on (/usr/swap0): &prompt.root; chmod 0600 /usr/swap0 Enable the swap file in /etc/rc.conf: swapfile="/usr/swap0" # Set to name of swapfile if aux swapfile desired. Reboot the machine or to enable the swap file immediately, type: &prompt.root; mdconfig -a -t vnode -f /usr/swap0 -u 0 && swapon /dev/md0 Hiten Pandya Written by Tom Rhodes Power and Resource Management It is very important to utilize hardware resources in an efficient manner. Before ACPI was introduced, it was very difficult and inflexible for operating systems to manage the power usage and thermal properties of a system. The hardware was controlled by some sort of BIOS embedded interface, such as Plug and Play BIOS (PNPBIOS), or Advanced Power Management (APM) and so on. Power and Resource Management is one of the key components of a modern operating system. For example, you may want an operating system to monitor system limits (and possibly alert you) in case your system temperature increased unexpectedly. In this section of the &os; Handbook, we will provide comprehensive information about ACPI. References will be provided for further reading at the end. Please be aware that ACPI is available on &os; 5.X and above systems as a default kernel module. For &os; 4.9, ACPI can be enabled by adding the line device acpica to a kernel configuration and rebuilding. What Is ACPI? ACPI APM Advanced Configuration and Power Interface (ACPI) is a standard written by an alliance of vendors to provide a standard interface for hardware resources and power management (hence the name). It is a key element in Operating System-directed configuration and Power Management, i.e.: it provides more control and flexibility to the operating system (OS). Modern systems stretched the limits of the current Plug and Play interfaces (such as APM, which is used in &os; 4.X), prior to the introduction of ACPI. ACPI is the direct successor to APM (Advanced Power Management). Shortcomings of Advanced Power Management (APM) The Advanced Power Management (APM) facility controls the power usage of a system based on its activity. The APM BIOS is supplied by the (system) vendor and it is specific to the hardware platform. An APM driver in the OS mediates access to the APM Software Interface, which allows management of power levels. There are four major problems in APM. Firstly, power management is done by the (vendor-specific) BIOS, and the OS does not have any knowledge of it. One example of this, is when the user sets idle-time values for a hard drive in the APM BIOS, that when exceeded, it (BIOS) would spin down the hard drive, without the consent of the OS. Secondly, the APM logic is embedded in the BIOS, and it operates outside the scope of the OS. This means users can only fix problems in their APM BIOS by flashing a new one into the ROM; which is a very dangerous procedure with the potential to leave the system in an unrecoverable state if it fails. Thirdly, APM is a vendor-specific technology, which means that there is a lot of parity (duplication of efforts) and bugs found in one vendor's BIOS, may not be solved in others. Last but not the least, the APM BIOS did not have enough room to implement a sophisticated power policy, or one that can adapt very well to the purpose of the machine. Plug and Play BIOS (PNPBIOS) was unreliable in many situations. PNPBIOS is 16-bit technology, so the OS has to use 16-bit emulation in order to interface with PNPBIOS methods. The &os; APM driver is documented in the &man.apm.4; manual page. Configuring <acronym>ACPI</acronym> The acpi.ko driver is loaded by default at start up by the &man.loader.8; and should not be compiled into the kernel. The reasoning behind this is that modules are easier to work with, say if switching to another acpi.ko without doing a kernel rebuild. This has the advantage of making testing easier. Another reason is that starting ACPI after a system has been brought up is not too useful, and in some cases can be fatal. In doubt, just disable ACPI all together. This driver should not and can not be unloaded because the system bus uses it for various hardware interactions. ACPI can be disabled with the &man.acpiconf.8; utility. In fact most of the interaction with ACPI can be done via &man.acpiconf.8;. Basically this means, if anything about ACPI is in the &man.dmesg.8; output, then most likely it is already running. ACPI and APM cannot coexist and should be used separately. The last one to load will terminate if the driver notices the other running. In the simplest form, ACPI can be used to put the system into a sleep mode with &man.acpiconf.8;, the flag, and a 1-5 option. Most users will only need 1. Option 5 will do a soft-off which is the same action as: &prompt.root; halt -p The other options are available. Check out the &man.acpiconf.8; manual page for more information. Nate Lawson Written by Peter Schultz With contributions from Tom Rhodes Using and Debugging &os; <acronym>ACPI</acronym> ACPI problems ACPI is a fundamentally new way of discovering devices, managing power usage, and providing standardized access to various hardware previously managed by the BIOS. Progress is being made toward ACPI working on all systems, but bugs in some motherboards' ACPI Machine Language (AML) bytecode, incompleteness in &os;'s kernel subsystems, and bugs in the &intel; ACPI-CA interpreter continue to appear. This document is intended to help you assist the &os; ACPI maintainers in identifying the root cause of problems you observe and debugging and developing a solution. Thanks for reading this and we hope we can solve your system's problems. Submitting Debugging Information Before submitting a problem, be sure you are running the latest BIOS version and, if available, embedded controller firmware version. For those of you that want to submit a problem right away, please send the following information to freebsd-acpi@FreeBSD.org: Description of the buggy behavior, including system type and model and anything that causes the bug to appear. Also, please note as accurately as possible when the bug began occurring if it is new for you. The &man.dmesg.8; output after boot -v, including any error messages generated by you exercising the bug. The &man.dmesg.8; output from boot -v with ACPI disabled, if disabling it helps fix the problem. Output from sysctl hw.acpi. This is also a good way of figuring out what features your system offers. URL where your ACPI Source Language (ASL) can be found. Do not send the ASL directly to the list as it can be very large. Generate a copy of your ASL by running this command: &prompt.root; acpidump -t -d > name-system.asl (Substitute your login name for name and manufacturer/model for system. Example: njl-FooCo6000.asl) Most of the developers watch the &a.current; but please submit problems to &a.acpi.name; to be sure it is seen. Please be patient, all of us have full-time jobs elsewhere. If your bug is not immediately apparent, we will probably ask you to submit a PR via &man.send-pr.1;. When entering a PR, please include the same information as requested above. This will help us track the problem and resolve it. Do not send a PR without emailing &a.acpi.name; first as we use PRs as reminders of existing problems, not a reporting mechanism. It is likely that your problem has been reported by someone before. Background ACPI ACPI is present in all modern computers that conform to the ia32 (x86), ia64 (Itanium), and amd64 (AMD) architectures. The full standard has many features including CPU performance management, power planes control, thermal zones, various battery systems, embedded controllers, and bus enumeration. Most systems implement less than the full standard. For instance, a desktop system usually only implements the bus enumeration parts while a laptop might have cooling and battery management support as well. Laptops also have suspend and resume, with their own associated complexity. An ACPI-compliant system has various components. The BIOS and chipset vendors provide various fixed tables (e.g., FADT) in memory that specify things like the APIC map (used for SMP), config registers, and simple configuration values. Additionally, a table of bytecode (the Differentiated System Description Table DSDT) is provided that specifies a tree-like name space of devices and methods. The ACPI driver must parse the fixed tables, implement an interpreter for the bytecode, and modify device drivers and the kernel to accept information from the ACPI subsystem. For &os;, &intel; has provided an interpreter (ACPI-CA) that is shared with Linux and NetBSD. The path to the ACPI-CA source code is src/sys/contrib/dev/acpica. The glue code that allows ACPI-CA to work on &os; is in src/sys/dev/acpica/Osd. Finally, drivers that implement various ACPI devices are found in src/sys/dev/acpica. Common Problems ACPI problems For ACPI to work correctly, all the parts have to work correctly. Here are some common problems, in order of frequency of appearance, and some possible workarounds or fixes. Mouse Issues In some cases, resuming from a suspend operation will cause the mouse to fail. A known work around is to add hint.psm.0.flags="0x3000" to the /boot/loader.conf file. If this does not work then please consider sending a bug report as described above. Suspend/Resume ACPI has three suspend to RAM (STR) states, S1-S3, and one suspend to disk state (STD), called S4. S5 is soft off and is the normal state your system is in when plugged in but not powered up. S4 can actually be implemented two separate ways. S4BIOS is a BIOS-assisted suspend to disk. S4OS is implemented entirely by the operating system. Start by checking sysctl hw.acpi for the suspend-related items. Here are the results for a Thinkpad: hw.acpi.supported_sleep_state: S3 S4 S5 hw.acpi.s4bios: 0 This means that we can use acpiconf -s to test S3, S4OS, and S5. If was one (1), we would have S4BIOS support instead of S4 OS. When testing suspend/resume, start with S1, if supported. This state is most likely to work since it does not require much driver support. No one has implemented S2 but if you have it, it is similar to S1. The next thing to try is S3. This is the deepest STR state and requires a lot of driver support to properly reinitialize your hardware. If you have problems resuming, feel free to email the &a.acpi.name; list but do not expect the problem to be resolved since there are a lot of drivers/hardware that need more testing and work. To help isolate the problem, remove as many drivers from your kernel as possible. If it works, you can narrow down which driver is the problem by loading drivers until it fails again. Typically binary drivers like nvidia.ko, X11 display drivers, and USB will have the most problems while Ethernet interfaces usually work fine. If you can properly load/unload the drivers, you can automate this by putting the appropriate commands in /etc/rc.suspend and /etc/rc.resume. There is a commented-out example for unloading and loading a driver. Try setting to zero (0) if your display is messed up after resume. Try setting longer or shorter values for to see if that helps. Another thing to try is load a recent Linux distribution with ACPI support and test their suspend/resume support on the same hardware. If it works on Linux, it is likely a &os; driver problem and narrowing down which driver causes the problems will help us fix the problem. Note that the ACPI maintainers do not usually maintain other drivers (e.g sound, ATA, etc.) so any work done on tracking down a driver problem should probably eventually be posted to the &a.current.name; list and mailed to the driver maintainer. If you are feeling adventurous, go ahead and start putting some debugging &man.printf.3;s in a problematic driver to track down where in its resume function it hangs. Finally, try disabling ACPI and enabling APM instead. If suspend/resume works with APM, you may be better off sticking with APM, especially on older hardware (pre-2000). It took vendors a while to get ACPI support correct and older hardware is more likely to have BIOS problems with ACPI. System Hangs (temporary or permanent) Most system hangs are a result of lost interrupts or an interrupt storm. Chipsets have a lot of problems based on how the BIOS configures interrupts before boot, correctness of the APIC (MADT) table, and routing of the System Control Interrupt (SCI). interrupt storms Interrupt storms can be distinguished from lost interrupts by checking the output of vmstat -i and looking at the line that has acpi0. If the counter is increasing at more than a couple per second, you have an interrupt storm. If the system appears hung, try breaking to DDB (CTRL ALTESC on console) and type show interrupts. APIC disabling Your best hope when dealing with interrupt problems is to try disabling APIC support with hint.apic.0.disabled="1" in loader.conf. Panics Panics are relatively rare for ACPI and are the top priority to be fixed. The first step is to isolate the steps to reproduce the panic (if possible) and get a backtrace. Follow the advice for enabling options DDB and setting up a serial console (see ) or setting up a &man.dump.8; partition. You can get a backtrace in DDB with tr. If you have to handwrite the backtrace, be sure to at least get the lowest five (5) and top five (5) lines in the trace. Then, try to isolate the problem by booting with ACPI disabled. If that works, you can isolate the ACPI subsystem by using various values of . See the &man.acpi.4; manual page for some examples. System Powers Up After Suspend or Shutdown First, try setting hw.acpi.disable_on_poweroff="0" in &man.loader.conf.5;. This keeps ACPI from disabling various events during the shutdown process. Some systems need this value set to 1 (the default) for the same reason. This usually fixes the problem of a system powering up spontaneously after a suspend or poweroff. Other Problems If you have other problems with ACPI (working with a docking station, devices not detected, etc.), please email a description to the mailing list as well; however, some of these issues may be related to unfinished parts of the ACPI subsystem so they might take a while to be implemented. Please be patient and prepared to test patches we may send you. <acronym>ASL</acronym>, <command>acpidump</command>, and <acronym>IASL</acronym> ACPI ASL The most common problem is the BIOS vendors providing incorrect (or outright buggy!) bytecode. This is usually manifested by kernel console messages like this: ACPI-1287: *** Error: Method execution failed [\\_SB_.PCI0.LPC0.FIGD._STA] \\ (Node 0xc3f6d160), AE_NOT_FOUND Often, you can resolve these problems by updating your BIOS to the latest revision. Most console messages are harmless but if you have other problems like battery status not working, they are a good place to start looking for problems in the AML. The bytecode, known as AML, is compiled from a source language called ASL. The AML is found in the table known as the DSDT. To get a copy of your ASL, use &man.acpidump.8;. You should use both the (show contents of the fixed tables) and (disassemble AML to ASL) options. See the Submitting Debugging Information section for an example syntax. The simplest first check you can do is to recompile your ASL to check for errors. Warnings can usually be ignored but errors are bugs that will usually prevent ACPI from working correctly. To recompile your ASL, issue the following command: &prompt.root; iasl your.asl Fixing Your <acronym>ASL</acronym> ACPI ASL In the long run, our goal is for almost everyone to have ACPI work without any user intervention. At this point, however, we are still developing workarounds for common mistakes made by the BIOS vendors. The µsoft; interpreter (acpi.sys and acpiec.sys) does not strictly check for adherence to the standard, and thus many BIOS vendors who only test ACPI under &windows; never fix their ASL. We hope to continue to identify and document exactly what non-standard behavior is allowed by µsoft;'s interpreter and replicate it so &os; can work without forcing users to fix the ASL. As a workaround and to help us identify behavior, you can fix the ASL manually. If this works for you, please send a &man.diff.1; of the old and new ASL so we can possibly work around the buggy behavior in ACPI-CA and thus make your fix unnecessary. ACPI error messages Here is a list of common error messages, their cause, and how to fix them: _OS dependencies Some AML assumes the world consists of various &windows; versions. You can tell &os; to claim it is any OS to see if this fixes problems you may have. An easy way to override this is to set hw.acpi.osname="Windows 2001" in /boot/loader.conf or other similar strings you find in the ASL. Missing Return statements Some methods do not explicitly return a value as the standard requires. While ACPI-CA does not handle this, &os; has a workaround that allows it to return the value implicitly. You can also add explicit Return statements where required if you know what value should be returned. To force iasl to compile the ASL, use the flag. Overriding the Default <acronym>AML</acronym> After you customize your.asl, you will want to compile it, run: &prompt.root; iasl your.asl You can add the flag to force creation of the AML, even if there are errors during compilation. Remember that some errors (e.g., missing Return statements) are automatically worked around by the interpreter. DSDT.aml is the default output filename for iasl. You can load this instead of your BIOS's buggy copy (which is still present in flash memory) by editing /boot/loader.conf as follows: acpi_dsdt_load="YES" acpi_dsdt_name="/boot/DSDT.aml" Be sure to copy your DSDT.aml to the /boot directory. Getting Debugging Output From <acronym>ACPI</acronym> ACPI problems ACPI debugging The ACPI driver has a very flexible debugging facility. It allows you to specify a set of subsystems as well as the level of verbosity. The subsystems you wish to debug are specified as layers and are broken down into ACPI-CA components (ACPI_ALL_COMPONENTS) and ACPI hardware support (ACPI_ALL_DRIVERS). The verbosity of debugging output is specified as the level and ranges from ACPI_LV_ERROR (just report errors) to ACPI_LV_VERBOSE (everything). The level is a bitmask so multiple options can be set at once, separated by spaces. In practice, you will want to use a serial console to log the output if it is so long it flushes the console message buffer. A full list of the individual layers and levels is found in the &man.acpi.4; manual page. Debugging output is not enabled by default. To enable it, add options ACPI_DEBUG to your kernel configuration file if ACPI is compiled into the kernel. You can add ACPI_DEBUG=1 to your /etc/make.conf to enable it globally. If it is a module, you can recompile just your acpi.ko module as follows: &prompt.root; cd /sys/modules/acpi/acpi && make clean && make ACPI_DEBUG=1 Install acpi.ko in /boot/kernel and add your desired level and layer to loader.conf. This example enables debug messages for all ACPI-CA components and all ACPI hardware drivers (CPU, LID, etc.) It will only output error messages, the least verbose level. debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS" debug.acpi.level="ACPI_LV_ERROR" If the information you want is triggered by a specific event (say, a suspend and then resume), you can leave out changes to loader.conf and instead use sysctl to specify the layer and level after booting and preparing your system for the specific event. The sysctls are named the same as the tunables in loader.conf. References More information about ACPI may be found in the following locations: The &a.acpi; The ACPI Mailing List Archives The old ACPI Mailing List Archives The ACPI 2.0 Specification &os; Manual pages: &man.acpi.4;, &man.acpi.thermal.4;, &man.acpidump.8;, &man.iasl.8;, &man.acpidb.8; DSDT debugging resource. (Uses Compaq as an example but generally useful.)