Index: stable/10/lib/libc/sys/poll.2 =================================================================== --- stable/10/lib/libc/sys/poll.2 (revision 306377) +++ stable/10/lib/libc/sys/poll.2 (revision 306378) @@ -1,274 +1,273 @@ .\" $NetBSD: poll.2,v 1.3 1996/09/07 21:53:08 mycroft Exp $ .\" $FreeBSD$ .\" .\" Copyright (c) 1996 Charles M. Hannum. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. All advertising materials mentioning features or use of this software .\" must display the following acknowledgement: .\" This product includes software developed by Charles M. Hannum. .\" 4. The name of the author may not be used to endorse or promote products .\" derived from this software without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR .\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES .\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. .\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, .\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT .\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, .\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" .Dd November 13, 2014 .Dt POLL 2 .Os .Sh NAME .Nm poll .Nd synchronous I/O multiplexing .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In poll.h .Ft int .Fn poll "struct pollfd fds[]" "nfds_t nfds" "int timeout" .Ft int .Fo ppoll .Fa "struct pollfd fds[]" .Fa "nfds_t nfds" .Fa "const struct timespec * restrict timeout" .Fa "const sigset_t * restrict newsigmask" .Fc .Sh DESCRIPTION The .Fn poll system call examines a set of file descriptors to see if some of them are ready for I/O. The .Fa fds argument is a pointer to an array of pollfd structures as defined in .In poll.h (shown below). The .Fa nfds argument determines the size of the .Fa fds array. .Bd -literal struct pollfd { int fd; /* file descriptor */ short events; /* events to look for */ short revents; /* events returned */ }; .Ed .Pp The fields of .Fa struct pollfd are as follows: .Bl -tag -width XXXrevents .It fd File descriptor to poll. If fd is equal to -1 then .Fa revents is cleared (set to zero), and that pollfd is not checked. .It events Events to poll for. (See below.) .It revents Events which may occur. (See below.) .El .Pp The event bitmasks in .Fa events and .Fa revents have the following bits: .Bl -tag -width XXXPOLLWRNORM .It POLLIN Data other than high priority data may be read without blocking. .It POLLRDNORM Normal data may be read without blocking. .It POLLRDBAND Data with a non-zero priority may be read without blocking. .It POLLPRI High priority data may be read without blocking. .It POLLOUT .It POLLWRNORM Normal data may be written without blocking. .It POLLWRBAND Data with a non-zero priority may be written without blocking. .It POLLERR An exceptional condition has occurred on the device or socket. This flag is always checked, even if not present in the .Fa events bitmask. .It POLLHUP The device or socket has been disconnected. This flag is always checked, even if not present in the .Fa events bitmask. Note that POLLHUP and POLLOUT should never be present in the .Fa revents bitmask at the same time. .It POLLNVAL The file descriptor is not open. This flag is always checked, even if not present in the .Fa events bitmask. .El .Pp If .Fa timeout is neither zero nor INFTIM (-1), it specifies a maximum interval to wait for any file descriptor to become ready, in milliseconds. If .Fa timeout is INFTIM (-1), the poll blocks indefinitely. If .Fa timeout is zero, then .Fn poll will return without blocking. .Pp The .Fn ppoll system call, unlike .Fn poll , is used to safely wait until either a set of file descriptors becomes ready or until a signal is caught. The .Fa fds and .Fa nfds arguments are identical to the analogous arguments of .Fn poll . The .Fa timeout argument in .Fn ppoll points to a .Vt "const struct timespec" which is defined in .In sys/timespec.h (shown below) rather than the .Vt "int timeout" used by .Fn poll . A null pointer may be passed to indicate that .Fn ppoll should wait indefinitely. Finally, .Fa newsigmask specifies a signal mask which is set while waiting for input. When .Fn ppoll returns, the original signal mask is restored. -.Pp .Bd -literal struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* and nanoseconds */ }; .Ed .Sh RETURN VALUES The .Fn poll system call returns the number of descriptors that are ready for I/O, or -1 if an error occurred. If the time limit expires, .Fn poll returns 0. If .Fn poll returns with an error, including one due to an interrupted system call, the .Fa fds array will be unmodified. .Sh COMPATIBILITY This implementation differs from the historical one in that a given file descriptor may not cause .Fn poll to return with an error. In cases where this would have happened in the historical implementation (e.g.\& trying to poll a .Xr revoke 2 Ns ed descriptor), this implementation instead copies the .Fa events bitmask to the .Fa revents bitmask. Attempting to perform I/O on this descriptor will then return an error. This behaviour is believed to be more useful. .Sh ERRORS An error return from .Fn poll indicates: .Bl -tag -width Er .It Bq Er EFAULT The .Fa fds argument points outside the process's allocated address space. .It Bq Er EINTR A signal was delivered before the time limit expired and before any of the selected events occurred. .It Bq Er EINVAL The specified time limit is invalid. One of its components is negative or too large. .El .Sh SEE ALSO .Xr accept 2 , .Xr connect 2 , .Xr kqueue 2 , .Xr pselect 2 , .Xr read 2 , .Xr recv 2 , .Xr select 2 , .Xr send 2 , .Xr write 2 .Sh STANDARDS The .Fn poll function conforms to .St -p1003.1-2001 . The .Fn ppoll is not specified by POSIX. .Sh HISTORY The .Fn poll function appeared in .At V . This manual page and the core of the implementation was taken from .Nx . The .Fn ppoll function first appeared in .Fx 11.0 .Sh BUGS The distinction between some of the fields in the .Fa events and .Fa revents bitmasks is really not useful without STREAMS. The fields are defined for compatibility with existing software. Index: stable/10/lib/libdpv/dpv.3 =================================================================== --- stable/10/lib/libdpv/dpv.3 (revision 306377) +++ stable/10/lib/libdpv/dpv.3 (revision 306378) @@ -1,511 +1,511 @@ .\" Copyright (c) 2013-2016 Devin Teske .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd Jan 26, 2016 .Dt DPV 3 .Os .Sh NAME .Nm dpv .Nd dialog progress view library .Sh LIBRARY .Lb libdpv .Sh SYNOPSIS .In dpv.h .Ft int .Fo dpv .Fa "struct dpv_config *config, struct dpv_file_node *file_list" .Fc .Ft void .Fo dpv_free .Fa "void" .Fc .Sh DESCRIPTION The .Nm library provides an interface for creating complex .Dq gauge widgets for displaying progress on various actions. The .Nm library can display progress with one of .Xr dialog 3 , .Xr dialog 1 , or .Xr Xdialog 1 .Pq x11/xdialog from the ports tree . .Pp The .Fn dpv .Fa config argument contains the following properties for configuring global display features: .Bd -literal -offset indent struct dpv_config { uint8_t keep_tite; /* Cleaner exit for scripts */ enum dpv_display display_type; /* Def. DPV_DISPLAY_LIBDIALOG */ enum dpv_output output_type; /* Default DPV_OUTPUT_NONE */ int debug; /* Enable debug on stderr */ int display_limit; /* Files/page. Default -1 */ int label_size; /* Label size. Default 28 */ int pbar_size; /* Mini-progress size */ int dialog_updates_per_second; /* Default 16 */ int status_updates_per_second; /* Default 2 */ uint16_t options; /* Default 0 (none) */ char *title; /* Widget title */ char *backtitle; /* Widget backtitle */ char *aprompt; /* Append. Default NULL */ char *pprompt; /* Prefix. Default NULL */ char *msg_done; /* Default `Done' */ char *msg_fail; /* Default `Fail' */ char *msg_pending; /* Default `Pending' */ char *output; /* Output format string */ const char *status_solo; /* dialog(3) solo-status format. * Default DPV_STATUS_SOLO */ const char *status_many; /* dialog(3) many-status format. * Default DPV_STATUS_MANY */ /* * Function pointer; action to perform data transfer */ int (*action)(struct dpv_file_node *file, int out); }; enum dpv_display { DPV_DISPLAY_LIBDIALOG = 0, /* Use dialog(3) (default) */ DPV_DISPLAY_STDOUT, /* Use stdout */ DPV_DISPLAY_DIALOG, /* Use spawned dialog(1) */ DPV_DISPLAY_XDIALOG, /* Use spawned Xdialog(1) */ }; enum dpv_output { DPV_OUTPUT_NONE = 0, /* No output (default) */ DPV_OUTPUT_FILE, /* Read `output' member as file path */ DPV_OUTPUT_SHELL, /* Read `output' member as shell cmd */ }; .Ed .Pp The .Va options member of the .Fn dpv .Fa config argument is a mask of bit fields indicating various processing options. Possible flags are as follows: .Bl -tag -width DPV_NO_OVERRUN .It Dv DPV_TEST_MODE Enable test mode. In test mode, the .Fn action callback of the .Fa config argument is not called but instead simulated-data is used to drive progress. Appends .Dq [TEST MODE] to the status line .Po to override, set the .Va status_format member of the .Fn dpv .Fa config argument; e.g., to .Dv DPV_STATUS_DEFAULT .Pc . .It Dv DPV_WIDE_MODE Enable wide mode. In wide mode, the length of the .Va aprompt and .Va pprompt members of the .Fn dpv .Fa config argument will bump the width of the gauge widget. Prompts wider than the maximum width will wrap .Po unless using .Xr Xdialog 1 ; see BUGS section below .Pc . .It Dv DPV_NO_LABELS Disables the display of labels associated with each transfer .Po .Va label_size member of .Fn dpv .Fa config argument is ignored .Pc . .It Dv DPV_USE_COLOR Force the use of color even if the .Va display_type does not support color .Po .Ev USE_COLOR environment variable is ignored .Pc . .It Dv DPV_NO_OVERRUN When enabled, callbacks for the current .Vt dpv_file_node are terminated when .Fn action returns 100 or greater .Po alleviates the need to change the .Va status of the current .Vt dpv_file_node but may also cause file truncation if the stream exceeds expected length .Pc . .El .Pp The .Fa file_list argument to .Fn dpv is a pointer to a .Dq linked-list , described as follows in .In dpv.h : .Bd -literal -offset indent struct dpv_file_node { enum dpv_status status; /* status of read operation */ char *msg; /* display instead of "Done/Fail" */ char *name; /* name of file to read */ char *path; /* path to file */ long long length; /* expected size */ long long read; /* number units read (e.g., bytes) */ struct dpv_file_node *next;/* pointer to next (end with NULL) */ }; .Ed .Pp For each of the items in the .Fa file_list .Dq linked-list argument, the .Fn action callback member of the .Fn dpv .Fa config argument is called. The .Fn action function should perform a .Dq nominal action on the file and return. The return value of .Vt int represents the current progress percentage .Pq 0-100 for the current file. .Pp The .Fn action callback provides two variables for each call. .Fa file provides a reference to the current .Vt dpv_file_node being processed. .Fa out provides a file descriptor where the data should go. .Pp If the .Va output member of the .Fn dpv .Fa config argument was set to DPV_OUTPUT_NONE .Pq default ; when invoking Fn dpv , the .Fa out file descriptor of .Fn action will be zero and should be ignored. If .Fa output was set to DPV_OUTPUT_FILE, .Fa out will be an open file descriptor to a file. If .Fa output was set to DPV_OUTPUT_SHELL, .Fa out will be an open file descriptor to a pipe for a spawned shell program. When .Fa out is greater than zero, you should write any data you have read back to .Fa out . .Pp To abort .Fn dpv , either from the .Fn action callback or asynchronously from a signal handler, two globals are provided via .In dpv.h : .Bd -literal -offset indent extern int dpv_interrupt; /* Set to TRUE in interrupt handler */ extern int dpv_abort; /* Set to true in callback to abort */ .Ed .Pp These globals are not automatically reset and must be manually maintained. Don't forget to reset these globals before subsequent invocations of .Fn dpv when making multiple calls from the same program. .Pp In addition, the .Va status member of the .Fn action .Fa file argument can be used to control callbacks for the current file. The .Va status member can be set to any of the following from .In dpv.h : .Bd -literal -offset indent enum dpv_status { DPV_STATUS_RUNNING = 0, /* Running (default) */ DPV_STATUS_DONE, /* Completed */ DPV_STATUS_FAILED, /* Oops, something went wrong */ }; .Ed .Pp The default .Fa status is zero, DPV_STATUS_RUNING, which keeps the callbacks coming for the current .Fn file . Setting .Ql file->status to anything other than DPV_STATUS_RUNNING will cause .Fn dpv to loop to the next file, effecting the next callback, if any. .Pp The .Fn action callback is responsible for calculating percentages and .Pq recommended maintaining a .Nm global counter so .Fn dpv can display throughput statistics. Percentages are reported through the .Vt int return value of the .Fn action callback. Throughput statistics are calculated from the following global .Vt int in .In dpv.h : .Bd -literal -offset indent extern int dpv_overall_read; .Ed .Pp This should be set to the number of bytes that have been read for all files. Throughput information is displayed in the status line .Pq only available when using Xr dialog 3 at the bottom of the screen. See DPV_DISPLAY_LIBDIALOG above. .Pp Note that .Va dpv_overall_read does not have to represent bytes. For example, you can change the .Va status_format to display something other than .Dq Li bytes and increment .Va dpv_overall_read accordingly .Pq e.g., counting lines . .Pp When .Fn dpv is processing the current file, the .Va length and .Va read members of the .Fn action .Fa file argument are used for calculating the display of mini progress bars .Po if enabled; see .Va pbar_size above .Pc . If the .Va length member of the current .Fa file is less than zero .Pq indicating an unknown file length , a .Xr humanize_number 3 version of the .Va read member is used instead of a traditional progress bar. Otherwise a progress bar is calculated as percentage read to file length. .Fn action callback must maintain these member values for mini-progress bars. .Pp The .Fn dpv_free function performs .Xr free 3 on private global variables initialized by .Fn dpv . .Sh ENVIRONMENT The following environment variables are referenced by .Nm : .Bl -tag -width ".Ev USE_COLOR" .It Ev DIALOG Override command string used to launch .Xr dialog 1 .Pq requires Dv DPV_DISPLAY_DIALOG or .Xr Xdialog 1 .Pq requires Dv DPV_DISPLAY_XDIALOG ; default is either .Ql dialog .Pq for Dv DPV_DISPLAY_DIALOG or .Ql Xdialog .Pq for Dv DPV_DISPLAY_XDIALOG . .It Ev DIALOGRC If set and non-NULL, path to .Ql .dialogrc file. .It Ev HOME If .Ql Ev $DIALOGRC is either not set or NULL, used as a prefix to .Ql .dialogrc .Pq i.e., Ql $HOME/.dialogrc . .It Ev USE_COLOR If set and NULL, disables the use of color when using .Xr dialog 1 .Pq does not apply to Xr Xdialog 1 . .It Ev msg_done Ev msg_fail Ev msg_pending Internationalization strings for overriding the default English strings .Ql Done , .Ql Fail , and .Ql Pending respectively. To prevent their usage, explicitly set the .Va msg_done , .Va msg_fail , and .Va msg_pending members of .Fn dpv .Fa config argument to default macros .Pq DPV_DONE_DEFAULT, DPV_FAIL_DEFAULT, and DPV_PENDING_DEFAULT or desired values. .El .Sh FILES .Bl -tag -width ".Pa $HOME/.dialogrc" -compact .It Pa $HOME/.dialogrc .El .Sh SEE ALSO .Xr dialog 1 , .Xr dialog 3 , .Xr Xdialog 1 .Sh HISTORY The .Nm library first appeared in .Fx 10.2 . .Sh AUTHORS .An Devin Teske Aq dteske@FreeBSD.org .Sh BUGS .Xr Xdialog 1 , when given both .Ql Fl -title Ar title .Po see above .Ql Va title member of .Va struct dpv_config .Pc and .Ql Fl -backtitle Ar backtitle .Po see above .Ql Va backtitle member of .Va struct dpv_config .Pc , displays the backtitle in place of the title and vice-versa. .Pp .Xr Xdialog 1 does not wrap long prompt texts received after initial launch. This is a known issue with the .Ql --gauge widget in .Xr Xdialog 1 . Embed escaped newlines within prompt text(s) to force line breaks. .Pp .Xr dialog 1 does not display the first character after a series of escaped escape-sequences (e.g., ``\\\\n'' produces ``\\'' instead of ``\\n''). This is a known issue with .Xr dialog 1 and does not affect .Xr dialog 3 or .Xr Xdialog 1 . .Pp If your application ignores .Ev USE_COLOR when set and NULL before calling .Xr dpv 3 with color escape sequences anyway, .Xr dialog 3 and .Xr dialog 1 may not render properly. Workaround is to detect when .Ev USE_COLOR is set and NULL and either not use color escape sequences at that time or use .Xr unsetenv 3 to unset .Ev USE_COLOR , forcing interpretation of color sequences. This does not effect .Xr Xdialog 1 , which renders the color escape sequences as plain text. See -.Do Li +.Do embedded "\\Z" sequences .Dc in .Xr dialog 1 for additional information. Index: stable/10/sbin/ipfw/ipfw.8 =================================================================== --- stable/10/sbin/ipfw/ipfw.8 (revision 306377) +++ stable/10/sbin/ipfw/ipfw.8 (revision 306378) @@ -1,3473 +1,3472 @@ .\" .\" $FreeBSD$ .\" .Dd May 26, 2016 .Dt IPFW 8 .Os .Sh NAME .Nm ipfw .Nd User interface for firewall, traffic shaper, packet scheduler, in-kernel NAT. .Sh SYNOPSIS .Ss FIREWALL CONFIGURATION .Nm .Op Fl cq .Cm add .Ar rule .Nm .Op Fl acdefnNStT .Op Cm set Ar N .Brq Cm list | show .Op Ar rule | first-last ... .Nm .Op Fl f | q .Op Cm set Ar N .Cm flush .Nm .Op Fl q .Op Cm set Ar N .Brq Cm delete | zero | resetlog .Op Ar number ... .Pp .Nm .Cm set Oo Cm disable Ar number ... Oc Op Cm enable Ar number ... .Nm .Cm set move .Op Cm rule .Ar number Cm to Ar number .Nm .Cm set swap Ar number number .Nm .Cm set show .Ss SYSCTL SHORTCUTS .Nm .Cm enable .Brq Cm firewall | altq | one_pass | debug | verbose | dyn_keepalive .Nm .Cm disable .Brq Cm firewall | altq | one_pass | debug | verbose | dyn_keepalive .Ss LOOKUP TABLES .Nm .Cm table Ar number Cm add Ar addr Ns Oo / Ns Ar masklen Oc Op Ar value .Nm .Cm table Ar number Cm delete Ar addr Ns Op / Ns Ar masklen .Nm .Cm table .Brq Ar number | all .Cm flush .Nm .Cm table .Brq Ar number | all .Cm list .Ss DUMMYNET CONFIGURATION (TRAFFIC SHAPER AND PACKET SCHEDULER) .Nm .Brq Cm pipe | queue | sched .Ar number .Cm config .Ar config-options .Nm .Op Fl s Op Ar field .Brq Cm pipe | queue | sched .Brq Cm delete | list | show .Op Ar number ... .Ss IN-KERNEL NAT .Nm .Op Fl q .Cm nat .Ar number .Cm config .Ar config-options .Pp .Nm .Op Fl cfnNqS .Oo .Fl p Ar preproc .Oo .Ar preproc-flags .Oc .Oc .Ar pathname .Sh DESCRIPTION The .Nm utility is the user interface for controlling the .Xr ipfw 4 firewall, the .Xr dummynet 4 traffic shaper/packet scheduler, and the in-kernel NAT services. .Pp A firewall configuration, or .Em ruleset , is made of a list of .Em rules numbered from 1 to 65535. Packets are passed to the firewall from a number of different places in the protocol stack (depending on the source and destination of the packet, it is possible for the firewall to be invoked multiple times on the same packet). The packet passed to the firewall is compared against each of the rules in the .Em ruleset , in rule-number order (multiple rules with the same number are permitted, in which case they are processed in order of insertion). When a match is found, the action corresponding to the matching rule is performed. .Pp Depending on the action and certain system settings, packets can be reinjected into the firewall at some rule after the matching one for further processing. .Pp A ruleset always includes a .Em default rule (numbered 65535) which cannot be modified or deleted, and matches all packets. The action associated with the .Em default rule can be either .Cm deny or .Cm allow depending on how the kernel is configured. .Pp If the ruleset includes one or more rules with the .Cm keep-state or .Cm limit option, the firewall will have a .Em stateful behaviour, i.e., upon a match it will create .Em dynamic rules , i.e., rules that match packets with the same 5-tuple (protocol, source and destination addresses and ports) as the packet which caused their creation. Dynamic rules, which have a limited lifetime, are checked at the first occurrence of a .Cm check-state , .Cm keep-state or .Cm limit rule, and are typically used to open the firewall on-demand to legitimate traffic only. See the .Sx STATEFUL FIREWALL and .Sx EXAMPLES Sections below for more information on the stateful behaviour of .Nm . .Pp All rules (including dynamic ones) have a few associated counters: a packet count, a byte count, a log count and a timestamp indicating the time of the last match. Counters can be displayed or reset with .Nm commands. .Pp Each rule belongs to one of 32 different .Em sets , and there are .Nm commands to atomically manipulate sets, such as enable, disable, swap sets, move all rules in a set to another one, delete all rules in a set. These can be useful to install temporary configurations, or to test them. See Section .Sx SETS OF RULES for more information on .Em sets . .Pp Rules can be added with the .Cm add command; deleted individually or in groups with the .Cm delete command, and globally (except those in set 31) with the .Cm flush command; displayed, optionally with the content of the counters, using the .Cm show and .Cm list commands. Finally, counters can be reset with the .Cm zero and .Cm resetlog commands. .Pp .Ss COMMAND OPTIONS The following general options are available when invoking .Nm : .Bl -tag -width indent .It Fl a Show counter values when listing rules. The .Cm show command implies this option. .It Fl b Only show the action and the comment, not the body of a rule. Implies .Fl c . .It Fl c When entering or showing rules, print them in compact form, i.e., omitting the "ip from any to any" string when this does not carry any additional information. .It Fl d When listing, show dynamic rules in addition to static ones. .It Fl e When listing and .Fl d is specified, also show expired dynamic rules. .It Fl f Do not ask for confirmation for commands that can cause problems if misused, i.e., .Cm flush . If there is no tty associated with the process, this is implied. .It Fl i When listing a table (see the .Sx LOOKUP TABLES section below for more information on lookup tables), format values as IP addresses. By default, values are shown as integers. .It Fl n Only check syntax of the command strings, without actually passing them to the kernel. .It Fl N Try to resolve addresses and service names in output. .It Fl q Be quiet when executing the .Cm add , .Cm nat , .Cm zero , .Cm resetlog or .Cm flush commands; (implies .Fl f ) . This is useful when updating rulesets by executing multiple .Nm commands in a script (e.g., .Ql sh\ /etc/rc.firewall ) , or by processing a file with many .Nm rules across a remote login session. It also stops a table add or delete from failing if the entry already exists or is not present. .Pp The reason why this option may be important is that for some of these actions, .Nm may print a message; if the action results in blocking the traffic to the remote client, the remote login session will be closed and the rest of the ruleset will not be processed. Access to the console would then be required to recover. .It Fl S When listing rules, show the .Em set each rule belongs to. If this flag is not specified, disabled rules will not be listed. .It Fl s Op Ar field When listing pipes, sort according to one of the four counters (total or current packets or bytes). .It Fl t When listing, show last match timestamp converted with ctime(). .It Fl T When listing, show last match timestamp as seconds from the epoch. This form can be more convenient for postprocessing by scripts. .El .Ss LIST OF RULES AND PREPROCESSING To ease configuration, rules can be put into a file which is processed using .Nm as shown in the last synopsis line. An absolute .Ar pathname must be used. The file will be read line by line and applied as arguments to the .Nm utility. .Pp Optionally, a preprocessor can be specified using .Fl p Ar preproc where .Ar pathname is to be piped through. Useful preprocessors include .Xr cpp 1 and .Xr m4 1 . If .Ar preproc does not start with a slash .Pq Ql / as its first character, the usual .Ev PATH name search is performed. Care should be taken with this in environments where not all file systems are mounted (yet) by the time .Nm is being run (e.g.\& when they are mounted over NFS). Once .Fl p has been specified, any additional arguments are passed on to the preprocessor for interpretation. This allows for flexible configuration files (like conditionalizing them on the local hostname) and the use of macros to centralize frequently required arguments like IP addresses. .Ss TRAFFIC SHAPER CONFIGURATION The .Nm .Cm pipe , queue and .Cm sched commands are used to configure the traffic shaper and packet scheduler. See the .Sx TRAFFIC SHAPER (DUMMYNET) CONFIGURATION Section below for details. .Pp If the world and the kernel get out of sync the .Nm ABI may break, preventing you from being able to add any rules. This can adversely effect the booting process. You can use .Nm .Cm disable .Cm firewall to temporarily disable the firewall to regain access to the network, allowing you to fix the problem. .Sh PACKET FLOW A packet is checked against the active ruleset in multiple places in the protocol stack, under control of several sysctl variables. These places and variables are shown below, and it is important to have this picture in mind in order to design a correct ruleset. .Bd -literal -offset indent ^ to upper layers V | | +----------->-----------+ ^ V [ip(6)_input] [ip(6)_output] net.inet(6).ip(6).fw.enable=1 | | ^ V [ether_demux] [ether_output_frame] net.link.ether.ipfw=1 | | +-->--[bdg_forward]-->--+ net.link.bridge.ipfw=1 ^ V | to devices | .Ed .Pp The number of times the same packet goes through the firewall can vary between 0 and 4 depending on packet source and destination, and system configuration. .Pp Note that as packets flow through the stack, headers can be stripped or added to it, and so they may or may not be available for inspection. E.g., incoming packets will include the MAC header when .Nm is invoked from .Cm ether_demux() , but the same packets will have the MAC header stripped off when .Nm is invoked from .Cm ip_input() or .Cm ip6_input() . .Pp Also note that each packet is always checked against the complete ruleset, irrespective of the place where the check occurs, or the source of the packet. If a rule contains some match patterns or actions which are not valid for the place of invocation (e.g.\& trying to match a MAC header within .Cm ip_input or .Cm ip6_input ), the match pattern will not match, but a .Cm not operator in front of such patterns .Em will cause the pattern to .Em always match on those packets. It is thus the responsibility of the programmer, if necessary, to write a suitable ruleset to differentiate among the possible places. .Cm skipto rules can be useful here, as an example: .Bd -literal -offset indent # packets from ether_demux or bdg_forward ipfw add 10 skipto 1000 all from any to any layer2 in # packets from ip_input ipfw add 10 skipto 2000 all from any to any not layer2 in # packets from ip_output ipfw add 10 skipto 3000 all from any to any not layer2 out # packets from ether_output_frame ipfw add 10 skipto 4000 all from any to any layer2 out .Ed .Pp (yes, at the moment there is no way to differentiate between ether_demux and bdg_forward). .Sh SYNTAX In general, each keyword or argument must be provided as a separate command line argument, with no leading or trailing spaces. Keywords are case-sensitive, whereas arguments may or may not be case-sensitive depending on their nature (e.g.\& uid's are, hostnames are not). .Pp Some arguments (e.g., port or address lists) are comma-separated lists of values. In this case, spaces after commas ',' are allowed to make the line more readable. You can also put the entire command (including flags) into a single argument. E.g., the following forms are equivalent: .Bd -literal -offset indent ipfw -q add deny src-ip 10.0.0.0/24,127.0.0.1/8 ipfw -q add deny src-ip 10.0.0.0/24, 127.0.0.1/8 ipfw "-q add deny src-ip 10.0.0.0/24, 127.0.0.1/8" .Ed .Sh RULE FORMAT The format of firewall rules is the following: .Bd -ragged -offset indent .Bk -words .Op Ar rule_number .Op Cm set Ar set_number .Op Cm prob Ar match_probability .Ar action .Op Cm log Op Cm logamount Ar number .Op Cm altq Ar queue .Oo .Bro Cm tag | untag .Brc Ar number .Oc .Ar body .Ek .Ed .Pp where the body of the rule specifies which information is used for filtering packets, among the following: .Pp .Bl -tag -width "Source and dest. addresses and ports" -offset XXX -compact .It Layer-2 header fields When available .It IPv4 and IPv6 Protocol TCP, UDP, ICMP, etc. .It Source and dest. addresses and ports .It Direction See Section .Sx PACKET FLOW .It Transmit and receive interface By name or address .It Misc. IP header fields Version, type of service, datagram length, identification, fragment flag (non-zero IP offset), Time To Live .It IP options .It IPv6 Extension headers Fragmentation, Hop-by-Hop options, Routing Headers, Source routing rthdr0, Mobile IPv6 rthdr2, IPSec options. .It IPv6 Flow-ID .It Misc. TCP header fields TCP flags (SYN, FIN, ACK, RST, etc.), sequence number, acknowledgment number, window .It TCP options .It ICMP types for ICMP packets .It ICMP6 types for ICMP6 packets .It User/group ID When the packet can be associated with a local socket. .It Divert status Whether a packet came from a divert socket (e.g., .Xr natd 8 ) . .It Fib annotation state Whether a packet has been tagged for using a specific FIB (routing table) in future forwarding decisions. .El .Pp Note that some of the above information, e.g.\& source MAC or IP addresses and TCP/UDP ports, can be easily spoofed, so filtering on those fields alone might not guarantee the desired results. .Bl -tag -width indent .It Ar rule_number Each rule is associated with a .Ar rule_number in the range 1..65535, with the latter reserved for the .Em default rule. Rules are checked sequentially by rule number. Multiple rules can have the same number, in which case they are checked (and listed) according to the order in which they have been added. If a rule is entered without specifying a number, the kernel will assign one in such a way that the rule becomes the last one before the .Em default rule. Automatic rule numbers are assigned by incrementing the last non-default rule number by the value of the sysctl variable .Ar net.inet.ip.fw.autoinc_step which defaults to 100. If this is not possible (e.g.\& because we would go beyond the maximum allowed rule number), the number of the last non-default value is used instead. .It Cm set Ar set_number Each rule is associated with a .Ar set_number in the range 0..31. Sets can be individually disabled and enabled, so this parameter is of fundamental importance for atomic ruleset manipulation. It can be also used to simplify deletion of groups of rules. If a rule is entered without specifying a set number, set 0 will be used. .br Set 31 is special in that it cannot be disabled, and rules in set 31 are not deleted by the .Nm ipfw flush command (but you can delete them with the .Nm ipfw delete set 31 command). Set 31 is also used for the .Em default rule. .It Cm prob Ar match_probability A match is only declared with the specified probability (floating point number between 0 and 1). This can be useful for a number of applications such as random packet drop or (in conjunction with .Nm dummynet ) to simulate the effect of multiple paths leading to out-of-order packet delivery. .Pp Note: this condition is checked before any other condition, including ones such as keep-state or check-state which might have side effects. .It Cm log Op Cm logamount Ar number Packets matching a rule with the .Cm log keyword will be made available for logging in two ways: if the sysctl variable .Va net.inet.ip.fw.verbose is set to 0 (default), one can use .Xr bpf 4 attached to the .Li ipfw0 pseudo interface. This pseudo interface can be created after a boot manually by using the following command: .Bd -literal -offset indent # ifconfig ipfw0 create .Ed .Pp Or, automatically at boot time by adding the following line to the .Xr rc.conf 5 file: .Bd -literal -offset indent firewall_logif="YES" .Ed .Pp There is no overhead if no .Xr bpf 4 is attached to the pseudo interface. .Pp If .Va net.inet.ip.fw.verbose is set to 1, packets will be logged to .Xr syslogd 8 with a .Dv LOG_SECURITY facility up to a maximum of .Cm logamount packets. If no .Cm logamount is specified, the limit is taken from the sysctl variable .Va net.inet.ip.fw.verbose_limit . In both cases, a value of 0 means unlimited logging. .Pp Once the limit is reached, logging can be re-enabled by clearing the logging counter or the packet counter for that entry, see the .Cm resetlog command. .Pp Note: logging is done after all other packet matching conditions have been successfully verified, and before performing the final action (accept, deny, etc.) on the packet. .It Cm tag Ar number When a packet matches a rule with the .Cm tag keyword, the numeric tag for the given .Ar number in the range 1..65534 will be attached to the packet. The tag acts as an internal marker (it is not sent out over the wire) that can be used to identify these packets later on. This can be used, for example, to provide trust between interfaces and to start doing policy-based filtering. A packet can have multiple tags at the same time. Tags are "sticky", meaning once a tag is applied to a packet by a matching rule it exists until explicit removal. Tags are kept with the packet everywhere within the kernel, but are lost when packet leaves the kernel, for example, on transmitting packet out to the network or sending packet to a .Xr divert 4 socket. .Pp To check for previously applied tags, use the .Cm tagged rule option. To delete previously applied tag, use the .Cm untag keyword. .Pp Note: since tags are kept with the packet everywhere in kernelspace, they can be set and unset anywhere in the kernel network subsystem (using the .Xr mbuf_tags 9 facility), not only by means of the .Xr ipfw 4 .Cm tag and .Cm untag keywords. For example, there can be a specialized .Xr netgraph 4 node doing traffic analyzing and tagging for later inspecting in firewall. .It Cm untag Ar number When a packet matches a rule with the .Cm untag keyword, the tag with the number .Ar number is searched among the tags attached to this packet and, if found, removed from it. Other tags bound to packet, if present, are left untouched. .It Cm altq Ar queue When a packet matches a rule with the .Cm altq keyword, the ALTQ identifier for the given .Ar queue (see .Xr altq 4 ) will be attached. Note that this ALTQ tag is only meaningful for packets going "out" of IPFW, and not being rejected or going to divert sockets. Note that if there is insufficient memory at the time the packet is processed, it will not be tagged, so it is wise to make your ALTQ "default" queue policy account for this. If multiple .Cm altq rules match a single packet, only the first one adds the ALTQ classification tag. In doing so, traffic may be shaped by using .Cm count Cm altq Ar queue rules for classification early in the ruleset, then later applying the filtering decision. For example, .Cm check-state and .Cm keep-state rules may come later and provide the actual filtering decisions in addition to the fallback ALTQ tag. .Pp You must run .Xr pfctl 8 to set up the queues before IPFW will be able to look them up by name, and if the ALTQ disciplines are rearranged, the rules in containing the queue identifiers in the kernel will likely have gone stale and need to be reloaded. Stale queue identifiers will probably result in misclassification. .Pp All system ALTQ processing can be turned on or off via .Nm .Cm enable Ar altq and .Nm .Cm disable Ar altq . The usage of .Va net.inet.ip.fw.one_pass is irrelevant to ALTQ traffic shaping, as the actual rule action is followed always after adding an ALTQ tag. .El .Ss RULE ACTIONS A rule can be associated with one of the following actions, which will be executed when the packet matches the body of the rule. .Bl -tag -width indent .It Cm allow | accept | pass | permit Allow packets that match rule. The search terminates. .It Cm check-state Checks the packet against the dynamic ruleset. If a match is found, execute the action associated with the rule which generated this dynamic rule, otherwise move to the next rule. .br .Cm Check-state rules do not have a body. If no .Cm check-state rule is found, the dynamic ruleset is checked at the first .Cm keep-state or .Cm limit rule. .It Cm count Update counters for all packets that match rule. The search continues with the next rule. .It Cm deny | drop Discard packets that match this rule. The search terminates. .It Cm divert Ar port Divert packets that match this rule to the .Xr divert 4 socket bound to port .Ar port . The search terminates. .It Cm fwd | forward Ar ipaddr | tablearg Ns Op , Ns Ar port Change the next-hop on matching packets to .Ar ipaddr , which can be an IP address or a host name. For IPv4, the next hop can also be supplied by the last table looked up for the packet by using the .Cm tablearg keyword instead of an explicit address. The search terminates if this rule matches. .Pp If .Ar ipaddr is a local address, then matching packets will be forwarded to .Ar port (or the port number in the packet if one is not specified in the rule) on the local machine. .br If .Ar ipaddr is not a local address, then the port number (if specified) is ignored, and the packet will be forwarded to the remote address, using the route as found in the local routing table for that IP. .br A .Ar fwd rule will not match layer-2 packets (those received on ether_input, ether_output, or bridged). .br The .Cm fwd action does not change the contents of the packet at all. In particular, the destination address remains unmodified, so packets forwarded to another system will usually be rejected by that system unless there is a matching rule on that system to capture them. For packets forwarded locally, the local address of the socket will be set to the original destination address of the packet. This makes the .Xr netstat 1 entry look rather weird but is intended for use with transparent proxy servers. .It Cm nat Ar nat_nr | tablearg Pass packet to a nat instance (for network address translation, address redirect, etc.): see the .Sx NETWORK ADDRESS TRANSLATION (NAT) Section for further information. .It Cm pipe Ar pipe_nr Pass packet to a .Nm dummynet .Dq pipe (for bandwidth limitation, delay, etc.). See the .Sx TRAFFIC SHAPER (DUMMYNET) CONFIGURATION Section for further information. The search terminates; however, on exit from the pipe and if the .Xr sysctl 8 variable .Va net.inet.ip.fw.one_pass is not set, the packet is passed again to the firewall code starting from the next rule. .It Cm queue Ar queue_nr Pass packet to a .Nm dummynet .Dq queue (for bandwidth limitation using WF2Q+). .It Cm reject (Deprecated). Synonym for .Cm unreach host . .It Cm reset Discard packets that match this rule, and if the packet is a TCP packet, try to send a TCP reset (RST) notice. The search terminates. .It Cm reset6 Discard packets that match this rule, and if the packet is a TCP packet, try to send a TCP reset (RST) notice. The search terminates. .It Cm skipto Ar number | tablearg Skip all subsequent rules numbered less than .Ar number . The search continues with the first rule numbered .Ar number or higher. It is possible to use the .Cm tablearg keyword with a skipto for a .Em computed skipto, but care should be used, as no destination caching is possible in this case so the rules are always walked to find it, starting from the .Cm skipto . .It Cm call Ar number | tablearg The current rule number is saved in the internal stack and ruleset processing continues with the first rule numbered .Ar number or higher. If later a rule with the .Cm return action is encountered, the processing returns to the first rule with number of this .Cm call rule plus one or higher (the same behaviour as with packets returning from .Xr divert 4 socket after a .Cm divert action). This could be used to make somewhat like an assembly language .Dq subroutine calls to rules with common checks for different interfaces, etc. .Pp Rule with any number could be called, not just forward jumps as with .Cm skipto . So, to prevent endless loops in case of mistakes, both .Cm call and .Cm return actions don't do any jumps and simply go to the next rule if memory cannot be allocated or stack overflowed/underflowed. .Pp Internally stack for rule numbers is implemented using .Xr mbuf_tags 9 facility and currently has size of 16 entries. As mbuf tags are lost when packet leaves the kernel, .Cm divert should not be used in subroutines to avoid endless loops and other undesired effects. .It Cm return Takes rule number saved to internal stack by the last .Cm call action and returns ruleset processing to the first rule with number greater than number of corresponding .Cm call rule. See description of the .Cm call action for more details. .Pp Note that .Cm return rules usually end a .Dq subroutine and thus are unconditional, but .Nm command-line utility currently requires every action except .Cm check-state to have body. While it is sometimes useful to return only on some packets, usually you want to print just .Dq return for readability. A workaround for this is to use new syntax and .Fl c switch: .Bd -literal -offset indent # Add a rule without actual body ipfw add 2999 return via any # List rules without "from any to any" part ipfw -c list .Ed .Pp This cosmetic annoyance may be fixed in future releases. .It Cm tee Ar port Send a copy of packets matching this rule to the .Xr divert 4 socket bound to port .Ar port . The search continues with the next rule. .It Cm unreach Ar code Discard packets that match this rule, and try to send an ICMP unreachable notice with code .Ar code , where .Ar code is a number from 0 to 255, or one of these aliases: .Cm net , host , protocol , port , .Cm needfrag , srcfail , net-unknown , host-unknown , .Cm isolated , net-prohib , host-prohib , tosnet , .Cm toshost , filter-prohib , host-precedence or .Cm precedence-cutoff . The search terminates. .It Cm unreach6 Ar code Discard packets that match this rule, and try to send an ICMPv6 unreachable notice with code .Ar code , where .Ar code is a number from 0, 1, 3 or 4, or one of these aliases: .Cm no-route, admin-prohib, address or .Cm port . The search terminates. .It Cm netgraph Ar cookie Divert packet into netgraph with given .Ar cookie . The search terminates. If packet is later returned from netgraph it is either accepted or continues with the next rule, depending on .Va net.inet.ip.fw.one_pass sysctl variable. .It Cm ngtee Ar cookie A copy of packet is diverted into netgraph, original packet continues with the next rule. See .Xr ng_ipfw 4 for more information on .Cm netgraph and .Cm ngtee actions. .It Cm setfib Ar fibnum | tablearg The packet is tagged so as to use the FIB (routing table) .Ar fibnum in any subsequent forwarding decisions. In the current implementation, this is limited to the values 0 through 15, see .Xr setfib 2 . Processing continues at the next rule. It is possible to use the .Cm tablearg keyword with setfib. If the tablearg value is not within the compiled range of fibs, the packet's fib is set to 0. .It Cm setdscp Ar DSCP | number | tablearg Set specified DiffServ codepoint for an IPv4/IPv6 packet. Processing continues at the next rule. Supported values are: .Pp .Cm CS0 .Pq Dv 000000 , .Cm CS1 .Pq Dv 001000 , .Cm CS2 .Pq Dv 010000 , .Cm CS3 .Pq Dv 011000 , .Cm CS4 .Pq Dv 100000 , .Cm CS5 .Pq Dv 101000 , .Cm CS6 .Pq Dv 110000 , .Cm CS7 .Pq Dv 111000 , .Cm AF11 .Pq Dv 001010 , .Cm AF12 .Pq Dv 001100 , .Cm AF13 .Pq Dv 001110 , .Cm AF21 .Pq Dv 010010 , .Cm AF22 .Pq Dv 010100 , .Cm AF23 .Pq Dv 010110 , .Cm AF31 .Pq Dv 011010 , .Cm AF32 .Pq Dv 011100 , .Cm AF33 .Pq Dv 011110 , .Cm AF41 .Pq Dv 100010 , .Cm AF42 .Pq Dv 100100 , .Cm AF43 .Pq Dv 100110 , .Cm EF .Pq Dv 101110 , .Cm BE .Pq Dv 000000 . Additionally, DSCP value can be specified by number (0..64). It is also possible to use the .Cm tablearg keyword with setdscp. If the tablearg value is not within the 0..64 range, lower 6 bits of supplied value are used. .It Cm reass Queue and reassemble IP fragments. If the packet is not fragmented, counters are updated and processing continues with the next rule. If the packet is the last logical fragment, the packet is reassembled and, if .Va net.inet.ip.fw.one_pass is set to 0, processing continues with the next rule. Otherwise, the packet is allowed to pass and the search terminates. If the packet is a fragment in the middle of a logical group of fragments, it is consumed and processing stops immediately. .Pp Fragment handling can be tuned via .Va net.inet.ip.maxfragpackets and .Va net.inet.ip.maxfragsperpacket which limit, respectively, the maximum number of processable fragments (default: 800) and the maximum number of fragments per packet (default: 16). .Pp NOTA BENE: since fragments do not contain port numbers, they should be avoided with the .Nm reass rule. Alternatively, direction-based (like .Nm in / .Nm out ) and source-based (like .Nm via ) match patterns can be used to select fragments. .Pp Usually a simple rule like: .Bd -literal -offset indent # reassemble incoming fragments ipfw add reass all from any to any in .Ed .Pp is all you need at the beginning of your ruleset. .El .Ss RULE BODY The body of a rule contains zero or more patterns (such as specific source and destination addresses or ports, protocol options, incoming or outgoing interfaces, etc.) that the packet must match in order to be recognised. In general, the patterns are connected by (implicit) .Cm and operators -- i.e., all must match in order for the rule to match. Individual patterns can be prefixed by the .Cm not operator to reverse the result of the match, as in .Pp .Dl "ipfw add 100 allow ip from not 1.2.3.4 to any" .Pp Additionally, sets of alternative match patterns .Pq Em or-blocks can be constructed by putting the patterns in lists enclosed between parentheses ( ) or braces { }, and using the .Cm or operator as follows: .Pp .Dl "ipfw add 100 allow ip from { x or not y or z } to any" .Pp Only one level of parentheses is allowed. Beware that most shells have special meanings for parentheses or braces, so it is advisable to put a backslash \\ in front of them to prevent such interpretations. .Pp The body of a rule must in general include a source and destination address specifier. The keyword .Ar any can be used in various places to specify that the content of a required field is irrelevant. .Pp The rule body has the following format: .Bd -ragged -offset indent .Op Ar proto Cm from Ar src Cm to Ar dst .Op Ar options .Ed .Pp The first part (proto from src to dst) is for backward compatibility with earlier versions of .Fx . In modern .Fx any match pattern (including MAC headers, IP protocols, addresses and ports) can be specified in the .Ar options section. .Pp Rule fields have the following meaning: .Bl -tag -width indent .It Ar proto : protocol | Cm { Ar protocol Cm or ... } .It Ar protocol : Oo Cm not Oc Ar protocol-name | protocol-number An IP protocol specified by number or name (for a complete list see .Pa /etc/protocols ) , or one of the following keywords: .Bl -tag -width indent .It Cm ip4 | ipv4 Matches IPv4 packets. .It Cm ip6 | ipv6 Matches IPv6 packets. .It Cm ip | all Matches any packet. .El .Pp The .Cm ipv6 in .Cm proto option will be treated as inner protocol. And, the .Cm ipv4 is not available in .Cm proto option. .Pp The .Cm { Ar protocol Cm or ... } format (an .Em or-block ) is provided for convenience only but its use is deprecated. .It Ar src No and Ar dst : Bro Cm addr | Cm { Ar addr Cm or ... } Brc Op Oo Cm not Oc Ar ports An address (or a list, see below) optionally followed by .Ar ports specifiers. .Pp The second format .Em ( or-block with multiple addresses) is provided for convenience only and its use is discouraged. .It Ar addr : Oo Cm not Oc Bro .Cm any | me | me6 | .Cm table Ns Pq Ar number Ns Op , Ns Ar value .Ar | addr-list | addr-set .Brc .Bl -tag -width indent .It Cm any matches any IP address. .It Cm me matches any IP address configured on an interface in the system. .It Cm me6 matches any IPv6 address configured on an interface in the system. The address list is evaluated at the time the packet is analysed. .It Cm table Ns Pq Ar number Ns Op , Ns Ar value Matches any IPv4 address for which an entry exists in the lookup table .Ar number . If an optional 32-bit unsigned .Ar value is also specified, an entry will match only if it has this value. See the .Sx LOOKUP TABLES section below for more information on lookup tables. .El .It Ar addr-list : ip-addr Ns Op Ns , Ns Ar addr-list .It Ar ip-addr : A host or subnet address specified in one of the following ways: .Bl -tag -width indent .It Ar numeric-ip | hostname Matches a single IPv4 address, specified as dotted-quad or a hostname. Hostnames are resolved at the time the rule is added to the firewall list. .It Ar addr Ns / Ns Ar masklen Matches all addresses with base .Ar addr (specified as an IP address, a network number, or a hostname) and mask width of .Cm masklen bits. As an example, 1.2.3.4/25 or 1.2.3.0/25 will match all IP numbers from 1.2.3.0 to 1.2.3.127 . .It Ar addr Ns : Ns Ar mask Matches all addresses with base .Ar addr (specified as an IP address, a network number, or a hostname) and the mask of .Ar mask , specified as a dotted quad. As an example, 1.2.3.4:255.0.255.0 or 1.0.3.0:255.0.255.0 will match 1.*.3.*. This form is advised only for non-contiguous masks. It is better to resort to the .Ar addr Ns / Ns Ar masklen format for contiguous masks, which is more compact and less error-prone. .El .It Ar addr-set : addr Ns Oo Ns / Ns Ar masklen Oc Ns Cm { Ns Ar list Ns Cm } .It Ar list : Bro Ar num | num-num Brc Ns Op Ns , Ns Ar list Matches all addresses with base address .Ar addr (specified as an IP address, a network number, or a hostname) and whose last byte is in the list between braces { } . Note that there must be no spaces between braces and numbers (spaces after commas are allowed). Elements of the list can be specified as single entries or ranges. The .Ar masklen field is used to limit the size of the set of addresses, and can have any value between 24 and 32. If not specified, it will be assumed as 24. .br This format is particularly useful to handle sparse address sets within a single rule. Because the matching occurs using a bitmask, it takes constant time and dramatically reduces the complexity of rulesets. .br As an example, an address specified as 1.2.3.4/24{128,35-55,89} or 1.2.3.0/24{128,35-55,89} will match the following IP addresses: .br 1.2.3.128, 1.2.3.35 to 1.2.3.55, 1.2.3.89 . .It Ar addr6-list : ip6-addr Ns Op Ns , Ns Ar addr6-list .It Ar ip6-addr : A host or subnet specified one of the following ways: .Bl -tag -width indent .It Ar numeric-ip | hostname Matches a single IPv6 address as allowed by .Xr inet_pton 3 or a hostname. Hostnames are resolved at the time the rule is added to the firewall list. .It Ar addr Ns / Ns Ar masklen Matches all IPv6 addresses with base .Ar addr (specified as allowed by .Xr inet_pton or a hostname) and mask width of .Cm masklen bits. .El .Pp No support for sets of IPv6 addresses is provided because IPv6 addresses are typically random past the initial prefix. .It Ar ports : Bro Ar port | port Ns \&- Ns Ar port Ns Brc Ns Op , Ns Ar ports For protocols which support port numbers (such as TCP and UDP), optional .Cm ports may be specified as one or more ports or port ranges, separated by commas but no spaces, and an optional .Cm not operator. The .Ql \&- notation specifies a range of ports (including boundaries). .Pp Service names (from .Pa /etc/services ) may be used instead of numeric port values. The length of the port list is limited to 30 ports or ranges, though one can specify larger ranges by using an .Em or-block in the .Cm options section of the rule. .Pp A backslash .Pq Ql \e can be used to escape the dash .Pq Ql - character in a service name (from a shell, the backslash must be typed twice to avoid the shell itself interpreting it as an escape character). .Pp .Dl "ipfw add count tcp from any ftp\e\e-data-ftp to any" .Pp Fragmented packets which have a non-zero offset (i.e., not the first fragment) will never match a rule which has one or more port specifications. See the .Cm frag option for details on matching fragmented packets. .El .Ss RULE OPTIONS (MATCH PATTERNS) Additional match patterns can be used within rules. Zero or more of these so-called .Em options can be present in a rule, optionally prefixed by the .Cm not operand, and possibly grouped into .Em or-blocks . .Pp The following match patterns can be used (listed in alphabetical order): .Bl -tag -width indent .It Cm // this is a comment. Inserts the specified text as a comment in the rule. Everything following // is considered as a comment and stored in the rule. You can have comment-only rules, which are listed as having a .Cm count action followed by the comment. .It Cm bridged Alias for .Cm layer2 . .It Cm diverted Matches only packets generated by a divert socket. .It Cm diverted-loopback Matches only packets coming from a divert socket back into the IP stack input for delivery. .It Cm diverted-output Matches only packets going from a divert socket back outward to the IP stack output for delivery. .It Cm dst-ip Ar ip-address Matches IPv4 packets whose destination IP is one of the address(es) specified as argument. .It Bro Cm dst-ip6 | dst-ipv6 Brc Ar ip6-address Matches IPv6 packets whose destination IP is one of the address(es) specified as argument. .It Cm dst-port Ar ports Matches IP packets whose destination port is one of the port(s) specified as argument. .It Cm established Matches TCP packets that have the RST or ACK bits set. .It Cm ext6hdr Ar header Matches IPv6 packets containing the extended header given by .Ar header . Supported headers are: .Pp Fragment, .Pq Cm frag , Hop-to-hop options .Pq Cm hopopt , any type of Routing Header .Pq Cm route , Source routing Routing Header Type 0 .Pq Cm rthdr0 , Mobile IPv6 Routing Header Type 2 .Pq Cm rthdr2 , Destination options .Pq Cm dstopt , IPSec authentication headers .Pq Cm ah , and IPsec encapsulated security payload headers .Pq Cm esp . .It Cm fib Ar fibnum Matches a packet that has been tagged to use the given FIB (routing table) number. .It Cm flow-id Ar labels Matches IPv6 packets containing any of the flow labels given in .Ar labels . .Ar labels is a comma separated list of numeric flow labels. .It Cm frag Matches packets that are fragments and not the first fragment of an IP datagram. Note that these packets will not have the next protocol header (e.g.\& TCP, UDP) so options that look into these headers cannot match. .It Cm gid Ar group Matches all TCP or UDP packets sent by or received for a .Ar group . A .Ar group may be specified by name or number. .It Cm jail Ar prisonID Matches all TCP or UDP packets sent by or received for the jail whos prison ID is .Ar prisonID . .It Cm icmptypes Ar types Matches ICMP packets whose ICMP type is in the list .Ar types . The list may be specified as any combination of individual types (numeric) separated by commas. .Em Ranges are not allowed . The supported ICMP types are: .Pp echo reply .Pq Cm 0 , destination unreachable .Pq Cm 3 , source quench .Pq Cm 4 , redirect .Pq Cm 5 , echo request .Pq Cm 8 , router advertisement .Pq Cm 9 , router solicitation .Pq Cm 10 , time-to-live exceeded .Pq Cm 11 , IP header bad .Pq Cm 12 , timestamp request .Pq Cm 13 , timestamp reply .Pq Cm 14 , information request .Pq Cm 15 , information reply .Pq Cm 16 , address mask request .Pq Cm 17 and address mask reply .Pq Cm 18 . .It Cm icmp6types Ar types Matches ICMP6 packets whose ICMP6 type is in the list of .Ar types . The list may be specified as any combination of individual types (numeric) separated by commas. .Em Ranges are not allowed . .It Cm in | out Matches incoming or outgoing packets, respectively. .Cm in and .Cm out are mutually exclusive (in fact, .Cm out is implemented as .Cm not in Ns No ). .It Cm ipid Ar id-list Matches IPv4 packets whose .Cm ip_id field has value included in .Ar id-list , which is either a single value or a list of values or ranges specified in the same way as .Ar ports . .It Cm iplen Ar len-list Matches IP packets whose total length, including header and data, is in the set .Ar len-list , which is either a single value or a list of values or ranges specified in the same way as .Ar ports . .It Cm ipoptions Ar spec Matches packets whose IPv4 header contains the comma separated list of options specified in .Ar spec . The supported IP options are: .Pp .Cm ssrr (strict source route), .Cm lsrr (loose source route), .Cm rr (record packet route) and .Cm ts (timestamp). The absence of a particular option may be denoted with a .Ql \&! . .It Cm ipprecedence Ar precedence Matches IPv4 packets whose precedence field is equal to .Ar precedence . .It Cm ipsec Matches packets that have IPSEC history associated with them (i.e., the packet comes encapsulated in IPSEC, the kernel has IPSEC support and IPSEC_FILTERTUNNEL option, and can correctly decapsulate it). .Pp Note that specifying .Cm ipsec is different from specifying .Cm proto Ar ipsec as the latter will only look at the specific IP protocol field, irrespective of IPSEC kernel support and the validity of the IPSEC data. .Pp Further note that this flag is silently ignored in kernels without IPSEC support. It does not affect rule processing when given and the rules are handled as if with no .Cm ipsec flag. .It Cm iptos Ar spec Matches IPv4 packets whose .Cm tos field contains the comma separated list of service types specified in .Ar spec . The supported IP types of service are: .Pp .Cm lowdelay .Pq Dv IPTOS_LOWDELAY , .Cm throughput .Pq Dv IPTOS_THROUGHPUT , .Cm reliability .Pq Dv IPTOS_RELIABILITY , .Cm mincost .Pq Dv IPTOS_MINCOST , .Cm congestion .Pq Dv IPTOS_ECN_CE . The absence of a particular type may be denoted with a .Ql \&! . .It Cm dscp spec Ns Op , Ns Ar spec Matches IPv4/IPv6 packets whose .Cm DS field value is contained in .Ar spec mask. Multiple values can be specified via the comma separated list. Value can be one of keywords used in .Cm setdscp action or exact number. .It Cm ipttl Ar ttl-list Matches IPv4 packets whose time to live is included in .Ar ttl-list , which is either a single value or a list of values or ranges specified in the same way as .Ar ports . .It Cm ipversion Ar ver Matches IP packets whose IP version field is .Ar ver . .It Cm keep-state Upon a match, the firewall will create a dynamic rule, whose default behaviour is to match bidirectional traffic between source and destination IP/port using the same protocol. The rule has a limited lifetime (controlled by a set of .Xr sysctl 8 variables), and the lifetime is refreshed every time a matching packet is found. .It Cm layer2 Matches only layer2 packets, i.e., those passed to .Nm from ether_demux() and ether_output_frame(). .It Cm limit Bro Cm src-addr | src-port | dst-addr | dst-port Brc Ar N The firewall will only allow .Ar N connections with the same set of parameters as specified in the rule. One or more of source and destination addresses and ports can be specified. Currently, only IPv4 flows are supported. .It Cm lookup Bro Cm dst-ip | dst-port | src-ip | src-port | uid | jail Brc Ar N Search an entry in lookup table .Ar N that matches the field specified as argument. If not found, the match fails. Otherwise, the match succeeds and .Cm tablearg is set to the value extracted from the table. .Pp This option can be useful to quickly dispatch traffic based on certain packet fields. See the .Sx LOOKUP TABLES section below for more information on lookup tables. .It Cm { MAC | mac } Ar dst-mac src-mac Match packets with a given .Ar dst-mac and .Ar src-mac addresses, specified as the .Cm any keyword (matching any MAC address), or six groups of hex digits separated by colons, and optionally followed by a mask indicating the significant bits. The mask may be specified using either of the following methods: .Bl -enum -width indent .It A slash .Pq / followed by the number of significant bits. For example, an address with 33 significant bits could be specified as: .Pp .Dl "MAC 10:20:30:40:50:60/33 any" .Pp .It An ampersand .Pq & followed by a bitmask specified as six groups of hex digits separated by colons. For example, an address in which the last 16 bits are significant could be specified as: .Pp .Dl "MAC 10:20:30:40:50:60&00:00:00:00:ff:ff any" .Pp Note that the ampersand character has a special meaning in many shells and should generally be escaped. -.Pp .El Note that the order of MAC addresses (destination first, source second) is the same as on the wire, but the opposite of the one used for IP addresses. .It Cm mac-type Ar mac-type Matches packets whose Ethernet Type field corresponds to one of those specified as argument. .Ar mac-type is specified in the same way as .Cm port numbers (i.e., one or more comma-separated single values or ranges). You can use symbolic names for known values such as .Em vlan , ipv4, ipv6 . Values can be entered as decimal or hexadecimal (if prefixed by 0x), and they are always printed as hexadecimal (unless the .Cm -N option is used, in which case symbolic resolution will be attempted). .It Cm proto Ar protocol Matches packets with the corresponding IP protocol. .It Cm recv | xmit | via Brq Ar ifX | Ar if Ns Cm * | Ar table Ns Pq Ar number Ns Op , Ns Ar value | Ar ipno | Ar any Matches packets received, transmitted or going through, respectively, the interface specified by exact name .Po Ar ifX Pc , by device name .Po Ar if* Pc , by IP address, or through some interface. .Pp The .Cm via keyword causes the interface to always be checked. If .Cm recv or .Cm xmit is used instead of .Cm via , then only the receive or transmit interface (respectively) is checked. By specifying both, it is possible to match packets based on both receive and transmit interface, e.g.: .Pp .Dl "ipfw add deny ip from any to any out recv ed0 xmit ed1" .Pp The .Cm recv interface can be tested on either incoming or outgoing packets, while the .Cm xmit interface can only be tested on outgoing packets. So .Cm out is required (and .Cm in is invalid) whenever .Cm xmit is used. .Pp A packet might not have a receive or transmit interface: packets originating from the local host have no receive interface, while packets destined for the local host have no transmit interface. .It Cm setup Matches TCP packets that have the SYN bit set but no ACK bit. This is the short form of .Dq Li tcpflags\ syn,!ack . .It Cm sockarg Matches packets that are associated to a local socket and for which the SO_USER_COOKIE socket option has been set to a non-zero value. As a side effect, the value of the option is made available as .Cm tablearg value, which in turn can be used as .Cm skipto or .Cm pipe number. .It Cm src-ip Ar ip-address Matches IPv4 packets whose source IP is one of the address(es) specified as an argument. .It Cm src-ip6 Ar ip6-address Matches IPv6 packets whose source IP is one of the address(es) specified as an argument. .It Cm src-port Ar ports Matches IP packets whose source port is one of the port(s) specified as argument. .It Cm tagged Ar tag-list Matches packets whose tags are included in .Ar tag-list , which is either a single value or a list of values or ranges specified in the same way as .Ar ports . Tags can be applied to the packet using .Cm tag rule action parameter (see it's description for details on tags). .It Cm tcpack Ar ack TCP packets only. Match if the TCP header acknowledgment number field is set to .Ar ack . .It Cm tcpdatalen Ar tcpdatalen-list Matches TCP packets whose length of TCP data is .Ar tcpdatalen-list , which is either a single value or a list of values or ranges specified in the same way as .Ar ports . .It Cm tcpflags Ar spec TCP packets only. Match if the TCP header contains the comma separated list of flags specified in .Ar spec . The supported TCP flags are: .Pp .Cm fin , .Cm syn , .Cm rst , .Cm psh , .Cm ack and .Cm urg . The absence of a particular flag may be denoted with a .Ql \&! . A rule which contains a .Cm tcpflags specification can never match a fragmented packet which has a non-zero offset. See the .Cm frag option for details on matching fragmented packets. .It Cm tcpseq Ar seq TCP packets only. Match if the TCP header sequence number field is set to .Ar seq . .It Cm tcpwin Ar tcpwin-list Matches TCP packets whose header window field is set to .Ar tcpwin-list , which is either a single value or a list of values or ranges specified in the same way as .Ar ports . .It Cm tcpoptions Ar spec TCP packets only. Match if the TCP header contains the comma separated list of options specified in .Ar spec . The supported TCP options are: .Pp .Cm mss (maximum segment size), .Cm window (tcp window advertisement), .Cm sack (selective ack), .Cm ts (rfc1323 timestamp) and .Cm cc (rfc1644 t/tcp connection count). The absence of a particular option may be denoted with a .Ql \&! . .It Cm uid Ar user Match all TCP or UDP packets sent by or received for a .Ar user . A .Ar user may be matched by name or identification number. .It Cm verrevpath For incoming packets, a routing table lookup is done on the packet's source address. If the interface on which the packet entered the system matches the outgoing interface for the route, the packet matches. If the interfaces do not match up, the packet does not match. All outgoing packets or packets with no incoming interface match. .Pp The name and functionality of the option is intentionally similar to the Cisco IOS command: .Pp .Dl ip verify unicast reverse-path .Pp This option can be used to make anti-spoofing rules to reject all packets with source addresses not from this interface. See also the option .Cm antispoof . .It Cm versrcreach For incoming packets, a routing table lookup is done on the packet's source address. If a route to the source address exists, but not the default route or a blackhole/reject route, the packet matches. Otherwise, the packet does not match. All outgoing packets match. .Pp The name and functionality of the option is intentionally similar to the Cisco IOS command: .Pp .Dl ip verify unicast source reachable-via any .Pp This option can be used to make anti-spoofing rules to reject all packets whose source address is unreachable. .It Cm antispoof For incoming packets, the packet's source address is checked if it belongs to a directly connected network. If the network is directly connected, then the interface the packet came on in is compared to the interface the network is connected to. When incoming interface and directly connected interface are not the same, the packet does not match. Otherwise, the packet does match. All outgoing packets match. .Pp This option can be used to make anti-spoofing rules to reject all packets that pretend to be from a directly connected network but do not come in through that interface. This option is similar to but more restricted than .Cm verrevpath because it engages only on packets with source addresses of directly connected networks instead of all source addresses. .El .Sh LOOKUP TABLES Lookup tables are useful to handle large sparse sets of addresses or other search keys (e.g., ports, jail IDs, interface names). In the rest of this section we will use the term ``address''. There may be up to 65535 different lookup tables, numbered 0 to 65534. .Pp Each entry is represented by an .Ar addr Ns Op / Ns Ar masklen and will match all addresses with base .Ar addr (specified as an IPv4/IPv6 address, a hostname or an unsigned integer) and mask width of .Ar masklen bits. If .Ar masklen is not specified, it defaults to 32 for IPv4 and 128 for IPv6. When looking up an IP address in a table, the most specific entry will match. Associated with each entry is a 32-bit unsigned .Ar value , which can optionally be checked by a rule matching code. When adding an entry, if .Ar value is not specified, it defaults to 0. .Pp An entry can be added to a table .Pq Cm add , or removed from a table .Pq Cm delete . A table can be examined .Pq Cm list or flushed .Pq Cm flush . .Pp Internally, each table is stored in a Radix tree, the same way as the routing table (see .Xr route 4 ) . .Pp Lookup tables currently support only ports, jail IDs, IPv4/IPv6 addresses and interface names. Wildcards is not supported for interface names. .Pp The .Cm tablearg feature provides the ability to use a value, looked up in the table, as the argument for a rule action, action parameter or rule option. This can significantly reduce number of rules in some configurations. If two tables are used in a rule, the result of the second (destination) is used. The .Cm tablearg argument can be used with the following actions: .Cm nat, pipe , queue, divert, tee, netgraph, ngtee, fwd, skipto, setfib, action parameters: .Cm tag, untag, rule options: .Cm limit, tagged. .Pp When used with .Cm fwd it is possible to supply table entries with values that are in the form of IP addresses or hostnames. See the .Sx EXAMPLES Section for example usage of tables and the tablearg keyword. .Pp When used with the .Cm skipto action, the user should be aware that the code will walk the ruleset up to a rule equal to, or past, the given number, and should therefore try keep the ruleset compact between the skipto and the target rules. .Sh SETS OF RULES Each rule belongs to one of 32 different .Em sets , numbered 0 to 31. Set 31 is reserved for the default rule. .Pp By default, rules are put in set 0, unless you use the .Cm set N attribute when entering a new rule. Sets can be individually and atomically enabled or disabled, so this mechanism permits an easy way to store multiple configurations of the firewall and quickly (and atomically) switch between them. The command to enable/disable sets is .Bd -ragged -offset indent .Nm .Cm set Oo Cm disable Ar number ... Oc Op Cm enable Ar number ... .Ed .Pp where multiple .Cm enable or .Cm disable sections can be specified. Command execution is atomic on all the sets specified in the command. By default, all sets are enabled. .Pp When you disable a set, its rules behave as if they do not exist in the firewall configuration, with only one exception: .Bd -ragged -offset indent dynamic rules created from a rule before it had been disabled will still be active until they expire. In order to delete dynamic rules you have to explicitly delete the parent rule which generated them. .Ed .Pp The set number of rules can be changed with the command .Bd -ragged -offset indent .Nm .Cm set move .Brq Cm rule Ar rule-number | old-set .Cm to Ar new-set .Ed .Pp Also, you can atomically swap two rulesets with the command .Bd -ragged -offset indent .Nm .Cm set swap Ar first-set second-set .Ed .Pp See the .Sx EXAMPLES Section on some possible uses of sets of rules. .Sh STATEFUL FIREWALL Stateful operation is a way for the firewall to dynamically create rules for specific flows when packets that match a given pattern are detected. Support for stateful operation comes through the .Cm check-state , keep-state and .Cm limit options of .Nm rules . .Pp Dynamic rules are created when a packet matches a .Cm keep-state or .Cm limit rule, causing the creation of a .Em dynamic rule which will match all and only packets with a given .Em protocol between a .Em src-ip/src-port dst-ip/dst-port pair of addresses .Em ( src and .Em dst are used here only to denote the initial match addresses, but they are completely equivalent afterwards). Dynamic rules will be checked at the first .Cm check-state, keep-state or .Cm limit occurrence, and the action performed upon a match will be the same as in the parent rule. .Pp Note that no additional attributes other than protocol and IP addresses and ports are checked on dynamic rules. .Pp The typical use of dynamic rules is to keep a closed firewall configuration, but let the first TCP SYN packet from the inside network install a dynamic rule for the flow so that packets belonging to that session will be allowed through the firewall: .Pp .Dl "ipfw add check-state" .Dl "ipfw add allow tcp from my-subnet to any setup keep-state" .Dl "ipfw add deny tcp from any to any" .Pp A similar approach can be used for UDP, where an UDP packet coming from the inside will install a dynamic rule to let the response through the firewall: .Pp .Dl "ipfw add check-state" .Dl "ipfw add allow udp from my-subnet to any keep-state" .Dl "ipfw add deny udp from any to any" .Pp Dynamic rules expire after some time, which depends on the status of the flow and the setting of some .Cm sysctl variables. See Section .Sx SYSCTL VARIABLES for more details. For TCP sessions, dynamic rules can be instructed to periodically send keepalive packets to refresh the state of the rule when it is about to expire. .Pp See Section .Sx EXAMPLES for more examples on how to use dynamic rules. .Sh TRAFFIC SHAPER (DUMMYNET) CONFIGURATION .Nm is also the user interface for the .Nm dummynet traffic shaper, packet scheduler and network emulator, a subsystem that can artificially queue, delay or drop packets emulating the behaviour of certain network links or queueing systems. .Pp .Nm dummynet operates by first using the firewall to select packets using any match pattern that can be used in .Nm rules. Matching packets are then passed to either of two different objects, which implement the traffic regulation: .Bl -hang -offset XXXX .It Em pipe A .Em pipe emulates a .Em link with given bandwidth and propagation delay, driven by a FIFO scheduler and a single queue with programmable queue size and packet loss rate. Packets are appended to the queue as they come out from .Nm ipfw , and then transferred in FIFO order to the link at the desired rate. .It Em queue A .Em queue is an abstraction used to implement packet scheduling using one of several packet scheduling algorithms. Packets sent to a .Em queue are first grouped into flows according to a mask on the 5-tuple. Flows are then passed to the scheduler associated to the .Em queue , and each flow uses scheduling parameters (weight and others) as configured in the .Em queue itself. A scheduler in turn is connected to an emulated link, and arbitrates the link's bandwidth among backlogged flows according to weights and to the features of the scheduling algorithm in use. .El .Pp In practice, .Em pipes can be used to set hard limits to the bandwidth that a flow can use, whereas .Em queues can be used to determine how different flows share the available bandwidth. .Pp A graphical representation of the binding of queues, flows, schedulers and links is below. .Bd -literal -offset indent (flow_mask|sched_mask) sched_mask +---------+ weight Wx +-------------+ | |->-[flow]-->--| |-+ -->--| QUEUE x | ... | | | | |->-[flow]-->--| SCHEDuler N | | +---------+ | | | ... | +--[LINK N]-->-- +---------+ weight Wy | | +--[LINK N]-->-- | |->-[flow]-->--| | | -->--| QUEUE y | ... | | | | |->-[flow]-->--| | | +---------+ +-------------+ | +-------------+ .Ed It is important to understand the role of the SCHED_MASK and FLOW_MASK, which are configured through the commands .Dl "ipfw sched N config mask SCHED_MASK ..." and .Dl "ipfw queue X config mask FLOW_MASK ..." . .Pp The SCHED_MASK is used to assign flows to one or more scheduler instances, one for each value of the packet's 5-tuple after applying SCHED_MASK. As an example, using ``src-ip 0xffffff00'' creates one instance for each /24 destination subnet. .Pp The FLOW_MASK, together with the SCHED_MASK, is used to split packets into flows. As an example, using ``src-ip 0x000000ff'' together with the previous SCHED_MASK makes a flow for each individual source address. In turn, flows for each /24 subnet will be sent to the same scheduler instance. .Pp The above diagram holds even for the .Em pipe case, with the only restriction that a .Em pipe only supports a SCHED_MASK, and forces the use of a FIFO scheduler (these are for backward compatibility reasons; in fact, internally, a .Nm dummynet's pipe is implemented exactly as above). .Pp There are two modes of .Nm dummynet operation: .Dq normal and .Dq fast . The .Dq normal mode tries to emulate a real link: the .Nm dummynet scheduler ensures that the packet will not leave the pipe faster than it would on the real link with a given bandwidth. The .Dq fast mode allows certain packets to bypass the .Nm dummynet scheduler (if packet flow does not exceed pipe's bandwidth). This is the reason why the .Dq fast mode requires less CPU cycles per packet (on average) and packet latency can be significantly lower in comparison to a real link with the same bandwidth. The default mode is .Dq normal . The .Dq fast mode can be enabled by setting the .Va net.inet.ip.dummynet.io_fast .Xr sysctl 8 variable to a non-zero value. .Pp .Ss PIPE, QUEUE AND SCHEDULER CONFIGURATION The .Em pipe , .Em queue and .Em scheduler configuration commands are the following: .Bd -ragged -offset indent .Cm pipe Ar number Cm config Ar pipe-configuration .Pp .Cm queue Ar number Cm config Ar queue-configuration .Pp .Cm sched Ar number Cm config Ar sched-configuration .Ed .Pp The following parameters can be configured for a pipe: .Pp .Bl -tag -width indent -compact .It Cm bw Ar bandwidth | device Bandwidth, measured in .Sm off .Op Cm K | M .Brq Cm bit/s | Byte/s . .Sm on .Pp A value of 0 (default) means unlimited bandwidth. The unit must immediately follow the number, as in .Pp .Dl "ipfw pipe 1 config bw 300Kbit/s" .Pp If a device name is specified instead of a numeric value, as in .Pp .Dl "ipfw pipe 1 config bw tun0" .Pp then the transmit clock is supplied by the specified device. At the moment only the .Xr tun 4 device supports this functionality, for use in conjunction with .Xr ppp 8 . .Pp .It Cm delay Ar ms-delay Propagation delay, measured in milliseconds. The value is rounded to the next multiple of the clock tick (typically 10ms, but it is a good practice to run kernels with .Dq "options HZ=1000" to reduce the granularity to 1ms or less). The default value is 0, meaning no delay. .Pp .It Cm burst Ar size If the data to be sent exceeds the pipe's bandwidth limit (and the pipe was previously idle), up to .Ar size bytes of data are allowed to bypass the .Nm dummynet scheduler, and will be sent as fast as the physical link allows. Any additional data will be transmitted at the rate specified by the .Nm pipe bandwidth. The burst size depends on how long the pipe has been idle; the effective burst size is calculated as follows: MAX( .Ar size , .Nm bw * pipe_idle_time). .Pp .It Cm profile Ar filename A file specifying the additional overhead incurred in the transmission of a packet on the link. .Pp Some link types introduce extra delays in the transmission of a packet, e.g., because of MAC level framing, contention on the use of the channel, MAC level retransmissions and so on. From our point of view, the channel is effectively unavailable for this extra time, which is constant or variable depending on the link type. Additionally, packets may be dropped after this time (e.g., on a wireless link after too many retransmissions). We can model the additional delay with an empirical curve that represents its distribution. .Bd -literal -offset indent cumulative probability 1.0 ^ | L +-- loss-level x | ****** | * | ***** | * | ** | * +-------*-------------------> delay .Ed The empirical curve may have both vertical and horizontal lines. Vertical lines represent constant delay for a range of probabilities. Horizontal lines correspond to a discontinuity in the delay distribution: the pipe will use the largest delay for a given probability. .Pp The file format is the following, with whitespace acting as a separator and '#' indicating the beginning a comment: .Bl -tag -width indent .It Cm name Ar identifier optional name (listed by "ipfw pipe show") to identify the delay distribution; .It Cm bw Ar value the bandwidth used for the pipe. If not specified here, it must be present explicitly as a configuration parameter for the pipe; .It Cm loss-level Ar L the probability above which packets are lost. (0.0 <= L <= 1.0, default 1.0 i.e., no loss); .It Cm samples Ar N the number of samples used in the internal representation of the curve (2..1024; default 100); .It Cm "delay prob" | "prob delay" One of these two lines is mandatory and defines the format of the following lines with data points. .It Ar XXX Ar YYY 2 or more lines representing points in the curve, with either delay or probability first, according to the chosen format. The unit for delay is milliseconds. Data points do not need to be sorted. Also, the number of actual lines can be different from the value of the "samples" parameter: .Nm utility will sort and interpolate the curve as needed. .El .Pp Example of a profile file: .Bd -literal -offset indent name bla_bla_bla samples 100 loss-level 0.86 prob delay 0 200 # minimum overhead is 200ms 0.5 200 0.5 300 0.8 1000 0.9 1300 1 1300 #configuration file end .Ed .El .Pp The following parameters can be configured for a queue: .Pp .Bl -tag -width indent -compact .It Cm pipe Ar pipe_nr Connects a queue to the specified pipe. Multiple queues (with the same or different weights) can be connected to the same pipe, which specifies the aggregate rate for the set of queues. .Pp .It Cm weight Ar weight Specifies the weight to be used for flows matching this queue. The weight must be in the range 1..100, and defaults to 1. .El .Pp The following case-insensitive parameters can be configured for a scheduler: .Pp .Bl -tag -width indent -compact .It Cm type Ar {fifo | wf2q+ | rr | qfq} specifies the scheduling algorithm to use. .Bl -tag -width indent -compact .It Cm fifo is just a FIFO scheduler (which means that all packets are stored in the same queue as they arrive to the scheduler). FIFO has O(1) per-packet time complexity, with very low constants (estimate 60-80ns on a 2GHz desktop machine) but gives no service guarantees. .It Cm wf2q+ implements the WF2Q+ algorithm, which is a Weighted Fair Queueing algorithm which permits flows to share bandwidth according to their weights. Note that weights are not priorities; even a flow with a minuscule weight will never starve. WF2Q+ has O(log N) per-packet processing cost, where N is the number of flows, and is the default algorithm used by previous versions dummynet's queues. .It Cm rr implements the Deficit Round Robin algorithm, which has O(1) processing costs (roughly, 100-150ns per packet) and permits bandwidth allocation according to weights, but with poor service guarantees. .It Cm qfq implements the QFQ algorithm, which is a very fast variant of WF2Q+, with similar service guarantees and O(1) processing costs (roughly, 200-250ns per packet). .El .El .Pp In addition to the type, all parameters allowed for a pipe can also be specified for a scheduler. .Pp Finally, the following parameters can be configured for both pipes and queues: .Pp .Bl -tag -width XXXX -compact .It Cm buckets Ar hash-table-size Specifies the size of the hash table used for storing the various queues. Default value is 64 controlled by the .Xr sysctl 8 variable .Va net.inet.ip.dummynet.hash_size , allowed range is 16 to 65536. .Pp .It Cm mask Ar mask-specifier Packets sent to a given pipe or queue by an .Nm rule can be further classified into multiple flows, each of which is then sent to a different .Em dynamic pipe or queue. A flow identifier is constructed by masking the IP addresses, ports and protocol types as specified with the .Cm mask options in the configuration of the pipe or queue. For each different flow identifier, a new pipe or queue is created with the same parameters as the original object, and matching packets are sent to it. .Pp Thus, when .Em dynamic pipes are used, each flow will get the same bandwidth as defined by the pipe, whereas when .Em dynamic queues are used, each flow will share the parent's pipe bandwidth evenly with other flows generated by the same queue (note that other queues with different weights might be connected to the same pipe). .br Available mask specifiers are a combination of one or more of the following: .Pp .Cm dst-ip Ar mask , .Cm dst-ip6 Ar mask , .Cm src-ip Ar mask , .Cm src-ip6 Ar mask , .Cm dst-port Ar mask , .Cm src-port Ar mask , .Cm flow-id Ar mask , .Cm proto Ar mask or .Cm all , .Pp where the latter means all bits in all fields are significant. .Pp .It Cm noerror When a packet is dropped by a .Nm dummynet queue or pipe, the error is normally reported to the caller routine in the kernel, in the same way as it happens when a device queue fills up. Setting this option reports the packet as successfully delivered, which can be needed for some experimental setups where you want to simulate loss or congestion at a remote router. .Pp .It Cm plr Ar packet-loss-rate Packet loss rate. Argument .Ar packet-loss-rate is a floating-point number between 0 and 1, with 0 meaning no loss, 1 meaning 100% loss. The loss rate is internally represented on 31 bits. .Pp .It Cm queue Brq Ar slots | size Ns Cm Kbytes Queue size, in .Ar slots or .Cm KBytes . Default value is 50 slots, which is the typical queue size for Ethernet devices. Note that for slow speed links you should keep the queue size short or your traffic might be affected by a significant queueing delay. E.g., 50 max-sized ethernet packets (1500 bytes) mean 600Kbit or 20s of queue on a 30Kbit/s pipe. Even worse effects can result if you get packets from an interface with a much larger MTU, e.g.\& the loopback interface with its 16KB packets. The .Xr sysctl 8 variables .Em net.inet.ip.dummynet.pipe_byte_limit and .Em net.inet.ip.dummynet.pipe_slot_limit control the maximum lengths that can be specified. .Pp .It Cm red | gred Ar w_q Ns / Ns Ar min_th Ns / Ns Ar max_th Ns / Ns Ar max_p [ecn] Make use of the RED (Random Early Detection) queue management algorithm. .Ar w_q and .Ar max_p are floating point numbers between 0 and 1 (inclusive), while .Ar min_th and .Ar max_th are integer numbers specifying thresholds for queue management (thresholds are computed in bytes if the queue has been defined in bytes, in slots otherwise). The two parameters can also be of the same value if needed. The .Nm dummynet also supports the gentle RED variant (gred) and ECN (Explicit Congestion Notification) as optional. Three .Xr sysctl 8 variables can be used to control the RED behaviour: .Bl -tag -width indent .It Va net.inet.ip.dummynet.red_lookup_depth specifies the accuracy in computing the average queue when the link is idle (defaults to 256, must be greater than zero) .It Va net.inet.ip.dummynet.red_avg_pkt_size specifies the expected average packet size (defaults to 512, must be greater than zero) .It Va net.inet.ip.dummynet.red_max_pkt_size specifies the expected maximum packet size, only used when queue thresholds are in bytes (defaults to 1500, must be greater than zero). .El .El .Pp When used with IPv6 data, .Nm dummynet currently has several limitations. Information necessary to route link-local packets to an interface is not available after processing by .Nm dummynet so those packets are dropped in the output path. Care should be taken to ensure that link-local packets are not passed to .Nm dummynet . .Sh CHECKLIST Here are some important points to consider when designing your rules: .Bl -bullet .It Remember that you filter both packets going .Cm in and .Cm out . Most connections need packets going in both directions. .It Remember to test very carefully. It is a good idea to be near the console when doing this. If you cannot be near the console, use an auto-recovery script such as the one in .Pa /usr/share/examples/ipfw/change_rules.sh . .It Do not forget the loopback interface. .El .Sh FINE POINTS .Bl -bullet .It There are circumstances where fragmented datagrams are unconditionally dropped. TCP packets are dropped if they do not contain at least 20 bytes of TCP header, UDP packets are dropped if they do not contain a full 8 byte UDP header, and ICMP packets are dropped if they do not contain 4 bytes of ICMP header, enough to specify the ICMP type, code, and checksum. These packets are simply logged as .Dq pullup failed since there may not be enough good data in the packet to produce a meaningful log entry. .It Another type of packet is unconditionally dropped, a TCP packet with a fragment offset of one. This is a valid packet, but it only has one use, to try to circumvent firewalls. When logging is enabled, these packets are reported as being dropped by rule -1. .It If you are logged in over a network, loading the .Xr kld 4 version of .Nm is probably not as straightforward as you would think. The following command line is recommended: .Bd -literal -offset indent kldload ipfw && \e ipfw add 32000 allow ip from any to any .Ed .Pp Along the same lines, doing an .Bd -literal -offset indent ipfw flush .Ed .Pp in similar surroundings is also a bad idea. .It The .Nm filter list may not be modified if the system security level is set to 3 or higher (see .Xr init 8 for information on system security levels). .El .Sh PACKET DIVERSION A .Xr divert 4 socket bound to the specified port will receive all packets diverted to that port. If no socket is bound to the destination port, or if the divert module is not loaded, or if the kernel was not compiled with divert socket support, the packets are dropped. .Sh NETWORK ADDRESS TRANSLATION (NAT) .Nm support in-kernel NAT using the kernel version of .Xr libalias 3 . .Pp The nat configuration command is the following: .Bd -ragged -offset indent .Bk -words .Cm nat .Ar nat_number .Cm config .Ar nat-configuration .Ek .Ed .Pp The following parameters can be configured: .Bl -tag -width indent .It Cm ip Ar ip_address Define an ip address to use for aliasing. .It Cm if Ar nic Use ip address of NIC for aliasing, dynamically changing it if NIC's ip address changes. .It Cm log Enable logging on this nat instance. .It Cm deny_in Deny any incoming connection from outside world. .It Cm same_ports Try to leave the alias port numbers unchanged from the actual local port numbers. .It Cm unreg_only Traffic on the local network not originating from an unregistered address spaces will be ignored. .It Cm reset Reset table of the packet aliasing engine on address change. .It Cm reverse Reverse the way libalias handles aliasing. .It Cm proxy_only Obey transparent proxy rules only, packet aliasing is not performed. .It Cm skip_global Skip instance in case of global state lookup (see below). .El .Pp Some specials value can be supplied instead of .Va nat_number: .Bl -tag -width indent .It Cm global Looks up translation state in all configured nat instances. If an entry is found, packet is aliased according to that entry. If no entry was found in any of the instances, packet is passed unchanged, and no new entry will be created. See section .Sx MULTIPLE INSTANCES in .Xr natd 8 for more information. .It Cm tablearg Uses argument supplied in lookup table. See .Sx LOOKUP TABLES section below for more information on lookup tables. .El .Pp To let the packet continue after being (de)aliased, set the sysctl variable .Va net.inet.ip.fw.one_pass to 0. For more information about aliasing modes, refer to .Xr libalias 3 . See Section .Sx EXAMPLES for some examples about nat usage. .Ss REDIRECT AND LSNAT SUPPORT IN IPFW Redirect and LSNAT support follow closely the syntax used in .Xr natd 8 . See Section .Sx EXAMPLES for some examples on how to do redirect and lsnat. .Ss SCTP NAT SUPPORT SCTP nat can be configured in a similar manner to TCP through the .Nm command line tool. The main difference is that .Nm sctp nat does not do port translation. Since the local and global side ports will be the same, there is no need to specify both. Ports are redirected as follows: .Bd -ragged -offset indent .Bk -words .Cm nat .Ar nat_number .Cm config if .Ar nic .Cm redirect_port sctp .Ar ip_address [,addr_list] {[port | port-port] [,ports]} .Ek .Ed .Pp Most .Nm sctp nat configuration can be done in real-time through the .Xr sysctl 8 interface. All may be changed dynamically, though the hash_table size will only change for new .Nm nat instances. See .Sx SYSCTL VARIABLES for more info. .Sh LOADER TUNABLES Tunables can be set in .Xr loader 8 prompt, .Xr loader.conf 5 or .Xr kenv 1 before ipfw module gets loaded. .Bl -tag -width indent .It Va net.inet.ip.fw.default_to_accept: No 0 Defines ipfw last rule behavior. This value overrides .Cd "options IPFW_DEFAULT_TO_(ACCEPT|DENY)" from kernel configuration file. .It Va net.inet.ip.fw.tables_max: No 128 Defines number of tables available in ipfw. Number cannot exceed 65534. .El .Sh SYSCTL VARIABLES A set of .Xr sysctl 8 variables controls the behaviour of the firewall and associated modules .Pq Nm dummynet , bridge , sctp nat . These are shown below together with their default value (but always check with the .Xr sysctl 8 command what value is actually in use) and meaning: .Bl -tag -width indent .It Va net.inet.ip.alias.sctp.accept_global_ootb_addip: No 0 Defines how the .Nm nat responds to receipt of global OOTB ASCONF-AddIP: .Bl -tag -width indent .It Cm 0 No response (unless a partially matching association exists - ports and vtags match but global address does not) .It Cm 1 .Nm nat will accept and process all OOTB global AddIP messages. .El .Pp Option 1 should never be selected as this forms a security risk. An attacker can establish multiple fake associations by sending AddIP messages. .It Va net.inet.ip.alias.sctp.chunk_proc_limit: No 5 Defines the maximum number of chunks in an SCTP packet that will be parsed for a packet that matches an existing association. This value is enforced to be greater or equal than .Cm net.inet.ip.alias.sctp.initialising_chunk_proc_limit . A high value is a DoS risk yet setting too low a value may result in important control chunks in the packet not being located and parsed. .It Va net.inet.ip.alias.sctp.error_on_ootb: No 1 Defines when the .Nm nat responds to any Out-of-the-Blue (OOTB) packets with ErrorM packets. An OOTB packet is a packet that arrives with no existing association registered in the .Nm nat and is not an INIT or ASCONF-AddIP packet: .Bl -tag -width indent .It Cm 0 ErrorM is never sent in response to OOTB packets. .It Cm 1 ErrorM is only sent to OOTB packets received on the local side. .It Cm 2 ErrorM is sent to the local side and on the global side ONLY if there is a partial match (ports and vtags match but the source global IP does not). This value is only useful if the .Nm nat is tracking global IP addresses. .It Cm 3 ErrorM is sent in response to all OOTB packets on both the local and global side (DoS risk). .El .Pp At the moment the default is 0, since the ErrorM packet is not yet supported by most SCTP stacks. When it is supported, and if not tracking global addresses, we recommend setting this value to 1 to allow multi-homed local hosts to function with the .Nm nat . To track global addresses, we recommend setting this value to 2 to allow global hosts to be informed when they need to (re)send an ASCONF-AddIP. Value 3 should never be chosen (except for debugging) as the .Nm nat will respond to all OOTB global packets (a DoS risk). .It Va net.inet.ip.alias.sctp.hashtable_size: No 2003 Size of hash tables used for .Nm nat lookups (100 < prime_number > 1000001). This value sets the .Nm hash table size for any future created .Nm nat instance and therefore must be set prior to creating a .Nm nat instance. The table sizes may be changed to suit specific needs. If there will be few concurrent associations, and memory is scarce, you may make these smaller. If there will be many thousands (or millions) of concurrent associations, you should make these larger. A prime number is best for the table size. The sysctl update function will adjust your input value to the next highest prime number. .It Va net.inet.ip.alias.sctp.holddown_time: No 0 Hold association in table for this many seconds after receiving a SHUTDOWN-COMPLETE. This allows endpoints to correct shutdown gracefully if a shutdown_complete is lost and retransmissions are required. .It Va net.inet.ip.alias.sctp.init_timer: No 15 Timeout value while waiting for (INIT-ACK|AddIP-ACK). This value cannot be 0. .It Va net.inet.ip.alias.sctp.initialising_chunk_proc_limit: No 2 Defines the maximum number of chunks in an SCTP packet that will be parsed when no existing association exists that matches that packet. Ideally this packet will only be an INIT or ASCONF-AddIP packet. A higher value may become a DoS risk as malformed packets can consume processing resources. .It Va net.inet.ip.alias.sctp.param_proc_limit: No 25 Defines the maximum number of parameters within a chunk that will be parsed in a packet. As for other similar sysctl variables, larger values pose a DoS risk. .It Va net.inet.ip.alias.sctp.log_level: No 0 Level of detail in the system log messages (0 \- minimal, 1 \- event, 2 \- info, 3 \- detail, 4 \- debug, 5 \- max debug). May be a good option in high loss environments. .It Va net.inet.ip.alias.sctp.shutdown_time: No 15 Timeout value while waiting for SHUTDOWN-COMPLETE. This value cannot be 0. .It Va net.inet.ip.alias.sctp.track_global_addresses: No 0 Enables/disables global IP address tracking within the .Nm nat and places an upper limit on the number of addresses tracked for each association: .Bl -tag -width indent .It Cm 0 Global tracking is disabled .It Cm >1 Enables tracking, the maximum number of addresses tracked for each association is limited to this value .El .Pp This variable is fully dynamic, the new value will be adopted for all newly arriving associations, existing associations are treated as they were previously. Global tracking will decrease the number of collisions within the .Nm nat at a cost of increased processing load, memory usage, complexity, and possible .Nm nat state problems in complex networks with multiple .Nm nats . We recommend not tracking global IP addresses, this will still result in a fully functional .Nm nat . .It Va net.inet.ip.alias.sctp.up_timer: No 300 Timeout value to keep an association up with no traffic. This value cannot be 0. .It Va net.inet.ip.dummynet.expire : No 1 Lazily delete dynamic pipes/queue once they have no pending traffic. You can disable this by setting the variable to 0, in which case the pipes/queues will only be deleted when the threshold is reached. .It Va net.inet.ip.dummynet.hash_size : No 64 Default size of the hash table used for dynamic pipes/queues. This value is used when no .Cm buckets option is specified when configuring a pipe/queue. .It Va net.inet.ip.dummynet.io_fast : No 0 If set to a non-zero value, the .Dq fast mode of .Nm dummynet operation (see above) is enabled. .It Va net.inet.ip.dummynet.io_pkt Number of packets passed to .Nm dummynet . .It Va net.inet.ip.dummynet.io_pkt_drop Number of packets dropped by .Nm dummynet . .It Va net.inet.ip.dummynet.io_pkt_fast Number of packets bypassed by the .Nm dummynet scheduler. .It Va net.inet.ip.dummynet.max_chain_len : No 16 Target value for the maximum number of pipes/queues in a hash bucket. The product .Cm max_chain_len*hash_size is used to determine the threshold over which empty pipes/queues will be expired even when .Cm net.inet.ip.dummynet.expire=0 . .It Va net.inet.ip.dummynet.red_lookup_depth : No 256 .It Va net.inet.ip.dummynet.red_avg_pkt_size : No 512 .It Va net.inet.ip.dummynet.red_max_pkt_size : No 1500 Parameters used in the computations of the drop probability for the RED algorithm. .It Va net.inet.ip.dummynet.pipe_byte_limit : No 1048576 .It Va net.inet.ip.dummynet.pipe_slot_limit : No 100 The maximum queue size that can be specified in bytes or packets. These limits prevent accidental exhaustion of resources such as mbufs. If you raise these limits, you should make sure the system is configured so that sufficient resources are available. .It Va net.inet.ip.fw.autoinc_step : No 100 Delta between rule numbers when auto-generating them. The value must be in the range 1..1000. .It Va net.inet.ip.fw.curr_dyn_buckets : Va net.inet.ip.fw.dyn_buckets The current number of buckets in the hash table for dynamic rules (readonly). .It Va net.inet.ip.fw.debug : No 1 Controls debugging messages produced by .Nm . .It Va net.inet.ip.fw.default_rule : No 65535 The default rule number (read-only). By the design of .Nm , the default rule is the last one, so its number can also serve as the highest number allowed for a rule. .It Va net.inet.ip.fw.dyn_buckets : No 256 The number of buckets in the hash table for dynamic rules. Must be a power of 2, up to 65536. It only takes effect when all dynamic rules have expired, so you are advised to use a .Cm flush command to make sure that the hash table is resized. .It Va net.inet.ip.fw.dyn_count : No 3 Current number of dynamic rules (read-only). .It Va net.inet.ip.fw.dyn_keepalive : No 1 Enables generation of keepalive packets for .Cm keep-state rules on TCP sessions. A keepalive is generated to both sides of the connection every 5 seconds for the last 20 seconds of the lifetime of the rule. .It Va net.inet.ip.fw.dyn_max : No 8192 Maximum number of dynamic rules. When you hit this limit, no more dynamic rules can be installed until old ones expire. .It Va net.inet.ip.fw.dyn_ack_lifetime : No 300 .It Va net.inet.ip.fw.dyn_syn_lifetime : No 20 .It Va net.inet.ip.fw.dyn_fin_lifetime : No 1 .It Va net.inet.ip.fw.dyn_rst_lifetime : No 1 .It Va net.inet.ip.fw.dyn_udp_lifetime : No 5 .It Va net.inet.ip.fw.dyn_short_lifetime : No 30 These variables control the lifetime, in seconds, of dynamic rules. Upon the initial SYN exchange the lifetime is kept short, then increased after both SYN have been seen, then decreased again during the final FIN exchange or when a RST is received. Both .Em dyn_fin_lifetime and .Em dyn_rst_lifetime must be strictly lower than 5 seconds, the period of repetition of keepalives. The firewall enforces that. .It Va net.inet.ip.fw.enable : No 1 Enables the firewall. Setting this variable to 0 lets you run your machine without firewall even if compiled in. .It Va net.inet6.ip6.fw.enable : No 1 provides the same functionality as above for the IPv6 case. .It Va net.inet.ip.fw.one_pass : No 1 When set, the packet exiting from the .Nm dummynet pipe or from .Xr ng_ipfw 4 node is not passed though the firewall again. Otherwise, after an action, the packet is reinjected into the firewall at the next rule. .It Va net.inet.ip.fw.tables_max : No 128 Maximum number of tables. .It Va net.inet.ip.fw.verbose : No 1 Enables verbose messages. .It Va net.inet.ip.fw.verbose_limit : No 0 Limits the number of messages produced by a verbose firewall. .It Va net.inet6.ip6.fw.deny_unknown_exthdrs : No 1 If enabled packets with unknown IPv6 Extension Headers will be denied. .It Va net.link.ether.ipfw : No 0 Controls whether layer-2 packets are passed to .Nm . Default is no. .It Va net.link.bridge.ipfw : No 0 Controls whether bridged packets are passed to .Nm . Default is no. .El .Sh EXAMPLES There are far too many possible uses of .Nm so this Section will only give a small set of examples. .Pp .Ss BASIC PACKET FILTERING This command adds an entry which denies all tcp packets from .Em cracker.evil.org to the telnet port of .Em wolf.tambov.su from being forwarded by the host: .Pp .Dl "ipfw add deny tcp from cracker.evil.org to wolf.tambov.su telnet" .Pp This one disallows any connection from the entire cracker's network to my host: .Pp .Dl "ipfw add deny ip from 123.45.67.0/24 to my.host.org" .Pp A first and efficient way to limit access (not using dynamic rules) is the use of the following rules: .Pp .Dl "ipfw add allow tcp from any to any established" .Dl "ipfw add allow tcp from net1 portlist1 to net2 portlist2 setup" .Dl "ipfw add allow tcp from net3 portlist3 to net3 portlist3 setup" .Dl "..." .Dl "ipfw add deny tcp from any to any" .Pp The first rule will be a quick match for normal TCP packets, but it will not match the initial SYN packet, which will be matched by the .Cm setup rules only for selected source/destination pairs. All other SYN packets will be rejected by the final .Cm deny rule. .Pp If you administer one or more subnets, you can take advantage of the address sets and or-blocks and write extremely compact rulesets which selectively enable services to blocks of clients, as below: .Pp .Dl "goodguys=\*q{ 10.1.2.0/24{20,35,66,18} or 10.2.3.0/28{6,3,11} }\*q" .Dl "badguys=\*q10.1.2.0/24{8,38,60}\*q" .Dl "" .Dl "ipfw add allow ip from ${goodguys} to any" .Dl "ipfw add deny ip from ${badguys} to any" .Dl "... normal policies ..." .Pp The .Cm verrevpath option could be used to do automated anti-spoofing by adding the following to the top of a ruleset: .Pp .Dl "ipfw add deny ip from any to any not verrevpath in" .Pp This rule drops all incoming packets that appear to be coming to the system on the wrong interface. For example, a packet with a source address belonging to a host on a protected internal network would be dropped if it tried to enter the system from an external interface. .Pp The .Cm antispoof option could be used to do similar but more restricted anti-spoofing by adding the following to the top of a ruleset: .Pp .Dl "ipfw add deny ip from any to any not antispoof in" .Pp This rule drops all incoming packets that appear to be coming from another directly connected system but on the wrong interface. For example, a packet with a source address of .Li 192.168.0.0/24 , configured on .Li fxp0 , but coming in on .Li fxp1 would be dropped. .Pp The .Cm setdscp option could be used to (re)mark user traffic, by adding the following to the appropriate place in ruleset: .Pp .Dl "ipfw add setdscp be ip from any to any dscp af11,af21" .Ss DYNAMIC RULES In order to protect a site from flood attacks involving fake TCP packets, it is safer to use dynamic rules: .Pp .Dl "ipfw add check-state" .Dl "ipfw add deny tcp from any to any established" .Dl "ipfw add allow tcp from my-net to any setup keep-state" .Pp This will let the firewall install dynamic rules only for those connection which start with a regular SYN packet coming from the inside of our network. Dynamic rules are checked when encountering the first occurrence of a .Cm check-state , .Cm keep-state or .Cm limit rule. A .Cm check-state rule should usually be placed near the beginning of the ruleset to minimize the amount of work scanning the ruleset. Your mileage may vary. .Pp To limit the number of connections a user can open you can use the following type of rules: .Pp .Dl "ipfw add allow tcp from my-net/24 to any setup limit src-addr 10" .Dl "ipfw add allow tcp from any to me setup limit src-addr 4" .Pp The former (assuming it runs on a gateway) will allow each host on a /24 network to open at most 10 TCP connections. The latter can be placed on a server to make sure that a single client does not use more than 4 simultaneous connections. .Pp .Em BEWARE : stateful rules can be subject to denial-of-service attacks by a SYN-flood which opens a huge number of dynamic rules. The effects of such attacks can be partially limited by acting on a set of .Xr sysctl 8 variables which control the operation of the firewall. .Pp Here is a good usage of the .Cm list command to see accounting records and timestamp information: .Pp .Dl ipfw -at list .Pp or in short form without timestamps: .Pp .Dl ipfw -a list .Pp which is equivalent to: .Pp .Dl ipfw show .Pp Next rule diverts all incoming packets from 192.168.2.0/24 to divert port 5000: .Pp .Dl ipfw divert 5000 ip from 192.168.2.0/24 to any in .Ss TRAFFIC SHAPING The following rules show some of the applications of .Nm and .Nm dummynet for simulations and the like. .Pp This rule drops random incoming packets with a probability of 5%: .Pp .Dl "ipfw add prob 0.05 deny ip from any to any in" .Pp A similar effect can be achieved making use of .Nm dummynet pipes: .Pp .Dl "ipfw add pipe 10 ip from any to any" .Dl "ipfw pipe 10 config plr 0.05" .Pp We can use pipes to artificially limit bandwidth, e.g.\& on a machine acting as a router, if we want to limit traffic from local clients on 192.168.2.0/24 we do: .Pp .Dl "ipfw add pipe 1 ip from 192.168.2.0/24 to any out" .Dl "ipfw pipe 1 config bw 300Kbit/s queue 50KBytes" .Pp note that we use the .Cm out modifier so that the rule is not used twice. Remember in fact that .Nm rules are checked both on incoming and outgoing packets. .Pp Should we want to simulate a bidirectional link with bandwidth limitations, the correct way is the following: .Pp .Dl "ipfw add pipe 1 ip from any to any out" .Dl "ipfw add pipe 2 ip from any to any in" .Dl "ipfw pipe 1 config bw 64Kbit/s queue 10Kbytes" .Dl "ipfw pipe 2 config bw 64Kbit/s queue 10Kbytes" .Pp The above can be very useful, e.g.\& if you want to see how your fancy Web page will look for a residential user who is connected only through a slow link. You should not use only one pipe for both directions, unless you want to simulate a half-duplex medium (e.g.\& AppleTalk, Ethernet, IRDA). It is not necessary that both pipes have the same configuration, so we can also simulate asymmetric links. .Pp Should we want to verify network performance with the RED queue management algorithm: .Pp .Dl "ipfw add pipe 1 ip from any to any" .Dl "ipfw pipe 1 config bw 500Kbit/s queue 100 red 0.002/30/80/0.1" .Pp Another typical application of the traffic shaper is to introduce some delay in the communication. This can significantly affect applications which do a lot of Remote Procedure Calls, and where the round-trip-time of the connection often becomes a limiting factor much more than bandwidth: .Pp .Dl "ipfw add pipe 1 ip from any to any out" .Dl "ipfw add pipe 2 ip from any to any in" .Dl "ipfw pipe 1 config delay 250ms bw 1Mbit/s" .Dl "ipfw pipe 2 config delay 250ms bw 1Mbit/s" .Pp Per-flow queueing can be useful for a variety of purposes. A very simple one is counting traffic: .Pp .Dl "ipfw add pipe 1 tcp from any to any" .Dl "ipfw add pipe 1 udp from any to any" .Dl "ipfw add pipe 1 ip from any to any" .Dl "ipfw pipe 1 config mask all" .Pp The above set of rules will create queues (and collect statistics) for all traffic. Because the pipes have no limitations, the only effect is collecting statistics. Note that we need 3 rules, not just the last one, because when .Nm tries to match IP packets it will not consider ports, so we would not see connections on separate ports as different ones. .Pp A more sophisticated example is limiting the outbound traffic on a net with per-host limits, rather than per-network limits: .Pp .Dl "ipfw add pipe 1 ip from 192.168.2.0/24 to any out" .Dl "ipfw add pipe 2 ip from any to 192.168.2.0/24 in" .Dl "ipfw pipe 1 config mask src-ip 0x000000ff bw 200Kbit/s queue 20Kbytes" .Dl "ipfw pipe 2 config mask dst-ip 0x000000ff bw 200Kbit/s queue 20Kbytes" .Ss LOOKUP TABLES In the following example, we need to create several traffic bandwidth classes and we need different hosts/networks to fall into different classes. We create one pipe for each class and configure them accordingly. Then we create a single table and fill it with IP subnets and addresses. For each subnet/host we set the argument equal to the number of the pipe that it should use. Then we classify traffic using a single rule: .Pp .Dl "ipfw pipe 1 config bw 1000Kbyte/s" .Dl "ipfw pipe 4 config bw 4000Kbyte/s" .Dl "..." .Dl "ipfw table 1 add 192.168.2.0/24 1" .Dl "ipfw table 1 add 192.168.0.0/27 4" .Dl "ipfw table 1 add 192.168.0.2 1" .Dl "..." .Dl "ipfw add pipe tablearg ip from table(1) to any" .Pp Using the .Cm fwd action, the table entries may include hostnames and IP addresses. .Pp .Dl "ipfw table 1 add 192.168.2.0/24 10.23.2.1" .Dl "ipfw table 1 add 192.168.0.0/27 router1.dmz" .Dl "..." .Dl "ipfw add 100 fwd tablearg ip from any to table(1)" .Pp In the following example per-interface firewall is created: .Pp .Dl "ipfw table 10 add vlan20 12000" .Dl "ipfw table 10 add vlan30 13000" .Dl "ipfw table 20 add vlan20 22000" .Dl "ipfw table 20 add vlan30 23000" .Dl ".." .Dl "ipfw add 100 ipfw skipto tablearg ip from any to any recv 'table(10)' in" .Dl "ipfw add 200 ipfw skipto tablearg ip from any to any xmit 'table(10)' out" .Ss SETS OF RULES To add a set of rules atomically, e.g.\& set 18: .Pp .Dl "ipfw set disable 18" .Dl "ipfw add NN set 18 ... # repeat as needed" .Dl "ipfw set enable 18" .Pp To delete a set of rules atomically the command is simply: .Pp .Dl "ipfw delete set 18" .Pp To test a ruleset and disable it and regain control if something goes wrong: .Pp .Dl "ipfw set disable 18" .Dl "ipfw add NN set 18 ... # repeat as needed" .Dl "ipfw set enable 18; echo done; sleep 30 && ipfw set disable 18" .Pp Here if everything goes well, you press control-C before the "sleep" terminates, and your ruleset will be left active. Otherwise, e.g.\& if you cannot access your box, the ruleset will be disabled after the sleep terminates thus restoring the previous situation. .Pp To show rules of the specific set: .Pp .Dl "ipfw set 18 show" .Pp To show rules of the disabled set: .Pp .Dl "ipfw -S set 18 show" .Pp To clear a specific rule counters of the specific set: .Pp .Dl "ipfw set 18 zero NN" .Pp To delete a specific rule of the specific set: .Pp .Dl "ipfw set 18 delete NN" .Ss NAT, REDIRECT AND LSNAT First redirect all the traffic to nat instance 123: .Pp .Dl "ipfw add nat 123 all from any to any" .Pp Then to configure nat instance 123 to alias all the outgoing traffic with ip 192.168.0.123, blocking all incoming connections, trying to keep same ports on both sides, clearing aliasing table on address change and keeping a log of traffic/link statistics: .Pp .Dl "ipfw nat 123 config ip 192.168.0.123 log deny_in reset same_ports" .Pp Or to change address of instance 123, aliasing table will be cleared (see reset option): .Pp .Dl "ipfw nat 123 config ip 10.0.0.1" .Pp To see configuration of nat instance 123: .Pp .Dl "ipfw nat 123 show config" .Pp To show logs of all the instances in range 111-999: .Pp .Dl "ipfw nat 111-999 show" .Pp To see configurations of all instances: .Pp .Dl "ipfw nat show config" .Pp Or a redirect rule with mixed modes could looks like: .Pp .Dl "ipfw nat 123 config redirect_addr 10.0.0.1 10.0.0.66" .Dl " redirect_port tcp 192.168.0.1:80 500" .Dl " redirect_proto udp 192.168.1.43 192.168.1.1" .Dl " redirect_addr 192.168.0.10,192.168.0.11" .Dl " 10.0.0.100 # LSNAT" .Dl " redirect_port tcp 192.168.0.1:80,192.168.0.10:22" .Dl " 500 # LSNAT" .Pp or it could be split in: .Pp .Dl "ipfw nat 1 config redirect_addr 10.0.0.1 10.0.0.66" .Dl "ipfw nat 2 config redirect_port tcp 192.168.0.1:80 500" .Dl "ipfw nat 3 config redirect_proto udp 192.168.1.43 192.168.1.1" .Dl "ipfw nat 4 config redirect_addr 192.168.0.10,192.168.0.11,192.168.0.12" .Dl " 10.0.0.100" .Dl "ipfw nat 5 config redirect_port tcp" .Dl " 192.168.0.1:80,192.168.0.10:22,192.168.0.20:25 500" .Sh SEE ALSO .Xr cpp 1 , .Xr m4 1 , .Xr altq 4 , .Xr divert 4 , .Xr dummynet 4 , .Xr if_bridge 4 , .Xr ip 4 , .Xr ipfirewall 4 , .Xr ng_ipfw 4 , .Xr protocols 5 , .Xr services 5 , .Xr init 8 , .Xr kldload 8 , .Xr reboot 8 , .Xr sysctl 8 , .Xr syslogd 8 .Sh HISTORY The .Nm utility first appeared in .Fx 2.0 . .Nm dummynet was introduced in .Fx 2.2.8 . Stateful extensions were introduced in .Fx 4.0 . .Nm ipfw2 was introduced in Summer 2002. .Sh AUTHORS .An Ugen J. S. Antsilevich , .An Poul-Henning Kamp , .An Alex Nash , .An Archie Cobbs , .An Luigi Rizzo . .Pp .An -nosplit API based upon code written by .An Daniel Boulet for BSDI. .Pp Dummynet has been introduced by Luigi Rizzo in 1997-1998. .Pp Some early work (1999-2000) on the .Nm dummynet traffic shaper supported by Akamba Corp. .Pp The ipfw core (ipfw2) has been completely redesigned and reimplemented by Luigi Rizzo in summer 2002. Further actions and options have been added by various developer over the years. .Pp .An -nosplit In-kernel NAT support written by .An Paolo Pisati Aq piso@FreeBSD.org as part of a Summer of Code 2005 project. .Pp SCTP .Nm nat support has been developed by .An The Centre for Advanced Internet Architectures (CAIA) Aq http://www.caia.swin.edu.au . The primary developers and maintainers are David Hayes and Jason But. For further information visit: .Aq http://www.caia.swin.edu.au/urp/SONATA .Pp Delay profiles have been developed by Alessandro Cerri and Luigi Rizzo, supported by the European Commission within Projects Onelab and Onelab2. .Sh BUGS The syntax has grown over the years and sometimes it might be confusing. Unfortunately, backward compatibility prevents cleaning up mistakes made in the definition of the syntax. .Pp .Em !!! WARNING !!! .Pp Misconfiguring the firewall can put your computer in an unusable state, possibly shutting down network services and requiring console access to regain control of it. .Pp Incoming packet fragments diverted by .Cm divert are reassembled before delivery to the socket. The action used on those packet is the one from the rule which matches the first fragment of the packet. .Pp Packets diverted to userland, and then reinserted by a userland process may lose various packet attributes. The packet source interface name will be preserved if it is shorter than 8 bytes and the userland process saves and reuses the sockaddr_in (as does .Xr natd 8 ) ; otherwise, it may be lost. If a packet is reinserted in this manner, later rules may be incorrectly applied, making the order of .Cm divert rules in the rule sequence very important. .Pp Dummynet drops all packets with IPv6 link-local addresses. .Pp Rules using .Cm uid or .Cm gid may not behave as expected. In particular, incoming SYN packets may have no uid or gid associated with them since they do not yet belong to a TCP connection, and the uid/gid associated with a packet may not be as expected if the associated process calls .Xr setuid 2 or similar system calls. .Pp Rule syntax is subject to the command line environment and some patterns may need to be escaped with the backslash character or quoted appropriately. .Pp Due to the architecture of .Xr libalias 3 , ipfw nat is not compatible with the TCP segmentation offloading (TSO). Thus, to reliably nat your network traffic, please disable TSO on your NICs using .Xr ifconfig 8 . .Pp ICMP error messages are not implicitly matched by dynamic rules for the respective conversations. To avoid failures of network error detection and path MTU discovery, ICMP error messages may need to be allowed explicitly through static rules. .Pp Rules using .Cm call and .Cm return actions may lead to confusing behaviour if ruleset has mistakes, and/or interaction with other subsystems (netgraph, dummynet, etc.) is used. One possible case for this is packet leaving .Nm in subroutine on the input pass, while later on output encountering unpaired .Cm return first. As the call stack is kept intact after input pass, packet will suddenly return to the rule number used on input pass, not on output one. Order of processing should be checked carefully to avoid such mistakes. Index: stable/10/share/man/man4/gre.4 =================================================================== --- stable/10/share/man/man4/gre.4 (revision 306377) +++ stable/10/share/man/man4/gre.4 (revision 306378) @@ -1,198 +1,196 @@ .\" $NetBSD: gre.4,v 1.28 2002/06/10 02:49:35 itojun Exp $ .\" .\" Copyright 1998 (c) The NetBSD Foundation, Inc. .\" All rights reserved. .\" .\" This code is derived from software contributed to The NetBSD Foundation .\" by Heiko W.Rupp .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd June 2, 2015 .Dt GRE 4 .Os .Sh NAME .Nm gre .Nd encapsulating network device .Sh SYNOPSIS To compile the driver into the kernel, place the following line in the kernel configuration file: .Bd -ragged -offset indent .Cd "device gre" .Ed .Pp Alternatively, to load the driver as a module at boot time, place the following line in .Xr loader.conf 5 : .Bd -literal -offset indent if_gre_load="YES" .Ed .Sh DESCRIPTION The .Nm network interface pseudo device encapsulates datagrams into IP. These encapsulated datagrams are routed to a destination host, where they are decapsulated and further routed to their final destination. The .Dq tunnel appears to the inner datagrams as one hop. .Pp .Nm interfaces are dynamically created and destroyed with the .Xr ifconfig 8 .Cm create and .Cm destroy subcommands. .Pp This driver corresponds to RFC 2784. Encapsulated datagrams are prepended an outer datagram and a GRE header. The GRE header specifies the type of the encapsulated datagram and thus allows for tunneling other protocols than IP like e.g.\& AppleTalk. GRE mode is also the default tunnel mode on Cisco routers. .Nm also supports Cisco WCCP protocol, both version 1 and version 2. .Pp The .Nm interfaces support a number of additional parameters to the .Xr ifconfig 8 : .Bl -tag -width "enable_csum" .It Ar grekey Set the GRE key used for outgoing packets. A value of 0 disables the key option. .It Ar enable_csum Enables checksum calculation for outgoing packets. .It Ar enable_seq Enables use of sequence number field in the GRE header for outgoing packets. .El .Sh EXAMPLES -.Pp .Bd -literal 192.168.1.* --- Router A -------tunnel-------- Router B --- 192.168.2.* \\ / \\ / +------ the Internet ------+ .Ed .Pp Assuming router A has the (external) IP address A and the internal address 192.168.1.1, while router B has external address B and internal address 192.168.2.1, the following commands will configure the tunnel: .Pp On router A: .Bd -literal -offset indent ifconfig greN create ifconfig greN inet 192.168.1.1 192.168.2.1 ifconfig greN inet tunnel A B route add -net 192.168.2 -netmask 255.255.255.0 192.168.2.1 .Ed .Pp On router B: .Bd -literal -offset indent ifconfig greN create ifconfig greN inet 192.168.2.1 192.168.1.1 ifconfig greN inet tunnel B A route add -net 192.168.1 -netmask 255.255.255.0 192.168.1.1 .Ed .Pp In case when internal and external IP addresses are the same, different routing tables (FIB) should be used. The default FIB will be applied to IP packets before GRE encapsulation. After encapsulation GRE interface should set different FIB number to outgoing packet. Then different FIB will be applied to such encapsulated packets. According to this FIB packet should be routed to tunnel endpoint. .Bd -literal Host X -- Host A (198.51.100.1) ---tunnel--- Cisco D (203.0.113.1) -- Host E \\ / \\ / +----- Host B ----- Host C -----+ (198.51.100.254) .Ed .Pp On Host A (FreeBSD): .Pp First of multiple FIBs should be configured via loader.conf: .Bd -literal -offset indent net.fibs=2 net.add_addr_allfibs=0 .Ed .Pp Then routes to the gateway and remote tunnel endpoint via this gateway should be added to the second FIB: .Bd -literal -offset indent route add -net 198.51.100.0 -netmask 255.255.255.0 -fib 1 -iface em0 route add -host 203.0.113.1 -fib 1 198.51.100.254 .Ed .Pp And GRE tunnel should be configured to change FIB for encapsulated packets: .Bd -literal -offset indent ifconfig greN create ifconfig greN inet 198.51.100.1 203.0.113.1 ifconfig greN inet tunnel 198.51.100.1 203.0.113.1 tunnelfib 1 .Ed .Pp .Sh NOTES The MTU of .Nm interfaces is set to 1476 by default, to match the value used by Cisco routers. This may not be an optimal value, depending on the link between the two tunnel endpoints. It can be adjusted via .Xr ifconfig 8 . .Pp For correct operation, the .Nm device needs a route to the decapsulating host that does not run over the tunnel, as this would be a loop. .Pp The kernel must be set to forward datagrams by setting the .Va net.inet.ip.forwarding .Xr sysctl 8 variable to non-zero. .Sh SEE ALSO .\" Xr atalk 4 , .Xr gif 4 , .Xr inet 4 , .Xr ip 4 , .Xr me 4 , .Xr netintro 4 , .Xr protocols 5 , .Xr ifconfig 8 , .Xr sysctl 8 .Pp A description of GRE encapsulation can be found in RFC 2784 and RFC 2890. .Sh AUTHORS .An Andrey V. Elsukov Aq Mt ae@FreeBSD.org .An Heiko W.Rupp Aq Mt hwr@pilhuhn.de .Sh BUGS -.Pp The current implementation uses the key only for outgoing packets. Incoming packets with a different key or without a key will be treated as if they would belong to this interface. .Pp The sequence number field also used only for outgoing packets. Index: stable/10/share/man/man4/man4.arm/cgem.4 =================================================================== --- stable/10/share/man/man4/man4.arm/cgem.4 (revision 306377) +++ stable/10/share/man/man4/man4.arm/cgem.4 (revision 306378) @@ -1,297 +1,296 @@ .\" .\" Copyright (c) 2014 Thomas Skibo .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. The name of the author may not be used to endorse or promote products .\" derived from this software without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd August 26, 2014 .Dt CGEM 4 .Os .Sh NAME .Nm cgem .Nd "Cadence GEM Gigabit Ethernet driver" .Sh SYNOPSIS To compile this driver into the kernel, place the following lines in your kernel configuration file: .Bd -ragged -offset indent .Cd "device ether" .Cd "device miibus" .Cd "device cgem" .Ed .Sh DESCRIPTION The .Nm driver provides support for the Cadence GEM (Gigabit Ethernet MAC). The Cadence GEM is used in some SoC (System on a Chip) devices such as the Xilinx Zynq-7000 and the Atmel SAMA5D3. .Pp The .Nm driver supports the following media types: .Bl -tag -width ".Cm 10baseT/UTP" .It Cm autoselect Enable autoselection of the media type and options. The user can manually override the autoselected mode using .Xr ifconfig 8 or by adding media options to .Xr rc.conf 5 . .It Cm 10baseT/UTP Set 10Mbps operation. The .Xr ifconfig 8 .Cm mediaopt option can also be used to select either .Cm full-duplex or .Cm half-duplex modes. .It Cm 100baseTX Set 100Mbps (Fast Ethernet) operation. The .Xr ifconfig 8 .Cm mediaopt option can also be used to select either .Cm full-duplex or .Cm half-duplex modes. .It Cm 1000baseT Set 1000Mbps (Gigabit Ethernet) operation over twisted pair. The GEM supports 1000Mbps in .Cm full-duplex mode only. .El .Pp The .Nm driver supports the following media options: .Bl -tag -width ".Cm full-duplex" .It Cm full-duplex Force full-duplex operation. .It Cm half-duplex Force half-duplex operation. .El .Pp The driver provides support for TCP/UDP/IP checksum offloading (although disabled by default). The device and driver also support 1536-byte frames for VLANs (vlanmtu). .Sh SYSCTL VARIABLES The following variables are available as both .Xr sysctl 8 variables and .Xr loader 8 tunables: .Bl -tag -width "xxxxxxxx" .It Va dev.cgem.%d.rxbufs The number of receive buffers allocated to the hardware. The default value is 256. The maximum value is 511. If this number is increased while the interface is UP, it will not take effect until the next packet is received. If this number is decreased while the interface is UP, buffers will not be immediately removed from the receive buffer ring but the number of buffers will decrease as packets are received until it reaches the new value. .It Va dev.cgem.%d.rxhangwar This tunable enables a work-around to recover from receive hangs. The default value is 1. Set to 0 to disable the work-around. .El .Pp The following read-only variables are available as .Xr sysctl 8 variables: .Bl -tag -width "xxxxxxxx" .It Va dev.cgem.%d._rxoverruns This variable counts the number of receive packet buffer overrun interrupts. .It Va dev.cgem.%d._rxnobufs This variable counts the number of interrupts due to the GEM buffer ring going empty. .It Va dev.cgem.%d._rxdmamapfails This variable is the number of times bus_dmamap_load_mbuf_sg(9) failed in the receive path. .It Va dev.cgem.%d._txfull The number of times the GEM's transmit ring was full. .It Va dev.cgem.%d._txdmamapfails This variable is the number of times bus_dmamap_load_mbuf_sg(9) failed in the transmit path. .It Va dev.cgem.%d._txdefrags This variable is the number of times the driver needed to call m_defrag(9) because a packet queued for transmit had too many DMA segments. .It Va dev.cgem.%d._txdefragfails This variable is the number of times .Xr m_defrag 9 failed. .It Va dev.cgem.%d.stats.* The following variables are useful MAC counters supplied by the hardware: .It Va dev.cgem.%d.stats.tx_bytes A 64-bit counter of the number of bytes transmitted in frames without error. .It Va dev.cgem.%d.stats.tx_frames Counter of frames transmitted without error excluding pause frames. .It Va dev.cgem.%d.stats.tx_frames_bcast Counter of broadcast frames transmitted without error excluding pause frames. .It Va dev.cgem.%d.stats.tx_frames_multi Counter of multicast frames transmitted without error excluding pause frames. .It Va dev.cgem.%d.stats.tx_frames_pause Counter of pause frames transmitted without error. .It Va dev.cgem.%d.stats.tx_frames_64b Counter of 64 byte frames transmitted without error. .It Va dev.cgem.%d.stats.tx_frames_65to127b Counter of 65 to 127 byte frames transmitted without error. .It Va dev.cgem.%d.stats.tx_frames_128to255b Counter of 128 to 255 byte frames transmitted without error. .It Va dev.cgem.%d.stats.tx_frames_256to511b Counter of 256 to 511 byte frames transmitted without error. .It Va dev.cgem.%d.stats.tx_frames_512to1023b Counter of 512 to 1023 byte frames transmitted without error. .It Va dev.cgem.%d.stats.tx_frames_1024to1536b Counter of 1024 to 1536 byte frames transmitted without error. .It Va dev.cgem.%d.stats.tx_under_runs Counter of frames not transmitted due to a transmit underrun. .It Va dev.cgem.%d.stats.tx_single_collisn Counter of frames experiencing a single collision before being successfully transmitted. .It Va dev.cgem.%d.stats.tx_multi_collisn Counter of frames experiencing between 2 and 15 collisions before being successfully transmitted. .It Va dev.cgem.%d.stats.tx_excsv_collisn Counter of frames that failed to transmit because they experienced 16 collisions. .It Va dev.cgem.%d.stats.tx_late_collisn Counter of frames that experienced a late collision. .It Va dev.cgem.%d.stats.tx_deferred_frames Counter of frames experiencing deferral due to carrier sense being active on their first attempt at transmission. .It Va dev.cgem.%d.stats.tx_carrier_sense_errs Counter of frames transmitted where carrier sense was not seen during transmission or where carrier sense was deasserted after being asserted in a transmit frame without collision. .It Va dev.cgem.%d.stats.rx_bytes A 64-bit counter of bytes received without error excluding pause frames. .It Va dev.cgem.%d.stats.rx_frames Counter of frames received without error excluding pause frames. .It Va dev.cgem.%d.stats.rx_frames_bcast Counter of broadcast frames receive without error excluding pause frames. .It Va dev.cgem.%d.stats.rx_frames_multi Counter of multicast frames receive without error excluding pause frames. .It Va dev.cgem.%d.stats.rx_frames_pause Counter of pause frames recevied without error. .It Va dev.cgem.%d.stats.rx_frames_64b Counter of 64-byte frames received without error. .It Va dev.cgem.%d.stats.rx_frames_65to127b Counter of 65 to 127 byte frames received without error. .It Va dev.cgem.%d.stats.rx_frames_128to255b Counter of 128 to 255 byte frames received without error. .It Va dev.cgem.%d.stats.rx_frames_256to511b Counter of 256 to 511 byte frames received without error. .It Va dev.cgem.%d.stats.rx_frames_512to1023b Counter of 512 to 1023 byte frames received without error. .It Va dev.cgem.%d.stats.rx_frames_1024to1536b Counter of 1024 to 1536 byte frames received without error. .It Va dev.cgem.%d.stats.rx_frames_undersize Counter of frames received less than 64 bytes in length that do not also have either a CRC error or an alignment error. .It Va dev.cgem.%d.stats.rx_frames_oversize Counter of frames received exceeding 1536 bytes and do not also have either a CRC error or an alignment error. .It Va dev.cgem.%d.stats.rx_frames_jabber Counter of frames received exceeding 1536 bytes and also have either a CRC error, an alignment error, or a receive symbol error. .It Va dev.cgem.%d.stats.rx_frames_fcs_errs Counter of frames received with a bad CRC and are between 64 and 1536 bytes. .It Va dev.cgem.%d.stats.rx_frames_length_errs Counter of frames received that are shorter than that extracted from the length field. .It Va dev.cgem.%d.stats.rx_symbol_errs Counter of receive symbol errors. .It Va dev.cgem.%d.stats.rx_align_errs Counter of received frames that are not an integral number of bytes. .It Va dev.cgem.%d.stats.rx_resource_errs Counter of frames successfully receive by the MAC but could not be copied to memory because no receive buffer was available. .It Va dev.cgem.%d.stats.rx_overrun_errs Counter of frames that are address recognized but were not copied to memory due to a receive overrun. .It Va dev.cgem.%d.stats.rx_frames_ip_hdr_csum_errs Counter of frames discarded due to an incorrect IP header checksum when checksum offloading is enabled. .It Va dev.cgem.%d.stats.rx_frames_tcp_csum_errs Counter of frames discarded due to an incorrect TCP checksum when checksum offloading is enabled. .It Va dev.cgem.%d.stats.rx_frames_udp_csum_errs Counter of frames discarded due to an incorrect UDP checksum when checksum offloading is enabled. .El +.Sh SEE ALSO +.Xr miibus 4 , +.Xr ifconfig 8 +.Rs +.%T "Zynq-7000 SoC Technical Reference Manual (Xilinx doc UG585)" +.%U http://www.xilinx.com/support/documentation/user_guides/\:ug585-Zynq-7000-TRM.pdf +.Re +.Sh HISTORY +The +.Nm +device driver first appeared in +.Fx 10.0 . +.Sh AUTHORS +The +.Nm +driver and this manual page was written by +.An Thomas Skibo Aq Mt thomasskibo@yahoo.com . .Sh BUGS The GEM can perform TCP/UDP/IP checksum offloading. However, when transmit checksum offloading is enabled, the GEM generates and replaces checksums for all packets it transmits. In a system that is forwarding packets, the device could potentially correct the checksum of packet that was corrupted in transit. For this reason, checksum offloading is disabled by default but can be enabled using ifconfig(8). .Pp When receive checksum offloading is enabled, the device will discard packets with bad TCP/UDP/IP checksums. The bad packets will not be counted in any .Xr netstat 1 statistics. There are .Xr sysctl 8 variables that count packets discarded by the hardware (see below). .Pp The GEM used in the Zynq-7000 has a bug such that the receiver can potentially freeze up under a high load. The issue is described in sec. 16.7 "Known Issues" of the Zynq-7000 SoC Technical Reference Manual (Xilinx UG585 v1.7). The .Nm driver implements the work-around suggested in the manual. If the bug does not exist in other versions of this device, the work-around can be disabled by setting the dev.cgem.%d.rxhangwar .Xr sysctl 8 variable to 0. -.Pp -.Sh SEE ALSO -.Xr miibus 4 , -.Xr ifconfig 8 -.Rs -.%T "Zynq-7000 SoC Technical Reference Manual (Xilinx doc UG585)" -.%U http://www.xilinx.com/support/documentation/user_guides/\:ug585-Zynq-7000-TRM.pdf -.Re -.Sh HISTORY -The -.Nm -device driver first appeared in -.Fx 10.0 . -.Sh AUTHORS -The -.Nm -driver and this manual page was written by -.An Thomas Skibo Aq Mt thomasskibo@yahoo.com . Index: stable/10/share/man/man4/me.4 =================================================================== --- stable/10/share/man/man4/me.4 (revision 306377) +++ stable/10/share/man/man4/me.4 (revision 306378) @@ -1,85 +1,84 @@ .\" Copyright (c) Andrey V. Elsukov .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd November 7, 2014 .Dt ME 4 .Os .Sh NAME .Nm me .Nd encapsulating network device .Sh SYNOPSIS To compile the driver into the kernel, place the following line in the kernel configuration file: .Bd -ragged -offset indent .Cd "device me" .Ed .Pp Alternatively, to load the driver as a module at boot time, place the following line in .Xr loader.conf 5 : .Bd -literal -offset indent if_me_load="YES" .Ed .Sh DESCRIPTION The .Nm network interface pseudo device encapsulates datagrams into IP. These encapsulated datagrams are routed to a destination host, where they are decapsulated and further routed to their final destination. .Pp .Nm interfaces are dynamically created and destroyed with the .Xr ifconfig 8 .Cm create and .Cm destroy subcommands. .Pp This driver corresponds to RFC 2004. Datagrams are encapsulated into IP with a shorter encapsulation. The original IP header is modified and the modifications are inserted between the so modified header and the original payload. The protocol number 55 is used for outer header. .Sh NOTES -.Pp For correct operation, the .Nm device needs a route to the decapsulating host that does not run over the tunnel, as this would be a loop. .Sh SEE ALSO .Xr gif 4 , .Xr gre 4 , .Xr inet 4 , .Xr ip 4 , .Xr netintro 4 , .Xr protocols 5 , .Xr ifconfig 8 , .Xr sysctl 8 .Sh AUTHORS .An Andrey V. Elsukov Aq Mt ae@FreeBSD.org Index: stable/10/share/man/man4/netmap.4 =================================================================== --- stable/10/share/man/man4/netmap.4 (revision 306377) +++ stable/10/share/man/man4/netmap.4 (revision 306378) @@ -1,1075 +1,1075 @@ .\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" This document is derived in part from the enet man page (enet.4) .\" distributed with 4.3BSD Unix. .\" .\" $FreeBSD$ .\" .Dd February 13, 2014 .Dt NETMAP 4 .Os .Sh NAME .Nm netmap .Nd a framework for fast packet I/O .br .Nm VALE .Nd a fast VirtuAl Local Ethernet using the netmap API .br .Nm netmap pipes .Nd a shared memory packet transport channel .Sh SYNOPSIS .Cd device netmap .Sh DESCRIPTION .Nm is a framework for extremely fast and efficient packet I/O for both userspace and kernel clients. It runs on FreeBSD and Linux, and includes .Nm VALE , a very fast and modular in-kernel software switch/dataplane, and .Nm netmap pipes , a shared memory packet transport channel. All these are accessed interchangeably with the same API. .Pp .Nm , VALE and .Nm netmap pipes are at least one order of magnitude faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native switches, pipes), reaching 14.88 million packets per second (Mpps) with much less than one core on a 10 Gbit NIC, about 20 Mpps per core for VALE ports, and over 100 Mpps for netmap pipes. .Pp Userspace clients can dynamically switch NICs into .Nm mode and send and receive raw packets through memory mapped buffers. Similarly, .Nm VALE switch instances and ports, and .Nm netmap pipes can be created dynamically, providing high speed packet I/O between processes, virtual machines, NICs and the host stack. .Pp .Nm suports both non-blocking I/O through .Xr ioctls() , synchronization and blocking I/O through a file descriptor and standard OS mechanisms such as .Xr select 2 , .Xr poll 2 , .Xr epoll 2 , .Xr kqueue 2 . .Nm VALE and .Nm netmap pipes are implemented by a single kernel module, which also emulates the .Nm API over standard drivers for devices without native .Nm support. For best performance, .Nm requires explicit support in device drivers. .Pp In the rest of this (long) manual page we document various aspects of the .Nm and .Nm VALE architecture, features and usage. .Pp .Sh ARCHITECTURE .Nm supports raw packet I/O through a .Em port , which can be connected to a physical interface .Em ( NIC ) , to the host stack, or to a .Nm VALE switch). Ports use preallocated circular queues of buffers .Em ( rings ) residing in an mmapped region. There is one ring for each transmit/receive queue of a NIC or virtual port. An additional ring pair connects to the host stack. .Pp After binding a file descriptor to a port, a .Nm client can send or receive packets in batches through the rings, and possibly implement zero-copy forwarding between ports. .Pp All NICs operating in .Nm mode use the same memory region, accessible to all processes who own .Nm /dev/netmap file descriptors bound to NICs. Independent .Nm VALE and .Nm netmap pipe ports by default use separate memory regions, but can be independently configured to share memory. .Pp .Sh ENTERING AND EXITING NETMAP MODE The following section describes the system calls to create and control .Nm netmap ports (including .Nm VALE and .Nm netmap pipe ports). Simpler, higher level functions are described in section .Xr LIBRARIES . .Pp Ports and rings are created and controlled through a file descriptor, created by opening a special device .Dl fd = open("/dev/netmap"); and then bound to a specific port with an .Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg); .Pp .Nm has multiple modes of operation controlled by the .Vt struct nmreq argument. .Va arg.nr_name specifies the port name, as follows: .Bl -tag -width XXXX .It Dv OS network interface name (e.g. 'em0', 'eth1', ... ) the data path of the NIC is disconnected from the host stack, and the file descriptor is bound to the NIC (one or all queues), or to the host stack; .It Dv valeXXX:YYY (arbitrary XXX and YYY) the file descriptor is bound to port YYY of a VALE switch called XXX, both dynamically created if necessary. The string cannot exceed IFNAMSIZ characters, and YYY cannot be the name of any existing OS network interface. .El .Pp On return, .Va arg indicates the size of the shared memory region, and the number, size and location of all the .Nm data structures, which can be accessed by mmapping the memory .Dl char *mem = mmap(0, arg.nr_memsize, fd); .Pp Non blocking I/O is done with special .Xr ioctl 2 .Xr select 2 and .Xr poll 2 on the file descriptor permit blocking I/O. .Xr epoll 2 and .Xr kqueue 2 are not supported on .Nm file descriptors. .Pp While a NIC is in .Nm mode, the OS will still believe the interface is up and running. OS-generated packets for that NIC end up into a .Nm ring, and another ring is used to send packets into the OS network stack. A .Xr close 2 on the file descriptor removes the binding, and returns the NIC to normal mode (reconnecting the data path to the host stack), or destroys the virtual port. .Pp .Sh DATA STRUCTURES The data structures in the mmapped memory region are detailed in .Xr sys/net/netmap.h , which is the ultimate reference for the .Nm API. The main structures and fields are indicated below: .Bl -tag -width XXX .It Dv struct netmap_if (one per interface) .Bd -literal struct netmap_if { ... const uint32_t ni_flags; /* properties */ ... const uint32_t ni_tx_rings; /* NIC tx rings */ const uint32_t ni_rx_rings; /* NIC rx rings */ uint32_t ni_bufs_head; /* head of extra bufs list */ ... }; .Ed .Pp Indicates the number of available rings .Pa ( struct netmap_rings ) and their position in the mmapped region. The number of tx and rx rings .Pa ( ni_tx_rings , ni_rx_rings ) normally depends on the hardware. NICs also have an extra tx/rx ring pair connected to the host stack. .Em NIOCREGIF can also request additional unbound buffers in the same memory space, to be used as temporary storage for packets. .Pa ni_bufs_head contains the index of the first of these free rings, which are connected in a list (the first uint32_t of each buffer being the index of the next buffer in the list). A 0 indicates the end of the list. .Pp .It Dv struct netmap_ring (one per ring) .Bd -literal struct netmap_ring { ... const uint32_t num_slots; /* slots in each ring */ const uint32_t nr_buf_size; /* size of each buffer */ ... uint32_t head; /* (u) first buf owned by user */ uint32_t cur; /* (u) wakeup position */ const uint32_t tail; /* (k) first buf owned by kernel */ ... uint32_t flags; struct timeval ts; /* (k) time of last rxsync() */ ... struct netmap_slot slot[0]; /* array of slots */ } .Ed .Pp Implements transmit and receive rings, with read/write pointers, metadata and and an array of .Pa slots describing the buffers. .Pp .It Dv struct netmap_slot (one per buffer) .Bd -literal struct netmap_slot { uint32_t buf_idx; /* buffer index */ uint16_t len; /* packet length */ uint16_t flags; /* buf changed, etc. */ uint64_t ptr; /* address for indirect buffers */ }; .Ed .Pp Describes a packet buffer, which normally is identified by an index and resides in the mmapped region. .It Dv packet buffers Fixed size (normally 2 KB) packet buffers allocated by the kernel. .El .Pp The offset of the .Pa struct netmap_if in the mmapped region is indicated by the .Pa nr_offset field in the structure returned by .Pa NIOCREGIF . From there, all other objects are reachable through relative references (offsets or indexes). Macros and functions in help converting them into actual pointers: .Pp .Dl struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset); .Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index); .Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index); .Pp .Dl char *buf = NETMAP_BUF(ring, buffer_index); .Sh RINGS, BUFFERS AND DATA I/O .Va Rings are circular queues of packets with three indexes/pointers .Va ( head , cur , tail ) ; one slot is always kept empty. The ring size .Va ( num_slots ) should not be assumed to be a power of two. .br (NOTE: older versions of netmap used head/count format to indicate the content of a ring). .Pp .Va head is the first slot available to userspace; .br .Va cur is the wakeup point: select/poll will unblock when .Va tail passes .Va cur ; .br .Va tail is the first slot reserved to the kernel. .Pp Slot indexes MUST only move forward; for convenience, the function .Dl nm_ring_next(ring, index) returns the next index modulo the ring size. .Pp .Va head and .Va cur are only modified by the user program; .Va tail is only modified by the kernel. The kernel only reads/writes the .Vt struct netmap_ring slots and buffers during the execution of a netmap-related system call. The only exception are slots (and buffers) in the range .Va tail\ . . . head-1 , that are explicitly assigned to the kernel. .Pp .Ss TRANSMIT RINGS On transmit rings, after a .Nm system call, slots in the range .Va head\ . . . tail-1 are available for transmission. User code should fill the slots sequentially and advance .Va head and .Va cur past slots ready to transmit. .Va cur may be moved further ahead if the user code needs more slots before further transmissions (see .Sx SCATTER GATHER I/O ) . .Pp At the next NIOCTXSYNC/select()/poll(), slots up to .Va head-1 are pushed to the port, and .Va tail may advance if further slots have become available. Below is an example of the evolution of a TX ring: .Pp .Bd -literal after the syscall, slots between cur and tail are (a)vailable head=cur tail | | v v TX [.....aaaaaaaaaaa.............] user creates new packets to (T)ransmit head=cur tail | | v v TX [.....TTTTTaaaaaa.............] NIOCTXSYNC/poll()/select() sends packets and reports new slots head=cur tail | | v v TX [..........aaaaaaaaaaa........] .Ed .Pp select() and poll() wlll block if there is no space in the ring, i.e. .Dl ring->cur == ring->tail and return when new slots have become available. .Pp High speed applications may want to amortize the cost of system calls by preparing as many packets as possible before issuing them. .Pp A transmit ring with pending transmissions has .Dl ring->head != ring->tail + 1 (modulo the ring size). The function .Va int nm_tx_pending(ring) implements this test. .Pp .Ss RECEIVE RINGS On receive rings, after a .Nm system call, the slots in the range .Va head\& . . . tail-1 contain received packets. User code should process them and advance .Va head and .Va cur past slots it wants to return to the kernel. .Va cur may be moved further ahead if the user code wants to wait for more packets without returning all the previous slots to the kernel. .Pp At the next NIOCRXSYNC/select()/poll(), slots up to .Va head-1 are returned to the kernel for further receives, and .Va tail may advance to report new incoming packets. .br Below is an example of the evolution of an RX ring: .Bd -literal after the syscall, there are some (h)eld and some (R)eceived slots head cur tail | | | v v v RX [..hhhhhhRRRRRRRR..........] user advances head and cur, releasing some slots and holding others head cur tail | | | v v v RX [..*****hhhRRRRRR...........] NICRXSYNC/poll()/select() recovers slots and reports new packets head cur tail | | | v v v RX [.......hhhRRRRRRRRRRRR....] .Ed .Pp .Sh SLOTS AND PACKET BUFFERS Normally, packets should be stored in the netmap-allocated buffers assigned to slots when ports are bound to a file descriptor. One packet is fully contained in a single buffer. .Pp The following flags affect slot and buffer processing: .Bl -tag -width XXX .It NS_BUF_CHANGED it MUST be used when the buf_idx in the slot is changed. This can be used to implement zero-copy forwarding, see .Sx ZERO-COPY FORWARDING . .Pp .It NS_REPORT reports when this buffer has been transmitted. Normally, .Nm notifies transmit completions in batches, hence signals can be delayed indefinitely. This flag helps detecting when packets have been send and a file descriptor can be closed. .It NS_FORWARD When a ring is in 'transparent' mode (see .Sx TRANSPARENT MODE ) , packets marked with this flags are forwarded to the other endpoint at the next system call, thus restoring (in a selective way) the connection between a NIC and the host stack. .It NS_NO_LEARN tells the forwarding code that the SRC MAC address for this packet must not be used in the learning bridge code. .It NS_INDIRECT indicates that the packet's payload is in a user-supplied buffer, whose user virtual address is in the 'ptr' field of the slot. The size can reach 65535 bytes. .br This is only supported on the transmit ring of .Nm VALE ports, and it helps reducing data copies in the interconnection of virtual machines. .It NS_MOREFRAG indicates that the packet continues with subsequent buffers; the last buffer in a packet must have the flag clear. .El .Sh SCATTER GATHER I/O Packets can span multiple slots if the .Va NS_MOREFRAG flag is set in all but the last slot. The maximum length of a chain is 64 buffers. This is normally used with .Nm VALE ports when connecting virtual machines, as they generate large TSO segments that are not split unless they reach a physical device. .Pp NOTE: The length field always refers to the individual fragment; there is no place with the total length of a packet. .Pp On receive rings the macro .Va NS_RFRAGS(slot) indicates the remaining number of slots for this packet, including the current one. Slots with a value greater than 1 also have NS_MOREFRAG set. .Sh IOCTLS .Nm uses two ioctls (NIOCTXSYNC, NIOCRXSYNC) for non-blocking I/O. They take no argument. Two more ioctls (NIOCGINFO, NIOCREGIF) are used to query and configure ports, with the following argument: .Bd -literal struct nmreq { char nr_name[IFNAMSIZ]; /* (i) port name */ uint32_t nr_version; /* (i) API version */ uint32_t nr_offset; /* (o) nifp offset in mmap region */ uint32_t nr_memsize; /* (o) size of the mmap region */ uint32_t nr_tx_slots; /* (i/o) slots in tx rings */ uint32_t nr_rx_slots; /* (i/o) slots in rx rings */ uint16_t nr_tx_rings; /* (i/o) number of tx rings */ uint16_t nr_rx_rings; /* (i/o) number of tx rings */ uint16_t nr_ringid; /* (i/o) ring(s) we care about */ uint16_t nr_cmd; /* (i) special command */ uint16_t nr_arg1; /* (i/o) extra arguments */ uint16_t nr_arg2; /* (i/o) extra arguments */ uint32_t nr_arg3; /* (i/o) extra arguments */ uint32_t nr_flags /* (i/o) open mode */ ... }; .Ed .Pp A file descriptor obtained through .Pa /dev/netmap also supports the ioctl supported by network devices, see .Xr netintro 4 . .Pp .Bl -tag -width XXXX .It Dv NIOCGINFO returns EINVAL if the named port does not support netmap. Otherwise, it returns 0 and (advisory) information about the port. Note that all the information below can change before the interface is actually put in netmap mode. .Pp .Bl -tag -width XX .It Pa nr_memsize indicates the size of the .Nm memory region. NICs in .Nm mode all share the same memory region, whereas .Nm VALE ports have independent regions for each port. .It Pa nr_tx_slots , nr_rx_slots indicate the size of transmit and receive rings. .It Pa nr_tx_rings , nr_rx_rings indicate the number of transmit and receive rings. Both ring number and sizes may be configured at runtime using interface-specific functions (e.g. .Xr ethtool ). .El .It Dv NIOCREGIF binds the port named in .Va nr_name to the file descriptor. For a physical device this also switches it into .Nm mode, disconnecting it from the host stack. Multiple file descriptors can be bound to the same port, with proper synchronization left to the user. .Pp .Dv NIOCREGIF can also bind a file descriptor to one endpoint of a .Em netmap pipe , consisting of two netmap ports with a crossover connection. A netmap pipe share the same memory space of the parent port, and is meant to enable configuration where a master process acts as a dispatcher towards slave processes. .Pp To enable this function, the .Pa nr_arg1 field of the structure can be used as a hint to the kernel to indicate how many pipes we expect to use, and reserve extra space in the memory region. .Pp On return, it gives the same info as NIOCGINFO, with .Pa nr_ringid and .Pa nr_flags indicating the identity of the rings controlled through the file descriptor. .Pp .Va nr_flags .Va nr_ringid selects which rings are controlled through this file descriptor. Possible values of .Pa nr_flags are indicated below, together with the naming schemes that application libraries (such as the .Nm nm_open indicated below) can use to indicate the specific set of rings. In the example below, "netmap:foo" is any valid netmap port name. .Pp .Bl -tag -width XXXXX .It NR_REG_ALL_NIC "netmap:foo" (default) all hardware ring pairs .It NR_REG_SW_NIC "netmap:foo^" the ``host rings'', connecting to the host stack. -.It NR_RING_NIC_SW "netmap:foo+ +.It NR_RING_NIC_SW "netmap:foo+" all hardware rings and the host rings .It NR_REG_ONE_NIC "netmap:foo-i" only the i-th hardware ring pair, where the number is in .Pa nr_ringid ; .It NR_REG_PIPE_MASTER "netmap:foo{i" the master side of the netmap pipe whose identifier (i) is in .Pa nr_ringid ; .It NR_REG_PIPE_SLAVE "netmap:foo}i" the slave side of the netmap pipe whose identifier (i) is in .Pa nr_ringid . .Pp The identifier of a pipe must be thought as part of the pipe name, and does not need to be sequential. On return the pipe will only have a single ring pair with index 0, irrespective of the value of i. .El .Pp By default, a .Xr poll 2 or .Xr select 2 call pushes out any pending packets on the transmit ring, even if no write events are specified. The feature can be disabled by or-ing .Va NETMAP_NO_TX_SYNC to the value written to .Va nr_ringid. When this feature is used, packets are transmitted only on .Va ioctl(NIOCTXSYNC) or select()/poll() are called with a write event (POLLOUT/wfdset) or a full ring. .Pp When registering a virtual interface that is dynamically created to a .Xr vale 4 switch, we can specify the desired number of rings (1 by default, and currently up to 16) on it using nr_tx_rings and nr_rx_rings fields. .It Dv NIOCTXSYNC tells the hardware of new packets to transmit, and updates the number of slots available for transmission. .It Dv NIOCRXSYNC tells the hardware of consumed packets, and asks for newly available packets. .El .Sh SELECT, POLL, EPOLL, KQUEUE. .Xr select 2 and .Xr poll 2 on a .Nm file descriptor process rings as indicated in .Sx TRANSMIT RINGS and .Sx RECEIVE RINGS , respectively when write (POLLOUT) and read (POLLIN) events are requested. Both block if no slots are available in the ring .Va ( ring->cur == ring->tail ) . Depending on the platform, .Xr epoll 2 and .Xr kqueue 2 are supported too. .Pp Packets in transmit rings are normally pushed out (and buffers reclaimed) even without requesting write events. Passing the NETMAP_NO_TX_SYNC flag to .Em NIOCREGIF disables this feature. By default, receive rings are processed only if read events are requested. Passing the NETMAP_DO_RX_SYNC flag to .Em NIOCREGIF updates receive rings even without read events. Note that on epoll and kqueue, NETMAP_NO_TX_SYNC and NETMAP_DO_RX_SYNC only have an effect when some event is posted for the file descriptor. .Sh LIBRARIES The .Nm API is supposed to be used directly, both because of its simplicity and for efficient integration with applications. .Pp For conveniency, the .Va header provides a few macros and functions to ease creating a file descriptor and doing I/O with a .Nm port. These are loosely modeled after the .Xr pcap 3 API, to ease porting of libpcap-based applications to .Nm . To use these extra functions, programs should .Dl #define NETMAP_WITH_LIBS before .Dl #include .Pp The following functions are available: .Bl -tag -width XXXXX .It Va struct nm_desc * nm_open(const char *ifname, const struct nmreq *req, uint64_t flags, const struct nm_desc *arg) similar to .Xr pcap_open , binds a file descriptor to a port. .Bl -tag -width XX .It Va ifname is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a .Nm VALE port. .It Va req provides the initial values for the argument to the NIOCREGIF ioctl. The nm_flags and nm_ringid values are overwritten by parsing ifname and flags, and other fields can be overridden through the other two arguments. .It Va arg points to a struct nm_desc containing arguments (e.g. from a previously open file descriptor) that should override the defaults. The fields are used as described below .It Va flags can be set to a combination of the following flags: .Va NETMAP_NO_TX_POLL , .Va NETMAP_DO_RX_POLL (copied into nr_ringid); .Va NM_OPEN_NO_MMAP (if arg points to the same memory region, avoids the mmap and uses the values from it); .Va NM_OPEN_IFNAME (ignores ifname and uses the values in arg); .Va NM_OPEN_ARG1 , .Va NM_OPEN_ARG2 , .Va NM_OPEN_ARG3 (uses the fields from arg); .Va NM_OPEN_RING_CFG (uses the ring number and sizes from arg). .El .It Va int nm_close(struct nm_desc *d) closes the file descriptor, unmaps memory, frees resources. .It Va int nm_inject(struct nm_desc *d, const void *buf, size_t size) similar to pcap_inject(), pushes a packet to a ring, returns the size of the packet is successful, or 0 on error; .It Va int nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg) similar to pcap_dispatch(), applies a callback to incoming packets .It Va u_char * nm_nextpkt(struct nm_desc *d, struct nm_pkthdr *hdr) similar to pcap_next(), fetches the next packet .Pp .El .Sh SUPPORTED DEVICES .Nm natively supports the following devices: .Pp On FreeBSD: .Xr em 4 , .Xr igb 4 , .Xr ixgbe 4 , .Xr lem 4 , .Xr re 4 . .Pp On Linux .Xr e1000 4 , .Xr e1000e 4 , .Xr igb 4 , .Xr ixgbe 4 , .Xr mlx4 4 , .Xr forcedeth 4 , .Xr r8169 4 . .Pp NICs without native support can still be used in .Nm mode through emulation. Performance is inferior to native netmap mode but still significantly higher than sockets, and approaching that of in-kernel solutions such as Linux's .Xr pktgen . .Pp Emulation is also available for devices with native netmap support, which can be used for testing or performance comparison. The sysctl variable .Va dev.netmap.admode globally controls how netmap mode is implemented. .Sh SYSCTL VARIABLES AND MODULE PARAMETERS Some aspect of the operation of .Nm are controlled through sysctl variables on FreeBSD .Em ( dev.netmap.* ) and module parameters on Linux .Em ( /sys/module/netmap_lin/parameters/* ) : .Pp .Bl -tag -width indent .It Va dev.netmap.admode: 0 Controls the use of native or emulated adapter mode. 0 uses the best available option, 1 forces native and fails if not available, 2 forces emulated hence never fails. .It Va dev.netmap.generic_ringsize: 1024 Ring size used for emulated netmap mode .It Va dev.netmap.generic_mit: 100000 Controls interrupt moderation for emulated mode .It Va dev.netmap.mmap_unreg: 0 .It Va dev.netmap.fwd: 0 Forces NS_FORWARD mode .It Va dev.netmap.flags: 0 .It Va dev.netmap.txsync_retry: 2 .It Va dev.netmap.no_pendintr: 1 Forces recovery of transmit buffers on system calls .It Va dev.netmap.mitigate: 1 Propagates interrupt mitigation to user processes .It Va dev.netmap.no_timestamp: 0 Disables the update of the timestamp in the netmap ring .It Va dev.netmap.verbose: 0 Verbose kernel messages .It Va dev.netmap.buf_num: 163840 .It Va dev.netmap.buf_size: 2048 .It Va dev.netmap.ring_num: 200 .It Va dev.netmap.ring_size: 36864 .It Va dev.netmap.if_num: 100 .It Va dev.netmap.if_size: 1024 Sizes and number of objects (netmap_if, netmap_ring, buffers) for the global memory region. The only parameter worth modifying is .Va dev.netmap.buf_num as it impacts the total amount of memory used by netmap. .It Va dev.netmap.buf_curr_num: 0 .It Va dev.netmap.buf_curr_size: 0 .It Va dev.netmap.ring_curr_num: 0 .It Va dev.netmap.ring_curr_size: 0 .It Va dev.netmap.if_curr_num: 0 .It Va dev.netmap.if_curr_size: 0 Actual values in use. .It Va dev.netmap.bridge_batch: 1024 Batch size used when moving packets across a .Nm VALE switch. Values above 64 generally guarantee good performance. .El .Sh SYSTEM CALLS .Nm uses .Xr select 2 , .Xr poll 2 , .Xr epoll and .Xr kqueue to wake up processes when significant events occur, and .Xr mmap 2 to map memory. .Xr ioctl 2 is used to configure ports and .Nm VALE switches . .Pp Applications may need to create threads and bind them to specific cores to improve performance, using standard OS primitives, see .Xr pthread 3 . In particular, .Xr pthread_setaffinity_np 3 may be of use. .Sh CAVEATS No matter how fast the CPU and OS are, achieving line rate on 10G and faster interfaces requires hardware with sufficient performance. Several NICs are unable to sustain line rate with small packet sizes. Insufficient PCIe or memory bandwidth can also cause reduced performance. .Pp Another frequent reason for low performance is the use of flow control on the link: a slow receiver can limit the transmit speed. Be sure to disable flow control when running high speed experiments. .Pp .Ss SPECIAL NIC FEATURES .Nm is orthogonal to some NIC features such as multiqueue, schedulers, packet filters. .Pp Multiple transmit and receive rings are supported natively and can be configured with ordinary OS tools, such as .Xr ethtool or device-specific sysctl variables. The same goes for Receive Packet Steering (RPS) and filtering of incoming traffic. .Pp .Nm .Em does not use features such as .Em checksum offloading , TCP segmentation offloading , .Em encryption , VLAN encapsulation/decapsulation , etc. . When using netmap to exchange packets with the host stack, make sure to disable these features. .Sh EXAMPLES .Ss TEST PROGRAMS .Nm comes with a few programs that can be used for testing or simple applications. See the .Va examples/ directory in .Nm distributions, or .Va tools/tools/netmap/ directory in FreeBSD distributions. .Pp .Xr pkt-gen is a general purpose traffic source/sink. .Pp As an example .Dl pkt-gen -i ix0 -f tx -l 60 can generate an infinite stream of minimum size packets, and .Dl pkt-gen -i ix0 -f rx is a traffic sink. Both print traffic statistics, to help monitor how the system performs. .Pp .Xr pkt-gen has many options can be uses to set packet sizes, addresses, rates, and use multiple send/receive threads and cores. .Pp .Xr bridge is another test program which interconnects two .Nm ports. It can be used for transparent forwarding between interfaces, as in .Dl bridge -i ix0 -i ix1 or even connect the NIC to the host stack using netmap .Dl bridge -i ix0 -i ix0 .Ss USING THE NATIVE API The following code implements a traffic generator .Pp .Bd -literal -compact #include ... void sender(void) { struct netmap_if *nifp; struct netmap_ring *ring; struct nmreq nmr; struct pollfd fds; fd = open("/dev/netmap", O_RDWR); bzero(&nmr, sizeof(nmr)); strcpy(nmr.nr_name, "ix0"); nmr.nm_version = NETMAP_API; ioctl(fd, NIOCREGIF, &nmr); p = mmap(0, nmr.nr_memsize, fd); nifp = NETMAP_IF(p, nmr.nr_offset); ring = NETMAP_TXRING(nifp, 0); fds.fd = fd; fds.events = POLLOUT; for (;;) { poll(&fds, 1, -1); while (!nm_ring_empty(ring)) { i = ring->cur; buf = NETMAP_BUF(ring, ring->slot[i].buf_index); ... prepare packet in buf ... ring->slot[i].len = ... packet length ... ring->head = ring->cur = nm_ring_next(ring, i); } } } .Ed .Ss HELPER FUNCTIONS A simple receiver can be implemented using the helper functions .Bd -literal -compact #define NETMAP_WITH_LIBS #include ... void receiver(void) { struct nm_desc *d; struct pollfd fds; u_char *buf; struct nm_pkthdr h; ... d = nm_open("netmap:ix0", NULL, 0, 0); fds.fd = NETMAP_FD(d); fds.events = POLLIN; for (;;) { poll(&fds, 1, -1); while ( (buf = nm_nextpkt(d, &h)) ) consume_pkt(buf, h->len); } nm_close(d); } .Ed .Ss ZERO-COPY FORWARDING Since physical interfaces share the same memory region, it is possible to do packet forwarding between ports swapping buffers. The buffer from the transmit ring is used to replenish the receive ring: .Bd -literal -compact uint32_t tmp; struct netmap_slot *src, *dst; ... src = &src_ring->slot[rxr->cur]; dst = &dst_ring->slot[txr->cur]; tmp = dst->buf_idx; dst->buf_idx = src->buf_idx; dst->len = src->len; dst->flags = NS_BUF_CHANGED; src->buf_idx = tmp; src->flags = NS_BUF_CHANGED; rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur); txr->head = txr->cur = nm_ring_next(txr, txr->cur); ... .Ed .Ss ACCESSING THE HOST STACK The host stack is for all practical purposes just a regular ring pair, which you can access with the netmap API (e.g. with .Dl nm_open("netmap:eth0^", ... ) ; All packets that the host would send to an interface in .Nm mode end up into the RX ring, whereas all packets queued to the TX ring are send up to the host stack. .Ss VALE SWITCH A simple way to test the performance of a .Nm VALE switch is to attach a sender and a receiver to it, e.g. running the following in two different terminals: .Dl pkt-gen -i vale1:a -f rx # receiver .Dl pkt-gen -i vale1:b -f tx # sender The same example can be used to test netmap pipes, by simply changing port names, e.g. .Dl pkt-gen -i vale:x{3 -f rx # receiver on the master side .Dl pkt-gen -i vale:x}3 -f tx # sender on the slave side .Pp The following command attaches an interface and the host stack to a switch: .Dl vale-ctl -h vale2:em0 Other .Nm clients attached to the same switch can now communicate with the network card or the host. .Pp .Sh SEE ALSO .Pp http://info.iet.unipi.it/~luigi/netmap/ .Pp Luigi Rizzo, Revisiting network I/O APIs: the netmap framework, Communications of the ACM, 55 (3), pp.45-51, March 2012 .Pp Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix ATC'12, June 2012, Boston .Pp Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for virtual machines, ACM CoNEXT'12, December 2012, Nice .Pp Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose .Sh AUTHORS .An -nosplit The .Nm framework has been originally designed and implemented at the Universita` di Pisa in 2011 by .An Luigi Rizzo , and further extended with help from .An Matteo Landi , .An Gaetano Catalli , .An Giuseppe Lettieri , .An Vincenzo Maffione . .Pp .Nm and .Nm VALE have been funded by the European Commission within FP7 Projects CHANGE (257422) and OPENLAB (287581). Index: stable/10/share/man/man9/get_cyclecount.9 =================================================================== --- stable/10/share/man/man9/get_cyclecount.9 (revision 306377) +++ stable/10/share/man/man9/get_cyclecount.9 (revision 306378) @@ -1,95 +1,94 @@ .\" Copyright (c) 2000 Mark R V Murray .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd March 15, 2011 .Dt GET_CYCLECOUNT 9 .Os .Sh NAME .Nm get_cyclecount .Nd get the CPU's fast counter register contents .Sh SYNOPSIS .In sys/param.h .In sys/systm.h .In machine/cpu.h .Ft uint64_t .Fn get_cyclecount "void" .Sh DESCRIPTION The .Fn get_cyclecount function uses a register available in most modern CPUs to return a value that is monotonically increasing inside each CPU. .Pp On SMP systems, there will be a number of separate monotonic sequences, one for each CPU running. The value in the SMP case is selected from one of these sequences, dependent on which CPU was scheduled to service the request. .Pp The speed and the maximum value of each counter is CPU-dependent. Some CPUs (such as the .Tn Intel 80486) do not have such a register, so .Fn get_cyclecount on these platforms returns a (monotonic) combination of numbers represented by the structure returned by .Xr binuptime 9 . .Pp The .Tn AMD64 and .Tn Intel 64 processors use the .Li TSC register. -.Pp The .Tn IA64 processors use the .Li AR.ITC register. .Sh SEE ALSO .Xr binuptime 9 .Sh HISTORY The .Fn get_cyclecount function first appeared in .Fx 5.0 . .Sh AUTHORS This manual page was written by .An Mark Murray Aq markm@FreeBSD.org . Index: stable/10/share/man/man9/malloc.9 =================================================================== --- stable/10/share/man/man9/malloc.9 (revision 306377) +++ stable/10/share/man/man9/malloc.9 (revision 306378) @@ -1,285 +1,285 @@ .\" .\" Copyright (c) 1996 The NetBSD Foundation, Inc. .\" All rights reserved. .\" .\" This code is derived from software contributed to The NetBSD Foundation .\" by Paul Kranenburg. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE .\" LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. .\" .\" $NetBSD: malloc.9,v 1.3 1996/11/11 00:05:11 lukem Exp $ .\" $FreeBSD$ .\" .Dd November 15, 2012 .Dt MALLOC 9 .Os .Sh NAME .Nm malloc , .Nm free , .Nm realloc , .Nm reallocf , .Nm MALLOC_DEFINE , .Nm MALLOC_DECLARE .Nd kernel memory management routines .Sh SYNOPSIS .In sys/types.h .In sys/malloc.h .Ft void * .Fn malloc "unsigned long size" "struct malloc_type *type" "int flags" .Ft void .Fn free "void *addr" "struct malloc_type *type" .Ft void * .Fn realloc "void *addr" "unsigned long size" "struct malloc_type *type" "int flags" .Ft void * .Fn reallocf "void *addr" "unsigned long size" "struct malloc_type *type" "int flags" .Fn MALLOC_DECLARE type .In sys/param.h .In sys/malloc.h .In sys/kernel.h .Fn MALLOC_DEFINE type shortdesc longdesc .Sh DESCRIPTION The .Fn malloc function allocates uninitialized memory in kernel address space for an object whose size is specified by .Fa size . .Pp The .Fn free function releases memory at address .Fa addr that was previously allocated by .Fn malloc for re-use. The memory is not zeroed. If .Fa addr is .Dv NULL , then .Fn free does nothing. .Pp The .Fn realloc function changes the size of the previously allocated memory referenced by .Fa addr to .Fa size bytes. The contents of the memory are unchanged up to the lesser of the new and old sizes. Note that the returned value may differ from .Fa addr . If the requested memory cannot be allocated, .Dv NULL is returned and the memory referenced by .Fa addr is valid and unchanged. If .Fa addr is .Dv NULL , the .Fn realloc function behaves identically to .Fn malloc for the specified size. .Pp The .Fn reallocf function is identical to .Fn realloc except that it will free the passed pointer when the requested memory cannot be allocated. .Pp Unlike its standard C library counterpart .Pq Xr malloc 3 , the kernel version takes two more arguments. The .Fa flags argument further qualifies .Fn malloc Ns 's operational characteristics as follows: .Bl -tag -width indent .It Dv M_ZERO Causes the allocated memory to be set to all zeros. .It Dv M_NODUMP For allocations greater than page size, causes the allocated memory to be excluded from kernel core dumps. .It Dv M_NOWAIT Causes .Fn malloc , .Fn realloc , and .Fn reallocf to return .Dv NULL if the request cannot be immediately fulfilled due to resource shortage. Note that .Dv M_NOWAIT is required when running in an interrupt context. .It Dv M_WAITOK Indicates that it is OK to wait for resources. If the request cannot be immediately fulfilled, the current process is put to sleep to wait for resources to be released by other processes. The .Fn malloc , .Fn realloc , and .Fn reallocf functions cannot return .Dv NULL if .Dv M_WAITOK is specified. .It Dv M_USE_RESERVE Indicates that the system can use its reserve of memory to satisfy the request. This option should only be used in combination with .Dv M_NOWAIT when an allocation failure cannot be tolerated by the caller without catastrophic effects on the system. .El .Pp Exactly one of either .Dv M_WAITOK or .Dv M_NOWAIT must be specified. .Pp The .Fa type argument is used to perform statistics on memory usage, and for basic sanity checks. It can be used to identify multiple allocations. The statistics can be examined by .Sq vmstat -m . .Pp A .Fa type is defined using .Vt "struct malloc_type" via the .Fn MALLOC_DECLARE and .Fn MALLOC_DEFINE macros. .Bd -literal -offset indent /* sys/something/foo_extern.h */ MALLOC_DECLARE(M_FOOBUF); /* sys/something/foo_main.c */ MALLOC_DEFINE(M_FOOBUF, "foobuffers", "Buffers to foo data into the ether"); /* sys/something/foo_subr.c */ \&... buf = malloc(sizeof(*buf), M_FOOBUF, M_NOWAIT); .Ed .Pp In order to use .Fn MALLOC_DEFINE , one must include .In sys/param.h (instead of .In sys/types.h ) and .In sys/kernel.h . -.Sh IMPLEMENTATION NOTES -The memory allocator allocates memory in chunks that have size a power -of two for requests up to the size of a page of memory. -For larger requests, one or more pages is allocated. -While it should not be relied upon, this information may be useful for -optimizing the efficiency of memory use. .Pp Programmers should be careful not to confuse the malloc flags .Dv M_NOWAIT and .Dv M_WAITOK with the .Xr mbuf 9 flags .Dv M_DONTWAIT and .Dv M_WAIT . .Sh CONTEXT .Fn malloc , .Fn realloc and .Fn reallocf may not be called from fast interrupts handlers. When called from threaded interrupts, .Fa flags must contain .Dv M_NOWAIT . .Pp .Fn malloc , .Fn realloc and .Fn reallocf may sleep when called with .Dv M_WAITOK . .Fn free never sleeps. .Pp Any calls to .Fn malloc (even with .Dv M_NOWAIT ) or .Fn free when holding a .Xr vnode 9 interlock, will cause a LOR (Lock Order Reversal) due to the intertwining of VM Objects and Vnodes. +.Sh IMPLEMENTATION NOTES +The memory allocator allocates memory in chunks that have size a power +of two for requests up to the size of a page of memory. +For larger requests, one or more pages is allocated. +While it should not be relied upon, this information may be useful for +optimizing the efficiency of memory use. .Sh RETURN VALUES The .Fn malloc , .Fn realloc , and .Fn reallocf functions return a kernel virtual address that is suitably aligned for storage of any type of object, or .Dv NULL if the request could not be satisfied (implying that .Dv M_NOWAIT was set). .Sh DIAGNOSTICS A kernel compiled with the .Dv INVARIANTS configuration option attempts to detect memory corruption caused by such things as writing outside the allocated area and imbalanced calls to the .Fn malloc and .Fn free functions. Failing consistency checks will cause a panic or a system console message. .Sh SEE ALSO .Xr vmstat 8 , .Xr contigmalloc 9 , .Xr memguard 9 , .Xr vnode 9 Index: stable/10/share/man/man9/sleepqueue.9 =================================================================== --- stable/10/share/man/man9/sleepqueue.9 (revision 306377) +++ stable/10/share/man/man9/sleepqueue.9 (revision 306378) @@ -1,391 +1,390 @@ .\" Copyright (c) 2000-2004 John H. Baldwin .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE DEVELOPERS ``AS IS'' AND ANY EXPRESS OR .\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES .\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. .\" IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, .\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT .\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, .\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd September 22, 2014 .Dt SLEEPQUEUE 9 .Os .Sh NAME .Nm init_sleepqueues , .Nm sleepq_abort , .Nm sleepq_add , .Nm sleepq_alloc , .Nm sleepq_broadcast , .Nm sleepq_free , .Nm sleepq_lock , .Nm sleepq_lookup , .Nm sleepq_release , .Nm sleepq_remove , .Nm sleepq_signal , .Nm sleepq_set_timeout , .Nm sleepq_set_timeout_sbt , .Nm sleepq_sleepcnt , .Nm sleepq_timedwait , .Nm sleepq_timedwait_sig , .Nm sleepq_type , .Nm sleepq_wait , .Nm sleepq_wait_sig .Nd manage the queues of sleeping threads .Sh SYNOPSIS .In sys/param.h .In sys/sleepqueue.h .Ft void .Fn init_sleepqueues "void" .Ft int .Fn sleepq_abort "struct thread *td" .Ft void .Fn sleepq_add "void *wchan" "struct lock_object *lock" "const char *wmesg" "int flags" "int queue" .Ft struct sleepqueue * .Fn sleepq_alloc "void" .Ft int .Fn sleepq_broadcast "void *wchan" "int flags" "int pri" "int queue" .Ft void .Fn sleepq_free "struct sleepqueue *sq" .Ft struct sleepqueue * .Fn sleepq_lookup "void *wchan" .Ft void .Fn sleepq_lock "void *wchan" .Ft void .Fn sleepq_release "void *wchan" .Ft void .Fn sleepq_remove "struct thread *td" "void *wchan" .Ft int .Fn sleepq_signal "void *wchan" "int flags" "int pri" "int queue" .Ft void .Fn sleepq_set_timeout "void *wchan" "int timo" .Ft void .Fn sleepq_set_timeout_sbt "void *wchan" "sbintime_t sbt" \ "sbintime_t pr" "int flags" .Ft u_int .Fn sleepq_sleepcnt "void *wchan" "int queue" .Ft int .Fn sleepq_timedwait "void *wchan" "int pri" .Ft int .Fn sleepq_timedwait_sig "void *wchan" "int pri" .Ft int .Fn sleepq_type "void *wchan" .Ft void .Fn sleepq_wait "void *wchan" "int pri" .Ft int .Fn sleepq_wait_sig "void *wchan" "int pri" .Sh DESCRIPTION Sleep queues provide a mechanism for suspending execution of a thread until some condition is met. Each queue is associated with a specific wait channel when it is active, and only one queue may be associated with a wait channel at any given point in time. The implementation of each wait channel splits its sleepqueue into 2 sub-queues in order to enable some optimizations on threads' wakeups. An active queue holds a list of threads that are blocked on the associated wait channel. Threads that are not blocked on a wait channel have an associated inactive sleep queue. When a thread blocks on a wait channel it donates its inactive sleep queue to the wait channel. When a thread is resumed, the wait channel that it was blocked on gives it an inactive sleep queue for later use. .Pp The .Fn sleepq_alloc function allocates an inactive sleep queue and is used to assign a sleep queue to a thread during thread creation. The .Fn sleepq_free function frees the resources associated with an inactive sleep queue and is used to free a queue during thread destruction. .Pp Active sleep queues are stored in a hash table hashed on the addresses pointed to by wait channels. Each bucket in the hash table contains a sleep queue chain. A sleep queue chain contains a spin mutex and a list of sleep queues that hash to that specific chain. Active sleep queues are protected by their chain's spin mutex. The .Fn init_sleepqueues function initializes the hash table of sleep queue chains. .Pp The .Fn sleepq_lock function locks the sleep queue chain associated with wait channel .Fa wchan . .Pp The .Fn sleepq_lookup returns a pointer to the currently active sleep queue for that wait channel associated with .Fa wchan or .Dv NULL if there is no active sleep queue associated with argument .Fa wchan . It requires the sleep queue chain associated with .Fa wchan to have been locked by a prior call to .Fn sleepq_lock . .Pp The .Fn sleepq_release function unlocks the sleep queue chain associated with .Fn wchan and is primarily useful when aborting a pending sleep request before one of the wait functions is called. .Pp The .Fn sleepq_add function places the current thread on the sleep queue associated with the wait channel .Fa wchan . The sleep queue chain associated with argument .Fa wchan must be locked by a prior call to .Fn sleepq_lock when this function is called. If a lock is specified via the .Fa lock argument, and if the kernel was compiled with .Cd "options INVARIANTS" , then the sleep queue code will perform extra checks to ensure that the lock is used by all threads sleeping on .Fa wchan . The .Fa wmesg parameter should be a short description of .Fa wchan . The .Fa flags parameter is a bitmask consisting of the type of sleep queue being slept on and zero or more optional flags. The .Fa queue parameter specifies the sub-queue, in which the contending thread will be inserted. .Pp There are currently three types of sleep queues: .Pp .Bl -tag -width ".Dv SLEEPQ_CONDVAR" -compact .It Dv SLEEPQ_CONDVAR A sleep queue used to implement condition variables. .It Dv SLEEPQ_SLEEP A sleep queue used to implement .Xr sleep 9 , .Xr wakeup 9 and .Xr wakeup_one 9 . .It Dv SLEEPQ_PAUSE A sleep queue used to implement .Xr pause 9 . .El .Pp There are currently two optional flag: .Pp .Bl -tag -width ".Dv SLEEPQ_INTERRUPTIBLE" -compact .It Dv SLEEPQ_INTERRUPTIBLE The current thread is entering an interruptible sleep. .El .Bl -tag -width ".Dv SLEEPQ_STOP_ON_BDRY" -compact .It Dv SLEEPQ_STOP_ON_BDRY When thread is entering an interruptible sleep, do not stop it upon arrival of stop action, like .Dv SIGSTOP . Wake it up instead. .El .Pp A timeout on the sleep may be specified by calling .Fn sleepq_set_timeout after .Fn sleepq_add . The .Fa wchan parameter should be the same value from the preceding call to .Fn sleepq_add , and the sleep queue chain associated with .Fa wchan must have been locked by a prior call to .Fn sleepq_lock . The .Fa timo parameter should specify the timeout value in ticks. .Pp .Fn sleepq_set_timeout_sbt function takes .Fa sbt argument instead of .Fa timo . It allows to specify relative or absolute wakeup time with higher resolution in form of .Vt sbintime_t . The parameter .Fa pr allows to specify wanted absolute event precision. The parameter .Fa flags allows to pass additional .Fn callout_reset_sbt flags. .Pp -.Pp Once the thread is ready to suspend, one of the wait functions is called to put the current thread to sleep until it is awakened and to context switch to another thread. The .Fn sleepq_wait function is used for non-interruptible sleeps that do not have a timeout. The .Fn sleepq_timedwait function is used for non-interruptible sleeps that have had a timeout set via .Fn sleepq_set_timeout . The .Fn sleepq_wait_sig function is used for interruptible sleeps that do not have a timeout. The .Fn sleepq_timedwait_sig function is used for interruptible sleeps that do have a timeout set. The .Fa wchan argument to all of the wait functions is the wait channel being slept on. The sleep queue chain associated with argument .Fa wchan needs to have been locked with a prior call to .Fn sleepq_lock . The .Fa pri argument is used to set the priority of the thread when it is awakened. If it is set to zero, the thread's priority is left alone. .Pp When the thread is resumed, the wait functions return a non-zero value if the thread was awakened due to an interrupt other than a signal or a timeout. If the sleep timed out, then .Er EWOULDBLOCK is returned. If the sleep was interrupted by something other than a signal, then some other return value will be returned. .Pp A sleeping thread is normally resumed by the .Fn sleepq_broadcast and .Fn sleepq_signal functions. The .Fn sleepq_signal function awakens the highest priority thread sleeping on a wait channel while .Fn sleepq_broadcast awakens all of the threads sleeping on a wait channel. The .Fa wchan argument specifics which wait channel to awaken. The .Fa flags argument must match the sleep queue type contained in the .Fa flags argument passed to .Fn sleepq_add by the threads sleeping on the wait channel. If the .Fa pri argument does not equal \-1, then each thread that is awakened will have its priority raised to .Fa pri if it has a lower priority. The sleep queue chain associated with argument .Fa wchan must be locked by a prior call to .Fn sleepq_lock before calling any of these functions. The .Fa queue argument specifies the sub-queue, from which threads need to be woken up. .Pp A thread in an interruptible sleep can be interrupted by another thread via the .Fn sleepq_abort function. The .Fa td argument specifies the thread to interrupt. An individual thread can also be awakened from sleeping on a specific wait channel via the .Fn sleepq_remove function. The .Fa td argument specifies the thread to awaken and the .Fa wchan argument specifies the wait channel to awaken it from. If the thread .Fa td is not blocked on the wait channel .Fa wchan then this function will not do anything, even if the thread is asleep on a different wait channel. This function should only be used if one of the other functions above is not sufficient. One possible use is waking up a specific thread from a widely shared sleep channel. .Pp The .Fn sleepq_sleepcnt function offer a simple way to retrieve the number of threads sleeping for the specified .Fa queue , given a .Fa wchan . .Pp The .Fn sleepq_type function returns the type of .Fa wchan associated to a sleepqueue. .Pp The .Fn sleepq_abort , .Fn sleepq_broadcast , and .Fn sleepq_signal functions all return a boolean value. If the return value is true, then at least one thread was resumed that is currently swapped out. The caller is responsible for awakening the scheduler process so that the resumed thread will be swapped back in. This is done by calling the .Fn kick_proc0 function after releasing the sleep queue chain lock via a call to .Fn sleepq_release . .Pp The sleep queue interface is currently used to implement the .Xr sleep 9 and .Xr condvar 9 interfaces. Almost all other code in the kernel should use one of those interfaces rather than manipulating sleep queues directly. .Sh SEE ALSO .Xr condvar 9 , .Xr runqueue 9 , .Xr scheduler 9 , .Xr sleep 9 , .Xr timeout 9 Index: stable/10/sys/boot/common/zfsloader.8 =================================================================== --- stable/10/sys/boot/common/zfsloader.8 (revision 306377) +++ stable/10/sys/boot/common/zfsloader.8 (revision 306378) @@ -1,109 +1,106 @@ .\" Copyright (c) 2014 Andriy Gapon .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd September 15, 2014 .Dt ZFSLOADER 8 .Os .Sh NAME .Nm zfsloader .Nd kernel bootstrapping final stage .Sh DESCRIPTION .Nm is an extended variant of .Xr loader 8 with added support for booting from ZFS. This document describes only differences from .Xr loader 8 . .Sh ZFS FEATURES .Nm supports the following format for specifying ZFS filesystems which can be used wherever .Xr loader 8 refers to a device specification: .Pp .Ar zfs:pool/filesystem: .Pp where .Pa pool/filesystem is a ZFS filesystem name as described in .Xr zfs 8 . .Pp If .Pa /etc/fstab does not have an entry for the root filesystem and .Va vfs.root.mountfrom is not set, but .Va currdev refers to a ZFS filesystem, then .Nm will instruct kernel to use that filesystem as the root filesystem. .Sh ZFS COMMAND EXTENSIONS .Bl -tag -width Ds -compact -.Pp .It Ic lsdev Op Fl v Lists ZFS pools in addition to disks and partitions. Adding .Fl v shows more ZFS pool details in a format that resembles .Nm zpool Cm status output. .Pp .It Ic lszfs Ar filesystem A ZFS extended command that can be used to explore the ZFS filesystem hierarchy in a pool. Lists the immediate children of the .Ar filesystem . The filesystem hierarchy is rooted at a filesystem with the same name as the pool. .El .Sh FILES .Bl -tag -width /boot/zfsloader -compact .It Pa /boot/zfsloader .Nm itself. .El .Sh EXAMPLES Set the default device used for loading a kernel from a ZFS filesystem: -.Pp .Bd -literal -offset indent set currdev=zfs:tank/ROOT/knowngood: .Ed -.Pp .Sh SEE ALSO .Xr gptzfsboot 8 , .Xr loader 8 , .Xr zfs 8 , .Xr zfsboot 8 , .Xr zfsloader 8 , .Xr zpool 8 .Sh HISTORY The .Nm first appeared in .Fx 7.3 . .Sh AUTHORS This manual page was written by .An Andriy Gapon Aq avg@FreeBSD.org . Index: stable/10/sys/boot/i386/gptzfsboot/gptzfsboot.8 =================================================================== --- stable/10/sys/boot/i386/gptzfsboot/gptzfsboot.8 (revision 306377) +++ stable/10/sys/boot/i386/gptzfsboot/gptzfsboot.8 (revision 306378) @@ -1,193 +1,193 @@ .\" Copyright (c) 2014 Andriy Gapon .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd September 15, 2014 .Dt GPTZFSBOOT 8 .Os .Sh NAME .Nm gptzfsboot .Nd GPT bootcode for ZFS on BIOS-based computers .Sh DESCRIPTION .Nm is used on BIOS-based computers to boot from a filesystem in a ZFS pool. .Nm is installed in a .Cm freebsd-boot partition of a GPT-partitioned disk with .Xr gpart 8 . .Sh IMPLEMENTATION NOTES The GPT standard allows a variable number of partitions, but .Nm only boots from tables with 128 partitions or less. .Sh BOOTING .Nm tries to find all ZFS pools that are composed of BIOS-visible hard disks or partitions on them. .Nm looks for ZFS device labels on all visible disks and in discovered supported partitions for all supported partition scheme types. The search starts with the disk from which .Nm itself was loaded. Other disks are probed in BIOS defined order. After a disk is probed and .Nm determines that the whole disk is not a ZFS pool member, then individual partitions are probed in their partition table order. Currently GPT and MBR partition schemes are supported. With the GPT scheme, only partitions of type .Cm freebsd-zfs are probed. The first pool seen during probing is used as a default boot pool. .Pp The filesystem specified by the .Cm bootfs property of the pool is used as a default boot filesystem. If the .Cm bootfs property is not set, then the root filesystem of the pool is used as the default. .Xr zfsloader 8 is loaded from the boot filesystem. If .Pa /boot.config or .Pa /boot/config is present in the boot filesystem, boot options are read from it in the same way as .Xr boot 8 . .Pp The ZFS GUIDs of the first successfully probed device and the first detected pool are made available to .Xr zfsloader 8 in the .Cm vfs.zfs.boot.primary_vdev and .Cm vfs.zfs.boot.primary_pool variables. .Sh USAGE Normally .Nm will boot in fully automatic mode. However, like .Xr boot 8 , it is possible to interrupt the automatic boot process and interact with .Nm through a prompt. .Nm accepts all the options that .Xr boot 8 supports. .Pp Filesystem specification and the path to .Xr zfsloader 8 is different from .Xr boot 8 . The format is .Pp .Sm off .Oo zfs:pool/filesystem: Oc Oo /path/to/loader Oc .Sm on .Pp Both the filesystem and the path can be specified. If only a path is specified, then the default filesystem is used. If only a pool and filesystem are specified, then .Pa /boot/zfsloader is used as a path. .Pp Additionally, the .Ic status command can be used to query information about discovered pools. The output format is similar to that of .Cm zpool status .Pq see Xr zpool 8 . .Pp The configured or automatically determined ZFS boot filesystem is stored in the .Xr zfsloader 8 .Cm loaddev variable, and also set as the initial value of the .Cm currdev variable. .Sh FILES .Bl -tag -width /boot/gptzfsboot -compact .It Pa /boot/gptzfsboot boot code binary .It Pa /boot.config parameters for the boot block .Pq optional .It Pa /boot/config alternative parameters for the boot block .Pq optional .El .Sh EXAMPLES .Nm is typically installed in combination with a .Dq protective MBR .Po see .Xr gpart 8 .Pc . To install .Nm on the .Pa ada0 drive: .Bd -literal -offset indent gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0 .Ed .Pp .Nm can also be installed without the PMBR: .Bd -literal -offset indent gpart bootcode -p /boot/gptzfsboot -i 1 ada0 .Ed .Sh SEE ALSO .Xr boot.config 5 , .Xr boot 8 , .Xr gpart 8 , .Xr loader 8 , .Xr zfsloader 8 , .Xr zpool 8 .Sh HISTORY .Nm appeared in FreeBSD 7.3. +.Sh AUTHORS +This manual page was written by +.An Andriy Gapon Aq avg@FreeBSD.org . .Sh BUGS .Nm looks for ZFS meta-data only in MBR partitions .Pq known on FreeBSD as slices . It does not look into BSD .Xr disklabel 8 partitions that are traditionally called partitions. If a disklabel partition happens to be placed so that ZFS meta-data can be found at the fixed offsets relative to a slice, then .Nm will recognize the partition as a part of a ZFS pool, but this is not guaranteed to happen. -.Sh AUTHORS -This manual page was written by -.An Andriy Gapon Aq avg@FreeBSD.org . Index: stable/10/usr.bin/dpv/dpv.1 =================================================================== --- stable/10/usr.bin/dpv/dpv.1 (revision 306377) +++ stable/10/usr.bin/dpv/dpv.1 (revision 306378) @@ -1,435 +1,433 @@ .\" Copyright (c) 2013-2016 Devin Teske .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd Jan 26, 2016 .Dt DPV 1 .Os .Sh NAME .Nm dpv .Nd stream data from stdin or multiple paths with dialog progress view .Sh SYNOPSIS .Nm .Op options .Ar [bytes:]label .Nm .Op options .Fl m .Ar [bytes1:]label1 .Ar path1 .Op Ar [bytes2:]label2 path2 ... .Sh DESCRIPTION .Nm provides a dialog progress view, allowing a user to see current throughput rate and total data transferred for one or more streams. .Pp The .Nm utility has two main modes for processing input. .Pp The default input mode, without .Ql Fl m , .Nm reads bytes from standard input. A label for the data must be provided. .Pp The secondary input mode, with .Ql Fl m , .Nm reads multiple paths .Pq up to 2047 or Dq ARG_MAX/2-1 , sequentially. .Pp Data read in either mode is either thrown away .Pq default , sent to a spawned instance of the program specified via .Ql Fl x Ar cmd , or sent to a unique file specified by .Ql Fl o Ar file . .Pp With or without .Ql Fl m , progress is displayed using one of .Xr dialog 3 .Pq default , .Xr dialog 1 .Pq see Ql Fl D , or instead .Xr Xdialog 1 .Pq see Ql Fl X . .Pp The following options are available: .Bl -tag -width ".Fl b Ar backtitle" .It Fl a Ar text Display .Ar text below the file progress indicator(s). .It Fl b Ar backtitle Display .Ar backtitle on the backdrop, at top-left, behind the dialog widget. When using .Xr Xdialog 1 , this is displayed inside the window .Pq at the top followed by a separator line. .It Fl d Debug mode. Print dialog prompt data to standard out and provide additional debugging on standard error. .It Fl D Do not use the default interface of .Xr dialog 3 , but instead spawn an instance of .Xr dialog 1 . -The path to +The path to .Xr dialog 1 is taken from the .Ev DIALOG environment variable or simply .Dq Li dialog if unset or NULL. .It Fl h Produce a short syntax usage with brief option descriptions and exit. Output is produced on standard error. .It Fl i Ar format Customize the single-file format string used to update the status line. Ignored when using either .Ql Fl D or .Ql Fl X which lack the ability to display the status line .Pq containing bytes/rate/thread information . Default value is .Dq Li %'10lli bytes read @ %'9.1f bytes/sec. . This format is used when handling one file. .It Fl I Ar format Customize the multi-file format string used to update the status line. Ignored when using either .Ql Fl D or .Ql Fl X which lack the ability to display the status line .Pq containing bytes/rate/thread information . Default value is .Dq Li %'10lli bytes read @ %'9.1f bytes/sec. [%i/%i busy/wait] . This format is used when handling more than one file. .It Fl k Keep tite. Prevent visually distracting initialization/exit routines for scripts running .Xr dialog 1 several times. .It Fl l Line mode. Read lines from input instead of bytes. .It Fl L Ar size Label size. If negative, shrink to longest label width. .It Fl m Multi-input mode. Instead of reading bytes from standard input, read from a set of paths .Pq one for each label . By default, each path is processed sequentially in the order given. .It Fl n Ar num Display at-most .Ar num progress indicators per screen. If zero, display as many as possible. If negative, only display the main progress indicator. Default is 0. Maximum value is 10. .It Fl N No overrun. If enabled, stop reading known-length inputs when input reaches stated length. .It Fl o Ar file Output data to .Ar file . The first occurrence of .Ql %s .Pq if any in .Ql Ar file will be replaced with the .Ar label text. .It Fl p Ar text Display .Ar text above the file progress indicator(s). .It Fl P Ar size Mini-progressbar size. If negative, don't display mini-progressbars .Pq only the large overall progress indicator is shown . If zero, auto-adjust based on number of files to read. When zero and only one file to read, defaults to -1. When zero and more than one file to read, defaults to 17. .It Fl t Ar title Display .Ar title atop the dialog box. Note that if you use this option at the same time as .Ql Fl X and .Ql Fl b Ar backtitle , the .Ar backtitle and .Ar title are effectively switched .Pq see BUGS section below . .It Fl T Test mode. Simulate reading a number of bytes, divided evenly across the number of files, while stepping through each percent value of each file to process. Appends .Dq Li [TEST MODE] to the status line .Pq to override, use Ql Fl u Ar format . No data is actually read. .It Fl U Ar num Update status line .Ar num times per-second. Default value is .Ql Li 2 . A value of .Ql Li 0 disables status line updates. If negative, update the status line as fast as possible. Ignored when using either .Ql Fl D or .Ql Fl X which lack the ability to display the status line .Pq containing bytes/rate/thread information . .It Fl w Wide mode. Allows long .Ar text arguments used with .Ql Fl p and .Ql Fl a to bump the dialog width. Prompts wider than the maximum width will wrap .Pq unless using Xr Xdialog 1 ; see BUGS section below . .It Fl x Ar cmd Execute .Ar cmd .Pq via Xr sh 1 and send it data that has been read. Data is available to .Ar cmd on standard input. With .Ql Fl m , .Ar cmd is executed once for each .Ar path argument. The first occurrence of .Ql %s .Pq if any in .Ql Ar cmd will be replaced with the .Ar label text. .It Fl X Enable X11 mode by using .Xr Xdialog 1 instead of .Xr dialog 1 or .Xr dialog 3 . .El .Sh ENVIRONMENT The following environment variables are referenced by .Nm : .Bl -tag -width ".Ev USE_COLOR" .It Ev DIALOG Override command string used to launch .Xr dialog 1 .Pq requires Ql Fl D or .Xr Xdialog 1 .Pq requires Ql Fl X ; default is either .Ql dialog .Pq for Ql Fl D or .Ql Xdialog .Pq for Ql Fl X . .It Ev DIALOGRC If set and non-NULL, path to .Ql .dialogrc file. .It Ev HOME If .Ql Ev $DIALOGRC is either not set or NULL, used as a prefix to .Ql .dialogrc .Pq i.e., Ql $HOME/.dialogrc . .It Ev USE_COLOR If set and NULL, disables the use of color when using .Xr dialog 1 .Pq does not apply to Xr Xdialog 1 . .El .Sh DEPENDENCIES If using .Ql Fl D , .Xr dialog 1 is required. .Pp If using .Ql Fl X , .Xr Xdialog 1 is required. .Sh FILES .Bl -tag -width ".Pa $HOME/.dialogrc" -compact .It Pa $HOME/.dialogrc .El .Sh EXAMPLES -.Pp Simple example to show how fast .Xr yes 1 produces lines .Pq usually about ten-million per-second; your results may vary : .Bd -literal -offset indent yes | dpv -l yes .Ed .Pp Display progress while timing how long it takes .Xr yes 1 to produce a half-billion lines .Pq usually under one minute; your results may vary : .Bd -literal -offset indent time yes | dpv -Nl 500000000:yes .Ed .Pp An example to watch how quickly a file is transferred using .Xr nc 1 : .Bd -literal -offset indent dpv -x "nc -w 1 somewhere.com 3000" -m label file .Ed .Pp A similar example, transferring a file from another process and passing the expected size to .Nm : .Bd -literal -offset indent cat file | dpv -x "nc -w 1 somewhere.com 3000" 12345:label .Ed .Pp A more complicated example: .Bd -literal -offset indent tar cf - . | dpv -x "gzip -9 > out.tgz" \\ $( du -s . | awk '{print $1 * 1024}' ):label .Ed .Pp Taking an image of a disk: .Bd -literal -offset indent dpv -o disk-image.img -m label /dev/ada0 .Ed .Pp Writing an image back to a disk: .Bd -literal -offset indent dpv -o /dev/ada0 -m label disk-image.img .Ed .Pp Zeroing a disk: .Bd -literal -offset indent dpv -o /dev/md42 < /dev/zero .Ed -.Pp +.Sh SEE ALSO +.Xr dialog 1 , +.Xr dialog 3 , +.Xr sh 1 , +.Xr Xdialog 1 +.Sh HISTORY +A +.Nm +utility first appeared in +.Fx 10.2 . +.Sh AUTHORS +.An Devin Teske Aq dteske@FreeBSD.org .Sh BUGS .Xr Xdialog 1 , when given both .Ql Fl -title Ar title .Pq see above Ql Fl t Ar title and .Ql Fl -backtitle Ar backtitle .Pq see above Ql Fl b Ar backtitle , displays the backtitle in place of the title and vice-versa. .Pp .Xr Xdialog 1 does not wrap long prompt texts received after initial launch. This is a known issue with the .Ql --gauge widget in .Xr Xdialog 1 . .Pp .Xr dialog 1 does not display the first character after a series of escaped escape-sequences (e.g., ``\\\\n'' produces ``\\'' instead of ``\\n''). This is a known issue with .Xr dialog 1 and does not affect .Xr dialog 3 or .Xr Xdialog 1 . .Pp If your application ignores .Ev USE_COLOR when set and NULL before calling .Xr dpv 1 with color escape sequences anyway, .Xr dialog 3 and .Xr dialog 1 may not render properly. Workaround is to detect when .Ev USE_COLOR is set and NULL and either not use color escape sequences at that time or use .Xr unset 1 .Xr [ sh 1 ] or .Xr unsetenv 1 .Xr [ csh 1 ] to unset .Ev USE_COLOR , forcing interpretation of color sequences. This does not effect .Xr Xdialog 1 , which renders the color escape sequences as plain text. See -.Do Li +.Do embedded "\\Z" sequences .Dc in .Xr dialog 1 for additional information. -.Sh SEE ALSO -.Xr dialog 1 , -.Xr dialog 3 , -.Xr sh 1 , -.Xr Xdialog 1 -.Sh HISTORY -A -.Nm -utility first appeared in -.Fx 10.2 . -.Sh AUTHORS -.An Devin Teske Aq dteske@FreeBSD.org Index: stable/10 =================================================================== --- stable/10 (revision 306377) +++ stable/10 (revision 306378) Property changes on: stable/10 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head:r274925