diff --git a/sbin/ipf/ippool/ippool.5 b/sbin/ipf/ippool/ippool.5
index 6ead9f7fbf3f..3b5c4d0f2bf6 100644
--- a/sbin/ipf/ippool/ippool.5
+++ b/sbin/ipf/ippool/ippool.5
@@ -1,319 +1,319 @@
.\"
.TH IPPOOL 5
.SH NAME
ippool, ippool.conf \- IP Pool file format
.SH DESCRIPTION
The file ippool.conf is used with ippool(8) to configure address pools
for use with ipnat(8) and ipf(8).
.PP
There are four different types of address pools that can be configured
through ippool.conf.
The various types are presented below with a brief description of how
they are used:
.HP
dstlist
.IP
destination list - is a collection of IP addresses with an optional
network interface name that can be used with either redirect (rdr)
rules in ipnat.conf(5) or as the destination in ipf.conf(5) for policy
based routing.
.HP
group-map
.IP
group maps - support the srcgrpmap and dstgrpmap call functions in
ipf.conf(5) by providing a list of addresses or networks together with
the rule group numbers to start processing them with.
.HP
hash
.IP
hash tables - provide the means for performing a very efficient lookup
of an address or network when there is expected to be only one exact
match.
These are best used with more static sets of addresses so they can be
sized optimally.
.HP
pool
.IP
address pools - are an alternative to hash tables that can perform just
as well in most circumstances.
In addition, address pools allow for hierarchical matching, so it is
possible to define a subnet as matching but then exclude specific
addresses from it.
.SS Evolving Configuration
.PP
Over time the configuration syntax used by ippool.conf(5) has evolved.
Originally the syntax used was more verbose about what a particular
value was being used for, for example:
.PP
.nf
table role = ipf type = tree number = 100
	{ 1.1.1.1/32; !2.2.0.0/16; 2.2.2.0/24; ef00::5/128; };
.fi
.PP
This is rather long-winded.
The evolution of the configuration syntax has also replaced the use of
numbers with names, although numbers can still be used, as can be seen
here:
.PP
.nf
pool ipf/tree (name "100";)
	{ 1.1.1.1/32; !2.2.0.0/16; 2.2.2.0/24; ef00::5/128; };
.fi
.PP
Both of the above examples produce the same configuration in the kernel
for use with ipf.conf(5).
.PP
Newer options for use in ippool.conf(5) will only be offered in the new
configuration syntax and all output using "ippool -l" will also be in
the new configuration syntax.
.SS IPFilter devices and pools
.PP
To cater to different administration styles, ippool.conf(5) allows you
to tie a pool to a specific role in IPFilter.
The recognised role names are:
.HP
ipf
.IP
pools defined for role "ipf" are available for use with all rules that
are found in ipf.conf(5) except for auth rules.
.HP
nat
.IP
pools defined for role "nat" are available for use with all rules that
are found in ipnat.conf(5).
.HP
auth
.IP
pools defined for role "auth" are available only for use with "auth"
rules that are found in ipf.conf(5).
.HP
all
.IP
pools that are defined for the "all" role are available to all types of
rules, be they NAT rules in ipnat.conf(5) or firewall rules in
ipf.conf(5).
.SH Address Pools
.PP
An address pool can be used in ipf.conf(5) and ipnat.conf(5) for
matching the source or destination address of packets.
They can be referred to either by name or number and can hold an
arbitrary number of address patterns to match.
.PP
An address pool is considered to be a "tree type".
In the older configuration style, it was necessary to have "type=tree"
in ippool.conf(5).
In the new-style configuration, the type follows the IPFilter device
with which the pool is being configured, and it is the default if left
out.
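.PP
For example, because tree is the default type, the following two
declarations should produce the same pool:
.PP
.nf
pool ipf (name "100";) { 1.1.1.1/32; };
pool ipf/tree (name "100";) { 1.1.1.1/32; };
.fi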
.PP
For convenience, both IPv4 and IPv6 addresses can be stored in the same
address pool.
It should go without saying that either type of packet can only ever
match an entry in a pool that is of the same address family.
.PP
The address pool searches the list of addresses configured for the best
match.
The "best match" is considered to be the match that has the highest
number of bits set in the mask.
Thus if both 2.2.0.0/16 and 2.2.2.0/24 are
-present in an address pool, the addres 2.2.2.1 will match 2.2.2.0/24 and
+present in an address pool, the address 2.2.2.1 will match 2.2.2.0/24 and
2.2.1.1 will match 2.2.0.0/16.
The reason for this is to allow exceptions to be added through the use
of negative matching.
In the following example, the pool contains "2.2.0.0/16" and
"!2.2.2.0/24", meaning that all packets that match 2.2.0.0/16, except
those that match 2.2.2.0/24, will be considered as a match for this
pool.
.PP
.nf
table role = ipf type = tree number = 100
	{ 1.1.1.1/32; 2.2.0.0/16; !2.2.2.0/24; ef00::5/128; };
.fi
.PP
For the sake of clarity and to aid in managing large numbers of
addresses inside address pools, it is possible to specify a location to
load the addresses from.
To do this, simply use a "file://" URL where you would specify an
actual IP address.
.PP
.nf
pool ipf/tree (name rfc1918;)
	{ file:///etc/ipf/rfc1918; };
.fi
.PP
The contents of the file might look something like this:
.PP
.nf
# RFC 1918 networks
10.0.0.0/8
!127.0.0.0/8
172.16.0.0/12
192.168.0.0/24
.fi
.PP
In this example, the inclusion of the line "!127.0.0.0/8" is, strictly
speaking, not correct and serves only as an example to show that
negative matching is also supported in this file.
.PP
Another format that ippool(8) recognises for input from a file is that
from whois servers.
In the following example, output from a query to a WHOIS server for
information about which networks are associated with the name
"microsoft" has been saved in a file named "ms-networks".
There is no need to modify the output from the whois server, so using
either the whois command or dumping data directly from it over a TCP
connection works perfectly fine as input.
.PP
.nf
pool ipf/tree (name microsoft;)
	{ whois file "/etc/ipf/ms-networks"; };
.fi
.PP
And to then block all packets to/from networks defined in that file, a
rule like this might be used:
.PP
.nf
block in from pool/microsoft to any
.fi
.PP
Note that there are limitations on the output returned by whois
servers, so be aware that their output may not be 100% perfect for your
goal.
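.PP
Once written, the configuration is typically loaded into the kernel
with ippool(8), for example:
.PP
.nf
# ippool -f /etc/ippool.conf
.fi
.PP
The pools currently loaded can then be reviewed with "ippool -l".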
.SH Destination Lists
.PP
Destination lists are provided for use primarily with NAT redirect
rules (rdr).
Their purpose is to allow more sophisticated methods of selecting which
host to send traffic to next than the simple round-robin technique that
is present with "round-robin" rules in ipnat.conf(5).
.PP
When building a list of hosts to use as a redirection list, it is
necessary to list each host to be used explicitly.
Expressing a collection of hosts as a range or a subnet is not
supported.
With each address it is also possible to specify a network interface
name.
The network interface name is ignored by NAT when using destination
lists.
The network interface name is currently only used with policy based
routing (use of "to"/"dup-to" in ipf.conf(5)).
.PP
Unlike the other directives that can be expressed in this file,
destination lists must be written using the new configuration syntax.
Each destination list must have a name associated with it and a next
hop selection policy.
Some policies have further options.
The currently available selection policies are:
.HP
round-robin
.IP
steps through the list of hosts configured with the destination list
one by one
.HP
random
.IP
the next hop is chosen by random selection from the list available
.HP
src-hash
.IP
a hash is made of the source address components of the packet (address
and port number) and this is used to select which next hop address is
used
.HP
dst-hash
.IP
a hash is made of the destination address components of the packet
(address and port number) and this is used to select which next hop
address is used
.HP
hash
.IP
a hash is made of all the address components in the packet (addresses
and port numbers) and this is used to select which next hop address is
used
.HP
weighted
.IP
selecting a weighted policy for destination selection needs further
clarification as to what type of weighted selection will be used.
The sub-options to a weighted policy are:
.RS
.HP
connection
.IP
the host that has received the least number of connections is selected
to be the next hop.
When all hosts have the same connection count, the last one used will
be the next address selected.
.RE
.PP
The first example here shows 4 destinations that are used with a
round-robin selection policy.
.PP
.nf
pool nat/dstlist (name servers; policy round-robin;)
	{ 1.1.1.2; 1.1.1.4; 1.1.1.5; 1.1.1.9; };
.fi
.PP
In the following example, the destination is chosen by whichever has
had the least number of connections.
By placing the interface name with each address and saying
"all/dstlist", the destination list can be used with both ipnat.conf(5)
and ipf.conf(5).
.PP
.nf
pool all/dstlist (name servers; policy weighted connection;)
	{ bge0:1.1.1.2; bge0:1.1.1.4; bge1:1.1.1.5; bge1:1.1.1.9; };
.fi
.SH Group maps
.PP
Group maps are provided to allow more efficient processing of packets
where there are a large number of subnets and groups of rules for those
subnets.
Group maps are used with "call" rules in ipf.conf(5) that use the
"srcgrpmap" and "dstgrpmap" functions.
.PP
A group map declaration must mention which group is the default group
for all matching addresses to be applied to.
Then inside the list of addresses and networks for the group, each one
may optionally have a group number associated with it.
A simple example follows, where the first two entries map to the
default group 2020 but 5.0.0.0/8 sends rule processing to group 2040.
.PP
.nf
group-map out role = ipf number = 2010 group = 2020
	{ 2.2.2.2/32; 4.4.0.0/16; 5.0.0.0/8, group = 2040; };
.fi
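.PP
A matching "call" rule in ipf.conf(5) might then send outbound packets
through this group map; the rule below is illustrative only, see
ipf.conf(5) for the authoritative syntax:
.PP
.nf
# illustrative: select the starting rule group via group-map 2010
call now srcgrpmap/2010 out all
.fi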
.PP
An example that outlines the real purpose of group maps is below, where
each one of the 12 subnets is mapped to a different group number.
This might be because each subnet has its own policy and rather than
write a list of twelve rules in ipf.conf(5) that match the subnet and
branch off with a head statement, a single rule can be used with this
group map to achieve the same result.
.PP
.nf
group-map ( name "2010"; in; ) {
	192.168.1.0/24, group = 10010;
	192.168.2.0/24, group = 10020;
	192.168.3.0/24, group = 10030;
	192.168.4.0/24, group = 10040;
	192.168.5.0/24, group = 10050;
	192.168.6.0/24, group = 10060;
	192.168.7.0/24, group = 10070;
	192.168.8.0/24, group = 10080;
	192.168.9.0/24, group = 10090;
	192.168.10.0/24, group = 10100;
	192.168.11.0/24, group = 10110;
	192.168.12.0/24, group = 10120;
};
.fi
.PP
The limitation with group maps is that only the source address or the
destination address can be used to map the packet to the starting
group, not both, in your ipf.conf(5) file.
.SH Hash Tables
.PP
The hash table is operationally similar to the address pool.
It is used as a store for a collection of addresses to match on, saving
the need to write a lengthy list of rules.
As with address pools, searching will attempt to find the best match -
an address specification with the largest contiguous netmask.
.PP
Hash tables are best used where the list of addresses, subnets and
networks is relatively static, which is something of a contrast to the
address pool that can work with either static or changing address list
sizes.
.PP
Further work is still needed to have IPFilter correctly size and tune
the hash table to optimise searching.
The goal is to allow for small to medium sized tables to achieve close
to O(1) for either a positive or negative match, in contrast to the
address pool, which is O(log n).
.PP
The following two examples build the same table in the kernel, using
the old configuration format (first) and the new one (second).
.PP
.nf
table role=all type=hash name=servers size=5
	{ 1.1.1.2/32; 1.1.1.3/32; 11.23.44.66/32; };

pool all/hash (name servers; size 5;)
	{ 1.1.1.2; 1.1.1.3; 11.23.44.66; };
.fi
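.PP
A rule in ipf.conf(5) might then match against the table by name; the
rule below is illustrative only, see ipf.conf(5) for the exact lookup
syntax:
.PP
.nf
pass in from hash/servers to any
.fi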
.SH FILES
/dev/iplookup
.br
/etc/ippool.conf
.br
/etc/hosts
.SH SEE ALSO
ippool(8), hosts(5), ipf(5), ipf(8), ipnat(8)
diff --git a/share/man/man9/mod_cc.9 b/share/man/man9/mod_cc.9
index 0d8488d7a92f..86d9c7b5312c 100644
--- a/share/man/man9/mod_cc.9
+++ b/share/man/man9/mod_cc.9
@@ -1,421 +1,421 @@
.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd May 13, 2021
.Dt MOD_CC 9
.Os
.Sh NAME
.Nm mod_cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CCV
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/tcp.h
.In netinet/cc/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CCV "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as
dynamically loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr mod_cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init) (void);
	int	(*mod_destroy) (void);
	size_t	(*cc_data_sz)(void);
	int	(*cb_init) (struct cc_var *ccv, void *ptr);
	void	(*cb_destroy) (struct cc_var *ccv);
	void	(*conn_init) (struct cc_var *ccv);
	void	(*ack_received) (struct cc_var *ccv, uint16_t type);
	void	(*cong_signal) (struct cc_var *ccv, uint32_t type);
	void	(*post_recovery) (struct cc_var *ccv);
	void	(*after_idle) (struct cc_var *ccv);
	int	(*ctl_output)(struct cc_var *, struct sockopt *, void *);
	void	(*rttsample)(struct cc_var *, uint32_t, uint32_t, uint32_t);
	void	(*newround)(struct cc_var *, uint32_t);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer
than TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define
lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before
the registration process is complete.
It should be implemented if a module needs to set up some global state
prior to being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state
before being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cc_data_sz
function is called by the socket option code to get the size of data that
the
.Va cb_init
function needs.
The socket option code then preallocates the module's memory so that the
.Va cb_init
function will not fail (the socket option code uses M_WAITOK with no locks
held to do this).
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection
as a result.
Note that the ptr argument passed to the function should be checked to see
if it is non-NULL; if so, it is preallocated memory that the
.Va cb_init
function must use instead of calling malloc itself.
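.Pp
Taken together, a minimal
.Va cc_data_sz
and
.Va cb_init
pair might look like the following sketch, where
.Vt struct example_data ,
its contents, and the choice of malloc type are illustrative only:
.Bd -literal -offset indent
struct example_data {
	uint32_t	ex_state;	/* hypothetical per-connection state */
};

static size_t
example_cc_data_sz(void)
{
	/* Tell the socket option code how much memory cb_init needs. */
	return (sizeof(struct example_data));
}

static int
example_cb_init(struct cc_var *ccv, void *ptr)
{
	struct example_data *data;

	if (ptr == NULL) {
		/* No preallocated memory was supplied; allocate our own. */
		data = malloc(sizeof(struct example_data), M_CC_MEM,
		    M_NOWAIT | M_ZERO);
		if (data == NULL)
			return (ENOMEM);
	} else {
		/* Use the memory preallocated by the socket option code. */
		data = ptr;
	}
	ccv->cc_data = data;
	return (0);
}
.Ed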
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables
are being initialised.
It should be implemented to initialise congestion control algorithm
variables for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged
data.
CC_DUPACK indicates the received ACK acknowledges data we have already
received an ACK for.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN,
CC_RTO, CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion
notification (RFC3168).
CC_RTO is reported when the retransmission time out timer fires.
CC_RTO_ERR is reported if the retransmission time out timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ack threshold (N=3 currently as
per RFC5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion
event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Va ctl_output
function is called when
.Xr getsockopt 2
or
.Xr setsockopt 2
is called on a
.Xr tcp 4
socket with the
.Va struct sockopt
pointer forwarded unmodified from the TCP socket option handling code, and a
.Va void *
pointer to an algorithm-specific argument.
.Pp
The
.Va rttsample
function is called to pass round trip time information to the congestion
controller.
The additional arguments to the function include the microsecond RTT that
is being noted, the number of times that the data being acknowledged was
retransmitted, as well as the flightsize at send.
For transports that do not track flightsize at send, this variable will be
the current cwnd at the time of the call.
.Pp
The
.Va newround
function is called each time a new round trip time begins.
The monotonically increasing round number is also passed to the congestion
controller.
This can be used for various purposes by the congestion controller
(e.g., Hystart++).
.Pp
Note that currently not all TCP stacks call the
.Va rttsample
and
.Va newround
-function so dependancy on these functions is also
-dependant upon which TCP stack is in use.
+function so dependency on these functions is also
+dependent upon which TCP stack is in use.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the
function pointers.
Note that if a module defines the
.Va cb_init
function it also must define a
.Va cc_data_sz
function.
This is because when switching from one congestion control module to
another the socket option code will preallocate memory for the
.Va cb_init
function.
If no memory is allocated by the module's
.Va cb_init
then the
.Va cc_data_sz
function should return 0.
.Pp
The stack will skip calling any function pointer which is NULL, so there is
no requirement to implement any of the function pointers (with the
exception of
-the cb_init <-> cc_data_sz dependancy noted above).
+the cb_init <-> cc_data_sz dependency noted above).
Using the C99 designated initialiser feature to set fields is encouraged.
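.Pp
For example, a module named "example" built from the hypothetical hook
functions sketched in this page might be declared as follows:
.Bd -literal -offset indent
static struct cc_algo example_cc_algo = {
	.name = "example",
	.cc_data_sz = example_cc_data_sz,
	.cb_init = example_cb_init,
	.ack_received = example_ack_received,
};

DECLARE_CC_MODULE(example, &example_cc_algo);
.Ed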
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	tcp_seq		curack;
	uint32_t	flags;
	int		type;
	union ccv_container {
		struct tcpcb		*tcp;
		struct sctp_nets	*sctp;
	} ccvc;
	uint16_t	nsegs;
	uint8_t		labc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable
structure and adds a layer of indirection to accessing transport protocol
control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols,
though currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables
from the transport protocol's data structures is strongly discouraged.
However, it is inevitable at the current time to require access to some of
these variables, and so the
.Fn CCV
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccv_container
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field is available for algorithms requiring additional per-connection state
to attach a dynamic memory pointer to.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK
packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has counted that a
window's worth of bytes has been sent.
It is the module's responsibility to clear the flag after it has processed
the signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently
constrained by the value of the congestion window.
Algorithms should use the absence of this flag to avoid accumulating a
large difference between the congestion window and send window.
.Pp
The
.Va nsegs
variable is used to pass in how much compression was done by the local LRO
system.
So, for example, if LRO pushed three in-order acknowledgements into one
acknowledgement, the variable would be set to three.
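.Pp
Bringing these fields together, a hypothetical
.Va ack_received
hook might look like the following sketch (the window growth policy shown
is illustrative, not that of any in-tree algorithm):
.Bd -literal -offset indent
static void
example_ack_received(struct cc_var *ccv, uint16_t type)
{
	uint32_t incr;

	if (type == CC_ACK && !IN_RECOVERY(CCV(ccv, t_flags))) {
		/*
		 * Open the congestion window by the newly acknowledged
		 * bytes, capped at one maximum segment per ACK.
		 */
		incr = ulmin(ccv->bytes_this_ack, CCV(ccv, t_maxseg));
		CCV(ccv, snd_cwnd) += incr;
	}
}
.Ed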
.Pp
The
.Va labc
variable is used in conjunction with the CCF_USE_LOCAL_ABC flag to override
the labc value that the congestion controller will use for this particular
acknowledgement.
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_dctcp 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by
grants from the FreeBSD Foundation and Cisco University Research Program
Fund at Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence
Stewart whilst working on the NewTCP research project at Swinburne
University of Technology's Centre for Advanced Internet Architectures,
Melbourne, Australia, which was made possible in part by a grant from the
Cisco University Research Program Fund at Community Foundation Silicon
Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org ,
.An James Healy Aq Mt jimmy@deefa.com
and
.An David Hayes Aq Mt david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq Mt david.hayes@ieee.org
and
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org .