Index: stable/4/sbin/ipfw/ipfw.8 =================================================================== --- stable/4/sbin/ipfw/ipfw.8 (revision 116991) +++ stable/4/sbin/ipfw/ipfw.8 (revision 116992) @@ -1,2071 +1,2120 @@ .\" .\" $FreeBSD$ .\" .Dd August 13, 2002 .Dt IPFW 8 .Os .Sh NAME .Nm ipfw .Nd IP firewall and traffic shaper control program .Sh SYNOPSIS .Nm .Op Fl cq .Cm add .Ar rule .Nm .Op Fl acdeftNS .Brq Cm list | show .Op Ar number ... .Nm .Op Fl f | q .Cm flush .Nm .Op Fl q .Brq Cm delete | zero | resetlog .Op Cm set .Op Ar number ... .Nm .Cm enable .Brq Cm firewall | one_pass | debug | verbose | dyn_keepalive .Nm .Cm disable .Brq Cm firewall | one_pass | debug | verbose | dyn_keepalive .Pp .Nm .Cm set Oo Cm disable Ar number ... Oc Op Cm enable Ar number ... .Nm .Cm set move .Op Cm rule .Ar number Cm to Ar number .Nm .Cm set swap Ar number number .Nm .Cm set show .Pp .Nm .Brq Cm pipe | queue .Ar number .Cm config .Ar config-options .Nm .Op Fl s Op Ar field .Brq Cm pipe | queue .Brq Cm delete | list | show .Op Ar number ... .Pp .Nm .Op Fl q .Oo .Fl p Ar preproc -.Oo Fl D -.Ar macro Ns Op = Ns Ar value +.Oo +.Ar preproc-flags .Oc -.Op Fl U Ar macro .Oc .Ar pathname .Sh DESCRIPTION The .Nm utility is the user interface for controlling the .Xr ipfw 4 firewall and the .Xr dummynet 4 traffic shaper in .Fx . .Pp .Bd -ragged -offset XXXX .Em NOTE: this manual page documents the newer version of .Nm introduced in .Fx CURRENT in July 2002, also known as .Nm ipfw2 . .Nm ipfw2 is a superset of the old firewall, .Nm ipfw1 . The differences between the two are listed in Section .Sx IPFW2 ENHANCEMENTS , which you are encouraged to read to revise older rulesets and possibly write them more efficiently. See Section .Sx USING IPFW2 IN FreeBSD-STABLE for instructions on how to run .Nm ipfw2 on .Fx STABLE. .Ed .Pp An .Nm configuration, or .Em ruleset , is made of a list of .Em rules numbered from 1 to 65535. 
Packets are passed to .Nm from a number of different places in the protocol stack (depending on the source and destination of the packet, it is possible that .Nm is invoked multiple times on the same packet). The packet passed to the firewall is compared against each of the rules in the firewall .Em ruleset . When a match is found, the action corresponding to the matching rule is performed. .Pp Depending on the action and certain system settings, packets can be reinjected into the firewall at some rule after the matching one for further processing. .Pp An .Nm ruleset always includes a .Em default rule (numbered 65535) which cannot be modified, and matches all packets. The action associated with the .Em default rule can be either .Cm deny or .Cm allow depending on how the kernel is configured. .Pp If the ruleset includes one or more rules with the .Cm keep-state or .Cm limit option, then .Nm assumes a .Em stateful behaviour, i.e. upon a match it will create dynamic rules matching the exact parameters (addresses and ports) of the matching packet. .Pp These dynamic rules, which have a limited lifetime, are checked at the first occurrence of a .Cm check-state , .Cm keep-state or .Cm limit rule, and are typically used to open the firewall on-demand to legitimate traffic only. See the .Sx STATEFUL FIREWALL and .Sx EXAMPLES Sections below for more information on the stateful behaviour of .Nm . .Pp All rules (including dynamic ones) have a few associated counters: a packet count, a byte count, a log count and a timestamp indicating the time of the last match. Counters can be displayed or reset with .Nm commands. .Pp Rules can be added with the .Cm add command; deleted individually or in groups with the .Cm delete command, and globally with the .Cm flush command; displayed, optionally with the content of the counters, using the .Cm show and .Cm list commands. Finally, counters can be reset with the .Cm zero and .Cm resetlog commands. 
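The first-match behaviour and per-rule counters described above can be sketched as a small simulation. This is only an illustrative model, not the kernel's implementation: the `Rule` class, the `evaluate` and `demo_ruleset` functions, and the packet dictionary layout are all hypothetical names introduced here. Only the packet and byte counters are modelled, and only terminating `allow` and `deny` actions.

```python
from dataclasses import dataclass
from typing import Callable, List

DEFAULT_RULE = 65535  # the default rule number given in the text


@dataclass
class Rule:
    number: int
    action: str                      # "allow" or "deny" in this sketch
    matches: Callable[[dict], bool]  # hypothetical predicate over a packet dict
    packets: int = 0                 # packet counter
    bytes: int = 0                   # byte counter


def evaluate(ruleset: List[Rule], packet: dict) -> str:
    """Return the action of the first matching rule, updating its counters."""
    # sorted() is stable, so rules sharing a number keep insertion order.
    for rule in sorted(ruleset, key=lambda r: r.number):
        if rule.matches(packet):
            rule.packets += 1
            rule.bytes += packet["len"]
            return rule.action
    raise RuntimeError("no default rule in ruleset")


def demo_ruleset() -> List[Rule]:
    """A two-rule example: allow TCP to port 22, default deny."""
    return [
        Rule(100, "allow", lambda p: p["proto"] == "tcp" and p["dst_port"] == 22),
        Rule(DEFAULT_RULE, "deny", lambda p: True),
    ]
```

Because the default rule matches every packet, the loop always returns once a default rule is present, which mirrors why a real ruleset can never fall off the end.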
.Pp Also, each rule belongs to one of 32 different .Em sets , and there are .Nm commands to atomically manipulate sets, such as enable, disable, swap sets, move all rules in a set to another one, delete all rules in a set. These can be useful to install temporary configurations, or to test them. See Section .Sx SETS OF RULES for more information on .Em sets . .Pp The following options are available: .Bl -tag -width indent .It Fl a While listing, show counter values. The .Cm show command just implies this option. .It Fl c When entering or showing rules, print them in compact form, i.e. without the optional "ip from any to any" string when this does not carry any additional information. .It Fl d While listing, show dynamic rules in addition to static ones. .It Fl e While listing, if the .Fl d option was specified, also show expired dynamic rules. .It Fl f Don't ask for confirmation for commands that can cause problems if misused, .No i.e. Cm flush . If there is no tty associated with the process, this is implied. .It Fl N Try to resolve addresses and service names in output. .It Fl q While .Cm add Ns ing , .Cm zero Ns ing , .Cm resetlog Ns ging or .Cm flush Ns ing , be quiet about actions (implies .Fl f ) . This is useful for adjusting rules by executing multiple .Nm commands in a script (e.g., .Ql sh\ /etc/rc.firewall ) , or by processing a file of many .Nm rules across a remote login session. If a .Cm flush is performed in normal (verbose) mode (with the default kernel configuration), it prints a message. Because all rules are flushed, the message might not be delivered to the login session, causing the remote login session to be closed and the remainder of the ruleset to not be processed. Access to the console would then be required to recover. .It Fl S While listing rules, show the .Em set each rule belongs to. If this flag is not specified, disabled rules will not be listed. 
.It Fl s Op Ar field While listing pipes, sort according to one of the four counters (total or current packets or bytes). .It Fl t While listing, show last match timestamp. .El .Pp To ease configuration, rules can be put into a file which is processed using .Nm as shown in the last synopsis line. An absolute .Ar pathname must be used. The file will be read line by line and applied as arguments to the .Nm utility. .Pp Optionally, a preprocessor can be specified using .Fl p Ar preproc where .Ar pathname is to be piped through. Useful preprocessors include .Xr cpp 1 and .Xr m4 1 . If .Ar preproc doesn't start with a slash .Pq Ql / as its first character, the usual .Ev PATH name search is performed. Care should be taken with this in environments where not all file systems are mounted (yet) by the time .Nm is being run (e.g. when they are mounted over NFS). Once .Fl p -has been specified, optional -.Fl D -and -.Fl U -specifications can follow and will be passed on to the preprocessor. +has been specified, any additional arguments are passed on to the preprocessor +for interpretation. This allows for flexible configuration files (like conditionalizing them on the local hostname) and the use of macros to centralize frequently required arguments like IP addresses. .Pp The .Nm .Cm pipe and .Cm queue commands are used to configure the traffic shaper, as shown in the .Sx TRAFFIC SHAPER (DUMMYNET) CONFIGURATION Section below. .Pp If the world and the kernel get out of sync the .Nm ABI may break, preventing you from being able to add any rules. This can adversely affect the booting process. You can use .Nm .Cm disable .Cm firewall to temporarily disable the firewall to regain access to the network, allowing you to fix the problem. .Sh PACKET FLOW A packet is checked against the active ruleset in multiple places in the protocol stack, under control of several sysctl variables. 
These places and variables are shown below, and it is important to have this picture in mind in order to design a correct ruleset. .Bd -literal -offset indent ^ to upper layers V | | +----------->-----------+ ^ V [ip_input] [ip_output] net.inet.ip.fw.enable=1 | | ^ V [ether_demux] [ether_output_frame] net.link.ether.ipfw=1 | | +-->--[bdg_forward]-->--+ net.link.ether.bridge_ipfw=1 ^ V | to devices | .Ed .Pp As can be noted from the above picture, the number of times the same packet goes through the firewall can vary between 0 and 4 depending on packet source and destination, and system configuration. .Pp Note that as packets flow through the stack, headers can be stripped or added to it, and so they may or may not be available for inspection. E.g., incoming packets will include the MAC header when .Nm is invoked from .Cm ether_demux() , but the same packets will have the MAC header stripped off when .Nm is invoked from .Cm ip_input() . .Pp Also note that each packet is always checked against the complete ruleset, irrespective of the place where the check occurs, or the source of the packet. If a rule contains some match patterns or actions which are not valid for the place of invocation (e.g. trying to match a MAC header within .Cm ip_input() ), the match pattern will not match, but a .Cm not operator in front of such patterns .Em will cause the pattern to .Em always match on those packets. It is thus the responsibility of the programmer, if necessary, to write a suitable ruleset to differentiate among the possible places. 
.Cm skipto rules can be useful here, as an example: .Bd -literal -offset indent # packets from ether_demux or bdg_forward ipfw add 10 skipto 1000 all from any to any layer2 in # packets from ip_input ipfw add 10 skipto 2000 all from any to any not layer2 in # packets from ip_output ipfw add 10 skipto 3000 all from any to any not layer2 out # packets from ether_output_frame ipfw add 10 skipto 4000 all from any to any layer2 out .Ed .Pp (yes, at the moment there is no way to differentiate between ether_demux and bdg_forward). .Sh RULE FORMAT The format of .Nm rules is the following: .Bd -ragged -offset indent .Op Ar rule_number .Op Cm set Ar set_number .Op Cm prob Ar match_probability .br .Ar " " action .Op Cm log Op Cm logamount Ar number .Ar body .Ed .Pp where the body of the rule specifies which information is used for filtering packets, among the following: .Pp .Bl -tag -width "Source and dest. addresses and ports" -offset XXX -compact .It Layer-2 header fields When available .It IPv4 Protocol TCP, UDP, ICMP, etc. .It Source and dest. addresses and ports .It Direction See Section .Sx PACKET FLOW .It Transmit and receive interface By name or address .It Misc. IP header fields Version, type of service, datagram length, identification, fragment flag (non-zero IP offset), Time To Live .It IP options .It Misc. TCP header fields TCP flags (SYN, FIN, ACK, RST, etc.), sequence number, acknowledgment number, window .It TCP options .It ICMP types for ICMP packets .It User/group ID When the packet can be associated with a local socket. .El .Pp Note that some of the above information, e.g. source MAC or IP addresses and TCP/UDP ports, could easily be spoofed, so filtering on those fields alone might not guarantee the desired results. .Bl -tag -width indent .It Ar rule_number Each rule is associated with a .Ar rule_number in the range 1..65535, with the latter reserved for the .Em default rule. Rules are checked sequentially by rule number. 
Multiple rules can have the same number, in which case they are checked (and listed) according to the order in which they have been added. If a rule is entered without specifying a number, the kernel will assign one in such a way that the rule becomes the last one before the .Em default rule. Automatic rule numbers are assigned by incrementing the last non-default rule number by the value of the sysctl variable .Ar net.inet.ip.fw.autoinc_step which defaults to 100. If this is not possible (e.g. because we would go beyond the maximum allowed rule number), the number of the last non-default value is used instead. .It Cm set Ar set_number Each rule is associated with a .Ar set_number in the range 0..31, with the latter reserved for the .Em default rule. Sets can be individually disabled and enabled, so this parameter is of fundamental importance for atomic ruleset manipulation. It can be also used to simplify deletion of groups of rules. If a rule is entered without specifying a set number, set 0 will be used. .It Cm prob Ar match_probability A match is only declared with the specified probability (floating point number between 0 and 1). This can be useful for a number of applications such as random packet drop or (in conjunction with .Xr dummynet 4 ) to simulate the effect of multiple paths leading to out-of-order packet delivery. .Pp Note: this condition is checked before any other condition, including ones such as keep-state or check-state which might have side effects. .It Cm log Op Cm logamount Ar number When a packet matches a rule with the .Cm log keyword, a message will be logged to .Xr syslogd 8 with a .Dv LOG_SECURITY facility. The logging only occurs if the sysctl variable .Em net.inet.ip.fw.verbose is set to 1 (which is the default when the kernel is compiled with .Dv IPFIREWALL_VERBOSE ) and the number of packets logged so far for that particular rule does not exceed the .Cm logamount parameter. 
If no .Cm logamount is specified, the limit is taken from the sysctl variable .Em net.inet.ip.fw.verbose_limit . In both cases, a value of 0 removes the logging limit. .Pp Once the limit is reached, logging can be re-enabled by clearing the logging counter or the packet counter for that entry, see the .Cm resetlog command. .Pp Note: logging is done after all other packet matching conditions have been successfully verified, and before performing the final action (accept, deny, etc.) on the packet. .El .Ss RULE ACTIONS A rule can be associated with one of the following actions, which will be executed when the packet matches the body of the rule. .Bl -tag -width indent .It Cm allow | accept | pass | permit Allow packets that match rule. The search terminates. .It Cm check-state Checks the packet against the dynamic ruleset. If a match is found, execute the action associated with the rule which generated this dynamic rule, otherwise move to the next rule. .br .Cm Check-state rules do not have a body. If no .Cm check-state rule is found, the dynamic ruleset is checked at the first .Cm keep-state or .Cm limit rule. .It Cm count Update counters for all packets that match rule. The search continues with the next rule. .It Cm deny | drop Discard packets that match this rule. The search terminates. .It Cm divert Ar port Divert packets that match this rule to the .Xr divert 4 socket bound to port .Ar port . The search terminates. .It Cm fwd | forward Ar ipaddr Ns Op , Ns Ar port Change the next-hop on matching packets to .Ar ipaddr , which can be an IP address in dotted quad format or a host name. The search terminates if this rule matches. .Pp If .Ar ipaddr is a local address, then matching packets will be forwarded to .Ar port (or the port number in the packet if one is not specified in the rule) on the local machine. 
.br If .Ar ipaddr is not a local address, then the port number (if specified) is ignored, and the packet will be forwarded to the remote address, using the route as found in the local routing table for that IP. .br A .Ar fwd rule will not match layer-2 packets (those received on ether_input, ether_output, or bridged). .br The .Cm fwd action does not change the contents of the packet at all. In particular, the destination address remains unmodified, so packets forwarded to another system will usually be rejected by that system unless there is a matching rule on that system to capture them. For packets forwarded locally, the local address of the socket will be set to the original destination address of the packet. This makes the .Xr netstat 1 entry look rather weird but is intended for use with transparent proxy servers. .It Cm pipe Ar pipe_nr Pass packet to a .Xr dummynet 4 .Dq pipe (for bandwidth limitation, delay, etc.). See the .Sx TRAFFIC SHAPER (DUMMYNET) CONFIGURATION Section for further information. The search terminates; however, on exit from the pipe and if the .Xr sysctl 8 variable .Em net.inet.ip.fw.one_pass is not set, the packet is passed again to the firewall code starting from the next rule. .It Cm queue Ar queue_nr Pass packet to a .Xr dummynet 4 .Dq queue (for bandwidth limitation using WF2Q+). .It Cm reject (Deprecated). Synonym for .Cm unreach host . .It Cm reset Discard packets that match this rule, and if the packet is a TCP packet, try to send a TCP reset (RST) notice. The search terminates. .It Cm skipto Ar number Skip all subsequent rules numbered less than .Ar number . The search continues with the first rule numbered .Ar number or higher. .It Cm tee Ar port Send a copy of packets matching this rule to the .Xr divert 4 socket bound to port .Ar port . The search terminates and the original packet is accepted (but see Section .Sx BUGS below). 
.It Cm unreach Ar code Discard packets that match this rule, and try to send an ICMP unreachable notice with code .Ar code , where .Ar code is a number from 0 to 255, or one of these aliases: .Cm net , host , protocol , port , .Cm needfrag , srcfail , net-unknown , host-unknown , .Cm isolated , net-prohib , host-prohib , tosnet , .Cm toshost , filter-prohib , host-precedence or .Cm precedence-cutoff . The search terminates. .El .Ss RULE BODY The body of a rule contains zero or more patterns (such as specific source and destination addresses or ports, protocol options, incoming or outgoing interfaces, etc.) that the packet must match in order to be recognised. In general, the patterns are connected by (implicit) .Cm and operators -- i.e. all must match in order for the rule to match. Individual patterns can be prefixed by the .Cm not operator to reverse the result of the match, as in .Pp .Dl "ipfw add 100 allow ip from not 1.2.3.4 to any" .Pp Additionally, sets of alternative match patterns ( .Em or-blocks ) can be constructed by putting the patterns in lists enclosed between parentheses ( ) or braces { }, and using the .Cm or operator as follows: .Pp .Dl "ipfw add 100 allow ip from { x or not y or z } to any" .Pp Only one level of parentheses is allowed. Beware that most shells have special meanings for parentheses or braces, so it is advisable to put a backslash \\ in front of them to prevent such interpretations. .Pp The body of a rule must in general include a source and destination address specifier. The keyword .Ar any can be used in various places to specify that the content of a required field is irrelevant. .Pp The rule body has the following format: .Bd -ragged -offset indent .Op Ar proto Cm from Ar src Cm to Ar dst .Op Ar options .Ed .Pp The first part (protocol from src to dst) is for backward compatibility with .Nm ipfw1 . 
In .Nm ipfw2 any match pattern (including MAC headers, IPv4 protocols, addresses and ports) can be specified in the .Ar options section. .Pp Rule fields have the following meaning: .Bl -tag -width indent .It Ar proto : protocol | Cm { Ar protocol Cm or ... } An IPv4 protocol (or an .Em or-block with multiple protocols) specified by number or name (for a complete list see .Pa /etc/protocols ) . The .Cm ip or .Cm all keywords mean any protocol will match. .It Ar src No and Ar dst : ip-address | Cm { Ar ip-address Cm or ... } Op Oo Cm not Oc Ar ports A single .Ar ip-address , or an .Em or-block containing one or more of them, optionally followed by .Ar ports specifiers. .It Ar ip-address : An address (or set of addresses) specified in one of the following ways, optionally preceded by a .Cm not operator: .Bl -tag -width indent .It Cm any matches any IP address. .It Cm me matches any IP address configured on an interface in the system. The address list is evaluated at the time the packet is analysed. .It Ar numeric-ip | hostname Matches a single IPv4 address, specified as dotted-quad or a hostname. Hostnames are resolved at the time the rule is added to the firewall list. .It Ar addr Ns / Ns Ar masklen Matches all addresses with base .Ar addr (specified as a dotted quad or a hostname) and mask width of .Cm masklen bits. As an example, 1.2.3.4/25 will match all IP numbers from 1.2.3.0 to 1.2.3.127 . .It Ar addr Ns / Ns Ar masklen Ns Cm { Ns Ar num,num,... Ns Cm } Matches all addresses with base address .Ar addr (specified as a dotted quad or a hostname) and whose last byte is in the list between braces { } . Note that there must be no spaces between braces, commas and numbers. The .Ar masklen field is used to limit the size of the set of addresses, and can have any value between 24 and 32. .br As an example, an address specified as 1.2.3.4/24{128,35,55,89} will match the following IP addresses: .br 1.2.3.128 1.2.3.35 1.2.3.55 1.2.3.89 . 
.br This format is particularly useful to handle sparse address sets within a single rule. Because the matching occurs using a bitmask, it takes constant time and dramatically reduces the complexity of rulesets. .It Ar addr Ns : Ns Ar mask Matches all addresses with base .Ar addr (specified as a dotted quad or a hostname) and the mask of .Ar mask , specified as a dotted quad. As an example, 1.2.3.4:255.0.255.0 will match 1.*.3.*. We suggest using this form only for non-contiguous masks, and resorting to the .Ar addr Ns / Ns Ar masklen format for contiguous masks, which is more compact and less error-prone. .El .It Ar ports : Bro Ar port | port Ns \&- Ns Ar port Ns Brc Op , Ns Ar ports For protocols which support port numbers (such as TCP and UDP), optional .Cm ports may be specified as one or more ports or port ranges, separated by commas but no spaces, and an optional .Cm not operator. The .Ql \&- notation specifies a range of ports (including boundaries). .Pp Service names (from .Pa /etc/services ) may be used instead of numeric port values. The length of the port list is limited to 30 ports or ranges, though one can specify larger ranges by using an .Em or-block in the .Cm options section of the rule. .Pp A backslash .Pq Ql \e can be used to escape the dash .Pq Ql - character in a service name (from a shell, the backslash must be typed twice to avoid the shell itself interpreting it as an escape character). .Pp .Dl "ipfw add count tcp from any ftp\e\e-data-ftp to any" .Pp Fragmented packets which have a non-zero offset (i.e. not the first fragment) will never match a rule which has one or more port specifications. See the .Cm frag option for details on matching fragmented packets. .El .Ss RULE OPTIONS (MATCH PATTERNS) Additional match patterns can be used within rules. Zero or more of these so-called .Em options can be present in a rule, optionally prefixed by the .Cm not operator, and possibly grouped into .Em or-blocks . 
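As a reminder of how the ip-address forms described under RULE BODY behave, the following sketch reproduces the three matching styles with the examples used in the text: a contiguous prefix (base 1.2.3.4 with a mask width of 25), a base address with a last-byte list (base 1.2.3.4, mask width 24, list 128,35,55,89), and a non-contiguous mask (base 1.2.3.4 with mask 255.0.255.0). The function names are hypothetical, and the handling of the list form for mask widths other than 24 is an assumption, not something the text specifies.

```python
import ipaddress
from typing import Iterable


def _as_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


def match_masklen(packet_ip: str, base: str, masklen: int) -> bool:
    """addr/masklen: packet_ip lies in the prefix containing base."""
    net = ipaddress.IPv4Network(f"{base}/{masklen}", strict=False)
    return ipaddress.IPv4Address(packet_ip) in net


def match_addr_set(packet_ip: str, base: str, masklen: int,
                   last_bytes: Iterable[int]) -> bool:
    """addr/masklen{num,...}: same prefix, and last byte is in the list.

    Assumption: for masklen > 24 the prefix test simply narrows the set.
    """
    in_prefix = match_masklen(packet_ip, base, masklen)
    return in_prefix and (_as_int(packet_ip) & 0xFF) in set(last_bytes)


def match_addr_mask(packet_ip: str, base: str, mask: str) -> bool:
    """addr:mask: compare only the bits selected by a dotted-quad mask."""
    m = _as_int(mask)
    return (_as_int(packet_ip) & m) == (_as_int(base) & m)
```

With a mask of 255.0.255.0 only the first and third bytes take part in the comparison, which is why the second and fourth bytes act as wildcards.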
.Pp The following match patterns can be used (listed in alphabetical order): .Bl -tag -width indent .It Cm bridged Matches only bridged packets. .It Cm dst-ip Ar ip-address Matches IP packets whose destination IP is one of the address(es) specified as argument. .It Cm dst-port Ar ports Matches IP packets whose destination port is one of the port(s) specified as argument. .It Cm established Matches TCP packets that have the RST or ACK bits set. .It Cm frag Matches packets that are fragments and not the first fragment of an IP datagram. Note that these packets will not have the next protocol header (e.g. TCP, UDP) so options that look into these headers cannot match. .It Cm gid Ar group Matches all TCP or UDP packets sent by or received for a .Ar group . A .Ar group may be specified by name or number. .It Cm icmptypes Ar types Matches ICMP packets whose ICMP type is in the list .Ar types . The list may be specified as any combination of ranges or individual types separated by commas. The supported ICMP types are: .Pp echo reply .Pq Cm 0 , destination unreachable .Pq Cm 3 , source quench .Pq Cm 4 , redirect .Pq Cm 5 , echo request .Pq Cm 8 , router advertisement .Pq Cm 9 , router solicitation .Pq Cm 10 , time-to-live exceeded .Pq Cm 11 , IP header bad .Pq Cm 12 , timestamp request .Pq Cm 13 , timestamp reply .Pq Cm 14 , information request .Pq Cm 15 , information reply .Pq Cm 16 , address mask request .Pq Cm 17 and address mask reply .Pq Cm 18 . .It Cm in | out Matches incoming or outgoing packets, respectively. .Cm in and .Cm out are mutually exclusive (in fact, .Cm out is implemented as .Cm not in Ns No ). -.It Cm ipid Ar id +.It Cm ipid Ar id-list Matches IP packets whose .Cm ip_id -field has value -.Ar id . -.It Cm iplen Ar len +field has value included in +.Ar id-list , +which is either a single value or a list of values or ranges +specified in the same way as +.Ar ports . 
+.It Cm iplen Ar len-list Matches IP packets whose total length, including header and data, is -.Ar len -bytes. +in the set +.Ar len-list , +which is either a single value or a list of values or ranges +specified in the same way as +.Ar ports . .It Cm ipoptions Ar spec Matches packets whose IP header contains the comma separated list of options specified in .Ar spec . The supported IP options are: .Pp .Cm ssrr (strict source route), .Cm lsrr (loose source route), .Cm rr (record packet route) and .Cm ts (timestamp). The absence of a particular option may be denoted with a .Ql \&! . .It Cm ipprecedence Ar precedence Matches IP packets whose precedence field is equal to .Ar precedence . .It Cm iptos Ar spec Matches IP packets whose .Cm tos field contains the comma separated list of service types specified in .Ar spec . The supported IP types of service are: .Pp .Cm lowdelay .Pq Dv IPTOS_LOWDELAY , .Cm throughput .Pq Dv IPTOS_THROUGHPUT , .Cm reliability .Pq Dv IPTOS_RELIABILITY , .Cm mincost .Pq Dv IPTOS_MINCOST , .Cm congestion .Pq Dv IPTOS_CE . The absence of a particular type may be denoted with a .Ql \&! . -.It Cm ipttl Ar ttl -Matches IP packets whose time to live is -.Ar ttl . +.It Cm ipttl Ar ttl-list +Matches IP packets whose time to live is included in +.Ar ttl-list , +which is either a single value or a list of values or ranges +specified in the same way as +.Ar ports . .It Cm ipversion Ar ver Matches IP packets whose IP version field is .Ar ver . .It Cm keep-state Upon a match, the firewall will create a dynamic rule, whose default behaviour is to match bidirectional traffic between source and destination IP/port using the same protocol. The rule has a limited lifetime (controlled by a set of .Xr sysctl 8 variables), and the lifetime is refreshed every time a matching packet is found. .It Cm layer2 Matches only layer2 packets, i.e. those passed to .Nm from ether_demux() and ether_output_frame(). 
.It Cm limit Bro Cm src-addr | src-port | dst-addr | dst-port Brc Ar N The firewall will only allow .Ar N connections with the same set of parameters as specified in the rule. One or more of source and destination addresses and ports can be specified. .It Cm { MAC | mac } Ar dst-mac src-mac Match packets with a given .Ar dst-mac and .Ar src-mac addresses, specified as the .Cm any keyword (matching any MAC address), or six groups of hex digits separated by colons, and optionally followed by a mask indicating how many bits are significant, as in .Pp .Dl "MAC 10:20:30:40:50:60/33 any" .Pp Note that the order of MAC addresses (destination first, source second) is the same as on the wire, but the opposite of the one used for IP addresses. .It Cm mac-type Ar mac-type Matches packets whose Ethernet Type field corresponds to one of those specified as argument. .Ar mac-type is specified in the same way as .Cm port numbers (i.e. one or more comma-separated single values or ranges). You can use symbolic names for known values such as .Em vlan , ipv4, ipv6 . Values can be entered as decimal or hexadecimal (if prefixed by 0x), and they are always printed as hexadecimal (unless the .Cm -N option is used, in which case symbolic resolution will be attempted). .It Cm proto Ar protocol Matches packets with the corresponding IPv4 protocol. .It Cm recv | xmit | via Brq Ar ifX | Ar if Ns Cm * | Ar ipno | Ar any Matches packets received, transmitted or going through, respectively, the interface specified by exact name .Ns No ( Ar ifX Ns No ), by device name .Ns No ( Ar if Ns Ar * Ns No ), by IP address, or through some interface. .Pp The .Cm via keyword causes the interface to always be checked. If .Cm recv or .Cm xmit is used instead of .Cm via , then only the receive or transmit interface (respectively) is checked. 
By specifying both, it is possible to match packets based on both receive and transmit interface, e.g.: .Pp .Dl "ipfw add deny ip from any to any out recv ed0 xmit ed1" .Pp The .Cm recv interface can be tested on either incoming or outgoing packets, while the .Cm xmit interface can only be tested on outgoing packets. So .Cm out is required (and .Cm in is invalid) whenever .Cm xmit is used. .Pp A packet may not have a receive or transmit interface: packets originating from the local host have no receive interface, while packets destined for the local host have no transmit interface. .It Cm setup Matches TCP packets that have the SYN bit set but no ACK bit. This is the short form of .Dq Li tcpflags\ syn,!ack . .It Cm src-ip Ar ip-address Matches IP packets whose source IP is one of the address(es) specified as argument. .It Cm src-port Ar ports Matches IP packets whose source port is one of the port(s) specified as argument. .It Cm tcpack Ar ack TCP packets only. Match if the TCP header acknowledgment number field is set to .Ar ack . .It Cm tcpflags Ar spec TCP packets only. Match if the TCP header contains the comma separated list of flags specified in .Ar spec . The supported TCP flags are: .Pp .Cm fin , .Cm syn , .Cm rst , .Cm psh , .Cm ack and .Cm urg . The absence of a particular flag may be denoted with a .Ql \&! . A rule which contains a .Cm tcpflags specification can never match a fragmented packet which has a non-zero offset. See the .Cm frag option for details on matching fragmented packets. .It Cm tcpseq Ar seq TCP packets only. Match if the TCP header sequence number field is set to .Ar seq . .It Cm tcpwin Ar win TCP packets only. Match if the TCP header window field is set to .Ar win . .It Cm tcpoptions Ar spec TCP packets only. Match if the TCP header contains the comma separated list of options specified in .Ar spec . 
The supported TCP options are: .Pp .Cm mss (maximum segment size), .Cm window (tcp window advertisement), .Cm sack (selective ack), .Cm ts (rfc1323 timestamp) and .Cm cc (rfc1644 t/tcp connection count). The absence of a particular option may be denoted with a .Ql \&! . .It Cm uid Ar user Match all TCP or UDP packets sent by or received for a .Ar user . A .Ar user may be matched by name or identification number. +.It Cm verrevpath +For incoming packets, +a routing table lookup is done on the packet's source address. +If the interface on which the packet entered the system matches the +outgoing interface for the route, +the packet matches. +If the interfaces do not match up, +the packet does not match. +All outgoing packets or packets with no incoming interface match. +.Pp +The name and functionality of the option is intentionally similar to +the Cisco IOS command: +.Pp +.Dl ip verify unicast reverse-path +.Pp +This option can be used to make anti-spoofing rules. .El .Sh SETS OF RULES Each rule belongs to one of 32 different .Em sets , numbered 0 to 31. Set 31 is reserved for the default rule. .Pp By default, rules are put in set 0, unless you use the .Cm set N attribute when entering a new rule. Sets can be individually and atomically enabled or disabled, so this mechanism permits an easy way to store multiple configurations of the firewall and quickly (and atomically) switch between them. The command to enable/disable sets is .Bd -ragged -offset indent .Nm .Cm set Oo Cm disable Ar number ... Oc Op Cm enable Ar number ... .Ed .Pp where multiple .Cm enable or .Cm disable sections can be specified. Command execution is atomic on all the sets specified in the command. By default, all sets are enabled. .Pp When you disable a set, its rules behave as if they do not exist in the firewall configuration, with only one exception: .Bd -ragged -offset indent dynamic rules created from a rule before it had been disabled will still be active until they expire. 
In order to delete dynamic rules you have to explicitly delete the parent rule which generated them. .Ed .Pp The set number of rules can be changed with the command .Bd -ragged -offset indent -.Nm +.Nm .Cm set move .Brq Cm rule Ar rule-number | old-set .Cm to Ar new-set .Ed .Pp Also, you can atomically swap two rulesets with the command .Bd -ragged -offset indent .Nm .Cm set swap Ar first-set second-set .Ed .Pp See the .Sx EXAMPLES Section on some possible uses of sets of rules. .Sh STATEFUL FIREWALL Stateful operation is a way for the firewall to dynamically create rules for specific flows when packets that match a given pattern are detected. Support for stateful operation comes through the .Cm check-state , keep-state and .Cm limit options of .Nm rules. .Pp Dynamic rules are created when a packet matches a .Cm keep-state or .Cm limit rule, causing the creation of a .Em dynamic rule which will match all and only packets with a given .Em protocol between a .Em src-ip/src-port dst-ip/dst-port pair of addresses ( .Em src and .Em dst are used here only to denote the initial match addresses, but they are completely equivalent afterwards). Dynamic rules will be checked at the first .Cm check-state, keep-state or .Cm limit occurrence, and the action performed upon a match will be the same as in the parent rule. .Pp Note that no additional attributes other than protocol and IP addresses and ports are checked on dynamic rules. 
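As an illustration of the matching described above, here is a minimal C sketch. It is not the kernel's implementation: the struct flow_id layout and the function dyn_rule_matches() are hypothetical names chosen for this example. It shows that a dynamic rule compares only the protocol and the two address/port endpoints, and that the two endpoints are interchangeable once the rule exists.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flow identifier: the only fields a dynamic rule checks. */
struct flow_id {
	uint8_t  proto;		/* IP protocol number, e.g. 6 for TCP */
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Packet travels the same way as the packet that created the rule. */
static bool
same_direction(const struct flow_id *rule, const struct flow_id *pkt)
{
	return rule->src_ip == pkt->src_ip && rule->src_port == pkt->src_port &&
	    rule->dst_ip == pkt->dst_ip && rule->dst_port == pkt->dst_port;
}

/* Packet travels the opposite way (a reply to the original packet). */
static bool
reverse_direction(const struct flow_id *rule, const struct flow_id *pkt)
{
	return rule->src_ip == pkt->dst_ip && rule->src_port == pkt->dst_port &&
	    rule->dst_ip == pkt->src_ip && rule->dst_port == pkt->src_port;
}

/* A dynamic rule matches both directions of one flow and nothing else. */
bool
dyn_rule_matches(const struct flow_id *rule, const struct flow_id *pkt)
{
	if (rule->proto != pkt->proto)
		return false;
	return same_direction(rule, pkt) || reverse_direction(rule, pkt);
}
```

In this sketch a reply packet matches because its source and destination are the mirror image of the stored pair, while a packet that differs in protocol or in any single port does not match.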
.Pp The typical use of dynamic rules is to keep a closed firewall configuration, but let the first TCP SYN packet from the inside network install a dynamic rule for the flow so that packets belonging to that session will be allowed through the firewall: .Pp .Dl "ipfw add check-state" .Dl "ipfw add allow tcp from my-subnet to any setup keep-state" .Dl "ipfw add deny tcp from any to any" .Pp A similar approach can be used for UDP, where a UDP packet coming from the inside will install a dynamic rule to let the response through the firewall: .Pp .Dl "ipfw add check-state" .Dl "ipfw add allow udp from my-subnet to any keep-state" .Dl "ipfw add deny udp from any to any" .Pp Dynamic rules expire after some time, which depends on the status of the flow and the setting of some .Cm sysctl variables. See Section .Sx SYSCTL VARIABLES for more details. For TCP sessions, dynamic rules can be instructed to periodically send keepalive packets to refresh the state of the rule when it is about to expire. .Pp See Section .Sx EXAMPLES for more examples on how to use dynamic rules. .Sh TRAFFIC SHAPER (DUMMYNET) CONFIGURATION .Nm is also the user interface for the .Xr dummynet 4 traffic shaper. .Pp .Nm dummynet operates by first using the firewall to classify packets and divide them into .Em flows , using any match pattern that can be used in .Nm rules. Depending on local policies, a flow can contain packets for a single TCP connection, or from/to a given host, or entire subnet, or a protocol type, etc. .Pp Packets belonging to the same flow are then passed to either of two different objects, which implement the traffic regulation: .Bl -hang -offset XXXX .It Em pipe A pipe emulates a link with given bandwidth, propagation delay, queue size and packet loss rate. Packets are queued in front of the pipe as they come out from the classifier, and then transferred to the pipe according to the pipe's parameters.
.Pp .It Em queue A queue is an abstraction used to implement the WF2Q+ (Worst-case Fair Weighted Fair Queueing) policy, which is an efficient variant of the WFQ policy. .br The queue associates a .Em weight and a reference pipe to each flow, and then all backlogged (i.e., with packets queued) flows linked to the same pipe share the pipe's bandwidth proportionally to their weights. Note that weights are not priorities; a flow with a lower weight is still guaranteed to get its fraction of the bandwidth even if a flow with a higher weight is permanently backlogged. .Pp .El In practice, .Em pipes can be used to set hard limits to the bandwidth that a flow can use, whereas .Em queues can be used to determine how different flows share the available bandwidth. .Pp The .Em pipe and .Em queue configuration commands are the following: .Bd -ragged -offset indent .Cm pipe Ar number Cm config Ar pipe-configuration .Pp .Cm queue Ar number Cm config Ar queue-configuration .Ed .Pp The following parameters can be configured for a pipe: .Pp .Bl -tag -width indent -compact .It Cm bw Ar bandwidth | device Bandwidth, measured in .Sm off .Op Cm K | M .Brq Cm bit/s | Byte/s . .Sm on .Pp A value of 0 (default) means unlimited bandwidth. The unit must immediately follow the number, as in .Pp .Dl "ipfw pipe 1 config bw 300Kbit/s" .Pp If a device name is specified instead of a numeric value, as in .Pp .Dl "ipfw pipe 1 config bw tun0" .Pp then the transmit clock is supplied by the specified device. At the moment only the .Xr tun 4 device supports this functionality, for use in conjunction with .Xr ppp 8 . .Pp .It Cm delay Ar ms-delay Propagation delay, measured in milliseconds. The value is rounded to the next multiple of the clock tick (typically 10ms, but it is a good practice to run kernels with .Dq "options HZ=1000" to reduce the granularity to 1ms or less). Default value is 0, meaning no delay.
.El .Pp The following parameters can be configured for a queue: .Pp .Bl -tag -width indent -compact .It Cm pipe Ar pipe_nr Connects a queue to the specified pipe. Multiple queues (with the same or different weights) can be connected to the same pipe, which specifies the aggregate rate for the set of queues. .Pp .It Cm weight Ar weight Specifies the weight to be used for flows matching this queue. The weight must be in the range 1..100, and defaults to 1. .El .Pp Finally, the following parameters can be configured for both pipes and queues: .Pp .Bl -tag -width XXXX -compact .Pp .It Cm buckets Ar hash-table-size Specifies the size of the hash table used for storing the various queues. Default value is 64 controlled by the .Xr sysctl 8 variable .Em net.inet.ip.dummynet.hash_size , allowed range is 16 to 65536. .Pp .It Cm mask Ar mask-specifier Packets sent to a given pipe or queue by an .Nm rule can be further classified into multiple flows, each of which is then sent to a different .Em dynamic pipe or queue. A flow identifier is constructed by masking the IP addresses, ports and protocol types as specified with the .Cm mask options in the configuration of the pipe or queue. For each different flow identifier, a new pipe or queue is created with the same parameters as the original object, and matching packets are sent to it. .Pp Thus, when .Em dynamic pipes are used, each flow will get the same bandwidth as defined by the pipe, whereas when .Em dynamic queues are used, each flow will share the parent's pipe bandwidth evenly with other flows generated by the same queue (note that other queues with different weights might be connected to the same pipe). .br Available mask specifiers are a combination of one or more of the following: .Pp .Cm dst-ip Ar mask , .Cm src-ip Ar mask , .Cm dst-port Ar mask , .Cm src-port Ar mask , .Cm proto Ar mask or .Cm all , .Pp where the latter means all bits in all fields are significant. 
.Pp .It Cm noerror When a packet is dropped by a dummynet queue or pipe, the error is normally reported to the caller routine in the kernel, in the same way as it happens when a device queue fills up. Setting this option reports the packet as successfully delivered, which can be needed for some experimental setups where you want to simulate loss or congestion at a remote router. .Pp .It Cm plr Ar packet-loss-rate Packet loss rate. Argument .Ar packet-loss-rate is a floating-point number between 0 and 1, with 0 meaning no loss, 1 meaning 100% loss. The loss rate is internally represented on 31 bits. .Pp .It Cm queue Brq Ar slots | size Ns Cm Kbytes Queue size, in .Ar slots or .Cm KBytes . Default value is 50 slots, which is the typical queue size for Ethernet devices. Note that for slow speed links you should keep the queue size short or your traffic might be affected by a significant queueing delay. E.g., 50 max-sized ethernet packets (1500 bytes) mean 600Kbit or 20s of queue on a 30Kbit/s pipe. Even worse effect can result if you get packets from an interface with a much larger MTU, e.g. the loopback interface with its 16KB packets. .Pp .It Cm red | gred Ar w_q Ns / Ns Ar min_th Ns / Ns Ar max_th Ns / Ns Ar max_p Make use of the RED (Random Early Detection) queue management algorithm. .Ar w_q and .Ar max_p are floating point numbers between 0 and 1 (0 not included), while .Ar min_th and .Ar max_th are integer numbers specifying thresholds for queue management (thresholds are computed in bytes if the queue has been defined in bytes, in slots otherwise). The .Xr dummynet 4 also supports the gentle RED variant (gred). 
Three .Xr sysctl 8 variables can be used to control the RED behaviour: .Bl -tag -width indent .It Em net.inet.ip.dummynet.red_lookup_depth specifies the accuracy in computing the average queue when the link is idle (defaults to 256, must be greater than zero) .It Em net.inet.ip.dummynet.red_avg_pkt_size specifies the expected average packet size (defaults to 512, must be greater than zero) .It Em net.inet.ip.dummynet.red_max_pkt_size specifies the expected maximum packet size, only used when queue thresholds are in bytes (defaults to 1500, must be greater than zero). .El .El .Sh CHECKLIST Here are some important points to consider when designing your rules: .Bl -bullet .It Remember that you filter both packets going .Cm in and .Cm out . Most connections need packets going in both directions. .It Remember to test very carefully. It is a good idea to be near the console when doing this. If you cannot be near the console, use an auto-recovery script such as the one in .Pa /usr/share/examples/ipfw/change_rules.sh . .It Don't forget the loopback interface. .El .Sh FINE POINTS .Bl -bullet .It There are circumstances where fragmented datagrams are unconditionally dropped. TCP packets are dropped if they do not contain at least 20 bytes of TCP header, UDP packets are dropped if they do not contain a full 8 byte UDP header, and ICMP packets are dropped if they do not contain 4 bytes of ICMP header, enough to specify the ICMP type, code, and checksum. These packets are simply logged as .Dq pullup failed since there may not be enough good data in the packet to produce a meaningful log entry. .It Another type of packet is unconditionally dropped, a TCP packet with a fragment offset of one. This is a valid packet, but it only has one use, to try to circumvent firewalls. When logging is enabled, these packets are reported as being dropped by rule -1. 
.It If you are logged in over a network, loading the .Xr kld 4 version of .Nm is probably not as straightforward as you would think. I recommend the following command line: .Bd -literal -offset indent kldload ipfw && \e ipfw add 32000 allow ip from any to any .Ed .Pp Along the same lines, doing an .Bd -literal -offset indent ipfw flush .Ed .Pp in similar surroundings is also a bad idea. .It The .Nm filter list may not be modified if the system security level is set to 3 or higher (see .Xr init 8 for information on system security levels). .El .Sh PACKET DIVERSION A .Xr divert 4 socket bound to the specified port will receive all packets diverted to that port. If no socket is bound to the destination port, or if the kernel wasn't compiled with divert socket support, the packets are dropped. .Sh SYSCTL VARIABLES A set of .Xr sysctl 8 variables controls the behaviour of the firewall and associated modules ( .Nm dummynet, bridge ). These are shown below together with their default value (but always check with the .Xr sysctl 8 command what value is actually in use) and meaning: .Bl -tag -width indent .It Em net.inet.ip.dummynet.expire : No 1 Lazily delete dynamic pipes/queue once they have no pending traffic. You can disable this by setting the variable to 0, in which case the pipes/queues will only be deleted when the threshold is reached. .It Em net.inet.ip.dummynet.hash_size : No 64 Default size of the hash table used for dynamic pipes/queues. This value is used when no .Cm buckets option is specified when configuring a pipe/queue. .It Em net.inet.ip.dummynet.max_chain_len : No 16 Target value for the maximum number of pipes/queues in a hash bucket. The product .Cm max_chain_len*hash_size is used to determine the threshold over which empty pipes/queues will be expired even when .Cm net.inet.ip.dummynet.expire=0 . 
.It Em net.inet.ip.dummynet.red_lookup_depth : No 256 .It Em net.inet.ip.dummynet.red_avg_pkt_size : No 512 .It Em net.inet.ip.dummynet.red_max_pkt_size : No 1500 Parameters used in the computations of the drop probability for the RED algorithm. .It Em net.inet.ip.fw.autoinc_step : No 100 Delta between rule numbers when auto-generating them. The value must be in the range 1..1000. .It Em net.inet.ip.fw.curr_dyn_buckets : Em net.inet.ip.fw.dyn_buckets The current number of buckets in the hash table for dynamic rules (readonly). .It Em net.inet.ip.fw.debug : No 1 Controls debugging messages produced by .Nm . .It Em net.inet.ip.fw.dyn_buckets : No 256 The number of buckets in the hash table for dynamic rules. Must be a power of 2, up to 65536. It only takes effect when all dynamic rules have expired, so you are advised to use a .Cm flush command to make sure that the hash table is resized. .It Em net.inet.ip.fw.dyn_count : No 3 Current number of dynamic rules (read-only). .It Em net.inet.ip.fw.dyn_keepalive : No 1 Enables generation of keepalive packets for .Cm keep-state rules on TCP sessions. A keepalive is generated to both sides of the connection every 5 seconds for the last 20 seconds of the lifetime of the rule. .It Em net.inet.ip.fw.dyn_max : No 8192 Maximum number of dynamic rules. When you hit this limit, no more dynamic rules can be installed until old ones expire. .It Em net.inet.ip.fw.dyn_ack_lifetime : No 300 .It Em net.inet.ip.fw.dyn_syn_lifetime : No 20 .It Em net.inet.ip.fw.dyn_fin_lifetime : No 1 .It Em net.inet.ip.fw.dyn_rst_lifetime : No 1 .It Em net.inet.ip.fw.dyn_udp_lifetime : No 5 .It Em net.inet.ip.fw.dyn_short_lifetime : No 30 These variables control the lifetime, in seconds, of dynamic rules. Upon the initial SYN exchange the lifetime is kept short, then increased after both SYN have been seen, then decreased again during the final FIN exchange or when a RST is received. 
Both .Em dyn_fin_lifetime and .Em dyn_rst_lifetime must be strictly lower than 5 seconds, the period of repetition of keepalives. The firewall enforces that. .It Em net.inet.ip.fw.enable : No 1 Enables the firewall. Setting this variable to 0 lets you run your machine without the firewall even if it is compiled in. .It Em net.inet.ip.fw.one_pass : No 1 When set, the packet exiting from the .Xr dummynet 4 pipe is not passed through the firewall again. Otherwise, after a pipe action, the packet is reinjected into the firewall at the next rule. .It Em net.inet.ip.fw.verbose : No 1 Enables verbose messages. .It Em net.inet.ip.fw.verbose_limit : No 0 Limits the number of messages produced by a verbose firewall. .It Em net.link.ether.ipfw : No 0 Controls whether layer-2 packets are passed to .Nm . Default is no. .It Em net.link.ether.bridge_ipfw : No 0 Controls whether bridged packets are passed to .Nm . Default is no. .El .Sh USING IPFW2 IN FreeBSD-STABLE .Nm ipfw2 is standard in .Fx CURRENT, whereas .Fx STABLE still uses .Nm ipfw1 unless the kernel is compiled with .Cm options IPFW2 , and .Nm /sbin/ipfw and .Nm /usr/lib/libalias are recompiled with .Cm -DIPFW2 and reinstalled (the same effect can be achieved by adding .Cm IPFW2=TRUE to .Nm /etc/make.conf before a buildworld). .Pp .Sh IPFW2 ENHANCEMENTS This Section lists the features that have been introduced in .Nm ipfw2 which were not present in .Nm ipfw1 . We list them in order of the potential impact that they can have in writing your rulesets. You might want to consider using these features in order to write your rulesets in a more efficient way. .Bl -tag -width indent .It Handling of non-IPv4 packets .Nm ipfw1 will silently accept all non-IPv4 packets (which .Nm ipfw1 will only see when .Em net.link.ether.bridge_ipfw=1 Ns ). .Nm ipfw2 will filter all packets (including non-IPv4 ones) according to the ruleset.
To achieve the same behaviour as .Nm ipfw1 you can use the following as the very first rule in your ruleset: .Pp .Dl "ipfw add 1 allow layer2 not mac-type ip" .Pp The .Cm layer2 option might seem redundant, but it is necessary -- packets passed to the firewall from layer3 will not have a MAC header, so the .Cm mac-type ip pattern will always fail on them, and the .Cm not operator will make this rule into a pass-all. .It Address sets .Nm ipfw1 does not support address sets (those in the form .Ar addr/masklen{num,num,...} ). .Pp .It Port specifications .Nm ipfw1 only allows one port range when specifying TCP and UDP ports, and is limited to 10 entries instead of the 15 allowed by .Nm ipfw2 . Also, in .Nm ipfw1 you can only specify ports when the rule is requesting .Cm tcp or .Cm udp packets. With .Nm ipfw2 you can put port specifications in rules matching all packets, and the match will be attempted only on those packets carrying protocols which include port identifiers. .Pp Finally, .Nm ipfw1 allowed the first port entry to be specified as .Ar port:mask where .Ar mask can be an arbitrary 16-bit mask. This syntax is of questionable usefulness and it is not supported anymore in .Nm ipfw2 . .It Or-blocks .Nm ipfw1 does not support Or-blocks. .It keepalives .Nm ipfw1 does not generate keepalives for stateful sessions. As a consequence, it might cause idle sessions to drop because the lifetime of the dynamic rules expires. .It Sets of rules .Nm ipfw1 does not implement sets of rules. .It MAC header filtering and Layer-2 firewalling. .Nm ipfw1 does not implement filtering on MAC header fields, nor is it invoked on packets from .Cm ether_demux() and .Cm ether_output_frame(). The sysctl variable .Em net.link.ether.ipfw has no effect there.
.It Options +In +.Nm ipfw1 , +the following options only accept a single value as an argument: +.Pp +.Cm ipid, iplen, ipttl +.Pp The following options are not implemented by .Nm ipfw1 : .Pp .Cm dst-ip, dst-port, layer2, mac, mac-type, src-ip, src-port. .Pp Additionally, the RELENG_4 version of .Nm ipfw1 does not implement the following options: .Pp .Cm ipid, iplen, ipprecedence, iptos, ipttl, .Cm ipversion, tcpack, tcpseq, tcpwin . .It Dummynet options The following option for .Nm dummynet pipes/queues is not supported: .Cm noerror . +.Pp +.It Preprocessor options +.Nm ipfw1 +only supports the +.Oo Fl D Ar macro Oo = value Oc Oc +and +.Op Fl U Ar macro +options in conjunction with the +.Fl p Ar preproc +flag. .El .Sh EXAMPLES There are far too many possible uses of .Nm so this Section will only give a small set of examples. .Pp .Ss BASIC PACKET FILTERING This command adds an entry which denies all tcp packets from .Em cracker.evil.org to the telnet port of .Em wolf.tambov.su from being forwarded by the host: .Pp .Dl "ipfw add deny tcp from cracker.evil.org to wolf.tambov.su telnet" .Pp This one disallows any connection from the entire cracker's network to my host: .Pp .Dl "ipfw add deny ip from 123.45.67.0/24 to my.host.org" .Pp A first and efficient way to limit access (not using dynamic rules) is the use of the following rules: .Pp .Dl "ipfw add allow tcp from any to any established" .Dl "ipfw add allow tcp from net1 portlist1 to net2 portlist2 setup" .Dl "ipfw add allow tcp from net3 portlist3 to net3 portlist3 setup" .Dl "..." .Dl "ipfw add deny tcp from any to any" .Pp The first rule will be a quick match for normal TCP packets, but it will not match the initial SYN packet, which will be matched by the .Cm setup rules only for selected source/destination pairs. All other SYN packets will be rejected by the final .Cm deny rule. 
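The division of labour between the established and setup rules above can be made concrete with a short C sketch. It assumes the flag tests given in the option descriptions: setup is SYN set with ACK clear, and established is RST or ACK set. The FLAG_* macros and the two function names are local to this example and are not taken from the ipfw sources; only the bit values are the standard TCP header flag bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits; the FLAG_* names are local to this sketch. */
#define FLAG_SYN 0x02
#define FLAG_RST 0x04
#define FLAG_ACK 0x10

/* "setup": SYN set and ACK clear, i.e. tcpflags syn,!ack. */
bool
matches_setup(uint8_t flags)
{
	return (flags & FLAG_SYN) != 0 && (flags & FLAG_ACK) == 0;
}

/* "established": RST or ACK set. */
bool
matches_established(uint8_t flags)
{
	return (flags & (FLAG_RST | FLAG_ACK)) != 0;
}
```

Under these definitions an initial SYN satisfies only the setup test, while the SYN+ACK reply and every later segment of the connection satisfy only the established test, which is why the first rule cannot admit a new connection by itself.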
.Pp If you administer one or more subnets, you can take advantage of the .Nm ipfw2 syntax to specify address sets and or-blocks and write extremely compact rulesets which selectively enable services to blocks of clients, as below: .Pp .Dl "goodguys=\*q{ 10.1.2.0/24{20,35,66,18} or 10.2.3.0/28{6,3,11} }\*q" .Dl "badguys=\*q10.1.2.0/24{8,38,60}\*q" .Dl "" .Dl "ipfw add allow ip from ${goodguys} to any" .Dl "ipfw add deny ip from ${badguys} to any" .Dl "... normal policies ..." .Pp The .Nm ipfw1 syntax would require a separate rule for each IP in the above example. +.Pp +The +.Cm verrevpath +option could be used to do automated anti-spoofing by adding the +following to the top of a ruleset: +.Pp +.Dl "ipfw add deny ip from any to any not verrevpath in" +.Pp +This rule drops all incoming packets that appear to be coming to the +system on the wrong interface. For example, a packet with a source +address belonging to a host on a protected internal network would be +dropped if it tried to enter the system from an external interface. .Ss DYNAMIC RULES In order to protect a site from flood attacks involving fake TCP packets, it is safer to use dynamic rules: .Pp .Dl "ipfw add check-state" .Dl "ipfw add deny tcp from any to any established" .Dl "ipfw add allow tcp from my-net to any setup keep-state" .Pp This will let the firewall install dynamic rules only for those connections which start with a regular SYN packet coming from the inside of our network. Dynamic rules are checked when encountering the first .Cm check-state or .Cm keep-state rule. A .Cm check-state rule should usually be placed near the beginning of the ruleset to minimize the amount of work scanning the ruleset. Your mileage may vary.
.Pp To limit the number of connections a user can open you can use the following type of rules: .Pp .Dl "ipfw add allow tcp from my-net/24 to any setup limit src-addr 10" .Dl "ipfw add allow tcp from any to me setup limit src-addr 4" .Pp The former (assuming it runs on a gateway) will allow each host on a /24 network to open at most 10 TCP connections. The latter can be placed on a server to make sure that a single client does not use more than 4 simultaneous connections. .Pp .Em BEWARE : stateful rules can be subject to denial-of-service attacks by a SYN-flood which opens a huge number of dynamic rules. The effects of such attacks can be partially limited by acting on a set of .Xr sysctl 8 variables which control the operation of the firewall. .Pp Here is a good usage of the .Cm list command to see accounting records and timestamp information: .Pp .Dl ipfw -at list .Pp or in short form without timestamps: .Pp .Dl ipfw -a list .Pp which is equivalent to: .Pp .Dl ipfw show .Pp Next rule diverts all incoming packets from 192.168.2.0/24 to divert port 5000: .Pp .Dl ipfw divert 5000 ip from 192.168.2.0/24 to any in .Pp .Ss TRAFFIC SHAPING The following rules show some of the applications of .Nm and .Xr dummynet 4 for simulations and the like. .Pp This rule drops random incoming packets with a probability of 5%: .Pp .Dl "ipfw add prob 0.05 deny ip from any to any in" .Pp A similar effect can be achieved making use of dummynet pipes: .Pp .Dl "ipfw add pipe 10 ip from any to any" .Dl "ipfw pipe 10 config plr 0.05" .Pp We can use pipes to artificially limit bandwidth, e.g. on a machine acting as a router, if we want to limit traffic from local clients on 192.168.2.0/24 we do: .Pp .Dl "ipfw add pipe 1 ip from 192.168.2.0/24 to any out" .Dl "ipfw pipe 1 config bw 300Kbit/s queue 50KBytes" .Pp note that we use the .Cm out modifier so that the rule is not used twice. Remember in fact that .Nm rules are checked both on incoming and outgoing packets. 
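When choosing the queue size for a bandwidth-limited pipe such as the one above, it can help to estimate the worst-case queueing delay, that is, the time a full queue needs to drain at the configured rate. The C sketch below does only that arithmetic. The function name queue_drain_ms() is hypothetical, and the caller is expected to convert the configured values to plain bytes and bits per second first.

```c
#include <stdint.h>

/*
 * Worst-case time, in milliseconds, for a full queue of queue_bytes to
 * drain through a pipe running at bw_bits_per_sec.  A bandwidth of 0
 * means "unlimited" in a pipe configuration, so no delay is reported.
 * The result is truncated to whole milliseconds.
 */
uint64_t
queue_drain_ms(uint64_t queue_bytes, uint64_t bw_bits_per_sec)
{
	if (bw_bits_per_sec == 0)
		return 0;
	return queue_bytes * 8 * 1000 / bw_bits_per_sec;
}
```

With the figures used in the description of the queue parameter, queue_drain_ms(50 * 1500, 30000) gives 20000, i.e. the 20 seconds quoted there. For the pipe above, and assuming that 1 KByte is 1024 bytes and 1 Kbit/s is 1000 bit/s, queue_drain_ms(50 * 1024, 300000) gives 1365, so a full queue adds a little over 1.3 seconds of delay.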
.Pp Should we want to simulate a bidirectional link with bandwidth limitations, the correct way is the following: .Pp .Dl "ipfw add pipe 1 ip from any to any out" .Dl "ipfw add pipe 2 ip from any to any in" .Dl "ipfw pipe 1 config bw 64Kbit/s queue 10Kbytes" .Dl "ipfw pipe 2 config bw 64Kbit/s queue 10Kbytes" .Pp The above can be very useful, e.g. if you want to see how your fancy Web page will look for a residential user who is connected only through a slow link. You should not use only one pipe for both directions, unless you want to simulate a half-duplex medium (e.g. AppleTalk, Ethernet, IRDA). It is not necessary that both pipes have the same configuration, so we can also simulate asymmetric links. .Pp Should we want to verify network performance with the RED queue management algorithm: .Pp .Dl "ipfw add pipe 1 ip from any to any" .Dl "ipfw pipe 1 config bw 500Kbit/s queue 100 red 0.002/30/80/0.1" .Pp Another typical application of the traffic shaper is to introduce some delay in the communication. This can significantly affect applications which do a lot of Remote Procedure Calls, and where the round-trip-time of the connection often becomes a limiting factor much more than bandwidth: .Pp .Dl "ipfw add pipe 1 ip from any to any out" .Dl "ipfw add pipe 2 ip from any to any in" .Dl "ipfw pipe 1 config delay 250ms bw 1Mbit/s" .Dl "ipfw pipe 2 config delay 250ms bw 1Mbit/s" .Pp Per-flow queueing can be useful for a variety of purposes. A very simple one is counting traffic: .Pp .Dl "ipfw add pipe 1 tcp from any to any" .Dl "ipfw add pipe 1 udp from any to any" .Dl "ipfw add pipe 1 ip from any to any" .Dl "ipfw pipe 1 config mask all" .Pp The above set of rules will create queues (and collect statistics) for all traffic. Because the pipes have no limitations, the only effect is collecting statistics. 
Note that we need 3 rules, not just the last one, because when .Nm tries to match IP packets it will not consider ports, so we would not see connections on separate ports as different ones. .Pp A more sophisticated example is limiting the outbound traffic on a net with per-host limits, rather than per-network limits: .Pp .Dl "ipfw add pipe 1 ip from 192.168.2.0/24 to any out" .Dl "ipfw add pipe 2 ip from any to 192.168.2.0/24 in" .Dl "ipfw pipe 1 config mask src-ip 0x000000ff bw 200Kbit/s queue 20Kbytes" .Dl "ipfw pipe 2 config mask dst-ip 0x000000ff bw 200Kbit/s queue 20Kbytes" .Ss SETS OF RULES To add a set of rules atomically, e.g. set 18: .Pp .Dl "ipfw disable set 18" .Dl "ipfw add NN set 18 ... # repeat as needed" .Dl "ipfw enable set 18" .Pp To delete a set of rules atomically the command is simply: .Pp .Dl "ipfw delete set 18" .Pp To test a ruleset and disable it and regain control if something goes wrong: .Pp .Dl "ipfw disable set 18" .Dl "ipfw add NN set 18 ... # repeat as needed" .Dl "ipfw enable set 18 ; echo done; sleep 30 && ipfw disable set 18" .Pp Here if everything goes well, you press control-C before the "sleep" terminates, and your ruleset will be left active. Otherwise, e.g. if you cannot access your box, the ruleset will be disabled after the sleep terminates thus restoring the previous situation. .Sh SEE ALSO .Xr cpp 1 , .Xr m4 1 , .Xr bridge 4 , .Xr divert 4 , .Xr dummynet 4 , .Xr ip 4 , .Xr ipfirewall 4 , .Xr protocols 5 , .Xr services 5 , .Xr init 8 , .Xr kldload 8 , .Xr reboot 8 , .Xr sysctl 8 , .Xr syslogd 8 .Sh BUGS The syntax has grown over the years and sometimes it might be confusing. Unfortunately, backward compatibility prevents cleaning up mistakes made in the definition of the syntax. .Pp .Em !!! WARNING !!! .Pp Misconfiguring the firewall can put your computer in an unusable state, possibly shutting down network services and requiring console access to regain control of it. 
.Pp Incoming packet fragments diverted by .Cm divert or .Cm tee are reassembled before delivery to the socket. The action used on those packet is the one from the rule which matches the first fragment of the packet. .Pp Packets that match a .Cm tee rule should not be immediately accepted, but should continue going through the rule list. This may be fixed in a later version. .Pp Packets diverted to userland, and then reinserted by a userland process (such as .Xr natd 8 ) will lose various packet attributes, including their source interface. If a packet is reinserted in this manner, later rules may be incorrectly applied, making the order of .Cm divert rules in the rule sequence very important. .Sh AUTHORS .An Ugen J. S. Antsilevich , .An Poul-Henning Kamp , .An Alex Nash , .An Archie Cobbs , .An Luigi Rizzo . .Pp .An -nosplit API based upon code written by .An Daniel Boulet for BSDI. .Pp Work on .Xr dummynet 4 traffic shaper supported by Akamba Corp. .Sh HISTORY The .Nm utility first appeared in .Fx 2.0 . .Xr dummynet 4 was introduced in .Fx 2.2.8 . Stateful extensions were introduced in .Fx 4.0 . .Nm ipfw2 was introduced in Summer 2002. Index: stable/4/sbin/ipfw/ipfw2.c =================================================================== --- stable/4/sbin/ipfw/ipfw2.c (revision 116991) +++ stable/4/sbin/ipfw/ipfw2.c (revision 116992) @@ -1,3635 +1,3732 @@ /* * Copyright (c) 2002 Luigi Rizzo * Copyright (c) 1996 Alex Nash, Paul Traina, Poul-Henning Kamp * Copyright (c) 1994 Ugen J.S.Antsilevich * * Idea and grammar partially left from: * Copyright (c) 1993 Daniel Boulet * * Redistribution and use in source forms, with and without modification, * are permitted provided that this entire comment appears intact. * * Redistribution in binary form may occur without any restrictions. * Obviously, it would be nice if you gave credit where credit is due * but requiring it would be too onerous. * * This software is provided ``AS IS'' without any warranties of any kind. 
* * NEW command line interface for IP firewall facility * * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* def. of struct route */ #include #include #include int s, /* main RAW socket */ do_resolv, /* Would try to resolve all */ do_acct, /* Show packet/byte count */ do_time, /* Show time stamps */ do_quiet, /* Be quiet in add and flush */ do_force, /* Don't ask for confirmation */ do_pipe, /* this cmd refers to a pipe */ do_sort, /* field to sort results (0 = no) */ do_dynamic, /* display dynamic rules */ do_expired, /* display expired dynamic rules */ do_compact, /* show rules in compact mode */ show_sets, /* display rule sets */ verbose; #define IP_MASK_ALL 0xffffffff /* * structure to hold flag names and associated values to be * set in the appropriate masks. * A NULL string terminates the array. * Often, an element with 0 value contains an error string. * */ struct _s_x { char *s; int x; }; static struct _s_x f_tcpflags[] = { { "syn", TH_SYN }, { "fin", TH_FIN }, { "ack", TH_ACK }, { "psh", TH_PUSH }, { "rst", TH_RST }, { "urg", TH_URG }, { "tcp flag", 0 }, { NULL, 0 } }; static struct _s_x f_tcpopts[] = { { "mss", IP_FW_TCPOPT_MSS }, { "maxseg", IP_FW_TCPOPT_MSS }, { "window", IP_FW_TCPOPT_WINDOW }, { "sack", IP_FW_TCPOPT_SACK }, { "ts", IP_FW_TCPOPT_TS }, { "timestamp", IP_FW_TCPOPT_TS }, { "cc", IP_FW_TCPOPT_CC }, { "tcp option", 0 }, { NULL, 0 } }; /* * IP options span the range 0 to 255 so we need to remap them * (though in fact only the low 5 bits are significant). 
*/ static struct _s_x f_ipopts[] = { { "ssrr", IP_FW_IPOPT_SSRR}, { "lsrr", IP_FW_IPOPT_LSRR}, { "rr", IP_FW_IPOPT_RR}, { "ts", IP_FW_IPOPT_TS}, { "ip option", 0 }, { NULL, 0 } }; static struct _s_x f_iptos[] = { { "lowdelay", IPTOS_LOWDELAY}, { "throughput", IPTOS_THROUGHPUT}, { "reliability", IPTOS_RELIABILITY}, { "mincost", IPTOS_MINCOST}, { "congestion", IPTOS_CE}, { "ecntransport", IPTOS_ECT}, { "ip tos option", 0}, { NULL, 0 } }; static struct _s_x limit_masks[] = { {"all", DYN_SRC_ADDR|DYN_SRC_PORT|DYN_DST_ADDR|DYN_DST_PORT}, {"src-addr", DYN_SRC_ADDR}, {"src-port", DYN_SRC_PORT}, {"dst-addr", DYN_DST_ADDR}, {"dst-port", DYN_DST_PORT}, {NULL, 0} }; /* * we use IPPROTO_ETHERTYPE as a fake protocol id to call the print routines * This is only used in this code. */ #define IPPROTO_ETHERTYPE 0x1000 static struct _s_x ether_types[] = { /* * Note, we cannot use "-:&/" in the names because they are field * separators in the type specifications. Also, we use s = NULL as * end-delimiter, because a type of 0 can be legal. 
*/ { "ip", 0x0800 }, { "ipv4", 0x0800 }, { "ipv6", 0x86dd }, { "arp", 0x0806 }, { "rarp", 0x8035 }, { "vlan", 0x8100 }, { "loop", 0x9000 }, { "trail", 0x1000 }, { "at", 0x809b }, { "atalk", 0x809b }, { "aarp", 0x80f3 }, { "pppoe_disc", 0x8863 }, { "pppoe_sess", 0x8864 }, { "ipx_8022", 0x00E0 }, { "ipx_8023", 0x0000 }, { "ipx_ii", 0x8137 }, { "ipx_snap", 0x8137 }, { "ipx", 0x8137 }, { "ns", 0x0600 }, { NULL, 0 } }; static void show_usage(void); enum tokens { TOK_NULL=0, TOK_OR, TOK_NOT, TOK_STARTBRACE, TOK_ENDBRACE, TOK_ACCEPT, TOK_COUNT, TOK_PIPE, TOK_QUEUE, TOK_DIVERT, TOK_TEE, TOK_FORWARD, TOK_SKIPTO, TOK_DENY, TOK_REJECT, TOK_RESET, TOK_UNREACH, TOK_CHECKSTATE, TOK_UID, TOK_GID, TOK_IN, TOK_LIMIT, TOK_KEEPSTATE, TOK_LAYER2, TOK_OUT, TOK_XMIT, TOK_RECV, TOK_VIA, TOK_FRAG, TOK_IPOPTS, TOK_IPLEN, TOK_IPID, TOK_IPPRECEDENCE, TOK_IPTOS, TOK_IPTTL, TOK_IPVER, TOK_ESTAB, TOK_SETUP, TOK_TCPFLAGS, TOK_TCPOPTS, TOK_TCPSEQ, TOK_TCPACK, TOK_TCPWIN, TOK_ICMPTYPES, TOK_MAC, TOK_MACTYPE, + TOK_VERREVPATH, TOK_PLR, TOK_NOERROR, TOK_BUCKETS, TOK_DSTIP, TOK_SRCIP, TOK_DSTPORT, TOK_SRCPORT, TOK_ALL, TOK_MASK, TOK_BW, TOK_DELAY, TOK_RED, TOK_GRED, TOK_DROPTAIL, TOK_PROTO, TOK_WEIGHT, }; struct _s_x dummynet_params[] = { { "plr", TOK_PLR }, { "noerror", TOK_NOERROR }, { "buckets", TOK_BUCKETS }, { "dst-ip", TOK_DSTIP }, { "src-ip", TOK_SRCIP }, { "dst-port", TOK_DSTPORT }, { "src-port", TOK_SRCPORT }, { "proto", TOK_PROTO }, { "weight", TOK_WEIGHT }, { "all", TOK_ALL }, { "mask", TOK_MASK }, { "droptail", TOK_DROPTAIL }, { "red", TOK_RED }, { "gred", TOK_GRED }, { "bw", TOK_BW }, { "bandwidth", TOK_BW }, { "delay", TOK_DELAY }, { "pipe", TOK_PIPE }, { "queue", TOK_QUEUE }, { "dummynet-params", TOK_NULL }, { NULL, 0 } }; struct _s_x rule_actions[] = { { "accept", TOK_ACCEPT }, { "pass", TOK_ACCEPT }, { "allow", TOK_ACCEPT }, { "permit", TOK_ACCEPT }, { "count", TOK_COUNT }, { "pipe", TOK_PIPE }, { "queue", TOK_QUEUE }, { "divert", TOK_DIVERT }, { "tee", TOK_TEE }, { "fwd", 
TOK_FORWARD }, { "forward", TOK_FORWARD }, { "skipto", TOK_SKIPTO }, { "deny", TOK_DENY }, { "drop", TOK_DENY }, { "reject", TOK_REJECT }, { "reset", TOK_RESET }, { "unreach", TOK_UNREACH }, { "check-state", TOK_CHECKSTATE }, { NULL, TOK_NULL }, { NULL, 0 } }; struct _s_x rule_options[] = { { "uid", TOK_UID }, { "gid", TOK_GID }, { "in", TOK_IN }, { "limit", TOK_LIMIT }, { "keep-state", TOK_KEEPSTATE }, { "bridged", TOK_LAYER2 }, { "layer2", TOK_LAYER2 }, { "out", TOK_OUT }, { "xmit", TOK_XMIT }, { "recv", TOK_RECV }, { "via", TOK_VIA }, { "fragment", TOK_FRAG }, { "frag", TOK_FRAG }, { "ipoptions", TOK_IPOPTS }, { "ipopts", TOK_IPOPTS }, { "iplen", TOK_IPLEN }, { "ipid", TOK_IPID }, { "ipprecedence", TOK_IPPRECEDENCE }, { "iptos", TOK_IPTOS }, { "ipttl", TOK_IPTTL }, { "ipversion", TOK_IPVER }, { "ipver", TOK_IPVER }, { "estab", TOK_ESTAB }, { "established", TOK_ESTAB }, { "setup", TOK_SETUP }, { "tcpflags", TOK_TCPFLAGS }, { "tcpflgs", TOK_TCPFLAGS }, { "tcpoptions", TOK_TCPOPTS }, { "tcpopts", TOK_TCPOPTS }, { "tcpseq", TOK_TCPSEQ }, { "tcpack", TOK_TCPACK }, { "tcpwin", TOK_TCPWIN }, { "icmptype", TOK_ICMPTYPES }, { "icmptypes", TOK_ICMPTYPES }, { "dst-ip", TOK_DSTIP }, { "src-ip", TOK_SRCIP }, { "dst-port", TOK_DSTPORT }, { "src-port", TOK_SRCPORT }, { "proto", TOK_PROTO }, { "MAC", TOK_MAC }, { "mac", TOK_MAC }, { "mac-type", TOK_MACTYPE }, + { "verrevpath", TOK_VERREVPATH }, { "not", TOK_NOT }, /* pseudo option */ { "!", /* escape ? 
*/ TOK_NOT },	/* pseudo option */
 	{ "or", TOK_OR },	/* pseudo option */
 	{ "|", /* escape */ TOK_OR },	/* pseudo option */
 	{ "{", TOK_STARTBRACE },	/* pseudo option */
 	{ "(", TOK_STARTBRACE },	/* pseudo option */
 	{ "}", TOK_ENDBRACE },	/* pseudo option */
 	{ ")", TOK_ENDBRACE },	/* pseudo option */
 	{ NULL, TOK_NULL },
 	{ NULL, 0 }
 };
 
+static __inline u_int64_t
+align_uint64(u_int64_t *pll) {
+	u_int64_t ret;
+
+	bcopy (pll, &ret, sizeof(ret));
+	return ret;
+};
+
 /**
  * match_token takes a table and a string, returns the value associated
  * with the string (0 meaning an error in most cases)
  */
 static int
 match_token(struct _s_x *table, char *string)
 {
 	struct _s_x *pt;
 	int i = strlen(string);
 	for (pt = table ; i && pt->s != NULL ; pt++)
 		if (strlen(pt->s) == i && !bcmp(string, pt->s, i))
 			return pt->x;
 	return -1;
 };
 
 static char *
 match_value(struct _s_x *p, u_int32_t value)
 {
 	for (; p->s != NULL; p++)
 		if (p->x == value)
 			return p->s;
 	return NULL;
 }
 
 /*
  * prints one port, symbolic or numeric
  */
 static void
 print_port(int proto, u_int16_t port)
 {
 	if (proto == IPPROTO_ETHERTYPE) {
 		char *s;
 		if (do_resolv && (s = match_value(ether_types, port)) )
 			printf("%s", s);
 		else
 			printf("0x%04x", port);
 	} else {
 		struct servent *se = NULL;
 		if (do_resolv) {
 			struct protoent *pe = getprotobynumber(proto);
 			se = getservbyport(htons(port), pe ? pe->p_name : NULL);
 		}
 		if (se)
 			printf("%s", se->s_name);
 		else
 			printf("%d", port);
 	}
 }
 
 /*
  * print the values in a list of ports
  * XXX todo: add support for mask.
  */
 static void
 print_newports(ipfw_insn_u16 *cmd, int proto, int opcode)
 {
 	u_int16_t *p = cmd->ports;
 	int i;
-	char *sep= " ";
+	char *sep;
 	if (cmd->o.len & F_NOT)
 		printf(" not");
-	if (opcode != 0)
-		printf ("%s", opcode == O_MAC_TYPE ? " mac-type" :
-		    (opcode == O_IP_DSTPORT ? " dst-port" : " src-port"));
+	if (opcode != 0) {
+		switch (opcode) {
+		case O_IP_DSTPORT:
+			sep = "dst-port";
+			break;
+		case O_IP_SRCPORT:
+			sep = "src-port";
+			break;
+		case O_IPID:
+			sep = "ipid";
+			break;
+		case O_IPLEN:
+			sep = "iplen";
+			break;
+		case O_IPTTL:
+			sep = "ipttl";
+			break;
+		case O_MAC_TYPE:
+			sep = "mac-type";
+			break;
+		default:
+			sep = "???";
+			break;
+		}
+		printf (" %s", sep);
+	}
+	sep = " ";
 	for (i = F_LEN((ipfw_insn *)cmd) - 1; i > 0; i--, p += 2) {
 		printf(sep);
 		print_port(proto, p[0]);
 		if (p[0] != p[1]) {
 			printf("-");
 			print_port(proto, p[1]);
 		}
 		sep = ",";
 	}
 }
 
 /*
  * Like strtol, but also translates service names into port numbers
  * for some protocols.
  * In particular:
  *	proto == -1 disables the protocol check;
  *	proto == IPPROTO_ETHERTYPE looks up an internal table
  *	proto == matches the values there.
  * Returns *end == s in case the parameter is not found.
  */
 static int
 strtoport(char *s, char **end, int base, int proto)
 {
 	char *p, *buf;
 	char *s1;
 	int i;
 	*end = s;	/* default - not found */
 	if ( *s == '\0')
 		return 0;	/* not found */
 	if (isdigit(*s))
 		return strtol(s, end, base);
 	/*
 	 * find separator. '\\' escapes the next char.
 	 */
 	for (s1 = s; *s1 && (isalnum(*s1) || *s1 == '\\') ; s1++)
 		if (*s1 == '\\' && s1[1] != '\0')
 			s1++;
 	buf = malloc(s1 - s + 1);
 	if (buf == NULL)
 		return 0;
 	/*
 	 * copy into a buffer skipping backslashes
 	 */
 	for (p = s, i = 0; p != s1 ; p++)
 		if ( *p != '\\')
 			buf[i++] = *p;
 	buf[i++] = '\0';
 	if (proto == IPPROTO_ETHERTYPE) {
 		i = match_token(ether_types, buf);
 		free(buf);
 		if (i != -1) {	/* found */
 			*end = s1;
 			return i;
 		}
 	} else {
 		struct protoent *pe = NULL;
 		struct servent *se;
 		if (proto != 0)
 			pe = getprotobynumber(proto);
 		setservent(1);
 		se = getservbyname(buf, pe ? pe->p_name : NULL);
 		free(buf);
 		if (se != NULL) {
 			*end = s1;
 			return ntohs(se->s_port);
 		}
 	}
 	return 0;	/* not found */
 }
 
 /*
  * fill the body of the command with the list of port ranges.
  * At the moment it only understands numeric ranges.
*/ static int fill_newports(ipfw_insn_u16 *cmd, char *av, int proto) { u_int16_t *p = cmd->ports; int i = 0; char *s = av; while (*s) { u_int16_t a, b; a = strtoport(av, &s, 0, proto); if (s == av) /* no parameter */ break; if (*s == '-') { /* a range */ av = s+1; b = strtoport(av, &s, 0, proto); if (s == av) /* no parameter */ break; p[0] = a; p[1] = b; } else if (*s == ',' || *s == '\0' ) { p[0] = p[1] = a; } else { /* invalid separator */ errx(EX_DATAERR, "invalid separator <%c> in <%s>\n", *s, av); } i++; p += 2; av = s+1; } if (i > 0) { if (i+1 > F_LEN_MASK) errx(EX_DATAERR, "too many ports/ranges\n"); cmd->o.len |= i+1; /* leave F_NOT and F_OR untouched */ } return i; } static struct _s_x icmpcodes[] = { { "net", ICMP_UNREACH_NET }, { "host", ICMP_UNREACH_HOST }, { "protocol", ICMP_UNREACH_PROTOCOL }, { "port", ICMP_UNREACH_PORT }, { "needfrag", ICMP_UNREACH_NEEDFRAG }, { "srcfail", ICMP_UNREACH_SRCFAIL }, { "net-unknown", ICMP_UNREACH_NET_UNKNOWN }, { "host-unknown", ICMP_UNREACH_HOST_UNKNOWN }, { "isolated", ICMP_UNREACH_ISOLATED }, { "net-prohib", ICMP_UNREACH_NET_PROHIB }, { "host-prohib", ICMP_UNREACH_HOST_PROHIB }, { "tosnet", ICMP_UNREACH_TOSNET }, { "toshost", ICMP_UNREACH_TOSHOST }, { "filter-prohib", ICMP_UNREACH_FILTER_PROHIB }, { "host-precedence", ICMP_UNREACH_HOST_PRECEDENCE }, { "precedence-cutoff", ICMP_UNREACH_PRECEDENCE_CUTOFF }, { NULL, 0 } }; static void fill_reject_code(u_short *codep, char *str) { int val; char *s; val = strtoul(str, &s, 0); if (s == str || *s != '\0' || val >= 0x100) val = match_token(icmpcodes, str); if (val < 0) errx(EX_DATAERR, "unknown ICMP unreachable code ``%s''", str); *codep = val; return; } static void print_reject_code(u_int16_t code) { char *s = match_value(icmpcodes, code); if (s != NULL) printf("unreach %s", s); else printf("unreach %u", code); } /* * Returns the number of bits set (from left) in a contiguous bitmask, * or -1 if the mask is not contiguous. * XXX this needs a proper fix. 
* This effectively works on masks in big-endian (network) format. * when compiled on little endian architectures. * * First bit is bit 7 of the first byte -- note, for MAC addresses, * the first bit on the wire is bit 0 of the first byte. * len is the max length in bits. */ static int contigmask(u_char *p, int len) { int i, n; for (i=0; iarg1 & 0xff; u_char clear = (cmd->arg1 >> 8) & 0xff; if (list == f_tcpflags && set == TH_SYN && clear == TH_ACK) { printf(" setup"); return; } printf(" %s ", name); for (i=0; list[i].x != 0; i++) { if (set & list[i].x) { set &= ~list[i].x; printf("%s%s", comma, list[i].s); comma = ","; } if (clear & list[i].x) { clear &= ~list[i].x; printf("%s!%s", comma, list[i].s); comma = ","; } } } /* * Print the ip address contained in a command. */ static void print_ip(ipfw_insn_ip *cmd, char *s) { struct hostent *he = NULL; int mb; printf("%s%s ", cmd->o.len & F_NOT ? " not": "", s); if (cmd->o.opcode == O_IP_SRC_ME || cmd->o.opcode == O_IP_DST_ME) { printf("me"); return; } if (cmd->o.opcode == O_IP_SRC_SET || cmd->o.opcode == O_IP_DST_SET) { u_int32_t x, *d; - int i; + int i, j; char comma = '{'; x = cmd->o.arg1 - 1; x = htonl( ~x ); cmd->addr.s_addr = htonl(cmd->addr.s_addr); printf("%s/%d", inet_ntoa(cmd->addr), contigmask((u_char *)&x, 32)); x = cmd->addr.s_addr = htonl(cmd->addr.s_addr); x &= 0xff; /* base */ d = (u_int32_t *)&(cmd->mask); + /* + * Print bits and ranges. + * Locate first bit set (i), then locate first bit unset (j). + * If we have 3+ consecutive bits set, then print them as a + * range, otherwise only print the initial bit and rescan. 
+		 */
 		for (i=0; i < cmd->o.arg1; i++)
-			if (d[ i/32] & (1<<(i & 31))) {
+			if (d[i/32] & (1<<(i & 31))) {
+				for (j=i+1; j < cmd->o.arg1; j++)
+					if (!(d[ j/32] & (1<<(j & 31))))
+						break;
 				printf("%c%d", comma, i+x);
+				if (j>i+2) { /* range has at least 3 elements */
+					printf("-%d", j-1+x);
+					i = j-1;
+				}
 				comma = ',';
 			}
 		printf("}");
 		return;
 	}
 	if (cmd->o.opcode == O_IP_SRC || cmd->o.opcode == O_IP_DST)
 		mb = 32;
 	else
 		mb = contigmask((u_char *)&(cmd->mask.s_addr), 32);
 	if (mb == 32 && do_resolv)
 		he = gethostbyaddr((char *)&(cmd->addr.s_addr),
 		    sizeof(u_long), AF_INET);
 	if (he != NULL)	/* resolved to name */
 		printf("%s", he->h_name);
 	else if (mb == 0)	/* any */
 		printf("any");
 	else {	/* numeric IP followed by some kind of mask */
 		printf("%s", inet_ntoa(cmd->addr));
 		if (mb < 0)
 			printf(":%s", inet_ntoa(cmd->mask));
 		else if (mb < 32)
 			printf("/%d", mb);
 	}
 }
 
 /*
  * prints a MAC address/mask pair
  */
 static void
 print_mac(u_char *addr, u_char *mask)
 {
 	int l = contigmask(mask, 48);
 	if (l == 0)
 		printf(" any");
 	else {
 		printf(" %02x:%02x:%02x:%02x:%02x:%02x",
 		    addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
 		if (l == -1)
 			printf("&%02x:%02x:%02x:%02x:%02x:%02x",
 			    mask[0], mask[1], mask[2],
 			    mask[3], mask[4], mask[5]);
 		else if (l < 48)
 			printf("/%d", l);
 	}
 }
 
 static void
 fill_icmptypes(ipfw_insn_u32 *cmd, char *av)
 {
 	u_int8_t type;
 	cmd->d[0] = 0;
 	while (*av) {
 		if (*av == ',')
 			av++;
 		type = strtoul(av, &av, 0);
 		if (*av != ',' && *av != '\0')
 			errx(EX_DATAERR, "invalid ICMP type");
 		if (type > 31)
 			errx(EX_DATAERR, "ICMP type out of range");
 		cmd->d[0] |= 1 << type;
 	}
 	cmd->o.opcode = O_ICMPTYPE;
 	cmd->o.len |= F_INSN_SIZE(ipfw_insn_u32);
 }
 
 static void
 print_icmptypes(ipfw_insn_u32 *cmd)
 {
 	int i;
 	char sep= ' ';
 	printf(" icmptypes");
 	for (i = 0; i < 32; i++) {
 		if ( (cmd->d[0] & (1 << (i))) == 0)
 			continue;
 		printf("%c%d", sep, i);
 		sep = ',';
 	}
 }
 
 /*
  * show_ipfw() prints the body of an ipfw rule.
  * Because the standard rule has at least proto src_ip dst_ip, we use
  * a helper function to produce these entries if not provided explicitly.
  * The first argument is the list of fields we have, the second is
  * the list of fields we want to be printed.
  *
  * Special cases if we have provided a MAC header:
  *   + if the rule does not contain IP addresses/ports, do not print them;
  *   + if the rule does not contain an IP proto, print "all" instead of "ip";
  *
  * Once we have 'have_options', IP header fields are printed as options.
  */
 #define	HAVE_PROTO	0x0001
 #define	HAVE_SRCIP	0x0002
 #define	HAVE_DSTIP	0x0004
 #define	HAVE_MAC	0x0008
 #define	HAVE_MACTYPE	0x0010
 #define	HAVE_OPTIONS	0x8000
 
 #define	HAVE_IP	(HAVE_PROTO | HAVE_SRCIP | HAVE_DSTIP)
 static void
 show_prerequisites(int *flags, int want, int cmd)
 {
 	if ( (*flags & HAVE_IP) == HAVE_IP)
 		*flags |= HAVE_OPTIONS;
 	if ( (*flags & (HAVE_MAC|HAVE_MACTYPE|HAVE_OPTIONS)) == HAVE_MAC &&
 	    cmd != O_MAC_TYPE) {
 		/*
 		 * mac-type was optimized out by the compiler,
 		 * restore it
 		 */
 		printf(" any");
 		*flags |= HAVE_MACTYPE | HAVE_OPTIONS;
 		return;
 	}
 	if ( !(*flags & HAVE_OPTIONS)) {
 		if ( !(*flags & HAVE_PROTO) && (want & HAVE_PROTO))
 			printf(" ip");
 		if ( !(*flags & HAVE_SRCIP) && (want & HAVE_SRCIP))
 			printf(" from any");
 		if ( !(*flags & HAVE_DSTIP) && (want & HAVE_DSTIP))
 			printf(" to any");
 	}
 	*flags |= want;
 }
 
 static void
 show_ipfw(struct ip_fw *rule, int pcwidth, int bcwidth)
 {
 	static int twidth = 0;
 	int l;
 	ipfw_insn *cmd;
 	int proto = 0;	/* default */
 	int flags = 0;	/* prerequisites */
 	ipfw_insn_log *logptr = NULL;	/* set if we find an O_LOG */
 	int or_block = 0;	/* we are in an or block */
+	u_int32_t set_disable;
-	u_int32_t set_disable = (u_int32_t)(rule->next_rule);
+	bcopy(&rule->next_rule, &set_disable, sizeof(set_disable));
 	if (set_disable & (1 << rule->set)) { /* disabled */
 		if (!show_sets)
 			return;
 		else
 			printf("# DISABLED ");
 	}
 	printf("%05u ", rule->rulenum);
 	if (do_acct)
-		printf("%*llu %*llu ", pcwidth, rule->pcnt, bcwidth,
-		    rule->bcnt);
+		printf("%*llu %*llu ", pcwidth, align_uint64(&rule->pcnt),
+		    bcwidth, align_uint64(&rule->bcnt));
 	if (do_time) {
 		char timestr[30];
 		time_t t = (time_t)0;
 		if (twidth == 0) {
 			strcpy(timestr, ctime(&t));
 			*strchr(timestr, '\n') = '\0';
 			twidth = strlen(timestr);
 		}
 		if (rule->timestamp) {
 #if _FreeBSD_version < 500000 /* XXX check */
 #define	_long_to_time(x)	(time_t)(x)
 #endif
 			t = _long_to_time(rule->timestamp);
 			strcpy(timestr, ctime(&t));
 			*strchr(timestr, '\n') = '\0';
 			printf("%s ", timestr);
 		} else {
-			printf("%*s ", twidth, " ");
+			printf("%*s", twidth, " ");
 		}
 	}
 	if (show_sets)
 		printf("set %d ", rule->set);
 	/*
 	 * print the optional "match probability"
 	 */
 	if (rule->cmd_len > 0) {
 		cmd = rule->cmd ;
 		if (cmd->opcode == O_PROB) {
 			ipfw_insn_u32 *p = (ipfw_insn_u32 *)cmd;
 			double d = 1.0 * p->d[0];
 			d = (d / 0x7fffffff);
 			printf("prob %f ", d);
 		}
 	}
 	/*
 	 * first print actions
 	 */
 	for (l = rule->cmd_len - rule->act_ofs, cmd = ACTION_PTR(rule);
 	    l > 0 ; l -= F_LEN(cmd), cmd += F_LEN(cmd)) {
 		switch(cmd->opcode) {
 		case O_CHECK_STATE:
 			printf("check-state");
 			flags = HAVE_IP; /* avoid printing anything else */
 			break;
 		case O_ACCEPT:
 			printf("allow");
 			break;
 		case O_COUNT:
 			printf("count");
 			break;
 		case O_DENY:
 			printf("deny");
 			break;
 		case O_REJECT:
 			if (cmd->arg1 == ICMP_REJECT_RST)
 				printf("reset");
 			else if (cmd->arg1 == ICMP_UNREACH_HOST)
 				printf("reject");
 			else
 				print_reject_code(cmd->arg1);
 			break;
 		case O_SKIPTO:
 			printf("skipto %u", cmd->arg1);
 			break;
 		case O_PIPE:
 			printf("pipe %u", cmd->arg1);
 			break;
 		case O_QUEUE:
 			printf("queue %u", cmd->arg1);
 			break;
 		case O_DIVERT:
 			printf("divert %u", cmd->arg1);
 			break;
 		case O_TEE:
 			printf("tee %u", cmd->arg1);
 			break;
 		case O_FORWARD_IP:
 		    {
 			ipfw_insn_sa *s = (ipfw_insn_sa *)cmd;
 			printf("fwd %s", inet_ntoa(s->sa.sin_addr));
 			if (s->sa.sin_port)
 				printf(",%d", s->sa.sin_port);
 		    }
 			break;
 		case O_LOG: /* O_LOG is printed last */
 			logptr = (ipfw_insn_log *)cmd;
 			break;
 		default:
 			printf("** unrecognized action %d len %d",
 			    cmd->opcode, cmd->len);
 		}
 	}
 	if (logptr) {
 		if
(logptr->max_log > 0) printf(" log logamount %d", logptr->max_log); else printf(" log"); } /* * then print the body. */ if (rule->_pad & 1) { /* empty rules before options */ if (!do_compact) printf(" ip from any to any"); flags |= HAVE_IP | HAVE_OPTIONS; } for (l = rule->act_ofs, cmd = rule->cmd ; l > 0 ; l -= F_LEN(cmd) , cmd += F_LEN(cmd)) { /* useful alias */ ipfw_insn_u32 *cmd32 = (ipfw_insn_u32 *)cmd; show_prerequisites(&flags, 0, cmd->opcode); switch(cmd->opcode) { case O_PROB: break; /* done already */ case O_PROBE_STATE: break; /* no need to print anything here */ case O_MACADDR2: { ipfw_insn_mac *m = (ipfw_insn_mac *)cmd; if ((cmd->len & F_OR) && !or_block) printf(" {"); if (cmd->len & F_NOT) printf(" not"); printf(" MAC"); flags |= HAVE_MAC; print_mac( m->addr, m->mask); print_mac( m->addr + 6, m->mask + 6); } break; case O_MAC_TYPE: if ((cmd->len & F_OR) && !or_block) printf(" {"); print_newports((ipfw_insn_u16 *)cmd, IPPROTO_ETHERTYPE, (flags & HAVE_OPTIONS) ? cmd->opcode : 0); flags |= HAVE_MAC | HAVE_MACTYPE | HAVE_OPTIONS; break; case O_IP_SRC: case O_IP_SRC_MASK: case O_IP_SRC_ME: case O_IP_SRC_SET: show_prerequisites(&flags, HAVE_PROTO, 0); if (!(flags & HAVE_SRCIP)) printf(" from"); if ((cmd->len & F_OR) && !or_block) printf(" {"); print_ip((ipfw_insn_ip *)cmd, (flags & HAVE_OPTIONS) ? " src-ip" : ""); flags |= HAVE_SRCIP; break; case O_IP_DST: case O_IP_DST_MASK: case O_IP_DST_ME: case O_IP_DST_SET: show_prerequisites(&flags, HAVE_PROTO|HAVE_SRCIP, 0); if (!(flags & HAVE_DSTIP)) printf(" to"); if ((cmd->len & F_OR) && !or_block) printf(" {"); print_ip((ipfw_insn_ip *)cmd, (flags & HAVE_OPTIONS) ? " dst-ip" : ""); flags |= HAVE_DSTIP; break; case O_IP_DSTPORT: show_prerequisites(&flags, HAVE_IP, 0); case O_IP_SRCPORT: show_prerequisites(&flags, HAVE_PROTO|HAVE_SRCIP, 0); if ((cmd->len & F_OR) && !or_block) printf(" {"); print_newports((ipfw_insn_u16 *)cmd, proto, (flags & HAVE_OPTIONS) ? 
cmd->opcode : 0);
 			break;
 		case O_PROTO: {
 			struct protoent *pe;
 			if ((cmd->len & F_OR) && !or_block)
 				printf(" {");
 			if (cmd->len & F_NOT)
 				printf(" not");
 			proto = cmd->arg1;
 			pe = getprotobynumber(cmd->arg1);
 			if (flags & HAVE_OPTIONS)
 				printf(" proto");
 			if (pe)
 				printf(" %s", pe->p_name);
 			else
 				printf(" %u", cmd->arg1);
 			}
 			flags |= HAVE_PROTO;
 			break;
 		default: /*options ... */
 			show_prerequisites(&flags, HAVE_IP | HAVE_OPTIONS, 0);
 			if ((cmd->len & F_OR) && !or_block)
 				printf(" {");
 			if (cmd->len & F_NOT && cmd->opcode != O_IN)
 				printf(" not");
 			switch(cmd->opcode) {
 			case O_FRAG:
 				printf(" frag");
 				break;
 			case O_IN:
 				printf(cmd->len & F_NOT ? " out" : " in");
 				break;
 			case O_LAYER2:
 				printf(" layer2");
 				break;
 			case O_XMIT:
 			case O_RECV:
 			case O_VIA: {
 				char *s;
 				ipfw_insn_if *cmdif = (ipfw_insn_if *)cmd;
 				if (cmd->opcode == O_XMIT)
 					s = "xmit";
 				else if (cmd->opcode == O_RECV)
 					s = "recv";
 				else if (cmd->opcode == O_VIA)
 					s = "via";
 				if (cmdif->name[0] == '\0')
 					printf(" %s %s", s,
 					    inet_ntoa(cmdif->p.ip));
 				else if (cmdif->p.unit == -1)
 					printf(" %s %s*", s, cmdif->name);
 				else
 					printf(" %s %s%d", s,
 					    cmdif->name, cmdif->p.unit);
 				}
 				break;
 			case O_IPID:
-				printf(" ipid %u", cmd->arg1 );
+				if (F_LEN(cmd) == 1)
+					printf(" ipid %u", cmd->arg1 );
+				else
+					print_newports((ipfw_insn_u16 *)cmd, 0,
+					    O_IPID);
 				break;
 			case O_IPTTL:
-				printf(" ipttl %u", cmd->arg1 );
+				if (F_LEN(cmd) == 1)
+					printf(" ipttl %u", cmd->arg1 );
+				else
+					print_newports((ipfw_insn_u16 *)cmd, 0,
+					    O_IPTTL);
 				break;
 			case O_IPVER:
 				printf(" ipver %u", cmd->arg1 );
 				break;
 			case O_IPPRECEDENCE:
 				printf(" ipprecedence %u", (cmd->arg1) >> 5 );
 				break;
 			case O_IPLEN:
-				printf(" iplen %u", cmd->arg1 );
+				if (F_LEN(cmd) == 1)
+					printf(" iplen %u", cmd->arg1 );
+				else
+					print_newports((ipfw_insn_u16 *)cmd, 0,
+					    O_IPLEN);
 				break;
 			case O_IPOPT:
 				print_flags("ipoptions", cmd, f_ipopts);
 				break;
 			case O_IPTOS:
 				print_flags("iptos", cmd, f_iptos);
 				break;
 			case O_ICMPTYPE:
 				print_icmptypes((ipfw_insn_u32 *)cmd);
 				break;
 			case O_ESTAB:
 				printf(" established");
 				break;
 			case O_TCPFLAGS:
 				print_flags("tcpflags", cmd, f_tcpflags);
 				break;
 			case O_TCPOPTS:
 				print_flags("tcpoptions", cmd, f_tcpopts);
 				break;
 			case O_TCPWIN:
 				printf(" tcpwin %d", ntohs(cmd->arg1));
 				break;
 			case O_TCPACK:
 				printf(" tcpack %d", ntohl(cmd32->d[0]));
 				break;
 			case O_TCPSEQ:
 				printf(" tcpseq %d", ntohl(cmd32->d[0]));
 				break;
 			case O_UID: {
 				struct passwd *pwd = getpwuid(cmd32->d[0]);
 				if (pwd)
 					printf(" uid %s", pwd->pw_name);
 				else
 					printf(" uid %u", cmd32->d[0]);
 				}
 				break;
 			case O_GID: {
 				struct group *grp = getgrgid(cmd32->d[0]);
 				if (grp)
 					printf(" gid %s", grp->gr_name);
 				else
 					printf(" gid %u", cmd32->d[0]);
 				}
 				break;
+			case O_VERREVPATH:
+				printf(" verrevpath");
+				break;
+
 			case O_KEEP_STATE:
 				printf(" keep-state");
 				break;
 			case O_LIMIT: {
 				struct _s_x *p = limit_masks;
 				ipfw_insn_limit *c = (ipfw_insn_limit *)cmd;
 				u_int8_t x = c->limit_mask;
 				char *comma = " ";
 				printf(" limit");
 				for ( ; p->x != 0 ; p++)
 					if ((x & p->x) == p->x) {
 						x &= ~p->x;
 						printf("%s%s", comma, p->s);
 						comma = ",";
 					}
 				printf(" %d", c->conn_limit);
 				}
 				break;
 			default:
 				printf(" [opcode %d len %d]",
 				    cmd->opcode, cmd->len);
 			}
 		}
 		if (cmd->len & F_OR) {
 			printf(" or");
 			or_block = 1;
 		} else if (or_block) {
 			printf(" }");
 			or_block = 0;
 		}
 	}
 	show_prerequisites(&flags, HAVE_IP, 0);
 	printf("\n");
 }
 
 static void
 show_dyn_ipfw(ipfw_dyn_rule *d, int pcwidth, int bcwidth)
 {
 	struct protoent *pe;
 	struct in_addr a;
+	uint16_t rulenum;
 	if (!do_expired) {
 		if (!d->expire && !(d->dyn_type == O_LIMIT_PARENT))
 			return;
 	}
-
-	printf("%05d %*llu %*llu (%ds)", (int)(d->rule), pcwidth, d->pcnt,
-	    bcwidth, d->bcnt, d->expire);
+	bcopy(&d->rule, &rulenum, sizeof(rulenum));
+	printf("%05d %*llu %*llu (%ds)", rulenum, pcwidth,
+	    align_uint64(&d->pcnt), bcwidth,
+	    align_uint64(&d->bcnt), d->expire);
 	switch (d->dyn_type) {
 	case O_LIMIT_PARENT:
 		printf(" PARENT %d", d->count);
 		break;
 	case O_LIMIT:
 		printf(" LIMIT");
 		break;
 	case O_KEEP_STATE: /* bidir, no mask */
 		printf(" STATE");
 		break;
 	}
 	if ((pe = getprotobynumber(d->id.proto)) != NULL)
 		printf(" %s", pe->p_name);
else printf(" proto %u", d->id.proto); a.s_addr = htonl(d->id.src_ip); printf(" %s %d", inet_ntoa(a), d->id.src_port); a.s_addr = htonl(d->id.dst_ip); printf(" <-> %s %d", inet_ntoa(a), d->id.dst_port); printf("\n"); } int sort_q(const void *pa, const void *pb) { int rev = (do_sort < 0); int field = rev ? -do_sort : do_sort; long long res = 0; const struct dn_flow_queue *a = pa; const struct dn_flow_queue *b = pb; switch (field) { case 1: /* pkts */ res = a->len - b->len; break; case 2: /* bytes */ res = a->len_bytes - b->len_bytes; break; case 3: /* tot pkts */ res = a->tot_pkts - b->tot_pkts; break; case 4: /* tot bytes */ res = a->tot_bytes - b->tot_bytes; break; } if (res < 0) res = -1; if (res > 0) res = 1; return (int)(rev ? res : -res); } static void list_queues(struct dn_flow_set *fs, struct dn_flow_queue *q) { int l; printf(" mask: 0x%02x 0x%08x/0x%04x -> 0x%08x/0x%04x\n", fs->flow_mask.proto, fs->flow_mask.src_ip, fs->flow_mask.src_port, fs->flow_mask.dst_ip, fs->flow_mask.dst_port); if (fs->rq_elements == 0) return; printf("BKT Prot ___Source IP/port____ " "____Dest. 
IP/port____ Tot_pkt/bytes Pkt/Byte Drp\n"); if (do_sort != 0) heapsort(q, fs->rq_elements, sizeof *q, sort_q); for (l = 0; l < fs->rq_elements; l++) { struct in_addr ina; struct protoent *pe; ina.s_addr = htonl(q[l].id.src_ip); printf("%3d ", q[l].hash_slot); pe = getprotobynumber(q[l].id.proto); if (pe) printf("%-4s ", pe->p_name); else printf("%4u ", q[l].id.proto); printf("%15s/%-5d ", inet_ntoa(ina), q[l].id.src_port); ina.s_addr = htonl(q[l].id.dst_ip); printf("%15s/%-5d ", inet_ntoa(ina), q[l].id.dst_port); printf("%4qu %8qu %2u %4u %3u\n", q[l].tot_pkts, q[l].tot_bytes, q[l].len, q[l].len_bytes, q[l].drops); if (verbose) printf(" S %20qd F %20qd\n", q[l].S, q[l].F); } } static void print_flowset_parms(struct dn_flow_set *fs, char *prefix) { int l; char qs[30]; char plr[30]; char red[90]; /* Display RED parameters */ l = fs->qsize; if (fs->flags_fs & DN_QSIZE_IS_BYTES) { if (l >= 8192) sprintf(qs, "%d KB", l / 1024); else sprintf(qs, "%d B", l); } else sprintf(qs, "%3d sl.", l); if (fs->plr) sprintf(plr, "plr %f", 1.0 * fs->plr / (double)(0x7fffffff)); else plr[0] = '\0'; if (fs->flags_fs & DN_IS_RED) /* RED parameters */ sprintf(red, "\n\t %cRED w_q %f min_th %d max_th %d max_p %f", (fs->flags_fs & DN_IS_GENTLE_RED) ? 
'G' : ' ', 1.0 * fs->w_q / (double)(1 << SCALE_RED), SCALE_VAL(fs->min_th), SCALE_VAL(fs->max_th), 1.0 * fs->max_p / (double)(1 << SCALE_RED)); else sprintf(red, "droptail"); printf("%s %s%s %d queues (%d buckets) %s\n", prefix, qs, plr, fs->rq_elements, fs->rq_size, red); } static void list_pipes(void *data, int nbytes, int ac, char *av[]) { u_long rulenum; void *next = data; struct dn_pipe *p = (struct dn_pipe *) data; struct dn_flow_set *fs; struct dn_flow_queue *q; int l; if (ac > 0) rulenum = strtoul(*av++, NULL, 10); else rulenum = 0; for (; nbytes >= sizeof *p; p = (struct dn_pipe *)next) { double b = p->bandwidth; char buf[30]; char prefix[80]; if (p->next != (struct dn_pipe *)DN_IS_PIPE) break; /* done with pipes, now queues */ /* * compute length, as pipe have variable size */ l = sizeof(*p) + p->fs.rq_elements * sizeof(*q); next = (void *)p + l; nbytes -= l; if (rulenum != 0 && rulenum != p->pipe_nr) continue; /* * Print rate (or clocking interface) */ if (p->if_name[0] != '\0') sprintf(buf, "%s", p->if_name); else if (b == 0) sprintf(buf, "unlimited"); else if (b >= 1000000) sprintf(buf, "%7.3f Mbit/s", b/1000000); else if (b >= 1000) sprintf(buf, "%7.3f Kbit/s", b/1000); else sprintf(buf, "%7.3f bit/s ", b); sprintf(prefix, "%05d: %s %4d ms ", p->pipe_nr, buf, p->delay); print_flowset_parms(&(p->fs), prefix); if (verbose) printf(" V %20qd\n", p->V >> MY_M); q = (struct dn_flow_queue *)(p+1); list_queues(&(p->fs), q); } for (fs = next; nbytes >= sizeof *fs; fs = next) { char prefix[80]; if (fs->next != (struct dn_flow_set *)DN_IS_QUEUE) break; l = sizeof(*fs) + fs->rq_elements * sizeof(*q); next = (void *)fs + l; nbytes -= l; q = (struct dn_flow_queue *)(fs+1); sprintf(prefix, "q%05d: weight %d pipe %d ", fs->fs_nr, fs->weight, fs->parent_nr); print_flowset_parms(fs, prefix); list_queues(fs, q); } } /* * This one handles all set-related commands * ipfw set { show | enable | disable } * ipfw set swap X Y * ipfw set move X to Y * ipfw set move rule X to Y 
*/ static void sets_handler(int ac, char *av[]) { u_int32_t set_disable, masks[2]; int i, nbytes; u_int16_t rulenum; u_int8_t cmd, new_set; ac--; av++; if (!ac) errx(EX_USAGE, "set needs command"); if (!strncmp(*av, "show", strlen(*av)) ) { void *data; char *msg; nbytes = sizeof(struct ip_fw); if ((data = malloc(nbytes)) == NULL) err(EX_OSERR, "malloc"); if (getsockopt(s, IPPROTO_IP, IP_FW_GET, data, &nbytes) < 0) err(EX_OSERR, "getsockopt(IP_FW_GET)"); - set_disable = (u_int32_t)(((struct ip_fw *)data)->next_rule); + bcopy(&((struct ip_fw *)data)->next_rule, + &set_disable, sizeof(set_disable)); for (i = 0, msg = "disable" ; i < 31; i++) if ( (set_disable & (1< 30) errx(EX_DATAERR, "invalid set number %s\n", av[0]); if (!isdigit(*(av[1])) || new_set > 30) errx(EX_DATAERR, "invalid set number %s\n", av[1]); masks[0] = (4 << 24) | (new_set << 16) | (rulenum); i = setsockopt(s, IPPROTO_IP, IP_FW_DEL, masks, sizeof(u_int32_t)); } else if (!strncmp(*av, "move", strlen(*av))) { ac--; av++; if (ac && !strncmp(*av, "rule", strlen(*av))) { cmd = 2; ac--; av++; } else cmd = 3; if (ac != 3 || strncmp(av[1], "to", strlen(*av))) errx(EX_USAGE, "syntax: set move [rule] X to Y\n"); rulenum = atoi(av[0]); new_set = atoi(av[2]); if (!isdigit(*(av[0])) || (cmd == 3 && rulenum > 30) || (cmd == 2 && rulenum == 65535) ) errx(EX_DATAERR, "invalid source number %s\n", av[0]); if (!isdigit(*(av[2])) || new_set > 30) errx(EX_DATAERR, "invalid dest. set %s\n", av[1]); masks[0] = (cmd << 24) | (new_set << 16) | (rulenum); i = setsockopt(s, IPPROTO_IP, IP_FW_DEL, masks, sizeof(u_int32_t)); } else if (!strncmp(*av, "disable", strlen(*av)) || !strncmp(*av, "enable", strlen(*av)) ) { int which = !strncmp(*av, "enable", strlen(*av)) ? 
1 : 0; ac--; av++; masks[0] = masks[1] = 0; while (ac) { if (isdigit(**av)) { i = atoi(*av); if (i < 0 || i > 30) errx(EX_DATAERR, "invalid set number %d\n", i); masks[which] |= (1<= nalloc) { nalloc = nalloc * 2 + 200; nbytes = nalloc; if ((data = realloc(data, nbytes)) == NULL) err(EX_OSERR, "realloc"); if (getsockopt(s, IPPROTO_IP, ocmd, data, &nbytes) < 0) err(EX_OSERR, "getsockopt(IP_%s_GET)", do_pipe ? "DUMMYNET" : "FW"); } if (do_pipe) { list_pipes(data, nbytes, ac, av); goto done; } /* * Count static rules. They have variable size so we * need to scan the list to count them. */ for (nstat = 1, r = data, lim = data + nbytes; r->rulenum < 65535 && (void *)r < lim; ++nstat, r = (void *)r + RULESIZE(r) ) ; /* nothing */ /* * Count dynamic rules. This is easier as they have * fixed size. */ r = (void *)r + RULESIZE(r); dynrules = (ipfw_dyn_rule *)r ; n = (void *)r - data; ndyn = (nbytes - n) / sizeof *dynrules; /* if showing stats, figure out column widths ahead of time */ bcwidth = pcwidth = 0; if (do_acct) { for (n = 0, r = data; n < nstat; n++, r = (void *)r + RULESIZE(r)) { /* packet counter */ - width = snprintf(NULL, 0, "%llu", r->pcnt); + width = snprintf(NULL, 0, "%llu", + align_uint64(&r->pcnt)); if (width > pcwidth) pcwidth = width; /* byte counter */ - width = snprintf(NULL, 0, "%llu", r->bcnt); + width = snprintf(NULL, 0, "%llu", + align_uint64(&r->bcnt)); if (width > bcwidth) bcwidth = width; } } if (do_dynamic && ndyn) { for (n = 0, d = dynrules; n < ndyn; n++, d++) { - width = snprintf(NULL, 0, "%llu", d->pcnt); + width = snprintf(NULL, 0, "%llu", + align_uint64(&d->pcnt)); if (width > pcwidth) pcwidth = width; - width = snprintf(NULL, 0, "%llu", d->bcnt); + width = snprintf(NULL, 0, "%llu", + align_uint64(&d->bcnt)); if (width > bcwidth) bcwidth = width; } } /* if no rule numbers were specified, list all rules */ if (ac == 0) { for (n = 0, r = data; n < nstat; n++, r = (void *)r + RULESIZE(r) ) show_ipfw(r, pcwidth, bcwidth); if (do_dynamic && 
ndyn) { printf("## Dynamic rules (%d):\n", ndyn); for (n = 0, d = dynrules; n < ndyn; n++, d++) show_dyn_ipfw(d, pcwidth, bcwidth); } goto done; } /* display specific rules requested on command line */ for (lac = ac, lav = av; lac != 0; lac--) { /* convert command line rule # */ rnum = strtoul(*lav++, &endptr, 10); if (*endptr) { exitval = EX_USAGE; warnx("invalid rule number: %s", *(lav - 1)); continue; } for (n = seen = 0, r = data; n < nstat; n++, r = (void *)r + RULESIZE(r) ) { if (r->rulenum > rnum) break; if (r->rulenum == rnum) { show_ipfw(r, pcwidth, bcwidth); seen = 1; } } if (!seen) { /* give precedence to other error(s) */ if (exitval == EX_OK) exitval = EX_UNAVAILABLE; warnx("rule %lu does not exist", rnum); } } if (do_dynamic && ndyn) { printf("## Dynamic rules:\n"); for (lac = ac, lav = av; lac != 0; lac--) { rnum = strtoul(*lav++, &endptr, 10); if (*endptr) /* already warned */ continue; for (n = 0, d = dynrules; n < ndyn; n++, d++) { - if ((int)(d->rule) > rnum) + uint16_t rulenum; + + bcopy(&d->rule, &rulenum, sizeof(rulenum)); + if (rulenum > rnum) break; - if ((int)(d->rule) == rnum) + if (rulenum == rnum) show_dyn_ipfw(d, pcwidth, bcwidth); } } } ac = 0; done: free(data); if (exitval != EX_OK) exit(exitval); } static void show_usage(void) { fprintf(stderr, "usage: ipfw [options]\n" " add [number] rule\n" " pipe number config [pipeconfig]\n" " queue number config [queueconfig]\n" " [pipe] flush\n" " [pipe] delete number ...\n" " [pipe] {list|show} [number ...]\n" " {zero|resetlog} [number ...]\n" "do \"ipfw -h\" or see ipfw manpage for details\n" ); exit(EX_USAGE); } static void help(void) { fprintf(stderr, "ipfw syntax summary:\n" "ipfw add [N] [prob {0..1}] ACTION [log [logamount N]] ADDR OPTIONS\n" "ipfw {pipe|queue} N config BODY\n" "ipfw [pipe] {zero|delete|show} [N{,N}]\n" "\n" "RULE: [1..] 
[PROB] BODY\n" "RULENUM: INTEGER(1..65534)\n" "PROB: prob REAL(0..1)\n" "BODY: check-state [LOG] (no body) |\n" " ACTION [LOG] MATCH_ADDR [OPTION_LIST]\n" "ACTION: check-state | allow | count | deny | reject | skipto N |\n" " {divert|tee} PORT | forward ADDR | pipe N | queue N\n" "ADDR: [ MAC dst src ether_type ] \n" " [ from IPLIST [ PORT ] to IPLIST [ PORTLIST ] ]\n" "IPLIST: IPADDR | ( IPADDR or ... or IPADDR )\n" "IPADDR: [not] { any | me | ip | ip/bits | ip:mask | ip/bits{x,y,z} }\n" "OPTION_LIST: OPTION [,OPTION_LIST]\n" ); exit(0); } static int lookup_host (char *host, struct in_addr *ipaddr) { struct hostent *he; if (!inet_aton(host, ipaddr)) { if ((he = gethostbyname(host)) == NULL) return(-1); *ipaddr = *(struct in_addr *)he->h_addr_list[0]; } return(0); } /* * fills the addr and mask fields in the instruction as appropriate from av. * Update length as appropriate. * The following formats are allowed: * any matches any IP. Actually returns an empty instruction. * me returns O_IP_*_ME * 1.2.3.4 single IP address * 1.2.3.4:5.6.7.8 address:mask * 1.2.3.4/24 address/mask * 1.2.3.4/26{1,6,5,4,23} set of addresses in a subnet */ static void fill_ip(ipfw_insn_ip *cmd, char *av) { char *p = 0, md = 0; u_int32_t i; cmd->o.len &= ~F_LEN_MASK; /* zero len */ if (!strncmp(av, "any", strlen(av))) return; if (!strncmp(av, "me", strlen(av))) { cmd->o.len |= F_INSN_SIZE(ipfw_insn); return; } p = strchr(av, '/'); if (!p) p = strchr(av, ':'); if (p) { md = *p; *p++ = '\0'; } if (lookup_host(av, &cmd->addr) != 0) errx(EX_NOHOST, "hostname ``%s'' unknown", av); switch (md) { case ':': if (!inet_aton(p, &cmd->mask)) errx(EX_DATAERR, "bad netmask ``%s''", p); break; case '/': i = atoi(p); if (i == 0) cmd->mask.s_addr = htonl(0); else if (i > 32) errx(EX_DATAERR, "bad width ``%s''", p); else cmd->mask.s_addr = htonl(~0 << (32 - i)); break; default: cmd->mask.s_addr = htonl(~0); break; } cmd->addr.s_addr &= cmd->mask.s_addr; /* * now look if we have a set of addresses. 
They are stored as follows: * arg1 is the set size (powers of 2, 2..256) * addr is the base address IN HOST FORMAT * mask.. is an array of u_int32_t with bits set. */ if (p) p = strchr(p, '{'); if (p) { /* fetch addresses */ u_int32_t *d; int low, high; int i = contigmask((u_char *)&(cmd->mask), 32); if (i < 24 || i > 31) { fprintf(stderr, "invalid set with mask %d\n", i); exit(0); } cmd->o.arg1 = 1<<(32-i); cmd->addr.s_addr = ntohl(cmd->addr.s_addr); d = (u_int32_t *)&cmd->mask; cmd->o.opcode = O_IP_DST_SET; /* default */ cmd->o.len |= F_INSN_SIZE(ipfw_insn_u32) + (cmd->o.arg1+31)/32; for (i = 0; i < (cmd->o.arg1+31)/32 ; i++) d[i] = 0; /* clear masks */ av = p+1; low = cmd->addr.s_addr & 0xff; high = low + cmd->o.arg1 - 1; + i = -1; /* previous value in a range */ while (isdigit(*av)) { char *s; u_int16_t a = strtol(av, &s, 0); if (s == av) /* no parameter */ break; if (a < low || a > high) { fprintf(stderr, "addr %d out of range [%d-%d]\n", a, low, high); exit(0); } a -= low; - d[ a/32] |= 1<<(a & 31); - if (*s != ',') - break; + if (i == -1) /* no previous in range */ + i = a; + else { /* check that range is valid */ + if (i > a) + errx(EX_DATAERR, "invalid range %d-%d", + i+low, a+low); + if (*s == '-') + errx(EX_DATAERR, "double '-' in range"); + } + for (; i <= a; i++) + d[i/32] |= 1<<(i & 31); + i = -1; + if (*s == '-') + i = a; + else if (*s != ',') + break; av = s+1; } return; } if (cmd->mask.s_addr == 0) { /* any */ if (cmd->o.len & F_NOT) errx(EX_DATAERR, "not any never matches"); else /* useless, nuke it */ return; } else if (cmd->mask.s_addr == IP_MASK_ALL) /* one IP */ cmd->o.len |= F_INSN_SIZE(ipfw_insn_u32); else /* addr/mask */ cmd->o.len |= F_INSN_SIZE(ipfw_insn_ip); } /* * helper function to process a set of flags and set bits in the * appropriate masks. 
*/ static void fill_flags(ipfw_insn *cmd, enum ipfw_opcodes opcode, struct _s_x *flags, char *p) { u_int8_t set=0, clear=0; while (p && *p) { char *q; /* points to the separator */ int val; u_int8_t *which; /* mask we are working on */ if (*p == '!') { p++; which = &clear; } else which = &set; q = strchr(p, ','); if (q) *q++ = '\0'; val = match_token(flags, p); if (val <= 0) errx(EX_DATAERR, "invalid flag %s", p); *which |= (u_int8_t)val; p = q; } cmd->opcode = opcode; cmd->len = (cmd->len & (F_NOT | F_OR)) | 1; cmd->arg1 = (set & 0xff) | ( (clear & 0xff) << 8); } static void delete(int ac, char *av[]) { u_int32_t rulenum; struct dn_pipe pipe; int i; int exitval = EX_OK; int do_set = 0; memset(&pipe, 0, sizeof pipe); av++; ac--; if (ac > 0 && !strncmp(*av, "set", strlen(*av))) { do_set = 1; /* delete set */ ac--; av++; } /* Rule number */ while (ac && isdigit(**av)) { i = atoi(*av); av++; ac--; if (do_pipe) { if (do_pipe == 1) pipe.pipe_nr = i; else pipe.fs.fs_nr = i; i = setsockopt(s, IPPROTO_IP, IP_DUMMYNET_DEL, &pipe, sizeof pipe); if (i) { exitval = 1; warn("rule %u: setsockopt(IP_DUMMYNET_DEL)", do_pipe == 1 ? pipe.pipe_nr : pipe.fs.fs_nr); } } else { rulenum = (i & 0xffff) | (do_set << 24); i = setsockopt(s, IPPROTO_IP, IP_FW_DEL, &rulenum, sizeof rulenum); if (i) { exitval = EX_UNAVAILABLE; warn("rule %u: setsockopt(IP_FW_DEL)", rulenum); } } } if (exitval != EX_OK) exit(exitval); } /* * fill the interface structure. We do not check the name as we can * create interfaces dynamically, so checking them at insert time * makes relatively little sense. * A '*' following the name means any unit. 
*/ static void fill_iface(ipfw_insn_if *cmd, char *arg) { cmd->name[0] = '\0'; cmd->o.len |= F_INSN_SIZE(ipfw_insn_if); /* Parse the interface or address */ if (!strcmp(arg, "any")) cmd->o.len = 0; /* effectively ignore this command */ else if (!isdigit(*arg)) { char *q; strncpy(cmd->name, arg, sizeof(cmd->name)); cmd->name[sizeof(cmd->name) - 1] = '\0'; /* find first digit or wildcard */ for (q = cmd->name; *q && !isdigit(*q) && *q != '*'; q++) continue; cmd->p.unit = (*q == '*') ? -1 : atoi(q); *q = '\0'; } else if (!inet_aton(arg, &cmd->p.ip)) errx(EX_DATAERR, "bad ip address ``%s''", arg); } /* * the following macro returns an error message if we run out of * arguments. */ #define NEED1(msg) {if (!ac) errx(EX_USAGE, msg);} static void config_pipe(int ac, char **av) { struct dn_pipe pipe; int i; char *end; u_int32_t a; void *par = NULL; memset(&pipe, 0, sizeof pipe); av++; ac--; /* Pipe number */ if (ac && isdigit(**av)) { i = atoi(*av); av++; ac--; if (do_pipe == 1) pipe.pipe_nr = i; else pipe.fs.fs_nr = i; } while (ac > 0) { double d; int tok = match_token(dummynet_params, *av); ac--; av++; switch(tok) { case TOK_NOERROR: pipe.fs.flags_fs |= DN_NOERROR; break; case TOK_PLR: NEED1("plr needs argument 0..1\n"); d = strtod(av[0], NULL); if (d > 1) d = 1; else if (d < 0) d = 0; pipe.fs.plr = (int)(d*0x7fffffff); ac--; av++; break; case TOK_QUEUE: NEED1("queue needs queue size\n"); end = NULL; pipe.fs.qsize = strtoul(av[0], &end, 0); if (*end == 'K' || *end == 'k') { pipe.fs.flags_fs |= DN_QSIZE_IS_BYTES; pipe.fs.qsize *= 1024; } else if (*end == 'B' || !strncmp(end, "by", 2)) { pipe.fs.flags_fs |= DN_QSIZE_IS_BYTES; } ac--; av++; break; case TOK_BUCKETS: NEED1("buckets needs argument\n"); pipe.fs.rq_size = strtoul(av[0], NULL, 0); ac--; av++; break; case TOK_MASK: NEED1("mask needs mask specifier\n"); /* * per-flow queue, mask is dst_ip, dst_port, * src_ip, src_port, proto measured in bits */ par = NULL; pipe.fs.flow_mask.dst_ip = 0; pipe.fs.flow_mask.src_ip = 0; 
pipe.fs.flow_mask.dst_port = 0; pipe.fs.flow_mask.src_port = 0; pipe.fs.flow_mask.proto = 0; end = NULL; while (ac >= 1) { u_int32_t *p32 = NULL; u_int16_t *p16 = NULL; tok = match_token(dummynet_params, *av); ac--; av++; switch(tok) { case TOK_ALL: /* * special case, all bits significant */ pipe.fs.flow_mask.dst_ip = ~0; pipe.fs.flow_mask.src_ip = ~0; pipe.fs.flow_mask.dst_port = ~0; pipe.fs.flow_mask.src_port = ~0; pipe.fs.flow_mask.proto = ~0; pipe.fs.flags_fs |= DN_HAVE_FLOW_MASK; goto end_mask; case TOK_DSTIP: p32 = &pipe.fs.flow_mask.dst_ip; break; case TOK_SRCIP: p32 = &pipe.fs.flow_mask.src_ip; break; case TOK_DSTPORT: p16 = &pipe.fs.flow_mask.dst_port; break; case TOK_SRCPORT: p16 = &pipe.fs.flow_mask.src_port; break; case TOK_PROTO: break; default: ac++; av--; /* backtrack */ goto end_mask; } if (ac < 1) errx(EX_USAGE, "mask: value missing"); if (*av[0] == '/') { a = strtoul(av[0]+1, &end, 0); a = (a == 32) ? ~0 : (1 << a) - 1; } else a = strtoul(av[0], &end, 0); if (p32 != NULL) *p32 = a; else if (p16 != NULL) { if (a > 65535) errx(EX_DATAERR, "mask: must be 16 bit"); *p16 = (u_int16_t)a; } else { if (a > 255) errx(EX_DATAERR, "mask: must be 8 bit"); pipe.fs.flow_mask.proto = (u_int8_t)a; } if (a != 0) pipe.fs.flags_fs |= DN_HAVE_FLOW_MASK; ac--; av++; } /* end while, config masks */ end_mask: break; case TOK_RED: case TOK_GRED: NEED1("red/gred needs w_q/min_th/max_th/max_p\n"); pipe.fs.flags_fs |= DN_IS_RED; if (tok == TOK_GRED) pipe.fs.flags_fs |= DN_IS_GENTLE_RED; /* * the format for parameters is w_q/min_th/max_th/max_p */ if ((end = strsep(&av[0], "/"))) { double w_q = strtod(end, NULL); if (w_q > 1 || w_q <= 0) errx(EX_DATAERR, "0 < w_q <= 1"); pipe.fs.w_q = (int) (w_q * (1 << SCALE_RED)); } if ((end = strsep(&av[0], "/"))) { pipe.fs.min_th = strtoul(end, &end, 0); if (*end == 'K' || *end == 'k') pipe.fs.min_th *= 1024; } if ((end = strsep(&av[0], "/"))) { pipe.fs.max_th = strtoul(end, &end, 0); if (*end == 'K' || *end == 'k') pipe.fs.max_th *= 
1024; } if ((end = strsep(&av[0], "/"))) { double max_p = strtod(end, NULL); if (max_p > 1 || max_p <= 0) errx(EX_DATAERR, "0 < max_p <= 1"); pipe.fs.max_p = (int)(max_p * (1 << SCALE_RED)); } ac--; av++; break; case TOK_DROPTAIL: pipe.fs.flags_fs &= ~(DN_IS_RED|DN_IS_GENTLE_RED); break; case TOK_BW: NEED1("bw needs bandwidth or interface\n"); if (do_pipe != 1) errx(EX_DATAERR, "bandwidth only valid for pipes"); /* * set clocking interface or bandwidth value */ if (av[0][0] >= 'a' && av[0][0] <= 'z') { int l = sizeof(pipe.if_name)-1; /* interface name */ strncpy(pipe.if_name, av[0], l); pipe.if_name[l] = '\0'; pipe.bandwidth = 0; } else { pipe.if_name[0] = '\0'; pipe.bandwidth = strtoul(av[0], &end, 0); if (*end == 'K' || *end == 'k') { end++; pipe.bandwidth *= 1000; } else if (*end == 'M') { end++; pipe.bandwidth *= 1000000; } if (*end == 'B' || !strncmp(end, "by", 2)) pipe.bandwidth *= 8; if (pipe.bandwidth < 0) errx(EX_DATAERR, "bandwidth too large"); } ac--; av++; break; case TOK_DELAY: if (do_pipe != 1) errx(EX_DATAERR, "delay only valid for pipes"); NEED1("delay needs argument 0..10000ms\n"); pipe.delay = strtoul(av[0], NULL, 0); ac--; av++; break; case TOK_WEIGHT: if (do_pipe == 1) errx(EX_DATAERR,"weight only valid for queues"); NEED1("weight needs argument 0..100\n"); pipe.fs.weight = strtoul(av[0], &end, 0); ac--; av++; break; case TOK_PIPE: if (do_pipe == 1) errx(EX_DATAERR,"pipe only valid for queues"); NEED1("pipe needs pipe_number\n"); pipe.fs.parent_nr = strtoul(av[0], &end, 0); ac--; av++; break; default: errx(EX_DATAERR, "unrecognised option ``%s''", *av); } } if (do_pipe == 1) { if (pipe.pipe_nr == 0) errx(EX_DATAERR, "pipe_nr must be > 0"); if (pipe.delay > 10000) errx(EX_DATAERR, "delay must be < 10000"); } else { /* do_pipe == 2, queue */ if (pipe.fs.parent_nr == 0) errx(EX_DATAERR, "pipe must be > 0"); if (pipe.fs.weight >100) errx(EX_DATAERR, "weight must be <= 100"); } if (pipe.fs.flags_fs & DN_QSIZE_IS_BYTES) { if (pipe.fs.qsize > 
1024*1024) errx(EX_DATAERR, "queue size must be < 1MB"); } else { if (pipe.fs.qsize > 100) errx(EX_DATAERR, "2 <= queue size <= 100"); } if (pipe.fs.flags_fs & DN_IS_RED) { size_t len; int lookup_depth, avg_pkt_size; double s, idle, weight, w_q; struct clockinfo clock; int t; if (pipe.fs.min_th >= pipe.fs.max_th) errx(EX_DATAERR, "min_th %d must be < than max_th %d", pipe.fs.min_th, pipe.fs.max_th); if (pipe.fs.max_th == 0) errx(EX_DATAERR, "max_th must be > 0"); len = sizeof(int); if (sysctlbyname("net.inet.ip.dummynet.red_lookup_depth", &lookup_depth, &len, NULL, 0) == -1) errx(1, "sysctlbyname(\"%s\")", "net.inet.ip.dummynet.red_lookup_depth"); if (lookup_depth == 0) errx(EX_DATAERR, "net.inet.ip.dummynet.red_lookup_depth" " must be greater than zero"); len = sizeof(int); if (sysctlbyname("net.inet.ip.dummynet.red_avg_pkt_size", &avg_pkt_size, &len, NULL, 0) == -1) errx(1, "sysctlbyname(\"%s\")", "net.inet.ip.dummynet.red_avg_pkt_size"); if (avg_pkt_size == 0) errx(EX_DATAERR, "net.inet.ip.dummynet.red_avg_pkt_size must" " be greater than zero"); len = sizeof(struct clockinfo); if (sysctlbyname("kern.clockrate", &clock, &len, NULL, 0) == -1) errx(1, "sysctlbyname(\"%s\")", "kern.clockrate"); /* * Ticks needed for sending a medium-sized packet. * Unfortunately, when we are configuring a WF2Q+ queue, we * do not have bandwidth information, because that is stored * in the parent pipe, and also we have multiple queues * competing for it. So we set s=0, which is not very * correct. But on the other hand, why do we want RED with * WF2Q+ ? */ if (pipe.bandwidth==0) /* this is a WF2Q+ queue */ s = 0; else s = clock.hz * avg_pkt_size * 8 / pipe.bandwidth; /* * max idle time (in ticks) before avg queue size becomes 0. * NOTA: (3/w_q) is approx the value x so that * (1-w_q)^x < 10^-3. */ w_q = ((double)pipe.fs.w_q) / (1 << SCALE_RED); idle = s * 3. 
/ w_q; pipe.fs.lookup_step = (int)idle / lookup_depth; if (!pipe.fs.lookup_step) pipe.fs.lookup_step = 1; weight = 1 - w_q; for (t = pipe.fs.lookup_step; t > 0; --t) weight *= weight; pipe.fs.lookup_weight = (int)(weight * (1 << SCALE_RED)); } i = setsockopt(s, IPPROTO_IP, IP_DUMMYNET_CONFIGURE, &pipe, sizeof pipe); if (i) err(1, "setsockopt(%s)", "IP_DUMMYNET_CONFIGURE"); } static void get_mac_addr_mask(char *p, u_char *addr, u_char *mask) { int i, l; for (i=0; i<6; i++) addr[i] = mask[i] = 0; if (!strcmp(p, "any")) return; for (i=0; *p && i<6;i++, p++) { addr[i] = strtol(p, &p, 16); if (*p != ':') /* we start with the mask */ break; } if (*p == '/') { /* mask len */ l = strtol(p+1, &p, 0); for (i=0; l>0; l -=8, i++) mask[i] = (l >=8) ? 0xff : (~0) << (8-l); } else if (*p == '&') { /* mask */ for (i=0, p++; *p && i<6;i++, p++) { mask[i] = strtol(p, &p, 16); if (*p != ':') break; } } else if (*p == '\0') { for (i=0; i<6; i++) mask[i] = 0xff; } for (i=0; i<6; i++) addr[i] &= mask[i]; } /* * helper function, updates the pointer to cmd with the length * of the current command, and also cleans up the first word of * the new command in case it has been clobbered before. */ static ipfw_insn * next_cmd(ipfw_insn *cmd) { cmd += F_LEN(cmd); bzero(cmd, sizeof(*cmd)); return cmd; } /* * A function to fill simple commands of size 1. * Existing flags are preserved. */ static void fill_cmd(ipfw_insn *cmd, enum ipfw_opcodes opcode, int flags, u_int16_t arg) { cmd->opcode = opcode; cmd->len = ((cmd->len | flags) & (F_NOT | F_OR)) | 1; cmd->arg1 = arg; } /* * Fetch and add the MAC address and type, with masks. This generates one or * two microinstructions, and returns the pointer to the last one. 
*/ static ipfw_insn * add_mac(ipfw_insn *cmd, int ac, char *av[]) { ipfw_insn_mac *mac; if (ac < 2) errx(EX_DATAERR, "MAC dst src"); cmd->opcode = O_MACADDR2; cmd->len = (cmd->len & (F_NOT | F_OR)) | F_INSN_SIZE(ipfw_insn_mac); mac = (ipfw_insn_mac *)cmd; get_mac_addr_mask(av[0], mac->addr, mac->mask); /* dst */ get_mac_addr_mask(av[1], &(mac->addr[6]), &(mac->mask[6])); /* src */ return cmd; } static ipfw_insn * add_mactype(ipfw_insn *cmd, int ac, char *av) { if (ac < 1) errx(EX_DATAERR, "missing MAC type"); if (strcmp(av, "any") != 0) { /* we have a non-null type */ fill_newports((ipfw_insn_u16 *)cmd, av, IPPROTO_ETHERTYPE); cmd->opcode = O_MAC_TYPE; return cmd; } else return NULL; } static ipfw_insn * add_proto(ipfw_insn *cmd, char *av) { struct protoent *pe; u_char proto = 0; if (!strncmp(av, "all", strlen(av))) ; /* same as "ip" */ else if ((proto = atoi(av)) > 0) ; /* all done! */ else if ((pe = getprotobyname(av)) != NULL) proto = pe->p_proto; else return NULL; if (proto != IPPROTO_IP) fill_cmd(cmd, O_PROTO, 0, proto); return cmd; } static ipfw_insn * add_srcip(ipfw_insn *cmd, char *av) { fill_ip((ipfw_insn_ip *)cmd, av); if (cmd->opcode == O_IP_DST_SET) /* set */ cmd->opcode = O_IP_SRC_SET; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn)) /* me */ cmd->opcode = O_IP_SRC_ME; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_u32)) /* one IP */ cmd->opcode = O_IP_SRC; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_ip)) /* addr/mask */ cmd->opcode = O_IP_SRC_MASK; return cmd; } static ipfw_insn * add_dstip(ipfw_insn *cmd, char *av) { fill_ip((ipfw_insn_ip *)cmd, av); if (cmd->opcode == O_IP_DST_SET) /* set */ ; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn)) /* me */ cmd->opcode = O_IP_DST_ME; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_u32)) /* one IP */ cmd->opcode = O_IP_DST; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_ip)) /* addr/mask */ cmd->opcode = O_IP_DST_MASK; return cmd; } static ipfw_insn * add_ports(ipfw_insn *cmd, char *av, u_char proto, int opcode) { if 
(!strncmp(av, "any", strlen(av))) { return NULL; } else if (fill_newports((ipfw_insn_u16 *)cmd, av, proto)) { /* XXX todo: check that we have a protocol with ports */ cmd->opcode = opcode; return cmd; } return NULL; } /* * Parse arguments and assemble the microinstructions which make up a rule. * Rules are added into the 'rulebuf' and then copied in the correct order * into the actual rule. * * The syntax for a rule starts with the action, followed by an * optional log action, and the various match patterns. * In the assembled microcode, the first opcode must be an O_PROBE_STATE * (generated if the rule includes a keep-state option), then the * various match patterns, the "log" action, and the actual action. * */ static void add(int ac, char *av[]) { /* * rules are added into the 'rulebuf' and then copied in * the correct order into the actual rule. * Some things that need to go out of order (prob, action etc.) * go into actbuf[]. */ static u_int32_t rulebuf[255], actbuf[255], cmdbuf[255]; ipfw_insn *src, *dst, *cmd, *action, *prev; ipfw_insn *first_cmd; /* first match pattern */ struct ip_fw *rule; /* * various flags used to record that we entered some fields. 
*/ ipfw_insn *have_state = NULL; /* check-state or keep-state */ int i; int open_par = 0; /* open parenthesis ( */ /* proto is here because it is used to fetch ports */ u_char proto = IPPROTO_IP; /* default protocol */ double match_prob = 1; /* match probability, default is always match */ bzero(actbuf, sizeof(actbuf)); /* actions go here */ bzero(cmdbuf, sizeof(cmdbuf)); bzero(rulebuf, sizeof(rulebuf)); rule = (struct ip_fw *)rulebuf; cmd = (ipfw_insn *)cmdbuf; action = (ipfw_insn *)actbuf; av++; ac--; /* [rule N] -- Rule number optional */ if (ac && isdigit(**av)) { rule->rulenum = atoi(*av); av++; ac--; } /* [set N] -- set number (0..30), optional */ if (ac > 1 && !strncmp(*av, "set", strlen(*av))) { int set = strtoul(av[1], NULL, 10); if (set < 0 || set > 30) errx(EX_DATAERR, "illegal set %s", av[1]); rule->set = set; av += 2; ac -= 2; } /* [prob D] -- match probability, optional */ if (ac > 1 && !strncmp(*av, "prob", strlen(*av))) { match_prob = strtod(av[1], NULL); if (match_prob <= 0 || match_prob > 1) errx(EX_DATAERR, "illegal match prob. 
%s", av[1]); av += 2; ac -= 2; } /* action -- mandatory */ NEED1("missing action"); i = match_token(rule_actions, *av); ac--; av++; action->len = 1; /* default */ switch(i) { case TOK_CHECKSTATE: have_state = action; action->opcode = O_CHECK_STATE; break; case TOK_ACCEPT: action->opcode = O_ACCEPT; break; case TOK_DENY: action->opcode = O_DENY; action->arg1 = 0; break; case TOK_REJECT: action->opcode = O_REJECT; action->arg1 = ICMP_UNREACH_HOST; break; case TOK_RESET: action->opcode = O_REJECT; action->arg1 = ICMP_REJECT_RST; break; case TOK_UNREACH: action->opcode = O_REJECT; NEED1("missing reject code"); fill_reject_code(&action->arg1, *av); ac--; av++; break; case TOK_COUNT: action->opcode = O_COUNT; break; case TOK_QUEUE: case TOK_PIPE: action->len = F_INSN_SIZE(ipfw_insn_pipe); case TOK_SKIPTO: if (i == TOK_QUEUE) action->opcode = O_QUEUE; else if (i == TOK_PIPE) action->opcode = O_PIPE; else if (i == TOK_SKIPTO) action->opcode = O_SKIPTO; NEED1("missing skipto/pipe/queue number"); action->arg1 = strtoul(*av, NULL, 10); av++; ac--; break; case TOK_DIVERT: case TOK_TEE: action->opcode = (i == TOK_DIVERT) ? 
O_DIVERT : O_TEE; NEED1("missing divert/tee port"); action->arg1 = strtoul(*av, NULL, 0); if (action->arg1 == 0) { struct servent *s; setservent(1); s = getservbyname(av[0], "divert"); if (s != NULL) action->arg1 = ntohs(s->s_port); else errx(EX_DATAERR, "illegal divert/tee port"); } ac--; av++; break; case TOK_FORWARD: { ipfw_insn_sa *p = (ipfw_insn_sa *)action; char *s, *end; NEED1("missing forward address[:port]"); action->opcode = O_FORWARD_IP; action->len = F_INSN_SIZE(ipfw_insn_sa); p->sa.sin_len = sizeof(struct sockaddr_in); p->sa.sin_family = AF_INET; p->sa.sin_port = 0; /* * locate the address-port separator (':' or ',') */ s = strchr(*av, ':'); if (s == NULL) s = strchr(*av, ','); if (s != NULL) { *(s++) = '\0'; i = strtoport(s, &end, 0 /* base */, 0 /* proto */); if (s == end) errx(EX_DATAERR, "illegal forwarding port ``%s''", s); p->sa.sin_port = (u_short)i; } lookup_host(*av, &(p->sa.sin_addr)); } ac--; av++; break; default: errx(EX_DATAERR, "invalid action %s\n", av[-1]); } action = next_cmd(action); /* * [log [logamount N]] -- log, optional * * If exists, it goes first in the cmdbuf, but then it is * skipped in the copy section to the end of the buffer. 
*/ if (ac && !strncmp(*av, "log", strlen(*av))) { ipfw_insn_log *c = (ipfw_insn_log *)cmd; cmd->len = F_INSN_SIZE(ipfw_insn_log); cmd->opcode = O_LOG; av++; ac--; if (ac && !strncmp(*av, "logamount", strlen(*av))) { ac--; av++; NEED1("logamount requires argument"); c->max_log = atoi(*av); if (c->max_log < 0) errx(EX_DATAERR, "logamount must be positive"); ac--; av++; } cmd = next_cmd(cmd); } if (have_state) /* must be a check-state, we are done */ goto done; #define OR_START(target) \ if (ac && (*av[0] == '(' || *av[0] == '{')) { \ if (open_par) \ errx(EX_USAGE, "nested \"(\" not allowed\n"); \ prev = NULL; \ open_par = 1; \ if ( (av[0])[1] == '\0') { \ ac--; av++; \ } else \ (*av)++; \ } \ target: \ #define CLOSE_PAR \ if (open_par) { \ if (ac && ( \ !strncmp(*av, ")", strlen(*av)) || \ !strncmp(*av, "}", strlen(*av)) )) { \ prev = NULL; \ open_par = 0; \ ac--; av++; \ } else \ errx(EX_USAGE, "missing \")\"\n"); \ } #define NOT_BLOCK \ if (ac && !strncmp(*av, "not", strlen(*av))) { \ if (cmd->len & F_NOT) \ errx(EX_USAGE, "double \"not\" not allowed\n"); \ cmd->len |= F_NOT; \ ac--; av++; \ } #define OR_BLOCK(target) \ if (ac && !strncmp(*av, "or", strlen(*av))) { \ if (prev == NULL || open_par == 0) \ errx(EX_DATAERR, "invalid OR block"); \ prev->len |= F_OR; \ ac--; av++; \ goto target; \ } \ CLOSE_PAR; first_cmd = cmd; #if 0 /* * MAC addresses, optional. * If we have this, we skip the part "proto from src to dst" * and jump straight to the option parsing. 
*/ NOT_BLOCK; NEED1("missing protocol"); if (!strncmp(*av, "MAC", strlen(*av)) || !strncmp(*av, "mac", strlen(*av))) { ac--; av++; /* the "MAC" keyword */ add_mac(cmd, ac, av); /* exits in case of errors */ cmd = next_cmd(cmd); ac -= 2; av += 2; /* dst-mac and src-mac */ NOT_BLOCK; NEED1("missing mac type"); if (add_mactype(cmd, ac, av[0])) cmd = next_cmd(cmd); ac--; av++; /* any or mac-type */ goto read_options; } #endif /* * protocol, mandatory */ OR_START(get_proto); NOT_BLOCK; NEED1("missing protocol"); if (add_proto(cmd, *av)) { av++; ac--; if (F_LEN(cmd) == 0) /* plain IP */ proto = 0; else { proto = cmd->arg1; prev = cmd; cmd = next_cmd(cmd); } } else if (first_cmd != cmd) { errx(EX_DATAERR, "invalid protocol ``%s''", *av); } else goto read_options; OR_BLOCK(get_proto); /* * "from", mandatory */ if (!ac || strncmp(*av, "from", strlen(*av))) errx(EX_USAGE, "missing ``from''"); ac--; av++; /* * source IP, mandatory */ OR_START(source_ip); NOT_BLOCK; /* optional "not" */ NEED1("missing source address"); if (add_srcip(cmd, *av)) { ac--; av++; if (F_LEN(cmd) != 0) { /* ! any */ prev = cmd; cmd = next_cmd(cmd); } } OR_BLOCK(source_ip); /* * source ports, optional */ NOT_BLOCK; /* optional "not" */ if (ac) { if (!strncmp(*av, "any", strlen(*av)) || add_ports(cmd, *av, proto, O_IP_SRCPORT)) { ac--; av++; if (F_LEN(cmd) != 0) cmd = next_cmd(cmd); } } /* * "to", mandatory */ if (!ac || strncmp(*av, "to", strlen(*av))) errx(EX_USAGE, "missing ``to''"); av++; ac--; /* * destination, mandatory */ OR_START(dest_ip); NOT_BLOCK; /* optional "not" */ NEED1("missing dst address"); if (add_dstip(cmd, *av)) { ac--; av++; if (F_LEN(cmd) != 0) { /* ! any */ prev = cmd; cmd = next_cmd(cmd); } } OR_BLOCK(dest_ip); /* * dest. 
ports, optional */ NOT_BLOCK; /* optional "not" */ if (ac) { if (!strncmp(*av, "any", strlen(*av)) || add_ports(cmd, *av, proto, O_IP_DSTPORT)) { ac--; av++; if (F_LEN(cmd) != 0) cmd = next_cmd(cmd); } } read_options: if (ac && first_cmd == cmd) { /* * nothing specified so far, store in the rule to ease * printout later. */ rule->_pad = 1; } prev = NULL; while (ac) { char *s; ipfw_insn_u32 *cmd32; /* alias for cmd */ s = *av; cmd32 = (ipfw_insn_u32 *)cmd; if (*s == '!') { /* alternate syntax for NOT */ if (cmd->len & F_NOT) errx(EX_USAGE, "double \"not\" not allowed\n"); cmd->len = F_NOT; s++; } i = match_token(rule_options, s); ac--; av++; switch(i) { case TOK_NOT: if (cmd->len & F_NOT) errx(EX_USAGE, "double \"not\" not allowed\n"); cmd->len = F_NOT; break; case TOK_OR: if (open_par == 0 || prev == NULL) errx(EX_USAGE, "invalid \"or\" block\n"); prev->len |= F_OR; break; case TOK_STARTBRACE: if (open_par) errx(EX_USAGE, "+nested \"(\" not allowed\n"); open_par = 1; break; case TOK_ENDBRACE: if (!open_par) errx(EX_USAGE, "+missing \")\"\n"); open_par = 0; prev = NULL; break; case TOK_IN: fill_cmd(cmd, O_IN, 0, 0); break; case TOK_OUT: cmd->len ^= F_NOT; /* toggle F_NOT */ fill_cmd(cmd, O_IN, 0, 0); break; case TOK_FRAG: fill_cmd(cmd, O_FRAG, 0, 0); break; case TOK_LAYER2: fill_cmd(cmd, O_LAYER2, 0, 0); break; case TOK_XMIT: case TOK_RECV: case TOK_VIA: NEED1("recv, xmit, via require interface name" " or address"); fill_iface((ipfw_insn_if *)cmd, av[0]); ac--; av++; if (F_LEN(cmd) == 0) /* not a valid address */ break; if (i == TOK_XMIT) cmd->opcode = O_XMIT; else if (i == TOK_RECV) cmd->opcode = O_RECV; else if (i == TOK_VIA) cmd->opcode = O_VIA; break; case TOK_ICMPTYPES: NEED1("icmptypes requires list of types"); fill_icmptypes((ipfw_insn_u32 *)cmd, *av); av++; ac--; break; case TOK_IPTTL: NEED1("ipttl requires TTL"); - fill_cmd(cmd, O_IPTTL, 0, strtoul(*av, NULL, 0)); + if (strpbrk(*av, "-,")) { + if (!add_ports(cmd, *av, 0, O_IPTTL)) + errx(EX_DATAERR, 
"invalid ipttl %s", *av); + } else + fill_cmd(cmd, O_IPTTL, 0, strtoul(*av, NULL, 0)); ac--; av++; break; case TOK_IPID: - NEED1("ipid requires length"); - fill_cmd(cmd, O_IPID, 0, strtoul(*av, NULL, 0)); + NEED1("ipid requires id"); + if (strpbrk(*av, "-,")) { + if (!add_ports(cmd, *av, 0, O_IPID)) + errx(EX_DATAERR, "invalid ipid %s", *av); + } else + fill_cmd(cmd, O_IPID, 0, strtoul(*av, NULL, 0)); ac--; av++; break; case TOK_IPLEN: NEED1("iplen requires length"); - fill_cmd(cmd, O_IPLEN, 0, strtoul(*av, NULL, 0)); + if (strpbrk(*av, "-,")) { + if (!add_ports(cmd, *av, 0, O_IPLEN)) + errx(EX_DATAERR, "invalid ip len %s", *av); + } else + fill_cmd(cmd, O_IPLEN, 0, strtoul(*av, NULL, 0)); ac--; av++; break; case TOK_IPVER: NEED1("ipver requires version"); fill_cmd(cmd, O_IPVER, 0, strtoul(*av, NULL, 0)); ac--; av++; break; case TOK_IPPRECEDENCE: NEED1("ipprecedence requires value"); fill_cmd(cmd, O_IPPRECEDENCE, 0, (strtoul(*av, NULL, 0) & 7) << 5); ac--; av++; break; case TOK_IPOPTS: NEED1("missing argument for ipoptions"); fill_flags(cmd, O_IPOPT, f_ipopts, *av); ac--; av++; break; case TOK_IPTOS: NEED1("missing argument for iptos"); fill_flags(cmd, O_IPTOS, f_iptos, *av); ac--; av++; break; case TOK_UID: NEED1("uid requires argument"); { char *end; uid_t uid; struct passwd *pwd; cmd->opcode = O_UID; uid = strtoul(*av, &end, 0); pwd = (*end == '\0') ? getpwuid(uid) : getpwnam(*av); if (pwd == NULL) errx(EX_DATAERR, "uid \"%s\" nonexistent", *av); cmd32->d[0] = pwd->pw_uid; cmd->len = F_INSN_SIZE(ipfw_insn_u32); ac--; av++; } break; case TOK_GID: NEED1("gid requires argument"); { char *end; gid_t gid; struct group *grp; cmd->opcode = O_GID; gid = strtoul(*av, &end, 0); grp = (*end == '\0') ? 
getgrgid(gid) : getgrnam(*av); if (grp == NULL) errx(EX_DATAERR, "gid \"%s\" nonexistent", *av); cmd32->d[0] = grp->gr_gid; cmd->len = F_INSN_SIZE(ipfw_insn_u32); ac--; av++; } break; case TOK_ESTAB: fill_cmd(cmd, O_ESTAB, 0, 0); break; case TOK_SETUP: fill_cmd(cmd, O_TCPFLAGS, 0, (TH_SYN) | ( (TH_ACK) & 0xff) <<8 ); break; case TOK_TCPOPTS: NEED1("missing argument for tcpoptions"); fill_flags(cmd, O_TCPOPTS, f_tcpopts, *av); ac--; av++; break; case TOK_TCPSEQ: case TOK_TCPACK: NEED1("tcpseq/tcpack requires argument"); cmd->len = F_INSN_SIZE(ipfw_insn_u32); cmd->opcode = (i == TOK_TCPSEQ) ? O_TCPSEQ : O_TCPACK; cmd32->d[0] = htonl(strtoul(*av, NULL, 0)); ac--; av++; break; case TOK_TCPWIN: NEED1("tcpwin requires length"); fill_cmd(cmd, O_TCPWIN, 0, htons(strtoul(*av, NULL, 0))); ac--; av++; break; case TOK_TCPFLAGS: NEED1("missing argument for tcpflags"); cmd->opcode = O_TCPFLAGS; fill_flags(cmd, O_TCPFLAGS, f_tcpflags, *av); ac--; av++; break; case TOK_KEEPSTATE: if (open_par) errx(EX_USAGE, "keep-state cannot be part " "of an or block"); if (have_state) errx(EX_USAGE, "only one of keep-state " "and limit is allowed"); have_state = cmd; fill_cmd(cmd, O_KEEP_STATE, 0, 0); break; case TOK_LIMIT: if (open_par) errx(EX_USAGE, "limit cannot be part " "of an or block"); if (have_state) errx(EX_USAGE, "only one of keep-state " "and limit is allowed"); NEED1("limit needs mask and # of connections"); have_state = cmd; { ipfw_insn_limit *c = (ipfw_insn_limit *)cmd; cmd->len = F_INSN_SIZE(ipfw_insn_limit); cmd->opcode = O_LIMIT; c->limit_mask = 0; c->conn_limit = 0; for (; ac >1 ;) { int val; val = match_token(limit_masks, *av); if (val <= 0) break; c->limit_mask |= val; ac--; av++; } c->conn_limit = atoi(*av); if (c->conn_limit == 0) errx(EX_USAGE, "limit: limit must be >0"); if (c->limit_mask == 0) errx(EX_USAGE, "missing limit mask"); ac--; av++; } break; case TOK_PROTO: NEED1("missing protocol"); if (add_proto(cmd, *av)) { proto = cmd->arg1; ac--; av++; } else 
errx(EX_DATAERR, "invalid protocol ``%s''", *av); break; case TOK_SRCIP: NEED1("missing source IP"); if (add_srcip(cmd, *av)) { ac--; av++; } break; case TOK_DSTIP: NEED1("missing destination IP"); if (add_dstip(cmd, *av)) { ac--; av++; } break; case TOK_SRCPORT: NEED1("missing source port"); if (!strncmp(*av, "any", strlen(*av)) || add_ports(cmd, *av, proto, O_IP_SRCPORT)) { ac--; av++; } else errx(EX_DATAERR, "invalid source port %s", *av); break; case TOK_DSTPORT: NEED1("missing destination port"); if (!strncmp(*av, "any", strlen(*av)) || add_ports(cmd, *av, proto, O_IP_DSTPORT)) { ac--; av++; } else errx(EX_DATAERR, "invalid destination port %s", *av); break; case TOK_MAC: if (ac < 2) errx(EX_USAGE, "MAC dst-mac src-mac"); if (add_mac(cmd, ac, av)) { ac -= 2; av += 2; } break; case TOK_MACTYPE: NEED1("missing mac type"); if (!add_mactype(cmd, ac, *av)) errx(EX_DATAERR, "invalid mac type %s", *av); ac--; av++; break; + case TOK_VERREVPATH: + fill_cmd(cmd, O_VERREVPATH, 0, 0); + break; + default: errx(EX_USAGE, "unrecognised option [%d] %s\n", i, s); } if (F_LEN(cmd) > 0) { /* prepare to advance */ prev = cmd; cmd = next_cmd(cmd); } } done: /* * Now copy stuff into the rule. * If we have a keep-state option, the first instruction * must be a PROBE_STATE (which is generated here). * If we have a LOG option, it was stored as the first command, * and now must be moved to the top of the action part. */ dst = (ipfw_insn *)rule->cmd; /* * First thing to write into the command stream is the match probability. 
*/ if (match_prob != 1) { /* 1 means always match */ dst->opcode = O_PROB; dst->len = 2; *((int32_t *)(dst+1)) = (int32_t)(match_prob * 0x7fffffff); dst += dst->len; } /* * generate O_PROBE_STATE if necessary */ if (have_state && have_state->opcode != O_CHECK_STATE) { fill_cmd(dst, O_PROBE_STATE, 0, 0); dst = next_cmd(dst); } /* * copy all commands but O_LOG, O_KEEP_STATE, O_LIMIT */ for (src = (ipfw_insn *)cmdbuf; src != cmd; src += i) { i = F_LEN(src); switch (src->opcode) { case O_LOG: case O_KEEP_STATE: case O_LIMIT: break; default: bcopy(src, dst, i * sizeof(u_int32_t)); dst += i; } } /* * put back the have_state command as last opcode */ if (have_state && have_state->opcode != O_CHECK_STATE) { i = F_LEN(have_state); bcopy(have_state, dst, i * sizeof(u_int32_t)); dst += i; } /* * start action section */ rule->act_ofs = dst - rule->cmd; /* * put back O_LOG if necessary */ src = (ipfw_insn *)cmdbuf; if ( src->opcode == O_LOG ) { i = F_LEN(src); bcopy(src, dst, i * sizeof(u_int32_t)); dst += i; } /* * copy all other actions */ for (src = (ipfw_insn *)actbuf; src != action; src += i) { i = F_LEN(src); bcopy(src, dst, i * sizeof(u_int32_t)); dst += i; } rule->cmd_len = (u_int32_t *)dst - (u_int32_t *)(rule->cmd); i = (void *)dst - (void *)rule; if (getsockopt(s, IPPROTO_IP, IP_FW_ADD, rule, &i) == -1) err(EX_UNAVAILABLE, "getsockopt(%s)", "IP_FW_ADD"); if (!do_quiet) show_ipfw(rule, 10, 10); } static void zero(int ac, char *av[]) { int rulenum; int failed = EX_OK; av++; ac--; if (!ac) { /* clear all entries */ if (setsockopt(s, IPPROTO_IP, IP_FW_ZERO, NULL, 0) < 0) err(EX_UNAVAILABLE, "setsockopt(%s)", "IP_FW_ZERO"); if (!do_quiet) printf("Accounting cleared.\n"); return; } while (ac) { /* Rule number */ if (isdigit(**av)) { rulenum = atoi(*av); av++; ac--; if (setsockopt(s, IPPROTO_IP, IP_FW_ZERO, &rulenum, sizeof rulenum)) { warn("rule %u: setsockopt(IP_FW_ZERO)", rulenum); failed = EX_UNAVAILABLE; } else if (!do_quiet) printf("Entry %d cleared\n", rulenum); } 
else { errx(EX_USAGE, "invalid rule number ``%s''", *av); } } if (failed != EX_OK) exit(failed); } static void resetlog(int ac, char *av[]) { int rulenum; int failed = EX_OK; av++; ac--; if (!ac) { /* clear all entries */ if (setsockopt(s, IPPROTO_IP, IP_FW_RESETLOG, NULL, 0) < 0) err(EX_UNAVAILABLE, "setsockopt(IP_FW_RESETLOG)"); if (!do_quiet) printf("Logging counts reset.\n"); return; } while (ac) { /* Rule number */ if (isdigit(**av)) { rulenum = atoi(*av); av++; ac--; if (setsockopt(s, IPPROTO_IP, IP_FW_RESETLOG, &rulenum, sizeof rulenum)) { warn("rule %u: setsockopt(IP_FW_RESETLOG)", rulenum); failed = EX_UNAVAILABLE; } else if (!do_quiet) printf("Entry %d logging count reset\n", rulenum); } else { errx(EX_DATAERR, "invalid rule number ``%s''", *av); } } if (failed != EX_OK) exit(failed); } static void flush() { int cmd = do_pipe ? IP_DUMMYNET_FLUSH : IP_FW_FLUSH; if (!do_force && !do_quiet) { /* need to ask user */ int c; printf("Are you sure? [yn] "); fflush(stdout); do { c = toupper(getc(stdin)); while (c != '\n' && getc(stdin) != '\n') if (feof(stdin)) return; /* and do not flush */ } while (c != 'Y' && c != 'N'); printf("\n"); if (c == 'N') /* user said no */ return; } if (setsockopt(s, IPPROTO_IP, cmd, NULL, 0) < 0) err(EX_UNAVAILABLE, "setsockopt(IP_%s_FLUSH)", do_pipe ? "DUMMYNET" : "FW"); if (!do_quiet) printf("Flushed all %s.\n", do_pipe ? 
"pipes" : "rules"); } static int ipfw_main(int ac, char **av) { int ch; if (ac == 1) show_usage(); /* Set the force flag for non-interactive processes */ do_force = !isatty(STDIN_FILENO); optind = optreset = 1; while ((ch = getopt(ac, av, "hs:acdefNqStv")) != -1) switch (ch) { case 'h': /* help */ help(); break; /* NOTREACHED */ case 's': /* sort */ do_sort = atoi(optarg); break; case 'a': do_acct = 1; break; case 'c': do_compact = 1; break; case 'd': do_dynamic = 1; break; case 'e': do_expired = 1; break; case 'f': do_force = 1; break; case 'N': do_resolv = 1; break; case 'q': do_quiet = 1; break; case 'S': show_sets = 1; break; case 't': do_time = 1; break; case 'v': /* verbose */ verbose++; break; default: show_usage(); } ac -= optind; av += optind; NEED1("bad arguments, for usage summary ``ipfw''"); /* * optional: pipe or queue */ if (!strncmp(*av, "pipe", strlen(*av))) { do_pipe = 1; ac--; av++; } else if (!strncmp(*av, "queue", strlen(*av))) { do_pipe = 2; ac--; av++; } NEED1("missing command"); /* * for pipes and queues we normally say 'pipe NN config' * but the code is easier to parse as 'pipe config NN' * so we swap the two arguments. 
*/ if (do_pipe > 0 && ac > 1 && *av[0] >= '0' && *av[0] <= '9') { char *p = av[0]; av[0] = av[1]; av[1] = p; } if (!strncmp(*av, "add", strlen(*av))) add(ac, av); else if (do_pipe && !strncmp(*av, "config", strlen(*av))) config_pipe(ac, av); else if (!strncmp(*av, "delete", strlen(*av))) delete(ac, av); else if (!strncmp(*av, "flush", strlen(*av))) flush(); else if (!strncmp(*av, "zero", strlen(*av))) zero(ac, av); else if (!strncmp(*av, "resetlog", strlen(*av))) resetlog(ac, av); else if (!strncmp(*av, "print", strlen(*av)) || !strncmp(*av, "list", strlen(*av))) list(ac, av); else if (!strncmp(*av, "set", strlen(*av))) sets_handler(ac, av); else if (!strncmp(*av, "enable", strlen(*av))) sysctl_handler(ac, av, 1); else if (!strncmp(*av, "disable", strlen(*av))) sysctl_handler(ac, av, 0); else if (!strncmp(*av, "show", strlen(*av))) { do_acct++; list(ac, av); } else errx(EX_USAGE, "bad command `%s'", *av); return 0; } static void ipfw_readfile(int ac, char *av[]) { #define MAX_ARGS 32 #define WHITESP " \t\f\v\n\r" char buf[BUFSIZ]; char *a, *p, *args[MAX_ARGS], *cmd = NULL; char linename[10]; int i=0, lineno=0, qflag=0, pflag=0, status; FILE *f = NULL; pid_t preproc = 0; int c; - while ((c = getopt(ac, av, "D:U:p:q")) != -1) + while ((c = getopt(ac, av, "p:q")) != -1) { switch(c) { - case 'D': - if (!pflag) - errx(EX_USAGE, "-D requires -p"); - if (i > MAX_ARGS - 2) - errx(EX_USAGE, - "too many -D or -U options"); - args[i++] = "-D"; - args[i++] = optarg; - break; - - case 'U': - if (!pflag) - errx(EX_USAGE, "-U requires -p"); - if (i > MAX_ARGS - 2) - errx(EX_USAGE, - "too many -D or -U options"); - args[i++] = "-U"; - args[i++] = optarg; - break; - case 'p': pflag = 1; cmd = optarg; args[0] = cmd; i = 1; break; case 'q': qflag = 1; break; default: errx(EX_USAGE, "bad arguments, for usage" " summary ``ipfw''"); } + + if (pflag) + break; + } + + if (pflag) { + /* Pass all but the last argument to the preprocessor. 
*/ + while (optind < ac - 1) { + if (i >= MAX_ARGS) + errx(EX_USAGE, "too many preprocessor options"); + args[i++] = av[optind++]; + } + } av += optind; ac -= optind; if (ac != 1) errx(EX_USAGE, "extraneous filename arguments"); if ((f = fopen(av[0], "r")) == NULL) err(EX_UNAVAILABLE, "fopen: %s", av[0]); if (pflag) { /* pipe through preprocessor (cpp or m4) */ int pipedes[2]; args[i] = 0; if (pipe(pipedes) == -1) err(EX_OSERR, "cannot create pipe"); switch((preproc = fork())) { case -1: err(EX_OSERR, "cannot fork"); case 0: /* child */ if (dup2(fileno(f), 0) == -1 || dup2(pipedes[1], 1) == -1) err(EX_OSERR, "dup2()"); fclose(f); close(pipedes[1]); close(pipedes[0]); execvp(cmd, args); err(EX_OSERR, "execvp(%s) failed", cmd); default: /* parent */ fclose(f); close(pipedes[1]); if ((f = fdopen(pipedes[0], "r")) == NULL) { int savederrno = errno; (void)kill(preproc, SIGTERM); errno = savederrno; err(EX_OSERR, "fdopen()"); } } } while (fgets(buf, BUFSIZ, f)) { lineno++; sprintf(linename, "Line %d", lineno); args[0] = linename; if (*buf == '#') continue; if ((p = strchr(buf, '#')) != NULL) *p = '\0'; i = 1; if (qflag) args[i++] = "-q"; for (a = strtok(buf, WHITESP); a && i < MAX_ARGS; a = strtok(NULL, WHITESP), i++) args[i] = a; if (i == (qflag? 2: 1)) continue; if (i == MAX_ARGS) errx(EX_USAGE, "%s: too many arguments", linename); args[i] = NULL; ipfw_main(i, args); } fclose(f); if (pflag) { if (waitpid(preproc, &status, 0) == -1) errx(EX_OSERR, "waitpid()"); if (WIFEXITED(status) && WEXITSTATUS(status) != EX_OK) errx(EX_UNAVAILABLE, "preprocessor exited with status %d", WEXITSTATUS(status)); else if (WIFSIGNALED(status)) errx(EX_UNAVAILABLE, "preprocessor exited with signal %d", WTERMSIG(status)); } } int main(int ac, char *av[]) { s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (s < 0) err(EX_UNAVAILABLE, "socket"); /* * If the last argument is an absolute pathname, interpret it * as a file to be preprocessed. 
*/ if (ac > 1 && av[ac - 1][0] == '/' && access(av[ac - 1], R_OK) == 0) ipfw_readfile(ac, av); else ipfw_main(ac, av); return EX_OK; } Index: stable/4/sys/netinet/ip_dummynet.c =================================================================== --- stable/4/sys/netinet/ip_dummynet.c (revision 116991) +++ stable/4/sys/netinet/ip_dummynet.c (revision 116992) @@ -1,1991 +1,1999 @@ /* * Copyright (c) 1998-2002 Luigi Rizzo, Universita` di Pisa * Portions Copyright (c) 2000 Akamba Corp. * All rights reserved * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #if !defined(KLD_MODULE) #include "opt_ipfw.h" /* for IPFW2 definition */ #endif #define DEB(x) #define DDB(x) x /* * This module implements IP dummynet, a bandwidth limiter/delay emulator * used in conjunction with the ipfw package. * Description of the data structures used is in ip_dummynet.h * Here you mainly find the following blocks of code: * + variable declarations; * + heap management functions; * + scheduler and dummynet functions; * + configuration and initialization. * * NOTA BENE: critical sections are protected by splimp()/splx() * pairs. One would think that splnet() is enough as for most of * the netinet code, but it is not so because when used with * bridging, dummynet is invoked at splimp(). * * Most important Changes: * * 011004: KLDable * 010124: Fixed WF2Q behaviour * 010122: Fixed spl protection. * 000601: WF2Q support * 000106: large rewrite, use heaps to handle very many pipes. * 980513: initial release * * include files marked with XXX are probably not needed */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* for struct arpcom */ #include /* * We keep a private variable for the simulation time, but we could * probably use an existing one ("softticks" in sys/kern/kern_timer.c) */ static dn_key curr_time = 0 ; /* current simulation time */ static int dn_hash_size = 64 ; /* default hash size */ /* statistics on number of queue searches and search steps */ static int searches, search_steps ; static int pipe_expire = 1 ; /* expire queue if empty */ static int dn_max_ratio = 16 ; /* max queues/buckets ratio */ static int red_lookup_depth = 256; /* RED - default lookup table depth */ static int red_avg_pkt_size = 512; /* RED - default medium packet size */ static int red_max_pkt_size = 1500; /* RED - default max packet size */ /* * Three heaps contain queues and pipes that the scheduler 
handles: * * ready_heap contains all dn_flow_queue related to fixed-rate pipes. * * wfq_ready_heap contains the pipes associated with WF2Q flows * * extract_heap contains pipes associated with delay lines. * */ MALLOC_DEFINE(M_DUMMYNET, "dummynet", "dummynet heap"); static struct dn_heap ready_heap, extract_heap, wfq_ready_heap ; static int heap_init(struct dn_heap *h, int size) ; static int heap_insert (struct dn_heap *h, dn_key key1, void *p); static void heap_extract(struct dn_heap *h, void *obj); static void transmit_event(struct dn_pipe *pipe); static void ready_event(struct dn_flow_queue *q); static struct dn_pipe *all_pipes = NULL ; /* list of all pipes */ static struct dn_flow_set *all_flow_sets = NULL ;/* list of all flow_sets */ static struct callout_handle dn_timeout; #ifdef SYSCTL_NODE SYSCTL_NODE(_net_inet_ip, OID_AUTO, dummynet, CTLFLAG_RW, 0, "Dummynet"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, hash_size, CTLFLAG_RW, &dn_hash_size, 0, "Default hash table size"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, curr_time, CTLFLAG_RD, &curr_time, 0, "Current tick"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, ready_heap, CTLFLAG_RD, &ready_heap.size, 0, "Size of ready heap"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, extract_heap, CTLFLAG_RD, &extract_heap.size, 0, "Size of extract heap"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, searches, CTLFLAG_RD, &searches, 0, "Number of queue searches"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, search_steps, CTLFLAG_RD, &search_steps, 0, "Number of queue search steps"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, expire, CTLFLAG_RW, &pipe_expire, 0, "Expire queue if empty"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, max_chain_len, CTLFLAG_RW, &dn_max_ratio, 0, "Max ratio between dynamic queues and buckets"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, red_lookup_depth, CTLFLAG_RD, &red_lookup_depth, 0, "Depth of RED lookup table"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, red_avg_pkt_size, CTLFLAG_RD, 
&red_avg_pkt_size, 0, "RED Medium packet size"); SYSCTL_INT(_net_inet_ip_dummynet, OID_AUTO, red_max_pkt_size, CTLFLAG_RD, &red_max_pkt_size, 0, "RED Max packet size"); #endif static int config_pipe(struct dn_pipe *p); static int ip_dn_ctl(struct sockopt *sopt); static void rt_unref(struct rtentry *); static void dummynet(void *); static void dummynet_flush(void); void dummynet_drain(void); static ip_dn_io_t dummynet_io; static void dn_rule_delete(void *); int if_tx_rdy(struct ifnet *ifp); static void rt_unref(struct rtentry *rt) { if (rt == NULL) return ; if (rt->rt_refcnt <= 0) printf("dummynet: warning, refcnt now %ld, decreasing\n", rt->rt_refcnt); RTFREE(rt); } /* * Heap management functions. * * In the heap, first node is element 0. Children of i are 2i+1 and 2i+2. * Some macros help finding parent/children so we can optimize them. * * heap_init() is called to expand the heap when needed. * Increment size in blocks of 16 entries. * XXX failure to allocate a new element is a pretty bad failure * as we basically stall a whole queue forever!! * Returns 1 on error, 0 on success */ #define HEAP_FATHER(x) ( ( (x) - 1 ) / 2 ) #define HEAP_LEFT(x) ( 2*(x) + 1 ) #define HEAP_IS_LEFT(x) ( (x) & 1 ) #define HEAP_RIGHT(x) ( 2*(x) + 2 ) #define HEAP_SWAP(a, b, buffer) { buffer = a ; a = b ; b = buffer ; } #define HEAP_INCREMENT 15 static int heap_init(struct dn_heap *h, int new_size) { struct dn_heap_entry *p; if (h->size >= new_size ) { printf("dummynet: heap_init, Bogus call, have %d want %d\n", h->size, new_size); return 0 ; } new_size = (new_size + HEAP_INCREMENT ) & ~HEAP_INCREMENT ; p = malloc(new_size * sizeof(*p), M_DUMMYNET, M_NOWAIT); if (p == NULL) { printf("dummynet: heap_init, resize %d failed\n", new_size ); return 1 ; /* error */ } if (h->size > 0) { bcopy(h->p, p, h->size * sizeof(*p) ); free(h->p, M_DUMMYNET); } h->p = p ; h->size = new_size ; return 0 ; } /* * Insert element in heap. Normally, p != NULL, we insert p in * a new position and bubble up. 
If p == NULL, then the element is * already in place, and key is the position where to start the * bubble-up. * Returns 1 on failure (cannot allocate new heap entry) * * If offset > 0 the position (index, int) of the element in the heap is * also stored in the element itself at the given offset in bytes. */ #define SET_OFFSET(heap, node) \ if (heap->offset > 0) \ *((int *)((char *)(heap->p[node].object) + heap->offset)) = node ; /* * RESET_OFFSET is used for sanity checks. It sets offset to an invalid value. */ #define RESET_OFFSET(heap, node) \ if (heap->offset > 0) \ *((int *)((char *)(heap->p[node].object) + heap->offset)) = -1 ; static int heap_insert(struct dn_heap *h, dn_key key1, void *p) { int son = h->elements ; if (p == NULL) /* data already there, set starting point */ son = key1 ; else { /* insert new element at the end, possibly resize */ son = h->elements ; if (son == h->size) /* need resize... */ if (heap_init(h, h->elements+1) ) return 1 ; /* failure... */ h->p[son].object = p ; h->p[son].key = key1 ; h->elements++ ; } while (son > 0) { /* bubble up */ int father = HEAP_FATHER(son) ; struct dn_heap_entry tmp ; if (DN_KEY_LT( h->p[father].key, h->p[son].key ) ) break ; /* found right position */ /* son smaller than father, swap and repeat */ HEAP_SWAP(h->p[son], h->p[father], tmp) ; SET_OFFSET(h, son); son = father ; } SET_OFFSET(h, son); return 0 ; } /* * remove top element from heap, or obj if obj != NULL */ static void heap_extract(struct dn_heap *h, void *obj) { int child, father, max = h->elements - 1 ; if (max < 0) { printf("dummynet: warning, extract from empty heap 0x%p\n", h); return ; } father = 0 ; /* default: move up smallest child */ if (obj != NULL) { /* extract specific element, index is at offset */ if (h->offset <= 0) panic("dummynet: heap_extract from middle not supported on this heap!!!\n"); father = *((int *)((char *)obj + h->offset)) ; if (father < 0 || father >= h->elements) { printf("dummynet: heap_extract, father %d out of 
bound 0..%d\n", father, h->elements); panic("dummynet: heap_extract"); } } RESET_OFFSET(h, father); child = HEAP_LEFT(father) ; /* left child */ while (child <= max) { /* valid entry */ if (child != max && DN_KEY_LT(h->p[child+1].key, h->p[child].key) ) child = child+1 ; /* take right child, otherwise left */ h->p[father] = h->p[child] ; SET_OFFSET(h, father); father = child ; child = HEAP_LEFT(child) ; /* left child for next loop */ } h->elements-- ; if (father != max) { /* * Fill hole with last entry and bubble up, reusing the insert code */ h->p[father] = h->p[max] ; heap_insert(h, father, NULL); /* this one cannot fail */ } } #if 0 /* * change object position and update references * XXX this one is never used! */ static void heap_move(struct dn_heap *h, dn_key new_key, void *object) { int temp; int i ; int max = h->elements-1 ; struct dn_heap_entry buf ; if (h->offset <= 0) panic("cannot move items on this heap"); i = *((int *)((char *)object + h->offset)); if (DN_KEY_LT(new_key, h->p[i].key) ) { /* must move up */ h->p[i].key = new_key ; for (; i>0 && DN_KEY_LT(new_key, h->p[(temp = HEAP_FATHER(i))].key) ; i = temp ) { /* bubble up */ HEAP_SWAP(h->p[i], h->p[temp], buf) ; SET_OFFSET(h, i); } } else { /* must move down */ h->p[i].key = new_key ; while ( (temp = HEAP_LEFT(i)) <= max ) { /* found left child */ if ((temp != max) && DN_KEY_GT(h->p[temp].key, h->p[temp+1].key)) temp++ ; /* select child with min key */ if (DN_KEY_GT(new_key, h->p[temp].key)) { /* go down */ HEAP_SWAP(h->p[i], h->p[temp], buf) ; SET_OFFSET(h, i); } else break ; i = temp ; } } SET_OFFSET(h, i); } #endif /* heap_move, unused */ /* * heapify() will reorganize data inside an array to maintain the * heap property. It is needed when we delete a bunch of entries. 
*/ static void heapify(struct dn_heap *h) { int i ; for (i = 0 ; i < h->elements ; i++ ) heap_insert(h, i , NULL) ; } /* * cleanup the heap and free data structure */ static void heap_free(struct dn_heap *h) { if (h->size >0 ) free(h->p, M_DUMMYNET); bzero(h, sizeof(*h) ); } /* * --- end of heap management functions --- */ /* * Scheduler functions: * * transmit_event() is called when the delay-line needs to enter * the scheduler, either because of existing pkts getting ready, * or new packets entering the queue. The event handled is the delivery * time of the packet. * * ready_event() does something similar with fixed-rate queues, and the * event handled is the finish time of the head pkt. * * wfq_ready_event() does something similar with WF2Q queues, and the * event handled is the start time of the head pkt. * * In all cases, we make sure that the data structures are consistent * before passing pkts out, because this might trigger recursive * invocations of the procedures. */ static void transmit_event(struct dn_pipe *pipe) { struct dn_pkt *pkt ; while ( (pkt = pipe->head) && DN_KEY_LEQ(pkt->output_time, curr_time) ) { /* * first unlink, then call procedures, since ip_input() can invoke * ip_output() and viceversa, thus causing nested calls */ pipe->head = DN_NEXT(pkt) ; /* * The actual mbuf is preceded by a struct dn_pkt, resembling an mbuf * (NOT A REAL one, just a small block of malloc'ed memory) with * m_type = MT_TAG, m_flags = PACKET_TAG_DUMMYNET * dn_m (m_next) = actual mbuf to be processed by ip_input/output * and some other fields. * The block IS FREED HERE because it contains parameters passed * to the called routine. */ switch (pkt->dn_dir) { case DN_TO_IP_OUT: (void)ip_output((struct mbuf *)pkt, NULL, NULL, 0, NULL, NULL); rt_unref (pkt->ro.ro_rt) ; break ; case DN_TO_IP_IN : ip_input((struct mbuf *)pkt) ; break ; case DN_TO_BDG_FWD : if (!BDG_LOADED) { /* somebody unloaded the bridge module. 
Drop pkt */ printf("dummynet: dropping bridged packet trapped in pipe\n"); m_freem(pkt->dn_m); break; } /* fallthrough */ case DN_TO_ETH_DEMUX: { struct mbuf *m = (struct mbuf *)pkt ; struct ether_header *eh; if (pkt->dn_m->m_len < ETHER_HDR_LEN && (pkt->dn_m = m_pullup(pkt->dn_m, ETHER_HDR_LEN)) == NULL) { printf("dummynet/bridge: pullup fail, dropping pkt\n"); break; } /* * same as ether_input, make eh be a pointer into the mbuf */ eh = mtod(pkt->dn_m, struct ether_header *); m_adj(pkt->dn_m, ETHER_HDR_LEN); /* * bdg_forward() wants a pointer to the pseudo-mbuf-header, but * on return it will supply the pointer to the actual packet * (originally pkt->dn_m, but could be something else now) if * it has not consumed it. */ if (pkt->dn_dir == DN_TO_BDG_FWD) { m = bdg_forward_ptr(m, eh, pkt->ifp); if (m) m_freem(m); } else ether_demux(NULL, eh, m); /* which consumes the mbuf */ } break ; case DN_TO_ETH_OUT: ether_output_frame(pkt->ifp, (struct mbuf *)pkt); break; default: printf("dummynet: bad switch %d!\n", pkt->dn_dir); m_freem(pkt->dn_m); break ; } free(pkt, M_DUMMYNET); } /* if there are leftover packets, put into the heap for next event */ if ( (pkt = pipe->head) ) heap_insert(&extract_heap, pkt->output_time, pipe ) ; /* XXX should check errors on heap_insert, by draining the * whole pipe p and hoping in the future we are more successful */ } /* * the following macro computes how many ticks we have to wait * before being able to transmit a packet. 
The credit is taken from * either a pipe (WF2Q) or a flow_queue (per-flow queueing) */ #define SET_TICKS(pkt, q, p) \ (pkt->dn_m->m_pkthdr.len*8*hz - (q)->numbytes + p->bandwidth - 1 ) / \ p->bandwidth ; /* * extract pkt from queue, compute output time (could be now) * and put into delay line (p_queue) */ static void move_pkt(struct dn_pkt *pkt, struct dn_flow_queue *q, struct dn_pipe *p, int len) { q->head = DN_NEXT(pkt) ; q->len-- ; q->len_bytes -= len ; pkt->output_time = curr_time + p->delay ; if (p->head == NULL) p->head = pkt; else DN_NEXT(p->tail) = pkt; p->tail = pkt; DN_NEXT(p->tail) = NULL; } /* * ready_event() is invoked every time the queue must enter the * scheduler, either because the first packet arrives, or because * a previously scheduled event fired. * On invokation, drain as many pkts as possible (could be 0) and then * if there are leftover packets reinsert the pkt in the scheduler. */ static void ready_event(struct dn_flow_queue *q) { struct dn_pkt *pkt; struct dn_pipe *p = q->fs->pipe ; int p_was_empty ; if (p == NULL) { printf("dummynet: ready_event- pipe is gone\n"); return ; } p_was_empty = (p->head == NULL) ; /* * schedule fixed-rate queues linked to this pipe: * Account for the bw accumulated since last scheduling, then * drain as many pkts as allowed by q->numbytes and move to * the delay line (in p) computing output time. * bandwidth==0 (no limit) means we can drain the whole queue, * setting len_scaled = 0 does the job. */ q->numbytes += ( curr_time - q->sched_time ) * p->bandwidth; while ( (pkt = q->head) != NULL ) { int len = pkt->dn_m->m_pkthdr.len; int len_scaled = p->bandwidth ? len*8*hz : 0 ; if (len_scaled > q->numbytes ) break ; q->numbytes -= len_scaled ; move_pkt(pkt, q, p, len); } /* * If we have more packets queued, schedule next ready event * (can only occur when bandwidth != 0, otherwise we would have * flushed the whole queue in the previous loop). 
* To this purpose we record the current time and compute how many * ticks to go for the finish time of the packet. */ if ( (pkt = q->head) != NULL ) { /* this implies bandwidth != 0 */ dn_key t = SET_TICKS(pkt, q, p); /* ticks i have to wait */ q->sched_time = curr_time ; heap_insert(&ready_heap, curr_time + t, (void *)q ); /* XXX should check errors on heap_insert, and drain the whole * queue on error hoping next time we are luckier. */ } else { /* RED needs to know when the queue becomes empty */ q->q_time = curr_time; q->numbytes = 0; } /* * If the delay line was empty call transmit_event(p) now. * Otherwise, the scheduler will take care of it. */ if (p_was_empty) transmit_event(p); } /* * Called when we can transmit packets on WF2Q queues. Take pkts out of * the queues at their start time, and enqueue into the delay line. * Packets are drained until p->numbytes < 0. As long as * len_scaled >= p->numbytes, the packet goes into the delay line * with a deadline p->delay. For the last packet, if p->numbytes<0, * there is an additional delay. */ static void ready_event_wfq(struct dn_pipe *p) { int p_was_empty = (p->head == NULL) ; struct dn_heap *sch = &(p->scheduler_heap); struct dn_heap *neh = &(p->not_eligible_heap) ; if (p->if_name[0] == 0) /* tx clock is simulated */ p->numbytes += ( curr_time - p->sched_time ) * p->bandwidth; else { /* tx clock is for real, the ifq must be empty or this is a NOP */ if (p->ifp && p->ifp->if_snd.ifq_head != NULL) return ; else { DEB(printf("dummynet: pipe %d ready from %s --\n", p->pipe_nr, p->if_name);) } } /* * While we have backlogged traffic AND credit, we need to do * something on the queue. */ while ( p->numbytes >=0 && (sch->elements>0 || neh->elements >0) ) { if (sch->elements > 0) { /* have some eligible pkts to send out */ struct dn_flow_queue *q = sch->p[0].object ; struct dn_pkt *pkt = q->head; struct dn_flow_set *fs = q->fs; u_int64_t len = pkt->dn_m->m_pkthdr.len; int len_scaled = p->bandwidth ? 
len*8*hz : 0 ; heap_extract(sch, NULL); /* remove queue from heap */ p->numbytes -= len_scaled ; move_pkt(pkt, q, p, len); p->V += (len<<MY_M) / p->sum ; /* update V */ q->S = q->F ; /* update start time */ if (q->len == 0) { /* Flow not backlogged any more */ fs->backlogged-- ; heap_insert(&(p->idle_heap), q->F, q); } else { /* still backlogged */ /* * update F and position in backlogged queue, then * put flow in not_eligible_heap (we will fix this later). */ len = (q->head)->dn_m->m_pkthdr.len; q->F += (len<<MY_M)/(u_int64_t) fs->weight ; if (DN_KEY_LEQ(q->S, p->V)) heap_insert(neh, q->S, q); else heap_insert(sch, q->F, q); } } /* * now compute V = max(V, min(S_i)). Remember that all elements in sch * have by definition S_i <= V so if sch is not empty, V is surely * the max and we must not update it. Conversely, if sch is empty * we only need to look at neh. */ if (sch->elements == 0 && neh->elements > 0) p->V = MAX64 ( p->V, neh->p[0].key ); /* move from neh to sch any packets that have become eligible */ while (neh->elements > 0 && DN_KEY_LEQ(neh->p[0].key, p->V) ) { struct dn_flow_queue *q = neh->p[0].object ; heap_extract(neh, NULL); heap_insert(sch, q->F, q); } if (p->if_name[0] != '\0') {/* tx clock is from a real thing */ p->numbytes = -1 ; /* mark not ready for I/O */ break ; } } if (sch->elements == 0 && neh->elements == 0 && p->numbytes >= 0 && p->idle_heap.elements > 0) { /* * no traffic and no events scheduled. We can get rid of idle-heap. */ int i ; for (i = 0 ; i < p->idle_heap.elements ; i++) { struct dn_flow_queue *q = p->idle_heap.p[i].object ; q->F = 0 ; q->S = q->F + 1 ; } p->sum = 0 ; p->V = 0 ; p->idle_heap.elements = 0 ; } /* * If we are getting clocks from dummynet (not a real interface) and * If we are under credit, schedule the next ready event. * Also fix the delivery time of the last packet.
*/ if (p->if_name[0]==0 && p->numbytes < 0) { /* this implies bandwidth >0 */ dn_key t=0 ; /* number of ticks i have to wait */ if (p->bandwidth > 0) t = ( p->bandwidth -1 - p->numbytes) / p->bandwidth ; p->tail->output_time += t ; p->sched_time = curr_time ; heap_insert(&wfq_ready_heap, curr_time + t, (void *)p); /* XXX should check errors on heap_insert, and drain the whole * queue on error hoping next time we are luckier. */ } /* * If the delay line was empty call transmit_event(p) now. * Otherwise, the scheduler will take care of it. */ if (p_was_empty) transmit_event(p); } /* * This is called once per tick, or HZ times per second. It is used to * increment the current tick counter and schedule expired events. */ static void dummynet(void * __unused unused) { void *p ; /* generic parameter to handler */ struct dn_heap *h ; int s ; struct dn_heap *heaps[3]; int i; struct dn_pipe *pe ; heaps[0] = &ready_heap ; /* fixed-rate queues */ heaps[1] = &wfq_ready_heap ; /* wfq queues */ heaps[2] = &extract_heap ; /* delay line */ s = splimp(); /* see note on top, splnet() is not enough */ curr_time++ ; for (i=0; i < 3 ; i++) { h = heaps[i]; while (h->elements > 0 && DN_KEY_LEQ(h->p[0].key, curr_time) ) { DDB(if (h->p[0].key > curr_time) printf("dummynet: warning, heap %d is %d ticks late\n", i, (int)(curr_time - h->p[0].key));) p = h->p[0].object ; /* store a copy before heap_extract */ heap_extract(h, NULL); /* need to extract before processing */ if (i == 0) ready_event(p) ; else if (i == 1) { struct dn_pipe *pipe = p; if (pipe->if_name[0] != '\0') printf("dummynet: bad ready_event_wfq for pipe %s\n", pipe->if_name); else ready_event_wfq(p) ; } else transmit_event(p); } } /* sweep pipes trying to expire idle flow_queues */ for (pe = all_pipes; pe ; pe = pe->next ) if (pe->idle_heap.elements > 0 && DN_KEY_LT(pe->idle_heap.p[0].key, pe->V) ) { struct dn_flow_queue *q = pe->idle_heap.p[0].object ; heap_extract(&(pe->idle_heap), NULL); q->S = q->F + 1 ; /* mark timestamp 
as invalid */ pe->sum -= q->fs->weight ; } splx(s); dn_timeout = timeout(dummynet, NULL, 1); } /* * called by an interface when tx_rdy occurs. */ int if_tx_rdy(struct ifnet *ifp) { struct dn_pipe *p; for (p = all_pipes; p ; p = p->next ) if (p->ifp == ifp) break ; if (p == NULL) { char buf[32]; sprintf(buf, "%s%d",ifp->if_name, ifp->if_unit); for (p = all_pipes; p ; p = p->next ) if (!strcmp(p->if_name, buf) ) { p->ifp = ifp ; DEB(printf("dummynet: ++ tx rdy from %s (now found)\n", buf);) break ; } } if (p != NULL) { DEB(printf("dummynet: ++ tx rdy from %s%d - qlen %d\n", ifp->if_name, ifp->if_unit, ifp->if_snd.ifq_len);) p->numbytes = 0 ; /* mark ready for I/O */ ready_event_wfq(p); } return 0; } /* * Unconditionally expire empty queues in case of shortage. * Returns the number of queues freed. */ static int expire_queues(struct dn_flow_set *fs) { struct dn_flow_queue *q, *prev ; int i, initial_elements = fs->rq_elements ; if (fs->last_expired == time_second) return 0 ; fs->last_expired = time_second ; for (i = 0 ; i <= fs->rq_size ; i++) /* last one is overflow */ for (prev=NULL, q = fs->rq[i] ; q != NULL ; ) if (q->head != NULL || q->S != q->F+1) { prev = q ; q = q->next ; } else { /* entry is idle, expire it */ struct dn_flow_queue *old_q = q ; if (prev != NULL) prev->next = q = q->next ; else fs->rq[i] = q = q->next ; fs->rq_elements-- ; free(old_q, M_DUMMYNET); } return initial_elements - fs->rq_elements ; } /* * If room, create a new queue and put at head of slot i; * otherwise, create or use the default queue. */ static struct dn_flow_queue * create_queue(struct dn_flow_set *fs, int i) { struct dn_flow_queue *q ; if (fs->rq_elements > fs->rq_size * dn_max_ratio && expire_queues(fs) == 0) { /* * No way to get room, use or create overflow queue. 
*/ i = fs->rq_size ; if ( fs->rq[i] != NULL ) return fs->rq[i] ; } q = malloc(sizeof(*q), M_DUMMYNET, M_NOWAIT | M_ZERO); if (q == NULL) { printf("dummynet: sorry, cannot allocate queue for new flow\n"); return NULL ; } q->fs = fs ; q->hash_slot = i ; q->next = fs->rq[i] ; q->S = q->F + 1; /* hack - mark timestamp as invalid */ fs->rq[i] = q ; fs->rq_elements++ ; return q ; } /* * Given a flow_set and a pkt in last_pkt, find a matching queue * after appropriate masking. The queue is moved to front * so that further searches take less time. */ static struct dn_flow_queue * find_queue(struct dn_flow_set *fs, struct ipfw_flow_id *id) { int i = 0 ; /* we need i and q for new allocations */ struct dn_flow_queue *q, *prev; if ( !(fs->flags_fs & DN_HAVE_FLOW_MASK) ) q = fs->rq[0] ; else { /* first, do the masking */ id->dst_ip &= fs->flow_mask.dst_ip ; id->src_ip &= fs->flow_mask.src_ip ; id->dst_port &= fs->flow_mask.dst_port ; id->src_port &= fs->flow_mask.src_port ; id->proto &= fs->flow_mask.proto ; id->flags = 0 ; /* we don't care about this one */ /* then, hash function */ i = ( (id->dst_ip) & 0xffff ) ^ ( (id->dst_ip >> 15) & 0xffff ) ^ ( (id->src_ip << 1) & 0xffff ) ^ ( (id->src_ip >> 16 ) & 0xffff ) ^ (id->dst_port << 1) ^ (id->src_port) ^ (id->proto ); i = i % fs->rq_size ; /* finally, scan the current list for a match */ searches++ ; for (prev=NULL, q = fs->rq[i] ; q ; ) { search_steps++; if (id->dst_ip == q->id.dst_ip && id->src_ip == q->id.src_ip && id->dst_port == q->id.dst_port && id->src_port == q->id.src_port && id->proto == q->id.proto && id->flags == q->id.flags) break ; /* found */ else if (pipe_expire && q->head == NULL && q->S == q->F+1 ) { /* entry is idle and not in any heap, expire it */ struct dn_flow_queue *old_q = q ; if (prev != NULL) prev->next = q = q->next ; else fs->rq[i] = q = q->next ; fs->rq_elements-- ; free(old_q, M_DUMMYNET); continue ; } prev = q ; q = q->next ; } if (q && prev != NULL) { /* found and not in front */ prev->next = 
q->next ; q->next = fs->rq[i] ; fs->rq[i] = q ; } } if (q == NULL) { /* no match, need to allocate a new entry */ q = create_queue(fs, i); if (q != NULL) q->id = *id ; } return q ; } static int red_drops(struct dn_flow_set *fs, struct dn_flow_queue *q, int len) { /* * RED algorithm * * RED calculates the average queue size (avg) using a low-pass filter * with an exponential weighted (w_q) moving average: * avg <- (1-w_q) * avg + w_q * q_size * where q_size is the queue length (measured in bytes or * packets). * * If q_size == 0, we compute the idle time for the link, and set * avg = (1 - w_q)^(idle/s) * where s is the time needed for transmitting a medium-sized packet. * * Now, if avg < min_th the packet is enqueued. * If avg > max_th the packet is dropped. Otherwise, the packet is * dropped with probability P function of avg. * */ int64_t p_b = 0; /* queue in bytes or packets ? */ u_int q_size = (fs->flags_fs & DN_QSIZE_IS_BYTES) ? q->len_bytes : q->len; DEB(printf("\ndummynet: %d q: %2u ", (int) curr_time, q_size);) /* average queue size estimation */ if (q_size != 0) { /* * queue is not empty, avg <- avg + (q_size - avg) * w_q */ int diff = SCALE(q_size) - q->avg; int64_t v = SCALE_MUL((int64_t) diff, (int64_t) fs->w_q); q->avg += (int) v; } else { /* * queue is empty, find for how long the queue has been * empty and use a lookup table for computing * (1 - * w_q)^(idle_time/s) where s is the time to send a * (small) packet. * XXX check wraps... */ if (q->avg) { u_int t = (curr_time - q->q_time) / fs->lookup_step; q->avg = (t < fs->lookup_depth) ? SCALE_MUL(q->avg, fs->w_q_lookup[t]) : 0; } } DEB(printf("dummynet: avg: %u ", SCALE_VAL(q->avg));) /* should i drop ? 
*/ if (q->avg < fs->min_th) { q->count = -1; return 0; /* accept packet ; */ } if (q->avg >= fs->max_th) { /* average queue >= max threshold */ if (fs->flags_fs & DN_IS_GENTLE_RED) { /* * According to Gentle-RED, if avg is greater than max_th the * packet is dropped with a probability * p_b = c_3 * avg - c_4 * where c_3 = (1 - max_p) / max_th, and c_4 = 1 - 2 * max_p */ p_b = SCALE_MUL((int64_t) fs->c_3, (int64_t) q->avg) - fs->c_4; } else { q->count = -1; DEB(printf("dummynet: - drop");); return 1 ; } } else if (q->avg > fs->min_th) { /* * we compute p_b using the linear dropping function p_b = c_1 * * avg - c_2, where c_1 = max_p / (max_th - min_th), and c_2 = * max_p * min_th / (max_th - min_th) */ p_b = SCALE_MUL((int64_t) fs->c_1, (int64_t) q->avg) - fs->c_2; } if (fs->flags_fs & DN_QSIZE_IS_BYTES) p_b = (p_b * len) / fs->max_pkt_size; if (++q->count == 0) q->random = random() & 0xffff; else { /* * q->count counts packets arrived since last drop, so a greater * value of q->count means a greater packet drop probability. 
*/ if (SCALE_MUL(p_b, SCALE((int64_t) q->count)) > q->random) { q->count = 0; DEB(printf("dummynet: - red drop");) /* after a drop we calculate a new random value */ q->random = random() & 0xffff; return 1; /* drop */ } } /* end of RED algorithm */ return 0 ; /* accept */ } static __inline struct dn_flow_set * locate_flowset(int pipe_nr, struct ip_fw *rule) { #if IPFW2 struct dn_flow_set *fs; ipfw_insn *cmd = rule->cmd + rule->act_ofs; if (cmd->opcode == O_LOG) cmd += F_LEN(cmd); +#ifdef __i386__ fs = ((ipfw_insn_pipe *)cmd)->pipe_ptr; +#else + bcopy(& ((ipfw_insn_pipe *)cmd)->pipe_ptr, &fs, sizeof(fs)); +#endif if (fs != NULL) return fs; if (cmd->opcode == O_QUEUE) #else /* !IPFW2 */ struct dn_flow_set *fs = NULL ; if ( (rule->fw_flg & IP_FW_F_COMMAND) == IP_FW_F_QUEUE ) #endif /* !IPFW2 */ for (fs=all_flow_sets; fs && fs->fs_nr != pipe_nr; fs=fs->next) ; else { struct dn_pipe *p1; for (p1 = all_pipes; p1 && p1->pipe_nr != pipe_nr; p1 = p1->next) ; if (p1 != NULL) fs = &(p1->fs) ; } /* record for the future */ #if IPFW2 +#ifdef __i386__ ((ipfw_insn_pipe *)cmd)->pipe_ptr = fs; #else + bcopy(&fs, & ((ipfw_insn_pipe *)cmd)->pipe_ptr, sizeof(fs)); +#endif +#else if (fs != NULL) rule->pipe_ptr = fs; #endif return fs ; } /* * dummynet hook for packets. Below 'pipe' is a pipe or a queue * depending on whether WF2Q or fixed bw is used. * * pipe_nr pipe or queue the packet is destined for. * dir where shall we send the packet after dummynet. * m the mbuf with the packet * ifp the 'ifp' parameter from the caller. 
* NULL in ip_input, destination interface in ip_output, * real_dst in bdg_forward * ro route parameter (only used in ip_output, NULL otherwise) * dst destination address, only used by ip_output * rule matching rule, in case of multiple passes * flags flags from the caller, only used in ip_output * */ static int dummynet_io(struct mbuf *m, int pipe_nr, int dir, struct ip_fw_args *fwa) { struct dn_pkt *pkt; struct dn_flow_set *fs; struct dn_pipe *pipe ; u_int64_t len = m->m_pkthdr.len ; struct dn_flow_queue *q = NULL ; int s = splimp(); int is_pipe; #if IPFW2 ipfw_insn *cmd = fwa->rule->cmd + fwa->rule->act_ofs; if (cmd->opcode == O_LOG) cmd += F_LEN(cmd); is_pipe = (cmd->opcode == O_PIPE); #else is_pipe = (fwa->rule->fw_flg & IP_FW_F_COMMAND) == IP_FW_F_PIPE; #endif pipe_nr &= 0xffff ; /* - * this is a dummynet rule, so we expect an O_PIPE or O_QUEUE rule. + * This is a dummynet rule, so we expect an O_PIPE or O_QUEUE rule. */ fs = locate_flowset(pipe_nr, fwa->rule); if (fs == NULL) goto dropit ; /* this queue/pipe does not exist! 
*/ pipe = fs->pipe ; if (pipe == NULL) { /* must be a queue, try find a matching pipe */ for (pipe = all_pipes; pipe && pipe->pipe_nr != fs->parent_nr; pipe = pipe->next) ; if (pipe != NULL) fs->pipe = pipe ; else { printf("dummynet: no pipe %d for queue %d, drop pkt\n", fs->parent_nr, fs->fs_nr); goto dropit ; } } q = find_queue(fs, &(fwa->f_id)); if ( q == NULL ) goto dropit ; /* cannot allocate queue */ /* * update statistics, then check reasons to drop pkt */ q->tot_bytes += len ; q->tot_pkts++ ; if ( fs->plr && random() < fs->plr ) goto dropit ; /* random pkt drop */ if ( fs->flags_fs & DN_QSIZE_IS_BYTES) { if (q->len_bytes > fs->qsize) goto dropit ; /* queue size overflow */ } else { if (q->len >= fs->qsize) goto dropit ; /* queue count overflow */ } if ( fs->flags_fs & DN_IS_RED && red_drops(fs, q, len) ) goto dropit ; /* XXX expensive to zero, see if we can remove it*/ pkt = (struct dn_pkt *)malloc(sizeof (*pkt), M_DUMMYNET, M_NOWAIT|M_ZERO); if ( pkt == NULL ) goto dropit ; /* cannot allocate packet header */ /* ok, i can handle the pkt now... */ /* build and enqueue packet + parameters */ pkt->hdr.mh_type = MT_TAG; pkt->hdr.mh_flags = PACKET_TAG_DUMMYNET; pkt->rule = fwa->rule ; DN_NEXT(pkt) = NULL; pkt->dn_m = m; pkt->dn_dir = dir ; pkt->ifp = fwa->oif; if (dir == DN_TO_IP_OUT) { /* * We need to copy *ro because for ICMP pkts (and maybe others) * the caller passed a pointer into the stack; dst might also be * a pointer into *ro so it needs to be updated. 
*/ pkt->ro = *(fwa->ro); if (fwa->ro->ro_rt) fwa->ro->ro_rt->rt_refcnt++ ; if (fwa->dst == (struct sockaddr_in *)&fwa->ro->ro_dst) /* dst points into ro */ fwa->dst = (struct sockaddr_in *)&(pkt->ro.ro_dst) ; pkt->dn_dst = fwa->dst; pkt->flags = fwa->flags; } if (q->head == NULL) q->head = pkt; else DN_NEXT(q->tail) = pkt; q->tail = pkt; q->len++; q->len_bytes += len ; if ( q->head != pkt ) /* flow was not idle, we are done */ goto done; /* * If we reach this point the flow was previously idle, so we need * to schedule it. This involves different actions for fixed-rate or * WF2Q queues. */ if (is_pipe) { /* * Fixed-rate queue: just insert into the ready_heap. */ dn_key t = 0 ; if (pipe->bandwidth) t = SET_TICKS(pkt, q, pipe); q->sched_time = curr_time ; if (t == 0) /* must process it now */ ready_event( q ); else heap_insert(&ready_heap, curr_time + t , q ); } else { /* * WF2Q. First, compute start time S: if the flow was idle (S=F+1) * set S to the virtual time V for the controlling pipe, and update * the sum of weights for the pipe; otherwise, remove flow from * idle_heap and set S to max(F,V). * Second, compute finish time F = S + len/weight. * Third, if pipe was idle, update V=max(S, V). * Fourth, count one more backlogged flow. */ if (DN_KEY_GT(q->S, q->F)) { /* means timestamps are invalid */ q->S = pipe->V ; pipe->sum += fs->weight ; /* add weight of new queue */ } else { heap_extract(&(pipe->idle_heap), q); q->S = MAX64(q->F, pipe->V ) ; } q->F = q->S + ( len<<MY_M )/(u_int64_t) fs->weight; if (pipe->not_eligible_heap.elements == 0 && pipe->scheduler_heap.elements == 0) pipe->V = MAX64 ( q->S, pipe->V ); fs->backlogged++ ; /* * Look at eligibility. A flow is not eligible if S>V (when * this happens, it means that there is some other flow already * scheduled for the same pipe, so the scheduler_heap cannot be * empty). If the flow is not eligible we just store it in the * not_eligible_heap.
Otherwise, we store in the scheduler_heap * and possibly invoke ready_event_wfq() right now if there is * leftover credit. * Note that for all flows in scheduler_heap (SCH), S_i <= V, * and for all flows in not_eligible_heap (NEH), S_i > V . * So when we need to compute max( V, min(S_i) ) forall i in SCH+NEH, * we only need to look into NEH. */ if (DN_KEY_GT(q->S, pipe->V) ) { /* not eligible */ if (pipe->scheduler_heap.elements == 0) printf("dummynet: ++ ouch! not eligible but empty scheduler!\n"); heap_insert(&(pipe->not_eligible_heap), q->S, q); } else { heap_insert(&(pipe->scheduler_heap), q->F, q); if (pipe->numbytes >= 0) { /* pipe is idle */ if (pipe->scheduler_heap.elements != 1) printf("dummynet: OUCH! pipe should have been idle!\n"); DEB(printf("dummynet: waking up pipe %d at %d\n", pipe->pipe_nr, (int)(q->F >> MY_M)); ) pipe->sched_time = curr_time ; ready_event_wfq(pipe); } } } done: splx(s); return 0; dropit: splx(s); if (q) q->drops++ ; m_freem(m); return ( (fs && (fs->flags_fs & DN_NOERROR)) ? 0 : ENOBUFS); } /* * Below, the rt_unref is only needed when (pkt->dn_dir == DN_TO_IP_OUT) * Doing this would probably save us the initial bzero of dn_pkt */ #define DN_FREE_PKT(pkt) { \ struct dn_pkt *n = pkt ; \ rt_unref ( n->ro.ro_rt ) ; \ m_freem(n->dn_m); \ pkt = DN_NEXT(n) ; \ free(n, M_DUMMYNET) ; } /* * Dispose all packets and flow_queues on a flow_set. * If all=1, also remove red lookup table and other storage, * including the descriptor itself. * For the one in dn_pipe MUST also cleanup ready_heap... 
*/ static void purge_flow_set(struct dn_flow_set *fs, int all) { struct dn_pkt *pkt ; struct dn_flow_queue *q, *qn ; int i ; for (i = 0 ; i <= fs->rq_size ; i++ ) { for (q = fs->rq[i] ; q ; q = qn ) { for (pkt = q->head ; pkt ; ) DN_FREE_PKT(pkt) ; qn = q->next ; free(q, M_DUMMYNET); } fs->rq[i] = NULL ; } fs->rq_elements = 0 ; if (all) { /* RED - free lookup table */ if (fs->w_q_lookup) free(fs->w_q_lookup, M_DUMMYNET); if (fs->rq) free(fs->rq, M_DUMMYNET); /* if this fs is not part of a pipe, free it */ if (fs->pipe && fs != &(fs->pipe->fs) ) free(fs, M_DUMMYNET); } } /* * Dispose all packets queued on a pipe (not a flow_set). * Also free all resources associated to a pipe, which is about * to be deleted. */ static void purge_pipe(struct dn_pipe *pipe) { struct dn_pkt *pkt ; purge_flow_set( &(pipe->fs), 1 ); for (pkt = pipe->head ; pkt ; ) DN_FREE_PKT(pkt) ; heap_free( &(pipe->scheduler_heap) ); heap_free( &(pipe->not_eligible_heap) ); heap_free( &(pipe->idle_heap) ); } /* * Delete all pipes and heaps returning memory. Must also * remove references from all ipfw rules to all pipes. */ static void dummynet_flush() { struct dn_pipe *curr_p, *p ; struct dn_flow_set *fs, *curr_fs; int s ; s = splimp() ; /* remove all references to pipes ...*/ flush_pipe_ptrs(NULL); /* prevent future matches... */ p = all_pipes ; all_pipes = NULL ; fs = all_flow_sets ; all_flow_sets = NULL ; /* and free heaps so we don't have unwanted events */ heap_free(&ready_heap); heap_free(&wfq_ready_heap); heap_free(&extract_heap); splx(s) ; /* * Now purge all queued pkts and delete all pipes */ /* scan and purge all flow_sets. 
*/ for ( ; fs ; ) { curr_fs = fs ; fs = fs->next ; purge_flow_set(curr_fs, 1); } for ( ; p ; ) { purge_pipe(p); curr_p = p ; p = p->next ; free(curr_p, M_DUMMYNET); } } extern struct ip_fw *ip_fw_default_rule ; static void dn_rule_delete_fs(struct dn_flow_set *fs, void *r) { int i ; struct dn_flow_queue *q ; struct dn_pkt *pkt ; for (i = 0 ; i <= fs->rq_size ; i++) /* last one is ovflow */ for (q = fs->rq[i] ; q ; q = q->next ) for (pkt = q->head ; pkt ; pkt = DN_NEXT(pkt) ) if (pkt->rule == r) pkt->rule = ip_fw_default_rule ; } /* * when a firewall rule is deleted, scan all queues and remove the flow-id * from packets matching this rule. */ void dn_rule_delete(void *r) { struct dn_pipe *p ; struct dn_pkt *pkt ; struct dn_flow_set *fs ; /* * If the rule references a queue (dn_flow_set), then scan * the flow set, otherwise scan pipes. Should do either, but doing * both does not harm. */ for ( fs = all_flow_sets ; fs ; fs = fs->next ) dn_rule_delete_fs(fs, r); for ( p = all_pipes ; p ; p = p->next ) { fs = &(p->fs) ; dn_rule_delete_fs(fs, r); for (pkt = p->head ; pkt ; pkt = DN_NEXT(pkt) ) if (pkt->rule == r) pkt->rule = ip_fw_default_rule ; } } /* * setup RED parameters */ static int config_red(struct dn_flow_set *p, struct dn_flow_set * x) { int i; x->w_q = p->w_q; x->min_th = SCALE(p->min_th); x->max_th = SCALE(p->max_th); x->max_p = p->max_p; x->c_1 = p->max_p / (p->max_th - p->min_th); x->c_2 = SCALE_MUL(x->c_1, SCALE(p->min_th)); if (x->flags_fs & DN_IS_GENTLE_RED) { x->c_3 = (SCALE(1) - p->max_p) / p->max_th; x->c_4 = (SCALE(1) - 2 * p->max_p); } /* if the lookup table already exist, free and create it again */ if (x->w_q_lookup) { free(x->w_q_lookup, M_DUMMYNET); x->w_q_lookup = NULL ; } if (red_lookup_depth == 0) { printf("\ndummynet: net.inet.ip.dummynet.red_lookup_depth must be > 0\n"); free(x, M_DUMMYNET); return EINVAL; } x->lookup_depth = red_lookup_depth; x->w_q_lookup = (u_int *) malloc(x->lookup_depth * sizeof(int), M_DUMMYNET, M_NOWAIT); if 
(x->w_q_lookup == NULL) { printf("dummynet: sorry, cannot allocate red lookup table\n"); free(x, M_DUMMYNET); return ENOSPC; } /* fill the lookup table with (1 - w_q)^x */ x->lookup_step = p->lookup_step ; x->lookup_weight = p->lookup_weight ; x->w_q_lookup[0] = SCALE(1) - x->w_q; for (i = 1; i < x->lookup_depth; i++) x->w_q_lookup[i] = SCALE_MUL(x->w_q_lookup[i - 1], x->lookup_weight); if (red_avg_pkt_size < 1) red_avg_pkt_size = 512 ; x->avg_pkt_size = red_avg_pkt_size ; if (red_max_pkt_size < 1) red_max_pkt_size = 1500 ; x->max_pkt_size = red_max_pkt_size ; return 0 ; } static int alloc_hash(struct dn_flow_set *x, struct dn_flow_set *pfs) { if (x->flags_fs & DN_HAVE_FLOW_MASK) { /* allocate some slots */ int l = pfs->rq_size; if (l == 0) l = dn_hash_size; if (l < 4) l = 4; else if (l > DN_MAX_HASH_SIZE) l = DN_MAX_HASH_SIZE; x->rq_size = l; } else /* one is enough for null mask */ x->rq_size = 1; x->rq = malloc((1 + x->rq_size) * sizeof(struct dn_flow_queue *), M_DUMMYNET, M_NOWAIT | M_ZERO); if (x->rq == NULL) { printf("dummynet: sorry, cannot allocate queue\n"); return ENOSPC; } x->rq_elements = 0; return 0 ; } static void set_fs_parms(struct dn_flow_set *x, struct dn_flow_set *src) { x->flags_fs = src->flags_fs; x->qsize = src->qsize; x->plr = src->plr; x->flow_mask = src->flow_mask; if (x->flags_fs & DN_QSIZE_IS_BYTES) { if (x->qsize > 1024*1024) x->qsize = 1024*1024 ; } else { if (x->qsize == 0) x->qsize = 50 ; if (x->qsize > 100) x->qsize = 50 ; } /* configuring RED */ if ( x->flags_fs & DN_IS_RED ) config_red(src, x) ; /* XXX should check errors */ } /* * setup pipe or queue parameters. */ static int config_pipe(struct dn_pipe *p) { int i, s; struct dn_flow_set *pfs = &(p->fs); struct dn_flow_queue *q; /* * The config program passes parameters as follows: * bw = bits/second (0 means no limits), * delay = ms, must be translated into ticks. 
* qsize = slots/bytes */ p->delay = ( p->delay * hz ) / 1000 ; /* We need either a pipe number or a flow_set number */ if (p->pipe_nr == 0 && pfs->fs_nr == 0) return EINVAL ; if (p->pipe_nr != 0 && pfs->fs_nr != 0) return EINVAL ; if (p->pipe_nr != 0) { /* this is a pipe */ struct dn_pipe *x, *a, *b; /* locate pipe */ for (a = NULL , b = all_pipes ; b && b->pipe_nr < p->pipe_nr ; a = b , b = b->next) ; if (b == NULL || b->pipe_nr != p->pipe_nr) { /* new pipe */ x = malloc(sizeof(struct dn_pipe), M_DUMMYNET, M_NOWAIT | M_ZERO); if (x == NULL) { printf("dummynet: no memory for new pipe\n"); return ENOSPC; } x->pipe_nr = p->pipe_nr; x->fs.pipe = x ; /* idle_heap is the only one from which we extract from the middle. */ x->idle_heap.size = x->idle_heap.elements = 0 ; x->idle_heap.offset=OFFSET_OF(struct dn_flow_queue, heap_pos); } else { x = b; s = splimp(); /* Flush accumulated credit for all queues */ for (i = 0; i <= x->fs.rq_size; i++) for (q = x->fs.rq[i]; q; q = q->next) q->numbytes = 0; splx(s); } s = splimp(); x->bandwidth = p->bandwidth ; x->numbytes = 0; /* just in case... 
*/ bcopy(p->if_name, x->if_name, sizeof(p->if_name) ); x->ifp = NULL ; /* reset interface ptr */ x->delay = p->delay ; set_fs_parms(&(x->fs), pfs); if ( x->fs.rq == NULL ) { /* a new pipe */ s = alloc_hash(&(x->fs), pfs) ; if (s) { free(x, M_DUMMYNET); return s ; } x->next = b ; if (a == NULL) all_pipes = x ; else a->next = x ; } splx(s); } else { /* config queue */ struct dn_flow_set *x, *a, *b ; /* locate flow_set */ for (a=NULL, b=all_flow_sets ; b && b->fs_nr < pfs->fs_nr ; a = b , b = b->next) ; if (b == NULL || b->fs_nr != pfs->fs_nr) { /* new */ if (pfs->parent_nr == 0) /* need link to a pipe */ return EINVAL ; x = malloc(sizeof(struct dn_flow_set), M_DUMMYNET, M_NOWAIT|M_ZERO); if (x == NULL) { printf("dummynet: no memory for new flow_set\n"); return ENOSPC; } x->fs_nr = pfs->fs_nr; x->parent_nr = pfs->parent_nr; x->weight = pfs->weight ; if (x->weight == 0) x->weight = 1 ; else if (x->weight > 100) x->weight = 100 ; } else { /* Change parent pipe not allowed; must delete and recreate */ if (pfs->parent_nr != 0 && b->parent_nr != pfs->parent_nr) return EINVAL ; x = b; } s = splimp(); set_fs_parms(x, pfs); if ( x->rq == NULL ) { /* a new flow_set */ s = alloc_hash(x, pfs) ; if (s) { free(x, M_DUMMYNET); return s ; } x->next = b; if (a == NULL) all_flow_sets = x; else a->next = x; } splx(s); } return 0 ; } /* * Helper function to remove from a heap queues which are linked to * a flow_set about to be deleted. 
*/ static void fs_remove_from_heap(struct dn_heap *h, struct dn_flow_set *fs) { int i = 0, found = 0 ; for (; i < h->elements ;) if ( ((struct dn_flow_queue *)h->p[i].object)->fs == fs) { h->elements-- ; h->p[i] = h->p[h->elements] ; found++ ; } else i++ ; if (found) heapify(h); } /* * helper function to remove a pipe from a heap (can be there at most once) */ static void pipe_remove_from_heap(struct dn_heap *h, struct dn_pipe *p) { if (h->elements > 0) { int i = 0 ; for (i=0; i < h->elements ; i++ ) { if (h->p[i].object == p) { /* found it */ h->elements-- ; h->p[i] = h->p[h->elements] ; heapify(h); break ; } } } } /* * drain all queues. Called in case of severe mbuf shortage. */ void dummynet_drain() { struct dn_flow_set *fs; struct dn_pipe *p; struct dn_pkt *pkt; heap_free(&ready_heap); heap_free(&wfq_ready_heap); heap_free(&extract_heap); /* remove all references to this pipe from flow_sets */ for (fs = all_flow_sets; fs; fs= fs->next ) purge_flow_set(fs, 0); for (p = all_pipes; p; p= p->next ) { purge_flow_set(&(p->fs), 0); for (pkt = p->head ; pkt ; ) DN_FREE_PKT(pkt) ; p->head = p->tail = NULL ; } } /* * Fully delete a pipe or a queue, cleaning up associated info. */ static int delete_pipe(struct dn_pipe *p) { int s ; if (p->pipe_nr == 0 && p->fs.fs_nr == 0) return EINVAL ; if (p->pipe_nr != 0 && p->fs.fs_nr != 0) return EINVAL ; if (p->pipe_nr != 0) { /* this is an old-style pipe */ struct dn_pipe *a, *b; struct dn_flow_set *fs; /* locate pipe */ for (a = NULL , b = all_pipes ; b && b->pipe_nr < p->pipe_nr ; a = b , b = b->next) ; if (b == NULL || (b->pipe_nr != p->pipe_nr) ) return EINVAL ; /* not found */ s = splimp() ; /* unlink from list of pipes */ if (a == NULL) all_pipes = b->next ; else a->next = b->next ; /* remove references to this pipe from the ip_fw rules. 
*/ flush_pipe_ptrs(&(b->fs)); /* remove all references to this pipe from flow_sets */ for (fs = all_flow_sets; fs; fs= fs->next ) if (fs->pipe == b) { printf("dummynet: ++ ref to pipe %d from fs %d\n", p->pipe_nr, fs->fs_nr); fs->pipe = NULL ; purge_flow_set(fs, 0); } fs_remove_from_heap(&ready_heap, &(b->fs)); purge_pipe(b); /* remove all data associated to this pipe */ /* remove reference to here from extract_heap and wfq_ready_heap */ pipe_remove_from_heap(&extract_heap, b); pipe_remove_from_heap(&wfq_ready_heap, b); splx(s); free(b, M_DUMMYNET); } else { /* this is a WF2Q queue (dn_flow_set) */ struct dn_flow_set *a, *b; /* locate set */ for (a = NULL, b = all_flow_sets ; b && b->fs_nr < p->fs.fs_nr ; a = b , b = b->next) ; if (b == NULL || (b->fs_nr != p->fs.fs_nr) ) return EINVAL ; /* not found */ s = splimp() ; if (a == NULL) all_flow_sets = b->next ; else a->next = b->next ; /* remove references to this flow_set from the ip_fw rules. */ flush_pipe_ptrs(b); if (b->pipe != NULL) { /* Update total weight on parent pipe and cleanup parent heaps */ b->pipe->sum -= b->weight * b->backlogged ; fs_remove_from_heap(&(b->pipe->not_eligible_heap), b); fs_remove_from_heap(&(b->pipe->scheduler_heap), b); #if 1 /* XXX should i remove from idle_heap as well ? 
*/ fs_remove_from_heap(&(b->pipe->idle_heap), b); #endif } purge_flow_set(b, 1); splx(s); } return 0 ; } /* * helper function used to copy data from kernel in DUMMYNET_GET */ static char * dn_copy_set(struct dn_flow_set *set, char *bp) { int i, copied = 0 ; struct dn_flow_queue *q, *qp = (struct dn_flow_queue *)bp; for (i = 0 ; i <= set->rq_size ; i++) for (q = set->rq[i] ; q ; q = q->next, qp++ ) { if (q->hash_slot != i) printf("dummynet: ++ at %d: wrong slot (have %d, " "should be %d)\n", copied, q->hash_slot, i); if (q->fs != set) printf("dummynet: ++ at %d: wrong fs ptr (have %p, should be %p)\n", i, q->fs, set); copied++ ; bcopy(q, qp, sizeof( *q ) ); /* cleanup pointers */ qp->next = NULL ; qp->head = qp->tail = NULL ; qp->fs = NULL ; } if (copied != set->rq_elements) printf("dummynet: ++ wrong count, have %d should be %d\n", copied, set->rq_elements); return (char *)qp ; } static int dummynet_get(struct sockopt *sopt) { char *buf, *bp ; /* bp is the "copy-pointer" */ size_t size ; struct dn_flow_set *set ; struct dn_pipe *p ; int s, error=0 ; s = splimp(); /* * compute size of data structures: list of pipes and flow_sets. */ for (p = all_pipes, size = 0 ; p ; p = p->next ) size += sizeof( *p ) + p->fs.rq_elements * sizeof(struct dn_flow_queue); for (set = all_flow_sets ; set ; set = set->next ) size += sizeof ( *set ) + set->rq_elements * sizeof(struct dn_flow_queue); buf = malloc(size, M_TEMP, M_NOWAIT); if (buf == 0) { splx(s); return ENOBUFS ; } for (p = all_pipes, bp = buf ; p ; p = p->next ) { struct dn_pipe *pipe_bp = (struct dn_pipe *)bp ; /* * copy pipe descriptor into *bp, convert delay back to ms, * then copy the flow_set descriptor(s) one at a time. * After each flow_set, copy the queue descriptor it owns. */ bcopy(p, bp, sizeof( *p ) ); pipe_bp->delay = (pipe_bp->delay * 1000) / hz ; /* * XXX the following is a hack based on ->next being the * first field in dn_pipe and dn_flow_set. 
The correct * solution would be to move the dn_flow_set to the beginning * of struct dn_pipe. */ pipe_bp->next = (struct dn_pipe *)DN_IS_PIPE ; /* clean pointers */ pipe_bp->head = pipe_bp->tail = NULL ; pipe_bp->fs.next = NULL ; pipe_bp->fs.pipe = NULL ; pipe_bp->fs.rq = NULL ; bp += sizeof( *p ) ; bp = dn_copy_set( &(p->fs), bp ); } for (set = all_flow_sets ; set ; set = set->next ) { struct dn_flow_set *fs_bp = (struct dn_flow_set *)bp ; bcopy(set, bp, sizeof( *set ) ); /* XXX same hack as above */ fs_bp->next = (struct dn_flow_set *)DN_IS_QUEUE ; fs_bp->pipe = NULL ; fs_bp->rq = NULL ; bp += sizeof( *set ) ; bp = dn_copy_set( set, bp ); } splx(s); error = sooptcopyout(sopt, buf, size); free(buf, M_TEMP); return error ; } /* * Handler for the various dummynet socket options (get, flush, config, del) */ static int ip_dn_ctl(struct sockopt *sopt) { int error = 0 ; struct dn_pipe *p, tmp_pipe; /* Disallow sets in really-really secure mode. */ if (sopt->sopt_dir == SOPT_SET) { #if __FreeBSD_version >= 500034 error = securelevel_ge(sopt->sopt_td->td_ucred, 3); if (error) return (error); #else if (securelevel >= 3) return (EPERM); #endif } switch (sopt->sopt_name) { default : printf("dummynet: -- unknown option %d", sopt->sopt_name); return EINVAL ; case IP_DUMMYNET_GET : error = dummynet_get(sopt); break ; case IP_DUMMYNET_FLUSH : dummynet_flush() ; break ; case IP_DUMMYNET_CONFIGURE : p = &tmp_pipe ; error = sooptcopyin(sopt, p, sizeof *p, sizeof *p); if (error) break ; error = config_pipe(p); break ; case IP_DUMMYNET_DEL : /* remove a pipe or queue */ p = &tmp_pipe ; error = sooptcopyin(sopt, p, sizeof *p, sizeof *p); if (error) break ; error = delete_pipe(p); break ; } return error ; } static void ip_dn_init(void) { printf("DUMMYNET initialized (011031)\n"); all_pipes = NULL ; all_flow_sets = NULL ; ready_heap.size = ready_heap.elements = 0 ; ready_heap.offset = 0 ; wfq_ready_heap.size = wfq_ready_heap.elements = 0 ; wfq_ready_heap.offset = 0 ; extract_heap.size = 
extract_heap.elements = 0 ; extract_heap.offset = 0 ; ip_dn_ctl_ptr = ip_dn_ctl; ip_dn_io_ptr = dummynet_io; ip_dn_ruledel_ptr = dn_rule_delete; bzero(&dn_timeout, sizeof(struct callout_handle)); dn_timeout = timeout(dummynet, NULL, 1); } static int dummynet_modevent(module_t mod, int type, void *data) { int s; switch (type) { case MOD_LOAD: s = splimp(); if (DUMMYNET_LOADED) { splx(s); printf("DUMMYNET already loaded\n"); return EEXIST ; } ip_dn_init(); splx(s); break; case MOD_UNLOAD: #if !defined(KLD_MODULE) printf("dummynet statically compiled, cannot unload\n"); return EINVAL ; #else s = splimp(); untimeout(dummynet, NULL, dn_timeout); dummynet_flush(); ip_dn_ctl_ptr = NULL; ip_dn_io_ptr = NULL; ip_dn_ruledel_ptr = NULL; splx(s); #endif break ; default: break ; } return 0 ; } static moduledata_t dummynet_mod = { "dummynet", dummynet_modevent, NULL }; DECLARE_MODULE(dummynet, dummynet_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_DEPEND(dummynet, ipfw, 1, 1, 1); MODULE_VERSION(dummynet, 1); Index: stable/4/sys/netinet/ip_fw2.c =================================================================== --- stable/4/sys/netinet/ip_fw2.c (revision 116991) +++ stable/4/sys/netinet/ip_fw2.c (revision 116992) @@ -1,2776 +1,2847 @@ /* * Copyright (c) 2002 Luigi Rizzo, Universita` di Pisa * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #define DEB(x) #define DDB(x) x /* * Implement IP packet firewall (new version) */ #if !defined(KLD_MODULE) #include "opt_ipfw.h" #include "opt_ipdn.h" #include "opt_ipdivert.h" #include "opt_inet.h" #ifndef INET #error IPFIREWALL requires INET. #endif /* INET */ #endif #if IPFW2 #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* XXX for ETHERTYPE_IP */ #include /* XXX for in_cksum */ /* * XXX This one should go in sys/mbuf.h. It is used to avoid that * a firewall-generated packet loops forever through the firewall. */ #ifndef M_SKIP_FIREWALL #define M_SKIP_FIREWALL 0x4000 #endif /* * set_disable contains one bit per set value (0..31). * If the bit is set, all rules with the corresponding set * are disabled. Set 31 is reserved for the default rule * and CANNOT be disabled. 
*/ static u_int32_t set_disable; static int fw_verbose; static int verbose_limit; static struct callout_handle ipfw_timeout_h; #define IPFW_DEFAULT_RULE 65535 /* * list of rules for layer 3 */ static struct ip_fw *layer3_chain; MALLOC_DEFINE(M_IPFW, "IpFw/IpAcct", "IpFw/IpAcct chain's"); static int fw_debug = 1; static int autoinc_step = 100; /* bounded to 1..1000 in add_rule() */ #ifdef SYSCTL_NODE SYSCTL_NODE(_net_inet_ip, OID_AUTO, fw, CTLFLAG_RW, 0, "Firewall"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, enable, CTLFLAG_RW, &fw_enable, 0, "Enable ipfw"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, autoinc_step, CTLFLAG_RW, &autoinc_step, 0, "Rule number autincrement step"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO,one_pass,CTLFLAG_RW, &fw_one_pass, 0, "Only do a single pass through ipfw when using dummynet(4)"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, debug, CTLFLAG_RW, &fw_debug, 0, "Enable printing of debug ip_fw statements"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, verbose, CTLFLAG_RW, &fw_verbose, 0, "Log matches to ipfw rules"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, verbose_limit, CTLFLAG_RW, &verbose_limit, 0, "Set upper limit of matches of ipfw rules logged"); /* * Description of dynamic rules. * * Dynamic rules are stored in lists accessed through a hash table * (ipfw_dyn_v) whose size is curr_dyn_buckets. This value can * be modified through the sysctl variable dyn_buckets which is * updated when the table becomes empty. * * XXX currently there is only one list, ipfw_dyn. * * When a packet is received, its address fields are first masked * with the mask defined for the rule, then hashed, then matched * against the entries in the corresponding list. * Dynamic rules can be used for different purposes: * + stateful rules; * + enforcing limits on the number of sessions; * + in-kernel NAT (not implemented yet) * * The lifetime of dynamic rules is regulated by dyn_*_lifetime, * measured in seconds and depending on the flags. 
 * * The total number of dynamic rules is stored in dyn_count. * The max number of dynamic rules is dyn_max. When we reach * the maximum number of rules we do not create any more. This is * done to avoid consuming too much memory, but also too much * time when searching on each packet (ideally, we should try instead * to put a limit on the length of the list on each bucket...). * * Each dynamic rule holds a pointer to the parent ipfw rule so * we know what action to perform. Dynamic rules are removed when * the parent rule is deleted. XXX we should make them survive. * * There are some limitations with dynamic rules -- we do not * obey the 'randomized match', and we do not do multiple * passes through the firewall. XXX check the latter!!! */ static ipfw_dyn_rule **ipfw_dyn_v = NULL; static u_int32_t dyn_buckets = 256; /* must be power of 2 */ static u_int32_t curr_dyn_buckets = 256; /* must be power of 2 */ /* * Timeouts for various events in handling dynamic rules. */ static u_int32_t dyn_ack_lifetime = 300; static u_int32_t dyn_syn_lifetime = 20; static u_int32_t dyn_fin_lifetime = 1; static u_int32_t dyn_rst_lifetime = 1; static u_int32_t dyn_udp_lifetime = 10; static u_int32_t dyn_short_lifetime = 5; /* * Keepalives are sent if dyn_keepalive is set. They are sent every * dyn_keepalive_period seconds, in the last dyn_keepalive_interval * seconds of lifetime of a rule. * dyn_rst_lifetime and dyn_fin_lifetime should be strictly lower * than dyn_keepalive_period. */ static u_int32_t dyn_keepalive_interval = 20; static u_int32_t dyn_keepalive_period = 5; static u_int32_t dyn_keepalive = 1; /* do send keepalives */ static u_int32_t static_count; /* # of static rules */ static u_int32_t static_len; /* size in bytes of static rules */ static u_int32_t dyn_count; /* # of dynamic rules */ static u_int32_t dyn_max = 4096; /* max # of dynamic rules */ SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_buckets, CTLFLAG_RW, &dyn_buckets, 0, "Number of dyn.
 buckets"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, curr_dyn_buckets, CTLFLAG_RD, &curr_dyn_buckets, 0, "Current Number of dyn. buckets"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_count, CTLFLAG_RD, &dyn_count, 0, "Number of dyn. rules"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_max, CTLFLAG_RW, &dyn_max, 0, "Max number of dyn. rules"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, static_count, CTLFLAG_RD, &static_count, 0, "Number of static rules"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_ack_lifetime, CTLFLAG_RW, &dyn_ack_lifetime, 0, "Lifetime of dyn. rules for acks"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_syn_lifetime, CTLFLAG_RW, &dyn_syn_lifetime, 0, "Lifetime of dyn. rules for syn"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_fin_lifetime, CTLFLAG_RW, &dyn_fin_lifetime, 0, "Lifetime of dyn. rules for fin"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_rst_lifetime, CTLFLAG_RW, &dyn_rst_lifetime, 0, "Lifetime of dyn. rules for rst"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_udp_lifetime, CTLFLAG_RW, &dyn_udp_lifetime, 0, "Lifetime of dyn. rules for UDP"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_short_lifetime, CTLFLAG_RW, &dyn_short_lifetime, 0, "Lifetime of dyn. rules for other situations"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_keepalive, CTLFLAG_RW, &dyn_keepalive, 0, "Enable keepalives for dyn. rules"); #endif /* SYSCTL_NODE */ static ip_fw_chk_t ipfw_chk; ip_dn_ruledel_t *ip_dn_ruledel_ptr = NULL; /* hook into dummynet */ /* * This macro maps an ip pointer into a layer3 header pointer of type T */ #define L3HDR(T, ip) ((T *)((u_int32_t *)(ip) + (ip)->ip_hl)) static __inline int icmptype_match(struct ip *ip, ipfw_insn_u32 *cmd) { int type = L3HDR(struct icmp,ip)->icmp_type; return (type <= ICMP_MAXTYPE && (cmd->d[0] & (1<<type)) ); } #define TT ( (1 << ICMP_ECHO) | (1 << ICMP_ROUTERSOLICIT) | \ (1 << ICMP_TSTAMP) | (1 << ICMP_IREQ) | (1 << ICMP_MASKREQ) ) static int is_icmp_query(struct ip *ip) { int type = L3HDR(struct icmp, ip)->icmp_type; return (type <= ICMP_MAXTYPE && (TT & (1<<type)) ); } #undef TT /* * The following checks use two arrays of 8 or 16 bits to store the * bits that we want set or clear, respectively. They are in the * low and high half of cmd->arg1 or cmd->d[0]. * * We scan options and store the bits we find set.
We succeed if * * (want_set & ~bits) == 0 && (want_clear & ~bits) == want_clear * * The code is sometimes optimized not to store additional variables. */ static int flags_match(ipfw_insn *cmd, u_int8_t bits) { u_char want_clear; bits = ~bits; if ( ((cmd->arg1 & 0xff) & bits) != 0) return 0; /* some bits we want set were clear */ want_clear = (cmd->arg1 >> 8) & 0xff; if ( (want_clear & bits) != want_clear) return 0; /* some bits we want clear were set */ return 1; } static int ipopts_match(struct ip *ip, ipfw_insn *cmd) { int optlen, bits = 0; u_char *cp = (u_char *)(ip + 1); int x = (ip->ip_hl << 2) - sizeof (struct ip); for (; x > 0; x -= optlen, cp += optlen) { int opt = cp[IPOPT_OPTVAL]; if (opt == IPOPT_EOL) break; if (opt == IPOPT_NOP) optlen = 1; else { optlen = cp[IPOPT_OLEN]; if (optlen <= 0 || optlen > x) return 0; /* invalid or truncated */ } switch (opt) { default: break; case IPOPT_LSRR: bits |= IP_FW_IPOPT_LSRR; break; case IPOPT_SSRR: bits |= IP_FW_IPOPT_SSRR; break; case IPOPT_RR: bits |= IP_FW_IPOPT_RR; break; case IPOPT_TS: bits |= IP_FW_IPOPT_TS; break; } } return (flags_match(cmd, bits)); } static int tcpopts_match(struct ip *ip, ipfw_insn *cmd) { int optlen, bits = 0; struct tcphdr *tcp = L3HDR(struct tcphdr,ip); u_char *cp = (u_char *)(tcp + 1); int x = (tcp->th_off << 2) - sizeof(struct tcphdr); for (; x > 0; x -= optlen, cp += optlen) { int opt = cp[0]; if (opt == TCPOPT_EOL) break; if (opt == TCPOPT_NOP) optlen = 1; else { optlen = cp[1]; if (optlen <= 0) break; } switch (opt) { default: break; case TCPOPT_MAXSEG: bits |= IP_FW_TCPOPT_MSS; break; case TCPOPT_WINDOW: bits |= IP_FW_TCPOPT_WINDOW; break; case TCPOPT_SACK_PERMITTED: case TCPOPT_SACK: bits |= IP_FW_TCPOPT_SACK; break; case TCPOPT_TIMESTAMP: bits |= IP_FW_TCPOPT_TS; break; case TCPOPT_CC: case TCPOPT_CCNEW: case TCPOPT_CCECHO: bits |= IP_FW_TCPOPT_CC; break; } } return (flags_match(cmd, bits)); } static int iface_match(struct ifnet *ifp, ipfw_insn_if *cmd) { if (ifp == NULL) /* no 
 iface with this packet, match fails */ return 0; /* Check by name or by IP address */ if (cmd->name[0] != '\0') { /* match by name */ /* Check unit number (-1 is wildcard) */ if (cmd->p.unit != -1 && cmd->p.unit != ifp->if_unit) return(0); /* Check name */ if (!strncmp(ifp->if_name, cmd->name, IFNAMSIZ)) return(1); } else { struct ifaddr *ia; TAILQ_FOREACH(ia, &ifp->if_addrhead, ifa_link) { if (ia->ifa_addr == NULL) continue; if (ia->ifa_addr->sa_family != AF_INET) continue; if (cmd->p.ip.s_addr == ((struct sockaddr_in *) (ia->ifa_addr))->sin_addr.s_addr) return(1); /* match */ } } return(0); /* no match, fail ... */ } +/* + * The 'verrevpath' option checks that the interface that an IP packet + * arrives on is the same interface that traffic destined for the + * packet's source address would be routed out of. This is a measure + * to block forged packets. This is also commonly known as "anti-spoofing" + * or Unicast Reverse Path Forwarding (Unicast RPF) in Cisco-ese. The + * name of the knob is purposely reminiscent of the Cisco IOS command, + * + * ip verify unicast reverse-path + * + * which implements the same functionality. But note that the syntax is + * misleading. The check may be performed on all IP packets whether unicast, + * multicast, or broadcast. + */ +static int +verify_rev_path(struct in_addr src, struct ifnet *ifp) +{ + static struct route ro; + struct sockaddr_in *dst; + + dst = (struct sockaddr_in *)&(ro.ro_dst); + + /* Check if we've cached the route from the previous call. */ + if (src.s_addr != dst->sin_addr.s_addr) { + ro.ro_rt = NULL; + + bzero(dst, sizeof(*dst)); + dst->sin_family = AF_INET; + dst->sin_len = sizeof(*dst); + dst->sin_addr = src; + + rtalloc_ign(&ro, RTF_CLONING|RTF_PRCLONING); + } + + if ((ro.ro_rt == NULL) || (ifp == NULL) || + (ro.ro_rt->rt_ifp->if_index != ifp->if_index)) + return 0; + + return 1; +} + + static u_int64_t norule_counter; /* counter for ipfw_log(NULL...)
*/ #define SNPARGS(buf, len) buf + len, sizeof(buf) > len ? sizeof(buf) - len : 0 #define SNP(buf) buf, sizeof(buf) /* * We enter here when we have a rule with O_LOG. * XXX this function alone takes about 2Kbytes of code! */ static void ipfw_log(struct ip_fw *f, u_int hlen, struct ether_header *eh, struct mbuf *m, struct ifnet *oif) { char *action; int limit_reached = 0; char action2[40], proto[48], fragment[28]; fragment[0] = '\0'; proto[0] = '\0'; if (f == NULL) { /* bogus pkt */ if (verbose_limit != 0 && norule_counter >= verbose_limit) return; norule_counter++; if (norule_counter == verbose_limit) limit_reached = verbose_limit; action = "Refuse"; } else { /* O_LOG is the first action, find the real one */ ipfw_insn *cmd = ACTION_PTR(f); ipfw_insn_log *l = (ipfw_insn_log *)cmd; if (l->max_log != 0 && l->log_left == 0) return; l->log_left--; if (l->log_left == 0) limit_reached = l->max_log; cmd += F_LEN(cmd); /* point to first action */ if (cmd->opcode == O_PROB) cmd += F_LEN(cmd); action = action2; switch (cmd->opcode) { case O_DENY: action = "Deny"; break; case O_REJECT: if (cmd->arg1==ICMP_REJECT_RST) action = "Reset"; else if (cmd->arg1==ICMP_UNREACH_HOST) action = "Reject"; else snprintf(SNPARGS(action2, 0), "Unreach %d", cmd->arg1); break; case O_ACCEPT: action = "Accept"; break; case O_COUNT: action = "Count"; break; case O_DIVERT: snprintf(SNPARGS(action2, 0), "Divert %d", cmd->arg1); break; case O_TEE: snprintf(SNPARGS(action2, 0), "Tee %d", cmd->arg1); break; case O_SKIPTO: snprintf(SNPARGS(action2, 0), "SkipTo %d", cmd->arg1); break; case O_PIPE: snprintf(SNPARGS(action2, 0), "Pipe %d", cmd->arg1); break; case O_QUEUE: snprintf(SNPARGS(action2, 0), "Queue %d", cmd->arg1); break; case O_FORWARD_IP: { ipfw_insn_sa *sa = (ipfw_insn_sa *)cmd; int len; len = snprintf(SNPARGS(action2, 0), "Forward to %s", inet_ntoa(sa->sa.sin_addr)); if (sa->sa.sin_port) snprintf(SNPARGS(action2, len), ":%d", sa->sa.sin_port); } break; default: action = "UNKNOWN"; break; } } 
if (hlen == 0) { /* non-ip */ snprintf(SNPARGS(proto, 0), "MAC"); } else { struct ip *ip = mtod(m, struct ip *); /* these three are all aliases to the same thing */ struct icmp *const icmp = L3HDR(struct icmp, ip); struct tcphdr *const tcp = (struct tcphdr *)icmp; struct udphdr *const udp = (struct udphdr *)icmp; int ip_off, offset, ip_len; int len; if (eh != NULL) { /* layer 2 packets are as on the wire */ ip_off = ntohs(ip->ip_off); ip_len = ntohs(ip->ip_len); } else { ip_off = ip->ip_off; ip_len = ip->ip_len; } offset = ip_off & IP_OFFMASK; switch (ip->ip_p) { case IPPROTO_TCP: len = snprintf(SNPARGS(proto, 0), "TCP %s", inet_ntoa(ip->ip_src)); if (offset == 0) snprintf(SNPARGS(proto, len), ":%d %s:%d", ntohs(tcp->th_sport), inet_ntoa(ip->ip_dst), ntohs(tcp->th_dport)); else snprintf(SNPARGS(proto, len), " %s", inet_ntoa(ip->ip_dst)); break; case IPPROTO_UDP: len = snprintf(SNPARGS(proto, 0), "UDP %s", inet_ntoa(ip->ip_src)); if (offset == 0) snprintf(SNPARGS(proto, len), ":%d %s:%d", ntohs(udp->uh_sport), inet_ntoa(ip->ip_dst), ntohs(udp->uh_dport)); else snprintf(SNPARGS(proto, len), " %s", inet_ntoa(ip->ip_dst)); break; case IPPROTO_ICMP: if (offset == 0) len = snprintf(SNPARGS(proto, 0), "ICMP:%u.%u ", icmp->icmp_type, icmp->icmp_code); else len = snprintf(SNPARGS(proto, 0), "ICMP "); len += snprintf(SNPARGS(proto, len), "%s", inet_ntoa(ip->ip_src)); snprintf(SNPARGS(proto, len), " %s", inet_ntoa(ip->ip_dst)); break; default: len = snprintf(SNPARGS(proto, 0), "P:%d %s", ip->ip_p, inet_ntoa(ip->ip_src)); snprintf(SNPARGS(proto, len), " %s", inet_ntoa(ip->ip_dst)); break; } if (ip_off & (IP_MF | IP_OFFMASK)) snprintf(SNPARGS(fragment, 0), " (frag %d:%d@%d%s)", ntohs(ip->ip_id), ip_len - (ip->ip_hl << 2), offset << 3, (ip_off & IP_MF) ? "+" : ""); } if (oif || m->m_pkthdr.rcvif) log(LOG_SECURITY | LOG_INFO, "ipfw: %d %s %s %s via %s%d%s\n", f ? f->rulenum : -1, action, proto, oif ? "out" : "in", oif ? oif->if_name : m->m_pkthdr.rcvif->if_name, oif ? 
 oif->if_unit : m->m_pkthdr.rcvif->if_unit, fragment); else log(LOG_SECURITY | LOG_INFO, "ipfw: %d %s %s [no if info]%s\n", f ? f->rulenum : -1, action, proto, fragment); if (limit_reached) log(LOG_SECURITY | LOG_NOTICE, "ipfw: limit %d reached on entry %d\n", limit_reached, f ? f->rulenum : -1); } /* * IMPORTANT: the hash function for dynamic rules must be commutative * in source and destination (ip,port), because rules are bidirectional * and we want to find both in the same bucket. */ static __inline int hash_packet(struct ipfw_flow_id *id) { u_int32_t i; i = (id->dst_ip) ^ (id->src_ip) ^ (id->dst_port) ^ (id->src_port); i &= (curr_dyn_buckets - 1); return i; } /** * unlink a dynamic rule from a chain. prev is a pointer to * the previous one, q is a pointer to the rule to delete, * head is a pointer to the head of the queue. * Modifies q and potentially also head. */ #define UNLINK_DYN_RULE(prev, head, q) { \ ipfw_dyn_rule *old_q = q; \ \ /* remove a refcount to the parent */ \ if (q->dyn_type == O_LIMIT) \ q->parent->count--; \ DEB(printf("ipfw: unlink entry 0x%08x %d -> 0x%08x %d, %d left\n",\ (q->id.src_ip), (q->id.src_port), \ (q->id.dst_ip), (q->id.dst_port), dyn_count-1 ); ) \ if (prev != NULL) \ prev->next = q = q->next; \ else \ head = q = q->next; \ dyn_count--; \ free(old_q, M_IPFW); } #define TIME_LEQ(a,b) ((int)((a)-(b)) <= 0) /** * Remove dynamic rules pointing to "rule", or all of them if rule == NULL. * * If keep_me == NULL, rules are deleted even if not expired, * otherwise only expired rules are removed. * * The value of the second parameter is also used to identify * a rule we absolutely do not want to remove (e.g. because we are * holding a reference to it -- this is the case with O_LIMIT_PARENT * rules). The pointer is only used for comparison, so any non-null * value will do.
 */ static void remove_dyn_rule(struct ip_fw *rule, ipfw_dyn_rule *keep_me) { static u_int32_t last_remove = 0; #define FORCE (keep_me == NULL) ipfw_dyn_rule *prev, *q; int i, pass = 0, max_pass = 0; if (ipfw_dyn_v == NULL || dyn_count == 0) return; /* do not expire more than once per second, it is useless */ if (!FORCE && last_remove == time_second) return; last_remove = time_second; /* * because O_LIMIT rules refer to parent rules, during the first pass * only remove children and mark any pending LIMIT_PARENT, and remove * the parents in a second pass. */ next_pass: for (i = 0 ; i < curr_dyn_buckets ; i++) { for (prev=NULL, q = ipfw_dyn_v[i] ; q ; ) { /* * Logic can become complex here, so we split tests. */ if (q == keep_me) goto next; if (rule != NULL && rule != q->rule) goto next; /* not the one we are looking for */ if (q->dyn_type == O_LIMIT_PARENT) { /* * handle parent in the second pass, * record we need one. */ max_pass = 1; if (pass == 0) goto next; if (FORCE && q->count != 0 ) { /* XXX should not happen! */ - printf( "ipfw: OUCH! cannot remove rule," + printf("ipfw: OUCH! cannot remove rule," " count %d\n", q->count); } } else { if (!FORCE && !TIME_LEQ( q->expire, time_second )) goto next; } UNLINK_DYN_RULE(prev, ipfw_dyn_v[i], q); continue; next: prev=q; q=q->next; } } if (pass++ < max_pass) goto next_pass; } /** * lookup a dynamic rule. */ static ipfw_dyn_rule * lookup_dyn_rule(struct ipfw_flow_id *pkt, int *match_direction, struct tcphdr *tcp) { /* * stateful ipfw extensions.
* Lookup into dynamic session queue */ #define MATCH_REVERSE 0 #define MATCH_FORWARD 1 #define MATCH_NONE 2 #define MATCH_UNKNOWN 3 int i, dir = MATCH_NONE; ipfw_dyn_rule *prev, *q=NULL; if (ipfw_dyn_v == NULL) goto done; /* not found */ i = hash_packet( pkt ); for (prev=NULL, q = ipfw_dyn_v[i] ; q != NULL ; ) { if (q->dyn_type == O_LIMIT_PARENT) goto next; if (TIME_LEQ( q->expire, time_second)) { /* expire entry */ UNLINK_DYN_RULE(prev, ipfw_dyn_v[i], q); continue; } if ( pkt->proto == q->id.proto) { if (pkt->src_ip == q->id.src_ip && pkt->dst_ip == q->id.dst_ip && pkt->src_port == q->id.src_port && pkt->dst_port == q->id.dst_port ) { dir = MATCH_FORWARD; break; } if (pkt->src_ip == q->id.dst_ip && pkt->dst_ip == q->id.src_ip && pkt->src_port == q->id.dst_port && pkt->dst_port == q->id.src_port ) { dir = MATCH_REVERSE; break; } } next: prev = q; q = q->next; } if (q == NULL) goto done; /* q = NULL, not found */ if ( prev != NULL) { /* found and not in front */ prev->next = q->next; q->next = ipfw_dyn_v[i]; ipfw_dyn_v[i] = q; } if (pkt->proto == IPPROTO_TCP) { /* update state according to flags */ u_char flags = pkt->flags & (TH_FIN|TH_SYN|TH_RST); #define BOTH_SYN (TH_SYN | (TH_SYN << 8)) #define BOTH_FIN (TH_FIN | (TH_FIN << 8)) q->state |= (dir == MATCH_FORWARD ) ? 
flags : (flags << 8); switch (q->state) { case TH_SYN: /* opening */ q->expire = time_second + dyn_syn_lifetime; break; case BOTH_SYN: /* move to established */ case BOTH_SYN | TH_FIN : /* one side tries to close */ case BOTH_SYN | (TH_FIN << 8) : if (tcp) { #define _SEQ_GE(a,b) ((int)(a) - (int)(b) >= 0) u_int32_t ack = ntohl(tcp->th_ack); if (dir == MATCH_FORWARD) { if (q->ack_fwd == 0 || _SEQ_GE(ack, q->ack_fwd)) q->ack_fwd = ack; else { /* ignore out-of-sequence */ break; } } else { if (q->ack_rev == 0 || _SEQ_GE(ack, q->ack_rev)) q->ack_rev = ack; else { /* ignore out-of-sequence */ break; } } } q->expire = time_second + dyn_ack_lifetime; break; case BOTH_SYN | BOTH_FIN: /* both sides closed */ if (dyn_fin_lifetime >= dyn_keepalive_period) dyn_fin_lifetime = dyn_keepalive_period - 1; q->expire = time_second + dyn_fin_lifetime; break; default: #if 0 /* * reset or some invalid combination, but can also * occur if we use keep-state the wrong way. */ if ( (q->state & ((TH_RST << 8)|TH_RST)) == 0) printf("invalid state: 0x%x\n", q->state); #endif if (dyn_rst_lifetime >= dyn_keepalive_period) dyn_rst_lifetime = dyn_keepalive_period - 1; q->expire = time_second + dyn_rst_lifetime; break; } } else if (pkt->proto == IPPROTO_UDP) { q->expire = time_second + dyn_udp_lifetime; } else { /* other protocols */ q->expire = time_second + dyn_short_lifetime; } done: if (match_direction) *match_direction = dir; return q; } static void realloc_dynamic_table(void) { /* * Try reallocation, make sure we have a power of 2 and do * not allow more than 64k entries. In case of overflow, * default to 1024. 
 */ if (dyn_buckets > 65536) dyn_buckets = 1024; if ((dyn_buckets & (dyn_buckets-1)) != 0) { /* not a power of 2 */ dyn_buckets = curr_dyn_buckets; /* reset */ return; } curr_dyn_buckets = dyn_buckets; if (ipfw_dyn_v != NULL) free(ipfw_dyn_v, M_IPFW); for (;;) { ipfw_dyn_v = malloc(curr_dyn_buckets * sizeof(ipfw_dyn_rule *), M_IPFW, M_NOWAIT | M_ZERO); if (ipfw_dyn_v != NULL || curr_dyn_buckets <= 2) break; curr_dyn_buckets /= 2; } } /** * Install state of type 'dyn_type' for a dynamic session. * The hash table contains three types of rules: * - regular rules (O_KEEP_STATE) * - rules for sessions with a limited number of sessions per user * (O_LIMIT). When they are created, the parent's count is * increased by 1, and decreased on delete. In this case, * the third parameter is the parent rule and not the chain. * - "parent" rules for the above (O_LIMIT_PARENT). */ static ipfw_dyn_rule * add_dyn_rule(struct ipfw_flow_id *id, u_int8_t dyn_type, struct ip_fw *rule) { ipfw_dyn_rule *r; int i; if (ipfw_dyn_v == NULL || (dyn_count == 0 && dyn_buckets != curr_dyn_buckets)) { realloc_dynamic_table(); if (ipfw_dyn_v == NULL) return NULL; /* failed ! */ } i = hash_packet(id); r = malloc(sizeof *r, M_IPFW, M_NOWAIT | M_ZERO); if (r == NULL) { printf ("ipfw: sorry cannot allocate state\n"); return NULL; } /* increase refcount on parent, and set pointer */ if (dyn_type == O_LIMIT) { ipfw_dyn_rule *parent = (ipfw_dyn_rule *)rule; if ( parent->dyn_type != O_LIMIT_PARENT) panic("invalid parent"); parent->count++; r->parent = parent; rule = parent->rule; } r->id = *id; r->expire = time_second + dyn_syn_lifetime; r->rule = rule; r->dyn_type = dyn_type; r->pcnt = r->bcnt = 0; r->count = 0; r->bucket = i; r->next = ipfw_dyn_v[i]; ipfw_dyn_v[i] = r; dyn_count++; DEB(printf("ipfw: add dyn entry ty %d 0x%08x %d -> 0x%08x %d, total %d\n", dyn_type, (r->id.src_ip), (r->id.src_port), (r->id.dst_ip), (r->id.dst_port), dyn_count ); ) return r; } /** * lookup dynamic parent rule using pkt and rule as search keys.
* If the lookup fails, then install one. */ static ipfw_dyn_rule * lookup_dyn_parent(struct ipfw_flow_id *pkt, struct ip_fw *rule) { ipfw_dyn_rule *q; int i; if (ipfw_dyn_v) { i = hash_packet( pkt ); for (q = ipfw_dyn_v[i] ; q != NULL ; q=q->next) if (q->dyn_type == O_LIMIT_PARENT && rule== q->rule && pkt->proto == q->id.proto && pkt->src_ip == q->id.src_ip && pkt->dst_ip == q->id.dst_ip && pkt->src_port == q->id.src_port && pkt->dst_port == q->id.dst_port) { q->expire = time_second + dyn_short_lifetime; DEB(printf("ipfw: lookup_dyn_parent found 0x%p\n",q);) return q; } } return add_dyn_rule(pkt, O_LIMIT_PARENT, rule); } /** * Install dynamic state for rule type cmd->o.opcode * * Returns 1 (failure) if state is not installed because of errors or because * session limitations are enforced. */ static int install_state(struct ip_fw *rule, ipfw_insn_limit *cmd, struct ip_fw_args *args) { static int last_log; ipfw_dyn_rule *q; DEB(printf("ipfw: install state type %d 0x%08x %u -> 0x%08x %u\n", cmd->o.opcode, (args->f_id.src_ip), (args->f_id.src_port), (args->f_id.dst_ip), (args->f_id.dst_port) );) q = lookup_dyn_rule(&args->f_id, NULL, NULL); if (q != NULL) { /* should never occur */ if (last_log != time_second) { last_log = time_second; printf("ipfw: install_state: entry already present, done\n"); } return 0; } if (dyn_count >= dyn_max) /* * Run out of slots, try to remove any expired rule. 
 */ remove_dyn_rule(NULL, (ipfw_dyn_rule *)1); if (dyn_count >= dyn_max) { if (last_log != time_second) { last_log = time_second; printf("ipfw: install_state: Too many dynamic rules\n"); } return 1; /* cannot install, notify caller */ } switch (cmd->o.opcode) { case O_KEEP_STATE: /* bidir rule */ add_dyn_rule(&args->f_id, O_KEEP_STATE, rule); break; case O_LIMIT: /* limit number of sessions */ { u_int16_t limit_mask = cmd->limit_mask; struct ipfw_flow_id id; ipfw_dyn_rule *parent; DEB(printf("ipfw: installing dyn-limit rule %d\n", cmd->conn_limit);) id.dst_ip = id.src_ip = 0; id.dst_port = id.src_port = 0; id.proto = args->f_id.proto; if (limit_mask & DYN_SRC_ADDR) id.src_ip = args->f_id.src_ip; if (limit_mask & DYN_DST_ADDR) id.dst_ip = args->f_id.dst_ip; if (limit_mask & DYN_SRC_PORT) id.src_port = args->f_id.src_port; if (limit_mask & DYN_DST_PORT) id.dst_port = args->f_id.dst_port; parent = lookup_dyn_parent(&id, rule); if (parent == NULL) { printf("ipfw: add parent failed\n"); return 1; } if (parent->count >= cmd->conn_limit) { /* * See if we can remove some expired rule. */ remove_dyn_rule(rule, parent); if (parent->count >= cmd->conn_limit) { if (fw_verbose && last_log != time_second) { last_log = time_second; log(LOG_SECURITY | LOG_DEBUG, "drop session, too many entries\n"); } return 1; } } add_dyn_rule(&args->f_id, O_LIMIT, (struct ip_fw *)parent); } break; default: printf("ipfw: unknown dynamic rule type %u\n", cmd->o.opcode); return 1; } lookup_dyn_rule(&args->f_id, NULL, NULL); /* XXX just set lifetime */ return 0; } /* * Transmit a TCP packet, containing either a RST or a keepalive. * When flags & TH_RST, we are sending a RST packet, because a * "reset" action matched the packet.
* Otherwise we are sending a keepalive, and flags & TH_ */ static void send_pkt(struct ipfw_flow_id *id, u_int32_t seq, u_int32_t ack, int flags) { struct mbuf *m; struct ip *ip; struct tcphdr *tcp; struct route sro; /* fake route */ MGETHDR(m, M_DONTWAIT, MT_HEADER); if (m == 0) return; m->m_pkthdr.rcvif = (struct ifnet *)0; m->m_pkthdr.len = m->m_len = sizeof(struct ip) + sizeof(struct tcphdr); m->m_data += max_linkhdr; ip = mtod(m, struct ip *); bzero(ip, m->m_len); tcp = (struct tcphdr *)(ip + 1); /* no IP options */ ip->ip_p = IPPROTO_TCP; tcp->th_off = 5; /* * Assume we are sending a RST (or a keepalive in the reverse * direction), swap src and destination addresses and ports. */ ip->ip_src.s_addr = htonl(id->dst_ip); ip->ip_dst.s_addr = htonl(id->src_ip); tcp->th_sport = htons(id->dst_port); tcp->th_dport = htons(id->src_port); if (flags & TH_RST) { /* we are sending a RST */ if (flags & TH_ACK) { tcp->th_seq = htonl(ack); tcp->th_ack = htonl(0); tcp->th_flags = TH_RST; } else { if (flags & TH_SYN) seq++; tcp->th_seq = htonl(0); tcp->th_ack = htonl(seq); tcp->th_flags = TH_RST | TH_ACK; } } else { /* * We are sending a keepalive. flags & TH_SYN determines * the direction, forward if set, reverse if clear. * NOTE: seq and ack are always assumed to be correct * as set by the caller. This may be confusing... */ if (flags & TH_SYN) { /* * we have to rewrite the correct addresses! */ ip->ip_dst.s_addr = htonl(id->dst_ip); ip->ip_src.s_addr = htonl(id->src_ip); tcp->th_dport = htons(id->dst_port); tcp->th_sport = htons(id->src_port); } tcp->th_seq = htonl(seq); tcp->th_ack = htonl(ack); tcp->th_flags = TH_ACK; } /* * set ip_len to the payload size so we can compute * the tcp checksum on the pseudoheader * XXX check this, could save a couple of words ? 
*/ ip->ip_len = htons(sizeof(struct tcphdr)); tcp->th_sum = in_cksum(m, m->m_pkthdr.len); /* * now fill fields left out earlier */ ip->ip_ttl = ip_defttl; ip->ip_len = m->m_pkthdr.len; bzero (&sro, sizeof (sro)); ip_rtaddr(ip->ip_dst, &sro); m->m_flags |= M_SKIP_FIREWALL; ip_output(m, NULL, &sro, 0, NULL, NULL); if (sro.ro_rt) RTFREE(sro.ro_rt); } /* * sends a reject message, consuming the mbuf passed as an argument. */ static void send_reject(struct ip_fw_args *args, int code, int offset, int ip_len) { if (code != ICMP_REJECT_RST) { /* Send an ICMP unreach */ /* We need the IP header in host order for icmp_error(). */ if (args->eh != NULL) { struct ip *ip = mtod(args->m, struct ip *); ip->ip_len = ntohs(ip->ip_len); ip->ip_off = ntohs(ip->ip_off); } icmp_error(args->m, ICMP_UNREACH, code, 0L, 0); } else if (offset == 0 && args->f_id.proto == IPPROTO_TCP) { struct tcphdr *const tcp = L3HDR(struct tcphdr, mtod(args->m, struct ip *)); if ( (tcp->th_flags & TH_RST) == 0) send_pkt(&(args->f_id), ntohl(tcp->th_seq), ntohl(tcp->th_ack), tcp->th_flags | TH_RST); m_freem(args->m); } else m_freem(args->m); args->m = NULL; } /** * * Given an ip_fw *, lookup_next_rule will return a pointer * to the next rule, which can be either the jump * target (for skipto instructions) or the next one in the list (in * all other cases including a missing jump target). * The result is also written in the "next_rule" field of the rule. * Backward jumps are not allowed, so start looking from the next * rule... * * This never returns NULL -- in case we do not have an exact match, * the next rule is returned. When the ruleset is changed, * pointers are flushed so we are always correct. 
 */ static struct ip_fw * lookup_next_rule(struct ip_fw *me) { struct ip_fw *rule = NULL; ipfw_insn *cmd; /* look for action, in case it is a skipto */ cmd = ACTION_PTR(me); if (cmd->opcode == O_LOG) cmd += F_LEN(cmd); if ( cmd->opcode == O_SKIPTO ) for (rule = me->next; rule ; rule = rule->next) if (rule->rulenum >= cmd->arg1) break; if (rule == NULL) /* failure or not a skipto */ rule = me->next; me->next_rule = rule; return rule; } /* * The main check routine for the firewall. * * All arguments are in args so we can modify them and return them * back to the caller. * * Parameters: * * args->m (in/out) The packet; we set it to NULL when/if we nuke it. * Starts with the IP header. * args->eh (in) MAC header if present, or NULL for layer3 packet. * args->oif (in) Outgoing interface, or NULL if packet is incoming. * The incoming interface is in the mbuf. * args->divert_rule (in/out) * Skip up to the first rule past this rule number; * upon return, non-zero port number for divert or tee. * * args->rule (in/out) Pointer to the last matching rule. * args->next_hop (out) Socket we are forwarding to. * args->f_id (out) Addresses grabbed from the packet. * * Return value: * * IP_FW_PORT_DENY_FLAG the packet must be dropped. * 0 The packet is to be accepted and routed normally OR * the packet was denied/rejected and has been dropped; * in the latter case, *m is equal to NULL upon return. * port Divert the packet to port, with these caveats: * * - If IP_FW_PORT_TEE_FLAG is set, tee the packet instead * of diverting it (i.e., 'ipfw tee'). * * - If IP_FW_PORT_DYNT_FLAG is set, interpret the lower * 16 bits as a dummynet pipe number instead of diverting */ static int ipfw_chk(struct ip_fw_args *args) { /* * Local variables hold state during the processing of a packet. * * IMPORTANT NOTE: to speed up the processing of rules, there * are some assumptions about the values of the variables, which * are documented here.
Should you change them, please check * the implementation of the various instructions to make sure * that they still work. * * args->eh The MAC header. It is non-null for a layer2 * packet, it is NULL for a layer-3 packet. * * m | args->m Pointer to the mbuf, as received from the caller. * It may change if ipfw_chk() does an m_pullup, or if it * consumes the packet because it calls send_reject(). * XXX This has to change, so that ipfw_chk() never modifies * or consumes the buffer. * ip is simply an alias of the value of m, and it is kept * in sync with it (the packet is supposed to start with * the ip header). */ struct mbuf *m = args->m; struct ip *ip = mtod(m, struct ip *); /* * oif | args->oif If NULL, ipfw_chk has been called on the * inbound path (ether_input, bdg_forward, ip_input). * If non-NULL, ipfw_chk has been called on the outbound path * (ether_output, ip_output). */ struct ifnet *oif = args->oif; struct ip_fw *f = NULL; /* matching rule */ int retval = 0; /* * hlen The length of the IPv4 header. * hlen >0 means we have an IPv4 packet. */ u_int hlen = 0; /* hlen >0 means we have an IP pkt */ /* * offset The offset of a fragment. offset != 0 means that * we have a fragment at this offset of an IPv4 packet. * offset == 0 means that (if this is an IPv4 packet) * this is the first or only fragment. */ u_short offset = 0; /* * Local copies of addresses. They are only valid if we have * an IP packet. * * proto The protocol. Set to 0 for non-ip packets, * or to the protocol read from the packet otherwise. * proto != 0 means that we have an IPv4 packet. * * src_port, dst_port port numbers, in HOST format. Only * valid for TCP and UDP packets. * * src_ip, dst_ip ip addresses, in NETWORK format. * Only valid for IPv4 packets. 
*/ u_int8_t proto; u_int16_t src_port = 0, dst_port = 0; /* NOTE: host format */ struct in_addr src_ip, dst_ip; /* NOTE: network format */ u_int16_t ip_len=0; int pktlen; int dyn_dir = MATCH_UNKNOWN; ipfw_dyn_rule *q = NULL; if (m->m_flags & M_SKIP_FIREWALL) return 0; /* accept */ /* * dyn_dir = MATCH_UNKNOWN when rules unchecked, * MATCH_NONE when checked and not matched (q = NULL), * MATCH_FORWARD or MATCH_REVERSE otherwise (q != NULL) */ pktlen = m->m_pkthdr.len; if (args->eh == NULL || /* layer 3 packet */ ( m->m_pkthdr.len >= sizeof(struct ip) && ntohs(args->eh->ether_type) == ETHERTYPE_IP)) hlen = ip->ip_hl << 2; /* * Collect parameters into local variables for faster matching. */ if (hlen == 0) { /* do not grab addresses for non-ip pkts */ proto = args->f_id.proto = 0; /* mark f_id invalid */ goto after_ip_checks; } proto = args->f_id.proto = ip->ip_p; src_ip = ip->ip_src; dst_ip = ip->ip_dst; if (args->eh != NULL) { /* layer 2 packets are as on the wire */ offset = ntohs(ip->ip_off) & IP_OFFMASK; ip_len = ntohs(ip->ip_len); } else { offset = ip->ip_off & IP_OFFMASK; ip_len = ip->ip_len; } pktlen = ip_len < pktlen ? ip_len : pktlen; #define PULLUP_TO(len) \ do { \ if ((m)->m_len < (len)) { \ args->m = m = m_pullup(m, (len)); \ if (m == 0) \ goto pullup_failed; \ ip = mtod(m, struct ip *); \ } \ } while (0) if (offset == 0) { switch (proto) { case IPPROTO_TCP: { struct tcphdr *tcp; PULLUP_TO(hlen + sizeof(struct tcphdr)); tcp = L3HDR(struct tcphdr, ip); dst_port = tcp->th_dport; src_port = tcp->th_sport; args->f_id.flags = tcp->th_flags; } break; case IPPROTO_UDP: { struct udphdr *udp; PULLUP_TO(hlen + sizeof(struct udphdr)); udp = L3HDR(struct udphdr, ip); dst_port = udp->uh_dport; src_port = udp->uh_sport; } break; case IPPROTO_ICMP: PULLUP_TO(hlen + 4); /* type, code and checksum. 
*/ args->f_id.flags = L3HDR(struct icmp, ip)->icmp_type; break; default: break; } #undef PULLUP_TO } args->f_id.src_ip = ntohl(src_ip.s_addr); args->f_id.dst_ip = ntohl(dst_ip.s_addr); args->f_id.src_port = src_port = ntohs(src_port); args->f_id.dst_port = dst_port = ntohs(dst_port); after_ip_checks: if (args->rule) { /* * Packet has already been tagged. Look for the next rule * to restart processing. * * If fw_one_pass != 0 then just accept it. * XXX should not happen here, but optimized out in * the caller. */ if (fw_one_pass) return 0; f = args->rule->next_rule; if (f == NULL) f = lookup_next_rule(args->rule); } else { /* * Find the starting rule. It can be either the first * one, or the one after divert_rule if asked so. */ int skipto = args->divert_rule; f = layer3_chain; if (args->eh == NULL && skipto != 0) { if (skipto >= IPFW_DEFAULT_RULE) return(IP_FW_PORT_DENY_FLAG); /* invalid */ while (f && f->rulenum <= skipto) f = f->next; if (f == NULL) /* drop packet */ return(IP_FW_PORT_DENY_FLAG); } } args->divert_rule = 0; /* reset to avoid confusion later */ /* * Now scan the rules, and parse microinstructions for each rule. */ for (; f; f = f->next) { int l, cmdlen; ipfw_insn *cmd; int skip_or; /* skip rest of OR block */ again: if (set_disable & (1 << f->set) ) continue; skip_or = 0; for (l = f->cmd_len, cmd = f->cmd ; l > 0 ; l -= cmdlen, cmd += cmdlen) { int match; /* * check_body is a jump target used when we find a * CHECK_STATE, and need to jump to the body of * the target rule. */ check_body: cmdlen = F_LEN(cmd); /* * An OR block (insn_1 || .. || insn_n) has the * F_OR bit set in all but the last instruction. * The first match will set "skip_or", and cause * the following instructions to be skipped until * past the one with the F_OR bit clear. 
*/ if (skip_or) { /* skip this instruction */ if ((cmd->len & F_OR) == 0) skip_or = 0; /* next one is good */ continue; } match = 0; /* set to 1 if we succeed */ switch (cmd->opcode) { /* * The first set of opcodes compares the packet's * fields with some pattern, setting 'match' if a * match is found. At the end of the loop there is * logic to deal with F_NOT and F_OR flags associated * with the opcode. */ case O_NOP: match = 1; break; case O_FORWARD_MAC: printf("ipfw: opcode %d unimplemented\n", cmd->opcode); break; case O_GID: case O_UID: /* * We only check offset == 0 && proto != 0, * as this ensures that we have an IPv4 * packet with the ports info. */ if (offset!=0) break; { struct inpcbinfo *pi; int wildcard; struct inpcb *pcb; if (proto == IPPROTO_TCP) { wildcard = 0; pi = &tcbinfo; } else if (proto == IPPROTO_UDP) { wildcard = 1; pi = &udbinfo; } else break; pcb = (oif) ? in_pcblookup_hash(pi, dst_ip, htons(dst_port), src_ip, htons(src_port), wildcard, oif) : in_pcblookup_hash(pi, src_ip, htons(src_port), dst_ip, htons(dst_port), wildcard, NULL); if (pcb == NULL || pcb->inp_socket == NULL) break; #if __FreeBSD_version < 500034 #define socheckuid(a,b) ((a)->so_cred->cr_uid != (b)) #endif if (cmd->opcode == O_UID) { match = !socheckuid(pcb->inp_socket, (uid_t)((ipfw_insn_u32 *)cmd)->d[0]); } else { match = groupmember( (uid_t)((ipfw_insn_u32 *)cmd)->d[0], pcb->inp_socket->so_cred); } } break; case O_RECV: match = iface_match(m->m_pkthdr.rcvif, (ipfw_insn_if *)cmd); break; case O_XMIT: match = iface_match(oif, (ipfw_insn_if *)cmd); break; case O_VIA: match = iface_match(oif ? 
oif : m->m_pkthdr.rcvif, (ipfw_insn_if *)cmd); break; case O_MACADDR2: if (args->eh != NULL) { /* have MAC header */ u_int32_t *want = (u_int32_t *) ((ipfw_insn_mac *)cmd)->addr; u_int32_t *mask = (u_int32_t *) ((ipfw_insn_mac *)cmd)->mask; u_int32_t *hdr = (u_int32_t *)args->eh; match = ( want[0] == (hdr[0] & mask[0]) && want[1] == (hdr[1] & mask[1]) && want[2] == (hdr[2] & mask[2]) ); } break; case O_MAC_TYPE: if (args->eh != NULL) { u_int16_t t = ntohs(args->eh->ether_type); u_int16_t *p = ((ipfw_insn_u16 *)cmd)->ports; int i; for (i = cmdlen - 1; !match && i>0; i--, p += 2) match = (t>=p[0] && t<=p[1]); } break; case O_FRAG: match = (hlen > 0 && offset != 0); break; case O_IN: /* "out" is "not in" */ match = (oif == NULL); break; case O_LAYER2: match = (args->eh != NULL); break; case O_PROTO: /* * We do not allow an arg of 0 so the * check of "proto" only suffices. */ match = (proto == cmd->arg1); break; case O_IP_SRC: match = (hlen > 0 && ((ipfw_insn_ip *)cmd)->addr.s_addr == src_ip.s_addr); break; case O_IP_SRC_MASK: match = (hlen > 0 && ((ipfw_insn_ip *)cmd)->addr.s_addr == (src_ip.s_addr & ((ipfw_insn_ip *)cmd)->mask.s_addr)); break; case O_IP_SRC_ME: if (hlen > 0) { struct ifnet *tif; INADDR_TO_IFP(src_ip, tif); match = (tif != NULL); } break; case O_IP_DST_SET: case O_IP_SRC_SET: if (hlen > 0) { u_int32_t *d = (u_int32_t *)(cmd+1); u_int32_t addr = cmd->opcode == O_IP_DST_SET ? 
args->f_id.dst_ip : args->f_id.src_ip; if (addr < d[0]) break; addr -= d[0]; /* subtract base */ match = (addr < cmd->arg1) && ( d[ 1 + (addr>>5)] & (1<<(addr & 0x1f)) ); } break; case O_IP_DST: match = (hlen > 0 && ((ipfw_insn_ip *)cmd)->addr.s_addr == dst_ip.s_addr); break; case O_IP_DST_MASK: match = (hlen > 0) && (((ipfw_insn_ip *)cmd)->addr.s_addr == (dst_ip.s_addr & ((ipfw_insn_ip *)cmd)->mask.s_addr)); break; case O_IP_DST_ME: if (hlen > 0) { struct ifnet *tif; INADDR_TO_IFP(dst_ip, tif); match = (tif != NULL); } break; case O_IP_SRCPORT: case O_IP_DSTPORT: /* * offset == 0 && proto != 0 is enough * to guarantee that we have an IPv4 * packet with port info. */ if ((proto==IPPROTO_UDP || proto==IPPROTO_TCP) && offset == 0) { u_int16_t x = (cmd->opcode == O_IP_SRCPORT) ? src_port : dst_port ; u_int16_t *p = ((ipfw_insn_u16 *)cmd)->ports; int i; for (i = cmdlen - 1; !match && i>0; i--, p += 2) match = (x>=p[0] && x<=p[1]); } break; case O_ICMPTYPE: match = (offset == 0 && proto==IPPROTO_ICMP && icmptype_match(ip, (ipfw_insn_u32 *)cmd) ); break; case O_IPOPT: match = (hlen > 0 && ipopts_match(ip, cmd) ); break; case O_IPVER: match = (hlen > 0 && cmd->arg1 == ip->ip_v); break; + case O_IPID: + case O_IPLEN: case O_IPTTL: - match = (hlen > 0 && cmd->arg1 == ip->ip_ttl); - break; + if (hlen > 0) { /* only for IP packets */ + uint16_t x; + uint16_t *p; + int i; - case O_IPID: - match = (hlen > 0 && - cmd->arg1 == ntohs(ip->ip_id)); + if (cmd->opcode == O_IPLEN) + x = ip_len; + else if (cmd->opcode == O_IPTTL) + x = ip->ip_ttl; + else /* must be IPID */ + x = ntohs(ip->ip_id); + if (cmdlen == 1) { + match = (cmd->arg1 == x); + break; + } + /* otherwise we have ranges */ + p = ((ipfw_insn_u16 *)cmd)->ports; + i = cmdlen - 1; + for (; !match && i>0; i--, p += 2) + match = (x >= p[0] && x <= p[1]); + } break; - case O_IPLEN: - match = (hlen > 0 && cmd->arg1 == ip_len); - break; - case O_IPPRECEDENCE: match = (hlen > 0 && (cmd->arg1 == (ip->ip_tos & 0xe0)) ); break; case 
O_IPTOS: match = (hlen > 0 && flags_match(cmd, ip->ip_tos)); break; case O_TCPFLAGS: match = (proto == IPPROTO_TCP && offset == 0 && flags_match(cmd, L3HDR(struct tcphdr,ip)->th_flags)); break; case O_TCPOPTS: match = (proto == IPPROTO_TCP && offset == 0 && tcpopts_match(ip, cmd)); break; case O_TCPSEQ: match = (proto == IPPROTO_TCP && offset == 0 && ((ipfw_insn_u32 *)cmd)->d[0] == L3HDR(struct tcphdr,ip)->th_seq); break; case O_TCPACK: match = (proto == IPPROTO_TCP && offset == 0 && ((ipfw_insn_u32 *)cmd)->d[0] == L3HDR(struct tcphdr,ip)->th_ack); break; case O_TCPWIN: match = (proto == IPPROTO_TCP && offset == 0 && cmd->arg1 == L3HDR(struct tcphdr,ip)->th_win); break; case O_ESTAB: /* reject packets which have SYN only */ /* XXX should i also check for TH_ACK ? */ match = (proto == IPPROTO_TCP && offset == 0 && (L3HDR(struct tcphdr,ip)->th_flags & (TH_RST | TH_ACK | TH_SYN)) != TH_SYN); break; case O_LOG: if (fw_verbose) ipfw_log(f, hlen, args->eh, m, oif); match = 1; break; case O_PROB: match = (random()<((ipfw_insn_u32 *)cmd)->d[0]); break; + case O_VERREVPATH: + /* Outgoing packets automatically pass/match */ + match = ((oif != NULL) || + (m->m_pkthdr.rcvif == NULL) || + verify_rev_path(src_ip, m->m_pkthdr.rcvif)); + break; + /* * The second set of opcodes represents 'actions', * i.e. the terminal part of a rule once the packet * matches all previous patterns. * Typically there is only one action for each rule, * and the opcode is stored at the end of the rule * (but there are exceptions -- see below). * * In general, here we set retval and terminate the * outer loop (would be a 'break 3' in some language, * but we need to do a 'goto done'). * * Exceptions: * O_COUNT and O_SKIPTO actions: * instead of terminating, we jump to the next rule * ('goto next_rule', equivalent to a 'break 2'), * or to the SKIPTO target ('goto again' after * having set f, cmd and l), respectively. 
* * O_LIMIT and O_KEEP_STATE: these opcodes are * not real 'actions', and are stored right * before the 'action' part of the rule. * These opcodes try to install an entry in the * state tables; if successful, we continue with * the next opcode (match=1; break;), otherwise * the packet * must be dropped * ('goto done' after setting retval); * * O_PROBE_STATE and O_CHECK_STATE: these opcodes * cause a lookup of the state table, and a jump * to the 'action' part of the parent rule * ('goto check_body') if an entry is found, or * (CHECK_STATE only) a jump to the next rule if * the entry is not found ('goto next_rule'). * The result of the lookup is cached so that * further instances of these opcodes are * effectively NOPs. */ case O_LIMIT: case O_KEEP_STATE: if (install_state(f, (ipfw_insn_limit *)cmd, args)) { retval = IP_FW_PORT_DENY_FLAG; goto done; /* error/limit violation */ } match = 1; break; case O_PROBE_STATE: case O_CHECK_STATE: /* * dynamic rules are checked at the first * keep-state or check-state occurrence, * with the result being stored in dyn_dir. * The compiler introduces a PROBE_STATE * instruction for us when we have a * KEEP_STATE (because PROBE_STATE needs * to be run first). */ if (dyn_dir == MATCH_UNKNOWN && (q = lookup_dyn_rule(&args->f_id, &dyn_dir, proto == IPPROTO_TCP ? L3HDR(struct tcphdr, ip) : NULL)) != NULL) { /* * Found dynamic entry, update stats * and jump to the 'action' part of * the parent rule. */ q->pcnt++; q->bcnt += pktlen; f = q->rule; cmd = ACTION_PTR(f); l = f->cmd_len - f->act_ofs; goto check_body; } /* * Dynamic entry not found. If CHECK_STATE, * skip to next rule, if PROBE_STATE just * ignore and continue with next opcode.
*/ if (cmd->opcode == O_CHECK_STATE) goto next_rule; match = 1; break; case O_ACCEPT: retval = 0; /* accept */ goto done; case O_PIPE: case O_QUEUE: args->rule = f; /* report matching rule */ retval = cmd->arg1 | IP_FW_PORT_DYNT_FLAG; goto done; case O_DIVERT: case O_TEE: if (args->eh) /* not on layer 2 */ break; args->divert_rule = f->rulenum; retval = (cmd->opcode == O_DIVERT) ? cmd->arg1 : cmd->arg1 | IP_FW_PORT_TEE_FLAG; goto done; case O_COUNT: case O_SKIPTO: f->pcnt++; /* update stats */ f->bcnt += pktlen; f->timestamp = time_second; if (cmd->opcode == O_COUNT) goto next_rule; /* handle skipto */ if (f->next_rule == NULL) lookup_next_rule(f); f = f->next_rule; goto again; case O_REJECT: /* * Drop the packet and send a reject notice * if the packet is not ICMP (or is an ICMP * query), and it is not multicast/broadcast. */ if (hlen > 0 && (proto != IPPROTO_ICMP || is_icmp_query(ip)) && !(m->m_flags & (M_BCAST|M_MCAST)) && !IN_MULTICAST(dst_ip.s_addr)) { send_reject(args, cmd->arg1, offset,ip_len); m = args->m; } /* FALLTHROUGH */ case O_DENY: retval = IP_FW_PORT_DENY_FLAG; goto done; case O_FORWARD_IP: if (args->eh) /* not valid on layer2 pkts */ break; if (!q || dyn_dir == MATCH_FORWARD) args->next_hop = &((ipfw_insn_sa *)cmd)->sa; retval = 0; goto done; default: panic("-- unknown opcode %d\n", cmd->opcode); } /* end of switch() on opcodes */ if (cmd->len & F_NOT) match = !match; if (match) { if (cmd->len & F_OR) skip_or = 1; } else { if (!(cmd->len & F_OR)) /* not an OR block, */ break; /* try next rule */ } } /* end of inner for, scan opcodes */ next_rule:; /* try next rule */ } /* end of outer for, scan rules */ printf("ipfw: ouch!, skip past end of rules, denying packet\n"); return(IP_FW_PORT_DENY_FLAG); done: /* Update statistics */ f->pcnt++; f->bcnt += pktlen; f->timestamp = time_second; return retval; pullup_failed: if (fw_verbose) printf("ipfw: pullup failed\n"); return(IP_FW_PORT_DENY_FLAG); } /* * When a rule is added/deleted, clear the next_rule 
pointers in all rules. * These will be reconstructed on the fly as packets are matched. * Must be called at splimp(). */ static void flush_rule_ptrs(void) { struct ip_fw *rule; for (rule = layer3_chain; rule; rule = rule->next) rule->next_rule = NULL; } /* * When pipes/queues are deleted, clear the "pipe_ptr" pointer to a given * pipe/queue, or to all of them (match == NULL). * Must be called at splimp(). */ void flush_pipe_ptrs(struct dn_flow_set *match) { struct ip_fw *rule; for (rule = layer3_chain; rule; rule = rule->next) { ipfw_insn_pipe *cmd = (ipfw_insn_pipe *)ACTION_PTR(rule); if (cmd->o.opcode != O_PIPE && cmd->o.opcode != O_QUEUE) continue; - if (match == NULL || cmd->pipe_ptr == match) - cmd->pipe_ptr = NULL; + /* + * XXX Use bcmp/bzero to handle pipe_ptr to overcome + * possible alignment problems on 64-bit architectures. + * This code is seldom used so we do not worry too + * much about efficiency. + */ + if (match == NULL || + !bcmp(&cmd->pipe_ptr, &match, sizeof(match)) ) + bzero(&cmd->pipe_ptr, sizeof(cmd->pipe_ptr)); } } /* * Add a new rule to the list. Copy the rule into a malloc'ed area, then * possibly create a rule number and add the rule to the list. * Update the rule_number in the input struct so the caller knows it as well. 
*/ static int add_rule(struct ip_fw **head, struct ip_fw *input_rule) { struct ip_fw *rule, *f, *prev; int s; int l = RULESIZE(input_rule); if (*head == NULL && input_rule->rulenum != IPFW_DEFAULT_RULE) return (EINVAL); rule = malloc(l, M_IPFW, M_NOWAIT | M_ZERO); if (rule == NULL) return (ENOSPC); bcopy(input_rule, rule, l); rule->next = NULL; rule->next_rule = NULL; rule->pcnt = 0; rule->bcnt = 0; rule->timestamp = 0; s = splimp(); if (*head == NULL) { /* default rule */ *head = rule; goto done; } /* * If rulenum is 0, find highest numbered rule before the * default rule, and add autoinc_step */ if (autoinc_step < 1) autoinc_step = 1; else if (autoinc_step > 1000) autoinc_step = 1000; if (rule->rulenum == 0) { /* * locate the highest numbered rule before default */ for (f = *head; f; f = f->next) { if (f->rulenum == IPFW_DEFAULT_RULE) break; rule->rulenum = f->rulenum; } if (rule->rulenum < IPFW_DEFAULT_RULE - autoinc_step) rule->rulenum += autoinc_step; input_rule->rulenum = rule->rulenum; } /* * Now insert the new rule in the right place in the sorted list. */ for (prev = NULL, f = *head; f; prev = f, f = f->next) { if (f->rulenum > rule->rulenum) { /* found the location */ if (prev) { rule->next = f; prev->next = rule; } else { /* head insert */ rule->next = *head; *head = rule; } break; } } flush_rule_ptrs(); done: static_count++; static_len += l; splx(s); DEB(printf("ipfw: installed rule %d, static count now %d\n", rule->rulenum, static_count);) return (0); } /** * Free storage associated with a static rule (including derived * dynamic rules). * The caller is in charge of clearing rule pointers to avoid * dangling pointers. * @return a pointer to the next entry. * Arguments are not checked, so they better be correct. * Must be called at splimp(). 
*/ static struct ip_fw * delete_rule(struct ip_fw **head, struct ip_fw *prev, struct ip_fw *rule) { struct ip_fw *n; int l = RULESIZE(rule); n = rule->next; remove_dyn_rule(rule, NULL /* force removal */); if (prev == NULL) *head = n; else prev->next = n; static_count--; static_len -= l; if (DUMMYNET_LOADED) ip_dn_ruledel_ptr(rule); free(rule, M_IPFW); return n; } /* * Deletes all rules from a chain (including the default rule * if the second argument is set). * Must be called at splimp(). */ static void free_chain(struct ip_fw **chain, int kill_default) { struct ip_fw *rule; flush_rule_ptrs(); /* more efficient to do outside the loop */ while ( (rule = *chain) != NULL && (kill_default || rule->rulenum != IPFW_DEFAULT_RULE) ) delete_rule(chain, NULL, rule); } /** * Remove all rules with given number, and also do set manipulation. * * The argument is an u_int32_t. The low 16 bit are the rule or set number, * the next 8 bits are the new set, the top 8 bits are the command: * * 0 delete rules with given number * 1 delete rules with given set number * 2 move rules with given number to new set * 3 move rules with given set number to new set * 4 swap sets with given numbers */ static int del_entry(struct ip_fw **chain, u_int32_t arg) { struct ip_fw *prev, *rule; int s; u_int16_t rulenum; u_int8_t cmd, new_set; rulenum = arg & 0xffff; cmd = (arg >> 24) & 0xff; new_set = (arg >> 16) & 0xff; if (cmd > 4) return EINVAL; if (new_set > 30) return EINVAL; if (cmd == 0 || cmd == 2) { if (rulenum == IPFW_DEFAULT_RULE) return EINVAL; } else { if (rulenum > 30) return EINVAL; } switch (cmd) { case 0: /* delete rules with given number */ /* * locate first rule to delete */ for (prev = NULL, rule = *chain; rule && rule->rulenum < rulenum; prev = rule, rule = rule->next) ; if (rule->rulenum != rulenum) return EINVAL; s = splimp(); /* no access to rules while removing */ /* * flush pointers outside the loop, then delete all matching * rules. prev remains the same throughout the cycle. 
*/ flush_rule_ptrs(); while (rule && rule->rulenum == rulenum) rule = delete_rule(chain, prev, rule); splx(s); break; case 1: /* delete all rules with given set number */ s = splimp(); flush_rule_ptrs(); for (prev = NULL, rule = *chain; rule ; ) if (rule->set == rulenum) rule = delete_rule(chain, prev, rule); else { prev = rule; rule = rule->next; } splx(s); break; case 2: /* move rules with given number to new set */ s = splimp(); for (rule = *chain; rule ; rule = rule->next) if (rule->rulenum == rulenum) rule->set = new_set; splx(s); break; case 3: /* move rules with given set number to new set */ s = splimp(); for (rule = *chain; rule ; rule = rule->next) if (rule->set == rulenum) rule->set = new_set; splx(s); break; case 4: /* swap two sets */ s = splimp(); for (rule = *chain; rule ; rule = rule->next) if (rule->set == rulenum) rule->set = new_set; else if (rule->set == new_set) rule->set = rulenum; splx(s); break; } return 0; } /* * Clear counters for a specific rule. */ static void clear_counters(struct ip_fw *rule, int log_only) { ipfw_insn_log *l = (ipfw_insn_log *)ACTION_PTR(rule); if (log_only == 0) { rule->bcnt = rule->pcnt = 0; rule->timestamp = 0; } if (l->o.opcode == O_LOG) l->log_left = l->max_log; } /** * Reset some or all counters on firewall rules. * @arg rulenum is 0 to clear all entries, or contains a specific * rule number. * @arg log_only is 1 if we only want to reset logs, zero otherwise. */ static int zero_entry(int rulenum, int log_only) { struct ip_fw *rule; int s; char *msg; if (rulenum == 0) { s = splimp(); norule_counter = 0; for (rule = layer3_chain; rule; rule = rule->next) clear_counters(rule, log_only); splx(s); msg = log_only ? "ipfw: All logging counts reset.\n" : "ipfw: Accounting cleared.\n"; } else { int cleared = 0; /* * We can have multiple rules with the same number, so we * need to clear them all.
*/ for (rule = layer3_chain; rule; rule = rule->next) if (rule->rulenum == rulenum) { s = splimp(); while (rule && rule->rulenum == rulenum) { clear_counters(rule, log_only); rule = rule->next; } splx(s); cleared = 1; break; } if (!cleared) /* we did not find any matching rules */ return (EINVAL); msg = log_only ? "ipfw: Entry %d logging count reset.\n" : "ipfw: Entry %d cleared.\n"; } if (fw_verbose) log(LOG_SECURITY | LOG_NOTICE, msg, rulenum); return (0); } /* * Check validity of the structure before insert. * Fortunately rules are simple, so this mostly needs to check rule sizes. */ static int check_ipfw_struct(struct ip_fw *rule, int size) { int l, cmdlen = 0; int have_action=0; ipfw_insn *cmd; if (size < sizeof(*rule)) { printf("ipfw: rule too short\n"); return (EINVAL); } /* first, check for valid size */ l = RULESIZE(rule); if (l != size) { printf("ipfw: size mismatch (have %d want %d)\n", size, l); return (EINVAL); } /* * Now go for the individual checks. Very simple ones, basically only * instruction sizes.
*/ for (l = rule->cmd_len, cmd = rule->cmd ; l > 0 ; l -= cmdlen, cmd += cmdlen) { cmdlen = F_LEN(cmd); if (cmdlen > l) { printf("ipfw: opcode %d size truncated\n", cmd->opcode); return EINVAL; } DEB(printf("ipfw: opcode %d\n", cmd->opcode);) switch (cmd->opcode) { case O_NOP: case O_PROBE_STATE: case O_KEEP_STATE: case O_PROTO: case O_IP_SRC_ME: case O_IP_DST_ME: case O_LAYER2: case O_IN: case O_FRAG: case O_IPOPT: - case O_IPLEN: - case O_IPID: case O_IPTOS: case O_IPPRECEDENCE: - case O_IPTTL: case O_IPVER: case O_TCPWIN: case O_TCPFLAGS: case O_TCPOPTS: case O_ESTAB: + case O_VERREVPATH: if (cmdlen != F_INSN_SIZE(ipfw_insn)) goto bad_size; break; case O_UID: case O_GID: case O_IP_SRC: case O_IP_DST: case O_TCPSEQ: case O_TCPACK: case O_PROB: case O_ICMPTYPE: if (cmdlen != F_INSN_SIZE(ipfw_insn_u32)) goto bad_size; break; case O_LIMIT: if (cmdlen != F_INSN_SIZE(ipfw_insn_limit)) goto bad_size; break; case O_LOG: if (cmdlen != F_INSN_SIZE(ipfw_insn_log)) goto bad_size; ((ipfw_insn_log *)cmd)->log_left = ((ipfw_insn_log *)cmd)->max_log; break; case O_IP_SRC_MASK: case O_IP_DST_MASK: if (cmdlen != F_INSN_SIZE(ipfw_insn_ip)) goto bad_size; if (((ipfw_insn_ip *)cmd)->mask.s_addr == 0) { printf("ipfw: opcode %d, useless rule\n", cmd->opcode); return EINVAL; } break; case O_IP_SRC_SET: case O_IP_DST_SET: if (cmd->arg1 == 0 || cmd->arg1 > 256) { printf("ipfw: invalid set size %d\n", cmd->arg1); return EINVAL; } if (cmdlen != F_INSN_SIZE(ipfw_insn_u32) + (cmd->arg1+31)/32 ) goto bad_size; break; case O_MACADDR2: if (cmdlen != F_INSN_SIZE(ipfw_insn_mac)) goto bad_size; break; + case O_IPID: + case O_IPTTL: + case O_IPLEN: + if (cmdlen < 1 || cmdlen > 31) + goto bad_size; + break; case O_MAC_TYPE: case O_IP_SRCPORT: case O_IP_DSTPORT: /* XXX artificial limit, 30 port pairs */ if (cmdlen < 2 || cmdlen > 31) goto bad_size; break; case O_RECV: case O_XMIT: case O_VIA: if (cmdlen != F_INSN_SIZE(ipfw_insn_if)) goto bad_size; break; case O_PIPE: case O_QUEUE: if (cmdlen != 
F_INSN_SIZE(ipfw_insn_pipe)) goto bad_size; goto check_action; case O_FORWARD_IP: if (cmdlen != F_INSN_SIZE(ipfw_insn_sa)) goto bad_size; goto check_action; case O_FORWARD_MAC: /* XXX not implemented yet */ case O_CHECK_STATE: case O_COUNT: case O_ACCEPT: case O_DENY: case O_REJECT: case O_SKIPTO: case O_DIVERT: case O_TEE: if (cmdlen != F_INSN_SIZE(ipfw_insn)) goto bad_size; check_action: if (have_action) { printf("ipfw: opcode %d, multiple actions" " not allowed\n", cmd->opcode); return EINVAL; } have_action = 1; if (l != cmdlen) { printf("ipfw: opcode %d, action must be" " last opcode\n", cmd->opcode); return EINVAL; } break; default: printf("ipfw: opcode %d, unknown opcode\n", cmd->opcode); return EINVAL; } } if (have_action == 0) { printf("ipfw: missing action\n"); return EINVAL; } return 0; bad_size: printf("ipfw: opcode %d size %d wrong\n", cmd->opcode, cmdlen); return EINVAL; } /** * {set|get}sockopt parser. */ static int ipfw_ctl(struct sockopt *sopt) { int error, s, rulenum; size_t size; struct ip_fw *bp , *buf, *rule; static u_int32_t rule_buf[255]; /* we copy the data here */ /* * Disallow modifications in really-really secure mode, but still allow * the logging counters to be reset. */ if (sopt->sopt_name == IP_FW_ADD || (sopt->sopt_dir == SOPT_SET && sopt->sopt_name != IP_FW_RESETLOG)) { #if __FreeBSD_version >= 500034 error = securelevel_ge(sopt->sopt_td->td_ucred, 3); if (error) return (error); #else /* FreeBSD 4.x */ if (securelevel >= 3) return (EPERM); #endif } error = 0; switch (sopt->sopt_name) { case IP_FW_GET: /* * pass up a copy of the current rules. Static rules * come first (the last of which has number IPFW_DEFAULT_RULE), * followed by a possibly empty list of dynamic rules. * The last dynamic rule has NULL in the "next" field.
*/ s = splimp(); size = static_len; /* size of static rules */ if (ipfw_dyn_v) /* add size of dyn.rules */ size += (dyn_count * sizeof(ipfw_dyn_rule)); /* * XXX todo: if the user passes a short length just to know * how much room is needed, do not bother filling up the * buffer, just jump to the sooptcopyout. */ buf = malloc(size, M_TEMP, M_WAITOK); if (buf == 0) { splx(s); error = ENOBUFS; break; } bp = buf; for (rule = layer3_chain; rule ; rule = rule->next) { int i = RULESIZE(rule); bcopy(rule, bp, i); - /* - * abuse 'next_rule' to store the set_disable word - */ - (u_int32_t)(((struct ip_fw *)bp)->next_rule) = - set_disable; + bcopy(&set_disable, &(bp->next_rule), + sizeof(set_disable)); bp = (struct ip_fw *)((char *)bp + i); } if (ipfw_dyn_v) { int i; ipfw_dyn_rule *p, *dst, *last = NULL; dst = (ipfw_dyn_rule *)bp; for (i = 0 ; i < curr_dyn_buckets ; i++ ) for ( p = ipfw_dyn_v[i] ; p != NULL ; p = p->next, dst++ ) { bcopy(p, dst, sizeof *p); - (int)dst->rule = p->rule->rulenum ; + bcopy(&(p->rule->rulenum), &(dst->rule), + sizeof(p->rule->rulenum)); /* * store a non-null value in "next". * The userland code will interpret a * NULL here as a marker * for the last dynamic rule. */ - dst->next = dst ; + bcopy(&dst, &dst->next, sizeof(dst)); last = dst ; dst->expire = TIME_LEQ(dst->expire, time_second) ? 0 : dst->expire - time_second ; } if (last != NULL) /* mark last dynamic rule */ - last->next = NULL; + bzero(&last->next, sizeof(last)); } splx(s); error = sooptcopyout(sopt, buf, size); free(buf, M_TEMP); break; case IP_FW_FLUSH: /* * Normally we cannot release the lock on each iteration. * We could do it here only because we start from the head all * the times so there is no risk of missing some entries. * On the other hand, the risk is that we end up with * a very inconsistent ruleset, so better keep the lock * around the whole cycle. 
* * XXX this code can be improved by resetting the head of * the list to point to the default rule, and then freeing * the old list without the need for a lock. */ s = splimp(); free_chain(&layer3_chain, 0 /* keep default rule */); splx(s); break; case IP_FW_ADD: rule = (struct ip_fw *)rule_buf; /* XXX do a malloc */ error = sooptcopyin(sopt, rule, sizeof(rule_buf), sizeof(struct ip_fw) ); size = sopt->sopt_valsize; if (error || (error = check_ipfw_struct(rule, size))) break; error = add_rule(&layer3_chain, rule); size = RULESIZE(rule); if (!error && sopt->sopt_dir == SOPT_GET) error = sooptcopyout(sopt, rule, size); break; case IP_FW_DEL: /* * IP_FW_DEL is used for deleting single rules or sets, * and (ab)used to atomically manipulate sets. Argument size * is used to distinguish between the two: * sizeof(u_int32_t) * delete single rule or set of rules, * or reassign rules (or sets) to a different set. * 2*sizeof(u_int32_t) * atomic disable/enable sets. * first u_int32_t contains sets to be disabled, * second u_int32_t contains sets to be enabled. */ error = sooptcopyin(sopt, rule_buf, 2*sizeof(u_int32_t), sizeof(u_int32_t)); if (error) break; size = sopt->sopt_valsize; if (size == sizeof(u_int32_t)) /* delete or reassign */ error = del_entry(&layer3_chain, rule_buf[0]); else if (size == 2*sizeof(u_int32_t)) /* set enable/disable */ set_disable = (set_disable | rule_buf[0]) & ~rule_buf[1] & ~(1<<31); /* set 31 always enabled */ else error = EINVAL; break; case IP_FW_ZERO: case IP_FW_RESETLOG: /* argument is an int, the rule number */ rulenum=0; if (sopt->sopt_val != 0) { error = sooptcopyin(sopt, &rulenum, sizeof(int), sizeof(int)); if (error) break; } error = zero_entry(rulenum, sopt->sopt_name == IP_FW_RESETLOG); break; default: printf("ipfw: ipfw_ctl invalid option %d\n", sopt->sopt_name); error = EINVAL; } return (error); } /** * dummynet needs a reference to the default rule, because rules can be * deleted while packets hold a reference to them. 
When this happens, * dummynet changes the reference to the default rule (it could well be a * NULL pointer, but this way we do not need to check for the special * case, plus here we have info on the default behaviour). */ struct ip_fw *ip_fw_default_rule; /* * This procedure is only used to handle keepalives. It is invoked * every dyn_keepalive_period */ static void ipfw_tick(void * __unused unused) { int i; int s; ipfw_dyn_rule *q; if (dyn_keepalive == 0 || ipfw_dyn_v == NULL || dyn_count == 0) goto done; s = splimp(); for (i = 0 ; i < curr_dyn_buckets ; i++) { for (q = ipfw_dyn_v[i] ; q ; q = q->next ) { if (q->dyn_type == O_LIMIT_PARENT) continue; if (q->id.proto != IPPROTO_TCP) continue; if ( (q->state & BOTH_SYN) != BOTH_SYN) continue; if (TIME_LEQ( time_second+dyn_keepalive_interval, q->expire)) continue; /* too early */ if (TIME_LEQ(q->expire, time_second)) continue; /* too late, rule expired */ send_pkt(&(q->id), q->ack_rev - 1, q->ack_fwd, TH_SYN); send_pkt(&(q->id), q->ack_fwd - 1, q->ack_rev, 0); } } splx(s); done: ipfw_timeout_h = timeout(ipfw_tick, NULL, dyn_keepalive_period*hz); } static void ipfw_init(void) { struct ip_fw default_rule; ip_fw_chk_ptr = ipfw_chk; ip_fw_ctl_ptr = ipfw_ctl; layer3_chain = NULL; bzero(&default_rule, sizeof default_rule); default_rule.act_ofs = 0; default_rule.rulenum = IPFW_DEFAULT_RULE; default_rule.cmd_len = 1; default_rule.set = 31; default_rule.cmd[0].len = 1; default_rule.cmd[0].opcode = #ifdef IPFIREWALL_DEFAULT_TO_ACCEPT 1 ? O_ACCEPT : #endif O_DENY; add_rule(&layer3_chain, &default_rule); ip_fw_default_rule = layer3_chain; printf("ipfw2 initialized, divert %s, " "rule-based forwarding enabled, default to %s, logging ", #ifdef IPDIVERT "enabled", #else "disabled", #endif default_rule.cmd[0].opcode == O_ACCEPT ?
"accept" : "deny"); #ifdef IPFIREWALL_VERBOSE fw_verbose = 1; #endif #ifdef IPFIREWALL_VERBOSE_LIMIT verbose_limit = IPFIREWALL_VERBOSE_LIMIT; #endif if (fw_verbose == 0) printf("disabled\n"); else if (verbose_limit == 0) printf("unlimited\n"); else printf("limited to %d packets/entry by default\n", verbose_limit); bzero(&ipfw_timeout_h, sizeof(struct callout_handle)); ipfw_timeout_h = timeout(ipfw_tick, NULL, hz); } static int ipfw_modevent(module_t mod, int type, void *unused) { int s; int err = 0; switch (type) { case MOD_LOAD: s = splimp(); if (IPFW_LOADED) { splx(s); printf("IP firewall already loaded\n"); err = EEXIST; } else { ipfw_init(); splx(s); } break; case MOD_UNLOAD: #if !defined(KLD_MODULE) printf("ipfw statically compiled, cannot unload\n"); err = EBUSY; #else s = splimp(); untimeout(ipfw_tick, NULL, ipfw_timeout_h); ip_fw_chk_ptr = NULL; ip_fw_ctl_ptr = NULL; free_chain(&layer3_chain, 1 /* kill default rule */); splx(s); printf("IP firewall unloaded\n"); #endif break; default: break; } return err; } static moduledata_t ipfwmod = { "ipfw", ipfw_modevent, 0 }; DECLARE_MODULE(ipfw, ipfwmod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(ipfw, 1); #endif /* IPFW2 */ Index: stable/4/sys/netinet/ip_fw2.h =================================================================== --- stable/4/sys/netinet/ip_fw2.h (revision 116991) +++ stable/4/sys/netinet/ip_fw2.h (revision 116992) @@ -1,404 +1,414 @@ /* * Copyright (c) 2002 Luigi Rizzo, Universita` di Pisa * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _IPFW2_H #define _IPFW2_H /* * The kernel representation of ipfw rules is made of a list of * 'instructions' (for all practical purposes equivalent to BPF * instructions), which specify which fields of the packet - * (or its metatada) should be analysed. + * (or its metadata) should be analysed. * * Each instruction is stored in a structure which begins with * "ipfw_insn", and can contain extra fields depending on the * instruction type (listed below). + * Note that the code is written so that individual instructions + * have a size which is a multiple of 32 bits. This means that, if + * such structures contain pointers or other 64-bit entities, + * (there is just one instance now) they may end up unaligned on + * 64-bit architectures, so they must be handled with care. * * "enum ipfw_opcodes" are the opcodes supported. We can have up * to 256 different opcodes.
*/ enum ipfw_opcodes { /* arguments (4 byte each) */ O_NOP, O_IP_SRC, /* u32 = IP */ O_IP_SRC_MASK, /* ip = IP/mask */ O_IP_SRC_ME, /* none */ O_IP_SRC_SET, /* u32=base, arg1=len, bitmap */ O_IP_DST, /* u32 = IP */ O_IP_DST_MASK, /* ip = IP/mask */ O_IP_DST_ME, /* none */ O_IP_DST_SET, /* u32=base, arg1=len, bitmap */ O_IP_SRCPORT, /* (n)port list:mask 4 byte ea */ O_IP_DSTPORT, /* (n)port list:mask 4 byte ea */ O_PROTO, /* arg1=protocol */ O_MACADDR2, /* 2 mac addr:mask */ O_MAC_TYPE, /* same as srcport */ O_LAYER2, /* none */ O_IN, /* none */ O_FRAG, /* none */ O_RECV, /* none */ O_XMIT, /* none */ O_VIA, /* none */ O_IPOPT, /* arg1 = 2*u8 bitmap */ O_IPLEN, /* arg1 = len */ O_IPID, /* arg1 = id */ O_IPTOS, /* arg1 = id */ O_IPPRECEDENCE, /* arg1 = precedence << 5 */ O_IPTTL, /* arg1 = TTL */ O_IPVER, /* arg1 = version */ O_UID, /* u32 = id */ O_GID, /* u32 = id */ O_ESTAB, /* none (tcp established) */ O_TCPFLAGS, /* arg1 = 2*u8 bitmap */ O_TCPWIN, /* arg1 = desired win */ O_TCPSEQ, /* u32 = desired seq. */ O_TCPACK, /* u32 = desired seq. */ O_ICMPTYPE, /* u32 = icmp bitmap */ O_TCPOPTS, /* arg1 = 2*u8 bitmap */ + O_VERREVPATH, /* none */ + O_PROBE_STATE, /* none */ O_KEEP_STATE, /* none */ O_LIMIT, /* ipfw_insn_limit */ O_LIMIT_PARENT, /* dyn_type, not an opcode. */ /* * these are really 'actions', and must be last in the list. */ O_LOG, /* ipfw_insn_log */ O_PROB, /* u32 = match probability */ O_CHECK_STATE, /* none */ O_ACCEPT, /* none */ O_DENY, /* none */ O_REJECT, /* arg1=icmp arg (same as deny) */ O_COUNT, /* none */ O_SKIPTO, /* arg1=next rule number */ O_PIPE, /* arg1=pipe number */ O_QUEUE, /* arg1=queue number */ O_DIVERT, /* arg1=port number */ O_TEE, /* arg1=port number */ O_FORWARD_IP, /* fwd sockaddr */ O_FORWARD_MAC, /* fwd mac */ O_LAST_OPCODE /* not an opcode! */ }; /* * Template for instructions. * * ipfw_insn is used for all instructions which require no operands, * a single 16-bit value (arg1), or a couple of 8-bit values. 
* * For other instructions which require different/larger arguments * we have derived structures, ipfw_insn_*. * * The size of the instruction (in 32-bit words) is in the low * 6 bits of "len". The 2 remaining bits are used to implement * NOT and OR on individual instructions. Given a type, you can * compute the length to be put in "len" using F_INSN_SIZE(t) * * F_NOT negates the match result of the instruction. * * F_OR is used to build or blocks. By default, instructions * are evaluated as part of a logical AND. An "or" block * { X or Y or Z } contains F_OR set in all but the last * instruction of the block. A match will cause the code * to skip past the last instruction of the block. * * NOTA BENE: in a couple of places we assume that * sizeof(ipfw_insn) == sizeof(u_int32_t) * this needs to be fixed. * */ typedef struct _ipfw_insn { /* template for instructions */ enum ipfw_opcodes opcode:8; u_int8_t len; /* number of 32-bit words */ #define F_NOT 0x80 #define F_OR 0x40 #define F_LEN_MASK 0x3f #define F_LEN(cmd) ((cmd)->len & F_LEN_MASK) u_int16_t arg1; } ipfw_insn; /* * The F_INSN_SIZE(type) computes the size, in 4-byte words, of * a given type. */ #define F_INSN_SIZE(t) ((sizeof (t))/sizeof(u_int32_t)) /* * This is used to store an array of 16-bit entries (ports etc.) */ typedef struct _ipfw_insn_u16 { ipfw_insn o; u_int16_t ports[2]; /* there may be more */ } ipfw_insn_u16; /* * This is used to store an array of 32-bit entries * (uid, single IPv4 addresses etc.) */ typedef struct _ipfw_insn_u32 { ipfw_insn o; u_int32_t d[1]; /* one or more */ } ipfw_insn_u32; /* * This is used to store IP addr-mask pairs. */ typedef struct _ipfw_insn_ip { ipfw_insn o; struct in_addr addr; struct in_addr mask; } ipfw_insn_ip; /* * This is used to forward to a given address (ip) */ typedef struct _ipfw_insn_sa { ipfw_insn o; struct sockaddr_in sa; } ipfw_insn_sa; /* * This is used for MAC addr-mask pairs.
*/ typedef struct _ipfw_insn_mac { ipfw_insn o; u_char addr[12]; /* dst[6] + src[6] */ u_char mask[12]; /* dst[6] + src[6] */ } ipfw_insn_mac; /* * This is used for interface match rules (recv xx, xmit xx) */ typedef struct _ipfw_insn_if { ipfw_insn o; union { struct in_addr ip; - int unit; + int32_t unit; } p; char name[IFNAMSIZ]; } ipfw_insn_if; /* * This is used for pipe and queue actions, which need to store * a single pointer (which can have different size on different * architectures. + * Note that, because of previous instructions, pipe_ptr might + * be unaligned in the overall structure, so it needs to be + * manipulated with care. */ typedef struct _ipfw_insn_pipe { ipfw_insn o; - void *pipe_ptr; + void *pipe_ptr; /* XXX */ } ipfw_insn_pipe; /* * This is used for limit rules. */ typedef struct _ipfw_insn_limit { ipfw_insn o; u_int8_t _pad; u_int8_t limit_mask; /* combination of DYN_* below */ #define DYN_SRC_ADDR 0x1 #define DYN_SRC_PORT 0x2 #define DYN_DST_ADDR 0x4 #define DYN_DST_PORT 0x8 u_int16_t conn_limit; } ipfw_insn_limit; /* * This is used for log instructions */ typedef struct _ipfw_insn_log { ipfw_insn o; u_int32_t max_log; /* how many do we log -- 0 = all */ u_int32_t log_left; /* how many left to log */ } ipfw_insn_log; /* * Here we have the structure representing an ipfw rule. * * It starts with a general area (with link fields and counters) * followed by an array of one or more instructions, which the code * accesses as an array of 32-bit values. * * Given a rule pointer r: * * r->cmd is the start of the first instruction. * ACTION_PTR(r) is the start of the first action (things to do * once a rule matched). 
* * When assembling instruction, remember the following: * * + if a rule has a "keep-state" (or "limit") option, then the * first instruction (at r->cmd) MUST BE an O_PROBE_STATE * + if a rule has a "log" option, then the first action * (at ACTION_PTR(r)) MUST be O_LOG * * NOTE: we use a simple linked list of rules because we never need * to delete a rule without scanning the list. We do not use * queue(3) macros for portability and readability. */ struct ip_fw { struct ip_fw *next; /* linked list of rules */ struct ip_fw *next_rule; /* ptr to next [skipto] rule */ u_int16_t act_ofs; /* offset of action in 32-bit units */ u_int16_t cmd_len; /* # of 32-bit words in cmd */ u_int16_t rulenum; /* rule number */ u_int8_t set; /* rule set (0..31) */ u_int8_t _pad; /* padding */ /* These fields are present in all rules. */ u_int64_t pcnt; /* Packet counter */ u_int64_t bcnt; /* Byte counter */ u_int32_t timestamp; /* tv_sec of last match */ ipfw_insn cmd[1]; /* storage for commands */ }; #define ACTION_PTR(rule) \ (ipfw_insn *)( (u_int32_t *)((rule)->cmd) + ((rule)->act_ofs) ) #define RULESIZE(rule) (sizeof(struct ip_fw) + \ ((struct ip_fw *)(rule))->cmd_len * 4 - 4) /* * This structure is used as a flow mask and a flow id for various * parts of the code. */ struct ipfw_flow_id { u_int32_t dst_ip; u_int32_t src_ip; u_int16_t dst_port; u_int16_t src_port; u_int8_t proto; u_int8_t flags; /* protocol-specific flags */ }; /* * dynamic ipfw rule */ typedef struct _ipfw_dyn_rule ipfw_dyn_rule; struct _ipfw_dyn_rule { ipfw_dyn_rule *next; /* linked list of rules. 
*/ - struct ipfw_flow_id id; /* (masked) flow id */ struct ip_fw *rule; /* pointer to rule */ ipfw_dyn_rule *parent; /* pointer to parent rule */ - u_int32_t expire; /* expire time */ u_int64_t pcnt; /* packet match counter */ u_int64_t bcnt; /* byte match counter */ + struct ipfw_flow_id id; /* (masked) flow id */ + u_int32_t expire; /* expire time */ u_int32_t bucket; /* which bucket in hash table */ u_int32_t state; /* state of this rule (typically a * combination of TCP flags) */ u_int32_t ack_fwd; /* most recent ACKs in forward */ u_int32_t ack_rev; /* and reverse directions (used */ /* to generate keepalives) */ u_int16_t dyn_type; /* rule type */ u_int16_t count; /* refcount */ }; /* * Definitions for IP option names. */ #define IP_FW_IPOPT_LSRR 0x01 #define IP_FW_IPOPT_SSRR 0x02 #define IP_FW_IPOPT_RR 0x04 #define IP_FW_IPOPT_TS 0x08 /* * Definitions for TCP option names. */ #define IP_FW_TCPOPT_MSS 0x01 #define IP_FW_TCPOPT_WINDOW 0x02 #define IP_FW_TCPOPT_SACK 0x04 #define IP_FW_TCPOPT_TS 0x08 #define IP_FW_TCPOPT_CC 0x10 #define ICMP_REJECT_RST 0x100 /* fake ICMP code (send a TCP RST) */ /* * Main firewall chains definitions and global var's definitions. */ #ifdef _KERNEL #define IP_FW_PORT_DYNT_FLAG 0x10000 #define IP_FW_PORT_TEE_FLAG 0x20000 #define IP_FW_PORT_DENY_FLAG 0x40000 /* * arguments for calling ipfw_chk() and dummynet_io(). We put them * all into a structure because this way it is easier and more * efficient to pass variables around and extend the interface. 
*/ struct ip_fw_args { struct mbuf *m; /* the mbuf chain */ struct ifnet *oif; /* output interface */ struct sockaddr_in *next_hop; /* forward address */ struct ip_fw *rule; /* matching rule */ struct ether_header *eh; /* for bridged packets */ struct route *ro; /* for dummynet */ struct sockaddr_in *dst; /* for dummynet */ int flags; /* for dummynet */ struct ipfw_flow_id f_id; /* grabbed from IP header */ u_int16_t divert_rule; /* divert cookie */ u_int32_t retval; }; /* * Function definitions. */ /* Firewall hooks */ struct sockopt; struct dn_flow_set; void flush_pipe_ptrs(struct dn_flow_set *match); /* used by dummynet */ typedef int ip_fw_chk_t (struct ip_fw_args *args); typedef int ip_fw_ctl_t (struct sockopt *); extern ip_fw_chk_t *ip_fw_chk_ptr; extern ip_fw_ctl_t *ip_fw_ctl_ptr; extern int fw_one_pass; extern int fw_enable; #define IPFW_LOADED (ip_fw_chk_ptr != NULL) #endif /* _KERNEL */ #endif /* _IPFW2_H */
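The header comments in ip_fw2.h describe how each instruction stores its size (in 32-bit words) in the low 6 bits of "len", keeps F_NOT and F_OR in the two high bits, and how a rule's command area is walked as an array of 32-bit values. The following is a minimal userland sketch of that encoding, not the kernel code: every name prefixed with sk_ or SK_ is a hypothetical stand-in for the corresponding ipfw definition, and the structures are simplified mirrors that assume the usual 4-byte layout of the instruction header. Headers are copied with memcpy() rather than accessed through casts, which is one cautious way to deal with the alignment concern the header mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userland mirror of ipfw_insn (assumed to be 4 bytes). */
typedef struct {
	uint8_t  opcode;
	uint8_t  len;	/* low 6 bits: size in 32-bit words; high 2 bits: flags */
	uint16_t arg1;
} sk_insn;

#define SK_F_NOT	0x80
#define SK_F_OR		0x40
#define SK_F_LEN_MASK	0x3f
#define SK_F_LEN(cmd)	((cmd)->len & SK_F_LEN_MASK)
#define SK_F_INSN_SIZE(t)	(sizeof(t) / sizeof(uint32_t))

/* Hypothetical mirror of ipfw_insn_ip: header plus address and mask. */
typedef struct {
	sk_insn  o;
	uint32_t addr;
	uint32_t mask;
} sk_insn_ip;

/*
 * Write an instruction header at dst and return its size in words.
 * "words" goes in the low 6 bits of len, "flags" in the high 2 bits.
 */
static unsigned
sk_emit(uint32_t *dst, uint8_t opcode, unsigned words, uint8_t flags,
    uint16_t arg1)
{
	sk_insn hdr;

	assert(words >= 1 && words <= SK_F_LEN_MASK);
	hdr.opcode = opcode;
	hdr.len = (uint8_t)(words | (flags & (SK_F_NOT | SK_F_OR)));
	hdr.arg1 = arg1;
	memcpy(dst, &hdr, sizeof(hdr));	/* copy, do not assume alignment */
	return (words);
}

/* Return nonzero if the instruction at src has F_NOT set. */
static int
sk_is_negated(const uint32_t *src)
{
	sk_insn hdr;

	memcpy(&hdr, src, sizeof(hdr));
	return ((hdr.len & SK_F_NOT) != 0);
}

/*
 * Walk cmd_len 32-bit words of instructions and count them.
 * Returns -1 if an instruction has length 0 or overruns the buffer.
 */
static int
sk_count_insns(const uint32_t *cmd, unsigned cmd_len)
{
	unsigned ofs = 0;
	int n = 0;

	while (ofs < cmd_len) {
		sk_insn hdr;
		unsigned l;

		memcpy(&hdr, cmd + ofs, sizeof(hdr));
		l = SK_F_LEN(&hdr);
		if (l == 0 || ofs + l > cmd_len)
			return (-1);
		ofs += l;	/* next instruction starts l words later */
		n++;
	}
	return (n);
}
```

A caller would fill a uint32_t buffer with successive sk_emit() calls, summing the returned sizes to obtain the equivalent of cmd_len, and could record the word offset of the first action in the same way that act_ofs is used by ACTION_PTR().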