Make ipfw dynamic states lockless on fast path
Needs ReviewPublic

Authored by ae on Oct 16 2017, 10:36 AM.

Details

Summary

General changes:

  • struct ipfw_dyn_info added. It keeps all needed for ipfw_chk and for dynamic states implementation information.
  • lookup and install states functions changed. Now all internal locking in the dynamic states implementation is hidden from ipfw_chk().
  • ipfw_dyn_rule now becomes obsolete. Currently it used to pass information from kernel to userland only.
  • ipfw_send_pkt() moved from ip_fw_dynamic into ip_fw2. It will be reworked later. Now ip_fw_dynamic2 has its own functions to send keep-alivies.
  • Added ipfw_add_protected_rule() that creates default_rule. It also used to create "dynamic states default rule".
  • Some fixes to range matching: use UINT32_MAX to match all rules with the same rulenum.

Dynamic states implementation:

  • IPv4 and IPv6 states now described by different structures dyn_ipv4_state and dyn_ipv6_state;
  • IPv6 scope zones support is added;
  • states are linked with "entry" field using CK_SLIST. This allows lockless lookup and protected by mutex modifications.
  • the "expired" SLIST field is used for states expiring.
  • struct dyn_data is used to keep generic information for both IPv4 and IPv6;
  • struct dyn_parent is used to keep O_LIMIT_PARENT information;
  • IPv4 and IPv6 states are in different hash tables;
  • Also O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states.
  • per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads.
  • mutexes to protect modification of lists in hash tables now kept in separate arrays;
  • each hash table has two arrays of "bucket version" for add and delete operations. These arrays are used for some speedups and protections.
  • dyn_update_tcp_state() added. Updating algorithm for this functions is modified.
  • Separate lookup and install functions added for IPv4 and IPv6 states and for parent states.
  • By default now is used Jenkinks hash function.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 12191
Build 12490: arc lint + arc unit
ae created this revision.Oct 16 2017, 10:36 AM
ae updated this revision to Diff 34025.Oct 16 2017, 11:01 AM
  • Restore support for 'check-state :any'.
ae edited the summary of this revision. (Show Details)Oct 16 2017, 2:53 PM
ae added a reviewer: oleg.Oct 17 2017, 7:07 AM
ae added a reviewer: julian.Oct 18 2017, 9:59 AM
julian requested changes to this revision.Oct 20 2017, 8:00 AM

really minor things only.. don't get discouraged..

sys/netpfil/ipfw/ip_fw2.c
476

this seems unfinished

478

say more about the flags

508

do non tcp packets ever get here?

520

I don't like this but I can't really say why.. "Prove this to be correct in a comment"

529

should we have a no-ipv4 set of conditionals too?

sys/netpfil/ipfw/ip_fw_private.h
200

separate by whitespace plz :-)

sys/netpfil/ipfw/ip_fw_sockopt.c
799

this makes me wonder if you could create a diagram of structure linkages in the modern ipfw. Might even try it myself...

2407–2409

split after the ? not the : plz (or not split)

This revision now requires changes to proceed.Oct 20 2017, 8:00 AM
ae added a comment.Oct 20 2017, 9:02 AM

The ipfw_send_pkt() function was not changed, it just moved from ip_fw_dynamic.c into ip_fw2.c. I can remove keep-alive related functionality from this function if you prefer this.
The new dynamic states implementation uses own keep-alive functions that are in ip_fw_dynamic2.c (is is hidden in phabricator and should be expanded via "Show File Contents").
I am planning to rework ipfw_send_pkt() to use some deferred sending (e.g. taskq). I don't like the fact that we are reentering the firewall when we sending RST. This produces high stack usage.

ae updated this revision to Diff 34172.Oct 20 2017, 9:19 AM
ae edited the summary of this revision. (Show Details)

Some whitespace fixes and blank lines.
In dyn_export_data() fix typo in bytes counter calculation

ae marked 2 inline comments as done.Oct 20 2017, 9:32 AM
ae added inline comments.
sys/netpfil/ipfw/ip_fw_sockopt.c
799

We have the array of pointers to rules. This array reallocated each time we add new rule. get_map() does this. Not sure what you wanted to hear/read here? :)
This function just allocates rule with 65535 rule number and in reserved set. It is used to allocate default rule, and also new dynamic states code uses this function to allocate "default dynamic states" rule. This rule is placed next to the default rule and looks like:

# ipfw list 65535
65535 deny ip from any to any
65535 allow ip from any to any // Dynamic states default rule - NOTREACHED

We use this rule to link with states that are created by ipfwsync code. Also I think it can be used to implement "keep_states" functionality. When ifpw rule that have some states is deleted, we can use this default rule as new parent, and states will use this parent until they are expired. But this needs to some thoughts on how this implement better.

ae updated this revision to Diff 34181.Oct 20 2017, 2:05 PM
  • Switch the default hash algorithm to jenkins hash.
ae edited the summary of this revision. (Show Details)Oct 20 2017, 2:07 PM
ae set the repository for this revision to rS FreeBSD src repository.
ae added a subscriber: olivier.
olivier added inline comments.Oct 23 2017, 7:29 PM
sys/netpfil/ipfw/ip_fw_dynamic2.c
3201

Compilation problem here:

--- ip_fw_dynamic2.o ---
/src/sys/netpfil/ipfw/ip_fw_dynamic2.c:3200:14: error: use of undeclared identifier 'dyn_expired_ipv4'
        SLIST_INIT(&dyn_expired_ipv4);
                    ^
3209

Compilation problem here:

--- all_subdir_ipfw ---
/src/sys/netpfil/ipfw/ip_fw_dynamic2.c:3208:14: error: use of undeclared identifier 'dyn_expired_ipv6'
        SLIST_INIT(&dyn_expired_ipv6);
                    ^
sys/netpfil/ipfw/ip_fw_sockopt.c
842

Wrong copy/past ? This same function is declared 3 times here.

-- ip_fw_sockopt.o ---
/src/sys/netpfil/ipfw/ip_fw_sockopt.c:818:1: error: redefinition of 'ipfw_add_protected_rule'
ipfw_add_protected_rule(struct ip_fw_chain *chain, struct ip_fw *rule,
^
/src/sys/netpfil/ipfw/ip_fw_sockopt.c:794:1: note: previous definition is here
ipfw_add_protected_rule(struct ip_fw_chain *chain, struct ip_fw *rule,
^
ae updated this revision to Diff 34264.Oct 23 2017, 8:12 PM
  • Fix build with VIMAGE and remove mismerged chunks.
  • Replace ip_fw_dynamic in sys/conf/files.
ae marked 3 inline comments as done.Oct 23 2017, 8:14 PM

Here is on a Xeon 8cores with Chelsio results, 2000 UDP flows of small packet size:

inet4: +8%

x r325006: inet4 packets-per-second
+ r325006 with D12685: inet4 packets-per-second
+--------------------------------------------------------------------------+
| x     x       x    x                                        +  +   + +  +|
||______M_A_______|                                                        |
|                                                              |____AM___| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       4926226       5061461       4968007     4982323.8     61171.031
+   5     5354928.5       5439683       5400744     5397190.5     33366.021
Difference at 95.0% confidence
        414867 +/- 71858.4
        8.32677% +/- 1.53565%
        (Student's t, pooled s = 49270.6)

inet6, +26% :-)

x r325006: inet6 packets-per-second
+ r325006 with D12685: inet6 packets-per-second
+--------------------------------------------------------------------------+
|xxx                                                            +  ++ +   +|
||A|                                                                       |
|                                                                |__MA__|  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       3495513       3522367       3504904     3508423.8     11667.122
+   5       4373025       4506515       4425156       4433615     50334.836
Difference at 95.0% confidence
        925191 +/- 53285.2
        26.3706% +/- 1.54172%
        (Student's t, pooled s = 36535.7)

On a small 4 cores ATOM:

inet4: +1.6%

x r325006: inet4 packets-per-seconds
+ r325006 with D12685: inet4 packets-per-seconds
+--------------------------------------------------------------------------+
|  x x                 x    x                   + +       x        +    ++ |
||_____________________A______________________|                            |
|                                                 |___________A____M______||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        533769        546839      538652.5      538631.2     5283.2651
+   5      544503.5        550422        548852      547726.8     2850.0441
Difference at 95.0% confidence
        9095.6 +/- 6190.71
        1.68865% +/- 1.1644%
        (Student's t, pooled s = 4244.74)

inet6:

x r325006:inet6 packets-per-seconds
+ r325006 with D12685: inet6 packets-per-seconds
+--------------------------------------------------------------------------+
|       x  *   *            +        x   +       +                        x|
||_____________M_____________A__________________________|                  |
|            |______________MA_______________|                             |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        433603        438510        434100      435151.8      2056.723
+   5        433856        436658        435092      435163.6      1207.954
No difference proven at 95.0% confidence
ae added a comment.Oct 26 2017, 8:13 PM

Also now you can try your 5000000 flows test :)
You can set

sysctl net.inet.ip.fw.dyn_max=5000000
sysctl net.inet.ip.fw.dyn_buckets=5000000

Also it is interesting to see what value has net.inet.ip.fw.curr_max_length variable during the test.

Wow, results regarding the 5M UDP session bench:

x r325006 with AFDATA/RADIX patches: inet4 Packets-per-second
+ r325006 with AFDATA/RADIX and D12685 patches: inet4 Packets-per-second
+--------------------------------------------------------------------------+
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|A                                                                         |
|                                                                         A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         28583         36232       32290.5       32308.3     3725.3587
+   5       5661388       5697271       5687629       5682374     15266.038
Difference at 95.0% confidence
        5.65007e+06 +/- 16205.5
        17488% +/- 2092%
        (Student's t, pooled s = 11111.5)

During the 5M session bench:

sysctl net.inet.ip.fw.curr_max_length
net.inet.ip.fw.curr_max_length: 9

And now comparing pf vs ipfw:

what are the chances of this improvement making it into stable/10?

ae added a comment.Oct 30 2017, 2:51 PM

I have not plan to merge this into stable/10.
Also I'm not sure about committing this as is, because I'm not sure ConcurrencyKit is supported on all platforms.
Probably I will modify ip_fw_dynamic.c to be KPI compatible with this code, and then make it conditionally buildable.
Also CK is not merged into stable/11 yet, but cognet@ said that he will merge it into stable/11 if it will become needed.

bapt added a subscriber: bapt.Nov 21 2017, 4:58 PM
In D12685#266423, @ae wrote:

I have not plan to merge this into stable/10.
Also I'm not sure about committing this as is, because I'm not sure ConcurrencyKit is supported on all platforms.

It is!

Probably I will modify ip_fw_dynamic.c to be KPI compatible with this code, and then make it conditionally buildable.
Also CK is not merged into stable/11 yet, but cognet@ said that he will merge it into stable/11 if it will become needed.