Make ipfw dynamic states lockless on fast path
ClosedPublic

Authored by ae on Oct 16 2017, 10:36 AM.

Details

Summary

General changes:

  • struct ipfw_dyn_info added. It keeps all needed for ipfw_chk and for dynamic states implementation information.
  • lookup and install states functions changed. Now all internal locking in the dynamic states implementation is hidden from ipfw_chk().
  • ipfw_dyn_rule now becomes obsolete. Currently it used to pass information from kernel to userland only.
  • ipfw_send_pkt() moved from ip_fw_dynamic into ip_fw2. It will be reworked later. Now ip_fw_dynamic2 has its own functions to send keep-alivies.
  • Added ipfw_add_protected_rule() that creates default_rule. It also used to create "dynamic states default rule".
  • Some fixes to range matching: use UINT32_MAX to match all rules with the same rulenum.

Dynamic states implementation:

  • IPv4 and IPv6 states now described by different structures dyn_ipv4_state and dyn_ipv6_state;
  • IPv6 scope zones support is added;
  • states are linked with "entry" field using CK_SLIST. This allows lockless lookup and protected by mutex modifications.
  • the "expired" SLIST field is used for states expiring.
  • struct dyn_data is used to keep generic information for both IPv4 and IPv6;
  • struct dyn_parent is used to keep O_LIMIT_PARENT information;
  • IPv4 and IPv6 states are in different hash tables;
  • Also O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states.
  • per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads.
  • mutexes to protect modification of lists in hash tables now kept in separate arrays;
  • each hash table has two arrays of "bucket version" for add and delete operations. These arrays are used for some speedups and protections.
  • dyn_update_tcp_state() added. Updating algorithm for this functions is modified.
  • Separate lookup and install functions added for IPv4 and IPv6 states and for parent states.
  • By default now is used Jenkinks hash function.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
ae created this revision.Oct 16 2017, 10:36 AM
ae updated this revision to Diff 34025.Oct 16 2017, 11:01 AM
  • Restore support for 'check-state :any'.
ae edited the summary of this revision. (Show Details)Oct 16 2017, 2:53 PM
ae added a reviewer: oleg.Oct 17 2017, 7:07 AM
ae added a reviewer: julian.Oct 18 2017, 9:59 AM
julian requested changes to this revision.Oct 20 2017, 8:00 AM

really minor things only.. don't get discouraged..

sys/netpfil/ipfw/ip_fw2.c
476 ↗(On Diff #34025)

this seems unfinished

478 ↗(On Diff #34025)

say more about the flags

508 ↗(On Diff #34025)

do non tcp packets ever get here?

520 ↗(On Diff #34025)

I don't like this but I can't really say why.. "Prove this to be correct in a comment"

529 ↗(On Diff #34025)

should we have a no-ipv4 set of conditionals too?

sys/netpfil/ipfw/ip_fw_private.h
200 ↗(On Diff #34025)

separate by whitespace plz :-)

sys/netpfil/ipfw/ip_fw_sockopt.c
799 ↗(On Diff #34025)

this makes me wonder if you could create a diagram of structure linkages in the modern ipfw. Might even try it myself...

2407 ↗(On Diff #34025)

split after the ? not the : plz (or not split)

This revision now requires changes to proceed.Oct 20 2017, 8:00 AM
ae added a comment.Oct 20 2017, 9:02 AM

The ipfw_send_pkt() function was not changed, it just moved from ip_fw_dynamic.c into ip_fw2.c. I can remove keep-alive related functionality from this function if you prefer this.
The new dynamic states implementation uses own keep-alive functions that are in ip_fw_dynamic2.c (is is hidden in phabricator and should be expanded via "Show File Contents").
I am planning to rework ipfw_send_pkt() to use some deferred sending (e.g. taskq). I don't like the fact that we are reentering the firewall when we sending RST. This produces high stack usage.

ae updated this revision to Diff 34172.Oct 20 2017, 9:19 AM
ae edited the summary of this revision. (Show Details)

Some whitespace fixes and blank lines.
In dyn_export_data() fix typo in bytes counter calculation

ae marked 2 inline comments as done.Oct 20 2017, 9:32 AM
ae added inline comments.
sys/netpfil/ipfw/ip_fw_sockopt.c
799 ↗(On Diff #34025)

We have the array of pointers to rules. This array reallocated each time we add new rule. get_map() does this. Not sure what you wanted to hear/read here? :)
This function just allocates rule with 65535 rule number and in reserved set. It is used to allocate default rule, and also new dynamic states code uses this function to allocate "default dynamic states" rule. This rule is placed next to the default rule and looks like:

# ipfw list 65535
65535 deny ip from any to any
65535 allow ip from any to any // Dynamic states default rule - NOTREACHED

We use this rule to link with states that are created by ipfwsync code. Also I think it can be used to implement "keep_states" functionality. When ifpw rule that have some states is deleted, we can use this default rule as new parent, and states will use this parent until they are expired. But this needs to some thoughts on how this implement better.

ae updated this revision to Diff 34181.Oct 20 2017, 2:05 PM
  • Switch the default hash algorithm to jenkins hash.
ae edited the summary of this revision. (Show Details)Oct 20 2017, 2:07 PM
ae set the repository for this revision to rS FreeBSD src repository.
ae added a subscriber: olivier.
olivier added inline comments.Oct 23 2017, 7:29 PM
sys/netpfil/ipfw/ip_fw_dynamic2.c
3200 ↗(On Diff #34181)

Compilation problem here:

--- ip_fw_dynamic2.o ---
/src/sys/netpfil/ipfw/ip_fw_dynamic2.c:3200:14: error: use of undeclared identifier 'dyn_expired_ipv4'
        SLIST_INIT(&dyn_expired_ipv4);
                    ^
3208 ↗(On Diff #34181)

Compilation problem here:

--- all_subdir_ipfw ---
/src/sys/netpfil/ipfw/ip_fw_dynamic2.c:3208:14: error: use of undeclared identifier 'dyn_expired_ipv6'
        SLIST_INIT(&dyn_expired_ipv6);
                    ^
sys/netpfil/ipfw/ip_fw_sockopt.c
842 ↗(On Diff #34181)

Wrong copy/past ? This same function is declared 3 times here.

-- ip_fw_sockopt.o ---
/src/sys/netpfil/ipfw/ip_fw_sockopt.c:818:1: error: redefinition of 'ipfw_add_protected_rule'
ipfw_add_protected_rule(struct ip_fw_chain *chain, struct ip_fw *rule,
^
/src/sys/netpfil/ipfw/ip_fw_sockopt.c:794:1: note: previous definition is here
ipfw_add_protected_rule(struct ip_fw_chain *chain, struct ip_fw *rule,
^
ae updated this revision to Diff 34264.Oct 23 2017, 8:12 PM
  • Fix build with VIMAGE and remove mismerged chunks.
  • Replace ip_fw_dynamic in sys/conf/files.
ae marked 3 inline comments as done.Oct 23 2017, 8:14 PM

Here is on a Xeon 8cores with Chelsio results, 2000 UDP flows of small packet size:

inet4: +8%

x r325006: inet4 packets-per-second
+ r325006 with D12685: inet4 packets-per-second
+--------------------------------------------------------------------------+
| x     x       x    x                                        +  +   + +  +|
||______M_A_______|                                                        |
|                                                              |____AM___| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       4926226       5061461       4968007     4982323.8     61171.031
+   5     5354928.5       5439683       5400744     5397190.5     33366.021
Difference at 95.0% confidence
        414867 +/- 71858.4
        8.32677% +/- 1.53565%
        (Student's t, pooled s = 49270.6)

inet6, +26% :-)

x r325006: inet6 packets-per-second
+ r325006 with D12685: inet6 packets-per-second
+--------------------------------------------------------------------------+
|xxx                                                            +  ++ +   +|
||A|                                                                       |
|                                                                |__MA__|  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       3495513       3522367       3504904     3508423.8     11667.122
+   5       4373025       4506515       4425156       4433615     50334.836
Difference at 95.0% confidence
        925191 +/- 53285.2
        26.3706% +/- 1.54172%
        (Student's t, pooled s = 36535.7)

On a small 4 cores ATOM:

inet4: +1.6%

x r325006: inet4 packets-per-seconds
+ r325006 with D12685: inet4 packets-per-seconds
+--------------------------------------------------------------------------+
|  x x                 x    x                   + +       x        +    ++ |
||_____________________A______________________|                            |
|                                                 |___________A____M______||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        533769        546839      538652.5      538631.2     5283.2651
+   5      544503.5        550422        548852      547726.8     2850.0441
Difference at 95.0% confidence
        9095.6 +/- 6190.71
        1.68865% +/- 1.1644%
        (Student's t, pooled s = 4244.74)

inet6:

x r325006:inet6 packets-per-seconds
+ r325006 with D12685: inet6 packets-per-seconds
+--------------------------------------------------------------------------+
|       x  *   *            +        x   +       +                        x|
||_____________M_____________A__________________________|                  |
|            |______________MA_______________|                             |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        433603        438510        434100      435151.8      2056.723
+   5        433856        436658        435092      435163.6      1207.954
No difference proven at 95.0% confidence
ae added a comment.Oct 26 2017, 8:13 PM

Also now you can try your 5000000 flows test :)
You can set

sysctl net.inet.ip.fw.dyn_max=5000000
sysctl net.inet.ip.fw.dyn_buckets=5000000

Also it is interesting to see what value has net.inet.ip.fw.curr_max_length variable during the test.

Wow, results regarding the 5M UDP session bench:

x r325006 with AFDATA/RADIX patches: inet4 Packets-per-second
+ r325006 with AFDATA/RADIX and D12685 patches: inet4 Packets-per-second
+--------------------------------------------------------------------------+
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|A                                                                         |
|                                                                         A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         28583         36232       32290.5       32308.3     3725.3587
+   5       5661388       5697271       5687629       5682374     15266.038
Difference at 95.0% confidence
        5.65007e+06 +/- 16205.5
        17488% +/- 2092%
        (Student's t, pooled s = 11111.5)

During the 5M session bench:

sysctl net.inet.ip.fw.curr_max_length
net.inet.ip.fw.curr_max_length: 9

And now comparing pf vs ipfw:

what are the chances of this improvement making it into stable/10?

ae added a comment.Oct 30 2017, 2:51 PM

I have not plan to merge this into stable/10.
Also I'm not sure about committing this as is, because I'm not sure ConcurrencyKit is supported on all platforms.
Probably I will modify ip_fw_dynamic.c to be KPI compatible with this code, and then make it conditionally buildable.
Also CK is not merged into stable/11 yet, but cognet@ said that he will merge it into stable/11 if it will become needed.

bapt added a subscriber: bapt.Nov 21 2017, 4:58 PM
In D12685#266423, @ae wrote:

I have not plan to merge this into stable/10.
Also I'm not sure about committing this as is, because I'm not sure ConcurrencyKit is supported on all platforms.

It is!

Probably I will modify ip_fw_dynamic.c to be KPI compatible with this code, and then make it conditionally buildable.
Also CK is not merged into stable/11 yet, but cognet@ said that he will merge it into stable/11 if it will become needed.

seanc added a subscriber: seanc.Jan 22 2018, 11:18 PM
ae added a comment.Feb 7 2018, 9:47 AM

I tried to build universe and it is failed to build on sparc64 and powerpc. At least not all CK's primitives are defined for these platforms.

sparc64:

/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c: In function 'dyn_update_tcp_state':
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:962: warning: implicit declaration of function 'ck_pr_xor_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:962: warning: nested extern declaration of 'ck_pr_xor_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:964: warning: implicit declaration of function 'ck_pr_or_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:964: warning: nested extern declaration of 'ck_pr_or_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c: In function 'dyn_get_parent_state':
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1832: warning: implicit declaration of function 'ck_pr_md_load_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1832: warning: nested extern declaration of 'ck_pr_md_load_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1846: warning: implicit declaration of function 'ck_pr_inc_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1846: warning: nested extern declaration of 'ck_pr_inc_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c: In function 'dyn_install_state':
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1948: warning: implicit declaration of function 'ck_pr_dec_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1948: warning: nested extern declaration of 'ck_pr_dec_16' [-Wnested-externs]
--- ip_fw_dynamic.o ---
*** [ip_fw_dynamic.o] Error code 1

powerpc:

/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c: In function 'dyn_update_tcp_state':
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:962: warning: implicit declaration of function 'ck_pr_xor_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:962: warning: nested extern declaration of 'ck_pr_xor_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:964: warning: implicit declaration of function 'ck_pr_or_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:964: warning: nested extern declaration of 'ck_pr_or_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c: In function 'dyn_get_parent_state':
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1846: warning: implicit declaration of function 'ck_pr_inc_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1846: warning: nested extern declaration of 'ck_pr_inc_16' [-Wnested-externs]
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c: In function 'dyn_install_state':
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1948: warning: implicit declaration of function 'ck_pr_dec_16'
/usr/src/sys/netpfil/ipfw/ip_fw_dynamic.c:1948: warning: nested extern declaration of 'ck_pr_dec_16' [-Wnested-externs]
--- ip_fw_dynamic.o ---
*** [ip_fw_dynamic.o] Error code 1
ae added a comment.Feb 7 2018, 10:38 AM

I'll try to change the related code to use ck_pr_xxx_32 operations, it seems all platforms support them.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 7 2018, 7:00 PM
This revision was automatically updated to reflect the committed changes.
will added a subscriber: will.Feb 14 2018, 8:40 PM

Hi, just a quick question.

I admit I haven't looked at this very carefully, but: how do these dynamic states get freed? From dyn_expire_states:

2236         /*                                                                       
2237          * Concatenate temporary lists with global expired lists.                
2238          */                                                                      
2239         DYN_EXPIRED_LOCK();                                                      
2240         SLIST_CONCAT(&V_dyn_expired_ipv4, &expired_ipv4,                         
2241             dyn_ipv4_state, expired);                                            
2242 #ifdef INET6                                                                     
2243         SLIST_CONCAT(&V_dyn_expired_ipv6, &expired_ipv6,                         
2244             dyn_ipv6_state, expired);                                            
2245 #endif                                                                           
2246         DYN_EXPIRED_UNLOCK();

To my untrained (and quickly reviewing) eye, it looks like these lists get cleared only in ipfw_dyn_uninit, here:

3060         DYN_FREE_STATES_FORCED(, s4, ipv4, expired_ipv4, expired);               
3061 #ifdef INET6                                                                     
3062         DYN_FREE_STATES_FORCED(, s6, ipv6, expired_ipv6, expired);               
3063 #endif

This call is only made if ipfw is unloaded (via VNET_SYSUNINIT -> vnet_ipfw_uninit), correct? Surely this is way too long for expired states to stick around in memory?

ae added a comment.Feb 14 2018, 9:43 PM
In D12685#301251, @will wrote:

I admit I haven't looked at this very carefully, but: how do these dynamic states get freed? From dyn_expire_states:
To my untrained (and quickly reviewing) eye, it looks like these lists get cleared only in ipfw_dyn_uninit, here:

They are freed by dyn_free_states() function that is called every second.