Details

Reviewers

None

Group Reviewers

Summary

A scaling environment may require several NAT instances for the
amount of customers. This patch speed up the access of the NAT
instance in question while processing a packet in the ipfw rule.

It replaces the single unsorted list by a very simple hash of
lists. The hashing alogrithm works best with sequentially
numbered NAT instances.

Test Plan

# cat /boot/loader.conf  
net.inet.ip.fw.default_to_accept=1
net.inet.ip.fw.nat_hash=7
# kldload ipfw
# kldload ipfw_nat
# foreach i ( 1 2 3 4 5 6 7 8 9 10 11 12 )
> ipfw nat $i config ip 192.168.3.$i
> ipfw add nat $i ip from any to 192.168.3.$i
> ipfw add nat $i ip from 192.168.1.2 to 192.168.2.$i
> end
# kldunload ipfw_nat
# kldunload ipfw

Loading and unloading should work.
Natting should work regardless of the nat_hash value.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 39897
Build 36786: arc lint + arc unit

Event Timeline

donner created this revision.Feb 8 2020, 9:27 PM

Herald added a reviewer: manpages. · View Herald TranscriptFeb 8 2020, 9:27 PM

Herald added subscribers: Contributor Reviews (src), ae, imp. · View Herald Transcript

Harbormaster completed remote builds in B29247: Diff 67986.Feb 8 2020, 9:27 PM

donner edited the test plan for this revision. (Show Details)Feb 8 2020, 9:29 PM

donner added a reviewer: network.

donner added inline comments.Feb 8 2020, 9:41 PM

sys/netpfil/ipfw/ip_fw2.c
3421–3423	Module specific initialization moved into the module. Zeroized memory at initialization is sufficient to avoid erroneous behavior.
sys/netpfil/ipfw/ip_fw_nat.c
486–490	Avoid code duplication

donner mentioned this in D23450: libalias: Allow setting alias port ranges.Feb 8 2020, 9:43 PM

We already have "nat tablearg" feature for this task:

single rule

ipfw add nat tablearg ip from any to 'table(1)'

for each $k and $i

ipfw table 1 add 192.168.$k.$i $k

Have you tried measuring performance of such configuration? It uses RADIX tree instead of linear seach.

OK from manpages.

In D23586#517583, @eugen_grosbein.net wrote:

We already have "nat tablearg" feature for this task:

single rule

ipfw add nat tablearg ip from any to 'table(1)'

for each $k and $i

ipfw table 1 add 192.168.$k.$i $k

Have you tried measuring performance of such configuration? It uses RADIX tree instead of linear seach.

You missed the point: tablearg is for selecting the "number" of the nat instance given the rule processing.
This patch is about selecting the nat instance itself given the "number".

For some background about the amount of instances see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219316

donner retitled this revision from ipfw_nat: Perfomance of acceing multiple nat tables to ipfw_nat: Perfomance of accessing multiple nat tables.Feb 9 2020, 2:04 PM

In D23586#517583, @eugen_grosbein.net wrote:

We already have "nat tablearg" feature for this task:

The relevant lookup part in ip_fw2.c is

 t = ((ipfw_insn_nat *)cmd)->nat;
 if (t == NULL) {
    nat_id = TARG(cmd->arg1, nat);
    t = (*lookup_nat_ptr)(&chain->nat, nat_id);

    if (t == NULL) {
       retval = IP_FW_DENY;
       break;
    }
    if (cmd->arg1 != IP_FW_TARG)
        ((ipfw_insn_nat *)cmd)->nat = t;
}

The lookup for explicit referencing is cached (in cmd->nat), so the patch has no useful impact at all.
Using tableargs (what I want to do), the caching is disabled, and the lookup for the result of the tablearg is performed for each packet traversing the kernel. So the speedup of this patch is only realized, if the nat table is selected by a tablearg.

donner edited the summary of this revision. (Show Details)Feb 11 2020, 8:40 PM

Rebase to r357812

Harbormaster completed remote builds in B29306: Diff 68170.Feb 12 2020, 11:42 AM

General comment: I'd prefer not to add non-resizable hashes. It should be system job, not user, to resize the hash. Unfortunately, there is no existing generic resizable hash primitive in the kernel code currently.

Speaking of this particular case, I would suggest doing it slightly differently.
We know that nat numbers are limited to 65k. Given that, we can simply allocate 65k array of pointers on the first addition of the nat rule, w/o bothering about hash efficiency, resizing, etc.

In D23586#519046, @melifaro wrote:

General comment: I'd prefer not to add non-resizable hashes. It should be system job, not user, to resize the hash. Unfortunately, there is no existing generic resizable hash primitive in the kernel code currently.

Speaking of this particular case, I would suggest doing it slightly differently.
We know that nat numbers are limited to 65k. Given that, we can simply allocate 65k array of pointers on the first addition of the nat rule, w/o bothering about hash efficiency, resizing, etc.

This approach has serious consequences:

It takes an non-trivial amount of kernel memory. I tend to avoid this.
It causes a significant performance hit in all cases where the full scan of all instances is required (i.e. dealiasing via "global" or "ipfw show"). This is the reason, why I tried to keep the hash table as small as possible.
Most use cases for the non-experienced user (or embedded devices) involves only a single NAT instance at all. I don't want to burden them with performance issues of large CGN providers.

Futhermore the benefit of the patch is only realized for tablearg selection of NAT instances. All other cases do access the NAT instances either by caching or by full scan.

In D23586#519077, @lutz_donnerhacke.de wrote:

In D23586#519046, @melifaro wrote:

General comment: I'd prefer not to add non-resizable hashes. It should be system job, not user, to resize the hash. Unfortunately, there is no existing generic resizable hash primitive in the kernel code currently.

Speaking of this particular case, I would suggest doing it slightly differently.
We know that nat numbers are limited to 65k. Given that, we can simply allocate 65k array of pointers on the first addition of the nat rule, w/o bothering about hash efficiency, resizing, etc.

This approach has serious consequences:

It takes an non-trivial amount of kernel memory. I tend to avoid this.

Well, we already allocate 2 such arrays for the rule index, so 512k won't drastically increase the footprint.
Anyway, I think even the dynamic-sized array capped by max instance number could be implemented relatively easily.

It causes a significant performance hit in all cases where the full scan of all instances is required (i.e. dealiasing via "global" or "ipfw show"). This is the reason, why I tried to keep the hash table as small as possible.

No. You still keep the list.

Most use cases for the non-experienced user (or embedded devices) involves only a single NAT instance at all. I don't want to burden them with performance issues of large CGN providers.

Not sure how this is applicable, as the array option does not have any configurable options.

Futhermore the benefit of the patch is only realized for tablearg selection of NAT instances. All other cases do access the NAT instances either by caching or by full scan.

Well, we already allocate 2 such arrays for the rule index, so 512k won't drastically increase the footprint.

Then we can get rid of the whole caching framework, which will decrease the code size considerably. Just lookup the nat instance directly (if the number is static or from tablearg), so more branches in this path. This way the idea become much more interesting.

It causes a significant performance hit in all cases where the full scan of all instances is required (i.e. dealiasing via "global" or "ipfw show"). This is the reason, why I tried to keep the hash table as small as possible.

No. You still keep the list.

Yep. Only for the purpose of scanning all instances.

donner mentioned this in D23577: divert: Add socket options for divert socket's send and receive buffers.Feb 12 2020, 7:35 PM

Chance to a simple table based approach.
Remove the whole caching framework incl. special opcode extensions.
Keep the table small (dynamically allocated).
Move local managment structures from global into local file.
No need for externally visible tunables anymore, no man page changes.
Not yet fully tested (only module loading/unloading, rule creating, deletion).

Harbormaster completed remote builds in B29343: Diff 68243.Feb 13 2020, 1:05 PM

This implementation certainly looks better, both eliminating old hacks and improving performance.
Some comments inline.

sys/netpfil/ipfw/ip_fw_nat.c
101	Is it `nat_list` or `nat_mgmt` or `nat_ctl` or `nat_priv` or ?
102	also: why `all`? Both list and array have all items. Maybe it would be better to consider something like `nat_list`, `nat_idxmap`, `nat_idxsize` etc? Also: can size be negative?
572	You cannot do M_WAITOK while holding non-sleepable lock. Also, as this is configuration change, UH_WLOCK (configuration lock) should be sufficient for most of the control plane changes. To be a bit more specific: nat list is not used in fast path anymore, hence there is no need to acquire IPFW_WLOCK for it, UH_WLOCK is sufficient. For realloc: don't you want to add M_ZERO ? Some generic comments: you almost never want to panic in case of memory allocation failure. Typically you either retry a couple of times or return an error to the caller. Also, you almost never have the luxury of using `M_WAITOK`.
1184	Do alloc first and then assign to chain under both UH_WLOCK and IPFW_WLOCK. Also, I'm not sure if we're allowed to sleep in vnet_init hooks. Probably yes, but better verify.
sys/netpfil/ipfw/ip_fw_sockopt.c
1997	You have to keep backward compatibility. Old and new ipfw(8) binaries have to work. I guess you can potentially accept both old & new opcode variations in the kernel, while changing ipfw(8) to use the new one, but this has to be verified.

donner added inline comments.Feb 13 2020, 3:18 PM

sys/netpfil/ipfw/ip_fw_nat.c
102	Negative values is not only an issue with "size" but also with "id". This is far more serious, I'm work on this.

donner added inline comments.Feb 13 2020, 3:55 PM

sys/netpfil/ipfw/ip_fw_nat.c
572	M_ZERO is a really good idea. I just had a beautiful panic by configuring an intermediate (go backwards below the largest nat id) unassigend instance. Thank you for the comment. Reading it solved my issue as it happend.

Fix a lot of errors.
Renaming the struct and fields.
id numbers are not longer generic ints, but uint16_t.
Fix locking.
Document creation of new nat instances in a more readable way.
Alloc memory only if outside of any locks.

Harbormaster completed remote builds in B29379: Diff 68324.Feb 14 2020, 4:27 PM

donner marked 5 inline comments as done.Feb 14 2020, 4:39 PM

Updated to revision 358355.

Harbormaster completed remote builds in B29628: Diff 68855.Feb 26 2020, 8:09 PM

@melifaro Are your concerns resolved?

driesm added a subscriber: driesm.May 12 2020, 9:09 AM

Rebase to r366170.

Harbormaster completed remote builds in B33820: Diff 77539.Sep 25 2020, 7:24 PM

Updated to revision 367040.

Harbormaster completed remote builds in B34378: Diff 78741.Oct 25 2020, 4:53 PM

This patch causes a hanging kernel, if the ipfw ruleset is modified under stress.
To modify the ruleset, the system needs to be rebooted.

Removing manpages since no man page change is involved.

Rebase after a long time

Harbormaster completed remote builds in B36789: Diff 83511.Feb 7 2021, 9:26 PM

Rebase

Harbormaster completed remote builds in B39897: Diff 90848.Jun 13 2021, 9:06 PM

ipfw_nat is loadable as separate module.
Why we are use M_IPFW memory type ?

ipfw_nat: Perfomance of accessing multiple nat tables
Needs ReviewPublic
Actions

Details

Diff Detail

Event Timeline

single rule

for each $k and $i

single rule

for each $k and $i

Revision Contents
Changeset List

Diff 90848

sbin/ipfw/ipfw2.c

sys/netpfil/ipfw/ip_fw2.c

sys/netpfil/ipfw/ip_fw_nat.c

sys/netpfil/ipfw/ip_fw_private.h

sys/netpfil/ipfw/ip_fw_sockopt.c

ipfw_nat: Perfomance of accessing multiple nat tablesNeeds ReviewPublicActions

Details

Diff Detail

Event Timeline

single rule

for each $k and $i

single rule

for each $k and $i

Revision ContentsChangeset List

Diff 90848

sbin/ipfw/ipfw2.c

sys/netpfil/ipfw/ip_fw2.c

sys/netpfil/ipfw/ip_fw_nat.c

sys/netpfil/ipfw/ip_fw_private.h

sys/netpfil/ipfw/ip_fw_sockopt.c

ipfw_nat: Perfomance of accessing multiple nat tables
Needs ReviewPublic
Actions

Revision Contents
Changeset List