
SMR for TCP hostcache.
ClosedPublic

Authored by glebius on Apr 12 2021, 4:55 PM.

Details

Summary

When we receive a SYN flood from a limited set of hosts, the TCP hostcache
becomes the bottleneck: there is hash row lock contention just to read
hostcache data.

The core of the change is that only update requests take the row mutex
(mtx_lock()); read requests run inside an SMR section. This required some
changes to the writers as well, e.g. directly reusing the least-used entry in
a row is no longer possible, and updates need to be atomic.
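
A minimal sketch of the split described above, using simplified stand-in
structures; the names (hc_lookup_read(), hc_update(), hc_smr, rmx_mtu, etc.)
are illustrative and not the exact identifiers in the patch:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/smr.h>
#include <sys/ck.h>
#include <machine/atomic.h>
#include <netinet/in.h>
#include <vm/uma.h>

/* Simplified stand-ins for the real hostcache structures. */
struct hc_metrics {
	CK_SLIST_ENTRY(hc_metrics) rmx_q;
	struct in_addr	rmx_addr;
	uint32_t	rmx_mtu;
	/* ... other cached metrics ... */
};

struct hc_head {
	CK_SLIST_HEAD(hc_qhead, hc_metrics) hch_bucket;
	struct mtx	hch_mtx;
};

static uma_zone_t hc_zone;	/* uma_zcreate(..., UMA_ZONE_SMR) */
static smr_t hc_smr;		/* uma_zone_get_smr(hc_zone) */

/* Read path: no mutex, just an SMR read section around the traversal. */
static bool
hc_lookup_read(struct hc_head *hcb, struct in_addr addr, uint32_t *mtu)
{
	struct hc_metrics *hc;

	smr_enter(hc_smr);
	CK_SLIST_FOREACH(hc, &hcb->hch_bucket, rmx_q)
		if (hc->rmx_addr.s_addr == addr.s_addr)
			break;
	if (hc != NULL)
		*mtu = atomic_load_32(&hc->rmx_mtu); /* copy out inside the section */
	smr_exit(hc_smr);
	return (hc != NULL);
}

/* Write path: updates still serialize on the per-row mutex. */
static void
hc_update(struct hc_head *hcb, struct hc_metrics *hc, uint32_t mtu)
{
	mtx_lock(&hcb->hch_mtx);
	atomic_store_32(&hc->rmx_mtu, mtu);
	mtx_unlock(&hcb->hch_mtx);
}

Entries would be allocated with uma_zalloc_smr() and freed with
uma_zfree_smr(), so memory is not recycled while a reader is still inside the
section.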

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 38536
Build 35425: arc lint + arc unit

Event Timeline

sys/netinet/tcp_hostcache.c
288

Really it is somewhat strange to allocate a new SMR zone for each VNET. I understand that it's the pattern we use today, but we should permit VNET zones to share slabs and SMR trackers. This is just an observation.

353

TAILQs are not SMR-safe. You need to use CK_LIST or CK_STAILQ for now. There is a diff to add SMR_LIST_* macros with extra parameters that allow assertions to be embedded and distinguish between readers and writers, but it has not landed yet.

591

This illustrates why we need special queue macros here. Since reads are now unlocked, something needs to ensure that all stores to the nascent hc_entry are visible to other CPUs before it is added to the queue.

sys/netinet/tcp_hostcache.c
288

I totally agree. I can do this single-zone change prior to the SMR change.

591

The nature of the hostcache is that a lookup failure isn't fatal, so I think the change is safe with TAILQ. As long as SMR guarantees that we won't read entry contents from the previous allocation, we are good.

sys/netinet/tcp_hostcache.c
591

It is not just lookup failures that need to be considered. There are many fields that get initialized before the structure is inserted into the hash bucket. Without synchronization it is possible to match on a half-constructed hostcache entry.

This is not memory safe anyway. There is nothing guaranteeing that the update of the "next" pointer of a new entry will be visible before the update to the queue head, so a concurrent lookup might follow a garbage pointer. It may work fine in practice on amd64, but the standard queue.h macros do not give the right semantics on platforms with weak memory ordering guarantees.
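
To illustrate the ordering issue, a hedged sketch of an insertion that is safe
for lockless readers; the structure and function names are simplified
stand-ins, not the patch's code:

#include <sys/param.h>
#include <sys/ck.h>
#include <netinet/in.h>

struct hc_metrics {
	CK_SLIST_ENTRY(hc_metrics) rmx_q;
	struct in_addr	rmx_addr;
	uint32_t	rmx_mtu;
};
CK_SLIST_HEAD(hc_qhead, hc_metrics);

static void
hc_insert(struct hc_qhead *bucket, struct hc_metrics *hc_entry,
    struct in_addr addr, uint32_t mtu)
{
	/* Fully initialize the entry before it becomes reachable. */
	hc_entry->rmx_addr = addr;
	hc_entry->rmx_mtu = mtu;

	/*
	 * CK_SLIST_INSERT_HEAD() issues a store fence and publishes the
	 * head pointer with an atomic store, so the stores above (and the
	 * entry's own next pointer) are visible to lockless readers before
	 * the entry is linked.  Plain queue(3) TAILQ/SLIST insertion gives
	 * no such ordering on weakly ordered architectures.
	 */
	CK_SLIST_INSERT_HEAD(bucket, hc_entry, rmx_q);
}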

glebius added inline comments.
sys/netinet/tcp_hostcache.c
591

Got it, understood. Thanks, Mark. Will update the revision.

Converted hash buckets to CK_SLIST.

sys/netinet/tcp_hostcache.c
288

It appears that all TCP UMA zones are VNET-local. I don't like that, but I would prefer to leave it as is and let VIMAGE fans and experts (I'm neither) decide whether they should stay VNET-local or become global.

Seems ok to me, most of my comments are minor.

sys/netinet/tcp_hostcache.c
326

I would split this into two functions. There would be some duplication, but the checks in the main loop can be lifted into a function to reduce that, and it's nicer to have separate code paths for readers and writers when they use different synchronization primitives.

376

Is there any point in having a branch here? atomic_store_int() is just a plain store.

481

"second to last" or "next to last" is a more standard way of referring to hc_prev, IMO.

528

This comment is a bit misleading to me: plain (aligned) stores are already assumed to be atomic. atomic_store here just ensures that the stores will not be reordered with other atomic(9) operations, which doesn't seem to be very important. However, it does make it clear that data races are possible and expected, so we should keep them. But then tcp_hc_get() and similar functions should also use atomic_load.

751

How is it guaranteed that hch_length > 0?

755

This can become head->hch_length--.

glebius added inline comments.
sys/netinet/tcp_hostcache.c
326

My original version was like this, but then I collapsed it back into one, as to me this looks nicer. Very subjective, of course. If more reviewers prefer to split it, I'll do it.

376

My goal was to avoid unnecessary cache line thrashing on every lookup. Imagine two CPUs looking up the same entry a million times a second (a SYN flood scenario): if each of them updates the entry on each lookup, that creates cache misses. With the check, the write happens only once per prune interval.
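
A hedged sketch of the pattern being described, with illustrative names
(hc_touch(), rmx_expire) rather than the patch's own:

#include <sys/param.h>
#include <machine/atomic.h>

struct hc_metrics { int rmx_expire; /* ... */ };

static void
hc_touch(struct hc_metrics *hc_entry, int expire)
{
	/*
	 * Only dirty the cache line when the value actually changes.
	 * Under a SYN flood many CPUs read the same entry; an
	 * unconditional store on every lookup would bounce the line
	 * between them, while with the check the write happens at most
	 * once per prune interval.
	 */
	if (atomic_load_int(&hc_entry->rmx_expire) != expire)
		atomic_store_int(&hc_entry->rmx_expire, expire);
}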

528

For the TCP hostcache it is non-fatal to read a stale value while a new value is being written. However, we don't want any garbage value to be read; we need either the old or the new one. On amd64 aligned stores can't produce any intermediate values, and hc_entry is most likely aligned, although there is no enforcement for each of its fields. So all these atomics aren't necessary in practice. I don't know if any other or any future arch can write a 32-bit word in two steps. I sought advice from Kostik, and he suggested keeping these atomics as self-documenting code.

751

If we entered the list traversal, then there is at least one entry on the list. This logic was here before and my patch doesn't change it.

glebius marked an inline comment as done.

Address markj@'s comments.

Seems ok to me, with the caveat that I'm not very familiar with the TCP stack.

sys/netinet/tcp_hostcache.c
326

Ok. I just generally dislike using flag variables to determine what kind of synchronization is being used. It is not just a matter of style: this kind of thing makes the code harder to read, harder to write assertions for, harder to get useful results from static analysis, etc. But this is not my code. :)

528

The compiler will ensure that each field is naturally aligned, unless you specifically force it not to.

I agree with what you wrote. I think you should keep the atomics. The change I'm suggesting is to add corresponding atomic_loads. Again, it shows that data races are possible, and thus provides hints for both the reader and for tools that try to automatically detect data races, like KCSAN. Right now we are not very good about this, but at least we can try to avoid it in new code.
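
A minimal sketch of the pairing being suggested, assuming a simplified entry
structure; the function and field names are illustrative:

#include <sys/param.h>
#include <machine/atomic.h>

struct hc_metrics { uint32_t rmx_mtu; /* ... */ };

/* Writer (row mutex held): annotate the intentionally racy store. */
static void
hc_set_mtu(struct hc_metrics *hc_entry, uint32_t mtu)
{
	atomic_store_32(&hc_entry->rmx_mtu, mtu);
}

/* Lockless reader (inside the SMR section): matching annotated load. */
static uint32_t
hc_get_mtu(struct hc_metrics *hc_entry)
{
	return (atomic_load_32(&hc_entry->rmx_mtu));
}

The annotations do not change the generated code on most architectures, but
they document the intentional race for readers and for tools such as KCSAN.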

Use atomic loads when reading hostcache values, as suggested by Mark.


As Mark suggested, split tcp_hc_lookup() into lookup-for-read and
lookup-for-write versions, and then inline the latter into tcp_hc_update().

As Mark suggested, make the locking semantics clearer. This also removes
passing pointers to pointers.

markj added inline comments.
sys/netinet/tcp_hostcache.c
495

These checks should probably be in their own function.

sys/netinet/tcp_hostcache.c
495

The loops are now different: for the update we keep the previous pointer, and for the read lookup we don't.

I can reduce copy-n-paste in the hash calculation, though.
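
To illustrate the difference, a hedged sketch of the update-side walk with
simplified stand-in types (the read-side walk is just the CK_SLIST_FOREACH()
without the prev tracking); names are illustrative:

#include <sys/param.h>
#include <sys/ck.h>
#include <netinet/in.h>

struct hc_metrics {
	CK_SLIST_ENTRY(hc_metrics) rmx_q;
	struct in_addr	rmx_addr;
	/* ... */
};
CK_SLIST_HEAD(hc_qhead, hc_metrics);

/*
 * Update-side walk (row mutex held).  Unlike the read-side walk, it also
 * remembers the next-to-last entry, so that the tail of a full row can be
 * unlinked from the singly linked CK_SLIST and reused.
 */
static struct hc_metrics *
hc_lookup_for_update(struct hc_qhead *bucket, struct in_addr addr,
    struct hc_metrics **prevp)
{
	struct hc_metrics *hc, *prev;

	prev = NULL;
	CK_SLIST_FOREACH(hc, bucket, rmx_q) {
		if (hc->rmx_addr.s_addr == addr.s_addr)
			break;
		if (CK_SLIST_NEXT(hc, rmx_q) != NULL)
			prev = hc;
	}
	*prevp = prev;
	return (hc);
}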

Reduce copy-n-paste: macro for hash and tcp_hc_cmp().
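
A rough sketch of the kind of consolidation meant here; HOSTCACHE_HASH4(),
the hashmask variable, and the struct layout are assumptions for
illustration, not the committed code:

#include <sys/param.h>
#include <sys/systm.h>
#include <netinet/in.h>
#include <netinet/in_pcb.h>

struct hc_metrics {
	struct in_addr	ip4;	/* foreign IPv4 address of the entry */
	struct in6_addr	ip6;	/* foreign IPv6 address of the entry */
	/* ... */
};

static u_long hashmask;		/* V_tcp_hostcache.hashmask in the real code */

/* One place computes the row index (IPv4 shown; IPv6 would hash ip6). */
#define	HOSTCACHE_HASH4(ip)						\
	((((ip)->s_addr ^ ((ip)->s_addr >> 7) ^ ((ip)->s_addr >> 17)) &	\
	    hashmask))

/* Single comparison helper shared by the read and update walks. */
static bool
tcp_hc_cmp(struct hc_metrics *hc_entry, struct in_conninfo *inc)
{
	if (inc->inc_flags & INC_ISIPV6)
		return (memcmp(&inc->inc6_faddr, &hc_entry->ip6,
		    sizeof(hc_entry->ip6)) == 0);
	return (memcmp(&inc->inc_faddr, &hc_entry->ip4,
	    sizeof(hc_entry->ip4)) == 0);
}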

sys/netinet/tcp_hostcache.c
525

This check was added to make sure that no undetected race (despite the row lock) exists and leads to accounting errors.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 20 2021, 5:06 PM
This revision was automatically updated to reflect the committed changes.