Page MenuHomeFreeBSD

SMR for TCP hostcache.
ClosedPublic

Authored by glebius on Apr 12 2021, 4:55 PM.

Details

Summary

When we receive a SYN flood from a limited set of hosts, the TCP hostcache
is the bottleneck. This is hash row lock contention just to read hostcache
data.

The core of the change is that only update requests do mtx_lock. The read
requests use SMR section. This required some changes to writers as well,
e.g. directly reusing the least used entry in a row is no longer possible,
also updates need to be atomic.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sys/netinet/tcp_hostcache.c
283

Really it is somewhat strange to allocate a new SMR zone for each VNET. I understand that it's the pattern we use today, but we should permit VNET zones to share slabs and SMR trackers. This is just an observation.

320–321

TAILQs are not SMR-safe. You need to use CK_LIST or CK_STAILQ for now. There is a diff to add SMR_LIST_* macros with extra parameters to allow assertions to be embedded, and to distinguish between readers and writers, but they did not land yet.

609

This illustrates why we need special queue macros here. Since reads are now unlocked, something needs to ensure that all stores to the nascent hc_entry are visible to other CPUs before it is added to the queue.

sys/netinet/tcp_hostcache.c
283

I totally agree. Can do this single zone change prior to SMR change.

609

The nature of the hostcache is that a lookup failure isn't fatal. So, I think, the change is safe with TAILQ. As long as SMR guarantees that we won't read entry contents from the previous allocation, we are good.

sys/netinet/tcp_hostcache.c
609

It is not just lookup failures that need to be considered. There are many fields that get initialized before the structure is inserted into the hash bucket. Without synchronization it is possible to match on a half-constructed hostcache entry.

This is not memory safe anyway. There is nothing guaranteeing that the update of the "next" pointer of a new entry will be visible before the update to the queue head, so a concurrent lookup might follow a garbage pointer. It may work fine in practice on amd64 but the standard queue.h macros do not give the right semantics on platforms with weak memory ordering guarantees.

glebius added inline comments.
sys/netinet/tcp_hostcache.c
609

Got it, understood. Thanks, Mark. Will update the revision.

Converted hash buckets to CK_SLIST.

sys/netinet/tcp_hostcache.c
283

It appears that all TCP UMA zones are VNET local. I don't like that, but I would prefer to leave it as is and let VIMAGE fans and experts (I'm an neither) to decide whether they should stay VNET local or become global.

Seems ok to me, most of my comments are minor.

sys/netinet/tcp_hostcache.c
317

I would split this into two functions. There would some duplication, but the checks in the main loop can be lifted into a function to reduce that, and it's nicer to have separate code paths for readers and writers when they use different synchronization primitives.

366

Is there any point in having a branch here? atomic_store_int() is just a plain store.

466

"second to last" or "next to last" is a more standard way of referring to hc_prev, IMO.

546

This comment is a bit misleading to me: plain (aligned) stores are already assumed to be atomic. atomic_store here just ensures that the stores will not be reordered with each other atomic(9) operations, which doesn't seem to be very important. However, it does make it clear that data races are possible and expected, so we should keep them. But then tcp_hc_get() and similar functions should also use atomic_load.

768

How is it guaranteed that hch_length > 0?

771–790

This can become head->hch_length--.

glebius added inline comments.
sys/netinet/tcp_hostcache.c
317

My original version was like this, but then I collapsed it back in one as to me this looks nicer. Very subjective of course. If more reviewers prefer to split it, I'll do it.

366

My goal was to avoid unnecessary cache line trashing on every lookup. Imagine two CPUs looking up the same entry million times a second (a SYN flood scenario), if each of them updates the entry on each lookup, that would create cache misses. With the check, that would happen only once per prune interval.

546

For the TCP hostcache it is non-fatal to read a stale value, when a new value is being written. However, we don't want any trash value to be read, we need either old or new. On amd64 aligned stores can't produce any intermediate values, and hc_entry is most likely aligned, although there is no enforcement for its each field. So, all these atomics aren't necessary in practice. I don't know if any other or any future arch can write 32-bit word in two steps. I sought advice with Kostik and he suggested to keep these atomics as self documenting code.

768

If we entered the list traversal, then there is at least one entry on the list. This logic was here before and my patch doesn't change it.

glebius marked an inline comment as done.

Address markj@'s comments.

Seems ok to me, with the caveat that I'm not very familiar with the TCP stack.

sys/netinet/tcp_hostcache.c
317

Ok. I just generally dislike using flag variables to determine what kind of synchronization is being used. It is not just a matter of style, this kind of thing makes it harder to read, to write assertions, to get useful results from static analysis, etc.. But this is not my code. :)

546

The compiler will ensure that each field is naturally aligned, unless you specifically force it not to.

I agree with what you wrote. I think you should keep the atomics. The change I'm suggesting is to add corresponding atomic_loads. Again, it shows that data races are possible, and thus provides hints for both the reader and for tools that try to automatically detect data races, like KCSAN. Right now we are not very good about this, but at least we can try to avoid it in new code.

Use atomic loads when reading hostcache values, as suggested by Mark.

sys/netinet/tcp_hostcache.c
366
770–771

As Mark suggested split tcp_hc_lookup() into lookup-for-read and
lookup-for-write versions, and then inline the latter into tcp_hc_update().

As Mark says make locking semantics clearer. Also removes passing
pointers to pointers.

markj added inline comments.
sys/netinet/tcp_hostcache.c
480

These checks should probably be in their own function.

sys/netinet/tcp_hostcache.c
480

The cycles are now different. For update we keep the previous pointer and for read lookup we don't.

I can reduce copy-n-paste in the hash calculation, though.

Reduce copy-n-paste: macro for hash and tcp_hc_cmp().

sys/netinet/tcp_hostcache.c
510

This check was added to make sure no undetected race (depite the row lock) exists and leads to accounting errors.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 20 2021, 5:06 PM
This revision was automatically updated to reflect the committed changes.