
Break out IPv4 reassembly locking into per-hash-bucket locks.
ClosedPublic

Authored by adrian on Mar 18 2015, 8:55 PM.

Details

Summary

The current IPv4 reassembly code uses a single lock to protect
all of the hash table buckets. This causes massive lock contention
on multi-core servers when a lot of reassembly occurs.

Reassembly tends to happen more often in LAN environments with
UDP-heavy traffic, e.g. memcached.
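
For context, a minimal sketch of the general shape of the change, using hypothetical names (ipq_mtx, ipq_lock_init) and ignoring the VNET plumbing; the real diff pads each lock to a cache line and differs in detail:

/*
 * Sketch only, not the committed patch: replace the single global
 * reassembly mutex with one mutex per hash bucket, so fragments that
 * hash to different buckets no longer contend on the same lock.
 */
#define	IPREASS_NHASH	64

static TAILQ_HEAD(ipqhead, ipq) ipq[IPREASS_NHASH];	/* per-bucket queues */
static struct mtx ipq_mtx[IPREASS_NHASH];		/* per-bucket locks */

#define	IPQ_LOCK(i)	mtx_lock(&ipq_mtx[(i)])
#define	IPQ_UNLOCK(i)	mtx_unlock(&ipq_mtx[(i)])

static void
ipq_lock_init(void)
{
	int i;

	for (i = 0; i < IPREASS_NHASH; i++)
		mtx_init(&ipq_mtx[i], "ipqlock", NULL, MTX_DEF);
}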

Test Plan

Tested with local UDP traffic, with / without fragments

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

adrian retitled this revision from to Break out IPv4 reassembly locking into per-hash-bucket locks..
adrian updated this object.
adrian edited the test plan for this revision. (Show Details)
adrian added a reviewer: network.
hiren added a reviewer: hiren.
hiren added a subscriber: hiren.

"something" in the name of the function ip_reass_purge_something() seems...not clean but I don't have a better suggestion. :-)
Looks okay to me otherwise.

This revision is now accepted and ready to land. Mar 18 2015, 11:54 PM

Change "something" to "element" and then the name will be clearer. The locking itself looks OK from first glance.

See inlined comments, please.

sys/netinet/ip_input.c
174 ↗(On Diff #4279)

I'd suggest just including the mutex in the hashrow header instead of using padalign. We are going to hit the cacheline with the hashrow header anyway when we lock the corresponding mutex.
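
To illustrate this suggestion, a hypothetical layout (not committed code) that pairs each hash row head with its mutex, instead of keeping a separate padaligned lock array as in the sketch above:

/*
 * Sketch of glebius' suggestion: keep the bucket head and its mutex
 * in one structure, so taking the lock touches the same cacheline as
 * the list head it protects.
 */
struct ipqbucket {
	struct ipqhead	head;	/* TAILQ of struct ipq, as before */
	struct mtx	lock;	/* protects only this bucket */
};

static struct ipqbucket	ipq_buckets[IPREASS_NHASH];

#define	IPQ_LOCK(i)	mtx_lock(&ipq_buckets[(i)].lock)
#define	IPQ_UNLOCK(i)	mtx_unlock(&ipq_buckets[(i)].lock)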

426 ↗(On Diff #4279)

Why do you need IPQ_LOCK_ALL? Why can't you go through the hash rows one by one: lock one, drain it, unlock it, move on to the next?

By the way, the mutex initialization lacks MTX_DUPOK, but the current code clearly requires it. Definitely not tested with WITNESS :)
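
For reference, a rough sketch of the lock/drain/unlock pattern described above, reusing the hypothetical ipq_buckets[] layout from the earlier sketch and the existing ip_freef() helper; with only one bucket lock held at a time, MTX_DUPOK is no longer needed:

/*
 * Sketch only: drain every reassembly queue one bucket at a time,
 * never holding more than one bucket lock, instead of IPQ_LOCK_ALL.
 */
static void
ipq_drain_sketch(void)
{
	struct ipq *fp;
	int i;

	for (i = 0; i < IPREASS_NHASH; i++) {
		IPQ_LOCK(i);
		while ((fp = TAILQ_FIRST(&ipq_buckets[i].head)) != NULL)
			ip_freef(&ipq_buckets[i].head, fp);
		IPQ_UNLOCK(i);
	}
}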

1241 ↗(On Diff #4279)

So we always drain from the bucket next to the current one? Can this be exploited? For example, an attacker knows that a certain peer of a victim host always sends fragments. The attacker calculates the peer's hash slot, then generates traffic that overflows the previous slot.

My suggestion is to simply use a 'static int i' inside ip_reass_purge_something() and always start from the current value of i. It is okay for access to it to be racy. ip_reass_purge_something() should only try-lock hash slots, skipping the locked ones.
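
A sketch of that idea (hypothetical code, again assuming the ipq_buckets[] layout above; the name follows the suggested rename of ip_reass_purge_something() to ip_reass_purge_element()):

/*
 * Sketch only: free one fragment queue to relieve memory pressure.
 * Start from a deliberately racy static index rather than "current
 * bucket + 1", and only trylock buckets so busy ones are skipped
 * rather than blocked on.
 */
static void
ip_reass_purge_element(void)
{
	static int start;	/* racy by design */
	struct ipq *fp;
	int i, idx;

	for (i = 0; i < IPREASS_NHASH; i++) {
		idx = (start + i) % IPREASS_NHASH;
		if (mtx_trylock(&ipq_buckets[idx].lock) == 0)
			continue;	/* bucket is busy, skip it */
		fp = TAILQ_LAST(&ipq_buckets[idx].head, ipqhead);
		if (fp != NULL)
			ip_freef(&ipq_buckets[idx].head, fp);
		mtx_unlock(&ipq_buckets[idx].lock);
		if (fp != NULL) {
			start = idx + 1;
			return;
		}
	}
}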

Probably outside the scope of this patch, but it would be nice to have IP fragment reassembly as a library, used by both ip_input() and pf(4), without duplicating functionality.

... yup, I think the next thing should be to migrate this into a library and have it be reusable / unit testable.

But that'll come next.

.. and yes, it's running on HEAD + WITNESS, but it didn't panic.

The reason I haven't gone and completely unrolled things is to keep the diff minimal, but yes, I should really just lock/drain/unlock.

adrian edited edge metadata.

Address / clarify things!

This revision now requires review to proceed. Mar 20 2015, 4:47 PM

Hi Gleb:

  • It now locks buckets individually inside drain-all, rather than lock-all then drain-all.
  • It only initialises the locks once: the IP init path runs the bottom half of the function only for the default vnet, not for /all/ vnets.

Thanks.

Do you agree with my comments about embedding the mutexes into the hash slots? And about the ip_reass_purge_element() logic? Do you plan to implement that?

sys/netinet/ip_input.c
1352 ↗(On Diff #4302)

The function name should probably be ip_drain_vnet().

I'll look at updating the locks to be per hash bucket slot in a follow-up commit. I'll have to pull apart the initialiser and create a specific struct for it.

If it is possible, doing everything properly in one commit is preferred.

What are the benefits of having the locks be per-vnet right now, rather than just per-bucket?

I'm not suggesting that. I suggest embedding the locks into the buckets instead of making an array of locks.

I tested it out with a handful of vnets; there wasn't any performance benefit.

I'd like to not bloat things out for people running vnets on the smaller embedded platforms.

You probably misunderstand me; my suggestion reduces code rather than bloating it. And my suggestion is entirely orthogonal to vnets. I suggest embedding the locks into the buckets, not making them per-vnet.

Right, but the buckets are per-vnet, so if I embed them into buckets, it'll end up being per-vnet anyway.

Yes, that would be a side effect. But do we optimize for people who run numerous jails on platforms with small memory? If we optimize for the common case, keeping the mutex and the data it protects together is the right way.

adrian updated this revision to Diff 4732.

Closed by commit rS281239 (authored by @adrian).