Page MenuHomeFreeBSD

First take on syn-rlock.
ClosedPublic

Authored by glebius on Apr 4 2021, 8:48 PM.

Details

Summary

When packet is a SYN packet, we don't need to modify any existing PCB.
Normally SYN arrives on a listening socket, we either create a syncache
entry or generate syncookie, but we don't modify anything with the
listening socket or associated PCB. Thus create a new PCB lookup
mode - rlock if listening. This removes the primary contention point
under SYN flood - the listening socket PCB.

This change removes the main bottleneck under heavy SYN flood.

Sidenote: when SYN arrives on a synchronized connection, we still
don't need write access to PCB to send a challenge ACK or just to
drop. There is only one exclusion - tcptw recycling. However,
existing entanglement of tcp_input + stacks doesn't allow to make
this change small. Consider this patch as first approach to the problem.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 38315
Build 35204: arc lint + arc unit

Event Timeline

This revision is now accepted and ready to land.Apr 5 2021, 10:26 AM

I doubt that it is enough to just do s/wlock/rlock/ in the syncache code. Usually we need to add the extra check that an entry was not already linked by another thread in such cases.

In D29576#663155, @ae wrote:

I doubt that it is enough to just do s/wlock/rlock/ in the syncache code. Usually we need to add the extra check that an entry was not already linked by another thread in such cases.

Nothing changes in the syncache hash locking with this patch. The patch changes to rlock of the listening socket PCB.

In D29576#663155, @ae wrote:

I doubt that it is enough to just do s/wlock/rlock/ in the syncache code. Usually we need to add the extra check that an entry was not already linked by another thread in such cases.

Nothing changes in the syncache hash locking with this patch. The patch changes to rlock of the listening socket PCB.

Imagine, you have two or more identical SYNs, now you use INP_RLOCK(), this means they all can be handled in parallel "in the same time". This means, syncache_add() in some cases can determine that there are no corresponding entries yet - syncache_lookup() returns NULL, we still hold SCH_LOCK(), then we allocate syncache entry, populate all its fields and do SCH_UNLOCK(). Imagine, now the same things will be done by the another thread, and another. In the end we will have several the same entries added by the syncache_insert().
Is this impossible?

In D29576#663187, @ae wrote:

Imagine, you have two or more identical SYNs, now you use INP_RLOCK(), this means they all can be handled in parallel "in the same time". This means, syncache_add() in some cases can determine that there are no corresponding entries yet - syncache_lookup() returns NULL, we still hold SCH_LOCK(), then we allocate syncache entry, populate all its fields and do SCH_UNLOCK(). Imagine, now the same things will be done by the another thread, and another. In the end we will have several the same entries added by the syncache_insert().
Is this impossible?

In this case the second thread will wait in SCH_LOCK and then, once it acquires the lock, syncache_lookup will return entry created by the first thread.

In D29576#663187, @ae wrote:

Imagine, you have two or more identical SYNs, now you use INP_RLOCK(), this means they all can be handled in parallel "in the same time". This means, syncache_add() in some cases can determine that there are no corresponding entries yet - syncache_lookup() returns NULL, we still hold SCH_LOCK(), then we allocate syncache entry, populate all its fields and do SCH_UNLOCK(). Imagine, now the same things will be done by the another thread, and another. In the end we will have several the same entries added by the syncache_insert().
Is this impossible?

In this case the second thread will wait in SCH_LOCK and then, once it acquires the lock, syncache_lookup will return entry created by the first thread.

I think there is a small window after SCH_UNLOCK() and before syncache_insert() when another threads can do lookup, create entry and come to the same point when it ready to syncache_insert(). And there are no guaranties that they will not do that :)
I seen some strange cases, when packets were delayed for 3 seconds where this looks impossible. And this happened several times on different machines, so this highly depends on the scheduling :)
Another question how it is dangerous for us. I just say, that if we want to change INP_WLOCK to INP_RLOCK, we need to make syncache_lookup() again when we are ready to make syncache_insert() with SCH_LOCK() held. Or rework somehow locking in syncache to avoid this. Or we need to make sure, that syncache duplicates will not lead to worse things.

In D29576#663233, @ae wrote:

I think there is a small window after SCH_UNLOCK() and before syncache_insert() when another threads can do lookup, create entry and come to the same point when it ready to syncache_insert(). And there are no guaranties that they will not do that :)

That's a very good find, thanks a lot! I will address that and post updated patch.

In D29576#663233, @ae wrote:

I think there is a small window after SCH_UNLOCK() and before syncache_insert() when another threads can do lookup, create entry and come to the same point when it ready to syncache_insert(). And there are no guaranties that they will not do that :)
I seen some strange cases, when packets were delayed for 3 seconds where this looks impossible. And this happened several times on different machines, so this highly depends on the scheduling :)
Another question how it is dangerous for us. I just say, that if we want to change INP_WLOCK to INP_RLOCK, we need to make syncache_lookup() again when we are ready to make syncache_insert() with SCH_LOCK() held. Or rework somehow locking in syncache to avoid this. Or we need to make sure, that syncache duplicates will not lead to worse things.

Andrey, after further reviewing I found out that the race you describe is already possible. The listening inpcb is already unlocked at this stage.

I think this race can be easily closed, there is nothing that prevents us from calling syncache_respond() with the syncache hash lock held. This is already performed in syncache_timeout(). However, I'd prefer to do that separately from this change. Thus, reopening this revision as it was already tested as is and doesn't add a new race.

This revision is now accepted and ready to land.Apr 8 2021, 7:45 PM