Page MenuHomeFreeBSD

tcp: Fix a locking issue
ClosedPublic

Authored by tuexen on Nov 13 2021, 1:10 PM.

Details

Summary

tcp_respond() might get called with an rlock only. For logging, try to upgrade the lock, and if that does not work, just don't log.

This fixes:

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

sys/netinet/tcp_subr.c
2109

I'm not familiar with TCP logging. Why is it ok to silently drop an event? Is it generally best-effort?

Remove trailing whitespace.

sys/netinet/tcp_subr.c
2109

The alternatives would be:

  • Also drop the packet (a RST not affecting the connection in this case)
  • always get a WLOCK instead of an RLOCK in case we get a SYN.

But I'm fine with the alternatives... Just thought that the solution chosen has the least negative impact.

markj added inline comments.
sys/netinet/tcp_subr.c
2109

It's up to you. Generally logging frameworks try hard to either log an event or log a notification saying that an event was dropped. Silently dropping an event can make for some very frustrating debugging if one is relying on tcp logging to be reliable.

I don't see any problems with the change as it is, but I suspect someone more familiar with TCP debugging should weigh in.

tuexen marked an inline comment as done.
tuexen added inline comments.
sys/netinet/tcp_subr.c
2109

Let's see what rrs@ and jtl@ want to say on this...

rrs added inline comments.
sys/netinet/tcp_subr.c
2109

Logging is always best effort. I would have done this a bit different, instead of trying to
upgrade I would only do the logging *if* the INP was already WLOCK'd. I don't think
its expected that anything RLOCKed would be logged. But I am ok with trying
to upgrade too (I just would not have bothered).

This revision is now accepted and ready to land.Nov 14 2021, 11:32 AM
This revision was automatically updated to reflect the committed changes.
lstewart added inline comments.
sys/netinet/tcp_subr.c
2109

While I agree that logging any given event is indeed best effort, I strongly agree with @markj's comment that we should always capture when an attempt to log something is unsuccessful. The tcp log buf has the notion of a monotonically increasing record serial number which is the primary mechanism to tell when log events have gone missing for any reason, and so I'd suggest we use that here i.e. figure out a way to increment the serial number without actually writing the log.

tuexen added inline comments.
sys/netinet/tcp_subr.c
2109

The reason we are not incrementing tp->t_logsn is that this requires holding the write lock, we we only have the read lock...

sys/netinet/tcp_subr.c
2109

If we hold the inp read lock, wouldn't an atomic increment of tp->t_logsn be fine in the "failed to upgrade" case?

The whole point of this was that there were absolutely no logs what so ever when
tcp_respond() was used. So prior to this set of commits you *never* saw anything
when we responded to a packet (and that happens in many places in all stacks).
Now I would have to study when we use tcp_respond() with a INP_RLOCK() this
is a very narrow case I would imagine, but if you are not willing to get a INP_WLOCK()
that implies to me you are not changing the TCB... but thats what logging does.

Let me chat with Michael on this my gut says its not worth doing anything else
but possibly an atomic when you hold a RLOCK will not cause too much issues,
other than a slight performance hit.

Michael and I have chatted about this. The only way you
can get in this state where you have an INP_RLOCK and
you are responding, is when you are getting a stand-alone
SYN. In that case due to DOS worries Gleb changed the code so that
we would get an RLOCK. Now this is a rare case unless you are
under a DOS attack. And the connection needs to be in the closed
state or its to a connection from a different port number. There
are also a couple of other cases where the SYN-CACHE unpack
fails (but unlikely to have a t_logstate set of course).

Now these are all very very rare events. And both Michael and I agree
that if we start doing an atomic on something that is normally
twiddled assuming a INP_WLOCK() I am not quite sure you would
have sane results unless you start doing atomic operations always
on that value.. else I can see the potential for issues.

So at this point I think whats here is fine and I don't think I would be
prone to change it. Yes there is a small chance of us not getting
a log, but it actually is pretty unlikely since Michael does have
an TRY_UPGRADE path here too.. and if you can get the RLOCK
you should have no problem getting the WLOCK (unless maybe
someone is doing a netstat on all sockets)..

The whole point of this was that there were absolutely no logs what so ever when tcp_respond() was used. So prior to this set of commits you *never* saw anything when we responded to a packet (and that happens in many places in all stacks). Now I would have to study when we use tcp_respond() with a INP_RLOCK() this is a very narrow case I would imagine, but if you are not willing to get a INP_WLOCK() that implies to me you are not changing the TCB... but thats what logging does.

For the purposes of this discussion, it's important to make a distinction between absence of log events caused by a failure to log vs there being no log event hook at all i.e. to @markj's point, if there's a log point in some code, I should reasonably expect that if that code path executes I can rely on a corresponding log event being generated if logging is enabled. Not having a way to infer the loss of a log event that otherwise would have been written ultimately undermines the utility/trustworthiness of the log. Given that there is a straightforward way to avoid this problem by incrementing the serial number, I see no upside and definite downside to not addressing this.

The only way you can get in this state where you have an INP_RLOCK and you are responding, is ... [snip] Now these are all very very rare events.

I would argue that the rarity of the events has no bearing on agreeing to the general principle that "where there is a log point in code and logging is enabled, we will guarantee that either a log is successfully written to the log buffer, or that the serial number is bumped to convey the loss of information". If we can agree on that, then figuring out the mechanism to implement/achieve that "contract" is secondary.

And both Michael and I agree that if we start doing an atomic on something that is normally twiddled assuming a INP_WLOCK() I am not quite sure you would have sane results unless you start doing atomic operations always on that value.. else I can see the potential for issues.

Firstly I'll note that there are other ways to achieve the proposed outcome than doing an atomic increment (e.g. deferring the increment until next time the wlock is held and the next log event is generated) but I think the atomic increment is the most straight forward option and is why I proposed it for this case. That being said, I'm only really concerned about making sure the "contract" is agreed and maintained, less so about how it's done.

As to the technical concerns about doing an atomic increment in this case, could you please help me understand the problem you think it could cause? My understanding (and I'm certainly no expert here so keen to be educated if I'm missing something) is that:

  • If we hold the inp rlock the tcpcb can't go away so tp->t_logsn will always be a valid memory address to increment
  • If we hold the inp rlock and do the increment atomically we have no possibility of variable corruption if we're racing with other rlock holders who may also try to increment
  • If we hold the inp rlock we've excluded all inp wlock code paths from doing a non-atomic increment of the serial number
  • If a wlock code path does a non-atomic increment of the serial number and then unlocks, the barrier semantics of the unlock force consistency of the serial number such that a subsequent atomic increment in an rlock path would be ensured to operate on the correct wlock-incremented serial number

Am I missing a scenario/sequence of operations where there could be a problem with the inp-rlocked atomic increment approach?

Firstly I'll note that there are other ways to achieve the proposed outcome than doing an atomic increment (e.g. deferring the increment until next time the wlock is held and the next log event is generated) but I think the atomic increment is the most straight forward option and is why I proposed it for this case. That being said, I'm only really concerned about making sure the "contract" is agreed and maintained, less so about how it's done.

As to the technical concerns about doing an atomic increment in this case, could you please help me understand the problem you think it could cause? My understanding (and I'm certainly no expert here so keen to be educated if I'm missing something) is that:

  • If we hold the inp rlock the tcpcb can't go away so tp->t_logsn will always be a valid memory address to increment
  • If we hold the inp rlock and do the increment atomically we have no possibility of variable corruption if we're racing with other rlock holders who may also try to increment
  • If we hold the inp rlock we've excluded all inp wlock code paths from doing a non-atomic increment of the serial number
  • If a wlock code path does a non-atomic increment of the serial number and then unlocks, the barrier semantics of the unlock force consistency of the serial number such that a subsequent atomic increment in an rlock path would be ensured to operate on the correct wlock-incremented serial number

Am I missing a scenario/sequence of operations where there could be a problem with the inp-rlocked atomic increment approach?

So what happens if you have someone with the wlock, they are going to increment the t_logsn, and they are not
using an atomic and the other guy is using an atomic.. this I think is the case Michael and I were unsure of.
Hmm but if you there is a rlock, I suppose you cannot have a wlock. So its probably a nop.

So there is another concern here too...

The whole reason we have no wlock by the way is due to the fact that this path with the
rlock happens when you have a SYN attack underway. So if your going to be now adding
an atomic is that going to be "to much" i.e. the atomic to get the wlock was too
much (think being jammed by 1000's of SYN's), is doing the atomic on all
cpu's at once going to serialize thing and remove the protection that Gleb intended
when he move it to a RLOCK. It upgrades to a WLOCK if its gone through and
done as tcp_dosegment() call.. so its only in the cases that jump around the
lookup to the send reset case.

Hmm and thinking about those cases (there are 3 IIRR from my look yesterday) you are
*not* going to have logging on in those cases anyway. Oh wait re-looking there are really
only 2 cases that you will have only a RLOCK and you would jump around the
upgrade:

  1. The connection is TCPS_CLOSED

or

  1. The connection is established and it came in on a different UDP port (which would indicate it does not really belong to this connection).

In case 1, at that point we *should not* be logging anything.
In case 2, I don't know if you want a log of a packet that does not
belong to your connection.

So I don't think I would want any bump in sn here.

Overall I think whats here is fine. And I don't think it breaks the contract
on valid packets that I would expect to log on.