Page MenuHomeFreeBSD

ng_l2tp: Fix callout synchronization in the RACK timeout handler
ClosedPublic

Authored by markj on Sep 24 2020, 6:40 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Oct 7, 2:19 PM
Unknown Object (File)
Fri, Oct 4, 12:49 AM
Unknown Object (File)
Thu, Oct 3, 3:35 PM
Unknown Object (File)
Tue, Oct 1, 3:29 PM
Unknown Object (File)
Tue, Oct 1, 2:16 PM
Unknown Object (File)
Thu, Sep 26, 6:44 AM
Unknown Object (File)
Thu, Sep 26, 6:44 AM
Unknown Object (File)
Thu, Sep 26, 6:44 AM

Details

Summary

The synchronization between ng_l2tp_seq_recv_nr() and
ng_l2tp_seq_rack_timeout() is buggy. ng_l2tp_seq_recv_nr()
releases ack'ed packets from the transmit queue, stops the retransmit
callout, checks whether the queue is empty, and arms the callout again
if not. All of this is done with the node mutex held.

ng_l2tp_seq_rack_timeout() checks to see if it was cancelled by
ng_l2tp_seq_recv_nr(), and *then* acquires the node mutex and
retransmits outstanding packets. So there is a race:

CPU 1:

  • clears send queue

CPU 2:

  • schedules callout
  • callout blocks on node mutex

CPU 1:

  • stops callout
  • finds that the send queue is empty

CPU 2:

  • tries to transmit the first packet from the send queue
  • panic()

Fix it by modifying the callout handler to check whether it has been
cancelled after aquiring the node lock. We could perhaps further
"harden" the code by making the handler check for an empty queue, but I
believe it should not be necessary.

Test Plan

None, this is based on some bug reports forwarded to me.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.Sep 24 2020, 6:40 PM
markj added a reviewer: glebius.
bz requested changes to this revision.Sep 24 2020, 9:59 PM
bz added a subscriber: bz.
bz added inline comments.
sys/netgraph/ng_l2tp.c
1463 ↗(On Diff #77495)

Don't you leak the mutex in that early return case?

This revision now requires changes to proceed.Sep 24 2020, 9:59 PM
sys/netgraph/ng_l2tp.c
1463 ↗(On Diff #77495)

D'oh, yes. Will fix.

Can you please point me to the bug report?
We are running this node in a large scale production environment (with remote LAC at different carriers) and did not observe such an issue before.

In D26548#591305, @lutz_donnerhacke.de wrote:

Can you please point me to the bug report?

The PR that Aleksandr noted, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241133

A pfsense developer also reported seeing a null pointer dereference in ng_l2tp_seq_rack_timeout(), apparently because seq->xwin[0] == 0. Apparently the panic was observed even after this commit: https://svnweb.freebsd.org/base?view=revision&revision=353027 .

We are running this node in a large scale production environment (with remote LAC at different carriers) and did not observe such an issue before.

How common are retransmits in this environment?

Is the bug reproducible in a test environment?
Did someone try to compile with INVARIANTS (in order to bring L2TP_SEQ_CHECK into life)?

In D26548#591401, @lutz_donnerhacke.de wrote:

Is the bug reproducible in a test environment?

No, it is apparently triggered by mpd5 running in a production environment but only rarely, once every few months.

Did someone try to compile with INVARIANTS (in order to bring L2TP_SEQ_CHECK into life)?

I don't believe so. Indeed, L2TP_SEQ_CHECK should be able to detect the problem since it effectively asserts that seq->xwin[0] != NULL if and only if the retransmit timer is active.

In D26548#591305, @lutz_donnerhacke.de wrote:

Can you please point me to the bug report?

The PR that Aleksandr noted, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241133

A pfsense developer also reported seeing a null pointer dereference in ng_l2tp_seq_rack_timeout(), apparently because seq->xwin[0] == 0. Apparently the panic was observed even after this commit: https://svnweb.freebsd.org/base?view=revision&revision=353027 .

Thank you for the info.

We are running this node in a large scale production environment (with remote LAC at different carriers) and did not observe such an issue before.

How common are retransmits in this environment?

We had no problems with the L2TP tunnel since PR146082 was solved. There we run into long time stability problems with L2TP sequence numbers.
Yes we have retransmits, but I did not set up an special monitoring for this, because we use sequence numbers in L2TP only for the control channel, but not for the data channel.
Typically we see a permanent difference in sequence numbers of more than one (always sending data). May be the heavy traffic hides the problem, because we usually do not run into any timeout.
Of course, we have loss of connectivity between LAC and LNS, but never had a panic with a similar stack trace.

In D26548#591438, @lutz_donnerhacke.de wrote:
In D26548#591305, @lutz_donnerhacke.de wrote:

Can you please point me to the bug report?

The PR that Aleksandr noted, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241133

A pfsense developer also reported seeing a null pointer dereference in ng_l2tp_seq_rack_timeout(), apparently because seq->xwin[0] == 0. Apparently the panic was observed even after this commit: https://svnweb.freebsd.org/base?view=revision&revision=353027 .

Thank you for the info.

We are running this node in a large scale production environment (with remote LAC at different carriers) and did not observe such an issue before.

How common are retransmits in this environment?

We had no problems with the L2TP tunnel since PR146082 was solved. There we run into long time stability problems with L2TP sequence numbers.
Yes we have retransmits, but I did not set up an special monitoring for this, because we use sequence numbers in L2TP only for the control channel, but not for the data channel.
Typically we see a permanent difference in sequence numbers of more than one (always sending data). May be the heavy traffic hides the problem, because we usually do not run into any timeout.
Of course, we have loss of connectivity between LAC and LNS, but never had a panic with a similar stack trace.

If the pending transmit queue is usually non-empty, then the problem should indeed be hidden. The race looks pretty hard to hit, there are several ingredients:

  • some pending ACKs to arm the 1s timeout
  • ACKs for pending transmits are received at the same time as the timeout fires
  • received ACKs clear the transmit queue

So if there is a lot of control traffic I expect the panic to be rare. Note that r353027 fixed a more severe problem with the timeout handler.

My problem is, that I do not understand, how this race condition can happen at all.
Sure, the fix seems obvious and does not harm, so let's go with it.

OTOH almost all handling of callouts (at least in ng_l2tp) is prone to the same error, if the reasoning is true.

In D26548#591451, @lutz_donnerhacke.de wrote:

My problem is, that I do not understand, how this race condition can happen at all.

You mean, there is some high-level reason the race should not occur? Or, are the details of the race not clear? I can try to explain my reasoning further.

Sure, the fix seems obvious and does not harm, so let's go with it.

OTOH almost all handling of callouts (at least in ng_l2tp) is prone to the same error, if the reasoning is true.

Indeed, there seems to be a race in ng_l2tp_seq_xack_timeout() as well, I started looking at it. It'll be a separate diff though.

Indeed, there seems to be a race in ng_l2tp_seq_xack_timeout() as well.

This looks like it could cause transmission of a ZLB with stale NR or NS, again in rare circumstances. I don't think it will cause a kernel panic.

This revision is now accepted and ready to land.Sep 25 2020, 6:25 PM

Indeed, there seems to be a race in ng_l2tp_seq_xack_timeout() as well.

This looks like it could cause transmission of a ZLB with stale NR or NS, again in rare circumstances. I don't think it will cause a kernel panic.

There is not problem with an additional packet, so let this unchanged,
I've to think about the issue in the context of the netgraph framework itself.
I clearly do not understand the interaction of the components within the framework (lesson learned)