Page MenuHomeFreeBSD

Fix panic in network stack due memory use after free in relation to fragmented packets
ClosedPublic

Authored by hselasky on Mar 18 2019, 1:56 PM.

Details

Summary

When sending IPv6 fragmented packets and a fragment is lost before the network device is freed, the mbuf making up the fragment will remain in the temporary hashed fragment list and cause a panic when it times out due to accessing a freed network interface structure.

Fixes:

  1. Make sure the m_pkthdr.rcvif always point to a valid network interface. Else the value should be set to NULL.
  2. Use the rcvif of the last received fragment as m_pkthdr.rcvif for the fully defragged packet.

This avoids feeding mbufs to the netisr having a NULL rcvif in the m_pkthdr.

Backtrace:
panic()
icmp6_reflect()

hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim;
^^^^ rcvif->if_afdata[AF_INET6] is NULL.

icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()

Sponsored by: Mellanox Technologies
MFC after: 1 week

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
bz added inline comments.Apr 20 2019, 2:33 PM
sys/netinet/ip_reass.c
620 ↗(On Diff #55771)

You can fold these two into one line.

622 ↗(On Diff #55771)

Can you add a KASSERT that ifp != NULL?

624 ↗(On Diff #55771)

I'd remove the blank line.

631 ↗(On Diff #55771)

Can you do the assignment not on the declaration line; also the declaration could happen at the beginning of the function. Also Why is it mb and not just m?

sys/netinet6/frag6.c
84 ↗(On Diff #55771)

could be bool and then you could use true/false.

953 ↗(On Diff #55771)

You can fold these three into one line.

955 ↗(On Diff #55771)

Please add a KASSERT that ifp is not NULL

961 ↗(On Diff #55771)

I'd remove the two blank lines (also above this one).

969 ↗(On Diff #55771)

Again, please declare at the top and then you can just use it here without the blank line.

This revision now requires changes to proceed.Apr 20 2019, 2:33 PM
hselasky marked 9 inline comments as done.Apr 21 2019, 12:59 PM
hselasky updated this revision to Diff 56444.Apr 21 2019, 1:03 PM

Address comments from @bz .

bz accepted this revision.Apr 21 2019, 4:56 PM

Thanks for persevering on this. Let's get this for now. I also think the discussion had in the last days was very fruitful. Either way, there is a way forward now beyond just the IP reassembly queues, whether we start fixing the ifp on the mbuf replacing it or adding more event handlers for other parts of the tree which need clearing.

This revision is now accepted and ready to land.Apr 21 2019, 4:56 PM
This revision was automatically updated to reflect the committed changes.
hselasky reopened this revision.Apr 22 2019, 7:41 PM

It appears there are more issues that needs to be fixed before this patch can land.

I'm re-opening review.

The current patch causes a panic with the following sequence of commands:

kldload pfsync
cd /usr/tests/sys/netpfil/pf
sudo kyua test

I will investigate that more closely tomorrow.

panic: mtx_lock() of destroyed mutex @ /usr/src/sys/netinet/ip_reass.c:628
cpuid = 0
time = 1555939645
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091d68530
vpanic() at vpanic+0x19d/frame 0xfffffe0091d68580
panic() at panic+0x43/frame 0xfffffe0091d685e0
__mtx_lock_flags() at __mtx_lock_flags+0x12e/frame 0xfffffe0091d68630
ipreass_cleanup() at ipreass_cleanup+0x86/frame 0xfffffe0091d68670
if_detach_internal() at if_detach_internal+0x786/frame 0xfffffe0091d686f0
if_detach() at if_detach+0x3d/frame 0xfffffe0091d68710
lo_clone_destroy() at lo_clone_destroy+0x16/frame 0xfffffe0091d68730
if_clone_destroyif() at if_clone_destroyif+0x21f/frame 0xfffffe0091d68780
if_clone_detach() at if_clone_detach+0xb8/frame 0xfffffe0091d687b0
vnet_loif_uninit() at vnet_loif_uninit+0x26/frame 0xfffffe0091d687d0
vnet_destroy() at vnet_destroy+0x124/frame 0xfffffe0091d68800
prison_deref() at prison_deref+0x29d/frame 0xfffffe0091d68840
sys_jail_remove() at sys_jail_remove+0x28f/frame 0xfffffe0091d68890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0091d689b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0091d689b0
--- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x80031e12a, rsp = 0x7fffffffe848, rbp = 0x7fffffffe8d0 ---
KDB: enter: panic
[ thread pid 1223 tid 100501 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>
hselasky updated this revision to Diff 56513.Apr 22 2019, 8:30 PM

Add missing check for vnet_state .

@bz: There is another bug I believe in the IPv6 VNET shutdown code.

Where is the counterpart mtx_destroy() for the following mtx_init():

void
frag6_init(void)
{

struct ip6q *q6;
int i;
V_ip6_maxfragpackets = IP6_MAXFRAGPACKETS;
frag6_set_bucketsize();
for (i = 0; i < IP6REASS_NHASH; i++) {
        q6 = IP6Q_HEAD(i);
        q6->ip6q_next = q6->ip6q_prev = q6;
        mtx_init(&V_ip6q[i].lock, "ip6qlock", NULL, MTX_DEF);
        V_ip6q[i].count = 0;
}
bz requested changes to this revision.Apr 23 2019, 10:04 AM

Sorry my bad for missing the VIMAGE state problem in the initial submit.

sys/net/vnet.c
601 ↗(On Diff #56513)

Why do you need these two?

The above should always end up on SI_SUB_VNET_DONE (see vnet_sysinit_done()) and when you get to the end of the sysuninit loop, the only thing left should be the state tied to the jail.

sys/netinet/ip_reass.c
630 ↗(On Diff #56513)

You may want to go and have a look how "shutdown" is defined in if_detach_internal(). I think using the same logic here and in IPv6 is probably wise.

This revision now requires changes to proceed.Apr 23 2019, 10:04 AM
bz added subscribers: jtl, jhb.Apr 23 2019, 10:07 AM

@bz: There is another bug I believe in the IPv6 VNET shutdown code.
Where is the counterpart mtx_destroy() for the following mtx_init():

void
frag6_init(void)
{

struct ip6q *q6;
int i;
V_ip6_maxfragpackets = IP6_MAXFRAGPACKETS;
frag6_set_bucketsize();
for (i = 0; i < IP6REASS_NHASH; i++) {
        q6 = IP6Q_HEAD(i);
        q6->ip6q_next = q6->ip6q_prev = q6;
        mtx_init(&V_ip6q[i].lock, "ip6qlock", NULL, MTX_DEF);
        V_ip6q[i].count = 0;
}

This came in with the SA at r337776 by @jtl and @jhb . Seems they missed the case. Let's open a separate review to add the missing mtx_destroy().

hselasky marked 2 inline comments as done.Apr 25 2019, 10:56 AM
hselasky added inline comments.
sys/net/vnet.c
601 ↗(On Diff #56513)

After your explanation I see these chunks are not needed. I'll remove.

sys/netinet/ip_reass.c
630 ↗(On Diff #56513)

I'll update the code.

hselasky updated this revision to Diff 56638.Apr 25 2019, 11:05 AM
hselasky marked 2 inline comments as done.
hselasky updated this revision to Diff 57255.May 10 2019, 9:15 AM

Update patch to match D20051 .

jtl requested changes to this revision.May 23 2019, 6:16 PM

In the interests of avoiding discussion fragmentation, I am adding the feedback from the transport working group meeting today.

@hselasky raised this on the working group call and solicited feedback. The strong (seemingly-unanimous) feedback on the call was that we should not discard fragments when the associated interface goes away. Some points that were raised include:

  • Systems which have multiple interfaces can receive different fragments over different interfaces. Therefore, the other fragments may still arrive over other interfaces.
  • Systems with flapping interfaces may receive fragments after the interface recovers.
  • It is not unheard of for systems to queue fragments for multiple seconds. Therefore, it is possible that the fragments will arrive once connectivity is restored and/or rerouted.

It was the strong feeling of the participants on the call that this problem should be solved in a way which preserves the fragments on the queue. Some suggestions included:

  • At the time of enqueueing, copying the information from the ifnet pointer which is necessary to process the packet. Save this information, rather than the ifnet pointer.
  • When an interface goes away, clear the ifnet pointer from the fragments on the queue. Skip processing steps which depend on this information (which @hselasky described as incrementing interface statistics and sending an error response when a fragment is dropped).
This revision now requires changes to proceed.May 23 2019, 6:16 PM
hselasky updated this revision to Diff 57839.May 24 2019, 1:23 PM
hselasky edited the summary of this revision. (Show Details)

Fix according to input from @jtl . Set rcvif to NULL when a network interface vanishes.

I'll register strong desire to discard fragments immediately on interface vanishing. I don't expect that to come with any significant weight, my rational is whatever makes state management easier within current codebase and usage patterns of fbsd.

@jtl: Can you please comment?

@kbowling : This is what the previous version of this patch does.

@jtl: Given that we don't have a significant number of panics in this area, I don't see a problem just freeing the fragments. I.E. the current behaviour leads to a panic if a fragment is leftover from an interface. This is only done when interfaces are destroyed and does not affect LINK-flapping.

tuexen added a subscriber: tuexen.May 29 2019, 9:28 AM

I'll register strong desire to discard fragments immediately on interface vanishing. I don't expect that to come with any significant weight, my rational is whatever makes state management easier within current codebase and usage patterns of fbsd.

Why do you want to discard a packet if the interface it was received from is going away?

The packet is there, it doesn't matter if the interface goes away after it was received. And we don't have to wait for the interface to come back before continuing processing. We should be able to just process the packet even without that interface it was received on still being present when the processing happens.

bz added a comment.May 29 2019, 9:47 AM

I'll register strong desire to discard fragments immediately on interface vanishing. I don't expect that to come with any significant weight, my rational is whatever makes state management easier within current codebase and usage patterns of fbsd.

Why do you want to discard a packet if the interface it was received from is going away?
The packet is there, it doesn't matter if the interface goes away after it was received. And we don't have to wait for the interface to come back before continuing processing. We should be able to just process the packet even without that interface it was received on still being present when the processing happens.

And all kinds of upper-layer logics will keep working with an NULL ifp? I am just wondering when we do address list walks, statistics, hashes, ...

What do we do with packets in ifqs when the interface goes away? We don't process them anymore either.

Why would you want the packet to stay so badly. The Internet is designed that we can lose packets and we do all the time. Buffer overflows, link drops, routing glitches, .. out of memory conditions in our stack, ... There are zero guarantees any packet will ever make it. The chances that a partial packet sitting in the reassembly queue will become a "full" packet when an interface goes away become narrowly zero.

Even if this patch doesn't change the behaviour for link flapping, I'd even argue the same thing for link-flaps. What happens on real hardware in these cases? I mean the time a link-flap takes is usually going to be a multiple of receive rings/buffers/.. so by the time your link comes back up the remainders of your fragments for the "full" packet have long gone down the orcus, or worse, might even have triggered ICMP(v6) back to the sender already where the connection was dropped rather than waiting on resend-timers.

I think preserving any kind of state which we can get rid of, we are just trying to be to complicated instead of KISS.

And we have two problems to address here:
(1) purging state from all kinds of queues (or in a future trying not to have that state in this way anymore)
(2) the mbuf ifp problem (which is a way larger problem to address and affects all parts of the stack) and should be a different problem

Both problems are intermixed here; let's step-by-step solve (1) and leave (2) for another review.

@bz - once a packet (fragment in this case) has made it to it's destination, I would argue that you shouldn't needlessly drop that packet any more. (Host vs. Router behavior). An interface flapping frequently, and removing all enqueued fragments while doing this, would reduce the usability of such an interface from marginal to completely inoperable.

Using the ifp of the last received fragment rather the ifp of the initial one (in time, not relative offset) sounds to me like a good compromise.

adrian added a comment.Jun 6 2019, 4:01 PM

@bz - once a packet (fragment in this case) has made it to it's destination, I would argue that you shouldn't needlessly drop that packet any more. (Host vs. Router behavior). An interface flapping frequently, and removing all enqueued fragments while doing this, would reduce the usability of such an interface from marginal to completely inoperable.
Using the ifp of the last received fragment rather the ifp of the initial one (in time, not relative offset) sounds to me like a good compromise.

But wait - this should only be called when an ifp is DELETED, not when the interface flaps link / administrative state. If an interface is /deleted/ then we should honestly drop the frames.

I feel like "interface flap" is a bit mis-labeled here.. :/

hselasky updated this revision to Diff 60498.Aug 6 2019, 9:15 AM

Rebase patch after latest changes upstream.

bz added a comment.EditedSep 27 2019, 1:45 PM
In D19622#439705, @jtl wrote:

In the interests of avoiding discussion fragmentation, I am adding the feedback from the transport working group meeting today.
@hselasky raised this on the working group call and solicited feedback. The strong (seemingly-unanimous) feedback on the call was that we should not discard fragments when the associated interface goes away. Some points that were raised include:

  • Systems which have multiple interfaces can receive different fragments over different interfaces. Therefore, the other fragments may still arrive over other interfaces.
  • Systems with flapping interfaces may receive fragments after the interface recovers.
  • It is not unheard of for systems to queue fragments for multiple seconds. Therefore, it is possible that the fragments will arrive once connectivity is restored and/or rerouted.

It was the strong feeling of the participants on the call that this problem should be solved in a way which preserves the fragments on the queue. Some suggestions included:

  • At the time of enqueueing, copying the information from the ifnet pointer which is necessary to process the packet. Save this information, rather than the ifnet pointer.
  • When an interface goes away, clear the ifnet pointer from the fragments on the queue. Skip processing steps which depend on this information (which @hselasky described as incrementing interface statistics and sending an error response when a fragment is dropped).

In a later call I think your suggestion was along the lines of:

(a) if ifnet goes away nuke the recvif pointer from the queued mbufs
(b) if reassembly times out do as outlined above and skip sending ICMP error/per-IF statistics/.. if the ifnet pointer was nuked
(c) if another fragment arrives (or the last fragment to complete the packet arrives) use that ones recvif pointer as that interface is expected to still be there to pass the packet on

Now there is a gray area between (b) and (c) in which (b) could be extended to "if we cannot find an ifnet pointer in the expected place, scan the fragments of the packet in question for any ifnet pointer and use that one" for error handling. It'd be a one-time slightly more expensive operation. If there's an attack however that is kind-of the extra work you'd want to avoid. Without the extra work, you may not find out as easily what kind of problem you are running into though as you are lacking statistics. Trade-off...

jtl added a comment.Sep 27 2019, 2:44 PM
In D19622#476255, @bz wrote:

In a later call I think your suggestion was along the lines of:
(a) if ifnet goes away nuke the recvif pointer from the queued mbufs
(b) if reassembly times out do as outlined above and skip sending ICMP error/per-IF statistics/.. if the ifnet pointer was nuked
(c) if another fragment arrives (or the last fragment to complete the packet arrives) use that ones recvif pointer as that interface is expected to still be there to pass the packet on
Now there is a gray area between (b) and (c) in which (b) could be extended to "if we cannot find an ifnet pointer in the expected place, scan the fragments of the packet in question for any ifnet pointer and use that one" for error handling. It'd be a one-time slightly more expensive operation. If there's an attack however that is kind-of the extra work you'd want to avoid. Without the extra work, you may not find out as easily what kind of problem you are running into though as you are lacking statistics. Trade-off...

There was no grey area in my mind. :-) My proposal was (and still is) to skip anything ifnet-dependent if the reassembly times out and we don't have a pointer to the ifnet.

I agree with the rest of your statement, however. I do recognize the trade-off you have noted. And, while I have a preference for foregoing the extra work, I'm OK with resolving this either way.

hselasky updated this revision to Diff 62739.Sep 30 2019, 8:16 AM

Rebase patch.

@bz If you don't need to send any ICMP errors on timeout, then the rcvifp could just be cleared when packets entering the ICMP hash. That would simplify the code.

hselasky updated this revision to Diff 62985.Oct 7 2019, 2:29 PM

Rebase patch.

bz requested changes to this revision.Oct 8 2019, 8:40 PM

I left you a sample change which should be good enough for the IPv4 frag reassembly to no longer trip over VNET shutdown problems and is independent on when things will happen and independent on any SI_SUB_* ordering.

Once we fix frag6.c to also have a proper destroy the same can be done there.

sys/netinet/ip_reass.c
639 ↗(On Diff #62985)

This block can be changed to:
{{{

/*
 * Skip processing if IPv4 reassembly is not initialised
 * or torn down by ipreass_destroy().
 */ 
if (V_ipq_zone == NULL)
    return;

}}}

667 ↗(On Diff #62985)

Please add a V_ipq_zone = NULL; here. (If you really want to be fuzzy, you can use a temporary variable but VNET shutdown is single threaded so ...)

This revision now requires changes to proceed.Oct 8 2019, 8:40 PM
hselasky updated this revision to Diff 63087.Oct 9 2019, 4:11 PM

Fix check for shutdown in IPv4 reassembly cleanup.

hselasky updated this revision to Diff 63088.Oct 9 2019, 4:13 PM

No need to check shutdown flag in IPv6 fragment cleanup code yet (@bz fill add this later)

hselasky marked 2 inline comments as done.Oct 9 2019, 4:16 PM

I've added prints in the new code paths and verified that:

  1. fragment loss does not cause a panic after 30 seconds (ifp is really cleared)
  2. creating and destroying a jail using vnet does not panic either (NULL check added causes a return)

All tests succeeded at my side.

bz accepted this revision.Oct 15 2019, 8:55 PM

I guess given the frag6_destroy() is only in my local tree still, this should be fine to go and I'll fill the vnet check for frag6 in later (as you indicated), when we are no longer leaking frag6 data.

If this works for you and applies to head commit it.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 16 2019, 9:12 AM
This revision was automatically updated to reflect the committed changes.