
Fix panic in network stack due to memory use-after-free in relation to fragmented packets
Needs ReviewPublic

Authored by hselasky on Mar 18 2019, 1:56 PM.

Details

Summary

When sending IPv6 fragmented packets, if a fragment is lost and the network device is then freed, the mbuf making up the fragment remains in the temporary hashed fragment list and causes a panic when it times out, because the timeout handler accesses the freed network interface structure.

Fixes:

  1. Make sure m_pkthdr.rcvif always points to a valid network interface. Otherwise the value must be set to NULL.
  2. Use the rcvif of the last received fragment as the m_pkthdr.rcvif of the fully defragmented packet.

This avoids feeding the netisr mbufs that have a NULL rcvif in the m_pkthdr.
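As a rough illustration of the two fixes, here is a small userspace model. All the types and names below (frag_model, departure_clear, and so on) are hypothetical stand-ins for the kernel's mbuf/ifnet structures, not the actual patch: stale rcvif pointers are cleared when an interface departs, and the reassembled packet inherits the rcvif of the last fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ifnet and a queued fragment mbuf. */
struct ifnet_model { int ifindex; };

struct frag_model {
	struct ifnet_model *rcvif;	/* NULL once the interface departs */
	struct frag_model *next;
};

/* Fix 1 (modeled): on interface departure, clear every stale rcvif
 * in the fragment queue so nothing dangles. */
static void
departure_clear(struct frag_model *head, struct ifnet_model *ifp)
{
	for (; head != NULL; head = head->next)
		if (head->rcvif == ifp)
			head->rcvif = NULL;
}

/* Fix 2 (modeled): the reassembled packet takes the rcvif of the
 * last fragment in the chain. */
static struct ifnet_model *
reassembled_rcvif(struct frag_model *head)
{
	struct ifnet_model *last = NULL;

	for (; head != NULL; head = head->next)
		last = head->rcvif;
	return (last);
}
```

A departed interface seen by an earlier fragment therefore cannot leak into the header of the final packet.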

Backtrace:
panic()
icmp6_reflect()

hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim;
^^^^ rcvif->if_afdata[AF_INET6] is NULL.

icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()

Sponsored by: Mellanox Technologies
MFC after: 1 week

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 25699

Event Timeline

bz added a comment.Mar 21 2019, 5:28 PM

I think that the while-in-flight problem can be fixed with epoch,

Yes, should be.

the queueing problem should be fixed in a radical way - do not store mbufs with rcvif in reassembly queues. Just grab the ND_IFINFO(m->m_pkthdr.rcvif)->chlim or whatever you need before putting the buffer on the queue.

That won't work if a packet can still leave, as upper/other layers may still want that ifp pointer; even if only to send out an error, or for counters, or firewalls, or ..

@glebius and @hselasky rather than changing pr_drain I wondered about an eventhandler or something as that way dealing with non-protocol places such as firewalls, netisr, .. would also be possible? I think not queuing is not an option, arp queue is just another one of these places... there's more and more the longer I think about it... We'll need something to get them all (and getting the locking right).

ae added a comment.Mar 21 2019, 8:09 PM
In D19622#421205, @bz wrote:

@glebius and @hselasky rather than changing pr_drain I wondered about an eventhandler or something as that way dealing with non-protocol places such as firewalls, netisr, .. would also be possible? I think not queuing is not an option, arp queue is just another one of these places... there's more and more the longer I think about it... We'll need something to get them all (and getting the locking right).

The problem of a disappearing ifnet with queued mbufs has existed in dummynet for as long as I can remember.

@bz :

@glebius and @hselasky rather than changing pr_drain I wondered about an eventhandler or something as that way dealing with non-protocol places such as firewalls, netisr, .. would also be possible? I think not queuing is not an option, arp queue is just another one of these places... there's more and more the longer I think about it... We'll need something to get them all (and getting the locking right).

I'm fine using the existing ifnet_departure_event for this purpose. Or should I create a new event?
When such an event is received, all packets whose rcvif equals the departing ifp will be removed from the queues.
The current plan is to cover IPv4, IPv6, TCP and SCTP.

--HPS
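The plan above can be sketched as a tiny userspace model of an event fan-out. FreeBSD's actual mechanism is the ifnet_departure_event eventhandler registered via EVENTHANDLER_REGISTER(9); everything below (departure_register, the drain callbacks) is a hypothetical stand-in, not the patch itself. Each protocol registers a drain callback, and tearing an interface down invokes them all so every queue can purge matching mbufs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8

typedef void (*departure_fn)(void *ifp);

static departure_fn handlers[MAX_HANDLERS];
static int nhandlers;

/* Model of EVENTHANDLER_REGISTER(): remember a callback. */
static void
departure_register(departure_fn fn)
{
	if (nhandlers < MAX_HANDLERS)
		handlers[nhandlers++] = fn;
}

/* Model of EVENTHANDLER_INVOKE() at interface teardown time. */
static void
departure_fire(void *ifp)
{
	for (int i = 0; i < nhandlers; i++)
		handlers[i](ifp);
}

/* Example "protocol" drains; each would purge its own queues. */
static int drained;
static void ip_drain_model(void *ifp)  { (void)ifp; drained++; }
static void ip6_drain_model(void *ifp) { (void)ifp; drained++; }
```

The appeal over changing pr_drain is that non-protocol consumers (firewalls, netisr, the arp queue) can register the same way.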

hselasky updated this revision to Diff 55769.Apr 3 2019, 9:33 AM
hselasky edited the summary of this revision. (Show Details)

Hi,

I've had another go at this, and it appears that only IPv4 and IPv6 fragments cause a use-after-free and panic.
I've narrowed down my patch to be more specific.

Can the reviewers please have a look at this patch again?

--HPS

hselasky updated this revision to Diff 55771.Apr 3 2019, 10:16 AM

Fix eventhandler function arguments. Apparently there was no compile-time check for the prototype.

glebius accepted this revision as: glebius.Apr 15 2019, 3:34 PM

Although I still think that the problem of departing ifnets and queued/in-flight mbufs should be solved in a generic way, I'm fine with this patch. It closes one case out of many, but in a non-disruptive way.

Do I get an accept from transport as well?

glebius resigned from this revision.Apr 15 2019, 10:29 PM

Sorry, I withdraw my approval. The more I think about this, the more I dislike it. What the patch does is fix the panic in the most straightforward way. But let's look at it from the standpoint of a network engineer, not a kernel developer.

Removal of an interface doesn't de-legitimate a packet that was already received on it. Once the packet has reached its destination, it doesn't matter what happened to the network path it traveled, including the very last interface. If I have a software tunnel, or a dialup line that flaps so the interface goes away and comes back, I actually want to keep my reassembly queues.

I think the right approach to the problem is clearing "rcvif" on mbufs as they are queued, and copying all required information from the ifnet into a reassembly queue token.
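The token idea could look roughly like this userspace sketch. Names such as reass_token and token_fill are hypothetical, and the two copied fields merely stand in for whatever the reassembly path actually needs (the real ND_IFINFO layout differs): the point is that a later timeout never dereferences a freed ifnet, only the copied token.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the ifnet fields reassembly cares about. */
struct ifnet_model { int chlim; long ifindex; };

/* Token stored with the reassembly queue instead of the ifnet pointer. */
struct reass_token {
	int  chlim;	/* copied from ND_IFINFO(ifp)->chlim at enqueue time */
	long ifindex;	/* enough to identify the interface for counters */
};

/* Copy the needed fields at enqueue time, before the ifnet can go away. */
static void
token_fill(struct reass_token *tok, const struct ifnet_model *ifp)
{
	tok->chlim = ifp->chlim;
	tok->ifindex = ifp->ifindex;
}
```

The trade-off debated below is that the token describes an interface that may no longer exist, so consumers that want a live ifp (firewalls, error generation) still need handling.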

adrian added a subscriber: adrian.Apr 15 2019, 10:39 PM

*puts network engineer in the past hat on*

If the interface is deleted and re-added then you want the pending frames on that interface to also go away - you don't want packets for a previous incarnation of that interface to end up there. E.g. if you have a VPN service that churns quickly through ifnets, you don't want packets for the previous incarnation of tunX to appear on the new instance of tunX - you want them to be tossed so you know nothing leaks out.

Link /flap/ ? Sure, keep the reassembly queues. VPN teardown and re-establishment? Flush.

And yes I'd love for this to be addressed as a whole, but this is a pretty neat way to explicitly call out fixing up a good corner case.

bz added a comment.Apr 15 2019, 10:43 PM

Sorry, I withdraw my approval. The more I think about this, the more I dislike it. What the patch does is fix the panic in the most straightforward way. But let's look at it from the standpoint of a network engineer, not a kernel developer.

Packets can be lost everywhere. It doesn't matter if it's an interface buffer, a fragment reassembly queue, a fibre cut, a BGP flap, a switch buffer overrun, a software bug, a hardware bug, .. What if your interface doesn't come back? What if the same interface comes back with a different connection? This is why we do have reliable upper layer transport protocols. Otherwise we wouldn't need them.

Copying interface meta-information into the frag-reass-q won't do any good, as it describes something which is no longer valid (whether it is a copy or not). The point is to keep it simple and not to further complicate reassembly.

The real underlying problem remains that an object can go away despite still being in use because there is no proper reference counting in place (as it supposedly is too expensive). Until we can fix that problem generally, we'll need work-arounds.

You didn't convince me, guys. Can you please elaborate why removal of an interface should free packets that once came through this interface? The packet has arrived all the way to its last bit; it is a legitimate packet. If you think it should be freed, then why are we flushing only fragment reassembly queues? Why don't we flush TCP reassembly queues, socket buffers? Ah, we don't usually dereference m_pkthdr.rcvif for these kinds of queues, right, so we don't panic.

Back in dialup times it was practically useful that an interface could go away and come back while packets remained. Today's practical use is of lesser importance, but I can imagine it being useful for multihoming with WiFi+LTE and tunneling.

Bjoern, your comment can be reduced to "Packets can be lost anywhere... so let's just drop them instead of further complicating reassembly code". :(

mmacy added a subscriber: mmacy.Apr 16 2019, 8:48 PM

So without delving in to the specifics of this issue I'd like to point out a couple of things:

  • When an interface is taken down, freeing all unprocessed inbound and outbound packets is standard practice, as we have no idea how long the device will be down for; if it's more than 500ms the packets are as good as dead
  • inbound mbufs have a non refcounted pointer to their ifp - this would be easy to fix, but extremely invasive

So in a Previous Project a Long Long Time Ago we solved this by making the receive/send state an ifindex into an array of "ifnet" pointers, plus a gencount so you can see if it's stale. Then all the code had to handle the case where the ifindex was stale (i.e., a NULL pointer versus a garbage pointer) and decide at each point how to make forward progress. In some cases it wasn't needed for forward progress, e.g. the packet was already on a transmit queue, so the fact that the receive interface went away wasn't a huge deal. But sometimes it was, e.g. when a tunnel (l2tp in one case) went away.

I agree this isn't the best solution, but it beats the heck out of panicking. It's panicking because some (interesting) state is being dereferenced through rcvif at a very late point in the stack. Maybe finding/tidying those up would also help things, but this at least gets the panic out of the way.

mmacy added a comment.Apr 17 2019, 6:51 PM

So in a Previous Project a Long Long Time Ago we solved this by making the receive/send state an ifindex into an array of "ifnet" pointers, plus a gencount so you can see if it's stale. Then all the code had to handle the case where the ifindex was stale (i.e., a NULL pointer versus a garbage pointer) and decide at each point how to make forward progress. In some cases it wasn't needed for forward progress, e.g. the packet was already on a transmit queue, so the fact that the receive interface went away wasn't a huge deal. But sometimes it was, e.g. when a tunnel (l2tp in one case) went away.
I agree this isn't the best solution, but it beats the heck out of panicking. It's panicking because some (interesting) state is being dereferenced through rcvif at a very late point in the stack. Maybe finding/tidying those up would also help things, but this at least gets the panic out of the way.

I implemented this with EBR instead of generation counts in VPC, with minimal overhead. It also reduces the space for the ifp from 8 bytes to 2 on 64-bit. I think it's the right way forward, but like I say, invasive. Unclear if/when there will be a driver to change the whole stack.
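The ifindex-plus-gencount scheme described above can be modeled in a few lines of userspace C. All names here are hypothetical; packets store a small index plus the generation at receive time, and a lookup only yields the ifnet if the slot still holds the same generation, so a recycled index reads as stale (NULL) instead of a garbage pointer.

```c
#include <assert.h>
#include <stddef.h>

struct ifnet_model { int unit; };

/* One slot per ifindex; gen is bumped every time the slot is reused. */
struct if_slot {
	struct ifnet_model *ifp;
	unsigned gen;
};

static struct if_slot if_table[4];

/* Resolve an (index, generation) pair recorded at receive time;
 * NULL means the interface has since been destroyed or recycled. */
static struct ifnet_model *
if_resolve(int idx, unsigned gen)
{
	if (if_table[idx].gen != gen)
		return (NULL);
	return (if_table[idx].ifp);
}
```

Each consumer then decides per call site whether a NULL result blocks forward progress, as described in the comment above.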

adrian accepted this revision as: adrian.Apr 17 2019, 7:23 PM

I think this is a pretty self-contained change and would like to see it land.

We could also do step 1 after this and convert all the uses of rcvif over to an accessor macro. That way we can start handling the cases where it could be NULL and start down the path of not relying on unrefcounted rcvifs. It'll be fun though; stuff like wifi uses rcvif a bit oddly these days for callback state, and it will likely be a pretty useful change to fix the rcvif uses in net80211 to be a bit less hacky.

@glebius: I believe all the alternatives for solving this issue have been enumerated and discussed.

At some point you will need to go and clean up the cached mbufs. Otherwise you end up delaying the destruction of the ifnet until all references are gone. Failing to wait for the refs to disappear might mean previous packets arrive on the new ifnet, which I believe is not what we want.

bz requested changes to this revision.Apr 20 2019, 2:33 PM
bz added inline comments.
sys/netinet/ip_reass.c
630

You can fold these two into one line.

632

Can you add a KASSERT that ifp != NULL?

634

I'd remove the blank line.

641

Can you do the assignment not on the declaration line; also, the declaration could happen at the beginning of the function. Also, why is it mb and not just m?

sys/netinet6/frag6.c
83

could be bool and then you could use true/false.

967

You can fold these three into one line.

969

Please add a KASSERT that ifp is not NULL

975

I'd remove the two blank lines (also above this one).

983

Again, please declare at the top and then you can just use it here without the blank line.

This revision now requires changes to proceed.Apr 20 2019, 2:33 PM
hselasky marked 9 inline comments as done.Apr 21 2019, 12:59 PM
hselasky updated this revision to Diff 56444.Apr 21 2019, 1:03 PM

Address comments from @bz .

bz accepted this revision.Apr 21 2019, 4:56 PM

Thanks for persevering on this. Let's get this in for now. I also think the discussion over the last few days was very fruitful. Either way, there is a way forward now beyond just the IP reassembly queues, whether we start fixing the ifp on the mbuf by replacing it, or add more event handlers for other parts of the tree which need clearing.

This revision is now accepted and ready to land.Apr 21 2019, 4:56 PM
This revision was automatically updated to reflect the committed changes.
hselasky reopened this revision.Apr 22 2019, 7:41 PM

It appears there are more issues that need to be fixed before this patch can land.

I'm re-opening the review.

The current patch causes a panic with the following sequence of commands:

kldload pfsync
cd /usr/tests/sys/netpfil/pf
sudo kyua test

I will investigate that more closely tomorrow.

panic: mtx_lock() of destroyed mutex @ /usr/src/sys/netinet/ip_reass.c:628
cpuid = 0
time = 1555939645
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091d68530
vpanic() at vpanic+0x19d/frame 0xfffffe0091d68580
panic() at panic+0x43/frame 0xfffffe0091d685e0
__mtx_lock_flags() at __mtx_lock_flags+0x12e/frame 0xfffffe0091d68630
ipreass_cleanup() at ipreass_cleanup+0x86/frame 0xfffffe0091d68670
if_detach_internal() at if_detach_internal+0x786/frame 0xfffffe0091d686f0
if_detach() at if_detach+0x3d/frame 0xfffffe0091d68710
lo_clone_destroy() at lo_clone_destroy+0x16/frame 0xfffffe0091d68730
if_clone_destroyif() at if_clone_destroyif+0x21f/frame 0xfffffe0091d68780
if_clone_detach() at if_clone_detach+0xb8/frame 0xfffffe0091d687b0
vnet_loif_uninit() at vnet_loif_uninit+0x26/frame 0xfffffe0091d687d0
vnet_destroy() at vnet_destroy+0x124/frame 0xfffffe0091d68800
prison_deref() at prison_deref+0x29d/frame 0xfffffe0091d68840
sys_jail_remove() at sys_jail_remove+0x28f/frame 0xfffffe0091d68890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0091d689b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0091d689b0
--- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x80031e12a, rsp = 0x7fffffffe848, rbp = 0x7fffffffe8d0 ---
KDB: enter: panic
[ thread pid 1223 tid 100501 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>
hselasky updated this revision to Diff 56513.Apr 22 2019, 8:30 PM

Add missing check for vnet_state.

@bz: I believe there is another bug in the IPv6 VNET shutdown code.

Where is the counterpart mtx_destroy() for the following mtx_init():

void
frag6_init(void)
{
	struct ip6q *q6;
	int i;

	V_ip6_maxfragpackets = IP6_MAXFRAGPACKETS;
	frag6_set_bucketsize();
	for (i = 0; i < IP6REASS_NHASH; i++) {
		q6 = IP6Q_HEAD(i);
		q6->ip6q_next = q6->ip6q_prev = q6;
		mtx_init(&V_ip6q[i].lock, "ip6qlock", NULL, MTX_DEF);
		V_ip6q[i].count = 0;
	}
bz requested changes to this revision.Apr 23 2019, 10:04 AM

Sorry my bad for missing the VIMAGE state problem in the initial submit.

sys/net/vnet.c
601 ↗(On Diff #56513)

Why do you need these two?

The above should always end up on SI_SUB_VNET_DONE (see vnet_sysinit_done()) and when you get to the end of the sysuninit loop, the only thing left should be the state tied to the jail.

sys/netinet/ip_reass.c
639

You may want to go and have a look how "shutdown" is defined in if_detach_internal(). I think using the same logic here and in IPv6 is probably wise.

This revision now requires changes to proceed.Apr 23 2019, 10:04 AM
bz added subscribers: jtl, jhb.Apr 23 2019, 10:07 AM

@bz: I believe there is another bug in the IPv6 VNET shutdown code.
Where is the counterpart mtx_destroy() for the following mtx_init():

void
frag6_init(void)
{
	struct ip6q *q6;
	int i;

	V_ip6_maxfragpackets = IP6_MAXFRAGPACKETS;
	frag6_set_bucketsize();
	for (i = 0; i < IP6REASS_NHASH; i++) {
		q6 = IP6Q_HEAD(i);
		q6->ip6q_next = q6->ip6q_prev = q6;
		mtx_init(&V_ip6q[i].lock, "ip6qlock", NULL, MTX_DEF);
		V_ip6q[i].count = 0;
	}

This came in with the SA at r337776 by @jtl and @jhb . Seems they missed the case. Let's open a separate review to add the missing mtx_destroy().
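The init/destroy asymmetry can be modeled in userspace. All names here are hypothetical (frag6_destroy_model stands in for the missing mtx_destroy() loop); the flag merely models kernel mutex state, and "locking" a destroyed mutex returning failure mirrors the destroyed-mutex panic seen in the backtrace above.

```c
#include <assert.h>

#define NHASH 4	/* stands in for IP6REASS_NHASH */

/* Minimal model of a mutex: just tracks whether it is initialized. */
struct mtx_model { int initialized; };

static struct mtx_model bucket_lock[NHASH];

/* Models frag6_init(): one mtx_init() per reassembly bucket. */
static void
frag6_init_model(void)
{
	for (int i = 0; i < NHASH; i++)
		bucket_lock[i].initialized = 1;
}

/* The missing counterpart: one mtx_destroy() per bucket at VNET
 * teardown, so nothing can lock a bucket afterwards. */
static void
frag6_destroy_model(void)
{
	for (int i = 0; i < NHASH; i++)
		bucket_lock[i].initialized = 0;
}

/* Returns 1 if the lock may be taken, 0 if it was destroyed;
 * the kernel panics in the latter case. */
static int
mtx_lock_model(struct mtx_model *m)
{
	return (m->initialized);
}
```

Every mtx_init() in a per-VNET init path needs a matching mtx_destroy() in the uninit path, which is exactly the gap being discussed.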

hselasky marked 2 inline comments as done.Apr 25 2019, 10:56 AM
hselasky added inline comments.
sys/net/vnet.c
601 ↗(On Diff #56513)

After your explanation I see these chunks are not needed. I'll remove.

sys/netinet/ip_reass.c
639

I'll update the code.

hselasky updated this revision to Diff 56638.Apr 25 2019, 11:05 AM
hselasky marked 2 inline comments as done.
hselasky updated this revision to Diff 57255.May 10 2019, 9:15 AM

Update patch to match D20051.

jtl requested changes to this revision.May 23 2019, 6:16 PM

In the interests of avoiding discussion fragmentation, I am adding the feedback from the transport working group meeting today.

@hselasky raised this on the working group call and solicited feedback. The strong (seemingly-unanimous) feedback on the call was that we should not discard fragments when the associated interface goes away. Some points that were raised include:

  • Systems which have multiple interfaces can receive different fragments over different interfaces. Therefore, the other fragments may still arrive over other interfaces.
  • Systems with flapping interfaces may receive fragments after the interface recovers.
  • It is not unheard of for systems to queue fragments for multiple seconds. Therefore, it is possible that the fragments will arrive once connectivity is restored and/or rerouted.

It was the strong feeling of the participants on the call that this problem should be solved in a way which preserves the fragments on the queue. Some suggestions included:

  • At the time of enqueueing, copy the information from the ifnet which is necessary to process the packet, and save this information rather than the ifnet pointer.
  • When an interface goes away, clear the ifnet pointer from the fragments on the queue. Skip processing steps which depend on this information (which @hselasky described as incrementing interface statistics and sending an error response when a fragment is dropped).
This revision now requires changes to proceed.May 23 2019, 6:16 PM
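The second suggestion above, clearing the pointer and skipping interface-dependent steps, might look like this userspace sketch of the original panic site. All names are hypothetical: DEFAULT_HLIM stands in for whatever fallback hop limit the stack would choose, and ierrors for the per-interface statistics that get skipped.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_HLIM 64	/* hypothetical fallback hop limit */

struct ifinfo_model { int chlim; };
struct ifnet_model  { struct ifinfo_model nd; int ierrors; };
struct mbuf_model   { struct ifnet_model *rcvif; };

/* The panic read ND_IFINFO(m->m_pkthdr.rcvif)->chlim with a freed
 * rcvif; the guarded form falls back to a default and skips the
 * interface-dependent bookkeeping when the interface has departed. */
static int
reflect_hlim(struct mbuf_model *m)
{
	struct ifnet_model *ifp = m->rcvif;

	if (ifp == NULL)		/* interface departed while queued */
		return (DEFAULT_HLIM);
	ifp->ierrors++;			/* per-interface statistics */
	return (ifp->nd.chlim);
}
```

This preserves the fragments on the queue, as the working group asked, at the cost of auditing every consumer of rcvif for NULL.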
hselasky updated this revision to Diff 57839.May 24 2019, 1:23 PM
hselasky edited the summary of this revision. (Show Details)

Fix according to input from @jtl . Set rcvif to NULL when a network interface vanishes.

I'll register a strong desire to discard fragments immediately when an interface vanishes. I don't expect that to carry any significant weight; my rationale is whatever makes state management easier within the current codebase and usage patterns of FreeBSD.

@jtl: Can you please comment?

@kbowling : This is what the previous version of this patch does.

@jtl: Given that we don't have a significant number of panics in this area, I don't see a problem with just freeing the fragments; i.e., the current behaviour leads to a panic if a fragment is left over from an interface. This is only done when interfaces are destroyed and does not affect link flapping.

tuexen added a subscriber: tuexen.May 29 2019, 9:28 AM

I'll register a strong desire to discard fragments immediately when an interface vanishes. I don't expect that to carry any significant weight; my rationale is whatever makes state management easier within the current codebase and usage patterns of FreeBSD.

Why do you want to discard a packet if the interface it was received from is going away?

The packet is there; it doesn't matter if the interface goes away after it was received. And we don't have to wait for the interface to come back before continuing processing. We should be able to just process the packet even if the interface it was received on is no longer present when the processing happens.

bz added a comment.May 29 2019, 9:47 AM

I'll register a strong desire to discard fragments immediately when an interface vanishes. I don't expect that to carry any significant weight; my rationale is whatever makes state management easier within the current codebase and usage patterns of FreeBSD.

Why do you want to discard a packet if the interface it was received from is going away?
The packet is there; it doesn't matter if the interface goes away after it was received. And we don't have to wait for the interface to come back before continuing processing. We should be able to just process the packet even if the interface it was received on is no longer present when the processing happens.

And will all kinds of upper-layer logic keep working with a NULL ifp? I am just wondering what happens when we do address-list walks, statistics, hashes, ...

What do we do with packets in ifqs when the interface goes away? We don't process them anymore either.

Why do you want the packet to stay so badly? The Internet is designed so that we can lose packets, and we do all the time. Buffer overflows, link drops, routing glitches, .. out-of-memory conditions in our stack, ... There are zero guarantees any packet will ever make it. The chances that a partial packet sitting in the reassembly queue will become a "full" packet when an interface goes away are nearly zero.

Even if this patch doesn't change the behaviour for link flapping, I'd argue the same thing even for link flaps. What happens on real hardware in these cases? I mean, the time a link flap takes is usually going to be a multiple of receive rings/buffers/.., so by the time your link comes back up, the remainders of your fragments for the "full" packet are long gone, or worse, might already have triggered ICMP(v6) back to the sender, where the connection was dropped rather than waiting on resend timers.

I think by preserving any kind of state which we could get rid of, we are just trying to be too complicated instead of KISS.

And we have two problems to address here:
(1) purging state from all kinds of queues (or, in the future, trying not to keep that state in this way anymore)
(2) the mbuf ifp problem, which is a far larger problem to address, affects all parts of the stack, and should be treated separately

Both problems are intermixed here; let's solve (1) step by step and leave (2) for another review.

@bz - once a packet (fragment in this case) has made it to its destination, I would argue that you shouldn't needlessly drop that packet any more. (Host vs. Router behavior.) An interface flapping frequently, and removing all enqueued fragments while doing so, would reduce the usability of such an interface from marginal to completely inoperable.

Using the ifp of the last received fragment rather than the ifp of the initial one (last in time, not by fragment offset) sounds to me like a good compromise.

adrian added a comment.Jun 6 2019, 4:01 PM

@bz - once a packet (fragment in this case) has made it to its destination, I would argue that you shouldn't needlessly drop that packet any more. (Host vs. Router behavior.) An interface flapping frequently, and removing all enqueued fragments while doing so, would reduce the usability of such an interface from marginal to completely inoperable.
Using the ifp of the last received fragment rather than the ifp of the initial one (last in time, not by fragment offset) sounds to me like a good compromise.

But wait - this should only be called when an ifp is DELETED, not when the interface flaps link / administrative state. If an interface is /deleted/ then we should honestly drop the frames.

I feel like "interface flap" is a bit mislabeled here.. :/

hselasky updated this revision to Diff 60498.Aug 6 2019, 9:15 AM

Rebase patch after latest changes upstream.