Page MenuHomeFreeBSD

netinet6: fix parent lle locking
Needs ReviewPublic

Authored by kp on Jul 8 2024, 2:52 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Aug 23, 3:28 AM
Unknown Object (File)
Thu, Aug 22, 9:02 PM
Unknown Object (File)
Mon, Aug 19, 7:29 PM
Unknown Object (File)
Mon, Aug 19, 8:16 AM
Unknown Object (File)
Jul 29 2024, 11:58 PM
Unknown Object (File)
Jul 11 2024, 10:30 PM
Unknown Object (File)
Jul 11 2024, 6:58 PM
Unknown Object (File)
Jul 9 2024, 7:47 AM

Details

Reviewers
melifaro
Group Reviewers
network
pfsense
Summary

If an lle has a parent we resolve that one instead. However, we release the
child lle lock before we acquire the parent's lock. This leaves room for the
parent to get freed before we acquire its lock.

We've seen reports of crashes with this backtrace and a NULL lock:

  • trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp = 0xfffffe0106720a00 ---

turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00
rw_wunlock_hard() at rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30
nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0
nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10
ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0
ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0
ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0
ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90
ip_input() at ip_input+0x705/frame 0xfffffe0106720df0
swi_net() at swi_net+0x138/frame 0xfffffe0106720e60

nd6_resolve_slow+0x2d7 decodes to the LLE_WLOCK(lle) call on the parent lock.

Take the parent lock before releasing the child lock.
While here also sprinkle in a few extra assertions and add a test case to at
least excercise this code path.

Sponsored by: Rubicon Communications, LLC ("Netgate")

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 58562
Build 55450: arc lint + arc unit

Event Timeline

kp requested review of this revision.Jul 8 2024, 2:52 PM

how does this look like in terms of valid lock ordering?

i would expect that instead a ref would be held or something else to that extent

zlei added inline comments.
sys/netinet6/nd6.c
2477

The lock order in nd6_free_children() is parent lle then child lle.
Will the change introduce LOR ?

tests/sys/netinet/forward.sh
280

The -6 is confusing.

sys/netinet6/nd6.c
2477

Hmm. Possibly. I've not been able to reproduce the original panic that led me here, so it's plausible that I've not triggered the scenario that would demonstrate the LOR either.

There's a reference counting mechanism on lle's, but that requires holding the lock on the lle we want to reference, so that won't actually help us here. And it's not at all clear to me that holding a on the child would even prevent the parent from being freed.

refs need to be converted to atomics, then you can grab one without taking the lock

however, it is unclear to me who even takes these refs and in what aspects -- chances are decent this code is already pretty whack and requiring further changes

I'm taking this over from Kristof as a "favor".

Looking at the coredump I believe the panicking unlock is already on the child lle, thus the patch does not address the problem. I'll be posting a new review when I find out what's up, this one will only contain the test case.