Page MenuHomeFreeBSD

Add nd6_ifinfo() function to do basic checks to avoid NULL pointer dereference
Needs ReviewPublic

Authored by ae on Nov 2 2021, 10:52 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Sep 27, 6:20 PM
Unknown Object (File)
Thu, Sep 26, 3:22 PM
Unknown Object (File)
Thu, Sep 26, 3:11 PM
Unknown Object (File)
Tue, Sep 24, 11:21 AM
Unknown Object (File)
Fri, Sep 20, 11:14 AM
Unknown Object (File)
Fri, Sep 20, 11:13 AM
Unknown Object (File)
Fri, Sep 20, 11:13 AM
Unknown Object (File)
Fri, Sep 6, 12:19 AM

Details

Reviewers
melifaro
markj
Group Reviewers
network
Summary

ND6_IFINFO macro is used to get access to interface's AF_INET6
specific if_afdata. When ifnet is already destroyed it is possible that
some code tries to get access to this data. E.g. when netisr queue has
many mbufs queued and ifnet destroying is in progress, it is easy
to get NULL pointer dereference when ND6_IFINFO macro is used without
extra checks.

The patch does not add 100% protection from such panics, but reduces
it a lot. Internally we use deferred ifnet destroying and with this
patch it seems most of such panics are avoided.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 42535
Build 39423: arc lint + arc unit

Event Timeline

ae held this revision as a draft.
ae published this revision for review.Nov 2 2021, 11:05 PM
ae added reviewers: network, melifaro, markj.
sys/netinet6/nd6.h
112–114

There are no guarantees that the code in the other thread will not set IFF_DYING immediately after checking. The same goes with afdata check.

I'd rather do the following:

sys/netinet6/nd6.h
112–114

For good measure I would use atomic_load_ptr() as well, to prevent the compiler from loading the pointer multiple times and possibly circumventing the NULL checks.

Is something blocking this change?

I'm asking because there is a stable/12 crash which looks very much like something which would be worked around by this patch:

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100056 td 0xfffff8000546c000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00401f37f0
vpanic() at vpanic+0x197/frame 0xfffffe00401f3840
panic() at panic+0x43/frame 0xfffffe00401f38a0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00401f3900
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00401f3950
trap() at trap+0x286/frame 0xfffffe00401f3a60
calltrap() at calltrap+0x8/frame 0xfffffe00401f3a60
--- trap 0xc, rip = 0xffffffff8105bbfb, rsp = 0xfffffe00401f3b30, rbp = 0xfffffe00401f3bc0 ---
nd6_dad_timer() at nd6_dad_timer+0x4b/frame 0xfffffe00401f3bc0
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe00401f3c70
softclock() at softclock+0x79/frame 0xfffffe00401f3c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe00401f3cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00401f3d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00401f3d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
sys/netinet6/nd6.h
112–114

A minor note is that I would add predict false for the thing being NULL on top.

I think this problem also can be classified as queuing mbufs with pkthdr.rcvif pointer will allow ifnet pointer to outlive the epoch. I have a WIP that will post soon to provide API to serialize/restore the pkthdr.rcvif pointer and that should be used in all long term queuing of mbufs.

See also https://reviews.freebsd.org/D33064

Given the description is the problem that there is no reference taken of any sort? single-var atomics would be a non-starter of course, but a per-CPU refcounting scheme may be more than fast enough. It does add some cycles for each mbuf (unless they can be batched?) but avoids any hairy-ness with use-after-free.

In D32811#759237, @mjg wrote:

Given the description is the problem that there is no reference taken of any sort? single-var atomics would be a non-starter of course, but a per-CPU refcounting scheme may be more than fast enough. It does add some cycles for each mbuf (unless they can be batched?) but avoids any hairy-ness with use-after-free.

Queueing/dequeuing is a slow process anyway, so instead of adding a new way to reference ifnet, I suggest to use pointer serialization to solve all races related to queueing mbufs: D33266

So should this revision be abandoned?

In D32811#764927, @mjg wrote:

So should this revision be abandoned?

Only after D33266 is committed, which depends on D33672.

I think D33268 should address this problem.