Page MenuHomeFreeBSD

ng_eiface: protect private data from ioctl access
Needs ReviewPublic

Authored by pouria on Mon, Mar 9, 9:26 PM.
Tags
None
Referenced Files
F148379995: D55777.diff
Tue, Mar 17, 11:55 AM
F148275723: D55777.diff
Mon, Mar 16, 10:59 PM
Unknown Object (File)
Mon, Mar 16, 11:13 AM
Unknown Object (File)
Sun, Mar 15, 2:22 PM
Unknown Object (File)
Wed, Mar 11, 6:47 AM
Unknown Object (File)
Wed, Mar 11, 4:16 AM

Details

Reviewers
zlei
glebius
markj
Group Reviewers
network
Summary

This fixes a race between freeing private data in ng_eiface_rmnode()
and netlink trying to access priv->media in dump_iface() triggered by
ng_eiface_disconnect() link down event.

This patch will NOT completely fix the issue in PR292993.
However, it uncovers another race between if_vmove() and
ether_detach()/if_detach.

PR: 292993
MFC after: 1 week

Test Plan

See PR292993 to reproduce.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 71301
Build 68184: arc lint + arc unit

Event Timeline

pouria requested review of this revision.Mon, Mar 9, 9:26 PM

We can also reduce the locking duration in ng_eiface_rmnode() by mallocing the priv->media and simply use this lock for ifmedia null check only.
I choose this approach because ng_iface have done the same thing which is safer.

Isn't this problem more general than just this type of interface?

Isn't this problem more general than just this type of interface?

If by "problem" you mean the priv data protection:
Since the netgraph lock generally protects other parts of struct priv, the only things that should be protected by the lock are media ioctl requests.
I can't find any other use case for the lock.
Therefore I only locked the SIOCXIFMEDIA case.
I can protect other uses of priv in other functions beside ioctl too.
But I believe that would be overkill.

However, if you mean the PR292993 race problem in general.
I think that one is too complicated to be solved inside the ng_eiface itself.
At least, I couldn't do it.

Also, ng_eiface_mediachange and ng_eiface_mediastatus are called from inside the ifmedia_ioctl.
That's why I didn't lock them.

I mean that negraph priv is a special case of an interface softc. And netgraph node removal is a special case of a hardware detached event. That's why I think that other interfaces that support SIOC[SG]IFMEDEIA and can be detached at a random moment suffer from the same problem.

I mean that negraph priv is a special case of an interface softc. And netgraph node removal is a special case of a hardware detached event. That's why I think that other interfaces that support SIOC[SG]IFMEDEIA and can be detached at a random moment suffer from the same problem.

Oh, I see now. The only interface that directly uses ifmedia in netgraph is ng_eiface.
and most of the other interfaces outside the netgraph are already protected by their own lock.
Am I missing something?

% fgrep -Er 'SIOCSIFMEDIA|SIOCGIFMEDIA|ifmedia_ioctl' sys/netgraph
sys/netgraph/ng_eiface.c:       case SIOCSIFMEDIA:
sys/netgraph/ng_eiface.c:       case SIOCGIFMEDIA:
sys/netgraph/ng_eiface.c:               error = ifmedia_ioctl(ifp, ifr, &priv->media, command);

if_epair looks suspicious. I will verify ifmedia protection on other interfaces right now.

But there are many interfaces outside of netgraph that use ifmedia_ioctl() and can be detached. I'm pretty sure most of them don't have a big driver level lock. They usually have a single instance lock in their softc, though. But not sure existence of this lock allows to fix the race.

if_epair looks suspicious. I will verify ifmedia protection on other interfaces right now.

Just a friendly reminder that epair(4) is on the hot path for podman containers.

But there are many interfaces outside of netgraph that use ifmedia_ioctl() and can be detached. I'm pretty sure most of them don't have a big driver level lock. They usually have a single instance lock in their softc, though. But not sure existence of this lock allows to fix the race.

I'm now deeper in this problem.
You see, in sys/net these interfaces use ifmedia in their softc:

  • if_vxlan
  • if_lagg
  • if_epair

if_vxlan simply don't support vnets.
if_lagg already have its own lock.
However, this race will NOT happen in if_epair.
Why?
because it doesn't have any method that could be called outside the clone_destroy AND spawn a new thread to call ifmedia_ioctl (e.g. if_link_state_change).

Why it happens in ng_eiface?
because it has ng_eiface_disconnect function as its hook and it could be called outside the ng_eiface_rmnode.

Why is that a problem?
because inside ng_eiface_disconnect function it will call if_link_state_change and it will spawn a new thread that eventually calls dump_iface and reaches ifmedia_ioctl.

That's why it's not a problem for other modules.

I think the problem applies to any driver that uses ifmedia. Most of them are physical NICs, so exercising the race is much more difficult than with ng_eiface, as you need to either emulate device detach or indeed physically detach it. Maybe USB devices would be the most vulnerable to the problem. Now that you are deeper into the problem, do you agree with this assessment?

I think the problem applies to any driver that uses ifmedia. Most of them are physical NICs, so exercising the race is much more difficult than with ng_eiface, as you need to either emulate device detach or indeed physically detach it. Maybe USB devices would be the most vulnerable to the problem. Now that you are deeper into the problem, do you agree with this assessment?

Well, the test is harder than I initially expected.
My usb ethernet dongle is already unstable, see 55682.
and beside usb, unlike logical interface, it's hard to test in a VM too because I have to passthrough it.
Also, the test may not make sense at all. because someone would have to unplug and immediately detach its physical interface module.
Let me see what I can do about it.

I'm not asking you to reproduce it with a physical interface, but asking does your analysis of the problem match my analysis? If we agree that problem is more generic and potentially a physical NIC detach will produce same panic, there is no need to chase for reproduction on a physical device.

The problem should be addressed somehow (definitely not trivial) in a generic way and ng_eiface can be used as a canary that will prove that problem is fixed.

I'm not asking you to reproduce it with a physical interface, but asking does your analysis of the problem match my analysis?

Yes, it does.

If we agree that problem is more generic and potentially a physical NIC detach will produce same panic, there is no need to chase for reproduction on a physical device.
The problem should be addressed somehow (definitely not trivial) in a generic way and ng_eiface can be used as a canary that will prove that problem is fixed.

Before moving to a more general solution. I want proof for myself that this problem can occur in physical interfaces.
Physical drivers were basically a blackbox to me before 55694 (well, my first driver), and I have only recently become more familiar with them.
Give me time and I'll update you.

Thank you

I'm not asking you to reproduce it with a physical interface, but asking does your analysis of the problem match my analysis? If we agree that problem is more generic and potentially a physical NIC detach will produce same panic, there is no need to chase for reproduction on a physical device.

The problem should be addressed somehow (definitely not trivial) in a generic way and ng_eiface can be used as a canary that will prove that problem is fixed.

somehow, I managed to start and win a race between if_link_state_change/dump_iface and my hands detaching the physical interface :))))
Here is the result:

Fatal trap 12: page fault while in kernel mode
cpuid = 15; apic id = 39
fault virtual address   = 0x488
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80b926f9
stack pointer           = 0x28:0xfffffe022d4c18e0
frame pointer           = 0x28:0xfffffe022d4c1960
code segment            = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 67426 (ifconfig)
rdi: fffff80057cc9ae0 rsi: 0000000000000004 rdx: 0000000000000000
rcx: fffff80057cc9ac8  r8: fffff800536b0550  r9: fffffe022d4c2000
rax: 0000000000000000 rbx: fffff80057cc9ac8 rbp: fffffe022d4c1960
r10: 0000000000000003 r11: 0000000000002af8 r12: 0000000000000000
r13: fffff800536b0000 r14: fffffe022d4c1900 r15: fffff80057cc9ae0
trap number             = 12
panic: page fault
cpuid = 15
time = 1773267310
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022d4c1630
vpanic() at vpanic+0x136/frame 0xfffffe022d4c1760
panic() at panic+0x43/frame 0xfffffe022d4c17c0
trap_pfault() at trap_pfault+0x3cf/frame 0xfffffe022d4c1810
calltrap() at calltrap+0x8/frame 0xfffffe022d4c1810
--- trap 0xc, rip = 0xffffffff80b926f9, rsp = 0xfffffe022d4c18e0, rbp = 0xfffffe022d4c1960 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xc9/frame 0xfffffe022d4c1960
usbd_do_request_flags() at usbd_do_request_flags+0x7e8/frame 0xfffffe022d4c19e0
usbd_do_request_proc() at usbd_do_request_proc+0x5e/frame 0xfffffe022d4c1a30
ure_miibus_readreg() at ure_miibus_readreg+0x184/frame 0xfffffe022d4c1a90
rgephy_status() at rgephy_status+0x74/frame 0xfffffe022d4c1ad0
rgephy_service() at rgephy_service+0x323/frame 0xfffffe022d4c1b20
mii_pollstat() at mii_pollstat+0x57/frame 0xfffffe022d4c1b50
ure_ifmedia_sts() at ure_ifmedia_sts+0x184/frame 0xfffffe022d4c1ba0
ifmedia_ioctl() at ifmedia_ioctl+0x17e/frame 0xfffffe022d4c1bd0
ifioctl() at ifioctl+0x87e/frame 0xfffffe022d4c1cd0
kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe022d4c1d40
sys_ioctl() at sys_ioctl+0x101/frame 0xfffffe022d4c1e00
amd64_syscall() at amd64_syscall+0x126/frame 0xfffffe022d4c1f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe022d4c1f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x2dbc090bb2ea, rsp = 0x2dbc02133ec8, rbp = 0x2dbc02133ef0 ---
KDB: enter: panic

So here is our proof.
What do you think about protecting the ifmedia in general?
should we create a new global lock and acquire it in the middle of if_detach?

What do you think about protecting the ifmedia in general?
should we create a new global lock and acquire it in the middle of if_detach?

Of course, we should put more thought into this.
But this is what I could think of at first glance without a need to update ALL of the driver's softc.
So that was just a simple/stupid solution for a complex problem.