Page MenuHomeFreeBSD

if_bridge(4): Fix module teardown

Authored by kevans on Mar 14 2019, 1:42 AM.



Currently, the following steps will lead to an uma panic:

kldload if_bridge
ifconfig bridge0 create
ifconfig bridge0 addm wlan0 addm em0 up
dhclient em0
kldunload if_bridge

Because bridge_rtnode_zone has outstanding allocations at the time of destruction.

Fix this by explicitly destroying the bridge interfaces at UNINIT. I chose to (attempt) to virtualize the uma zone these allocations come from to simplify things -- trying to reason through how best to destroy the uma zone once all of the bridges were cleaned up across all of the vnets gave me a bit of a headache. I am not completely sure how successful I was, but it looked at a glance like all of the bridge_rt* functions are called in a proper vnet context such that I didn't need to CURVNET_SET (ioctl handlers, IIRC).

Additionally, bridge_rtable_fini now flushes out the routing table to free the memory referenced in the initial panic.

Diff Detail

rS FreeBSD src repository
Lint Skipped
Unit Tests Skipped
Build Status
Buildable 23077

Event Timeline

kevans created this revision.Mar 14 2019, 1:42 AM

I guess I should mention that my original example isn't actually accurate to what I'm doing -- I've got an interface like OpenBSD's if_vether that I'm throwing in the bridge rather than em0. That doesn't work for other reasons, but it was clear that the panic was legit.

kevans updated this revision to Diff 55046.Mar 14 2019, 4:36 AM

Sorry, did the dumb there... ripped out the explicit teardown of the clones, since that actually happens. bridge_rtnode_zone remains virtualized, because that's still the cleanest way -- the clones get destroyed upon if_clone_detach, which happens after the MOD_UNLOAD event.

The bridge_rtnode_destroy reference does need CURVNET magic, because bridge_timer -> bridge_rtage -> bridge_rtnode_destroy isn't already setting the proper vnet. bridge_rtupdate is only called from bridge_input/bridge_forward/bridge_ioctl.

I'll try to take a more in-depth look later today.


Do you know where we call into bridge_rtnode_destroy() without a vnet context set?
Is it a timeout callback? In that case it might be better to set the context in the callback itself.

kevans added inline comments.Mar 14 2019, 11:41 AM

Yeah, it's from a timeout callback. I'll push this back up into it and upload a new version soon.

kevans updated this revision to Diff 55056.Mar 14 2019, 1:11 PM

Push vnet context up into the callout callback

Do we need to virtualise the bridge_rtnode_zone? Doesn't the cleanup call to bridge_rtflush() take care of all of the allocations already?


It makes no real difference here, but I"d be inclined to put the restore at the end of the function. Mostly in case someone else comes by later and adds something which does need to be in the correct VNET context.

Do we need to virtualise the bridge_rtnode_zone? Doesn't the cleanup call to bridge_rtflush() take care of all of the allocations already?

It would take care of it, but bridge_rtflush (spawned from VNET_SYSUNINIT) happens -after- the destruction of bridge_rtnode_zone (MOD_UNLOAD), so we have an ordering problem.

kristof accepted this revision.Mar 14 2019, 6:00 PM

Oh yes, of course.

I've fixed a couple of other bugs where people forgot or did not know that the vnet uninit flow is a little unusual (i.e. last vnet uninit happens after unload) so I should have realised.

This revision is now accepted and ready to land.Mar 14 2019, 6:00 PM
This revision was automatically updated to reflect the committed changes.