Page MenuHomeFreeBSD

domain: make it safer to add domains post-domainfinalize
ClosedPublic

Authored by kevans on Jun 26 2020, 2:50 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Apr 20, 1:53 AM
Unknown Object (File)
Feb 10 2024, 5:51 AM
Unknown Object (File)
Jan 18 2024, 5:05 PM
Unknown Object (File)
Jan 3 2024, 3:50 AM
Unknown Object (File)
Jan 3 2024, 3:50 AM
Unknown Object (File)
Jan 3 2024, 3:50 AM
Unknown Object (File)
Jan 3 2024, 3:50 AM
Unknown Object (File)
Dec 23 2023, 9:39 AM
Subscribers

Details

Summary

I can see two concerns for adding domains after domainfinalize:

1.) The slow/fast callouts have already been setup.
2.) Userland could create a socket while we're in the middle of initialization.

We can address #1 fairly easily by tracking whether the domain's been initialized for at least the default vnet. There are still some concerns about the callbacks being invoked while a vnet is in the process of being created/destroyed, but this is a pre-existing issue that the callbacks must coordinate anyways.

#2 we should address, but technically this has been an issue anyways because we don't assert on post-domainfinalize additions; we don't seem to hit it in practice.

Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kevans created this revision.

Actually, why do we add the domain to the list via domain_add() before calling domain_init()? If we do domain_init() before domain_add() then doesn't that fix the issue? It would mean swapping the order of the SYSINIT's in DOMAIN_SET and VNET_DOMAIN_SET, but presumably none of the domain init routines really care that they are on the global list yet?

sys/kern/uipc_domain.c
194 ↗(On Diff #73680)

Do you need a fence or the like here? There's no lock protecting this value. @kib might have some suggestions

In D25459#562323, @jhb wrote:

Actually, why do we add the domain to the list via domain_add() before calling domain_init()? If we do domain_init() before domain_add() then doesn't that fix the issue? It would mean swapping the order of the SYSINIT's in DOMAIN_SET and VNET_DOMAIN_SET, but presumably none of the domain init routines really care that they are on the global list yet?

Great question- I wondered myself when looking at it, but it was clearly deliberate so I didn't dig much into it.

sys/kern/uipc_domain.c
194 ↗(On Diff #73680)

Formally this should be an atomic or with release semantic, and then reads of dom_flags need acquire for guaranteed observation of updates prior to the stage where DOMF_INITED was set (unless I misunderstood the algorithm).

I suspect it is less deliberate and it's probably worth looking into to see if we can reverse them. To be honest, what most other subsystems would do is use a single SYSINIT that initialized the object and then added it to the list (e.g. the module handler for device driver modules). If domains were per-VNET that would be trivial. As it is, you probably want to init on all VNETs first via the existing separate SYSINITs before the domain_add.

In D25459#562400, @jhb wrote:

I suspect it is less deliberate and it's probably worth looking into to see if we can reverse them. To be honest, what most other subsystems would do is use a single SYSINIT that initialized the object and then added it to the list (e.g. the module handler for device driver modules). If domains were per-VNET that would be trivial. As it is, you probably want to init on all VNETs first via the existing separate SYSINITs before the domain_add.

Sure- I'll investigate this angle more thoroughly.

I briefly looked at per-VNET domains and decided that was probably an incredibly slippery slope. It'd be trivial for socreate() to compensate, but to do it right I think you'd need per-VNET slow/fast callouts lest you face another synchronization nightmare.

OK, the problem with reversing them is that some pr_init will want to pffindproto() a protocol from the domain they're a part of (e.g. ip_init/ip6_init), and that's not necessarily trivial to resolve at first blush.

Sprinkle some atomics into place

sys/kern/uipc_domain.c
493 ↗(On Diff #77872)

You can micro-optimize like this (for non-x86 arches):

if ((atomic_load_int(&dp->dom_flags) & DOMF_INITED) == 0)
   continue;
atomic_thread_fence_acq();
for (pr = dp->dom_protosw; ....

Apply micro-optimization in both loops

It looks fine to me from the algorithmic PoV, but I do not have good understanding of the domain lifetime.

Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them.

You can create 2 new lists: one for domains with fasttimo and another for slowtimo and only insert to them after everything is initialized. Then you don't have to branch on anything, albeit this still does not address unload.

If you convert list iteration to use CK macros, unlink at some point and only do actual destruction after the epoch ends you will get safe unload. I don't know what would be needed to get there in this code though.

In D25459#595135, @mjg wrote:

Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them.

You can create 2 new lists: one for domains with fasttimo and another for slowtimo and only insert to them after everything is initialized. Then you don't have to branch on anything, albeit this still does not address unload.

If you convert list iteration to use CK macros, unlink at some point and only do actual destruction after the epoch ends you will get safe unload. I don't know what would be needed to get there in this code though.

I'm going to waffle on this a little bit. I'll still need a flag to indicate that it's initialized to address the other not-yet-addressed race between socket creation and domain initialization. But, adding two lists is easy and would save us an annoying-ish amount of overhead from having to walk all domain+protocols for the two protosw in-tree that even have pr_fasttimo.

You can keep the flag and add the 2 lists anyway. The unload handling can be added later.

I've split that suggestion out to D26709, since it's a little invasive; pf_proto_{,un}register must be taken into account in case a protocol with fasttimo/slowtimo is added after the domain is constructed.

Defer the setting of pr->pr_protocol at deorbit time until the end, and throw an atomic_thread_fence_rel() before it. It seems generally OK to do this locklessly, as long as the other stores have all completed before pf_proto_register() observes the entry as a PR_SPACER as it will immediately bcopy() into the spacer.

pf_proto_register could fail if there's a deregistration in process that would otherwise be freeing up the spacer, but hasn't yet because it's waiting on the epoch. This seems OK.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 16 2021, 6:09 AM
This revision was automatically updated to reflect the committed changes.