Details

Reviewers

jhb
imp

Commits

rG29e400e994ea: domain: make it safer to add domains post-domainfinalize
rG625932c9aa2c: hvsock: fail the probe on non-HyperV machines

Summary

I can see two concerns for adding domains after domainfinalize:

1.) The slow/fast callouts have already been set up.
2.) Userland could create a socket while we're in the middle of initialization.

We can address #1 fairly easily by tracking whether the domain has been initialized for at least the default vnet. There are still some concerns about the callbacks being invoked while a vnet is in the process of being created/destroyed, but this is a pre-existing issue that the callbacks must coordinate anyway.

We should address #2, but technically it has been an issue all along because we don't assert on post-domainfinalize additions; we just don't seem to hit it in practice.

Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them.
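The flag tracking described for concern #1 can be sketched roughly as follows. This is a single-threaded userland model, not the committed change: `struct domain_model`, `domain_add_model()`, `domain_init_model()`, `pfslowtimo_model()`, and the numeric value of `DOMF_INITED` are all assumptions made for illustration, and memory ordering is deliberately ignored here.

```c
#include <assert.h>
#include <stddef.h>

#define DOMF_INITED 0x0002	/* value is an assumption for this sketch */

/* Hypothetical, simplified stand-in for struct domain. */
struct domain_model {
	int dom_flags;
	void (*dom_slowtimo)(void);	/* real callbacks live in the protosw */
	struct domain_model *dom_next;
};

static struct domain_model *domains_model;	/* global domain list */
static int slowtimo_calls;

/* Example callback that only counts how often it runs. */
static void
count_slowtimo(void)
{
	slowtimo_calls++;
}

/* Link the domain into the global list; it is not usable yet. */
static void
domain_add_model(struct domain_model *dp)
{
	assert(dp != NULL);
	dp->dom_next = domains_model;
	domains_model = dp;
}

/* Protocol initialization would happen here; then mark the domain ready. */
static void
domain_init_model(struct domain_model *dp)
{
	assert(dp != NULL);
	dp->dom_flags |= DOMF_INITED;
}

/* The callout walk skips any domain that has not finished initializing. */
static void
pfslowtimo_model(void)
{
	struct domain_model *dp;

	for (dp = domains_model; dp != NULL; dp = dp->dom_next) {
		if ((dp->dom_flags & DOMF_INITED) == 0)
			continue;
		if (dp->dom_slowtimo != NULL)
			dp->dom_slowtimo();
	}
}
```

With this shape, a domain added after the callouts are already running is simply invisible to them until its init step completes.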

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kevans requested review of this revision. Jun 26 2020, 2:50 AM

kevans created this revision.

Harbormaster completed remote builds in B31969: Diff 73680. Jun 26 2020, 2:50 AM

kevans added a parent revision: D25062: domain: give domains a chance to probe for availability. Jun 26 2020, 2:55 AM

Actually, why do we add the domain to the list via domain_add() before calling domain_init()? If we do domain_init() before domain_add() then doesn't that fix the issue? It would mean swapping the order of the SYSINIT's in DOMAIN_SET and VNET_DOMAIN_SET, but presumably none of the domain init routines really care that they are on the global list yet?

sys/kern/uipc_domain.c
194 ↗	(On Diff #73680)	Do you need a fence or the like here? There's no lock protecting this value. @kib might have some suggestions

In D25459#562323, @jhb wrote:

Actually, why do we add the domain to the list via domain_add() before calling domain_init()? If we do domain_init() before domain_add() then doesn't that fix the issue? It would mean swapping the order of the SYSINIT's in DOMAIN_SET and VNET_DOMAIN_SET, but presumably none of the domain init routines really care that they are on the global list yet?

Great question. I wondered about it myself when looking at it, but it was clearly deliberate so I didn't dig much into it.

kib added inline comments. Jun 26 2020, 7:19 PM

sys/kern/uipc_domain.c
194 ↗	(On Diff #73680)	Formally this should be an atomic OR with release semantics, and reads of dom_flags then need acquire semantics to guarantee that they observe every update made before DOMF_INITED was set (unless I misunderstood the algorithm).

I suspect it is less deliberate and it's probably worth looking into to see if we can reverse them. To be honest, what most other subsystems would do is use a single SYSINIT that initialized the object and then added it to the list (e.g. the module handler for device driver modules). If domains were per-VNET that would be trivial. As it is, you probably want to init on all VNETs first via the existing separate SYSINITs before the domain_add.

In D25459#562400, @jhb wrote:

I suspect it is less deliberate and it's probably worth looking into to see if we can reverse them. To be honest, what most other subsystems would do is use a single SYSINIT that initialized the object and then added it to the list (e.g. the module handler for device driver modules). If domains were per-VNET that would be trivial. As it is, you probably want to init on all VNETs first via the existing separate SYSINITs before the domain_add.

Sure, I'll investigate this angle more thoroughly.

I briefly looked at per-VNET domains and decided that was probably an incredibly slippery slope. It'd be trivial for socreate() to compensate, but to do it right I think you'd need per-VNET slow/fast callouts lest you face another synchronization nightmare.

OK, the problem with reversing them is that some pr_init will want to pffindproto() a protocol from the domain they're a part of (e.g. ip_init/ip6_init), and that's not necessarily trivial to resolve at first blush.

Sprinkle some atomics into place

Harbormaster completed remote builds in B33990: Diff 77872. Oct 5 2020, 12:33 AM

kib added inline comments. Oct 5 2020, 1:56 AM

sys/kern/uipc_domain.c
493 ↗	(On Diff #77872)	You can micro-optimize like this (for non-x86 arches):

	if ((atomic_load_int(&dp->dom_flags) & DOMF_INITED) == 0)
		continue;
	atomic_thread_fence_acq();
	for (pr = dp->dom_protosw; ....
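For illustration, the same pattern can be modelled in userland with C11 atomics: a relaxed load on the common "skip" path, and an acquire fence paid only for domains that are actually initialized. This is a sketch under assumptions, not the code in the diff; the struct layout, `dom_nprotos`, the helper names, and the flag value are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DOMF_INITED 0x0002u	/* value is an assumption for this sketch */

/* Hypothetical, simplified stand-in for struct domain. */
struct domain_model {
	atomic_uint dom_flags;
	int dom_nprotos;	/* stand-in for the protosw array */
	struct domain_model *dom_next;
};

static void
domain_model_init(struct domain_model *dp, int nprotos,
    struct domain_model *next)
{
	assert(dp != NULL);
	atomic_init(&dp->dom_flags, 0);
	dp->dom_nprotos = nprotos;
	dp->dom_next = next;
}

/* Writer side: release store of the flag after init is complete. */
static void
domain_mark_inited(struct domain_model *dp)
{
	atomic_fetch_or_explicit(&dp->dom_flags, DOMF_INITED,
	    memory_order_release);
}

/* Sum the protocol counts of initialized domains only. */
static int
count_inited_protos(struct domain_model *head)
{
	struct domain_model *dp;
	int total = 0;

	for (dp = head; dp != NULL; dp = dp->dom_next) {
		/* Cheap relaxed load; uninitialized domains pay no fence. */
		if ((atomic_load_explicit(&dp->dom_flags,
		    memory_order_relaxed) & DOMF_INITED) == 0)
			continue;
		/* Order the reads below after the flag load. */
		atomic_thread_fence(memory_order_acquire);
		total += dp->dom_nprotos;
	}
	return (total);
}
```

The relaxed load followed by an acquire fence gives the same ordering guarantee as an acquire load, but the fence cost is only incurred when the flag is set.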

Apply micro-optimization in both loops

Harbormaster completed remote builds in B34035: Diff 77975. Oct 7 2020, 2:59 PM

It looks fine to me from the algorithmic PoV, but I do not have good understanding of the domain lifetime.

Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them.

You can create 2 new lists: one for domains with fasttimo and another for slowtimo and only insert to them after everything is initialized. Then you don't have to branch on anything, albeit this still does not address unload.

If you convert list iteration to use CK macros, unlink at some point and only do actual destruction after the epoch ends you will get safe unload. I don't know what would be needed to get there in this code though.
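The two-list idea above might look roughly like the following single-threaded sketch. Everything here is hypothetical: the real timeout callbacks are per-protosw rather than per-domain, and neither CK macros nor epoch-based unload are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct domain. */
struct domain_model {
	void (*dom_fasttimo)(void);
	void (*dom_slowtimo)(void);
	struct domain_model *fast_next;	/* linkage on the fasttimo list */
	struct domain_model *slow_next;	/* linkage on the slowtimo list */
};

static struct domain_model *fasttimo_list;
static struct domain_model *slowtimo_list;
static int fast_calls, slow_calls;

/* Example callbacks that only count invocations. */
static void tick_fast(void) { fast_calls++; }
static void tick_slow(void) { slow_calls++; }

/*
 * Called only after the domain is fully initialized: insert it on each
 * list for which it actually has a callback.
 */
static void
domain_timo_register(struct domain_model *dp)
{
	assert(dp != NULL);
	if (dp->dom_fasttimo != NULL) {
		dp->fast_next = fasttimo_list;
		fasttimo_list = dp;
	}
	if (dp->dom_slowtimo != NULL) {
		dp->slow_next = slowtimo_list;
		slowtimo_list = dp;
	}
}

/* No flag checks or NULL checks are needed while walking the lists. */
static void
pffasttimo_model(void)
{
	struct domain_model *dp;

	for (dp = fasttimo_list; dp != NULL; dp = dp->fast_next)
		dp->dom_fasttimo();
}

static void
pfslowtimo_model(void)
{
	struct domain_model *dp;

	for (dp = slowtimo_list; dp != NULL; dp = dp->slow_next)
		dp->dom_slowtimo();
}
```

Because insertion happens only after initialization, the timer walks never see a partially constructed entry and do not need to branch on a flag.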

In D25459#595135, @mjg wrote:

Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them.

You can create 2 new lists: one for domains with fasttimo and another for slowtimo and only insert to them after everything is initialized. Then you don't have to branch on anything, albeit this still does not address unload.

If you convert list iteration to use CK macros, unlink at some point and only do actual destruction after the epoch ends you will get safe unload. I don't know what would be needed to get there in this code though.

I'm going to waffle on this a little bit. I'll still need a flag to indicate that it's initialized to address the other not-yet-addressed race between socket creation and domain initialization. But, adding two lists is easy and would save us an annoying-ish amount of overhead from having to walk all domain+protocols for the two protosw in-tree that even have pr_fasttimo.

You can keep the flag and add the 2 lists anyway. The unload handling can be added later.

kevans mentioned this in D26709: domain: explicitly track domains with fasttimo/slowtimo callbacks. Oct 7 2020, 6:17 PM

kevans added a child revision: D26709: domain: explicitly track domains with fasttimo/slowtimo callbacks.

I've split that suggestion out to D26709, since it's a little invasive; pf_proto_{,un}register must be taken into account in case a protocol with fasttimo/slowtimo is added after the domain is constructed.

markj mentioned this in D25062: domain: give domains a chance to probe for availability. Oct 13 2020, 12:44 PM

Defer the setting of pr->pr_protocol at deorbit time until the end, and throw an atomic_thread_fence_rel() before it. It seems generally OK to do this locklessly, as long as the other stores have all completed before pf_proto_register() observes the entry as a PR_SPACER as it will immediately bcopy() into the spacer.

pf_proto_register could fail if there's a deregistration in process that would otherwise be freeing up the spacer, but hasn't yet because it's waiting on the epoch. This seems OK.

Harbormaster completed remote builds in B34168: Diff 78227. Oct 14 2020, 5:54 PM

Sorry, wrong review. :(

Harbormaster completed remote builds in B34169: Diff 78230. Oct 14 2020, 5:56 PM

This revision was not accepted when it landed; it landed in state Needs Review. Aug 16 2021, 6:09 AM

Closed by commit rG625932c9aa2c: hvsock: fail the probe on non-HyperV machines (authored by kevans).

This revision was automatically updated to reflect the committed changes.

kevans added a commit: rG625932c9aa2c: hvsock: fail the probe on non-HyperV machines.

kevans added a commit: rG29e400e994ea: domain: make it safer to add domains post-domainfinalize.