Page MenuHomeFreeBSD

Add domainset policy allocation for fpu_kern_ctx
ClosedPublic

Authored by cem on Oct 16 2019, 3:44 AM.

Details

Summary

Like other types of allocation, fpu_kern_ctx are frequently allocated per-cpu.
Provide the API and sketch some example consumers.

(Long term, maybe it makes more sense to just shove one of these in the DPCPU
area?)

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

DPCPU allocations might work - it looks like we only ever allocate one array per driver, so the allocation size is known at compile time. DPCPU has another advantage: ctx_fpu[] itself has to be accessed in the current scheme, and that may be a remote access.

DPCPU allocations might work - it looks like we only ever allocate one array per driver, so the allocation size is known at compile time.

It's mostly that we only ever want one of these per CPU (per driver), and by and large these drivers don't need to be preempted or stacked. So a single kernel context per CPU provided by the base system, for any accelerated driver to use, would remove some of the boilerplate of using SSE/AVX code on the correct memory domain.

DPCPU has another advantage: ctx_fpu[] itself has to be accessed in the current scheme, and that may be a remote access.

Sure, that's true too.

sys/amd64/amd64/fpu.c
1048 ↗(On Diff #63342)

I would suggest calling plain malloc() in this case instead. The use of RR here overrides policy that might be specified in the malloc zones, or in the calling thread, or in the kernel VM object.

cem marked an inline comment as done.
  • Use plain malloc for non-policy fpu alloc
markj added inline comments.
sys/amd64/amd64/fpu.c
1045 ↗(On Diff #71750)

I would pass the domain index here instead of the policy. In practice we'll only ever want to use the default policy (first-touch, implemented by malloc()/UMA), or request memory from a specific domain (DOMAINSET_PREF), and it makes the consumer KPI a bit lighter.

sys/crypto/blake2/blake2_cryptodev.c
147 ↗(On Diff #71750)

You might want to use DPCPU for the mutexes while you're here, so that they automatically get placed on domain-local memory. I see now that that doesn't quite work for the context structures since their size isn't known at compile-time, at least on x86, so you'd have to define a maximum size for the fpu context.

This revision is now accepted and ready to land.May 14 2020, 3:44 PM
sys/amd64/amd64/fpu.c
1045 ↗(On Diff #71750)

The general pattern for other _domainset variant functions has been to pass the policy; in callers this looks like fpu_kern_alloc_ctx_domainset(DOMAINSET_PREF(foo), 0) which seems clear to me. The alternative I think you're proposing is just making the PREF policy hardcoded in this function, and passing a plain domain?

That might be fine, but we might want to use a different name. fpu_kern_alloc_ctx_domain_pref is a bit unwieldy, and fpu_kern_alloc_ctx_domain elides the PREF policy.

I don't have any preference here.

If we use domainid_t, it looks like we need to pull in the entire sys/domainset.h header rather than just forward-declaring a struct.

sys/crypto/blake2/blake2_cryptodev.c
147 ↗(On Diff #71750)

Rather than do that in N places, I'd rather get this in, centralize the whole 'accelerated code using PCPU FPU contexts and locks' idiom, and refactor aesni and blake2 to both use it. We don't need per-driver versions of these.

sys/amd64/amd64/fpu.c
1045 ↗(On Diff #71750)

Right, I'd just pass an int domain.

fpu_kern_alloc_ctx_domain seems reasonable to me - the use of _PREF is really just an internal detail that handles the possibility of empty domains.

sys/crypto/blake2/blake2_cryptodev.c
147 ↗(On Diff #71750)

Sounds reasonable to me.

  • Just use a domain index for API
  • Don't break tinderbox on !amd64
This revision now requires review to proceed.May 14 2020, 5:23 PM
cem marked 2 inline comments as done.May 14 2020, 5:24 PM
This revision is now accepted and ready to land.May 14 2020, 5:59 PM
This revision was automatically updated to reflect the committed changes.