Page MenuHomeFreeBSD

Introduce CTLFLAG_NEEDGIANT
AbandonedPublic

Authored by kaktus on Jan 9 2020, 7:03 PM.
Tags
None
Referenced Files
F103630837: D23110.id66847.largetrue.diff
Wed, Nov 27, 9:28 AM
Unknown Object (File)
Wed, Nov 20, 9:03 AM
Unknown Object (File)
Sat, Nov 16, 6:11 AM
Unknown Object (File)
Fri, Nov 15, 2:15 PM
Unknown Object (File)
Mon, Nov 11, 4:02 PM
Unknown Object (File)
Wed, Nov 6, 12:00 PM
Unknown Object (File)
Sun, Nov 3, 3:34 AM
Unknown Object (File)
Fri, Nov 1, 3:51 PM

Details

Summary

Introduce the CTLFLAG_NEEDGIANT flag (modelled after D_NEEDGIANT) that can be used to name and shame sysctls still requiring Giant ;-)

Set CTLFLAG_NEEDGIANT to all SYSCTL_(ADD_)PROCs not marked as MPSAFE.

Mark all SYSCTL_(ADD_)NODEs using custom handler and not already marked as MPSAFE as CTLFLAG_NEEDGIANT.

Mark all various handcrafted SYSCTL_(ADD_)OID etc invocation I've seen in the code.

Add a static_assert to check for one of the now required flags.

Full diff with -U999999 available at https://people.freebsd.org/~kaktus/sysctl.patch because it's getting close to 18MB.

TODO:
Check kmods in ports?

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I like it. The only thing that could be better in CTASSERT that exactly one of MPSAFE or NEEDGIANT is set.

I disagree with this approach.

Instead I suggest the following:
0. on registration add a warning that given sysctl is not marked as mpsafe [optional]

  1. keep converting existing sysctls and adding the CTLFLAG_MPSAFE flag. worst case giant can be pushed into specific handlers
  2. once it seems everything is converted require the flag to be set for sysctl to be registered, special handling for giant is removed from sysctl itself
  3. phase out the flag some time later. coccinelle can easily do it.

The advantage of this approach is that it lets us know what has been audited at compile time...

I don't see any benefit. "general" kernel does not use giant and everything there is almost guaranteed to be trivially annotated as _MPSAFE (just someone has to inspect all cases). the only fishy stuff may be in syscons or some drivers. If in doubt, giant handling can be pushed into these handlers. Basically almost everything inspected is instantly resolvable one way or the other.

Setting aside the issue of whether having a flag is a good idea, I'm not sure this is the right name. Really what we're trying to indicate after we get rid of Giant is whether these sysctls are reentrant or not (and need external mutual exclusion)... right? Maybe something like NEEDMUTEX rather than Giant in particular? /bikeshed

(As always, I might be totally mistaken. Let me know if I’m wrong and hopefully I learn something.)

Scenario where a sysctl ends up being called from another sysctl like this sounds like a programming error.

In D23110#506565, @mjg wrote:

Scenario where a sysctl ends up being called from another sysctl like this sounds like a programming error.

Reentrant was the wrong word. Threadsafe? Concurrent? Something like that.

Fix few typos and be more consistent on style(9).

I'm not going to die for a name, but! we still have D_NEEDGIANT and the name still describe what is being done in this case.

Also, based on my (and probably not only mine) experience, if there is no whining there is little incentive to work on boring fixes and this is what most of the work here will be.

In D23110#506566, @cem wrote:
In D23110#506565, @mjg wrote:

Scenario where a sysctl ends up being called from another sysctl like this sounds like a programming error.

Reentrant was the wrong word. Threadsafe? Concurrent? Something like that.

mpsafe sysctls already have to provide their own locking in this regard. the only guarantee provided by the sysctl mechanism is that the handler will not get unloaded while executed

Also, based on my (and probably not only mine) experience, if there is no whining there is little incentive to work on boring fixes and this is what most of the work here will be.

Every single case to be inspected by hand immediately gets CTLFLAG_MPSAFE. Either it did not need giant to begin with or (in the worst case) you can put giant locking into the handler. As such, the proposed flag does not add any value. It's just extra level of churn before the entire ordeal is removed.

In D23110#506563, @cem wrote:

Setting aside the issue of whether having a flag is a good idea, I'm not sure this is the right name. Really what we're trying to indicate after we get rid of Giant is whether these sysctls are reentrant or not (and need external mutual exclusion)... right? Maybe something like NEEDMUTEX rather than Giant in particular? /bikeshed

(As always, I might be totally mistaken. Let me know if I’m wrong and hopefully I learn something.)

Yea, but the NEEDGIANT name means a grep

In D23110#506589, @mjg wrote:

Also, based on my (and probably not only mine) experience, if there is no whining there is little incentive to work on boring fixes and this is what most of the work here will be.

Every single case to be inspected by hand immediately gets CTLFLAG_MPSAFE. Either it did not need giant to begin with or (in the worst case) you can put giant locking into the handler. As such, the proposed flag does not add any value. It's just extra level of churn before the entire ordeal is removed.

Except that assumes one person is doing the auditing. There will be several and the tag let's them all run asynchronously as well as the fix. Churn in the modern SCM doesn't matter. We have high rates of normal change, much of it changing things that have been changed since the last branch. Churn mattered on CVS, but doesn't, really, if done right.

In D23110#506612, @imp wrote:
In D23110#506589, @mjg wrote:

Every single case to be inspected by hand immediately gets CTLFLAG_MPSAFE. Either it did not need giant to begin with or (in the worst case) you can put giant locking into the handler. As such, the proposed flag does not add any value. It's just extra level of churn before the entire ordeal is removed.

Except that assumes one person is doing the auditing. There will be several and the tag let's them all run asynchronously as well as the fix. Churn in the modern SCM doesn't matter. We have high rates of normal change, much of it changing things that have been changed since the last branch. Churn mattered on CVS, but doesn't, really, if done right.

With the patch as proposed here people would have to look for CTLFLAG_NEEDGIANT sysctls. With stock kernel people have to look for sysctls *without* the CTLFLAG_MPSAFE flag. Both cases have the same coopoeration issues, just the method of figuring out whether given handler needs to be worked on is different.

In D23110#506613, @mjg wrote:
In D23110#506612, @imp wrote:
In D23110#506589, @mjg wrote:

Every single case to be inspected by hand immediately gets CTLFLAG_MPSAFE. Either it did not need giant to begin with or (in the worst case) you can put giant locking into the handler. As such, the proposed flag does not add any value. It's just extra level of churn before the entire ordeal is removed.

Except that assumes one person is doing the auditing. There will be several and the tag let's them all run asynchronously as well as the fix. Churn in the modern SCM doesn't matter. We have high rates of normal change, much of it changing things that have been changed since the last branch. Churn mattered on CVS, but doesn't, really, if done right.

With the patch as proposed here people would have to look for CTLFLAG_NEEDGIANT sysctls.

RIght. Another fruit of this work would be audit of all places which have this flag set, and handling of trivial cases where the flag can be removed.

So the result should be an material and easy to asses estimation of the efforts to clean sysctls from Giant.

With stock kernel people have to look for sysctls *without* the CTLFLAG_MPSAFE flag. Both cases have the same coopoeration issues, just the method of figuring out whether given handler needs to be worked on is different.

No, grepping for absence in multi-line vague text is not a solution. More, there is a lot of things to grep, e.g. dynamic sysctl oids allocation and so on. Also, I object against fixing cases of NEEDGIANT by pushing Giant into the handler, there is no point doing this. Handlers must be fixed to not require Giant, in the course of mpsafe work.

I do want sysctls be on par with cdevsw in this regard, where it is absolutely obvious when needs Giant and what does not.

But again, the main benefit of tht work would be the audit, and I was both surprised and pleased when I saw that kaktus started it.

sys/x86/x86/cpu_machdep.c
745

This probably can get rid of NEEDGIANT.

792

This probably can get rid of NEEDGIANT.

sys/x86/x86/intr_machdep.c
754

This probably can get rid of NEEDGIANT.

sys/x86/x86/tsc.c
755

This NEEDGIANT looks spurious, it can be removed in the next step.

[I stopped auditing additions of NEEDGIANT but almost every appearance looks spurious]

Yes, many of the sysctls can be trivially be made / already are but aren't marked as MPSAFE but this was mostly sed + vim macros + a bit of manual fixes to add the flag where MPSAFE isn't set (yet).
The question is: should fixing this be done in this (already huge) diff or as a separate work later.

Yes, many of the sysctls can be trivially be made / already are but aren't marked as MPSAFE but this was mostly sed + vim macros + a bit of manual fixes to add the flag where MPSAFE isn't set (yet).
The question is: should fixing this be done in this (already huge) diff or as a separate work later.

Of course adding MPSAFE (or removing NEEDGIANT) should be done in the next patch. I listed some of them only to demonstrate how your work to add NEEDGIANT causes easy and straightforward fixes.

A question because I should update my tools.
If a node has not the CLT_MFSAFE flags then it has to have the CTL_NEEDGIANT, is it right?
(So I should add a new "flag column" to deskutils/sysctlview and handle/show the new flag with "nsysctl -aNG" sysutils/nsysctl)

Thank you in advance

In D23110#506672, @kib wrote:
In D23110#506613, @mjg wrote:
In D23110#506612, @imp wrote:
In D23110#506589, @mjg wrote:

Every single case to be inspected by hand immediately gets CTLFLAG_MPSAFE. Either it did not need giant to begin with or (in the worst case) you can put giant locking into the handler. As such, the proposed flag does not add any value. It's just extra level of churn before the entire ordeal is removed.

Except that assumes one person is doing the auditing. There will be several and the tag let's them all run asynchronously as well as the fix. Churn in the modern SCM doesn't matter. We have high rates of normal change, much of it changing things that have been changed since the last branch. Churn mattered on CVS, but doesn't, really, if done right.

With the patch as proposed here people would have to look for CTLFLAG_NEEDGIANT sysctls.

RIght. Another fruit of this work would be audit of all places which have this flag set, and handling of trivial cases where the flag can be removed.

So the result should be an material and easy to asses estimation of the efforts to clean sysctls from Giant.

But there is no need to patch the kernel to get that information. See below.

With stock kernel people have to look for sysctls *without* the CTLFLAG_MPSAFE flag. Both cases have the same coopoeration issues, just the method of figuring out whether given handler needs to be worked on is different.

No, grepping for absence in multi-line vague text is not a solution. More, there is a lot of things to grep, e.g. dynamic sysctl oids allocation and so on.

I don't know what kind of script kaktus used to generate his patch, I hacked up the following:

find . -name '*.[ch]' | xargs -P1 awk '/SYSCTL_PROC/ { ins = 1; } /SYSCTL_ADD_PROC/ { ins = 1; } ins { line = line $0; } /;/ && ins { ins = 0; print FILENAME " " line; line = "" }' | grep -v MPSAFE

Prints stuff like this:

./fs/nfsclient/nfs_clnfsiod.c SYSCTL_PROC(_vfs_nfs, OID_AUTO, iodmin, CTLTYPE_UINT | CTLFLAG_RW, 0, sizeof (nfs_iodmin), sysctl_iodmin, "IU", "Min number of nfsiod kthreads to keep as spares");
./fs/nfsclient/nfs_clnfsiod.c SYSCTL_PROC(_vfs_nfs, OID_AUTO, iodmax, CTLTYPE_UINT | CTLFLAG_RW, 0, sizeof (ncl_iodmax), sysctl_iodmax, "IU", "Max number of nfsiod kthreads");
./fs/nfsserver/nfs_nfsdport.c SYSCTL_PROC(_vfs_nfsd, OID_AUTO, dsdirsize, CTLTYPE_UINT | CTLFLAG_RW, 0, sizeof(nfsrv_dsdirsize), sysctl_dsdirsize, "IU", "Number of dsN subdirs on the DS servers");

It probably misses entries but should catch everything seen in this patch and more. Everyone interested in removing some of this stuff can just run it.

Once nothing in the tree is found like this one can add a CTASSERT the flag is set and a runtime check to panic during registration of sysctls which don't have it.

Also interested people can run dtrace to see what is taking giant at runtime on their boxes:

dtrace -n 'fbt::sysctl_root_handler_locked:entry /(args[0]->oid_kind & 0x00040000) == 0/ { @[stringof(args[0]->oid_name), args[0]->oid_descr[0] != 0 ? stringof(args[0]->oid_descr) : "NULL", sym((uintptr_t)args[0]->oid_handler)] = count(); }' -o out
sysctl -a > /dev/null

I cleaned up some immediate offenders.

Also, I object against fixing cases of NEEDGIANT by pushing Giant into the handler, there is no point doing this. Handlers must be fixed to not require Giant, in the course of mpsafe work.

I don't suggest just plopping Giant in the handler. Rather vast majority of cases will straight up just need the annotation added. I expect few problematic cases where Giant is actually used and I don't think they should be holding off removal of giant support from sysctl itself.

I do want sysctls be on par with cdevsw in this regard, where it is absolutely obvious when needs Giant and what does not.

But it is clear (but the condition is inverted). Instead of grepping for a flag one has to grep for lack of the flag, which as noted above is not a problem.

But again, the main benefit of tht work would be the audit, and I was both surprised and pleased when I saw that kaktus started it.

In D23110#506834, @alfix86_gmail.com wrote:

A question because I should update my tools.
If a node has not the CLT_MFSAFE flags then it has to have the CTL_NEEDGIANT, is it right?

Exactly.

kaktus edited the summary of this revision. (Show Details)

Mark all SYSCTL_NODEs using custom handler and not already marked as MPSAFE as CTLFLAG_NEEDGIANT.
Mark all SYSCTL_NODEs with NULL handler as MPSAFE.

Sync with HEAD.

@mjg You are completely missing the point I think. grep -r CTLFLAG_NEEDGIANT is much easier for many folks to run (and remember, and derive off the top of their head) than the script you came up with. FWIW, I would like to use a similar approach with bus_setup_intr() to tag the ones that still need giant explicitly to make them easier to find. It is true that when finishing marking syscalls, I did not add a Giant flag but just converted them all over, but that was a much smaller set of things to examine. I'd much rather us at this point move to a model where we tag things as needing Giant rather than not needing Giant.

Sorry, I'm lost. What is the purpose of this flag?

If it has a real value for the code path (define a specific functionaliy, which is not already present), I'm fine (modulo naming discussions).

If it is only annotating something for a group of deverlopers, I'd suggest an inline comment in all the files.
If the group of interested developers is not significant, I'd suggest a separate todo-file somewhere in the development area containing all the syscalls in question.

In D23110#508042, @jhb wrote:

@mjg You are completely missing the point I think. grep -r CTLFLAG_NEEDGIANT is much easier for many folks to run (and remember, and derive off the top of their head) than the script you came up with.

I'm pretty sure I'm not missing the point. Giant-locked handlers are still in the tree mostly because there was no to push to take care of them. Given someone's idea to remove Giant before 13.0 ships now there is a push to do it. The very same people who would have to be told to grep for CTLFLAG_NEEDGIANT can be told to run the one-liner I provided above. I really don't see how this is supposed to be more problematic. Presumably there would be an e-mail urging people to clear this up, then they can easily reference it if needed. Or if they use the one-liner they will probably easily find it later in their shell history.

In reality I suspect a small group of people will take care of majority of all handlers, 3-4 people will fix some instances on their own volition and the rest will have to be chased down to either remove an obscure driver or ungiantify it (along with a diff which pushes giant into the handler).

FWIW, I would like to use a similar approach with bus_setup_intr() to tag the ones that still need giant explicitly to make them easier to find.

I have no opinion about that code.

For sysctls, vast majority of handlers is already MPSAFE but not yet annotated as such and this is why I think the new flag does not add anything.

In D23110#508108, @lutz_donnerhacke.de wrote:

If it is only annotating something for a group of deverlopers, I'd suggest an inline comment in all the files.

It's hard to assert on a comment line.

kaktus edited the summary of this revision. (Show Details)
kaktus added a reviewer: x11.

Mark everything that was left since the last revision.

Add a static_assert to enforce one of the now required flags.

This diff was generated with -U20 as the full one is close to 37MB now, available at https://people.freebsd.org/~kaktus/sysctl.patch

sys/sys/sysctl.h
272

Might be, use xor instead of or ?

360

Why string oids need Giant ?

763

Same q for opaque.

Because the write path in sysctl_handle_string() and sysctl_handle_opaque() isn't locked yet?

Because the write path in sysctl_handle_string() and sysctl_handle_opaque() isn't locked yet?

Are you sure that it makes any sense/difference ? Note that sysctl_new_in() is just copyin, so kernel fetches something user-controllable directly into the location.

Anyway, if you consider it is useful to add a locking there, do it in this patch, please.

kaktus edited the summary of this revision. (Show Details)

Regen.

Create macro to add MPSAFE flag for SYSCTL_NODEs that are not marked as NEEDGIANT instead of correcting every single entry.

Add locking to sysctl_handle_string() to handle writeable strings. Micro optimise and handle sysctl not marked as writeable as read-only regardless of not being set as CONST_STRING.

All writeable SYSCTL_STRINGs should be migrated to SYSCTL_PROCs and handle locking for their own instead of relying on default handler to protect the write path - to be done in other patches together with marking a lot of sysctls as MPSAFE already.

Mark SYSCTL_OPAQUE as MPSAFE.

Add a man page entry for sysctl(9).

Use xor in the static_assert.

Don't mark sysctl's that are already MPSAFE as NEEDGIANT.

bcr added a subscriber: bcr.

OK for the man page change.

Oops! Include proper diff.

D23110#506589, @mjg wrote:
Also interested people can run dtrace to see what is taking giant at runtime on their boxes:

dtrace -n 'fbt::sysctl_root_handler_locked:entry /(args[0]->oid_kind & 0x00040000) == 0/ { @[stringof(args[0]->oid_name), args[0]->oid_descr[0] != 0 ? stringof(args[0]->oid_descr) : "NULL", sym((uintptr_t)args[0]->oid_handler)] = count(); }' -o out

sysctl -a > /dev/null

just a tip, easier and more precise because "sysctl -a" gets the "next leaf" while "nsysctl -aI" gets the "next leaf or next internal node":

nsysctl -aNGI | /usr/bin/grep -v MPSAFE
bjk added inline comments.
share/man/man9/sysctl.9
887 ↗(On Diff #66847)

We should be more clear about whether this is a statement about the (unpreferred nature of the) internal implementation, or a requirement on callers that are querying this node.

share/man/man9/sysctl.9
887 ↗(On Diff #66847)

How about

handler is not MP safe and locks Giant internally.

?

I want this to be split into three patches:

  1. sysctl infra: CTLFLAG_NEEDGIANT, related code, locking for strings, and empty definition of SYSCTL_ENFORCE_FLAGS()
  2. addition of NEEDGIANT for everything that should be marked as such
  3. proper definition of SYSCTL_ENFORCE_FLAGS

We can commit 1 shortly. Part 2 can be committed by pieces, e.g. after review of maintainers of touched subsystem if you feel the need of the review. Then finally we will turn the switch by 3, some time later.

sys/sys/sysctl.h
323

This MAYBE thing breaks the whole purpose of the work. You should end up with all nodes eother marked NEEDGIANT or MPSAFE, explicitly.

sys/sys/sysctl.h
323

Agreed. At the end of this, things must either be marked NEEDGIANT or MPSAFE so the NEEDGIANT things can be fixed. maybe is hard to grep for :(

OK, I'll cut it into smaller batches.

sys/sys/sysctl.h
323

Ack.
I wasn't happy about that either but for a moment it looked like a better alternative to changing 700 more files.