Feb 14 2024
In D43835#1001048, @imp wrote: git arc patch -c D43835 likely works and is less hassle.
In D43835#1001028, @olce wrote: Could you send me a patch prepared with git format-patch? Otherwise, I can take the patch here and put your name and email as the author myself (whichever you prefer).
In D43835#1001005, @dev_submerge.ch wrote: Thanks for the article, Olivier - now that I know the extent of your project, I suspect it won't be MFC'd?
If that's the case, it may be worth getting this minimal fix in right now and MFCing it to STABLE. The earlier this issue is fixed in all supported releases, the fewer workarounds will be needed in ports.
Feb 13 2024
Hi Florian,
The PRIV_SCHED_SETPOLICY and PRIV_SCHED_SET privilege checks are inconsistent with those in some other places and can be circumvented. Additionally, I don't think they serve any real security purpose (beyond what PRIV_SCHED_RTPRIO and PRIV_SCHED_IDPRIO already provide).
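For illustration only, here is a minimal hypothetical sketch of the idea that the priority-class privileges already cover the interesting cases. Only priv_check() and the PRIV_SCHED_RTPRIO constant are real FreeBSD kernel interfaces; the function, its arguments, and the policy classification are made up for this sketch.

    /*
     * Hypothetical sketch, not actual kernel code: gate a scheduling
     * policy change on the existing priority-class privilege rather
     * than on a separate PRIV_SCHED_SETPOLICY check.
     */
    #include <sys/param.h>
    #include <sys/proc.h>
    #include <sys/priv.h>

    static int
    setpolicy_priv_sketch(struct thread *td, bool realtime_policy)
    {
            if (realtime_policy) {
                    /* Realtime classes are already guarded by PRIV_SCHED_RTPRIO. */
                    return (priv_check(td, PRIV_SCHED_RTPRIO));
            }
            /* Other policies need no additional privilege. */
            return (0);
    }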
In D43815#1000700, @jah wrote: Thinking about it a little more, I should simply remove this part of the commit message. Accessing [base_vp]->v_mount does have risks, but any code that is subject to those risks is almost certainly going to face the same risks from careless access to ump->um_[upper|lower]mp (where 'ump' was obtained by a presumably-safe load of [unionfs_vp]->v_mount->mnt_data at the beginning of the call, as most of these operations do).
In D43815#1000687, @jah wrote: In D43815#1000600, @olce wrote: There is a misunderstanding. I'm very well aware of what you are saying, as you should know. But that is not my point, which concerns the sentence "Use of [vnode]->v_mount is unsafe in the presence of a concurrent forced unmount." in the context of the current change. The bulk of the latter is modifications to unionfs_vfsops.c, which contains VFS operations, not vnode ones. There are no vnodes involved there, except for accessing the layers' root ones. And what I'm saying, and what I showed above, is that v_mount on these, again in the context of a VFS operation, cannot become NULL because of a forced unmount (if you disagree, please show where you think the reasoning is flawed).
Actually the assertion about VFS operations isn't entirely true either (mostly, but not entirely); see the vfs_unbusy() dance we do in unionfs_quotactl().
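For readers unfamiliar with that dance, here is a rough, hypothetical sketch of the pattern being referred to: a VFS operation on the unionfs mount gives up the busy reference on its own mount before busying an underlying mount and forwarding the operation. vfs_busy() and vfs_unbusy() are real interfaces; the function and argument names are placeholders, and this is not the actual unionfs_quotactl() code.

    /*
     * Simplified sketch of a vfs_busy()/vfs_unbusy() dance: drop the
     * busy reference on the stacked mount, busy the underlying mount,
     * forward the operation, then undo.  Not the real unionfs code.
     */
    #include <sys/param.h>
    #include <sys/mount.h>

    static int
    forward_to_upper_sketch(struct mount *unionfs_mp, struct mount *upper_mp)
    {
            int error;

            /*
             * Release our own busy count so a pending unmount of the
             * underlying filesystem is not blocked behind us.
             */
            vfs_unbusy(unionfs_mp);

            /* Busy the upper mount before forwarding the operation. */
            error = vfs_busy(upper_mp, 0);
            if (error != 0)
                    return (error);

            /* Forward the VFS operation to the upper filesystem here. */

            vfs_unbusy(upper_mp);
            return (0);
    }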
In D43815#1000340, @jah wrote: In D43815#1000302, @olce wrote: I don't think it can. Given the first point above, there can't be any unmount of a layer (even forced) until the unionfs mount on top is unmounted. As the layers' root vnodes are vref()'ed, they can't become doomed (since unmount of their own FS is prevented), and consequently their v_mount is never modified (barring the ZFS rollback case). This is independent of holding (or not holding) any vnode lock.
Which is not to say that there aren't any problems of the sort you're reporting in unionfs; that's just a different matter.
That's not true: vref() does nothing to prevent a forced unmount from dooming the vnode; only holding its lock does that. As such, if the lock needs to be transiently dropped for some reason and the timing is sufficiently unfortunate, a concurrent recursive forced unmount can first unmount unionfs (dooming the unionfs vnode) and then the base FS (dooming the lower/upper vnode). The held references prevent the vnodes from being recycled (but not doomed), but even this isn't foolproof: for example, in the course of being doomed, the unionfs vnode will drop its references on the lower/upper vnodes, at which point they may become unreferenced unless additional action is taken. Whatever caller invoked the unionfs VOP will of course still hold a reference on the unionfs vnode, but this does not automatically guarantee that references will be held on the underlying vnodes for the duration of the call, due to the aforementioned scenario.
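A minimal sketch of the defensive pattern this implies, assuming a caller that must transiently drop the vnode lock: vref(), vunref(), VOP_UNLOCK(), vn_lock() and VN_IS_DOOMED() are real FreeBSD interfaces, while the function itself is hypothetical.

    /*
     * Sketch: a reference keeps the vnode from being recycled, but not
     * from being doomed, so after the lock has been dropped the caller
     * must re-validate before trusting vp->v_mount again.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    static int
    relock_and_revalidate_sketch(struct vnode *vp)
    {
            /* Entered with vp locked; keep a reference across the unlock. */
            vref(vp);
            VOP_UNLOCK(vp);

            /* ... work that requires the vnode lock to be dropped ... */

            vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
            vunref(vp);
            if (VN_IS_DOOMED(vp)) {
                    /*
                     * A concurrent forced unmount doomed vp while the lock
                     * was dropped; vp->v_mount may now be NULL.
                     */
                    return (ENOENT);
            }
            /* vp is locked and alive; vp->v_mount is safe to dereference. */
            return (0);
    }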
Feb 12 2024
In D43815#1000214, @jah wrote: In D43815#1000171, @olce wrote: If one of the layers is forcibly unmounted, there isn't much point in continuing operation. But, given the first point above, that cannot even happen. So really the only case where v_mount can become NULL is the ZFS rollback one (the layers' root vnodes can't be recycled since they are vref()'ed). Thinking more about it, always testing whether these are alive and well is going to be inevitable going forward. But I'm fine with this change as it is for now.
This can indeed happen, despite the first point above. If a unionfs VOP ever temporarily drops its lock, another thread is free to stage a recursive forced unmount of both the unionfs and the base FS during this window. Moreover, it's easy for this to happen without unionfs even being aware of it: because unionfs shares its lock with the base FS, if a base FS VOP (forwarded by a unionfs VOP) needs to drop the lock temporarily (this is common e.g. for FFS operations that need to update metadata), the unionfs vnode may effectively be unlocked during that time. That last point is a particularly dangerous one; I have another pending set of changes to deal with the problems that can arise in that situation.
This is why I say it's easy to make a mistake in accessing [base vp]->v_mount at an unsafe time.
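To make that hazard concrete, here is a hypothetical sketch of a unionfs VOP forwarding a call to the base filesystem: because the two vnodes share a lock, the forwarded call may drop and reacquire it internally, so the caller re-checks both vnodes afterwards. VOP_FSYNC(), VN_IS_DOOMED(), MNT_WAIT and curthread are real interfaces; the function name, and the choice of fsync as the forwarded operation, are only for illustration.

    /*
     * Sketch of the shared-lock hazard: the forwarded VOP may drop the
     * lock internally (common for FFS metadata updates), during which a
     * recursive forced unmount can doom both vnodes.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    static int
    forward_vop_sketch(struct vnode *unionfs_vp, struct vnode *base_vp)
    {
            int error;

            /* Forward the operation to the base filesystem. */
            error = VOP_FSYNC(base_vp, MNT_WAIT, curthread);

            /*
             * On return the shared lock is held again, but either vnode
             * may have been doomed by a concurrent recursive forced
             * unmount while the base FS had the lock dropped.
             */
            if (VN_IS_DOOMED(unionfs_vp) || VN_IS_DOOMED(base_vp))
                    return (error != 0 ? error : ENOENT);

            return (error);
    }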
In D43815#999937, @jah wrote: Well, as it is today, unmounting of the base FS is either recursive or it doesn't happen at all (i.e. the unmount attempt is rejected immediately because of the unionfs stacked atop the mount in question). I don't think it can work any other way, although I could see the default settings around recursive unmounts changing (maybe vfs.recursive_forced_unmount being enabled by default, or recursive unmounts even being allowed in the non-forced case as well). I don't have plans to change any of those defaults, though.
In D43818#1000015, @jah wrote: Actually I've been thinking of doing exactly that, although it depends on how much time I get away from $work over the next few weeks.
In D40850#1000012, @jah wrote:
Feb 11 2024
Nice catch.
OK as a workaround. Hopefully, we'll get OpenZFS fixed soon. If you don't plan to, I may try to submit a patch upstream, since it seems no one has proposed any change in https://github.com/openzfs/zfs/issues/15705.
I think this goes in the right direction long term also.