getblk: track "non-sterile" bufobj to avoid bo lock on miss if sterile
ClosedPublic
Actions

Authored by rlibby on Jun 12 2024, 7:28 PM.

Details

Reviewers

bdrewery
kib
markj

Commits

rG0dc98b57f321: getblk: track "non-sterile" bufobj to avoid bo lock on miss if sterile

Summary

This is a scheme to avoid taking the bufobj lock and doing a second
lookup in the case where in getblk we do an unlocked lookup and find no
buf. Was there really no buf, or were we in the middle of a reassignbuf
race? By tracking any use of reassignbuf with a flag, we can know if
there can't have been a race because there has been no reassignbuf.
Because this scheme is spoiled on the first use of reassignbuf, it is
mostly only beneficial for cases where a certain vnode is never expected
to use dirty bufs at all.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rlibby created this revision.Jun 12 2024, 7:28 PM

Herald added subscribers: olce, imp. · View Herald TranscriptJun 12 2024, 7:28 PM

rlibby requested review of this revision.Jun 12 2024, 7:28 PM

Harbormaster completed remote builds in B58158: Diff 139793.Jun 12 2024, 7:28 PM

rlibby edited the summary of this revision. (Show Details)Jun 12 2024, 7:29 PM

Taking advice on style and proper atomic conventions.

And on whether to take this at all.

sys/kern/vfs_bio.c
4010–4011	Unsure if the atomic_load is required after the fence.
sys/kern/vfs_subr.c
3217	This might be more clear with atomic_thread_fence_rel. In that case, as above, would atomic_set_int be required still, or would just a plain bo->bo_flag \|= BO_NONSTERILE followed by the fence suffice?
sys/sys/bufobj.h
119	With BO_NONSTERILE, we avoid setting up flags in bufobj_init. Reversing the sense to BO_STERILE might read more clearly though.

IMO this is somewhat strange change in isolation. What is the action of your caller if GB_NOCREAT does not return a buffer?

I am asking this because if you consider the buffer cache autonomously, there is a big race there, since another thread might create the buffer right after we did not found it, and the answer we get is already outdated. But, buffer cache is not used alone, the vnode must be locked when doing the lookup (except for devvp, but devvp code does not use GB_NOCREAT), and it must be locked exclusively when creating dirty buffers. So really even fences are not strictly needed for correctness, but their cost is zero on x86 and the code is more correct alone with them.

sys/kern/vfs_subr.c
3217	You need the _rel fence to complement the _acq in getblkx() for this scheme to work. But then, there is no reason to use atomics to set the NONSTERILE flag, since other flags manipulations are not atomic (and must be protected by bo_lock).

In D45571#1039766, @kib wrote:

IMO this is somewhat strange change in isolation. What is the action of your caller if GB_NOCREAT does not return a buffer?

I am asking this because if you consider the buffer cache autonomously, there is a big race there, since another thread might create the buffer right after we did not found it, and the answer we get is already outdated.

I agree it's a little strange on its own.

We have a few kinds of uses of GB_NOCREAT out of tree. One of them is certain kinds of file prefetch logic. They are logically okay with the buffer existence races, it only affects performance by influence prefetch. They may not use dirty bufs at all, so they see a benefit from being able to avoid the bufobj lock on a "sterile" bufobj. That benefit can be significant with a lot of concurrent activity on a file.

But, buffer cache is not used alone, the vnode must be locked when doing the lookup (except for devvp, but devvp code does not use GB_NOCREAT), and it must be locked exclusively when creating dirty buffers.

Right, so, we also have uses of GB_NOCREAT on devvps that are not okay with a false negative in the lookup. Generally there would be some kind of synchronization in a higher layer that prevents or renders harmless what you mention about the result being immediately outdated.

An example use is a cache invalidation operation that wants to invalidate a buf if it exists, or an operation to shootdown an IO if there's a buf, or read or write around the buffer cache but only if there's not a buf. Also the motivation for the 2021 patch c926114f2ff20d7c3e5a157f1618b6228f391bf7 (D28375) is related.

Anyway, this is all just to paint a picture about why it makes some sense in our out of tree use.

So really even fences are not strictly needed for correctness, but their cost is zero on x86 and the code is more correct alone with them.

I'll revise this according to what I've understood. In short it would be in getblkx without the bo lock, acq fence; plain read of bo_flag, and in reassignbuf with the bo lock, plain write of bo_flag; rel fence.

However, as I said, I understand if there's just no benefit to taking it in-tree.

kib (IIUC):