Page MenuHomeFreeBSD

getblk: reduce time under bufobj lock
ClosedPublic

Authored by rlibby on May 28 2024, 7:39 PM.
Tags
None
Referenced Files
Unknown Object (File)
Nov 13 2024, 2:24 PM
Unknown Object (File)
Oct 7 2024, 5:20 AM
Unknown Object (File)
Oct 4 2024, 10:14 PM
Unknown Object (File)
Oct 4 2024, 10:14 PM
Unknown Object (File)
Oct 4 2024, 10:13 PM
Unknown Object (File)
Oct 4 2024, 10:12 PM
Unknown Object (File)
Oct 4 2024, 10:12 PM
Unknown Object (File)
Oct 4 2024, 10:12 PM
Subscribers

Details

Summary

Use the new pctrie combined insert/lookup facility to reduce work and
time under the bufobj interlock when associating a buf with a vnode.

We now do one lookup in the dirty tree and one combined lookup/insert in
the clean tree instead of one lookup in dirty, two in clean, and then an
insert in clean. We also avoid touching the possibly unrelated buf at
the tail of the queue.

Also correct an issue where the actual order of the tail queue depended
on the insertion order due to sign issues.

Sponsored by: Dell EMC Isilon

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib added inline comments.
sys/kern/vfs_subr.c
2716–2718
2726

I wanted, for long time, to check that buffers do not hold intersecting data, i.e. n_lblkno + size <= bp->b_lblkno. Not sure how easy is to express this due to alignment.

2728

The line should be de-indented by one space (same for similar indent for lot of KASSERTs below).

2773
2839

Might be, you can further micro-optimize bgetvp() by only vhold()ing the vnode if both clean nd dirty list were empty (it should shave one atomic).

Reciprocal change would be needed for brelvp().

sys/sys/buf.h
606

May be, add __result_use_check (AKA nodiscard) ?

This revision is now accepted and ready to land.May 29 2024, 1:13 PM
rlibby added inline comments.
sys/kern/vfs_subr.c
2726

I don't think I understand the alignment problem you mention.

If we wanted to make an assertion like this strong, we would need to look in both directions (forward as well as backward), and look in both trees (dirty as well as clean). We don't necessarily need to make it that strong though.

I also worry slightly that there may be file systems that actually are using overlapping buffers, even though they shouldn't be. But I guess the point of the assert would be to find them.

2728

I went back and applied this feedback to the pctrie patch too.

2839

This makes sense but let me look at it as follow up. I want to look through the cache line implications, since this would mean we would have to touch more fields under the bufobj lock in brelvp. Although, maybe there's nothing to consider since ultimately we are embedded in vnode and vnode has only pointer alignment.

sys/sys/buf.h
606

I see two styles being used, one (e.g. systm.h) like

int __result_use_check bgetvp(struct vnode *, struct buf *);

and one (e.g. malloc.h) like

int        bgetvp(struct vnode *, struct buf *) __result_use_check;

Do we have style guide? I prefer the latter by eyeball.

rlibby added inline comments.
sys/kern/vfs_subr.c
2726

Oh, re alignment, maybe you mean, two bufs can use the same page but shouldn't claim to use the same DEV_BSIZE chunk of a page?

sys/kern/vfs_subr.c
2726

Yes, like that (esp broken/should not happen for VMIO).

2839

I agree that it should be a follow-up.

kib feedback on style and __result_use_check

This revision now requires review to proceed.May 29 2024, 6:15 PM
sys/kern/vfs_subr.c
2832

From here to line 2853, I find the code needlessly convoluted by control flow choices. Why not this?

	/*
	 * If no existing dirty buf, insert onto list for new vnode or
	 * find an existing clean buf.
	 */
	BO_LOCK(bo);
	if (BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, bp->b_lblkno))
		error = EEXIST;
	else
		error = buf_vlist_find_or_add(bp, bo, BX_VNCLEAN);
	BO_UNLOCK(bo);
	if (error == 0) {
		vhold(vp);
		return (0);
	}

dougm feedback: simplify error paths

rlibby added inline comments.
sys/kern/vfs_subr.c
2832

I agree your suggestion is cleaner. I reworked it along these lines.

sys/kern/vfs_bio.c
4247–4256

I don't know this code at all, so my concerns are probably unfounded.

Currently, in gbincore, we check the clean tree and then, conditionally, the dirty tree. With this change we will, in bgetvp, check the dirty tree and then, conditionally, the clean tree. Does that change matter?

sys/kern/vfs_subr.c
2725

I suggest that you use if (n == NULL) {} else {} here, as you do at line 2739, so that the assertion patters match.

	if (n == NULL) {
		KASSERT(error != EEXIST,
		    ("buf_vlist_add: EEXIST but no existing buf found: bp %p",
		    bp));
	} else {
		KASSERT((uint64_t)n->b_lblkno <= (uint64_t)bp->b_lblkno,
		    ("buf_vlist_add: out of order insert/lookup: bp %p n %p",
		    bp, n));
		KASSERT((n->b_lblkno == bp->b_lblkno) == (error == EEXIST),
		    ("buf_vlist_add: inconsistent result for existing buf: "
		    "error %d bp %p n %p", error, bp, n));
	}
rlibby added inline comments.
sys/kern/vfs_bio.c
4247–4256

No, it doesn't matter, since the end result is that we don't insert into clean unless there was no entry in either tree.

The two trees do not intersect, if there is an entry for an index in one tree, it won't be in the other tree. Modifying either tree requires the bufobj wlock (BO_LOCK), the lock we grab in bgetvp (previously grabbed by the caller). The only modifications to the tree are bgetvp, brelvp, and reassignbuf (which moves between the trees), all under the lock.

5467–5469

This isn't supposed to be in this diff... not sure if this is a phabricator or git arc issue or what but I'll make sure db_show_buffer isn't touched as part of this.

sys/kern/vfs_subr.c
2725

Thanks, I agree that's better.

rlibby marked an inline comment as done.

dougm feedback: kassert readability

This revision is now accepted and ready to land.Jun 1 2024, 6:41 PM
markj added inline comments.
sys/kern/vfs_bio.c
4248

I wonder if it would be worthwhile to have a counter for these events, akin to getnewbufrestarts and others in this file.

sys/kern/vfs_bio.c
4248

Indeed I have been using some private counters during development. They tend to be a little ugly (if (foo) goto bar; becomes if (foo) {counter(foocase); goto bar;}). I wasn't sure we'd want development counters like that in tree. Do we have any examples of best practice, or would getnewbufrestarts be it?

sys/kern/vfs_bio.c
4248

If you're comfortable without having a counter here, I think that's fine.

I don't think we have much by way of best practices here. In the VM I try to put counters like this under vm.stats.<subsystem>.<counter>. In the VFS we have vfs.vnode.stats, which seems to be along the same lines, but the name is inconsistent. Having a vfs.buf.stats tree for debug counters would be nice. I'm not sure that we're really concerned about compatibility when it comes to changing counter names. But that's also outside the scope of the patch.

This revision was automatically updated to reflect the committed changes.