Details

Reviewers

dougm
jeff
kib
markj

Commits

rG780666c09bf9: getblk: reduce time under bufobj lock

Summary

Use the new pctrie combined insert/lookup facility to reduce work and
time under the bufobj interlock when associating a buf with a vnode.

We now do one lookup in the dirty tree and one combined lookup/insert in
the clean tree instead of one lookup in dirty, two in clean, and then an
insert in clean. We also avoid touching the possibly unrelated buf at
the tail of the queue.
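
As a rough illustration of the lookup pattern described above, here is a minimal user-space sketch. It is not the kernel code: `toy_tree`, `toy_lookup`, `toy_find_or_add`, and `toy_bgetvp` are hypothetical names, and a small sorted array stands in for each pctrie. The point is only that a key is first looked up in the dirty set, and then a single pass over the clean set either reports EEXIST or inserts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX 16

/* Hypothetical stand-in for one per-bufobj trie: a small sorted array. */
struct toy_tree {
	uint64_t keys[TOY_MAX];
	size_t n;
};

/* Plain lookup, as done once in the dirty set. */
static int
toy_lookup(const struct toy_tree *t, uint64_t key)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->keys[i] == key)
			return (1);
	return (0);
}

/*
 * Combined lookup/insert: a single pass finds the slot for the key, then
 * either reports EEXIST or inserts there, keeping the array sorted.
 */
static int
toy_find_or_add(struct toy_tree *t, uint64_t key)
{
	size_t i;

	for (i = 0; i < t->n && t->keys[i] < key; i++)
		;
	if (i < t->n && t->keys[i] == key)
		return (EEXIST);
	if (t->n == TOY_MAX)
		return (ENOMEM);
	for (size_t j = t->n; j > i; j--)
		t->keys[j] = t->keys[j - 1];
	t->keys[i] = key;
	t->n++;
	return (0);
}

/* One lookup in dirty, then one combined lookup/insert in clean. */
static int
toy_bgetvp(const struct toy_tree *dirty, struct toy_tree *clean, uint64_t key)
{
	if (toy_lookup(dirty, key))
		return (EEXIST);
	return (toy_find_or_add(clean, key));
}
```

In this toy model, a key already present in either set yields EEXIST and leaves the clean set unchanged; only a key absent from both is inserted.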

Also correct an issue where the actual order of the tail queue depended
on the insertion order due to sign issues.
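
The summary does not spell out the sign issue. One plausible reading, consistent with the (uint64_t) casts in the assertions quoted later in this review, is that logical block numbers are signed while the trie orders its keys as unsigned 64-bit values, so the two comparisons disagree for negative block numbers. The sketch below only demonstrates that disagreement; `lblkno_t` and both helper functions are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a signed 64-bit logical block number type. */
typedef int64_t lblkno_t;

/* Signed comparison: negative block numbers sort before zero. */
static bool
lblkno_before_signed(lblkno_t a, lblkno_t b)
{
	return (a < b);
}

/*
 * Unsigned comparison, matching the order of a structure keyed by
 * uint64_t: negative block numbers become very large keys and sort last.
 */
static bool
lblkno_before_unsigned(lblkno_t a, lblkno_t b)
{
	return ((uint64_t)a < (uint64_t)b);
}
```

If a list is kept sorted with one comparison while neighbors are found with the other, the resulting order can depend on which buffers happened to be inserted first. Using the unsigned comparison in both places removes that dependence.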

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rlibby created this revision. May 28 2024, 7:39 PM

Herald added subscribers: olce, imp. May 28 2024, 7:39 PM

rlibby requested review of this revision. May 28 2024, 7:39 PM

Harbormaster completed remote builds in B57948: Diff 139174. May 28 2024, 7:39 PM

rlibby added a parent revision: D45394: pctrie: add combined insert/lookup operations. May 28 2024, 7:39 PM

rlibby added a child revision: D45396: vm_radix: define vm_radix_insert_lookup_lt and use in vm_page_rename.

kib accepted this revision. May 29 2024, 1:13 PM

kib added inline comments.

sys/kern/vfs_subr.c
2716–2718
2726	I wanted, for a long time, to check that buffers do not hold intersecting data, i.e. n_lblkno + size <= bp->b_lblkno. Not sure how easy it is to express this due to alignment.
2728	The line should be de-indented by one space (same for the similar indent in a lot of the KASSERTs below).
2773
2839	Maybe you can further micro-optimize bgetvp() by only vhold()ing the vnode if both the clean and dirty lists were empty (it should shave one atomic). A reciprocal change would be needed for brelvp().
sys/sys/buf.h
606	Maybe add __result_use_check (AKA nodiscard)?
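
The vhold() suggestion at line 2839 can be pictured with a small user-space model. This is only a sketch of the idea, which the thread defers to a follow-up, and not code from the patch: `toy_vnode`, `toy_attach_clean`, and `toy_detach_clean` are hypothetical, and plain integers replace the kernel's atomics and locking. The hold is taken only on the transition from no buffers to one buffer, and dropped only on the reverse transition.

```c
#include <assert.h>

/* Hypothetical model of a vnode with a hold count and two buffer lists. */
struct toy_vnode {
	int holdcnt;	/* stands in for the vnode hold count */
	int nclean;	/* number of buffers on the clean list */
	int ndirty;	/* number of buffers on the dirty list */
};

/* Attach a clean buffer; take a hold only if both lists were empty. */
static void
toy_attach_clean(struct toy_vnode *vp)
{
	if (vp->nclean == 0 && vp->ndirty == 0)
		vp->holdcnt++;
	vp->nclean++;
}

/* Detach a clean buffer; drop the hold only if both lists are now empty. */
static void
toy_detach_clean(struct toy_vnode *vp)
{
	assert(vp->nclean > 0);
	vp->nclean--;
	if (vp->nclean == 0 && vp->ndirty == 0)
		vp->holdcnt--;
}
```

In this model the release path has to examine both list counts under the lock, which is the kind of extra field access the reply below raises as a concern.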

This revision is now accepted and ready to land. May 29 2024, 1:13 PM

rlibby marked 4 inline comments as done. May 29 2024, 4:27 PM

rlibby added inline comments.

sys/kern/vfs_subr.c
2726	I don't think I understand the alignment problem you mention. If we wanted to make an assertion like this strong, we would need to look in both directions (forward as well as backward), and look in both trees (dirty as well as clean). We don't necessarily need to make it that strong though. I also worry slightly that there may be file systems that actually are using overlapping buffers, even though they shouldn't be. But I guess the point of the assert would be to find them.
2728	I went back and applied this feedback to the pctrie patch too.
2839	This makes sense but let me look at it as follow up. I want to look through the cache line implications, since this would mean we would have to touch more fields under the bufobj lock in brelvp. Although, maybe there's nothing to consider since ultimately we are embedded in vnode and vnode has only pointer alignment.
sys/sys/buf.h
606	I see two styles being used, one (e.g. systm.h) like int __result_use_check bgetvp(struct vnode *, struct buf *); and one (e.g. malloc.h) like int bgetvp(struct vnode *, struct buf *) __result_use_check; Do we have a style guide? I prefer the latter by eyeball.
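
For readers unfamiliar with the two placements, here is a small stand-alone sketch. It assumes a GCC- or Clang-compatible compiler and uses a local macro, `TOY_RESULT_USE_CHECK`, as a stand-in for __result_use_check; the function names are hypothetical. Both placements are accepted on a declaration. The trailing form is not accepted on a function definition, so the definitions below carry no attribute.

```c
#include <assert.h>

/* Local stand-in for __result_use_check; an assumption for this sketch. */
#if defined(__GNUC__) || defined(__clang__)
#define TOY_RESULT_USE_CHECK __attribute__((__warn_unused_result__))
#else
#define TOY_RESULT_USE_CHECK
#endif

/* Style 1: attribute between the return type and the function name. */
int TOY_RESULT_USE_CHECK toy_attach_prefix(int key);

/* Style 2: attribute after the parameter list. */
int toy_attach_suffix(int key) TOY_RESULT_USE_CHECK;

/* Definitions; the attribute is inherited from the declarations above. */
int
toy_attach_prefix(int key)
{
	return (key < 0 ? -1 : 0);
}

int
toy_attach_suffix(int key)
{
	return (key < 0 ? -1 : 0);
}
```

With either placement, a caller that discards the return value should get a compiler warning, which is the point of marking a function whose error return must be handled.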

rlibby marked 2 inline comments as done. May 29 2024, 4:36 PM

rlibby added inline comments.

sys/kern/vfs_subr.c
2726	Oh, re alignment, maybe you mean, two bufs can use the same page but shouldn't claim to use the same DEV_BSIZE chunk of a page?

kib added inline comments. May 29 2024, 5:36 PM

sys/kern/vfs_subr.c
2726	Yes, like that (esp broken/should not happen for VMIO).
2839	I agree that it should be a follow-up.

kib feedback on style and __result_use_check

This revision now requires review to proceed. May 29 2024, 6:15 PM

Harbormaster completed remote builds in B57955: Diff 139197. May 29 2024, 6:15 PM

dougm added inline comments. Jun 1 2024, 9:49 AM

sys/kern/vfs_subr.c

2832

From here to line 2853, I find the code needlessly convoluted by control flow choices. Why not this?

	/*
	 * If no existing dirty buf, insert onto list for new vnode or
	 * find an existing clean buf.
	 */
	BO_LOCK(bo);
	if (BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, bp->b_lblkno))
		error = EEXIST;
	else
		error = buf_vlist_find_or_add(bp, bo, BX_VNCLEAN);
	BO_UNLOCK(bo);
	if (error == 0) {
		vhold(vp);
		return (0);
	}

dougm feedback: simplify error paths

Harbormaster completed remote builds in B57992: Diff 139307. Jun 1 2024, 4:15 PM

rlibby marked an inline comment as done. Jun 1 2024, 4:19 PM

rlibby added inline comments.

sys/kern/vfs_subr.c
2832	I agree your suggestion is cleaner. I reworked it along these lines.

dougm added inline comments. Jun 1 2024, 5:51 PM

sys/kern/vfs_bio.c
4247–4256	I don't know this code at all, so my concerns are probably unfounded. Currently, in gbincore, we check the clean tree and then, conditionally, the dirty tree. With this change we will, in bgetvp, check the dirty tree and then, conditionally, the clean tree. Does that change matter?
sys/kern/vfs_subr.c
2725	I suggest that you use if (n == NULL) {} else {} here, as you do at line 2739, so that the assertion patterns match.

	if (n == NULL) {
		KASSERT(error != EEXIST,
		    ("buf_vlist_add: EEXIST but no existing buf found: bp %p",
		    bp));
	} else {
		KASSERT((uint64_t)n->b_lblkno <= (uint64_t)bp->b_lblkno,
		    ("buf_vlist_add: out of order insert/lookup: bp %p n %p",
		    bp, n));
		KASSERT((n->b_lblkno == bp->b_lblkno) == (error == EEXIST),
		    ("buf_vlist_add: inconsistent result for existing buf: "
		    "error %d bp %p n %p", error, bp, n));
	}

rlibby marked 2 inline comments as done. Jun 1 2024, 6:19 PM

rlibby added inline comments.

sys/kern/vfs_bio.c
4247–4256	No, it doesn't matter, since the end result is that we don't insert into clean unless there was no entry in either tree. The two trees do not intersect: if there is an entry for an index in one tree, it won't be in the other tree. Modifying either tree requires the bufobj wlock (BO_LOCK), the lock we grab in bgetvp (previously grabbed by the caller). The only modifications to the trees are in bgetvp, brelvp, and reassignbuf (which moves between the trees), all under the lock.
5467–5469	This isn't supposed to be in this diff... not sure if this is a phabricator or git arc issue or what but I'll make sure db_show_buffer isn't touched as part of this.
sys/kern/vfs_subr.c
2725	Thanks, I agree that's better.

dougm feedback: kassert readability

Harbormaster completed remote builds in B57994: Diff 139309. Jun 1 2024, 6:31 PM

dougm accepted this revision. Jun 1 2024, 6:41 PM

This revision is now accepted and ready to land. Jun 1 2024, 6:41 PM

markj accepted this revision. Jun 4 2024, 8:57 PM

markj added inline comments.

sys/kern/vfs_bio.c
4248	I wonder if it would be worthwhile to have a counter for these events, akin to `getnewbufrestarts` and others in this file.

rlibby added inline comments. Jun 5 2024, 2:25 AM

sys/kern/vfs_bio.c
4248	Indeed I have been using some private counters during development. They tend to be a little ugly (if (foo) goto bar; becomes if (foo) {counter(foocase); goto bar;}). I wasn't sure we'd want development counters like that in tree. Do we have any examples of best practice, or would getnewbufrestarts be it?

markj added inline comments. Jun 5 2024, 2:32 AM

sys/kern/vfs_bio.c
4248	If you're comfortable without having a counter here, I think that's fine. I don't think we have much by way of best practices here. In the VM I try to put counters like this under vm.stats.<subsystem>.<counter>. In the VFS we have vfs.vnode.stats, which seems to be along the same lines, but the name is inconsistent. Having a vfs.buf.stats tree for debug counters would be nice. I'm not sure that we're really concerned about compatibility when it comes to changing counter names. But that's also outside the scope of the patch.
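
The trade-off discussed above, where a bare goto grows into a braced block once a counter is added, looks roughly like this in a user-space sketch. Everything here is hypothetical: `toy_getblk_restarts`, `toy_try_insert`, and `toy_getblk` are illustrative names, and a C11 relaxed atomic stands in for whatever counter facility the kernel code would actually use.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical debug counter for restarts of the lookup loop. */
static atomic_ulong toy_getblk_restarts;

/* Pretend the insert collides on the first two attempts, then succeeds. */
static int
toy_try_insert(int attempt)
{
	return (attempt >= 2 ? 0 : 1);
}

/* Retry until the insert succeeds, counting each restart. */
static int
toy_getblk(void)
{
	int attempt = 0;

loop:
	if (toy_try_insert(attempt) != 0) {
		/* Without the counter this would be a bare "goto loop". */
		atomic_fetch_add_explicit(&toy_getblk_restarts, 1,
		    memory_order_relaxed);
		attempt++;
		goto loop;
	}
	return (attempt);
}
```

A relaxed increment is enough here because the counter is purely statistical and nothing else is ordered against it.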

rlibby mentioned this in D45486: vm_page_insert: use pctrie combined insert/lookup. Jun 5 2024, 6:40 PM

Closed by commit rG780666c09bf9: getblk: reduce time under bufobj lock (authored by rlibby). Jun 6 2024, 3:45 AM

This revision was automatically updated to reflect the committed changes.

rlibby added a commit: rG780666c09bf9: getblk: reduce time under bufobj lock.


Revision Contents
Changeset List

Diff 139546

sys/kern/vfs_bio.c

sys/kern/vfs_subr.c

sys/sys/buf.h
