Page MenuHomeFreeBSD

Add unlocked/SMR fast path to getblk()
ClosedPublic

Authored by cem on Jul 22 2020, 11:29 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 9:14 PM
Unknown Object (File)
Wed, Jan 15, 7:58 PM
Unknown Object (File)
Wed, Jan 15, 7:56 PM
Unknown Object (File)
Wed, Jan 15, 7:52 PM
Unknown Object (File)
Wed, Jan 15, 7:46 PM
Unknown Object (File)
Wed, Jan 15, 7:33 PM
Unknown Object (File)
Wed, Jan 15, 7:30 PM
Unknown Object (File)
Wed, Jan 15, 5:09 PM
Subscribers

Details

Summary

Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality. Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.

Test Plan

Quite surprisingly to me, it doesn't seem to immediately panic. So I'm probably missing something. ;-)

In our getblk-heavy workloads (which have other scaling limitations beyond those in FreeBSD), on 2x48 CPU (2P) metal, we see significantly better scaling with this change at 6+ threads (4.3x single-thread vs 3.7x single-thread), and the gap gets wider at higher thread counts. I don't have any great benchmarks on vanilla FreeBSD.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

cem requested review of this revision.Jul 22 2020, 11:29 PM
cem created this revision.
cem added reviewers: alc, dougm, markj, rlibby, kib, jeff.

Quite surprisingly to me, it doesn't seem to immediately panic. So I'm probably missing something.

Perhaps @pho would be willing to try and test this together with D25781?

Quite surprisingly to me, it doesn't seem to immediately panic. So I'm probably missing something.

Perhaps @pho would be willing to try and test this together with D25781?

I'd certainly appreciate it.

In D25782#571014, @cem wrote:

Quite surprisingly to me, it doesn't seem to immediately panic. So I'm probably missing something.

Perhaps @pho would be willing to try and test this together with D25781?

I'd certainly appreciate it.

I'm building a kernel right now.

sys/kern/vfs_bio.c
3872 ↗(On Diff #74826)

The blocking there is not needed.

3884 ↗(On Diff #74826)

I think the rule should be that any failure on lockless path for buffer lookup results in locked retry. Imagine that buffer was reclaimed meantime, then you would not reach the check for bufobj/blkno at all.

3889 ↗(On Diff #74826)

Why do you think it should be handled there ?

sys/kern/vfs_subr.c
2352 ↗(On Diff #74826)

return ()

Thanks!

sys/kern/vfs_bio.c
3872 ↗(On Diff #74826)

will fix

3884 ↗(On Diff #74826)

We can abort without retry for EINTR/ERESTART, right? Retrying in all other cases makes sense to me.

3889 ↗(On Diff #74826)
#define BUF_UNLOCK(bp) do {                                             \
        KASSERT(((bp)->b_flags & B_REMFREE) == 0,                       \
            ("BUF_UNLOCK %p while B_REMFREE is still set.", (bp)));     \
                                                                        \
        (void)_lockmgr_args(&(bp)->b_lock, LK_RELEASE, NULL,            \
            LK_WMESG_DEFAULT, LK_PRIO_DEFAULT, LK_TIMO_DEFAULT,         \
            LOCK_FILE, LOCK_LINE);                                      \
sys/kern/vfs_subr.c
2352 ↗(On Diff #74826)

will fix

cem marked 2 inline comments as done.
  • Fix style issues
  • Fallback to locked getblk path on all BUF_TIMELOCK errors except EINTR/ERESTART.
sys/kern/vfs_bio.c
3889 ↗(On Diff #74826)

Huh. I think you should use lockmgr() directly there, to not touch buffer we do not intend to operate on, at all.

Add BUF_UNLOCK_NOASSERT macro for raw lockmgr unlock operation, and use it in unlocked identity-race path.

sys/kern/vfs_bio.c
3917 ↗(On Diff #74859)

I must admit that 'unlocked' suffix is misleading.

sys/sys/buf.h
331 ↗(On Diff #74859)

I would name it BUF_UNLOCK_RAW()

cem marked 2 inline comments as done.Jul 24 2020, 2:06 PM
cem added inline comments.
sys/kern/vfs_bio.c
3917 ↗(On Diff #74859)

Will change.

sys/sys/buf.h
331 ↗(On Diff #74859)

Will change

Rename new BUF_UNLOCK variant, goto label, for clarity.

This revision is now accepted and ready to land.Jul 24 2020, 2:24 PM
sys/kern/vfs_bio.c
3882 ↗(On Diff #74880)

If you invert the condition, the control flow can be simplified a bit:

if (bp->b_bufobj == && bp->b_lblkno == blkno)
    goto foundbuf_fastpath;
/* It changed, fall back to locked lookup. */
BUF_UNLOCK_RAW(bp);
loop:
sys/kern/vfs_subr.c
495 ↗(On Diff #74880)

Missing parens.

cem marked 2 inline comments as done.Jul 24 2020, 2:30 PM
cem added inline comments.
sys/kern/vfs_bio.c
3882 ↗(On Diff #74880)

Sure, will do.

cem marked an inline comment as done.

Invert sense of identity check to simplify goto logic

This revision now requires review to proceed.Jul 24 2020, 2:32 PM
cem marked an inline comment as done.

Fix MarkJ's style comment I missed earlier.

This revision is now accepted and ready to land.Jul 24 2020, 2:49 PM
This revision was automatically updated to reflect the committed changes.
bdrewery added inline comments.
head/sys/kern/vfs_subr.c
2348–2352

Suppose a buf moved from dirty to clean here, after checking the clean queue, before checking the dirty queue. No buf is found. Is SMR intended to avoid that, or is it only consistent per queue?

We found, presumably, that getblk(GB_NOCREAT) gets the wrong result because of this.

head/sys/kern/vfs_subr.c
2348–2352

SMR only helps you on a per-queue basis. I think maybe ideally there would be a single pctrie here and some other mechanism for efficiently tracking dirty bufs, but I don't know how realistic that is.

Probably the right fix for getblk GB_NOCREAT is to re-check with the lock if unlocked lookup returned NULL. Something like:

	bp = gbincore_unlocked(bo, blkno);
	if (bp == NULL) {
		if (flags & GB_NOCREAT)
			goto loop;
		else
			goto newbuf_unlocked;
	}
head/sys/kern/vfs_subr.c
2348–2352

(In the !GB_NOCREAT case, we might see the same race from gbincore_unlocked, but it is harmless: we'd create a useless buf at the same lbn, re-lock to attempt to insert it, and discover we'd lost the race. That race can happen even without SMR-gbincore.)

head/sys/kern/vfs_subr.c
2348–2352

Suraj came up with the same patch idea. I'm wondering if it could be more efficient to just try gbincore_unlocked a second time rather than wait for the bo lock, in the GB_NOCREAT case.

head/sys/kern/vfs_subr.c
2348–2352

Ah, cool. I hope Suraj is doing well. Spinning locklessly again (GB_NOCREAT case) might be worth doing, but you'll still need the same locked fallback if you continue to get NULL (for correctness). If you should spin, and for how long, probably depends on how many GB_NOCREAT getblk requests will already exist in the bufq and be transitioning queues, vs true-positive misses in your workload. (And relative costs of a few lockless rechecks vs taking the bufobj lock.)