vfs: scale foffset_lock
Closed, Public

Authored by mjg on Sep 12 2019, 10:25 PM.

Details

Reviewers

kib
kevans

Commits

rS361429: vfs: scale foffset_lock by using atomics instead of serializing on mtx pool

Summary

The routine unnecessarily locks a mutex from a small pool, which causes scalability issues. Instead, we can set the bit with atomics and resort to locking only if that fails. Since going to sleep would take a sleepq lock anyway, remove the mtx pool use and use sleepq locks directly.
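The fast-path/slow-path idea in the summary can be sketched in userspace C11. This is only an analogy, not the committed kernel code: a pthread mutex and condition variable stand in for the sleepq lock and sleep queue, and every name here (struct off_lock, OFF_LOCKED, OFF_WAITING, off_lock_acquire, off_lock_release) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define OFF_LOCKED  0x1u	/* the offset lock is held */
#define OFF_WAITING 0x2u	/* at least one thread sleeps on it */

struct off_lock {
	atomic_uint	flags;
	pthread_mutex_t	mtx;	/* stand-in for the sleepq lock */
	pthread_cond_t	cv;	/* stand-in for the sleep queue */
};

static void
off_lock_init(struct off_lock *l)
{
	atomic_init(&l->flags, 0);
	pthread_mutex_init(&l->mtx, NULL);
	pthread_cond_init(&l->cv, NULL);
}

static void
off_lock_acquire(struct off_lock *l)
{
	unsigned int old = 0;

	/* Fast path: uncontended, a single compare-and-swap, no mutex. */
	if (atomic_compare_exchange_strong(&l->flags, &old, OFF_LOCKED))
		return;

	/* Slow path: serialize with the waker, then retry or sleep. */
	pthread_mutex_lock(&l->mtx);
	old = atomic_load(&l->flags);
	for (;;) {
		if ((old & OFF_LOCKED) == 0) {
			if (atomic_compare_exchange_weak(&l->flags, &old,
			    old | OFF_LOCKED))
				break;
			continue;	/* old was refreshed by the failed CAS */
		}
		/* Advertise a waiter so the holder takes its slow path. */
		if ((old & OFF_WAITING) == 0 &&
		    !atomic_compare_exchange_weak(&l->flags, &old,
		    old | OFF_WAITING))
			continue;
		pthread_cond_wait(&l->cv, &l->mtx);
		old = atomic_load(&l->flags);
	}
	pthread_mutex_unlock(&l->mtx);
}

static void
off_lock_release(struct off_lock *l)
{
	unsigned int old = OFF_LOCKED;

	/* Fast path: nobody is waiting, so just clear the bit. */
	if (atomic_compare_exchange_strong(&l->flags, &old, 0))
		return;

	/* Slow path: clear both bits and wake sleepers under the mutex. */
	pthread_mutex_lock(&l->mtx);
	old = atomic_exchange(&l->flags, 0);
	assert((old & OFF_LOCKED) != 0);
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}
```

In this sketch the waiting bit is only ever set while the mutex is held, so a releasing thread that fails its fast-path compare-and-swap cannot miss a sleeper: it has to take the same mutex before it broadcasts.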

On a kernel with other fixes (including markj's vm page patch) I get the following results from will-it-scale during tests with 104 threads on skylake:

test	before	after	diff
lseek1_processes	257118082	402258149	+37%
readseek1_processes	75866480	101140043	+25%

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

mjg created this revision. Sep 12 2019, 10:25 PM

Herald added a subscriber: imp. Sep 12 2019, 10:25 PM

Harbormaster completed remote builds in B26446: Diff 62009. Sep 12 2019, 10:25 PM

The comment in sys/file.h for f_vnread_flags needs updating.

sys/kern/vfs_vnops.c
691 ↗	(On Diff #62009)	Are the _short atomics implemented on all arches?
708 ↗	(On Diff #62009)	Doesn't this lack DROP_GIANT/PICKUP_GIANT?
740 ↗	(On Diff #62009)	`) != 0`

address feedback

sys/kern/vfs_vnops.c
691 ↗	(On Diff #62009)	I should have checked, apparently not. I'll get this sorted out.

mjg marked 2 inline comments as done. Sep 13 2019, 3:56 PM

I am fine with this version modulo atomic_short.

mjg mentioned this in D21822: Provide generic sub-word atomic *cmpset. Oct 1 2019, 4:02 PM

This patch is not quite right.

f_vnread_flags can have FDEVFS_VNODE set for devfs nodes, but this isn't taken into account and maintained throughout the process (noticed because of the next part).
foffset_lock will take the lock with FOF_NOLOCK set if OFF_MAX > LONG_MAX, but foffset_unlock will not drop it in this model.

With this patch tacked on to yours, mips/MALTA will boot: https://people.freebsd.org/~kevans/D21626-fix.diff -- I suspect it might be better to instead #define FOFFSET_LOCK_FLAGS FOFFSET_LOCKED | FOFFSET_LOCK_WAITING and set something like vflag = *flagsp & ~FOFFSET_LOCK_FLAGS to future-proof it, but I only took the bare minimum to make this work properly.
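A minimal sketch of the masking idea suggested above. The flag values are assumed for illustration only (the real definitions live in sys/file.h), and foffset_preserved_flags is a hypothetical helper, not part of the patch.

```c
#include <assert.h>

/* Values assumed for illustration; see sys/file.h for the real ones. */
#define FOFFSET_LOCKED		0x1
#define FOFFSET_LOCK_WAITING	0x2
#define FDEVFS_VNODE		0x4

/* Parenthesized so that ~FOFFSET_LOCK_FLAGS negates the whole mask. */
#define FOFFSET_LOCK_FLAGS	(FOFFSET_LOCKED | FOFFSET_LOCK_WAITING)

/*
 * Hypothetical helper: returns the bits that have to survive a
 * lock/unlock cycle, i.e. everything except the two lock bits.
 */
static short
foffset_preserved_flags(short flags)
{
	return ((short)(flags & ~FOFFSET_LOCK_FLAGS));
}
```

The parentheses in the macro matter: without them, `flags & ~FOFFSET_LOCK_FLAGS` would expand to `(flags & ~FOFFSET_LOCKED) | FOFFSET_LOCK_WAITING`, which sets the waiting bit instead of clearing it.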

kevans added a reviewer: kevans. Oct 2 2019, 9:37 PM

Looks like the current code (i.e., unpatched) is already buggy in this regard.

There is only one place which acts on the flag and I consider it to be an abuse of the field.

dofilewrite

        if (fp->f_type == DTYPE_VNODE &&
            (fp->f_vnread_flags & FDEVFS_VNODE) == 0)
                bwillwrite();

Device nodes don't register their own ops. I think the flag should be eliminated and instead we can compare ->v_op against devfs_vnodeops.
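The proposed check could look roughly like the sketch below. It uses stripped-down userspace stand-ins for struct vnode and struct vop_vector so that it compiles on its own; other_vnodeops, vnode_is_devfs, and wants_bwillwrite are hypothetical names, and this is not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for the kernel types; only what the check needs. */
struct vop_vector {
	const char *name;
};

struct vnode {
	const struct vop_vector *v_op;
};

static const struct vop_vector devfs_vnodeops = { "devfs" };
static const struct vop_vector other_vnodeops = { "other" };	/* hypothetical */

/* Identify devfs vnodes by their ops vector instead of a per-file flag. */
static bool
vnode_is_devfs(const struct vnode *vp)
{
	return (vp->v_op == &devfs_vnodeops);
}

/* Mirrors the dofilewrite condition: throttle only for non-devfs vnodes. */
static bool
wants_bwillwrite(const struct vnode *vp)
{
	return (!vnode_is_devfs(vp));
}
```

Comparing the ops vector pointer avoids storing FDEVFS_VNODE in the same field as the offset lock bits, which is the interaction the thread above ran into.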

bwillwrite itself is kind of bogus. Not everything uses the buffer cache (most notably, ZFS does not).

Sure, that seems reasonable. I mostly don't care about that as long as the ILP32 unlocking part is fixed. =)

This revision was not accepted when it landed; it landed in state Needs Review. May 24 2020, 3:51 AM

Closed by commit rS361429: vfs: scale foffset_lock by using atomics instead of serializing on mtx pool (authored by mjg).

This revision was automatically updated to reflect the committed changes.

mjg added a commit: rS361429: vfs: scale foffset_lock by using atomics instead of serializing on mtx pool.

Revision Contents

Path	Size
head/sys/kern/vfs_vnops.c	78 lines
head/sys/sys/file.h	2 lines

Diff 72182

head/sys/kern/vfs_vnops.c