Page MenuHomeFreeBSD

vfs: scale foffset_lock
ClosedPublic

Authored by mjg on Sep 12 2019, 10:25 PM.
Tags
None
Referenced Files
Unknown Object (File)
Oct 20 2024, 10:03 PM
Unknown Object (File)
Sep 21 2024, 5:50 PM
Unknown Object (File)
Sep 21 2024, 5:50 PM
Unknown Object (File)
Sep 21 2024, 5:50 PM
Unknown Object (File)
Sep 21 2024, 5:49 PM
Unknown Object (File)
Sep 21 2024, 5:48 PM
Unknown Object (File)
Sep 18 2024, 6:15 AM
Unknown Object (File)
Sep 18 2024, 12:21 AM
Subscribers

Details

Summary

The routine avoidably locks a mutex from a small pool which causes scalability issues. Instead we can set the bit with atomics and only resort to locking if that fails. Since going to sleep would take a sleepq lock anyway remove mtx pool use and use sleepq locks directly.

On a kernel with other fixes (including markj's vm page patch) I get the following results from will-it-scale during tests with 104 threads on skylake:

testbeforeafterdiff
lseek1_processes257118082402258149+37%
readseek1_processes75866480101140043+25%

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The comment in sys/file.h for f_vnread_flags needs updating.

sys/kern/vfs_vnops.c
691 ↗(On Diff #62009)

Are _short implemented on all arches ?

708 ↗(On Diff #62009)

Don't this lack DROP_GIANT/PICKUP_GIANT ?

740 ↗(On Diff #62009)

) != 0

  • address feedback
sys/kern/vfs_vnops.c
691 ↗(On Diff #62009)

I should have checked, apparently not. I'll get this sorted out.

mjg marked 2 inline comments as done.Sep 13 2019, 3:56 PM

I am fine with this version modulo atomic_short.

This patch is not quite right.

  • f_vnread_flags can have FDEVFS_VNODE from devfs nodes, but this isn't taken into account. (noticed because of the next part) and maintained throughout the process.
  • foffset_lock will take the lock with FOF_NOLOCK set if OFF_MAX > LONG_MAX, but foffset_nolock will not drop it in this model.

With this patch tacked on to yours, mips/MALTA will boot: https://people.freebsd.org/~kevans/D21626-fix.diff -- I suspect it might be better to instead #define FOFFSET_LOCK_FLAGS FOFFSET_LOCKED | FOFFSET_LOCK_WAITINGand set something like vflag = *flagsp & ~FOFFSET_LOCK_FLAGS to future-proof it, but I only took the bare minimum to make this work properly.

Looks like the current code (i.e., unpatched) is already buggy in this regard.

There is only one place which acts on the flag and I consider it to be an abuse of the field.

dofilewrite

        if (fp->f_type == DTYPE_VNODE &&
            (fp->f_vnread_flags & FDEVFS_VNODE) == 0)
                bwillwrite();

device nodes don't register their own ops. I think the flag should be eliminated and instead we can compare ->v_op against devfs_vnodeops.

bwillwrite itself is kind of bogus. not everything uses buffer cache (and most notably zfs does not).

Sure, that seems reasonable. I mostly don't care about that as long as the ILP32 unlocking part is fixed.=)

This revision was not accepted when it landed; it landed in state Needs Review.May 24 2020, 3:51 AM
This revision was automatically updated to reflect the committed changes.