Page MenuHomeFreeBSD

parallelize buffer cache
ClosedPublic

Authored by jeff on Feb 8 2018, 11:12 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 3, 4:10 PM
Unknown Object (File)
Dec 2 2024, 11:15 PM
Unknown Object (File)
Dec 2 2024, 3:18 PM
Unknown Object (File)
Nov 23 2024, 2:17 PM
Unknown Object (File)
Oct 17 2024, 9:57 AM
Unknown Object (File)
Oct 9 2024, 2:37 PM
Unknown Object (File)
Sep 28 2024, 3:01 PM
Unknown Object (File)
Sep 25 2024, 11:32 PM
Subscribers

Details

Summary

This separates the buffer cache from a single managed entity in to multiple 'buf domains'. A bufobj is associated with a buf domain. Each domain manages its own buf space and buf count limits, has its own wait channels, and its own bufspace daemon. Each domain additionally has an optional per-cpu clean queue to alleviate pressure on the clean queue lock. A single global bufspace is not longer supportable on large machines, it becomes a point of significant contention, whether for the atomic bufspace variable, or the locks synchronizing sleeps and wakeups.

The kva space is actually still managed by a single vmem that uses per-cpu quantum caches. The buf headers are still also managed in per-cpu uma caches. All global counters have been switched from atomics to counter(9).

There is a second level of lock avoidance with B_REUSE. This favors a second chance algorithm in buf_recycle() rather than directly adjusting queue position on every access to a buf. It means frequently read bufs may linger slightly longer and buf_recycle() may have to visit more bufs to reach successful completion. However, for things like indirect blocks which are read very frequently it was a substantial reduction in lock contention.

I don't really like bufqueue() and bufqueue_acquire(). They work, they are not elegant. I would accept suggestions for other mechanisms. The rest is fairly typically divide and conquer locking work.

I changed bufspace_reserve() to use only atomic add. I believe there are no races with it missing wakeups but I could be mistaken.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kern/vfs_bio.c
543 ↗(On Diff #39072)

This is missing a diff that switches to atomic_fetchadd. I will update after I incorporate other review feedback.

1795–1798 ↗(On Diff #39072)

This is broken because we return with a different lock held than we start with. I have a fix in my branch that involves letting this function manage its own locks.

This closes a race in bufspace_daemon_wakeup(). replaced an atomic cmpset loop with atomic fetchadd. And fixes a locking bug with B_AGE.

kern/vfs_bio.c
494 ↗(On Diff #39227)

Use atomic_store_int for consistency with bufspace_daemon_wait()?

506 ↗(On Diff #39227)

It kind of bugs me that we have bufspace_daemon_wait() and bufspace_wait(), and that bufspace_daemon_wait() is called by the bufspace daemon while bufspace_daemon_wakeup() is not.

It looks a bit like this is modeled after pagedaemon_wait()/pagedaemon_wakeup(), but they're not really analogous. I would inline bufspace_daemon_wait() and rename bufspace_daemon_wakeup() to bufspace_wakeup().

1354 ↗(On Diff #39227)

Why isn't this "if (cpu == -1)"? Looks like this is relying on mp_maxid being unsigned.

1425 ↗(On Diff #39227)

Unneeded return statement.

1762 ↗(On Diff #39227)

Style: spaces around '|'.

1782 ↗(On Diff #39227)

This is racy if bd_flush() was called because bd_wanted is true. Is that going to lead to problems?

jeff added inline comments.
kern/vfs_bio.c
506 ↗(On Diff #39227)

What was bufspace_wakeup() is now handled via bd_wanted and bq_flush(). This is really the thing that wakes up the bufspace waiters.

the _daemon_ things only wake up the deferred processing thread. I like having the two functions separate from the main body because it's easier to look at and reason about the synchronization issues.

1782 ↗(On Diff #39227)

I'm not sure what you mean. We can miss an update to bd_wanted when inserting into one of the per-cpu queues. But this is true even with a lock. The only place that really shouldn't sleep at all is the bufspace daemon which calls flushall. If even this fails it will sleep for a short while and retry.

kern/vfs_bio.c
1782 ↗(On Diff #39227)

Ignore me, I forgot that the domain lock is clean queue lock for that domain.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 20 2018, 12:06 AM
This revision was automatically updated to reflect the committed changes.
jeff marked 2 inline comments as done.