
parallelize buffer cache
Closed, Public

Authored by jeff on Feb 8 2018, 11:12 PM.

Details

Summary

This separates the buffer cache from a single managed entity into multiple 'buf domains'. A bufobj is associated with a buf domain. Each domain manages its own buf space and buf count limits, has its own wait channels, and runs its own bufspace daemon. Each domain additionally has an optional per-cpu clean queue to alleviate pressure on the clean queue lock. A single global bufspace is no longer supportable on large machines; it becomes a point of significant contention, whether on the atomic bufspace variable or on the locks synchronizing sleeps and wakeups.
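
As a rough orientation for reviewers, a per-domain structure might carry something like the fields sketched below. The names and layout here are illustrative only; they are not the ones used in kern/vfs_bio.c.

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/buf.h>

    /*
     * Illustrative sketch only; the real per-domain structure in
     * kern/vfs_bio.c differs in naming and detail.
     */
    struct bufdomain_sketch {
        struct mtx        bd_lock;        /* protects the shared clean queue */
        TAILQ_HEAD(, buf) bd_cleanq;      /* shared clean queue */
        long              bd_bufspace;    /* buf space currently in use */
        long              bd_maxbufspace; /* per-domain space limit */
        int               bd_numbufs;     /* buf count for this domain */
        int               bd_wanted;      /* space waiters are sleeping */
        /* plus: optional per-cpu clean queues and a bufspace daemon handle */
    };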

The kva space is still managed by a single vmem that uses per-cpu quantum caches. The buf headers are likewise still managed in per-cpu uma caches. All global counters have been switched from atomics to counter(9).
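
For context, the counter(9) pattern that replaces a shared atomic looks roughly like the sketch below; the counter name and helper functions are made up for illustration and are not from the patch.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <sys/counter.h>

    /* Hypothetical stats counter; the real counters live in vfs_bio.c. */
    static counter_u64_t example_bufs_recycled;

    static void
    example_counter_init(void)
    {
        /* Per-CPU storage; updates never bounce a shared cache line. */
        example_bufs_recycled = counter_u64_alloc(M_WAITOK);
    }

    static void
    example_note_recycle(void)
    {
        /* Cheap per-CPU increment instead of a global atomic_add. */
        counter_u64_add(example_bufs_recycled, 1);
    }

    static uint64_t
    example_read_total(void)
    {
        /* Summing across CPUs is only paid on the (rare) read side. */
        return (counter_u64_fetch(example_bufs_recycled));
    }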

There is a second level of lock avoidance with B_REUSE. This favors a second-chance algorithm in buf_recycle() rather than directly adjusting queue position on every access to a buf. It means frequently read bufs may linger slightly longer and buf_recycle() may have to visit more bufs to complete successfully. However, for things like indirect blocks, which are read very frequently, it gives a substantial reduction in lock contention.
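
In outline, the second-chance scheme amounts to the sketch below. Only the B_REUSE flag itself comes from the patch; the queue type, the function, and its structure are simplified stand-ins. Readers mark a buf by setting B_REUSE on access instead of relocking the queue to move it, and the recycling scan gives marked bufs one more pass.

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/buf.h>

    /* Simplified stand-in for a per-domain clean queue. */
    struct bufqueue_sketch {
        TAILQ_HEAD(, buf) bq_queue;
    };

    /*
     * Simplified second-chance scan.  The real buf_recycle() also takes
     * the queue lock, handles B_AGE, and sanity checks each buf.
     */
    static struct buf *
    example_recycle_scan(struct bufqueue_sketch *bq)
    {
        struct buf *bp, *nbp;

        TAILQ_FOREACH_SAFE(bp, &bq->bq_queue, b_freelist, nbp) {
            if ((bp->b_flags & B_REUSE) != 0) {
                /*
                 * Accessed since the last pass: clear the mark and
                 * give the buf a second chance at the queue tail.
                 */
                bp->b_flags &= ~B_REUSE;
                TAILQ_REMOVE(&bq->bq_queue, bp, b_freelist);
                TAILQ_INSERT_TAIL(&bq->bq_queue, bp, b_freelist);
                continue;
            }
            /* Not marked since the last pass: recycle this one. */
            return (bp);
        }
        return (NULL);
    }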

I don't really like bufqueue() and bufqueue_acquire(). They work, but they are not elegant; I would welcome suggestions for other mechanisms. The rest is fairly typical divide-and-conquer locking work.

I changed bufspace_reserve() to use only an atomic add. I believe there are no races that would cause it to miss wakeups, but I could be mistaken.
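
The shape of an atomic-add reservation is roughly the following; the names, the limit value, and the limit handling are illustrative, not the actual bufspace_reserve().

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>

    /* Illustrative per-domain accounting variables. */
    static volatile u_long example_bufspace;                    /* space in use */
    static u_long example_maxbufspace = 32 * 1024 * 1024;       /* made-up limit */

    /*
     * Rough shape of an atomic-add reservation.  The real function also
     * handles the metadata limit and wakes the bufspace daemon when the
     * high water mark is crossed.
     */
    static int
    example_bufspace_reserve(u_long size)
    {
        u_long space;

        space = atomic_fetchadd_long(&example_bufspace, size);
        if (space + size > example_maxbufspace) {
            /* Over the limit: back the reservation out and fail. */
            atomic_subtract_long(&example_bufspace, size);
            return (ENOSPC);
        }
        return (0);
    }

The question raised in the paragraph above is whether a reservation and back-out in this style can race with a waiter going to sleep and thereby miss a wakeup.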

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kern/vfs_bio.c
543

This is missing a diff that switches to atomic_fetchadd. I will update after I incorporate other review feedback.

1795–1798

This is broken because we return with a different lock held than we start with. I have a fix in my branch that involves letting this function manage its own locks.

This closes a race in bufspace_daemon_wakeup(), replaces an atomic cmpset loop with an atomic fetchadd, and fixes a locking bug with B_AGE.
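
For comparison, the two update styles mentioned here differ roughly as in the sketch below; the variable and function names are illustrative.

    #include <sys/param.h>
    #include <sys/systm.h>

    static volatile u_long example_value;

    /* cmpset style: read, compute, and retry until the CAS succeeds. */
    static void
    example_add_cmpset(u_long diff)
    {
        u_long old;

        do {
            old = example_value;
        } while (atomic_cmpset_long(&example_value, old, old + diff) == 0);
    }

    /* fetchadd style: a single unconditional atomic read-modify-write. */
    static void
    example_add_fetchadd(u_long diff)
    {
        atomic_fetchadd_long(&example_value, diff);
    }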

kern/vfs_bio.c
494

Use atomic_store_int for consistency with bufspace_daemon_wait()?

506

It kind of bugs me that we have bufspace_daemon_wait() and bufspace_wait(), and that bufspace_daemon_wait() is called by the bufspace daemon while bufspace_daemon_wakeup() is not.

It looks a bit like this is modeled after pagedaemon_wait()/pagedaemon_wakeup(), but they're not really analogous. I would inline bufspace_daemon_wait() and rename bufspace_daemon_wakeup() to bufspace_wakeup().

1354

Why isn't this "if (cpu == -1)"? Looks like this is relying on mp_maxid being unsigned.
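
The conversion point can be shown in isolation. The snippet below is an illustration only, not the code under review; mp_maxid and curcpu are the real kernel symbols, everything else is made up.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/smp.h>
    #include <sys/pcpu.h>

    /*
     * Illustration only: assigning -1 to an unsigned cpu index converts
     * it to UINT_MAX, so a bound check against the unsigned mp_maxid
     * happens to catch the sentinel value as well.
     */
    static u_int
    example_pick_cpu(int preferred)
    {
        u_int cpu;

        cpu = preferred;            /* -1 becomes UINT_MAX here */
        if (cpu > mp_maxid)         /* relies on the unsigned wraparound */
            cpu = curcpu;           /* "preferred == -1" states the intent */
        return (cpu);
    }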

1425

Unneeded return statement.

1762

Style: spaces around '|'.

1782

This is racy if bd_flush() was called because bd_wanted is true. Is that going to lead to problems?

jeff added inline comments.
kern/vfs_bio.c
506

What was bufspace_wakeup() is now handled via bd_wanted and bq_flush(). This is really the thing that wakes up the bufspace waiters.

The _daemon_ functions only wake up the deferred processing thread. I like having the two functions separate from the main body because it makes the synchronization issues easier to look at and reason about.

1782

I'm not sure what you mean. We can miss an update to bd_wanted when inserting into one of the per-cpu queues, but this is true even with a lock. The only place that really shouldn't sleep at all is the bufspace daemon, which calls flushall. If even that fails, it will sleep for a short while and retry.

kern/vfs_bio.c
1782

Ignore me, I forgot that the domain lock is the clean queue lock for that domain.

This revision was not accepted when it landed; it landed in state Needs Review. Feb 20 2018, 12:06 AM
This revision was automatically updated to reflect the committed changes.
jeff marked 2 inline comments as done.