Parallelize the buffer cache and rewrite getnewbuf().
rS289279

Description

Parallelize the buffer cache and rewrite getnewbuf(). This results in a
8x performance improvement in a microbenchmark on a 4-socket machine.

Get buffer headers from a per-cpu UMA cache that sits in front of the free queue.
Use a per-cpu quantum cache in vmem to eliminate contention for KVA.
Use multiple clean queues according to buffer cache size to eliminate clean queue lock contention.
Introduce a bufspace daemon that attempts to prevent getnewbuf() callers from blocking or doing direct recycling.
Close some bufspace allocation races that could lead to endless recycling.
Further the transition to a more modern style of small functions grouped by prefix in order to keep the growing complexity manageable.
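
The per-cpu caching idea in the first two items can be illustrated with a small userspace sketch. This is not the code from this commit and it does not use UMA or vmem; every name in it (hdr_alloc, hdr_free, hdr_seed, hdr_slow_path_count, NCPU, PCPU_CACHE) is hypothetical, and a C11 atomic_flag spin lock stands in for the kernel's free queue lock. It assumes each cpu index is used by only one thread at a time, which is what makes the fast path lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU        4   /* hypothetical CPU count */
#define PCPU_CACHE  8   /* hypothetical per-CPU cache capacity */

struct hdr {                    /* stand-in for a buffer header */
	struct hdr *next;
};

struct pcpu_cache {             /* private to one CPU, so no lock */
	struct hdr *items[PCPU_CACHE];
	int count;
};

static struct pcpu_cache caches[NCPU];
static struct hdr *global_free;                 /* shared free queue */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;
static unsigned long slow_path_trips;           /* guarded by global_lock */

static void
global_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&global_lock,
	    memory_order_acquire))
		;                               /* spin until the flag clears */
}

static void
global_release(void)
{
	atomic_flag_clear_explicit(&global_lock, memory_order_release);
}

/* Setup helper: put one header on the shared free queue. */
void
hdr_seed(struct hdr *h)
{
	global_acquire();
	h->next = global_free;
	global_free = h;
	global_release();
}

/* Allocate a header; the shared lock is taken only when the cache is empty. */
struct hdr *
hdr_alloc(int cpu)
{
	struct pcpu_cache *c;

	assert(cpu >= 0 && cpu < NCPU);
	c = &caches[cpu];
	if (c->count == 0) {
		/* Slow path: refill half the cache under one lock hold. */
		global_acquire();
		slow_path_trips++;
		while (c->count < PCPU_CACHE / 2 && global_free != NULL) {
			struct hdr *h = global_free;

			global_free = h->next;
			c->items[c->count++] = h;
		}
		global_release();
		if (c->count == 0)
			return (NULL);          /* shared queue is empty too */
	}
	return (c->items[--c->count]);
}

/* Free a header; the shared lock is taken only when the cache is full. */
void
hdr_free(int cpu, struct hdr *h)
{
	struct pcpu_cache *c;

	assert(cpu >= 0 && cpu < NCPU);
	c = &caches[cpu];
	if (c->count == PCPU_CACHE) {
		/* Slow path: drain half the cache back to the shared queue. */
		global_acquire();
		slow_path_trips++;
		while (c->count > PCPU_CACHE / 2) {
			struct hdr *d = c->items[--c->count];

			d->next = global_free;
			global_free = d;
		}
		global_release();
	}
	c->items[c->count++] = h;
}

/* Number of times any CPU had to take the shared lock. */
unsigned long
hdr_slow_path_count(void)
{
	unsigned long n;

	global_acquire();
	n = slow_path_trips;
	global_release();
	return (n);
}
```

With these constants, eight consecutive allocations on one CPU take the shared lock twice instead of eight times, and freeing those eight headers back on the same CPU takes it zero times. Moving headers in batches of half the cache is a design choice of this sketch: it leaves room for both allocations and frees after a slow-path trip.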

Sponsored by: EMC / Isilon
Reviewed by: kib
Tested by: pho