This has the side effect of reducing contention on the vm object lock, since pmap locks are taken while the vm object lock is held. It is also preparation for letting unused chunks linger on LRU lists, reducing chunk creation/destruction. I did not try to make the lists per-CPU for now, due to the current reclamation scheme. I'm not fond of the way the physical address is found; basic ideas for solving it are:
- encode the domain in lower bits of the pointer
- reduce the number of chunks by 1
Results below are from 90 minutes of poudriere -j 104, with some other local changes plus markj's atomic queue state patch:
| head | per-domain pv chunk |
| --- | --- |
| 1940930217 (rw:vm object) | 1477941752 (sx:vm map (user)) |
| 1767248576 (sx:proctree) | 1455213049 (rw:vm object) |
| 1457914859 (sx:vm map (user)) | 1431832545 (sx:proctree) |
| 1027485418 (sleep mutex:VM reserv domain) | **829714675 (rw:pmap pv list)** |
| **916225793 (rw:pmap pv list)** | 827757829 (sleep mutex:VM reserv domain) |
| 650061753 (sleep mutex:ncvn) | 549093363 (sleep mutex:vm active pagequeue) |
| 579930729 (sleep mutex:vm active pagequeue) | 543907775 (sleep mutex:ncvn) |
| 500588125 (sleep mutex:pfs_vncache) | 529903985 (sleep mutex:process lock) |
| **483413129 (sleep mutex:pmap pv chunk list)** | 510587707 (sleep mutex:pfs_vncache) |
| 470146522 (sleep mutex:process lock) | 416087302 (sleep mutex:vm page free queue) |
| 438331739 (sleep mutex:vm page free queue) | 372820786 (lockmgr:tmpfs) |
| 432400293 (lockmgr:tmpfs) | 341279654 (sleep mutex:struct mount vlist mtx) |
| 374447164 (lockmgr:zfs) | 309354907 (lockmgr:zfs) |
| 324149303 (sleep mutex:struct mount vlist mtx) | 257888425 (spin mutex:sleepq chain) |
| 250128575 (spin mutex:sleepq chain) | 244137628 (sleep mutex:vnode interlock) |
| 228014749 (sleep mutex:vnode interlock) | 231878487 (sleep mutex:vnode_free_list) |
| 221631639 (sleep mutex:vnode_free_list) | **181800839 (sleep mutex:pmap pv chunk list)** |