Currently, sends on unix sockets contend heavily on read-locking the list lock. As a result, unix1_processes in will-it-scale peaks at 6 processes and then declines.
With this change I get a substantial improvement in number of operations per second with 96 processes:
```
x before
+ after
    N           Min           Max        Median           Avg        Stddev
x  11       1688420       1696389       1693578     1692766.3     2971.1702
+  10      63417955      71030114      70662504      69576423     2374684.6
Difference at 95.0% confidence
6.78837e+07 +/- 1.49463e+06
4010.22% +/- 88.4246%
(Student's t, pooled s = 1.63437e+06)
```
"Small" iron changes (1, 2, and 4 processes):
```
x before1
+ after1.2
[ministat distribution plot]
    N           Min           Max        Median           Avg        Stddev
x  10       1131648       1197750     1197138.5     1190369.3     20651.839
+  10       1203840       1205056       1204919     1204827.9     353.27404
Difference at 95.0% confidence
14458.6 +/- 13723
1.21463% +/- 1.16683%
(Student's t, pooled s = 14605.2)
x before2
+ after2.2
[ministat distribution plot]
    N           Min           Max        Median           Avg        Stddev
x  10       1972843       2045866     2038186.5     2030443.8     21367.694
+  10       2400853       2402196     2401043.5     2401172.7     385.40024
Difference at 95.0% confidence
370729 +/- 14198.9
18.2585% +/- 0.826943%
(Student's t, pooled s = 15111.7)
x before4
+ after4.2
    N           Min           Max        Median           Avg        Stddev
x  10       3986994       3991728     3990137.5     3989985.2     1300.0164
+  10       4799990       4806664     4806116.5       4805194     1990.6625
Difference at 95.0% confidence
815209 +/- 1579.64
20.4314% +/- 0.0421713%
(Student's t, pooled s = 1681.19)
```
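For readers checking the numbers, the percentage lines above follow from the two Avg columns: the absolute difference is the "after" mean minus the "before" mean, and the percentage is that difference relative to the "before" mean. A minimal sketch of the arithmetic (the function name is hypothetical, not part of ministat):

```c
/*
 * Relative change of the "after" mean against the "before" mean, in
 * percent.  Hypothetical helper for illustration; it only restates the
 * arithmetic behind the "Difference at 95.0% confidence" lines.
 */
static double
rel_change_pct(double before_avg, double after_avg)
{
	return ((after_avg - before_avg) * 100.0 / before_avg);
}
```

Calling `rel_change_pct(1692766.3, 69576423.0)` with the 96-process averages should land on the reported figure of roughly 4010%, and the 1-process averages give roughly 1.2%.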
Updated locking protocol:
```
* Locking and synchronization:
*
* Three types of locks exist in the local domain socket implementation: a
* global linkage rwlock, the mtxpool lock, and per-unpcb mutexes.
* The linkage lock protects the socket count, global generation number,
* and stream/datagram global lists.
*
* The mtxpool lock protects the vnode from being modified while referenced.
* Lock ordering requires that it be acquired before any unpcb locks.
*
* The unpcb lock (unp_mtx) protects all fields in the unpcb. Of particular
* note is that this includes the unp_conn field. So long as the unpcb lock
* is held, the reference to the unpcb pointed to by unp_conn is valid. If we
* require that the unpcb pointed to by unp_conn remain live in cases where
* we need to drop the unp_mtx, as when we need to acquire the lock for a
* second unpcb, the caller must first acquire an additional reference on the
* second unpcb and then revalidate any state (typically check that unp_conn
* is non-NULL) upon reacquiring the initial unpcb lock. The lock ordering
* between unpcbs is the conventional ascending address order. Two helper
* routines exist for this:
*
* - unp_pcb_lock2(unp, unp2) - which just acquires the two locks in the
* safe ordering.
*
* - unp_pcb_owned_lock2(unp, unp2, freed) - the lock for unp is held
* when called. If unp is unlocked and unp2 is subsequently freed,
* freed will be set to 1.
*
* The helper routines for references are:
*
* - unp_pcb_hold(unp): Can be called any time we currently hold a valid
* reference to unp.
*
* - unp_pcb_rele(unp): The caller must hold the unp lock. If we are
* releasing the last reference, detach must have been called, thus
* unp->unp_socket must be NULL.
*
* UNIX domain sockets each have an unpcb hung off of their so_pcb pointer,
* allocated in pru_attach() and freed in pru_detach(). The validity of that
* pointer is an invariant, so no lock is required to dereference the so_pcb
* pointer if a valid socket reference is held by the caller. In practice,
* this is always true during operations performed on a socket. Each unpcb
* has a back-pointer to its socket, unp_socket, which will be stable under
* the same circumstances.
*
* This pointer may only be safely dereferenced as long as a valid reference
* to the unpcb is held. Typically, this reference will be from the socket,
* or from another unpcb when the referring unpcb's lock is held (in order
* that the reference not be invalidated during use). For example, to follow
* unp->unp_conn->unp_socket, you need to hold a lock on unp_conn to guarantee
* that detach is not run clearing unp_socket.
```
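The ordering and reference rules above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not the code in this change: `struct pcb` and the `pcb_*` functions are hypothetical stand-ins for the unpcb and the `unp_pcb_*` helpers, pthread mutexes stand in for unp_mtx, and the try-lock fast path is an assumption about how the held-lock case might avoid dropping the first lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct unpcb. */
struct pcb {
	pthread_mutex_t	mtx;	/* models unp_mtx */
	atomic_uint	refs;	/* models the unpcb reference count */
	struct pcb	*conn;	/* models unp_conn; protected by mtx */
};

static struct pcb *
pcb_new(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p == NULL)
		abort();
	pthread_mutex_init(&p->mtx, NULL);
	atomic_init(&p->refs, 1);
	return (p);
}

/* Models unp_pcb_hold(): legal whenever a valid reference is already held. */
static void
pcb_hold(struct pcb *p)
{
	atomic_fetch_add(&p->refs, 1);
}

/*
 * Models unp_pcb_rele(): the caller holds p->mtx.  Returns 1 if this
 * dropped the last reference, in which case p is unlocked and freed.
 */
static int
pcb_rele(struct pcb *p)
{
	if (atomic_fetch_sub(&p->refs, 1) != 1)
		return (0);
	pthread_mutex_unlock(&p->mtx);
	pthread_mutex_destroy(&p->mtx);
	free(p);
	return (1);
}

/* Models unp_pcb_lock2(): take both locks in ascending address order. */
static void
pcb_lock2(struct pcb *a, struct pcb *b)
{
	assert(a != b);
	if ((uintptr_t)a > (uintptr_t)b) {
		struct pcb *t = a;

		a = b;
		b = t;
	}
	pthread_mutex_lock(&a->mtx);
	pthread_mutex_lock(&b->mtx);
}

/*
 * Models unp_pcb_owned_lock2(): a is locked on entry.  Returns 0 with
 * both locked, or 1 if b was freed while a was dropped (only a locked).
 */
static int
pcb_owned_lock2(struct pcb *a, struct pcb *b)
{
	assert(a != b);
	if (pthread_mutex_trylock(&b->mtx) == 0)
		return (0);		/* uncontended: a was never dropped */
	if ((uintptr_t)b > (uintptr_t)a) {
		pthread_mutex_lock(&b->mtx);	/* already in address order */
		return (0);
	}
	pcb_hold(b);			/* keep b live while a is unlocked */
	pthread_mutex_unlock(&a->mtx);
	pthread_mutex_lock(&b->mtx);
	pthread_mutex_lock(&a->mtx);
	return (pcb_rele(b));
}
```

A caller that took the slow path would still need to revalidate its own state after `pcb_owned_lock2()` returns, for example by checking that `a->conn` is still non-NULL and still points at `b`, since `a` may have been disconnected while its lock was dropped.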