Currently, sends on unix sockets contend heavily on read-locking the list lock. unix1_processes in will-it-scale peaks at 6 processes and then declines.
With this change I get a substantial improvement in the number of operations per second with 96 processes:
```
x before
+ after
N Min Max Median Avg Stddev
x 11 1688420 1696389 1693578 1692766.3 2971.1702
+ 10 63417955 71030114 70662504 69576423 2374684.6
Difference at 95.0% confidence
6.78837e+07 +/- 1.49463e+06
4010.22% +/- 88.4246%
(Student's t, pooled s = 1.63437e+06)
"Small" iron changes (1, 2, and 4 processes):
x before1
+ after1
N Min Max Median Avg Stddev
x 10 1131648 1197750 1197138.5 1190369.3 20651.839
+ 10 1184568 1185836 1185313.5 1185290.5 327.29676
No difference proven at 95.0% confidence
x before2
+ after2
N Min Max Median Avg Stddev
x 10 1972843 2045866 2038186.5 2030443.8 21367.694
+ 10 2329770 2386833 2385753 2378617.6 17975.116
Difference at 95.0% confidence
348174 +/- 18551.8
17.1477% +/- 1.00839%
(Student's t, pooled s = 19744.4)
x before4
+ after4
N Min Max Median Avg Stddev
x 10 3986994 3991728 3990137.5 3989985.2 1300.0164
+ 10 4775200 4783252 4781933.5 4780467.9 3359.2578
Difference at 95.0% confidence
790483 +/- 2393.17
19.8117% +/- 0.0616572%
```
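For readers checking the numbers, the percentage lines in the ministat output above are consistent with the difference of the two averages divided by the "before" average. A minimal sketch of that arithmetic follows; the function name is hypothetical and is not part of ministat:

```c
#include <assert.h>

/*
 * Relative difference between two sample means, expressed as a
 * percentage of the baseline ("before") mean.
 * Hypothetical helper for illustration only.
 */
double
relative_diff_pct(double before_avg, double after_avg)
{
	return ((after_avg - before_avg) / before_avg * 100.0);
}
```

Feeding it the 96-process averages (1692766.3 and 69576423) gives roughly 4010.22, and the 2-process averages (2030443.8 and 2378617.6) give roughly 17.15, matching the reported figures.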
Updated locking protocol:
```
* Locking and synchronization:
*
* Three types of locks exist in the local domain socket implementation: a
* global linkage rwlock, the mtxpool lock, and per-unpcb mutexes.
* The linkage lock protects the socket count, global generation number,
* and stream/datagram global lists.
*
* The mtxpool lock protects the vnode from being modified while referenced.
* Lock ordering requires that it be acquired before any unpcb locks.
*
* The unpcb lock (unp_mtx) protects all fields in the unpcb. Of particular
* note is that this includes the unp_conn field. So long as the unpcb lock
* is held, the reference to the unpcb pointed to by unp_conn is valid. If we
* require that the unpcb pointed to by unp_conn remain live in cases where
* we need to drop the unp_mtx, as when we need to acquire the lock for a
* second unpcb, the caller must first acquire an additional reference on the
* second unpcb and then revalidate any state (typically check that unp_conn
* is non-NULL) upon reacquiring the initial unpcb lock. The lock ordering
* between unpcbs is the conventional ascending address order. Two helper
* routines exist for this:
*
* - unp_pcb_lock2(unp, unp2) - which just acquires the two locks in the
* safe ordering.
*
* - unp_pcb_owned_lock2(unp, unp2, freed) - the lock for unp is held
* when called. If unp is unlocked and unp2 is subsequently freed,
* freed will be set to 1.
*
* The helper routines for references are:
*
* - unp_pcb_hold(unp): Can be called any time we currently hold a valid
* reference to unp.
*
* - unp_pcb_rele(unp): The caller must hold the unp lock. If we are
* releasing the last reference, detach must have been called, thus
* unp->unp_socket must be NULL.
*
* UNIX domain sockets each have an unpcb hung off of their so_pcb pointer,
* allocated in pru_attach() and freed in pru_detach(). The validity of that
* pointer is an invariant, so no lock is required to dereference the so_pcb
* pointer if a valid socket reference is held by the caller. In practice,
* this is always true during operations performed on a socket. Each unpcb
* has a back-pointer to its socket, unp_socket, which will be stable under
* the same circumstances.
*
* This pointer may only be safely dereferenced as long as a valid reference
* to the unpcb is held. Typically, this reference will be from the socket,
* or from another unpcb when the referring unpcb's lock is held (in order
* that the reference not be invalidated during use). For example, to follow
* unp->unp_conn->unp_socket, you need to hold a lock on unp_conn to guarantee
* that detach is not run clearing unp_socket.
```
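The ordering and revalidation rules described in the comment can be illustrated with a userspace analogue. This is only a sketch under stated assumptions, not the kernel code: struct pcb, pcb_init(), pcb_lock2(), pcb_unlock2(), and pcb_owned_lock2() are hypothetical stand-ins built on pthread mutexes and C11 atomics, and the "freed" case only reports the condition rather than releasing memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct unpcb; not the kernel type. */
struct pcb {
	pthread_mutex_t	 mtx;	/* plays the role of unp_mtx */
	struct pcb	*conn;	/* plays the role of unp_conn; protected by mtx */
	atomic_int	 refs;	/* reference count */
};

void
pcb_init(struct pcb *p)
{
	pthread_mutex_init(&p->mtx, NULL);
	p->conn = NULL;
	atomic_init(&p->refs, 1);
}

/* Acquire both locks in ascending address order (one lock if a == b). */
void
pcb_lock2(struct pcb *a, struct pcb *b)
{
	struct pcb *lo = ((uintptr_t)a < (uintptr_t)b) ? a : b;
	struct pcb *hi = (lo == a) ? b : a;

	pthread_mutex_lock(&lo->mtx);
	if (hi != lo)
		pthread_mutex_lock(&hi->mtx);
}

void
pcb_unlock2(struct pcb *a, struct pcb *b)
{
	pthread_mutex_unlock(&a->mtx);
	if (b != a)
		pthread_mutex_unlock(&b->mtx);
}

/*
 * Called with a's lock held. On return with *freed == 0 both locks are
 * held. Returns 1 if a's lock was dropped along the way, in which case
 * the caller must revalidate state such as a->conn. If b lost its last
 * reference while a was unlocked, *freed is set to 1 and only a's lock
 * is held on return.
 */
int
pcb_owned_lock2(struct pcb *a, struct pcb *b, int *freed)
{
	*freed = 0;
	assert(a != b);
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&b->mtx);	/* already ascending order */
		return (0);
	}
	if (pthread_mutex_trylock(&b->mtx) == 0)
		return (0);			/* non-blocking, so order-safe */
	atomic_fetch_add(&b->refs, 1);		/* keep b live while a is unlocked */
	pthread_mutex_unlock(&a->mtx);
	pthread_mutex_lock(&b->mtx);		/* b sorts first ... */
	pthread_mutex_lock(&a->mtx);		/* ... then a */
	if (atomic_fetch_sub(&b->refs, 1) == 1) {
		/* Last reference gone; a real implementation would free b here. */
		pthread_mutex_unlock(&b->mtx);
		*freed = 1;
	}
	return (1);
}
```

A caller holding a's lock would invoke pcb_owned_lock2(a, a->conn, &freed) and, when it returns 1 with freed still 0, check that a->conn still points at the same peer before using it. The out-of-order trylock never blocks, which is why it cannot deadlock against a thread taking the locks in ascending order.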