If m_pulldown() can be serviced by a contiguous range of a single mbuf, then the mbuf data region does not need to be writable. Therefore, remove the writability check for that case.
Panasas originally made this change internally in our 7.2-based code, and found ourselves needing to make it again in 10.3. Hence, we want to upstream the change and stop having to re-do it.
Context from the original internal issue:
The issue there is that this change semantically changes m_pulldown but probably makes m_pulldown semantics more like the documented semantics than the current semantics.
Specifically, the change removes the test for writeability of a shared external mbuf when deciding to return the buffer unchanged or modify it. The test for writeability was broken anyway (since it only checked for cluster mbufs rather than all types of external mbufs, which include 4k, 9K and 16K types in addition to cluster mbufs).
After a long discussion, I am convinced that the semantic change should not be a problem since all of the modifications that follow the test don't actually change the external header but might modify the mbuf chain that surrounds the external mbuf.
And more current analysis:
I ran the following dtrace script (dtrace -w -s /root/mbuf.d) while doing <<<NETWORK STUFF>>>. m_pulldown() can fail if the mbufs it's operating on or might create is larger than MCLBYTES (2 KB). On any failure path, it will call m_freem(). I inspected a mbuf being freed by m_freem() and found the mbuf is 2120 bytes. The interesting question is why does this code work on 7.2 with jumbo frames but not 10.3 if m_pulldown() is limited by MCLBYTES. The mxge driver on 7.2 will use jumbo mbuf clusters while cxgbe on 10.3 will use M_EXT mbufs that point to external buffers allocated by the driver.
% cat /root/mbuf.d fbt::m_pulldown:entry { printf("before: %p %p %d %p %d %d %p", args[0], args[0]->m_hdr.mh_data, args[0]->m_hdr.mh_len, args[0]->m_hdr.mh_next, args[1], args[2], args[3]); self->m = args[0]; self->traceme = 1; } fbt::m_pulldown:return /self->m/ { printf("after: %p %d %p", self->m, args[0], args[1]); self->m = 0; self->traceme = 0; } fbt::m_freem:entry /self->traceme/ { printf("m_freem: %p", args[0]); panic(); /* @[stack()] = count(); */ }