Page MenuHomeFreeBSD

Remove writability requirement for single-mbuf, contiguous-range m_pulldown()
ClosedPublic

Authored by rpokala on Jan 5 2017, 11:01 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 10, 6:56 AM
Unknown Object (File)
Dec 25 2024, 8:51 PM
Unknown Object (File)
Dec 25 2024, 8:51 PM
Unknown Object (File)
Dec 25 2024, 9:04 AM
Unknown Object (File)
Dec 23 2024, 6:41 PM
Unknown Object (File)
Oct 27 2024, 11:43 AM
Unknown Object (File)
Oct 10 2024, 1:19 PM
Unknown Object (File)
Oct 4 2024, 3:51 AM

Details

Summary

If m_pulldown() can be serviced by a contiguous range of a single mbuf, then the mbuf data region does not need to be writable. Therefore, remove the writability check for that case.


Panasas originally made this change internally in our 7.2-based code, and found ourselves needing to make it again in 10.3. Hence, we want to upstream the change and stop having to re-do it.

Context from the original internal issue:

The issue there is that this change semantically changes m_pulldown but probably makes m_pulldown semantics more like the documented semantics than the current semantics.

Specifically, the change removes the test for writeability of a shared external mbuf when deciding to return the buffer unchanged or modify it. The test for writeability was broken anyway (since it only checked for cluster mbufs rather than all types of external mbufs, which include 4k, 9K and 16K types in addition to cluster mbufs).

After a long discussion, I am convinced that the semantic change should not be a problem since all of the modifications that follow the test don't actually change the external header but might modify the mbuf chain that surrounds the external mbuf.

And more current analysis:

I ran the following dtrace script (dtrace -w -s /root/mbuf.d) while doing <<<NETWORK STUFF>>>. m_pulldown() can fail if the mbufs it's operating on or might create is larger than MCLBYTES (2 KB). On any failure path, it will call m_freem(). I inspected a mbuf being freed by m_freem() and found the mbuf is 2120 bytes. The interesting question is why does this code work on 7.2 with jumbo frames but not 10.3 if m_pulldown() is limited by MCLBYTES. The mxge driver on 7.2 will use jumbo mbuf clusters while cxgbe on 10.3 will use M_EXT mbufs that point to external buffers allocated by the driver.

% cat /root/mbuf.d 
fbt::m_pulldown:entry
{
  printf("before: %p %p %d %p %d %d %p", args[0], args[0]->m_hdr.mh_data, args[0]->m_hdr.mh_len, args[0]->m_hdr.mh_next, args[1], args[2], args[3]);
  self->m = args[0];
  self->traceme = 1;
}

fbt::m_pulldown:return
/self->m/
{
  printf("after: %p %d %p", self->m, args[0], args[1]);
  self->m = 0;
  self->traceme = 0;
}

fbt::m_freem:entry
/self->traceme/
{
  printf("m_freem: %p", args[0]);
  panic();
  /* 
  @[stack()] = count();
  */
}

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

rpokala retitled this revision from to Remove writability requirement for single-mbuf, contiguous-range m_pulldown().
rpokala updated this object.
rpokala edited the test plan for this revision. (Show Details)
rpokala added a subscriber: rwatson.

@rwatson - the most recent commit that mentions m_pulldown() was yours (r276884); could you either review this in the near future, or else suggest someone else? Thanks!

rwatson edited edge metadata.
rwatson added a subscriber: bz.

I'm not sure if consumers of m_pulldown() make assumptions about writability or not. The man page doesn't mention that they should (or not) but this is more of an empirical question. As I recall, m_pulldown() is particularly popular in IPv6, so tagging Bjoern to perhaps take a look at this and see what he thinks.

rpokala edited edge metadata.

Ok, just for my understanding, can you confirm that the commit message really means "if m_pullup does not have to do anything, then the mbuf does not need to be writeable"? Or in other words "if the requested memory region is already contiguous and nothing needs to change, the mbuf does not need to be writeable"?

Because that's my understanding from reading of the code.

Now to address @rwatson comment; I am not sure if I care; if there are assumptions in code that the mbuf that you get back is writeable because of m_pullup() they are broken if you ask me and we'll find them otherwise. As a matter of fact the "writeable" variable seems to be a mis-namer now as M_*SPACE does that check already in later conditions and the writeable is only checked after that, so it's really just a check whether it satisfies:

154         if ((n->m_flags & M_EXT) == 0 ||
155             (n->m_ext.ext_type == EXT_CLUSTER

and could be named "is_cluster"?

m_pulldown() only needs to determine if a mbuf is writable if it is going to copy data into the data region of an existing mbuf. It does this to create a contiguous data region in a single mbuf from multiple mbufs in the chain. The if block at line 162 does not require any mbuf data region modification. Therefore, checking writable is unnecessary.

Note that @bmueller_panasas.com is the originator of this patch, so I happily defer to his explanation.

bz edited edge metadata.

I'd say get it in and @rwatson and I can figure out the other bits...

This revision is now accepted and ready to land.Jan 11 2017, 8:13 PM
This revision was automatically updated to reflect the committed changes.