Marry mbuf(9) and queue(3).
AbandonedPublic
Actions

Authored by glebius on Jan 11 2015, 6:47 PM.

Details

Reviewers

rpaulo
glebius
gnn
rwatson

Group Reviewers

network

Summary

I always hated to reimplement the basic programming ideas. That's
why I love our queue.h and how we use it. But mbufs have always
been not friendly to it.

Provide enough syntatic sugar that:

a) all code that uses m_next and m_nextpkt still works as before
b) SLIST and STAILQ macros are now usable over mbufs.

I propose to commit only the mbuf.h changes. The struct ifqueue
changes are provided in the differential revision just to demonstrate
the power of new feature. For the struct ifqueue itself I actually
have other plans in the projects/ifnet branch.

Test Plan

This differential has been submitted via kernel running the patch.
At least IPv6 MLD code runs with modified macros.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

No Lint Coverage

Unit

No Test Coverage

Event Timeline

glebius updated this revision to Diff 3116.Jan 11 2015, 6:47 PM

glebius retitled this revision from to Marry mbuf(9) and queue(3)..

glebius updated this object.

glebius edited the test plan for this revision. (Show Details)

glebius added a reviewer: network.

rpaulo added a subscriber: rpaulo.Jan 11 2015, 8:01 PM

rpaulo added inline comments.

sys/net/ifq.h
227 ↗	(On Diff #3116)	Why are we initialising the stailq here? Is it because all the drivers call this and you needed a place to initialise the stailq?
sys/netinet/igmp.c
2770 ↗	(On Diff #3116)	Now, this is clearly more verbose and could also be considered a source code regression. We should provide macros to get the tail mbuf.
sys/sys/mbuf.h
95–96	This will conflict with Robert's change of embedding m_hdr into struct mbuf.

glebius added inline comments.Jan 11 2015, 8:10 PM

sys/net/ifq.h
227 ↗	(On Diff #3116)	Yes, exactly. This is a quite hack to get quickly a working kernel. As said, the ifqueue changes were provided to show how ifq.h merges down, and not to be committed. The ifqueue itself is destined for removal.
sys/netinet/igmp.c
2770 ↗	(On Diff #3116)	The IGMP code actually violates API when it names ifq_head and ifq_tail. The users of ifqueue should use only IF_* and IFQ_* families of macros. And this API doesn't allow to check for last pointer, hence the violation. Well, initially the idea to use _interface_ queue for a simple mbuf queue was already incorrect. May be the struct ifqueue was not the best example to demonstrate the feature, but it was the smallest facility to quickly get a working example.
sys/sys/mbuf.h
95–96	It won't conflict, but of course automatic merge would fail.

I have a bad feeling over the mbuf.h change. It seems to add more options/complexity.
I am also note sure to which extent this is all still needed once we have varsize mbufs (some of this will certainly be needed).
I would love to defer this until we do understand the new mbuf world a lot better.

Sorry, Bjoern, but your comment is very unconstructive and FUD-like.

Yes, change adds more options. Why is this considered a downside?
No, it doesn't add complexity. The change to mbuf.h itself has zero impact on the kernel binary.
The change is absolutely orthogonal to var-size mbufs. It doesn't matter how mbufs are sized, the linked lists of mbuf were there and would be there.

In general I like this idea of bringing mbufs into line with our current set of macros for list walking. I think so long as this is coordinated with Robert's large change that this will be fine to do, and should be done sooner rather than later as a lot of network code will need to be changed to follow this change.

as a lot of network code will need to be changed to follow this change.

No! I outlined in initial description that "all code that uses m_next and m_nextpkt still works as before". And you can see that if you look at the change.

I gather that the ultimate goal of this work is to replace the old way of doing this with the macros. If that is the case then we should plan a migration. If that is not the case then I don't think this commit should proceed at all.

This revision now requires changes to proceed.Jan 13 2015, 3:48 PM

Do you suggest to convert all code in src/sys utilizing mbufs to new macros? I suggest new tool, not policy. Tool is backward compatible.

Mixing queue(3) macros and direct manipulation of pointers may be binary compatible but doesn't read too well. Our code base is reasonably clean so let's try and keep it that way as much as possible. So my suggestion is that you should either convert all the code that manipulates mbufs to use the queue(3) macros or not make this change at all.

There is absolutely no reason to convert 200 old NIC drivers to queue(3) macros. I do not suggest to mix old accessors with queue(3) macros. Old code continues to use old fields, new code is encouraged to use new.

One can utilize queue(3) macros when writing a new facility, which is self-contained and isolated from rest of the stack. Examples: interface queues, queues of multicast/ARP/etc packets, socket buffers, and many other places.

Please compare this:

https://svnweb.freebsd.org/base/projects/ifnet/sys/sys/mbuf.h?r1=277070&r2=277071

with the code we have now in net/ifq.h. The STAILQ based code is much cleared to a one familiar with queue(3), which applies to any BSD kernel hacker. While manual reimplementation of stailq is less clear and more prone to errors.

P.S. Note that STAILQ also has extra error-catching code when compiled with INVARIANTS.

Sync with latest change from Robert.

glebius added a reviewer: rwatson.Jan 15 2015, 6:31 AM

Ping, guys!

I'm not against of this, but I'm agree with np@. There is no way to determine what type of pointer we have in mu_next/mu_nextpkt. Thus it is possible that some code will do access to them assuming wrong type.

Hi Gleb:

Three thoughts, not necessarily to immediately prompt a new patch version, but might:

(1) It would be nice to have CTASSERT()s that no matter how you name the 'next' pointer, they end up pointing to the same offset in struct mbuf, have the same size, and so on. This is important in part because it's easy to imagine having versions of the queue macros that add additional debugging state, which wouldn't violate the API for queues, but would break the binary interface for mbufs. CTASSERTs would help cause a compilation failure if something were to happen there .. vs. memory corruption.

(2) It's unfortunate that we need to introduce m_next as a #define -- I'm just in the middle of trying to remove the namespace mess in struct mbuf. I'd like to get rid of m_ext next. I can't think of an easy way not to do it and still do what you're trying though .. or can we do anonymous unions in C? I forget. Regardless, I would like it if the field names weren't stailq but m_stailq or mu_stailq or similar, as otherwise there's a further increased chance of collisions. Mbufs have long had unfortunate field names but we need to make that better if we can.

(3) It also makes me nervous (as it does np@, et al) to have both conventions in the kernel at once. There's a strong argument for using queue macros here -- and it's really a shame that wasn't done from inception in mbufs, but far too late for that now (by a good twenty or thirty years!).

Robert

Robert,

We already have CTASSERTs that cover this. They reside in uipc_mbuf.c:

#if defined(LP64)
CTASSERT(offsetof(struct mbuf, m_dat) == 32);
CTASSERT(sizeof(struct pkthdr) == 56);
CTASSERT(sizeof(struct struct_m_ext) == 48);
#else
CTASSERT(offsetof(struct mbuf, m_dat) == 24);
CTASSERT(sizeof(struct pkthdr) == 48);
CTASSERT(sizeof(struct struct_m_ext) == 28);
#endif

That's a really good idea. The anonymous unions are C11 standard, and long time had been a GNU extension. Now, that we have C11 compiler and also we already got C11 code in userland, we should encourage ourselves to use C11 code in the kernel. I will upload updated patch. It looks much cleaner than before.

Pretty sure that mbufs predate queue(3), so no shame and no suprise that macros where not used from the very beginning. And I don't accept argument "it was always like this, so should it be always so!". You just embedded m_hdr into mbuf, which has broken that rule :) And note that you needed to do a sweep to adopt everyone to m_hdr embedded, while I don't need to.

+ Navdeep, + Andrey

I really don't understand your concerns. We all agree that from a binary viewpoint nothing changes. Speaking of code, mixing old m_next with new SLIST/STAILQ in the same subsystem is of course a strong style(9) bug, but I doubt that someone can do that intentionally. So, some subsystems can continue to use old chain glue, and others can be converted, or written from scratch and use new glue.

When mbufs travel between subsystems, it is entirely valid to use different code in different subsystems to store them. It is even valid to use different data structures. For example, when mbufs sit in current socket buffer, they are glued by m_next (and m_nextpkt). When they travel from socket buffer to interface fxp0 send queue, they are glued by m_nextpkt. But, if they are sent out of cxgbe0, then they are stored in mp_ring, where m_nextpkt isn't used at all. This doesn't make you concerned, does it? And that's different even from binary viewpoint, but this is still okay! So, I really don't understand what can go wrong if I write a subsystem that stores mbufs in a STAILQ_HEAD and connect it to the stack.

Use C11 anonymous unions.

Hi Gleb:

I meant adding new CTASSERTs to ensure that sizeof(stailq bits) == sizeof(mbuf pointer bits), etc, to ensure that if someone decides to tweak stailq or slist entry structures, the assertion fails. Essentially, to embody in compile-time assertions something that makes the kernel fail to compile if the implied equality of types fails to hold true -- in as much as we possibly can. This would require new CTASSERTs in your patch.

Robert

(Just to be clear: I do realise that the mbuf size asserts will fail if we change the size of the queue stuff -- but what I want is CTASSERTs that fail for exactly the right cause so that it is obvious to the person breaking it that they should fix the actual problem rather than just adjust mbuf size)

The assertion in uipc_mbuf.c looks for offsetof m_dat, which is down below m_next and m_nextpkt. So, if STAILQ_ENTRY or SLIST_ENTRY grows, then the assertion will fire. So we got this covered, implicitly.

To assert explicitly that sizeof(stailq bits) == sizeof(mbuf pointer bits), we need the structures that hide behind STAILQ_ENTRY or SLIST_ENTRY to be named, which is impossible.

I could put this:

CTASSERT(offsetof(struct mbuf, m_data) == sizeof(struct mbuf *) * 2);

But this is the same level of implicitness that we have with current assertions, IMHO.

This could work:

#ifdef INVARIANTS
static struct mbuf m_assertbuf;

CTASSERT(sizeof(m_assertbuf.m_next) == sizeof(m_assertbuf.m_stailq));
...
#endif

Yes, that sounds good to me -- we can't quite assert what we want, but that's pretty close.

Add extra assertions to uipc_mbuf.c

I remain deeply concerned about mixing three different kinds of lists. I don't even know if it's possible to always contain all code paths through all subsystems on just a single list type. Do we really need to add 2 or could we go with just 1 new one to make it clear we intend to go from A to B and not maybe B or C?

That said I'd still hope that the m_next one would be more or less (mostly) be going away entirely soon so making a mess there is not a good idea.

We want 2 because mbufs can be put on SLISTs and can be put on STAILQs. For example, a socket buffer queue is a STAILQ, since we are interested in fast access to the last packet. But a packet that consist of a chain of mbufs is a SLIST.

I have thought about this patch for a bit, and I really think its a *good* thing. I know
it adds an added complexity, but in the end it helps us get to the point where
we can use list structures (queue.h) and benefit from that. I understand gnn's concern
that he wants everything changed. But thats not practical in my opinion and here is why, consider
all the folks out there who have external code that may never get into FreeBSD, if you
don't provide an m_next for there code to continue to work, you have just broken them.

This is not an unusual thing in the networking stack, we have slowly moved away from
using if_snd and its kin to using the drbr ring. Old interface drivers are still in the past but
the new ones and of course ones being actively updated have moved to the if_transmit methods.
So I think this patch should go ahead and get in and of course coordinated with Roberts work.

Just my 2 cents.. (good work Gleb!!)

I'm about to commit it this week.