Page MenuHomeFreeBSD


Authored by glebius on Mar 15 2017, 9:23 PM.



This is something I promised to do before stable/11 branchpoint, but
failed. The goal is to stop userland applications using kernel
internal structures representing inet protocol control block and TCP
control block. And to stop appropriate sysctls from exporting them.

Basicly, back 25 years ago, when 'struct xsocket' was introduced,
the 'struct xinpcb' and 'struct xtcpcb' should had been designed
the same way - pasting all useful fields from corresponding kernel

Not done that time, so time to do it today, the hard way. This is
going to break a bunch of applications in ports, but I will fix them
as quickly as I can. I already went through such process when I've
hidden 'struct ifnet' and 'struct ifaddr' from userland. I already
started fixing applications...

What will we achieve with the change:

  1. There is no need for spares in 'struct inpcb', 'struct tcpcb'. We are free to move fields around the structure. And what is important Randall already has branch tossing around inpcb that has measurable performance gain.
  2. We will have binaries like netstat(1) working on next versions of FreeBSD. I have just checked, 10-th netstat won't work on 11, and 9-th won't work on 10, etc. But with this change 12-th is going to work on 13, 14, 15, etc...


  • Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
  • Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside.
  • Provide some extra fields into struct xinpcb. Provide some spares. Convert inp_depend4 and inp_depend6 to anonymous structs to remove defines and make names of fields in inpcb and xinpcb same.
  • Seems no good point to export ALL of struct tcpcb. Instead provide a ton of spare fields, to be used as we are in need to export smth.
  • Make kernel and userland utilities compilable and working after these changes.
  • Bump FreeBSD version.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius created this revision.Mar 15 2017, 9:23 PM

I'm going to commit this on Monday. Waiting for your reviews, guys.

gnn accepted this revision.Mar 17 2017, 5:07 AM
gnn added a subscriber: gnn.

Other than the nits I pointed out already this looks like a good change.

2448 ↗(On Diff #26290)

Out of a judicious level of paranoia I'd check for NULL inputs here.

222 ↗(On Diff #26290)

Any reason not to give this bit of structure a convenient, even if different, name?

293 ↗(On Diff #26290)

Is there a reason to have spares here as well as later? Are you trying to force an alignment or is there another reason?

2790 ↗(On Diff #26290)

The above reads oddly to me. Why not leave COPYTIMER defined and put it in an appropriate place in this file or in a header file?

665 ↗(On Diff #26290)

As above, is there a reason to bracket the structure with spares? If so, I think you ought to put something in the comment block above telling future programmers how they need to allocate these spares.

66 ↗(On Diff #26290)

Does the first #define get undefined? This looks strange to me.

This revision is now accepted and ready to land.Mar 17 2017, 5:07 AM
glebius added inline comments.Mar 17 2017, 6:02 AM
2448 ↗(On Diff #26290)

This shall not happen. Function must panic in that case.

222 ↗(On Diff #26290)

The reason to anonymize this structure is to make it possible to match names in inpcb and xinpcb. Before anonymization, this code wasn't possible:

struct xinpcb *xi;
struct inpcb *inp;

xi->inp_options = inp->inp_options;

..., because inp_options was a pre-processor define. That would be substituted on both sides of the equation.

Note that pre-processor defines were part of API, now actual structure names with same names are.

Finally, a better question on this peace of code would be: why did you leave the structure at all? Why just not to make bare names, which would perfectly work?! Here is the answer. These protocol dependent bits should be protocol-specific, and should be embraced into anonymous union, which would save more space in struct inpcb. However, doing that would create a insta-panicing kernel! Seems like there is some misuse, when both netinet and netinet6 use same fields. The plan is to find misuse, fix it, and embrace into union later.

293 ↗(On Diff #26290)

The reason is to make the API/ABI stable into many future FreeBSD versions. Whatever we can put later into inpcb and want export into userland.

2790 ↗(On Diff #26290)

This is a regular and useful practice in FreeBSD code, when a define is used in a single place. You can find many dozens of examples throughout kernel and userland.

665 ↗(On Diff #26290)

This version of xtcpcb imports those fields of tcpcb that are known to be used by existing programs. But there are many different fields of tcpcb, that are now not exported. We may want to export them later.

If you can name fields of tcpcb that should be exported now, please suggest. My main intent of this review is to hear from TCP guys what else we should export here and now, not awaiting for demand to export and not wasting spares in future.

66 ↗(On Diff #26290)

They are different.

gnn added inline comments.Mar 17 2017, 6:14 AM
66 ↗(On Diff #26290)

Ah, yup, by two, easy to miss characters, but fair enough.

rrs accepted this revision.Mar 17 2017, 10:20 AM

In general I love this change with my few nits. Note I have a major restructure coming
to inpcb and tcpcb that this helps pave the way for. Basically I have used vtune to
cache-line optimize tcp-output to get its cache line usage down.

2448 ↗(On Diff #26290)

Then lets at least add an ASSERT :)

222 ↗(On Diff #26290)

Well not quite "insta-panicing" kernel.. it took a couple of hours (unless on a 100G machine it took only a 20-30min period :o)... :-)

2790 ↗(On Diff #26290)

I agree with gnn on this one. It looks odd and personally I despise macro's anyway. If your going to
define one (which yes this makes sense for) lets put it in the header file. The name of the macro
clearly tells whats happening.

665 ↗(On Diff #26290)

One thing that might be good to export (thats not there now but we at NF have discussed) is the
actual name of the tcp stack running.. i.e. the string from tcbcb->tf_fb->tfb_tcp_block_name

This would allow a program like netstat to display the tcp stack being used. If tf_fb were NULL we would need to copy in the name "default".

glebius added inline comments.Mar 17 2017, 3:56 PM
2448 ↗(On Diff #26290)

If you insist, guys, I can of course. But there sotoxsocket(), that has been in place for 30 years and is 100% analogue to this function. There is no KASSERT here, and nobody complained. And there are thousands of other functions that will panic with NULL argument. The panic in case of NULL deref is so clear and self explanatory.

Hey. just look into this particular file - in_pcb.c. EVERY function in this file that takes inpcb pointer will panic in case if it is NULL, and NEITHER function has KASSERT. Was this ever a problem? So why do you want me to make this function special?

2790 ↗(On Diff #26290)

Guys, please do:

find sys -name \*.c | xargs grep ^#undef

and look at each case. More than half of them would be like this. This is a normal and quite convenient practice,

glebius updated this revision to Diff 26358.Mar 17 2017, 4:36 PM
glebius edited edge metadata.

Export TCP stack name into xtcpcb.

This revision now requires review to proceed.Mar 17 2017, 4:36 PM
glebius marked an inline comment as done.Mar 17 2017, 4:37 PM
gnn accepted this revision.Mar 18 2017, 12:33 AM
This revision is now accepted and ready to land.Mar 18 2017, 12:33 AM
This revision was automatically updated to reflect the committed changes.
brooks added a subscriber: brooks.Apr 21 2017, 9:12 PM
brooks added inline comments.

What is the purpose of this __aligned(8)? The structure contains an int64_t and thus will be at least 8 byte aligned.

glebius added inline comments.Apr 25 2017, 4:28 PM

You are right. The previous structure had _x_alignment_hack member, back from times where compilers didn't have __aligned keyword. So I removed the member and added keyword. Later, with the final stages of the patch I added the spares.

bdrewery added inline comments.

I think finding what fields are "used" is impossible. Ports is not the entire FreeBSD ecosystem. I just merged this change downstream and have *14* fields missing that we depend on. Breaking ABI is one thing but not providing all of the same data is another.