Page MenuHomeFreeBSD

Track device's NUMA domain in ifnet & allocate ifnet from NUMA local memory
ClosedPublic

Authored by gallatin on Apr 17 2019, 12:09 AM.

Details

Summary

I have added a new if_alloc_domain() and if_alloc_dev() and ported 2 example drivers to use this new interface. (Assuming this patch is accepted, I will follow on with patches to at least iflib(4) and mxge(4)).
When called with a domain on a NUMA machine, ifalloc_domain() will record the NUMA domain in the ifnet, and it will allocate the ifnet struct from memory which is local to the device's NUMA node.

Drivers can call if_alloc_dev() to automatically record the NUMA domain in the ifnet and allocate the ifnet from local memory.

The NUMA node is used in a follow on patch to enhance lagg lacp egress port selection based on NUMA domain. (in addition to another patch which records NUMA domain information in the inp).

I have intentionally avoided ifdef NUMA as much as possible; all code is out of the critical path (only run at alloc/free) and gracefully falls back to the old behavior when run on a non-NUMA kernel or with a NUMA kernel on non-NUMA hardware.

The new if_numa_domain field fits in an alignment pad in struct ifnet, and so does not alter the size of the structure.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

gallatin created this revision.Apr 17 2019, 12:09 AM
kib added a comment.Apr 17 2019, 8:20 AM

I wonder if a KPI of the format

struct ifnet *if_alloc_dev(u_char type, dev_t device);

would be more useful. It would hide all NUMA/non-NUMA/bus_get_domain() failed details that driver authors will copy/paste ad infinitum.

I do not propose to not add if_alloc_domain(), just think that if_alloc_dev() should be the primari KPI. Also it might allow to get more device_t context in if_alloc().

sys/net/if.c
540 ↗(On Diff #56266)

There should be spaces around binary op.

594 ↗(On Diff #56266)

There should be a blank line after '{' if there is no locals.

In D19930#428329, @kib wrote:

I wonder if a KPI of the format

struct ifnet *if_alloc_dev(u_char type, dev_t device);

would be more useful. It would hide all NUMA/non-NUMA/bus_get_domain() failed details that driver authors will copy/paste ad infinitum.
I do not propose to not add if_alloc_domain(), just think that if_alloc_dev() should be the primari KPI. Also it might allow to get more device_t context in if_alloc().

I think I agree with you. I was torn between the 2 different KPIs. Given that you suggested what I already was thinking about, I will implement it.

I'm trying to think now how if_alloc_domain() could be useful for non-device ifnets, and it does not seem to me at all clear that it would be. So I think I will replace if_alloc_domain() with if_alloc_dev()

kib added a comment.Apr 17 2019, 1:13 PM
In D19930#428329, @kib wrote:

I wonder if a KPI of the format

struct ifnet *if_alloc_dev(u_char type, dev_t device);

would be more useful. It would hide all NUMA/non-NUMA/bus_get_domain() failed details that driver authors will copy/paste ad infinitum.
I do not propose to not add if_alloc_domain(), just think that if_alloc_dev() should be the primari KPI. Also it might allow to get more device_t context in if_alloc().

I think I agree with you. I was torn between the 2 different KPIs. Given that you suggested what I already was thinking about, I will implement it.
I'm trying to think now how if_alloc_domain() could be useful for non-device ifnets, and it does not seem to me at all clear that it would be. So I think I will replace if_alloc_domain() with if_alloc_dev()

if_alloc_domain() could be useful for specific cases of synthetic ifnets, e.g. the mentioned lagg when both ifnets belong to the same domain, or even more clear, if_vlan for the domain of the main interface.

gallatin edited the summary of this revision. (Show Details)Apr 17 2019, 3:34 PM
gallatin updated this revision to Diff 56286.
  • Added if_alloc_dev() and changed example mlx5en and cxgbe code to use them, as suggested by kib.
  • Added links to if_alloc_domain and if_alloc_dev for man pages
  • Fixed style issues pointed out by kib
gallatin marked 2 inline comments as done.Apr 17 2019, 3:34 PM
kib added inline comments.Apr 17 2019, 4:41 PM
sys/net/if.c
42 ↗(On Diff #56286)

Thus header is excessive if domainset.h is included ?

599 ↗(On Diff #56286)

I would write it as

if (dev == NULL || bus_get_domain(dev, &numa_domain) != 0)
   return (if_alloc_domain(type, IF_NODOM);
return (if_alloc_domain(type, numa_domain);
gallatin updated this revision to Diff 56302.Apr 17 2019, 5:53 PM

Address kib's feedback

  • remove unneeded _domainset.h include
  • re-write if_alloc_dev() to be more concise
gallatin marked an inline comment as done.Apr 17 2019, 5:54 PM
gallatin added inline comments.
sys/net/if.c
42 ↗(On Diff #56286)

Indeed, you are correct. I confess to just copying from the domainset(9) man page.

kib accepted this revision.Apr 17 2019, 6:19 PM
This revision is now accepted and ready to land.Apr 17 2019, 6:19 PM
markj accepted this revision.Apr 18 2019, 12:58 PM
markj added inline comments.
sys/net/if.c
596 ↗(On Diff #56302)

There are redundant parens around the call.

gallatin updated this revision to Diff 56348.Apr 18 2019, 3:18 PM

Removed redundant parens around bus_get_domain() call as pointed out by markj

This revision now requires review to proceed.Apr 18 2019, 3:18 PM
gallatin marked an inline comment as done.Apr 18 2019, 3:19 PM
This revision was not accepted when it landed; it landed in state Needs Review.Apr 22 2019, 7:24 PM
This revision was automatically updated to reflect the committed changes.