Page MenuHomeFreeBSD

Track device's NUMA domain in ifnet & allocate ifnet from NUMA local memory
ClosedPublic

Authored by gallatin on Apr 17 2019, 12:09 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Mar 25, 11:11 PM
Unknown Object (File)
Sat, Mar 9, 4:56 AM
Unknown Object (File)
Thu, Mar 7, 5:56 AM
Unknown Object (File)
Feb 7 2024, 12:53 PM
Unknown Object (File)
Jan 16 2024, 11:17 AM
Unknown Object (File)
Dec 23 2023, 1:53 AM
Unknown Object (File)
Nov 18 2023, 10:42 PM
Unknown Object (File)
Nov 18 2023, 4:14 PM
Subscribers

Details

Summary

I have added a new if_alloc_domain() and if_alloc_dev() and ported 2 example drivers to use this new interface. (Assuming this patch is accepted, I will follow on with patches to at least iflib(4) and mxge(4)).
When called with a domain on a NUMA machine, ifalloc_domain() will record the NUMA domain in the ifnet, and it will allocate the ifnet struct from memory which is local to the device's NUMA node.

Drivers can call if_alloc_dev() to automatically record the NUMA domain in the ifnet and allocate the ifnet from local memory.

The NUMA node is used in a follow on patch to enhance lagg lacp egress port selection based on NUMA domain. (in addition to another patch which records NUMA domain information in the inp).

I have intentionally avoided ifdef NUMA as much as possible; all code is out of the critical path (only run at alloc/free) and gracefully falls back to the old behavior when run on a non-NUMA kernel or with a NUMA kernel on non-NUMA hardware.

The new if_numa_domain field fits in an alignment pad in struct ifnet, and so does not alter the size of the structure.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I wonder if a KPI of the format

struct ifnet *if_alloc_dev(u_char type, dev_t device);

would be more useful. It would hide all NUMA/non-NUMA/bus_get_domain() failed details that driver authors will copy/paste ad infinitum.

I do not propose to not add if_alloc_domain(), just think that if_alloc_dev() should be the primari KPI. Also it might allow to get more device_t context in if_alloc().

sys/net/if.c
540 ↗(On Diff #56266)

There should be spaces around binary op.

594 ↗(On Diff #56266)

There should be a blank line after '{' if there is no locals.

In D19930#428329, @kib wrote:

I wonder if a KPI of the format

struct ifnet *if_alloc_dev(u_char type, dev_t device);

would be more useful. It would hide all NUMA/non-NUMA/bus_get_domain() failed details that driver authors will copy/paste ad infinitum.

I do not propose to not add if_alloc_domain(), just think that if_alloc_dev() should be the primari KPI. Also it might allow to get more device_t context in if_alloc().

I think I agree with you. I was torn between the 2 different KPIs. Given that you suggested what I already was thinking about, I will implement it.

I'm trying to think now how if_alloc_domain() could be useful for non-device ifnets, and it does not seem to me at all clear that it would be. So I think I will replace if_alloc_domain() with if_alloc_dev()

In D19930#428329, @kib wrote:

I wonder if a KPI of the format

struct ifnet *if_alloc_dev(u_char type, dev_t device);

would be more useful. It would hide all NUMA/non-NUMA/bus_get_domain() failed details that driver authors will copy/paste ad infinitum.

I do not propose to not add if_alloc_domain(), just think that if_alloc_dev() should be the primari KPI. Also it might allow to get more device_t context in if_alloc().

I think I agree with you. I was torn between the 2 different KPIs. Given that you suggested what I already was thinking about, I will implement it.

I'm trying to think now how if_alloc_domain() could be useful for non-device ifnets, and it does not seem to me at all clear that it would be. So I think I will replace if_alloc_domain() with if_alloc_dev()

if_alloc_domain() could be useful for specific cases of synthetic ifnets, e.g. the mentioned lagg when both ifnets belong to the same domain, or even more clear, if_vlan for the domain of the main interface.

gallatin edited the summary of this revision. (Show Details)
  • Added if_alloc_dev() and changed example mlx5en and cxgbe code to use them, as suggested by kib.
  • Added links to if_alloc_domain and if_alloc_dev for man pages
  • Fixed style issues pointed out by kib
sys/net/if.c
42 ↗(On Diff #56286)

Thus header is excessive if domainset.h is included ?

599 ↗(On Diff #56286)

I would write it as

if (dev == NULL || bus_get_domain(dev, &numa_domain) != 0)
   return (if_alloc_domain(type, IF_NODOM);
return (if_alloc_domain(type, numa_domain);

Address kib's feedback

  • remove unneeded _domainset.h include
  • re-write if_alloc_dev() to be more concise
gallatin added inline comments.
sys/net/if.c
42 ↗(On Diff #56286)

Indeed, you are correct. I confess to just copying from the domainset(9) man page.

This revision is now accepted and ready to land.Apr 17 2019, 6:19 PM
markj added inline comments.
sys/net/if.c
596 ↗(On Diff #56302)

There are redundant parens around the call.

Removed redundant parens around bus_get_domain() call as pointed out by markj

This revision now requires review to proceed.Apr 18 2019, 3:18 PM
This revision was not accepted when it landed; it landed in state Needs Review.Apr 22 2019, 7:24 PM
This revision was automatically updated to reflect the committed changes.