Page MenuHomeFreeBSD

Correct and final KPI to traverse through interface address lists for drivers.Removal of not stack allocated epoch_tracker.
ClosedPublic

Authored by glebius on Tue, Oct 8, 5:30 PM.

Details

Summary

Historically drivers were always traversing address lists manually. This
always was a nuisance:

  • drivers depending on struct ifaddr, struct ifmaddr
  • yet another dependency on struct ifnet
  • dependency on how we lock address lists

All three points had been changed in the past numerous times and always
required at minimum recompilation of drivers, but sometimes a sweeping edit.

The most recent problem came with the network epoch. Conversion from LIST
to CK_LIST was performed with sed(1), but the bitter problem is that epoch
isn't a lock. Thus if_addr_rlock, if_maddr_rlock were converted into epoch
acqusition, but in an ugly way - the epoch tracker was placed on ifnet.
Thus, parallel call into if_addr_rlock on same interface would panic. This
was attempted to be corrected in r340413, which moved epoch tracker into
struct thread. This also emerged FreeBSD-EN-19:11.net for the stable/12.

The new KPI hides locking and internal representation of interface addresses.

The inital plan was to give KPI that will call a given callback for every
address. After editing all the drivers in the system, the plan was updated:

  • filter out only link level addresses, no driver is interested in !AF_LINK
  • count successfully processed addresses and return resulting count
  • provide basic address counting functions

After converting all the drivers, the old KPIs are removed and then all
epoch related hacks from struct thread removed as well.

The branch is not planned to be committed as a single commit. The full
commit history can be browsed here:

https://github.com/glebius/FreeBSD/commits/if_foreach_maddr

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius created this revision.Tue, Oct 8, 5:30 PM
glebius retitled this revision from Correct and final ABI to traverse through interface address lists for drivers. Removal of not stack allocated epoch_tracker. to Correct and final KBI to traverse through interface address lists for drivers.Removal of not stack allocated epoch_tracker..Tue, Oct 8, 5:33 PM
glebius retitled this revision from Correct and final KBI to traverse through interface address lists for drivers.Removal of not stack allocated epoch_tracker. to Correct and final KPI to traverse through interface address lists for drivers.Removal of not stack allocated epoch_tracker..
glebius edited the summary of this revision. (Show Details)
gallatin accepted this revision.Tue, Oct 8, 5:59 PM

I like this a lot. Note that I only looked at mxge and a few iflib drivers.

This revision is now accepted and ready to land.Tue, Oct 8, 5:59 PM
erj accepted this revision.Tue, Oct 8, 6:36 PM

I like the changes for the Intel ethernet drivers. I've always been uncomfortable with how we were required to manually iterate through the list in the driver using the LIST_FOREACH macros; like you say in your description, if the data structure changes, then it causes a lot of driver thrash.

bz added a subscriber: bz.Tue, Oct 8, 8:28 PM

In general I like the change very much. See the minor comment for sanity checks in the internal implementation.

How do you want review on the rest of the pile?

It is quite a lot of different architecture and vendor specific code in there and this might affect ports and a possible MFC once you remove the if_*lock() functions?

sys/net/if.c
4269 ↗(On Diff #63044)

Can you add a KASSERT here that cb is not NULL please?

A lot of functions taking a callback will just do the same as the functions without callback if the cb is NULL (in other words, you could get away with just one function and an if) as the counting-only case is just a special case of the callback without callback function.

Also I think the KPI and expectations should be documented somewhere as if the callback function returns (-1) [we do port drivers from other OSes etc or 100] the outcome might be different that 0 and 1 (or 2).

PS: count should be initialised separately here as well as it is done above.

4305 ↗(On Diff #63044)

Same as above.

sys/ofed, sys/dev/mlx4, sys/dev/mlx5 and sys/dev/usb changes look OK.

sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c
625 ↗(On Diff #63044)

Maybe for infiniband, ibX, devices.

sys/net/if.c
4263 ↗(On Diff #63044)

Should this function return size_t or is there a limit somewhere on the number of entries in the address list?

4299 ↗(On Diff #63044)

Use size_t ?

4311 ↗(On Diff #63044)

If the list should corrupt, maybe it is worth to check
KASSERT(count < INT_MAX);

erj added inline comments.Tue, Oct 8, 11:28 PM
sys/dev/ixl/if_iavf.c
1229 ↗(On Diff #63044)

Why did you remove the "__unused" from the count argument? It looks like count is still unused in the modified function.

sys/dev/ixl/if_ixl.c
1638 ↗(On Diff #63044)

same question here

glebius marked 3 inline comments as done.Wed, Oct 9, 9:45 PM
glebius added inline comments.
sys/net/if.c
4311 ↗(On Diff #63044)

If the list is corrupt overflowing count is our smallest problem :) I don't think we ever have over 4 million addresses on an interface, so u_int must suffice.

glebius updated this revision to Diff 63103.Wed, Oct 9, 9:55 PM

Address reviewers comments.

This revision now requires review to proceed.Wed, Oct 9, 9:55 PM

The plan is to commit first series of commits (see my github link), then do drivers in small batches, asking for testing and resorting to committing untested after timeout, if no volunteers found. When all drivers are done, finalize with the last two commits. Looks like we are all positive, so I'm going to proceed.

philip added a subscriber: philip.Wed, Oct 9, 10:59 PM

I really like this!

I also sanity-checked the sfxge driver change and it looks good.

This revision was not accepted when it landed; it landed in state Needs Review.Thu, Oct 10, 11:43 PM
This revision was automatically updated to reflect the committed changes.