Page MenuHomeFreeBSD

if_epair: rework
ClosedPublic

Authored by kp on Jul 6 2021, 4:40 PM.

Details

Summary

Rework if_epair(4) to no longer use netisr and dpcpu.
Instead use mbufq and swi_net.
This simplifies the code and seems to make it work better and
no longer hang.
Also make sure that the maxqlen is not overrun but instead if the
queue is full drop the oldest packet instead (you can think of your
dma ring overwriting the odlest entry).

I started this at some point last year and wouldn't do some of
this the same anymore (e.g., probably not use swi_net anymore).

I ran a simplified test with 512 l-m-r jail-vnet setup on a
12 core CPU running 100k ping6 -f packets from l to r each
forwarded in m. That lead to about 18% packet drops.

I've put the full file as a drop-in replacement at
https://people.freebsd.org/~bz/tmp/20210706-01-epair.diff

Just putting this out so that other people can test or work
on it and improve it.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

bz requested review of this revision.Jul 6 2021, 4:40 PM

Good work! But hard to review.

sys/net/if_epair.c
151

cysctl -> sysctl

697

Frist -> First

Good work! But hard to review.

That's partially why the full file is also available. It's a few hundred lines so not too bad I'd hope. But maybe someone should think first if they'd want to improve the code further before proper review?

Two minor observations on the copyright... I'll try to review more later.

sys/net/if_epair.c
5

Copyright law is such that you can also do 2009-2021 as well. There's no requirement to list individual years. Only the oldest and youngest years are relevant.

5–6

I think the foundation has a blanket 'remove all rights reserved' policy. Maybe you can ask?

sys/net/if_epair.c
5

Well, copyright law here doesn't require me to put the line down at all and the only year which is relevant are 70 years after my death. The world is round.
But I'll happily safe us a few bytes in the source tree and fold it together the next time I touch it (or in case someone else continues, feel free to do so).

sys/net/if_epair.c
5

Sorry, that wasn't polite of me to answer that way.

sys/net/if_epair.c
5

I was just saying that a date range is legally acceptable and easier to do if you wanted to. The final choice is yours, of course, and I'm sorry that I failed to make that clear.

rgrimes added inline comments.
sys/net/if_epair.c
5

Though Warner and myself stronly disagree on which is correct on how to specify years, I can assure you the law itself does not state "
either is acceptable". Legal opionions on this vary as well. Specifically US law state at 17 usc 401b:

Form of Notice.—If a notice appears on the copies, it shall consist of the following three elements:

(1) the symbol © (the letter C in a circle), or the word “Copyright”, or the abbreviation “Copr.”; and

(2) the year of first publication of the work; in the case of compilations or derivative works incorporating previously published material, the year date of first publication of the compilation or derivative work is sufficient. The year date may be omitted where a pictorial, graphic, or sculptural work, with accompanying text matter, if any, is reproduced in or on greeting cards, postcards, stationery, jewelry, dolls, toys, or any useful articles; and

(3) the name of the owner of copyright in the work, or an abbreviation by which the name can be recognized, or a generally known alternative designation of the owner.

5–6

My understanding is the foundation has removed the "All rights reserved" stanza from they contracts with indivisuals, and the project has removed the stanza from the templates. Further it was my undersanding that at "some point" Ed Maste said they, the foundation, would look into doing a tree sweep to clean up the Foundations copyrights.

sys/net/if_epair.c
5

Agreed. Legal opinion does vary, and often is influenced by international concerns. And rod is right: wording was imprecise. The range of years is the most common advice I've seen in this areas, though the copyright holder is free to do what they want.

epair rework take III.

After more discussions with @kp on performance this is a slightly
improved and reworked version again. Still a bit to go.
We'll see. Doing 3x3mpps now in parallel on locally originated
min sized UDP/IPv6 packets routed to disc(4) on the other end.
This was on mercat5 (which I briefly used for that test).

In D31077#702491, @bz wrote:

epair rework take III.

After more discussions with @kp on performance this is a slightly
improved and reworked version again. Still a bit to go.
We'll see. Doing 3x3mpps now in parallel on locally originated
min sized UDP/IPv6 packets routed to disc(4) on the other end.
This was on mercat5 (which I briefly used for that test).

I've put a full copy of the file to grab and drop into your local tree here: https://people.freebsd.org/~bz/tmp/20210717-01-epair.c

bz marked 6 inline comments as done.Jul 17 2021, 5:33 PM
bz marked an inline comment as done.Jul 17 2021, 7:51 PM

I think something went wrong with the cleanup. I'll go and have a look. Doesn't help if packets don't arrive...

I'm going to commandeer the review to fix this. bz@ feel free to take it back whenever ;)

sys/net/if_epair.c
227

This is wrong, and causes if_epair to not work.
It wants to only swi_sched() the first time we add a packet to the ring, but to do so it must check idx, not ridx (and also check == 0, rather than != 0).

kp added a reviewer: bz.

Fix think-o, make this work

  • rebase
  • use buf_ring rather than a hand-rolled ring
  • Drop new packets rather than the tail of the queue, to reduce the work we do

when saturated.

This version isn't faster than the previous one, but it does cope a lot better
with being overloaded (i.e. receiving more packets than it can handle).

In D31077#739107, @kp wrote:
  • Drop new packets rather than the tail of the queue, to reduce the work we do

when saturated.

The reason I dropped the old one was to "simulate" a ring overflow where you just overwrite the next packet and keep moving.

I like your approach taking things over incrementally. Thank you for doing this!

In D31077#739110, @bz wrote:
In D31077#739107, @kp wrote:
  • Drop new packets rather than the tail of the queue, to reduce the work we do

when saturated.

The reason I dropped the old one was to "simulate" a ring overflow where you just overwrite the next packet and keep moving.

I see. I assumed it was to improve buffer delay behaviour. I’m inclined to optimise for throughput over alignment with hardware though.

I like your approach taking things over incrementally. Thank you for doing this!

I have proof-of-concept code with multiple taskqueues in the receive path that boost throughout considerably. I’d like to merge this first though.
The current version is hilariously slow if there are multiple epairs in the path (e.g.igb0 bridges to epair0a, epair0b and epair1a are in a vnet and route traffic, epair1b bridges to igb1). With the PoC version it’s significantly less slow.

In D31077#739250, @kp wrote:
In D31077#739110, @bz wrote:
In D31077#739107, @kp wrote:
  • Drop new packets rather than the tail of the queue, to reduce the work we do

when saturated.

The reason I dropped the old one was to "simulate" a ring overflow where you just overwrite the next packet and keep moving.

I see. I assumed it was to improve buffer delay behaviour. I’m inclined to optimise for throughput over alignment with hardware though.

Not sure it'll give better throughput in the end for connection-oriented transmissions. But I assume there is an entire theory filed behind this and the change below is a lot more important :-)

I like your approach taking things over incrementally. Thank you for doing this!

I have proof-of-concept code with multiple taskqueues in the receive path that boost throughout considerably.

Happy to hear that the suggested fan-out works out. Can't wait to see it. The tricky bits are to avoid re-ordering if possible of flows.

I’d like to merge this first though.

Go ahead if it passes testing. I am not going to "accept:" this one as reviewing mostly my own code makes little sense, right?

The current version is hilariously slow if there are multiple epairs in the path (e.g.igb0 bridges to epair0a, epair0b and epair1a are in a vnet and route traffic, epair1b bridges to igb1). With the PoC version it’s significantly less slow.

Turns out with you taking over the revision I can "Accept" it :-)

This revision is now accepted and ready to land.Oct 31 2021, 7:20 PM
This revision was automatically updated to reflect the committed changes.