iflib: remove transmit prefetching
ClosedPublic
Actions

Authored by gallatin on Mon, Nov 10, 6:21 PM.

Details

Reviewers

shurd
glebius
markj
mjg
ae
melifaro
kbowling

Group Reviewers

iflib

Commits

rGc9e4643da132: iflib: remove transmit prefetching

Summary

iflib's prefetching transmit descriptors actually hurts performance. I've tested 100g and 400g NICs in multiple generations of AMD and Intel servers and I have not yet found a case where it improves things. In fact, it reduces peak bandwidth by 2-4%, increases cache misses, and increases memory bus traffic.

I was thinking to add a flag to conditionally disable it, but I'd rather just remove the code entirely, as removing it simplifies the transmit path greatly. Also, I'm working on optimizing the cache behavior of the iflib transit path, and removing this code removes a reference to ifc_flags from the hot path, which helps a lot with cache locality. (there are other references that I'll introduce a patch to remove after this is merged).

I have left receive side prefetching in-place as I don't have a good setup to test it.

Typical results from my UDP packet blaster (sends full-mtu UDP segments directly to the driver) on a 7502P AMD EPYC using ice and bnxt):
ice:
before (with prefetch)
1 thread: 31.2Gb/s 19.8% L3 miss, 6.1GB/s memory bw
3 thread: 82.5Gb/s 26.5% L3 miss, 15GB/s

after prefetch removed
1 thread: 31.6Gb/s 18% L3 miss, 61.GB/s mem bw
3 thread: 84.4Gb/s 25% L3 miss, 14.8GB/s

bnxt:
before: (with prefetch)
1 thread; 30.7Gb, 20% L3 miss, 4.6GB/s
3 thread: 82.gB/s 26% L3 miss, 13.7GB/s

after prefetch removed
1 thread: 31.1Gb/s 17% L3 miss, 4.5GB/s
3 thread: 88.8Gb/s 19% L3 miss, 13.6GB/s

400G results are from a NIC which is still under NDA and cannot be shared. L3 miss/mem bw data is from, AMDuProfPcm -m memory,l3

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

gallatin created this revision.Mon, Nov 10, 6:21 PM

Herald added a reviewer: shurd. · View Herald TranscriptMon, Nov 10, 6:21 PM

Herald added a reviewer: iflib. · View Herald Transcript

Herald added subscribers: glebius, melifaro, ae, imp. · View Herald Transcript

gallatin requested review of this revision.Mon, Nov 10, 6:21 PM

gallatin added reviewers: glebius, markj, mjg, ae, melifaro.Mon, Nov 10, 6:24 PM

This looks legitimate.

The kernel does not have infra for static branches, so a tunable would not look good anyway. However, instead of just removing it I would propose just ifdefing the code for the time being, maybe start the file with:

/*
 * Uncomment to enable prefetching. Measurements in https://reviews.freebsd.org/D53674 show it hurts performance.
 */
//#define IFLIB_PREFETCH

Then interested parties can experiment should the choose to.

If mere existence of this code is making other work harder, then by all means remove it.

In D53674#1225304, @mjg wrote:
This looks legitimate.

The kernel does not have infra for static branches, so a tunable would not look good anyway. However, instead of just removing it I would propose just ifdefing the code for the time being, maybe start the file with:
/*
 * Uncomment to enable prefetching. Measurements in https://reviews.freebsd.org/D53674 show it hurts performance.
 */
//#define IFLIB_PREFETCH
Then interested parties can experiment should the choose to.

If mere existence of this code is making other work harder, then by all means remove it.

Do we know if there are any interested parties? It feels like iflib is suffering from a "tragedy of the commons" situation, where now that @shurd is gone, nobody really feels any ownership of it, and folks make tiny changes related to iflib's interaction with whatever they are working on. I tend to think this is at least partially driven by how hard to code is to understand, so I feel like anything that can make it simpler, and is performance neutral or better, will improve things. Improving performance and reducing the prematurely optimized, overly complex nature of the code has been my goal. I'm afraid I may end up owning this..

My feeling is that the code is in the repo, and anybody qualified to do these sorts of experiments would be qualified to revert this patch.

So I feel rather strongly that it should just go..

mjg accepted this revision.Mon, Nov 10, 7:18 PM

these also need to go:

sys/net/iflib.c:#define IFC_PREFETCH            0x040
sys/net/iflib.c:        if (rxq->ifr_ctx->ifc_flags & IFC_PREFETCH)
sys/net/iflib.c:        do_prefetch = (ctx->ifc_flags & IFC_PREFETCH);
sys/net/iflib.c:                ctx->ifc_flags |= IFC_PREFETCH;

In D53674#1225349, @mjg wrote:

these also need to go:

sys/net/iflib.c:#define IFC_PREFETCH            0x040
sys/net/iflib.c:        if (rxq->ifr_ctx->ifc_flags & IFC_PREFETCH)
sys/net/iflib.c:        do_prefetch = (ctx->ifc_flags & IFC_PREFETCH);
sys/net/iflib.c:                ctx->ifc_flags |= IFC_PREFETCH;

Yes, just as soon as I have a good receive-side benchmark. I prefer testing to justify anything I do is at least performance neutral. Right now, I have my UDP packet blaster, and the Netflix CDN workload, both of which are heavily transmit focused. The only receive side changes that I've made have been related to LRO, where I can measure an impact. [ EDIT ] And where I asked Olivier to confirm his packet forwarding benchmark on slow hardware didn't show regressions. However, this only applies to 10G or faster NICs, so his testbed would not be helpful..

kbowling accepted this revision.Mon, Nov 10, 8:42 PM

This revision is now accepted and ready to land.Mon, Nov 10, 8:42 PM

markj accepted this revision.Tue, Nov 11, 2:11 PM

markj added inline comments.

sys/net/iflib.c
3546	I believe calc_next_txd() is now unused.

gallatin added inline comments.Tue, Nov 11, 2:45 PM

sys/net/iflib.c
3546	Good catch.. I'll remove it when I commit. Thank you!

Closed by commit rGc9e4643da132: iflib: remove transmit prefetching (authored by gallatin). · Explain WhyTue, Nov 11, 5:01 PM

This revision was automatically updated to reflect the committed changes.

gallatin added a commit: rGc9e4643da132: iflib: remove transmit prefetching.

Thank you for your work on optimizing iflib!

Yeah, the prefetching was tuned for a specific hardware configuration used years ago by a company that's no longer in business and from my notes it was a marginal improvement. I seem to recall @olivier seeing some even worse results on Atoms and such, so a solid good riddance to this added complexity.

Revision Contents
Changeset List

Path

Size

sys/

net/

iflib.c

46 lines

Diff 166248

View Options

iflib: remove transmit prefetchingClosedPublicActions