iflib: remove convoluted custom zeroing code
ClosedPublic
Actions

Authored by brooks on Jan 8 2026, 6:09 PM.

Details

Reviewers

jrtc27
gallatin
shurd
erj

Group Reviewers

iflib
cheri

Commits

rG296ebf7b8a50: iflib: remove convoluted custom zeroing code
rG484edabb9455: iflib: remove convoluted custom zeroing code
rG31e7dc6b9a00: iflib: remove convoluted custom zeroing code

Summary

Replace a collection of aliasing violations and ifdefs with memset
(which now expands to __builtin_memset and should be quite reliably
inlined.) The old code is hard to maintain as evidenced by the most
recent change to if_pkt_info_t updating the defines, but not the zeroing
code.

Effort: CHERI upstreaming
Sponsored by: Innovate UK
Fixes: 43d7ee540efe ("iflib: support for transmit side nic KTLS offload")

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

brooks created this revision.Jan 8 2026, 6:09 PM

Herald added a reviewer: shurd. · View Herald TranscriptJan 8 2026, 6:09 PM

Herald added a reviewer: iflib. · View Herald Transcript

Herald added subscribers: glebius, melifaro, ae, imp. · View Herald Transcript

brooks requested review of this revision.Jan 8 2026, 6:09 PM

Herald added a reviewer: cheri. · View Herald TranscriptJan 8 2026, 6:09 PM

Harbormaster completed remote builds in B69726: Diff 169337.Jan 8 2026, 6:09 PM

CHERI adds yet another branch to the ifdefs and makes this worse and harder to maintain so it's time for it to go. The compiler is also free to recognize the pattern here as memset and replace the code with a libcall so this optimization isn't reliable in any way.

Inline them? There are only 5 calls in total.

I would suggest to go deeper and call memset() directly. These empty proxy functions do not improve readability.

For additional context, I took a look at the disassembly on amd64 and the memsets are all expanded in place:

;       memset(pi, 0, sizeof(*pi));
    8461: 48 c7 85 78 ff ff ff 00 00 00 00      movq    $0x0, -0x88(%rbp)
    846c: 48 c7 85 70 ff ff ff 00 00 00 00      movq    $0x0, -0x90(%rbp)
    8477: 48 c7 85 68 ff ff ff 00 00 00 00      movq    $0x0, -0x98(%rbp)
    8482: 48 c7 85 60 ff ff ff 00 00 00 00      movq    $0x0, -0xa0(%rbp)
    848d: 48 c7 85 58 ff ff ff 00 00 00 00      movq    $0x0, -0xa8(%rbp)
    8498: 48 c7 85 50 ff ff ff 00 00 00 00      movq    $0x0, -0xb0(%rbp)
...
;       memset(ri, 0, sizeof(*ri));
    8b1d: 48 c7 85 78 ff ff ff 00 00 00 00      movq    $0x0, -0x88(%rbp)
    8b28: 48 c7 85 70 ff ff ff 00 00 00 00      movq    $0x0, -0x90(%rbp)
    8b33: 48 c7 85 60 ff ff ff 00 00 00 00      movq    $0x0, -0xa0(%rbp)
    8b3e: 48 c7 85 58 ff ff ff 00 00 00 00      movq    $0x0, -0xa8(%rbp)
    8b49: 48 8b 85 48 ff ff ff          movq    -0xb8(%rbp), %rax
...
;       memset(pi, 0, sizeof(*pi));
    a9aa: 48 c7 45 a0 00 00 00 00       movq    $0x0, -0x60(%rbp)
    a9b2: 48 c7 45 98 00 00 00 00       movq    $0x0, -0x68(%rbp)
    a9ba: 48 c7 45 90 00 00 00 00       movq    $0x0, -0x70(%rbp)
    a9c2: 48 c7 45 88 00 00 00 00       movq    $0x0, -0x78(%rbp)
    a9ca: 48 c7 45 80 00 00 00 00       movq    $0x0, -0x80(%rbp)
    a9d2: 48 c7 85 78 ff ff ff 00 00 00 00      movq    $0x0, -0x88(%rbp)
    a9dd: 48 c7 85 70 ff ff ff 00 00 00 00      movq    $0x0, -0x90(%rbp)
...
;       memset(ri, 0, sizeof(*ri));
    bc7c: 48 c7 45 b0 00 00 00 00       movq    $0x0, -0x50(%rbp)
    bc84: 48 c7 45 a8 00 00 00 00       movq    $0x0, -0x58(%rbp)
    bc8c: 48 c7 45 a0 00 00 00 00       movq    $0x0, -0x60(%rbp)
    bc94: 48 c7 45 98 00 00 00 00       movq    $0x0, -0x68(%rbp)
    bc9c: 48 c7 45 90 00 00 00 00       movq    $0x0, -0x70(%rbp)
...
;       memset(pi, 0, sizeof(*pi));
    dc56: 48 c7 45 c8 00 00 00 00       movq    $0x0, -0x38(%rbp)
    dc5e: 48 c7 45 c0 00 00 00 00       movq    $0x0, -0x40(%rbp)
    dc66: 48 c7 45 b8 00 00 00 00       movq    $0x0, -0x48(%rbp)
    dc6e: 48 c7 45 b0 00 00 00 00       movq    $0x0, -0x50(%rbp)
    dc76: 48 c7 45 a8 00 00 00 00       movq    $0x0, -0x58(%rbp)
    dc7e: 48 c7 45 a0 00 00 00 00       movq    $0x0, -0x60(%rbp)
    dc86: 48 c7 45 98 00 00 00 00       movq    $0x0, -0x68(%rbp)

Drop *_zero functions entierly

Harbormaster completed remote builds in B69727: Diff 169338.Jan 8 2026, 6:36 PM

Bravo. Anything we can do to make iflib simpler is a good thing in my book!

This revision is now accepted and ready to land.Jan 8 2026, 8:58 PM

In D54605#1247601, @brooks wrote:

For additional context, I took a look at the disassembly on amd64 and the memsets are all expanded in place:

Did you look at the assembly from before the change and did it look like the post-memset change contents? I'm curious about what that custom zeroing code looked like and what it achieved.

In D54605#1247657, @erj wrote:

In D54605#1247601, @brooks wrote:

For additional context, I took a look at the disassembly on amd64 and the memsets are all expanded in place:

Did you look at the assembly from before the change and did it look like the post-memset change contents? I'm curious about what that custom zeroing code looked like and what it achieved.

It's bit hard to tease out, but it looks like the compiler might be slightly better at dead store elimination with the open-coded version. In particular rxd_init_zero expands to:

;               ri_pad->rxd_val[i] = 0;
ffffffff80cf98ad: 48 c7 85 58 ff ff ff 00 00 00 00      movq    $0x0, -0xa8(%rbp)
;               ri_pad->rxd_val[i + 3] = 0;
ffffffff80cf98b8: 48 c7 85 70 ff ff ff 00 00 00 00      movq    $0x0, -0x90(%rbp)
;       ri_pad->rxd_val[RXD_INFO_SIZE - 1] = 0;  
ffffffff80cf98c3: 48 c7 85 78 ff ff ff 00 00 00 00      movq    $0x0, -0x88(%rbp)

where the i + 1 and i + 2 statements are eliminated.

If this really matters we should find a more maintainable solution.

Closed by commit rG31e7dc6b9a00: iflib: remove convoluted custom zeroing code (authored by brooks). · Explain WhyJan 9 2026, 2:43 PM

This revision was automatically updated to reflect the committed changes.

brooks added a commit: rG31e7dc6b9a00: iflib: remove convoluted custom zeroing code.

In D54605#1247812, @brooks wrote:
In D54605#1247657, @erj wrote:

In D54605#1247601, @brooks wrote:

For additional context, I took a look at the disassembly on amd64 and the memsets are all expanded in place:

Did you look at the assembly from before the change and did it look like the post-memset change contents? I'm curious about what that custom zeroing code looked like and what it achieved.

It's bit hard to tease out, but it looks like the compiler might be slightly better at dead store elimination with the open-coded version. In particular rxd_init_zero expands to:
;               ri_pad->rxd_val[i] = 0;
ffffffff80cf98ad: 48 c7 85 58 ff ff ff 00 00 00 00      movq    $0x0, -0xa8(%rbp)
;               ri_pad->rxd_val[i + 3] = 0;
ffffffff80cf98b8: 48 c7 85 70 ff ff ff 00 00 00 00      movq    $0x0, -0x90(%rbp)
;       ri_pad->rxd_val[RXD_INFO_SIZE - 1] = 0;  
ffffffff80cf98c3: 48 c7 85 78 ff ff ff 00 00 00 00      movq    $0x0, -0x88(%rbp)
where the i + 1 and i + 2 statements are eliminated.

If this really matters we should find a more maintainable solution.

I'm assuming it probably doesn't matter, at least until someone profiles it. Thanks for looking at it!

In D54605#1248134, @erj wrote:

In D54605#1247812, @brooks wrote:

If this really matters we should find a more maintainable solution.

I'm assuming it probably doesn't matter, at least until someone profiles it. Thanks for looking at it!

The compiler could probably be fixed to do the optimization without too much pain if the right person looked at it. We probably suffer a bit from the flags used by the kernel in that LLVM head seems to use FPU instructions for zeroing by default which reduces pressure to fix this kind of thing.

brooks added a commit: rG484edabb9455: iflib: remove convoluted custom zeroing code.Jan 16 2026, 9:30 AM

brooks added a commit: rG296ebf7b8a50: iflib: remove convoluted custom zeroing code.