Page MenuHomeFreeBSD

mlx5en: add pfil ethernet hook
ClosedPublic

Authored by gallatin on Feb 2 2019, 3:29 PM.
Tags
None
Referenced Files
F107829245: D19063.diff
Sat, Jan 18, 2:12 PM
F107822736: D19063.id53540.diff
Sat, Jan 18, 12:42 PM
F107822725: D19063.id54191.diff
Sat, Jan 18, 12:42 PM
Unknown Object (File)
Tue, Jan 14, 4:48 AM
Unknown Object (File)
Tue, Jan 14, 1:46 AM
Unknown Object (File)
Wed, Jan 1, 9:09 AM
Unknown Object (File)
Dec 4 2024, 5:31 PM
Unknown Object (File)
Nov 30 2024, 12:00 PM
Subscribers

Details

Summary

Enable new pfil(9) KPI ethernet filtering hooks to allow efficient packet filtering at packet ingress on mlx5en.

Note that the packets are filtered (and potentially dropped) *before* the driver has committed to (re)allocating an mbuf for the packet. Dropped packets are treated essentially the same as an error. Nothing is allocated, and the existing buffer is recycled. This allows us to drop packets at close to line rate with very little CPU use.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 22391

Event Timeline

sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
467

Style, no declarations not at the start of a function.

474

Is this condition something that user controls ? Or, is it impossible to get PFIL_REALLOCATED in whatever configuration accessible to a user ?

  • moved rv declaration to top of function
  • entered CURVNET to prevent panic from null curvnet when VIMAGE is compiled in Note placing this at the top of the rx function is a trade off between tiny overhead in the common case (nothing hooked) and a bit larger overhead in the case when something is hooked, and entering/restoring the VNET each time we call into the filter.
  • Addressed all possible return values from pfil_run_hooks Note that PFIL_REALLOCED was tested by hacking pfil.c to copy packets
  • Added a label to jump to when PFIL_REALLOCED is returned, and we need to send up an mbuf allocated by the filter.
gallatin added inline comments.
sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
474

These are return values from the filter, and are determined by the enum pfil_return_t.

I was going to wait until we had an easy way to test PFIL_REALLOCED, but I decided to just go ahead and hack up a test case so that I could handle all cases immediately.

I have a generic question about the patch. Why this needs to be done inside the driver, and why cannot pfil hooks be applied e.g. in ether_input() ? It is strange that all drivers would need this patch.

The reason to have this in the driver is for performance. The plan for pfil is to eventually just use a pointer to a memory blob for filtering. The current fake mbuf stuff is a step in that direction.

By passing just a pointer from inside the driver, filtering and dropping floods of unwanted traffic can become incredibly cheap. For each packet examined and dropped, we avoid the cache misses inherent in allocating an mbuf and filling it in (at least 2 cachelines just for the mbuf), as well as the memory copy that most drivers do for small packets (likely 2 more cachelines). For larger packets, or drivers that do not employ the copy-for-small packets strategy, instead of the copy, we avoid the virtual to physical (or worse, IOMMU) translation for new rx buffers. Avoiding non-IO cache misses is especially beneficial on Intel Xeon w/DDIO, when recently received traffic may still be resident in the DDIO cache ways.

On a haswell box, with a connectx-4 using just a single RX queue, I experimented with hardcoded rules for examining and dropping packets at various points in the stack. (eg, drop ip packets of len 65, as an example). I did not try ether_input, but I did compare dropping at this point in the driver to dropping at the top of ip_input. In that case, for a single queue, this dropping mechanism can drop 14Mpps, where as if the packet gets up into ip_input(), the same dropping code can drop packets at only 6Mpps. According to vtune, much of the overhead at 6Mpps was mbuf allocation and memory copy.

Reference also all the work done in Linux for XDP. I came up with this idea of dropping packets before the driver had committed to them, and realized that XDP does the same thing. Once I realized this, I wanted to use XDP. Thank Gleb for realizing we did not have to go full XDP, and coming up with the idea to add pfil hooks.

The reason to have this in the driver is for performance. The plan for pfil is to eventually just use a pointer to a memory blob for filtering. The current fake mbuf stuff is a step in that direction.

By passing just a pointer from inside the driver, filtering and dropping floods of unwanted traffic can become incredibly cheap. For each packet examined and dropped, we avoid the cache misses inherent in allocating an mbuf and filling it in (at least 2 cachelines just for the mbuf), as well as the memory copy that most drivers do for small packets (likely 2 more cachelines). For larger packets, or drivers that do not employ the copy-for-small packets strategy, instead of the copy, we avoid the virtual to physical (or worse, IOMMU) translation for new rx buffers. Avoiding non-IO cache misses is especially beneficial on Intel Xeon w/DDIO, when recently received traffic may still be resident in the DDIO cache ways.

So the advantage you claim there is the fact that we do not recycle the mbufs attached to the given WGE element on pfil drop ? It is a lot of code to add to each driver. each of place being very driver-specific, and it is much worse than e.g. BPF hooks due to that.

Also I am not sure about the single pointer of the memory blob statement. For instance, mlx5 does scatter-gather for jumbo packets, so the data cannot be described by a single pointer in full. I know that at least Intel cards have the 'split header' feature (not sure if drivers activate it). So the interface with just a pointer to a blob is somewhat limited only to headers.

On a haswell box, with a connectx-4 using just a single RX queue, I experimented with hardcoded rules for examining and dropping packets at various points in the stack. (eg, drop ip packets of len 65, as an example). I did not try ether_input, but I did compare dropping at this point in the driver to dropping at the top of ip_input. In that case, for a single queue, this dropping mechanism can drop 14Mpps, where as if the packet gets up into ip_input(), the same dropping code can drop packets at only 6Mpps. According to vtune, much of the overhead at 6Mpps was mbuf allocation and memory copy.

Reference also all the work done in Linux for XDP. I came up with this idea of dropping packets before the driver had committed to them, and realized that XDP does the same thing. Once I realized this, I wanted to use XDP. Thank Gleb for realizing we did not have to go full XDP, and coming up with the idea to add pfil hooks.

Any high performance interface will look similar to this and be similarly invasive.

As to header splitting and scatter/gather: The interface will specify a pointer and a length. In those cases, we'd want to point at just the first segment. Lets not let the perfect be the enemy of the good.

I assume it would be sufficient to just send in the min(byte_cnt, MLX5E_MAX_RX_BYTES) to handle the jumbo frames in this case?

This revision is now accepted and ready to land.Feb 7 2019, 10:26 PM

Updated to truncate the length passed to pfil for multi-segment jumbo frames to just the length of the first segment.

This revision now requires review to proceed.Feb 8 2019, 3:13 PM

Guard against null pfil at device detach by deregistering pfil hook later. While here, check for a null pfil hook and cache it in a local to make the code easier to read.

Is all the latest pfil.h support pushed upstream?

sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
482

The breaks after goto are not needed. Can you remove them?

489

Note that the data pointer may not be aligned for pointer reads. Typically it is + 2 bytes for the sake if IP-packet alignment. May break on some arm platforms.

Can this pointer be returned w/o this hack?

This revision now requires changes to proceed.Feb 26 2019, 12:33 PM
sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
489

mb = *(struct mbuf * __packed *) ???

  • Use new pfil_mem2mbuf() to avoid unaligned access of mbuf pointer
  • Remove unneeded breaks
This revision is now accepted and ready to land.Apr 15 2019, 2:12 PM
This revision was automatically updated to reflect the committed changes.