
mlx5en: add pfil ethernet hook
Needs Review (Public)

Authored by gallatin on Sat, Feb 2, 3:29 PM.

Details

Summary

Enable the new pfil(9) KPI ethernet filtering hooks to allow efficient packet filtering at ingress in mlx5en.

Note that the packets are filtered (and potentially dropped) *before* the driver has committed to (re)allocating an mbuf for the packet. Dropped packets are treated essentially the same as an error. Nothing is allocated, and the existing buffer is recycled. This allows us to drop packets at close to line rate with very little CPU use.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

gallatin created this revision (Sat, Feb 2, 3:29 PM).
hselasky added a reviewer: kib (Mon, Feb 4, 8:39 AM).
kib added inline comments (Mon, Feb 4, 9:02 AM).
sys/dev/mlx5/mlx5_en/mlx5_en_rx.c, line 469:

Style: no declarations except at the start of a function.

Line 476:

Is this condition something that the user controls? Or is it impossible to get PFIL_REALLOCED in any configuration accessible to a user?

gallatin updated this revision to Diff 53625 (Wed, Feb 6, 3:56 PM).
  • Moved the rv declaration to the top of the function.
  • Entered CURVNET to prevent a panic from a null curvnet when VIMAGE is compiled in. Note that placing this at the top of the rx function is a trade-off between a tiny overhead in the common case (nothing hooked) and a somewhat larger overhead when something is hooked, from entering and restoring the VNET on each call into the filter.
  • Addressed all possible return values from pfil_run_hooks(). Note that PFIL_REALLOCED was tested by hacking pfil.c to copy packets.
  • Added a label to jump to when PFIL_REALLOCED is returned and we need to send up an mbuf allocated by the filter.
gallatin marked an inline comment as done (Wed, Feb 6, 3:58 PM).
gallatin added inline comments.
sys/dev/mlx5/mlx5_en/mlx5_en_rx.c, line 476:

These are return values from the filter, and are defined by enum pfil_return_t.

I was going to wait until we had an easy way to test PFIL_REALLOCED, but I decided to just go ahead and hack up a test case so that I could handle all cases immediately.

kib added a comment (Thu, Feb 7, 5:07 AM).

I have a generic question about the patch. Why does this need to be done inside the driver, and why can pfil hooks not be applied, e.g., in ether_input()? It is strange that all drivers would need this patch.

gallatin added a comment.

The reason to have this in the driver is performance. The plan for pfil is to eventually use just a pointer to a memory blob for filtering. The current fake-mbuf stuff is a step in that direction.

By passing just a pointer from inside the driver, filtering and dropping floods of unwanted traffic becomes incredibly cheap. For each packet examined and dropped, we avoid the cache misses inherent in allocating an mbuf and filling it in (at least 2 cache lines just for the mbuf), as well as the memory copy that most drivers do for small packets (likely 2 more cache lines). For larger packets, or drivers that do not employ the copy-for-small-packets strategy, we instead avoid the virtual-to-physical (or worse, IOMMU) translation for new rx buffers. Avoiding non-I/O cache misses is especially beneficial on Intel Xeon with DDIO, where recently received traffic may still be resident in the DDIO cache ways.

On a Haswell box with a ConnectX-4, using just a single RX queue, I experimented with hardcoded rules for examining and dropping packets at various points in the stack (e.g., drop IP packets of length 65). I did not try ether_input(), but I did compare dropping at this point in the driver to dropping at the top of ip_input(). In that case, for a single queue, this dropping mechanism can drop 14 Mpps, whereas if the packet gets up into ip_input(), the same dropping code can drop packets at only 6 Mpps. According to VTune, much of the overhead at 6 Mpps was mbuf allocation and memory copy.

See also all the work done in Linux on XDP. I came up with this idea of dropping packets before the driver had committed to them, and then realized that XDP does the same thing. Once I realized this, I wanted to use XDP. Thanks to Gleb for realizing we did not have to go full XDP, and for coming up with the idea of adding pfil hooks.

kib added a comment (Thu, Feb 7, 7:36 PM).

> The reason to have this in the driver is performance. The plan for pfil is to eventually use just a pointer to a memory blob for filtering. The current fake-mbuf stuff is a step in that direction.
>
> By passing just a pointer from inside the driver, filtering and dropping floods of unwanted traffic becomes incredibly cheap. For each packet examined and dropped, we avoid the cache misses inherent in allocating an mbuf and filling it in (at least 2 cache lines just for the mbuf), as well as the memory copy that most drivers do for small packets (likely 2 more cache lines). For larger packets, or drivers that do not employ the copy-for-small-packets strategy, we instead avoid the virtual-to-physical (or worse, IOMMU) translation for new rx buffers. Avoiding non-I/O cache misses is especially beneficial on Intel Xeon with DDIO, where recently received traffic may still be resident in the DDIO cache ways.

So the advantage you claim there is the fact that we do not recycle the mbufs attached to the given WQE element on a pfil drop? It is a lot of code to add to each driver, each place being very driver-specific, and because of that it is much worse than, e.g., BPF hooks.

Also, I am not sure about the single-pointer-to-a-memory-blob statement. For instance, mlx5 does scatter-gather for jumbo packets, so the data cannot be fully described by a single pointer. I know that at least Intel cards have the 'split header' feature (not sure if drivers activate it). So an interface with just a pointer to a blob is somewhat limited to headers only.

> On a Haswell box with a ConnectX-4, using just a single RX queue, I experimented with hardcoded rules for examining and dropping packets at various points in the stack (e.g., drop IP packets of length 65). I did not try ether_input(), but I did compare dropping at this point in the driver to dropping at the top of ip_input(). In that case, for a single queue, this dropping mechanism can drop 14 Mpps, whereas if the packet gets up into ip_input(), the same dropping code can drop packets at only 6 Mpps. According to VTune, much of the overhead at 6 Mpps was mbuf allocation and memory copy.
>
> See also all the work done in Linux on XDP. I came up with this idea of dropping packets before the driver had committed to them, and then realized that XDP does the same thing. Once I realized this, I wanted to use XDP. Thanks to Gleb for realizing we did not have to go full XDP, and for coming up with the idea of adding pfil hooks.

gallatin added a comment.

Any high-performance interface will look similar to this and be similarly invasive.

As to header splitting and scatter/gather: the interface will specify a pointer and a length. In those cases, we'd want to point at just the first segment. Let's not let the perfect be the enemy of the good.

I assume it would be sufficient to just pass in min(byte_cnt, MLX5E_MAX_RX_BYTES) to handle the jumbo frames in this case?

glebius accepted this revision (Thu, Feb 7, 10:26 PM).
This revision is now accepted and ready to land (Thu, Feb 7, 10:26 PM).
gallatin updated this revision to Diff 53689 (Fri, Feb 8, 3:13 PM).

Updated to truncate the length passed to pfil for multi-segment jumbo frames to just the length of the first segment.

This revision now requires review to proceed (Fri, Feb 8, 3:13 PM).
gallatin updated this revision to Diff 54191 (Thu, Feb 21, 5:40 PM).

Guard against a null pfil at device detach by deregistering the pfil hook later. While here, check for a null pfil hook and cache it in a local to make the code easier to read.