mlx5en: add pfil ethernet hook
Closed, Public

Authored by gallatin on Feb 2 2019, 3:29 PM.

Details

Summary

Enable new pfil(9) KPI ethernet filtering hooks to allow efficient packet filtering at packet ingress on mlx5en.

Note that the packets are filtered (and potentially dropped) *before* the driver has committed to (re)allocating an mbuf for the packet. Dropped packets are treated essentially the same as an error. Nothing is allocated, and the existing buffer is recycled. This allows us to drop packets at close to line rate with very little CPU use.
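
To make the mechanism concrete, here is a minimal sketch of the early-filter idea. It is not the code in this diff; the function shape and buffer handling are placeholders, and the flag encoding (length OR'd together with PFIL_MEMPTR | PFIL_IN) is taken from memory of the 2019 pfil(9) KPI, so sys/net/pfil.h is authoritative.

#include <sys/param.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/pfil.h>

/*
 * Sketch only: run the ethernet pfil hooks on the raw receive buffer
 * before an mbuf is (re)allocated for the packet.  Returns true when
 * the packet was dropped or consumed, in which case the caller simply
 * recycles the existing DMA buffer.
 */
static bool
rx_filter_early(struct pfil_head *pfil, struct ifnet *ifp, void *data,
    u_int len)
{
        int rv;

        if (!PFIL_HOOKED_IN(pfil))
                return (false);         /* common case: nothing hooked */

        rv = pfil_run_hooks(pfil, data, ifp, len | PFIL_MEMPTR | PFIL_IN,
            NULL);
        return (rv == PFIL_DROPPED || rv == PFIL_CONSUMED);
}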

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint: Skipped
Unit Tests: Skipped
Build Status: Buildable 22364

Event Timeline

sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
467

Style: declarations should be at the start of the function, not mid-function.

474

Is this condition something that the user controls? Or is it impossible to get PFIL_REALLOCED in any configuration accessible to a user?

  • Moved the rv declaration to the top of the function.
  • Entered the current VNET to prevent a panic from a NULL curvnet when VIMAGE is compiled in. Note that placing this at the top of the RX function is a trade-off: a tiny overhead in the common case (nothing hooked) versus the somewhat larger overhead of entering/restoring the VNET each time we call into the filter when something is hooked. (See the sketch after this list.)
  • Addressed all possible return values from pfil_run_hooks(). Note that PFIL_REALLOCED was tested by hacking pfil.c to copy packets.
  • Added a label to jump to when PFIL_REALLOCED is returned and we need to send up an mbuf allocated by the filter.
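
As a rough illustration of the VNET trade-off in the second bullet: the VNET context is entered once per RX poll rather than once per filter call. The function below is a placeholder shape, not the mlx5en RX routine.

static void
rx_poll(struct ifnet *ifp)
{
        CURVNET_SET(ifp->if_vnet);      /* no-op unless VIMAGE is compiled in */
        /*
         * ... per-packet loop runs here: the pfil hooks are called for
         * each completed WQE and the return value is handled ...
         */
        CURVNET_RESTORE();
}
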
gallatin added inline comments.
sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
474

These are return values from the filter, and are determined by the enum pfil_return_t.
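
(For reference, the pfil_return_t values as defined in sys/net/pfil.h at the time; listed from memory, so the header is authoritative.)

typedef enum {
        PFIL_PASS = 0,          /* accepted: receive the packet normally */
        PFIL_DROPPED,           /* dropped by the filter: recycle the buffer */
        PFIL_CONSUMED,          /* the filter took ownership of the packet */
        PFIL_REALLOCED,         /* the filter copied it into a new mbuf */
} pfil_return_t;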

I was going to wait until we had an easy way to test PFIL_REALLOCED, but I decided to just go ahead and hack up a test case so that I could handle all cases immediately.

I have a generic question about the patch. Why does this need to be done inside the driver, and why can't the pfil hooks be applied, e.g., in ether_input()? It seems strange that all drivers would need this patch.

The reason to have this in the driver is for performance. The plan for pfil is to eventually just use a pointer to a memory blob for filtering. The current fake mbuf stuff is a step in that direction.

By passing just a pointer from inside the driver, filtering and dropping floods of unwanted traffic can become incredibly cheap. For each packet examined and dropped, we avoid the cache misses inherent in allocating an mbuf and filling it in (at least 2 cachelines just for the mbuf), as well as the memory copy that most drivers do for small packets (likely 2 more cachelines). For larger packets, or drivers that do not employ the copy-for-small packets strategy, instead of the copy, we avoid the virtual to physical (or worse, IOMMU) translation for new rx buffers. Avoiding non-IO cache misses is especially beneficial on Intel Xeon w/DDIO, when recently received traffic may still be resident in the DDIO cache ways.

On a Haswell box with a ConnectX-4 using just a single RX queue, I experimented with hardcoded rules for examining and dropping packets at various points in the stack (e.g., drop IP packets of length 65). I did not try ether_input(), but I did compare dropping at this point in the driver to dropping at the top of ip_input(). For a single queue, this dropping mechanism can drop 14Mpps, whereas if the packet gets up into ip_input(), the same dropping code can drop only 6Mpps. According to VTune, much of the overhead at 6Mpps was mbuf allocation and memory copy.

See also all the work done in Linux for XDP. I came up with this idea of dropping packets before the driver had committed to them, and then realized that XDP does the same thing. Once I realized this, I wanted to use XDP. Thanks to Gleb for realizing we did not have to go full XDP and for coming up with the idea of adding pfil hooks.

> The reason to have this in the driver is for performance. The plan for pfil is to eventually just use a pointer to a memory blob for filtering. [...]

So the advantage you claim there is that we do not have to reallocate the mbufs attached to the given WQE on a pfil drop? It is a lot of code to add to each driver, each place being very driver-specific, and it is much worse than, e.g., the BPF hooks because of that.

Also, I am not sure about the single-pointer-to-a-memory-blob statement. For instance, mlx5 does scatter-gather for jumbo packets, so the data cannot be described in full by a single pointer. I know that at least Intel cards have a 'split header' feature (not sure if the drivers activate it). So an interface with just a pointer to a blob is somewhat limited to headers only.

Any high performance interface will look similar to this and be similarly invasive.

As to header splitting and scatter/gather: the interface will specify a pointer and a length. In those cases, we'd want to point at just the first segment. Let's not let the perfect be the enemy of the good.

I assume it would be sufficient to just pass in min(byte_cnt, MLX5E_MAX_RX_BYTES) to handle jumbo frames in this case?

This revision is now accepted and ready to land. (Feb 7 2019, 10:26 PM)

Updated to truncate the length passed to pfil for multi-segment jumbo frames to just the length of the first segment.
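
Assuming the inspected length is encoded in the flags argument, as in the 2019 pfil(9) KPI, the truncation presumably reduces to something like the following (variable names are placeholders):

/* Hand only the first RX segment to the filter. */
seglen = MIN(byte_cnt, MLX5E_MAX_RX_BYTES);
rv = pfil_run_hooks(pfil, data, ifp, seglen | PFIL_MEMPTR | PFIL_IN, NULL);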

This revision now requires review to proceed. (Feb 8 2019, 3:13 PM)

Guard against null pfil at device detach by deregistering pfil hook later. While here, check for a null pfil hook and cache it in a local to make the code easier to read.
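
A sketch of that null guard and local caching; where the driver stores its pfil head (priv->pfil below) is an assumption:

struct pfil_head *pfil;

pfil = priv->pfil;              /* hypothetical softc field */
if (pfil != NULL && PFIL_HOOKED_IN(pfil)) {
        /* ... run the hooks as in the earlier sketch ... */
}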

Is all the latest pfil.h support pushed upstream?

sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
482

The breaks after goto are not needed. Can you remove them?

489

Note that the data pointer may not be aligned for pointer reads. Typically it is offset by 2 bytes for the sake of IP-packet alignment. This may break on some ARM platforms.

Can this pointer be returned without this hack?

This revision now requires changes to proceed. (Feb 26 2019, 12:33 PM)
sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
489

mb = *(struct mbuf * __packed *) ???

  • Use the new pfil_mem2mbuf() to avoid unaligned access of the mbuf pointer (see the sketch below).
  • Remove unneeded breaks.
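
A sketch of the PFIL_REALLOCED case after this update, assuming pfil_mem2mbuf() takes the same memory pointer that was handed to pfil_run_hooks(); the rx_common label is a placeholder:

switch (rv) {
case PFIL_REALLOCED:
        /*
         * The filter copied the packet into an mbuf that it allocated;
         * recover that mbuf through the accessor instead of
         * dereferencing a potentially unaligned pointer stored in the
         * receive buffer.
         */
        mb = pfil_mem2mbuf(data);
        goto rx_common;         /* receive mb like a normal packet */
default:
        break;
}
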
This revision is now accepted and ready to land. (Apr 15 2019, 2:12 PM)
This revision was automatically updated to reflect the committed changes.