Details

Reviewers

jhb
• hselasky
np
scottl
shurd
jtl
rrs
ae
kib
jeff
erj
glebius

Group Reviewers

transport

Commits

rS359919: KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.

Summary

While the original implementation of unmapped mbufs is a large
step forward in terms of reducing cache misss by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_art2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

gallatin created this revision.Mar 28 2020, 8:27 PM

Herald added a reviewer: transport. · View Herald TranscriptMar 28 2020, 8:27 PM

Herald added subscribers: melifaro, bz, imp. · View Herald Transcript

Harbormaster completed remote builds in B30154: Diff 69959.Mar 28 2020, 8:27 PM

gallatin added a reviewer: glebius.Mar 28 2020, 8:28 PM

Don't you want to provide some asserts about the struct mbuf layout ?
Imagine I do not know anything about constraints and want to add a field somewhere.

In D24213#532679, @kib wrote:

Don't you want to provide some asserts about the struct mbuf layout ?
Imagine I do not know anything about constraints and want to add a field somewhere.

In kern_mbuf.c, I replaced the constraint that m_ext_pgs was 256b with the constraints that the offset of m_ext in struct mbuf matches the offset of m_ext_pgs.m_ext, as well as a constraint that the size of struct mbuf does not exceed MSIZE. Did you have others in mind? The ones I added caught a few problems on 32-bit platforms which naturally align 8-byte types, for example.

In D24213#532730, @gallatin wrote:

In D24213#532679, @kib wrote:

Don't you want to provide some asserts about the struct mbuf layout ?
Imagine I do not know anything about constraints and want to add a field somewhere.

In kern_mbuf.c, I replaced the constraint that m_ext_pgs was 256b with the constraints that the offset of m_ext in struct mbuf matches the offset of m_ext_pgs.m_ext, as well as a constraint that the size of struct mbuf does not exceed MSIZE. Did you have others in mind? The ones I added caught a few problems on 32-bit platforms which naturally align 8-byte types, for example.

No, I do not, otherwise I would suggest the checks.

BTW, you commented out some asserts, why not remove them if they are not relevant ?

Looks good and also see my comments.

sys/kern/uipc_ktls.c
1222 ↗	(On Diff #69959)	What about byte order here? Do we need to use htonl() ?
sys/kern/uipc_mbuf.c
166 ↗	(On Diff #69959)	Need to fix these before commit.
sys/sys/mbuf.h
352 ↗	(On Diff #69959)	Maybe add the new field at the end of the union ....

Restored static asserts on the size of m_ext() that I'd forgotten I'd commented
Changed copy of seqno to a memcpy

Harbormaster completed remote builds in B30179: Diff 69995.Mar 30 2020, 1:33 PM

gallatin marked 2 inline comments as done.Mar 30 2020, 1:37 PM

gallatin added inline comments.

sys/kern/uipc_ktls.c
1222 ↗	(On Diff #69959)	Yes, the assignment was almost certainly wrong. I've changed this to a memcpy of the seqno into the trailer. This should have the same outcome as overlaying the bytes in a union, as it was previously done. However, I really, really, really think that this belongs in the mellanox driver. Can you please remind me why you need this in the stack, rather than in the driver?

In D24213#532734, @kib wrote:

In D24213#532730, @gallatin wrote:

In D24213#532679, @kib wrote:

Don't you want to provide some asserts about the struct mbuf layout ?
Imagine I do not know anything about constraints and want to add a field somewhere.

In kern_mbuf.c, I replaced the constraint that m_ext_pgs was 256b with the constraints that the offset of m_ext in struct mbuf matches the offset of m_ext_pgs.m_ext, as well as a constraint that the size of struct mbuf does not exceed MSIZE. Did you have others in mind? The ones I added caught a few problems on 32-bit platforms which naturally align 8-byte types, for example.

No, I do not, otherwise I would suggest the checks.

BTW, you commented out some asserts, why not remove them if they are not relevant ?

Thanks for pointing those out, I've restored them with the new, updated size of m_ext (much larger, since it now includes the ext_pgs hdr/trailer/pg array)

gallatin added inline comments.Mar 30 2020, 1:40 PM

sys/sys/mbuf.h
352 ↗	(On Diff #69959)	Hmm.. It makese more sense to me the way it is, actually.. so I'd prefer to leave it, but I don't feel hugely strongly about that.

When doing TLS v1.3 retransmissions the HW needs to know the sequence number in the fast path.

• hselasky added inline comments.Mar 30 2020, 1:55 PM

sys/kern/uipc_ktls.c
1222 ↗	(On Diff #69959)	It wasn't the seqno that was supposed to be in the trailer, but the type byte, like DATA / ALERT and so on for TLS v1.3 ???

After discussion with Hans, it turns out that it is the record type, not the seqno, that the Mellanox driver needs in the trailer for hwtls.

Harbormaster completed remote builds in B30180: Diff 69998.Mar 30 2020, 2:53 PM

Rebase past r359474
Move m_ext_pgs down below m_pkthdr in the union, as suggested by Hans

Harbormaster completed remote builds in B30321: Diff 70267.Apr 6 2020, 6:30 PM

This does mean that for the non-TLS case, ext_pgs now only holds 5 pages instead of 19, so for "plain" sendfile() this won't be as beneficial (though it's still a win over pre-ext_pgs). I think this is generally ok (I have some out of tree patches I've mentioned previously I'll have to adjust as they were using space in the unused pkthdr region, but only to cache computations).

sys/kern/uipc_mbuf.c
1775 ↗	(On Diff #70267)	I think you could make the `ext_pgs` pointer `const` instead of needing the `__DECONST`?
sys/kern/uipc_sockbuf.c
132 ↗	(On Diff #70267)	I would probably drop this blank line.
sys/sys/mbuf.h
270 ↗	(On Diff #70267)	Perhaps trim the blank line above this and add back the comment from before ext_pgs: char ext_buf; / start of buffer */
298 ↗	(On Diff #70267)	Perhaps 'struct<space>ktls_session' to match the other fields in this structure?

jhb accepted this revision.Apr 8 2020, 12:06 AM

Make cosmetic/style fixes as suggested by jhb

Harbormaster completed remote builds in B30366: Diff 70342.Apr 8 2020, 3:11 PM

gallatin added inline comments.Apr 8 2020, 3:11 PM

sys/kern/uipc_mbuf.c
1775 ↗	(On Diff #70267)	The problem is that uiomove() does not take a const, and I would then need to DECONST several more times to call it (which seems uglier), or I'd need to change uiomove() itself (which seems out of scope for this change).

• hselasky accepted this revision.Apr 8 2020, 3:54 PM

jhb accepted this revision.Apr 8 2020, 6:26 PM

jhb added inline comments.

sys/kern/uipc_mbuf.c
1775 ↗	(On Diff #70267)	Oof, ok. You can't change uiomove() since it can read or write to the kernel buffer. What you have is probably best, yes.

Fix issues encountered when running mlx5 with hw tls enabled with this patchset:

- accessing ext_pgs via pointer was missed at compile time, as the driver was using the ext_buf rather than ext_pgs pointer
the driver was using ext_pgs at the same time as pkthdr

Harbormaster completed remote builds in B30392: Diff 70377.Apr 9 2020, 5:37 PM

I can test this on T6 once it's committed.

• hselasky accepted this revision.Apr 9 2020, 6:52 PM

rrs accepted this revision.Apr 10 2020, 1:37 PM

This revision is now accepted and ready to land.Apr 10 2020, 1:37 PM

Just a heads up that i plan to commit this tomorrow.

Closed by commit rS359919: KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. (authored by gallatin). · Explain WhyApr 14 2020, 2:46 PM

This revision was automatically updated to reflect the committed changes.

gallatin added a commit: rS359919: KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself..

glebius mentioned this in D24598: More re-working of multipage mbufs..Apr 27 2020, 6:26 PM

Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 70552

head/sys/dev/cxgbe/crypto/t4_kern_tls.c

head/sys/dev/cxgbe/t4_sge.c

head/sys/dev/cxgbe/tom/t4_cpl_io.c

head/sys/dev/cxgbe/tom/t4_tls.c

head/sys/dev/mlx5/mlx5_en/mlx5_en_hw_tls.c

head/sys/kern/kern_mbuf.c

head/sys/kern/kern_sendfile.c

head/sys/kern/subr_bus_dma.c

head/sys/kern/subr_sglist.c

head/sys/kern/uipc_ktls.c

head/sys/kern/uipc_mbuf.c

head/sys/kern/uipc_sockbuf.c

head/sys/netinet/ip_output.c

head/sys/netinet/tcp_output.c

head/sys/netinet6/ip6_output.c

head/sys/sys/mbuf.h

Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 70552

head/sys/dev/cxgbe/crypto/t4_kern_tls.c

head/sys/dev/cxgbe/t4_sge.c

head/sys/dev/cxgbe/tom/t4_cpl_io.c

head/sys/dev/cxgbe/tom/t4_tls.c

head/sys/dev/mlx5/mlx5_en/mlx5_en_hw_tls.c

head/sys/kern/kern_mbuf.c

head/sys/kern/kern_sendfile.c

head/sys/kern/subr_bus_dma.c

head/sys/kern/subr_sglist.c

head/sys/kern/uipc_ktls.c

head/sys/kern/uipc_mbuf.c

head/sys/kern/uipc_sockbuf.c

head/sys/netinet/ip_output.c

head/sys/netinet/tcp_output.c

head/sys/netinet6/ip6_output.c

head/sys/sys/mbuf.h

Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
ClosedPublic
Actions

Revision Contents
Changeset List