Page MenuHomeFreeBSD

m_unshare: Fail with a NULL return if the chain contains unmapped mbufs
Needs ReviewPublic

Authored by jhb on Sep 25 2024, 3:13 PM.
Tags
None
Referenced Files
F106178007: D46786.diff
Thu, Dec 26, 4:32 PM
Unknown Object (File)
Nov 18 2024, 8:55 AM
Unknown Object (File)
Nov 15 2024, 4:58 AM
Unknown Object (File)
Nov 13 2024, 8:52 PM
Unknown Object (File)
Nov 13 2024, 9:55 AM
Unknown Object (File)
Nov 12 2024, 3:27 PM
Unknown Object (File)
Nov 7 2024, 10:51 PM
Unknown Object (File)
Nov 7 2024, 7:35 PM
Subscribers

Details

Reviewers
gallatin
kp
Summary

If unmapped mbuf support is needed in the future, this could be fixed
by using m_copydata to read data from the source M_EXT mbufs into the
new clusters.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 59596
Build 56483: arc lint + arc unit

Event Timeline

jhb requested review of this revision.Sep 25 2024, 3:13 PM
sys/kern/uipc_mbuf.c
2150

More of a question than a remark, really.

pf currently calls m_unshare() to make sure it's safe to modify the mbuf (https://cgit.freebsd.org/src/tree/sys/netpfil/pf/pf.c#n8918). This would mean that's insufficient for M_EXTPG imbues and pf will end up dropping them. I guess that'd be pretty obvious, but is there something else it should be doing instead?

Would it be better to call mb_unmapped_to_ext() here ?

sys/kern/uipc_mbuf.c
2152

Why not call mb_unmapped_to_ext() here?

sys/kern/uipc_mbuf.c
2152

I think instead the fix would be to replace some of the memcpy() calls below with m_copydata() and then you'd be fine. You would copy the data into regular mbuf clusters, but that would be fatal for NIC TLS where the TLS mbufs need to remain intact. Mostly though I was assuming that there aren't any unmapped mbufs actually hitting this path? That is, today we only use unmapped mbufs on the transmit side either for TLS or sendfile. It's not clear to me that either of those are going to be relevant here.

For pf's use case, pf probably doesn't care about packet "payload" but only headers, and protocol headers should all be in mapped mbufs anyway. I'm not sure if there's a good way to "know" which part of an mbuf chain is headers vs payload data, but if there was, I suspect pf only wants to unshare the header part and not the payload. For pf's use case, you can probably assume that unmapped mbufs (and anything following) are "data" and not headers, so you can probably have a modified m_unshare() that just stops duplicating mbufs and appends the rest of the current chain (or uses m_dupcl() or the like for the rest of the chain).

For the stated use case of m_unshare() though which I guess is for things like IPsec that wouldn't quite be sufficient. (IPsec probably needs to choke on a NIC TLS mbuf btw.)

2201

You would use m_copydata() to here instead of mtod(m to safely copy from m into mprev.

2201

You would need to use m_copydata here to safely copy data from m instead of using mtod(m)

2233

This would also need to use m_copydata instead of mtod(m)

So would ya'll rather me fix m_unshare()? It's clearly not being used today on unmapped mbufs today as it would be panicking when it used the pointers returned from mtod(), so I wouldn't be able to test it.

sys/kern/uipc_mbuf.c
2152

For pf's use case, pf probably doesn't care about packet "payload" but only headers, and protocol headers should all be in mapped mbufs anyway

Usually that's true, but there are some instances where we end up looking at the entire packet. It's also not always immediately obvious how many bytes we're going to need for any given packet before we're done with headers and we're into payload. It's mostly straightforward for things like TCP or UDP over IPv4 (although IPv4 header options are a thing), but it becomes much harder for IPv6, and for SCTP we end up having to potentially look at the entire packet looking for ASCONF chunks. At least that last one does m_copyout(), so should be safe either way.

There might be edge cases where we'd end up with strange results for ipv6 reassembly and refragmentation (e.g. if the sender does really dumb things like having the first fragment be shorter than subsequent fragments) but those are not going to be cases where we end up with unmapped mbufs (at least, not unless we break IPv6 fragmentation in the stack again).

Life is much easier if pf can assume the entire packet is there. We could possibly cope if there's an m_pullup() equivalent that lets us check that we're not going to run into an unmapped mbuf, but it'll require carefully going through pf to make sure everywhere calls that before doing mtod() or similar.

Humm, I posit that pf(4) will never need to look into the content of unmapped mbufs. They will always be "plain" packet data (either data from a call to write(2) or sendfile(2)) and never headers inserted by any protocol layer. You can also get them for things like iSCSI and NVMe PDUs, but again only for the data portion, not even for PDU headers (which are in plain mbufs). The reason being that protocol headers have to be constructed, and the code that constructs the headers is going to want a mapped mbuf that it can write into. It could be that what we say here is that m_unshare() does not un-share unmapped mbufs but instead just bumps the refcount on the backing mbuf. We could certainly add a new helper of sorts that is something like m_mapped(m, off, len) that pf(4) could use before calls to m_pullup(). However, it is also true that nothing should be reading the unencrypted data in NIC TLS mbufs (where the NIC does the encryption so the data stays unencrypted all the way down the stack).

That seems reasonable, yes. Ideally I'd like to be able to assert that assumption (probably in pf_pull_hdr()) though.

It's also not the only case where we currently call m_unshare() to make sure we can access all the relevant data. I wound up adding one to if_ovpn here: https://cgit.freebsd.org/src/commit/sys/net/if_ovpn.c?id=5644e2c6d47c6113a61ab7fc0776b7227677656a
Short version, in kib's words: "ftpd uses sendfile(2) AFAIR, and the symptoms of the file corruption mean that M_RDONLY mbuf, of type M_EXTPG, modified by the network stack"

In D46786#1074982, @kp wrote:

That seems reasonable, yes. Ideally I'd like to be able to assert that assumption (probably in pf_pull_hdr()) though.

It's also not the only case where we currently call m_unshare() to make sure we can access all the relevant data. I wound up adding one to if_ovpn here: https://cgit.freebsd.org/src/commit/sys/net/if_ovpn.c?id=5644e2c6d47c6113a61ab7fc0776b7227677656a
Short version, in kib's words: "ftpd uses sendfile(2) AFAIR, and the symptoms of the file corruption mean that M_RDONLY mbuf, of type M_EXTPG, modified by the network stack"

Hmmm, so for KTLS, we are careful to recognize when the pages of an unmapped mbuf are "borrowed" and allocate anonymous pages for encrypting them explicitly. I wonder if for DCO you want/need to be doing something similar to protect against what kib@ describes? In particular, check ktls_encrypt_record() in sys/kern/uipc_ktls.c and how it uses EPG_FLAG_ANON to detect "anonymous" writable pages vs borrowed pages. (Potentially once this series lands that makes M_RDONLY correct for unmapped mbufs we could retire EPG_FLAG_ANON and depend on M_RDONLY for that instead.)

In D46786#1077456, @jhb wrote:
In D46786#1074982, @kp wrote:

That seems reasonable, yes. Ideally I'd like to be able to assert that assumption (probably in pf_pull_hdr()) though.

It's also not the only case where we currently call m_unshare() to make sure we can access all the relevant data. I wound up adding one to if_ovpn here: https://cgit.freebsd.org/src/commit/sys/net/if_ovpn.c?id=5644e2c6d47c6113a61ab7fc0776b7227677656a
Short version, in kib's words: "ftpd uses sendfile(2) AFAIR, and the symptoms of the file corruption mean that M_RDONLY mbuf, of type M_EXTPG, modified by the network stack"

Hmmm, so for KTLS, we are careful to recognize when the pages of an unmapped mbuf are "borrowed" and allocate anonymous pages for encrypting them explicitly. I wonder if for DCO you want/need to be doing something similar to protect against what kib@ describes? In particular, check ktls_encrypt_record() in sys/kern/uipc_ktls.c and how it uses EPG_FLAG_ANON to detect "anonymous" writable pages vs borrowed pages. (Potentially once this series lands that makes M_RDONLY correct for unmapped mbufs we could retire EPG_FLAG_ANON and depend on M_RDONLY for that instead.)

Right, we'll have to detect M_RDONLY (and potentially EPG_FLAG_ANON too, unless it's retired first) and then convince opencrypto to write the output somewhere else.
Presumably we can 'just' allocate a new mbuf for the result and then call crypto_use_output_mbuf()?

I'm slightly reluctant to mess with what we have now, and not (just) because of laziness. I never actually managed to reproduce https://bugs.freebsd.org/280036 myself, so I don't have a setup to test that. Still, it looks like that's the direction we ought to go with this.

In D46786#1077536, @kp wrote:
In D46786#1077456, @jhb wrote:
In D46786#1074982, @kp wrote:

That seems reasonable, yes. Ideally I'd like to be able to assert that assumption (probably in pf_pull_hdr()) though.

It's also not the only case where we currently call m_unshare() to make sure we can access all the relevant data. I wound up adding one to if_ovpn here: https://cgit.freebsd.org/src/commit/sys/net/if_ovpn.c?id=5644e2c6d47c6113a61ab7fc0776b7227677656a
Short version, in kib's words: "ftpd uses sendfile(2) AFAIR, and the symptoms of the file corruption mean that M_RDONLY mbuf, of type M_EXTPG, modified by the network stack"

Hmmm, so for KTLS, we are careful to recognize when the pages of an unmapped mbuf are "borrowed" and allocate anonymous pages for encrypting them explicitly. I wonder if for DCO you want/need to be doing something similar to protect against what kib@ describes? In particular, check ktls_encrypt_record() in sys/kern/uipc_ktls.c and how it uses EPG_FLAG_ANON to detect "anonymous" writable pages vs borrowed pages. (Potentially once this series lands that makes M_RDONLY correct for unmapped mbufs we could retire EPG_FLAG_ANON and depend on M_RDONLY for that instead.)

Right, we'll have to detect M_RDONLY (and potentially EPG_FLAG_ANON too, unless it's retired first) and then convince opencrypto to write the output somewhere else.
Presumably we can 'just' allocate a new mbuf for the result and then call crypto_use_output_mbuf()?

I'm slightly reluctant to mess with what we have now, and not (just) because of laziness. I never actually managed to reproduce https://bugs.freebsd.org/280036 myself, so I don't have a setup to test that. Still, it looks like that's the direction we ought to go with this.

Yes, a separate mbuf would work. ktls does a more convoluted thing where it uses IOVecs instead of the "single" mbuf and it swaps out the backing vm_page_t's for the mbuf in question. You could certainly just allocate a new anon-backed mbuf and encrypt into that.