Page MenuHomeFreeBSD

ovpn: Introduce OpenVPN DCO support
ClosedPublic

Authored by kp on Feb 22 2022, 3:42 PM.

Details

Summary

OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.

In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.

Sponsored by: Rubicon Communications, LLC ("Netgate")

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 45413
Build 42301: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/net/if_ovpn.c
392

Should we hold the softc write lock when setting sc->so?

439

This can be one line.

481

I suspect it'd be a bit better to set the tunneling function after finding a slot in the peer table. If we fail to find a slot, we leave the tunneling function set.

kp marked 10 inline comments as done.

Review remarks

kp marked 4 inline comments as done.Apr 26 2022, 2:21 PM

(In progress, I owe you a few more fixes.)

sys/net/if_ovpn.c
548

The callout requires the sc->lock, so I wouldn't expect that to be an issue.

I'm also seeing panics over sleepq_add with sleeping prohibited if we sleep here. Thinking about it some more I'm wondering if the drain for ping_send shouldn't also be a stop. It takes the same lock, so it should be similarly safe.

871

I don't think this is guaranteed to be contiguous, but I also don't think it matters. m_copyback() deals with mbuf chains and ovpn_encap() basically just prepends data.

1314

Good catch.

The first potential fix that comes to mind is to not reset the callout, but to track the timestamp of the last packet (with second level accuracy). We can then perform the check in the ovpn_timeout_peer() callback and either restart the callout or handle a timeout.

It may be worth replicating the trick Reid used in pf (https://cgit.freebsd.org/src/commit/sys?id=0abcc1d2d33aef2333dab28e1ec1858cf45b314a), where we have a per-cpu variable to write to. If we go that route it might be worth abstracting that a little.

sys/net/if_ovpn.c
548

I think you're right that the ping_rcv callout is ok, my mistake. But the ping_snd callout handler drops the softc lock while executing, and we really do need to defer the free of peer until that's done.

871

I think this comment dates back to the initial revision of the patch, which had

m_defrag(m);
nvlist_add_binary(... mtod(m, uint8_t *), m->m_pkthdr.len);

Now it's ok because we allocate a contiguous buffer, copy the packet, and move it into the nvlist.

kp marked 18 inline comments as done.Apr 26 2022, 3:54 PM
kp added inline comments.
sys/net/if_ovpn.c
1666

Hmm. Do you have any ideas on how to cope with that?

We're kind of stuck passing at least the softc, both for counters and in some cases because we need to look up the peer (in ovpn_encap(), from ovpn_encrypt_tx_cb()).

1841

Yes. The protocol doesn't really have a mechanism to cope with that, but in practice it's not much of a concern because it will negotiate a new key and switch to it (which resets the sequence counter) long before we've passed enough packets for that.

I've posted a first simple test case in D35067.

sys/net/if_ovpn.c
548

I don't think it does. Not any more at least.

It keeps the lock, because it's a read lock and the callout support for rmlocks is a little limited, in that it doesn't export the rm_priotracker anywhere, so there's no way for callout users to release an rm-read lock (unlike mutex, or rwlock-based callouts).

sys/net/if_ovpn.c
548

Ah, got it. I missed that the tracker pointer is NULL in that case.

1666

An atomic reference count is the standard way to handle it. That's how IPSec solves this problem.

If you ignore the possibility of hardware offloads, then I believe it would be safe to simply hold the softc lock across the crypto_dispatch() call. The ping_send callout does this. I don't believe any of our hw offload drivers implement CHACHA20, for what it's worth, though that of course could change.

If you want to be able to make use of hardware offloads and need to avoid the overhead of a global refcount, then per-CPU reference counting for each peer would probably be the way to go. I don't believe we have a generic implementation of that, though. I would probably try adding per-peer atomic refcounting and see how affects throughput, since that's easy to implement and think about, and can be replaced with something more sophisticated later.

1794

Is it guaranteed that there are no IPv6 extension options following the fixed-length header?

sys/net/if_ovpn.h
4

bump the year?

52–53

These are identical to DIOCSTART / DIOCSTOP. I don't think there's an opportunity for confusion, but do we have some existing guidance or reference on IOCTL assignments?

sys/net/if_ovpn.h
52–53

None that I'm aware of.

kp marked 2 inline comments as done.
  • Don't reset the timeout callout for every packet
  • Ensure softc and peer don't get deleted while crypto operations are running.

Clear checksum flags from the mbuf. We don't do any checksum verification, and
the checksum flags for the outer layer IP(v6)/tcp|udp packets are no longer
meaningful.

Update copyright.
Work started in 2021, so 2021-2022 is more appropriate.

Will rereview manual page when the rest settles down.

sys/net/if_ovpn.c
1794

Yes, because we're creating the IPv6 header ourselves here.

sys/net/if_ovpn.c
2014

This is a stack declaration. Please move it above M_ASSERTPKTHDR() which is a statement.

2091

Can we please count differently packets lost due to crypto failures and due to memory allocation failures? Just add one or two more counters.

2120

Why not using refcount(9)?

2238

If counters were split out into a structure, could be initialized with one liner macros like ipstat, tcpstat. Exporting them to userland as a structure also seems easier.

kp marked 4 inline comments as done.
  • Separate counters for allocation failures
  • Use array for counters for simplified init and cleanup
sys/net/if_ovpn.c
2120

We adjust the refcount with only the read lock held, so multiple cores can be modifying it at the same time. refcount(9) requires additional protection we don't provide.

  • rebase
  • fix IPv6 checksum issue
share/man/man4/ovpn.4
38–43

Does it have to be present when the kernel gets control, or is loading it during multiuser startup (rc) early enough?

share/man/man4/ovpn.4
38–43

It can be loaded at any time prior to its use. The upcoming openvpn version with DCO support should actually attempt to load it automatically if you attempt to use DCO on FreeBSD.

share/man/man4/ovpn.4
38–43

Then I'd mention the autoloading thing and use rc.conf loading instead (with or without bringing in sysrc(8), not sure).

kp marked 2 inline comments as done.Jun 3 2022, 6:37 AM
kp added inline comments.
share/man/man4/ovpn.4
38–43

This is the standard module text, I'm not sure we should be deviating from that.

See for example man 4 if_bridge or man 4 if_epair.

sys/net/if_ovpn.c
300

Q: what about link-local addresses?

524

It should be peer ( or process) fib, not 0.

1601
1620
1783

I'm wondering if we can build this prepend once for a peer and then use it via simply memcpy() (maybe with a bit of tiny per-af tweaks).

1861

Probably worth having struct route in the peer data & reference it here, so we leverage L3+L2 cache lookups.

kp marked 6 inline comments as done.Jun 3 2022, 11:53 AM
kp added inline comments.
sys/net/if_ovpn.c
300

Interesting question, I didn't think about them.

I've done a little digging in the OpenVPN code, and it appears that OpenVPN also doesn't support configuration of link-local addresses, so right now I'd be inclined to not support them here.

The ioctl interface uses nvlists, so if that changes in the future it'll be relatively easy to add this support. Although to be honest, I think the use case for a VPN over link-local addresses is fairly weak.

1783

Interesting idea, but there are some cache invalidation issues.

We'd have to be careful about rebuilding it if the peer configuration changes. That's manageable, but we also use V_ip_defttl, so if we cached it we'd end up not changing TTL when the default is changed. Pretty minor of course, but I'm also skeptical that there's a measurable performance improvement in doing this.

We'd also still have to tweak things a bit for length and checksum fields, for every packet.

1861

Ooh, interesting.

Although I worry about locking of the 'struct route'. if_ovpn uses an rmlock and multiple cores may be running ip_output() at the same time. It's also unpredictable if ip_output() will modify the 'struct route', because the routing table could have been updated. (so if_ovpn can't just take the write lock when it has an invalid struct route).

I wonder if we should keep a per-CPU 'struct route'.

kp marked 2 inline comments as done.
  • review remarks
share/man/man4/ovpn.4
38–43

Point. (I think that decision should be revisited, but this is not the place for that discussion.) I'd still mention the autoloading thing, though.

  • fix double WUNLOCK in ovpn_new_peer()
  • fix missing RUNLOCK in ovpn_transmit_to_peer()
  • check link state under the lock
  • support pf's route-to

    When pf's route-to (or reply-to) re-routes a packet it passes a new destination in if_output()'s dst argument. if_ovpn ignored this, and kept routing the packet to the original (i.e. the one in the IP header) destination.

    Re-arrange ovpn_output() and ovpn_transmit() to pass the 'dst' argument onwards to ovpn_route_peer(), so that function can use it if it's present.

Manual page LGTM. Can't attest to anything else.

This revision is now accepted and ready to land.Jun 15 2022, 12:02 AM
  • rebase
  • pick the only peer if there's only one If we only have one peer don't do a routing lookup, but use the sole peer as destination unconditionally.

    This fixes esoteric setups which used to work with openvpn/tap, as that code does not do a routing lookup if there's only one peer.
  • only obey 'dst' (i.e. the gateway) if no route is supplied That's our indication that we're being called through pf's route-to, and we should route according to 'dst' instead. We can't do so consistently, because the usual openvpn configuration sets the first non-server IP in the subnet as the gateway. If we always use that one we'd end up routing all traffic to the first client.

    TL;DR: 'ro == NULL' tells us pf is doing a route-to, and then but only then, we should treat 'dst' as the destination.
This revision now requires review to proceed.Jun 20 2022, 3:03 PM

I'm not good at reading source code, but some of this seems to implement ioctls, and I don't see them documented in the manual page. Should they be?

In D34340#806137, @pauamma_gundo.com wrote:

I'm not good at reading source code, but some of this seems to implement ioctls, and I don't see them documented in the manual page. Should they be?

There's not much point to that, given that there's really only one intended user for them. To make things work user space needs to do a lot more than call these ioctl. It needs to implement the entire openvpn protocol. That's done by openvpn, which will call these ioctls, but it's very unlikely that any other code ever will.

In D34340#806195, @kp wrote:
In D34340#806137, @pauamma_gundo.com wrote:

I'm not good at reading source code, but some of this seems to implement ioctls, and I don't see them documented in the manual page. Should they be?

There's not much point to that, given that there's really only one intended user for them. To make things work user space needs to do a lot more than call these ioctl. It needs to implement the entire openvpn protocol. That's done by openvpn, which will call these ioctls, but it's very unlikely that any other code ever will.

Fair enough.

This revision is now accepted and ready to land.Jun 23 2022, 6:34 AM
This revision was automatically updated to reflect the committed changes.