Thanks!
Mon, Mar 25
Thanks
Wed, Mar 20
Tue, Mar 19
Any additional tests with pkt-gen in TX or RX mode over iflib interfaces and virtio-net ones?
E.g. two VMs/machines connected back to back with one transmitting and the other receiving, to stress test it?
Setting the dev.netmap.admode sysctl to 2 will force emulated netmap even if the interface has native support.
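For reference, such a back-to-back stress test could look roughly like the following transcript (FreeBSD-only; the interface name ix0 is a placeholder, and the sysctl and pkt-gen invocations follow netmap(4)):

```shell
# Force emulated netmap even on interfaces with native support
# (value 2 selects the emulated/generic adapter; see netmap(4)).
sysctl dev.netmap.admode=2

# Transmitter side (ix0 is a placeholder interface name):
pkt-gen -i netmap:ix0 -f tx

# Receiver side, on the machine/VM connected back to back:
pkt-gen -i netmap:ix0 -f rx
```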
Mon, Mar 18
Feb 4 2024
No objections on my side.
Jan 20 2024
In D43460#992353, @markj wrote:
> In D43460#992332, @vmaffione wrote:
> > Thanks. I'm still wondering whether anyone would call wg_output(), though...
> In addition to the hw TX/RX rings, have you tested that the host RX/TX rings are fully functional? I.e., if you run a bridge between the hw rings and the host rings like this:
>
>   # bridge -i netmap:wg0 -i netmap:wg0^
>
> with all the offloads disabled on wg0, you should be able to use wg0 as if netmap did not exist.
Indeed, this was my smoke test while developing the patch.
Thanks.
I'm still wondering whether anyone would call wg_output(), though...
Jan 18 2024
Looks pretty good to me; however, I suggest adding some more comments to make the four datapaths easier to understand.
It's quite confusing even for me, and I know the netmap code. I can only imagine how it reads to a casual reader...
Dec 29 2023
Dec 27 2023
Dec 21 2023
Dec 1 2023
Do you need me to commit this?
Nov 23 2023
Ok
Nov 8 2023
{ and } are missing.
Makes sense to me in general, but please also open a pull request here https://github.com/luigirizzo/netmap/
where the libnetmap developer can see it.
May 9 2023
Apr 26 2023
Apr 25 2023
I can commit this if needed
Apr 9 2023
Sorry for the delay. I've little experience with the FreeBSD test suite, so I wouldn't know...
Apr 5 2023
Yeah, that makes sense, thanks.
Interesting, thanks. I wonder why the behaviour is different with AMD Ryzen.
Which setup (machine, NIC) did you use to reproduce the issue?
If useful, please note that there are already a bunch of integration tests in the github netmap repo, e.g.:
https://github.com/luigirizzo/netmap/blob/master/utils/randomized_tests
Any example other than iflib? (iflib has native support...)
Mar 29 2023
tap mode is supposed to work unmodified... I've not tried on FreeBSD, but on Linux it definitely works.
Mar 28 2023
No worries.
Mar 22 2023
What is the use case for this change?
(Netmap was not originally supposed to work with L3 interfaces)
Looks good.
How are the tests going?
Mar 21 2023
Mar 20 2023
Mar 15 2023
Mar 14 2023
Mar 13 2023
Mar 12 2023
Mar 11 2023
Mar 9 2023
Mar 6 2023
Mar 1 2023
ok, so maybe we can go ahead
In D38065#884010, @markj wrote:
> In D38065#884005, @vmaffione wrote:
> > In D38065#883858, @markj wrote:
> > > In D38065#883802, @vmaffione wrote:
> > > > Thanks for testing. I think that's not a fatal problem, since (1) one is supposed to use native mode where available, and (2) this patch enables more use cases for the emulated adapter, as you are suggesting. In this regard, I would be curious to have an example where "a driver may reasonably defer freeing a small number of transmitted buffers indefinitely" (quoting your commit message above).
> > > iflib may do this, for example. iflib_if_transmit() enqueues packets in a ring buffer which is drained by iflib_txq_drain(). iflib_encap() is responsible for queuing packets in a hardware TX ring. It asks the driver to raise completion interrupts by setting IPI_TX_INTR, but it doesn't do this for every packet.
> > Yes, iflib does not set IPI_TX_INTR for each packet in order to moderate TX interrupts and descriptor writeback overhead, but I'm pretty sure it guarantees that all the mbufs submitted to the hardware TX rings are m_free()d in a bounded amount of time.
> In iflib, transmitted mbufs are freed only by iflib_completed_tx_reclaim(), which does not do any work in the reclaim <= thresh case. So if the interface is idle, nothing will force the reclamation of a small number of transmitted mbufs.
In theory you are right, although it looks like in practice thresh is always set to 0, so no deferral should happen.
Feb 28 2023
In D38065#883858, @markj wrote:
> In D38065#883802, @vmaffione wrote:
> > Thanks for testing. I think that's not a fatal problem, since (1) one is supposed to use native mode where available, and (2) this patch enables more use cases for the emulated adapter, as you are suggesting. In this regard, I would be curious to have an example where "a driver may reasonably defer freeing a small number of transmitted buffers indefinitely" (quoting your commit message above).
> iflib may do this, for example. iflib_if_transmit() enqueues packets in a ring buffer which is drained by iflib_txq_drain(). iflib_encap() is responsible for queuing packets in a hardware TX ring. It asks the driver to raise completion interrupts by setting IPI_TX_INTR, but it doesn't do this for every packet.
Is the patch ready for review or maybe the man page changes still need to be reworked?
In D38066#881958, @markj wrote:
> In D38066#880176, @vmaffione wrote:
> > In D38066#878350, @markj wrote:
> > > In D38066#878318, @vmaffione wrote:
> > > > In D38066#876096, @markj wrote:
> > > > > From an ifnet perspective, if_bridge is already special. There is a unique ifnet hook, if_bridge_input, by which it receives packets. The if_input hook of a bridge interface is not used.
> > > > Just out of curiosity: isn't the if_input bridge0 hook used at line 2542 of if_bridge.c (multicast and broadcast)? I'm trying to understand here... how is it possible that bridge0's if_input is not used? How can local packets reach the protocol stack for bridge0 (e.g. TCP/UDP sockets bound to bridge0) if not by means of if_input?
> > > Yes, it's true that the bridge's if_input is called for that one special case, but that's not part of the regular data path.

Each ifnet which belongs to a bridge has a special hook, if_bridge_input, pointing to bridge_input(). Each ifnet also carries a pointer to the bridge softc. When a bridge member receives a packet, an mbuf chain is passed to ether_input_internal(), which checks to see if the receiving ifnet has if_bridge_input set. If so, the packet is passed to if_bridge_input/bridge_input().

bridge_input() can consume the packet and return NULL, which it does in the forwarding case, and then ether_input_internal() does nothing further. If the packet is local, bridge_input() uses the dst MAC to figure out which bridge port "received" the packet (this may be the bridge interface itself), and then returns the mbuf chain back to ether_input_internal(), which dispatches it to the protocol layers.
Thanks for testing.
I think that's not a fatal problem, since (1) one is supposed to use native mode where available, and (2) this patch enables more use cases for emulated adapter, as you are suggesting.
In this regard, I would be curious to have an example where "a driver may reasonably defer freeing a small number of transmitted buffers indefinitely" (quoting your commit message above).
Feb 27 2023
Yeah, if not used since you are here please clean it up.
Feb 26 2023
Yeah, it is very possible that a regression has been introduced with iflib.
Note that it does not make sense to use emulated mode with iflib, since iflib has native support. So it's not surprising that the regression has gone unnoticed.
Feb 19 2023
In D38066#878350, @markj wrote:
> In D38066#878318, @vmaffione wrote:
> > In D38066#876096, @markj wrote:
> > > Then, in netmap's model, it doesn't really make sense for netmap to see non-local forwarded packets. But this makes netmap+if_bridge less useful. It also just seems like a surprising behaviour to me as a non-expert: above you wrote, 'the meaning of "opening an interface in netmap mode" is to steal and detach that interface from the kernel stack, so that netmap will see all the RX traffic arriving on the interface', and I would consider non-local packets arriving at bridge0 as "RX traffic arriving on the interface", so why shouldn't they be intercepted by netmap?
> > Non-local packets are not arriving at bridge0, IMHO, because they are meant to be forwarded, rather than be received by the host. That's why they should not be intercepted by netmap.
I don't think this is related to netmap. Netmap is just a different API to access the ifnet (alternative to raw sockets, for example), but it should not change the behaviour of an interface.
Feb 14 2023
In D38066#876096, @markj wrote:
> In D38066#874987, @vmaffione wrote:
> > I'm sorry, I don't want to slow down your work, but I do not understand why this complication is needed...
> No problem at all, thank you for reviewing.
> > The purpose of the host stack is to allow for some subset of the traffic (e.g. ssh traffic on interface em0) to keep going to the kernel stack (e.g. the sshd socket), although all the traffic is intercepted by netmap. So you have a packet ready in your hw RX ring of em0, you look at it and decide that it should go ahead as if netmap never intercepted it; so you write it to the em0 host (sw) TX ring. Netmap will process it by converting it into an mbuf and calling if_input on em0, so that the packet will appear on em0 as if netmap did not exist.
> >
> > So IMHO bridge0 should behave the same. On the "hw" RX ring of bridge0 you will receive all the locally destined packets (or all of them in promiscuous mode). If you want some packets to go ahead as if netmap never intercepted them, you would write them to the bridge0 host TX ring. Netmap will call the if_input method of bridge0, and I don't see why we should forward the packet across the bridge...
> Suppose netmap intercepts a packet that would be forwarded from one bridge port to another, say em0 and em1. The application (a firewall, perhaps) decides it wants to allow the packet through, so it writes the packet to bridge0's host TX ring. netmap will call if_input of bridge0, but now the packet's receiving interface is bridge0, not em0. Thus, bridge_input() does not see the packet again, and the packet is sent to the protocol layers instead of being forwarded.
Feb 11 2023
It may be because of bhyve's em0 emulation; I'm not sure it's bug-free.
QEMU emulation (on a Linux KVM box) should be very reliable.
Feb 8 2023
I implemented this - now only locally destined packets are visible to netmap by default.
To be honest, I'm still not entirely convinced that this makes sense. If if_bridge natively required IFF_PROMISC to be set in order to perform L2 forwarding, then I would certainly agree. But now IFF_PROMISC has a special meaning for netmap mode.
Fix a problem with forwarding in netmap mode: when the application
writes packets to the host ring, the packets are injected via the
bridge's if_input to ether_input(). But this means that the bridge
interface won't see them again and in particular won't perform L2
forwarding.

Fix the problem by interposing bridge_inject() between if_input and
ether_input(). In netmap mode, bridge_inject() flags the packet
such that ether_input() will bring it back to the bridge, which can
then select a fake source port based on the source L2 addr. This
then allows forwarding to work.
Can I ask what kind of tests you performed? I guess you have set sysctl dev.netmap.admode=2 (see netmap(4)) and tried on a vtnet0 interface.
If not done yet, could you please perform some tests on an em0 interface (e.g. emulated by qemu or bhyve)?
Still some changes to be done.
Feb 6 2023
Feb 2 2023
Yes, I've seen that IFF_PROMISC is not handled by if_bridge right now...
This could allow us to ignore the problem for the time being, and pass all the packets to netmap. But handling IFF_PROMISC properly would be the more reasonable approach.
Jan 31 2023
Ok with dropping the zerocopy, but I think we should try to reuse the mbufs.
Jan 26 2023
Sure, I didn't mean to imply that any IP processing happens within the bridge code. I'll try to reformulate my question starting from your response. As you say, there are two cases: depending on the dst ether address, the packet may (1) belong to the bridge or (2) belong to something else (reachable through some member port). In the first case, you have ether_demux() and maybe ip_input() on the bridge0 interface. In the second case, you forward to one (or more) member ports. What I was trying to express (and have failed to so far) is that maybe, when opening the netmap port on bridge0, you should only see the packets that match case (1) and not those that match case (2), assuming bridge0 is not in promisc mode (in promisc mode, you should also see packets of case (2)). This would match the behaviour of any other physical interface that you happen to open in netmap mode (again assuming no promisc mode), since you would only see packets with dst ether matching the MAC address of the physical interface.
I hope I managed to explain myself now...
What do you think?
Jan 24 2023
Yes, but a member interface is something different from the bridge0 interface.
If I run netmap on a member interface, I expect to see any packets that come across that interface (irrespective of src or dst addresses), but I do not expect to see packets that come across other member interfaces (unless those packets also happen to pass through the member ifnet open in netmap mode).
Similarly, if I run netmap on bridge0, I expect to see any packets that would be received on that interface if it were not in netmap mode. If I am not mistaken, bridge0 only gets packets with IP destination matching the IP address of bridge0 (plus broadcasts).
In other words, I do not expect an interface to behave functionally differently when opened in netmap mode; I just want to use a different API. Would bridge0 in this case work differently when opened in netmap mode?
Jan 23 2023
Already merged.
Not relevant anymore.
Doesn't bpf receive a copy of the packets, while the packets keep going along their normal route? Netmap steals the packets instead, so that's a completely different use case.
Support already merged in a different DR.
Jan 18 2023
I'm not sure what you mean about the generic atomic_load...
Anyway, the original meaning of the comment is "Load the variable(s) only once, in order to avoid TOCTOU issues". We may update it to make it clearer.
I'm sorry, what are we going to achieve here, exactly?
Keep in mind that if the user writes to cur and head while the syscall is executing, then the user is breaking the ABI and the result is undefined (but no harm can happen because of the checks of the prologue).
Yes, indeed, this is definitely fine! That was just a note on the historical approach.
We could later even move the counter increments into the emulated rxsync and txsync methods, since those increments would apply once per batch (like in iflib).
Jan 16 2023
No objections for emulated mode, but keep in mind that patched drivers do not increment the counters, to avoid the performance hit.
(Although we may revisit this once some proper overhead measurements are in place.)
Jan 11 2023
A first pass.