User Details
- User Since
- Mar 10 2018, 1:54 AM (357 w, 6 d)
Oct 14 2024
Looks good, but maybe I would use more evocative names such as:
`netmap_get_local_allocator()`, `netmap_get_na_local_allocator()`
in contrast with the "global" allocator.
It's not necessary that the names start with netmap_mem, since there are more functions that do not start with that.
Oct 11 2024
Oct 9 2024
If the application wants to zerocopy between two netmap ports, it must check that the two ports are associated with the same memory allocator. In libnetmap, for example, you would check that nmport_d::mem points to the same address for both ports.
If the application is well written, it will check and fall back to copy if zerocopy is not possible. That is, changing the default will only potentially disable an optimization.
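A minimal sketch of that check-and-fall-back logic, assuming simplified stand-in types: `struct port`, `struct mem_region`, `struct slot`, `can_zerocopy()` and `forward()` are hypothetical names, not the libnetmap API. The only detail taken from the comment above is that each port carries a pointer to its allocator and that the two pointers must be equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types (assumptions): only the allocator pointer matters here. */
struct mem_region { int id; };
struct port { struct mem_region *mem; };

/* Stand-in for a ring slot: a buffer index plus a packet length. */
struct slot { unsigned buf_idx; unsigned len; };

/* Zerocopy is possible only if both ports use the same allocator. */
static bool can_zerocopy(const struct port *a, const struct port *b)
{
    return a->mem != NULL && a->mem == b->mem;
}

/* Forward one packet from rx to tx; returns true if zerocopy was used. */
static bool forward(const struct port *rx, const struct port *tx,
                    struct slot *rs, struct slot *ts,
                    const char *rxbuf, char *txbuf, size_t bufsz)
{
    assert(rs->len <= bufsz);
    if (can_zerocopy(rx, tx)) {
        /* Swap buffer indices instead of copying the payload.
         * Real netmap code would also flag both slots as changed. */
        unsigned tmp = ts->buf_idx;
        ts->buf_idx = rs->buf_idx;
        rs->buf_idx = tmp;
        ts->len = rs->len;
        return true;
    }
    /* Different allocators: fall back to a plain copy. */
    memcpy(txbuf, rxbuf, rs->len);
    ts->len = rs->len;
    return false;
}
```

With this shape, changing the default allocator only changes which branch is taken; the application keeps working either way.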
No, I don't have experience with that. I was just saying that only by measuring (maybe with variable packet length) can one assess the real impact. It may well be, as you say, that once you need to copy data across NUMA domains, the zerocopy feature is not too relevant.
Oct 8 2024
Hi Mark,
You were actually right, and with a fresh mind I realized I overlooked some important aspects (sorry for that).
Yes, IOMMU domains and NUMA domains are not the same thing at all, so it's probably not wise to overload the field. But the main point here is that the difference between the two has implications for the purpose of this patch.
Oct 2 2024
Sorry for the delay. Thanks for the patch.
I wonder if nm_grp can be reused.
Sep 16 2024
I guess you mean that the same code is present inside netmap_get_na, which is called right after.
However, with this change the behaviour would not be equivalent, because netmap_mem_find grabs a reference to a netmap memory allocator.
So before this patch the reference is kept for the whole netmap_ioctl function, whereas with the patch the reference scope drops to the duration of the netmap_get_na call.
Sep 1 2024
Apr 26 2024
Great, thanks.
Apr 12 2024
Actually, netmap also has a more generic mechanism to store custom metadata within the netmap buffer.
Apr 11 2024
Hi,
Netmap follows a KISS approach, and has not been designed to handle hardware offloads (or at least those offloads that require metadata to be stored in the mbuf).
In this way, you can keep the per-packet metadata (struct netmap_slot) very small (16 bytes) and 16-byte aligned, to play efficiently with the processor cache layers.
These choices are what allow netmap to achieve high packet rates with small packets.
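To make the size argument concrete, here is a small sketch with a mirror of the slot layout. The struct name `slot_mirror` and the helper `slots_per_line()` are hypothetical, and the field list is reproduced from memory of net/netmap.h, so treat it as an assumption and check the header before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the per-packet metadata (assumed layout, see lead-in). */
struct slot_mirror {
    uint32_t buf_idx; /* index of the packet buffer */
    uint16_t len;     /* packet length */
    uint16_t flags;   /* per-slot flags */
    uint64_t ptr;     /* pointer for indirect buffers */
};

/* Fail the build if the metadata ever grows past 16 bytes. */
static_assert(sizeof(struct slot_mirror) == 16, "slot must stay 16 bytes");

/* How many slots fit in one cache line of the given size. */
static size_t slots_per_line(size_t line_size)
{
    return line_size / sizeof(struct slot_mirror);
}
```

Assuming a 64-byte cache line, four slots share a line, and there is no room in such a record for offload metadata of the kind an mbuf carries.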
Mar 25 2024
Thanks!
Last modification.
Thanks
Mar 20 2024
Mar 19 2024
Any additional tests with pkt-gen in TX or RX mode over iflib interfaces and virtio-net ones?
E.g. two VMs/machines connected back to back with one transmitting and the other receiving, to stress test it?
Setting the dev.netmap.admode sysctl to 2 will force emulated netmap even if the interface has native support.
Mar 18 2024
Feb 4 2024
No objections on my side.
Jan 20 2024
Thanks.
I'm still wondering whether anyone would call wg_output(), though...
Jan 18 2024
Looks pretty good to me; however, I suggest adding some more comments to make it easier to understand the four datapaths.
It's very confusing even for me, and I know the netmap code. I can only imagine how it is for the casual reader...
Dec 29 2023
Dec 27 2023
Dec 21 2023
Dec 1 2023
Do you need me to commit this?
Nov 23 2023
Ok
Nov 8 2023
{ and } are missing.
Makes sense to me in general, but please also open a pull request here https://github.com/luigirizzo/netmap/
where the libnetmap developer can see it.
May 9 2023
Apr 26 2023
Apr 25 2023
I can commit this if needed
Apr 9 2023
Sorry for the delay. I've little experience with the FreeBSD test suite, so I wouldn't know...
Apr 5 2023
Yeah, that makes sense, thanks.
Interesting, thanks. I wonder why the behaviour is different with AMD Ryzen.
Which setup (machine, NIC) did you use to reproduce the issue?
If useful, please note that there are already a bunch of integration tests in the github netmap repo, e.g.:
https://github.com/luigirizzo/netmap/blob/master/utils/randomized_tests
Any example other than iflib? (iflib has native support...)
Mar 29 2023
tap mode is supposed to work unmodified... I've not tried on FreeBSD, but on Linux it definitely works.
Mar 28 2023
No worries.
Mar 22 2023
What is the use case for this change?
(Netmap was not originally supposed to work with L3 interfaces)
Looks good.
How are the tests going?
Mar 21 2023
Mar 20 2023
Mar 15 2023
Mar 14 2023
Mar 13 2023
Mar 12 2023
Mar 11 2023
Mar 9 2023
Mar 6 2023
Mar 1 2023
ok, so maybe we can go ahead
In theory you are right, although it looks like in practice thresh is always set to 0, so no deferral should happen.
Feb 28 2023
Is the patch ready for review, or do the man page changes still need to be reworked?
Thanks for testing.
I think that's not a fatal problem, since (1) one is supposed to use native mode where available, and (2) this patch enables more use cases for emulated adapter, as you are suggesting.
In this regard, I would be curious to have an example where "a driver may reasonably defer freeing a small number of transmitted buffers indefinitely" (quoting your commit message above).
Feb 27 2023
Yeah, if it's not used, please clean it up while you are here.
Feb 26 2023
Yeah, it is very possible that a regression has been introduced with iflib.
Note that it does not make sense to use emulated mode with iflib, since iflib has native support. So it's not surprising that the regression has gone unnoticed.
Feb 19 2023
Feb 14 2023
Feb 11 2023
It may be because of the bhyve em0 emulation; I'm not sure it's bug free.
QEMU emulation (on a Linux KVM box) should be very reliable.
Feb 8 2023
I implemented this - now only locally destined packets are visible to netmap by default.
To be honest, I'm still not entirely convinced that this makes sense. If if_bridge natively required IFF_PROMISC to be set in order to perform L2 forwarding, then I would certainly agree. But now IFF_PROMISC has a special meaning for netmap mode.
Fix a problem with forwarding in netmap mode: when the application
writes packets to the host ring, the packets are injected via the
bridge's if_input to ether_input(). But this means that the bridge
interface won't see them again and in particular won't perform L2
forwarding.
Fix the problem by interposing bridge_inject() between if_input and
ether_input(). In netmap mode, bridge_inject() flags the packet
such that ether_input() will bring it back to the bridge, which can
then select a fake source port based on the source L2 addr. This
then allows forwarding to work.
Can I ask what kind of tests you performed? I guess you have set sysctl dev.netmap.admode=2 (see netmap(4)) and tried on a vtnet0 interface.
If not done yet, could you please perform some tests on an em0 interface (e.g. emulated by qemu or bhyve)?
Still some changes to be done.
Feb 6 2023
Feb 2 2023
Yes, I've seen that IFF_PROMISC is not handled by if_bridge right now...
This could allow us to ignore the problem for the time being, and pass them all to netmap. But handling IFF_PROMISC properly would be the more reasonable approach.
Jan 31 2023
Ok with dropping the zerocopy, but I think we should try to reuse the mbufs.
Jan 26 2023
Sure, I didn't mean to imply that any IP processing happens within the bridge code. I'll try to reformulate my question starting from your response. As you say, there are two cases: depending on the dst ether address, the packet may (1) belong to the bridge or (2) belong to something else (reachable through some member port). In the first case, you have ether_demux() and maybe ip_input() on the bridge0 interface. In the second case, you forward to one (or more) member ports. What I was trying to express (and have failed to so far) is that maybe when opening the netmap port in bridge0 you should only see the packets that match case (1) and not those that match case (2), assuming bridge0 is not in promisc mode (in promisc mode, you should also see packets of case (2)). This would match the behaviour of any other physical interface that you happen to open in netmap mode (again assuming no promisc mode), since you would only see packets with dst ether matching the MAC address of the physical interface.
I hope I managed to explain myself now...
What do you think?
Jan 24 2023
Yes, but a member interface is something different from the bridge0 interface.
If I run netmap on a member interface I expect to see any packets that come across that interface (irrespective of src or dst addresses), but I do not expect to see packets that come across other member interfaces (unless those packets also happen to pass through the member ifnet open in netmap mode).
Similarly, if I run netmap on bridge0, I expect to see any packets that would be received on that interface if that interface were not in netmap mode. If I am not mistaken, bridge0 only gets packets with IP destination matching the IP address of bridge0 (plus broadcasts).
In other words, I do not expect an interface to behave functionally differently when open in netmap mode. I just want to use a different API. Would bridge0 in this case work differently when open in netmap mode?
Jan 23 2023
Already merged.
Not relevant anymore.
Doesn't bpf receive a copy of the packets, while the packets keep going along their normal route? Netmap steals the packets instead, so that's a completely different use case.
Support already merged in a different DR.
Jan 18 2023
I'm not sure what you mean about the generic atomic_load...
Anyway, the original meaning of the comment is "Load the variable(s) only once, in order to avoid TOCTOU issues". We may update it to make it clearer.
I'm sorry, what are we going to achieve here, exactly?
Keep in mind that if the user writes to cur and head while the syscall is executing, then the user is breaking the ABI and the result is undefined (but no harm can happen because of the checks in the prologue).
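The "load once, then check" idea can be sketched as follows. This is a simplified illustration, not the netmap prologue itself: `struct shared_ring`, `struct kring_state` and `prologue()` are hypothetical names, and the real checks are more extensive than a range test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the user-writable part of a ring. */
struct shared_ring {
    volatile uint32_t head;
    volatile uint32_t cur;
};

/* Hypothetical kernel-private state for the same ring. */
struct kring_state {
    uint32_t num_slots;
    uint32_t rhead, rcur; /* validated copies used by the rest of the sync */
};

/* Returns 0 on success, -1 if the snapshot is out of range. */
static int prologue(const struct shared_ring *ring, struct kring_state *k)
{
    assert(k->num_slots > 0);

    /* Load each shared variable exactly once into a local. Every check
     * and every later use refers to the local, so a concurrent write by
     * the user cannot change the value between check and use. */
    uint32_t head = ring->head;
    uint32_t cur = ring->cur;

    if (head >= k->num_slots || cur >= k->num_slots)
        return -1;

    k->rhead = head;
    k->rcur = cur;
    return 0;
}
```

If the user races with the syscall, the snapshot may be stale or inconsistent with what the user intended, but it has still been range-checked, which is why the result is undefined for the user yet harmless for the kernel.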
Yes, indeed, this is definitely fine! That was just a note on the historical approach.
We could later even add the counter increments to the emulated rxsync and txsync methods, since those increments would apply once per batch (as in iflib).
Jan 16 2023
No objections for emulated mode, but keep in mind that patched drivers do not increment the counters, to avoid the performance hit.
(Although we may revisit this with some proper overhead measurements in place).
Jan 11 2023
A first pass.