Allow users to configure the address to send carp messages to. This
allows carp to be used in unicast mode, which is useful in certain
virtualised environments (e.g. AWS, VMware ESXi, ...) where multicast
traffic may not be forwarded between hosts.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential D38940
carp: support unicast
Authored by kp on Mar 7 2023, 10:47 AM.
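As a usage sketch of the feature under review: in unicast mode the administrator supplies the peer address directly. The 'peer' keyword below mirrors the option discussed in the thread, and the addresses are placeholders; the final syntax may differ.

```sh
# Hypothetical unicast carp setup on the first of two hosts; without
# the peer option carp would fall back to its usual multicast mode.
ifconfig em0 vhid 1 advskew 0 pass mypass \
    peer 198.51.100.2 alias 192.0.2.50/32
```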
Event Timeline

I'm a little torn on how to handle the extension of the interface to userspace. I've added a new ioctl for it, but we could also extend the struct (and then teach the existing ioctl to cope with two sizes of structure), or we could convert the whole thing to netlink, to make future extensions easier.

Thank you for working on this!

I'm going to use this as an opportunity to get to know netlink a bit more, so I'll take a stab at a first version myself, and then we'll go from there.

The first thing I'm running into is "The library does not currently offer any wrappers for writing netlink messages." I can compose messages directly, but if I understand things correctly that means we're not doing the TLV thing, and we're back to painting ourselves into a corner w.r.t. extensibility. I'm also not sure I understand the distinction between fields and attributes. Oh, wait: are fields always there (i.e. the things we can easily compose directly with a struct before the snl_send() call), and attributes optional? So for carp it would probably make sense to have at least 'ifname' be a field, and things like vhid be attributes (so we can optionally filter on vhid)?

I'll have it working for the route(8) conversion; I'll publish the writers diff either later today or tomorrow.
Sure; I don't want anyone to have to write raw message-composing code.

Yes, fields are mandatory (in the sense that they need to be present on the wire), while attributes are not _mandatory_ per the protocol, though the handler can of course require a certain number of them to be present. For the network interfaces API, netlink has a concept of sharing custom per-interface data in RTM_NEWLINK messages; the nested attributes IFLA_LINKINFO / IFLA_INFO_DATA are used for that.
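To make the fields-versus-attributes split concrete, here is a minimal sketch of composing a netlink request with writer helpers along the lines of those the snl(3) library later gained; the function names (snl_create_msg_request(), snl_reserve_msg_object(), snl_add_msg_attr_string(), ...) are assumptions based on that direction, not necessarily the final API.

```c
#include <net/if.h>

#include <netlink/netlink.h>
#include <netlink/netlink_route.h>
#include <netlink/netlink_snl.h>

int
main(void)
{
	struct snl_state ss;
	struct snl_writer nw;

	if (!snl_init(&ss, NETLINK_ROUTE))
		return (1);
	snl_init_writer(&ss, &nw);

	/* Fixed header: the "fields" part, always present on the wire. */
	snl_create_msg_request(&nw, RTM_GETLINK);
	struct ifinfomsg *ifmsg = snl_reserve_msg_object(&nw, struct ifinfomsg);
	if (ifmsg != NULL)
		ifmsg->ifi_index = (int)if_nametoindex("em0");

	/*
	 * Attributes: optional TLVs appended after the fields, so new
	 * ones can be added later without breaking older consumers.
	 */
	snl_add_msg_attr_string(&nw, IFLA_IFNAME, "em0");

	struct nlmsghdr *hdr = snl_finalize_msg(&nw);
	if (hdr == NULL || !snl_send_message(&ss, hdr)) {
		snl_free(&ss);
		return (1);
	}
	snl_free(&ss);
	return (0);
}
```

In this shape a carp request could carry 'ifname' in the fixed part and vhid as an attribute, so a GET handler can treat a missing vhid attribute as "no filter".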
I use carp in my production environment (ESXi), and I have to enable promiscuous mode on the virtual ports so that multicast frames can be forwarded between VMs. I am not familiar with AWS, so I assume AWS does not provide a mechanism like ESXi's. If AWS is willing to provide one (configuration of promiscuous mode) then this feature is not that useful. After digging into D38941 I'm even more confused about the design of unicast mode. And what about a carp group with three or more boxes? Do we need 'peers' instead of 'peer'?

It doesn't. Yesterday I discovered that the current version of the patch doesn't work in AWS, because it changes the source MAC address to a multicast address. I've got a fix for that (i.e. don't change the MAC address in unicast mode) that allows it to work in AWS. Even in promiscuous mode the multicast traffic never arrives, so we do need this at least for AWS, and it would be useful for ESXi to avoid needing to run interfaces in promiscuous mode.
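To illustrate the fix described above, here is a sketch of the source MAC selection. The struct and function names are hypothetical, though 00:00:5e:00:01:&lt;vhid&gt; is the documented CARP/VRRP virtual router MAC convention; as the thread notes, cloud fabrics may treat frames with that source address as forged and drop them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/* Hypothetical subset of the carp per-vhid state. */
struct carp_softc_sketch {
	uint8_t	sc_vhid;			/* virtual host ID */
	bool	sc_unicast;			/* unicast peer configured? */
	uint8_t	sc_ifmac[ETHER_ADDR_LEN];	/* parent interface MAC */
};

static void
carp_pick_src_mac(const struct carp_softc_sketch *sc,
    uint8_t mac[ETHER_ADDR_LEN])
{
	if (sc->sc_unicast) {
		/* Unicast mode: keep the real interface address so the
		 * cloud network fabric does not reject the frame. */
		memcpy(mac, sc->sc_ifmac, ETHER_ADDR_LEN);
		return;
	}
	/* Multicast mode: use the virtual router MAC 00:00:5e:00:01:<vhid>. */
	mac[0] = 0x00; mac[1] = 0x00; mac[2] = 0x5e;
	mac[3] = 0x00; mac[4] = 0x01; mac[5] = sc->sc_vhid;
}
```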
That's the intention, yes. The idea is to allow an elastic IP (in AWS parlance) to be failed over between different instances, without making assumptions about the virtual network layout.
Interesting point. I hadn't considered that, mostly because this mirrors the functionality in pfsync, where unicast mode is also limited to two peers. My inclination is to start off with the smaller change (i.e. supporting two boxes in the carp group) before we worry about other scenarios.

I think AWS and ESXi by default block those forged packets with a CARP (VRRP) multicast source MAC address.
That is a little tricky. I see carp_multicast_setup() calls in_joingroup() / in6_joingroup(), so the interface joins the CARP multicast group and promiscuous mode should not be needed. Am I missing something?
That might be worked around with CARP over VXLAN and devd, or with net/ucarp.
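For illustration, such a VXLAN workaround might look roughly like the following; the interface names, IDs, and addresses are placeholders, and this is an untested sketch rather than a recommended recipe.

```sh
# Point-to-point VXLAN tunnel between the two carp hosts, so carp's
# multicast traffic travels inside unicast VXLAN packets.
ifconfig vxlan0 create vxlanid 42 \
    vxlanlocal 198.51.100.1 vxlanremote 198.51.100.2
ifconfig vxlan0 inet 10.0.0.1/24

# Run carp on top of the tunnel interface.
ifconfig vxlan0 vhid 1 pass mypass alias 10.0.0.50/32
```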
I wouldn't describe the traffic as forged. (And I was wrong: it's not a multicast MAC address, but a "virtual router MAC address".) The VRRP standard at least explicitly requires that address to be used. CARP isn't formally standardised (at least not that I could find), but it behaves very much like VRRP.
Cloud networks tend to be a bit special. At least in the first tests the multicast traffic simply didn't make it between VMs (in AWS). I don't currently have access to an ESXi instance, but I understood from your comments that it needs workarounds for multicast. Again something we'd like to avoid having to deal with.
I'm insufficiently familiar with VXLAN to say whether it will accept multicast traffic, but in any event we'd also like to keep the carp code as similar as possible between the physical-hardware and cloud-instance cases.

Sorry, I did not mean that they (packets with a CARP (VRRP) multicast source MAC address) are forged, but AWS / ESXi might treat them as forged [1].
I can confirm it will.
I believe that's a good approach, though I cannot tell which is better: a workaround, or more features to support non-standard-compliant devices.
Generally LGTM; please see some comments (mostly related to the IP address handling) inline.
Thank you! LGTM for the netlink side; I left some minor comments I missed earlier. The only remaining bit is the <netinet/in.h> header in netlink_message_writer.h; I'd love to avoid having that.