Page MenuHomeFreeBSD

ipoib: assign link-local address according to RFC
ClosedPublic

Authored by kib on Apr 22 2019, 11:38 AM.

Details

Summary

RFC 4391 specifies that the IB interface GID should be re-used as IPv6 link-local address. Since the code in in6_get_hw_ifid() ignored IFT_INFINIBAND case, ibX interfaces ended up with the local address borrowed from some other interface, which is non-compliant.

Sponsored by: Mellanox Technologies

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision.Apr 22 2019, 11:38 AM
kib edited the summary of this revision. (Show Details)Apr 22 2019, 11:40 AM
kib added reviewers: bz, ae, slavash, hselasky.
kib added a subscriber: menyy_mellanox.com.
kib edited the summary of this revision. (Show Details)Apr 22 2019, 11:44 AM
bz accepted this revision.Apr 22 2019, 11:45 AM

Seems fine to me.

This revision is now accepted and ready to land.Apr 22 2019, 11:45 AM
slavash accepted this revision.Apr 22 2019, 2:49 PM

Looks good to me.

ae added a comment.Apr 22 2019, 3:13 PM

It looks like INFINIBAND_ALEN defines link address length for infiniband. It is 20 bytes. It seems the check if (addrlen != 16) will be always successful.

kib updated this revision to Diff 56490.Apr 22 2019, 3:32 PM

Actually use the GID part of the interface id for l/l address, skipping QP number.

Noted by: ae

This revision now requires review to proceed.Apr 22 2019, 3:32 PM
ae added a comment.Apr 22 2019, 3:54 PM

I'm not sure that is correct. in6_get_hw_ifid() is used by in6_ifattach_linklocal() via get_ifid() to generate IPv6 link local address. IPv6 link local address has already filled first 8 bytes. And in this change you override them. Note in the line 238 says that first 8 bytes should be preserved.

kib added a comment.Apr 22 2019, 4:43 PM
In D20006#430002, @ae wrote:

I'm not sure that is correct. in6_get_hw_ifid() is used by in6_ifattach_linklocal() via get_ifid() to generate IPv6 link local address. IPv6 link local address has already filled first 8 bytes. And in this change you override them. Note in the line 238 says that first 8 bytes should be preserved.

Hm, from my reading of RFC, there is a very explicit requirement to form 20-byte link-local address as [0:qpn:GID], and the 16-byte l/l should be formed by truncating. I also verified that after the patch, l/l address on the ibX is same as configured by Linux.

Are you worrying about the embedded scope id ? If scope is implanted before the call, I can preserve it ?

ae added a comment.Apr 22 2019, 5:19 PM
In D20006#430051, @kib wrote:

Hm, from my reading of RFC, there is a very explicit requirement to form 20-byte link-local address as [0:qpn:GID], and the 16-byte l/l should be formed by truncating. I also verified that after the patch, l/l address on the ibX is same as configured by Linux.
Are you worrying about the embedded scope id ? If scope is implanted before the call, I can preserve it ?

Scope Id is embedded just after get_ifid() in in6_setscope(). But if by some chance it will happen that first two bytes are not htons(0xfe80), this will not IPv6 link-local address and I think in6_ifattach_linklocal() will fail.
Can you show what address you get in your tests, and also what HW address has the card?

Also it seems Linux uses only 8 bytes:

net/ipv6/addrconf.c:

static int addrconf_ifid_infiniband(u8 *eui, struct net_device *dev)
{
        if (dev->addr_len != INFINIBAND_ALEN)
                return -1;
        memcpy(eui, dev->dev_addr + 12, 8);
        eui[0] |= 2;
        return 0;
}
...

static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
{
        switch (dev->type) {
        case ARPHRD_ETHER:
        case ARPHRD_FDDI:
                return addrconf_ifid_eui48(eui, dev);
        case ARPHRD_ARCNET:
                return addrconf_ifid_arcnet(eui, dev);
        case ARPHRD_INFINIBAND:
                return addrconf_ifid_infiniband(eui, dev);
...

PS. I'm not quite sure that the quoted code does exactly this task, but looks like :)

kib added a comment.Apr 22 2019, 5:57 PM
In D20006#430056, @ae wrote:
In D20006#430051, @kib wrote:

Hm, from my reading of RFC, there is a very explicit requirement to form 20-byte link-local address as [0:qpn:GID], and the 16-byte l/l should be formed by truncating. I also verified that after the patch, l/l address on the ibX is same as configured by Linux.
Are you worrying about the embedded scope id ? If scope is implanted before the call, I can preserve it ?

Scope Id is embedded just after get_ifid() in in6_setscope(). But if by some chance it will happen that first two bytes are not htons(0xfe80), this will not IPv6 link-local address and I think in6_ifattach_linklocal() will fail.
Can you show what address you get in your tests, and also what HW address has the card?

# ifconfig ib0
ib0: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 2044
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        lladdr 0.0.8.86.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.e7.10.b
        inet6 fe80::e61d:2d03:e7:100b%ib0 prefixlen 64 scopeid 0x6
        inet6 2001::43 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

# ibstat
CA 'mlx5_1'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.25.0239
        Hardware version: 0
        Node GUID: 0xe41d2d0300e7100b
        System image GUID: 0xe41d2d0300e7100a
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 8
                LMC: 0
                SM lid: 1
                Capability mask: 0x2651e848
                Port GUID: 0xe41d2d0300e7100b
                Link layer: InfiniBand

Also it seems Linux uses only 8 bytes:
net/ipv6/addrconf.c:

static int addrconf_ifid_infiniband(u8 *eui, struct net_device *dev)
{
        if (dev->addr_len != INFINIBAND_ALEN)
                return -1;
        memcpy(eui, dev->dev_addr + 12, 8);
        eui[0] |= 2;
        return 0;
}
...
static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
{
        switch (dev->type) {
        case ARPHRD_ETHER:
        case ARPHRD_FDDI:
                return addrconf_ifid_eui48(eui, dev);
        case ARPHRD_ARCNET:
                return addrconf_ifid_arcnet(eui, dev);
        case ARPHRD_INFINIBAND:
                return addrconf_ifid_infiniband(eui, dev);
...

PS. I'm not quite sure that the quoted code does exactly this task, but looks like :)

Well, I can take only 8bytes. But what I see on Linux (not me, this is copy/paste from somebody else actions):

# ifconfig ib5
Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib5       Link encap:InfiniBand  HWaddr 80:00:08:87:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:16.7.5.140  Bcast:16.7.255.255  Mask:255.255.0.0
          inet6 addr: fe80::ee0d:9a03:43:f7bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:672 (672.0 b)  TX bytes:916 (916.0 b)
[root@sqa140 ~]# ibstat mlx5_4
CA 'mlx5_4'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.25.0310
        Hardware version: 0
        Node GUID: 0xec0d9a030043f7bd
        System image GUID: 0xec0d9a030043f7bc
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 11
                LMC: 0
                SM lid: 66
                Capability mask: 0x2651e848
                Port GUID: 0xec0d9a030043f7bd
                Link layer: InfiniBand
ae added a comment.Apr 22 2019, 6:26 PM
In D20006#430085, @kib wrote:

Well, I can take only 8bytes. But what I see on Linux (not me, this is copy/paste from somebody else actions):

# ifconfig ib5
Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib5       Link encap:InfiniBand  HWaddr 80:00:08:87:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:16.7.5.140  Bcast:16.7.255.255  Mask:255.255.0.0
          inet6 addr: fe80::ee0d:9a03:43:f7bd/64 Scope:Link

I think it is right to use only last 64 bits. The first 8 bytes is already 'fe80::', thus the result will be correct and expected.

kib updated this revision to Diff 56512.Apr 22 2019, 8:03 PM

Use only 8 bytes from hw address.

ae accepted this revision.Apr 22 2019, 9:38 PM
This revision is now accepted and ready to land.Apr 22 2019, 9:38 PM
slavash accepted this revision.Apr 23 2019, 6:53 AM

Tested fix - works for me.

hselasky accepted this revision.Apr 23 2019, 6:56 AM
This revision was automatically updated to reflect the committed changes.