Page MenuHomeFreeBSD

lagg: Handle a port count of zero
Needs ReviewPublic

Authored by markj on Mon, May 11, 2:07 PM.
Tags
None
Referenced Files
F157576794: D56942.id177988.diff
Sat, May 23, 12:21 AM
F157529610: D56942.id177592.diff
Fri, May 22, 11:38 AM
F157523818: D56942.id.diff
Fri, May 22, 9:39 AM
F157511765: D56942.id177582.diff
Fri, May 22, 6:02 AM
Unknown Object (File)
Tue, May 19, 10:49 PM
Unknown Object (File)
Tue, May 19, 8:41 PM
Unknown Object (File)
Tue, May 19, 7:07 AM
Unknown Object (File)
Mon, May 18, 10:45 PM

Details

Reviewers
zlei
Group Reviewers
network
Summary

The sc_count check in lagg_transmit_ethernet() and
lagg_transmit_infiniband() is racy, as the lagg protocol handlers are
only synchronized by net_epoch. Handle a count of 0 in each protocol
handler where it's needed, namely in the RR and LB handlers.

PR: 289017

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 73174
Build 70057: arc lint + arc unit

Event Timeline

markj requested review of this revision.Mon, May 11, 2:07 PM

Actually there are other sc members ( and other places sc->sc_count invoved ) not synchronized properly, I think on architectures with weak memory order they are easier to spot. Found that while I was working on PR 289017 .

I planned to ask what is the best approach for drivers to handle those synchronizations on the mailing list, but at that time the 14.4-RELEASE was about to be released, and I thought that is a generic question, so I decide to postpone the asking.

Actually there are other sc members ( and other places sc->sc_count invoved ) not synchronized properly, I think on architectures with weak memory order they are easier to spot. Found that while I was working on PR 289017 .

Could you be more specific? I'm sure there are other problems, but this one is easy to trigger in practice.

I planned to ask what is the best approach for drivers to handle those synchronizations on the mailing list, but at that time the 14.4-RELEASE was about to be released, and I thought that is a generic question, so I decide to postpone the asking.

Here, it seems straightfoward to me that at a minimum we need to verify that the count isn't zero. A relaxed atomic_load_int()+check is sufficient.

sys/net/if_lagg.c
2430

We also need to make sure that lp isn't NULL.

Avoid a related potential null pointer dereference in lagg_rr_start().

Actually there are other sc members ( and other places sc->sc_count invoved ) not synchronized properly, I think on architectures with weak memory order they are easier to spot. Found that while I was working on PR 289017 .

Could you be more specific? I'm sure there are other problems, but this one is easy to trigger in practice.

I think the design of switching protocols is not right.

Different protocols has different requirements, the whole idea to keep pushing traffic / receiving traffic to / from the lagg interface while switching protocols looks wrong to me. That's why I suggested stop the interface, wait a while, switch the protocol, bring up the interface as a workaround in the mail.

There're multiple possible solutions for that. A straight forward one is introducing a protocol_ready flag to sc. But that looks a bit overkill, as user rarely switch protocol. So I think maybe we can re-use IFF_DRV_RUNNING. On switching the protocol, stop the interface first, that is

ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
lagg_proto_stop();
/* switch protocol */
...
lagg_proto_init();
ifp->if_drv_flags |= IFF_DRV_RUNNING;

Well, that still need proper synchronization of if_drv_flags. The fast path lacks that. ( Actually a lot of drivers rely if_drv_flags, that is not right. CC @glebius )

So I think, if we can extends NET_EPOCH_WAIT / NET_EPOCH_ENTER a bit, so that they establish a release / require semantic. Lots of drivers will benifet that I think. That is what I planned to ask on the mailing list.

I planned to ask what is the best approach for drivers to handle those synchronizations on the mailing list, but at that time the 14.4-RELEASE was about to be released, and I thought that is a generic question, so I decide to postpone the asking.

Here, it seems straightfoward to me that at a minimum we need to verify that the count isn't zero. A relaxed atomic_load_int()+check is sufficient.

We're racing with releasing of 15.1, I think maybe we can land a simple solid fix firstly. I'm preparing it now.

Actually there are other sc members ( and other places sc->sc_count invoved ) not synchronized properly, I think on architectures with weak memory order they are easier to spot. Found that while I was working on PR 289017 .

Could you be more specific? I'm sure there are other problems, but this one is easy to trigger in practice.

I think the design of switching protocols is not right.

Different protocols has different requirements, the whole idea to keep pushing traffic / receiving traffic to / from the lagg interface while switching protocols looks wrong to me. That's why I suggested stop the interface, wait a while, switch the protocol, bring up the interface as a workaround in the mail.

How long do you wait?

There're multiple possible solutions for that. A straight forward one is introducing a protocol_ready flag to sc. But that looks a bit overkill, as user rarely switch protocol.

if_lagg already sets sc->sc_proto = LAGG_PROTO_NONE for the window where the current protocol is detaching and we're attaching the new protocol. This is racy of course, as you say, but I don't see why any new flag is needed.

So I think maybe we can re-use IFF_DRV_RUNNING. On switching the protocol, stop the interface first, that is

ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
lagg_proto_stop();
/* switch protocol */
...
lagg_proto_init();
ifp->if_drv_flags |= IFF_DRV_RUNNING;

Well, that still need proper synchronization of if_drv_flags. The fast path lacks that. ( Actually a lot of drivers rely if_drv_flags, that is not right. CC @glebius )

So I think, if we can extends NET_EPOCH_WAIT / NET_EPOCH_ENTER a bit, so that they establish a release / require semantic. Lots of drivers will benifet that I think. That is what I planned to ask on the mailing list.

I do not see how protocol switching can be fixed without updating protocol pr_attach and pr_detach implementations to be epoch-aware. This means, at minimum:

  1. sc_proto must be loaded/updated using acquire/release semantics.
  2. pr_detach implementations must use deferred free(), i.e., NET_EPOCH_CALL().

I planned to ask what is the best approach for drivers to handle those synchronizations on the mailing list, but at that time the 14.4-RELEASE was about to be released, and I thought that is a generic question, so I decide to postpone the asking.

Here, it seems straightfoward to me that at a minimum we need to verify that the count isn't zero. A relaxed atomic_load_int()+check is sufficient.

We're racing with releasing of 15.1, I think maybe we can land a simple solid fix firstly. I'm preparing it now.

This patch is intended to exactly a "simple solid fix."

sys/net/if_lagg.c
882

Do we need a barrier here ?

sys/net/if_lagg.c
882

CK_SLIST_INSERT_AFTER and CK_SLIST_INSERT_HEAD issue a store-store fence, so the data path should not be able to observe a partially constructed lp even today. In any case, fixing such bugs is not really the goal of this particular patch.

1843

We should check here too.

Update lookup_snd_tag_port() too.

sys/net/if_lagg.c
882

For example the implementation of CK_SLIST_INSERT_HEAD on aarch64,

#define CK_SLIST_INSERT_HEAD(head, elm, field) do {                             \
        (elm)->field.csle_next = (head)->cslh_first;                            \
        ck_pr_fence_store();                                                    \
        ck_pr_store_ptr(&(head)->cslh_first, elm);                              \
} while (0)

and ck_pr_store_ptr,

#define CK_PR_STORE_SAFE(DST, VAL, TYPE)                        \
    ck_pr_md_store_##TYPE(                                      \
        ((void)sizeof(*(DST) = (VAL)), (DST)),                  \
        (VAL))

#define ck_pr_store_ptr(DST, VAL) CK_PR_STORE_SAFE((DST), (VAL), ptr)

#define CK_PR_STORE_64(S, M, T, I)                              \
        CK_CC_INLINE static void                                \
        ck_pr_md_store_##S(M *target, T v)                      \
        {                                                       \
                __asm__ __volatile__(I " %2, [%1]"              \
                                        : "=m" (*(T *)target)   \
                                        : "r" (target),         \
                                          "r" (v)               \
                                        : "memory");            \
                return;                                         \
        }

CK_PR_STORE_64(ptr, void, const void *, "str")
...

If I read the code correctly, the ck_pr_store_ptr only ensures compiler order, but there's no ( implicit or explicit ) fence introduced by it, then it is possible to get runtime disorder of following write to sc->sc_count on SMP ?

So that another thread see an increased sc->sc_count++ but the list sc->sc_ports does not contain the lp ?

Please correct me if wrong. I'm not expert on ARM, really.

sys/net/if_lagg.c
882

The memory ordering is provided by ck_pr_fence_store().

But in any case it is indeed necessary to ensure that data path code handles the case where sc->sc_count and the port list are not in agreement. That's why I added the NULL check to the loop in lagg_rr_start().

I do not see how protocol switching can be fixed without updating protocol pr_attach and pr_detach implementations to be epoch-aware. This means, at minimum:

  1. sc_proto must be loaded/updated using acquire/release semantics.
  2. pr_detach implementations must use deferred free(), i.e., NET_EPOCH_CALL().

Sorry for unrelated comment, but I have re-read this several times and it blew my mind until I realized you are not talking about protosw(9). I'd suggest to rename the lagg protocol method names to something different to avoid this confusion.