Page MenuHomeFreeBSD

Teach lagg(4) to change MTU
AbandonedPublic

Authored by rpokala on Feb 28 2015, 3:32 AM.

Details

Summary

Teach lagg(4) to accept ioctl(SIOCSIFMTU) and change the MTU on all the component interfaces. If for any reason a component interface's MTU cannot be changed as requested, all component interfaces are returned to the original MTU.

Discussed on freebsd-hackers@:

https://lists.freebsd.org/pipermail/freebsd-hackers/2015-February/thread.html#46986

There is a LOR, but I don't know how to interpret it:

https://lists.freebsd.org/pipermail/freebsd-hackers/2015-February/047012.html

Putting rstone@ down as a reviewer, per his offer on the list.

Test Plan

Diff Detail

Repository
rS FreeBSD src repository
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rstone added inline comments.Mar 2 2015, 12:02 AM
sys/net/if_lagg.c
1811

I find the flow control here a bit confusing (my first read through, I thought that err2 could be used unitinialized). Given that you have a continue in the if block, I would find it clearer to not have an else here

emaste added a subscriber: emaste.Mar 2 2015, 4:38 AM
ae added a comment.Mar 2 2015, 7:38 AM
In D1986#7, @rstone wrote:

RLOCK only gets a read lock. You want WLOCK to get a write lock to ensure serialization.

Also we can use another lock in the lagg_ioctl, that will prevent simultaneous MTU changing.

rpokala-panasas.com updated this revision to Diff 4079.

Updating D1986: Teach lagg(4) to change MTU

Addressing review comments from rstone:

Fix style(9) violations
Replace LAGG_RLOCK() w/ LAGG_WLOCK()
Clarify logic for roll-back
Better comments throughout.

4079 addresses @rstone's comments. However, the LOR remains:

Mar  2 17:00:43 fbsd-X root: ifconfig lagg0 mtu 9000
Mar  2 17:00:43 fbsd-X kernel: lock order reversal:
Mar  2 17:00:43 fbsd-X kernel: 1st 0xfffff80004960c08 if_lagg rmlock (if_lagg rmlock) @ /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:1779
Mar  2 17:00:43 fbsd-X kernel: 2nd 0xfffffe0000d50748 em1 (EM Core Lock) @ /usr/src/sys/dev/e1000/if_lem.c:1035
Mar  2 17:00:43 fbsd-X kernel: KDB: stack backtrace:
Mar  2 17:00:43 fbsd-X kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00003f86c0
Mar  2 17:00:43 fbsd-X kernel: witness_checkorder() at witness_checkorder+0xe45/frame 0xfffffe00003f8750
Mar  2 17:00:43 fbsd-X kernel: __mtx_lock_flags() at __mtx_lock_flags+0xa8/frame 0xfffffe00003f87a0
Mar  2 17:00:43 fbsd-X kernel: lem_ioctl() at lem_ioctl+0x3e8/frame 0xfffffe00003f87e0
Mar  2 17:00:43 fbsd-X kernel: lagg_ioctl() at lagg_ioctl+0xafd/frame 0xfffffe00003f88e0
Mar  2 17:00:43 fbsd-X kernel: ifioctl() at ifioctl+0x102e/frame 0xfffffe00003f89a0
Mar  2 17:00:43 fbsd-X kernel: kern_ioctl() at kern_ioctl+0x2c0/frame 0xfffffe00003f8a00
Mar  2 17:00:43 fbsd-X kernel: sys_ioctl() at sys_ioctl+0x153/frame 0xfffffe00003f8ae0
Mar  2 17:00:43 fbsd-X kernel: amd64_syscall() at amd64_syscall+0x27f/frame 0xfffffe00003f8bf0
Mar  2 17:00:43 fbsd-X kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00003f8bf0
Mar  2 17:00:43 fbsd-X kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011e1eea, rsp = 0x7fffffffe288, rbp = 0x7fffffffe2a0 ---

One of my colleagues dug into the first LOR - the one that happens when adding em1 to lagg0, which happens both with and without the patch. It looks like this is a known issue:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194109

We (Panasas) tried reproducing the problem by applying the patch and enabled WITNESS in freebsd stable (10.1) kernel.
While testing with this patched kernel, LOR not observed on ixl and igb ports.

Based on the LOR logs mentioned earlier, the issue seems to be observed on device specific driver like em (e1000).
We tried doing code analysis on em driver and could not find anything obvious. Our platform do not use em driver anymore.

The issue seems to be on device specific driver and not the lagg driver changes that we introduced.
So, we kindly request you to accept these changes to mainline.

sbruno added a subscriber: sbruno.Oct 1 2015, 3:44 PM

We (Panasas) tried reproducing the problem by applying the patch and enabled WITNESS in freebsd stable (10.1) kernel.
While testing with this patched kernel, LOR not observed on ixl and igb ports.
Based on the LOR logs mentioned earlier, the issue seems to be observed on device specific driver like em (e1000).
We tried doing code analysis on em driver and could not find anything obvious. Our platform do not use em driver anymore.
The issue seems to be on device specific driver and not the lagg driver changes that we introduced.
So, we kindly request you to accept these changes to mainline.

lem_ioctl() at lem_ioctl+0x3e8/frame 0xfffffe00003f87e0

This actually means sys/dev/e1000/if_lem.c ... I'll take a look at the LOR and see if its fixable.

@sbruno, A gentle remainder on em driver's LOR issue. Can you please share your findings.

hrs added a subscriber: hrs.Oct 19 2015, 6:21 AM

It is true that this LOR is driver-specific but calling SIOCSIFMTU after acquiring a lock in lagg ioctl is not always safe. Change of lladdr suffers from the same situation and it was solved by using an asynchronous task queue to update addresses on each port. What do you think about piggybacking an MTU change to the queue by extending struct lagg_llq to a more generic one which makes it possible to handle per-port properties?

After digging into lagg internals on updating lladdrs on lagg ports, I'd also vote for extenging llq to deal with MTU changes for underlying interfaces

Thanks @hrs and @melifaro for the suggestions.

I am working on the code changes to handle the MTU asynchronously. I will update the tested patch for review by this week.

rpokala commandeered this revision.
rpokala updated this revision to Diff 11321.Dec 15 2015, 3:37 PM

Updated patch from LN, addressing comments from hrs@, melifaro@, and internal Panasas review.

rpokala marked 3 inline comments as done.Dec 15 2015, 3:41 PM
smh added a subscriber: smh.Dec 17 2015, 9:53 PM

Some general style nits and a question re-loss of the mtu set error.

sys/net/if_lagg.c
693

style(9) bool use of pointer type.

715

style(9) four space additional indent only should be used, more below.

753

Looks like this attempts to change more that would have been done above before the error, is that intended?

767

Looks like the error gets lost, although printed, is there no way we can avoid this?

796

style(9) bracing around return.

802

style(9) init of vars in declaration should be avoided.

Moving to down to where its first needed can avoid setting it at all.

821

style(9) bool use of pointer type.

827

style(9) bool use of pointer type.

836

consider: "it might have been updated."

845

style(9) bool use of pointer type.

melifaro added inline comments.Dec 17 2015, 10:16 PM
sys/net/if_lagg.c
712

What if we have multiple events queued on tasq? e.g mtu AND mac change

747

Not that easy, unfortunately.
At this moment original ioctl returned 0, so other things/events were fired:
rtsock notification, IPv6 nd mtg update, route table mtg update.. see net/if.c for more details.
Trying to silently revert part of it without dealing with other things is not a good idea.

hrs added inline comments.Dec 18 2015, 10:11 PM
sys/net/if_lagg.c
697

This traversal and freeing an entry after processing it should be done in lagg_port_ops().

737

Please separate a llq loop from a handler for per-port configuration. A llq traversal should be required only once in lagg_port_ops() if the handlers process a single lagg_llq entry.

821

Is this (llq == NULL), not (llq != NULL)?

824

Why is cleanup required here? This removes all of tasks not limited to MTU change.

845

free(NULL) does nothing. Checking if NULL or not is useless.

sys/net/if_lagg.h
220 ↗(On Diff #11321)

Please add "llq_" prefix to the members.

221 ↗(On Diff #11321)

Is there any reason to have ifr as a pointer? malloc is generally expensive in kernel, and overhead of struct ifreq is acceptable for me even if every llq has one. I feel this complicates the error handling at least.

Thanks @smh, @melifaro and @hrs for the review comments.

rpokala marked 8 inline comments as done.Jan 7 2016, 3:10 AM
rpokala updated this revision to Diff 11984.

Address review comments from smh@, melifaro@, and hrs@.

rpokala marked 11 inline comments as done.Jan 8 2016, 7:02 AM

I'm proxying for LN, who asserts that all the comments have been addressed. So, marking them done.

sys/net/if_lagg.c
753

For the interfaces after the one that failed (and therefore which still have the old MTU), this is a noop.

796

I rather hate that idiom, but fine. :-P

smh added inline comments.Jan 11 2016, 10:17 PM
sys/kern/kern_condvar.c
466 ↗(On Diff #11984)

style(9) there should be blank line at the beginning of methods that don't have any vars.

468 ↗(On Diff #11984)

No need for else here

sys/net/if_lagg.c
107

style(9) should be 4 space indented, more below

sys/net/if_lagg.h
241 ↗(On Diff #11984)

not sure but looks like there are alignment issues here?

smh added inline comments.Jan 11 2016, 11:27 PM
sys/kern/kern_condvar.c
465 ↗(On Diff #11984)

ASSERT that the lock is held?

sys/net/if_lagg.c
698

Not sure if its possible but if we have no llq_entries then err will be uninitialised on return.

721

style(9) blank line after vars.

744

static void should be on a separate line, few more below.

Also not keen on the name for this method may be something like lagg_llq_free_entries?

777

No need for == TRUE on bool.

784

I assume this is checked after cv_has_waiters as that indicates an mtu change is in progress hence there's no easy way to confirm what it will become?

Its a shame ioctl doesn't currently have EBUSY as a possible return error as that may be simpler.

rpokala marked an inline comment as done.Jan 14 2016, 12:21 AM
rpokala updated this revision to Diff 12272.

Address smh@'s comments from 2016-01-11.

rpokala marked 5 inline comments as done.Jan 14 2016, 12:23 AM

smh@ - I just uploaded an updated patch from LN, which addressed your latest comments.

smh added a comment.Jan 14 2016, 9:55 AM

Also still some older style(9) comments still not addressed.

sys/net/if_lagg.c
785

This looks a bit messy given you only need LAGG_WUNLOCK(sc); I would do something like (also eliminates done var too).

Alternatively just use goto out, which while it sets busy to false that should be fine.

if (SLIST_EMPTY(&sc->sc_ports)) {
        LAGG_WUNLOCK(sc);
        return (EIO);
} else if (sc->sc_mtu_ctxt.busy) {
        LAGG_WUNLOCK(sc);
        return (EBUSY);
} else if (ifp->if_mtu == ifr->ifr_mtu) {
        LAGG_WUNLOCK(sc);
        return (0);
}

sc->sc_mtu_ctxt.busy = TRUE;
sys/net/if_lagg.h
216 ↗(On Diff #12272)

Any reason for the blank line here?

Hi Steven Hartland,
Thanks for your review.
I have attached the revised diffs here, as I cannot update them myself.
Ravi, who owns this review, works on a different timezone, and I have
copied him here.

Looking forward to hear your comments on the same.

Thanks,
LN

@smh: "Also still some older style(9) comments still not addressed."

Other than the blank line you mentioned, what other style(9) issues are there?

rpokala marked 6 inline comments as done.Jan 15 2016, 4:54 PM
rpokala added inline comments.
sys/net/if_lagg.h
240 ↗(On Diff #12366)

I think it's an artifact of Phabricator; it's aligned correctly in the file.

smh added a comment.Jan 15 2016, 5:49 PM

@smh: "Also still some older style(9) comments still not addressed."
Other than the blank line you mentioned, what other style(9) issues are there?

There where a few previous comments not marked as done, which still appeared valid. I think however this was due to diff upload so hopefully just the one I've re-highlighted.

sys/net/if_lagg.c
107

This still seems incorrect (tab intended)

rpokala marked an inline comment as done.Jan 15 2016, 6:02 PM
rpokala added inline comments.
sys/net/if_lagg.c
107

Ah, thanks.

I did it the way it is because other stuff in the file is done that way. For example, see the prototypes for lagg_port_output(), lagg_setflag(), lagg_rr_input(), lagg_fail_input(), lagg_lb_input(), lagg_bcast_input(), lagg_lacp_input().

If you still want me to change this, I can, but I'd rather go with the in-file precedent.

smh added inline comments.Jan 15 2016, 6:11 PM
sys/net/if_lagg.c
107

If that's the case then all good :)

rpokala added inline comments.Jan 15 2016, 6:43 PM
sys/net/if_lagg.c
107

Excellent. Can you mark this approved for the record, so I can commit?

smh added a reviewer: smh.Jan 16 2016, 2:18 AM
smh accepted this revision.

Happy with this from a style perspective, however I haven't tested and this area is not one of my strong points. So for commit I'd say I've only reviewed (in part).

My other concern it that the actions performed on failure may well not result in the system being in the same state before the update was attempted, which could cause unexpected issues.

This revision is now accepted and ready to land.Jan 16 2016, 2:18 AM

Please allow me one more day to review the changes.

Thanks Steve for your comments.
I would appreciate if you can explain more on "... actions performed on failure may well not result in the system being in the same state before the update was attempted, which could cause unexpected issues."

The changes introduce MTU changes on lagg, addresses concerns of inconsistent/partial rollback on failures by synchronizing calls to wait for response from async taskq. And additionally, when operations are in progress, newer calls are rejected with busy status. Is there any flow/window you have seen where the state is not restored?

The operation of MAC change on lagg is left as it was earlier. And these are the only 2 operations currently defined on lagg port.

Sorry for the delay. Will try to review today/tomorrow.

Sorry for the delay. Will try to review today/tomorrow.

Ping?

melifaro requested changes to this revision.

Sorry for taking that long.

First off, I like the idea w/ synchronous completion - it is really a good way to go.
However, I have some comments/suggestions on the implementation.

Changing MTU is not different from changing MAC or changing capabilities - each needs to be run async, on the list of interfaces, commands can fail, you might want rollback and completion signal. This should be done in generic way.

So, I'd suggest the following:

  1. use single queue item per task queue "task"
  2. embed all interface pointers w/ their ioctl handlers (and maybe other flags there) (maybe using some standard e.g. non mtu-specific function) into that item
  3. embed/init mutex/condvar there
  4. embed callback/free routine pointers

So, the process can look like the following:

  • lag_change_mtu() allocates new structure, stores there all current interfaces, along with lagg_llq_action_mtu() and initialised lagg_signal.
  • do mtx lock, put-to-task, cv_wait()
  • then, llq_action is called by generic lagg dequeue code, it handles mtu changes and signals back to cv
  • initial ioctl wakes up, reclaims memory/do other cleanup and returns the result
  • (there could be a "destroy" handler which simply signals cv w/ error on enqueued items, when lagg destroy is requested)

Doing it that way allows to have more than 1 item on the taskq and permits extending this approach to other code.

This revision now requires changes to proceed.Jan 26 2016, 7:04 AM
rpokala abandoned this revision.Aug 1 2016, 8:17 PM

This turned out to be way more complicated than anyone expected, the people who were working on it for Panasas are no longer with the company, the changes have bitrotted due to six months of neglect from me, and we came up with a reasonable workaround in the scripting layer. For all those reasons, I'm abandoning this diff.

cem added a subscriber: cem.Sep 2 2016, 9:26 PM