Details

Reviewers

melifaro
rpokala-panasas.com
rstone
smh

Summary

Teach lagg(4) to accept ioctl(SIOCSIFMTU) and change the MTU on all the component interfaces. If for any reason a component interface's MTU cannot be changed as requested, all component interfaces are returned to the original MTU.

Discussed on freebsd-hackers@:

https://lists.freebsd.org/pipermail/freebsd-hackers/2015-February/thread.html#46986

There is a LOR, but I don't know how to interpret it:

https://lists.freebsd.org/pipermail/freebsd-hackers/2015-February/047012.html

Putting rstone@ down as a reviewer, per his offer on the list.

Test Plan

Test script attached to initial thread on freebsd-hackers@

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 2112
Build 2120: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

rstone added inline comments.Mar 2 2015, 12:02 AM

sys/net/if_lagg.c
1811	I find the flow control here a bit confusing (my first read through, I thought that err2 could be used unitinialized). Given that you have a continue in the if block, I would find it clearer to not have an else here

emaste added a subscriber: emaste.Mar 2 2015, 4:38 AM

In D1986#7, @rstone wrote:

RLOCK only gets a read lock. You want WLOCK to get a write lock to ensure serialization.

Also we can use another lock in the lagg_ioctl, that will prevent simultaneous MTU changing.

Updating D1986: Teach lagg(4) to change MTU

Addressing review comments from rstone:

Fix style(9) violations
Replace LAGG_RLOCK() w/ LAGG_WLOCK()
Clarify logic for roll-back
Better comments throughout.

4079 addresses @rstone's comments. However, the LOR remains:

Mar  2 17:00:43 fbsd-X root: ifconfig lagg0 mtu 9000
Mar  2 17:00:43 fbsd-X kernel: lock order reversal:
Mar  2 17:00:43 fbsd-X kernel: 1st 0xfffff80004960c08 if_lagg rmlock (if_lagg rmlock) @ /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:1779
Mar  2 17:00:43 fbsd-X kernel: 2nd 0xfffffe0000d50748 em1 (EM Core Lock) @ /usr/src/sys/dev/e1000/if_lem.c:1035
Mar  2 17:00:43 fbsd-X kernel: KDB: stack backtrace:
Mar  2 17:00:43 fbsd-X kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00003f86c0
Mar  2 17:00:43 fbsd-X kernel: witness_checkorder() at witness_checkorder+0xe45/frame 0xfffffe00003f8750
Mar  2 17:00:43 fbsd-X kernel: __mtx_lock_flags() at __mtx_lock_flags+0xa8/frame 0xfffffe00003f87a0
Mar  2 17:00:43 fbsd-X kernel: lem_ioctl() at lem_ioctl+0x3e8/frame 0xfffffe00003f87e0
Mar  2 17:00:43 fbsd-X kernel: lagg_ioctl() at lagg_ioctl+0xafd/frame 0xfffffe00003f88e0
Mar  2 17:00:43 fbsd-X kernel: ifioctl() at ifioctl+0x102e/frame 0xfffffe00003f89a0
Mar  2 17:00:43 fbsd-X kernel: kern_ioctl() at kern_ioctl+0x2c0/frame 0xfffffe00003f8a00
Mar  2 17:00:43 fbsd-X kernel: sys_ioctl() at sys_ioctl+0x153/frame 0xfffffe00003f8ae0
Mar  2 17:00:43 fbsd-X kernel: amd64_syscall() at amd64_syscall+0x27f/frame 0xfffffe00003f8bf0
Mar  2 17:00:43 fbsd-X kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00003f8bf0
Mar  2 17:00:43 fbsd-X kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011e1eea, rsp = 0x7fffffffe288, rbp = 0x7fffffffe2a0 ---

One of my colleagues dug into the first LOR - the one that happens when adding em1 to lagg0, which happens both with and without the patch. It looks like this is a known issue:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194109

We (Panasas) tried reproducing the problem by applying the patch and enabled WITNESS in freebsd stable (10.1) kernel.
While testing with this patched kernel, LOR not observed on ixl and igb ports.

Based on the LOR logs mentioned earlier, the issue seems to be observed on device specific driver like em (e1000).
We tried doing code analysis on em driver and could not find anything obvious. Our platform do not use em driver anymore.

The issue seems to be on device specific driver and not the lagg driver changes that we introduced.
So, we kindly request you to accept these changes to mainline.

In D1986#77466, @lakshmi.n_msystechnologies.com wrote:

We (Panasas) tried reproducing the problem by applying the patch and enabled WITNESS in freebsd stable (10.1) kernel.
While testing with this patched kernel, LOR not observed on ixl and igb ports.

Based on the LOR logs mentioned earlier, the issue seems to be observed on device specific driver like em (e1000).
We tried doing code analysis on em driver and could not find anything obvious. Our platform do not use em driver anymore.

The issue seems to be on device specific driver and not the lagg driver changes that we introduced.
So, we kindly request you to accept these changes to mainline.

lem_ioctl() at lem_ioctl+0x3e8/frame 0xfffffe00003f87e0

This actually means sys/dev/e1000/if_lem.c ... I'll take a look at the LOR and see if its fixable.

@sbruno, A gentle remainder on em driver's LOR issue. Can you please share your findings.

It is true that this LOR is driver-specific but calling SIOCSIFMTU after acquiring a lock in lagg ioctl is not always safe. Change of lladdr suffers from the same situation and it was solved by using an asynchronous task queue to update addresses on each port. What do you think about piggybacking an MTU change to the queue by extending struct lagg_llq to a more generic one which makes it possible to handle per-port properties?

After digging into lagg internals on updating lladdrs on lagg ports, I'd also vote for extenging llq to deal with MTU changes for underlying interfaces

Thanks @hrs and @melifaro for the suggestions.

I am working on the code changes to handle the MTU asynchronously. I will update the tested patch for review by this week.

rpokala commandeered this revision.Dec 14 2015, 7:48 PM

rpokala added a reviewer: rpokala-panasas.com.

Herald added a subscriber: imp. · View Herald TranscriptDec 14 2015, 7:48 PM

Updated patch from LN, addressing comments from hrs@, melifaro@, and internal Panasas review.

rpokala marked 3 inline comments as done.Dec 15 2015, 3:41 PM

Some general style nits and a question re-loss of the mtu set error.

sys/net/if_lagg.c
752	style(9) bool use of pointer type.
923	style(9) four space additional indent only should be used, more below.
961	Looks like this attempts to change more that would have been done above before the error, is that intended?
975	Looks like the error gets lost, although printed, is there no way we can avoid this?
1004	style(9) bracing around return.
1010	style(9) init of vars in declaration should be avoided. Moving to down to where its first needed can avoid setting it at all.
1029	style(9) bool use of pointer type.
1035	style(9) bool use of pointer type.
1044	consider: "it might have been updated."
1053	style(9) bool use of pointer type.

melifaro added inline comments.Dec 17 2015, 10:16 PM

sys/net/if_lagg.c
920	What if we have multiple events queued on tasq? e.g mtu AND mac change
955	Not that easy, unfortunately. At this moment original ioctl returned 0, so other things/events were fired: rtsock notification, IPv6 nd mtg update, route table mtg update.. see net/if.c for more details. Trying to silently revert part of it without dealing with other things is not a good idea.

hrs added inline comments.Dec 18 2015, 10:11 PM

sys/net/if_lagg.c
905	This traversal and freeing an entry after processing it should be done in lagg_port_ops().
945	Please separate a llq loop from a handler for per-port configuration. A llq traversal should be required only once in lagg_port_ops() if the handlers process a single lagg_llq entry.
1029	Is this (llq == NULL), not (llq != NULL)?
1032	Why is cleanup required here? This removes all of tasks not limited to MTU change.
1053	free(NULL) does nothing. Checking if NULL or not is useless.
sys/net/if_lagg.h
237	Please add "llq_" prefix to the members.
238	Is there any reason to have ifr as a pointer? malloc is generally expensive in kernel, and overhead of struct ifreq is acceptable for me even if every llq has one. I feel this complicates the error handling at least.

Thanks @smh, @melifaro and @hrs for the review comments.

Address review comments from smh@, melifaro@, and hrs@.

I'm proxying for LN, who asserts that all the comments have been addressed. So, marking them done.

sys/net/if_lagg.c
961	For the interfaces after the one that failed (and therefore which still have the old MTU), this is a noop.
1004	I rather hate that idiom, but fine. :-P

smh added inline comments.Jan 11 2016, 10:17 PM

sys/kern/kern_condvar.c
466 ↗	(On Diff #11984)	style(9) there should be blank line at the beginning of methods that don't have any vars.
468 ↗	(On Diff #11984)	No need for else here
sys/net/if_lagg.c
106	style(9) should be 4 space indented, more below
sys/net/if_lagg.h
241	not sure but looks like there are alignment issues here?

smh added inline comments.Jan 11 2016, 11:27 PM

sys/kern/kern_condvar.c
465 ↗	(On Diff #11984)	ASSERT that the lock is held?
sys/net/if_lagg.c
757	Not sure if its possible but if we have no llq_entries then err will be uninitialised on return.
780	style(9) blank line after vars.
803	static void should be on a separate line, few more below. Also not keen on the name for this method may be something like lagg_llq_free_entries?
836	No need for == TRUE on bool.
843	I assume this is checked after cv_has_waiters as that indicates an mtu change is in progress hence there's no easy way to confirm what it will become? Its a shame ioctl doesn't currently have EBUSY as a possible return error as that may be simpler.

Address smh@'s comments from 2016-01-11.

smh@ - I just uploaded an updated patch from LN, which addressed your latest comments.

Also still some older style(9) comments still not addressed.

sys/net/if_lagg.c
844	This looks a bit messy given you only need LAGG_WUNLOCK(sc); I would do something like (also eliminates done var too). Alternatively just use goto out, which while it sets busy to false that should be fine. if (SLIST_EMPTY(&sc->sc_ports)) { LAGG_WUNLOCK(sc); return (EIO); } else if (sc->sc_mtu_ctxt.busy) { LAGG_WUNLOCK(sc); return (EBUSY); } else if (ifp->if_mtu == ifr->ifr_mtu) { LAGG_WUNLOCK(sc); return (0); } sc->sc_mtu_ctxt.busy = TRUE;
sys/net/if_lagg.h
216	Any reason for the blank line here?

Hi Steven Hartland,
Thanks for your review.
I have attached the revised diffs here, as I cannot update them myself.
Ravi, who owns this review, works on a different timezone, and I have
copied him here.

Looking forward to hear your comments on the same.

Thanks,
LN

mtu_comments_up.patch73 KBDownload

@smh: "Also still some older style(9) comments still not addressed."

Other than the blank line you mentioned, what other style(9) issues are there?

Update with LN's changes from https://reviews.freebsd.org/D1986#104620

rpokala marked 6 inline comments as done.Jan 15 2016, 4:54 PM

rpokala added inline comments.

sys/net/if_lagg.h
241	I think it's an artifact of Phabricator; it's aligned correctly in the file.

In D1986#105007, @rpokala wrote:

@smh: "Also still some older style(9) comments still not addressed."

Other than the blank line you mentioned, what other style(9) issues are there?

There where a few previous comments not marked as done, which still appeared valid. I think however this was due to diff upload so hopefully just the one I've re-highlighted.

sys/net/if_lagg.c
106	This still seems incorrect (tab intended)

rpokala marked an inline comment as done.Jan 15 2016, 6:02 PM

rpokala added inline comments.

sys/net/if_lagg.c
106	Ah, thanks. I did it the way it is because other stuff in the file is done that way. For example, see the prototypes for lagg_port_output(), lagg_setflag(), lagg_rr_input(), lagg_fail_input(), lagg_lb_input(), lagg_bcast_input(), lagg_lacp_input(). If you still want me to change this, I can, but I'd rather go with the in-file precedent.

smh added inline comments.Jan 15 2016, 6:11 PM

sys/net/if_lagg.c
106	If that's the case then all good :)

rpokala added inline comments.Jan 15 2016, 6:43 PM

sys/net/if_lagg.c
106	Excellent. Can you mark this approved for the record, so I can commit?

Happy with this from a style perspective, however I haven't tested and this area is not one of my strong points. So for commit I'd say I've only reviewed (in part).

My other concern it that the actions performed on failure may well not result in the system being in the same state before the update was attempted, which could cause unexpected issues.

This revision is now accepted and ready to land.Jan 16 2016, 2:18 AM

Please allow me one more day to review the changes.

Thanks Steve for your comments.
I would appreciate if you can explain more on "... actions performed on failure may well not result in the system being in the same state before the update was attempted, which could cause unexpected issues."

The changes introduce MTU changes on lagg, addresses concerns of inconsistent/partial rollback on failures by synchronizing calls to wait for response from async taskq. And additionally, when operations are in progress, newer calls are rejected with busy status. Is there any flow/window you have seen where the state is not restored?

The operation of MAC change on lagg is left as it was earlier. And these are the only 2 operations currently defined on lagg port.

ping @melifaro, @smh

Sorry for the delay. Will try to review today/tomorrow.

In D1986#105877, @melifaro wrote:

Sorry for the delay. Will try to review today/tomorrow.

Ping?

Sorry for taking that long.

First off, I like the idea w/ synchronous completion - it is really a good way to go.
However, I have some comments/suggestions on the implementation.

Changing MTU is not different from changing MAC or changing capabilities - each needs to be run async, on the list of interfaces, commands can fail, you might want rollback and completion signal. This should be done in generic way.

So, I'd suggest the following:

use single queue item per task queue "task"
embed all interface pointers w/ their ioctl handlers (and maybe other flags there) (maybe using some standard e.g. non mtu-specific function) into that item
embed/init mutex/condvar there
embed callback/free routine pointers

So, the process can look like the following:

lag_change_mtu() allocates new structure, stores there all current interfaces, along with lagg_llq_action_mtu() and initialised lagg_signal.
do mtx lock, put-to-task, cv_wait()
then, llq_action is called by generic lagg dequeue code, it handles mtu changes and signals back to cv
initial ioctl wakes up, reclaims memory/do other cleanup and returns the result
(there could be a "destroy" handler which simply signals cv w/ error on enqueued items, when lagg destroy is requested)

Doing it that way allows to have more than 1 item on the taskq and permits extending this approach to other code.

This revision now requires changes to proceed.Jan 26 2016, 7:04 AM

This turned out to be way more complicated than anyone expected, the people who were working on it for Panasas are no longer with the company, the changes have bitrotted due to six months of neglect from me, and we came up with a reasonable workaround in the scripting layer. For all those reasons, I'm abandoning this diff.

cem added a subscriber: cem.Sep 2 2016, 9:26 PM

Teach lagg(4) to change MTU
AbandonedPublic
Actions

Details

Diff Detail

Event Timeline

Updating D1986: Teach lagg(4) to change MTU

Revision Contents
Changeset List

Diff 12272

sys/net/if_lagg.h

sys/net/if_lagg.c

Teach lagg(4) to change MTUAbandonedPublicActions

Details

Diff Detail

Event Timeline

Updating D1986: Teach lagg(4) to change MTU

Revision ContentsChangeset List

Diff 12272

sys/net/if_lagg.h

sys/net/if_lagg.c

Teach lagg(4) to change MTU
AbandonedPublic
Actions

Revision Contents
Changeset List