We've observed the ena(4) interface being reset periodically on our r4.xlarge EC2 instances. It turns out that our DHCP server supplies a non-default MTU (9001), which causes dhclient(8) to issue ioctl(SIOCSIFMTU) on every renewal. This is aggravated by the fact that most, if not all, network drivers don't check whether the requested MTU differs from the current one, and most of them are quite complex beasts these days, doing all kinds of weird and wonderful things when the MTU is changed, such as shutting down and restarting kernel threads.
This patch fixes the problem by issuing an MTU update only on initial configuration, or when the new value received from the DHCP server differs from the previous one.
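The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `struct mtu_state` and `mtu_update_needed()` are hypothetical names, and the real dhclient(8) code tracks its state differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interface state; not dhclient's real structures. */
struct mtu_state {
	bool     configured;	/* true once an MTU has been applied */
	uint16_t last_mtu;	/* MTU value applied most recently */
};

/*
 * Return true when the caller should issue SIOCSIFMTU for new_mtu,
 * recording the value as the one now in effect. Return false when
 * the request would repeat the value that was already applied.
 */
static bool
mtu_update_needed(struct mtu_state *st, uint16_t new_mtu)
{
	if (st->configured && st->last_mtu == new_mtu)
		return (false);
	st->configured = true;
	st->last_mtu = new_mtu;
	return (true);
}
```

With this in place, the first lease carrying MTU 9001 triggers one ioctl, and subsequent renewals with the same value are skipped, so the driver is not asked to reinitialize.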
Dec 11 05:51:13 builder kernel: ena0: device is going DOWN
Dec 11 05:51:14 builder kernel: ena0: device is going UP
Dec 11 05:51:14 builder kernel: ena0: queue 0 - cpu 0
Dec 11 05:51:14 builder kernel: ena0: queue 1 - cpu 1
Dec 11 05:51:14 builder kernel: ena0: queue 2 - cpu 2
Dec 11 05:51:14 builder kernel: ena0: queue 3 - cpu 3
Dec 11 06:21:13 builder kernel: ena0: device is going DOWN
Dec 11 06:21:14 builder kernel: ena0: device is going UP
Dec 11 06:21:14 builder kernel: ena0: queue 0 - cpu 0
Dec 11 06:21:14 builder kernel: ena0: queue 1 - cpu 1
Dec 11 06:21:14 builder kernel: ena0: queue 2 - cpu 2
Dec 11 06:21:14 builder kernel: ena0: queue 3 - cpu 3
Dec 11 06:51:13 builder kernel: ena0: device is going DOWN
Dec 11 06:51:14 builder kernel: ena0: device is going UP
Dec 11 06:51:14 builder kernel: ena0: queue 0 - cpu 0
Dec 11 06:51:14 builder kernel: ena0: queue 1 - cpu 1
Dec 11 06:51:14 builder kernel: ena0: queue 2 - cpu 2
Dec 11 06:51:14 builder kernel: ena0: queue 3 - cpu 3
Dec 11 07:21:14 builder kernel: ena0: device is going DOWN
Dec 11 07:21:14 builder kernel: ena0: device is going UP
Dec 11 07:21:14 builder kernel: ena0: queue 0 - cpu 0
Dec 11 07:21:14 builder kernel: ena0: queue 1 - cpu 1
Dec 11 07:21:14 builder kernel: ena0: queue 2 - cpu 2
Dec 11 07:21:14 builder kernel: ena0: queue 3 - cpu 3
Dec 11 07:51:13 builder kernel: ena0: device is going DOWN
Dec 11 07:51:14 builder kernel: ena0: device is going UP
Dec 11 07:51:14 builder kernel: ena0: queue 0 - cpu 0
Dec 11 07:51:14 builder kernel: ena0: queue 1 - cpu 1
Dec 11 07:51:14 builder kernel: ena0: queue 2 - cpu 2
Dec 11 07:51:14 builder kernel: ena0: queue 3 - cpu 3
With some extra debug added:
Dec 13 09:26:16 builder kernel: Trying to mount root from zfs:vol-000bbd3ff1dd89617 []...
Dec 13 09:26:16 builder kernel: random: unblocking device.
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCSIFFLAGS
Dec 13 09:26:16 builder kernel: ena0: device is going UP
Dec 13 09:26:16 builder kernel: ena0: queue 0 - cpu 0
Dec 13 09:26:16 builder kernel: ena0: queue 1 - cpu 1
Dec 13 09:26:16 builder kernel: ena0: queue 2 - cpu 2
Dec 13 09:26:16 builder kernel: ena0: queue 3 - cpu 3
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCGIFMEDIA
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCSIFFLAGS
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCADDMULTI
Dec 13 09:26:16 builder kernel: ena_ioctl(SIOCSIFMTU): Changing MTU from 1500 to 9001
Dec 13 09:26:16 builder kernel: ena0: device is going DOWN
Dec 13 09:26:16 builder kernel: ena0: device is going UP
Dec 13 09:26:16 builder kernel: ena0: queue 0 - cpu 0
Dec 13 09:26:16 builder kernel: ena0: queue 1 - cpu 1
Dec 13 09:26:16 builder kernel: ena0: queue 2 - cpu 2
Dec 13 09:26:16 builder kernel: ena0: queue 3 - cpu 3
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCDELMULTI
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCADDMULTI
Dec 13 09:26:16 builder kernel: ena_ioctl: SIOCGIFMEDIA
Dec 13 09:26:26 builder last message repeated 38 times
Dec 13 09:26:26 builder kernel: ena1: device is going UP
Dec 13 09:26:26 builder kernel: ena1: queue 0 - cpu 0
Dec 13 09:26:26 builder kernel: ena1: queue 1 - cpu 1
Dec 13 09:26:26 builder kernel: ena1: queue 2 - cpu 2
Dec 13 09:26:26 builder kernel: ena1: queue 3 - cpu 3
Dec 13 09:26:26 builder kernel: ena_ioctl: SIOCADDMULTI
Dec 13 09:26:26 builder kernel: ena_ioctl: SIOCGIFMEDIA
Dec 13 09:26:26 builder last message repeated 2 times
Dec 13 09:28:48 builder kernel: ena_ioctl: SIOCGIFMEDIA
Dec 13 09:28:48 builder last message repeated 5 times
Dec 13 09:56:13 builder kernel: ena_ioctl(SIOCSIFMTU): Changing MTU from 9001 to 9001
Dec 13 09:56:13 builder kernel: ena0: device is going DOWN
Dec 13 09:56:13 builder kernel: ena0: device is going UP
Dec 13 09:56:13 builder kernel: ena0: queue 0 - cpu 0
Dec 13 09:56:13 builder kernel: ena0: queue 1 - cpu 1