Page MenuHomeFreeBSD

Fix refcounting leaks in IPv6 MLD code leading to loss of IPv6 connectivity
ClosedPublic

Authored by hselasky on Jan 18 2019, 11:43 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 16 2024, 6:02 AM
Unknown Object (File)
Mar 16 2024, 5:57 AM
Unknown Object (File)
Mar 16 2024, 5:40 AM
Unknown Object (File)
Mar 6 2024, 6:28 AM
Unknown Object (File)
Feb 27 2024, 12:59 AM
Unknown Object (File)
Feb 15 2024, 5:51 AM
Unknown Object (File)
Jan 11 2024, 10:58 PM
Unknown Object (File)
Jan 5 2024, 5:25 AM

Details

Summary

Fix refcounting leaks in IPv6 MLD code leading to loss of IPv6 connectivity.

Looking at past changes in this area like r337866, some refcounting bugs have been introduced, one by one. For example like calling in6m_disconnect() and in6m_rele_locked() in mld_v1_process_group_timer() where previously no disconnect nor refcount decrement was done. Calling in6m_disconnect() when it shouldn't causes IPv6 solitation to no longer work, because all the multicast addresses receiving the solitation messages are now deleted from the network interface. Due to a double increment when joining IPv6 multicast groups many of these issues have remained hidded.

This patch reverts some recent changes while improving the MLD refcounting and concurrency model after the MLD code was converted into using EPOCH(9).

List of important changes:

  1. All CK_STAILQ_FOREACH() macros are now properly enclosed into EPOCH(9) sections.
  2. Corrected bad use of in6m_disconnect() leading to loss of IPv6 connectivity for MLD v1.
  3. Added sysctl for sake of debugging to disable incoming MLD v2 messages similar to existing sysctl for MLD v1 messages.
  4. When joining multicast IPv6 groups bad refcounting was removed. This was observed by starting and stopping rpcbind, that the in6m_refcount was incrementing instead of remaining neutral.
  5. When detaching a network interface we need to drain the workqueue freeing the inm's because it will access the ifnet pointer which is about to be freed.
  6. Factored out checks for valid inm structure into in6m_ifmultiaddr_get_inm().

PR: 233535
MFC after: 1 week
Sponsored by: Mellanox Technologies

Test Plan
while true
do
ifconfig re0 inet6 fc00::1
sleep 3
vmstat -m | grep multi
done

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Fixed a whitespaces vs tabs issue.

Can you add an explanation which of this changes fixes loss of ipv6 connectivity, and why the problem occurred in the first place.

In D18887#403202, @kib wrote:

Can you add an explanation which of this changes fixes loss of ipv6 connectivity, and why the problem occurred in the first place.

Calling in6m_disconnect() when it shouldn't causes IPv6 solitation to no longer work, because all the multicast addresses receiving the solitation messages are now deleted from the network interface.

(a) there's one or two lines of whitespace changes in there; can we get them sorted separately?
(b) how much of this has become "reverting" changes and how much is actual change now? I wonder if, for the sake of clarity and history as well as for easier review, could be split into two parts? Is that feasible?

In D18887#403208, @bz wrote:

(a) there's one or two lines of whitespace changes in there; can we get them sorted separately?

Should be fixed. See latest patch.

(b) how much of this has become "reverting" changes and how much is actual change now? I wonder if, for the sake of clarity and history as well as for easier review, could be split into two parts? Is that feasible?

I can split the patch up into 2-4 logical pieces as the commit message suggests before commit. I hoped I could get an approval for all the changes I've made in one go.

Most of the "cosmetic" changes can be ignored if you want. The only real one is the one I marked with XXX; if the current code is correct I think it would deserve a comment on why that is. I couldn't figure it out quickly.

sys/netinet6/in6_mcast.c
593 ↗(On Diff #52976)

Wait ... complete. Comment start uppercase and end in punctuation.
Also empty line at the beginning of the function if there are no variable declarations.

1260 ↗(On Diff #52976)

This could be done right after the out_in6m_release: label; in that case in an earlier exist we don't have to do the work.

1373 ↗(On Diff #52976)

This once again could be done later but in this case it doesn't matter too much as there's no early exit.

On another note, the initialization of error below is no longer needed either.

sys/netinet6/mld6.c
828 ↗(On Diff #52976)

This could be done before the initialization above which is not needed in case of early return.

1496 ↗(On Diff #52976)

XXX Why is this not in6m_rele_locked() or in other words, why do we have an inm that we want to remove which has no reference for it?

1804 ↗(On Diff #52976)

whitespace; could be done before.

1904 ↗(On Diff #52976)

There's extra spaces between tabs in there.

hselasky added inline comments.
sys/netinet6/mld6.c
1496 ↗(On Diff #52976)

This is a special case where we use the inmh to queue up inm's which are subject to mld_v1_transmit_report() instead of free().

Address comments from @bz . Fix a CTR statement which was accessing an uninitialized variable.

You may want to consider forwarding this to re@ after commit and suggest once merged to stable/12 there should be a EN about this; probably together with markj's D18906.

This revision is now accepted and ready to land.Jan 23 2019, 9:09 PM

Here's a copy of the errata template:

After commit and MFC please fill out as much as you can and email to re@ and so@ to move the errata along.

This revision was automatically updated to reflect the committed changes.