Complete INADDR_HASH lock protection
Needs ReviewPublic

Authored by eugen_grosbein.net on Sep 22 2017, 11:11 AM.

Details

Reviewers
ae
avg
mav
rwatson
Summary

If a system has dynamic ever changing set of IP addresses, it can panic due to unprotected access to global INADDR_HASH in several kernel subsystems: ip_input.c, in_mcast.c, if_stf(4).

There was also similar problem in ip_fw2.c fixed recently in head by ae@.
kgdb showed "(struct in_ifaddr *) 0xdeadc0dedeadc0de" for a panic in INVARIANTS-enabled kernel (PR #220078).

I could provoke such panic in ipfw with my stress-test for net/mpd5 creating/deleting hundreds of ngXXX interfaces having distinct IP addresses.

Proposed change makes use of existing IN_IFADDR_[RW]LOCK macros to protect access to the set as they are already used in the ip_icmp.c, if_ether.c and in.c.

Test Plan

The nature of the race makes it hard to reproduce the bug, but it can happen using high loaded (or stress-tested) net/mpd5 installation.

  1. Apply the patch.
  2. Rebuild a kernel.
  3. Run busy mpd5 server with hundreds of users reconnecting very often.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
mav added inline comments.Sep 22 2017, 12:18 PM
sys/net/if_stf.c
383

Wouldn't it be better to use in_localip() here instead of code duplication? Or I miss something?

sys/netinet/in_mcast.c
1383

Do I miss some other protection means or this and few below places just scream about possible races due to missing interface pointer pulled out of the lock without taking reference?

I suspect INADDR_TO_IFP KPI is not safe now in general.

sys/netinet/ip_input.c
697

Looking on r194951 commit message and later r286001 commit this should be uncommented now, but comments from active networking people are welcome.

rwatson added a subscriber: bz.Sep 22 2017, 12:25 PM

Many of these changes are not sufficient to address the underlying problems in this code, as the lock covers loop iteration but fails to protect the stability of the 'ifp' or 'ia' pointer outside of the loop. It could be that this prevents the crash you are seeing as the issue is iteration over entries being deleted from the list, or that the problem you are seeing is masked by changes in timing (as it involves using an ifp, ia, etc, being removed). It would be useful to think a little harder about our goals here. @bz and I have been pondering for some time a somewhat coarser, rmlock-based synchronisation approach to use in these circumstances, but have not prototyped this yet.

eugen_grosbein.net marked an inline comment as done.Sep 22 2017, 12:27 PM
eugen_grosbein.net added inline comments.
sys/net/if_stf.c
383

in_localip() may be better, I just wished to make the patch as clear as possible minizing changes for first revision.

eugen_grosbein.net marked 2 inline comments as done.Sep 26 2017, 11:00 AM

Many of these changes are not sufficient to address the underlying problems in this code, as the lock covers loop iteration but fails to protect the stability of the 'ifp' or 'ia' pointer outside of the loop.

Yes, it's true but that's distinct and much larger problem. I am not going to solve all such problems at once with my first commit to src. I prefer to fix things one by one when it is possible with small changes.

eugen_grosbein.net marked an inline comment as done.Sep 26 2017, 11:03 AM
eugen_grosbein.net added inline comments.
sys/netinet/in_mcast.c
1383

Yes, it does. I just want to fix that what's easy to fix to start.

eugen_grosbein.net marked 2 inline comments as done.Sep 26 2017, 11:03 AM
eugen_grosbein.net marked an inline comment as done.
ae added inline comments.Sep 27 2017, 5:28 AM
sys/net/if_stf.c
383

Using in_localip() is preferred, because INADDR_HASH is protected with rmlock only in head/ and stable/11. So if you plan to merge this into stable/10, you will need to do direct commit.

eugen_grosbein.net marked an inline comment as done.Sep 27 2017, 12:12 PM
eugen_grosbein.net added inline comments.
sys/net/if_stf.c
383

Got it, thanks.