Page MenuHomeFreeBSD

Lock the NDP default router list and use refcounting for defrtr objects
ClosedPublic

Authored by markj on Feb 18 2016, 1:00 AM.
Tags
None
Referenced Files
Unknown Object (File)
Oct 1 2024, 7:44 PM
Unknown Object (File)
Oct 1 2024, 7:44 PM
Unknown Object (File)
Oct 1 2024, 7:44 PM
Unknown Object (File)
Oct 1 2024, 7:41 PM
Unknown Object (File)
Sep 28 2024, 4:54 AM
Unknown Object (File)
Sep 27 2024, 7:50 AM
Unknown Object (File)
Sep 20 2024, 5:10 AM
Unknown Object (File)
Sep 19 2024, 8:05 PM

Details

Summary

This change adds a rwlock, the ND lock, to synchonize access to the
per-vnet default router list. The intent is to use this lock to protect
the prefix list as well. This change also introduces a number of
subroutines used to manage the lifecycle of defrouter objects.

A rwlock is used because most accesses of the list are read-only.
Additionally, rwlock read locks are always recursive, which will be
important for locking the prefix list. This lock is only acquired in NDP
and interface configuration operations so I believe it will not have any
impact on critical performance paths.

In this change, the ifafdata and lle locks may be acquired with the ND
lock held. It is a bug to acquire a rtentry lock with the ND lock held.

The process of deleting a defrouter object becomes more complicated. We
generally don't want to call defrouter_del() with the ND lock held,
since it flushes routing table entries and acquires a number of locks
doing so. Moreover, nd6_rtrequest() must acquire the ND read lock, so we
cannot call defrouter_del() or defrouter_delreq() with the ND lock held.
As a result, the following pattern is used in a few places:

  • unlink defrouters from the global list with the ND write lock held, and stash them in a local list
  • release the ND lock and invoke defrouter_del() on the unlinked defrouters

A refcount field is also added to struct nd_defrouter. There are code
paths where a defrouter must be held without the ND lock held; prefixes
will also hold a reference to their advertising router.

defrouter_reset() becomes pretty ugly: to avoid calling
defrouter_delreq() with the ND lock held, we make a temporary array
containing pointers to existing defrouter list entries, and call
defrouter_delreq() on entries from the array. This function is only
called in ioctl context though; suggestions for better approaches
are more than welcome.

Test Plan

Larry Rosenman tested an earlier version of this patch; it solved some use-after-frees that were consistently occurring in his environment.

I've done some testing with invariants and witness enabled in my own setup.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj retitled this revision from to Lock the NDP default router list and use refcounting for defrtr objects.
markj edited the test plan for this revision. (Show Details)
markj updated this object.
markj edited the test plan for this revision. (Show Details)
markj added reviewers: bz, hrs, ae, melifaro.

I'm building a new world / kernel /this patch (and a companion rtsold patch).
I'll chime in with any issues

This patch still fixes my crashes with rebooting the firewall. I also picked up the rtsold patch that @markj committed, and that fixes the rtsold crash when I kicked my firewall. I'd recommend committing.

Looks acceptable for me. Can you also look into in6_ifdetach() function? It seems with this patch the second call of nd6_purge() now should be useless.

sys/netinet6/nd6.c
209 ↗(On Diff #13396)

It looks like rw_destroy() is missed in nd6_destroy().

markj edited edge metadata.

Destroy the ND lock in nd6_destroy()

In D5315#113374, @ae wrote:

Looks acceptable for me. Can you also look into in6_ifdetach() function? It seems with this patch the second call of nd6_purge() now should be useless.

I'm not sure about that - it seems that prefixes will linger until their associated addresses are removed. In particular, note the check of pr->ndpr_refcnt in prelist_remove(). This patch doesn't touch prefix locking or refcounting, so I don't see how the situation has changed with respect to nd6_purge(). Am I missing something?

I'll be sure to address this with changes to the prefix locking, at least.

sys/netinet6/nd6.c
209 ↗(On Diff #13396)

Fixed, thanks.

This comment was removed by bz.

Any further comments? I'd like to commit this soon.

We're running this in production and found it eliminated most/all crashes.

This revision was automatically updated to reflect the committed changes.