There are a number of races in the DAD code which can cause double frees or use-after-frees of the DAD objects. These objects are kept in a global queue which is protected by a rwlock; however, they aren't reference-counted.
This change attempts to fix some of the races by adding a refcount to struct dadq and simplifying the way the DAD objects are managed.
Currently, the DAD queue can be manipulated in a number of contexts: the DAD callout, nd6_dad_start() and nd6_dad_stop() (called from ioctl handlers), and NS and NA packet handlers. This makes it pretty difficult to fix all of the races; see my comments here: https://lists.freebsd.org/pipermail/freebsd-current/2014-November/053437.html
The proposed change makes the following modifications:
- Add a refcount field to struct dadq. This field is incremented/decremented whenever a dadq is inserted into/removed from the DAD queue. In nd6_dad_start(), the refcount is initialized to 1 for the callout's reference, and nd6_dad_timer() releases the reference if it decides not to reschedule itself. Otherwise nd6_dad_stoptimer() releases it. nd6_dad_find() now returns a referenced struct dadq.
- Fix handling of the dad_ifa reference in each struct dadq. This reference needs to be declared when the struct dadq is initialized and released only when the dadq is about to be freed. The current handling is just wrong.
- Stop calling nd6_dad_duplicated() in packet input paths. Instead, just increment counters in the corresponding struct dadq so that the DAD callout for that address can mark the address as a duplicate when it runs. This means that there will be a small delay before the address is actually marked as duplicate, but during that time it'll be marked as tentative, which means that it shouldn't be used anyway.
- nd6_dad_duplicated() no longer needs to stop the callout or free the struct dadq, since these things are now both done by the DAD callout itself, which is now the only caller of nd6_dad_duplicated().
- Use nd6_dad_init() to initialize the DAD queue and its rwlock. This is a bit superfluous, but it's consistent with the way the rest of the NDP queues and locks are initialized.
There are still some races in this code. In particular, we may try to remove a struct dadq from the queue twice if multiple threads attempt to remove the same struct dadq from the queue simultaneously. But I can't seem to test this without triggering a crash elsewhere in the IPv6 code, so before investigating that I was hoping to have some review of what I've done so far so that I can work on this incrementally.