Implement RFC 4861 Section 7.2.6 and RFC 9131, which is also
address one of the IPv6 deployment issues in RFC 9898 Section 3.9.
GRAND should be triggered by a change in link-layer address of interface
or by configuration of a new global ipv6 address AFTER DAD completes.
Details
To test link-layer address change:
ifconfig vtnet0 ether 58:9c:fc:11:22:33
To test new global address configuration:
ifconfig vtnet0 inet6 2001:db8::1
To verify:
tcpdump -vni vtnet0 icmp6 and icmp6[0] == 136
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 70716 Build 67599: arc lint + arc unit
Event Timeline
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 134 | I also prefer not to virtualize it.
There are additional RFC/rules that require delaying as well.
I don't know how to measure lock contention, AFAIK, witness(4) only helps with order violations. Anyway, I will remove queue virtualization for now and will rollback when you think it's needed. About in6_addr, I don't think we can attach the queue to the in6_addr. | |
Sorry, I didn't ask to de-virtualize the queue, only the lock. I actually asked why can't we hang off the queue from the in6_ifaddr instead of making it global? btw, how many queue entries per unique address could be there?
The lock profiling is a big topic, let's discuss this separate from this review. The TLDR answer on your question is that if you start 100 jails in parallel your NDP lock will be the last thing to contend on. In the first place it will be the global all prison lock, and then dozens of other locks that are used to startup a jail.
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 134 |
Oh, missed this comment. Can you please elaborate on what the complexity is? Seems to be straightforward at the first glance. | |
| 134 | Let's confirm we both are talking about in6_ifaddr, not about in6_addr. | |
I misunderstood you message. I'll fix it right away!
I actually asked why can't we hang off the queue from the in6_ifaddr instead of making it global? btw, how many queue entries per unique address could be there?
Per unique address? For now, should be only one. unless, someone change the mac address right after assigning a new IPv6 address to the same interface.
That's why I think it's pointless to hang off the queue from the in6_ifaddr.
I'm confused now, how can I initialize a single lock for each per-vnet queue?
If I use an static initializer I need to tell/assign which per-vnet queue:
VNET_DEFINE_STATIC(TAILQ_HEAD(, nd6_queue), nd6_queue); #define V_nd6_queue VNET(nd6_queue) static struct rwlock ndq_rwlock; RW_SYSINIT(ndq_rwlock, &**V_ndq_rwlock**, "nd6 queue");
which is only protecting the vnet0.
and if I want to initialize it per vnet like before:
void
nd6_queue_init(void)
{
rw_init(&ndq_rwlock, "nd6 queue");
TAILQ_INIT(&V_nd6_queue);
}the second vnet call to initialize it will fail.
I actually asked why...
So, should I virtualize the lock too? (I answer to your question in the comment)
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 134 |
On the GRAND, llifaddr event, and other delaying rules that I'm aware of, | |
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 134 | If I understood you correct, we don't need a queue at all. The entry is to become a part of in6_ifaddr and in case we need overall counter of delayed packets per interface, depending on how often we need that count, we can either iterate the address list or we can store the counter in in6_ifextra. If I still don't understand that correct, let's talk about that online right after tomorrow's transport meeting. | |
Move nd6_queue tailq to in6_ifextra.
initialize nd6_queue at nd6_ifattach.
Reuse IF_ADDR_WLOCK as wlock of tailq.
Use epoch(9) instead of rlock.
@glebius, we use ifa_ref(). do we still need the if_ref() at nd6_grand_add?
Replace callout_init_rw with callout_init_mtx. (Oops!)
Interesting enough, I tested every part my code. everything works.
nobody complains that I used callout_init_rw and passed a mutex!
rename nd6_queue to nd_queue to match the style of in6_ifextra
rebase from main to update parent commit
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 1743–1748 | I thought this way is cleaner. feels weird since nobody before me done it (fgrep -A2 -r in6_setscope sys) if (in6_setscope(&taddr6, ifp, NULL) == 0)
nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);
NET_EPOCH_EXIT(et);
CURVNET_RESTORE();feels long if (in6_setscope(&taddr6, ifp, NULL) != 0) {
NET_EPOCH_EXIT(et);
CURVNET_RESTORE();
return;
}
nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);
NET_EPOCH_EXIT(et);
CURVNET_RESTORE();feels over protective error = in6_setscope(&taddr6, ifp, NULL);
if (error == 0)
nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);
NET_EPOCH_EXIT(et);
CURVNET_RESTORE();which one is better to replace or is there any other way that I didn't think of? | |
| sys/netinet6/nd6.c | ||
|---|---|---|
| 234 | What do we achieve separating this function into a separate not inlined function? It is used only once. I always prefer not to separate a function that is used only once. This doesn't improve readability. There can be exclusions somtimes, of course. | |
| 342 | This is used only once, I'd suggest to just do TAILQ_INIT() right here. Easier to understand for a new reader of code. | |
| sys/netinet6/nd6_nbr.c | ||
| 1843 | Unfortunately callout_drain() can't be called in non-sleepable context :( And we are in epoch here. Unfortunately, WITNESS didn't warn you about that. I'll take a look if WITNESS can be improved here. | |
| 1857 | Shouldn't this be an assertion? How can we happen to have an ndq entry with ifaddr that belongs to a different interface? | |
| sys/netinet6/nd6.c | ||
|---|---|---|
| 234 |
I tried to follow the implementation of same event in IPv4. (arp_handle_ifllchange() in sys/netinet/if_ether.c). | |
| 342 | It was useful when I had a per-VNET lock to initialize. now it's useless. I'll fix that. | |
| sys/netinet6/nd6_nbr.c | ||
| 1843 | Note to myself. This line in the callout(9) manual means it could sleep:
Let me see what can I do about it. | |
| 1857 |
It doesn't need to be an assertion either. You're right. | |
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 1843 | We have same problem on ipv6 dad too.
| |
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 1843 | It is ok for nd6_dad_stoptimer() to call callout_drain(), the problem arises when it happens in a epoch read section. | |
| 1851 | Can we instead acquire the if_addr write lock and use callout_stop()? I don't think we should be using the net_epoch to synchronize DAD-related data structures: DAD operations are rare and do not really benefit from the concurrency enabled by epoch(9). It is much easier to reason about the correctness of code if it just uses regular locks. Really, net_epoch should mainly be used in data paths. | |
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 1843 |
So you mean it's ok to add an indirect function to call callout_drain() in the middle of NET_EPOCH_ENTER? | |
| 1851 | GRAND is somehow different from DAD. but i'm working on removing epoch. | |
- use callout_stop() instead of callout_drain()
- remove epoch(9) in nd6_queue_stop(), sync queue list using IF_ADDR_WLOCK and protect callout at the same time.
- fix memory leak.
The ndq_refcount seems to be not really used. Could it be you wanted to use to to have the callout to refcount the ndq?
| sys/netinet6/nd6_nbr.c | ||
|---|---|---|
| 1743–1748 | I like the very first: if (in6_setscope(&taddr6, ifp, NULL) == 0)
nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);You may also augment with __predict_true(). | |
| 1820 | This callout_stop() may not succeed and in that case nd6_queue_rel() will free the memory. But the callout thread is already blocked in softclock_call_cc(0 at line 1544. Blocked on the lock that nd6_queue_rel() just destroyed. Yep, unfortunately our callout(9) is a minefield. There is an undocumented flag to callout that you may find useful. See efcb2ec8cb81ea44c58e41082c6d965f9493099f. | |