
ndp: Add support for Gratuitous Neighbor Discovery (GRAND)
Needs Review · Public

Authored by pouria on Sat, Jan 31, 6:42 PM.
Tags
None

Details

Reviewers
glebius
madpilot
zlei
bz
markj
Group Reviewers
network
Summary

Implement RFC 4861 Section 7.2.6 and RFC 9131, which also
addresses one of the IPv6 deployment issues in RFC 9898 Section 3.9.
GRAND should be triggered by a change in the link-layer address of an interface
or by the configuration of a new global IPv6 address, after DAD completes.
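For context, a rough sketch of what the gratuitous advertisement amounts to (an assumption for illustration, not code taken from this diff): one unsolicited NA sent to the all-nodes multicast address, carrying the interface address as the target; the Solicited flag stays clear and Override is set here so existing neighbor cache entries get refreshed:

/*
 * Illustrative sketch only: send one unsolicited NA for the address
 * `ifa` on interface `ifp`, per RFC 4861 Section 7.2.6.
 */
struct in6_addr daddr6 = in6addr_linklocal_allnodes;

if (in6_setscope(&daddr6, ifp, NULL) == 0)
	nd6_na_output_fib(ifp, &daddr6, IFA_IN6(ifa),
	    ND_NA_FLAG_OVERRIDE, 1 /* include target lladdr option */,
	    NULL, ifp->if_fib);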

Test Plan

To test link-layer address change:
ifconfig vtnet0 ether 58:9c:fc:11:22:33
To test new global address configuration:
ifconfig vtnet0 inet6 2001:db8::1
To verify:
tcpdump -vni vtnet0 icmp6 and icmp6[0] == 136

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 70716
Build 67599: arc lint + arc unit

Event Timeline

Rename struct and function. No functional change.

I don't understand why we need the global (per-vnet) list. Can the queue just hang off the in6_ifaddr?

sys/netinet6/nd6_nbr.c
130

Is there any possibility that ndq_vnet != ndq_ifa->ifa_ifp->if_vnet?

134

Do we expect any substantial contention on this lock? I'd suggest not to virtualize it.

sys/netinet6/nd6_nbr.c
134

I also prefer not to virtualize it.
However, I plan to implement all the delayed-NDP-related RFC rules that are currently unaddressed.
For instance:

  • Anycast advertisement delay rule (RFC 4861, 7.2.7)
  • Proxy advertisement delay rule (RFC 4861, 7.2.8)

There are additional RFC rules that require delaying as well.
I'm worried that we might end up facing contention on this lock.
Why?
IPv6 interfaces can and will have multiple addresses.
In production, I have seen many cases with 50+ jails, each having two or more IPv6 addresses.
Consider this scenario:

  1. sysrc jail_parallel_start="YES" (which is common)
  2. boot or service jail restart
  3. The system attempts to assign more than 100 IPv6 addresses simultaneously.

I don't know how to measure lock contention; AFAIK, witness(4) only helps with lock-order violations.
I would appreciate it if you could point me to any facility that records contention.
I could use rw_try_lock and count failures as a measurement, but I think that would affect the behavior.

Anyway, I will remove queue virtualization for now and roll it back when you think it's needed.

Regarding in6_addr: I don't think we can attach the queue to the in6_addr.
At least, that is what I tried first, and I failed to do it without introducing too much complexity.

Don't virtualize nd6_queue.
@glebius Done.

Rename nd6 queue flags from GRAND to QUEUE so nd6_queue can be reused in the future.

Sorry, I didn't ask to de-virtualize the queue, only the lock. I actually asked why we can't hang the queue off the in6_ifaddr instead of making it global. By the way, how many queue entries per unique address could there be?

Lock profiling is a big topic; let's discuss it separately from this review. The TL;DR answer to your question is that if you start 100 jails in parallel, your NDP lock will be the last thing to contend on. First it will be the global allprison lock, and then dozens of other locks that are used to start up a jail.

sys/netinet6/nd6_nbr.c
134

Regarding in6_addr: I don't think we can attach the queue to the in6_addr.
At least, that is what I tried first, and I failed to do it without introducing too much complexity.

Oh, missed this comment. Can you please elaborate on what the complexity is? It seems straightforward at first glance.

134

Let's confirm we are both talking about in6_ifaddr, not about in6_addr.

Sorry, I didn't ask to de-virtualize the queue, only the lock.

I misunderstood your message. I'll fix it right away!

I actually asked why we can't hang the queue off the in6_ifaddr instead of making it global. By the way, how many queue entries per unique address could there be?

Per unique address? For now, there should be only one, unless someone changes the MAC address right after assigning a new IPv6 address to the same interface.
That's why I think it's pointless to hang the queue off the in6_ifaddr.

I didn't ask to de-virtualize the queue, only the lock.

I'm confused now: how can I initialize a single lock for each per-vnet queue?
If I use a static initializer, I need to tell/assign which per-vnet queue it protects:

VNET_DEFINE_STATIC(TAILQ_HEAD(, nd6_queue), nd6_queue);
#define V_nd6_queue             VNET(nd6_queue)

static struct rwlock ndq_rwlock;
RW_SYSINIT(ndq_rwlock, &V_ndq_rwlock, "nd6 queue");

which only protects vnet0.

and if I want to initialize it per vnet like before:

void
nd6_queue_init(void)
{

       rw_init(&ndq_rwlock, "nd6 queue");
       TAILQ_INIT(&V_nd6_queue);
}

the second vnet's call to initialize it will fail.

I actually asked why...

So, should I virtualize the lock too? (I answered your question in the inline comment.)
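Or did you mean something like the following? (A sketch of my reading, which may be wrong: the lock stays a single global rwlock initialized once by RW_SYSINIT(), and only the queue head is per-vnet, getting its TAILQ_INIT() from a VNET_SYSINIT() handler.)

VNET_DEFINE_STATIC(TAILQ_HEAD(, nd6_queue), nd6_queue);
#define V_nd6_queue             VNET(nd6_queue)

/* One global lock guarding every vnet's queue. */
static struct rwlock ndq_rwlock;
RW_SYSINIT(ndq_rwlock, &ndq_rwlock, "nd6 queue");

/* Per-vnet initialization of the queue head only. */
static void
nd6_queue_vnet_init(const void *unused __unused)
{

       TAILQ_INIT(&V_nd6_queue);
}
VNET_SYSINIT(nd6_queue_vnet_init, SI_SUB_PROTO_DOMAIN, SI_ORDER_ANY,
    nd6_queue_vnet_init, NULL);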

sys/netinet6/nd6_nbr.c
134

Can you please elaborate on what the complexity is? Seems to be straightforward at the first glance.

For GRAND, the link-layer address change event, and the other delaying rules I'm aware of,
we want to enumerate per interface to find out how many delayed ND packets exist,
which is usually one per unique address, since the delaying rules apply to the network, not the address.
That would require the queue to live on the ifnet itself, which is not what we want.
Since we don't want to add this queue to ifnet for obvious reasons, the next candidate is in6_ifaddr,
which is associated with each unique address, and that's not what we want either,
because we don't have many entries per address;
we have many entries per interface.
IMHO, enumerating a separate queue for every address on an interface is longer and more cumbersome,
especially considering that we have to filter these addresses to determine their anycast/proxy flags or DAD status.

sys/netinet6/nd6_nbr.c
134

If I understood you correctly, we don't need a queue at all. The entry becomes part of in6_ifaddr, and in case we need an overall counter of delayed packets per interface, depending on how often we need that count, we can either iterate the address list or store the counter in in6_ifextra.
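For illustration, the "iterate the address list" variant might look roughly like this; the ia6_ndq_pending flag is hypothetical and stands in for whatever per-address state the in6_ifaddr entry would carry:

/*
 * Sketch: count delayed ND packets on ifp by walking its address list.
 * Caller holds the IF_ADDR lock or is inside a net epoch section.
 */
struct ifaddr *ifa;
int delayed = 0;

CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
       if (ifa->ifa_addr->sa_family != AF_INET6)
               continue;
       if (((struct in6_ifaddr *)ifa)->ia6_ndq_pending)  /* hypothetical */
               delayed++;
}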

If I still don't understand it correctly, let's talk about it online right after tomorrow's transport meeting.

Move the nd6_queue tailq to in6_ifextra.
Initialize nd6_queue at nd6_ifattach.
Reuse IF_ADDR_WLOCK as the write lock of the tailq.
Use epoch(9) instead of the read lock.
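Roughly, the layout described above looks like this (member and type names are illustrative, not necessarily the exact ones in the diff):

/* Illustrative sketch of the new layout, not the diff itself. */
struct nd_queue_entry {
       TAILQ_ENTRY(nd_queue_entry) ndq_next;
       struct ifaddr           *ndq_ifa;      /* held via ifa_ref() */
       struct callout           ndq_callout;  /* fires the delayed NA */
};

struct in6_ifextra {
       /* ... existing members ... */
       TAILQ_HEAD(, nd_queue_entry) nd_queue; /* delayed ND packets;      */
                                              /* TAILQ_INIT() happens in  */
                                              /* nd6_ifattach()           */
};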

@glebius, we use ifa_ref(). Do we still need the if_ref() in nd6_grand_add()?

Replace callout_init_rw with callout_init_mtx. (Oops!)
Interestingly enough, I tested every part of my code and everything works.
Nothing complained that I used callout_init_rw and passed a mutex!
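For the record, a minimal sketch of the matching pairings. As far as I can tell, both callout_init_mtx() and callout_init_rw() are thin wrappers that record the lock's embedded lock_object, which is probably why the mismatch went unnoticed; this is an observation, not part of the patch.

/* Sketch: the initializer must match the type of lock protecting the callout. */
struct mtx      ndq_mtx;
struct rwlock   ndq_rwlock;
struct callout  ndq_callout;

mtx_init(&ndq_mtx, "nd6 queue", NULL, MTX_DEF);
callout_init_mtx(&ndq_callout, &ndq_mtx, 0);   /* mutex-backed callout */

/* Alternatively, for an rwlock-backed callout: */
rw_init(&ndq_rwlock, "nd6 queue");
callout_init_rw(&ndq_callout, &ndq_rwlock, 0);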

Rename nd6_queue to nd_queue to match the style of in6_ifextra.
Rebase from main to update the parent commit.

Do you have time to review this change?
@ae @zlei

Update by rebasing to main; resolve a conflict with 9df6a7f9a60b in ip6_var.h.

sys/netinet6/nd6_nbr.c
1704

This assert is too late. If ifa were NULL, we would already panic at struct ifnet *ifp = ifa->ifa_ifp;. I don't think asserting here has value, since incorrect use will panic anyway.

1743–1748

You can easily avoid the goto here.

  1. Fix interface removal panic when there are entries in the queue (reported by @glebius).
  2. Fix late call to KASSERT. @glebius, done.
pouria added inline comments.
sys/netinet6/nd6_nbr.c
1743–1748

I thought this way was cleaner.
These are the alternatives I can think of.

This one feels weird, since nobody has done it this way before (fgrep -A2 -r in6_setscope sys):

if (in6_setscope(&taddr6, ifp, NULL) == 0)
    nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);

NET_EPOCH_EXIT(et);
CURVNET_RESTORE();

This one feels long:

if (in6_setscope(&taddr6, ifp, NULL) != 0) {
    NET_EPOCH_EXIT(et);
    CURVNET_RESTORE();
    return;
}
    
nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);

NET_EPOCH_EXIT(et);
CURVNET_RESTORE();

This one feels overprotective:

error = in6_setscope(&taddr6, ifp, NULL);

if (error == 0)
    nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);

NET_EPOCH_EXIT(et);
CURVNET_RESTORE();

Which one is better as a replacement, or is there another way I didn't think of?

sys/netinet6/nd6.c
234

What do we achieve by separating this into a separate, non-inlined function? It is used only once. I always prefer not to split out a function that is used only once; it doesn't improve readability. There can be exceptions sometimes, of course.

342

This is used only once; I'd suggest just doing TAILQ_INIT() right here. It's easier to understand for a new reader of the code.

sys/netinet6/nd6_nbr.c
1843

Unfortunately, callout_drain() can't be called in a non-sleepable context :( and we are in an epoch section here. Unfortunately, WITNESS didn't warn you about that. I'll take a look at whether WITNESS can be improved here.

1857

Shouldn't this be an assertion? How can we end up with an ndq entry whose ifaddr belongs to a different interface?

sys/netinet6/nd6.c
234

What do we achieve by separating this into a separate, non-inlined function? It is used only once. I always prefer not to split out a function that is used only once; it doesn't improve readability. There can be exceptions sometimes, of course.

I tried to follow the implementation of the same event in IPv4 (arp_handle_ifllchange() in sys/netinet/if_ether.c).
However, I agree; the same event in the ARP implementation has too many nested functions too (arp_announce_ifaddr, arp_request, ...).
I'll merge it into a single function.

342

It was useful when I had a per-VNET lock to initialize; now it's useless. I'll fix that.

sys/netinet6/nd6_nbr.c
1843

Note to myself. This line in the callout(9) manual means it could sleep:

This function MUST NOT be called while holding any locks on which the
callout might block

Let me see what I can do about it.

1857

Shouldn't this be an assertion? How can we end up with an ndq entry whose ifaddr belongs to a different interface?

It doesn't need to be an assertion either; you're right.
I was a little excited that I managed to make that panic go away and didn't double-check. Sorry :(

Address @glebius's comments,
except for callout_drain().

I have to fix these issues:

  • Invalid use of callout_drain()
  • Memory leak
sys/netinet6/nd6_nbr.c
1843

We have the same problem with IPv6 DAD too:

  1. nd6_ifdetach (enters epoch)
  2. nd6_dad_stop()
  3. nd6_dad_stoptimer()
  4. callout_drain()
sys/netinet6/nd6_nbr.c
1843

It is OK for nd6_dad_stoptimer() to call callout_drain(); the problem arises when it happens in an epoch read section.

1851

Can we instead acquire the if_addr write lock and use callout_stop()? I don't think we should be using the net_epoch to synchronize DAD-related data structures: DAD operations are rare and do not really benefit from the concurrency enabled by epoch(9). It is much easier to reason about the correctness of code if it just uses regular locks. Really, net_epoch should mainly be used in data paths.

sys/netinet6/nd6_nbr.c
1843

It is OK for nd6_dad_stoptimer() to call callout_drain(); the problem arises when it happens in an epoch read section.

So you mean it's OK to call callout_drain() through an intermediate function in the middle of a NET_EPOCH_ENTER section?
Because, as far as I can understand, this is exactly what happens for dad_stop.

1851

GRAND is somewhat different from DAD, but I'm working on removing the epoch usage.

  • Use callout_stop() instead of callout_drain().
  • Remove epoch(9) in nd6_queue_stop(); sync the queue list using IF_ADDR_WLOCK and protect the callout at the same time.
  • Fix memory leak.

@glebius @markj
This should be ready to review again.

The ndq_refcount seems to be not really used. Could it be that you wanted to use it to have the callout refcount the ndq?
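If that was the intent, a sketch of the usual pattern (an assumption about the design, not the diff itself): the queue and the armed callout each hold one reference, and nd6_queue_rel() frees the entry only when the last reference is released.

static void
nd6_queue_rel(struct nd_queue *ndq)
{

       if (refcount_release(&ndq->ndq_refcount))
               free(ndq, M_IP6NDP);    /* existing nd6 malloc type */
}

/* When arming: refcount_init(&ndq->ndq_refcount, 2); one reference for */
/* the queue, one for the callout. The teardown path and the callout    */
/* handler each drop theirs via nd6_queue_rel().                        */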

sys/netinet6/nd6_nbr.c
1743–1748

I like the very first:

if (in6_setscope(&taddr6, ifp, NULL) == 0)
    nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);

You may also augment it with __predict_true().
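Applied literally, that would be the first variant with the branch hint added:

if (__predict_true(in6_setscope(&taddr6, ifp, NULL) == 0))
    nd6_na_output_fib(ifp, &taddr6, IFA_IN6(ifa), flags, tlladdr, NULL, ifp->if_fib);

NET_EPOCH_EXIT(et);
CURVNET_RESTORE();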

1820

This callout_stop() may not succeed, and in that case nd6_queue_rel() will free the memory. But the callout thread is already blocked in softclock_call_cc() at line 1544, blocked on the lock that nd6_queue_rel() just destroyed. Yep, unfortunately our callout(9) is a minefield.

There is an undocumented flag to callout that you may find useful. See efcb2ec8cb81ea44c58e41082c6d965f9493099f.