
Netisr delayed dispatch queue for complex processing cases.
ClosedPublic

Authored by melifaro on Oct 5 2014, 11:56 AM.

Details

Reviewers
melifaro
Group Reviewers
network
Summary

There are cases when we have to send some data while holding
too many locks, which may lead to LORs or recursive lock acquisition.

Typical problem areas:

  • ndp (ICMPv6 needs to be routed over IPv6)
  • nesting interfaces sending some control traffic

The idea is to simplify the locking model for consumers by adding a generic
mbuf queue which
a) deals with interface departures automatically,
b) can save state that may be needed to process an mbuf before sending, and
c) calls a special handler for each mbuf in the queue, so preprocessing before sending becomes relatively easy.

What exactly is proposed:

  • One more netisr queue for handling different types of packets
  • Metadata is stored in an mbuf_tag attached to the packet
  • An ifnet departure handler takes care of packets queued from/to a killed ifnet
  • An API to register/unregister/dispatch a given type of traffic

Current problems that can be solved:

  1. Locking in IPv6 LLE timers (solution embedded)

We're using per-LLE IPv6 timers for various purposes; most of them
require LLE modifications, so the timer function starts with the LLE
write lock held.

Some timer events require us to send neighbor solicitation messages,
which involves a) source address selection (requiring the LLE lock to be
held) and b) calling ip6_output(), which requires the LLE lock not to be
held. This is solved exactly as in the IPv4 ARP handling code: the timer
function drops the write lock before calling nd6_ns_output().

Dropping and reacquiring the lock is error-prone; for example, the following scenario is possible (traced by ae@):

We're calling if_detach(ifp) (thread 1) and nd6_llinfo_timer() (thread 2).
Then the following can happen:

#1 T2 releases LLE lock and runs nd6_ns_output().
#2 T1 proceeds with detaching: in6_ifdetach() -> in6_purgeaddr() -> nd6_rem_ifa_lle() -> in6_lltable_prefix_free()

which removes all LLEs for the given prefix, acquiring each LLE write lock.
"Our" LLE is not destroyed, since it is refcounted by nd6_llinfo_settimer_locked().

#3 T2 proceeds with nd6_ns_output() selecting source address (which involves acquiring LLE read lock)

#4 T1 finishes with detaching interface addresses and sets ifp->if_addr to NULL

#5 T2 calls nd6_ifptomac() which reads interface MAC from ifp->if_addr

#6 User inspects core generated by previous call

Using the new API, we can avoid step #6 by making the following code changes:

  • The LLE timer does not drop/reacquire the LLE lock
  • We require nd6_ns_output() callers to lock the LLE if one is provided
  • nd6_ns_output() uses the "slow" path, instead of sending the mbuf to ip6_output() immediately, if the LLE is not NULL.
  2. Lagg locking:

Changing the lagg primary port requires updating MAC addresses on the other ports and nested devices.
We do this while holding the lagg WLOCK. Since changing the MAC involves sending a gratuitous ARP, we generate an mbuf and
send it via the vlan interface, which transmits it to lagg, which then tries to acquire the read lock.
While this was (partially?) addressed by r272547, I feel we still have to send the gratuitous ARP via the given interface,
because:

  • The current lagg scheme works by a) _detaching_ all ports on reconfig and b) doing this in a taskqueue. This is too complex and bad for production traffic.
  • there can be lots of other cases with nested devices, so we'd better solve them in one place.

We've been running a very similar patch on more than 50 heavily loaded IPv6 firewalls since January, without any issues.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

melifaro retitled this revision from to Netisr delayed dispatch queue for complex processing cases..
melifaro updated this object.
melifaro edited the test plan for this revision. (Show Details)
melifaro added a subscriber: ae.

Query: are there a sufficient number of these types that simply adding more netisr protocols isn't the right solution? That was what was done historically for routing sockets, etc. I'm not opposed to a more fine-grained mechanism, but there are tradeoffs in adding additional memory allocation/freeing for every packet, etc. These mostly don't matter for lower-volume events, of course.

s/Netisr/netisr/ in mbuf.h.

I would prefer it if this were named 'deferred' rather than 'delayed', as that is the term used elsewhere for this design pattern (i.e., 'deferred dispatch').

I agree with Robert's comments on the topic. I'd also like to know how this affects the performance of the system. Finally, if this were to become a general mechanism in the kernel, how would we abstract it so that it worked, for instance, with ARP?

Yes, it is true that the LORs in if_lagg(4) are not solved even by r272547.

I agree that ARP, NDP, and MLD/IGMP require an asynchronous queue to avoid recursive lock acquisition. I basically like this idea, though I am not sure whether we should implement it as a new protocol set for netisr, as others pointed out.

And what is the difference between dispatch and pdispatch in practice? Correct me if I am wrong, but to me they look like almost the same interface. Do you have any specific use cases?

This comment was removed by melifaro.

Sorry for the noise, guys.
I pushed a patch for a different thing here by mistake, so there is still nothing to review.
I'll try to provide a better version of the delayed queues (half-finished) soon.

melifaro added a reviewer: melifaro.

Temporarily closing this change request so as not to annoy anyone. It looks like I can't close it unless it is "Accepted".

This revision is now accepted and ready to land. Apr 19 2015, 1:11 PM