
Add support for sleepable EPOCH(9) type, based on prior work in the LinuxKPI.
Needs Review · Public

Authored by hselasky on May 21 2021, 9:31 AM.
Tags
None

Details

Reviewers
markj
glebius
mmacy
kib
Group Reviewers
manpages
Summary

This EPOCH type is not targeted at high-turnaround performance; rather, it is
meant to close configuration races in the kernel in a performant way, without
having to rely on sleepable mutexes, which have the disadvantage that all
CPUs share a single cache line for the lock.

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
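
For orientation, here is a minimal sketch of how a read section under such a
sleepable epoch might look. The EPOCH_SLEEPABLE flag and the
epoch_enter_sleepable()/epoch_exit_sleepable() names are assumptions modeled
on the existing EPOCH(9) API (epoch_alloc(), EPOCH_PREEMPT, struct
epoch_tracker); they are not necessarily the interface this diff introduces.

/*
 * Sketch only: a sleepable epoch read section.  EPOCH_SLEEPABLE and
 * the *_sleepable() calls are assumed names; the stock EPOCH(9) API
 * provides epoch_alloc(), EPOCH_PREEMPT, epoch_enter_preempt() and
 * epoch_exit_preempt().
 */
#include <sys/param.h>
#include <sys/epoch.h>

static epoch_t config_epoch;

static void
config_epoch_init(void)
{
	config_epoch = epoch_alloc("config", EPOCH_SLEEPABLE);
}

static void
config_read(void)
{
	struct epoch_tracker et;

	epoch_enter_sleepable(config_epoch, &et);
	/* Read shared configuration state; sleeping is permitted here. */
	epoch_exit_sleepable(config_epoch, &et);
}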

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

hselasky created this revision.

> This EPOCH type is not targeted at high-turnaround performance; rather, it is
> meant to close configuration races in the kernel in a performant way, without
> having to rely on sleepable mutexes, which have the disadvantage that all
> CPUs share a single cache line for the lock.

Why are sleepable rm locks not sufficient?
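
("Sleepable rm locks" here presumably refers to the rmslock variant from
sys/rmlock.h. A minimal sketch of that pattern, assuming the stock rms_*()
API; treat the details as illustrative:)

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmslock cfg_lock;

static void
cfg_lock_init(void)
{
	rms_init(&cfg_lock, "cfg");
}

static void
cfg_read(void)
{
	rms_rlock(&cfg_lock);		/* per-CPU fast path; may sleep */
	/* ... read configuration state ... */
	rms_runlock(&cfg_lock);
}

static void
cfg_write(void)
{
	rms_wlock(&cfg_lock);		/* waits for all readers to drain */
	/* ... modify configuration state ... */
	rms_wunlock(&cfg_lock);
}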

@markj:

Reasons for not using an RM lock:

  1. Blocking can happen on the lock itself.
  2. Multiple CPUs can touch the same RM lock, causing memory contention when trying to update the lock.

This may also be a way to move the "RCU" support in the LinuxKPI into the kernel proper.

> Reasons for not using an RM lock:
>
>   1. Blocking can happen on the lock itself.

Ok, but by definition sleeping can happen anyway. What usages do you have in mind for this?

>   2. Multiple CPUs can touch the same RM lock, causing memory contention when trying to update the lock.

RM read locks touch only per-CPU data in the fast path.

> >   1. Blocking can happen on the lock itself.
>
> Ok, but by definition sleeping can happen anyway. What usages do you have in mind for this?

I want to resolve a race with if_ioctl. Yes, there are many ways to solve this, but EPOCH seems to be the more performant one: wait operations are very rare compared to the read sections.

https://reviews.freebsd.org/D28136

> >   2. Multiple CPUs can touch the same RM lock, causing memory contention when trying to update the lock.
>
> RM read locks touch only per-CPU data in the fast path.

I see, but they still block all callers when there is an exclusive lock?

--HPS
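
To make the race concrete, here is a sketch of the ifioctl() shape under
discussion. ifunit() and the if_ioctl method are the stock ifnet API; the
sleepable epoch calls and the net_epoch_sleepable variable are assumptions,
not code from this diff.

#include <sys/param.h>
#include <sys/epoch.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/* Assumed: a global sleepable epoch protecting the ifnet list. */
extern epoch_t net_epoch_sleepable;

static int
example_ifioctl(u_long cmd, caddr_t data, const char *name)
{
	struct epoch_tracker et;
	struct ifnet *ifp;
	int error;

	epoch_enter_sleepable(net_epoch_sleepable, &et);  /* assumed API */
	ifp = ifunit(name);		/* look up the ifnet by name */
	if (ifp == NULL) {
		epoch_exit_sleepable(net_epoch_sleepable, &et);
		return (ENXIO);
	}
	/*
	 * The handler may sleep; without the read section (or a
	 * reference), ifp could be detached and freed underneath us.
	 */
	error = ifp->if_ioctl(ifp, cmd, data);
	epoch_exit_sleepable(net_epoch_sleepable, &et);
	return (error);
}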

> I want to resolve a race with if_ioctl. Yes, there are many ways to solve this, but EPOCH seems to be the more performant one: wait operations are very rare compared to the read sections.

rm lock read locks should be cheaper than epoch read sections.

> I see, but they still block all callers when there is an exclusive lock?

Right, they enforce mutual exclusion. In general I think it's preferable to use mutual exclusion when possible, since that makes it much easier to reason about the correctness of the code, at the expense of large worst-case latency for readers. But why is this downside a problem for ioctl handlers?

> But why is this downside a problem for ioctl handlers?

The ifioctl() handling always starts by looking up the ifnet by name; there is no problem with correctness checking. Because visibility is controlled by whether the network device is listed or not, this is a perfect match for EPOCH(9). The only problem is that this code is sleepable, which is why I added sleepable EPOCH(9) support. It should not conflict with your patches to simplify preemptible epochs.

> It should not conflict with your patches to simplify preemptible epochs.

Sure, I am not worried about conflicts. I just don't see why we should extend our synchronization primitives when existing ones are sufficient. If there is some reason that we really want to avoid blocking in read-only ioctl handlers, then perhaps it is reasonable, but it's not clear why that is a problem.

> Sure, I am not worried about conflicts. I just don't see why we should extend our synchronization primitives when existing ones are sufficient.

I currently view EPOCH(9) as the best solution: sleeping happens entirely on the writer side, so to speak, so we don't have to worry about sleeping issues in the fast path.

I've also added setsockopt and getsockopt under the same sleepable EPOCH(9); see:

https://reviews.freebsd.org/D28136

I'd like to move forward with a solution for a few panics I see.

What do you think?

--HPS

gbe added a subscriber: gbe.

LGTM for the man page parts.

The misspelling can be fixed on commit if nothing else requires a re-review. Everything else is beyond me.

share/man/man9/epoch.9:136