Ignore EPOLLEXCLUSIVE
Needs ReviewPublic

Authored by trasz on Jan 14 2020, 12:14 PM.

Details

Reviewers
dchagin
emaste
kib
Group Reviewers
Linux Emulation
Summary

Make linux(4) explicitly ignore EPOLLEXCLUSIVE. This is another
fix - or a workaround - for Nginx.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 28661
Build 26687: arc lint + arc unit

Event Timeline

sys/compat/linux/linux_event.c
334–336

I think we should have a message for this case, and a comment with a brief description for the option.

sys/compat/linux/linux_event.c
334–336

e.g. one of linux_msg, LINUX_SDT_PROBE, LINUX_CTR as appropriate

What are the semantics of the flag?

sys/compat/linux/linux_event.h
44

Use braces around the define, here and below.

Quoting the man page (http://man7.org/linux/man-pages/man2/epoll_ctl.2.html):

EPOLLEXCLUSIVE (since Linux 4.5)
       Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are
       attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2). The default in this scenario (when
       EPOLLEXCLUSIVE is not set) is for all epoll file descriptors to receive an event. EPOLLEXCLUSIVE is thus useful for avoiding thundering herd problems in certain scenarios.
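
For reference, a minimal userland sketch of how a Linux program might use the flag described in the man page text above. The helper names (attach_exclusive, count_ready_instances) are hypothetical and exist only for illustration. EPOLLEXCLUSIVE is defined by hand in case the libc headers predate it; the fallback value is the one from the Linux UAPI headers. The man page only promises "one or more" wakeups, so the sketch counts how many instances report the event rather than assuming exactly one.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Older libc headers may lack the flag; the Linux UAPI value is 1 << 28. */
#ifndef EPOLLEXCLUSIVE
#define	EPOLLEXCLUSIVE	(1u << 28)
#endif

/*
 * Hypothetical helper: attach fd to a fresh epoll instance in exclusive
 * wakeup mode.  Returns the epoll descriptor, or -1 on error.
 */
static int
attach_exclusive(int fd)
{
	struct epoll_event ev = { 0 };
	int ep;

	ep = epoll_create1(0);
	if (ep == -1)
		return (-1);
	ev.events = EPOLLIN | EPOLLEXCLUSIVE;
	ev.data.fd = fd;
	/* The flag is only accepted together with EPOLL_CTL_ADD. */
	if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == -1) {
		close(ep);
		return (-1);
	}
	return (ep);
}

/*
 * Attach two epoll instances to the read end of one pipe, make the pipe
 * readable, and count how many instances report an event.
 * Returns -1 if the setup fails.
 */
static int
count_ready_instances(void)
{
	struct epoll_event out;
	int p[2], ep[2], i, ready = -1;

	if (pipe(p) == -1)
		return (-1);
	ep[0] = attach_exclusive(p[0]);
	ep[1] = attach_exclusive(p[0]);
	if (ep[0] != -1 && ep[1] != -1 && write(p[1], "x", 1) == 1) {
		ready = 0;
		for (i = 0; i < 2; i++)
			if (epoll_wait(ep[i], &out, 1, 0) == 1)
				ready++;
	}
	for (i = 0; i < 2; i++) {
		if (ep[i] != -1)
			close(ep[i]);
		close(p[i]);
	}
	return (ready);
}
```

A program like this is the kind of caller that would hit the new code path in linux_event.c; with the flag ignored, every attached instance is expected to report the event.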

I think a scary kernel message should be printed when the flag is ignored, simply because ignoring it might affect correctness.

I tried that; it's way too verbose. Maybe just a sysctl to disable it?

Maybe just emit the warning once?
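
A rough userland model of the warn-once idea, not the code from the diff. It assumes the flag is named LINUX_EPOLLEXCLUSIVE and has the Linux UAPI value 1 << 28; strip_epollexclusive, exclusive_warned and warnings_emitted are hypothetical names. In the kernel the message would presumably go through one of the facilities mentioned earlier (for example linux_msg) instead of fprintf.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed name; the value matches the Linux UAPI definition. */
#define	LINUX_EPOLLEXCLUSIVE	(1u << 28)

static atomic_flag exclusive_warned = ATOMIC_FLAG_INIT;
static int warnings_emitted;	/* demonstration only: counts messages */

/*
 * Drop the exclusive bit from a Linux epoll event mask and complain the
 * first time it is seen.  All other bits pass through unchanged.
 */
static uint32_t
strip_epollexclusive(uint32_t levents)
{

	if ((levents & LINUX_EPOLLEXCLUSIVE) == 0)
		return (levents);
	/* test_and_set returns the previous state: false only the first time. */
	if (!atomic_flag_test_and_set(&exclusive_warned)) {
		warnings_emitted++;
		fprintf(stderr, "EPOLLEXCLUSIVE is not supported, ignoring\n");
	}
	return (levents & ~LINUX_EPOLLEXCLUSIVE);
}
```

This keeps the log quiet after the first occurrence, at the cost of not telling the administrator which later processes rely on the flag.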

Or implement a native analog for kqueue/kevent? If it makes sense; I'm not sure.

> Or implement a native analog for kqueue/kevent? If it makes sense; I'm not sure.

What I see in the description can be implemented easily for kqueue: basically, kern_event.c:knote() should stop after successfully activating a single knote. I am not sure how closely this would match the actual Linux semantics.
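
To make the suggestion above concrete, here is a toy userland model of the list walk, not kernel code. struct model_knote, its fields, and both functions are hypothetical stand-ins; the real struct knote and knote() involve locking, filters and influx handling that are omitted here. The only point is the early exit after the first successfully activated exclusive note.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a knote; the real structure is far richer. */
struct model_knote {
	struct model_knote *next;
	bool exclusive;		/* registered with an exclusive-wakeup flag */
	bool wants_event;	/* what the filter would report */
	bool active;
};

/*
 * Walk the list and activate every note whose filter fires, but stop
 * after the first activated note that asked for exclusive wakeup.
 * Returns the number of notes activated.
 */
static int
model_knote_walk(struct model_knote *list)
{
	struct model_knote *kn;
	int activated = 0;

	for (kn = list; kn != NULL; kn = kn->next) {
		if (!kn->wants_event)
			continue;
		kn->active = true;
		activated++;
		if (kn->exclusive)
			break;
	}
	return (activated);
}

/* Build three interested notes and report how many get activated. */
static int
model_demo(bool exclusive)
{
	struct model_knote kn[3];
	int i;

	for (i = 0; i < 3; i++) {
		kn[i].next = (i < 2) ? &kn[i + 1] : NULL;
		kn[i].exclusive = exclusive;
		kn[i].wants_event = true;
		kn[i].active = false;
	}
	return (model_knote_walk(&kn[0]));
}
```

In this model, model_demo(false) activates all three notes and model_demo(true) activates only the first. Whether stopping at the first note matches what Linux does when exclusive and non-exclusive waiters are mixed is exactly the open question.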

What is needed to help this review along?

> What is needed to help this review along?

A test or a usage example.

In D23172#516201, @kib wrote:

>> Or implement a native analog for kqueue/kevent? If it makes sense; I'm not sure.
>
> What I see in the description can be implemented easily for kqueue: basically, kern_event.c:knote() should stop after successfully activating a single knote. I am not sure how closely this would match the actual Linux semantics.

Hi, some analysis below. Please correct me if I'm wrong.
EPOLLEXCLUSIVE is intended for the classic scheme of many worker threads sharing one event queue. If EPOLLEXCLUSIVE is set on the epoll instance, only one thread should be woken up on a descriptor event, to avoid a wakeup storm among the threads.
The classic example is listen/epoll_wait/accept: if EPOLLEXCLUSIVE is not set, all threads return from epoll_wait when a client connects, but only one gets a descriptor from accept; the others get EAGAIN.

Our kqueue/kevent has the same behavior, since any successful call of knote() (one that activates a knote) ends up waking all threads: kqueue_scan() sleeps on kq, and knote_enqueue() calls wakeup(kq).

I have created the experimental D35155. It is not a review request; it is mostly for demonstration and discussion.

For an EV_EXCLUSIVE knote I call wakeup_one() to see how it should work, but this is not a solution; it seems to me that kqueue_scan() has a bug.