Make linux(4) explicitly ignore EPOLLEXCLUSIVE. This is another
fix - or a workaround - for Nginx.
Details
- Reviewers: dchagin, emaste, kib
- Group Reviewers: Linux Emulation
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Passed
- Unit: No Test Coverage
- Build Status: Buildable 28731, Build 26746: arc lint + arc unit
Event Timeline
sys/compat/linux/linux_event.c | ||
---|---|---|
334–336 | I think we should have a message for this case, and a comment with a brief description for the option. |
sys/compat/linux/linux_event.c | ||
---|---|---|
334–336 | e.g. one of linux_msg, LINUX_SDT_PROBE, LINUX_CTR as appropriate |
What are the semantics of the flag?
sys/compat/linux/linux_event.h | ||
---|---|---|
44 | Use braces around the define. Here and below. |
Quoting the man page (http://man7.org/linux/man-pages/man2/epoll_ctl.2.html):
EPOLLEXCLUSIVE (since Linux 4.5) Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not set) is for all epoll file descriptors to receive an event. EPOLLEXCLUSIVE is thus useful for avoiding thundering herd problems in certain scenarios.
I think a scary kernel message should be printed when the flag is ignored, simply because ignoring it might affect correctness.
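To make the "ignore, but warn" idea concrete, here is a minimal userspace sketch in C. It is not the code from the diff: the SKETCH_ names and the function are hypothetical, and stderr stands in for whichever kernel facility (linux_msg or similar) the real change would use. Only the numeric flag values are taken from the Linux UAPI headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Linux UAPI values; the SKETCH_ names themselves are hypothetical. */
#define SKETCH_EPOLLIN		(0x001u)
#define SKETCH_EPOLLEXCLUSIVE	(1u << 28)

/*
 * Clear EPOLLEXCLUSIVE from a requested event mask.  The first time the
 * flag is seen, print a warning; stderr is only a stand-in for a kernel
 * message facility (an assumption of this sketch).
 */
uint32_t
sketch_strip_exclusive(uint32_t events, int *warned)
{
	if ((events & SKETCH_EPOLLEXCLUSIVE) != 0) {
		if (!*warned) {
			fprintf(stderr,
			    "epoll_ctl: EPOLLEXCLUSIVE is not supported, "
			    "ignoring\n");
			*warned = 1;
		}
		events &= ~SKETCH_EPOLLEXCLUSIVE;
	}
	return (events);
}
```

Warning only once keeps a busy server from flooding the log, while the parenthesized defines follow the style requested in the inline comment on linux_event.h.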
What I see in the description can be implemented easily for kqueue: basically, kern_event.c:knote() should stop after successfully activating a single knote. I am not sure how closely this would match the actual Linux semantics.
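The "stop after the first activation" idea can be sketched with a toy model. The struct and function below are hypothetical and live entirely in userspace; they are not the kernel's struct knote or knote(), and they ignore locking and filter details. Note also that the man page only promises that "one or more" epoll descriptors receive the event, so stopping at exactly one is just one possible reading.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a knote; not the kernel's struct knote. */
struct toy_note {
	bool wants_event;	/* the filter would report this event */
	bool active;		/* set once the note has been activated */
};

/*
 * Walk the notes attached to one object and activate those whose filter
 * matches.  When exclusive is set, stop after the first successful
 * activation.  Returns the number of notes activated.
 */
size_t
toy_knote(struct toy_note *notes, size_t n, bool exclusive)
{
	size_t activated = 0;

	for (size_t i = 0; i < n; i++) {
		if (!notes[i].wants_event)
			continue;
		notes[i].active = true;
		activated++;
		if (exclusive)
			break;
	}
	return (activated);
}
```

With three notes of which the last two match, the exclusive walk activates only the second note, while the default walk activates both matching notes.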
Hi, some analysis below. Please correct me if I'm wrong.
EPOLLEXCLUSIVE is intended for the classic scheme with many worker threads and one event queue. If EPOLLEXCLUSIVE is set on an epoll instance, only one thread should be woken up on a descriptor event, to avoid a storm of thread wakeups.
The classic example is listen/epoll_wait/accept: if EPOLLEXCLUSIVE is not set, all threads return from epoll_wait when a client connects, but only one gets a descriptor from accept; the others get EAGAIN.
Our kqueue/kevent has the same behavior: any successful call of knote() (one that activates a knote) ends up waking all threads, because kqueue_scan() sleeps on kq and knote_enqueue() calls wakeup(kq).
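The cost of waking every sleeper can be worked through with a small arithmetic model. This is a hypothetical sketch, not kernel code: it assumes a non-blocking listening socket, that every woken thread calls accept exactly once, and that "wake one" wakes exactly one sleeper.

```c
/* Outcome of delivering one readiness event to sleeping workers. */
struct herd_result {
	int woken;	/* threads that left the sleep */
	int accepted;	/* threads whose accept() would succeed */
	int eagain;	/* threads whose accept() would return EAGAIN */
};

/*
 * sleepers: threads blocked waiting on the same queue
 * pending:  connections waiting in the listen queue
 * wake_one: nonzero models wakeup_one(), zero models wakeup()
 */
struct herd_result
toy_deliver(int sleepers, int pending, int wake_one)
{
	struct herd_result r;

	r.woken = (sleepers == 0) ? 0 : (wake_one ? 1 : sleepers);
	r.accepted = (r.woken < pending) ? r.woken : pending;
	r.eagain = r.woken - r.accepted;
	return (r);
}
```

For 8 sleeping workers and 1 pending connection, the wake-all case gives 8 wakeups, 1 successful accept and 7 EAGAIN results; the wake-one case gives 1 wakeup and no wasted accept calls.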
I have created an experimental review, D35155. It's not a review request; it is mostly for demonstration and discussion.
For an EV_EXCLUSIVE knote I call wakeup_one() to see how it should work, but this is not a solution. It seems to me that kqueue_scan() has a bug.