
Ignore EPOLLEXCLUSIVE
Needs Review (Public)

Authored by trasz on Jan 14 2020, 12:14 PM.

Details

Reviewers
dchagin
emaste
kib
Group Reviewers
Linux Emulation
Summary

Make linux(4) explicitly ignore EPOLLEXCLUSIVE. This is another
fix - or a workaround - for Nginx.
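
As a rough illustration of what "explicitly ignore" could look like, here is a minimal userland sketch that strips the flag from an event mask before further translation. The macro name LINUX_EPOLLEXCLUSIVE and the helper strip_epollexclusive() are assumptions made for this example and are not taken from the diff; the bit value (1 << 28) is the one Linux uses for EPOLLEXCLUSIVE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed name; the value matches EPOLLEXCLUSIVE in the Linux ABI. */
#define	LINUX_EPOLLEXCLUSIVE	(1u << 28)

/*
 * Hypothetical helper: clear EPOLLEXCLUSIVE from a Linux event mask and
 * report through *ignored whether the caller had asked for it.
 */
uint32_t
strip_epollexclusive(uint32_t levents, bool *ignored)
{
	*ignored = (levents & LINUX_EPOLLEXCLUSIVE) != 0;
	return (levents & ~LINUX_EPOLLEXCLUSIVE);
}
```

The remaining bits can then be handled exactly as before, so an application that sets the flag still gets ordinary (non-exclusive) wakeups.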

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 28731
Build 26746: arc lint + arc unit

Event Timeline

trasz created this revision. Jan 14 2020, 12:14 PM
emaste added inline comments. Jan 14 2020, 3:11 PM
sys/compat/linux/linux_event.c
334–336

I think we should have a message for this case, and a comment with a brief description for the option.

emaste added inline comments. Jan 14 2020, 3:35 PM
sys/compat/linux/linux_event.c
334–336

e.g. one of linux_msg, LINUX_SDT_PROBE, LINUX_CTR as appropriate

trasz updated this revision to Diff 66900. Jan 17 2020, 11:08 AM

Add CTR, just in case.

trasz marked 2 inline comments as done. Jan 17 2020, 11:08 AM
kib added a comment. Jan 17 2020, 12:58 PM

What are the semantics of the flag?

sys/compat/linux/linux_event.h
44

Use braces around the define. Here and below.
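
A short sketch of why this matters, assuming "braces" here means parentheses around the macro body. The macro and function names below are hypothetical and only loosely modelled on a flag defined as a shift; they are not copied from linux_event.h.

```c
#include <stdint.h>

/* Hypothetical flag definitions, without and with parentheses. */
#define	EXCL_BARE	1u << 28
#define	EXCL_PAREN	(1u << 28)

/* Expands to events & ~1u << 28, i.e. events & ((~1u) << 28). */
uint32_t
clear_flag_bare(uint32_t events)
{
	return (events & ~EXCL_BARE);
}

/* Expands to events & ~(1u << 28), which clears only bit 28. */
uint32_t
clear_flag_paren(uint32_t events)
{
	return (events & ~EXCL_PAREN);
}
```

With events set to 0x10000001, the unparenthesized version yields 0 (the low bit is lost as well), while the parenthesized one yields 0x00000001.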

trasz updated this revision to Diff 66908. Jan 17 2020, 1:21 PM

Add braces.

trasz marked an inline comment as done. Jan 17 2020, 1:23 PM

Quoting the man page (http://man7.org/linux/man-pages/man2/epoll_ctl.2.html):

EPOLLEXCLUSIVE (since Linux 4.5)
       Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are
       attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE
       is not set) is for all epoll file descriptors to receive an event. EPOLLEXCLUSIVE is thus useful for avoiding thundering herd problems in certain scenarios.
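
For readers who have not used the flag, the following Linux-only sketch shows how an application would request it: the read end of a pipe is attached to two epoll instances with EPOLLIN | EPOLLEXCLUSIVE, and the function counts how many of them report the event. The function name count_exclusive_wakeups() is made up for this example. Per the man page text above, at least one instance is expected to report; how many actually do depends on the kernel and on whether anyone is blocked in epoll_wait(2), so no exact count is assumed.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Older libc headers may lack the definition; this is the Linux ABI value. */
#ifndef EPOLLEXCLUSIVE
#define	EPOLLEXCLUSIVE	(1u << 28)
#endif

/*
 * Attach one pipe to two epoll instances using EPOLLEXCLUSIVE, make the
 * pipe readable, and return how many instances report an event.
 * Returns -1 if setup fails (for example, if the flag is rejected).
 */
int
count_exclusive_wakeups(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
	struct epoll_event out;
	int pipefd[2], ep[2] = { -1, -1 }, woken = -1;

	if (pipe(pipefd) != 0)
		return (-1);
	for (int i = 0; i < 2; i++) {
		ep[i] = epoll_create1(0);
		if (ep[i] < 0 ||
		    epoll_ctl(ep[i], EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
			goto out;
	}
	if (write(pipefd[1], "x", 1) != 1)
		goto out;
	woken = 0;
	for (int i = 0; i < 2; i++) {
		/* Timeout 0: poll for ready events without blocking. */
		if (epoll_wait(ep[i], &out, 1, 0) == 1)
			woken++;
	}
out:
	for (int i = 0; i < 2; i++) {
		if (ep[i] >= 0)
			close(ep[i]);
	}
	close(pipefd[0]);
	close(pipefd[1]);
	return (woken);
}
```
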
kib added a comment. Jan 17 2020, 2:03 PM

I think a scary kernel message should be printed when the flag is ignored, simply because it might affect correctness.

trasz added a comment. Jan 17 2020, 3:01 PM

I tried that; it's way too verbose. Maybe just a sysctl to disable it?

Maybe just emit the warning once?

Or implement a native analog for kqueue/kevent? (If it makes sense, I'm not sure.)
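
The "warn once" idea from the comment above can be sketched in userland C roughly as follows. This is only an illustration: the function name and the message text are invented, and kernel code would use its own logging and atomic primitives rather than stdio and C11 atomics.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Set on the first call; later calls see it set and stay silent. */
static atomic_flag epollexclusive_warned = ATOMIC_FLAG_INIT;

/*
 * Hypothetical helper: print the warning only on the first call.
 * Returns 1 if the message was printed, 0 if it was suppressed.
 */
int
warn_epollexclusive_once(void)
{
	if (atomic_flag_test_and_set(&epollexclusive_warned))
		return (0);
	fprintf(stderr, "EPOLLEXCLUSIVE is not supported and is ignored\n");
	return (1);
}
```

This keeps the log quiet for a busy server while still leaving one trace that the flag was dropped.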

kib added a comment. Wed, Feb 5, 7:33 PM

> Or implement a native analog for kqueue/kevent? (If it makes sense, I'm not sure.)

What I see in the description can be implemented easily for kqueue: basically, kern_event.c:knote() should stop after successfully activating a single knote. I am not sure how closely this would match the actual Linux semantics.
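
To make the idea concrete, here is a toy userland model of "stop after activating a single knote". It is not the real knote() from kern_event.c: struct toy_knote, its fields, and toy_knote_walk() are all invented for this sketch, and locking, filters, and the real list macros are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a knote on a singly linked list. */
struct toy_knote {
	bool matches;			/* would the filter fire for this event? */
	bool active;			/* has this knote been activated? */
	struct toy_knote *next;
};

/*
 * Walk the list and activate matching knotes. In exclusive mode, stop
 * after the first successful activation. Returns the number activated.
 */
int
toy_knote_walk(struct toy_knote *head, bool exclusive)
{
	int activated = 0;

	for (struct toy_knote *kn = head; kn != NULL; kn = kn->next) {
		if (!kn->matches)
			continue;
		kn->active = true;
		activated++;
		if (exclusive)
			break;
	}
	return (activated);
}
```

In this model the exclusive walk wakes exactly one listener, whereas the man page only promises "one or more", which is one place where the two semantics could differ.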