Ignore EPOLLEXCLUSIVE
Needs Review, Public

Authored by trasz on Jan 14 2020, 12:14 PM.

Details

Reviewers
dchagin
emaste
kib
Group Reviewers
Linux Emulation
Summary

Make linux(4) explicitly ignore EPOLLEXCLUSIVE. This is another
fix - or a workaround - for Nginx.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 28731
Build 26746: arc lint + arc unit

Event Timeline

sys/compat/linux/linux_event.c
333–335

I think we should have a message for this case, and a comment with a brief description for the option.

sys/compat/linux/linux_event.c
333–335

e.g. one of linux_msg, LINUX_SDT_PROBE, LINUX_CTR as appropriate

What are the semantics of the flag?

sys/compat/linux/linux_event.h
43

Use parentheses around the define's value, here and below.
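
A minimal sketch of why the parentheses matter, assuming the flag is defined as a shift expression. The macro and function names here are hypothetical and are not taken from the diff; the value 1 << 28 is the one Linux uses for EPOLLEXCLUSIVE. Without parentheses, applying ~ to the macro negates only the 1 and then shifts, so masking the flag out clears far more than the intended bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define DEMO_EPOLLEXCLUSIVE_UNSAFE	1u << 28	/* no parentheses */
#define DEMO_EPOLLEXCLUSIVE_SAFE	(1u << 28)	/* parenthesized */

/*
 * Intended to clear the flag, but ~DEMO_EPOLLEXCLUSIVE_UNSAFE expands to
 * ~1u << 28, i.e. (~1u) << 28 == 0xE0000000, so almost every bit is lost.
 */
static uint32_t
clear_unsafe(uint32_t events)
{
	uint32_t mask = ~DEMO_EPOLLEXCLUSIVE_UNSAFE;

	return (events & mask);
}

/* Clears only bit 28: the mask is ~(1u << 28) == 0xEFFFFFFF. */
static uint32_t
clear_safe(uint32_t events)
{
	uint32_t mask = ~DEMO_EPOLLEXCLUSIVE_SAFE;

	return (events & mask);
}
```

With events set to 0x10000001 (the exclusive bit plus bit 0), clear_safe() keeps bit 0 while clear_unsafe() returns 0.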

Quoting the man page (http://man7.org/linux/man-pages/man2/epoll_ctl.2.html):

EPOLLEXCLUSIVE (since Linux 4.5)
       Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not set) is for all epoll file descriptors to receive an event. EPOLLEXCLUSIVE is thus useful for avoiding thundering herd problems in certain scenarios.
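
For context, this is roughly what the Linux-side call looks like from an application's point of view. It is a sketch only: it assumes a Linux system with kernel 4.5 or later and a libc whose sys/epoll.h defines EPOLLEXCLUSIVE, and the helper name add_exclusive is hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Attach fd to the epoll instance epfd for readability events in
 * exclusive wakeup mode. Returns 0 on success, or -1 with errno set.
 * (Hypothetical helper; Linux-only.)
 */
static int
add_exclusive(int epfd, int fd)
{
	struct epoll_event ev = { 0 };

	ev.events = EPOLLIN | EPOLLEXCLUSIVE;
	ev.data.fd = fd;
	return (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev));
}
```

A server with several worker processes would typically call something like this once per worker, each with its own epoll descriptor and the same listening socket.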

I think a scary kernel message should be printed when the flag is ignored, simply because ignoring it might affect correctness.

I tried that; it's way too verbose. Maybe just a sysctl to disable it?

Maybe just emit the warning once?
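
One way the warn-once idea could look, sketched in userspace C rather than kernel code. Everything here is an assumption for illustration: the function name is hypothetical, and fprintf stands in for whichever kernel logging call the patch would actually use. An atomic flag ensures that only the first caller prints, even if several threads hit the path at the same time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Set by the first caller; never cleared. */
static atomic_flag exclusive_warned = ATOMIC_FLAG_INIT;

/*
 * Print the warning the first time this is called and stay silent
 * afterwards. Returns true only for the call that printed.
 * (Hypothetical helper, not part of the diff.)
 */
static bool
warn_exclusive_once(void)
{
	/* test_and_set returns the previous value: true means already warned. */
	if (atomic_flag_test_and_set(&exclusive_warned))
		return (false);
	fprintf(stderr, "EPOLLEXCLUSIVE is not supported; ignoring it\n");
	return (true);
}
```

This warns once per boot rather than once per process, which keeps the log quiet but may hide later offenders; whether that trade-off is acceptable is a separate question.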

Or implement a native analog for kqueue/kevent? (If it makes sense; I'm not sure.)

From what I see in the description, this can be implemented easily for kqueue: basically, kern_event.c:knote() should stop after successfully activating a single knote. I am not sure how closely this would match the actual Linux semantics.
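
The stop-after-one idea can be illustrated with a small userspace simulation. This is not the kernel's knote() and does not use the real struct knote; struct demo_note and demo_notify are hypothetical stand-ins. It also simplifies by using a single exclusive switch for the whole walk, whereas on Linux the flag is set per attached epoll descriptor and the man page allows "one or more" of them to be woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a knote; not the kernel structure. */
struct demo_note {
	bool armed;	/* can this note be activated right now? */
	bool activated;	/* set once the note has been activated */
};

/*
 * Walk the list and activate every armed note. When exclusive is true,
 * stop after the first successful activation. Returns the number of
 * notes activated.
 */
static size_t
demo_notify(struct demo_note *notes, size_t n, bool exclusive)
{
	size_t woken = 0;

	assert(notes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!notes[i].armed)
			continue;	/* activation did not succeed; keep looking */
		notes[i].activated = true;
		woken++;
		if (exclusive)
			break;		/* one listener is enough */
	}
	return (woken);
}
```

With three notes where only the last two are armed, an exclusive walk activates just the second one, while a non-exclusive walk activates both armed notes.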