Page MenuHomeFreeBSD

syslogd: Centralize operations into a kevent loop
ClosedPublic

Authored by jfree on Aug 8 2023, 3:22 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, May 1, 1:55 AM
Unknown Object (File)
Mon, Apr 22, 11:31 AM
Unknown Object (File)
Mon, Apr 22, 5:16 AM
Unknown Object (File)
Apr 4 2024, 12:36 AM
Unknown Object (File)
Apr 4 2024, 12:01 AM
Unknown Object (File)
Apr 4 2024, 12:01 AM
Unknown Object (File)
Feb 21 2024, 4:11 PM
Unknown Object (File)
Jan 9 2024, 11:30 AM

Details

Summary
Use kqueue(2) for socket I/O and signal notification. Previously,
select(2) and traditional signal handlers were being used.

This change centralizes all of the async notification delivery into a
single loop so future Capsicum sandboxing will be easier. It also
simplifies the code by removing boiler-plate cruft that comes with the
older interfaces.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

usr.sbin/syslogd/syslogd.c
375

Can this be static const int sigcatch[]?

377

As a matter of style, I'd have these either all on one line, or all on separate lines.

714

IMO it is rather odd to call this variable error when it holds a file descriptor in the common case.

773

err() will print the string corresponding to the errno value set by kevent(). But, errno might be clobbered by the pidfile_remove() call, so to be correct you need to write something like:

if (kevent(...) == -1) {
    error = errno;
    pidfile_remove(pfh);
    errc(1, error, "failed to add kevent");
}
776

SIGPIPE isn't listed in sigcatch[].

780

Same comment as above with respect to errno.

801

Same comment as above with respect to errno.

3599

In the code above which blocks these signals, there is a special case for SIGCHLD, but that's not present here. Is that intentional?

usr.sbin/syslogd/syslogd.c
375

Yes.

377

As a matter of style, I'd have these either all on one line, or all on separate lines.

I was trying to align with the TypeNames[] style above.

This is all fixed later. sigcatch is put on a single line.

714

IMO it is rather odd to call this variable error when it holds a file descriptor in the common case.

Agreed. Not sure why I changed this.

3599

In the code above which blocks these signals, there is a special case for SIGCHLD, but that's not present here. Is that intentional?

I can't quite remember why SIGCHLD is even in sigcatch[]. It especially doesn't make sense with process descriptors.

I wrote a lot of this code back when I was still learning syslogd, unfortunately. It is going to be a massive headache to rebase all of this.

jfree marked 7 inline comments as done.

Address Mark's suggestions

Mostly looks good, I just have one question about signal handling.

usr.sbin/syslogd/syslogd.c
787

If I understand correctly, when a blocked signal is delivered, it'll trigger the knote and remain pending until the signal is unblocked. I believe it'll remain pending until syslogd exits, because they're never unblocked. Why not just ignore these signals instead?

usr.sbin/syslogd/syslogd.c
787

I will address this in the reapchild() removal patch. Ignoring SIGCHLD will also stop the generation of zombies.

Adjust to accommodate the reapchild() patch being moved down the series.

This revision is now accepted and ready to land.Aug 14 2023, 1:57 PM
slw_zxy.spb.ru added a subscriber: slw_zxy.spb.ru.

I am use syslogd for collectiong messages from many sourcses.
I have moderate message flow (about 1..10 messages/second average).
I am applay D41357, D41358, D41359, D41360, D41362 -- no problems.
I am applay D41363 -- after about 1..5 minutes syslogd stop to got and processing remote messages. Local messages still processing.
Most of remote messages don't recived and lost -- no action in kevent().
After D41363 syslogd is broken.

This revision now requires changes to proceed.Aug 16 2023, 8:19 PM
usr.sbin/syslogd/syslogd.c
771

Hmm, why exactly do we need EV_CLEAR here? Don't you want the event to fire so long as there's data in the socket buffer?

I am use syslogd for collectiong messages from many sourcses.
I have moderate message flow (about 1..10 messages/second average).
I am applay D41357, D41358, D41359, D41360, D41362 -- no problems.
I am applay D41363 -- after about 1..5 minutes syslogd stop to got and processing remote messages. Local messages still processing.
Most of remote messages don't recived and lost -- no action in kevent().
After D41363 syslogd is broken.

I was able to reproduce this issue and it seems Mark was right, removing EV_CLEAR from kevent() flags, fixes the messages stalling.

Oddly enough, this bug doesn't occur when the messages are sent through syslogd's UDP socket locally. This only happens when the messages are coming from another host.

@slw_zxy.spb.ru, would you mind seeing if this new patch works for you?

Remove EV_CLEAR flag when adding logsocket kevents to kqueue so we don't drop INET messages.

Thanks to @markj for pointing this out.

I am use syslogd for collectiong messages from many sourcses.
I have moderate message flow (about 1..10 messages/second average).
I am applay D41357, D41358, D41359, D41360, D41362 -- no problems.
I am applay D41363 -- after about 1..5 minutes syslogd stop to got and processing remote messages. Local messages still processing.
Most of remote messages don't recived and lost -- no action in kevent().
After D41363 syslogd is broken.

I was able to reproduce this issue and it seems Mark was right, removing EV_CLEAR from kevent() flags, fixes the messages stalling.

Oddly enough, this bug doesn't occur when the messages are sent through syslogd's UDP socket locally. This only happens when the messages are coming from another host.

@slw_zxy.spb.ru, would you mind seeing if this new patch works for you?

Yes, now work for me.

This revision is now accepted and ready to land.Aug 17 2023, 4:54 PM

Yes, now work for me.

Thanks for reporting the issue and testing the new version!

I am use syslogd for collectiong messages from many sourcses.
I have moderate message flow (about 1..10 messages/second average).
I am applay D41357, D41358, D41359, D41360, D41362 -- no problems.
I am applay D41363 -- after about 1..5 minutes syslogd stop to got and processing remote messages. Local messages still processing.
Most of remote messages don't recived and lost -- no action in kevent().
After D41363 syslogd is broken.

I was able to reproduce this issue and it seems Mark was right, removing EV_CLEAR from kevent() flags, fixes the messages stalling.

Oddly enough, this bug doesn't occur when the messages are sent through syslogd's UDP socket locally. This only happens when the messages are coming from another host.

@slw_zxy.spb.ru, would you mind seeing if this new patch works for you?

Yes, now work for me.

Yes. Thanks for testing the change. :)

Fix bug where syslogd would stall upon startup, never getting the chance to fsync().

The ts and tsp variables were originally used as a timeout in kevent(). If needdofsync was true, the timeout would be set to 0, and dofsync() would be called (assuming the kqueue had 0 events pending). Upon startup, syslogd's sockets were already populated, so the kqueue had pending events. syslogd never got a chance to fsync() and signal its parent that it had finished initialization, so it stalled.

This is fixed with a simple if-statement that checks if needdofsync is true. If it is, it will sync. There is no dependency on kevent() this way.

This revision now requires review to proceed.Aug 17 2023, 6:46 PM
This revision is now accepted and ready to land.Aug 23 2023, 2:29 PM
This revision was automatically updated to reflect the committed changes.