Differential D20764

Allow limiting the size of syslogd output files using options in syslog.conf
Needs Revision · Public

Authored by • ian on Jun 25 2019, 10:02 PM.

Details

Reviewers

manu
cem

Group Reviewers

manpages

Summary

Systems with limited storage, such as embedded systems using eMMC or SD card devices, can be accidentally driven into failure by some error or event which triggers lots of unexpected high-rate logging. Such logging can quickly fill up a small filesystem, which then triggers more errors, leading to more logging, and a quick death-spiral for the overall system. Running newsyslog(8) more often ameliorates the problem, but doesn't completely protect against it.

These changes add a new option to the pathname action lines in syslog.conf which limits the size of the file by specifying "R <size>" after the filename. If the file grows beyond the given size before it is rotated out, syslogd goes into "recycle mode" on that file, where the last 32K of the file is treated as a small circular buffer. The last 32K gets repeatedly overwritten until the next rotation, preserving the vast majority of the file, and thus hopefully preserving some information about the original error or event which triggered the unexpected volume of logging.
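To make the recycle-mode behavior concrete, here is a minimal Python model of it. This is only a sketch of the idea described above, not the code in the diff: the class name `RecyclingLog`, its parameters, and the choice to truncate oversized records are all assumptions made for illustration.

```python
import io


class RecyclingLog:
    """Model of a size-limited log whose tail is reused as a circular buffer.

    Hypothetical illustration only; the real syslogd patch may differ.
    """

    def __init__(self, stream, limit, window=32 * 1024):
        if not 0 < window <= limit:
            raise ValueError("window must be positive and no larger than limit")
        self.stream = stream      # seekable binary stream standing in for the log file
        self.limit = limit        # maximum file size in bytes (the "R <size>" value)
        self.window = window      # size of the recycled tail region
        self.recycling = False    # becomes True the first time the limit is hit

    def write(self, record: bytes) -> None:
        if len(record) > self.window:
            # Assumption: a record larger than the window is truncated to fit.
            record = record[: self.window]
        pos = self.stream.tell()
        if pos + len(record) > self.limit:
            # Out of room: jump back to the start of the tail window and
            # overwrite from there. Everything before the window is untouched.
            self.recycling = True
            self.stream.seek(self.limit - self.window)
        self.stream.write(record)


# Tiny demonstration: a 64-byte limit with a 16-byte recycled tail.
buf = io.BytesIO()
log = RecyclingLog(buf, limit=64, window=16)
for i in range(20):
    log.write(b"msg%04d\n" % i)   # each record is 8 bytes
```

In this model the first 48 bytes (records 0 through 5) survive the flood, while the final 16 bytes only ever hold the most recent records.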

Test Plan

This feature has been extensively tested. Patches to add this feature have been in use at Microchip/Symmetricom/Timing Solutions beginning with FreeBSD 4.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 25045

Event Timeline

• ian created this revision. · Jun 25 2019, 10:02 PM

Herald added a reviewer: manpages. · Jun 25 2019, 10:02 PM

Herald added a subscriber: imp.

Harbormaster completed remote builds in B25045: Diff 59033. · Jun 25 2019, 10:02 PM

The design seems kind of poor (random last 32k corruption?). Alternatively / relatedly, we (ISLN) have a syslogd patch we can share that parses newsyslog.conf and invokes newsyslog directly when logs grow beyond the configured rotation size.
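For comparison, the alternative mentioned here could look roughly like the sketch below. The ISLN patch itself is not shown in this review, so this is only a guess at its general shape: the function names `size_limits` and `logs_to_rotate` are hypothetical, and the parser handles only the basic newsyslog.conf column layout (path, optional owner:group, mode, count, size in kB).

```python
def size_limits(conf_text):
    """Map log path -> size limit in bytes from newsyslog.conf-style text."""
    limits = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        fields = line.split()
        if len(fields) < 4 or line.startswith("<"):
            continue                              # blank, short, or <include> line
        # owner:group is optional; the mode field is all digits (octal).
        mode_idx = 1 if fields[1].isdigit() else 2
        size_idx = mode_idx + 2                   # mode, count, size
        if len(fields) <= size_idx:
            continue
        size = fields[size_idx]
        if size.isdigit():                        # "*" means no size-based limit
            limits[fields[0]] = int(size) * 1024
    return limits


def logs_to_rotate(limits, current_sizes):
    """Return the paths whose current size has reached the configured limit."""
    return sorted(
        path for path, limit in limits.items()
        if current_sizes.get(path, 0) >= limit
    )


conf = """
# logfilename          [owner:group]  mode count size when  flags
/var/log/all.log                      600  7     *    @T00  J
/var/log/messages                     644  5     1000 @0101T JC
/var/log/debug.log     root:wheel     600  7     100  *     JC
"""
limits = size_limits(conf)
pending = logs_to_rotate(limits, {"/var/log/messages": 2_000_000,
                                  "/var/log/debug.log": 10})
```

A syslogd built this way would then run newsyslog for each path in `pending`, which rotates the file rather than bounding it in place.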

cem added subscribers: darrick.freebsd_gmail.com, vangyzen. · Jun 26 2019, 5:50 AM

I'm one of the reasons Ian opened this review: I was considering adding clog support to syslogd for embedded use cases.
clog has been around since m0n0wall (at least) and is still used in pfSense and OPNsense, but after talking to Ian about this I prefer his solution.
The goals here are:

Do not fill up your disk (whether it's a real disk or a tmpfs).
If a program crashes in a loop for $somereason, you (hopefully) still have access to the original log entries that caused the crash. Relying on log rotation with newsyslog all but guarantees that you will lose the original problem, because the logfile will have rotated.

That being said, I would prefer the rotating size to be user-configurable in syslog.conf.

emaste added a subscriber: emaste. · Jun 26 2019, 6:33 PM

+1 to advise caution on this design. Sounds like you want to roll logs more often.

So far there has been more discussion on the mailing lists than here. Summarizing the feedback from both venues so far...

4 people strongly support this... me, manu@, rgrimes@, and Karl Denninger -- all people who work with embedded systems, which is exactly where this concept and code originated about 15 years ago.
4 people provided feedback that amounts to "Rotate logs more often", which completely misses the point of this change, which is to increase the chances of preserving the information available at the moment that something went wrong and began spewing to the logs. It is often the information that appears in the seconds immediately before unexpected spewage that explains the problem. Rotating logs more often when some unexpected error is dumping hundreds or even thousands of lines per second into the log is just a g'teed way to lose information about the original triggering event.
1 person suggested using fifolog, which is basically another form of "rotate more often".
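A rough back-of-the-envelope calculation illustrates the "rotate more often" objection. The function below is a sketch under stated assumptions: rotation happens the instant a file reaches its size limit, archives are uncompressed, and the line length and rotation settings in the example are made-up numbers, not measurements.

```python
def seconds_until_lost(lines_per_sec, bytes_per_line, rotate_bytes, kept_files):
    """Upper bound on how long the first flood line survives under rotation.

    Assumes ideal size-triggered rotation: the live file plus `kept_files`
    archives, each holding at most `rotate_bytes`.
    """
    bytes_per_sec = lines_per_sec * bytes_per_line
    retained_bytes = rotate_bytes * (kept_files + 1)
    return retained_bytes / bytes_per_sec


# Hypothetical flood: 1000 lines/s of 100-byte lines, rotating at 100 kB
# and keeping 5 archives.
survival = seconds_until_lost(1000, 100, 100 * 1024, 5)
```

With these illustrative numbers the triggering event is gone after about six seconds, however promptly rotation runs, whereas a size cap that only recycles the tail keeps the start of the file until the next scheduled rotation.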

Given that this is a completely optional feature that places no burden on anyone who chooses not to use it, and given that the code to implement it is small, simple, and has 15 years of real-world testing in deployed products, my intention is still to move forward with this unless there is some better argument for not doing so.

FWIW, I also work on an embedded appliance.

If you have this kind of extremely broken program with frequent log spam you're pretty screwed no matter what you do. I think for some classes of this broken program, it'd be better to detect log cycles and dump the output, which isn't the same as "rotate more frequently." But I don't think this hack is worth the code bloat in syslogd, which is already messy.

(Marking "request changes" as an explicit NACK signal, not because there are specific things to fix.)

This revision now requires changes to proceed. · Jul 9 2019, 6:08 AM