Page MenuHomeFreeBSD

Add RFC 5424 syslog message parsing to syslogd.
ClosedPublic

Authored by ed on Apr 1 2018, 6:30 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 11:12 PM
Unknown Object (File)
Fri, Dec 27, 1:05 PM
Unknown Object (File)
Sun, Dec 22, 7:57 PM
Unknown Object (File)
Sat, Dec 21, 3:53 PM
Unknown Object (File)
Wed, Dec 18, 11:14 PM
Unknown Object (File)
Dec 12 2024, 9:00 PM
Unknown Object (File)
Dec 10 2024, 8:21 AM
Unknown Object (File)
Dec 10 2024, 2:06 AM
Subscribers

Details

Summary

Syslogd currently uses the RFC 3164 format for its log messages. One
limitation of RFC 3164 is that it cannot be used to log entries with
sub-second precision timestamps. One of our users has expressed a desire
for doing this for doing some basic performance measurements.

This change attempts to make a first cut at switching to RFC 5424 based
logging. The first step is to alter syslogd's input path to properly
parse such messages. It alters the logmsg() prototype to match the
fields of RFC 5424. The parsemsg() function is extended to parse both
RFC 3164 and 5424 messages and call into logmsg() accordingly.

Additional changes include:

  • Introducing proper parsing of timestamps, so that they can be printed in any desired output format. This means we need to infer the year and timezone for RFC 3164 timestamps.
  • Removing ISKERNEL. This can now be realised by simply providing an APP-NAME (== "kernel").
  • Extending RFC 3164 parsing to trim off the TAG prefix and using that to derive APP-NAME and PROCID.
  • Increase MAXLINE. RFC 5424 mentions we should support 2k messages.
Test Plan

Tested both with existing applications and a patched up libc syslog(3)
function that emits RFC 5424 entries.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Add explicit buffer size to parsemsg_remove_unsafe_characters().

The sizeof() in this function won't work, of course.

Hi ed, this is a really nice start at supporting RFC5424 logging format - you have obviously put a lot of work into it - thank you! It was far better than my initial attempt in D14918 which only brought the new timestamp format without the other required syntax updates - so I have abandoned my review.
I have reviewed this patch of your thoroughly, and provided some comments. Please be aware that I have not yet tested this patch and I don't have a lot of FreeBSD src experience (I am a ports committer) - hopefully my comments are helpful and not naïve.

usr.sbin/syslogd/syslogd.c
919–920 ↗(On Diff #40970)

Perhaps a comment should added to describe the functionality of this function, including describing that the PRI and VERSION have already been handled by parsemsg() and the actual log printing is handled by logmsg()

950 ↗(On Diff #40970)

Could this section be simplified by making use of strptime() to perform at least some of the parsing?

964 ↗(On Diff #40970)

Is this supposed to be "timestamp_remote.tm.tm_year -= 1900;"?

1002–1003 ↗(On Diff #40970)

According to ctime(3), the field tm.tm_gmtoff represents "offset from UTC in seconds".
Therefore, I think this equation should have "* 60" added to the end.

1071 ↗(On Diff #40970)

Does this prevent the MSG utilising the MSG-UTF8 format (starting with BOM) allowed in RFC5424?

1131–1132 ↗(On Diff #40970)

Perhaps a comment should added to describe the functionality of this function, including describing that the PRI has already been handled by parsemsg() and the actual log printing is handled by logmsg()

1146–1147 ↗(On Diff #40970)

Is this check still required for the !RemoteAddDate case, now that strptime() is being used?
Perhaps this check should be moved to the RemoteAddDate case logic, to make sure that the "msg += RFC3164_DATELEN + 1" is only skipping over a timestamp?

1174 ↗(On Diff #40970)

What happens if the time() function fails and returns -1?
Consider checking "t_now != -1" before running localtime_r?

1176 ↗(On Diff #40970)

There are only really 3 options/iterations ever required: next year, current year, or last year.
Consider setting the for loop test expression to "year < tm_now.tm_year - 1"?

1177 ↗(On Diff #40970)

I think this line can be done before the for loop, as tm_parsed will not set the tm.tm_year or tm.tm_isdst values?

1220–1221 ↗(On Diff #40970)

This comment should be updated to reflect the reduced functionality of this function.

1397–1400 ↗(On Diff #40970)

This comment should be updated to explain the current functionality of this function.
I.e. this function currently logs messages in the RFC3164 format, dropping any provided msgid and structured_data information.

I believe the syslogd(8) manpage should be updated with this patch to provide details on the current functionality. Something along the following lines:

SYSLOG PROTOCOL NOTES
This syslogd utility accepts messages formatted in either the traditional BSD syslog Protocol (defined by RFC3164) or the newer Syslog Protocol (defined by RFC5424, via the RFC5426 UDP transport only). However, log outputs to the console, files or other machines are always in the traditional BSD syslog Protocol format (defined by RFC3164), regardless of the format in which they were received. The intention is to add RFC5424 output support in the future, with new command-line options to direct syslogd as to which output format to use.
ed marked 12 inline comments as done.

Process the excellent feedback given by woodsb02@. Thanks!

usr.sbin/syslogd/syslogd.c
950 ↗(On Diff #40970)

It could, but I thought it made more sense not to down that path. With strptime() being (close to) the opposite of strftime(), one could argue that strptime() parses time in a locale and time zone dependent fashion. POSIX leaves it unspecified what happens if you pass in a timestamp that is nonexistent in the current time zone (e.g., due to daylight saving time).

This piece of code is intended to parse time in a completely time zone oblivious fashion. It is also a bit more strict, as it does not permit leap seconds, per RFC 5424.

As we already need a PARSE_NUMBER() to parse the time zone offset, it takes little effort to do the parsing of the date and time ourselves.

964 ↗(On Diff #40970)

Well spotted. Thanks!

1071 ↗(On Diff #40970)

Unfortunately, it does mean that. I am not enough of an expert to fully judge what kind of data is safe to store and display on the console, which is why I thought it made more sense to stick to the existing filtering for the time being.

I think we want to be consistent between RFC 3164 and RFC 5424 messages in this regard. As in, I think both of them should use the same parsemsg_remove_unsafe_characters() function to determine what is and what is not permitted. That way we can always translate between both flavours of messages. I've added a TODO to that function.

1146–1147 ↗(On Diff #40970)

That's a good point. I've just changed the code to simply call strptime() unconditionally. That also keeps the external behaviour of syslogd consistent, regardless of whether RemoteAddDate is used. The strlen() call can be moved further to the bottom, so that its value is invariant.

1174 ↗(On Diff #40970)

The syslogd code already assumes that time() never fails in some other places. I think that's a fair assumption to make on somewhat decently configured systems.

Even if this call were to fail, there's nothing failing really badly. It simply thinks it's December 31st, 1969, meaning it will generate log entries between January 7th, 1969 an January 6th, 1970.

1176 ↗(On Diff #40970)

Yes, it can only run three iterations. That said, instead of adding that as an explicit check, let me simply document this fact.

1177 ↗(On Diff #40970)

My initial version of this code did exactly this, but that's incorrect. mktime() alters its input, which in my case caused it to apply the time zone offset/DST during every loop of the iteration. This is why I'm reinitialising the entire structure during every iteration now.

Respect RemoteAddDate (-T) for RFC 5424 messages.

usr.sbin/syslogd/syslogd.c
1017 ↗(On Diff #40994)

Probably worth adding an “else timestamp = NULL” to specifically initialise timestamp, and for clarity.

1176 ↗(On Diff #40970)

My concern was whether there was any possibility of the for loop iterating excessively (more than 3 times). E.g. if the inputs where bad, such as time() returning -1? Whilst I haven’t played out the possibilities, is it worth adding a for loop test to prevent any possible syslogd DoS.

ed marked 2 inline comments as done.Apr 3 2018, 7:29 AM
ed added inline comments.
usr.sbin/syslogd/syslogd.c
1017 ↗(On Diff #40994)

Indeed. Fixed!

1176 ↗(On Diff #40970)

I've added an assertion to the code.

ed marked 2 inline comments as done.

Make minor changes based on woodsb02's feedback.

Hi Ben,

Shall I go ahead and land this change and D14951 tomorrow?

This revision was not accepted when it landed; it landed in state Needs Review.Apr 6 2018, 12:57 PM
This revision was automatically updated to reflect the committed changes.