Page MenuHomeFreeBSD

syslogd: Support locale when removing unsafe characters
Needs ReviewPublic

Authored by yzhong_freebsdfoundation.org on Sep 16 2020, 5:36 PM.

Details

Reviewers
markj
emaste
Summary

From https://bugs.freebsd.org/244226 :

syslogd has code to convert all logged messages to "safe" strings.

At the moment, the code converts control characters to "^x" sequences and ALL 8-bit characters to "M-x" sequences. This means that printable characters in character sets other than ASCII are converted and so do not display as expected when viewing the logs.

This patch adds LC_CTYPE locale support to syslogd and changes the "safe" conversion code to examine the logged characters using mbrtoc32() and to use iswgraph() to test if a character needs converting to safe sequences.

It also uses vis() to do the conversion which is similar to OpenBSD but which means control chars become \^x and non-graphical 8-bit chars become \M-x.

Test Plan

simplest way is this:

  1. Add to /etc/rc.conf: syslogd_env="LC_CTYPE=C.UTF-8"
  1. Restart syslogd.
  1. Run: echo '\xe0\xb8\xaa\xe0\xb8\xa7\xe0\xb8\xb1\xe0\xb8\xaa\xe0\xb8\x94\xe0\xb8\xb5' | logger
  1. Look at /var/log/messages. You should see "สวัสดี" (hello in Thai).

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

The only change I made to the original patch was to re-order some of the #include directives, as more had been added after the original patch was submitted.

usr.sbin/syslogd/syslogd.c
969

mask_C1 is now a write-only variable. What does vis() do with C1 control characters?

1001

Is the cast to size_t actually required?

Don't we need to limit that parameter to min(MB_LEN_MAX, <# bytes left>)?

1004

When can mbrtoc32() return a value greater than n? It's not clear to me from reading the man page that that's permitted.

1016

This might as well be a memcpy(). strncpy() doesn't guarantee nul-termination here, the assignment after the loop does.

I tested it more thoroughly and found an additional problem (on line 984, it used to test if a size_t was less than 0), along with the ones markj pointed out. They should all be fixed now.

usr.sbin/syslogd/syslogd.c
969

From my testing, vis() automatically converts C1 characters. I added back the old behaviour for when mask_C1 is false.

usr.sbin/syslogd/syslogd.c
979

end_q - q measures the number of bytes remaining in the output buffer, but we're passing the input buffer to mbrtoc32(). In particular, mbrtoc32() assumes that it can access up to n bytes of the input buffer, which is not always true.

max_len should be of type size_t.

Fixed another issue: mbrtoc32() returns the error value (size_t)-2 if it reads a C1 control character. This wouldn't be a problem by itself, as the rest of the function takes care of it, but this screws up mbrtoc32()'s internal state for all of the characters afterwards. Switching to the non-restartable mbtowc() resolves that issue. However, I am not sure if this has any bad implications.

usr.sbin/syslogd/syslogd.c
979

It's better to write end_p = p + strlen(p) and check end_p - p rather than recomputing the entire string's length for each wide character.

usr.sbin/syslogd/syslogd.c
791

Just setlocale(LC_CTYPE, "") probably.

Only three locales are defined by default, the empty string "" which
denotes the native environment, and the "C" and "POSIX" locales, which
denote the C language environment.  A locale argument of NULL causes
setlocale() to return the current locale.

If LC_CTYPE is unset getenv returns null, which causes setlocale() to just return the current setting. The empty string sets it according to environment variables automatically.

Seems reasonable to me.

usr.sbin/syslogd/syslogd.c
979

You could rewrite this as while (p < end_p && q < end_q) I think.

This revision is now accepted and ready to land.Sep 22 2020, 1:39 PM
This revision now requires review to proceed.Sep 22 2020, 6:48 PM

(I plan to commit this, but I would like to finish some work on sandboxing syslogd first - all of this string parsing code makes me a bit nervous.)