Page MenuHomeFreeBSD

syslogd: Support locale when removing unsafe characters
Needs ReviewPublic

Authored by yzhong_freebsdfoundation.org on Sep 16 2020, 5:36 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Apr 6, 3:49 AM
Unknown Object (File)
Fri, Apr 3, 4:39 PM
Unknown Object (File)
Wed, Apr 1, 6:08 AM
Unknown Object (File)
Tue, Mar 31, 2:55 AM
Unknown Object (File)
Sun, Mar 29, 12:48 AM
Unknown Object (File)
Sat, Mar 28, 1:44 AM
Unknown Object (File)
Fri, Mar 27, 3:02 PM
Unknown Object (File)
Thu, Mar 26, 9:04 AM

Details

Reviewers
markj
emaste
Summary

From https://bugs.freebsd.org/244226 :

syslogd has code to convert all logged messages to "safe" strings.

At the moment, the code converts control characters to "^x" sequences and ALL 8-bit characters to "M-x" sequences. This means that printable characters in character sets other than ASCII are converted and so do not display as expected when viewing the logs.

This patch adds LC_CTYPE locale support to syslogd and changes the "safe" conversion code to examine the logged characters using mbrtoc32() and to use iswgraph() to test if a character needs converting to safe sequences.

It also uses vis() to do the conversion which is similar to OpenBSD but which means control chars become \^x and non-graphical 8-bit chars become \M-x.

Test Plan

simplest way is this:

  1. Add to /etc/rc.conf: syslogd_env="LC_CTYPE=C.UTF-8"
  1. Restart syslogd.
  1. Run: echo '\xe0\xb8\xaa\xe0\xb8\xa7\xe0\xb8\xb1\xe0\xb8\xaa\xe0\xb8\x94\xe0\xb8\xb5' | logger
  1. Look at /var/log/messages. You should see "สวัสดี" (hello in Thai).

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

The only change I made to the original patch was to re-order some of the #include directives, as more had been added after the original patch was submitted.

usr.sbin/syslogd/syslogd.c
969

mask_C1 is now a write-only variable. What does vis() do with C1 control characters?

1002

Is the cast to size_t actually required?

Don't we need to limit that parameter to min(MB_LEN_MAX, <# bytes left>)?

1005

When can mbrtoc32() return a value greater than n? It's not clear to me from reading the man page that that's permitted.

1017

This might as well be a memcpy(). strncpy() doesn't guarantee nul-termination here, the assignment after the loop does.

I tested it more thoroughly and found an additional problem (on line 984, it used to test if a size_t was less than 0), along with the ones markj pointed out. They should all be fixed now.

usr.sbin/syslogd/syslogd.c
969

From my testing, vis() automatically converts C1 characters. I added back the old behaviour for when mask_C1 is false.

usr.sbin/syslogd/syslogd.c
980

end_q - q measures the number of bytes remaining in the output buffer, but we're passing the input buffer to mbrtoc32(). In particular, mbrtoc32() assumes that it can access up to n bytes of the input buffer, which is not always true.

max_len should be of type size_t.

Fixed another issue: mbrtoc32() returns the error value (size_t)-2 if it reads a C1 control character. This wouldn't be a problem by itself, as the rest of the function takes care of it, but this screws up mbrtoc32()'s internal state for all of the characters afterwards. Switching to the non-restartable mbtowc() resolves that issue. However, I am not sure if this has any bad implications.

usr.sbin/syslogd/syslogd.c
980

It's better to write end_p = p + strlen(p) and check end_p - p rather than recomputing the entire string's length for each wide character.

usr.sbin/syslogd/syslogd.c
792

Just setlocale(LC_CTYPE, "") probably.

Only three locales are defined by default, the empty string "" which
denotes the native environment, and the "C" and "POSIX" locales, which
denote the C language environment.  A locale argument of NULL causes
setlocale() to return the current locale.

If LC_CTYPE is unset getenv returns null, which causes setlocale() to just return the current setting. The empty string sets it according to environment variables automatically.

Seems reasonable to me.

usr.sbin/syslogd/syslogd.c
980

You could rewrite this as while (p < end_p && q < end_q) I think.

This revision is now accepted and ready to land.Sep 22 2020, 1:39 PM
This revision now requires review to proceed.Sep 22 2020, 6:48 PM

(I plan to commit this, but I would like to finish some work on sandboxing syslogd first - all of this string parsing code makes me a bit nervous.)

What's the status of this? We bumped into exactly this problem at $WORK.

What's the status of this? We bumped into exactly this problem at $WORK.

Last year, @jfree wrote patches to capsicumize syslogd. Most of that work has landed, but a few patches are left. We will try to get through them by the end of next week, and then I think this patch can land.

Any progress on this?

Yes, I’ve been working on it. I will try to get them in this weekend. Sorry for the wait

Any progress on this?

Yes, I’ve been working on it. I will try to get them in this weekend. Sorry for the wait

Ok, all of my syslogd patches are in. So sorry about the delay!

This change is still not merged in.