Page MenuHomeFreeBSD

regcomp() recurses infinitely on a case-insensitive pattern containing wide characters in 128-255 range
ClosedPublic

Authored by yuripv on Nov 23 2018, 1:48 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 25, 6:28 AM
Unknown Object (File)
Wed, Nov 20, 12:59 AM
Unknown Object (File)
Wed, Nov 20, 12:24 AM
Unknown Object (File)
Fri, Nov 15, 10:06 PM
Unknown Object (File)
Thu, Nov 14, 5:13 AM
Unknown Object (File)
Wed, Nov 13, 12:49 PM
Unknown Object (File)
Wed, Nov 13, 5:16 AM
Unknown Object (File)
Wed, Nov 13, 4:34 AM
Subscribers

Details

Summary

Problem:

$ echo μ | bsdgrep -i 'µ'
Segmentation fault (core dumped)

Why: We use a bitmap to optimize the first 256 (CHAR_MAX - CHAR_MIN + 1) characters handling. Apparently, the *real* problem here is that one half of lower-pair is in the bitmap we use for first 256 characters (CHAR_MAX - CHAR_MIN), and the other one is outside, which check in othercase() doesn't handle it, and we have a infinite recursion:

...
    frame #93: 0x000000080031bfa3 libc.so.7`p_bracket(p=0x00007fffffffe770) at regcomp.c:904
    frame #94: 0x000000080031c538 libc.so.7`ordinary [inlined] bothcases(p=<unavailable>, ch=<unavailable>) at regcomp.c:1142
    frame #95: 0x000000080031c4ce libc.so.7`ordinary(p=0x00007fffffffe770, ch=181) at regcomp.c:1158
    frame #96: 0x000000080031bfa3 libc.so.7`p_bracket(p=0x00007fffffffe770) at regcomp.c:904
    frame #97: 0x000000080031c538 libc.so.7`ordinary [inlined] bothcases(p=<unavailable>, ch=<unavailable>) at regcomp.c:1142
    frame #98: 0x000000080031c4ce libc.so.7`ordinary(p=0x00007fffffffe770, ch=181) at regcomp.c:1158
    frame #99: 0x000000080031bfa3 libc.so.7`p_bracket(p=0x00007fffffffe770) at regcomp.c:904
...

Proposed fix: Reduce the actual size of bitmap down to 128 characters (0-127) for multibyte locales. While I was tempted to not use it at all there to be absolutely safe, it will greatly affect performance as 99.99% of regexes used e.g. when building world fall into the POSIX locale characters range; more so there are no known issues (yet?) with that range. It also required a change in computejumps() to reduce the character range as well in UTF-8 case (we can get there in multibyte case only if using UTF-8).

Test Plan
  • tests/lib/libc/regex
  • tests/usr.bin/grep
  • tests/usr.bin/sed
  • GNU grep test suite (partially, the original test case now passes)
  • buildworld w/o and with the change; no visible performance change; running resulting system for a week

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

yuripv retitled this revision from regcomp() recuses infinitely on a case-insensitive pattern with unidirectional case-mappings to regcomp() recurses infinitely on a case-insensitive pattern with unidirectional case-mappings.Nov 23 2018, 1:48 AM

Looks good to me, but it would be nice to have input from kevans as well.

This revision is now accepted and ready to land.Nov 23 2018, 2:17 AM
yuripv edited the summary of this revision. (Show Details)
yuripv edited the test plan for this revision. (Show Details)

After providing the quick fix, I have started wondering how GNU grep (or libgnuregex, more precisely) does the matching without providing any locale specific data; the answer is pretty simple, not sure why I didn't think about it from the start -- fold the case of *both* sides of comparison, and compare both-upper and/or both-lower.

This revision now requires review to proceed.Nov 23 2018, 11:39 AM
This revision is now accepted and ready to land.Nov 23 2018, 2:29 PM

Can you throw a test case for this in ^/usr.bin/grep/tests/grep_freebsd_test.sh as well, please? It's currently pretty-much-mandatory to run grep/sed/libc test suites against regex(3) changes anyways, so this would be sensible.

yuripv retitled this revision from regcomp() recurses infinitely on a case-insensitive pattern with unidirectional case-mappings to regcomp() recurses infinitely on a case-insensitive pattern containing wide characters in 128-255 range.
yuripv edited the summary of this revision. (Show Details)
yuripv edited the test plan for this revision. (Show Details)

Sorry for the delay. I have spent quite a bit more time on this, including testing. Summary and the change updated accordingly.

This revision now requires review to proceed.Dec 8 2018, 4:08 AM
This revision is now accepted and ready to land.Dec 8 2018, 3:27 PM

@kevans are you OK with the test case added to libc/regex instead and using sed? :)

@kevans are you OK with the test case added to libc/regex instead and using sed? :)

Ah, yes, this looks good to me. =)

This revision was automatically updated to reflect the committed changes.