Problem:
$ echo μ | bsdgrep -i 'µ'
Segmentation fault (core dumped)
Why: We use a bitmap to optimize handling of the first 256 (CHAR_MAX - CHAR_MIN + 1) characters. The *real* problem here appears to be that one half of a lower/upper case pair is inside that bitmap and the other half is outside it. The check in othercase() doesn't handle this, so p_bracket(), ordinary() and bothcases() end up in infinite recursion:
...
frame #93: 0x000000080031bfa3 libc.so.7`p_bracket(p=0x00007fffffffe770) at regcomp.c:904
frame #94: 0x000000080031c538 libc.so.7`ordinary [inlined] bothcases(p=<unavailable>, ch=<unavailable>) at regcomp.c:1142
frame #95: 0x000000080031c4ce libc.so.7`ordinary(p=0x00007fffffffe770, ch=181) at regcomp.c:1158
frame #96: 0x000000080031bfa3 libc.so.7`p_bracket(p=0x00007fffffffe770) at regcomp.c:904
frame #97: 0x000000080031c538 libc.so.7`ordinary [inlined] bothcases(p=<unavailable>, ch=<unavailable>) at regcomp.c:1142
frame #98: 0x000000080031c4ce libc.so.7`ordinary(p=0x00007fffffffe770, ch=181) at regcomp.c:1158
frame #99: 0x000000080031bfa3 libc.so.7`p_bracket(p=0x00007fffffffe770) at regcomp.c:904
...
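The straddling case pair can be illustrated with a small, self-contained C sketch. This is not the regcomp.c code: toy_othercase() and straddles() are hypothetical helpers, and the case mapping is hard-coded from Unicode (U+00B5 MICRO SIGN, the ch=181 seen in the frames, uppercases to U+039C, which lowercases to U+03BC) instead of being taken from the locale.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a locale-aware case mapping; only the code
 * points involved in this report are covered.
 */
uint32_t
toy_othercase(uint32_t ch)
{
	switch (ch) {
	case 0x00B5:	/* MICRO SIGN -> GREEK CAPITAL LETTER MU */
		return (0x039C);
	case 0x03BC:	/* GREEK SMALL LETTER MU -> GREEK CAPITAL LETTER MU */
		return (0x039C);
	case 0x039C:	/* GREEK CAPITAL LETTER MU -> GREEK SMALL LETTER MU */
		return (0x03BC);
	default:
		return (ch);
	}
}

/*
 * True when ch and its other-case counterpart fall on opposite sides of
 * a bitmap covering code points [0, limit).
 */
bool
straddles(uint32_t ch, uint32_t limit)
{
	uint32_t oc = toy_othercase(ch);

	return ((ch < limit) != (oc < limit));
}
```

With a 256-entry bitmap, straddles(0x00B5, 256) is true: the micro sign is inside the bitmap and its uppercase form is not. With a 128-entry bitmap both halves are outside, so straddles(0x00B5, 128) is false.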
Proposed fix: Reduce the effective size of the bitmap to 128 characters (0-127) for multibyte locales. I was tempted to not use the bitmap at all there, to be absolutely safe, but that would greatly affect performance: 99.99% of regexes used, e.g. when building world, fall into the POSIX locale character range, and there are no known issues (yet?) with that range. This also requires a change in computejumps() to reduce the character range in the UTF-8 case (in the multibyte case we can only get there when using UTF-8).
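As a rough illustration of the idea, and not the actual patch, the sketch below keeps a 256-bit bitmap but only uses its first 128 entries when the locale is multibyte. Everything else goes to a slow-path list. struct toy_cset, its fields, and the toy_cset_*() functions are hypothetical names invented for this example; regcomp.c's real cset layout differs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	NC_MAX	256	/* bitmap capacity: one bit per possible char value */

/* Hypothetical, simplified character set; not regcomp.c's cset. */
struct toy_cset {
	uint8_t		bmp[NC_MAX / 8];	/* fast path for small code points */
	uint32_t	wides[16];		/* slow path for everything else */
	size_t		nwides;
	uint32_t	limit;			/* code points below this use bmp */
};

void
toy_cset_init(struct toy_cset *cs, bool multibyte)
{
	memset(cs, 0, sizeof(*cs));
	/* Assumed policy from the proposal: ASCII only in multibyte locales. */
	cs->limit = multibyte ? 128 : NC_MAX;
}

bool
toy_cset_add(struct toy_cset *cs, uint32_t ch)
{
	if (ch < cs->limit) {
		cs->bmp[ch >> 3] |= (uint8_t)(1u << (ch & 7));
		return (true);
	}
	if (cs->nwides == sizeof(cs->wides) / sizeof(cs->wides[0]))
		return (false);		/* toy capacity exhausted */
	cs->wides[cs->nwides++] = ch;
	return (true);
}

bool
toy_cset_has(const struct toy_cset *cs, uint32_t ch)
{
	size_t i;

	if (ch < cs->limit)
		return ((cs->bmp[ch >> 3] & (1u << (ch & 7))) != 0);
	for (i = 0; i < cs->nwides; i++)
		if (cs->wides[i] == ch)
			return (true);
	return (false);
}
```

In this sketch, a multibyte set stores both U+00B5 and U+039C on the same (slow) path, so no case pair is split across the two representations, while ASCII lookups still hit the bitmap.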