regex: mixed sets are misidentified as singletons

Description

Fix the "singleton" function used by regcomp() to turn character set
matches into exact character matches when a character set has exactly
one element.

The underlying cset representation is complex; most critically, it
records "small" characters (codepoints below 128 or 256, depending on
the locale) in a bit vector and "wide" characters in a secondary array.

Unfortunately, the "singleton" function used to identify singleton sets
treated a cset as a singleton if either the "small" or the "wide" set
had exactly one element (it would then ignore the other set).
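
The difference between the two checks can be sketched in C. This is only
an illustration under stated assumptions: struct toy_cset, NSMALL and the
toy_* functions are hypothetical names, the small-character limit is
fixed at 128, and everything else the real cset tracks is left out. It is
not the libc code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSMALL 128	/* assumed limit for "small" codepoints */

/* Hypothetical, simplified stand-in for the real cset. */
struct toy_cset {
	unsigned char small[NSMALL / 8];	/* bit vector of small chars */
	const uint32_t *wides;			/* secondary array of wide chars */
	size_t nwides;				/* number of entries in wides */
};

/* Mark a small character (ch < NSMALL) as a member of the set. */
static void
toy_add_small(struct toy_cset *cs, unsigned ch)
{
	cs->small[ch / 8] |= (unsigned char)(1u << (ch % 8));
}

/* Count the small characters present in the bit vector. */
static size_t
toy_count_small(const struct toy_cset *cs)
{
	size_t n = 0;

	for (unsigned ch = 0; ch < NSMALL; ch++)
		if (cs->small[ch / 8] & (1u << (ch % 8)))
			n++;
	return (n);
}

/* Old behavior as described above: either part alone may decide. */
static bool
toy_singleton_buggy(const struct toy_cset *cs)
{
	return (toy_count_small(cs) == 1 || cs->nwides == 1);
}

/* Intended behavior: exactly one element across both parts. */
static bool
toy_singleton_fixed(const struct toy_cset *cs)
{
	return (toy_count_small(cs) + cs->nwides == 1);
}
```

For a set holding the small characters 'a' and 'b' plus one wide
character, toy_singleton_buggy() reports a singleton because the wide
part has one entry, while toy_singleton_fixed() does not.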

The easiest way to demonstrate this bug:

$ export LANG=C.UTF-8
$ echo 'a' | grep '[abà]'

It should match (and print "a"), but it does not: the set is treated as
a singleton containing only the accented character, so "a" and "b" are
ignored.
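
The same check can be made against regcomp() directly. The following is
a rough sketch, not part of the change itself: set_matches() and
repro_mixed_set() are hypothetical helper names, and the sketch assumes
the C.UTF-8 locale is installed. The accented character is written as
its UTF-8 bytes so the source encoding does not matter.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if text matches pattern (compiled as a basic regular
 * expression), 0 if regexec() reports anything other than a match,
 * and -1 if the pattern fails to compile.
 */
static int
set_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}

/*
 * Mirrors the grep demonstration: a set with two small characters and
 * one accented character, matched against "a".  Returns -2 if the
 * assumed C.UTF-8 locale is not available.
 */
static int
repro_mixed_set(void)
{
	if (setlocale(LC_ALL, "C.UTF-8") == NULL)
		return (-2);
	return (set_matches("[ab\xc3\xa0]", "a"));
}
```

A result of 1 from repro_mixed_set() means the mixed set matched "a" as
it should; a result of 0 corresponds to the failure described above.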

PR: 281710
Reviewed by: kevans, yuripv
Obtained from: illumos

(cherry picked from commit 8f7ed58a15556bf567ff876e1999e4fe4d684e1d)

Details

Provenance
sommerfeld_hamachi.org authored on Dec 21 2023, 3:46 AM
kevans committed on Sep 25 2024, 8:42 PM
Parents
rG867aaad5c2bf: bhyve: validate corb->wp to avoid infinite loop