
sed(1) / regex(3): correctly identify word boundaries when pattern is matched more than once
Abandoned, Public

Authored by avg on Jun 12 2015, 1:30 PM.
Reviewers
pfg
Summary

Currently there is a problem with matching word boundaries at the start
of the pattern when the pattern is applied to the string more than once,
e.g. in a sed substitution command with the 'g' flag and a pattern starting
with '[[:<:]]'.

Here is a concrete example of the problem:
http://thread.gmane.org/gmane.os.freebsd.current/152226/focus=152228

With this change, sed passes the whole string to regexec(3), using
REG_STARTEND to supply the offsets of the start and end of the substring
to work on. This allows regex(3) to correctly identify a word boundary at
the start of the substring: when that substring does not begin at the very
start of the string, the engine can now examine 'lastc' (the previous
character).
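The effect of the change can be emulated without regex(3) at all. The C sketch below is hypothetical (subst_g, word_start and is_word_char are not sed or libc functions): it replays a global substitution of a given character at a word start, in the spirit of s/[[:<:]]a/X/g, and checks the word start either with only the current substring visible (the old behaviour) or with the whole string visible. Word characters are assumed to be alphanumerics plus '_'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Word characters; assumed here to be alphanumerics plus '_'. */
static bool
is_word_char(char c)
{
	return (isalnum((unsigned char)c) || c == '_');
}

/*
 * Is there a word start at str[p]?  'start' is where the current search
 * pass began.  If the matcher is given only str + start (the old
 * behaviour), str[start - 1] is invisible and p == start always looks like
 * the beginning of the string.  With the whole string available, the
 * previous character ('lastc') can be examined for every p > 0.
 */
static bool
word_start(const char *str, size_t start, size_t p, bool whole_string)
{
	size_t origin = whole_string ? 0 : start;

	assert(start <= p);
	if (!is_word_char(str[p]))
		return (false);
	if (p == origin)
		return (true);
	return (!is_word_char(str[p - 1]));
}

/*
 * Emulate s/[[:<:]]c/X/g: replace each occurrence of 'c' that sits at a
 * word start with 'X', resuming the search just after each match.
 * 'out' must hold strlen(in) + 1 bytes.  Returns the number of replacements.
 */
static int
subst_g(const char *in, char c, char *out, bool whole_string)
{
	size_t len = strlen(in), start = 0;
	int n = 0;

	memcpy(out, in, len + 1);
	while (start < len) {
		size_t p = start;

		/* Scan forward from the start of this pass for the next match. */
		while (p < len &&
		    !(in[p] == c && word_start(in, start, p, whole_string)))
			p++;
		if (p == len)
			break;		/* no further match */
		out[p] = 'X';
		n++;
		start = p + 1;		/* the next pass begins after the match */
	}
	return (n);
}
```

For example, subst_g("aaa", 'a', buf, false) returns 3 and leaves "XXX" in buf, because every new pass starts on a character that looks like a word start; with whole_string set it returns 1 and buf holds "Xaa".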

Note that other utilities that use regex(3) would also have to pass
the whole string to avoid the same problem; the regex(3) change alone is not
enough to fix all consumers.
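For other consumers, the corresponding calling convention is to keep passing the start of the whole string and to move the search window with REG_STARTEND, instead of passing a pointer into the middle of the string. A minimal sketch follows; count_matches is a hypothetical helper, and REG_STARTEND is a BSD extension that glibc also accepts. Whether the characters before the window are then taken into account for '^' and word boundaries depends on the regex(3) implementation, which is what the regex(3) part of this revision addresses.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Count non-overlapping matches of 're' in 'str'.  regexec(3) always
 * receives 'str' itself; the search window [off, len) is passed in
 * pmatch[0] together with REG_STARTEND.
 */
static int
count_matches(const regex_t *re, const char *str)
{
	regmatch_t m;
	regoff_t off = 0, len = (regoff_t)strlen(str);
	int n = 0;

	while (off <= len) {
		m.rm_so = off;		/* window start, relative to str */
		m.rm_eo = len;		/* window end, relative to str */
		if (regexec(re, str, 1, &m, REG_STARTEND) != 0)
			break;
		/* Results are relative to str, not to str + off. */
		assert(m.rm_so >= off);
		n++;
		/* Resume after the match; step one byte past an empty match. */
		off = (m.rm_eo > m.rm_so) ? m.rm_eo : m.rm_eo + 1;
	}
	return (n);
}
```

The one-byte step after an empty match keeps the loop terminating for patterns that can match the empty string.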

This change also includes a hack to allow libc/regex/grot to compile
after xlocale changes.

Event Timeline

avg retitled this revision to sed(1) / regex(3): correctly identify word boundaries when pattern is matched more than once.
avg updated this object.
avg edited the test plan for this revision.

Pedro, I've noticed your recent activity in this area. Maybe you'll have some feedback. Thank you.

pfg requested changes to this revision. May 17 2016, 2:33 PM
pfg added a reviewer: pfg.

Interestingly, the OpenBSD guys have been working on a similar issue that breaks Mesa.

I almost committed a similar patch, but it is incorrect.
Check:
https://www.mail-archive.com/tech@openbsd.org/msg31625.html

lib/libc/regex/engine.c, line 792:

This causes a regression in regexec(3) REG_STARTEND.

lib/libc/regex/engine.c, line 897:

As with the previous line, this causes a regression. I have an improved patch, but it's not ready either.

lib/libc/regex/regcomp.c, line 774:

Are you disabling collation? This seems unrelated and should be an independent revision.

This revision now requires changes to proceed. May 17 2016, 2:33 PM
In D2792#136077, @pfg wrote:

> Interestingly, the OpenBSD guys have been working on a similar issue that breaks Mesa.
>
> I almost committed a similar patch, but it is incorrect.
> Check:
> https://www.mail-archive.com/tech@openbsd.org/msg31625.html

Thank you for the pointer.

lib/libc/regex/regcomp.c, line 774:

I don't recall many details now. I think I did this to get some test program to compile.
Please note that in normal builds REDEBUG is never defined, so that code remained enabled.

The problem has been solved by rS300683, so this request is no longer valid.