Page MenuHomeFreeBSD

regcomp(3): Refactor and combine top-level expression parsers
ClosedPublic

Authored by kevans on May 26 2017, 3:19 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 17, 2:25 AM
Unknown Object (File)
Tue, Nov 12, 5:07 AM
Unknown Object (File)
Tue, Nov 12, 4:59 AM
Unknown Object (File)
Tue, Nov 12, 4:49 AM
Unknown Object (File)
Tue, Nov 12, 2:57 AM
Unknown Object (File)
Tue, Nov 12, 2:33 AM
Unknown Object (File)
Sep 21 2024, 12:46 AM
Unknown Object (File)
Sep 16 2024, 10:56 AM
Subscribers

Details

Summary

The impending libregex will implement GNU extensions to bring BREs and
EREs closer together. Prepare for this and reduce the diff of libregex changes by
refactoring and combining the top-level parsers for EREs/BREs ahead of time.

Branching functionality has been split out to make it easier to follow the combined
version of the top-level parser. It may also be enabled in the parsing context to make
it easier when libregex enables branching for BREs.

A branching context was also added for the various branching functions and so that
BREs, for instance, can determine if they're the first expression in a chain of expressions
within the current branch and treat '*' as ordinary if so.

This should have no functional impact and negligible performance impact.

As an aside, I'm not against moving the pre- and post- parse logic in the top-level
parser (if (p->bre && EAT('^')) { ... } and if (p->bre && wasdollar) { ... }
out into their own functions, setting them as function pointers of the struct parse
in the caller of the top-level parser if (p->bre), and then just calling them
in the appropriate places if they're set. I just did not do this here because it
seemed like perhaps a step too far.

Test Plan

Run libc, sed, gsed, bsdgrep tests; exp-run if necessary

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I'm not familiar enough with grep to have any idea if these changes are correct.

If they pass the tests, though, I'm happy.

Some minor style nits below.

lib/libc/regex/regcomp.c
270 ↗(On Diff #28828)

style nit: space between & and arguments

271–272 ↗(On Diff #28828)

Seems like these should be bools instead?

293 ↗(On Diff #28828)

same

519 ↗(On Diff #28828)

style nit: return (nskip);

530 ↗(On Diff #28828)

style nit: compare non-boolean values against 0 (rather than using !)

661–662 ↗(On Diff #28828)

style nit: This should probably just be one line

This revision is now accepted and ready to land.Jun 1 2017, 11:17 PM
lib/libc/regex/regcomp.c
271–272 ↗(On Diff #28828)

I actually made them`bool` originally, then later changed them to int since I had no argument to introduce bool to a file that hasn't used them yet. I then forgot to correct all of the testing of them down below to use int-style rather than bool-style.

Admittedly, that's probably bad reasoning, so I'll go ahead and fix that.

lib/libc/regex/regcomp.c
271–272 ↗(On Diff #28828)

If the dominant pattern in the file is int-style boolean values, this is ok. (And probably better, just for consistency's sake.) I'm definitely less familiar with the file than you are, so your best judgement on it is fine with me.

(Even bool-style conditionals are probably fine, given they're standing in for boolean values.)

kevans edited edge metadata.
  • Revise style based on recommendations by cem@, use bool appropriately
  • Be more explicit about initialization
  • Do previously mentioned refactoring of BOL/EOL checks for BREs out into BRE-specific pre-/post- parse functions

Precedent for int-style bool was not all that strong, and I think I touched most of the cases anyways, so I ran with it
and used bools for the new stuff again.

I don't think performance impact of the pre-/post- parse function stuff should be bad. EREs won't have them, so they
have no need to dereference+call. BREs only do one iteration of this outer loop per invocation, with one invocation for each
expression or (subexpression), so performance is not really degraded from this. Besides this, we do have a couple benefits:

  • Slightly more readable top-level parser, removing bits that are specific to the one expression type
  • Less frequent checking whether we're parsing a BRE or not; benefit likely mitigated by the dereference+call overhead
This revision now requires review to proceed.Jun 2 2017, 4:44 AM
This revision is now accepted and ready to land.Jun 2 2017, 4:22 PM

I like bools too :).

Presumably the tests on the base system tools all passed. Do you think it's worth requesting an exp-run?

lib/libc/regex/regcomp.c
65 ↗(On Diff #29134)

I think it's worth a comment to clarify this struct.

Presumably the tests on the base system tools all passed. Do you think it's worth requesting an exp-run?

Indeed, they do. Honestly, I don't think it's worth it; breakage resulting from getting the top-level parser wrong is catastrophic and really really obvious from the base system tests in my experience in working with it given the fundamental nature of the beast.

I'll add said comment shortly.

kevans edited edge metadata.

Comment on the usage of branchc

This revision now requires review to proceed.Jul 6 2017, 1:07 AM
This revision was automatically updated to reflect the committed changes.