bsdgrep: provide primitive literal matcher instead of erroring out on fgrep use
ClosedPublic

Authored by kevans on Aug 17 2017, 3:35 AM.

Details

Summary

fgrep/grep -F will error out at runtime if compiled with a regex(3)
that does not define REG_NOSPEC or REG_LITERAL. glibc is one such regex(3)
implementation, and as it turns out they don't support literal matching at all.

Provide a primitive literal matcher for use with glibc and other implementations
that don't support literal matching so that we don't completely lose
fgrep/grep -F if compiled against libgnuregex on stable/10, stable/11, or
other systems that we don't necessarily support.

This is a wholly unoptimized implementation with no plans to optimize it as of
now. This is due to both its use-case being primarily on unsupported systems in
the near-distant future and that it's reinventing the wheel that we already
have available as a feature of regex(3).

More fgrep tests have been added to more extensively test this one-off
implementation.

MFC after: 2 weeks?

Test Plan

Run the tests, Luke

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
kevans created this revision.Aug 17 2017, 3:35 AM
emaste added inline comments.Aug 18 2017, 2:52 PM
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
716 ↗(On Diff #32151)

Probably worth adding a test for something like -e "BAR" -e "bAz"

726 ↗(On Diff #32151)

any reason 'h' is missing, just out of curiosity?

usr.bin/grep/grep.c
743–747 ↗(On Diff #32151)

I'd generally have a slight preference for

#ifdef WITH_INTERNAL_NOSPEC
        if (!matchall && grepbehave != GREP_FIXED) {
#else
        if (!matchall) {
#endif
usr.bin/grep/util.c
360 ↗(On Diff #32151)

Might be worth extending this comment with info you have in the proposed commit message above (i.e. does not apply in default FreeBSD config, applies if built against libgnuregex or on non-FreeBSD OSes).

372 ↗(On Diff #32151)

Ought not use _-prefixed variables, what about strstr_fn = strcasestr or such?

kevans updated this revision to Diff 32204.Aug 18 2017, 3:12 PM
  • Address concerns by emaste@
kevans marked 4 inline comments as done.Aug 18 2017, 3:15 PM
kevans added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
716 ↗(On Diff #32151)

Right- actual mixed-case stuff makes sense =p.

726 ↗(On Diff #32151)

Keyboard goofs- 'h' takes extra effort on my Pinebook's keyboard. When I was typing this out, I failed to put in that extra effort and then just carried on with the typo.

usr.bin/grep/util.c
515 ↗(On Diff #32204)

I forgot to update my working copy -- this will not be included since it already went in.

372 ↗(On Diff #32151)

It did indeed feel dirty, but for some reason I hadn't thought of a _fn suffix.

emaste added inline comments.Aug 18 2017, 3:21 PM
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
726 ↗(On Diff #32151)

:)

ngie added a comment.Aug 18 2017, 9:14 PM

Could you please commit the testcases separately? It seems like they are good, but orthogonal to fixing fgrep.

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
690 ↗(On Diff #32204)

Aren't some testcases missing dealing with newline separated patterns?

-F, --fixed-strings
         Interpret PATTERN as a list of fixed strings, separated by
         newlines, any of which is to be matched.
usr.bin/grep/grep.c
727 ↗(On Diff #32204)

Could you please add a comment as to why this is the case?

usr.bin/grep/util.c
74 ↗(On Diff #32204)

Spurious space after "char *".

kevans updated this revision to Diff 32226.Aug 18 2017, 9:34 PM
kevans marked 2 inline comments as done.
  • Address ngie@'s concerns
  • Kill an extra space around strstr_fn's definition

I left a rather verbose comment around the GREP_FIXED cflag setting to explain both
motivation behind the preproc bits and what to do if we're wrong, since there's
one other place that would need amended in that case.

I could, but the only motivation behind these tests was to make sure that my implementation of the fixed matcher is functional -- without having to write the fixed matcher, I would not have bothered with these tests since I'd expect regex(3)'s REG_NOSPEC bits to have already been tested to a better standard than I could duplicate.

That being said, I have no strong opinion either way on whether they get committed together or separately.

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
690 ↗(On Diff #32204)

Negative; that's not an fgrep specific feature, that's just how grep in general interprets PATTERN. This is reflected better in bsdgrep(1), where this isn't specifically mentioned in -F. Breaking up the pattern by newlines happens long before it gets to the bit that I care to test the most, and should happen elsewhere.

cem accepted this revision.Aug 19 2017, 4:43 AM
This revision is now accepted and ready to land.Aug 19 2017, 4:43 AM
ngie accepted this revision.Aug 19 2017, 4:07 PM
This revision was automatically updated to reflect the committed changes.