Details

Reviewers

emaste
cem
ngie

Commits

rS317665: bsdgrep: fix -w -v matching improperly with certain patterns

Summary

-w and -v flag matching is mostly functional, but with some minor problems. In this case, there were two problems:

1.) -w flag processing only allowed one iteration through pattern matching on a line. This is problematic if one pattern could match more than once, or if there were multiple patterns and the earliest/longest match was not the most ideal, and
2.) I "fixed" things to not further process a line if the first iteration through patterns didn't produce any matches. This is clearly wrong if we're dealing with the more restrictive -w matching.

#2 breakage could have also occurred before my broad rewrites, but it would be more arbitrary based on input patterns as to whether or not it actually affected things.

Ultimately, we can fix both of these by forcing a retry of the patterns after advancing just past the start of the first match if we're doing more restrictive -w matching and we didn't get any hits to start with. We can also go ahead and move -v flag processing outside of the loop it's in so that we have as much chance as possible to match in the more restrictive cases. This wasn't strictly wrong, but it could be a little more error prone.

While here, I introduced some regressions tests for this behavior and went ahead and fixed some excessive wrapping nearby that actually hindered my ability to parse through these if statements. gnugrep(1) does continue to pass these new tests.

PR: 218467, 218811

Test Plan

Ensure no regressions in Kyua tests, to include newly introduced tests

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kevans created this revision.Apr 9 2017, 4:27 AM

Herald added a reviewer: ngie. · View Herald TranscriptApr 9 2017, 4:27 AM

Harbormaster completed remote builds in B8624: Diff 27250.Apr 9 2017, 4:27 AM

cem added inline comments.Apr 9 2017, 5:27 AM

usr.bin/grep/util.c
319–322 ↗	(On Diff #27250)	These changes are unrelated, right? If so, please leave them out of this patch for now.
324 ↗	(On Diff #27250)	Ditto
327 ↗	(On Diff #27250)	There's no corresponding check for whole word matching. Seems like we'll retry in any case?
368 ↗	(On Diff #27250)	style(9) nit: no space between `st` and `++`.

kevans added inline comments.Apr 9 2017, 5:33 AM

usr.bin/grep/util.c
319–322 ↗	(On Diff #27250)	Alright, will revise in the morning.
327 ↗	(On Diff #27250)	Right, so I didn't want to limit retry to just the wflag case, and there is only one path to set retry at this point- through the wflag. It's entirely possible that I might later find a reason to allow retry in other paths, such as -x flag (for some reason).

LGTM modulo nits which you'll fix.

This revision is now accepted and ready to land.Apr 9 2017, 5:47 AM

Remove unrelated style changes, fix style(9) nit

This revision now requires review to proceed.Apr 9 2017, 1:36 PM

Harbormaster completed remote builds in B8636: Diff 27263.Apr 9 2017, 1:36 PM

kevans marked 6 inline comments as done.Apr 9 2017, 1:36 PM

cem accepted this revision.Apr 9 2017, 3:35 PM

This revision is now accepted and ready to land.Apr 9 2017, 3:35 PM

Rebase to account for recent commits, stop polluting repo with trivial test inputs that can be generated on the fly

This revision now requires review to proceed.Apr 17 2017, 4:54 PM

emaste added inline comments.Apr 17 2017, 5:34 PM

usr.bin/grep/util.c
355–357 ↗	(On Diff #27487)	overly-long lines

kevans added inline comments.Apr 17 2017, 5:41 PM

usr.bin/grep/util.c
355–357 ↗	(On Diff #27487)	Ah, drats. I forgot to take into account tab width.

Re-wrap long lines, take into account tab widths

Harbormaster completed remote builds in B8762: Diff 27488.Apr 17 2017, 5:45 PM

kevans marked 2 inline comments as done.Apr 17 2017, 5:45 PM

Slight optimization; advance to the end of the earliest match rather than one character at a time

Wrapping, use start of match instead of end of match due to bug found and add test case

Alright, sorry, I think I'm done tweaking this. Advancing one character at a time could've been pretty bad for long line lengths and one or two patterns matching not at the beginning of the string.

Advancing to just part the start of the shortest match is pretty safe and can reduce time spent here by quite a bit.

I've also added a test case for a pattern that can match in whole or subset, but only subset matches -w, to make sure we test the case that I tried to break in one of these middle revisions.

cem accepted this revision.Apr 18 2017, 4:49 PM

This revision is now accepted and ready to land.Apr 18 2017, 4:49 PM

Also note that the -w flag in general could probably be optimized by actually embedding pattern in :<: and [[:>::] if !(cflags&REG_NOSPEC); we could then avoid doing the after-match check for word boundaries in all but the literal (fgrep/-F) case, and all matches by regexec(3) could be considered in !REG_NOSPEC case.

ngie accepted this revision.Apr 18 2017, 5:47 PM

ngie added inline comments.

usr.bin/grep/util.c
362–363 ↗	(On Diff #27525)	I'm not 100% caught up with regcomp and friends, but can `.rm_so` ever be -1 here?

kevans added inline comments.Apr 18 2017, 5:50 PM

usr.bin/grep/util.c
362–363 ↗	(On Diff #27525)	Should not be possible; we evacuate further up if we didn't actually match. We've requested exactly one match and we matched, so that match should be filled to a valid substring of the subject.

Ping @emaste, anything else on this one? =)

kevans edited the summary of this revision. (Show Details)Apr 24 2017, 9:49 PM

kevans edited the summary of this revision. (Show Details)Apr 26 2017, 7:03 PM

Closed by commit rS317665: bsdgrep: fix -w -v matching improperly with certain patterns (authored by emaste). · Explain WhyMay 2 2017, 2:32 AM

This revision was automatically updated to reflect the committed changes.

bsdgrep(1): Fix -w -v matching improperly with certain patterns
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 27903

head/contrib/netbsd-tests/usr.bin/grep/t_grep.sh

head/usr.bin/grep/util.c

bsdgrep(1): Fix -w -v matching improperly with certain patternsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 27903

head/contrib/netbsd-tests/usr.bin/grep/t_grep.sh

head/usr.bin/grep/util.c

bsdgrep(1): Fix -w -v matching improperly with certain patterns
ClosedPublic
Actions

Revision Contents
Changeset List