Page MenuHomeFreeBSD

bsdgrep(1): generally fix matching behavior in one broad sweep
ClosedPublic

Authored by kevans on Mar 22 2017, 9:35 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 6, 10:52 PM
Unknown Object (File)
Sun, Dec 22, 11:09 PM
Unknown Object (File)
Sun, Dec 22, 9:24 PM
Unknown Object (File)
Thu, Dec 12, 12:49 AM
Unknown Object (File)
Wed, Dec 11, 12:52 AM
Unknown Object (File)
Dec 9 2024, 6:57 PM
Unknown Object (File)
Dec 1 2024, 1:19 PM
Unknown Object (File)
Nov 26 2024, 1:04 AM
Subscribers

Details

Summary
  • Set REG_NOTBOL if we've already matched beginning of line and we're examining later parts
  • For each pattern we examine, apply it to the remaining bits of the line rather than (potentially) smaller subsets
  • Check for REG_NOSUB after we've looked at all patterns initially matching the line
  • Keep track of the last match we made to later determine if we're simply not matching any longer or if we need to proceed another byte because we hit a zero-length match
  • Match the earliest and longest bit of each line before moving the beginning of what we match to further in the line, past the end of the longest match; this generally matches how gnugrep(1) seems to behave, and seems like pretty good behavior to me
  • Finally, bail out of printing any matches if we were set to print all (empty pattern) but -o (output matches) was set

PR: 195763, 180990, 197555, 197531, 181263, 209116

Test Plan

Test the test cases provided in these four different PRs. Check kyua tests for no additional regressions.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 8255
Build 8498: arc lint + arc unit

Event Timeline

Thanks for doing this work. grep being GPL (and bsdgrep being an inadequate replacement) is one of the last specters of GPL in the codebase.

Would it be possible to add kyua tests for the issues in PR?

I do intend to address the test coverage for these cases and many others, but I assumed that should be out of scope for this already somewhat large set of changes on the grep(1) that we do not currently install.

I plan to ping @ngie when I go to follow up with more test cases, since they originate from contrib/netbsd-tests and @ngie seems to be well versed in this area.

I have to say, I'm totally unfamiliar with this code base and can't comment on the correctness of these changes. However, if there aren't regressions and we can add testcases for the bugs this is intended to fix, I'm ok with it going in anyway.

usr.bin/grep/util.c
376

Style nit — no space between variable and unary operators.

In D10104#208972, @kevans91_ksu.edu wrote:

I do intend to address the test coverage for these cases and many others, but I assumed that should be out of scope for this already somewhat large set of changes on the grep(1) that we do not currently install.

I like tests and changes to go in together. Maybe even tests first, if you want to separate them. But if you have a strong opinion about it, I'm open to being persuaded.

I plan to ping @ngie when I go to follow up with more test cases, since they originate from contrib/netbsd-tests and @ngie seems to be well versed in this area.

Sounds good.

In D10104#208976, @cem wrote:

I like tests and changes to go in together. Maybe even tests first, if you want to separate them. But if you have a strong opinion about it, I'm open to being persuaded.

I have no strong feelings toward whether I do them now or later, but I do feel that they warrant their own review. My reasoning being that they're tangential enough to the purpose of this review and technically testing both bsdgrep and gnugrep, which leads me to believe there's probably a discrete but potentially overlapping set of people that would be interested in those changes alone.

I'll go ahead and write the additional tests and submit them tonight, hopefully.

  • Remove space between unary operator and identifier
kevans added inline comments.
usr.bin/grep/util.c
376

Thanks! Addressed this instance as well as one further up in this function.

Sneak in a small extra change: use \33[K only at end of match coloring to match gnugrep(1) behavior

Mistakenly introduced an extra line of churn

Replace '00' for matching gnugrep(1) color output

This revision is now accepted and ready to land.Mar 23 2017, 7:31 PM
This revision was automatically updated to reflect the committed changes.