Page MenuHomeFreeBSD

bsdgrep(1): Print more than MAX_LINE_MATCHES in a line if they exist
ClosedPublic

Authored by kevans on May 2 2017, 9:16 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 15, 4:00 PM
Unknown Object (File)
Wed, Nov 13, 5:00 AM
Unknown Object (File)
Oct 9 2024, 7:56 PM
Unknown Object (File)
Oct 2 2024, 5:34 AM
Unknown Object (File)
Sep 26 2024, 8:25 AM
Unknown Object (File)
Sep 24 2024, 8:41 AM
Unknown Object (File)
Sep 24 2024, 5:40 AM
Unknown Object (File)
Sep 23 2024, 11:05 PM
Subscribers
None

Details

Summary

Currently at 32, MAX_LINE_MATCHES is not necessarily reasonable as a cap
on the number of matches we can produce for a given line. In fact, there probably
is no such thing as a reasonable cap. If in one invocation we've matched more than
MAX_LINE_MATCHES, keep processing and matching from the last match we found until
we've found all matches.

For the regression test, we produce a number of matches that will likely be larger than we'll ever raise MAX_LINE_MATCHES to, 4096, and make sure we actually get 4096 lines of output with the -o flag so that we know we're not truncating it for certain numbers of matches close to or at MAX_LINE_MATCHES. We'll also make sure that every distinct line is getting its own line number to detect line metadata not being printed as appropriate along the way.

This test does not currently pass with bsdgrep or gnugrep in base. The former will be fixed in another patch correcting metadata printing for all combinations of -o/--color.

The primary issue, excessive matching, was reported in conjunction with -w breakage
in PR 218811, this patch fixing up the remaining issue reported.

PR: 218811

Test Plan

Ensure no regressions with kyua test suite

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 9034
Build 9440: arc lint + arc unit

Event Timeline

usr.bin/grep/util.c
290

space here, while (

  • Fix space issue I forgot to correct prior to submission

(Approved modulo nits and clarifying question.)

usr.bin/grep/util.c
366

Maybe remove these redundant initializations.

631–641

Help me understand this change. We've gone from printing once per line to printing once per match. Was it just wrong before?

This revision is now accepted and ready to land.May 3 2017, 2:28 AM
usr.bin/grep/util.c
631–641

Ah, I need to rework that. I frequently forget about colored output. =)

So, there's two scenarios where we go down this code path: -o and --color. The current behavior is wrong in the former scenario since multiple matches in a line are separated by newlines, and each *line* should prefixed with file name/line number/offset based on input flags.

We should probably just print before the loop for not -o output and print within the loop for -o, rather than print before the loop and then every match but the first for -o. I'll do this and throw in a test for the color case (as well as color+o).

It also looks like the putchar() in the !oflag branch at the end of this section got reverted to '\n' instead of fileeol, but I'll leave that out of here.

kevans edited edge metadata.
  • Revert metadata change; that's turning into worthy of its own review
  • Remove redundant initializations

The second atf_check that verifies we have no matches that aren't prefixed with
the line number should stay, just in case things later get changed and we end up with
hinky behavior once we've produced more than MAX_LINE_MATCHES matches.

This revision now requires review to proceed.May 3 2017, 3:01 AM
This revision is now accepted and ready to land.May 3 2017, 3:17 AM
kevans added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429

I botched this part again -- $type should be $?. If this revision is otherwise good, can this change be slipped into the commit?

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429

Yep!

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429

Err... please update the diff (it'll make it easier for the person that does the final commit)

usr.bin/grep/util.c
66

Add a comment for consistency with the other fields?

366

What is nst?

512

What is st?

kevans added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429

Alrighty, I'm going to have to update the diff anywho for other concerns presented -- I've been told in the past that small changes like this don't necessarily need a full round through Phabricator on their own.

usr.bin/grep/util.c
366

Well, I think of it as 'new start'.

512

I think of it as 'start', for 'start of section to match.' This is going to get changed (along with c, because the name/type bother me) in a future changeset, but thus far I've tried to avoid renaming unless I'm actually changing semantics. st will more likely just go away in favor of using the new pc->lnstart, since that should always be accurate.

usr.bin/grep/util.c
512

I should note that I haven't done this yet because I've yet to do my due diligence in making sure that using pc->lnstart in that way won't complicate other things existing in local branches or future plans.

usr.bin/grep/util.c
366

I wouldn't have thought nst was new start and st was start (also: start of what?). Could you please expound the variable names so their purposes are more intuitive please?

usr.bin/grep/util.c
366

"Next" start and start are intuitive given their usage. I don't think there's any value in blowing out some already long lines with longer names.

If anything can be done for clarity, I think refactoring this larger function into smaller routines might be good. But it doesn't have to happen in this patch.

kevans edited edge metadata.
  • $type => $?
  • Add comment for lnstart field
This revision now requires review to proceed.May 5 2017, 5:31 PM
kevans added inline comments.
usr.bin/grep/util.c
366

Refactoring this a little further should be fairly intuitive now that I've simplified this function quite a bit and moved match information out into the parsing context.

usr.bin/grep/util.c
245–246

Was this bit added intentionally?

cem added inline comments.
usr.bin/grep/util.c
245–246

Ignore me, I misinterpreted phabricator's "light green." Probably this was added in another commit and you just rebased your changes.

This revision is now accepted and ready to land.May 11 2017, 4:28 PM
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
438

This fails until D10580 goes in so I am going to comment it out initially, and enable it with D10580.

This revision was automatically updated to reflect the committed changes.
head/contrib/netbsd-tests/usr.bin/grep/t_grep.sh
428–430 ↗(On Diff #28590)

This could have been done with: jot -b 'x' 4096 | tr -d '\n' > test.in

433 ↗(On Diff #28590)

Why commented out?

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
438

Ah, ok... I didn't catch that part before I posted my previous comment.