Page MenuHomeFreeBSD

bsdgrep(1): Print more than MAX_LINE_MATCHES in a line if they exist
ClosedPublic

Authored by kevans on May 2 2017, 9:16 PM.
Tags
None
Referenced Files
F81650009: D10577.diff
Fri, Apr 19, 11:25 AM
Unknown Object (File)
Wed, Apr 3, 9:09 AM
Unknown Object (File)
Jan 31 2024, 12:01 PM
Unknown Object (File)
Jan 16 2024, 7:12 PM
Unknown Object (File)
Dec 21 2023, 7:31 AM
Unknown Object (File)
Dec 10 2023, 7:23 PM
Unknown Object (File)
Nov 21 2023, 10:15 AM
Unknown Object (File)
Sep 13 2023, 6:23 AM
Subscribers
None

Details

Summary

Currently at 32, MAX_LINE_MATCHES is not necessarily reasonable as a cap
on the number of matches we can produce for a given line. In fact, there probably
is no such thing as a reasonable cap. If in one invocation we've matched more than
MAX_LINE_MATCHES, keep processing and matching from the last match we found until
we've found all matches.

For the regression test, we produce a number of matches that will likely be larger than we'll ever raise MAX_LINE_MATCHES to, 4096, and make sure we actually get 4096 lines of output with the -o flag so that we know we're not truncating it for certain numbers of matches close to or at MAX_LINE_MATCHES. We'll also make sure that every distinct line is getting its own line number to detect line metadata not being printed as appropriate along the way.

This test does not currently pass with bsdgrep or gnugrep in base. The former will be fixed in another patch correcting metadata printing for all combinations of -o/--color.

The primary issue, excessive matching, was reported in conjunction with -w breakage
in PR 218811, this patch fixing up the remaining issue reported.

PR: 218811

Test Plan

Ensure no regressions with kyua test suite

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

usr.bin/grep/util.c
290 ↗(On Diff #27947)

space here, while (

  • Fix space issue I forgot to correct prior to submission

(Approved modulo nits and clarifying question.)

usr.bin/grep/util.c
366 ↗(On Diff #27951)

Maybe remove these redundant initializations.

631–641 ↗(On Diff #27951)

Help me understand this change. We've gone from printing once per line to printing once per match. Was it just wrong before?

This revision is now accepted and ready to land.May 3 2017, 2:28 AM
usr.bin/grep/util.c
631–641 ↗(On Diff #27951)

Ah, I need to rework that. I frequently forget about colored output. =)

So, there's two scenarios where we go down this code path: -o and --color. The current behavior is wrong in the former scenario since multiple matches in a line are separated by newlines, and each *line* should prefixed with file name/line number/offset based on input flags.

We should probably just print before the loop for not -o output and print within the loop for -o, rather than print before the loop and then every match but the first for -o. I'll do this and throw in a test for the color case (as well as color+o).

It also looks like the putchar() in the !oflag branch at the end of this section got reverted to '\n' instead of fileeol, but I'll leave that out of here.

kevans edited edge metadata.
  • Revert metadata change; that's turning into worthy of its own review
  • Remove redundant initializations

The second atf_check that verifies we have no matches that aren't prefixed with
the line number should stay, just in case things later get changed and we end up with
hinky behavior once we've produced more than MAX_LINE_MATCHES matches.

This revision now requires review to proceed.May 3 2017, 3:01 AM
This revision is now accepted and ready to land.May 3 2017, 3:17 AM
kevans added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429 ↗(On Diff #27958)

I botched this part again -- $type should be $?. If this revision is otherwise good, can this change be slipped into the commit?

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429 ↗(On Diff #27958)

Yep!

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429 ↗(On Diff #27958)

Err... please update the diff (it'll make it easier for the person that does the final commit)

usr.bin/grep/util.c
66 ↗(On Diff #27958)

Add a comment for consistency with the other fields?

366 ↗(On Diff #27958)

What is nst?

512 ↗(On Diff #27958)

What is st?

kevans added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
429 ↗(On Diff #27958)

Alrighty, I'm going to have to update the diff anywho for other concerns presented -- I've been told in the past that small changes like this don't necessarily need a full round through Phabricator on their own.

usr.bin/grep/util.c
366 ↗(On Diff #27958)

Well, I think of it as 'new start'.

512 ↗(On Diff #27958)

I think of it as 'start', for 'start of section to match.' This is going to get changed (along with c, because the name/type bother me) in a future changeset, but thus far I've tried to avoid renaming unless I'm actually changing semantics. st will more likely just go away in favor of using the new pc->lnstart, since that should always be accurate.

usr.bin/grep/util.c
512 ↗(On Diff #27958)

I should note that I haven't done this yet because I've yet to do my due diligence in making sure that using pc->lnstart in that way won't complicate other things existing in local branches or future plans.

usr.bin/grep/util.c
366 ↗(On Diff #27958)

I wouldn't have thought nst was new start and st was start (also: start of what?). Could you please expound the variable names so their purposes are more intuitive please?

usr.bin/grep/util.c
366 ↗(On Diff #27958)

"Next" start and start are intuitive given their usage. I don't think there's any value in blowing out some already long lines with longer names.

If anything can be done for clarity, I think refactoring this larger function into smaller routines might be good. But it doesn't have to happen in this patch.

kevans edited edge metadata.
  • $type => $?
  • Add comment for lnstart field
This revision now requires review to proceed.May 5 2017, 5:31 PM
kevans added inline comments.
usr.bin/grep/util.c
366 ↗(On Diff #27958)

Refactoring this a little further should be fairly intuitive now that I've simplified this function quite a bit and moved match information out into the parsing context.

usr.bin/grep/util.c
240–244 ↗(On Diff #28071)

Was this bit added intentionally?

cem added inline comments.
usr.bin/grep/util.c
240–244 ↗(On Diff #28071)

Ignore me, I misinterpreted phabricator's "light green." Probably this was added in another commit and you just rebased your changes.

This revision is now accepted and ready to land.May 11 2017, 4:28 PM
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
438 ↗(On Diff #28071)

This fails until D10580 goes in so I am going to comment it out initially, and enable it with D10580.

This revision was automatically updated to reflect the committed changes.
head/contrib/netbsd-tests/usr.bin/grep/t_grep.sh
428–430

This could have been done with: jot -b 'x' 4096 | tr -d '\n' > test.in

433

Why commented out?

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
438 ↗(On Diff #28071)

Ah, ok... I didn't catch that part before I posted my previous comment.