Page MenuHomeFreeBSD

bsdgrep(1): Correct line metadata printing (-b,-H,-n) on a per-line basis
ClosedPublic

Authored by kevans on May 3 2017, 4:49 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 12 2024, 8:13 PM
Unknown Object (File)
Feb 23 2024, 1:28 PM
Unknown Object (File)
Jan 28 2024, 3:53 PM
Unknown Object (File)
Jan 28 2024, 11:26 AM
Unknown Object (File)
Jan 24 2024, 4:14 AM
Unknown Object (File)
Jan 22 2024, 1:02 PM
Unknown Object (File)
Dec 25 2023, 1:42 PM
Unknown Object (File)
Dec 20 2023, 1:28 AM
Subscribers
None

Details

Summary

Current metadata printing with either the -b, -H, and -n flags suffers
from a couple of flaws:

1.) -b/offset printing is currently broken when used in conjunction with -o
2.) With -o, bsdgrep does not currently print metadata for every match/line, just the first match of a line
3.) None of this is tested, so the above two were caught in testing for D10577

So we address these issues by making sure we output this data per-match if the -o flag is specified,
and prior to outputting any matches if -o but not --color, since --color alone will not generate
a new line of output for every iteration over the matches.

To correct -b output, we go ahead and fudge the line offset as we're printing matches. This
can be justified in that we're printing metadata for the line at a given point, rather
than the whole line, so we commandeer the pc->ln.off to indicate where in the line
we're currently printing.

While here, make sure we're using grep_printline in -A context so that we don't
accidentally botch that somehow. Context printing should explicitly *never* look
at the parsing context, just the line.

The tests included do not pass with gnugrep in base due to it exhibiting similar
quirky behavior that bsdgrep previously exhibited.

struct members also got realigned here, since I was adding member to indicate how
many times printline() has been invoked on a given line.

Test Plan

Ensure no regressions in Kyua tests

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

  • Needless whitespace change
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
501 ↗(On Diff #27963)

What about --color?

519–528 ↗(On Diff #27963)

I don't necessarily follow these new testcases. What kind of example output are you trying to guard against/prove will not occur?

usr.bin/grep/util.c
67 ↗(On Diff #27963)

bool instead of int?

298 ↗(On Diff #27963)

Gratuitous change renaming the function?

626 ↗(On Diff #27963)

Why the special !oflag case here?

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
501 ↗(On Diff #27963)

I've only listed here flags that actually add metadata to output lines; --color and -o aren't specifically mentioned since they change the contents of the lines that are output but not the metadata that we care about for these tests.

519–528 ↗(On Diff #27963)

We're trying to make sure that:

1.) The metadata is present at the beginning of the line, and
2.) The metadata is *not* being re-output in the middle of the line

This is specifically a potential problem with ! -o and --color, since metadata gets printed per-match with -o it's entirely possible that --color could get screwed up and end up with weird results like:

2:yy2:yy

Given something like grep --color=always -e 'yy'. This is a thing that I accidentally did when fixing the problem initially.

usr.bin/grep/util.c
67 ↗(On Diff #27963)

I'm torn on this one, because one could conceivably want to know how many times we've printed on the current line (for debugging purposes, mostly). This member currently reflect thats that information, though we only care about pc->printed == 0 in normal execution.

298 ↗(On Diff #27963)

Safer; this was briefly mentioned in the revision summary. Context printing shouldn't including the parsing context, to be safe in case the context has stale data in it for some reason. This is consistent with printing in queue.c, and is a thing solely because we can strip out some of the logic from printline when we know we're not going to be printing any matches.

I am open to rename the parsing context aware printline, but I couldn't decide on a good name when separating these out.

626 ↗(On Diff #27963)

Because this covers the case of --color and ! -o output, where we'll just print one line with all matches. I did this so that the match printing loop just below doesn't have to be cluttered with logic to only print metadata if !oflag on the first match or oflag on every match. This approach felt cleaner in the end.

cem added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
506–508 ↗(On Diff #27963)

Does it pass with GNU grep from ports? It would be good to be able to test this behavior against that version of grep.

usr.bin/grep/util.c
621–624 ↗(On Diff #27963)

I would describe this as maybe "overloading" or "repurposing" instead of fudging, just to be a little more specific.

I'm not really a fan of doing that, either. Could we just add another field in pc->ln instead?

662 ↗(On Diff #27963)

I prefer suffix ++ in C code — it's not C++ where you have overloaded operators. And it would match the preceding i++.

This revision is now accepted and ready to land.May 11 2017, 4:18 PM
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
506–508 ↗(On Diff #27963)

It does(-ish)- I'll make a small amendment here in a bit. GNU grep from ports also colors the metadata, causing check_expr to be a little too strict because of the color output being sprinkled around the line number.

usr.bin/grep/util.c
621–624 ↗(On Diff #27963)

Yeah, so, I guess the most correct thing to do would be:

1.) Add another member to save the byte offset independent of the line offset
2.) Take this into account in printline_metadata
3.) Set it to match.rm_so every iteration, reset to 0 as the line offset is advanced in procfile

kevans edited edge metadata.

Address concerns by cem@:

  • Create a separate 'struct str' property to represent the byte offset within a line (if any)
  • Use suffix++
  • Relax test to work despite colored differences between textproc/gnugrep and bsdgrep
This revision now requires review to proceed.May 11 2017, 5:21 PM

Remove cruft... I need to write a script to manage this better

cem added inline comments.
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
506 ↗(On Diff #28251)

Should the first [^:] be [^0-9]?

This revision is now accepted and ready to land.May 11 2017, 5:30 PM
contrib/netbsd-tests/usr.bin/grep/t_grep.sh
506 ↗(On Diff #28251)

[^0-9:] would be a candidate, but ultimately it doesn't really matter- we really just need to make sure that we have some number like sequence, the ':' delimiter, and then a sequence that doesn't include a ':' (i.e. metadata in the middle, since the input file explicitly doesn't use a colon anywhere).

LGTM.

contrib/netbsd-tests/usr.bin/grep/t_grep.sh
506 ↗(On Diff #28251)

Ok.

(I misread it :-). I thought the [0-9] portion as [0-9]* and thought the [^:]* would greedily consume any digits, not verifying what we wanted.)

kevans edited edge metadata.
  • Rebase following rS318571, uncomment test that now passes
  • Revised lnstart field comment to not pass 80 col; now less verbose, but between the name and the comment it should still be clear the intent/usage
This revision now requires review to proceed.May 20 2017, 4:14 AM
This revision was automatically updated to reflect the committed changes.