Page MenuHomeFreeBSD

Update zgrep(1) to correctly handle -f FILE and combined flags
ClosedPublic

Authored by leres on Jul 10 2020, 9:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Jan 19 2024, 2:46 AM
Unknown Object (File)
Dec 30 2023, 4:20 PM
Unknown Object (File)
Dec 29 2023, 9:43 PM
Unknown Object (File)
Dec 29 2023, 6:16 AM
Unknown Object (File)
Dec 24 2023, 11:10 PM
Unknown Object (File)
Dec 20 2023, 4:20 AM
Unknown Object (File)
Dec 7 2023, 10:14 AM
Unknown Object (File)
Dec 1 2023, 12:07 AM
Subscribers

Details

Summary

The zgrep shell wrapper that has been used since r332993 to handle
compressed input files does not handle the -f flag correctly:

  1. Works % zfgrep RELEASE /etc/motd FreeBSD 12.1-RELEASE-p3 GENERIC
  2. Hangs reading from stdin % echo RELEASE > /tmp/0 % zfgrep -f /tmp/0 /etc/motd /etc/motd:FreeBSD 12.1-RELEASE-p3 GENERIC ^Z Suspended leviathan 89 % ps t PID TT STAT TIME COMMAND 11249 4 Ss 0:00.11 -csh (csh) 12190 4 T 0:00.00 /bin/sh /usr/bin/zfgrep -f /tmp/0 /etc/motd 12191 4 T 0:00.00 /usr/bin/zcat -f - 12192 4 T 0:00.00 grep -F -f /tmp/0 -- /etc/motd - 12193 4 R+ 0:00.00 ps t

In addition the script does not handle certain combined flags:

  1. Works % fgrep -we RELEASE /etc/motd FreeBSD 12.1-RELEASE-p3 GENERIC
  2. Fails % zfgrep -we RELEASE /etc/motd grep: RELEASE: No such file or directory

Add tests for the various failure cases and update the zgrep.sh
wrapper script to correct the issues.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

leres requested review of this revision.Jul 10 2020, 9:38 PM
usr.bin/grep/zgrep.sh
82 ↗(On Diff #74289)

Why is this case added? Is it related to the rest of the patch?

153 ↗(On Diff #74289)

So if -e is specified (this script doesn't appear to handle multiple uses of -e at all), we are now passing -e in grep_args, so we are losing quoting. As a result, with this patch the following no longer works: zgrep -e 'foo bar' *.

usr.bin/grep/zgrep.sh
82 ↗(On Diff #74289)

The problem is when you add the wildcard to -*[ABCDXdefm] it can potentially match long flags. Examples are --ignore-case, --line-buffered, --with-filename, and --no-filename because they end with 'e' and 'd' but it could also happen with any --regexp=XXX, --include=XXX, or --exclude=XXX that end with a flag in the match class.

The new version will include a test for this along with another test for a new problem; -e PATTERN followed by other flags; the break in the -e/-*e case needs to be a continue otherwise we stop processing flags prematurely.

153 ↗(On Diff #74289)

The new version will include a test for this case.

Add a test for problems with -e PATTERN when it is quoted. Add a
test for problems with long flags (e.g. --ignore-case), particularly
when reading from stdin. Add a test for -e PATTERN followed by
other flags.

Revert -e handling to preserve quoting. Change break to continue
in -e handling and process possible following flags.

usr.bin/grep/zgrep.sh
153 ↗(On Diff #74289)

Hum... I missed your comment about multiple uses of -e; I did some testing I see that grep applies all -e patterns:

# Works
echo foobar > test
grep -e foo -e xxx test
foobar

# No output
zgrep -e foo -e xxx test

So I should go back to my original idea of keep -e PATTERN in grep_args.

I'll wait for guidance before continuing.

usr.bin/grep/zgrep.sh
153 ↗(On Diff #74289)

Ok so fixing this is hard. I see several approaches to fix this. One is to use a temp file, put multiple -e patterns in the file and use -f tempfile. This is easy but yuck: temp file.

The other involves rewriting the script to loop over $@ so we can look at the quoted versions of arguments:

cat /var/tmp/y.sh
#!/bin/sh
for arg in "$@"; do
    echo "# ${arg}"
done
/var/tmp/y.sh foo \'\" '"foo bar"' "'foo bar'"
# foo
# '"
# "foo bar"
# 'foo bar'

Executing the final pipeline with eval would let the shell strip quoting. I'm not inclined to work on this solution unless it is agreed to be a reasonable solution.

It's worth noting that the grep(1) does not document what happens when you use more than one -e but clearly it handles it the same way it does mutiple patterns in a -f file.

Here's yet another regression I just noticed:

# Works
echo foobar > test
grep -efoo test
foobar

# Incorrect -H output
zgrep -efoo test
test:foobar
usr.bin/grep/zgrep.sh
101 ↗(On Diff #74337)

With this patch, zgrep now behaves differently with -e "".

82 ↗(On Diff #74289)

I see, thanks.

153 ↗(On Diff #74289)

Passing parameters through eval seems a bit dangerous. Without having thought about it carefully, can we guarantee that parameters don't get evaluated directly by the shell? In other words, can we do it in a way that ensures that zgrep.sh won't execute arbitrary commands?

It looks like you are right about the other bug. (It is not a regression from your change, to be clear.)

I don't think fixing these other issues should hold up this patch, once the edge case I pointed out above is fixed.

usr.bin/grep/zgrep.sh
101 ↗(On Diff #74337)

Next version adds a test and solves this.

Next round of fixes

I assume it's reasonable to add tests with atf_skip for the outstanding
regressions? There are (currently) three of these:

echo 'foo bar' > test
zgrep -wefoo test < /dev/null

echo foobar > test
zgrep -e foo -e xxx test

echo "'foo bar'" > test2
echo '"foo bar"' >> test2
zgrep -e "'foo bar'" -e '"foo bar"' test2

I noticed but did not add tests for whitespace in filenames, e.g:

echo foo > 'pattern 2'
echo foobar > test
zgrep -f 'pattern 2' test

Add a test for and fix for the long version of the -f problem:

echo foo > pattern
echo foobar > test
zgrep --file=pattern test </dev/null

Add a test for and fix for the long version of the -e problem:

echo 'foo bar' > test
zgrep --regexp='foo bar' test < /dev/null

New test and logic to avoid breaking an empty -p pattern:

echo foobar > test
zgrep -e '' test

Next round of fixes

I assume it's reasonable to add tests with atf_skip for the outstanding
regressions? There are (currently) three of these:

atf_expect_fail is better here, then if it succeeds it fails and we know it's safe to remove it.

atf_expect_fail is better here, then if it succeeds it fails and we know it's safe to remove it.

That makes sense; thanks!

Change atf_skip to atf_expect_fail as suggested by @kevans.

Thanks for adding new tests. This version looks ok to me. I'd give the other reviewers a couple more days before committing, but once you're ready please feel free to either commit with Approved by: markj or ask one of us here to commit it for you.

This revision is now accepted and ready to land.Jul 13 2020, 4:50 PM

The thought occurs that this is an opportunity to make minimal changes to the modest man page; @matthew remarked that it doesn't reference zstd(1) and I see gzip(1) is also missing. In addition it would probably make sense to add a BUGS section and document the known problems (regressions).

Remove the "-e flags containing quotes" test, it only fails because
of the multiple -e flag issue. Add references to gzip(1) and zstd(1)
to the man page and also document the two known problems.

This revision now requires review to proceed.Jul 17 2020, 7:59 PM
markj added inline comments.
usr.bin/grep/zgrep.1
99 ↗(On Diff #74590)

The BUGS section usually just describes the known bugs without any preamble.

This revision is now accepted and ready to land.Jul 20 2020, 1:37 PM
usr.bin/grep/zgrep.1
99 ↗(On Diff #74590)

I'll remove it.

Remove preamble from BUGS section as suggested by @markj

This revision now requires review to proceed.Jul 20 2020, 8:17 PM
This revision is now accepted and ready to land.Jul 20 2020, 8:23 PM
This revision was automatically updated to reflect the committed changes.