Page MenuHomeFreeBSD

Update zgrep(1) to correctly handle -f FILE and combined flags
ClosedPublic

Authored by leres on Jul 10 2020, 9:38 PM.

Details

Summary

The zgrep shell wrapper that has been used since r332993 to handle
compressed input files does not handle the -f flag correctly:

  1. Works % zfgrep RELEASE /etc/motd FreeBSD 12.1-RELEASE-p3 GENERIC
  2. Hangs reading from stdin % echo RELEASE > /tmp/0 % zfgrep -f /tmp/0 /etc/motd /etc/motd:FreeBSD 12.1-RELEASE-p3 GENERIC ^Z Suspended leviathan 89 % ps t PID TT STAT TIME COMMAND 11249 4 Ss 0:00.11 -csh (csh) 12190 4 T 0:00.00 /bin/sh /usr/bin/zfgrep -f /tmp/0 /etc/motd 12191 4 T 0:00.00 /usr/bin/zcat -f - 12192 4 T 0:00.00 grep -F -f /tmp/0 -- /etc/motd - 12193 4 R+ 0:00.00 ps t

In addition the script does not handle certain combined flags:

  1. Works % fgrep -we RELEASE /etc/motd FreeBSD 12.1-RELEASE-p3 GENERIC
  2. Fails % zfgrep -we RELEASE /etc/motd grep: RELEASE: No such file or directory

Add tests for the various failure cases and update the zgrep.sh
wrapper script to correct the issues.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

leres requested review of this revision.Jul 10 2020, 9:38 PM
usr.bin/grep/zgrep.sh
82 ↗(On Diff #74289)

Why is this case added? Is it related to the rest of the patch?

153 ↗(On Diff #74289)

So if -e is specified (this script doesn't appear to handle multiple uses of -e at all), we are now passing -e in grep_args, so we are losing quoting. As a result, with this patch the following no longer works: zgrep -e 'foo bar' *.

usr.bin/grep/zgrep.sh
82 ↗(On Diff #74289)

The problem is when you add the wildcard to -*[ABCDXdefm] it can potentially match long flags. Examples are --ignore-case, --line-buffered, --with-filename, and --no-filename because they end with 'e' and 'd' but it could also happen with any --regexp=XXX, --include=XXX, or --exclude=XXX that end with a flag in the match class.

The new version will include a test for this along with another test for a new problem; -e PATTERN followed by other flags; the break in the -e/-*e case needs to be a continue otherwise we stop processing flags prematurely.

153 ↗(On Diff #74289)

The new version will include a test for this case.

Add a test for problems with -e PATTERN when it is quoted. Add a
test for problems with long flags (e.g. --ignore-case), particularly
when reading from stdin. Add a test for -e PATTERN followed by
other flags.

Revert -e handling to preserve quoting. Change break to continue
in -e handling and process possible following flags.

usr.bin/grep/zgrep.sh
153 ↗(On Diff #74289)

Hum... I missed your comment about multiple uses of -e; I did some testing I see that grep applies all -e patterns:

# Works
echo foobar > test
grep -e foo -e xxx test
foobar

# No output
zgrep -e foo -e xxx test

So I should go back to my original idea of keep -e PATTERN in grep_args.

I'll wait for guidance before continuing.

usr.bin/grep/zgrep.sh
153 ↗(On Diff #74289)

Ok so fixing this is hard. I see several approaches to fix this. One is to use a temp file, put multiple -e patterns in the file and use -f tempfile. This is easy but yuck: temp file.

The other involves rewriting the script to loop over $@ so we can look at the quoted versions of arguments:

cat /var/tmp/y.sh
#!/bin/sh
for arg in "$@"; do
    echo "# ${arg}"
done
/var/tmp/y.sh foo \'\" '"foo bar"' "'foo bar'"
# foo
# '"
# "foo bar"
# 'foo bar'

Executing the final pipeline with eval would let the shell strip quoting. I'm not inclined to work on this solution unless it is agreed to be a reasonable solution.

It's worth noting that the grep(1) does not document what happens when you use more than one -e but clearly it handles it the same way it does mutiple patterns in a -f file.

Here's yet another regression I just noticed:

# Works
echo foobar > test
grep -efoo test
foobar

# Incorrect -H output
zgrep -efoo test
test:foobar
usr.bin/grep/zgrep.sh
82 ↗(On Diff #74289)

I see, thanks.

153 ↗(On Diff #74289)

Passing parameters through eval seems a bit dangerous. Without having thought about it carefully, can we guarantee that parameters don't get evaluated directly by the shell? In other words, can we do it in a way that ensures that zgrep.sh won't execute arbitrary commands?

It looks like you are right about the other bug. (It is not a regression from your change, to be clear.)

I don't think fixing these other issues should hold up this patch, once the edge case I pointed out above is fixed.

101 ↗(On Diff #74337)

With this patch, zgrep now behaves differently with -e "".

usr.bin/grep/zgrep.sh
101 ↗(On Diff #74337)

Next version adds a test and solves this.

Next round of fixes

I assume it's reasonable to add tests with atf_skip for the outstanding
regressions? There are (currently) three of these:

echo 'foo bar' > test
zgrep -wefoo test < /dev/null

echo foobar > test
zgrep -e foo -e xxx test

echo "'foo bar'" > test2
echo '"foo bar"' >> test2
zgrep -e "'foo bar'" -e '"foo bar"' test2

I noticed but did not add tests for whitespace in filenames, e.g:

echo foo > 'pattern 2'
echo foobar > test
zgrep -f 'pattern 2' test

Add a test for and fix for the long version of the -f problem:

echo foo > pattern
echo foobar > test
zgrep --file=pattern test </dev/null

Add a test for and fix for the long version of the -e problem:

echo 'foo bar' > test
zgrep --regexp='foo bar' test < /dev/null

New test and logic to avoid breaking an empty -p pattern:

echo foobar > test
zgrep -e '' test

Next round of fixes

I assume it's reasonable to add tests with atf_skip for the outstanding
regressions? There are (currently) three of these:

atf_expect_fail is better here, then if it succeeds it fails and we know it's safe to remove it.

atf_expect_fail is better here, then if it succeeds it fails and we know it's safe to remove it.

That makes sense; thanks!

Change atf_skip to atf_expect_fail as suggested by @kevans.

Thanks for adding new tests. This version looks ok to me. I'd give the other reviewers a couple more days before committing, but once you're ready please feel free to either commit with Approved by: markj or ask one of us here to commit it for you.

This revision is now accepted and ready to land.Jul 13 2020, 4:50 PM

The thought occurs that this is an opportunity to make minimal changes to the modest man page; @matthew remarked that it doesn't reference zstd(1) and I see gzip(1) is also missing. In addition it would probably make sense to add a BUGS section and document the known problems (regressions).

Remove the "-e flags containing quotes" test, it only fails because
of the multiple -e flag issue. Add references to gzip(1) and zstd(1)
to the man page and also document the two known problems.

This revision now requires review to proceed.Jul 17 2020, 7:59 PM
markj added inline comments.
usr.bin/grep/zgrep.1
99 ↗(On Diff #74590)

The BUGS section usually just describes the known bugs without any preamble.

This revision is now accepted and ready to land.Jul 20 2020, 1:37 PM
usr.bin/grep/zgrep.1
99 ↗(On Diff #74590)

I'll remove it.

Remove preamble from BUGS section as suggested by @markj

This revision now requires review to proceed.Jul 20 2020, 8:17 PM
This revision is now accepted and ready to land.Jul 20 2020, 8:23 PM
This revision was automatically updated to reflect the committed changes.