Page MenuHomeFreeBSD

Switch to WITH_GNU_GREP_COMPAT knob instead of WITHOUT_GNU_GREP_COMPAT
ClosedPublic

Authored by kevans on Mar 23 2017, 4:14 AM.
Tags
None
Referenced Files
Unknown Object (File)
May 25 2024, 3:33 AM
Unknown Object (File)
Apr 26 2024, 2:12 AM
Unknown Object (File)
Apr 26 2024, 2:08 AM
Unknown Object (File)
Apr 26 2024, 2:08 AM
Unknown Object (File)
Apr 26 2024, 2:08 AM
Unknown Object (File)
Apr 26 2024, 2:08 AM
Unknown Object (File)
Apr 26 2024, 2:08 AM
Unknown Object (File)
Apr 25 2024, 8:03 PM

Details

Summary

The GNU extension bits in the base system are old, no longer faithful to upstream, and surprising in some regards. Switch to documenting WITH_GNU_GREP_COMPAT and default GNU_GREP_COMPAT to OFF in the name of good behavior.

PR: 184733, 191086

Test Plan

Run the kyua tests to ensure we've not caused any regressions

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I think this is a reasonable change to make. It'd be good to have a list of what the "gnu extensions" are for the commit message, and perhaps to be mentioned in the WITH_GNU_GREP_COMPAT file though.

Ditto what Ed said already.

This revision is now accepted and ready to land.Apr 4 2017, 12:31 AM

PR191086: GNU grep and WITH_GNU_GREP_COMPAT bsdgrep do not accept [[:<:]] and [[:>:]] character classes.

tools/build/options/WITH_GNU_GREP_COMPAT
2 โ†—(On Diff #26588)
  • should be GNU extensions (although though the issue exists in the WITHOUT_ file already)
  • we should probably mention that it includes the extensions by linking against libgnuregex (as that carries LGPL issues)

Heh, shucks. =) I'm not familiar with the GNU extensions myself outside of knowing that the mentioned PRs exhibited broken behavior with libgnuregex and was fine without -- it'll take me a little bit to scrape through our version of libgnuregex and make somewhat of an exhaustive list of differences.

tools/build/options/WITH_GNU_GREP_COMPAT
2 โ†—(On Diff #26588)

Something along the lines of Set this option to include the GNU extensions in BSD grep by linking against libgnuregex, which is licensed under the LGPL?

Be sure to do an exp- run with the final bit of code that's to be committed.

In D10114#212405, @ngie wrote:

Be sure to do an exp- run with the final bit of code that's to be committed.

I can't imagine many ports want to use /usr/bin/bsdgrep and not grep, and switching BSD grep to be installed as /usr/bin/grep really needs an exp-run. I suspect we should perform the exp-run with:

diff --git a/share/mk/src.opts.mk b/share/mk/src.opts.mk
index 16c2527de51f..afc4a51e8e24 100644
--- a/share/mk/src.opts.mk
+++ b/share/mk/src.opts.mk
@@ -62,6 +62,7 @@ __DEFAULT_YES_OPTIONS = \
     BOOTPARAMD \
     BOOTPD \
     BSD_CPIO \
+    BSD_GREP \
     BSDINSTALL \
     BSNMP \
     BZIP2 \
@@ -99,8 +100,6 @@ __DEFAULT_YES_OPTIONS = \
     GDB \
     GNU \
     GNU_DIFF \
-    GNU_GREP \
-    GNU_GREP_COMPAT \
     GPIO \
     GPL_DTC \
     GROFF \
@@ -179,9 +180,10 @@ __DEFAULT_YES_OPTIONS = \
     ZONEINFO
 
 __DEFAULT_NO_OPTIONS = \
-    BSD_GREP \
     CLANG_EXTRAS \
     DTRACE_TESTS \
+    GNU_GREP \
+    GNU_GREP_COMPAT \
     HESIOD \
     LIBSOFT \
     NAND \
In D10114#212405, @ngie wrote:

Be sure to do an exp- run with the final bit of code that's to be committed.

I can't imagine many ports want to use /usr/bin/bsdgrep and not grep, and switching BSD grep to be installed as /usr/bin/grep really needs an exp-run. I suspect we should perform the exp-run with:

diff --git a/share/mk/src.opts.mk b/share/mk/src.opts.mk
index 16c2527de51f..afc4a51e8e24 100644
--- a/share/mk/src.opts.mk
+++ b/share/mk/src.opts.mk
@@ -62,6 +62,7 @@ __DEFAULT_YES_OPTIONS = \
     BOOTPARAMD \
     BOOTPD \
     BSD_CPIO \
+    BSD_GREP \
     BSDINSTALL \
     BSNMP \
     BZIP2 \
@@ -99,8 +100,6 @@ __DEFAULT_YES_OPTIONS = \
     GDB \
     GNU \
     GNU_DIFF \
-    GNU_GREP \
-    GNU_GREP_COMPAT \
     GPIO \
     GPL_DTC \
     GROFF \
@@ -179,9 +180,10 @@ __DEFAULT_YES_OPTIONS = \
     ZONEINFO
 
 __DEFAULT_NO_OPTIONS = \
-    BSD_GREP \
     CLANG_EXTRAS \
     DTRACE_TESTS \
+    GNU_GREP \
+    GNU_GREP_COMPAT \
     HESIOD \
     LIBSOFT \
     NAND \

Ok. This review is separate from what I was anticipating.. carry on!

kevans edited edge metadata.
  • gnu => GNU
  • Add note about linking against LGPL'd libgnuregex
This revision now requires review to proceed.Apr 4 2017, 7:54 PM
tools/build/options/WITH_GNU_GREP_COMPAT
2 โ†—(On Diff #26588)

There should be a hard/soft stop in that sentence. The first part is talking about GNU extensions being implemented. The second part is talking about libgnuregex's less permissive license.

Reminder/note: mdoc says sentences should start on new lines.

In D10114#212675, @ngie wrote:

Reminder/note: mdoc says sentences should start on new lines.

Ahhh, excellent! I had no idea that mdoc(7) was a thing, and I was just starting to worry about making man changes without a document similar to style(9) that I can reference. This looks to be exactly what I needed, though. Thanks!

tools/build/options/WITH_GNU_GREP_COMPAT
2 โ†—(On Diff #26588)

Or maybe we should just drop the LGPL comment: Set this option to include the GNU extensions in BSD grep, by linking against libgnuregex.

tools/build/options/WITH_GNU_GREP_COMPAT
2 โ†—(On Diff #26588)

The name "libgnuregex" alone would probably spur further investigation into the exact license of said library.

It would be nice if the the license for libgnuregex was more prominent than it is now, though.

  • Drop libgnuregex licensing comment

According to http://www.regular-expressions.info, GNU extensions:

  • Add missing quantifiers to BREs: \?, \+
  • Add branching to BREs: \|
  • Add backreferences (\1 through \9) to EREs
  • Add \w, \W, \s, and \S corresponding to :alnum:, [^[:alnum:]], :space:, and [^[:space:]] respectively
  • Add word boundaries and anchors:
    • \b: word boundary
    • \B: not word boundary
    • \<: Strt of word
    • \>: End of word
    • \`: Start of subject string
    • \': End of subject string

Additionally, I've noted that GNU utilities seem to allow patterns like "a{,}" or "a{,5}" to mean "any number of a" or "up to 5 a" respectively, rather than the format described by re_format(7) where the start of the interval at least must be specified, while the end may be omitted.

I could slowly implement some of these into regex(3) if that is deemed desirable, whether it be through some kind of additional flag or #define support.

This revision was automatically updated to reflect the committed changes.