Page MenuHomeFreeBSD

bsdgrep(1): Fix errors with invalid expressions like grep -E { /dev/null
ClosedPublic

Authored by kevans on Mar 23 2017, 3:53 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Dec 10, 5:33 AM
Unknown Object (File)
Nov 22 2024, 10:19 AM
Unknown Object (File)
Nov 9 2024, 1:37 AM
Unknown Object (File)
Nov 3 2024, 5:57 PM
Unknown Object (File)
Oct 20 2024, 3:29 AM
Unknown Object (File)
Oct 8 2024, 1:43 PM
Unknown Object (File)
Sep 28 2024, 9:11 AM
Unknown Object (File)
Sep 27 2024, 12:04 PM
Subscribers

Details

Summary

Invalid expressions with an ultimate compiled pattern length of 0 were not taken into account and caused a segfault while trying to fill in the good suffix table. Address this by making sure we NULL out the BMG properties if the pattern is invalid.

PR: 194823

Test Plan

Run kyua tests to make sure no additional regressions come from this.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 8242
Build 8485: arc lint + arc unit

Event Timeline

Mistakenly switched branches prior to submitting -- this is an attempt at a fix

This revision is now accepted and ready to land.Apr 4 2017, 12:46 AM

(It looks like xmalloc() might also assert() that size > 0, causing a crash in a similar location, but with SIGABRT instead of SIGSEGV.)

kevans edited edge metadata.
  • The BMG variables are already initialized, don't let jemalloc bork them
This revision now requires review to proceed.Apr 4 2017, 3:13 AM

This is true in the case that we use our internal xmalloc implementation, but that's not the default =(. By default, xmalloc is the jemalloc(3) malloc which happily accepts our size=0.

In re-examining this, I did catch that the bmGs should be explicitly initialized by INIT_COMP before getting to this point, so we just need to make sure that we don't kill later expectations due to the jemalloc(3) behavior when size=0.

In D10113#212181, @kevans91_ksu.edu wrote:

This is true in the case that we use our internal xmalloc implementation, but that's not the default =(. By default, xmalloc is the jemalloc(3) malloc which happily accepts our size=0.

Oh, that sucks. Can we either use xmalloc or remove it entirely? Doesn't have to be in this patch.

In re-examining this, I did catch that the bmGs should be explicitly initialized by INIT_COMP before getting to this point, so we just need to make sure that we don't kill later expectations due to the jemalloc(3) behavior when size=0.

Well, malloc(0) -> NULL is sane. Dereferencing more than the allocated bytes is illegal, so there's got to be some other bug... does something not handle REG_ESPACE?

In D10113#212184, @cem wrote:

Oh, that sucks. Can we either use xmalloc or remove it entirely? Doesn't have to be in this patch.

This is one of the rabbits hole I've been going down since my last comment, but malloc with size=0 doesn't result in NULL =(. I would like to rip out xmalloc, personally, because I see little to no advantage to keeping it around -- it only serves now to obfuscate what's really going on.

Well, malloc(0) -> NULL is sane. Dereferencing more than the allocated bytes is illegal, so there's got to be some other bug... does something not handle REG_ESPACE?

Hey, there's the other rabbit hole. If this set of functions fails either due to REG_BADPAT/REG_ESPACE, we fallback to regcomp (regex(3)) and then handle errors from *that* if it throws any. For REG_BADPAT, this might make sense, but REG_ESPACE should be a guaranteed fail.

I have mixed feelings as to where to go with this patch, but I think this version available now is probably the way to go for the time being. Failing compilation due to this exact scenario will do *nothing*, because the regcomp fallback lets this pattern slip through as if it were a blank pattern that matches everything. I personally think that until we're ready to address this in regex(3) (which really should also fail this invalid pattern), we should probably err on the side of consistency for the time being and fix both instances at the same time. What do you think?

In D10113#212187, @kevans91_ksu.edu wrote:
In D10113#212184, @cem wrote:

Oh, that sucks. Can we either use xmalloc or remove it entirely? Doesn't have to be in this patch.

This is one of the rabbits hole I've been going down since my last comment, but malloc with size=0 doesn't result in NULL =(. I would like to rip out xmalloc, personally, because I see little to no advantage to keeping it around -- it only serves now to obfuscate what's really going on.

Thanks, I saw the other patch and it LGTM.

Well, malloc(0) -> NULL is sane. Dereferencing more than the allocated bytes is illegal, so there's got to be some other bug... does something not handle REG_ESPACE?

Hey, there's the other rabbit hole. If this set of functions fails either due to REG_BADPAT/REG_ESPACE, we fallback to regcomp (regex(3)) and then handle errors from *that* if it throws any. For REG_BADPAT, this might make sense, but REG_ESPACE should be a guaranteed fail.

That sounds very reasonable to me.

I have mixed feelings as to where to go with this patch, but I think this version available now is probably the way to go for the time being. Failing compilation due to this exact scenario will do *nothing*, because the regcomp fallback lets this pattern slip through as if it were a blank pattern that matches everything. I personally think that until we're ready to address this in regex(3) (which really should also fail this invalid pattern), we should probably err on the side of consistency for the time being and fix both instances at the same time. What do you think?

What do you mean by both instances? (I'm also a little confused why bsdgrep falls back to regex(3), but, that's probably another one of those rabbit holes :-).)

This revision is now accepted and ready to land.Apr 4 2017, 5:26 AM
In D10113#212209, @cem wrote:

I have mixed feelings as to where to go with this patch, but I think this version available now is probably the way to go for the time being. Failing compilation due to this exact scenario will do *nothing*, because the regcomp fallback lets this pattern slip through as if it were a blank pattern that matches everything. I personally think that until we're ready to address this in regex(3) (which really should also fail this invalid pattern), we should probably err on the side of consistency for the time being and fix both instances at the same time. What do you think?

What do you mean by both instances? (I'm also a little confused why bsdgrep falls back to regex(3), but, that's probably another one of those rabbit holes :-).)

Sorry, both instances meaning one instance in the regex/ bits locally as well as the other instance in regex(3), implying that matching all on this invalid expression is definitively wrong behavior. I believe this to be the case based on recent GNU grep behavior in the same situation -- exit code of 1, which seems to line up with what I expect.

As far as the fallback, my unqualified/tired/currently contextless opinion is that this was done to get away with implementing a subset of regex that will quickly match (I assume regex(3) is slow) and optimize most cases, but fallback in case we're looking at extended syntax, since this would otherwise be falling back to libgnuregex for the regex(3) implementation.

So should we hold off on applying this until regex(3) is fixed?

In D10113#212224, @cem wrote:

So should we hold off on applying this until regex(3) is fixed?

In my opinion, no, because fixing regex(3) is kind of a rough one since it's more broadly applicable/usable than just bsdgrep(1) and that's problematic in its own right. It may not be a bad idea to go ahead and make this patch fail compilation in these bits and let it fall back and succeed/fail in regex(3).

To clarify, regex(3) *should* see a fix for this soon, but I don't think there's much reason in delaying the bsdgrep(1) fix because of that.

This revision was automatically updated to reflect the committed changes.