Page MenuHomeFreeBSD

Correct an out-of-bounds read in regcomp when the RE is bad.
ClosedPublic

Authored by brooks on Apr 29 2017, 12:48 AM.
Tags
None
Referenced Files
F106684596: D10541.diff
Fri, Jan 3, 8:09 PM
Unknown Object (File)
Dec 4 2024, 5:05 AM
Unknown Object (File)
Dec 2 2024, 11:52 AM
Unknown Object (File)
Nov 20 2024, 4:25 AM
Unknown Object (File)
Oct 21 2024, 1:11 AM
Unknown Object (File)
Oct 16 2024, 9:39 PM
Unknown Object (File)
Oct 4 2024, 11:23 PM
Unknown Object (File)
Oct 2 2024, 1:47 AM
Subscribers

Details

Summary

When passed the invalid regular expression "a**", the error is eventually
detected and seterr() is called. It sets p->error appropriatly and
p->next and p->end to nuls which is a never used char nuls[10] which is
zeros due to .bss initialization. Unfortunatly, p_ere_exp() and
p_simp_re() both have fall through cases where they set the error,
decrement p->next and access it which means a read from what ever .bss
variable comes before nuls.

Found with regex_test:repet_multi and CHERI bounds checking.

Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
MFC after: 1 week

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This bug was introduced with wide character support in rS132019.

emaste added a subscriber: kevans.
This revision is now accepted and ready to land.Apr 29 2017, 2:07 AM
pfg edited edge metadata.

LGTM, but let me add ache as he knows this code better.

lib/libc/regex/regcomp.c
657 ↗(On Diff #27842)

Typo in the comment: "Definitely"

brooks edited edge metadata.
  • Fix typo.
This revision now requires review to proceed.May 1 2017, 3:19 AM

This bug was introduced with wide character support in rS132019.

I am unsure if it's better to return or just break .... apparently the return just works so .. I'll accept it.

This revision is now accepted and ready to land.May 1 2017, 4:11 AM
In D10541#218737, @pfg wrote:

I am unsure if it's better to return or just break .... apparently the return just works so .. I'll accept it.

The error-handling stuff here feels wrong enough that kicking control back up to the top-level expression parser is likely safer.

Honestly, we could *probably* do a little more to safely exit out of regcomp in case of an error. Calls to the top-level parser can be nested, so this could bail out of one level and land us back in the sub-expression bits of this same function further up.

In D10541#218757, @kevans91_ksu.edu wrote:
In D10541#218737, @pfg wrote:

I am unsure if it's better to return or just break .... apparently the return just works so .. I'll accept it.

The error-handling stuff here feels wrong enough that kicking control back up to the top-level expression parser is likely safer.

Honestly, we could *probably* do a little more to safely exit out of regcomp in case of an error. Calls to the top-level parser can be nested, so this could bail out of one level and land us back in the sub-expression bits of this same function further up.

The struct parse error handling stuff is bizarre. I guess it's a pre-branch predictor optimization to keep error handling out of the fast path (after all, it's not like you need to consume the buffer). I think this code could do with a refactor, and probably quite a lot of fuzzing.

The struct parse error handling stuff is bizarre. I guess it's a pre-branch predictor optimization to keep error handling out of the fast path (after all, it's not like you need to consume the buffer). I think this code could do with a refactor, and probably quite a lot of fuzzing.

Is such an optimization even worth it at this point? Especially given that this is in regcomp -- the lesser of two evils in regex(3).

I do have some refactoring for relevant parts in a local branch that will make it to review once D10315 gets hammered out, I may revisit this sooner rather than later.

Is such an optimization even worth it at this point? Especially given that this is in regcomp -- the lesser of two evils in regex(3).

Almost certainly not worth it. I think this is a reasonable change for now, and introduce a further refactoring later.

Is such an optimization even worth it at this point? Especially given that this is in regcomp -- the lesser of two evils in regex(3).

Almost certainly not worth it. I think this is a reasonable change for now, and introduce a further refactoring later.

+1

Given past experience breaking things, it's best to introduce tests beforehand, then request an exp- run with any and all refactoring.

This revision was automatically updated to reflect the committed changes.