
bsdgrep(1): Fix escape map building for multibyte strings
ClosedPublic

Authored by emaste on Mar 22 2017, 5:52 PM.

Details

Summary

In bsdgrep(1), fix escape map building in the regex parser. It was previously using memory not explicitly initialized, and the MBS escape map was being built based on a version of the pattern with escapes already parsed out.

PR: 175314
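
The two problems named in the summary can be illustrated with a minimal sketch. This is not the bsdgrep code: `build_escmap` and its parameters are hypothetical names chosen for illustration, and it handles single-byte patterns only. It allocates the map zero-initialized, and it walks the original pattern (escapes still present) while recording which positions of the unescaped output came from an escaped character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption, not taken from the patch): strip
 * backslash escapes from pat into *outp and set (*escmapp)[i] to true when
 * the i-th output character was escaped in the original pattern.
 * Returns the unescaped length, or (size_t)-1 if allocation fails.
 */
static size_t
build_escmap(const char *pat, size_t len, char **outp, bool **escmapp)
{
	char *out = malloc(len + 1);
	/* calloc zero-fills, so entries never written are well-defined. */
	bool *escmap = calloc(len + 1, sizeof(*escmap));
	size_t pos = 0;

	if (out == NULL || escmap == NULL) {
		free(out);
		free(escmap);
		return ((size_t)-1);
	}

	/* Scan the ORIGINAL pattern; the backslashes are still visible here. */
	for (size_t i = 0; i < len; i++) {
		if (pat[i] == '\\' && i + 1 < len) {
			escmap[pos] = true;
			out[pos++] = pat[++i];	/* copy the escaped character */
		} else {
			out[pos++] = pat[i];	/* ordinary (or trailing '\') */
		}
	}
	assert(pos <= len);
	out[pos] = '\0';

	*outp = out;
	*escmapp = escmap;
	return (pos);
}
```

Building the map from a pattern whose escapes were already removed would leave every entry false, because no backslashes remain to be seen; that is the failure mode the summary describes. The caller is expected to `free()` both returned buffers.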

Test Plan

Run the test cases stated by jbeich@ in the PR, and ensure there are no further failures in the kyua tests.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kevans retitled this revision from Fix escape map building for multibyte strings to bsdgrep(1): Fix escape map building for multibyte strings. Mar 22 2017, 5:54 PM
kevans edited the summary of this revision.
  • Update following r316492 to remove the "xmalloc" bits
  • Fix bogus indentation with ts=8
  • Consistently test pointers against NULL in this section
  • As a minor optimization, if we hit an unescaped dot in the pattern and we're not building an escmap, break out after setting firstdot. This will also hopefully make it clearer upon later review why we bother entering this section at all if fg->hasdot but fg->wescmap == NULL
  • Revert bogus optimization; I missed a bit in one of the latter macros (FILL_QSBC => _FILL_QSBC, but not _FILL_QSBC_REVERSED) where hasdot is relied upon to point to the final unescaped dot
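
Why the early break had to be reverted can be sketched as follows. This is a hypothetical illustration, not the code under review: `scan_dots`, `struct dot_info`, and the `lastdot` field (standing in for the role `hasdot` plays in the discussion above) are assumed names, and the indices here refer to positions in the raw pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the field names are assumptions. */
struct dot_info {
	long firstdot;	/* index of the first unescaped '.', or -1 */
	long lastdot;	/* index of the last unescaped '.', or -1 */
};

static struct dot_info
scan_dots(const char *pat, size_t len)
{
	struct dot_info di = { -1, -1 };

	assert(pat != NULL);
	for (size_t i = 0; i < len; i++) {
		if (pat[i] == '\\' && i + 1 < len) {
			i++;		/* skip the escaped character */
			continue;
		}
		if (pat[i] == '.') {
			if (di.firstdot == -1)
				di.firstdot = (long)i;
			/*
			 * No break here: stopping at the first dot would
			 * leave lastdot pointing at the first dot instead
			 * of the final one.
			 */
			di.lastdot = (long)i;
		}
	}
	return (di);
}
```

With the break in place, a pattern containing more than one unescaped dot would report the same index for both fields, which is the situation the last bullet describes for the reversed fill macro.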

It's worth noting that I don't have much faith at the moment that (_escmap/fg->wescmap/fg->escmap in !TRE_WCHAR case) construction is correct, either. I'm fairly certain it should be taking into account escape sequences, but I will check that out and address it in a separate patch.

Apologies, I meant to update this sooner -- I'm kind of torn here. This may no longer be necessary if we can run D10282 through or go a step further and just rip out the TRE bits instead of the intermediate D10282. I feel fairly confident that our regex(3) implementation can much more reasonably fulfill this duty, and just as quickly.

This may no longer be necessary if we can run D10282.

I'd rather we commit the work that's already been done here, even if we eventually remove this altogether.

emaste updated this revision to Diff 27941.
emaste edited reviewers, added: kevans; removed: emaste.
emaste added reviewers: cem, bapt.

Rebase after committing a portion of this change in rS317700 and rS317701
Restore broken style that exists in this file

This revision is now accepted and ready to land. May 2 2017, 8:17 PM
This revision was automatically updated to reflect the committed changes.