Page MenuHomeFreeBSD

Regex improvement.
AbandonedPublic

Authored by pfg on May 6 2016, 11:55 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Apr 26, 6:44 AM
Unknown Object (File)
Fri, Apr 26, 6:44 AM
Unknown Object (File)
Thu, Apr 25, 8:44 PM
Unknown Object (File)
Jan 8 2024, 11:24 AM
Unknown Object (File)
Jan 8 2024, 11:24 AM
Unknown Object (File)
Jan 8 2024, 11:24 AM
Unknown Object (File)
Jan 8 2024, 11:24 AM
Unknown Object (File)
Jan 8 2024, 11:24 AM
Subscribers

Details

Reviewers
jilles
tijl
bapt
Group Reviewers
manpages
Summary

This curiosity was found in the OpenBSD tech list (2016-05-04),
as part as a bigger bug:

"...
The patch in engine.c from the library is because beginp is set in
matcher to start, so this one always results to true."

Our implementation does some extra trick so it is not clear (to me) how
the patch works but from my measurements it produces almost a 5%
improvement in out testsuite and while keeping the same standard
deviation.

Some insight from regex experts is welcome.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 3941
Build 3984: arc lint + arc unit

Event Timeline

pfg retitled this revision from to Regex improvement..
pfg updated this object.
pfg edited the test plan for this revision. (Show Details)

Add the change for slow() also.

Anyone here is an expert on our regex, to review this change? ;).

More conservative patch based on new comments in the OpenBSD list.

The previous change was breaking the regexec(3) manual on the
REG_STARTEND section.

This is slightly faster than the previous patch.

Update from OpenBSD's WIP

From openbsd-tech list 2016-05-24 18:11:27 (Martijn van Duren)

The patch itself changes the behavior of when REG_STARTEND is combined
with REG_NOTBOL. Currently when the two are combined
"string + pmatch[0].rm_so" is used as an offset and REG_NOTBOL is used
to do his determination equal to when string is used without offset.
This patch enables the assumption that pmatch[0].rm_so is a
continuation offset to string and allows us to do a proper assessment of
the character in regards to it's word position ('^' or '\<'), without
risking going into unallocated memory.

This was committed as r300683 but phabricator didn't catch it.

It did not close because the Differential Revision line is not at the end of the commit message in rS300683

In D6257#140611, @pfg wrote:

This was committed as r300683 but phabricator didn't catch it.

BTW, I think that you should have Closed this request rather than Abandoned it.

In D6257#143415, @avg wrote:
In D6257#140611, @pfg wrote:

This was committed as r300683 but phabricator didn't catch it.

BTW, I think that you should have Closed this request rather than Abandoned it.

I tried but since no one approved it, it was not an option.
It is also not documented anywhere the "Differential Revision" link should be last.
I am not very fond of phabricator :(.

In D6257#143420, @pfg wrote:

I tried but since no one approved it, it was not an option.

Ah, I see, sorry for the noise.

I tried but since no one approved it, it was not an option.

It used to allow self-accept or close-without-accept I think; probably that changed somehow. I'll see if I can find out how/why.

It is also not documented anywhere the "Differential Revision" link should be last.

I thought it was in the template, but it just reports that the full URL is required not only the D### tag. I'll see about including a note in there too.