
awk: revert to upstream behavior for ranges for gawk compatibility

Authored by imp on Jul 9 2021, 8:21 AM.



In 2005, FreeBSD changed one-true-awk to honor the locale's collating
order. This was billed as a temporary patch. It was also compatible with
the then-current behavior of gawk. That temporary patch has lasted 16
years now.

However, IEEE Std 1003.1-2008 changed the behavior of ranges in regular
expressions outside of the "C" and "POSIX" locales to be undefined.

Starting in 2011, gawk 4.0 stopped using the locale for regular
expression ranges and used the traditional behavior only. The
maintainer had grown weary of answering why '[A-Z]' would sometimes
match lower-case letters. The details are explained here:

To restore compatibility with other implementations of awk, revert this
patch. FreeBSD is the odd system out. Reverting also has the nice side
effect of eliminating the last of our differences with upstream
one-true-awk.
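
To illustrate why '[A-Z]' could match lower-case letters under locale
collation, here is a minimal Python sketch. It is not the awk code and does
not use any real locale: collation_key() is a hypothetical stand-in for a
dictionary-style ordering (a < A < b < B < ... < z < Z), and both helper
functions are names invented for this example.

```python
def in_range_bytewise(ch, lo, hi):
    """Traditional behavior: compare characters by code point value."""
    return ord(lo) <= ord(ch) <= ord(hi)


def collation_key(ch):
    """Hypothetical dictionary-style order: a < A < b < B < ... < z < Z.

    This is an assumption for illustration, not a real locale table.
    """
    return (ch.lower(), ch.isupper())


def in_range_collated(ch, lo, hi):
    """Locale-style behavior: compare characters by collation order."""
    return collation_key(lo) <= collation_key(ch) <= collation_key(hi)


sample = "aAbBzZ"

# Which characters of the sample fall inside the range A-Z?
bytewise = [c for c in sample if in_range_bytewise(c, "A", "Z")]
collated = [c for c in sample if in_range_collated(c, "A", "Z")]

print("bytewise:", bytewise)  # ['A', 'B', 'Z']
print("collated:", collated)  # ['A', 'b', 'B', 'z', 'Z']
```

Under the assumed ordering, every lower-case letter except 'a' sorts between
'A' and 'Z', so a collation-based range picks them up, while the code point
comparison matches upper-case letters only.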

MFC After: 2 weeks
Sponsored by: Netflix

Diff Detail

rG FreeBSD src repository
Lint Not Applicable
Tests Not Applicable

Event Timeline

imp requested review of this revision. Jul 9 2021, 8:21 AM
imp created this revision.
rgrimes added a subscriber: rgrimes.
rgrimes added inline comments.

Are there any possible side effects from this no longer being cleared to NULL? I.e., what if LC_COLLATE is set? Prior to this change that wouldn't matter, but after this change could that cause things like ports to break? NVM, I think I answered that myself: after this change there are no more calls to strcoll, so this value would not matter anyway.

This revision is now accepted and ready to land. Jul 9 2021, 1:19 PM
cy added a subscriber: cy.

Reverting to standard is a good idea.

I plan on committing this w/o an exp run in the next few days.
has the context for why... The exp run is done in a context where
these changes wouldn't change anything, so it wouldn't be able to
detect any problems and would thus be a waste of time.