
create widths.txt from utf8proc data
Closed, Public

Authored by yuripv on Nov 17 2020, 7:01 PM.

Details

Summary

Update our badly outdated widths.txt with data from utf8proc, which appears to be widely used precisely because the wcwidth() provided by the OS is often unreliable. The implementation consists of a small piece of C code (required because utf8proc only ships a C library) that walks the 0x0 - 0x110000 character range, and a perl script (copied and modified from utf8-rollup.pl) that maps the data into the widths.txt format.
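For readers who want a feel for the "walk the range" step, here is a minimal sketch, not the code from this revision. It walks a code point range and prints one line per run of equal width. The names `width_fn`, `demo_width` and `dump_width_runs`, the "START END WIDTH" output layout, and the idea of collapsing runs on the C side are all assumptions made for illustration; `demo_width` is a stand-in with made-up values, where the real tool would presumably ask utf8proc for the width instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type: returns the display width of one code point. */
typedef int (*width_fn)(uint32_t cp);

/*
 * Stand-in width function, for illustration only. These values are NOT
 * real Unicode data; the real tool would query utf8proc here.
 */
static int
demo_width(uint32_t cp)
{
	if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0))
		return (0);	/* C0/C1 control characters */
	if (cp >= 0x1f600 && cp <= 0x1f64f)
		return (2);	/* emoticons block, assumed wide */
	return (1);
}

/*
 * Walk [first, last) and print one "START END WIDTH" line per run of
 * consecutive code points sharing the same width. The output layout is
 * invented for this sketch and is not the widths.txt format.
 * Returns the number of runs printed.
 */
static size_t
dump_width_runs(FILE *out, uint32_t first, uint32_t last, width_fn width)
{
	size_t runs = 0;
	uint32_t start = first;
	int cur;

	assert(width != NULL);
	if (first >= last)
		return (0);

	cur = width(first);
	for (uint32_t cp = first + 1; cp < last; cp++) {
		int w = width(cp);

		if (w != cur) {
			/* Width changed: close the current run at cp - 1. */
			fprintf(out, "%04X %04X %d\n",
			    (unsigned)start, (unsigned)(cp - 1), cur);
			runs++;
			start = cp;
			cur = w;
		}
	}
	/* Close the final run, which ends at last - 1. */
	fprintf(out, "%04X %04X %d\n",
	    (unsigned)start, (unsigned)(last - 1), cur);
	return (runs + 1);
}
```

Calling `dump_width_runs(stdout, 0x0, 0x110000, demo_width)` covers the same range the summary mentions. Passing the width lookup as a callback keeps the loop independent of where the width data comes from, so the stand-in can be swapped for a utf8proc-backed function without touching the walk itself.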

Test Plan

Rebuild the locale data with the new widths.txt, then enjoy text full of emojis in terminal/vim/....

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

  • rewrite mkwidths in perl (actually a modified copy of utf8-rollup.pl), making it 300x faster than the sh version
  • add to README
  • use pkgconf to get utf8proc cflags/libs

Not tested but looks good to me anyway

tools/tools/locale/tools/mkwidths.pl
Line 7 (On Diff #79742)

I think only your copyright is valid here; the two above it should be removed.

This revision is now accepted and ready to land. Dec 4 2020, 1:32 PM
tools/tools/locale/tools/mkwidths.pl
Line 7 (On Diff #79742)

utf8to32() and get_utf8map(), which contain almost all of the logic in this script, are not something I wrote; the same applies to the original utf8-rollup.pl.

Committed in rS368390; not sure why Phabricator didn't grok it.