
create widths.txt from utf8proc data

Authored by yuripv on Nov 17 2020, 7:01 PM.



Update our rather outdated widths.txt with data from utf8proc, which appears to be widely used precisely because the wcwidth() provided by the OS is unreliable. This implementation consists of a simple piece of C code (required because utf8proc only provides a C library) that walks the 0x0 - 0x110000 character range, and a Perl script (copied and modified from ...) that maps the data into the widths.txt format.
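For readers who want a picture of the C side, here is a minimal sketch of such a loop; it is not the code from this diff. The names `width_fn`, `demo_width` and `dump_widths`, and the `U+XXXX width` output format, are made up for illustration. `demo_width` is a crude stand-in so the sketch builds without utf8proc installed; its values are not real Unicode width data.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type; it matches the shape of utf8proc_charwidth(). */
typedef int (*width_fn)(int32_t);

/* Stand-in width source for illustration only, NOT real Unicode data. */
static int
demo_width(int32_t cp)
{
	if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0))
		return (0);	/* C0/C1 control characters */
	if (cp >= 0x1f600 && cp <= 0x1f64f)
		return (2);	/* emoticons block, assumed wide */
	return (1);
}

/*
 * Write one "U+XXXX width" line per code point in [first, last).
 * The output format is an assumption for this sketch, not the
 * widths.txt format. Returns the number of lines written.
 */
static long
dump_widths(FILE *out, width_fn width_of, int32_t first, int32_t last)
{
	long n = 0;

	for (int32_t cp = first; cp < last; cp++) {
		fprintf(out, "U+%04X %d\n", (unsigned int)cp, width_of(cp));
		n++;
	}
	return (n);
}
```

With utf8proc available, one would presumably include `<utf8proc.h>`, call `dump_widths(stdout, utf8proc_charwidth, 0x0, 0x110000)`, and build with the cflags/libs reported by pkgconf; a separate script would then map that dump into the widths.txt format.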

Test Plan

Rebuild locale data with the new widths.txt, then enjoy text full of emojis in terminal/vim/....

Diff Detail

rS FreeBSD src repository - subversion
Lint Skipped
Unit Tests Skipped

Event Timeline

  • rewrite (actually, copy and modify) mkwidths in Perl, making it 300x faster than the sh version
  • add to README
  • use pkgconf to get utf8proc cflags/libs

Not tested but looks good to me anyway

7 ↗(On Diff #79742)

I think only your copyright is valid here; the two above should be removed.

This revision is now accepted and ready to land. Dec 4 2020, 1:32 PM
7 ↗(On Diff #79742)

utf8to32() and get_utf8map(), which contain almost all of the logic in this script, are not something I wrote; that applies to the original as well.

Committed in rS368390; not sure why Phabricator didn't grok it.