create widths.txt from utf8proc data
Closed · Public

Authored by yuripv on Nov 17 2020, 7:01 PM.

Details

Reviewers

bapt
hrs

Commits

rS368390: update wcwidth data from utf8proc

Summary

Update our rather outdated widths.txt with data from utf8proc, which appears to be widely used precisely because the wcwidth() provided by the OS is often unreliable. The implementation consists of a small piece of C code (required because utf8proc is available only as a C library) that walks the 0x0 - 0x110000 character range, and a Perl script (copied and modified from utf8-rollup.pl) that maps the data into the widths.txt format.
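As a rough illustration of the "walk the range and record widths" step, here is a minimal sketch in C. It is not the code from this revision: `width_fn`, `demo_width()` and `dump_width_runs()` are hypothetical names, the output layout is made up and is not the widths.txt format, and `demo_width()` is a stand-in with a few hard-coded ranges so the sketch builds without utf8proc. With utf8proc installed, the callback would presumably wrap `utf8proc_charwidth()` instead.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type: column width of a single code point. */
typedef int (*width_fn)(int32_t codepoint);

/*
 * Stand-in width function, for illustration only (NOT real Unicode data).
 * A real tool would call utf8proc_charwidth() here instead.
 */
static int
demo_width(int32_t cp)
{
	if (cp >= 0x0300 && cp <= 0x036F)
		return (0);	/* combining diacritical marks */
	if (cp >= 0x1100 && cp <= 0x115F)
		return (2);	/* Hangul Jamo initial consonants */
	return (1);
}

/*
 * Walk [first, last] (requires first <= last) and print one line per run
 * of consecutive code points sharing the same width, as "START END WIDTH"
 * in hex. Returns the number of runs printed.
 */
static int
dump_width_runs(FILE *out, width_fn width, int32_t first, int32_t last)
{
	int32_t start = first;
	int cur = width(first);
	int runs = 0;

	for (int32_t cp = first + 1; cp <= last; cp++) {
		int w = width(cp);

		if (w != cur) {
			/* Width changed: close the run that ended at cp - 1. */
			fprintf(out, "%04X %04X %d\n",
			    (unsigned)start, (unsigned)(cp - 1), cur);
			runs++;
			start = cp;
			cur = w;
		}
	}
	/* Close the final run. */
	fprintf(out, "%04X %04X %d\n", (unsigned)start, (unsigned)last, cur);
	return (runs + 1);
}
```

Calling `dump_width_runs(stdout, demo_width, 0x0, 0x10FFFF)` covers every valid code point; the sketch uses an inclusive upper bound of 0x10FFFF, which is an assumption about how the 0x110000 limit above is meant. Collapsing equal-width runs in C is a design choice of this sketch only; in the revision that mapping is done by the Perl script.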

Test Plan

Rebuild the locale data with the new widths.txt, then enjoy text full of emojis in terminal/vim/...

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

yuripv created this revision. Nov 17 2020, 7:01 PM

Herald added a subscriber: imp. Nov 17 2020, 7:01 PM

yuripv requested review of this revision. Nov 17 2020, 7:01 PM

context

rewrite mkwidths in perl (actually copied and modified from utf8-rollup.pl), making it 300x faster than the sh version
add to README
use pkgconf to get utf8proc cflags/libs

yuripv edited the summary of this revision. Nov 25 2020, 9:01 PM

Not tested but looks good to me anyway

tools/tools/locale/tools/mkwidths.pl
7	I think only your copyright is valid here; the 2 above should be removed

This revision is now accepted and ready to land. Dec 4 2020, 1:32 PM

yuripv added inline comments. Dec 4 2020, 3:07 PM

tools/tools/locale/tools/mkwidths.pl
7	utf8to32() and get_utf8map(), which contain almost all of the logic in this script, are not something I wrote; that applies to the original utf8-rollup.pl as well.