Page MenuHomeFreeBSD

[libedit] enhance multibyte support
ClosedPublic

Authored by naito.yuichiro_gmail.com on Nov 8 2018, 5:45 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 20, 11:52 AM
Unknown Object (File)
Tue, Nov 19, 4:42 AM
Unknown Object (File)
Tue, Nov 19, 4:37 AM
Unknown Object (File)
Tue, Nov 19, 4:25 AM
Unknown Object (File)
Mon, Nov 4, 2:39 PM
Unknown Object (File)
Wed, Oct 30, 2:50 AM
Unknown Object (File)
Oct 21 2024, 9:11 PM
Unknown Object (File)
Oct 3 2024, 11:57 PM

Details

Summary

According to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232374,
libedit doesn't support other multibyte charset than UTF-8.

We enhance to support other multibyte charset that FreeBSD libc supports.

Test Plan

execute /bin/sh in following locale and input multibyte characters from STDIN.

  1. en_us.UTF-8
  2. ja_JP.eucJP
  3. ja_JP.UTF-8
  4. ja_JP.SJIS

We expect /bin/sh is enable to handle multibyte characters.

Regression:
execute /bin/sh in C locale and input ASCII characters from STDIN.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

yuripv added a subscriber: yuripv.

I think this makes perfect sense. You might want to update editline(3) as well so it doesn't say it only knows about UTF-8 and C locales.

Fixed editline.3 manual page.
With this fix, libedit supports multibyte charset other than UTF-8.

This makes perfect sense to me as well, have you try to contact upstream (aka netbsd) with this patch? christos@ might be a good contact overthere.

This revision is now accepted and ready to land.Nov 9 2018, 9:59 AM
0mp requested changes to this revision.Nov 9 2018, 11:18 AM
0mp added a subscriber: 0mp.

Remember to bump the date in the manual page.

This revision now requires changes to proceed.Nov 9 2018, 11:18 AM
lib/libedit/el.c
296 ↗(On Diff #50190)

I think we shouldn't just remove this flag, and rather check MB_CUR_MAX being bigger than 1, i.e.:

if (MB_CUR_MAX > 1)
    el->el_flags &= ~NARROW_HISTORY;
lib/libedit/el.c
296 ↗(On Diff #50190)

Or, trying to read the original condition again, if (MB_CUR_MAX == 1) :)

This makes perfect sense to me as well, have you try to contact upstream (aka netbsd) with this patch? christos@ might be a good contact overthere.

No, I've never tried to contact to NetBSD project.
I'm sorry that I don't know how to contact to them.

Remember to bump the date in the manual page.

Thanks to let me know.
I'll fix the date.

I think we shouldn't just remove this flag, and rather check MB_CUR_MAX being bigger than 1, i.e.:

Are you afraid of poor performance in C or ASCII environment?
If so, to keep NARROW_HISTORY flag makes sense to me.
I'll revert NARROW_HISTORY flag deletion.

Update the date in the manual page.
Revert deleting NARROW_HISTORY flag.

For the record ...

We really do need to update libedit (from NetBSD).
These changes are welcome but I fear the approach of trying to preserve our changes is not sustainable anymore

Someone brave, and with time, has to take the step of just bringing the NetBSD version ...

In D17903#382776, @pfg wrote:

For the record ...

We really do need to update libedit (from NetBSD).
These changes are welcome but I fear the approach of trying to preserve our changes is not sustainable anymore

Someone brave, and with time, has to take the step of just bringing the NetBSD version ...

I did it once, it is not that hard, and our version right now is not that far from there version is it? for the record, I have also contacted christos@ by email so he can wait on this review.

Also our "extensions" to libedit are not big if it hasn't changed from my last update.

In D17903#382777, @bapt wrote:
In D17903#382776, @pfg wrote:

For the record ...

We really do need to update libedit (from NetBSD).
These changes are welcome but I fear the approach of trying to preserve our changes is not sustainable anymore

Someone brave, and with time, has to take the step of just bringing the NetBSD version ...

I did it once, it is not that hard, and our version right now is not that far from there version is it? for the record, I have also contacted christos@ by email so he can wait on this review.

My memory is sketchy here, I think there were moved files and some other code disappeared making the merge difficult.
In all honesty they did some sensible cleanups, getting rid of the multiple hacks to get an internationalized version (which is now the only version we build anyways).
OpenBSD also upstreamed some changes. The thing from my last try is that "svn merge" didn't help much, and there were build issues to figure out.
We still have that big change in the vendor tree but I never found the time to finish merging it.

The other nice thing we have to bring at some point is NetBSD's mail(1) support for libedit. (sorry for other more TODOs to someone else's list ;).

Also our "extensions" to libedit are not big if it hasn't changed from my last update.

Those are pretty small yes, but they are difficult to keep track of when you try to follow the upstream changes.

It's a matter of patience. and yes, the update is not really related to this change.

This revision is now accepted and ready to land.Nov 9 2018, 5:20 PM

I talked with Christos and asked him to merge my patch to NetBSD src tree.
He accepted my request. So I will send my patch to him after it has been committed to FreeBSD src tree.

This revision now requires review to proceed.Nov 19 2018, 3:44 PM

I had review from hrs@ in FreeBSD workshop Tokyo on 21 Nov.
He commented about my patch and update it.
I will update this differential and write down his comments.

This revision is now accepted and ready to land.Nov 23 2018, 6:30 PM
naito.yuichiro_gmail.com added inline comments.
lib/libedit/chartype.c
189 ↗(On Diff #50915)

hrs@ says that wctomb(3) has an internal shift state,
if wctomb(3) is called outside of libedit,
the internal state can be changed and causes miscalculate multibyte size.

So in this part, wcrtomb(3) should be used.
wcrtomb(3) requires that shift state is given in the argument.
We always initialize the shift state in ct_enc_width() to keep independent from outside of libedit.

lib/libedit/read.c
366 ↗(On Diff #50915)

hrs@ says that
(cbp >= MB_LEN_MAX) condition is necessary for checking invalid byte sequences.
If malicious input was given, libedit would read byte sequences forever.

This revision was automatically updated to reflect the committed changes.