Page MenuHomeFreeBSD

nvi: fallback to iso8859-1 if utf-8 check failed and fileencoding/locale encoding is utf-8

Authored by yuripv on May 19 2020, 5:00 PM.



Trying to use the user locale's encoding that is UTF-8 doesn't make much sense if we already know that it fail as we failed the looks_utf8() check above. I've changed the logic to fallback to ISO8859-1 instead as it seems to be the most widely used single byte locale:

  1. Check for valid UTF-8.
  2. Check if fallback fileencoding is set and is NOT UTF-8.
  3. Check if user locale's encoding is NOT UTF-8.
  4. Use ISO8859-1 as last resort.

Triggered by a question from Ronald F. Guilmette on questions@.

Test Plan

Having any UTF-8 locale (e.g. en_US.UTF-8), try editing a file with (valid) ISO8859-1 characters in [128,255] bytes range:

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

In D24919#548529, @bapt wrote:

Have you contacted the upstream?

Will do, forgot about it! Thanks.

In D24919#548529, @bapt wrote:

Have you contacted the upstream?

Will do, forgot about it! Thanks.

As I have pretty limited connectivity at the moment and can't do a pull request, I'm adding to author to this review. If it doesn't work out, I'll send a PR in few days.

I agree with the change. Please send a PR later, thanks.

13 ↗(On Diff #71999)

If you must commit this, please bump the revision (to 10.65), update the date, and put "yuripv" in place of "zy" so that ident(1) can tell users what behavior they should expect from their binary.

This revision is now accepted and ready to land.May 20 2020, 2:36 AM
13 ↗(On Diff #71999)

Will do.

Interesting, the only idents I see currently are:

$ ident /usr/bin/vi
     $FreeBSD: head/lib/csu/amd64/crtn.S 217105 2011-01-07 16:07:51Z kib $
     $FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $
     $FreeBSD: head/lib/csu/common/crtend.c 354541 2019-11-08 14:28:39Z kevans $
     $FreeBSD: head/lib/csu/amd64/reloc.c 340702 2018-11-20 21:04:20Z emaste $
     $FreeBSD: head/lib/csu/common/crtbegin.c 340395 2018-11-13 15:28:27Z andrew $
     $FreeBSD: head/lib/csu/amd64/crti.S 217105 2011-01-07 16:07:51Z kib $
     $FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $
     $FreeBSD: head/lib/csu/amd64/crt1.c 339351 2018-10-13 23:52:55Z kib $

However running ident directly on source shows:

$ ident /usr/src/contrib/nvi/common/exf.c
     $Id: exf.c,v 10.64 2015/04/05 15:21:55 zy Exp $
13 ↗(On Diff #71999)

Merged to upstream.
Thanks, the rcsids may worth a fix in a way similar to
or to be removed like what OpenBSD did.

13 ↗(On Diff #71999)

OK, if you don't mind, I'll commit this without updating the ID not to confuse readers and as it won't be reflected in the binary anyway, and we (you? :) ) can decide what to do with IDs separately. And when we import next release, we'll get what's in upstream.

This revision was automatically updated to reflect the committed changes.