Page MenuHomeFreeBSD

iconv: use GNU-compatible handling of invalid sequences by default
AcceptedPublic

Authored by kevans on May 1 2026, 5:20 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jun 20, 3:04 PM
Unknown Object (File)
Tue, Jun 16, 11:22 PM
Unknown Object (File)
Fri, Jun 12, 9:00 PM
Unknown Object (File)
Thu, Jun 11, 5:32 PM
Unknown Object (File)
Thu, Jun 4, 2:26 PM
Unknown Object (File)
Tue, Jun 2, 3:07 AM
Unknown Object (File)
Tue, Jun 2, 3:03 AM
Unknown Object (File)
Tue, Jun 2, 3:03 AM

Details

Reviewers
siva
des
Group Reviewers
srcmgr
Summary

Many applications and libraries (e.g., libarchive) assume GNU behavior,
and this particular divergence at least breaks part of the libarchive
test suite. The difference is the behavior when an input character
cannot be represented in the output character set: CITRUS followed along
with POSIX and returned > 0 to indicate the number of invalid
characters, while GNU chose to just halt immediately and EILSEQ it.

While conformance is great, the reality is that most of our ported
software using iconv isn't written against the spec. The previous
behavior can be restored with the ICONV_SET_ILSEQ_INVALID iconvctl(3).

PR: 294577
Relnotes: yes (probably)
Obtained from: https://github.com/apple-oss-distributions/libiconv

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 72720
Build 69603: arc lint + arc unit

Event Timeline

kevans requested review of this revision.May 1 2026, 5:20 PM

Sounds good to me. This should be noted in iconv.3 or somewhere though I imagine?

Update iconv(3) to note the two ways for handling invalid output characters

LGTM, one comment about STANDARDS in the man page.

lib/libc/iconv/iconv.3
308

Should we add a reference to the EILSEQ default here too in the standards section? I didn't notice it until now.

Do we need to do an exp-run for this? Is it likely to break existing programs?

Do we need to do an exp-run for this? Is it likely to break existing programs?

IMO it's possible but unlikely that it will break existing programs, and I'm not sure that an exp-run will necesarily pick up the kind of breakage we'd see since they don't execute tests that are hooked up to the frameworkr -- we'd only really be exercising tools used for builds. I did go patch spelunking for anything of interest, and we have some interesting ones:

  • devel/glib20/files/patch-revert-8abf3a0: I think this patch is misguided somehow, but I need to dig into it more. It claims that inkscape was broken previously, so presumably with the new iconv() behavior it will again break. There are other locale differences between glibc and *BSDs, it'd be good to investigate further (CC @arrowd in case he can provide any more context on the breakage)
  • www/webalizer/files/webalizer-a-urasim_2.patch: I'm not sure that this is strictly broken, but it would result in losing the whole conversion

Maybe it *would* be better to instead adopt libarchive to use ICONV_SET_ILSEQ_INVALID when it's available since it does expect that behavior... webalizer's patch makes me like our default a little better here.

The related bug report in glib's upstream is https://gitlab.gnome.org/GNOME/glib/-/work_items/1883
I didn't have a chance to test it yet, so just leaving the URL here. Thanks for the heads up.

The related bug report in glib's upstream is https://gitlab.gnome.org/GNOME/glib/-/work_items/1883
I didn't have a chance to test it yet, so just leaving the URL here. Thanks for the heads up.

Thanks- a little bit of spelunking, I note that glib/gconvert has a viable fix for FreeBSD that works no matter how we default things: https://gitlab.gnome.org/GNOME/glib/-/merge_requests/3824/diffs -- the patch in the port should be able to go away if we're past a version that has that.

Do we need to do an exp-run for this? Is it likely to break existing programs?

IMO it's possible but unlikely that it will break existing programs, and I'm not sure that an exp-run will necesarily pick up the kind of breakage we'd see since they don't execute tests that are hooked up to the frameworkr -- we'd only really be exercising tools used for builds. I did go patch spelunking for anything of interest, and we have some interesting ones:

  • devel/glib20/files/patch-revert-8abf3a0: I think this patch is misguided somehow, but I need to dig into it more. It claims that inkscape was broken previously, so presumably with the new iconv() behavior it will again break. There are other locale differences between glibc and *BSDs, it'd be good to investigate further (CC @arrowd in case he can provide any more context on the breakage)
  • www/webalizer/files/webalizer-a-urasim_2.patch: I'm not sure that this is strictly broken, but it would result in losing the whole conversion

Maybe it *would* be better to instead adopt libarchive to use ICONV_SET_ILSEQ_INVALID when it's available since it does expect that behavior... webalizer's patch makes me like our default a little better here.

Simple enough, the patch below works too, though I should probably also check some flag to determine whether it's using BSD iconv. Maybe I should just submit this upstream? Do you know offhand if iconvctl() is a FreeBSD-specific thing?

diff --git a/contrib/libarchive/libarchive/archive_string.c b/contrib/libarchive/libarchive/archive_string.c
index c6ae8968d54f..171971b674cd 100644
--- a/contrib/libarchive/libarchive/archive_string.c
+++ b/contrib/libarchive/libarchive/archive_string.c
@@ -1314,6 +1314,12 @@ create_sconv_object(const char *fc, const char *tc,
                        else if (strcmp(fc, "CP932") == 0)
                                sc->cd = iconv_open(tc, "SJIS");
                }
+#ifdef __FreeBSD__
+               int v;
+
+               v = 1;
+               (void)iconvctl(sc->cd, ICONV_SET_ILSEQ_INVALID, &v);
+#endif
des added inline comments.
lib/libc/iconv/bsd_iconv.c
73

“failures” perhaps?

modulo wordsmithing of comments

This revision is now accepted and ready to land.May 13 2026, 2:36 PM

Do we need to do an exp-run for this? Is it likely to break existing programs?

IMO it's possible but unlikely that it will break existing programs, and I'm not sure that an exp-run will necesarily pick up the kind of breakage we'd see since they don't execute tests that are hooked up to the frameworkr -- we'd only really be exercising tools used for builds. I did go patch spelunking for anything of interest, and we have some interesting ones:

  • devel/glib20/files/patch-revert-8abf3a0: I think this patch is misguided somehow, but I need to dig into it more. It claims that inkscape was broken previously, so presumably with the new iconv() behavior it will again break. There are other locale differences between glibc and *BSDs, it'd be good to investigate further (CC @arrowd in case he can provide any more context on the breakage)
  • www/webalizer/files/webalizer-a-urasim_2.patch: I'm not sure that this is strictly broken, but it would result in losing the whole conversion

Maybe it *would* be better to instead adopt libarchive to use ICONV_SET_ILSEQ_INVALID when it's available since it does expect that behavior... webalizer's patch makes me like our default a little better here.

Simple enough, the patch below works too, though I should probably also check some flag to determine whether it's using BSD iconv. Maybe I should just submit this upstream? Do you know offhand if iconvctl() is a FreeBSD-specific thing?

We took it from GNU libiconv, but this particular control is FreeBSD-specific. It's also propagated to macOS, though they enable it by default because of the compatibility issues (they only migrated to CITRUS in 2023 here, so a lot of baggage: https://github.com/apple-oss-distributions/libiconv/commit/887cbee187264111f89c608ce0986bcc98d7aa37)

diff --git a/contrib/libarchive/libarchive/archive_string.c b/contrib/libarchive/libarchive/archive_string.c
index c6ae8968d54f..171971b674cd 100644
--- a/contrib/libarchive/libarchive/archive_string.c
+++ b/contrib/libarchive/libarchive/archive_string.c
@@ -1314,6 +1314,12 @@ create_sconv_object(const char *fc, const char *tc,
                        else if (strcmp(fc, "CP932") == 0)
                                sc->cd = iconv_open(tc, "SJIS");
                }
+#ifdef __FreeBSD__
+               int v;
+
+               v = 1;
+               (void)iconvctl(sc->cd, ICONV_SET_ILSEQ_INVALID, &v);
+#endif

LGTM, and I think this is the least risky way to fix the issue. We might still want to consider a default switch, but it'd be better to do so without pressure from libarchive functionality.

ngie added inline comments.
lib/libc/tests/iconv/iconv_test.c
66
lib/libc/tests/iconv/iconv_test.c
66

No, the original is correct.

Do we need to do an exp-run for this? Is it likely to break existing programs?

IMO it's possible but unlikely that it will break existing programs, and I'm not sure that an exp-run will necesarily pick up the kind of breakage we'd see since they don't execute tests that are hooked up to the frameworkr -- we'd only really be exercising tools used for builds. I did go patch spelunking for anything of interest, and we have some interesting ones:

  • devel/glib20/files/patch-revert-8abf3a0: I think this patch is misguided somehow, but I need to dig into it more. It claims that inkscape was broken previously, so presumably with the new iconv() behavior it will again break. There are other locale differences between glibc and *BSDs, it'd be good to investigate further (CC @arrowd in case he can provide any more context on the breakage)
  • www/webalizer/files/webalizer-a-urasim_2.patch: I'm not sure that this is strictly broken, but it would result in losing the whole conversion

Maybe it *would* be better to instead adopt libarchive to use ICONV_SET_ILSEQ_INVALID when it's available since it does expect that behavior... webalizer's patch makes me like our default a little better here.

Simple enough, the patch below works too, though I should probably also check some flag to determine whether it's using BSD iconv. Maybe I should just submit this upstream? Do you know offhand if iconvctl() is a FreeBSD-specific thing?

We took it from GNU libiconv, but this particular control is FreeBSD-specific. It's also propagated to macOS, though they enable it by default because of the compatibility issues (they only migrated to CITRUS in 2023 here, so a lot of baggage: https://github.com/apple-oss-distributions/libiconv/commit/887cbee187264111f89c608ce0986bcc98d7aa37)

diff --git a/contrib/libarchive/libarchive/archive_string.c b/contrib/libarchive/libarchive/archive_string.c
index c6ae8968d54f..171971b674cd 100644
--- a/contrib/libarchive/libarchive/archive_string.c
+++ b/contrib/libarchive/libarchive/archive_string.c
@@ -1314,6 +1314,12 @@ create_sconv_object(const char *fc, const char *tc,
                        else if (strcmp(fc, "CP932") == 0)
                                sc->cd = iconv_open(tc, "SJIS");
                }
+#ifdef __FreeBSD__
+               int v;
+
+               v = 1;
+               (void)iconvctl(sc->cd, ICONV_SET_ILSEQ_INVALID, &v);
+#endif

LGTM, and I think this is the least risky way to fix the issue. We might still want to consider a default switch, but it'd be better to do so without pressure from libarchive functionality.

I got around to making a PR: https://github.com/libarchive/libarchive/pull/3056