
stdio: rename short _fileno to _fileno_short for legacy abi compatibility, add new int _fileno in struct __sFILE
Accepted · Public

Authored by kris_tranception.com on Dec 23 2025, 6:03 PM.
Tags
None

Details

Reviewers
des
jhb
adrian
Summary

Kernel file descriptors are 32-bit integers, whereas the _file field in the libc stdio FILE is a 16-bit short.
As a result, libc stdio calls such as fopen() and fdopen() fail with EMFILE when the descriptor involved is greater than 32,767 (SHRT_MAX). This happens once a process has more than 32,767 file descriptors in use, regardless of which library or system call opened them.

This change is in two parts:

  • D54354: rename _file to _fileno to break source code that directly accesses this internal libc stdio FILE field
  • D54355: add a new 32-bit _fileno field at the end of struct FILE; retain the old 16-bit field, renamed, for binary compatibility
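To make the two-part layout concrete, here is a simplified model. The struct and helper names are hypothetical and nearly all of the real __sFILE fields are omitted; it assumes only what the Summary states: the 16-bit field keeps its old offset under a new name, and a 32-bit field is appended at the end. Filling the short slot with -1 for out-of-range descriptors follows the behaviour described later in this review.

```c
#include <limits.h>

/*
 * Simplified, hypothetical model of the D54355 layout. The real struct
 * __sFILE has many more fields between these two.
 */
struct file_model {
	short	_fileno_short;	/* legacy 16-bit slot at its old offset */
	/* ... existing fields unchanged ... */
	int	_fileno;	/* new 32-bit descriptor appended at the end */
};

/* Keep both fields in sync; the short slot is -1 when fd does not fit. */
static void
file_model_set_fd(struct file_model *fp, int fd)
{
	fp->_fileno = fd;
	fp->_fileno_short = (fd < 0 || fd > SHRT_MAX) ? -1 : (short)fd;
}
```

An old binary whose inlined fileno() reads the 16-bit slot would therefore see -1 for a descriptor such as 40000, while libc itself uses the full value.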

Please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=291610#c2 for further background and related bug reports.

Test Plan

I have locally rebuilt and installed a 16-CURRENT kernel and world, as well as the git and emacs-nox ports, which pull in perl and gnulib (as part of m4) as dependencies. No issues were noted; in particular, the perl issues reported in bug #203900 ten years ago were not observed (perl now uses fdclose() instead of overwriting _file). gnulib and /bin/cat in the base system both access internal FILE members other than _file directly, but appeared to work correctly.

The simple test program below also worked as expected: it failed on current stdio and succeeded on patched stdio. A binary compiled against current stdio but run on patched stdio also succeeded, with fileno() reporting the actual file descriptor but fileno_unlocked() (which is implemented as a macro) reporting -1.

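```c
/*
 * Opens 0x8000 sockets so the next descriptor is above SHRT_MAX, then checks
 * whether fopen() still succeeds. Needs RLIMIT_NOFILE (ulimit -n) and
 * kern.maxfilesperproc of at least 32,771.
 */
#include <sys/socket.h>

#include <err.h>
#include <stdio.h>
#include <sysexits.h>
#include <unistd.h>

int main(void) {
  int f = -1;
  /* With only 0-2 open at startup, the sockets occupy 3 through 32770. */
  for (int i = 0; i < 0x8000; i++)
    if ((f = socket(PF_INET, SOCK_STREAM, 0)) < 0)
      err(EX_OSFILE, "socket");
  /* Free the highest one so fopen() below reuses it (fd 32770). */
  if (close(f) < 0)
    err(EX_OSFILE, "close");
  FILE *const fp = fopen("/dev/null", "r");
  if (!fp)
    err(EX_OSFILE, "fopen");
  printf("fileno         : %d\n", fileno(fp));
  printf("fileno_unlocked: %d\n", fileno_unlocked(fp));
  if (fclose(fp) == EOF)
    err(EX_OSFILE, "fclose");
  return 0;
}
```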

Question: should /usr/src/lib/libc/tests/stdio/fopen_test.c and friends have a test case for >32,767 currently open sockets? Note this could fail for other reasons such as low resource limits on the build machine, sysctl set low, etc.
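One possible way to limit the resource problem, sketched under assumptions: rather than opening 32,768 sockets, a test could dup2() a descriptor directly onto a high number, raise the soft limit if the hard limit allows, and skip otherwise. high_fd_fdopen_check() is a hypothetical helper, not part of fopen_test.c, and mapping its 0 result to an ATF skip is left out. It still needs RLIMIT_NOFILE (and, on FreeBSD, kern.maxfilesperproc) above the target descriptor.

```c
#include <sys/resource.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: place a descriptor at `target` with dup2() and check
 * that fdopen() accepts it and fileno() reports it back.
 * Returns 1 on success, 0 if the environment cannot run the test (skip),
 * and -1 if stdio rejected or misreported the descriptor.
 */
static int
high_fd_fdopen_check(int target)
{
	struct rlimit rl;
	FILE *fp;
	int fd, ok;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (0);
	if (rl.rlim_cur <= (rlim_t)target) {
		/* Raise the soft limit; only the hard limit bounds this. */
		if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max <= (rlim_t)target)
			return (0);
		rl.rlim_cur = (rlim_t)target + 1;
		if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
			return (0);
	}

	if ((fd = open("/dev/null", O_RDONLY)) < 0)
		return (0);
	if (dup2(fd, target) < 0) {	/* e.g. kern.maxfilesperproc too low */
		close(fd);
		return (0);
	}
	close(fd);

	if ((fp = fdopen(target, "r")) == NULL) {
		close(target);
		return (-1);		/* e.g. EMFILE from a 16-bit _file */
	}
	ok = fileno(fp) == target;
	fclose(fp);			/* also closes target */
	return (ok ? 1 : -1);
}
```

On unpatched FreeBSD stdio, high_fd_fdopen_check(40000) would be expected to return -1 through the EMFILE path; on a libc without the 16-bit limit it should return 1.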

All ports should be built against this to see what other ports might break.

Diff Detail

Repository: rG FreeBSD src repository
Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 70035, Build 66918: arc lint + arc unit

Event Timeline

des added inline comments.
lib/libc/stdio/fopen.c:63

In FreeBSD, no cast is needed as SHRT_MAX is defined as 0x7fff which has type int.

Some downstream projects that are likely to ingest this code at some point may (misguidedly) define it as 0x7fffU, which could trigger warnings about comparing expressions of different signedness. I assume that is why you added a cast, and I agree.

I would prefer casting SHRT_MAX over casting f, but neither option is more correct than the other.

This revision is now accepted and ready to land. Wed, Jan 14, 12:39 PM
lib/libc/stdio/fopen.c:63

The intention of the cast was actually to force an unsigned comparison to catch values less than zero as well as those greater than SHRT_MAX.

Alternatively, the lengthier expression f < 0 || f > (signed)SHRT_MAX ? -1 : f could be used, with the cast guarding against SHRT_MAX being defined as an unsigned constant.

For fdopen() and vdprintf(), the caller passes in the file descriptor, so it could be arbitrary or even byzantine; hence the care taken to ensure _fileno_short is always either -1 or a plausibly valid non-negative file descriptor representable in a short.
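The two range checks discussed for fopen.c:63 can be compared side by side. The helper names are hypothetical and the exact expression at that line is not quoted in this thread; the sketch assumes it casts f to unsigned, as described, and also casts SHRT_MAX so the comparison is unsigned regardless of how SHRT_MAX is defined.

```c
#include <limits.h>

/*
 * One unsigned comparison: a negative f converts to a huge unsigned value,
 * so it fails the test along with anything above SHRT_MAX.
 */
static int
short_fd_unsigned(int f)
{
	return ((unsigned)f > (unsigned)SHRT_MAX ? -1 : f);
}

/* The lengthier signed form; the (int) cast guards against an unsigned SHRT_MAX. */
static int
short_fd_signed(int f)
{
	return (f < 0 || f > (int)SHRT_MAX ? -1 : f);
}
```

Both forms return -1 for every negative value and every value above 32,767, and f unchanged otherwise; the choice between them is a matter of readability.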

lib/libc/stdio/fopen.c:63

In fopen() and freopen(), if the file descriptor is negative, we error out before reaching this point, so there is no issue.

In fdopen() and vdprintf(), we should immediately return EBADF if the caller-provided descriptor is negative, just like we currently return EMFILE if it exceeds SHRT_MAX.

rebased to freebsd/main

  • stdio: rename short _fileno to _fileno_short for legacy abi compatibility, add new int _fileno in struct __sFILE
  • stdio: return error EBADF when fd < 0 in fdopen() and vdprintf()
  • stdio: check file descriptor opened for writing in vdprintf()
This revision now requires review to proceed. Tue, Jan 20, 11:17 PM
lib/libc/stdio/fopen.c:63

I've added fd < 0 checks returning EBADF to fdopen() and vdprintf().

For vdprintf() I've also added a check that the passed in file descriptor is compatible with writing, similar to the open mode check that fdopen() already implements.
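A minimal sketch of the two kinds of checks described here, assuming they are done with fcntl(F_GETFL) like the open-mode check in fdopen(). check_write_fd() is a hypothetical name, and the errno returned for a read-only descriptor is an assumption of this sketch (EBADF, matching what write(2) reports for such a descriptor).

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical pre-flight check for a descriptor that will back a
 * write-only FILE, as in vdprintf(). Returns 0 or an errno value.
 */
static int
check_write_fd(int fd)
{
	int flags;

	if (fd < 0)
		return (EBADF);		/* reject before making any syscall */
	if ((flags = fcntl(fd, F_GETFL)) < 0)
		return (errno);		/* EBADF for a descriptor that is not open */
	if ((flags & O_ACCMODE) == O_RDONLY)
		return (EBADF);		/* assumption: same errno as write(2) */
	return (0);
}
```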

Considering that the behaviour of vdprintf() is largely equivalent to fdopen() + vfprintf() + fdclose(), it may be worth factoring the FILE initialisation logic into a single internal function that fdopen() and vdprintf() both share, though that should probably be a separate change / pull request.

This revision is now accepted and ready to land. Wed, Jan 21, 7:34 PM

Oh, this is the sort of thing I really hoped to avoid: exposing all of FILE as a new ABI. I was careful in my previous versions to figure out which parts of FILE were actually used (not _fileno which doesn't matter, but all the _other_ fields that things like gnulib abuse) and then tried to make FILE mostly opaque, and added the wider _fileno in a slot that was safe to reuse that was near the front still. I think I also used some #ifdef's to ensure that only the fields in FILE that were part of the public ABI were exposed outside of libc. This doesn't do any of that.

Also, you can't do this this way. You have to bump the symbol versions of all symbols that create a FILE object that uses _fileno, and in the old compat version, still limit to 16-bit descriptors because that old software is using the old fileno() macro that reads the file descriptor from the old location. Only in the "new" versions of functions like fopen(), etc. can you actually use the wider fileno field.
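The versioning requirement can be modelled in plain C as two entry points over one shared implementation that differ only in the largest descriptor they accept. The function names are hypothetical. Binding them to symbol versions is not shown; FreeBSD libc typically does that with Symbol.map entries together with the __sym_compat() and __sym_default() macros from <sys/cdefs.h>, and the version names would depend on the release.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>

/* Shared body: identical except for the descriptor range it accepts. */
static FILE *
fdopen_common(int fd, const char *mode, int max_fd)
{
	if (fd < 0) {
		errno = EBADF;
		return (NULL);
	}
	if (fd > max_fd) {
		errno = EMFILE;	/* old ABI: FILE can only record a 16-bit fd */
		return (NULL);
	}
	return (fdopen(fd, mode));
}

/* Compat version: old binaries may read fp->_file via the fileno() macro. */
static FILE *
fdopen_compat(int fd, const char *mode)
{
	return (fdopen_common(fd, mode, SHRT_MAX));
}

/* New default version: any valid descriptor. */
static FILE *
fdopen_new(int fd, const char *mode)
{
	return (fdopen_common(fd, mode, INT_MAX));
}
```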

@jhb Thank you for taking a look at this and apologies for my delay in responding as I've been on holiday the past week.

I've been moving my applications and servers over to FreeBSD from various Linux flavours over the past year, both for my own sanity and to provide some technical diversity in a space that is completely overrun with penguins.

One application failed last year with fdopen() returning EMFILE, which I thought very strange given that the file descriptor was already open. This led me down a rabbit hole where I found five other Bugzilla reports going back over 16 years, as well as your work from roughly a decade ago, which for whatever reason was never incorporated, in whole or in part, as far as I can tell.

Given the continued presence of alternative __sputc() implementations, one of which dates back to at least 1994 and is optimised for a compiler (pcc) which is no longer used for a machine (vax) which is no longer made and no longer supported, there is clearly an admirable tradition of conservatism and if-it-ain't-broke-don't-fix-it within FreeBSD. The fact that this obvious defect/libc-kernel-divergence/arbitrary-stdio-implementation-limitation/call-it-what-you-will continues to exist decades after being identified is rather less admirable and rather concerning.

Although the FreeBSD kernel supports millions of open file descriptors, many people and the applications they use never need more than a few thousand, so the stdio limitation is of little consequence to them. Most others, I suspect, who need a working stdio implementation with tens or hundreds of thousands of mmap'd files, sockets, etc. would conclude (unfairly in my opinion) that FreeBSD is not fit for purpose for such applications, and use a Linux-based distribution rather than spend time rewriting their applications and libraries to eliminate stdio, or forking the operating system.

Over the course of going down this rabbit hole, I've gotten to building kernels, userlands, ports packages, VM images and installation media with these stdio patches, which proved surprisingly straightforward; that is a credit to the FreeBSD project. I believe I could maintain a patched base OS indefinitely, keeping it up to date with security patches and producing packages and installation media for the servers I run as they change over time, though this would be considerably time-consuming. However, I would rather this stdio shortcoming be fixed in the FreeBSD source to the benefit of all, and would be happy to work on this further to accommodate wider requirements and make it suitable for inclusion, should the FreeBSD community be open to this.

@jhb so the submitter is going to also need to add/bump symbol versions to a bunch of stdio routines? Is that what you're saying?

(I'm looking for something concrete for them to take as a next step so this doesn't languish ;-) )

In D54355#1256017, @jhb wrote:

Oh, this is the sort of thing I really hoped to avoid: exposing all of FILE as a new ABI. I was careful in my previous versions to figure out which parts of FILE were actually used (not _fileno which doesn't matter, but all the _other_ fields that things like gnulib abuse) and then tried to make FILE mostly opaque, and added the wider _fileno in a slot that was safe to reuse that was near the front still. I think I also used some #ifdef's to ensure that only the fields in FILE that were part of the public ABI were exposed outside of libc. This doesn't do any of that.

The change as it is currently is focused entirely on supporting 32-bit file descriptors in stdio and little else as that is what I need. Populating the old 16-bit field to support older binaries with inlined fileno() or which read FILE->_file directly seemed reasonable to me. Old binaries would work as is with new FILE objects when less than 32,768 descriptors are open. With more than 32,767 open descriptors, rather than EMFILE, old binaries would receive valid open FILE objects which would work normally (i.e. they can be fread()'d, fwrite()'d, fclose()'d, etc.) save that inlined fileno() would return -1 which is what non-file descriptor based FILE objects do anyways such as those returned by fmemopen() or funopen(). Pragmatically I would consider an application receiving a usable FILE object on the 32,768th fopen() albeit with inlined fileno() returning -1 no worse than receiving an EMFILE and arguably somewhat better.

Preserving old behaviour for old binaries more faithfully would come at the cost of implementation complexity, but can certainly be done with versioned entry points for stdio and possibly a polymorphic FILE layout with some sort of magic or detection, so that the updated stdio can deal with FILE objects created by an old stdio (e.g. a shared-object plugin statically linked against an old libc.a is dlopen()'d by an application dynamically linked against a new libc.so, and the application fopen()'s a FILE and passes it to the plugin). However, my understanding is that 16-CURRENT does not need to maintain any compatibility with 15-RELEASE or earlier versions.

Not to be too frank, but I'm not exactly ignorant of this issue. You can see my last attempt at this and the trivial amount of work to add symbol versions here: https://github.com/freebsd/freebsd-src/compare/master...bsdjhb:freebsd:stdio_file Today the new versions would be newer than FBSD_1.4 though, and perl has since been fixed to use fdclose(). The gnulib issue is the one blocker I did not get over previously, as gnulib's build system is somewhat obtuse and it wasn't clear to me what gnulib thinks is wrong about our ungetc() and whether we should fix our ungetc() or not. The other path I went down previously was trying to be more proactive in hiding as much of FILE as possible outside of libc, and that branch is here: https://github.com/freebsd/freebsd-src/compare/master...bsdjhb:freebsd:stdio_hide.

Also, we used to have two separate FILE objects long ago and that sucked: https://github.com/freebsd/freebsd-src/commit/1e98f88776fc606df245a382685b1ac634a81389

I can see that int _flags2 was added to FILE ten years ago and only one bit has been defined leaving 31 unused bits, precisely what's needed to represent any non-negative file descriptor.
So I'm reworking this to not add a new field but rather continue using short _file for <32768 and the high 31 bits of _flags2 for descriptors >32767 && <0x7fffffff; the size of FILE will not need to change.
Where the descriptor fits in a short, _flags2[31:1] can be set to 0x00007fff so that _flags2[31:1] is always non-zero indicating a FILE object constructed by new stdio vs. one constructed by old stdio; (mind you I'm not entirely convinced this is needed or useful).
I'll remove the fileno macro so that _flags2 does not get compiled into applications, and I'll add versioned fopen and friends so old binaries get old short-only FILEs.
Not sure how to hook stdin/stdout/stderr construction and version it, looks like it's initialised at load time; likely doesn't matter unless apps freopen stdin/out/err and likely not even then.
This will be a bit more intrusive than the older patch, but will hopefully satisfy backwards compatibility requirements so that it could be backported for 15.1 and/or 14.4.
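A sketch of the _flags2 encoding proposed here, assuming a 32-bit unsigned int and modelling only the two relevant fields (struct, macro and helper names are hypothetical). The comment does not say what _file holds for a large descriptor; this sketch assumes -1, so that an old inlined fileno() sees the same value it would for a non-descriptor FILE.

```c
#include <limits.h>

#define	F2_FLAG_MASK	0x1u	/* bit 0: the one _flags2 flag already defined */
#define	F2_SMALL_MARK	0x7fffu	/* bits 31:1 value meaning "fd is in _file" */

/* Hypothetical two-field model; the real _flags2 is an int. */
struct file_model2 {
	short		_file;
	unsigned int	_flags2;	/* unsigned to keep the shifts well defined */
};

/* Store fd (>= -1): small values in _file, large ones in _flags2[31:1]. */
static void
model2_set_fd(struct file_model2 *fp, int fd)
{
	unsigned int hi;

	if (fd > SHRT_MAX) {
		fp->_file = -1;		/* assumption: old readers see -1 */
		hi = (unsigned int)fd;
	} else {
		fp->_file = (short)fd;
		hi = F2_SMALL_MARK;	/* non-zero: "built by new stdio" */
	}
	fp->_flags2 = (fp->_flags2 & F2_FLAG_MASK) | (hi << 1);
}

static int
model2_get_fd(const struct file_model2 *fp)
{
	unsigned int hi = fp->_flags2 >> 1;

	/* 0 (FILE from old stdio) or the marker: _file is authoritative. */
	return (hi > F2_SMALL_MARK ? (int)hi : fp->_file);
}
```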

As for FILE hiding, I'll do that separately. Right now that looks like: rename/move struct sFILE into local.h as sFILE_private, leave a partially obfuscated struct definition in stdio.h as sFILE_public, #define sFILE sFILE_private in namespace.h, and have stdio.h #define sFILE sFILE_public if sFILE is not already defined. Then obfuscate or take out more of the fields in __sFILE_public as cat, gnulib and others are fixed.
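A rough sketch of the public/private split, with a hypothetical and heavily reduced field selection: which fields stay in the public prefix would depend on what the inline macros and consumers such as cat and gnulib still need. The invariant that matters for the ABI is that both views have the same size and identical offsets for every exposed field, which the static assertions check at compile time.

```c
#include <stddef.h>

/* Hypothetical private view (local.h in the proposal): libc sees everything. */
struct sFILE_private {
	unsigned char	*_p;		/* prefix still used by public inline macros */
	int		_r;
	int		_w;
	short		_flags;
	short		_file;
	int		_flags2;	/* from here on: hidden from applications */
	void		*_cookie;
};

/* Hypothetical public view (stdio.h): same prefix, remainder opaque bytes. */
struct sFILE_public {
	unsigned char	*_p;
	int		_r;
	int		_w;
	short		_flags;
	short		_file;
	unsigned char	_opaque[sizeof(struct sFILE_private) -
			    offsetof(struct sFILE_private, _flags2)];
};

/* Outside libc, pick the public view unless namespace.h already chose one. */
#ifndef sFILE
#define	sFILE	sFILE_public
#endif
typedef struct sFILE model_FILE;

/* Both views must agree on size and on the offset of every exposed field. */
_Static_assert(sizeof(struct sFILE_public) == sizeof(struct sFILE_private),
    "FILE size must not change");
_Static_assert(offsetof(struct sFILE_public, _file) ==
    offsetof(struct sFILE_private, _file), "_file offset must match");
```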

OpenBSD and DragonFlyBSD both have interesting takes on stdio; OpenBSD's is fully opaque now though I'm not sure that having every putc() be a function call is a good tradeoff. DragonFlyBSD's stdio has a small public header with buffer pointers/counters exposed for macros/inlines.

I can see that int _flags2 was added to FILE ten years ago and only one bit has been defined leaving 31 unused bits, precisely what's needed to represent any non-negative file descriptor.
So I'm reworking this to not add a new field but rather continue using short _file for <32768 and the high 31 bits of _flags2 for descriptors >32767 && <0x7fffffff; the size of FILE will not need to change.

I'd hold off on that and have more of a chat about it first.

D54355 in its current form, aside from not versioning new fopen and friends, also increases the size of FILE and references the new field at the end of the struct in the fileno() macro.
This is not good if the goal is to reduce code that touches FILE internals outside of libc stdio; it also means objects and apps compiled against the D54355 stdio may segfault or exhibit undefined behaviour if run against a non-D54355 libc.
More concerning is that you don't need to call fopen/fdopen/freopen/etc. as you get stdin/out/err for free from libc and calling fileno() on one of them could do it; the versioned symbol wouldn't stop this.
(Mind you an app that doesn't fopen any files but calls fileno() on stdin/out/err would be somewhat rare and odd.)

I'm nearly done with the second version so will put that up in a new differential (or maybe a draft PR against the freebsd github) and link to it from here.
If adding a new field is deemed better than using the rest of _flags2 then you still need a _flags2 bit/magic field to confirm presence of new field for runtime safety/sanity.

BTW, I've just noticed that the f > SHRT_MAX check in freopen.c leaks the descriptor (i.e. it wipes the FILE and returns EMFILE, but it doesn't close the newly opened file descriptor it rejects).
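The fix amounts to closing the descriptor on the rejection path before setting errno. A minimal sketch with a hypothetical helper, not the actual freopen.c code:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: accept a freshly opened descriptor only if it fits
 * the FILE's descriptor field; otherwise close it (the step missing from
 * the freopen() error path) and fail with EMFILE.
 */
static int
adopt_or_close(int f, int max_fd)
{
	if (f > max_fd) {
		(void)close(f);		/* do not leak the rejected descriptor */
		errno = EMFILE;		/* set after close(), which may clobber errno */
		return (-1);
	}
	return (f);
}
```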

I've reworked this (against releng/15.0) and put it in github here:
https://github.com/freebsd/freebsd-src/compare/releng/15.0...svenski123:freebsd-src:fd32compat

This version:

  • stores large file descriptors in _flags2
  • adds versioned symbols for fopen, fdopen and freopen
  • fixes some missing initialisation in a few *printf variants that construct fake FILEs internally
  • fixes descriptor leak in freopen
  • reorders some #include per guidance in man style
  • some other whitespace/line break changes as suggested by tools/build/checkstyle9.pl

I've built and installed this in a jail under 15.0; a simple test app that opens 32k streams works/fails as expected, and checkworld is currently running.

I would appreciate a steer if this looks like a better approach for something that can hopefully go into a minor release.

I can (and will eventually) separate out the changes into separate fd32, other bug fixes, whitespace/formatting, #include reorder and put them into separate differentials or GH pull requests.

I have reworked this change into a new branch off of CURRENT organised into three commits (with detailed, multi-line commit messages):

  • 2aa94c66b4a4 stdio: Support 32-bit file descriptors in FILE
  • 582805ad8d8d stdio: Remove fileno macros, add internal inline fileno get/set functions
  • 23897d36c606 stdio: Thread safety fixes, sanity checks, initialisation fixes

As compared to the version in D54354/D54355, the new version:

  • Leaves the name of the _file field unchanged (i.e. no D54354)
  • Includes the fd sanity checks added to fdopen/vdprintf
  • Adds missing thread locking for FILE objects with _flags==0 in freopen
  • Fixes potential fd leak in freopen if new fd >32767; rare in practice, could only happen if passed FILE not open or dup2 fails
  • Removes fileno/fileno_unlocked macros (and thus _file/fd details) from public API
  • Stores large fds (>32767) in unused portion of _flags2
  • Does not add fields to __sFILE or change its size
  • Adds versioned symbols for fopen, fdopen and freopen preserving the existing behaviour for old binaries
  • Note vdprintf has not been versioned as the FILE object it uses internally was never exposed to the caller

As stdio's access to _file is now via inlined static functions in local.h, it would be fairly trivial to have it store large fds in a new field (as done in D54355 and jhb's earlier branch) if that is considered preferable to using the unused portion of _flags2.

As it is not quite clear to me how to change a 2-stack differential into a 3-stack differential, I have opened a draft pull request in the FreeBSD github repository at the following URL.

If helpful for process or record keeping, I can replace D54354 with an empty diff, and replace D54355 with the three commits squashed into one.
Alternatively I can create a new stack of three differentials for the new branch with git-arc if that would make things easier.

I've implemented an improved encoding for large file descriptors that is efficient and only requires 16 additional bits from _flags2 instead of 31.
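More importantly, when the file descriptor fits in a signed short, the 16 additional bits are all zero, so the FILE struct data in memory is identical to what unpatched stdio stores.
Given a 32-bit FD, the bottom 16 bits of FD go in _file, and the top 16 bits of FD ^ ((FD << 16) >> 16) (FD XORed with its own low half, sign-extended) go in the new field; the bottom 16 bits of that XOR are always zero.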
I took inspiration from the AMD64 canonical virtual addresses scheme for this.
I've added the commit to the draft github pull request I mentioned earlier (URL below).
https://github.com/freebsd/freebsd-src/pull/2005
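Worked through in C, the encoding keeps the low 16 bits of the descriptor in _file and stores the top 16 bits of FD ^ sext16(FD) in the spare field. Because the low half of that XOR is always zero, nothing is lost, and the spare bits are zero exactly when the descriptor fits in a signed short (including -1). Writing (FD << 16) >> 16 literally on a signed int overflows for large descriptors in C, so this sketch sign-extends with masking instead. The struct and function names are hypothetical, and where the 16 bits live inside _flags2 is not modelled.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v; a portable spelling of (v << 16) >> 16. */
static int32_t
sext16(uint32_t v)
{
	return ((int32_t)((v & 0xffffu) ^ 0x8000u) - 0x8000);
}

/* Hypothetical split of a descriptor into the _file value and the extra bits. */
struct fd_split {
	int16_t		lo;	/* stored in short _file */
	uint16_t	hi;	/* stored in 16 spare bits, e.g. of _flags2 */
};

static struct fd_split
fd_encode(int32_t fd)
{
	uint32_t u = (uint32_t)fd;
	uint32_t x = u ^ (uint32_t)sext16(u);	/* low half of x is always 0 */
	struct fd_split s = { (int16_t)sext16(u), (uint16_t)(x >> 16) };

	return (s);
}

static int32_t
fd_decode(struct fd_split s)
{
	uint32_t u = (uint32_t)(int32_t)s.lo ^ ((uint32_t)s.hi << 16);

	/* Map back to int32_t without an implementation-defined conversion. */
	return (u <= INT32_MAX ? (int32_t)u : -(int32_t)(UINT32_MAX - u) - 1);
}
```

For example, 40000 (0x9c40) encodes as _file = -25536 with 0xffff in the spare field, and decodes back to 40000.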