Page MenuHomeFreeBSD

kern: never restart syscalls calling closefp(), e.g. close(2)
ClosedPublic

Authored by kevans on Sun, Nov 22, 1:25 AM.

Details

Summary

All paths leading into closefp() will either replace or remove the fd from the filedesc table, and closefp() will call fo_close that can and do currently sleep without regard for the possibility of an ERESTART. This can be dangerous in multithreaded applications as another thread could have opened another file in its place that is subsequently operated on upon restart.

The following are seemingly the only ones that will pass back ERESTART in-tree, at least, though I haven't looked past sys/kern, sys/fs, sys/ufs:

    • soo_close/soclose (SO_LINGER)
  • fusefs
  • nfsclient

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I wonder if it is slightly better to convert errors at closefp() instead of kern_close(). Right now closefp() is static and all of its other uses are cast to void.

sys/kern/kern_descrip.c
1325 ↗(On Diff #79842)

I think some explanation there would be useful for a reader. It is enough to say that we already removed fd from the filedesc table, so restart would not operate on our fd.

This revision is now accepted and ready to land.Sun, Nov 22, 7:48 AM
In D27310#610112, @kib wrote:

I wonder if it is slightly better to convert errors at closefp() instead of kern_close(). Right now closefp() is static and all of its other uses are cast to void.

Sure, this makes sense. If, e.g., kern_dup() starts trying to pick up some class of errors from the close then this would be an easy oversight and you'd get similar bad results. I can't imagine why it would start to care, but it would be in a better position to accurately report the error to userland than close() at least.

kevans retitled this revision from kern: never restart close(2) to kern: never restart syscalls calling closefp(), e.g. close(2).
kevans edited the summary of this revision. (Show Details)

Push the translation into closefp() and explain why it's there.

This revision now requires review to proceed.Sun, Nov 22, 3:39 PM
This revision is now accepted and ready to land.Sun, Nov 22, 3:48 PM
jilles requested changes to this revision.Sun, Nov 22, 10:34 PM

Restarting a close() would indeed be very bad, but returning [EINTR] might cause userland to do the same.

Since pre-POSIX systems did not agree about whether an [EINTR] error from close() should leave the fd open or not, POSIX up to POSIX.1-2008 left this unspecified. Later, https://www.austingroupbugs.net/view.php?id=529 specified that close() only return [EINTR] if the fd is left open and specified the error [EINPROGRESS] for the case where a signal interrupts the wait for a closing operation which continues in the background (the example given is of a tape drive rewinding, although SO_LINGER and NFS cache flushes behave similarly).

This revision now requires changes to proceed.Sun, Nov 22, 10:34 PM

Restarting a close() would indeed be very bad, but returning [EINTR] might cause userland to do the same.

Since pre-POSIX systems did not agree about whether an [EINTR] error from close() should leave the fd open or not, POSIX up to POSIX.1-2008 left this unspecified. Later, https://www.austingroupbugs.net/view.php?id=529 specified that close() only return [EINTR] if the fd is left open and specified the error [EINPROGRESS] for the case where a signal interrupts the wait for a closing operation which continues in the background (the example given is of a tape drive rewinding, although SO_LINGER and NFS cache flushes behave similarly).

The problem here is that we've been returning EINTR since, apparently, the 90's. I think it'd be good to re-evaluate that and consider returning success on interrupt, but it's probably debatable if that can be MFC'd.

I do not see this EINTR and EINPROGRESS change in the most recent version of POSIX I have, IEEE Std 1003.1™-2017.

The referenced austin group ticket even went as far as propose posix_close() with some retry flag, which again was not added. More, there were strong objections from Linux side arguing that EINTR behavior should be as it is de-facto implemented by Linux and FreeBSD: file is always closed.

More, our libthr cancellation points in close() ensure that we call the syscall first. As I understand, MacOSX could return EINTR and not issue syscall if cancellation is detected before syscall. We always do the syscall and not terminating before it.

So I think it is correct change that have almost no probability of contradicting to newer POSIX.

In D27310#610450, @kib wrote:

I do not see this EINTR and EINPROGRESS change in the most recent version of POSIX I have, IEEE Std 1003.1™-2017.

The referenced austin group ticket even went as far as propose posix_close() with some retry flag, which again was not added. More, there were strong objections from Linux side arguing that EINTR behavior should be as it is de-facto implemented by Linux and FreeBSD: file is always closed.

The changes are not in the 2017 version because the ticket is tagged issue8, i.e. the next major release of the specification, not a technical corrigendum. It will be years before issue 8 will be released. It is possible that opinions will change during that time.

More, our libthr cancellation points in close() ensure that we call the syscall first. As I understand, MacOSX could return EINTR and not issue syscall if cancellation is detected before syscall. We always do the syscall and not terminating before it.

So I think it is correct change that have almost no probability of contradicting to newer POSIX.

It is definitely a step in the right direction.

This revision is now accepted and ready to land.Tue, Nov 24, 10:41 PM
This comment was removed by jilles.