Page MenuHomeFreeBSD

sh: Don't assume EINTR means SIGALRM
ClosedPublic

Authored by des on Fri, Nov 14, 5:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 18, 1:26 AM
Unknown Object (File)
Sat, Nov 15, 5:01 PM
Unknown Object (File)
Sat, Nov 15, 4:37 PM
Unknown Object (File)
Sat, Nov 15, 3:11 PM
Unknown Object (File)
Sat, Nov 15, 12:26 PM
Unknown Object (File)
Sat, Nov 15, 12:16 PM
Unknown Object (File)
Sat, Nov 15, 12:01 PM
Unknown Object (File)
Sat, Nov 15, 11:53 AM
Subscribers

Details

Summary

While waiting for input in the read builtin, if select() is interrupted
but there is no pending signal, we act like we timed out, and return the
same status as if we had been interrupted by SIGALRM, instead of looping
until we actually do time out.

  • Replace the single select() call with a ppoll() loop.
  • Improve validation of the timeout value. We now accept things like "1h30m15s", which we used to silently truncate to "1h". The flip side is that we no longer accept things like "1hour" or "5sec".
  • Modify the existing read -t 0 test case to verify that read returns immediately when there is input and fails immediately when there isn't.
  • Add a second test case which performs the same tests with a non-zero timeout value.

PR: 290844
MFC after: 1 week

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68678
Build 65561: arc lint + arc unit

Event Timeline

des requested review of this revision.Fri, Nov 14, 5:24 PM
bdrewery added inline comments.
bin/sh/miscbltin.c
255

This isn't in a loop.

I suspect it won't be this simple either so be sure to run the bin/sh tests.

This revision now requires changes to proceed.Fri, Nov 14, 5:30 PM

Yeah I read the if as a while and forgot that FreeBSD does not update tv when select() is interrupted. I'll work on this some more.

Poudriere's test suite has more testing of this timeout and SIGALRM. It checks for timeout adjusting on [EINTR] and ensuring timeout runs within the period specified. If you want to try that you'll need to modify external/sh/miscbltin.c with the patch and then use make checkquick. Or I can check on the next patch.

My intuition is we'll need a SIGALRM handler which is going to get tricky to not interfere with user traps. I haven't thought on it deeply.

This passes the sh tests but times out (timeout(1)) on poudriere. Looking into why.

This passes all tests. However, I noticed that we don't have full coverage for the read -t 0 case. The read11 test case verifies that read -t 0 returns 128 + SIGALRM if no input is available, but there is no test case that verifies that it succeeds if input is available.

des marked an inline comment as done.Fri, Nov 14, 6:37 PM
# echo yes | /usr/bin/time ./sh -c 'read -t 5 n; echo $n'
yes
        5.00 real         0.92 user         4.05 sys

should be 0. We may need another test case using timeout(1).

This revision now requires changes to proceed.Fri, Nov 14, 6:47 PM

improve timeout argument parser

This works for me. It could be cleaned up more. Confirmed it prevents the original issue reported.

  • check status == 1.
  • handle -t 0 special, which avoids clock_gettime and doesn't sleep forever.

We may want to check for errno == EINTR explicitly.

diff --git bin/sh/miscbltin.c bin/sh/miscbltin.c
index 9d0280bb548a..8de8942f4fa5 100644
--- bin/sh/miscbltin.c
+++ bin/sh/miscbltin.c
@@ -162,6 +162,7 @@ readcmd(int argc __unused, char **argv __unused)
 	int is_ifs;
 	int saveall = 0;
 	ptrdiff_t lastnonifs, lastnonifsws;
+	struct timespec tnow, tend;
 	struct timeval tv;
 	char *tvptr;
 	fd_set ifds;
@@ -187,19 +188,20 @@ readcmd(int argc __unused, char **argv __unused)
 			tv.tv_sec = strtol(shoptarg, &tvptr, 0);
 			if (tvptr == shoptarg)
 				error("timeout value");
-			switch(*tvptr) {
-			case 0:
+			switch (*tvptr) {
 			case 's':
+				tvptr++;
 				break;
 			case 'h':
 				tv.tv_sec *= 60;
 				/* FALLTHROUGH */
 			case 'm':
 				tv.tv_sec *= 60;
+				tvptr++;
 				break;
-			default:
-				error("timeout unit");
 			}
+			if (*tvptr != '\0')
+				error("timeout unit");
 			break;
 		}
 	}
@@ -216,16 +218,36 @@ readcmd(int argc __unused, char **argv __unused)
 		/*
 		 * Wait for something to become available.
 		 */
+		status = sig = 0;
 		FD_ZERO(&ifds);
 		FD_SET(0, &ifds);
-		status = select(1, &ifds, NULL, NULL, &tv);
+		if (tv.tv_sec == 0) {
+			status = select(1, &ifds, NULL, NULL, &tv);
+			sig = pendingsig;
+		} else {
+			clock_gettime(CLOCK_UPTIME, &tnow);
+			tend = tnow;
+			tend.tv_sec += tv.tv_sec;
+			do {
+				status = select(1, &ifds, NULL, NULL, &tv);
+				if (status == 0 || status == 1 ||
+				    (status < 0 && (sig = pendingsig) != 0))
+					break;
+				tv.tv_sec = tend.tv_sec - tnow.tv_sec;
+				tv.tv_usec = (tend.tv_nsec - tnow.tv_nsec) / 1000;
+				if (tv.tv_usec < 0) {
+					tv.tv_sec--;
+					tv.tv_usec -= 1000000;
+				}
+				clock_gettime(CLOCK_UPTIME, &tnow);
+			} while (timespeccmp(&tnow, &tend, <));
+		}
 		/*
 		 * If there's nothing ready, return an error.
 		 */
 		if (status <= 0) {
 			while (*ap != NULL)
 				setvar(*ap++, "", 0);
-			sig = pendingsig;
 			return (128 + (sig != 0 ? sig : SIGALRM));
 		}
 	}

Edited my comment with a more cleaned up version.

What would happen if a catch-able signal was delivered right before we entered select? Would the read built-in become blocked until timeout occurred then? [If yes, this is pre-existing issue in the code, I am only curious there]

In D53761#1227691, @kib wrote:

What would happen if a catch-able signal was delivered right before we entered select? Would the read built-in become blocked until timeout occurred then? [If yes, this is pre-existing issue in the code, I am only curious there]

Yes. To avoid this we'd have to block signals before entering the loop and use ppoll() or pselect().

bin/sh/miscbltin.c
233

Should this be ppoll as well? Masked signals do not interrupt poll.

bin/sh/miscbltin.c
233

So? This won't block.

bin/sh/miscbltin.c
233

Right, timeout is zero, so it is peek.

des marked 2 inline comments as done.Sun, Nov 16, 11:23 AM
jilles added inline comments.
bin/sh/miscbltin.c
191–207

This will cause wrong timeout specifications like -t 30sec or -t 1h30m to be reported as an error. I think it's acceptable, but it should be called out in the commit message.

bin/sh/miscbltin.c
191–207

We can fairly trivially add support for 1h30m, just give me a moment...

further improve timeout parser

des marked an inline comment as done.Mon, Nov 17, 4:16 PM

@bdrewery are you ok with this version of the patch? did you figure out what was going wrong in poudriere?

This one passes all of the Poudriere tests too (more read -t use than a bulk run). The previous problem was -t $n taking n seconds. bin/sh tests lack checking for duration on read -t.

This revision is now accepted and ready to land.Tue, Nov 18, 1:49 AM
This revision now requires review to proceed.Tue, Nov 18, 6:16 AM
bdrewery added inline comments.
bin/sh/tests/builtins/read11.0
21 ↗(On Diff #166654)

In my experience we should test as high as +1s or +2s the expected timeout to avoid a failing test on a loaded system. But I won't block it as these tests usually run alone.

This revision is now accepted and ready to land.Tue, Nov 18, 3:34 PM
This revision was automatically updated to reflect the committed changes.