Page MenuHomeFreeBSD

kern: tty: fix EOF handling for canonical reads
ClosedPublic

Authored by kevans on Jan 9 2024, 6:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 19, 10:33 AM
Unknown Object (File)
Wed, Nov 13, 8:34 PM
Unknown Object (File)
Fri, Nov 8, 11:34 AM
Unknown Object (File)
Fri, Nov 8, 11:29 AM
Unknown Object (File)
Oct 18 2024, 10:46 PM
Unknown Object (File)
Oct 18 2024, 10:44 PM
Unknown Object (File)
Oct 18 2024, 9:38 PM
Unknown Object (File)
Oct 18 2024, 8:06 PM
Subscribers

Details

Summary

If the read(2) buffer is one byte short of an EOF, then we'll end up
reading the line into the buffer, then re-entering and seeing an EOF at
the beginning of the inq, assuming it's a zero-length line.

Fix this corner-case by searching one more byte than we have available
for an EOF. If we found it, then we'll trim it here; otherwise, we'll
limit our read to just the space we have in the out buffer and the next
read(2) will (potentially) read the remainder of the line.

PR: 276220

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kevans requested review of this revision.Jan 9 2024, 6:24 PM
cy added a subscriber: cy.

Looks good.

This revision is now accepted and ready to land.Jan 9 2024, 8:08 PM

I think this is fine.

So if we have e.g. the string "aVEOF" queued, what does FIONREAD return?

In D43378#989089, @kib wrote:

So if we have e.g. the string "aVEOF" queued, what does FIONREAD return?

Both before and after this change it returns 2 (and if you just have VEOF enqueued, 1), which seems wrong and should probably also be fixed. The ttyinq tracks the "beginning" of the next line as part of ttyinq_canonicalize, which is naturally the character after a VEOF -- so it's not just off-by-one, because if you then enqueue "aVEOFVEOF" you'll get 3, but it should still be 1 until we've peeled off the "a". Probably ttyinq_canonicalize should track a lineend that mostly moves as you read from it.

Right, so I suspect that the right fix is to not queue VEOF in canonical mode at all. It should be an indicator of the position for the last VEOF, as one possibility.

I would tend to agree, but POSIX doesn't seem to define a -ICANON -> ICANON or vice-versa transition as flushing the buffer, so I think that approach adds some complexity as you need to interpret VEOF based on the state of lflags at the time of read/FIONREAD rather than at the time of write

On Linux, at least, FIONREAD will effectively return whatever's there with control characters stripped. So "HelloVEOFHelloVEOFVEOF" will reflect 10 bytes, "HelloVEOFHello" reflects 5 , which also makes sense and might even be easier to implement in some ways. I guess we don't and probably have never guaranteed that if FIONREAD indicates 10 bytes available that you'll get the full 10 bytes in a single read(2) call.

On Linux, at least, FIONREAD will effectively return whatever's there with control characters stripped. So "HelloVEOFHelloVEOFVEOF" will reflect 10 bytes, "HelloVEOFHello" reflects 5 , which also makes sense and might even be easier to implement in some ways. I guess we don't and probably have never guaranteed that if FIONREAD indicates 10 bytes available that you'll get the full 10 bytes in a single read(2) call.

If FIONREAD returns 10, then read(2) indeed should non-blockingly return 10 or more (assuming exclusive access).

I do not think that Linux is an exemplar implementation. Classical tty layer should be still present in OpenBSD/NetBSD, alternatively you could try to find FreeBSD <= 5.x (not sure about 6.x).

In D43378#989266, @kib wrote:

On Linux, at least, FIONREAD will effectively return whatever's there with control characters stripped. So "HelloVEOFHelloVEOFVEOF" will reflect 10 bytes, "HelloVEOFHello" reflects 5 , which also makes sense and might even be easier to implement in some ways. I guess we don't and probably have never guaranteed that if FIONREAD indicates 10 bytes available that you'll get the full 10 bytes in a single read(2) call.

If FIONREAD returns 10, then read(2) indeed should non-blockingly return 10 or more (assuming exclusive access).

I do not think that Linux is an exemplar implementation. Classical tty layer should be still present in OpenBSD/NetBSD, alternatively you could try to find FreeBSD <= 5.x (not sure about 6.x).

OpenBSD behavior seems to match our current behavior here:

If you only read(2) of 5 bytes and you feed in 5 characters + VEOF, the VEOF is not dropped, which I think is actually incorrect behavior:

Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.

FIONREAD over-counts and includes the VEOF:

kevobsd1# ./a.out
# Sent: A^D
AReady: 2
A^C
kevobsd1# ./a.out
# Sent: A^DDude 
ADudeReady: 2
A^C
kevobsd1# ./a.out 
# Sent: A^DDude^D
ADudeReady: 7
ADude^C

That pairs with this version that additional does the read() after FIONREAD to understand what that returns:

kevobsd1# ./a.out
# Sent: A^DDude
ADudeReady: 2
Read 1 bytes
^C
kevobsd1# ./a.out 
# Sent: A^DDude^D
ADudeReady: 7
Read 1 bytes
Read 4 bytes
^C

The first prompt just demonstrates a shorter-than-FIONREAD read, the second prompt demonstrates that it does indeed return the full size of all lines up-front and still splits it across multiple read(2) (since we can only do one line at a time). Finally, what happens in my somewhat contrived example of ingesting characters while -ICANON, then flipping ICANON on and reading them; in this example, canonicalization is off for 5 seconds in which I send the input, then it flips it back on and reads:

OpenBSD:

kevobsd1# ./a.out 
Canonicalization off
A^DDudeADudeRead 1 bytes
^C

Linux:

snow% ./a.out
Canonicalization off
A^DDudeRead 5 bytes
Read 1 bytes
^C

The Linux read(2) is only broken up because I'm still using a 5 byte buffer for the tests. macOS and FreeBSD behave the same as OpenBSD does here, modulo some differences in echo'ing.

So on the BSD-ish systems, the behavior is more like I'd interpret the spec: canonical mode is purely about what we do upon read(2) / FIONREAD, so we would need to be prepared for the possibility that one switches a tty to -ICANON before reading and make sure that we re-inject VEOF if we're holding that information out-of-band.

I think the other problem is that VEOF character could be changed between tty ingestion and application read(2). Today we'll canonicalize lines upon input based on VEOF, but the VEOF upon read(2) is the final authority. Right now we could end up in a situation where read(2) blocks for too long because we won't scan past the stale VEOF if it's still ICANON when we go to read(2).

So what is the conclusion?

We need to scan for VEOF both on read(2) and FIONREAD in canonical mode, and we can make FIONREAD return the count of bytes till the first VEOF? Also we would need to re-scan on VEOF change?

In D43378#989966, @kib wrote:

So what is the conclusion?

We need to scan for VEOF both on read(2) and FIONREAD in canonical mode, and we can make FIONREAD return the count of bytes till the first VEOF? Also we would need to re-scan on VEOF change?

Sorry- yes on both counts. We have to keep VEOF in the input queue to maintain historical consistency (re: interpretation at time of ingestion vs. time of read(2)), so we'll need to fix FIONREAD to truncate the returned size as needed (fairly clean) and rescan on VEOF/VEOL change. I have a patch to do the first, and the second shouldn't be that bad.

Based on observations, added:

Fix FIONREAD while we're here to match what an application can expect
read(2) to return -- scan for the first break character in the part of
the input that's been canonicalized, we'll never return more than that.

In the interim, I wrote a tool to do some testing on this stuff using pts(4) and
wrote some general tests to do some sanity testing of this stuff.

https://git.kevans.dev/kevans/orch (inspired by expect(1), but just uses lua)
https://git.kevans.dev/kevans/tty-tests/src/branch/main/test_canon.orch

I'll expand the tests a bit more as time permits, I mainly wanted to make sure
that I could exercise this stuff in a more automated fashion.

This revision now requires review to proceed.Jan 15 2024, 5:52 AM
This revision is now accepted and ready to land.Jan 15 2024, 1:41 PM