Details

Reviewers

markj
kib

Commits

rG3e19aac2e84a: tty: delete knotes when TTY is revoked
rG012c194f39e8: tty: delete knotes when TTY is revoked
rGacd5638e268a: tty: delete knotes when TTY is revoked

Summary

The mutex used for locking the TTY also doubles as the lock for the
TTY's knlist. This becomes problematic since knlist_clear() will detach
knotes from the knlist and results in the TTY filterops being called
without locking the TTY.

To workaround this, do not clear knotes when the TTY is still active,
unless the TTY is being revoked - in that case, delete them. When a TTY
is being freed, the knotes are removed as usual.

PR: 272151

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rew created this revision.Aug 26 2023, 6:41 PM

Herald added a subscriber: imp. · View Herald TranscriptAug 26 2023, 6:41 PM

rew requested review of this revision.Aug 26 2023, 6:41 PM

Harbormaster completed remote builds in B53304: Diff 126558.Aug 26 2023, 6:42 PM

rew added reviewers: markj, kib.Aug 26 2023, 6:44 PM

update revision with missing chunk

Harbormaster completed remote builds in B53305: Diff 126560.Aug 26 2023, 8:16 PM

Can you show the backtrace where tty filter is called without tty lock, please?

In D41605#948130, @kib wrote:

Can you show the backtrace where tty filter is called without tty lock, please?

panic: mutex ttymtx not owned at /usr/src/sys/kern/tty.c:718                                                                          
cpuid = 5                                                                                                                             
time = 1693058495                                                                                                                     
KDB: stack backtrace:                                                                                                                 
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc596780                                                        
vpanic() at vpanic+0x132/frame 0xfffffe00dc5968b0                  
panic() at panic+0x43/frame 0xfffffe00dc596910                     
__mtx_assert() at __mtx_assert+0x9f/frame 0xfffffe00dc596920       
tty_kqops_read_event() at tty_kqops_read_event+0x2b/frame 0xfffffe00dc596940                                                          
kqueue_register() at kqueue_register+0x8c3/frame 0xfffffe00dc5969c0                                                                   
kqueue_kevent() at kqueue_kevent+0xdd/frame 0xfffffe00dc596c90     
kern_kevent_fp() at kern_kevent_fp+0x97/frame 0xfffffe00dc596ce0                                                                      
kern_kevent() at kern_kevent+0x82/frame 0xfffffe00dc596d40                                                                            
kern_kevent_generic() at kern_kevent_generic+0x73/frame 0xfffffe00dc596da0                                                            
sys_kevent() at sys_kevent+0x61/frame 0xfffffe00dc596e00                                                                              
amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe00dc596f30                                                                       
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00dc596f30       
--- syscall (560, FreeBSD ELF64, kevent), rip = 0x822cdd46a, rsp = 0x820896178, rbp = 0x8208961c0 ---
KDB: enter: panic

This is what I think is happening:

knlist_clear() gets called and detaches the knote from tty's knlist
sometime later sys_kevent() is called
kqueue_register() finds the detached knote on the kqueue.
kqueue_register() locks the knlist for the knote - which is a nop since the knote is detached
tty_kqops_read_event() is called without tty_lock

But then, what would guarantee liveness of the tty when its detached knote f_event method is called? For me it sounds as if we must not call f_event if the knote is detached, and perhaps should not register the knote at all?

In D41605#948243, @kib wrote:

But then, what would guarantee liveness of the tty when its detached knote f_event method is called?

I'm not aware of a guarantee - but, I'm assuming that if events are being generated, the tty must have some sort of liveness. This does not seem like a safe assumption though.

For me it sounds as if we must not call f_event if the knote is detached, and perhaps should not register the knote at all?

My reading of the code, it seemed intentional for f_event() to be called on detached knotes to allow cleaning up of the knlist and to report a final event.

With that as my understanding, I assumed the responsibility of handling detached knotes falls on the f_event() handler. Since I didn't know of a way to guarantee that the tty would still be around when a detached knote is triggered, I opted for the solution to not pass detached knotes to the tty event handlers.

What would prevent this situation for other kinds of knotes? I do not see anything that would do it, and IMO the problem is generic, not limited to ttys?

In D41605#948294, @kib wrote:

What would prevent this situation for other kinds of knotes?

It depends on what the f_event() handler is using the knlist lock for.

I do not see anything that would do it, and IMO the problem is generic, not limited to ttys?

Right..it's either a misuse of knlist_clear() or a generic problem.

If the premise is that events shouldn't be triggered for a detached knote, then knlist_clear() should be reaped.

In D41605#948294, @kib wrote:

What would prevent this situation for other kinds of knotes? I do not see anything that would do it, and IMO the problem is generic, not limited to ttys?

Most knotes hold an fd reference, which usually guarantees liveness. Consider pipes for instance: file_piperead() unconditionally locks the pipe, but knlist_clear() is not called until the pipe is being closed, so a similar race is not possible there.

When a TTY is being freed, the knotes are removed as usual.

How do they get removed? tty_dealloc() calls knlist_destroy(), but this presumes that knotes have already been dequeued.

In D41605#948445, @markj wrote:

How do they get removed? tty_dealloc() calls knlist_destroy(), but this presumes that knotes have already been dequeued.

I missed that, I thought tty_dealloc() was calling knlist_delete() in addition to knlist_destroy().

knlist_delete() should probably be getting called before knlist_destroy().

I'll update this revision to reflect that if we agree that's the fix we want.

In D41605#948453, @rew wrote:

In D41605#948445, @markj wrote:

How do they get removed? tty_dealloc() calls knlist_destroy(), but this presumes that knotes have already been dequeued.

I missed that, I thought tty_dealloc() was calling knlist_delete() in addition to knlist_destroy().

knlist_delete() should probably be getting called before knlist_destroy().

I'll update this revision to reflect that if we agree that's the fix we want.

I do think that the problem is in the TTY layer, i.e., the patch is going in the right direction. I'm not completely sure yet about the details.

Do not clear knotes from the TTY until it gets dealloc'ed unless the TTY
is being revoked, in that case delete the knotes when closed is called
on the TTY.

When knotes are cleared from a knlist, those knotes become detached from
the knlist. And when an event is triggered on a detached knote there
isn't an associated knlist and therefore no lock will be taken when the
event is triggered.

This becomes a problem when a detached knote is triggered on a TTY since
the mutex for a TTY is also used as the lock for its knlists. This
scenario ends up calling the TTY event handlers without the TTY lock
being held and tripping on the asserts in the event handlers.

Harbormaster completed remote builds in B54895: Diff 131259.Dec 11 2023, 7:13 PM

Why exactly do we need special handling for FREVOKE?

sys/kern/tty.c
394
395

In D41605#980592, @markj wrote:

Why exactly do we need special handling for FREVOKE?

I noticed that there is a small window when a revoked TTY still triggers kevents.

For example, when exiting an SSH session:

the shell exits: sys_exit() -> devfs_revoke() -> ttydev_close() /* TTY handle is not dealloc'ed since sshd still has a session on the TTY */
kevents are still triggered
sshd process with a session on the TTY is reaped

It's not strictly necessary to have special handling for FREVOKE but it felt natural to delete knotes as soon as the TTY is revoked.

In D41605#980664, @rew wrote:

In D41605#980592, @markj wrote:

Why exactly do we need special handling for FREVOKE?

I noticed that there is a small window when a revoked TTY still triggers kevents.

For example, when exiting an SSH session:

the shell exits: sys_exit() -> devfs_revoke() -> ttydev_close() /* TTY handle is not dealloc'ed since sshd still has a session on the TTY */

kevents are still triggered

sshd process with a session on the TTY is reaped

It's not strictly necessary to have special handling for FREVOKE but it felt natural to delete knotes as soon as the TTY is revoked.

FREVOKE when caused by revoke(2) must ensure that tty is fully cleared and prepared for reuse. Old knotes lingering on the tty knlist means that we leaked the state from the previous use.

kib added inline comments.Dec 12 2023, 11:56 PM