
mqueuefs: remove
Needs Review
Public

Authored by kib on May 22 2024, 10:19 PM.
Tags
None
Referenced Files
F88368108: D45305.id.diff
Sun, Jul 14, 10:41 AM

Details

Reviewers
markj
Summary
The filesystem has significant bugs, as usual related to the vnode
lifecycle and its interaction with the mqueue lifecycle. Fixing it
would take a lot of effort, which seems futile. E.g., the trivial,
immediately panicking issue fixed in f0a4dd6d46e99d47fde1 prevented
mqueuefs from being mounted at all, and was there for quite some time.

PR:     278936

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision. May 22 2024, 10:19 PM
kib retitled this revision from mqueuefs: ensure that mqfs_node is alive until unlink task completed to mqueuefs: rework node unlinking.
kib edited the summary of this revision.

Now crashing like this:

panic: Bad list head 0xffffffff80f31ec0 first->prev != head
cpuid = 2
time = 1716697553
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00da25e9a0
vpanic() at vpanic+0x13d/frame 0xfffffe00da25ead0
panic() at panic+0x43/frame 0xfffffe00da25eb30
do_unlink() at do_unlink+0x2d8/frame 0xfffffe00da25eb60
mqfs_remove() at mqfs_remove+0x59/frame 0xfffffe00da25eb90
VOP_REMOVE_APV() at VOP_REMOVE_APV+0x3a/frame 0xfffffe00da25ebb0
kern_funlinkat() at kern_funlinkat+0x473/frame 0xfffffe00da25ede0
sys_unlink() at sys_unlink+0x28/frame 0xfffffe00da25ee00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe00da25ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00da25ef30

--- syscall (10, FreeBSD ELF64, unlink), rip = 0x17dae350b52a, rsp = 0x17dae0fb24d8, rbp = 0x17dae0fb25f0 ---

KDB: enter: panic
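
For context, the "Bad list head ... first->prev != head" message appears to come from the list-head consistency check that <sys/queue.h> performs in kernels built with INVARIANTS. Below is a minimal userspace sketch of that kind of check, not the kernel macro itself; struct node, struct list_head, list_head_ok() and list_insert_head() are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LIST_ENTRY/LIST_HEAD layout. */
struct node {
    struct node *le_next;   /* next element, or NULL at the tail */
    struct node **le_prev;  /* address of the pointer that points at us */
};

struct list_head {
    struct node *lh_first;  /* first element, or NULL when empty */
};

/*
 * Consistency check: the first element's back pointer must refer to the
 * head's lh_first field. An empty list is trivially consistent.
 */
static bool
list_head_ok(struct list_head *head)
{
    return (head->lh_first == NULL ||
        head->lh_first->le_prev == &head->lh_first);
}

/* Insert elm at the front, keeping the back pointers consistent. */
static void
list_insert_head(struct list_head *head, struct node *elm)
{
    elm->le_next = head->lh_first;
    if (elm->le_next != NULL)
        elm->le_next->le_prev = &elm->le_next;
    head->lh_first = elm;
    elm->le_prev = &head->lh_first;
}
```

A head that fails this check means some earlier operation left a stale or overwritten back pointer, which is consistent with the memory corruption discussed in this review. Without such a check, the same corruption would presumably go unnoticed at this point and surface later, if at all.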

kib retitled this revision from mqueuefs: rework node unlinking to mqueuefs: remove.
kib edited the summary of this revision.

I give up

I have no particular objection to removal of the filesystem, but mqueuefs.4 and posixmqcontrol.1 need to be updated as well to remove mentions of mount points.

E.g., the trivial immediately panicking issue fixed in f0a4dd6d46e99d47fde1 prevented mqueuefs mount at all

What happens with a non-INVARIANTS kernel? Does it panic as well?

sys/kern/uipc_mqueue.c:1899

How does unloadable get set?

I have no particular objection to removal of the filesystem, but mqueuefs.4 and posixmqcontrol.1 need to be updated as well to remove mentions of mount points.

E.g., the trivial immediately panicking issue fixed in f0a4dd6d46e99d47fde1 prevented mqueuefs mount at all

What happens with a non-INVARIANTS kernel? Does it panic as well?

There is definitely a memory corruption issue, and from the report, it seems that panic is typically not triggered (immediately).

The most prominent one is the task still running after its struct task and vnode have been freed, with vhold() etc. then called on the freed memory.
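
The usual remedy for this class of bug is for the deferred task to own a reference on the object until it has finished, which is what the earlier title of this revision ("ensure that mqfs_node is alive until unlink task completed") describes. The following is a minimal single-threaded userspace sketch of that pattern, not the actual patch; struct demo_node and all demo_* functions are hypothetical, and a real kernel version would need atomic reference counts and locking.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node; this is not the real struct mqfs_node. */
struct demo_node {
    int refs;             /* outstanding references */
    bool unlink_pending;  /* a deferred unlink task is queued */
    int *freed_flag;      /* set to 1 on destruction, for the demo only */
};

static struct demo_node *
demo_node_alloc(int *freed_flag)
{
    struct demo_node *n = calloc(1, sizeof(*n));

    if (n == NULL)
        return (NULL);
    n->refs = 1;          /* the creator's reference */
    n->freed_flag = freed_flag;
    return (n);
}

static void
demo_node_hold(struct demo_node *n)
{
    n->refs++;
}

/* Drop one reference; the last drop destroys the node. */
static void
demo_node_rele(struct demo_node *n)
{
    if (--n->refs == 0) {
        *n->freed_flag = 1;
        free(n);
    }
}

/* Queue the deferred unlink: the task owns one reference until it runs. */
static void
demo_unlink_enqueue(struct demo_node *n)
{
    demo_node_hold(n);
    n->unlink_pending = true;
}

/* Task body: touching n is safe because of the reference taken above. */
static void
demo_unlink_task(struct demo_node *n)
{
    n->unlink_pending = false;
    demo_node_rele(n);    /* may free n; do not touch it afterwards */
}
```

With this ordering, the creator can drop its own reference before the task runs and the node stays allocated until demo_unlink_task() releases the last one.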

Of course, there are very serious issues with unmount.

Remove 'unloadable' variable.
Clean up man pages.

share/man/man4/mqueuefs.4:49

This man page documents the whole mqueue module, not just the filesystem component. IMO it should still be kept, with references to the filesystem removed.

In D45305#1035284, @kib wrote:

There is definitely a memory corruption issue, and from the report, it seems that panic is typically not triggered (immediately).

The panic was immediate, and not related to unmount.

We'll need a way to list mqueues in the system. Otherwise we have no way to know about them and the user may create 100 queues that root can't fix by removing them.

In D45305#1035284, @kib wrote:

There is definitely a memory corruption issue, and from the report, it seems that panic is typically not triggered (immediately).

The panic was immediate, and not related to unmount.

I mean that, besides the issue you uncovered and other issues with the interaction between vnode and mqueue lifetimes, there are also hard-to-fix unmount races.

We'll need a way to list mqueues in the system. Otherwise we have no way to know about them and the user may create 100 queues that root can't fix by removing them.

I will add list/ls to posixmqcontrol, possibly after rewriting it from scratch.

share/man/man4/mqueuefs.4:49

The documentation for the whole module is a single sentence, 'The module contains system calls to manipulate
.Tn POSIX
message queues.'

All other text is about the filesystem proper. I do not think a single-sentence man page is worth keeping.