Page MenuHomeFreeBSD

sound: Allocate vchans on-demand
ClosedPublic

Authored by christos on Dec 4 2024, 4:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Oct 23, 2:41 AM
Unknown Object (File)
Fri, Oct 17, 11:49 PM
Unknown Object (File)
Mon, Oct 13, 12:29 AM
Unknown Object (File)
Sun, Oct 12, 3:18 AM
Unknown Object (File)
Mon, Sep 29, 11:49 AM
Unknown Object (File)
Mon, Sep 29, 2:35 AM
Unknown Object (File)
Sun, Sep 28, 2:02 AM
Unknown Object (File)
Fri, Sep 26, 8:53 PM
Subscribers

Details

Summary

Refactor pcm_chnalloc() and merge with parts of vchan_setnew() (now
removed) and dsp_open()’s channel creation into a new dsp_chn_alloc()
function. The function is responsible for either using a free HW channel
(if vchans are disabled), or allocating a new vchan.

Clean up allocated vchans associated with a given dsp_cdevpriv on
dsp_close() instead of leaving them unused.

hw.snd.vchans_enable (previously hw.snd.maxautovchans) and
dev.pcm.X.{play|rec}.vchans now work as tunables to only enable/disable
vchans, as opposed to setting their number and/or (de-)allocating
vchans. Since these sysctls do not trigger any (de-)allocations anymore,
their effect is instantaneous, whereas before we could have frozen the
machine (when trying to allocate new vchans) when setting
dev.pcm.X.{play|rec}.vchans to a very large value.

Create a new "primary" channel sublist so that we do not waste time
looping through all channels in dsp_chn_alloc(), since we are only
looking for a parent channel to either use, or add a new vchan to. This
guarantees a steady traversal speed, as the parent channels are most
likely going to be just a handful (2). What was currently in place was a
loop through the whole channel list, which meant that the traversal
would take longer the more channels were added to that list.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Test Plan
  • Audio tests (snd_uaudio(4): Focusrite Scarlett Solo)
    • Simplex: OK
    • Duplex: OK
    • Same vchanrate: OK
    • Same vchanrate, one direction of vchans disabled: OK
    • Same vchanrate, vchans disabled: OK
    • Different vchanrate: BAD (Most likely what's mentioned in the first paragraph of snd_uaudio.4 BUGS)
    • Different vchanrate, one direction of vchans disabled: BAD (see above)
    • Different vchanrate, vchans disabled: BAD (see above)
  • Benchmark: for i in $(seq 1 X); do cat /dev/random >/dev/dsp & done

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
share/man/man4/pcm.4
365

"overridden"

sys/dev/sound/pcm/dsp.c
205–214

Since d is not locked here, and the flags do not represent the current state of opened (v)channels, could we end up with using a busy primary channel here? Or a vchan on top of a directly opened primary channel?

361–362

Increase of the channel counters is not protected under the same mutex lock that we hold here. The comparison should at least be >=, in case multiple channels are opened at once and the total of the counters exceeds 10000.

sys/dev/sound/pcm/vchan.c
813–814

Are we allowed to clear the CHN_F_BUSY flag here? I suppose it could depend on other children's state?

Kernel panics when I play something, stop it, cat /dev/sndstat, and then try to play something again. My tree has D47868, D47917 and D47918 on top of main branch, I suspect it's this one.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 4; apic id = 04
instruction pointer     = 0x20:0xffffffff808c8d70
stack pointer           = 0x28:0xfffffe00d9971bd0
frame pointer           = 0x28:0xfffffe00d9971c00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1026 (sox)
rdi: fffff800018b8c80 rsi: 0000000000000008 rdx: deadc0dedeadc20e
rcx: deadc0dedeadc0de  r8: 0000000000000002  r9: ffffffffffffffff
rax: fffff800016b5d88 rbx: fffff80007e3bc00 rbp: fffffe00d9971c00
r10: ffffffff81c4a918 r11: 0000000000000000 r12: fffff80009404080
r13: fffff800016b5d80 r14: 0000000000000001 r15: fffff800079ac090
trap number             = 9
panic: general protection fault
cpuid = 4
time = 1734030966
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d9971910
vpanic() at vpanic+0x136/frame 0xfffffe00d9971a40
panic() at panic+0x43/frame 0xfffffe00d9971aa0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d9971b00
calltrap() at calltrap+0x8/frame 0xfffffe00d9971b00
--- trap 0x9, rip = 0xffffffff808c8d70, rsp = 0xfffffe00d9971bd0, rbp = 0xfffffe00d9971c00 ---
chn_trigger() at chn_trigger+0x150/frame 0xfffffe00d9971c00
chn_write() at chn_write+0x164/frame 0xfffffe00d9971c60
dsp_io_ops() at dsp_io_ops+0x3ae/frame 0xfffffe00d9971cc0
dsp_write() at dsp_write+0x22/frame 0xfffffe00d9971ce0
devfs_write_f() at devfs_write_f+0xf3/frame 0xfffffe00d9971d40
dofilewrite() at dofilewrite+0x81/frame 0xfffffe00d9971d90
sys_write() at sys_write+0xb7/frame 0xfffffe00d9971e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe00d9971f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d9971f30
--- syscall (4, FreeBSD ELF64, write), rip = 0x82e2b3d3a, rsp = 0x820d384c8, rbp = 0x820d384f0 ---
KDB: enter: panic
[ thread pid 1026 tid 100293 ]
Stopped at      kdb_enter+0x33: movq    $0,0x1055632(%rip)

Kernel panics when I play something, stop it, cat /dev/sndstat, and then try to play something again. My tree has D47868, D47917 and D47918 on top of main branch, I suspect it's this one.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 4; apic id = 04
instruction pointer     = 0x20:0xffffffff808c8d70
stack pointer           = 0x28:0xfffffe00d9971bd0
frame pointer           = 0x28:0xfffffe00d9971c00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1026 (sox)
rdi: fffff800018b8c80 rsi: 0000000000000008 rdx: deadc0dedeadc20e
rcx: deadc0dedeadc0de  r8: 0000000000000002  r9: ffffffffffffffff
rax: fffff800016b5d88 rbx: fffff80007e3bc00 rbp: fffffe00d9971c00
r10: ffffffff81c4a918 r11: 0000000000000000 r12: fffff80009404080
r13: fffff800016b5d80 r14: 0000000000000001 r15: fffff800079ac090
trap number             = 9
panic: general protection fault
cpuid = 4
time = 1734030966
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d9971910
vpanic() at vpanic+0x136/frame 0xfffffe00d9971a40
panic() at panic+0x43/frame 0xfffffe00d9971aa0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d9971b00
calltrap() at calltrap+0x8/frame 0xfffffe00d9971b00
--- trap 0x9, rip = 0xffffffff808c8d70, rsp = 0xfffffe00d9971bd0, rbp = 0xfffffe00d9971c00 ---
chn_trigger() at chn_trigger+0x150/frame 0xfffffe00d9971c00
chn_write() at chn_write+0x164/frame 0xfffffe00d9971c60
dsp_io_ops() at dsp_io_ops+0x3ae/frame 0xfffffe00d9971cc0
dsp_write() at dsp_write+0x22/frame 0xfffffe00d9971ce0
devfs_write_f() at devfs_write_f+0xf3/frame 0xfffffe00d9971d40
dofilewrite() at dofilewrite+0x81/frame 0xfffffe00d9971d90
sys_write() at sys_write+0xb7/frame 0xfffffe00d9971e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe00d9971f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d9971f30
--- syscall (4, FreeBSD ELF64, write), rip = 0x82e2b3d3a, rsp = 0x820d384c8, rbp = 0x820d384f0 ---
KDB: enter: panic
[ thread pid 1026 tid 100293 ]
Stopped at      kdb_enter+0x33: movq    $0,0x1055632(%rip)

Interesting, I am not able to reproduce this. Does this happen with a different player than sox as well? Not sure how this patch could be responsible for this, though. All it does is essentially allocate vchans on open() and delete them on close(), instead of keeping them around after close().

sys/dev/sound/pcm/dsp.c
205–214

d is in fact "locked". SD_F_BUSY is set, which acts as a kind of sleepable lock. Also which flags are you referring to? The PCM or the channel ones?

sys/dev/sound/pcm/vchan.c
813–814

The reason I am clearing it here, is because I see that CHN_F_HAS_VCHAN is also cleared here. The fail label is only reached if a call to chn_reset() or chn_getcaps() on the parent fails. So I guess this code is reached if the parent is (for lack of a better word) unusable in general.

Kernel panics when I play something, stop it, cat /dev/sndstat, and then try to play something again. My tree has D47868, D47917 and D47918 on top of main branch, I suspect it's this one.

Fatal trap 9: general protection fault while in kernel mode
...

Interesting, I am not able to reproduce this. Does this happen with a different player than sox as well? Not sure how this patch could be responsible for this, though. All it does is essentially allocate vchans on open() and delete them on close(), instead of keeping them around after close().

I now isolated the panic to this patch (on top of D47868 which is required to build). It also happens with cat /dev/zero > /dev/dsp instead of sox. The cat /dev/sndstat is not needed to reproduce it. I did a full system build / install with and without this patch, consistent results.

sys/dev/sound/pcm/dsp.c
205–214

I meant the d->flags, but you're right, they are protected through SD_F_BUSY. What I apparently can do is to enable vchans with sysctl dev.pcm.0.play.vchans = 1 while playing on a primary channel, and then open a vchan for playback on top of that. It leads to a timeout and inconsistent state afterwards, not good.

I think there's a conceptual problem here with the flags not representing the actual state of channels running. We have to know whether the primary channel is used directly or only for vchans.

sys/dev/sound/pcm/vchan.c
813–814

I wouldn't equate a chn_reset() error to the parent being completely unusable. And even if that's true, we take these flags as indicator for vchans, it's inconsistent to unset them without checking for other vchans first. See vchan_destroy() below, maybe we can call that or reuse the code there?

FYI, the panic also happens on close:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff808c8dd0
stack pointer           = 0x28:0xfffffe00d974b600
frame pointer           = 0x28:0xfffffe00d974b630
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1059 (cat)
rdi: fffff800017d5fa0 rsi: 0000000000000008 rdx: deadc0dedeadc20e
rcx: deadc0dedeadc0de  r8: 0000000000000002  r9: ffffffffffffffff
rax: fffff80001843d88 rbx: fffff800017d0800 rbp: fffffe00d974b630
r10: ffffffff81c4a450 r11: 0000000000000000 r12: fffffe00ddd4cc00
r13: fffff80001843d80 r14: 00000000ffffffff r15: fffff800017cf0c0
trap number             = 9
panic: general protection fault
cpuid = 2
time = 1734349393
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d974b340
vpanic() at vpanic+0x136/frame 0xfffffe00d974b470
panic() at panic+0x43/frame 0xfffffe00d974b4d0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d974b530
calltrap() at calltrap+0x8/frame 0xfffffe00d974b530
--- trap 0x9, rip = 0xffffffff808c8dd0, rsp = 0xfffffe00d974b600, rbp = 0xfffffe00d974b630 ---
chn_trigger() at chn_trigger+0x1b0/frame 0xfffffe00d974b630
chn_flush() at chn_flush+0x54/frame 0xfffffe00d974b650
dsp_close() at dsp_close+0x3d6/frame 0xfffffe00d974b6a0
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0xab/frame 0xfffffe00d974b6c0
devfs_close_f() at devfs_close_f+0x63/frame 0xfffffe00d974b6f0
_fdrop() at _fdrop+0x1b/frame 0xfffffe00d974b710
closef() at closef+0x1e3/frame 0xfffffe00d974b7a0
fdescfree() at fdescfree+0x41e/frame 0xfffffe00d974b860
exit1() at exit1+0x49d/frame 0xfffffe00d974b8d0
sigexit() at sigexit+0x16b/frame 0xfffffe00d974bd50
postsig() at postsig+0x1a5/frame 0xfffffe00d974be20
ast_sig() at ast_sig+0x1bb/frame 0xfffffe00d974bed0
ast_handler() at ast_handler+0xe8/frame 0xfffffe00d974bf10
ast() at ast+0x20/frame 0xfffffe00d974bf30
doreti_ast() at doreti_ast+0x1c/frame 0x3076a6210ba0
KDB: enter: panic
[ thread pid 1059 tid 100269 ]
Stopped at      kdb_enter+0x33: movq    $0,0x10554b2(%rip)

Kernel panics when I play something, stop it, cat /dev/sndstat, and then try to play something again. My tree has D47868, D47917 and D47918 on top of main branch, I suspect it's this one.

Fatal trap 9: general protection fault while in kernel mode
...

Interesting, I am not able to reproduce this. Does this happen with a different player than sox as well? Not sure how this patch could be responsible for this, though. All it does is essentially allocate vchans on open() and delete them on close(), instead of keeping them around after close().

I now isolated the panic to this patch (on top of D47868 which is required to build). It also happens with cat /dev/zero > /dev/dsp instead of sox. The cat /dev/sndstat is not needed to reproduce it. I did a full system build / install with and without this patch, consistent results.

I applied just D47868 and this one and still cannot reproduce the panic... Tried all the tests I have in the test section here, as well as what you mentioned, with both vchans enabled or disabled, including bitperfect. Judging from your backtraces, it seems that the panics most likely occur at the CHN_INSERT_HEAD_SAFE()/CHN_REMOVE_SAFE() calls. Is there something worth noting in your setup/test environment?

I will also work on addressing the issues in the inline comments.

Kernel panics when I play something, stop it, cat /dev/sndstat, and then try to play something again. My tree has D47868, D47917 and D47918 on top of main branch, I suspect it's this one.

Fatal trap 9: general protection fault while in kernel mode
...

Interesting, I am not able to reproduce this. Does this happen with a different player than sox as well? Not sure how this patch could be responsible for this, though. All it does is essentially allocate vchans on open() and delete them on close(), instead of keeping them around after close().

I now isolated the panic to this patch (on top of D47868 which is required to build). It also happens with cat /dev/zero > /dev/dsp instead of sox. The cat /dev/sndstat is not needed to reproduce it. I did a full system build / install with and without this patch, consistent results.

I applied just D47868 and this one and still cannot reproduce the panic... Tried all the tests I have in the test section here, as well as what you mentioned, with both vchans enabled or disabled, including bitperfect. Judging from your backtraces, it seems that the panics most likely occur at the CHN_INSERT_HEAD_SAFE()/CHN_REMOVE_SAFE() calls. Is there something worth noting in your setup/test environment?

I will also work on addressing the issues in the inline comments.

Nothing special, really. Just a bhyve VM with CURRENT, a USB card passthrough for the single snd_uaudio device. No non-default settings, I only use this setup for testing.

I had a look at a core dump, with the playback failing at first attempt. There seems to be a garbage pointer at the head of the channels.pcm.busy list. Obviously CHN_INSERT_HEAD_SAFE() trips over that. The pointer references 0xdeadc0dedeadc0de memory, I suppose that's already freed memory?

Ok, I did some more debugging, my current theory is this:

  1. The first (and only) playback vchan is closed while running, we call vchan_destroy().
  2. vchan_destroy() removes the vchan from its parent's children list.
  3. vchan_destroy() calls chn_kill(), which in turn calls chn_trigger(ABORT).
  4. chn_trigger(ABORT) calls vchan_trigger() through the object method.
  5. vchan_trigger() calls chn_notify() on the parent.
  6. chn_notify() fails with ENODEV because the parent's children list is empty now.
  7. Thus vchan_trigger() also fails, ENODEV.
  8. chn_trigger(ABORT) fails early and doesn't remove the vchan from the channels.pcm.busy list.
  9. chn_kill() frees the vchan data, leaving a stale entry in channels.pcm.busy.
  10. Next occasion to traverse channels.pcm.busy list -> panic!

So a precondition of this panic would be that there's one vchan open which gets closed abruptly, instead of being stopped. This seems to hold in my experiments, as I cannot trigger the panic while keeping a vchan open.

Ok, I did some more debugging, my current theory is this:

  1. The first (and only) playback vchan is closed while running, we call vchan_destroy().
  2. vchan_destroy() removes the vchan from its parent's children list.
  3. vchan_destroy() calls chn_kill(), which in turn calls chn_trigger(ABORT).
  4. chn_trigger(ABORT) calls vchan_trigger() through the object method.
  5. vchan_trigger() calls chn_notify() on the parent.
  6. chn_notify() fails with ENODEV because the parent's children list is empty now.
  7. Thus vchan_trigger() also fails, ENODEV.
  8. chn_trigger(ABORT) fails early and doesn't remove the vchan from the channels.pcm.busy list.
  9. chn_kill() frees the vchan data, leaving a stale entry in channels.pcm.busy.
  10. Next occasion to traverse channels.pcm.busy list -> panic!

So a precondition of this panic would be that there's one vchan open which gets closed abruptly, instead of being stopped. This seems to hold in my experiments, as I cannot trigger the panic while keeping a vchan open.

Ok, tested with chn_notify() returning (0) instead of ENODEV. Whether that's the proper fix is open for discussion, but it is a fix. That was a nice puzzle.

Interesting. Does calling chn_abort() on the vchan in dsp_close() before the call to vchan_destroy() fix this?

Interesting. Does calling chn_abort() on the vchan in dsp_close() before the call to vchan_destroy() fix this?

Probably, but to me that would fix the symptoms, not the cause. I think chn_notify() shouldn't fail with an error there, the primary channel having no children is not a real problem - it just means there is nothing to do (no-one to notify). Also maybe an error from vchan_trigger() should be handled differently in chn_trigger(), regarding the busy list.

Interesting. Does calling chn_abort() on the vchan in dsp_close() before the call to vchan_destroy() fix this?

Probably, but to me that would fix the symptoms, not the cause. I think chn_notify() shouldn't fail with an error there, the primary channel having no children is not a real problem - it just means there is nothing to do (no-one to notify). Also maybe an error from vchan_trigger() should be handled differently in chn_trigger(), regarding the busy list.

Yeah, I agree that chn_notify() should return 0 if the children list is empty. My idea was that it'd make sense to first stop the channel, then destroy it. I think we should go with the chn_notify() change instead, though. D48156

My bad. Do not take this version as the finalized one.

christos marked 3 inline comments as done.

Make sure that in case we choose a primary channel to allocate a vchan, it's
not being used directly.

Tested with:

  • vchans enabled, spawn vchan(s): OK
  • vchans enabled, spawn vchan, disable vchans, spawn chan: OK (ENODEV, we need to close the vchan first)
  • vchans disabled, spawn chan: OK
  • vchans disabled, spawn chan, spawn another chan: OK (EBUSY)
  • vchans disabled, spawn chan, enable vchans, spawn vchan: OK (EBUSY)

Also depend on D48183.

sys/dev/sound/pcm/dsp.c
205–214

Do you think the new diff addresses the issue properly? The bug you mention is fixed.

sys/dev/sound/pcm/vchan.c
813–814
sys/dev/sound/pcm/dsp.c
189–190

Since we do not take vchan enabled / disabled into account here, this may select a primary channel that is incompatible with what we need. While this just defers the error in case of one primary channel per direction, it may prevent choosing another (compatible) one if there are multiple primary channels per direction.

202–203

We probably need this condition in the loop too, I'd suggest a variable like bool vchan_enabled = ... for readability.

434–438

What happens in duplex mode when only the recording dsp_chn_alloc() call fails? Is dsp_close() actually called when dsp_open() fails? And if that's the case, we have an inconsistency between priv->wrch and the channels.pcm.opened list. We call CHN_REMOVE() there...

sys/dev/sound/pcm/vchan.c
813–816

We may want to explicitly set *child = NULL; here, to not leave any dangling pointers, even if we also return an error value.

christos added inline comments.
sys/dev/sound/pcm/dsp.c
189–190

I included the vchan enabled/disabled check here as well, as suggested in your comment below. However, if vchans are disabled for a given direction, would having >1 primary channels in that direction make a difference in the first place?

434–438

So, dsp_close() will be called, but because priv->wrch will not have yet been added to the opened list, the CHN_REMOVE() will most likely cause a problem. I think we could retain the wrch and rdch variables, and assign them to priv->wrch and priv->rdch respectively, once we get past the dsp_chn_alloc()s.

sys/dev/sound/pcm/dsp.c
189–190

Now I'm confused - I only see the c->flags & CHN_F_HAS_VCHAN which checks for the presence of vchans, not whether they are enabled? Or do you mean you included these checks locally but didn't update the diff here yet?

And yes, if vchans are disabled, with a vchan still running on the first playback channel, this first playback channel would always be selected because CHN_F_HAS_VCHAN is set. Despite the second playback channel possibly being free to use, with CHN_F_BUSY not set.

sys/dev/sound/pcm/dsp.c
189–190

Sorry, I meant that I applied the change locally. This should be enough IMHO.

christos marked 4 inline comments as done.

Address all remaining comments.

sys/dev/sound/pcm/dsp.c
194–196

Shouldn't this also break out of the loop when vdir_enabled is not set? I think the logic order is mixed up, did you mean (c->flags & CHN_F_BUSY) == 0 || (vdir_enabled && c->flags & CHN_F_HAS_VCHAN)?

434–450

I think we still have an issue here, when allocation of the playback channel succeeds, but recording channel fails. We're basically "leaking" the playback channel now, with no chance for dsp_close() to clean up the channels because it's lacking info about the allocation.

I would suggest to treat the cases for playback and recording separately, handle allocation success or failure independently per direction. Complete the registration of e.g. playback channel in the open list, even if the subsequent recording channel allocation fails. If the data is consistent, we can let dsp_close() do the cleanup afterwards.

christos marked an inline comment as done.

Address comments.

sys/dev/sound/pcm/dsp.c
194–196

Yes, I noticed this right after I submitted. I meant to write

		if ((c->flags & CHN_F_BUSY) == 0 ||
		    (vdir_enabled && (c->flags & CHN_F_HAS_VCHAN)))
434–450

Good idea.

I guess that's because of D47917, which assigns vchanrate to VCHAN_DEFAULT_RATE (48000), during attach in pcm_init(). Perhaps we should initialize these values once the primary channels have been created, and use their rates instead of defaulting to VCHAN_DEFAULT_RATE. I suppose the same should apply to the format.

Original discussion on D48336. Apart from that I think we're done, at least from my side.

I guess that's because of D47917, which assigns vchanrate to VCHAN_DEFAULT_RATE (48000), during attach in pcm_init(). Perhaps we should initialize these values once the primary channels have been created, and use their rates instead of defaulting to VCHAN_DEFAULT_RATE. I suppose the same should apply to the format.

Original discussion on D48336. Apart from that I think we're done, at least from my side.

Do you agree with this approach? The patch will need some refactoring but I think it should work.

I guess that's because of D47917, which assigns vchanrate to VCHAN_DEFAULT_RATE (48000), during attach in pcm_init(). Perhaps we should initialize these values once the primary channels have been created, and use their rates instead of defaulting to VCHAN_DEFAULT_RATE. I suppose the same should apply to the format.

Original discussion on D48336. Apart from that I think we're done, at least from my side.

Do you agree with this approach? The patch will need some refactoring but I think it should work.

Yes, at least for the vchanrate. We have to see about the vchanformat, some of the conversion feeders like to work with S16_NE or S32_NE, and the hardware could be S24_NE for example. That could introduce some unnecessary conversion stages, since applications use mostly 16bit or 32bit samples.

christos marked an inline comment as done.

Adapt vchanrates to driver rate after creating the primary channels.

Remove pcm_init assignments.

Unfortunately the last two changes leave the vchanformat uninitialized until the channel's first use:

<flo@current:~> sysctl dev.pcm.0.rec                                                            20:24:30 [0]
dev.pcm.0.rec.vchanformat:
dev.pcm.0.rec.vchanrate: 44100
dev.pcm.0.rec.vchanmode: fixed
dev.pcm.0.rec.vchans: 1

Also cat /dev/sndstat shows a feeder chain for 8000Hz mono playback / recording instead of what is shown in the sysctls. Isn't there an init function for this, like chn_reset() or whatever is called on the first use? Hopefully that would give us the same behavior as prior to this patch, where some initialization was done when the spare vchans were created.

If we want to have more sensible defaults for vchanformat and vchanrate, I'd prefer to do that in separate change. This commit is already quite extensive.

Unfortunately the last two changes leave the vchanformat uninitialized until the channel's first use:

<flo@current:~> sysctl dev.pcm.0.rec                                                            20:24:30 [0]
dev.pcm.0.rec.vchanformat:
dev.pcm.0.rec.vchanrate: 44100
dev.pcm.0.rec.vchanmode: fixed
dev.pcm.0.rec.vchans: 1

I know. I've been trying to think of a clean way to do address this. I actually do not like the idea of initializing vchanrate in pcm_addchan(), and then vchanformat somewhere else. It makes the code quite messy and harder to follow. This will go away.

Also cat /dev/sndstat shows a feeder chain for 8000Hz mono playback / recording instead of what is shown in the sysctls.

That's because sndstat first reports the software rate (see sndstat_prepare_pcm()), hence the 8000. Whereas what I did was fetch the hardware rate with sndbuf_getspd(c->bufhard).

Isn't there an init function for this, like chn_reset() or whatever is called on the first use? Hopefully that would give us the same behavior as prior to this patch, where some initialization was done when the spare vchans were created.

So how vchanrate and vchanformat were initialized before is in vchan_create(). It issues a chn_reset() to the parent channel with vchanspd and vchanfmt as parameters, which default to VCHAN_DEFAULT_RATE and VCHAN_DEFAULT_FORMAT, and then sets vchanrate and vchanformat to whatever chn_reset() set parent->speed and parent->format to.

I think what we could do is, initialize channels with VCHAN_DEFAULT_RATE and VCHAN_DEFAULT_FORMAT in chn_init(), and set vchanrate and vchanformat after the chn_reset() call in chn_init(). I still do not really like this approach since we mix normal channel and vchan code, but I think it might be the lesser evil for now. An advantage of this is also that we can avoid the additional chn_reset() when opening the channel, because we are initializing the channels with a sane default rate and format. It's very unlikely that someone would actually open the channel with AFMT_U8 at 8000Hz, which is what we do now in chn_init().

If we want to have more sensible defaults for vchanformat and vchanrate, I'd prefer to do that in separate change. This commit is already quite extensive.

I agree.

I think what we could do is, initialize channels with VCHAN_DEFAULT_RATE and VCHAN_DEFAULT_FORMAT in chn_init(), and set vchanrate and vchanformat after the chn_reset() call in chn_init(). I still do not really like this approach since we mix normal channel and vchan code, but I think it might be the lesser evil for now. An advantage of this is also that we can avoid the additional chn_reset() when opening the channel, because we are initializing the channels with a sane default rate and format. It's very unlikely that someone would actually open the channel with AFMT_U8 at 8000Hz, which is what we do now in chn_init().

Hmm.. thinking a bit more about this, I think there is a design problem here. As mentioned in the previous comment, vchanrate and vchanformat are initialized in vchan_create():

	if (vchanfmt == 0) {
                /* fetch vchanformat from hints or use default */
	}

	if (vchanspd == 0) {
                /* fetch vchanrate from hints or use default */
	}

The only case where these conditions are currently true is when the first vchan is created. Apart from the non-obvious logic, this functionality is also not related to creating a vchan. That code should live in vchan_initsys(). Another problem is that if we keep this initialization code in vchan_create(), then the approach I am proposing will make that code essentially a no-op, which will make the vchanrate and vchanformat device hints useless, since they are handled there.

I played around with moving the code to vchan_initsys() (without this patch), but the only problem now is that this code is executed after the vchans have been created, because vchan_initsys() is called after vchan_setmaxauto() in pcm_register(). In other words, we'd create vchans with uninitialized vchanrate and vchanformat... This call sequence in pcm_register() is also a design issue IMHO.

I'm still a bit conflicted as to what approach is the best for this patch specifically. It's probably better not to over-complicate this patch, and instead either leave the vchanrate and vchanformat sysctls uninitialized until the first vchan is created, or just hardcode some value.
@dev_submerge.ch What do you think?

I'm still a bit conflicted as to what approach is the best for this patch specifically. It's probably better not to over-complicate this patch, and instead either leave the vchanrate and vchanformat sysctls uninitialized until the first vchan is created, or just hardcode some value.
@dev_submerge.ch What do you think?

While the practical consequences of this issue are somewhat minor, what bothers me is the disconnectedness of sysctl settings and current state. We should either make the vchan* sysctls reflect the current software format of the primary channel. Or, as an alternative, make the vchan* settings a template for newly created vchans, independent of the current primary channel setup. It doesn't help to decide here that vchanrate only accepts hardware sample rates now, whereas the vchanformat can be chosen freely.

That being said, I'm fine with postponing these fixes, and I would recommend to merge this patch to get it exposed to more real-world testing. It's quite a fundamental change in behavior, but I'm done with reviewing from my side. Please take note of the issues I mentioned so we can tackle them in a future change.

This revision is now accepted and ready to land.Jan 18 2025, 7:54 PM
share/man/man4/pcm.4
363–368

@emaste @markj Is this more appropriate for RELNOTES or UPDATING? This patch will probably be MFC'd to stable/14 along with the other patches.

share/man/man4/pcm.4
363–368

RELNOTES makes more sense. UPDATING is for documenting issues that specifically affect users upgrading from source, as opposed to using freebsd-update, say.

I'm still a bit conflicted as to what approach is the best for this patch specifically. It's probably better not to over-complicate this patch, and instead either leave the vchanrate and vchanformat sysctls uninitialized until the first vchan is created, or just hardcode some value.
@dev_submerge.ch What do you think?

While the practical consequences of this issue are somewhat minor, what bothers me is the disconnectedness of sysctl settings and current state. We should either make the vchan* sysctls reflect the current software format of the primary channel. Or, as an alternative, make the vchan* settings a template for newly created vchans, independent of the current primary channel setup. It doesn't help to decide here that vchanrate only accepts hardware sample rates now, whereas the vchanformat can be chosen freely.

That being said, I'm fine with postponing these fixes, and I would recommend to merge this patch to get it exposed to more real-world testing. It's quite a fundamental change in behavior, but I'm done with reviewing from my side. Please take note of the issues I mentioned so we can tackle them in a future change.

D48961

This revision was automatically updated to reflect the committed changes.