Details

Reviewers

markj
emaste
dev_submerge.ch

Summary

Even though the OSS manual [1] advises against using AFMT_FLOAT, there
are applications that expect the sound driver to support it, and might
not work properly without it.

This patch adds AFMT_F32_LE|BE (as well as AFMT_FLOAT for OSS
compatibility) in sys/soundcard.h and implements AFMT_F32_LE|BE <->
AFMT_S32_LE|BE conversion functions. As a result, applications can
write/read floats to/from sound(4), but internally, because sound(4)
works with integers, we convert floating point samples to integer ones,
before doing any processing.

[1] http://manuals.opensound.com/developer/AFMT_FLOAT.html

PR: 157050, 184380, 264973, 280612, 281390
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Test Plan

Apply D47639 to test sound.

Below is a benchmark for AFMT_FLOAT operations in pcm_sample_read() and pcm_sample_write(). The results are in cycles, and are measured by adding 2 rdtscp() calls in pcm_sample_read() and pcm_sample_write() respectively - one at the start and one at the end, and calculating the difference.

The test program is a simple loopback program that reads and writes zeroes 512 times:

#include <sys/soundcard.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
        uint32_t sample;
        int fd, fmt, chans, rate, n;

        if ((fd = open(argv[1], O_RDWR)) < 0)
                err(1, "open(%s)", argv[1]);

        chans = 1;
        if (ioctl(fd, SNDCTL_DSP_CHANNELS, &chans) < 0)
                err(1, "ioctl(SNDCTL_DSP_CHANNELS)");

        fmt = AFMT_FLOAT;
        if (ioctl(fd, SNDCTL_DSP_SETFMT, &fmt) < 0)
                err(1, "ioctl(SNDCTL_DSP_SETFMT)");

        rate = 48000;
        if (ioctl(fd, SNDCTL_DSP_SPEED, &rate) < 0)
                err(1, "ioctl(SNDCTL_DSP_SPEED)");

        for (n = 512; n--;) {
                sample = 0;
                if (read(fd, &sample, sizeof(sample)) < 0)
                        err(1, "read");
                if (write(fd, &sample, sizeof(sample)) < 0)
                        err(1, "write");
        }

        close(fd);
}

The sound card used is a Focusrite Scarlett Solo (snd_uaudio(4)), with VCHANs disabled to simplify the test.

function	cycles
`pcm_sample_read(AFMT_S32_LE)`	330.806
`pcm_sample_write(AFMT_S32_LE)`	183.298
`pcm_sample_read(AFMT_F32_LE)`	3246.6
`pcm_sample_write(AFMT_F32_LE)`	1893.8

The AFMT_FLOAT operations seem to be ~10x more expensive.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 61916
Build 58800: arc lint + arc unit

Event Timeline

christos created this revision.Nov 16 2024, 6:15 PM

Herald added a subscriber: imp. · View Herald TranscriptNov 16 2024, 6:15 PM

christos requested review of this revision.Nov 16 2024, 6:15 PM

Harbormaster completed remote builds in B60632: Diff 146593.Nov 16 2024, 6:15 PM

christos added a child revision: D47639: beep(1): Use AFMT_FLOAT.Nov 16 2024, 6:19 PM

Hello,

i can confirm so far that it works fine with emulators/wine.

I did took 2 FMOD games for testing (Way of the Hunter and Roboquest) that required AFMT_FLOAT.

Thank you for your work!

emaste added inline comments.Nov 16 2024, 9:31 PM

sys/dev/sound/pcm/feeder_format.c
68 ↗	(On Diff #146593)	I'm a bit worried about a possible performance impact, and we might need to hoist the `fpu_kern_enter` / `fpu_kern_leaave` up a level, but we can profile it and see what the impact actually is.

emaste mentioned this in D47639: beep(1): Use AFMT_FLOAT.Nov 16 2024, 9:34 PM

agh_riseup.net added a subscriber: agh_riseup.net.Nov 17 2024, 1:19 AM

Very welcome, thanks for tackling this @christos. Please note that there are sound cards supporting float format, and I intend to experiment with that some time in the future (probably needs bitperfect for now).

sys/dev/sound/pcm/feeder_format.c
68 ↗	(On Diff #146593)	I'm a bit worried about a possible performance impact, and we might need to hoist the `fpu_kern_enter` / `fpu_kern_leaave` up a level, but we can profile it and see what the impact actually is. +1. Also the way this function is called through a function pointer for every sample seems quite inefficient, at least I don't see how it could be inlined there. Means a restructuring could help all format conversions, not just float. Although I see moving format conversions out to userland as the better option in the middle to long term, anyway.

christos added inline comments.Nov 18 2024, 7:47 PM

sys/dev/sound/pcm/feeder_format.c
68 ↗	(On Diff #146593)	I will second Florian's idea about optimizing the conversion framework in general, as long as we keep doing this in the kernel and not userland. For the fpu_kern(9) functions specifically, I think they could be moved around the `do-while` in `feed_format_feed()`, as Ed suggested in private, although I haven't tested this yet. This will make the code a bit uglier, because we'll have to check for `AFMT_FLOAT` in `feed_format_feed()` so that we can call `fpu_kern_enter()/fpu_kern_leave()`, but it will most likely give us a good performance boost.

christos added inline comments.Nov 18 2024, 7:49 PM

sys/dev/sound/pcm/feeder_format.c
68 ↗	(On Diff #146593)	Also I am wondering whether we need to care about endianness.

dev_submerge.ch added inline comments.Nov 19 2024, 12:43 AM

sys/dev/sound/pcm/feeder_format.c
68 ↗	(On Diff #146593)	Also I am wondering whether we need to care about endianness. Good point. In principle there's `AFMT_FLOAT_BE` and `AFMT_FLOAT_LE`, and what we currently support here as `AFMT_FLOAT` is actually `AFMT_FLOAT_NE`. I don't think it matters with audio data from userland, that should always be native endian. But it would matter when going to a sound card, e.g. a RME HDSPe card in a PowerPC big endian machine. I suppose we should have a closer look. For now we could at least define the different endianness formats for floats, similar to the integer formats.
68 ↗	(On Diff #146593)	I will second Florian's idea about optimizing the conversion framework in general, as long as we keep doing this in the kernel and not userland. The remark on doing conversion in userland was meant as a perspective on how much effort we should invest here, not as a concrete suggestion - sorry for the confusion. For the fpu_kern(9) functions specifically, I think they could be moved around the `do-while` in `feed_format_feed()`, as Ed suggested in private, although I haven't tested this yet. This will make the code a bit uglier, because we'll have to check for `AFMT_FLOAT` in `feed_format_feed()` so that we can call `fpu_kern_enter()/fpu_kern_leave()`, but it will most likely give us a good performance boost. If my assumptions about the code are correct, we do two function calls in `feed_format_feed()` (read and write) for each sample, with no inlining at all. That would be terribly inefficient, for all formats. Ideally we could specialize the whole read-write-loop instead of just the read / write functions, thereby speed up all format conversions and include the `fpu_kern_enter()/fpu_kern_leave()` in the float specializations. Unfortunately, this creates a whole lot of specializations for all format combinations. Does that make sense? What is a good way to measure this in the kernel?

Depend on D47932.
Implement little and big endian support.
Make AFMT_FLOAT usable from dev.pcm.X.play|rec.vchanformat.

Harbormaster completed remote builds in B60970: Diff 147533.Dec 5 2024, 5:59 PM

christos added a parent revision: D47932: sound: Refactor the format conversion framework.Dec 5 2024, 6:00 PM

christos added inline comments.

sys/dev/sound/pcm/feeder_format.c
68 ↗	(On Diff #146593)	I think we can move the discussion to D47932. I cleaned up the framework quite a lot IMHO so we can work on it easier now.

Remove early return in pcm_sample_read().

Harbormaster completed remote builds in B60971: Diff 147534.Dec 5 2024, 6:05 PM

christos edited the summary of this revision. (Show Details)Dec 5 2024, 6:35 PM

christos edited the test plan for this revision. (Show Details)Dec 5 2024, 6:37 PM

Bump.

meka_tilda.center added a subscriber: meka_tilda.center.Dec 10 2024, 3:19 PM

Fix shifting error I accidentally introduced while rewriting some parts.

Harbormaster completed remote builds in B61080: Diff 147809.Dec 11 2024, 10:07 AM

Use __bswap32() instead of doing the LE <-> BE conversion manually.

As per kevans@ and imp@ on IRC, since v is an int32_t, it shouldn't be a
problem that __bswap32() expects and returns a uint32_t, because the return
value is implicitly casted.

Harbormaster completed remote builds in B61084: Diff 147814.Dec 11 2024, 2:36 PM

imp added inline comments.Dec 11 2024, 3:48 PM

sys/dev/sound/pcm/feeder_format.c
141 ↗	(On Diff #147814)	Why not just bswap32?
258 ↗	(On Diff #147814)	Bswap32?
300 ↗	(On Diff #147814)	Bswap32?

I have a hard time to apply these interdependent patches cleanly to my tree. @christos, do you update all affected patches in the series when doing a rebase?

sys/dev/sound/pcm/feeder_format.c
141 ↗	(On Diff #147814)	There's discussion relevant to this in D47932.

dev_submerge.ch mentioned this in D48036: sound: Remove feed_matrix_apply_generic().Dec 15 2024, 11:35 PM

christos edited the test plan for this revision. (Show Details)Tue, Jan 7, 6:39 PM

christos edited the test plan for this revision. (Show Details)Wed, Jan 8, 4:35 PM

Make implementation alignment-agnostic.
Add support in feeder_chain.c, feeder_rate.c and feeder_volume.c.
Rebase to work with D47932.
Define AFMT_F32_OE.
Guard fpu_kern(9) calls with ifdef _KERNEL so that the unit test can run.

I think the handling in pcm_sample_read() is quite ugly. Maybe it can be
simplified a bit.

Also, according to the benchmarks I provided, it seems that the fpu_kern(9)
calls do introduce a significant overhead, but I am not sure whether that's an amount worth caring about. Perhaps someone can provide more feedback.
It might be worth trying to do these operations using only integers.

I will also add tests to the unit test implemented in D48330.

Harbormaster completed remote builds in B61576: Diff 148936.Wed, Jan 8, 4:39 PM

christos edited the test plan for this revision. (Show Details)Thu, Jan 9, 12:38 PM

christos added a child revision: D48394: sound: Simplify pcm/feeder_mixer.c.Thu, Jan 9, 2:37 PM

christos added a parent revision: D48421: sound: Turn PCM_CLAMP_* macros into a function.Fri, Jan 10, 6:39 PM

Rebase on top of D48421.

Harbormaster completed remote builds in B61655: Diff 149085.Fri, Jan 10, 6:42 PM

Use __DECONST to silence const warning in pcm_sample_read(). A bit ugly...

Harbormaster completed remote builds in B61762: Diff 149340.Thu, Jan 16, 2:37 PM

Fix bugs.
Write unit tests.
Re-arrange fpu_kern(9) calls.

There are a few sound tests I haven't done yet, but I am posting as a WIP in
case someone wants to leave a comment.

Harbormaster completed remote builds in B61916: Diff 149763.Thu, Jan 23, 3:19 PM

christos added inline comments.Thu, Jan 23, 3:19 PM

sys/dev/sound/pcm/pcm.h
217	Fixed.

Fix big endian regressions introduced in last diff.

I tried recording and playback with real audio (both LE and BE) and things seem
to work fine.

@dev_submerge.ch The big endian tests fail, because on a little endian machine,
converting 0x3f000000 (0.5) to 0x0000003f will result in the float value being
0, so the pcm_sample_* functions return will return wrong values. For
example, pcm_sample_read() reads the input as float (i.e 0x3f000000 ->
0.5), then converts it to big endian (0x3f000000 -> 0x0000003f -> 0), and
lastly converts it to S32 (0 * PCM_S32_MAX = 0), in which case the return
value will be 0, hence the test failure. I am not sure yet how we should adapt
the test program to accomodate floating-point big endian tests.

Harbormaster completed remote builds in B62150: Diff 150208.Thu, Jan 30, 5:20 PM

In D47638#1111768, @christos wrote:

@dev_submerge.ch The big endian tests fail, because on a little endian machine,
converting 0x3f000000 (0.5) to 0x0000003f will result in the float value being
0, so the pcm_sample_* functions return will return wrong values. For
example, pcm_sample_read() reads the input as float (i.e 0x3f000000 ->
0.5), then converts it to big endian (0x3f000000 -> 0x0000003f -> 0), and
lastly converts it to S32 (0 * PCM_S32_MAX = 0), in which case the return
value will be 0, hence the test failure. I am not sure yet how we should adapt
the test program to accomodate floating-point big endian tests.

Well, you do have the test data backwards for the big-endian floats? Just fill in the byte array in big-endian order and check for the correct sample values.

sys/dev/sound/pcm/pcm.h
196–207	Isn't this a bit complicated? Just read it as a 32bit integer from src, reinterpret that as a float, and then calculate the integer value from that float?

Simplify read.
Make big-endian tests pass. Hopefully it will all work properly on actual big-endian architectures as well.

Harbormaster completed remote builds in B62215: Diff 150388.Mon, Feb 3, 10:32 AM

sound: Implement AFMT_FLOAT support
Needs ReviewPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 149763

sys/dev/sound/pcm/channel.c

sys/dev/sound/pcm/feeder_chain.c

sys/dev/sound/pcm/feeder_mixer.c

sys/dev/sound/pcm/feeder_rate.c

sys/dev/sound/pcm/feeder_volume.c

sys/dev/sound/pcm/pcm.h

sys/dev/sound/pcm/sound.h

sys/sys/soundcard.h

tests/sys/sound/pcm_read_write.c

sound: Implement AFMT_FLOAT supportNeeds ReviewPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 149763

sys/dev/sound/pcm/channel.c

sys/dev/sound/pcm/feeder_chain.c

sys/dev/sound/pcm/feeder_mixer.c

sys/dev/sound/pcm/feeder_rate.c

sys/dev/sound/pcm/feeder_volume.c

sys/dev/sound/pcm/pcm.h

sys/dev/sound/pcm/sound.h

sys/sys/soundcard.h

tests/sys/sound/pcm_read_write.c

sound: Implement AFMT_FLOAT support
Needs ReviewPublic
Actions

Revision Contents
Changeset List