Page MenuHomeFreeBSD

sound: Refactor the format conversion framework
Needs ReviewPublic

Authored by christos on Dec 5 2024, 5:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Jan 14, 6:58 PM
Unknown Object (File)
Thu, Jan 9, 10:29 AM
Unknown Object (File)
Wed, Jan 8, 9:41 PM
Unknown Object (File)
Wed, Jan 8, 8:33 PM
Unknown Object (File)
Wed, Jan 8, 8:06 PM
Unknown Object (File)
Wed, Jan 8, 3:20 AM
Unknown Object (File)
Dec 26 2024, 8:30 AM
Unknown Object (File)
Dec 26 2024, 7:54 AM

Details

Summary

Merge the PCM_READ|WRITE_* macros defined in pcm/pcm.h, as well as the
intpcm_read|write_* macros defined in pcm/feeder_format.c, into six
inline functions: pcm_sample_read|write[_norm|calc](). The absence of
macro magic makes the code significantly easier to read, use and modify.

Since these functions take the input/output format as a parameter, get
rid of the read() and write() function pointers defined in struct
feed_format_info, as well as the feeder_format_read|write_op()
functions, and use the new read/write functions directly.

Sponsored by: The FreeBSD Fondation
MFC after: 1 week

Test Plan

Apply D48330 before this patch and run the tests.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 61859
Build 58743: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/dev/sound/pcm/feeder_format.c
54–55

I think it'd be more intuitive to check against AFMT_FOO_NE and AFMT_FOO_OE. For the former case we just use the buffer as-is (with the cast), and for the latter, we bswap().

sys/dev/sound/pcm/feeder_format.c
54–55

Possible, but actually less intuitive to me - you still have to handle the 24bit formats differently, and you have to assume things about the platform's endianness. There's platforms with mixed endianness for example, although I don't think FreeBSD currently supports one.

I'd go with the sys/sys/endian.h, uniformly for all formats. Also it is the canonical way that's used in many places for conversions like this.

sys/dev/sound/pcm/feeder_format.c
54–55

To clarify, I meant the sys/sys/endian.h approach, that is also currently used here for 24bit formats. Not to use the actual sys/sys/endian.h implementation.

sys/dev/sound/pcm/feeder_format.c
54–55

So basically, if I understand correctly, what you propose is that for example, AFMT_S16_LE is v = INTPCM_T(src[0] | (int8_t)src[1] << 8);, and AFMT_S16_BE is v = INTPCM_T(src[1] | (int8_t)src[0] << 8);.

sys/dev/sound/pcm/feeder_format.c
54–55

Yes, that should work independent of endianness, and in my experience all compilers are smart enough to optimize this into a simple 16bit-read for native endian samples.

We have to be careful about the meaning of the result values here. As far as I understand these are not correctly signed 32bit or 64bit integers, but some shifted down bit representation of the smaller sample values. An explicit distinction between these and the full integers used for computation would be a huge improvement, IMHO. Separate topic though.

christos marked 7 inline comments as done.
  • Only use PCM_FXSHIFT with 32-bit formats. This is consistent with the previous behavior.
  • I'm not sure if we need to test AC3, MU_LAW and A_LAW, so I left them commented out.
  • @dev_submerge.ch Is there still a problem with compiling if pcm_sample_read|write() are defined in feeder_format.c?
  • We should somehow make sure everything works properly on big-endian machines.
  • Should this live under SND_DEBUG?
christos edited the test plan for this revision. (Show Details)
christos marked 2 inline comments as done.

First of all, could you please update these reviews when you rebase? The diff here still has D47868 included, and thus fails to apply cleanly.

  • Only use PCM_FXSHIFT with 32-bit formats. This is consistent with the previous behavior.
  • I'm not sure if we need to test AC3, MU_LAW and A_LAW, so I left them commented out.
  • @dev_submerge.ch Is there still a problem with compiling if pcm_sample_read|write() are defined in feeder_format.c?

Sure. You cannot compile __always_inline stuff when the definition is not available:

--- feeder_eq.o ---
In file included from /usr/src/sys/dev/sound/pcm/feeder_eq.c:42:
In file included from /usr/src/sys/dev/sound/pcm/sound.h:87:
/usr/src/sys/dev/sound/pcm/pcm.h:126:33: error: inline function 'pcm_sample_read' is not defined [-Werror,-Wundefined-inline]
  126 | extern __always_inline intpcm_t pcm_sample_read(uint8_t *, uint32_t, bool);
      |                                 ^
/usr/src/sys/dev/sound/pcm/feeder_eq.c:217:1: note: used here
  217 | FEEDEQ_DECLARE(S, 16, LE)
      | ^
/usr/src/sys/dev/sound/pcm/feeder_eq.c:161:8: note: expanded from macro 'FEEDEQ_DECLARE'
  161 |                         v = pcm_sample_read(dst, AFMT_##SIGN##BIT##_##ENDIAN,   \
      |                             ^
In file included from /usr/src/sys/dev/sound/pcm/feeder_eq.c:42:
In file included from /usr/src/sys/dev/sound/pcm/sound.h:87:
/usr/src/sys/dev/sound/pcm/pcm.h:127:29: error: inline function 'pcm_sample_write' is not defined [-Werror,-Wundefined-inline]
  127 | extern __always_inline void pcm_sample_write(uint8_t *, intpcm_t, uint32_t, bool);
      |                             ^
/usr/src/sys/dev/sound/pcm/feeder_eq.c:217:1: note: used here
  217 | FEEDEQ_DECLARE(S, 16, LE)
      | ^
/usr/src/sys/dev/sound/pcm/feeder_eq.c:164:4: note: expanded from macro 'FEEDEQ_DECLARE'
  164 |                         pcm_sample_write(dst, v, AFMT_##SIGN##BIT##_##ENDIAN,   \
      |                         ^
2 errors generated.
*** [feeder_eq.o] Error code 1
  • We should somehow make sure everything works properly on big-endian machines.

If we make this a unit test, I suppose it would be run on any big-endian test builders? If there is no CI for those, we could ask someone that works on such an architecture to run them.

  • Should this live under SND_DEBUG?

I'd say a unit test is more appropriate.

sys/dev/sound/pcm/feeder_format.c
54–55

That's already a lot better. For readability I would prefer a separate function that calls this one and does the shift, instead of a boolean flag. That would emphasize the different return values and make it easier to grep what is used in which feeder.

I'm not convinced we actually need two separate functions in the end, the shifted values should work everywhere, since they are of the same magnitude as 32bit (or 24bit) samples. But for refactoring this is ok.

Also the shift here seems contradictory if SND_PCM64 is not set. In that case 32bit samples are shifted down to 24bit magnitude, whereas all other sample formats are shifted up to full 32bit magnitude?

54–55

Apart from putting this into a unit test, I'd suggest to make these tests a lot simpler and more verbose. If you replicate too much logic it's easy to fool yourself (and us reviewers) with a badly written test.

I would start with a byte array like { 0x01, 0x02, 0x03, 0x04 }, read it as a specific sample format and compare it to an explicit value. Then write the value into another array and compare that explicitly. Also repeat this for at least one negative integer value, like { 0x81, 0x82, 0x83, 0x84 } - does that make sense?

sys/dev/sound/pcm/pcm.h
34–41

Some of these comments should be preserved and adjusted. They are valuable hints to understand what we do here.

132–133

You have to put the actual implementation here, otherwise the inlining doesn't work and it won't compile.

christos added inline comments.
sys/dev/sound/pcm/feeder_format.c
54–55

Also the shift here seems contradictory if SND_PCM64 is not set. In that case 32bit samples are shifted down to 24bit magnitude, whereas all other sample formats are shifted up to full 32bit magnitude?

I had the same thought. The reason I kept this was to preserve the current behavior. That being said, I also do not understand this.

/*
 * 24bit integer ?!? This is quite unfortunate, eh? Get the fact straight:
 * Dynamic range for:
 *	1) Human =~ 140db
 *	2) 16bit = 96db (close enough)
 *	3) 24bit = 144db (perfect)
 *	4) 32bit = 196db (way too much)
 *	5) Bugs Bunny = Gazillion!@%$Erbzzztt-EINVAL db
 * Since we're not Bugs Bunny ..uh..err.. avoiding 64bit arithmetic, 24bit
 * is pretty much sufficient for our signed integer processing.
 */

According to this comment from pcm/pcm.h (left diff), 24-bit magnitude is just fine for the human ear, so I suppose we should either:

  1. Not shift down 32-bit formats to 24 bits, and also shift the rest of the formats up to 32 bits.
  2. Shift everything up/down to 24 bits.
54–55

I do not have any strong objection to this approach, but in what way is this test confusing (if at all)? I think it's easier to automate testing of all formats (not sure about AFMT_FLOAT yet, though) this way.

Also I did move it to tests/sys/sound.

sys/dev/sound/pcm/feeder_format.c
54–55

There is this comment though:

To avoid overflow, we truncate 32bit (and only 32bit) samples down to 24bit (see below for the reason), unless SND_PCM_64 is defined.

sys/dev/sound/pcm/feeder_format.c
54–55

Both of these comments apply. In case SND_PCM_64 is set, all calculations are in 64bit integers, there's no risk of overflow, and we may shift all samples up to 32bit magnitude. If SND_PCM_64 is not set, calculations are in 32bit integers and we should avoid anything that exceeds the 24bit magnitude.

And just to "get the fact straight": 32bit precision can be useful in music production setups where you want to preserve the full 24bit precision quality, but need some headroom to accommodate for louder / quieter parts. Thus 24bit everywhere, always, is not an option.

I didn't look at all feeders yet, but there's some nuances in the requirements:

  • Forward feeders that don't do format conversion get by without any sample shift.
  • Format conversion feeders always need the same magnitude for read and write, but won't overflow because there are no calculations. Means we don't have to shift down 32bit samples to 24bit if SND_PCM_64 is not set.
  • Mixers and equalizers do calculations, samples should have the same magnitude for read and write, and we have to shift down 32bit samples to 24bit if SND_PCM_64 is not set.

For the sake of refactoring, I'd suggest to implement separate inline functions for these different requirements. We can call the main sample read / write function from there and shift as necessary per requirement.

54–55

In my experience, unit tests like this should be easily verifiable, test exactly what is required from the implementation, with the least amount of assumptions and dependencies.

For example your current code makes assumptions about the platform's integer memory layout, depends on endianness specific macros, and does many non-obvious (__bswap16(int32_t)?) transformations on the output before checking it. To verify its correctness I have to consider different int sizes, different endianness, different sample formats and that the output is properly checked. BTW, there's an error in there, can you spot it?

While all we want to test is that a certain sample format in a byte array is read to the correct integer value, and vice versa for writing. Put these into a well-structured table of test data, it will be more verbose but also explicit and extendable.

Another reason is that there's already quite some undefined or implementation defined behavior in pcm_sample_read() and pcm_sample_write(), like signed integer shifts or unsigned <-> signed conversions. While hard to avoid there, at least I'd want to keep those out of the tests.

FYI: at some point we had the sound system with the lowest latency. Ariff was comparing MS, OS X and Linux. I do not know if they have catched-up since then, but it may be worth the effort to check the performance / latency for such changes. Unfortunately I haven't found the info about the latency stuff he did, only his resampling quality comparison with other resampler implementations (https://people.freebsd.org/~ariff/z_comparison/). In https://people.freebsd.org/~ariff/old/ he has some old low latency diffs, so some interested soul could check which parts he modified to get lower latency.

FYI: at some point we had the sound system with the lowest latency. Ariff was comparing MS, OS X and Linux. I do not know if they have catched-up since then, but it may be worth the effort to check the performance / latency for such changes. Unfortunately I haven't found the info about the latency stuff he did, only his resampling quality comparison with other resampler implementations (https://people.freebsd.org/~ariff/z_comparison/). In https://people.freebsd.org/~ariff/old/ he has some old low latency diffs, so some interested soul could check which parts he modified to get lower latency.

Thanks for your input. I agree that we should keep an eye on performance, but I'd like to clarify: In our current refactoring of the format conversion we're not touching anything that changes the buffer sizes. And that's the only thing that could possibly affect audio latency. If latency really is of concern, then we'd have to look at smaller buffer sizes and how to make timers and scheduling more reliable. Also I doubt that we ever had the lowest overall latency, if you count in ASIO on Windows. I suspect that was about the latency introduced by the resampler.

What we actually do here is replace the macro-generated explicit specializations for different formats with inline functions and explicit format parameters. This is expected to result in comparable efficiency using modern compilers. I suppose we can have a look at the assembler generated for the unit test. Any suggestion for benchmarking the format conversions?

Thanks for your input. I agree that we should keep an eye on performance, but I'd like to clarify: In our current refactoring of the format conversion we're not touching anything that changes the buffer sizes. And that's the only thing that could possibly affect audio latency. If latency really is of concern, then we'd have to look at smaller buffer sizes and how to make timers and scheduling more reliable. Also I doubt that we ever had the lowest overall latency, if you count in ASIO on Windows. I suspect that was about the latency introduced by the resampler.

We have the possibility to modify the latency / buffer size already. See https://meka.rs/blog/2017/01/25/sing-beastie-sing/
And no, it was not about the resampler. See also https://news.ycombinator.com/item?id=12616183
And it was about 14 years ago or such. So comparing with Windows 2000 / Vista. No idea how much Windows has improved in the latency area since then, but I have some dim memories about "not that much". We should be still quiet good in this area.

What we actually do here is replace the macro-generated explicit specializations for different formats with inline functions and explicit format parameters. This is expected to result in comparable efficiency using modern compilers. I suppose we can have a look at the assembler generated for the unit test. Any suggestion for benchmarking the format conversions?

Benchmarking: including the format conversion code into a userland tool which can be benchmarked, or maybe via dtrace (timestamps between entering and exiting), or hwpmc?

Thanks for your input. I agree that we should keep an eye on performance, but I'd like to clarify: In our current refactoring of the format conversion we're not touching anything that changes the buffer sizes. And that's the only thing that could possibly affect audio latency. If latency really is of concern, then we'd have to look at smaller buffer sizes and how to make timers and scheduling more reliable. Also I doubt that we ever had the lowest overall latency, if you count in ASIO on Windows. I suspect that was about the latency introduced by the resampler.

We have the possibility to modify the latency / buffer size already. See https://meka.rs/blog/2017/01/25/sing-beastie-sing/

Sure, within some limits. There's a lot to fix and improve in that area. Funny enough, Meka's setup was terrible back then in terms of latency.

And no, it was not about the resampler. See also https://news.ycombinator.com/item?id=12616183
And it was about 14 years ago or such. So comparing with Windows 2000 / Vista. No idea how much Windows has improved in the latency area since then, but I have some dim memories about "not that much". We should be still quiet good in this area.

That's not low latency in my book, more like decent enough latency for media consumption. I suppose we do well in that department. For applications that really need low latency, e.g. live sound effects, ASIO on Windows still beats us, and certainly did so 14 years ago. Anyway, different use cases.

What we actually do here is replace the macro-generated explicit specializations for different formats with inline functions and explicit format parameters. This is expected to result in comparable efficiency using modern compilers. I suppose we can have a look at the assembler generated for the unit test. Any suggestion for benchmarking the format conversions?

Benchmarking: including the format conversion code into a userland tool which can be benchmarked, or maybe via dtrace (timestamps between entering and exiting), or hwpmc?

Thanks, I didn't find much info about micro benchmarking in FreeBSD base. DTrace may be an option though if the number of samples processed is large enough.

christos marked 3 inline comments as done.
  • Add pcm_sample_read|write_shift() macros and remove the boolean argument from pcm_sample_read|write().
  • Move tests to sys/tests/sound and re-write according to Florian's suggestions.
  • Hardcode expected results instead of reverse engineering them.
  • Move pcm/sound.h's AFMT_* defines out of ifdef _KERNEL so that the test can use them.

Will rebase dependent reviews when we are done with this first.

Notice that in the test code I left some comments, namely:

  • Since the results are hardcoded, I'm not sure whether this will work on big-endian machines.
  • I left out tests (for now) for when SND_PCM_64 is not set.
  • I'm not sure whether we need to add tests for AFMT_MU_LAW and AFMT_A_LAW. See how pcm/pcm.h handles these formats.
sys/dev/sound/pcm/pcm.h
331–337

Why macros instead of inline functions?

tests/sys/sound/fmtconv.c
68 ↗(On Diff #148558)

Shouldn't this value be negative? Same for the following unsigned sample formats?

158–162 ↗(On Diff #148558)

That wasn't the idea, it can't work that way. You have to define the format independent of the machine's endianness, here that would be AFMT_U32_LE. We want to test that reading a little endian U32 sample of bytes { 0x81, 0x82, 0x83, 0x84 } always results in the corresponding integer value of 0x04838281. That should be true on any machine, regardless of it's endianness.

For the same reason, the result of the write functions has to be a byte array. We want to test that writing a value of 0x04838281 as a little endian U32 sample results in a byte array of { 0x81, 0x82, 0x83, 0x84 }, regardless of the machine's endianness. Don't compare integer values here, compare the resulting byte arrays. Then our test checks the right thing, without any endianness of the machine running it involved. Only the implementation of the read and write function has to care about endianness. Our test just checks the correctness of the results.

Also you may want to create separate tables for read and write test data. I think you can then fit each test case on one line, tidying up the whole thing.

190 ↗(On Diff #148558)

I'd prefer to avoid the undefined behavior of shifting a negative signed integer here, for the sake of testing. Can we add the shifted value to the test data? Alternatively we could reformulate the calculation without undefined behavior.

Please note that I'm working on a unit test which exercises the original sample read / write macros, to be committed before this refactoring. This way we can verify the test procedure and test data with the original code, then adapt it to the refactored code. It's still WIP, but I can already tell that we definitely have some regressions here.

Christos, please have a close look at the precise definition of arithmetic operators in C, regarding integer promotion. There's some subtle nuances about conversion order that affect your code.

BTW, I hope you all had a pleasant start into 2025!

christos marked 4 inline comments as done.
  • Remove unit test in favor of D48330. Work will continue there. Depend on it.
  • Make shift functions inline, instead of macros.
  • Mark read/write functions as __unused to suppress compiler warnings about unused shift functions in various files.
  • Fix regressions with unsigned format conversion. Now passes D48330 tests.
  • Modify D48330 to work with the refactor.

BTW, I hope you all had a pleasant start into 2025!

Happy new year! :-)

christos edited the test plan for this revision. (Show Details)

There's actually three different sample value types involved here:

  1. Original magnitude of the sample format (_PCM_READ_*() and _PCM_WRITE_*()).
  2. Original magnitude, shift to 24 bit for 32 bit processing (PCM_READ_*() and PCM_WRITE_*()).
  3. Normalized to 32 bit magnitude (intpcm_read_*() and intpcm_write_*()).

The refactoring is currently missing the first, I suggest that pcm_sample_read() and pcm_sample_write() should implement that instead of the second (see inline comments). We can handle the second type in a separate function. The third type then becomes pcm_sample_read_shift() and pcm_sample_write_shift().

sys/dev/sound/pcm/pcm.h
231–234

This should be factored out into a separate function.

251–254

This should be factored out into a separate function.

This comment was removed by christos.
This comment was removed by christos.
This comment was removed by christos.

Scratch that. I remembered the reason for the shift down to 24-bits is that if we are on a 32-bit machine, then can save some space this way and avoid overflows...

christos added inline comments.
sys/dev/sound/pcm/pcm.h
251–254

The PCM_WRITE_* are unused, so I think we can get away with not implementing this. No?

  • Depend on D48077. The test won't compile without this patch.
  • Introduce pcm_sample_read_24bit() and use in place of PCM_READ_*. I did not implement pcm_sample_write_24bit() respectively, since PCM_WRITE_* was never used.
sys/dev/sound/pcm/pcm.h
75–77

I don't think we can remove this (yet)?

251–254

It's not strictly necessary to implement it, but it's only unused because the original code includes the shift back to 32bit in the clamp macro which is ... at least non-obvious. If we want to untangle that convolution (I would), there will be a need to do the shift to 32bit outside of the clamp function.

My idea was actually to have this 32bit <-> 24bit transformation (or no-op) in a separate function without calling pcm_sample_write() or pcm_sample_read(), but never mind. What is important is to make it obvious what sample magnitude each feeder code is dealing with.

sys/dev/sound/pcm/pcm.h
151

As originally suggested by @markj in D48330.

227
233
christos edited the test plan for this revision. (Show Details)
christos marked 6 inline comments as done.

Address comments.

sys/dev/sound/pcm/pcm.h
75–77

This was a leftover from an experimental change. The diff is fixed already, but I didn't want to keep spamming everyone with more updates. :-)

251–254

I will implement the function, but for now I will keep it unused, since it'd require to change the logic of the clamp function as well in order to use pcm_sample_write_24bit() and not do the shift in the clamp function. I think it's better to implement this in a followup patch to not over-complicate this one.

Use pcm_sample_write_24bit() instead of calling PCM_CLAMP_* and then
passing the result to pcm_sample_write(). I said I would do it in a separate
patch, but I think it's not that bad to have it here.

sys/dev/sound/pcm/feeder_mixer.c
68

We are discussing this in D48394 as well, but we lose precision here, and in the other call of pcm_sample_write_24bit(). If this function is called with a 32-bit format, then the type assigned to z will be int64_t if SND_PCM_64 is set. pcm_sample_write_*() take the sample as intpcm_t, aka int32_t.

I think we need to address this intpcm_t thing ASAP. It's quite confusing.

Include assert.h if we're not in the kernel, and also set v to in
pcm_sample_read() default case, so that the compiler doesn't complain
when building the tests.

christos edited the summary of this revision. (Show Details)
  • Rename *_shift to *_norm and *_24bit to *_calc, similar to D48330.
  • Adapt D48330. Tests pass.

@dev_submerge.ch Both 32 and 64 bit tests pass normally (with the fix mentioned in D48330). Are we done with this one?