
[WIP] audio(3): New OSS audio and MIDI library
Needs Review · Public

Authored by christos on Jul 11 2024, 4:27 PM.

Details

Summary

Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Test Plan

Test program that plays whatever is fed from STDIN to the device specified in argv[1]: https://reviews.freebsd.org/P642

It can very easily be modified to support recording as well, by specifying the AUDIO_REC flag in audio_open(). Similarly, AUDIO_NBIO will enable non-blocking IO (but you also need to call poll(2) before calling audio_read()/audio_write()). The same goes for MIDI: call midi_open(), and then midi_read()/midi_write().

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 58617
Build 55505: arc lint + arc unit

Event Timeline

So, this is still a WIP patch, which means some things are intentionally left out or not properly implemented yet. Apart from the various XXX/TODO/FIXMEs throughout the source code, which are mostly minor things, there are a few points on which I would really appreciate some discussion and ideas before continuing.

  1. Memory-mapped IO doesn't really work properly yet. I've mostly been testing playback, and audio can be heard, but with lots of stuttering and noise in between. @dev_submerge.ch That being said, I am undecided whether providing mmap'ed IO is actually needed or worth the effort in the first place. I try to follow what the manual suggests, and it seems to discourage caring about mmap too much. http://manuals.opensound.com/developer/mmap.html
  2. Buffer size assignment. There are SNDCTL_DSP_POLICY and SNDCTL_DSP_SETFRAGMENT. Alternatively, there is also simply doing nothing and using whatever the defaults are. I think it's good to let the user set a buffer size hint, but if we want to achieve low latencies, we can just as well call audio_read()/audio_write() in smaller chunks (plus disabling VCHANs and opening in exclusive mode). This is another case where the manual suggests not caring about this too much. http://manuals.opensound.com/developer/audio_cases.html, http://manuals.opensound.com/developer/audio_timing.html
  3. I am trying to keep the API simple, in a libsndio-like fashion, so it's by design that the library doesn't offer a million functions. However, is there something that should definitely be provided by the API that is currently missing?

The man page will be written once the code is finalized.

So, this is still a WIP patch, which means some things are intentionally left out or not properly implemented yet. Apart from the various XXX/TODO/FIXMEs throughout the source code, which are mostly minor things, there are a few points on which I would really appreciate some discussion and ideas before continuing.

  1. Memory-mapped IO doesn't really work properly yet. I've mostly been testing playback, and audio can be heard, but with lots of stuttering and noise in between. @dev_submerge.ch That being said, I am undecided whether providing mmap'ed IO is actually needed or worth the effort in the first place. I try to follow what the manual suggests, and it seems to discourage caring about mmap too much. http://manuals.opensound.com/developer/mmap.html

Forget about the manual: mmap IO is slightly faster and may save a syscall, but that's not its main benefit. In contrast to read() / write(), with mmap you know what position you are reading from / writing to. That is relevant as soon as you deal with over- and underruns, and with irregularities like drift and loss.

If your application fails to write() in time and produces an underrun, it's impossible to recover in a way that regains the same latency as before. You only get to know the actual position of the driver again when it next reads from the audio buffer. At that point you can try to correct the next write() length, but that's really messy and error-prone. Plus, there may be races between querying the buffer content and the actual write(). That was my approach in the previous Jack OSS backend (currently in ports).

Whereas if that happens with mmap IO, the mmap pointer just chugs along and the application can simply skip audio data up to the current pointer position. If there is a race, it doesn't affect latency. That approach is used in the new sosso-based Jack OSS backend (still hanging in review upstream) whenever mmap is available. It's much more robust, and you can achieve latencies down to writing blocks of 32 frames at 48kHz (0.7ms).

In short, mmap io is a necessity for serious DAW use where you want the same latency +/- 1ms for all recordings, and for live sound effects or interactive virtual instruments (low latency). For consumer audio it would be nice, but is far from a hard requirement.

  2. Buffer size assignment. There are SNDCTL_DSP_POLICY and SNDCTL_DSP_SETFRAGMENT. Alternatively, there is also simply doing nothing and using whatever the defaults are. I think it's good to let the user set a buffer size hint, but if we want to achieve low latencies, we can just as well call audio_read()/audio_write() in smaller chunks (plus disabling VCHANs and opening in exclusive mode). This is another case where the manual suggests not caring about this too much. http://manuals.opensound.com/developer/audio_cases.html, http://manuals.opensound.com/developer/audio_timing.html

Writing in smaller chunks doesn't work in conjunction with poll() or blocking write(). The application would have to apply a time-based wakeup approach, and thereby account for all the drift / loss introduced by the hardware. I do that in the sosso library, and it's quite difficult to get right. Another possibility is to set SNDCTL_DSP_LOW_WATER accordingly, but there are some caveats too.

I also advise against changing the buffer size, you can easily break playback on some sound cards, snd_hda(4) in particular. SNDCTL_DSP_SETFRAGMENT is conceptually broken (power of 2). One of the goals developing the sosso library was to not touch any buffer sizes, and work around that with the time-based approach.

  3. I am trying to keep the API simple, in a libsndio-like fashion, so it's by design that the library doesn't offer a million functions. However, is there something that should definitely be provided by the API that is currently missing?

Frankly I'm a bit skeptical about the value proposition of this library, in its current form. Why would an application developer choose it over portaudio or sndio? If we forget about the pro audio latency topics above, I can only think of bitperfect, passthrough, and exclusive open() as additional features we could provide. You're already in contact with the sndio developer, right? Maybe it's feasible to fit those features into the sndio API and use that instead?

Or is there another raison d'être for this library that I am missing? Sorry, don't mean to sound negative.

lib/libaudio/audio.c
471–473

Waiting on poll() for non-blocking io completely defies its purpose (non-blocking!). Also, IIRC poll() doesn't work for mmap'ed io (what event should it report?). See SNDCTL_DSP_LOW_WATER for what poll() reports as event.

Forget about the manual: mmap IO is slightly faster and may save a syscall, but that's not its main benefit. In contrast to read() / write(), with mmap you know what position you are reading from / writing to. That is relevant as soon as you deal with over- and underruns, and with irregularities like drift and loss.

If your application fails to write() in time and produces an underrun, it's impossible to recover in a way that regains the same latency as before. You only get to know the actual position of the driver again when it next reads from the audio buffer. At that point you can try to correct the next write() length, but that's really messy and error-prone. Plus, there may be races between querying the buffer content and the actual write(). That was my approach in the previous Jack OSS backend (currently in ports).

Whereas if that happens with mmap IO, the mmap pointer just chugs along and the application can simply skip audio data up to the current pointer position. If there is a race, it doesn't affect latency. That approach is used in the new sosso-based Jack OSS backend (still hanging in review upstream) whenever mmap is available. It's much more robust, and you can achieve latencies down to writing blocks of 32 frames at 48kHz (0.7ms).

In short, mmap io is a necessity for serious DAW use where you want the same latency +/- 1ms for all recordings, and for live sound effects or interactive virtual instruments (low latency). For consumer audio it would be nice, but is far from a hard requirement.

No objection to this.

Writing in smaller chunks doesn't work in conjunction with poll() or blocking write(). The application would have to apply a time-based wakeup approach, and thereby account for all the drift / loss introduced by the hardware. I do that in the sosso library, and it's quite difficult to get right. Another possibility is to set SNDCTL_DSP_LOW_WATER accordingly, but there are some caveats too.

I also advise against changing the buffer size, you can easily break playback on some sound cards, snd_hda(4) in particular. SNDCTL_DSP_SETFRAGMENT is conceptually broken (power of 2). One of the goals developing the sosso library was to not touch any buffer sizes, and work around that with the time-based approach.

As indirectly mentioned in my initial message, I am also against touching buffer sizes. Can you elaborate on the time-based approach? I have taken a look at sosso but I cannot say I fully understand the logic behind it.

Frankly I'm a bit skeptical about the value proposition of this library, in its current form. Why would an application developer choose it over portaudio or sndio? If we forget about the pro audio latency topics above, I can only think of bitperfect, passthrough, and exclusive open() as additional features we could provide. You're already in contact with the sndio developer, right? Maybe it's feasible to fit those features into the sndio API and use that instead?

Or is there another raison d'être for this library that I am missing? Sorry, don't mean to sound negative.

So the main rationale behind this library is to provide an API that can be consumed by the numerous applications that already have OSS backends, or by anyone who wants to write an OSS application (for whatever reason). Applications like MPV, FFmpeg, and the like can be greatly simplified with an API like this. The current patch, as I pointed out, is incomplete and contains errors, but I posted it as a proof of concept and to discuss design choices.

Regarding my contact with the sndio author, I am still waiting for his reply, so I might poke him during these days. We could propose bringing bitperfect etc. to sndio, but that should be a future endeavor IMO.

Another proposition I have is the possibility of merging this library with sosso in some way: either 1) by rewriting sosso in C and perhaps simplifying the API a bit so it more closely resembles this library's or sndio's API, or 2) by bringing some of sosso's ideas in here.

I know OSS is not really in demand, but since this is the audio system FreeBSD ships with, it might be a good idea to have some kind of built-in API rather than scattered ioctls. The ideal scenario would be what we discussed a few months ago, which is to simply get rid of OSS and use something else (e.g. sndio with the suggested improvements). But because this is a larger endeavor and would require lots of planning and discussion with other developers, I think it's good to offer something more than a barebones API for OSS, at least for now.

Writing in smaller chunks doesn't work in conjunction with poll() or blocking write(). The application would have to apply a time-based wakeup approach, and thereby account for all the drift / loss introduced by the hardware. I do that in the sosso library, and it's quite difficult to get right. Another possibility is to set SNDCTL_DSP_LOW_WATER accordingly, but there are some caveats too.

I also advise against changing the buffer size, you can easily break playback on some sound cards, snd_hda(4) in particular. SNDCTL_DSP_SETFRAGMENT is conceptually broken (power of 2). One of the goals developing the sosso library was to not touch any buffer sizes, and work around that with the time-based approach.

As indirectly mentioned in my initial message, I am also against touching buffer sizes. Can you elaborate on the time-based approach? I have taken a look at sosso but I cannot say I fully understand the logic behind it.

Ok, time-based approach from simple (OSSv4 example) to sophisticated (sosso):

  • Let the application wake up every e.g. 4ms and process all the audio data available. Works with non-blocking read() / write() or mmap'ed io.
  • 4ms is too coarse to control the latency within +/- 1ms, so wake up at 0.3ms intervals to synchronize with the driver's writes to / reads from the buffer.
  • 0.3ms means too many wakeups, so only synchronize at certain intervals (Jack period, e.g. 4ms). This requires one wakeup before the driver writes / reads, and one wakeup 0.3ms afterwards.
  • Make synchronizing more efficient (fewer wakeups) by predicting the next chunk the driver will write / read.
  • Require multiple successful synchronizations at start, after under- and overruns, and after other irregularities like USB data loss.

If implemented correctly, this method achieves very precise synchronization between the driver and system time, depending on the wakeup precision of the system scheduler. This is measured separately for recording and playback, and then one of them (usually recording) can be used as the master to correct drift on the other.

This method implies a certain running model for the application: the sosso library will tell the application when to wake up again, but it does no blocking waits by itself. This allows the application to do other stuff and use its own event loop instead of poll().

Frankly I'm a bit skeptical about the value proposition of this library, in its current form. Why would an application developer choose it over portaudio or sndio? If we forget about the pro audio latency topics above, I can only think of bitperfect, passthrough, and exclusive open() as additional features we could provide. You're already in contact with the sndio developer, right? Maybe it's feasible to fit those features into the sndio API and use that instead?

Or is there another raison d'être for this library that I am missing? Sorry, don't mean to sound negative.

So the main rationale behind this library is to provide an API that can be consumed by the numerous applications that already have OSS backends, or by anyone who wants to write an OSS application (for whatever reason). Applications like MPV, FFmpeg, and the like can be greatly simplified with an API like this. The current patch, as I pointed out, is incomplete and contains errors, but I posted it as a proof of concept and to discuss design choices.

If our library does the same thing as sndio, we might as well implement the sndio library API - more compatibility for free. I think you should study the running models of existing applications and sound servers first. The question is: Can our own library provide something for them that is not possible through the sndio API?

Regarding my contact with the sndio author, I am still waiting for his reply, so I might poke him during these days. We could propose bringing bitperfect etc. to sndio, but that should be a future endeavor IMO.

Whether we add these to our own library API or on top of our own implementation of the sndio API doesn't matter, IMHO.

Another proposition I have is the possibility of merging this library with sosso in some way: either 1) by rewriting sosso in C and perhaps simplifying the API a bit so it more closely resembles this library's or sndio's API, or 2) by bringing some of sosso's ideas in here.

Thing is, you cannot simplify the sosso API without compromising the pro audio use case. And the only consumer that's currently missing a pro audio backend is pipewire. If you want to take some ideas from sosso, fine, I'll help you with that. But I'm not gonna walk you through the code and all the caveats, just to rewrite the whole sosso in C. Don't get me wrong, you're a good, solid programmer, but not thorough enough to leave the testing to you, which is most of the work here. I may at some point write a pro audio OSS backend for pipewire, and rewrite sosso for that, but it's a lot less work to do it myself.

I know OSS is not really in demand, but since this is the audio system FreeBSD ships with, it might be a good idea to have some kind of built-in API rather than scattered ioctls. The ideal scenario would be what we discussed a few months ago, which is to simply get rid of OSS and use something else (e.g. sndio with the suggested improvements). But because this is a larger endeavor and would require lots of planning and discussion with other developers, I think it's good to offer something more than a barebones API for OSS, at least for now.

The OSSv4 API is a toolbox whose pieces can be combined into various different running models and use cases. Sound libraries simplify implementation by making choices and imposing restrictions. Unless you write a very thin wrapper (which is not much simpler to use), you will not be able to provide the flexibility for all use cases. Again, have a good look at how current applications and sound servers use the OSS API. Then decide what should be covered in our library.

If our library does the same thing as sndio, we might as well implement the sndio library API - more compatibility for free. The question is: Can our own library provide something for them that is not possible through the sndio API?

No, the library cannot provide something more than sndio, but I don't think that's the point of this library in the first place, at least for now. My initial goal behind it is to have an easier-to-use API for OSS, so that applications that already use OSS (or, in the rare case, someone who wants to write a new one) can work with something simpler. It doesn't have to do with providing something "better" compared to another API per se.

I think you should study the running models of existing applications and sound servers first.

Applications like MPV, FFmpeg, and in general similarly simple use cases can be adapted to use the library. Where more sophisticated applications like Jack or virtual_oss are concerned, they can still use it, but they will still require lots of code on their part to implement the stuff the library won't provide. So you could argue that the library is kind of limited in its current form. I agree 100%.

Another proposition I have is the possibility of merging this library with sosso in some way: either 1) by rewriting sosso in C and perhaps simplifying the API a bit so it more closely resembles this library's or sndio's API, or 2) by bringing some of sosso's ideas in here.

Thing is, you cannot simplify the sosso API without compromising the pro audio use case. And the only consumer that's currently missing a pro audio backend is pipewire. If you want to take some ideas from sosso, fine, I'll help you with that. But I'm not gonna walk you through the code and all the caveats, just to rewrite the whole sosso in C. Don't get me wrong, you're a good, solid programmer, but not thorough enough to leave the testing to you, which is most of the work here. I may at some point write a pro audio OSS backend for pipewire, and rewrite sosso for that, but it's a lot less work to do it myself.

Makes sense.

I know OSS is not really in demand, but since this is the audio system FreeBSD ships with, it might be a good idea to have some kind of built-in API rather than scattered ioctls. The ideal scenario would be what we discussed a few months ago, which is to simply get rid of OSS and use something else (e.g. sndio with the suggested improvements). But because this is a larger endeavor and would require lots of planning and discussion with other developers, I think it's good to offer something more than a barebones API for OSS, at least for now.

The OSSv4 API is a toolbox whose pieces can be combined into various different running models and use cases. Sound libraries simplify implementation by making choices and imposing restrictions. Unless you write a very thin wrapper (which is not much simpler to use), you will not be able to provide the flexibility for all use cases. Again, have a good look at how current applications and sound servers use the OSS API. Then decide what should be covered in our library.

The reason this API is so limited is that it's not really possible to cover all use cases without making it very complicated, as you already mentioned. My rationale was that we can defer the (sometimes tedious) device initialization and the reading/writing routines to the library, and let the application do the rest, since we cannot predict its use case. This is basically what libsndio does -- it provides the basis (open, read, write, close) and the rest is left to the application.

That being said, I am open to forgetting about this library for the time being, as I don't think it's such a pressing issue honestly. My main focus is on implementing userland utilities, fixing driver bugs and making a test suite, so if you believe that this library is not worth the effort or won't really satisfy any use-case, then I am totally fine with abandoning it for now. :)

If our library does the same thing as sndio, we might as well implement the sndio library API - more compatibility for free. The question is: Can our own library provide something for them that is not possible through the sndio API?

No, the library cannot provide something more than sndio, but I don't think that's the point of this library in the first place, at least for now. My initial goal behind it is to have an easier-to-use API for OSS, so that applications that already use OSS (or, in the rare case, someone who wants to write a new one) can work with something simpler. It doesn't have to do with providing something "better" compared to another API per se.

Well, sndio is an easier-to-use API for OSS, doesn't add much overhead, is mature, and already gained some foothold in applications ;)

I think you should study the running models of existing applications and sound servers first.

Applications like MPV, FFmpeg, and in general similarly simple use cases can be adapted to use the library.

Not really. Your current design limits use to blocking, non-duplex scenarios, which means most applications need separate threads for recording, playback, and their main event loop.

Where more sophisticated applications like Jack or virtual_oss are concerned, they can still use it, but they will still require lots of code on their part to implement the stuff the library won't provide. So you could argue that the library is kind of limited in its current form. I agree 100%.

Exposing the OSSv4 internals while doing some of the initialization could actually be a value proposition, but it's not as sexy for application developers as a complete abstraction layer.

I know OSS is not really in demand, but since this is the audio system FreeBSD ships with, it might be a good idea to have some kind of built-in API rather than scattered ioctls. The ideal scenario would be what we discussed a few months ago, which is to simply get rid of OSS and use something else (e.g. sndio with the suggested improvements). But because this is a larger endeavor and would require lots of planning and discussion with other developers, I think it's good to offer something more than a barebones API for OSS, at least for now.

The OSSv4 API is a toolbox whose pieces can be combined into various different running models and use cases. Sound libraries simplify implementation by making choices and imposing restrictions. Unless you write a very thin wrapper (which is not much simpler to use), you will not be able to provide the flexibility for all use cases. Again, have a good look at how current applications and sound servers use the OSS API. Then decide what should be covered in our library.

The reason this API is so limited is that it's not really possible to cover all use cases without making it very complicated, as you already mentioned. My rationale was that we can defer the (sometimes tedious) device initialization and the reading/writing routines to the library, and let the application do the rest, since we cannot predict its use case. This is basically what libsndio does -- it provides the basis (open, read, write, close) and the rest is left to the application.

It seems to me you don't understand how limiting your current design is with regard to the modus operandi of applications. While the library is not a pressing issue, your understanding of that matter is quite essential for your work on the audio stack. Please have a good read of the sndio developer man page and do a bit of research into how different applications handle sound io: where and how they wait for audio data, in which thread, etc.

That being said, I am open to forgetting about this library for the time being, as I don't think it's such a pressing issue honestly. My main focus is on implementing userland utilities, fixing driver bugs and making a test suite, so if you believe that this library is not worth the effort or won't really satisfy any use-case, then I am totally fine with abandoning it for now. :)

Actually I wanted to challenge you a bit, not discourage you ;)
Maybe we can just let this rest a bit, and come back to it when we have a practical use case?

Not to be discussed here at length, just some food for thought: a library interface for applications, like this proposal, could pave the way for us to transition from OSSv4 to a better audio API. We could for example:

  • Do the format and sample rate conversions in this library instead of the kernel
  • Do the channel mapping / surround sound downmix / upmix in this library
  • Make mmap() io the default and simplify its handling through this library
  • Support more ways to synchronize / sleep / wakeup applications

This would allow us to move complexity out of the kernel, be more flexible and explicit in how formats and channel mappings are handled, bypass the currently problematic buffer size management, and so on.

Such a transition would be middle- to long-term, and we'd have to find a consensus first on how we envision the future audio stack. We'd then have to design the library accordingly, to support and abstract the envisioned features.