Page MenuHomeFreeBSD

Cleanup MD pollution of MI busdma header
ClosedPublic

Authored by jah on May 15 2017, 4:36 AM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 15 2024, 3:36 AM
Unknown Object (File)
Feb 15 2024, 3:36 AM
Unknown Object (File)
Feb 15 2024, 3:36 AM
Unknown Object (File)
Feb 15 2024, 3:36 AM
Unknown Object (File)
Feb 15 2024, 3:36 AM
Unknown Object (File)
Dec 22 2023, 11:09 PM
Unknown Object (File)
Nov 26 2023, 9:51 AM
Unknown Object (File)
Nov 11 2023, 8:34 PM

Details

Summary

Remove NULL-checked dmamap macros. These seem to be a minor pessimization
on every configuration besides x86+bounce, as no other configuration allows NULL
maps. arm64 originally allowed NULL maps, which seems to have been the motivation
behind adding arm[64]-specific memory barriers to bus_dma.h, but this was removed
in r299463.

Remove sparc64 exception for dmamap function declarations, and implement
the previous macros as functions in bus_machdep.c

Declare (optional) interface to MD implementation in bus_dma_internal.h

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jah added reviewers: kib, andrew, marius.

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?

One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

I gnerally like this simplification.

I do not believe that moving NULL checks into non-inline path would have significant (or measurable) effect on operation of any device.

This revision is now accepted and ready to land.May 17 2017, 12:44 PM
In D10729#223080, @kib wrote:

I gnerally like this simplification.

I do not believe that moving NULL checks into non-inline path would have significant (or measurable) effect on operation of any device.

Good, I didn't think so either. Is there anyone else who should have a look at this before I commit it?

In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?

One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?

One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

That's a good data point, thank you for testing on that hardware.
How much did the performance penalty as a % end up being for the 40G case?

In D10729#228105, @jah wrote:
In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?

One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

That's a good data point, thank you for testing on that hardware.
How much did the performance penalty as a % end up being for the 40G case?

Inlining alloc_iov() there, i. e. reverting 94210486, results in a ~ 0.4 % improvement of mean performance
with 40 G NICs:
https://github.com/NTAP/warpcore/blob/master/lib/src/warpcore.c
In principle, alloc_iov() is similar to _bus_dmamap_load_buffer() in terms of functionality (get/load a raw
buffer) and usage, i. e. repeatedly called per payload buffer and mbuf respectively. This might also
translate to bus_dmamap_sync() as - although it makes no sense to call that function in a loop -, sending/
receiving one packet requires at least a few invocations (with different DMA maps and/or operations).

jah edited edge metadata.

Allow MD busdma implementations to provide inline map functions

MD bus_dma.h can define WANT_INLINE_DMAMAP to declare the create/
destroy/alloc/free/sync/unload dmamap functions static inline.
This can be used as a performance optimization on architectures
for which the dmamap calls are implemented through function
pointer indirection to an underlying map implementation which
may be bus or device-specific.

Make use of this on sparc64, x86, and arm64.

This revision now requires review to proceed.Jun 8 2017, 7:34 AM
In D10729#228105, @jah wrote:
In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?

One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

That's a good data point, thank you for testing on that hardware.
How much did the performance penalty as a % end up being for the 40G case?

Inlining alloc_iov() there, i. e. reverting 94210486, results in a ~ 0.4 % improvement of mean performance
with 40 G NICs:
https://github.com/NTAP/warpcore/blob/master/lib/src/warpcore.c
In principle, alloc_iov() is similar to _bus_dmamap_load_buffer() in terms of functionality (get/load a raw
buffer) and usage, i. e. repeatedly called per payload buffer and mbuf respectively. This might also
translate to bus_dmamap_sync() as - although it makes no sense to call that function in a loop -, sending/
receiving one packet requires at least a few invocations (with different DMA maps and/or operations).

I might be tempted to say that 0.4% is still not a big deal, but as you point out I/O is always getting faster.
100G is a thing now, plus there are a lot of high-speed storage controllers we might be concerned about.

So I've posted an update that implements the WANT_INLINE_DMAMAP path. There's some risk for
compile errors in drivers that directly (and incorrectly) include sys/bus_dma.h and then later include
the correct machine/bus.h as that can produce conflicting declarations of the map functions. I've
gone through and fixed the instances I found.

sys/arm64/include/bus_dma.h
9 ↗(On Diff #29330)

This should be machine/bus_dma_impl.h

11 ↗(On Diff #29330)

Should this file be hidden behind _KERNEL?

Use correct include path for arm64 bus_dma_impl.h

jah marked an inline comment as done.Jun 9 2017, 4:05 AM
jah added inline comments.
sys/arm64/include/bus_dma.h
9 ↗(On Diff #29330)

Good catch, fixed.

11 ↗(On Diff #29330)

Good question. We don't seem to have any history of doing that for busdma headers, which before this change still had bits and pieces of inlined/macro-ized functionality. What's the recommendation for keeping stuff away from userspace these days?

sys/arm64/include/bus_dma.h
11 ↗(On Diff #29330)

busdma is clearly kernel-only stuff, it has no value for the userspace. If busdma headers ever included in the user mode compilation environment, then this is clearly a leak.

Do you get any failures with the new headers and buildworld ? If not, I do not see a need in any protection applied to the inlined stuff.

And yes, #ifdef _KERNEL is the semi-official way to avoid user env namespace pollution.

jah marked an inline comment as done.Jun 10 2017, 6:11 AM
jah added inline comments.
sys/arm64/include/bus_dma.h
11 ↗(On Diff #29330)

buildworld is fine on amd64/i386/sparc64, and arm64 after fixing the include path mistake Andrew caught.

But at least camdd consumes these headers in userspace, although it looks like it may only do so to use bus_dma_segment_t.

This revision is now accepted and ready to land.Jun 10 2017, 8:35 AM

ping...Marius/Andrew, are you guys ok with the change as it is now?

marius requested changes to this revision.Jun 21 2017, 8:27 PM

Apart from the style(9) bugs, looks good to me.

sys/sparc64/include/bus_dma.h
130 ↗(On Diff #29368)

According to style(9), this and the following functions/wrappers should start with an empty line each: "Insert an empty line if the function has no local variables".

178 ↗(On Diff #29368)

According to style(9), "Second level indents are four spaces", i. e. indent this wrapped line and those in the below functions/wrappers by 4 spaces relative to the beginning of "return" rather than trying to align with the arguments of the previous line.

sys/sys/bus_dma_internal.h
25 ↗(On Diff #29368)

There probably should be a comment line with $FreeBSD$ here.

sys/x86/include/bus_dma.h
24 ↗(On Diff #29368)

Ditto

This revision now requires changes to proceed.Jun 21 2017, 8:27 PM
jah edited edge metadata.

Style(9) fixes

This revision is now accepted and ready to land.Jun 22 2017, 8:58 AM
This revision was automatically updated to reflect the committed changes.
cem added inline comments.
head/sys/x86/iommu/busdma_dmar.c
364–365

Why was WARN_SLEEPOK added here? The allocations in this function are M_NOWAIT.

If this function is sleepable, shouldn't the allocations by M_WAITOK?

head/sys/x86/x86/busdma_bounce.c
265–266

Ditto above.

375–381

This one should clearly be conditional on (flags & BUS_DMA_NOWAIT) == 0.

head/sys/x86/iommu/busdma_dmar.c
364–365

Ah, fixed in r346351 and r346851.

head/sys/x86/x86/busdma_bounce.c
375–381

This one wasn't ever fixed.

head/sys/x86/x86/busdma_bounce.c
265–266

It wasn't newly added here, it was taken wholesale from the previous wrapper in bus_dma_machdep.c--see the deleted code immediately below this. I wanted to keep the diff as straightforward as possible, since this change was intended to be about header cleanup and not addressing (admittedly sketchy) checks within x86 busdma.