Page MenuHomeFreeBSD

Cleanup MD pollution of MI busdma header
ClosedPublic

Authored by jah on May 15 2017, 4:36 AM.

Details

Summary

Remove NULL-checked dmamap macros. These seem to be a minor pessimization
on every configuration besides x86+bounce, as no other configuration allows NULL
maps. arm64 originally allowed NULL maps, which seems to have been the motivation
behind adding arm[64]-specific memory barriers to bus_dma.h, but this was removed
in r299463.

Remove sparc64 exception for dmamap function declarations, and implement
the previous macros as functions in bus_machdep.c

Declare (optional) interface to MD implementation in bus_dma_internal.h

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jah created this revision.May 15 2017, 4:36 AM
jah edited the summary of this revision. (Show Details)May 15 2017, 4:40 AM
jah added reviewers: kib, andrew, marius.
jah added a comment.May 15 2017, 4:48 AM

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?

One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

jah edited the summary of this revision. (Show Details)May 15 2017, 4:49 AM
kib accepted this revision.May 17 2017, 12:44 PM

I gnerally like this simplification.

I do not believe that moving NULL checks into non-inline path would have significant (or measurable) effect on operation of any device.

This revision is now accepted and ready to land.May 17 2017, 12:44 PM
jah added a comment.May 27 2017, 10:10 PM
In D10729#223080, @kib wrote:

I gnerally like this simplification.
I do not believe that moving NULL checks into non-inline path would have significant (or measurable) effect on operation of any device.

Good, I didn't think so either. Is there anyone else who should have a look at this before I commit it?

marius edited edge metadata.May 31 2017, 8:09 PM
In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?
One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

jah added a comment.Jun 1 2017, 12:01 AM
In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?
One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

That's a good data point, thank you for testing on that hardware.
How much did the performance penalty as a % end up being for the 40G case?

marius added a comment.Jun 3 2017, 1:26 AM
In D10729#228105, @jah wrote:
In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?
One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

That's a good data point, thank you for testing on that hardware.
How much did the performance penalty as a % end up being for the 40G case?

Inlining alloc_iov() there, i. e. reverting 94210486, results in a ~ 0.4 % improvement of mean performance
with 40 G NICs:
https://github.com/NTAP/warpcore/blob/master/lib/src/warpcore.c
In principle, alloc_iov() is similar to _bus_dmamap_load_buffer() in terms of functionality (get/load a raw
buffer) and usage, i. e. repeatedly called per payload buffer and mbuf respectively. This might also
translate to bus_dmamap_sync() as - although it makes no sense to call that function in a loop -, sending/
receiving one packet requires at least a few invocations (with different DMA maps and/or operations).

jah updated this revision to Diff 29330.Jun 8 2017, 7:34 AM
jah edited edge metadata.

Allow MD busdma implementations to provide inline map functions

MD bus_dma.h can define WANT_INLINE_DMAMAP to declare the create/
destroy/alloc/free/sync/unload dmamap functions static inline.
This can be used as a performance optimization on architectures
for which the dmamap calls are implemented through function
pointer indirection to an underlying map implementation which
may be bus or device-specific.

Make use of this on sparc64, x86, and arm64.

This revision now requires review to proceed.Jun 8 2017, 7:34 AM
jah added a comment.Jun 8 2017, 7:41 AM
In D10729#228105, @jah wrote:
In D10729#222257, @jah wrote:

Would there be any concern about performance implications of un-inlining the NULL checks for x86 and the map calls for sparc64?
Maybe network adapters that need high-frequency mbuf operations?
One alternative I'd considered was to let each MD bus_dma.h #define something like WANT_INLINE_DMAMAP.
If set that would change the bus_dmamap* declarations in bus_dma.h to static inline. That would still be a lot
cleaner than what we have now. If we did that, then:
--we could re-inline the sparc64 implementation
--we could inline the x86 and arm64 dmamap calls, which have been sparc64-like function-pointer calls since
Intel DMAR support was added
--for x86 we could move the NULL checks back out to the inlined functions, removing any function call
penalty for no-op maps.

Testing at work with 40 G NICs and a userland IP stack has show that de-inlining such kind of wrappers has a
quite measurable impact on throughput (it's still in the noise with 10 G, though). So given that the performance
of I/O gear also is ever increasing and given that it's not too much hassle, I'd actually prefer your suggested
approach of optional inlining based on a (WANT_)INLINE_DMAMAP macro.

That's a good data point, thank you for testing on that hardware.
How much did the performance penalty as a % end up being for the 40G case?

Inlining alloc_iov() there, i. e. reverting 94210486, results in a ~ 0.4 % improvement of mean performance
with 40 G NICs:
https://github.com/NTAP/warpcore/blob/master/lib/src/warpcore.c
In principle, alloc_iov() is similar to _bus_dmamap_load_buffer() in terms of functionality (get/load a raw
buffer) and usage, i. e. repeatedly called per payload buffer and mbuf respectively. This might also
translate to bus_dmamap_sync() as - although it makes no sense to call that function in a loop -, sending/
receiving one packet requires at least a few invocations (with different DMA maps and/or operations).

I might be tempted to say that 0.4% is still not a big deal, but as you point out I/O is always getting faster.
100G is a thing now, plus there are a lot of high-speed storage controllers we might be concerned about.

So I've posted an update that implements the WANT_INLINE_DMAMAP path. There's some risk for
compile errors in drivers that directly (and incorrectly) include sys/bus_dma.h and then later include
the correct machine/bus.h as that can produce conflicting declarations of the map functions. I've
gone through and fixed the instances I found.

andrew added inline comments.Jun 8 2017, 10:04 AM
sys/arm64/include/bus_dma.h
9 ↗(On Diff #29330)

This should be machine/bus_dma_impl.h

11 ↗(On Diff #29330)

Should this file be hidden behind _KERNEL?

jah updated this revision to Diff 29368.Jun 9 2017, 3:57 AM

Use correct include path for arm64 bus_dma_impl.h

jah marked an inline comment as done.Jun 9 2017, 4:05 AM
jah added inline comments.
sys/arm64/include/bus_dma.h
9 ↗(On Diff #29330)

Good catch, fixed.

11 ↗(On Diff #29330)

Good question. We don't seem to have any history of doing that for busdma headers, which before this change still had bits and pieces of inlined/macro-ized functionality. What's the recommendation for keeping stuff away from userspace these days?

kib added inline comments.Jun 9 2017, 10:11 AM
sys/arm64/include/bus_dma.h
11 ↗(On Diff #29330)

busdma is clearly kernel-only stuff, it has no value for the userspace. If busdma headers ever included in the user mode compilation environment, then this is clearly a leak.

Do you get any failures with the new headers and buildworld ? If not, I do not see a need in any protection applied to the inlined stuff.

And yes, #ifdef _KERNEL is the semi-official way to avoid user env namespace pollution.

jah marked an inline comment as done.Jun 10 2017, 6:11 AM
jah added inline comments.
sys/arm64/include/bus_dma.h
11 ↗(On Diff #29330)

buildworld is fine on amd64/i386/sparc64, and arm64 after fixing the include path mistake Andrew caught.

But at least camdd consumes these headers in userspace, although it looks like it may only do so to use bus_dma_segment_t.

kib accepted this revision.Jun 10 2017, 8:35 AM
This revision is now accepted and ready to land.Jun 10 2017, 8:35 AM
jah added a comment.Jun 21 2017, 6:08 PM

ping...Marius/Andrew, are you guys ok with the change as it is now?

marius requested changes to this revision.Jun 21 2017, 8:27 PM

Apart from the style(9) bugs, looks good to me.

sys/sparc64/include/bus_dma.h
130 ↗(On Diff #29368)

According to style(9), this and the following functions/wrappers should start with an empty line each: "Insert an empty line if the function has no local variables".

178 ↗(On Diff #29368)

According to style(9), "Second level indents are four spaces", i. e. indent this wrapped line and those in the below functions/wrappers by 4 spaces relative to the beginning of "return" rather than trying to align with the arguments of the previous line.

sys/sys/bus_dma_internal.h
25 ↗(On Diff #29368)

There probably should be a comment line with $FreeBSD$ here.

sys/x86/include/bus_dma.h
24 ↗(On Diff #29368)

Ditto

This revision now requires changes to proceed.Jun 21 2017, 8:27 PM
jah updated this revision to Diff 29942.Jun 22 2017, 5:34 AM
jah edited edge metadata.

Style(9) fixes

marius accepted this revision.Jun 22 2017, 8:58 AM
This revision is now accepted and ready to land.Jun 22 2017, 8:58 AM
This revision was automatically updated to reflect the committed changes.