Eliminate some redundant calculation in iommu_gas_match one, and since callers invoke iommu_gas_match_insert if and only of iommu_gas_match_one returns true, merge match_insert into match_one. Merge lowermatch and uppermatch into one.
Details
In a GENERIC-NODEBUG kernel, the command netperf -H lip3 -t TCP_MAERTS -D 1 repeated 6 times on the modified code produced these lines: Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec
65536 32768 32768 10.01 3561.18
65536 32768 32768 10.00 3294.36
65536 32768 32768 10.00 3239.29
65536 32768 32768 10.00 3494.65
65536 32768 32768 10.00 3456.08
65536 32768 32768 10.00 3454.40
and on the unmodified code produced these lines: 65536 32768 32768 10.00 3544.47 65536 32768 32768 10.00 3298.67 65536 32768 32768 10.00 3464.01 65536 32768 32768 10.00 3259.26 65536 32768 32768 10.00 3448.33 65536 32768 32768 10.00 3420.01
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Delete uppermatch. Rename lowermatch to match. Put the find-next-element code into its own function. Reduces size to 8100 or so.
sys/dev/iommu/iommu_gas.c | ||
---|---|---|
333 | This part of the comment should be either removed or updated. | |
369–385 | This comment exactly duplicates the herald. One of them should be eliminated. | |
436–437 | This comment probably needs some expansion to explain that now we look up in the upper region. | |
472 | I believe you can remove these two lines now, and update the assert below to check that error == 0 || error == ENOMEM | |
496 | I cannot find this check in a new code. It cannot be an assert because drivers can specify highaddr above max domain width. |
Replace ummin/ummax with MIN/MAX.
Reading the busdma man page, I see that the addresses excluded by lowaddr and highaddr are those in (lowaddr, highaddr]. Put in '+1' in a few places to get those checks right.
Change ">=" to ">" in a few places to account for my new understanding that lowaddr can be allocated and highaddr cannot.
Pull all the match code up into find_space, which is otherwise a useless wrapper function.
sys/dev/iommu/iommu_gas.c | ||
---|---|---|
325–354 | The expected case is that boundary crossing is a non-issue. These conditional branches should not be repeated in that expected case. As such, this is a pessimization of the original code. |
Add braces, in a case where I thought any decent compiler could optimize-out an extra test, but evidently I'm not using a decent compiler.
sys/dev/iommu/iommu_gas.c | ||
---|---|---|
325–354 | I figured on the compiler working out that the tests would be redundant in the common case, but I was wrong. |
sys/dev/iommu/iommu_gas.c | ||
---|---|---|
355–363 | The compiler reloads a->entry over and over: 1ef2: 48 8b 47 28 movq 40(%rdi), %rax 1ef6: 4c 89 18 movq %r11, (%rax) 1ef9: 4b 8d 04 08 leaq (%r8,%r9), %rax 1efd: 48 05 ff 0f 00 00 addq $4095, %rax # imm = 0xFFF 1f03: 48 25 00 f0 ff ff andq $-4096, %rax # imm = 0xF000 1f09: 4c 01 d8 addq %r11, %rax 1f0c: 48 8b 4f 28 movq 40(%rdi), %rcx 1f10: 48 89 41 08 movq %rax, 8(%rcx) 1f14: 48 8b 47 28 movq 40(%rdi), %rax 1f18: c7 40 28 04 00 00 00 movl $4, 40(%rax) 1f1f: 48 8b 07 movq (%rdi), %rax 1f22: 48 8b 77 28 movq 40(%rdi), %rsi 1f26: 48 83 c0 68 addq $104, %rax 1f2a: 48 89 c7 movq %rax, %rdi 1f2d: e8 00 00 00 00 callq 0x1f32 <iommu_gas_match_one+0xf2> |
This isn't the right approach. It adds register pressure on the compiler to maintain the value for entry over the entire execution of "match one". Accordingly the compiler has to use another callee saves register and actually allocate some stack space. It's better to simply define a local "entry" and initialize it from a->entry right before RB tree insertion.
Put entry back into the struct-of-all-arguments. Pull it out into a local variable just before use. Do something similar with 'domain'.
sys/dev/iommu/iommu_gas.c | ||
---|---|---|
368–369 | 'next' would be more descriptive. |