Page MenuHomeFreeBSD

iommu_gas: avoid needless tree augmentations
ClosedPublic

Authored by dougm on Jun 15 2022, 9:05 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Dec 16, 6:32 AM
Unknown Object (File)
Tue, Dec 10, 1:04 PM
Unknown Object (File)
Sat, Dec 7, 5:16 PM
Unknown Object (File)
Fri, Dec 6, 11:13 AM
Unknown Object (File)
Fri, Dec 6, 8:29 AM
Unknown Object (File)
Mon, Dec 2, 4:13 PM
Unknown Object (File)
Sun, Dec 1, 12:04 AM
Unknown Object (File)
Sat, Nov 30, 9:03 AM
Subscribers

Details

Summary

iommu_gas_augment_entry updates a map entry element. Invoked as RB_AUGMENT in RB tree code, it is applied from the point where the tree is modified, all the way up to the root, and is also applied when rotation moves a node down in the tree.

There are several opportunities to invoke it less. This change looks for some of those opportunities. The automatic augmentation with every rotation is a mistake.

  1. In the double-rotate case of RB_INSERT_COLOR, augmentation after the first rotation is not necessary when the element being moved down the tree becomes a leaf. It was in the tree, and was a leaf, before the RB_INSERT operation began, and so recomputing augmentation for it would do nothing.
  1. In the final (possibly only) rotation of RB_REMOVE_COLOR, both the elements - the one moving up and the one moving down - end up in the path from the deletion point to the tree root, so there's no need to augment either of them here.
  1. In RB_REMOVE, when the right child of the removed node replaces it in tree, it began with a null left child. Replacement creates a non-NULL left child, and then rotation may put a NULL node back in that place. If that happens, start the augmenting-up-to-root with the parent of that node, since augmentation would do nothing.

Adjust to avoid these needless augmentations.

Test Plan

This shrinks iommu_gas.o by 208 bytes.

On lip3, for a test program (

) that uses no augmentation, this does not change the test binary:
$ sum rb_toy rb_toy.orig
13209 19 rb_toy
13209 19 rb_toy.orig

Unsurprisingly, they produce similar performance:

x times.orig
+ times
+-------------------------------------------------------------------------------------+
|                         x                                           +               |
|              x   +      x                                           +               |
|      +   x   x  ++ +    *+x      +x x    +    x       +     +       + +             |
|+++*  +  **  x*x+**x*x*  *** ++ xx*x+*x +x+x   x  + x  +  xx+* x  +++* *x*xx x x    x|
|            |__|______________M__M_A__A____________________|_|                       |
+-------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  48     61.469533     63.283309     62.139805     62.249214    0.51279665
+  48      61.39718     63.025671     62.064204     62.193492    0.52476965
No difference proven at 95.0% confidence

For a test program that maintains augmentation (compiled with -DTEST_AUGMENTATION), but (as in the first example) picks random values to insert in the tree, this change shrinks the code size:
$ size rb_toy rb_toy.orig

text   data   bss    dec      hex   filename
6014    472    24   6510   0x196e   rb_toy
6358    472    24   6854   0x1ac6   rb_toy.orig

and ministat reports a small performance improvement:

x times.orig
+ times
+-------------------------------------------------------------------------------------+
|                          +                                                          |
|          x       +       ++        x       +     x                                x |
|++        x   x   ++++   +++x      xx +*    + xx  xxx              +      +   x  xxx |
|++        xx xx + ++++ + ++** *xx  x* **++*x* xxxxxxx     +     + ++   ++ + ++xx xxxx|
|              |________|___M________A________MA___________|__________|               |
+-------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  48     87.376449     89.539418     88.406093     88.438795    0.66567261
+  48     87.096947      89.33828     87.893474     88.139996    0.63203003
Difference at 95.0% confidence
        -0.298799 +/- 0.263127
        -0.337859% +/- 0.297524%
        (Student's t, pooled s = 0.649069)

With augmentation, and with inserting the first free block (-DTEST_AUGMENTATION -DINSERT_FIRST_FREE), this patch reduces the executable size:

text   data   bss    dec      hex   filename
6695    480    24   7199   0x1c1f   rb_toy.orig
6351    480    24   6855   0x1ac7   rb_toy

and performance stats, processed by ministat show:

x times.orig
+ times
+-------------------------------------------------------------------------------------+
|                           +                 +                                       |
|                           +                 +                                       |
|                         x +      x          +                          +            |
|        +  x         x  +x++*+  x *        + +*                     x  ++x+  x       |
|++      + *xxx xxx   xx++**+** +*x*x  x+ x *****x***     +  x       xx ++*+ xx    x x|
|                   |_______________M_M_A_A_________________|__|                      |
+-------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  48     79.861774     81.958606     80.555018     80.719996    0.61400222
+  48     79.565552     81.681703     80.606281     80.678374    0.56207507
No difference proven at 95.0% confidence

So, improvement, sort of.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm requested review of this revision.Jun 15 2022, 9:05 PM
dougm created this revision.

Restore performance of augment function. Update augmentation before color-rebalancing in rb_insert/rb_delete.

Move the updates of augmentation data to RB_INSERT_COLOR and RB_REMOVE_COLOR, from RB_INSERT and RB_REMOVE, to avoid two separate climbs up toward the root.

Add a new RB_AUGMENT_CHECK macro, that is like RB_AUGMENT but can return a true/false value to indicate whether the entry was/wasn't changed. If it is not defined, provide a default definition based on RB_AUGMENT.

Add counters to iommu_gas_augment calls.

New performance results:
hw.iommu.gas_augments: 83631
65536 32768 32768 30.00 3744.75
hw.iommu.gas_augments: 164907934
65536 32768 32768 30.00 3732.75
hw.iommu.gas_augments: 330001651
65536 32768 32768 30.00 3723.69
hw.iommu.gas_augments: 494228573
65536 32768 32768 30.00 3714.62
hw.iommu.gas_augments: 659007416
65536 32768 32768 30.00 3715.91
hw.iommu.gas_augments: 823836555

Make it easier for a compiler to erase the final loops in RB_INSERT_COLOR and RB_REMOVE_COLOR when RB_AUGMENT_CHECK is false.

Drop some redundant assignments from the color functions.

Add to the tree.3 man page. Drop counters.

Make the third parameter to RB_ROTATE_{LEFT,RIGHT} an input parameter instead of an output parameter, to avoid a few pointer dereferences where we already have the value in a register. Dropping a new such assignments makes things a tiny bit faster (and the code a tiny bit bigger, because, who knows).

Update the augmentation function in a way that causes iommu_gas.o to shrink somewhat.

Merge-in overlapping changes. Add manpage symbolic link.

Stop trying to do augmentation and rebalancing on the same walk up the tree. Keep trying to shorten the augmentation walk. Resolve conflicts with overlapping merges.

Restore to working order by again moving augmentation before rebalancing.

Size goes from 8686 to 9166 on lip3.

dougm edited the summary of this revision. (Show Details)
dougm edited the test plan for this revision. (Show Details)

Big redo.

dougm edited the summary of this revision. (Show Details)
dougm edited the test plan for this revision. (Show Details)

Strip out the "stop climbing to root" parts. The rest has a measurable micro-benchmark benefit.

Tidy code in the remove_color code.

Change augmentation loops from while to do-while.

Put the REMOVE_COLOR augment loop in the parent!=NULL block.

Move all the RB_AUGMENT calls displaced from RB_ROTATE to the bottom of RB_*_COLOR, so that there aren't too many of them.

Update to resolve conflict with a recent checkin.

dougm edited the test plan for this revision. (Show Details)

Remove switch statements from RB_INSERT_COLOR that hurt performance on augmentation-free tests. Use ifs instead.

This revision is now accepted and ready to land.Aug 6 2022, 6:10 PM
This revision was automatically updated to reflect the committed changes.