The function removes the range of addresses from GAS. Right now it is unused. The patch is extracted from the stalled work for integration DMAR and bhyve, see D25672. Having the function in the main tree would allow it to co-evolve with other active changes to the IOMMU driver. Requested by: alc
If you consider accepting D36010, which I just put out for review, you could implement this without calling insert or remove or having to have two allocated entries to plug in. You'd just have to pass a flag to this function to choose whether 'clip' modified the start field or the end field, then just modify that field and call RB_UPDATE_AUGMENT on the entry.
On second thought, there is a case when you'd need to have one spare entry. If the (start, end) you are removing lie within a single entry (that is, entry->start < start < end < entry->end), then you do need one new entry. You could copy entry into new_entry, change entry->end to start, update augmentation, change new_entry->start, to end, and insert that. So you'd need one allocated node in that case.
After RB_UPDATE_AUGMENT, I thought something like thiswould work for the clipping code.
I realize now, though, that I shouldn't have discarded nentry, as it should be the first argument to RB_FOREACH_FROM, and should be used in place of 'entry' everywhere in the loop.
Since you need to have a variable nentry anyway (see below), how about (to avoid old_end):
nentry = *r; *r = NULL; *nentry = *entry; nentry->start = end;
You don't really need a return, or a goto here. You'll skip the loop, and the post-loop test, quickly.
My patch contained an error which remains here. The first and last arguments to RB_FOREACH_FROM can't be the same. You need
nentry = entry;
or something similar.
Is this because you don't want to clip RMRR entries? If so, you could just write in line 644:
if (entry->start < start && (entry->flags & IOMMU_MAP_ENTRY_RMRR) == 0)
to stop clipping at the left boundary. I don't understand, though, why this requires putting in another loop ahead of the RB_FOREACH_FROM loop.
There are things about the latest revision that I don't understand. For one thing, I don't understand why a new entry is getting inserted when an entry spanning 'end' is encountered. That's not what happens around 'start'; there, a new entry is created only when the entry that spans 'start' also spans 'end'. If this change is good, for some reason, then I'd expect the splitting around 'start' to be the same, and not what we had back when we were deleting elements between 'start' and 'end'.
For another thing - if we're not looping for any reason other than to get to 'end', it's better to use RB_NFIND than to loop.
I don't really understand why this loop was gutted and a second loop is necessary to remove elements, but there must be one.
Problem with previous version WRT invalidation is that to unload we need the actual entry.
I changed the code to be a mix between the version you agreed with, and the last one. In particular, it has only one loop instead of two, over the removed range' entries. But I still create up to two clipped entries at start and end, which are unloaded and freed.
Here's what the clip-right function does.
If the intent is to remove everything in the range [5, 15), and there is an entry that covers [13, 17), then this function will shrink that to [15, 17) and create a new entry for [13, 15).
Here is what I think you want the clip-left function to do.
If the intent is to remove everything in the range [5, 15), and there is an entry that covers [3, 7), then this function will shrink that to [3,5) and create a new entry for [5,7).
Here is what the clip-left function actually does.
If the intent is to remove everything in the range [5,15), and there is an entry that covers [3,17), then this function will shrink that to [3,5) and create a new entry for [15,17). If there is an entry that covers [3,7), then this function will shrink it do [3,5) and create no new entries.
If clip_right returned true, then last is the same as what r2 was before clip_right was called. So you could null r2 here, after unmap, and avoid having a 'last' parameter.
iommu_domain_unload_entry() is going to directly or indirectly (through qi) call iommu_gas_rb_remove() a second time on each entry. Can a _PLACE entry actually be mapped, i.e., not be _UNMAPPED?
Rather than implementing this new synchronization mechanism, for now, I would suggest changing (1) iommu_gas_remove_unmap() so that it simply adds the entry to a TAILQ (and thus doesn't need to release and reacquire the domain lock) and (2) this function so that it calls iommu_domain_unload() on that TAILQ after the domain lock is released. Later, we could look at adding a new helper function to the arm64- and x86-specific code that might initiate the unmapping and IOTLB invalidation if the latter can be done asynchronously. Otherwise, this new function adds the entry to the above TAILQ for synchronous invalidation after the domain lock is released.
I now think that this is in fact not correct (it was not correct in the previous patch as well, but for different reason, which you pointed out). Imagine that some consumer calls iommu_gase_remove() on a range that intersects with our operating range. I do not see why this is incorrect.
Then, the second caller overrides the dmamap_link linkage for the participating map entries, if they are not yet processed by unload.
Agreed. To resolve this issue, I suggest adding an "in transition" flag to the entry. iommu_gas_remove_unmap() skips entries with the flag set, and sets it on entries that are added to the TAILQ.