Page MenuHomeFreeBSD

iommu_gas: add iommu_gas_remove()
Needs ReviewPublic

Authored by kib on Sun, Jul 31, 9:05 AM.

Details

Summary
The function removes the range of addresses from GAS.  Right now it is
unused.

The patch is extracted from the stalled work for integration DMAR and 
bhyve, see D25672. Having the function in the main tree would allow it
to co-evolve with other active changes to the IOMMU driver.

Requested by:   alc

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kib requested review of this revision.Sun, Jul 31, 9:05 AM
sys/dev/iommu/iommu_gas.c
661

I guess that you seek the first entry with end <= entry->end, based on the line 715 KASSERT. But that's what RB_NFIND does for you already. So I don't understand the purpose of the code from lines 643 to 660. Or I don't understand the purpose of the function.

Replace 'end' with 'start' and 'line 715' with 'line 689', and the same comment applies to the previous function.

Maybe a comment to explain these functions would help.

672

What if entry->end <= clip?

sys/dev/iommu/iommu_gas.c
742

lentry could be NULL, leading to segfault. Same for rentry.

kib marked 3 inline comments as done.Mon, Aug 1, 11:46 AM
kib added inline comments.
sys/dev/iommu/iommu_gas.c
661

I added the comments and supposedly fixed the bugs I could find by reading.

742

I added the safety checks, but practically it should not because there is always first_place and last_place entries in the GAS.

kib marked 2 inline comments as done.

Handle Doug' comments.
Clarify RB_FIND/RB_NFIND in tree(3), this is a separate commit.

share/man/man3/tree.3
533 ↗(On Diff #108777)

Not just any element, but the least (or first) element.

sys/dev/iommu/iommu_gas.c
606

So, you want to find the entry with
entry->start <= start < entry->end
if there is one, for the first entry with entry->start < start. Or NULL.

So that's just the first entry with entry->end < start.

That is the entry value returned by RB_FIND.

So you could just return that. Without lines 618 to 634.

617

If first_;lace and last_place entries in the GAS mean that entry is not NULL here, ever, you can KASSERT that instead of adding needless safety checks.

kib marked 3 inline comments as done.Mon, Aug 1, 4:57 PM
kib added inline comments.
sys/dev/iommu/iommu_gas.c
606

...
or the first entry with entry->end < start, or NULL (just correcting a type in your reply, I think).

I do not think that the _last_ entry with entry->start < start is it, it is that entry, or the one right after it.

Let me formulate it differently: it is the entry where we should either start the iteration over entries that needs to be removed, or the entry which must be split, and then the right half of it is the entry we start iteration with.

617

We usually do not assert for pointers not equal to NULL, hardware is good at catching such cases, esp. since we disabled mappings at zero and hardware has SMAP. bde@ was esp. against KASSERT(p != NULL...) kind of things.

I added asserts that start/end are below the domain top address.

kib marked 2 inline comments as done.

Man page edit: RB_NFIND returns first element ...
Add asserts that start/end are in the domain' GAS limit.

sys/dev/iommu/iommu_gas.c
620

Suppose RB_NFIND did not return NULL. Then we get here with start < entry->end.
We fail the line 621 test. If start == entry->start, we break and return entry. Otherwise, if start < entry->start, we find the previous entry. If there is no previous entry, we break and return entry. If there is a previous entry, I assume that nentry->end <= entry->start. If I'm wrong, and these ranges can overlap, then I have misunderstood pretty much everything. So is nentry->end <= start? Yes! Because RB_NFIND guarantees that entry is the first entry with entry->end > start. Se we break, and return entry.

So in every case where RB_NFIND returns a non-NULL entry, we end up returning that entry. So you could just return it without falling into the for-loop.

What if RB_NFIND returns NULL? Then you will walk the entire list looking for an entry satisfying start <= entry->start, never finding one and eventually returning NULL, which is what RB_NFIND gave you in the first place.

So, we're stuck. I tried to tell you that you didn't need the code after RB_NFIND, and you didn't believe me. Now I've tried to tell you that the code after RB_NFIND accompiishes nothing, and you won't believe me.

kib marked an inline comment as done.

Remove iommu_gas_find_left/right_clip_entry().

share/man/man3/tree.3
526 ↗(On Diff #108788)

macro returns the element in the tree equal to the provided
key, or
.Dv NULL
if there is no such element.

533 ↗(On Diff #108788)

macro returns the least element greater than or equal to the provided
key, or
.Dv NULL
if there is no such element.

sys/dev/iommu/iommu_gas.c
683

By "contains the ... address start", I assume you mean an entry with

entry->start <= start < entry->end

For fentry.start = fentry.end = start, you might get back an entry with entry->end = start. So you should add one to the fentry values to be sure you get back an entry with entry->end > start.

But maybe 'contains' means something else here. The comment should be clearer.

695

I don't think you need to do a find here. You can iterate the loop while entry->end < end, and when the loop is done, 'entry' will be the same as the rentry you're finding here, and you can clip or not at that point.

kib marked 4 inline comments as done.Tue, Aug 2, 12:26 AM
kib added inline comments.
sys/dev/iommu/iommu_gas.c
683

I am not sure what do you mean by 'clearer' there. The terminology for entry containing an address seems to be well-established, for instance there are more than five instances of it in vm_map.c.

kib marked an inline comment as done.

More Doug' suggestions

Do not forget to actually remove left half of the right-clipped entry.

sys/dev/iommu/iommu_gas.c
609

If you consider accepting D36010, which I just put out for review, you could implement this without calling insert or remove or having to have two allocated entries to plug in. You'd just have to pass a flag to this function to choose whether 'clip' modified the start field or the end field, then just modify that field and call RB_UPDATE_AUGMENT on the entry.

sys/dev/iommu/iommu_gas.c
726

How about

RB_FOREACH(entry, iommu_gas_entries_tree, &domain->rb_root) {

?

sys/dev/iommu/iommu_gas.c
609

On second thought, there is a case when you'd need to have one spare entry. If the (start, end) you are removing lie within a single entry (that is, entry->start < start < end < entry->end), then you do need one new entry. You could copy entry into new_entry, change entry->end to start, update augmentation, change new_entry->start, to end, and insert that. So you'd need one allocated node in that case.

After RB_UPDATE_AUGMENT, I thought something like this

would work for the clipping code.

I realize now, though, that I shouldn't have discarded nentry, as it should be the first argument to RB_FOREACH_FROM, and should be used in place of 'entry' everywhere in the loop.

Take Doug' patch, with some editing.

sys/dev/iommu/iommu_gas.c
644

old_end is needed because entry->end is overwritten above

648

This is goto out instead of return, to not skip INVARIANTS check

sys/dev/iommu/iommu_gas.c
639

Since you need to have a variable nentry anyway (see below), how about (to avoid old_end):
if (end < entry->end) {

nentry = *r;
*r = NULL;
*nentry = *entry;
nentry->start = end;

}
entry->end = start;
RB_UPDATE_AUGMENT(entry, rb_entry);
if (*r == NULL)

iommu_gas_rb_insert(domain, nentry);

You don't really need a return, or a goto here. You'll skip the loop, and the post-loop test, quickly.

653

My patch contained an error which remains here. The first and last arguments to RB_FOREACH_FROM can't be the same. You need

nentry = entry;
RB_FOREACH(entry, iommu_gas_entries_tree, nentry) {

or something similar.

kib marked 2 inline comments as done.

Eliminate old_end; use nentry.
Consistently skip RMRR entries.

sys/dev/iommu/iommu_gas.c
636

Is this because you don't want to clip RMRR entries? If so, you could just write in line 644:

if (entry->start < start && (entry->flags & IOMMU_MAP_ENTRY_RMRR) == 0)

to stop clipping at the left boundary. I don't understand, though, why this requires putting in another loop ahead of the RB_FOREACH_FROM loop.

kib marked an inline comment as done.

Remove RMRR skipping loop.

Fix RMRR check condition.
Remove the pointless assert from the main loop.

This revision is now accepted and ready to land.Thu, Aug 4, 4:13 PM
sys/dev/iommu/iommu_gas.c
609–610

It's not clear to me where the IOTLB invalidation will occur.

Handle iotlb invalidation.
Mark domain busy for the duration of _remove(), and prevent _map() from occuring while _remove() is processing (it drops the domain lock).

This revision now requires review to proceed.Thu, Aug 4, 10:15 PM
sys/dev/iommu/iommu_gas.c
659

There are things about the latest revision that I don't understand. For one thing, I don't understand why a new entry is getting inserted when an entry spanning 'end' is encountered. That's not what happens around 'start'; there, a new entry is created only when the entry that spans 'start' also spans 'end'. If this change is good, for some reason, then I'd expect the splitting around 'start' to be the same, and not what we had back when we were deleting elements between 'start' and 'end'.

For another thing - if we're not looping for any reason other than to get to 'end', it's better to use RB_NFIND than to loop.

I don't really understand why this loop was gutted and a second loop is necessary to remove elements, but there must be one.

kib marked an inline comment as done.Fri, Aug 5, 7:21 PM
kib added inline comments.
sys/dev/iommu/iommu_gas.c
659

Problem with previous version WRT invalidation is that to unload we need the actual entry.

I changed the code to be a mix between the version you agreed with, and the last one. In particular, it has only one loop instead of two, over the removed range' entries. But I still create up to two clipped entries at start and end, which are unloaded and freed.

kib marked an inline comment as done.

A version with the single loop over [start, end).

Here's what the clip-right function does.

If the intent is to remove everything in the range [5, 15), and there is an entry that covers [13, 17), then this function will shrink that to [15, 17) and create a new entry for [13, 15).

Here is what I think you want the clip-left function to do.

If the intent is to remove everything in the range [5, 15), and there is an entry that covers [3, 7), then this function will shrink that to [3,5) and create a new entry for [5,7).

Here is what the clip-left function actually does.

If the intent is to remove everything in the range [5,15), and there is an entry that covers [3,17), then this function will shrink that to [3,5) and create a new entry for [15,17). If there is an entry that covers [3,7), then this function will shrink it do [3,5) and create no new entries.

sys/dev/iommu/iommu_gas.c
715

If clip_right returned true, then last is the same as what r2 was before clip_right was called. So you could null r2 here, after unmap, and avoid having a 'last' parameter.

kib marked 5 inline comments as done.Sat, Aug 6, 11:14 AM

Try to fix _remove_clip_left().
Remove the last argument from _remove_clip_right().

Upload the right patch, some bits were only staged

Please go ahead and commit the changes to tree.3.

sys/dev/iommu/iommu_gas.c
662

I'm confused. Shouldn't this be ==?

sys/dev/iommu/iommu_gas.c
687–689

Am I correct in inferring that you introduced this synchronization because the domain lock is released and reacquired in the middle of the below RB_FOREACH_FROM loop?

kib marked 2 inline comments as done.Sat, Aug 6, 7:25 PM
kib added inline comments.
sys/dev/iommu/iommu_gas.c
662

Definitely.

687–689

Yes

kib marked 2 inline comments as done.

Fix condition for unmap

sys/dev/iommu/iommu_gas.c
669–670

iommu_domain_unload_entry() is going to directly or indirectly (through qi) call iommu_gas_rb_remove() a second time on each entry. Can a _PLACE entry actually be mapped, i.e., not be _UNMAPPED?

This revision was not accepted when it landed; it landed in state Needs Review.Sun, Aug 7, 7:25 PM
Closed by commit rG624e5dc0ecf6: tree.3: fix markup (authored by kib). · Explain Why
This revision was automatically updated to reflect the committed changes.
kib added inline comments.
sys/dev/iommu/iommu_gas.c
669–670

I do not think that PLACE can be mapped. I added assert and removed calculation of the argument. Also the iommu_gas_rb_remove() call is eliminated.

Do not call iommu_gas_rb_remove().
Assert that _PLACE is not set.

sys/dev/iommu/iommu_gas.c
687–689

Rather than implementing this new synchronization mechanism, for now, I would suggest changing (1) iommu_gas_remove_unmap() so that it simply adds the entry to a TAILQ (and thus doesn't need to release and reacquire the domain lock) and (2) this function so that it calls iommu_domain_unload() on that TAILQ after the domain lock is released. Later, we could look at adding a new helper function to the arm64- and x86-specific code that might initiate the unmapping and IOTLB invalidation if the latter can be done asynchronously. Otherwise, this new function adds the entry to the above TAILQ for synchronous invalidation after the domain lock is released.

kib marked 2 inline comments as done.

Use iommu_domain_unload() instead of rolling BUSY.