This is the least elegant of the object concurrency patches. The basic notion is that the object can become busy, after which no new pages can become busy, although pages that are already busy may remain so. This preserves the existing style of only checking busy, without acquiring it, for dedicated blocks of code.
Here, object busy synchronizes vm_fault_soft_fast against pmap_remove* or changes to the valid bits of any constituent page in the superpage set. We could simply busy every page in the set and unbusy them at the end, as we would have for 512 individual mappings. That would be my preference, but there was concern over the cost. So instead we use the atomic object busy counter to emulate the old behavior, in which a busy check alone was sufficient.
This is the only consumer that still checks busy without acquiring it, so it is at present the only place that requires object busy. Object busy only synchronizes against other page busy consumers that do not hold the object lock; callers that do hold the object write lock are still serialized. Object busy may allow us to do vm_fault_soft_fast without holding the object lock across the pmap operation.
The implementation does allow a page to transiently become busy, notice the object busy count, and drop busy again. If the fault handler loses that race, it falls back to single-page mappings. I don't see a reasonable way around this. I think it's likely that a page which is becoming busy would drop out of the superpage set later anyway.
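To make the ordering concrete, here is a minimal userland sketch of the protocol described above, using C11 atomics. It is a model under stated assumptions, not the code in this patch: every `model_*` name and both struct layouts are hypothetical, and the real page busy state is more than a single flag. The checker bumps the object counter first and only then inspects the pages; a page busier sets its own flag first and only then inspects the object counter. With the default sequentially consistent atomics, at least one side must observe the other, so the two cannot both proceed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VM object and page; not the kernel structs. */
struct model_object {
	atomic_int busy;		/* object busy count */
};

struct model_page {
	atomic_bool busied;		/* simplified page busy flag */
	struct model_object *obj;
};

static void
model_object_busy(struct model_object *o)
{
	atomic_fetch_add(&o->busy, 1);
}

static void
model_object_unbusy(struct model_object *o)
{
	int old = atomic_fetch_sub(&o->busy, 1);

	assert(old > 0);
}

/*
 * Page busy acquire.  The page becomes busy first and the object count is
 * checked second, so the page may be transiently busy before backing out.
 */
static bool
model_page_trybusy(struct model_page *p)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&p->busied, &expected, true))
		return (false);		/* already busy */
	if (atomic_load(&p->obj->busy) != 0) {
		atomic_store(&p->busied, false);	/* drop busy again */
		return (false);
	}
	return (true);
}

/*
 * Check-only fast path.  On success the object stays busy and the caller
 * must call model_object_unbusy() once the mapping is done.  On failure
 * the object busy is dropped and the caller falls back to single pages.
 */
static bool
model_superpage_check(struct model_object *o, struct model_page *pages,
    size_t n)
{
	model_object_busy(o);
	for (size_t i = 0; i < n; i++) {
		if (atomic_load(&pages[i].busied)) {
			model_object_unbusy(o);
			return (false);
		}
	}
	return (true);
}
```

In this model the lost race shows up as `model_superpage_check()` returning false because it caught a page during the window between the compare-and-swap and the back-out in `model_page_trybusy()`. Nothing is corrupted in that case; the checker just gives up on the superpage.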