The basic observation is that delayed invalidation is a page-local operation. If a page has a generation assigned, we only need to wait until the owner of that generation finishes.
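To make the idea concrete, here is a minimal userland sketch of a per-page generation wait. It is not the code from the patch: `invl_slot`, `page_md`, `INVL_NSLOTS`, the 32-bit generation type, and the `gen % INVL_NSLOTS` owner mapping are all hypothetical names and choices made for illustration. A page only looks at the slot that owns its generation and ignores every other delayed invalidation in flight.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define INVL_NSLOTS 2	/* hypothetical slot count, not taken from the patch */

/* Hypothetical owner slot: the generation currently in flight, 0 if idle. */
struct invl_slot {
	_Atomic uint32_t active_gen;
};

/* Hypothetical stand-in for the md part of struct vm_page. */
struct page_md {
	_Atomic uint32_t invl_gen;	/* 0 means no generation assigned */
};

static struct invl_slot invl_slots[INVL_NSLOTS];

/* Assumed mapping from a generation to the slot of its owner. */
static struct invl_slot *
invl_owner(uint32_t gen)
{
	return (&invl_slots[gen % INVL_NSLOTS]);
}

/* The owner opens a delayed invalidation block under a nonzero generation. */
static void
invl_start(uint32_t gen)
{
	atomic_store(&invl_owner(gen)->active_gen, gen);
}

/* The owner finishes; the slot goes idle only if it still holds gen. */
static void
invl_finish(uint32_t gen)
{
	uint32_t expected = gen;

	atomic_compare_exchange_strong(&invl_owner(gen)->active_gen,
	    &expected, 0);
}

/* Tag a page with the generation of the block that touched it. */
static void
page_assign_gen(struct page_md *md, uint32_t gen)
{
	atomic_store(&md->invl_gen, gen);
}

/* True while the owner of the page's generation has not finished. */
static bool
page_invl_pending(struct page_md *md)
{
	uint32_t gen = atomic_load(&md->invl_gen);

	return (gen != 0 && atomic_load(&invl_owner(gen)->active_gen) == gen);
}

/* Busy-wait for this page only; a kernel would yield or sleep instead. */
static void
page_invl_wait(struct page_md *md)
{
	while (page_invl_pending(md))
		;
}
```

In this model a page with no generation assigned never waits, and a page tagged with generation 1 stops waiting as soon as `invl_finish(1)` runs, even if generation 2 is still active.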
The drawback is that it increases the size of struct vm_page (its md part) by 4 bytes (actually 8).
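One plausible reading of "4 (actually 8)" is alignment padding: a 4-byte field appended to a structure that is 8-byte aligned rounds the total size up by 8. The sketch below only illustrates that effect. Both layouts are hypothetical stand-ins, not the real md_page definition, and the 32-bit field type is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical md layout before the change: two pointers and two ints. */
struct md_before {
	void	*list_first;
	void	**list_last;
	int	gen;
	int	mode;
};

/* Same layout with an assumed 32-bit generation field appended. */
struct md_after {
	void	*list_first;
	void	**list_last;
	int	gen;
	int	mode;
	uint32_t invl_gen;
};

/* Bytes added by the new field, including any tail padding. */
static size_t
md_growth(void)
{
	return (sizeof(struct md_after) - sizeof(struct md_before));
}
```

On a typical LP64 target such as amd64 the pointers force 8-byte alignment, so `md_growth()` reports 8 even though the field itself is 4 bytes wide.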
It is practically impossible to generate congestion for this code using buildworld. I reduced the number of slots to two, and I only get
```
vm.pmap.invl_busy_next_gen: 0
vm.pmap.invl_gen_wrapped: 0
vm.pmap.invl_wait: 2
vm.pmap.invl_busy_slots: 1465
```