Currently, aio_return() and aio_waitcomplete() update rusage counters for the caller's thread, which (a) may not be the one that initiated the IO and (b) happens some arbitrary time later. This patch updates the rusage for the submitting thread as soon as the IO completes. Extracted from D33144 for separate review.
It would be nice to add a test case that ensures resource accounting happens on the correct thread.
This is racy unless you use acquire/release semantics. You should use atomic_load_acq_int here and below and everywhere this variable is modified must use atomic_store_rel_int or similar.
If unsafe AIO is enabled, then this loop could block indefinitely. That's not very good. Is there a way to allow a thread to exit even in this case? What about if it it nulls out the td field for every active job of this thread, and then aio_complete checks whether that field is null before adjusting its counters? For that matter, AIO's resource usage isn't really associated with any one thread. The I/O itself doesn't happen in a thread's context. So it would be reasonable to account for it not in the submitting thread's td_ru structure but in the process's p_ru structure.
Is it safe to acquire job->td's thread lock here? If so you could eliminate the atomic ops, and you could also eliminate the wait for all I/O to complete in aio_thread_exit.
I think it's good style to load these variables using atomic ops if you're going to store them that way, even if the assembly code will be the same. You should define RU_ATOMIC_LOAD and use it everywhere these variables are read. And that includes the struct copy in the penultimate line of rufetchadd.
Why are you sure that a zero timeout will be sufficient? I think you should use a NULL timeout instead, to wait forever if necessary.