Page MenuHomeFreeBSD

Refactor the AIO code.
ClosedPublic

Authored by jhb on Feb 16 2016, 1:04 AM.
Tags
None
Referenced Files
Unknown Object (File)
Nov 26 2024, 4:16 PM
Unknown Object (File)
Nov 26 2024, 4:16 PM
Unknown Object (File)
Nov 26 2024, 4:16 PM
Unknown Object (File)
Nov 26 2024, 4:16 PM
Unknown Object (File)
Nov 26 2024, 4:14 PM
Unknown Object (File)
Nov 26 2024, 3:53 PM
Unknown Object (File)
Nov 16 2024, 12:50 PM
Unknown Object (File)
Oct 20 2024, 4:51 PM
Subscribers

Details

Summary

Refactor the AIO subsystem to permit file-type-specific
implementations and improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.

Test Plan
  • Run the various aio regression tests both with enable_unsafe=0 and enable_unsafe=1
  • Run the 'fio' benchmark with AIO against a raw disk device (this previously caught an accounting bug that caused all AIO requests to be disabled once 1024 raw disk requests had been queued).

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jhb retitled this revision from to Refactor the AIO code..
jhb updated this object.
jhb edited the test plan for this revision. (Show Details)
jhb added reviewers: jilles, ngie, kib.
jhb added subscribers: np, glebius.
cem added a reviewer: cem.
cem added a subscriber: cem.

I only skimmed the meat of the AIO changes. I'm not super familiar with them yet. I don't see any major red flags.

share/man/man4/aio.4
49–50 ↗(On Diff #13339)

IO on disk-backed vnodes also shouldn't block indefinitely. Is the concern that some non-local filesystems (NFS) may block indefinitely on network partitions? In which case, maybe we want to push the safe/unsafe decision down to individual mountpoints?

77 ↗(On Diff #13339)

Maybe have the sysctl in HZ-independent units, like seconds or milliseconds?

sys/kern/sys_socket.c
446 ↗(On Diff #13339)

The AIO pools seem to share a lot in common with a taskqueue. Could the thread min/max / work-driven scaling be generalized to taskqueues, and then soaio could just use an instance of that?

460–461 ↗(On Diff #13339)

Can we use more distinct wchan strings than "-"?

489–496 ↗(On Diff #13339)

Shouldn't we bump soaio_starting before dropping the lock?

499–508 ↗(On Diff #13339)

Could soaio_kproc_loop race to assert and decrement soaio_starting?

752 ↗(On Diff #13339)

nit: (flags & RUNNING) == 0

sys/kern/vfs_aio.c
348–354 ↗(On Diff #13339)

Should this just become a SYSINIT()?

621 ↗(On Diff #13339)

nit: (flags & XXX) != 0

This revision is now accepted and ready to land.Feb 17 2016, 4:43 AM
jhb marked 2 inline comments as done.Feb 18 2016, 6:31 PM
jhb added inline comments.
share/man/man4/aio.4
49–50 ↗(On Diff #13339)

devfs makes this far messier (individual devices might misbehave). I think the solution is for Someone(tm) to write a vnode-specific fo_aio_queue and ensure it DTRT. Then they will be exempt from this check. kib@ has already said he might look at this for filesystems that use the buffer cache. However, I think that a vn_aio_queue() should probably be in a separate change as this is already large enough.

77 ↗(On Diff #13339)

I will probably change this as a followup to use msec's (like many network sysctls). However, for now I'm matching the existing sysctl (both use the same value to set their default).

sys/kern/sys_socket.c
446 ↗(On Diff #13339)

As mentioned on IRC, AIO requires kprocs not kthreads. In particular, aio tasks can switch vmspace, and the main loop of an AIO daemon reverts back to it's original vmspace before going idle. It's actually this last bit (having custom logic before atomically going idle) that makes it hard to just have this function use taskqueue_run directly.

460–461 ↗(On Diff #13339)

As a policy idle kernel threads use "-" so that you can distinguish "I'm just idle waiting for work" from "I'm waiting on an actual resource". Some of the older daemons like pagedaemon predate this convention so they still use names, but more recently added threads all follow this. For example, taskqueue threads use "-" when idle.

489–496 ↗(On Diff #13339)

Yes. I was probably trying to avoid having to clean up if the kproc_create() failed, but that is fine to handle.

499–508 ↗(On Diff #13339)

Yes, as above.

752 ↗(On Diff #13339)

It depends on who you ask. :) Testing flags in a flags field is a boolean condition (compared with checking for a specific value of a multiple bit field). Some of style(9) is implicit from examples. Note this bit of sample code:

k = !(l & FLAGS);

The context is about using additional ()'s to avoid confusion, it is not saying that flag tests as booleans is bad. OTOH, the section in style(9) that talks about explicitly comparing refers to comparing pointers against NULL and use of ! for other non-boolean comparisons such as comparing a char against '\0'.

In the past bde@ has preferred to treat 'flags & RUNNING' as a boolean and thus not require the explicit comparison against 0.

sys/kern/vfs_aio.c
348–354 ↗(On Diff #13339)

Eventually, yes. I did not make that large of a change to avoid making the diff even larger. As a followup I will make the AIO system calls "STD", remove all of the SYSCALL_INIT_HELPER stuff, etc. As part of that aio_onceonly would get smaller and what is left would become a SYSINIT.

jhb edited edge metadata.
jhb marked 2 inline comments as done.
  • Bump soaio_starting before the new daemon starts.
This revision now requires review to proceed.Feb 18 2016, 6:32 PM
jhb edited edge metadata.

Rebase after sockbuf locking notes committed.

share/man/man4/aio.4
49–50 ↗(On Diff #13431)

That's fine.

77 ↗(On Diff #13431)

Ok.

sys/kern/sys_socket.c
460–461 ↗(On Diff #13431)

And it makes it hard to figure out which idle state they're blocked on when they aren't processing the work I've queued. It's obnoxious, I like named wait states. :)

sys/kern/vfs_aio.c
348–354 ↗(On Diff #13431)

Ok.

sys/kern/sys_socket.c
460–461 ↗(On Diff #13431)

Well, you know "-" always means "idle". Any named state means it's actually blocked on something real. :) Previously we had 27 different spellings of "idle" which made it hard when looking at a set of kernel threads to know which ones were actually blocked on something real (a lock, some sort of resource or event, etc.) In theory "-" should mean "you can ignore this".

jilles edited edge metadata.
jilles added inline comments.
share/man/man4/aio.4
83 ↗(On Diff #13431)

missing space: is used.

This revision is now accepted and ready to land.Feb 29 2016, 10:49 PM
jhb marked an inline comment as done.Mar 1 2016, 12:47 AM
This revision was automatically updated to reflect the committed changes.