Page MenuHomeFreeBSD

add proc_readmem() and proc_writemem() interfaces
ClosedPublic

Authored by markj on Nov 22 2015, 1:23 AM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 28 2024, 10:49 AM
Unknown Object (File)
Feb 28 2024, 9:03 AM
Unknown Object (File)
Jan 21 2024, 11:01 PM
Unknown Object (File)
Dec 20 2023, 12:18 AM
Unknown Object (File)
Dec 1 2023, 5:44 PM
Unknown Object (File)
Nov 14 2023, 8:50 AM
Unknown Object (File)
Nov 6 2023, 7:07 PM
Unknown Object (File)
Nov 6 2023, 5:36 PM

Details

Summary

At the moment, the only API for reading and writing an arbitrary
process' vmspace is proc_rwmem(). This function uses a uio to describe
the operation, so every caller must fill out a struct uio, which is
onerous. I propose adding a simpler interface on top of this which allow
for the common case of reading or writing a single kernel buffer.
Specifically, this change adds proc_readmem(9) and proc_writemem(9).

One detail I considered is whether the interface should support short
reads or writes. However, these cannot occur with the current
implementation of proc_rwmem(9), and no callers really check for this
condition anyway. Moreover, truncated I/Os would seem to be a
counterintuitive possibility since proc_rwmem(9) operates on
byte-granular memory rather than blocks. So the API does not support
them (and adds assertions to this effect).

The change also adds a manual page for these functions.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj retitled this revision from to add proc_readmem() and proc_writemem() interfaces.
markj edited the test plan for this revision. (Show Details)
markj updated this object.
markj added reviewers: jhb, kib, mjg.
bjk added inline comments.
share/man/man9/proc_rwmem.9
40 ↗(On Diff #10410)

You can avoid these long lines by using Fo, Fa, and Fc on separate lines.

Use function blocks to avoid overly-long lines, per bjk.

Short I/O is indeed impossible in your implementation, but this is a consequence of the implementation which would also make changing the implementation of proc_rwmem() quite hard. Usual Unix API design is to return success if any bytes where moved, and only if no i/o happen, error could be returned. For proc_rwmem() this was not done, probably because the kernel function can (and does) return both error and decrement uio_resid. I would prefer that this detail not leaked into your functions.

Another thing which I do not quite like is the PROC_HOLD() added to proc_read/writemem. E.g., for kern_proc.c, you get a recursion on hold count. Generally, the caller must already own the hold count on the process it is operating on, otherwise the action can be aborted at any time due to the process exiting. The recursion seems harmless because PHOLD() is not blocking, but still I dislike it.

Overall, I think that the helpers are useful and should be added.

In D4245#89542, @kib wrote:

Short I/O is indeed impossible in your implementation, but this is a consequence of the implementation which would also make changing the implementation of proc_rwmem() quite hard. Usual Unix API design is to return success if any bytes where moved, and only if no i/o happen, error could be returned. For proc_rwmem() this was not done, probably because the kernel function can (and does) return both error and decrement uio_resid. I would prefer that this detail not leaked into your functions.

My argument was that any API which modifies a process' address space isn't really implementing an "I/O operation" at all, at least in the usual sense. IMHO, the only reason proc_rwmem() exposes the possibility of short I/O is because it uses uio(9), which is designed to be used for true I/O operations where this is indeed possible.

Would you be satisfied with an optional ssize_t * parameter to proc_readmem/writemem that allows the caller to determine the number of bytes read or written? That is, all current callers would pass NULL, but in the future they could be modified to check for short I/Os without modifying the KPI.

Another thing which I do not quite like is the PROC_HOLD() added to proc_read/writemem. E.g., for kern_proc.c, you get a recursion on hold count. Generally, the caller must already own the hold count on the process it is operating on, otherwise the action can be aborted at any time due to the process exiting. The recursion seems harmless because PHOLD() is not blocking, but still I dislike it.

Hm, I don't follow. proc_readmem/writemem don't modify the hold count. They do the same thing as proc_rwmem() and require that the caller hold the process.

In D4245#89661, @markj wrote:
In D4245#89542, @kib wrote:

Short I/O is indeed impossible in your implementation, but this is a consequence of the implementation which would also make changing the implementation of proc_rwmem() quite hard. Usual Unix API design is to return success if any bytes where moved, and only if no i/o happen, error could be returned. For proc_rwmem() this was not done, probably because the kernel function can (and does) return both error and decrement uio_resid. I would prefer that this detail not leaked into your functions.

My argument was that any API which modifies a process' address space isn't really implementing an "I/O operation" at all, at least in the usual sense. IMHO, the only reason proc_rwmem() exposes the possibility of short I/O is because it uses uio(9), which is designed to be used for true I/O operations where this is indeed possible.

Would you be satisfied with an optional ssize_t * parameter to proc_readmem/writemem that allows the caller to determine the number of bytes read or written? That is, all current callers would pass NULL, but in the future they could be modified to check for short I/Os without modifying the KPI.

No, this is probably too burdensome for the consumers of the KPI.

First, note that proc_rwmem() current implementation is somewhat unnatural. Assume, just for the discussion, that somebody asked to read two aligned consequtive pages of the process memory. Also, suppose that the first page is mapped, while second page is not. Then, depending on how the request is split in the uio vector, you either get one page read, and uio_resid advanced by a page, or not. In any case, you get EFAULT.

IMHO, the right interface there is to return short read and no error, if both io and error occured. Changing proc_rwmem() is outside the scope of your patch. But, for the new functions, IMO it is reasonable to provide the reasonable interface. In other words, I would remove the KASSERT(), and return the count of bytes moved if any, instead of a concurrent error.

Another thing which I do not quite like is the PROC_HOLD() added to proc_read/writemem. E.g., for kern_proc.c, you get a recursion on hold count. Generally, the caller must already own the hold count on the process it is operating on, otherwise the action can be aborted at any time due to the process exiting. The recursion seems harmless because PHOLD() is not blocking, but still I dislike it.

Hm, I don't follow. proc_readmem/writemem don't modify the hold count. They do the same thing as proc_rwmem() and require that the caller hold the process.

My bad, I misread the patch, mixing different fragments without re-checking my memory. In fact, the complaint is probably still valid, but directed at the fasttrap_isa.c functions (which is unrelated, but highlighted by the patch).

In D4245#89699, @kib wrote:
In D4245#89661, @markj wrote:
In D4245#89542, @kib wrote:

Short I/O is indeed impossible in your implementation, but this is a consequence of the implementation which would also make changing the implementation of proc_rwmem() quite hard. Usual Unix API design is to return success if any bytes where moved, and only if no i/o happen, error could be returned. For proc_rwmem() this was not done, probably because the kernel function can (and does) return both error and decrement uio_resid. I would prefer that this detail not leaked into your functions.

My argument was that any API which modifies a process' address space isn't really implementing an "I/O operation" at all, at least in the usual sense. IMHO, the only reason proc_rwmem() exposes the possibility of short I/O is because it uses uio(9), which is designed to be used for true I/O operations where this is indeed possible.

Would you be satisfied with an optional ssize_t * parameter to proc_readmem/writemem that allows the caller to determine the number of bytes read or written? That is, all current callers would pass NULL, but in the future they could be modified to check for short I/Os without modifying the KPI.

No, this is probably too burdensome for the consumers of the KPI.

First, note that proc_rwmem() current implementation is somewhat unnatural. Assume, just for the discussion, that somebody asked to read two aligned consequtive pages of the process memory. Also, suppose that the first page is mapped, while second page is not. Then, depending on how the request is split in the uio vector, you either get one page read, and uio_resid advanced by a page, or not. In any case, you get EFAULT.

IMHO, the right interface there is to return short read and no error, if both io and error occured. Changing proc_rwmem() is outside the scope of your patch. But, for the new functions, IMO it is reasonable to provide the reasonable interface. In other words, I would remove the KASSERT(), and return the count of bytes moved if any, instead of a concurrent error.

Ok, that makes sense to me.

Another thing which I do not quite like is the PROC_HOLD() added to proc_read/writemem. E.g., for kern_proc.c, you get a recursion on hold count. Generally, the caller must already own the hold count on the process it is operating on, otherwise the action can be aborted at any time due to the process exiting. The recursion seems harmless because PHOLD() is not blocking, but still I dislike it.

Hm, I don't follow. proc_readmem/writemem don't modify the hold count. They do the same thing as proc_rwmem() and require that the caller hold the process.

My bad, I misread the patch, mixing different fragments without re-checking my memory. In fact, the complaint is probably still valid, but directed at the fasttrap_isa.c functions (which is unrelated, but highlighted by the patch).

Oh, yeah, the current usage of the hold count (as well as proc locking) in fasttrap isn't correct. I have some patches that attempt to fix it, but that'll be a separate change.

sys/kern/kern_proc.c
1540 ↗(On Diff #10414)

I realize this is the existing code (and unrelated to this change), but reading a page at a time would seem to be more efficient than a byte at a time.

sys/kern/sys_process.c
1083 ↗(On Diff #10414)

After this change, there is basically no shared code now between PT_READ_* and PT_WRITE_*. It might be cleaner to remove the FALLTHROUGH and move the write case up into the PT_WRITE_* block instead.

markj edited edge metadata.

Rework the proc_readmem/writemem interface per kib's suggestion.

Have the new subroutines return the number of bytes read or written
rather than a binary success/failure status. If an error occurs, i.e.
proc_rwmem() returned a non-zero status and no bytes were read or
written, we return -1.

The behaviour of existing callers is preserved. That is, in most cases
we return an error if a short read or write occurs.

proc_read_string() is simplified somewhat: as the comment notes, we may
get a short read if one of the requested chunks falls in an unmapped
page. We now return success in this case rather than resorting to
single-byte reads.

This change also simplifies handling of PT_READ_* and PT_WRITE_*, per
jhb.

sys/kern/kern_proc.c
1535–1537 ↗(On Diff #10751)

This is addressed by the most recent change, since we now allow short reads.

sys/kern/sys_process.c
1092–1093 ↗(On Diff #10751)

I think so too; the latest change splits them.

jhb edited edge metadata.

I think this looks fine to me.

This revision is now accepted and ready to land.Dec 4 2015, 10:39 PM
kib edited edge metadata.
markj edited edge metadata.

Fix some small mistakes:

  • td->proc becomes td->td_proc in mips/pm_machdep.c
  • proc_readmem() and proc_writemem() return ssize_t, not int
This revision now requires review to proceed.Dec 7 2015, 7:23 PM
kib edited edge metadata.
This revision is now accepted and ready to land.Dec 7 2015, 7:39 PM
This revision was automatically updated to reflect the committed changes.