shm_open2: Implement SHM_GROW_ON_WRITE
ClosedPublic
Actions

Authored by kevans on Jun 29 2020, 3:53 AM.

Details

Reviewers

kib

Commits

rS363066: memfd_create: turn on SHM_GROW_ON_WRITE
rS363065: shm_open2: Implement SHM_GROW_ON_WRITE

Summary

Lack of SHM_GROW_ON_WRITE is actively breaking Python tests, so go ahead and implement it. memfd_create will always set SHM_GROW_ON_WRITE, to match Linux behavior.

The basic premise is pretty simple, write(2) and friends should be allowed to grow the file, assuming F_SEAL_GROW has not been applied. Grab the total extent of the write (uio->uio_offset should be valid after foffset_lock_uio even if (flags & FOF_OFFSET) == 0) and quickly extend the file prior to writing if the write extends beyond the current shm_size.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kevans created this revision.Jun 29 2020, 3:53 AM

Herald added a subscriber: imp. · View Herald TranscriptJun 29 2020, 3:53 AM

kevans requested review of this revision.Jun 29 2020, 3:53 AM

Harbormaster completed remote builds in B32031: Diff 73855.Jun 29 2020, 3:53 AM

koobs added a subscriber: koobs.Jun 29 2020, 3:56 AM

kib added inline comments.Jun 29 2020, 5:25 AM

sys/kern/uipc_shm.c
318 ↗	(On Diff #73855)	The type cannot be size_t, it should be off_t, otherwise it wraps on 32bit arches.
327 ↗	(On Diff #73855)	I think this should be checked for overflow.

kevans added inline comments.Jun 29 2020, 5:52 AM

sys/kern/uipc_shm.c
327 ↗	(On Diff #73855)	Sure, this seems like a good idea. I guess for SHM_GROW_ON_WRITE we should probably return EFBIG on error, and if we're not growing from write we just clamp it to shmfd->shm_size? Previously we would've overflowed when grabbing the rangelock, but I suppose we're more likely to overflow in the !FOF_OFFSET case than offset-supplied.

Unwhoops the datatype for size

Do an overflow check for the calculation; if we would've wrapped before, I suppose we'd have likely ended up with an invalid range for the rangelock... the new version will rangelock from uio->uio_offset to shmfd->shm_size in that particular case instead of offset, but perhaps it'd be better to use OFF_MAX instead of shmfd->shm_size? It's a bit of an edge-case, I think.

Harbormaster completed remote builds in B32044: Diff 73882.Jun 29 2020, 7:47 PM

I realized that we do not check for the overflow in vn_io_fault() either.
I am fine with this commit as-is, and you could fix vn_io_fault() (if needed) later.
For instance, despite no check on filedesc/VFS layer, ffs_write() eventually does

	if ((uoff_t)uio->uio_offset + uio->uio_resid > fs->fs_maxfilesize)
		return (EFBIG);

which probably is good enough to avoid corruption for FFS.
But rangelocks are probably not immune.

This revision is now accepted and ready to land.Jun 30 2020, 6:11 AM

In D25502#563799, @kib wrote:
I realized that we do not check for the overflow in vn_io_fault() either.
I am fine with this commit as-is, and you could fix vn_io_fault() (if needed) later.
For instance, despite no check on filedesc/VFS layer, ffs_write() eventually does
	if ((uoff_t)uio->uio_offset + uio->uio_resid > fs->fs_maxfilesize)
		return (EFBIG);
which probably is good enough to avoid corruption for FFS.
But rangelocks are probably not immune.

Sure- I'll check out vn_io_fault later. I'm somewhat unsure about what to do there; it seems like in some cases (e.g. shmfd) it could make sense to allow the overflow because the file won't grow like most traditional filesystems so we'll just truncate the write to the pre-allocated size of the file. It would be simpler to not care about shmfd being an odd duck.

For this current diff, it occurs to me that this specific instance should be checking for overflow of SIZE_MAX rather than OFF_MAX since shm_size is a size_t. I'll make that tweak pre-commit.

In D25502#563934, @kevans wrote:
In D25502#563799, @kib wrote:
I realized that we do not check for the overflow in vn_io_fault() either.
I am fine with this commit as-is, and you could fix vn_io_fault() (if needed) later.
For instance, despite no check on filedesc/VFS layer, ffs_write() eventually does
	if ((uoff_t)uio->uio_offset + uio->uio_resid > fs->fs_maxfilesize)
		return (EFBIG);
which probably is good enough to avoid corruption for FFS.
But rangelocks are probably not immune.
Sure- I'll check out vn_io_fault later. I'm somewhat unsure about what to do there; it seems like in some cases (e.g. shmfd) it could make sense to allow the overflow because the file won't grow like most traditional filesystems so we'll just truncate the write to the pre-allocated size of the file. It would be simpler to not care about shmfd being an odd duck.

vn_io_fault() is not used for shmfd.

My general concern there is that overflow for end causes rangelocks to misbehave. This is besides a possibility that some filesystems might be not prepared to this as well.

For this current diff, it occurs to me that this specific instance should be checking for overflow of SIZE_MAX rather than OFF_MAX since shm_size is a size_t. I'll make that tweak pre-commit.

Uhh. So the check in shm_dotruncate_locked() is boosted, and also we could have mis-synchronization between shm_object size and shm_size.

In D25502#564145, @kib wrote:
In D25502#563934, @kevans wrote:
In D25502#563799, @kib wrote:
I realized that we do not check for the overflow in vn_io_fault() either.
I am fine with this commit as-is, and you could fix vn_io_fault() (if needed) later.
For instance, despite no check on filedesc/VFS layer, ffs_write() eventually does
	if ((uoff_t)uio->uio_offset + uio->uio_resid > fs->fs_maxfilesize)
		return (EFBIG);
which probably is good enough to avoid corruption for FFS.
But rangelocks are probably not immune.
Sure- I'll check out vn_io_fault later. I'm somewhat unsure about what to do there; it seems like in some cases (e.g. shmfd) it could make sense to allow the overflow because the file won't grow like most traditional filesystems so we'll just truncate the write to the pre-allocated size of the file. It would be simpler to not care about shmfd being an odd duck.
vn_io_fault() is not used for shmfd.

My general concern there is that overflow for end causes rangelocks to misbehave. This is besides a possibility that some filesystems might be not prepared to this as well.

Oh, whoops, of course- my bad.

For this current diff, it occurs to me that this specific instance should be checking for overflow of SIZE_MAX rather than OFF_MAX since shm_size is a size_t. I'll make that tweak pre-commit.

Uhh. So the check in shm_dotruncate_locked() is boosted, and also we could have mis-synchronization between shm_object size and shm_size.

I'll (separately) add a check for length > SIZE_MAX in shm_dotruncate_locked and kick back an EFBIG there; is there another source of mis-synchronization for shm_object size and shm_size, or are you generally referring to the truncation of shm_size that could happen when assigning the off_t length to 32-bit quantity?

In D25502#564805, @kevans wrote:

In D25502#564145, @kib wrote:

In D25502#563934, @kevans wrote:

For this current diff, it occurs to me that this specific instance should be checking for overflow of SIZE_MAX rather than OFF_MAX since shm_size is a size_t. I'll make that tweak pre-commit.

Uhh. So the check in shm_dotruncate_locked() is boosted, and also we could have mis-synchronization between shm_object size and shm_size.

I'll (separately) add a check for length > SIZE_MAX in shm_dotruncate_locked and kick back an EFBIG there; is there another source of mis-synchronization for shm_object size and shm_size, or are you generally referring to the truncation of shm_size that could happen when assigning the off_t length to 32-bit quantity?

I am not sure why check for SIZE_MAX and not change shm_size type to either off_t or better vm_ooffset_t.

In D25502#564823, @kib wrote:

In D25502#564805, @kevans wrote:

In D25502#564145, @kib wrote:

In D25502#563934, @kevans wrote:

For this current diff, it occurs to me that this specific instance should be checking for overflow of SIZE_MAX rather than OFF_MAX since shm_size is a size_t. I'll make that tweak pre-commit.

Uhh. So the check in shm_dotruncate_locked() is boosted, and also we could have mis-synchronization between shm_object size and shm_size.

I'll (separately) add a check for length > SIZE_MAX in shm_dotruncate_locked and kick back an EFBIG there; is there another source of mis-synchronization for shm_object size and shm_size, or are you generally referring to the truncation of shm_size that could happen when assigning the off_t length to 32-bit quantity?

I am not sure why check for SIZE_MAX and not change shm_size type to either off_t or better vm_ooffset_t.

My recollection was that we can't really change struct shmfd like that on stable without KBI break, and we should probably fix it at least on stable/12; leading to "ok, just validate it for the MFC then we can drop the check and expand shm_size after."

In D25502#564824, @kevans wrote:

My recollection was that we can't really change struct shmfd like that on stable without KBI break, and we should probably fix it at least on stable/12; leading to "ok, just validate it for the MFC then we can drop the check and expand shm_size after."

Can you point me to the discussion, please ? I see struct shmfd as internal to uipc_shm.c. There are some uses in kern_sendfile.c but this is still internal to the kernel. There are more references to it in mac module, but there I think it is either 'do not care' or you can put large shm_size at the end of the structure for stable/12.

No discussion, just my own previous inspection when I was doing sealing stuff. Looking again, I see that you're correct -- the only thing that would seemingly break is libprocstat's procstat_get_shm_info_kvm.

Closed by commit rS363065: shm_open2: Implement SHM_GROW_ON_WRITE (authored by kevans). · Explain WhyJul 10 2020, 12:43 AM

This revision was automatically updated to reflect the committed changes.

kevans added a commit: rS363065: shm_open2: Implement SHM_GROW_ON_WRITE.

kevans added a commit: rS363066: memfd_create: turn on SHM_GROW_ON_WRITE.