
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
Needs ReviewPublic

Authored by khng on Jan 26 2021, 11:10 AM.

Details

Summary

Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).

fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) takes cmd and flags parameters that specify the
space management operation to be performed. Currently cmd can only be
SPACECTL_DEALLOC, and flags must be 0 for the SPACECTL_DEALLOC
operation.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Submitted by: Ka Ho Ng <khng@freebsdfoundation.org>
Sponsored by: The FreeBSD Foundation
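
The summary mentions a trivial implementation of VOP_DEALLOCATE(9), and the later timeline entries describe it as zero-filling. As a rough userspace illustration of that idea only (this is not the kernel code from the diff), the sketch below overwrites a byte range with zeros and never extends the file. The function name zero_fill_range, the 4096-byte chunk size, and the choice to return EINVAL for len <= 0 are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace model of a zero-filling fallback; NOT the code
 * from this revision.  Overwrites [offset, offset + len) with zeros,
 * clamped to the current file size so the file is never extended.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
zero_fill_range(int fd, off_t offset, off_t len)
{
	static const char zeros[4096];	/* static storage is zero-initialized */
	struct stat sb;
	off_t end;

	/* Assumption: reject empty or negative ranges. */
	if (offset < 0 || len <= 0) {
		errno = EINVAL;
		return (-1);
	}
	if (fstat(fd, &sb) == -1)
		return (-1);
	if (offset >= sb.st_size)
		return (0);		/* nothing inside the file to clear */

	/* Clamp the end to EOF without overflowing offset + len. */
	end = (len > sb.st_size - offset) ? sb.st_size : offset + len;

	while (offset < end) {
		size_t chunk = sizeof(zeros);
		ssize_t n;

		if ((off_t)chunk > end - offset)
			chunk = (size_t)(end - offset);
		assert(chunk > 0);
		n = pwrite(fd, zeros, chunk, offset);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* retry interrupted write */
			return (-1);
		}
		if (n == 0) {
			errno = EIO;		/* no progress; give up */
			return (-1);
		}
		offset += n;
	}
	return (0);
}
```

A file system with native hole-punching could release the underlying blocks instead of writing zeros; the data a reader sees afterwards would be the same in both cases.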

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 38039
Build 34928: arc lint + arc unit

Event Timeline


vop_stddeallocate: Fix shifting of offset

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella
lib/libc/sys/fspacectl.2
63

TODO: Document SPACECTL_ALLOC

sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
5204 ↗(On Diff #84034)

TODO: This is undocumented.

tests/sys/file/fspacectl_test.c
294

TODO: Add also SPACECTL_ALLOC

vn_deallocate_impl: Comments

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella

Fix wordings in fspacectl.2

  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella
  • Document _PC_FDEALLOC_PRESENT in pathconf.2
  • fspacectl.2: Document SPACECTL_ALLOC
  • fspacectl.2: Clarifies !SPACECTL_F_CANEXTEND cases for SPACECTL_ALLOC and SPACECTL_DEALLOC

Cleanups and extend VOP_ALLOCATE(9)

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella
  • Document _PC_FDEALLOC_PRESENT in pathconf.2
  • fspacectl.2: Document SPACECTL_ALLOC
  • fspacectl.2: Clarifies !SPACECTL_F_CANEXTEND cases for SPACECTL_ALLOC and SPACECTL_DEALLOC
  • fspacectl_test.c: Cleanups
  • fspacectl_test.c: Add SPACECTL_ALLOC test cases
  • Change vop_allocate call

Fixes

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella
  • Document _PC_FDEALLOC_PRESENT in pathconf.2
  • fspacectl.2: Document SPACECTL_ALLOC
  • fspacectl.2: Clarifies !SPACECTL_F_CANEXTEND cases for SPACECTL_ALLOC and SPACECTL_DEALLOC
  • fspacectl_test.c: Cleanups
  • fspacectl_test.c: Add SPACECTL_ALLOC test cases
  • Change vop_allocate call

Document more changes in commit message

  • Change vop_allocate call

fspacectl(2), vn_deallocate and vn_fallocate no longer accept len == 0.

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella
  • Document _PC_FDEALLOC_PRESENT in pathconf.2
  • fspacectl.2: Document SPACECTL_ALLOC
  • fspacectl.2: Clarifies !SPACECTL_F_CANEXTEND cases for SPACECTL_ALLOC and SPACECTL_DEALLOC
  • fspacectl_test.c: Cleanups
  • fspacectl_test.c: Add SPACECTL_ALLOC test cases

fspacectl_test.c fixes and OFF_MAX test case addition

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • fspacectl.2 added
  • VOP_DEALLOCATE.9 added
  • vn_deallocate.9 added
  • Put posix_fallocate under the fo_fspacectl umbrella
  • VOP_ALLOCATE.9: Changing prototype
  • Document _PC_FDEALLOC_PRESENT in pathconf.2
  • fspacectl.2: Document SPACECTL_ALLOC
  • fspacectl.2: Clarifies !SPACECTL_F_CANEXTEND cases for SPACECTL_ALLOC and SPACECTL_DEALLOC

Reorganize commits

  • Manpages works:
  • Put posix_fallocate under the fo_fspacectl umbrella

fspacectl.2: Document EINVAL errno, which happens when calling posix_fallocate on OpenZFS.

  • Manpages works:
  • Put posix_fallocate under the fo_fspacectl umbrella

Rebased

  • Add vnode_pager_purge_range(9) KPI
  • vnode_pager_purge_range comments updated
  • Add vnode_pager_purge_range.9
  • vnode_pager_purge_range.9 linting and style fixes
  • Update wording
  • Two fixes:
  • Minor changes in vnode_pager_purge_range to unnecessary vm_object_page_remove
  • vnode_pager_purge_range fixes on non-existing page (m == NULL)
  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • Manpages works:
  • Put posix_fallocate under the fo_fspacectl umbrella

Rebased

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • Manpages works:
  • Put posix_fallocate under the fo_fspacectl umbrella

Fix missing closing curly bracket

  • Add fspacectl(2) and VOP_DEALLOCATE(9).
  • Regen.
  • Wire up OpenZFS with VOP_DEALLOCATE(9)
  • Manpages works:
  • Put posix_fallocate under the fo_fspacectl umbrella
khng retitled this revision from "Add system call and VOP to do hole-punching" to "Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)." Feb 21 2021, 8:10 AM

Switch to tools/tools/git/arc-git.sh workflow.

Rebased on top of the newest D27194.

In vop_stddeallocate:

  • Use zero_region for zero-filling
  • Replace MAXPHYS with maxphys

syscallsubr.h: Remove nonexisting prototypes.

Removed SPACECTL_F_CANEXPAND support on SPACECTL_DEALLOC

Can't think of any use of it right now.

khng edited the summary of this revision.
sys/kern/uipc_shm.c
1891

TODO: kill this, and kill int error;

1919

TODO: Just directly return (0).

1963–1965

TODO: Move them to D28833.

khng marked 4 inline comments as not done. Mon, Mar 22, 6:16 PM
khng added inline comments.
tests/sys/file/fspacectl_test.c
294

Done in D28833.

Sorry I'm late to the party, but I have a few questions:

  1. Why a new syscall? There are already at least three APIs for hole-punching. Instead of adding a fourth, can we just reuse one of these existing APIs?
    • Solaris uses fcntl() with F_FREESP. AFAIK it was the first major operating system to support hole punching.
    • Linux uses fallocate() with FALLOC_FL_PUNCH_HOLE.
    • GEOM uses ioctl() with DIOCGDELETE.
  2. Where did zfs_deallocate go? I see it in the comments but not in the source.

Sorry I'm late to the party, but I have a few questions:

  1. Why a new syscall? There are already at least three APIs for hole-punching. Instead of adding a fourth, can we just reuse one of these existing APIs?
    • Solaris uses fcntl() with F_FREESP. AFAIK it was the first major operating system to support hole punching.
    • Linux uses fallocate() with FALLOC_FL_PUNCH_HOLE.
    • GEOM uses ioctl() with DIOCGDELETE.

Solaris's F_FREESP is not actually standardized. As far as I remember, only
F_FREESP on ZFS does hole-punching, while F_FREESP on UFS only allows
truncation. Even so, the ZFS implementation of F_FREESP also allows
extending a file and punching a hole at the same time, and I cannot
see a situation in which that combination would be useful. The extending
behavior is not found in other operating systems' equivalents, such as Linux's
fallocate(2) or FSCTL_SET_ZERO_DATA on Windows, nor is it listed as
one of the side effects of the NFSv4 DEALLOCATE command.

As for Linux's fallocate(), we currently do not have an equivalent of Linux's fallocate(2)
either. In particular, I do not like that it makes FALLOC_FL_PUNCH_HOLE a flag
even though hole-punching is actually one of its major functions.

GEOM's ioctl() with DIOCGDELETE performs trimming as its intended behavior.
But trimming is not the same as hole-punching, because trimming need not enforce
"zeroing" as its main side effect. fspacectl(SPACECTL_DEALLOC), on the
other hand, enforces "zeroing" as its main side effect, while allowing the underlying
file system to deallocate the space as well.
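
To make the zeroing guarantee concrete: after a hole-punch, the affected range must read back as zeros, whereas a trim gives no such promise. The sketch below is a userspace check of that observable property, not part of the revision; the helper name range_reads_as_zero and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from this revision.  Returns 1 if every byte
 * of [offset, offset + len) that lies inside the file reads back as zero,
 * 0 if a nonzero byte is found, and -1 with errno set on error.
 */
int
range_reads_as_zero(int fd, off_t offset, off_t len)
{
	char buf[4096];

	if (offset < 0 || len < 0) {
		errno = EINVAL;
		return (-1);
	}
	while (len > 0) {
		size_t want = sizeof(buf);
		ssize_t i, n;

		if ((off_t)want > len)
			want = (size_t)len;
		n = pread(fd, buf, want, offset);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* retry interrupted read */
			return (-1);
		}
		if (n == 0)
			break;			/* reached end of file */
		for (i = 0; i < n; i++) {
			if (buf[i] != 0)
				return (0);
		}
		offset += n;
		len -= n;
	}
	return (1);
}
```

A test for a hole-punching call could write a nonzero pattern, deallocate a sub-range, and then expect this helper to return 1 for that sub-range and 0 for a range that still overlaps the pattern.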

The fspacectl(2) in this revision currently does SPACECTL_DEALLOC only. In
D28833 SPACECTL_ALLOC is added as well, and posix_fallocate(2) is switched over
to use it. In the future we might even add an equivalent of TRIM which does not
mandate the zeroing behavior and only exposes the TRIM command of the
underlying storage if possible.

  2. Where did zfs_deallocate go? I see it in the comments but not in the source.

I split the giant revision into several smaller revisions, which are listed under
the "Stack" tab. zfs_deallocate() is in D28834.

In D28347#657998, @khng wrote:

As for Linux's fallocate(), we currently do not have an equivalent of Linux's fallocate(2)
either. In particular, I do not like that it makes FALLOC_FL_PUNCH_HOLE a flag
even though hole-punching is actually one of its major functions.

One particular aspect of fallocate(2) with FALLOC_FL_PUNCH_HOLE is that the call fails on Linux when the underlying file system does not support hole-punching. Callers therefore have to check the return value of fallocate(2) explicitly and wrap the call with their own fallback logic. As a remedy, _PC_FDEALLOC_PRESENT is added to pathconf(2), so that userspace code which really cares whether hole-punching would require zero-filling a long range can check whether the underlying file system supports hole-punching without the internal fallback. Userspace code that does not care whether the file system supports hole-punching can call fspacectl(SPACECTL_DEALLOC) without much extra care.
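
A caller that does care could probe the file system roughly as follows. This is a hedged sketch: the constant name _PC_FDEALLOC_PRESENT is taken from this revision and may be absent or named differently in any given tree, so the code is guarded with #ifdef and reports "unknown" when the constant is not defined. The function name dealloc_supported and its three-way return value are assumptions made for the example.

```c
#include <unistd.h>

/*
 * Hypothetical probe, not from this revision.  Returns 1 if the file
 * system holding "path" reports native deallocation support, 0 if it
 * reports none, and -1 if that cannot be determined (pathconf(2) failed,
 * or the constant is not available at build time).
 */
int
dealloc_supported(const char *path)
{
#ifdef _PC_FDEALLOC_PRESENT
	long v;

	v = pathconf(path, _PC_FDEALLOC_PRESENT);
	if (v == -1)
		return (-1);	/* error or indeterminate: treat as unknown */
	return (v != 0);
#else
	(void)path;		/* constant not provided by this system */
	return (-1);
#endif
}
```

Code that only needs the range to read as zeros can skip the probe entirely; code that wants to avoid a potentially long zero-fill could take a different path when the result is 0 or -1.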

  • Move them to D28833.
  • shm_deallocate: kill int error;. Just directly return (0).