⚙ D11392 Allow CUSE to free memory mapped memory by using d_mmap_single instead of d

• hselasky created this revision.Jun 28 2017, 11:04 AM

Herald added a subscriber: imp. · View Herald TranscriptJun 28 2017, 11:04 AM

kib added inline comments.Jun 28 2017, 4:28 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	I do not understand this. Why do you need populate() method if you always return VM_PAGER_BAD ? This return code always results in the vm_fault code to (try to) call _pg_fault() method, which is NULL in your case. More, because of this, vm_pager_get_pages() returns VM_PAGER_FAIL and indeed the freshly allocated but _not_ fictitious page is left on the object queue and used. Is this what you want there ?

• hselasky added inline comments.Jun 28 2017, 4:35 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	if (rv == VM_PAGER_BAD) { /* * VM_PAGER_BAD is the backdoor for a pager to request * normal fault handling. */ I want that when fault happens that a zero-filled page is automatically mapped, so that I dont have to pre-allocate all the memory. Looking at other drivers like drm-next, this appears to be the way to do it? Is there a better way? --HPS

kib added inline comments.Jun 28 2017, 4:53 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	You do not need backdoor if all you want is to non-populate handler to be called. Leave it NULL. Well, if you are fine with non-fictitious pages in the device object page queue, this might even work, but definitely was not much tested. Did you run this with the INVARIANTS option ?

markj added inline comments.Jun 28 2017, 5:44 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	I don't think it makes much sense to use a device object here. Such objects are used for creating mappings to device-owned memory ranges; OBJT_MGTDEVICE objects in particular have the property that all mappings to a page managed by the object can be identified and removed quickly. This property is required by the DRM drivers. Here it sounds as though you just want demand-allocated pages? In that case it would appropriate to use an OBJT_SWAP object. I'm not sure what the purpose of this interface is though, so maybe I'm misunderstanding.

• hselasky added inline comments.Jun 28 2017, 6:43 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	@kib : Yes, I see I can leave the populate handler NULL for this purpose. I ran the current code with INVARIANTS and it worked just fine. @markj : The purpose is to allow shared memory between CUSE servers and CUSE clients. The memory should not be swapped to disk. Is that one difference between OBJT_MGTDEVICE and OBJT_SWAP ?

Removed not needed populate handler function.

markj added inline comments.Jun 28 2017, 8:26 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	Pages belonging to an OBJT_MGTDEVICE should not be swapped to disk because they are typically shared between the CPU and some device controller. However, in your new scheme I think there is nothing preventing the VM system from attempting to swap out these pages, which will trigger the panic in dev_pager_putpages(). In normal usage, pages belonging to a device object are logically wired, which means that the VM system cannot reclaim them from the object in response to memory pressure. But here there is nothing wiring the pages allocated during a fault on a cuse device mapping.

• hselasky added inline comments.Jun 29 2017, 5:18 AM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	@markj : Swapping OBJT_MGTDEVICE with OBJT_SWAP doesnt work. How can I prevent OBJT_MGTDEVICE pages from being swapped out when the populate function is NULL? BTW: This also needs to be fixed in the LinuxKPI for a similar case.

kib added inline comments.Jun 29 2017, 1:03 PM

sys/fs/cuse/cuse.c
1266 ↗	(On Diff #30171)	I suspect that what you want there is OBJT_PHYS pager instead.

Rework patch based on input.

Fix some bugs found during testing.

Set correct size of allocated VM object in case there is an offset within the object requested by the mmap().

• hselasky accepted this revision.Sep 18 2017, 7:13 PM

This revision is now accepted and ready to land.Sep 18 2017, 7:13 PM

Can you explain why did you claimed, in the conversation with Mark, that the swap object does not work for you ?

If all you want is the shared memory between server and client, then IMO OBJT_SWAP or even OBJT_DEFAULT just provide it without additional overhead of single-purpose code. Does the kernel side access the memory, or is it purely transport between the server and the client ?

sys/fs/cuse/cuse.c
1337 ↗	(On Diff #33203)	I do not understand this check. Did you actually saw the situation where fault occured with the fictitious page passed to you as *mres ?
1396 ↗	(On Diff #33203)	I suspect that page_nr can overflow.
1424 ↗	(On Diff #33203)	There is too many ()s.
1437 ↗	(On Diff #33203)	Perhaps use ptoa(mem->page_count)
1446 ↗	(On Diff #33203)	There too.

Can you explain why did you claimed, in the conversation with Mark, that the swap object does not work for you ?

If all you want is the shared memory between server and client, then IMO OBJT_SWAP or even OBJT_DEFAULT just provide it without additional overhead of single-purpose code. Does the kernel side access the memory, or is it purely transport between the server and the client ?

Hi,

The memory cannot be of SWAP type, because it needs to be quickly accessible. I.E. it is not allowed to swap this memory to disk.

Is that supported when using OBJT_SWAP.

Using a kernel allocated buffer might make it easier to debug memory leaks, that you can check all memory is returned when cuse.ko is unloaded.

--HPS

sys/fs/cuse/cuse.c
1337 ↗	(On Diff #33203)	This code was copied from sys/vm/xxx . If there are no fictious pages for kernel allocated virtual memory, I can remove that check.
1396 ↗	(On Diff #33203)	I'll fix. This isn't critical because *offset is not used as-is.

In D11392#257466, @hselasky wrote:

The memory cannot be of SWAP type, because it needs to be quickly accessible. I.E. it is not allowed to swap this memory to disk.

This cannot be an argument. If functionally the swap-backed memory is fine, I do not believe into the arguments 'as fast as possible'. Kernel does not slow-down you for malicious intent, but in low-memory situation, using the swappable memory allows the system to proceed instead of deadlocking. Otherwise you would not notice a difference in 'speed'.

Is that supported when using OBJT_SWAP.

Using a kernel allocated buffer might make it easier to debug memory leaks, that you can check all memory is returned when cuse.ko is unloaded.

If the shared communication memory is still mapped when cuse.ko is unloaded, what should happen ? I do not think that it is really possible to track such situations and prevent cuse.ko unload without huge and really unneeded efforts. Leaving the deallocation of the objects to normal VM lifecycle rules avoids unnecessary troubles, IMO.

If the shared communication memory is still mapped when cuse.ko is unloaded, what should happen ? I do not think that it is really possible to track such situations and prevent cuse.ko unload without huge and really unneeded efforts. Leaving the deallocation of the objects to normal VM lifecycle rules avoids unnecessary troubles, IMO.

The problem is the following. The callback functions we install will disappear from the .code segment and cannot be called after cuse.ko is unloaded. In other words cuse.ko must wait for all allocated memory or VM objects to be freed before unloading. Can this be done with SWAP ?

In D11392#257483, @hselasky wrote:

If the shared communication memory is still mapped when cuse.ko is unloaded, what should happen ? I do not think that it is really possible to track such situations and prevent cuse.ko unload without huge and really unneeded efforts. Leaving the deallocation of the objects to normal VM lifecycle rules avoids unnecessary troubles, IMO.

The problem is the following. The callback functions we install will disappear from the .code segment and cannot be called after cuse.ko is unloaded. In other words cuse.ko must wait for all allocated memory or VM objects to be freed before unloading. Can this be done with SWAP ?

What callbacks ? If you unref the allocated vm objects on cuse.ko unload, VM will correctly free both objects and objects page queues when the last process unmaps it. For OBJT_SWAP you do not need to do anything else except releasing _your_ references.

What callbacks ? If you unref the allocated vm objects on cuse.ko unload, VM will correctly free both objects and objects page queues when the last process unmaps it. For OBJT_SWAP you do not need to do anything else except releasing _your_ references.

The following callbacks:

static struct cdev_pager_ops cuse_pager_ops = {
	        .cdev_pg_fault = cuse_pager_fault,
	        .cdev_pg_ctor = cuse_pager_ctor,
	        .cdev_pg_dtor = cuse_pager_dtor
	};

In D11392#257486, @hselasky wrote:
What callbacks ? If you unref the allocated vm objects on cuse.ko unload, VM will correctly free both objects and objects page queues when the last process unmaps it. For OBJT_SWAP you do not need to do anything else except releasing _your_ references.

The following callbacks:
static struct cdev_pager_ops cuse_pager_ops = {
	        .cdev_pg_fault = cuse_pager_fault,
	        .cdev_pg_ctor = cuse_pager_ctor,
	        .cdev_pg_dtor = cuse_pager_dtor
	};
	

These callbacks only defined for the device pagers, VM automatically handles page allocations and free, as well as the page faults, for swap objects.

@kib

There is one more requirement. If I create a swap object I need to be able to offset inside the swap object. Is that supported?

Personally I like malloc() better than the idea of a swap object. The though about having a swapout for CUSE driven device drivers I don't like.

In D11392#257488, @hselasky wrote:

@kib

There is one more requirement. If I create a swap object I need to be able to offset inside the swap object. Is that supported?

Does the vm_ooffset_t *offset argument in the d_mmap_single() signature does what you need (I am not sure what and why you need) ?

Personally I like malloc() better than the idea of a swap object. The though about having a swapout for CUSE driven device drivers I don't like.

In D11392#257497, @kib wrote:

In D11392#257488, @hselasky wrote:

@kib

There is one more requirement. If I create a swap object I need to be able to offset inside the swap object. Is that supported?

Does the vm_ooffset_t *offset argument in the d_mmap_single() signature does what you need (I am not sure what and why you need) ?

I need to be able to allocate a X pages VM object, and then only mmap() parts of it to userspace by d_mmap_single() calls.

--HPS

In D11392#257510, @hselasky wrote:

In D11392#257497, @kib wrote:

In D11392#257488, @hselasky wrote:

@kib

There is one more requirement. If I create a swap object I need to be able to offset inside the swap object. Is that supported?

Does the vm_ooffset_t *offset argument in the d_mmap_single() signature does what you need (I am not sure what and why you need) ?

I need to be able to allocate a X pages VM object, and then only mmap() parts of it to userspace by d_mmap_single() calls.

offset specifies the byte offset into the start of the mapping that will be created. Max allowed size of the mapping is in fact limited by the object size left after the offset, pages that are behind object limit cause SIGBUS.

But every your answer really raises next 'why' question. Why cannot you create a separate object per each pair of server/client mapping requests ?

@kib : How can two different processes share a piece of memory if not using the same VM object?

In D11392#257519, @hselasky wrote:

@kib : How can two different processes share a piece of memory if not using the same VM object?

I asked if it possible to use the same object per single pair of server/client request. It would be much simpler if you explained the userspace API you want to implement. It need not be a man page, but provide all essential details so that the suggestions would be usable.

Hi,

Yes, server and client can use same object, and that is what happens with the current patch when the "mem" handle is the same.

--HPS

Use SWAP object instead of malloc'ed memory.

This revision now requires review to proceed.Nov 1 2017, 6:06 PM

Make sure VM memory refs are dropped.

Remove now redundant memory free call.

Bugfix. Avoid duplicate locking.

Fix refcount deadlock.

kib accepted this revision.Nov 3 2017, 10:59 AM

This revision is now accepted and ready to land.Nov 3 2017, 10:59 AM

• hselasky accepted this revision.Nov 3 2017, 1:54 PM

Closed by commit rS325362: Allow CUSE(3) to free all memory mapped memory by using regular SWAP objects (authored by • hselasky). · Explain WhyNov 3 2017, 2:11 PM

This revision was automatically updated to reflect the committed changes.

Allow CUSE to free memory mapped memory by using d_mmap_single instead of d_mmap
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 34718

head/sys/fs/cuse/cuse.c

head/sys/fs/cuse/cuse_ioctl.h

Allow CUSE to free memory mapped memory by using d_mmap_single instead of d_mmapClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 34718

head/sys/fs/cuse/cuse.c

head/sys/fs/cuse/cuse_ioctl.h

Allow CUSE to free memory mapped memory by using d_mmap_single instead of d_mmap
ClosedPublic
Actions

Revision Contents
Changeset List