
Restructure memory allocation in bhyve to support "devmem".
Closed, Public

Authored by neel on Jun 9 2015, 1:08 AM.
Details

Summary

devmem is used to represent MMIO devices like the boot ROM or a VESA
framebuffer, where trapping and emulating every access is impractical.
devmem is a hybrid of system memory (sysmem) and device emulation.

devmem is mapped into the guest address space via nested page tables,
just as sysmem is. However, the address range where devmem is mapped may
be changed by the guest at runtime (e.g. by reprogramming a PCI BAR).
Also, devmem is usually mapped RO or RW, as compared to the RWX mappings
used for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and it decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
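
For illustration, a minimal host-side sketch of that decoupling. This is
hypothetical consumer code, not part of the diff; the VM name, segment name
and segment length are assumptions:

	/*
	 * Hypothetical consumer of a devmem device node. Assumes a VM named
	 * "testvm" with a 1 MB "bootrom" devmem segment already created.
	 */
	#include <sys/mman.h>

	#include <err.h>
	#include <fcntl.h>
	#include <string.h>

	int
	main(void)
	{
		size_t len = 1024 * 1024;	/* assumed segment size */
		void *base;
		int fd;

		fd = open("/dev/vmm/testvm.bootrom", O_RDWR);
		if (fd < 0)
			err(1, "open");

		/*
		 * This host mapping remains valid even if the guest later
		 * remaps the segment (e.g. by reprogramming a PCI BAR).
		 */
		base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, 0);
		if (base == MAP_FAILED)
			err(1, "mmap");
		memset(base, 0xff, len);	/* e.g. fill the ROM image */
		return (0);
	}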

Test Plan

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

neel retitled this revision to "Restructure memory allocation in bhyve to support 'devmem'".
neel updated this object.
neel added reviewers: grehan, tychon.
neel edited the test plan for this revision. (Show Details)

This restructuring will be really handy. However, I think it's possible
to unify the handling of 'sysmem' and 'devmem' such that 'sysmem'
differs from 'devmem' only in persistence and accessibility (PROT_*).

Since alloc_memseg() is already called by both vm_setup_memory() and
vm_create_devmem(), you'll simply need to provide a vanity name for
"lower", "low" and "high" memory to vm_alloc_memseg() when each segment
is created. Then you can drop the inference, currently made when a
VM_MEMSEG_NAME is encountered, that a memseg is 'RAM'. You'll also need
to augment vm_alloc_memseg() to take a persistence argument, which will
be set for 'sysmem' and clear for 'devmem'.

As a side benefit of treating all memory roughly equally, you'll
be able to expose the entire guest's memory PA-wise in
/dev/vmm/testvmm.lowermem, /dev/vmm/testvmm.lowmem and
/dev/vmm/testvmm.highmem!

When each memseg has a vanity name you can dispense with
providing segid to vm_alloc_memseg() and instead have it returned as a
cookie of sorts; subsequent operations such as vm_map_memseg() and
vm_unmap_memseg() will take the segid, and you can provide a
vm_get_segid_by_name() for the unfortunate consumer who misplaces the segid.
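
A minimal sketch of that lookup helper (hypothetical: the mem_segs array,
its 'name' field and the VM_MAX_MEMSEGS bound are assumptions about the
kernel side, not code from this diff):

	/* Hypothetical: resolve a memseg's vanity name back to its segid. */
	int
	vm_get_segid_by_name(struct vm *vm, const char *name, int *segid)
	{
		struct mem_seg *seg;
		int i;

		for (i = 0; i < VM_MAX_MEMSEGS; i++) {
			seg = &vm->mem_segs[i];
			if (seg->name != NULL &&
			    strcmp(seg->name, name) == 0) {
				*segid = i;
				return (0);
			}
		}
		return (ENOENT);
	}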

You'll also be able to get rid of these upfront defines:

enum {
	VM_SYSMEM,
	VM_BOOTROM,
	VM_FRAMEBUFFER,
};

and automatically be able to support more than a device ROM, etc.

Additionally, since all callers of vm_mmap_getnext() seem interested in a
specific range, why not provide vm_segid_by_range() and
vm_get_memmap()? That way you can replace:

	gpa = 0;
	found = 0;
	while (!found) {
		error = vm_mmap_getnext(sc->vm, &gpa, &segid, &segoff, &len,
		    NULL, NULL);
		if (error)
			break;
	
		if (first >= gpa && last <= gpa + len)
			found = 1;
		else
			gpa += len;
	}

with:

	error = vm_segid_by_range(sc->vm, first, last - first, &segid);
	if (!error) {
		vm_get_memmap(segid, &segoff, &len, &prot, &flags);
	}

where vm_segid_by_range() is:

	int
	vm_segid_by_range(struct vm *vm, vm_paddr_t gpa, size_t len, int *segid)
	{
		struct mem_map *mm;
		int i;

		for (i = 0; i < VM_MAX_MEMMAPS; i++) {
			mm = &vm->mem_maps[i];
			if (mm->gpa <= gpa && gpa + len <= mm->gpa + mm->len) {
				*segid = mm->segid;
				return (0);
			}
		}

		return (EINVAL);
	}

This also has the benefit of removing the nested:

	while (!found) {
		for (i = 0; i < VM_MAX_MEMMAPS; i++) {
		}
	}

loop!

In D2762#52915, @tychon wrote:

Thanks Tycho. My responses inline:

> This restructuring will be really handy. However, I think it's possible
> to unify the handling of 'sysmem' and 'devmem' such that 'sysmem'
> differs from 'devmem' only in persistence and accessibility (PROT_*).
>
> Since alloc_memseg() is already called by both vm_setup_memory() and
> vm_create_devmem(), you'll simply need to provide a vanity name [...] and
> augment vm_alloc_memseg() to take a persistence argument which will be
> set for 'sysmem' and clear for 'devmem'.

I think knowing which memory segments represent system memory is useful.

For example, I use this in iommu_modify() to map only 'sysmem' in the passthru
device's address space. This enforces consistency with the treatment of other
(emulated) device memory (e.g. an AHCI BAR) from the point of view of the
passthru device.
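
For context, a sketch of the kind of filtering meant here. This is hedged:
sysmem_mapping(), the mem_maps layout and the iommu_*/pmap calls are
assumptions, not necessarily the shipped iommu_modify():

	/* Hand only sysmem-backed ranges to the IOMMU; skip devmem. */
	static void
	iommu_map_sysmem(struct vm *vm)
	{
		struct mem_map *mm;
		vm_paddr_t gpa, hpa;
		int i;

		for (i = 0; i < VM_MAX_MEMMAPS; i++) {
			mm = &vm->mem_maps[i];
			if (!sysmem_mapping(vm, mm))
				continue;	/* devmem, e.g. a boot ROM */
			for (gpa = mm->gpa; gpa < mm->gpa + mm->len;
			    gpa += PAGE_SIZE) {
				hpa = pmap_extract(
				    vmspace_pmap(vm->vmspace), gpa);
				iommu_create_mapping(vm->iommu, gpa, hpa,
				    PAGE_SIZE);
			}
		}
	}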

> As a side benefit of treating all memory roughly equally, you'll be able
> to expose the entire guest's memory PA-wise in /dev/vmm/testvmm.lowermem,
> /dev/vmm/testvmm.lowmem and /dev/vmm/testvmm.highmem!

That's one way to do it, although I don't see it as being functionally
different from mmap(/dev/vmm/vmname).

> When each memseg has a vanity name you can dispense with providing segid
> to vm_alloc_memseg() and instead have it returned as a cookie of sorts;
> [...] You'll also be able to get rid of these upfront defines (enum {
> VM_SYSMEM, VM_BOOTROM, VM_FRAMEBUFFER };) and automatically be able to
> support more than a device ROM, etc.

Possibly, but that is just shuffling the identifier from an enumeration
to a character string (the segment name). It will still be necessary for
somebody to dole out or arbitrate who uses which character string.

Also, an opaque identifier will require adding a 'vm_memseg_getnext()'
API for bhyvectl to be able to iterate over all memory segments.

The enum { VM_SYSMEM, VM_BOOTROM ... } is a handy mnemonic that makes
it easy to identify memory segments. It does not limit the number
or the type of memory segments that can be created.

> Additionally, since all callers of vm_mmap_getnext() seem interested in a
> specific range, why not provide vm_segid_by_range() and vm_get_memmap()?
> [...] This also has the benefit of removing the nested while/for loop!

Hmm, I am not sure where this nested construct is used (I didn't see it in
vmm.c).

I don't have any objections to having an API that looks up a specific range,
but given that the only consumer is 'vmmdev_mmap_single()' I wonder if it's
really necessary ...

tychon edited edge metadata.
In D2762#52919, @neel wrote:
>> This restructuring will be really handy. However, I think it's possible
>> to unify the handling of 'sysmem' and 'devmem' such that 'sysmem'
>> differs from 'devmem' only in persistence and accessibility (PROT_*).
>> [...]
>
> I think knowing which memory segments represent system memory is useful.
>
> For example, I use this in iommu_modify() to map only 'sysmem' in the
> passthru device's address space. This enforces consistency with the
> treatment of other (emulated) device memory (e.g. an AHCI BAR) from the
> point of view of the passthru device.

It appears to be not just useful but rather an implementation requirement :-)

>> As a side benefit of treating all memory roughly equally, you'll be able
>> to expose the entire guest's memory PA-wise in /dev/vmm/testvmm.lowermem,
>> /dev/vmm/testvmm.lowmem and /dev/vmm/testvmm.highmem!
>
> That's one way to do it, although I don't see it as being functionally
> different from mmap(/dev/vmm/vmname).

It's not different so there isn't much point.

My point really was that the API as proposed would be impossible for someone with a copy of vmm_dev.h and a binary-only vmm.ko to use properly. Not that someone like that exists, but libvmmapi.so and bhyve shouldn't contain too much embedded knowledge of the kernel internals.

Specifically, VM_ALLOC_MEMSEG takes a struct vm_memseg which in no way makes it obvious that when you supply a name you get a devseg and that it's necessary to omit the name when you want a "true" memseg. If you add an 'int ismemseg' then it would be obvious.
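
I.e., something like this (a sketch of the ioctl argument with the suggested
flag; the fields surrounding the flag are assumptions about vmm_dev.h):

	struct vm_memseg {
		int	segid;
		size_t	len;
		char	name[SPECNAMELEN + 1];
		int	ismemseg;	/* suggested: set for sysmem,
					   clear for devmem */
	};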

>> When each memseg has a vanity name you can dispense with providing segid
>> to vm_alloc_memseg() and instead have it returned as a cookie of sorts;
>> [...] You'll also be able to get rid of these upfront defines [...]
>
> Possibly, but that is just shuffling the identifier from an enumeration
> to a character string (the segment name). It will still be necessary for
> somebody to dole out or arbitrate who uses which character string.
>
> Also, an opaque identifier will require adding a 'vm_memseg_getnext()'
> API for bhyvectl to be able to iterate over all memory segments.
>
> The enum { VM_SYSMEM, VM_BOOTROM ... } is a handy mnemonic that makes
> it easy to identify memory segments. It does not limit the number or
> the type of memory segments that can be created.

Indeed the enum doesn't limit the number or the types of memory segments, but having the consumer supply two identifiers (the id and, when required, the name) makes it a bit split-brain. Compounding this is that it's not possible to go from the device file back to the segid, even with a brute-force search over the entire gpa-space with vm_mmap_getnext(), as the name isn't returned by vm_mmap_getnext()!

>> Additionally, since all callers of vm_mmap_getnext() seem interested in a
>> specific range, why not provide vm_segid_by_range() and vm_get_memmap()?
>> [...] This also has the benefit of removing the nested while/for loop!
>
> Hmm, I am not sure where this nested construct is used (I didn't see it in
> vmm.c).

I formed the nested loop from the while () { vm_mmap_getnext() } within vmmdev_mmap_single() and the for () { } within vm_mmap_getnext().

> I don't have any objections to having an API that looks up a specific
> range, but given that the only consumer is 'vmmdev_mmap_single()' I wonder
> if it's really necessary ...

I see 3 places where vm_mmap_getnext() is called: vm_mmap_memseg(), vmmdev_mmap_single() and show_memmap(). In my opinion both vm_mmap_memseg() and vmmdev_mmap_single() would benefit from a by-range search, and show_memmap() from an opaque iterator (something along the lines of what you do with the old-school e820, to wit, supply and return a continuation cookie; see the sketch below). Then if you ever wanted to use a tree rather than an array for vm->mem_maps it would be simple.
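
A sketch of that continuation-cookie iterator (hypothetical API; the name
and signature are illustrative, not from the diff):

	/*
	 * Hypothetical iterator: '*cookie' holds the lowest gpa still to be
	 * visited; on success the matching map is copied out and the cookie
	 * is advanced past it, so the backing store (array or tree) stays
	 * opaque to callers.
	 */
	int
	vm_memmap_next(struct vm *vm, vm_paddr_t *cookie, struct mem_map *out)
	{
		struct mem_map *mm, *next;
		int i;

		next = NULL;
		for (i = 0; i < VM_MAX_MEMMAPS; i++) {
			mm = &vm->mem_maps[i];
			if (mm->len == 0 || mm->gpa + mm->len <= *cookie)
				continue;	/* unused or already visited */
			if (next == NULL || mm->gpa < next->gpa)
				next = mm;
		}
		if (next == NULL)
			return (ENOENT);
		*out = *next;
		*cookie = next->gpa + next->len;
		return (0);
	}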

Anyway, these are merely suggestions, and I'm having a deeper look here because the porting tax is going to be much higher with new device files being created, etc.

If you believe some of them have merit, and would like to incorporate them in your patch, great. If you like some of them but don't have the time to implement them, then I'm happy to follow up with some refinements.

Since it's functionally correct, I'll accept the revision.

Tycho

This revision is now accepted and ready to land. Jun 10 2015, 4:48 PM
In D2762#52988, @tychon wrote:
> It appears to be not just useful but rather an implementation
> requirement :-)
>
> My point really was that the API as proposed would be impossible for
> someone with a copy of vmm_dev.h and a binary-only vmm.ko to use properly.
> Not that someone like that exists, but libvmmapi.so and bhyve shouldn't
> contain too much embedded knowledge of the kernel internals.
>
> Specifically, VM_ALLOC_MEMSEG takes a struct vm_memseg which in no way
> makes it obvious that when you supply a name you get a devseg and that
> it's necessary to omit the name when you want a "true" memseg. If you add
> an 'int ismemseg' then it would be obvious.

That's a fair criticism, and it especially makes sense if all memory segments had a vanity name.

> Indeed the enum doesn't limit the number or the types of memory segments,
> but having the consumer supply two identifiers (the id and, when required,
> the name) makes it a bit split-brain.

Yes, that's a fair point.

> Compounding this is that it's not possible to go from the device file back
> to the segid, even with a brute-force search over the entire gpa-space
> with vm_mmap_getnext(), as the name isn't returned by vm_mmap_getnext()!

'vm_mmap_getnext()' returns a 'segid' which can be used in 'vm_get_memseg()' to arrive at the name.
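
E.g., a userland (libvmmapi-style) sketch of that round trip; the 'ctx'
handle and the exact vm_mmap_getnext()/vm_get_memseg() signatures are
assumptions here:

	/* Walk the memory map and recover each mapping's segment name. */
	char name[SPECNAMELEN + 1];
	vm_ooffset_t segoff;
	vm_paddr_t gpa = 0;
	size_t len, seglen;
	int prot, flags, segid;

	while (vm_mmap_getnext(ctx, &gpa, &segid, &segoff, &len,
	    &prot, &flags) == 0) {
		vm_get_memseg(ctx, segid, &seglen, name, sizeof(name));
		/* 'name' is non-empty for named (e.g. devmem) segments. */
		gpa += len;
	}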

> I see 3 places where vm_mmap_getnext() is called: vm_mmap_memseg(),
> vmmdev_mmap_single() and show_memmap(). [...]
>
> Anyway, these are merely suggestions, and I'm having a deeper look here
> because the porting tax is going to be much higher with new device files
> being created, etc.
>
> If you believe some of them have merit, and would like to incorporate them
> in your patch, great. If you like some of them but don't have the time to
> implement them, then I'm happy to follow up with some refinements.

I am coming around to seeing your point of view on having all memory segments be named: it will keep the API consistent and also eliminate the split-brain situation you mentioned earlier.

I am not 100% convinced about the merits of representing each sysmem segment as a device node (i.e. vm.lowermem, vm.lowmem, vm.highmem). It still feels more right to represent the system memory map via a single /dev/vmm/vm device node. On x86 the three sysmem devices are manageable, but I don't know whether other architectures (e.g. ARM64) have a more fragmented memory map.

> Since it's functionally correct, I'll accept the revision.

Thanks and I'll take you up on the offer to follow up with refinements :-)


In D2762#53053, @neel wrote:
>> Compounding this is that it's not possible to go from the device file
>> back to the segid, even with a brute-force search over the entire
>> gpa-space with vm_mmap_getnext(), as the name isn't returned by
>> vm_mmap_getnext()!
>
> 'vm_mmap_getnext()' returns a 'segid' which can be used in
> 'vm_get_memseg()' to arrive at the name.

Sorry, I had missed that :-(

> I am coming around to seeing your point of view on having all memory
> segments be named: it will keep the API consistent and also eliminate the
> split-brain situation you mentioned earlier.

Okay.

> I am not 100% convinced about the merits of representing each sysmem
> segment as a device node (i.e. vm.lowermem, vm.lowmem, vm.highmem). It
> still feels more right to represent the system memory map via a single
> /dev/vmm/vm device node. [...]

Actually, I should have made it clear when I accepted the revision that you convinced me that different device nodes for the memory segments don't add value. If you tag them as 'sysmem' when they are created, you can always find them again with the iterator. There just isn't any additional value in having them as discrete device nodes.

> Thanks and I'll take you up on the offer to follow up with refinements :-)

Sounds good.

wblock added inline comments.
usr.sbin/bhyveload/bhyveload.8
Line 118 (On Diff #6050)

.El (end list) should be before .Sh EXAMPLES. mandoc -Tlint bhyveload.8 should detect that and any other problems.

This revision was automatically updated to reflect the committed changes.