Page MenuHomeFreeBSD

Switch to contigmalloc in the Hyper-V code
ClosedPublic

Authored by andrew on May 23 2023, 8:56 AM.
Tags
None
Referenced Files
F103764634: D40227.diff
Fri, Nov 29, 3:17 AM
Unknown Object (File)
Thu, Nov 28, 3:42 AM
Unknown Object (File)
Tue, Nov 26, 9:49 AM
Unknown Object (File)
Tue, Nov 26, 8:56 AM
Unknown Object (File)
Tue, Nov 26, 8:37 AM
Unknown Object (File)
Tue, Nov 26, 6:59 AM
Unknown Object (File)
Sun, Nov 3, 5:01 PM
Unknown Object (File)
Sun, Nov 3, 5:00 PM
Subscribers

Details

Summary

In the Hyper-V drivers we need to allocate buffers shared between the
host and guest. This memory has been allocated with bus_dma, however
it doesn't use this correctly, e.g. it is missing calls to
bus_dmamap_sync. Along with this on arm64 we need this memory to be
mapped with the correct memory type that bus_dma may not use.

Switch to contigmalloc to allocate this memory as this will correctly
allocate cacheable memory.

Sponsored by: Arm Ltd

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 51679
Build 48570: arc lint + arc unit

Event Timeline

Tested under Hyper-V on a Windows Dev Kit 2023 with the patch in [1], still needs testing elsewhere.

[1] https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025804.html

sys/dev/hyperv/netvsc/if_hn.c
5502

where is check for contigmalloc() failure? Missing in all the places

sys/dev/hyperv/vmbus/vmbus.c
749

Please use a Macro here instead 12 bit shift. Same for the code below.

sys/dev/hyperv/vmbus/vmbus_chan.c
1643

Please handle null return scenarios.

Interesting, M_WAITOK has a slightly different meaning in contigmalloc to the main kernel malloc. In malloc with M_WAITOK the thread will sleep until it can allocate the requested memory. With contigmalloc it will try harder to allocate the memory, but can still fail after a few attempts.

Also if hyperv_dmamem_alloc() and hyperv_dmamem_free() is not in use, please remove them from vmbus. As they are dead code.

hyperv_dmamem_alloc is used in amd64 code. I wasn't planning on touching that as I am unable to easily test the change.

hyperv_dmamem_alloc is used in amd64 code. I wasn't planning on touching that as I am unable to easily test the change.

Then it will create confusion in future, as having two different functions in same code base.
The current change not only scoped for arm64 only, all these are also part of amd64 code.
So either we should take care of the whole code, or only arm64 / amd64 specific area.

  • Restore NULL checks
  • Don't include hyperv_busdma.h when it's unneeded
  • Add a MSR_HV_SIEFP_PGMASK and MSR_HV_SIMP_PGMASK macro
  • Remove hyperv_dmamem_free as it's unused

hyperv_dmamem_free() is removed, but hyperv_dmamem_alloc() stays, so what about the memory allocated by hyperv_dmamem_alloc()? How they will be freed?
Also it is not very consistent.
We need to do a thorough testing on Azure amd64 gen1 gen2 and arm64 with this patch.
Have you tested the change on Hyper-V x86 and arm64?

I will try to test the patch on Azure x86 gen1 and gen2 and arm64.
I guess by this weekend I can complete it.

The only remaining caller of hyperv_dmamem_alloc is to allocate hyperv_ref_tsc.tsc_ref. It doesn't have a call to hyperv_dmamem_free.

If possible please get the change thoroughly tested in Azure x86 platform or Hyper-V x86.

Move the last x86 caller of hyperv_dmamem_alloc

If possible please get the change thoroughly tested in Azure x86 platform or Hyper-V x86.

Getting panic after installation :
cd0 at storvsc1 bus 0 scbus1 target 0 lun 0
cd0: <Msft Virtual DVD-ROM 1.0> Removable CD-ROM SPC-3 SCSI device
cd0: 300.000MB/s transfers
cd0: 819MB (419479 2048 byte sectors)
GEOM: new disk cd0
hvkvp0: chan8 revoked
hvkvp0: chan8 detached
hvkvp0: chan8 closed
panic: Assertion bt->bt_size == vmem_roundup_size(vm, size) || bt->bt_size - vmem_roundup_size(vm, size) <= vm->vm_quantum_mask failed at /datadrive/sandbox_30_05/src/sys/kern/subr_vmem.c:1492
cpuid = 3
time = 1685480882
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
vmem_xfree() at vmem_xfree+0x350
contigfree() at contigfree+0x14
vmbus_chan_close_internal() at vmbus_chan_close_internal+0x204
vmbus_ic_detach() at vmbus_ic_detach+0x20
device_detach() at device_detach+0x1a8
device_delete_child() at device_delete_child+0x18
vmbus_delete_child() at vmbus_delete_child+0x28
vmbus_prichan_detach_task() at vmbus_prichan_detach_task+0x1c
taskqueue_run_locked() at taskqueue_run_locked+0x17c
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 0 tid 100116 ]
Stopped at kdb_enter+0x44: str xzr, [x19, #3328]
db>

Fix contigfree on the correct branch

I have tested in arm64 and x86 and looks good to me.

This revision is now accepted and ready to land.Jun 1 2023, 10:28 AM
This revision was automatically updated to reflect the committed changes.