Page MenuHomeFreeBSD

hyperv/vmbus: use a better retry method in hv_vmbus_post_message()
ClosedPublic

Authored by decui_microsoft.com on Mar 23 2016, 3:30 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Oct 20, 4:27 AM
Unknown Object (File)
Mon, Oct 20, 1:44 AM
Unknown Object (File)
Sat, Oct 18, 9:45 PM
Unknown Object (File)
Fri, Oct 17, 7:37 PM
Unknown Object (File)
Sun, Oct 12, 12:42 AM
Unknown Object (File)
Sat, Oct 11, 6:43 AM
Unknown Object (File)
Sat, Oct 11, 6:43 AM
Unknown Object (File)
Fri, Oct 10, 11:27 PM
Subscribers

Details

Summary

I occassionally hit the KASSERT:

panic: Error VMBUS: Message Post Failed
cpuid = 7
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe011cbeb450
vpanic() at vpanic+0x182/frame 0xfffffe011cbeb4d0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe011cbeb540
hv_vmbus_post_message() at hv_vmbus_post_message+0x1ce/frame 0xfffffe011cbeb570
hv_vmbus_channel_establish_gpadl() at hv_vmbus_channel_establish_gpadl+0x4e2/frame 0xfffffe011cbeb5d0
hv_nv_on_device_add() at hv_nv_on_device_add+0x670/frame 0xfffffe011cbeb640
hv_rf_on_device_add() at hv_rf_on_device_add+0x8f/frame 0xfffffe011cbeb6f0
netvsc_attach() at netvsc_attach+0x141c/frame 0xfffffe011cbeb830
device_attach() at device_attach+0x41d/frame 0xfffffe011cbeb890
hv_vmbus_child_device_register() at hv_vmbus_child_device_register+0x13a/frame 0xfffffe011cbeb980
vmbus_channel_on_offer_internal() at vmbus_channel_on_offer_internal+0x33c/frame 0xfffffe011cbeb9c0
work_item_callback() at work_item_callback+0x10/frame 0xfffffe011cbeb9e0
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfffffe011cbeba40
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe011cbeba70
fork_exit() at fork_exit+0x84/frame 0xfffffe011cbebab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe011cbebab0

So let's use a better retry method instead, by

  1. retry more times;
  2. use a slightly bigger delay and double the delay time on retry;
  3. use pause_sbt() to replace the busy DELAY();

Usually hv_vmbus_post_msg_via_msg_ipc() doesn't fail and I think it only fails
when creating GPADLs of big shared memory with the host, e.g., in the above
netvsc initialization code, a GPADL of 15MB sendbuf is created, causing lots
of messages posted to the host in a short period of time, and it looks the
host (or the hypervisor) may have a throttling machanism by returning
HV_STATUS_INSUFFICIENT_BUFFERS. In practice, we resove the issue by
delay-and-retry. And, since the GPADL setup is one-shot, there is no
performance issue when we retry with a bigger delay -- usually we only need
to retry once or twice.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 2987
Build 3017: arc lint + arc unit

Event Timeline

decui_microsoft.com retitled this revision from to hyperv/vmbus: use a better retry method in hv_vmbus_post_message().
decui_microsoft.com updated this object.
decui_microsoft.com edited the test plan for this revision. (Show Details)
sepherosa_gmail.com edited edge metadata.

Yeah, the more NICs you have the higher chance you got this panic.

My worry is, now that the code sleeps in the globally shared taskqueue_thread. I'd suggest to create two vmbus taskqueues, instead of sharing the global taskqueue_fast/taskqueue_thread. Though that should be done in another patch.

This revision is now accepted and ready to land.Mar 23 2016, 3:40 AM

Yeah, the more NICs you have the higher chance you got this panic.

My worry is, now that the code sleeps in the globally shared taskqueue_thread. I'd suggest to create two vmbus taskqueues, instead of sharing the global taskqueue_fast/taskqueue_thread. Though that should be done in another patch.

You can not sleep on taskqueue_fast, so only the places on taskqueue_thread need change.

Yeah, the more NICs you have the higher chance you got this panic.

My worry is, now that the code sleeps in the globally shared taskqueue_thread. I'd suggest to create two vmbus taskqueues, instead of sharing the global taskqueue_fast/taskqueue_thread. Though that should be done in another patch.

You can not sleep on taskqueue_fast, so only the places on taskqueue_thread need change.

I'd prefer to have our own taskqueues for the mgmt stuffs, i.e. the msg and channel offers, so we could do malloc(WAITOK) and sleep as we wish.

Yeah, the more NICs you have the higher chance you got this panic.

My worry is, now that the code sleeps in the globally shared taskqueue_thread. I'd suggest to create two vmbus taskqueues, instead of sharing the global taskqueue_fast/taskqueue_thread. Though that should be done in another patch.

You can not sleep on taskqueue_fast, so only the places on taskqueue_thread need change.

I'd prefer to have our own taskqueues for the mgmt stuffs, i.e. the msg and channel offers, so we could do malloc(WAITOK) and sleep as we wish.

I think Jun agrees on this. I'll make another patch to add VMBus's own sleepable taskqueue.

I think Jun's point is: we don't need to add our own fast taskqueue for VMBus. Since almost all the users of fast taskqueue in a Hyper-V VM are our drivers (NIC, storage, timer, etc), I think it's ok for us to share the global fast taskqueue.

Yeah, the more NICs you have the higher chance you got this panic.

My worry is, now that the code sleeps in the globally shared taskqueue_thread. I'd suggest to create two vmbus taskqueues, instead of sharing the global taskqueue_fast/taskqueue_thread. Though that should be done in another patch.

You can not sleep on taskqueue_fast, so only the places on taskqueue_thread need change.

I'd prefer to have our own taskqueues for the mgmt stuffs, i.e. the msg and channel offers, so we could do malloc(WAITOK) and sleep as we wish.

I think Jun agrees on this. I'll make another patch to add VMBus's own sleepable taskqueue.

I think Jun's point is: we don't need to add our own fast taskqueue for VMBus. Since almost all the users of fast taskqueue in a Hyper-V VM are our drivers (NIC, storage, timer, etc), I think it's ok for us to share the global fast taskqueue.

The point of using private taskqueue is that we could use malloc(WAITOK) and sleep safely instead of:
ptr = malloc(NOWAIT);
KASSERT(ptr != NULL);
in our msg handlers.

taskqueue_fast, which is swi actually, is not fast taskqueue :)

Note that KASSERTing that malloc(M_NOWAIT) did not returned NULL is unacceptable. This is false condition, which result in the undeserved panic in the debugging builds, and in NULL pointer exception in production builds.

If you have a task queue which can wait, you should consider the consequences, even if you make the queue private for hyperv stuff. Are e.g. disk writes blocked while the taskqueue is not processed ? If yes, this is quite bad, since producing free memory to satisfy the malloc() request might involve page-out and disk writes.

In D5715#122044, @kib wrote:

Note that KASSERTing that malloc(M_NOWAIT) did not returned NULL is unacceptable. This is false condition, which result in the undeserved panic in the debugging builds, and in NULL pointer exception in production builds.

If you have a task queue which can wait, you should consider the consequences, even if you make the queue private for hyperv stuff. Are e.g. disk writes blocked while the taskqueue is not processed ? If yes, this is quite bad, since producing free memory to satisfy the malloc() request might involve page-out and disk writes.

I/O and performance related bits are handled in other taskqs.

The two taskqs I mentions above are only used as the 'config/init' taskqueues; they may be only run several times on a long lived system.

This revision was automatically updated to reflect the committed changes.