Page MenuHomeFreeBSD

hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr()
ClosedPublic

Authored by decui_microsoft.com on Dec 16 2015, 1:46 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 10 2024, 1:53 AM
Unknown Object (File)
Dec 9 2024, 5:12 AM
Unknown Object (File)
Sep 26 2024, 10:13 PM
Unknown Object (File)
Sep 26 2024, 9:56 AM
Unknown Object (File)
Sep 16 2024, 5:44 PM
Unknown Object (File)
Sep 16 2024, 1:39 PM
Unknown Object (File)
Sep 16 2024, 10:12 AM
Unknown Object (File)
Sep 14 2024, 9:45 PM
Subscribers

Details

Summary

We'll remove the per-channel control_work_queue because it can't properly do
serialization of message handling, e.g., when there are 2 NIC devices,
vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition:
for an SMP VM, vmbus_channel_process_offer() can run concurrently on different
CPUs and if the second NIC's
vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs first,
the second NIC's name will be hn0 and the first NIC's name will be hn1!

We can fix the race condition by removing the per-channel control_work_queue
and run all the message handlers in the global
hv_vmbus_g_connection.work_queue -- we'll do this in the next patch.

With the coming next patch, we have to run the non-blocking handlers directly
in the kernel thread vmbus_msg_swintr(), because the special handling of
sub-channel: when a sub-channel (e.g., of the storvsc driver) is received and
being handled in vmbus_channel_on_offer() running on the global
hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer() invokes
channel->sc_creation_callback, i.e., storvsc_handle_sc_creation, and the
callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message and
expect a further reply from the host, but the handling of the further message
can't be done because the current message's handling hasn't finished yet; as a
result, hv_vmbus_channel_open() -> sema_timedwait() will time out and the
device can't work.

Also renamed the handler type from hv_pfn_channel_msg_handler to
vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense.

Signed-off-by: Dexuan Cui <decui@microsoft.com>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 1695
Build 1701: arc lint + arc unit

Event Timeline

decui_microsoft.com retitled this revision from to hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr().
decui_microsoft.com updated this object.
royger edited edge metadata.

LGTM (although I know nothing about HyperV internals).

AFAICT from your description, you cannot run the handlers from vmbus_msg_swintr because some of them expect a reply, and this reply won't be delivered because the previous message hasn't been marked as 'done'. Is there any reason why you don't simply copy the message, mark it as done and then launch the handler from vmbus_msg_swintr?

I've also realized that you are registering the software interrupts without the INTR_MPSAFE flag (see line ~498 of hv_vmbus_drv_freebsd.c), which means that vmbus_msg_swintr will never run concurrently on different CPUs, is this intended?

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
122

Align with 4 spaces

This revision is now accepted and ready to land.Dec 23 2015, 5:14 PM
In D4596#99078, @royger wrote:

LGTM (although I know nothing about HyperV internals).

AFAICT from your description, you cannot run the handlers from vmbus_msg_swintr because some of them expect a reply, and this reply won't be delivered because the previous message hasn't been marked as 'done'. Is there any reason why you don't simply copy the message, mark it as done and then launch the handler from vmbus_msg_swintr?

There are 2 kinds of handlers: sleepable handlers and non-sleepable handlers.

Because we can't sleep in swi thread, so the first kind of handlers have to be pushed to a taskqueue: "hv_queue_work_item(hv_vmbus_g_connection.work_queue,...)".

When a handler of the first kind is running in the takqueue, it can send a message to the host and then sleep, expecting a 'completion' reply message, which will be firstly handled by the swi thread -- here we must run this (non-sleepable) handler in the swi_thread rather than trying to push it to the same taskqueue, because the sleepable handler is sleeping in the taskqueue so any new (non-sleepable) task (i.e., handler) won't have a chance to run.

In D4596#99078, @royger wrote:

I've also realized that you are registering the software interrupts without the INTR_MPSAFE flag (see line ~498 of hv_vmbus_drv_freebsd.c), which means that vmbus_msg_swintr will never run concurrently on different CPUs, is this intended?

There are 2 kinds of messages: timer messages (Howard Su is implementing Hyper-V timer with this kind of msg) and non-timer messages (i.e., the messages are handled by vmbus_msg_swintr()).

Timer messages can arrive on every CPUs and Howard Su is hanndling them in hv_vmbus_isr() directly.

Non-timer messages can only arrive on CPU0. So it's OK to register the message swi thread without the INTR_MPSAFE. BTW, actually we only need to register 1 swi thread. I'm going to make a patch for this.

However, we indeed need to add INTR_MPSAFE when registering the event swi threads (hv_vmbus_on_events()). I'm going to make a patch for this.

This revision was automatically updated to reflect the committed changes.