hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr()
ClosedPublic
Actions

Authored by decui_microsoft.com on Dec 16 2015, 1:46 PM.

Details

Reviewers

kmacy
howard0su_gmail.com
honzhan_microsoft.com
whu
delphij
royger

Commits

rS292861: hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr()

Summary

We'll remove the per-channel control_work_queue because it can't properly do
serialization of message handling, e.g., when there are 2 NIC devices,
vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition:
for an SMP VM, vmbus_channel_process_offer() can run concurrently on different
CPUs and if the second NIC's
vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs first,
the second NIC's name will be hn0 and the first NIC's name will be hn1!

We can fix the race condition by removing the per-channel control_work_queue
and run all the message handlers in the global
hv_vmbus_g_connection.work_queue -- we'll do this in the next patch.

With the coming next patch, we have to run the non-blocking handlers directly
in the kernel thread vmbus_msg_swintr(), because the special handling of
sub-channel: when a sub-channel (e.g., of the storvsc driver) is received and
being handled in vmbus_channel_on_offer() running on the global
hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer() invokes
channel->sc_creation_callback, i.e., storvsc_handle_sc_creation, and the
callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message and
expect a further reply from the host, but the handling of the further message
can't be done because the current message's handling hasn't finished yet; as a
result, hv_vmbus_channel_open() -> sema_timedwait() will time out and the
device can't work.

Also renamed the handler type from hv_pfn_channel_msg_handler to
vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense.

Signed-off-by: Dexuan Cui <decui@microsoft.com>

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

decui_microsoft.com updated this revision to Diff 11347.Dec 16 2015, 1:46 PM

decui_microsoft.com retitled this revision from to hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr().

decui_microsoft.com updated this object.

decui_microsoft.com added reviewers: whu, royger, delphij, kmacy, howard0su_gmail.com, honzhan_microsoft.com.

Herald added a subscriber: imp. · View Herald TranscriptDec 16 2015, 1:46 PM

LGTM (although I know nothing about HyperV internals).

AFAICT from your description, you cannot run the handlers from vmbus_msg_swintr because some of them expect a reply, and this reply won't be delivered because the previous message hasn't been marked as 'done'. Is there any reason why you don't simply copy the message, mark it as done and then launch the handler from vmbus_msg_swintr?

I've also realized that you are registering the software interrupts without the INTR_MPSAFE flag (see line ~498 of hv_vmbus_drv_freebsd.c), which means that vmbus_msg_swintr will never run concurrently on different CPUs, is this intended?

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
122 ↗	(On Diff #11347)	Align with 4 spaces

This revision is now accepted and ready to land.Dec 23 2015, 5:14 PM

In D4596#99078, @royger wrote:

LGTM (although I know nothing about HyperV internals).

AFAICT from your description, you cannot run the handlers from vmbus_msg_swintr because some of them expect a reply, and this reply won't be delivered because the previous message hasn't been marked as 'done'. Is there any reason why you don't simply copy the message, mark it as done and then launch the handler from vmbus_msg_swintr?

There are 2 kinds of handlers: sleepable handlers and non-sleepable handlers.

Because we can't sleep in swi thread, so the first kind of handlers have to be pushed to a taskqueue: "hv_queue_work_item(hv_vmbus_g_connection.work_queue,...)".

When a handler of the first kind is running in the takqueue, it can send a message to the host and then sleep, expecting a 'completion' reply message, which will be firstly handled by the swi thread -- here we must run this (non-sleepable) handler in the swi_thread rather than trying to push it to the same taskqueue, because the sleepable handler is sleeping in the taskqueue so any new (non-sleepable) task (i.e., handler) won't have a chance to run.

In D4596#99078, @royger wrote:

I've also realized that you are registering the software interrupts without the INTR_MPSAFE flag (see line ~498 of hv_vmbus_drv_freebsd.c), which means that vmbus_msg_swintr will never run concurrently on different CPUs, is this intended?

There are 2 kinds of messages: timer messages (Howard Su is implementing Hyper-V timer with this kind of msg) and non-timer messages (i.e., the messages are handled by vmbus_msg_swintr()).

Timer messages can arrive on every CPUs and Howard Su is hanndling them in hv_vmbus_isr() directly.

Non-timer messages can only arrive on CPU0. So it's OK to register the message swi thread without the INTR_MPSAFE. BTW, actually we only need to register 1 swi thread. I'm going to make a patch for this.

However, we indeed need to add INTR_MPSAFE when registering the event swi threads (hv_vmbus_on_events()). I'm going to make a patch for this.

decui_microsoft.com added a child revision: D4597: hyperv: vmbus: remove the per-channel control_work_queue.Dec 29 2015, 8:08 AM

Closed by commit rS292861: hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr() (authored by delphij). · Explain WhyDec 29 2015, 8:19 AM

This revision was automatically updated to reflect the committed changes.