Page MenuHomeFreeBSD

vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler
ClosedPublic

Authored by gusev.vitaliy_gmail.com on Dec 14 2021, 7:32 PM.
Tags
Referenced Files
Unknown Object (File)
Wed, Apr 24, 6:34 AM
Unknown Object (File)
Tue, Apr 2, 5:05 AM
Unknown Object (File)
Jan 16 2024, 9:26 PM
Unknown Object (File)
Jan 12 2024, 7:50 AM
Unknown Object (File)
Dec 20 2023, 2:46 AM
Unknown Object (File)
Dec 19 2023, 6:32 AM
Unknown Object (File)
Sep 16 2023, 6:48 PM
Unknown Object (File)
Sep 10 2023, 3:17 PM

Details

Summary

Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts.

Win10 does not use LAPIC timer to often and in most cases, and I see it is disabled by writing 0 to Initial Count Register (for Timer).

Fix: restart timer only for enabled LAPIC and enabled timer for that LAPIC.

Test Plan

Perform Suspend/Resume for Win10 and RHEL-s. Look at vmstat, top, output. Look at interrupts and cpu usage after resume.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

gusev.vitaliy_gmail.com edited the summary of this revision. (Show Details)
gusev.vitaliy_gmail.com edited the summary of this revision. (Show Details)

If the timer is not in use, then isn't ccr = 0 true during resume? Then, I would handle this in vlapic_reset_callout() (which should really be renamed to vlapic_callout_resume() IMO).

sys/amd64/vmm/io/vlapic.c
1744

If the timer is not in use, then isn't ccr = 0 true during resume? Then, I would handle this in vlapic_reset_callout() (which should really be renamed to vlapic_callout_resume() IMO).

No, ccr can be 0 even if timer is used. From Vol. 3A Manual 10.5.4 APIC Timer:

In periodic mode, the current-count register is automatically reloaded from the initial-count register when the count reaches 0 and a timer interrupt is generated, and the count-down is repeated. If during the count-down
process the initial-count register is set, counting will restart, using the new initial-count value. The initial-count register is a read-write register; the current-count register is read only.

That means if we have ccr=0 during resume in periodic mode, we should restart timer.

If the timer is not in use, then isn't ccr = 0 true during resume? Then, I would handle this in vlapic_reset_callout() (which should really be renamed to vlapic_callout_resume() IMO).

No, ccr can be 0 even if timer is used. From Vol. 3A Manual 10.5.4 APIC Timer:

In periodic mode, the current-count register is automatically reloaded from the initial-count register when the count reaches 0 and a timer interrupt is generated, and the count-down is repeated. If during the count-down
process the initial-count register is set, counting will restart, using the new initial-count value. The initial-count register is a read-write register; the current-count register is read only.

That means if we have ccr=0 during resume in periodic mode, we should restart timer.

Sorry, I meant to say that I think the patch is ok. I just think it's cleaner to handle this in vlapic_reset_callout().

ccr=0 does not imply that the timer is disabled. But if the timer is disabled, then ccr=0, is all.

That means if we have ccr=0 during resume in periodic mode, we should restart timer.

Sorry, I meant to say that I think the patch is ok. I just think it's cleaner to handle this in vlapic_reset_callout().

ccr=0 does not imply that the timer is disabled. But if the timer is disabled, then ccr=0, is all.

Did you mean to move check on icr_timer and vlapic_enabled to vlapic_reset_callout() ? If yes, why did you mention "ccr"? ccr and icr - are different.

That means if we have ccr=0 during resume in periodic mode, we should restart timer.

Sorry, I meant to say that I think the patch is ok. I just think it's cleaner to handle this in vlapic_reset_callout().

ccr=0 does not imply that the timer is disabled. But if the timer is disabled, then ccr=0, is all.

Did you mean to move check on icr_timer and vlapic_enabled to vlapic_reset_callout() ?

Yes.

If yes, why did you mention "ccr"? ccr and icr - are different.

Only because this new check can become part of the ccr == 0 case of vlapic_reset_callout().

If yes, why did you mention "ccr"? ccr and icr - are different.

Only because this new check can become part of the ccr == 0 case of vlapic_reset_callout().

In this case it executes actually non needed code:

VLAPIC_TIMER_LOCK(vlapic);

bt = vlapic->timer_freq_bt;
bintime_mul(&bt, ccr);

and later developers would have question: "Why lock is taken and some values is calculated when timer is disabled?". Because of this I put all checks before vlapic_reset_callout() call.

I am free to move check to vlapic_reset_callout() but is it really needed?

If yes, why did you mention "ccr"? ccr and icr - are different.

Only because this new check can become part of the ccr == 0 case of vlapic_reset_callout().

In this case it executes actually non needed code:

VLAPIC_TIMER_LOCK(vlapic);

bt = vlapic->timer_freq_bt;
bintime_mul(&bt, ccr);

and later developers would have question: "Why lock is taken and some values is calculated when timer is disabled?". Because of this I put all checks before vlapic_reset_callout() call.

Ok, sure. Today I have the question, "why are we loading timer_freq_bt and multiplying by ccr before checking ccr == 0?" so I wanted to encourage some further cleanup in this function.

I am free to move check to vlapic_reset_callout() but is it really needed?

No, I don't insist.

This revision is now accepted and ready to land.Dec 16 2021, 4:02 PM

@markj It seems changeset can be committed ?