When an NMI occurs while a thread is being context switched out, the count saved for the thread is the overflowed counter value. It should instead be the reload count minus the overflowed value. On the subsequent context switch in, the incorrect value is loaded into the counter, and as a result no further interrupts are observed.
With the current code, once this happens no more NMIs are observed, and therefore no more samples are collected. This change ensures that the performance counter is still loaded correctly and that NMIs continue to be generated after such an event.
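The adjustment described above can be sketched roughly as follows. This is only an illustration, not the actual hwpmc code: the function name, its parameters, and the handling of an overflow larger than one full period are all assumptions made for the example. It treats the post-munge value read at switch-out as the number of events remaining until the next overflow, so a negative value (such as the -70 in the log below) means the counter has already overflowed by that many events.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of hwpmc): compute the count to save
 * for a sampling PMC when its thread is switched out.
 *
 * remaining: post-munge counter value, i.e. events left before the
 *            next overflow; negative if the counter already overflowed.
 * reload:    the PMC's reload count (sampling period).
 */
static int64_t
saved_count_on_switch_out(int64_t remaining, int64_t reload)
{
	assert(reload > 0);

	/* No overflow yet: save the remaining count unchanged. */
	if (remaining > 0)
		return (remaining);

	/* Overflowed by -remaining events: reload count minus the overflow. */
	remaining += reload;

	/*
	 * Assumption for this sketch only: if the overflow exceeded a
	 * whole period, restart with the full reload count.
	 */
	if (remaining <= 0)
		remaining = reload;

	return (remaining);
}
```

With a hypothetical reload count of 65536, a saved value of -70 would become 65466, so the counter loaded on the next switch in still overflows and raises an NMI.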
Logs from an occurrence (most recent entry first):
18020 193 87929932928880 CSW:SWO:1: cpu=193 proc=0xfffff8015380a000 (6318, ctfconvert) pp=0xfffff80129d94400
18019 248 87929912357700 CSW:SWO:3: cpu=248 adjri=15
18018 248 87929912321960 MDP:INT:2: retval=0 isnull=16 v=-8790397116416
18017 248 87929912321780 MDP:INT:1: cpu=248 tf=0xfffffe00015ddf30 um=0
18016 248 87929912320280 MDP:SWO:1: pc=0xfffff80105ebe400 pp=0xfffff80129d94400 enable-msr=0
18015 248 87929912320080 MDP:CFG:1: cpu=248 ri=0 pm=0x0
18014 248 87929912319200 CSW:SWO:2: cpu=248 ri=17 val=-70 (samp)
18013 248 87929912319120 MDP:REA:2: amd-read (post-munge) id=0 -> -70
18012 248 87929912319080 MDP:REA:2: amd-read (pre-munge) id=0 -> 70
18011 248 87929912318940 MDP:REA:1: amd-read id=0 class=2
18010 248 87929912318440 MDP:STO:1: amd-stop ri=0
18009 248 87929912317640 CSW:SWO:1: cpu=248 proc=0xfffff8015380a000 (6318, ctfconvert) pp=0xfffff80129d94400