Details

Reviewers

kib
markj

Commits

rS353483: Fix sample check in hwpmc
rS339188: hwpmc: Refactor sample ring buffer handling to fix races

Summary

Fixes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231793

Refactor sample ring buffer ring handling to make it more robust to long running callchain collection handling

r338112 introduced a regression that exposed a number of race conditions within the management of the sample buffers. This simplifies the handling and moves the decision to overwrite a callchain sample that has taken too long out of the NMI in to the hardlock handler. With this change the problem no longer shows up as a ring corruption but as the code spending all of its time in callchain collection.

Makes the producer / consumer index incrementing monotonic, making it easier (for me at least) to reason about.
Moves the decision to overwrite a sample from NMI context to interrupt context where we can enforce serialization.
Puts a time limit on waiting to collect a user callchain - putting a bound on head-of-line blocking causing samples to be dropped
Removes the flush routine which was previously needed to purge dangling references to the pmc from the sample buffers but now is only a source of a race condition on unload.

Currently one can lock up or crash HEAD by running:
pmcstat -S inst_retired.any_p -T and then hitting ^C

After this change it is no longer possible.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

mmacy created this revision.Sep 4 2018, 8:05 AM

Herald added subscribers: gnn, imp. · View Herald TranscriptSep 4 2018, 8:05 AM

Harbormaster completed remote builds in B19380: Diff 47633.Sep 4 2018, 8:05 AM

mmacy edited the summary of this revision. (Show Details)Sep 4 2018, 9:04 AM

mmacy retitled this revision from hwpmc: Refactor sample ring buffer handilng to hwpmc: Refactor sample ring buffer handling.

I see a "panic: [pmc,4980] pm=0x9e81380 runcount 0" on i386 with this patch. If I compile the hwpmc.ko module with "-O0" I do not get the panic.
https://people.freebsd.org/~pho/stress/log/mmacy032.txt
https://people.freebsd.org/~pho/kernel+vmcore.1106-x4.txz

Can you add an explanation what exactly changes intend to fix, and how they achieve robustness improvements.

sys/dev/hwpmc/hwpmc_mod.c
4868 ↗	(On Diff #47633)	If you need to synchronize between the interrupt context and normal kernel code, then dedicating normal spinlock would be better then abusing spinlock implementation bits. I mean, if you do this for NMI interrupting userspace, spinlock would work. If this is called from NMI handler when the kernel was interrupted, you do not need neither spinlock nor spinlock_enter() (but I believe that this code is only used for the first case).
4900 ↗	(On Diff #47633)	This assert validity actually depends on the speed and load of the disk device or even network/NFS response time. I think it might be useful as a debugging tool on the specific machine, but not suitable for commit. Logging some message, as above, is useful.

no need to flush any more
remove invalid assert

Harbormaster completed remote builds in B19902: Diff 48640.Oct 2 2018, 5:35 AM

remove unused force parameter

Harbormaster completed remote builds in B19903: Diff 48641.Oct 2 2018, 5:42 AM

remove invalid assert

Harbormaster completed remote builds in B19913: Diff 48656.Oct 2 2018, 8:03 PM

Fix patch

Harbormaster completed remote builds in B19916: Diff 48659.Oct 2 2018, 8:41 PM

mmacy marked an inline comment as done.Oct 2 2018, 8:43 PM

mmacy added inline comments.

sys/dev/hwpmc/hwpmc_mod.c
4868 ↗	(On Diff #47633)	It's to serialize access between hardclock and non interrupt contexts manipulating samples. spinlock_{enter, exit} are a lot cheaper than atomics. I don't know that there is otherwise an architecture agnostic intr disable.

In D17011#363656, @kib wrote:

Can you add an explanation what exactly changes intend to fix, and how they achieve robustness improvements.

Makes the producer / consumer index incrementing monotonic, making it easier (for me at least) to reason about.
Moves the decision to overwrite a sample from NMI context to interrupt context where we can enforce serialization.
Puts a time limit on waiting to collect a user callchain - putting a bound on head-of-line blocking causing samples to be dropped
Removes the flush routine which was previously needed to purge dangling references to the pmc from the sample buffers but now is only a source of a race condition on unload.

Currently one can lock up or crash HEAD by running:
pmcstat -S inst_retired.any_p -T and then hitting ^C

After this change it is no longer possible.

@kib @markj Any further comments before I send this to RE?

markj added inline comments.Oct 3 2018, 7:55 PM

sys/dev/hwpmc/hwpmc_mod.c
4685 ↗	(On Diff #48659)	ps_considx is a 64-bit value, which can't be updated atomically on all platforms. I think this handler could observe a torn write by the interrupted thread, though it seems that might be harmless for now. Why did you change this away from ps_read/_write? Does it fix a bug?
4817 ↗	(On Diff #48659)	The indentation of the continuation line is wrong.
4837 ↗	(On Diff #48659)	The indentation looks wrong here.
4882 ↗	(On Diff #48659)	I know it's not new to this change, but PMC_SAMPLE_INUSE is a pretty bad name - any non-zero value for ps_nsamples means that the sample is "in use." You might call it PMC_SAMPLE_PENDING or so instead.
4960 ↗	(On Diff #48659)	Add a pm_stats entry for this?
4997 ↗	(On Diff #48659)	Style: indentation is wrong.
5184 ↗	(On Diff #48659)	The cast should be to uint64_t.
5570 ↗	(On Diff #48659)	You don't need the else, just write `pmc_sample_mask = pmc_nsamples - 1;` after the block.

I’ll update the patch with the other feedback.

sys/dev/hwpmc/hwpmc_mod.c
4685 ↗	(On Diff #48659)	I’ll look and see if a partial update could cause problems. The change makes the code for looking at the delta simpler as I no longer have to deal with wrap. Primarily it makes it easier for me to reason about.

markj added inline comments.Oct 3 2018, 8:05 PM

sys/dev/hwpmc/hwpmc_mod.c
4685 ↗	(On Diff #48659)	I suspect a partial update will not cause problems with the current code, it just seems fragile. I don't object to that part of the change, just noting that it could lead to bugs. I had the same impression, but now you need to add these considx != prodix checks instead.

mmacy marked an inline comment as done.Oct 3 2018, 9:07 PM

mmacy added inline comments.

sys/dev/hwpmc/hwpmc_mod.c
4685 ↗	(On Diff #48659)	Its merits are subjective. Coming from having spent a lot of time in NIC drivers, index wrap was one of the things that annoyed me most. I'm quite happy with adding considx != prodidx.

mmacy marked an inline comment as done.Oct 3 2018, 9:25 PM

mmacy added inline comments.

sys/dev/hwpmc/hwpmc_mod.c
5184 ↗	(On Diff #48659)	_int64_t_ we don't want it to go negative.

incorporate feedback from @markj

Harbormaster completed remote builds in B19942: Diff 48697.Oct 3 2018, 9:29 PM

mmacy edited the summary of this revision. (Show Details)Oct 3 2018, 10:12 PM

markj accepted this revision.Oct 4 2018, 9:40 PM

This revision is now accepted and ready to land.Oct 4 2018, 9:40 PM

Closed by commit rS339188: hwpmc: Refactor sample ring buffer handling to fix races (authored by mmacy). · Explain WhyOct 5 2018, 5:56 AM

This revision was automatically updated to reflect the committed changes.

After this change sampling under load is truly terrible - rejecting 99+% of samples:

+ pmcstat -S unhalted_core_cycles -O ppid.pmcstat sleep 10
+ pmcstat -R ppid.pmcstat -z100 -G ppid.stacks
CONVERSION STATISTICS:
 #exec/elf                                1
 #samples/total                           566445
 #callchain/dubious-frames                565995

Fixed

rajeeb.barman_amd.com added a subscriber: rajeeb.barman_amd.com.Sep 4 2019, 11:15 AM

rajeeb.barman_amd.com added inline comments.

head/sys/dev/hwpmc/hwpmc_mod.c
4837	Shouldn't be ps->ps_nsamples != PMC_USER_CALLCHAIN_PENDING instead of ps->ps_nsamples == PMC_USER_CALLCHAIN_PENDING ??

markj added inline comments.Sep 13 2019, 7:09 PM

head/sys/dev/hwpmc/hwpmc_mod.c
4837	Yes, I think you're right.

mjg added inline comments.Sep 13 2019, 7:35 PM

head/sys/dev/hwpmc/hwpmc_mod.c
4837	Hello there. I can confirm making this change takes care of the majority of lost samples, thank you for noting it! We still need to verify validity of samples collected but it does look promising.

shreyank.amartya_amd.com added a subscriber: shreyank.amartya_amd.com.Sep 19 2019, 10:43 AM

shreyank.amartya_amd.com added inline comments.

head/sys/dev/hwpmc/hwpmc_mod.c
3195	In a scenario where pmc is attached to a target process other than owner (similar to pmcstat -t <pid>), pmc_start() returns without setting the machine dependent state of the pmc to start. Should this be modified to return only when a forced context switch occurs?

mmacy added a commit: rS353483: Fix sample check in hwpmc.Oct 13 2019, 10:27 PM

hwpmc: Refactor sample ring buffer handling
AbandonedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 48765

head/sys/dev/hwpmc/hwpmc_logging.c

head/sys/dev/hwpmc/hwpmc_mod.c

head/sys/sys/pmc.h

head/sys/sys/pmckern.h

hwpmc: Refactor sample ring buffer handlingAbandonedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 48765

head/sys/dev/hwpmc/hwpmc_logging.c

head/sys/dev/hwpmc/hwpmc_mod.c

head/sys/sys/pmc.h

head/sys/sys/pmckern.h

hwpmc: Refactor sample ring buffer handling
AbandonedPublic
Actions

Revision Contents
Changeset List