Details

Reviewers

kib
cem
jhb
ngie
bdrewery
pho

Group Reviewers

manpages

Commits

rS296927: fail(9): Upstreaming some fail point enhancements

Summary

This is several year's worth of fail point upgrades done at EMC Isilon. They are interdependent enough that it makes sense to put a single diff up for them. Primarily, we added:

Changing all mainline execution paths to be lockless, which lets us use fail points in more sleep-sensitive areas, and allows more parallel execution
A number of additional commands, including 'pause' that lets us do some interesting deterministic repros of race conditions
The ability to dump the stacks of all threads sleeping on a fail point
A number of other API changes to allow marking up the fail point's context in the code, and firing callbacks before and after execution
A man page update

Test Plan

This has been tested internally using test automation, and through real use. We are investigating how/if we can upstream those tests (Kyua?). Regardless, we will continue to run them in house. NGie can chime in about that if he has the moment to.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

matthew.bryan_isilon.com updated this revision to Diff 13697.Feb 24 2016, 11:40 PM

matthew.bryan_isilon.com retitled this revision from to A few fail point upgrades.

matthew.bryan_isilon.com updated this object.

matthew.bryan_isilon.com edited the test plan for this revision. (Show Details)

matthew.bryan_isilon.com added reviewers: bdrewery, cem, ngie, jhb, pho, kib.

matthew.bryan_isilon.com set the repository for this revision to rS FreeBSD src repository - subversion.

Herald added a reviewer: manpages. · View Herald TranscriptFeb 24 2016, 11:40 PM

Herald added a subscriber: imp. · View Herald Transcript

bdrewery added inline comments.Feb 24 2016, 11:44 PM

head/share/man/man9/fail.9
29	It should be spelled out and not padded. February 2, 2016

Style police!

head/sys/kern/kern_fail.c
219	style: assign outside of declaration.
290	style: empty line needed
418	style: empty line needed
434	style: empty line needed
435	We
510	Convert. Also suggest moving 'int timo' above this with a blank line after.
583	whitespace change
856–858	style consistency, no braces needed here.
886	period
913	period
1115–1117	style: empty line needed here

matthew.bryan_isilon.com marked 11 inline comments as done.Feb 25 2016, 12:26 AM

matthew.bryan_isilon.com added inline comments.

head/sys/kern/kern_fail.c
583	The existing style in the file seems to be 8 space line continuation, so I was synchronizing it with that. This seems fine.

CR fixes / style9 issues.

Herald edited edge metadata. · View Herald TranscriptFeb 25 2016, 12:31 AM

matthew.bryan_isilon.com added inline comments.Feb 25 2016, 12:34 AM

head/sys/kern/kern_fail.c
223	Dang wrong name here. Fixed.

Fix misspelled variable name.

Herald edited edge metadata. · View Herald TranscriptFeb 25 2016, 12:36 AM

I have tested this patch on both i386 and amd64. Tests included all of stress2 and a buildworld.
No problems seen.

This revision is now accepted and ready to land.Feb 26 2016, 6:22 AM

kib added inline comments.Feb 26 2016, 5:50 PM

head/sys/kern/kern_fail.c
69	Usually sys/systm.h is the second included header, first being sys/param.h. Move it to the start of the #include blocks, before sys/ctype.h.
83	Style: space around binary op.
167	Why do you need explicitely sized type there ?
189	Style: no space between start and function name.
226	why is the TAILQ_EMPTY check needed ?
283	Are parallel modifications to the fe_entries list possible ?
304	Why _rel semantic is needed there ?
414	We usually say 'triggering' in this situation.
417	This really belongs to the declarations block, according to style.
head/sys/kern/subr_sleepqueue.c
1109	At this moment, the thread lock should be equal to the sleepq_chain lock. Your thread_lock() recurses, it seems. Why do you need the thread lock there at all ?
head/sys/sys/fail.h
68	Why do you need explicitely sized type there ?

matthew.bryan_isilon.com marked 10 inline comments as done.Feb 29 2016, 7:29 PM

matthew.bryan_isilon.com added inline comments.

head/sys/kern/kern_fail.c
69	Moving it caused the build to break, but it looks like it is superfluous anyway. I'm just going to remove that include.
167	Not needed. Changed back to int.
226	It isn't. Removed.
283	No; once fe_entries is created it is read only until refs go to 0, at which point at most one thread can garbage collect it. Having a single garbage collector is ensured by locking done during calls to the sysctl handler. What CAN change is: Each fail_point_entry in the list can have atomic operations performed on them The pointer in the fail point to a different fe_entries can be atomically swapped. So the list itself is never modified in parallel.
304	Actually this should be acq: we don't want any accesses ordered before we increment here. Thanks for pointing that out.
head/sys/kern/subr_sleepqueue.c
1109	Ah I didn't spot that lock dance. Great; I'll take the explicit lock/unlock of the thread lock out and replace it with a comment so it is bloody clear to the reader we hold it. Thanks.
head/sys/sys/fail.h
68	Not necessary. I cleaned up the whitespace in this area too.

CR fixes re: kib's comments.

This revision now requires review to proceed.Feb 29 2016, 7:39 PM

kib accepted this revision.Mar 1 2016, 2:47 PM

kib edited edge metadata.

kib added inline comments.

sys/kern/subr_sleepqueue.c
1108 ↗	(On Diff #13894)	What you really need there, is a guarantee that the thread does not become runnable suddenly. This is provided by owning the lock of the sleep chain the thread is waiting on, since rescheduling requires taking the thread lock. IMO it would be more useful to reader to give him a note about this, then just reference the ownership.

This revision is now accepted and ready to land.Mar 1 2016, 2:47 PM

On an unrelated note. The failpoint code is one of the last places using the deprecated timeout(9) API. I have an old patch to convert it that needs testing (and will need to be reworked after these changes). It is at www.freebsd.org/~jhb/patches/fail_callout.patch It would be really great to get that in before 11 so I can axe timeout() from 11.

sys/kern/kern_fail.c
372 ↗	(On Diff #13894)	Why the barrier here? (Also, you should compare against 0, not use '!' per style(9)).
399 ↗	(On Diff #13894)	Why the barrier here?
556 ↗	(On Diff #13894)	What other memory operations are these barriers relative to? Also, this algorithm seems racy. Just because it's > 0 now doesn't mean it is still > 0 when you do the subtraction. You should instead use a loop with cmpset or possibly use fetchadd, e.g.: val = ent->fe_count; while (val > 0) { if (atomic_cmpset_32(&ent->fe_count, val, val - 1)) { val--; break; } val = ent->fe_count; } if (val == 0) ent->fe_stale = true; Or using fetchadd: val = ent->fe_count; if (val > 0) val = atomic_fetchadd_32(&ent->fe_count, -1) - 1; if (val <= 0) ent->fe_stale = true; The first one would never allow fe_count to go negative. The second one would.

Responses to John. John - does this approach sound right to you?

sys/kern/kern_fail.c
372 ↗	(On Diff #13894)	Fixed style issue. This is the only form of load available in atomic.h according to atomic(9). <type> atomic_load_acq_<type>(volatile <type> *p); I used an atomic load here out of an abundance of caution. My thought is that in an SMP environment I want to make sure that a modification of the refcnt cannot jump this load. Given this is garbage collection, though, I'm not sure it matters too much. It doesn't affect correctness either way since we are no longer linked to the fail point: the worst thing that would happen I think is that we wouldn't garbage collect something we otherwise could. So perhaps we can't justify the performance hit of a memory barrier here. John - does that sound right to you?
399 ↗	(On Diff #13894)	This is for the same reason as above: this is the only form of atomic load. But I think in this case that is more caution than is needed: if a store get reordered we are just going to loop again. I'm just going to remove the atomicity of the load. Does that make sense to you as well?
556 ↗	(On Diff #13894)	Right, so if the count becomes negative we know the entry is stale, just as though it was 0. We could end up setting stale to 'true' twice, but that is obviously idempotent. So this is a bit weird, but it is simple and not incorrect I think. What none of the above solutions solve is that we can end up executing this entry more times than intended because we already checked the stale field above. At the time I designed this I wanted to side with simplicity because of how complex this function is becoming, so I accepted that. But perhaps I need to revisit that decision now. I could use a modified version of your first algorithm for this. if (ent->fe_stale) continue; if (....UNTRACKED) { val = ent->fe_count; execute = false while (val > 0) { if (atomic_cmpset_32(&ent->fe_count, val, val - 1)) { val--; execute = true; break; } val = ent->fe_count; } if (execute == false) /* We lost the race; consider the entry stale and bail now */ continue; if (val == 0) ent->fe_stale = true; }

CR fixes

This revision now requires review to proceed.Mar 7 2016, 9:06 PM

Herald edited edge metadata. · View Herald TranscriptMar 7 2016, 9:06 PM

jhb added inline comments.Mar 9 2016, 6:55 PM

sys/kern/kern_fail.c
372 ↗	(On Diff #14145)	If you don't need a barrier, just use x = y. There isn't an atomic op for simple loads and stores as you can do that with normal assignment (if your variable is volatile). The barrier doesn't do anything to prevent some other CPU modifying the ref count or ordering the write of that other CPU with respect to your read. The barrier only prevents operations that this thread/CPU is doing from moving relative to the read of the ref_count. In this case there doesn't seem to be any clear relation of the operations after this relative to the read.
399 ↗	(On Diff #14145)	Yes.
558 ↗	(On Diff #14145)	Is this trying to only trigger the fail point N times? If so, I think your new code is fine.

Ready to merge?

sys/kern/kern_fail.c
372 ↗	(On Diff #14145)	Awesome background! Thanks John.
558 ↗	(On Diff #14145)	Yep.

kib accepted this revision.Mar 12 2016, 3:46 AM

kib edited edge metadata.

kib added inline comments.

sys/kern/kern_fail.c
370 ↗	(On Diff #14145)	This should have 4 spaces less for indentation.

This revision is now accepted and ready to land.Mar 12 2016, 3:46 AM

Closed by commit rS296927: fail(9): Upstreaming some fail point enhancements (authored by cem). · Explain WhyMar 16 2016, 4:22 AM

This revision was automatically updated to reflect the committed changes.

A few fail point upgrades
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 14351

head/share/man/man9/fail.9

head/sys/kern/kern_fail.c

head/sys/kern/subr_sleepqueue.c

head/sys/sys/fail.h

head/sys/sys/sleepqueue.h

A few fail point upgradesClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 14351

head/share/man/man9/fail.9

head/sys/kern/kern_fail.c

head/sys/kern/subr_sleepqueue.c

head/sys/sys/fail.h

head/sys/sys/sleepqueue.h

A few fail point upgrades
ClosedPublic
Actions

Revision Contents
Changeset List