Page MenuHomeFreeBSD

A few fail point upgrades
ClosedPublic

Authored by matthew.bryan_isilon.com on Feb 24 2016, 11:40 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Dec 5, 8:48 AM
Unknown Object (File)
Mon, Dec 2, 10:54 AM
Unknown Object (File)
Nov 26 2024, 9:47 AM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Subscribers

Details

Summary

This is several year's worth of fail point upgrades done at EMC Isilon. They are interdependent enough that it makes sense to put a single diff up for them. Primarily, we added:

  • Changing all mainline execution paths to be lockless, which lets us use fail points in more sleep-sensitive areas, and allows more parallel execution
  • A number of additional commands, including 'pause' that lets us do some interesting deterministic repros of race conditions
  • The ability to dump the stacks of all threads sleeping on a fail point
  • A number of other API changes to allow marking up the fail point's context in the code, and firing callbacks before and after execution
  • A man page update
Test Plan

This has been tested internally using test automation, and through real use. We are investigating how/if we can upstream those tests (Kyua?). Regardless, we will continue to run them in house. NGie can chime in about that if he has the moment to.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

matthew.bryan_isilon.com retitled this revision from to A few fail point upgrades.
matthew.bryan_isilon.com updated this object.
matthew.bryan_isilon.com edited the test plan for this revision. (Show Details)
matthew.bryan_isilon.com set the repository for this revision to rS FreeBSD src repository - subversion.
head/share/man/man9/fail.9
29

It should be spelled out and not padded. February 2, 2016

Style police!

head/sys/kern/kern_fail.c
219

style: assign outside of declaration.

290

style: empty line needed

418

style: empty line needed

434

style: empty line needed

435

We

510

Convert.

Also suggest moving 'int timo' above this with a blank line after.

583

whitespace change

856–858

style consistency, no braces needed here.

886

period

913

period

1115–1117

style: empty line needed here

matthew.bryan_isilon.com added inline comments.
head/sys/kern/kern_fail.c
583

The existing style in the file seems to be 8 space line continuation, so I was synchronizing it with that. This seems fine.

head/sys/kern/kern_fail.c
223

Dang wrong name here. Fixed.

pho edited edge metadata.

I have tested this patch on both i386 and amd64. Tests included all of stress2 and a buildworld.
No problems seen.

This revision is now accepted and ready to land.Feb 26 2016, 6:22 AM
head/sys/kern/kern_fail.c
69

Usually sys/systm.h is the second included header, first being sys/param.h. Move it to the start of the #include blocks, before sys/ctype.h.

83

Style: space around binary op.

167

Why do you need explicitely sized type there ?

189

Style: no space between start and function name.

226

why is the TAILQ_EMPTY check needed ?

283

Are parallel modifications to the fe_entries list possible ?

304

Why _rel semantic is needed there ?

414

We usually say 'triggering' in this situation.

417

This really belongs to the declarations block, according to style.

head/sys/kern/subr_sleepqueue.c
1109

At this moment, the thread lock should be equal to the sleepq_chain lock. Your thread_lock() recurses, it seems. Why do you need the thread lock there at all ?

head/sys/sys/fail.h
68

Why do you need explicitely sized type there ?

matthew.bryan_isilon.com added inline comments.
head/sys/kern/kern_fail.c
69

Moving it caused the build to break, but it looks like it is superfluous anyway. I'm just going to remove that include.

167

Not needed. Changed back to int.

226

It isn't. Removed.

283

No; once fe_entries is created it is read only until refs go to 0, at which point at most one thread can garbage collect it. Having a single garbage collector is ensured by locking done during calls to the sysctl handler. What CAN change is:

  • Each fail_point_entry in the list can have atomic operations performed on them
  • The pointer in the fail point to a different fe_entries can be atomically swapped.

So the list itself is never modified in parallel.

304

Actually this should be acq: we don't want any accesses ordered before we increment here. Thanks for pointing that out.

head/sys/kern/subr_sleepqueue.c
1109

Ah I didn't spot that lock dance. Great; I'll take the explicit lock/unlock of the thread lock out and replace it with a comment so it is bloody clear to the reader we hold it. Thanks.

head/sys/sys/fail.h
68

Not necessary. I cleaned up the whitespace in this area too.

matthew.bryan_isilon.com edited edge metadata.
matthew.bryan_isilon.com removed rS FreeBSD src repository - subversion as the repository for this revision.
matthew.bryan_isilon.com marked 6 inline comments as done.

CR fixes re: kib's comments.

This revision now requires review to proceed.Feb 29 2016, 7:39 PM
kib edited edge metadata.
kib added inline comments.
sys/kern/subr_sleepqueue.c
1108 ↗(On Diff #13894)

What you really need there, is a guarantee that the thread does not become runnable suddenly. This is provided by owning the lock of the sleep chain the thread is waiting on, since rescheduling requires taking the thread lock.

IMO it would be more useful to reader to give him a note about this, then just reference the ownership.

This revision is now accepted and ready to land.Mar 1 2016, 2:47 PM

On an unrelated note. The failpoint code is one of the last places using the deprecated timeout(9) API. I have an old patch to convert it that needs testing (and will need to be reworked after these changes). It is at www.freebsd.org/~jhb/patches/fail_callout.patch It would be really great to get that in before 11 so I can axe timeout() from 11.

sys/kern/kern_fail.c
372 ↗(On Diff #13894)

Why the barrier here? (Also, you should compare against 0, not use '!' per style(9)).

399 ↗(On Diff #13894)

Why the barrier here?

556 ↗(On Diff #13894)

What other memory operations are these barriers relative to? Also, this algorithm seems racy. Just because it's > 0 now doesn't mean it is still > 0 when you do the subtraction. You should instead use a loop with cmpset or possibly use fetchadd, e.g.:

val = ent->fe_count;
while (val > 0) {
    if (atomic_cmpset_32(&ent->fe_count, val, val - 1)) {
        val--;
        break;
    }
    val = ent->fe_count;
}
if (val == 0)
    ent->fe_stale = true;

Or using fetchadd:

val = ent->fe_count;
if (val > 0)
    val = atomic_fetchadd_32(&ent->fe_count, -1) - 1;
if (val <= 0)
    ent->fe_stale = true;

The first one would never allow fe_count to go negative. The second one would.

Responses to John. John - does this approach sound right to you?

sys/kern/kern_fail.c
372 ↗(On Diff #13894)

Fixed style issue.

This is the only form of load available in atomic.h according to atomic(9).

<type>
atomic_load_acq_<type>(volatile <type> *p);

I used an atomic load here out of an abundance of caution. My thought is that in an SMP environment I want to make sure that a modification of the refcnt cannot jump this load. Given this is garbage collection, though, I'm not sure it matters too much. It doesn't affect correctness either way since we are no longer linked to the fail point: the worst thing that would happen I think is that we wouldn't garbage collect something we otherwise could. So perhaps we can't justify the performance hit of a memory barrier here. John - does that sound right to you?

399 ↗(On Diff #13894)

This is for the same reason as above: this is the only form of atomic load. But I think in this case that is more caution than is needed: if a store get reordered we are just going to loop again. I'm just going to remove the atomicity of the load. Does that make sense to you as well?

556 ↗(On Diff #13894)

Right, so if the count becomes negative we know the entry is stale, just as though it was 0. We could end up setting stale to 'true' twice, but that is obviously idempotent. So this is a bit weird, but it is simple and not incorrect I think.

What none of the above solutions solve is that we can end up executing this entry more times than intended because we already checked the stale field above. At the time I designed this I wanted to side with simplicity because of how complex this function is becoming, so I accepted that. But perhaps I need to revisit that decision now. I could use a modified version of your first algorithm for this.

if (ent->fe_stale)
      continue;
if (....UNTRACKED) {
 val = ent->fe_count;
 execute = false
 while (val > 0) {
    if (atomic_cmpset_32(&ent->fe_count, val, val - 1)) {
        val--;
        execute = true;
        break;
    }
    val = ent->fe_count;
 }
 if (execute == false)
    /* We lost the race; consider the entry stale and bail now */
    continue;
 if (val == 0)
    ent->fe_stale = true;

}

This revision now requires review to proceed.Mar 7 2016, 9:06 PM
sys/kern/kern_fail.c
372 ↗(On Diff #14145)

If you don't need a barrier, just use x = y. There isn't an atomic op for simple loads and stores as you can do that with normal assignment (if your variable is volatile).

The barrier doesn't do anything to prevent some other CPU modifying the ref count or ordering the write of that other CPU with respect to your read. The barrier only prevents operations that this thread/CPU is doing from moving relative to the read of the ref_count. In this case there doesn't seem to be any clear relation of the operations after this relative to the read.

399 ↗(On Diff #14145)

Yes.

558 ↗(On Diff #14145)

Is this trying to only trigger the fail point N times? If so, I think your new code is fine.

Ready to merge?

sys/kern/kern_fail.c
372 ↗(On Diff #14145)

Awesome background! Thanks John.

558 ↗(On Diff #14145)

Yep.

kib edited edge metadata.
kib added inline comments.
sys/kern/kern_fail.c
370 ↗(On Diff #14145)

This should have 4 spaces less for indentation.

This revision is now accepted and ready to land.Mar 12 2016, 3:46 AM
This revision was automatically updated to reflect the committed changes.