Page MenuHomeFreeBSD

A few fail point upgrades
ClosedPublic

Authored by matthew.bryan_isilon.com on Feb 24 2016, 11:40 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Dec 5, 8:48 AM
Unknown Object (File)
Mon, Dec 2, 10:54 AM
Unknown Object (File)
Nov 26 2024, 9:47 AM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Unknown Object (File)
Nov 25 2024, 5:33 PM
Subscribers

Details

Summary

This is several year's worth of fail point upgrades done at EMC Isilon. They are interdependent enough that it makes sense to put a single diff up for them. Primarily, we added:

  • Changing all mainline execution paths to be lockless, which lets us use fail points in more sleep-sensitive areas, and allows more parallel execution
  • A number of additional commands, including 'pause' that lets us do some interesting deterministic repros of race conditions
  • The ability to dump the stacks of all threads sleeping on a fail point
  • A number of other API changes to allow marking up the fail point's context in the code, and firing callbacks before and after execution
  • A man page update
Test Plan

This has been tested internally using test automation, and through real use. We are investigating how/if we can upstream those tests (Kyua?). Regardless, we will continue to run them in house. NGie can chime in about that if he has the moment to.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

matthew.bryan_isilon.com retitled this revision from to A few fail point upgrades.
matthew.bryan_isilon.com updated this object.
matthew.bryan_isilon.com edited the test plan for this revision. (Show Details)
matthew.bryan_isilon.com set the repository for this revision to rS FreeBSD src repository - subversion.
head/share/man/man9/fail.9
29 ↗(On Diff #13697)

It should be spelled out and not padded. February 2, 2016

Style police!

head/sys/kern/kern_fail.c
218 ↗(On Diff #13697)

style: assign outside of declaration.

289 ↗(On Diff #13697)

style: empty line needed

319 ↗(On Diff #13697)

style: empty line needed

341 ↗(On Diff #13697)

style: empty line needed

342 ↗(On Diff #13697)

We

417 ↗(On Diff #13697)

Convert.

Also suggest moving 'int timo' above this with a blank line after.

565 ↗(On Diff #13697)

whitespace change

838–840 ↗(On Diff #13697)

style consistency, no braces needed here.

869 ↗(On Diff #13697)

period

896 ↗(On Diff #13697)

period

1098 ↗(On Diff #13697)

style: empty line needed here

matthew.bryan_isilon.com added inline comments.
head/sys/kern/kern_fail.c
565 ↗(On Diff #13697)

The existing style in the file seems to be 8 space line continuation, so I was synchronizing it with that. This seems fine.

head/sys/kern/kern_fail.c
222 ↗(On Diff #13701)

Dang wrong name here. Fixed.

pho edited edge metadata.

I have tested this patch on both i386 and amd64. Tests included all of stress2 and a buildworld.
No problems seen.

This revision is now accepted and ready to land.Feb 26 2016, 6:22 AM
head/sys/kern/kern_fail.c
69 ↗(On Diff #13702)

Usually sys/systm.h is the second included header, first being sys/param.h. Move it to the start of the #include blocks, before sys/ctype.h.

82 ↗(On Diff #13702)

Style: space around binary op.

166 ↗(On Diff #13702)

Why do you need explicitely sized type there ?

188 ↗(On Diff #13702)

Style: no space between start and function name.

225 ↗(On Diff #13702)

why is the TAILQ_EMPTY check needed ?

282 ↗(On Diff #13702)

Are parallel modifications to the fe_entries list possible ?

303 ↗(On Diff #13702)

Why _rel semantic is needed there ?

361 ↗(On Diff #13702)

We usually say 'triggering' in this situation.

364 ↗(On Diff #13702)

This really belongs to the declarations block, according to style.

head/sys/kern/subr_sleepqueue.c
1108 ↗(On Diff #13702)

At this moment, the thread lock should be equal to the sleepq_chain lock. Your thread_lock() recurses, it seems. Why do you need the thread lock there at all ?

head/sys/sys/fail.h
68 ↗(On Diff #13702)

Why do you need explicitely sized type there ?

matthew.bryan_isilon.com added inline comments.
head/sys/kern/kern_fail.c
69 ↗(On Diff #13702)

Moving it caused the build to break, but it looks like it is superfluous anyway. I'm just going to remove that include.

166 ↗(On Diff #13702)

Not needed. Changed back to int.

225 ↗(On Diff #13702)

It isn't. Removed.

282 ↗(On Diff #13702)

No; once fe_entries is created it is read only until refs go to 0, at which point at most one thread can garbage collect it. Having a single garbage collector is ensured by locking done during calls to the sysctl handler. What CAN change is:

  • Each fail_point_entry in the list can have atomic operations performed on them
  • The pointer in the fail point to a different fe_entries can be atomically swapped.

So the list itself is never modified in parallel.

303 ↗(On Diff #13702)

Actually this should be acq: we don't want any accesses ordered before we increment here. Thanks for pointing that out.

head/sys/kern/subr_sleepqueue.c
1108 ↗(On Diff #13702)

Ah I didn't spot that lock dance. Great; I'll take the explicit lock/unlock of the thread lock out and replace it with a comment so it is bloody clear to the reader we hold it. Thanks.

head/sys/sys/fail.h
68 ↗(On Diff #13702)

Not necessary. I cleaned up the whitespace in this area too.

matthew.bryan_isilon.com edited edge metadata.
matthew.bryan_isilon.com removed rS FreeBSD src repository - subversion as the repository for this revision.
matthew.bryan_isilon.com marked 6 inline comments as done.

CR fixes re: kib's comments.

This revision now requires review to proceed.Feb 29 2016, 7:39 PM
kib edited edge metadata.
kib added inline comments.
sys/kern/subr_sleepqueue.c
1108

What you really need there, is a guarantee that the thread does not become runnable suddenly. This is provided by owning the lock of the sleep chain the thread is waiting on, since rescheduling requires taking the thread lock.

IMO it would be more useful to reader to give him a note about this, then just reference the ownership.

This revision is now accepted and ready to land.Mar 1 2016, 2:47 PM

On an unrelated note. The failpoint code is one of the last places using the deprecated timeout(9) API. I have an old patch to convert it that needs testing (and will need to be reworked after these changes). It is at www.freebsd.org/~jhb/patches/fail_callout.patch It would be really great to get that in before 11 so I can axe timeout() from 11.

sys/kern/kern_fail.c
372

Why the barrier here? (Also, you should compare against 0, not use '!' per style(9)).

399

Why the barrier here?

558

What other memory operations are these barriers relative to? Also, this algorithm seems racy. Just because it's > 0 now doesn't mean it is still > 0 when you do the subtraction. You should instead use a loop with cmpset or possibly use fetchadd, e.g.:

val = ent->fe_count;
while (val > 0) {
    if (atomic_cmpset_32(&ent->fe_count, val, val - 1)) {
        val--;
        break;
    }
    val = ent->fe_count;
}
if (val == 0)
    ent->fe_stale = true;

Or using fetchadd:

val = ent->fe_count;
if (val > 0)
    val = atomic_fetchadd_32(&ent->fe_count, -1) - 1;
if (val <= 0)
    ent->fe_stale = true;

The first one would never allow fe_count to go negative. The second one would.

Responses to John. John - does this approach sound right to you?

sys/kern/kern_fail.c
372

Fixed style issue.

This is the only form of load available in atomic.h according to atomic(9).

<type>
atomic_load_acq_<type>(volatile <type> *p);

I used an atomic load here out of an abundance of caution. My thought is that in an SMP environment I want to make sure that a modification of the refcnt cannot jump this load. Given this is garbage collection, though, I'm not sure it matters too much. It doesn't affect correctness either way since we are no longer linked to the fail point: the worst thing that would happen I think is that we wouldn't garbage collect something we otherwise could. So perhaps we can't justify the performance hit of a memory barrier here. John - does that sound right to you?

399

This is for the same reason as above: this is the only form of atomic load. But I think in this case that is more caution than is needed: if a store get reordered we are just going to loop again. I'm just going to remove the atomicity of the load. Does that make sense to you as well?

558

Right, so if the count becomes negative we know the entry is stale, just as though it was 0. We could end up setting stale to 'true' twice, but that is obviously idempotent. So this is a bit weird, but it is simple and not incorrect I think.

What none of the above solutions solve is that we can end up executing this entry more times than intended because we already checked the stale field above. At the time I designed this I wanted to side with simplicity because of how complex this function is becoming, so I accepted that. But perhaps I need to revisit that decision now. I could use a modified version of your first algorithm for this.

if (ent->fe_stale)
      continue;
if (....UNTRACKED) {
 val = ent->fe_count;
 execute = false
 while (val > 0) {
    if (atomic_cmpset_32(&ent->fe_count, val, val - 1)) {
        val--;
        execute = true;
        break;
    }
    val = ent->fe_count;
 }
 if (execute == false)
    /* We lost the race; consider the entry stale and bail now */
    continue;
 if (val == 0)
    ent->fe_stale = true;

}

This revision now requires review to proceed.Mar 7 2016, 9:06 PM
sys/kern/kern_fail.c
372

If you don't need a barrier, just use x = y. There isn't an atomic op for simple loads and stores as you can do that with normal assignment (if your variable is volatile).

The barrier doesn't do anything to prevent some other CPU modifying the ref count or ordering the write of that other CPU with respect to your read. The barrier only prevents operations that this thread/CPU is doing from moving relative to the read of the ref_count. In this case there doesn't seem to be any clear relation of the operations after this relative to the read.

399

Yes.

558

Is this trying to only trigger the fail point N times? If so, I think your new code is fine.

Ready to merge?

sys/kern/kern_fail.c
372

Awesome background! Thanks John.

558

Yep.

kib edited edge metadata.
kib added inline comments.
sys/kern/kern_fail.c
370

This should have 4 spaces less for indentation.

This revision is now accepted and ready to land.Mar 12 2016, 3:46 AM
This revision was automatically updated to reflect the committed changes.