This avoids the one place where we recursively lock turnstiles. In highly contended workloads, the recursive locking imposes a measurable performance penalty. We trade failed atomic cmpsets and repeated modifications of the lock's cache line for a few pointer comparisons.
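
The idea can be illustrated with a minimal userland sketch. This is not the kernel code: struct spinlock, spin_lock(), lock_if_distinct() and unlock_if_distinct() are hypothetical names invented for the example, and a C11 atomic_flag stands in for a spin mutex. Instead of re-acquiring a lock that may already be held, the caller compares pointers and only touches the lock word when the locks differ.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a spin mutex; not the kernel's struct mtx. */
struct spinlock {
	atomic_flag	held;		/* set while the lock is owned */
	int		acquisitions;	/* times the lock word was written */
};

static void
spin_init(struct spinlock *l)
{
	atomic_flag_clear(&l->held);
	l->acquisitions = 0;
}

static void
spin_lock(struct spinlock *l)
{
	/* Every failed attempt here is a wasted atomic op on a hot line. */
	while (atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire))
		;
	l->acquisitions++;
}

static void
spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Acquire 'next' unless it is the lock the caller already holds.
 * Returns 1 if 'next' was locked here, 0 if the pointers matched.
 */
static int
lock_if_distinct(struct spinlock *held, struct spinlock *next)
{
	if (next == held)
		return (0);	/* pointer compare only, no atomic op */
	spin_lock(next);
	return (1);
}

/* Release 'next' only if lock_if_distinct() would have acquired it. */
static void
unlock_if_distinct(struct spinlock *held, struct spinlock *next)
{
	if (next != held)
		spin_unlock(next);
}
```

When both pointers name the same lock, the lock word is never written, so the cache line is left alone. When they differ, the cost over a plain lock is a single comparison.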