The CPU succeeding in releasing the not last reference can still have pending stores to the object protected by the affected counter. This opens a time window where another CPU can release the last reference and free the object, resulting in use-after-free. On top of that this prevents the compiler from generating more accesses to the object regardless of how atomic_fcmpset_rel_int is implemented (of course as long as it provides the release semantic).