It's a little silly this wasn't done already. There is no case where we delete from the middle of a list.
This also changes insertion to always go to the tail of the bucket list. If the items in a bucket are cache hot on a remote CPU, using them can be slower than if they were cache cold. How much this optimization helps will depend on the size of the system: smaller systems are more likely to have shared caches encompassing all cores. I would argue that allocator performance matters more on large systems.
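
For illustration only, here is a minimal sketch of the kind of list this implies: singly linked, insert at the tail, no removal from the middle. The names (struct bucket, struct bucket_list, bucket_list_insert_tail, bucket_list_remove_head) are hypothetical and are not the identifiers used in the actual change. Removal from the head is an assumption on my part; the text above only says that nothing is deleted from the middle.

```c
#include <stddef.h>

/* Hypothetical bucket: only the link field matters for this sketch. */
struct bucket {
	struct bucket *next;	/* single forward link, no back pointer */
	int id;			/* placeholder payload */
};

/* Hypothetical list head with a tail pointer for O(1) tail insertion. */
struct bucket_list {
	struct bucket *head;
	struct bucket **tailp;	/* points at head, or at the last bucket's next */
};

static void
bucket_list_init(struct bucket_list *l)
{
	l->head = NULL;
	l->tailp = &l->head;
}

/* Always insert at the tail, so the most recently freed bucket is reused last. */
static void
bucket_list_insert_tail(struct bucket_list *l, struct bucket *b)
{
	b->next = NULL;
	*l->tailp = b;
	l->tailp = &b->next;
}

/* Assumed removal path: take the oldest bucket from the head. */
static struct bucket *
bucket_list_remove_head(struct bucket_list *l)
{
	struct bucket *b = l->head;

	if (b == NULL)
		return (NULL);
	l->head = b->next;
	if (l->head == NULL)
		l->tailp = &l->head;	/* list is empty again, reset the tail */
	b->next = NULL;
	return (b);
}
```

Since nothing is ever unlinked from the middle, the back pointer of a doubly linked list buys nothing here, and each bucket carries one link instead of two.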