A 32-bit zone is used to avoid deficiencies of the counter(9) API. It may be that int is now too small, in which case we should switch at least amd64 from int to long (keeping the smaller size on 32-bit platforms).
There are very few helpers here and the code is full of ugly int casts. Perhaps this can be improved.
These issues aside, I consider the patch committable. Proposed commit message:
struct mount counters are updated very frequently, but their exact values are rarely needed. At the same time, constant use of a centralized counter causes avoidable cacheline traffic. Avoid the problem by adding per-cpu variants for the common case. The code falls back to the centralized counters when mnt_vfs_ops > 0.
Sample result of a 104-way fstatfs (performs busy/unbusy):
before: 852393 ops/s
after: 76682077 ops/s
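
For readers not familiar with the scheme, here is a minimal userspace sketch of the idea described in the commit message: updates go to a per-CPU slot in the common case and fall back to the centralized counter while an ops count is above zero. This is not the code from the patch. All names (mount_sketch, sketch_ref_add, sketch_op_enter, sketch_op_exit, SKETCH_NCPU) are hypothetical, and the sketch is single-threaded: it assumes the caller passes its CPU number and leaves out what a kernel version would need to pin the thread to a CPU and to wait for in-flight fast-path updates before folding.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NCPU 4	/* hypothetical CPU count, fixed for the sketch */

struct mount_sketch {
	atomic_int vfs_ops;		/* > 0: per-CPU fast path is disabled */
	atomic_int ref;			/* centralized counter */
	int pcpu_ref[SKETCH_NCPU];	/* per-CPU deltas, may be negative */
};

static void
sketch_init(struct mount_sketch *mp)
{
	atomic_init(&mp->vfs_ops, 0);
	atomic_init(&mp->ref, 0);
	for (int cpu = 0; cpu < SKETCH_NCPU; cpu++)
		mp->pcpu_ref[cpu] = 0;
}

/* Add delta to the reference count on behalf of the given CPU. */
static void
sketch_ref_add(struct mount_sketch *mp, int cpu, int delta)
{
	assert(cpu >= 0 && cpu < SKETCH_NCPU);
	if (atomic_load(&mp->vfs_ops) > 0) {
		/* Slow path: someone needs exact values right now. */
		atomic_fetch_add(&mp->ref, delta);
		return;
	}
	/* Fast path: touches only this CPU's slot. */
	mp->pcpu_ref[cpu] += delta;
}

/* Disable the fast path and fold per-CPU deltas into the central counter. */
static void
sketch_op_enter(struct mount_sketch *mp)
{
	if (atomic_fetch_add(&mp->vfs_ops, 1) > 0)
		return;		/* already folded by an earlier enter */
	for (int cpu = 0; cpu < SKETCH_NCPU; cpu++) {
		atomic_fetch_add(&mp->ref, mp->pcpu_ref[cpu]);
		mp->pcpu_ref[cpu] = 0;
	}
}

/* Re-enable the fast path once the last exact-value user is done. */
static void
sketch_op_exit(struct mount_sketch *mp)
{
	int prev = atomic_fetch_sub(&mp->vfs_ops, 1);

	assert(prev > 0);
	(void)prev;	/* silence unused warning when built with NDEBUG */
}
```

A caller that needs the exact count would bracket the read with sketch_op_enter() and sketch_op_exit(); everyone else only ever touches their own slot, which is where the reduction in cacheline traffic comes from.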