Tune the cpuset macros to optimize the case where CPU_SETSIZE fits into a
single machine word. For example, CPU_SET() now compiles to the expected
shift and OR, without the two extra shifts and the additional indexing on
the memory access.
The generated code was checked for the kernel (optimized) and user-level
(unoptimized) cases with both GCC and Clang.
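
A minimal sketch of the idea, for illustration only. The DEMO_* names and
the 32-bit set size are hypothetical and are not the actual cpuset
definitions. It assumes the word count is a compile-time constant, so the
compiler can fold the word index to 0 and drop the modulo from the mask
when the whole set fits into one word:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for CPU_SETSIZE and friends (assumed values). */
#define DEMO_SETSIZE	32
#define DEMO_NBITS	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS	((DEMO_SETSIZE + DEMO_NBITS - 1) / DEMO_NBITS)

typedef struct {
	unsigned long bits[DEMO_NWORDS];
} demo_set_t;

/*
 * With a single word the index is the constant 0 and the mask is a plain
 * shift by n. With several words the usual divide/modulo (which compilers
 * turn into shifts and masks) is still used.
 */
#define DEMO_WORD(n) \
	((DEMO_NWORDS == 1) ? 0 : ((size_t)(n) / DEMO_NBITS))
#define DEMO_MASK(n) \
	(1UL << ((DEMO_NWORDS == 1) ? (size_t)(n) : ((size_t)(n) % DEMO_NBITS)))

#define DEMO_SET(n, p)	 ((p)->bits[DEMO_WORD(n)] |= DEMO_MASK(n))
#define DEMO_CLR(n, p)	 ((p)->bits[DEMO_WORD(n)] &= ~DEMO_MASK(n))
#define DEMO_ISSET(n, p) (((p)->bits[DEMO_WORD(n)] & DEMO_MASK(n)) != 0)
```

Because both branches of each conditional are constant expressions in
DEMO_NWORDS, the choice is made at compile time and the multi-word case
keeps its previous behavior.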
Reviewed by: attilio
MFC after: 2 weeks