On a 4-core rpi2 it improves build time by ~10%, from 435 to 390 secs, and CPU utilization from ~325% to ~370%.
On a 12-core hsw-ep it improves build time by ~30%, from 25 to 16 secs, and CPU utilization from ~400% to ~650%.
Scaling is still far from ideal (650% of a possible 1200% on hsw-ep), mostly due to chained dependencies in binutils.
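
As a rough illustration of why chained dependencies cap the gain, the sketch below computes a lower bound on build time for a small task graph. The task names, durations and dependencies are hypothetical and are not taken from the binutils build; the point is only that once the longest dependency chain dominates, adding cores no longer helps.

```python
from functools import lru_cache

# Hypothetical task durations in seconds (assumption, not measured data).
durations = {"a": 4, "b": 6, "c": 3, "d": 5, "e": 5, "f": 4}

# Hypothetical dependencies: each task lists the tasks it must wait for.
deps = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["c"],
    "e": ["b"],
    "f": ["c"],
}


@lru_cache(maxsize=None)
def finish_time(task):
    """Earliest finish time of a task given unlimited cores."""
    start = max((finish_time(d) for d in deps[task]), default=0)
    return start + durations[task]


def critical_path():
    """Length of the longest dependency chain in seconds."""
    return max(finish_time(t) for t in durations)


def lower_bound(cores):
    """No schedule can beat either the critical path or total work / cores."""
    total_work = sum(durations.values())
    return max(total_work / cores, critical_path())


if __name__ == "__main__":
    for cores in (1, 4, 12):
        print(cores, "cores: build time >=", lower_bound(cores), "secs")
```

In this made-up graph the chain a -> b -> c -> d takes 18 of the 27 total seconds, so the bound is the same for 4 and 12 cores.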