Eliot and I have done extensive testing of L3C promotion on a variety of workloads. This includes some testing by me on a system configured with a 16KB base page size. Happily, on that system the number of L3C promotions to 2MB mappings is very close to the number of L2 promotions on a system with a 4KB base page size. Moreover, as expected, the number of L2 promotions is unaffected by this change.
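The two counts are comparable because both mapping sizes come to 2MB. The following is only an illustrative sketch of that arithmetic, not code from the change. The helper names `l3c_size` and `l2_size` are hypothetical, and the sketch assumes the ARMv8-A translation granule rules: 8-byte descriptors, with the contiguous hint spanning 16 L3 entries for a 4KB granule and 128 L3 entries for a 16KB granule.

```c
#include <stdint.h>

/*
 * Bytes covered by one L3C (contiguous-hint) mapping.
 * Assumption: ARMv8-A contiguous ranges of 16 L3 entries for a 4KB
 * granule and 128 L3 entries for a 16KB granule.  Other page sizes
 * are not handled by this sketch and yield 0.
 */
static uint64_t
l3c_size(uint64_t page_size)
{
	switch (page_size) {
	case 4096:
		return (16 * page_size);	/* 64KB */
	case 16384:
		return (128 * page_size);	/* 2MB */
	default:
		return (0);
	}
}

/*
 * Bytes covered by one L2 block mapping: a full L3 table holds
 * page_size / 8 descriptors, each mapping one base page.
 */
static uint64_t
l2_size(uint64_t page_size)
{
	return ((page_size / 8) * page_size);
}
```

Under these assumptions, `l3c_size(16384)` and `l2_size(4096)` both evaluate to 2MB, whereas `l2_size(16384)` is 32MB, so an L3C promotion on the 16KB system covers the same amount of memory as an L2 promotion on the 4KB system.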
The downside to this change is that it increases the direct and indirect costs of madvise(MADV_FREE). In a buildworld workload, the net effect is still positive. However, for GraphChi running the PageRank algorithm on a large graph, there is a significant increase in the number of page faults. Jemalloc calls madvise(MADV_FREE) on a significant amount of memory that it later reuses, and we then take page faults to repromote that memory to L3C (and L2) mappings. This has always been an issue with madvise(MADV_FREE), but this change makes it somewhat worse.