When system call auditing is configured, we see measurable performance loss on high-core-count servers due to the atomic operations performed by the uncontended rw read lock of evclass_lock, which protects the evclass hash table. A contrived example of 64 threads continuously reading a byte from per-thread files shows 99% of the time spent in this stack: amd64_syscall -> audit_syscall_enter -> au_event_class -> __rw_rlock_int
Since entries are only ever added to the evclass hash table, never removed, serializing additions with a mutex and converting the table to ck_list provides sufficient protection for lockless lookups. In the contrived example, this change increases performance from 5M reads/sec to 70M reads/sec on an AMD 7502P. In the real world, it gets us back about 1.5% CPU on busy servers.