opencrypto: Fix assignment of crypto completions to worker threads
Since r336439 we simply take the session pointer value mod the number of
worker threads (ncpu by default). On small systems this ends up
funneling all completion work through a single thread, which becomes a
bottleneck when processing IPSec traffic using hardware crypto drivers.
(Software drivers such as aesni(4) are unaffected since they invoke
completion handlers synchonously.)
Instead, maintain an incrementing counter with a unique value per
session, and use that to distribute work to completion threads.
Reviewed by: cem, jhb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28159
(cherry picked from commit 98d788c867b9e1d7a7e290254443b87ea77d8ab1)