opencrypto: Fix assignment of crypto completions to worker threads
Since r336439 we simply take the session pointer value mod the number of
worker threads (ncpu by default). On small systems this ends up
funneling all completion work through a single thread, which becomes a
bottleneck when processing IPSec traffic using hardware crypto drivers.
(Software drivers such as aesni(4) are unaffected since they invoke
completion handlers synchonously.)
Instead, maintain an incrementing counter with a unique value per
session, and use that to distribute work to completion threads.
Reviewed by: cem, jhb
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28159