Jun 28 2018
Looks good and useful!
May 25 2018
May 24 2018
I have two questions:
- is it OK to leave the old SP in place in case of error? I guess it is, but in the previous implementation the SP was removed if it could not be updated, which I think is really questionable.
- is it OK to hold a mutex during a printf? I guess not, but there are already some examples doing that in this file.
Nobody wants this? It is definitely better than the current code though.
May 16 2018
Remove bad comment
May 15 2018
ae's remarks
May 14 2018
Remarks
May 11 2018
Apr 23 2018
ae's remarks
Apr 20 2018
Update the maxentries sysctl once the hashtable has been initialized, in order to reflect the actual size used.
Apr 13 2018
Apr 12 2018
Nov 3 2017
Moved function to simplify diff
Oct 23 2017
Oct 19 2017
Fixed typo in comment + 32-bit build
Oct 17 2017
Changed sysctl name to "net.inet.ipsec.async_crypto"
Oct 2 2017
Style remarks
Sep 29 2017
Updates made per jmg's remarks. I hope this is easier to understand now.
@jmg, I will post a new review soon with the changes you suggest.
Sep 25 2017
Sep 12 2017
Updated the crypto(9) man page to reflect the changes
Aug 29 2017
Aug 25 2017
As jhb suggested, I added a flag for drivers that may benefit from multi-CPU dispatch.
This provides a way to simply do direct dispatch for hardware crypto drivers.
Aug 21 2017
Jul 28 2017
Jun 12 2017
Now deleting the taskqueue in crypto_destroy()
May 30 2017
May 19 2017
Any thoughts on this review?
May 11 2017
Apr 20 2017
The taskqueue API could possibly be expanded to support this. But I don't think it could easily be extended to the use case I mentioned in the previous comment, where a single invocation handles multiple streaming requests.
I have run some further tests; it looks like a single queue shared between several worker threads is far more efficient than round-robin over several workers, each with its own queue (none of the kernel threads are pinned to a CPU).
The problem is that on a dual-socket system (2x6 cores) performance is really bad: all the time is spent in the mutex protecting the single job queue.
What would you suggest to get decent performance? Round-robin over several queues, each one served by several worker threads?

Are you making sure you're only waking one thread when adding to the queue (and not waking up all 12 threads to hammer on the mutex)? Also, even a wake-one is only effective if, when the worker thread wakes up, it processes one item before going back to sleep; otherwise you'll be waking threads that don't do anything (but hammer the mutex) before going back to sleep. If you have your worker threads polling the queue for work, you need an active flag which, when set, causes no wakeups when an item is enqueued. Then it is up to the worker thread to detect the backlog and wake additional worker threads once the backlog averages more than one item per woken worker thread. This average should probably be kept over multiple iterations unless you care more about latency than CPU usage.
Apr 18 2017
Eventually the goal for us is to use crypto(9) from IPsec to accelerate the processing of single flows. Indeed, IPsec does not guarantee packet ordering (neither does IP), but it would surely be quite harmful for some end-user applications if packets were not ordered.
The same crypto session may be used for several flows coming from the NIC on several CPUs. The packets would need to stay ordered for each flow on each CPU, but it does not really matter if the ESP packet order is lost on output, as the anti-replay window handles that on the remote host.
That's why I think it would be nice for crypto(9) users to keep ordering when dispatching the jobs.

No, this is a requirement of the IPsec layer, not all users of crypto(9) require this. For example, disk encryption does not need it, as the upper layers ensure that writes are ordered correctly (ZFS and UFS both do this). And by forcing order, you increase latency unnecessarily for other consumers.
This isn't hard to handle at the IPsec layer. You use a TAILQ to enqueue the packets w/ a simple data structure w/ a flag that gets set when the packet is completed; then, for each completed packet, while the head of the tailq is ready, send it. It's not hard, and it keeps the ordering logic where it belongs. Or you add a flag to crypto(9) and put the logic there, but you need to allow non-ordered operation.
This is maybe a bit more difficult, since we would need to reorder packets only within the flows that may share the same crypto session, but I get your idea. Maybe a reordering queue per CPU would do the job, since we expect each flow to be pinned to the same CPU.
Apr 14 2017
Apr 13 2017
Thanks for completing the job!
Nov 17 2016
Added missing m_cat call.
Nov 16 2016
Hello,
Please find an updated version using extension messages.
I also checked the new behavior with a patched version of charon.
Nov 10 2016
Nov 9 2016
delphij: is that OK for you?
Nov 8 2016
Nov 7 2016
Updated diff using full context, changed code according to delphij's comment
Oct 7 2016
Is that OK for you?
Updated diff according to ache's remarks
if (atomic_cmpset_int(&arc4rand_iniseed_state, ARC4_ENTR_HAVE,
    ARC4_ENTR_SEED) || reseed) {
	ARC4_FOREACH(arc4)
		arc4_randomstir(arc4);
}
...
if (arc4->numruns > ARC4_RESEED_BYTES || tv.tv_sec > arc4->t_reseed)
	arc4_randomstir(arc4);
This is nearly what I proposed before, but why would you reseed all the instances when the user calls arc4rand with reseed = 1?
Oct 6 2016
Yes, that is what I meant. I'm not sure whether this is a big security concern.
I meant the arc4rand_iniseed_state variable.
That's a good point indeed.
Do you have other remarks on this patch?
Oct 5 2016
Oct 4 2016
Changed malloc test to KASSERT
Oct 3 2016
Sorry, forgot to remove a commented line