Move write aggregation memory copy out of vq_lock.
Memory copy is too heavy operation to do under the congested lock.
Moving it out reduces congestion by many times to almost invisible.
Since the original zio removed from the queue, and the child zio is
not executed yet, I don't see why would the copy need protection.
My guess it just remained like this from the time when lock was not
dropped here, which was added later to fix lock ordering issue.
Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput from 10% to 40%.
Reviewed by: ahrens, behlendorf, ryao
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.