MFC r281825: Rewrite physio() to not allocate pbufs for unmapped I/O.
pbufs is a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating pbuf to immediately convert it to bio,
allocate bio just here. If buffer needs kernel mapping, then pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.
On 40-core system doing many 512-byte reads from user level to array of
raw SSDs this change removes huge lock congestion inside pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS. On my previous
24-core system this problem also existed, but was less serious.