This improves performance by reducing the number of allocations made as
we write into the memstream. This applies both to the fully buffered
case with larger memstreams and, more trivially, to the line-buffered
and unbuffered cases, which flush back to the underlying buffer more
often.

This was inspired by Apple's implementation in
https://github.com/apple-oss-distributions/libc, expanded to cover
wmemstream for consistency. I've also added a test for the bug I hit in
libder, which is what led me to notice this in the first place; that
bug is fixed in this version.

Sponsored by: Klara, Inc.