When the ktls_buffer zone needs to expand, it may fail due to a lack of physically contiguous memory. We tried to rectify that by introducing an alloc thread to provide a context where it is harmless to sleep, and letting that thread repopulate the ktls_buffer zone. However, it turns out that M_WAITOK is not enough, and we must call vm_page_reclaim_contig_domain() to reclaim contig memory. Worse, M_WAITOK results in the allocation essentially busy-looping around vm_domain_alloc_fail() returning EAGIN, causing vm_page_alloc_noobj_contig_domain() to loop and resulting in the alloc thread consuming 100% CPU. To fix this, we change the alloc thread to call vm_page_reclaim_contig_domain_ext() In order to prevent the busy loop around vm_domain_alloc_fail(), we must change the uma_zalloc flags to M_NORECLAIM | M_NOWAIT. However, once that is done, these allocations become no different than the allocations done in the critical path in ktls_buffer_alloc(), so its best to just eliminate them. Since we're no longer doing allocations but just calling vm_page_reclaim_contig_domain_ext(), the name has changed to the ktls reclaim thread.
Details
Details
- Reviewers
markj jhb - Commits
- rG198558523361: ktls: re-work alloc thread
Put a machine under memory pressure while serving a netflix heavy webserver workload, ensure that we see a small number of failed allocations in the ktls_buffers zone, and that the alloc thread is not consuming 100% of a core.
Diff Detail
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Comment Actions
The first patch did not test well.. We wound up with just a single 16k allocation succeeding per wakeup of the alloc thread. I'm going to try a few things and revise the patch.