
make m_getm2() resilient to zone_jumbop exhaustion
ClosedPublic

Authored by gallatin on Aug 21 2020, 6:48 PM.

Details

Summary

Currently, m_getm2() will sleep or fail if we're out of page-sized mbufs. This leads to most things using sosend* (like sshd) eventually hanging while waiting for memory in the page-sized mbuf zone, and makes it impossible to communicate with a box which is under attack and has had its page-sized zone exhausted.

Rather than depending only on the page-sized zone, also try cluster allocations to satisfy the request. This allows me to ssh to, and serve 100Gb/s of traffic from, a server which is under attack and has had its page-sized zone exhausted.
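
The fallback described above can be modeled in userland as follows. This is only a sketch, not the committed kernel code: `zone_model`, `zone_alloc_nowait()`, `pick_buffer()` and `MODEL_CLUSTER_BYTES` are hypothetical names invented for the illustration, and a zone is reduced to a counter of free items.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of a regular cluster in this model (not taken from the diff). */
#define MODEL_CLUSTER_BYTES 2048

/* Hypothetical stand-in for a UMA zone: just a count of free items. */
struct zone_model {
	int free_items;
};

enum buf_kind { BUF_NONE, BUF_PAGE, BUF_CLUSTER };

/* Non-sleeping allocation: take an item if one is free, else fail at once. */
static int
zone_alloc_nowait(struct zone_model *z)
{
	assert(z != NULL);
	if (z->free_items <= 0)
		return (0);
	z->free_items--;
	return (1);
}

/*
 * Choose backing storage for one buffer of a request with 'len' bytes left.
 * The page-sized zone is only tried opportunistically; if it is exhausted,
 * the request falls back to the regular cluster zone instead of stalling.
 */
static enum buf_kind
pick_buffer(struct zone_model *page, struct zone_model *cluster, size_t len)
{
	if (len > MODEL_CLUSTER_BYTES && zone_alloc_nowait(page))
		return (BUF_PAGE);
	if (zone_alloc_nowait(cluster))
		return (BUF_CLUSTER);
	return (BUF_NONE);
}
```

With the page zone modeled as empty, a large request still gets a `BUF_CLUSTER` buffer, which is the behavior that keeps ssh usable in the scenario above. The model only shows non-sleeping attempts; in the real function the fallback allocation would honor the caller's wait flag.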

Test Plan

Boot with the loader tunable kern.ipc.nmbjumbop="4" and verify that ssh and networking in general still work.

Note via vmstat -z | egrep mbuf\|ITEM that the zone is limited in size, and has failed memory allocations.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

sys/kern/kern_mbuf.c
1442 ↗(On Diff #76071)

I think this check can be lifted into the previous block.

1443 ↗(On Diff #76071)

This test for nm != NULL is redundant.

Looks reasonable to me. I once modified the NFS code to use page-sized clusters
for larger RPC messages and it could get hung when 4K clusters were exhausted.
--> I never thought to do the 4K clusters with M_NOWAIT and then fall back to
regular clusters.

This patch looks fine to me. (I'll let you decide if markj@'s suggested changes
make the code more readable?)

gallatin edited the test plan for this revision.

Address markj's feedback:

  • Pull mb == NULL test into the previous block
  • Simplify by removing (existing) nm == NULL check and letting that happen in m_freem()
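
The second simplification relies on the chain-free routine tolerating a NULL chain, so the error path needs no separate check. A minimal userland sketch of that pattern, with hypothetical names (`struct buf`, `chain_free()`, `chain_build()`, and the `fail_at` test knob are inventions for the illustration, not kernel interfaces):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only the chain link matters here. */
struct buf {
	struct buf *next;
};

/* Free a whole chain. A NULL chain is a no-op, so callers need no check. */
static void
chain_free(struct buf *head)
{
	while (head != NULL) {
		struct buf *next = head->next;
		free(head);
		head = next;
	}
}

/* Count the buffers in a chain. */
static int
chain_length(const struct buf *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}

/*
 * Build a chain of 'count' buffers, failing the whole request if any single
 * allocation fails. 'fail_at' (hypothetical, for demonstration) forces the
 * allocation with that index to fail; pass -1 to never force a failure.
 */
static struct buf *
chain_build(int count, int fail_at)
{
	struct buf *head = NULL, *tail = NULL;

	assert(count >= 0);
	for (int i = 0; i < count; i++) {
		struct buf *b = (i == fail_at) ? NULL : calloc(1, sizeof(*b));

		if (b == NULL) {
			chain_free(head);	/* head may still be NULL here */
			return (NULL);
		}
		if (tail == NULL)
			head = b;
		else
			tail->next = b;
		tail = b;
	}
	return (head);
}
```

A failure on the very first allocation calls `chain_free(NULL)`, which does nothing; a later failure releases the partial chain through the same call.
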
This revision is now accepted and ready to land. Aug 24 2020, 9:05 PM

Is there a possible side effect where if we now never do an M_WAITOK
alloc from zone_jumbop then we may not increase the zone size up to the
limit? As in, should there be a condition in here where we only convert
M_WAITOK to M_NOWAIT if we are actually near the limit? For example
should mb_reclaim arm this?

I'm woefully ignorant of how UMA manages things, and had not realized that this would cause the zone to be more limited.

Mark, what do you think?

Is there a possible side effect where if we now never do an M_WAITOK
alloc from zone_jumbop then we may not increase the zone size up to the
limit?

You mean, because of a memory shortage? In a scenario where we are persistently failing to allocate slabs for UMA I'm not sure that it's very important to prioritize allocation from a specific zone. Maybe I'm misunderstanding?

As in, should there be a condition in here where we only convert
M_WAITOK to M_NOWAIT if we are actually near the limit? For example
should mb_reclaim arm this?

It sounds like you are suggesting a new semantic: sleep during memory shortages, but bail without sleeping if the zone limit is reached. That seems easiest to accomplish with a new malloc flag.
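
To make the proposed semantic concrete, here is a small decision-table sketch. Everything in it is hypothetical: the `XM_*` flags, `zone_state`, and `alloc_decide()` are not existing kernel interfaces, and the flag named `XM_NOLIMITWAIT` is only a placeholder for the "new malloc flag" mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags; the first two only mirror M_NOWAIT/M_WAITOK in spirit. */
#define XM_NOWAIT      0x1
#define XM_WAITOK      0x2
#define XM_NOLIMITWAIT 0x4	/* with XM_WAITOK: do not sleep on the zone limit */

/* Hypothetical, simplified view of a zone's state. */
struct zone_state {
	int in_use;	/* items currently allocated */
	int limit;	/* configured zone limit */
	int mem_avail;	/* nonzero if backing memory is available */
};

enum alloc_result { ALLOC_OK, ALLOC_FAIL, ALLOC_SLEEP };

/* Decide what an allocation attempt would do, without performing it. */
static enum alloc_result
alloc_decide(const struct zone_state *z, int flags)
{
	assert(z != NULL);
	if (z->in_use >= z->limit) {
		/* Zone limit reached: sleep only for plain XM_WAITOK. */
		if ((flags & XM_WAITOK) && !(flags & XM_NOLIMITWAIT))
			return (ALLOC_SLEEP);
		return (ALLOC_FAIL);
	}
	if (!z->mem_avail) {
		/* Memory shortage: any XM_WAITOK caller may sleep. */
		return ((flags & XM_WAITOK) ? ALLOC_SLEEP : ALLOC_FAIL);
	}
	return (ALLOC_OK);
}
```

Under this model a caller passing `XM_WAITOK | XM_NOLIMITWAIT` still sleeps during a memory shortage, but gets an immediate failure once the zone is at its limit and can then fall back to another zone.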