make m_getm2() resilient to zone_jumbop exhaustion
Closed · Public

Authored by gallatin on Aug 21 2020, 6:48 PM.

Details

Reviewers

glebius
rrs
bz
rlibby
jeff
rmacklem
jhb
markj

Commits

rS364986: make m_getm2() resilient to zone_jumbop exhaustion

Summary

Currently, m_getm2() will sleep or fail if we're out of page-sized mbufs. This leads to most things using sosend* (like sshd) eventually hanging while waiting for memory in the page-sized mbuf zone, and makes it impossible to communicate with a box which is under attack and has had its page-sized zone exhausted.

Rather than depending solely on the page-sized zone, also try cluster allocations to satisfy the request. This allows me to ssh to, and serve 100Gb/s of traffic from, a server which is under attack and has had its page-sized zone exhausted.
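The fallback order described above can be illustrated with a small userland model. This is only a sketch and not the committed kernel code: `struct zone`, `zone_alloc_nowait()`, and `model_get_buf()` are hypothetical names invented for illustration, and the model does not attempt to represent sleeping allocations.

```c
#include <stddef.h>

/* Hypothetical userland model; none of these names are kernel APIs. */
enum buf_kind { BUF_NONE, BUF_PAGE, BUF_CLUSTER };

struct zone {
	int limit;	/* maximum number of items the zone may hand out */
	int in_use;	/* items currently allocated from the zone */
};

/* Non-sleeping allocation: fails at once when the zone is at its limit. */
int
zone_alloc_nowait(struct zone *z)
{
	if (z->in_use >= z->limit)
		return (0);
	z->in_use++;
	return (1);
}

/*
 * Try the page-sized zone first without sleeping. If it is exhausted,
 * fall back to the cluster zone instead of waiting on the page-sized one.
 */
enum buf_kind
model_get_buf(struct zone *page_zone, struct zone *cluster_zone)
{
	if (zone_alloc_nowait(page_zone))
		return (BUF_PAGE);
	if (zone_alloc_nowait(cluster_zone))
		return (BUF_CLUSTER);
	return (BUF_NONE);
}
```

With a page zone limited to one item, the first call in this model returns `BUF_PAGE` and later calls are served from the cluster zone until it is also exhausted.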

Test Plan

Boot with the loader tunable kern.ipc.nmbjumbop="4" and verify that ssh and networking in general still works.

Note via vmstat -z | egrep mbuf\|ITEM that the zone is limited in size, and has failed memory allocations.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 33151

Event Timeline

gallatin created this revision. Aug 21 2020, 6:48 PM

Herald added a subscriber: imp. Aug 21 2020, 6:48 PM

gallatin requested review of this revision. Aug 21 2020, 6:48 PM

Harbormaster completed remote builds in B33098: Diff 76071. Aug 21 2020, 6:49 PM

markj added inline comments. Aug 21 2020, 6:57 PM

sys/kern/kern_mbuf.c
1432	I think this check can be lifted into the previous block.
1433–1444	This test for `nm != NULL` is redundant.

Looks reasonable to me. I once modified the NFS code to use page-sized clusters
for larger RPC messages and it could get hung when 4K clusters were exhausted.
I never thought to try the 4K clusters with M_NOWAIT and then fall back to
regular clusters.

This patch looks fine to me. (I'll let you decide if markj@'s suggested changes
make the code more readable?)

Address markj's feedback:

Pull mb == NULL test into the previous block
Simplify by removing (existing) nm == NULL check and letting that happen in m_freem()
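The second simplification relies on the chain-freeing routine tolerating an empty chain, so the caller does not need its own NULL test. A minimal userland sketch of that pattern follows; `struct node`, `chain_prepend()`, and `chain_free()` are hypothetical stand-ins for illustration and are not the mbuf API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked chain, standing in for an mbuf chain. */
struct node {
	struct node *next;
};

/* Prepend one node to the chain; returns the new head, or NULL on failure. */
struct node *
chain_prepend(struct node *head)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return (NULL);
	n->next = head;
	return (n);
}

/*
 * Free every node in the chain and return how many were freed.
 * A NULL chain is a no-op, so callers on an error path can call this
 * unconditionally instead of testing the pointer first.
 */
int
chain_free(struct node *head)
{
	int freed = 0;

	while (head != NULL) {
		struct node *next = head->next;

		free(head);
		head = next;
		freed++;
	}
	return (freed);
}
```

In this model an error path can simply call `chain_free(partial)` whether or not anything was allocated before the failure.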

Harbormaster completed remote builds in B33151: Diff 76176. Aug 24 2020, 8:53 PM

gallatin marked 2 inline comments as done. Aug 24 2020, 8:53 PM

markj accepted this revision. Aug 24 2020, 9:05 PM

This revision is now accepted and ready to land. Aug 24 2020, 9:05 PM

glebius accepted this revision. Aug 30 2020, 4:38 PM

Closed by commit rS364986: make m_getm2() resilient to zone_jumbop exhaustion (authored by gallatin). Aug 31 2020, 1:53 PM

This revision was automatically updated to reflect the committed changes.

gallatin added a commit: rS364986: make m_getm2() resilient to zone_jumbop exhaustion.

Is there a possible side effect where if we now never do an M_WAITOK
alloc from zone_jumbop then we may not increase the zone size up to the
limit? As in, should there be a condition in here where we only convert
M_WAITOK to M_NOWAIT if we are actually near the limit? For example
should mb_reclaim arm this?

In D26150#584330, @rlibby wrote:

Is there a possible side effect where if we now never do an M_WAITOK
alloc from zone_jumbop then we may not increase the zone size up to the
limit? As in, should there be a condition in here where we only convert
M_WAITOK to M_NOWAIT if we are actually near the limit? For example
should mb_reclaim arm this?

I'm woefully ignorant of how UMA manages things, and had not realized that this would cause the zone to be more limited.

Mark, what do you think?

In D26150#584330, @rlibby wrote:

Is there a possible side effect where if we now never do an M_WAITOK
alloc from zone_jumbop then we may not increase the zone size up to the
limit?

You mean, because of a memory shortage? In a scenario where we are persistently failing to allocate slabs for UMA I'm not sure that it's very important to prioritize allocation from a specific zone. Maybe I'm misunderstanding?

As in, should there be a condition in here where we only convert
M_WAITOK to M_NOWAIT if we are actually near the limit? For example
should mb_reclaim arm this?

It sounds like you are suggesting a new semantic: sleep during memory shortages, but bail without sleeping if the zone limit is reached. That seems easiest to accomplish with a new malloc flag.
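To make the proposed semantic concrete, here is a purely hypothetical userland model. No such flag exists in the source under review: the names `WAIT_FOR_MEMORY`, `WAIT_FOR_LIMIT`, `struct zone_state`, and `model_alloc()` are all invented for illustration, and "would sleep" is reported as a return value rather than actually blocking.

```c
#include <stddef.h>

/* Hypothetical flags; these are not real malloc or UMA flags. */
#define WAIT_FOR_MEMORY	0x1	/* caller accepts sleeping on a memory shortage */
#define WAIT_FOR_LIMIT	0x2	/* caller accepts sleeping on the zone limit */

enum alloc_result {
	ALLOC_OK,		/* item allocated */
	ALLOC_WOULD_SLEEP,	/* a real allocator would block here */
	ALLOC_FAIL_LIMIT,	/* bailed out: zone limit reached */
	ALLOC_FAIL_NOMEM	/* bailed out: no backing memory */
};

struct zone_state {
	int limit;	/* configured zone limit, in items */
	int in_use;	/* items currently allocated */
	int free_pages;	/* backing memory available, one page per item */
};

/*
 * Distinguish the two reasons an allocation can stall. Passing
 * WAIT_FOR_MEMORY without WAIT_FOR_LIMIT models the semantic discussed
 * above: sleep during memory shortages, but bail at the zone limit.
 */
enum alloc_result
model_alloc(struct zone_state *z, int flags)
{
	if (z->in_use >= z->limit)
		return ((flags & WAIT_FOR_LIMIT) ?
		    ALLOC_WOULD_SLEEP : ALLOC_FAIL_LIMIT);
	if (z->free_pages == 0)
		return ((flags & WAIT_FOR_MEMORY) ?
		    ALLOC_WOULD_SLEEP : ALLOC_FAIL_NOMEM);
	z->free_pages--;
	z->in_use++;
	return (ALLOC_OK);
}
```

In this model a caller passing only `WAIT_FOR_MEMORY` gets `ALLOC_FAIL_LIMIT` from a full zone and could then fall back to another zone, while still being allowed to wait when the shortage is in backing memory.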

Revision Contents

Path: sys/kern/kern_mbuf.c (29 lines)

Diff 76176
