Page MenuHomeFreeBSD

Address memory leak in new reno cc module
ClosedPublic

Authored by lstewart on May 8 2018, 8:02 PM.

Details

Summary

Proposed commit log message:

Plug a memory leak and potential NULL-pointer dereference introduced in r331214.
Each TCP connection that uses the system default cc_newreno(4) congestion
control algorithm module leaks a "struct newreno" (8 bytes of memory) at
connection initialisation time. The NULL-pointer dereference is only germane
when using the ABE feature, which is disabled by default.
While at it:

  • Defer the allocation of memory until it is actually needed given that ABE is optional and disabled by default.
  • Document the ENOMEM errno in getsockopt(2)/setsockopt(2).
  • Document ENOMEM and ENOBUFS in tcp(4) as being synonymous given that they are used interchangeably throughout the code.
  • Fix a few other nits also accidentally omitted from the original patch. Reported by: Harsh Jain on freebsd-net@ Tested by: tjh@ Differential Revision: https://reviews.freebsd.org/D15358
Test Plan

I ran nc in a loop and watched MemUse in vmstat grow. Afterwards it stays
constant.

#!/bin/sh

for run in seq 300
do

echo "hello world" | nc -w 1 139.133.232.165 2600
vmstat -m | grep newreno

done

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

thj created this revision.May 8 2018, 8:02 PM
jhb added a subscriber: jhb.May 8 2018, 10:33 PM
jhb added inline comments.
sys/netinet/cc/cc_newreno.c
123 ↗(On Diff #42284)

This seems to be an important bug fix. Several places in this file assume ccv->cc_data is not NULL. newreno_cong_signal() checks it for NULL but then later assumes it is not NULL for the CC_NDUPACK case. newreno_ctl_output() doesn't even check but just assumes non-NULL. It seems like this should probably fail with ENOMEM if the memory allocation fails?

jtl added inline comments.May 9 2018, 12:03 AM
sys/netinet/cc/cc_newreno.c
123 ↗(On Diff #42284)

Yeah, this is clearly a good catch. We're allocating memory and not doing anything with it, so clearly that's wrong.

I had initially wondered if cc_data ccv->cc_data pointed to bogus memory; however, it should be initialized to NULL by tcp_newtcpcb(), which would cause a page fault if accessed. So, it must just be that no one is actually using the ABE features that this structure supports.

My opinion is that the features which cc_data supports in the newreno module are "optional" (as evidenced by the fact that no one has noticed they don't work). So, we should probably let the connection succeed and simply fix the places that are brokenly relying on cc_data being non-NULL.

In newreno_cong_signal() , you could use the default beta/beta_ecn values if cc_data is NULL.

In newreno_ctl_output(), you could try to allocate and initialize cc_data, if it is NULL. If that fails, return ENOBUFS.

Actually, an even better change may be to simply do the above two changes, but then delay allocating cc_data until someone tries to change the defaults. That would avoid unnecessary allocations for a feature which evidently is not widely used. (Otherwise, it seems like we should be seeing many more kernel panics.)

In any case, very good catch @thj!

jtl added a comment.May 9 2018, 12:09 AM

BTW, it looks like this was introduced in r331214. Since it is so new, my "not widely used" comment is probably both true and somewhat irrelevant.

jhb added inline comments.May 9 2018, 1:01 AM
sys/netinet/cc/cc_newreno.c
123 ↗(On Diff #42284)

Yes, I think this is too new to say that it is not widely used. However, if cc_newreno is the default, I do see some value in it not failing in cb_init for a malloc(). I think your suggestions of deferring the allocation to the setsockopt() callback and using the defaults as a fallback if it is NULL seems sensible. Arguably if you do that then the 'ABE' check should always be there even if newreno is NULL and use the global ecn value for CC_ECN?

Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff

thj added a comment.May 9 2018, 11:13 AM

Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff

This reads okay.

I ran through a loop with netcat and didn't doesn't leak(of course it wouldn't!) and tested setting the abe beta values via set sockopt with a modified netcat (https://people.freebsd.org/~thj/diffs/ncabe.diff). This also doesn't leak.

In D15358#323834, @thj wrote:

Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff

This reads okay.
I ran through a loop with netcat and didn't doesn't leak(of course it wouldn't!) and tested setting the abe beta values via set sockopt with a modified netcat (https://people.freebsd.org/~thj/diffs/ncabe.diff). This also doesn't leak.

@thj Cool. Did you also verify that ABE does in fact work as expected in the presence of ECN marks?

As an aside, @jtl privately questioned the use of ENOMEM vs ENOBUFS. The TCP stack uses ENOBUFS and ENOMEM interchangeably for malloc failures, and @jtl is wondering if we should standardise on one or the other as far as user space app error handling is concerned to aid portability. This LKML thread and the Linux socket(2) man page suggest that returning both is kosher from both a POSIX point of view and as established precedent. I haven't consulted the standards myself, but it seems plausible that we should add ENOMEM to our tcp(4) - here is a revised patch that does so. @emaste, your were singled out as someone with a good working knowledge of POSIX... do you have any comment on the subject of socket API ENOMEM vs ENOBUFS errno usage?

wollman added a subscriber: wollman.EditedMay 10 2018, 2:47 AM

As an aside, @jtl privately questioned the use of ENOMEM vs ENOBUFS. The TCP stack uses ENOBUFS and ENOMEM interchangeably for malloc failures, and @jtl is wondering if we should standardise on one or the other as far as user space app error handling is concerned to aid portability. This LKML thread and the Linux socket(2) man page suggest that returning both is kosher from both a POSIX point of view and as established precedent. I haven't consulted the standards myself, but it seems plausible that we should add ENOMEM to our tcp(4) - here is a revised patch that does so. @emaste, your were singled out as someone with a good working knowledge of POSIX... do you have any comment on the subject of socket API ENOMEM vs ENOBUFS errno usage?

Historically, everything in the network stack was stored in an mbuf, and [ENOBUFS] was the error for "out of mbufs". Probably that distinction should be maintained, even though we now only store (meta)data in mbufs and not things like ARP tables, since the mbuf pool is still separately sized (and many users manually tune it).

Checking The Standard: a firm distinction is maintained between [ENOBUFS] = "No buffer space is available" and [ENOMEM] = "Insufficient memory to complete the operation", and these are both "shall fail" errors, meaning that implementations are required to distinguish between "buffer space" and "memory" more generally.

Update: it's worse than that: I was looking at accept(), but it seems like each socket call is different in terms of whether [ENOMEM] is defined at all, how [ENOBUFS] is described, and whether it's a "shall fail" or a "may fail". If [ENOMEM] isn't listed as may/shall fail, it's permitted so long as the condition isn't ascribed to some other errno -- but in some cases, the description of [ENOBUFS] is vague enough ("Insufficient system resources were available to complete the action") that it's probably required even when mbufs are not implicated. Sigh.

thj added a comment.May 10 2018, 9:50 AM
In D15358#323834, @thj wrote:

Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff

This reads okay.
I ran through a loop with netcat and didn't doesn't leak(of course it wouldn't!) and tested setting the abe beta values via set sockopt with a modified netcat (https://people.freebsd.org/~thj/diffs/ncabe.diff). This also doesn't leak.

@thj Cool. Did you also verify that ABE does in fact work as expected in the presence of ECN marks?

I was only looking to test the socket api functions, I am not sure I will have time to validate the behaviour until next week.

lstewart added a comment.EditedMay 10 2018, 10:00 AM

@wollman Many thanks for the historical and standards related context - greatly appreciated. I realised that I probably need to update getsockopt(2)/setsockopt(2) as well...

@thj Ok, well in the interest of expediency given the severity of the bug, I guess proceeding before confirming ABE functionality would be a good idea. Would you like to exercise your newly minted commit bit for this or should I commit?

thj added a comment.May 10 2018, 10:05 AM

@wollman Many thanks for the historical and standards related context - greatly appreciated. I realised that I probably need to update getsockopt(2)/setsockopt(2) as well...
@thj Ok, well in the interest of expediency given the severity of the bug, I guess proceeding before confirming ABE functionality would be a good idea. Would you like to exercise your newly minted commit bit for this or should I commit?

Makes sense. It will probably be quicker if you do it. We are rearranging our labs at the moment.

lstewart edited reviewers, added: thj; removed: lstewart.May 10 2018, 10:34 AM
lstewart commandeered this revision.

newreno_plugleak_v3.diff with getsockopt(2)/setsockopt(2) updated.

lstewart updated this revision to Diff 42359.May 10 2018, 10:55 AM
lstewart edited the summary of this revision. (Show Details)May 14 2018, 2:24 AM
lstewart updated this revision to Diff 42507.

Final commit candidate, rebased against FreeBSD head r333598, and with M_ZERO removed from malloc call.

jhb added inline comments.May 14 2018, 3:54 PM
sys/netinet/cc/cc_newreno.c
332 ↗(On Diff #42507)

I do think it would be simpler to rename 'newreno_condmalloc' to 'newreno_malloc' and remove the conditional logic from it (so it just always calls malloc) and call it here if newreno is NULL, so something like:

nreno = ccv->cc_data;
opt = buf;
switch (sopt->sopt_dir) {
case SOPT_SET:
    /* We cannot set without cc_data memory. */
    if (nreno == NULL) {
        nreno = newreno_malloc(ccv);
        if (nreno == NULL)
            return (ENOMEM);
    }
    ...
lstewart added inline comments.May 15 2018, 1:45 AM
sys/netinet/cc/cc_newreno.c
332 ↗(On Diff #42507)

Not sure I agree your suggestion is "simpler"... just different, and it increases nesting which I have a mild dislike of. I was anticipating potential future behaviour whereby other code paths might also want to conditionally allocate memory for on-demand feature activation. That being said, no such thing exists currently, so I'm open to adopting your suggestion, but unconvinced "simpler" is a good argument for doing so :)

Are there other arguments/opinions on the matter? I'd like to commit this in the next 24h.

rgrimes added inline comments.
sys/netinet/cc/cc_newreno.c
332 ↗(On Diff #42507)

To me it would appear the overall code would be easier to understand if the change from the conditional logic was removed from the function and the function renamed. It would remove nesting from inside that code, which would migrate to here though. But it would remove the testing of sopt->sopt_dir == SOPT_SET twice, once in the call to newreno_condmalloc and once in the switch statement on sopt->sopt_dir

This revision was not accepted when it landed; it landed in state Needs Review.May 17 2018, 2:46 AM
This revision was automatically updated to reflect the committed changes.