Page MenuHomeFreeBSD

krpc: UNIX auth: Prevent DoS, fix various OOB accesses
AbandonedPublic

Authored by olce on Aug 29 2025, 11:03 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 4, 4:34 PM
Unknown Object (File)
Thu, Oct 30, 2:16 PM
Unknown Object (File)
Mon, Oct 27, 5:12 PM
Unknown Object (File)
Sun, Oct 26, 10:59 PM
Unknown Object (File)
Sat, Oct 25, 9:05 PM
Unknown Object (File)
Thu, Oct 23, 5:51 AM
Unknown Object (File)
Wed, Oct 22, 4:10 AM
Unknown Object (File)
Sun, Oct 12, 6:08 PM
Subscribers

Details

Summary

In xdr_authunix_parms(), prevent accidental or malicious DoS by refusing
to receive a number of groups that is higher than 'ngroups_max' (maximum
number of supplementary groups) + 1 (accounting for the effective GID).

The 'cr_groups' field of 'struct xucred' can only contain up to
XU_NGROUPS including the effective GID. Just keep accepting more groups
than that, dropping the extra ones on the floor (as before).

For consistency, in the XDR_INLINE variant in _svcauth_unix(), make sure
to accept more groups than XU_NGROUPS (if less than the above-mentioned
limit).

xdr_authunix_parms() would try to fill up cr_groups[] with as many
groups as received with the only limit being 'ngroups_max' + 1, while by
default and in practice that limit is much greater than XU_NGROUPS which
represents the storage really allocated.

In the XDR_INLINE version, fix multiple OOB accesses possibilities when
answers are too short (in general, or with respect to their particular
content), which the previously existing check would reject too late to
prevent the OOB accesses.

As defense-in-depth, because 'cr_groups' is sized by XU_NGROUPS, check
for that limit instead of NGRPS. Fortunately, these constants are
currently equal so there was no practical harm. As NGRPS, which is
local, doesn't seem to serve any other purpose (like a protocol
limitation), just remove it.

While here, fix some style.

Fixes: dfdcada31e79 ("Add the new kernel-mode NFS Lock Manager.")
Sponsored by: The FreeBSD Foundation

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 66666
Build 63549: arc lint + arc unit

Event Timeline

olce requested review of this revision.Aug 29 2025, 11:03 PM

I made a few comments, but stopped there.
You seem to be confusing the 16 which is
a hardwired RPC constant with what happens
to be defined elsewhere.
(Even if XU_NGROUPS == 16 there is nothing
stopping someone from changing that someday.)

sys/rpc/authunix_prot.c
55

Note that 16 is a hard wired value, defined
long ago in the Sun RPC RFC.

I think the code should continue to use
it (as the limit for ENCODE and maybe
the limit for DECODE).

81

I'd suggest you avoid gratuitous changes
like these. The patch should only change
semantics.

Even if the code does not conform to
style(9), style(9) states that the style
should not be changed unless you are
changing at least 50% of the lines, for
good reason.

91

Same as above, there is no reason to remove
the {}.

109

As noted, this code must ensure that the
number of groups does not exceed NGRPS == 16.
It should not be using anything from somewhere
else that might change (such as "max_ngroups").

You seem to be confusing the 16 which is
a hardwired RPC constant with what happens
to be defined elsewhere.
(Even if XU_NGROUPS == 16 there is nothing
stopping someone from changing that someday.)

I'm not confusing anything, this change proposal was deliberate, based on the assumption that this limit of 16 may not have been in the protocol at all. This belief was prompted in turn by the already existing, actual confusion that NGROUPS should be used as a limit to fill struct xucred, whereas, as you said, XU_NGROUPS can independently be changed to a higher or lower value some day, the latter change leading to OOB accesses with the existing code. Additionally, there was no comment alluding to NGROUPS being protocol-specified. An initial version in fact continued to take NGROUPS into account, in addition to XU_NGROUPS, but I then chose to discard it for these reasons.

By adding you as a reviewer, I have exactly been expecting your take on whether this view was appropriate or not. Perhaps I should have made this explicit. I will reintroduce NGROUPS in some form (probably not as a per-file constant in multiple files) and add comments about what it stands for.

So, while I'm here, I'd also like to know what you think about discarding groups exceeding the NGROUPS limit. Now that you've said that NGROUPS is mandated by the protocol, it could be more appropriate to reject groups in excess instead of just discarding them, exactly as the inline version has been doing (without the change here). The only things I insist on are that both the inline and regular versions be consistent in this respect, and that we have a limit in place so that we are not accepting an arbitrary large number of groups to be discarded (which I catered to in the current version).

sys/rpc/authunix_prot.c
55

Ok, noted.

81

As you seem to be finding these annoying, I'll remove them from this revision at the next update.

For the rest, I have to disagree. These are not gratuitous changes (they help with clarity, beyond style(9)) and are confined to a function that I'm heavily modifying. I think you're largely over-interpreting style(9), which mentions the 50% rules as the (advisory) threshold to be able to modify style throughout the *whole* file. To my knowledge, there has never been any restriction to changing style in the code that is modified (and, frankly, such a restriction would be quite silly).

109

As explained in the main text, that's not enough nor correct. But the fact that 16 is a protocol constant has been noted, yes, and will be taken into account.

sys/rpc/authunix_prot.c
81

To my knowledge, there has never been any restriction to changing style in the code that is modified (and, frankly, such a restriction would be quite silly).

Well, to be more precise: This is not true for any style changes, but specifically a tolerance for changes towards what style(9) prescribes. Also, style(9) is advisory on these matters, relying rightly on developers' common sense for a lot of things. I think correcting style within a heavily-modified function is covered by this. I'll probably propose an amendment to style(9) to reflect this view.

So, while I'm here, I'd also like to know what you think about discarding groups exceeding the NGROUPS limit.
(snip)

According to RFC 5531, the authsys_parms structure for AUTH_SYS contains in fact the effective group ID (spelled out exactly like this in the RFC) and then an array of 16 groups "that contain the caller as a member". But we only have XU_NGROUPS (16) in total in struct xucred.

So, what do you think of:

  1. To support the protocol, we accept up to 17 groups (1 + 16), but no more (extensions are not supported).
  2. But we discard the 17th one, as we don't have room to store it.

This is in fact what the inline decode version is already doing.

So, while I'm here, I'd also like to know what you think about discarding groups exceeding the NGROUPS limit.
(snip)

According to RFC 5531, the authsys_parms structure for AUTH_SYS contains in fact the effective group ID (spelled out exactly like this in the RFC) and then an array of 16 groups "that contain the caller as a member". But we only have XU_NGROUPS (16) in total in struct xucred.

So, what do you think of:

  1. To support the protocol, we accept up to 17 groups (1 + 16), but no more (extensions are not supported).
  2. But we discard the 17th one, as we don't have room to store it.

This is in fact what the inline decode version is already doing.

It would be nice if all 17 groups ends up in the real cred structure,
but I now recall looking at this long ago and leaving it, since I
was not willing to revise xucred.

It might be worth looking at what else uses xucred to see if adding
a separate cr_gid field for the additional gid is feasible without
too much churn. (xucred was just defined when ucred was being
changed to handle more groups)

Btw, the DECODE case is coded the way it is because, before the
change to ucred to handle more groups, the last one had to be
tossed.

What the code should do when the groups list in the received XDR
exceeds 16 is not obvious.

  • At one time, there was talk of "ignoring" the 16 limit in the RFC and sending more groups on the wire. I do not know if any client actually does this?

If you choose to not accept any additional "cheater" groups
(my preferred case, since I like to conform to RFCs), then the
code can either:
A- throw the additional ones away
or
B - reply with an RPC layer authentication error
I do not have a strong opinion w.r.t. which is preferred.
(I think it currently does A.)

On the generic comment you made about DOS attacks...

  • There are a million ways to do a DOS attack against a NFS server. Firewalls or exports(5) is really the only practical defense, I think.
  • At this point, the transport handling code has already parsed out the RPC message into an mbuf list. As such, most of the DOS damage has already been done. I think svc_vc.c has already checked for absurdly large received RPC messages.

If the groups list is bogus, the XDR parsing code
will eventually find out the XDR is bogus and
throw a BadXDR RPC layer error back to the client.

sys/rpc/authunix_prot.c
81

Here's the snippet from style(9):

In general code can be considered "new code" when it makes up about 50%
or more of the file(s) involved.  This is enough to break precedents in
the existing code and use the current style guidelines.

As far as I am concerned, if you are going to change style is
an extant function, that should either be in a separate patch
that just changes style (I think this is a waste of time, but that's
up to the committer) and not in a patch that is to be reviewed
for semantics changes.
--> The bigger the patch, the harder it is to see what you are
changing and why.

If you try and take the above snippet out of style, I will
oppose it. I think it is the most important thing style(9)
states.

109

I think the 16 constant should be either defined
here or in a .h file in sys/rpc and might be more
obvious if it was named something like
SUNRPC_MAXGROUPS, with a comment noting
that this is defined for the protocol via the RFC.

sys/rpc/authunix_prot.c
81

Btw, you might find this a useful read.
(I've been forced to read it, since I have
some Linux NFS kernel patches to submit someday.)

https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html

For example, here's one statement in Section 2.
Solve only one problem per patch. If your description starts to get long,
that’s a sign that you probably need to split up your patch.
See 3) Separate your changes.

For this patch, your description is pretty long.

For example, if adding a cr_gid field to xucred is
practical, then after doing that for the generic code,
a patch that uses that field here so that all 17 gids
get handled would make sense to me.

olce marked 5 inline comments as done.Oct 7 2025, 5:21 PM

So, what do you think of:

  1. To support the protocol, we accept up to 17 groups (1 + 16), but no more (extensions are not supported).
  2. But we discard the 17th one, as we don't have room to store it.

This is in fact what the inline decode version is already doing.

It would be nice if all 17 groups ends up in the real cred structure,
but I now recall looking at this long ago and leaving it, since I
was not willing to revise xucred.

It might be worth looking at what else uses xucred to see if adding
a separate cr_gid field for the additional gid is feasible without
too much churn. (xucred was just defined when ucred was being
changed to handle more groups)

I've looked and this is not trivial, as struct xucred is used by several other components (Internet protocols sysctl knobs, UNIX sockets' peer credentials, etc.), so let's forget that for the time being and focus on fixing the immediate, bigger problems we have, even if I agree with you that in the longer term we should switch to struct ucred.

If you choose to not accept any additional "cheater" groups
(my preferred case, since I like to conform to RFCs)

As long as XDR mandates passing the length of variable-length arrays, I don't see why we should limit ourselves to 16 when receiving whereas we could actually handle more, already accommodating possible future RFC extensions. But I agree with you that we should by default limit to 16 supplementary groups when sending the auth handle, though. Supporting more groups there would be controlled with some administrative setting (e.g., a sysctl knob specifying the maximum number of supplementary groups to use). All this in accordance with the Robustness principle.

Rather, I think the decisive factor here to comply with the RFC is that we cannot store more than 16 including the effective GID (as long as we keep 'struct xucred'), so pretending we can accept more is currently just misleading. Accepting 17 groups in total whereas we can't store the last already feels bad, but at least seems more compliant with the RFC than returning an error outright, as the RFC does not mandate what is to be done with the identification information but mandates that up to 16 supplementary groups should be accepted.

On the generic comment you made about DOS attacks...

  • There are a million ways to do a DOS attack against a NFS server. Firewalls or exports(5) is really the only practical defense, I think.

Yes. I used DoS in the title, but really I was targeting unintentional/accidental forms of DoS (quality of implementation), not malicious ones (bullet-proof implementation) which as you pointed out is impossible in NFS itself.

  • At this point, the transport handling code has already parsed out the RPC message into an mbuf list. As such, most of the DOS damage has already been done. I think svc_vc.c has already checked for absurdly large received RPC messages.

I'm not familiar with this code, just spent some time today reading some. IIUC what I read, I don't see any limitation for DoS, that is to say, the code will try to read anything it can from the connection's socket up to an end of record, and only then will check for the call header and apply the MAX_AUTH_BYTES limit.

Unrelated note, but could you please refrain from forcibly wrapping lines in your comments? Too short lines impair reading as much as too long ones, and wrapping by hand prevent adjusting the browser's window to get some desired line width.

sys/rpc/authunix_prot.c
81

If you try and take the above snippet out of style, I will
oppose it. I think it is the most important thing style(9)
states.

Again, my take is that you're over-interpreting that snippet, and will push for a revision of style(9) to clear any such interpretation. See D52885.

81

Btw, you might find this a useful read.
(snip)

I have been in this trade long enough to have known all this already for a long time, and also to discern when following such rules is improductive, although obviously that also depends on the context and the reviewers.

For this patch, your description is pretty long.

The important point here is that both the inline and classical version of deserialization are put in sync, that's the main reason for grouping changes in xdr_authunix_parms() and _svcauth_unix() in the same revision. There are some changes to _svcauth_unix() that can be considered and can be separated, yes, which I will do since I have to rework this anyway.

As for having a few style changes intermingled with changes here, I can't understand how they can disturb you so much (there are only 2, extremely small!), but as said will make the effort for you (it's much more time for me than I suspect it would be for you to just forget about them).

olce marked an inline comment as done.
olce marked 2 inline comments as done.

Superseded by the series starting at D52960 (through D52964).