Page MenuHomeFreeBSD

Bring back redirect route expiration.

Authored by melifaro on Jan 7 2020, 10:31 PM.



(Part of a larger D22988 ).

Redirect (and temporal) route expiration was broken a while ago.

This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions.

Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run.
The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal.

To support this callout, rib_fibnum, rib_family and rib_vnet fields are added to the rhn. These will be used by the upcoming route changes as well.

This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

melifaro created this revision.Jan 7 2020, 10:31 PM
melifaro edited the summary of this revision. (Show Details)Jan 7 2020, 10:33 PM
melifaro added reviewers: bz, network.
melifaro updated this revision to Diff 66564.Jan 9 2020, 6:09 PM

Update the diff to reflect the committed pre-requisites.

melifaro updated this revision to Diff 66566.Jan 9 2020, 6:31 PM

Add forgotten tests.

bz requested changes to this revision.Jan 10 2020, 12:14 PM

I've not looked at the test cases (yet) in particular detail.
There's 2 or 3 functional questions, most is just comments and whitespace.

191 ↗(On Diff #66566)

If you don't break up the first line, the KASSERT will still fit on two lines.

191 ↗(On Diff #66566)

Shouldn't it be 0 <= %d < %d?

194 ↗(On Diff #66566)

And here as well: 0 <= ..

602 ↗(On Diff #66566)

While changing the code, can we start the comment upper case and end in a '.' as well?

One the other hand, these checks violate RFC 4861 I think? (Haven't checked if there is a specified behaviour for IPv4)

A host MUST NOT consider a redirect invalid just because the Target
Address of the redirect is not covered under one of the link's
prefixes.  Part of the semantics of the Redirect message is that the
Target Address is on-link.

Given we are currently not prepared to add an interface route to my understanding,
can we add a comment

/* XXX-BZ RFC 4861 8.1 does not mandate this check. */

and fix it later (unless you do want to fix it as well, but I fear that will require more code changes which are better done in a follow-up?

611 ↗(On Diff #66566)

If rt is NULL we don't do anything up to the end and there we (a) call RTFREE_LOCKED() and will have a NULL pointer deref and then (b) return (0);
If we add a check for rt == NULL here and handle it, then we can drop all the rt != NULL on all the conditions further down.

635 ↗(On Diff #66566)

'.' at the end of comment.

640 ↗(On Diff #66566)

I keep wondering if I'd swap the two last checks with the first (currently) if (rt != NULL) block and do the flags checks before comparing addresses?

651 ↗(On Diff #66566)

The comment could be re-wrapped to three lines.

654 ↗(On Diff #66566)

Indentation looks too much here.

And as mentioned for the declaration I think the argument order should change?

657 ↗(On Diff #66566)

Please don't assign values in variable declaration. Set it after NET_EPOCH_ASSERT() maybe.

665 ↗(On Diff #66566)

error is 0 here?

671 ↗(On Diff #66566)

Comments start with upper case and end in punctuation.

673 ↗(On Diff #66566)

This is the same "on-link" vs. prefix issue as in the previous function, right? We should add another comment here?

684 ↗(On Diff #66566)

'.' at the end.

686 ↗(On Diff #66566)

'.' at the end.

687 ↗(On Diff #66566)

I think time_second + "expire_sec"; would read better. I am still confused if this is "expire_sec" or actually "lifetime" as it seems to be the lifetime of the redirect whereas rmx_expire then indeed is the "second in which to expire this entry".

695 ↗(On Diff #66566)

Hmm... I wonder if that should be done for other errors as well?

1061 ↗(On Diff #66566)
Iterates over a routing table specified by @fibnum and @family and
deletes elements marked by @filter_f.

(replacing rtable, fixinf af argument name, removing white space on 2nd line).

1066 ↗(On Diff #66566)

remove "the" or add "function" at the end.

1067 ↗(On Diff #66566)

Then it could almost be a bool instead of int?

1092 ↗(On Diff #66566)

'.' at the end.

1112 ↗(On Diff #66566)

Could be one sentence with an 'and'. also "returns a" not "returned" maybe?

1114 ↗(On Diff #66566)

, "it " iterates .. in "the " particular ..

1120 ↗(On Diff #66566)

u_int fibnum?

501 ↗(On Diff #66566)

Is there a particular reason to break up the sockaddr pointers by putting the int flags in the middle between them?

27 ↗(On Diff #66566)

Normally this would be a __FBSDID(..); in the code after the cdefs include and not the comment.

51 ↗(On Diff #66566)

extra space to remove .. but the 2nd sentence can also be one line easily.

56 ↗(On Diff #66566)

No assignments in variable declarations.

62 ↗(On Diff #66566)

Is this is debug printf which should go? Otherwise at least at a %s: func please.

69 ↗(On Diff #66566)

Here the next_callout = (time_t *)arg; could happen; which means we'd also not do it for previous returns and save some work.

84 ↗(On Diff #66566)

Please move the assignments down before first use.

96 ↗(On Diff #66566)

Isn't this a problem after roughly 24 days?

135 ↗(On Diff #66566)

Same ~24 days issue as above?

This revision now requires changes to proceed.Jan 10 2020, 12:14 PM
melifaro added inline comments.Jan 10 2020, 1:13 PM
602 ↗(On Diff #66566)

One the other hand, these checks violate RFC 4861 I think?

IMHO It does not strictly violate RFC 4861. 8.1 indeed requires calling the message "valid redirect", while 8.3 suggest that

A host receiving a valid redirect SHOULD update its Destination Cache
   accordingly so that subsequent traffic goes to the specified target.

and not honouring SHOULD is not exactly a violation.

I'd certainly prefer the scope of this review to be limited to bringing back generic functionality, relying on existing verifications and fixing the message processing parts as a separate changes. I'll add the comment on the RFC compliance'

However, looking further for the way suggested by the RFC: if we have global unicast IP as destination (IP1) and GU ip as a target (IP2),
where IP1 != IP2 and IP2 is not directly reachable via this interface - is adding ND cache entry for IP2 a really good idea security-wise?

Given we are currently not prepared to add an interface route to my understanding,

Do we want to?

687 ↗(On Diff #66566)

I think time_second + "expire_sec"; would read better.


I am still confused if this is "expire_sec" or actually "lifetime"

Would like to indicate the unit as well, are you okay with lifetime_sec?

27 ↗(On Diff #66566)

Copied this pattern from the files with newer copyright :-)
Will change back to __FBSDID()

69 ↗(On Diff #66566)

Sorry, not sure if I get it correctly :-(. Can you please rephrase the statement?

96 ↗(On Diff #66566)

That's a good point, thanks for catching this!
I was mostly thinking of temporal routes as redirects with reasonably limited lifetime, however user is able to set up arbitrary expiration time. I'll change the handling here to callout_reset_sbt() to avoid using ticks at all or move to int64_t.

melifaro added inline comments.Jan 10 2020, 1:31 PM
602 ↗(On Diff #66566)

Was looking into the code once again - we have most of the checks in the icmp6_redirect_input() and they are different (in the details) between IPv4 and IPv6. Maybe it would be better to move the most existing verification to IPv4 code, like "icmp_redirect_input()" and making ipv6 codepath not making some of the checks twice.

501 ↗(On Diff #66566)

I don't know what was the original reason - probably the desire of splitting the "to-be-used-in-redirect" arguments and "just-for-verification" arguments.

I don't have any strong opinion on this one - I like the current one more, but ca swap the arguments if that's your preference.

bz added inline comments.Jan 10 2020, 1:41 PM
602 ↗(On Diff #66566)

Sounds good to me as well.

687 ↗(On Diff #66566)

lifetime_sec is great!

69 ↗(On Diff #66566)

If you do not do the assignment at variable declaration time, it could be moved down to here:

next_callout = (time_t *)arg;

as this is the first use of next_callout in this function. So there is no need to assing arg to it any sooner. If (rt->rt_expire == 0) is true we'd return and an early assignment at variable declaration time would have been wasted CPU cycles (and probably register space).

melifaro marked 24 inline comments as done.Jan 20 2020, 10:39 PM
melifaro added inline comments.
695 ↗(On Diff #66566)

I guess more granular statistics is needed. I'd prefer to deal with it as a separate change.

melifaro updated this revision to Diff 67073.Jan 20 2020, 10:44 PM

Update review to address comments.

melifaro updated this revision to Diff 67094.Jan 21 2020, 10:09 AM

Add redirect test for IPv4.
Add a bit more wording on host/network redirects.

melifaro updated this revision to Diff 67095.Jan 21 2020, 10:18 AM

Fix forgotten variable declaration in expire_callout().

bz accepted this revision.Jan 21 2020, 10:16 PM

It's extremely hard in Phabricator to follow the changes between the last round and this so I am not sure I found all the changes fixed and comments done.

I found one minor thing that should be removed from this review before commit. Otherwise I think you should go ahead and commit it (without the flowid change).

63 ↗(On Diff #67094)

This is big difference. One is an IPv6 header field per-connection; the other the a 4-tuple calculated for a connection. The argument is unused in the function called.
I don't think this change belongs here to this review.

This revision is now accepted and ready to land.Jan 21 2020, 10:16 PM
melifaro added inline comments.Jan 21 2020, 10:21 PM
63 ↗(On Diff #67094)

Yep, that one sneaked from another change. Will remove prior to the commit.

This revision was automatically updated to reflect the committed changes.