Page MenuHomeFreeBSD

add a new -I command line option to mountd which makes it update the kernel exports incrementally upon a reload
ClosedPublic

Authored by rmacklem on May 31 2019, 9:20 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 7, 2:31 AM
Unknown Object (File)
Sat, Dec 7, 2:30 AM
Unknown Object (File)
Sat, Dec 7, 2:29 AM
Unknown Object (File)
Sat, Dec 7, 2:29 AM
Unknown Object (File)
Sat, Dec 7, 2:28 AM
Unknown Object (File)
Sat, Dec 7, 2:27 AM
Unknown Object (File)
Nov 29 2024, 7:22 PM
Unknown Object (File)
Nov 29 2024, 7:22 PM
Subscribers

Details

Summary

Peter reported via email that mountd would take 16seconds to reload the
exports upon receipt of a SIGHUP for his 72,000+ file system server,
which results in a 16sec suspension of the nfsd threads.

D20270 replaced the single linked list of exportlist elements with a hash table of lists.

That should have improved performance somewhat. However, I believe a majority of the
time spent reloading is when mountd.c deletes and reloads all the exports in the kernel.
This patch adds a "-I" option that tells mountd
to only delete/load exports in the kernel for export file entries that
have changed. There will somewhat increased CPU overheads for first doing
the processing of the exports file(s) and then comparing the old and new
to find the changes that need to be done via nmount() for the kernel.
However, the processing of the exports file(s) can be done before the nfsd
threads are suspended (assuming the "-S" option), so hopefully the duration
of the nfsd threads being suspended will be shortened considerably for small
changes (what will typically happen for a reload) to the exports file(s).

Measurements by Peter Erikkson show that the time the nfsd threads are suspended
is reduced from seconds to milliseconds for server with 20000 exported file systems.

There have been several commits done to restructure the code so that this patch is
more straightforward. It is still pretty large, but I don't see an obvious advantage to
committing it in sections.

Basically, with the "-I" option, the old exports structures (a hash table of linked lists
with other structures hanging off the entries in these lists) are kept and a new hash
table of linked lists is created as "pass 1" of the reload. Then the two sets of lists
are traversed looking for any changes. When changes are found, they are handled by
updating the kernel exports, as required.
At the end of this comparison pass, the new hash table of export lists becomes the current
one and the old hash table of lists is free'd up.

Test Plan

I have been testing using a small exports file with as many cases of changes
as I could think of.

Peter Errikson has been testing a variant of the patch (in PR#237860 for stable)
on his servers with a large number of exported file systems (20,000-70,000+)
and has not yet reported problems.

There is some runtime syslog() debugging in the code that can be enabled by
creating a file called /var/log/mountd.debug that anyone who runs into difficulties
when using "-I" can use to acquire debug information that I can hopefully use to
debug the case where the incremental update somehow failed.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Why make the new behavior optional? If the only downside is increased CPU usage by a constant factor, then I think the new behavior should be mandatory.

Well, the main reason I made it an optional non-default option is...

  • The exports file format (and the code that parses it and reloads it) is complex and I may have missed a case such that the "-I reload" doesn't get it right. --> With a non-default option, the sysadmin can just go back to the default case and that should work around the bug until it is fixed. (Since the initial load must be a full loading of the exports, the non "-I" case code needs to remain in the sources and making it non-optional wouldn't simplify the code.)

Put another way, I might currently consider this an "experimental option" that
could cause POLA violations when enabled. (That is why I have the syslog() debug
code always compiled in with the current patch, so that sysadmins can enable
the debugging without rebuilding or even restarting the daemon.)

However, I can see the counter argument of making it non-optional, so that
sysadmins don't need to know to add the option and, in that case the daemon
would need to be restarted if a reload fails to work correctly.

I'll ask what people think on freebsd-fs@, since it would be easy to make non-optional.

Your follow-up questions definitely show that you want some mountd.conf config file, e.g. to manage -I flag. The file should be re-read on SIGHUP.
I could argue that it is reasonable for mountd.conf to provide all options supplied as command-line arguments to mountd.

usr.sbin/mountd/mountd.c
2198 ↗(On Diff #58129)

Why do you need maxn ? If only to declare fndarray[] below, you can use fndarray[n], we are way past c99.

2209 ↗(On Diff #58129)

Generally, use of memcmp() is not correct if members of array can contain padding.

This version of the patch just implements the incremental update of the kernel exports
for all reloads and does not need/use the "-I" option.

Although I have only received one reply to my RFC request w.r.t. should I have the "-I" option
flag, it was a "no". Although he has not said so, I also suspect that asomers@ was thinking
along the same lines when he asked the question about it.

It also uses a "variable size array" in COMPARE_ARRAYS() as suggested by kib@.

W.r.t. the config file. Unless there are several who disagree with just having this always enabled,
I think it is ok to just have it always enabled and not configurable.

As for enabling the syslog() debugging info, cy@ has a patch that converts this to using dtrace,
so I suspect that to go away too. (There is some issue w.r.t. dtrace for stable/12 that needs to
be resolved by cy@, but he is working on other things for now.)
--> I don't think any configuration file will be needed after all.

Thanks for the comments, rick

usr.sbin/mountd/mountd.c
2198 ↗(On Diff #58129)

Sure. I wasn't sure if variable sized arrays were allowed in FreeBSD source, but since you
indicate they are, I have changed the macro to use one.

2209 ↗(On Diff #58129)

The only types it is used for are "gid_t" and "int", so I don't think padding will be an issue.
(Just fyi, a case where memcmp() falsely returns "not same" only results in the comparison
being done the slow way, so any padding would only make this less efficient.)

If you think it should be changed to a for(;;) loop on the n elements, I can easily change
it to that. (Part of the reason I used memcmp() is I suspected someone might ask why I
did a loop instead of using it.)

kib added inline comments.
usr.sbin/mountd/mountd.c
2200 ↗(On Diff #58168)

I think that these micro-optimizations are not an optimizations at all. Memcmp has early-stop behavior, and you do not need to squeeze the last cycle from the reload code anyway. I would seriously consider just doing per-element comparision only. It is both more correct and produce less code, which might be faster due to the less code, after all.

This revision is now accepted and ready to land.Jun 2 2019, 12:11 PM
rmacklem marked an inline comment as done.

The COMPARE_ARRAYS() macro has been changed to remove the "optimizations"
as suggested by kib@. It now just checks for the common case of identical
arrays by iterating through them with a for loop.
This also allowed the "typ" argument to be removed, since it is no longer needed.

This revision now requires review to proceed.Jun 2 2019, 10:01 PM

The COMPARE_ARRAYS() macro has been changed per suggestions by kib@.

Thanks for the review, rick

usr.sbin/mountd/mountd.c
2200 ↗(On Diff #58168)

These have been removed and replaced with a simple for loop to check for
identical arrays.

2209 ↗(On Diff #58129)

memcmp() has now been removed from the patch.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 3 2019, 10:59 PM
This revision was automatically updated to reflect the committed changes.
rmacklem marked an inline comment as done.