Page MenuHomeFreeBSD

Add "rcorder -p".
AbandonedPublic

Authored by trasz on Sep 22 2015, 5:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 24, 11:03 PM
Unknown Object (File)
Sun, Nov 24, 3:03 PM
Unknown Object (File)
Sat, Nov 23, 1:55 PM
Unknown Object (File)
Sat, Nov 23, 7:46 AM
Unknown Object (File)
Fri, Nov 22, 4:00 PM
Unknown Object (File)
Fri, Nov 22, 5:06 AM
Unknown Object (File)
Thu, Nov 21, 7:17 AM
Unknown Object (File)
Thu, Nov 21, 4:56 AM

Details

Summary

Add the "-p" option to rcorder(8), to output things that can be started together
separated by spaces. This might be used to implement parallel service startup.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 15242
Build 15306: arc lint + arc unit

Event Timeline

trasz retitled this revision from to Add "rcorder -p"..
trasz updated this object.
trasz edited the test plan for this revision. (Show Details)

Fix ordering bug too complicated to describe here.

While generating the correct ordering, this seems to handle loops in a slightly
less graceful way. Oh well.

Hook up to the startup scripts.

We've been running a python variant of this for a while now at $work.

Based on past experience, this is a difficult task to deal with.

One thing that seems to be missing is the ability to halt the boot process in the event a critical error occurred (say, fsck/mountcritlocal fails). Do you have any plans on how this will be handled?

etc/rc
133

Where is this variable set?

Not really. It's basically a piece of work in progress. There are many more elements needed - such as a mechanism to "spawn" services in a known environment, instead of inheriting the one of the user calling service(8), or running a startup script by hand. I've just stashed this here so it won't get lost.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 23 2018, 8:01 PM
This revision was automatically updated to reflect the committed changes.

@trasz sorry I accidentally committed r329877 but referenced this review; can you please re-upload?

Restore the proper patch and rebase.

I suspect the rcorder -p mode could be implemented much less invasively than this change. It's fundamentally the same graph algorithm. I'll give it a shot.

etc/rc
130–131

Usual style convention in this file seems to be ; do on the same line as the while or if.

133

Lines 97-100 above. The name is a little confusing/misleading; it contains the set of "early" rc scripts that were completed already, in the loop lines 98-105. It is not updated as regular rc scripts are run (before or after this change).

(Also, this section of the code isn't really changed in this revision, it's just indented an extra level.)

137

Might be more clearly expressed as if checkyesno rc_parallel, but I don't feel strongly about it.

138

As ngie points out, this doesn't really handle failing scripts in the same way (I think — I'm not a shell expert).

run_rc_script traps SIGQUIT (just to print out a message about what script was interrupted before re-raising the signal) and SIGINT (to exit rc(8)); as well as handling SIGINFO to print a diagnostic about the running script.

In this patch's rc_parallel mode, both the QUIT and INT handlers are broken (if I understand correctly). If an rc script receives QUIT, it will not forward QUIT to the parent rc process but instead exit with signal itself. If an rc script receives INT, it will do basically the same thing — exit with error rather than signal, but either way, the parent rc process is not signaled, and we do not return to single user.

Additionally, the user pressing ^C or ^\ at the console will no longer signal a running rc script, but instead the primary rc process.

I don't know good ways to fix these differences in shell.

wait without arguments always returns zero, but theoretically we could count the number of commands in $line and wait for each individual background job. Then we could check for SIGQUIT exit (128+3) or SIGINT exit (1) and return to single user. That addresses one aspect of the difference, but not the lack of console interaction with running scripts. I don't have a good solution for that.

In D3715#409266, @cem wrote:

I suspect the rcorder -p mode could be implemented much less invasively than this change. It's fundamentally the same graph algorithm. I'll give it a shot.

I take that back :-). rcorder's implementation of a topological sort leaves a lot to be desired, unfortunately. It'd probably be better to redo it (correctly) from scratch.

I wonder: why not leverage tsort?

etc/rc
137

other places use it, and while consistency is the hobgoblins of small minds, it's a useful hobgoblin...

143

This is completely lame. We do each layer then wait. But that's lame, given the topology sort we're doing, we'll wind up doing more waiting than we need to.
There's also the issue of error codes. what should we do with the dependent scripts for a failed RC script?

sbin/rcorder/rcorder.c
756

clean this up.

804–808

this should be deleted.

845

ummm, this comment makes no sense.

In D3715#409280, @imp wrote:

I wonder: why not leverage tsort?

tsort's topological sort implementation is definitely better than rcorder's and lends itself to printing parallelizable "generations" more readily, but integrating it into rcorder is somewhat tricky (you could move the toposort functionality into libutil, perhaps, but fork+exec just to run a topo sort is ugly).

Neither tool is really what you want — the generations are better, but still artificially restrict concurrency. (Services in generation N+1 do not necessarily depend on every single service in generation N.) Ideally, you'd do the topological sort in rc(8) itself and start services as a side effect, as soon as their dependency count drops to zero. We're held back from implementing that in today's rc by Bourne shell. Shell just lacks reasonable datastructures. It is conceivably implementable in Lua, though. (Or C, obviously.)

(Also, while it's something that can be fixed, at the moment tsort(1) lives on /usr, and historically we've allowed /usr to be an independent filesystem with different liveness.)

Okay, so apart from implementation problems, there's actually a major architectural issue here: the fact that we wait for all the scripts from the previous generation. Do you see any way to actually fix it, or should we perhaps just scrap it and go back to the drawing board?

In D3715#409480, @trasz wrote:

Okay, so apart from implementation problems, there's actually a major architectural issue here: the fact that we wait for all the scripts from the previous generation. Do you see any way to actually fix it, or should we perhaps just scrap it and go back to the drawing board?

I don't know if it's major. You may still see some parallelism with this change that you wouldn't without it. However, the churn and risk may not be worth the incomplete improvement? (I don't feel strongly either way.)

I don't have any _strong_ feelings, but I'd prefer doing this right.

Perhaps we should start from the other end? As in, get (either write or use existing one) an utility to "spawn" processes in a controlled way, with clean environment ('environment' in the wide sense, not just UNIX environment variables), and leave the dependencies and ordering in general for later?

Drop it. It doesn't work quite right.