Page MenuHomeFreeBSD

Add "rcorder -p".
AbandonedPublic

Authored by trasz on Sep 22 2015, 5:38 PM.

Details

Summary

Add the "-p" option to rcorder(8), to output things that can be started together
separated by spaces. This might be used to implement parallel service startup.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 15242
Build 15306: arc lint + arc unit

Event Timeline

trasz updated this revision to Diff 8894.Sep 22 2015, 5:38 PM
trasz retitled this revision from to Add "rcorder -p"..
trasz updated this object.
trasz edited the test plan for this revision. (Show Details)
trasz updated this revision to Diff 8909.Sep 23 2015, 1:56 PM

Fix ordering bug too complicated to describe here.

While generating the correct ordering, this seems to handle loops in a slightly
less graceful way. Oh well.

trasz updated this revision to Diff 8961.Sep 28 2015, 1:13 PM

Hook up to the startup scripts.

ngie added a subscriber: ngie.Oct 25 2015, 8:01 AM

We've been running a python variant of this for a while now at $work.

Based on past experience, this is a difficult task to deal with.

One thing that seems to be missing is the ability to halt the boot process in the event a critical error occurred (say, fsck/mountcritlocal fails). Do you have any plans on how this will be handled?

etc/rc
133

Where is this variable set?

ngie added a subscriber: benno.Oct 25 2015, 8:01 AM
trasz added a comment.Nov 10 2015, 9:47 AM

Not really. It's basically a piece of work in progress. There are many more elements needed - such as a mechanism to "spawn" services in a known environment, instead of inheriting the one of the user calling service(8), or running a startup script by hand. I've just stashed this here so it won't get lost.

emaste added a subscriber: emaste.Feb 23 2018, 7:49 PM
This revision was not accepted when it landed; it landed in state Needs Review.Feb 23 2018, 8:01 PM
Closed by commit rS329877: Correct typo in ATA_WRITE_UNCORRECTABLE_PSEUDO (authored by emaste, committed by ). · Explain Why
This revision was automatically updated to reflect the committed changes.
emaste reopened this revision.Feb 23 2018, 8:05 PM

@trasz sorry I accidentally committed r329877 but referenced this review; can you please re-upload?

trasz updated this revision to Diff 39680.Feb 24 2018, 2:54 PM

Restore the proper patch and rebase.

cem added a subscriber: cem.Feb 9 2019, 11:01 PM

I suspect the rcorder -p mode could be implemented much less invasively than this change. It's fundamentally the same graph algorithm. I'll give it a shot.

etc/rc
130–131

Usual style convention in this file seems to be ; do on the same line as the while or if.

133

Lines 97-100 above. The name is a little confusing/misleading; it contains the set of "early" rc scripts that were completed already, in the loop lines 98-105. It is not updated as regular rc scripts are run (before or after this change).

(Also, this section of the code isn't really changed in this revision, it's just indented an extra level.)

137

Might be more clearly expressed as if checkyesno rc_parallel, but I don't feel strongly about it.

138

As ngie points out, this doesn't really handle failing scripts in the same way (I think — I'm not a shell expert).

run_rc_script traps SIGQUIT (just to print out a message about what script was interrupted before re-raising the signal) and SIGINT (to exit rc(8)); as well as handling SIGINFO to print a diagnostic about the running script.

In this patch's rc_parallel mode, both the QUIT and INT handlers are broken (if I understand correctly). If an rc script receives QUIT, it will not forward QUIT to the parent rc process but instead exit with signal itself. If an rc script receives INT, it will do basically the same thing — exit with error rather than signal, but either way, the parent rc process is not signaled, and we do not return to single user.

Additionally, the user pressing ^C or ^\ at the console will no longer signal a running rc script, but instead the primary rc process.

I don't know good ways to fix these differences in shell.

wait without arguments always returns zero, but theoretically we could count the number of commands in $line and wait for each individual background job. Then we could check for SIGQUIT exit (128+3) or SIGINT exit (1) and return to single user. That addresses one aspect of the difference, but not the lack of console interaction with running scripts. I don't have a good solution for that.

cem added a comment.Feb 9 2019, 11:27 PM
In D3715#409266, @cem wrote:

I suspect the rcorder -p mode could be implemented much less invasively than this change. It's fundamentally the same graph algorithm. I'll give it a shot.

I take that back :-). rcorder's implementation of a topological sort leaves a lot to be desired, unfortunately. It'd probably be better to redo it (correctly) from scratch.

imp added a comment.Feb 10 2019, 1:18 AM

I wonder: why not leverage tsort?

etc/rc
137

other places use it, and while consistency is the hobgoblins of small minds, it's a useful hobgoblin...

143

This is completely lame. We do each layer then wait. But that's lame, given the topology sort we're doing, we'll wind up doing more waiting than we need to.
There's also the issue of error codes. what should we do with the dependent scripts for a failed RC script?

sbin/rcorder/rcorder.c
756

clean this up.

804–808

this should be deleted.

845

ummm, this comment makes no sense.

cem added a comment.Feb 10 2019, 1:34 AM
In D3715#409280, @imp wrote:

I wonder: why not leverage tsort?

tsort's topological sort implementation is definitely better than rcorder's and lends itself to printing parallelizable "generations" more readily, but integrating it into rcorder is somewhat tricky (you could move the toposort functionality into libutil, perhaps, but fork+exec just to run a topo sort is ugly).

Neither tool is really what you want — the generations are better, but still artificially restrict concurrency. (Services in generation N+1 do not necessarily depend on every single service in generation N.) Ideally, you'd do the topological sort in rc(8) itself and start services as a side effect, as soon as their dependency count drops to zero. We're held back from implementing that in today's rc by Bourne shell. Shell just lacks reasonable datastructures. It is conceivably implementable in Lua, though. (Or C, obviously.)

(Also, while it's something that can be fixed, at the moment tsort(1) lives on /usr, and historically we've allowed /usr to be an independent filesystem with different liveness.)

Okay, so apart from implementation problems, there's actually a major architectural issue here: the fact that we wait for all the scripts from the previous generation. Do you see any way to actually fix it, or should we perhaps just scrap it and go back to the drawing board?

cem added a comment.Feb 11 2019, 5:31 PM
In D3715#409480, @trasz wrote:

Okay, so apart from implementation problems, there's actually a major architectural issue here: the fact that we wait for all the scripts from the previous generation. Do you see any way to actually fix it, or should we perhaps just scrap it and go back to the drawing board?

I don't know if it's major. You may still see some parallelism with this change that you wouldn't without it. However, the churn and risk may not be worth the incomplete improvement? (I don't feel strongly either way.)

trasz added a comment.Feb 12 2019, 1:43 PM

I don't have any _strong_ feelings, but I'd prefer doing this right.

Perhaps we should start from the other end? As in, get (either write or use existing one) an utility to "spawn" processes in a controlled way, with clean environment ('environment' in the wide sense, not just UNIX environment variables), and leave the dependencies and ordering in general for later?

trasz abandoned this revision.Mar 5 2019, 9:00 AM

Drop it. It doesn't work quite right.