Details

Reviewers: None
Commits: rS329877: Correct typo in ATA_WRITE_UNCORRECTABLE_PSEUDO

Summary

Add the "-p" option to rcorder(8), to output things that can be started together
separated by spaces. This might be used to implement parallel service startup.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 15242
Build 15306: arc lint + arc unit

Event Timeline

trasz updated this revision to Diff 8894.Sep 22 2015, 5:38 PM

trasz retitled this revision from to Add "rcorder -p"..

trasz updated this object.

trasz edited the test plan for this revision. (Show Details)

Herald added a subscriber: imp. · View Herald TranscriptSep 22 2015, 5:38 PM

Fix ordering bug too complicated to describe here.

While generating the correct ordering, this seems to handle loops in a slightly
less graceful way. Oh well.

Hook up to the startup scripts.

We've been running a python variant of this for a while now at $work.

Based on past experience, this is a difficult task to deal with.

One thing that seems to be missing is the ability to halt the boot process in the event a critical error occurred (say, fsck/mountcritlocal fails). Do you have any plans on how this will be handled?

etc/rc
133	Where is this variable set?

ngie added a subscriber: benno.Oct 25 2015, 8:01 AM

Not really. It's basically a piece of work in progress. There are many more elements needed - such as a mechanism to "spawn" services in a known environment, instead of inheriting the one of the user calling service(8), or running a startup script by hand. I've just stashed this here so it won't get lost.

emaste added a subscriber: emaste.Feb 23 2018, 7:49 PM

This revision was not accepted when it landed; it landed in state Needs Review.Feb 23 2018, 8:01 PM

Closed by commit rS329877: Correct typo in ATA_WRITE_UNCORRECTABLE_PSEUDO (authored by emaste). · Explain Why

This revision was automatically updated to reflect the committed changes.

@trasz sorry I accidentally committed r329877 but referenced this review; can you please re-upload?

Restore the proper patch and rebase.

Harbormaster completed remote builds in B15242: Diff 39680.Feb 24 2018, 2:54 PM

I suspect the rcorder -p mode could be implemented much less invasively than this change. It's fundamentally the same graph algorithm. I'll give it a shot.

etc/rc
130–131	Usual style convention in this file seems to be `; do` on the same line as the `while` or `if`.
133	Lines 97-100 above. The name is a little confusing/misleading; it contains the set of "early" rc scripts that were completed already, in the loop lines 98-105. It is not updated as regular rc scripts are run (before or after this change). (Also, this section of the code isn't really changed in this revision, it's just indented an extra level.)
137	Might be more clearly expressed as `if checkyesno rc_parallel`, but I don't feel strongly about it.
138	As ngie points out, this doesn't really handle failing scripts in the same way (I think — I'm not a shell expert). run_rc_script traps SIGQUIT (just to print out a message about what script was interrupted before re-raising the signal) and SIGINT (to exit rc(8)); as well as handling SIGINFO to print a diagnostic about the running script. In this patch's rc_parallel mode, both the QUIT and INT handlers are broken (if I understand correctly). If an rc script receives QUIT, it will not forward QUIT to the parent rc process but instead exit with signal itself. If an rc script receives INT, it will do basically the same thing — exit with error rather than signal, but either way, the parent rc process is not signaled, and we do not return to single user. Additionally, the user pressing ^C or ^\ at the console will no longer signal a running rc script, but instead the primary rc process. I don't know good ways to fix these differences in shell. `wait` without arguments always returns zero, but theoretically we could count the number of commands in `$line` and `wait` for each individual background job. Then we could check for SIGQUIT exit (128+3) or SIGINT exit (1) and return to single user. That addresses one aspect of the difference, but not the lack of console interaction with running scripts. I don't have a good solution for that.

In D3715#409266, @cem wrote:

I suspect the rcorder -p mode could be implemented much less invasively than this change. It's fundamentally the same graph algorithm. I'll give it a shot.

I take that back :-). rcorder's implementation of a topological sort leaves a lot to be desired, unfortunately. It'd probably be better to redo it (correctly) from scratch.

I wonder: why not leverage tsort?

etc/rc
137	other places use it, and while consistency is the hobgoblins of small minds, it's a useful hobgoblin...
143	This is completely lame. We do each layer then wait. But that's lame, given the topology sort we're doing, we'll wind up doing more waiting than we need to. There's also the issue of error codes. what should we do with the dependent scripts for a failed RC script?
sbin/rcorder/rcorder.c
756	clean this up.
804–808	this should be deleted.
845	ummm, this comment makes no sense.

In D3715#409280, @imp wrote:

I wonder: why not leverage tsort?

tsort's topological sort implementation is definitely better than rcorder's and lends itself to printing parallelizable "generations" more readily, but integrating it into rcorder is somewhat tricky (you could move the toposort functionality into libutil, perhaps, but fork+exec just to run a topo sort is ugly).

Neither tool is really what you want — the generations are better, but still artificially restrict concurrency. (Services in generation N+1 do not necessarily depend on every single service in generation N.) Ideally, you'd do the topological sort in rc(8) itself and start services as a side effect, as soon as their dependency count drops to zero. We're held back from implementing that in today's rc by Bourne shell. Shell just lacks reasonable datastructures. It is conceivably implementable in Lua, though. (Or C, obviously.)

(Also, while it's something that can be fixed, at the moment tsort(1) lives on /usr, and historically we've allowed /usr to be an independent filesystem with different liveness.)

Okay, so apart from implementation problems, there's actually a major architectural issue here: the fact that we wait for all the scripts from the previous generation. Do you see any way to actually fix it, or should we perhaps just scrap it and go back to the drawing board?

In D3715#409480, @trasz wrote:

Okay, so apart from implementation problems, there's actually a major architectural issue here: the fact that we wait for all the scripts from the previous generation. Do you see any way to actually fix it, or should we perhaps just scrap it and go back to the drawing board?

I don't know if it's major. You may still see some parallelism with this change that you wouldn't without it. However, the churn and risk may not be worth the incomplete improvement? (I don't feel strongly either way.)

I don't have any _strong_ feelings, but I'd prefer doing this right.

Perhaps we should start from the other end? As in, get (either write or use existing one) an utility to "spawn" processes in a controlled way, with clean environment ('environment' in the wide sense, not just UNIX environment variables), and leave the dependencies and ordering in general for later?

Drop it. It doesn't work quite right.

0mp mentioned this in D25389: Add loop visualization and parallel execution support into rcorder.Jun 26 2020, 7:16 PM

unitrunker_gmail.com added a subscriber: unitrunker_gmail.com.Aug 21 2020, 7:17 AM

unitrunker_gmail.com removed a subscriber: unitrunker_gmail.com.

unitrunker_gmail.com added a subscriber: unitrunker_gmail.com.Aug 21 2020, 7:25 AM