Page MenuHomeFreeBSD

rc.subr: Implement list_vars without using 'read'
ClosedPublic

Authored by cem on Dec 7 2018, 10:35 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Mar 25, 10:07 PM
Unknown Object (File)
Feb 15 2024, 12:21 AM
Unknown Object (File)
Feb 14 2024, 10:36 AM
Unknown Object (File)
Feb 12 2024, 4:49 AM
Unknown Object (File)
Feb 5 2024, 5:16 AM
Unknown Object (File)
Dec 20 2023, 3:19 AM
Unknown Object (File)
Oct 14 2023, 11:40 AM
Unknown Object (File)
Sep 11 2023, 12:58 PM
Subscribers
None

Details

Summary

'read' pessimistically read(2)s one byte at a time, which can be quite
slow for large environments.

Suggested by: jilles

Test Plan

Basic test:

$ truss -f -o truss.log -- sh -c '. /etc/rc.subr ; list_vars SHELL\*' && rg read\\\( truss.log | wc -l
SHELL
SHELLCHECK_OPTS
3401
$ truss -f -o truss.log -- sh -c '. ./rc.subr ; list_vars SHELL\*' && rg read\\\( truss.log | wc -l
SHELL
SHELLCHECK_OPTS
63

Performance:

$ /usr/bin/time sh -c '. ./rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        1.08 real         1.06 user         0.02 sys
$ /usr/bin/time sh -c '. ./rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        1.09 real         1.06 user         0.03 sys
$ /usr/bin/time sh -c '. ./rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        1.10 real         1.09 user         0.00 sys
$ /usr/bin/time sh -c '. ./rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        1.06 real         1.06 user         0.00 sys

$ /usr/bin/time sh -c '. /etc/rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        3.65 real         1.54 user         2.40 sys
$ /usr/bin/time sh -c '. /etc/rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        3.69 real         1.49 user         2.49 sys
$ /usr/bin/time sh -c '. /etc/rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        3.81 real         1.55 user         2.55 sys
$ /usr/bin/time sh -c '. /etc/rc.subr ; for j in $(seq 20) ; do for i in $(seq 100) ; do list_vars SHELL\* ; done; done' >/dev/null
        3.82 real         1.54 user         2.57 sys

$ cat > postpatchms.dat
1060
1100
1090
^D

$ cat > prepatchms.dat
3820
3810
3690
^D

$ ministat -w 60 prepatchms.dat postpatchms.dat
x prepatchms.dat
+ postpatchms.dat
+------------------------------------------------------------+
| +                                                        x |
|++                                                      x x |
|AM                                                      |AM||
+------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3          3690          3820          3810     3773.3333     72.341781
+   3          1060          1100          1090     1083.3333      20.81666
Difference at 95.0% confidence
        -2690 +/- 120.649
        -71.2898% +/- 1.24902%
        (Student's t, pooled s = 53.2291)

Loop enough times and it's measurable, even on 2017 amd64.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

FYI, this booted in a very simple MFS root (no real networking other than lo0)

cem added a reviewer: dteske.
libexec/rc/rc.subr
66 ↗(On Diff #51744)

Add comment (since this is the only use in this file)

local - # localize set options
67 ↗(On Diff #51744)

Remove this line and add the word local before the IFS=$'\n' line (emulating ltr() function in this file).

68–69 ↗(On Diff #51744)

Style in this file is to combine related locals.

local line var

71–72 ↗(On Diff #51744)

No need to be so verbose. Just put the shell option's longname in an inline comment:

set -f # noglob

73 ↗(On Diff #51744)

Combine local and set in one like in the ltr() function in this file:

local IFS=$'\n'

74 ↗(On Diff #51744)

Space around parens (editor jump aid).

for line in $( set ); do
              ^   ^
77–78 ↗(On Diff #51744)

Indentation doesn't match sister-style of switch/case in style(9). Use:

case ... in
blah) ... ;;
blah) ... ;;
*) ...
esac

Like in C (as seen in style(9) -- switch and case at same level of indentation):

switch (...) {
case blah: ...;
case blah: ...;
default: ...;
}
libexec/rc/rc.subr
66 ↗(On Diff #51744)

Clarification: Use an inline comment to prevent ambiguity that the comment could pertain to the block since there is no newline after the local - line (an inline comment makes it more clear and the recommend inline comment would be localize set options)

libexec/rc/rc.subr
71–72 ↗(On Diff #51744)

In that case, it seems even better to write set -o noglob.

74 ↗(On Diff #51744)

That would be the first time to use that pattern in rc.subr. Existing style in this file is to write $() command substitutions without whitespace between the $(, command and ).

77–78 ↗(On Diff #51744)

Also, existing style in this file is to indent the keyword case and the patterns at the same level.

cem marked 6 inline comments as done.Dec 8 2018, 10:30 PM
cem added inline comments.
libexec/rc/rc.subr
66 ↗(On Diff #51744)

Do we have a sh(1) style document that supersedes style(9) (re: requests for terse, inline commenting)? Generally inline comments in C are used rarely; mostly for struct member declarations.

I think it is understood that not all lines following a comment are necessarily described by the preceding comment, but could add a blank line instead, if that is preferred.

67 ↗(On Diff #51744)

This syntax isn't part of POSIX shell and isn't document in the sh(1) manual page section on local. If I should use a construct with potentially confusing side effects (is the assignment evaluated before the variable is made local, or after) depending on implementation, I'd like it to be documented.

68–69 ↗(On Diff #51744)

Again, this syntax isn't documented in sh(1).

71–72 ↗(On Diff #51744)

I'll go with jilles' suggestion.

74 ↗(On Diff #51744)

I'll leave as-is. Can you elaborate on what you mean by editor jump aid, though?

77–78 ↗(On Diff #51744)

I'll de-indent case to match existing style. In C / style(9), generally the body of a case statement would be on an independent line, and indented.

cem marked 7 inline comments as done.
  • comment the full behavior of 'local -' rather than the subset used in this function, per dteske
  • Use the slightly more explicit 'set -o noglob', in preference to 'set -f', per jilles
  • De-indent cases, to match existing style (per dteske and jilles)
  • Drop case body statements to independent lines -- unlike in C, there is a lot going on in the 'case' itself
libexec/rc/rc.subr
68–69 ↗(On Diff #51744)

Sure looks like it's documented to me:

Variables may be declared to be local to a function by using the local
 command.  This should appear as the first statement of a function, and
 the syntax is:

       local [variable ...] [-]

 The local command is implemented as a built-in command.  The exit status
 is zero unless the command is not in a function or a variable name is
 invalid.

The ... means more than one is allowed. This is reinforced by 'a variable' rather than 'the variable' used in the description.

cem marked an inline comment as done.Dec 8 2018, 11:54 PM
cem added inline comments.
libexec/rc/rc.subr
68–69 ↗(On Diff #51744)

It wasn't clear to me, or we wouldn't be having this conversation, right? And if the docs are confusing to someone who has read and written a lot of docs and code, maybe there is room for improvement.

I think it would help to

  1. Be explicit about the command supporting multiple variables in one invocation. The text as written just indicates that multiple uses of the command can localize plural variables.
  2. Leave off the [-] from the syntax line. It isn't some positional argument that must occur last, it's just another "variable."
  3. I don't think the first variable is especially optional, or if it is, it's a useless invocation of the command and not worth documenting.

This seems better:

One or more variables may be declared to be local to a function by using the local
command.  This should appear as the first statement of a function, and
the syntax is:

       local variable [...]

I'd reorder the subsequent paragraphs to put the one documenting - first, the inheritance behavior second, and the built-in command / exit status last, or just elided entirely. Zero in-tree consumers of local care about the exit status.

libexec/rc/rc.subr
68–69 ↗(On Diff #51744)

Sure, that would be clearer. I've seen this idiom so many times that knowing it's allowed, I have a blind spot for how that can be better communicated.

libexec/rc/rc.subr
68–69 ↗(On Diff #51744)

The words "parameter" and "variable" are not used fully consistently in that section, but what you are proposing would make it inconsistent with "Variables and Parameters": a variable is a parameter designated by a name, and there are also positional and special parameters. So when both variables and the special parameter - are meant, the word "parameter" should be used. On the other hand, the special parameter - is always set, never readonly and never exported.

Regarding invoking local without operands, this does nothing but that is not portable, so not documenting that seems reasonable. The ... should not be in square brackets to be consistent with e.g. eval and export in the section "Built-in Commands" below.

The section "Built-in Commands" repeats local's synopsis (this is OK, but needs to follow suit when the section "Functions" is changed).

The exit status matters to scripts that do set -e.

cem marked an inline comment as done.Dec 10 2018, 12:22 AM
cem added inline comments.
libexec/rc/rc.subr
68–69 ↗(On Diff #51744)

Thanks, that's a good suggestion — "parameter" is a more appropriate description.

Re: ...: Ok — I guess ... already conveys optionality.

Re: built-in commands: maybe it would make more sense to document the exit status there, where it is specified as a command?

Re: set -e: I don't find that argument totally convincing yet. My read of the paragraph is that local only returns non-zero when scripts have what are essentially syntax errors. I'd assert that set -e scripts don't expect to get away with syntax errors, given the higher bar they already set re: every statement returning success. On the other hand, it is good to document where the local statement is valid, and that paragraph does so.

Anyway, let's table this discussion for now; I will owe you and imp a subsequent review for the changes I am proposing.

libexec/rc/rc.subr
66 ↗(On Diff #51744)

If you look at the file itself, you'll see inline comments are common and many.

The style section of my book (Serious Shell Programming) has a chapter on comments:
https://freebsdfrau.gitbooks.io/serious-shell-programming/content/style/comments.html

67 ↗(On Diff #51744)

Is the problem one of declaring a local and assigning a value at the same time?

Not asking for anything that isn't already done in this file. See line 1891, inside the ltr() function:

1891         local IFS="${_src}"

Combine line 67 of your patch:

67         local IFS

with line 73 of your patch:

73         IFS=$'\n'

to form the following single line:

73         local IFS=$'\n'

Or is the problem one of using $'\n' for the first time in this file? That was part of the originally submitted patch and I am unsure of when it was added to our /bin/sh (perhaps Jilles can provide some backstory there), but it appears to be supported (and the likelihood of someone porting rc.subr to another OS that lacks our /bin/sh is perhaps dubious).

68–69 ↗(On Diff #51744)

Re: exit status

Advanced shell code such as that in bsdconfig that emulates things like structs in shell make sure to check the exit code of built-ins such as setvar and local without relying on mehanics such as set -e to catch errors. Basically anytime you may pass a variable as the name of another variable (utilizing "indirection").

74 ↗(On Diff #51744)

The simplest example perhaps is in nvi/vi/vim, comparing these two lines:

abc=$(date)
abc=$( date )

Put your cursor at far left.

In line 1, abc=$(date) (which lacks space around the word date), there are 2 ways to jump the cursor to the start of date:

  1. Press w twice
  2. Press f then d to jump to the first occurrence of d to the right of the cursor

In line 2, abc=$( date ) (which has space around the word date), the above options are possible as well as:

  1. Press W once

FWIW, this did fix the multi-hour delay in a formal model simulator.

cem marked 3 inline comments as done.Dec 10 2018, 7:08 PM
cem added inline comments.
libexec/rc/rc.subr
67 ↗(On Diff #51744)

Is the problem one of declaring a local and assigning a value at the same time

Yep.

Not asking for anything that isn't already done in this file. See line 1891, inside the ltr() function:

Understood. Nevertheless, it isn't documented.

(I understand the basic text manipulation of what you want done.)

68–69 ↗(On Diff #51744)

Where does bsdconfig check the exit code of local? AFAICT that statement is false — the return value of local is not checked explicitly. If bsdconfig doesn't set -e, then it doesn't check at all. (I don't think checking the return value of local is especially useful, but you're making the extraordinary claim that it does check.)

74 ↗(On Diff #51744)

Huh. I'd just press → 6 or 7 times.

Put 'local' variables on the same line; just forgot to do this last revision.

libexec/rc/rc.subr
67 ↗(On Diff #51744)

It is absolutely documented.

Variables and Parameters
  The shell maintains a set of parameters.  A parameter denoted by a name
  (consisting solely of alphabetics, numerics, and underscores, and
  starting with an alphabetic or an underscore) is called a variable.  When
  starting up, the shell turns all environment variables with valid names
  into shell variables.  New variables can be set using the form

        name=value

  A parameter can also be denoted by a number or a special character as
  explained below.

  Assignments are expanded differently from other words: tilde expansion is
  also performed after the equals sign and after any colon and usernames
  are also terminated by colons, and field splitting and pathname expansion
  are not performed.

  This special expansion applies not only to assignments that form a simple
  command by themselves or precede a command word, but also to words passed
  to the export, local or readonly built-in commands that have this form.
  For this, the builtin's name must be literal (not the result of an
  expansion) and may optionally be preceded by one or more literal
  instances of command without options.

It is in the last paragraph that it explains that:

foo=bar somecommand some args
local foo=bar
readonly foo=bar
export foo=bar

Are all supported syntaxes.

68–69 ↗(On Diff #51744)

Where does bsdconfig check the exit code of local?

It doesn't. I said "built-in such as ..." (you seem to have missed the "such as").

AFAICT that statement is false -- the return value of local is not checked explicitly.

It doesn't. I said "Advanced shell code such as that in bsdconfig" (you again seem to have missed the "such as").

If bsdconfig doesn't set -e, then it doesn't check at all.

Not sure why you checked in the first place. I never said bsdconfig either explicitly or implicitly checks the return status of local but rather that advanced shell code that uses variable indirection such as bsdconfig. Here is an example of variable indirection taken from /usr/share/bsdconfig/struct.subr:

f_struct_new()
{
        local type="$1" name="$2"
        f_dprintf "f_struct_new: type=[%s] name=[%s]" "$type" "$name"
        [ "$name" ] || return $FAILURE
        setvar "_struct_type_$name" "$type" || return $FAILURE
        # OK to use bare $name at this point
        eval $name\(\){ f_struct $name \"\$@\"\; }
}

Shell code that uses similarly-advanced techniques would quite conceivably do the following:

somefunc()
{
        local $1 || return $FAILURE
        ...
}

No extraordinary claims are being made here.

The manual states, by your own admission:

The exit status is zero unless the command is not in a function or a variable name is invalid.

In any/all cases of variable indirection (dynamic name for a local in this context), the return status should be checked. You can be forgiven for not envisioning the use-case of indirection based on the description above from the man-page, but I hardly think it is the modus operandi of every manual to explicitly spell out exactly the value-add here.

74 ↗(On Diff #51744)

Which is it? 6? or 7?

I strive to waste as little energy as possible navigating the cursor, or trying to visually determine how far away something is.

On occasion I also have to deal with a laggy terminal and over-shooting a goal is not only a waste of time but frustrating and error-prone.

The way I drive a terminal, I often don't have to see where the cursor is at the time that I type. I can type blindly with only looking at what is on the screen very seldom.

cem marked 3 inline comments as done.Dec 10 2018, 11:47 PM
cem added inline comments.
libexec/rc/rc.subr
67 ↗(On Diff #51744)

Thanks for pointing that out. I missed it — I think that's a sign it could be more clearly linked from the section on local specifically (and perhaps the other ones as well, although on the one hand I've never heard of readonly, and on the other I feel like export foo=bar is extremely common).

68–69 ↗(On Diff #51744)

If the intent was "such as" and didn't not make the claim that the return value of local is used, I fail to see how it was germane. So why say it?

(I checked bsdconfig only because you mentioned it specifically — I had previously done several whole-tree greps to make sure the exit status was not used anywhere before I made my earlier claim.)

I have no idea what you're talking about re: indirection and am not really sure why it's germane either.

74 ↗(On Diff #51744)

Which is it? 6? or 7?

Depends on whether you put that extra space there or not?

I strive to waste as little energy as possible navigating the cursor, or trying to visually determine how far away something is.

Yeah, me too. That's why I don't bother with all of the silly navigation commands and just use the arrow keys. It's not for everyone. :-)

libexec/rc/rc.subr
68–69 ↗(On Diff #51744)

If the intent was "such as"

It was, and that's why I typed it, not once but twice.

So why say it?

You advocated for the removal of a sentence documenting the exit status of the local built-in. I gave arguments against your "zero in-tree consumers" argument. I'm failing to see the logic or removal when I can clearly imagine a case (and gave an example use-case -- variable indirection).

(I checked ... done several whole-tree greps ...

To be fair, your grep would miss things like this:

valid_varname()
{
        local $1 > /dev/null 2>&1
}
var=bad-var-name
valid_varname "$var" || { echo "$var: bad variable name" >&2; exit 1; }

I have no idea what you're talking about re: indirection and am not really sure why it's germane either.

Try googling "variable indirection".

Programming 101:

In /bin/sh:

foo=bar
setvar $foo baz # set bar=baz
eval echo \$$foo # baz

In bash:

foo=bar
eval $foo=baz # set bar=baz
echo ${!foo} # baz

This is basic stuff. Nearly every programming language on the planet supports variable indirection.

Why is it germain to mention variable indirection in a discussion about testing the exit status of local? Because local promises to return error status when given an invalid variable name. Indirection occurs when you rely on variable expansion in the argument space of the local built-in keyword to define the variable that will ultimately be made local.

74 ↗(On Diff #51744)

why I don't bother with all of the silly navigation commands

Muscle memory, my friend. Don't fight it, leverage it.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 11 2018, 1:39 AM
This revision was automatically updated to reflect the committed changes.