Page MenuHomeFreeBSD

Relax the rule against declaring variables in nested scopes.
ClosedPublic

Authored by imp on Jun 17 2020, 4:15 AM.

Details

Summary

The source tree has moved beyond this in a number of ways. It's time to just
delete the rule. The tree uses it in a number of places, and reviewers can still
comment that any specific use of a variable in nested scopes is less clear than
the alternatives.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I'm in favor of the change. However, it's been pointed out to me in the past that with all object declarations at the start of the function body it's easier to reason about the stack.

arichardson added a subscriber: arichardson.

I'm in favor of the change. However, it's been pointed out to me in the past that with all object declarations at the start of the function body it's easier to reason about the stack.

Except that it isn't true unless you compile at -O0: any vaguely recent compiler will reuse stack slots for variables for which it knows the lifetime cannot overlap.

Thanks for this change!

I'm in favor of the change. However, it's been pointed out to me in the past that with all object declarations at the start of the function body it's easier to reason about the stack.

Except that it isn't true unless you compile at -O0: any vaguely recent compiler will reuse stack slots for variables for which it knows the lifetime cannot overlap.

Thanks for this change!

I was probably vague. I referred to code and not to its execution. Consider this:

int f(void) {
  char *x;
  x = malloc(...);
  ...
  ...
  ...
  if (condition) {
    char buffer[huge];
  }
  ...

I wrote if (condition) { but that's more debatable than for (;;) { while both camouflage stack usage.

I do not like it.

I agree that the locals in the inner block indeed allow easier _write_ of the code, but I found empirically that it is easier to _read_ when locals are collected at the start of the function. Perhaps it is because I can see all local parameters in one place.

The argument that we have around a hundred places where the rule is already broken in fact confirms that most of our code confirms to the style(9) in this regard, since hundred places for several MLoCs of our own code is quite good ratio.

And, taking one step further, I strongly disagree about lifting the requirement of no initializators in local declarations.

vangyzen added a subscriber: vangyzen.

Long overdue. Thank you, Warner.

ian added a subscriber: ian.
In D25312#558270, @kib wrote:

I do not like it.

[...]

The argument that we have around a hundred places where the rule is already broken in fact confirms that most of our code confirms to the style(9) in this regard, since hundred places for several MLoCs of our own code is quite good ratio.

It's not 'a hundred places', it's "hundreds of places within src/sys even if you filter out contrib and vendor-contributed drivers". If you expand the grep to all of /usr/src and leave contributed code in, it amounts to about 5000 occurrances, and that's just searching 'for \(int'. I don't know of an easy way to search for local vars inside a local block but not inside a for() statement, but I'll bet it's several thousand more.

IMO, it's actually easier to find variable declarations when they are closer to the point of use, it's just a matter of old habbits that need to be unlearned and new ones developed.

I find this change debatable, and as presented here incomplete as I believe this has effects on some other parts of style(9), nor does it provide a proper sample (it leaves the example unchanged) Lets not RUSH to slam this in the tree!

In D25312#558270, @kib wrote:

I do not like it.
...

Kib raises some valid points, and if in fact the actual usage in non-vendor code it may be best to proceed slowly and carefully.

In D25312#558270, @kib wrote:

I found empirically that it is easier to _read_ when locals are collected at the start of the function. Perhaps it is because I can see all local parameters in one place.

I agree with this. One look at the beginning of a function definition and you have a good idea what the stack will look like. As opposed to when variables are declared somewhere later in the function.

I like this change; we've had nested variable declarations since 1989 and I think 30 years is more than enough time to get used to them. I'm a bit more cautious about declarations in the initializer of a for loop. I do acknowledge the concerns several developers have expressed above about stack depth; this information should be possible to derive from DWARF2 debugging information, so perhaps an automated tool could replace manual inspection of the source.

jhb added a subscriber: jhb.

This doesn't lift the requirement to avoid initializing variables when they are declared, it only lifts the requirement of where to place declarations. I recently broke this rule on purpose in some crypto changes where I wanted to constrain the object's lifetime so I could put the explicit_bzero in the ideal place.

BTW, one thing this leaves ambiguous is our you can declare variables in the middle of blocks vs at the beginning. That is:

int
foo(int *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int one, two;

        one = p[i] * 2;
        two = bar(&one);
        ...
    }
}

vs.

int
foo(int *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int one = p[i] * 2;
        int two = bar(&one);
        ...
    }
}

I guess this is the second point kib@ made. Do we think relaxing this means it's ok to declare variables at the start of any block or do we think it's fine to define them in the middle of blocks? If the former you would perhaps want to replace the removed sentence with something like:

Place declarations at the start of blocks.

I had read the intention of removing this as being the former and not the latter FWIW.

In D25312#558449, @jhb wrote:

BTW, one thing this leaves ambiguous is [whether] you can declare variables in the middle of blocks vs at the beginning. That is:

[...]

Place declarations at the start of blocks.

I could live with this. Although I don’t have an objection to _allowing_ declarations in line when it makes sense, I do think one can get carried away and overdo it.

I'm ok dropping the sentence as-is, but I am concerned the increased use of scoped declarations could decrease legibility, if overused. Can we "lightly discourage" use of scoped declarations in a more relaxed way? The existing language seems to kind of do that if you read the qualification at the end and don't stop at "Do not …"

Maybe "Declarations in block scope are acceptable if they increase code clarity. Absent other factors, top-level declarations are preferred."? That is essentially the same as "use it when you want to," with a slight bias towards top-level declarations. I'm ok being outvoted on this one.

In D25312#558449, @jhb wrote:

BTW, one thing this leaves ambiguous is our you can declare variables in the middle of blocks vs at the beginning. That is:

int
foo(int *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int one, two;

        one = p[i] * 2;
        two = bar(&one);
        ...
    }
}

vs.

int
foo(int *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int one = p[i] * 2;
        int two = bar(&one);
        ...
    }
}

I guess this is the second point kib@ made. Do we think relaxing this means it's ok to declare variables at the start of any block or do we think it's fine to define them in the middle of blocks? If the former you would perhaps want to replace the removed sentence with something like:

Place declarations at the start of blocks.

I had read the intention of removing this as being the former and not the latter FWIW.

comma statements should be avoided:)

In D25312#558466, @dab wrote:
In D25312#558449, @jhb wrote:

BTW, one thing this leaves ambiguous is [whether] you can declare variables in the middle of blocks vs at the beginning. That is:

[...]

Place declarations at the start of blocks.

I could live with this. Although I don’t have an objection to _allowing_ declarations in line when it makes sense, I do think one can get carried away and overdo it.

+1

I wonder if we need a poll for this, since there are now three options (1: as-is, 2: start-of-block, 3: anywhere). The ambiguity means most of the approvals are also ambiguous.

That being said, since the majority is clearly in favor of some change, you could simply commit this as-is, and someone could open a followup review to distinguish options 2 and 3. There is no need to be overly legalistic.

One look at the beginning of a function definition and you have a good idea what the stack will look like. As opposed to when variables are declared somewhere later in the function.

I'm afraid that's not the case. Some of this was already mentioned above, but I'll summarize in one place.

Many modern machines have a lot of general purpose registers, which the compiler will use for some local variables. Looking at a declaration block alone doesn't tell you which will be where (except those too large for a register). You have to study the entire function, and even then, you might be surprised.

Callees might be inlined into the function of interest, merging the stack frames, further complicating a human's understanding of the size and layout.

Modern compilers track variable liveness, so one storage location might hold multiple variables at different times. This argument is in favor of this change, since minimizing the scope of a variable more closely represents this liveness tracking.

One look at the beginning of a function definition and you have a good idea what the stack will look like. As opposed to when variables are declared somewhere later in the function.

I'm afraid that's not the case. Some of this was already mentioned above, but I'll summarize in one place.

Many modern machines have a lot of general purpose registers, which the compiler will use for some local variables. Looking at a declaration block alone doesn't tell you which will be where (except those too large for a register). You have to study the entire function, and even then, you might be surprised.

Callees might be inlined into the function of interest, merging the stack frames, further complicating a human's understanding of the size and layout.

Modern compilers track variable liveness, so one storage location might hold multiple variables at different times. This argument is in favor of this change, since minimizing the scope of a variable more closely represents this liveness tracking.

Note that what I said has nothing to do with the stack usage. I mean exactly the cognitive load to assess the local data that is operated by the function upon.
I am very much aware about stack space reuse optimizations due to reading a lot of disassemble looking for panics reasons.

Summary so far:

  1. there's overwhelming support for this, at least in theory
  2. the devil is in the details.

As constructed, it's a bit incomplete, however. There's a couple of other tweaks likely necessary to go along with this. So I'll be doing a second round that's more verbose and offers some best practices clarifications.

The millions of lines of code is an unfair denominator: only a tiny fraction of those are declarations. And vendor code should not be excluded since it represents industry practice. I know I've written maybe thousands of extra lines of code with newbus because I couldn't say struct foo_softc *sc = device_get_softc(dev); or because I passed in a void *xfoo and wanted to use foo_t *foo = xfoo; There's much pent up demand for this. There's wide-spread support for this, I judge, though I'll make sure the details have a chance for approval of people that are commenting here.

Stack usage is a red herring. No modern compiler cares and above -O0 you can't predict anything.

Random declarations mixed in with code are rare in the tree, and rarely promote better code. The support for these isn't clear and likely should be handled separately.

So the goal of any style doc is not only prescriptive, but to encode best practices. With that in mind, I propose to come up with a new diff that codifies the following:

  1. Initializations may be included in declaration where it makes the code clearer
  2. Weaken the admonition against nested variables
  3. Explicitly allow for (int i...) but I'm on the fence for the examples.

All these things exist today in the tree and in many places make the code more compact, readable and understandable (though I'm sure you'll find places too that do not). I was naive to think the originally proposed simple change would suffice.

In D25312#559159, @imp wrote:

Summary so far:

  1. there's overwhelming support for this, at least in theory
  2. the devil is in the details.

... Much sound analysis removed ...

Thank you for this summary, I agree that it represents the consensus and concerns raised by all and that doing so would achieve a better result.

sef added a subscriber: sef.

Long long long overdue.

gnn added a subscriber: gnn.
This comment was removed by gnn.

Update to match what I think the conensus is in this review. Please
tell me how well I did :)

This revision now requires review to proceed.Jul 1 2020, 8:36 PM
This revision is now accepted and ready to land.Jul 1 2020, 8:43 PM

I think this is probably consistent with a fair bit of code in the tree which does use patterns like

static int
foo_attach(device_t dev)
{
     struct foo_softc *sc = device_get_sofc(dev);
...

I think the rule to only use initializers when the value is constant is probably a good compromise.

In D25312#564662, @jhb wrote:

I think the rule to only use initializers when the value is constant is probably a good compromise.

Yea. I phrased it like that because that seemed to be what the vast majority of the tree does and therefore had implicit support of the community of developers... I'm open to other ways to capture this, but I think this is the best compromise (though we didn't talk about it in the review itself).

cem requested changes to this revision.Jul 1 2020, 11:25 PM

The first hunk is fine and should go in on its own. Again, I'd prefer a vague "prefer top-level default" / don't overdo it, folks, but whatever. It's ok.

The second hunk textual changes are mostly fine as well.

share/man/man9/style.9
693–694 ↗(On Diff #73996)

"constant for the rest of the scope" does not reflect real-world use of initialized-at-declaration variables in our codebase. It also isn't clear what "constant" means in this context.

699 ↗(On Diff #73996)

I don't like the added example of struct foo one, *two = bar_get_foo(bar);. Function initializations should probably not be mixed with other declarations in a single statement.

This revision now requires changes to proceed.Jul 1 2020, 11:25 PM

Though much better than the prior iteration I believe it leaves some detail out.

share/man/man9/style.9
596 ↗(On Diff #73996)

I believe we still want to say something about the fact that the preferable form is to put declarations at the top of a routing, and that these 2 exceptions exist, the for loop, and the const within scope. If we just delete the above it leaves an uncovered set of cases, some of which may be undesirable.

603 ↗(On Diff #73996)

Is it specific ONLY to for? or can I do this in a while, or other loop constructs?

Update example
Soften the wording a little

This revision is now accepted and ready to land.Jul 14 2020, 2:34 PM
share/man/man9/style.9
596 ↗(On Diff #73996)

Unaddressed?

603 ↗(On Diff #73996)

Unaddressed?

share/man/man9/style.9
603 ↗(On Diff #73996)

This only makes sense for for loops.

share/man/man9/style.9
705 ↗(On Diff #74442)

Do to all the changes on these 5 lines it makes it very hard to understand just what rule is changed. Why did myfunction, and all the other variable names get shuffled around? Why should the double be renamed to four, could you of not just used three? Since you never use four, five or six why did they need to change?

share/man/man9/style.9
603 ↗(On Diff #73996)

Since the block scope declaration restriction has been deleted I believe this version of style would allow me to declare variables inside a while block. If that is NOT the intent it would be best to put back a reworded block restriction clause.

share/man/man9/style.9
603 ↗(On Diff #73996)

That is intended, and has nothing to do with loops in particular. It’s permitted in any scoped block.