Page MenuHomeFreeBSD

Multiple improvements of indent(1)
ClosedPublic

Authored by pstef on Jun 25 2016, 10:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 9:37 PM
Unknown Object (File)
Fri, Jan 17, 6:55 PM
Unknown Object (File)
Fri, Jan 17, 6:17 PM
Unknown Object (File)
Sun, Jan 5, 3:04 AM
Unknown Object (File)
Dec 2 2024, 3:58 AM
Unknown Object (File)
Nov 21 2024, 4:36 AM
Unknown Object (File)
Nov 3 2024, 4:57 AM
Unknown Object (File)
Oct 11 2024, 9:50 AM

Details

Summary

Postgres developers use an older fork of indent(1) to indent their sources. I took FreeBSD's indent(1) and made the changes listed below, in hope that: 1) Postgres could use it and 2) it would be possible to push as many of these as possible upstream.

r301513[bugfix]Fix typo in keyword "typedef".
r303482[bugfix]Avoid out of bound access of array codebuf pointed into by s_code.
[bugfix]Avoid out of bound access of array ps.paren_indents.
[bugfix]Avoid out of bound access of array in_buffer.
r303483[bugfix]Avoid potential use-after-free.
r303484[bugfix]Fix breakage caused by single comment following "else".
[bugfix]Avoid out of bound access of array ps.p_stack.
r303485[bugfix]Semicolons inside struct declarations don't end the declarations.
r303499[bugfix]Support "f" and "F" floating constant suffixes.
r303489[bugfix]Removed whitespace shouldn't be considered in column calculations.
r303571[bugfix]Bail out if there's no more space on the parser stack.
r303570[bugfix]Consistently indent declarations.
r303718[bugfix]Don't ignore the fact that offsetof is a keyword.
[cleanup]Make definition of opchar conditional.
r303588[cleanup]Remove dead code relating to unix-style comments.
r303596[cleanup]Simplify pr_comment().
r303596[cleanup]Deduplicate code that decided if a comment needs a blank line at the top.
r303597[bugfix]Fix wrapping of some lines in comments.
r303598[bugfix]Untangle the connection between pr_comment.c and io.c, fixing at least two bugs.
r303599[bugfix]Don't newline on cpp lines like #endif unless -bacc is on.
r303735[feature]Add -sac (space after cast) and -nsac options.
r303735[feature]Add -U option for providing a file containing list of types.
r303746[optimization]Use bsearch() for looking up type keywords.
[style]return (var).
[style]sizeof(var).
r303600[cleanup]replace function call to bzero with memset.
[cleanup]Remove dead assignment.
r303601[cleanup]Rearrange option parsing code a bit to avoid complaints from static analyzers.
[style]Conform to FreeBSD's style(9).
r303629[bugfix]Avoid using values of pointers that refer to deallocated space.

I do possess separate patch files for when they are needed. My work on indent(1) is also available at https://github.com/pstef/freebsd_indent/commits/pass1

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

pstef retitled this revision from to Multiple improvements of indent(1).
pstef updated this object.
pstef edited the test plan for this revision. (Show Details)

hi,

this /looks/ fine for me, but it's a bit late to get it into 11.0-RELEASE. I can look to commit it after the stable/11 branch tag is laid.

How's that sound?

hi,

this /looks/ fine for me, but it's a bit late to get it into 11.0-RELEASE. I can look to commit it after the stable/11 branch tag is laid.

How's that sound?

I'll be glad if we could push this sooner or later, so even "later" is cool to me.

A minor fix for a bug I've introduced.

Add two fixes:

[bugfix] Fix breakage caused by single comment following "else"
[bugfix] Avoid out of bound access of array ps.p_stack

Found a few potential things just glancing at this.

args.c
354

fgets does not guarantee that there is trailing whitespace. In case there is no trailing whitespace at EOF, this will set line[0] = '\0';

indent.c
595

Please use snprintf to prevent buffer overflows.

pr_comment.c
205

Does this need a bounds check to make sure that buf_ptr is always within the bounds of the buffer (prevent out-of-bound read)?

314

Are commas here valid C syntax? Should they be semicolons?

Answer lattera's comments.

args.c
354

Why would that be a problem?

indent.c
595

I don't intend to change this fragment, at least not yet, and it has been like this for years. None of my patches address potentially unsafe practices in general, like replacing sprintf() with snprintf() as you suggest.

It's shown in the diff as changed merely because of whitespace change.

pr_comment.c
205

Perhaps, but again - it has been like that for years and I haven't changed it in any of the proposed patches of mine.

Not to say it doesn't need fixing, but that's perhaps for another patch.

314

Yes, they are. A more canonical way of expressing code doing effectively the same would be:

if (!ps.box_com && star_comment_cont) {
  *e_com++ = ' ';
  *e_com++ = '*';
  *e_com++ = ' ';
}

But I find it visually inferior to the one-liner from my patch.

If someone is to propose a patch that does the latter instead of the former consistently, I won't oppose.

I will try to have a deeper look/test in the next days

args.c
352

why not using getline(3) here to avoid having a limitation on the length of the line? which will also address the @lattera-gmail.com problem below

indent.c
1108

What is the point of changing that if and not keeping the original form ?

io.c
536

style: missing parenthesis?

lexi.c
137

same style issue?

591–618

style bug: sizeof()

602

style bug: sizeof()

620–622

sizeof()

pr_comment.c
281

style extract return (repeated in other part of the code)

args.c
354

It'd be a problem because if:

char *copy;
char *line = "some data that should be parsed here but isn't due to no trailing whitespace";
size_t lastchar = 0; /* strcspn(line, " \t\n\r") will return 0 */
line[lastchar] = '\0';
copy = strdup(line); add_typename(copy);

The copy variable ends up containing an empty string, even though there's data to be parsed.

Since my comment was based on a glance, not on a full and careful review, it's likely that this piece of data is getting parsed for a reason. Why parse it if you don't care it's an empty string, even when it's not?

ok I have to admit the original does not respect style(9) either

Address comments.

args.c
352

No specific reason, that's just the way it had been done for the Postgres fork and I didn't bother to improve it as you suggest.

354

strcspn("some data that should be parsed here but isn't due to no trailing whitespace", " \t\n\r") will return 4, not 0. 4 is the length of the segment of the first string which consists entirely of characters not from the second string.

I argue that truncation here is not doing any harm, bacause typedef names cannot include any of the characters from the second string.

indent.c
1108

I presume you mean the change from strncmp() == 0 to !strncmp(). I did that to make the line shorter so that I could fit the comment that I added on that line. Initially the comment was a bit longer and didn't fit in 80 columns.

io.c
536

I will add the parentheses if you insist, but I'm still to learn what their purpose is. Also note that the original code didn't use parenthesis around returns' arguments consistently, either.

lexi.c
591–618

If you insist.

pr_comment.c
281

I don't understand this comment.

args.c
354

After re-reading the manpage for strcspn, it looks like I missed an important detail: "it computes the string array index of the first character of s which is also in charset, else the index of the first null character." So if that trailing whitespace isn't found, then it'll reduce it to line[strlen(line)], in which case, you'll be fine. So disregard this set of comments.

I added @pfg who might also be interested

@pfg if working on this please note that on the github he listen, the changes are atomically committed, which may easier the process of incomporating the changes, in particular the bug fix one and would once in make this review easier to focus on the features.

Very interesting: All the BSDs fixed the "typdef" typo but indent(1) generally needs much more attention. Plus the clang-format tool doesn't accept BSD style.

I agree it's a bit late for 11 and it is easier to take the changes from github.
Perhaps it's good chance to also cleanup the Coverity issues (mostly cosmetic AFAICT).

In D6966#152751, @pfg wrote:

Very interesting: All the BSDs fixed the "typdef" typo but indent(1) generally needs much more attention.

That was when I mocked decades-old code for having a typo that nobody seems to have noticed all that time. Some NetBSD and OpenBSD people with commit bits noticed that and fixed it in their BSDs, FreeBSD then just copied that.

Perhaps it's good chance to also cleanup the Coverity issues (mostly cosmetic AFAICT).

I don't have access to that (or just don't know where to find it), but I'd love to look into fixing those.

pstef updated this object.
pstef edited edge metadata.
In D6966#152867, @email_piotr-stefaniak.me wrote:
In D6966#152751, @pfg wrote:

Very interesting: All the BSDs fixed the "typdef" typo but indent(1) generally needs much more attention.

That was when I mocked decades-old code for having a typo that nobody seems to have noticed all that time. Some NetBSD and OpenBSD people with commit bits noticed that and fixed it in their BSDs, FreeBSD then just copied that.

Ah OK, I was the onw who copied it :).

Perhaps it's good chance to also cleanup the Coverity issues (mostly cosmetic AFAICT).

I don't have access to that (or just don't know where to find it), but I'd love to look into fixing those.

You can trigger those in github through travis ci. But don't spend time on those... I am in the middle of another project but will start committing fixes soon and I will share the reports with you.

I fixed all return (var) and sizeof(var) issues, including reported ones, not reported ones, and the ones that had been there for ages.

io.c
536

Fixed.

lexi.c
137

Fixed.

591–618

Fixed.

602

Fixed.

620–622

Fixed.

I forgot the -U9999999 the last time I updated the diff.

In D6966#153099, @bapt wrote:

I also cleaned up NULL vs 0 for pointers rS303502. The revision may need some updates now but hopefully it shall be clearer to read.
I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary
Not sure if we want -sac (space after cast).
The "Consistently indent declarations." sounds good but I didn't look at it in detail.

I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary

They prevent triggerable undefined behavior; the code in some cases works only by accident. I don't see a reason not to commit those fixes.

Not sure if we want -sac (space after cast).

At some point in time FreeBSD's flavor of indent(1) got changed to never put a space after cast without reservation. It's a stylistic issue and many projects - including PostgreSQL - prefer otherwise, hence my addition of the option.

The "Consistently indent declarations." sounds good but I didn't look at it in detail.

It's the most broken thing in FreeBSD indent(1) as far as I can tell. Currently when it indents a declaration, it adds surplus indentation before commas, like here:
original:

{
  int a, b, c;
}

what it's supposed to be:

{
	int		a, b, c;
}

what indent(1) currently does:

{
	int		a         , b, c;
}
In D6966#153151, @email_piotr-stefaniak.me wrote:

I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary

They prevent triggerable undefined behavior; the code in some cases works only by accident. I don't see a reason not to commit those fixes.

I am not against committing them, but I would like to see if there are alternatives.

Not sure if we want -sac (space after cast).

At some point in time FreeBSD's flavor of indent(1) got changed to never put a space after cast without reservation. It's a stylistic issue and many projects - including PostgreSQL - prefer otherwise, hence my addition of the option.

Yes, they are against our our style(9), of course other code bases may find them useful.

The "Consistently indent declarations." sounds good but I didn't look at it in detail.

It's the most broken thing in FreeBSD indent(1) as far as I can tell.

Thanks. I am usually take changes by batches and I try to follow some order.

I just mentioned the above points in case another person feels like reviewing them and/or
grabbing where I left..

Fixed a few "style bugs".

jilles added inline comments.
indent_globs.h
68–79

If you're changing this code anyway, please also fix the undefined behaviour of using pointers derived from the original pointer after a successful call to realloc(). This can be done by calculating e_com-s_com and last_bl-s_com before calling realloc().

In D6966#153151, @email_piotr-stefaniak.me wrote:

The "Consistently indent declarations." sounds good but I didn't look at it in detail.

It's the most broken thing in FreeBSD indent(1) as far as I can tell. Currently when it indents a declaration, it adds surplus indentation before commas, like here:
original:

{
  int a, b, c;
}

what it's supposed to be:

{
	int		a, b, c;
}

what indent(1) currently does:

{
	int		a         , b, c;
}

Nice.. I just reproduced it, but I had to disable my defaults (from FreeBSD's examples).
Committed as rS303570 with changes due to length of lines, unnecessary brackets.

indent.c
724–728

line too long. for our style(9) there are several like this.

949

Unnecessary brackets for one statement (style).

955–956

This two line change is unnecessary.

1004–1008

line too long for our style(0)

In D6966#153160, @pfg wrote:
In D6966#153151, @email_piotr-stefaniak.me wrote:

I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary

They prevent triggerable undefined behavior; the code in some cases works only by accident. I don't see a reason not to commit those fixes.

I am not against committing them, but I would like to see if there are alternatives.

I meant I want to give a thought on preventing such situations in a cleaner way.

Not sure if we want -sac (space after cast).

At some point in time FreeBSD's flavor of indent(1) got changed to never put a space after cast without reservation. It's a stylistic issue and many projects - including PostgreSQL - prefer otherwise, hence my addition of the option.

Yes, they are against our our style(9), of course other code bases may find them useful.

I forgot to mention: such changes must be documented in the man page.

pstef updated this object.

Remove dead assignment as per clang-analyzer's complaint.
Rearrange option parsing code as per clang-analyzer's complaint about "dangling reference".
Avoid undefined behavior as per Jilles Tjoelker's complaint. Apply style(9) to the macro definitions while there.
Fix incompatible pointer type of sizeof's operand when operating on the object "typenames".

indent.c
1135

Not sure if I am missing something but I am getting these when I get to 01f36f4141c71754b3a73a91886fb425bab0df3e @pstef pstef committed on Jun 25

indent.c:1131:11: error: initializing 'char *' with an expression of type 'const char [8]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {7, "include"},
                        ^~~~~~~~~
indent.c:1132:11: error: initializing 'char *' with an expression of type 'const char [7]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {6, "define"},
                        ^~~~~~~~
indent.c:1133:11: error: initializing 'char *' with an expression of type 'const char [6]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {5, "undef"},
                        ^~~~~~~
indent.c:1134:11: error: initializing 'char *' with an expression of type 'const char [5]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {4, "line"},
indent.c
724–728

This appears to have been commited with the style improvements. Would you like me to fix it in the proposed patch as well or do you think it doesn't matter anymore?

1108

Fixed.

1135

I'm unable to reproduce this. Should be fixed by constifying char *string:

                struct directives {
                    int size;
-                   char *string;
+                   const char *string;
                }
indent_globs.h
68–79

Fixed.

indent.c
724–728

At this time I am using the github patches.

It would be better to rebase the differential review but wait until later today when I commit a couple more.

indent_globs.h
68–79

Looks good to me. The fifth realloc() call, in io.c, was already correct.

This revision was automatically updated to reflect the committed changes.

The enhancements have been really nice and welcome, THANKS!
I think I have committed all that was interesting for FreeBSD.

The remaining changes are either cosmetic or have no real effect: the style changes are not mandatory as long as the code has some style already. Maybe I missed something though, but perhaps it would be better to rebase any further enhancement on top of the latest version in FreeBSD.

Thanks again!

In D6966#154186, @pfg wrote:

The enhancements have been really nice and welcome, THANKS!

Thank you for guiding me through the process of getting this stuff committed.

The remaining changes are either cosmetic or have no real effect: the style changes are not mandatory as long as the code has some style already. Maybe I missed something though, but perhaps it would be better to rebase any further enhancement on top of the latest version in FreeBSD.

I would like to see the out of bound access bugs eventually fixed one way or the other, mostly because the bugs render address sanitizer useless in further development, unless those custom changes are applied. I understand your concerns over my primitive changes being not explained well and I'll see what I can do with that.

As a heads-up: I plan to do more work on various things in indent(1).

Thanks.