Page MenuHomeFreeBSD

Multiple improvements of indent(1)
ClosedPublic

Authored by pstef on Jun 25 2016, 10:51 PM.

Details

Summary

Postgres developers use an older fork of indent(1) to indent their sources. I took FreeBSD's indent(1) and made the changes listed below, in hope that: 1) Postgres could use it and 2) it would be possible to push as many of these as possible upstream.

r301513[bugfix]Fix typo in keyword "typedef".
r303482[bugfix]Avoid out of bound access of array codebuf pointed into by s_code.
[bugfix]Avoid out of bound access of array ps.paren_indents.
[bugfix]Avoid out of bound access of array in_buffer.
r303483[bugfix]Avoid potential use-after-free.
r303484[bugfix]Fix breakage caused by single comment following "else".
[bugfix]Avoid out of bound access of array ps.p_stack.
r303485[bugfix]Semicolons inside struct declarations don't end the declarations.
r303499[bugfix]Support "f" and "F" floating constant suffixes.
r303489[bugfix]Removed whitespace shouldn't be considered in column calculations.
r303571[bugfix]Bail out if there's no more space on the parser stack.
r303570[bugfix]Consistently indent declarations.
r303718[bugfix]Don't ignore the fact that offsetof is a keyword.
[cleanup]Make definition of opchar conditional.
r303588[cleanup]Remove dead code relating to unix-style comments.
r303596[cleanup]Simplify pr_comment().
r303596[cleanup]Deduplicate code that decided if a comment needs a blank line at the top.
r303597[bugfix]Fix wrapping of some lines in comments.
r303598[bugfix]Untangle the connection between pr_comment.c and io.c, fixing at least two bugs.
r303599[bugfix]Don't newline on cpp lines like #endif unless -bacc is on.
r303735[feature]Add -sac (space after cast) and -nsac options.
r303735[feature]Add -U option for providing a file containing list of types.
r303746[optimization]Use bsearch() for looking up type keywords.
[style]return (var).
[style]sizeof(var).
r303600[cleanup]replace function call to bzero with memset.
[cleanup]Remove dead assignment.
r303601[cleanup]Rearrange option parsing code a bit to avoid complaints from static analyzers.
[style]Conform to FreeBSD's style(9).
r303629[bugfix]Avoid using values of pointers that refer to deallocated space.

I do possess separate patch files for when they are needed. My work on indent(1) is also available at https://github.com/pstef/freebsd_indent/commits/pass1

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pstef updated this revision to Diff 17885.Jun 25 2016, 10:51 PM
pstef retitled this revision from to Multiple improvements of indent(1).
pstef updated this object.
pstef edited the test plan for this revision. (Show Details)
adrian added a subscriber: adrian.Jun 26 2016, 4:37 AM

hi,

this /looks/ fine for me, but it's a bit late to get it into 11.0-RELEASE. I can look to commit it after the stable/11 branch tag is laid.

How's that sound?

pstef added a comment.Jun 26 2016, 4:57 AM

hi,
this /looks/ fine for me, but it's a bit late to get it into 11.0-RELEASE. I can look to commit it after the stable/11 branch tag is laid.
How's that sound?

I'll be glad if we could push this sooner or later, so even "later" is cool to me.

pstef updated this revision to Diff 17905.Jun 26 2016, 9:46 PM

A minor fix for a bug I've introduced.

pstef updated this revision to Diff 18495.Jul 17 2016, 5:24 PM

Add two fixes:

[bugfix] Fix breakage caused by single comment following "else"
[bugfix] Avoid out of bound access of array ps.p_stack
pstef updated this object.Jul 17 2016, 5:28 PM

Found a few potential things just glancing at this.

args.c
356 ↗(On Diff #18495)

fgets does not guarantee that there is trailing whitespace. In case there is no trailing whitespace at EOF, this will set line[0] = '\0';

indent.c
595 ↗(On Diff #18495)

Please use snprintf to prevent buffer overflows.

pr_comment.c
205 ↗(On Diff #18495)

Does this need a bounds check to make sure that buf_ptr is always within the bounds of the buffer (prevent out-of-bound read)?

314 ↗(On Diff #18495)

Are commas here valid C syntax? Should they be semicolons?

pstef added a comment.Jul 27 2016, 9:24 PM

Answer lattera's comments.

args.c
356 ↗(On Diff #18495)

Why would that be a problem?

indent.c
595 ↗(On Diff #18495)

I don't intend to change this fragment, at least not yet, and it has been like this for years. None of my patches address potentially unsafe practices in general, like replacing sprintf() with snprintf() as you suggest.

It's shown in the diff as changed merely because of whitespace change.

pr_comment.c
205 ↗(On Diff #18495)

Perhaps, but again - it has been like that for years and I haven't changed it in any of the proposed patches of mine.

Not to say it doesn't need fixing, but that's perhaps for another patch.

314 ↗(On Diff #18495)

Yes, they are. A more canonical way of expressing code doing effectively the same would be:

if (!ps.box_com && star_comment_cont) {
  *e_com++ = ' ';
  *e_com++ = '*';
  *e_com++ = ' ';
}

But I find it visually inferior to the one-liner from my patch.

If someone is to propose a patch that does the latter instead of the former consistently, I won't oppose.

bapt added a subscriber: bapt.Jul 27 2016, 9:26 PM

I will try to have a deeper look/test in the next days

args.c
354 ↗(On Diff #18495)

why not using getline(3) here to avoid having a limitation on the length of the line? which will also address the @lattera-gmail.com problem below

indent.c
1108 ↗(On Diff #18495)

What is the point of changing that if and not keeping the original form ?

io.c
536 ↗(On Diff #18495)

style: missing parenthesis?

lexi.c
137 ↗(On Diff #18495)

same style issue?

594 ↗(On Diff #18495)

style bug: sizeof()

605 ↗(On Diff #18495)

style bug: sizeof()

623 ↗(On Diff #18495)

sizeof()

pr_comment.c
281 ↗(On Diff #18495)

style extract return (repeated in other part of the code)

bapt added a reviewer: bapt.Jul 27 2016, 9:26 PM
args.c
356 ↗(On Diff #18495)

It'd be a problem because if:

char *copy;
char *line = "some data that should be parsed here but isn't due to no trailing whitespace";
size_t lastchar = 0; /* strcspn(line, " \t\n\r") will return 0 */
line[lastchar] = '\0';
copy = strdup(line); add_typename(copy);

The copy variable ends up containing an empty string, even though there's data to be parsed.

Since my comment was based on a glance, not on a full and careful review, it's likely that this piece of data is getting parsed for a reason. Why parse it if you don't care it's an empty string, even when it's not?

bapt edited edge metadata.Jul 27 2016, 9:46 PM

ok I have to admit the original does not respect style(9) either

pstef added a comment.Jul 27 2016, 9:56 PM

Address comments.

args.c
354 ↗(On Diff #18495)

No specific reason, that's just the way it had been done for the Postgres fork and I didn't bother to improve it as you suggest.

356 ↗(On Diff #18495)

strcspn("some data that should be parsed here but isn't due to no trailing whitespace", " \t\n\r") will return 4, not 0. 4 is the length of the segment of the first string which consists entirely of characters not from the second string.

I argue that truncation here is not doing any harm, bacause typedef names cannot include any of the characters from the second string.

indent.c
1108 ↗(On Diff #18495)

I presume you mean the change from strncmp() == 0 to !strncmp(). I did that to make the line shorter so that I could fit the comment that I added on that line. Initially the comment was a bit longer and didn't fit in 80 columns.

io.c
536 ↗(On Diff #18495)

I will add the parentheses if you insist, but I'm still to learn what their purpose is. Also note that the original code didn't use parenthesis around returns' arguments consistently, either.

lexi.c
594 ↗(On Diff #18495)

If you insist.

pr_comment.c
281 ↗(On Diff #18495)

I don't understand this comment.

args.c
356 ↗(On Diff #18495)

After re-reading the manpage for strcspn, it looks like I missed an important detail: "it computes the string array index of the first character of s which is also in charset, else the index of the first null character." So if that trailing whitespace isn't found, then it'll reduce it to line[strlen(line)], in which case, you'll be fine. So disregard this set of comments.

bapt added a reviewer: pfg.Jul 27 2016, 10:32 PM
bapt added a comment.Jul 27 2016, 10:35 PM

I added @pfg who might also be interested

@pfg if working on this please note that on the github he listen, the changes are atomically committed, which may easier the process of incomporating the changes, in particular the bug fix one and would once in make this review easier to focus on the features.

emaste added a subscriber: emaste.Jul 27 2016, 10:44 PM
pfg edited edge metadata.Jul 28 2016, 12:33 AM

Very interesting: All the BSDs fixed the "typdef" typo but indent(1) generally needs much more attention. Plus the clang-format tool doesn't accept BSD style.

I agree it's a bit late for 11 and it is easier to take the changes from github.
Perhaps it's good chance to also cleanup the Coverity issues (mostly cosmetic AFAICT).

pstef added a comment.EditedJul 28 2016, 3:27 PM
In D6966#152751, @pfg wrote:

Very interesting: All the BSDs fixed the "typdef" typo but indent(1) generally needs much more attention.

That was when I mocked decades-old code for having a typo that nobody seems to have noticed all that time. Some NetBSD and OpenBSD people with commit bits noticed that and fixed it in their BSDs, FreeBSD then just copied that.

Perhaps it's good chance to also cleanup the Coverity issues (mostly cosmetic AFAICT).

I don't have access to that (or just don't know where to find it), but I'd love to look into fixing those.

pstef updated this revision to Diff 18851.Jul 28 2016, 3:39 PM
pstef updated this object.
pstef edited edge metadata.
pfg added a comment.Jul 28 2016, 3:48 PM
In D6966#152867, @email_piotr-stefaniak.me wrote:
In D6966#152751, @pfg wrote:

Very interesting: All the BSDs fixed the "typdef" typo but indent(1) generally needs much more attention.

That was when I mocked decades-old code for having a typo that nobody seems to have noticed all that time. Some NetBSD and OpenBSD people with commit bits noticed that and fixed it in their BSDs, FreeBSD then just copied that.

Ah OK, I was the onw who copied it :).

Perhaps it's good chance to also cleanup the Coverity issues (mostly cosmetic AFAICT).

I don't have access to that (or just don't know where to find it), but I'd love to look into fixing those.

You can trigger those in github through travis ci. But don't spend time on those... I am in the middle of another project but will start committing fixes soon and I will share the reports with you.

pstef added a comment.Jul 28 2016, 3:49 PM

I fixed all return (var) and sizeof(var) issues, including reported ones, not reported ones, and the ones that had been there for ages.

io.c
536 ↗(On Diff #18495)

Fixed.

lexi.c
137 ↗(On Diff #18495)

Fixed.

594 ↗(On Diff #18495)

Fixed.

605 ↗(On Diff #18495)

Fixed.

623 ↗(On Diff #18495)

Fixed.

pstef updated this revision to Diff 18853.Jul 28 2016, 4:58 PM

I forgot the -U9999999 the last time I updated the diff.

pfg added a comment.Jul 29 2016, 7:49 PM
In D6966#153099, @bapt wrote:

I also cleaned up NULL vs 0 for pointers rS303502. The revision may need some updates now but hopefully it shall be clearer to read.
I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary
Not sure if we want -sac (space after cast).
The "Consistently indent declarations." sounds good but I didn't look at it in detail.

pstef added a comment.Jul 29 2016, 9:26 PM

I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary

They prevent triggerable undefined behavior; the code in some cases works only by accident. I don't see a reason not to commit those fixes.

Not sure if we want -sac (space after cast).

At some point in time FreeBSD's flavor of indent(1) got changed to never put a space after cast without reservation. It's a stylistic issue and many projects - including PostgreSQL - prefer otherwise, hence my addition of the option.

The "Consistently indent declarations." sounds good but I didn't look at it in detail.

It's the most broken thing in FreeBSD indent(1) as far as I can tell. Currently when it indents a declaration, it adds surplus indentation before commas, like here:
original:

{
  int a, b, c;
}

what it's supposed to be:

{
	int		a, b, c;
}

what indent(1) currently does:

{
	int		a         , b, c;
}
pfg added a comment.Jul 29 2016, 9:41 PM
In D6966#153151, @email_piotr-stefaniak.me wrote:

I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary

They prevent triggerable undefined behavior; the code in some cases works only by accident. I don't see a reason not to commit those fixes.

I am not against committing them, but I would like to see if there are alternatives.

Not sure if we want -sac (space after cast).

At some point in time FreeBSD's flavor of indent(1) got changed to never put a space after cast without reservation. It's a stylistic issue and many projects - including PostgreSQL - prefer otherwise, hence my addition of the option.

Yes, they are against our our style(9), of course other code bases may find them useful.

The "Consistently indent declarations." sounds good but I didn't look at it in detail.

It's the most broken thing in FreeBSD indent(1) as far as I can tell.

Thanks. I am usually take changes by batches and I try to follow some order.

I just mentioned the above points in case another person feels like reviewing them and/or
grabbing where I left..

pstef updated this revision to Diff 18896.Jul 30 2016, 7:31 AM

Fixed a few "style bugs".

jilles added a subscriber: jilles.Jul 30 2016, 11:27 AM
jilles added inline comments.
indent_globs.h
76 ↗(On Diff #18896)

If you're changing this code anyway, please also fix the undefined behaviour of using pointers derived from the original pointer after a successful call to realloc(). This can be done by calculating e_com-s_com and last_bl-s_com before calling realloc().

pfg added a comment.EditedJul 31 2016, 4:14 AM
In D6966#153151, @email_piotr-stefaniak.me wrote:

The "Consistently indent declarations." sounds good but I didn't look at it in detail.

It's the most broken thing in FreeBSD indent(1) as far as I can tell. Currently when it indents a declaration, it adds surplus indentation before commas, like here:
original:

{
  int a, b, c;
}

what it's supposed to be:

{
	int		a, b, c;
}

what indent(1) currently does:

{
	int		a         , b, c;
}

Nice.. I just reproduced it, but I had to disable my defaults (from FreeBSD's examples).
Committed as rS303570 with changes due to length of lines, unnecessary brackets.

indent.c
724 ↗(On Diff #18896)

line too long. for our style(9) there are several like this.

949 ↗(On Diff #18896)

Unnecessary brackets for one statement (style).

955 ↗(On Diff #18896)

This two line change is unnecessary.

1004 ↗(On Diff #18896)

line too long for our style(0)

pfg added a comment.Jul 31 2016, 4:19 AM
In D6966#153160, @pfg wrote:
In D6966#153151, @email_piotr-stefaniak.me wrote:

I left out the array bound checks for now: they wouldn't hurt but I am not sure they are necessary

They prevent triggerable undefined behavior; the code in some cases works only by accident. I don't see a reason not to commit those fixes.

I am not against committing them, but I would like to see if there are alternatives.

I meant I want to give a thought on preventing such situations in a cleaner way.

Not sure if we want -sac (space after cast).

At some point in time FreeBSD's flavor of indent(1) got changed to never put a space after cast without reservation. It's a stylistic issue and many projects - including PostgreSQL - prefer otherwise, hence my addition of the option.

Yes, they are against our our style(9), of course other code bases may find them useful.

I forgot to mention: such changes must be documented in the man page.

pstef updated this revision to Diff 18919.Jul 31 2016, 2:36 PM
pstef updated this object.

Remove dead assignment as per clang-analyzer's complaint.
Rearrange option parsing code as per clang-analyzer's complaint about "dangling reference".
Avoid undefined behavior as per Jilles Tjoelker's complaint. Apply style(9) to the macro definitions while there.
Fix incompatible pointer type of sizeof's operand when operating on the object "typenames".

pfg added inline comments.Jul 31 2016, 3:39 PM
indent.c
1135 ↗(On Diff #18919)

Not sure if I am missing something but I am getting these when I get to 01f36f4141c71754b3a73a91886fb425bab0df3e @pstef pstef committed on Jun 25

indent.c:1131:11: error: initializing 'char *' with an expression of type 'const char [8]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {7, "include"},
                        ^~~~~~~~~
indent.c:1132:11: error: initializing 'char *' with an expression of type 'const char [7]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {6, "define"},
                        ^~~~~~~~
indent.c:1133:11: error: initializing 'char *' with an expression of type 'const char [6]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {5, "undef"},
                        ^~~~~~~
indent.c:1134:11: error: initializing 'char *' with an expression of type 'const char [5]' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                    {4, "line"},
pstef added inline comments.Jul 31 2016, 4:40 PM
indent.c
724 ↗(On Diff #18896)

This appears to have been commited with the style improvements. Would you like me to fix it in the proposed patch as well or do you think it doesn't matter anymore?

1135 ↗(On Diff #18919)

I'm unable to reproduce this. Should be fixed by constifying char *string:

                struct directives {
                    int size;
-                   char *string;
+                   const char *string;
                }
1108 ↗(On Diff #18495)

Fixed.

indent_globs.h
76 ↗(On Diff #18896)

Fixed.

pfg added inline comments.Jul 31 2016, 6:19 PM
indent.c
724 ↗(On Diff #18896)

At this time I am using the github patches.

It would be better to rebase the differential review but wait until later today when I commit a couple more.

jilles added inline comments.Jul 31 2016, 9:11 PM
indent_globs.h
76 ↗(On Diff #18896)

Looks good to me. The fifth realloc() call, in io.c, was already correct.

pstef updated this object.Aug 2 2016, 5:15 PM
This revision was automatically updated to reflect the committed changes.
pfg added a comment.Aug 4 2016, 3:34 PM

The enhancements have been really nice and welcome, THANKS!
I think I have committed all that was interesting for FreeBSD.

The remaining changes are either cosmetic or have no real effect: the style changes are not mandatory as long as the code has some style already. Maybe I missed something though, but perhaps it would be better to rebase any further enhancement on top of the latest version in FreeBSD.

Thanks again!

pstef updated this object.Aug 4 2016, 4:06 PM
pstef added a comment.Aug 4 2016, 4:18 PM
In D6966#154186, @pfg wrote:

The enhancements have been really nice and welcome, THANKS!

Thank you for guiding me through the process of getting this stuff committed.

The remaining changes are either cosmetic or have no real effect: the style changes are not mandatory as long as the code has some style already. Maybe I missed something though, but perhaps it would be better to rebase any further enhancement on top of the latest version in FreeBSD.

I would like to see the out of bound access bugs eventually fixed one way or the other, mostly because the bugs render address sanitizer useless in further development, unless those custom changes are applied. I understand your concerns over my primitive changes being not explained well and I'll see what I can do with that.

As a heads-up: I plan to do more work on various things in indent(1).

Thanks.

pstef updated this object.Aug 6 2016, 7:53 PM