Add sponge(1)

Authored by eadler on Nov 1 2017, 7:55 PM.


Group Reviewers
rS326555: sponge(1) Minor commit for commit log

This adds a naive implementation of sponge(1) to base.

sponge(1) is useful in pipelines to write to files read earlier without
setting up a temporary file inline

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
There are a very large number of changes, so older changes are hidden. Show Older Changes
eadler updated this revision to Diff 34626.Nov 1 2017, 7:56 PM

add default

imp added a comment.Nov 1 2017, 10:06 PM

I like it. This looks great, but I'm concerned about performance for larger files since realloc copies data and can wind up with a fragmented heap that might artificially limit what you can do. Simple lists of ranges would suffice to avoid these copies without greatly increasing the complexity of the code.

48 ↗(On Diff #34626)

It might be useful to specify what happens when memory can't be allocated for the entire file.

20 ↗(On Diff #34626)

so if we fail to allocate, no output is produced. If so, this is OK.

88 ↗(On Diff #34626)

why *4? This artificially may limit the file size as we approach the VM limits. *2 will result in more allocation but will let us get much closer to the limit without going nuts.

Also, this strategy is great for small files, but sucks for large files since we call realloc, which could be copying a crapton of data to keep the output code simple.

110 ↗(On Diff #34626)

Why mix raw access and stdio? Seems like we could easily use all one or the other.

117 ↗(On Diff #34626)

There's no harm in always closing the output, we're about to exit anyway.

eadler marked 3 inline comments as done.Nov 2 2017, 4:50 AM

Please wait for the next update and then review

110 ↗(On Diff #34626)

because I was using printf for debugging and Just Worked. replacing with writev to avoid the reallocf problem you mentioned

117 ↗(On Diff #34626)

true, but I dislike it, and it makes debugging annoying.

bcr added a subscriber: bcr.Nov 2 2017, 8:09 AM

Hi Eitan,

how about adding a small EXAMPLE section to the man page to show usage?

danfe added a subscriber: danfe.Nov 2 2017, 3:51 PM
danfe added inline comments.
20 ↗(On Diff #34654)

style(9) violation: should be static void *\nsafe_malloc.

27 ↗(On Diff #34654)

return (ret);

30 ↗(On Diff #34654)


37 ↗(On Diff #34654)


47 ↗(On Diff #34654)


51 ↗(On Diff #34654)

Move { to the next line.

52 ↗(On Diff #34654)

Usage is typically sent to stderr.

56 ↗(On Diff #34654)


65 ↗(On Diff #34654)

Local variables are utterly unsorted. See style(9).

mjoras added a subscriber: mjoras.Nov 2 2017, 9:22 PM
eadler updated this revision to Diff 34701.Nov 3 2017, 4:39 AM
eadler marked 13 inline comments as done.

Attempt style(9)

eadler updated this revision to Diff 34702.Nov 3 2017, 4:42 AM

Fix history

jilles added a subscriber: jilles.Nov 3 2017, 3:31 PM

An alternative implementation would write to a temporary file, then rename onto the destination once stdin closes. This is how ksh93's >;pathname redirection works (this does not mean I think this definitely should be done like that, or that the shell is the best place to implement this feature).

26 ↗(On Diff #34702)

Convention seems to be not to add the leading zero here.

53 ↗(On Diff #34702)

We should perhaps add a warning that the file will be written even if earlier components of the pipeline failed.

15 ↗(On Diff #34702)

whitespace style: static void *safe_malloc...

77 ↗(On Diff #34702)

style: spaces

79 ↗(On Diff #34702)

-h for help is an unusual convention. It is very common for -h to mean something else, such as affecting a symlink instead of its target or making output more human-readable.

116 ↗(On Diff #34702)

This memory is leaked (never used).

132 ↗(On Diff #34702)

You can use err(3) here and elsewhere to simplify the code a bit.

136 ↗(On Diff #34702)

If there are too many iovecs (sysconf(_SC_IOV_MAX), 16 MB with 16 KB buffers on FreeBSD), writev() will do nothing and return the error [EINVAL]. To avoid this, the write should be split up.

If sysconf(_SC_IOV_MAX) returns -1 or if you want to simplify the code, use the constant _XOPEN_IOV_MAX. This may lead to more system calls, but that may not be a problem.

143 ↗(On Diff #34702)

The error from close() should be checked.

danfe added a comment.Nov 3 2017, 3:33 PM

Some more style(9) issues...

19 ↗(On Diff #34702)

Per our style(9), * should be part of the variable, not the type. Also applies to prototypes above.

22 ↗(On Diff #34702)

Local variables are typically separated from the rest of of the function with extra linefeed.

28 ↗(On Diff #34702)

Still not done.

34 ↗(On Diff #34702)


40 ↗(On Diff #34702)

Again, return (ret);.

43 ↗(On Diff #34702)

Again, should be static void * (note the space).

46 ↗(On Diff #34702)

Missing linefeed.

52 ↗(On Diff #34702)

Again, return (ret);.

98 ↗(On Diff #34702)

Per style(9), infinite loops are spelled as for (;;) in FreeBSD.

eadler updated this revision to Diff 34767.Nov 4 2017, 6:13 PM
eadler marked an inline comment as done.

almost there...

53 ↗(On Diff #34702)

Its the same for all unix commands, but its likely worth adding a reminder here.

79 ↗(On Diff #34702)

It is annoying to me when utilities use -h for anything other than help.

136 ↗(On Diff #34702)

this is ridiculous. Will fix in the next iteration.

wblock added a subscriber: wblock.Nov 17 2017, 8:18 PM
wblock added inline comments.
40 ↗(On Diff #34767)


utility reads standard in until complete, then opens
​the output file and writes to it.
41 ↗(On Diff #34767)

I would say "useful" instead of "valuable".

42 ↗(On Diff #34767)
This makes it useful in pipelines that read a file and then write to it.
43 ↗(On Diff #34767)

s/The following/These/

46 ↗(On Diff #34767)


51 ↗(On Diff #34767)

s/In the event that/If/
and this needs a comma and some other stuff. I doubt you are referring to a specific malloc, but rather to memory allocation in general. So:

If an attempt to allocate memory fails,
54 ↗(On Diff #34767)

s/will be/is/

54 ↗(On Diff #34767)

Seems like this ought to be in a BUGS section at the end.

55 ↗(On Diff #34767)


67 ↗(On Diff #34767)

This should use double quotes. Too bad we don't have a macro for air quotes, because that's pretty much what this is.

68 ↗(On Diff #34767)
.Cm sort file | sponge file
eadler updated this revision to Diff 36074.Dec 1 2017, 10:29 PM

handle too large bufs...

eadler updated this revision to Diff 36083.Dec 2 2017, 1:01 AM
eadler marked 30 inline comments as done.


eadler added inline comments.Dec 2 2017, 1:01 AM
54 ↗(On Diff #34767)

this isn't a bug. Its statement on how shells work.

eadler marked 2 inline comments as done.Dec 2 2017, 1:01 AM
eadler updated this revision to Diff 36115.Dec 2 2017, 5:35 PM

fix typo

jilles added inline comments.Dec 3 2017, 5:02 PM
53 ↗(On Diff #36115)

missing space in iswritten

136 ↗(On Diff #34702)

Ok, but to be portable you still need something like

if (maxiovec == -1)
    maxiovec =  _XOPEN_IOV_MAX;

since I expect haters of fixed limits to return -1 from sysconf(_SC_IOV_MAX).

eadler updated this revision to Diff 36160.Dec 3 2017, 7:46 PM
eadler marked an inline comment as done.

last iteration - I hope!

eadler updated this revision to Diff 36167.Dec 3 2017, 8:48 PM
eadler marked an inline comment as done.

add unit test

eadler updated this revision to Diff 36168.Dec 3 2017, 8:49 PM

add unit test

eadler updated this revision to Diff 36169.Dec 3 2017, 9:03 PM

fix copyright

This revision was automatically updated to reflect the committed changes.