Page MenuHomeFreeBSD

diff: mmap the file to be diffed
Needs ReviewPublic

Authored by pstef on Jan 28 2021, 4:13 PM.
Tags
None
Referenced Files
F84256686: D28402.diff
Tue, May 21, 1:19 PM
Unknown Object (File)
Sat, May 18, 9:42 PM
Unknown Object (File)
Tue, May 14, 2:22 PM
Unknown Object (File)
Dec 25 2023, 5:17 AM
Unknown Object (File)
Dec 22 2023, 8:27 AM
Unknown Object (File)
Nov 6 2023, 3:06 PM
Unknown Object (File)
Nov 5 2023, 9:55 AM
Unknown Object (File)
Nov 5 2023, 8:59 AM
Subscribers

Details

Summary

mmap the file to be diffed to prepare the land for modification of
the code to use mmap.

First component to be converted fo mmap: asciifile function to detect
if the file is a binary file or not.

diff: convert the hash to mmap

diff: Simplify the file comparison with mmap

diff: convert the check function to use mmap

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 38173
Build 35062: arc lint + arc unit

Event Timeline

bapt requested review of this revision.Jan 28 2021, 4:13 PM
  • diff: mmap the file to be diffed
  • diff: convert the hash to mmap
  • diff: Simplify the file comparison with mmap
  • diff: convert the check function to use mmap

extending the list of potential reviewers, just in case ;)

usr.bin/diff/diffreg.c
348

Are CAP_READ and CAP_SEEK still needed?

459

Is it possible to use const uint8_t * in this function and others to make it more clear that these functions are not operating on nul-terminated strings (in the D_BINARY case at least)?

901

Can memchr() be used?

  • diff: mmap the file to be diffed
  • diff: convert the hash to mmap
  • diff: Simplify the file comparison with mmap
  • diff: convert the check function to use mmap
  • diff: use uint8_t instead of char
  • diff: use memchr to find end of line
bapt marked 2 inline comments as done.Mar 29 2021, 9:38 PM
bapt added inline comments.
usr.bin/diff/diffreg.c
348

yes not everything is done yet on the mmap, there are still rewind and fread operations in the output function, I may switch it do mmap later

usr.bin/diff/diffreg.c
746

If we have buf2 == end2 then this dereference may not be safe, I think.

756–760

Similarly here you need an extra check to handle the EOF case. It looks like there are several other places below. Am I missing something?

816–817

Same problem here.

  • diff: add guards against dereferencements
bapt marked 3 inline comments as done.Mar 30 2021, 4:36 AM
bapt marked an inline comment as done.Mar 30 2021, 4:37 AM
usr.bin/diff/diffreg.c
710

It can just be a function, IMO.

758

Per style.9 the && and || should be before the line breaks.

802

Not sure what the XXX refers to exactly, but this is not the same logic as before. If one of c and d is EOF, but not both, then chrtran(c) != chrtran(d) is true.

816–817

Same problem. This does not behave the same way as before if we are at EOF on one file but not both.

bapt marked 2 inline comments as done.

Reduce the scope of the change to only readhash and files_differs skip check()
for now, as it is more complex and does not really bring any performance
improvement

  • diff: mmap the files before actually trying to look up for differencies
  • diff: simplify file comparison now we have the files already mmaped

Could we make this (mmap) into an option?

usr.bin/diff/diffreg.c
375

files_differ() returns either 0 or 1, so the code under default: will be never reached, I think.

517

I would rename this buf to walk and remove the object named walk below. It's a copy anyway.

Could we make this (mmap) into an option?

What would be the point for it to be an option? I think if it is safe and faster we should just use that code, instead of complexifying diff code with 2 differents code path

In D28402#714642, @bapt wrote:

What would be the point for it to be an option? I think if it is safe and faster we should just use that code, instead of complexifying diff code with 2 differents code path

mmap is problematic whenever any of the files can be truncated. But I don't insist we make this an option, I just wanted to have it pointed out during review.

bapt marked an inline comment as done.

Addresse @pstef issues

  • diff: mmap the files before actually trying to look up for differencies
  • diff: simplify file comparison now we have the files already mmaped
pstef edited reviewers, added: bapt; removed: pstef.

Swim or surrender!