Page MenuHomeFreeBSD

diff: mmap the file to be diffed
Needs ReviewPublic

Authored by pstef on Jan 28 2021, 4:13 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 25 2023, 5:17 AM
Unknown Object (File)
Dec 22 2023, 8:27 AM
Unknown Object (File)
Nov 6 2023, 3:06 PM
Unknown Object (File)
Nov 5 2023, 9:55 AM
Unknown Object (File)
Nov 5 2023, 8:59 AM
Unknown Object (File)
Nov 5 2023, 4:22 AM
Unknown Object (File)
Nov 4 2023, 11:55 PM
Unknown Object (File)
Oct 31 2023, 9:09 PM
Subscribers

Details

Summary

mmap the file to be diffed to prepare the land for modification of
the code to use mmap.

First component to be converted fo mmap: asciifile function to detect
if the file is a binary file or not.

diff: convert the hash to mmap

diff: Simplify the file comparison with mmap

diff: convert the check function to use mmap

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 41226
Build 38115: arc lint + arc unit

Event Timeline

bapt requested review of this revision.Jan 28 2021, 4:13 PM
  • diff: mmap the file to be diffed
  • diff: convert the hash to mmap
  • diff: Simplify the file comparison with mmap
  • diff: convert the check function to use mmap

extending the list of potential reviewers, just in case ;)

usr.bin/diff/diffreg.c
350

Are CAP_READ and CAP_SEEK still needed?

452

Is it possible to use const uint8_t * in this function and others to make it more clear that these functions are not operating on nul-terminated strings (in the D_BINARY case at least)?

893

Can memchr() be used?

  • diff: mmap the file to be diffed
  • diff: convert the hash to mmap
  • diff: Simplify the file comparison with mmap
  • diff: convert the check function to use mmap
  • diff: use uint8_t instead of char
  • diff: use memchr to find end of line
bapt marked 2 inline comments as done.Mar 29 2021, 9:38 PM
bapt added inline comments.
usr.bin/diff/diffreg.c
350

yes not everything is done yet on the mmap, there are still rewind and fread operations in the output function, I may switch it do mmap later

usr.bin/diff/diffreg.c
747

If we have buf2 == end2 then this dereference may not be safe, I think.

759

Similarly here you need an extra check to handle the EOF case. It looks like there are several other places below. Am I missing something?

812

Same problem here.

  • diff: add guards against dereferencements
bapt marked 3 inline comments as done.Mar 30 2021, 4:36 AM
bapt marked an inline comment as done.Mar 30 2021, 4:37 AM
usr.bin/diff/diffreg.c
711

It can just be a function, IMO.

761

Per style.9 the && and || should be before the line breaks.

804

Not sure what the XXX refers to exactly, but this is not the same logic as before. If one of c and d is EOF, but not both, then chrtran(c) != chrtran(d) is true.

812

Same problem. This does not behave the same way as before if we are at EOF on one file but not both.

bapt marked 2 inline comments as done.

Reduce the scope of the change to only readhash and files_differs skip check()
for now, as it is more complex and does not really bring any performance
improvement

  • diff: mmap the files before actually trying to look up for differencies
  • diff: simplify file comparison now we have the files already mmaped

Could we make this (mmap) into an option?

usr.bin/diff/diffreg.c
375

files_differ() returns either 0 or 1, so the code under default: will be never reached, I think.

510

I would rename this buf to walk and remove the object named walk below. It's a copy anyway.

Could we make this (mmap) into an option?

What would be the point for it to be an option? I think if it is safe and faster we should just use that code, instead of complexifying diff code with 2 differents code path

In D28402#714642, @bapt wrote:

What would be the point for it to be an option? I think if it is safe and faster we should just use that code, instead of complexifying diff code with 2 differents code path

mmap is problematic whenever any of the files can be truncated. But I don't insist we make this an option, I just wanted to have it pointed out during review.

bapt marked an inline comment as done.

Addresse @pstef issues

  • diff: mmap the files before actually trying to look up for differencies
  • diff: simplify file comparison now we have the files already mmaped
pstef edited reviewers, added: bapt; removed: pstef.

Swim or surrender!