The locate statistics output needs a rework. It is not user friendly.
Needs Review, Public

Authored by wosch on Feb 17 2022, 6:48 PM.
Tags
None

Details

Reviewers
jrtc27
Summary

Improve the locate statistics

  • show the last update date and how many days ago it was
  • a more readable compression factor instead of a percentage
  • improve wording (8-Bit -> nonprintable ASCII)
  • refactor statistics into a new file

e.g.:
old:

$ locate -S
Database: /var/db/locate.database
Compression: Front: 9.96%, Bigram: 65.01%, Total: 7.38%
Filenames: 29910923, Characters: 2902743213, Database size: 214123631
Bigram characters: 74925080, Integers: 716236, 8-Bit characters: 4
Longest path: 223

new:
$ locate -S
Database: /var/db/locate.database
Last database update: Sat Feb  5 07:48:15 2022 (4.4 days ago)
Database size: 213401959 bytes
Filenames: 29820255
Longest pathname: 223 bytes
Compression factor: front: 10.0x, bigrams: 1.4x, total: 13.6x
Filenames size: 2894961156 bytes
Data types: int: 712776, bigrams: 74761507, nonprintable ASCII: 4
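
The "(4.4 days ago)" suffix in the new output can be derived from the database's modification time. A minimal sketch, assuming the age is simply the elapsed seconds divided by 86400; the helper names days_since and format_last_update are hypothetical and not taken from the patch:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: age of a timestamp in fractional days. */
double
days_since(time_t then, time_t now)
{
	return (difftime(now, then) / 86400.0);
}

/*
 * Hypothetical helper: build a line shaped like the one in the sample
 * output. The strftime() pattern mirrors the ctime(3) layout without
 * the trailing newline. Returns the snprintf() result, or -1 on error.
 */
int
format_last_update(char *buf, size_t len, time_t mtime, time_t now)
{
	char stamp[64];
	struct tm *tm;

	tm = localtime(&mtime);
	if (tm == NULL)
		return (-1);
	if (strftime(stamp, sizeof(stamp), "%a %b %e %H:%M:%S %Y", tm) == 0)
		return (-1);
	return (snprintf(buf, len, "Last database update: %s (%.1f days ago)",
	    stamp, days_since(mtime, now)));
}
```

In the utility itself, mtime would presumably come from the st_mtime field that stat(2) reports for the database file; that is an assumption here, not something the summary states.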

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

wosch requested review of this revision. Feb 17 2022, 6:48 PM
wosch added a reviewer: jrtc27.

Splitting it out into its own file as part of this change makes it quite hard to see what the actual diff is. Also, as you're touching the code, please de-K&R it (and fix the style so there's no space before the ( in function definitions).

It was a mistake to have the statistic function in fastfind.c from the beginning - my bad. Given that the function is now mostly rewritten, I took the opportunity to put it in the new file statistic.c.

I get the "no space before the ( in function definitions", but what do you mean by "de-K&R it"?
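
For context, "de-K&R" usually means converting old-style (K&R) function definitions, where parameter types are declared between the parameter list and the body, into ANSI C definitions with the types inside the parentheses. A minimal sketch using a hypothetical function, count_slashes, which is not taken from locate:

```c
/*
 * K&R (pre-ANSI) definition, shown only as a comment because newer
 * compilers warn about or reject it. Note also the space before "(".
 *
 *	int
 *	count_slashes (path)
 *		char *path;
 *	{
 *		...
 *	}
 */

/* ANSI C definition: types in the parameter list, no space before "(". */
int
count_slashes(const char *path)
{
	int n = 0;

	for (; *path != '\0'; path++)
		if (*path == '/')
			n++;
	return (n);
}
```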

Thanks for the email poke.

I think -S needs a description of the field meanings in the manual page.

Compression factor: front: 10.0x, bigrams: 1.4x, total: 13.6x

Should be "overall", not "total", otherwise you're claiming 10.0 + 1.4 = 13.6.
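
The sample figures are consistent with the two stages multiplying rather than adding. The sketch below is an assumption that happens to reproduce the sample output, not the formulas from the patch: it treats every bigram code as one stored byte standing for two characters, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the counts printed by "locate -S". */
struct locate_counts {
	uint64_t names_bytes;	/* "Filenames size" */
	uint64_t db_bytes;	/* "Database size" */
	uint64_t bigrams;	/* "bigrams" under "Data types" */
};

/* Assumed size after front compression alone: undo one byte per bigram. */
double
front_only_bytes(const struct locate_counts *c)
{
	return ((double)c->db_bytes + (double)c->bigrams);
}

/* Factor gained by front compression. */
double
front_factor(const struct locate_counts *c)
{
	return ((double)c->names_bytes / front_only_bytes(c));
}

/* Additional factor gained by bigram compression. */
double
bigram_factor(const struct locate_counts *c)
{
	return (front_only_bytes(c) / (double)c->db_bytes);
}

/* Overall factor; by construction the product of the two above. */
double
overall_factor(const struct locate_counts *c)
{
	return ((double)c->names_bytes / (double)c->db_bytes);
}
```

With the counts from the new sample output (2894961156, 213401959 and 74761507), these format as 10.0x, 1.4x and 13.6x under "%.1fx". Unrounded, roughly 10.05 times 1.35 gives the overall 13.57, which is why "overall" describes the last figure better than "total".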

Data types: int: 712776, bigrams: 74761507, nonprintable ASCII: 4

So this locate database contains... 712776 integers, 74761507 character pairs (that may or may not be compressed) and 4 ASCII control characters? If I'm right, the relevance of that information isn't clear to me: is it the raw data for the compression ratios above? Also, what are these integers for and why is it important to tell me how many there are?

All of the above except "overall" probably belong in the manual page, if only not to risk confusing postprocessing of the locate -S output. And this brings me to locate(1):

DESCRIPTION:

-0		 Print pathnames separated by an ASCII NUL character (charac-
		 ter code 0) instead of default NL (newline, character code
		 10).

s/default/the default/

BUGS:

The locate database is not byte order independent. It is not possible to share the databases between machines with different byte order. The current locate implementation understands databases in host byte order or network byte order if both architectures use the same integer size. So on a FreeBSD/i386 machine (little endian), you can read a locate database which was built on SunOS/sparc machine (big endian, net).

I scratched my head for a minute or so at this apparent contradiction, until I figured out you probably meant I can't update the same database on a network server from both architectures, but copying it from one to the other and using it there with locate -d works. This needs clarifying.

Also, s/SunOS/a SunOS/ (and if "net" stands for "network byte order", it should be clarified as well).
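
To make the terminology concrete: network byte order is big endian, so the same four bytes decode to different values depending on the order assumed. A minimal sketch of order-explicit decoding; the function names are hypothetical and this is not how locate itself reads its database:

```c
#include <stdint.h>

/* Decode a 32-bit value stored in network (big-endian) byte order. */
uint32_t
decode_be32(const unsigned char *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Decode a 32-bit value stored in little-endian byte order. */
uint32_t
decode_le32(const unsigned char *p)
{
	return (((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
	    ((uint32_t)p[1] << 8) | (uint32_t)p[0]);
}
```

Decoding byte by byte like this gives the same result on any host, whereas copying the bytes straight into an int gives the intended value only when the file's byte order matches the host's.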

The locate utility does not recognize multibyte characters.

This fortunately appears to be no longer true. That language should be corrected.

[pauamma@gadfly] ~% locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
[pauamma@gadfly] ~% locate Foreman
/usr/home/pauamma/Downloads/NemesisAZForeman.mp3
/usr/home/pauamma/Music/Μεσομήδης ὁ Κρής/NemesisAZForeman.mp3
[pauamma@gadfly] ~% locate Πολιορκημένοι
/usr/home/pauamma/Music/Γιάννης Μαρκόπουλος/Οι Ελεύθεροι Πολιορκημένοι-CD1
/usr/home/pauamma/Music/Γιάννης Μαρκόπουλος/Οι Ελεύθεροι Πολιορκημένοι-CD1/01. Ειρήνη Παπά - Στοχασμός-Εισαγωγή - Reflection.mp3