Page MenuHomeFreeBSD

Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4)
ClosedPublic

Authored by sobomax on Feb 18 2016, 9:46 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Oct 25, 3:15 AM
Unknown Object (File)
Mon, Oct 13, 6:34 PM
Unknown Object (File)
Mon, Oct 13, 5:10 AM
Unknown Object (File)
Sun, Oct 12, 5:14 PM
Unknown Object (File)
Sun, Oct 12, 5:14 PM
Unknown Object (File)
Sun, Oct 12, 5:14 PM
Unknown Object (File)
Sun, Oct 12, 5:14 PM
Unknown Object (File)
Sun, Oct 12, 5:14 PM

Details

Summary

This change provides snapshot of my recent work to refresh mkuzip(8) utility and geom_uzip(4) kernel module. Some of the most significant changes that are included:

  1. mkuzip(8):
    • Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes.
    • New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSD kernel, hence it's turned off by default.
    • provide options to control both features and document them in manual page.
    • merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both.
    • switch license from ad-hoc beerware into standard 2-clause BSD.
  1. geom_uzip(4):
    • implement support for de-duplicated images;
    • optimize some code paths to handle "all-zero" blocks without reading any compressed data;
    • beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive.
    • convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module.
    • hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail.
    • merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header.
    • move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores.
    • document new knobs in the manual page.
Test Plan
  1. Create some realistic data partition. In my case it's FreeBSD 10.2 base system (binaries and data), no sources and 230 packages, 2.3GB worth of data. Full list of packages is here:

http://bit.ly/1QMw1D0

  1. Make 4 versions of the compressed image with new mkuzip(8):
# mkuzip -o outfile.zcomp.nodedup.uzip infile
# mkuzip -Z -o outfile.nozcomp.nodedup.uzip infile
# mkuzip -d -o outfile.zcomp.dedup.uzip infile
# mkuzip -Zd -o outfile.nozcomp.dedup.uzip infile
  1. Attach all 4
# mdconfig -a -f outfile.zcomp.nodedup.uzip
md0
# mdconfig -a -f outfile.nozcomp.nodedup.uzip
md1
# mdconfig -a -f outfile.zcomp.dedup.uzip
md2
# mdconfig -a -f outfile.nozcomp.dedup.uzip
md3
  1. Try reading data with some off block size and see if it matches:
# dd if=infile bs=260608 status=none | md5
# dd if=/dev/md0.uzip bs=260608 status=none | md5
# dd if=/dev/md1.uzip bs=260608 status=none | md5
# dd if=/dev/md2.uzip bs=260608 status=none | md5
# dd if=/dev/md3.uzip bs=260608 status=none | md5
  1. Change block size, see if you can break it. fsck the resulting device node, try to mount/chroot in there and run some I/O tests.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sobomax updated this object.
sobomax edited the test plan for this revision. (Show Details)
sobomax set the repository for this revision to rS FreeBSD src repository - subversion.
sobomax added subscribers: manpages, imp.
sobomax edited edge metadata.
sobomax edited the test plan for this revision. (Show Details)
sobomax updated this object.

P.S. This is just to sync the code into the tree so that more people can play / look at it. It's also desperately needed to move decompression into its own worker thread per geom_uzip() instance, as right now we are running in the context of geom{g_up} thread, so if you have multiple devices used at the same time or more than one user for the device the scalability onto multi-core CPUs is not existing. On top of that I suspect that it blocks the said thread so it might affect I/O to/from other GEOM providers as well.

sobomax retitled this revision from Improve mkuzip(8) and geom_uzip(4) to Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4).Feb 20 2016, 2:49 AM
sobomax updated this object.

o Fully merge in LZMA support from mkulzma(8) and geom_uncompress(4)

o fix rare bug that may cause zero-size IO request to be emitted.

o improve debugging levels somewhat

sobomax edited edge metadata.

Add missing (c) into g_uzip.h.

No comment on geom_uzip module code or mkuzip.

share/man/man4/geom_uzip.4
53–54 ↗(On Diff #13540)

a some -> some. Or "a little" was fine too.

66–70 ↗(On Diff #13540)

Remove first "The."

I'd suggest: ".Nm is not limited to supporting only .Xr md 4 images."

71–72 ↗(On Diff #13540)

I'd leave out the RW/RO distinction here. For geom_uzip's purposes, compressed images are always RO so the media characteristics don't really matter.

"The image can also reside on a block device. (For example, a disk, USB flash drive, or DVD-ROM.)"

I'm not sure the examples are necessary, but if you're going to include them, that's how I'd phrase it.

73–76 ↗(On Diff #13540)

"the appropriate device node will appear with the .Pa .uzip suffix:"

140 ↗(On Diff #13540)

Remove "The"

142–145 ↗(On Diff #13540)

".Nm allows mounting the root file system from a compressed disk partition by setting the .Dv vfs.root.mountfrom tunable."

156–159 ↗(On Diff #13540)

"Zero disables logging. Higher values enable more verbose debug logging for .Nm ."

162 ↗(On Diff #13540)

I would remove "key" and "specific."

sys/geom/uzip/g_uzip.c
81 ↗(On Diff #13540)

perhaps add "(0-3)" here

106 ↗(On Diff #13540)

As Adrian points out, this should not be static.

726–741 ↗(On Diff #13540)

Yeugh.

sys/geom/uzip/g_uzip_dapi.h
1–8 ↗(On Diff #13540)

I'm not sure what this macro gets you. Why not get rid of it?

sys/geom/uzip/g_uzip_lzma.c
87 ↗(On Diff #13540)

I'd suggest sizeof(*lzp) instead. And definitely do not cast the result of malloc(9). This is C, not C++.

sys/geom/uzip/g_uzip_zlib.c
74–75 ↗(On Diff #13540)

Why not just:

err = inflate(...);
if (err != Z_STREAM_END) {
  print();
  return (1);
}
return (0);
102–103 ↗(On Diff #13540)

Don't cast malloc(); suggest sizeof(*zp).

125 ↗(On Diff #13540)

suggest casting type to size_t although I doubt zlib tries to allocate >4GB anyway.

usr.bin/mkuzip/mkuzip.c
48 ↗(On Diff #13540)

Duplicate definition of this macro... I don't think you need it, anyway.

364 ↗(On Diff #13540)

Unnecessary to explicitly cast between 'const void*' and 'const u_char *'.

sobomax edited edge metadata.
sobomax marked an inline comment as done.
sobomax added a subscriber: adrian.

New revision, changelog:

o fix static compilation reported by @adrian

o move decompression into its own worker thread

o better range checking on TOC's block offsets.

sobomax edited edge metadata.

I'd leave the worker thread for a 2nd revision to make history cleaner.

Posted interim revision just dealing with functional deficiencies as reported by @adrian. Will deal with style issues and manpages later on.

share/man/man4/geom_uzip.4
53–54 ↗(On Diff #13540)

Well, I think "little" is too heavy here. How little or not heavily depends on the CPU and algorithm.

sys/geom/uzip/g_uzip.c
81 ↗(On Diff #13540)

haha, this already has been added yesterday.

726–741 ↗(On Diff #13540)

Huh? Well, it's the only reasonable way to unwind complex object in C, I think it's much better to call g_uzip_softc_free() previously, since it basically needs to be smart enough to decode which sub-objects have been inited and which aren't. That other approach is highly error prone and tends to get out of sync quickly.

sys/geom/uzip/g_uzip_dapi.h
1–8 ↗(On Diff #13540)

I should probably be ashame of that, but I could never really remember how to typedef pointer to a function in C. :( So instead of toorturing myself and never getting it right, I cooked up this little macro and just copy it over. Saves me tons of time.

sys/geom/uzip/g_uzip_lzma.c
87 ↗(On Diff #13540)

Well, I prefer explicit size when it's not too obtrusive. Because the size I want might not be really *lzp, I could stick variable size array at the end some way (i.e. data[0]). So if I search my sources by sizeof(struct g_uzip_lzma), this will come out prominently.

I hope if you don't mind if I keep it this way.

sys/geom/uzip/g_uzip_zlib.c
74–75 ↗(On Diff #13540)

Well probably because inflate() may return some werd-o typebeing 3rd party API with lot of historical hairs and in general its error conventions do not match ours. So assigning that could lead to a bug in a future when somebody tweaks the code and that error gets propagated.

102–103 ↗(On Diff #13540)

See my comment above WRT *zp. I'll remove the casts though.

usr.bin/mkuzip/mkuzip.c
364 ↗(On Diff #13540)

Well, I just cast always when going from void * to something non-void *. Saves me trouble to remember when it's needed and when not, again I don't mind if I keep it this way.

sys/geom/uzip/g_uzip.c
768–786 ↗(On Diff #13552)

I think it's better to just initialize pointers to NULL, and free/destroy them conditional on being non-NULL, with a single out: or error: label.

Many error labels tends to result in jumping to the wrong one in some path, because most of the time no errors happen. Especially with numbered labels and not names. Still, just my 2¢. If it works, it's fine to keep it as is.

sys/geom/uzip/g_uzip_dapi.h
2–9 ↗(On Diff #13552)

Why not just use it for reference and not copy it? :/ This is a header, it leaks in to anything that includes it.

sys/geom/uzip/g_uzip_lzma.c
88 ↗(On Diff #13552)

You can use + operator on sizeof(*lzp) equally well, so I don't understand the VLA data concern. Still, if you prefer it, it's okay to keep malloc(sizeof(struct g_uzip_lzma)).

Casting the result is not okay though, please fix that.

@ceri, added some comments, thanks for your help, I'll work on new revision.

sys/geom/uzip/g_uzip.c
768–786 ↗(On Diff #13552)

Well, that's where I respectfully disagree. First, let me reiterate: destructor (i.e. g_uzip_softc_free()) has no business to being used on partially constructed object. Never. End of story, thank you very much. :) I don't think it was your suggestion, but just in the case.

About using single error label, I found that once the number of sub-objects that you need to unwind exceeds 2-3 the approach you are advocating for tends to produce slight differences in ordering and hidden bugs that are only exposed on error conditions rarely seen in practice since those code paths are rarely taken. The problem here is that, once the object is complex, we need to unwind it in a very particular order, since sub-object A might be passed as initial data to object B, so if you unwind it in the opposite order it might blow up. Your NULL-check does not gurantee that. And in fact as I said as the code develops initialization order might swap or shift, so you are basically have two code parts that need to be adjusted.

I find it easier and less error prone to do it this way as long as you adopt some mind discipline on keeping eX list neat, ordered and tidy it works.

sys/geom/uzip/g_uzip_dapi.h
2–9 ↗(On Diff #13552)

Well, because I modify the code. I could not really even read it out back properly, always get confused what is what type-wise. It gets hopeless when function gets more than 5-6 arguments and returns either void or void *. >:-( And my ability to tell what the method API return type and arguments are are quite important in producing maybe better code but also definitely more code per unit of my limited time.

Anyway, as I said, this is not a public api of any kind or sort, so as long as I am only one actively working on this code I'd keep it this way. Thanks!

usr.bin/mkuzip/mkuzip.c
364 ↗(On Diff #13552)

I don't mind if I keep it this way -> I hope you don't mind if I keep it this way

Add some more explanation for the unwind handling.

sys/geom/uzip/g_uzip.c
768–786 ↗(On Diff #13552)

To make it more concrete, let's say you have the following code:

D_init(void)
{
   A = B = C = NULL;
   A = A_init();
   B = B_init(A);
   C = C_init();
   if (C == NULL)
        goto error;
 [...]
 error:
    if (A != NULL)
        A_free(A);
    if (B != NULL)
        B_free(B);
    return (whatever)
}

Now, obviously in this 3-line example you can spot the issue easily, but how about more realistic softc function that has 100 lines and 10 sub-objects for example? It's all further amplified by the fact that C_init() could be something that is failing extremely rarely in practice. Like that very little used function malloc() for example. Then you got yourself a nice sleeping bug that may lay dormant and unnoticed for ages, but it can bite somebody badly one day.

I am not saying my approach is a silver bullet, you can obviously mix the order just as well or start hiting the wrong label after moving the code block around. However, I found that chances that you would leave some traces like dangling memory or hit a null pointer are much higher. In contrast your approach would produce seemingly working code that might pass both visual inspection and unit/functional tests until that C_init fails on somebody one day. In my view, if ordering is violated, it's better to leave either A or B unfreed, that is likely to get noticed and fixed sooner, rather than free everything but in the wrong order.

usr.bin/mkuzip/mkuzip.c
48 ↗(On Diff #13552)

No, it's not duplicate. Note, this is userland part. We are not using any headers from the g_uzip(4) here, don't need to.

sys/geom/uzip/g_uzip.c
768–786 ↗(On Diff #13552)

That is a straw man version of my proposal. Either you don't understand it, or just don't like it. That's fine. I don't appreciate the tone, though. I'll take myself off the review.

Thanks for doing the geom_uzip + geom_uncompress unifying work.

sys/geom/uzip/g_uzip.c
768–786 ↗(On Diff #13552)

@ceri, sorry about the tone, was not my intent really. English is not my native tongue, so sometimes what I am writing may read too heavy for a native speaker. Just wanted to explain my point in more detail. I hope you'd reconsider.

The key here, is that I am trying to make sure order of unwind matches order of init in reverse. That's all is to it. With NULL checks and single label that's not very difficult to do right when you just make function for the first time and you know what's done when. But as the code evolves it tends to diverge and keeping it in sync becomes somewhat of a pain, since at least I cannot keep 100 lines of the code in my poor little head, so every time code is moved from one place to another you need to re-create that order, possible long time after initial work. Jumping to the wrong label is minor issue because unless the code is too complicated (and init rarely is, in my experience it's usually just long boring list of things to initialize and join together) order in which those exit labels are used in the main body of the function should be non-decreasing, while monotonically decreasing in the exit path. Can be easily checked by just doing single pass through the code looking for "goto e[0-9]". All this strictly IMHO based on my own experience writing OO-like C code.

Anyway, thank you for your constructive comments and suggestions, I'll definitely integrate most of it into the final version!

sobomax added inline comments.
sys/geom/uzip/g_uzip.c
768–786 ↗(On Diff #13552)

@cem sorry about the tone, was not my intent really. English is not my native tongue, so sometimes what I am writing may read too heavy for a native speaker. Just wanted to explain my point in more detail. I hope you'd reconsider.

The key here, is that I am trying to make sure order of unwind matches order of init in reverse. That's all is to it. With NULL checks and single label that's not very difficult to do right when you just make function for the first time and you know what's done when. But as the code evolves it tends to diverge and keeping it in sync becomes somewhat of a pain, since at least I cannot keep 100 lines of the code in my poor little head, so every time code is moved from one place to another you need to re-create that order, possible long time after initial work. Jumping to the wrong label is minor issue because unless the code is too complicated (and init rarely is, in my experience it's usually just long boring list of things to initialize and join together) order in which those exit labels are used in the main body of the function should be non-decreasing, while monotonically decreasing in the exit path. Can be easily checked by just doing single pass through the code looking for "goto e[0-9]". All this strictly IMHO based on my own experience writing OO-like C code.

Anyway, thank you for your constructive comments and suggestions, I'll definitely integrate most of it into the final version!

sys/geom/uzip/g_uzip.h
28 ↗(On Diff #13552)

There's no #ifdef guard here.

sys/geom/uzip/g_uzip.c
46 ↗(On Diff #13552)

MALLOC_DEFINE needs to be moed /after/ the include section, or we get a double define error from gcc-4.2

cc1: warnings being treated as errors
In file included from /usr/home/adrian/work/freebsd/head-embedded/src/sys/geom/uzip/g_uzip.c:48:
/usr/home/adrian/work/freebsd/head-embedded/src/sys/geom/uzip/g_uzip.h:29: warning: redundant redeclaration of 'M_GEOM_UZIP' [-Wredundant-decls]
/usr/home/adrian/work/freebsd/head-embedded/src/sys/geom/uzip/g_uzip.c:46: warning: previous definition of 'M_GEOM_UZIP' was here

  • g_uzip.o ---
  • [g_uzip.o] Error code 1

commit cdab02d18d2e5d7855498eb5a84d87a6a4ac04de
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Feb 21 17:45:49 2016 -0800

Only define M_GEOM_UZIP if it's not defined already.

Reported by: @adrian

commit 09f29117de0426725532db0e38f24ea0041a497d
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Feb 21 16:00:37 2016 -0800

Remove warning about de-duplicated images possibility to inflict
a crash or other problem to other OS or earlier kernels. With
the latest changes to mkuzip(8), the header for de-duplicated
images will be using distinct signature and should be rejected
as non-compatible header.

commit 70d9a35faf1dc47fe6eb110b4157e5d559996110
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Feb 21 12:57:39 2016 -0800

Recognize new version string for the deduplicated images.

commit 7351ff12bccc4cd16b83839cf4ebaa7ca9b76b07
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Feb 21 10:17:02 2016 -0800

Move cloop struct and related constants into its own header.

When output de-duplication is enabled give it a distinct header
by lowering case of the letter used to specify a compression
method, i.e. 'L' -> 'l' for LZMA and 'V' -> 'v' for ZLIB. This
should prevent foot shooting when image is loaded on older
FreeBSD version or on different OS.

commit fc3f2c45b17c62d20d66e2528866ab8ab0b0eb3b
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Feb 21 09:42:27 2016 -0800

Move cloop struct and related defines into its own header and
replace some "magic" values in the code with symbols.

commit 33c9409c2690b6c1c85a9a8596bff9525dde44ae
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Feb 21 08:45:29 2016 -0800

Cleanup based on the input from @cem.

commit bcee4352d49d5b6e4469c8a95f2ba5b1a5a25130
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sat Feb 20 13:22:05 2016 -0800

Don't cast malloc return value, it's pointless.

Suggested by: @cem

commit 7d7ed03abda5c0519d1a1f31458e09d5120094f8
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sat Feb 20 12:40:27 2016 -0800

Debug level 0-4 -> Debug level (0-4).

Suggested by: @cem (kinda)
sobomax added a subscriber: cem.
sobomax added inline comments.
share/man/man4/geom_uzip.4
66–105 ↗(On Diff #13588)

That comment seems to be misplaced. I've located the sentence and fixed it in the next update.

71–72 ↗(On Diff #13588)

@cem done, I've added etc to suggest that the list is not complete.

sys/geom/uzip/g_uzip.c
46–72 ↗(On Diff #13588)

Fixed in the g_uzip.h. Thanks!

sys/geom/uzip/g_uzip.h
29 ↗(On Diff #13588)

There is one now. Thanks!

sys/geom/uzip/g_uzip_lzma.c
88 ↗(On Diff #13552)

Cast(s) removed in the upcoming version, thanks. In fact now that I am thinking about it I never used cast with malloc with my own C code before, so I don't know what made me stick that one in there in one place and then it got copied to another.

This revision was automatically updated to reflect the committed changes.
sobomax marked 4 inline comments as done.