Details

Reviewers

dch
mav
cperciva
rob.norris_klarasystems.com
erj

Group Reviewers

secteam

Summary

DRAFT

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275308

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

emaste requested review of this revision.Nov 29 2023, 2:41 PM

emaste created this revision.

emaste added a reviewer: mav.

markj added a subscriber: markj.Nov 29 2023, 2:43 PM

markj added inline comments.

FreeBSD-EN-23:16.openzfs
52	Is zfs(4) the right reference? I don't see anything in there about holes.

emaste added inline comments.Nov 29 2023, 2:47 PM

FreeBSD-EN-23:16.openzfs
16–17	Need to indicate that 12.x has a variant of the faulty dirtiness check, but it may not be possible to encounter the issue there (in particular, `copy_file_range` does not exist and 12.x ZFS does not support `SEEK_HOLE` as far as I can tell).
36–38	Can update this with some notes from @mav's text in PR275308 - in particular mentioning the incomplete dirtiness check.

emaste added inline comments.Nov 29 2023, 2:48 PM

FreeBSD-EN-23:16.openzfs

I included the zfs(4) reference for the description of the sysctl -- i.e.,

zfs_dmu_offset_next_sync=1|0 (int)
        Enable forcing TXG sync to find holes.  When enabled forces ZFS
        to sync data when SEEK_HOLE or SEEK_DATA flags are used allowing
        holes in a file to be accurately reported.  When disabled holes
        will not be reported in recently dirtied files.

I can make this "See the zfs(4) man page for more information on the effect of setting the sysctl to zero." or something like that

mav added inline comments.Nov 29 2023, 3:05 PM

FreeBSD-EN-23:16.openzfs
16–17	I do see SEEK_HOLE support in stable/12 ZFS. But you are right about `copy_file_range()`.
36–38	I'd say: "hole may incorrectly be reported where data were written". And "The data may instead be interpreted as zero bytes." -- explicit reads are not affected, apps just skip them.
42	Again, not while reading.

emaste added inline comments.Nov 29 2023, 3:09 PM

FreeBSD-EN-23:16.openzfs
33	Perhaps add: An OpenZFS dnode is a data structure used to represent and manage metadata about files and directories. In file systems, "dirty" refers to data or metadata that has been modified in memory but not yet written to the storage device.
36–38	A check did not test both the dnode itself and its data for dirtiness. This provides a very small window while a file is being modified where the dirtiness check falsely reports that the dnode is clean. If this happens a hole can then reported in place of the expected data.

incorporate feedback

more updates

emaste added a subscriber: mm.Nov 29 2023, 6:07 PM

mav accepted this revision.Nov 29 2023, 6:09 PM

This revision is now accepted and ready to land.Nov 29 2023, 6:09 PM

pstef added a subscriber: pstef.Nov 29 2023, 6:13 PM

pstef added inline comments.

FreeBSD-EN-23:16.openzfs
62	likelihood

emaste marked an inline comment as done.Nov 29 2023, 6:15 PM

emaste added inline comments.

FreeBSD-EN-23:16.openzfs
62	applied locally

lwhsu added a subscriber: lwhsu.Nov 29 2023, 6:15 PM

lwhsu added inline comments.

FreeBSD-EN-23:16.openzfs
112	This URL is redirected to https://docs.freebsd.org/en/books/handbook/kernelconfig/ , should we update it (and the template)?

emaste marked an inline comment as done.Nov 29 2023, 6:17 PM

emaste added inline comments.

FreeBSD-EN-23:16.openzfs
112	I think so yes. The templates are in the doc tree if you want to update.

emaste added a reviewer: cperciva.Nov 29 2023, 6:46 PM

Add a note about cp(1) using copy_file_range in 13.2 and 14.0, but not in 12.4.

This revision now requires review to proceed.Nov 29 2023, 6:53 PM

LGTM aside from maybe the word "uncommitted".

FreeBSD-EN-23:16.openzfs
40	"uncommitted" might confuse people who aren't familiar with filesystem/database jargon?

emaste added inline comments.Nov 29 2023, 7:39 PM

FreeBSD-EN-23:16.openzfs
40	possibly, although it is probably understandable with the context of the previous sentence. I can't think of a good way to rewrite it without basically repeating the previous sentence.

erj added a subscriber: erj.Nov 29 2023, 9:37 PM

erj added inline comments.

FreeBSD-EN-23:16.openzfs
53	This might just be a function of it's good to be complete when describing the problem/impact, but this first sentence repeats the same information that the last sentence of the previous section has. I guess it's fine, especially if you want to just jump to the impact section, but it's weird when reading the whole thing. Otherwise, I think I can understand the problem and impact, and the remediation steps are also clear.

attempt to clarify section III based on feedback from erj

erj added inline comments.Nov 30 2023, 12:04 AM

FreeBSD-EN-23:16.openzfs
62	I like the additional bits added here.

rob.norris_klarasystems.com signed these changes with MFA.Nov 30 2023, 1:28 AM

rob.norris_klarasystems.com accepted this revision.

This revision is now accepted and ready to land.Nov 30 2023, 1:28 AM

emaste added inline comments.Nov 30 2023, 1:42 AM

FreeBSD-EN-23:16.openzfs
22–24	@rob.norris_klarasystems.com just mentioned that it's been reproduced on Illumos https://www.illumos.org/issues/16087 - so we probably want to update to "Affects: All supported versions of FreeBSD" but leave a note that a patch for FreeBSD 12.4 is not yet available but the advisory will be available when it is.

Mention that 12.4 is affected and a patch will follow.

This revision now requires review to proceed.Nov 30 2023, 1:48 AM

netchild added a subscriber: netchild.Nov 30 2023, 7:50 AM

netchild added inline comments.

FreeBSD-EN-23:16.openzfs
57–62	This can be read as if the impact is only during a file copy (as a non-native speaker I could understand "If this occurs.." as "I'm doing a file copy and if I'm unlucky, the issue is triggered, and corruption of the target happens"), but the file copy is just one use case. Should we make it more explicit that the we make a special case here for a file copy?

LGTM. Reading through the github issues I see a common misunderstanding emerging that I think is worth covering in the errata.

People are assuming despite Rob’s best efforts that a run of zeroes in the file means corruption and conversely no zeroes is good.

Perhaps this, just after the cp example?

Note that the preceding illustrative example using cp is not the only scenario in which this rare combination of events could occur. It is not possible to determine presence of, or lack of, data corruption, by searching for runs of zeroes in files. Any process that incorrectly receives a zeroed hole chunk, may in turn transform that data, eventually writing it out to disk with the zeroes obscured due to the transformation.

If this doesn’t seem helpful please go ahead without it.

This revision now requires changes to proceed.Nov 30 2023, 8:28 AM

Attempt to clarify impact

It is not possible to determine presence of, or lack of, data corruption, by searching for runs of zeroes in files.

Not only that, there are legitimate reasons for files to have runs of zeros, and I imagine searching for runs of zeros is more likely to find false positives than file copies affected by this issue.

I think @netchild's suggestion is sufficient though - we don't talk at all about identifying corrupt files in this EN, and I don't think we should. I imagine part of the confusion in the GitHub pull request is that there are over 200 comments that mirror the evolution of the understanding of this issue, and clarification there is probably good.

Rework impact some more - don't try to enumerate multiple ways incorrect zeros could persist, be transformed, etc.

mav accepted this revision.Nov 30 2023, 3:48 PM

erj accepted this revision.Nov 30 2023, 4:45 PM

dch accepted this revision.Nov 30 2023, 7:17 PM

This revision is now accepted and ready to land.Nov 30 2023, 7:17 PM

Thanks for the review everyone, released as https://www.freebsd.org/security/advisories/FreeBSD-EN-23:16.openzfs.asc

emaste closed this revision.Aug 13 2024, 7:49 PM

EN for OpenZFS data corruption issue
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 130804

FreeBSD-EN-23:16.openzfs

EN for OpenZFS data corruption issueClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 130804

FreeBSD-EN-23:16.openzfs

EN for OpenZFS data corruption issue
ClosedPublic
Actions

Revision Contents
Changeset List