Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
52 | Is zfs(4) the right reference? I don't see anything in there about holes. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
16–17 | Need to indicate that 12.x has a variant of the faulty dirtiness check, but it may not be possible to encounter the issue there (in particular, copy_file_range does not exist and 12.x ZFS does not support SEEK_HOLE as far as I can tell). | |
36–38 | Can update this with some notes from @mav's text in PR275308 - in particular mentioning the incomplete dirtiness check. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
52 | I included the zfs(4) reference for the description of the sysctl -- i.e., zfs_dmu_offset_next_sync=1|0 (int) Enable forcing TXG sync to find holes. When enabled forces ZFS to sync data when SEEK_HOLE or SEEK_DATA flags are used allowing holes in a file to be accurately reported. When disabled holes will not be reported in recently dirtied files. I can make this "See the zfs(4) man page for more information on the effect of setting the sysctl to zero." or something like that |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
16–17 | I do see SEEK_HOLE support in stable/12 ZFS. But you are right about copy_file_range(). | |
36–38 | I'd say: "hole may incorrectly be reported where data were written". And "The data may instead be interpreted as zero bytes." -- explicit reads are not affected, apps just skip them. | |
42 | Again, not while reading. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
33 | Perhaps add: An OpenZFS dnode is a data structure used to represent and manage metadata about files and directories. In file systems, "dirty" refers to data or metadata that has been modified in memory but not yet written to the storage device. | |
36–38 | A check did not test both the dnode itself and its data for dirtiness. This provides a very small window while a file is being modified where the dirtiness check falsely reports that the dnode is clean. If this happens a hole can then reported in place of the expected data. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
62 | likelihood |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
62 | applied locally |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
112 | This URL is redirected to https://docs.freebsd.org/en/books/handbook/kernelconfig/ , should we update it (and the template)? |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
112 | I think so yes. The templates are in the doc tree if you want to update. |
LGTM aside from maybe the word "uncommitted".
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
40 | "uncommitted" might confuse people who aren't familiar with filesystem/database jargon? |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
40 | possibly, although it is probably understandable with the context of the previous sentence. I can't think of a good way to rewrite it without basically repeating the previous sentence. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
53 | This might just be a function of it's good to be complete when describing the problem/impact, but this first sentence repeats the same information that the last sentence of the previous section has. I guess it's fine, especially if you want to just jump to the impact section, but it's weird when reading the whole thing. Otherwise, I think I can understand the problem and impact, and the remediation steps are also clear. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
62 | I like the additional bits added here. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
22–24 | @rob.norris_klarasystems.com just mentioned that it's been reproduced on Illumos https://www.illumos.org/issues/16087 - so we probably want to update to "Affects: All supported versions of FreeBSD" but leave a note that a patch for FreeBSD 12.4 is not yet available but the advisory will be available when it is. |
FreeBSD-EN-23:16.openzfs | ||
---|---|---|
57–62 | This can be read as if the impact is only during a file copy (as a non-native speaker I could understand "If this occurs.." as "I'm doing a file copy and if I'm unlucky, the issue is triggered, and corruption of the target happens"), but the file copy is just one use case. Should we make it more explicit that the we make a special case here for a file copy? |
LGTM. Reading through the github issues I see a common misunderstanding emerging that I think is worth covering in the errata.
People are assuming despite Rob’s best efforts that a run of zeroes in the file means corruption and conversely no zeroes is good.
Perhaps this, just after the cp example?
Note that the preceding illustrative example using cp is not the only scenario in which this rare combination of events could occur. It is not possible to determine presence of, or lack of, data corruption, by searching for runs of zeroes in files. Any process that incorrectly receives a zeroed hole chunk, may in turn transform that data, eventually writing it out to disk with the zeroes obscured due to the transformation.
If this doesn’t seem helpful please go ahead without it.
It is not possible to determine presence of, or lack of, data corruption, by searching for runs of zeroes in files.
Not only that, there are legitimate reasons for files to have runs of zeros, and I imagine searching for runs of zeros is more likely to find false positives than file copies affected by this issue.
I think @netchild's suggestion is sufficient though - we don't talk at all about identifying corrupt files in this EN, and I don't think we should. I imagine part of the confusion in the GitHub pull request is that there are over 200 comments that mirror the evolution of the understanding of this issue, and clarification there is probably good.
Rework impact some more - don't try to enumerate multiple ways incorrect zeros could persist, be transformed, etc.
Thanks for the review everyone, released as https://www.freebsd.org/security/advisories/FreeBSD-EN-23:16.openzfs.asc