
Wed, Apr 14

mckusick committed R10:14d0cd7225e2: Ensure that the mount command shows "with quotas" when quotas are enabled. (authored by mckusick).
Ensure that the mount command shows "with quotas" when quotas are enabled.
Wed, Apr 14, 10:21 PM

Tue, Apr 13

mckusick added a comment to D29695: systat: Implemented per-process swap display on -swap.

Not having seen this review, I started a review at https://reviews.freebsd.org/D29754. I am fine with abandoning my review, though I do think you should consider incorporating my version of the manual page.

Tue, Apr 13, 11:12 PM
mckusick requested review of D29754: Augment systat(1) -swap to display large swap space processes.
Tue, Apr 13, 9:41 PM

Fri, Apr 9

mckusick committed R10:27aa4fcbbc73: Ensure that all allocated data structures in fsck_ffs are freed. (authored by mckusick).
Ensure that all allocated data structures in fsck_ffs are freed.
Fri, Apr 9, 12:49 AM

Fri, Apr 2

mckusick accepted D29563: Remove kgmon(8).

Given that -pg support has been withdrawn from the kernel, it is sensible to remove kgmon(8).

Fri, Apr 2, 10:44 PM
mckusick committed R10:343b9e6219e1: Fix fsck_ffs -R finds unfixed duplicate block errors when rerunning. (authored by mckusick).
Fix fsck_ffs -R finds unfixed duplicate block errors when rerunning.
Fri, Apr 2, 9:53 PM
mckusick committed R10:fab7c18ce322: Fix fsck_ffs Pass 1b error exit "bad inode number 2 to nextinode". (authored by mckusick).
Fix fsck_ffs Pass 1b error exit "bad inode number 2 to nextinode".
Fri, Apr 2, 9:50 PM
mckusick committed R10:fc56fd262d0b: Ensure that all allocated data structures in fsck_ffs are freed. (authored by mckusick).
Ensure that all allocated data structures in fsck_ffs are freed.
Fri, Apr 2, 6:56 PM

Wed, Mar 31

mckusick accepted D29478: ffsinfo: Update example to avoid to-be-deprecated vinum.

Your change looks good.

Wed, Mar 31, 5:34 AM

Fri, Mar 26

mckusick added a comment to D28856: Move struct bufobj out of struct vnode.
In D28856#659254, @kib wrote:

The cost of an extra allocation versus the overhead of having to handle low-memory situations and building up and tearing down zones seems like a bad tradeoff.

Why? It is the reverse, IMO: normal system operation performs a lot of vnode allocations and deallocations, while lowmem is a rare condition, where we do not worry about system performance at all, only about system liveness. Optimizing for the normal path is right; optimizing for the lowmem handler is not.

The purpose of this change is to reduce the amount of memory dedicated to vnodes.

Fri, Mar 26, 6:19 PM
mckusick added a comment to D28856: Move struct bufobj out of struct vnode.
In D28856#659141, @kib wrote:

There is still overhead as the zone memory has to be cleaned up (locks disposed of) and then new memory initialized (zeroed, lists and queues initialized, locks initialized, etc.). Also there is extra work done detecting that we have hit these conditions and making them happen. In general we are going to have more memory tied up and do more work moving it between the zones. If we just had one zone for vnodes and another zone for bufobjs we could avoid all of this. In all likelihood we would only need occasional freeing of memory in the bufobj zone.

I am curious why you are so resistant to having a single vnode zone and a single bufobj zone?

With either (vnode + bufobj, vnode - bufobj) or (vnode - bufobj, bufobj) we still have two zones, and on a low-memory condition both zones need to be drained. But with a separate bufobj zone, we additionally punish filesystems that use buffers. Instead of a single allocation for a vnode, they have to perform two, and they also have to perform two frees.

We have a similar structure in namecache, where {short,long}x{timestamp, no timestamp} allocations use specific zones, instead of allocating nc + path segment + timestamp.

Fri, Mar 26, 12:09 AM

Thu, Mar 25

mckusick added a comment to D28856: Move struct bufobj out of struct vnode.

I understand that there cannot be more than maxvnodes. What I am concerned about is how much memory is tied up in the two zones. In this example, vnlru() frees (vnode without bufobj)s into the (vnode without bufobj) zone. It then allocates (vnode+bufobj)s from the (vnode+bufobj) zone. That allocation cannot use the memory in the (vnode without bufobj) zone. So when we are done we have enough memory locked down in the two zones to support 2 * maxvnodes. This is much more wasteful of memory than having a single zone for pure vnodes and a second zone that holds bufobjs, each of which would be limited to maxvnodes in size.

Thu, Mar 25, 9:18 PM
mckusick committed R10:7848b25edd2a: Fix fsck_ffs -R finds unfixed duplicate block errors when rerunning. (authored by mckusick).
Fix fsck_ffs -R finds unfixed duplicate block errors when rerunning.
Thu, Mar 25, 12:22 AM

Wed, Mar 24

mckusick committed R10:bc444e2ec6e6: Fix fsck_ffs Pass 1b error exit "bad inode number 2 to nextinode". (authored by mckusick).
Fix fsck_ffs Pass 1b error exit "bad inode number 2 to nextinode".
Wed, Mar 24, 11:50 PM
mckusick added a comment to D28856: Move struct bufobj out of struct vnode.
In D28856#658811, @kib wrote:

No, we do not have two pools of vnodes after this change. We have two zones, but zones do not keep vnodes; they cache partially initialized memory for vnodes. Neither the current single zone nor the two zones after applying the patch have any limit on the size of that cache. But it is a cache of memory and not vnodes. With or without the patch, only maxvnodes constructed vnodes can exist in the system. A constructed vnode is a struct vnode which is correctly initialized and has an identity belonging to some filesystem, or is reclaimed. [In fact in some cases getnewvnode() is allowed to ignore the limit of maxvnodes, but this is not relevant to the discussion].

Let me try again to explain my perceived issue.

Under this scheme we have two zones. If there is a lot of ZFS activity, the vnode-only zone can be filled with maxvnodes worth of entries. Now suppose activity in ZFS drops but activity in NFS rises. Now the zone with vnodes + bufobj can fill to maxvnodes worth of memory. As I understand it, we do not reclaim any of the memory from the vnode-only zone; it just sits there unable to be used. Is that correct?

No, this is not correct. The total number of vnodes (the sum of both types) is limited by maxvnodes. After the load shifts from zfs to nfs in your scenario, vnlru starts reclaiming vnodes in LRU order from the global free list, freeing (vnode without bufobj)s, and most of the allocated vnodes would then come from the (vnode+bufobj) zone. We do not allow more than maxvnodes total vnodes allocated in the system.

Wed, Mar 24, 10:31 PM
mckusick added a comment to D28856: Move struct bufobj out of struct vnode.

No, we do not have two pools of vnodes after this change. We have two zones, but zones do not keep vnodes; they cache partially initialized memory for vnodes. Neither the current single zone nor the two zones after applying the patch have any limit on the size of that cache. But it is a cache of memory and not vnodes. With or without the patch, only maxvnodes constructed vnodes can exist in the system. A constructed vnode is a struct vnode which is correctly initialized and has an identity belonging to some filesystem, or is reclaimed. [In fact in some cases getnewvnode() is allowed to ignore the limit of maxvnodes, but this is not relevant to the discussion].

Wed, Mar 24, 7:10 PM

Mon, Mar 22

mckusick added a comment to D28856: Move struct bufobj out of struct vnode.
In D28856#657865, @kib wrote:

Three inline comments / questions.

In Sun's implementation of vnodes each filesystem type had its own pool. When I adopted the vnode idea into BSD, I created generic vnodes that could be used by all filesystems so that they could move between filesystems based on demand.

This design reverts to vnodes usable by ZFS and a few other filesystems, and vnodes for NFS, UFS, and most other filesystems. This will be a win for systems that run just ZFS. But systems that are also running NFS or UFS will not be able to share vnode memory and will likely have a bigger memory footprint than if they stuck with the single type of vnode.

There has been no attempt to fix vlrureclaim(), so we can end up reclaiming a bunch of vnodes of the wrong type, thus reducing the usefulness of the cache without recovering any useful memory. In the worst case, we can end up with each of the two vnode pools using maxvnodes worth of memory.

We probably need to have a separate maxvnodes for each pool. Alternatively, we could keep track of how many vnodes are in each pool and limit the two pools to a total of maxvnodes. That of course raises the question of how to divide the quota between the two pools. At a minimum, vlrureclaim() needs a way to decide which pool needs to have vnodes reclaimed.

We do not have two pools of vnodes after the patch. For very long time, we free vnode after its hold count goes to zero (mod SMR complications).

Mon, Mar 22, 11:30 PM
mckusick requested changes to D28856: Move struct bufobj out of struct vnode.

Three inline comments / questions.

Mon, Mar 22, 5:20 AM

Mar 16 2021

mckusick committed R10:cf0310dfefee: Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash (authored by mckusick).
Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash
Mar 16 2021, 12:07 AM

Mar 14 2021

mckusick committed R10:7dd29d256ff7: Do not complain about incorrect cylinder group check-hashes when (authored by mckusick).
Do not complain about incorrect cylinder group check-hashes when
Mar 14 2021, 10:16 PM

Mar 12 2021

mckusick committed R10:6385cabd5be6: Do not complain about incorrect cylinder group check-hashes when (authored by mckusick).
Do not complain about incorrect cylinder group check-hashes when
Mar 12 2021, 6:44 AM

Mar 11 2021

mckusick accepted D29178: UFS SU: handle races on remounts rw<->ro.

Flag definitions look good.
Breakdown of commits is excellent.
Changes should resolve the problem.

Mar 11 2021, 7:39 PM
mckusick accepted D29178: UFS SU: handle races on remounts rw<->ro.

It would have helped if I had looked at your commit logs before my previous comment. You have in fact separated everything out appropriately.

Mar 11 2021, 6:31 AM
mckusick added a comment to D29178: UFS SU: handle races on remounts rw<->ro.

Nearly all of the changes in ffs_softdep.c are code cleanups and not related to this bug fix. I would prefer to see the code cleanups in a separate commit so that it is easier to see the changes that are needed to fix this problem. That said, this update appears to solve the problem that you describe.

Mar 11 2021, 6:27 AM

Mar 10 2021

mckusick added a comment to D29178: UFS SU: handle races on remounts rw<->ro.

In your summary you say "rw<->ro remounts are not atomic, filesystem is accessible by other threads during the process. As result, its internal state is inconsistent. Just blocking writers with suspend is not enough." Can you elaborate on how having other processes reading the filesystem causes trouble?

Mar 10 2021, 4:48 AM

Mar 3 2021

mckusick accepted D29045: Don't use sleeping allocations for ufs dirhash blocks when holding directory vnode.

This seems like a reasonable solution to the problem. The allocation will fail in a few cases where it previously would have succeeded, but hopefully those will be rare. The effect of failing will simply be slower lookups rather than unexpected errors to applications.

Mar 3 2021, 9:19 PM
mckusick accepted D29021: growfs: allow operation on RW-mounted filesystems.

I agree that this change is appropriate.

Mar 3 2021, 9:02 PM

Mar 2 2021

mckusick added a comment to D28999: FFS extattr: fix handling of the tail.

Sorry for the delayed review. This change looks correct to me.

Mar 2 2021, 5:55 PM
mckusick added a comment to D26999: Remove an extra if_ref()..

Sorry for the delayed review. This fix looks correct to me.

Mar 2 2021, 5:54 PM

Feb 25 2021

mckusick added a comment to D28901: buf: Fix the dirtybufthresh check.

It appears that this change should be MFC'ed to 12.

Feb 25 2021, 12:24 AM

Feb 24 2021

mckusick accepted D28901: buf: Fix the dirtybufthresh check.
Feb 24 2021, 6:11 AM

Feb 20 2021

mckusick accepted D28679: vnode: move write cluster support data to inodes..

It's a wrap!

Feb 20 2021, 1:03 AM

Feb 18 2021

mckusick added a comment to D28679: vnode: move write cluster support data to inodes..

I note that lib/libprocstat/libprocstat.c includes ufs/ufs/inode.h but in fact does not need to do so.

Feb 18 2021, 10:53 PM
mckusick accepted D28679: vnode: move write cluster support data to inodes..

Getting rid of _buf_cluster.h was a lot more work than I expected, but is definitely cleaner and has the added bonus of cleaning up some other cruft as well.

Feb 18 2021, 10:49 PM
mckusick accepted D28679: vnode: move write cluster support data to inodes..

The problem is that lib/libprocstat wants to read inodes out of the kernel, so it needs to know their size. Thus I agree that avoiding _buf_cluster.h is hard, and I reluctantly accept that solution.

Feb 18 2021, 6:31 AM

Feb 17 2021

mckusick added a comment to D28679: vnode: move write cluster support data to inodes..

To avoid the _buf_cluster.h file, how about making the inclusion of buf.h in ufs/inode.h and ext2fs/inode.h conditional on #ifdef _KERNEL? Both vfs_cluster.c and fuse_node.h already include buf.h, so they are not an issue. I don't know if msdosfs/denode.h is used outside the kernel, but if so we could make its inclusion of buf.h conditional on #ifdef _KERNEL as well.

Feb 17 2021, 8:32 PM
mckusick accepted D28697: lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK).

All looks good.

Feb 17 2021, 8:09 PM

Feb 16 2021

mckusick added a comment to D28697: lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK).

Overall looks good.

Feb 16 2021, 9:58 PM
mckusick added a comment to D28679: vnode: move write cluster support data to inodes..

Overall looks good. A few inline comments.

Feb 16 2021, 7:27 PM
mckusick accepted D28675: cache: add an introductory comment.

Thanks for rewriting this comment.
I have provided some suggestions for cleanup and/or clarification.

Feb 16 2021, 1:23 AM

Feb 12 2021

mckusick committed R10:8563de2f2799: Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash (authored by mckusick).
Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash
Feb 12 2021, 5:36 AM

Jan 30 2021

mckusick committed R10:1f9ee757d96d: MFC: 8c22cf9b (authored by mckusick).
MFC: 8c22cf9b
Jan 30 2021, 8:21 AM
mckusick committed R10:1aa1ede1fd44: MFC: a63eae6 (authored by mckusick).
MFC: a63eae6
Jan 30 2021, 8:15 AM
mckusick added a reverting change for R10:2d4422e7991a: Eliminate lock order reversal in UFS ffs_unmount().: R10:1aa1ede1fd44: MFC: a63eae6.
Jan 30 2021, 8:15 AM
mckusick added a reverting change for R10:2d4422e7991a: Eliminate lock order reversal in UFS ffs_unmount().: R10:a63eae65ff87: Revert 2d4422e7991a, Eliminate lock order reversal in UFS ffs_unmount()..
Jan 30 2021, 8:02 AM
mckusick committed R10:a63eae65ff87: Revert 2d4422e7991a, Eliminate lock order reversal in UFS ffs_unmount(). (authored by mckusick).
Revert 2d4422e7991a, Eliminate lock order reversal in UFS ffs_unmount().
Jan 30 2021, 8:01 AM

Jan 26 2021

mckusick committed R10:8c22cf9b0997: Fix fsck_ffs incorrectly reporting "CANNOT READ BLK: NNNN" errors. (authored by mckusick).
Fix fsck_ffs incorrectly reporting "CANNOT READ BLK: NNNN" errors.
Jan 26 2021, 7:48 PM

Jan 16 2021

mckusick committed R10:79a5c790bdf0: Eliminate a locking panic when cleaning up UFS snapshots after a (authored by mckusick).
Eliminate a locking panic when cleaning up UFS snapshots after a
Jan 16 2021, 12:33 AM
mckusick committed R10:173779b98f10: Eliminate lock order reversal in UFS when unmounting filesystems (authored by mckusick).
Eliminate lock order reversal in UFS when unmounting filesystems
Jan 16 2021, 12:02 AM

Jan 12 2021

mckusick committed R10:2d4422e7991a: Eliminate lock order reversal in UFS ffs_unmount(). (authored by mckusick).
Eliminate lock order reversal in UFS ffs_unmount().
Jan 12 2021, 12:45 AM

Jan 7 2021

mckusick committed R10:5cc52631b3b8: Rewrite the disk I/O management system in fsck_ffs(8). Other than (authored by mckusick).
Rewrite the disk I/O management system in fsck_ffs(8). Other than
Jan 7 2021, 10:59 PM
mckusick accepted D28008: vfs: fix rangelock range in vn_rdwr() for IO_APPEND.

This change certainly fixes the problem though it write-locks far more than necessary.

Jan 7 2021, 5:18 AM
mckusick committed R10:c8a7a3ffe120: Fix bug in expanding lost+found direct blocks. (authored by mckusick).
Fix bug in expanding lost+found direct blocks.
Jan 7 2021, 12:31 AM

Jan 3 2021

mckusick committed R10:997f81af4316: The fsck_ffs program had previously only been able to expand the size (authored by mckusick).
The fsck_ffs program had previously only been able to expand the size
Jan 3 2021, 6:48 AM

Jan 1 2021

mckusick committed R10:41cf333f9b2a: MFC: Correct and add some comments. (authored by mckusick).
MFC: Correct and add some comments.
Jan 1 2021, 7:59 AM

Dec 31 2020

mckusick committed R10:68dc94c7d314: Correct and add some comments. (authored by mckusick).
Correct and add some comments.
Dec 31 2020, 11:32 PM

Dec 23 2020

mckusick accepted D27731: ffs: Avoid out-of-bounds accesses in the fs_active bitmap.

No change in actual running, but definitely correct change.

Dec 23 2020, 5:31 AM

Dec 18 2020

mckusick committed rS368773: Rename pass4check() to freeblock() and move from pass4.c to inode.c..
Rename pass4check() to freeblock() and move from pass4.c to inode.c.
Dec 18 2020, 11:28 PM

Dec 11 2020

mckusick accepted D27558: ffs: quiet -Wstrict-prototypes.
Dec 11 2020, 7:08 AM

Dec 9 2020

mckusick committed rS368494: MFC of 368396 and 368425..
MFC of 368396 and 368425.
Dec 9 2020, 10:37 PM

Dec 8 2020

mckusick committed rS368425: In ext2fs, BA_CLRBUF is used in ext2_balloc() not UFS_BALLOC()..
In ext2fs, BA_CLRBUF is used in ext2_balloc() not UFS_BALLOC().
Dec 8 2020, 12:49 AM

Dec 6 2020

mckusick committed rS368396: Document the BA_CLRBUF flag used in ufs and ext2fs filesystems..
Document the BA_CLRBUF flag used in ufs and ext2fs filesystems.
Dec 6 2020, 8:50 PM
mckusick accepted D27457: ufs: handle two more cases of possible VNON vnode returned from VFS_VGET()..

These updates look needed and correct.

Dec 6 2020, 12:36 AM

Nov 29 2020

mckusick accepted D27353: ffs: do not read full direct blocks if they are going to be overwritten..

Good to go.

Nov 29 2020, 10:12 PM

Nov 25 2020

mckusick requested changes to D27353: ffs: do not read full direct blocks if they are going to be overwritten..

The sentiment is correct, but the logic fixes noted are needed.

Nov 25 2020, 4:39 AM

Nov 20 2020

mckusick added a comment to D27269: msdosfs: suspend around umount or remount rw->ro..

Belatedly, these changes look good and in particular get rid of VOP_SYNC(..., MNT_WAIT).

Nov 20 2020, 9:14 PM
mckusick committed rS367911: Only attempt a VOP_UNLOCK() when the vn_lock() has been successful..
Only attempt a VOP_UNLOCK() when the vn_lock() has been successful.
Nov 20 2020, 8:22 PM

Nov 17 2020

mckusick committed rS367751: MFC of 367045..
MFC of 367045.
Nov 17 2020, 6:04 AM
mckusick committed rS367750: MFC of 367035..
MFC of 367035.
Nov 17 2020, 6:00 AM
mckusick committed rS367749: MFC of 340927 and 367034..
MFC of 340927 and 367034.
Nov 17 2020, 5:48 AM

Nov 16 2020

mckusick accepted D27225: Make MAXPHYS tunable..

I have wanted this change for a long time. Thanks for doing it.

Nov 16 2020, 10:14 PM

Nov 14 2020

mckusick added a comment to D26912: RFC: Disk I/O priority support.

Have we reached any conclusions about whether to do any of the ideas suggested in this phabricator thread?

Nov 14 2020, 10:53 PM · cam
mckusick added a comment to D26964: ufs: end-of-life truncate should depend on dirent write.

I have thought about how to preserve the performance behavior aspects of r209717, but I haven't come up with how to do it. Here are a few thoughts.

  • There's the patch here that uses i_nlink instead of i_effnlink, which I think solves the problem I set out to solve, but probably reopens the problem from r209717.
  • We could do the above and add a chicken switch, like vfs.ffs.doasyncfree (vfs.ffs.doeagertrunc?).
  • I think, most ideally, we would just make all of the writes that happen in ffs_truncate for the end-of-life truncate depend on i_nlink reaching zero (via softdep, either directly or indirectly). That way the thread doing the remove is still usually the one calling ffs_truncate and can be throttled. However, I don't really know how to do this in code, and I'm unsure whether it's feasible.
  • As a hack, we could do some kind of proxy throttle. This would be something like, if we are inactivating a large file, or we are softdep_excess_items(D_DIRREM), then do some process_worklist_item() to do some of the flusher work.

Thoughts on how to proceed?

Nov 14 2020, 10:49 PM

Nov 6 2020

mckusick added a comment to D27054: Suspend all writeable local filesystems on power suspend..

Two questions.

Nov 6 2020, 6:09 PM

Nov 1 2020

mckusick accepted D26136: Handle LoR in flush_pagedep_deps()..

All looks good.

Nov 1 2020, 4:59 PM

Oct 31 2020

mckusick added a comment to D26964: ufs: end-of-life truncate should depend on dirent write.

In D26964#603222, @rlibby wrote:

In D26964#603106, @mckusick wrote:

I do like your new approach better as it is much clearer what is going on. I think that it may be sufficient to just make the last change in your delta where you switch it only doing the truncation when i_nlink falls to zero. The other actions taken when i_effnlink falls to zero should still be OK to be done then. As before, getting Peter Holm's testing is important.

Okay. I'll post the diff as you suggest, but I don't quite understand. Those other actions seem just to be doing a vn_start_secondary_write(). Is this so that when i_effnlink <= 0 but i_nlink > 0 and we have one of the IN_ change flags, we use V_NOWAIT for the vn_start_secondary_write and possibly defer with VI_OWEINACT, versus the V_WAIT we might use just above UFS_UPDATE?

My recollection is that the i_effnlink <= 0 while i_nlink > 0 case got added because we were somehow missing getting VI_OWEINACT set. But looking through the code, I just cannot come up with a scenario where that is the case. So your original proposed change is probably correct and would save some unnecessary extra work.

Oct 31 2020, 11:07 PM
mckusick added a comment to D26136: Handle LoR in flush_pagedep_deps()..

This is an impressive piece of work, what a lot of effort to fix this LOR. Took a couple of hours to review, but overall looks good. A couple of minor inline comments.

Oct 31 2020, 10:36 PM

Oct 30 2020

mckusick added a comment to D26964: ufs: end-of-life truncate should depend on dirent write.
In D26964#601553, @kib wrote:

I am not sure about this approach. Note that vref()/usecount reference does not prevent the vnode reclaim. So for instance force umount results in vgone() which does inactivation and reclaim regardless of the active state (or rather does inactivation if the vnode is active). In this case, it seems to not fix the issue.

Also, typically SU does not rely on the _vnode_ state, workitems are attached to the metadata blocks owned by devvp.

Oct 30 2020, 11:21 PM
mckusick added a comment to D26964: ufs: end-of-life truncate should depend on dirent write.

Okay. I am definitely open to better solutions, especially if they fit the paradigm better.

In ufs_inactive, why do we look at i_effnlink at all? Why not just base the truncate on i_nlink? I think this could be another method of delaying the end-of-life truncate until after the write of the dirent, but without relying on a vnode reference. I think that might look like this:
https://github.com/rlibby/freebsd/commit/3b62c248f3377c47fb4bfa65a19b0f5390caec37
(Passes stress2's fs.sh and issue repros above.)

Oct 30 2020, 11:17 PM

Oct 27 2020

mckusick added a comment to D26964: ufs: end-of-life truncate should depend on dirent write.

You note that "If we then crashed before the dirent write, we would recover with a state where the file was still linked, but had been truncated to zero. The resulting state could be considered corruption." The filesystem is not corrupted in the sense that it needs fsck to be run to clean it up. We simply end up with an unexpected result.

Oct 27 2020, 12:55 AM

Oct 25 2020

mckusick committed rS367045: Use proper type (ino_t) for inode numbers to avoid improper sign extention.
Use proper type (ino_t) for inode numbers to avoid improper sign extention
Oct 25 2020, 9:04 PM
mckusick committed rS367035: Filesystem utilities that modify the filesystem (growfs(8), tunefs(8),.
Filesystem utilities that modify the filesystem (growfs(8), tunefs(8),
Oct 25 2020, 1:36 AM
mckusick committed rS367034: Various new check-hash checks have been added to the UFS filesystem.
Various new check-hash checks have been added to the UFS filesystem
Oct 25 2020, 12:44 AM

Oct 23 2020

mckusick added a comment to D26912: RFC: Disk I/O priority support.

It would be trivial to request high priority for synchronous writes in bwrite() and if desired synchronous reads in bread(). That would have effects for several filesystems.

Oct 23 2020, 6:16 AM · cam

Oct 18 2020

mckusick added a comment to D26136: Handle LoR in flush_pagedep_deps()..

Added a couple of inline comments.

Oct 18 2020, 10:46 PM

Oct 3 2020

mckusick accepted D26596: ufs: restore uniqueness of st_dev.
Oct 3 2020, 8:27 PM

Sep 26 2020

mckusick committed rS366187: MFS of 366163 from stable/12 which is MFC of 365992 from head..
MFS of 366163 from stable/12 which is MFC of 365992 from head.
Sep 26 2020, 9:46 PM

Sep 25 2020

mckusick added a comment to D26511: Do not leak B_BARRIER..
In D26511#591360, @kib wrote:

The block being written with the barrier is a newly allocated block of inodes. The write is done asynchronously and the cylinder group is then updated to reflect that the additional inodes are available. The reason for the barrier is so that the cylinder group buffer with the newly expanded inode map cannot be written before the newly allocated set of inodes.

Since the incore cylinder group includes the newly created inodes, some other thread can come along and try to use one of those newly allocated inodes. But it will block on the inode buffer until its write has completed.

The bug in this instance is that there is an assumption that the write cannot fail. Clearly this is a bad assumption. So, the correct fix is to not depend on the barrier write, but rather to create a callback that updates the cylinder group once the write of the new inodes has completed successfully.

In the specific case of kostik1316, doing that would deadlock the machine. The problem is that the CoW write returned ENOSPC, which means that there is no way to correctly free space on the volume. I am not sure what to do there. Most likely any other write would also return ENOSPC, so in fact consistency of the volume is not too badly broken if we just fail there. If we start tracking the write as a dependency for the cg write, then all those buffers add to the dirty space, which eventually hangs the buffer subsystem.

I believe (based on Peter's testing) that just erroring the write allows us to unmount.

What we are trying to do where the barrier write is used in UFS is expand the number of available inodes. It is OK to abandon the write of the zero'ed out inode block, as the expansion can be put off until later. But we must not update the cylinder group to say that these new inodes are available if we have not been able to zero them out. As it is currently written, the cylinder group is updated but the on-disk inodes are not zero'ed out. If the unmount succeeds in writing out the dirty buffers, we will end up with a corrupted filesystem because fsck_ffs will try to check the non-zero'ed out inodes and raise numerous errors trying to correct the inconsistencies that arise from the random data in the uninitialized inode block.

Sep 25 2020, 6:36 PM
mckusick committed rS366163: MFC of 365992.
MFC of 365992
Sep 25 2020, 5:14 PM
mckusick added a comment to D26511: Do not leak B_BARRIER..
In D26511#590951, @kib wrote:
In D26511#590229, @kib wrote:

As long as the buffer remains locked until it is successfully written, this should be fine.

Sorry, I do not fully understand your note. Do you mean that it is fine to have B_BARRIER set as long as the buffer is not unlocked? If not, could you please clarify.

We are depending on this being written before it can be used. If it were unlocked, then some other thread could get it and make use of it. See comment above use of babarrierwrite in sys/ufs/ffs/ffs_alloc.c.

I still do not understand what you mean by 'use'. In the kostik1316 dump, the most probable scenario was that babarrierwrite() for the inode block failed with ENOSPC, and then bufdone() did brelse() on the buffer. So it is unlocked, but this is somewhat unrelated to the issue of leaking B_BARRIER.

Sep 25 2020, 1:03 AM

Sep 23 2020

mckusick added a comment to D26511: Do not leak B_BARRIER..
In D26511#590229, @kib wrote:

As long as the buffer remains locked until it is successfully written, this should be fine.

Sorry, I do not fully understand your note. Do you mean that it is fine to have B_BARRIER set as long as the buffer is not unlocked? If not, could you please clarify.

We are depending on this being written before it can be used. If it were unlocked, then some other thread could get it and make use of it. See comment above use of babarrierwrite in sys/ufs/ffs/ffs_alloc.c.

Sep 23 2020, 10:25 PM

Sep 22 2020

mckusick committed rS365992: Add missing cylinder group check-hash updates when doing large expansions.
Add missing cylinder group check-hash updates when doing large expansions
Sep 22 2020, 3:57 AM

Sep 21 2020

mckusick accepted D26511: Do not leak B_BARRIER..

As long as the buffer remains locked until it is successfully written, this should be fine.

Sep 21 2020, 10:38 PM
mckusick committed rS365971: MFC of 365700.
MFC of 365700
Sep 21 2020, 7:26 PM

Sep 19 2020

mckusick committed rS365919: Update the libufs cgget() and cgput() interfaces to have a similar.
Update the libufs cgget() and cgput() interfaces to have a similar
Sep 19 2020, 10:49 PM
mckusick committed rS365912: The fsdb(8) utility uses the fsck_ffs(8) disk I/O interfaces, so.
The fsdb(8) utility uses the fsck_ffs(8) disk I/O interfaces, so
Sep 19 2020, 8:06 PM

Sep 13 2020

mckusick committed rS365700: In the newfs(8) utility, use the more appropriate sbwrite() and cgwrite().
In the newfs(8) utility, use the more appropriate sbwrite() and cgwrite()
Sep 13 2020, 10:58 PM

Sep 12 2020

mckusick accepted D26375: Move td_softdep_cleanup() from userret() to ast().

This change looks reasonable to me, as at least for amd64 every path out of the kernel appears to check TDF_ASTPENDING and call ast() if set.

Sep 12 2020, 6:14 AM

Sep 1 2020

mckusick accepted D26289: Remove risky compatability with old kernels..

It was a horrible hack at the time and should have been tossed decades ago.

Sep 1 2020, 11:17 PM

Aug 31 2020

mckusick committed rS364983: MFC of 364895.
MFC of 364895
Aug 31 2020, 5:25 AM