Thu, Jul 29
Mon, Jul 19
I am fine with the structure of this patch. I do ask that you wait for Peter Holm's tests to pass before committing.
This reminds me of something that occurred to me while I was writing the code though: these flags are effectively prevented from being exposed to userspace because they're outside the 32-bit range of unmount_args->flags.
It seems a bit fragile to rely on that, though. Should we have something like an MNT_KERNEL mask for flags that are only allowed to be passed from kernel-initiated operations, and return EINVAL for any userspace request that includes any of them?
In theory MNT_VISFLAGMASK are externally visible and MNT_CMDFLAGS are internal only. The MNT_VISFLAGMASK flags are the only ones returned in a statfs(2) call but are not checked on system calls as far as I can tell.
Adding some clarity / enforcement would likely be a good idea, but not as part of these changes.
Sat, Jul 17
Suggestion on how to run forcible unmount test on UFS.
Wed, Jul 14
@jah - are you ready to have Peter Holm test these changes? If so, I will enlist his help.
Tue, Jul 13
Mon, Jul 12
For the purposes of this discussion, I think you should just go with your original proposal as it is clearly a useful step forward.
Sat, Jul 10
Latest change looks good.
Code wise it looks good. I am not familiar with the exact layout on disk, but given cy@ has tested it and found it to work, I am happy that those aspects of the code are correct.
Jun 30 2021
Basing a decision on just the superblock magic number seems a bit weak, but that is what has been used traditionally and it worked, so with that in mind, this change seems reasonable.
Were this still in use, I would recommend more robust checks. For example verifying the check-hash for UFS2 filesystems that have them.
Jun 28 2021
The tests that Peter Holm ran (back in March) showed that buildworld ran about 1.5% slower with this change than without it. We were not able to figure out why that slowdown occurred, but unless this can be done in a way that at least has the same running time, I do not think we should do it.
Maybe it'd be nice if the read/write fd's were handled by libufs, but then again maybe not - I couldn't say without studying the code a bit more.
Jun 15 2021
Looks good to go.
Good catch on Mark's part.
Jun 11 2021
People experiencing this problem report that running the sync(8) command cleared up the slowdown. The reason that it helps is that the problem is triggered when the journal is low on space for new entries. The journal is written as a circular queue, so running low on space means that the head of the list is nearing the tail of the list. The only way to make more space in the journal is to flush the oldest entries in it so that the tail can advance. What we do in softdep_prelink() is to flush out the entry that we have at hand. But that almost never causes the oldest entry in the journal to be flushed. So the problem persists and we are just going to keep hitting it until some event causes the oldest entry to be flushed. The reason that sync(8) is so helpful is that it causes the older entries in the journal to be cleared, which makes the problem go away.
Jun 10 2021
This looks good and, based on testing, appears to mitigate effectively. I also recommend changing SUJ_MAX in sys/ufs/ffs/fs.h to 512MB, or at least 256MB.
Jun 7 2021
Jun 2 2021
May 31 2021
Good catch, definitely needed.
May 30 2021
This looks good to me.
May 29 2021
If mprotect(2) can force the breakup of pages suitable for superpage mappings, then it opens a vector for a denial of performance by mapping and breaking up all the superpages held by the operating system.
Given that you rolled back the two changes, your suggested change of just adding a bufinit() in the gjournal case is the easiest and minimal change to solving this bug.
I think a better change is to simply preinitialize fs_bsize in sblock_init to MAXBSIZE, which is valid for all UFS filesystems, so that bufinit will do a reasonable allocation.
I traveled to the north coast, where we were offline for two days, but we are now back online. I will look into this directly.
May 22 2021
Ed, thanks for trying to do the close. Seems like a bug in Phabricator, but not sure to whom to report it. Meanwhile, abandonment seems like the best option.
May 21 2021
Yes, that is the correct fix.
May 20 2021
May 19 2021
Expanding the journal is just to reduce the incidence of running into this problem. It is still necessary to fix the problem and this looks to be a viable approach.
Looks good to go.
May 18 2021
Other than one requested change, this looks good.
May 17 2021
I have been working with folks on these two bug reports which I suspect are caused by this problem:
This change has now been MFC'ed to 12-stable and 13-stable, so can be closed.
May 11 2021
This now follows FreeBSD style.
Forgot to approve before committing.
May 9 2021
I think that this change provides a useful cleanup of the VM system. As noted it makes it easier to create new types of pagers.
May 7 2021
This change does accurately determine when a vnode does not need to be sync'ed. Does it help solve the stalling problem?
May 5 2021
Legally "All Rights Reserved." is no longer required, so we no longer add it to copyright messages.
Apr 27 2021
Apr 26 2021
Apr 19 2021
This change is a useful improvement.
Apr 18 2021
Apr 14 2021
Apr 13 2021
Not having seen this review, I started a review at https://reviews.freebsd.org/D29754. I am fine with abandoning my review though I do think you should consider incorporating my version of the manual page.
Apr 9 2021
Apr 2 2021
Given that -pg support has been withdrawn from the kernel, it is sensible to remove kgmon(8).
Mar 31 2021
Your change looks good.
Mar 26 2021
The purpose of this change is to reduce the amount of memory dedicated to vnodes.
Mar 25 2021
I understand that there cannot be more than maxvnodes. What I am concerned about is how much memory is tied up in the two zones. In this example, vnlru() frees vnodes without bufobjs into the (vnodes without bufobjs) zone. It then allocates (vnode+bufobj) memory from the (vnode+bufobj) zone. That allocation cannot use the memory in the (vnodes without bufobjs) zone. So when we are done, we have enough memory locked down in the two zones to support 2 * maxvnodes. This is much more wasteful of memory than having a single zone for pure vnodes and a second zone that holds bufobjs, each of which will be limited to maxvnodes in size.
Mar 24 2021
No, we do not have two pools of vnodes after this change. We have two zones, but zones do not keep vnodes; they cache partially initialized memory for vnodes. Neither the current single zone nor the two zones after applying the patch has any limit on the size of that cache. But it is a cache of memory, not of vnodes. With or without the patch, only maxvnodes constructed vnodes can exist in the system. A constructed vnode is a struct vnode that is correctly initialized and either has an identity belonging to some filesystem or is reclaimed. [In fact, in some cases getnewvnodes() is allowed to ignore the maxvnodes limit, but this is not relevant to the discussion.]
Mar 22 2021
Three inline comments / questions.
Mar 16 2021
Mar 14 2021
Mar 12 2021
Mar 11 2021
Flag definitions look good.
Breakdown of commits is excellent.
Changes should resolve the problem.
It would help if I had looked at your commit logs before my previous comment. You have in fact separated everything out appropriately.
Nearly all of the changes in ffs_softdep.c are code cleanups and not related to this bug fix. I would prefer to see the code cleanups in a separate commit so that it is easier to see the changes that are needed to fix this problem. That said, this update appears to solve the problem that you describe.
Mar 10 2021
In your summary you say "rw<->ro remounts are not atomic, filesystem is accessible by other threads during the process. As result, its internal state is inconsistent. Just blocking writers with suspend is not enough." Can you elaborate on how having other processes reading the filesystem causes trouble?
Mar 3 2021
This seems like a reasonable solution to the problem. The allocation will fail in a few cases where it previously would have succeeded, but hopefully those will be rare. The effect of failing will simply be slower lookups rather than unexpected errors to applications.
I agree that this change is appropriate.
Mar 2 2021
Sorry for the delayed review. This change looks correct to me.
Sorry for the delayed review. This fix looks correct to me.
Feb 25 2021
It appears that this change should be MFC'ed to 12.
Feb 24 2021
Feb 20 2021
It's a wrap!
Feb 18 2021
I note that lib/libprocstat/libprocstat.c includes ufs/ufs/inode.h but in fact does not need to do so.
Getting rid of _buf_cluster.h was a lot more work than I expected, but is definitely cleaner and has the added bonus of cleaning up some other cruft as well.