I agree with mark and alc comments.
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Sep 2 2021
Good catch.
Aug 21 2021
Aug 20 2021
This looks like a correct adaptation of the UFS code to the MSDOS filesystem implementation. Given that I have been struggling with getting UFS rename correct since the mid-1980s I am sure there are still bugs lurking in this implementation, but at least they are closer to being the same bugs in both filesystem implementations.
I concur with kib's suggestion. Otherwise looks like a good use of vn_lock_pair().
Sorry for the delay in responding, been vacationing this week. Looks good to go.
Aug 13 2021
A couple more comments.
Aug 12 2021
Overall looks good. One suggestion, one possible nit.
Jul 29 2021
Jul 19 2021
I am fine with the structure of this patch. I do ask that you await passing Peter Holm's tests before committing.
This reminds me of something that occurred to me while I was writing the code though: these flags are effectively prevented from being exposed to userspace because they're outside the 32-bit range of unmount_args->flags.
It seems a bit fragile to rely on that though...should we have something like an MNT_KERNEL mask for flags that are only allowed to be passed from kernel-initiated operations, and return EINVAL for any userspace request that includes any of them?
In theory MNT_VISFLAGMASK are externally visible and MNT_CMDFLAGS are internal only. The MNT_VISFLAGMASK flags are the only ones returned in a statfs(2) call but are not checked on system calls as far as I can tell.
Adding some clarity / enforcement would likely be a good idea, but not as part of these changes.
In D29754#686586, @ota_j.email.ne.jp wrote:I tested the 2 kernel changes both applied. In short, it helps since the 2nd refresh.
With or without the kernel changes, systat takes long time to return from the call. During that period, top, ps and few others freeze, although xosview draws memory and swap usage fine. Without the kernel changes, each systat refresh takes same amount of time and causes some others to hung. On the other hand, with the kernel changes, systat redraws significantly faster for the 2nd refresh and thereafter and also others like top don't hung.
So, the first call is an expensive call.
Jul 17 2021
Suggestion on how to run forcible unmount test on UFS.
Jul 14 2021
@jah - are you ready to have Peter Holm test these changes? If so, I will enlist his help.
Jul 13 2021
Jul 12 2021
For the purposes of this discussion, I think you should just go with your original proposal as it is clearly a useful step forward.
Jul 10 2021
Latest change looks good.
Code wise it looks good. I am not familiar with the exact layout on disk, but given cy@ has tested it and found it to work, I am happy that those aspects of the code are correct.
Jun 30 2021
Basing a decision on just the superblock magic number seems a bit weak, but that is what has been used traditionally and it worked, so with that in mind, this change seems reasonable.
Were this still in use, I would recommend more robust checks. For example verifying the check-hash for UFS2 filesystems that have them.
Jun 28 2021
The tests that Peter Holm ran (back in March) showed that buildworld ran about 1.5% slower with this change than without it. We were not able to figure out why that slowdown occurred, but unless this can be done in a way that at least has the same running time, I do not think we should do it.
Maybe it'd be nice if the read/write fd's were handled by libufs, but then again maybe not - I couldn't say without studying the code a bit more.
Jun 15 2021
Looks good to go.
Good catch on Mark's part.
Jun 11 2021
People experiencing this problem report that running the sync(8) command cleared up the slowdown. The reason that it helps is because the problem is triggered when the journal is low on space to make new entries. The journal is written as a circular queue, so running low on space means that the head of the list is nearing the tail of the list. The only way to make more space in the journal is to flush the oldest entries in it so that the tail can advance. What we do in softdep_prelink() is to flush out the entry that we have at hand. But it is almost never causes the oldest entry in the journal to be flushed. So the problem persists and we are just going to keep hitting it until some event causes the oldest entry to be flushed. The reason that sync(8) is so helpful is that it causes the older entries in the journal to be cleared which makes the problem go away.
Jun 10 2021
This looks good and based on testing appears to mitigate effectively. I also recommend changing SUJ_MAX in sys/ufs/ffs/fs.h to 512Mb or at least 256Mb.
Jun 7 2021
In D30041#682108, @bz wrote:FYI I am running the current version now as well. I'll see if my occasional 100% Syncer for a few minutes go away (or get better).
Do you want to track this here or should I follow-up on the PR?
Jun 2 2021
May 31 2021
Good catch, definitely needed.
May 30 2021
This looks good to me.
May 29 2021
If mprotect(2) can force the breakup of pages suitable for superpage mappings, then it opens a vector for a denial of performance by mapping and breaking up all the superpages held by the operating system.
Given that you rolled back the two changes, your suggested change of just adding a bufinit() in the gjournal case is the easiest and minimal change to solving this bug.
In D30537#685871, @jrtc27 wrote:In D30537#685867, @mckusick wrote:I think a better change is to simply preinitialize fs_bsize in sblock_init to MAXBSIZE which will be valid for all UFS filesystems so that bufinit will do a resonable allocation.
diffs392 BDownloadIf you're doing that, why not just make bufinit pass MAXBSIZE to malloc instead rather than initialising the in-memory superblock with a dummy value? The latter seems dangerous, it's nice to be able to have malloc junk catch things that read parts of the superblock that haven't been read from disk yet.
I think a better change is to simply preinitialize fs_bsize in sblock_init to MAXBSIZE which will be valid for all UFS filesystems so that bufinit will do a resonable allocation.
I travelled to the north coast where we have been offline for two days, but now back online. Will look into this directly.
May 22 2021
Ed, thanks for trying to do the close. Seems like a bug in Phabricator, but not sure to whom to report it. Meanwhile, abandonment seems like the best option.
May 21 2021
Yes, that is the correct fix.
In D30041#682108, @bz wrote:FYI I am running the current version now as well. I'll see if my occasional 100% Syncer for a few minutes go away (or get better).
Do you want to track this here or should I follow-up on the PR?
May 20 2021
May 19 2021
Expanding the journal is just to reduce the incidence of running into this problem. It is still necessary to fix the problem and this looks to be a viable approach.
Looks good to go.
May 18 2021
Other than one requested change, this looks good.
May 17 2021
I have been working with folks on these two bug reports which I suspect are caused by this problem:
This change has now been MFC'ed to 12-stable and 13-stable, so can be closed.
May 11 2021
This now follows FreeBSD style.
forgot to approve before committing.
May 9 2021
I think that this change provides a useful cleanup of the VM system. As noted it makes it easier to create new types of pagers.
May 7 2021
This change does accurately determine when a vnode does not need to be sync'ed. Does it help solve the stalling problem?
May 5 2021
Legally "All Rights Reserved." is no longer required, so we no longer add it to copyright messages.
Apr 27 2021
Apr 26 2021
Apr 19 2021
This change is a useful improvement.
Apr 18 2021
Apr 14 2021
Apr 13 2021
Not having seen this review, I started a review at https://reviews.freebsd.org/D29754. I am fine with abandoning my review though I do think you should consider incorporating my version of the manual page.
Apr 9 2021
Apr 2 2021
Given that -pg support has been withdrawn from the kernel, it is sensible to remove kgmon(8).
Mar 31 2021
Your change looks good.
Mar 26 2021
In D28856#659254, @kib wrote:In D28856#659176, @mckusick wrote:The cost of an extra allocation versus the overhead of having to handle low-memory situations and building up and tearing down zones seems like a bad tradeoff.
Why? It is reverse, IMO: the normal system operation performs a lot of vnode allocations and deallocations, while lowmem is rare condition, where we do not worry about system performance at all, only about system liveness. Optimizing for normal path is right, optimizing for lowmem handler is not.
The purpose of this change is to reduce the amount of memory dedicated to vnodes.
In D28856#659141, @kib wrote:In D28856#659119, @mckusick wrote:There is still overhead as the zone memory has to be cleaned up (locks disposed of) and then new memory initialized (zeroed, lists and queues initialized, locks initialized, etc). Also there is extra work done detecting that we have hit these conditions and making them happen. In general we are going to have more memory tied up and do more work moving it between the zones. If we just had one zone for vnodes and another zone for bufobjs we could avoid all of this. In all likelyhood we would only need occational freeing of memory in the bufobj zone.
I am curious why you are so resistant to having a single vnode zone and a single bufobj zone?
With either (vnode + bufobj, vnode - bufobj), or (vnode - bufobj, bufobj) we still have two zones, and on low memory condition two zones needs to be drained. But for the separate bufobj zone, we additionally punish filesystems that use buffers. Instead of single allocation for vnode, they have to perform two, and also they have to perform two frees.
We have a similar structure in namecache, where {short,long}x{timestamp, no timestamp} allocations use specific zones, instead of allocating nc + path segment + timestamp.
Mar 25 2021
I understand that there cannot be more than maxvnodes. What I am concerned about is how much memory is tied up in the two zones. In this example as vnlru() is freeing (vnodes without bufobjs) into the (vnodes without bufobjs) zone. It then allocates memory from (vnodes+bufobj) from the (vnodes+bufobj) zone. That allocation cannot use the memory in the (vnodes without bufobj) zone. So when we are done we have enough memory locked down in the two zones to support 2 * maxvnodes. This is much more wasteful of memory than having a single zone for pure vnodes and a second zone that holds bufobjs each of which will be limited to maxvnodes in size.
Mar 24 2021
In D28856#658811, @kib wrote:In D28856#658765, @mckusick wrote:No, we do not have two pools of vnodes after this change. We have two zones, but zones do not keep vnodes, they cache partially initialized memory for vnodes. Either the current single zone, or two zones after applying the patch, do not have any limits to grow that size of the cache. But it is a cache of memory and not vnodes. Without or with the patch, only maxvnodes constructed vnodes can exist in the system. The constructed vnode is struct vnode which is correctly initialized and has identity belonging to some filesystem, or reclaimed. [In fact in some cases getnewvnodes() is allowed to ignore the limit of maxvnodes, but this is not relevant to the discussion].
Let me try again to explain my perceived issue.
Under this scheme we have two zones. If there is a lot of ZFS activity, the vnode-only zone can be filled with maxvnodes worth of entries. Now suppose activity in ZFS drops but activity in NFS rises. Now the zone with vnodes + bufobj can fill to maxvnodes worth of memory. As I understand it we do not reclaim any of the memory from the vnode-only zone, it just sits there unable to be used. Is that correct?
No, this is not correct. Total number of vnodes (summary of both types) is limited by maxvnodes. After the load shifted from zfs to nfs in your scenario, vnlru starts reclaiming vnodes in LRU order from the global free list, freeing (vnode without bufobj)s, and most of the allocated vnodes would be from the (vnode+bufobj) zone. We do not allow more that maxvnodes total vnodes allocated in the system.
No, we do not have two pools of vnodes after this change. We have two zones, but zones do not keep vnodes, they cache partially initialized memory for vnodes. Either the current single zone, or two zones after applying the patch, do not have any limits to grow that size of the cache. But it is a cache of memory and not vnodes. Without or with the patch, only maxvnodes constructed vnodes can exist in the system. The constructed vnode is struct vnode which is correctly initialized and has identity belonging to some filesystem, or reclaimed. [In fact in some cases getnewvnodes() is allowed to ignore the limit of maxvnodes, but this is not relevant to the discussion].
Mar 22 2021
In D28856#657865, @kib wrote:In D28856#657658, @mckusick wrote:Three inline comments / questions.
In Sun's implementation of vnodes each filesystem type had its own pool. When I adopted the vnode idea into BSD, I created generic vnodes that could be used by all filesystems so that they could move between filesystems based on demand.
This design reverts back to vnodes usable by ZFS and a few other filesystems and vnodes for NFS, UFS, and most other filesystems. This will be a win for systems that run just ZFS. But, systems that are also running NFS or UFS will not be able to share vnode memory and will likely have a bigger memory footprint than if they stuck with the single type of vnode.
There has been no attempt to fix vlrureclaim() so we can end up reclaiming a bunch of vnodes of the wrong type thus reducing the usefulness of the cache without recovering any useful memory. In the worst case, we can end up with each of the two vnode pools using maxvnodes worth of memory.
We probably need to have a separate maxvnodes for each pool. Alternately we could keep track of how many vnodes are in each pool and limit the two pools to a total of maxvnodes. That of course begs the question of how we decide to divide the quota between the two pools. At a minimum, vlrureclaim() needs to have a way to decide which pool needs to have vnodes reclaimed.
We do not have two pools of vnodes after the patch. For very long time, we free vnode after its hold count goes to zero (mod SMR complications).
Three inline comments / questions.
Mar 16 2021
Mar 14 2021
Mar 12 2021
Mar 11 2021
Flag definitions look good.
Breakdown of commits is excellent.
Changes should resolve the problem.
It would help if I had looked at your commit logs before my previous comment. You have in fact separated everything out appropriately.
Nearly all of the changes in ffs_softdep.c are code cleanups and not related to this bug fix. I would prefer to see the code cleanups in a separate commit so that it is easier to see the changes that are needed to fix this problem. That said, this update appears to solve the problem that you describe.