Page MenuHomeFreeBSD

tmpfs: Account for whiteouts during rename/rmdir
AcceptedPublic

Authored by jah on Jul 17 2024, 2:50 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Sep 1, 4:57 PM
Unknown Object (File)
Sun, Sep 1, 3:36 PM
Unknown Object (File)
Sun, Sep 1, 3:11 PM
Unknown Object (File)
Sun, Sep 1, 10:42 AM
Unknown Object (File)
Sun, Sep 1, 10:42 AM
Unknown Object (File)
Sun, Sep 1, 7:27 AM
Unknown Object (File)
Sat, Aug 31, 12:15 AM
Unknown Object (File)
Thu, Aug 29, 11:48 AM
Subscribers

Details

Summary

The existing tmpfs implementation will return ENOTEMPTY for VOP_RMDIR,
or for the destinatiopn directory of VOP_RENAME, for any case in which
the directory is non-empty, even if the directory only contains
whiteouts.

Fix this by tracking total whiteout dirent allocation separately for
each directory, only returning ENOTEMPTY if the total allocation of
dirents is greater than the total whiteout allocation. This addresses
"directory entry not empty" failures seen on some recently-added
unionfs stress2 tests which use tmpfs as a base-layer filesystem.

A separate issue for independent consideration is that unionfs' default
behavior when deleting files or directories is to create whiteouts even
when it does not truly need to do so.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 59011
Build 55898: arc lint + arc unit

Event Timeline

jah requested review of this revision.Jul 17 2024, 2:50 AM
  • Move tmpfs_dir_clear() calls after error checking and detachment from parent
  • Move clearing of rename target dir to more logical location
This revision is now accepted and ready to land.Jul 17 2024, 6:06 AM

This also matches the UFS semantic, which considers directory empty if it consists only of whiteouts.

I've left a pretty comprehensive comment about why we would currently prefer these semantics in the generic vn_dir_check_empty().

I'm not sure however if we really want this behavior in the future, regardless of the filesystem. Here, the emptyness test is a guard for removing a directory, which is a different situation than considering whether a directory is empty to mount over it (which is the only situation in which vn_dir_check_empty() is called currently). In the latter, we just ensure we are not shadowing data (and this cannot affect some unionfs using the filesystem of the directory mounted upon). In the present situation, the test is used to decide whether to remove a directory and, although it doesn't contain files, it does contain data in the form of whiteout entries. In this light, removing the whiteouts silently seems odd.

I suspect that removing a directory with whiteouts was allowed for UFS only because not doing so would confuse most users not aware of whiteouts, as these are not printed by a regular ls. I doubt there was any other reason than that. It appears the test on WINO wasn't in 4.4BSD and was introduced in the Lite/2 commit (996c772f581f56248, r22521). Adding @mckusick as he might have some recollection of what lead to this decision.

I've left a pretty comprehensive comment about why we would currently prefer these semantics in the generic vn_dir_check_empty().

I'm not sure however if we really want this behavior in the future, regardless of the filesystem. Here, the emptyness test is a guard for removing a directory, which is a different situation than considering whether a directory is empty to mount over it (which is the only situation in which vn_dir_check_empty() is called currently). In the latter, we just ensure we are not shadowing data (and this cannot affect some unionfs using the filesystem of the directory mounted upon). In the present situation, the test is used to decide whether to remove a directory and, although it doesn't contain files, it does contain data in the form of whiteout entries. In this light, removing the whiteouts silently seems odd.

I suspect that removing a directory with whiteouts was allowed for UFS only because not doing so would confuse most users not aware of whiteouts, as these are not printed by a regular ls. I doubt there was any other reason than that. It appears the test on WINO wasn't in 4.4BSD and was introduced in the Lite/2 commit (996c772f581f56248, r22521). Adding @mckusick as he might have some recollection of what lead to this decision.

I've left a pretty comprehensive comment about why we would currently prefer these semantics in the generic vn_dir_check_empty().

I'm not sure however if we really want this behavior in the future, regardless of the filesystem. Here, the emptyness test is a guard for removing a directory, which is a different situation than considering whether a directory is empty to mount over it (which is the only situation in which vn_dir_check_empty() is called currently). In the latter, we just ensure we are not shadowing data (and this cannot affect some unionfs using the filesystem of the directory mounted upon). In the present situation, the test is used to decide whether to remove a directory and, although it doesn't contain files, it does contain data in the form of whiteout entries. In this light, removing the whiteouts silently seems odd.

I suspect that removing a directory with whiteouts was allowed for UFS only because not doing so would confuse most users not aware of whiteouts, as these are not printed by a regular ls. I doubt there was any other reason than that. It appears the test on WINO wasn't in 4.4BSD and was introduced in the Lite/2 commit (996c772f581f56248, r22521). Adding @mckusick as he might have some recollection of what lead to this decision.

This behavior seems odd to me as well, but to be honest the entire whiteout concept strikes me as odd, and likely to produce unexpected user-visible results (at least in various corner cases) regardless of the policy chosen here.
I therefore simply used the UFS implementation as my guide in determining what to do for tmpfs, so that we at least don't suffer from such inconsistent behavior between different whiteout-supporting filesystems.

As I noted in the commit message, there's another component to the unionfs test failures that brought about this change: unionfs defaults to UNIONFS_WHITE_ALWAYS, which does not seem like a sensible default to me. And apparently this has always been the default unionfs behavior, in fact there was no other behavior until the introduction of the 'whiteout' mount option in 2007. On that note, the unionfs test failures will also go away (even without these tmpfs changes) if you tweak unionfs17.sh and unionfs18.sh to pass '-o whiteout=whenneeded' to the unionfs mount command.

I ran tests with D45987.141024.patch. I ran all of the tmpfs test scenarios in a loop for 15 hours, without seeing any problems.

I suspect that removing a directory with whiteouts was allowed for UFS only because not doing so would confuse most users not aware of whiteouts, as these are not printed by a regular ls. I doubt there was any other reason than that. It appears the test on WINO wasn't in 4.4BSD and was introduced in the Lite/2 commit (996c772f581f56248). Adding @mckusick as he might have some recollection of what lead to this decision.

I do not remember why the change to allow the removing of directories with whiteouts became allowed. The changes in commit 996c772f581f56248 are quite extensive. If you can let me know which file(s) in that commit have the relevant changes, I can go back through the changes made in those files for the Lite/2 release to see if any explanation shows up.

I do not remember why the change to allow the removing of directories with whiteouts became allowed. The changes in commit 996c772f581f56248 are quite extensive. If you can let me know which file(s) in that commit have the relevant changes, I can go back through the changes made in those files for the Lite/2 release to see if any explanation shows up.

Sure, I'm referring to:

--- a/sys/ufs/ufs/ufs_lookup.c
+++ b/sys/ufs/ufs/ufs_lookup.c
@@ -906,7 +961,7 @@ ufs_dirempty(ip, parentino, cred)
                if (dp->d_reclen == 0)
                        return (0);
                /* skip empty entries */
-               if (dp->d_ino == 0)
+               if (dp->d_ino == 0 || dp->d_ino == WINO)
                        continue;
                /* accept only "." and ".." */
 #              if (BYTE_ORDER == LITTLE_ENDIAN)

part of function ufs_dirempty(), which is (now, but if I'm not mistaken also was then) called only by ufs_rename() and ufs_rmdir().

Barring other reasons I can't see now, I think we'd better revert that behavior, both for logical reasons (the directory references no file but still has whiteout entries that we are going to lose silently) and practical ones (e.g., on Linux, whiteouts are implemented as device files, and thus rmdir() on directories holding some will fail with ENOTEMPTY; this is also to ease consistency in ZFS' code).

I do not remember why the change to allow the removing of directories with whiteouts became allowed. The changes in commit 996c772f581f56248 are quite extensive. If you can let me know which file(s) in that commit have the relevant changes, I can go back through the changes made in those files for the Lite/2 release to see if any explanation shows up.

Sure, I'm referring to:

--- a/sys/ufs/ufs/ufs_lookup.c
+++ b/sys/ufs/ufs/ufs_lookup.c
@@ -906,7 +961,7 @@ ufs_dirempty(ip, parentino, cred)
                if (dp->d_reclen == 0)
                        return (0);
                /* skip empty entries */
-               if (dp->d_ino == 0)
+               if (dp->d_ino == 0 || dp->d_ino == WINO)
                        continue;
                /* accept only "." and ".." */
 #              if (BYTE_ORDER == LITTLE_ENDIAN)

part of function ufs_dirempty(), which is (now, but if I'm not mistaken also was then) called only by ufs_rename() and ufs_rmdir().

Barring other reasons I can't see now, I think we'd better revert that behavior, both for logical reasons (the directory references no file but still has whiteout entries that we are going to lose silently) and practical ones (e.g., on Linux, whiteouts are implemented as device files, and thus rmdir() on directories holding some will fail with ENOTEMPTY; this is also to ease consistency in ZFS' code).

I think we need to be extremely careful about reverting UFS behavior that's been in place for 27 years just because it's inconsistent with Linux and/or ZFS. If we change this behavior, our approach will still be fundamentally different than the Linux approach because we treat whiteouts as special inodes rather than user-visible files. For example, would reverting this behavior mean that 'rm -rf' would no longer work on a directory containing whiteouts, thus producing a completely un-deletable directory on the upper filesystem?

EDIT: It looks as though the -W option available to both ls(1) and rm(1) should avoid that particular pitfall. I'd still urge extra care in changing this behavior though.

In D45987#1051615, @jah wrote:

I think we need to be extremely careful about reverting UFS behavior that's been in place for 27 years

Aren't we?

just because it's inconsistent with Linux and/or ZFS.

Please... This isn't what I wrote. These are secondary, but still important, reasons.

The main reason, just sketched above, is a "philosophical" one built upon common practice. Kirk can correct me if I'm second-guessing it wrong, but rmdir() exists mainly because you can't remove a directory from UFS just by unlinking it from its parent directory. Doing that indeed leaks a lot of resources (inodes and blocks) for all sub-files (including sub-directories). So there was two possible paths: Either the filesystem has code to do all that job recursively, or it leaves the recursion part that to userland programs, providing only primitives. Obviously, the second path was the one chosen, I guess out of simplicity but also as added usability protection, as you have to have a different stance (rm -fR) to delete a whole hierarchy than just deleting a single, empty directory (rmdir).

The net result is that, today, someone issuing rmdir expects nothing will go wrong if the call succeeds. Only the directory will be suppressed, and its metadata, but no other data will. Allowing rmdir on a directory with whiteouts, at least in my view, violates this expectation. Whiteouts are not directory metadata, such as its timestamp or permissions for example are. They are rather data stored into it, data of which entries in the lower layer should disappear when used as an upper layer.

If we change this behavior, our approach will still be fundamentally different than the Linux approach because we treat whiteouts as special inodes rather than user-visible files.

I've already approached this topic in details in the full unionfs proposal. Not going to rewrite it here, but in a nutshell, whiteouts are currently in UFS directory entries, and for a variety of reasons that's the best approach. It would be even better if we could set attributes on such entries, but this out of scope for this review. That said, almost all of the functionality of using entries can be replicated by having full file (character devices, or whatever) markers. So while in nature, the approaches are fundamentally different, in practice they are not.

For example, would reverting this behavior mean that 'rm -rf' would no longer work on a directory containing whiteouts, thus producing a completely un-deletable directory on the upper filesystem?

There is absolutely no reason to change the user-visible behavior of rm -fR. Whiteout entries there should just be treated as files.

EDIT: It looks as though the -W option available to both ls(1) and rm(1) should avoid that particular pitfall. I'd still urge extra care in changing this behavior though.

Yes, but rm -fR must be able to delete a directory hierarchy, with or without whiteouts. The code of rm probably will have to be slightly revised (IIRC, it's just a matter of passing the appropriate flags to fts). But this is talking here about implementation practical details, which in this case can probably always be solved at little cost. The crux of the discussion here is the behavior that makes sense consistently with other expected, and long-established, filesystem behaviors.

In D45987#1051615, @jah wrote:

I think we need to be extremely careful about reverting UFS behavior that's been in place for 27 years

Aren't we?

just because it's inconsistent with Linux and/or ZFS.

Please... This isn't what I wrote. These are secondary, but still important, reasons.

I did gloss over the other arguments, didn't I? It wasn't intentional, but I guess that's partly because those arguments seem a bit subjective to me.

Yes, discarding the whiteout data does strike me as odd, I've already mentioned that. But whiteouts themselves are quite odd: as you mentioned they're not directory metadata. They might be thought of as file metadata for files contained within the lower directory, but they are file metadata whose purpose is to allow the user and other parts of the system to pretend those files don't exist. Given that, a directory whose upper FS component contains only whiteouts that completely negate the contents of its lower FS component is meant to be seen as logically empty by the user. In light of that, I see (again, it's subjective I'll admit) failing an rmdir or rename-onto attempt against such a directory as worse than discarding the whiteouts. It might be less bad if it could be made clear to the user that the attempt failed because of whiteouts, but still strikes me as very odd. Moreover, in the case of rmdir unionfs will then proceed to create a whiteout for the directory itself, so it's not as though the rmdir will suddenly make the lower FS files visible in the union. I do think it's less clear what the correct/expected behavior should be if that directory whiteout is later removed, or if we start talking about rename-onto rather than rmdir.

The main reason, just sketched above, is a "philosophical" one built upon common practice. Kirk can correct me if I'm second-guessing it wrong, but rmdir() exists mainly because you can't remove a directory from UFS just by unlinking it from its parent directory. Doing that indeed leaks a lot of resources (inodes and blocks) for all sub-files (including sub-directories). So there was two possible paths: Either the filesystem has code to do all that job recursively, or it leaves the recursion part that to userland programs, providing only primitives. Obviously, the second path was the one chosen, I guess out of simplicity but also as added usability protection, as you have to have a different stance (rm -fR) to delete a whole hierarchy than just deleting a single, empty directory (rmdir).

The net result is that, today, someone issuing rmdir expects nothing will go wrong if the call succeeds. Only the directory will be suppressed, and its metadata, but no other data will. Allowing rmdir on a directory with whiteouts, at least in my view, violates this expectation. Whiteouts are not directory metadata, such as its timestamp or permissions for example are. They are rather data stored into it, data of which entries in the lower layer should disappear when used as an upper layer.

If we change this behavior, our approach will still be fundamentally different than the Linux approach because we treat whiteouts as special inodes rather than user-visible files.

I've already approached this topic in details in the full unionfs proposal. Not going to rewrite it here, but in a nutshell, whiteouts are currently in UFS directory entries, and for a variety of reasons that's the best approach. It would be even better if we could set attributes on such entries, but this out of scope for this review. That said, almost all of the functionality of using entries can be replicated by having full file (character devices, or whatever) markers. So while in nature, the approaches are fundamentally different, in practice they are not.

For example, would reverting this behavior mean that 'rm -rf' would no longer work on a directory containing whiteouts, thus producing a completely un-deletable directory on the upper filesystem?

There is absolutely no reason to change the user-visible behavior of rm -fR. Whiteout entries there should just be treated as files.

EDIT: It looks as though the -W option available to both ls(1) and rm(1) should avoid that particular pitfall. I'd still urge extra care in changing this behavior though.

Yes, but rm -fR must be able to delete a directory hierarchy, with or without whiteouts. The code of rm probably will have to be slightly revised (IIRC, it's just a matter of passing the appropriate flags to fts). But this is talking here about implementation practical details, which in this case can probably always be solved at little cost. The crux of the discussion here is the behavior that makes sense consistently with other expected, and long-established, filesystem behaviors.

In D45987#1052043, @jah wrote:

Yes, discarding the whiteout data does strike me as odd, I've already mentioned that.

Sure, but explaining the why is important, else there's no discussion possible. I don't think anyone an architecture for unionfs where all behaviors are "unsurprising" to users at first exists. What I am seeking, though, is to have consistent, understandable, and most of the time unsurprising behaviors, at least in connection to security, data loss, etc.

But whiteouts themselves are quite odd: as you mentioned they're not directory metadata. They might be thought of as file metadata for files contained within the lower directory, but they are file metadata whose purpose is to allow the user and other parts of the system to pretend those files don't exist. Given that, a directory whose upper FS component contains only whiteouts that completely negate the contents of its lower FS component is meant to be seen as logically empty by the user.

That's some common thinking for these but I'm in clear disagreement with it. It conflates what is visible through the union view and how the latter is concretely built out of the (upper and lower) layers. Once one recognizes them as conceptually separate, there is really no reason to see a directory holding whiteouts as empty, even if these whiteouts are meant to "delete" some lower layer's files *in the union view*. Even more, considering such directories empty, in the context of the upper layer on its own, is basically discarding the existence of these whiteouts in a context where the operator should be in a position to tweak the pieces that finally make the union view at will, and thus should be presented with all of them. For people not using UFS at all for unions, or using it only as a layer backing store that they never tweak directly, the chosen behavior doesn't matter at all, since they will never have to deal with whiteouts.

Currently, whiteouts are treated as "transparent" by unionfs, meaning that whiteouts of the upper layer also appear through the union. I've evoked this kind of conflation, and why it was made, in a paragraph of the "Metadata in the Upper Layer" sub-section in the full proposal document, as well as for the description of mount option unionfs_metadata_support. In earlier versions, I had considered ruling out that possibility, but it is almost practically required in order to stack several unions. To summarize the relevant parts of the proposal, option unionfs_metadata_support will control whether a union itself exports the whiteouts of its upper layer (in which case, it is meant to be part of another union) or not (in which case, it is considered as a filesystem to use in itself). This is consistent with considering directories with whiteouts as not empty: If unionfs_metadata_support is set to no (the default), whiteouts used in implementing some union's directory are not seen *through the union*, so these directories of the union are considered as empty if their backing upper layer's directory has only whiteouts at least shadowing all lower layer's entries. If that mount option is set to conflate instead, then such kinds of directory won't be considered empty, which is expected since that union is then itself meant to be part of another union as its upper layer, and thus is subject to the same reasoning developed above for UFS.

Moreover, in the case of rmdir unionfs will then proceed to create a whiteout for the directory itself, so it's not as though the rmdir will suddenly make the lower FS files visible in the union.

The discussion we are having is mainly about rmdir of a directory in the upper layer. In this case, no, rmdir won't create any whiteouts (why would it? it doesn't even know about some potential lower layer).

If you're talking instead about the union itself, then the union should be mounted with unionfs_metadata_support set to no, which I intend to be the default value (layering unions should be the less common case for human administrators), in which case rmdir will work as it should (whether there are whiteouts in the upper layer's directory or not; which in the end should be irrelevant, as, again, it is an implementation detail).

So, really, I think we should change UFS' behavior to conform to that (and I don't think that there is a compelling argument that would say otherwise; but I'm open to hear some if it exists). That said, it is probably better to wait for unionfs to be revamped and new mount options to be added before doing it, which has been my intent since the proposal. In any case, I don't think that tmpfs' behavior in this respect should be changed, that would be against the longer-term plan.

In D45987#1052043, @jah wrote:

Yes, discarding the whiteout data does strike me as odd, I've already mentioned that.

Sure, but explaining the why is important, else there's no discussion possible. I don't think anyone an architecture for unionfs where all behaviors are "unsurprising" to users at first exists. What I am seeking, though, is to have consistent, understandable, and most of the time unsurprising behaviors, at least in connection to security, data loss, etc.

But whiteouts themselves are quite odd: as you mentioned they're not directory metadata. They might be thought of as file metadata for files contained within the lower directory, but they are file metadata whose purpose is to allow the user and other parts of the system to pretend those files don't exist. Given that, a directory whose upper FS component contains only whiteouts that completely negate the contents of its lower FS component is meant to be seen as logically empty by the user.

That's some common thinking for these but I'm in clear disagreement with it. It conflates what is visible through the union view and how the latter is concretely built out of the (upper and lower) layers. Once one recognizes them as conceptually separate, there is really no reason to see a directory holding whiteouts as empty, even if these whiteouts are meant to "delete" some lower layer's files *in the union view*. Even more, considering such directories empty, in the context of the upper layer on its own, is basically discarding the existence of these whiteouts in a context where the operator should be in a position to tweak the pieces that finally make the union view at will, and thus should be presented with all of them. For people not using UFS at all for unions, or using it only as a layer backing store that they never tweak directly, the chosen behavior doesn't matter at all, since they will never have to deal with whiteouts.

I think this paragraph gets to the heart of our disagreement. I'm looking at the problem almost entirely through the lens of the observed behavior *when operating against the union view*, while you're at least partly considering the behavior that should be expected when operating directly against the upper FS. And there's nothing wrong with that, it's good to at least consider the impact that direct manipulation of the component filesystems will have on the union view.

And though I've read your design proposal, I have little insight into what your plans will look like when put into code (e.g. concrete changes to unionfs, to various filesystems, and precisely when and in what order they'll be implemented). I'm therefore mostly left to consider the landscape as it exists today, where we have an obvious problem with observable union-view behavior against tmpfs, and would have the same problem with UFS if we reverted the behavior in question without making other changes. Admittedly that bias toward the present is also exacerbated by the fact that going forward I'd prefer to significantly curtail my filesystem work, as (to put it mildly) it's proven to be "not my favorite thing to work on".

With all that said, I'm still somewhat skeptical that discarding whiteouts upon direct removal of an upper FS directory (as opposed to rmdir issued against the union view) would constitute truly incorrect behavior. I see these whiteouts as data that are meaningless outside the context of the union view, such that discarding them is something the user shouldn't care about, but if they do care then 'ls -W' gives them the ability to check for whiteout presence. And I also think such a user should be prepared for the unusual results this will produce in the union view, as *today* that will certainly be the case for any number of similar operations taken directly against the upper or lower filesystem. However, given my aforementioned lack of insight I'm prepared to set this argument aside.

Currently, whiteouts are treated as "transparent" by unionfs, meaning that whiteouts of the upper layer also appear through the union. I've evoked this kind of conflation, and why it was made, in a paragraph of the "Metadata in the Upper Layer" sub-section in the full proposal document, as well as for the description of mount option unionfs_metadata_support. In earlier versions, I had considered ruling out that possibility, but it is almost practically required in order to stack several unions. To summarize the relevant parts of the proposal, option unionfs_metadata_support will control whether a union itself exports the whiteouts of its upper layer (in which case, it is meant to be part of another union) or not (in which case, it is considered as a filesystem to use in itself). This is consistent with considering directories with whiteouts as not empty: If unionfs_metadata_support is set to no (the default), whiteouts used in implementing some union's directory are not seen *through the union*, so these directories of the union are considered as empty if their backing upper layer's directory has only whiteouts at least shadowing all lower layer's entries. If that mount option is set to conflate instead, then such kinds of directory won't be considered empty, which is expected since that union is then itself meant to be part of another union as its upper layer, and thus is subject to the same reasoning developed above for UFS.

Moreover, in the case of rmdir unionfs will then proceed to create a whiteout for the directory itself, so it's not as though the rmdir will suddenly make the lower FS files visible in the union.

The discussion we are having is mainly about rmdir of a directory in the upper layer. In this case, no, rmdir won't create any whiteouts (why would it? it doesn't even know about some potential lower layer).

If you're talking instead about the union itself, then the union should be mounted with unionfs_metadata_support set to no, which I intend to be the default value (layering unions should be the less common case for human administrators), in which case rmdir will work as it should (whether there are whiteouts in the upper layer's directory or not; which in the end should be irrelevant, as, again, it is an implementation detail).

I was very much referring rmdir against the union view here, in which case unionfs_rmdir() will specify DOWHITEOUT to the upper FS VOP_RMDIR() if either the directory is present in the lower FS or the whiteout mode is 'always'.
I think it would help me to understand your viewpoint better if you could walk me through a concrete description of how you would see unionfs_rmdir() working in the "unionfs_metadata_support=no" case. What would it do differently compared to its current behavior? What behaviors (if any) would you expect to change on the part of the upper/lower FS when invoked by unionfs_rmdir()?

So, really, I think we should change UFS' behavior to conform to that (and I don't think that there is a compelling argument that would say otherwise; but I'm open to hear some if it exists). That said, it is probably better to wait for unionfs to be revamped and new mount options to be added before doing it, which has been my intent since the proposal. In any case, I don't think that tmpfs' behavior in this respect should be changed, that would be against the longer-term plan.

I do not remember why the change to allow the removing of directories with whiteouts became allowed. The changes in commit 996c772f581f56248 are quite extensive. If you can let me know which file(s) in that commit have the relevant changes, I can go back through the changes made in those files for the Lite/2 release to see if any explanation shows up.

Sure, I'm referring to:

--- a/sys/ufs/ufs/ufs_lookup.c
+++ b/sys/ufs/ufs/ufs_lookup.c
@@ -906,7 +961,7 @@ ufs_dirempty(ip, parentino, cred)
                if (dp->d_reclen == 0)
                        return (0);
                /* skip empty entries */
-               if (dp->d_ino == 0)
+               if (dp->d_ino == 0 || dp->d_ino == WINO)
                        continue;
                /* accept only "." and ".." */
 #              if (BYTE_ORDER == LITTLE_ENDIAN)

part of function ufs_dirempty(), which is (now, but if I'm not mistaken also was then) called only by ufs_rename() and ufs_rmdir().

Barring other reasons I can't see now, I think we'd better revert that behavior, both for logical reasons (the directory references no file but still has whiteout entries that we are going to lose silently) and practical ones (e.g., on Linux, whiteouts are implemented as device files, and thus rmdir() on directories holding some will fail with ENOTEMPTY; this is also to ease consistency in ZFS' code).

In reviewing the original CSRG commit logs, this change was made by Simon Pendry (the original author of the union filesystem) when he added the concept of whiteout entries to UFS. When doing a rmdir() on a union stack, the directory would be replaced by a whiteout with that name so that the underlying directory would not be visible. If a new directory was created with the whiteout name, it would have its UF_OPAQUE set so that none of the underlying directory would be visible. Thus the loss of the whiteout entries in the original directory would not matter. Simply put, ignoring the whiteout entries provided the semantics that he desired.

@olce Reading back through the discussion here, I'm not sure we're in as much disagreement as it initially seemed.

My main concern is that I don't want rmdir against a logically-empty directory in the union view to fail due to the presence of whiteouts; this is also the fundamental issue at play in the unionfs17/unionfs18 test failures that motivated these changes in the first place. But based on your more recent comments regarding the planned behavior of "unionfs_metadata_support=no", it doesn't sound as though you were ever advocating for rmdir to fail in that case. Instead it sounds as though your main concern was that rmdir shouldn't silently discard whiteouts when executed directly against an upper FS directory outside the union view.

Is this correct? If so, apologies for the confusion here. I don't feel nearly as strongly about the latter case as I do the former case. I'm not sure how concerned we really need to be about the latter case, but I could certainly buy the argument that we should bias toward not discarding whiteout data in that situation. It also looks as though that's the behavior that would be consistent with Linux; looking at the in-tree overlayfs implementation at least, it appears that overlayfs' rmdir implementation intentionally ignores whiteout inodes when determining whether the directory is empty and then subsequently discards whiteouts when removing the directory. Since this logic resides entirely within the overlayfs layer, it would only apply to directory removal through the union view; I would expect the presence of the whiteout files to cause rmdir to fail when executed directly against the underlying FS.

In light of the above, would something like the attached simple patch be worth doing here? The idea is that it would get us the desired split between union-view and non-union-view behavior today, and it would also be trivial for a future unionfs change to make use of the DROPWHITEOUT flag conditional on whether the unionfs mount was specified with "unionfs_metadata_support=no". I realize you may have some completely different plan for the implementation of unionfs_metadata_support, so if that's the case I won't push for these changes.

In reviewing the original CSRG commit logs, this change was made by Simon Pendry (the original author of the union filesystem) when he added the concept of whiteout entries to UFS. When doing a rmdir() on a union stack, the directory would be replaced by a whiteout with that name so that the underlying directory would not be visible. If a new directory was created with the whiteout name, it would have its UF_OPAQUE set so that none of the underlying directory would be visible. Thus the loss of the whiteout entries in the original directory would not matter. Simply put, ignoring the whiteout entries provided the semantics that he desired.

@mckusick Thanks very much Kirk for digging this out. This is something along the lines of what I was expecting, and in line with what both Jason and I were stating (Jason: "Moreover, in the case of rmdir unionfs will then proceed to create a whiteout for the directory itself, so it's not as though the rmdir will suddenly make the lower FS files visible in the union.", I: "Currently, whiteouts are treated as "transparent" by unionfs, meaning that whiteouts of the upper layer also appear through the union." and "If unionfs_metadata_support is set to no (the default), whiteouts used in implementing some union's directory are not seen *through the union*, so these directories of the union are considered as empty if their backing upper layer's directory has only whiteouts at least shadowing all lower layer's entries.").

I think this paragraph gets to the heart of our disagreement. I'm looking at the problem almost entirely through the lens of the observed behavior *when operating against the union view*, while you're at least partly considering the behavior that should be expected when operating directly against the upper FS. And there's nothing wrong with that, it's good to at least consider the impact that direct manipulation of the component filesystems will have on the union view.

@jah I'm in fact considering both operating against the union view and against the upper layer FS directly, I'm really not ruling out the former, rather including it in a larger view.

And though I've read your design proposal, I have little insight into what your plans will look like when put into code (e.g. concrete changes to unionfs, to various filesystems, and precisely when and in what order they'll be implemented). I'm therefore mostly left to consider the landscape as it exists today, where we have an obvious problem with observable union-view behavior against tmpfs, and would have the same problem with UFS if we reverted the behavior in question without making other changes. Admittedly that bias toward the present is also exacerbated by the fact that going forward I'd prefer to significantly curtail my filesystem work, as (to put it mildly) it's proven to be "not my favorite thing to work on".

The design proposal is indeed very big, and has a number of reasonings and ideas that have quite wide ramifications in unionfs and beyond. I probably didn't dig into this particular one with sufficient details, although the main idea has been there almost from the start.

The proposal document as a whole is not even "complete", in the sense that not all implementation ideas are fully described, although I hope it can be considered as going a long way to come reasonably close to that goal. Going into full details is practically impossible, nor desirable. As in all coding tasks, there are lots of things, normally only minor if the design part has been done well, that will be devised at the moment of implementation, including the ordering of some tasks. It would require as much, if not more, time trying to write them all down in advance then actually working on the implementation.

For the envisioned changes at hand, I think the proper guideline is to consider that administrators of union views are in charge of anything that participates in building those views, and in particular we should make explicit (or at least expected) all modifications of the layers to them. I think some rmdir() directly on some layer removing a directory containing whiteouts violates it, as previously explained. As another example, administrators directly manipulating the layers not only should be able to manipulate whiteouts safely, but ideally should see them by default, which implies reversing the meaning of argument -W' to ls` the opposite of what it is today. This is another change I don't think I explicitly wrote down in the design document, but I'm planning to carry forward.

As a side note, I don't think we should generally be too picky about backwards compatibility for functionaly related to unionfs, as it has been knowingly broken for so many years (I'm even wondering if it could have ever been considered as a production FS in FreeBSD; most certainly it couldn't for at least the past 20 years), but really put the priority on making things "right" with respect to all reasonable scenarios.

In D45987#1053545, @jah wrote:

Is this correct? If so, apologies for the confusion here.

Yes. Just to make things clear, as explained above, I don't really consider the rmdir() directly on the layer more (nor less) important than having rmdir() work as expected on the union view. They are equally important in my view. But I was indeed foremost concerned by the the first as the change here breaks that expectation long-term.

I don't feel nearly as strongly about the latter case as I do the former case. I'm not sure how concerned we really need to be about the latter case, but I could certainly buy the argument that we should bias toward not discarding whiteout data in that situation.

Then we are indeed in agreement. Great!

All these difficulties come from the fact that unionfs-specific metadata, used to build the view, here the whiteouts, are currently also exported by the view itself, which is undesirable except if trying to use views to build other views. It's because of this conflation that Simon Pendry made the UFS change I evoked above, as confirmed by Kirk. With proper separation of concerns, this change becomes unnecessary, and that's great, because from an administrator's point of view, it is actually harmful.

It also looks as though that's the behavior that would be consistent with Linux; looking at the in-tree overlayfs implementation at least, it appears that overlayfs' rmdir implementation intentionally ignores whiteout inodes when determining whether the directory is empty and then subsequently discards whiteouts when removing the directory. Since this logic resides entirely within the overlayfs layer, it would only apply to directory removal through the union view; I would expect the presence of the whiteout files to cause rmdir to fail when executed directly against the underlying FS.

Yes, it works like that in Linux.

In D45987#1053545, @jah wrote:

In light of the above, would something like the attached simple patch be worth doing here? The idea is that it would get us the desired split between union-view and non-union-view behavior today, and it would also be trivial for a future unionfs change to make use of the DROPWHITEOUT flag conditional on whether the unionfs mount was specified with "unionfs_metadata_support=no". I realize you may have some completely different plan for the implementation of unionfs_metadata_support, so if that's the case I won't push for these changes.

I'm sorry, but it seems I can't see this attachment at all. I had some problem once uploading some attachment to Phabricator, it may have been because I wasn't logged in anymore (disconnection?), but I never figured it out, maybe it's another occurence of this problem? In any case, here I'm logged in correctly and reading this review. Your

reference shows no link, and as a reference file dated the day of your last comment in the list at the top of the page, I only can see some "Restricted File" with a lock icon. It seems I somehow don't have the rights to download it. Could you please repost it or change its visibility (if at all possible)?

Thanks, I've just downloaded it.

In D45987#1053545, @jah wrote:

In light of the above, would something like the attached simple patch be worth doing here? The idea is that it would get us the desired split between union-view and non-union-view behavior today, and it would also be trivial for a future unionfs change to make use of the DROPWHITEOUT flag conditional on whether the unionfs mount was specified with "unionfs_metadata_support=no". I realize you may have some completely different plan for the implementation of unionfs_metadata_support, so if that's the case I won't push for these changes.

Just to clarify, taking only the case of rmdir() (that of rename() is similar), rmdir() should never work on a directory with witheouts, whatever the filesystem. The "split" you're evoking has to be implemented in fact in unionfs itself, which has to filter (or not) directory entries of the upper layer with respect to userland.

That said, this new patch seems very useful for internal use by unionfs, so that it can actually call VOP_RMDIR() without having to handle whiteouts in the upper layer by itself. This handling should arguably be delegated to the FS layers when possible, as it should prove much more performant.

As a digression, and not for now, this makes me think that we should probably have some flag per filesystem type indicating what their various VOP_() can support, perhaps a single flag indicating whether they support unionfs (indicating minimal support for whiteouts, opaque directories, and, e.g., VOP_RMDIR() and VOP_RENAME() taking DROPWHITEOUT) or having a few flags for different subsets of useful functionality.

DROPWHITEOUT is yet one new flag to struct componentname, but I don't really see any other solution without changing the signature of the concerned VOP_(), which is probably worse.

I don't have much more time today, but if you merge the last patch with the one visible here, I'll try to review all that tomorrow.

Would you perhaps rename DROPWHITEOUT to IGNOREWHITEOUT? I may want to reuse this flag at some point for VOP_READDIR().

Would you perhaps rename DROPWHITEOUT to IGNOREWHITEOUT? I may want to reuse this flag at some point for VOP_READDIR().

Will do. Also worth noting is that the patch doesn't add IGNOREWHITEOUT in unionfs_rename(). I'd forgotten that unionfs_rename() currently doesn't allow renaming over a directory that has an upper FS component. I'm not sure if this was done to avoid the "surprises" that could result from discarding whiteouts in such a directory or for some other reason; as with so many other things in unionfs there's no comment or commit message to explain the reasoning.

Add IGNOREWHITEOUT flag and adopt in UFS and tmpfs.

Also restructure the commits to first add IGNOREWHITEOUT and adopt it
in UFS and unionfs, and then fully adopt whiteout tracking in tmpfs.

This revision now requires review to proceed.Aug 6 2024, 4:40 AM

Since whiteout-only directory removal will no longer succeed by default outside the union view, should we also change 'rm -rf' to allow whiteouts to be forcibly removed? As @olce alluded to earlier, it looks as though this might just be a one-line change in bin/rm/rm.c to specify FTS_WHITEOUT in the 'fflag' case.

I ran tests with D45987.141850.patch without seeing any issues.

Sorry, but I was off for a while, and couldn't get back to this. So that you know, I'm also going to be almost completely off in the next 2 weeks.

I generally agree with these changes, and only have a couple of suggestions, please see inline comments.

Thanks!

sys/fs/tmpfs/tmpfs_subr.c
1870

Given its uses, I'd reserve it for clearing directories with only whiteouts, and rename it accordingly. The proposed name above is some kind of middle ground. A stronger name would be something like tmpfs_dir_only_whiteouts_clear().

1879–1881

Once this routine is limited to directory with only whiteouts, that test can be converted to an assertion.

sys/fs/tmpfs/tmpfs_vnops.c
1081–1084

Test guard rewrite suggestion (seems clearer to me, YMMV).

1243–1254

Maybe a small comment indicating that we have tested above that the directory is either empty, or holds only whiteouts?

With one of the new names proposed above, what the call tmpfs_dir_clear() does will also be clearer.

1340–1342

Same suggestion as above on the order of the tests.

This revision is now accepted and ready to land.Tue, Aug 13, 8:44 PM
  • Clarifying tweaks for the tmpfs implementation
This revision now requires review to proceed.Wed, Aug 14, 1:44 AM

I ran a brief test with D45987.142064.patch without seeing any problems.

This revision is now accepted and ready to land.Mon, Sep 2, 12:34 PM