Page MenuHomeFreeBSD

"Long" filename support changes
AbandonedPublic

Authored by cem on Sep 12 2017, 6:12 AM.

Details

Reviewers
julian
Summary

Bump up filename and path limits to accommodate 255 Unicode character paths,
like Windows (UTF-8 encoded). (The maximum length of an encoded code point
is four bytes, so NAME_MAX / MAXNAMLEN becomes ~1020.) PATH_MAX is bumped by
a factor of four as well to accomodate longer names.

Some associated limits like _XOPEN_NAME_MAX, _XOPEN_PATH_MAX, FILENAME_MAX
are bumped.

Glob is modified to allocate now larger path buffers on the heap, rather
than the stack.

EXTATTR_MAXNAMELEN is de-linked from NAME_MAX; there's no obvious reason
attribute names need to be longer than 255 bytes. In particular, Windows
Alternate Data Stream names cannot be longer than 255 bytes.

The old 1024 byte path limit is retained as OLD_PATH_MAX. Several ABIs are
preserved by retaining the old limit.

Finally, UFS is modified to enforce its existing 255 byte name limit.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 15783
Build 15795: arc lint + arc unit

Event Timeline

It seems that ZFS (which we don't use) has some ABI dependency on these constants via zfs_cmd, and this change breaks a CTASSERT on line 202 of zfs_ioctl.c. For some reason passing 3 MAXPATHLENs and 1 MAXNAMELEN via ioctl is too much. Someone would need to figure out how to fix ZFS to use this patch.

In D12330#255757, @cem wrote:

It seems that ZFS (which we don't use) has some ABI dependency on these constants via zfs_cmd, and this change breaks a CTASSERT on line 202 of zfs_ioctl.c. For some reason passing 3 MAXPATHLENs and 1 MAXNAMELEN via ioctl is too much. Someone would need to figure out how to fix ZFS to use this patch.

We have the ZFS expertise to fix that,,.. we DO use ZFS.,

It wouldn't be too hard to just convert the ioctl interface ZFS uses now to a syscall interface, which can take arbitrarily large commands. I think that's probably the most straightforward route.

sys/ufs/ufs/ufs_lookup.c
228

Might want similar checks on other existing filesystems.

NFS already seems to check against its own MAXNAMLEN constant. tmpfs I think we can reasonably allow to just support long names. UDF is readonly. msdosfs probably needs a check added. FUSE? Not sure whether we should restrict userspace filesystems or not. ext2fs probably also should verify against EXT2FS_MAXNAMLEN (if it does now, it isn't explicit enough for me to see).

In D12330#255757, @cem wrote:

It seems that ZFS (which we don't use) has some ABI dependency on these constants via zfs_cmd, and this change breaks a CTASSERT on line 202 of zfs_ioctl.c. For some reason passing 3 MAXPATHLENs and 1 MAXNAMELEN via ioctl is too much. Someone would need to figure out how to fix ZFS to use this patch.

I think that zfs_cmd_t is supposed to use ZFS_MAXNAMELEN. I am not sure why it doesn't.
zc_value probably needs to be at least MAXPATHLEN long because of zfs_ioc_obj_to_path, but zc_name definitely should be ZFS_MAXNAMELEN.

ed added inline comments.
include/limits.h
133–134

These should not be modified. POSIX requires them to be exactly 255/1024.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html

I'm personally not a big fan of going into this direction. The problem with these constants is that they are defined in the first place. A lot of people don't realise that POSIX allows us to omit PATH_MAX and NAME_MAX entirely. By omitting them, you specify that these limits are not known at compile-time for this implementation. This is in our case arguably even more correct, as in our environment, the actual value depends on the mount point. The true limit may then be extracted using pathconf().

Instead of simply bumping these constants, I personally think we should just spend the effort to change our ABI to be path size independent. For example, struct dirent::d_name could as well be declared as having no size at all. POSIX allows this. I would even argue that this may even reduce memory usage here and there, as we often allocate PATH_MAX bytes, just to store some short pathname. This wouldn't be a problem if code just made dynamic allocations.

Once those kinds of changes are made, we could change limits.h to make PATH_MAX and NAME_MAX kernel-only. People may then change them to any value they prefer without breaking the ABI.

In D12330#256361, @ed wrote:

I'm personally not a big fan of going into this direction. The problem with these constants is that they are defined in the first place. A lot of people don't realise that POSIX allows us to omit PATH_MAX and NAME_MAX entirely. By omitting them, you specify that these limits are not known at compile-time for this implementation. This is in our case arguably even more correct, as in our environment, the actual value depends on the mount point. The true limit may then be extracted using pathconf().

Technically, sure. But porting software to e.g. GNU Hurd, which does this, is more difficult because of it. I don't think we want to increase porters' burdens.

Instead of simply bumping these constants, I personally think we should just spend the effort to change our ABI to be path size independent. For example, struct dirent::d_name could as well be declared as having no size at all. POSIX allows this. I would even argue that this may even reduce memory usage here and there, as we often allocate PATH_MAX bytes, just to store some short pathname. This wouldn't be a problem if code just made dynamic allocations.

I think your proposal is orthogonal to this work. Please counter propose a patch if you have one, though. I think it will be a pretty big change.

Once those kinds of changes are made, we could change limits.h to make PATH_MAX and NAME_MAX kernel-only. People may then change them to any value they prefer without breaking the ABI.

First you have to extract all of the constant-sized string fields out of various ABIs.

include/limits.h
133–134

Ok. I think this change was solely to support this test code, which (mis)uses _XOPEN_PATH_MAX in place of PATH_MAX: https://svnweb.freebsd.org/base?view=revision&revision=318210

so there are bits of this missing if you are (like us) trying to make this work in stable/10 or stable/11 because Isilon have already aommitted parts of it..
like change 313475 for example..
I need to merge that into our 10 based code until I can use this..

include/limits.h
133–134

maybe ngie (who works for Isilon) can weigh in on this?

include/limits.h
133–134

Ngie has moved on to another employer.

In D12330#255757, @cem wrote:

It seems that ZFS (which we don't use) has some ABI dependency on these constants via zfs_cmd, and this change breaks a CTASSERT on line 202 of zfs_ioctl.c. For some reason passing 3 MAXPATHLENs and 1 MAXNAMELEN via ioctl is too much. Someone would need to figure out how to fix ZFS to use this patch.

I've just found out that that CTASSERT is no longer needed as we do not pass zfs_cmd_t directly via the ioctl path. Instead, we pass zfs_iocparm_t that contains a "pointer" to zfs_cmd_t and the latter is passed via explicit copyin and copyout.

I've applied these changes to 10.3 and will give them a run..

Have applied this patch at $JOB now to stable/10

do we need to allocate new MIBs for longer kinfo ioctls?
the code says:

#define KINFO_FILE_SIZE 1392

while the patch is not too bit it addresses a long standing issue in FreeBSD, that others have addressed long ago.

Have applied this patch at $JOB now to stable/10

do we need to allocate new MIBs for longer kinfo ioctls?
the code says:

#define KINFO_FILE_SIZE 1392

while the patch is not too bit it addresses a long standing issue in FreeBSD, that others have addressed long ago.

We attempted to preserve the existing KINFO (kldstat, etc) ABI. (Hence the changes to kf_path, kve_path, and kvo_path in this diff.) So I think it is fine to use the existing MIBs. We don't see 255-character unicode filenames as being especially relevant to kldstat and it was one less ABI to break.

Just a rebase on more recent CURRENT - untested

Can we make a list of other things that we'd need to change?
ABIs for Start

and there is a 1 byte name field in the name cache.

cem planned changes to this revision.Apr 16 2020, 4:42 PM

We have a v2 of this pending that goes a step further. S3 allows 1024-byte UTF-8 paths.

@cem Hi there! Is there a link to the v2 ?

No, I’m no longer at Isilon.

It is pretty trivial to take this patch and bump the limits up by 4 bytes for S3-compatible paths, if needed. The extra few bytes doesn’t make a huge difference for stack path buffers like the initial 4x bump did.