Page MenuHomeFreeBSD

Linux getdents/getdents64 rewrite.
ClosedPublic

Authored by trasz on Apr 2 2015, 5:01 PM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 14 2024, 5:56 AM
Unknown Object (File)
Feb 9 2024, 8:28 AM
Unknown Object (File)
Feb 8 2024, 6:40 PM
Unknown Object (File)
Jan 21 2024, 7:01 AM
Unknown Object (File)
Jan 16 2024, 10:57 AM
Unknown Object (File)
Nov 30 2023, 8:27 AM
Unknown Object (File)
Nov 23 2023, 7:12 AM
Unknown Object (File)
Nov 16 2023, 8:36 PM

Details

Summary

This patch replaces linuxulator implementation of getdents() and getdents64()
with wrapper over kern_getdirentries().

Test Plan
#define _GNU_SOURCE
#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
	do { perror(msg); exit(EXIT_FAILURE); } while (0)

struct linux_dirent {
	long		d_ino;
	off_t		d_off;
	unsigned short	d_reclen;
	unsigned char	d_type;
	char		d_name[];
};

#define BUF_SIZE 1024

int
main(int argc, char *argv[])
{
         int fd, nread;
	char buf[BUF_SIZE];
	struct linux_dirent *d;
	int bpos;
	char d_type;

	fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
	if (fd == -1)
		handle_error("open");

	for (;;) {
		nread = syscall(SYS_getdents64, fd, buf, BUF_SIZE);
		if (nread == -1)
			handle_error("getdents");

		if (nread == 0)
			break;

		printf("--------------- nread=%d ---------------\n", nread);
		printf("inode#    file type  d_reclen  d_off   d_name\n");
		for (bpos = 0; bpos < nread;) {
			d = (struct linux_dirent *) (buf + bpos);
			printf("%8ld  ", d->d_ino);
			d_type = d->d_type;
			printf("%-10s ", (d_type == DT_REG) ?  "regular" :
			(d_type == DT_DIR) ?  "directory" :
			(d_type == DT_FIFO) ? "FIFO" :
			(d_type == DT_SOCK) ? "socket" :
			(d_type == DT_LNK) ?  "symlink" :
			(d_type == DT_BLK) ?  "block dev" :
			(d_type == DT_CHR) ?  "char dev" : "???");
			printf("%4d %10lld  %s\n", d->d_reclen,
			(long long) d->d_off, d->d_name);
			bpos += d->d_reclen;
		}
	}

	exit(EXIT_SUCCESS);
}

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

trasz retitled this revision from to Linux getdents/getdents64 rewrite..
trasz updated this object.
trasz edited the test plan for this revision. (Show Details)
bdrewery added inline comments.
sys/compat/linux/linux_file.c
410 ↗(On Diff #4606)

Very nice to just reuse kern_getdirentries. This code in particular was bad. I thought I had fixed it but forgot and only did it for NFS in r268114.

I understand the idea, but I do not know the details of the Linux struct dirent layout peculiarities. I have some memory of issues due to wrong padding. see e.g. 182892. Was it handled ?

Kind of - it was handled by not changing that part of the code, apart from refactoring. Basically, previously both versions were handled in a single routine with "if (is64bit)", and now are in separate routines. What did change is how we obtain the dentries, not how we translate them into Linux format.

In D2210#7, @trasz wrote:

Kind of - it was handled by not changing that part of the code, apart from refactoring. Basically, previously both versions were handled in a single routine with "if (is64bit)", and now are in separate routines. What did change is how we obtain the dentries, not how we translate them into Linux format.

Well, ok. I cannot check that there is no regression there. I think it would be prudent to commit with "Discussed with: kib" tag.

Ok. My biggest concern is whether the approach taken - ie. reading less entries than we were actually requested, to make sure they fit into the user-provided buffer (practically speaking, over twice less) is valid. Also, I have no idea how this could work before, but it apparently did.

In D2210#9, @trasz wrote:

Ok. My biggest concern is whether the approach taken - ie. reading less entries than we were actually requested, to make sure they fit into the user-provided buffer (practically speaking, over twice less) is valid. Also, I have no idea how this could work before, but it apparently did.

Reading less than requested is very common and acceptable result for all syscalls which return data to usermode. This is true for e.g. read(2), getdirents(2) and so on. Only zero-sized return has special meaning, which often (but not completely correct) is interpreted as EOF.

It is question of implementation quality to return as much as possible there.

dchagin edited edge metadata.

with patch:

dchagin opt # ./getdents64

  • nread=744 ---------------

inode# file type d_reclen d_off d_name
4981559 directory 24 12 .
2412819 directory 24 24 ..
4981560 regular 32 40 .keep
4975952 directory 32 60 recvfrom
4979442 regular 40 84 glibc-2.20.tar
5136544 directory 24 96 gc
4975966 regular 32 116 fdescfs.c
5299546 directory 24 128 ltp
4978734 regular 32 144 ltp.tar
5299879 directory 32 160 epoll
4975875 regular 32 176 pipe.c
4976237 regular 24 192 pipe
4975877 regular 32 212 execve.c
4978857 regular 32 228 execve
5063019 directory 32 248 glibc-2.20
5056128 directory 32 268 glibc-2.21
4975959 regular 32 288 tst-shm.c
6909177 directory 24 300 gx
4978858 regular 40 324 glibc-2.21.tar
4975967 regular 40 352 ltp-20150119.tar.gz
4978910 regular 32 372 getdents.c
4978912 regular 32 392 getdents
4975962 regular 32 408 35370.c
5061345 directory 32 424 trinity

  • nread=128 ---------------

inode# file type d_reclen d_off d_name
4978909 regular 32 488 getdents64.c
4978913 regular 32 508 getdents64
4975969 regular 32 528 tst-ss.c
4975973 regular 32 544 sig.c
dchagin opt

without patch sys call returns all entries (872 bytes) in one call

sys/compat/linux/linux_file.c
409 ↗(On Diff #4606)

error != 0 and bellow

Basically I'm having second thoughts about this approach, like what's the minimum correct size for getdirentries().

btw, latest linux-test-project aka ltp panic in getdents syscall:

#13 0xffffffff8075f900 in panic (fmt=0xffffffff80e3abcf "Duplicate free of %p from zone %p(%s) slab %p(%d)\n")

at /home/git/head/sys/kern/kern_shutdown.c:675

#14 0xffffffff80c367e1 in uma_dbg_free (zone=0xfffff8033e5d7680, slab=0xfffff8000fa12f90, item=0xfffff8000fa121c0)

at /home/git/head/sys/vm/uma_dbg.c:284

#15 0xffffffff80c2ee8b in uma_zfree_arg (zone=0xfffff8033e5d7680, item=0xfffff8000fa121c0, udata=0xfffff8000fa12f90)

at /home/git/head/sys/vm/uma_core.c:2718

#16 0xffffffff80730842 in free (addr=0xfffff8000fa121c0, mtp=0xffffffff810bca40)

at /home/git/head/sys/kern/kern_malloc.c:587

#17 0xffffffff81c91db2 in getdents_common (td=0xfffff80009b9a000, args=<value optimized out>, is64bit=0x1)

at /home/git/head/sys/modules/linux64/../../compat/linux/linux_file.c:519

#18 0xffffffff80ca9199 in syscallenter (td=0xfffff80009b9a000, sa=0xfffffe033c1e2a48) at subr_syscall.c:133
#19 0xffffffff80ca8a5a in amd64_syscall (td=0xfffff80009b9a000, traced=0x0)

at /home/git/head/sys/amd64/amd64/trap.c:974

#20 0xffffffff80c7f7fb in Xfast_syscall () at /home/git/head/sys/amd64/amd64/exception.S:395
#21 0x0000000800ae3629 in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
(kgdb) up 17
#17 0xffffffff81c91db2 in getdents_common (td=0xfffff80009b9a000, args=<value optimized out>, is64bit=0x1)

at /home/git/head/sys/modules/linux64/../../compat/linux/linux_file.c:519

519 free(cookies, M_TEMP);
(kgdb) l
514
515 eof:
516 td->td_retval[0] = nbytes - resid;
517
518 out:
519 free(cookies, M_TEMP);
520
521 VOP_UNLOCK(vp, 0);
522 foffset_unlock(fp, off, 0);
523 fdrop(fp, td);
(kgdb)

just FYI

eadler edited the test plan for this revision. (Show Details)
eadler added a subscriber: eadler.

Does the patched version (with patch from this review) pass all the LTP tests?

In D2210#98562, @emaste wrote:

Ping?

the patch is not applicable to the HEAD

the patch is not applicable to the HEAD

But I wonder if it will be rebased and another attempt made to bring it in?

Dmitry, do I understand it right that the current code - the one we have in CURRENT and 11 - panics on LTP tests? Could you run it with the patch applied and see if it works correctly? Thanks!

Basically, if the current code panics, and we have PRs like https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216134, then perhaps it is the time to bring it in...

dchagin edited edge metadata.

Updating D2210: Linux getdents/getdents64 rewrite.

  1. I divided linux_getdents() and linux_readdir() as in case when the first called

with count = 1 it can overwrite user stack (by writing to user buffer pointer more than 1 byte).

  1. Linux returns EINVAL in case when user supplied buffer is not enough to contain

fetched dirent, so I added a check.

  1. Linux returns ENOTDIR in case when fd points to not a directory, for that I added

linux_getdents_fdcheck() method.

  1. Unfortunatelly, as I deleted code for buflen calculation as:

buflen = max(LINUX_DIRBLKSIZ, buflen),
LTP getdents02 test fails. It expects ENOENT error but kern_getdirentries()
returns EINVAL. As far as I can see it happens only in case when the user supplied buffer
is small, looks like less than DIRBLKSIZ.
But, we can't make the buflen size larger than args->count.

Tested by trinity (Linux kernel syscall fuzzer).

sys/compat/linux/linux_file.c
342 ↗(On Diff #25049)

Would it be possible to only call it if kern_getdirentries() returns EINVAL?

344 ↗(On Diff #25049)

Doesn't this make it possible for the userspace to DoS the system by making the kernel allocate arbitrarily large piece of memory?

sys/compat/linux/linux_file.c
342 ↗(On Diff #25049)

sure, I'll do

344 ↗(On Diff #25049)

no, first of all, AFAIR LINUX_RECLEN_RATIO() macro reduces the user provided buffer size and
next line (buflen = min(buflen, MAXBSIZE);) limit the buflen size anyway
and kern_getdirentries() check buffer size too

dchagin edited edge metadata.

check for kern_getdirentries() error.

trasz added a reviewer: trasz.

Looks good. :-)

This revision is now accepted and ready to land.Feb 14 2017, 3:45 PM
This revision was automatically updated to reflect the committed changes.