I have a contrary opinion about ioctl vs syscall.
May 18 2023
May 14 2023
In D38933#910452, @kib wrote: So there is still a case. Imagine that a Linux process is chrooted into a subtree with its own '/compat/linux'. It does not start using this new adir. Maybe it needs a namei() in chroot (or rather, should it be a sysent method?).
ugh, now this code in https://reviews.freebsd.org/D40090
split
May 10 2023
In D38459#911504, @jfree wrote: In D38459#903657, @val_packett.cool wrote:
- and also…
- EVFILT_TIMER is currently subject to a system-wide and small-by-default kern.kq_calloutmax (kq_ncallouts) limit, which feels very unnerving: imagine an important daemon getting starved of timers by some random user app!! This code as-is is not, and I wouldn't like it to be, but then it would be kinda strange that EVFILT_TIMER would still be.
- should we convert that to a per-{user,process,…} that both facilities would use? An rlimit sounds appropriate I think?
- but do we need that limit at all? Maybe just abolish it from EVFILT_TIMER?
A per-proc limit sounds appropriate through rlimit. I'm not sure about abolishing the limit altogether, though. I am guessing it was implemented for a reason.
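A rough userland sketch of what a per-process cap could look like, using C11 atomics. The names struct timer_quota, timer_quota_reserve() and timer_quota_release() are hypothetical, and the limit field merely stands in for whatever rlimit would be chosen; this is not how kq_ncallouts is accounted today.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process accounting; not the kernel's actual structures. */
struct timer_quota {
    atomic_uint in_use;     /* timers currently armed by this process */
    unsigned int limit;     /* stand-in for a per-process rlimit */
};

/* Try to reserve one timer slot; returns false when the quota is exhausted. */
static bool
timer_quota_reserve(struct timer_quota *q)
{
    unsigned int cur = atomic_load(&q->in_use);

    do {
        if (cur >= q->limit)
            return (false);
        /* On CAS failure, cur is refreshed with the current counter value. */
    } while (!atomic_compare_exchange_weak(&q->in_use, &cur, cur + 1));
    return (true);
}

/* Release a slot previously obtained with timer_quota_reserve(). */
static void
timer_quota_release(struct timer_quota *q)
{
    atomic_fetch_sub(&q->in_use, 1);
}
```

With the counter kept per process, one application exhausting its own quota would not affect timers armed by an unrelated daemon.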
May 9 2023
In D38459#903657, @val_packett.cool wrote: Hey, couple random notes:
- re: "Developers that wish to support FreeBSD should avoid using timerfd" in the quarterly… :/
- file descriptor handle based APIs are actually kinda better because composability / fd-passing / capability mode friendliness
- FreeBSD invented procdesc(4) soo it's strange that we're not yet striving to turn everything into a file descriptor and Linux has us beat on this…
- also there's no explicit clock selection in EVFILT_TIMER so when we finally add a suspend-aware monotonic clock it would only be possible to explicitly choose suspend-awareness-or-not with timerfd :)
Quick update: I'm nearly finished with my school work for the year, so I've had more time to work on this. I've nearly re-engineered the entire patch and I'm passing ~95% of the epoll-shim timerfd testing suite. I should have a new patch out in the next week (hopefully).
May 6 2023
So there is still a case. Imagine that a Linux process is chrooted into a subtree with its own '/compat/linux'. It does not start using this new adir. Maybe it needs a namei() in chroot (or rather, should it be a sysent method?).
Apr 28 2023
rebase to main
Apr 26 2023
PR: 72920
Apr 22 2023
Apr 20 2023
Hey, couple random notes:
Apr 18 2023
In D39647#902355, @emaste wrote: I wonder if it makes sense to just inline the LINUX_KERNVER for these kinds of tests?
if ((p->p_osrel >= LINUX_KERNVER(2,6,26) || p->p_osrel == 0) && imgp->execpathp != 0)
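For readers outside the tree, a self-contained sketch of how such an inlined test behaves. SKETCH_KERNVER, struct sketch_proc, struct sketch_image and sketch_wants_execpath() are hypothetical stand-ins, and the byte-per-component packing is an assumption made for illustration, not a statement about the real macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing, for illustration only: one byte each for minor and patch. */
#define SKETCH_KERNVER(a, b, c) \
    (((uint32_t)(a) << 16) | ((uint32_t)(b) << 8) | (uint32_t)(c))

/* Hypothetical stand-ins for the two fields used in the condition. */
struct sketch_proc { uint32_t p_osrel; };
struct sketch_image { uintptr_t execpathp; };

/*
 * Mirrors the shape of the condition: the emulated release is new enough
 * (or unset, i.e. 0) and there is an exec path to hand over.
 */
static bool
sketch_wants_execpath(const struct sketch_proc *p,
    const struct sketch_image *imgp)
{
    return ((p->p_osrel >= SKETCH_KERNVER(2, 6, 26) || p->p_osrel == 0) &&
        imgp->execpathp != 0);
}
```

Because the components are packed most significant first, a plain integer comparison orders releases the same way as comparing (major, minor, patch) tuples, as long as each component fits its field.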
In D39646#902353, @emaste wrote: My quick google suggests Linux AT_CANARY is defined as pointing to 16 random bytes.
Should we add either a runtime test or assertion that imgp->canarylen >= 16?
I wonder if it makes sense to just inline the LINUX_KERNVER for these kinds of tests?
My quick google suggests Linux AT_CANARY is defined as pointing to 16 random bytes.
Should we add either a runtime test or assertion that imgp->canarylen >= 16?
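A minimal sketch of the runtime-test variant, assuming the 16-byte expectation mentioned above. struct sketch_canary_image, SKETCH_CANARY_MIN and sketch_check_canary() are hypothetical names, and returning ENOEXEC is only one possible choice of error.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption taken from the comment above: consumers read 16 random bytes. */
#define SKETCH_CANARY_MIN 16

/* Hypothetical stand-in for the relevant image_params field. */
struct sketch_canary_image { size_t canarylen; };

/* Runtime variant: fail the exec instead of exposing a short canary. */
static int
sketch_check_canary(const struct sketch_canary_image *imgp)
{
    if (imgp->canarylen < SKETCH_CANARY_MIN)
        return (ENOEXEC);   /* hypothetical error choice */
    return (0);
}
```

An assertion would catch the same condition in debug kernels only, whereas a runtime test like this one also covers production builds at the cost of one compare per exec.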
Apr 17 2023
reworked, allow emul_path in chroot
Apr 12 2023
In D38933#897702, @mjg wrote: I strongly suspect the right way is to have linux binaries auto chrooted to /compat/linux or whatever you are looking up against and then have nullfs mounts inside for /home, /tmp and whatever else which makes sense to share. This avoids any suspicious lookups like failing to find a file in Linux because it is missing when it should not and trying to pick up the FreeBSD one. This also avoids adding any complexity to the kernel.
Apr 9 2023
Apr 8 2023
Done
Apr 7 2023
Apr 6 2023
You can implement namei_altroot(struct nameidata *nd, struct vnode *altroot) (or whatever the name) where altroot is guaranteed v_usecount > 0. Then it can handle faking pwd for the first pass without polluting any consumers.
In D38933#897720, @mjg wrote: In D38933#897717, @dchagin wrote: In D38933#897702, @mjg wrote: Something does not add up whatsoever with your bench results -- how is this patch supposed to improve scalability for open/close/unlink?
This patch is not supposed to improve scalability; I used will-it-scale to check that I did not break the hot path.
I am saying that according to the graph it did improve, markedly so, but this can't be true and consequently the bench is bogus.
That aside the entire idea of Linux binaries doing 2 lookups was incredibly dodgy from the get go and I don't think this is helping the fundamental problem, albeit it may be it makes it less iffy.
The added branchfest is definitely not nice, especially the restart clause.
I strongly suspect the right way is to have linux binaries auto chrooted to /compat/linux or whatever you are looking up against and then have nullfs mounts inside for /home, /tmp and whatever else which makes sense to share. This avoids any suspicious lookups like failing to find a file in Linux because it is missing when it should not and trying to pick up the FreeBSD one. This also avoids adding any complexity to the kernel.
Even if going this route, I think the functionality can be added without pessimizing existing code. Note that for example vfs_lookup is already a standalone routine.
Well, I mostly agree with all of your statements, except for the pessimization part: I tried to minimize touching the hot path for native binaries.
This patch adds only two compares (pwd->pwd_rdir != pwd->pwd_adir) on the error path of namei() for native binaries. I don't think it's worth the cost.
It adds a branch to set things up and another one for failed lookups. clang probably also pessimized namei entry, which already is quite bad. Perhaps I should note there are several single-threaded slowdowns remaining, most of them branches, and this goes counter to whacking them.
split, move the Makefile part into a separate commit
done
Apr 5 2023
In D38933#897702, @mjg wrote: I strongly suspect the right way is to have linux binaries auto chrooted to /compat/linux or whatever you are looking up against and then have nullfs mounts inside for /home, /tmp and whatever else which makes sense to share. This avoids any suspicious lookups like failing to find a file in Linux because it is missing when it should not and trying to pick up the FreeBSD one. This also avoids adding any complexity to the kernel.
This functionality (double lookup in the ugly current form) was added exactly to avoid requiring users to do what you described above.
I was so startled by the supposed scalability diff I did not take a proper look at the other results.
In D38933#897717, @dchagin wrote: In D38933#897702, @mjg wrote: Something does not add up whatsoever with your bench results -- how is this patch supposed to improve scalability for open/close/unlink?
This patch is not supposed to improve scalability; I used will-it-scale to check that I did not break the hot path.
In D38933#897702, @mjg wrote: Something does not add up whatsoever with your bench results -- how is this patch supposed to improve scalability for open/close/unlink?
Something does not add up whatsoever with your bench results -- how is this patch supposed to improve scalability for open/close/unlink? Similarly it can't be faster than stock code for regular lookups, but this one is perhaps measurement error.
- Back to the char linux_emul_path[], this simplifies the code, adding one namei() call to execve() path.
Apr 4 2023
In D39398#896979, @mjg wrote: ignoring the flag would result in a massive wtf for the unlikely case where someone needs it
just error out
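A sketch of the "error out" option, assuming the Linux flag values as I understand them from the UAPI headers. The SKETCH_ names and sketch_close_range_flags() are hypothetical, and this is not the code under review.

```c
#include <errno.h>

/* Values as I understand the Linux UAPI; treat them as assumptions. */
#define SKETCH_CLOSE_RANGE_UNSHARE (1U << 1)
#define SKETCH_CLOSE_RANGE_CLOEXEC (1U << 2)

/*
 * Hypothetical flag translation: returns 0 and reports whether close-on-exec
 * was requested, or EINVAL for anything that cannot be emulated faithfully.
 */
static int
sketch_close_range_flags(unsigned int lflags, int *cloexec)
{
    /* Reject UNSHARE rather than silently ignoring it. */
    if ((lflags & SKETCH_CLOSE_RANGE_UNSHARE) != 0)
        return (EINVAL);
    /* Reject any bit this sketch does not know about. */
    if ((lflags & ~SKETCH_CLOSE_RANGE_CLOEXEC) != 0)
        return (EINVAL);
    *cloexec = (lflags & SKETCH_CLOSE_RANGE_CLOEXEC) != 0;
    return (0);
}
```

Failing loudly means a caller that depends on the flag sees an error at the call site, instead of getting different semantics without any indication.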
In D38459#896885, @kib wrote: What is the destiny of these patches? Why have you still not committed them?
ignoring the flag would result in a massive wtf for the unlikely case where someone needs it
ok,
what if we simply ignore the UNSHARE flag and call kern_close_range with flags = 0?
I don't see a problem, as file manipulation is done under FILEDESC_XLOCK.
What is the destiny of these patches? Why have you still not committed them?
Apr 3 2023
done, 10x
that would introduce wtf in debugging, just don't try to mess with the feature
In D39398#896738, @mjg wrote: linux "fd unsharing" is *not* accurately modeled by fdunshare, which is why I said trying to provide one for native binaries would be weird. they unshare at thread level, while freebsd is doing it at process level. It's all legacy crap.
ah, I completely forgot that the Linux kernel does not make a distinction between processes and threads.
I have only read the man page )) which says "Unshare the specified file descriptors from any other processes before closing them, avoiding races with other threads sharing the file descriptor table." I.e., it should be process-wide.
So, calling close_range(UNSHARE) from a thread in a multi-threaded app can cause unexpected behavior? That thread can open a new file, for which the fd will be reused.
linux "fd unsharing" is *not* accurately modeled by fdunshare, which is why I said trying to provide one for native binaries would be weird. they unshare at thread level, while freebsd is doing it at process level. It's all legacy crap.
Apr 2 2023
Apr 1 2023
Done
Indeed, good idea, done
done, 10x
some test result
https://people.freebsd.org/~dchagin/j.txt
Perhaps better to drop visible entirely:
fix linsysfs_if_visible, initialize visible
Handle VNETs