Page MenuHomeFreeBSD

kib (Konstantin Belousov)
User

Projects

User Details

User Since
May 16 2014, 7:35 PM (574 w, 1 d)

Recent Activity

Today

kib added inline comments to D50389: pctrie: use popmap in locked lookup_range.
Sat, May 17, 6:33 PM
kib added a comment to D50371: unix: Restrict dirfds exchanged between jails with a different root.
In D50371#1149975, @kib wrote:

So overall I do not like this, sorry. It makes very delicate semantic change for what is basically an assisted self-induced jail problem. The O_RESOLVE_BENEATH is huge complication to the lookup process already to abuse it even more.

There is no semantic change, I am just adding a "sticky" mode for O_RESOLVE_BENEATH, i.e., extending the existing semantics. It gives capsicum's semantics for a single directory file descriptor. I suspect it could be useful elsewhere.

Sat, May 17, 6:32 PM
kib added inline comments to D50389: pctrie: use popmap in locked lookup_range.
Sat, May 17, 6:14 PM
kib accepted D50392: vm_page: reset iterator after domainset drops lock.
Sat, May 17, 6:08 PM
kib accepted D50394: vm_page_grab_pages: avoid doomed range lookups.
Sat, May 17, 6:07 PM
kib added a comment to D50390: nullfs lookup: looked up vnode must be from the lower mount.

This doesn't fix the problem. There are no mount points involved (except for a nullfs mount point).

The issue involves what happens if 1) a process is using a nullfs vnode as the cwd, 2) something moves the corresponding lower vnode outside of the directory hierarchy bound by the nullfs mount. Suppose that the process does chdir(".."). null_lookup() does the lower vnode lookup, then calls null_nodeget() to find the corresponding nullfs node. There is none (because the lower vnode isn't covered by nullfs anymore), so null_nodeget() instantiates a new vnode and returns it.

Now keep doing that until we reach the process' root directory (i.e., the jail root). We will return a nullfs vnode where the lower vnode is the process root dir. But then the root dir check in vfs_lookup() does not catch it, because it is using pointer equality to check whether the returned vnode is the root.

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

Sat, May 17, 5:59 PM
kib added a comment to D50380: elf.5: Document .init_array and .fini_array.

mention that for dynamic binaries rtld selects which init model to use

We do describe NT_FREEBSD_NOINIT_TAG in here already. If we add more documentation should it be here or in rtld(1) with an explicit cross reference?

Old binaries: .init and .ctors, called by csu
New binaries with NT_FREEBSD_NOINIT_TAG: .init, .ctors, and .init_array, called by rtld

Sat, May 17, 5:11 PM
kib added inline comments to D50375: namei: Update documentation..
Sat, May 17, 7:30 AM
kib added a comment to D50371: unix: Restrict dirfds exchanged between jails with a different root.

So overall I do not like this, sorry. It makes very delicate semantic change for what is basically an assisted self-induced jail problem. The O_RESOLVE_BENEATH is huge complication to the lookup process already to abuse it even more.
I would very much prefer the straight fix of not allowing to externalize dirfd if it comes from a different jail (with nuances about prison0 or equal roots).

Sat, May 17, 6:01 AM
kib requested review of D50390: nullfs lookup: looked up vnode must be from the lower mount.
Sat, May 17, 5:52 AM
kib added a comment to D50376: pthread_switch_add_np(3): remove.

Is an exp-run needed? I guess this is esoteric enough that it's not, although there is a reference in GCC.

Sat, May 17, 4:00 AM
kib added inline comments to D50375: namei: Update documentation..
Sat, May 17, 3:57 AM
kib updated the diff for D50377: libthr: add stable user interface for sigfastblock(2).
Sat, May 17, 3:49 AM
kib added inline comments to D50377: libthr: add stable user interface for sigfastblock(2).
Sat, May 17, 3:48 AM
kib added a comment to D50380: elf.5: Document .init_array and .fini_array.

You might want to mention that for dynamic binaries rtld selects which init model to use based on the presence of the FreeBSD note NOINIT.

Sat, May 17, 3:41 AM

Yesterday

kib updated the diff for D50377: libthr: add stable user interface for sigfastblock(2).

Fix language mistakes in the man page.

Fri, May 16, 3:50 PM
kib requested review of D50377: libthr: add stable user interface for sigfastblock(2).
Fri, May 16, 3:12 PM
kib requested review of D50376: pthread_switch_add_np(3): remove.
Fri, May 16, 3:11 PM
kib added a comment to D50371: unix: Restrict dirfds exchanged between jails with a different root.

IMO the right way is much simpler, and was formulated in the PR: only allow to pass dirfd across the same jail, or from any jail to prison0. I think your addition of allowing to pass dirfd between different jails but having same rootvp is neat.

Fri, May 16, 5:31 AM

Thu, May 15

kib committed rGe9131d5545d5: bhyve aarch64: generate SError exception when accessing non-backed gpa (authored by kib).
bhyve aarch64: generate SError exception when accessing non-backed gpa
Thu, May 15, 2:02 PM
kib committed rGd8c4c2ce68ee: bhyve x86: when accessing non-backed gpa, emulate hw (authored by kib).
bhyve x86: when accessing non-backed gpa, emulate hw
Thu, May 15, 2:02 PM
kib committed rGe8b9d839b1fa: bhyve: make code to handle non-backed memory accesses MD (authored by kib).
bhyve: make code to handle non-backed memory accesses MD
Thu, May 15, 2:02 PM
kib closed D50116: bhyve: when accessing non-backed gpa, emulate hw.
Thu, May 15, 2:02 PM
kib committed rGaf907bb62414: nfs server: only cn_pnbuf is initialized in nfsrvd_lookup() (authored by kib).
nfs server: only cn_pnbuf is initialized in nfsrvd_lookup()
Thu, May 15, 9:13 AM
kib added inline comments to D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.
Thu, May 15, 1:35 AM

Wed, May 14

kib accepted D50353: pctrie_lookup_range: don't write the null.
Wed, May 14, 11:43 PM
kib accepted D50354: libc: Give __thr_jtable protected visibility.
Wed, May 14, 11:28 PM
kib added inline comments to D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.
Wed, May 14, 6:26 AM
kib accepted D50338: vfs: vn_alloc(): Stop always calling vn_alloc_hard() and direct reclaiming.
Wed, May 14, 12:36 AM

Tue, May 13

kib accepted D50333: vm_phys: Ensure that pages are removed from pagequeues before freeing.
Tue, May 13, 11:39 PM
kib accepted D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.

So if talking about changing (bumping) this limit, we probably should just drop the kmem part of the formula for DMAP machines or machines where uma has small_alloc.

Tue, May 13, 5:55 PM
kib added inline comments to D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.
Tue, May 13, 5:16 PM
kib accepted D50317: vfs: Sort includes in vfs_syscalls.c.
Tue, May 13, 4:25 PM
kib accepted D50316: vfs: Sort includes in vfs_vnops.c.
Tue, May 13, 4:24 PM
kib added inline comments to D50315: inotify: Initial revision.
Tue, May 13, 1:33 AM
kib added inline comments to D50317: vfs: Sort includes in vfs_syscalls.c.
Tue, May 13, 1:00 AM
kib added inline comments to D50316: vfs: Sort includes in vfs_vnops.c.
Tue, May 13, 12:58 AM
kib accepted D50263: vm_page: cleanup.
Tue, May 13, 12:51 AM

Mon, May 12

kib added a comment to D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.
In D50310#1147892, @kib wrote:

Having a verbose rewrite of the formula into the text does not help in reviewing the code.

It helps because it makes it clearer what the intentions of who changes the code are, and we can then check whether the actual code matches them.

Exactly the reverse. Your comment repeats the code and does not say anything about a possible intent.

Mon, May 12, 6:58 PM
kib added a comment to D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.
In D50310#1147813, @kib wrote:

What is the point of making comment repeating the code in even more verbose way?

The point of the new message is to clearly spell out in plain English what the formulae mean in more "physical" terms and be extra-clear about the computation (in particular, the 98,304 cutoff is not on the final number of vnodes, and the growth rate stated is in fact not the global one, both because the previous comments do not take the maxproc contribution into account). The computation here is not overly complex, but I still think the text has value. This is a pre-requisite either for revising it (in a very minor way now, for an alternative where the number of vnodes is raised instead) or just helping evaluate it quickly where it stands asymptotically with respect to maxfiles. In particular, I think it is important to state which formula actually applies and when.

This would make the comment inconsistent with the actual code on the next whatever minor change.

There are no more numbers in the new description than in the previous one, except for the last paragraph (see below).

While I agree with your concern in general, for a system-wide setting like this, I certainly don't want anyone to revise these numbers without proper review and updates of the surrounding comments.

Having a verbose rewrite of the formula into the text does not help in reviewing the code.

Mon, May 12, 4:27 PM
kib added a comment to D50125: kern.maxfiles: Clamp it to 'kern.maxvnodes'.

So the NFS server in particular doesn't break this mechanism? Certainly a number of other subsystems will hold a vnode's usecount > 0 without having an open file handle: mdconfig -f vnode, ktrace, swapon, ...

For mdconfig and swapon, AFAIK, the number of such vnodes is fairly limited. ktrace I don't know. I'm actually the most worried about nullfs. NFS has a similar problem. So, yes, these are definitely dents in the mitigation presented here. I don't think it means we should not apply it though.

Mon, May 12, 4:21 PM
kib added a comment to D50125: kern.maxfiles: Clamp it to 'kern.maxvnodes'.

Thinking about it some more, from my POV the bug is that we are applying a resource limit (maxvnodes) when there is no actual shortage (of memory).

Mon, May 12, 3:14 PM
kib added a comment to D50311: Lower default 'maxfiles' to stay under default 'maxvnodes'.

The objections to the previous review are equally applicable to this one.

Mon, May 12, 2:06 PM
kib added a comment to D50310: vfs: vntblinit(): Rework comments on the computation of 'kern.maxvnodes'.

What is the point of making comment repeating the code in even more verbose way? This would make the comment inconsistent with the actual code on the next whatever minor change.

Mon, May 12, 2:01 PM

Sat, May 10

kib committed rG97312e0b46de: rc.conf.5: document precious_machine (authored by kib).
rc.conf.5: document precious_machine
Sat, May 10, 7:34 PM
kib committed rG8cb3670fb90e: rc.d: Add precious_machine rc.conf knob to create /var/run/noshutdown (authored by kib).
rc.d: Add precious_machine rc.conf knob to create /var/run/noshutdown
Sat, May 10, 7:34 PM
kib committed rG16f17042683b: shutdown(8): refuse to run if /var/run/noshutdown is present (authored by kib).
shutdown(8): refuse to run if /var/run/noshutdown is present
Sat, May 10, 7:34 PM
kib committed rGeb8cf785e441: libthr.so: mark as -z initfirst (authored by kib).
libthr.so: mark as -z initfirst
Sat, May 10, 7:34 PM
kib committed rG29b28e380c14: include/paths.h: add _PATH_NOSHUTDOWN (authored by kib).
include/paths.h: add _PATH_NOSHUTDOWN
Sat, May 10, 7:34 PM
kib committed rG6d6fdcbdd87f: rtld: remove stray tabs (authored by kib).
rtld: remove stray tabs
Sat, May 10, 7:34 PM
kib committed rGf31acb9d5cb2: rtld: add support for -z initfirst (authored by kib).
rtld: add support for -z initfirst
Sat, May 10, 7:34 PM
kib committed rGeccd076e8e71: sys/elf_common.h: add the DF_1_INITFIRST flag definition (authored by kib).
sys/elf_common.h: add the DF_1_INITFIRST flag definition
Sat, May 10, 7:34 PM
kib committed rG7865d159c5e1: thr_rtld: accept read lock requests while owning the lock for write (authored by kib).
thr_rtld: accept read lock requests while owning the lock for write
Sat, May 10, 7:34 PM
kib committed rG5f195cb44bee: rtld: do not call into ifunc resolvers with the bind lock write-locked (authored by kib).
rtld: do not call into ifunc resolvers with the bind lock write-locked
Sat, May 10, 7:33 PM
kib committed rGc8f21c33fd46: rtld: add lockstate_wlocked() (authored by kib).
rtld: add lockstate_wlocked()
Sat, May 10, 7:33 PM
kib committed rG6286f5df3e4a: sys/queue.h: add STAILQ_REVERSE (authored by kib).
sys/queue.h: add STAILQ_REVERSE
Sat, May 10, 7:33 PM
kib committed rG34aae0732122: sys/queue.h: use reserved identifiers with the file scope for locals (authored by kib).
sys/queue.h: use reserved identifiers with the file scope for locals
Sat, May 10, 7:33 PM
kib committed rGaa7487ee9537: queue.3: document STAILQ_REVERSE (authored by kib).
queue.3: document STAILQ_REVERSE
Sat, May 10, 7:33 PM

Fri, May 9

kib accepted D50275: vm_page: Don't create a cache zone for the lazyinit freepool.
Fri, May 9, 8:53 PM
kib updated the diff for D50116: bhyve: when accessing non-backed gpa, emulate hw.

Implement andrew' suggestion.

Fri, May 9, 10:22 AM
kib updated the diff for D50116: bhyve: when accessing non-backed gpa, emulate hw.

Add needed headers for mem_aarch64.c

Fri, May 9, 10:03 AM
kib updated the diff for D50116: bhyve: when accessing non-backed gpa, emulate hw.

On armv8, generate simple-minded SError.

Fri, May 9, 9:11 AM
kib accepted D50249: vm_page: reorder declarations in header file.
Fri, May 9, 7:09 AM

Thu, May 8

kib committed rG33759fc7e823: rtld: remove stray tabs (authored by kib).
rtld: remove stray tabs
Thu, May 8, 11:57 PM

Wed, May 7

kib accepted D50248: vm_page: make iter_insert() public.
Wed, May 7, 11:18 PM
kib accepted D50228: libc: Use struct tcb * rather than uintptr_t ** for the tcb.
Wed, May 7, 6:38 PM
kib accepted D50227: libc: Use variables more consistent with Variant I for Variant II TLS.
Wed, May 7, 6:38 PM
kib accepted D50225: libc: Consistently use uintptr_t for TLS implementation.
Wed, May 7, 6:37 PM
kib added a comment to D50226: rtld-elf: Consistently use uintptr_t for TLS implementation.
In D50226#1145838, @kib wrote:
In D50226#1145753, @kib wrote:

For me, Elf_Addr is more natural type synonym to use in the rtld code. Also, it gives us the luxury of Elf32/64_Addr typedefs, so if needed, multilib-like rtld has an easier way to express itself.

Well, Elf_Addr is about the format of addresses within the ELF file, matching the default ELF class for the current ABI. Technically that does not have to match the format of addresses at run time (LP64 IA-64 used both ELFCLASS32 and ELFCLASS64; LP46 vs ILP32 was an orthogonal EF_IA_64_ABI64), though you probably shouldn't build a system that does that these days, and of course it definitely doesn't match the format of pointers for CHERI. The values here are not addresses in ELF files, they are pointers at run time, so they should use pointer types, and using anything other than (u)intptr_t or T * does not work on CHERI. We've already had to change this code downstream and whilst refactoring things upstream I'd like to take the time to use CHERI-compatible types rather than write new(ish) code that I know does not work for CHERI.

I think this opinion should be stated in the change' summary.

Thanks. Are you happy with the updated description?

Wed, May 7, 6:22 PM
kib accepted D50226: rtld-elf: Consistently use uintptr_t for TLS implementation.
In D50226#1145753, @kib wrote:

For me, Elf_Addr is more natural type synonym to use in the rtld code. Also, it gives us the luxury of Elf32/64_Addr typedefs, so if needed, multilib-like rtld has an easier way to express itself.

Well, Elf_Addr is about the format of addresses within the ELF file, matching the default ELF class for the current ABI. Technically that does not have to match the format of addresses at run time (LP64 IA-64 used both ELFCLASS32 and ELFCLASS64; LP46 vs ILP32 was an orthogonal EF_IA_64_ABI64), though you probably shouldn't build a system that does that these days, and of course it definitely doesn't match the format of pointers for CHERI. The values here are not addresses in ELF files, they are pointers at run time, so they should use pointer types, and using anything other than (u)intptr_t or T * does not work on CHERI. We've already had to change this code downstream and whilst refactoring things upstream I'd like to take the time to use CHERI-compatible types rather than write new(ish) code that I know does not work for CHERI.

Wed, May 7, 5:42 PM
kib accepted D50231: tls: Introduce struct dtv and struct dtv_slot.
Wed, May 7, 5:32 PM
kib accepted D50224: libc: Reassociate pointer arithmetic in __libc_tls_get_addr.
Wed, May 7, 4:29 PM
kib committed rGa130a604a4fb: rc.conf.5: document precious_machine (authored by kib).
rc.conf.5: document precious_machine
Wed, May 7, 2:46 PM
kib closed D50215: rc.conf.5: document precious_machine.
Wed, May 7, 2:46 PM
kib added inline comments to D50231: tls: Introduce struct dtv and struct dtv_slot.
Wed, May 7, 2:42 PM
kib accepted D50230: rtld-elf: Use clear pointer provenance when updating DTV pointer.
Wed, May 7, 2:09 PM
kib added a comment to D50226: rtld-elf: Consistently use uintptr_t for TLS implementation.

For me, Elf_Addr is more natural type synonym to use in the rtld code. Also, it gives us the luxury of Elf32/64_Addr typedefs, so if needed, multilib-like rtld has an easier way to express itself.

Wed, May 7, 2:07 PM
kib accepted D50232: rtld-elf: Use variables more consistent with Variant I for Variant II TLS.
Wed, May 7, 2:06 PM
kib accepted D50229: rtld-elf: Use struct tcb * rather than uintptr_t ** for the tcb.

(I understand that this change in its current form depends on two other rtld reviews)

Wed, May 7, 2:03 PM
kib accepted D50192: linuxkpi: Fix up jiffies handling.
Wed, May 7, 1:34 PM

Tue, May 6

kib committed rGc15cdaf50584: sys/queue.h: use reserved identifiers with the file scope for locals (authored by kib).
sys/queue.h: use reserved identifiers with the file scope for locals
Tue, May 6, 10:21 PM
kib closed D50216: sys/queue.h: use reserved identifiers with the file scope for locals.
Tue, May 6, 10:21 PM
kib accepted D50185: rtld-elf: Fix UB for direct exec with no extra rtld arguments.
Tue, May 6, 9:45 PM
kib accepted D50187: rtld-elf: Push TLS_DTV_OFFSET into tls_get_addr_common's arguments.
Tue, May 6, 9:42 PM
kib accepted D50184: rtld-elf: Fix dl_iterate_phdr's dlpi_tls_data for PowerPC and RISC-V.
Tue, May 6, 9:41 PM
kib accepted D50183: rtld-elf: Fix dlsym(3) for TLS symbols on PowerPC and RISC-V.
Tue, May 6, 9:40 PM
kib accepted D50181: libc: Don't bias DTV entries by TLS_DTV_OFFSET.
Tue, May 6, 9:37 PM
kib accepted D50182: libc: Fix dl_iterate_phdr's dlpi_tls_data for PowerPC and RISC-V.
Tue, May 6, 9:34 PM
kib accepted D50199: vm_page: drop mpred param from insert_lookup.
Tue, May 6, 8:21 PM
kib added inline comments to D50124: Compute 'maxproc'/'maxfiles' from memory amount; Expand/fix comments.
Tue, May 6, 7:44 PM
kib added a comment to D50216: sys/queue.h: use reserved identifiers with the file scope for locals.

For me __curelm is a bit more readable than _Curelm, but I'm fine with this versino as well.

Tue, May 6, 6:39 PM
kib requested review of D50216: sys/queue.h: use reserved identifiers with the file scope for locals.
Tue, May 6, 6:17 PM
kib accepted D50121: sysctl(9): Ease exporting struct sizes; Discourage doing that.
Tue, May 6, 5:42 PM
kib requested review of D50215: rc.conf.5: document precious_machine.
Tue, May 6, 5:40 PM
kib accepted D50192: linuxkpi: Fix up jiffies handling.
Tue, May 6, 5:33 PM
kib accepted D50186: rtld-elf: Fix executable's TLS module index for direct exec.
Tue, May 6, 5:27 PM
kib committed rG7c6a5272f1de: libthr.so: mark as -z initfirst (authored by kib).
libthr.so: mark as -z initfirst
Tue, May 6, 5:16 PM
kib committed rG78aaab9f1cf3: rtld: add support for -z initfirst (authored by kib).
rtld: add support for -z initfirst
Tue, May 6, 5:16 PM
kib committed rG31a248318023: sys/elf_common.h: add the DF_1_INITFIRST flag definition (authored by kib).
sys/elf_common.h: add the DF_1_INITFIRST flag definition
Tue, May 6, 5:16 PM
kib closed D50133: libthr.so: mark as -z initfirst.
Tue, May 6, 5:16 PM