kib (Konstantin Belousov)

User Since
May 16 2014, 7:35 PM (213 w, 6 d)

Recent Activity

Today

kib added a comment to D15910: Avoid reclaiming the PV entry for a VA when updating the VA's PTE.
In D15910#338114, @alc wrote:
In D15910#338113, @alc wrote:

Isn't this invalid if both the old and new mappings are wired? Or should it not be possible

... for that situation to arise when the mappings are to different physical pages?

To be clear, I'm assuming that the problem that Kostik is worried about is having "spurious" faults on mlock()ed memory. However, when we wire mappings (and the underlying physical pages) during mlock(), we simulate the COW faults. So, in the normal case of mlock(), I assert that there is not a problem. I speculate that setting a breakpoint in the code of an mlockall()ed application might be the one scenario where a temporarily zeroed PTE could occur. Do I need to argue that a spurious page fault in that scenario isn't really of concern? :-)

And for completeness, in the case of a fork of an mlockall()ed application, we preemptively copy all writeable data. (During the fork, other threads are paused from executing, right?)

Fri, Jun 22, 6:23 PM
kib added a comment to D15910: Avoid reclaiming the PV entry for a VA when updating the VA's PTE.
In D15910#338067, @alc wrote:
In D15910#337632, @kib wrote:
In D15910#337582, @alc wrote:
  1. In regards to pmap_enter(), we should aim to kill two birds with one stone. Recall the copy-on-write mapping bug that Kostik worked around in vm_fault(). I say worked around because the root cause is here in pmap_enter(). When the physical page mapped at va is changing, pmap_enter() should destroy the old mapping before creating the new one. Once pmap_enter() is restructured in this way, you can simply recycle the old mapping's PV entry.

Can you elaborate, please? What do you mean by destroying the old mapping? In particular, do you mean storing a PTE with PG_V cleared into the PTE that is changing?

Yes, I mean destroying the PTE, including the TLB shootdown. In effect, the PTE will briefly be 0. Then, installing the new PTE is just a pte_store(). All of the work that we currently perform under "if ((origpte & PG_V) != 0) {" will already have been done when we destroyed the PTE.
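A toy user-space model of the destroy-then-create ordering described above may make the PV entry recycling concrete. Everything in it (toy_pmap_replace(), pv_unlink(), the struct layouts, the PG_FRAME mask, and the tlb_shootdowns counter standing in for a real shootdown) is hypothetical and far simpler than the real amd64 pmap; only PG_V and pte_store() are named in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not the FreeBSD amd64 pmap. */
#define PG_V     0x1ULL          /* valid bit */
#define PG_FRAME (~0xfffULL)     /* physical-address bits of a PTE */

struct pv_entry {
	uintptr_t        pv_va;      /* virtual address of this mapping */
	struct pv_entry *pv_next;    /* next mapping of the same page */
};

struct phys_page {
	uint64_t         pa;         /* physical address of the page */
	struct pv_entry *pv_list;    /* every mapping of this page */
};

static unsigned tlb_shootdowns;  /* counts simulated TLB invalidations */

/* Remove pv from the PV list of page m. */
static void
pv_unlink(struct phys_page *m, struct pv_entry *pv)
{
	struct pv_entry **pp;

	for (pp = &m->pv_list; *pp != NULL; pp = &(*pp)->pv_next) {
		if (*pp == pv) {
			*pp = pv->pv_next;
			pv->pv_next = NULL;
			return;
		}
	}
	assert(0 && "PV entry not on the page's list");
}

/*
 * Change the physical page mapped by *pte from old_m to new_m.
 * The old mapping is destroyed first, so its PV entry can be
 * recycled for the new page instead of allocating a fresh one.
 */
static void
toy_pmap_replace(uint64_t *pte, struct phys_page *old_m,
    struct pv_entry *pv, struct phys_page *new_m)
{
	assert((*pte & PG_V) != 0 && old_m != new_m);

	/* 1. Destroy the old mapping: the PTE is briefly zero. */
	*pte = 0;
	tlb_shootdowns++;            /* stands in for a real shootdown */
	pv_unlink(old_m, pv);

	/* 2. Recycle the PV entry; pv_va is unchanged (same VA). */
	pv->pv_next = new_m->pv_list;
	new_m->pv_list = pv;

	/* 3. Install the new mapping with a plain store (cf. pte_store()). */
	*pte = (new_m->pa & PG_FRAME) | PG_V;
}
```

Because the old PV entry is unlinked before the new PTE is written, the same entry can be reused for the new page and no second PV entry has to be allocated in this model.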

Fri, Jun 22, 5:54 PM
kib accepted D15911: Re-count available PV entries after allocating a new chunk.
Fri, Jun 22, 11:25 AM
kib committed rS335548: MFC r335199:
MFC r335199:
Fri, Jun 22, 10:07 AM
kib accepted D15957: dirent is now 8-byte aligned after ino64.

As I said, I believe it is functionally fine.

Fri, Jun 22, 8:56 AM

Yesterday

kib added a comment to D15957: dirent is now 8-byte aligned after ino64.

This is fine.

Thu, Jun 21, 10:35 PM
kib added inline comments to D14567: Introduce fdunlinkat.
Thu, Jun 21, 9:37 PM
kib committed rS335508: MFC r335171:
MFC r335171:
Thu, Jun 21, 9:21 PM
kib committed rS335507: MFC r335135:
MFC r335135:
Thu, Jun 21, 9:20 PM
kib committed rS335505: linux_clone_thread: mark new thread as TDB_BORN.
linux_clone_thread: mark new thread as TDB_BORN.
Thu, Jun 21, 9:15 PM
kib closed D15880: linux_clone_thread: mark new thread as TDB_BORN
Thu, Jun 21, 9:15 PM
kib committed rS335504: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
Thu, Jun 21, 9:13 PM
kib closed D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED
Thu, Jun 21, 9:13 PM
kib closed D15954: update proc->p_ptevents annotation
Thu, Jun 21, 9:07 PM
kib committed rS335503: Update proc->p_ptevents annotation to reflect the actual locking.
Update proc->p_ptevents annotation to reflect the actual locking.
Thu, Jun 21, 9:07 PM
kib accepted D15954: update proc->p_ptevents annotation.
Thu, Jun 21, 5:25 PM
kib added a reviewer for D15954: update proc->p_ptevents annotation: jhb.
Thu, Jun 21, 5:25 PM
kib accepted D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.

It looks fine to me; I will wait some time for John's opinion.

Thu, Jun 21, 5:24 PM
kib added a comment to D15910: Avoid reclaiming the PV entry for a VA when updating the VA's PTE.
In D15910#337582, @alc wrote:
  1. In regards to pmap_enter(), we should aim to kill two birds with one stone. Recall the copy-on-write mapping bug that Kostik worked around in vm_fault(). I say worked around because the root cause is here in pmap_enter(). When the physical page mapped at va is changing, pmap_enter() should destroy the old mapping before creating the new one. Once pmap_enter() is restructured in this way, you can simply recycle the old mapping's PV entry.
Thu, Jun 21, 5:20 PM
kib added inline comments to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
Thu, Jun 21, 2:28 PM
kib added inline comments to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
Thu, Jun 21, 1:01 PM
kib added inline comments to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
Thu, Jun 21, 10:20 AM
kib added a comment to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.

Other than two notes I put inline, the patch looks fine to me.

Thu, Jun 21, 9:22 AM

Wed, Jun 20

kib added a comment to D15814: Get rid of netbsd_lchown and netbsd_msync syscall entries.

You can create symbols that are exported but not linkable, because they do not provide a default version. Such a symbol can only be created with the asm '@' syntax, and it should be removed from the version map. Also, I do not see the point of leaving the private symbols around.

Wed, Jun 20, 9:25 PM
kib committed rS335455: MFC r335072, r335089, r335131, r335132:
MFC r335072, r335089, r335131, r335132:
Wed, Jun 20, 6:51 PM
kib committed rS335453: MFC r332994 (by tychon):
MFC r332994 (by tychon):
Wed, Jun 20, 5:38 PM
kib added inline comments to D15905: safer wait-free iteration of shared interrupt handlers.
Wed, Jun 20, 4:47 PM
kib added a comment to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.

I think John's idea was to move the block that sets the has_ptrace_fork variable to true down to the code that acts on its true value, in effect eliminating the variable and removing one if().

OK, but as far as I can tell, we will need p1's PROC_LOCK or the proctree_lock in order to check the PTRACE_FORK flag. Am I missing something obvious again? :)

Wed, Jun 20, 4:12 PM
kib added a comment to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.

Address two of jhb's comments.

I have no idea about moving other code around.

Wed, Jun 20, 1:49 PM
kib added inline comments to D15914: Split up deadlkres() to make it more readable.
Wed, Jun 20, 1:30 PM
kib added a comment to D15924: Fix circular reaper dependency after r275800.

This is somewhat orthogonal, but since you are making init a proper child of proc0, shouldn't proc0 get the P_TREE_REAPER flag? Otherwise, a dying init would confuse the reaping code. We do sometimes allow init(8) to die without inducing a panic.

Wed, Jun 20, 12:10 PM
kib accepted D15916: More consistently use FOREACH_PROC_IN_SYSTEM().
Wed, Jun 20, 10:49 AM
kib accepted D15911: Re-count available PV entries after allocating a new chunk.
In D15911#336659, @kib wrote:

This should be fine.

Can we return an indicator of whether the page was freed from the locked pmap, and retry only in that case?

We can. I implemented it this way originally, but found it a bit ugly, and strictly speaking we only need to retry if the chunk was freed from the locked pmap *and* it contained free entries. That is, even with your suggestion we may still retry when it is not necessary. If you prefer that approach I'll implement it instead, but I mildly prefer this patch because it's simpler than the alternative.

Wed, Jun 20, 10:47 AM

Tue, Jun 19

kib accepted D15910: Avoid reclaiming the PV entry for a VA when updating the VA's PTE.
Tue, Jun 19, 8:48 PM
kib added a comment to D15911: Re-count available PV entries after allocating a new chunk.

This should be fine.

Tue, Jun 19, 8:42 PM

Mon, Jun 18

kib added inline comments to D14567: Introduce fdunlinkat.
Mon, Jun 18, 9:00 PM
kib added a comment to D15570: Virtualization of basic variables and locks for jail+vps.
In D15570#335994, @bz wrote:
In D15570#335980, @kib wrote:
In D15570#335888, @bz wrote:
In D15570#328824, @kib wrote:

Grep for FOREACH_PROC_IN_SYSTEM() to start. In fact, these places are already visible in the patch because they require allproc_lock.

Yeah I got that bit. I was more wondering about ..

But the more important question still stands. Some of the uses of the global lists should go, but some, esp. for VM, probably must stay. As I noted, this requires preliminary architectural discussion.

.. exactly one of these, esp. for VM, where it must stay. Can you point out one specific example?

In the network stack we have VNET_FOREACH() as an outer iterator over all virtual (network stack) instances, and can then iterate over all per-virtual-instance lists. I am wondering if the same could be done here.
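A minimal sketch of the nested iteration proposed above, assuming each virtual instance keeps its own process list. TOY_VPS_FOREACH() and TOY_FOREACH_PROC_IN_VPS() are hypothetical stand-ins modeled loosely on VNET_FOREACH() and FOREACH_PROC_IN_SYSTEM(). Locking is deliberately omitted, and that omission is exactly the open question in this thread: a consistent system-wide pass would need either a global lock or all of the per-instance locks.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct proc and a per-VPS instance. */
struct toy_proc {
	int              p_pid;
	struct toy_proc *p_next;
};

struct toy_vps {
	struct toy_proc *v_procs;    /* this instance's process list */
	struct toy_vps  *v_next;     /* next virtual instance */
};

/* Outer iterator over all instances, in the spirit of VNET_FOREACH(). */
#define TOY_VPS_FOREACH(v, head) \
	for ((v) = (head); (v) != NULL; (v) = (v)->v_next)

/* Inner iterator over one instance's processes. */
#define TOY_FOREACH_PROC_IN_VPS(p, v) \
	for ((p) = (v)->v_procs; (p) != NULL; (p) = (p)->p_next)

/*
 * System-wide pass over every process, built from the two iterators.
 * No locks are taken here; a real kernel would have to decide which
 * lock(s) make this walk consistent.
 */
static int
toy_count_all_procs(struct toy_vps *head)
{
	struct toy_vps *v;
	struct toy_proc *p;
	int n = 0;

	TOY_VPS_FOREACH(v, head)
		TOY_FOREACH_PROC_IN_VPS(p, v)
			n++;
	return (n);
}
```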

Mon, Jun 18, 6:30 PM
kib added a comment to D15570: Virtualization of basic variables and locks for jail+vps.
In D15570#335888, @bz wrote:
In D15570#328824, @kib wrote:

I have no idea about the global plan and most of the implementation details, so I am replying from a common-sense point of view. I am not at all sure about splitting the allproc_lock and proctree_lock into per-VPS locks. We often need to iterate over all processes in the system, e.g. to make a decision about the global memory subsystem state. In that case, the global process list and the global allproc_lock are right, but per-VPS lists and locks are not, unless you also partition physical memory (and I doubt that makes sense).

In other words, there still must be a global lock, and there probably should be per-container locks. But this requires an initial architectural discussion and agreement on the major points, which would allow such decisions to be made later without much reconsideration.

Can you give me a good example where we currently do this? The iterations over all processes won't really work anymore given there's no single all-processes list anymore.
I'd like to have a look at a good example (as you mention the memory subsystem maybe there) so I can give you a more informed description for discussion.

Grep for FOREACH_PROC_IN_SYSTEM() to start. In fact, these places are already visible in the patch because they require allproc_lock.

Mon, Jun 18, 5:43 PM
kib added inline comments to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
Mon, Jun 18, 5:35 PM
kib accepted D15880: linux_clone_thread: mark new thread as TDB_BORN.
Mon, Jun 18, 1:05 PM
kib added inline comments to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
Mon, Jun 18, 1:05 PM
kib added a comment to D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.

I was not able to imagine a case which is broken by this simplification. Also, the ptrace_test works with the patch.

Mon, Jun 18, 9:38 AM
kib added a reviewer for D15857: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED: jhb.
Mon, Jun 18, 9:33 AM

Sat, Jun 16

kib committed rS335258: Remove unused file.
Remove unused file.
Sat, Jun 16, 5:11 PM
kib committed rS335257: Remove some empty directories
Remove some empty directories
Sat, Jun 16, 4:16 PM
kib committed rS335253: Rework ofed build.
Rework ofed build.
Sat, Jun 16, 3:05 PM
kib closed D15648: Rework ofed build.
Sat, Jun 16, 3:05 PM
kib added a comment to D15823: Linuxlator enable ptrace.

I checked the rest of the RFSTOPPED use points and applying the change seems trivial. So if the general idea is acceptable I can apply the changes to all the places and upload a new patch.

Right now I think that this is a workable approach.

Sat, Jun 16, 2:57 PM · Linux Emulation
kib added a comment to D15823: Linuxlator enable ptrace.

Please generate diffs with full context when you upload them to Phabricator, e.g. svn diff -x -U999999 for svn, or git diff -U999999 for git.

Sat, Jun 16, 11:11 AM · Linux Emulation

Fri, Jun 15

kib added a comment to D15648: Rework ofed build.

Just make sure buildworld WITH_OFED -jXXX does not break.

Fri, Jun 15, 3:42 PM
kib added inline comments to D15802: Permit the kernel environment to set an array of numeric values for a single sysctl(9) node.
Fri, Jun 15, 3:09 PM
kib committed rS335199: linprocfs: add TracerPid to /proc/pid/status.
linprocfs: add TracerPid to /proc/pid/status.
Fri, Jun 15, 1:57 PM
kib committed rS335196: MFC rr335072, r335089:
MFC rr335072, r335089:
Fri, Jun 15, 1:22 PM
kib added a comment to D15648: Rework ofed build..

Tinderbox and installworld tests passed with WITH_OFED=yes. I think that the patch is ready to go.

Fri, Jun 15, 11:13 AM
kib added a comment to D15816: Normalize COMPAT_43 syscall declarations.

Why not remove SYSPROTO for all of them instead of patching? It is not useful for current syscalls, and obviously even less so for compat.

Fri, Jun 15, 8:35 AM
kib added a comment to D15814: Get rid of netbsd_lchown and netbsd_msync syscall entries.

I believe that stubs which return ENOSYS are fine, without redirecting to the syscalls. Moreover, the stubs do not need to provide a default version, so that linking against these symbols will no longer be possible.
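A minimal sketch of such a stub, assuming the retired entries only need to fail cleanly. The function names toy_netbsd_lchown() and toy_netbsd_msync() are hypothetical. The symbol-version binding mentioned in this thread (the asm '@' form, which binds a non-default version) is left out, because it only takes effect when building a shared library against a version map.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical ENOSYS stubs for retired syscall entries.  In a real
 * libc these would additionally be bound to a non-default symbol
 * version (asm ".symver name, name@VERSION"), so that old binaries keep
 * resolving them while new code cannot link against them.
 */
int
toy_netbsd_lchown(const char *path, uid_t owner, gid_t group)
{
	(void)path;
	(void)owner;
	(void)group;
	errno = ENOSYS;            /* report "function not implemented" */
	return (-1);
}

int
toy_netbsd_msync(void *addr, size_t len, int flags)
{
	(void)addr;
	(void)len;
	(void)flags;
	errno = ENOSYS;
	return (-1);
}
```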

Fri, Jun 15, 8:32 AM

Thu, Jun 14

kib accepted D15814: Get rid of netbsd_lchown and netbsd_msync syscall entries.
Thu, Jun 14, 11:17 PM
kib updated the diff for D15648: Rework ofed build.

Fix libpcap dependency on libmlx5

Thu, Jun 14, 10:32 PM
kib closed D15293: Handle the race between fork/vm_object_split() and faults.
Thu, Jun 14, 7:42 PM
kib added 1 commit(s) for D15293: Handle the race between fork/vm_object_split() and faults.: rS335171: Handle the race between fork/vm_object_split() and faults.
Thu, Jun 14, 7:42 PM
kib added an edge to rS335171: Handle the race between fork/vm_object_split() and faults.: D15293: Handle the race between fork/vm_object_split() and faults.
Thu, Jun 14, 7:42 PM
kib committed rS335171: Handle the race between fork/vm_object_split() and faults.
Handle the race between fork/vm_object_split() and faults.
Thu, Jun 14, 7:41 PM
kib committed rS335169: MFC r335089:
MFC r335089:
Thu, Jun 14, 6:50 PM
kib accepted D15809: proc0_post: Fix some locking issues.
Thu, Jun 14, 6:38 PM
kib added a comment to D15809: proc0_post: Fix some locking issues.
In D15809#334292, @mjg wrote:

What is the purpose of this code to begin with? It looks like it should just be removed. If it is needed (what for?), it probably has to run after all initial forking is finished.

The code makes the start times of the early processes consistent with their rusage.

Thu, Jun 14, 6:38 PM
kib added inline comments to D15809: proc0_post: Fix some locking issues.
Thu, Jun 14, 5:25 PM
kib accepted D15802: Permit the kernel environment to set an array of numeric values for a single sysctl(9) node.
Thu, Jun 14, 4:58 PM
kib updated the diff for D15648: Rework ofed build.

Handle Brian's notes.

Thu, Jun 14, 4:42 PM
kib added inline comments to D15648: Rework ofed build.
Thu, Jun 14, 4:39 PM
kib added inline comments to D15802: Permit the kernel environment to set an array of numeric values for a single sysctl(9) node.
Thu, Jun 14, 3:39 PM
kib added inline comments to D15802: Permit the kernel environment to set an array of numeric values for a single sysctl(9) node.
Thu, Jun 14, 2:21 PM
kib added inline comments to D15583: Mmap device BAR into userspace.
Thu, Jun 14, 2:11 PM
kib updated the diff for D15583: Mmap device BAR into userspace.

Add optional start:count for pciconf -D.
Update pciio(4) and pciconf(8) man pages.

Thu, Jun 14, 2:06 PM
kib committed rS335135: linuxolator/amd64: Don't mangle %r10 on return from syscall for EJUSTRETURN.
linuxolator/amd64: Don't mangle %r10 on return from syscall for EJUSTRETURN.
Thu, Jun 14, 12:36 PM
kib committed rS335132: Reorganize code flow in fpudna()/npxdna() to highlight the critical.
Reorganize code flow in fpudna()/npxdna() to highlight the critical
Thu, Jun 14, 11:10 AM
kib committed rS335131: Remove printf() in #NM handler.
Remove printf() in #NM handler.
Thu, Jun 14, 10:33 AM

Wed, Jun 13

kib committed rS335089: Enable eager FPU context switch by default on i386 too, based on.
Enable eager FPU context switch by default on i386 too, based on
Wed, Jun 13, 9:10 PM
kib committed rS335090: MFC r335072:
MFC r335072:
Wed, Jun 13, 9:10 PM
kib committed rS335072: Enable eager FPU context switch by default on amd64.
Enable eager FPU context switch by default on amd64.
Wed, Jun 13, 5:55 PM
kib added a comment to D15293: Handle the race between fork/vm_object_split() and faults.

Peter, could you please test this patch? I do not think that it is feasible to reproduce the original problem, but generic testing would be useful.

Wed, Jun 13, 4:10 PM
kib updated subscribers of D15293: Handle the race between fork/vm_object_split() and faults.
Wed, Jun 13, 4:10 PM
kib updated the diff for D15293: Handle the race between fork/vm_object_split() and faults.

Remove use of vm_page_next(). Add comment.

Wed, Jun 13, 9:53 AM

Tue, Jun 12

kib updated the diff for D15293: Handle the race between fork/vm_object_split() and faults.

Fixed a bug in the iteration without the lock. Implemented other suggestions too.

Tue, Jun 12, 10:37 PM
kib updated the diff for D15293: Handle the race between fork/vm_object_split() and faults.

Swap the patch for the swap_pager.c patch.

Tue, Jun 12, 9:50 PM
kib added a comment to D15293: Handle the race between fork/vm_object_split() and faults.
In D15293#333512, @alc wrote:

If we delay the allocation of the ahead pages until after the buffer is allocated, in other words, after we have dropped and reacquired the object lock, then I believe that we can still use the maxahead result from the swap_pager_haspages() call that was performed before the object lock was dropped. If any of that swap space was invalidated while the object was unlocked, there should now be a resident page at that pindex that stops us from attempting to read from the now invalid swap space.
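The clipping rule relied on above can be modeled in a few lines: the read-ahead count obtained before the object lock was dropped is kept, but the run stops at the first page that has become resident in the meantime. This is a hypothetical user-space model; clip_ahead() and the resident[] array (indexed by pindex) are illustrative names, not the swap pager's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: resident[i] is true if a page is resident at
 * pindex i of an object with npages pages.  maxahead is the count a
 * swap_pager_haspages()-style query returned before the object lock
 * was dropped.  After relocking, stop at the first resident page,
 * since its swap block may have been invalidated meanwhile.
 */
static int
clip_ahead(const bool *resident, size_t npages, size_t pindex, int maxahead)
{
	int i;

	assert(pindex < npages && maxahead >= 0);
	for (i = 0; i < maxahead; i++) {
		size_t idx = pindex + 1 + (size_t)i;

		if (idx >= npages || resident[idx])
			break;
	}
	return (i);
}
```

For example, with a resident page at pindex 4, a fault at pindex 1 with maxahead 5 reads ahead only pages 2 and 3.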

Tue, Jun 12, 8:59 PM
kib added a comment to D15293: Handle the race between fork/vm_object_split() and faults.
In D15293#333432, @alc wrote:

For what it's worth, I believe that this race is fallout from the pager interface changes for sendfile. Before those changes, we would have adjusted readbehind and performed the page allocations before calling the pager and releasing the object lock.

Tue, Jun 12, 5:32 PM
kib added a comment to D15755: add support for marking interrupt handlers as suspended.

Might it be useful to get an acknowledgment from the interrupt thread that it has seen the IH_SUSP flag, so that the device is not powered down while the handler is still running?
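One way to read the suggested handshake, sketched with C11 atomics in user space: the suspender sets IH_SUSP, and the interrupt thread acknowledges the flag on its next pass. Because, in this model, the thread runs the handler one pass at a time, an acknowledgment implies that no earlier run of the handler body is still in progress. The structure, the function names, and the IH_SUSP value are hypothetical; only the flag's name comes from the review.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define IH_SUSP 0x1               /* hypothetical value */

struct toy_ihandler {
	atomic_int  ih_flags;
	atomic_bool ih_susp_acked;    /* set by the interrupt thread */
	int         ih_runs;          /* how often the handler body ran */
};

static void
toy_ih_init(struct toy_ihandler *ih)
{
	atomic_init(&ih->ih_flags, 0);
	atomic_init(&ih->ih_susp_acked, false);
	ih->ih_runs = 0;
}

/* Suspender side: clear any stale ack, then request suspension. */
static void
toy_ih_suspend(struct toy_ihandler *ih)
{
	atomic_store(&ih->ih_susp_acked, false);
	atomic_fetch_or(&ih->ih_flags, IH_SUSP);
}

/*
 * Interrupt-thread side: one pass over the handler.  Passes are
 * sequential, so acknowledging IH_SUSP here means no earlier run of
 * the handler body can still be executing.
 */
static void
toy_ithread_pass(struct toy_ihandler *ih)
{
	if ((atomic_load(&ih->ih_flags) & IH_SUSP) != 0) {
		atomic_store(&ih->ih_susp_acked, true);
		return;
	}
	ih->ih_runs++;               /* the handler body */
}

/* Suspender side: only power the device down after the ack. */
static bool
toy_ih_can_power_down(struct toy_ihandler *ih)
{
	return (atomic_load(&ih->ih_susp_acked));
}
```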

Tue, Jun 12, 12:29 PM
kib accepted D15691: Make kernel allocations be non-executable on some platforms.
Tue, Jun 12, 12:11 PM
kib added inline comments to D15691: Make kernel allocations be non-executable on some platforms.
Tue, Jun 12, 12:11 PM
kib added a comment to D14567: Introduce fdunlinkat..

I do not have any important notes about this change, except for the error code returned when racing.

Tue, Jun 12, 11:32 AM
kib committed rS334995: All exceptions IDT descriptors must use interrupt gates on 4/4 kernel.
All exceptions IDT descriptors must use interrupt gates on 4/4 kernel.
Tue, Jun 12, 10:43 AM
kib committed rS334994: Fix typo.
Fix typo.
Tue, Jun 12, 10:41 AM

Mon, Jun 11

kib added a comment to D15293: Handle the race between fork/vm_object_split() and faults.
In D15293#333070, @alc wrote:

To be clear, the clipping of ahead and behind after the object lock is reacquired should likely remain, but we should also clip behind before the object lock is released. It's also worth considering whether this new clipping should occur in vm_fault_hold(). Currently, I believe that only the swap pager is vulnerable, but it still might be argued that vm_fault_hold() should perform the new clipping.
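The behind-side clipping proposed above could follow the same kind of hypothetical model: walk backward from the faulting pindex and stop at the first resident page or at the start of the object. The names are illustrative; in the proposal this check would run while the object lock is still held, before it is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: resident[i] is true if a page is resident at
 * pindex i.  Returns how many of the maxbehind pages preceding pindex
 * can be read behind without crossing a resident page or index 0.
 * Intended to run before the object lock is dropped.
 */
static int
clip_behind(const bool *resident, size_t pindex, int maxbehind)
{
	int i;

	assert(maxbehind >= 0);
	for (i = 0; i < maxbehind; i++) {
		if ((size_t)i >= pindex || resident[pindex - 1 - (size_t)i])
			break;
	}
	return (i);
}
```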

It could also be argued that vm_object_split() should wait out the paging-in-progress indication set on the top-level object by vm_fault_hold().

Mon, Jun 11, 7:17 PM
kib added a comment to D15293: Handle the race between fork/vm_object_split() and faults.
In D15293#333055, @alc wrote:
Mon, Jun 11, 7:10 PM
kib committed rS334952: Fix braino in r334799. Maxmem is in pages.
Fix braino in r334799. Maxmem is in pages.
Mon, Jun 11, 3:28 PM
kib added a comment to D15749: Ensure that the pageout daemon runs on schedule.

The change tightly aliases the worker wakeups to the periodic ticks. It might result in at least cosmetic issues with the load average.

Mon, Jun 11, 9:50 AM
kib added a comment to D15293: Handle the race between fork/vm_object_split() and faults.
In D15293#332786, @alc wrote:

Can't we simply modify the restart so that it doesn't find_least from the original pindex, but instead at the busy page's pindex? In other words, ...
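A toy version of the suggested restart, assuming a loop in the style of vm_object_split() that finds pages with a vm_page_find_least()-like lookup and starts over after sleeping on a busy page. Here the sleep is modeled by simply clearing the busy flag, and toy_find_least() and toy_walk() are hypothetical; the point is only that restarting at the busy page's pindex avoids rescanning pages already handled.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	unsigned long pindex;
	bool          busy;
};

/* Index of the first page with pindex >= start (pages sorted), or n. */
static size_t
toy_find_least(const struct toy_page *pages, size_t n, unsigned long start)
{
	size_t i;

	for (i = 0; i < n && pages[i].pindex < start; i++)
		;
	return (i);
}

/*
 * Walk pages from start.  On a busy page, "sleep" (modeled by clearing
 * the busy flag) and restart at that page's pindex rather than at the
 * original start.  Returns the number of page visits.
 */
static unsigned
toy_walk(struct toy_page *pages, size_t n, unsigned long start)
{
	unsigned visits = 0;
	unsigned long restart = start;
	size_t i;

again:
	for (i = toy_find_least(pages, n, restart); i < n; i++) {
		visits++;
		if (pages[i].busy) {
			pages[i].busy = false;   /* stands in for the sleep */
			restart = pages[i].pindex;
			goto again;
		}
	}
	return (visits);
}
```

With pages at pindex 1 to 4 and the one at pindex 3 busy, this walk makes 5 visits; restarting from the original pindex 0 would make 7.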

Mon, Jun 11, 9:28 AM

Sun, Jun 10

kib accepted D15744: Fix build of i915kms with base gcc.
Sun, Jun 10, 8:58 PM
kib closed D15714: libc qsort: stop aliasing.
Sun, Jun 10, 5:54 PM
kib committed rS334928: libc qsort(3): stop aliasing.
libc qsort(3): stop aliasing.
Sun, Jun 10, 5:54 PM