alc (Alan Cox)

User Details

User Since
Dec 14 2014, 5:52 AM (183 w, 4 d)

Recent Activity

Yesterday

alc added inline comments to D15911: Re-count available PV entries after allocating a new chunk..
Thu, Jun 21, 5:15 PM
alc added inline comments to D15911: Re-count available PV entries after allocating a new chunk..
Thu, Jun 21, 4:58 PM
alc accepted D15911: Re-count available PV entries after allocating a new chunk..
Thu, Jun 21, 4:52 PM
alc added a comment to D15910: Avoid reclaiming the PV entry for a VA when updating the VA's PTE..
In D15910#337582, @alc wrote:

I have two comments.

  1. The change to reserve_pv_entries() is unnecessary. That function is only called during demotion. At that point, the only PV entry will be for the superpage mapping, and reclaim_pv_chunk() already skips superpage mappings.

Indeed, I noted this in the review description. The comment above reclaim_pv_chunk() suggests that it skips superpage mappings only to avoid worsening an ongoing PV entry shortage, but in fact this is required for the correctness of the code. To me, it seemed fragile to leave reserve_pv_entries() as it was, since the handling of superpage mappings by reclaim_pv_chunk() might change in the future (e.g., by searching for and reclaiming all 4KB page mappings when a superpage mapping is discovered) and introduce this subtle bug.

Thu, Jun 21, 4:43 PM
alc added a comment to D15911: Re-count available PV entries after allocating a new chunk..
In D15911#336915, @kib wrote:
In D15911#336659, @kib wrote:

This should be fine.

Can we return an indicator of whether the page was freed from the locked pmap, and retry only in that case?

We can. I implemented it this way originally, but found it a bit ugly, and strictly speaking we only need to retry if the chunk was freed from the locked pmap *and* it contained free entries. That is, even with your suggestion we may still retry when it is not necessary. If you prefer that approach I'll implement it instead, but I mildly prefer this patch because it's simpler than the alternative.

Sure, I do not object to adding the additional check that the chunk contains a free entry; it is cheap. My motivation for the suggestion is that we should not penalize the case where get_pv_entry() failed to allocate a page too much.

But I do not object to the patch in its current form.

Thu, Jun 21, 4:26 PM
alc added a comment to D15910: Avoid reclaiming the PV entry for a VA when updating the VA's PTE..

I have two comments.

Thu, Jun 21, 4:06 PM

Wed, Jun 13

alc accepted D15293: Handle the race between fork/vm_object_split() and faults..

Looks good!

Wed, Jun 13, 3:58 PM
alc accepted D15691: Make kernel allocations be non-executable on some platforms.
Wed, Jun 13, 5:20 AM
alc added inline comments to D15293: Handle the race between fork/vm_object_split() and faults..
Wed, Jun 13, 3:49 AM

Tue, Jun 12

alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..

I like it.

Tue, Jun 12, 10:18 PM
alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..
In D15293#333522, @kib wrote:
In D15293#333512, @alc wrote:

If we delay the allocation of the ahead pages until after the buffer is allocated, in other words, after we have dropped and reacquired the object lock, then I believe that we can still use the maxahead result from the swap_pager_haspages() call that was performed before the object lock was dropped. If any of that swap space was invalidated while the object was unlocked, there should now be a resident page at that pindex that stops us from attempting to read from the now invalid swap space.

But then, why not move the buffer allocation to after we drop the object lock anyway? Even in its current place, we sleep waiting for an available pbuf with m[0] busied. So it should not matter much if we sleep with all of the readahead/readbehind pages busied?

Tue, Jun 12, 9:27 PM
alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..
In D15293#333441, @kib wrote:
In D15293#333432, @alc wrote:

For what it's worth, I believe that this race is fallout from the pager interface changes for sendfile. Before those changes, we would have adjusted readbehind and performed the page allocations before calling the pager and releasing the object lock.

I mostly agree with this. But I do not understand your change.

So we looked at the page index in the object queue right before our ma[0], and claim that we can safely (with respect to vm_object_split()) allocate pages between that index and ma[0] after relocking. I do not see why. After we dropped the lock, why can't a page be instantiated in this range by some other means, even while a split is in progress, and then be processed by the split, all while we are blocked on the phys buf allocation and do not yet own the object lock?

Tue, Jun 12, 8:28 PM
alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..

For what it's worth, I believe that this race is fallout from the pager interface changes for sendfile. Before those changes, we would have adjusted readbehind and performed the page allocations before calling the pager and releasing the object lock.

Tue, Jun 12, 5:13 PM
alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..
In D15293#333078, @kib wrote:
In D15293#333070, @alc wrote:

To be clear, the clipping of ahead and behind after the object lock is reacquired should likely remain, but we should also clip behind before the object lock is released. It's also worth considering whether this new clipping should occur in vm_fault_hold(). Currently, I believe that only the swap pager is vulnerable, but it still might be argued that vm_fault_hold() should perform the new clipping.

It could also be argued that vm_object_split() should wait out the paging-in-progress indication set on the top-level object by vm_fault_hold().

I do not quite understand this. If we have already copied the page from the split object into the new one, the page's content is no longer our problem. The only issue is that other parallel faults must re-evaluate the final page, and this is ensured by 1) the map generation bump and 2) the fact that the split thread owns the map lock exclusively, so a vm_fault retry would block there.

So why do we need to do anything else?

Tue, Jun 12, 5:09 PM
alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..

Here is my current patch.

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c     (revision 334544)
+++ vm/swap_pager.c     (working copy)
@@ -1103,6 +1103,19 @@ swap_pager_getpages(vm_object_t object, vm_page_t
Tue, Jun 12, 4:34 PM

Mon, Jun 11

alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..

To be clear, the clipping of ahead and behind after the object lock is reacquired should likely remain, but we should also clip behind before the object lock is released. It's also worth considering whether this new clipping should occur in vm_fault_hold(). Currently, I believe that only the swap pager is vulnerable, but it still might be argued that vm_fault_hold() should perform the new clipping.

Mon, Jun 11, 7:01 PM
alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..

vm_object_split() is (and must be) performed with the map exclusively locked, so the hypothesized vm_fault() that starts before vm_object_split() must be dropping the map lock in order to perform I/O. Otherwise, the map lock would serialize vm_object_split() and vm_fault(), i.e., their executions could not overlap.

Mon, Jun 11, 5:55 PM

Sun, Jun 10

alc added a comment to D15293: Handle the race between fork/vm_object_split() and faults..

Can't we simply modify the restart so that it doesn't find_least from the original pindex, but instead at the busy page's pindex? In other words, ...

Index: vm/vm_object.c
===================================================================
--- vm/vm_object.c      (revision 334544)
+++ vm/vm_object.c      (working copy)
@@ -1411,8 +1411,9 @@ vm_object_split(vm_map_entry_t entry)
                    ("orig_object->charge < 0"));
                orig_object->charge -= ptoa(size);
        }
+       foo = offidxstart;
 retry:
-       m = vm_page_find_least(orig_object, offidxstart);
+       m = vm_page_find_least(orig_object, foo);
        for (; m != NULL && (idx = m->pindex - offidxstart) < size;
            m = m_next) {
                m_next = TAILQ_NEXT(m, listq);
@@ -1426,6 +1427,7 @@ retry:
                 */
                if (vm_page_busied(m)) {
                        VM_OBJECT_WUNLOCK(new_object);
+                       foo = m->pindex;
                        vm_page_lock(m);
                        VM_OBJECT_WUNLOCK(orig_object);
                        vm_page_busy_sleep(m, "spltwt", false);
Sun, Jun 10, 11:24 PM
alc created D15749: Ensure that the pageout daemon runs on schedule.
Sun, Jun 10, 10:05 PM

Fri, Jun 8

alc added a comment to D15691: Make kernel allocations be non-executable on some platforms.
In D15691#332057, @jtl wrote:
In D15691#331801, @alc wrote:

Overall, I think that this is a good idea, but the implementation has the following problem. The allocation of one executable page will block the promotion of the surrounding pages to a superpage mapping.

This is easy enough to do. I wonder, however, whether there is a concern about dedicating a superpage of kernel address space to this, particularly on 32-bit systems. For example, if we used a separate arena, a single use of M_EXEC memory on i386 would need to dedicate 2MB (2^21 bytes, PAE) or 4MB (2^22 bytes, non-PAE) of kernel address space to that arena. I wonder if that would put unnecessary pressure on the available kernel address space?

Fri, Jun 8, 5:19 PM
alc added a comment to D15691: Make kernel allocations be non-executable on some platforms.
In D15691#331991, @jhb wrote:

Given how rarely executable memory is probably used (just the bpf JIT currently), I'm not sure how big an impact the fine-grained PG_NX permissions will have.

Fri, Jun 8, 4:59 PM

Thu, Jun 7

alc added a comment to D15691: Make kernel allocations be non-executable on some platforms.

Overall, I think that this is a good idea, but the implementation has the following problem. The allocation of one executable page will block the promotion of the surrounding pages to a superpage mapping. In short, the executable mappings should be segregated from the normal mappings. One possible approach is to create a separate arena for the allocation of kernel virtual addresses that are executable, and have that arena import from the current arena at superpage granularity.

Thu, Jun 7, 5:21 PM
alc committed rS334769: When pidctrl_daemon() is called multiple times within an interval, it.
When pidctrl_daemon() is called multiple times within an interval, it
Thu, Jun 7, 7:50 AM
alc committed rS334752: pidctrl_daemon() implements a variation on the classical, discrete PID.
pidctrl_daemon() implements a variation on the classical, discrete PID
Thu, Jun 7, 2:55 AM

Mon, Jun 4

alc committed rS334621: Use a single, consistent approach to returning success versus failure in.
Use a single, consistent approach to returning success versus failure in
Mon, Jun 4, 4:28 PM
alc added inline comments to D15491: Eliminate the "pass" variable in the page daemon control loop..
Mon, Jun 4, 3:59 PM

Sun, Jun 3

alc added inline comments to D15491: Eliminate the "pass" variable in the page daemon control loop..
Sun, Jun 3, 1:52 AM

Fri, Jun 1

alc committed rS334499: Only a small subset of mmap(2)'s flags should be used in combination with.
Only a small subset of mmap(2)'s flags should be used in combination with
Fri, Jun 1, 9:37 PM

Mon, May 28

alc committed rS334287: Addendum to r334233. In vm_fault_populate(), since the page lock is held,.
Addendum to r334233. In vm_fault_populate(), since the page lock is held,
Mon, May 28, 4:23 PM
alc committed rS334274: Eliminate duplicate assertions. We assert at the start of vm_fault_hold().
Eliminate duplicate assertions. We assert at the start of vm_fault_hold()
Mon, May 28, 4:38 AM
alc closed D15582: Eliminate redundant KASSERT()s in vm_fault() and its helpers.
Mon, May 28, 4:38 AM

Sun, May 27

alc added inline comments to D15582: Eliminate redundant KASSERT()s in vm_fault() and its helpers.
Sun, May 27, 6:03 PM

Sat, May 26

alc added inline comments to D15293: Handle the race between fork/vm_object_split() and faults..
Sat, May 26, 7:21 PM
alc added inline comments to D15582: Eliminate redundant KASSERT()s in vm_fault() and its helpers.
Sat, May 26, 5:37 PM
alc created D15582: Eliminate redundant KASSERT()s in vm_fault() and its helpers.
Sat, May 26, 6:17 AM
alc closed D15572: Use pmap_enter(..., psind=1) in vm_fault_populate().
Sat, May 26, 2:59 AM
alc committed rS334233: Use pmap_enter(..., psind=1) in vm_fault_populate() on amd64. While.
Use pmap_enter(..., psind=1) in vm_fault_populate() on amd64. While
Sat, May 26, 2:59 AM

Fri, May 25

alc added inline comments to D15572: Use pmap_enter(..., psind=1) in vm_fault_populate().
Fri, May 25, 5:24 PM
alc updated the diff for D15572: Use pmap_enter(..., psind=1) in vm_fault_populate().

Change the for () step expression, simplifying the code.

Fri, May 25, 5:10 PM
alc updated subscribers of D15572: Use pmap_enter(..., psind=1) in vm_fault_populate().
Fri, May 25, 4:56 PM
alc added inline comments to D15572: Use pmap_enter(..., psind=1) in vm_fault_populate().
Fri, May 25, 4:51 PM
alc created D15572: Use pmap_enter(..., psind=1) in vm_fault_populate().
Fri, May 25, 3:29 PM

Thu, May 24

alc committed rS334180: Eliminate an unused parameter from vm_fault_populate()..
Eliminate an unused parameter from vm_fault_populate().
Thu, May 24, 8:44 PM
alc accepted D15490: Split active and inactive queue scans into separate functions..
Thu, May 24, 2:43 AM

Wed, May 23

alc added inline comments to D15490: Split active and inactive queue scans into separate functions..
Wed, May 23, 8:56 PM

May 22 2018

alc added a comment to D15491: Eliminate the "pass" variable in the page daemon control loop..

Ping?

May 22 2018, 5:14 PM
alc accepted D15490: Split active and inactive queue scans into separate functions..
May 22 2018, 4:29 PM

May 21 2018

alc added inline comments to D15506: Add missed barrier for pm_gen/pm_active interaction..
May 21 2018, 6:39 PM
alc accepted D15506: Add missed barrier for pm_gen/pm_active interaction..

Can you please add a comment to the code explaining the need for the barrier? The comments at the second and third instances could simply refer the reader to the comment at the first instance.

May 21 2018, 4:08 PM

May 18 2018

alc accepted D15479: Don't bump addl_page_shortage for wired pages..
May 18 2018, 4:19 PM

May 10 2018

alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#323469, @kib wrote:

In fact I started with ft.A.x when I did the testing, but there it was even less interesting than for ft.C.x. The counter's increment was about 1 or 2. This is why I changed to C and also asked about tuning.

I can re-test but I do not see the point.

May 10 2018, 4:13 PM

May 6 2018

alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#321332, @kib wrote:
In D15055#321321, @alc wrote:

To be clear, running the serial version of the benchmark would suffice.

Is there any tuning that needs to be done?

May 6 2018, 7:46 PM

May 3 2018

alc added a comment to D15231: create straightforward EBR wrapper with rudimentary support for preemption.

On a related note, take a look at https://www.cc.gatech.edu/~smaass3/papers/latr-paper.pdf for a proposed approach to hiding the latency of TLB shootdown on munmap(2).

May 3 2018, 9:43 PM

Apr 30 2018

alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..

To be clear, running the serial version of the benchmark would suffice.

Apr 30 2018, 4:56 PM
alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#320998, @jeff wrote:
In D15055#320936, @alc wrote:
In D15055#320927, @kib wrote:
In D15055#320924, @alc wrote:

Has anyone actually measured how often this optimization gets triggered? I'm just curious.

Even a plain multiuser boot triggers this code several times; it was me who was sloppy with the testing of the last version.

In hindsight, the question that I should have asked is "How often does pmap_remove() encounter the zero page in the page table?" pmap_remove_pages() won't encounter the zero page because it's not mapped as a managed mapping. For "normal", i.e., writeable, virtual memory, I fear that this change is a pessimization. Without this change, on first touch, regardless of whether the access is a write, we will allocate a physical page and map it for write access. And so, this change would only increase the number of page faults. Moreover, in a multithreaded program, those page faults are going to have to perform a TLB shootdown, because we're changing the physical page being mapped. The cost of these additional page faults would have to be outweighed by the savings in the cases where pmap_remove() encountered a mapping to the zero page.

That said, I can see a variant of this change being an optimization for a more restricted set of cases, e.g., a read-only mapping of a file.

The optimization was requested by Jeff for a very specific benchmark, since Linux does the same trick and apparently FreeBSD loses a lot because of this. See also the related D14917.
I think actual numbers will be provided when Jeff returns.

I can see about finding a specific benchmark. It's actually more of a memory optimization than a performance optimization. Apparently there are many programs that rely on allocating a large anonymous region to manage a tree or hash and then sparsely populating it. They are using the very large vm space rather than manually handling discontiguous memory regions.

Apr 30 2018, 4:40 PM

Apr 28 2018

alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..

Can you try a "buildworld" with the counters in place?

Apr 28 2018, 9:03 PM
alc added a comment to D15122: Eliminate vm object relocks in vm fault..

The vm_fault_prefault() changes look good. Please commit them.

Apr 28 2018, 6:47 PM
alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#320927, @kib wrote:
In D15055#320924, @alc wrote:

Has anyone actually measured how often this optimization gets triggered? I'm just curious.

Even plain multiuser boot does trigger this code several times, it is me were sloppy with the testing of the last version.

Apr 28 2018, 6:18 PM
alc added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..

Has anyone actually measured how often this optimization gets triggered? I'm just curious.

Apr 28 2018, 5:13 PM

Apr 27 2018

alc added inline comments to D15055: Map constant zero page on read faults which touch non-existing anon page..
Apr 27 2018, 4:52 PM
alc added inline comments to D15055: Map constant zero page on read faults which touch non-existing anon page..
Apr 27 2018, 4:31 PM

Apr 13 2018

alc accepted D15052: Set PG_G global mapping bit on the trampoline mappings..
Apr 13 2018, 4:51 AM

Apr 7 2018

alc added a comment to D14961: Optimize context switch for PTI on PCID pmap..

Conceptually, this seems correct. Please proceed with the additional testing that you mentioned.

Apr 7 2018, 5:11 PM
alc added inline comments to D14961: Optimize context switch for PTI on PCID pmap..
Apr 7 2018, 5:06 PM

Mar 31 2018

alc added inline comments to D14633: i386 4/4G split.
Mar 31 2018, 7:08 PM

Mar 30 2018

alc accepted D14902: Make vm_map_max/min/pmap KBI stable..
Mar 30 2018, 5:11 AM

Mar 21 2018

alc accepted D14778: Elide the object lock in vfs_vmio_truncate() in the common case..
Mar 21 2018, 5:11 PM
alc added inline comments to D14778: Elide the object lock in vfs_vmio_truncate() in the common case..
Mar 21 2018, 5:03 PM
alc added inline comments to D14778: Elide the object lock in vfs_vmio_truncate() in the common case..
Mar 21 2018, 4:43 PM
alc accepted D14778: Elide the object lock in vfs_vmio_truncate() in the common case..
Mar 21 2018, 7:55 AM

Mar 20 2018

alc accepted D14767: Check for wrap-around in vm_phys_alloc_seg_contig()..
Mar 20 2018, 3:41 PM
alc added inline comments to D14767: Check for wrap-around in vm_phys_alloc_seg_contig()..
Mar 20 2018, 3:11 PM

Mar 17 2018

alc accepted D14625: Avoid dequeuing the page found by a soft fault..
Mar 17 2018, 5:21 PM

Mar 13 2018

alc added a comment to D14625: Avoid dequeuing the page found by a soft fault..

... I don't like the

else if (m->queue != PQ_INACTIVE)
    vm_page_deactivate(m);
else
    vm_page_requeue(m);

in vfs_vmio_unwire(), for example.

Even before r330296, having a vm_page_deactivate_or_requeue() for use in release_page() would be an improvement since we'd only have to touch the page and page queue locks once instead of twice. With r330296, vm_page_deactivate_or_requeue() could be modified to use a lazy requeue, or another API could be added so that the caller can choose between strict LRU and second-chance. Does that sound like a reasonable interim approach? If so, any suggestions for a better name than vm_page_deactivate_or_requeue()? :)

Mar 13 2018, 6:04 PM

Mar 12 2018

alc added a comment to D14666: Increment v_pdpages in the laundry queue scan..
In D14666#308086, @alc wrote:

Could I talk you into creating distinct counters for the inactive and laundry queues? And, in regards to the active queue, I'd like to be able to distinguish between idle scanning and shortage-driven scanning.

Sure. I guess we effectively want one counter per (per-domain) queue?

Mar 12 2018, 5:44 PM
alc added a comment to D14625: Avoid dequeuing the page found by a soft fault..
In D14625#307878, @alc wrote:

Before talking about implementation, do we agree that the following describes the desired behavior for COW faults?

Yep, I agree that the behaviour you described seems reasonable.

Mar 12 2018, 5:06 PM
alc added a comment to D14666: Increment v_pdpages in the laundry queue scan..

I've never found pdpages to be that useful in understanding system behavior. Could I talk you into creating distinct counters for the inactive and laundry queues? And, in regards to the active queue, I'd like to be able to distinguish between idle scanning and shortage-driven scanning.

Mar 12 2018, 4:41 PM

Mar 11 2018

alc added a comment to D14625: Avoid dequeuing the page found by a soft fault..

Before talking about implementation, do we agree that the following describes the desired behavior for COW faults?

Mar 11 2018, 8:20 PM
alc added inline comments to D14625: Avoid dequeuing the page found by a soft fault..
Mar 11 2018, 7:35 PM

Mar 8 2018

alc added a comment to D14625: Avoid dequeuing the page found by a soft fault..

I have to prepare for my lecture. I'll think about this some more later.

Mar 8 2018, 6:15 PM
alc added inline comments to D14625: Avoid dequeuing the page found by a soft fault..
Mar 8 2018, 5:26 PM
alc accepted D14625: Avoid dequeuing the page found by a soft fault..

For what it's worth, this goes back even further, to the 1980's at CMU. I also had this in my TODO list as a post-lazy wiring change.

Mar 8 2018, 5:12 PM

Feb 12 2018

alc added a comment to D14062: Make memory mapped via pmap_qenter() non-executable for amd64/i386..
In D14062#300393, @jhb wrote:
  1. I think markj@'s point is valid: are there any pmap_qenter() callers that need X, or should we just change the API to assume RW mappings only?
Feb 12 2018, 6:13 PM

Feb 10 2018

alc accepted D14269: Mark initial page table entries as wired.
Feb 10 2018, 6:41 PM

Feb 8 2018

alc accepted D14266: Use vm_page_unwire_noq() in some pmap code..
Feb 8 2018, 6:18 PM

Jan 22 2018

alc added inline comments to D13985: Use PCID to optimize PTI..
Jan 22 2018, 6:07 PM
alc added inline comments to D13985: Use PCID to optimize PTI..
Jan 22 2018, 5:17 PM

Jan 21 2018

alc added inline comments to D13985: Use PCID to optimize PTI..
Jan 21 2018, 5:45 PM

Jan 20 2018

alc added inline comments to D13735: Assign map->header values to avoid boundary checks.
Jan 20 2018, 8:21 AM
alc accepted D13735: Assign map->header values to avoid boundary checks.
Jan 20 2018, 8:18 AM

Jan 19 2018

alc accepted D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..
In D13956#293479, @kib wrote:
In D13956#293475, @alc wrote:

Does this, in fact, dovetail with the PCID patch?

This is really an excerpt from my dev branch, where the PCID patch and some more fixes are stored.

Jan 19 2018, 8:40 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..

Does this, in fact, dovetail with the PCID patch?

Jan 19 2018, 8:08 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..

Doesn't your proposed PCID patch make the kernel- and user-mode page tables differ in bit 11? Suppose that you change that to one of the ignored bits when PCID isn't enabled. Then, regardless of whether PCID was enabled, you can test that bit to know whether a kernel- or user-mode page table was active at the time of the fault.

Jan 19 2018, 6:54 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..
In D13956#293443, @alc wrote:

At the moment, I'm looking at Tables 4-12 and 4-13. Table 4-12 shows several ignored, as opposed to reserved, bits when PCID is inactive. This is from a version dated 12/2017.

Jan 19 2018, 6:41 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..

At the moment, I'm looking at Tables 4-12 and 4-13. Table 4-12 shows several ignored, as opposed to reserved, bits when PCID is inactive. This is from a version dated 12/2017.

Jan 19 2018, 6:30 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..

Although it may be that all of the ignored bits are consumed by PCID.

Jan 19 2018, 6:19 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..
In D13956#293431, @kib wrote:
In D13956#293422, @alc wrote:

Are any of the ignored bits in CR3 preserved, i.e., if you set them on a write, will a later read return them? If so, then we could use one as a flag to distinguish a kernel- versus user-mode page table.

The SDM states that writing any reserved bit into %cr3 causes a GPF.

Jan 19 2018, 6:17 PM
alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..
In D13956#292918, @kib wrote:
In D13956#292839, @alc wrote:

Out of curiosity, how does the system behave when this sanity check trips?

I initially modified the doreti path to not reload %cr3, which panics the kernel. But after your question I realized that this is too simplistic, because vm_fault() does not consider these faults invalid and does nothing.

Jan 19 2018, 5:41 PM

Jan 17 2018

alc added a comment to D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..

Out of curiosity, how does the system behave when this sanity check trips?

Jan 17 2018, 11:03 PM
alc accepted D13956: PTI: Trap if we returned to userspace with kernel (full) page table still active..
Jan 17 2018, 10:50 PM