Extend device_pager to support prefaulting of device memory
AbandonedPublic
Actions

Authored by kmacy on May 27 2016, 1:36 AM.

Details

Reviewers

dumbbell
kib
alc
emaste
markj
adrian

Summary

Linux graphics drivers aggressively pre-fault device memory. The current interface prevents that from working - at least in terms of expecations implied by existing asserts in the VM code. This change to the device pager's fault interface let's it know how many pages have been passed and let's it communicate how much read ahead it has done.

It turns out that the driver prefault path isn't invoked at X startup. But during glxgears my instrumentation indicates that this change is in fact adding the *rahead mappings.

Test Plan

It doesn't need any testing per se as it is a no-op for all in-tree consumers.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

kmacy updated this revision to Diff 16957.May 27 2016, 1:36 AM

kmacy retitled this revision from to Enable device's to support prefaulting of device memory.

kmacy updated this object.

kmacy edited the test plan for this revision. (Show Details)

kmacy added reviewers: adrian, kib.

kmacy set the repository for this revision to rS FreeBSD src repository - subversion.

Herald added a subscriber: imp. · View Herald TranscriptMay 27 2016, 1:36 AM

kmacy added reviewers: emaste, dumbbell.May 27 2016, 2:28 AM

kmacy retitled this revision from Enable device's to support prefaulting of device memory to Extend device_pager to support prefaulting of device memory.May 27 2016, 6:58 PM

Please explain what is the supposed action of the driver when *rahead is assigned non-zero value.

Of course, the change breaks driver KBI. If it is accepted, something (cdevsw version, i.e. D_VERSION ?) should be bumped.

In D6587#139656, @kib wrote:

Please explain what is the supposed action of the driver when *rahead is assigned non-zero value.

Ahh thanks for pointing out that prefaulting on fictitious pages is broken there too.

vm_fault.c:959 - faultcount != 1

if (faultcount != 1 && (fault_flags & VM_FAULT_WIRE) == 0 &&

	    wired == 0)
		vm_fault_prefault(&fs, vaddr,
		    faultcount > 0 ? behind : PFBAK,
		    faultcount > 0 ? ahead : PFFOR);

This will call vm_fault_prefault with the value of rahead as the 4th argument.

...

		if (m->valid == VM_PAGE_BITS_ALL &&
		    (m->flags & PG_FICTITIOUS) == 0)
			pmap_enter_quick(pmap, addr, m, entry->protection);

What is the reason for that? It's a completely gratuitous restriction.

Of course, the change breaks driver KBI. If it is accepted, something (cdevsw version, i.e. D_VERSION ?) should be bumped.

Alternatively, we can create a new cdev_pager_allocate. However, the existing API doesn't strike me as being worth preserving.

kmacy added a reviewer: markj.May 27 2016, 7:26 PM

@kib can you explain the check?

If not I'll update the patch with its removal. Thanks again.

kmacy added a subscriber: scottl.May 27 2016, 8:00 PM

Fix erroneous limitation on drivers preventing prefaulting of device mappings.
Add new version number to conf.h

kmacy updated this object.May 27 2016, 11:50 PM

kmacy added a reviewer: alc.

kmacy updated this object.May 28 2016, 12:49 AM

Still, can you formally explain what are the expectations WRT value of *rahead and state of the device object page queue ?

Also, I wanted to pass the real protection value down to the cdev_pg_fault() for quite some time. Currently prot always indicates write, which causes unneccessary wait in the GEM and TTM page fault handlers. The KBI of the cdev_pg_fault does not need to be changed for that, but since we are discussing the interface update, I think it worth being mentioned.

In D6587#139787, @kib wrote:

Still, can you formally explain what are the expectations WRT value of *rahead and state of the device object page queue ?

I don't understand the question. The answer to what it _sounds_like_ you're asking is self-evident. All the pages that are (effectively) passed up from the i915 gem fault routine are inserted in the object in the appropriate order. Even with prefault failing to call pmap_enter_quick as it did it previously having all the pages in object saved on work later by having all the pages ready for lookup to find. I can provide a walkthrough the new code if it would help.

Also, I wanted to pass the real protection value down to the cdev_pg_fault() for quite some time. Currently prot always indicates write, which causes unneccessary wait in the GEM and TTM page fault handlers. The KBI of the cdev_pg_fault does not need to be changed for that, but since we are discussing the interface update, I think it worth being mentioned.

Within the pager fault routine the prot would logically indicate the actual fault type. If it is always VM_PROT_WRITE that is a VM bug. The ABI as it stands now supports the correct behavior, even if the implementation currently does not.

Thanks for pointing out yet another bug in the VM relating to graphics performance. I had assumed it would DTRT. I can't trace every call in the code and instrument every argument on the off-chance it's wrong - or at least I hope I don't have to.

In D6587#139856, @kmacy wrote:

In D6587#139787, @kib wrote:

Still, can you formally explain what are the expectations WRT value of *rahead and state of the device object page queue ?

I don't understand the question. The answer to what it _sounds_like_ you're asking is self-evident. All the pages that are (effectively) passed up from the i915 gem fault routine are inserted in the object in the appropriate order.

I think the question was referring to behaviour with respect to the devp_pglist queue.

Within the pager fault routine the prot would logically indicate the actual fault type. If it is always VM_PROT_WRITE that is a VM bug. The ABI as it stands now supports the correct behavior, even if the implementation currently does not.

Thanks for pointing out yet another bug in the VM relating to graphics performance. I had assumed it would DTRT. I can't trace every call in the code and instrument every argument on the off-chance it's wrong - or at least I hope I don't have to.

I see - it doesn't pass down fault_type:

			rv = vm_pager_get_pages(fs.object, &fs.m, 1,
			    &behind, &ahead);
			if (rv == VM_PAGER_OK) {
				if (ahead)

... just wow. Thanks again. I will update the patch and fix. I'm also thinking about extending the device_pager API rather than changing it to make it easier to backport to 10.

I see - it doesn't pass down fault_type:
			rv = vm_pager_get_pages(fs.object, &fs.m, 1,
			    &behind, &ahead);
			if (rv == VM_PAGER_OK) {
				if (ahead)
... just wow. Thanks again. I will update the patch and fix. I'm also thinking about extending the device_pager API rather than changing it to make it easier to backport to 10.

Oh dear ...

static int
dev_pager_getpages(vm_object_t object, vm_page_t *ma, int count, int *rbehind,

int *rahead)

{
int error;
<...>

error = object->un_pager.devp.ops->cdev_pg_fault(object,

			IDX_TO_OFF(ma[0]->pindex), PROT_READ, ma, count, rahead);

<...>

This code really is decrepit. Thanks again for forcing me to take a closer look. I'm sure there are other problems too.

In D6587#139880, @kmacy wrote:

I have a patch somewhere which passes correct PROT_XXX down to the vm_pager_getpages(). Nothing non-trivial, but it compiled some time ago. I will try to find it tomorrow.

In D6587#139858, @markj wrote:

In D6587#139856, @kmacy wrote:

In D6587#139787, @kib wrote:

Still, can you formally explain what are the expectations WRT value of *rahead and state of the device object page queue ?

I don't understand the question. The answer to what it _sounds_like_ you're asking is self-evident. All the pages that are (effectively) passed up from the i915 gem fault routine are inserted in the object in the appropriate order.

I think the question was referring to behaviour with respect to the devp_pglist queue.

The driver needs to add to the list itself. My next patch will make that clearer.

In D6587#139881, @kib wrote:

In D6587#139880, @kmacy wrote:

I have a patch somewhere which passes correct PROT_XXX down to the vm_pager_getpages(). Nothing non-trivial, but it compiled some time ago. I will try to find it tomorrow.

Cool. I have something I just hacked out that I'll post soon. I hope that we can merge the two somehow.

In D6587#139787, @kib wrote:

Still, can you formally explain what are the expectations WRT value of *rahead and state of the device object page queue ?

Also, I wanted to pass the real protection value down to the cdev_pg_fault() for quite some time. Currently prot always indicates write, which causes unneccessary wait in the GEM and TTM page fault handlers. The KBI of the cdev_pg_fault does not need to be changed for that, but since we are discussing the interface update, I think it worth being mentioned.

@kib see D6620 for a clarification of the code to answer your first question and a first cut at addressing the issues raised by your second.