Page MenuHomeFreeBSD

Improve pmap_free_zero_pages() operations
ClosedPublic

Authored by ota_j.email.ne.jp on Dec 14 2017, 6:26 AM.

Details

Summary

Instead of calling vm_page_free_toq() for each vm_page, create and call a function, vm_page_free_spglist(), that frees a list of vm_pages.

I used vm_page_reclaim_run() as a reference when to lock mutex and call vm_page_free_phys().

This reduces the number of locks and also duration of lock.

Test Plan

Boot a system and run any commands.
Every process calls pmap_free_zero_pages especially at exit.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

One one of my branches, I have the following

static __inline void
pmap_free_zero_pages(struct spglist *free)
{
	struct pglist pgl;
	vm_page_t m;
	int count;

	if (SLIST_EMPTY(free))
		return;
	TAILQ_INIT(&pgl);
	for (count = 0; (m = SLIST_FIRST(free)) != NULL; count++) {
		SLIST_REMOVE_HEAD(free, plinks.s.ss);
		/* Preserve the page's PG_ZERO setting. */
		if (vm_page_free_prep(m, false))
			TAILQ_INSERT_TAIL(&pgl, m, listq);
	}
	atomic_subtract_int(&vm_cnt.v_wire_count, count);
	vm_page_free_phys_pglist(&pgl);
}

Perhaps it makes sense to add a bool argument to vm_page_free_spglist() to instruct it to decrement v_wire_count instead of splitting this into callers.

sys/riscv/riscv/pmap.c
1071–1072

There is no much point in keeping using the pmap_free_zero_pages() when the function does not add any new functionality.

sys/arm/arm/pmap-v6.c
2552–2553

Style requires a blank line at the beginning of the function if there are no local variable definitions.

sys/vm/vm_page.c
3252

Style: "struct spglist *free"

3260

This is structured somewhat oddly. Why not:

while ((m = SLIST_FIRST(free)) != NULL) {
    SLIST_REMOVE_HEAD(free, plinks.s.ss);
    if (vm_page_free_prep(m, false))
        ...
}
vm_page_free_phys_pglist(&pgl);
3262

Shouldn't we be incrementing count only for pages that are added to pgl?

3270

Style: parens around return value.

ota_j.email.ne.jp added inline comments.
sys/riscv/riscv/pmap.c
1071–1072

Given we are to remove this function, will vm_page_free_zero_pages() be a better name than vm_page_free_spglist()?

sys/vm/vm_page.c
3262

pmap_free_zero_pages() adds the count of vm_page_free_toq() calls to atomic_subtract_int(&vm_cnt.v_wire_count, count).

At least, this makes code backward compatible. I also wondered, though...

Replaced pmap_free_zero_pages with vm_page_free_spglist().
Fixed styles.
I will handle wire_count upon next changeset.

Call atomic_subtract_int(&vm_cnt.v_wire_count, count), too if a caller instructs so.

Renamed to vm_page_free_zero_pages() and it takes boolean_t to update wire count.

ota_j.email.ne.jp added inline comments.
sys/arm/arm/pmap-v6.c
2552–2553

The function is removed.

sys/vm/vm_page.c
3260

EMPTY check is moved outside and changed to while loop.

3270

Return value is removed as wire count is adjusted by this function.

I have only style notes about the patch. In my opinion, this is a good change if only due to the reduction of the code duplication.

Still. I will wait for one more person agreement with the change, after you fix the style bugs.

sys/vm/vm_page.c
2541

IMO no point in checking the emptiness since vm_page_free_zero_pages() does.

3250

Use bool instead of boolean_t, for the new code.

3254

Move pgl declaration before count.

3262

'{' should be put on the previous line.

3270

I do not see a point of checking the emptiness of the list, it is done in vm_page_free_phys_pglist().

In D13485#297889, @kib wrote:

I have only style notes about the patch. In my opinion, this is a good change if only due to the reduction of the code duplication.

I've thought in the past that we should have a subr_pmap.c for sharing code between pmap implementations. Simple routines like this and pmap_resident_count_inc()/dec() could live there initially. Might be that some PV entry management code could be generalized there too.

I think the change is fine, modulo style bugs and nits that I pointed out.

sys/vm/vm_page.c
3243

"In other words"

3250

The name is a bit misleading, since the input pages need not be zero-filled. Maybe vm_page_free_pages_toq() for consistency?

ota_j.email.ne.jp marked 6 inline comments as done.

Renamed the function and updated styles based on comments.

ota_j.email.ne.jp marked 2 inline comments as done.

vm_page_free_phys_pglist(&pgl) was removed once by a mistake.

Intention of these changes were to improve reuse of code and also reduce number of locks.

pmap_free_zero_pages calls vm_page_free_toq() for each page and each vm_page_free_toq() locks vm_page_queue_free_mtx.
The new function calls vm_page_free_phys_pglist() that locks vm_page_queue_free_mtx. only once for all pages.

When I had printed the number of pages being freed in printf while testing, I saw many frees are for many 4 pages and sometimes up to 32.

Ok, I will do some testing of the patch and then commit it later today.

This revision is now accepted and ready to land.Mar 4 2018, 10:38 AM

I don't mean to object to the change, but I will note that r330296 subsumes this change since it adds transparent free queue batching to vm_page_free(). That is, most calls to vm_page_free() just add the page to a per-CPU queue, without acquire the free queue lock. One of the motivations for that work was to get rid of some of the ad-hoc batching code we already have and replace it with something more flexible. I'm not sure yet whether that change will eventually land, so I'm ok with the change in this diff.

I don't mean to object to the change, but I will note that r330296 subsumes this change since it adds transparent free queue batching to vm_page_free(). That is, most calls to vm_page_free() just add the page to a per-CPU queue, without acquire the free queue lock. One of the motivations for that work was to get rid of some of the ad-hoc batching code we already have and replace it with something more flexible. I'm not sure yet whether that change will eventually land, so I'm ok with the change in this diff.

I believe that the good bit of this change is the consolidation of the interface, I mean introduction of the vm_page_free_pages_toq(). Many pmaps use the same code, and x86 pmaps must use delayed free for page table pages. So providing the common interface, which can be adjusted to whatever state the VM evolution suggests, is IMO useful. If you want to trivialize the free of the set of page table pages, it is a single place to do it now.

This revision was automatically updated to reflect the committed changes.