Page MenuHomeFreeBSD

Improve pmap_free_zero_pages() operations
ClosedPublic

Authored by ota_j.email.ne.jp on Dec 14 2017, 6:26 AM.

Details

Summary

Instead of calling vm_page_free_toq() for each vm_page, create and call a function, vm_page_free_spglist(), that frees a list of vm_pages.

I used vm_page_reclaim_run() as a reference when to lock mutex and call vm_page_free_phys().

This reduces the number of locks and also duration of lock.

Test Plan

Boot a system and run any commands.
Every process calls pmap_free_zero_pages especially at exit.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib added a comment.Dec 17 2017, 1:56 PM

One one of my branches, I have the following

static __inline void
pmap_free_zero_pages(struct spglist *free)
{
	struct pglist pgl;
	vm_page_t m;
	int count;

	if (SLIST_EMPTY(free))
		return;
	TAILQ_INIT(&pgl);
	for (count = 0; (m = SLIST_FIRST(free)) != NULL; count++) {
		SLIST_REMOVE_HEAD(free, plinks.s.ss);
		/* Preserve the page's PG_ZERO setting. */
		if (vm_page_free_prep(m, false))
			TAILQ_INSERT_TAIL(&pgl, m, listq);
	}
	atomic_subtract_int(&vm_cnt.v_wire_count, count);
	vm_page_free_phys_pglist(&pgl);
}

Perhaps it makes sense to add a bool argument to vm_page_free_spglist() to instruct it to decrement v_wire_count instead of splitting this into callers.

sys/riscv/riscv/pmap.c
1073 ↗(On Diff #36578)

There is no much point in keeping using the pmap_free_zero_pages() when the function does not add any new functionality.

kib added reviewers: alc, markj.Dec 17 2017, 1:56 PM
markj added inline comments.Dec 17 2017, 6:16 PM
sys/arm/arm/pmap-v6.c
2550 ↗(On Diff #36578)

Style requires a blank line at the beginning of the function if there are no local variable definitions.

sys/vm/vm_page.c
3070 ↗(On Diff #36578)

Style: "struct spglist *free"

3078 ↗(On Diff #36578)

This is structured somewhat oddly. Why not:

while ((m = SLIST_FIRST(free)) != NULL) {
    SLIST_REMOVE_HEAD(free, plinks.s.ss);
    if (vm_page_free_prep(m, false))
        ...
}
vm_page_free_phys_pglist(&pgl);
3080 ↗(On Diff #36578)

Shouldn't we be incrementing count only for pages that are added to pgl?

3088 ↗(On Diff #36578)

Style: parens around return value.

ota_j.email.ne.jp planned changes to this revision.Dec 18 2017, 5:07 PM
ota_j.email.ne.jp added inline comments.
sys/riscv/riscv/pmap.c
1073 ↗(On Diff #36578)

Given we are to remove this function, will vm_page_free_zero_pages() be a better name than vm_page_free_spglist()?

sys/vm/vm_page.c
3080 ↗(On Diff #36578)

pmap_free_zero_pages() adds the count of vm_page_free_toq() calls to atomic_subtract_int(&vm_cnt.v_wire_count, count).

At least, this makes code backward compatible. I also wondered, though...

Replaced pmap_free_zero_pages with vm_page_free_spglist().
Fixed styles.
I will handle wire_count upon next changeset.

Call atomic_subtract_int(&vm_cnt.v_wire_count, count), too if a caller instructs so.

Renamed to vm_page_free_zero_pages() and it takes boolean_t to update wire count.

ota_j.email.ne.jp marked 4 inline comments as done.Dec 19 2017, 12:03 AM
ota_j.email.ne.jp added inline comments.
sys/arm/arm/pmap-v6.c
2550 ↗(On Diff #36578)

The function is removed.

sys/vm/vm_page.c
3078 ↗(On Diff #36578)

EMPTY check is moved outside and changed to while loop.

3088 ↗(On Diff #36578)

Return value is removed as wire count is adjusted by this function.

ota_j.email.ne.jp marked 3 inline comments as done.Jan 29 2018, 1:06 PM
kib added a comment.Feb 5 2018, 12:48 PM

I have only style notes about the patch. In my opinion, this is a good change if only due to the reduction of the code duplication.

Still. I will wait for one more person agreement with the change, after you fix the style bugs.

sys/vm/vm_page.c
2521 ↗(On Diff #36741)

IMO no point in checking the emptiness since vm_page_free_zero_pages() does.

3061 ↗(On Diff #36741)

Use bool instead of boolean_t, for the new code.

3065 ↗(On Diff #36741)

Move pgl declaration before count.

3073 ↗(On Diff #36741)

'{' should be put on the previous line.

3081 ↗(On Diff #36741)

I do not see a point of checking the emptiness of the list, it is done in vm_page_free_phys_pglist().

markj added a comment.Feb 5 2018, 3:48 PM
In D13485#297889, @kib wrote:

I have only style notes about the patch. In my opinion, this is a good change if only due to the reduction of the code duplication.

I've thought in the past that we should have a subr_pmap.c for sharing code between pmap implementations. Simple routines like this and pmap_resident_count_inc()/dec() could live there initially. Might be that some PV entry management code could be generalized there too.

I think the change is fine, modulo style bugs and nits that I pointed out.

sys/vm/vm_page.c
3054 ↗(On Diff #36741)

"In other words"

3061 ↗(On Diff #36741)

The name is a bit misleading, since the input pages need not be zero-filled. Maybe vm_page_free_pages_toq() for consistency?

ota_j.email.ne.jp marked 6 inline comments as done.

Renamed the function and updated styles based on comments.

ota_j.email.ne.jp marked 2 inline comments as done.

vm_page_free_phys_pglist(&pgl) was removed once by a mistake.

ota_j.email.ne.jp marked an inline comment as done.Feb 6 2018, 3:24 AM
ota_j.email.ne.jp planned changes to this revision.Feb 6 2018, 5:14 AM
ota_j.email.ne.jp requested review of this revision.Feb 10 2018, 5:59 AM

Intention of these changes were to improve reuse of code and also reduce number of locks.

pmap_free_zero_pages calls vm_page_free_toq() for each page and each vm_page_free_toq() locks vm_page_queue_free_mtx.
The new function calls vm_page_free_phys_pglist() that locks vm_page_queue_free_mtx. only once for all pages.

When I had printed the number of pages being freed in printf while testing, I saw many frees are for many 4 pages and sometimes up to 32.

kib accepted this revision.Mar 4 2018, 10:38 AM

Ok, I will do some testing of the patch and then commit it later today.

This revision is now accepted and ready to land.Mar 4 2018, 10:38 AM
markj accepted this revision.Mar 4 2018, 5:28 PM

I don't mean to object to the change, but I will note that r330296 subsumes this change since it adds transparent free queue batching to vm_page_free(). That is, most calls to vm_page_free() just add the page to a per-CPU queue, without acquire the free queue lock. One of the motivations for that work was to get rid of some of the ad-hoc batching code we already have and replace it with something more flexible. I'm not sure yet whether that change will eventually land, so I'm ok with the change in this diff.

kib added a comment.EditedMar 4 2018, 5:51 PM

I don't mean to object to the change, but I will note that r330296 subsumes this change since it adds transparent free queue batching to vm_page_free(). That is, most calls to vm_page_free() just add the page to a per-CPU queue, without acquire the free queue lock. One of the motivations for that work was to get rid of some of the ad-hoc batching code we already have and replace it with something more flexible. I'm not sure yet whether that change will eventually land, so I'm ok with the change in this diff.

I believe that the good bit of this change is the consolidation of the interface, I mean introduction of the vm_page_free_pages_toq(). Many pmaps use the same code, and x86 pmaps must use delayed free for page table pages. So providing the common interface, which can be adjusted to whatever state the VM evolution suggests, is IMO useful. If you want to trivialize the free of the set of page table pages, it is a single place to do it now.

This revision was automatically updated to reflect the committed changes.