
vm_phys: speed up unfree_page search
Needs Review (Public)

Authored by dougm on Aug 6 2023, 6:53 AM.

Details

Reviewers
alc
markj
Summary

Rewrite the first search loop in vm_phys_unfree_page.

The new version has fewer iterations, because each iteration produces a different m_set value, which is not the case in the current loop. Each iteration also has two exit tests, rather than three.

The search loop size is reduced from this:

Original (58 bytes):

20a0: 49 c7 c1 00 e0 ff ff         	movq	$-0x2000, %r9           # imm = 0xE000
20a7: 49 d3 e1                     	shlq	%cl, %r9
20aa: 49 21 c1                     	andq	%rax, %r9
20ad: 4d 29 d9                     	subq	%r11, %r9
20b0: 72 6c                        	jb	0x211e <vm_phys_unfree_page+0xce>
20b2: 4c 8d 51 01                  	leaq	0x1(%rcx), %r10
20b6: 48 8b 33                     	movq	(%rbx), %rsi
20b9: 49 c1 e9 0c                  	shrq	$0xc, %r9
20bd: 4d 6b f9 68                  	imulq	$0x68, %r9, %r15
20c1: 46 0f b6 4c 3e 5c            	movzbl	0x5c(%rsi,%r15), %r9d
20c7: 41 80 f9 0d                  	cmpb	$0xd, %r9b
20cb: 41 0f 94 c6                  	sete	%r14b
20cf: 75 09                        	jne	0x20da <vm_phys_unfree_page+0x8a>
20d1: 48 83 f9 0b                  	cmpq	$0xb, %rcx
20d5: 4c 89 d1                     	movq	%r10, %rcx
20d8: 72 c6                        	jb	0x20a0 <vm_phys_unfree_page+0x50>

to this (36 bytes):

20a0: 49 89 c0                     	movq	%rax, %r8
20a3: 48 ff c8                     	decq	%rax
20a6: 4c 21 c0                     	andq	%r8, %rax
20a9: 49 89 c2                     	movq	%rax, %r10
20ac: 49 29 d2                     	subq	%rdx, %r10
20af: 49 c1 ea 0c                  	shrq	$0xc, %r10
20b3: 4c 39 c8                     	cmpq	%r9, %rax
20b6: 76 0c                        	jbe	0x20c4 <vm_phys_unfree_page+0x74>
20b8: 4d 6b da 68                  	imulq	$0x68, %r10, %r11
20bc: 42 80 7c 19 5c 0d            	cmpb	$0xd, 0x5c(%rcx,%r11)
20c2: 74 dc                        	je	0x20a0 <vm_phys_unfree_page+0x50>

The size of the function is reduced by 16 bytes.
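
To illustrate why the iteration count drops, here is a minimal standalone sketch, not the patch itself. It assumes the old loop aligns the physical address down once per order (the movq/shlq/andq sequence in the first listing) and the new loop clears the lowest set bit (the decq/andq pair in the second). The helper names, the `floor` parameter standing in for the segment start, and the exact bound tests are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed: 4 KiB pages, as the shrq $0xc suggests */

/*
 * Old-style search (hypothetical helper): for order = 1..max_order,
 * align pa down to a 2^(PAGE_SHIFT + order) boundary.  Each candidate
 * is stored in out[]; consecutive candidates may be identical.
 */
static size_t
candidates_by_order(uint64_t pa, int max_order, uint64_t *out, size_t cap)
{
	size_t n = 0;

	for (int order = 1; order <= max_order; order++) {
		assert(n < cap);
		out[n++] = pa & (~(uint64_t)0 << (PAGE_SHIFT + order));
	}
	return (n);
}

/*
 * New-style search (hypothetical helper): repeatedly clear the lowest
 * set bit of pa, so every candidate differs from the previous one.
 * Stops when pa drops below floor or reaches zero.
 */
static size_t
candidates_by_bit_clear(uint64_t pa, uint64_t floor, uint64_t *out,
    size_t cap)
{
	size_t n = 0;

	while (pa != 0) {
		pa &= pa - 1;		/* clear lowest set bit */
		if (pa < floor)
			break;
		assert(n < cap);
		out[n++] = pa;
	}
	return (n);
}
```

For pa = 0x5000 (page index 5, binary 101) and three orders, the per-order version visits 0x4000 twice before reaching 0, while the bit-clearing version visits 0x4000 and 0 once each.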

Test Plan

Peter, do you have any tests that involve the 'ram_blacklist' module? That's what it would take to test this.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

dougm requested review of this revision. Aug 6 2023, 6:53 AM
dougm created this revision.

No, unfortunately, I don't have any tests that involve the 'ram_blacklist' module.

sys/vm/vm_phys.c
1319

How can it arise that m_set->order < order? That would imply that the chunk of pages headed by m_set does not contain m, but in that case we should have already found the largest free chunk containing m.

dougm added inline comments.
sys/vm/vm_phys.c
1319

Suppose m is at an odd page index and allocated, and m-1 is a single free block. Then m->order == VM_NFREEORDER, m_set becomes m-1, and m_set->order == 0, while order == 1 when the two are compared after the loop.
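
A toy model of that case, as a sketch only: pages are identified by index, and the helper names below are hypothetical rather than taken from vm_phys.c. It computes the smallest order a block starting at m_set would need in order to cover m, and compares that with the block's actual order.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest order such that a block of 2^order
 * pages starting at set_idx covers idx.  Assumes set_idx <= idx and
 * that both indices are small enough for the shift not to overflow.
 */
static int
min_order_to_cover(uint64_t set_idx, uint64_t idx)
{
	int order = 0;

	while (set_idx + ((uint64_t)1 << order) <= idx)
		order++;
	return (order);
}

/* True if a free block of set_order pages-order at set_idx contains idx. */
static bool
block_contains(uint64_t set_idx, int set_order, uint64_t idx)
{
	return (set_order >= min_order_to_cover(set_idx, idx));
}
```

With m at index 1 and m_set at index 0, the required order is 1, so an order-0 block at m_set cannot contain m, and the post-loop comparison has to reject it.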

dougm marked an inline comment as done.

Refresh, seeking review.