Path: head/sys/vm/vm_object.c

Rewrite handling of the busy parent page in collapse
Status: Closed, Public

Authored by kib on Nov 13 2015, 12:58 PM.

Details

Reviewers

alc
cem

Commits

rS291576: r221714 fixed the situation when the collapse scan improperly handled

Summary

This is a continuation of D4103. It includes:

- the D4103 fix, which handles busy pages for both _WAIT and _NOWAIT in the same way (modulo sleep);
- the fix to not free the shadow swap when the parent page is invalid;
- the consolidation of all three instances of the sleep code into one helper.
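The first and third items can be illustrated with a small user-space model. One helper decides what the collapse scan does with a busy page, and the only difference between the _WAIT and _NOWAIT modes is whether it sleeps. All names below (`struct page_model`, `enum scan_mode`, `handle_busy_page()`, `sleep_on_busy()`) are hypothetical stand-ins, not the vm_object.c code. The model also assumes that the _WAIT path restarts the scan from the head of the page list, because the object locks are dropped while sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one resident page in the backing object. */
struct page_model {
	bool busy;
	struct page_model *next;	/* next page in the backing object's list */
};

/* Hypothetical scan modes mirroring the _WAIT/_NOWAIT distinction. */
enum scan_mode { SCAN_WAIT, SCAN_NOWAIT };

/* Counts sleeps so the model can be checked without a real scheduler. */
static int sleep_count;

/* Stand-in for dropping the object locks and sleeping until the page is unbusied. */
static void
sleep_on_busy(struct page_model *p)
{
	sleep_count++;
	p->busy = false;	/* model: the owner releases the busy state while we sleep */
}

/*
 * Single helper for every busy-page case in the scan.  The busy page is
 * never freed or moved here; the modes differ only in whether we sleep.
 * Returns the page at which the scan should continue.
 */
static struct page_model *
handle_busy_page(struct page_model *first, struct page_model *p,
    enum scan_mode mode)
{
	assert(p->busy);		/* callers only pass busy pages */
	if (mode == SCAN_NOWAIT)
		return (p->next);	/* leave the busy page alone, keep scanning */
	sleep_on_busy(p);
	return (first);			/* locks were dropped: restart from the list head */
}
```

Keeping all three sleep sites behind one function means that the locking and restart rules are written once, which is the point of the third summary item.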

It was coded with the assumption that an invalid page is always busy, which is checked by a new assertion. If preferred, the assertion can be replaced by an explicit check that skips invalid pages.
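The two options can be sketched side by side. Option A asserts that "invalid implies busy". Option B tests validity explicitly and skips invalid pages. Both keep the backing object's swap for an index whose parent page is invalid, which is the second summary item. This is a simplified user-space model: `struct parent_page` and both functions are hypothetical. The model also assumes that the backing swap is redundant only when the parent page is valid and not busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the parent (shadow) object's page at one index. */
struct parent_page {
	bool valid;	/* page contents are valid */
	bool busy;	/* page is busied, e.g. by an in-progress fault */
};

/*
 * Option A: rely on "invalid implies busy" and check it.  Busy pages are
 * left to the busy-page path, so the backing swap is never freed for them.
 */
static bool
may_free_backing_swap_assert(const struct parent_page *pp)
{
	assert(pp->valid || pp->busy);	/* invalid implies busy */
	return (pp->valid && !pp->busy);
}

/*
 * Option B: test validity explicitly and skip invalid pages instead of
 * asserting; the backing swap may still hold the only copy of the data.
 */
static bool
may_free_backing_swap_skip(const struct parent_page *pp)
{
	if (!pp->valid)
		return (false);
	return (!pp->busy);
}
```

With Option B, an invalid page that is not busy is tolerated rather than triggering the assertion. That is the trade-off left open in the summary.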

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib updated this revision to Diff 10156. · Nov 13 2015, 12:58 PM

kib retitled this revision to Rewrite handling of the busy parent page in collapse.

kib updated this object.

kib edited the test plan for this revision.

kib added reviewers: alc, cem.

kib set the repository for this revision to rS FreeBSD src repository - subversion.

Herald added a subscriber: imp. · Nov 13 2015, 12:58 PM

kib mentioned this in D4103: Fix vm_object_collapse <-> vm_fault race (again). · Nov 13 2015, 12:58 PM

cem: Looks good to me.

This revision is now accepted and ready to land. · Nov 13 2015, 6:57 PM

cem edited edge metadata. · Nov 13 2015, 6:57 PM

cem added a subscriber: rlibby.

In D4146#87206, @cem wrote:
> Looks good to me.

Did you test the patch, in particular with your failpoints?

I've tested the proposed patch with some failpoints (described in https://reviews.freebsd.org/D4103) and it seems to fix this particular panic. There are still some bugs left in this area.

(1) We can still hit this related but slightly different panic with the UMA NOWAIT failpoints enabled while running:

	while true; do sh -c "echo | cat | cat > /dev/null"; done

panic @ time 1448702920.309, thread 0xfffff80028b93000: backing_object 0xfffff800282fd400 was somehow re-referenced during collapse!
cpuid = 0
Panic occurred in module kernel loaded at 0xffffffff80200000:

Stack: --------------------------------------------------
kernel:kassert_panic+0x17e
kernel:vm_object_collapse+0x40d
kernel:vm_object_deallocate+0x4c2
kernel:vm_map_process_deferred+0x88
kernel:vm_map_remove+0xca
kernel:vmspace_exit+0xcc
kernel:exit1+0x55d
kernel:sys_sys_exit+0xd
kernel:amd64_syscall+0x2f6
--------------------------------------------------
Disabling swatchdog
Cannot dump stacks. No stack dump device defined.

db> show object 0xfffff800282fd400
Object 0xfffff800282fd400: type=0, size=0x20, res=0, ref=2, flags=0x1008 ruid 0 charge 20000
 sref=1, backing_object(0)=(0)+0x0

And (2), the D4103 failpoints still cause userspace processes to dump core. But the kernel stays up with just the D4103 failpoints.

In D4146#90570, @cem wrote:
> (1) We can still hit this related but slightly different panic with the UMA NOWAIT failpoints + while true; do sh -c "echo | cat | cat > /dev/null"; done:
> panic @ time 1448702920.309, thread 0xfffff80028b93000: backing_object 0xfffff800282fd400 was somehow re-referenced during collapse!

I think an easy way to understand what is going on is to add a debugging print like:

	/* Report and backtrace any reference taken on an object being collapsed. */
	if ((object->flags & OBJ_IN_COLLAPSE) != 0) {
		printf("referencing the backing collapsed obj %p\n", object);
		kdb_backtrace();
	}

and correspondingly set and clear the new OBJ_IN_COLLAPSE flag on the backing_object in vm_object_collapse().
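The debugging scheme can be modelled in user space. The collapse brackets the backing object with the flag, and every reference taken while the flag is set is reported. `struct object_model`, `object_reference()`, `collapse_begin()`, `collapse_end()`, and the numeric value of `OBJ_IN_COLLAPSE` are all hypothetical. In the kernel, the report would also call `kdb_backtrace()` so that the offending caller is visible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit for the proposed debugging flag; the value is arbitrary here. */
#define OBJ_IN_COLLAPSE	0x8000

struct object_model {
	int flags;
	int ref_count;
};

/* Number of references taken while the object was marked as being collapsed. */
static int collapse_refs;

/* Model of taking a reference, with the debugging check proposed in the review. */
static void
object_reference(struct object_model *object)
{
	if ((object->flags & OBJ_IN_COLLAPSE) != 0) {
		/* The kernel version would also call kdb_backtrace() here. */
		fprintf(stderr, "referencing the backing collapsed obj %p\n",
		    (void *)object);
		collapse_refs++;
	}
	object->ref_count++;
}

/* Model of vm_object_collapse() marking the backing object on entry... */
static void
collapse_begin(struct object_model *backing_object)
{
	backing_object->flags |= OBJ_IN_COLLAPSE;
}

/* ...and clearing the mark once the collapse is finished. */
static void
collapse_end(struct object_model *backing_object)
{
	assert((backing_object->flags & OBJ_IN_COLLAPSE) != 0);
	backing_object->flags &= ~OBJ_IN_COLLAPSE;
}
```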

I suspect that this happens during some walk of the shadow chain that increases the reference count, possibly even during vm_fault(). Most likely the assertion is simply invalid and will need to be removed, but let's see when it is triggered.
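The suspected mechanism can be sketched as follows. This sketch assumes that the failing assertion requires the backing object to hold exactly one reference (the parent's) during the collapse. The `show object` output above reports ref=2. In the model, a hypothetical walker takes a reference on every object behind the top of the shadow chain, which is enough to break that invariant. `struct obj` and all functions are illustrative only. The model does not reproduce locking or the real interleaving with vm_fault().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with only the fields the invariant needs. */
struct obj {
	int ref_count;
	struct obj *backing_object;	/* next object down the shadow chain */
};

/* Hypothetical walker that references every object behind the top one. */
static void
walk_shadow_chain_referencing(struct obj *top)
{
	for (struct obj *o = top->backing_object; o != NULL;
	    o = o->backing_object)
		o->ref_count++;
}

/* Drops the references taken by the walker. */
static void
walk_shadow_chain_release(struct obj *top)
{
	for (struct obj *o = top->backing_object; o != NULL;
	    o = o->backing_object) {
		assert(o->ref_count > 1);	/* keep the shadowing object's own reference */
		o->ref_count--;
	}
}

/* Invariant assumed by the collapse: the parent holds the only reference. */
static bool
collapse_invariant_holds(const struct obj *backing_object)
{
	return (backing_object->ref_count == 1);
}
```

If a walker like this runs while the collapse is in progress, the assertion fires even though the walker's reference is transient. That is consistent with the remark that the assertion itself may be invalid.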

> And (2), the D4103 failpoints still cause userspace processes to dump core. But the kernel stays up with just the D4103 failpoints.

OK, let's finish with the assertion first. I want to get this patch committed before producing more changes.

In D4146#90770, @kib wrote:
> I think an easy way to understand what is going on is to add the debugging print like
> if ((object->flags & OBJ_IN_COLLAPSE) != 0) {printf("referencing the backing collapsed obj %p\n", object); kdb_backtrace();}
> and correspondingly set and clear the new OBJ_IN_COLLAPSE flag for the backing_object in the vm_object_collapse().

Unfortunately, adding such a check seems to have closed the race window.

rlibby mentioned this in D4326: vm_fault_hold: handle vm_page_rename failure. · Dec 1 2015, 4:13 AM

Closed by commit rS291576: r221714 fixed the situation when the collapse scan improperly handled (authored by kib). · Dec 1 2015, 9:06 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                       Size
head/sys/vm/vm_object.c    183 lines

Diff 10629

head/sys/vm/vm_object.c

	 enum Dwarf_ISA {
	 	DW_ISA_ARM,
	 	DW_ISA_IA64,
	 	DW_ISA_MIPS,
	 	DW_ISA_PPC,
	 	DW_ISA_SPARC,
	 	DW_ISA_X86,
	 	DW_ISA_X86_64,
	+	DW_ISA_AARCH64,
	 	DW_ISA_MAX
	 };
	 
	 /* Function prototype definitions. */
	 __BEGIN_DECLS
	 Dwarf_P_Attribute dwarf_add_AT_comp_dir(Dwarf_P_Die, char *, Dwarf_Error *);
	 Dwarf_P_Attribute dwarf_add_AT_const_value_signedint(Dwarf_P_Die, Dwarf_Signed,
	 	Dwarf_Error *);