Page MenuHomeFreeBSD

Add retry loop around GetMemoryMap call to fix fragmentation bug
ClosedPublic

Authored by bcran on Feb 25 2019, 6:57 AM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 21 2024, 3:18 AM
Unknown Object (File)
Feb 17 2024, 8:47 PM
Unknown Object (File)
Feb 16 2024, 11:45 AM
Unknown Object (File)
Feb 16 2024, 11:45 AM
Unknown Object (File)
Feb 16 2024, 11:45 AM
Unknown Object (File)
Feb 16 2024, 11:45 AM
Unknown Object (File)
Feb 16 2024, 11:45 AM
Unknown Object (File)
Feb 16 2024, 11:45 AM
Subscribers
None

Details

Summary

The call to BS->AllocatePages can cause the memory map to become framented,
causing BS->GetMemoryMap to return EFI_BUFFER_TOO_SMALL more than once. For
example this can happen on the MinnowBoard Turbot, causing the boot to stop
with an error. Avoid this by calling GetMemoryMap in a loop.

Test Plan

Tested on MinnowBoard Turbot. Caused the existing code to halt boot with the
GetMemoryMap error 5, and re-tested with this patch and verified the boot
continues.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 22761
Build 21853: arc lint + arc unit

Event Timeline

The odd thing is that the comment at the top does say "Note that the AllocatePages call can itself modify the memory map", but then if the second call to GetMemoryMap fails the function bails out instead of retrying. I'm wondering if that was a mistake, and if the call to ExitBootServices can _also_ fail or if we can remove the outer retry loop now the inner once is retrying GetMemoryMap?

The outer loop should be able to go away. IIRC the only non-fatal failure of ExitBootServices at this point is due to not having the latest memory map. If we've already done the allocate -> getmemorymap dance, there will be no failure we can correct.

stand/efi/loader/bootinfo.c
343

You can make it simpler - this could be endless loop (while (true)), and we:

  1. return if GetMemoryMap() will return error other than EFI_BUFFER_TOO_SMALL
  2. break if GetMemoryMap() had no error
  3. return if AllocatePages() will fail.
345

If you move if (!EFI_ERROR(status)) check before this line, we can just have if (status != EFI_BUFFER_TOO_SMALL)

354

Since we did check this already, this if is not needed.

358

This increment is too much, I think (the memory map can be quite large in systems with a lot of memory).

I would suggest to go with smaller steps, like page at a time, also at next line you can use ROUNDUP(sz, EFI_PAGE_SIZE).

Several changes to improve the code, and apply the same fix to loader/copy.c

stand/efi/loader/copy.c
181

I added a NULL check because BS->FreePages() fails (asserts) when passed a NULL pointer.

stand/efi/loader/bootinfo.c
343

Hmm, it probably _is_ safe to make it an infinite loop since I don't see a way it could end up looping forever.

stand/efi/loader/bootinfo.c
354

Good point - I'll fix it.

358

For systems with a large amount of memory, I'd expect still a relatively small memory map with one entry that covers most of the RAM. It'll be larger on systems like HPE that have lots of firmware utilities that will cause a lot of fragmentation.

In my latest patch I've changed it to add 10 memory descriptors to the size, which I think should be sufficient.

stand/efi/loader/copy.c
181

that check should be inside the free(). And this will lead us to another question - boot1 does use pool based allocation, and loader itself does use preallocated heap (see efi_main.c). Where we get to FreePages()?

Changes based on review comments by @tsoome

stand/efi/loader/copy.c
181

Oh, Free _does_ check for a NULL pointer before calling FreePages. However, it does seem that for some reason boot1 isn't using the preallocated heap?

void *
Malloc(size_t len, const char *file __unused, int line __unused)
{
	void *out;

	if (BS->AllocatePool(EfiLoaderData, len, &out) == EFI_SUCCESS)
		return (out);

	return (NULL);
}

void
Free(void *buf, const char *file __unused, int line __unused)
{
	if (buf != NULL)
		(void)BS->FreePool(buf);
}

Remove NULL pointer check before calling free().

stand/efi/loader/copy.c
181

before calling FreePages

Of course that should read FreePool.

stand/efi/loader/copy.c
181

No, boot1 does not use preallocated heap; We could make it to, but we rather should remove boot1 instead:)

From practical point of view, the size of preallocated heap would be an issue, because boot1 will stay in memory while loader.efi is running, that memory is wasted and will bite the systems with small memory - the loader itself is using 64MB for heap, another 64MB for kernel staging area + the loader binary itself. then add boot1...

stand/efi/loader/copy.c
119

You don't need this check.

stand/efi/loader/bootinfo.c
391

mjg (the other mjg, not our mjg) noted that ExitBootServices can modify the memory map then subsequently fail, forcing you to replay the GetMemoryMap dance to actually exit -- but the subsequent retry will succeed, so I was clearly wrong about dropping the outer loop (and now that I re-read the comment, whoops...).

stand/efi/loader/bootinfo.c
391

Do you have a link to the note/discussion, or more information? I haven't read anything in the spec about ExitBootServices modifying the memory map. Is it a bug in some implementations?

stand/efi/loader/copy.c
181

Ah, that's right, loader uses functions in stand/libsa/zalloc_malloc.c. I suspect the tons of calls to the UEFI allocation functions while the loader is running are coming from the system firmware for printing strings etc.

I hope to garbage collect efi/boot1 once I've updated installworld to update the ESP.

bcran marked 7 inline comments as done.Mar 2 2019, 3:46 AM
  • Added a retry loop around ExitBootServices, and addressed other comments on the review.
  • Removed unused ebsretry and retry variables
stand/efi/loader/bootinfo.c
358

@tsoome Could you confirm whether the latest patch addresses your comment, please? I'm not entirely sure what increment you were referring to.

stand/efi/loader/bootinfo.c
358

I think the comment was based on slight misreading; at the time of writing, it was being increased by mmsz (whose name would suggest that it might be the Memory Map SiZe, rather than the descriptor size that it is). Allocating for 10 more descriptors seems reasonable here, I think.

358

This can probably just go away, as discussed on IRC. The conversion to pages will take care of any necessary alignment, then we'll be clobbering sz here shortly if we don't error out anyways.

  • Remove alignment of sz and rename mmsz to dsz (to match copy.c).
  • Update copy.c to use dsz instead of sizeof(EFI_MEMORY_DESCRIPTOR).
bcran marked an inline comment as done.Mar 5 2019, 4:38 AM
bcran marked 2 inline comments as done.Mar 5 2019, 5:38 AM

generally this looks good.
I'm unsure about some of the times you call printf, but it can't hurt, I guess and might help.
minor style issues that I flagged.

stand/efi/loader/bootinfo.c
345

/*

  • Matthew...

is the style.

347

can we call printf if we can't get memory?

362

ditto

396–397

I'd be tempted to do this BEFORE the printf call.

This revision is now accepted and ready to land.Mar 6 2019, 4:22 AM
bcran marked 4 inline comments as done.Mar 6 2019, 5:09 AM
bcran added inline comments.
stand/efi/loader/bootinfo.c
345

Fixed.

347

Yes, because printf etc. use a preallocated 64MB heap.

362

We use a block of memory that was already allocated, so no problems calling printf here.

396–397

Changed.

This revision was automatically updated to reflect the committed changes.
bcran marked 4 inline comments as done.