Page MenuHomeFreeBSD

smp: Dynamically allocate the stoppcbs array
ClosedPublic

Authored by markj on Apr 25 2023, 5:22 PM.
Tags
None
Referenced Files
F108126279: D39806.diff
Tue, Jan 21, 4:15 PM
Unknown Object (File)
Sat, Jan 11, 2:47 AM
Unknown Object (File)
Tue, Jan 7, 6:45 PM
Unknown Object (File)
Tue, Jan 7, 6:44 PM
Unknown Object (File)
Tue, Jan 7, 6:38 PM
Unknown Object (File)
Mon, Jan 6, 7:06 AM
Unknown Object (File)
Dec 12 2024, 7:50 PM
Unknown Object (File)
Nov 12 2024, 5:28 PM

Details

Summary

With MAXCPU = 256, it occupies 80KB, but most of this memory is not
needed. Use dynamic allocation instead to avoid bloat.

Test Plan

Verified that kgdb can still open a vmcore.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.Apr 25 2023, 5:22 PM
This revision is now accepted and ready to land.Apr 25 2023, 5:23 PM

Indeed I believe that kgdb now needs to do the indirection to fetch pcbs.

In D39806#906402, @kib wrote:

Indeed I believe that kgdb now needs to do the indirection to fetch pcbs.

Are you saying that there is a problem with the patch?

In D39806#906402, @kib wrote:

Indeed I believe that kgdb now needs to do the indirection to fetch pcbs.

Are you saying that there is a problem with the patch?

In the patch no, but I do believe that kgdb needs a change after the patch.

In D39806#906673, @kib wrote:
In D39806#906402, @kib wrote:

Indeed I believe that kgdb now needs to do the indirection to fetch pcbs.

Are you saying that there is a problem with the patch?

In the patch no, but I do believe that kgdb needs a change after the patch.

Based on my testing (on amd64), it does not.

In D39806#906694, @kib wrote:

So does kgdb use stoppcbs?

Yes, but it does not appear to need any changes after this commit, gdb evaluates stoppcbs the same as before.

Though, currently I cannot open arm64 vmcores with kgdb, regardless of whether this patch is applied. So I will hold off on committing it for now until I understand why arm64 kernel dumps are broken [again].

So I looked at the actual use of stoppcb in kdb. In particular, /usr/ports/devel/gdb/files/kgdb/fbsd-kvm.c:

static CORE_ADDR stoppcbs;
...
	stoppcbs = kgdb_lookup("stoppcbs");
...
CORE_ADDR
kgdb_trgt_stop_pcb(u_int cpuid)
{

	if (stoppcbs == 0 || pcb_size == 0)
		return 0;

	return (stoppcbs + pcb_size * cpuid);
}

Since stoppcbs changed from array to pointer, I do not see how kgdb_trgt_stop_pcb() could work after the change. And the function is used, according to my browsing around the kgdb sources.

In D39806#906694, @kib wrote:

So does kgdb use stoppcbs?

Yes, but it does not appear to need any changes after this commit, gdb evaluates stoppcbs the same as before.

Though, currently I cannot open arm64 vmcores with kgdb, regardless of whether this patch is applied. So I will hold off on committing it for now until I understand why arm64 kernel dumps are broken [again].

arm64 needs changes for trap frame and pcb changes (which I will do, I just did them for Morello in the CHERI GDB branch this weekend). However, I think @kib is right that this will need to at least have a __FreeBSD_version bump so that GDB will know if it needs to use the address of stoppcbs as the array's base address or if it needs to the treat it as a pointer and read the pointer value from the symbol instead.

You need a crash dump and you have to look at a thread running on another CPU (the crashing thread uses dumppcb) to test this.

Also need lldb patch

contrib/llvm-project/lldb/source/Plugins/Process/FreeBSDKernel/ProcessFreeBSDKernel.cpp:

    // stoppcbs is an array of PCBs on all CPUs
    // each element is of size pcb_size
    int32_t pcbsize =
        ReadSignedIntegerFromMemory(FindSymbol("pcb_size"), 4, -1, error);
    lldb::addr_t stoppcbs = FindSymbol("stoppcbs");
...
          if (stoppcbs != LLDB_INVALID_ADDRESS && pcbsize > 0)
            pcb_addr = stoppcbs + oncpu * pcbsize;
  • Bump __FreeBSD_version so that kgdb can see the change, see D39951.
  • Make the change on all platforms, allocate the stoppcbs array in mp_start().
  • Remove the KDB_STOPPEDPCB macro.
This revision now requires review to proceed.May 3 2023, 3:54 PM
emaste added inline comments.
sys/x86/include/x86_smp.h
58–61

unrelated c99 improvement?

This revision is now accepted and ready to land.May 3 2023, 4:40 PM
sys/x86/include/x86_smp.h
32–33

This will be removed too.

58–61

That's from a rebase.

Since you change all arches, might be it makes sense to rename stoppcbs into something different, so that kgdb does not need to check freebsd version, and simply look up one symbol or another, with changed semantics?

In D39806#909477, @kib wrote:

Since you change all arches, might be it makes sense to rename stoppcbs into something different, so that kgdb does not need to check freebsd version, and simply look up one symbol or another, with changed semantics?

Is there any other advantage? The diff required for kgdb and lldb is quite small and easy to understand, so I don't see much value in renaming.

markj retitled this revision from x86: Dynamically allocate the stoppcbs array to smp: Dynamically allocate the stoppcbs array.May 4 2023, 2:10 PM

One advantage is we can avoid the osversion magic, and it would also just work if we MFC this (although I don't think that's planned). If we really wanted we could even add the new pointer and have it point to the existing stoppcbs initialy, have lldb and kgdb start using it, and then make this change.

All of that said I don't think it's too important and a window of a couple of days to get the kgdb/lldb change sorted should be ok.

One advantage is we can avoid the osversion magic, and it would also just work if we MFC this (although I don't think that's planned). If we really wanted we could even add the new pointer and have it point to the existing stoppcbs initialy, have lldb and kgdb start using it, and then make this change.

All of that said I don't think it's too important and a window of a couple of days to get the kgdb/lldb change sorted should be ok.

I'm not sure how this could be MFCed even with a rename - users need an updated kgdb/lldb regardless, but I think we want to assume that an old kgdb will continue to work with a newer stable/13 kernel.

We could add e.g. stoppcbs_ptr and have it point to stoppcbs in 13.x, so old and updated lldb/kgdb would work on old and updated kernels with no version check. In any case I'm fine with either approach.

We could add e.g. stoppcbs_ptr and have it point to stoppcbs in 13.x, so old and updated lldb/kgdb would work on old and updated kernels with no version check. In any case I'm fine with either approach.

Sorry, but I still don't understand what that buys us. No matter what, kgdb/lldb need an update, and as a result the change cannot be MFCed.

I think an osreldate check is ok. I already have to add one now to deal with arm64 struct layout changes in 14 anyway.

I'll commit this version later today if there are no objections.

Sorry, but I still don't understand what that buys us. No matter what, kgdb/lldb need an update, and as a result the change cannot be MFCed.

I have no objection to this version.

My point was just that we could start with something like:

struct pcb stoppcbs[];
struct pcb *stoppcbs_ptr = stoppcbs;

and have kgdb and lldb dereference stoppcbs_ptr if found, falling back to stoppcbs if not. This kgdb and lldb would work in all cases, regardless of whether there is only stoppcbs, or whether stoppcbs_ptr points to the static array or malloc. After enough time passes to ensure kgdb and lldb versions in use have the change it would be possible to MFC the use of malloc.

After enough time passes to ensure kgdb and lldb versions in use have the change it would be possible to MFC the use of malloc.

Sorry, but I still don't understand what that buys us. No matter what, kgdb/lldb need an update, and as a result the change cannot be MFCed.

I have no objection to this version.

My point was just that we could start with something like:

struct pcb stoppcbs[];
struct pcb *stoppcbs_ptr = stoppcbs;

and have kgdb and lldb dereference stoppcbs_ptr if found, falling back to stoppcbs if not. This kgdb and lldb would work in all cases, regardless of whether there is only stoppcbs, or whether stoppcbs_ptr points to the static array or malloc. After enough time passes to ensure kgdb and lldb versions in use have the change it would be possible to MFC the use of malloc.

According to our general compatibility policy I'd presume to be able to use 13.0 kgdb to debug 13.x vmcores. If so, that precludes MFCing the kernel patch, no matter which approach is taken.

According to our general compatibility policy I'd presume to be able to use 13.0 kgdb to debug 13.x vmcores. If so, that precludes MFCing the kernel patch, no matter which approach is taken.

To be clear, I think it's safer to MFC none of the kernel changes and it's a moot point.

But in my strawman proposal I'd leave stoppcbs in stable/13, with stoppcbs_ptr pointing to it, so that (a new) kgdb/lldb would work with all three variants without having to do any version checks.

13.0 - stoppcbs_ptr does not exist, old/new kgdb uses stoppcbs
stable/13 - old kgdb accesses stoppcbs, new kgdb via stoppcbs_ptr
main - old kgdb fails, new kgdb accesses allocated stoppcbs_ptr

kgdb/lldb would work with all three variants without having to do any version checks.

As I understand it, jhb is planning to add version checks soon anyway to handle arm64 trapframe layout changes, so I think I'll just go ahead with what we have. ELF minidumps will make this kind of problem much easier to deal with when they arrive.

This revision was automatically updated to reflect the committed changes.