
libexec/kgdb: Add a new VNET function and add more scaffolding
Closed, Public

Authored by markj on Jun 12 2025, 9:01 PM.
Tags
None
Referenced Files
F131847100: D50825.diff
Sat, Oct 11, 4:20 PM

Details

Summary
- Add a gdb helper function which can be used to easily print VNET
  variables from kgdb.  See the docstring for some usage hints.
- Introduce a little module of useful python functions.
- Install a kernel-gdb.py in /usr/lib/debug/boot/kernel.  This sources
  all of our commands and functions so that they're automatically
  available.

I'm open to other ways of organizing these modules. It'd also be nice
to have a list of all the custom functions and commands we have, but I'm
not sure yet how best to go about that.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision. Jun 12 2025, 9:01 PM

I guess installing these scripts to /usr/libexec/kgdb is not really right--we might want different versions of those scripts for different kernels. So, we could probably install them to /usr/lib/debug/boot/${kernel}/kernel-gdb/ instead. But, where in the src tree should the python libraries live? sys/tools/kgdb maybe?

While testing I found that this has issues with vnet variables in kernel modules.

For a test I added a panic() call in pf_test(), and tried to print V_pf_status.

The kernel and module were loaded at:

0xffffffff80200000  217bc18 kernel
0xffffffff836dd000    52b48 pf.ko

The kernel shows these addresses:

curvnet 0xfffff804a6a76bc0
curvnet->vnet_data_base 0xfffffe02c7d4aa20
&VNET_NAME(pf_status) 0xffffffff81b8fac8
&V_pf_status = 0xfffffe02498da4e8

Debugging the vnet.py script I found that it had the correct vnet: 0xfffff804a6a76bc0, and the correct vnet_data_base 0xfffffe02c7d4aa20, but it thought that the address of vnet_entry_pf_status (== &VNET_NAME(pf_status)) was 0xffffffff8372c390.

It looks like the vnet.py code thinks the pf.ko vnet variables live in the module's address range, while the kernel actually puts those in the kernel's vnet address range. I'm not sure how we can teach vnet.py to get the correct address for kernel module vnet variables.
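
The numbers above are consistent with the usual VNET address computation, in which the per-VNET address of a variable is vnet_data_base plus the address of its vnet_entry_ symbol, truncated to pointer width. A minimal Python sketch of that arithmetic, assuming 64-bit pointers (the helper name vnet_ptr is hypothetical and is not taken from vnet.py):

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit pointers (amd64)


def vnet_ptr(vnet_data_base, entry_addr):
    """Return the per-VNET address for a vnet_entry_ symbol address."""
    return (vnet_data_base + entry_addr) & MASK64


base = 0xfffffe02c7d4aa20           # curvnet->vnet_data_base
kernel_view = 0xffffffff81b8fac8    # &VNET_NAME(pf_status) as the kernel sees it
debugger_view = 0xffffffff8372c390  # address vnet.py derived from pf.ko's symbols

print(hex(vnet_ptr(base, kernel_view)))    # matches &V_pf_status reported above
print(hex(vnet_ptr(base, debugger_view)))  # a different, wrong address
```

With the kernel's address the sum reproduces the reported &V_pf_status, which suggests the only wrong input is the symbol address itself.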

In D50825#1160523, @kp wrote:

While testing I found that this has issues with vnet variables in kernel modules.


I think we can loop over the loaded linker files, find each one's VNET base, compare that with the pointer to see if the object lives in the corresponding module, and if so use the correct base. Certainly more complicated than what's there now, but I think it's doable. Let's see.

Thanks, the vnet function is indeed too simplistic. I believe, though I'm not totally certain, that I'll need to modify link_elf_obj.c a bit to preserve the original base address for the VNET variable section. Otherwise the debugger doesn't have a good way to figure out which VNET section a given variable belongs to. I have some WIP to address this but I need a bit of time.
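
One way to picture the lookup discussed here: for each loaded linker file, record where the VNET section sat before relocation and where the kernel actually placed it, then rebase any symbol address that falls inside the original range. The following is a rough sketch in plain Python, not the code in this revision. VnetSection, its field names, and the section bounds are invented for illustration; only the two pf_status addresses come from the report above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class VnetSection:
    """Hypothetical record for one linker file's VNET variable section."""
    orig_base: int  # where the debugger's symbols place the section
    real_base: int  # where the kernel actually placed the section
    size: int


def rebase(addr: int, sections: Iterable[VnetSection]) -> int:
    """Translate a symbol address into the kernel's VNET address range."""
    for s in sections:
        if s.orig_base <= addr < s.orig_base + s.size:
            return addr - s.orig_base + s.real_base
    # Not inside any module's section: assume it is a base-kernel variable.
    return addr


# Made-up section bounds, chosen so that the pf_status addresses line up.
pf = VnetSection(orig_base=0xffffffff8372c000,
                 real_base=0xffffffff81b8f738,
                 size=0x1000)

print(hex(rebase(0xffffffff8372c390, [pf])))
```

The sketch only works if the original base survives module loading, which is why the linker may need to preserve it.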

The other problem is that we probably want to install a copy of these modules with each kernel, e.g., under /usr/lib/debug/boot/${kernel}/kernel-gdb/. I'll fix that too.

  • Fix vnet.py to work with kernel modules.
  • Move scripts to sys/tools/gdb/.
  • Install scripts to /usr/lib/debug/boot/<kernel name>/gdb
  • Modify kernel-gdb.py to automatically load those scripts.
  • Remove unneeded changes.

Python looks ok

sys/conf/kern.post.mk
401

Why not list them here? This looks unreproducible if there are extra files...

sys/conf/kern.post.mk
401

Mostly since it's just another list to maintain, but yes it's probably better to be explicit.

Have a hard-coded list in kern.post.mk

If this looks ok, I'll extend it to support PCPU and DPCPU variables. And probably counter(9) too.


That would be great.

ObsoleteFiles.inc
56

FWIW, there is already something for kgdb in this file:

# 20201215: in-tree gdb removed 
OLD_FILES+=usr/libexec/gdb
OLD_FILES+=usr/libexec/kgdb

Not sure if OLD_FILES + OLD_DIRS could be consolidated somehow.

sys/tools/gdb/vnet.py
29

Just wondering if this is the most idiomatic and reliable way to check for VNET...

This revision is now accepted and ready to land. Thu, Sep 25, 1:53 PM
  • Fix handling of explicit VNETs passed to $V()
  • Improve the README
  • Add a simplistic selftest module
This revision now requires review to proceed. Sat, Sep 27, 12:31 PM
ObsoleteFiles.inc
56

From looking at the delete-old make targets, I don't think so. Dirs and files are just handled separately.

sys/tools/gdb/vnet.py
29

Maybe? I'm not sure. I don't see why it's worse than anything else, at least. I should probably add a helper function for this though.

I may be holding it wrong, but it still breaks for me:
It's a panic in vnet shutdown, so perhaps it's related to that:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff804aaeba in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:596
#3  0xffffffff804aacad in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:508
#4  0xffffffff804aa96d in db_command_loop () at /usr/src/sys/ddb/db_command.c:555
#5  0xffffffff804ae366 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:267
#6  0xffffffff80bdea1f in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe002fa705d0) at /usr/src/sys/kern/subr_kdb.c:790
#7  0xffffffff810e04eb in trap (frame=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:614
#8  <signal handler called>
#9  kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:556
#10 0xffffffff80b8e7bb in vpanic (fmt=0xffffffff8372daba "foo", ap=ap@entry=0xfffffe002fa70800) at /usr/src/sys/kern/kern_shutdown.c:962
#11 0xffffffff80b8e623 in panic (fmt=0xffffffff81d9fab0 <cnputs_mtx> "bm\034\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:887
#12 0xffffffff836d9722 in pf_sctp_multihome_detach_addr (s=0xfffff806ce77f480) at /usr/src/sys/netpfil/pf/pf.c:7415
#13 pf_detach_state (s=0xfffff806ce77f480) at /usr/src/sys/netpfil/pf/pf.c:1601
#14 0xffffffff836db623 in pf_remove_state (s=0xfffff806ce77f480) at /usr/src/sys/netpfil/pf/pf.c:2857
#15 0xffffffff837048ed in pf_clear_all_states () at /usr/src/sys/netpfil/pf/pf_ioctl.c:6181
#16 0xffffffff837039f5 in shutdown_pf () at /usr/src/sys/netpfil/pf/pf_ioctl.c:6665
#17 pf_unload_vnet () at /usr/src/sys/netpfil/pf/pf_ioctl.c:7011
#18 vnet_pf_uninit (unused=<optimized out>) at /usr/src/sys/netpfil/pf/pf_ioctl.c:7119
#19 0xffffffff80d136c4 in vnet_sysuninit () at /usr/src/sys/net/vnet.c:640
#20 vnet_destroy (vnet=0xfffff8012f7e4180) at /usr/src/sys/net/vnet.c:295
#21 0xffffffff80b483b9 in prison_deref (pr=<optimized out>, flags=67) at /usr/src/sys/kern/kern_jail.c:3576
#22 0xffffffff80bf7ae2 in taskqueue_run_locked (queue=queue@entry=0xfffff80004407a00) at /usr/src/sys/kern/subr_taskqueue.c:517
#23 0xffffffff80bf89d3 in taskqueue_thread_loop (arg=arg@entry=0xffffffff81da08a8 <taskqueue_jail_remove>) at /usr/src/sys/kern/subr_taskqueue.c:829
#24 0xffffffff80b3e042 in fork_exit (callout=0xffffffff80bf8900 <taskqueue_thread_loop>, arg=0xffffffff81da08a8 <taskqueue_jail_remove>, frame=0xfffffe002fa70f40)
    at /usr/src/sys/kern/kern_fork.c:1155
#25 <signal handler called>
(kgdb) p $V("pf_sctp_endpoints")
Python Exception <class 'gdb.error'>: There is no member named origaddr.
Error occurred in Python: There is no member named origaddr.
(kgdb) p $V("pf_status")
Python Exception <class 'gdb.error'>: There is no member named origaddr.
Error occurred in Python: There is no member named origaddr.
In D50825#1205382, @kp wrote:

I may be holding it wrong, but it still breaks for me:
Python Exception <class 'gdb.error'>: There is no member named origaddr.
Error occurred in Python: There is no member named origaddr.

The patch in D52730 is required as well. For kernel modules we need some extra metadata in order to properly resolve VNET symbols; I added an "origaddr" field in link_elf_obj.c for this reason, and here the script is complaining that it doesn't exist.

JFYI, for counter(9) this is all that is needed:

define counter_fetch
  # $arg0 is the counter_u64_t value, treated as an offset from &__pcpu[0].
  set $sum = 0
  set $c = (uintptr_t)$arg0 + (uintptr_t)&__pcpu[0]
  set $n = 0
  while ($n < mp_ncpus)
    set $sum = $sum + *(uint64_t *)$c
    # Step to the same slot in the next CPU's struct pcpu.
    set $c = (uint64_t *)((char *)$c + sizeof(struct pcpu))
    set $n = $n + 1
  end
  p/u $sum
end
document counter_fetch
Display a counter_u64 value.
end
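
The same walk can be written in Python, which may help if it is later folded into the scripts in this revision. This is a sketch against a simulated memory map rather than the gdb API. The read_u64 callback, the parameter names, and the toy addresses are all assumptions made for illustration.

```python
def counter_fetch(counter, pcpu_base, pcpu_size, ncpus, read_u64):
    """Sum a counter(9) value over all CPUs.

    counter   -- the counter value, treated as an offset into each pcpu area
    pcpu_base -- address of __pcpu[0]
    pcpu_size -- sizeof(struct pcpu), the stride between per-CPU slots
    ncpus     -- number of CPUs to walk (mp_ncpus in the macro)
    read_u64  -- callable returning the 64-bit word stored at an address
    """
    total = 0
    for cpu in range(ncpus):
        total += read_u64(pcpu_base + cpu * pcpu_size + counter)
    return total & ((1 << 64) - 1)  # the result is an unsigned 64-bit sum


# Toy memory: two CPUs with a made-up base, stride, and counter offset.
PCPU_BASE, PCPU_SIZE, COUNTER = 0x1000, 0x100, 0x40
memory = {PCPU_BASE + COUNTER: 5, PCPU_BASE + PCPU_SIZE + COUNTER: 7}

print(counter_fetch(COUNTER, PCPU_BASE, PCPU_SIZE, 2, memory.__getitem__))
```

Passing the memory reader as a callback keeps the summation logic independent of how the debugger reads target memory.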


On amd64 only, or does this work on all platforms? IIRC the amd64 flavour is a bit different, see the zpcpu_* macros in amd64/include/pcpu.h vs sys/pcpu.h.


AFAIR, amd64 only has a special increment method, but the storage layout is the same on all platforms.

Add a $PCPU function to access PCPU and DPCPU variables.

I think I will stop here. Will add a pretty-printer for per-CPU counters in a
separate patch.

I'm not qualified to review this in depth, but with the prerequisite patch included this works and is very, very useful.

This revision is now accepted and ready to land. Sun, Sep 28, 1:43 PM