
procstat kqueues: query and display events registered in the process kqueues
Closed, Public

Authored by kib on Feb 28 2025, 7:21 AM.

Details

Summary

Currently the output of 'procstat kqueues' looks like this:

 PID       KQFD   FILTER      IDENT      FLAGS     FFLAGS       DATA      UDATA       EXT0       EXT1       EXT2       EXT3     STATUS
2323         13     READ          5          0          0          0 0x34887b416000          0          0          0          0          0
2323         13     READ          8          0          0          0 0x34887b44f070          0          0          0          0          0
2323         13     READ         10          0          0          0 0x34887b44f0e0          0          0          0          0          0
2323         13   SIGNAL          1       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
2323         13   SIGNAL          2       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
2323         13   SIGNAL          3       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
2323         13   SIGNAL         13       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
2323         13   SIGNAL         14       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
2323         13   SIGNAL         15       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
2323         13   SIGNAL         20       0x20(CLEAR)          0          0        0x0          0          0          0          0          0

TODO1: add filter-specific reporting. See the vnode data in the kinfo; it is not yet displayed by procstat.
TODO2: fix ABI of kinfo_kevent
TODO3: consider compat32 (currently EOPNOTSUPP)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision. Feb 28 2025, 7:21 AM
lib/libc/gen/sysctl.3
589 ↗(On Diff #151663)
595 ↗(On Diff #151663)
sys/kern/kern_event.c
2852 ↗(On Diff #151663)

An error can occur, e.g., if the vnode referenced by the knote is doomed (in filt_fsdump()). In this case we will abort the whole sysctl, but that feels wrong.

2900 ↗(On Diff #151663)

Why do this adjustment here? Userspace has to handle races anyway.

2906 ↗(On Diff #151663)

What's the purpose of using _SAFE? It seems to be possible that kn is removed while the locks are dropped (or maybe not, since knote_drop_detached() asserts that kn_influx == 1), but that's true of kn1 too, no?

sys/kern/vfs_subr.c
6485 ↗(On Diff #151663)

Did you mean to attach this to file_filtops?

sys/sys/user.h
671 ↗(On Diff #151663)

Is this a pad field?

usr.bin/procstat/procstat_kqueue.c
32 ↗(On Diff #151663)

Why is this define needed?

98 ↗(On Diff #151663)

Some of these names can be obtained from libsysdecode. Did you consider using and extending it instead?

kib marked 9 inline comments as done. Mar 1 2025, 9:30 PM
kib added inline comments.
sys/kern/kern_event.c
2900 ↗(On Diff #151663)

It is a copy/paste from the vm_object sysctl, and I believe that this is just a minor service to userspace.

2906 ↗(On Diff #151663)

This was copy/paste.

sys/kern/vfs_subr.c
6485 ↗(On Diff #151663)

Yes, thank you.

sys/sys/user.h
671 ↗(On Diff #151663)

No, this is unfinished work. The extra data in this struct needs a discriminator, since e.g. the vnode info is relevant only for some non-static set of the filters.

usr.bin/procstat/procstat_kqueue.c
32 ↗(On Diff #151663)

For the struct knote.kn_status KN_XXX constants.

98 ↗(On Diff #151663)

sysdecode is very inconvenient for use in procstat. I can move this set of functions to libsysdecode, but existing interfaces in libsysdecode (FILE * etc) do not fit.

kib marked 6 inline comments as done.

Fixes after Mark's notes.

procstat.1: sort subcommands. Document the rlimitusage and kqueue subcommands.

sys/kern/kern_event.c
2906 ↗(On Diff #151663)

So the assumption appears to be that the in-flux state prevents the knote from being dequeued. I'm not sure that's true in general. Consider the case where we're attaching a new knote in kqueue_register():

                        error = knote_attach(kn, kq);
                        KQ_UNLOCK(kq);
                        if (error != 0) {
                                tkn = kn;
                                goto done;
                        }

                        if ((error = kn->kn_fop->f_attach(kn)) != 0) {
                                knote_drop_detached(kn, td);
                                goto done;
                        }

knote_attach() makes the knote visible to the sysctl; if f_attach fails, the knote is removed without checking for the in-flux state.

Perhaps the new sysctl handler should skip KN_DETACHED knotes, or kqueue_register() should drain the in-flux state after f_attach fails. There might be other cases, I did not check exhaustively.

usr.bin/procstat/procstat_kqueue.c
32 ↗(On Diff #151663)

Probably it's better to add a translation step, akin to KMAP_FLAG_* for vm_map flags.
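
A translation step of this kind could look roughly like the sketch below. Every name in it (the INT_KN_* internal bits, the EXP_KNT_* exported bits, and export_status()) is hypothetical and chosen only for illustration; the real kernel constants and the values used by the patch may differ. The point is only that the exported bits are decoupled from the internal ones, so the kernel can renumber its own flags without breaking userspace.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal status bits; free to change between releases. */
#define INT_KN_ACTIVE	0x01u
#define INT_KN_QUEUED	0x02u
#define INT_KN_DISABLED	0x04u

/* Hypothetical exported bits; these would form the stable ABI. */
#define EXP_KNT_ACTIVE		0x10u
#define EXP_KNT_QUEUED		0x20u
#define EXP_KNT_DISABLED	0x40u

/* One row per internal bit that is worth exporting. */
static const struct {
	uint32_t internal;
	uint32_t exported;
} status_map[] = {
	{ INT_KN_ACTIVE,	EXP_KNT_ACTIVE },
	{ INT_KN_QUEUED,	EXP_KNT_QUEUED },
	{ INT_KN_DISABLED,	EXP_KNT_DISABLED },
};

/*
 * Translate internal status bits to exported ones.  Internal bits
 * without a table entry are dropped rather than leaked to userspace.
 */
uint32_t
export_status(uint32_t internal)
{
	uint32_t out = 0;

	for (size_t i = 0; i < sizeof(status_map) / sizeof(status_map[0]); i++) {
		if ((internal & status_map[i].internal) != 0)
			out |= status_map[i].exported;
	}
	return (out);
}
```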

kib marked an inline comment as done.

Drain detached fluxed knotes.
Convert knt_status bits to stable ABI.
More (unrelated) procstat.1 cleanups.

sys/kern/kern_event.c
2906 ↗(On Diff #151663)

Yes, it would even trigger the assert in knote_drop_detached(). I added the drain there.

sys/kern/kern_event.c
2885 ↗(On Diff #151731)

Does this need to be kept?

sys/sys/user.h
683 ↗(On Diff #151731)

I tried building world with this patch, and got:

In file included from /home/markj/sb/main/src/contrib/googletest/googletest/src/gtest-all.cc:45:
In file included from /home/markj/sb/main/src/contrib/googletest/googletest/src/gtest-port.cc:70:
/home/markj/sb/main/bricoler/freebsd-src-build/obj.amd64.amd64/home/markj/sb/main/src/amd64.amd64/tmp/usr/include/sys/user.h:684:10: error: types cannot be declared in an anonymous union
  684 |                 struct knt_vnode_t {
      |                        ^
lib/libc/gen/sysctl.3
595 ↗(On Diff #151731)

You could optionally xref the kqueue manual.

kib marked 2 inline comments as done.

Make struct kinfo_knote definition correct C++.
Remove #ifdef-ed block.
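
For context on the build error quoted above: C++ rejects a struct type declared inside an anonymous union. One common way to keep such a header valid in both C and C++ is to declare the member types at file scope and only instantiate them inside the union. The sketch below shows that pattern together with a discriminator field of the kind discussed for sys/sys/user.h. All names and fields are hypothetical; this is not the actual struct kinfo_knote layout.

```c
#include <stdint.h>

/* Hypothetical discriminator values for the filter-specific data. */
enum demo_knt_kind {
	DEMO_KNT_NONE = 0,
	DEMO_KNT_VNODE = 1
};

/* Declared at file scope, so C++ accepts its use in the union below. */
struct demo_knt_vnode {
	uint64_t fileid;
	uint64_t fsid;
};

struct demo_kinfo_knote {
	int32_t  filter;
	uint32_t extra_kind;	/* tells which union member is valid */
	union {
		struct demo_knt_vnode vnode;
		uint64_t pad[8];	/* reserves room for future members */
	};
};

/*
 * Return 0 and store the file id when the record carries vnode data,
 * or -1 when the discriminator says the union holds something else.
 */
int
demo_knote_fileid(const struct demo_kinfo_knote *kn, uint64_t *out)
{
	if (kn->extra_kind != DEMO_KNT_VNODE)
		return (-1);
	*out = kn->vnode.fileid;
	return (0);
}
```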

The formatting for knotes with flags set is somewhat strange. The output is also quite wide. For the latter, perhaps the EXT* fields should be printed only if a -v option is specified? As far as I know, they are not used by any existing filter types.

An example from a VM:

 PID       KQFD   FILTER      IDENT      FLAGS     FFLAGS       DATA      UDATA       EXT0       EXT1       EXT2       EXT3     STATUS
1384         13     READ          5          0          0          0 0x1ecadb016000          0          0          0          0          0
1384         13     READ          8          0          0          0 0x1ecadb049070          0          0          0          0          0
1384         13     READ         10          0          0          0 0x1ecadb0490e0          0          0          0          0          0
1384         13   SIGNAL          1       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
1384         13   SIGNAL          2       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
1384         13   SIGNAL          3       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
1384         13   SIGNAL         13       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
1384         13   SIGNAL         14       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
1384         13   SIGNAL         15       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
1384         13   SIGNAL         20       0x20(CLEAR)          0          0        0x0          0          0          0          0          0
3601          5     READ          3       0x20(CLEAR)          0          0        0x0          0          0          0          0          0

Maybe flags should be single letters, like in procstat -v output?

sys/sys/user.h
689 ↗(On Diff #151772)
kib marked 3 inline comments as done. Mar 9 2025, 11:38 AM
kib added inline comments.
lib/libc/gen/sysctl.3
595 ↗(On Diff #151731)

The kqueue there means an object and not a syscall. IMO it is somewhat confusing to reference it in such a way.

Encode flags and status with single letters.

Stop displaying hex for flags/fflags/status, show only the symbols or names.

 PID       KQFD   FILTER      IDENT      FLAGS     FFLAGS       DATA      UDATA     STATUS
2323         13     READ          5          -          -          0 0x19c290616000          -
2323         13     READ          8          -          -          0 0x19c29064f070          -
2323         13     READ         10          -          -          0 0x19c29064f0e0          -
2323         13   SIGNAL          1          C          -          0        0x0          -
2323         13   SIGNAL          2          C          -          0        0x0          -
2323         13   SIGNAL          3          C          -          0        0x0          -
2323         13   SIGNAL         13          C          -          0        0x0          -
2323         13   SIGNAL         14          C          -          0        0x0          -
2323         13   SIGNAL         15          C          -          0        0x0          -
2323         13   SIGNAL         20          C          -          0        0x0          -

Current output from procstat -a kqueues.
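
The single-letter encoding shown in the output above can be sketched as a small table lookup. Only the mapping of 0x20 (CLEAR) to "C" and the use of "-" for an empty set are taken from the outputs quoted in this review; the other bits, the other letters, and the function name are assumptions made for illustration, not what procstat actually implements.

```c
#include <stddef.h>

/* Local stand-ins for kevent flag bits; only 0x20 -> 'C' is from the review. */
#define DEMO_EV_ONESHOT		0x0010u
#define DEMO_EV_CLEAR		0x0020u
#define DEMO_EV_DISPATCH	0x0080u

static const struct {
	unsigned bit;
	char letter;
} flag_letters[] = {
	{ DEMO_EV_ONESHOT,	'O' },	/* hypothetical letter */
	{ DEMO_EV_CLEAR,	'C' },
	{ DEMO_EV_DISPATCH,	'D' },	/* hypothetical letter */
};

/*
 * Render the set bits as letters in table order, or "-" when none of
 * the known bits is set.  Returns buf, or NULL if buf cannot hold at
 * least one character plus the terminator.
 */
const char *
flags_to_letters(unsigned flags, char *buf, size_t len)
{
	size_t n = 0;

	if (buf == NULL || len < 2)
		return (NULL);
	for (size_t i = 0; i < sizeof(flag_letters) / sizeof(flag_letters[0]); i++) {
		if ((flags & flag_letters[i].bit) != 0 && n + 1 < len)
			buf[n++] = flag_letters[i].letter;
	}
	if (n == 0)
		buf[n++] = '-';
	buf[n] = '\0';
	return (buf);
}
```

A fixed-width column then only needs room for the longest possible letter string, which is what keeps the table narrow compared with the hex-plus-name form.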

markj added inline comments.
usr.bin/procstat/procstat.1
199 ↗(On Diff #152064)
200 ↗(On Diff #152064)
222 ↗(On Diff #152064)
This revision is now accepted and ready to land. Mar 13 2025, 12:39 AM
kib marked 3 inline comments as done. Mar 13 2025, 3:28 PM

Thanks a bunch for doing this. This is one of the features I've wanted for a long time as a useful tool to understand what a process might be waiting on. The other thing I would love to have in this vein is a way to see the file descriptors that a thread blocked in select/poll is waiting on (and the specific poll flags for each fd). For that case you could use the existing open file descriptor sysctl/core dump note to get details about each fd; you would just need the list of <fd, poll flags> tuples.

One question I have: have you considered including this information in core dumps via PROCSTAT core dump notes, as we do for open files, etc.?

usr.bin/procstat/procstat_kqueue.c
98 ↗(On Diff #151663)

Note that, in theory, you can use an open_memstream() stream as the FILE * to pass to sysdecode routines (and I think it already has decoders for a few of these), but I can see why that can be a bit clunky.
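
The open_memstream() approach mentioned above can be sketched as follows. The decoder here, demo_decode_filter(), is a hypothetical stand-in for any routine that only knows how to write to a FILE *; it is not a libsysdecode function. The wrapper captures what the decoder prints into a heap string, which a tool could then place into a fixed-width column.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical decoder that can only print to a FILE *. */
static void
demo_decode_filter(FILE *fp, int filter)
{
	/* -1 is the value of EVFILT_READ on FreeBSD; used for illustration. */
	fprintf(fp, "%s", filter == -1 ? "EVFILT_READ" : "UNKNOWN");
}

/*
 * Run the decoder against an in-memory stream and return its output
 * as a malloc'ed, NUL-terminated string.  The caller frees the result.
 * Returns NULL on failure.
 */
char *
decode_to_string(int filter)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *fp;

	fp = open_memstream(&buf, &len);
	if (fp == NULL)
		return (NULL);
	demo_decode_filter(fp, filter);
	/* buf and len are only guaranteed to be current after fclose(). */
	if (fclose(fp) != 0) {
		free(buf);
		return (NULL);
	}
	return (buf);
}
```

The extra allocation and stream setup per decoded value is the kind of overhead that makes this workable but awkward for a tool that prints many small fields.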

In D49163#1125281, @jhb wrote:

Thanks a bunch for doing this. This is one of the features I've wanted for a long time as a useful tool to understand what a process might be waiting on. The other thing I would love to have in this vein is a way to see the file descriptors that a thread blocked in select/poll is waiting on (and the specific poll flags for each fd). For that case you could use the existing open file descriptor sysctl/core dump note to get details about each fd; you would just need the list of <fd, poll flags> tuples.

Isn't this information easily available from the memory dump already? Both for poll and select, the syscalls take the set of fds explicitly, so all that is needed is to look up their arg array or bitmaps.

One question I have: have you considered including this information in core dumps via PROCSTAT core dump notes, as we do for open files, etc.?

This should be done by D49372
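
For the poll case discussed above, the <fd, poll flags> tuples are simply the fd and events fields of each struct pollfd entry in the array passed to poll(2). A minimal sketch of formatting one such entry follows; the function name and the output format are assumptions for illustration, and only three standard event bits are decoded.

```c
#include <poll.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one pollfd entry as "<fd, FLAG|FLAG>" into buf.
 * Returns the snprintf() result (the length the full string needs).
 */
int
format_pollfd(const struct pollfd *pfd, char *buf, size_t len)
{
	static const struct {
		short bit;
		const char *name;
	} names[] = {
		{ POLLIN,	"POLLIN" },
		{ POLLPRI,	"POLLPRI" },
		{ POLLOUT,	"POLLOUT" },
	};
	char ev[64] = "";

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if ((pfd->events & names[i].bit) == 0)
			continue;
		/* Separate names with '|' once the first one is in place. */
		if (ev[0] != '\0')
			strncat(ev, "|", sizeof(ev) - strlen(ev) - 1);
		strncat(ev, names[i].name, sizeof(ev) - strlen(ev) - 1);
	}
	return (snprintf(buf, len, "<%d, %s>", pfd->fd,
	    ev[0] != '\0' ? ev : "0"));
}
```

Obtaining the array from another process (from a core file or a live process) is the hard part and is not shown here.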

In D49163#1126058, @kib wrote:

Isn't this information easily available from the memory dump already? Both for poll and select, the syscalls take the set of fds explicitly, so all that is needed is to look up their arg array or bitmaps.

The use case I have is not a core, but when a process is just sleeping and I want to be able to see what it is waiting on. Kind of like the equivalent of 'show sleepchains' in DDB, but a bit more abstract. It's true that if a process is stuck in "[select]" you can probably guess from procstat -f, but sometimes it is nice to see it explicitly. I have used kgdb to do this in the past (and have kgdb scripts for this, just as I had for dumping kqueues), but not requiring root is nicer (and more accessible to folks who aren't kernel hackers).

In D49163#1126595, @jhb wrote:

The use case I have is not a core, but when a process is just sleeping and I want to be able to see what it is waiting on. Kind of like the equivalent of 'show sleepchains' in DDB, but a bit more abstract. It's true that if a process is stuck in "[select]" you can probably guess from procstat -f, but sometimes it is nice to see it explicitly. I have used kgdb to do this in the past (and have kgdb scripts for this, just as I had for dumping kqueues), but not requiring root is nicer (and more accessible to folks who aren't kernel hackers).

If the process is sleeping, you do not need kgdb to see where it sleeps; it is still gdb work to look for either the struct pollfd[] or the select bitmask. I understand that not needing to operate gdb is nicer there, and I might even consider writing a utility that would use ptrace+kern.files to do the report. I do not think it is worth adding more (tricky) kernel code to walk over the internals of the selfds.

In D49163#1126859, @kib wrote:

If the process is sleeping, you do not need kgdb to see where it sleeps; it is still gdb work to look for either the struct pollfd[] or the select bitmask. I understand that not needing to operate gdb is nicer there, and I might even consider writing a utility that would use ptrace+kern.files to do the report. I do not think it is worth adding more (tricky) kernel code to walk over the internals of the selfds.

In some of my use cases I didn't have debug symbols for the process in question, and when diagnosing behavior across several processes I wanted a way to see this information in a standard format across multiple processes executing different binaries.

In D49163#1129883, @jhb wrote:

In some of my use cases I didn't have debug symbols for the process in question, and when diagnosing behavior across several processes I wanted a way to see this information in a standard format across multiple processes executing different binaries.

So did you note https://github.com/kostikbel/pollinfo (and the prerequisite D49430)?