
ktrace: Record detailed ECAPMODE violations
Closed, Public

Authored by jfree on Jun 20 2023, 10:03 PM.

Details

Summary
ktrace: Record detailed ECAPMODE violations

When a Capsicum violation occurs in the kernel, ktrace will now record
detailed information pertaining to the violation.

For example:
- When a namei lookup violation occurs, ktrace will record the path.
- When a signal violation occurs, ktrace will record the signal number.
- When a sendto(2) violation occurs, ktrace will record the recipient
  sockaddr.

For all violations, the syscall and ABI are recorded.

kdump is also modified to display this new information to the user.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

Overall this looks good to me. I wonder if @emaste, @oshogbo or @theraven have any thoughts on it? To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

sys/kern/sys_capability.c
160

Missing a newline after a variable definition.

184–185

Missing a newline after a variable definition.

sys/sys/ktrace.h
212
217

IMO, CAPFAIL_NAMEI would be a better name for this.

219

Missing newlines between the type definitions.

226

This can be PATH_MAX instead, in which case you can pull in syslimits.h instead of the much larger param.h.

usr.bin/kdump/kdump.c
2128

Missing newline after a local variable definition.

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

So this traces the system calls that are not on the allowed-in-cap-mode list?

Among other things (like calling namei() with an absolute path), yes.
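The answer above names two kinds of events: system calls missing from the capability-mode allow list, and path lookups such as absolute paths (kdump output later in this thread also shows an AT_FDCWD lookup being flagged). A rough sketch of both checks follows. The allow list is a tiny hypothetical subset chosen for illustration, the function names are invented, and the lookup rule is simplified (it ignores "..", for instance).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, heavily abbreviated allow list.  The real set of system
 * calls permitted in capability mode is maintained in the kernel and is
 * much longer; these entries are illustrative only.
 */
static const char *const capmode_allowed[] = {
	"read", "write", "close", "openat", "fstat", "mmap", "exit",
};

/* Would this system call be rejected outright in capability mode? */
bool
syscall_violates(const char *name)
{
	size_t n = sizeof(capmode_allowed) / sizeof(capmode_allowed[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(name, capmode_allowed[i]) == 0)
			return (false);
	return (true);
}

/*
 * Would a path lookup be rejected?  Simplified rule: lookups relative to
 * the current directory (AT_FDCWD) and absolute paths are restricted.
 */
bool
lookup_violates(bool relative_to_cwd, const char *path)
{
	if (relative_to_cwd)
		return (true);
	return (path[0] == '/');
}
```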

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

I haven't looked into the code, to be honest. However, I don't see a real application for this approach, or maybe I misread how this is supposed to work.
Is this a tool for improving debugging sandboxed applications or sandboxing new applications?
If the second, this doesn't help determine the "trusted part" and "capability part" of the application because after re-arranging the code I still will get the same errors.
If I were trying to sandbox a new application, why not put cap_enter at a "random" place and see how ktrace reports the same errors?

After a quick scan, I don't see any additional parameter to ktrace. Could I see some sample ktrace output? Won't it mislead users when something happening in the TCB ("trusted part") is reported as a Capsicum violation?

Again, maybe I just need some more context to understand the reasoning behind this change.

Here is an example of ktrace CAPFAIL tracing in action, with this patch:

https://cdaemon.com/posts/capsicum#detecting-violations

To summarize the patch very briefly, this lets you ktrace an application that does not run in capability mode, and ktrace will log all events which would have triggered a Capsicum violation.

I haven't looked into the code, to be honest. However, I don't see a real application for this approach, or maybe I misread how this is supposed to work.
Is this a tool for improving debugging sandboxed applications or sandboxing new applications?

Mostly for sandboxing new applications.

If the second, this doesn't help determine the "trusted part" and "capability part" of the application because after re-arranging the code I still will get the same errors.

No, it doesn't tell you how to split out your code. The idea is that it would be useful early on, when becoming familiar with an application's behaviour. Tracing records

If I were trying to sandbox a new application, why not put cap_enter at a "random" place and see how ktrace reports the same errors?

Because the application will (probably) quickly fail and exit or just spin its wheels somehow, because most of its system calls will fail. This new mode does not affect the behaviour of the program.
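The claim that this mode does not affect the program's behaviour can be modelled as a check whose return value depends only on whether the process is in capability mode, while the trace flag merely controls logging. This is a hypothetical sketch, not the kernel implementation: the struct, field and function names are invented, and the ECAPMODE fallback is only a stand-in so the sketch builds off FreeBSD.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef ECAPMODE
#define ECAPMODE 94	/* stand-in value for non-FreeBSD builds */
#endif

/* Hypothetical per-process state for this model. */
struct proc_state {
	bool in_capmode;	/* has the process called cap_enter()? */
	bool trace_capfail;	/* is "ktrace -t p" enabled for it? */
	unsigned nrecorded;	/* violations logged so far */
	const char *last;	/* most recently logged operation */
};

/* Stand-in for writing a capability-failure record to the trace file. */
static void
record_violation(struct proc_state *p, const char *what)
{
	p->nrecorded++;
	p->last = what;
}

/*
 * Called only for an operation that capability mode forbids.  Tracing
 * only observes: the result depends on in_capmode alone, so a traced
 * process outside capability mode carries on as before.
 */
int
capmode_check(struct proc_state *p, const char *what)
{
	if (p->trace_capfail)
		record_violation(p, what);
	return (p->in_capmode ? ECAPMODE : 0);
}
```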

After a quick scan, I don't see any additional parameter to ktrace. Could I see some sample ktrace output? Won't it mislead users when something happening in the TCB ("trusted part") is reported as a Capsicum violation?

Again, maybe I just need some more context to understand the reasoning behind this change.

If I understand correctly, for application like:

localtime();
open();
cap_enter();
openat();

The first two operations will always cause ktrace to report insufficient capabilities. That is a false positive, and it will mislead "normal" users.
The application that wasn't supposed to be sandboxed will also get these errors.
I think this should be a special flag to ktrace which says "Report all capability violations".

I would also consider doing something else, instead of or in addition to extending ktrace: adding an lldb plugin that lets you specify the point from which this kind of violation should be reported.
Or, if lldb is too complicated, maybe we could add a special syscall, cap_enter_report_only or something like that.

If I understand correctly, for application like:

localtime();
open();
cap_enter();
openat();

The first two operations will always cause ktrace to report insufficient capabilities. That is a false positive, and it will mislead "normal" users.

What do you mean by "normal user"? To get these events, you have to explicitly ask ktrace to tell you about capability mode violations.

The application that wasn't supposed to be sandboxed will also get these errors.
I think this should be a special flag to ktrace which says "Report all capability violations".

It already is. You have to specify ktrace -t p. The "p" flag is "trace capability check failures".
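Because the events are opt-in through the trace-point letters given to -t, a quick way to see how "-t p" selects them is a sketch that turns such a string into a bitmask. The TRF_* names and bit values are hypothetical (the real constants live in the kernel's ktrace header), the letters other than 'p' reflect our reading of ktrace(1), and unlike the real tool, which accepts more letters, this parser rejects anything it does not know.

```c
/* Hypothetical trace-point bits; not the real kernel values. */
enum {
	TRF_SYSCALL = 1 << 0,	/* 'c': system calls */
	TRF_GENIO   = 1 << 1,	/* 'i': I/O */
	TRF_NAMEI   = 1 << 2,	/* 'n': path lookups */
	TRF_PSIG    = 1 << 3,	/* 's': signals */
	TRF_CAPFAIL = 1 << 4,	/* 'p': capability check failures */
};

/* Parse a -t style string such as "cp"; returns 0, or -1 on a bad letter. */
int
parse_trace_points(const char *s, unsigned *mask)
{
	unsigned m = 0;

	for (; *s != '\0'; s++) {
		switch (*s) {
		case 'c': m |= TRF_SYSCALL; break;
		case 'i': m |= TRF_GENIO; break;
		case 'n': m |= TRF_NAMEI; break;
		case 's': m |= TRF_PSIG; break;
		case 'p': m |= TRF_CAPFAIL; break;
		default: return (-1);
		}
	}
	*mask = m;
	return (0);
}
```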

I would also consider doing something else, instead of or in addition to extending ktrace: adding an lldb plugin that lets you specify the point from which this kind of violation should be reported.
Or, if lldb is too complicated, maybe we could add a special syscall, cap_enter_report_only or something like that.

Ah, OK, I thought it was printed by default.
Then I don't have any complaints about the idea.

Are these events exposed to DTrace? When sandboxing, the thing I really want is a stack trace in userspace at the point where the violation happened. If so, it would be great to include a script that logged them. Ideally with an option of an explicit start marker so you can put in a fake cap_enter and be told what you still need to fix.

It's doable in principle, but in practice dtrace's inability to resolve backtraces in the face of fork/exec makes it mostly unusable. For instance, dtrace -n 'fbt::ktrcapfail:entry /progenyof($pid)/{ustack();}' -c "ktrace -t p ls" works up until ktrace execs the target process. This is really a limitation of dtrace and doesn't have anything to do with ktrace.

It works somewhat better if you're experimenting with a daemon that you can attach to. For instance, below, pid 3641 is syslogd:

# ktrace -t p -p 3641
# sleep 2 && kill -HUP 3641 &
# dtrace -n 'fbt::ktrcapfail:entry /pid == $target/{ustack();}' -p 3641
CPU     ID                    FUNCTION:NAME
  7  50830                 ktrcapfail:entry
              libc.so.7`_open+0xa
              libc.so.7`0x9726e899fcf
              libc.so.7`0x9726e8990ff
              libc.so.7`tzset+0x36
              syslogd`init+0xf6
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008

  7  50830                 ktrcapfail:entry
              libc.so.7`_open+0xa
              syslogd`init+0x1fa
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008

  7  50830                 ktrcapfail:entry
              libc.so.7`_openat+0xa
              syslogd`cfline+0x8fd
              syslogd`parseconfigfile+0x602
              syslogd`init+0x20f
              syslogd`main+0xeb1
              libc.so.7`__libc_start1+0x12a
              syslogd`_start+0x2d
              `0xf98aa403008
...

so you can get some sense of what's going on.

All this aside, I'd argue that having the "capfail" tracepoints identified and enumerated (in this case by ktrace) is useful in its own right. You could for instance use ktrace to get a full trace of an application's behaviour, then subsequently use gdb or dtrace or whatever to drill down into specific system calls.

It's doable in principle, but in practice dtrace's inability to resolve backtraces in the face of fork/exec makes it mostly unusable

I think around 20% of the places I've used Capsicum have done fork/execve. This would have been a huge win for the most recent one (which used the Tesseract OCR libraries in a Capsized process, and where I spent a while adding the compat syscall wrappers to use in cap mode). Discovering that libc calls open via _open because it hates me and everything I stand for, and that libomp uses __sys_shm_open2 via the shm_open wrapper, took a while; with that DTrace script it would have taken a minute or two.
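The "compat syscall wrappers" mentioned above usually follow one pattern: open a directory before cap_enter(), then rewrite absolute-path opens beneath it into openat() calls relative to that descriptor. The sketch below illustrates that pattern only; sandbox_preopen, sandbox_open and the single-directory design are hypothetical, not the wrappers from that project. EACCES stands in for the error a real sandbox would return, and the sketch does not vet ".." components.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* openat() */
#endif
#include <errno.h>
#include <fcntl.h>	/* openat(), O_* flags */
#include <string.h>
#include <sys/stat.h>	/* fstat(), S_ISDIR() */
#include <unistd.h>	/* close() */

/* Hypothetical state: one directory opened before cap_enter(). */
static int preopen_fd = -1;
static const char *preopen_prefix;	/* absolute path, no trailing '/' */

/* Register a directory descriptor; returns 0, or -1 with errno set. */
int
sandbox_preopen(int dirfd, const char *prefix)
{
	struct stat sb;

	if (fstat(dirfd, &sb) != 0)
		return (-1);
	if (!S_ISDIR(sb.st_mode)) {
		errno = ENOTDIR;
		return (-1);
	}
	if (preopen_fd >= 0)
		close(preopen_fd);
	preopen_fd = dirfd;
	preopen_prefix = prefix;
	return (0);
}

/*
 * Hypothetical open(2) replacement: "<prefix>/rest" becomes
 * openat(preopen_fd, "rest", flags).  Other paths fail with EACCES.
 * O_CREAT is unsupported because no mode argument is passed.
 */
int
sandbox_open(const char *path, int flags)
{
	size_t n;

	if (preopen_fd < 0) {
		errno = EBADF;
		return (-1);
	}
	n = strlen(preopen_prefix);
	if (strncmp(path, preopen_prefix, n) != 0 || path[n] != '/' ||
	    path[n + 1] == '\0') {
		errno = EACCES;
		return (-1);
	}
	return (openat(preopen_fd, path + n + 1, flags));
}
```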

I created this patch to make the Capsicumization experience less intimidating for inexperienced developers. Both David and Mariusz may not be the target audience for this change because they already know how to extract the information that the tracing provides. Developers that are unfamiliar with Capsicum's semantics could use this tracing mode to easily determine why their program is not working in capability mode. I think it provides a solid starting point so new developers don't get lost and discouraged.

The barrier to entry for Capsicum development is high and this tracing flag is an extra tool in a new developer's toolbox that will ease them into Capsicum. It really can only help.

I created this patch to make the Capsicumization experience less intimidating for inexperienced developers. Both David and Mariusz may not be the target audience for this change

Oh, I am definitely in the target audience for this, thank you for working on it! If I’d had this and the DTrace script a few weeks ago, it would have saved me a few hours of work.

Hello Jake,

I have raised my concerns about the approach you have taken. I asked questions to understand it better and suggested some potential fixes. Mark has proved me wrong, and in the end, I supported the change. This is how the peer review works. You have been working on this for weeks, or maybe months, and I was trying to understand your approach to the issue. :)

I'm not sure who else should do this review, if not people who have been working on Capsicum and Capsicumizing applications.

Setting aside the nontechnical issues, in my opinion DTrace is one way to accomplish this. However, I don't see a reason not to extend the ktrace output with additional information. In the end, we can have multiple tools for different levels of experience, especially since DTrace is another complicated tool one must learn.

jfree added inline comments.
sys/sys/ktrace.h
226

This can be PATH_MAX instead, in which case you can pull in syslimits.h instead of the much larger param.h.

Other macros in this file depend on <sys/param.h>, so if we are trying to make headers self-contained, we already have to include it.

jfree marked an inline comment as done.
  • Address Mark's comments
  • Rebase on main after several months
This revision is now accepted and ready to land (Mar 29 2024, 3:33 PM).
This revision was automatically updated to reflect the committed changes.

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum?

Eg:

<11:53am>beast/gallatin:~>ktrace ls -1 | wc -l

319

<11:55am>beast/gallatin:~>kdump | grep CAP
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ktrace CAP system call not allowed: execve
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: readlink
22235 ls CAP fstatat: restricted VFS lookup: AT_FDCWD
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: fchdir
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: open
22235 ls CAP system call not allowed: fchdir
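Output like the listing above is easier to act on once it is aggregated per system call. A small sketch that tallies "CAP system call not allowed: NAME" lines follows. The struct and function names are hypothetical; it matches only the exact text shown in this listing, skips other CAP records (such as the fstatat lookup line), and expects lines without trailing newlines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tally entry: one system call name and its count. */
struct tally {
	char name[64];
	int count;
};

/*
 * Count "CAP system call not allowed: NAME" lines per NAME, in order of
 * first appearance.  Returns the number of distinct names stored
 * (at most maxt); names that arrive once the table is full are dropped.
 */
int
tally_capfail(const char *const *lines, size_t nlines, struct tally *t,
    int maxt)
{
	static const char marker[] = "CAP system call not allowed: ";
	int n = 0;

	for (size_t i = 0; i < nlines; i++) {
		const char *p = strstr(lines[i], marker);
		int j;

		if (p == NULL)
			continue;
		p += sizeof(marker) - 1;	/* skip to the syscall name */
		for (j = 0; j < n; j++)
			if (strcmp(t[j].name, p) == 0)
				break;
		if (j == n) {
			if (n == maxt)
				continue;
			snprintf(t[n].name, sizeof(t[n].name), "%s", p);
			t[n].count = 0;
			n++;
		}
		t[j].count++;
	}
	return (n);
}
```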

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum?

This was done already in commit f239db4800ee9e7ff8485f96b7a68e6c38178c3b.

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled

Are systems without Capsicum still supported? I thought that option was removed in 14.

Userland always builds with capsicum. Kernel support is still optional.
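Since userland is always built with Capsicum while kernel support is optional, a program cannot decide at compile time whether capability mode is available. A minimal run-time check is sketched below, assuming FreeBSD's feature_present(3) and the kernel feature name "security_capability_mode" (our assumption; confirm with `sysctl kern.features` on the target system). The function name is hypothetical, and off FreeBSD the sketch simply reports no support.

```c
#include <stdbool.h>
#ifdef __FreeBSD__
#include <unistd.h>	/* feature_present(3) */
#endif

/* Does the running kernel support capability mode? (hypothetical helper) */
bool
kernel_has_capmode(void)
{
#ifdef __FreeBSD__
	/* Assumed feature name; see the hedge in the text above. */
	return (feature_present("security_capability_mode") != 0);
#else
	return (false);	/* not FreeBSD: no Capsicum */
#endif
}
```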

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum?

This was done already in commit f239db4800ee9e7ff8485f96b7a68e6c38178c3b.

Ah! Thank you. I guess I need to update :)