Page MenuHomeFreeBSD

Refactor truss.

Authored by jhb on Sep 5 2015, 8:11 PM.



Several changes to truss.

  • Refactor the interface between the ABI-independent code and the ABI-specific backends. The backends now provide smaller hooks to fetch system call arguments and return values. The rest of the system call entry and exit handling that was previously duplicated among all the backends has been moved to one place.
  • Merge the loop when waiting for an event with the loop for handling stops. This also means not emulating a procfs-like interface on top of ptrace(). Instead, use a single event loop that fetches process events via waitid(). Among other things this allows us to report the full 32-bit exit value.
  • Use PT_FOLLOW_FORK to follow new child processes instead of forking a new truss process for each new child. This allows one truss process to monitor a tree of processes and truss -c should now display one total for the entire tree instead of separate summaries per process.
  • Use the recently added fields to ptrace_lwpinfo to determine the current system call number and argument count. The latter is especially useful and fixes a regression since the conversion from procfs. truss now generally prints the correct number of arguments for most system calls rather than printing extra arguments for any call not listed in the table in syscalls.c.
  • Actually check the new ABI when processes call exec. The comments claimed that this happened but it was not being done (perhaps this was another regression in the conversion to ptrace()). If the new ABI after exec is not supported, truss detaches from the process. If truss does not support the ABI for a newly executed process the process is killed before it returns from exec.
  • Along with the refactor, teach the various ABI-specific backends to fetch both return values, not just the first. Use this to properly report the full 64-bit return value from lseek(). In addition, the handler for "pipe" now pulls the pair of descriptors out of the return values (which is the true kernel system call interface) but displays them as an argument (which matches the interface exported by libc).
  • Each ABI handler adds entries to a linker set rather than requiring a statically defined table of handlers in main.c.
  • The arm and mips system call fetching code was changed to follow the same pattern as amd64 (and the in-kernel handler) of fetching register arguments first and then reading any remaining arguments from the stack. This should fix indirect system call arguments on at least arm.
  • The mipsn32 and n64 ABIs will now look for arguments in A4 through A7.
  • Use register %ebp for the 6th system call argument for Linux/i386 ABIs to match the in-kernel argument fetch code.
  • For powerpc binaries on a powerpc64 system, fetch the extra arguments on the stack as 32-bit values that are then copied into the 64-bit argument array instead of reading the 32-bit values directly into the 64-bit array.
Test Plan
  • Have tested truss -f on a fork test app on amd64.
  • Have run truss against a syscall exerciser test program on amd64 and i386.
  • Need to get more testing of other platforms before this can be committed.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jhb retitled this revision from to Refactor truss..Sep 5 2015, 8:11 PM
jhb updated this object.
jhb edited the test plan for this revision. (Show Details)
jhb added a reviewer: kib.
jhb updated this revision to Diff 8506.
kib edited edge metadata.Sep 7 2015, 10:29 AM

Overall, this looks fine.

116 ↗(On Diff #8506)

Shouldn't this assignment explicitely zero upper word of the retval[0,1] ? It probably typically comes zero from kernel, but I would not bet that this is guaranteed.

190 ↗(On Diff #8506)

SIGKILL is better name for 9, IMHO.

194 ↗(On Diff #8506)

'else' is not needed, and the next line can be un-indented.

188 ↗(On Diff #8506)

Add error checking there ?

237 ↗(On Diff #8506)

Do you need pid there ? I mean, lwpids are global.

356 ↗(On Diff #8506)

This is strange, you trust the built-in lists more that the kernel, which now reports the number of args which are used, not some number which is supposed to be used.

andrew added inline comments.Sep 7 2015, 10:54 AM
77–82 ↗(On Diff #8506)

This is only needed if you plan on MFCing this code to 10 or prior. The non-EABI support was removed from 11.

95 ↗(On Diff #8506)

This doesn't look correct, we should only be looking at r0-r3.

jhb marked 4 inline comments as done.Sep 8 2015, 10:55 PM
jhb edited edge metadata.
jhb updated this revision to Diff 8604.
  • Mask off lower 32-bits of return values in 32-bit ABIs on 64-bit hosts.
  • Fix wrong register arg count.
  • Prefer SIGKILL to magic number.
  • Remove an else.
116 ↗(On Diff #8506)

Done. I almost did this the first time. Note that the amd64/linux32 argument fetching does not mask off the upper 32-bits of the registers still, but the kernel doesn't do this either. (It probably should be doing that for correctness in both places though.)

77–82 ↗(On Diff #8506)

I will probably merge this to 10, yes.

jhb marked an inline comment as done.Sep 8 2015, 11:16 PM
jhb added inline comments.
237 ↗(On Diff #8604)

So truss does not try to be perfect here. In particular, it does not cull exited threads from its internal thread list, and it does not fetch the list of LWPs when attaching to an existing process. Instead, it just remembers an LWP the first time it appears. However, if an LWP exits and is reused for another process that is also being followed (due to fork following), then it might get confused. I instead opted to just leave the stale LWPs as-is and avoid having to be exact in LWP handling.

I suppose, though, that if an LWP is reused in another process, it's really the same as reusing in the same process. I assume that on reuse a new SCE event will arrive and the previous SCE for thr_exit() will just be lost (since there is never an SCX event to clean it up).

I have another set of changes to ptrace() to add optional event reporting when threads are created and exit that I plan to use in gdb instead of the libthread_db style of setting breakpoints in the threading libraries. With that in place I could be more exact in LWP handling.

OTOH, I still need to ensure that the 'proc' pointer for the thread is correct, so even if there were a global list of LWPs I would still need the 'pid' to set that.

356 ↗(On Diff #8604)

This is how the existing code worked. However, what matters is that 'nargs' after this point is really the number of strings to print (items in s_args). When there is no special formatting magic, each raw argument is identity mapped to a hex string in s_args. The formatting rules for a given system call are free to coalesce arguments (e.g. print two raw arguments as one logical value if that makes sense), or print arguments using another source (e.g. the hack to print the pipe fd's as an argument to match libc even though the kernel returns them via retval[]). You might also want to omit some arguments (e.g. the padding words around off_t arguments on some platforms).

jhb updated this revision to Diff 8606.Sep 8 2015, 11:16 PM
  • Exit if PT_FOLLOW_FORK fails.
jhb updated this revision to Diff 8609.Sep 8 2015, 11:34 PM
  • Split out some reporting functions to avoid excessive indentation.
  • Don't drop signals sent to child when signal reporting is disabled.
  • Don't report new child processes when only counting system calls (-c).
kib edited edge metadata.Sep 9 2015, 6:29 AM
kib accepted this revision.
kib added inline comments.
116–120 ↗(On Diff #8609)

Registers coming from the 32bit mode, must have upper word zeroed, according to SDM. But indeed, we do not enforce the kernel code to treat the upper half as zero.

This revision is now accepted and ready to land.Sep 9 2015, 6:29 AM

Generally looks fine to me.

89 ↗(On Diff #8609)

Dumb comment, but that could be done without a for loop, since it's really just choosing which register to use based on number of arguments. Not sure if using a break-less switch statement is worth that, though.

jhb added a comment.Sep 23 2015, 9:14 PM
  • kib tested this on arm
89 ↗(On Diff #8609)

It's a copy of the code from the kernel. It's not quite a simple switch statement in that you may not take the 0th case if this was an indirect system call. So it's not just the number of args, but also if you had to skip the first reg.

jhb edited edge metadata.Sep 25 2015, 6:45 PM
jhb updated this revision to Diff 8935.
  • Try to explain the trickery with nargs and args[] vs s_args[].
  • Try to make the argument handling less confusing and trim some malloc()s.
  • Drop extra arg from i386 and remove another malloc().
  • Drop syscall descriptions for zero args with a standard return type.
This revision now requires review to proceed.Sep 25 2015, 6:45 PM
kib added a comment.Sep 26 2015, 6:52 AM

Could you, please, explain the addition of the nargs argument to the fetch_args() methods ? I probably miss something, but it looks like nargs is always the same constant value.

jhb added a comment.Sep 28 2015, 2:42 PM

Previously, the 'narg' field in struct current_syscall was overloaded to mean two different things:

  1. Before fetch_syscall_args() was called, it was set to the number of valid argument words as returned by 'pl_syscall_narg' so that those callbacks know how many arguments to fetch.
  1. After fetch_syscall_args() it was set to the number of valid argument descriptions in the s_args[] array which can be a different value than the raw number of argument words for system calls with an entry in the table in syscalls.c.

I think it was confusing to have it mean two different things and to have it change its meaning during enter_syscall(), so I reverted back to what the old truss did where it passes the raw number of arguments to the fetch_syscall_args() and only uses 'narg' in the structure for 2). Note that it is not a constant as it honors pl_syscall_narg, but it is capped at 10 so that I can statically allocate the various arg[] and s_arg[] arrays to avoid some malloc() calls on each trace.

kib edited edge metadata.Sep 29 2015, 5:55 AM
kib accepted this revision.

I see. IMO it would be also not much confusing if the comment for nargs is added about fetch_args() capping the value during the call, but I am completely fine with the latest patch. Thank you for the explanation.

This revision is now accepted and ready to land.Sep 29 2015, 5:55 AM
jhb edited edge metadata.Sep 29 2015, 5:55 PM
jhb updated this revision to Diff 8970.
  • Compile fixes on powerpc64.
This revision now requires review to proceed.Sep 29 2015, 5:55 PM
jhb added a comment.Sep 29 2015, 5:56 PM

Tested a simple fork test with -f as well as my system call exerciser for both 64-bit and 32-bit ppc binaries on powerpc64 under qemu.

jhb edited edge metadata.Sep 30 2015, 3:51 AM
jhb updated this revision to Diff 8975.
  • Merge branch 'master' into truss_refactor
jhb updated this revision to Diff 8982.Sep 30 2015, 6:37 PM
  • Construct quad explicitly on 32-bit to avoid unaligned access.
  • Various warning fixes, mostly from u_int narg.
jhb updated this revision to Diff 8983.Sep 30 2015, 7:08 PM
  • Tidy.
  • Don't return garbage for an unsupported ABI.
  • During an exec stop, just detach for a bad ABI.
  • This does actually work the way we want.
This revision was automatically updated to reflect the committed changes.