Page MenuHomeFreeBSD

accounting: explicitly mark the exiting thread as doing accounting or ktrace record writing
ClosedPublic

Authored by kib on May 13 2021, 11:57 PM.

Details

Summary

Use the mark to stop applying file size limits on the write of the accounting record. This allows to remove hack to clear process limits in acct_process(), and avoids the bug with the clearing being ineffective because limits are also cached in the thread structure.

Use proper file size limit from the ktrace requester for ktrace record writes.

Reported by: markj

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

kib requested review of this revision.May 13 2021, 11:57 PM
kib created this revision.
This revision is now accepted and ready to land.May 14 2021, 1:35 AM

I disagree with this method. There are other cases where rlimits gets in the way of correct operation, for example ktrace: rlimit of the target is used when it should not be to limit the output of the kernel log (instead, if any limit is to be there, it should be taken from the process which instigated ktracing). I suspect the right way to fix this is to swap around the rlimit around ktrace calls and same technique can be used here, similarly to how the original patch tried to do it.

In D30257#679628, @mjg wrote:

I disagree with this method. There are other cases where rlimits gets in the way of correct operation, for example ktrace: rlimit of the target is used when it should not be to limit the output of the kernel log (instead, if any limit is to be there, it should be taken from the process which instigated ktracing). I suspect the right way to fix this is to swap around the rlimit around ktrace calls and same technique can be used here, similarly to how the original patch tried to do it.

I disagree with your objections. It is not correct to swap neither credentials nor limits for the thread. VOP operations, all some parallel actors, might examine and use them for something not indended. [I am aware that NFS client sometimes swaps credentials]

There, for accounting, the solution is exact, RLIMIT_FSIZE should not be used for accounting write. Hack with resetting thread limit to not effective is a hack. It is probably less important for exiting process. Swapping limits for ktrace records writes is not acceptable, I think.

In fact, this approach, with explicit mark of the regions where vn_rlimit_fsize() should make special checks instead of the check against current thread' limits, works quite well for the ktrace writes too. I decided to refactor code to put the ktrace io related data into single structure, it is more convenient this way.

kib retitled this revision from accounting: explicitly mark the exiting thread as doing accounting to accounting: explicitly mark the exiting thread as doing accounting or ktrace record writing.
kib edited the summary of this revision. (Show Details)

Add ktrace limit handling.
Refactor process ktrace io data, put it into dedicated structure.

This revision now requires review to proceed.May 15 2021, 1:04 AM
sys/kern/kern_descrip.c
82–83

Sort?

sys/kern/kern_ktrace.c
440

ktr_io_params_rele would be a more standard name.

468

These two lines can be res->cr = crhold(td->td_ucred).

489–490

I think it is not guaranteed in all cases that p->p_ktrioparams is not NULL. For instance, ktrace(KTROP_CLEAR) on a process which did not have tracing enabled before.

505
1043–1044

Need to check p->p_ktrioparms != NULL first.

sys/kern/vfs_vnops.c
2367

I would use local variables for both sides of the <= comparison. This is painful to read otherwise.

sys/kern/kern_ktrace.c
1126

Shouldn't the mutex be released first?

kib marked 8 inline comments as done.May 18 2021, 12:52 AM
kib added inline comments.
sys/kern/kern_descrip.c
82–83

Unfortunately no. ktrace.h is not self contained, the trace structures basically gather a lot of stuff from misc. syscalls declarations. And if some things like enum uio_rw can be handled by inclusion on sys/_uio.h, others e.g. sig_t cannot be handled reasonable except inclusion of signalvar.h before ktrace.h. So this header is outlier.

sys/kern/kern_ktrace.c
489–490

Yes, _unref/_rele should handle NULL arg, same as ktr_io_params_free(). I missed this.

kib marked 2 inline comments as done.

_unref->_rele
fix unlocking/free order bug
Check for NULL, also s/0/NULL

sys/kern/vfs_vnops.c
2373

So if a tracing process has a limit imposed, and tracing exceeds the limit, we will send a default-fatal signal to the traced process. This is not good behaviour for a debugging tool IMO. At this point we should instead disable tracing and ideally log some record to indicate what happened.

sys/kern/vfs_vnops.c
2373

This is one approach. But the semantic of SIGXFSZ was defined long time ago. You might equally complain that we kill all debuggees on debugger termination, but this is how FreeBSD kernel behaves.

I can easily change this place to not send SIGXFSZ, then ktr_writerequest() would automatically stop tracing anyway, but I am not sure that this is right, even if not desirable.

sys/kern/vfs_vnops.c
2373

Well, that behaviour of ptrace() is quite annoying, in many cases I believe it is unnecessary.

If anything, SIGXFSZ should be sent to the tracing process, not the tracee.

kib marked an inline comment as done.May 18 2021, 4:29 PM
kib added inline comments.
sys/kern/vfs_vnops.c
2373

There is no tracing process.

Ok. I will add a control, default off, to allow/disallow the signal there.

kib marked an inline comment as done.

Add knobs to not send SIGXFSZ on ktrace, and not kill debuggee on debugger exit (not tested).

Per-commit view is available at https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/file_limit

sys/kern/kern_ktrace.c
1030

This routine bumps the usecount on the vnode, but isn't it already incremented for us?

kib marked an inline comment as done.May 19 2021, 4:25 AM
kib added inline comments.
sys/kern/kern_ktrace.c
1030

In fact the issue is that vn_close() was missed.

kib marked an inline comment as done.

Drop ptrace changes, I will finish them separately.
Remove excess vref(), and do not forget vn_close() on the tracevp.
Use vrefact() in ktr_writerecord().
Initialize ref count for new ktr_io_params structure to 1, remove increment that is always done.

markj added inline comments.
sys/kern/kern_ktrace.c
152
1329

Actually I wonder if TDP2_KTRWRITE is really needed, it is guaranteed that TDP_INKTRACE is set here if the write is coming from ktr_writerequest().

This revision is now accepted and ready to land.May 19 2021, 1:57 PM
kib marked 2 inline comments as done.May 19 2021, 2:32 PM
kib added inline comments.
sys/kern/kern_ktrace.c
1329

I think, concern there would be for any operation done under ktrace_enter() and resulting in fs write. It seems that it is not possible to do this right now, from code inspection.

kib marked an inline comment as done.

Use TDP_INKTRACE

This revision now requires review to proceed.May 19 2021, 2:40 PM
This revision is now accepted and ready to land.May 19 2021, 2:41 PM