Page MenuHomeFreeBSD

Stop posix realtime timer and kqueue EVFILT_TIMER timers around process stop and kill
ClosedPublic

Authored by kib on Mar 6 2021, 3:25 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Mar 8, 5:50 PM
Unknown Object (File)
Sat, Mar 2, 4:39 PM
Unknown Object (File)
Feb 19 2024, 10:09 PM
Unknown Object (File)
Feb 16 2024, 5:36 PM
Unknown Object (File)
Jan 27 2024, 4:02 AM
Unknown Object (File)
Jan 27 2024, 4:02 AM
Unknown Object (File)
Jan 27 2024, 4:02 AM
Unknown Object (File)
Jan 19 2024, 6:10 PM
Subscribers

Details

Summary

This way, even if the process specified very tight reschedule intervals, it should be stoppable/killable.

This patch requires at least one improvement. I calculate the count of skipped events for kq timers, but not for posix for timer_getoverrun(). I suspect that this can benefit from your other change, than can be refactored to be used there as well.

Per-commit view is available at https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/timers

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Mar 6 2021, 3:25 AM
kib created this revision.

I find it strange to disable periodic timers when the process is stopped. Applications may require information about the passage of time even if they're stopped. There is even an example in the Linux timerfd man page which demonstrates that their periodic timer continues firing when the owning process is stopped. Is the intent to ensure that a debugger can be attached?

Is Linux timerfd passable? If yes, this is different from kqueue.

In D29106#652008, @kib wrote:

Is Linux timerfd passable? If yes, this is different from kqueue.

I can't find an answer without looking at the sources. But consider an application with an event loop that uses EVFILT_TIMER to periodically fetch some statistics. Such a program could easily detect and normalize missing events due to SIGSTOP, but now will behave incorrectly. At least this should be documented, but IMO the new behaviour is surprising and will probably break some applications.

In D29106#652008, @kib wrote:

Is Linux timerfd passable? If yes, this is different from kqueue.

I can't find an answer without looking at the sources. But consider an application with an event loop that uses EVFILT_TIMER to periodically fetch some statistics. Such a program could easily detect and normalize missing events due to SIGSTOP, but now will behave incorrectly. At least this should be documented, but IMO the new behaviour is surprising and will probably break some applications.

What do you mean by 'behaving incorrectly'? How would it detect that events are missed due to SIGSTOP? Note that I increment kn_data by the number of the schedule intervals between prev and current moments, which is exactly the number of missed knote_activate() calls.

https://gist.github.com/12332ccd0876d9d95264e6e434549585
https://gist.github.com/ac6cba0501246a1c96ba4420a690dc3c
are the test programs I used

Start the program, then ^Z it, then continue after some time.

In D29106#652012, @kib wrote:
In D29106#652008, @kib wrote:

Is Linux timerfd passable? If yes, this is different from kqueue.

I can't find an answer without looking at the sources. But consider an application with an event loop that uses EVFILT_TIMER to periodically fetch some statistics. Such a program could easily detect and normalize missing events due to SIGSTOP, but now will behave incorrectly. At least this should be documented, but IMO the new behaviour is surprising and will probably break some applications.

What do you mean by 'behaving incorrectly'? How would it detect that events are missed due to SIGSTOP? Note that I increment kn_data by the number of the schedule intervals between prev and current moments, which is exactly the number of missed knote_activate() calls.

Sorry, I misunderstood part of the patch. If kn_data is updated as you say then my comment does not apply.

sys/kern/kern_event.c
695

I would use "continue" instead of the past tense "continued".

704

Why not sbinuptime()?

749

Consider using a local variable for kc->p.

sys/kern/kern_time.c
908

This will use it_interval to set the precision, but suppose it_interval is not set, and the process was stopped and started again before the timeout expires. Then we will use a zero-valued it_interval to calculate the precision for the callout. Really I think we should not be scheduling a callout at all if one is still pending at this point.

kib marked 4 inline comments as done.Mar 8 2021, 11:39 PM
kib added inline comments.
sys/kern/kern_event.c
704

sbinuptime() returns relative time since boot, while getboottimebin() provides absolute time. Timers are set in abs time, and I need it there.

sys/kern/kern_time.c
908

If it_interval is not set, then P2_ITSTOPPED is not set, and itimer_proc_continue() is not called.

I only stop periodic (rescheduling) timers, one-shot timers are left alone for stopped process. And for periodic, I do not cancel scheduled callout, I let it execute and observe the process state, then decide if it needs to reschedule.

Cancellation is simply not reliable, esp. while network people do not even consider other users and make incompatible KBI changes without handling all callers.

kib marked 2 inline comments as done.

continued->continue
use local p for kc->p

I still owe the calculation of itimer overrun when stopped, but lets agree on the approach first.

Also stop posix realtime timers

I'm sorry for the delay.

In D29106#652487, @kib wrote:

I still owe the calculation of itimer overrun when stopped, but lets agree on the approach first.

The overall approach? I reread the patch again and I'm ok with it.

sys/kern/kern_time.c
913

I think we need to verify that p->p_itimers != NULL first.

kib marked an inline comment as done.Apr 7 2021, 8:59 PM

I'm sorry for the delay.

In D29106#652487, @kib wrote:

I still owe the calculation of itimer overrun when stopped, but lets agree on the approach first.

The overall approach? I reread the patch again and I'm ok with it.

I believe that I handled the overrun issue, or rather, it is naturally handled by the code structure.

Check p->p_itimers for NULL

This revision is now accepted and ready to land.Apr 9 2021, 12:22 AM