Page MenuHomeFreeBSD

Reimplement dtrace_gethrtime() on arm64 to use the cycle counter.
ClosedPublic

Authored by rwatson on Feb 17 2021, 2:20 AM.

Details

Summary

Reimplement the arm64 dtrace_gethrtime(), which provides the
high-resolution nanosecond timestamp used for the DTrace
'timestamp' built-in variable. The new implementation uses
the EL0 cycle counter and frequency registers in ARMv8-A. This
replaces a previous implementation that relied on an
instrumentation-safe implementation of getnanotime(), which
provided only timer resolution.

MFC after: 3 days

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Seems legit, so long as freq is returned in GHz

This revision is now accepted and ready to land.Feb 17 2021, 6:28 PM
In D28723#643374, @imp wrote:

Seems legit, so long as freq is returned in GHz

freq is Hz, the 1000000000UL is to turn the s into ns.

Is dtrace_gethrtime run while pinned to a single CPU? My readding of the spec is that it should be identical over all cores as it's based on the system counter, although it this is not the case it could lead to interesting results.

sys/cddl/dev/dtrace/aarch64/dtrace_subr.c
165

This is a generic counter, it may not be a cycle count in all implementations.

In D28723#643374, @imp wrote:

Seems legit, so long as freq is returned in GHz

freq is Hz, the 1000000000UL is to turn the s into ns.

It took me longer than desirable to convince myself I had the unit conversions right .. and also that this ordering minimised the chances of precision loss. It wouldn’t surprise me if I had at least the latter not quite right.

Is dtrace_gethrtime run while pinned to a single CPU? My readding of the spec is that it should be identical over all cores as it's based on the system counter, although it this is not the case it could lead to interesting results.

Certainly in the DTrace probe context, the thread is pinned and has interrupts disabled. However, I will check and see if it’s possibly called from non-probe contexts. However, my reading was that this counter should be OK to use in this way. I also confirmed on the RPi4 that the rate is invariant despite CPU clock frequency changes, across multi core.

Is dtrace_gethrtime run while pinned to a single CPU? My readding of the spec is that it should be identical over all cores as it's based on the system counter, although it this is not the case it could lead to interesting results.

Certainly in the DTrace probe context, the thread is pinned and has interrupts disabled. However, I will check and see if it’s possibly called from non-probe contexts. However, my reading was that this counter should be OK to use in this way. I also confirmed on the RPi4 that the rate is invariant despite CPU clock frequency changes, across multi core.

I’ve confirmed that dtrace_gethrtime() is used outside the probe context. In fact, I wonder if the dtrace_getnanouptime() implementation having 1/hz resolution is the reason we’ve been seeing occasionally Dtrace buffer rotation hangs FreeBSD/arm64. In any case, it is not safe to assume that dtrace_gethrtime() is pinned or has interrupts disabled. My reading is that this use of the system timer is OK due to the broad and useful scope of the word ‘system’. But adding sched_pin()/sched_unpin() would certainly be easy if I’m wrong.

I'm happy with this, although the cyclecount name is a bit confusing as it's not related to the CPU clock so isn't CPU cycles.

sys/cddl/dev/dtrace/aarch64/dtrace_subr.c
165

Maybe just count?

  • Rename local variable as it holds the systen clock count rather than the cycle counter.
This revision now requires review to proceed.Feb 18 2021, 10:52 PM
This revision is now accepted and ready to land.Feb 19 2021, 3:31 PM