Instead of systematically disabling the TSC as a timecounter when running in a VM, introduce a new option that overrides the default behavior and still allows the TSC to be used.

The main reason for systematically disallowing the TSC as a timecounter in a VM was to allow VM migration, but in many contexts this is not a constraint. Some systems rely on an efficient, fine-grained timecounter even when running as a VM. The timecounter choice can already be overridden after boot, but allowing it from kernel boot also makes it possible to test the full boot procedure in a VM with the TSC timecounter. This can help diagnose timecounter-related issues that occur during boot.

Tested: ran FreeBSD natively and under a VM, and checked TSC selection and behavior.
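To make the proposed rule concrete, here is a simplified user-space model in C, written in style(9). It assumes, as we understand the current `test_tsc()` logic in the x86 TSC driver, that a VM guest gets a negative timecounter quality, and that the kernel never auto-selects a negative-quality timecounter (it can still be chosen explicitly via `kern.timecounter.hardware`). `tsc_quality()`, `pick_timecounter()`, `enum guest_kind` and the quality constants are hypothetical names, not kernel symbols, and the frequency tie-break used by the real selection code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not kernel code.  GUEST_NONE plays the
 * role of "not running as a VM guest".
 */
enum guest_kind { GUEST_NONE = 0, GUEST_KVM, GUEST_BHYVE };

#define	TSC_QUALITY_REJECT	(-100)	/* negative: never auto-selected */
#define	TSC_QUALITY_GOOD	1000	/* illustrative "good" quality */

struct model_tc {
	const char	*name;
	int		 quality;
};

/* Quality the TSC would get under the proposed rule (assumed logic). */
static int
tsc_quality(enum guest_kind guest, bool tsc_usable, int tsc_vm_enable)
{

	if (!tsc_usable)
		return (TSC_QUALITY_REJECT);
	/* Today every VM guest is rejected; the new knob lifts that. */
	if (guest != GUEST_NONE && tsc_vm_enable == 0)
		return (TSC_QUALITY_REJECT);
	return (TSC_QUALITY_GOOD);
}

/*
 * Pick the counter with the highest non-negative quality.  Counters with
 * a negative quality are skipped, as they are only used on request.
 */
static const struct model_tc *
pick_timecounter(const struct model_tc *tcs, size_t n)
{
	const struct model_tc *best;
	size_t i;

	best = NULL;
	for (i = 0; i < n; i++) {
		assert(tcs[i].name != NULL);
		if (tcs[i].quality < 0)
			continue;
		if (best == NULL || tcs[i].quality > best->quality)
			best = &tcs[i];
	}
	return (best);
}
```

For example, in a bhyve guest with the knob left at 0 the model falls back to another counter (say one with quality 950), while with the knob set to 1 the TSC outranks it and is selected from boot onward.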
Tested with the kernel running on bare metal and in a VM.
I'm curious which hypervisor you tested this under. Did you consider using one of the virtualized clocks, if one was available? I know there are some early patches to add kvmclock support to bhyve, for example.
For FreeBSD we probably want to leave the default setting as-is by setting this to zero. I would perhaps rename the knob/variable as well to be something like 'tsc_vm_enable' and 'hw.tsc.vm_enable'. Finally, I would maybe make this a CTLFLAG_RDTUN sysctl and call it 'machdep.tsc_vm_enable'. Then it is documented via sysctl -d and you can inspect the current value on existing systems, etc. The reason for machdep is that the other existing TSC sysctl is 'machdep.tsc_freq'.
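The `CTLFLAG_RDTUN` suggestion means the value is fetched once from the kernel environment at boot (for example from `loader.conf`) and is read-only through sysctl afterwards, while still being listed with its description by `sysctl -d`. The sketch below models only those semantics in user space. `struct rdtun_int`, `rdtun_fetch()` and `rdtun_sysctl_write()` are hypothetical helpers, not the kernel's `SYSCTL_INT`/`TUNABLE_INT` machinery, and returning `EPERM` is our assumption of what a write to a read-only node reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of a CTLFLAG_RDTUN integer knob such as the proposed
 * machdep.tsc_vm_enable: seeded once at boot, read-only afterwards.
 */
struct rdtun_int {
	const char	*name;		/* e.g. "machdep.tsc_vm_enable" */
	int		 value;		/* compiled-in default, 0 here */
	const char	*descr;		/* what `sysctl -d` would print */
};

/* Seed from a kenv-style "name=value" string; other names are ignored. */
static void
rdtun_fetch(struct rdtun_int *k, const char *line)
{
	size_t len;
	char *end;
	long v;

	assert(k != NULL && line != NULL);
	len = strlen(k->name);
	if (strncmp(line, k->name, len) != 0 || line[len] != '=')
		return;
	v = strtol(line + len + 1, &end, 10);
	/* Only accept a well-formed integer; otherwise keep the default. */
	if (end != line + len + 1 && *end == '\0')
		k->value = (int)v;
}

/* A runtime write through sysctl is refused; the value is unchanged. */
static int
rdtun_sysctl_write(struct rdtun_int *k, int newval)
{

	(void)k;
	(void)newval;
	return (EPERM);
}
```

For example, booting with `machdep.tsc_vm_enable=1` in the environment leaves the knob at 1, and a later attempt to clear it at runtime fails without changing the value, which is why the current setting stays inspectable but stable on a running system.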
I ran experiments with Linux/QEMU/KVM and FreeBSD/bhyve as hypervisors and saw the same behavior with both. kvmclock could solve the issue in the long term, but it seems some additional work would be needed in both bhyve and the FreeBSD kernel (as a guest).
I have updated the patch. If you have a pointer to the kvmclock patches, I'd be glad to try them.
Tiny style nits can be fixed when this is committed.
I would move the tsc_vm_enable declaration here and just leave off the ' = 0' (see the tsc_skip_calibration node above for an example of this style).
style(9) would want a single tab indent, a blank line before the indent, and parentheses around the return value.