Page MenuHomeFreeBSD

vmtotal: extend counters to match reality
ClosedPublic

Authored by kaktus on Nov 9 2017, 2:50 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 22, 3:48 PM
Unknown Object (File)
Mon, Jan 13, 8:22 PM
Unknown Object (File)
Dec 28 2024, 8:15 AM
Unknown Object (File)
Dec 7 2024, 11:41 AM
Unknown Object (File)
Nov 28 2024, 1:59 AM
Unknown Object (File)
Nov 23 2024, 11:49 PM
Unknown Object (File)
Nov 19 2024, 12:39 AM
Unknown Object (File)
Nov 11 2024, 9:21 AM
Subscribers

Details

Summary

struct vmtotal is used by tools like vmstat, systat -vmstat, etc do display basic statistics about memory and run queue.
Currently if one allocate huge amount of virtual memory, the 32 bit counters will overflow causing wrong values to be reported, often as negative numbers.

This patch extends the size of counters to mach the one used to obtain original values. I've separate patches to fix overflows in vmstat and systat but this is the prerequisite.

I tested the compat version with systat and vmstat from 11.1, worked correctly - there was still overflow and the values were displayed using old structure layout.

The only problem I found is with /sbin/sysctl that provides function printing pretty table if vm.vmtotal is queried. In case of older binary it reports vm.vmtotal: sysctl: S_vmtotal 72 != 48 because the size of the structure changed and it checks the size before querying so the compat part isn't used in this case. I'll need some help here how to overcome this is if the sysctl has to be kept accessible using older version of the binary.

This partially fixes PR2137, the rest is related to how systat display large numbers and will be provided in a separate review.

Test Plan
  1. Allocate huge amount of virtual memory.
  2. Run vmstat, vmstat -H, systat -vmstat, sysctl vm.vmtotal.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

You might try to use p_osrel to detect releng 11 binary. It is not too good, but perhaps allows to make old sysctl binaries compatible.

sys/sys/vmmeter.h
51 ↗(On Diff #35001)

If you break the ABI, I do not see why not expand _all_ members counting pages into uint64_t. 4G pages is just 4T RAM, which is already quite accessible configuration for top boxes.

From what I see, sysctl(8) first calls sysctl(3) to get the size of vmtotal. As defined in

SYSCTL_PROC(_vm, VM_TOTAL, vmtotal, CTLTYPE_OPAQUE|CTLFLAG_RD|CTLFLAG_MPSAFE,
    0, sizeof(struct vmtotal), vmtotal, "S,vmtotal", 
    "System virtual memory statistics");

it'll return the size of the current vmtotal. It later compares it with the compile time size of struct vmtotal and report mismatch.

I don't see how I can fix this with p_osrel :-(

sys/sys/vmmeter.h
51 ↗(On Diff #35001)

Will do.

In D13018#270459, @pawel.biernacki-gmail.com wrote:

From what I see, sysctl(8) first calls sysctl(3) to get the size of vmtotal. As defined in

SYSCTL_PROC(_vm, VM_TOTAL, vmtotal, CTLTYPE_OPAQUE|CTLFLAG_RD|CTLFLAG_MPSAFE,
    0, sizeof(struct vmtotal), vmtotal, "S,vmtotal", 
    "System virtual memory statistics");

it'll return the size of the current vmtotal. It later compares it with the compile time size of struct vmtotal and report mismatch.

I don't see how I can fix this with p_osrel :-(

Look at kern/vfs_bio.c sysctl_bufspace which tries to handle different sizes of the output buffer. Most likely you would need to set sizeof() in the SYSCTL_PROC to zero, and do some reading of the sysctl code and experimentation. In the worst case, you can add some flag to sysctl node declartion to specify that size should be handled by the proc (it might be not needed, I just do not know the sysctl code to say without reading it).

  • Extend all counters to uint64_t.
  • Introduce new P_OSREL_VMTOTAL64 define to mark extension of the struct vmtotal.
  • Provide sysctl(8) compatibility.
  • Bump FreeBSD_version.
sys/vm/vm_meter.c
286 ↗(On Diff #35212)

Should I account for sizeof(vmtotal) == 2*sizeof(vmtotal11) here and enclose it in p->p_osrel check?

sys/sys/vmmeter.h
51 ↗(On Diff #35001)

Is there any point in also reordering these to be first?

sys/vm/vm_meter.c
187 ↗(On Diff #35212)

Why do you need the assignment to p ? It is only used once.

Case of req->oldptr == NULL makes the useful optimization for non-compat case as well: there is no need to iterate over all processes just to return sizeof(struct vmtotal).

285 ↗(On Diff #35212)

repoRted

289 ↗(On Diff #35212)

Unneeded blank line.

304 ↗(On Diff #35212)

Again unneeded blank line.

315 ↗(On Diff #35212)

Why do you want to keep this case declared as before ? I do not see the point.

Address @kib comments:

  • Use curproc directly.
  • Add fast path for req->oldptr == NULL for non compat cases.
  • Fix spelling.
  • Remove extra blank lines.

Patch looks good now, please fix remaining style issues.

sys/vm/vm_meter.c
287 ↗(On Diff #35230)

No need for () around 2 * sizeof.

310 ↗(On Diff #35230)

Look at the old style of the SYSCTL_PROC() split: put the description on the new line.

Also, while there, fix style on the previous line, put spaces around binary '|' operator.

Address the style(9) comments. Also there is no need to call sysctl_handle_opaque in the last return as all we do now is returning fully populated struct. SYSCTL_OUT is enough.

Uhm, I believe there is one more serious issue with the patch, which I missed initially.

Can you try the following: compile i386 world and take the sysctl(8) binary from there, which uses new definition of the struct vmtotal. Try to run it on the new amd64 kernel with COMPAT32 option. My expectation is that you would see the garbage in all 64bit counters.

The problem which I envision is that the uint64_t type has 8-bytes alignment on amd64, but only 4-bytes on i386. As result, despite the fact that the vmtotal structure definition only uses fixed-sized types, the 32- and 64-bit variants of it are ABI-incompatible.

In D13018#272087, @kib wrote:

The problem which I envision is that the uint64_t type has 8-bytes alignment on amd64, but only 4-bytes on i386.

Ah, indeed; sorting them (as in my idle query above) would resolve that.

Reordering didn't really helped. There is 5 16-bit fields that seems to be aligned on amd64 leaving 3x16bit hole which is only 1x16bit on i386.
Any ideas how to fix it? Enforce struct alignment to 8 bytes? Add an extra 32bit field as a spare?

I would put the 64-bit fields first, and make the whole struct 8-byte aligned, but I'll defer to any suggestion kib might offer.

I would put the 64-bit fields first, and make the whole struct 8-byte aligned, but I'll defer to any suggestion kib might offer.

There is also the padding at the end, so sizeof() of the structures would be different even if 16bit fields are moved to the end.

I agree that explicitly adding the padding of 3*uint16_t seems to be the easiest way out. Does not matter if they are added in between current members, or 16bits are moved at the end and then padding still added there.

Fix the alignment issue by introducing spare filed of 3x16 bit ints.

Tested on amd64 with sysctl vm.vmtotal, vmstat (with and without -H), systat -vmstat using: native amd64 bins, i386 bins and 11.1/amd64 bins.

This revision is now accepted and ready to land.Nov 15 2017, 1:40 PM
This revision was automatically updated to reflect the committed changes.