Page MenuHomeFreeBSD

Augment systat(1) -swap to display large swap space processes
ClosedPublic

Authored by ota_j.email.ne.jp on Apr 13 2021, 9:41 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Dec 19, 7:09 AM
Unknown Object (File)
Tue, Dec 17, 7:50 AM
Unknown Object (File)
Wed, Dec 11, 9:43 PM
Unknown Object (File)
Mon, Dec 9, 5:35 PM
Unknown Object (File)
Tue, Dec 3, 2:37 PM
Unknown Object (File)
Sat, Nov 23, 3:33 PM
Unknown Object (File)
Nov 19 2024, 4:20 AM
Unknown Object (File)
Oct 29 2024, 9:54 PM

Details

Summary

This change updates the systat(1) -swap display to use libprocstat to obtain and display per-process swap space usage infomation following its existing swap devise/file statistics. It also incorporates the disk I/O information from the -vmstat display.

The new screen looks like below with 'systat -swap':

/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average |

Device/Path Size Used |0% /10 /20 /30 /40 / 60\ 70\ 80\ 90\ 100|
ada0s1b 2048M 2034M XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
zvol/sys/tempora 1024M 1015M XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
zvol/sys/swap 1024M 1014M XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Total 4096M 4063M XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Pid Username Command Swap/Total Per-Process Per-System
24153 hiro seamonkey 98M / 1G 7% 2%
23677 hiro xfce4-pane 28M / 81M 34% XXX 0%
23629 hiro xfce4-sess 25M / 118M 21% XX 0%
23681 hiro xfdesktop 20M / 58M 34% XXX 0%
23678 hiro thunar 15M / 43M 36% XXX 0%
23658 hiro at-spi-bus 14M / 23M 63% XXXXXX 0%
23660 hiro gvfsd 12M / 21M 56% XXXXX 0%

Disks ada0 ada1 ada2 cd0 pass0 pass1 pass2 pass3
KB/t 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
tps 0 0 0 0 1 0 0 0
MB/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
%busy 0 0 0 0 0 0 0 0

Test Plan

This patch has been tested by the submitter, Yoshihiro Ota.

Further feedback is welcomed from the reviewers.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks for the program. Indeed, I saw lots of duplicates from the java program.

usr.bin/systat/proc.c
204 ↗(On Diff #88635)

After adding pid check, the numbers look more reasonable - e.g. no process with more swap than its allocated.

140 ↗(On Diff #88581)

This tracks the max realloc size.
I renamed it to maxnobj in the latest version.

usr.bin/systat/systat.h
93–104 ↗(On Diff #88635)

I moved these from devs.h.
I guess we don't want to include these in the review or commit.

I mostly looked at style.

usr.bin/systat/proc.c
4 ↗(On Diff #88635)

Add email?

28 ↗(On Diff #88635)

$FreeBSD$ in copyright was only used for header.

After git conversion, it does not make much sense to add the id to new files, I suggest to remove __FBSDID() below as well.

69 ↗(On Diff #88635)

COMMLEN + 1

125 ↗(On Diff #88635)

return (0);

126 ↗(On Diff #88635)

return (*aa > bb->kvo_me ? -1: 1);

134 ↗(On Diff #88635)

contig line should be indented +4 spaces
return ();

171 ↗(On Diff #88635)

BTW why realloc(3) and not reallocf(3)?

183 ↗(On Diff #88635)

... number of swap pages ...

184 ↗(On Diff #88635)

Newline after uint32_t

214 ↗(On Diff #88635)

return (pages);

245 ↗(On Diff #88635)

space before '('

311 ↗(On Diff #88635)

const struct proc_usage *aa = *((const struct proc_usage **)a);

314 ↗(On Diff #88635)

return (aa->pages > bb->pages ? -1: 1);

usr.bin/systat/swap.c
6 ↗(On Diff #88635)

I do not think that this changes deserve a copyright line, at least according to our typical policy of 20% changes.

usr.bin/systat/proc.c
5 ↗(On Diff #88635)

Also by convention we've been dropping All rights reserved. as it is not necessary, feel free to take it out absent a reason you want it there (as an example, a company contribution guideline might specify it).

ota_j.email.ne.jp marked an inline comment as done.

Style fixes based on review comments and recstored Kirk's systat.1 updates.

ota_j.email.ne.jp added inline comments.
usr.bin/systat/proc.c
28 ↗(On Diff #88635)

Do I keep sys/cdfs.h here?

171 ↗(On Diff #88635)

I've never known reallocf().
It calls die() which calls exit() if allocation fails, anyway.

usr.bin/systat/swap.c
6 ↗(On Diff #88635)

Is it 20% changes per commit or over a year?
I'm just curious.

usr.bin/systat/systat.h
36 ↗(On Diff #88722)

I think I can drop devstat.h as well from the changeset.

ota_j.email.ne.jp marked 3 inline comments as done.

Remove #include <devstat.h>

usr.bin/systat/proc.c
121 ↗(On Diff #88723)

space before :

243 ↗(On Diff #88723)

I do not quite follow. Assuming there is no sysctl errors, on the first call you return 0, and on the second and consequent calls, 1. Is this intended?

Also, why not follow consistent style, and do just return (prstat != NULL); (or whatever, ==) in the first statement in the function?

277 ↗(On Diff #88723)

Excess ()

310 ↗(On Diff #88723)

space before :

28 ↗(On Diff #88635)

No need to keep it. sys/param.h includes sys/types.h which already includes sys/cdefs.h

usr.bin/systat/swap.c
182 ↗(On Diff #88723)

I would wrote this as count = kvnsw == 1 ? 2 : 3;

184 ↗(On Diff #88723)

Excess (), two times

6 ↗(On Diff #88635)

Total. I did not looked at the blame output.

If you are the author of more than 1/5 of this file, then it is fine, of course.

usr.bin/systat/sysput.c
49 ↗(On Diff #88723)

This does not look like a tab before return.

But, can you use e.g. atop() and ptoa() instead of hand-rolled functions?

124 ↗(On Diff #88723)

Change function definition to ANSI.
BTW as I see it is not needed if you start using ptoa()

usr.bin/systat/systat.1
297 ↗(On Diff #88723)

..., the total use of swap space in bytes, ...

usr.bin/systat/systat.h
36 ↗(On Diff #88723)

Style, sys/ includes before userspace includes.

81 ↗(On Diff #88723)

I am not sure what meaning does this comment provide.

Thank you for reviews.

I happened to extract ports.tar.xz onto tmpfs and that caused long CPU spike and high memory usage.
systat took nearly 700MB.
I suppose it was procstat_getvmmap() took long time.
"top" wasn't updating its display, neither.

I will mention it in the BUGS section for a warning.

usr.bin/systat/proc.c
243 ↗(On Diff #88723)

I had one wrong.
swapinit() returns 1 for success.

Thank you for reviews.

I happened to extract ports.tar.xz onto tmpfs and that caused long CPU spike and high memory usage.
systat took nearly 700MB.
I suppose it was procstat_getvmmap() took long time.
"top" wasn't updating its display, neither.

I will mention it in the BUGS section for a warning.

Most likely it was sysctl vm.objects and not per-proc vmmap sysctl that caused the intermediate hang and CPU usage. As I and Mark discussed it earlier, some more fine-grained API to retrieve e.g. just the shadow chain for given object handle probably would help there.

But before I implement this sysctl, can you recheck, please, that the following patch solves the transient hang issue?

 commit fd2da9e1515bb50fb1c8fb747d9987ee9b8a314d
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat May 8 01:13:29 2021 +0300

    sysctl vm.objects: yield if hog

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 1c4e879d82ea..b1fa3ebdf9ab 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2614,6 +2614,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 		kvo->kvo_structsize = roundup(kvo->kvo_structsize,
 		    sizeof(uint64_t));
 		error = SYSCTL_OUT(req, kvo, kvo->kvo_structsize);
+		maybe_yield();
 		mtx_lock(&vm_object_list_mtx);
 		if (error)
 			break;

Most likely it was sysctl vm.objects and not per-proc vmmap sysctl that caused the intermediate hang and CPU usage. As I and Mark discussed it earlier, some more fine-grained API to retrieve e.g. just the shadow chain for given object handle probably would help there.

But before I implement this sysctl, can you recheck, please, that the following patch solves the transient hang issue?

 commit fd2da9e1515bb50fb1c8fb747d9987ee9b8a314d
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat May 8 01:13:29 2021 +0300

    sysctl vm.objects: yield if hog

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 1c4e879d82ea..b1fa3ebdf9ab 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2614,6 +2614,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 		kvo->kvo_structsize = roundup(kvo->kvo_structsize,
 		    sizeof(uint64_t));
 		error = SYSCTL_OUT(req, kvo, kvo->kvo_structsize);
+		maybe_yield();
 		mtx_lock(&vm_object_list_mtx);
 		if (error)
 			break;

The patch didn't help.
By the way, I've been working on releng/13.0 branch with git cherry-pick.
If I need some changes only in main or stable/13 branches, I will either cherry-pick or switch to such branch.

I checked process with procstat as below.
kinfo_getvmobject() was the one taking long time.
xosview was still updating CPU, memory, and swap usage; it must be using different APIs than top/systat.

% procstat -k 13568

PID    TID COMM                TDNAME              KSTACK

13568 101023 systat - tmpfs_vptocnp_fill tmpfs_vptocnp vn_vptocnp vn_fullpath_any vn_fullpath sysctl_vm_object_list sysctl_root_handler_locked sysctl_root userland_sysctl kern___sysctlbyname sys___sysctlbyname syscall wmt_devclass
% procstat -k 13568

PID    TID COMM                TDNAME              KSTACK

13568 101023 systat - vn_vptocnp vn_fullpath_any vn_fullpath sysctl_vm_object_list sysctl_root_handler_locked sysctl_root userland_sysctl kern___sysctlbyname sys___sysctlbyname syscall wmt_devclass

Should we also try "sysctl improvements", https://www.freebsd.org/status/report-2021-01-2021-03/#_sysctl_improvements, to see if that helps?

Adjust systat.1 from the other review.

A few suggestions for wording improvements in the man page.

usr.bin/systat/systat.1
283 ↗(On Diff #88919)

I would change the last "and" here to "as well as" to make this long sentence a bit more legible. Otherwise, it's just one long sentence connected by and's. I'm prone to do this myself, but I think we don't need to break up this sentence into two either.

286 ↗(On Diff #88919)

s/percentage usage/usage percentage/

298 ↗(On Diff #88919)

s/and/as well as/ (in this line here, see above)

I mean, when asking whether my one-line patch helped, did interactivity of the system during vm.objects retrieval improved? Of course it cannot optimize the running time of the sysctl, but it should allow for other threads to run some more while sysctl is gathering data and copying it out.

In D29754#677726, @kib wrote:

I mean, when asking whether my one-line patch helped, did interactivity of the system during vm.objects retrieval improved? Of course it cannot optimize the running time of the sysctl, but it should allow for other threads to run some more while sysctl is gathering data and copying it out.

I do not see any differences with or without patch. I ran "top" and "systat -swap" and after creating lots of tmpfs files, both systat spins and top is blocked.

In D29754#677726, @kib wrote:

I mean, when asking whether my one-line patch helped, did interactivity of the system during vm.objects retrieval improved? Of course it cannot optimize the running time of the sysctl, but it should allow for other threads to run some more while sysctl is gathering data and copying it out.

I do not see any differences with or without patch. I ran "top" and "systat -swap" and after creating lots of tmpfs files, both systat spins and top is blocked.

Ok. Could you add the following to the kernel, please, and see if it makes any difference?

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index dbb09f67b4c1..cf8b3fcb8f9c 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2478,7 +2478,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 	struct vnode *vp;
 	struct vattr va;
 	vm_object_t obj;
-	vm_page_t m;
+//	vm_page_t m;
 	u_long sp;
 	int count, error;
 
@@ -2524,6 +2524,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 		kvo->kvo_memattr = obj->memattr;
 		kvo->kvo_active = 0;
 		kvo->kvo_inactive = 0;
+#if 0
 		TAILQ_FOREACH(m, &obj->memq, listq) {
 			/*
 			 * A page may belong to the object but be
@@ -2539,6 +2540,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 			else if (m->a.queue == PQ_INACTIVE)
 				kvo->kvo_inactive++;
 		}
+#endif
 
 		kvo->kvo_vn_fileid = 0;
 		kvo->kvo_vn_fsid = 0;
In D29754#680269, @kib wrote:

Ok. Could you add the following to the kernel, please, and see if it makes any difference?

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index dbb09f67b4c1..cf8b3fcb8f9c 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2478,7 +2478,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 	struct vnode *vp;
 	struct vattr va;
 	vm_object_t obj;
-	vm_page_t m;
+//	vm_page_t m;
 	u_long sp;
 	int count, error;
 
@@ -2524,6 +2524,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 		kvo->kvo_memattr = obj->memattr;
 		kvo->kvo_active = 0;
 		kvo->kvo_inactive = 0;
+#if 0

Did you mean both the 1st patch and 2nd patch together?

I so far tested the 2nd one only; I had reverted the 1st one.
With 2nd one only, I don't see visible improvements.

Update systat.1 based on suggestions.

I tested the 2 kernel changes both applied. In short, it helps since the 2nd refresh.

With or without the kernel changes, systat takes long time to return from the call. During that period, top, ps and few others freeze, although xosview draws memory and swap usage fine. Without the kernel changes, each systat refresh takes same amount of time and causes some others to hung. On the other hand, with the kernel changes, systat redraws significantly faster for the 2nd refresh and thereafter and also others like top don't hung.
So, the first call is an expensive call.

To revive this stuff, please try D31163. You need to replace sysctl vm.objects with vm.swap_objects. No other changes to systat code is required. You might even try vm.swap_objects first, and if not available, still use vm.objects.

The new sysctl reports all but only swap objects in one go. This is probably the fastest we can get, except for computing swap use fully in kernel (but I prefer to avoid that). In particular, comparing with my initial proposal of doing per-vm_entry sysctl, this only requires one syscall to fetch relevant objects info.

So please apply D31163, change code to use new sysctl, and report if it is any faster. I think that we would move forward anyway, assuming that the per-process swap use calculation can be disabled.

gbe added a subscriber: gbe.

OK for the man page part, since all comments are addressed.

I tested the 2 kernel changes both applied. In short, it helps since the 2nd refresh.

With or without the kernel changes, systat takes long time to return from the call. During that period, top, ps and few others freeze, although xosview draws memory and swap usage fine. Without the kernel changes, each systat refresh takes same amount of time and causes some others to hung. On the other hand, with the kernel changes, systat redraws significantly faster for the 2nd refresh and thereafter and also others like top don't hung.
So, the first call is an expensive call.

Have you had a chance to try out Kostik's latest sysctl addition (D31163 to add sysctl vm.objects_swap) that returns only swap objects to
see if it reduces the overhead when you are running your `systat -swap'?

I started looking into https://reviews.freebsd.org/D31163.
I've been releng/13.0 based system (with cherrypick).
This time, the patch doesn't apply to 13 and I need to prepare main branch based environment.
I will need a bit time to build the latest.

  • systat: Stop displaying total if there is one swap devise.
  • systat: Implemented per-process swap display on -swap.
  • Avoid double counting and address review comments.
  • systat.1 updates
  • D31163 - Add sysctl vm.objects_swap.
  • Use new vm.objects_swap instead of vm.objects.

Added a patch from D31163. Unfortuantely, the new sysctl didn't improve
performance issue with tmpfs files.

  • systat: Stop displaying total if there is one swap devise.
  • systat: Implemented per-process swap display on -swap.
  • Avoid double counting and address review comments.
  • systat.1 updates
  • D31163 - Add sysctl vm.objects_swap.
  • Use new vm.objects_swap instead of vm.objects.

Added a patch from D31163. Unfortuantely, the new sysctl didn't improve
performance issue with tmpfs files.

Great to see that you have had some time to get back to this project.

While you say that D31163 did not improve the performance issue with tmpfs files, did it improve performance on other tests? Or did it fail to improve performance on any of your tests?

While you say that D31163 did not improve the performance issue with tmpfs files, did it improve performance on other tests? Or did it fail to improve performance on any of your tests?

tmpfs with lots of files slows down a lot. For example, when I extract ports.txz onto tmpfs, I observed sysctl("vm.objects") call takes significant amout of time with truss -o file. Each sysctl calls takes more than few minutes. On the contrary, when I have a large file on tmpfs, e.g. dd if=/dev/zero of=/tmpfs/large bs=1M count=10240, vm.objects calls are very quick.

I haven't be able to figure out which function during a system call taking long time, yet.

There was a bug which resulted not calling new vm.objects_swap.
After fixing the bug and further testing, I no longer see long pause.
In other words, the kernel patch and new sysctl is helping performance improvement.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 26 2021, 12:50 PM
This revision was automatically updated to reflect the committed changes.