Page MenuHomeFreeBSD

Optimize vm_page_in_laundry() with a bit operation.
AbandonedPublic

Authored by ota_j.email.ne.jp on Oct 12 2019, 2:18 AM.
Tags
None
Referenced Files
Unknown Object (File)
Nov 8 2023, 10:39 AM
Unknown Object (File)
Nov 6 2023, 2:00 AM
Unknown Object (File)
Nov 4 2023, 3:51 AM
Unknown Object (File)
Oct 10 2023, 6:31 PM
Unknown Object (File)
Oct 7 2023, 9:35 AM
Unknown Object (File)
Oct 3 2023, 3:43 AM
Unknown Object (File)
Jul 25 2023, 2:16 PM
Unknown Object (File)
May 18 2023, 10:02 AM
Subscribers

Details

Reviewers
kib
alc
dougm
markj
Summary

Use bit operation to reduce comparisons.
Does FreeBSD allow this type of optimizations?
Is it worth the effort?

Based on more than few weeks of experiments with this patch,
I han't found any issues. PQ_NONE dropping 0x2 doesn't have any side effects.

Test Plan

Rebuilt kernel and use the system with heavy swap-in/out activities.
For example, make buildworld -j <large#> causes clang compilations to use
a lot of memory.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 26994
Build 25290: arc lint + arc unit

Event Timeline

sys/vm/vm_page.h
294

Its not clear that this change is necessary. Why is it necessary?

893

This function should return true or false, not 0 or 2.

Have you seen that this change causes some compiler to generate better assembly language for this function? Could you show us that?

sys/vm/vm_page.h
294

vm_page_in_laundry() would otherwise return true if m->queue == PQ_NONE.

893

We're testing queue == 2 || queue == 3, which can be optimized to (queue | 1) == 3. I don't think this kind of micro-optimization is really worth changing the definition of PQ_NONE.

Thanks for follow up, markj.

If it isn't worth, I will abandon this review in few days, then.

sys/vm/vm_page.h
893

Copy the below adjusted code to https://godbolt.org/ and see the assembly code.
Unfortunately, clang isn't yet supported and gcc 9.2 didn't like bool nor inline; but it is 4 + 7 = 11 compare to 4 instructions.

#define PQ_NONE 255
#define PQ_INACTIVE 0
#define PQ_ACTIVE 1
#define PQ_LAUNDRY 2
#define PQ_UNSWAPPABLE 3
#define PQ_COUNT 4

struct vm_page {

char queue;

};

typedef struct vm_page *vm_page_t;

static inline char
vm_page_queue(vm_page_t m)
{
return (m->queue);
}

static char
vm_page_in_laundry(vm_page_t m)
{
char queue;

queue = vm_page_queue(m);
return (queue == PQ_LAUNDRY || queue == PQ_UNSWAPPABLE);
}

#undef PQ_NONE
#define PQ_NONE (255-0x2)

static char vm_page_in_laundry2(vm_page_t m)
{

return (vm_page_queue(m) & PQ_LAUNDRY);
}

Would there be a problem with changing the #defines:
#define PQ_NONE 255
#define PQ_UNSWAPPABLE 0
#define PQ_LAUNDRY 1
#define PQ_INACTIVE 2
#define PQ_ACTIVE 3
#define PQ_COUNT 4

and using "queue <= PQ_LAUNDRY" as the test in vm_page_in_laundry?

That's another idea.
I will check the code and test if that is also an option.

I've been using the comparison based optimization for 2 weeks on current
following daily, buildworld/buildkernel daily and haven't seen an issue
with this approach.

This is much easier to read compare to bit-operation based code but is
not sure if worth the change for its gain.