Page MenuHomeFreeBSD

Optimize vm_page_in_laundry() with a bit operation.
AbandonedPublic

Authored by ota_j.email.ne.jp on Oct 12 2019, 2:18 AM.

Details

Reviewers
kib
alc
dougm
markj
Summary

Use bit operation to reduce comparisons.
Does FreeBSD allow this type of optimizations?
Is it worth the effort?

Based on more than few weeks of experiments with this patch,
I han't found any issues. PQ_NONE dropping 0x2 doesn't have any side effects.

Test Plan

Rebuilt kernel and use the system with heavy swap-in/out activities.
For example, make buildworld -j <large#> causes clang compilations to use
a lot of memory.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 27214
Build 25479: arc lint + arc unit

Event Timeline

dougm added inline comments.Oct 12 2019, 2:42 AM
sys/vm/vm_page.h
315

Its not clear that this change is necessary. Why is it necessary?

920

This function should return true or false, not 0 or 2.

Have you seen that this change causes some compiler to generate better assembly language for this function? Could you show us that?

markj added inline comments.Oct 13 2019, 3:51 PM
sys/vm/vm_page.h
315

vm_page_in_laundry() would otherwise return true if m->queue == PQ_NONE.

920

We're testing queue == 2 || queue == 3, which can be optimized to (queue | 1) == 3. I don't think this kind of micro-optimization is really worth changing the definition of PQ_NONE.

Thanks for follow up, markj.

If it isn't worth, I will abandon this review in few days, then.

sys/vm/vm_page.h
920

Copy the below adjusted code to https://godbolt.org/ and see the assembly code.
Unfortunately, clang isn't yet supported and gcc 9.2 didn't like bool nor inline; but it is 4 + 7 = 11 compare to 4 instructions.

#define PQ_NONE 255
#define PQ_INACTIVE 0
#define PQ_ACTIVE 1
#define PQ_LAUNDRY 2
#define PQ_UNSWAPPABLE 3
#define PQ_COUNT 4

struct vm_page {

char queue;

};

typedef struct vm_page *vm_page_t;

static inline char
vm_page_queue(vm_page_t m)
{
return (m->queue);
}

static char
vm_page_in_laundry(vm_page_t m)
{
char queue;

queue = vm_page_queue(m);
return (queue == PQ_LAUNDRY || queue == PQ_UNSWAPPABLE);
}

#undef PQ_NONE
#define PQ_NONE (255-0x2)

static char vm_page_in_laundry2(vm_page_t m)
{

return (vm_page_queue(m) & PQ_LAUNDRY);
}

dougm added a comment.Oct 14 2019, 1:36 AM

Would there be a problem with changing the #defines:
#define PQ_NONE 255
#define PQ_UNSWAPPABLE 0
#define PQ_LAUNDRY 1
#define PQ_INACTIVE 2
#define PQ_ACTIVE 3
#define PQ_COUNT 4

and using "queue <= PQ_LAUNDRY" as the test in vm_page_in_laundry?

That's another idea.
I will check the code and test if that is also an option.

I've been using the comparison based optimization for 2 weeks on current
following daily, buildworld/buildkernel daily and haven't seen an issue
with this approach.

This is much easier to read compare to bit-operation based code but is
not sure if worth the change for its gain.

ota_j.email.ne.jp abandoned this revision.Sat, Nov 9, 5:31 AM