Page MenuHomeFreeBSD

eliminate locking surrounding ui_vmsize and swap reserve by using atomics
ClosedPublic

Authored by mmacy on Jul 15 2018, 3:07 AM.
Tags
None
Referenced Files
F108322621: D16273.diff
Thu, Jan 23, 8:54 PM
F108271861: D16273.id48448.diff
Thu, Jan 23, 8:16 AM
Unknown Object (File)
Sun, Jan 5, 12:07 PM
Unknown Object (File)
Fri, Jan 3, 2:07 PM
Unknown Object (File)
Fri, Jan 3, 4:28 AM
Unknown Object (File)
Thu, Jan 2, 1:52 AM
Unknown Object (File)
Fri, Dec 27, 6:51 AM
Unknown Object (File)
Wed, Dec 25, 6:25 PM
Subscribers

Details

Summary
  • change swap reserve to transact in vm_size_t to make operations appropriate to the platform

~70% increase in number of brk calls / second

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/vm/swap_pager.c
281 ↗(On Diff #45310)

I haven't seen that in practice. How would I avoid a leak.

sys/vm/swap_pager.c
154 ↗(On Diff #45310)

That's not true in general and definitely isn't true with PAE, and swap_reserved may be much larger than the amount of swap actually used. vm_size_t isn't the right type.

mmacy added inline comments.
sys/vm/swap_pager.c
154 ↗(On Diff #45310)

Does it make sense to support PAE? Do we know that it even works still? One can't do atomic updates on 64-bit quantities on a number of the tier 2 legacy arches. Nonetheless, I suppose I could implement a wrapper for this purpose.

sys/vm/swap_pager.c
154 ↗(On Diff #45310)

Sure, I've tested it in the past and I occasionally see bug reports from people running PAE kernels. That's irrelevant though since one can cause swap_reserved to exceed 4GB by starting firefox and opening a few tabs.

sys/vm/swap_pager.c
154 ↗(On Diff #45310)

A different option might be to make swap_reserved a page count rather than a byte count. We typically use u_int for page counts, though that'll have to change on 64-bit platforms eventually. I haven't looked closely at doing this though, there might be some pitfalls.

mmacy added inline comments.
sys/vm/swap_pager.c
154 ↗(On Diff #45310)

That actually makes the most sense. And I don't see why it would necessarily have to be the same size type between 32-bit and 64-bit.

sys/vm/swap_pager.c
154 ↗(On Diff #45310)

Right, I think u_long makes sense for page counts in general. We shouldn't need more than 32 bits on 32-bit architectures, and you can just use atomic_*_long_* everywhere.

Convert swap_total and swap_reserved to be in units of pages.

sys/vm/swap_pager.c
154 ↗(On Diff #45310)

PAE works, Peter tests PAE kernel semi-regularly. I believe there is a similar mode implemented for ppc32, but do not want to look it up. For armv7 there is paging mode LPAE similar to PAE, but there I know that we do not implement it.

It was/is not unusual to have e.g. 4G i386 machine with 8-16G of swap configured, and it was operating as expected.

281 ↗(On Diff #45310)

Original code used proc lock to avoid the race. Hmm, in this specific place it should dereference curproc->p_cred->cr_ruidinfo, in fact.

  • protect against accounting leak if credentials change

restore vm_ooffset_t as argument type to swap functions

Make r and s vm_ooffset_t again

make page counts unsigned long

sys/vm/swap_pager.c
184 ↗(On Diff #48439)

You don't need these.

289 ↗(On Diff #48439)

These can just be KASSERTS(), including the one in swap_reserve_by_cred().

2284 ↗(On Diff #48439)

You don't need the cast.

2381 ↗(On Diff #48439)

You don't need the cast.

sys/vm/swap_pager.c
154 ↗(On Diff #48438)

I agree with markj. You should simply use an u_long here. Historically, vm_size_t has represented quantities that are byte counts, not page counts. So, this muddies the meaning of vm_size_t. Similarly, many functions that take a page count as an argument, e.g., contigmalloc, various vm_phys_*, some vm_page_*. Just grep for "npages".

mmacy added inline comments.
sys/vm/swap_pager.c
184 ↗(On Diff #48439)

Yes I do. On ARM long and int are not interoperable.

mmacy added inline comments.
sys/vm/swap_pager.c
184 ↗(On Diff #48439)

Err. I guess now that it's explicitly an unsigned long maybe it will work.

sys/sys/resourcevar.h
96 ↗(On Diff #48443)

That's the same as (b).

sys/vm/swap_pager.c
210 ↗(On Diff #48443)

This should be unsigned long.

309 ↗(On Diff #48443)

This should be unsigned long.

314 ↗(On Diff #48443)

The condition being tested is inverted. The function names in the assertions are wrong. It would also be nice to include the value of "decr" in the assertion message.

fix KASSERTs and resourcevar.h comment

fix KASSERT function names printed

sys/vm/swap_pager.c
182 ↗(On Diff #48450)

You can have single handler that takes &swap_reserved or &swap_total as arg1.

218 ↗(On Diff #48450)

All of the KASSERT lines should wrap at 80 columns.

245 ↗(On Diff #48450)

We should verify that swap_reserved >= incr:

prev = atomic_fetchadd_long(&swap_reserved, -incr);
if (prev < incr)
    panic("...");

like in swap_release_by_cred().

258 ↗(On Diff #48450)

You don't need this cast anymore.

280 ↗(On Diff #48450)

Now we unnecessarily lock the proc twice if racct is enabled. The lines

incr >>= PAGE_SHIFT;
atomic_add_long(&swap_reserved, inc);

can move up.

pass properly scaled value to racct_add_force

sys/vm/swap_pager.c
222 ↗(On Diff #48463)

We have atop()/ptoa() macros for size/page count conversions.

281 ↗(On Diff #48463)

This is unsafe without the proc lock.

293 ↗(On Diff #48463)

And this.

mmacy marked 3 inline comments as done.Oct 2 2018, 8:10 PM
mmacy added inline comments.
sys/vm/swap_pager.c
222 ↗(On Diff #48463)

coming up

281 ↗(On Diff #48463)

It's acquired on line 274.

293 ↗(On Diff #48463)

It's acquired 2 lines above.

This revision is now accepted and ready to land.Oct 3 2018, 6:51 PM
markj added inline comments.
sys/vm/swap_pager.c
245 ↗(On Diff #48450)

This comment applies to ui_vmsize block below too.

sys/vm/swap_pager.c
186 ↗(On Diff #48657)

Internally, ptoa() does not cast its argument to a type that is guaranteed to be 64 bits wide. So, unless you change the type of value to something that is 64 bits wide, this can suffer overflow, and declaring newval as uint64_t was all for naught.

P.S. arm will still be broken because it has the only ptoa() implementation among the 32-bit platforms that performs a cast to a 32-bit unsigned internally. I would regard this as "not your problem".

sys/vm/swap_pager.c
222 ↗(On Diff #48657)

32-bit architectures would benefit from replacing the incr on the left-hand side of this assignment by an actual unsigned long-type variable, which I'll call "pincr", because then throughout the rest of this function they would the narrower pincr and no longer pointlessly perform 64-bit arithmetic.

Similarly the variable r could be made unsigned long (or u_long).

232 ↗(On Diff #48657)

This is incorrect because s is in bytes and swap_total is in pages. Using ptoa() will be problematic here on all 32-bit architectures because swap_total is an unsigned long.

258 ↗(On Diff #48657)

With the introduction of pincr, this ptoa(incr) would simply revert back to incr.

279 ↗(On Diff #48657)

See earlier comment about this same assignment elsewhere.

308 ↗(On Diff #48657)

My earlier comments apply to this function as well.

mmacy marked an inline comment as done.

incorporate @alc feedback

This revision now requires review to proceed.Oct 3 2018, 11:05 PM
sys/vm/swap_pager.c
245 ↗(On Diff #48699)

There is a typo here: _fetchad_

sys/vm/swap_pager.c
317–318 ↗(On Diff #48699)

It seems like this snippet should be wrapped with

#ifdef RACCT

All other RACCT snippets in this file are #ifdef'ed. To be clear, the original code was also lacking this #ifdef, but we might as well add it now.

  • add #ifdef RACCT
  • fix typo
This revision is now accepted and ready to land.Oct 4 2018, 7:23 AM

Aside from this comment about style, I've got nothing else to say about this patch. I'm happy to see it committed.

sys/sys/resourcevar.h
99 ↗(On Diff #48704)

Just a point about style, not correctness or performance: Everywhere else in this file "unsigned long" is spelled "u_long". Please consider using that shorter spelling here and swap_pager.c. swap_pager.c is mixed it its use of "unsigned long" versus "u_long", but previous to this patch the shorter spelling appeared more often than the longer spelling. I think using the shorter spelling wherever you're changing the code would be a virtue.

This revision was automatically updated to reflect the committed changes.