Page MenuHomeFreeBSD

eliminate locking surrounding ui_vmsize and swap reserve by using atomics
ClosedPublic

Authored by mmacy on Jul 15 2018, 3:07 AM.
Tags
None
Referenced Files
F108322621: D16273.diff
Thu, Jan 23, 8:54 PM
F108271861: D16273.id48448.diff
Thu, Jan 23, 8:16 AM
Unknown Object (File)
Sun, Jan 5, 12:07 PM
Unknown Object (File)
Fri, Jan 3, 2:07 PM
Unknown Object (File)
Fri, Jan 3, 4:28 AM
Unknown Object (File)
Thu, Jan 2, 1:52 AM
Unknown Object (File)
Fri, Dec 27, 6:51 AM
Unknown Object (File)
Wed, Dec 25, 6:25 PM
Subscribers

Details

Summary
  • change swap reserve to transact in vm_size_t to make operations appropriate to the platform

~70% increase in number of brk calls / second

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 19783

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/vm/swap_pager.c
301

I haven't seen that in practice. How would I avoid a leak.

sys/vm/swap_pager.c
153–154

That's not true in general and definitely isn't true with PAE, and swap_reserved may be much larger than the amount of swap actually used. vm_size_t isn't the right type.

mmacy added inline comments.
sys/vm/swap_pager.c
153–154

Does it make sense to support PAE? Do we know that it even works still? One can't do atomic updates on 64-bit quantities on a number of the tier 2 legacy arches. Nonetheless, I suppose I could implement a wrapper for this purpose.

sys/vm/swap_pager.c
153–154

Sure, I've tested it in the past and I occasionally see bug reports from people running PAE kernels. That's irrelevant though since one can cause swap_reserved to exceed 4GB by starting firefox and opening a few tabs.

sys/vm/swap_pager.c
153–154

A different option might be to make swap_reserved a page count rather than a byte count. We typically use u_int for page counts, though that'll have to change on 64-bit platforms eventually. I haven't looked closely at doing this though, there might be some pitfalls.

mmacy added inline comments.
sys/vm/swap_pager.c
153–154

That actually makes the most sense. And I don't see why it would necessarily have to be the same size type between 32-bit and 64-bit.

sys/vm/swap_pager.c
153–154

Right, I think u_long makes sense for page counts in general. We shouldn't need more than 32 bits on 32-bit architectures, and you can just use atomic_*_long_* everywhere.

Convert swap_total and swap_reserved to be in units of pages.

sys/vm/swap_pager.c
153–154

PAE works, Peter tests PAE kernel semi-regularly. I believe there is a similar mode implemented for ppc32, but do not want to look it up. For armv7 there is paging mode LPAE similar to PAE, but there I know that we do not implement it.

It was/is not unusual to have e.g. 4G i386 machine with 8-16G of swap configured, and it was operating as expected.

301

Original code used proc lock to avoid the race. Hmm, in this specific place it should dereference curproc->p_cred->cr_ruidinfo, in fact.

  • protect against accounting leak if credentials change

restore vm_ooffset_t as argument type to swap functions

Make r and s vm_ooffset_t again

make page counts unsigned long

sys/vm/swap_pager.c
184

You don't need these.

288–289

These can just be KASSERTS(), including the one in swap_reserve_by_cred().

2279

You don't need the cast.

2376

You don't need the cast.

sys/vm/swap_pager.c
154–158

I agree with markj. You should simply use an u_long here. Historically, vm_size_t has represented quantities that are byte counts, not page counts. So, this muddies the meaning of vm_size_t. Similarly, many functions that take a page count as an argument, e.g., contigmalloc, various vm_phys_*, some vm_page_*. Just grep for "npages".

mmacy added inline comments.
sys/vm/swap_pager.c
184

Yes I do. On ARM long and int are not interoperable.

mmacy added inline comments.
sys/vm/swap_pager.c
184

Err. I guess now that it's explicitly an unsigned long maybe it will work.

sys/sys/resourcevar.h
96

That's the same as (b).

sys/vm/swap_pager.c
221

This should be unsigned long.

316

This should be unsigned long.

322

The condition being tested is inverted. The function names in the assertions are wrong. It would also be nice to include the value of "decr" in the assertion message.

fix KASSERTs and resourcevar.h comment

fix KASSERT function names printed

sys/vm/swap_pager.c
182

You can have single handler that takes &swap_reserved or &swap_total as arg1.

230

All of the KASSERT lines should wrap at 80 columns.

256

We should verify that swap_reserved >= incr:

prev = atomic_fetchadd_long(&swap_reserved, -incr);
if (prev < incr)
    panic("...");

like in swap_release_by_cred().

269

You don't need this cast anymore.

292–296

Now we unnecessarily lock the proc twice if racct is enabled. The lines

incr >>= PAGE_SHIFT;
atomic_add_long(&swap_reserved, inc);

can move up.

pass properly scaled value to racct_add_force

sys/vm/swap_pager.c
241

We have atop()/ptoa() macros for size/page count conversions.

302

This is unsafe without the proc lock.

310

And this.

mmacy marked 3 inline comments as done.Oct 2 2018, 8:10 PM
mmacy added inline comments.
sys/vm/swap_pager.c
241

coming up

302

It's acquired on line 274.

310

It's acquired 2 lines above.

This revision is now accepted and ready to land.Oct 3 2018, 6:51 PM
markj added inline comments.
sys/vm/swap_pager.c
256

This comment applies to ui_vmsize block below too.

sys/vm/swap_pager.c
187

Internally, ptoa() does not cast its argument to a type that is guaranteed to be 64 bits wide. So, unless you change the type of value to something that is 64 bits wide, this can suffer overflow, and declaring newval as uint64_t was all for naught.

P.S. arm will still be broken because it has the only ptoa() implementation among the 32-bit platforms that performs a cast to a 32-bit unsigned internally. I would regard this as "not your problem".

sys/vm/swap_pager.c
241

32-bit architectures would benefit from replacing the incr on the left-hand side of this assignment by an actual unsigned long-type variable, which I'll call "pincr", because then throughout the rest of this function they would the narrower pincr and no longer pointlessly perform 64-bit arithmetic.

Similarly the variable r could be made unsigned long (or u_long).

251

This is incorrect because s is in bytes and swap_total is in pages. Using ptoa() will be problematic here on all 32-bit architectures because swap_total is an unsigned long.

275

With the introduction of pincr, this ptoa(incr) would simply revert back to incr.

300–301

See earlier comment about this same assignment elsewhere.

324–326

My earlier comments apply to this function as well.

mmacy marked an inline comment as done.

incorporate @alc feedback

This revision now requires review to proceed.Oct 3 2018, 11:05 PM
sys/vm/swap_pager.c
264

There is a typo here: _fetchad_

sys/vm/swap_pager.c
332–334

It seems like this snippet should be wrapped with

#ifdef RACCT

All other RACCT snippets in this file are #ifdef'ed. To be clear, the original code was also lacking this #ifdef, but we might as well add it now.

  • add #ifdef RACCT
  • fix typo
This revision is now accepted and ready to land.Oct 4 2018, 7:23 AM

Aside from this comment about style, I've got nothing else to say about this patch. I'm happy to see it committed.

sys/sys/resourcevar.h
100

Just a point about style, not correctness or performance: Everywhere else in this file "unsigned long" is spelled "u_long". Please consider using that shorter spelling here and swap_pager.c. swap_pager.c is mixed it its use of "unsigned long" versus "u_long", but previous to this patch the shorter spelling appeared more often than the longer spelling. I think using the shorter spelling wherever you're changing the code would be a virtue.

This revision was automatically updated to reflect the committed changes.