ufs: support unmapped bufs for indirect blocks in bmap
Needs Review · Public

Authored by chs on Tue, Oct 28, 9:12 PM.

Details

Summary

Use unmapped bufs for indirect block buffers in bmap if the platform
has a direct map, and use the direct map to access the buffer data
instead of using the traditional virtually contiguous mapping.

On our 96-core boxes serving 350-ish Gb/s of streaming video traffic, this change reduces the CPU cycles used by about 8%.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68198
Build 65081: arc lint + arc unit

Event Timeline

chs requested review of this revision. Tue, Oct 28, 9:12 PM
chs added reviewers: mckusick, kib, gallatin, imp.

I strongly dislike this, especially the proliferation of PMAP_HAS_DMAP into MD code.

Since you access the buffer pages one by one, you can use sf_bufs to get the mapping. On dmap arches, this would give the same optimization as your manually coded PHYS_TO_DMAP(), and the same code path would also be used on !dmap platforms.
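
To make the page-at-a-time access pattern concrete, here is a minimal userland sketch. It is only a model under stated assumptions, not the patch and not kernel code: `model_page`, `model_buf`, `model_map_page()` and `model_unmap_page()` are hypothetical stand-ins (in the kernel, the map and unmap steps would correspond to something like sf_buf_alloc()/sf_buf_kva() and sf_buf_free()), and the 8-byte entry size and 4096-byte page size are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for one physical page. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Hypothetical stand-in for an unmapped buf: a page array, no contiguous mapping. */
struct model_buf {
	struct model_page **pages;
	size_t npages;
};

/* Stand-in for obtaining a transient mapping of a single page. */
static void *
model_map_page(struct model_page *pg)
{
	return (pg->data);
}

/* Stand-in for releasing the transient mapping; nothing to undo here. */
static void
model_unmap_page(void *kva)
{
	(void)kva;
}

/* Read block-pointer entry idx by mapping only the page that holds it. */
static int64_t
model_read_indir_entry(const struct model_buf *bp, size_t idx)
{
	size_t byteoff = idx * sizeof(int64_t);
	size_t pgidx = byteoff / MODEL_PAGE_SIZE;	/* which page */
	size_t pgoff = byteoff % MODEL_PAGE_SIZE;	/* offset inside it */
	int64_t val;
	void *kva;

	assert(pgidx < bp->npages);
	kva = model_map_page(bp->pages[pgidx]);
	memcpy(&val, (unsigned char *)kva + pgoff, sizeof(val));
	model_unmap_page(kva);
	return (val);
}

/* Store 42 in the first entry of the buffer's second page and read it back. */
static int64_t
model_demo(void)
{
	static struct model_page p0, p1;
	struct model_page *pages[2] = { &p1, &p0 };	/* pages need not be adjacent */
	struct model_buf bp = { pages, 2 };
	size_t idx = MODEL_PAGE_SIZE / sizeof(int64_t);	/* first entry of page 1 */
	int64_t want = 42;

	memcpy(p0.data, &want, sizeof(want));
	return (model_read_indir_entry(&bp, idx));
}
```

Because the entries are 8 bytes and pages are a multiple of 8 bytes, an entry never straddles a page boundary in this model, so one single-page mapping per lookup is enough.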

This is an impressive performance improvement and one well worth adding to the system.

If it is possible to achieve your goal in the way outlined by kib, that would certainly be preferable.

Is there any concern that this new use will add pressure to the sf_buf system on non-direct map platforms?

No. sf_bufs were designed specifically to provide efficient transient remapping of an arbitrary page. On non-dmap arches, there is a limited number (around 1k) of slots available for sf_buf mappings, and in the worst case the requester would wait until another thread frees its sf_buf, which should not take long.
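
The slot-exhaustion behaviour described above can be modelled with a small fixed pool. This is a hypothetical userland sketch, not the sf_buf implementation: `model_slot_pool` and its functions are invented for illustration, the pool is deliberately tiny, and where a real requester would sleep until a slot is released, this model simply returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NSLOTS 4	/* tiny on purpose; the real limit is far larger */

/* Hypothetical fixed pool of transient mapping slots. */
struct model_slot_pool {
	bool busy[MODEL_NSLOTS];
};

static void
model_pool_init(struct model_slot_pool *p)
{
	for (size_t i = 0; i < MODEL_NSLOTS; i++)
		p->busy[i] = false;
}

/*
 * Return the index of a free slot, or -1 when every slot is in use.
 * A real requester would wait at this point instead of failing.
 */
static int
model_slot_alloc(struct model_slot_pool *p)
{
	for (int i = 0; i < MODEL_NSLOTS; i++) {
		if (!p->busy[i]) {
			p->busy[i] = true;
			return (i);
		}
	}
	return (-1);
}

/* Release a slot so that a waiting requester could take it. */
static void
model_slot_free(struct model_slot_pool *p, int slot)
{
	assert(slot >= 0 && slot < MODEL_NSLOTS);
	assert(p->busy[slot]);
	p->busy[slot] = false;
}
```

Since each lookup holds its slot only long enough to copy one entry, the window in which another requester could be blocked stays short in this model.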

Using CPU-private sf_bufs together with sched_pin() would eliminate the need to broadcast an invalidation IPI in the non-dmap case. It costs nothing on dmap.