jeff (Jeffrey Roberson)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 6 2017, 12:45 AM (46 w, 12 h)

Recent Activity

Today

jeff added inline comments to D15985: Reduce unnecessary preemption, add a preemption knob for timeshare, fix missing NEEDRESCHED.
Sun, Jun 24, 12:23 AM
jeff added inline comments to D15985: Reduce unnecessary preemption, add a preemption knob for timeshare, fix missing NEEDRESCHED.
Sun, Jun 24, 12:14 AM
jeff created D15985: Reduce unnecessary preemption, add a preemption knob for timeshare, fix missing NEEDRESCHED.
Sun, Jun 24, 12:08 AM

Yesterday

jeff abandoned D13308: pageout fix.

Closing as it is already fixed.

Sat, Jun 23, 11:51 PM
jeff commandeered D13308: pageout fix.

This has been resolved in current with a more complete fix.

Sat, Jun 23, 11:50 PM
jeff added inline comments to D15976: Change vm_page_import() to avoid physical memory fragmentation.
Sat, Jun 23, 10:11 PM
jeff added a comment to D15977: Make the partpopq LRU sloppy to reduce contention.
In D15977#338278, @kib wrote:
In D15977#338276, @jeff wrote:

The other thing to consider is how accurate it needs to be, and how accurate it already is. Which thread was scheduled in the last tick is just as arbitrary as this LRU. You just want to replace something that hasn't been used in a long time. I would guess we're more often looking at things that haven't been touched in seconds, or at least hundreds of milliseconds, than within a few ticks. I can measure that, but it will of course be grossly dependent on the workload and the amount of memory.

If we do not need an LRU, maybe we do not need the partpopq list at all? E.g., keep bins of generations per tick, up to some limited number of bins.

Sat, Jun 23, 10:37 AM
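
A minimal sketch of the binned-generation idea floated above, using the sys/queue.h TAILQ macros. The struct and function names (resv, resv_touch, resv_oldest) and the bin sizing are hypothetical illustrations, not the real vm_reserv code; each bin would be TAILQ_INIT()ed at startup.

#include <sys/queue.h>

#define	NBINS		8	/* limited number of generation bins */
#define	TICKS_PER_BIN	64	/* coarseness of one bin, in ticks */

/* Hypothetical stand-in for a partially populated reservation. */
struct resv {
	TAILQ_ENTRY(resv)	r_link;
	int			r_bin;		/* bin currently holding this entry */
};

static TAILQ_HEAD(, resv) bins[NBINS];		/* TAILQ_INIT() each bin at startup */

/* Move a reservation into the bin for the current tick; no exact LRU is kept. */
static void
resv_touch(struct resv *r, int curtick)
{
	int bin = (curtick / TICKS_PER_BIN) % NBINS;

	TAILQ_REMOVE(&bins[r->r_bin], r, r_link);	/* assumes r is already binned */
	r->r_bin = bin;
	TAILQ_INSERT_TAIL(&bins[bin], r, r_link);
}

/* Pick a victim from (approximately) the oldest non-empty bin. */
static struct resv *
resv_oldest(int curtick)
{
	int bin = (curtick / TICKS_PER_BIN) % NBINS;
	struct resv *r;

	for (int i = 1; i <= NBINS; i++)
		if ((r = TAILQ_FIRST(&bins[(bin + i) % NBINS])) != NULL)
			return (r);
	return (NULL);
}

The point of the sketch is that a touch costs two constant-time list operations instead of keeping the rvq_partpop queue in strict LRU order, and eviction only needs an approximately oldest bin.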
jeff added inline comments to D15976: Change vm_page_import() to avoid physical memory fragmentation.
Sat, Jun 23, 10:13 AM
jeff added a comment to D15977: Make the partpopq LRU sloppy to reduce contention.
In D15977#338275, @kib wrote:

How random does the order of the partpopq become? Is there any way to evaluate it?

I mean, a tick is a lot, so instead of only doing it at each tick, delegate the limited (?) sorting of the rvq_partpop queues by lasttick to a daemon.

Sat, Jun 23, 10:04 AM
jeff added a comment to D15975: eliminate global serialization points in swap reserve & mmap.

Some great stuff in here. Let's peel off parts while we perfect the rest.

Sat, Jun 23, 9:28 AM
jeff created D15977: Make the partpopq LRU sloppy to reduce contention.
Sat, Jun 23, 8:54 AM
jeff committed rS335579: Sort uma_zone fields according to 64 byte cache line with adjacent line.
Sort uma_zone fields according to 64 byte cache line with adjacent line
Sat, Jun 23, 8:10 AM

Fri, Jun 15

jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.

kqueue and select both use fget_unlocked. If you want to propose files without references for single-threaded programs, you are free to do so. You should raise it on arch@ as there is no real owner in this area. This patch further reduces the differences between select and poll and reduces the number of atomics used in select, which I would argue is the more frequently used of the pair.

Fri, Jun 15, 10:55 PM

Thu, Jun 14

jeff added inline comments to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
Thu, Jun 14, 8:37 PM
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
In D15799#334063, @mjg wrote:
In D15799#334059, @jeff wrote:

I understand, but you can't guarantee the thread is the only thing accessing these file descriptors. Off the top of my head, the unix domain socket gc thread does fdrop()s on a task queue. It _may_ be possible to start to work around these things, but it becomes incredibly hard to reason about. And you'd have to audit everything else in the kernel that uses a file * to understand whether it imposes restrictions on things you can do single-threaded.

None of this is of any concern.

If the process is single threaded and the file descriptor table is not shared, it is the only entity which can modify its own fd table.

So in particular if it has a file installed, it holds a reference to keep it alive. Also nothing but curthread can drop it.

Let's say the same file object is being inspected by the unix gc thread; that is of no significance to this process. Let's say it fdrops: it does not matter, the process at hand still has its own ref.

The optimisation of not refing/unrefing files in single-threaded processes is implemented in Linux for all syscalls translating fd -> file.

The only caveat here is that you have to remember whether you grabbed the reference or not, since after you got one the other thread/whatever can disappear and you may transition to being single-threaded.

So the idiom is of this sort:
fp = fd2fp(fd, &need_fdrop);
....
fdrop_cond(fp, need_fdrop);

I would say there is pretty much no obfuscation in the caller, and it is possibly beneficial to apply this globally, not only here.

Thu, Jun 14, 8:34 PM
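
A self-contained sketch of that conditional-reference idiom. fd2fp() and fdrop_cond() are the hypothetical names from the comment above, and struct file and struct fdtable here are simplified stand-ins rather than the real kernel structures; only the shape of the check is the point.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the kernel's struct file / filedesc. */
struct file {
	atomic_int	f_count;	/* reference count */
};

struct fdtable {
	struct file   **fds;
	int		nfds;
	bool		shared;		/* fd table shared with another process? */
	int		nthreads;	/* threads that can reach this table */
};

/*
 * Translate fd -> file.  A single-threaded process with a private fd table
 * is the only entity that can close the descriptor, so its table reference
 * already keeps the file alive and the atomic can be skipped; otherwise
 * take a reference and tell the caller to drop it later.
 */
static struct file *
fd2fp(struct fdtable *fdt, int fd, bool *need_fdrop)
{
	struct file *fp;

	if (fd < 0 || fd >= fdt->nfds || (fp = fdt->fds[fd]) == NULL)
		return (NULL);
	if (fdt->nthreads == 1 && !fdt->shared) {
		*need_fdrop = false;
		return (fp);
	}
	atomic_fetch_add(&fp->f_count, 1);
	*need_fdrop = true;
	return (fp);
}

static void
fdrop_cond(struct file *fp, bool need_fdrop)
{
	if (need_fdrop)
		atomic_fetch_sub(&fp->f_count, 1);
}

A caller brackets its use exactly as in the quoted idiom: fp = fd2fp(fdt, fd, &need_fdrop); ... fdrop_cond(fp, need_fdrop);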
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.

As an aside, this is what select already does anyway. Poll was still using the big slock, but select was using the lockless fd support.

Thu, Jun 14, 6:58 AM
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
In D15799#334058, @mjg wrote:
In D15799#334056, @jeff wrote:
In D15799#334039, @mjg wrote:

This avoidably pessimizes the common case of single-threaded execution by adding an atomic op pair for each fd. The code can check whether the process is single-threaded and the fd table is not shared, in which case there is no need to grab a ref on files. This will end up being a minor pessimization for the multithreaded (and presumably rare) case while being a win for the single-threaded one.

On my machine it takes less than 40 clock cycles, or 11 ns, to do an atomic_add/atomic_fetchadd pair on a line that is in cache. I would really prefer that we did not obfuscate the code with fragile exceptions for a tiny bit of performance. There are far more profitable ways to improve our single-threaded perf in poll.

This patch converts a per-call lock/unlock pair into a ref/unref pair for each passed fd, so it does matter. More importantly, the vast majority of poll users are single-threaded, so this patch as presented is pessimal for real uses. I don't see how the proposal obfuscates the code in any significant way.

I definitely agree there are plenty of wins to get in this code regardless of the above.

Thu, Jun 14, 6:49 AM
jeff added inline comments to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
Thu, Jun 14, 6:43 AM
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
In D15799#334039, @mjg wrote:

This avoidably pessimizes the common case of single-threaded execution by adding an atomic op pair for each fd. The code can check whether the process is single-threaded and the fd table is not shared, in which case there is no need to grab a ref on files. This will end up being a minor pessimization for the multithreaded (and presumably rare) case while being a win for the single-threaded one.

Thu, Jun 14, 6:38 AM

Mon, Jun 11

jeff added a comment to D15736: Implement fast path for malloc and free.

Does this not include the malloc.h changes for M_ZERO?

Mon, Jun 11, 1:30 AM

Thu, May 31

jeff accepted D15491: Eliminate the "pass" variable in the page daemon control loop..

I just realized that this change has the same goal as D13644. The discussion there is still relevant; in particular, with the PID controller change it now doesn't make sense for v_free_target to be set as high as it is: the controller will produce a positive output as soon as v_free_count < v_free_target, and the page daemon runs the controller once every 100ms. In other words, we start freeing pages more or less immediately after v_free_count drops below v_free_target. I plan to address that, but in a separate change.

Thu, May 31, 11:36 PM
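
To make the reasoning concrete, here is a purely illustrative proportional term (not the actual vm_pageout PID controller): run every 100 ms, it produces a positive reclaim target the moment the free count dips below the target, so a high v_free_target means the daemon starts freeing pages almost immediately.

/* Illustrative only; the names and the single P term are simplifications. */
static int
pageout_shortage(unsigned int free_count, unsigned int free_target)
{
	int error = (int)free_target - (int)free_count;

	/* Any deficit at all becomes work for the next 100 ms pass. */
	return (error > 0 ? error : 0);
}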

Wed, May 30

jeff added a comment to D15526: reduce overhead of entropy collection.

Given that there is trivially little, if any, entropy coming from mbufs, is there a reason we're leaving this callsite at all? Has anyone from secteam commented?

Wed, May 30, 11:05 PM

May 24 2018

jeff committed rS334127: Merge from head.
Merge from head
May 24 2018, 3:47 AM

May 17 2018

jeff accepted D15462: Fix a race in vm_page_pagequeue_lockptr()..
May 17 2018, 4:15 AM

May 13 2018

jeff added inline comments to D15365: simple preempt safe epoch API.
May 13 2018, 1:04 AM
jeff added a comment to D15010: add white listing for ZFS locking pairs that WITNESS can't report accurately and enable WITNESS by default in ZFS.
In D15010#316190, @mav wrote:

I am not closely familiar with WITNESS, so this is just a feeling: the long lists of blessed locks and their combinations promise a high chance of being forgotten in subsequent ZFS updates.

That's actually true of all documented lock orders. I don't have a good fix for that. However, the cost of lookup can be further reduced by putting the names in a red-black tree, reducing the overhead to O(log N) (see the sketch below).

At the very least it would be good to document how these new mechanisms should be used.

Yup. Seems pretty self-explanatory apart from the separate lists used for expediting negative lookups.

May 13 2018, 12:43 AM
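
A minimal sketch of the red-black-tree lookup suggested above, built on FreeBSD's sys/tree.h macros; the blessed_pair structure and blessed() function are hypothetical illustrations, not the actual WITNESS code.

#include <sys/tree.h>
#include <string.h>

struct blessed_pair {
	RB_ENTRY(blessed_pair)	 bp_link;
	const char		*bp_lock1;	/* lock held first */
	const char		*bp_lock2;	/* lock acquired second */
};

static int
blessed_cmp(struct blessed_pair *a, struct blessed_pair *b)
{
	int c;

	if ((c = strcmp(a->bp_lock1, b->bp_lock1)) != 0)
		return (c);
	return (strcmp(a->bp_lock2, b->bp_lock2));
}

static RB_HEAD(blessed_tree, blessed_pair) blessed_root =
    RB_INITIALIZER(&blessed_root);
RB_GENERATE_STATIC(blessed_tree, blessed_pair, bp_link, blessed_cmp);

/* O(log N) check whether a lock-order pair is whitelisted. */
static int
blessed(const char *l1, const char *l2)
{
	struct blessed_pair key = { .bp_lock1 = l1, .bp_lock2 = l2 };

	return (RB_FIND(blessed_tree, &blessed_root, &key) != NULL);
}

The whitelisted pairs would be inserted once at initialization with RB_INSERT(), so each check costs a single O(log N) lookup instead of a linear scan of the blessed list.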
jeff added a comment to D15275: Feature enhancements to pmcstat.
In D15275#322358, @kib wrote:
In D15275#322071, @kib wrote:

How is an event description from the json tables matched against the index from pmc_events.h?

It works so long as the FreeBSD version was named correctly. I have an aliases table in pmu_utils.c for things like UNHALTED_CORE_CYCLES and LLC_MISSES. If the table lookup fails, it will just use the default sampling rate that is used on HEAD. Ultimately, on supported architectures I'd like to switch from using the ad hoc defines in pmc_events.h to using the json tables from Intel, IBM, and Cavium.

So did you verify that the names match? What is the plan for non-matching names?

Mostly the names match. Switching to using the Intel names in the tables will fix it for good.

Also I think that importing tables in userspace is really a half measure. Right now we must update both kernel and userspace to get a new event added, and in the course of it we have to break the pmc(4) ABI. IMO the tables should live in the kernel; that is not a problem when hwpmc(4) is a module, or hwpmc(4) can be a minimal core with microarch-specific submodules loaded as needed. Userspace should fetch the table from the kernel and use kernel handles for events.

I don't agree, and neither does Intel (see Andi's mail). We have a table of bits we can pass to the kernel as an ioctl. One could claim that defining them in the kernel buys some safety or compatibility guarantees, but that doesn't actually hold water in practice. Being able to add newly exposed PMCs without having to modify the kernel is a step forward, not a half measure. There are dozens if not hundreds of non-public PMCs on Zen processors. With this mechanism we could simply add a new table instead of laboriously, and in error-prone fashion, copying from the docs.

May 13 2018, 12:19 AM
jeff added a comment to D15337: Add support for higher resolution timestamps.

My feeling is that ticks is unlikely to go any faster on general-purpose kernels, and some technique like this is inevitable as we continue to scale link performance. Some slight extra CPU time is a good trade-off for also eliminating weird rounding conditions and scaling factors. Overall I support this work going forward.

May 13 2018, 12:16 AM

May 11 2018

jeff added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#324256, @alc wrote:
In D15055#323469, @kib wrote:

In fact I started with ft.A.x when I did the testing, but there it was even less interesting than for ft.C.x. The counter's increment was about 1 or 2. This is why I changed to C and also asked about tuning.

I can re-test but I do not see the point.

I agree.

May 11 2018, 9:27 AM
jeff accepted D14917: Detect reads from the hole..

It would be nice to implement it in other filesystems that support sparse files.

May 11 2018, 9:23 AM
jeff added inline comments to D15155: Make pmclog buffer pcpu and update constants.
May 11 2018, 9:21 AM

Apr 30 2018

jeff added a comment to D15233: make ucred thread private.
In D15233#321371, @mjg wrote:

I mean, is there any good reason to do this per-uid swap accounting to begin with? By default the overcommit flags are 0, which in particular means the limit is not enforced whatsoever. I think it would be acceptable for the time being to flip overcommit to a boot-time tunable and only play around with accounting if it gets enabled.

The general point here is that in the normal case this is just a pessimization, and fixing it requires quite some care, all while more pressing issues are here and the 12.0 releng process is around the corner.

Apr 30 2018, 9:45 PM
jeff added a comment to D15233: make ucred thread private.
In D15233#321171, @mjg wrote:
Apr 30 2018, 9:25 AM

Apr 29 2018

jeff added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#320936, @alc wrote:
In D15055#320927, @kib wrote:
In D15055#320924, @alc wrote:

Has anyone actually measured how often this optimization gets triggered? I'm just curious.

Even a plain multiuser boot does trigger this code several times; it was me who was sloppy with the testing of the last version.

In hindsight, the question that I should have asked is "How often does pmap_remove() encounter the zero page in the page table?" pmap_remove_pages() won't encounter the zero page because it's not mapped as a managed mapping. For "normal", i.e., writeable, virtual memory, I fear that this change is a pessimization. Without this change, on first touch, regardless of whether the access is a write, we will allocate a physical page and map it for write access. And so, this change would only increase the number of page faults. Moreover, in a multithreaded program, those page faults are going to have to perform a TLB shootdown, because we're changing the physical page being mapped. The cost of these additional page faults would have to be outweighed by the savings in the cases where pmap_remove() encountered a mapping to the zero page.

That said, I can see a variant of this change being an optimization for a more restricted set of cases, e.g., a read-only mapping of a file.

The optimization was requested by Jeff for a very specific benchmark, since Linux also does the same trick and apparently FreeBSD loses a lot because of this. See also the related D14917.
I think actual numbers will be provided when Jeff returns.

Apr 29 2018, 2:47 AM

Apr 8 2018

jeff added a comment to D14917: Detect reads from the hole..

We should think about what other filesystems could be trivially converted to this interface.

Apr 8 2018, 8:06 PM

Apr 7 2018

jeff added inline comments to D14994: Update zfs_arc_free_target after r329882..
Apr 7 2018, 7:52 PM

Apr 4 2018

jeff accepted D14893: VM page queue batching.
Apr 4 2018, 6:05 PM

Apr 3 2018

jeff added inline comments to D14893: VM page queue batching.
Apr 3 2018, 10:42 PM
jeff added a comment to D14891: msetdomain prototype (similar to mbind()).

I intend to commit this next week. I will note in a man page and in comments that it is experimental and the API may change. I think we're going to need more burn-in time with applications before 12.0 settles. I have a commitment from Netflix to sponsor that work.

Apr 3 2018, 10:34 PM

Apr 1 2018

jeff added inline comments to D14893: VM page queue batching.
Apr 1 2018, 8:41 PM
jeff committed rS331863: Add a uma cache of free pages in the DEFAULT freepool. This gives us.
Add a uma cache of free pages in the DEFAULT freepool. This gives us
Apr 1 2018, 4:50 AM
jeff closed D14905: per-cpu free page caching.
Apr 1 2018, 4:50 AM
jeff committed rS331862: Add the flag ZONE_NOBUCKETCACHE. This flag instructs UMA not to keep.
Add the flag ZONE_NOBUCKETCACHE. This flag instructs UMA not to keep
Apr 1 2018, 4:47 AM
jeff added a comment to D14917: Detect reads from the hole..

This version is much slower than the other version but still much faster than head. I think there is some bug, though, because after a while it started consuming space.

Apr 1 2018, 4:22 AM
jeff committed rS331861: Experimental support for msetdomain() a syscall similar to linux's mbind().
Experimental support for msetdomain() a syscall similar to linux's mbind()
Apr 1 2018, 4:12 AM

Mar 31 2018

jeff added a comment to D14917: Detect reads from the hole..

For what it's worth, I did test this with my sparse-file dd test, and we now well exceed the performance of Linux on this benchmark since we're using the same technique. Unfortunately it defeats a convenient way to create a lot of paging traffic.

Mar 31 2018, 1:42 PM
jeff added inline comments to D14917: Detect reads from the hole..
Mar 31 2018, 1:20 PM

Mar 30 2018

jeff added inline comments to D14905: per-cpu free page caching.
Mar 30 2018, 11:24 PM
jeff updated the diff for D14891: msetdomain prototype (similar to mbind()).

I addressed review feedback.

Mar 30 2018, 8:35 AM
jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 30 2018, 5:44 AM
jeff added inline comments to D14893: VM page queue batching.
Mar 30 2018, 5:06 AM
jeff created D14905: per-cpu free page caching.
Mar 30 2018, 3:47 AM
jeff committed rS331754: Re-implement the page free cache with UMA. Change the limits so the import….
Re-implement the page free cache with UMA. Change the limits so the import…
Mar 30 2018, 1:33 AM
jeff committed rS331753: Fix a couple of pageout control issues. Reset pass after we meet our target..
Fix a couple of pageout control issues. Reset pass after we meet our target.
Mar 30 2018, 1:31 AM

Mar 29 2018

jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 11:17 PM
jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 8:59 PM
jeff committed rS331748: Merge from head.
Merge from head
Mar 29 2018, 8:40 PM
jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 6:03 AM
jeff created D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 6:01 AM
jeff committed rS331723: Implement several enhancements to NUMA policies..
Implement several enhancements to NUMA policies.
Mar 29 2018, 2:55 AM
jeff closed D14839: NUMA policy enhancements.
Mar 29 2018, 2:55 AM

Mar 28 2018

jeff added inline comments to D14839: NUMA policy enhancements.
Mar 28 2018, 7:19 PM
jeff committed rS331698: Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all.
Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all
Mar 28 2018, 6:47 PM

Mar 27 2018

jeff committed rS331610: Backout r331606 until I can identify why it does not boot on some.
Backout r331606 until I can identify why it does not boot on some
Mar 27 2018, 10:21 AM
jeff committed rS331606: Only use CPUs in the domain the device is attached to for default.
Only use CPUs in the domain the device is attached to for default
Mar 27 2018, 3:37 AM
jeff closed D14838: By default bind interrupts to the set of CPUs in the domain they are connected to.
Mar 27 2018, 3:37 AM
jeff committed rS331605: Move vm_ndomains to vm.h where it can be used with a single header include.
Move vm_ndomains to vm.h where it can be used with a single header include
Mar 27 2018, 3:27 AM

Mar 26 2018

jeff committed rS331561: Fix a bug introduced in r329612 that slowly invalidates all clean bufs..
Fix a bug introduced in r329612 that slowly invalidates all clean bufs.
Mar 26 2018, 6:36 PM

Mar 25 2018

jeff added inline comments to D14839: NUMA policy enhancements.
Mar 25 2018, 11:23 PM
jeff added inline comments to D14838: By default bind interrupts to the set of CPUs in the domain they are connected to.
Mar 25 2018, 6:56 PM
jeff committed rS331529: Add missing file from r331508.
Add missing file from r331508
Mar 25 2018, 7:43 AM
jeff created D14839: NUMA policy enhancements.
Mar 25 2018, 1:26 AM
jeff created D14838: By default bind interrupts to the set of CPUs in the domain they are connected to.
Mar 25 2018, 1:19 AM

Mar 24 2018

jeff committed rS331508: Document new NUMA related syscalls and utility options..
Document new NUMA related syscalls and utility options.
Mar 24 2018, 11:59 PM
jeff added inline comments to D14835: Enhance support for Linux mremap system call.
Mar 24 2018, 7:36 PM

Mar 23 2018

jeff committed rS331450: Fix two compilation problems on non-amd64 architectures..
Fix two compilation problems on non-amd64 architectures.
Mar 23 2018, 6:24 PM
jeff committed rS331444: Re-implement vm_pageout_free_pages()..
Re-implement vm_pageout_free_pages().
Mar 23 2018, 5:58 PM

Mar 22 2018

jeff committed rS331377: Merge from user/jeff/numa.
Merge from user/jeff/numa
Mar 22 2018, 9:58 PM
jeff committed rS331372: Remove garbage diffs from merges and differences from head patches..
Remove garbage diffs from merges and differences from head patches.
Mar 22 2018, 7:50 PM
jeff committed rS331371: Merge from head.
Merge from head
Mar 22 2018, 7:39 PM
jeff committed rS331370: Attempt to improve the include situation for vm_ndomains..
Attempt to improve the include situation for vm_ndomains.
Mar 22 2018, 7:23 PM
jeff committed rS331369: Lock reservations with a dedicated lock in each reservation. Protect the.
Lock reservations with a dedicated lock in each reservation. Protect the
Mar 22 2018, 7:21 PM
jeff closed D14707: Fine grain lock reservations.
Mar 22 2018, 7:21 PM
jeff committed rS331368: Start witness much earlier in boot so that we can shrink the pend list and.
Start witness much earlier in boot so that we can shrink the pend list and
Mar 22 2018, 7:11 PM
jeff committed rS331367: Use read_mostly and alignment tags to eliminate or limit false sharing..
Use read_mostly and alignment tags to eliminate or limit false sharing.
Mar 22 2018, 7:07 PM
jeff added inline comments to D14707: Fine grain lock reservations.
Mar 22 2018, 5:22 AM
jeff updated the diff for D14707: Fine grain lock reservations.

Fix review feedback. Move witness initialization into the vm startup so that we can allocate large numbers of locks prior to bringing up malloc. Use fcmpset.

Mar 22 2018, 2:36 AM

Mar 20 2018

jeff accepted D14778: Elide the object lock in vfs_vmio_truncate() in the common case..
Mar 20 2018, 9:23 PM

Mar 19 2018

jeff added inline comments to D14707: Fine grain lock reservations.
Mar 19 2018, 6:56 PM
jeff added inline comments to D14707: Fine grain lock reservations.
Mar 19 2018, 6:24 PM

Mar 17 2018

jeff committed rS331106: Move the dirty queues inside the per-domain structure. This resolves a bug.
Move the dirty queues inside the per-domain structure. This resolves a bug
Mar 17 2018, 6:15 PM
jeff closed D14705: Make dirty queues a per-domain property.
Mar 17 2018, 6:15 PM

Mar 16 2018

jeff added inline comments to D14707: Fine grain lock reservations.
Mar 16 2018, 4:30 AM

Mar 15 2018

jeff created D14707: Fine grain lock reservations.
Mar 15 2018, 11:38 PM
jeff committed rS331024: Merge from head..
Merge from head.
Mar 15 2018, 8:26 PM
jeff committed rS331020: Correct print formats..
Correct print formats.
Mar 15 2018, 7:32 PM
jeff added inline comments to D14705: Make dirty queues a per-domain property.
Mar 15 2018, 7:31 PM
jeff created D14705: Make dirty queues a per-domain property.
Mar 15 2018, 7:27 PM
jeff committed rS331018: Eliminate pageout wakeup races. Take another step towards lockless.
Eliminate pageout wakeup races. Take another step towards lockless
Mar 15 2018, 7:23 PM
jeff closed D14612: Lock avoiding pageout wakeup algorithm.
Mar 15 2018, 7:23 PM