jeff (Jeffrey Roberson)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 6 2017, 12:45 AM (46 w, 12 h)

Recent Activity

Today

jeff added inline comments to D15985: Reduce unnecessary preemption, add a preemption knob for timeshare, fix missing NEEDRESCHED.
Sun, Jun 24, 12:23 AM
jeff added inline comments to D15985: Reduce unnecessary preemption, add a preemption knob for timeshare, fix missing NEEDRESCHED.
Sun, Jun 24, 12:14 AM
jeff created D15985: Reduce unnecessary preemption, add a preemption knob for timeshare, fix missing NEEDRESCHED.
Sun, Jun 24, 12:08 AM

Yesterday

jeff abandoned D13308: pageout fix.

Closing as it is already fixed.

Sat, Jun 23, 11:51 PM
jeff commandeered D13308: pageout fix.

This has been resolved in current with a more complete fix.

Sat, Jun 23, 11:50 PM
jeff added inline comments to D15976: Change vm_page_import() to avoid physical memory fragmentation.
Sat, Jun 23, 10:11 PM
jeff added a comment to D15977: Make the partpopq LRU sloppy to reduce contention.
In D15977#338278, @kib wrote:
In D15977#338276, @jeff wrote:

The other thing to consider is how accurate it needs to be, and how accurate it already is. Which thread was scheduled in the last tick is just as arbitrary as this LRU. You just want to replace something that hasn't been used in a long time. I would guess we're more often looking at things that haven't been touched in seconds, or at least hundreds of milliseconds, than within a few ticks. I can measure that, but it will of course be grossly dependent on the workload and the amount of memory.

If we do not need an LRU, maybe we do not need the partpopq list at all? E.g., keep bins of generations per tick, up to some limited number of bins.

Sat, Jun 23, 10:37 AM
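
A minimal sketch of the binned-generation idea floated above, using the sys/queue.h TAILQ macros. The struct and function names (resv, resv_touch, resv_oldest) and the bin sizing are hypothetical illustrations, not the real vm_reserv code; each bin would be TAILQ_INIT()ed at startup.

#include <sys/queue.h>

#define	NBINS		8	/* limited number of generation bins */
#define	TICKS_PER_BIN	64	/* coarseness of one bin, in ticks */

/* Hypothetical stand-in for a partially populated reservation. */
struct resv {
	TAILQ_ENTRY(resv)	r_link;
	int			r_bin;		/* bin currently holding this entry */
};

static TAILQ_HEAD(, resv) bins[NBINS];		/* TAILQ_INIT() each bin at startup */

/* Move a reservation into the bin for the current tick; no exact LRU is kept. */
static void
resv_touch(struct resv *r, int curtick)
{
	int bin = (curtick / TICKS_PER_BIN) % NBINS;

	TAILQ_REMOVE(&bins[r->r_bin], r, r_link);	/* assumes r is already binned */
	r->r_bin = bin;
	TAILQ_INSERT_TAIL(&bins[bin], r, r_link);
}

/* Pick a victim from (approximately) the oldest non-empty bin. */
static struct resv *
resv_oldest(int curtick)
{
	int bin = (curtick / TICKS_PER_BIN) % NBINS;
	struct resv *r;

	for (int i = 1; i <= NBINS; i++)
		if ((r = TAILQ_FIRST(&bins[(bin + i) % NBINS])) != NULL)
			return (r);
	return (NULL);
}

The point of the sketch is that a touch costs two constant-time list operations instead of keeping the rvq_partpop queue in strict LRU order, and eviction only needs an approximately oldest bin.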
jeff added inline comments to D15976: Change vm_page_import() to avoid physical memory fragmentation.
Sat, Jun 23, 10:13 AM
jeff added a comment to D15977: Make the partpopq LRU sloppy to reduce contention.
In D15977#338275, @kib wrote:

How random does the order of the partpopq become? Is there any way to evaluate it?

I mean, a tick is a lot, so instead of only doing it at each tick, delegate the limited (?) sorting of the rvq_partpop queues by lasttick to a daemon.

Sat, Jun 23, 10:04 AM
jeff added a comment to D15975: eliminate global serialization points in swap reserve & mmap.

Some great stuff in here. Let's peel off parts while we perfect the rest.

Sat, Jun 23, 9:28 AM
jeff created D15977: Make the partpopq LRU sloppy to reduce contention.
Sat, Jun 23, 8:54 AM
jeff committed rS335579: Sort uma_zone fields according to 64 byte cache line with adjacent line.
Sort uma_zone fields according to 64 byte cache line with adjacent line
Sat, Jun 23, 8:10 AM

Fri, Jun 15

jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.

kqueue and select both use fget_unlocked. If you want to propose files without references for single-threaded programs, you are free to do so. You should raise it on arch@ as there is no real owner in this area. This patch further reduces the differences between select and poll and reduces the number of atomics used in select, which I would argue is the more frequently used of the pair.

Fri, Jun 15, 10:55 PM

Thu, Jun 14

jeff added inline comments to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
Thu, Jun 14, 8:37 PM
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
In D15799#334063, @mjg wrote:
In D15799#334059, @jeff wrote:

I understand, but you can't guarantee the thread is the only thing accessing these file descriptors. Off the top of my head, the unix domain socket gc thread does fdrop()s on a task queue. It _may_ be possible to start to work around these things, but it becomes incredibly hard to reason about. And you'd have to audit everything else in the kernel that uses a file * to understand whether it imposes restrictions on things you can do single-threaded.

None of this is of any concern.

If the process is single threaded and the file descriptor table is not shared, it is the only entity which can modify its own fd table.

So in particular if it has a file installed, it holds a reference to keep it alive. Also nothing but curthread can drop it.

Let's say the same file object is being inspected by the unix gc thread; that is of no significance to this process. Let's say it fdrops: it does not matter, the process at hand still has its own ref.

The optimisation of not refing/unrefing files in single-threaded processes is implemented in Linux for all syscalls translating fd -> file.

The only caveat here is that you have to remember whether you grabbed the reference or not, since after you got one the other thread/whatever can disappear and you may transition to being single-threaded.

So the idiom is of this sort:
fp = fd2fp(fd, &need_fdrop);
....
fdrop_cond(fp, need_fdrop);

I would say there is pretty much no obfuscation in the caller, and it is possibly beneficial to apply this globally, not only here.

Thu, Jun 14, 8:34 PM
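
A self-contained sketch of that conditional-reference idiom. fd2fp() and fdrop_cond() are the hypothetical names from the comment above, and struct file and struct fdtable here are simplified stand-ins rather than the real kernel structures; only the shape of the check is the point.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the kernel's struct file / filedesc. */
struct file {
	atomic_int	f_count;	/* reference count */
};

struct fdtable {
	struct file   **fds;
	int		nfds;
	bool		shared;		/* fd table shared with another process? */
	int		nthreads;	/* threads that can reach this table */
};

/*
 * Translate fd -> file.  A single-threaded process with a private fd table
 * is the only entity that can close the descriptor, so its table reference
 * already keeps the file alive and the atomic can be skipped; otherwise
 * take a reference and tell the caller to drop it later.
 */
static struct file *
fd2fp(struct fdtable *fdt, int fd, bool *need_fdrop)
{
	struct file *fp;

	if (fd < 0 || fd >= fdt->nfds || (fp = fdt->fds[fd]) == NULL)
		return (NULL);
	if (fdt->nthreads == 1 && !fdt->shared) {
		*need_fdrop = false;
		return (fp);
	}
	atomic_fetch_add(&fp->f_count, 1);
	*need_fdrop = true;
	return (fp);
}

static void
fdrop_cond(struct file *fp, bool need_fdrop)
{
	if (need_fdrop)
		atomic_fetch_sub(&fp->f_count, 1);
}

A caller brackets its use exactly as in the quoted idiom: fp = fd2fp(fdt, fd, &need_fdrop); ... fdrop_cond(fp, need_fdrop);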
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.

As an aside, this is what select already does anyway. Poll was still using the big slock, but select was using the lockless fd support.

Thu, Jun 14, 6:58 AM
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
In D15799#334058, @mjg wrote:
In D15799#334056, @jeff wrote:
In D15799#334039, @mjg wrote:

This avoidably pessimizes the common case of single-threaded execution by adding an atomic op pair for each fd. The code can check whether the process is single-threaded and the fd table is not shared, in which case there is no need to grab a ref on files. This will end up being a minor pessimization for the multithreaded (and presumably rare) case while being a win for the single-threaded one.

On my machine it takes less than 40 clock cycles, or 11 ns, to do an atomic_add/atomic_fetchadd pair on a line that is in cache. I would really prefer that we did not obfuscate the code with fragile exceptions for a tiny bit of performance. There are far more profitable ways to improve our single-threaded perf in poll.

This patch converts a per-call lock/unlock pair into a ref/unref pair for each passed fd, so it does matter. More importantly, the vast majority of poll users are single-threaded, so this patch as presented is pessimal for real uses. I don't see how the proposal obfuscates the code in any significant way.

I definitely agree there are plenty of wins to get in this code regardless of the above.

Thu, Jun 14, 6:49 AM
jeff added inline comments to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
Thu, Jun 14, 6:43 AM
jeff added a comment to D15799: use fget_unlocked in pollscan, defer fdrop to sel/pollrescan and seltdclear.
In D15799#334039, @mjg wrote:

This avoidably pessimizes the common case of single-threaded execution by adding an atomic op pair for each fd. The code can check whether the process is single-threaded and the fd table is not shared, in which case there is no need to grab a ref on files. This will end up being a minor pessimization for the multithreaded (and presumably rare) case while being a win for the single-threaded one.

Thu, Jun 14, 6:38 AM

Mon, Jun 11

jeff added a comment to D15736: Implement fast path for malloc and free.

Does this not include the malloc.h changes for M_ZERO?

Mon, Jun 11, 1:30 AM

Thu, May 31

jeff accepted D15491: Eliminate the "pass" variable in the page daemon control loop..

I just realized that this change has the same goal as D13644. The discussion there is still relevant; in particular, with the PID controller change it now doesn't make sense for v_free_target to be set as high as it is: the controller will produce a positive output as soon as v_free_count < v_free_target, and the page daemon runs the controller once every 100ms. In other words, we start freeing pages more or less immediately after v_free_count drops below v_free_target. I plan to address that, but in a separate change.

Thu, May 31, 11:36 PM
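
To make the reasoning concrete, here is a purely illustrative proportional term (not the actual vm_pageout PID controller): run every 100 ms, it produces a positive reclaim target the moment the free count dips below the target, so a high v_free_target means the daemon starts freeing pages almost immediately.

/* Illustrative only; the names and the single P term are simplifications. */
static int
pageout_shortage(unsigned int free_count, unsigned int free_target)
{
	int error = (int)free_target - (int)free_count;

	/* Any deficit at all becomes work for the next 100 ms pass. */
	return (error > 0 ? error : 0);
}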

Wed, May 30

jeff added a comment to D15526: reduce overhead of entropy collection.

Given that there is trivially little, if any, entropy coming from mbufs, is there a reason we're leaving this callsite at all? Has anyone from secteam commented?

Wed, May 30, 11:05 PM

May 24 2018

jeff committed rS334127: Merge from head.
Merge from head
May 24 2018, 3:47 AM

May 17 2018

jeff accepted D15462: Fix a race in vm_page_pagequeue_lockptr()..
May 17 2018, 4:15 AM

May 13 2018

jeff added inline comments to D15365: simple preempt safe epoch API.
May 13 2018, 1:04 AM
jeff added a comment to D15010: add white listing for ZFS locking pairs that WITNESS can't report accurately and enable WITNESS by default in ZFS.
In D15010#316190, @mav wrote:

I am not closely familiar with WITNESS, so this is just a feeling: the long lists of blessed locks and their combinations promise a high chance of being forgotten in subsequent ZFS updates.

That's actually true of all documented lock orders. I don't have a good fix for that. However, the cost of lookup can be further reduced by putting the names in a red-black tree, reducing the overhead to O(log N) (see the sketch below).

At the very least it would be good to document how these new mechanisms should be used.

Yup. Seems pretty self-explanatory apart from the separate lists used for expediting negative lookups.

May 13 2018, 12:43 AM
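
A minimal sketch of the red-black-tree lookup suggested above, built on FreeBSD's sys/tree.h macros; the blessed_pair structure and blessed() function are hypothetical illustrations, not the actual WITNESS code.

#include <sys/tree.h>
#include <string.h>

struct blessed_pair {
	RB_ENTRY(blessed_pair)	 bp_link;
	const char		*bp_lock1;	/* lock held first */
	const char		*bp_lock2;	/* lock acquired second */
};

static int
blessed_cmp(struct blessed_pair *a, struct blessed_pair *b)
{
	int c;

	if ((c = strcmp(a->bp_lock1, b->bp_lock1)) != 0)
		return (c);
	return (strcmp(a->bp_lock2, b->bp_lock2));
}

static RB_HEAD(blessed_tree, blessed_pair) blessed_root =
    RB_INITIALIZER(&blessed_root);
RB_GENERATE_STATIC(blessed_tree, blessed_pair, bp_link, blessed_cmp);

/* O(log N) check whether a lock-order pair is whitelisted. */
static int
blessed(const char *l1, const char *l2)
{
	struct blessed_pair key = { .bp_lock1 = l1, .bp_lock2 = l2 };

	return (RB_FIND(blessed_tree, &blessed_root, &key) != NULL);
}

The whitelisted pairs would be inserted once at initialization with RB_INSERT(), so each check costs a single O(log N) lookup instead of a linear scan of the blessed list.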
jeff added a comment to D15275: Feature enhancements to pmcstat.
In D15275#322358, @kib wrote:
In D15275#322071, @kib wrote:

How is an event description from the json tables matched against the index from pmc_events.h?

It works so long as the FreeBSD version was named correctly. I have an aliases table in pmu_utils.c for things like UNHALTED_CORE_CYCLES and LLC_MISSES. If the table lookup fails, it will just use the default sampling rate that is used on HEAD. Ultimately, on supported architectures I'd like to switch from using the ad hoc defines in pmc_events.h to using the json tables from Intel, IBM, and Cavium.

So did you verify that the names match? What is the plan for non-matching names?

Mostly the names match. Switching to using the Intel names in the tables will fix it for good.

Also I think that importing tables in userspace is really a half measure. Right now we must update both kernel and userspace to get a new event added, and in the course of it we have to break the pmc(4) ABI. IMO the tables should live in the kernel; that is not a problem when hwpmc(4) is a module, or hwpmc(4) can be a minimal core with microarch-specific submodules loaded as needed. Userspace should fetch the table from the kernel and use kernel handles for events.

I don't agree, and neither does Intel (see Andi's mail). We have a table of bits we can pass to the kernel as an ioctl. One could claim that defining them in the kernel buys some safety or compatibility guarantees, but that doesn't actually hold water in practice. Being able to add newly exposed PMCs without having to modify the kernel is a step forward, not a half measure. There are dozens if not hundreds of non-public PMCs on Zen processors. With this mechanism we could simply add a new table instead of laboriously, and in error-prone fashion, copying from the docs.

May 13 2018, 12:19 AM
jeff added a comment to D15337: Add support for higher resolution timestamps.

My feeling is that ticks is unlikely to go any faster on general-purpose kernels, and some technique like this is inevitable as we continue to scale link performance. Some slight extra CPU time is a good trade-off for also eliminating weird rounding conditions and scaling factors. Overall I support this work going forward.

May 13 2018, 12:16 AM

May 11 2018

jeff added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#324256, @alc wrote:
In D15055#323469, @kib wrote:

In fact I started with ft.A.x when I did the testing, but there it was even less interesting than for ft.C.x. The counter's increment was about 1 or 2. This is why I changed to C and also asked about tuning.

I can re-test but I do not see the point.

I agree.

May 11 2018, 9:27 AM
jeff accepted D14917: Detect reads from the hole..

It would be nice to implement it in other filesystems that support sparse files.

May 11 2018, 9:23 AM
jeff added inline comments to D15155: Make pmclog buffer pcpu and update constants.
May 11 2018, 9:21 AM

Apr 30 2018

jeff added a comment to D15233: make ucred thread private.
In D15233#321371, @mjg wrote:

I mean, is there any good reason to do this per-uid swap accounting to begin with? By default the overcommit flags are 0, which in particular means the limit is not enforced whatsoever. I think it would be acceptable for the time being to flip overcommit to a boot-time tunable and only play around with accounting if it gets enabled.

The general point here is that in the normal case this is just a pessimization, and fixing it requires quite some care, all while more pressing issues are here and the 12.0 releng process is around the corner.

Apr 30 2018, 9:45 PM
jeff added a comment to D15233: make ucred thread private.
In D15233#321171, @mjg wrote:
Apr 30 2018, 9:25 AM

Apr 29 2018

jeff added a comment to D15055: Map constant zero page on read faults which touch non-existing anon page..
In D15055#320936, @alc wrote:
In D15055#320927, @kib wrote:
In D15055#320924, @alc wrote:

Has anyone actually measured how often this optimization gets triggered? I'm just curious.

Even a plain multiuser boot does trigger this code several times; it was me who was sloppy with the testing of the last version.

In hindsight, the question that I should have asked is "How often does pmap_remove() encounter the zero page in the page table?" pmap_remove_pages() won't encounter the zero page because it's not mapped as a managed mapping. For "normal", i.e., writeable, virtual memory, I fear that this change is a pessimization. Without this change, on first touch, regardless of whether the access is a write, we will allocate a physical page and map it for write access. And so, this change would only increase the number of page faults. Moreover, in a multithreaded program, those page faults are going to have to perform a TLB shootdown, because we're changing the physical page being mapped. The cost of these additional page faults would have to be outweighed by the savings in the cases where pmap_remove() encountered a mapping to the zero page.

That said, I can see a variant of this change being an optimization for a more restricted set of cases, e.g., a read-only mapping of a file.

The optimization was requested by Jeff for a very specific benchmark, since Linux also does the same trick and apparently FreeBSD loses a lot because of this. See also the related D14917.
I think actual numbers will be provided when Jeff returns.

Apr 29 2018, 2:47 AM

Apr 8 2018

jeff added a comment to D14917: Detect reads from the hole..

We should think about what other filesystems could be trivially converted to this interface.

Apr 8 2018, 8:06 PM

Apr 7 2018

jeff added inline comments to D14994: Update zfs_arc_free_target after r329882..
Apr 7 2018, 7:52 PM

Apr 4 2018

jeff accepted D14893: VM page queue batching.
Apr 4 2018, 6:05 PM

Apr 3 2018

jeff added inline comments to D14893: VM page queue batching.
Apr 3 2018, 10:42 PM
jeff added a comment to D14891: msetdomain prototype (similar to mbind()).

I intend to commit this next week. I will note in a man page and in comments that it is experimental and the API may change. I think we're going to need more burn-in time with applications before 12.0 settles. I have a commitment from Netflix to sponsor that work.

Apr 3 2018, 10:34 PM

Apr 1 2018

jeff added inline comments to D14893: VM page queue batching.
Apr 1 2018, 8:41 PM
jeff committed rS331863: Add a uma cache of free pages in the DEFAULT freepool. This gives us.
Add a uma cache of free pages in the DEFAULT freepool. This gives us
Apr 1 2018, 4:50 AM
jeff closed D14905: per-cpu free page caching.
Apr 1 2018, 4:50 AM
jeff committed rS331862: Add the flag ZONE_NOBUCKETCACHE. This flag instructs UMA not to keep.
Add the flag ZONE_NOBUCKETCACHE. This flag instructs UMA not to keep
Apr 1 2018, 4:47 AM
jeff added a comment to D14917: Detect reads from the hole..

This version is much slower than the other version but still much faster than head. I think there is some bug, though, because after a while it started consuming space.

Apr 1 2018, 4:22 AM
jeff committed rS331861: Experimental support for msetdomain() a syscall similar to linux's mbind().
Experimental support for msetdomain() a syscall similar to linux's mbind()
Apr 1 2018, 4:12 AM

Mar 31 2018

jeff added a comment to D14917: Detect reads from the hole..

For what it's worth, I did test this with my sparse-file dd test, and we now well exceed the performance of Linux on this benchmark since we're using the same technique. Unfortunately it defeats a convenient way to create a lot of paging traffic.

Mar 31 2018, 1:42 PM
jeff added inline comments to D14917: Detect reads from the hole..
Mar 31 2018, 1:20 PM

Mar 30 2018

jeff added inline comments to D14905: per-cpu free page caching.
Mar 30 2018, 11:24 PM
jeff updated the diff for D14891: msetdomain prototype (similar to mbind()).

I addressed review feedback.

Mar 30 2018, 8:35 AM
jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 30 2018, 5:44 AM
jeff added inline comments to D14893: VM page queue batching.
Mar 30 2018, 5:06 AM
jeff created D14905: per-cpu free page caching.
Mar 30 2018, 3:47 AM
jeff committed rS331754: Re-implement the page free cache with UMA. Change the limits so the import….
Re-implement the page free cache with UMA. Change the limits so the import…
Mar 30 2018, 1:33 AM
jeff committed rS331753: Fix a couple of pageout control issues. Reset pass after we meet our target..
Fix a couple of pageout control issues. Reset pass after we meet our target.
Mar 30 2018, 1:31 AM

Mar 29 2018

jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 11:17 PM
jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 8:59 PM
jeff committed rS331748: Merge from head.
Merge from head
Mar 29 2018, 8:40 PM
jeff added inline comments to D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 6:03 AM
jeff created D14891: msetdomain prototype (similar to mbind()).
Mar 29 2018, 6:01 AM
jeff committed rS331723: Implement several enhancements to NUMA policies..
Implement several enhancements to NUMA policies.
Mar 29 2018, 2:55 AM
jeff closed D14839: NUMA policy enhancements.
Mar 29 2018, 2:55 AM

Mar 28 2018

jeff added inline comments to D14839: NUMA policy enhancements.
Mar 28 2018, 7:19 PM
jeff committed rS331698: Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all.
Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all
Mar 28 2018, 6:47 PM

Mar 27 2018

jeff committed rS331610: Backout r331606 until I can identify why it does not boot on some.
Backout r331606 until I can identify why it does not boot on some
Mar 27 2018, 10:21 AM
jeff committed rS331606: Only use CPUs in the domain the device is attached to for default.
Only use CPUs in the domain the device is attached to for default
Mar 27 2018, 3:37 AM
jeff closed D14838: By default bind interrupts to the set of CPUs in the domain they are connected to.
Mar 27 2018, 3:37 AM
jeff committed rS331605: Move vm_ndomains to vm.h where it can be used with a single header include.
Move vm_ndomains to vm.h where it can be used with a single header include
Mar 27 2018, 3:27 AM

Mar 26 2018

jeff committed rS331561: Fix a bug introduced in r329612 that slowly invalidates all clean bufs..
Fix a bug introduced in r329612 that slowly invalidates all clean bufs.
Mar 26 2018, 6:36 PM

Mar 25 2018

jeff added inline comments to D14839: NUMA policy enhancements.
Mar 25 2018, 11:23 PM
jeff added inline comments to D14838: By default bind interrupts to the set of CPUs in the domain they are connected to.
Mar 25 2018, 6:56 PM
jeff committed rS331529: Add missing file from r331508.
Add missing file from r331508
Mar 25 2018, 7:43 AM
jeff created D14839: NUMA policy enhancements.
Mar 25 2018, 1:26 AM
jeff created D14838: By default bind interrupts to the set of CPUs in the domain they are connected to.
Mar 25 2018, 1:19 AM

Mar 24 2018

jeff committed rS331508: Document new NUMA related syscalls and utility options..
Document new NUMA related syscalls and utility options.
Mar 24 2018, 11:59 PM
jeff added inline comments to D14835: Enhance support for Linux mremap system call.
Mar 24 2018, 7:36 PM

Mar 23 2018

jeff committed rS331450: Fix two compilation problems on non-amd64 architectures..
Fix two compilation problems on non-amd64 architectures.
Mar 23 2018, 6:24 PM
jeff committed rS331444: Re-implement vm_pageout_free_pages()..
Re-implement vm_pageout_free_pages().
Mar 23 2018, 5:58 PM

Mar 22 2018

jeff committed rS331377: Merge from user/jeff/numa.
Merge from user/jeff/numa
Mar 22 2018, 9:58 PM
jeff committed rS331372: Remove garbage diffs from merges and differences from head patches..
Remove garbage diffs from merges and differences from head patches.
Mar 22 2018, 7:50 PM
jeff committed rS331371: Merge from head.
Merge from head
Mar 22 2018, 7:39 PM
jeff committed rS331370: Attempt to improve the include situation for vm_ndomains..
Attempt to improve the include situation for vm_ndomains.
Mar 22 2018, 7:23 PM
jeff committed rS331369: Lock reservations with a dedicated lock in each reservation. Protect the.
Lock reservations with a dedicated lock in each reservation. Protect the
Mar 22 2018, 7:21 PM
jeff closed D14707: Fine grain lock reservations.
Mar 22 2018, 7:21 PM
jeff committed rS331368: Start witness much earlier in boot so that we can shrink the pend list and.
Start witness much earlier in boot so that we can shrink the pend list and
Mar 22 2018, 7:11 PM
jeff committed rS331367: Use read_mostly and alignment tags to eliminate or limit false sharing..
Use read_mostly and alignment tags to eliminate or limit false sharing.
Mar 22 2018, 7:07 PM
jeff added inline comments to D14707: Fine grain lock reservations.
Mar 22 2018, 5:22 AM
jeff updated the diff for D14707: Fine grain lock reservations.

Fix review feedback. Move witness initialization into the vm startup so that we can allocate large numbers of locks prior to bringing up malloc. Use fcmpset.

Mar 22 2018, 2:36 AM

Mar 20 2018

jeff accepted D14778: Elide the object lock in vfs_vmio_truncate() in the common case..
Mar 20 2018, 9:23 PM

Mar 19 2018

jeff added inline comments to D14707: Fine grain lock reservations.
Mar 19 2018, 6:56 PM
jeff added inline comments to D14707: Fine grain lock reservations.
Mar 19 2018, 6:24 PM

Mar 17 2018

jeff committed rS331106: Move the dirty queues inside the per-domain structure. This resolves a bug.
Move the dirty queues inside the per-domain structure. This resolves a bug
Mar 17 2018, 6:15 PM
jeff closed D14705: Make dirty queues a per-domain property.
Mar 17 2018, 6:15 PM

Mar 16 2018

jeff added inline comments to D14707: Fine grain lock reservations.
Mar 16 2018, 4:30 AM

Mar 15 2018

jeff created D14707: Fine grain lock reservations.
Mar 15 2018, 11:38 PM
jeff committed rS331024: Merge from head..
Merge from head.
Mar 15 2018, 8:26 PM
jeff committed rS331020: Correct print formats..
Correct print formats.
Mar 15 2018, 7:32 PM
jeff added inline comments to D14705: Make dirty queues a per-domain property.
Mar 15 2018, 7:31 PM
jeff created D14705: Make dirty queues a per-domain property.
Mar 15 2018, 7:27 PM
jeff committed rS331018: Eliminate pageout wakeup races. Take another step towards lockless.
Eliminate pageout wakeup races. Take another step towards lockless
Mar 15 2018, 7:23 PM
jeff closed D14612: Lock avoiding pageout wakeup algorithm.
Mar 15 2018, 7:23 PM