- stop re-reading the pcpu pointer
- make use of cpus_fence_seq_cst_issue in !SMP case
- force re-read of the curthread pointer
Sep 24 2019
Sep 23 2019
Sep 22 2019
In D21646#473051, @kib wrote: Several filesystems cache its root vnode, which becomes redundant with the caching done in mnt. I believe it is worth to make simultaneous cleanup, in the separate patch.
- panic if the found vnode does not match ours and is not doomed
That would be avoidably wasteful for this code. Part of the motivation is to be able to get periodic snapshots of lockprof stats without disturbing the actual workload on top of the existing overhead.
Sep 21 2019
Sep 19 2019
Sep 17 2019
In D21681#473203, @jmallett wrote: I would wonder if it makes sense to implement these in terms of cmpset operations on register-sized quantities in C with the appropriate arithmetic shifts, rather than doing the assembly sort of half by-hand. That's maybe less optimal, but do we have performance-critical 8- and 16-bit atomics in performance-critical areas? I've always been skeptical of smaller-than-register atomics and tend to resist them, so that may just be a personal bias. I can't speak to correctness beyond that.
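For illustration, the approach suggested in the quote above (an 8-bit cmpset built from a compare-and-swap on the containing 32-bit word, using shifts and masks) could look roughly like the sketch below. It is not code from the review: `cmpset_8_via_32` is a hypothetical name, it uses C11 `<stdatomic.h>` rather than the kernel's atomic(9) primitives, and it addresses the byte as a logical lane (lane 0 is the least significant byte), so it sidesteps the endianness-dependent mapping from a byte address to a shift that a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: compare-and-set one 8-bit
 * lane of a 32-bit word by looping on a word-sized CAS.
 * lane 0 is bits 0-7, lane 3 is bits 24-31.
 */
static bool
cmpset_8_via_32(_Atomic uint32_t *word, unsigned lane, uint8_t expect,
    uint8_t desired)
{
	uint32_t mask, old, newval;
	unsigned shift;

	assert(lane < 4);
	shift = lane * 8;
	mask = (uint32_t)0xff << shift;
	old = atomic_load_explicit(word, memory_order_relaxed);
	for (;;) {
		/* Fail if the target lane does not hold the expected value. */
		if ((uint8_t)((old & mask) >> shift) != expect)
			return (false);
		/* Splice the new byte in, leaving the other lanes untouched. */
		newval = (old & ~mask) | ((uint32_t)desired << shift);
		if (atomic_compare_exchange_weak(word, &old, newval))
			return (true);
		/*
		 * The CAS failed (spuriously, or because some lane changed)
		 * and refreshed 'old'; re-check the target lane and retry.
		 */
	}
}
```

Note the cost this trades for: a concurrent update to a neighbouring byte forces a retry even though the target byte was untouched, which is one reason such an emulation may be less optimal than a native sub-word atomic.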
- make vn_start_write_nb static
- add a comment explaining error handling
- assert LK_NOWAIT passed if LK_IGNORE_INTERLOCK is used
I think it's cleaner to handle this separately, mostly because this function cannot do any cleanup, so it behaves quite differently compared to vn_start_write. If you insist I can fold it in.
- plug a blatant mp ref leak
- rebase
- add an async-signal-safe implementation
- fix return value to always be 0
Sep 16 2019
So I just checked and packet.net has 24-way EPYC boxes. I can get one no problem (paid by FF). Then I can get some data points - at least buildkernel on tmpfs and will-it-scale.
- rebase
- drop unnecessary wait in vfs_cache_root_clear
- add a comment about fence pairing in vfs_cache_root_fallback
Most of the slowness stemmed from the lack of NUMA support, meaning that stores to the "wrong" pages were sent across the interconnect. I redid the tests some time ago on the same hardware and there is next to no difference in real time (just more cache misses).
Take over this revision after a talk with @jhibbits.
- assorted fixes
- add vfs_mp_count_add/sub_pcpu in place of hand-rolled manipulation. I don't care for this name, but I think the approach is fine
- If fdwalk lands in the tree, it will be an async-signal safe implementation.
- It will always work for the current process (i.e. when it requests the data for itself), in fact my suggested kernel interface only allows this mode.
Sep 15 2019
I already noted pmcstat needs serious help. I think adding this (with the optional key swap) solves the immediate problem of not having the feature while not having to spend a significant amount of time writing a new tool from scratch.
- fix manpage
- drop redundant sym assignment from the original
- vdropl before vn_finished_write
- fix parens placement
I think pmcstat needs serious help and the compatibility change (which I highly doubt hurts anyone) is a step towards that goal. I want a usable tool without the need to rewrite the current one, if it can be helped. Should someone want to write something new from scratch I'm happy to see it, but I don't want to do the work myself.
The option was added at my request and I highly doubt anyone else is using it (including the big user). I highly doubt any user will get upset over this change, but worst case I can drop the swap.
I think it fits better. 'A' for address, 'I' for instruction. If there is a time to change it, it is now. I would not do it if pmcstat was actually in widespread use and people had the opt ingrained in their fingers.
Sep 14 2019
- remove the slowpath label
- rebase on top of D21425
- rebase on top of the custom barrier
- rebase on top of the custom barrier
- release the mountlist lock after leaving the section
No, I wanted to have the approach sorted out before I post a rebased patch.
- remove spurious 'to' from vfs_op_thread_enter comment and change the ending a little
- change comment in vfs_op_enter
- rename _FASTPATH to _UNLOCKED
Sep 13 2019
I don't care for the name for the most part, but perhaps something not sounding so refcounty will make it easier to not accidentally use the wrong API. Alternatively, perhaps it's time we de-u_int all consumers and have something of this sort:
So I had a closer look and think that the introduction of support for waiting was an abuse of this API and should either be moved to another one or get dedicated routines in this one. Most consumers (including very frequently used ones like struct file) don't care for it and avoidably pay the price for its existence.
Sep 12 2019
Sep 11 2019
- see the rewritten summary
Sep 10 2019
- rework on top of stock head to remove the dependency on devfs patch
- .. the type var should probably move and will be in a separate review