Query: Advanced Search

amd64: stop re-reading curpc on subyte/suword

fd: use racct_set_unlocked

racct: add RACCT_ENABLED macro and racct_set_unlocked

fd: try do less work with the lock in dup

vm: use fcmpset for vmspace reference counting

Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last

refcount: remove a stale comment about conditional ref/unref routines

proc: when exiting move to zombproc before taking proctree

Manage process-related IDs with bitmaps

Annotate Giant drop/pickup macros with __predict_false

unr64: use locked variant if not __LP64__

sx: retire SX_NOADAPTIVE


Can you grab a flamegraph from such a test? Also, can you compare this against https://reviews.freebsd.org/D17992 ?

I did basic tests with changing the alignment of src and slowdowns were very small compared to similarly misaligned dst, at least on EPYC. I may take a closer look later.

amd64: align target memmove buffer to 16 bytes before using rep movs

amd64: handle small memmove buffers with overlapping stores

amd64: remove stale attribution for memmove work

amd64: tidy up copying backwards in memmove

vfs: fix i386 build after r341220

cache: retire cache_enter compat schim

audit: predict AUDITING_TD as false

vfs: drop spurious memcpy in stat

fd: unify fd range check across the routines

audit: change audit_syscalls_enabled type to bool

Convert racct_enable to bool and annotate as __read_frequently

Deinline racct throttling out of syscall exit path.

Annotate td_cowgen check as unlikely.

proc: create a dedicated lock for zombproc to ligthen the load on allproc_lock

Once more, I don't have a full picture, so I can't give a proper review.

Revert "fork: fix use-after-free with vfork"

Annotate TDP_RFPPWAIT as unlikely.

fork: remove avoidable proc lock/unlock pair

fork: fix use-after-free with vfork

remove now spurious cv_broadcast(&p->p_pwait);

In D17992#387922, @kristof wrote:

Adding more rings won't really help any more than making this one ring larger. That merely increases the queue length between the multiple pf threads, and the pfsync processing code (which is still single-threaded) in pfsync_msg_intr() and pfsyncintr().

strings: unbreak the build after r340746

uipc_usrreq: fix inode number assignment

proc: update list manipulation comment on process exit

uipc_shm: use unr64 for inode numbers

proc: convert pfind & friends to use pidhash locks and other cleanup

proc: implement pid hash locks and an iterator

MFC r340108 and r340149

So both the ring and the swi kicking code are significant players. I think a simple and probably good enough solution would be to just add more rings, perhaps based on the number of hardware threads. Assuming the traffic is hashed to distribute among them, the rings could mostly remain unshared with unrelated threads. Sending the traffic out would just combine data from all rings. Kicking can also be avoided in a simple manner. You can add a variable signifying the frequency of wakeups. Then increase the frequency based on the traffic, and past a certain threshold you stop kicking the swi. It has to decay so that if there is no traffic, the code goes back to wakeups once a second (or whatever).
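The wakeup-frequency idea in the comment above can be sketched roughly as follows. This is only an illustration of one possible reading, not code from the review: the names `wakeup_gov`, `gov_update`, `gov_should_kick` and all the constants are hypothetical, the doubling/halving policy is an assumption, and the sketch is single-threaded (a real version would need atomics or per-ring state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables; none of these come from the review. */
#define GOV_MIN_HZ     1u     /* idle: consumer wakes up once a second */
#define GOV_MAX_HZ     1024u  /* upper bound on the polling frequency */
#define GOV_NOKICK_HZ  64u    /* at or above this, producers stop kicking */

struct wakeup_gov {
	uint32_t hz;          /* current consumer wakeup frequency */
};

static void
gov_init(struct wakeup_gov *g)
{
	g->hz = GOV_MIN_HZ;
}

/*
 * Called by the consumer after each pass over the ring(s) with the
 * number of items it drained. Traffic doubles the frequency (capped);
 * an empty pass halves it, so an idle system decays back to GOV_MIN_HZ.
 */
static void
gov_update(struct wakeup_gov *g, uint32_t drained)
{
	assert(g->hz >= GOV_MIN_HZ && g->hz <= GOV_MAX_HZ);
	if (drained > 0) {
		if (g->hz < GOV_MAX_HZ)
			g->hz *= 2;
	} else if (g->hz > GOV_MIN_HZ) {
		g->hz /= 2;
	}
}

/*
 * Called by a producer after enqueueing. When the consumer already
 * polls often enough, the explicit kick (e.g. a swi_sched() call in
 * the kernel) is skipped.
 */
static bool
gov_should_kick(const struct wakeup_gov *g)
{
	return (g->hz < GOV_NOKICK_HZ);
}
```

In this sketch the consumer would reschedule itself every `1/hz` seconds, so under sustained traffic producers never touch the wakeup path, while an idle system falls back to the once-a-second behaviour.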

MFC r339531,r339579,r340252,r340463,r340464,340472,r340587

tmpfs: use unr64 for inode numbers

So I retested with your change. A failing build is indeed fixed. Perhaps my original change had a typo or similar. Thanks.

Are you sure that on your box a *failing* -DNO_CLEAN starts building again with this change? Are you using meta-mode? I had a similar change locally and the build kept failing anyway, no meta-mode though.

address feedback
drop killpg changes

proc: always store parent pid in p_oppid

amd64: handle small memset buffers with overlapping stores

Yes. These patches are stale and kind of crap. I have a WIP replacement which I'll probably post in a new review, we will see.

amd64: sync up libc memset with the kernel version

amd64: convert libc bzero to a C func to avoid future bloat

I have no doubt there is an improvement, just saying it is still slower than it can be and unless this uncovered a new major bottleneck, the ring manipulation is the new hotspot.

I think the approach taken here is iffy. Basic problem with this is that even if there is no lock contention anymore, you are still suffering from bouncing cache lines. Also swi_sched probably does not appreciate being called very often.

locks: plug warnings about unitialized variables

amd64: align memset buffers to 16 bytes before using rep stos

address feedback
regen against head
I did not change the condition in proc_realparent as it goes way over the 80 char limit

I don't think the same problem is a concern for ps/top, so it can be discussed further in a different review. I can simply drop killpg conversion from the patchset (and remove the spurious curly braces).

killpg is already unreliable. If the child is spotted as PRS_NEW it will be explicitly omitted, so I don't think this constitutes a regression in functionality.

I believe the problem is roughly the same as before. Passed buffers are often already heavily misaligned, so movs here trip over the same words anyway.


Dec 8 2018

Dec 7 2018

Dec 5 2018

Dec 1 2018

Nov 30 2018

Nov 29 2018

Nov 28 2018

Nov 23 2018

Nov 22 2018

Nov 21 2018

Nov 20 2018

Nov 18 2018

Nov 16 2018

Nov 15 2018

Nov 14 2018

Nov 13 2018

Nov 8 2018

Nov 6 2018

Nov 4 2018

Nov 3 2018

Nov 2 2018

Nov 1 2018

Oct 31 2018

Oct 24 2018

Oct 23 2018
