I've made it work with additional changes, but don't have a workload + hardware combo that benefits from the reduction in cache misses.
Apr 6 2018
Apr 5 2018
Apr 4 2018
Mar 20 2018
Feb 27 2018
Feb 24 2018
Update: Although it works fine with SOL through ipmitool, iKVM/HTML5 does _not_ work once it drops you to the db> prompt
Please pull in the keyboard bits as well.
Feb 21 2018
Just a "works for me." I haven't had time to dig in to what's holding the lock, but the inability to enter ddb consistently when not entering by way of a proper panic has been a serious issue for me for at least the past 6 months. I don't know what others are doing to never see this.
Feb 15 2018
LGTM
Dec 19 2017
This seems to fix dumps from panics caused by in-kernel page faults.
Nov 12 2017
Oct 31 2017
In D12101#264936, @gallatin wrote: So let me try to sum up in my own words what's going on here:
- We want to be able to sleep when talking to the hardware, rather than poll
- We want to change the driver lock from a mutex to an sx lock to allow sleeping when talking to h/w
- We can't just change to an SX lock and be done because the multicast code holds a mutex (and maybe an RW lock) when calling ifioctl
- We noticed that the multicast code does not check return values from driver ioctls, so it seems safe to defer it into another context
- We decided to defer multicast (and promisc, since it can be coming from the multicast code) into another context at the iflib level
While the best solution would obviously be to refactor the multicast code to avoid holding a lock, I generally agree with the approach, and think it is very clever.
However, if we're going to defer the multicast code into another context, can we please consider *NOT* doing it at the iflib level? I'm thinking we could re-work this hack so it works for all drivers if we made an mcast kproc and submitted multicast add/del requests to a global mcast work queue. Then the deferred multicast kproc can call down into drivers with no locks held. Then we assert no locks in the ifioctl entry points.
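The work-queue idea in the comment above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not kernel code: `mcast_req`, `mcast_queue`, `mcast_submit`, `mcast_drain`, `driver_mcast_update` and the `locks_held` counter are all hypothetical names invented here, and the locking that a real shared queue would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request types; not the kernel's real definitions. */
enum mcast_op { MCAST_ADD, MCAST_DEL };

struct mcast_req {
    enum mcast_op op;
    unsigned char addr[6];      /* multicast MAC address */
};

#define MCAST_QLEN 16

/* Fixed-size FIFO standing in for a global multicast work queue. */
struct mcast_queue {
    struct mcast_req reqs[MCAST_QLEN];
    size_t head, tail, count;
};

/* Stand-in for "number of locks held by the current context". */
static int locks_held;

/* Counts requests that reached the hypothetical driver entry point. */
static int driver_calls;

/* Hypothetical driver entry point: must be entered with no locks held. */
static void driver_mcast_update(const struct mcast_req *req)
{
    assert(locks_held == 0);
    (void)req;
    driver_calls++;
}

/* Called from the multicast code, possibly with a lock held: enqueue only. */
static int mcast_submit(struct mcast_queue *q, const struct mcast_req *req)
{
    if (q->count == MCAST_QLEN)
        return -1;              /* queue full; the caller decides what to do */
    q->reqs[q->tail] = *req;
    q->tail = (q->tail + 1) % MCAST_QLEN;
    q->count++;
    return 0;
}

/* Called from the deferred context; returns the number of requests handled. */
static int mcast_drain(struct mcast_queue *q)
{
    int handled = 0;

    while (q->count > 0) {
        driver_mcast_update(&q->reqs[q->head]);
        q->head = (q->head + 1) % MCAST_QLEN;
        q->count--;
        handled++;
    }
    return handled;
}
```

A caller would zero-initialize a `struct mcast_queue`, call `mcast_submit()` from the locked path, and call `mcast_drain()` later from the worker context once no locks are held.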
Sep 11 2017
In D12295#255719, @kevin.bowling_kev009.com wrote: This is silly to bikeshed. 12.0 won't be released for at least a year and -CURRENT is for API breaks. Users can run the current -STABLE trees in a jail if they have any shitware they need to support for a long time. It's also trivial to maintain the patch in a corporate tree if needed, or a compat lib that doesn't live in src/.
Sep 6 2017
@rgrimes what is the state of this review?
Sep 4 2017
In D12140#252557, @johalun0_gmail.com wrote: I did a clean world+kernel build on the iflib/netmap_rx branch.
Running pkt-gen -i em1 -f rx on the receiver and pkt-gen -i em1 -f tx on the sender results in a crash on the receiver
(sender and receiver are two different machines connected back to back on em1):
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe03668b2510
vpanic() at vpanic+0x19c/frame 0xfffffe03668b2590
kassert_panic() at kassert_panic+0x126/frame 0xfffffe03668b2600
em_isc_rxd_pkt_get() at em_isc_rxd_pkt_get+0xf1/frame 0xfffffe03668b2670
iflib_netmap_rxsync() at iflib_netmap_rxsync+0x235/frame 0xfffffe03668b2770
netmap_poll() at netmap_poll+0x79c/frame 0xfffffe03668b2870
freebsd_netmap_poll() at freebsd_netmap_poll+0x32/frame 0xfffffe03668b28a0
devfs_poll_f() at devfs_poll_f+0x7f/frame 0xfffffe03668b2900
kern_poll() at kern_poll+0x4fc/frame 0xfffffe03668b2aa0
sys_poll() at sys_poll+0x50/frame 0xfffffe03668b2ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe03668b2bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe03668b2bf0
--- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x800daf2aa, rsp = 0x7fffdfff9e78, rbp = 0x7fffdfff9eb0 ---
KDB: enter: panic
[ thread pid 828 tid 100177 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
However, if I limit the packet rate like so:
pkt-gen -i em1 -f rx on the receiver and pkt-gen -i em1 -f tx -R 100 on the sender
the receiver stops receiving packets after ~600 packets or so (6 batches received); after that, the rate on the receiver goes to zero but it does not crash. Breaking the process might result in a crash:
cpuid = 2
time = 1504082223
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02398c0730
vpanic() at vpanic+0x19c/frame 0xfffffe02398c07b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe02398c0820
iflib_fl_bufs_free() at iflib_fl_bufs_free+0x1c2/frame 0xfffffe02398c0870
iflib_stop() at iflib_stop+0x478/frame 0xfffffe02398c08c0
iflib_netmap_register() at iflib_netmap_register+0x1a4/frame 0xfffffe02398c0900
netmap_hw_reg() at netmap_hw_reg+0x2c/frame 0xfffffe02398c0930
netmap_do_unregif() at netmap_do_unregif+0x16a/frame 0xfffffe02398c0960
netmap_priv_delete() at netmap_priv_delete+0x31/frame 0xfffffe02398c0980
netmap_dtor() at netmap_dtor+0x2b/frame 0xfffffe02398c09a0
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe02398c09c0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe02398c09f0
closef() at closef+0x1f5/frame 0xfffffe02398c0a80
closefp() at closefp+0x9f/frame 0xfffffe02398c0ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe02398c0bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe02398c0bf0
The igb0 interface works flawlessly at 1.4 Mpps, so the problem is limited to em.
Aug 30 2017
Aug 29 2017
In D12132#252205, @mjg wrote: since this is a spin mutex even failed trylock adds a trip through disabling/enabling preemption + interrupts. add a cacheline fetch. "fortunately" it so happens that spin trylock does not dirty it if it sees a taken lock so this bit is not extremely slow. the sum is definitely significantly slower than necessary.
I think a much better approach would be to have this rate-limited instead. use a per-cpu counter and grab every n'th packet.
longer term this should probably gather per-cpu and rollup once a second or similar
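The "grab every n'th packet" suggestion could look roughly like the following userspace C sketch. Everything here is an assumption made for illustration: `NCPU_SKETCH`, `SAMPLE_INTERVAL`, `struct percpu_counter` and `should_harvest` are hypothetical names, the interval of 64 is arbitrary, and a real kernel version would use its own per-CPU facilities instead of a plain array.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPU_SKETCH     4       /* hypothetical CPU count */
#define SAMPLE_INTERVAL 64      /* hypothetical: harvest 1 packet in 64 */

/*
 * One counter per CPU, padded to 64 bytes so that CPUs do not share a
 * cache line (64 is an assumed cache line size).
 */
struct percpu_counter {
    uint32_t pkts;
    char pad[64 - sizeof(uint32_t)];
};

static struct percpu_counter counters[NCPU_SKETCH];

/*
 * Returns true for every SAMPLE_INTERVAL-th packet seen on the given CPU.
 * No lock is taken: each CPU only touches its own counter.
 */
static bool should_harvest(unsigned int cpu)
{
    struct percpu_counter *c = &counters[cpu % NCPU_SKETCH];

    if (++c->pkts < SAMPLE_INTERVAL)
        return false;
    c->pkts = 0;
    return true;
}
```

The receive path would call `should_harvest()` once per packet and only feed the entropy code when it returns true, so the common case touches a single CPU-local counter and no lock.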
@sbruno I've created an iflib/adaptive_entropy branch with this change. It's dependent on the master-bogofix branch.
Only launch as many threads as we actually need
@sbruno dedicated branch vs iflib/ithread_dispatch called iflib/pollution reduction
@sbruno @johalun0_gmail.com I've created a dedicated branch for this fix: iflib/netmap_rx
avoid gratuitous ithread dispatch
Aug 28 2017
In D12140#252066, @johalun0_gmail.com wrote:
em0@pci0:0:31:6: class=0x020000 card=0x06db1028 chip=0x156f8086 rev=0x21 hdr=0x00
device = 'Ethernet Connection I219-LM'
Tested with netmap's pkt-gen
On receive I get crash:
#7 0xffffffff80a72006 in kassert_panic (fmt=0xffffffff810482e7 "Assertion %s failed at %s:%d") at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/kern/kern_shutdown.c:669
#8 0xffffffff8053a20f in em_isc_rxd_pkt_get (arg=<unavailable>, ri=<optimized out>) at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/e1000/em_txrx.c:697
#9 0xffffffff80b81288 in iflib_netmap_rxsync (kring=<optimized out>, flags=<optimized out>) at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/net/iflib.c:1008
#10 0xffffffff806c12fc in netmap_poll (priv=<optimized out>, events=<optimized out>, sr=0xfffff80220420560) at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/netmap/netmap.c:2724
#11 0xffffffff806c3a02 in freebsd_netmap_poll (cdevi=<optimized out>, events=1, td=0xfffff80220420560) at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/netmap/netmap_freebsd.c:1393
#12 0xffffffff80941eef in devfs_poll_f (fp=0xfffff8006cdd9c30, events=1, cred=0xfffff8000c4bba00, td=0xfffff80220420560)
On transmit I get:
621.256106 sender_body [1181] start, fd 3 main_fd 3
622.257327 main_thread [2056] 1.022 Kpps (1.023 Kpkts 491.040 Kbps in 1001225 usec) 511.50 avg_batch 0 min_space
623.258329 main_thread [2056] 0.000 pps (0.000 pkts 0.000 bps in 1001002 usec) 0.00 avg_batch 99999 min_space
623.260011 sender_body [1250] poll error/timeout on queue 0: No error: 0
624.259329 main_thread [2056] 1.023 Kpps (1.024 Kpkts 491.520 Kbps in 1000999 usec) 341.33 avg_batch 99999 min_space
625.263889 main_thread [2056] 0.000 pps (0.000 pkts 0.000 bps in 1004548 usec) 0.00 avg_batch 99999 min_space
625.263876 sender_body [1250] poll error/timeout on queue 0: No error: 0
626.265335 main_thread [2056] 1.023 Kpps (1.024 Kpkts 491.520 Kbps in 1001458 usec) 341.33 avg_batch 99999 min_space
627.265335 sender_body [1250] poll error/timeout on queue 0: No error: 0
627.276066 main_thread [2056] 1.013 Kpps (1.024 Kpkts 491.520 Kbps in 1010731 usec) 341.33 avg_batch 99999 min_space
628.276338 main_thread [2056] 0.000 pps (0.000 pkts 0.000 bps in 1000273 usec) 0.00 avg_batch 99999 min_space
The packets that do get transmitted are received by receiving machine.
Aug 27 2017
It occurred to me that we can both have our way by enabling the driver to disable entropy collection on packets.
Aug 26 2017
In D12101#251708, @sbruno wrote: Huh .... Testing this review by itself this morning. I see the following startup panic:
Sleeping on "e1000_delay" with the following non-sleepable locks held:
exclusive sleep mutex em2:tx(0):callo (em2:tx(0):callo) r = 0 (0xfffff80004175068) locked @ /home/sbruno/bsd/fbsd_head/sys/kern/kern_mutex.c:182
stack backtrace:
#0 0xffffffff80ac85d3 at witness_debugger+0x73
#1 0xffffffff80ac99cf at witness_warn+0x43f
#2 0xffffffff80a708fc at _sleep+0x6c
#3 0xffffffff80a71117 at pause_sbt+0x117
#4 0xffffffff805620e1 at e1000_read_phy_reg_mdic+0xf1
#5 0xffffffff80562b5b at e1000_read_phy_reg_igp+0x5b
#6 0xffffffff80535d6c at em_if_timer+0xcc
#7 0xffffffff80b76279 at iflib_timer+0x69
#8 0xffffffff80a7e375 at softclock_call_cc+0x155
#9 0xffffffff80a7e75c at softclock+0x7c
#10 0xffffffff80a2bed1 at intr_event_execute_handlers+0x91
#11 0xffffffff80a2c5d6 at ithread_loop+0xb6
#12 0xffffffff80a29304 at fork_exit+0x84
#13 0xffffffff80ecf7ce at fork_trampoline+0xe
panic: sleepq_add: td 0xfffff80003d59000 to sleep on wchan 0xffffffff81c7d260 with sleeping prohibited
cpuid = 0
time = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f55f4730
vpanic() at vpanic+0x19c/frame 0xfffffe00f55f47b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe00f55f4820
sleepq_add() at sleepq_add+0x34f/frame 0xfffffe00f55f4870
_sleep() at _sleep+0x26c/frame 0xfffffe00f55f4910
pause_sbt() at pause_sbt+0x117/frame 0xfffffe00f55f4960
e1000_read_phy_reg_mdic() at e1000_read_phy_reg_mdic+0xf1/frame 0xfffffe00f55f49b0
e1000_read_phy_reg_igp() at e1000_read_phy_reg_igp+0x5b/frame 0xfffffe00f55f49e0
em_if_timer() at em_if_timer+0xcc/frame 0xfffffe00f55f4a10
iflib_timer() at iflib_timer+0x69/frame 0xfffffe00f55f4a40
softclock_call_cc() at softclock_call_cc+0x155/frame 0xfffffe00f55f4af0
softclock() at softclock+0x7c/frame 0xfffffe00f55f4b20
intr_event_execute_handlers() at intr_event_execute_handlers+0x91/frame 0xfffffe00f55f4b60
ithread_loop() at ithread_loop+0xb6/frame 0xfffffe00f55f4bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe00f55f4bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00f55f4bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100023 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
move more blocking work to deferred context
Aug 25 2017
Aug 24 2017
Folded in to D12101
- Eliminate remaining mutex use in os independent e1000 code.
- Fold in iflib updates.
In D12101#251488, @erj wrote:
In D12101#251426, @johalun0_gmail.com wrote:
In D12101#251279, @johalun0_gmail.com wrote:
The busy waiting would cause 10% CPU usage on my system when no cable connected to em0 (traced to em_if_update_admin_status).
This patch also fixes this. I can no longer detect any unusual CPU usage.
Getting a lot of witness output. Like a few per second or so.
I think this revision depends on D11969. It changes the core lock from a mutex to an sx lock, which would allow sleeping while it's held.
@jtl any comments?
Aug 23 2017
@erj I updated the summary - is there anything more I should add?
Also, can we stop massively refactoring the drivers until after the iflib version goes in to HEAD? And in general, there's a lot of non-productive function shuffling that goes on in Intel drivers that makes it a lot more work for downstream users maintaining their own branches.
Aug 22 2017
In D12101#251141, @erj wrote: Can we also get a description of what this fixes, for future reference?
update to reflect potential for brokenness
Aug 12 2017
In D11969#248789, @jtl wrote:
In D11969#248624, @kmacy wrote: Please review the FreeBSD locking hierarchy. SX locks can't be acquired after default mutexes.
I'm familiar with the FreeBSD locking hierarchy. What I don't see (and, since I may be missing it, I'm asking you to point it out) is where the code changes actually require the unbounded sleep that SX locks allow. Understanding this will help me to better understand the rationale for this change.
Aug 11 2017
In D11969#248529, @erj wrote:
In D11969#248501, @gallatin wrote: Can you remind me what lock is held?
I don't know about other ioctls, but I remember there being a lock held when IPv6 multicast MAC addresses are added.
Unrelated to that, for ixl(4), nvmupdate uses ioctls to do the update, but it needs the returned error code from the ioctl call (from the driver) to work. So ixl(4) wouldn't work with this change.
I'm also unsure about the benefit of deferring some of the work to run asynchronously in a different thread. Again, I may be missing something, but it looks to me like the same code runs, but it is deferred. In some cases, this has user-visible side effects. For example, there could be a period of time after an ioctl call has "successfully" completed when the behavior will be different than expected. This might be OK if the behavior was well-documented and there was a good reason for it. But, I'm not sure the change description explains why this is necessary. (Again, I might be missing something.)
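The concern about return values can be made concrete with a small userspace C sketch. It is an illustration only: `driver_op`, `ioctl_sync`, `ioctl_deferred`, `run_deferred` and `struct deferred` are hypothetical names, not the iflib or ixl(4) interfaces. The synchronous path hands the driver's error back to the caller, while the deferred path reports success immediately and the real result only exists after the deferred context has run.

```c
#include <errno.h>

/* Hypothetical driver operation: rejects negative arguments. */
static int driver_op(int arg)
{
    return arg < 0 ? EINVAL : 0;
}

/* Record of one deferred request (hypothetical). */
struct deferred {
    int arg;
    int pending;
    int result;
};

/* Synchronous path: the caller gets the driver's error code directly. */
static int ioctl_sync(int arg)
{
    return driver_op(arg);
}

/* Deferred path: only records the request; the caller always sees 0. */
static int ioctl_deferred(struct deferred *d, int arg)
{
    d->arg = arg;
    d->pending = 1;
    d->result = 0;
    return 0;
}

/* Runs later in another context; the error is known only from here on. */
static void run_deferred(struct deferred *d)
{
    if (d->pending) {
        d->result = driver_op(d->arg);
        d->pending = 0;
    }
}
```

With a bad argument, `ioctl_sync()` returns `EINVAL` while `ioctl_deferred()` returns 0 and the failure shows up only in `d->result` after `run_deferred()`, which is the window of unexpected behavior described above.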
I talked to Kevin. I may need to clarify my thinking a bit more.
Aug 10 2017
Jul 7 2017
In D11476#238139, @kmacy wrote:
In D11476#238004, @ae wrote: From a quick look, the iflib code does not bind irq to CPU cores. The old em/igb drivers did that and I guess, if you add bus_bind_intr() again, this will increase the performance.
It does. The problem is that FreeBSD goes through the entire forwarding path for every single packet, not supporting any sort of batching. There is a doorbell coalesce optimization for content serving that is a pessimization for forwarding on FreeBSD.
In D11476#238004, @ae wrote: From a quick look, the iflib code does not bind irq to CPU cores. The old em/igb drivers did that and I guess, if you add bus_bind_intr() again, this will increase the performance.
Jul 5 2017
Thanks for your review.
May 26 2017
@rstone have you committed an equivalent fix already?
May 25 2017
May 24 2017
May 23 2017
I updated my primary e-mail. I didn't see your update. Thanks.
Update for comments
May 1 2017
Jan 5 2017
Jan 2 2017
FYI - this is a backtrace of where X hangs when running the patched driver:
#0 0x000000080250115a in ioctl () from /lib/libc.so.7
#1 0x00000008012763d5 in drmIoctl (fd=7, request=3223348297, arg=0x7fffffffd830) at xf86drm.c:183
#2 0x0000000807a5a1b4 in amdgpu_ioctl_wait_cs (context=0x80409c000, ip=0, ip_instance=0, ring=0, handle=1,
timeout_ns=18446744073709551615, flags=1, busy=0x7fffffffd8d7) at amdgpu_cs.c:403
#3 0x0000000807a5a0bf in amdgpu_cs_query_fence_status (fence=0x804170010, timeout_ns=18446744073709551615, flags=1,
expired=0x7fffffffd944) at amdgpu_cs.c:436
#4 0x0000000808d213bb in amdgpu_fence_wait (fence=0x804170000, timeout=18446744073709551615, absolute=true)
at amdgpu_cs.c:117
#5 0x0000000808d1ea3a in amdgpu_bo_wait (_buf=0x8041e2880, timeout=18446744073709551615, usage=RADEON_USAGE_WRITE)
at amdgpu_bo.c:117
#6 0x0000000808d1fa95 in amdgpu_bo_map (buf=0x8041e2880, rcs=0x0, usage=PIPE_TRANSFER_READ) at amdgpu_bo.c:264
#7 0x0000000808d4406d in r600_buffer_map_sync_with_rings (ctx=0x804171000, resource=0x804038b40, usage=1)
at r600_buffer_common.c:99
#8 0x0000000808d4c3c2 in r600_query_init_backend_mask (ctx=0x804171000) at r600_query.c:1611
#9 0x0000000808c1353c in si_create_context (screen=0x80409b800, priv=0x0, flags=0) at si_pipe.c:253
#10 0x0000000808c13042 in radeonsi_screen_create (ws=0x8040e5600) at si_pipe.c:848
#11 0x0000000808d26c23 in amdgpu_winsys_create (fd=8, screen_create=0x808c12c70 <radeonsi_screen_create>)
at amdgpu_winsys.c:590
#12 0x00000008084671ca in pipe_radeonsi_create_screen (fd=8)
at ../../../../src/gallium/auxiliary/target-helpers/drm_helper.h:151
#13 0x0000000808aa04f3 in pipe_loader_drm_create_screen (dev=0x80415f7c0) at pipe_loader_drm.c:347
#14 0x0000000808a9f618 in pipe_loader_create_screen (dev=0x80415f7c0) at pipe_loader.c:79
#15 0x00000008088ccb15 in dri2_init_screen (sPriv=0x8040385a0) at dri2.c:1885
#16 0x00000008088c7c79 in driCreateNewScreen2 (scrn=0, fd=6, extensions=0x80463aaa0 <gbm_dri_screen_extensions>,
driver_extensions=0x8091b8910 <galliumdrm_driver_extensions>, driver_configs=0x804042570, data=0x804042400) at dri_util.c:145
#17 0x0000000804435b0f in dri_screen_create_dri2 (dri=0x804042400, driver_name=0x804029120 "radeonsi")
at backends/dri/gbm_dri.c:448
#18 0x0000000804435480 in dri_screen_create (dri=0x804042400) at backends/dri/gbm_dri.c:523
#19 0x00000008044342b2 in dri_device_create (fd=6) at backends/dri/gbm_dri.c:1069
#20 0x000000080443396e in _gbm_create_device (fd=6) at main/backend.c:100
#21 0x0000000804433baf in gbm_create_device (fd=6) at main/gbm.c:123
#22 0x0000000807840dbe in ?? () from /usr/local/lib/xorg/modules/drivers/amdgpu_drv.so
#23 0x000000000047ce3f in InitOutput ()
#24 0x000000000043a4d4 in ?? ()
#25 0x0000000000424e0f in _start ()
Dec 23 2016
Why are you abandoning this?
Dec 11 2016
Fixed by PQ_LAUNDER
Replaced by populate, which although it still insists on doing a great deal of accounting not needed for device mappings (COW cargo culting AFAICT) that complicates mapping the same device memory at different VAs - it is quite close to what is expected by graphics drivers. Thanks go to Kostik and Alan.
Nov 23 2016
add missed check
Nov 18 2016
Add comment to tcp_timer.c as requested by hiren@
simplify badrxtwin assignment
This is now dependent on D8558