
gallatin (Andrew Gallatin)
User

Projects

User Details

User Since
Jun 22 2015, 5:21 PM (500 w, 4 d)

Recent Activity

Wed, Jan 15

gallatin committed rGcf9070746742: Introduce the UMA_ZONE_NOTRIM uma zone type (authored by gallatin).
Introduce the UMA_ZONE_NOTRIM uma zone type
Wed, Jan 15, 5:24 PM
gallatin closed D48451: add UMA_ZONE_NOTRIM & use it for the ktls_buffer zone.
Wed, Jan 15, 5:23 PM

Mon, Jan 13

gallatin requested review of D48451: add UMA_ZONE_NOTRIM & use it for the ktls_buffer zone.
Mon, Jan 13, 7:50 PM

Nov 25 2024

gallatin accepted D47735: ktls: Enable by default.
Nov 25 2024, 3:25 PM
gallatin accepted D47720: setsockopt.2.
Nov 25 2024, 3:21 PM

Nov 15 2024

gallatin committed rG4605a99b51ab: aio: remove write-only jobid & kernelinfo (authored by gallatin).
aio: remove write-only jobid & kernelinfo
Nov 15 2024, 3:49 PM
gallatin closed D47583: aio: remove write-only jobid & kernelinfo.
Nov 15 2024, 3:49 PM

Nov 14 2024

gallatin added a comment to D47518: aio: improve lock contention on the aio_job_mtx.

Super helpful review, John. I just opened a new review (https://reviews.freebsd.org/D47583) for the simplest suggested change. Will work on your other suggestions.

Nov 14 2024, 11:25 PM
gallatin requested review of D47583: aio: remove write-only jobid & kernelinfo.
Nov 14 2024, 11:24 PM

Nov 13 2024

gallatin added inline comments to D47518: aio: improve lock contention on the aio_job_mtx.
Nov 13 2024, 9:54 PM

Nov 12 2024

gallatin added inline comments to D47518: aio: improve lock contention on the aio_job_mtx.
Nov 12 2024, 2:53 PM
gallatin added inline comments to D47518: aio: improve lock contention on the aio_job_mtx.
Nov 12 2024, 12:00 PM

Nov 11 2024

gallatin added inline comments to D47518: aio: improve lock contention on the aio_job_mtx.
Nov 11 2024, 10:59 PM
gallatin updated the diff for D47518: aio: improve lock contention on the aio_job_mtx.

Address Kib's feedback

Nov 11 2024, 10:58 PM
gallatin requested review of D47518: aio: improve lock contention on the aio_job_mtx.
Nov 11 2024, 7:14 PM

Nov 8 2024

gallatin committed rGfd67ff5c7a6c: Use the correct idle routine on recent AMD EPYC servers (authored by gallatin).
Use the correct idle routine on recent AMD EPYC servers
Nov 8 2024, 10:13 PM
gallatin closed D47444: Use correct idle routine on AMD.
Nov 8 2024, 10:13 PM

Nov 6 2024

gallatin added inline comments to D47444: Use correct idle routine on AMD.
Nov 6 2024, 10:39 PM

Nov 4 2024

gallatin requested review of D47444: Use correct idle routine on AMD.
Nov 4 2024, 10:25 PM

Oct 28 2024

gallatin committed rGee373c1234d3: acpi_ged: Handle events directly (authored by gallatin).
acpi_ged: Handle events directly
Oct 28 2024, 11:03 PM
gallatin added a comment to D47294: if_bridge: Mask MEXTPG if some members don't support it.

Why do we want or need a hardcoded list? Why can't this function be more like lagg_capabilities()? If we do want a hardcoded list, what about IFCAP_TXTLS*

That's a good question. if_bridge could probably be smarter, indeed.

Why exactly does if_bridge need to care about IFCAP_TXTLS*?

Oct 28 2024, 5:29 PM
gallatin accepted D47295: tuntap: Enable MEXTPG support.
Oct 28 2024, 2:06 PM
gallatin accepted D47294: if_bridge: Mask MEXTPG if some members don't support it.

Why do we want or need a hardcoded list? Why can't this function be more like lagg_capabilities()? If we do want a hardcoded list, what about IFCAP_TXTLS*

Oct 28 2024, 2:05 PM

Oct 25 2024

gallatin added a comment to D47287: cam: Don't log invalid cdb errors.

I'd personally want to keep these messages with bootverbose.. I can imagine it might be handy to see them at times...

Oct 25 2024, 10:48 PM

Oct 23 2024

gallatin committed rG49597c3e84c4: mlx5e: Use M_WAITOK when allocating TLS tags (authored by gallatin).
mlx5e: Use M_WAITOK when allocating TLS tags
Oct 23 2024, 7:58 PM
gallatin closed D47260: mlx5e: Immediately initialize TLS send tags.
Oct 23 2024, 7:53 PM
gallatin committed rG81dbc22ce8b6: mlx5e: Immediately initialize TLS send tags (authored by gallatin).
mlx5e: Immediately initialize TLS send tags
Oct 23 2024, 7:53 PM
gallatin added inline comments to D47260: mlx5e: Immediately initialize TLS send tags.
Oct 23 2024, 3:17 PM
gallatin updated the diff for D47260: mlx5e: Immediately initialize TLS send tags.

Fix style issue pointed out by Mark

Oct 23 2024, 3:10 PM

Oct 22 2024

gallatin added a comment to D4295: Add driver backpressure.

Why is this re-surfacing?

Oct 22 2024, 11:19 PM · transport
gallatin requested review of D47260: mlx5e: Immediately initialize TLS send tags.
Oct 22 2024, 9:58 PM

Oct 16 2024

gallatin added a comment to D30155: ixgbe: Bring back accounting for tx in AIM.

The a/b results were not surprising (boring as David likes to say). Just slightly higher CPU on the canary (due to the increased irq rate). But no clear streaming quality changes.
All in all, it seems to work and do no real harm, but we'll not use it due to the increased CPU usage.

Oct 16 2024, 6:16 PM
gallatin added a comment to D30155: ixgbe: Bring back accounting for tx in AIM.

Yeah, my ideal irq rate/queue is < 1000. We mostly use Chelsio and Mellanox NICs that can do super aggressive irq coalescing without freaking out TCP due to using RX timestamps. Super aggressive coalescing like this lets us build packet trains in excess of 1000 packets to feed to LRO via RSS assisted LRO, and we actually have useful LRO on internet workloads with tens of thousands of TCP connections per-queue. That reminds me that I should port RSS assisted LRO to iflib (e.g., lro_queue_mbuf()).

Oct 16 2024, 2:41 AM
gallatin added a comment to D30155: ixgbe: Bring back accounting for tx in AIM.

@imp @gallatin if you are able to test your workload, setting this to 1 and 2 would be new behavior versus where you are currently:

I can pull this into our tree and make an image for @dhw to run on the A/B cluster. However, we're not using this hardware very much any more, and there is only 1 pair of machines using it in the A/B cluster. Lmk if you're still interested, and I'll try to build the image tomorrow so that David can test it at his leisure.

Sure, it sounds like that is only enough for one experiment so I would focus on the default algorithm the patch will boot with sysctl dev.ix.<N>.enable_aim=1

Oct 16 2024, 1:39 AM

Oct 15 2024

gallatin accepted D46785: netinet*: Add assertions for some places that don't support M_EXTPG mbufs.
Oct 15 2024, 5:10 PM
gallatin updated subscribers of D30155: ixgbe: Bring back accounting for tx in AIM.
Oct 15 2024, 2:41 AM

Oct 14 2024

gallatin added a comment to D45950: vtnet: Fix an LOR in the input path.
In D45950#1073880, @jhb wrote:

I don't think we need the taskqueue. It's probably just a design copied from the Intel drivers, and I don't think it makes much sense for those either. The other thing that can be nice to do though when making this change is to instead build a temporary list of packets linked via m_nextpkt (mbufq works for this) and pass an entire batch to if_input. This lets you avoid dropping the lock as often.
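
The batching idea in the comment above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct pkt`, `struct pktq`, `stack_input()` and `rx_process_batched()` are hypothetical stand-ins invented for illustration, not the kernel's mbuf, mbufq or if_input() interfaces. It shows packets being linked through a next-packet pointer while the receive lock is held, then handed up as one chain after a single unlock.

```c
#include <stddef.h>

/* Userspace stand-in for an mbuf; nextpkt plays the role of m_nextpkt. */
struct pkt {
	struct pkt *nextpkt;
	int id;
};

/* Hypothetical minimal packet queue, loosely modelled on the idea of mbufq. */
struct pktq {
	struct pkt *head;
	struct pkt **tailp;
	int len;
};

static int lock_held;       /* models the rx lock state */
static int unlock_count;    /* how many times the lock was dropped */
static int delivered;       /* packets seen by the stack */
static int lock_violations; /* deliveries made with the lock held */

static void
pktq_init(struct pktq *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->len = 0;
}

static void
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	p->nextpkt = NULL;
	*q->tailp = p;
	q->tailp = &p->nextpkt;
	q->len++;
}

/* Detach the whole chain and leave the queue empty. */
static struct pkt *
pktq_flush(struct pktq *q)
{
	struct pkt *chain = q->head;

	pktq_init(q);
	return (chain);
}

/* Stand-in for the stack's input routine: consumes a chain of packets. */
static void
stack_input(struct pkt *chain)
{
	for (; chain != NULL; chain = chain->nextpkt) {
		if (lock_held)
			lock_violations++;
		delivered++;
	}
}

/* Collect n packets under the lock, then deliver them as one batch. */
static void
rx_process_batched(struct pkt *ring, int n)
{
	struct pktq batch;
	int i;

	pktq_init(&batch);
	lock_held = 1;
	for (i = 0; i < n; i++)
		pktq_enqueue(&batch, &ring[i]);
	lock_held = 0;		/* one unlock for the whole batch */
	unlock_count++;
	stack_input(pktq_flush(&batch));
}
```

In this model the lock is dropped once per batch rather than once per packet, which is the saving the comment describes.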

Oct 14 2024, 8:57 PM

Oct 7 2024

gallatin added a comment to D46761: mlx5en: do no call if_input() while holding the rq mutex.
In D46761#1070797, @kib wrote:

No, the lock cannot be sleepable because the processing occurs in the context of the interrupt thread.

I would have implemented something with blockcount_t or even epoch, but then I realized that it would not help. blockcount cannot be used because rx memory is not type-stable. A driver-private epoch might work, but then note that the second reported backtrace in PR 281368 shows the IP stack acquiring a sleepable lock. So even if I try to fix the driver, the stack still tries to sleep in ip_input().

I suspect you (Netflix) did not see the deadlock because you either do not use IPv6 or use it in a situation with a static network configuration. The problems are visible when multicast group membership is changed, at least this is what I see in the PR.

Oct 7 2024, 8:17 PM

Oct 1 2024

gallatin accepted D46824: tcp_output: Clear FIN if tcp_m_copym truncates output length.
Oct 1 2024, 5:58 PM

Sep 26 2024

gallatin accepted D46784: ktls: Mark mbufs containing outbound encrypted TLS records read-only.
Sep 26 2024, 2:38 PM
gallatin accepted D46787: mbuf: Add M_WRITABLE_EXTPG.
Sep 26 2024, 2:37 PM
gallatin added a comment to D46786: m_unshare: Fail with a NULL return if the chain contains unmapped mbufs.

Would it be better to call mb_unmapped_to_ext() here ?

Sep 26 2024, 2:36 PM
gallatin added a comment to D46787: mbuf: Add M_WRITABLE_EXTPG.

Ah, OK, I understand now.

Sep 26 2024, 2:22 PM

Sep 25 2024

gallatin requested changes to D46761: mlx5en: do no call if_input() while holding the rq mutex.
Sep 25 2024, 11:17 PM
gallatin added a comment to D46761: mlx5en: do no call if_input() while holding the rq mutex.

I'm very afraid there will be performance implications due to new cache misses here from queuing mbufs twice. On tens of thousands of interfaces running over 8 years, we've never hit a deadlock from this lock, and I don't think fixing this is important enough to hurt performance for.

Sep 25 2024, 11:16 PM
gallatin added a comment to D46787: mbuf: Add M_WRITABLE_EXTPG.

I'm confused.. if we are marking non-writable M_EXTPG mbufs as M_RDONLY, why can't we simply remove the M_EXTPG check from M_WRITABLE? Why do we need a new macro?

Sep 25 2024, 11:09 PM
gallatin accepted D46783: mbuf: Don't force all M_EXTPG mbufs to be read-only.
Sep 25 2024, 11:08 PM

Sep 9 2024

gallatin accepted D46412: tests: Add some test cases for SO_SPLICE.
Sep 9 2024, 4:08 PM
gallatin accepted D46411: socket: Implement SO_SPLICE.
Sep 9 2024, 4:07 PM

Sep 5 2024

gallatin accepted D46411: socket: Implement SO_SPLICE.

This passes basic sanity testing at Netflix. Sorry for the delayed approval; we had a few integration issues with this and a local Netflix feature that made it look like splice was not working. It only just now became obvious that it was due to our local feature & how to fix it.

Sep 5 2024, 6:40 PM

Aug 16 2024

gallatin accepted D46303: socket: Split up soreceive_stream().
Aug 16 2024, 9:27 PM
gallatin accepted D46304: socket: Split up soreceive_generic().
Aug 16 2024, 9:25 PM
gallatin accepted D46305: socket: Split up sosend_generic().
Aug 16 2024, 9:24 PM

Aug 5 2024

gallatin committed rG1f628be888b7: tcp_ratelimit: provide an api for drivers to release ratesets at detach (authored by gallatin).
tcp_ratelimit: provide an api for drivers to release ratesets at detach
Aug 5 2024, 4:52 PM
gallatin closed D46221: tcp_ratelimit: provide a hook for drivers to release ratesets at detach.
Aug 5 2024, 4:52 PM
gallatin added inline comments to D46221: tcp_ratelimit: provide a hook for drivers to release ratesets at detach.
Aug 5 2024, 3:45 PM

Aug 4 2024

gallatin updated the summary of D46221: tcp_ratelimit: provide a hook for drivers to release ratesets at detach.
Aug 4 2024, 4:13 PM
gallatin requested review of D46221: tcp_ratelimit: provide a hook for drivers to release ratesets at detach.
Aug 4 2024, 4:10 PM

Jul 18 2024

gallatin accepted D46024: nvme: Always lock and only avoid processing for recovery state.
Jul 18 2024, 10:00 PM
gallatin accepted D46031: nvme: Warn if there's system interrupt issues..
Jul 18 2024, 9:56 PM

Jul 15 2024

gallatin added a comment to D45950: vtnet: Fix an LOR in the input path.

Is this safe? I think so, but I confess that I don't know the low level details in this driver very well.

I believe so, from what I see, the lock exists to synchronize with the taskqueue and to protect some non-atomic counters.

When I read a network driver, I view a lock around rx processing as an indicator there is room for improvement in the design. The reason the lock seems to exist is to serialize rx ring servicing between the ithread (the normal path) and the taskqueue (which is woken if we continually find more packets to process... or maybe if interrupts don't work..?). I don't really understand the code at the bottom of vtnet_rx_vq_process(). It seems like interrupts should be disabled if switching to the taskqueue, and enabled if returning from it. It probably has something to do with the "race" mentioned in that code..

I don't quite understand that either. The comment above the definition of VTNET_INTR_DISABLE_RETRIES suggests to me that the idea is:

  • vtnet_rxq_eof() returns 0, so more == 0, i.e., there were no more descriptors to process.
  • vtnet_rxq_enable_intr() returned 1, meaning that we found some completed descriptors on the queue when enabling interrupts.
  • We should call vtnet_rxq_eof() again to collect those newly completed descriptors instead of deferring.

I'm not sure I understand the purpose of the taskqueue at all though. Why can't we handle all packets in the ithread context?
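
The retry idea described in the bullets above can be sketched with a small userspace model. Everything here is an assumption made for illustration: `struct rxq_model`, `rxq_eof()`, `rxq_enable_intr()`, `rxq_disable_intr()` and `rxq_process()` are hypothetical names, and `SKETCH_RETRIES` is an assumed value standing in for VTNET_INTR_DISABLE_RETRIES. It is not the vtnet driver's code.

```c
#define SKETCH_RETRIES 4	/* assumed stand-in for VTNET_INTR_DISABLE_RETRIES */

struct rxq_model {
	int pending;		/* completed descriptors visible right now */
	int late_arrivals;	/* descriptors that complete as interrupts are enabled */
	int intr_enabled;
	int processed;
};

/* Drain everything pending; returns 0 because no work is left behind. */
static int
rxq_eof(struct rxq_model *q)
{
	q->processed += q->pending;
	q->pending = 0;
	return (0);
}

/* Enable interrupts; returns 1 if descriptors completed in the meantime. */
static int
rxq_enable_intr(struct rxq_model *q)
{
	q->intr_enabled = 1;
	if (q->late_arrivals > 0) {
		q->late_arrivals--;
		q->pending = 1;
		return (1);
	}
	return (0);
}

static void
rxq_disable_intr(struct rxq_model *q)
{
	q->intr_enabled = 0;
}

/*
 * Returns 0 when the queue is idle with interrupts enabled, or 1 when the
 * retry budget is exhausted and the remaining work would be deferred.
 */
static int
rxq_process(struct rxq_model *q)
{
	int more, tries = 0;

again:
	more = rxq_eof(q);
	if (more == 0 && rxq_enable_intr(q) != 0) {
		/* Raced with new completions: collect them now if budget allows. */
		rxq_disable_intr(q);
		if (tries++ < SKETCH_RETRIES)
			goto again;
		return (1);
	}
	return (more);
}
```

In this model, a small number of late completions is handled inline, and only a persistent stream of them falls through to the deferred path.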

Jul 15 2024, 8:11 PM
gallatin added a comment to D45950: vtnet: Fix an LOR in the input path.

Is this safe? I think so, but I confess that I don't know the low level details in this driver very well.

Jul 15 2024, 1:11 AM

Jul 8 2024

gallatin accepted D45922: socket: Simplify synchronization in soreceive_stream().
Jul 8 2024, 7:56 PM

Jul 1 2024

gallatin accepted D45764: arm64: Add smbios to kernel.
Jul 1 2024, 6:41 PM
gallatin accepted D45763: smbios: Add length sanity checking.
Jul 1 2024, 6:40 PM

Jun 21 2024

gallatin accepted D45675: ktls: Remove the socket parameter to ktls_ocf_try().

I was concerned at first about isal, but then I remembered that @jhb had moved it from plugging in at the ktls layer to plugging in at the ocf layer

Jun 21 2024, 6:41 PM
gallatin accepted D45674: ktls: Fix races that can lead to double initialization.
Jun 21 2024, 6:37 PM
gallatin accepted D45673: socket: Pass capsicum rights down to socket option handlers.
Jun 21 2024, 6:33 PM

May 31 2024

gallatin accepted D45419: tcp: mark TCP stacks which can serve as a default stack.
May 31 2024, 4:02 PM

May 25 2024

gallatin added inline comments to D45288: vm_pageout_scan_inactive: take a lock break.
May 25 2024, 10:04 PM

May 24 2024

gallatin accepted D45310: cam: Drop periph lock when completing I/O with ENOMEM status.
May 24 2024, 5:46 AM
gallatin added inline comments to D45288: vm_pageout_scan_inactive: take a lock break.
May 24 2024, 5:45 AM
gallatin accepted D45304: tcp: improve blackhole support.
May 24 2024, 5:40 AM

May 23 2024

gallatin added inline comments to D45310: cam: Drop periph lock when completing I/O with ENOMEM status.
May 23 2024, 1:40 AM
gallatin added inline comments to D45311: nvme: Count number of alginment splits.
May 23 2024, 1:37 AM

May 17 2024

gallatin added a comment to D45224: arm64: Set ATTR_CONTIGUOUS on DMAP's L2 blocks.
In D45224#1031599, @alc wrote:

@gallatin @markj Could you please test this patch? I've also tested this patch on EC2 VMs with both 4K and 16K base pages, but their device mappings don't trigger any L1 or L2C demotions in the direct map.

May 17 2024, 8:19 PM

May 8 2024

gallatin added a comment to D45042: arm64: Make jemalloc safe for 16k / 4k interoperability.
In D45042#1029022, @alc wrote:
In D45042#1028058, @alc wrote:

Do we have any idea what the downsides of the change are? If we make the default 64KB, then I'd expect memory usage to increase; do we have any idea what that looks like? It'd be nice to, for example, compare memory usage on a newly booted system with and without this change.

I had the same question. It will clearly impact a lot of page granularity counters, at the very least causing some confusion for people who look at those counters, e.g.,

./include/jemalloc/internal/arena_inlines_b.h-          arena_stats_add_u64(tsdn, &arena->stats,
./include/jemalloc/internal/arena_inlines_b.h-              &arena->decay_dirty.stats->nmadvise, 1);
./include/jemalloc/internal/arena_inlines_b.h-          arena_stats_add_u64(tsdn, &arena->stats,
./include/jemalloc/internal/arena_inlines_b.h:              &arena->decay_dirty.stats->purged, extent_size >> LG_PAGE);
./include/jemalloc/internal/arena_inlines_b.h-          arena_stats_sub_zu(tsdn, &arena->stats, &arena->stats.mapped,
./include/jemalloc/internal/arena_inlines_b.h-              extent_size);

However, it's not so obvious what the effect on the memory footprint will be. For example, the madvise(MADV_FREE) calls will have coarser granularity. If we set the page size to 64KB, then one in-use 4KB page within a 64KB region will be enough to block the application of madvise(MADV_FREE) to the other 15 pages. Quantifying the impact that this coarsening has will be hard.

This does, however, seem to be the intended workaround: https://github.com/jemalloc/jemalloc/issues/467

Buried in that issue is the claim that Firefox's builtin derivative version of jemalloc eliminated the statically compiled page size.

What direction does the kernel grow the vm map? They apparently reverted support for lg page size values larger than the runtime page size because it caused fragmentation when the kernel grows the vm map downwards..

Typically, existing map entries are only extended in an upward direction. For downward growing regions, e.g., stacks, new entries are created. Do you have a pointer to where this is discussed? I'm puzzled as to why the direction would be a factor.

May 8 2024, 2:16 PM

May 5 2024

gallatin added a comment to D45042: arm64: Make jemalloc safe for 16k / 4k interoperability.
In D45042#1028058, @alc wrote:

Do we have any idea what the downsides of the change are? If we make the default 64KB, then I'd expect memory usage to increase; do we have any idea what that looks like? It'd be nice to, for example, compare memory usage on a newly booted system with and without this change.

I had the same question. It will clearly impact a lot of page granularity counters, at the very least causing some confusion for people who look at those counters, e.g.,

./include/jemalloc/internal/arena_inlines_b.h-          arena_stats_add_u64(tsdn, &arena->stats,
./include/jemalloc/internal/arena_inlines_b.h-              &arena->decay_dirty.stats->nmadvise, 1);
./include/jemalloc/internal/arena_inlines_b.h-          arena_stats_add_u64(tsdn, &arena->stats,
./include/jemalloc/internal/arena_inlines_b.h:              &arena->decay_dirty.stats->purged, extent_size >> LG_PAGE);
./include/jemalloc/internal/arena_inlines_b.h-          arena_stats_sub_zu(tsdn, &arena->stats, &arena->stats.mapped,
./include/jemalloc/internal/arena_inlines_b.h-              extent_size);

However, it's not so obvious what the effect on the memory footprint will be. For example, the madvise(MADV_FREE) calls will have coarser granularity. If we set the page size to 64KB, then one in-use 4KB page within a 64KB region will be enough to block the application of madvise(MADV_FREE) to the other 15 pages. Quantifying the impact that this coarsening has will be hard.

This does, however, seem to be the intended workaround: https://github.com/jemalloc/jemalloc/issues/467

Buried in that issue is the claim that Firefox's builtin derivative version of jemalloc eliminated the statically compiled page size.
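
The coarsening effect mentioned above (one in-use 4KB page blocking the other 15 pages of a 64KB region) can be made concrete with a small arithmetic sketch. The function `purgeable_4k_pages()` is hypothetical and is not part of jemalloc; it only counts how many 4KB pages could be released if purging is restricted to whole, completely unused allocator pages of size `1 << lg_page`.

```c
#include <stddef.h>

/*
 * in_use[i] is nonzero when 4KB page i holds live data.  Only allocator
 * pages of (1 << lg_page) bytes that are entirely unused can be purged.
 * Returns the number of 4KB pages that could be released.
 */
static size_t
purgeable_4k_pages(const unsigned char *in_use, size_t n4k, unsigned lg_page)
{
	size_t per = (size_t)1 << (lg_page - 12);	/* 4KB pages per allocator page */
	size_t purgeable = 0;

	for (size_t base = 0; base + per <= n4k; base += per) {
		size_t used = 0;

		for (size_t i = 0; i < per; i++)
			used += (in_use[base + i] != 0);
		if (used == 0)
			purgeable += per;
	}
	return (purgeable);
}
```

For a 64KB region with a single live 4KB page, calling this with `lg_page` 12 reports 15 releasable pages, while `lg_page` 16 reports none. Real workloads will sit somewhere between these extremes, which is why the comment says the impact is hard to quantify.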

May 5 2024, 9:19 PM

May 2 2024

gallatin accepted D45066: libsys: Fall back to the worst case page size.
May 2 2024, 6:12 PM
gallatin added inline comments to D45065: param.h: Add PAGE_SIZE_MAX and PAGE_SHIFT_MAX.
May 2 2024, 6:11 PM
gallatin added a comment to D45042: arm64: Make jemalloc safe for 16k / 4k interoperability.

I've been thinking about adding PAGE_SIZE_MAX/PAGE_SHIFT_MAX or similar to arm64 to define the largest page size the kernel could support. We could then use that here if it's defined.
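
A minimal sketch of the "use it if it's defined" idea, assuming the macro is named PAGE_SHIFT_MAX as in the D45065 title and is exposed through `<sys/param.h>`. `SKETCH_LG_PAGE` and `sketch_page_size()` are hypothetical names, and the 4KB fallback is an assumption for illustration, not what any committed change does.

```c
#include <sys/param.h>
#include <stddef.h>

/* Prefer the largest page shift the kernel could use, when it is advertised. */
#ifdef PAGE_SHIFT_MAX
#define SKETCH_LG_PAGE	PAGE_SHIFT_MAX
#else
#define SKETCH_LG_PAGE	12	/* assumed fallback: 4KB pages */
#endif

/* Page size an allocator would be compiled for under this scheme. */
static size_t
sketch_page_size(void)
{
	return ((size_t)1 << SKETCH_LG_PAGE);
}
```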

May 2 2024, 3:01 PM

May 1 2024

gallatin added a comment to D40676: ktrace: Record detailed ECAPMODE violations.

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum?

This was done already in commit f239db4800ee9e7ff8485f96b7a68e6c38178c3b.

May 1 2024, 6:05 PM · capsicum
gallatin added a comment to D40676: ktrace: Record detailed ECAPMODE violations.

After this change, ktrace output is littered with 'CAP system call not allowed: $SYSCALL' on systems w/o capsicum enabled, which is confusing and distracting. Can this please be reverted to behave without CAP output for systems w/o capsicum?

May 1 2024, 3:56 PM · capsicum
gallatin requested review of D45042: arm64: Make jemalloc safe for 16k / 4k interoperability.
May 1 2024, 1:30 PM

Apr 30 2024

gallatin committed rG13a5a46c49d0: Fix new users of MAXPHYS and hide it from the kernel namespace (authored by gallatin).
Fix new users of MAXPHYS and hide it from the kernel namespace
Apr 30 2024, 7:30 PM
gallatin closed D44986: Fix new users of MAXPHYS and hide it from the kernel namespace.
Apr 30 2024, 7:30 PM
gallatin added inline comments to D44986: Fix new users of MAXPHYS and hide it from the kernel namespace.
Apr 30 2024, 2:11 PM

Apr 29 2024

gallatin updated the diff for D44986: Fix new users of MAXPHYS and hide it from the kernel namespace.

I consulted with @imp, and after a trip down the rabbit hole, we concluded that a header file consisting only of the definition of MAXPHYS is not creative (as this is the only way to express this in C) so it can't have copyright protection, and should simply be public domain.

Apr 29 2024, 11:24 PM

Apr 28 2024

gallatin updated the diff for D44986: Fix new users of MAXPHYS and hide it from the kernel namespace.
  • Update diff to avoid cutting/pasting MAXPHYS definition as per @kib's suggestion
Apr 28 2024, 4:41 PM
gallatin requested review of D44986: Fix new users of MAXPHYS and hide it from the kernel namespace.
Apr 28 2024, 1:54 AM

Apr 18 2024

gallatin updated the diff for D39150: hyperv: Fix compilation with larger page sizes.

I just tripped over this again when trying to use some of the 16K changes I have in my Netflix tree on a personal machine running a GENERIC kernel, so let's try this again in a different way.

Apr 18 2024, 12:07 AM

Apr 15 2024

gallatin accepted D44800: tcp bbr: improve code consistency.
Apr 15 2024, 8:13 PM

Apr 5 2024

gallatin accepted D43504: netinet: add a probe point for IP stats counters.

Thank you for adding that option.

Apr 5 2024, 5:50 PM

Apr 3 2024

gallatin added a comment to D43504: netinet: add a probe point for IP stats counters.

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established.

No SDT probes: 56.4%
normal SDT: 57.5%
new IP SDTs: 57.9%
new IP SDT+: 56.6%
zero-cost

Just to be clear, "SDT+" is with the patch I supplied to provide new asm goto-based SDT probes? I'm not sure what the "zero-cost" line means.

This is just measuring CPU usage as reported by the scheduler?

I made some progress on the hot-patching implementation last week. I hope to have it ready fairly soon.

Apr 3 2024, 7:05 PM
gallatin added a comment to D43504: netinet: add a probe point for IP stats counters.

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established.

No SDT probes: 56.4%
normal SDT: 57.5%
new IP SDTs: 57.9%
new IP SDT+: 56.6%
zero-cost

Just to be clear, "SDT+" is with the patch I supplied to provide new asm goto-based SDT probes? I'm not sure what the "zero-cost" line means.

This is just measuring CPU usage as reported by the scheduler?

I made some progress on the hot-patching implementation last week. I hope to have it ready fairly soon.

Apr 3 2024, 6:47 PM
gallatin added a comment to D43504: netinet: add a probe point for IP stats counters.

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established.

Apr 3 2024, 5:53 PM

Mar 29 2024

gallatin added a comment to D43504: netinet: add a probe point for IP stats counters.

OK, starting with an unpatched kernel & working my way through the patches. I'll report percent busy for unpatched and various patches on our original 100G server (based around Xeon E5-2697A v4, which tends to be a poster-child for cache misses, as it runs very close to the limits of its memory bandwidth). I'll be disabling powerd and using RACK TCP's DGP pacing.

This will take several days, as it takes a while to load up a server, get a few hours of steady-state, and unload it gently.

Mar 29 2024, 4:18 PM

Mar 26 2024

gallatin accepted D44420: Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold.
Mar 26 2024, 7:15 PM

Mar 25 2024

gallatin added a comment to D43504: netinet: add a probe point for IP stats counters.

OK, starting with an unpatched kernel & working my way through the patches. I'll report percent busy for unpatched and various patches on our original 100G server (based around Xeon E5-2697A v4, which tends to be a poster-child for cache misses, as it runs very close to the limits of its memory bandwidth). I'll be disabling powerd and using RACK TCP's DGP pacing.

Mar 25 2024, 3:43 PM

Mar 23 2024

gallatin added a comment to D43504: netinet: add a probe point for IP stats counters.

Regarding SDT hotpatching, the implementation[1] was written a long time ago, before we had "asm goto" in LLVM. It required a custom toolchain program[2].

Since then, "asm goto" support appeared in LLVM. It makes for a much simpler implementation. I hacked up part of it and posted a patch[3]. In particular, the patch makes use of asm goto to remove the branch and data access. (The probe site is moved to the end of the function in an unreachable block.) The actual hot-patching part isn't implemented and will take some more work, but this is enough to do some benchmarking to verify that the overhead really is minimal. @gallatin would you be able to verify this?

I would also appreciate any comments on the approach taken in the patch, keeping in mind that the MD bits are not yet implemented.

[1] https://people.freebsd.org/~markj/patches/sdt-zerocost/
[2] https://github.com/markjdb/sdtpatch
[3] https://reviews.freebsd.org/D44483
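
To make the "asm goto" idea concrete, here is a minimal sketch, assuming a GCC or Clang toolchain with `asm goto` support. It is not the patch referenced above: `add_one()` and `probe_fired` are hypothetical names, and no hot-patching is performed. The fast path contains only a `nop` with no branch or data access, while the probe body sits in an out-of-line block that the compiler keeps reachable solely through the asm statement's label list.

```c
static int probe_fired;	/* counts executions of the out-of-line probe block */

static int
add_one(int x)
{
	/*
	 * A single nop marks the probe site.  A hot-patching implementation
	 * would rewrite it into a jump to the "probe" label when the probe
	 * is enabled; here it is never rewritten, so the label is not taken.
	 */
	__asm__ goto ("nop" : : : : probe);
	return (x + 1);

probe:
	/* Out-of-line probe body, placed away from the fast path. */
	probe_fired++;
	return (x + 1);
}
```

Because nothing patches the `nop` in this sketch, the probe block never runs; that disabled-probe path is the case the benchmarking request above is about.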

Mar 23 2024, 9:26 PM