⚙ D43504 netinet: add a probe point for IP stats counters

kp created this revision.Jan 18 2024, 8:03 PM

Herald added a reviewer: gnn. · View Herald TranscriptJan 18 2024, 8:03 PM

Herald added subscribers: glebius, melifaro, imp. · View Herald Transcript

kp requested review of this revision.Jan 18 2024, 8:03 PM

Harbormaster completed remote builds in B55486: Diff 132956.Jan 18 2024, 8:03 PM

Mostly posted as a discussion opener. If others also think this is a good idea I'll do the same for the netinet6 code.

Inspired by trying to track down a bug where netstat reports "2428 output packets dropped due to no bufs, etc.", which tells us something, but could still be either ip_fragment() or ip_output() failing. The idea behind the SDT is to be able to disambiguate these errors, and thus figure out why packets are getting dropped.

Would it make sense to implement a mib provider?

Does SDT_PROBE() add any instructions when dtrace.ko isn't loaded, yet?

zlei added a subscriber: zlei.Jan 19 2024, 3:59 AM

In D43504#991799, @glebius wrote:

Does SDT_PROBE() add any instructions when dtrace.ko isn't loaded, yet?

Yes, we end up with an extra branch:

/usr/src/sys/netinet/ip_output.c:371
                IPSTAT_INC(ips_localout);
     17f:       80 3d 00 00 00 00 00    cmpb   $0x0,0x0(%rip)        # 186 <ip_output+0x186>
     186:       0f 85 87 14 00 00       jne    1613 <ip_output+0x1613>

from

#define SDT_PROBES_ENABLED()    __predict_false(sdt_probes_enabled)

#define SDT_PROBE(prov, mod, func, name, arg0, arg1, arg2, arg3, arg4)  do {    \
        if (SDT_PROBES_ENABLED()) {                                             \
                if (__predict_false(sdt_##prov##_##mod##_##func##_##name->id))  \
                (*sdt_probe_func)(sdt_##prov##_##mod##_##func##_##name->id,     \
                    (uintptr_t) arg0, (uintptr_t) arg1, (uintptr_t) arg2,       \
                    (uintptr_t) arg3, (uintptr_t) arg4);                        \
        } \
} while (0)

I'd expect that to have no measurable impact, but I've not tested that to confirm.

In D43504#991789, @tuexen wrote:

Would it make sense to implement a mib provider?

We don't have any infrastructure for that in our dtrace at the moment.
It might be a reasonable idea, but if we go down that path we should do it for all counters (rather than piecemeal, one at a time), and that's a rather larger project than I'm interested in right now.

In D43504#991959, @kp wrote:

In D43504#991789, @tuexen wrote:

Would it make sense to implement a mib provider?

We don't have any infrastructure for that in our dtrace at the moment.
It might be a reasonable idea, but if we go down that path we should do it for all counters (rather than piecemeal, one at a time), and that's a rather larger project than I'm interested in right now.

I was actually thinking about adding it to more counters than only IPv6 and IPv6. I could help with the transport related counters. But I don't have much experience with setting up the infrastructure...

In D43504#991982, @tuexen wrote:

I was actually thinking about adding it to more counters than only IPv6 and IPv6. I could help with the transport related counters. But I don't have much experience with setting up the infrastructure...

Right. If we're going down this path (either like in this patch, or via the mib thing) we should indeed look at adding it for imp/icmp6/tcp/udp/sctp/... as well.

So after a closer look I figured out that 'mib' probe points in Illumos are just
SDTs in a 'mib' namespace. We don't really need any significant infrastructure
to put the probe points under 'mib'.

As an example I'ved added this to ICMP as well.

One (minor) issue is that this produces probe results like
'none:count icps_inhist[icp->icmp_type] 1'.
That is, we won't be able to distinguish different ICMP input types.
We could improve that, but it'd require more changes to ICMPSTAT_INC() and I'm
not sure that's worth it.

Another possible improvement would be to include the counter name in the probe
name, but that would also require us to pre-define many more probe points. We'd
also have to manually keep that in sync with the counters (i.e. define a new
probe point if we add a counter).
We can already filter them based on the 'name' argument
(/ stringof(arg0) == "ips_ofragments" /), so I don't think that's worth it.

Harbormaster completed remote builds in B55496: Diff 133029.Jan 19 2024, 1:10 PM

Great. Is it possible that args[0] contains the value with which the counter is to be incremented. This would keep it compatible with the Solaris version.

In D43504#992200, @tuexen wrote:

Great. Is it possible that args[0] contains the value with which the counter is to be incremented. This would keep it compatible with the Solaris version.

We could easily swap the order of the arguments. It feels a bit odd though, and our arguments are always going to be different from Solaris anyway, so I’m not sure that’s all that useful.
I don’t have strong feelings either way though.

Though it's kind of tedious to implement (absent some clever trick), here's a reason to have a separate probe per counter: enabling a single counter probe means that you're not adding overhead to every code path which increments a counter. If the probe corresponds to a rare event, that might be a significant benefit. Calling dtrace_probe() just to do a strcmp() isn't cheap, and with the current design you'd have to do that at least once per packet no matter which counter you're interested in.

In D43504#992202, @kp wrote:

In D43504#992200, @tuexen wrote:

Great. Is it possible that args[0] contains the value with which the counter is to be incremented. This would keep it compatible with the Solaris version.

We could easily swap the order of the arguments. It feels a bit odd though, and our arguments are always going to be different from Solaris anyway, so I’m not sure that’s all that useful.
I don’t have strong feelings either way though.

Maybe others do? I think keeping it as consistent as possible without too much work is a good thing.

In D43504#992204, @markj wrote:

Though it's kind of tedious to implement (absent some clever trick), here's a reason to have a separate probe per counter: enabling a single counter probe means that you're not adding overhead to every code path which increments a counter. If the probe corresponds to a rare event, that might be a significant benefit. Calling dtrace_probe() just to do a strcmp() isn't cheap, and with the current design you'd have to do that at least once per packet no matter which counter you're interested in.

That's a good point. On the other hand, it's better to have a "big hammer" probe point than no probe point at all.

We do have several very similar stats structs (e.g. for ip, ip6, icmp, icmp6, tcp, udp, sctp) so it might be worth doing something clever-ish with macros to do this. Or, worst case, if we have to track it manually I'd expect compile errors if we forget to add a probe definition, so we're unlikely to forget to add them for any new counters we end up adding.

I'll see if I can come up with something on Monday.

In D43504#992329, @kp wrote:

In D43504#992204, @markj wrote:

Though it's kind of tedious to implement (absent some clever trick), here's a reason to have a separate probe per counter: enabling a single counter probe means that you're not adding overhead to every code path which increments a counter. If the probe corresponds to a rare event, that might be a significant benefit. Calling dtrace_probe() just to do a strcmp() isn't cheap, and with the current design you'd have to do that at least once per packet no matter which counter you're interested in.

That's a good point. On the other hand, it's better to have a "big hammer" probe point than no probe point at all.

We do have several very similar stats structs (e.g. for ip, ip6, icmp, icmp6, tcp, udp, sctp) so it might be worth doing something clever-ish with macros to do this. Or, worst case, if we have to track it manually I'd expect compile errors if we forget to add a probe definition, so we're unlikely to forget to add them for any new counters we end up adding.

I'll see if I can come up with something on Monday.

I am willing to some of the work. I could also start with SCTP, since code changes there have a limited impact. It would just be great to have a way to do the right thing. Since there is no urgency, we can also do one protocol after the other. I am willing to deal with SCTP, TCP, and UDP. I also have some stuff locally for UDP-Lite.

Have separate probes for each field.

I didn't try to do any preprocessor magic for this, it's straightforward enough
to just enumerate them manually, and if we miss one the compiler will be happy
to tell us, so it didn't seem worth it.

This version adds probes to IP, IPv6 and ICMP. If we agree on this approach we
should do the same to ICMPv6, UDP and TCP at least. Maybe also SCTP.

Note that there are very few changes in the actual IP stack code itself. The only
required change is where we distinguish array entries in e.g. icps_inhist or ip6s_nxthist.

Herald added a subscriber: ae. · View Herald TranscriptJan 22 2024, 8:00 PM

Harbormaster completed remote builds in B55546: Diff 133167.Jan 22 2024, 8:00 PM

I'm wondering why we use icmp, ip, and ip6 as module and count as functions. I think in the Solaris implementation module and function are unspecified. Using a name different from what is used by Solaris is fine for me. So I really like the direction this patch is taking.

In D43504#993167, @tuexen wrote:

I'm wondering why we use icmp, ip, and ip6 as module and count as functions. I think in the Solaris implementation module and function are unspecified. Using a name different from what is used by Solaris is fine for me. So I really like the direction this patch is taking.

I'm including icmp, etc to distinguish the different counters more easily. While we could tell from the name (e.g. ips_total is IP, and ip6s_total is IPv6) it's rather easier to distinguish them if we also include that.
It also means that we can easily list all counters for a given protocol (e.g. dtrace -n 'mib:icmp:count: { printf("%d %d", arg0, arg1); }').

The 'count' is mostly a holdover from the first draft of this patch. We could remove it and have 'count' be implicit because it's in the 'mib' provider, but I'd keep it. It doesn't get in the way of anything, and if we ever decide to add extra probes that might belong in the 'mib' provider we can distinguish them. I could image us someday adding probes for when the counters are read or reset for example.

OK. I am fine with keeping the additional information.

This revision is now accepted and ready to land.Jan 23 2024, 10:44 AM

Include icmp6, udp and tcp counters.

This revision now requires review to proceed.Jan 23 2024, 3:38 PM

Herald added a reviewer: transport. · View Herald TranscriptJan 23 2024, 3:38 PM

Harbormaster completed remote builds in B55572: Diff 133200.Jan 23 2024, 3:38 PM

The real intent behind the "function" field is that the kernel will automatically populate it with the C function containing the same probe. The idea being that you have the same provider:::name probe in two C functions, dtrace will let you enable just one of the two probes if you specify the function name. Currently that's not implemented on FreeBSD but my preference would be to avoid using it. The same applies to the "module" identifier, which on FreeBSD does get autopopulated. There are some existing probes which violate this rule. Really it's a bug that the SDT macros let you specify them at all but it's hard to fix at this point.

I don't really see the downside of having the "module" be part of the probe name. You can list probes matching a glob, so something like dtrace -ln 'mib:::icmp-*' would work too.

sys/netinet/in_kdtrace.c
35	Where is this used?

In D43504#993300, @markj wrote:

The real intent behind the "function" field is that the kernel will automatically populate it with the C function containing the same probe. The idea being that you have the same provider:::name probe in two C functions, dtrace will let you enable just one of the two probes if you specify the function name. Currently that's not implemented on FreeBSD but my preference would be to avoid using it. The same applies to the "module" identifier, which on FreeBSD does get autopopulated. There are some existing probes which violate this rule. Really it's a bug that the SDT macros let you specify them at all but it's hard to fix at this point.

I don't really see the downside of having the "module" be part of the probe name. You can list probes matching a glob, so something like dtrace -ln 'mib:::icmp-*' would work too.

It shouldn't be very hard to change the current names.
Do we want mib:icmp::count-icps_error or mib:::icmp-icps_error or mib:icmp::count-icps_error or something else yet?

sys/netinet/in_kdtrace.c
35	Nowhere. That's a think-o and I'll remove it.

I added Kostik since he was behind making counter(9) so efficient and also Drew, as he cares about multi-pps performance.

gallatin added a reviewer: olivier.Jan 23 2024, 6:41 PM

In D43504#991958, @kp wrote:

I'd expect that to have no measurable impact, but I've not tested that to confirm.

I added Olivier to the reviewers. Hopefully he can test this in his packet forwarding setup with slow machines.
I'm really not eager to have an extra instruction all over the place on every counter increment. Did you consider adding SDT probes for just the error counters, and changing the INC function to an ERR_INC or something similar? That way the happy path does not pay any cost, while the errors are still instrumented.

I do feel your pain, and one way i've debugged this stuff in the past was by looking at stack traces when mbufs are freed. That works sometimes.. I actually wonder if the new dtrace instruction tracing might be useful here. I think i'll add Christos to the review as well.

gallatin added reviewers: markj, christos.Jan 23 2024, 6:47 PM

In D43504#993475, @gallatin wrote:

In D43504#991958, @kp wrote:

I'd expect that to have no measurable impact, but I've not tested that to confirm.

I added Olivier to the reviewers. Hopefully he can test this in his packet forwarding setup with slow machines.
I'm really not eager to have an extra instruction all over the place on every counter increment. Did you consider adding SDT probes for just the error counters, and changing the INC function to an ERR_INC or something similar? That way the happy path does not pay any cost, while the errors are still instrumented.

I didn't think of that specifically, no.

The downsides of that approach are that they'd require more changes in the existing code and that it's not always obvious which counter you'd need to be looking at to debug a given problem. It'd probably be okay to not instrument counters like say tcps_sndbyte, but I could see say ips_forward being useful, even if it's not strictly an error counter.

Still, having just the error counters would already be helpful in many cases, so that's something we could look at if the impact is measurable.

I do feel your pain, and one way i've debugged this stuff in the past was by looking at stack traces when mbufs are freed. That works sometimes.. I actually wonder if the new dtrace instruction tracing might be useful here. I think i'll add Christos to the review as well.

Wouldn't we have to have a pretty good idea of which specific call to a counter we'd have to look at? That is, we'd have to separately instrument each occurrence of say IPSTAT_INC(ips_fragdropped); (and we'd have to remember to also instrument IPSTAT_ADD(ips_fragdropped, fp->ipq_nfrags);) if we're looking for dropped fragments.
Whereas with the SDT approach we have a single mib:icmp:count:ips_fragdropped probe for all of those (and can use stack() to tell us which specific one triggered).

In D43504#993448, @kp wrote:

In D43504#993300, @markj wrote:

The real intent behind the "function" field is that the kernel will automatically populate it with the C function containing the same probe. The idea being that you have the same provider:::name probe in two C functions, dtrace will let you enable just one of the two probes if you specify the function name. Currently that's not implemented on FreeBSD but my preference would be to avoid using it. The same applies to the "module" identifier, which on FreeBSD does get autopopulated. There are some existing probes which violate this rule. Really it's a bug that the SDT macros let you specify them at all but it's hard to fix at this point.

I don't really see the downside of having the "module" be part of the probe name. You can list probes matching a glob, so something like dtrace -ln 'mib:::icmp-*' would work too.

It shouldn't be very hard to change the current names.
Do we want mib:icmp::count-icps_error or mib:::icmp-icps_error or mib:icmp::count-icps_error or something else yet?

Why no mib:::icps_foo? Not perfectly, but for TCP it would be mib:::tcps_foo...

In D43504#993475, @gallatin wrote:

Why no mib:::icps_foo? Not perfectly, but for TCP it would be mib:::tcps_foo...

I have a mild preference for being slightly more explicit, because ips_tooshort vs. icps_tooshort doesn't really make it obvious that one is IP and the other ICMP. It's not too much of a problem, because the intended use case is to be looking at the code while probing these, but it also doesn't really cost us anything to do mib:ip::ips_tooshort and make it all explicit.

In D43504#993624, @kp wrote:

In D43504#993475, @gallatin wrote:

Why no mib:::icps_foo? Not perfectly, but for TCP it would be mib:::tcps_foo...

I have a mild preference for being slightly more explicit, because ips_tooshort vs. icps_tooshort doesn't really make it obvious that one is IP and the other ICMP. It's not too much of a problem, because the intended use case is to be looking at the code while probing these, but it also doesn't really cost us anything to do mib:ip::ips_tooshort and make it all explicit.

Then why not change icps_tooshort to icmps_tooshort? I guess the other protocols have a proper prefix.

In D43504#993625, @tuexen wrote:

Then why not change icps_tooshort to icmps_tooshort? I guess the other protocols have a proper prefix.

I don't mind the name enough to go change the existing icmp code to that extent. It's not that important to me.

I'll wait for the performance impact tests and then update the patch to follow the mib:::icps_foo naming scheme (and maybe remove probes from some of the more frequently updated counters).

In D43504#994028, @kp wrote:

I'll wait for the performance impact tests

In my initial testing I've not been able to show a difference with or without this patch. I expect @olivier 's tests are more robust than mine though.

I'd like to land this patch. Absent anyone raising objections I intend to do so in two weeks or so.

tuexen accepted this revision as: transport, tuexen.Mar 21 2024, 3:47 AM

This revision is now accepted and ready to land.Mar 21 2024, 3:47 AM

LGTM. Do it!

To put it lightly, I'd really like to see this patch performance tested. To put it straight - I object. It basically ruins the whole purpose of counter(9) being used all around in network stack as a single instruction lossless counter. It is no longer a single instruction.

I can suggest two options:

Provide additional compile option COUNTER_TRACE that will uninline counter_u64_add() and allow you to dtrace it as fbt. A developer option, like EPOCH_TRACE, that has performance impact, but allow for more debugging.
Reduce the existing patch to add SDT probes only to slowly growing error counters. This would however still have negative performance impact on a DDoS resiliency, but at least won't have impact on normal runtime.

We didn't hear from Christos. Maybe what you want to achieve can be achieved with kinst easily?

Guys, this is crazy. Every SDT probe does a test on a global variable. If this lands, it will cause a noticeable performance impact, especially in high packet rate workloads. Can we shelve this until / unless SDT is modified to insert nops rather than do tests on a global variable? Or put this under its own options EXTRA_IP_PROBES or something?

This revision now requires changes to proceed.Mar 21 2024, 5:46 PM

In D43504#1013965, @gallatin wrote:

Guys, this is crazy. Every SDT probe does a test on a global variable. If this lands, it will cause a noticeable performance impact, especially in high packet rate workloads. Can we shelve this until / unless SDT is modified to insert nops rather than do tests on a global variable? Or put this under its own options EXTRA_IP_PROBES or something?

OK, I went back through my notes. Even before this patch, we see about a 1% speedup on a busy web server if we compile out just SDT probes. This is not just from the cost savings of the test, but because the code we execute boils down to this:

if (__predict_false(sdt_enabled)) {

if (__predict_false(std_probe.id))
        sdt_probe.function(....);

}

And the cpu takes a miss when it speculatively fetches the sdt_probe.function, even if it never actually calls it (thanks to vtune for revealing that).

I did a patch at netfllix to add a KDTRACE_SDT_HOOKS option, so that we could compile out SDT probes. But then I realized that lockstat depends on SDT, and that was too valuable to loose. If this goes in, I'll have to re-think that.

I know that @markj had a zero-cost way to do sdt probes. I'm begging you to wait for that before merging this.

I agree, that introducing these probes should not have a performance hit. Since they exist in Solaris, I was assuming that there is no substantial performance hit. I would really like to see zero cost probes going into the tree.

tuexen requested changes to this revision.Mar 21 2024, 7:47 PM

In D43504#1013980, @tuexen wrote:

I agree, that introducing these probes should not have a performance hit. Since they exist in Solaris, I was assuming that there is no substantial performance hit. I would really like to see zero cost probes going into the tree.

I am fairly sure that probes in Solaris are "free" due to inserted nops being re-written when probes are activated. I defer to @markj on this...

In D43504#1013955, @glebius wrote:

To put it lightly, I'd really like to see this patch performance tested.

As would I. In the local testing I've been able to do I've not seen any performance difference, but I'm very aware that my local testing is not going to be as detailed or comprehensive as the test setups at Netflix.
Which is why I've been asking for such testing.

I'll hold off on committing this until we have agreement.

We didn't hear from Christos. Maybe what you want to achieve can be achieved with kinst easily?

I'd expect that to require probe points for every possible place the relevant counter can be incremented, but I'm not very familiar with the kinst probes.

Regarding SDT hotpatching, the implementation[1] was written a long time ago, before we had "asm goto" in LLVM. It required a custom toolchain program[2].

Since then, "asm goto" support appeared in LLVM. It makes for a much simpler implementation. I hacked up part of it and posted a patch[3]. In particular, the patch makes use of asm goto to remove the branch and data access. (The probe site is moved to the end of the function in an unreachable block.) The actual hot-patching part isn't implemented and will take some more work, but this is enough to do some benchmarking to verify that the overhead really is minimal. @gallatin would you be able to verify this?

I would also appreciate any comments on the approach taken in the patch, keeping in mind that the MD bits are not yet implemented.

[1] https://people.freebsd.org/~markj/patches/sdt-zerocost/
[2] https://github.com/markjdb/sdtpatch
[3] https://reviews.freebsd.org/D44483

In D43504#1014487, @markj wrote:

Regarding SDT hotpatching, the implementation[1] was written a long time ago, before we had "asm goto" in LLVM. It required a custom toolchain program[2].

Since then, "asm goto" support appeared in LLVM. It makes for a much simpler implementation. I hacked up part of it and posted a patch[3]. In particular, the patch makes use of asm goto to remove the branch and data access. (The probe site is moved to the end of the function in an unreachable block.) The actual hot-patching part isn't implemented and will take some more work, but this is enough to do some benchmarking to verify that the overhead really is minimal. @gallatin would you be able to verify this?

I would also appreciate any comments on the approach taken in the patch, keeping in mind that the MD bits are not yet implemented.

[1] https://people.freebsd.org/~markj/patches/sdt-zerocost/
[2] https://github.com/markjdb/sdtpatch
[3] https://reviews.freebsd.org/D44483

I think what I will test is:

SDT probes compiled out entirely
This (D43504) patch with our current SDT mechanism
D43504 + D44483 together

Does that sound appropriate?

@olivier : Do you have the time to do the same sort of test on your low-end pps routing setup?

In D43504#1014598, @gallatin wrote:

I think what I will test is:

SDT probes compiled out entirely

This (D43504) patch with our current SDT mechanism

D43504 + D44483 together

Does that sound appropriate?

@olivier : Do you have the time to do the same sort of test on your low-end pps routing setup?

I wonder if it'd be worth adding 'SDT probes compiled in, but without this patch (D43504)' too.
That'd give us an idea of this patch's cost, independent of the general cost of the other SDT probes in the system.

On the other hand, if 'D43504 + D44483' gives more or less the same number as 'SDT probes compiled out entirely' we know what we want anyway.

In D43504#1014628, @kp wrote:

In D43504#1014598, @gallatin wrote:

I think what I will test is:

SDT probes compiled out entirely

This (D43504) patch with our current SDT mechanism

D43504 + D44483 together

Does that sound appropriate?

@olivier : Do you have the time to do the same sort of test on your low-end pps routing setup?

I wonder if it'd be worth adding 'SDT probes compiled in, but without this patch (D43504)' too.
That'd give us an idea of this patch's cost, independent of the general cost of the other SDT probes in the system.

My intuition is that this (D43504) patch's overhead will be pretty marginal, given that we already have probes in all the protocol data paths and in most of the commonly used locking primitives.

On the other hand, if 'D43504 + D44483' gives more or less the same number as 'SDT probes compiled out entirely' we know what we want anyway.

OK, starting with an unpatched kernel & working my way through the patches. I'll report percent busy for unpatched and various patches on our original 100G server (based around Xeon E5-2697A v4, which tends to be a poster-child for cache misses, as it runs very close to the limits of its memory bandwidth. I'll be disabling powerd and using TCP RACK TCP's DGP pacing.

This will take several days, as it takes a while to load up a server, get a few hours of steady-state, and unload it gently,.

In D43504#1014924, @gallatin wrote:

OK, starting with an unpatched kernel & working my way through the patches. I'll report percent busy for unpatched and various patches on our original 100G server (based around Xeon E5-2697A v4, which tends to be a poster-child for cache misses, as it runs very close to the limits of its memory bandwidth. I'll be disabling powerd and using TCP RACK TCP's DGP pacing.

This will take several days, as it takes a while to load up a server, get a few hours of steady-state, and unload it gently,.

I lost my benchmark box yesterday. It will take me a few days to wrangle another one into shape for testing. I'm sorry this is taking so long. If you want to go ahead and push this, I'd understand..

In D43504#1016001, @gallatin wrote:

I lost my benchmark box yesterday. It will take me a few days to wrangle another one into shape for testing. I'm sorry this is taking so long. If you want to go ahead and push this, I'd understand..

No worries, things happen. I'm glad that it's progressing and am happy to wait for your results.

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established/

No SDT probes: 56.4%
normal SDT 57.5%
new IP SDTs 57.9%
new IP SDT+ 56.6%
zero-cost

[The No SDT case is using my old patch to compile out just SDT probes.]

The results kind of make sense.. This patch increases the number of SDT probes reported by nm on the kernel by roughly 50%. Its less than I expected, but I'm guessing some of the cache misses are avoided on subsequent probes.
Its not super bad, and it seems like it will cost nothing when the zero-cost SDT work goes in.

In the meantime, could you have a compile option to avoid these probes? Maybe re-do your new macros to use IP_SDT_PROBE" which can be defined to a noop if "NO_EXTRA_IP_SDT_PROBES" or some better named kernel config option is present?
That way you get your probes, and I don't get the overhead? That would be the best of both worlds

In D43504#1017242, @gallatin wrote:

In the meantime, could you have a compile option to avoid these probes? Maybe re-do your new macros to use IP_SDT_PROBE" which can be defined to a noop if "NO_EXTRA_IP_SDT_PROBES" or some better named kernel config option is present?

I'll try to do that in the next couple of days.

In D43504#1017242, @gallatin wrote:

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established/

No SDT probes: 56.4%
normal SDT 57.5%
new IP SDTs 57.9%
new IP SDT+ 56.6%
zero-cost

Just to be clear, "SDT+" is with the patch I supplied to provide new asm goto-based SDT probes? I'm not sure what the "zero-cost" line means.

This is just measuring CPU usage as reported by the scheduler?

I made some progress on the hot-patching implementation last week. I hope to have it ready fairly soon.

In D43504#1017265, @markj wrote:

In D43504#1017242, @gallatin wrote:

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established/

No SDT probes: 56.4%
normal SDT 57.5%
new IP SDTs 57.9%
new IP SDT+ 56.6%
zero-cost

Just to be clear, "SDT+" is with the patch I supplied to provide new asm goto-based SDT probes? I'm not sure what the "zero-cost" line means.

This is just measuring CPU usage as reported by the scheduler?

I made some progress on the hot-patching implementation last week. I hope to have it ready fairly soon.

Yes, sorry. I was thinking of the asm-goto as "zero cost".

In D43504#1017265, @markj wrote:

In D43504#1017242, @gallatin wrote:

Below are the results from my testing. I'm sorry that it took so long.. I had to re-do testing from the start b/c the new machine was not exactly identical to the old (different BIOS rev) and was giving slightly different results.
The results are from 92Gb/s of traffic over a one hour period with 45-47K TCP connections established/

No SDT probes: 56.4%
normal SDT 57.5%
new IP SDTs 57.9%
new IP SDT+ 56.6%
zero-cost

Just to be clear, "SDT+" is with the patch I supplied to provide new asm goto-based SDT probes? I'm not sure what the "zero-cost" line means.

This is just measuring CPU usage as reported by the scheduler?

I made some progress on the hot-patching implementation last week. I hope to have it ready fairly soon.

And yes. this is reporting load via 100%-idle% as measured using the same stats vmstat, etc. uses

Allow MIB SDT's to be disabled.

Set 'options KDTRACE_NO_MIB_SDT' in the kernel config to do so.

Harbormaster completed remote builds in B56922: Diff 136556.Apr 5 2024, 8:08 AM

Thank you for adding that option.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 8 2024, 3:31 PM

Closed by commit rG60d8dbbef075: netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters (authored by kp). · Explain Why

This revision was automatically updated to reflect the committed changes.

kp added a commit: rG60d8dbbef075: netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters.

In D43504#1014487, @markj wrote:

I would also appreciate any comments on the approach taken in the patch, keeping in mind that the MD bits are not yet implemented.

[3] https://reviews.freebsd.org/D44483

The implementation is done now. Feedback and further testing is still welcome.

netinet: add a probe point for IP stats counters
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 136679

sys/conf/options

sys/netinet/icmp6.h

sys/netinet/icmp_var.h

sys/netinet/in_kdtrace.h

sys/netinet/in_kdtrace.c

sys/netinet/ip_icmp.c

sys/netinet/ip_var.h

sys/netinet/tcp_var.h

sys/netinet/udp_var.h

sys/netinet6/icmp6.c

sys/netinet6/in6_src.c

sys/netinet6/ip6_input.c

sys/netinet6/ip6_var.h

sys/netinet6/mld6.c

sys/netinet6/nd6_nbr.c

sys/netinet6/raw_ip6.c

sys/sys/sdt.h

netinet: add a probe point for IP stats countersClosedPublicActions