Page MenuHomeFreeBSD

nd6: Count packets dropped due to an invalid hop limit
ClosedPublic

Authored by markj on Sep 28 2020, 1:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Mar 22, 6:02 AM
Unknown Object (File)
Dec 21 2023, 10:28 PM
Unknown Object (File)
Nov 23 2023, 10:47 AM
Unknown Object (File)
Nov 15 2023, 7:16 PM
Unknown Object (File)
Nov 12 2023, 5:06 PM
Unknown Object (File)
Nov 12 2023, 12:26 PM
Unknown Object (File)
Nov 8 2023, 5:08 PM
Unknown Object (File)
Oct 24 2023, 10:07 AM

Details

Summary

Generally we should move away from using syslog messages to record
dropped packets, there's still quite a lot of this in the ND6 code.

Pad the icmp6stat structure so that we can add more counters without
breaking compatibility again, last done in r358620.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.Sep 28 2020, 1:21 PM

While I agree on the counter I am not yet sure I entirely want to lose all these messages.
Some are helpful in that they tell us things. If you are on multiple network as most of my devices are it can become a pain otherwise to retroactively debug.
Given none of them are logged by default (nd6_debug = 0 unless -DND6_DEBUG is used at compile time and nd6.h:#define nd6log(x) do { if (V_nd6_debug) log x; } while (/*CONSTCOND*/ 0) ) can we not just do both?

On the other hand I've had one of them (maybe not in this list) on my screens for months which were entirely unhelpful:
"RA with a lower CurHopLimit sent from fe80:... on em0 (current = 255, received = 64). Ignored."
It's been unhelpful as (a) I had to figure out how to reset it, and (b) I had no idea where the original limit change came from anymore.
So I wished that was a ndlog rather than a direct log() call.

In D26578#592011, @bz wrote:

While I agree on the counter I am not yet sure I entirely want to lose all these messages.
Some are helpful in that they tell us things. If you are on multiple network as most of my devices are it can become a pain otherwise to retroactively debug.
Given none of them are logged by default (nd6_debug = 0 unless -DND6_DEBUG is used at compile time and nd6.h:#define nd6log(x) do { if (V_nd6_debug) log x; } while (/*CONSTCOND*/ 0) ) can we not just do both?

Certainly we can, I forgot that nd6log() is disabled by default. In this case though I am skeptical that the messages are really useful - if hlim != 255 then presumably something is forwarding ND6 packets and so is quite broken. I can't find any reports of this on mailing lists for forums. It made sense to have verbose logging during earlier days of ND6, but it doesn't seem particularly useful anymore. The extra buffers also waste stack space, at least in principle - maybe the compiler is clever enough to avoid that.

If you prefer to keep these particular log messages I'll re-add them. Otherwise as I add other counters we can consider whether it makes sense to keep various log messages. Errors that might arise as a result of misconfiguration seem worth logging.

On the other hand I've had one of them (maybe not in this list) on my screens for months which were entirely unhelpful:
"RA with a lower CurHopLimit sent from fe80:... on em0 (current = 255, received = 64). Ignored."
It's been unhelpful as (a) I had to figure out how to reset it, and (b) I had no idea where the original limit change came from anymore.
So I wished that was a ndlog rather than a direct log() call.

The extra buffers also waste stack space, at least in principle - maybe the compiler is clever enough to avoid that.

In some modified functions the stack usage appears identical, in others it decreased by 16 bytes, so stack usage is not a compelling reason to remove the nd6log() calls.

While I'm all up for more statistic counter, I'd suggest keeping nd6log calls. They're turned off by default, but useful for debugging.

While I'm all up for more statistic counter, I'd suggest keeping nd6log calls. They're turned off by default, but useful for debugging.

Ok, I will re-add them.

  • Restore log messages.
  • Annotate hop limit checks with __predict_false.
sys/netinet/icmp6.h
643 ↗(On Diff #77884)

Will this work with older netstat(8)?
If yes, are spare fields needed?

sys/netinet/icmp6.h
643 ↗(On Diff #77884)

No, it will break older netstat(8), just as r358620 did. The purpose of adding spares here is to ensure that future additions don't have this effect.

sys/netinet/icmp6.h
643 ↗(On Diff #77884)

Given we pass the size of icmp6stats structure from userland, maybe it would be possible to explore adding a wrapper in ICMPV6CTL_STATS that would check userland buffer size and memcpy() bits from kernel structure appropriately?

sys/netinet/icmp6.h
643 ↗(On Diff #77884)

@markj @melifaro
I think a lot more of the counters usage has similar problems. (totally unrelated to this review). Just if you try to fins a solution, something that'll work for all of them would be nice ... not saying nv or similar ..

sys/netinet/icmp6.h
643 ↗(On Diff #77884)

Yes, there's ~15 or so structures with the same problem. There's a few approaches that could help make netstat more resilient:

  1. Make netstat query the kernel for the structure size before reading counters. It can read the full structure and cast it to its version of struct icmp6stat.
  2. Truncate the structure to fit the userspace buffer, as suggested.
  3. Add a bunch of spare fields to the structures in question so that this issue comes up less frequently.
  4. Come up with a kernel interface to allow netstat to read individual counters, so it can fetch only those it knows about.

I didn't pursue the problem here partly because the icmp6 KBI was already broken once since 12 was branched. I don't like 2) so much because some applications might want to know if there is a mismatch.

Any objection to committing the patch as it is? While having a backwards-compatible solution here would be nice, I don't think it should block the addition of new counters. As I pointed out, the ABI for icmpv6 stats is already broken with respect to stable/12.

This revision is now accepted and ready to land.Oct 13 2020, 8:27 PM

Any objection to committing the patch as it is? While having a backwards-compatible solution here would be nice, I don't think it should block the addition of new counters. As I pointed out, the ABI for icmpv6 stats is already broken with respect to stable/12.

Yeah. I experimented with versioned sysctl solutions (e.g. having net.inet.icmp.stats.%d, CURRENT_ICMP_STATS_VER as user-side sysctl and a generic sysctl macro/handlers allowing to easily support old&new structures), with net.inet.icmp.stats assuming the oldest version. Unfortunately, it turns out it requires changes in the sysctl core w.r.t. accepting callback handlers for non-leaf nodes, so it doesn't look like a fast way ahead.

Given ABI is already broken I have no objections for landing it as is :-)

P.S. thank you for improving the reporting!

sys/netinet/icmp6.h
643 ↗(On Diff #77884)

Thanks for summ

To followup on the discussion: I raised D27035, which introduces a generic mechanism to solve such issues. It also unbreaks FreeBSD12 compat for net.inet6.icmp6.stats.