Page MenuHomeFreeBSD

Add ddb(4) 'netdump' command to netdump a core without preconfiguration
ClosedPublic

Authored by cem on Aug 29 2019, 12:17 AM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 23 2024, 9:02 AM
Unknown Object (File)
Jan 27 2024, 7:05 PM
Unknown Object (File)
Jan 10 2024, 5:38 PM
Unknown Object (File)
Dec 25 2023, 10:51 PM
Unknown Object (File)
Dec 20 2023, 4:36 AM
Unknown Object (File)
Dec 12 2023, 11:17 AM
Unknown Object (File)
Dec 12 2023, 4:37 AM
Unknown Object (File)
Nov 29 2023, 3:20 PM
Subscribers

Details

Summary

Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code. The imagined use is both netdump, shown here,
and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.

'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.

Loosely derived from work by: John Reimer <john.reimer AT emc.com>

Test Plan
$ sysctl debug.debugger_on_panic=1
$ dumpon off
$ sysctl debug.kdb.panic=1
...
db> netdump -s a.b.c.d -c ... -g ... -i vtnet0
debugnet: overwriting mbuf zone pointers
debugnet_connect: searching for gateway MAC...
debugnet_handle_arp: got server MAC address xx:xx:xx:xx:xx:xx
netdumping to a.b.c.d (xx:xx:xx:xx:xx:xx)
Dumping 97 out of 475 MB
...
..17%..33%..50%
..66%..82%..99%
netdump finished.
debugnet: restoring mbuf zone pointers

Dump complete

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 26199
Build 24704: arc lint + arc unit

Event Timeline

markj added inline comments.
sys/net/debugnet.c
817

Maybe add a comment explaining why you need to take a ref? I would have assumed that it wasn't necessary, but netdump_configure() expects to be passed a ref'ed interface.

sys/netinet/netdump/netdump_client.c
748

This should be documented in ddb.4.

789

Typo: "Fake".

This revision is now accepted and ready to land.Aug 30 2019, 3:34 PM

Thanks!

sys/net/debugnet.c
817

Yes, that's exactly it. I actually thought I had already added that comment but apparently not. Will fix, thanks.

(Yeah, and assuming this is in ddb(4) and !panic (which doesn't work yet for other reasons), netdump_configure will hold on to that nd_ifp and if_rele it later, if the device is removed or deconfigured. So this is for that case.)

sys/netinet/netdump/netdump_client.c
748

Good catch, thanks. Will fix.

789

'Fako' was actually how I intended to type it, but 'Fake' as a verb probably works as well or better. Will change.

sys/net/debugnet.c
817

I see, it's briefly alluded to in the output structure definition in netdump.h:

194         struct ifnet    *dd_ifp;        /* ref'd */

But not documented.

In fact, I don't think we actually need the ref at all, because the netdump DB_FUNC only grabs the if_name() and invokes netdump_configure(), which re-looks up the ifp and grabs its own ref.

So I'll add a comment regardless, but drop the ref'd ifp.

sys/net/debugnet.c
817

Oh, I didn't read netdump_configure() closely enough. I think you can just use ifunit() instead then, though that variant doesn't check for IFF_DYING.

cem marked 3 inline comments as done.
  • Drop unneeded ref, but add comment about why we don't need it
  • Fako -> Fake
  • Document in ddb.4; mild organizational changes while there to make sense of sometimes non-alphabetical command sorting.
This revision now requires review to proceed.Aug 30 2019, 8:25 PM
  • document in netdump.4 as well
markj added inline comments.
share/man/man4/ddb.4
1181

Maybe "... any way to configure compression or encryption."?

sys/net/debugnet.c
866

Would it make a bit more sense to check DEBUGNET_SUPPORTED_NIC before IFF_UP?

sys/netinet/netdump/netdump_client.c
806

Missing newline.

This revision is now accepted and ready to land.Aug 30 2019, 8:32 PM
sys/net/debugnet.c
817

I did go ahead and use ifunit(). Re DYING, any thoughts on:

  • Just rely on netdump_configure's subsequent ifunit_ref()
  • explicitly check for IFF_DYING here
  • Assume IFF_DYING => !IFF_UP invariant
  • ifunit_ref() and immediately if_rele

In rough order of my preference.

share/man/man4/ddb.4
1181

Will do.

sys/net/debugnet.c
866

It doesn't matter too much from a happy case or single failure perspective.

I guess the idea is if the NIC is down AND debugnet isn't supported, we'd rather the user find out netdump won't work on this NIC than that the NIC happened to be offline at the time? That's reasonable.

sys/netinet/netdump/netdump_client.c
806

Good catch, thanks.

cem marked 3 inline comments as done.
  • wording
  • check supported before UP
  • newline
This revision now requires review to proceed.Aug 30 2019, 8:45 PM
sys/net/debugnet.c
817

I think the first choice would be fine.

866

Yeah, it just seems a bit more logical to check for "static" failure cases first.

I think I got all the oustanding issues, let me know if you see any others or need more time. (I still have more WIP that will go to phab before any of this goes into SVN, so, no big hurry). Thanks for the review.

sys/net/debugnet.c
817

Ok, cool.

866

(I think I just copied this order from netdump_client.)

866

Yeah, that makes a lot of sense to me.

Rebase on lexer API change.

The comment I made in D21482 about configuration from outside vnet0 really belongs here, but the change otherwise LGTM.

This revision is now accepted and ready to land.Sep 10 2019, 12:43 AM

@markj I've moved over discussion of netdump_configure() / CURVNET / etc from D21482 to here. I'll admit I don't understand VNETs at all, so I'm pretty naive about what is reasonable where. Given you have some idea of what the right thing to do is, I'm totally happy to do just do that.

(Copied for easy context, you wrote:)

This check doesn't really make sense for a DDB-invoked netdump. The panicking thread may well be in a non-default vnet, in which case we should switch to vnet0 for the purpose of the lookup, like you do for the routing table lookup above. For configuration coming from dumpon I think it still makes sense for now to reject the request when running in vnet0.

sys/netinet/netdump/netdump_client.c
405–409

What's the best way of retaining this behavior for userspace configurers while using vtnet0 for the ddb caller added in this revision?

We could pass td=NULL from DDB and conditionalize on that basis in this function?

606

Called from userspace here.

710

And also kernel here.

781–782

Invoked from ddb here.

In D21460#470271, @cem wrote:

@markj I've moved over discussion of netdump_configure() / CURVNET / etc from D21482 to here. I'll admit I don't understand VNETs at all, so I'm pretty naive about what is reasonable where. Given you have some idea of what the right thing to do is, I'm totally happy to do just do that.

A VNET basically corresponds to a copy of the network stack. Each VNET has its own copy of all of the global state related to networking (with some exceptions, for instance the mbuf UMA zones are not virtualized this way). When you access a virtualized global variable (prefixed with V_), the current thread's current vnet selects the copy of the global variable to use, so there is implicit context in what looks like an ordinary variable access. When VIMAGE is configured, there is one VNET created by default, vnet0.

VNETs are used for jails, but they are not required. With the existing check I was just trying to be conservative: I can't imagine a scenario in which a user in a non-default VNET should be configuring netdump via dumpon. However, if a thread in a non-default VNET panics we shouldn't prevent a DDB-invoked netdump, since that would be inconvenient if one was trying to debug an issue with the VNET code itself.

sys/netinet/netdump/netdump_client.c
405–409

I think that's a reasonable way to check, yeah. Maybe at some point there will be another use for td and we will have to add a flag instead, but we can just handle that if and when it ever happens.

cem marked an inline comment as done.

Conditionalize curvnet check / which vnet we set in netdump_configure on the basis of caller (kernel sources pass NULL td).

This revision now requires review to proceed.Sep 21 2019, 2:43 AM
This revision is now accepted and ready to land.Sep 23 2019, 9:37 PM