Page MenuHomeFreeBSD

Add ddb(4) 'netdump' command to netdump a core without preconfiguration
ClosedPublic

Authored by cem on Aug 29 2019, 12:17 AM.

Details

Summary

Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code. The imagined use is both netdump, shown here,
and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.

'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.

Loosely derived from work by: John Reimer <john.reimer AT emc.com>

Test Plan
$ sysctl debug.debugger_on_panic=1
$ dumpon off
$ sysctl debug.kdb.panic=1
...
db> netdump -s a.b.c.d -c ... -g ... -i vtnet0
debugnet: overwriting mbuf zone pointers
debugnet_connect: searching for gateway MAC...
debugnet_handle_arp: got server MAC address xx:xx:xx:xx:xx:xx
netdumping to a.b.c.d (xx:xx:xx:xx:xx:xx)
Dumping 97 out of 475 MB
...
..17%..33%..50%
..66%..82%..99%
netdump finished.
debugnet: restoring mbuf zone pointers

Dump complete

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem created this revision.Aug 29 2019, 12:17 AM
markj accepted this revision.Aug 30 2019, 3:34 PM
markj added inline comments.
sys/net/debugnet.c
817 ↗(On Diff #61438)

Maybe add a comment explaining why you need to take a ref? I would have assumed that it wasn't necessary, but netdump_configure() expects to be passed a ref'ed interface.

sys/netinet/netdump/netdump_client.c
748 ↗(On Diff #61438)

This should be documented in ddb.4.

789 ↗(On Diff #61438)

Typo: "Fake".

This revision is now accepted and ready to land.Aug 30 2019, 3:34 PM
cem added a comment.Aug 30 2019, 4:37 PM

Thanks!

sys/net/debugnet.c
817 ↗(On Diff #61438)

Yes, that's exactly it. I actually thought I had already added that comment but apparently not. Will fix, thanks.

(Yeah, and assuming this is in ddb(4) and !panic (which doesn't work yet for other reasons), netdump_configure will hold on to that nd_ifp and if_rele it later, if the device is removed or deconfigured. So this is for that case.)

sys/netinet/netdump/netdump_client.c
748 ↗(On Diff #61438)

Good catch, thanks. Will fix.

789 ↗(On Diff #61438)

'Fako' was actually how I intended to type it, but 'Fake' as a verb probably works as well or better. Will change.

cem added inline comments.Aug 30 2019, 5:00 PM
sys/net/debugnet.c
817 ↗(On Diff #61438)

I see, it's briefly alluded to in the output structure definition in netdump.h:

194         struct ifnet    *dd_ifp;        /* ref'd */

But not documented.

In fact, I don't think we actually need the ref at all, because the netdump DB_FUNC only grabs the if_name() and invokes netdump_configure(), which re-looks up the ifp and grabs its own ref.

So I'll add a comment regardless, but drop the ref'd ifp.

markj added inline comments.Aug 30 2019, 8:11 PM
sys/net/debugnet.c
817 ↗(On Diff #61438)

Oh, I didn't read netdump_configure() closely enough. I think you can just use ifunit() instead then, though that variant doesn't check for IFF_DYING.

cem updated this revision to Diff 61492.Aug 30 2019, 8:25 PM
cem marked 3 inline comments as done.
  • Drop unneeded ref, but add comment about why we don't need it
  • Fako -> Fake
  • Document in ddb.4; mild organizational changes while there to make sense of sometimes non-alphabetical command sorting.
This revision now requires review to proceed.Aug 30 2019, 8:25 PM
cem updated this revision to Diff 61493.Aug 30 2019, 8:29 PM
  • document in netdump.4 as well
markj accepted this revision.Aug 30 2019, 8:32 PM
markj added inline comments.
share/man/man4/ddb.4
1181 ↗(On Diff #61492)

Maybe "... any way to configure compression or encryption."?

sys/net/debugnet.c
866 ↗(On Diff #61493)

Would it make a bit more sense to check DEBUGNET_SUPPORTED_NIC before IFF_UP?

sys/netinet/netdump/netdump_client.c
806 ↗(On Diff #61493)

Missing newline.

This revision is now accepted and ready to land.Aug 30 2019, 8:32 PM
cem added inline comments.Aug 30 2019, 8:34 PM
sys/net/debugnet.c
817 ↗(On Diff #61438)

I did go ahead and use ifunit(). Re DYING, any thoughts on:

  • Just rely on netdump_configure's subsequent ifunit_ref()
  • explicitly check for IFF_DYING here
  • Assume IFF_DYING => !IFF_UP invariant
  • ifunit_ref() and immediately if_rele

In rough order of my preference.

cem added inline comments.Aug 30 2019, 8:37 PM
share/man/man4/ddb.4
1181 ↗(On Diff #61492)

Will do.

sys/net/debugnet.c
866 ↗(On Diff #61493)

It doesn't matter too much from a happy case or single failure perspective.

I guess the idea is if the NIC is down AND debugnet isn't supported, we'd rather the user find out netdump won't work on this NIC than that the NIC happened to be offline at the time? That's reasonable.

sys/netinet/netdump/netdump_client.c
806 ↗(On Diff #61493)

Good catch, thanks.

cem updated this revision to Diff 61494.Aug 30 2019, 8:45 PM
cem marked 3 inline comments as done.
  • wording
  • check supported before UP
  • newline
This revision now requires review to proceed.Aug 30 2019, 8:45 PM
markj added inline comments.Aug 30 2019, 9:02 PM
sys/net/debugnet.c
866 ↗(On Diff #61493)

Yeah, it just seems a bit more logical to check for "static" failure cases first.

817 ↗(On Diff #61438)

I think the first choice would be fine.

cem added a comment.Aug 30 2019, 10:46 PM

I think I got all the oustanding issues, let me know if you see any others or need more time. (I still have more WIP that will go to phab before any of this goes into SVN, so, no big hurry). Thanks for the review.

sys/net/debugnet.c
866 ↗(On Diff #61493)

(I think I just copied this order from netdump_client.)

866 ↗(On Diff #61493)

Yeah, that makes a lot of sense to me.

817 ↗(On Diff #61438)

Ok, cool.

cem updated this revision to Diff 61631.Sep 4 2019, 12:23 AM

Rebase on lexer API change.

markj accepted this revision.Sep 10 2019, 12:43 AM

The comment I made in D21482 about configuration from outside vnet0 really belongs here, but the change otherwise LGTM.

This revision is now accepted and ready to land.Sep 10 2019, 12:43 AM
cem added a comment.EditedSep 10 2019, 5:19 AM

@markj I've moved over discussion of netdump_configure() / CURVNET / etc from D21482 to here. I'll admit I don't understand VNETs at all, so I'm pretty naive about what is reasonable where. Given you have some idea of what the right thing to do is, I'm totally happy to do just do that.

(Copied for easy context, you wrote:)

This check doesn't really make sense for a DDB-invoked netdump. The panicking thread may well be in a non-default vnet, in which case we should switch to vnet0 for the purpose of the lookup, like you do for the routing table lookup above. For configuration coming from dumpon I think it still makes sense for now to reject the request when running in vnet0.

sys/netinet/netdump/netdump_client.c
405–409 ↗(On Diff #61631)

What's the best way of retaining this behavior for userspace configurers while using vtnet0 for the ddb caller added in this revision?

We could pass td=NULL from DDB and conditionalize on that basis in this function?

606 ↗(On Diff #61631)

Called from userspace here.

710 ↗(On Diff #61631)

And also kernel here.

781–782 ↗(On Diff #61631)

Invoked from ddb here.

In D21460#470271, @cem wrote:

@markj I've moved over discussion of netdump_configure() / CURVNET / etc from D21482 to here. I'll admit I don't understand VNETs at all, so I'm pretty naive about what is reasonable where. Given you have some idea of what the right thing to do is, I'm totally happy to do just do that.

A VNET basically corresponds to a copy of the network stack. Each VNET has its own copy of all of the global state related to networking (with some exceptions, for instance the mbuf UMA zones are not virtualized this way). When you access a virtualized global variable (prefixed with V_), the current thread's current vnet selects the copy of the global variable to use, so there is implicit context in what looks like an ordinary variable access. When VIMAGE is configured, there is one VNET created by default, vnet0.

VNETs are used for jails, but they are not required. With the existing check I was just trying to be conservative: I can't imagine a scenario in which a user in a non-default VNET should be configuring netdump via dumpon. However, if a thread in a non-default VNET panics we shouldn't prevent a DDB-invoked netdump, since that would be inconvenient if one was trying to debug an issue with the VNET code itself.

sys/netinet/netdump/netdump_client.c
405–409 ↗(On Diff #61631)

I think that's a reasonable way to check, yeah. Maybe at some point there will be another use for td and we will have to add a flag instead, but we can just handle that if and when it ever happens.

cem updated this revision to Diff 62369.Sep 21 2019, 2:43 AM
cem marked an inline comment as done.

Conditionalize curvnet check / which vnet we set in netdump_configure on the basis of caller (kernel sources pass NULL td).

This revision now requires review to proceed.Sep 21 2019, 2:43 AM
markj accepted this revision.Sep 23 2019, 9:37 PM
This revision is now accepted and ready to land.Sep 23 2019, 9:37 PM