Page MenuHomeFreeBSD

TCP packet capture for debugging
Needs RevisionPublic

Authored by jonlooney_gmail.com on Jul 15 2015, 3:04 PM.
Referenced Files
Unknown Object (File)
Mon, Apr 8, 8:14 AM
Unknown Object (File)
Mar 24 2024, 6:12 PM
Unknown Object (File)
Dec 19 2023, 11:55 AM
Unknown Object (File)
Nov 13 2023, 12:29 PM
Unknown Object (File)
Nov 10 2023, 12:07 AM
Unknown Object (File)
Nov 9 2023, 8:20 PM
Unknown Object (File)
Nov 7 2023, 7:08 PM
Unknown Object (File)
Nov 6 2023, 2:19 PM

Details

Summary

There are times when it would be really nice to have a record of the last few packets and/or state transitions from each TCP socket. That would help with narrowing down certain problems we see in the field that are hard to reproduce without understanding the history of how we got into a certain state.

I wrote code to implement such a feature. It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is destroyed, the list is freed. I thought this was likely to be more performance-friendly than saving copies of the tcpcb. Plus, with the packets, you should be able to reverse-engineer what happened to the tcpcb.

To enable the feature, you will need to compile a kernel with the TCPPCAP option. Even then, the feature defaults to being deactivated. You can activate it by setting a positive value for the number of captured packets. You can do that on either a global basis or on a per-socket basis (via a setsockopt call).

There is no way to get the packets out of the kernel other than using kmem or getting a coredump. I thought that would help some of the legal/privacy concerns regarding such a feature. However, it should be possible to add a future effort to export them in PCAP format.

I tested this at low scale, and found that there were no mbuf leaks and the peak mbuf usage appeared to be unchanged with and without the feature.

The main performance concern I can envision is the number of mbufs that would be used on systems with a large number of sockets. If you save five packets per direction per socket and have 3,000 sockets, that will consume at least 30,000 mbufs just to keep these packets. I tried to reduce the concerns associated with this by limiting the number of clusters (not mbufs) that could be used for this feature. Again, in my testing, that appears to work correctly.

Test Plan

Tested mbuf usage with the feature disabled and found that there were no extra mbufs used (i.e. the feature really was disabled).

Tested mbuf usage with the feature enabled and found that only the expected number of mbufs per socket were used. Also, I used a kernel debugger to ensure the packets were correctly captured in the TCPCB.

Tested that the overall mbuf cluster limit worked correctly.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

jonlooney_gmail.com retitled this revision from to TCP packet capture for debugging.
jonlooney_gmail.com updated this object.
jonlooney_gmail.com edited the test plan for this revision. (Show Details)
jonlooney_gmail.com set the repository for this revision to rS FreeBSD src repository - subversion.
jonlooney_gmail.com added a project: network.
jonlooney_gmail.com added a subscriber: network.
gnn requested changes to this revision.Jul 16 2015, 2:46 PM
gnn added a reviewer: gnn.
gnn added a subscriber: gnn.

I've mentioned a few changes above. I am hoping this will get a wider review and then it can be committed. A very interesting and useful bit of code.

sys/netinet/tcp_input.c
1538

Nit pick. Can you make all of these 3 liners into one liner comments?

sys/netinet/tcp_pcap.c
85

If you take a look at flow6.c you'll see a way to have this be dynamic in the case where the number of clusters is changed via sysctl(). I'd suggest that mechanism.

180

Can't one of the standard m_ functions do this?

185

No data would be the case for a pure ACK would it not?

329

Weirdly long line? Is this a problem in phabricator or in the source file?

This revision now requires changes to proceed.Jul 16 2015, 2:46 PM

Thanks for the feedback!

I'll fix the three-line comments. Can you let me know what action you want me to take with respect to the comments where I responded with questions? Also, let me know if the information with respect to TCP ACKs was sufficient to answer your query.

sys/netinet/tcp_input.c
1538

Yes, will do.

sys/netinet/tcp_pcap.c
85

I couldn't find flow6.c in the source tree. Can you point me to the location of the file?

Did you mean frag6.c, by any chance? If so, I used the same mechanism here that is used in that file.

180

I couldn't find one that would do this.

m_pullup(m, 1) would trim leading empty mbufs as a side-effect. But, I don't think its really the right tool for this job.

We could always create a new m_ inline function that just finds the first mbuf with data. If you want that, let me know what name you want me to use.

185

Yes and no.

In this case, "no data" means that the m_len of all the mbufs in the chain was 0. I expect to at least see a TCP header in an mbuf (making its m_len non-0). That's why I consider this a highly unusual situation.

Because we also get a pointer to the tcp header, we can just bcopy() the data from there (wherever "there" is) to the destination mbuf.

I don't actually expect to directly hit this code; but, its a convenient fallback when something goes wrong or we encounter something unexpected. It makes sure we at least get the tcp header.

329

I think its a phabricator display problem. It is only one byte longer than line 313 and both have the same leading space in my local text copy of the diff.

emaste added inline comments.
sys/netinet/tcp_pcap.c
14–16

Careful when copying existing license text - you don't want "University" here (and probably want only a 2-clause license anyhow). You can grab the text from the beginning of the src/COPYRIGHT file.

18

And not REGENTS.

jonlooney_gmail.com edited edge metadata.
jonlooney_gmail.com marked an inline comment as done.
jonlooney_gmail.com updated this object.

Addressed feedback:

  • Changed three-line comments to one-liners.
  • Updated the copyright notice on the new files.
  • Updated a few comments.
jonlooney_gmail.com added inline comments.
sys/netinet/tcp_pcap.c
185

I updated the comment to make this situation more clear. Let me know if you think this is sufficient.

gnn edited edge metadata.

I'm fine with the changes.

sys/netinet/tcp_pcap.c
180

I think I was thinking of m_adj() but maybe that's not quite right.

This revision is now accepted and ready to land.Jul 17 2015, 1:19 AM
hiren edited edge metadata.

Very good work :-)

It looks good to me but I'd suggest someone more experienced in this area also reviews the code before commit.

sys/netinet/tcp_pcap.c
60

Make sure it's not > 80 columns.

63

Make sure it's not > 80 columns.

71

What does "Default" convey here? Isn't just "Number of packets served per direction" a better description?

98

Should we print an error or a warning when we hit the ceiling? I'd leave that up to you. Hum...we do go totcp_pcap_copy_bestfit() in that case, right? I guess we can leave it as is.

sys/netinet/tcp_var.h
215

O_o !!! :-)

It seems to be in good shape.
I'm planning to commit this early next week if none objects.

jonlooney_gmail.com added inline comments.
sys/netinet/tcp_pcap.c
60

It looks like its 79 columns.

63

It looks like its 79 columns.

71

I labeled this as "default" because it can be over-ridden on a per-socket basis.

98

That's right. Even if we can't take a reference to the cluster, we still try to get whatever data we can fit in the mbuf's internal storage.

bdrewery added a reviewer: bdrewery.
bdrewery added a subscriber: bdrewery.

Style bugs.

  • Every function with no vars should have an empty line at the top.
  • Compare ptrs to NULL (KASSERTS lacking this and many if statements, I stopped pointing out specifics)
sys/netinet/tcp_pcap.c
111

m != NULL

150

spaces before/after =, typically don't assign here either.

173–176

Unneeded {}, but worse is not having '} else {' - don't put a newline after }.

196

m_cur != NULL

219–220

} else {

237

m_cur != NULL

261

m != NULL

265

NULL

286

!= NULL

291–292

} else { (but also unneeded braces)

325–326

} else { (also unneeded braces)

This revision now requires changes to proceed.Oct 14 2015, 1:58 AM

VNET question.

sys/netinet/tcp_pcap.c
70

Why is the variable virtualised but the access to changes it from within a VNET jail not granted? Also if we allow to change this from within a jail could that lead to a DoS?

jonlooney_gmail.com added inline comments.
sys/netinet/tcp_pcap.c
70

Hmmm... There probably should be some sort of global maximum limit. (Currently, there is just a limit on the number of clusters, so a high packet count *could* theoretically consume all mbufs.) If there was a global limit on the number of packets, then we should be able to safely allow per-VNET changes that fall below that limit.

My general reasoning was that you might want to enable packet capturing for just certain jails, which is why I virtualized this particular variable. However, I agree that it needs additional precautions.

What do you think about adding a global sysctl (tcp_pcap_packets_max) to set a maximum per-socket value, and changing the tcp_pcap_packets sysctl to a SYSCTL_PROC that only accepts values <= tcp_pcap_packets_max?