Page MenuHomeFreeBSD

debugnet_handle_arp(): Fix false-positive assertion for dp_state
ClosedPublic

Authored by bdrewery on Jul 27 2021, 9:14 PM.

Details

Summary

An assertion is present to ensure the pcb is only modified when the state is
DN_STATE_INIT. Because debugnet_arp_gw() is asynchronous it is possible for ARP
replies to come in after the gateway address is known and the state
already changed.

Sponsored by: Dell EMC

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

This revision is now accepted and ready to land.Jul 27 2021, 9:45 PM
vangyzen added inline comments.
sys/net/debugnet_inet.c
354

buf is uninitialized here.

This revision now requires review to proceed.Jul 27 2021, 11:08 PM
This revision is now accepted and ready to land.Jul 28 2021, 12:45 PM
cem accepted this revision.EditedJul 28 2021, 3:37 PM

This would only happen if you have multiple competing dhcp servers on a subnet, right? Edit: not dhcp, oops. Still, kind of surprising we’re getting multiple ARP responses?

In D31327#705894, @cem wrote:

This would only happen if you have multiple competing dhcp servers on a subnet, right? Edit: not dhcp, oops. Still, kind of surprising we’re getting multiple ARP responses?

I don't know exactly but I figured it was simple latency with something like debugnet_arp_gw sending arp request 1, delay(500), retry -> arp request 2 sent, arp reply 1 received (state changes), delay(500), arp reply 2 received (panic due to unexpected state).

Another state issue is if I disconnect netgdb at the right moment and reconnect immediately to the same server, it is possible the server is still sending packets and we end up trying to send an ACK before we even have the gw mac which trips the assert in debugnet_udp_output. "Don't do that" but the assert does make testing errors/retries a bit annoying. I was thinking of simply ignoring all IP packets before gw mac is known.

gdb_trap bailing, hopefully back to ddb!
Switching to ddb back-end
[ thread pid 7112 tid 100967 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> netgdb -s 10.205.198.147
debugnet: overwriting mbuf zone pointers
debugnet_connect: Connecting to 10.205.198.147:20025 via 10.205.228.1 from 10.205.228.124:20026 on vmx1
debugnet_connect: searching for server MAC...
vmx1: debugnet_pkt_in: discard frame with incorrect destination addr 86.139.97.4
vmx1: debugnet_pkt_in: discard frame with incorrect destination addr 86.139.97.4
debugnet_handle_arp: ignoring ARP not to our IP (dest: 10.205.229.109 me: 10.205.228.124)
debugnet_handle_arp: ignoring ARP not to our IP (dest: 10.205.231.8 me: 10.205.228.124)
debugnet_handle_arp: ignoring ARP not to our IP (dest: 10.205.229.208 me: 10.205.228.124)
debugnet_ack_output: Acking with seqno 0
panic @ time 1627418352.271, thread 0xfffffe8bd21f3b00: Assertion pcb->dp_state >= DN_STATE_HAVE_GW_MAC failed at /b/mnt/src/sys/net/debugnet.c:168
time = 1627418352
cpuid = 11, TSC = 0x472b2b1e588
Panic occurred in module kernel loaded at 0xffffffff80200000:

Stack: --------------------------------------------------
kernel:debugnet_udp_output+0x118
kernel:debugnet_handle_udp+0x28c
kernel:debugnet_pkt_in+0x1d9
vmxnet3.ko:vmxnet3_rq_rx_complete+0x9da
vmxnet3.ko:vmxnet3_debugnet_poll+0x5c
kernel:debugnet_arp_gw+0x252
kernel:debugnet_connect+0x4dc
kernel:db_netgdb_cmd+0x167

Something trivial like this:

diff --git sys/net/debugnet_inet.c sys/net/debugnet_inet.c
index c68632b507cc..e7449113ba10 100644
--- sys/net/debugnet_inet.c
+++ sys/net/debugnet_inet.c
@@ -86,6 +86,9 @@ debugnet_handle_ip(struct debugnet_pcb *pcb, struct mbuf **mb)
        struct mbuf *m;
        unsigned short hlen;

+       if (pcb->dp_state < DN_STATE_HAVE_GW_MAC)
+               return;
+
        /* IP processing. */
        m = *mb;
        if (m->m_pkthdr.len < sizeof(struct ip)) {

It wouldn't ignore all delayed last-connect packets but overall the state assertions have been a pain point for my testing.

I don't know exactly but I figured it was simple latency with something like debugnet_arp_gw sending arp request 1, delay(500), retry -> arp request 2 sent, arp reply 1 received (state changes), delay(500), arp reply 2 received (panic due to unexpected state).

Oh, yeah. That makes sense!

I was thinking of simply ignoring all IP packets before gw mac is known.

Seems reasonable to me.

Something trivial like this:

diff --git sys/net/debugnet_inet.c sys/net/debugnet_inet.c
index c68632b507cc..e7449113ba10 100644
--- sys/net/debugnet_inet.c
+++ sys/net/debugnet_inet.c
@@ -86,6 +86,9 @@ debugnet_handle_ip(struct debugnet_pcb *pcb, struct mbuf **mb)
        struct mbuf *m;
        unsigned short hlen;

+       if (pcb->dp_state < DN_STATE_HAVE_GW_MAC)
+               return;
+
        /* IP processing. */
        m = *mb;
        if (m->m_pkthdr.len < sizeof(struct ip)) {

+1