Page MenuHomeFreeBSD

debugnet(4): Add optional full-duplex mode
ClosedPublic

Authored by cem on Sep 6 2019, 12:53 AM.

Details

Summary

It remains unattached to any client protocol. Netdump is unaffected
(remaining half-duplex). The intended consumer is NetGDB.

Test Plan

Confirmed netdump client continues to work correctly with stock netdumpd from ports.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem created this revision.Sep 6 2019, 12:53 AM
markj added inline comments.Sep 13 2019, 8:57 PM
sys/net/debugnet.c
354 ↗(On Diff #61710)

Why is "netdump" in the name? Isn't this routine for full-duplex protocols?

376 ↗(On Diff #61710)

Why not const struct debugnet_msg_hdr *?

388 ↗(On Diff #61710)

One thing to be careful about here is the sorcerer's apprentice problem: if the received packet was delayed enough that the server retransmitted, here I think we'll ACK both the original packet and the retransmit. Without knowing what the netgdb proxy does with an ACK, I'm not sure if the duplicate ACKs will actually cause problems.

sys/net/debugnet_inet.c
206 ↗(On Diff #61710)

Can you explain why this fragment is needed? Is it related to the rest of the diff?

cem added inline comments.Sep 13 2019, 9:36 PM
sys/net/debugnet.c
354 ↗(On Diff #61710)

I honestly have no idea. Brain-o.

376 ↗(On Diff #61710)

I don't like typing long typenames twice (it's already in the declaration), and const void* is equally valid given mtod just blindly casts.

Honestly, mtod is useless. This seems better to me: https://godbolt.org/z/lUXC9v

388 ↗(On Diff #61710)

This race can happen in any protocol with Acks, no? I think the sender must always handle duplicate acks (and the in-progress version of the proxy handles duplicate acks already).

sys/net/debugnet_inet.c
206 ↗(On Diff #61710)

It's orthogonal and could be extracted into an independent change.

markj added inline comments.Sep 13 2019, 9:43 PM
sys/net/debugnet.c
388 ↗(On Diff #61710)

Sure, but how exactly does the proxy handle them? The problem would arise if the proxy sent out a "next" packet as a response to an ACK, and sent out multiple copies of the packet in response to duplicate ACKs.

markj added inline comments.Sep 13 2019, 9:47 PM
sys/net/debugnet.c
376 ↗(On Diff #61710)

I guess it's more helpful to the reader in some of the massive network input/output routines we have, where looking at the declaration would involve scrolling back hundreds of lines.

cem added inline comments.Sep 13 2019, 10:08 PM
sys/net/debugnet.c
376 ↗(On Diff #61710)

I guess. When I read code with an off-screen declaration block, I tend to look at how the variables are used to establish type, if I don't know off-hand. Generally the reason to mtod() something is to access it, which means either a direct struct dereference or passing the pointer to a function. Ctags works well for either of those, if they happen near the mtod.

That said, I'm definitely supportive of refactoring the massive network routines we have down to more bite-sized pieces independent of all that.

388 ↗(On Diff #61710)

So big caveat, the following is just describing the as-is state of the nonfunctional C proxy; I might be missing something:

The problem would arise if the proxy sent out a "next" packet as a response to an ACK

The as-is proxy doesn't anything conditional on recieving TX acks today.

and sent out multiple copies of the packet in response to duplicate ACKs.

When the proxy initially sends a message to netgdb, it stores a copy of the TX message in an "unacked TX" list. On first ack with matching seqno, it just frees the buffer from the list. An Ack that doesn't match an on-list buffer is ignored. (A to-be-implemented resend asynchronous timer periodically resends anything on the unacked list, like debugnet/netdump today.)

These checks can be done fairly cheaply even with a linked list because we can track a txseqno below which all packets have been acked. Also, the list is inherently sorted by seqno; some combination of low packet loss, low packet reordering, and limited list size means we're unlikely to spend a lot of time traversing the list in practice. [Though: unlike the kernel debugnet, I did not place any upper limit on the number of unacked TX messages.]

cem updated this revision to Diff 62406.Sep 21 2019, 10:02 PM
cem marked 4 inline comments as done.
  • Split out orthogonal change
  • Drop netdump from function name having nothing to do with netdump
cem added a comment.Sep 21 2019, 10:02 PM

Ok, I think the first round of comments is addressed.

sys/net/debugnet.c
388 ↗(On Diff #61710)

I'm marking this thread "done" on the basis of I (believe that I) understand the concern and need for care in the other conversant (i.e., the gdb proxy), and to the best of my understanding, there is nothing actionable about it from this end of the protocol.

sys/net/debugnet_inet.c
206 ↗(On Diff #61710)

Extracted to D21745

markj added inline comments.Sep 22 2019, 5:40 PM
sys/net/debugnet.c
388 ↗(On Diff #61710)

I had written the following comment but neglected to send it, sorry. "That seems reasonable to me: the main concern is that an agent receiving duplicate ACKs should not perform a duplicate action (like sending out multiple copies of the same packet in response), and it sounds like you're handling that properly."

cem added inline comments.Sep 22 2019, 5:47 PM
sys/net/debugnet.c
388 ↗(On Diff #61710)

Ah, glad we're on the same page.

This is accounted for here in the current version of the Python proxy: https://reviews.freebsd.org/P314$331-336 . (Those line number links may break slightly as I update the Python proxy.)

I forget how familiar you are with Python, but if you aren't (or for other readers), debugnet_unacked_tx is a dictionary/map/hashtable from sequence numbers to sent messages; "not in" is a check for the absence of the received ACK seqno, and "del foo[bar]" removes the K-V pair associated with key bar from the dictionary/map/hashtable foo.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 17 2019, 8:25 PM
This revision was automatically updated to reflect the committed changes.