
dummynet: add simple gilbert-elliott channel model
Closed, Public

Authored by rscheff on Dec 8 2023, 11:16 PM.
Tags
None

Details

Summary

Building a good analog of correlated loss behavior
across realistic environments in dummynet was cumbersome.

Adding per-flow-set state and four probabilities provides a simple
Gilbert-Elliott channel model. Two of the probabilities switch
between the two states, and the other two are distinct loss
probabilities for each state. This streamlines the testing of
burst-loss environments.
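The model described above can be illustrated with a minimal user-space sketch in C of a two-state Gilbert-Elliott channel. This is not the dummynet implementation. The struct and field names, the xorshift64 generator, and the choice to decide the drop before evaluating the state transition are all assumptions made for this example. Each packet is dropped with the loss probability of the current state, and the state then flips with that state's transition probability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter names; these are not dummynet's fields. */
struct ge_params {
	double loss_good;	/* drop probability while in Good */
	double p_good_to_bad;	/* per-packet probability of G -> B */
	double loss_bad;	/* drop probability while in Bad */
	double p_bad_to_good;	/* per-packet probability of B -> G */
};

struct ge_channel {
	struct ge_params p;
	bool in_bad;		/* current state: false = Good, true = Bad */
	uint64_t rng;		/* xorshift64 state; must be nonzero */
};

/* xorshift64: small deterministic PRNG so runs are reproducible. */
static double
ge_uniform(struct ge_channel *ch)
{
	uint64_t x = ch->rng;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	ch->rng = x;
	/* Top 53 bits scaled to a double in [0, 1). */
	return ((double)(x >> 11) * (1.0 / 9007199254740992.0));
}

/*
 * Decide the fate of one packet using the current state's drop
 * probability, then possibly switch state for the next packet.
 * Returns true if the packet is dropped.
 */
static bool
ge_drop(struct ge_channel *ch)
{
	double loss = ch->in_bad ? ch->p.loss_bad : ch->p.loss_good;
	double leave = ch->in_bad ? ch->p.p_bad_to_good : ch->p.p_good_to_bad;
	bool drop = ge_uniform(ch) < loss;

	if (ge_uniform(ch) < leave)
		ch->in_bad = !ch->in_bad;
	return (drop);
}
```

Consider a Good-state loss of 0, a 1% per-packet chance of entering Bad, a 10% Bad-state loss, and a 5% per-packet chance of leaving Bad. With these parameters, losses arrive in clusters rather than uniformly, which is the behaviour a single plr value cannot reproduce.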

Test Plan

ipfw pipe 100 config plr 0.001
ipfw pipe 200 config plr 0.0,0.01,0.1,0.05
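The test plan exercises both forms of the plr argument: a single probability, and four comma-separated values. The C sketch below shows one way such an argument could be validated and converted. It is hypothetical: parse_plr is not ipfw's parser. It assumes that the 31-bit internal representation ipfw(8) documents for plr means scaling by 0x7fffffff. It also deliberately assigns no meaning to the four positions, since ipfw(8) defines their order.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PLR_MAX_ARGS	4

/*
 * Hypothetical parser: accept one probability or four comma-separated
 * ones, each in [0, 1], scaled to 0 .. 0x7fffffff.
 * Returns the number of values parsed (1 or 4), or -1 on error.
 */
static int
parse_plr(const char *arg, int32_t out[PLR_MAX_ARGS])
{
	const char *p = arg;
	char *end;
	int n = 0;

	for (;;) {
		double d;

		if (n == PLR_MAX_ARGS)
			return (-1);	/* too many values */
		errno = 0;
		d = strtod(p, &end);
		/* The negated range test also rejects NaN. */
		if (end == p || errno != 0 || !(d >= 0.0 && d <= 1.0))
			return (-1);
		out[n++] = (int32_t)(d * 0x7fffffff);
		if (*end == '\0')
			break;
		if (*end != ',')
			return (-1);	/* junk after a number */
		p = end + 1;
	}
	return ((n == 1 || n == PLR_MAX_ARGS) ? n : -1);
}
```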

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 54882
Build 51771: arc lint + arc unit

Event Timeline

  • print full probabilities also when G-loss is 0
sbin/ipfw/ipfw.8
3078

Worth being explicit that these arguments are comma-separated, and mentioning their internal representation if relevant?

3087

Or explain what A is.

Minor style quibbles.

I'd also love to see a test case, even a basic one that just activates the packet-loss-rate code and sends a few ping packets through; that provides some sanity checking. (My view is that while it might not be worth the effort to validate the bimodal loss rates here, it is worth just running the code, because that has a tendency to provoke lock issues, or leaks, or other problems.)

sys/netpfil/ipfw/ip_dn_io.c
503

I'm really not a fan of having the case and first line of code on the same line.
I think style(9) also disagrees with it, and I don't see any examples of it in the ipfw/dummynet code either.

505

This might be clarified a little by having an enum for the states.
Something like f->pf_state = PL_STATE_BAD perhaps?

This revision is now accepted and ready to land. Dec 9 2023, 10:56 AM
  • extend man page with Gilbert-Elliott model description
This revision now requires review to proceed. Dec 9 2023, 11:56 AM
rscheff added inline comments.
sbin/ipfw/ipfw.8
3078

Ideally, I wanted to have the parameters as loss probabilities; however, the literature on the Gilbert-Elliott model gives these (k, h) as transmission probabilities, the inverse of the simple PLR loss probability. I adjusted the code accordingly.
Alternatively, I could call these parameters K and H, with K = (1-k) and H = (1-h). That would keep the code more streamlined, and the probabilities would align nicely with the simple PLR loss model...

Any opinions?

rscheff marked an inline comment as done.
  • enum the states
In D42980#980005, @kp wrote:

I'd also love to see a test case, even a basic one that just activates the packet-loss-rate code and sends a few ping packets through; that provides some sanity checking. (My view is that while it might not be worth the effort to validate the bimodal loss rates here, it is worth just running the code, because that has a tendency to provoke lock issues, or leaks, or other problems.)

It doesn't seem that any test code currently exists to validate any of the dummynet functionality, though... And testing a stochastic/probabilistic process is more involved.

But what I need this code for is to collect statistically relevant flow-completion times for the base TCP stack (A test) and for an enhancement of the TCP stack that does not discard SACK data after an RTO (B test). To elicit the RTO (loss of a retransmission), there needs to be a fairly high loss probability during such a loss burst, but the TCP congestion window also has to have grown sufficiently. Thus a simple loss probability is not good enough to test this statistically in a short enough test campaign...

The idea was to place a Gilbert-Elliott model pipe on the lo0 interface and have a tool like uperf transfer 10 MB 10000-100000 times, logging the completion times (and maybe other statistics) for each run...
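The standard stationary properties of a two-state Markov chain quantify why a burst-loss model helps here. The C sketch below is an illustration, not part of the change. It assumes transitions are evaluated once per packet. The names are hypothetical: ge_compute_stats, plus p for the Good-to-Bad and r for the Bad-to-Good transition probability.

```c
#include <math.h>

struct ge_stats {
	double frac_bad;	/* long-run fraction of packets seen in Bad */
	double mean_loss;	/* long-run average drop probability */
	double mean_bad_run;	/* expected consecutive packets in Bad */
	double mean_good_run;	/* expected consecutive packets in Good */
};

/*
 * k_loss / h_loss: drop probability in Good / Bad.
 * p: per-packet G -> B probability; r: per-packet B -> G probability.
 * Requires p + r > 0. Sojourn lengths are geometric, so their means
 * are 1/r and 1/p packets.
 */
static struct ge_stats
ge_compute_stats(double k_loss, double p, double h_loss, double r)
{
	struct ge_stats s;

	s.frac_bad = p / (p + r);
	s.mean_loss = (1.0 - s.frac_bad) * k_loss + s.frac_bad * h_loss;
	s.mean_bad_run = r > 0.0 ? 1.0 / r : INFINITY;
	s.mean_good_run = p > 0.0 ? 1.0 / p : INFINITY;
	return (s);
}
```

Take, for example, a Good-state loss of 0, p = 0.01, a Bad-state loss of 0.1 and r = 0.05. The long-run loss is then only about 1.7%, yet Bad periods last about 20 packets on average. A retransmission sent while the channel is still in the Bad state is dropped with probability 0.1, about six times the average.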

Manual page change LGTM. I can't speak to consistency with code.

sbin/ipfw/ipfw.8
3078

I have no basis for an informed opinion on that point myself.

This revision is now accepted and ready to land. Dec 9 2023, 2:54 PM
sys/netpfil/ipfw/ip_dn_io.c
506

This default: /* FALLTHROUGH */ looks weird to me.

Could there be any other possible pl_state values in the future?

  • put code in default branch and fall through on specific case
This revision now requires review to proceed. Dec 12 2023, 9:35 AM
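The switch layout settled on in this thread might look like the sketch below. It uses an enum for the state and puts each case label on its own line, as style(9) expects. The code lives in the default branch, and the specific Good case falls through to it. PL_STATE_GOOD and PL_STATE_BAD follow the reviewer's suggestion. Everything else is an assumption for illustration, and the committed dummynet code may differ: struct pl_fs, pl_should_drop, and the caller-supplied random values in 0 .. 0x7fffffff (the range of random(3)).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state enum, following the review suggestion. */
enum pl_state {
	PL_STATE_GOOD = 0,
	PL_STATE_BAD
};

/* Hypothetical per-flow-set fields; thresholds are 31-bit scaled. */
struct pl_fs {
	enum pl_state	pl_state;
	uint32_t	loss[2];	/* drop threshold per state */
	uint32_t	leave[2];	/* threshold to switch state */
};

/*
 * rnd_loss / rnd_move: random values in 0 .. 0x7fffffff.
 * Returns true if the packet should be dropped.
 */
static bool
pl_should_drop(struct pl_fs *f, uint32_t rnd_loss, uint32_t rnd_move)
{
	bool drop;

	switch (f->pl_state) {
	case PL_STATE_BAD:
		drop = rnd_loss < f->loss[PL_STATE_BAD];
		if (rnd_move < f->leave[PL_STATE_BAD])
			f->pl_state = PL_STATE_GOOD;
		break;
	case PL_STATE_GOOD:
		/* FALLTHROUGH */
	default:
		/* Good state, and any unexpected state value. */
		drop = rnd_loss < f->loss[PL_STATE_GOOD];
		if (rnd_move < f->leave[PL_STATE_GOOD])
			f->pl_state = PL_STATE_BAD;
		break;
	}
	return (drop);
}
```

One consequence of this layout is that an unexpected state value is handled exactly like the Good state, rather than falling into the lossy branch.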

Discussed this in the transport call. Will change the probabilities in the Gilbert model from transmission probabilities back to drop probabilities for consistency within the tool, and document how to map the literature variables k and h to what the tool accepts.
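Written out, the first line below maps the literature's transmission probabilities k and h to the drop probabilities K and H that the tool accepts, as described earlier in the thread. The long-run loss rate then follows from the chain's stationary distribution. The symbols p (Good to Bad) and r (Bad to Good) for the per-packet transition probabilities are notation assumed here, not taken from ipfw(8).

```latex
K = 1 - k, \qquad H = 1 - h
\pi_B = \frac{p}{p + r}, \qquad \pi_G = \frac{r}{p + r}
\bar{L} = \pi_G \, K + \pi_B \, H
```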

sys/netpfil/ipfw/ip_dn_io.c
506

It would not be inconceivable to extend this code from a simple Gilbert-Elliott model (2 states, 4 probabilities) to a full multi-state Markov chain (3 probabilities per state: the loss probability, the probability to move the state forward, and the probability to move the state backward).
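That possible extension can be sketched as a birth-death chain in which each state carries a loss probability plus forward and backward move probabilities. Nothing like this exists in the change. The names struct mc_state and mc_step, and the caller-supplied uniform random numbers, are hypothetical. With two states, a forward probability on the first and a backward probability on the second, it reduces to the Gilbert-Elliott model.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-state parameters. fwd + back must be <= 1.
 * A forward move from the last state, or a backward move from the
 * first, leaves the chain where it is.
 */
struct mc_state {
	double loss;	/* drop probability in this state */
	double fwd;	/* probability of moving to state + 1 */
	double back;	/* probability of moving to state - 1 */
};

struct mc_channel {
	const struct mc_state *states;
	size_t nstates;
	size_t cur;	/* index of the current state */
};

/* u_loss, u_move: uniforms in [0, 1). Returns true on drop. */
static bool
mc_step(struct mc_channel *ch, double u_loss, double u_move)
{
	const struct mc_state *s = &ch->states[ch->cur];
	bool drop = u_loss < s->loss;

	if (u_move < s->fwd) {
		if (ch->cur + 1 < ch->nstates)
			ch->cur++;
	} else if (u_move < s->fwd + s->back) {
		if (ch->cur > 0)
			ch->cur--;
	}
	return (drop);
}
```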

rscheff marked an inline comment as done.
  • make Gilbert model use loss probabilities, document in man page
This revision is now accepted and ready to land. Dec 14 2023, 4:10 PM
In D42980#980005, @kp wrote:

I'd also love to see a test case, even a basic one that just activates the packet-loss-rate code and sends a few ping packets through; that provides some sanity checking. (My view is that while it might not be worth the effort to validate the bimodal loss rates here, it is worth just running the code, because that has a tendency to provoke lock issues, or leaks, or other problems.)

It doesn't seem that any test code currently exists to validate any of the dummynet functionality, though... And testing a stochastic/probabilistic process is more involved.

We do already have some dummynet tests in https://cgit.freebsd.org/src/tree/tests/sys/netpfil/common/dummynet.sh
Those are very high-level tests, and don't validate the statistical correctness of what dummynet does, but they do serve to detect things like lock order or cleanup issues.

Manual page English still LGTM.

  • add kyua test cases for dummynet pls
This revision now requires review to proceed. Dec 16 2023, 1:40 PM
  • add allow-all rule prior to sanity check, validate that an approximate percentage of pings gets dropped
tests/sys/netpfil/common/dummynet.sh
600 (On Diff #131496)

We're going to send a ping every 100ms (because -i .1) and keep running that for 60 seconds (so 6x), with a 10 second interval (so we lose a few iterations). I don't think we need the :10 here.

We probably want to do ping -i .01, and drop the :10.

Ideally we'd want to extract the actual loss rate into a variable and do math on that, but that's not exactly straightforward with ping.
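One way to get at the actual loss rate is to parse ping's summary line rather than the per-packet output. The C helper below assumes the FreeBSD ping(8) summary format quoted in its comment; ping_counts is a hypothetical name.

```c
#include <stdio.h>

/*
 * Extract transmitted/received counts from a ping(8) summary line,
 * assuming the format
 *   "100 packets transmitted, 95 packets received, 5.0% packet loss"
 * Returns 0 on success, -1 if the line does not match.
 */
static int
ping_counts(const char *line, long *sent, long *rcvd)
{
	if (sscanf(line, "%ld packets transmitted, %ld packets received",
	    sent, rcvd) != 2)
		return (-1);
	return (0);
}
```

The observed loss rate is then (sent - rcvd) / sent, which can be compared against the configured plr.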

  • speed up test case from 60sec down to a maximum of 6sec, if initial check fails
rscheff added inline comments.
tests/sys/netpfil/common/dummynet.sh
600 (On Diff #131496)

Actually, looking at
https://github.com/freebsd/atf/commit/d7c7c53c0626ab59a62aa4efcf05323b3621baa9
the -r option does seem to repeatedly call the test function or binary while the check fails, until the timeout expires or the check passes, waiting <n> milliseconds (the :<n> suffix) between two consecutive calls.

The ping process with -c 100 -i .1 would take 10 seconds, so this should call the pinger up to 5 or 6 times. And yes, I'll reduce the ping interval to 10ms (-i 0.010) for a faster execution time, and cut the -r down to 6:10 in the atf_check.

Since there is no point waiting 50ms between executions of the ping process, I'll keep that at 10ms.

Since these are probabilistic losses, I think giving a reasonable confidence interval for how many good pings to expect should be good enough.
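"A reasonable confidence interval" can be made concrete. If losses were independent with probability q, the number of replies out of n pings would be binomially distributed, with mean n(1-q) and variance nq(1-q). The sketch below accepts a count within z standard deviations of that mean; ping_count_plausible is a hypothetical helper. With n = 100, q = 0.1 and z = 3, it accepts 81 to 99 replies.

```c
#include <stdbool.h>

/*
 * Under i.i.d. losses with drop probability q, replies received out
 * of n pings follow Binomial(n, 1 - q). Accept the observed count if
 * it lies within z standard deviations of the mean; comparing squared
 * values avoids a sqrt() call.
 */
static bool
ping_count_plausible(int n, double q, double z, int rcvd)
{
	double mean = n * (1.0 - q);
	double var = n * q * (1.0 - q);
	double dev = rcvd - mean;

	return (dev * dev <= z * z * var);
}
```

Gilbert-Elliott losses are correlated, which widens the spread. For such a pipe, the accepted range should therefore be wider than this i.i.d. bound suggests.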

kp added inline comments.
tests/sys/netpfil/common/dummynet.sh
600 (On Diff #131496)

Since there is no point waiting 50ms between executions of the ping process, I'll keep that at 10ms.

I did not read the man page with sufficient attention, and thought :10 would imply a 10 second (rather than the correct 10 milliseconds) wait between invocations.

So yes, that's worth keeping as you had it.

This revision is now accepted and ready to land. Dec 17 2023, 8:26 PM
This revision was automatically updated to reflect the committed changes.
rscheff marked an inline comment as done.