
Add support for the TFTP windowsize option described in RFC 7440.
ClosedPublic

Authored by jhb on Feb 25 2020, 5:51 PM.
Details

Summary

The windowsize option permits multiple blocks to be transmitted
before the receiver sends an ACK, improving throughput for larger
files.
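RFC 7440 carries windowsize as an ordinary RFC 2347 option appended to the request packet, alongside blksize from RFC 2348. A minimal sketch of how a client encodes such a request (the helper name and defaults are illustrative, not code from this diff):

```python
import struct

def make_rrq(filename, mode="octet", blksize=None, windowsize=None):
    """Build a TFTP read request (opcode 1) carrying RFC 2347-style
    options: 'blksize' per RFC 2348 and 'windowsize' per RFC 7440.
    Illustrative helper; not code from this diff."""
    pkt = struct.pack("!H", 1) + filename.encode("ascii") + b"\0"
    pkt += mode.encode("ascii") + b"\0"
    if blksize is not None:
        pkt += b"blksize\0" + str(blksize).encode("ascii") + b"\0"
    if windowsize is not None:
        # RFC 7440 allows windowsize values of 1..65535 blocks
        pkt += b"windowsize\0" + str(windowsize).encode("ascii") + b"\0"
    return pkt
```

The server accepts or rejects each option in its OACK; once windowsize is accepted, it sends that many DATA blocks per ACK instead of one.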

Sponsored by: DARPA

Test Plan
  • booting CheriBSD via a netboot loader on an FPGA board now fetches a 48M kernel in about 115 seconds, down from 160 seconds previously (the netboot loader uses a block size of 1024 bytes and a windowsize of 16 blocks)
  • the added tests also pass

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

asomers requested changes to this revision. Feb 25 2020, 6:37 PM

I didn't closely review tftp-transfer.c, but everything else looks good, especially the tests! Thanks for implementing tests for option processing; I never got around to that.

libexec/tftpd/tftp-options.c
307 ↗(On Diff #68813)

Spaces around &.

309 ↗(On Diff #68813)

Shouldn't this be OPT_WINDOWSIZE?

libexec/tftpd/tftp-transfer.c
243 ↗(On Diff #68813)

Are these leading spaces here? Or is Phabricator just trying to show that you changed indentation?

404 ↗(On Diff #68813)

Spaces around &.

libexec/tftpd/tftpd.8
31 ↗(On Diff #68813)

Bump this date, too.

usr.bin/tftp/tftp.1
31 ↗(On Diff #68813)

Don't forget to bump the man page date.

This revision now requires changes to proceed. Feb 25 2020, 6:37 PM
libexec/tftpd/tftp-options.c
307 ↗(On Diff #68813)

FWIW this is matching the existing style.

libexec/tftpd/tftp-transfer.c
77 ↗(On Diff #68813)

style(9) doesn't say anything about labels, but I would expect this to be unindented, and match the existing labels in the source?

243 ↗(On Diff #68813)

The latter; Show Raw File (Right) in View Options can be used to verify that.

jhb marked 2 inline comments as done. Feb 25 2020, 10:47 PM
jhb added inline comments.
libexec/tftpd/tftp-options.c
307 ↗(On Diff #68813)

Yes, all of these are to match the existing (ugh) style.

libexec/tftpd/tftpd.8
31 ↗(On Diff #68813)

I will; I always wait to bump it until commit since I have to do that then anyway.

  • Fix logged windowsize.
  • Consistently indent new label.
This revision is now accepted and ready to land. Feb 26 2020, 3:30 AM

I can’t speak to the relevant standards and don’t have time to read up on them now, but the improvement seems pretty marginal? Would a larger window or block size be acceptable? Switched networks often have extremely low loss rates, even if this doesn’t do nice TCP scaling.

In D23836#524215, @cem wrote:

I can’t speak to the relevant standards and don’t have time to read up on them now, but the improvement seems pretty marginal? Would a larger window or block size be acceptable? Switched networks often have extremely low loss rates, even if this doesn’t do nice TCP scaling.

This is a 50 MHz soft core, and that window size is enough to match the TCP speeds it can currently achieve; the memory-I/O interaction is not currently great on it despite having a DMA engine. Larger window sizes would of course be beneficial on better hardware, and larger block sizes can also work, although they require your network stack to support UDP fragmentation, which bootloaders may not do. If a client asks tftpd for either, it will get them; the limitations all stem from the netboot client we're using.
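To make the fragmentation point concrete, here is a rough count of the IPv4 fragments one DATA packet needs. The figures are assumptions for illustration: a 1500-byte Ethernet MTU, 20-byte IP header, 8-byte UDP header, 4-byte TFTP DATA header, and the 8-byte fragment-offset alignment rule is ignored:

```python
import math

def fragments_per_block(blksize, mtu=1500):
    """Approximate IPv4 fragments for one TFTP DATA packet.
    Rough model: the UDP datagram (8-byte UDP header + 4-byte TFTP
    DATA header + blksize bytes of file data) is split over fragments
    that each carry mtu - 20 payload bytes after the IP header."""
    datagram = 8 + 4 + blksize
    return math.ceil(datagram / (mtu - 20))
```

512- and 1024-byte blocks fit in a single Ethernet frame, so only larger block sizes force the client's stack to implement reassembly.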

So for reference, I ran 'tftp' over localhost to fetch a ~5MB file. With the defaults (512-byte blocks, windowsize 1), it took 1.7 seconds. With 1024-byte blocks it took 1.2 seconds. With 512-byte blocks and a windowsize of 16 it took 0.2 seconds, and with 1024-byte blocks and a windowsize of 16 it took 0.1 seconds (as reported by the client). For this simple test, windowsize makes a much bigger difference than blocksize for improving throughput.
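The reported timings work out to the following speedups over the 512-byte, windowsize-1 baseline (simple arithmetic on the numbers above):

```python
# Localhost fetch of a ~5 MB file; times in seconds as reported above.
baseline = 1.7  # 512-byte blocks, windowsize 1
timings = {
    "1024-byte blocks":                1.2,
    "512-byte blocks, windowsize 16":  0.2,
    "1024-byte blocks, windowsize 16": 0.1,
}
speedups = {cfg: round(baseline / t, 1) for cfg, t in timings.items()}
```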

In D23836#524333, @jhb wrote:

So for reference, I ran 'tftp' over localhost to fetch a ~5MB file. With the defaults (512-byte blocks, windowsize 1), it took 1.7 seconds. With 1024-byte blocks it took 1.2 seconds. With 512-byte blocks and a windowsize of 16 it took 0.2 seconds, and with 1024-byte blocks and a windowsize of 16 it took 0.1 seconds (as reported by the client). For this simple test, windowsize makes a much bigger difference than blocksize for improving throughput.

1.7s->1.2s is 1.4x faster for 2x the block size, and 1.7s->0.2s is 8.5x faster for 16x the window size. The only meaningful comparison would be 1024-byte blocks with a window size of 1 vs 512-byte blocks with a window size of 2, and I would expect the former to be marginally (though perhaps not noticeably) faster than the latter, given there are fewer packets to construct and process. Where you really win with window size is the ability to ramp it up so that you're transmitting more than one MTU's worth of data at a time; increasing the block size further either gives you fragmentation (at which point you basically have windowing, just at a lower layer) or your network stack doesn't bother to implement it and you can't go that high.

Duh, yes.
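The ACK-traffic argument above can be checked with a quick count of DATA packets and ACKs for the ~5 MB transfer. This is a sketch for the lossless case, ignoring option negotiation and the final partial block:

```python
import math

def packet_counts(filesize, blksize, windowsize=1):
    """DATA packets and ACKs needed for an error-free transfer:
    the receiver sends one ACK per full window rather than one
    per block."""
    data = math.ceil(filesize / blksize)
    acks = math.ceil(data / windowsize)
    return data, acks
```

For a 5 MB file with 512-byte blocks, windowsize 16 cuts the ACKs from 10240 to 640 while leaving the DATA packet count unchanged; halving the block count via 1024-byte blocks still leaves one ACK round-trip per block, which is why windowsize dominates.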