
Add support for the TFTP windowsize option described in RFC 7440.
ClosedPublic

Authored by jhb on Feb 25 2020, 5:51 PM.
Details

Summary

The windowsize option permits multiple blocks to be transmitted
before the receiver sends an ACK, improving throughput for larger
files.
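RFC 7440 carries windowsize as an ordinary RFC 2347 option appended to the request packet, alongside blksize from RFC 2348. A minimal sketch of how a client encodes such a request (the helper name and defaults are illustrative, not code from this diff):

```python
import struct

def make_rrq(filename, mode="octet", blksize=None, windowsize=None):
    """Build a TFTP read request (opcode 1) carrying RFC 2347-style
    options: 'blksize' per RFC 2348 and 'windowsize' per RFC 7440.
    Illustrative helper; not code from this diff."""
    pkt = struct.pack("!H", 1) + filename.encode("ascii") + b"\0"
    pkt += mode.encode("ascii") + b"\0"
    if blksize is not None:
        pkt += b"blksize\0" + str(blksize).encode("ascii") + b"\0"
    if windowsize is not None:
        # RFC 7440 allows windowsize values of 1..65535 blocks
        pkt += b"windowsize\0" + str(windowsize).encode("ascii") + b"\0"
    return pkt
```

The server accepts or rejects each option in its OACK; once windowsize is accepted, it sends that many DATA blocks per ACK instead of one.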

Sponsored by: DARPA

Test Plan
  • booting CheriBSD via a netboot loader on an FPGA board now fetches a 48M kernel in about 115 seconds, down from 160 seconds previously (the netboot loader uses a block size of 1024 bytes and a windowsize of 16 blocks)
  • the added tests also pass

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

asomers requested changes to this revision. Feb 25 2020, 6:37 PM

I didn't closely review tftp-transfer.c, but everything else looks good, especially the tests! Thanks for implementing tests for option processing; I never got around to that.

libexec/tftpd/tftp-options.c
307 ↗(On Diff #68813)

Spaces around &.

309 ↗(On Diff #68813)

Shouldn't this be OPT_WINDOWSIZE?

libexec/tftpd/tftp-transfer.c
243 ↗(On Diff #68813)

Are these leading spaces here? Or is Phabricator just trying to show that you changed indentation?

404 ↗(On Diff #68813)

Spaces around &.

libexec/tftpd/tftpd.8
31 ↗(On Diff #68813)

Bump this date, too.

usr.bin/tftp/tftp.1
31 ↗(On Diff #68813)

Don't forget to bump the man page date.

This revision now requires changes to proceed. Feb 25 2020, 6:37 PM
libexec/tftpd/tftp-options.c
307 ↗(On Diff #68813)

FWIW this is matching the existing style.

libexec/tftpd/tftp-transfer.c
77 ↗(On Diff #68813)

style(9) doesn't say anything about labels, but I would expect this to be unindented, and match the existing labels in the source?

243 ↗(On Diff #68813)

The latter; Show Raw File (Right) in View Options can be used to verify that.

jhb marked 2 inline comments as done. Feb 25 2020, 10:47 PM
jhb added inline comments.
libexec/tftpd/tftp-options.c
307 ↗(On Diff #68813)

Yes, all of these are to match the existing (ugh) style.

libexec/tftpd/tftpd.8
31 ↗(On Diff #68813)

I will; I always wait to bump it until commit since I have to do that then anyway.

  • Fix logged windowsize.
  • Consistently indent new label.
This revision is now accepted and ready to land. Feb 26 2020, 3:30 AM

I can’t speak to the relevant standards and don’t have time to read up on them now, but the improvement seems pretty marginal? Would a larger window or block size be acceptable? Switched networks often have extremely low loss rates, even if this doesn’t do nice TCP scaling.

In D23836#524215, @cem wrote:

I can’t speak to the relevant standards and don’t have time to read up on them now, but the improvement seems pretty marginal? Would a larger window or block size be acceptable? Switched networks often have extremely low loss rates, even if this doesn’t do nice TCP scaling.

This is a 50 MHz soft core, and that window size is enough to match the TCP speeds it can currently achieve; the memory-I/O interaction is not currently great on it despite having a DMA engine. Larger window sizes would of course be beneficial on better hardware, and larger block sizes can also work, although they require your network stack to support UDP fragmentation, which bootloaders may not do. If a client asks tftpd for either, it will get them; the limitations all stem from the netboot client we're using.
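To make the fragmentation point concrete, here is a rough count of the IPv4 fragments one DATA packet needs. The figures are assumptions for illustration: a 1500-byte Ethernet MTU, 20-byte IP header, 8-byte UDP header, 4-byte TFTP DATA header, and the 8-byte fragment-offset alignment rule is ignored:

```python
import math

def fragments_per_block(blksize, mtu=1500):
    """Approximate IPv4 fragments for one TFTP DATA packet.
    Rough model: the UDP datagram (8-byte UDP header + 4-byte TFTP
    DATA header + blksize bytes of file data) is split over fragments
    that each carry mtu - 20 payload bytes after the IP header."""
    datagram = 8 + 4 + blksize
    return math.ceil(datagram / (mtu - 20))
```

512- and 1024-byte blocks fit in a single Ethernet frame, so only larger block sizes force the client's stack to implement reassembly.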

So for reference, I ran 'tftp' over localhost to fetch a ~5MB file. With the defaults (512-byte blocks, windowsize 1), it took 1.7 seconds. With 1024-byte blocks it took 1.2 seconds. With 512-byte blocks and a windowsize of 16 it took 0.2 seconds, and with 1024-byte blocks and a windowsize of 16 it took 0.1 seconds (as reported by the client). For this simple test, windowsize makes a much bigger difference than blocksize for improving throughput.
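The reported timings work out to the following speedups over the 512-byte, windowsize-1 baseline (simple arithmetic on the numbers above):

```python
# Localhost fetch of a ~5 MB file; times in seconds as reported above.
baseline = 1.7  # 512-byte blocks, windowsize 1
timings = {
    "1024-byte blocks":                1.2,
    "512-byte blocks, windowsize 16":  0.2,
    "1024-byte blocks, windowsize 16": 0.1,
}
speedups = {cfg: round(baseline / t, 1) for cfg, t in timings.items()}
```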

In D23836#524333, @jhb wrote:

So for reference, I ran 'tftp' over localhost to fetch a ~5MB file. With the defaults (512-byte blocks, windowsize 1), it took 1.7 seconds. With 1024-byte blocks it took 1.2 seconds. With 512-byte blocks and a windowsize of 16 it took 0.2 seconds, and with 1024-byte blocks and a windowsize of 16 it took 0.1 seconds (as reported by the client). For this simple test, windowsize makes a much bigger difference than blocksize for improving throughput.

1.7s->1.2s is 1.4x faster for 2x the block size, and 1.7s->0.2s is 8.5x faster for 16x the window size. The only meaningful comparison would be 1024-byte blocks with a window size of 1 vs 512-byte blocks with a window size of 2, and I would expect the former to be marginally (though perhaps not noticeably) faster than the latter, given there are fewer packets to construct and process. Where you really win with window size is the ability to ramp it up so that you're transmitting more than one MTU's worth of data at a time; increasing the block size further either gives you fragmentation (at which point you basically have windowing, just at a lower layer) or your network stack doesn't bother to implement it and you can't go that high.

Duh, yes.
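The ACK-traffic argument above can be checked with a quick count of DATA packets and ACKs for the ~5 MB transfer. This is a sketch for the lossless case, ignoring option negotiation and the final partial block:

```python
import math

def packet_counts(filesize, blksize, windowsize=1):
    """DATA packets and ACKs needed for an error-free transfer:
    the receiver sends one ACK per full window rather than one
    per block."""
    data = math.ceil(filesize / blksize)
    acks = math.ceil(data / windowsize)
    return data, acks
```

For a 5 MB file with 512-byte blocks, windowsize 16 cuts the ACKs from 10240 to 640 while leaving the DATA packet count unchanged; halving the block count via 1024-byte blocks still leaves one ACK round-trip per block, which is why windowsize dominates.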