Page MenuHomeFreeBSD

pxeboot: improve and simplify rx handling
ClosedPublic

Authored by kevans on Aug 12 2021, 3:02 AM.
Tags
None
Referenced Files
F106881348: D31512.diff
Mon, Jan 6, 9:04 PM
Unknown Object (File)
Nov 18 2024, 1:28 PM
Unknown Object (File)
Nov 18 2024, 1:09 PM
Unknown Object (File)
Nov 11 2024, 1:20 PM
Unknown Object (File)
Nov 2 2024, 3:51 AM
Unknown Object (File)
Oct 10 2024, 11:26 AM
Unknown Object (File)
Oct 4 2024, 11:44 PM
Unknown Object (File)
Oct 1 2024, 4:07 PM
Subscribers

Details

Summary

This pushes the bulk of the rx servicing into a single loop that's only
slightly convoluted, and it addresses a problem with rx handling in the
process. If we hit a tx interrupt while we're processing, we'd
previously drop the frame on the floor completely and ultimately
timeout, increasing boot time on particularly busy hosts as we keep
having to backoff and resend.

After this patch, we don't seem to hit timeouts at all on zoo anymore
though loading a 27M kernel is still relatively slow (~1m20s).

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I've largely based this off of what Figure 3-5 in the PXE 2.1 spec seems to claim we should do... (and observing on zoo that we are seeing tx interrupts interlaced while we're processing a single rx frame)

This is something I was suspecting to happen:) Still need to read the resulting code carefully. Which method was used to load kernel, tftp or nfs?

stand/i386/libi386/pxe.c
518

This is not correct. We get Frame, with FrameLength and allocate buffer for it.

The actual data size in Frame, is BufferLength and to receive full frame, we *may* need to iterate PXENV_UNDI_ISR_IN_GET_NEXT till we have whole frame (FrameLength).

"FrameLength: This parameter contains the total length of the receive frame. A receive frame may contain more than one data buffer. If FrameLength is not the same as BufferLength, the application will have to call PXENV_UNDI_ISR several times to receive the complete frame. In the case of the multi buffered frame, all buffers (except the last one) must contain at least 512 bytes of data (this does not include the media header). In other words, the minimum buffer size is 512 bytes plus the size of the media header."

That is, we do get one frame, but it may be split to at least 512B chunks.

stand/i386/libi386/pxe.c
518

Right, that's what we do. This is the difference between Frame and frame; we're actually reading the end of the frame
(spread out over multiple buffers) and able to collect a whole new frame, but this function can only deal with a single packet at a time so we must not traverse into the next. Note that rsize and size haven't changed, the latter still tracks the complete frame size.

stand/i386/libi386/pxe.c
518

(I would have liked to rename the local frame to framebuf or something to help disambiguate)

I've got one confirmed instance of ~30-75min pxeboot time dropped to ~5min

This looks reasonable. Any idea of the 'chaining packets' comment / TODO item would reduce the loading time substantially or not? Or is the problem elsewhere?

I don't see anything major wrong with this, but I've not fact-checked it against the pxeboot standards, so consider this what the linux world wold call 'acked by'

I don't know about timeouts or recent other breakage (maybe fixed again).
Thanks a lot for working on this! Very much appreciated!

1m20s for an amd64 GENERIC kernel is still way out of what should be possible is my guts feeling.
One thing I keep thinking of is loading kernel via NFS vs. tftp (there used to be a LOADER_TFT_SUPPORT make.conf setting?).
I wonder if anyone looked into that lately to see if one or the other makes a difference and it may give another clue?

Discussed with tsoome: record if we have another ethernet frame pending after
handling this one, and skip the START/PROCESS dance in the next call. Primitive
testing in zoo seems to suggest that we hit the data_pending case 99% of the
time, but that may have been artificially inflated by the delay added to observe
that...

Ok, after some thinking, I did realize, we need to restructure this code some more. The problem is, pxe_netif_receive() should return complete Frame unless there is nothing coming, but UNDI ISR can end up in different states, and in some cases, we would want to receive again. Or to put other way - the current implementation is using pxe_netif_receive() as interrupt routine for UNDI, but UNDI interrupt and pxe_netif_receive have different semantics.

So, the idea should be something like:

  1. have main loop to read data in pxe_netif_receive() - its job is to re-call ISR receive if needed.
  2. ISR receive should call getnext or start+in process like it is currently doing.

    after PXENV_UNDI_ISR_IN_PROCESS, we must check for PXENV_UNDI_ISR_OUT_TRANSMIT (and use PXENV_UNDI_ISR_IN_GET_NEXT to get to PXENV_UNDI_ISR_OUT_DONE or PXENV_UNDI_ISR_OUT_BUSY, on which we return and pxe_netif_receive() will call us again.

    in case of PXENV_UNDI_ISR_OUT_RECEIVE, we pick up Frame and use PXENV_UNDI_ISR_IN_GET_NEXT to receive all buffer fragments (rsize < size), if we still do not have PXENV_UNDI_ISR_OUT_DONE, it means we have another Frame, so we set data_pending, return to pxe_netif_receive(), it will exit loop and return the data. Next call to pxe_netif_receive() will process next frame.

This way we can pick up whole frame even when PXE is using split buffer, we do get the frame when there was still transmit going on and in case we got some other issue (busy/status code).

Restructure a little more and handle PXENV_UNDI_ISR_OUT_BUSY. We can avoid
having to free/reallocate isr if we're about to restart the operation anyways.

I've also re-worded the comment around data_pending to make it clear that we
don't really know if we would have caught another packet or not. The subsequent
PXENV_UNDI_ISR_IN_GET_NEXT call could just as easily return a TRANSMIT, but we
would need to be prepared to buffer another packet in case it doesn't if we
wanted to chase it all the way to DONE.

Simplify the terminal condition of the loop a little more (we check Status both
before the loop and after the GET_NEXT call) and indicate that we'll often not
hit it.

Hopefully final update for a while: on the off chance that we did only catch
tx interrupts, make sure we restart just as we did if we caught BUSY.

Other than those few nits, it seems to be good (I hope the hw tests will also confirm;)

stand/i386/libi386/pxe.c
475

think, we can drop else from there:) and perhaps put { } around to help to track what is in this block.

This revision is now accepted and ready to land.Aug 25 2021, 8:37 PM

Drop else and restore a bunch of missing braces to reduce diff noise

Patch still OK on Zoo

This revision now requires review to proceed.Aug 25 2021, 8:51 PM
This revision is now accepted and ready to land.Aug 25 2021, 8:52 PM
This revision was automatically updated to reflect the committed changes.