Page MenuHomeFreeBSD

loader: network read rework
ClosedPublic

Authored by tsoome on Apr 1 2017, 3:09 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 9, 10:46 AM
Unknown Object (File)
Thu, Jan 9, 4:38 AM
Unknown Object (File)
Sat, Jan 4, 3:10 AM
Unknown Object (File)
Mon, Dec 23, 1:21 PM
Unknown Object (File)
Mon, Dec 23, 1:12 PM
Unknown Object (File)
Mon, Dec 23, 12:40 PM
Unknown Object (File)
Mon, Dec 23, 12:29 PM
Unknown Object (File)
Sun, Dec 15, 2:20 PM

Details

Summary

The current read from network is working from up to down - we have some
protocol needing the data from the network, so we build the buffer space
for that protocol, add the extra space for headers and pass this buffer
down to be filled by nif get call in hope, we have guessed the incoming
packet size right. Amazingly enough this approach mostly does work, but
not always...

So, this update does work from down to up - we allocate buffer (based
on MTU or frame size info), fill it up, and pass on for upper layers.
The obvious problem is that when we should free the buffer - if at all.

In the current implementation the upper layer will free the packet on error
or when the packet is no longer needed.

While working on the issue, the additional issue did pop up - the bios
implementation does not have generic get/put interface but is using pxe
udpsend/udpreceive instead. So the udp calls are gone and undi interface
is implemented instead. Which in turn means slight other changes as we
do not need to have duplicated pxe implementation and can just use dev_net.

Note the transformation to dev_net.c usage is not yet finalized, the efi
side will need some work and I am not yet completely sure about the details.

To align packet content, the actual read from nic is using shifted buffer by 2.
The UEFI network read seems to like to provide more data than is suggested
by the MTU, so additional padding is added.

Test Plan

Tested network boot with loader.efi and pxeboot on x86. Other platforms
still need to be tested...

update: tinderbox sparc64 build seems to be good.

Diff Detail

Event Timeline

use nitems() instead of NENTS()

EFI should build dev_net.c and link efinet_dev with netdev.

use getsecs() instead of time(0), fix efi getsecs...

We do not need to restore *ip on bad checksum - the whole packet
will be freed anyhow.

Can you tell me a bit more about the testing you have done on this, and what remains to be done?

Maybe we can get Andrew Turner to try net booting an ARM64 machine with this.

Can you tell me a bit more about the testing you have done on this, and what remains to be done?

Maybe we can get Andrew Turner to try net booting an ARM64 machine with this.

I have only tested the x86 (efi/bios) because thats what I have... hm, it just did hit me tho... I think the other platforms are "a bit broken" with this code, because there we assume the receiver will allocate the packet; so, I really need to check over the relevant code bits first and see if I can understand how the receive happens there and update the receive bits.

Implement ofw_net and uboot net changes.

Update to revision 317100

I have tested this on armv6 (beaglebone and wandboard) and it works. It's hard to say if there is any change in performance since on the wandboard it already took about 1.5 seconds to load the kernel, and it seems to be about the same, but it's hard to time something that short "by eye".

lib/libstand/bootp.c
306

No need for 0x%p, the %p formatting already puts a 0x on. Also, no need for the (void*) cast of bp to print it.

334

Probably should change this free/malloc sequence to a reallocf() call. Then we can optimize libstand realloc() to do nothing if the existing buffer is big enough already (as a separate piece of work from these changes).

lib/libstand/bootparam.c
131

The code in libstand mostly follows the style(9) rule of not initializing a variable in the definition of it. (I hate that rule, it's one of the dumbest, but even so I tend to follow it when the nearby code is already doing so.)

326

uint32_t (since it's changed below for this var). same in the decode function below

lib/libstand/nfs.c
821

Setting all these local variables to 0 before returning is a bit odd. Likewise on other error-exit paths. Was this related to a "goto done" type scheme that got removed or didn't get finished?

862

Does pkt leak away on this exit path?

sys/boot/ofw/libofw/ofw_net.c
147

I'm not sure this +2 to align ip headers is a good idea for all arches. We had problems with the uboot loader and uboot's IO API routines complaining about doing IO that wasn't on cacheline boundaries. We ended up tweaking libstand's malloc to make sure that all allocations were aligned to cacheline size to fix the problem (see libstand/zalloc_defs.h).

I suspect the platforms that use OF may have their own alignment requirements, and the same may be true of the efi code when uboot starts supporting efi (well, when we start using that support which could be any time now).

Can you tell me a bit more about the testing you have done on this, and what remains to be done?

Maybe we can get Andrew Turner to try net booting an ARM64 machine with this.

I have only tested the x86 (efi/bios) because thats what I have... hm, it just did hit me tho... I think the other platforms are "a bit broken" with this code, because there we assume the receiver will allocate the packet; so, I really need to check over the relevant code bits first and see if I can understand how the receive happens there and update the receive bits.

lib/libstand/bootp.c
306

yep, and we have %zd for ssize_t.

334

except the libstand does not have reallocf() so this idea has to wait:)

lib/libstand/nfs.c
821

Those are static variables, as the readdir there is implemented in an way to fetch the data, then process one dirent at the time - so we need to be careful not to step on the stale data. But yes, we can rework the cleanup there.

862

It is stored in the static variable, so the next calls will take care of it. Of course such readdir implementation itself is a bit bad, because it is sort of assuming the entire list will get processed, but I think it would deserve separate issue.

sys/boot/ofw/libofw/ofw_net.c
147

I do agree; I did use the simplest approach in this cycle, in hope that setting the alignment for ethernet header is ok for most platforms, this hope is actually based on the similar approach used by grub2.

However, I think we need better solution, mbuf like structure to provide the buffer space itself and with pointer(s) to actual data, so the custom alignment could be applied per platform. However, the focus of this work is about reworking the read buffer build to provide the read buffer where the content is created, and to avoid guessing. It may be reasonable to implement the buffer "metadata" as part of this update, or maybe as followup. I am not entirely sure which way is better.

tsoome marked 3 inline comments as done.

changes suggested by ian.

In D10232#216150, @ian wrote:

I have tested this on armv6 (beaglebone and wandboard) and it works. It's hard to say if there is any change in performance since on the wandboard it already took about 1.5 seconds to load the kernel, and it seems to be about the same, but it's hard to time something that short "by eye".

As an remark about performance: I really would not expect the change in current performance. The reason is, the change is about providing the read buffer based on the data received from NIC and remove the guessing. Of course on x86 the switch from PXE UDP read to UNDI package interface may have small effect, but it would not expect much. So the current packet size limits are the same, the same calls are performed, and the same amount of data is copied.

However, this change will allow us to implement IP re-assembly as an follow up work, and that will give boost, as we will have chance to reduce the call counts through the stack for many small packets and receive data by larger chunks.

Somehow I did lost one semicolon...

use ETHER_ALIGN and proper packet size calculation for efi and ofw.

netinet/if_ether.h does already include net/ethernet.h

tested both with pxeboot and loader.efi

This revision is now accepted and ready to land.May 6 2017, 8:11 PM
tsoome edited edge metadata.

simple fix for NULL...

This revision now requires review to proceed.May 6 2017, 8:14 PM
This revision is now accepted and ready to land.May 6 2017, 8:18 PM
This revision was automatically updated to reflect the committed changes.
cem added inline comments.
head/lib/libstand/rarp.c
150–158 ↗(On Diff #28119)

I suspect this should be readether(d, &ptr, ...);

Coverity reports this as a NULL pointer dereference (*pkt = ptr; in readether()), CID 1374944, and it does look like a real bug to me.

tsoome added inline comments.
head/lib/libstand/rarp.c
150–158 ↗(On Diff #28119)

yes, missing & operator here. review posted;)