FreeBSD

Replace dhcp option 150 by 66
Abandoned (Public)

Authored by kczekirda on Apr 24 2017, 9:34 PM.
Tags
None

Details

Summary

With the current behavior, netbooting over TFTP requires setting option 150 on the DHCP server, but this option is not defined in the PXE specification. Based only on the RFC 1048 data from PXE, we cannot properly recognize the user's intention, so we always boot over NFS. If TAG_TFTP_SERVER_NAME is a dotted-quad address that inet_addr() can convert to an n_long, set tftpip to this address and then netboot over the TFTP protocol.
Replacing option 150 with option 66, which is stored in the vm_rfc1048 data, also makes it possible to skip the bootp() request when booting over TFTP.

Sponsored by: Oktawave
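For illustration only: looking up option 66 in the cached reply amounts to walking the RFC 2132 options that follow the magic cookie 0x63825363 (the cookie shown in the tcpdump output later in this review). The following is a minimal sketch, assuming a plain byte buffer. find_dhcp_option() and the OPT_* macros are hypothetical names, not libstand's actual interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Option codes from RFC 2132; these macro names are hypothetical. */
#define OPT_PAD			0
#define OPT_TFTP_SERVER_NAME	66
#define OPT_END			255

/*
 * Walk the options that follow the magic cookie in a DHCP/BOOTP vendor
 * area and return a pointer to the value of option `code`, storing its
 * length in *lenp.  Returns NULL if the cookie is wrong, the option is
 * absent, or the option data is truncated.
 */
static const uint8_t *
find_dhcp_option(const uint8_t *vend, size_t vendlen, uint8_t code,
    size_t *lenp)
{
	static const uint8_t cookie[4] = { 0x63, 0x82, 0x53, 0x63 };
	size_t i = 4;

	if (vendlen < 4 || memcmp(vend, cookie, 4) != 0)
		return (NULL);
	while (i < vendlen) {
		uint8_t tag = vend[i++];

		if (tag == OPT_PAD)	/* single filler byte, no length */
			continue;
		if (tag == OPT_END)	/* end of the options */
			break;
		if (i >= vendlen)	/* length byte missing */
			return (NULL);
		size_t len = vend[i++];
		if (len > vendlen - i)	/* value runs past the buffer */
			return (NULL);
		if (tag == code) {
			*lenp = len;
			return (&vend[i]);
		}
		i += len;
	}
	return (NULL);
}
```

The returned value is length-delimited and not NUL-terminated, so callers must copy it before treating it as a string.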

Event Timeline

Even if this is the correct change to make, the old option must still be supported for backwards compatibility with older PXE servers. Shouldn't there be an accompanying documentation change? How will users know to change their DHCP options?

Can BOTH options be sent with the same value?

RFC 1048 was obsoleted, first by RFC 1533, then by RFC 2132. Please let's follow the current RFCs when seeking information and referring to them; referring to obsolete RFCs is going to lead to obsolete code. I see, at least in the comment adding vend_end, that you refer to RFC 2132, so it is unclear why you referred to 1048 in the description of the change. RFC 2132 names code 66 "TFTP server name". I cannot find option 150 defined in the RFC. I do find from Google searches that this was/is a Cisco-specific value: "DHCP option 150 provides the IP addresses of a list of TFTP servers. • DHCP option 66 gives the IP address or the hostname of a single TFTP server. Note Cisco IP Phones might also include DHCP option 3 in their requests, which sets the default route. A single request might include both options 150 and 66." This clarifies that it is possible to send both.

The code change as it stands corrects the use of a non-standard DHCP option and should move forward on that basis. It may be worth adding a release-notes ("Relnotes: yes") entry and a note in UPDATING saying that we no longer support the Cisco-specific DHCP option 150, or else implementing that option with proper processing (I don't think the current code would actually use a list, but I did not verify that).

The old code processed this item as a list of ADDRESSES, which is what Cisco says it was. The new code also does this, but option 66 is a NAME. It can probably also contain a dotted-quad address, but assuming it is a dotted-quad address is a bug.

/usr/src/lib/libstand/bootp.c:443

Here inet_addr(val) assumes that val is a dotted quad; the specification does not say that. It says name, and a name in the spec usually means either a hostname or a dotted quad.
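A sketch of treating option 66 as a name instead of assuming a dotted quad. It is written against the userland <arpa/inet.h> API for illustration; whether libstand provides inet_pton() is an assumption not settled here. parse_tftp_server_name() and its enum are hypothetical. It strips trailing NULs (RFC 2132 says receivers of NVT ASCII options must be prepared to delete them) and reports a hostname separately, so the loader, which has no resolver, can print a clear error instead of silently falling back.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical result codes for interpreting option 66. */
enum tftp_server_kind {
	TFTP_SERVER_ADDR,	/* value is a dotted-quad IPv4 address */
	TFTP_SERVER_HOSTNAME,	/* value is a name; needs a resolver */
	TFTP_SERVER_INVALID	/* empty, or does not fit the buffer */
};

/*
 * Interpret the value of DHCP option 66 ("TFTP server name").  Option
 * data is not NUL-terminated, so it is copied into a local buffer first.
 * inet_pton() accepts only the full dotted-quad form, unlike inet_addr(),
 * which also accepts shorthand such as "10" and returns INADDR_NONE for
 * both errors and "255.255.255.255".
 */
static enum tftp_server_kind
parse_tftp_server_name(const unsigned char *val, size_t len,
    struct in_addr *addr)
{
	char buf[256];

	/* Drop trailing NULs that some servers count in the length. */
	while (len > 0 && val[len - 1] == '\0')
		len--;
	if (len == 0 || len >= sizeof(buf))
		return (TFTP_SERVER_INVALID);
	memcpy(buf, val, len);
	buf[len] = '\0';
	if (inet_pton(AF_INET, buf, addr) == 1)
		return (TFTP_SERVER_ADDR);
	return (TFTP_SERVER_HOSTNAME);
}
```

With the reply captured later in this review (option 66 = "test"), this would return TFTP_SERVER_HOSTNAME, which is exactly the case that needs a user-visible message.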

/usr/src/lib/libstand/bootp.h:97

This should be TAG_TFTP_SERVER_NAME which is what the current RFC calls code 66, making it clearer that this is a name and not necessarily a dotted quad address.

kczekirda edited the summary of this revision.
/usr/src/lib/libstand/bootp.h:99

Let's use 255 instead of the hex value, as described in the RFC:

"Code 255 (END), if present, signifies the end of the encapsulated vendor extensions, not the end of the vendor extensions field. If no code 255 is present, then the end of the enclosing vendor-specific information field is taken as the end of the encapsulated vendor-specific extensions field."

Let's also follow the file's existing style:

#define VEND_INFO_END                   ((unsigned char)  255) /* End option in RFC2132  */

@asomers
This change provides exactly that compatibility with the PXE standard: option 150 does not exist in the PXE specification, but 66 does.
The netproto variable and option 150 appeared in r305125. Documentation and release notes should be updated, but not by changing anything, because this part of the loader never appeared there (or maybe I just can't find it).

@ler
It depends on your DHCP server, but only option 66 will be correct. You should also remember that if an IP address is present in the root path, it will replace the tftpip value.
https://svnweb.freebsd.org/base?view=revision&revision=305125

@rgrimes
RFC1048 is mentioned because 1048 is part of the names of functions and variables in bootp.c; I think we should rename them. I updated the revision description with a note about the dotted quad. We probably have no better way to do that.

@kczekirda I'm not worried about compatibility with the PXE standard. I'm worried about compatibility with existing users of option 150. How are you going to support them or transition them to the new option?

For now it might be best to leave the rfc1048 names as they are, and just remember when writing code that it is a bad idea to use RFC numbers in variable names, since they are very likely to change over time, as happened here. Maybe sprinkle a few comments next to them at some point noting that RFC 1048 has been superseded.

The concern about existing users of option 150 is valid. We should also ask Bapt why he chose option 150 instead of 66 when implementing this.

/usr/src/lib/libstand/bootp.c:443

You marked this as done, but it still only accepts a dotted quad. If we can't process a hostname here, maybe that is why Bapt used option 150? At a minimum, add a comment saying there is a known bug here with not processing a hostname, maybe /* XXX Should accept hostname as well as dotted quad */

/usr/src/lib/libstand/bootp.h:99

Good catch; this also makes it match the style of all the options before it (the value with the cast).

I think we really should keep both 66 and 150. But we should definitely also point out in the docs that with option 66 an IP address must be used, because the current code has no name resolver. This does add some complications to validation and user feedback.

/usr/src/lib/libstand/bootp.c:443

There should be a clear error message for the user. If you are attempting a netboot, digging through the source for XXX comments is the last thing on your mind.

I have done some more digging. RFC 3942, section 4, reclassifies options: "4. Reclassifying Options The site-specific option codes 128 to 223 are hereby reclassified as publicly defined options. This leaves 31 site-specific options, 224 to 254."
Interestingly, this RFC is by Cisco!
I found the list. These are in IANA's domain now, so you have to go to https://www.iana.org/assignments/bootp-dhcp-parameters/bootp-dhcp-parameters.xhtml to find them, which leads you to RFC 5859. Reading that RFC makes me say something unpopular here: we should be using option 150 as IP addresses, and NOT using option 66, because we cannot do hostname resolution. That was partly the reason Cisco requested option 150.
Does the code even request option 66 from the server? Or do we just expect sname to be filled in? I read some of the Intel PXE 2.1 spec, and they don't even mention option 66 in their list of options, yet they refer to it several times at the sname line of the data structures.
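Per RFC 5859, option 150 carries one or more IPv4 addresses, four octets each, with the length set to four times the number of addresses. If the loader kept supporting it, a list-aware parser could look like this minimal sketch; parse_tftp_server_list() is a hypothetical name, and deciding which address to try first is left to the caller.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse DHCP option 150: a list of IPv4 TFTP server addresses, 4 bytes
 * each, already in network byte order.  Copies up to `max` addresses
 * into `out` and returns how many were stored, or -1 if the length is
 * not a positive multiple of 4.
 */
static int
parse_tftp_server_list(const unsigned char *val, size_t len,
    struct in_addr *out, int max)
{
	int n = 0;

	if (len == 0 || len % 4 != 0)
		return (-1);
	for (size_t off = 0; off < len && n < max; off += 4, n++)
		memcpy(&out[n].s_addr, val + off, 4);
	return (n);
}
```

Unlike option 66, there is nothing to resolve here, which is the property the comment above points to.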

I have also run across conflicts in what vendors say goes in option 66; some even say that it is an address. The RFCs and Cisco clearly state that it should be treated as a name and that you need name resolution available if you are going to use it.

Before I act on your comments about the code, I want to point out that it's not possible to support option 150, because we would have to ask the DHCP server for this option, and the PXE client (I mean the network card firmware) never asks for option 150. I can't see any way to use this non-standard option other than reverting r314948 and always doing a DHCP request for everything. The second option is to do the DHCP request inside an #ifdef LOADER_TFTP_SUPPORT block, which is also bad, because we want one universal loader for both protocols (NFS and TFTP). The third option is to force a DHCP request in an ifdef block when somebody really wants it. And the last option is to drop support for option 150, since it has never appeared in the documentation. I can't see any really good solution; your move.

The more I look at PXE and DHCP, the sicker I get. You're right, a client will not normally request option 150; for that matter I am not sure it will request 66 either, though I suspect many do, and probably many of them also wrongly interpret it as a dotted quad when it is actually a hostname. I have had to fuss with DHCP servers and PXE clients for days, if not weeks, to get some operating systems booting over the network. Right now I can boot almost any FreeBSD version >9 as long as I am not on a UEFI platform, so our code is not that broken. I do have to use loader code from -head on the older releases, so things have improved. My use case is rather narrow, I suppose, in that I always chainload iPXE to use its advanced features instead of what most cards have in their BIOS. I get FreeBSD's pxeboot via TFTP, then use that to get the kernel via NFS. Again, can we get input from Bapt, who added option 150, on how it worked?

Note that dhcp servers in real life can offer all the configured data. And well, we can process both 66 and 150 just because that data may be there anyhow.

But there is also a much simpler way to distinguish NFS from TFTP boot, because that is the real problem: how to tell whether we should go for NFS or for TFTP. The idea is simple. If no server is set, we use the next-server option for the NFS/TFTP server. The root_path syntax as such is not defined, so we can simply state that if root_path is IP:/path then it is NFS, and if root_path is /path then it is TFTP.

Meanwhile I would also like to move on with D10232 :)
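The root-path rule proposed above can be sketched as follows, assuming the root path is available as a NUL-terminated string and that the next-server (siaddr) is taken from the BOOTP header by the caller. proto_from_root_path() and enum netboot_proto are hypothetical names. Hostnames in the root path are deliberately rejected, because the loader has no resolver.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

enum netboot_proto { PROTO_NFS, PROTO_TFTP, PROTO_UNKNOWN };

/*
 * Hypothetical sketch of the proposed rule: "IP:/path" means NFS from IP,
 * a bare "/path" means TFTP from the next-server.  For NFS, *server is
 * parsed from the root path; for TFTP it is left for the caller to fill
 * from siaddr.  *path points at the path part in both cases.
 */
static enum netboot_proto
proto_from_root_path(const char *rootpath, struct in_addr *server,
    const char **path)
{
	char host[INET_ADDRSTRLEN];
	const char *colon;
	size_t hlen;

	if (rootpath[0] == '/') {
		*path = rootpath;
		return (PROTO_TFTP);
	}
	colon = strchr(rootpath, ':');
	if (colon == NULL || colon[1] != '/')
		return (PROTO_UNKNOWN);
	hlen = (size_t)(colon - rootpath);
	if (hlen == 0 || hlen >= sizeof(host))
		return (PROTO_UNKNOWN);
	memcpy(host, rootpath, hlen);
	host[hlen] = '\0';
	if (inet_pton(AF_INET, host, server) != 1)
		return (PROTO_UNKNOWN);	/* e.g. a hostname */
	*path = colon + 1;
	return (PROTO_NFS);
}
```

With the reply captured in this review (root path "192.168.22.19:/"), this selects NFS from 192.168.22.19; a root path such as "/tftpboot" would select TFTP from the next-server, which implies the single-TFTP-server limitation discussed in this thread.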

@rgrimes
Please try to boot CURRENT over the TFTP protocol without any third-party software like iPXE.

@tsoome

It's not a server-side problem, but a client-side one. How can we process option 150 if it does not exist? I'm sorry, but I'm not able to do this.

tcpdump: listening on br-lan, link-type EN10MB (Ethernet), capture size 65535 bytes
23:39:38.527420 IP (tos 0x0, ttl 20, id 0, offset 0, flags [none], proto UDP (17), length 576)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 84:8f:69:f1:00:2d, length 548, xid 0x69f1002d, Flags [Broadcast] (0x8000)
	  Client-Ethernet-Address 84:8f:69:f1:00:2d
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Discover
	    Parameter-Request Option 55, length 36: 
	      Subnet-Mask, Time-Zone, Default-Gateway, Time-Server
	      IEN-Name-Server, Domain-Name-Server, RL, Hostname
	      BS, Domain-Name, SS, RP
	      EP, RSZ, TTL, BR
	      YD, YS, NTP, Vendor-Option
	      Requested-IP, Lease-Time, Server-ID, RN
	      RB, Vendor-Class, TFTP, BF
	      Option 128, Option 129, Option 130, Option 131
	      Option 132, Option 133, Option 134, Option 135
	    MSZ Option 57, length 2: 1260
	    GUID Option 97, length 17: 0.68.69.76.76.67.0.16.50.128.89.185.192.79.87.80.49
	    Client-ID Option 61, length 17: "DELLC^@^P2M-^@YM-9M-@OWP1"
	    ARCH Option 93, length 2: 0
	    NDI Option 94, length 3: 1.2.1
	    Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 181
23:39:39.007131 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 342)
    192.168.22.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 314, xid 0x69f1002d, Flags [Broadcast] (0x8000)
	  Your-IP 192.168.22.56
	  Server-IP 192.168.22.19
	  Client-Ethernet-Address 84:8f:69:f1:00:2d
	  file "pxeboot"
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 192.168.22.1
	    Lease-Time Option 51, length 4: 3600
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    Default-Gateway Option 3, length 4: 192.168.22.1
	    Domain-Name-Server Option 6, length 8: 8.8.8.8,8.8.4.4
	    Hostname Option 12, length 5: "e6220"
	    RP Option 17, length 15: "192.168.22.19:/"
	    BR Option 28, length 4: 192.168.22.255
	    TFTP Option 66, length 4: "test"
	    END Option 255, length 0

If we want to process data from dhcp_try_rfc1048(), we have to drop option 150, because it does not exist there.
I like your idea for detecting protocol type.

If it does not exist, then it's easy ;) The better question is what to do if both 66 and 150 are present and are different ;)

BTW, I tested with my VMware Fusion VM; its PXE boot sends out:
DHCP: Message type = DHCPDISCOVER
DHCP: Requested Options:
DHCP: 1 (Subnet Mask)
DHCP: 2 (UTC Time Offset)
DHCP: 3 (Router)
DHCP: 5 (IEN 116 Name Servers)
DHCP: 6 (DNS Servers)
DHCP: 11 (RFC 887 Resource Location Servers)
DHCP: 12 (Client Hostname)
DHCP: 13 (Boot File size in 512 byte Blocks)
DHCP: 15 (DNS Domain Name)
DHCP: 16 (SWAP Server)
DHCP: 17 (Client Root Path)
DHCP: 18 (BOOTP options extensions path)
DHCP: 43 (Vendor Specific Options)
DHCP: 54 (DHCP Server Identifier)
DHCP: 60 (Client Class Identifier =)
DHCP: 67 (Simple Mail (SMTP) Servers)
DHCP: 128 (Site Option)
DHCP: 129 (Site Option)
DHCP: 130 (Site Option)
DHCP: 131 (Site Option)
DHCP: 132 (Site Option)
DHCP: 133 (Site Option)
DHCP: 134 (Site Option)
DHCP: 135 (Site Option)
DHCP: Maximum DHCP Message Size = 1260 bytes
DHCP: Unrecognized Option = 97, length = 17 octets
DHCP: Value = 0x00 0x56 0x4D 0xED 0x79 0xDF 0xA9 0x40 0xE1 0x61 0xAC 0x8F 0x85 0x11 0x7B 0xF5 0x59 (unprintable)
DHCP: Unrecognized Option = 93, length = 2 octets
DHCP: Value = 0x00 0x00 (unprintable)
DHCP: Unrecognized Option = 94, length = 3 octets
DHCP: Value = 0x01 0x02 0x01 (unprintable)
DHCP: Client Class Identifier = "PXEClient:Arch:00000:UNDI:002001"

So, no 66 or 150 from this one. Also, I checked with isc-dhcp 4.3.4: it does not offer options that were *not* asked for.

This means that if we do not want to rely on the root path syntax, and we want TFTP/NFS selection to be configurable from the DHCP server config, we cannot rely on the PXE reply. We have to build the proper option list and request those options ourselves, leaving the PXE response as is (it is only for loading the boot file anyhow).

Yes, there may be a client asking for option 66, as seen above, but there are also ones *not* asking, and the sad thing is that you have no way to tell whether 66 was asked for but the admin did not set it in the DHCP config because NFS is wanted, or 66 was not asked for by PXE but we would like a TFTP boot.
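Building the request ourselves means emitting a Parameter Request List (option 55, RFC 2132) that names every code we care about, including 66 and 150, since a server such as isc-dhcp only sends what is asked for. A minimal sketch; put_param_request_list() is a hypothetical helper, and the choice of codes below is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Append a Parameter Request List (option 55) to a DHCP options buffer.
 * Returns the number of bytes written, or 0 if the list is empty, has
 * more than 255 codes, or does not fit in the buffer.
 */
static size_t
put_param_request_list(uint8_t *buf, size_t buflen, const uint8_t *codes,
    size_t ncodes)
{
	if (ncodes == 0 || ncodes > 255 || buflen < ncodes + 2)
		return (0);
	buf[0] = 55;			/* Parameter Request List */
	buf[1] = (uint8_t)ncodes;	/* length = number of codes */
	for (size_t i = 0; i < ncodes; i++)
		buf[2 + i] = codes[i];
	return (ncodes + 2);
}
```

For example, the codes {1, 3, 6, 17, 66, 150} (subnet mask, router, DNS, root path, TFTP server name, TFTP server address) produce the 8 bytes 55, 6, 1, 3, 6, 17, 66, 150.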

Using DHCP option 150 is a hack. I picked one that is unused and uncommon to avoid any issue with existing setups where option 66 is used for something else (since it is common). Maybe I went too far with that.

IMHO the best approach is to detect the protocol from the root path. But, as I stated in the commit log, iPXE already does that with proto://..., and when one chainloads pxeboot from iPXE, which is quite common these days, iPXE will fail on protocols it does not know about.

If someone finds a way to put all the information in the root path without breaking the iPXE chainload, that is IMHO the best idea.

@tsoome
I based this on Intel's PXE Specification, and option 66 exists there.

Never mind, let's focus on the root-path. If we cannot use a tftp prefix, maybe it can be a suffix? iPXE ignores a bad root-path like "192.168.22.19:/:tftp".

(Attached screenshot: "Screenshot at 2017-05-03 11:50:10.png")

This is ugly, but maybe less that abusing dhcp options :)
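To make the suffix idea concrete, here is a minimal sketch that detects a trailing ":tftp" and strips it in place, so the remaining root path parses exactly as before. strip_tftp_suffix() is a hypothetical name, and it assumes a writable, NUL-terminated buffer.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch of the ":tftp" suffix idea: "192.168.22.19:/:tftp"
 * selects TFTP.  On a match the suffix is removed in place and true is
 * returned; otherwise the string is left untouched.
 */
static bool
strip_tftp_suffix(char *rootpath)
{
	static const char suffix[] = ":tftp";
	size_t len = strlen(rootpath);
	size_t slen = sizeof(suffix) - 1;

	if (len < slen || strcmp(rootpath + len - slen, suffix) != 0)
		return (false);
	rootpath[len - slen] = '\0';
	return (true);
}
```

The code is trivial; the cost of this approach lies in the configuration convention it imposes on every DHCP server.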

Specifications are nice, but as the VMware case shows, you really cannot assume they are followed. As for the root path, the suffix idea is messy; I would really keep the root path plain and simple, because otherwise it is too easy to mess up the configuration. Also, I think the best solution really is to just make the DHCP query with options 66 and 150 listed, so existing setups with 150 will continue to work; 66 may be preferred (but has an issue if someone uses a name, as discussed earlier). Also note that the ability to reuse the PXE packet is not much of a win anyhow: the UEFI case is more or less forced to issue a DHCP request regardless. At least I was not able to fetch the PXE ACK when the SimpleNetwork API is used, and I actually did try to see if it is possible. I think there is more value in a simple, clear implementation of the network setup than in trying to save a bit on one single platform and implementing other hacks. Note that D10232 nukes the BIOS PXE-specific code and switches to the generic common/dev_net.c.
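If both options end up being requested, the loader still needs a rule for the case raised above where 66 and 150 disagree. One possible ordering, sketched under the assumption that each source has already been parsed into an address: a root-path address first (matching the current behavior described earlier in this review), then option 66, then the first option 150 address, then next-server. The struct and function are hypothetical, and the order is one choice, not something agreed on in this review.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical container for the candidate TFTP server sources. */
struct tftp_candidates {
	bool		have_rootpath_ip;
	struct in_addr	rootpath_ip;	/* address from "IP:/path" */
	bool		have_opt66;
	struct in_addr	opt66;		/* option 66, if a dotted quad */
	bool		have_opt150;
	struct in_addr	opt150_first;	/* first address of option 150 */
	struct in_addr	next_server;	/* BOOTP siaddr, always present */
};

/* Pick the TFTP server by fixed precedence; next-server is the fallback. */
static struct in_addr
choose_tftp_server(const struct tftp_candidates *c)
{
	if (c->have_rootpath_ip)
		return (c->rootpath_ip);
	if (c->have_opt66)
		return (c->opt66);
	if (c->have_opt150)
		return (c->opt150_first);
	return (c->next_server);
}
```

Keeping the policy in one small function would also make it easy to document, which addresses the earlier concern about users not knowing which option wins.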

And one last possible solution:

Don't use either option 66 or 150: when the root-path includes an IP address, boot through NFS; if there is no IP address in the root-path, boot through TFTP from the server whose IP address is in next-server. There is one limitation, though: only one TFTP server on the network can provide the loader and everything else. Does anybody use more than one?

Well, the root path idea is not bad (IMO ;)), I just have a bit of the same concern: at the end of the day it still may not be enough for whatever reason, and then we are back at the beginning :D

I only recently looked into our pxeboot, and it seems to be grossly non-compliant with the PXE specification.
And it probably needs to be compliant, because it acts as a PXE client.
For starters, it does not support "Proxy DHCP" (a PXE server running separately from a DHCP server) at all.
The format of option 60, the vendor class identifier, is non-compliant.
Option 93, client system architecture, is not sent at all, and the same goes for a few other mandatory options.

It seems that pxeboot works only with very permissive servers.
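For reference, the PXE vendor class identifier has the form "PXEClient:Arch:xxxxx:UNDI:yyyzzz" (five-digit decimal architecture, three-digit UNDI major and minor versions), and option 93 (RFC 4578) carries a 16-bit architecture type. Here is a minimal sketch that reproduces the values from the Dell capture in this review (arch 0, UNDI 2.1, a 32-byte option 60); the function names are hypothetical, and a real client would also send option 94 and the other mandatory options.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the PXE vendor class identifier (option 60 value).  Returns the
 * snprintf() result; a value >= buflen means the output was truncated.
 */
static int
format_pxe_class_id(char *buf, size_t buflen, uint16_t arch,
    uint8_t undi_major, uint8_t undi_minor)
{
	return (snprintf(buf, buflen, "PXEClient:Arch:%05u:UNDI:%03u%03u",
	    (unsigned)arch, (unsigned)undi_major, (unsigned)undi_minor));
}

/* Encode option 93 with a single architecture type, big-endian. */
static size_t
put_client_arch_option(uint8_t *buf, uint16_t arch)
{
	buf[0] = 93;			/* Client System Architecture Type */
	buf[1] = 2;			/* one 16-bit type */
	buf[2] = (uint8_t)(arch >> 8);
	buf[3] = (uint8_t)(arch & 0xff);
	return (4);
}
```

With arch 0 and UNDI 2.1 this yields "PXEClient:Arch:00000:UNDI:002001", the same string the Dell firmware sent in the capture above.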