Page MenuHomeFreeBSD

ofw_bus_subr.c: Don't treat "0" as a valid OFW node
AbandonedPublic

Authored by kd on Jun 14 2021, 2:10 PM.

Details

Reviewers
mmel
andrew
mw
Summary

ofw_bus_gen_get_node returns "0" in case of failure, where as the default implementation - ofw_bus_default_get_node returns -1.
With the addition of ofw_pci.c in 28c4e511c23f it is now possible for the former case to trigger.
This fixes pci on mcbin with FDT enabled.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

kd requested review of this revision.Jun 14 2021, 2:10 PM
kd created this revision.

@mmel any thoughts about this patch?

This revision is now accepted and ready to land.Jun 16 2021, 12:58 PM

Imho, this is just papering over real problem, it is obvious that ofw_bus_lookup_imap() should not be called for a device/bus that is not based on ofw.
I'm still trying to fully understand the real problem, but in the meantime I have a few questions:

  • Why ofw_bus_gen_get_node() returns a different value for the errored case than ofw_bus_default_get_node() ?
  • For all other buses (i2c, spi...) we have ofw and "native" variants -> why is PCI is special?
  • Why do we want to fake an enumerated pci bus (i.e. not ofw-based) as ofw-based?
  • Why do you think passing get_devinfo() to the parent is the correct default method, and why can it be used with all existing PCIE controllers?
  • How can ofw_pci work in case of more complex pcie topology (for example if the system has PCIe switches)?
In D30761#692182, @mmel wrote:

Imho, this is just papering over real problem, it is obvious that ofw_bus_lookup_imap() should not be called for a device/bus that is not based on ofw.

ofw_bus_lookup_imap is called regardless of the ofw_pci patch.
pci_dw is a class 1 driver, which inherits devmethods from ofw_pcib. The latter implements pcib_route_interrupt using ofw_pcib_route_interrupt.
In other words ofw_bus_lookup_imap is called because the RC is ofw based.

I'm still trying to fully understand the real problem, but in the meantime I have a few questions:

  • Why ofw_bus_gen_get_node() returns a different value for the errored case than ofw_bus_default_get_node() ?

Frankly I have no idea. I haven't written any of those. I guess we wanted to differentiate between a case where there is no ofw support on the bus(-1) vs. where no node was found for a given device. (0)
Since phandle_t is an uint32_t type, assigning -1 to it is a bad idea imho.

  • For all other buses (i2c, spi...) we have ofw and "native" variants -> why is PCI is special?

RCs declare themselves as pci bridges, with a pci bus attached as their child.
From kernel perspective we have something like this: pcib0(actual ofw-based driver) -> pci0 -> foo.
From dts perspective foo is a child of the RC.
Now in order for ofw_bus_get_node(foo) to work pci0 needs to be ofw aware too, i.e. it needs to pass the request up to the parent, which parsed its dts children nodes.
Now the ofw_pci.c driver should only attach, if its parent has an ofw node, see ofw_pci_probe.

  • Why do we want to fake an enumerated pci bus (i.e. not ofw-based) as ofw-based?

Only pci buses whose parents are ofw-based will use ofw_pci.
I need that to map a dts node to a normally enumerated PCI device.
LS1028A SoC uses dts to describe phy/mdio for the switch and NICs connected to the PCI bus.
Their respective driver needs to get that information, see D30729.

  • Why do you think passing get_devinfo() to the parent is the correct default method, and why can it be used with all existing PCIE controllers?

Currently the only RC that implements it is the pci_host_generic_fdt.c. On all other existing PCIe controllers get_devinfo() behavior will remain unchanged.

  • How can ofw_pci work in case of more complex pcie topology (for example if the system has PCIe switches)?

I assume that by PCIe switches you mean pci_pci bridges.
ofw_pci won't attach to those.
From your bootlog:

Top bus: pci0: <OFW PCI bus> on pcib0
Bus under a pci_pci bridge: pci1: <PCI bus> on pcib1

First, please don't take this as hating - I just think we've opened a Pandora's box full of mistakes... I've gotten caught up in the 0/-1 ambiguity for invalid phandle more than once, so I think it would be good to get that sorted out.

In D30761#692212, @mindal_semihalf.com wrote:
In D30761#692182, @mmel wrote:

Imho, this is just papering over real problem, it is obvious that ofw_bus_lookup_imap() should not be called for a device/bus that is not based on ofw.

ofw_bus_lookup_imap is called regardless of the ofw_pci patch.
pci_dw is a class 1 driver, which inherits devmethods from ofw_pcib. The latter implements pcib_route_interrupt using ofw_pcib_route_interrupt.
In other words ofw_bus_lookup_imap is called because the RC is ofw based.

Thats true, but something else is wrong in pcib_route_interrupt call-down hiearchy. This need slightly more time and deeper investigation...

I'm still trying to fully understand the real problem, but in the meantime I have a few questions:

  • Why ofw_bus_gen_get_node() returns a different value for the errored case than ofw_bus_default_get_node() ?

Frankly I have no idea. I haven't written any of those. I guess we wanted to differentiate between a case where there is no ofw support on the bus(-1) vs. where no node was found for a given device. (0)
Since phandle_t is an uint32_t type, assigning -1 to it is a bad idea imho.

After looking deeper into the dev/ofw source code, I noticed that the code only uses compare to -1 for error detection. So, I think ofw_bus_gen_get_node() is bad and should return -1 in case of error.

  • For all other buses (i2c, spi...) we have ofw and "native" variants -> why is PCI is special?

RCs declare themselves as pci bridges, with a pci bus attached as their child.
From kernel perspective we have something like this: pcib0(actual ofw-based driver) -> pci0 -> foo.
From dts perspective foo is a child of the RC.
Now in order for ofw_bus_get_node(foo) to work pci0 needs to be ofw aware too, i.e. it needs to pass the request up to the parent, which parsed its dts children nodes.
Now the ofw_pci.c driver should only attach, if its parent has an ofw node, see ofw_pci_probe.

This sound reasonable and gives sense for me. I have not objection with one exception. Passing fw_bus_get_node(foo) to parent is not necessary and I think that it cannot be used in all save. Moreover ofw_bus_devinfo structure cannot be shared between more devices (if nothing else, rest of ofw_bus_get_<foo> must not return RC properties.

  • Why do we want to fake an enumerated pci bus (i.e. not ofw-based) as ofw-based?

Only pci buses whose parents are ofw-based will use ofw_pci.

I need that to map a dts node to a normally enumerated PCI device.
LS1028A SoC uses dts to describe phy/mdio for the switch and NICs connected to the PCI bus.
Their respective driver needs to get that information, see D30729.

Yeah, I get it. But what happens when this is behind another switch - not directly connected to the root port ?

  • Why do you think passing get_devinfo() to the parent is the correct default method, and why can it be used with all existing PCIE controllers?

Currently the only RC that implements it is the pci_host_generic_fdt.c. On all other existing PCIe controllers get_devinfo() behavior will remain unchanged.

That's not true. Behavior of ofw_bus_get_<foo> was changed for for all RC connected PCI busses)

  • How can ofw_pci work in case of more complex pcie topology (for example if the system has PCIe switches)?

I assume that by PCIe switches you mean pci_pci bridges.
ofw_pci won't attach to those.
From your bootlog:

Top bus: pci0: <OFW PCI bus> on pcib0
Bus under a pci_pci bridge: pci1: <PCI bus> on pcib1

Sorry for confusion, I was mean whas if DT contains more complex topology,
trivial example is tegra -> https://cgit.freebsd.org/src/tree/sys/contrib/device-tree/src/arm64/nvidia/tegra210.dtsi#n18 with dual port RC(root port is connected to degraded PCIe switch, with fixed configuration, invisible from outside), where each port needs own ofw node different from RC.

The ofw_bus_gen_get_node() fix removes the panic in ofw_bus_lookup_imap().
For ofw_pci -> I think we should allocate a custom devinfo for each bus and populate it with the appropriate data, node should be explicit if the bus is mentioned in the DT, or implicit if the RC driver needs it, or otherwise empty.
And (in an ideal world) every device sitting on an ofw-based bus should have an ofw devinfo, empty or filled in from the DT.
What do you think?
Btw: ofw_pci should be bound to ofw_pcib, not pure pcib ( https://cgit.freebsd.org/src/tree/sys/dev/ofw/ofw_pci.c#n69 ). This will make the code more readable and save testing in the probe function.

In D30761#692573, @mmel wrote:

First, please don't take this as hating - I just think we've opened a Pandora's box full of mistakes... I've gotten caught up in the 0/-1 ambiguity for invalid phandle more than once, so I think it would be good to get that sorted out.

No worries, I agree that we should use only a single value to indicate a "NULL"/incorrent phandle.
Since it's a uint32_t I'd personally go with "0".

In D30761#692212, @mindal_semihalf.com wrote:
In D30761#692182, @mmel wrote:

Imho, this is just papering over real problem, it is obvious that ofw_bus_lookup_imap() should not be called for a device/bus that is not based on ofw.

ofw_bus_lookup_imap is called regardless of the ofw_pci patch.
pci_dw is a class 1 driver, which inherits devmethods from ofw_pcib. The latter implements pcib_route_interrupt using ofw_pcib_route_interrupt.
In other words ofw_bus_lookup_imap is called because the RC is ofw based.

Thats true, but something else is wrong in pcib_route_interrupt call-down hiearchy. This need slightly more time and deeper investigation...

I'm still trying to fully understand the real problem, but in the meantime I have a few questions:

  • Why ofw_bus_gen_get_node() returns a different value for the errored case than ofw_bus_default_get_node() ?

Frankly I have no idea. I haven't written any of those. I guess we wanted to differentiate between a case where there is no ofw support on the bus(-1) vs. where no node was found for a given device. (0)
Since phandle_t is an uint32_t type, assigning -1 to it is a bad idea imho.

After looking deeper into the dev/ofw source code, I noticed that the code only uses compare to -1 for error detection. So, I think ofw_bus_gen_get_node() is bad and should return -1 in case of error.

Well as mentioned above we should definitely return just one error value.
The problem is that changing the return value of ofw_bus_gen_get_node() could potentially break something else that relies on it returning 0.

  • For all other buses (i2c, spi...) we have ofw and "native" variants -> why is PCI is special?

RCs declare themselves as pci bridges, with a pci bus attached as their child.
From kernel perspective we have something like this: pcib0(actual ofw-based driver) -> pci0 -> foo.
From dts perspective foo is a child of the RC.
Now in order for ofw_bus_get_node(foo) to work pci0 needs to be ofw aware too, i.e. it needs to pass the request up to the parent, which parsed its dts children nodes.
Now the ofw_pci.c driver should only attach, if its parent has an ofw node, see ofw_pci_probe.

This sound reasonable and gives sense for me. I have not objection with one exception. Passing fw_bus_get_node(foo) to parent is not necessary and I think that it cannot be used in all save. Moreover ofw_bus_devinfo structure cannot be shared between more devices (if nothing else, rest of ofw_bus_get_<foo> must not return RC properties.

I don't think we can accidentally return RC node/other property now.
ofw_bus_gen_get_<foo> will try to call OFW_BUS_GET_DEVINFO, which should return NULL for all existing RCs, except for pci_host_generic_fdt.c when a matching device was found.

  • Why do we want to fake an enumerated pci bus (i.e. not ofw-based) as ofw-based?

Only pci buses whose parents are ofw-based will use ofw_pci.

I need that to map a dts node to a normally enumerated PCI device.
LS1028A SoC uses dts to describe phy/mdio for the switch and NICs connected to the PCI bus.
Their respective driver needs to get that information, see D30729.

Yeah, I get it. But what happens when this is behind another switch - not directly connected to the root port ?

Right now it won't work with any dts device nodes that are not direct children of the RC.

  • Why do you think passing get_devinfo() to the parent is the correct default method, and why can it be used with all existing PCIE controllers?

Currently the only RC that implements it is the pci_host_generic_fdt.c. On all other existing PCIe controllers get_devinfo() behavior will remain unchanged.

That's not true. Behavior of ofw_bus_get_<foo> was changed for for all RC connected PCI busses)

Well, kind of.
The difference is that now we call ofw_bus_gen_get_<foo> instead of the default implementations.
On all drivers except pci_host_generic_fdt.c those will fail.
The only difference and the root cause for this breakage is that _get_node routine returns 0, instead of -1.
(...)

Sorry for confusion, I was mean whas if DT contains more complex topology,
trivial example is tegra -> https://cgit.freebsd.org/src/tree/sys/contrib/device-tree/src/arm64/nvidia/tegra210.dtsi#n18 with dual port RC(root port is connected to degraded PCIe switch, with fixed configuration, invisible from outside), where each port needs own ofw node different from RC.

Hmm, I haven't seen this before. I guess this is a great example of dts incompatibility/mess.
If I understand correctly in the dts above RC children nodes describe ports, where as in LS1028A dts the children describe devices connected to the bus.

The ofw_bus_gen_get_node() fix removes the panic in ofw_bus_lookup_imap().

It does, but it could also break something that depends on it returning 0 in case of an error.

For ofw_pci -> I think we should allocate a custom devinfo for each bus and populate it with the appropriate data, node should be explicit if the bus is mentioned in the DT, or implicit if the RC driver needs it, or otherwise empty.
And (in an ideal world) every device sitting on an ofw-based bus should have an ofw devinfo, empty or filled in from the DT.
What do you think?

That sounds like a cleaner solution compared to what I did.
The problem is that right now I can't see an easy way to differentiate between bus and an endpoint device nodes in dts.

Btw: ofw_pci should be bound to ofw_pcib, not pure pcib ( https://cgit.freebsd.org/src/tree/sys/dev/ofw/ofw_pci.c#n69 ). This will make the code more readable and save testing in the probe function.

Actually that's how I wanted to do it initially, but I don't think it will work.
pci_host_generic used by pci_host_generic_fdt attaches directly do pcib, ignoring ofw_pcib.

In D30761#692611, @mindal_semihalf.com wrote:
In D30761#692573, @mmel wrote:

First, please don't take this as hating - I just think we've opened a Pandora's box full of mistakes... I've gotten caught up in the 0/-1 ambiguity for invalid phandle more than once, so I think it would be good to get that sorted out.

No worries, I agree that we should use only a single value to indicate a "NULL"/incorrent phandle.
Since it's a uint32_t I'd personally go with "0".

I understand but 0 is (probably only theoretically) valid pnode.

In D30761#692212, @mindal_semihalf.com wrote:
In D30761#692182, @mmel wrote:

Imho, this is just papering over real problem, it is obvious that ofw_bus_lookup_imap() should not be called for a device/bus that is not based on ofw.

ofw_bus_lookup_imap is called regardless of the ofw_pci patch.
pci_dw is a class 1 driver, which inherits devmethods from ofw_pcib. The latter implements pcib_route_interrupt using ofw_pcib_route_interrupt.
In other words ofw_bus_lookup_imap is called because the RC is ofw based.

Thats true, but something else is wrong in pcib_route_interrupt call-down hiearchy. This need slightly more time and deeper investigation...

I'm still trying to fully understand the real problem, but in the meantime I have a few questions:

  • Why ofw_bus_gen_get_node() returns a different value for the errored case than ofw_bus_default_get_node() ?

Frankly I have no idea. I haven't written any of those. I guess we wanted to differentiate between a case where there is no ofw support on the bus(-1) vs. where no node was found for a given device. (0)
Since phandle_t is an uint32_t type, assigning -1 to it is a bad idea imho.

After looking deeper into the dev/ofw source code, I noticed that the code only uses compare to -1 for error detection. So, I think ofw_bus_gen_get_node() is bad and should return -1 in case of error.

Well as mentioned above we should definitely return just one error value.
The problem is that changing the return value of ofw_bus_gen_get_node() could potentially break something else that relies on it returning 0.

That's true, but we have time to next release to fix all protentional problems.

  • For all other buses (i2c, spi...) we have ofw and "native" variants -> why is PCI is special?

RCs declare themselves as pci bridges, with a pci bus attached as their child.
From kernel perspective we have something like this: pcib0(actual ofw-based driver) -> pci0 -> foo.
From dts perspective foo is a child of the RC.
Now in order for ofw_bus_get_node(foo) to work pci0 needs to be ofw aware too, i.e. it needs to pass the request up to the parent, which parsed its dts children nodes.
Now the ofw_pci.c driver should only attach, if its parent has an ofw node, see ofw_pci_probe.

This sound reasonable and gives sense for me. I have not objection with one exception. Passing fw_bus_get_node(foo) to parent is not necessary and I think that it cannot be used in all save. Moreover ofw_bus_devinfo structure cannot be shared between more devices (if nothing else, rest of ofw_bus_get_<foo> must not return RC properties.

I don't think we can accidentally return RC node/other property now.
ofw_bus_gen_get_<foo> will try to call OFW_BUS_GET_DEVINFO, which should return NULL for all existing RCs, except for pci_host_generic_fdt.c when a matching device was found.

Booom, that's a stupid mistake. Plus, it's my own stupid mistake. I never realized that ofw_pcib doesn't implement ofw_bus_get_devinfo :(. And the useless definition ofw_bus_get_node hides the correct definition derived from of_pcib - ofw_bus_get_node(). This gives much more light to the strange behavior ofw_pcib_route_interrupt() on the affected controllers. Thanks for pointing me to this bug.

  • Why do we want to fake an enumerated pci bus (i.e. not ofw-based) as ofw-based?

Only pci buses whose parents are ofw-based will use ofw_pci.

I need that to map a dts node to a normally enumerated PCI device.
LS1028A SoC uses dts to describe phy/mdio for the switch and NICs connected to the PCI bus.
Their respective driver needs to get that information, see D30729.

Yeah, I get it. But what happens when this is behind another switch - not directly connected to the root port ?

Right now it won't work with any dts device nodes that are not direct children of the RC.

  • Why do you think passing get_devinfo() to the parent is the correct default method, and why can it be used with all existing PCIE controllers?

Currently the only RC that implements it is the pci_host_generic_fdt.c. On all other existing PCIe controllers get_devinfo() behavior will remain unchanged.

That's not true. Behavior of ofw_bus_get_<foo> was changed for for all RC connected PCI busses)

Well, kind of.
The difference is that now we call ofw_bus_gen_get_<foo> instead of the default implementations.
On all drivers except pci_host_generic_fdt.c those will fail.
The only difference and the root cause for this breakage is that _get_node routine returns 0, instead of -1.
(...)

Sorry for confusion, I was mean whas if DT contains more complex topology,
trivial example is tegra -> https://cgit.freebsd.org/src/tree/sys/contrib/device-tree/src/arm64/nvidia/tegra210.dtsi#n18 with dual port RC(root port is connected to degraded PCIe switch, with fixed configuration, invisible from outside), where each port needs own ofw node different from RC.

Hmm, I haven't seen this before. I guess this is a great example of dts incompatibility/mess.
If I understand correctly in the dts above RC children nodes describe ports, where as in LS1028A dts the children describe devices connected to the bus.

yes, and I think that exact binding is always controller specific.

The ofw_bus_gen_get_node() fix removes the panic in ofw_bus_lookup_imap().

It does, but it could also break something that depends on it returning 0 in case of an error.

For ofw_pci -> I think we should allocate a custom devinfo for each bus and populate it with the appropriate data, node should be explicit if the bus is mentioned in the DT, or implicit if the RC driver needs it, or otherwise empty.
And (in an ideal world) every device sitting on an ofw-based bus should have an ofw devinfo, empty or filled in from the DT.
What do you think?

That sounds like a cleaner solution compared to what I did.
The problem is that right now I can't see an easy way to differentiate between bus and an endpoint device nodes in dts.

Bus should have device_type = "pci" property. But again - the exact binding is defined per controller.

Btw: ofw_pci should be bound to ofw_pcib, not pure pcib ( https://cgit.freebsd.org/src/tree/sys/dev/ofw/ofw_pci.c#n69 ). This will make the code more readable and save testing in the probe function.

Actually that's how I wanted to do it initially, but I don't think it will work.
pci_host_generic used by pci_host_generic_fdt attaches directly do pcib, ignoring ofw_pcib.

Ahh, right, sorry. I forgot that pci_host_generic_fdt() is not subclass of ofw_pcib.

Ok, so how about this.
I'll move all dt parsing I recently added from pci_host_generic_fdt.c to ofw_pci.c.
Then pci_host_generic_fdt will check if it has any children in dt and create a fake ofw devinfo for pci bus as you suggested.
That devinfo will point to the RCs ofw node so that ofw_pci will be able to parse dt by itself.
This will make ofw_pci attach only when its actually needed.
Any thoughts?

No worries, I agree that we should use only a single value to indicate a "NULL"/incorrent phandle.
Since it's a uint32_t I'd personally go with "0".

I understand but 0 is (probably only theoretically) valid pnode.

I don't think that is the case. In sys/dev/ofw/openfirm.c they have a comment saying: /* Return the first child of this node or 0. */.
So at least the openfirm code treats "0" as invalid pnode.

(...)

Hmm, I haven't seen this before. I guess this is a great example of dts incompatibility/mess.
If I understand correctly in the dts above RC children nodes describe ports, where as in LS1028A dts the children describe devices connected to the bus.

yes, and I think that exact binding is always controller specific.

The problem is that the RC in LS1028A declares it self as compatible = "pci-host-ecam-generic";, so this scheme will have to be used for all ecam ofw-based devices.

Hi @mmel The patch is IMO correct and plan to commit it, so that to remove breakage for Armada platforms. In case any further improvements are required, feel free to continue the discussion.

I'm working on this - but situation about return value becomes more clear now. Please see comment in ofw_if.m https://cgit.freebsd.org/src/tree/sys/dev/ofw/ofw_bus_if.m#n148 and andrew just pointed me to this commit https://cgit.freebsd.org/src/commit/?h=0d8d9edaaaca1

@mmel sure, thanks for the heads up. Please provide the pointer once available.

This helps on the GT 8k (admitting very close to the mcbin) to boot as well (pas the PCI-E controller).
See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256779 for the problem I had encountered without this.
Thanks a lot for the patch.

nathan confirmed that ofw_bus_gen_get_node() should return -1 for non-ofw based device. i will commit fix asap (tomorrow).