Page MenuHomeFreeBSD

nvdimm: add a driver for the NVDIMM root device
ClosedPublic

Authored by scottph on Nov 26 2018, 9:19 PM.
Tags
None
Referenced Files
F108542399: D18346.id52966.diff
Sun, Jan 26, 3:15 AM
F108542140: D18346.id51839.diff
Sun, Jan 26, 3:10 AM
Unknown Object (File)
Fri, Jan 24, 6:58 PM
Unknown Object (File)
Fri, Jan 24, 5:31 PM
Unknown Object (File)
Fri, Jan 24, 5:35 AM
Unknown Object (File)
Thu, Jan 23, 6:06 AM
Unknown Object (File)
Wed, Jan 22, 8:28 PM
Unknown Object (File)
Wed, Jan 22, 5:20 PM
Subscribers

Details

Summary

The NVDIMM root device is parent to the individual ACPI NVDIMM
devices. Add a driver for the NVDIMM root device that can own
enumeration of NVDIMM devices as well as NVDIMM SPA ranges that
the system has.

Submitted by: D Scott Phillips <d.scott.phillips@intel.com>
Sponsored by: Intel Corporation

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

So there are really two changes, and the addition of the spa_g_destroy() is rather small comparing to the other part. You stopped allocating spa_mappings[] array at all. Can you extract it into separate change ?

BTW, I am not sure if it is possible to destroy geom forcibly by a user command. If yes, then the tight bind between spa lifetime and geom lifetime is undesirable, I think, because besides geom spa provides usermode access device.

In D18346#390124, @kib wrote:

So there are really two changes, and the addition of the spa_g_destroy() is rather small comparing to the other part. You stopped allocating spa_mappings[] array at all. Can you extract it into separate change ?

BTW, I am not sure if it is possible to destroy geom forcibly by a user command. If yes, then the tight bind between spa lifetime and geom lifetime is undesirable, I think, because besides geom spa provides usermode access device.

ah, yes it looks like userspace can issue a command to destroy a geom. I'll rework the patch to separate geom and device lifetimes.

In D18346#390125, @scott.d.phillips_intel.com wrote:
In D18346#390124, @kib wrote:

So there are really two changes, and the addition of the spa_g_destroy() is rather small comparing to the other part. You stopped allocating spa_mappings[] array at all. Can you extract it into separate change ?

BTW, I am not sure if it is possible to destroy geom forcibly by a user command. If yes, then the tight bind between spa lifetime and geom lifetime is undesirable, I think, because besides geom spa provides usermode access device.

ah, yes it looks like userspace can issue a command to destroy a geom. I'll rework the patch to separate geom and device lifetimes.

I am not sure that there is a generic geom destroy command, all I was able to find is class-specific. Still, it might be better to split the lifetimes on principle.

scottph retitled this revision from nvdimm: provide a destroy_geom method to nvdimm_spa: rework SPA ownership.
scottph edited the summary of this revision. (Show Details)

@kib Here's an approach to splitting the geom and spa device by making nvdimm_spa devices on the acpi bus. Let me know if this looks like the right direction here.

  • check for the existence of an spa device before adding a new one to the acpi bus.

Why do we need newbus attachment for SPAs ?

For ACPI-enumerated NVDIMMs, creating the devices is more or less natural since they represent physical objects that exist in the chassis. But SPAs are purely software constructs, which newbus devices seems to only tunnel the calls to init_ine()/fini_one().

In D18346#392701, @kib wrote:

Why do we need newbus attachment for SPAs ?

For ACPI-enumerated NVDIMMs, creating the devices is more or less natural since they represent physical objects that exist in the chassis. But SPAs are purely software constructs, which newbus devices seems to only tunnel the calls to init_ine()/fini_one().

I created new devices to not need to reinvent lifecycle tracking. While we don't support it yet SPAs can come and go dynamically when dimms are hotplugged. Also in the future SPAs will have namespaces which will be variable in number and can come and go dynamically.

I agree that SPAs aren't really like devices, but more precisely like resources provided by the NVDIMM root device, but they could reasonably proxy some operations to either the nvdimm devices or the nvdimm root device (like writes to flush hint addresses on nvdimms, or provide error information via the root device's address range scrub functionality). Providing some operations from a somewhat virtual SPA or namespace device would let userspace not have to reconstruct which dimms compose a SPA, or what address range of a SPA make up a namespace.

So I think those reasons are compelling, but I can understand that it's a bit weird. If you're not convinced then I can go back to tracking SPAs not as devices, but in that case we will need to implement a fair bit of what bus can already provide us.

In D18346#392730, @scott.d.phillips_intel.com wrote:
In D18346#392701, @kib wrote:

Why do we need newbus attachment for SPAs ?

For ACPI-enumerated NVDIMMs, creating the devices is more or less natural since they represent physical objects that exist in the chassis. But SPAs are purely software constructs, which newbus devices seems to only tunnel the calls to init_ine()/fini_one().

I created new devices to not need to reinvent lifecycle tracking. While we don't support it yet SPAs can come and go dynamically when dimms are hotplugged. Also in the future SPAs will have namespaces which will be variable in number and can come and go dynamically.

I agree that SPAs aren't really like devices, but more precisely like resources provided by the NVDIMM root device, but they could reasonably proxy some operations to either the nvdimm devices or the nvdimm root device (like writes to flush hint addresses on nvdimms, or provide error information via the root device's address range scrub functionality). Providing some operations from a somewhat virtual SPA or namespace device would let userspace not have to reconstruct which dimms compose a SPA, or what address range of a SPA make up a namespace.

So I think those reasons are compelling, but I can understand that it's a bit weird. If you're not convinced then I can go back to tracking SPAs not as devices, but in that case we will need to implement a fair bit of what bus can already provide us.

What do we need ? I see that we need a unique unit number (currently punned by reusing the SPA index) and a way to list all SPAs at the unload time, to destruct them. Would be a global list of all SPAs enough for the later ?

In D18346#392735, @kib wrote:

What do we need ? I see that we need a unique unit number (currently punned by reusing the SPA index) and a way to list all SPAs at the unload time, to destruct them. Would be a global list of all SPAs enough for the later ?

That, and then later a per-spa list of namespaces and the corresponding code to handle their discovery/creation/deletion/etc

In D18346#392749, @scott.d.phillips_intel.com wrote:
In D18346#392735, @kib wrote:

What do we need ? I see that we need a unique unit number (currently punned by reusing the SPA index) and a way to list all SPAs at the unload time, to destruct them. Would be a global list of all SPAs enough for the later ?

That, and then later a per-spa list of namespaces and the corresponding code to handle their discovery/creation/deletion/etc

But namespaces have no relevance for newbus at all, e.g. we do not create device_t for partitions for the same reason.

In D18346#392753, @kib wrote:

But namespaces have no relevance for newbus at all, e.g. we do not create device_t for partitions for the same reason.

hmm ok, it sounds like a bus is just the wrong tool for this job then. I'll take it out.

ah, the other thing was that having SPAs as devices gave me a convenient fix to the ordering problem where the nvdimm devices are not yet attached at SPA initialization time. perhaps a better fix would be to have a device on the acpi bus for the NVDIMM root device (ACPI0012), which control the SPA initialization, outside of anything related to geom.

scottph retitled this revision from nvdimm_spa: rework SPA ownership to nvdimm: Move ownership of SPA mappings to the NVDIMM root device.
scottph edited the summary of this revision. (Show Details)

I do not understand your fixation on the newbus attachment for SPA. Now you are trying to hide it under the root NVDIMM device, but again, I do not see any connection between the root and activation of SPA support. To get SPA used, we only need SPA entries from the NFIT table.

Other way to say it, suppose that nvdimm.c driver attaches not only to the NVDIMM devices proper, but also to the root device. How would your trick work then ?

I want to have some progress with this patch, I see clearly independent part which moves the spa_g fields from SPA_mapping into g_spa_softc. Can you extract this into separate patch ? I think this would be committable as is.

In D18346#394755, @kib wrote:

I do not understand your fixation on the newbus attachment for SPA. Now you are trying to hide it under the root NVDIMM device, but again, I do not see any connection between the root and activation of SPA support. To get SPA used, we only need SPA entries from the NFIT table.

I'm looking for the right place to put the ownership of the SPA_mappings. It seems wrong to have them owned and managed by the geom init function when geom is becoming a separate thing with a separate lifetime that can be destroyed before the SPA_mapping is destroyed. Does that sound like a reasonable motivation, or are you saying that doesn't matter?

In a certain sense, I think the SPA ranges could be considered as resources provided by the NVDIMM root device, specifically when you look at it in connection with the _DSMs the root device provides (sec 9.20.7 in the acpi spec). Though like you say, they're not like SPAs are _CRS resources so I suppose that view isn't inherent in the way things are modeled.

Other way to say it, suppose that nvdimm.c driver attaches not only to the NVDIMM devices proper, but also to the root device. How would your trick work then ?

I want to have some progress with this patch, I see clearly independent part which moves the spa_g fields from SPA_mapping into g_spa_softc. Can you extract this into separate patch ? I think this would be committable as is.

Sure, so to be clear you're saying that the right way to go is to have the ownership of the SPA_mappings stay managed by the geom init function after the split?

In D18346#394762, @scott.d.phillips_intel.com wrote:
In D18346#394755, @kib wrote:

I do not understand your fixation on the newbus attachment for SPA. Now you are trying to hide it under the root NVDIMM device, but again, I do not see any connection between the root and activation of SPA support. To get SPA used, we only need SPA entries from the NFIT table.

I'm looking for the right place to put the ownership of the SPA_mappings. It seems wrong to have them owned and managed by the geom init function when geom is becoming a separate thing with a separate lifetime that can be destroyed before the SPA_mapping is destroyed. Does that sound like a reasonable motivation, or are you saying that doesn't matter?

In a certain sense, I think the SPA ranges could be considered as resources provided by the NVDIMM root device, specifically when you look at it in connection with the _DSMs the root device provides (sec 9.20.7 in the acpi spec). Though like you say, they're not like SPAs are _CRS resources so I suppose that view isn't inherent in the way things are modeled.

Ok. I think we could do some cross of what you want and what I do not want.

Right now how I wrote nvdimm.c, it directly digs into subordinate devices of the root NVDIMM device and instantiate device_t for each found device in the namespace in probe. What if we add a top-level newbus driver (in nvdimm.c) which attaches to the root ACPI device. This driver duties would be:

  • create children for each subordinate device, where the current nvdimm.c driver attaches. Basically, this moves the nvdimm_probe() into the root device driver;
  • iterate over the NFIT and create SPAs, basically identical to what you do in the current patch.

The difference with the patch is that root driver is not a hack but a place where we put all the enumeration logic.

Other way to say it, suppose that nvdimm.c driver attaches not only to the NVDIMM devices proper, but also to the root device. How would your trick work then ?

I want to have some progress with this patch, I see clearly independent part which moves the spa_g fields from SPA_mapping into g_spa_softc. Can you extract this into separate patch ? I think this would be committable as is.

Sure, so to be clear you're saying that the right way to go is to have the ownership of the SPA_mappings stay managed by the geom init function after the split?

No, I am not saying this. I want to commit half of the patch which does not cause much of discussion.

In D18346#394764, @kib wrote:

Ok. I think we could do some cross of what you want and what I do not want.

Right now how I wrote nvdimm.c, it directly digs into subordinate devices of the root NVDIMM device and instantiate device_t for each found device in the namespace in probe. What if we add a top-level newbus driver (in nvdimm.c) which attaches to the root ACPI device. This driver duties would be:

  • create children for each subordinate device, where the current nvdimm.c driver attaches. Basically, this moves the nvdimm_probe() into the root device driver;
  • iterate over the NFIT and create SPAs, basically identical to what you do in the current patch.

The difference with the patch is that root driver is not a hack but a place where we put all the enumeration logic.

OK, got it, thanks for the direction.

scottph retitled this revision from nvdimm: Move ownership of SPA mappings to the NVDIMM root device to nvdimm: add a driver for the NVDIMM root device.
scottph edited the summary of this revision. (Show Details)
scottph edited the test plan for this revision. (Show Details)
sys/dev/nvdimm/nvdimm.c
2 ↗(On Diff #52520)

Update copyright ?

91 ↗(On Diff #52520)

Not directly related to your patch. You might want to change this malloc() to mallocarray, even if only for consistency with the style of the rest of the file.

This revision is now accepted and ready to land.Jan 15 2019, 5:54 AM
sys/dev/nvdimm/nvdimm.c
2 ↗(On Diff #52520)

Also, please update the review description by providing all required tags for the commit. 'Submitted by: <your canonical email address>', 'Sponsored by:' (if required) etc.

scottph edited the summary of this revision. (Show Details)

Add copyright for Intel, add commit message tags, allocate flush addrs with mallocarray .

This revision now requires review to proceed.Jan 17 2019, 6:23 PM
This revision is now accepted and ready to land.Jan 18 2019, 10:43 AM
This revision was automatically updated to reflect the committed changes.