Page MenuHomeFreeBSD

Add AMD NTB Hardware Driver for AMD SoC
ClosedPublic

Authored by rajeshasp on Jan 7 2019, 4:46 PM.

Details

Reviewers
mav
cem
Group Reviewers
Core Team
Commits
rS349594: Add driver for NTB in AMD SoC.
Summary

This patch is the driver for NTB hardware in AMD SoCs (ported from Linux) and enables the NTB infrastructure like Doorbells, Scratchpads and Memory window in AMD SoC. This driver has been validated using ntb_transport and if_ntb driver already available in FreeBSD.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/dev/ntb/ntb.c
485–488 ↗(On Diff #52636)

It seems like this needs to incorporate nc->dbmask (the value provided by hints) still. I'm not sure what the point of the hinting system is, though, you'd have to ask Mav.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
297 ↗(On Diff #52636)

style nit: a space goes between if and (.

304 ↗(On Diff #52636)

Usual style for indentation on subsequent lines is 4 spaces. We have a style(9) manual page for more examples / specifics.

658 ↗(On Diff #52636)

Starting a string sysctl with a newline is unconventional

872 ↗(On Diff #52636)

M_WAITOK allocations cannot fail in FreeBSD.

1292 ↗(On Diff #52636)

I'd encourage adding a MODULE_PNP_INFO(), like the Intel NTB driver has.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
217 ↗(On Diff #52636)

Probably stale comment?

Hi cem,

In D18774#400705, @cem wrote:

I didn't really look at this driver thoroughly and don't plan to, but just skimmed through and pointed out some possible improvements.

Is there a public datasheet for this part or is this device spec NDA'd?

Thanks!

Thanks for looking into this patch. The details of this hardware is not yet made public. But let me know if you need any specific details. I will see if I can share you necessary details.

Corrected overall style mistakes. Also, moved "info" sysctl to global space, just to be in sync with other ntb_hw drivers.

sys/dev/ntb/ntb.c
485–488 ↗(On Diff #52636)

In ntb.c, ntb_register_device function call dbt = flsll(NTB_DB_VALID_MASK(dev)). If nc->dbmask is returned from NTB_DB_VALID_MASK, it will be zero (irrespective of whatever the hardware driver says. So, hint becomes necessary to configured the db count. If we have the change in this patch, initially hardware driver provided db will be taken and later based on the hint, it will support only specific db_count. That's the reason I changed this. Please correct me if I am wrong.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
297 ↗(On Diff #52636)

Also, checked for other similar occurrences and corrected it.

304 ↗(On Diff #52636)

Also, checked for other similar occurrences and corrected it.

872 ↗(On Diff #52636)

Removed the check for NULL

1292 ↗(On Diff #52636)

I see this macro register device information. But how useful that will be with the system overall? It may be simple question, but just trying to understand.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
24 ↗(On Diff #52636)

Added the 3rd Clause. Copy/paste mistake.

217 ↗(On Diff #52636)

Kept this as other ntb_hw drivers has the same comment in place.

rajeshasp marked 5 inline comments as done.

V2 patch addressing the previous comments from cem.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
1292 ↗(On Diff #52636)

It's used by devmatch to automatically load modules based on plug and play information.

1178 ↗(On Diff #52693)

I'd encourage the use of pci_device_table here and the use of PCI_PNP_INFO instead below. That's not as important as the PNP_INFO macro though.

mav requested changes to this revision.Jan 16 2019, 10:20 PM

I've quite briefly looked through the AMD part itself, but I have objections about interface part. See inline.

sys/dev/ntb/ntb.c
424 ↗(On Diff #52693)

I am not sure about motivation of this function, but it ignores hints mechanism.

431 ↗(On Diff #52693)

This function looks like a subset of ntb_mw_get_range(), except it does not respect hints mechanism, in particular nc->mwoff. I can understand addition of this function if it is needed to match Linux KPI, but otherwise I see no real need or benefits.

485–488 ↗(On Diff #52636)

The system of hints allows to split resources of single NTB controller between several consumers. For example, on our hardware we split it between if_ntb/ntb_transport interface and large separate window for NVDIMM synchronization. If somebody wish to try it, it should be possible to set up two ntb_transport instances using different windows, scratchpads and doorbells, each with its own if_ntb interface.

I don't understand the provided explanation, I don't see what initial zero you are talking about. While we are not splitting doorbells on our setup, I tested this code and it worked just fine, while your code is obviously broken, since all NTB children will see the same mask despite hints, which is wrong.

sys/dev/ntb/ntb.h
111 ↗(On Diff #52693)

I know Linux added those methods, but is there real usage for them? If we add them, we'll need either implement them for Intel and PLX or add default handlers, so that callers won't crash.

This revision now requires changes to proceed.Jan 16 2019, 10:20 PM
rajeshasp added inline comments.
sys/dev/ntb/ntb.c
424 ↗(On Diff #52693)

This was added for ntb_tool (D18819) and ntb_perf(D18832). But this function is currently not used by those drivers. So removed this.

431 ↗(On Diff #52693)

This was also added for ntb_tool (D18819) and ntb_perf(D18832). But this function is currently not used by those drivers. So removed this also.

485–488 ↗(On Diff #52636)

Sorry about the confusion here. I confused between "ntb_mw_count" and "NTB_MW_COUNT", which leads to that initial zero statement. Yes, it's not needed and is removed now.

I would be interested to try the scenario what is mentioned here. Can you please let me know how to configure the resources for two ntb_transport instances and get two if_ntb interfaces?

sys/dev/ntb/ntb.h
111 ↗(On Diff #52693)

ntb_tool (D18819) uses this. That's the reason its been added. Yes I agree that we need default handlers for other hardware drivers. But they need to be implemented in the respective hardware drivers right (or) is there any other approach we can use, as in Linux?

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
1178 ↗(On Diff #52693)

Changed the PCI IDS to pci_device_table format and used PCI_PNP_INFO rather than MODULE_PNP_INFO.

rajeshasp marked an inline comment as done.

Addressed commented from @imp and @mav and submitted V3 patch. Also, added amd_ntb_mw_set_wc routine.

I just made a pass through for anything that caught my eye, and found a couple. I have not reviewed it to see if it matches the hardware spec for what it's trying to control.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
79 ↗(On Diff #52982)

I'm not sure this is the right way to grab what you are grabbing. In the normal kernel build, all .h files are present in the current directory, or directly via -I lines. Having constructs like this leads to problems.

109 ↗(On Diff #52982)

This is 0

111 ↗(On Diff #52982)

This is 1

116 ↗(On Diff #52982)

This a small positive integer of indeterminate value (which happens not to be 0 or 1 by chance).

So you have overlapping ranges here which is dangerous. There's other places as well that have this construct.

715 ↗(On Diff #52982)

So *ntb doesn't ever change? If that's true, this is OK. If it changes, there's no locking in amd_ntb_hw_info_handler.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
7 ↗(On Diff #52982)

Strictly speaking, 'All Rights Reserved.' is not needed and baggage from a by-gone era no longer with any real meaning. You may drop it entirely, if you'd like (it's up to the copyright holder to determine the form of their copyright noice, but the project has recently dropped the phrase from its model license).

sys/dev/ntb/ntb.c
485–488 ↗(On Diff #52636)

You should be able to attach two ntb_transport's by setting such loader tunable (if I understand properly what resources this AMD NTB has):
hint.ntb_hw.0.config="ntb_transport:1:6:8,ntb_transport"

To attach third there probably won't be enough scratchpads, since ntb_transport uses at least 6 in case of one memory window.

In D18774#403351, @imp wrote:

I just made a pass through for anything that caught my eye, and found a couple. I have not reviewed it to see if it matches the hardware spec for what it's trying to control.

As I mentioned, the spec for this hardware is not yet made public. If you wish to see, you can refer the Linux driver for the same (drivers/ntb/hw/amd). Please let me know if you have any specific questions.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
79 ↗(On Diff #52982)

Changed it to "dev/ntb/ntb.h". Other hardware drivers shall also be changed to be in sync. @mav , can we make this change for other hardware drivers as well.

116 ↗(On Diff #52982)

I assume, the concern here is returning positive error value, so that value could wrongly be interpreted as a valid value. If so, that's a valid concern and it's been changed to negative value now.

715 ↗(On Diff #52982)

Other than lnk_sta, rest of the ntb structure elements accessed are constants. Other values which could change, are being read from registers (using bus_space_read_X calls), which I believe is atomic and doesn't need any locking. "lnk_sta" is also changed only during link events. Please let me know if that needs a lock.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
7 ↗(On Diff #52982)

Removed. Also removed, unwanted Copy-right from other vendor

rajeshasp marked 3 inline comments as done.

Submitting V4 patch, addressing comment from @imp

sys/dev/ntb/ntb.c
485–488 ↗(On Diff #52636)

Hi Mav,

I tried this config in my setup. As expected, I could able to create only one transport instance and second transport instance fails (because of minimum number of spads). But with the created instance with minimum 6 spads, things are working good. Thanks for the info.

A question here, If two instance of ntb_transport is created properly, will loading if_ntb, create two network interfaces automatically with queues split equally (or based on the hint.ntb_transport.X.config)? or any manual config is needed to create two separate network interfaces?

Any more comments on this patch?

sys/dev/ntb/ntb.c
485–488 ↗(On Diff #52636)

Each if_ntb instance is independent and will automatically attach to its ntb_transport instance. If you look, you may also attach several if_ntb instances to one ntb_transport. Number of queues in each case will depend on number of doorbells it get after the split. It is all configurable, but defaults where possible should be sane.

I have no idea about the hardware, but I have suspicion that alignment constraints you may report incorrectly. Disregard if I am wrong.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
303 ↗(On Diff #53053)

Just to be sure, does the hardware allows 1MB aligned windows for any window sizes? Not aligned to bigger window sizes? Neither Intel nor PLX can do that. For PLX I had to use A-LUT translation table to reduce this constraint.

355 ↗(On Diff #53053)

I am curios why those checks are needed? If hardware has constraints on translation address, instead of failing here you should better report the constraint via align above. You may check if there is a reason, but I'd prefer it to be exception rather then rule.

384 ↗(On Diff #53053)

I am curios why those checks are needed? If hardware has constraints on window size, instead of failing here you should better report the constraint via align_size above.

rajeshasp added inline comments.
sys/dev/ntb/ntb_hw/ntb_hw_amd.c
303 ↗(On Diff #53053)

@mav, you are right. HW doesn't allow 1M aligned. It allow a granularity range. The minimum granularity supported by this hardware is 4K aligned, which is what Linux provides. Made the same changes to return 4K aligned now in FreeBSD.

Just curious, when you say "Intel and PLX can't do that", I assume those hardware doesn't allow higher size aligned (like 1M)? Is that right? But in code, I see the comments like the alignment should be BAR sized. Also, A-LUT is there to support small size granularity. So, can you please clarify what you mean?

355 ↗(On Diff #53053)

In AMD hardware, these registers are read-only when link is not enabled. So, when a client (ntb_transport) is loaded, first it calls set_trans to write these register and then does link enable. So these checks are there to capture the read-only scenario and return without proceeding furthur.

Also inline with this, needed a clarification. In client driver detach path, link disable happens before clear_trans(from free_mws). Is that valid? Shouldn't it be clearing the translate registers, then link disable and then free the mw resources? Because of this, we have the old values retained in some scenarios until the next set_trans happens to overwrite. Functionally, it is not disturbing. But looking for clarity on this.

384 ↗(On Diff #53053)

This is for the same reason why we have the check for xlat registers above.

rajeshasp marked an inline comment as done.

Addressed comment from @mav regarding the alignment and explained the reason for those checks for xlat and limit registers.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
303 ↗(On Diff #53053)

Your change to 4K is actually opposite of what I expected. So I'd recommend to do double check that.

What I am saying is that both Intel and PLX NTB (without A-LUT) require translation address to be multiple of window (PCI BAR) size, i.e. NTB window of 1MB must be aligned to 1MB on both sides, and window of 64GB has to be aligned to 64GB, that is actually quite inconvenient. In case of PLX there is additional A-LUT table of 256 entries, which allow single window to be translated as 256 separate chunks, so that 64GB window may have alignment of 65536/256 => 256MB, that is much nicer.

355 ↗(On Diff #53053)

You may see, that early window allocation was added to try avoid problems with allocation of significant piece of physically contiguous memory if it take a lot of system run time before the link first established. After the link is established, the code will check and try to adjust allocation, if needed. Neither Intel nor PLX had the problem with register writing you are describing, so the code may indeed require some tuning. Just consider that error reported there during pre-allocation will free the allocated memory window, that may give you back the problem I described later.

No new patch attached, but looking for few clarification on your previous comments.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
303 ↗(On Diff #53053)

I will double check about this part and update about it.

355 ↗(On Diff #53053)

Yes, we might run into the problem of not getting contiguous physical memory. But problem here is, in ntb_transport when link_up event happens and when the code tries to adjust allocation, it's seeing that (mw->xlat_size == xlat_size), and returning without programming the xlat and limit registers (set_trans will not be called). So, if that check in AMD driver needs to be removed. I could see two possible ways to handle this.

  1. I am not sure writing the xlat and limit registers once again would be a problem with Intel and PLX. If not, can we re-write those registers again when link up happens?
  2. In Linux, set_trans happens only during Link event. So, can we have the same approach here? Inline with this, may be a silly question. HW Link up happens right after module loading. So do we really face trouble in getting the physically contiguous memory during link events? But wondering about the use-case for the probability of this issue?

Any comments? or any better way we can go about here?

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
355 ↗(On Diff #53053)

I think we could split memory allocation from setting translation address in ntb_transport, and still try to preallocate it on driver load, while set translation address only on link set up.

What worries me though, is that if as you say registers are not writable when link is down, then what happen when link is back up? Will they be reset to defaults with window being disabled until configured, or keep their values, and letting remote side access possibly repurposed memory?

What's about link up delay, the reason is simple -- the other node is physically not present, powered off, or at least not booted at the time. It may boot first time days or even months after the first one, when the first one has no physically contiguous memory any more.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
355 ↗(On Diff #53053)

Splitting memory allocation and setting xlat address would be good.

To clarify, those registers are read-only only until link is enabled (not till link up). Link enable happens during driver load (but after ntb_set_mw to program the xlat and limit registers, which is the problem we have now). So, if splitting should be fine, we can allocate the memory before link enable and program the register after link enable. And we can have both memory reallocation and programming during link up (as we have currently), but just having them split. These registers will be RW until link is disabled (which happens during driver unload). So, link up/down will not disturb those registers or mw window access differently (unless mw size changes and needs reallocation during link up).

If we are ok with this approach, I shall make the necessary changes and submit the next patch. I am yet to get the clarity on the alignment part. Will try to get that asap.

@mav: I checked about that alignment requirements. Seems, the alignment should be on-par to the bar size (and not a fixed value) and the granularity range what I mentioned earlier for LUT based addressing (which is not implemented here). So, I need to change the alignment equal to the bar size. Do you have any comments on the previous discussed changes?

Yes, please make a patch to split allocation and programming.

Made necessary changes as per the previous discussion. Attached V6 patch.

@mav, Any more comments on this patch?

@mav: Sorry to disturb you on this again. But, please have a look at this when you get a chance and let me know if there any comments to address here.

@mav, Did you get a chance to look at this?

brooks added a subscriber: brooks.

This is dual licensed and fine per current policy.

Is there anything else needed here? or can this be taken upstream?

This revision was not accepted when it landed; it landed in state Needs Review.Jul 2 2019, 5:25 AM
Closed by commit rS349594: Add driver for NTB in AMD SoC. (authored by mav). · Explain Why
This revision was automatically updated to reflect the committed changes.

I am sorry for enormous delay. Long flight over Atlantic finally made me to look on this. Committed to FreeBSD head with minor changes. I've build-tested it, but please make sure that it works fine hardware also. Thank you for your work.

Thanks @mav for reviewing and pushing the patch upstream. I verified it to work properly. I have some more minor changes to this Driver (basically to add support for another PCI Device ID). Shall I submit the additional patch here itself? or should I open another review for the same?

In D18774#452750, @rajfbsd_gmail.com wrote:

Thanks @mav for reviewing and pushing the patch upstream. I verified it to work properly. I have some more minor changes to this Driver (basically to add support for another PCI Device ID). Shall I submit the additional patch here itself? or should I open another review for the same?

Please create a new review :-).

Ok. I will create another review. Thanks for your time and immediate response.