Page MenuHomeFreeBSD

Add AMD NTB Hardware Driver for AMD SoC
Needs ReviewPublic

Authored by rajfbsd_gmail.com on Jan 7 2019, 4:46 PM.

Details

Reviewers
mav
cem
Group Reviewers
Core Team
Summary

This patch is the driver for NTB hardware in AMD SoCs (ported from Linux) and enables the NTB infrastructure like Doorbells, Scratchpads and Memory window in AMD SoC. This driver has been validated using ntb_transport and if_ntb driver already available in FreeBSD.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
cem added a comment.Jan 8 2019, 1:08 AM

I didn't really look at this driver thoroughly and don't plan to, but just skimmed through and pointed out some possible improvements.

Is there a public datasheet for this part or is this device spec NDA'd?

Thanks!

sys/dev/ntb/ntb.c
485–488

It seems like this needs to incorporate nc->dbmask (the value provided by hints) still. I'm not sure what the point of the hinting system is, though, you'd have to ask Mav.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
298

style nit: a space goes between if and (.

305

Usual style for indentation on subsequent lines is 4 spaces. We have a style(9) manual page for more examples / specifics.

659

Starting a string sysctl with a newline is unconventional

873

M_WAITOK allocations cannot fail in FreeBSD.

1293

I'd encourage adding a MODULE_PNP_INFO(), like the Intel NTB driver has.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
25

Do you have AMD's permission to drop the 3rd clause of the BSDL present in the Linux driver?

218

Probably stale comment?

rajfbsd_gmail.com marked 6 inline comments as done.Jan 9 2019, 2:27 PM

Hi cem,

In D18774#400705, @cem wrote:

I didn't really look at this driver thoroughly and don't plan to, but just skimmed through and pointed out some possible improvements.

Is there a public datasheet for this part or is this device spec NDA'd?

Thanks!

Thanks for looking into this patch. The details of this hardware is not yet made public. But let me know if you need any specific details. I will see if I can share you necessary details.

Corrected overall style mistakes. Also, moved "info" sysctl to global space, just to be in sync with other ntb_hw drivers.

sys/dev/ntb/ntb.c
485–488

In ntb.c, ntb_register_device function call dbt = flsll(NTB_DB_VALID_MASK(dev)). If nc->dbmask is returned from NTB_DB_VALID_MASK, it will be zero (irrespective of whatever the hardware driver says. So, hint becomes necessary to configured the db count. If we have the change in this patch, initially hardware driver provided db will be taken and later based on the hint, it will support only specific db_count. That's the reason I changed this. Please correct me if I am wrong.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
298

Also, checked for other similar occurrences and corrected it.

305

Also, checked for other similar occurrences and corrected it.

873

Removed the check for NULL

1293

I see this macro register device information. But how useful that will be with the system overall? It may be simple question, but just trying to understand.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
25

Added the 3rd Clause. Copy/paste mistake.

218

Kept this as other ntb_hw drivers has the same comment in place.

rajfbsd_gmail.com marked 5 inline comments as done.

V2 patch addressing the previous comments from cem.

imp added inline comments.Jan 9 2019, 4:07 PM
sys/dev/ntb/ntb_hw/ntb_hw_amd.c
1179

I'd encourage the use of pci_device_table here and the use of PCI_PNP_INFO instead below. That's not as important as the PNP_INFO macro though.

1293

It's used by devmatch to automatically load modules based on plug and play information.

mav requested changes to this revision.Jan 16 2019, 10:20 PM

I've quite briefly looked through the AMD part itself, but I have objections about interface part. See inline.

sys/dev/ntb/ntb.c
424

I am not sure about motivation of this function, but it ignores hints mechanism.

431

This function looks like a subset of ntb_mw_get_range(), except it does not respect hints mechanism, in particular nc->mwoff. I can understand addition of this function if it is needed to match Linux KPI, but otherwise I see no real need or benefits.

485–488

The system of hints allows to split resources of single NTB controller between several consumers. For example, on our hardware we split it between if_ntb/ntb_transport interface and large separate window for NVDIMM synchronization. If somebody wish to try it, it should be possible to set up two ntb_transport instances using different windows, scratchpads and doorbells, each with its own if_ntb interface.

I don't understand the provided explanation, I don't see what initial zero you are talking about. While we are not splitting doorbells on our setup, I tested this code and it worked just fine, while your code is obviously broken, since all NTB children will see the same mask despite hints, which is wrong.

sys/dev/ntb/ntb.h
111

I know Linux added those methods, but is there real usage for them? If we add them, we'll need either implement them for Intel and PLX or add default handlers, so that callers won't crash.

This revision now requires changes to proceed.Jan 16 2019, 10:20 PM
rajfbsd_gmail.com marked an inline comment as done.Jan 18 2019, 2:16 PM
rajfbsd_gmail.com added inline comments.
sys/dev/ntb/ntb.c
424

This was added for ntb_tool (D18819) and ntb_perf(D18832). But this function is currently not used by those drivers. So removed this.

431

This was also added for ntb_tool (D18819) and ntb_perf(D18832). But this function is currently not used by those drivers. So removed this also.

485–488

Sorry about the confusion here. I confused between "ntb_mw_count" and "NTB_MW_COUNT", which leads to that initial zero statement. Yes, it's not needed and is removed now.

I would be interested to try the scenario what is mentioned here. Can you please let me know how to configure the resources for two ntb_transport instances and get two if_ntb interfaces?

sys/dev/ntb/ntb.h
111

ntb_tool (D18819) uses this. That's the reason its been added. Yes I agree that we need default handlers for other hardware drivers. But they need to be implemented in the respective hardware drivers right (or) is there any other approach we can use, as in Linux?

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
1179

Changed the PCI IDS to pci_device_table format and used PCI_PNP_INFO rather than MODULE_PNP_INFO.

rajfbsd_gmail.com marked an inline comment as done.

Addressed commented from @imp and @mav and submitted V3 patch. Also, added amd_ntb_mw_set_wc routine.

imp added a comment.Jan 18 2019, 5:28 PM

I just made a pass through for anything that caught my eye, and found a couple. I have not reviewed it to see if it matches the hardware spec for what it's trying to control.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
80

I'm not sure this is the right way to grab what you are grabbing. In the normal kernel build, all .h files are present in the current directory, or directly via -I lines. Having constructs like this leads to problems.

110

This is 0

112

This is 1

117

This a small positive integer of indeterminate value (which happens not to be 0 or 1 by chance).

So you have overlapping ranges here which is dangerous. There's other places as well that have this construct.

716

So *ntb doesn't ever change? If that's true, this is OK. If it changes, there's no locking in amd_ntb_hw_info_handler.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
8

Strictly speaking, 'All Rights Reserved.' is not needed and baggage from a by-gone era no longer with any real meaning. You may drop it entirely, if you'd like (it's up to the copyright holder to determine the form of their copyright noice, but the project has recently dropped the phrase from its model license).

mav added inline comments.Jan 18 2019, 6:15 PM
sys/dev/ntb/ntb.c
485–488

You should be able to attach two ntb_transport's by setting such loader tunable (if I understand properly what resources this AMD NTB has):
hint.ntb_hw.0.config="ntb_transport:1:6:8,ntb_transport"

To attach third there probably won't be enough scratchpads, since ntb_transport uses at least 6 in case of one memory window.

rajfbsd_gmail.com marked 3 inline comments as done.Jan 21 2019, 5:27 PM
In D18774#403351, @imp wrote:

I just made a pass through for anything that caught my eye, and found a couple. I have not reviewed it to see if it matches the hardware spec for what it's trying to control.

As I mentioned, the spec for this hardware is not yet made public. If you wish to see, you can refer the Linux driver for the same (drivers/ntb/hw/amd). Please let me know if you have any specific questions.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
80

Changed it to "dev/ntb/ntb.h". Other hardware drivers shall also be changed to be in sync. @mav , can we make this change for other hardware drivers as well.

117

I assume, the concern here is returning positive error value, so that value could wrongly be interpreted as a valid value. If so, that's a valid concern and it's been changed to negative value now.

716

Other than lnk_sta, rest of the ntb structure elements accessed are constants. Other values which could change, are being read from registers (using bus_space_read_X calls), which I believe is atomic and doesn't need any locking. "lnk_sta" is also changed only during link events. Please let me know if that needs a lock.

sys/dev/ntb/ntb_hw/ntb_hw_amd.h
8

Removed. Also removed, unwanted Copy-right from other vendor

rajfbsd_gmail.com marked 3 inline comments as done.

Submitting V4 patch, addressing comment from @imp

sys/dev/ntb/ntb.c
485–488

Hi Mav,

I tried this config in my setup. As expected, I could able to create only one transport instance and second transport instance fails (because of minimum number of spads). But with the created instance with minimum 6 spads, things are working good. Thanks for the info.

A question here, If two instance of ntb_transport is created properly, will loading if_ntb, create two network interfaces automatically with queues split equally (or based on the hint.ntb_transport.X.config)? or any manual config is needed to create two separate network interfaces?

Any more comments on this patch?

mav added inline comments.Jan 29 2019, 2:43 PM
sys/dev/ntb/ntb.c
485–488

Each if_ntb instance is independent and will automatically attach to its ntb_transport instance. If you look, you may also attach several if_ntb instances to one ntb_transport. Number of queues in each case will depend on number of doorbells it get after the split. It is all configurable, but defaults where possible should be sane.

mav added a comment.Jan 29 2019, 3:02 PM

I have no idea about the hardware, but I have suspicion that alignment constraints you may report incorrectly. Disregard if I am wrong.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
304

Just to be sure, does the hardware allows 1MB aligned windows for any window sizes? Not aligned to bigger window sizes? Neither Intel nor PLX can do that. For PLX I had to use A-LUT translation table to reduce this constraint.

356

I am curios why those checks are needed? If hardware has constraints on translation address, instead of failing here you should better report the constraint via align above. You may check if there is a reason, but I'd prefer it to be exception rather then rule.

385

I am curios why those checks are needed? If hardware has constraints on window size, instead of failing here you should better report the constraint via align_size above.

rajfbsd_gmail.com marked an inline comment as done.Feb 1 2019, 1:52 PM
rajfbsd_gmail.com added inline comments.
sys/dev/ntb/ntb_hw/ntb_hw_amd.c
304

@mav, you are right. HW doesn't allow 1M aligned. It allow a granularity range. The minimum granularity supported by this hardware is 4K aligned, which is what Linux provides. Made the same changes to return 4K aligned now in FreeBSD.

Just curious, when you say "Intel and PLX can't do that", I assume those hardware doesn't allow higher size aligned (like 1M)? Is that right? But in code, I see the comments like the alignment should be BAR sized. Also, A-LUT is there to support small size granularity. So, can you please clarify what you mean?

356

In AMD hardware, these registers are read-only when link is not enabled. So, when a client (ntb_transport) is loaded, first it calls set_trans to write these register and then does link enable. So these checks are there to capture the read-only scenario and return without proceeding furthur.

Also inline with this, needed a clarification. In client driver detach path, link disable happens before clear_trans(from free_mws). Is that valid? Shouldn't it be clearing the translate registers, then link disable and then free the mw resources? Because of this, we have the old values retained in some scenarios until the next set_trans happens to overwrite. Functionally, it is not disturbing. But looking for clarity on this.

385

This is for the same reason why we have the check for xlat registers above.

rajfbsd_gmail.com marked an inline comment as done.

Addressed comment from @mav regarding the alignment and explained the reason for those checks for xlat and limit registers.

mav added inline comments.Feb 1 2019, 2:45 PM
sys/dev/ntb/ntb_hw/ntb_hw_amd.c
304

Your change to 4K is actually opposite of what I expected. So I'd recommend to do double check that.

What I am saying is that both Intel and PLX NTB (without A-LUT) require translation address to be multiple of window (PCI BAR) size, i.e. NTB window of 1MB must be aligned to 1MB on both sides, and window of 64GB has to be aligned to 64GB, that is actually quite inconvenient. In case of PLX there is additional A-LUT table of 256 entries, which allow single window to be translated as 256 separate chunks, so that 64GB window may have alignment of 65536/256 => 256MB, that is much nicer.

356

You may see, that early window allocation was added to try avoid problems with allocation of significant piece of physically contiguous memory if it take a lot of system run time before the link first established. After the link is established, the code will check and try to adjust allocation, if needed. Neither Intel nor PLX had the problem with register writing you are describing, so the code may indeed require some tuning. Just consider that error reported there during pre-allocation will free the allocated memory window, that may give you back the problem I described later.

No new patch attached, but looking for few clarification on your previous comments.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
304

I will double check about this part and update about it.

356

Yes, we might run into the problem of not getting contiguous physical memory. But problem here is, in ntb_transport when link_up event happens and when the code tries to adjust allocation, it's seeing that (mw->xlat_size == xlat_size), and returning without programming the xlat and limit registers (set_trans will not be called). So, if that check in AMD driver needs to be removed. I could see two possible ways to handle this.

  1. I am not sure writing the xlat and limit registers once again would be a problem with Intel and PLX. If not, can we re-write those registers again when link up happens?
  2. In Linux, set_trans happens only during Link event. So, can we have the same approach here? Inline with this, may be a silly question. HW Link up happens right after module loading. So do we really face trouble in getting the physically contiguous memory during link events? But wondering about the use-case for the probability of this issue?

Any comments? or any better way we can go about here?

mav added inline comments.Feb 4 2019, 3:23 PM
sys/dev/ntb/ntb_hw/ntb_hw_amd.c
356

I think we could split memory allocation from setting translation address in ntb_transport, and still try to preallocate it on driver load, while set translation address only on link set up.

What worries me though, is that if as you say registers are not writable when link is down, then what happen when link is back up? Will they be reset to defaults with window being disabled until configured, or keep their values, and letting remote side access possibly repurposed memory?

What's about link up delay, the reason is simple -- the other node is physically not present, powered off, or at least not booted at the time. It may boot first time days or even months after the first one, when the first one has no physically contiguous memory any more.

sys/dev/ntb/ntb_hw/ntb_hw_amd.c
356

Splitting memory allocation and setting xlat address would be good.

To clarify, those registers are read-only only until link is enabled (not till link up). Link enable happens during driver load (but after ntb_set_mw to program the xlat and limit registers, which is the problem we have now). So, if splitting should be fine, we can allocate the memory before link enable and program the register after link enable. And we can have both memory reallocation and programming during link up (as we have currently), but just having them split. These registers will be RW until link is disabled (which happens during driver unload). So, link up/down will not disturb those registers or mw window access differently (unless mw size changes and needs reallocation during link up).

If we are ok with this approach, I shall make the necessary changes and submit the next patch. I am yet to get the clarity on the alignment part. Will try to get that asap.

@mav: I checked about that alignment requirements. Seems, the alignment should be on-par to the bar size (and not a fixed value) and the granularity range what I mentioned earlier for LUT based addressing (which is not implemented here). So, I need to change the alignment equal to the bar size. Do you have any comments on the previous discussed changes?

mav added a comment.Feb 8 2019, 2:18 PM

Yes, please make a patch to split allocation and programming.

Made necessary changes as per the previous discussion. Attached V6 patch.

@mav, Any more comments on this patch?

@mav: Sorry to disturb you on this again. But, please have a look at this when you get a chance and let me know if there any comments to address here.

@mav, Did you get a chance to look at this?

brooks accepted this revision as: Core Team.Mon, Mar 11, 6:58 PM
brooks added a subscriber: brooks.

This is dual licensed and fine per current policy.