Page MenuHomeFreeBSD

Add ioat(4) driver and test tool
ClosedPublic

Authored by cem on Aug 21 2015, 10:47 PM.
Tags
None
Referenced Files
F103256110: D3456.diff
Fri, Nov 22, 5:07 PM
F103194585: D3456.id8175.diff
Fri, Nov 22, 3:03 AM
Unknown Object (File)
Wed, Nov 20, 12:30 AM
Unknown Object (File)
Mon, Nov 18, 3:22 PM
Unknown Object (File)
Mon, Nov 18, 9:37 AM
Unknown Object (File)
Mon, Nov 18, 4:44 AM
Unknown Object (File)
Mon, Nov 18, 12:55 AM
Unknown Object (File)
Sun, Nov 17, 11:12 PM
Subscribers

Details

Reviewers
markj
jimharris
Group Reviewers
manpages
Commits
rS287117: Import ioat(4) driver
Summary

ioat is also referred to as Crystal Beach DMA and is a Platform Storage
Extension (PSE) on E5 (Sandy Bridge) Intel Xeon processors.

This driver currently supports DMA descriptors only and is part of a
larger effort to upstream an interconnect between multiple systems using
the Non-Transparent Bridge (NTB) PSE.

Sponsored by: Intel
Submitted by: jimharris, Carl Delsey <carl.r.delsey@intel.com>

Additionally (cem@):

ioat(4): Clean-up closer to style(9); add PCI IDs

Handle errors; use common error numbers instead of -1.

Add PCIIDs for newer QD/IOAT hardware (Haswell, Ivybridge, Broadwell).

Don't panic if a copy cannot be satisfied in one request, just fail it.

Use sysctls with CTL_*TUN instead of TUNABLE_FETCH in various places.

Not tested, just compiled it.

Drop LOGGING ifdef entirely.

Style-cleanup tests / test code.

Conditionalize /dev/ioat_test presense on hw.ioat.enable_ioat_test
tunable. Compile it into the module unconditionally.

Update README to match reality.

Fix ioatcontrol Makefile to work with OBJDIR.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 215
Build 215: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

sys/modules/Makefile needs to be updated too, such that _ioat is only defined when MACHINE_CPUARCH == amd64. I forgot to mention this when I commented on files.amd64 and NOTES.

cem edited edge metadata.

Add ioat to sys/modules/Makefile cond'l on AMD64

markj edited edge metadata.

The change is ok with me provided that Jim's happy with the changes in D3457.

The new driver should have a man page, perhaps based on and instead of the readme in sys/modules/ioat.

This revision is now accepted and ready to land.Aug 23 2015, 5:14 PM
sys/dev/ioat/ioat_test.c
255

I think CTLTYPE_STRING should be CTLTYPE_INT here.

cem marked an inline comment as done.Aug 24 2015, 3:13 PM
cem added inline comments.
sys/dev/ioat/ioat_test.c
255

Yeah, that seems right to me. (Also A -> I as well.) I'll write some manual pages and adjust the patch, thanks!

cem edited edge metadata.
cem marked an inline comment as done.

Write ioat.4, ioatcontrol.8 manual pages; delete README

As a KBI check, KASSERT that we don't get any surprising flags in ioat_copy()
and ioat_null().

ioat_test.c: Use CTLTYPE_INT and "I".

This revision now requires review to proceed.Aug 24 2015, 5:04 PM

Both manual pages are igor-clean.

share/man/man4/ioat.4
2

share/man/man4/Makefile needs to be updated to include this man page. Suggest only installing it on amd64.

I would not add ioatcontrol.8 to share/man/man8/Makefile, since ioatcontrol is not part of the standard build.

share/man/man4/ioat.4
40

Since you added ioat to sys/modules/Makefile, ioat.ko will be built unless MODULES_OVERRIDE is set. If a user is setting MODULES_OVERRIDE, then they need to modify MODULES_OVERRIDE for _any_ kernel module that they want to build, which implies that every single driver man page should mention MODULES_OVERRIDE (and none of them do). For consistency, I would just remove the reference to it.

If you disagree and want to keep it, it should also be noted that MODULES_OVERRIDE can be set in the kernel configuration file with "makeoptions", as a (very handy) alternative to make.conf.

84

"on a some"

85

Could you qualify "newer"?

94

Trigger an interrupt under what condition?

96

What does "grab" mean? Does it give the caller exclusive access to the channel?

101

What's a submission lock?

share/man/man4/ioat.4
2

Right. Will do.

Yes, I noticed other tools/tools/*/*.8 exist but aren't hooked up to the build (e.g. cxgbetool). I'll leave that be.

40

Oh, ok. Will drop that part then.

85

Sandybridge+. I will add a blurb to the page.

94

Successful completion of the null operation with DMA_INT_EN flagged. I'll note the flag.

96

No; I was trying to avoid 'acquire' because that has some baggage too.

Really the caller just uses ioat_get_dmaengine() to look up the bus_dmaengine_t for a given channel index, which is just a number. There is no refcount or exclusive access semantics here. I'll reword.

101

Exclusive access to submit operations on the given channel / DMA engine.

share/man/man4/ioat.4
85

Tylersburg and Jasper Forest are pre-Sandy Bridge. (On Tylersburg, it was actually in the chipset, not the CPU.) Also, these are only on Xeon-based CPUs..

Maybe just simplify this to "on some Intel server platforms" and avoid the extra complexity?

cem marked 7 inline comments as done.
cem edited edge metadata.

man4/Makefile: Add ioat.4 on amd64

ioat.4: Reword, phrasing and content improvements per MarkJ

cem edited edge metadata.

ioat.4: Avoid complexity of fully specifying I/OAT availability

This revision looks good to me.

cem marked an inline comment as done.Aug 24 2015, 6:57 PM

Mark some feedback as done.

Sidenote: It'd be really nice if Intel listed I/OAT DMA extensions in ARK, but they don't seem to. As a consumer it's hard to find what hardware does and does not support this stuff.

share/man/man4/ioat.4
85

Can do.

cem marked an inline comment as done.Aug 24 2015, 6:57 PM

This revision looks good to me.

Can you mark it 'Accepted'? In the drop down above the comment box, there should be an "Accept Revision" option. Thanks!

jimharris edited edge metadata.
This revision is now accepted and ready to land.Aug 24 2015, 6:58 PM
markj edited edge metadata.
This revision was automatically updated to reflect the committed changes.
ngie added inline comments.
head/sys/conf/files.amd64
209 ↗(On Diff #8178)

Why not create a separate driver for this?

head/sys/dev/ioat/ioat.c
333 ↗(On Diff #8178)

Why? A link to an Intel Programmer's Guide would be nice

737 ↗(On Diff #8178)

This can fail

739 ↗(On Diff #8178)

This can fail

741 ↗(On Diff #8178)

retry == FALSE

801 ↗(On Diff #8178)

Bad style (not explicitly checking types). Should this be checking 0 or NULL or false or FALSE or ...?

884–885 ↗(On Diff #8178)

This can fail

891–892 ↗(On Diff #8178)

This can also fail

895–896 ↗(On Diff #8178)

This can fail

927 ↗(On Diff #8178)

Why? A link to an Intel programmer's guide would be helpful

965 ↗(On Diff #8178)

Why 8? At the very least this deserves a well-named #define.

993–1009 ↗(On Diff #8178)

I don't like this API because it defeats the purpose of the logging infrastructure in FreeBSD.

  • %d for times :(..? why not %ju with appropriate (uintmax_t) casting?

tv_sec and tv_usec are suseconds_t, which is typedef'ed to long, so at the very least these shouldn't be truncated by casting them to (int).

  • Also, why do you need to get the time in the logs? syslog does a pretty darn good job at grabbing the time right now... IIRC grabbing the time (even from the kernel) was expensive, and this defeats the "X seen N times" message squelching from syslogd, which means that it will spam /var/log/messages, etc.
head/sys/dev/ioat/ioat_internal.h
124 ↗(On Diff #8178)

Do any of these structures need to be __packed?

head/sys/dev/ioat/ioat_test.c
120 ↗(On Diff #8178)

"miscompare" sounds awkward

167 ↗(On Diff #8178)

Will tx == NULL always be the result of an ENOMEM condition?

197–209 ↗(On Diff #8178)

Can't these be omitted?

246–247 ↗(On Diff #8178)

What if make_dev fails? It would be helpful to log the failure, instead of silently failing...

254–255 ↗(On Diff #8178)

This does not seem MPSAFE (what if 2 things come in at the same time and try to create the cdev)?

head/tools/tools/ioat/Makefile
3 ↗(On Diff #8178)

Usually stuff bitrots in tools/ . It would be a good idea to put it in usr.sbin, then conditionally build it on amd64.

4 ↗(On Diff #8178)

SRCS= ${PROG}.c is implied

head/tools/tools/ioat/ioatcontrol.c
52 ↗(On Diff #8178)

What if I pass in -1?

57 ↗(On Diff #8178)

Again, what if I pass in -1?

65 ↗(On Diff #8178)

It would be nice to err(3) on failure... especially because invalid data might cause t.status to blow up at the end if it's bogus.

I'm ok with a little bikeshedding. But if you're going to review changes in the future, I'd request that you

  1. do it before the revision is committed, and
  2. don't comment on APIs you haven't taken the time to understand (mtx_lock/unlock, ioctl, sysctl, devfs, ...)

There is some good feedback in there! But it's hidden in all of the noise.

head/sys/conf/files.amd64
209 ↗(On Diff #8178)

It was discussed earlier. The summary is it seems like an unreasonable amount of extra work for limited benefit.

head/sys/dev/ioat/ioat.c
737 ↗(On Diff #8178)

No it can't?

void
mtx_lock(struct mtx *mutex);
739 ↗(On Diff #8178)

Again, nope.

void
mtx_unlock(struct mtx *mutex);
741 ↗(On Diff #8178)

Again, nope.

retry is a boolean_t. Using boolean NOT on a boolean is perfectly reasonable.

801 ↗(On Diff #8178)

Good catch.

895–896 ↗(On Diff #8178)

No, it can't.

The callout_reset() and callout_schedule() function families return non-
zero if the callout was pending before the new function invocation was
scheduled.
993–1009 ↗(On Diff #8178)

I would be totally okay with a patch reworking the logging handling. This revision has sailed, though.

tv_sec and tv_usec are suseconds_t, which is typedef'ed to long, so at the very least these shouldn't be truncated by casting them to (int).

Fortunately, there are only 1 million microseconds in a second, which is fully representable in int, so it's perfectly fine.

head/sys/dev/ioat/ioat_internal.h
124 ↗(On Diff #8178)

It doesn't look like they will have any padding bytes without the keyword. So, I guess not.

head/sys/dev/ioat/ioat_test.c
167 ↗(On Diff #8178)

Yes. The (test) routine is defined about 50 lines above this comment and it's pretty clear that NULL only results from memory allocation failures.

197–209 ↗(On Diff #8178)

No. The devfs code invokes d_open() unconditionally. I looked.

246–247 ↗(On Diff #8178)

It cannot fail. The kernel panics if it fails.

254–255 ↗(On Diff #8178)

That's why it's not marked MPSAFE. sysctl will take Giant before invoking this function.

head/tools/tools/ioat/Makefile
3 ↗(On Diff #8178)

Bitrot is acceptable. This does not belong in usr.sbin.

head/tools/tools/ioat/ioatcontrol.c
52 ↗(On Diff #8178)

channel_index is unsigned.

57 ↗(On Diff #8178)

Also unsigned.

65 ↗(On Diff #8178)

As written, the ioctl cannot fail.

The compiler cannot "blow up" t.status because it cannot know that it is not modified by ioctl(2).

head/sys/dev/ioat/ioat.c
737 ↗(On Diff #8178)

Annoyingly, the C11 thread lib defines a mtx_lock(3) which can indeed fail.

cem marked 3 inline comments as done.Aug 25 2015, 4:31 PM