Page MenuHomeFreeBSD

nvme: Revert to using the old API.
Needs ReviewPublic

Authored by imp on Mar 15 2018, 6:04 PM.

Details

Summary

Bit masks are fine, except they weren't the original API. Extenral
software depends on the old API. There's no compelling need for the
new API, so revert to the old API with the proper #ifdefs for little
vs big endian operation on these fields.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 15587
Build 15623: arc lint + arc unit

Event Timeline

imp created this revision.Mar 15 2018, 6:04 PM
jimharris requested changes to this revision.Mar 15 2018, 6:17 PM

Changes look fine, but you should probably remove the commented out code.

This revision now requires changes to proceed.Mar 15 2018, 6:17 PM
imp added a comment.Mar 15 2018, 6:18 PM

Changes look fine, but you should probably remove the commented out code.

You're right, of course. I left it in mostly to make sure I'd done things right, but it is of no value beyond this review.

mav added inline comments.Mar 15 2018, 6:35 PM
sys/dev/nvme/nvme.h
491

Is there places where this used as uint16_t? Why the asymmetry between request and response representation style?

imp updated this revision to Diff 40327.Mar 15 2018, 6:46 PM

review comments so far

imp added inline comments.Mar 15 2018, 6:47 PM
sys/dev/nvme/nvme.h
491

I don't think so, and the asymmetry should be fixed.

chuck added a comment.Mar 15 2018, 6:50 PM

These changes look good to me.
Should the register definitions and other structures which previously used bit fields get similar changes?
Ditto for sbin/nvmecontrol/*?

imp added a comment.Mar 15 2018, 6:53 PM

These changes look good to me.
Should the register definitions and other structures which previously used bit fields get similar changes?

I thinks so, but there are caveats. There are several fields that take up 16 or 32 bits that this technique won't fix, and I'm torn. Ideally, I'd like this just to be for backward compat only, but some choices in the implementation make that tricky so I'm unsure. I'd like to have a discussion on the best way forward to ensure we don't have regressions in big endian support, while at the same time producing something we can MFC w/o an API/ABI hit.

Ditto for sbin/nvmecontrol/*?

I haven't checked there, but I should. there might be a camcontrol impact as well.

cy added a subscriber: cy.Mar 16 2018, 12:39 AM
wma edited reviewers, added: wma, mst_semihalf.com; removed: wma_semihalf.com.Mar 16 2018, 8:16 AM
wma accepted this revision.Mar 16 2018, 8:50 AM

Here is my two cents.

Frist of all, bitfields are fine, but not for being used as an abstraction layer of system registers for two reasons:

  1. endiannes is not maintained then,
  2. where exactly compiler puts bits inside a byte (at MSB or LSB side or even randomly) is implementation defined.

As for the second reason, gcc or clang starts from LSB as we expect here, but in my opinion it is not something we should rely on. In case we only transfer data between kernel and userspace there is no issue - due to how he fbsd is organized both are using the same compiler. The problem arises if we need to guarantee an exact place where bits are located inside a byte, for example in this case.

Rith now I'm confused, because I dislike bitfields, dislike current mask&shift approach as being ugly, and also dislike keeping API unchanged as IMO it shouldn't contain registers abstracted as bitfields in the first place.
This change tries to keep the compatibility for userspace (which is good!), but requires anonymous unions that might cause compilers to complain if some -pedantic or old cc87/prehistoric options are chosen.

I do like this patch however, and I'm giving it my thumbs up, but as a follow up we need to discuss what to do with all these headers that left.

My proposals:

  1. If we want to honour all C-specification nuances, we should consider letting API to change and fix ports to be compatible,. as the mask&offset access type is the only one 'generic' option.
  2. If we want to make it working using the least effort, this approach with adding anonymous unions is a path to follow (ugh..).
  3. We ignore C specs and rely on compiler good will to put bits in a logical way and revert to using bitfields. To allow BE/LE operation, header should be somehow generated from a meta file (I really don't like idea to have two versions of the same structures chosen by an ifdef BE/LE as it requires maintenance of both) and all places which uses u16 and u32 within the driver should be acompanied with htole/whatever macros.
imp added a comment.Mar 16 2018, 5:28 PM

Although the standard may not define the order, the ABI does. And the TCP/IP stack already requires that bits be packed how I've assume they are packed, so there's no new requirement here. I find the 'C allows anything' argument unpersuasive because in practice it doesn't really allow it, and the compiler is constrained by ABIs and things like the BSD stack depends on it. Having looked closely, though, it appears that it relies on it only for single byte quantities, which gives a smaller level of confidence than I'd originally thought (I know it used to do multiple-byte as bitfields, for example, but that was a while ago).

Second, we've already published these interfaces. We have to maintain them. We can't arbitrarily change them on a whim. We can deprecate them and use totally new interfaces, but we can't just hack and slash through the old ones like was done here. That's where most of my heart-burn comes from.

I'm OK with totally new interfaces, just not the half-new / half-old ones we have now that require code changes. I'm also good with them only working on little endian if it comes to that, since that's the only release we have to support. If we do have totally new / totally old separation, we can define the totally old to be perfect type-puns so that we can MFC this and -stable branches work. If we do the totally old / totally new approach, we can ditch the crazy anonymous unions / structs as well.

So maybe the right path forward is to publish type-punned old names on x86 only and to move to entirely new names for structures in the driver and FreeBSD-supplied user-land. Part of the reason for putting this up was to judge where we should go, not that this must be the path forward.

sys/dev/nvme/nvme.h
135

These would work only on little-endian

489

This won't work on big endian.

sys/dev/nvme/nvme_ctrlr.c
987

Won't work on big-endian.

sys/dev/nvme/nvme_ns.c
276

Again, broken on big-endian.

sys/dev/nvme/nvme_qpair.c
442

See above

In D14703#309190, @imp wrote:

Having looked closely, though, it appears that it relies on it only for single byte quantities

This is the reason why such unification (of field access) is impossible without breaking big-endian support.
I just want to make this clear to everybody. Sometimes hardware people are nice to the software folks and distribute fields of different meanings 'evenly' throughout registers and other structures when designing protocols, standards and new hardware. NVMe is not such a case.
When a field spans more than one byte, it becomes impossible to support both endianess with the simple approach of "reverse fields and ifdef".

Looking at memory of a bitfield struct as a string of bits, the bits belonging to a single field would be split in two parts and separated from each other by a gap filled by bits from a neighboring field in the struct. These two disjointed parts would then have to be glued together on access. Example of such a field in our code is nvme_status->sc, which is one of the most commonly used data fields in NVMe. In this patch, nvme_command could possibly be #ifdef'd, while nvme_status cannot.

It is indeed unfortunate that this realization has come so late in the development of the driver. Had it been written in a more future-proof and maintainable way, we wouldn't have such problems now.

IMHO the way to go should be moving to the new API in userland (e.g. nvmecontrol and smartmontools have mostly been fixed I think) if possible. Or perhaps there exists some clever C hack which I had not thought about, which could be used to unify the two APIs.

imp added a comment.Mar 16 2018, 9:10 PM
In D14703#309190, @imp wrote:

Having looked closely, though, it appears that it relies on it only for single byte quantities

This is the reason why such unification (of field access) is impossible without breaking big-endian support.
I just want to make this clear to everybody. Sometimes hardware people are nice to the software folks and distribute fields of different meanings 'evenly' throughout registers and other structures when designing protocols, standards and new hardware. NVMe is not such a case.
When a field spans more than one byte, it becomes impossible to support both endianess with the simple approach of "reverse fields and ifdef".

Right.

Looking at memory of a bitfield struct as a string of bits, the bits belonging to a single field would be split in two parts and separated from each other by a gap filled by bits from a neighboring field in the struct. These two disjointed parts would then have to be glued together on access. Example of such a field in our code is nvme_status->sc, which is one of the most commonly used data fields in NVMe. In this patch, nvme_command could possibly be #ifdef'd, while nvme_status cannot.

OK. It can for little endian only. And we have to have that interface to userland, at least in a compat fashion.

It is indeed unfortunate that this realization has come so late in the development of the driver. Had it been written in a more future-proof and maintainable way, we wouldn't have such problems now.

True, but not relevant.

IMHO the way to go should be moving to the new API in userland (e.g. nvmecontrol and smartmontools have mostly been fixed I think) if possible. Or perhaps there exists some clever C hack which I had not thought about, which could be used to unify the two APIs.

They haven't been fixed. Don't think of it that way. We've done hacks so they compile, but that's not the same as fixing them. There's many other programs that use this interface, some not public, and if we want to MFC changes to the driver (which we do), then we'll need to do it in an API-ly compatible way. What's there now isn't API-ly compatible because it breaks the old code. Sure, we can hack the old code we know about, but that doesn't mean the API isn't broken still.

The API must exist through all of 11.x. The current hacks can't be merged because they break the API. They do not create a completely new API, but break the old one in a way that's not compatible. To have a new API, it must be able to co-exist with the old API, at least on little endian platforms. What we have today isn't that. We have to move to that. Basically, we have to have new structure names for anything that we now have to access via the #define stuff. And the old structures must type pun to the new ones so the old API can continue to be used through the life of 11. Or we can never merge the driver.

This is indeed unfortunate, but it's not a situation where we can just say "too bad, fix your code" and move on. This code has to be fixed to restore the old API and to create a new API. If the hacks with bit fields isn't going to work, we have to restore the old API, define a new API, use the new API in the kernel and nvmecontrol and allow the old code to migrate to the new API as needed. We can and should make the old API unavailable in big-endian environments since we have no obligation to it there.

So I'll revise things for this, but again, I'm not at all happy with how rushed things were into the tree, nor that there was insufficient time to raise these issues before they became a problem.

And we'll have to merge the driver to 11 since there's a bunch of new work that needs to be in 11 with name spaces and similar.

linimon retitled this revision from Revert to using the old API. to nvme: Revert to using the old API..Jul 19 2018, 2:39 AM
chuck added a comment.Aug 22 2018, 4:33 AM

Alternative approach outlined in D16404 committed to svn