Page MenuHomeFreeBSD

Add a driver for the SafeXcel EIP-97.

Authored by markj on Jun 23 2020, 7:51 PM.



This is a packet processing module found on the espressobin. This
commit adds an opencrypto driver for the crypto and hash engines in this
module; it can be programmed to do fairly arbitrary packet processing
and so the hardware interface is quite general compared to most other
opencrypto drivers that I've looked at.

This driver implements a number of
AES modes: CBC, CTR, and XTS, as well as AES-GCM and -CCM. The block
AES modes can be combined with HMAC, and the driver can be used to
compute plain SHA hashes as well. The EIP-197 is similar but not
supported at the moment for lack of hardware to test on. It implements
some other algorithms not supported here, particularly, chacha20 and
poly1305. The EIP-97 implements DES, 3DES and RC4 as well, but I didn't
really see a reason to include support for those.

Documentation for the hardware interface is not publicly available, so much
of the driver cannot be usefully reviewed. I'd appreciate any feedback on the
opencrypto integration though.

Test Plan

cryptocheck, geli tests, ipsec tests

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • Remove unused file.
  • Fix the device string in arm64 GENERIC.
  • Depend on fdt.
  • Fix style used in bitfield definitions.
linimon retitled this revision from Add a driver for the SafeXcel EIP-97. to new driver: add a driver for the SafeXcel EIP-97..Jun 26 2020, 7:58 AM
linimon retitled this revision from new driver: add a driver for the SafeXcel EIP-97. to Add a driver for the SafeXcel EIP-97..Jun 26 2020, 5:23 PM

Address a number of TODOs, add a man page.

Remove unintended change.

markj edited the test plan for this revision. (Show Details)
markj added reviewers: cem, delphij, jhb.
markj added a reviewer: loos. added inline comments.
288 ↗(On Diff #73912)

Technically, the EIP97 is supported by this driver. The EIP-197 hasn't been tested.

Correct the driver description string in the GENERIC config.

Did you intend to attach the module to the build?

7 ↗(On Diff #73918)

You also need ofw_bus_if.h here as it's included from ofw_bus.h and ofw_bus_subr.h

66 ↗(On Diff #73918)

When is this used? I don't see it in the Linux driver or bindings documentation.

markj marked 3 inline comments as done.
  • Update sys/modules Makefile
  • Include ofw_bus_if.h
  • Remove unused compatibility string
213 ↗(On Diff #73921)

I would perhaps invert this condition so it's 'If this specific case, use EBADMSG, else use EIO for generic errors'.

Also, what happens if you have multiple errors for a single crp? Right now you use the last one. That is probably ok since it probably never happens anyway?

1214 ↗(On Diff #73921)

This seems somewhat curious as you haven't added any children devices so bus_generic_attach(dev) should be a no-op? I would normally only use this in bus drivers and just return (0) in a leaf driver.

1250 ↗(On Diff #73921)

Likewise. (Plus, normally you want to detach children devices first so they can veto the attach early)

1527 ↗(On Diff #73921)

This is true of existing OCF consumers and it's ok to rely on that for now (your existing validation of csp_ivlen in your probesession hook will protect you if this changes). I may eventually add a new ioctl to /dev/crypto that permits setting the IV length for a given session. The only use case currently is to permit running all of the NIST KAT vectors for CCM and GCM. Some of the test vectors have IV lengths and tag lengths other than 12 and 16, respectively.

1936 ↗(On Diff #73921)

This doesn't appear to handle the recently added CSP_F_SEPARATE_AAD. For that, you wound want to append the equivalent of sglist_append(sg, crp->crp_aad, crp->crp_aad_length) when crp->crp_aad != NULL. IPsec will eventually use this for AES-GCM when using ESN.

1943 ↗(On Diff #73921)

To support CSP_F_SEPARATE_OUTPUT I think you would just need to adjust this to use segs from the output buffer and start at crp_payload_output_start when CRYPTO_HAS_OUTPUT_BUFFER() is true.

2209 ↗(On Diff #73921)

This can use ahash = crypto_auth_hash(csp) if you pass the csp to this routine.

2236 ↗(On Diff #73921)

You could perhaps use hmac_init_ipad and hmac_init_opad here. To do that, you would change setkey_hmac_digest to only copy the bits out of the ctx. If you combine that with passing in the csp from the caller this function becomes:

    union authctx ctx;

    ahash = crypto_auth_hash(csp);
    hmac_init_ipad(ahash, key, klen, &ctx);
    safexcel_setkey_hmac_digest(ahash, &ctx, ses->hmac_ipad);
    hmac_init_opad(ahash, key, klen, &ctx);
    safexcel_setkey_hmac_digest(ahash, &ctx, ses->hmac_opad);
    explicit_bzero(&ctx, sizeof(ctx));
477 ↗(On Diff #73921)

FWIW, I prefer to use bus_write_4((_sc)->sc_res, _off, _val) for new drivers and avoid the use of bus_space tags and handles directly when possible.

markj marked 6 inline comments as done.

Address most of jhb's comments, except those about separate AAD and
output buffers.

213 ↗(On Diff #73921)

I changed it so that we set EBADMSG only if AUTH_FAILED is set, which I should have been doing anyway. That error flag is set if digest verification fails, so it is the only flag we expect to see. Other error flags can be set as a result of programming errors in the driver (I triggered that during development), but I don't expect to see them otherwise.

1214 ↗(On Diff #73921)

Indeed, we can just return 0 here and below.

1527 ↗(On Diff #73921)

I was surprised that OCF isn't more flexible here. It shouldn't be a problem to generalize this code once the framework supports multiple IV lengths.

1943 ↗(On Diff #73921)

Hmm, how is this supposed to work with bus_dmamap_load_crp()? Don't I need a function that will load crp->crp_obuf?

2236 ↗(On Diff #73921)

Thanks, that is much nicer. It is a bit unfortunate though that we compute the same hash twice that way.

Can this device also be used to offload generic DMA copies? Just curious.

44 ↗(On Diff #73937)

We get this one already via sys/rman.h.

It seems odd to me, but we don't get machine/bus.h via sys/bus.h.

1297 ↗(On Diff #73937)

Is the OCF klen in XTS context actually correct, or is half?

2132–2135 ↗(On Diff #73937)

AES key scheduling and encryption in software is generally pretty slow. Is it slower to chain this operation through the device?

2181–2188 ↗(On Diff #73937)

/ sizeof(uint64_t)?

2373–2375 ↗(On Diff #73937)

Don't we guarantee the session is pre-zeroed? Or is that some other object. I might misremember.

2441–2443 ↗(On Diff #73937)

It does eliminate concurrency if the same session is used from many threads. I think GELI does this. Perhaps it shouldn't.

markj marked 3 inline comments as done.Jul 1 2020, 2:47 AM
In D25417#564186, @cem wrote:

Can this device also be used to offload generic DMA copies? Just curious.

It should be possible, yes. You'd just submit a request with no encryption or hash operations, using a DIRECTION instruction to copy the input data to the output stream.

1297 ↗(On Diff #73937)

Note that sess->klen is half of the klen given by OCF (see the XTS case in safexcel_setkey()). So here, sess->klen is the block cipher key length.

2132–2135 ↗(On Diff #73937)

In principle it is possible to offload this operation to the device, but typically this is going to be done as part of session initialization, so the cost will be amortized over all operations using the session. For IPSec the cost should be negligible.

2181–2188 ↗(On Diff #73937)


2373–2375 ↗(On Diff #73937)

You are right, crypto_newsession() allocates a zeroed session softc structure.

2441–2443 ↗(On Diff #73937)

Each GELI worker thread gets a unique session, from my reading of the code.

markj marked 2 inline comments as done.

Address cem's comments.

Some minor nits. I'm happy to see this go in.

190 ↗(On Diff #73940)

Should this be a KASSERT rather than a plain panic()?

452 ↗(On Diff #73940)

If you're going to keep this delay can you leave a comment as to why it's 100?

Also if you want to keep it can you #define it to something so it's not a magic number?

782 ↗(On Diff #73940)

Please give that magic number a #define'd name.

1292 ↗(On Diff #73940)

I tend to favor parens for this type of equation, even when not strictly necessary.

1530 ↗(On Diff #73940)

0x2 is...? Name it please.

1595 ↗(On Diff #73940)

More magic numbers.

1740 ↗(On Diff #73940)

Please name this even if it is 0.

markj marked 5 inline comments as done.

Address most of gnn's comments.

This revision is now accepted and ready to land.Jul 1 2020, 3:57 PM
1527 ↗(On Diff #73921)

Well, prior to the refactor the IV length was hardcoded. The refactor added csp_ivlen which makes it possible to support this. However, existing in-tree uses of GCM and CCM (IPsec, TLS) all use an IV length of 12. As I said, I need to extend /dev/crypto's interface to support setting the value of csp_ivlen. The current ioctl doesn't include an IV length so it always uses 12.

1943 ↗(On Diff #73921)

There is now a bus_dmamap_load_crp_buf(). bus_dmamap_load_crp(crp, ...) is now just bus_dmamap_load_crp_buf(&crp->crp_buf, ...). When you have an output buffer you would use bus_dmamap_load_crp_buf() with crp->crp_obuf, but you would need to deal with the fact that you have two sglists.

2236 ↗(On Diff #73921)

Similar to the ghash case above, this isn't typically a hot path. I could provide a hmac_init_pads() perhaps that would take in two contexts and to avoid some of the duplication, but you really only duplicate work in the case of a key longer than the digest size (which OCF didn't even support consistently prior to the refactor, so isn't common anyway). The Init method of these transforms is pretty simple, and you always have to do the hash on the xored key twice.

2132–2135 ↗(On Diff #73937)

While this is true and ccr(4) takes the same approach, it does occur to me that it would be nice to have a little wrapper around AESNI or the like that one could use for quick AES bits. It could fallback to the existing C software on platforms without some kind of MD software AES. Probably not very urgent though.

2441–2443 ↗(On Diff #73937)

It does, but it's also true that right now there is nothing in OCF that implicitly says sessions are single-threaded or can only have a single request in flight. For co-processors you probably want to support multiple sessions in flight (and in theory IPsec should be able to trigger that since multiple flows can share a single SA). Most drivers permit multiple in-flight operations for a single session (ccp(4) is the one driver I know that isn't safe to do this, and it is also missing the checks to defer requests to avoid doing the wrong thing in this scenario). I believe most drivers also permit concurrent operations though they may use a lock to single thread them.

The OCF interface looks fine to me. Supporting separate output and separate AAD is optional. For IPsec separate AAD will be more meaningful once the semi half folks land ESN support. (ESN uses a 64-bit sequence number where the upper 32-bits are implicit and never sent on the wire, so not in the mbuf.)

Separate output is currently only used by KTLS, though GELI should be updated to use it since right now GELI just does an explicit copy of the data to the single buffer.

1527 ↗(On Diff #73921)

Ok, I can update the driver and test once you're able to update OCF and cryptotest to handle alternate IV lengths.

1943 ↗(On Diff #73921)

I see, thanks. This still seems pretty awkward. To support both separate AAD and separate output buffers I need to use up to three dmamaps per request. As far as I can see, to load both crp->crp_buf and crp->crp_obuf I need two back-to-back calls to bus_dmamap_load_crp_buffer(), and I don't think they can share a dmamap_t. Similarly I need a third dmamap_load for separate AAD. I have it implemented and working, but it feels more complicated than it should be.

2236 ↗(On Diff #73921)

Right, the only reason I hadn't used hmac_init_ipad/opad() was that I didn't know about them.

2441–2443 ↗(On Diff #73937)

There is nothing here preventing having multiple in-flight operations on a single session, the mutex simply serializes setup of the request. The session just seemed like a natural serialization point, which is why I went with this scheme. I doubt it will matter much in practice since the driver is going to be used on SOCs with low core counts anyway, but I could revisit this if IPSec ends up triggering a lot of ring lock contention. The only other option I can see is to have a global or per-session round-robin iterator which gets incremented for each request.

jhb added inline comments.
1527 ↗(On Diff #73921)

Yes. In fact, I looked at that last week and concluded that I probably need to adjust this anyway to have the IV length be per-op instead of per-session. The only algorithms that support variable length IVs currently (GCM and CCM) don't have any cached session state that is contingent on the IV length, and the existing /dev/crypto ioctl has a per-op length. It may be a while before I get to this though. The 'mlen' for auth should likely also be per-op. I don't think any drivers in the tree depend on that being a constant in the session but always have to either cache the session value or go get it out of the csp for each operation. I will probably adjust both of those at the same time when I do it. No reason to wait for that for this driver though.

1943 ↗(On Diff #73921)

Three separate maps is what it would take, yes. You can avoid adding these for now since your use case is IPsec. At some point separate AAD will probably get used for ESN with GCM in IPsec. KTLS uses these features, but nothing else currently.

2441–2443 ↗(On Diff #73937)

Ok, that sounds sane. ccr(4) uses the same approach of locking just for setup / completion.

1943 ↗(On Diff #73921)

Ok, thanks for confirming that I'm not missing something. I will commit the required support as a follow-up change, since I'm working on porting this driver to stable/12 and having support for separate buffers will just complicate things a bit at this point.

This revision was automatically updated to reflect the committed changes.