Diff between linux ntb_perf tool and the freebsd ported version. Created for @emaste.
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
./sys/dev/ntb/test/ntb_perf.c | ||
---|---|---|
291–295 | So this is a bit orthogonal to this change, but: Do we ever want to allow the compiler to reorder NTB SPAD or DB writes? We don't want the CPU to reorder some things but don't care about others; but that isn't a concern because the MMIO memory for SPAD and DB is mapped UC anyway, right? What I'm trying to ask is, would it be reasonable to just add barrier() (compiler barrier) annotations directly to ntb_*_{spad,db}_{read,write}() directly instead? |
./sys/dev/ntb/test/ntb_perf.c | ||
---|---|---|
291–295 | Okay, so marking a region as UC would also prevent compiler reordering? Or, does the memory barrier prevent the CPU from reordering instructions as well? I ask because I'm not very clear on the effects of changing the memory attributes of a region. In ntb_perf, when I changed the memory window region attribute to WC, there was a huge jump in performance (30x increase). I was not able to determine why that happened. Yes, adding the barrier to spad read/write commands will add more fences, so to speak and I don't see a problem with that. |
./sys/dev/ntb/test/ntb_perf.c | ||
---|---|---|
291–295 | No, compiler isn't aware of UC memory; compiler reordering is still a concern. Both mmiowb() (Linux) and barrier() (FreeBSD) are just compiler barriers. The other barrier primitives like wmb(), mb(), etc, are both compiler and CPU barriers. UC mode ("uncacheable") is really slow. All accesses are synchronous and cannot be combined or buffered at all. In contrast, all other modes are more relaxed and faster. WC is less synchronized than UC. I don't recall exact semantics but at least adjacent writes are allowed to be combined by the CPU before being sent out on the PCIe bus to hardware (and they can be asynchronous w.r.t. the instruction stream). (The barrier() primitive does not add any actual {m,l,s}fence instructions to generated code.) |
./sys/dev/ntb/test/ntb_perf.c | ||
---|---|---|
291–295 | That's the only thing that came to mind. I haven't looked at the patch thoroughly. | |
320 | ntb_spad_read returns int, not uint32_t. And there's not much point storing ret if you aren't going to check it. I'd probably just assert it. The only real error condition is EINVAL due to a bad index value, so I think you're ok just KASSERT()ing ret == 0. |