It broke interrupt routing on some test hardware. This reverts commit eee6537665ae9830e831d7473dedd6b2cc81c2ea. This reverts commit 2bb16c6352494bf7aba92be700908ddf01e0c08b. Sponsored by: The FreeBSD Foundation
Details
- Reviewers
markj
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
It'd be useful to know where to start looking for the root cause. We have nexus_bind_intr() -> _intr_event_bind() -> ie_assign_cpu, and I think ie_assign_cpu will be intr_assign_cpu() on amd64, and the problem is likely in msi_assign_cpu()?
eee6537665ae9830e831d7473dedd6b2cc81c2ea and 2bb16c6352494bf7aba92be700908ddf01e0c08b are correct/required; this revert "works" because it truncates to CPU<256; adding cpu &= 0xff "works" equally well.
I'm unsure whether it works well as part of any existing differential, but it was suggested there should be a cpu_t type definition. That didn't fit as part of D36901, but does seem worthwhile. That would likely address the breakage which was found.
I think we should indeed have a cpu_t definition, but it would not help the issue here.
For context, I have a system with 512 CPUs. sys/dev/nvme/nvme_ctrlr.c does:
for (i = c = n = 0; i < ctrlr->num_io_queues; i++, c += n) {
qpair = &ctrlr->ioq[i];
/*
* Admin queue has ID=0. IO queues start at ID=1 -
* hence the 'i+1' here.
*/
qpair->id = i + 1;
if (ctrlr->num_io_queues > 1) {
/* Find number of CPUs served by this queue. */
for (n = 1; QP(ctrlr, c + n) == i; n++)
;
/* Shuffle multiple NVMe devices between CPUs. */
qpair->cpu = c + (device_get_unit(ctrlr->dev)+n/2) % n;
qpair->domain = pcpu_find(qpair->cpu)->pc_domain;
} else {
qpair->cpu = CPU_FFS(&cpuset_domain[ctrlr->domain]) - 1;
qpair->domain = ctrlr->domain;
}
/*
* For I/O queues, use the controller-wide max_xfer_size
* calculated in nvme_attach().
*/
error = nvme_qpair_construct(qpair, num_entries, num_trackers,
ctrlr);
if (error)
return (error);
/*
* Do not bother binding interrupts if we only have one I/O
* interrupt thread for this controller.
*/
if (ctrlr->num_io_queues > 1)
bus_bind_intr(ctrlr->dev, qpair->res, qpair->cpu);
}that is, it has its own logic to choose a CPU for each qpair if more than one, and it ends up choosing CPUs with APIC ID > 255 for some number of qpairs. prior to 2bb16c6352494bf7aba92be700908ddf01e0c08b (or with my cpu &= 0xff hack) we end up binding to a cpu < 256 despite nvme's intent, and interrupts work by being handled by a CPU other than what nvme requested.
With 2bb16c635249 we no longer truncate the value and then fail in msi_map if the APIC ID > 255 (if we don't have an iommu)