Page MenuHomeFreeBSD

x86: restore intr_bind
AbandonedPublic

Authored by emaste on Aug 14 2023, 2:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, May 12, 5:50 AM
Unknown Object (File)
Mon, May 6, 1:32 AM
Unknown Object (File)
Wed, May 1, 11:37 PM
Unknown Object (File)
Wed, May 1, 11:34 PM
Unknown Object (File)
Wed, May 1, 8:19 PM
Unknown Object (File)
Mon, Apr 22, 7:07 PM
Unknown Object (File)
Mar 14 2024, 4:01 PM
Unknown Object (File)
Feb 15 2024, 5:18 AM

Details

Reviewers
markj
Summary
It broke interrupt routing on some test hardware.

This reverts commit eee6537665ae9830e831d7473dedd6b2cc81c2ea.
This reverts commit 2bb16c6352494bf7aba92be700908ddf01e0c08b.

Sponsored by:   The FreeBSD Foundation

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

emaste created this revision.

It'd be useful to know where to start looking for the root cause. We have nexus_bind_intr() -> _intr_event_bind() -> ie_assign_cpu, and I think ie_assign_cpu will be intr_assign_cpu() on amd64, and the problem is likely in msi_assign_cpu()?

eee6537665ae9830e831d7473dedd6b2cc81c2ea and 2bb16c6352494bf7aba92be700908ddf01e0c08b are correct/required; this revert "works" because it truncates to CPU<256; adding cpu &= 0xff "works" equally well.

I'm unsure whether it works well as part of any existing differential, but it was suggested there should be a cpu_t type definition. That didn't fit as part of D36901, but does seem worthwhile. That would likely address the breakage which was found.

I'm unsure whether it works well as part of any existing differential, but it was suggested there should be a cpu_t type definition. That didn't fit as part of D36901, but does seem worthwhile. That would likely address the breakage which was found.

I think we should indeed have a cpu_t definition, but it would not help the issue here.

For context, I have a system with 512 CPUs. sys/dev/nvme/nvme_ctrlr.c does:

for (i = c = n = 0; i < ctrlr->num_io_queues; i++, c += n) {
        qpair = &ctrlr->ioq[i];

        /*
         * Admin queue has ID=0. IO queues start at ID=1 -
         *  hence the 'i+1' here.
         */
        qpair->id = i + 1;
        if (ctrlr->num_io_queues > 1) {
                /* Find number of CPUs served by this queue. */
                for (n = 1; QP(ctrlr, c + n) == i; n++)
                        ;
                /* Shuffle multiple NVMe devices between CPUs. */
                qpair->cpu = c + (device_get_unit(ctrlr->dev)+n/2) % n;
                qpair->domain = pcpu_find(qpair->cpu)->pc_domain;
        } else {
                qpair->cpu = CPU_FFS(&cpuset_domain[ctrlr->domain]) - 1;
                qpair->domain = ctrlr->domain;
        }

        /*
         * For I/O queues, use the controller-wide max_xfer_size
         *  calculated in nvme_attach().
         */
        error = nvme_qpair_construct(qpair, num_entries, num_trackers,
            ctrlr);
        if (error)
                return (error);

        /*
         * Do not bother binding interrupts if we only have one I/O
         *  interrupt thread for this controller.
         */
        if (ctrlr->num_io_queues > 1)
                bus_bind_intr(ctrlr->dev, qpair->res, qpair->cpu);
}

that is, it has its own logic to choose a CPU for each qpair if more than one, and it ends up choosing CPUs with APIC ID > 255 for some number of qpairs. prior to 2bb16c6352494bf7aba92be700908ddf01e0c08b (or with my cpu &= 0xff hack) we end up binding to a cpu < 256 despite nvme's intent, and interrupts work by being handled by a CPU other than what nvme requested.

With 2bb16c635249 we no longer truncate the value and then fail in msi_map if the APIC ID > 255 (if we don't have an iommu)