Page MenuHomeFreeBSD

Define PNP info after defining driver modules
ClosedPublic

Authored by markj on Jan 20 2021, 7:47 PM.

Details

Summary

PNP info has the unfortunate requirement that it must follow the
associated module definition in the module metadata linker set.
Otherwise devmatch can segfault while processing the linker hints file
since kldxref maintains the order in the linker set.

I found a number of drivers that violate this requirement after
debugging a devmatch crash on a 32-bit ARM system. We really need a
better solution than this, but let's try to work around the problem for
13.0 at least.

Test Plan

None yet.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

markj requested review of this revision.Jan 20 2021, 7:47 PM
markj added a reviewer: imp.

I agree that something better is needed here...
But unless we change the PNP macros to require driver name and bus name, there's no way to easily detect this at compile time.
At run time, we could emit a warning for this condition... At the time I wrote devmatch, no driver violated this rule, so I never saw the core dump....
This is fine enough for 13.0, though.

This revision is now accepted and ready to land.Jan 20 2021, 8:47 PM

To expand a little: devmatch shouldn't dump core... I don't see how, but I've not seen the core traceback yet...

devmatch parses the linker.hints which kldxref creates. kldxref just copies the different modules over. By the time devmatch gets involved, it's too late to know where something came from w/o this ordering requirement.
kldxref knows and could issue a warning if this precondition is violated. It's the only place that knows in the current setup.
Making the pnp marcos 'know' would require both the bus and driver name so we could toss an a reference to an extern that should already be defined to give a compile time error. This would be an invasive change, but likely a doable one.

In D28260#632046, @imp wrote:

To expand a little: devmatch shouldn't dump core... I don't see how, but I've not seen the core traceback yet...

Ah, printf -> puts is causing a NULL dereference... These changes will definitely avoid that...

Wait, kldxref should be ordering these correctly since rS348309. Why isn't it?

In D28260#632585, @cem wrote:

Wait, kldxref should be ordering these correctly since rS348309. Why isn't it?

Probably because I hit this issue on stable/12 and didn't notice that kldxref was fixed.

In D28260#632585, @cem wrote:

Wait, kldxref should be ordering these correctly since rS348309. Why isn't it?

Probably because I hit this issue on stable/12 and didn't notice that kldxref was fixed.

Hmm, the kldxref change doesn't fully fix the problem as I understand it (not that my commit does). The real problem is that the pnp info structure doesn't contain the module name, so devmatch(8) doesn't know which PNP info goes with which module. The kldxref change at least ensures that devmatch won't try to process a PNP info definition without having seen an MDT_MODULE record.

In D28260#632585, @cem wrote:

Wait, kldxref should be ordering these correctly since rS348309. Why isn't it?

Probably because I hit this issue on stable/12 and didn't notice that kldxref was fixed.

Hmm, the kldxref change doesn't fully fix the problem as I understand it (not that my commit does). The real problem is that the pnp info structure doesn't contain the module name, so devmatch(8) doesn't know which PNP info goes with which module. The kldxref change at least ensures that devmatch won't try to process a PNP info definition without having seen an MDT_MODULE record.

If there is no module information, there is no .ko and then PNP info doesn't matter...

Hmm, the kldxref change doesn't fully fix the problem as I understand it (not that my commit does). The real problem is that the pnp info structure doesn't contain the module name, so devmatch(8) doesn't know which PNP info goes with which module. The kldxref change at least ensures that devmatch won't try to process a PNP info definition without having seen an MDT_MODULE record.

MDT_PNP_INFO always corresponds to the most recently traversed MDT_MODULE. That's the ordering property guaranteed by the kldxref change. I don't know what crash or bug you hit in devmatch exactly. Do you have a stack or short summary? Does it reproduce on CURRENT or with kldxref otherwise fixed?

The reason we made the kldxref change was to generally address a specific problem this revision also seems to try to address — the ordering between MDT_MODULE and MDT_PNP_INFO. So I am surprised that we still need to shuffle things around in the code. If we're missing something in kldxref or devmatch, let's fix it there. (As the kldxref commit mentions, there is no guarantee the compiler preserves the source code ordering of static data values.)

In D28260#632737, @cem wrote:

Hmm, the kldxref change doesn't fully fix the problem as I understand it (not that my commit does). The real problem is that the pnp info structure doesn't contain the module name, so devmatch(8) doesn't know which PNP info goes with which module. The kldxref change at least ensures that devmatch won't try to process a PNP info definition without having seen an MDT_MODULE record.

MDT_PNP_INFO always corresponds to the most recently traversed MDT_MODULE. That's the ordering property guaranteed by the kldxref change. I don't know what crash or bug you hit in devmatch exactly. Do you have a stack or short summary? Does it reproduce on CURRENT or with kldxref otherwise fixed?

The reason we made the kldxref change was to generally address a specific problem this revision also seems to try to address — the ordering between MDT_MODULE and MDT_PNP_INFO. So I am surprised that we still need to shuffle things around in the code. If we're missing something in kldxref or devmatch, let's fix it there. (As the kldxref commit mentions, there is no guarantee the compiler preserves the source code ordering of static data values.)

I think that the kldxref commit was simply not MFC'd. This was seen on stable/12. I think we can leave this commit in (it doesn't hurt anything, and there's no benefit to reverting), but refrain from others in the future. I honestly had missed the kldxref commit (or forgot about it) and it's quite similar to what I'd wanted to do...

I think maybe we should revert the change, if it is truly unfounded and was only observed on 12. If this wasn't ever a problem on CURRENT, the commit message rationale is wrong and misleading. Reverting is the canonical way to correct that kind of thing.

I agree it's functionally a no-op code-wise; it's the source control metadata that matters here.

I think maybe we should revert the change, if it is truly unfounded and was only observed on 12. If this wasn't ever a problem on CURRENT, the commit message rationale is wrong and misleading. Reverting is the canonical way to correct that kind of thing.

I agree, I will revert this change and MFC the kldxref commit to 12 when I get a chance.

I don't know what crash or bug you hit in devmatch exactly. Do you have a stack or short summary?

The crash was in devmatch's search_hints(), when it processed a MDT_PNP_INFO record without having seen a MDT_MODULE record.