Page MenuHomeFreeBSD

Fix dpcpu and vnet panics with complex types at the end of the section
ClosedPublic

Authored by bz on Oct 10 2018, 10:49 PM.

Details

Summary

Apply two linker scripts when linking i386 kernel modules.
The first is used to determine whether a kernel module has a
set_pcpu or a set_vnet section and record it in a new symbol.
The new symbol is reduced to either 1 (if the section exist) or
0 (if the section does not exist). The value of the symbol will be
used in the second linker script to enlarge (pad at the end) an already
existing section by 1 byte.

This is needed as the code generated on certain architectures for
non-simple-types, e.g., an array can generate an absolute relocation
on the edge (just outside) the section and thus will not be properly
relocated. Adding the 1 byte pad to the end of the section will ensure
that even absolute relocations of complex types will be inside the
section, if they are the last object in there and hence relocation will
work properly and avoid panics such as observed with carp.ko.

PR: 230857
With suggestions from: arichardson

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

bz created this revision.Oct 10 2018, 10:49 PM
bz added a comment.Oct 10 2018, 10:50 PM

Argh the kmod changes are the old ones. I'll update in a second

bz updated this revision to Diff 48994.Oct 10 2018, 10:55 PM

The kmod.mk changes can be reduced to two linker invocations instead of three,
also needing one less intermediate file to cleanup.

I wonder if this could be done in a single pass by adding a dummy symbol in the pcpu/vnet macros and using . = . + (DEFINED(pcpu_dummy) ? 1 : 0?

bz added a comment.Oct 11 2018, 10:06 AM

I wonder if this could be done in a single pass by adding a dummy symbol in the pcpu/vnet macros and using . = . + (DEFINED(pcpu_dummy) ? 1 : 0?

I had tried something similar with DEFINED() there before and the parser never liked me. Could be because of what I was testing to be defined.
Even then, the problem of selectively, automatically adding a dummy symbol, without redefining it, and without it being splattered all over the place .. if you have a good method for doing that, I'll give DEFINED() another shot.

It seems like both lld and newer ld.bfd (I tried 2.26) always create the the output section even if the only assignment is . = . + 0 or . = ..
I the simplest solution would be to add a SHORT(0xdead) at the end and teach the kernel linker to irgnore sections with size == 2 and contents == 0xdead.

bz updated this revision to Diff 49651.Oct 26 2018, 2:08 PM

Go with Alex's suggestion of fixed padding at the end as linkers are not
working reliable enough to simply extend an already existing section (see PR 232291).
This simplfis the linker script logic and adds some extra checkes to link_elf.c
for just i386.

bz updated this revision to Diff 49653.Oct 26 2018, 2:30 PM

Update the constants to sizeof linker-script-LONG (32bit) to have a smaller
possibility of accidentally matching.

bz added a comment.Oct 26 2018, 2:33 PM

Ok, I've tested this (well the shorter constants version mostly) with my two test modules (no longer panics, size of 1 symbol works), which a linker script which had a wrong padding, and with the matching modules. I am aware that 3rd party modules will be unhappy but after spending days and weeks to get to this, no better solution could be found. Does anyone want to review this quickly so it can go into HEAD and go to 12, to prevent panics there?

jhb added a comment.Oct 26 2018, 4:44 PM

The magic value seems kind of odd. Note that it is a valid KVA on i386 now that i386 was 4:4 (and even before then it was possible to move the top of UVA down to give more KVA, e.g. some folks ran 2:2 on i386 instead of 3:1). Do you have more details on the relocation check? Is it using < instead of <= or some such?

bz added a comment.Oct 26 2018, 10:17 PM
In D17512#378547, @jhb wrote:

The magic value seems kind of odd. Note that it is a valid KVA on i386 now that i386 was 4:4 (and even before then it was possible to move the top of UVA down to give more KVA, e.g. some folks ran 2:2 on i386 instead of 3:1). Do you have more details on the relocation check? Is it using < instead of <= or some such?

Most of the history is in PR 230857 and PR 232291.

The problem is that we end up with an absolute relocation on i386 which is on the top boundary (the __stop_set_* symbol) which is just out of bounds. Given it is out of bounds it doesn't get relocated, and with that the code will access nonsense memory and panic. This absolute relocation seems to only happen in certain cases of non-simple types, e.g., an array. The only real problem is if such a non-simple type is the last symbol inside the set.

We cannot change the relocation checks to become <= to on the right edge as the next address (the __stop_set_* symbol) could be (and often is) already part of of the next section and whatever is there shall not be relocated according to pcpu or vnet rules as that'd just be a disaster again.

We cannot easily control the order of symbols, as otherwise we'd make sure there's always a simple type at the end (if there is non possibly add one). But it's already getting complicated.
So the only way Alex and I came up with that would works is manually adding padding at the end of these (two) sections, so that even a non-simple type at the end of the section with an absolute relocation would still work.

My initial attempt was to use ". = . + <absolute symbol>;" in the linker script as, according to documentation, this would not create the section if it wasn't there if "absolute symbol" was 0; so my attempt was to set it to 1 if the set_<pcpu|vnet> sections were there, otherwise to 0. Sadly this seemed to work with an old bfd (at which time this was undocumented); newer bfd (which has this behaviour documented as working), as well as lld however always create the section (both are buggy). Alex had a similar case with "DEFINED" which I also tried. The conclusion from this was that while we can try to make this 0 impact for modules not containing pcpu or vnet sections, the linkers are not reliable/feature-rich/consistent enough to make this work nicely.

Hence the only solution was the simple one to slam some static padding at the end of the section. Now this could be a single random byte. We wouldn't care. We could enforce the section size having to be at least 1 and be done with it. Given alignment will almost always increase any 1 single unaligned byte at the end to sizeof(void *), even if not in the linker, the section memory allocators and copy functions will do a roundup2(x, sizeof(void *)), we can also use the "random padding junk", put something less likely to collide with real world values there and check that it is there or otherwise we know the module was not build according to our needs and current rules. This is where the "magic value" came from. This way we can alert the user that the module needs to be compiled properly. If you think of it as an address, it could be 0xdeadcode but we use that too often. If your concern is that it's an address that could possibly lead to some dereference (despite us not having a symbol there), then sure, we can change it to any less likely initialisation value for global variables.

I am fully aware that this solution sucks, especially for out of tree modules, and if you know a better way to prevent the panics, I'll be all ears and game to try it.

Please note that if we force the section creation there are normally no __start/stop_set_* symbols created or the minimum "padding only" size will be detected and no memory in these valuable regions will be used. Only the .ko files will have a few extra bytes in size.

Does this help?

Ideally we could place the padding outside of the section instead of inside and then just use <=. However, the linker can place any orphan section inbetween so there might be some relocations...
Since that is not possible this solution looks fine to me.

I think using 0xdeadbeef or similar would make it obvious in the objdump that those 4 bytes are padding but I don't think it matters.

sys/kern/link_elf.c
695 ↗(On Diff #49653)

Could this duplicated code be extracted to a static helper function that does nothing for non-i386?

bz added a comment.Oct 30 2018, 6:44 PM

Ideally we could place the padding outside of the section instead of inside and then just use <=. However, the linker can place any orphan section inbetween so there might be some relocations...
Since that is not possible this solution looks fine to me.

Also changing the logic for i386 with regards to what a boundary is and where the section ends compared to everywhere else sounds like a bad idea to me.

WRT to making it a function: I've actually thought harmonising the pcpu and vnet code which only differs very little; for the change I'd try to shuffle as little around as possible and keep things to as close as they are, so that in case of problems it'll be easier to deal with. We can do all kinds of things to all that code in 13 if needed.

bz abandoned this revision.Nov 2 2018, 1:56 PM

See PR 230857 for details.

lwhsu added a subscriber: lwhsu.May 22 2019, 6:27 PM

Tested this in Waterloo Hackathon 2019, this fixes bug230857 and bug238012.

bz reclaimed this revision.May 22 2019, 9:33 PM

Hi,

so this happened as well with ipsec.ko not just carp.ko so we need a solution to the problem which scales. We discussed it here a bit in Waterloo and I think the only question is as to whether to use the BYTE(1) option and just put a padding byte at the end, or as to whether put a larger magic number there and actually check that the module might be compiled with the right options?

The solution as-is above was tested to help both ipsec.ko and carp.ko and I'd love to commit it either way (with or without the extra checking) the next days as some of our regression test depend on ipsec.ko and we no longer compile it into the kernel.

Any advice/comment would be welcome.

/bz

jhb added a comment.May 22 2019, 10:56 PM

I still don't understand why i386 is special in this regard. Surely i386 isn't the only architecture to ever use this type of relocation?

bz added a comment.May 26 2019, 4:18 PM
In D17512#439294, @jhb wrote:

I still don't understand why i386 is special in this regard. Surely i386 isn't the only architecture to ever use this type of relocation?

It is the only one where we have identified this so far when we were looking last year and it is the only one where our users reported such panics.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 8 2019, 5:44 PM
This revision was automatically updated to reflect the committed changes.

Over in powerpc land, we've been chasing crashes with dpcpu and vnet as well. Our current prospective fix is https://reviews.freebsd.org/D20461 but we don't understand why it seems to fix it.

bz added a comment.Jun 13 2019, 7:14 PM

Over in powerpc land, we've been chasing crashes with dpcpu and vnet as well. Our current prospective fix is https://reviews.freebsd.org/D20461 but we don't understand why it seems to fix it.

That seems to be an entirely different problem fixed for other architectures (differently). I left a comment on that review with the reference to one of the original commit for that patch (for a different architecture).