Page MenuHomeFreeBSD

mips32: move support for temporary mappings above KSEG0 to per-CPU data
Needs ReviewPublic

Authored by jah on Dec 18 2018, 5:50 AM.

Details

Summary

This is derived from similar work done in r310481 for i386 and r312610 for
armv6/armv7. Additionally, use a critical section to keep the thread
pinned for per-CPU operations instead of completely disabling local interrupts.

So far, this has been tested on a qemu MALTA instance with 1GB of RAM.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 21624
Build 20917: arc lint + arc unit

Event Timeline

jah created this revision.Dec 18 2018, 5:50 AM
jah updated this revision to Diff 52136.Dec 18 2018, 5:53 AM

Remove unintended README change

jah added inline comments.
sys/mips/mips/pmap.c
205

If mips32 + SMP ends up not being a thing anymore due to the recent 64-bit atomic debate, this function can just be removed.

Otherwise, is there a good way for me to exercise that config? I'm open to buying real HW...

jah added inline comments.Dec 18 2018, 6:02 AM
sys/mips/mips/pmap.c
2653

One thing I've been curious about looking thru the mips pmap is the need to clean the cache every time we remove a mapping. It seems like MIPS d-caches are always VIPT these days, so I'd naively expect the physical tag to allow eviction to work even in the absence of a valid mapping.

Is this just an architectural requirement I'm not familiar with, or is it a mitigation against cache aliasing, or something else entirely?

I ask about the aliasing thing because I vaguely recall a post on -developers or -arch maybe ~3yrs ago that mentioned the pmap isn't doing what it should to prevent alias-related corruption.
Based on the MIPS docs I've read, it does seem like that's possible with certain combinations of (large way size + small page size + no hardware anti-alias).

I don't have any exceptionally helpful comments; the code looks fine, but I haven't given it the review it deserves.

  1. I don't think very much about 32-bit MIPS, unfortunately.
  2. If you're interested in MALTA, why not use MALTA64? If there's an actual hardware platform you care about, it'd be useful to know which one that is.
  3. The general approach of a reusable mapping for this kind of bounce buffering is a good one. Really, you could have a per-thread reusable mapping wired into the TLB for this use, and just update the wiring. That way the update path would be quicker, no TLBL/TLBS/TLBRefill. I never understood the sysmap_lmem code very well because (see point 1 above), but that's how I'd approach it.
  4. I would encourage allocating pairs of virtual pages even if you end up wasting one or more, to avoid even possibly splitting the TLB entry for one of these with something else.
  5. I agree we should get rid of 32-bit SMP on MIPS barring something really disruptive entering the marketplace. (This seems vanishingly unlikely, much as MIPS seems to be vanishing. People who would have accidentally used MIPS are all using ARM, and people who would have intentionally sought out MIPS are using RISC-V instead.)
  6. It's almost 110% certain that the cache handling is not totally right. I don't think very much these days about how to make it right, and it's a subject that requires nontrivial thought given VIVT vs. VIPT. If you're deep into reading about this stuff and have all of it paged in, I'd encourage taking a swing at auditing all of mips/mips for places that caching could be made correct, and giving it a try. You're unlikely to make it worse :)

Thanks for doing this, this is stuff that needs to be happen and looks to be good work!

jah added a comment.May 25 2019, 9:11 PM

I don't have any exceptionally helpful comments; the code looks fine, but I haven't given it the review it deserves.

  1. I don't think very much about 32-bit MIPS, unfortunately.
  2. If you're interested in MALTA, why not use MALTA64? If there's an actual hardware platform you care about, it'd be useful to know which one that is.
  3. The general approach of a reusable mapping for this kind of bounce buffering is a good one. Really, you could have a per-thread reusable mapping wired into the TLB for this use, and just update the wiring. That way the update path would be quicker, no TLBL/TLBS/TLBRefill. I never understood the sysmap_lmem code very well because (see point 1 above), but that's how I'd approach it.
  4. I would encourage allocating pairs of virtual pages even if you end up wasting one or more, to avoid even possibly splitting the TLB entry for one of these with something else.
  5. I agree we should get rid of 32-bit SMP on MIPS barring something really disruptive entering the marketplace. (This seems vanishingly unlikely, much as MIPS seems to be vanishing. People who would have accidentally used MIPS are all using ARM, and people who would have intentionally sought out MIPS are using RISC-V instead.)
  6. It's almost 110% certain that the cache handling is not totally right. I don't think very much these days about how to make it right, and it's a subject that requires nontrivial thought given VIVT vs. VIPT. If you're deep into reading about this stuff and have all of it paged in, I'd encourage taking a swing at auditing all of mips/mips for places that caching could be made correct, and giving it a try. You're unlikely to make it worse :)

Thanks for doing this, this is stuff that needs to be happen and looks to be good work!

  1. There's not any specific MIPS platform I care about, MALTA just seems to be the standard for testing under qemu. I'm mostly doing this because I did similar work for the i386 and armv6 pmaps, and it seemed like it might be nice to do the same thing for MIPS. Plus I'd get to learn something about an architecture I'm not very familiar with (I spend all my time at $work with arm/arm64, so I've arm-wrestled those to death and don't have much interest in spending my freebsd time with them).
  1. This is similar to how we already deal with per-CPU data for MIPS+SMP. I wouldn't do it per-thread, since these functions can run early in boot when there might not be a valid thread context. I'd keep things per-CPU and just take what I already have here a couple of steps further: move the KVA pageframes from per-CPU data to global and then just change the TLB entry on the map/unmap operations. The cmap1/cmap2 pages already exist as a pair, so they could get a single wired entry which would avoid splitting. The qmap page OTOH is just a single page, but we don't expect it to be used as much so it might make sense to just leave it as-is instead of taking up a wired TLB entry for it.

I do have one concern with this approach though: TLB entries seem like they might be precious. It looks like there may be as few as 64 of them, so using up another wired entry or two could have knock-on effects elsewhere. I notice that even the per-CPU data, which is used constantly, only gets a wired entry for SMP kernels even though it seems like there could still be a perf benefit to using a wired entry for it on UP.

But it also looks like, on all MIPS configurations, we're reserving TLB entry 0 for KSTACK_TLB_ENTRY. But I don't see any evidence that we're using that TLB slot currently: there's no call to tlb_insert_wired() on that slot that I can find. It also seems like a single TLB entry might not be sufficient for that, depending on the combination of PAGE_SIZE and KSTACK_PAGES.
Can that TLB entry be repurposed, or are we still using it somewhere I've missed?

  1. I think Warner killed off the one 32-bit MIPS SMP config shortly after I posted this review, for lack of 64-bit atomic support, so it's probably a moot point now anyway.
  1. Some of the cache maintenance I'm seeing in here wouldn't really make sense unless you had to allow for VIVT caches. Do we actually support any MIPS variant with VIVT? It seems like VIPT has been the rule for MIPS for quite a while now. On the flip side, even if we can take VIPT for granted, it seems like there are other places where we aren't doing enough to prevent aliasing, unless we can also assume that either a) the way size is small enough to never alias or b) the hardware we care about always has the optional hardware anti-alias feature.
jah added a comment.May 26 2019, 4:32 AM
In D18593#440598, @jah wrote:

But it also looks like, on all MIPS configurations, we're reserving TLB entry 0 for KSTACK_TLB_ENTRY. But I don't see any evidence that we're using that TLB slot currently: there's no call to tlb_insert_wired() on that slot that I can find. It also seems like a single TLB entry might not be sufficient for that, depending on the combination of PAGE_SIZE and KSTACK_PAGES.
Can that TLB entry be repurposed, or are we still using it somewhere I've missed?

To answer my own question: no, we can't repurpose entry 0. We wire it down in switch.S to guarantee we won't take TLB faults on access to the kernel stack. Makes sense, esp. if we don't have something like a boot stack or exception stack to fall back on.

In D18593#440643, @jah wrote:

To answer my own question: no, we can't repurpose entry 0. We wire it down in switch.S to guarantee we won't take TLB faults on access to the kernel stack. Makes sense, esp. if we don't have something like a boot stack or exception stack to fall back on.

Yeah, and just to be clear as to why we're stuck with that approach for SMP at least: we'd need to get the current CPU and index into an array for exception stacks using only k0 and k1 to have a per-CPU exception stack. We could require MIPS platforms to provide a macro for assembly use which does so, and we'd have to be very careful about our use of said exception stack with regard to exception nesting. Not really all that hard to do, but the cost of giving up a TLB entry isn't such a bad thing. Moreover, the kernel stack is going to be in such heavy use, that unless you have a fairly unusual workload, it's going to almost always be in the TLB anyway, so you're also avoiding doing a TLB refill for it for some performance benefit. It would be interesting, though, to measure the different in something like buildworld for the different SMP approaches here.

imp added a comment.May 26 2019, 5:36 PM

I only proposed getting rid of a mips32 SMP kernel. There's mips32r6 instructions that could be used to implement 64-bit atomics, so I stayed my hand.
I've not looked closely enough at the rest of this patch to click 'OK' but I have no objections.

jah added a comment.Sun, Sep 15, 4:33 AM

Any objections to me submitting this as-is? I got sucked into $work for the last few months, but I did take some time to re-test just now to make sure this hasn't bit-rotted. I'd like to get this in and then look at some of the cache/busdma stuff.

hi,

I have no complaints; I'm still stuck on mips32r2 and similar non-SMP hardware for these router things. I'll soon be moving to ARM though..