Page MenuHomeFreeBSD

atomics: Constify loads
ClosedPublic

Authored by olce on Oct 3 2024, 5:02 PM.
Tags
None
Referenced Files
F107477647: D46887.id144193.diff
Tue, Jan 14, 6:25 PM
F107466429: D46887.diff
Tue, Jan 14, 2:13 PM
Unknown Object (File)
Thu, Dec 19, 3:26 AM
Unknown Object (File)
Wed, Dec 18, 11:52 AM
Unknown Object (File)
Wed, Dec 18, 3:11 AM
Unknown Object (File)
Mon, Dec 16, 2:45 PM
Unknown Object (File)
Dec 13 2024, 7:34 PM
Unknown Object (File)
Dec 9 2024, 12:59 AM

Details

Summary

To match reality and allow using these functions with pointers on const
objects.

Remove the '+' modifier in the atomic_load_acq_64_i586()'s inline asm
statement's constraint for '*p' (the value to load). CMPXCHG8B always
writes back some value, even when the value exchange does not happen, in
which case what was read is written back. atomic_load_acq_64_i586()
further takes care of the operation atomically writing back the same
value that was read in any case. All in all, this makes the inline
asm's write back undetectable by any other code, whether executing on
other CPUs or code on the same CPU before and after the call to
atomic_load_acq_64_i586(), except for the fact that CMPXCHG8B will
trigger a #GP(0) if the memory address is part of a read-only mapping.
This unfortunate property is however out of scope of the C abstract
machine, and in particular independent of whether the 'uint64_t' pointed
to is declared 'const' or not.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

olce requested review of this revision.Oct 3 2024, 5:02 PM

What is the purpose of this change? What should it change?

sys/i386/include/atomic.h
455

This is not correct, the implementation actually writes to the *p location, albeit idempotent value.

olce edited the summary of this revision. (Show Details)

Change atomic_load_acq_64_i586(), and explain both the reason for this change and overall for this revision.

olce marked an inline comment as done.Oct 4 2024, 7:47 AM
sys/i386/include/atomic.h
462–463

This is even more wrong.

Problem is that the write happens, so 'const' is a lie. For instance, the op cannot be used on ro mapping, while const implies that it can be.

sys/i386/include/atomic.h
462–463

const is not a lie, on the contrary it is true at its level, namely, the language's abstract machine level. The compiler can't know, doesn't care and shouldn't care about how pages are actually mapped by the hardware. Same for "m" instead of "+m" (these are directives for the compiler, nothing more).

The real problem here is the protection fault that LOCK CMPXCHG8B can trigger. One indeed can expect to be able to load something atomically from read-only mapped pages without such fault. The use of this instruction is a clever trick, but in this light is nonetheless conceptually wrong, irrespective of the presence of const or not. It is wrong because it is basically implying that code calling atomic_load_acq_64() must be so on read-write-mapped pages (again, this has nothing to do with the presence of const). Moreover, AFAICT, i586/Pentium is the only such outlier in our tree.

For the first part, having const in these functions' prototype is a security advantage as it makes it possible to use const in more places, making the code we produce safer. For the second part, modern architectures all provide the necessary instructions to atomically load without writing, so automatically abide by the assumption of "no fault on read-only mappings".

So I'd rather replace LOCK CMPXCHG8B by something that doesn't trigger writes for i586. Searching a bit, I've only found the FILD/FISTP alternative which, yes, entails floating-point use in the kernel.

The drawbacks of doing this are a bit more time spent to figure the alternative right, and pessimizing kernel code on i586, which I doubt is still in use and support for which will vanish in the next release. Trying to optimize specifically for UP vs SMP for i586 is not even worth the thought.

So, as I currently see it (please correct me if I'm wrong), this is a trade-off between a slight increase in security/correctness for our code going forward versus slight performance pessimization of a very outdated processor family we soon won't even support anymore.

To me, provided we can work out the FILD/FISTP alternative (or another one, if you have a better idea), this is a no-brainer.

This seems fine insofar as it brings us closer to the C11 interface.

sys/i386/include/atomic.h
462–463

Presumably you have some patches which depend on the const qualifier being present here? If not, I'd rather not spend significant time on this. If so, is it possible to have some special case (i.e., use __DECONST) on i386, given that platform support will be removed in the not-too-distant future?

sys/i386/include/atomic.h
462–463

Yes, I have a patch depending on this.

I'd rather not spend too much time on this too, inasmuch as I don't know how to test a change here properly, not having that hardware anymore (perhaps QEMU still can?).

You mean, using __DECONST() at the point of use? In this case, I think it's simpler and better to stick with the current version here, i.e., everything is changed to use const in advance of removal of i386 (I don't think it is critical to MFC these changes, having them in head is enough). As developed above, this is conceptually correct and won't change the code generated by the compiler even for i586.

Any modern amd64 machine is capable of booting i386, except machines with 64bit UEFI without CSM.

Note that sys/i386/include/atomic.h is not going anywhere even if i386 kernels are removed.

Right now userspace

const uint64_t x = 1;
atomic_load_64_i586(&x);

does not compile, after the change it should fault at runtime.

In D46887#1069912, @kib wrote:

Any modern amd64 machine is capable of booting i386, except machines with 64bit UEFI without CSM.

You've just made me realize that atomic_load_64_i586() is used not only for i586 but also for all above processors. However, as we're going to see, that doesn't seem to really matter, and we have ways out anyway.

Note that sys/i386/include/atomic.h is not going anywhere even if i386 kernels are removed.

Yes, but the kernel parts in it are going to be removed, thus, all atomic_*_64*() functions will disappear, won't they?

Right now userspace

const uint64_t x = 1;
atomic_load_64_i586(&x);

does not compile, after the change it should fault at runtime.

It seems to me that atomic_*_64*() are not exposed to userland for i386. So is this case relevant?

If it is nonetheless, then there is a relatively simple way out: Implement an atomic_load_acq_64_sse2() variant based on SSE2. Since all amd64 processors support SSE2, once the kernel part of i386 is removed, we just have to have atomic_load_64_i586() behave like atomic_load_64_sse2(), which solves faults at runtime. And, regarding the signature change for atomic_load_acq_64_i586() (assuming signature is shared between kernel and userland), if specifically necessary we could postpone it until i386 is gone, and in the meantime use __DECONST() internally in the kernel when calling it from atomic_load_acq_64().

Yes, perhaps SSE2 is the way out when i386 become userspace-only.

If so, is it possible to have some special case (i.e., use __DECONST) on i386, given that platform support will be removed in the not-too-distant future?

Re-reading this, I think you perhaps meant instead (of what I first inferred above) using __DECONST in atomic_load_acq_64() when calling atomic_load_acq_64_i586()? In which case, it is of course possible, but I don't think desirable, please see next paragraph.

In D46887#1069912, @kib wrote:

Right now userspace

const uint64_t x = 1;
atomic_load_64_i586(&x);

does not compile, after the change it should fault at runtime.

Re-reading this snippet, I don't see a case where atomic loading of a constant variable could make sense. While it is better to have such code fail at compile time, it will also immediately fail at runtime too, and more importantly seems unlikely to be ever produced in practice. And this function is not exported to userspace anyway.

In D46887#1070262, @kib wrote:

Yes, perhaps SSE2 is the way out when i386 become userspace-only.

Given the near removal of i386 kernel-wise, this comment seems to make sense only if we are going to provide code for 64-bit loads in userland for i386 (which we do not today). Am I misunderstanding something? Additionally, I thought the longer-term plan was to rely on compilers for such atomic operations (at least in userland) instead of rolling out our own?

Given all the above, barring any misunderstanding on my part, my current plan is to:

  • Commit this as is.
  • Open a separate revision adding atomic_load_acq_64_sse2() but only if you tell me that having our own hand-rolled implementation is necessary, e.g., in preparation of exposing the atomic_*_64() functions to userland (because I still don't understand why we would need one otherwise, given the i386 kernel is going away and these are not going to be used by the amd64 kernel).

Is that acceptable?

Atomic load from const uint64_t by itself is not very useful, but having uint64_t * which sometimes point to a constant is not too strange.

I noted that load_64_586 does not work on ro mappings when I developed userspace timecounters. I tried to use it for reading some value from the shared page and it faulted for obvious reasons. As result, ABI of the timehands exposed to userspace is limited to uint32_t on 32bit x86.

In D46887#1070776, @kib wrote:

Atomic load from const uint64_t by itself is not very useful, but having uint64_t * which sometimes point to a constant is not too strange.

Yes. Passing a uint64_t * to a function having a const pointer parameter is however not a problem.

I noted that load_64_586 does not work on ro mappings when I developed userspace timecounters. I tried to use it for reading some value from the shared page and it faulted for obvious reasons. As result, ABI of the timehands exposed to userspace is limited to uint32_t on 32bit x86.

I can image the surprise... But I assume this ship has sailed already, i.e., you do not plan to change that now, do you?

In D46887#1070776, @kib wrote:

Atomic load from const uint64_t by itself is not very useful, but having uint64_t * which sometimes point to a constant is not too strange.

Yes. Passing a uint64_t * to a function having a const pointer parameter is however not a problem.

I mean that using atomic_load() on memory living in ro mapped segments can occur naturally, by dereferencing uint64_t *.

I noted that load_64_586 does not work on ro mappings when I developed userspace timecounters. I tried to use it for reading some value from the shared page and it faulted for obvious reasons. As result, ABI of the timehands exposed to userspace is limited to uint32_t on 32bit x86.

I can image the surprise... But I assume this ship has sailed already, i.e., you do not plan to change that now, do you?

I do not plan this now of course, since we already have the ABI for sharedpage timehands.

Still, having atomic_64 for i386 userspace is probably a requirement which we cannot avoid. Either we should keep current cmpxchg8b bits, or switch to SSE2 load as you proposed.

olce retitled this revision from atomic: Constify atomic loads to atomics: Constify loads.
olce edited the summary of this revision. (Show Details)

Fix compilation of atomic_load_acq_64_i586(). Expand commit message.

This revision was not accepted when it landed; it landed in state Needs Review.Mon, Dec 16, 2:45 PM
Closed by commit rG5e9a82e898d5: atomics: Constify loads (authored by olce). · Explain Why
This revision was automatically updated to reflect the committed changes.

Please see D48062 for a followup of the i386 discussion.

Until a personal issue is taken care of I'm not going to be eloquent. Looks like you figured out how to suppress the fault, which D34701 lacked. Having a commit bit also helps get things through... I'm glad this is done since most cases it will be a const pointer to writable memory (the const being used to cause the compiler to error if it is written to).