Page MenuHomeFreeBSD

libc qsort: stop aliasing
ClosedPublic

Authored by kib on Jun 9 2018, 1:49 PM.

Details

Summary

qsort swap code aliases the sorted array elements to ints and longs in order to do swap on the machine words. Unfortunately this breaks with the full code optimization, e.g. LTO.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 which seems to reference code directly copied from libc/stdlib/qsort.c.

PR: 228780

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

This looks fine for FreeBSD.

The current implementation will break CheriBSD, but it's easy enough to fix on our end.

This revision is now accepted and ready to land.Jun 9 2018, 5:54 PM

This looks fine for FreeBSD.

The current implementation will break CheriBSD, but it's easy enough to fix on our end.

What do you plan to do to fix it ? I tried to provide some way to not loose the optimization of doing the word swaps, but cannot imagine how to do it without introducing MD primitive.
But for the cheri fix it should be something sufficiently close to that ?

In D15714#332308, @kib wrote:

This looks fine for FreeBSD.

The current implementation will break CheriBSD, but it's easy enough to fix on our end.

What do you plan to do to fix it ? I tried to provide some way to not loose the optimization of doing the word swaps, but cannot imagine how to do it without introducing MD primitive.
But for the cheri fix it should be something sufficiently close to that ?

There are a couple options, but probably something along the lines of:

if ((uintptr_t)a % sizeof(void *) == 0 && (uintptr_t)a % sizeof(void *) == 0 && es % sizeof(void *) == 0)
    swapfunc_byword(a, b, es);
else
   current code.
fi

Another approach I've seen in out of tree versions is to use macros to generate fast, custom versions for common primitive type sizes and alignments and then implement swap with a malloced buffer and memcpy for other cases.

There are a couple options, but probably something along the lines of:

if ((uintptr_t)a % sizeof(void *) == 0 && (uintptr_t)a % sizeof(void *) == 0 && es % sizeof(void *) == 0)
    swapfunc_byword(a, b, es);
else
   current code.
fi

This is exactly what is removed in the patch. It does not hold for aliasing.

Another approach I've seen in out of tree versions is to use macros to generate fast, custom versions for common primitive type sizes and alignments and then implement swap with a malloced buffer and memcpy for other cases.

Again, it is unknown whether the array elements type is compatible with the words. For instance,

struct {
  uint16_t a,b;
};

must not be accessed as uint32_t, for the same reason. And qsort() does not know anything about array elements except the size.

In D15714#332308, @kib wrote:

This looks fine for FreeBSD.

The current implementation will break CheriBSD, but it's easy enough to fix on our end.

What do you plan to do to fix it ? I tried to provide some way to not loose the optimization of doing the word swaps, but cannot imagine how to do it without introducing MD primitive.
But for the cheri fix it should be something sufficiently close to that ?

There are a couple options, but probably something along the lines of:

if ((uintptr_t)a % sizeof(void *) == 0 && (uintptr_t)a % sizeof(void *) == 0 && es % sizeof(void *) == 0)
    swapfunc_byword(a, b, es);
else
   current code.
fi

Another approach I've seen in out of tree versions is to use macros to generate fast, custom versions for common primitive type sizes and alignments and then implement swap with a malloced buffer and memcpy for other cases.

In D15714#332404, @kib wrote:

There are a couple options, but probably something along the lines of:

if ((uintptr_t)a % sizeof(void *) == 0 && (uintptr_t)a % sizeof(void *) == 0 && es % sizeof(void *) == 0)
    swapfunc_byword(a, b, es);
else
   current code.
fi

This is exactly what is removed in the patch. It does not hold for aliasing.

Another approach I've seen in out of tree versions is to use macros to generate fast, custom versions for common primitive type sizes and alignments and then implement swap with a malloced buffer and memcpy for other cases.

Again, it is unknown whether the array elements type is compatible with the words. For instance,

struct {
  uint16_t a,b;
};

must not be accessed as uint32_t, for the same reason. And qsort() does not know anything about array elements except the size.

If compilers take this strict an approach to aliasing (not allowing memory to be accessed differently after a cast in a separate function after verifying alignment) then I see no viable approach for CHERI but to fully pessimize all cases where es >= sizeof(void *) with a malloc and explicit memcpy().

This revision was automatically updated to reflect the committed changes.