Add malloc_domainset_aligned() that force-align pointer-aligned result from malloc, copied from the rtld code.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Tested this, and the "bus_dmamem_alloc failed to align memory properly" messages went away!
Basically, previous version worked because malloc(9) returns result aligned on the zone size, assuming the used zone size is power of two. So there were no realignment of result and free(9) was passed the original pointer.
This can be codified in malloc_aligned() directly, but I highly dislike it. Instead I do the usual trick of storing pointer to the original allocation right before the returned memory.
sys/kern/kern_malloc.c | ||
---|---|---|
778 | I'm probably just missing something here, but why does the padding need to be '3 * align' instead of 'sizeof(void*) + align'? |
sys/kern/kern_malloc.c | ||
---|---|---|
785 | Nit: could a simple store be used here instead of memcpy()? Since align must be at least sizeof(void*) , I don't think you'll end up with unaligned access (for the architectures that care about that sort of thing). |
sys/kern/kern_malloc.c | ||
---|---|---|
785 | I remember from the rtld patch that a compiler does not know about alignment of the pointer. |
sys/kern/kern_malloc.c | ||
---|---|---|
785 |
To clarify, are you saying that a compiler might choose to emit the store as something like a series of 1-byte stores (or maybe even a call to memcpy() since compilers these days can do that), because the pointer arithmetic here will make it believe that it can't guarantee natural alignment of the access? |
sys/kern/kern_malloc.c | ||
---|---|---|
785 | No, I mean that compiler might issue a warning that the alignment of the casted pointer cannot be known. |
sys/kern/kern_malloc.c | ||
---|---|---|
800 | Is there any meaningful assertion that could be made here that 'mem' is within a certain range below 'addr', to catch corruption? |
Another solution would be to require malloc() to return self-aligned memory when the allocation size is a power of 2 (up to the page size). This is already true because of how UMA lays out slabs, we could simply set the required alignment when creating malloc zones and add assertions. Then, malloc_aligned() would just round up the allocation size to the next power of 2 and free_aligned() is not needed. With this solution there is no wasted memory when the allocation size is a power of 2. I don't object to this patch, just proposing an alternative.
The man page should be updated as well.
sys/kern/kern_malloc.c | ||
---|---|---|
782 | This could be roundup2(). |
Store alignment in the header, and assert that mem and addr are close enough on free().
Yes but malloc zones only request pointer-aligned allocation. I suspect that the only place in the kernel that depends on the undocumented feature of over-alignment is this busdma code.
Do we want to make the undocumented feature mandatory? If we ever want to change malloc underlying implementation, this requirement would either waste memory or tie hands of the new author. I assumed that being explicit there is more optimal.
I found another place by coincidence: struct aesni_session requires 16-byte alignment for key schedules. There, the allocation is handled by the generic opencrypto framework so aesni cannot easily specify the required alignment. I will post a patch soon.
Do we want to make the undocumented feature mandatory? If we ever want to change malloc underlying implementation, this requirement would either waste memory or tie hands of the new author. I assumed that being explicit there is more optimal.
I am proposing to change malloc to request self-alignment from UMA for power-of-2 zones, so the malloc_aligned() implementation can be optimized. External callers should still use malloc_aligned() if they have specific alignment requirements. Perhaps it is premature, let's see if new malloc_aligned() callers appear.
sys/kern/kern_malloc.c | ||
---|---|---|
787 | for the INVARIANTS case, should this be '(uintptr_t)mem + sizeof(void*) + sizeof(size_t)' ? |