arm: atomic_testand{set,clear} didn't follow atomic(9) API
AbandonedPublic
Actions

Authored by rlibby on Jan 2 2021, 7:25 PM.

Details

Reviewers

andrew
• ian
kib
markj
mmel

Summary

atomic(9) says that it will take the mod of the bit argument by the size
of the type, but 32-bit arm's implementations did not.

Reported by: mmel

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 35857
Build 32746: arc lint + arc unit

Event Timeline

rlibby created this revision.Jan 2 2021, 7:25 PM

Herald added a subscriber: imp. · View Herald TranscriptJan 2 2021, 7:25 PM

rlibby requested review of this revision.Jan 2 2021, 7:25 PM

Harbormaster completed remote builds in B35857: Diff 81513.Jan 2 2021, 7:25 PM

rlibby added inline comments.Jan 2 2021, 7:30 PM

sys/arm/include/atomic-v6.h

861–889

Incidentally I believe these assembly routines could be shorter, but as I don't have arm hardware for testing and as such a change is not necessary, I have left it alone.

static __inline int
atomic_testandclear_32(volatile uint32_t *ptr, u_int bit)
{
	uint32_t mask, newv, nex, oldv;

	mask = 1u << (bit & 0x1f);
	__asm __volatile(
	    "1:							\n"
	    "   ldrex	%[oldv], [%[ptr]]			\n"
	    "   bic     %[newv], %[oldv], %[mask]		\n"
	    "   strex	%[nex], %[newv], [%[ptr]]		\n"
	    "   teq	%[nex], #0				\n"
	    "   it	ne					\n"
	    "   bne	1b					\n"
	    : [oldv] "=&r"   (oldv),
	      [newv] "=&r"   (newv)
	      [nex]  "=&r"   (nex)
	    : [ptr]  "r"     (ptr),
	      [mask] "r"     (mask)
	    : "cc", "memory");

	return ((oldv & mask) != 0);
}

I just committed fix for arm.
Thanks for cooperation.

sys/arm/include/atomic-v6.h
861–889	This is ~ original version which ian hand-optimized to faster (but with longer assembly) one. See https://svnweb.freebsd.org/base/head/sys/arm/include/atomic-v6.h?r1=357708&r2=357709& .

rlibby mentioned this in D27897: arm: fix atomic_testand{set,clear}_64 for ops on high bits.Jan 2 2021, 8:29 PM

Closing this one as overcome-by-events. Opened D27897.

rlibby added inline comments.Jan 2 2021, 10:50 PM

sys/arm/include/atomic-v6.h

861–889

I see. Because of the loop and earlyclobber, the compiler has a hard time with register allocation. But still better code can be generated from the compiler by making the hand-rolled assembly even simpler.

This probably goes for the rest of the llsc loops in the 32-bit arm atomic support. But I don't have a stake in this code, so again I will leave it alone. @ian, FYI.

static __inline int
atomic_testandclear_32_proposal(volatile uint32_t *ptr, u_int bit)
{
	uint32_t mask, newv, nex, oldv;

	mask = 1u << (bit & 0x1f);
	do {
		__asm __volatile(
		    "   ldrex	%[oldv], [%[ptr]]			\n"
		    "   bic     %[newv], %[oldv], %[mask]		\n"
		    "   strex	%[nex], %[newv], [%[ptr]]		\n"
		    : [oldv] "=&r"   (oldv),
		      [newv] "=&r"   (newv),
		      [nex]  "=r"    (nex)
		    : [ptr]  "r"     (ptr),
		      [mask] "r"     (mask)
		    : "memory");
	} while (nex);

	return ((oldv & mask) != 0);
}

void
wrap_tac_proposal(uint32_t *p, unsigned int v)
{
	if (atomic_testandclear_32_proposal(p, v))
		panic("%s", __func__);
}

llvm-objdump -d kernel
...
00005aa4 <wrap_tac_current>:
    5aa4: 00 48 2d e9   push    {r11, lr}
    5aa8: 0d b0 a0 e1   mov     r11, sp
    5aac: 1f 10 01 e2   and     r1, r1, #31
    5ab0: 01 c0 a0 e3   mov     r12, #1
    5ab4: 1c c1 a0 e1   lsl     r12, r12, r1
    5ab8: 9f 2f 90 e1   ldrex   r2, [r0]
    5abc: 0c 30 c2 e1   bic     r3, r2, r12
    5ac0: 93 1f 80 e1   strex   r1, r3, [r0]
    5ac4: 00 00 31 e3   teq     r1, #0
    5ac8: fa ff ff 1a   bne     #-24 <wrap_tac_current+0x14>
    5acc: 0c 10 12 e0   ands    r1, r2, r12
    5ad0: 01 10 a0 13   movne   r1, #1
    5ad4: 00 00 51 e3   cmp     r1, #0
    5ad8: 00 88 bd 08   popeq   {r11, pc}
    5adc: 00 00 00 e3   movw    r0, #0
    5ae0: 00 10 00 e3   movw    r1, #0
    5ae4: 00 00 40 e3   movt    r0, #0
    5ae8: 00 10 40 e3   movt    r1, #0
    5aec: fe ff ff eb   bl      #-8 <wrap_tac_current+0x48>

00005af0 <wrap_tac_naive>:
    5af0: 10 4c 2d e9   push    {r4, r10, r11, lr}
    5af4: 08 b0 8d e2   add     r11, sp, #8
    5af8: 1f e0 01 e2   and     lr, r1, #31
    5afc: 01 c0 a0 e3   mov     r12, #1
    5b00: 1c 3e a0 e1   lsl     r3, r12, lr
    5b04: 9f 2f 90 e1   ldrex   r2, [r0]
    5b08: 03 10 c2 e1   bic     r1, r2, r3
    5b0c: 91 4f 80 e1   strex   r4, r1, [r0]
    5b10: 00 00 34 e3   teq     r4, #0
    5b14: fa ff ff 1a   bne     #-24 <wrap_tac_naive+0x14>
    5b18: 1c 0e 12 e1   tst     r2, r12, lsl lr
    5b1c: 10 8c bd 08   popeq   {r4, r10, r11, pc}
    5b20: 00 00 00 e3   movw    r0, #0
    5b24: 00 10 00 e3   movw    r1, #0
    5b28: 00 00 40 e3   movt    r0, #0
    5b2c: 00 10 40 e3   movt    r1, #0
    5b30: fe ff ff eb   bl      #-8 <wrap_tac_naive+0x40>

00005b34 <wrap_tac_proposal>:
    5b34: 00 48 2d e9   push    {r11, lr}
    5b38: 0d b0 a0 e1   mov     r11, sp
    5b3c: 1f 10 01 e2   and     r1, r1, #31
    5b40: 01 20 a0 e3   mov     r2, #1
    5b44: 12 c1 a0 e1   lsl     r12, r2, r1
    5b48: 9f 2f 90 e1   ldrex   r2, [r0]
    5b4c: 0c 30 c2 e1   bic     r3, r2, r12
    5b50: 93 1f 80 e1   strex   r1, r3, [r0]
    5b54: 00 00 51 e3   cmp     r1, #0
    5b58: fa ff ff 1a   bne     #-24 <wrap_tac_proposal+0x14>
    5b5c: 0c 00 12 e1   tst     r2, r12
    5b60: 00 88 bd 08   popeq   {r11, pc}
    5b64: 00 00 00 e3   movw    r0, #0
    5b68: 00 10 00 e3   movw    r1, #0
    5b6c: 00 00 40 e3   movt    r0, #0
    5b70: 00 10 40 e3   movt    r1, #0
    5b74: fe ff ff eb   bl      #-8 <wrap_tac_proposal+0x40>

Revision Contents
Changeset List

Path

Size

sys/

arm/

include/

atomic-v6.h

6 lines

Diff 81513

View Options

sys/arm/include/atomic-v6.h

arm: atomic_testand{set,clear} didn't follow atomic(9) APIAbandonedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 81513

sys/arm/include/atomic-v6.h

arm: atomic_testand{set,clear} didn't follow atomic(9) API
AbandonedPublic
Actions

Revision Contents
Changeset List