Index: head/lib/libc/softfloat/Makefile.inc =================================================================== --- head/lib/libc/softfloat/Makefile.inc (revision 230362) +++ head/lib/libc/softfloat/Makefile.inc (revision 230363) @@ -1,20 +1,30 @@ -# $NetBSD: Makefile.inc,v 1.3 2003/05/06 08:58:20 rearnsha Exp $ +# $NetBSD: Makefile.inc,v 1.10 2011/07/04 02:53:15 mrg Exp $ # $FreeBSD$ SOFTFLOAT_BITS?=64 .PATH: ${LIBC_ARCH}/softfloat \ ${.CURDIR}/softfloat/bits${SOFTFLOAT_BITS} ${.CURDIR}/softfloat CFLAGS+= -I${.CURDIR}/${LIBC_ARCH}/softfloat -I${.CURDIR}/softfloat CFLAGS+= -DSOFTFLOAT_FOR_GCC SRCS+= softfloat.c SRCS+= fpgetround.c fpsetround.c fpgetmask.c fpsetmask.c \ fpgetsticky.c SRCS+= eqsf2.c nesf2.c gtsf2.c gesf2.c ltsf2.c lesf2.c negsf2.c \ eqdf2.c nedf2.c gtdf2.c gedf2.c ltdf2.c ledf2.c negdf2.c \ unordsf2.c unorddf2.c + +.if defined(SOFTFLOAT_128) +CFLAGS+= -DFLOAT128 +SRCS+= eqtf2.c netf2.c gttf2.c getf2.c lttf2.c letf2.c negtf2.c +.endif + +.if defined(SOFTFLOAT_X80) +CFLAGS+= -DFLOATX80 +SRCS+= nexf2.c gtxf2.c gexf2.c negxf2.c +.endif SYM_MAPS+= ${.CURDIR}/softfloat/Symbol.map Index: head/lib/libc/softfloat/bits32/softfloat-macros =================================================================== --- head/lib/libc/softfloat/bits32/softfloat-macros (revision 230362) +++ head/lib/libc/softfloat/bits32/softfloat-macros (revision 230363) @@ -1,649 +1,649 @@ /* $FreeBSD$ */ /* =============================================================================== This C source fragment is part of the SoftFloat IEC/IEEE Floating-point Arithmetic Package, Release 2a. Written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding was partially provided by the National Science Foundation under grant MIP-9311980. 
The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. More information is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ arithmetic/SoftFloat.html'. THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. Derivative works are acceptable, even for commercial purposes, so long as (1) they include prominent notice that the work is derivative, and (2) they include prominent notice akin to these four paragraphs for those parts of this code that are retained. =============================================================================== */ /* ------------------------------------------------------------------------------- Shifts `a' right by the number of bits given in `count'. If any nonzero bits are shifted off, they are ``jammed'' into the least significant bit of the result by setting the least significant bit to 1. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 32, the result will be either 0 or 1, depending on whether `a' is zero or nonzero. The result is stored in the location pointed to by `zPtr'. 
------------------------------------------------------------------------------- */ INLINE void shift32RightJamming( bits32 a, int16 count, bits32 *zPtr ) { bits32 z; if ( count == 0 ) { z = a; } else if ( count < 32 ) { z = ( a>>count ) | ( ( a<<( ( - count ) & 31 ) ) != 0 ); } else { z = ( a != 0 ); } *zPtr = z; } /* ------------------------------------------------------------------------------- Shifts the 64-bit value formed by concatenating `a0' and `a1' right by the number of bits given in `count'. Any bits shifted off are lost. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 64, the result will be 0. The result is broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void shift64Right( bits32 a0, bits32 a1, int16 count, bits32 *z0Ptr, bits32 *z1Ptr ) { bits32 z0, z1; int8 negCount = ( - count ) & 31; if ( count == 0 ) { z1 = a1; z0 = a0; } else if ( count < 32 ) { z1 = ( a0<<negCount ) | ( a1>>count ); z0 = a0>>count; } else { z1 = ( count < 64 ) ? ( a0>>( count & 31 ) ) : 0; z0 = 0; } *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 64-bit value formed by concatenating `a0' and `a1' right by the number of bits given in `count'. If any nonzero bits are shifted off, they are ``jammed'' into the least significant bit of the result by setting the least significant bit to 1. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 64, the result will be either 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or nonzero. The result is broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
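------------------------------------------------------------------------------- */ INLINE void shift64RightJamming( bits32 a0, bits32 a1, int16 count, bits32 *z0Ptr, bits32 *z1Ptr ) { bits32 z0, z1; int8 negCount = ( - count ) & 31; if ( count == 0 ) { z1 = a1; z0 = a0; } else if ( count < 32 ) { z1 = ( a0<<negCount ) | ( a1>>count ) | ( ( a1<<negCount ) != 0 ); z0 = a0>>count; } else { if ( count == 32 ) { z1 = a0 | ( a1 != 0 ); } else if ( count < 64 ) { z1 = ( a0>>( count & 31 ) ) | ( ( ( a0<<negCount ) | a1 ) != 0 ); } else { z1 = ( ( a0 | a1 ) != 0 ); } z0 = 0; } *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 96-bit value formed by concatenating `a0', `a1', and `a2' right by 32 _plus_ the number of bits given in `count'. The shifted result is at most 64 nonzero bits; these are broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted off form a third 32-bit result as follows: The _last_ bit shifted off is the most-significant bit of the extra result, and the other 31 bits of the extra result are all zero if and only if _all_but_the_last_ bits shifted off were all zero. This extra result is stored in the location pointed to by `z2Ptr'. The value of `count' can be arbitrarily large. (This routine makes more sense if `a0', `a1', and `a2' are considered to form a fixed-point value with binary point between `a1' and `a2'. This fixed-point value is shifted right by the number of bits given in `count', and the integer part of the result is returned at the locations pointed to by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly corrupted as described above, and is returned at the location pointed to by `z2Ptr'.) ------------------------------------------------------------------------------- */ INLINE void shift64ExtraRightJamming( bits32 a0, bits32 a1, bits32 a2, int16 count, bits32 *z0Ptr, bits32 *z1Ptr, bits32 *z2Ptr ) { bits32 z0, z1, z2; int8 negCount = ( - count ) & 31; if ( count == 0 ) { z2 = a2; z1 = a1; z0 = a0; } else { if ( count < 32 ) { z2 = a1<<negCount; z1 = ( a0<<negCount ) | ( a1>>count ); z0 = a0>>count; } else { if ( count == 32 ) { z2 = a1; z1 = a0; } else { a2 |= a1; if ( count < 64 ) { z2 = a0<<negCount; z1 = a0>>( count & 31 ); } else { z2 = ( count == 64 ) ? a0 : ( a0 != 0 ); z1 = 0; } } z0 = 0; } z2 |= ( a2 != 0 ); } *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 64-bit value formed by concatenating `a0' and `a1' left by the number of bits given in `count'. Any bits shifted off are lost. The value of `count' must be less than 32. The result is broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void shortShift64Left( bits32 a0, bits32 a1, int16 count, bits32 *z0Ptr, bits32 *z1Ptr ) { *z1Ptr = a1<<count; *z0Ptr = ( count == 0 ) ? a0 : ( a0<<count ) | ( a1>>( ( - count ) & 31 ) ); } /* ------------------------------------------------------------------------------- Shifts the 96-bit value formed by concatenating `a0', `a1', and `a2' left by the number of bits given in `count'. Any bits shifted off are lost. The value of `count' must be less than 32. The result is broken into three 32-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.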
------------------------------------------------------------------------------- */ INLINE void shortShift96Left( bits32 a0, bits32 a1, bits32 a2, int16 count, bits32 *z0Ptr, bits32 *z1Ptr, bits32 *z2Ptr ) { bits32 z0, z1, z2; int8 negCount; z2 = a2<<count; z1 = a1<<count; z0 = a0<<count; if ( 0 < count ) { negCount = ( ( - count ) & 31 ); z1 |= a2>>negCount; z0 |= a1>>negCount; } *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Adds the 64-bit value formed by concatenating `a0' and `a1' to the 64-bit value formed by concatenating `b0' and `b1'. Addition is modulo 2^64, so any carry out is lost. The result is broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void add64( bits32 a0, bits32 a1, bits32 b0, bits32 b1, bits32 *z0Ptr, bits32 *z1Ptr ) { bits32 z1; z1 = a1 + b1; *z1Ptr = z1; *z0Ptr = a0 + b0 + ( z1 < a1 ); } /* ------------------------------------------------------------------------------- Adds the 96-bit value formed by concatenating `a0', `a1', and `a2' to the 96-bit value formed by concatenating `b0', `b1', and `b2'. Addition is modulo 2^96, so any carry out is lost. The result is broken into three 32-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
------------------------------------------------------------------------------- */ INLINE void add96( bits32 a0, bits32 a1, bits32 a2, bits32 b0, bits32 b1, bits32 b2, bits32 *z0Ptr, bits32 *z1Ptr, bits32 *z2Ptr ) { bits32 z0, z1, z2; int8 carry0, carry1; z2 = a2 + b2; carry1 = ( z2 < a2 ); z1 = a1 + b1; carry0 = ( z1 < a1 ); z0 = a0 + b0; z1 += carry1; - z0 += ( z1 < carry1 ); + z0 += ( z1 < (bits32)carry1 ); z0 += carry0; *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Subtracts the 64-bit value formed by concatenating `b0' and `b1' from the 64-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo 2^64, so any borrow out (carry out) is lost. The result is broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void sub64( bits32 a0, bits32 a1, bits32 b0, bits32 b1, bits32 *z0Ptr, bits32 *z1Ptr ) { *z1Ptr = a1 - b1; *z0Ptr = a0 - b0 - ( a1 < b1 ); } /* ------------------------------------------------------------------------------- Subtracts the 96-bit value formed by concatenating `b0', `b1', and `b2' from the 96-bit value formed by concatenating `a0', `a1', and `a2'. Subtraction is modulo 2^96, so any borrow out (carry out) is lost. The result is broken into three 32-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
------------------------------------------------------------------------------- */ INLINE void sub96( bits32 a0, bits32 a1, bits32 a2, bits32 b0, bits32 b1, bits32 b2, bits32 *z0Ptr, bits32 *z1Ptr, bits32 *z2Ptr ) { bits32 z0, z1, z2; int8 borrow0, borrow1; z2 = a2 - b2; borrow1 = ( a2 < b2 ); z1 = a1 - b1; borrow0 = ( a1 < b1 ); z0 = a0 - b0; - z0 -= ( z1 < borrow1 ); + z0 -= ( z1 < (bits32)borrow1 ); z1 -= borrow1; z0 -= borrow0; *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Multiplies `a' by `b' to obtain a 64-bit product. The product is broken into two 32-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void mul32To64( bits32 a, bits32 b, bits32 *z0Ptr, bits32 *z1Ptr ) { bits16 aHigh, aLow, bHigh, bLow; bits32 z0, zMiddleA, zMiddleB, z1; aLow = a; aHigh = a>>16; bLow = b; bHigh = b>>16; z1 = ( (bits32) aLow ) * bLow; zMiddleA = ( (bits32) aLow ) * bHigh; zMiddleB = ( (bits32) aHigh ) * bLow; z0 = ( (bits32) aHigh ) * bHigh; zMiddleA += zMiddleB; z0 += ( ( (bits32) ( zMiddleA < zMiddleB ) )<<16 ) + ( zMiddleA>>16 ); zMiddleA <<= 16; z1 += zMiddleA; z0 += ( z1 < zMiddleA ); *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Multiplies the 64-bit value formed by concatenating `a0' and `a1' by `b' to obtain a 96-bit product. The product is broken into three 32-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
------------------------------------------------------------------------------- */ INLINE void mul64By32To96( bits32 a0, bits32 a1, bits32 b, bits32 *z0Ptr, bits32 *z1Ptr, bits32 *z2Ptr ) { bits32 z0, z1, z2, more1; mul32To64( a1, b, &z1, &z2 ); mul32To64( a0, b, &z0, &more1 ); add64( z0, more1, 0, z1, &z0, &z1 ); *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Multiplies the 64-bit value formed by concatenating `a0' and `a1' to the 64-bit value formed by concatenating `b0' and `b1' to obtain a 128-bit product. The product is broken into four 32-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. ------------------------------------------------------------------------------- */ INLINE void mul64To128( bits32 a0, bits32 a1, bits32 b0, bits32 b1, bits32 *z0Ptr, bits32 *z1Ptr, bits32 *z2Ptr, bits32 *z3Ptr ) { bits32 z0, z1, z2, z3; bits32 more1, more2; mul32To64( a1, b1, &z2, &z3 ); mul32To64( a1, b0, &z1, &more2 ); add64( z1, more2, 0, z2, &z1, &z2 ); mul32To64( a0, b0, &z0, &more1 ); add64( z0, more1, 0, z1, &z0, &z1 ); mul32To64( a0, b1, &more1, &more2 ); add64( more1, more2, 0, z2, &more1, &z2 ); add64( z0, z1, 0, more1, &z0, &z1 ); *z3Ptr = z3; *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Returns an approximation to the 32-bit integer quotient obtained by dividing `b' into the 64-bit value formed by concatenating `a0' and `a1'. The divisor `b' must be at least 2^31. If q is the exact quotient truncated toward zero, the approximation returned lies between q and q + 2 inclusive. If the exact quotient q is larger than 32 bits, the maximum positive 32-bit unsigned integer is returned. 
------------------------------------------------------------------------------- */ static bits32 estimateDiv64To32( bits32 a0, bits32 a1, bits32 b ) { bits32 b0, b1; bits32 rem0, rem1, term0, term1; bits32 z; if ( b <= a0 ) return 0xFFFFFFFF; b0 = b>>16; z = ( b0<<16 <= a0 ) ? 0xFFFF0000 : ( a0 / b0 )<<16; mul32To64( b, z, &term0, &term1 ); sub64( a0, a1, term0, term1, &rem0, &rem1 ); while ( ( (sbits32) rem0 ) < 0 ) { z -= 0x10000; b1 = b<<16; add64( rem0, rem1, b0, b1, &rem0, &rem1 ); } rem0 = ( rem0<<16 ) | ( rem1>>16 ); z |= ( b0<<16 <= rem0 ) ? 0xFFFF : rem0 / b0; return z; } #ifndef SOFTFLOAT_FOR_GCC /* ------------------------------------------------------------------------------- Returns an approximation to the square root of the 32-bit significand given by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of `aExp' (the least significant bit) is 1, the integer returned approximates 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either case, the approximation returned lies strictly within +/-2 of the exact value. ------------------------------------------------------------------------------- */ static bits32 estimateSqrt32( int16 aExp, bits32 a ) { static const bits16 sqrtOddAdjustments[] = { 0x0004, 0x0022, 0x005D, 0x00B1, 0x011D, 0x019F, 0x0236, 0x02E0, 0x039C, 0x0468, 0x0545, 0x0631, 0x072B, 0x0832, 0x0946, 0x0A67 }; static const bits16 sqrtEvenAdjustments[] = { 0x0A2D, 0x08AF, 0x075A, 0x0629, 0x051A, 0x0429, 0x0356, 0x029E, 0x0200, 0x0179, 0x0109, 0x00AF, 0x0068, 0x0034, 0x0012, 0x0002 }; int8 index; bits32 z; index = ( a>>27 ) & 15; if ( aExp & 1 ) { z = 0x4000 + ( a>>17 ) - sqrtOddAdjustments[ index ]; z = ( ( a / z )<<14 ) + ( z<<15 ); a >>= 1; } else { z = 0x8000 + ( a>>17 ) - sqrtEvenAdjustments[ index ]; z = a / z + z; z = ( 0x20000 <= z ) ? 
0xFFFF8000 : ( z<<15 ); if ( z <= a ) return (bits32) ( ( (sbits32) a )>>1 ); } return ( ( estimateDiv64To32( a, 0, z ) )>>1 ) + ( z>>1 ); } #endif /* ------------------------------------------------------------------------------- Returns the number of leading 0 bits before the most-significant 1 bit of `a'. If `a' is zero, 32 is returned. ------------------------------------------------------------------------------- */ static int8 countLeadingZeros32( bits32 a ) { static const int8 countLeadingZerosHigh[] = { 8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; int8 shiftCount; shiftCount = 0; if ( a < 0x10000 ) { shiftCount += 16; a <<= 16; } if ( a < 0x1000000 ) { shiftCount += 8; a <<= 8; } shiftCount += countLeadingZerosHigh[ a>>24 ]; return shiftCount; } /* ------------------------------------------------------------------------------- Returns 1 if the 64-bit value formed by concatenating `a0' and `a1' is equal to the 64-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. 
------------------------------------------------------------------------------- */ INLINE flag eq64( bits32 a0, bits32 a1, bits32 b0, bits32 b1 ) { return ( a0 == b0 ) && ( a1 == b1 ); } /* ------------------------------------------------------------------------------- Returns 1 if the 64-bit value formed by concatenating `a0' and `a1' is less than or equal to the 64-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. ------------------------------------------------------------------------------- */ INLINE flag le64( bits32 a0, bits32 a1, bits32 b0, bits32 b1 ) { return ( a0 < b0 ) || ( ( a0 == b0 ) && ( a1 <= b1 ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the 64-bit value formed by concatenating `a0' and `a1' is less than the 64-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. ------------------------------------------------------------------------------- */ INLINE flag lt64( bits32 a0, bits32 a1, bits32 b0, bits32 b1 ) { return ( a0 < b0 ) || ( ( a0 == b0 ) && ( a1 < b1 ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the 64-bit value formed by concatenating `a0' and `a1' is not equal to the 64-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. 
------------------------------------------------------------------------------- */ INLINE flag ne64( bits32 a0, bits32 a1, bits32 b0, bits32 b1 ) { return ( a0 != b0 ) || ( a1 != b1 ); } Index: head/lib/libc/softfloat/bits64/softfloat-macros =================================================================== --- head/lib/libc/softfloat/bits64/softfloat-macros (revision 230362) +++ head/lib/libc/softfloat/bits64/softfloat-macros (revision 230363) @@ -1,746 +1,746 @@ -/* $NetBSD: softfloat-macros,v 1.1 2002/05/21 23:51:08 bjh21 Exp $ */ +/* $NetBSD: softfloat-macros,v 1.2 2009/02/16 10:23:35 tron Exp $ */ /* $FreeBSD$ */ /* =============================================================================== This C source fragment is part of the SoftFloat IEC/IEEE Floating-point Arithmetic Package, Release 2a. Written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding was partially provided by the National Science Foundation under grant MIP-9311980. The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. More information is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ arithmetic/SoftFloat.html'. THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. 
Derivative works are acceptable, even for commercial purposes, so long as (1) they include prominent notice that the work is derivative, and (2) they include prominent notice akin to these four paragraphs for those parts of this code that are retained. =============================================================================== */ /* ------------------------------------------------------------------------------- Shifts `a' right by the number of bits given in `count'. If any nonzero bits are shifted off, they are ``jammed'' into the least significant bit of the result by setting the least significant bit to 1. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 32, the result will be either 0 or 1, depending on whether `a' is zero or nonzero. The result is stored in the location pointed to by `zPtr'. ------------------------------------------------------------------------------- */ INLINE void shift32RightJamming( bits32 a, int16 count, bits32 *zPtr ) { bits32 z; if ( count == 0 ) { z = a; } else if ( count < 32 ) { z = ( a>>count ) | ( ( a<<( ( - count ) & 31 ) ) != 0 ); } else { z = ( a != 0 ); } *zPtr = z; } /* ------------------------------------------------------------------------------- Shifts `a' right by the number of bits given in `count'. If any nonzero bits are shifted off, they are ``jammed'' into the least significant bit of the result by setting the least significant bit to 1. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 64, the result will be either 0 or 1, depending on whether `a' is zero or nonzero. The result is stored in the location pointed to by `zPtr'. 
------------------------------------------------------------------------------- */ INLINE void shift64RightJamming( bits64 a, int16 count, bits64 *zPtr ) { bits64 z; if ( count == 0 ) { z = a; } else if ( count < 64 ) { z = ( a>>count ) | ( ( a<<( ( - count ) & 63 ) ) != 0 ); } else { z = ( a != 0 ); } *zPtr = z; } /* ------------------------------------------------------------------------------- Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64 _plus_ the number of bits given in `count'. The shifted result is at most 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The bits shifted off form a second 64-bit result as follows: The _last_ bit shifted off is the most-significant bit of the extra result, and the other 63 bits of the extra result are all zero if and only if _all_but_the_last_ bits shifted off were all zero. This extra result is stored in the location pointed to by `z1Ptr'. The value of `count' can be arbitrarily large. (This routine makes more sense if `a0' and `a1' are considered to form a fixed-point value with binary point between `a0' and `a1'. This fixed-point value is shifted right by the number of bits given in `count', and the integer part of the result is returned at the location pointed to by `z0Ptr'. The fractional part of the result may be slightly corrupted as described above, and is returned at the location pointed to by `z1Ptr'.) 
------------------------------------------------------------------------------- */ INLINE void shift64ExtraRightJamming( bits64 a0, bits64 a1, int16 count, bits64 *z0Ptr, bits64 *z1Ptr ) { bits64 z0, z1; int8 negCount = ( - count ) & 63; if ( count == 0 ) { z1 = a1; z0 = a0; } else if ( count < 64 ) { z1 = ( a0<<negCount ) | ( a1 != 0 ); z0 = a0>>count; } else { if ( count == 64 ) { z1 = a0 | ( a1 != 0 ); } else { z1 = ( ( a0 | a1 ) != 0 ); } z0 = 0; } *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the number of bits given in `count'. Any bits shifted off are lost. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 128, the result will be 0. The result is broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void shift128Right( bits64 a0, bits64 a1, int16 count, bits64 *z0Ptr, bits64 *z1Ptr ) { bits64 z0, z1; int8 negCount = ( - count ) & 63; if ( count == 0 ) { z1 = a1; z0 = a0; } else if ( count < 64 ) { z1 = ( a0<<negCount ) | ( a1>>count ); z0 = a0>>count; } else { z1 = ( count < 64 ) ? ( a0>>( count & 63 ) ) : 0; z0 = 0; } *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the number of bits given in `count'. If any nonzero bits are shifted off, they are ``jammed'' into the least significant bit of the result by setting the least significant bit to 1. The value of `count' can be arbitrarily large; in particular, if `count' is greater than 128, the result will be either 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or nonzero. The result is broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
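------------------------------------------------------------------------------- */ INLINE void shift128RightJamming( bits64 a0, bits64 a1, int16 count, bits64 *z0Ptr, bits64 *z1Ptr ) { bits64 z0, z1; int8 negCount = ( - count ) & 63; if ( count == 0 ) { z1 = a1; z0 = a0; } else if ( count < 64 ) { z1 = ( a0<<negCount ) | ( a1>>count ) | ( ( a1<<negCount ) != 0 ); z0 = a0>>count; } else { if ( count == 64 ) { z1 = a0 | ( a1 != 0 ); } else if ( count < 128 ) { z1 = ( a0>>( count & 63 ) ) | ( ( ( a0<<negCount ) | a1 ) != 0 ); } else { z1 = ( ( a0 | a1 ) != 0 ); } z0 = 0; } *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right by 64 _plus_ the number of bits given in `count'. The shifted result is at most 128 nonzero bits; these are broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted off form a third 64-bit result as follows: The _last_ bit shifted off is the most-significant bit of the extra result, and the other 63 bits of the extra result are all zero if and only if _all_but_the_last_ bits shifted off were all zero. This extra result is stored in the location pointed to by `z2Ptr'. The value of `count' can be arbitrarily large. (This routine makes more sense if `a0', `a1', and `a2' are considered to form a fixed-point value with binary point between `a1' and `a2'. This fixed-point value is shifted right by the number of bits given in `count', and the integer part of the result is returned at the locations pointed to by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly corrupted as described above, and is returned at the location pointed to by `z2Ptr'.) ------------------------------------------------------------------------------- */ INLINE void shift128ExtraRightJamming( bits64 a0, bits64 a1, bits64 a2, int16 count, bits64 *z0Ptr, bits64 *z1Ptr, bits64 *z2Ptr ) { bits64 z0, z1, z2; int8 negCount = ( - count ) & 63; if ( count == 0 ) { z2 = a2; z1 = a1; z0 = a0; } else { if ( count < 64 ) { z2 = a1<<negCount; z1 = ( a0<<negCount ) | ( a1>>count ); z0 = a0>>count; } else { if ( count == 64 ) { z2 = a1; z1 = a0; } else { a2 |= a1; if ( count < 128 ) { z2 = a0<<negCount; z1 = a0>>( count & 63 ); } else { z2 = ( count == 128 ) ? a0 : ( a0 != 0 ); z1 = 0; } } z0 = 0; } z2 |= ( a2 != 0 ); } *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the number of bits given in `count'. Any bits shifted off are lost. The value of `count' must be less than 64. The result is broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void shortShift128Left( bits64 a0, bits64 a1, int16 count, bits64 *z0Ptr, bits64 *z1Ptr ) { *z1Ptr = a1<<count; *z0Ptr = ( count == 0 ) ? a0 : ( a0<<count ) | ( a1>>( ( - count ) & 63 ) ); } /* ------------------------------------------------------------------------------- Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left by the number of bits given in `count'. Any bits shifted off are lost. The value of `count' must be less than 64. The result is broken into three 64-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.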
------------------------------------------------------------------------------- */ INLINE void shortShift192Left( bits64 a0, bits64 a1, bits64 a2, int16 count, bits64 *z0Ptr, bits64 *z1Ptr, bits64 *z2Ptr ) { bits64 z0, z1, z2; int8 negCount; z2 = a2<<count; z1 = a1<<count; z0 = a0<<count; if ( 0 < count ) { negCount = ( ( - count ) & 63 ); z1 |= a2>>negCount; z0 |= a1>>negCount; } *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so any carry out is lost. The result is broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void add128( bits64 a0, bits64 a1, bits64 b0, bits64 b1, bits64 *z0Ptr, bits64 *z1Ptr ) { bits64 z1; z1 = a1 + b1; *z1Ptr = z1; *z0Ptr = a0 + b0 + ( z1 < a1 ); } /* ------------------------------------------------------------------------------- Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to the 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is modulo 2^192, so any carry out is lost. The result is broken into three 64-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
------------------------------------------------------------------------------- */ INLINE void add192( bits64 a0, bits64 a1, bits64 a2, bits64 b0, bits64 b1, bits64 b2, bits64 *z0Ptr, bits64 *z1Ptr, bits64 *z2Ptr ) { bits64 z0, z1, z2; int8 carry0, carry1; z2 = a2 + b2; carry1 = ( z2 < a2 ); z1 = a1 + b1; carry0 = ( z1 < a1 ); z0 = a0 + b0; z1 += carry1; - z0 += ( z1 < carry1 ); + z0 += ( z1 < (bits64)carry1 ); z0 += carry0; *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Subtracts the 128-bit value formed by concatenating `b0' and `b1' from the 128-bit value formed by concatenating `a0' and `a1'. Subtraction is modulo 2^128, so any borrow out (carry out) is lost. The result is broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void sub128( bits64 a0, bits64 a1, bits64 b0, bits64 b1, bits64 *z0Ptr, bits64 *z1Ptr ) { *z1Ptr = a1 - b1; *z0Ptr = a0 - b0 - ( a1 < b1 ); } /* ------------------------------------------------------------------------------- Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' from the 192-bit value formed by concatenating `a0', `a1', and `a2'. Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The result is broken into three 64-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
------------------------------------------------------------------------------- */ INLINE void sub192( bits64 a0, bits64 a1, bits64 a2, bits64 b0, bits64 b1, bits64 b2, bits64 *z0Ptr, bits64 *z1Ptr, bits64 *z2Ptr ) { bits64 z0, z1, z2; int8 borrow0, borrow1; z2 = a2 - b2; borrow1 = ( a2 < b2 ); z1 = a1 - b1; borrow0 = ( a1 < b1 ); z0 = a0 - b0; - z0 -= ( z1 < borrow1 ); + z0 -= ( z1 < (bits64)borrow1 ); z1 -= borrow1; z0 -= borrow0; *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Multiplies `a' by `b' to obtain a 128-bit product. The product is broken into two 64-bit pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. ------------------------------------------------------------------------------- */ INLINE void mul64To128( bits64 a, bits64 b, bits64 *z0Ptr, bits64 *z1Ptr ) { bits32 aHigh, aLow, bHigh, bLow; bits64 z0, zMiddleA, zMiddleB, z1; aLow = a; aHigh = a>>32; bLow = b; bHigh = b>>32; z1 = ( (bits64) aLow ) * bLow; zMiddleA = ( (bits64) aLow ) * bHigh; zMiddleB = ( (bits64) aHigh ) * bLow; z0 = ( (bits64) aHigh ) * bHigh; zMiddleA += zMiddleB; z0 += ( ( (bits64) ( zMiddleA < zMiddleB ) )<<32 ) + ( zMiddleA>>32 ); zMiddleA <<= 32; z1 += zMiddleA; z0 += ( z1 < zMiddleA ); *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Multiplies the 128-bit value formed by concatenating `a0' and `a1' by `b' to obtain a 192-bit product. The product is broken into three 64-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
------------------------------------------------------------------------------- */ INLINE void mul128By64To192( bits64 a0, bits64 a1, bits64 b, bits64 *z0Ptr, bits64 *z1Ptr, bits64 *z2Ptr ) { bits64 z0, z1, z2, more1; mul64To128( a1, b, &z1, &z2 ); mul64To128( a0, b, &z0, &more1 ); add128( z0, more1, 0, z1, &z0, &z1 ); *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit product. The product is broken into four 64-bit pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. ------------------------------------------------------------------------------- */ INLINE void mul128To256( bits64 a0, bits64 a1, bits64 b0, bits64 b1, bits64 *z0Ptr, bits64 *z1Ptr, bits64 *z2Ptr, bits64 *z3Ptr ) { bits64 z0, z1, z2, z3; bits64 more1, more2; mul64To128( a1, b1, &z2, &z3 ); mul64To128( a1, b0, &z1, &more2 ); add128( z1, more2, 0, z2, &z1, &z2 ); mul64To128( a0, b0, &z0, &more1 ); add128( z0, more1, 0, z1, &z0, &z1 ); mul64To128( a0, b1, &more1, &more2 ); add128( more1, more2, 0, z2, &more1, &z2 ); add128( z0, z1, 0, more1, &z0, &z1 ); *z3Ptr = z3; *z2Ptr = z2; *z1Ptr = z1; *z0Ptr = z0; } /* ------------------------------------------------------------------------------- Returns an approximation to the 64-bit integer quotient obtained by dividing `b' into the 128-bit value formed by concatenating `a0' and `a1'. The divisor `b' must be at least 2^63. If q is the exact quotient truncated toward zero, the approximation returned lies between q and q + 2 inclusive. If the exact quotient q is larger than 64 bits, the maximum positive 64-bit unsigned integer is returned. 
------------------------------------------------------------------------------- */ static bits64 estimateDiv128To64( bits64 a0, bits64 a1, bits64 b ) { bits64 b0, b1; bits64 rem0, rem1, term0, term1; bits64 z; if ( b <= a0 ) return LIT64( 0xFFFFFFFFFFFFFFFF ); b0 = b>>32; z = ( b0<<32 <= a0 ) ? LIT64( 0xFFFFFFFF00000000 ) : ( a0 / b0 )<<32; mul64To128( b, z, &term0, &term1 ); sub128( a0, a1, term0, term1, &rem0, &rem1 ); while ( ( (sbits64) rem0 ) < 0 ) { z -= LIT64( 0x100000000 ); b1 = b<<32; add128( rem0, rem1, b0, b1, &rem0, &rem1 ); } rem0 = ( rem0<<32 ) | ( rem1>>32 ); z |= ( b0<<32 <= rem0 ) ? 0xFFFFFFFF : rem0 / b0; return z; } #if !defined(SOFTFLOAT_FOR_GCC) || defined(FLOATX80) || defined(FLOAT128) /* ------------------------------------------------------------------------------- Returns an approximation to the square root of the 32-bit significand given by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of `aExp' (the least significant bit) is 1, the integer returned approximates 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `aExp' is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either case, the approximation returned lies strictly within +/-2 of the exact value. ------------------------------------------------------------------------------- */ static bits32 estimateSqrt32( int16 aExp, bits32 a ) { static const bits16 sqrtOddAdjustments[] = { 0x0004, 0x0022, 0x005D, 0x00B1, 0x011D, 0x019F, 0x0236, 0x02E0, 0x039C, 0x0468, 0x0545, 0x0631, 0x072B, 0x0832, 0x0946, 0x0A67 }; static const bits16 sqrtEvenAdjustments[] = { 0x0A2D, 0x08AF, 0x075A, 0x0629, 0x051A, 0x0429, 0x0356, 0x029E, 0x0200, 0x0179, 0x0109, 0x00AF, 0x0068, 0x0034, 0x0012, 0x0002 }; int8 idx; bits32 z; idx = ( a>>27 ) & 15; if ( aExp & 1 ) { z = 0x4000 + ( a>>17 ) - sqrtOddAdjustments[ idx ]; z = ( ( a / z )<<14 ) + ( z<<15 ); a >>= 1; } else { z = 0x8000 + ( a>>17 ) - sqrtEvenAdjustments[ idx ]; z = a / z + z; z = ( 0x20000 <= z ) ? 
0xFFFF8000 : ( z<<15 ); if ( z <= a ) return (bits32) ( ( (sbits32) a )>>1 ); } return ( (bits32) ( ( ( (bits64) a )<<31 ) / z ) ) + ( z>>1 ); } #endif /* ------------------------------------------------------------------------------- Returns the number of leading 0 bits before the most-significant 1 bit of `a'. If `a' is zero, 32 is returned. ------------------------------------------------------------------------------- */ static int8 countLeadingZeros32( bits32 a ) { static const int8 countLeadingZerosHigh[] = { 8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; int8 shiftCount; shiftCount = 0; if ( a < 0x10000 ) { shiftCount += 16; a <<= 16; } if ( a < 0x1000000 ) { shiftCount += 8; a <<= 8; } shiftCount += countLeadingZerosHigh[ a>>24 ]; return shiftCount; } /* ------------------------------------------------------------------------------- Returns the number of leading 0 bits before the most-significant 1 bit of `a'. If `a' is zero, 64 is returned. 
------------------------------------------------------------------------------- */ static int8 countLeadingZeros64( bits64 a ) { int8 shiftCount; shiftCount = 0; if ( a < ( (bits64) 1 )<<32 ) { shiftCount += 32; } else { a >>= 32; } shiftCount += countLeadingZeros32( a ); return shiftCount; } /* ------------------------------------------------------------------------------- Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is equal to the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. ------------------------------------------------------------------------------- */ INLINE flag eq128( bits64 a0, bits64 a1, bits64 b0, bits64 b1 ) { return ( a0 == b0 ) && ( a1 == b1 ); } /* ------------------------------------------------------------------------------- Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less than or equal to the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. ------------------------------------------------------------------------------- */ INLINE flag le128( bits64 a0, bits64 a1, bits64 b0, bits64 b1 ) { return ( a0 < b0 ) || ( ( a0 == b0 ) && ( a1 <= b1 ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is less than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. ------------------------------------------------------------------------------- */ INLINE flag lt128( bits64 a0, bits64 a1, bits64 b0, bits64 b1 ) { return ( a0 < b0 ) || ( ( a0 == b0 ) && ( a1 < b1 ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is not equal to the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, returns 0. 
------------------------------------------------------------------------------- */ INLINE flag ne128( bits64 a0, bits64 a1, bits64 b0, bits64 b1 ) { return ( a0 != b0 ) || ( a1 != b1 ); } Index: head/lib/libc/softfloat/bits64/softfloat.c =================================================================== --- head/lib/libc/softfloat/bits64/softfloat.c (revision 230362) +++ head/lib/libc/softfloat/bits64/softfloat.c (revision 230363) @@ -1,5500 +1,5595 @@ -/* $NetBSD: softfloat.c,v 1.2 2003/07/26 19:24:52 salo Exp $ */ +/* $NetBSD: softfloat.c,v 1.8 2011/07/10 04:52:23 matt Exp $ */ /* * This version hacked for use with gcc -msoft-float by bjh21. * (Mostly a case of #ifdefing out things GCC doesn't need or provides * itself). */ /* * Things you may want to define: * * SOFTFLOAT_FOR_GCC - build only those functions necessary for GCC (with * -msoft-float) to work. Include "softfloat-for-gcc.h" to get them * properly renamed. */ /* =============================================================================== This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic Package, Release 2a. Written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding was partially provided by the National Science Foundation under grant MIP-9311980. The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. More information is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ arithmetic/SoftFloat.html'. THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. Derivative works are acceptable, even for commercial purposes, so long as (1) they include prominent notice that the work is derivative, and (2) they include prominent notice akin to these four paragraphs for those parts of this code that are retained. =============================================================================== */ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); #ifdef SOFTFLOAT_FOR_GCC #include "softfloat-for-gcc.h" #endif #include "milieu.h" #include "softfloat.h" /* * Conversions between floats as stored in memory and floats as * SoftFloat uses them */ #ifndef FLOAT64_DEMANGLE #define FLOAT64_DEMANGLE(a) (a) #endif #ifndef FLOAT64_MANGLE #define FLOAT64_MANGLE(a) (a) #endif /* ------------------------------------------------------------------------------- Floating-point rounding mode, extended double-precision rounding precision, and exception flags. ------------------------------------------------------------------------------- */ int float_rounding_mode = float_round_nearest_even; int float_exception_flags = 0; #ifdef FLOATX80 int8 floatx80_rounding_precision = 80; #endif /* ------------------------------------------------------------------------------- Primitive arithmetic functions, including multi-word arithmetic, and division and square root approximations. (Can be specialized to target if desired.) 
------------------------------------------------------------------------------- */ #include "softfloat-macros" /* ------------------------------------------------------------------------------- Functions and definitions to determine: (1) whether tininess for underflow is detected before or after rounding by default, (2) what (if anything) happens when exceptions are raised, (3) how signaling NaNs are distinguished from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs are propagated from function inputs to output. These details are target- specific. ------------------------------------------------------------------------------- */ #include "softfloat-specialize" #if !defined(SOFTFLOAT_FOR_GCC) || defined(FLOATX80) || defined(FLOAT128) /* ------------------------------------------------------------------------------- Takes a 64-bit fixed-point value `absZ' with binary point between bits 6 and 7, and returns the properly rounded 32-bit integer corresponding to the input. If `zSign' is 1, the input is negated before being converted to an integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input is simply rounded to an integer, with the inexact exception raised if the input cannot be represented exactly as an integer. However, if the fixed- point input is too large, the invalid exception is raised and the largest positive or negative integer is returned. ------------------------------------------------------------------------------- */ static int32 roundAndPackInt32( flag zSign, bits64 absZ ) { int8 roundingMode; flag roundNearestEven; int8 roundIncrement, roundBits; int32 z; roundingMode = float_rounding_mode; roundNearestEven = ( roundingMode == float_round_nearest_even ); roundIncrement = 0x40; if ( ! 
roundNearestEven ) { if ( roundingMode == float_round_to_zero ) { roundIncrement = 0; } else { roundIncrement = 0x7F; if ( zSign ) { if ( roundingMode == float_round_up ) roundIncrement = 0; } else { if ( roundingMode == float_round_down ) roundIncrement = 0; } } } roundBits = absZ & 0x7F; absZ = ( absZ + roundIncrement )>>7; absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven ); z = absZ; if ( zSign ) z = - z; if ( ( absZ>>32 ) || ( z && ( ( z < 0 ) ^ zSign ) ) ) { float_raise( float_flag_invalid ); return zSign ? (sbits32) 0x80000000 : 0x7FFFFFFF; } if ( roundBits ) float_exception_flags |= float_flag_inexact; return z; } /* ------------------------------------------------------------------------------- Takes the 128-bit fixed-point value formed by concatenating `absZ0' and `absZ1', with binary point between bits 63 and 64 (between the input words), and returns the properly rounded 64-bit integer corresponding to the input. If `zSign' is 1, the input is negated before being converted to an integer. Ordinarily, the fixed-point input is simply rounded to an integer, with the inexact exception raised if the input cannot be represented exactly as an integer. However, if the fixed-point input is too large, the invalid exception is raised and the largest positive or negative integer is returned. ------------------------------------------------------------------------------- */ static int64 roundAndPackInt64( flag zSign, bits64 absZ0, bits64 absZ1 ) { int8 roundingMode; flag roundNearestEven, increment; int64 z; roundingMode = float_rounding_mode; roundNearestEven = ( roundingMode == float_round_nearest_even ); increment = ( (sbits64) absZ1 < 0 ); if ( ! 
roundNearestEven ) { if ( roundingMode == float_round_to_zero ) { increment = 0; } else { if ( zSign ) { increment = ( roundingMode == float_round_down ) && absZ1; } else { increment = ( roundingMode == float_round_up ) && absZ1; } } } if ( increment ) { ++absZ0; if ( absZ0 == 0 ) goto overflow; absZ0 &= ~ ( ( (bits64) ( absZ1<<1 ) == 0 ) & roundNearestEven ); } z = absZ0; if ( zSign ) z = - z; if ( z && ( ( z < 0 ) ^ zSign ) ) { overflow: float_raise( float_flag_invalid ); return zSign ? (sbits64) LIT64( 0x8000000000000000 ) : LIT64( 0x7FFFFFFFFFFFFFFF ); } if ( absZ1 ) float_exception_flags |= float_flag_inexact; return z; } #endif /* ------------------------------------------------------------------------------- Returns the fraction bits of the single-precision floating-point value `a'. ------------------------------------------------------------------------------- */ INLINE bits32 extractFloat32Frac( float32 a ) { return a & 0x007FFFFF; } /* ------------------------------------------------------------------------------- Returns the exponent bits of the single-precision floating-point value `a'. ------------------------------------------------------------------------------- */ INLINE int16 extractFloat32Exp( float32 a ) { return ( a>>23 ) & 0xFF; } /* ------------------------------------------------------------------------------- Returns the sign bit of the single-precision floating-point value `a'. ------------------------------------------------------------------------------- */ INLINE flag extractFloat32Sign( float32 a ) { return a>>31; } /* ------------------------------------------------------------------------------- Normalizes the subnormal single-precision floating-point value represented by the denormalized significand `aSig'. The normalized exponent and significand are stored at the locations pointed to by `zExpPtr' and `zSigPtr', respectively. 
------------------------------------------------------------------------------- */ static void normalizeFloat32Subnormal( bits32 aSig, int16 *zExpPtr, bits32 *zSigPtr ) { int8 shiftCount; shiftCount = countLeadingZeros32( aSig ) - 8; *zSigPtr = aSig<>7; zSig &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven ); if ( zSig == 0 ) zExp = 0; return packFloat32( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Takes an abstract floating-point value having sign `zSign', exponent `zExp', and significand `zSig', and returns the proper single-precision floating- point value corresponding to the abstract input. This routine is just like `roundAndPackFloat32' except that `zSig' does not have to be normalized. Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' floating-point exponent. ------------------------------------------------------------------------------- */ static float32 normalizeRoundAndPackFloat32( flag zSign, int16 zExp, bits32 zSig ) { int8 shiftCount; shiftCount = countLeadingZeros32( zSig ) - 1; return roundAndPackFloat32( zSign, zExp - shiftCount, zSig<>52 ) & 0x7FF; } /* ------------------------------------------------------------------------------- Returns the sign bit of the double-precision floating-point value `a'. ------------------------------------------------------------------------------- */ INLINE flag extractFloat64Sign( float64 a ) { return FLOAT64_DEMANGLE(a)>>63; } /* ------------------------------------------------------------------------------- Normalizes the subnormal double-precision floating-point value represented by the denormalized significand `aSig'. The normalized exponent and significand are stored at the locations pointed to by `zExpPtr' and `zSigPtr', respectively. 
------------------------------------------------------------------------------- */ static void normalizeFloat64Subnormal( bits64 aSig, int16 *zExpPtr, bits64 *zSigPtr ) { int8 shiftCount; shiftCount = countLeadingZeros64( aSig ) - 11; *zSigPtr = aSig<>10; zSig &= ~ ( ( ( roundBits ^ 0x200 ) == 0 ) & roundNearestEven ); if ( zSig == 0 ) zExp = 0; return packFloat64( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Takes an abstract floating-point value having sign `zSign', exponent `zExp', and significand `zSig', and returns the proper double-precision floating- point value corresponding to the abstract input. This routine is just like `roundAndPackFloat64' except that `zSig' does not have to be normalized. Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true'' floating-point exponent. ------------------------------------------------------------------------------- */ static float64 normalizeRoundAndPackFloat64( flag zSign, int16 zExp, bits64 zSig ) { int8 shiftCount; shiftCount = countLeadingZeros64( zSig ) - 1; return roundAndPackFloat64( zSign, zExp - shiftCount, zSig<>15; } /* ------------------------------------------------------------------------------- Normalizes the subnormal extended double-precision floating-point value represented by the denormalized significand `aSig'. The normalized exponent and significand are stored at the locations pointed to by `zExpPtr' and `zSigPtr', respectively. ------------------------------------------------------------------------------- */ static void normalizeFloatx80Subnormal( bits64 aSig, int32 *zExpPtr, bits64 *zSigPtr ) { int8 shiftCount; shiftCount = countLeadingZeros64( aSig ); *zSigPtr = aSig<>48 ) & 0x7FFF; } /* ------------------------------------------------------------------------------- Returns the sign bit of the quadruple-precision floating-point value `a'. 
------------------------------------------------------------------------------- */ INLINE flag extractFloat128Sign( float128 a ) { return a.high>>63; } /* ------------------------------------------------------------------------------- Normalizes the subnormal quadruple-precision floating-point value represented by the denormalized significand formed by the concatenation of `aSig0' and `aSig1'. The normalized exponent is stored at the location pointed to by `zExpPtr'. The most significant 49 bits of the normalized significand are stored at the location pointed to by `zSig0Ptr', and the least significant 64 bits of the normalized significand are stored at the location pointed to by `zSig1Ptr'. ------------------------------------------------------------------------------- */ static void normalizeFloat128Subnormal( bits64 aSig0, bits64 aSig1, int32 *zExpPtr, bits64 *zSig0Ptr, bits64 *zSig1Ptr ) { int8 shiftCount; if ( aSig0 == 0 ) { shiftCount = countLeadingZeros64( aSig1 ) - 15; if ( shiftCount < 0 ) { *zSig0Ptr = aSig1>>( - shiftCount ); *zSig1Ptr = aSig1<<( shiftCount & 63 ); } else { *zSig0Ptr = aSig1<> 1 ); + return normalizeRoundAndPackFloat32( 0, 0x9C, a ); +} + + /* ------------------------------------------------------------------------------- Returns the result of converting the 32-bit two's complement integer `a' to the double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 int32_to_float64( int32 a ) { flag zSign; uint32 absA; int8 shiftCount; bits64 zSig; if ( a == 0 ) return 0; zSign = ( a < 0 ); absA = zSign ? 
- a : a; shiftCount = countLeadingZeros32( absA ) + 21; zSig = absA; return packFloat64( zSign, 0x432 - shiftCount, zSig<>( - shiftCount ); if ( (bits32) ( aSig<<( shiftCount & 31 ) ) ) { float_exception_flags |= float_flag_inexact; } if ( aSign ) z = - z; return z; } #ifndef SOFTFLOAT_FOR_GCC /* __fix?fdi provided by libgcc2.c */ /* ------------------------------------------------------------------------------- Returns the result of converting the single-precision floating-point value `a' to the 64-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic---which means in particular that the conversion is rounded according to the current rounding mode. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int64 float32_to_int64( float32 a ) { flag aSign; int16 aExp, shiftCount; bits32 aSig; bits64 aSig64, aSigExtra; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); shiftCount = 0xBE - aExp; if ( shiftCount < 0 ) { float_raise( float_flag_invalid ); if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) { return LIT64( 0x7FFFFFFFFFFFFFFF ); } return (sbits64) LIT64( 0x8000000000000000 ); } if ( aExp ) aSig |= 0x00800000; aSig64 = aSig; aSig64 <<= 40; shift64ExtraRightJamming( aSig64, 0, shiftCount, &aSig64, &aSigExtra ); return roundAndPackInt64( aSign, aSig64, aSigExtra ); } /* ------------------------------------------------------------------------------- Returns the result of converting the single-precision floating-point value `a' to the 64-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, except that the conversion is always rounded toward zero. 
If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int64 float32_to_int64_round_to_zero( float32 a ) { flag aSign; int16 aExp, shiftCount; bits32 aSig; bits64 aSig64; int64 z; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); shiftCount = aExp - 0xBE; if ( 0 <= shiftCount ) { if ( a != 0xDF000000 ) { float_raise( float_flag_invalid ); if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) { return LIT64( 0x7FFFFFFFFFFFFFFF ); } } return (sbits64) LIT64( 0x8000000000000000 ); } else if ( aExp <= 0x7E ) { if ( aExp | aSig ) float_exception_flags |= float_flag_inexact; return 0; } aSig64 = aSig | 0x00800000; aSig64 <<= 40; z = aSig64>>( - shiftCount ); if ( (bits64) ( aSig64<<( shiftCount & 63 ) ) ) { float_exception_flags |= float_flag_inexact; } if ( aSign ) z = - z; return z; } #endif /* !SOFTFLOAT_FOR_GCC */ /* ------------------------------------------------------------------------------- Returns the result of converting the single-precision floating-point value `a' to the double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float64 float32_to_float64( float32 a ) { flag aSign; int16 aExp; bits32 aSig; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); if ( aExp == 0xFF ) { if ( aSig ) return commonNaNToFloat64( float32ToCommonNaN( a ) ); return packFloat64( aSign, 0x7FF, 0 ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat64( aSign, 0, 0 ); normalizeFloat32Subnormal( aSig, &aExp, &aSig ); --aExp; } return packFloat64( aSign, aExp + 0x380, ( (bits64) aSig )<<29 ); } #ifdef FLOATX80 /* ------------------------------------------------------------------------------- Returns the result of converting the single-precision floating-point value `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ floatx80 float32_to_floatx80( float32 a ) { flag aSign; int16 aExp; bits32 aSig; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); if ( aExp == 0xFF ) { if ( aSig ) return commonNaNToFloatx80( float32ToCommonNaN( a ) ); return packFloatx80( aSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloatx80( aSign, 0, 0 ); normalizeFloat32Subnormal( aSig, &aExp, &aSig ); } aSig |= 0x00800000; return packFloatx80( aSign, aExp + 0x3F80, ( (bits64) aSig )<<40 ); } #endif #ifdef FLOAT128 /* ------------------------------------------------------------------------------- Returns the result of converting the single-precision floating-point value `a' to the double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 float32_to_float128( float32 a ) { flag aSign; int16 aExp; bits32 aSig; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); if ( aExp == 0xFF ) { if ( aSig ) return commonNaNToFloat128( float32ToCommonNaN( a ) ); return packFloat128( aSign, 0x7FFF, 0, 0 ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat128( aSign, 0, 0, 0 ); normalizeFloat32Subnormal( aSig, &aExp, &aSig ); --aExp; } return packFloat128( aSign, aExp + 0x3F80, ( (bits64) aSig )<<25, 0 ); } #endif #ifndef SOFTFLOAT_FOR_GCC /* Not needed */ /* ------------------------------------------------------------------------------- Rounds the single-precision floating-point value `a' to an integer, and returns the result as a single-precision floating-point value. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float32 float32_round_to_int( float32 a ) { flag aSign; int16 aExp; bits32 lastBitMask, roundBitsMask; int8 roundingMode; float32 z; aExp = extractFloat32Exp( a ); if ( 0x96 <= aExp ) { if ( ( aExp == 0xFF ) && extractFloat32Frac( a ) ) { return propagateFloat32NaN( a, a ); } return a; } if ( aExp <= 0x7E ) { if ( (bits32) ( a<<1 ) == 0 ) return a; float_exception_flags |= float_flag_inexact; aSign = extractFloat32Sign( a ); switch ( float_rounding_mode ) { case float_round_nearest_even: if ( ( aExp == 0x7E ) && extractFloat32Frac( a ) ) { return packFloat32( aSign, 0x7F, 0 ); } break; case float_round_to_zero: break; case float_round_down: return aSign ? 0xBF800000 : 0; case float_round_up: return aSign ? 
0x80000000 : 0x3F800000; } return packFloat32( aSign, 0, 0 ); } lastBitMask = 1; lastBitMask <<= 0x96 - aExp; roundBitsMask = lastBitMask - 1; z = a; roundingMode = float_rounding_mode; if ( roundingMode == float_round_nearest_even ) { z += lastBitMask>>1; if ( ( z & roundBitsMask ) == 0 ) z &= ~ lastBitMask; } else if ( roundingMode != float_round_to_zero ) { if ( extractFloat32Sign( z ) ^ ( roundingMode == float_round_up ) ) { z += roundBitsMask; } } z &= ~ roundBitsMask; if ( z != a ) float_exception_flags |= float_flag_inexact; return z; } #endif /* !SOFTFLOAT_FOR_GCC */ /* ------------------------------------------------------------------------------- Returns the result of adding the absolute values of the single-precision floating-point values `a' and `b'. If `zSign' is 1, the sum is negated before being returned. `zSign' is ignored if the result is a NaN. The addition is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign ) { int16 aExp, bExp, zExp; bits32 aSig, bSig, zSig; int16 expDiff; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); bSig = extractFloat32Frac( b ); bExp = extractFloat32Exp( b ); expDiff = aExp - bExp; aSig <<= 6; bSig <<= 6; if ( 0 < expDiff ) { if ( aExp == 0xFF ) { if ( aSig ) return propagateFloat32NaN( a, b ); return a; } if ( bExp == 0 ) { --expDiff; } else { bSig |= 0x20000000; } shift32RightJamming( bSig, expDiff, &bSig ); zExp = aExp; } else if ( expDiff < 0 ) { if ( bExp == 0xFF ) { if ( bSig ) return propagateFloat32NaN( a, b ); return packFloat32( zSign, 0xFF, 0 ); } if ( aExp == 0 ) { ++expDiff; } else { aSig |= 0x20000000; } shift32RightJamming( aSig, - expDiff, &aSig ); zExp = bExp; } else { if ( aExp == 0xFF ) { if ( aSig | bSig ) return propagateFloat32NaN( a, b ); return a; } if ( aExp == 0 ) return packFloat32( zSign, 0, ( aSig + 
bSig )>>6 ); zSig = 0x40000000 + aSig + bSig; zExp = aExp; goto roundAndPack; } aSig |= 0x20000000; zSig = ( aSig + bSig )<<1; --zExp; if ( (sbits32) zSig < 0 ) { zSig = aSig + bSig; ++zExp; } roundAndPack: return roundAndPackFloat32( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Returns the result of subtracting the absolute values of the single- precision floating-point values `a' and `b'. If `zSign' is 1, the difference is negated before being returned. `zSign' is ignored if the result is a NaN. The subtraction is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign ) { int16 aExp, bExp, zExp; bits32 aSig, bSig, zSig; int16 expDiff; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); bSig = extractFloat32Frac( b ); bExp = extractFloat32Exp( b ); expDiff = aExp - bExp; aSig <<= 7; bSig <<= 7; if ( 0 < expDiff ) goto aExpBigger; if ( expDiff < 0 ) goto bExpBigger; if ( aExp == 0xFF ) { if ( aSig | bSig ) return propagateFloat32NaN( a, b ); float_raise( float_flag_invalid ); return float32_default_nan; } if ( aExp == 0 ) { aExp = 1; bExp = 1; } if ( bSig < aSig ) goto aBigger; if ( aSig < bSig ) goto bBigger; return packFloat32( float_rounding_mode == float_round_down, 0, 0 ); bExpBigger: if ( bExp == 0xFF ) { if ( bSig ) return propagateFloat32NaN( a, b ); return packFloat32( zSign ^ 1, 0xFF, 0 ); } if ( aExp == 0 ) { ++expDiff; } else { aSig |= 0x40000000; } shift32RightJamming( aSig, - expDiff, &aSig ); bSig |= 0x40000000; bBigger: zSig = bSig - aSig; zExp = bExp; zSign ^= 1; goto normalizeRoundAndPack; aExpBigger: if ( aExp == 0xFF ) { if ( aSig ) return propagateFloat32NaN( a, b ); return a; } if ( bExp == 0 ) { --expDiff; } else { bSig |= 0x40000000; } shift32RightJamming( bSig, expDiff, &bSig ); aSig |= 0x40000000; 
aBigger: zSig = aSig - bSig; zExp = aExp; normalizeRoundAndPack: --zExp; return normalizeRoundAndPackFloat32( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Returns the result of adding the single-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float32 float32_add( float32 a, float32 b ) { flag aSign, bSign; aSign = extractFloat32Sign( a ); bSign = extractFloat32Sign( b ); if ( aSign == bSign ) { return addFloat32Sigs( a, b, aSign ); } else { return subFloat32Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of subtracting the single-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float32 float32_sub( float32 a, float32 b ) { flag aSign, bSign; aSign = extractFloat32Sign( a ); bSign = extractFloat32Sign( b ); if ( aSign == bSign ) { return subFloat32Sigs( a, b, aSign ); } else { return addFloat32Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of multiplying the single-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float32 float32_mul( float32 a, float32 b ) { flag aSign, bSign, zSign; int16 aExp, bExp, zExp; bits32 aSig, bSig; bits64 zSig64; bits32 zSig; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); bSig = extractFloat32Frac( b ); bExp = extractFloat32Exp( b ); bSign = extractFloat32Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0xFF ) { if ( aSig || ( ( bExp == 0xFF ) && bSig ) ) { return propagateFloat32NaN( a, b ); } if ( ( bExp | bSig ) == 0 ) { float_raise( float_flag_invalid ); return float32_default_nan; } return packFloat32( zSign, 0xFF, 0 ); } if ( bExp == 0xFF ) { if ( bSig ) return propagateFloat32NaN( a, b ); if ( ( aExp | aSig ) == 0 ) { float_raise( float_flag_invalid ); return float32_default_nan; } return packFloat32( zSign, 0xFF, 0 ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat32( zSign, 0, 0 ); normalizeFloat32Subnormal( aSig, &aExp, &aSig ); } if ( bExp == 0 ) { if ( bSig == 0 ) return packFloat32( zSign, 0, 0 ); normalizeFloat32Subnormal( bSig, &bExp, &bSig ); } zExp = aExp + bExp - 0x7F; aSig = ( aSig | 0x00800000 )<<7; bSig = ( bSig | 0x00800000 )<<8; shift64RightJamming( ( (bits64) aSig ) * bSig, 32, &zSig64 ); zSig = zSig64; if ( 0 <= (sbits32) ( zSig<<1 ) ) { zSig <<= 1; --zExp; } return roundAndPackFloat32( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Returns the result of dividing the single-precision floating-point value `a' by the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float32 float32_div( float32 a, float32 b ) { flag aSign, bSign, zSign; int16 aExp, bExp, zExp; bits32 aSig, bSig, zSig; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); bSig = extractFloat32Frac( b ); bExp = extractFloat32Exp( b ); bSign = extractFloat32Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0xFF ) { if ( aSig ) return propagateFloat32NaN( a, b ); if ( bExp == 0xFF ) { if ( bSig ) return propagateFloat32NaN( a, b ); float_raise( float_flag_invalid ); return float32_default_nan; } return packFloat32( zSign, 0xFF, 0 ); } if ( bExp == 0xFF ) { if ( bSig ) return propagateFloat32NaN( a, b ); return packFloat32( zSign, 0, 0 ); } if ( bExp == 0 ) { if ( bSig == 0 ) { if ( ( aExp | aSig ) == 0 ) { float_raise( float_flag_invalid ); return float32_default_nan; } float_raise( float_flag_divbyzero ); return packFloat32( zSign, 0xFF, 0 ); } normalizeFloat32Subnormal( bSig, &bExp, &bSig ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat32( zSign, 0, 0 ); normalizeFloat32Subnormal( aSig, &aExp, &aSig ); } zExp = aExp - bExp + 0x7D; aSig = ( aSig | 0x00800000 )<<7; bSig = ( bSig | 0x00800000 )<<8; if ( bSig <= ( aSig + aSig ) ) { aSig >>= 1; ++zExp; } zSig = ( ( (bits64) aSig )<<32 ) / bSig; if ( ( zSig & 0x3F ) == 0 ) { zSig |= ( (bits64) bSig * zSig != ( (bits64) aSig )<<32 ); } return roundAndPackFloat32( zSign, zExp, zSig ); } #ifndef SOFTFLOAT_FOR_GCC /* Not needed */ /* ------------------------------------------------------------------------------- Returns the remainder of the single-precision floating-point value `a' with respect to the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float32 float32_rem( float32 a, float32 b ) { flag aSign, bSign, zSign; int16 aExp, bExp, expDiff; bits32 aSig, bSig; bits32 q; bits64 aSig64, bSig64, q64; bits32 alternateASig; sbits32 sigMean; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); bSig = extractFloat32Frac( b ); bExp = extractFloat32Exp( b ); bSign = extractFloat32Sign( b ); if ( aExp == 0xFF ) { if ( aSig || ( ( bExp == 0xFF ) && bSig ) ) { return propagateFloat32NaN( a, b ); } float_raise( float_flag_invalid ); return float32_default_nan; } if ( bExp == 0xFF ) { if ( bSig ) return propagateFloat32NaN( a, b ); return a; } if ( bExp == 0 ) { if ( bSig == 0 ) { float_raise( float_flag_invalid ); return float32_default_nan; } normalizeFloat32Subnormal( bSig, &bExp, &bSig ); } if ( aExp == 0 ) { if ( aSig == 0 ) return a; normalizeFloat32Subnormal( aSig, &aExp, &aSig ); } expDiff = aExp - bExp; aSig |= 0x00800000; bSig |= 0x00800000; if ( expDiff < 32 ) { aSig <<= 8; bSig <<= 8; if ( expDiff < 0 ) { if ( expDiff < -1 ) return a; aSig >>= 1; } q = ( bSig <= aSig ); if ( q ) aSig -= bSig; if ( 0 < expDiff ) { q = ( ( (bits64) aSig )<<32 ) / bSig; q >>= 32 - expDiff; bSig >>= 2; aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q; } else { aSig >>= 2; bSig >>= 2; } } else { if ( bSig <= aSig ) aSig -= bSig; aSig64 = ( (bits64) aSig )<<40; bSig64 = ( (bits64) bSig )<<40; expDiff -= 64; while ( 0 < expDiff ) { q64 = estimateDiv128To64( aSig64, 0, bSig64 ); q64 = ( 2 < q64 ) ? q64 - 2 : 0; aSig64 = - ( ( bSig * q64 )<<38 ); expDiff -= 62; } expDiff += 64; q64 = estimateDiv128To64( aSig64, 0, bSig64 ); q64 = ( 2 < q64 ) ? 
q64 - 2 : 0; q = q64>>( 64 - expDiff ); bSig <<= 6; aSig = ( ( aSig64>>33 )<<( expDiff - 1 ) ) - bSig * q; } do { alternateASig = aSig; ++q; aSig -= bSig; } while ( 0 <= (sbits32) aSig ); sigMean = aSig + alternateASig; if ( ( sigMean < 0 ) || ( ( sigMean == 0 ) && ( q & 1 ) ) ) { aSig = alternateASig; } zSign = ( (sbits32) aSig < 0 ); if ( zSign ) aSig = - aSig; return normalizeRoundAndPackFloat32( aSign ^ zSign, bExp, aSig ); } #endif /* !SOFTFLOAT_FOR_GCC */ #ifndef SOFTFLOAT_FOR_GCC /* Not needed */ /* ------------------------------------------------------------------------------- Returns the square root of the single-precision floating-point value `a'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float32 float32_sqrt( float32 a ) { flag aSign; int16 aExp, zExp; bits32 aSig, zSig; bits64 rem, term; aSig = extractFloat32Frac( a ); aExp = extractFloat32Exp( a ); aSign = extractFloat32Sign( a ); if ( aExp == 0xFF ) { if ( aSig ) return propagateFloat32NaN( a, 0 ); if ( ! 
aSign ) return a; float_raise( float_flag_invalid ); return float32_default_nan; } if ( aSign ) { if ( ( aExp | aSig ) == 0 ) return a; float_raise( float_flag_invalid ); return float32_default_nan; } if ( aExp == 0 ) { if ( aSig == 0 ) return 0; normalizeFloat32Subnormal( aSig, &aExp, &aSig ); } zExp = ( ( aExp - 0x7F )>>1 ) + 0x7E; aSig = ( aSig | 0x00800000 )<<8; zSig = estimateSqrt32( aExp, aSig ) + 2; if ( ( zSig & 0x7F ) <= 5 ) { if ( zSig < 2 ) { zSig = 0x7FFFFFFF; goto roundAndPack; } aSig >>= aExp & 1; term = ( (bits64) zSig ) * zSig; rem = ( ( (bits64) aSig )<<32 ) - term; while ( (sbits64) rem < 0 ) { --zSig; rem += ( ( (bits64) zSig )<<1 ) | 1; } zSig |= ( rem != 0 ); } shift32RightJamming( zSig, 1, &zSig ); roundAndPack: return roundAndPackFloat32( 0, zExp, zSig ); } #endif /* !SOFTFLOAT_FOR_GCC */ /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float32_eq( float32 a, float32 b ) { if ( ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) ) || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) ) ) { if ( float32_is_signaling_nan( a ) || float32_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } return ( a == b ) || ( (bits32) ( ( a | b )<<1 ) == 0 ); } /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float32_le( float32 a, float32 b ) { flag aSign, bSign; if ( ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) ) || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloat32Sign( a ); bSign = extractFloat32Sign( b ); if ( aSign != bSign ) return aSign || ( (bits32) ( ( a | b )<<1 ) == 0 ); return ( a == b ) || ( aSign ^ ( a < b ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float32_lt( float32 a, float32 b ) { flag aSign, bSign; if ( ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) ) || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloat32Sign( a ); bSign = extractFloat32Sign( b ); if ( aSign != bSign ) return aSign && ( (bits32) ( ( a | b )<<1 ) != 0 ); return ( a != b ) && ( aSign ^ ( a < b ) ); } #ifndef SOFTFLOAT_FOR_GCC /* Not needed */ /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The invalid exception is raised if either operand is a NaN. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float32_eq_signaling( float32 a, float32 b ) { if ( ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) ) || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) ) ) { float_raise( float_flag_invalid ); return 0; } return ( a == b ) || ( (bits32) ( ( a | b )<<1 ) == 0 ); } /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float32_le_quiet( float32 a, float32 b ) { flag aSign, bSign; if ( ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) ) || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) ) ) { if ( float32_is_signaling_nan( a ) || float32_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloat32Sign( a ); bSign = extractFloat32Sign( b ); if ( aSign != bSign ) return aSign || ( (bits32) ( ( a | b )<<1 ) == 0 ); return ( a == b ) || ( aSign ^ ( a < b ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float32_lt_quiet( float32 a, float32 b ) { flag aSign, bSign; if ( ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) ) || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) ) ) { if ( float32_is_signaling_nan( a ) || float32_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloat32Sign( a ); bSign = extractFloat32Sign( b ); if ( aSign != bSign ) return aSign && ( (bits32) ( ( a | b )<<1 ) != 0 ); return ( a != b ) && ( aSign ^ ( a < b ) ); } #endif /* !SOFTFLOAT_FOR_GCC */ #ifndef SOFTFLOAT_FOR_GCC /* Not needed */ /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point value `a' to the 32-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic---which means in particular that the conversion is rounded according to the current rounding mode. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int32 float64_to_int32( float64 a ) { flag aSign; int16 aExp, shiftCount; bits64 aSig; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if ( ( aExp == 0x7FF ) && aSig ) aSign = 0; if ( aExp ) aSig |= LIT64( 0x0010000000000000 ); shiftCount = 0x42C - aExp; if ( 0 < shiftCount ) shift64RightJamming( aSig, shiftCount, &aSig ); return roundAndPackInt32( aSign, aSig ); } #endif /* !SOFTFLOAT_FOR_GCC */ /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point value `a' to the 32-bit two's complement integer format. 
The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, except that the conversion is always rounded toward zero. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int32 float64_to_int32_round_to_zero( float64 a ) { flag aSign; int16 aExp, shiftCount; bits64 aSig, savedASig; int32 z; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if ( 0x41E < aExp ) { if ( ( aExp == 0x7FF ) && aSig ) aSign = 0; goto invalid; } else if ( aExp < 0x3FF ) { if ( aExp || aSig ) float_exception_flags |= float_flag_inexact; return 0; } aSig |= LIT64( 0x0010000000000000 ); shiftCount = 0x433 - aExp; savedASig = aSig; aSig >>= shiftCount; z = aSig; if ( aSign ) z = - z; if ( ( z < 0 ) ^ aSign ) { invalid: float_raise( float_flag_invalid ); return aSign ? (sbits32) 0x80000000 : 0x7FFFFFFF; } if ( ( aSig<<shiftCount ) != savedASig ) { float_exception_flags |= float_flag_inexact; } return z; } /* ... */ z = aSig>>( - shiftCount ); if ( (bits64) ( aSig<<( shiftCount & 63 ) ) ) { float_exception_flags |= float_flag_inexact; } } if ( aSign ) z = - z; return z; } #endif /* !SOFTFLOAT_FOR_GCC */ /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point value `a' to the single-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float32 float64_to_float32( float64 a ) { flag aSign; int16 aExp; bits64 aSig; bits32 zSig; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if ( aExp == 0x7FF ) { if ( aSig ) return commonNaNToFloat32( float64ToCommonNaN( a ) ); return packFloat32( aSign, 0xFF, 0 ); } shift64RightJamming( aSig, 22, &aSig ); zSig = aSig; if ( aExp || zSig ) { zSig |= 0x40000000; aExp -= 0x381; } return roundAndPackFloat32( aSign, aExp, zSig ); } #ifdef FLOATX80 /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point value `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ floatx80 float64_to_floatx80( float64 a ) { flag aSign; int16 aExp; bits64 aSig; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if ( aExp == 0x7FF ) { if ( aSig ) return commonNaNToFloatx80( float64ToCommonNaN( a ) ); return packFloatx80( aSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloatx80( aSign, 0, 0 ); normalizeFloat64Subnormal( aSig, &aExp, &aSig ); } return packFloatx80( aSign, aExp + 0x3C00, ( aSig | LIT64( 0x0010000000000000 ) )<<11 ); } #endif #ifdef FLOAT128 /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point value `a' to the quadruple-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 float64_to_float128( float64 a ) { flag aSign; int16 aExp; bits64 aSig, zSig0, zSig1; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if ( aExp == 0x7FF ) { if ( aSig ) return commonNaNToFloat128( float64ToCommonNaN( a ) ); return packFloat128( aSign, 0x7FFF, 0, 0 ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat128( aSign, 0, 0, 0 ); normalizeFloat64Subnormal( aSig, &aExp, &aSig ); --aExp; } shift128Right( aSig, 0, 4, &zSig0, &zSig1 ); return packFloat128( aSign, aExp + 0x3C00, zSig0, zSig1 ); } #endif #ifndef SOFTFLOAT_FOR_GCC /* ------------------------------------------------------------------------------- Rounds the double-precision floating-point value `a' to an integer, and returns the result as a double-precision floating-point value. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 float64_round_to_int( float64 a ) { flag aSign; int16 aExp; bits64 lastBitMask, roundBitsMask; int8 roundingMode; float64 z; aExp = extractFloat64Exp( a ); if ( 0x433 <= aExp ) { if ( ( aExp == 0x7FF ) && extractFloat64Frac( a ) ) { return propagateFloat64NaN( a, a ); } return a; } if ( aExp < 0x3FF ) { if ( (bits64) ( a<<1 ) == 0 ) return a; float_exception_flags |= float_flag_inexact; aSign = extractFloat64Sign( a ); switch ( float_rounding_mode ) { case float_round_nearest_even: if ( ( aExp == 0x3FE ) && extractFloat64Frac( a ) ) { return packFloat64( aSign, 0x3FF, 0 ); } break; case float_round_to_zero: break; case float_round_down: return aSign ? LIT64( 0xBFF0000000000000 ) : 0; case float_round_up: return aSign ? 
LIT64( 0x8000000000000000 ) : LIT64( 0x3FF0000000000000 ); } return packFloat64( aSign, 0, 0 ); } lastBitMask = 1; lastBitMask <<= 0x433 - aExp; roundBitsMask = lastBitMask - 1; z = a; roundingMode = float_rounding_mode; if ( roundingMode == float_round_nearest_even ) { z += lastBitMask>>1; if ( ( z & roundBitsMask ) == 0 ) z &= ~ lastBitMask; } else if ( roundingMode != float_round_to_zero ) { if ( extractFloat64Sign( z ) ^ ( roundingMode == float_round_up ) ) { z += roundBitsMask; } } z &= ~ roundBitsMask; if ( z != a ) float_exception_flags |= float_flag_inexact; return z; } #endif /* ------------------------------------------------------------------------------- Returns the result of adding the absolute values of the double-precision floating-point values `a' and `b'. If `zSign' is 1, the sum is negated before being returned. `zSign' is ignored if the result is a NaN. The addition is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign ) { int16 aExp, bExp, zExp; bits64 aSig, bSig, zSig; int16 expDiff; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); bSig = extractFloat64Frac( b ); bExp = extractFloat64Exp( b ); expDiff = aExp - bExp; aSig <<= 9; bSig <<= 9; if ( 0 < expDiff ) { if ( aExp == 0x7FF ) { if ( aSig ) return propagateFloat64NaN( a, b ); return a; } if ( bExp == 0 ) { --expDiff; } else { bSig |= LIT64( 0x2000000000000000 ); } shift64RightJamming( bSig, expDiff, &bSig ); zExp = aExp; } else if ( expDiff < 0 ) { if ( bExp == 0x7FF ) { if ( bSig ) return propagateFloat64NaN( a, b ); return packFloat64( zSign, 0x7FF, 0 ); } if ( aExp == 0 ) { ++expDiff; } else { aSig |= LIT64( 0x2000000000000000 ); } shift64RightJamming( aSig, - expDiff, &aSig ); zExp = bExp; } else { if ( aExp == 0x7FF ) { if ( aSig | bSig ) return propagateFloat64NaN( a, b ); return a; } if ( aExp 
== 0 ) return packFloat64( zSign, 0, ( aSig + bSig )>>9 ); zSig = LIT64( 0x4000000000000000 ) + aSig + bSig; zExp = aExp; goto roundAndPack; } aSig |= LIT64( 0x2000000000000000 ); zSig = ( aSig + bSig )<<1; --zExp; if ( (sbits64) zSig < 0 ) { zSig = aSig + bSig; ++zExp; } roundAndPack: return roundAndPackFloat64( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Returns the result of subtracting the absolute values of the double- precision floating-point values `a' and `b'. If `zSign' is 1, the difference is negated before being returned. `zSign' is ignored if the result is a NaN. The subtraction is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign ) { int16 aExp, bExp, zExp; bits64 aSig, bSig, zSig; int16 expDiff; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); bSig = extractFloat64Frac( b ); bExp = extractFloat64Exp( b ); expDiff = aExp - bExp; aSig <<= 10; bSig <<= 10; if ( 0 < expDiff ) goto aExpBigger; if ( expDiff < 0 ) goto bExpBigger; if ( aExp == 0x7FF ) { if ( aSig | bSig ) return propagateFloat64NaN( a, b ); float_raise( float_flag_invalid ); return float64_default_nan; } if ( aExp == 0 ) { aExp = 1; bExp = 1; } if ( bSig < aSig ) goto aBigger; if ( aSig < bSig ) goto bBigger; return packFloat64( float_rounding_mode == float_round_down, 0, 0 ); bExpBigger: if ( bExp == 0x7FF ) { if ( bSig ) return propagateFloat64NaN( a, b ); return packFloat64( zSign ^ 1, 0x7FF, 0 ); } if ( aExp == 0 ) { ++expDiff; } else { aSig |= LIT64( 0x4000000000000000 ); } shift64RightJamming( aSig, - expDiff, &aSig ); bSig |= LIT64( 0x4000000000000000 ); bBigger: zSig = bSig - aSig; zExp = bExp; zSign ^= 1; goto normalizeRoundAndPack; aExpBigger: if ( aExp == 0x7FF ) { if ( aSig ) return propagateFloat64NaN( a, b ); return a; } if ( 
bExp == 0 ) { --expDiff; } else { bSig |= LIT64( 0x4000000000000000 ); } shift64RightJamming( bSig, expDiff, &bSig ); aSig |= LIT64( 0x4000000000000000 ); aBigger: zSig = aSig - bSig; zExp = aExp; normalizeRoundAndPack: --zExp; return normalizeRoundAndPackFloat64( zSign, zExp, zSig ); } /* ------------------------------------------------------------------------------- Returns the result of adding the double-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 float64_add( float64 a, float64 b ) { flag aSign, bSign; aSign = extractFloat64Sign( a ); bSign = extractFloat64Sign( b ); if ( aSign == bSign ) { return addFloat64Sigs( a, b, aSign ); } else { return subFloat64Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of subtracting the double-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 float64_sub( float64 a, float64 b ) { flag aSign, bSign; aSign = extractFloat64Sign( a ); bSign = extractFloat64Sign( b ); if ( aSign == bSign ) { return subFloat64Sigs( a, b, aSign ); } else { return addFloat64Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of multiplying the double-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float64 float64_mul( float64 a, float64 b ) { flag aSign, bSign, zSign; int16 aExp, bExp, zExp; bits64 aSig, bSig, zSig0, zSig1; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); bSig = extractFloat64Frac( b ); bExp = extractFloat64Exp( b ); bSign = extractFloat64Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0x7FF ) { if ( aSig || ( ( bExp == 0x7FF ) && bSig ) ) { return propagateFloat64NaN( a, b ); } if ( ( bExp | bSig ) == 0 ) { float_raise( float_flag_invalid ); return float64_default_nan; } return packFloat64( zSign, 0x7FF, 0 ); } if ( bExp == 0x7FF ) { if ( bSig ) return propagateFloat64NaN( a, b ); if ( ( aExp | aSig ) == 0 ) { float_raise( float_flag_invalid ); return float64_default_nan; } return packFloat64( zSign, 0x7FF, 0 ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat64( zSign, 0, 0 ); normalizeFloat64Subnormal( aSig, &aExp, &aSig ); } if ( bExp == 0 ) { if ( bSig == 0 ) return packFloat64( zSign, 0, 0 ); normalizeFloat64Subnormal( bSig, &bExp, &bSig ); } zExp = aExp + bExp - 0x3FF; aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10; bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11; mul64To128( aSig, bSig, &zSig0, &zSig1 ); zSig0 |= ( zSig1 != 0 ); if ( 0 <= (sbits64) ( zSig0<<1 ) ) { zSig0 <<= 1; --zExp; } return roundAndPackFloat64( zSign, zExp, zSig0 ); } /* ------------------------------------------------------------------------------- Returns the result of dividing the double-precision floating-point value `a' by the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float64 float64_div( float64 a, float64 b ) { flag aSign, bSign, zSign; int16 aExp, bExp, zExp; bits64 aSig, bSig, zSig; bits64 rem0, rem1; bits64 term0, term1; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); bSig = extractFloat64Frac( b ); bExp = extractFloat64Exp( b ); bSign = extractFloat64Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0x7FF ) { if ( aSig ) return propagateFloat64NaN( a, b ); if ( bExp == 0x7FF ) { if ( bSig ) return propagateFloat64NaN( a, b ); float_raise( float_flag_invalid ); return float64_default_nan; } return packFloat64( zSign, 0x7FF, 0 ); } if ( bExp == 0x7FF ) { if ( bSig ) return propagateFloat64NaN( a, b ); return packFloat64( zSign, 0, 0 ); } if ( bExp == 0 ) { if ( bSig == 0 ) { if ( ( aExp | aSig ) == 0 ) { float_raise( float_flag_invalid ); return float64_default_nan; } float_raise( float_flag_divbyzero ); return packFloat64( zSign, 0x7FF, 0 ); } normalizeFloat64Subnormal( bSig, &bExp, &bSig ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloat64( zSign, 0, 0 ); normalizeFloat64Subnormal( aSig, &aExp, &aSig ); } zExp = aExp - bExp + 0x3FD; aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10; bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11; if ( bSig <= ( aSig + aSig ) ) { aSig >>= 1; ++zExp; } zSig = estimateDiv128To64( aSig, 0, bSig ); if ( ( zSig & 0x1FF ) <= 2 ) { mul64To128( bSig, zSig, &term0, &term1 ); sub128( aSig, 0, term0, term1, &rem0, &rem1 ); while ( (sbits64) rem0 < 0 ) { --zSig; add128( rem0, rem1, 0, bSig, &rem0, &rem1 ); } zSig |= ( rem1 != 0 ); } return roundAndPackFloat64( zSign, zExp, zSig ); } #ifndef SOFTFLOAT_FOR_GCC /* ------------------------------------------------------------------------------- Returns the remainder of the double-precision floating-point value `a' with respect to the corresponding value `b'. 
The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 float64_rem( float64 a, float64 b ) { flag aSign, bSign, zSign; int16 aExp, bExp, expDiff; bits64 aSig, bSig; bits64 q, alternateASig; sbits64 sigMean; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); bSig = extractFloat64Frac( b ); bExp = extractFloat64Exp( b ); bSign = extractFloat64Sign( b ); if ( aExp == 0x7FF ) { if ( aSig || ( ( bExp == 0x7FF ) && bSig ) ) { return propagateFloat64NaN( a, b ); } float_raise( float_flag_invalid ); return float64_default_nan; } if ( bExp == 0x7FF ) { if ( bSig ) return propagateFloat64NaN( a, b ); return a; } if ( bExp == 0 ) { if ( bSig == 0 ) { float_raise( float_flag_invalid ); return float64_default_nan; } normalizeFloat64Subnormal( bSig, &bExp, &bSig ); } if ( aExp == 0 ) { if ( aSig == 0 ) return a; normalizeFloat64Subnormal( aSig, &aExp, &aSig ); } expDiff = aExp - bExp; aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<11; bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11; if ( expDiff < 0 ) { if ( expDiff < -1 ) return a; aSig >>= 1; } q = ( bSig <= aSig ); if ( q ) aSig -= bSig; expDiff -= 64; while ( 0 < expDiff ) { q = estimateDiv128To64( aSig, 0, bSig ); q = ( 2 < q ) ? q - 2 : 0; aSig = - ( ( bSig>>2 ) * q ); expDiff -= 62; } expDiff += 64; if ( 0 < expDiff ) { q = estimateDiv128To64( aSig, 0, bSig ); q = ( 2 < q ) ? 
q - 2 : 0; q >>= 64 - expDiff; bSig >>= 2; aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q; } else { aSig >>= 2; bSig >>= 2; } do { alternateASig = aSig; ++q; aSig -= bSig; } while ( 0 <= (sbits64) aSig ); sigMean = aSig + alternateASig; if ( ( sigMean < 0 ) || ( ( sigMean == 0 ) && ( q & 1 ) ) ) { aSig = alternateASig; } zSign = ( (sbits64) aSig < 0 ); if ( zSign ) aSig = - aSig; return normalizeRoundAndPackFloat64( aSign ^ zSign, bExp, aSig ); } /* ------------------------------------------------------------------------------- Returns the square root of the double-precision floating-point value `a'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 float64_sqrt( float64 a ) { flag aSign; int16 aExp, zExp; bits64 aSig, zSig, doubleZSig; bits64 rem0, rem1, term0, term1; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if ( aExp == 0x7FF ) { if ( aSig ) return propagateFloat64NaN( a, a ); if ( ! 
aSign ) return a; float_raise( float_flag_invalid ); return float64_default_nan; } if ( aSign ) { if ( ( aExp | aSig ) == 0 ) return a; float_raise( float_flag_invalid ); return float64_default_nan; } if ( aExp == 0 ) { if ( aSig == 0 ) return 0; normalizeFloat64Subnormal( aSig, &aExp, &aSig ); } zExp = ( ( aExp - 0x3FF )>>1 ) + 0x3FE; aSig |= LIT64( 0x0010000000000000 ); zSig = estimateSqrt32( aExp, aSig>>21 ); aSig <<= 9 - ( aExp & 1 ); zSig = estimateDiv128To64( aSig, 0, zSig<<32 ) + ( zSig<<30 ); if ( ( zSig & 0x1FF ) <= 5 ) { doubleZSig = zSig<<1; mul64To128( zSig, zSig, &term0, &term1 ); sub128( aSig, 0, term0, term1, &rem0, &rem1 ); while ( (sbits64) rem0 < 0 ) { --zSig; doubleZSig -= 2; add128( rem0, rem1, zSig>>63, doubleZSig | 1, &rem0, &rem1 ); } zSig |= ( ( rem0 | rem1 ) != 0 ); } return roundAndPackFloat64( 0, zExp, zSig ); } #endif /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float64_eq( float64 a, float64 b ) { if ( ( ( extractFloat64Exp( a ) == 0x7FF ) && extractFloat64Frac( a ) ) || ( ( extractFloat64Exp( b ) == 0x7FF ) && extractFloat64Frac( b ) ) ) { if ( float64_is_signaling_nan( a ) || float64_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } return ( a == b ) || ( (bits64) ( ( FLOAT64_DEMANGLE(a) | FLOAT64_DEMANGLE(b) )<<1 ) == 0 ); } /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float64_le( float64 a, float64 b ) { flag aSign, bSign; if ( ( ( extractFloat64Exp( a ) == 0x7FF ) && extractFloat64Frac( a ) ) || ( ( extractFloat64Exp( b ) == 0x7FF ) && extractFloat64Frac( b ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloat64Sign( a ); bSign = extractFloat64Sign( b ); if ( aSign != bSign ) return aSign || ( (bits64) ( ( FLOAT64_DEMANGLE(a) | FLOAT64_DEMANGLE(b) )<<1 ) == 0 ); return ( a == b ) || ( aSign ^ ( FLOAT64_DEMANGLE(a) < FLOAT64_DEMANGLE(b) ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float64_lt( float64 a, float64 b ) { flag aSign, bSign; if ( ( ( extractFloat64Exp( a ) == 0x7FF ) && extractFloat64Frac( a ) ) || ( ( extractFloat64Exp( b ) == 0x7FF ) && extractFloat64Frac( b ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloat64Sign( a ); bSign = extractFloat64Sign( b ); if ( aSign != bSign ) return aSign && ( (bits64) ( ( FLOAT64_DEMANGLE(a) | FLOAT64_DEMANGLE(b) )<<1 ) != 0 ); return ( a != b ) && ( aSign ^ ( FLOAT64_DEMANGLE(a) < FLOAT64_DEMANGLE(b) ) ); } #ifndef SOFTFLOAT_FOR_GCC /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The invalid exception is raised if either operand is a NaN. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float64_eq_signaling( float64 a, float64 b ) { if ( ( ( extractFloat64Exp( a ) == 0x7FF ) && extractFloat64Frac( a ) ) || ( ( extractFloat64Exp( b ) == 0x7FF ) && extractFloat64Frac( b ) ) ) { float_raise( float_flag_invalid ); return 0; } return ( a == b ) || ( (bits64) ( ( a | b )<<1 ) == 0 ); } /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float64_le_quiet( float64 a, float64 b ) { flag aSign, bSign; if ( ( ( extractFloat64Exp( a ) == 0x7FF ) && extractFloat64Frac( a ) ) || ( ( extractFloat64Exp( b ) == 0x7FF ) && extractFloat64Frac( b ) ) ) { if ( float64_is_signaling_nan( a ) || float64_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloat64Sign( a ); bSign = extractFloat64Sign( b ); if ( aSign != bSign ) return aSign || ( (bits64) ( ( a | b )<<1 ) == 0 ); return ( a == b ) || ( aSign ^ ( a < b ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float64_lt_quiet( float64 a, float64 b ) { flag aSign, bSign; if ( ( ( extractFloat64Exp( a ) == 0x7FF ) && extractFloat64Frac( a ) ) || ( ( extractFloat64Exp( b ) == 0x7FF ) && extractFloat64Frac( b ) ) ) { if ( float64_is_signaling_nan( a ) || float64_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloat64Sign( a ); bSign = extractFloat64Sign( b ); if ( aSign != bSign ) return aSign && ( (bits64) ( ( a | b )<<1 ) != 0 ); return ( a != b ) && ( aSign ^ ( a < b ) ); } #endif #ifdef FLOATX80 /* ------------------------------------------------------------------------------- Returns the result of converting the extended double-precision floating- point value `a' to the 32-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic---which means in particular that the conversion is rounded according to the current rounding mode. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int32 floatx80_to_int32( floatx80 a ) { flag aSign; int32 aExp, shiftCount; bits64 aSig; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); if ( ( aExp == 0x7FFF ) && (bits64) ( aSig<<1 ) ) aSign = 0; shiftCount = 0x4037 - aExp; if ( shiftCount <= 0 ) shiftCount = 1; shift64RightJamming( aSig, shiftCount, &aSig ); return roundAndPackInt32( aSign, aSig ); } /* ------------------------------------------------------------------------------- Returns the result of converting the extended double-precision floating- point value `a' to the 32-bit two's complement integer format. 
The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, except that the conversion is always rounded toward zero. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int32 floatx80_to_int32_round_to_zero( floatx80 a ) { flag aSign; int32 aExp, shiftCount; bits64 aSig, savedASig; int32 z; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); if ( 0x401E < aExp ) { if ( ( aExp == 0x7FFF ) && (bits64) ( aSig<<1 ) ) aSign = 0; goto invalid; } else if ( aExp < 0x3FFF ) { if ( aExp || aSig ) float_exception_flags |= float_flag_inexact; return 0; } shiftCount = 0x403E - aExp; savedASig = aSig; aSig >>= shiftCount; z = aSig; if ( aSign ) z = - z; if ( ( z < 0 ) ^ aSign ) { invalid: float_raise( float_flag_invalid ); return aSign ? (sbits32) 0x80000000 : 0x7FFFFFFF; } if ( ( aSig<>( - shiftCount ); if ( (bits64) ( aSig<<( shiftCount & 63 ) ) ) { float_exception_flags |= float_flag_inexact; } if ( aSign ) z = - z; return z; } /* ------------------------------------------------------------------------------- Returns the result of converting the extended double-precision floating- point value `a' to the single-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float32 floatx80_to_float32( floatx80 a ) { flag aSign; int32 aExp; bits64 aSig; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig<<1 ) ) { return commonNaNToFloat32( floatx80ToCommonNaN( a ) ); } return packFloat32( aSign, 0xFF, 0 ); } shift64RightJamming( aSig, 33, &aSig ); if ( aExp || aSig ) aExp -= 0x3F81; return roundAndPackFloat32( aSign, aExp, aSig ); } /* ------------------------------------------------------------------------------- Returns the result of converting the extended double-precision floating- point value `a' to the double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float64 floatx80_to_float64( floatx80 a ) { flag aSign; int32 aExp; bits64 aSig, zSig; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig<<1 ) ) { return commonNaNToFloat64( floatx80ToCommonNaN( a ) ); } return packFloat64( aSign, 0x7FF, 0 ); } shift64RightJamming( aSig, 1, &zSig ); if ( aExp || aSig ) aExp -= 0x3C01; return roundAndPackFloat64( aSign, aExp, zSig ); } #ifdef FLOAT128 /* ------------------------------------------------------------------------------- Returns the result of converting the extended double-precision floating- point value `a' to the quadruple-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 floatx80_to_float128( floatx80 a ) { flag aSign; int16 aExp; bits64 aSig, zSig0, zSig1; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); if ( ( aExp == 0x7FFF ) && (bits64) ( aSig<<1 ) ) { return commonNaNToFloat128( floatx80ToCommonNaN( a ) ); } shift128Right( aSig<<1, 0, 16, &zSig0, &zSig1 ); return packFloat128( aSign, aExp, zSig0, zSig1 ); } #endif /* ------------------------------------------------------------------------------- Rounds the extended double-precision floating-point value `a' to an integer, and returns the result as an extended double-precision floating-point value. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ floatx80 floatx80_round_to_int( floatx80 a ) { flag aSign; int32 aExp; bits64 lastBitMask, roundBitsMask; int8 roundingMode; floatx80 z; aExp = extractFloatx80Exp( a ); if ( 0x403E <= aExp ) { if ( ( aExp == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) { return propagateFloatx80NaN( a, a ); } return a; } if ( aExp < 0x3FFF ) { if ( ( aExp == 0 ) && ( (bits64) ( extractFloatx80Frac( a )<<1 ) == 0 ) ) { return a; } float_exception_flags |= float_flag_inexact; aSign = extractFloatx80Sign( a ); switch ( float_rounding_mode ) { case float_round_nearest_even: if ( ( aExp == 0x3FFE ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) { return packFloatx80( aSign, 0x3FFF, LIT64( 0x8000000000000000 ) ); } break; case float_round_to_zero: break; case float_round_down: return aSign ? packFloatx80( 1, 0x3FFF, LIT64( 0x8000000000000000 ) ) : packFloatx80( 0, 0, 0 ); case float_round_up: return aSign ? 
packFloatx80( 1, 0, 0 ) : packFloatx80( 0, 0x3FFF, LIT64( 0x8000000000000000 ) ); } return packFloatx80( aSign, 0, 0 ); } lastBitMask = 1; lastBitMask <<= 0x403E - aExp; roundBitsMask = lastBitMask - 1; z = a; roundingMode = float_rounding_mode; if ( roundingMode == float_round_nearest_even ) { z.low += lastBitMask>>1; if ( ( z.low & roundBitsMask ) == 0 ) z.low &= ~ lastBitMask; } else if ( roundingMode != float_round_to_zero ) { if ( extractFloatx80Sign( z ) ^ ( roundingMode == float_round_up ) ) { z.low += roundBitsMask; } } z.low &= ~ roundBitsMask; if ( z.low == 0 ) { ++z.high; z.low = LIT64( 0x8000000000000000 ); } if ( z.low != a.low ) float_exception_flags |= float_flag_inexact; return z; } /* ------------------------------------------------------------------------------- Returns the result of adding the absolute values of the extended double- precision floating-point values `a' and `b'. If `zSign' is 1, the sum is negated before being returned. `zSign' is ignored if the result is a NaN. The addition is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign ) { int32 aExp, bExp, zExp; bits64 aSig, bSig, zSig0, zSig1; int32 expDiff; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); bSig = extractFloatx80Frac( b ); bExp = extractFloatx80Exp( b ); expDiff = aExp - bExp; if ( 0 < expDiff ) { if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig<<1 ) ) return propagateFloatx80NaN( a, b ); return a; } if ( bExp == 0 ) --expDiff; shift64ExtraRightJamming( bSig, 0, expDiff, &bSig, &zSig1 ); zExp = aExp; } else if ( expDiff < 0 ) { if ( bExp == 0x7FFF ) { if ( (bits64) ( bSig<<1 ) ) return propagateFloatx80NaN( a, b ); return packFloatx80( zSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( aExp == 0 ) ++expDiff; shift64ExtraRightJamming( aSig, 0, - expDiff, &aSig, &zSig1 ); zExp = bExp; } else { if ( aExp == 0x7FFF ) { if ( (bits64) ( ( aSig | bSig )<<1 ) ) { return propagateFloatx80NaN( a, b ); } return a; } zSig1 = 0; zSig0 = aSig + bSig; if ( aExp == 0 ) { normalizeFloatx80Subnormal( zSig0, &zExp, &zSig0 ); goto roundAndPack; } zExp = aExp; goto shiftRight1; } zSig0 = aSig + bSig; if ( (sbits64) zSig0 < 0 ) goto roundAndPack; shiftRight1: shift64ExtraRightJamming( zSig0, zSig1, 1, &zSig0, &zSig1 ); zSig0 |= LIT64( 0x8000000000000000 ); ++zExp; roundAndPack: return roundAndPackFloatx80( floatx80_rounding_precision, zSign, zExp, zSig0, zSig1 ); } /* ------------------------------------------------------------------------------- Returns the result of subtracting the absolute values of the extended double-precision floating-point values `a' and `b'. If `zSign' is 1, the difference is negated before being returned. `zSign' is ignored if the result is a NaN. The subtraction is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign ) { int32 aExp, bExp, zExp; bits64 aSig, bSig, zSig0, zSig1; int32 expDiff; floatx80 z; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); bSig = extractFloatx80Frac( b ); bExp = extractFloatx80Exp( b ); expDiff = aExp - bExp; if ( 0 < expDiff ) goto aExpBigger; if ( expDiff < 0 ) goto bExpBigger; if ( aExp == 0x7FFF ) { if ( (bits64) ( ( aSig | bSig )<<1 ) ) { return propagateFloatx80NaN( a, b ); } float_raise( float_flag_invalid ); z.low = floatx80_default_nan_low; z.high = floatx80_default_nan_high; return z; } if ( aExp == 0 ) { aExp = 1; bExp = 1; } zSig1 = 0; if ( bSig < aSig ) goto aBigger; if ( aSig < bSig ) goto bBigger; return packFloatx80( float_rounding_mode == float_round_down, 0, 0 ); bExpBigger: if ( bExp == 0x7FFF ) { if ( (bits64) ( bSig<<1 ) ) return propagateFloatx80NaN( a, b ); return packFloatx80( zSign ^ 1, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( aExp == 0 ) ++expDiff; shift128RightJamming( aSig, 0, - expDiff, &aSig, &zSig1 ); bBigger: sub128( bSig, 0, aSig, zSig1, &zSig0, &zSig1 ); zExp = bExp; zSign ^= 1; goto normalizeRoundAndPack; aExpBigger: if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig<<1 ) ) return propagateFloatx80NaN( a, b ); return a; } if ( bExp == 0 ) --expDiff; shift128RightJamming( bSig, 0, expDiff, &bSig, &zSig1 ); aBigger: sub128( aSig, 0, bSig, zSig1, &zSig0, &zSig1 ); zExp = aExp; normalizeRoundAndPack: return normalizeRoundAndPackFloatx80( floatx80_rounding_precision, zSign, zExp, zSig0, zSig1 ); } /* ------------------------------------------------------------------------------- Returns the result of adding the extended double-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ floatx80 floatx80_add( floatx80 a, floatx80 b ) { flag aSign, bSign; aSign = extractFloatx80Sign( a ); bSign = extractFloatx80Sign( b ); if ( aSign == bSign ) { return addFloatx80Sigs( a, b, aSign ); } else { return subFloatx80Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of subtracting the extended double-precision floating- point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ floatx80 floatx80_sub( floatx80 a, floatx80 b ) { flag aSign, bSign; aSign = extractFloatx80Sign( a ); bSign = extractFloatx80Sign( b ); if ( aSign == bSign ) { return subFloatx80Sigs( a, b, aSign ); } else { return addFloatx80Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of multiplying the extended double-precision floating- point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ floatx80 floatx80_mul( floatx80 a, floatx80 b ) { flag aSign, bSign, zSign; int32 aExp, bExp, zExp; bits64 aSig, bSig, zSig0, zSig1; floatx80 z; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); bSig = extractFloatx80Frac( b ); bExp = extractFloatx80Exp( b ); bSign = extractFloatx80Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig<<1 ) || ( ( bExp == 0x7FFF ) && (bits64) ( bSig<<1 ) ) ) { return propagateFloatx80NaN( a, b ); } if ( ( bExp | bSig ) == 0 ) goto invalid; return packFloatx80( zSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( bExp == 0x7FFF ) { if ( (bits64) ( bSig<<1 ) ) return propagateFloatx80NaN( a, b ); if ( ( aExp | aSig ) == 0 ) { invalid: float_raise( float_flag_invalid ); z.low = floatx80_default_nan_low; z.high = floatx80_default_nan_high; return z; } return packFloatx80( zSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloatx80( zSign, 0, 0 ); normalizeFloatx80Subnormal( aSig, &aExp, &aSig ); } if ( bExp == 0 ) { if ( bSig == 0 ) return packFloatx80( zSign, 0, 0 ); normalizeFloatx80Subnormal( bSig, &bExp, &bSig ); } zExp = aExp + bExp - 0x3FFE; mul64To128( aSig, bSig, &zSig0, &zSig1 ); if ( 0 < (sbits64) zSig0 ) { shortShift128Left( zSig0, zSig1, 1, &zSig0, &zSig1 ); --zExp; } return roundAndPackFloatx80( floatx80_rounding_precision, zSign, zExp, zSig0, zSig1 ); } /* ------------------------------------------------------------------------------- Returns the result of dividing the extended double-precision floating-point value `a' by the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ floatx80 floatx80_div( floatx80 a, floatx80 b ) { flag aSign, bSign, zSign; int32 aExp, bExp, zExp; bits64 aSig, bSig, zSig0, zSig1; bits64 rem0, rem1, rem2, term0, term1, term2; floatx80 z; aSig = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); bSig = extractFloatx80Frac( b ); bExp = extractFloatx80Exp( b ); bSign = extractFloatx80Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig<<1 ) ) return propagateFloatx80NaN( a, b ); if ( bExp == 0x7FFF ) { if ( (bits64) ( bSig<<1 ) ) return propagateFloatx80NaN( a, b ); goto invalid; } return packFloatx80( zSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( bExp == 0x7FFF ) { if ( (bits64) ( bSig<<1 ) ) return propagateFloatx80NaN( a, b ); return packFloatx80( zSign, 0, 0 ); } if ( bExp == 0 ) { if ( bSig == 0 ) { if ( ( aExp | aSig ) == 0 ) { invalid: float_raise( float_flag_invalid ); z.low = floatx80_default_nan_low; z.high = floatx80_default_nan_high; return z; } float_raise( float_flag_divbyzero ); return packFloatx80( zSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } normalizeFloatx80Subnormal( bSig, &bExp, &bSig ); } if ( aExp == 0 ) { if ( aSig == 0 ) return packFloatx80( zSign, 0, 0 ); normalizeFloatx80Subnormal( aSig, &aExp, &aSig ); } zExp = aExp - bExp + 0x3FFE; rem1 = 0; if ( bSig <= aSig ) { shift128Right( aSig, 0, 1, &aSig, &rem1 ); ++zExp; } zSig0 = estimateDiv128To64( aSig, rem1, bSig ); mul64To128( bSig, zSig0, &term0, &term1 ); sub128( aSig, rem1, term0, term1, &rem0, &rem1 ); while ( (sbits64) rem0 < 0 ) { --zSig0; add128( rem0, rem1, 0, bSig, &rem0, &rem1 ); } zSig1 = estimateDiv128To64( rem1, 0, bSig ); if ( (bits64) ( zSig1<<1 ) <= 8 ) { mul64To128( bSig, zSig1, &term1, &term2 ); sub128( rem1, 0, term1, term2, &rem1, &rem2 ); while ( (sbits64) rem1 < 0 ) { --zSig1; add128( rem1, rem2, 0, bSig, &rem1, &rem2 ); } zSig1 |= ( ( rem1 | rem2 ) != 0 ); } 
return roundAndPackFloatx80( floatx80_rounding_precision, zSign, zExp, zSig0, zSig1 ); } /* ------------------------------------------------------------------------------- Returns the remainder of the extended double-precision floating-point value `a' with respect to the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ floatx80 floatx80_rem( floatx80 a, floatx80 b ) { flag aSign, bSign, zSign; int32 aExp, bExp, expDiff; bits64 aSig0, aSig1, bSig; bits64 q, term0, term1, alternateASig0, alternateASig1; floatx80 z; aSig0 = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); bSig = extractFloatx80Frac( b ); bExp = extractFloatx80Exp( b ); bSign = extractFloatx80Sign( b ); if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig0<<1 ) || ( ( bExp == 0x7FFF ) && (bits64) ( bSig<<1 ) ) ) { return propagateFloatx80NaN( a, b ); } goto invalid; } if ( bExp == 0x7FFF ) { if ( (bits64) ( bSig<<1 ) ) return propagateFloatx80NaN( a, b ); return a; } if ( bExp == 0 ) { if ( bSig == 0 ) { invalid: float_raise( float_flag_invalid ); z.low = floatx80_default_nan_low; z.high = floatx80_default_nan_high; return z; } normalizeFloatx80Subnormal( bSig, &bExp, &bSig ); } if ( aExp == 0 ) { if ( (bits64) ( aSig0<<1 ) == 0 ) return a; normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 ); } bSig |= LIT64( 0x8000000000000000 ); zSign = aSign; expDiff = aExp - bExp; aSig1 = 0; if ( expDiff < 0 ) { if ( expDiff < -1 ) return a; shift128Right( aSig0, 0, 1, &aSig0, &aSig1 ); expDiff = 0; } q = ( bSig <= aSig0 ); if ( q ) aSig0 -= bSig; expDiff -= 64; while ( 0 < expDiff ) { q = estimateDiv128To64( aSig0, aSig1, bSig ); q = ( 2 < q ) ? 
q - 2 : 0; mul64To128( bSig, q, &term0, &term1 ); sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 ); shortShift128Left( aSig0, aSig1, 62, &aSig0, &aSig1 ); expDiff -= 62; } expDiff += 64; if ( 0 < expDiff ) { q = estimateDiv128To64( aSig0, aSig1, bSig ); q = ( 2 < q ) ? q - 2 : 0; q >>= 64 - expDiff; mul64To128( bSig, q<<( 64 - expDiff ), &term0, &term1 ); sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 ); shortShift128Left( 0, bSig, 64 - expDiff, &term0, &term1 ); while ( le128( term0, term1, aSig0, aSig1 ) ) { ++q; sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 ); } } else { term1 = 0; term0 = bSig; } sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 ); if ( lt128( alternateASig0, alternateASig1, aSig0, aSig1 ) || ( eq128( alternateASig0, alternateASig1, aSig0, aSig1 ) && ( q & 1 ) ) ) { aSig0 = alternateASig0; aSig1 = alternateASig1; zSign = ! zSign; } return normalizeRoundAndPackFloatx80( 80, zSign, bExp + expDiff, aSig0, aSig1 ); } /* ------------------------------------------------------------------------------- Returns the square root of the extended double-precision floating-point value `a'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ floatx80 floatx80_sqrt( floatx80 a ) { flag aSign; int32 aExp, zExp; bits64 aSig0, aSig1, zSig0, zSig1, doubleZSig0; bits64 rem0, rem1, rem2, rem3, term0, term1, term2, term3; floatx80 z; aSig0 = extractFloatx80Frac( a ); aExp = extractFloatx80Exp( a ); aSign = extractFloatx80Sign( a ); if ( aExp == 0x7FFF ) { if ( (bits64) ( aSig0<<1 ) ) return propagateFloatx80NaN( a, a ); if ( ! 
aSign ) return a; goto invalid; } if ( aSign ) { if ( ( aExp | aSig0 ) == 0 ) return a; invalid: float_raise( float_flag_invalid ); z.low = floatx80_default_nan_low; z.high = floatx80_default_nan_high; return z; } if ( aExp == 0 ) { if ( aSig0 == 0 ) return packFloatx80( 0, 0, 0 ); normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 ); } zExp = ( ( aExp - 0x3FFF )>>1 ) + 0x3FFF; zSig0 = estimateSqrt32( aExp, aSig0>>32 ); shift128Right( aSig0, 0, 2 + ( aExp & 1 ), &aSig0, &aSig1 ); zSig0 = estimateDiv128To64( aSig0, aSig1, zSig0<<32 ) + ( zSig0<<30 ); doubleZSig0 = zSig0<<1; mul64To128( zSig0, zSig0, &term0, &term1 ); sub128( aSig0, aSig1, term0, term1, &rem0, &rem1 ); while ( (sbits64) rem0 < 0 ) { --zSig0; doubleZSig0 -= 2; add128( rem0, rem1, zSig0>>63, doubleZSig0 | 1, &rem0, &rem1 ); } zSig1 = estimateDiv128To64( rem1, 0, doubleZSig0 ); if ( ( zSig1 & LIT64( 0x3FFFFFFFFFFFFFFF ) ) <= 5 ) { if ( zSig1 == 0 ) zSig1 = 1; mul64To128( doubleZSig0, zSig1, &term1, &term2 ); sub128( rem1, 0, term1, term2, &rem1, &rem2 ); mul64To128( zSig1, zSig1, &term2, &term3 ); sub192( rem1, rem2, 0, 0, term2, term3, &rem1, &rem2, &rem3 ); while ( (sbits64) rem1 < 0 ) { --zSig1; shortShift128Left( 0, zSig1, 1, &term2, &term3 ); term3 |= 1; term2 |= doubleZSig0; add192( rem1, rem2, rem3, 0, term2, term3, &rem1, &rem2, &rem3 ); } zSig1 |= ( ( rem1 | rem2 | rem3 ) != 0 ); } shortShift128Left( 0, zSig1, 1, &zSig0, &zSig1 ); zSig0 |= doubleZSig0; return roundAndPackFloatx80( floatx80_rounding_precision, 0, zExp, zSig0, zSig1 ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag floatx80_eq( floatx80 a, floatx80 b ) { if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) || ( ( extractFloatx80Exp( b ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( b )<<1 ) ) ) { if ( floatx80_is_signaling_nan( a ) || floatx80_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } return ( a.low == b.low ) && ( ( a.high == b.high ) || ( ( a.low == 0 ) && ( (bits16) ( ( a.high | b.high )<<1 ) == 0 ) ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag floatx80_le( floatx80 a, floatx80 b ) { flag aSign, bSign; if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) || ( ( extractFloatx80Exp( b ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( b )<<1 ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloatx80Sign( a ); bSign = extractFloatx80Sign( b ); if ( aSign != bSign ) { return aSign || ( ( ( (bits16) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) == 0 ); } return aSign ? le128( b.high, b.low, a.high, a.low ) : le128( a.high, a.low, b.high, b.low ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag floatx80_lt( floatx80 a, floatx80 b ) { flag aSign, bSign; if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) || ( ( extractFloatx80Exp( b ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( b )<<1 ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloatx80Sign( a ); bSign = extractFloatx80Sign( b ); if ( aSign != bSign ) { return aSign && ( ( ( (bits16) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) != 0 ); } return aSign ? lt128( b.high, b.low, a.high, a.low ) : lt128( a.high, a.low, b.high, b.low ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The invalid exception is raised if either operand is a NaN. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag floatx80_eq_signaling( floatx80 a, floatx80 b ) { if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) || ( ( extractFloatx80Exp( b ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( b )<<1 ) ) ) { float_raise( float_flag_invalid ); return 0; } return ( a.low == b.low ) && ( ( a.high == b.high ) || ( ( a.low == 0 ) && ( (bits16) ( ( a.high | b.high )<<1 ) == 0 ) ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag floatx80_le_quiet( floatx80 a, floatx80 b ) { flag aSign, bSign; if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) || ( ( extractFloatx80Exp( b ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( b )<<1 ) ) ) { if ( floatx80_is_signaling_nan( a ) || floatx80_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloatx80Sign( a ); bSign = extractFloatx80Sign( b ); if ( aSign != bSign ) { return aSign || ( ( ( (bits16) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) == 0 ); } return aSign ? le128( b.high, b.low, a.high, a.low ) : le128( a.high, a.low, b.high, b.low ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag floatx80_lt_quiet( floatx80 a, floatx80 b ) { flag aSign, bSign; if ( ( ( extractFloatx80Exp( a ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( a )<<1 ) ) || ( ( extractFloatx80Exp( b ) == 0x7FFF ) && (bits64) ( extractFloatx80Frac( b )<<1 ) ) ) { if ( floatx80_is_signaling_nan( a ) || floatx80_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloatx80Sign( a ); bSign = extractFloatx80Sign( b ); if ( aSign != bSign ) { return aSign && ( ( ( (bits16) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) != 0 ); } return aSign ? 
lt128( b.high, b.low, a.high, a.low ) : lt128( a.high, a.low, b.high, b.low ); } #endif #ifdef FLOAT128 /* ------------------------------------------------------------------------------- Returns the result of converting the quadruple-precision floating-point value `a' to the 32-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic---which means in particular that the conversion is rounded according to the current rounding mode. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. ------------------------------------------------------------------------------- */ int32 float128_to_int32( float128 a ) { flag aSign; int32 aExp, shiftCount; bits64 aSig0, aSig1; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); if ( ( aExp == 0x7FFF ) && ( aSig0 | aSig1 ) ) aSign = 0; if ( aExp ) aSig0 |= LIT64( 0x0001000000000000 ); aSig0 |= ( aSig1 != 0 ); shiftCount = 0x4028 - aExp; if ( 0 < shiftCount ) shift64RightJamming( aSig0, shiftCount, &aSig0 ); return roundAndPackInt32( aSign, aSig0 ); } /* ------------------------------------------------------------------------------- Returns the result of converting the quadruple-precision floating-point value `a' to the 32-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, except that the conversion is always rounded toward zero. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned. 
------------------------------------------------------------------------------- */ int32 float128_to_int32_round_to_zero( float128 a ) { flag aSign; int32 aExp, shiftCount; bits64 aSig0, aSig1, savedASig; int32 z; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); aSig0 |= ( aSig1 != 0 ); if ( 0x401E < aExp ) { if ( ( aExp == 0x7FFF ) && aSig0 ) aSign = 0; goto invalid; } else if ( aExp < 0x3FFF ) { if ( aExp || aSig0 ) float_exception_flags |= float_flag_inexact; return 0; } aSig0 |= LIT64( 0x0001000000000000 ); shiftCount = 0x402F - aExp; savedASig = aSig0; aSig0 >>= shiftCount; z = aSig0; if ( aSign ) z = - z; if ( ( z < 0 ) ^ aSign ) { invalid: float_raise( float_flag_invalid ); return aSign ? (sbits32) 0x80000000 : 0x7FFFFFFF; } if ( ( aSig0<>( ( - shiftCount ) & 63 ) ); if ( (bits64) ( aSig1<>( - shiftCount ); if ( aSig1 || ( shiftCount && (bits64) ( aSig0<<( shiftCount & 63 ) ) ) ) { float_exception_flags |= float_flag_inexact; } } if ( aSign ) z = - z; return z; } +#if (defined(SOFTFLOATSPARC64_FOR_GCC) || defined(SOFTFLOAT_FOR_GCC)) \ + && defined(SOFTFLOAT_NEED_FIXUNS) /* + * just like above - but do not care for overflow of signed results + */ +uint64 float128_to_uint64_round_to_zero( float128 a ) +{ + flag aSign; + int32 aExp, shiftCount; + bits64 aSig0, aSig1; + uint64 z; + + aSig1 = extractFloat128Frac1( a ); + aSig0 = extractFloat128Frac0( a ); + aExp = extractFloat128Exp( a ); + aSign = extractFloat128Sign( a ); + if ( aExp ) aSig0 |= LIT64( 0x0001000000000000 ); + shiftCount = aExp - 0x402F; + if ( 0 < shiftCount ) { + if ( 0x403F <= aExp ) { + aSig0 &= LIT64( 0x0000FFFFFFFFFFFF ); + if ( ( a.high == LIT64( 0xC03E000000000000 ) ) + && ( aSig1 < LIT64( 0x0002000000000000 ) ) ) { + if ( aSig1 ) float_exception_flags |= float_flag_inexact; + } + else { + float_raise( float_flag_invalid ); + } + return LIT64( 0xFFFFFFFFFFFFFFFF ); + } + z = ( aSig0<>( ( - shiftCount ) 
& 63 ) ); + if ( (bits64) ( aSig1<>( - shiftCount ); + if (aSig1 || ( shiftCount && (bits64) ( aSig0<<( shiftCount & 63 ) ) ) ) { + float_exception_flags |= float_flag_inexact; + } + } + if ( aSign ) z = - z; + return z; + +} +#endif /* (SOFTFLOATSPARC64_FOR_GCC || SOFTFLOAT_FOR_GCC) && SOFTFLOAT_NEED_FIXUNS */ + +/* ------------------------------------------------------------------------------- Returns the result of converting the quadruple-precision floating-point value `a' to the single-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float32 float128_to_float32( float128 a ) { flag aSign; int32 aExp; bits64 aSig0, aSig1; bits32 zSig; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) { return commonNaNToFloat32( float128ToCommonNaN( a ) ); } return packFloat32( aSign, 0xFF, 0 ); } aSig0 |= ( aSig1 != 0 ); shift64RightJamming( aSig0, 18, &aSig0 ); zSig = aSig0; if ( aExp || zSig ) { zSig |= 0x40000000; aExp -= 0x3F81; } return roundAndPackFloat32( aSign, aExp, zSig ); } /* ------------------------------------------------------------------------------- Returns the result of converting the quadruple-precision floating-point value `a' to the double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float64 float128_to_float64( float128 a ) { flag aSign; int32 aExp; bits64 aSig0, aSig1; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) { return commonNaNToFloat64( float128ToCommonNaN( a ) ); } return packFloat64( aSign, 0x7FF, 0 ); } shortShift128Left( aSig0, aSig1, 14, &aSig0, &aSig1 ); aSig0 |= ( aSig1 != 0 ); if ( aExp || aSig0 ) { aSig0 |= LIT64( 0x4000000000000000 ); aExp -= 0x3C01; } return roundAndPackFloat64( aSign, aExp, aSig0 ); } #ifdef FLOATX80 /* ------------------------------------------------------------------------------- Returns the result of converting the quadruple-precision floating-point value `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ floatx80 float128_to_floatx80( float128 a ) { flag aSign; int32 aExp; bits64 aSig0, aSig1; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) { return commonNaNToFloatx80( float128ToCommonNaN( a ) ); } return packFloatx80( aSign, 0x7FFF, LIT64( 0x8000000000000000 ) ); } if ( aExp == 0 ) { if ( ( aSig0 | aSig1 ) == 0 ) return packFloatx80( aSign, 0, 0 ); normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 ); } else { aSig0 |= LIT64( 0x0001000000000000 ); } shortShift128Left( aSig0, aSig1, 15, &aSig0, &aSig1 ); return roundAndPackFloatx80( 80, aSign, aExp, aSig0, aSig1 ); } #endif /* ------------------------------------------------------------------------------- Rounds the quadruple-precision floating-point value `a' to an integer, and returns the result as a quadruple-precision floating-point value. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 float128_round_to_int( float128 a ) { flag aSign; int32 aExp; bits64 lastBitMask, roundBitsMask; int8 roundingMode; float128 z; aExp = extractFloat128Exp( a ); if ( 0x402F <= aExp ) { if ( 0x406F <= aExp ) { if ( ( aExp == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) { return propagateFloat128NaN( a, a ); } return a; } lastBitMask = 1; lastBitMask = ( lastBitMask<<( 0x406E - aExp ) )<<1; roundBitsMask = lastBitMask - 1; z = a; roundingMode = float_rounding_mode; if ( roundingMode == float_round_nearest_even ) { if ( lastBitMask ) { add128( z.high, z.low, 0, lastBitMask>>1, &z.high, &z.low ); if ( ( z.low & roundBitsMask ) == 0 ) z.low &= ~ lastBitMask; } else { if ( (sbits64) z.low < 0 ) { ++z.high; if ( (bits64) ( z.low<<1 ) == 0 ) z.high &= ~1; } } } else if ( roundingMode != float_round_to_zero ) { if ( extractFloat128Sign( z ) ^ ( roundingMode == float_round_up ) ) { add128( z.high, z.low, 0, roundBitsMask, &z.high, &z.low ); } } z.low &= ~ roundBitsMask; } else { if ( aExp < 0x3FFF ) { if ( ( ( (bits64) ( a.high<<1 ) ) | a.low ) == 0 ) return a; float_exception_flags |= float_flag_inexact; aSign = extractFloat128Sign( a ); switch ( float_rounding_mode ) { case float_round_nearest_even: if ( ( aExp == 0x3FFE ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) { return packFloat128( aSign, 0x3FFF, 0, 0 ); } break; case float_round_to_zero: break; case float_round_down: return aSign ? packFloat128( 1, 0x3FFF, 0, 0 ) : packFloat128( 0, 0, 0, 0 ); case float_round_up: return aSign ? 
packFloat128( 1, 0, 0, 0 ) : packFloat128( 0, 0x3FFF, 0, 0 ); } return packFloat128( aSign, 0, 0, 0 ); } lastBitMask = 1; lastBitMask <<= 0x402F - aExp; roundBitsMask = lastBitMask - 1; z.low = 0; z.high = a.high; roundingMode = float_rounding_mode; if ( roundingMode == float_round_nearest_even ) { z.high += lastBitMask>>1; if ( ( ( z.high & roundBitsMask ) | a.low ) == 0 ) { z.high &= ~ lastBitMask; } } else if ( roundingMode != float_round_to_zero ) { if ( extractFloat128Sign( z ) ^ ( roundingMode == float_round_up ) ) { z.high |= ( a.low != 0 ); z.high += roundBitsMask; } } z.high &= ~ roundBitsMask; } if ( ( z.low != a.low ) || ( z.high != a.high ) ) { float_exception_flags |= float_flag_inexact; } return z; } /* ------------------------------------------------------------------------------- Returns the result of adding the absolute values of the quadruple-precision floating-point values `a' and `b'. If `zSign' is 1, the sum is negated before being returned. `zSign' is ignored if the result is a NaN. The addition is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign ) { int32 aExp, bExp, zExp; bits64 aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2; int32 expDiff; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); bSig1 = extractFloat128Frac1( b ); bSig0 = extractFloat128Frac0( b ); bExp = extractFloat128Exp( b ); expDiff = aExp - bExp; if ( 0 < expDiff ) { if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) return propagateFloat128NaN( a, b ); return a; } if ( bExp == 0 ) { --expDiff; } else { bSig0 |= LIT64( 0x0001000000000000 ); } shift128ExtraRightJamming( bSig0, bSig1, 0, expDiff, &bSig0, &bSig1, &zSig2 ); zExp = aExp; } else if ( expDiff < 0 ) { if ( bExp == 0x7FFF ) { if ( bSig0 | bSig1 ) return propagateFloat128NaN( a, b ); return packFloat128( zSign, 0x7FFF, 0, 0 ); } if ( aExp == 0 ) { ++expDiff; } else { aSig0 |= LIT64( 0x0001000000000000 ); } shift128ExtraRightJamming( aSig0, aSig1, 0, - expDiff, &aSig0, &aSig1, &zSig2 ); zExp = bExp; } else { if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 | bSig0 | bSig1 ) { return propagateFloat128NaN( a, b ); } return a; } add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 ); if ( aExp == 0 ) return packFloat128( zSign, 0, zSig0, zSig1 ); zSig2 = 0; zSig0 |= LIT64( 0x0002000000000000 ); zExp = aExp; goto shiftRight1; } aSig0 |= LIT64( 0x0001000000000000 ); add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 ); --zExp; if ( zSig0 < LIT64( 0x0002000000000000 ) ) goto roundAndPack; ++zExp; shiftRight1: shift128ExtraRightJamming( zSig0, zSig1, zSig2, 1, &zSig0, &zSig1, &zSig2 ); roundAndPack: return roundAndPackFloat128( zSign, zExp, zSig0, zSig1, zSig2 ); } /* ------------------------------------------------------------------------------- Returns the result of subtracting the absolute values of the quadruple- precision floating-point values `a' and `b'. 
If `zSign' is 1, the difference is negated before being returned. `zSign' is ignored if the result is a NaN. The subtraction is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign ) { int32 aExp, bExp, zExp; bits64 aSig0, aSig1, bSig0, bSig1, zSig0, zSig1; int32 expDiff; float128 z; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); bSig1 = extractFloat128Frac1( b ); bSig0 = extractFloat128Frac0( b ); bExp = extractFloat128Exp( b ); expDiff = aExp - bExp; shortShift128Left( aSig0, aSig1, 14, &aSig0, &aSig1 ); shortShift128Left( bSig0, bSig1, 14, &bSig0, &bSig1 ); if ( 0 < expDiff ) goto aExpBigger; if ( expDiff < 0 ) goto bExpBigger; if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 | bSig0 | bSig1 ) { return propagateFloat128NaN( a, b ); } float_raise( float_flag_invalid ); z.low = float128_default_nan_low; z.high = float128_default_nan_high; return z; } if ( aExp == 0 ) { aExp = 1; bExp = 1; } if ( bSig0 < aSig0 ) goto aBigger; if ( aSig0 < bSig0 ) goto bBigger; if ( bSig1 < aSig1 ) goto aBigger; if ( aSig1 < bSig1 ) goto bBigger; return packFloat128( float_rounding_mode == float_round_down, 0, 0, 0 ); bExpBigger: if ( bExp == 0x7FFF ) { if ( bSig0 | bSig1 ) return propagateFloat128NaN( a, b ); return packFloat128( zSign ^ 1, 0x7FFF, 0, 0 ); } if ( aExp == 0 ) { ++expDiff; } else { aSig0 |= LIT64( 0x4000000000000000 ); } shift128RightJamming( aSig0, aSig1, - expDiff, &aSig0, &aSig1 ); bSig0 |= LIT64( 0x4000000000000000 ); bBigger: sub128( bSig0, bSig1, aSig0, aSig1, &zSig0, &zSig1 ); zExp = bExp; zSign ^= 1; goto normalizeRoundAndPack; aExpBigger: if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) return propagateFloat128NaN( a, b ); return a; } if ( bExp == 0 ) { --expDiff; } else { bSig0 |= LIT64( 0x4000000000000000 ); } shift128RightJamming( 
bSig0, bSig1, expDiff, &bSig0, &bSig1 ); aSig0 |= LIT64( 0x4000000000000000 ); aBigger: sub128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 ); zExp = aExp; normalizeRoundAndPack: --zExp; return normalizeRoundAndPackFloat128( zSign, zExp - 14, zSig0, zSig1 ); } /* ------------------------------------------------------------------------------- Returns the result of adding the quadruple-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float128 float128_add( float128 a, float128 b ) { flag aSign, bSign; aSign = extractFloat128Sign( a ); bSign = extractFloat128Sign( b ); if ( aSign == bSign ) { return addFloat128Sigs( a, b, aSign ); } else { return subFloat128Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of subtracting the quadruple-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float128 float128_sub( float128 a, float128 b ) { flag aSign, bSign; aSign = extractFloat128Sign( a ); bSign = extractFloat128Sign( b ); if ( aSign == bSign ) { return subFloat128Sigs( a, b, aSign ); } else { return addFloat128Sigs( a, b, aSign ); } } /* ------------------------------------------------------------------------------- Returns the result of multiplying the quadruple-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 float128_mul( float128 a, float128 b ) { flag aSign, bSign, zSign; int32 aExp, bExp, zExp; bits64 aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2, zSig3; float128 z; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); bSig1 = extractFloat128Frac1( b ); bSig0 = extractFloat128Frac0( b ); bExp = extractFloat128Exp( b ); bSign = extractFloat128Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0x7FFF ) { if ( ( aSig0 | aSig1 ) || ( ( bExp == 0x7FFF ) && ( bSig0 | bSig1 ) ) ) { return propagateFloat128NaN( a, b ); } if ( ( bExp | bSig0 | bSig1 ) == 0 ) goto invalid; return packFloat128( zSign, 0x7FFF, 0, 0 ); } if ( bExp == 0x7FFF ) { if ( bSig0 | bSig1 ) return propagateFloat128NaN( a, b ); if ( ( aExp | aSig0 | aSig1 ) == 0 ) { invalid: float_raise( float_flag_invalid ); z.low = float128_default_nan_low; z.high = float128_default_nan_high; return z; } return packFloat128( zSign, 0x7FFF, 0, 0 ); } if ( aExp == 0 ) { if ( ( aSig0 | aSig1 ) == 0 ) return packFloat128( zSign, 0, 0, 0 ); normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 ); } if ( bExp == 0 ) { if ( ( bSig0 | bSig1 ) == 0 ) return packFloat128( zSign, 0, 0, 0 ); normalizeFloat128Subnormal( bSig0, bSig1, &bExp, &bSig0, &bSig1 ); } zExp = aExp + bExp - 0x4000; aSig0 |= LIT64( 0x0001000000000000 ); shortShift128Left( bSig0, bSig1, 16, &bSig0, &bSig1 ); mul128To256( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1, &zSig2, &zSig3 ); add128( zSig0, zSig1, aSig0, aSig1, &zSig0, &zSig1 ); zSig2 |= ( zSig3 != 0 ); if ( LIT64( 0x0002000000000000 ) <= zSig0 ) { shift128ExtraRightJamming( zSig0, zSig1, zSig2, 1, &zSig0, &zSig1, &zSig2 ); ++zExp; } return roundAndPackFloat128( zSign, zExp, zSig0, zSig1, zSig2 ); } /* ------------------------------------------------------------------------------- Returns the result of dividing the 
quadruple-precision floating-point value `a' by the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ float128 float128_div( float128 a, float128 b ) { flag aSign, bSign, zSign; int32 aExp, bExp, zExp; bits64 aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2; bits64 rem0, rem1, rem2, rem3, term0, term1, term2, term3; float128 z; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); bSig1 = extractFloat128Frac1( b ); bSig0 = extractFloat128Frac0( b ); bExp = extractFloat128Exp( b ); bSign = extractFloat128Sign( b ); zSign = aSign ^ bSign; if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) return propagateFloat128NaN( a, b ); if ( bExp == 0x7FFF ) { if ( bSig0 | bSig1 ) return propagateFloat128NaN( a, b ); goto invalid; } return packFloat128( zSign, 0x7FFF, 0, 0 ); } if ( bExp == 0x7FFF ) { if ( bSig0 | bSig1 ) return propagateFloat128NaN( a, b ); return packFloat128( zSign, 0, 0, 0 ); } if ( bExp == 0 ) { if ( ( bSig0 | bSig1 ) == 0 ) { if ( ( aExp | aSig0 | aSig1 ) == 0 ) { invalid: float_raise( float_flag_invalid ); z.low = float128_default_nan_low; z.high = float128_default_nan_high; return z; } float_raise( float_flag_divbyzero ); return packFloat128( zSign, 0x7FFF, 0, 0 ); } normalizeFloat128Subnormal( bSig0, bSig1, &bExp, &bSig0, &bSig1 ); } if ( aExp == 0 ) { if ( ( aSig0 | aSig1 ) == 0 ) return packFloat128( zSign, 0, 0, 0 ); normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 ); } zExp = aExp - bExp + 0x3FFD; shortShift128Left( aSig0 | LIT64( 0x0001000000000000 ), aSig1, 15, &aSig0, &aSig1 ); shortShift128Left( bSig0 | LIT64( 0x0001000000000000 ), bSig1, 15, &bSig0, &bSig1 ); if ( le128( bSig0, bSig1, aSig0, aSig1 ) ) { shift128Right( aSig0, aSig1, 1, &aSig0, &aSig1 ); ++zExp; } zSig0 = estimateDiv128To64( aSig0, 
aSig1, bSig0 ); mul128By64To192( bSig0, bSig1, zSig0, &term0, &term1, &term2 ); sub192( aSig0, aSig1, 0, term0, term1, term2, &rem0, &rem1, &rem2 ); while ( (sbits64) rem0 < 0 ) { --zSig0; add192( rem0, rem1, rem2, 0, bSig0, bSig1, &rem0, &rem1, &rem2 ); } zSig1 = estimateDiv128To64( rem1, rem2, bSig0 ); if ( ( zSig1 & 0x3FFF ) <= 4 ) { mul128By64To192( bSig0, bSig1, zSig1, &term1, &term2, &term3 ); sub192( rem1, rem2, 0, term1, term2, term3, &rem1, &rem2, &rem3 ); while ( (sbits64) rem1 < 0 ) { --zSig1; add192( rem1, rem2, rem3, 0, bSig0, bSig1, &rem1, &rem2, &rem3 ); } zSig1 |= ( ( rem1 | rem2 | rem3 ) != 0 ); } shift128ExtraRightJamming( zSig0, zSig1, 0, 15, &zSig0, &zSig1, &zSig2 ); return roundAndPackFloat128( zSign, zExp, zSig0, zSig1, zSig2 ); } /* ------------------------------------------------------------------------------- Returns the remainder of the quadruple-precision floating-point value `a' with respect to the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 float128_rem( float128 a, float128 b ) { flag aSign, bSign, zSign; int32 aExp, bExp, expDiff; bits64 aSig0, aSig1, bSig0, bSig1, q, term0, term1, term2; bits64 allZero, alternateASig0, alternateASig1, sigMean1; sbits64 sigMean0; float128 z; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); bSig1 = extractFloat128Frac1( b ); bSig0 = extractFloat128Frac0( b ); bExp = extractFloat128Exp( b ); bSign = extractFloat128Sign( b ); if ( aExp == 0x7FFF ) { if ( ( aSig0 | aSig1 ) || ( ( bExp == 0x7FFF ) && ( bSig0 | bSig1 ) ) ) { return propagateFloat128NaN( a, b ); } goto invalid; } if ( bExp == 0x7FFF ) { if ( bSig0 | bSig1 ) return propagateFloat128NaN( a, b ); return a; } if ( bExp == 0 ) { if ( ( bSig0 | bSig1 ) == 0 ) { invalid: float_raise( float_flag_invalid ); z.low = float128_default_nan_low; z.high = float128_default_nan_high; return z; } normalizeFloat128Subnormal( bSig0, bSig1, &bExp, &bSig0, &bSig1 ); } if ( aExp == 0 ) { if ( ( aSig0 | aSig1 ) == 0 ) return a; normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 ); } expDiff = aExp - bExp; if ( expDiff < -1 ) return a; shortShift128Left( aSig0 | LIT64( 0x0001000000000000 ), aSig1, 15 - ( expDiff < 0 ), &aSig0, &aSig1 ); shortShift128Left( bSig0 | LIT64( 0x0001000000000000 ), bSig1, 15, &bSig0, &bSig1 ); q = le128( bSig0, bSig1, aSig0, aSig1 ); if ( q ) sub128( aSig0, aSig1, bSig0, bSig1, &aSig0, &aSig1 ); expDiff -= 64; while ( 0 < expDiff ) { q = estimateDiv128To64( aSig0, aSig1, bSig0 ); q = ( 4 < q ) ? 
q - 4 : 0; mul128By64To192( bSig0, bSig1, q, &term0, &term1, &term2 ); shortShift192Left( term0, term1, term2, 61, &term1, &term2, &allZero ); shortShift128Left( aSig0, aSig1, 61, &aSig0, &allZero ); sub128( aSig0, 0, term1, term2, &aSig0, &aSig1 ); expDiff -= 61; } if ( -64 < expDiff ) { q = estimateDiv128To64( aSig0, aSig1, bSig0 ); q = ( 4 < q ) ? q - 4 : 0; q >>= - expDiff; shift128Right( bSig0, bSig1, 12, &bSig0, &bSig1 ); expDiff += 52; if ( expDiff < 0 ) { shift128Right( aSig0, aSig1, - expDiff, &aSig0, &aSig1 ); } else { shortShift128Left( aSig0, aSig1, expDiff, &aSig0, &aSig1 ); } mul128By64To192( bSig0, bSig1, q, &term0, &term1, &term2 ); sub128( aSig0, aSig1, term1, term2, &aSig0, &aSig1 ); } else { shift128Right( aSig0, aSig1, 12, &aSig0, &aSig1 ); shift128Right( bSig0, bSig1, 12, &bSig0, &bSig1 ); } do { alternateASig0 = aSig0; alternateASig1 = aSig1; ++q; sub128( aSig0, aSig1, bSig0, bSig1, &aSig0, &aSig1 ); } while ( 0 <= (sbits64) aSig0 ); add128( - aSig0, aSig1, alternateASig0, alternateASig1, &sigMean0, &sigMean1 ); + aSig0, aSig1, alternateASig0, alternateASig1, (bits64 *)&sigMean0, &sigMean1 ); if ( ( sigMean0 < 0 ) || ( ( ( sigMean0 | sigMean1 ) == 0 ) && ( q & 1 ) ) ) { aSig0 = alternateASig0; aSig1 = alternateASig1; } zSign = ( (sbits64) aSig0 < 0 ); if ( zSign ) sub128( 0, 0, aSig0, aSig1, &aSig0, &aSig1 ); return normalizeRoundAndPackFloat128( aSign ^ zSign, bExp - 4, aSig0, aSig1 ); } /* ------------------------------------------------------------------------------- Returns the square root of the quadruple-precision floating-point value `a'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ float128 float128_sqrt( float128 a ) { flag aSign; int32 aExp, zExp; bits64 aSig0, aSig1, zSig0, zSig1, zSig2, doubleZSig0; bits64 rem0, rem1, rem2, rem3, term0, term1, term2, term3; float128 z; aSig1 = extractFloat128Frac1( a ); aSig0 = extractFloat128Frac0( a ); aExp = extractFloat128Exp( a ); aSign = extractFloat128Sign( a ); if ( aExp == 0x7FFF ) { if ( aSig0 | aSig1 ) return propagateFloat128NaN( a, a ); if ( ! aSign ) return a; goto invalid; } if ( aSign ) { if ( ( aExp | aSig0 | aSig1 ) == 0 ) return a; invalid: float_raise( float_flag_invalid ); z.low = float128_default_nan_low; z.high = float128_default_nan_high; return z; } if ( aExp == 0 ) { if ( ( aSig0 | aSig1 ) == 0 ) return packFloat128( 0, 0, 0, 0 ); normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 ); } zExp = ( ( aExp - 0x3FFF )>>1 ) + 0x3FFE; aSig0 |= LIT64( 0x0001000000000000 ); zSig0 = estimateSqrt32( aExp, aSig0>>17 ); shortShift128Left( aSig0, aSig1, 13 - ( aExp & 1 ), &aSig0, &aSig1 ); zSig0 = estimateDiv128To64( aSig0, aSig1, zSig0<<32 ) + ( zSig0<<30 ); doubleZSig0 = zSig0<<1; mul64To128( zSig0, zSig0, &term0, &term1 ); sub128( aSig0, aSig1, term0, term1, &rem0, &rem1 ); while ( (sbits64) rem0 < 0 ) { --zSig0; doubleZSig0 -= 2; add128( rem0, rem1, zSig0>>63, doubleZSig0 | 1, &rem0, &rem1 ); } zSig1 = estimateDiv128To64( rem1, 0, doubleZSig0 ); if ( ( zSig1 & 0x1FFF ) <= 5 ) { if ( zSig1 == 0 ) zSig1 = 1; mul64To128( doubleZSig0, zSig1, &term1, &term2 ); sub128( rem1, 0, term1, term2, &rem1, &rem2 ); mul64To128( zSig1, zSig1, &term2, &term3 ); sub192( rem1, rem2, 0, 0, term2, term3, &rem1, &rem2, &rem3 ); while ( (sbits64) rem1 < 0 ) { --zSig1; shortShift128Left( 0, zSig1, 1, &term2, &term3 ); term3 |= 1; term2 |= doubleZSig0; add192( rem1, rem2, rem3, 0, term2, term3, &rem1, &rem2, &rem3 ); } zSig1 |= ( ( rem1 | rem2 | rem3 ) != 0 ); } shift128ExtraRightJamming( zSig0, zSig1, 0, 14, &zSig0, 
&zSig1, &zSig2 ); return roundAndPackFloat128( 0, zExp, zSig0, zSig1, zSig2 ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float128_eq( float128 a, float128 b ) { if ( ( ( extractFloat128Exp( a ) == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) || ( ( extractFloat128Exp( b ) == 0x7FFF ) && ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) ) ) { if ( float128_is_signaling_nan( a ) || float128_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } return ( a.low == b.low ) && ( ( a.high == b.high ) || ( ( a.low == 0 ) && ( (bits64) ( ( a.high | b.high )<<1 ) == 0 ) ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float128_le( float128 a, float128 b ) { flag aSign, bSign; if ( ( ( extractFloat128Exp( a ) == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) || ( ( extractFloat128Exp( b ) == 0x7FFF ) && ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloat128Sign( a ); bSign = extractFloat128Sign( b ); if ( aSign != bSign ) { return aSign || ( ( ( (bits64) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) == 0 ); } return aSign ? 
le128( b.high, b.low, a.high, a.low ) : le128( a.high, a.low, b.high, b.low ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float128_lt( float128 a, float128 b ) { flag aSign, bSign; if ( ( ( extractFloat128Exp( a ) == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) || ( ( extractFloat128Exp( b ) == 0x7FFF ) && ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) ) ) { float_raise( float_flag_invalid ); return 0; } aSign = extractFloat128Sign( a ); bSign = extractFloat128Sign( b ); if ( aSign != bSign ) { return aSign && ( ( ( (bits64) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) != 0 ); } return aSign ? lt128( b.high, b.low, a.high, a.low ) : lt128( a.high, a.low, b.high, b.low ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is equal to the corresponding value `b', and 0 otherwise. The invalid exception is raised if either operand is a NaN. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. 
------------------------------------------------------------------------------- */ flag float128_eq_signaling( float128 a, float128 b ) { if ( ( ( extractFloat128Exp( a ) == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) || ( ( extractFloat128Exp( b ) == 0x7FFF ) && ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) ) ) { float_raise( float_flag_invalid ); return 0; } return ( a.low == b.low ) && ( ( a.high == b.high ) || ( ( a.low == 0 ) && ( (bits64) ( ( a.high | b.high )<<1 ) == 0 ) ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is less than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float128_le_quiet( float128 a, float128 b ) { flag aSign, bSign; if ( ( ( extractFloat128Exp( a ) == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) || ( ( extractFloat128Exp( b ) == 0x7FFF ) && ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) ) ) { if ( float128_is_signaling_nan( a ) || float128_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloat128Sign( a ); bSign = extractFloat128Sign( b ); if ( aSign != bSign ) { return aSign || ( ( ( (bits64) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) == 0 ); } return aSign ? le128( b.high, b.low, a.high, a.low ) : le128( a.high, a.low, b.high, b.low ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is less than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an exception. 
Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. ------------------------------------------------------------------------------- */ flag float128_lt_quiet( float128 a, float128 b ) { flag aSign, bSign; if ( ( ( extractFloat128Exp( a ) == 0x7FFF ) && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) || ( ( extractFloat128Exp( b ) == 0x7FFF ) && ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) ) ) { if ( float128_is_signaling_nan( a ) || float128_is_signaling_nan( b ) ) { float_raise( float_flag_invalid ); } return 0; } aSign = extractFloat128Sign( a ); bSign = extractFloat128Sign( b ); if ( aSign != bSign ) { return aSign && ( ( ( (bits64) ( ( a.high | b.high )<<1 ) ) | a.low | b.low ) != 0 ); } return aSign ? lt128( b.high, b.low, a.high, a.low ) : lt128( a.high, a.low, b.high, b.low ); } #endif #if defined(SOFTFLOAT_FOR_GCC) && defined(SOFTFLOAT_NEED_FIXUNS) /* * These two routines are not part of the original softfloat distribution. * * They are based on the corresponding conversions to integer but return * unsigned numbers instead since these functions are required by GCC. * * Added by Mark Brinicombe 27/09/97 * * float64 version overhauled for SoftFloat 2a [bjh21 2000-07-15] */ /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point value `a' to the 32-bit unsigned integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-point Arithmetic, except that the conversion is always rounded toward zero. If `a' is a NaN, the largest positive integer is returned. If the conversion overflows, the largest integer positive is returned. 
------------------------------------------------------------------------------- */ uint32 float64_to_uint32_round_to_zero( float64 a ) { flag aSign; int16 aExp, shiftCount; bits64 aSig, savedASig; uint32 z; aSig = extractFloat64Frac( a ); aExp = extractFloat64Exp( a ); aSign = extractFloat64Sign( a ); if (aSign) { float_raise( float_flag_invalid ); return(0); } if ( 0x41E < aExp ) { float_raise( float_flag_invalid ); return 0xffffffff; } else if ( aExp < 0x3FF ) { if ( aExp || aSig ) float_exception_flags |= float_flag_inexact; return 0; } aSig |= LIT64( 0x0010000000000000 ); shiftCount = 0x433 - aExp; savedASig = aSig; aSig >>= shiftCount; z = aSig; if ( ( aSig<>( - shiftCount ); if ( aSig<<( shiftCount & 31 ) ) { float_exception_flags |= float_flag_inexact; } return z; } #endif Index: head/lib/libc/softfloat/eqtf2.c =================================================================== --- head/lib/libc/softfloat/eqtf2.c (nonexistent) +++ head/lib/libc/softfloat/eqtf2.c (revision 230363) @@ -0,0 +1,24 @@ +/* $NetBSD: eqtf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */ + +/* + * Written by Matt Thomas, 2011. This file is in the Public Domain. 
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#ifdef FLOAT128
+flag __eqtf2(float128, float128);
+
+flag
+__eqtf2(float128 a, float128 b)
+{
+
+	/* libgcc1.c says !(a == b) */
+	return !float128_eq(a, b);
+}
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/eqtf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/getf2.c
===================================================================
--- head/lib/libc/softfloat/getf2.c (nonexistent)
+++ head/lib/libc/softfloat/getf2.c (revision 230363)
@@ -0,0 +1,26 @@
+/* $NetBSD: getf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */
+
+/*
+ * Written by Matt Thomas, 2011. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOAT128
+
+flag __getf2(float128, float128);
+
+flag
+__getf2(float128 a, float128 b)
+{
+
+	/* libgcc1.c says (a >= b) - 1 */
+	return float128_le(b, a) - 1;
+}
+
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/getf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/gexf2.c
===================================================================
--- head/lib/libc/softfloat/gexf2.c (nonexistent)
+++ head/lib/libc/softfloat/gexf2.c (revision 230363)
@@ -0,0 +1,25 @@
+/* $NetBSD: gexf2.c,v 1.2 2004/09/27 10:16:24 he Exp $ */
+
+/*
+ * Written by Ben Harris, 2000. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOATX80
+
+flag __gexf2(floatx80, floatx80);
+
+flag
+__gexf2(floatx80 a, floatx80 b)
+{
+
+	/* libgcc1.c says (a >= b) - 1 */
+	return floatx80_le(b, a) - 1;
+}
+#endif /* FLOATX80 */

Property changes on: head/lib/libc/softfloat/gexf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/gttf2.c
===================================================================
--- head/lib/libc/softfloat/gttf2.c (nonexistent)
+++ head/lib/libc/softfloat/gttf2.c (revision 230363)
@@ -0,0 +1,26 @@
+/* $NetBSD: gttf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */
+
+/*
+ * Written by Matt Thomas, 2011. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOAT128
+
+flag __gttf2(float128, float128);
+
+flag
+__gttf2(float128 a, float128 b)
+{
+
+	/* libgcc1.c says a > b */
+	return float128_lt(b, a);
+}
+
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/gttf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/gtxf2.c
===================================================================
--- head/lib/libc/softfloat/gtxf2.c (nonexistent)
+++ head/lib/libc/softfloat/gtxf2.c (revision 230363)
@@ -0,0 +1,25 @@
+/* $NetBSD: gtxf2.c,v 1.2 2004/09/27 10:16:24 he Exp $ */
+
+/*
+ * Written by Ben Harris, 2000. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOATX80
+
+flag __gtxf2(floatx80, floatx80);
+
+flag
+__gtxf2(floatx80 a, floatx80 b)
+{
+
+	/* libgcc1.c says a > b */
+	return floatx80_lt(b, a);
+}
+#endif /* FLOATX80 */

Property changes on: head/lib/libc/softfloat/gtxf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/letf2.c
===================================================================
--- head/lib/libc/softfloat/letf2.c (nonexistent)
+++ head/lib/libc/softfloat/letf2.c (revision 230363)
@@ -0,0 +1,26 @@
+/* $NetBSD: letf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */
+
+/*
+ * Written by Matt Thomas, 2011. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOAT128
+
+flag __letf2(float128, float128);
+
+flag
+__letf2(float128 a, float128 b)
+{
+
+	/* libgcc1.c says 1 - (a <= b) */
+	return 1 - float128_le(a, b);
+}
+
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/letf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/lttf2.c
===================================================================
--- head/lib/libc/softfloat/lttf2.c (nonexistent)
+++ head/lib/libc/softfloat/lttf2.c (revision 230363)
@@ -0,0 +1,26 @@
+/* $NetBSD: lttf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */
+
+/*
+ * Written by Matt Thomas, 2011. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOAT128
+
+flag __lttf2(float128, float128);
+
+flag
+__lttf2(float128 a, float128 b)
+{
+
+	/* libgcc1.c says -(a < b) */
+	return -float128_lt(a, b);
+}
+
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/lttf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/negtf2.c
===================================================================
--- head/lib/libc/softfloat/negtf2.c (nonexistent)
+++ head/lib/libc/softfloat/negtf2.c (revision 230363)
@@ -0,0 +1,27 @@
+/* $NetBSD: negtf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */
+
+/*
+ * Written by Matt Thomas, 2011. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOAT128
+
+float128 __negtf2(float128);
+
+float128
+__negtf2(float128 a)
+{
+
+	/* libgcc1.c says -a */
+	a.high ^= FLOAT64_MANGLE(0x8000000000000000ULL);
+	return a;
+}
+
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/negtf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/negxf2.c
===================================================================
--- head/lib/libc/softfloat/negxf2.c (nonexistent)
+++ head/lib/libc/softfloat/negxf2.c (revision 230363)
@@ -0,0 +1,25 @@
+/* $NetBSD: negxf2.c,v 1.2 2004/09/27 10:16:24 he Exp $ */
+
+/*
+ * Written by Ben Harris, 2000. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOATX80
+
+floatx80 __negxf2(floatx80);
+
+floatx80
+__negxf2(floatx80 a)
+{
+
+	/* libgcc1.c says -a */
+	return __mulxf3(a,__floatsixf(-1));
+}
+#endif /* FLOATX80 */

Property changes on: head/lib/libc/softfloat/negxf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/netf2.c
===================================================================
--- head/lib/libc/softfloat/netf2.c (nonexistent)
+++ head/lib/libc/softfloat/netf2.c (revision 230363)
@@ -0,0 +1,26 @@
+/* $NetBSD: netf2.c,v 1.1 2011/01/17 10:08:35 matt Exp $ */
+
+/*
+ * Written by Matt Thomas, 2011. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOAT128
+
+flag __netf2(float128, float128);
+
+flag
+__netf2(float128 a, float128 b)
+{
+
+	/* libgcc1.c says a != b */
+	return !float128_eq(a, b);
+}
+
+#endif /* FLOAT128 */

Property changes on: head/lib/libc/softfloat/netf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/nexf2.c
===================================================================
--- head/lib/libc/softfloat/nexf2.c (nonexistent)
+++ head/lib/libc/softfloat/nexf2.c (revision 230363)
@@ -0,0 +1,25 @@
+/* $NetBSD: nexf2.c,v 1.2 2004/09/27 10:16:24 he Exp $ */
+
+/*
+ * Written by Ben Harris, 2000. This file is in the Public Domain.
+ */
+
+#include "softfloat-for-gcc.h"
+#include "milieu.h"
+#include "softfloat.h"
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#ifdef FLOATX80
+
+flag __nexf2(floatx80, floatx80);
+
+flag
+__nexf2(floatx80 a, floatx80 b)
+{
+
+	/* libgcc1.c says a != b */
+	return !floatx80_eq(a, b);
+}
+#endif /* FLOATX80 */

Property changes on: head/lib/libc/softfloat/nexf2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: head/lib/libc/softfloat/softfloat-for-gcc.h
===================================================================
--- head/lib/libc/softfloat/softfloat-for-gcc.h (revision 230362)
+++ head/lib/libc/softfloat/softfloat-for-gcc.h (revision 230363)
@@ -1,43 +1,169 @@
-/* $NetBSD: softfloat-for-gcc.h,v 1.6 2003/07/26 19:24:51 salo Exp $ */
+/* $NetBSD: softfloat-for-gcc.h,v 1.8 2009/12/14 01:07:42 matt Exp $ */
 /* $FreeBSD$ */
 /*
  * Move private identifiers with external linkage into implementation
  * namespace.
-- Klaus Klein , May 5, 1999 */ #define float_exception_flags __softfloat_float_exception_flags #define float_exception_mask __softfloat_float_exception_mask #define float_rounding_mode __softfloat_float_rounding_mode #define float_raise __softfloat_float_raise /* The following batch are called by GCC through wrappers */ #define float32_eq __softfloat_float32_eq #define float32_le __softfloat_float32_le #define float32_lt __softfloat_float32_lt #define float64_eq __softfloat_float64_eq #define float64_le __softfloat_float64_le #define float64_lt __softfloat_float64_lt +#define float128_eq __softfloat_float128_eq +#define float128_le __softfloat_float128_le +#define float128_lt __softfloat_float128_lt /* * Macros to define functions with the GCC expected names */ #define float32_add __addsf3 #define float64_add __adddf3 +#define floatx80_add __addxf3 +#define float128_add __addtf3 + #define float32_sub __subsf3 #define float64_sub __subdf3 +#define floatx80_sub __subxf3 +#define float128_sub __subtf3 + #define float32_mul __mulsf3 #define float64_mul __muldf3 +#define floatx80_mul __mulxf3 +#define float128_mul __multf3 + #define float32_div __divsf3 #define float64_div __divdf3 +#define floatx80_div __divxf3 +#define float128_div __divtf3 + +#if 0 +#define float32_neg __negsf2 +#define float64_neg __negdf2 +#define floatx80_neg __negxf2 +#define float128_neg __negtf2 +#endif + #define int32_to_float32 __floatsisf #define int32_to_float64 __floatsidf +#define int32_to_floatx80 __floatsixf +#define int32_to_float128 __floatsitf + #define int64_to_float32 __floatdisf #define int64_to_float64 __floatdidf +#define int64_to_floatx80 __floatdixf +#define int64_to_float128 __floatditf + +#define int128_to_float32 __floattisf +#define int128_to_float64 __floattidf +#define int128_to_floatx80 __floattixf +#define int128_to_float128 __floattitf + +#define uint32_to_float32 __floatunsisf +#define uint32_to_float64 __floatunsidf +#define uint32_to_floatx80 __floatunsixf 
+#define uint32_to_float128 __floatunsitf + +#define uint64_to_float32 __floatundisf +#define uint64_to_float64 __floatundidf +#define uint64_to_floatx80 __floatundixf +#define uint64_to_float128 __floatunditf + +#define uint128_to_float32 __floatuntisf +#define uint128_to_float64 __floatuntidf +#define uint128_to_floatx80 __floatuntixf +#define uint128_to_float128 __floatuntitf + #define float32_to_int32_round_to_zero __fixsfsi #define float64_to_int32_round_to_zero __fixdfsi +#define floatx80_to_int32_round_to_zero __fixxfsi +#define float128_to_int32_round_to_zero __fixtfsi + #define float32_to_int64_round_to_zero __fixsfdi #define float64_to_int64_round_to_zero __fixdfdi +#define floatx80_to_int64_round_to_zero __fixxfdi +#define float128_to_int64_round_to_zero __fixtfdi + +#define float32_to_int128_round_to_zero __fixsfti +#define float64_to_int128_round_to_zero __fixdfti +#define floatx80_to_int128_round_to_zero __fixxfti +#define float128_to_int128_round_to_zero __fixtfti + #define float32_to_uint32_round_to_zero __fixunssfsi #define float64_to_uint32_round_to_zero __fixunsdfsi +#define floatx80_to_uint32_round_to_zero __fixunsxfsi +#define float128_to_uint32_round_to_zero __fixunstfsi + +#define float32_to_uint64_round_to_zero __fixunssfdi +#define float64_to_uint64_round_to_zero __fixunsdfdi +#define floatx80_to_uint64_round_to_zero __fixunsxfdi +#define float128_to_uint64_round_to_zero __fixunstfdi + +#define float32_to_uint128_round_to_zero __fixunssfti +#define float64_to_uint128_round_to_zero __fixunsdfti +#define floatx80_to_uint128_round_to_zero __fixunsxfti +#define float128_to_uint128_round_to_zero __fixunstfti + #define float32_to_float64 __extendsfdf2 +#define float32_to_floatx80 __extendsfxf2 +#define float32_to_float128 __extendsftf2 +#define float64_to_floatx80 __extenddfxf2 +#define float64_to_float128 __extenddftf2 + +#define float128_to_float64 __trunctfdf2 +#define floatx80_to_float64 __truncxfdf2 +#define float128_to_float32 __trunctfsf2 
+#define floatx80_to_float32 __truncxfsf2 #define float64_to_float32 __truncdfsf2 + +#if 0 +#define float32_cmp __cmpsf2 +#define float32_unord __unordsf2 +#define float32_eq __eqsf2 +#define float32_ne __nesf2 +#define float32_ge __gesf2 +#define float32_lt __ltsf2 +#define float32_le __lesf2 +#define float32_gt __gtsf2 +#endif + +#if 0 +#define float64_cmp __cmpdf2 +#define float64_unord __unorddf2 +#define float64_eq __eqdf2 +#define float64_ne __nedf2 +#define float64_ge __gedf2 +#define float64_lt __ltdf2 +#define float64_le __ledf2 +#define float64_gt __gtdf2 +#endif + +/* XXX not in libgcc */ +#if 1 +#define floatx80_cmp __cmpxf2 +#define floatx80_unord __unordxf2 +#define floatx80_eq __eqxf2 +#define floatx80_ne __nexf2 +#define floatx80_ge __gexf2 +#define floatx80_lt __ltxf2 +#define floatx80_le __lexf2 +#define floatx80_gt __gtxf2 +#endif + +#if 0 +#define float128_cmp __cmptf2 +#define float128_unord __unordtf2 +#define float128_eq __eqtf2 +#define float128_ne __netf2 +#define float128_ge __getf2 +#define float128_lt __lttf2 +#define float128_le __letf2 +#define float128_gt __gttf2 +#endif Index: head/lib/libc/softfloat/softfloat-source.txt =================================================================== --- head/lib/libc/softfloat/softfloat-source.txt (revision 230362) +++ head/lib/libc/softfloat/softfloat-source.txt (revision 230363) @@ -1,384 +1,384 @@ -$NetBSD: softfloat-source.txt,v 1.1 2000/06/06 08:15:10 bjh21 Exp $ +$NetBSD: softfloat-source.txt,v 1.2 2006/11/24 19:46:58 christos Exp $ $FreeBSD$ SoftFloat Release 2a Source Documentation John R. Hauser 1998 December 14 ------------------------------------------------------------------------------- Introduction SoftFloat is a software implementation of floating-point that conforms to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. SoftFloat can support four floating-point formats: single precision, double precision, extended double precision, and quadruple precision. 
All operations required by the IEEE Standard are implemented, except for conversions to and from decimal. SoftFloat is distributed in the form of C source code, so a C compiler is needed to compile the code. Support for the extended double- precision and quadruple-precision formats is dependent on the C compiler implementing a 64-bit integer type. This document gives information needed for compiling and/or porting SoftFloat. The source code for SoftFloat is intended to be relatively machine- independent and should be compilable using any ISO/ANSI C compiler. At the time of this writing, SoftFloat has been successfully compiled with the GNU C Compiler (`gcc') for several platforms. ------------------------------------------------------------------------------- Limitations SoftFloat as written requires an ISO/ANSI-style C compiler. No attempt has -been made to accomodate compilers that are not ISO-conformant. Older ``K&R- +been made to accommodate compilers that are not ISO-conformant. Older ``K&R- style'' compilers are not adequate for compiling SoftFloat. All testing I have done so far has been with the GNU C Compiler. Compilation with other compilers should be possible but has not been tested. The SoftFloat sources assume that source code file names can be longer than 8 characters. In order to compile under an MS-DOS-type system, many of the source files will need to be renamed, and the source and makefiles edited appropriately. Once compiled, the SoftFloat binary does not depend on the existence of long file names. The underlying machine is assumed to be binary with a word size that is a power of 2. Bytes are 8 bits. Support for the extended double-precision and quadruple-precision formats depends on the C compiler implementing a 64-bit integer type. If the largest integer type supported by the C compiler is 32 bits, SoftFloat is limited to the single- and double- precision formats. 
------------------------------------------------------------------------------- Contents Introduction Limitations Contents Legal Notice SoftFloat Source Directory Structure SoftFloat Source Files processors/*.h softfloat/bits*/*/softfloat.h softfloat/bits*/*/milieu.h softfloat/bits*/*/softfloat-specialize softfloat/bits*/softfloat-macros softfloat/bits*/softfloat.c Steps to Creating a `softfloat.o' Making `softfloat.o' a Library Testing SoftFloat Timing SoftFloat Compiler Options and Efficiency Processor-Specific Optimization of `softfloat.c' Using `softfloat-macros' Contact Information ------------------------------------------------------------------------------- Legal Notice SoftFloat was written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding was partially provided by the National Science Foundation under grant MIP-9311980. The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. ------------------------------------------------------------------------------- SoftFloat Source Directory Structure Because SoftFloat is targeted to multiple platforms, its source code is slightly scattered between target-specific and target-independent directories and files. 
 The directory structure is as follows:

     processors
     softfloat
         bits64
             templates
             386-Win32-gcc
             SPARC-Solaris-gcc
         bits32
             templates
             386-Win32-gcc
             SPARC-Solaris-gcc

 The two topmost directories and their contents are:

     softfloat    - Most of the source code needed for SoftFloat.
     processors   - Target-specific header files that are not specific to
                        SoftFloat.

 The `softfloat' directory is further split into two parts:

     bits64       - SoftFloat implementation using 64-bit integers.
     bits32       - SoftFloat implementation using only 32-bit integers.

 Within these directories are subdirectories for each of the targeted
 platforms.  The SoftFloat source code is distributed with targets
 `386-Win32-gcc' and `SPARC-Solaris-gcc' (and perhaps others) already
 prepared for both the 32-bit and 64-bit implementations.  Source files that
 are not within these target-specific subdirectories are intended to be
 target-independent.

 The naming convention used for the target-specific directories is
 `<processor>-<executable-type>-<compiler>'.  The names of the supplied
 target directories should be interpreted as follows:

   <processor>:
     386          - Intel 386-compatible processor.
     SPARC        - SPARC processor (as used by Sun machines).
   <executable-type>:
     Win32        - Microsoft Win32 executable.
     Solaris      - Sun Solaris executable.
   <compiler>:
     gcc          - GNU C Compiler.

 You do not need to maintain this convention if you do not want to.

 Alongside the supplied target-specific directories is a `templates'
 directory containing a set of ``generic'' target-specific source files.  A
 new target directory can be created by copying the `templates' directory and
 editing the files inside.  (Complete instructions for porting SoftFloat to a
 new target are in the section _Steps_to_Creating_a_`softfloat.o'_.)  Note
 that the `templates' directory will not work as a target directory without
 some editing.  To avoid confusion, it would be wise to refrain from editing
 the files inside `templates' directly.
 -------------------------------------------------------------------------------
 SoftFloat Source Files

 The purpose of each source file is described below.  In the following, the
 `*' symbol is used in place of the name of a specific target, such as
 `386-Win32-gcc' or `SPARC-Solaris-gcc', or in place of some other text, as
 in `bits*' for either `bits32' or `bits64'.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 processors/*.h

 The target-specific `processors' header file defines integer types of
 various sizes, and also defines certain C preprocessor macros that
 characterize the target.  The two examples supplied are `386-gcc.h' and
 `SPARC-gcc.h'.  The naming convention used for processor header files is
 `<processor>-<compiler>.h'.

 If 64-bit integers are supported by the compiler, the macro name `BITS64'
 should be defined here along with the corresponding 64-bit integer types.
 In addition, the function-like macro `LIT64' must be defined for
 constructing 64-bit integer literals (constants).  The `LIT64' macro is used
 consistently in the SoftFloat code to annotate 64-bit literals.

 If `BITS64' is not defined, only the 32-bit version of SoftFloat can be
 compiled.  If `BITS64' _is_ defined, either can be compiled.

 If an inlining attribute (such as an `inline' keyword) is provided by the
 compiler, the macro `INLINE' should be defined to the appropriate keyword.
 If not, `INLINE' can be set to the keyword `static'.  The `INLINE' macro
 appears in the SoftFloat source code before every function that should be
 inlined by the compiler.  SoftFloat depends on inlining to obtain good
 speed.  Even if inlining cannot be forced with a language keyword, the
 compiler may still be able to perform inlining on its own as an
 optimization.  If a command-line option is needed to convince the compiler
 to perform this optimization, this should be assured in the makefile.  (See
 the section _Compiler_Options_and_Efficiency_ below.)
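As a rough illustration of the macros described above, a `processors' header for a compiler that has a 64-bit integer type might look like the sketch below. The type spellings (`unsigned long long', the `LL' suffix, `static inline') assume a C99-style compiler, and `float64_hidden_bit' is a hypothetical helper added only to show `LIT64' and `INLINE' in use; none of this is copied from a shipped header.

```c
/* Hypothetical processors/<target>.h; spellings assume a C99-style compiler. */
typedef unsigned int bits32;        /* assumed to be exactly 32 bits wide */
typedef signed int sbits32;

#define BITS64                      /* a 64-bit integer type is available */
typedef unsigned long long bits64;
typedef signed long long sbits64;

/* Build a 64-bit literal; SoftFloat wraps 64-bit constants in LIT64. */
#define LIT64( a ) a##LL

/* Keyword placed before every function that should be inlined. */
#define INLINE static inline

/* Hypothetical helper: the implicit integer bit of a float64 significand. */
INLINE bits64 float64_hidden_bit( void )
{
    return LIT64( 0x0010000000000000 );
}
```

If the compiler had no 64-bit type, `BITS64', `bits64' and `LIT64' would simply be left out, and only the `bits32' implementation could be built.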
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - softfloat/bits*/*/softfloat.h The target-specific `softfloat.h' header file defines the SoftFloat interface as seen by clients. Unlike the actual function definitions in `softfloat.c', the declarations in `softfloat.h' do not use any of the types defined by the `processors' header file. This is done so that clients will not have to include the `processors' header file in order to use SoftFloat. Nevertheless, the target-specific declarations in `softfloat.h' must match what `softfloat.c' expects. For example, if `int32' is defined as `int' in the `processors' header file, then in `softfloat.h' the output of `float32_to_int32' should be stated as `int', although in `softfloat.c' it is given in target- independent form as `int32'. For the `bits64' implementation of SoftFloat, the macro names `FLOATX80' and `FLOAT128' must be defined in order for the extended double-precision and quadruple-precision formats to be enabled in the code. Conversely, either or both of the extended formats can be disabled by simply removing the `#define' of the respective macro. When an extended format is not enabled, none of the functions that either input or output the format are defined, and no space is taken up in `softfloat.o' by such functions. There is no provision for disabling the usual single- and double-precision formats. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - softfloat/bits*/*/milieu.h The target-specific `milieu.h' header file provides declarations that are needed to compile SoftFloat. In addition, deviations from ISO/ANSI C by the compiler (such as names not properly declared in system header files) are corrected in this header if possible. 
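The relationship between `softfloat.h', the `FLOATX80'/`FLOAT128' switches, and the `processors' types can be sketched as follows. This is an assumption-laden excerpt, not a supplied header: the field order inside the structures depends on the target's byte order, and `int' is assumed to be the type that `int32' names in the `processors' header.

```c
/* Hypothetical excerpt of a target-specific softfloat.h. */
#define FLOATX80    /* remove to disable the extended double-precision format */
#define FLOAT128    /* remove to disable the quadruple-precision format */

/* Plain C types only: clients must not need the `processors' header. */
typedef unsigned int float32;
typedef unsigned long long float64;

#ifdef FLOATX80
typedef struct {
    unsigned short high;            /* sign and exponent */
    unsigned long long low;         /* significand */
} floatx80;
#endif

#ifdef FLOAT128
typedef struct {
    unsigned long long high, low;   /* field order is target-dependent */
} float128;
#endif

/* Declared with `int'; softfloat.c defines the same function with `int32'. */
int float32_to_int32( float32 );
```

With either `#define' removed, the matching structure and every function that takes or returns it would disappear from the interface.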
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - softfloat/bits*/*/softfloat-specialize This target-specific C source fragment defines: -- whether tininess for underflow is detected before or after rounding by default; -- what (if anything) special happens when exceptions are raised; -- how signaling NaNs are distinguished from quiet NaNs; -- the default generated quiet NaNs; and -- how NaNs are propagated from function inputs to output. These details are not decided by the IEC/IEEE Standard. This fragment is included verbatim within `softfloat.c' when SoftFloat is compiled. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - softfloat/bits*/softfloat-macros This target-independent C source fragment defines a number of arithmetic functions used as primitives within the `softfloat.c' source. Most of the functions defined here are intended to be inlined for efficiency. This fragment is included verbatim within `softfloat.c' when SoftFloat is compiled. Target-specific variations on this file are possible. See the section _Processor-Specific_Optimization_of_`softfloat.c'_Using_`softfloat-macros'_ below. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - softfloat/bits*/softfloat.c The target-independent `softfloat.c' source file contains the body of the SoftFloat implementation. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - The inclusion of the files above within each other (using `#include') can be shown graphically as follows: softfloat/bits*/softfloat.c softfloat/bits*/*/milieu.h processors/*.h softfloat/bits*/*/softfloat.h softfloat/bits*/*/softfloat-specialize softfloat/bits*/softfloat-macros Note in particular that `softfloat.c' does not include the `processors' header file directly. Rather, `softfloat.c' includes the target-specific `milieu.h' header file, which in turn includes the processor header file. 
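One of the choices that `softfloat-specialize' makes, how signaling NaNs are told apart from quiet NaNs, can be illustrated for single precision. The sketch below assumes the common convention in which a NaN is signaling when the most significant fraction bit is clear; some processors use the opposite sense, so a given target's fragment may differ. The `float32' typedef is an assumption standing in for the real interface type.

```c
typedef unsigned int float32;   /* assumed to be exactly 32 bits wide */

/*
 * Returns 1 if `a' is a signaling NaN, otherwise 0.
 * Bits 30..22 hold the exponent and the top fraction bit; 0x1FE means
 * "exponent all ones, top fraction bit clear".  The remaining 22 fraction
 * bits must be nonzero, otherwise the value is an infinity.
 */
int float32_is_signaling_nan( float32 a )
{
    return ( ( ( a>>22 ) & 0x1FF ) == 0x1FE ) && ( a & 0x003FFFFF );
}
```

Under this convention 0x7F800001 is a signaling NaN, while 0x7FC00000 (quiet NaN) and 0x7F800000 (infinity) are not.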
 -------------------------------------------------------------------------------
 Steps to Creating a `softfloat.o'

 Porting and/or compiling SoftFloat involves the following steps:

 1. If one does not already exist, create an appropriate
    `<processor>-<compiler>.h' file in the `processors' directory.

 2. If `BITS64' is defined in the `processors' header file, choose whether
    to compile the 32-bit or 64-bit implementation of SoftFloat.  If
    `BITS64' is not defined, your only choice is the 32-bit implementation.
    The remaining steps occur within either the `bits32' or `bits64'
    subdirectories.

 3. If one does not already exist, create an appropriate target-specific
    subdirectory by copying the given `templates' directory.

 4. In the target-specific subdirectory, edit the files
    `softfloat-specialize' and `softfloat.h' to define the desired exception
    handling functions and mode control values.  In the `softfloat.h' header
    file, ensure also that all declarations give the proper target-specific
    type (such as `int' or `long') corresponding to the target-independent
    type used in `softfloat.c' (such as `int32').  None of the type names
    declared in the `processors' header file should appear in `softfloat.h'.

 5. In the target-specific subdirectory, edit the files `milieu.h' and
    `Makefile' to reflect the current environment.

 6. In the target-specific subdirectory, execute `make'.

 For the targets that are supplied, if the expected compiler is available
 (usually `gcc'), it should only be necessary to execute `make' in the
 target-specific subdirectory.

 -------------------------------------------------------------------------------
 Making `softfloat.o' a Library

 SoftFloat is not made into a software library by the supplied makefile.
 If desired, `softfloat.o' can easily be put into its own library (in Unix,
 `softfloat.a') using the usual system tool (in Unix, `ar').
------------------------------------------------------------------------------- Testing SoftFloat SoftFloat can be tested using the `testsoftfloat' program by the same author. The `testsoftfloat' program is part of the TestFloat package available at the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/ TestFloat.html'. ------------------------------------------------------------------------------- Timing SoftFloat A program called `timesoftfloat' for timing the SoftFloat functions is included with the SoftFloat source code. Compiling `timesoftfloat' should pose no difficulties once `softfloat.o' exists. The supplied makefile will create a `timesoftfloat' executable by default after generating `softfloat.o'. See `timesoftfloat.txt' for documentation about using `timesoftfloat'. ------------------------------------------------------------------------------- Compiler Options and Efficiency In order to get good speed with SoftFloat, it is important that the compiler inline the routines that have been marked `INLINE' in the code. Even if inlining cannot be forced by an appropriate definition of the `INLINE' macro, the compiler may still be able to perform inlining on its own as an optimization. In that case, the makefile should be edited to give the compiler whatever option is required to cause it to inline small functions. The ability of the processor to do fast shifts has been assumed. Efficiency will not be as good on processors for which this is not the case (such as the original Motorola 68000 or Intel 8086 processors). ------------------------------------------------------------------------------- Processor-Specific Optimization of `softfloat.c' Using `softfloat-macros' The `softfloat-macros' source fragment defines arithmetic functions used as primitives by `softfloat.c'. This file has been written in a target- independent form. 
For a given target, it may be possible to improve on these functions using target-specific and/or non-ISO-C features (such as `asm' statements). For example, one of the ``macro'' functions takes two word-size integers and returns their full product in two words. This operation can be done directly in hardware on many processors; but because it is not available through standard C, the function defined in `softfloat-macros' uses four multiplies to achieve the same result. To address these shortcomings, a customized version of `softfloat-macros' can be created in any of the target-specific subdirectories. A simple modification to the target's makefile should be sufficient to ensure that the custom version is used instead of the generic one. ------------------------------------------------------------------------------- Contact Information At the time of this writing, the most up-to-date information about SoftFloat and the latest release can be found at the Web page `http:// HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'. Index: head/lib/libc/softfloat/softfloat-specialize =================================================================== --- head/lib/libc/softfloat/softfloat-specialize (revision 230362) +++ head/lib/libc/softfloat/softfloat-specialize (revision 230363) @@ -1,494 +1,521 @@ -/* $NetBSD: softfloat-specialize,v 1.3 2002/05/12 13:12:45 bjh21 Exp $ */ +/* $NetBSD: softfloat-specialize,v 1.6 2011/03/06 10:27:37 martin Exp $ */ /* $FreeBSD$ */ /* This is a derivative work. */ /* =============================================================================== This C source fragment is part of the SoftFloat IEC/IEEE Floating-point Arithmetic Package, Release 2a. Written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding was partially provided by the National Science Foundation under grant MIP-9311980. 
The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. More information is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'. THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. Derivative works are acceptable, even for commercial purposes, so long as (1) they include prominent notice that the work is derivative, and (2) they include prominent notice akin to these four paragraphs for those parts of this code that are retained. =============================================================================== */ #include <signal.h> +#include <string.h> +#include <unistd.h> /* ------------------------------------------------------------------------------- Underflow tininess-detection mode, statically initialized to default value. (The declaration in `softfloat.h' must match the `int8' type here.) ------------------------------------------------------------------------------- */ #ifdef SOFTFLOAT_FOR_GCC static #endif #ifdef __sparc64__ int8 float_detect_tininess = float_tininess_before_rounding; #else int8 float_detect_tininess = float_tininess_after_rounding; #endif /* ------------------------------------------------------------------------------- Raises the exceptions specified by `flags'. Floating-point traps can be defined here if desired. It is currently not possible for such a trap to substitute a result value. If traps are not implemented, this routine should be simply `float_exception_flags |= flags;'.
------------------------------------------------------------------------------- */ +#ifdef SOFTFLOAT_FOR_GCC +#define float_exception_mask __softfloat_float_exception_mask +#endif int float_exception_mask = 0; void float_raise( int flags ) { float_exception_flags |= flags; if ( flags & float_exception_mask ) { +#if 0 + siginfo_t info; + memset(&info, 0, sizeof info); + info.si_signo = SIGFPE; + info.si_pid = getpid(); + info.si_uid = geteuid(); + if (flags & float_flag_underflow) + info.si_code = FPE_FLTUND; + else if (flags & float_flag_overflow) + info.si_code = FPE_FLTOVF; + else if (flags & float_flag_divbyzero) + info.si_code = FPE_FLTDIV; + else if (flags & float_flag_invalid) + info.si_code = FPE_FLTINV; + else if (flags & float_flag_inexact) + info.si_code = FPE_FLTRES; + sigqueueinfo(getpid(), &info); +#else raise( SIGFPE ); +#endif } } +#undef float_exception_mask /* ------------------------------------------------------------------------------- Internal canonical NaN format. ------------------------------------------------------------------------------- */ typedef struct { flag sign; bits64 high, low; } commonNaNT; /* ------------------------------------------------------------------------------- The pattern for a default generated single-precision NaN. ------------------------------------------------------------------------------- */ #define float32_default_nan 0xFFFFFFFF /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is a NaN; otherwise returns 0. ------------------------------------------------------------------------------- */ #ifdef SOFTFLOAT_FOR_GCC static #endif flag float32_is_nan( float32 a ) { return ( 0xFF000000 < (bits32) ( a<<1 ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the single-precision floating-point value `a' is a signaling NaN; otherwise returns 0. 
------------------------------------------------------------------------------- */ -#if defined(SOFTFLOAT_FOR_GCC) && !defined(SOFTFLOATSPARC64_FOR_GCC) +#if defined(SOFTFLOAT_FOR_GCC) && !defined(SOFTFLOATSPARC64_FOR_GCC) && \ + !defined(SOFTFLOAT_M68K_FOR_GCC) static #endif flag float32_is_signaling_nan( float32 a ) { return ( ( ( a>>22 ) & 0x1FF ) == 0x1FE ) && ( a & 0x003FFFFF ); } /* ------------------------------------------------------------------------------- Returns the result of converting the single-precision floating-point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid exception is raised. ------------------------------------------------------------------------------- */ static commonNaNT float32ToCommonNaN( float32 a ) { commonNaNT z; if ( float32_is_signaling_nan( a ) ) float_raise( float_flag_invalid ); z.sign = a>>31; z.low = 0; z.high = ( (bits64) a )<<41; return z; } /* ------------------------------------------------------------------------------- Returns the result of converting the canonical NaN `a' to the single- precision floating-point format. ------------------------------------------------------------------------------- */ static float32 commonNaNToFloat32( commonNaNT a ) { return ( ( (bits32) a.sign )<<31 ) | 0x7FC00000 | ( a.high>>41 ); } /* ------------------------------------------------------------------------------- Takes two single-precision floating-point values `a' and `b', one of which is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a signaling NaN, the invalid exception is raised. 
------------------------------------------------------------------------------- */ static float32 propagateFloat32NaN( float32 a, float32 b ) { flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN; aIsNaN = float32_is_nan( a ); aIsSignalingNaN = float32_is_signaling_nan( a ); bIsNaN = float32_is_nan( b ); bIsSignalingNaN = float32_is_signaling_nan( b ); a |= 0x00400000; b |= 0x00400000; if ( aIsSignalingNaN | bIsSignalingNaN ) float_raise( float_flag_invalid ); if ( aIsNaN ) { return ( aIsSignalingNaN & bIsNaN ) ? b : a; } else { return b; } } /* ------------------------------------------------------------------------------- The pattern for a default generated double-precision NaN. ------------------------------------------------------------------------------- */ #define float64_default_nan LIT64( 0xFFFFFFFFFFFFFFFF ) /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is a NaN; otherwise returns 0. ------------------------------------------------------------------------------- */ #ifdef SOFTFLOAT_FOR_GCC static #endif flag float64_is_nan( float64 a ) { return ( LIT64( 0xFFE0000000000000 ) < (bits64) ( FLOAT64_DEMANGLE(a)<<1 ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the double-precision floating-point value `a' is a signaling NaN; otherwise returns 0. 
------------------------------------------------------------------------------- */ -#if defined(SOFTFLOAT_FOR_GCC) && !defined(SOFTFLOATSPARC64_FOR_GCC) +#if defined(SOFTFLOAT_FOR_GCC) && !defined(SOFTFLOATSPARC64_FOR_GCC) && \ + !defined(SOFTFLOATM68K_FOR_GCC) static #endif flag float64_is_signaling_nan( float64 a ) { return ( ( ( FLOAT64_DEMANGLE(a)>>51 ) & 0xFFF ) == 0xFFE ) && ( FLOAT64_DEMANGLE(a) & LIT64( 0x0007FFFFFFFFFFFF ) ); } /* ------------------------------------------------------------------------------- Returns the result of converting the double-precision floating-point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid exception is raised. ------------------------------------------------------------------------------- */ static commonNaNT float64ToCommonNaN( float64 a ) { commonNaNT z; if ( float64_is_signaling_nan( a ) ) float_raise( float_flag_invalid ); z.sign = FLOAT64_DEMANGLE(a)>>63; z.low = 0; z.high = FLOAT64_DEMANGLE(a)<<12; return z; } /* ------------------------------------------------------------------------------- Returns the result of converting the canonical NaN `a' to the double- precision floating-point format. ------------------------------------------------------------------------------- */ static float64 commonNaNToFloat64( commonNaNT a ) { return FLOAT64_MANGLE( ( ( (bits64) a.sign )<<63 ) | LIT64( 0x7FF8000000000000 ) | ( a.high>>12 ) ); } /* ------------------------------------------------------------------------------- Takes two double-precision floating-point values `a' and `b', one of which is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a signaling NaN, the invalid exception is raised. 
------------------------------------------------------------------------------- */ static float64 propagateFloat64NaN( float64 a, float64 b ) { flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN; aIsNaN = float64_is_nan( a ); aIsSignalingNaN = float64_is_signaling_nan( a ); bIsNaN = float64_is_nan( b ); bIsSignalingNaN = float64_is_signaling_nan( b ); a |= FLOAT64_MANGLE(LIT64( 0x0008000000000000 )); b |= FLOAT64_MANGLE(LIT64( 0x0008000000000000 )); if ( aIsSignalingNaN | bIsSignalingNaN ) float_raise( float_flag_invalid ); if ( aIsNaN ) { return ( aIsSignalingNaN & bIsNaN ) ? b : a; } else { return b; } } #ifdef FLOATX80 /* ------------------------------------------------------------------------------- The pattern for a default generated extended double-precision NaN. The `high' and `low' values hold the most- and least-significant bits, respectively. ------------------------------------------------------------------------------- */ #define floatx80_default_nan_high 0xFFFF #define floatx80_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is a NaN; otherwise returns 0. ------------------------------------------------------------------------------- */ flag floatx80_is_nan( floatx80 a ) { return ( ( a.high & 0x7FFF ) == 0x7FFF ) && (bits64) ( a.low<<1 ); } /* ------------------------------------------------------------------------------- Returns 1 if the extended double-precision floating-point value `a' is a signaling NaN; otherwise returns 0. 
------------------------------------------------------------------------------- */ flag floatx80_is_signaling_nan( floatx80 a ) { bits64 aLow; aLow = a.low & ~ LIT64( 0x4000000000000000 ); return ( ( a.high & 0x7FFF ) == 0x7FFF ) && (bits64) ( aLow<<1 ) && ( a.low == aLow ); } /* ------------------------------------------------------------------------------- Returns the result of converting the extended double-precision floating- point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid exception is raised. ------------------------------------------------------------------------------- */ static commonNaNT floatx80ToCommonNaN( floatx80 a ) { commonNaNT z; if ( floatx80_is_signaling_nan( a ) ) float_raise( float_flag_invalid ); z.sign = a.high>>15; z.low = 0; z.high = a.low<<1; return z; } /* ------------------------------------------------------------------------------- Returns the result of converting the canonical NaN `a' to the extended double-precision floating-point format. ------------------------------------------------------------------------------- */ static floatx80 commonNaNToFloatx80( commonNaNT a ) { floatx80 z; z.low = LIT64( 0xC000000000000000 ) | ( a.high>>1 ); z.high = ( ( (bits16) a.sign )<<15 ) | 0x7FFF; return z; } /* ------------------------------------------------------------------------------- Takes two extended double-precision floating-point values `a' and `b', one of which is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a signaling NaN, the invalid exception is raised. 
------------------------------------------------------------------------------- */ static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b ) { flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN; aIsNaN = floatx80_is_nan( a ); aIsSignalingNaN = floatx80_is_signaling_nan( a ); bIsNaN = floatx80_is_nan( b ); bIsSignalingNaN = floatx80_is_signaling_nan( b ); a.low |= LIT64( 0xC000000000000000 ); b.low |= LIT64( 0xC000000000000000 ); if ( aIsSignalingNaN | bIsSignalingNaN ) float_raise( float_flag_invalid ); if ( aIsNaN ) { return ( aIsSignalingNaN & bIsNaN ) ? b : a; } else { return b; } } #endif #ifdef FLOAT128 /* ------------------------------------------------------------------------------- The pattern for a default generated quadruple-precision NaN. The `high' and `low' values hold the most- and least-significant bits, respectively. ------------------------------------------------------------------------------- */ #define float128_default_nan_high LIT64( 0xFFFFFFFFFFFFFFFF ) #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is a NaN; otherwise returns 0. ------------------------------------------------------------------------------- */ flag float128_is_nan( float128 a ) { return ( LIT64( 0xFFFE000000000000 ) <= (bits64) ( a.high<<1 ) ) && ( a.low || ( a.high & LIT64( 0x0000FFFFFFFFFFFF ) ) ); } /* ------------------------------------------------------------------------------- Returns 1 if the quadruple-precision floating-point value `a' is a signaling NaN; otherwise returns 0. 
------------------------------------------------------------------------------- */ flag float128_is_signaling_nan( float128 a ) { return ( ( ( a.high>>47 ) & 0xFFFF ) == 0xFFFE ) && ( a.low || ( a.high & LIT64( 0x00007FFFFFFFFFFF ) ) ); } /* ------------------------------------------------------------------------------- Returns the result of converting the quadruple-precision floating-point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid exception is raised. ------------------------------------------------------------------------------- */ static commonNaNT float128ToCommonNaN( float128 a ) { commonNaNT z; if ( float128_is_signaling_nan( a ) ) float_raise( float_flag_invalid ); z.sign = a.high>>63; shortShift128Left( a.high, a.low, 16, &z.high, &z.low ); return z; } /* ------------------------------------------------------------------------------- Returns the result of converting the canonical NaN `a' to the quadruple- precision floating-point format. ------------------------------------------------------------------------------- */ static float128 commonNaNToFloat128( commonNaNT a ) { float128 z; shift128Right( a.high, a.low, 16, &z.high, &z.low ); z.high |= ( ( (bits64) a.sign )<<63 ) | LIT64( 0x7FFF800000000000 ); return z; } /* ------------------------------------------------------------------------------- Takes two quadruple-precision floating-point values `a' and `b', one of which is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a signaling NaN, the invalid exception is raised. 
------------------------------------------------------------------------------- */ static float128 propagateFloat128NaN( float128 a, float128 b ) { flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN; aIsNaN = float128_is_nan( a ); aIsSignalingNaN = float128_is_signaling_nan( a ); bIsNaN = float128_is_nan( b ); bIsSignalingNaN = float128_is_signaling_nan( b ); a.high |= LIT64( 0x0000800000000000 ); b.high |= LIT64( 0x0000800000000000 ); if ( aIsSignalingNaN | bIsSignalingNaN ) float_raise( float_flag_invalid ); if ( aIsNaN ) { return ( aIsSignalingNaN & bIsNaN ) ? b : a; } else { return b; } } #endif Index: head/lib/libc/softfloat/softfloat.txt =================================================================== --- head/lib/libc/softfloat/softfloat.txt (revision 230362) +++ head/lib/libc/softfloat/softfloat.txt (revision 230363) @@ -1,373 +1,373 @@ -$NetBSD: softfloat.txt,v 1.1 2000/06/06 08:15:10 bjh21 Exp $ +$NetBSD: softfloat.txt,v 1.2 2006/11/24 19:46:58 christos Exp $ $FreeBSD$ SoftFloat Release 2a General Documentation John R. Hauser 1998 December 13 ------------------------------------------------------------------------------- Introduction SoftFloat is a software implementation of floating-point that conforms to the IEC/IEEE Standard for Binary Floating-Point Arithmetic. As many as four formats are supported: single precision, double precision, extended double precision, and quadruple precision. All operations required by the standard are implemented, except for conversions to and from decimal. This document gives information about the types defined and the routines implemented by SoftFloat. It does not attempt to define or explain the IEC/IEEE Floating-Point Standard. Details about the standard are available elsewhere. ------------------------------------------------------------------------------- Limitations SoftFloat is written in C and is designed to work with other C code. The SoftFloat header files assume an ISO/ANSI-style C compiler. 
No attempt -has been made to accomodate compilers that are not ISO-conformant. In +has been made to accommodate compilers that are not ISO-conformant. In particular, the distributed header files will not be acceptable to any compiler that does not recognize function prototypes. Support for the extended double-precision and quadruple-precision formats depends on a C compiler that implements 64-bit integer arithmetic. If the largest integer format supported by the C compiler is 32 bits, SoftFloat is limited to only single and double precisions. When that is the case, all references in this document to the extended double precision, quadruple precision, and 64-bit integers should be ignored. ------------------------------------------------------------------------------- Contents Introduction Limitations Contents Legal Notice Types and Functions Rounding Modes Extended Double-Precision Rounding Precision Exceptions and Exception Flags Function Details Conversion Functions Standard Arithmetic Functions Remainder Functions Round-to-Integer Functions Comparison Functions Signaling NaN Test Functions Raise-Exception Function Contact Information ------------------------------------------------------------------------------- Legal Notice SoftFloat was written by John R. Hauser. This work was made possible in part by the International Computer Science Institute, located at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding was partially provided by the National Science Foundation under grant MIP-9311980. The original version of this code was written as part of a project to build a fixed-point vector processor in collaboration with the University of California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek. THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. ------------------------------------------------------------------------------- Types and Functions When 64-bit integers are supported by the compiler, the `softfloat.h' header file defines four types: `float32' (single precision), `float64' (double precision), `floatx80' (extended double precision), and `float128' (quadruple precision). The `float32' and `float64' types are defined in terms of 32-bit and 64-bit integer types, respectively, while the `float128' type is defined as a structure of two 64-bit integers, taking into account the byte order of the particular machine being used. The `floatx80' type is defined as a structure containing one 16-bit and one 64-bit integer, with the machine's byte order again determining the order of the `high' and `low' fields. When 64-bit integers are _not_ supported by the compiler, the `softfloat.h' header file defines only two types: `float32' and `float64'. Because ISO/ANSI C guarantees at least one built-in integer type of 32 bits, the `float32' type is identified with an appropriate integer type. The `float64' type is defined as a structure of two 32-bit integers, with the machine's byte order determining the order of the fields. In either case, the types in `softfloat.h' are defined such that if a system implements the usual C `float' and `double' types according to the IEC/IEEE Standard, then the `float32' and `float64' types should be indistinguishable in memory from the native `float' and `double' types. (On the other hand, when `float32' or `float64' values are placed in processor registers by the compiler, the type of registers used may differ from those used for the native `float' and `double' types.) 
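The claim that `float32' is indistinguishable in memory from a native `float' can be illustrated with a small sketch. It assumes a target whose `float' follows the IEC/IEEE single-precision format and uses a plain `uint32_t' as a stand-in for the `float32' type; the helper names `float32_from_native' and `native_from_float32' are hypothetical and are not part of SoftFloat.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the `float32' type of `softfloat.h' (assumption: a plain
   32-bit unsigned integer, as on targets with 32-bit integer types). */
typedef uint32_t float32;

/* Reinterpret the bytes of a native float as a float32.  No numeric
   conversion takes place; only the storage is copied. */
static float32 float32_from_native(float f)
{
    float32 bits;

    assert(sizeof f == sizeof bits);    /* both must be 32 bits wide */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Reinterpret the bytes of a float32 as a native float. */
static float native_from_float32(float32 bits)
{
    float f;

    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On such a target, `float32_from_native(1.0f)' yields the pattern 0x3F800000 (sign 0, biased exponent 127, zero fraction), and `native_from_float32(0xC0000000)' yields -2.0f. Using `memcpy' rather than a pointer cast keeps the sketch within ISO C aliasing rules.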
SoftFloat implements the following arithmetic operations: -- Conversions among all the floating-point formats, and also between integers (32-bit and 64-bit) and any of the floating-point formats. -- The usual add, subtract, multiply, divide, and square root operations for all floating-point formats. -- For each format, the floating-point remainder operation defined by the IEC/IEEE Standard. -- For each floating-point format, a ``round to integer'' operation that rounds to the nearest integer value in the same format. (The floating- point formats can hold integer values, of course.) -- Comparisons between two values in the same floating-point format. The only functions required by the IEC/IEEE Standard that are not provided are conversions to and from decimal. ------------------------------------------------------------------------------- Rounding Modes All four rounding modes prescribed by the IEC/IEEE Standard are implemented for all operations that require rounding. The rounding mode is selected by the global variable `float_rounding_mode'. This variable may be set to one of the values `float_round_nearest_even', `float_round_to_zero', `float_round_down', or `float_round_up'. The rounding mode is initialized to nearest/even. ------------------------------------------------------------------------------- Extended Double-Precision Rounding Precision For extended double precision (`floatx80') only, the rounding precision of the standard arithmetic operations is controlled by the global variable `floatx80_rounding_precision'. The operations affected are: floatx80_add floatx80_sub floatx80_mul floatx80_div floatx80_sqrt When `floatx80_rounding_precision' is set to its default value of 80, these operations are rounded (as usual) to the full precision of the extended double-precision format. 
Setting `floatx80_rounding_precision' to 32 or to 64 causes the operations listed to be rounded to reduced precision equivalent to single precision (`float32') or to double precision (`float64'), respectively. When rounding to reduced precision, additional bits in the result significand beyond the rounding point are set to zero. The consequences of setting `floatx80_rounding_precision' to a value other than 32, 64, or 80 are not specified. Operations other than the ones listed above are not affected by `floatx80_rounding_precision'. ------------------------------------------------------------------------------- Exceptions and Exception Flags All five exception flags required by the IEC/IEEE Standard are implemented. Each flag is stored as a unique bit in the global variable `float_exception_flags'. The positions of the exception flag bits within this variable are determined by the bit masks `float_flag_inexact', `float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and `float_flag_invalid'. The exception flags variable is initialized to all 0, meaning no exceptions. An individual exception flag can be cleared with the statement float_exception_flags &= ~ float_flag_<name>; where `<name>' is the appropriate name. To raise a floating-point exception, the SoftFloat function `float_raise' should be used (see below). In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess for underflow either before or after rounding. The choice is made by the global variable `float_detect_tininess', which can be set to either `float_tininess_before_rounding' or `float_tininess_after_rounding'. Detecting tininess after rounding is better because it results in fewer spurious underflow signals. The other option is provided for compatibility with some systems. Like most systems, SoftFloat always detects loss of accuracy for underflow as an inexact result.
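The flag handling described in this section can be sketched as follows. This is a minimal illustration, not SoftFloat code: every `demo_' name is hypothetical, and the bit values are made up for the example (the real masks come from the target's `softfloat.h' and differ between targets). The raise routine shown is the trap-free form, which only accumulates flags.

```c
#include <assert.h>

/* Hypothetical bit assignments, one unique bit per exception.  The real
   values are target-specific and defined in `softfloat.h'. */
enum {
    demo_flag_inexact   = 0x01,
    demo_flag_underflow = 0x02,
    demo_flag_overflow  = 0x04,
    demo_flag_divbyzero = 0x08,
    demo_flag_invalid   = 0x10,
    demo_flag_all       = 0x1F
};

/* Stand-in for `float_exception_flags': all 0 means no exceptions. */
static int demo_exception_flags = 0;

/* Trap-free raise: OR the requested flags into the sticky flag word. */
static void demo_float_raise(int flags)
{
    assert((flags & ~demo_flag_all) == 0);  /* only known flag bits */
    demo_exception_flags |= flags;
}

/* Clear a single flag and leave all other flags untouched. */
static void demo_clear_flag(int flag)
{
    demo_exception_flags &= ~flag;
}

/* Return nonzero if the given flag is currently set. */
static int demo_flag_is_set(int flag)
{
    return (demo_exception_flags & flag) != 0;
}
```

Because the flags are sticky, raising inexact and underflow together and then clearing underflow leaves only the inexact bit set. A typical caller clears the flags of interest, performs a computation, and then tests them.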
------------------------------------------------------------------------------- Function Details - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Conversion Functions All conversions among the floating-point formats are supported, as are all conversions between a floating-point format and 32-bit and 64-bit signed integers. The complete set of conversion functions is: int32_to_float32 int64_to_float32 int32_to_float64 int64_to_float64 int32_to_floatx80 int64_to_floatx80 int32_to_float128 int64_to_float128 float32_to_int32 float32_to_int64 float64_to_int32 float64_to_int64 floatx80_to_int32 floatx80_to_int64 float128_to_int32 float128_to_int64 float32_to_float64 float32_to_floatx80 float32_to_float128 float64_to_float32 float64_to_floatx80 float64_to_float128 floatx80_to_float32 floatx80_to_float64 floatx80_to_float128 float128_to_float32 float128_to_float64 float128_to_floatx80 Each conversion function takes one operand of the appropriate type and returns one result. Conversions from a smaller to a larger floating-point format are always exact and so require no rounding. Conversions from 32-bit integers to double precision and larger formats are also exact, and likewise for conversions from 64-bit integers to extended double and quadruple precisions. Conversions from floating-point to integer raise the invalid exception if the source value cannot be rounded to a representable integer of the desired size (32 or 64 bits). If the floating-point operand is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as the operand is returned. On conversions to integer, if the floating-point operand is not already an integer value, the operand is rounded according to the current rounding mode as specified by `float_rounding_mode'.
Because C (and perhaps other languages) require that conversions to integers be rounded toward zero, the following functions are provided for improved speed and convenience: float32_to_int32_round_to_zero float32_to_int64_round_to_zero float64_to_int32_round_to_zero float64_to_int64_round_to_zero floatx80_to_int32_round_to_zero floatx80_to_int64_round_to_zero float128_to_int32_round_to_zero float128_to_int64_round_to_zero These variant functions ignore `float_rounding_mode' and always round toward zero. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Standard Arithmetic Functions The following standard arithmetic functions are provided: float32_add float32_sub float32_mul float32_div float32_sqrt float64_add float64_sub float64_mul float64_div float64_sqrt floatx80_add floatx80_sub floatx80_mul floatx80_div floatx80_sqrt float128_add float128_sub float128_mul float128_div float128_sqrt Each function takes two operands, except for `sqrt' which takes only one. The operands and result are all of the same type. Rounding of the extended double-precision (`floatx80') functions is affected by the `floatx80_rounding_precision' variable, as explained above in the section _Extended_Double-Precision_Rounding_Precision_. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Remainder Functions For each format, SoftFloat implements the remainder function according to the IEC/IEEE Standard. The remainder functions are: float32_rem float64_rem floatx80_rem float128_rem Each remainder function takes two operands. The operands and result are all of the same type. Given operands x and y, the remainder functions return the value x - n*y, where n is the integer closest to x/y. If x/y is exactly halfway between two integers, n is the even integer closest to x/y. The remainder functions are always exact and so require no rounding. 
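The definition x - n*y, with n the integer closest to x/y and ties going to the even integer, can be illustrated with exact integer arithmetic. The sketch below is not how SoftFloat computes the remainder; the function name `demo_ieee_rem' is hypothetical, and the operands are restricted to integers small enough that doubling the partial remainder cannot overflow.

```c
#include <assert.h>

/* Returns x - n*y, where n is the integer nearest to x/y and halfway
   cases choose the even n.  Integer operands only; y must be nonzero and
   both operands must be small enough that 2*|r| does not overflow. */
static long demo_ieee_rem(long x, long y)
{
    long n, r, abs_r, abs_y;

    assert(y != 0);
    n = x / y;                      /* quotient truncated toward zero */
    r = x - n * y;                  /* partial remainder, |r| < |y| */
    abs_r = r < 0 ? -r : r;
    abs_y = y < 0 ? -y : y;

    /* Step n away from zero when x/y lies more than halfway to the next
       integer, or exactly halfway while the truncated n is odd. */
    if (2 * abs_r > abs_y || (2 * abs_r == abs_y && n % 2 != 0)) {
        if ((x < 0) == (y < 0))
            n += 1;                 /* x/y is positive */
        else
            n -= 1;                 /* x/y is negative */
        r = x - n * y;
    }
    return r;
}
```

For example, 5 rem 2 gives 1 (5/2 = 2.5 is a tie and n = 2 is even), while 7 rem 2 gives -1 (7/2 = 3.5 is a tie and n = 4 is the even neighbor). Unlike the C `%' operator, the result can be negative even when both operands are positive.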
Depending on the relative magnitudes of the operands, the remainder functions can take considerably longer to execute than the other SoftFloat functions. This is inherent in the remainder operation itself and is not a flaw in the SoftFloat implementation. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Round-to-Integer Functions For each format, SoftFloat implements the round-to-integer function specified by the IEC/IEEE Standard. The functions are: float32_round_to_int float64_round_to_int floatx80_round_to_int float128_round_to_int Each function takes a single floating-point operand and returns a result of the same type. (Note that the result is not an integer type.) The operand is rounded to an exact integer according to the current rounding mode, and the resulting integer value is returned in the same floating-point format. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Comparison Functions The following floating-point comparison functions are provided: float32_eq float32_le float32_lt float64_eq float64_le float64_lt floatx80_eq floatx80_le floatx80_lt float128_eq float128_le float128_lt Each function takes two operands of the same type and returns a 1 or 0 representing either _true_ or _false_. The abbreviation `eq' stands for ``equal'' (=); `le' stands for ``less than or equal'' (<=); and `lt' stands for ``less than'' (<). The standard greater-than (>), greater-than-or-equal (>=), and not-equal (!=) functions are easily obtained using the functions provided. The not-equal function is just the logical complement of the equal function. The greater-than-or-equal function is identical to the less-than-or-equal function with the operands reversed; and the greater-than function can be obtained from the less-than function in the same way. The IEC/IEEE Standard specifies that the less-than-or-equal and less-than functions raise the invalid exception if either input is any kind of NaN. 
The equal functions, on the other hand, are defined not to raise the invalid exception on quiet NaNs. For completeness, SoftFloat provides the following additional functions: float32_eq_signaling float32_le_quiet float32_lt_quiet float64_eq_signaling float64_le_quiet float64_lt_quiet floatx80_eq_signaling floatx80_le_quiet floatx80_lt_quiet float128_eq_signaling float128_le_quiet float128_lt_quiet The `signaling' equal functions are identical to the standard functions except that the invalid exception is raised for any NaN input. Likewise, the `quiet' comparison functions are identical to their counterparts except that the invalid exception is not raised for quiet NaNs. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Signaling NaN Test Functions The following functions test whether a floating-point value is a signaling NaN: float32_is_signaling_nan float64_is_signaling_nan floatx80_is_signaling_nan float128_is_signaling_nan The functions take one operand and return 1 if the operand is a signaling NaN and 0 otherwise. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Raise-Exception Function SoftFloat provides a function for raising floating-point exceptions: float_raise The function takes a mask indicating the set of exceptions to raise. No result is returned. In addition to setting the specified exception flags, this function may cause a trap or abort appropriate for the current system. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ------------------------------------------------------------------------------- Contact Information At the time of this writing, the most up-to-date information about SoftFloat and the latest release can be found at the Web page `http:// HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.