Floating-point arithmetic in VDSP (BF561), part 6: float addition and subtraction


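I had always assumed that float addition and subtraction were simple: the compiler turns them into calls to __float32_add and __float32_sub, and those two functions emulate the arithmetic in software. But is it really that simple? Does it still hold when some less comfortable conditions show up?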

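1.1 How VDSP handles float addition and subtraction

In VDSP you can simply write:

    float add(float x, float y)
    {
        float r = x + y;
        return r;
    }

    float sub(float x, float y)
    {
        float r = x - y;
        return r;
    }

to perform floating-point addition and subtraction. The compiler automatically converts the addition into a call to ___float32_add and the subtraction into a call to ___float32_sub. Both functions are implemented in libdsp/fpadd.asm: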

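    ___float32_sub:
        BITTGL (R1,31);    // Flip sign bit of Y, fall through to add
    .align 2;
    ___float32_add:

These few lines show that subtraction is nothing more than flipping the sign of the subtrahend and then falling through into the addition routine.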
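The same idea can be sketched in portable C. This is only an illustration, not the libdsp implementation: add_f32 is a hypothetical stand-in for ___float32_add that uses the host's own float addition, and sub_via_add is a made-up name. It assumes float is a 32-bit IEEE 754 single.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ___float32_add: uses the host's float add. */
float add_f32(float x, float y)
{
    return x + y;
}

/* Subtraction expressed as "toggle bit 31 of y, then add",
 * mirroring BITTGL (R1,31) followed by the fall-through. */
float sub_via_add(float x, float y)
{
    uint32_t ybits;
    memcpy(&ybits, &y, sizeof ybits);   /* reinterpret y as raw bits */
    ybits ^= UINT32_C(1) << 31;         /* flip the sign bit */
    memcpy(&y, &ybits, sizeof y);
    return add_f32(x, y);
}
```

For instance, sub_via_add(5.0f, 3.0f) computes 5.0f + (-3.0f).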

1.2 When y is 0

Look at the code of __float32_add:

    // check for addition of 0.0
    R3 = R1 << 1;    // Remove sign
    CC = R3;         // If Y=0, return X
    IF !CC JUMP .return_x_nopop;
    ...
    .return_x_nopop:
    #if CHECKING_FOR_NEGZERO_RES
    R1 = R0 << 1;
    CC = R1;
    IF !CC R0 = R1;  // convert any -0.0 to 0.0
    #endif
    RTS;

The value of x is returned directly. This path takes 25 cycles.

1.3 When x is 0

    R2 = R0 << 1;    // Remove sign
    CC = R2;         // If X=0, return Y
    IF !CC JUMP .return_y_nopop;
    ...
    .return_y_nopop:
    R0 = R1;
    RTS;

The value of y is returned directly. This path takes 26 cycles.
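The two zero checks can be sketched in C as follows. This is a rough model rather than the library code: the function names are hypothetical, the y check is assumed to come first (which the 25 versus 26 cycle counts suggest), and the conversion of -0.0 to 0.0 is assumed to be enabled, as if CHECKING_FOR_NEGZERO_RES were set.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Shifting left by one drops the sign bit, so +0.0 and -0.0 both give 0. */
static int is_zero(uint32_t bits)
{
    return (uint32_t)(bits << 1) == 0;
}

/* Returns 1 and stores the sum in *result if either operand is zero,
 * otherwise returns 0 and leaves *result untouched. */
int add_zero_fast_path(float x, float y, float *result)
{
    uint32_t xb = f32_bits(x);
    uint32_t yb = f32_bits(y);

    if (is_zero(yb)) {              /* Y = 0: return X */
        if (is_zero(xb))
            xb = 0;                 /* convert any -0.0 to 0.0 */
        *result = f32_from_bits(xb);
        return 1;
    }
    if (is_zero(xb)) {              /* X = 0: return Y */
        *result = y;
        return 1;
    }
    return 0;                       /* neither is zero: take the full path */
}
```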


1.4 When x is NaN or inf

    // Check for all exponent bits set, indicating a NaN or inf operand
    R4 = R2 >> 24;    // Extract X exponent
    R5 = R3 >> 24;    // Extract Y exponent
    R6 = MAXBIASEXP+1;
    CC = R4 == R6;
    // Handle identities where X is NaN or inf.
    IF CC JUMP .handle_nan_inf_x;
    ...
    .handle_nan_inf_x:
    // If x == inf, y a number, return x
    // If y == inf, and x&y have same sign, return x; (x may be NaN)
    // else return NaN
    CC = R5 < R6;     // If exp Y < MAXBIASEXP+1
    R2 = R0 << 9;     // and X is inf
    CC &= AZ;
    IF CC JUMP .return_x_not0;    // Return inf
    CC = AZ;          // If X is inf
    R2 = R1 << 9;     // then we can deduce all Y exponent bits set
    CC &= AZ;         // so Y is inf if no significand bits set
    R2 = R0 ^ R1;     // and Y is of the same sign
    R2 >>= 31;
    CC &= AZ;
    R1 = -1;          // R1 = default NaN
    IF !CC R0 = R1;
    .return_x_not0:
    (R7:4) = [SP++];
    RTS;

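There are quite a few conditions to check:

- x == inf and y is an ordinary number

  x is returned.

  add(inf, 4) gives inf.

  add(-inf, 4) gives -inf.

  This path takes 50 cycles.

- y == inf and x and y have the same sign

  x is returned.

  add(inf, inf) gives inf.

  add(-inf, -inf) gives -inf.

  This path takes 50 cycles.

- Anything else

  nan is returned.

  add(inf, -inf) gives nan.

  add(nan, inf) gives nan.

  add(nan, -inf) gives nan.

  This path takes 50 cycles.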
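The decision logic for this branch can be modelled in C roughly as follows. This is a sketch under stated assumptions, not the library routine: the function name is hypothetical, MAXBIASEXP+1 is assumed to be 0xFF (all eight exponent bits set), and the caller is assumed to have already established that x is NaN or inf, as the asm does before jumping to .handle_nan_inf_x.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Precondition (assumed): x has all exponent bits set, i.e. NaN or inf. */
float add_when_x_is_nan_or_inf(float x, float y)
{
    uint32_t xb = f32_bits(x);
    uint32_t yb = f32_bits(y);

    uint32_t y_exp   = (uint32_t)(yb << 1) >> 24;     /* biased exponent of y */
    int y_is_finite  = y_exp < 0xFFu;                 /* exp Y < MAXBIASEXP+1 */
    int x_is_inf     = (uint32_t)(xb << 9) == 0;      /* no significand bits */
    int y_is_inf     = !y_is_finite && (uint32_t)(yb << 9) == 0;
    int same_sign    = ((xb ^ yb) >> 31) == 0;

    if (x_is_inf && y_is_finite)
        return x;                                     /* inf + number */
    if (x_is_inf && y_is_inf && same_sign)
        return x;                                     /* inf + inf, same sign */
    return f32_from_bits(UINT32_C(0xFFFFFFFF));       /* default NaN (R1 = -1) */
}
```

Written this way, the three cases in the list map onto the two early returns and the final default NaN.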


1.5 When x is an ordinary number and y is NaN or inf

    // If X is a number, but Y is NaN or inf, return Y.
    CC = R5 == R6;
    IF CC JUMP .return_y;
    ...
    .return_y:    // no need for -0.0 return check for this case
    (R7:4) = [SP++];
    .return_y_nopop:
    R0 = R1;
    RTS;

The value of y is returned directly, for example:

add(4, inf) gives inf

add(4, -inf) gives -inf

add(4, nan) gives nan

This path takes 40 cycles.

1.6 When the exponent difference is greater than 24

fpadd.asm explains this condition as follows:

    // Extract and compare the two exponents. Since there are
    // 23 bits of mantissa, if the difference between exponents (D)
    // is greater than 24, the operand with the smaller exponent
    // is too insignificant to affect the other. If the difference
    // is exactly, the 24th (hidden) bit will be shifted into the
    // R position for rounding, and so can still affect the result.
    // (R is the most significant bit of the remainder, which is
    // all the bits shifted off when adjusting exponents to match)

Because the mantissa of a float has only 23 bits, the smaller number can be ignored outright once the exponents of the two numbers differ by more than 24. In decimal terms, that corresponds to a ratio of about 1.6777216e7 (2^24) between the two numbers.

For example, add(1<<24, 1) gives 1<<24.

This case takes 136 cycles.
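The absorption effect is easy to reproduce on a desktop compiler. The sketch below uses the host's own float addition rather than ___float32_add, and assumes IEEE 754 single precision with round-to-nearest-even; the function names are hypothetical. Note that for add(1<<24, 1) the exponent difference is exactly 24, so by the quoted comment this example would go through the rounding path rather than the early exit, and the result is still 1<<24.

```c
#include <stdint.h>
#include <string.h>

/* Biased 8-bit exponent field of a float (bits 30..23). */
int biased_exponent(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (int)((u >> 23) & 0xFFu);
}

/* Difference D between the exponents of x and y. */
int exponent_gap(float x, float y)
{
    return biased_exponent(x) - biased_exponent(y);
}

/* Host float addition; the volatile store forces rounding to float
 * even on targets that evaluate in wider precision. */
float add_f32(float x, float y)
{
    volatile float r = x + y;
    return r;
}
```

With these helpers, exponent_gap(16777216.0f, 1.0f) is 24 and add_f32(16777216.0f, 1.0f) leaves the larger operand unchanged, while add_f32(8388608.0f, 1.0f) (a gap of 23) still produces 8388609.0f.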

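1.7 When the first operand is the smaller one

Reading the code of __float32_add, you find that when the first operand of the addition is the smaller one, the two operands are swapped: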


    // If the exponents are different, then we arrange the
    // operands so that X is the larger, and we're adding
    // a less-significant number to it. Because the exponents
    // are biased (the eeeeeeee bits are the true exponent,
    // with +127 added), we remove the sign bits of X and Y,
    // and then compare directly.
    CC = R3 <= R2 (IU);     // compare X and Y values (exp and mant)
    IF CC JUMP .no_swap;    // okay if Y exp is smallest
    // Y exp is biggest. Swap.
    P1 = R5;     // default exp of result
    R5 = R0;     // swap x and y
    R0 = R1;
    R1 = R5;
    R4 = -R4;    // negate D.
    .no_swap:
    ...
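The comparison trick in that comment, stripping the sign bit and comparing the remaining bits as unsigned integers, can be sketched in C. The function name is hypothetical, and the sketch assumes finite IEEE 754 singles; it only models the ordering step, not the rest of the addition.

```c
#include <stdint.h>
#include <string.h>

/* Arrange *x and *y so that *x has the larger (or equal) magnitude.
 * Returns 1 if the operands were swapped, 0 otherwise. */
int order_by_magnitude(float *x, float *y)
{
    uint32_t xb, yb;
    memcpy(&xb, x, sizeof xb);
    memcpy(&yb, y, sizeof yb);

    /* Remove the sign bits. Because the exponent is biased and sits
     * above the mantissa, the remaining bits order by magnitude. */
    uint32_t xm = (uint32_t)(xb << 1);
    uint32_t ym = (uint32_t)(yb << 1);

    if (ym <= xm)           /* CC = R3 <= R2 (IU): no swap needed */
        return 0;

    float tmp = *x;         /* Y is bigger: swap x and y */
    *x = *y;
    *y = tmp;
    return 1;
}
```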


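At first glance, the comment in fpadd.asm gives the impression that when the first operand is larger than the second, a few cycles should be saved, which would add up to a considerable saving over a large number of operations.

The actual measurements say otherwise:

add(10000.0, 10.0) takes 136 cycles.

add(10.0, 10000.0) takes 132 cycles.

Why?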

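Stepping into the code in VDSP revealed something interesting. When no swap is needed (CC = 1), the cursor that marks the PC first moves to the line

    P1 = R5;     // default exp of result

instead of jumping straight to .no_swap. The cursor, however, changes from its normal yellow to grey, and the register values do not change.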

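That brought the pipeline to mind. The pipeline viewer shows the pipeline being flushed quite visibly, so that getting from

    IF CC JUMP .no_swap;    // okay if Y exp is smallest

to .no_swap takes a full 10 cycles when the jump is taken.

When a swap is needed, the pipeline is not interrupted, and executing from

    IF CC JUMP .no_swap;    // okay if Y exp is smallest

through to .no_swap takes only 6 cycles.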

This was the first time I had felt the cost of a JUMP so directly. It also made clear what likely and unlikely contribute to efficiency in the uClinux kernel.
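For readers who have not met these macros, here is a minimal sketch of how likely and unlikely are commonly defined on top of GCC's __builtin_expect, which is the approach the Linux kernel takes. It assumes GCC or a compatible compiler and falls back to plain conditions elsewhere; clamp_index is a hypothetical example function, not something from the kernel or from libdsp.

```c
#if defined(__GNUC__)
/* Tell the compiler which way the branch usually goes so it can lay
 * out the common path as the straight-line fall-through. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical example: out-of-range indices are expected to be rare,
 * so the in-range case is the one that should avoid a taken jump. */
int clamp_index(int i, int n)
{
    if (unlikely(i < 0))
        return 0;
    if (unlikely(i >= n))
        return n - 1;
    return i;
}
```

The hints change only code layout, never results, so their benefit depends on the target's pipeline and has to be measured, much like the cycle counts above.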

1.8 Overflow

When the sum of two numbers exceeds the range a float can represent, inf or -inf is returned.

For example:

add(FLT_MAX, FLT_MAX) gives inf
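The same behaviour can be checked with the host's float arithmetic. This sketch assumes IEEE 754 single precision with the default rounding mode and uses the standard isinf macro from math.h; add_f32 and overflow_sign are hypothetical names, and the code exercises the host's addition, not ___float32_add.

```c
#include <float.h>
#include <math.h>

/* Host float addition, rounded to float via a volatile store. */
float add_f32(float x, float y)
{
    volatile float r = x + y;
    return r;
}

/* Returns +1 if x + y overflows to inf, -1 if it overflows to -inf,
 * and 0 if the sum is not an infinity. */
int overflow_sign(float x, float y)
{
    float r = add_f32(x, y);
    if (!isinf(r))
        return 0;
    return r > 0.0f ? 1 : -1;
}
```

Here overflow_sign(FLT_MAX, FLT_MAX) reports a positive overflow, and negating both operands reports a negative one.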
