ARM > Efficient C for ARM > Local Variable Types
Example
This example code calculates a simple checksum on a packet of 64 words:
int checksum1(const int *data)
{
char i;
int sum = 0;
for (i = 0; i < 64; i++)
sum += data[i];
return sum;
}
Let’s look at the annotated compiler output:
checksum1
MOV r2,r0 ; R2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0 ; i = 0
loop
LDR r3,[r2,r1,LSL #2] ; R3 = data[i]
ADD r1,r1,#1 ; R1 = i+1
AND r1,r1,#0xff ; i = (char)R1 *** UNNECESSARY ***
CMP r1,#0x40 ; compare i to 64
ADD r0,r3,r0 ; sum += R3
BCC loop ; if (i<64) loop
MOV pc,r14 ; return sum
The compiler is emitting an AND r1,r1,#0xff
instruction even though it should know that i
never exceeds 64. If we change i
from char
to unsigned int
the AND
disappears: it’s no longer necessary to account for wrap-around.
Remember that this isn’t just a saving of one instruction or cycle. It saves 64 instructions: one for each iteration.
This is an inner loop. Optimisations to inner loops are highly beneficial.
Remarks
You might think char
is an efficient choice for i
; using less stack space or register space than an int
might. On the ARM, this is wrong:
- Stack entries are at least 32 bits wide.
- Registers are 32 bits wide.
To compute the modification of i
correctly the compiler must account for the case where i
will wrap around which you get for ‘free’ with int
, but not with char
.