Local Variable Types
Example
This example code calculates a simple checksum on a packet of 64 words:
int checksum1(const int *data)
{
    char i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
Let’s look at the annotated compiler output:
checksum1
        MOV     r2,r0               ; R2 = data
        MOV     r0,#0               ; sum = 0
        MOV     r1,#0               ; i = 0
loop
        LDR     r3,[r2,r1,LSL #2]   ; R3 = data[i]
        ADD     r1,r1,#1            ; R1 = i+1
        AND     r1,r1,#0xff         ; i = (char)R1 *** UNNECESSARY ***
        CMP     r1,#0x40            ; compare i to 64
        ADD     r0,r3,r0            ; sum += R3
        BCC     loop                ; if (i<64) loop
        MOV     pc,r14              ; return sum
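The compiler emits an AND r1,r1,#0xff instruction on every iteration, even though i never exceeds 64 in this loop. Because i is declared as char, the compiler must truncate i+1 to 8 bits so that the value wraps around correctly after 255. If we change i from char to unsigned int the AND disappears: the variable is now as wide as the register that holds it, so no extra instruction is needed to account for wrap-around.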
Remember that this isn’t just a saving of one instruction or cycle. It saves 64 instructions: one for each iteration.
This is an inner loop. Optimisations to inner loops are highly beneficial.
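The change described above can be sketched as follows. The function name checksum2 is an assumption made for illustration; the only difference from checksum1 is the type of the loop counter. The exact instructions generated will depend on the compiler and its options.

```c
/* Sketch of the revised routine; the name checksum2 is hypothetical. */
int checksum2(const int *data)
{
    unsigned int i;   /* as wide as an ARM register, so no 8-bit masking is needed */
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
```

Called with a pointer to a 64-word packet, checksum2 returns the same sum as checksum1; only the generated loop body should differ.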
Remarks
You might think char is an efficient choice for i because it uses less stack space or register space than an int would. On the ARM, this is wrong:
- Stack entries are at least 32 bits wide.
- Registers are 32 bits wide.
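To increment i correctly, the compiler must account for the case where i wraps around. With a 32-bit type such as unsigned int the wrap-around comes for ‘free’, because the register is the same width as the variable. With an 8-bit char it does not: the compiler has to emit an extra instruction to truncate the result to 8 bits.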