Any timing test has to use some kind of timer, so if your branch timing really is flawless, the timer is the likely culprit.
Looking at source/2-branch_timing.s:
; Verifies timing of branch instructions
;
; Runs branch instruction in loop that counts iterations
; until APU length counter expires. Moves the loop around
; in memory to trigger page cross/no cross cases.
So it uses the APU's length counter to do the timing. Searching for APU yields this:
Code:
; Synchronize with APU length counter
        setb SNDMODE,$40
        setb SNDCHN,$01
        setb $4000,$10
        setb $4001,$7F
        setb $4002,$FF
        setb $4003,$18
        lda #$01
:       and SNDCHN
        bne :-
setb, what's that? There's a readme.txt in the source/ directory:
Code:
Macros
------
Some macros are used to make common operations more convenient, defined
in common/macros.inc. The left is equivalent to the right:
Macro                   Equivalent
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
...
setb addr,byte          lda #byte
                        sta addr
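So the synchronization code is just a series of register writes followed by a loop polling bit 0 of SNDCHN ($4015). Here is a rough Python sketch of what an emulator has to get right for that loop to exit at the expected time. It models only the pulse 1 length counter; the class and function names are made up for illustration, and the frame sequencer that decides when the half-frame clocks happen (configured by the SNDMODE write) is not modeled at all.

```python
# Length values selected by bits 7-3 of the byte written to $4003.
LENGTH_TABLE = [
    10, 254, 20, 2, 40, 4, 80, 6, 160, 8, 60, 10, 14, 12, 26, 14,
    12, 16, 24, 18, 48, 20, 96, 22, 192, 24, 72, 26, 16, 28, 32, 30,
]


class Pulse1Length:
    """Hypothetical model of the pulse 1 length counter only."""

    def __init__(self):
        self.enabled = False
        self.halt = False
        self.counter = 0

    def write_4015(self, value):
        # Bit 0 enables pulse 1; disabling it forces the counter to zero.
        self.enabled = bool(value & 0x01)
        if not self.enabled:
            self.counter = 0

    def write_4000(self, value):
        # Bit 5 halts the length counter.
        self.halt = bool(value & 0x20)

    def write_4003(self, value):
        # The counter is only reloaded while the channel is enabled.
        if self.enabled:
            self.counter = LENGTH_TABLE[value >> 3]

    def clock_half_frame(self):
        # Called by the frame sequencer on each half-frame clock.
        if self.counter > 0 and not self.halt:
            self.counter -= 1

    def read_4015(self):
        # Bit 0 reads back as 1 while the counter is non-zero.
        return 0x01 if self.counter > 0 else 0x00


def half_frames_until_silent(apu, limit=1000):
    """Count half-frame clocks until bit 0 of $4015 reads 0."""
    clocks = 0
    while apu.read_4015() & 0x01 and clocks < limit:
        apu.clock_half_frame()
        clocks += 1
    return clocks


# Same writes as the test (the $4001/$4002 writes don't affect the length counter).
apu = Pulse1Length()
apu.write_4015(0x01)
apu.write_4000(0x10)   # halt bit clear, so the counter is allowed to run
apu.write_4003(0x18)   # $18 >> 3 = index 3, which loads a length of 2
```

With these writes the counter is loaded with 2, so the polling loop should fall through on the second half-frame clock after the $4003 write. If your emulator loads a different value, ignores the enable bit, or clocks the counter at the wrong moments, the test's timer will be off even when the branches themselves are fine.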
For each test it then reloads the length counter the same way as in the setup above, runs the instruction, and converts the number of iterations the branch loop ran into a cycle count:
Code:
; Setup length counter
        setb $4003,$18
        delay 29830-7120

        ; Run instruction
        setb temp,0
        pla
        jmp (addr)

raw_to_cycles: ; entry i is lowest value that qualifies for i cycles
        .byte 250, 241, 233, 226, 219, 213, 206, 201, 195, 190, 0

; Jumps here when instruction has been timed
instr_done:
        ; Convert iteration count to cycle count
        lda temp
        ldy #-1
:       iny
        cmp raw_to_cycles,y
        blt :-
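To make the conversion loop concrete, here is the same table lookup as a small Python sketch. It assumes blt means "branch if less than" (bcc after the cmp), so the loop stops at the first table entry that the iteration count is greater than or equal to. The function name is my own; the table is copied from the source above.

```python
# Table from the test source: entry i is the lowest iteration count
# that still qualifies as i cycles. The trailing 0 is a sentinel that
# guarantees the search always stops.
RAW_TO_CYCLES = [250, 241, 233, 226, 219, 213, 206, 201, 195, 190, 0]


def raw_to_cycles(raw):
    """Return the first index whose threshold is <= raw (0..255)."""
    for cycles, threshold in enumerate(RAW_TO_CYCLES):
        if raw >= threshold:
            return cycles
    raise ValueError("raw count must be non-negative")


print(raw_to_cycles(245))  # 245 < 250 but >= 241, so this prints 1
```

The fewer iterations the loop managed before the length counter expired, the more cycles each iteration took. That means a timer that expires early or late shifts the iteration count into the wrong bucket and the test reports the wrong cycle count for the branch.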
So your APU length counter handling might be wrong.