This routine delays for a number of cycles specified at run time, plus a fixed overhead of 33 cycles. The overhead includes the cycles taken by the JSR and the RTS.
Pass the number of cycles to delay in A:X, with A holding the high 8 bits and X holding the low 8 bits.
Requires no absolute jumps / relocations. Clobbers A; preserves X,Y. Does require page-aligning so that none of the branches cross a page boundary (a taken branch that crosses a page costs one extra cycle). Written for CA65.
Code:
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y.
delay_256a_x_33_clocks:
        cmp #1                  ; +2; 2 cycles overhead
        bcs @do256              ; +2; 4 cycles overhead
        ; 0-255 cycles remain, overhead = 4
        txa                     ; +2; 6
        ;;;;;;;;;;;;;;;;
        ; 15 + JSR + RTS overhead for the code below. JSR=6, RTS=6. Total: 27. 6+27=33
        ;               ;    Cycles        Accumulator     Carry flag
        ;               ; 0  1  2  3  4       (hex)        0 1 2 3 4
        sec             ; 0  0  0  0  0   00 01 02 03 04   1 1 1 1 1
:       sbc #5          ; 2  2  2  2  2   FB FC FD FE FF   0 0 0 0 0
        bcs :-          ; 4  4  4  4  4   FB FC FD FE FF   0 0 0 0 0
        lsr a           ; 6  6  6  6  6   7D 7E 7E 7F 7F   1 0 1 0 1
        bcc :+          ; 8  8  8  8  8   7D 7E 7E 7F 7F   1 0 1 0 1
:       sbc #$7E        ;10 11 10 11 10   FF FF 00 00 01   0 0 1 1 1
        bcc :+          ;12 13 12 13 12   FF FF 00 00 01   0 0 1 1 1
        beq :+          ;      14 15 14         00 00 01       1 1 1
        bne :+          ;            16               01           1
:       rts             ;15 16 17 18 19   This loop from http://6502org.wikidot.com/software-delay
@do256:                         ; do 256 cycles. 5 cycles done so far. C is set from CMP
        sbc #1                  ; 2 cycles
        pha                     ; 3 cycles
        lda #(34*2-1)           ; 2 cycles
        ;                       ;12 cycles done so far
:       sec                     ; 2 cycles (sec is only needed
        sbc #2                  ; 2 cycles  to make loop 7 cycles)
        bcs :-                  ; 3 cycles for taken branch
        ;                       ; -1 cycles for untaken branch
        ;12 + 34*7 - 1 = 249 done so far, 7 missing
        pla                     ; 4 cycles
        bcc delay_256a_x_33_clocks ; 3 cycles ; C is unset from SBC
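A call site for the routine above might look like the following sketch. `DELAY_TOTAL` is a made-up constant used only for illustration; the idea is to subtract the fixed 33-cycle overhead at assembly time and split the remainder into high and low bytes.

```asm
; Hypothetical caller: spend DELAY_TOTAL cycles from the start of the JSR
; to the instruction after it. DELAY_TOTAL is illustrative, not from the original.
DELAY_TOTAL = 1000

        lda #>(DELAY_TOTAL - 33)        ; high 8 bits of the remaining count go in A
        ldx #<(DELAY_TOTAL - 33)        ; low 8 bits go in X
        jsr delay_256a_x_33_clocks      ; A is clobbered; X and Y are preserved
```

The two immediate loads cost 2 cycles each and are not part of the 33-cycle overhead, so they have to be budgeted separately. Since A:X is a 16-bit count, the total should lie between 33 and 65535+33 cycles.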
I could not find such a routine online or in Blargg's library, so I wrote my own. The sub-256-cycle part is copied from the 6502org wiki. Blargg's library has a sub-256-cycle delay of its own, but entering it would require a jump, so this version ends up with a smaller total overhead.
Here is a version with the roles of A and X reversed: X holds the high 8 bits and A holds the low 8 bits. On return X is zero and A is clobbered; Y is preserved. It reuses the sub-256-cycle delay routine from Blargg's library (which can also be entered separately). The overhead is 30 cycles.
Code:
; Delays X:A clocks+overhead
; Time: 256*X+A+30 clocks (including JSR)
; Clobbers A,X. Preserves Y.
delay_256x_a_30_clocks:
        cpx #0                  ; +2
        beq delay_a_25_clocks   ; +3 (25+5 = 30 cycles overhead)
@do256:                         ; do 256 cycles. 4 cycles so far. Loop is 1+2+4+4+2+2 = 15 bytes.
        pha                     ; +3
        lda #(256-42)           ; +2
        ;                       ; 9 cycles done so far. Carry is set from CPX
:       adc #1                  ; +2. First pass adds 2 (carry in), so this loop runs 41 times
        bne :-                  ; +3 for taken branch
                                ; -1 for untaken branch
:       adc #(256/6)            ; +2. Enters with A=0 and carry set; this loop runs 7 times
        bcc :-                  ; +3 for taken branch
                                ; -1 for untaken branch
        ; 9 + 41*5-1 + 7*5-1 = 247 done so far; 9 missing
        pla                     ; +4
        dex                     ; +2
        bcs delay_256x_a_30_clocks ; +3. Carry is set from ADC
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Preserved: X, Y
; Time: A+25 clocks (including JSR)
:       sbc #7                  ; carry set by CMP
delay_a_25_clocks:
        cmp #7
        bcs :-                  ; do multiples of 7
        lsr a                   ; bit 0
        bcs :+
:                               ; A=clocks/2, either 0,1,2,3
        beq @zero               ; 0: 5
        lsr a
        beq :+                  ; 1: 7
        bcc :+                  ; 2: 9
@zero:  bne :+                  ; 3: 11
:       rts                     ; (thanks to dclxvi for the algorithm)
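As a rough usage sketch for the two entry points above (the constants `LONG_WAIT` and `SHORT_WAIT` are hypothetical names chosen for this example), note that the register roles are swapped relative to the first routine:

```asm
; Hypothetical callers. LONG_WAIT and SHORT_WAIT are illustrative constants.
LONG_WAIT  = 1000
SHORT_WAIT = 100

        ldx #>(LONG_WAIT - 30)          ; high 8 bits go in X
        lda #<(LONG_WAIT - 30)          ; low 8 bits go in A
        jsr delay_256x_a_30_clocks      ; returns with X = 0; Y is preserved

        lda #(SHORT_WAIT - 25)          ; waits shorter than 256+25 cycles can
        jsr delay_a_25_clocks           ; enter the short routine directly
```

As before, the immediate loads (2 cycles each) are extra and need to be counted by the caller.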
If relocations are not a problem, then the routines can be replaced with these, respectively:
Code:
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Clobbers A. Preserves X,Y. Has relocations.
; Time: 256*A+X+33 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
:                               ; do 256 cycles. 5 cycles done so far. Loop is 2+1+ 2+3+ 1 = 9 bytes.
        sbc #1                  ; 2 cycles - Carry was set from cmp
        pha                     ; 3 cycles
        lda #(256-25-10-2-4)    ; +2
        jsr delay_a_25_clocks
        pla                     ; 4 cycles
delay_256a_x_33_clocks:
        cmp #1                  ; +2; 2 cycles overhead
        bcs :-                  ; +2; 4 cycles overhead
        ; 0-255 cycles remain, overhead = 4
        txa                     ; +2; 6; +27 = 33
        ; 15 + JSR + RTS overhead for the code below. JSR=6, RTS=6. 15+12=27
        ;               ;    Cycles        Accumulator     Carry flag
        ;               ; 0  1  2  3  4       (hex)        0 1 2 3 4
        sec             ; 0  0  0  0  0   00 01 02 03 04   1 1 1 1 1
:       sbc #5          ; 2  2  2  2  2   FB FC FD FE FF   0 0 0 0 0
        bcs :-          ; 4  4  4  4  4   FB FC FD FE FF   0 0 0 0 0
        lsr a           ; 6  6  6  6  6   7D 7E 7E 7F 7F   1 0 1 0 1
        bcc :+          ; 8  8  8  8  8   7D 7E 7E 7F 7F   1 0 1 0 1
:       sbc #$7E        ;10 11 10 11 10   FF FF 00 00 01   0 0 1 1 1
        bcc :+          ;12 13 12 13 12   FF FF 00 00 01   0 0 1 1 1
        beq :+          ;      14 15 14         00 00 01       1 1 1
        bne :+          ;            16               01           1
:       rts             ;15 16 17 18 19   (thanks to dclxvi for the algorithm)
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X:A clocks+overhead
; Clobbers A,X. Preserves Y. Has relocations.
; Time: 256*X+A+30 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_a_30_clocks:
        cpx #0                  ; +2
        beq delay_a_25_clocks   ; +3 (25+5 = 30 cycles overhead)
        ; do 256 cycles.        ; 4 cycles so far. Loop is 1+1+ 2+3+ 1+3 = 11 bytes.
        dex                     ; 2 cycles
        pha                     ; 3 cycles
        lda #(256-25-9-2-7)     ; +2
        jsr delay_a_25_clocks
        pla                     ; 4
        jmp delay_256x_a_30_clocks ; 3.
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Preserved: X, Y
; Time: A+25 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
:       sbc #7                  ; carry set by CMP
delay_a_25_clocks:
        cmp #7
        bcs :-                  ; do multiples of 7
        ;               ;    Cycles           Accumulator           Carry           Zero
        lsr a           ; 0 0 0 0 0 0 0   00 01 02 03 04 05 06   0 0 0 0 0 0 0   ? ? ? ? ? ? ?
        bcs :+          ; 2 2 2 2 2 2 2   00 00 01 01 02 02 03   0 1 0 1 0 1 0   1 1 0 0 0 0 0
:       beq @zero       ; 4 5 4 5 4 5 4   00 00 01 01 02 02 03   0 1 0 1 0 1 0   1 1 0 0 0 0 0
        lsr a           ; : : 6 7 6 7 6   :: :: 01 01 02 02 03   : : 0 1 0 1 0   : : 0 0 0 0 0
        beq :+          ; : : 8 9 8 9 8   :: :: 00 00 01 01 01   : : 1 1 0 0 1   : : 1 1 0 0 0
        bcc :+          ; : : : : A B A   :: :: :: :: 01 01 01   : : : : 0 0 1   : : : : 0 0 0
@zero:  bne :+          ; 7 8 : : : : C   00 00 :: :: :: :: 01   0 1 : : : : 1   1 1 : : : : 0
:       rts             ; 9 A B C D E F   (thanks to dclxvi for the algorithm)
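All of the cycle counts above assume that no taken branch crosses a page boundary. One way to have the build catch a violation is a ca65 `.assert`, sketched below. The marker labels `delay_code_start` and `delay_code_end` are hypothetical and would have to be added around whichever set of routines is used; the check is conservative, since it simply requires the whole block to sit within one 256-byte page.

```asm
delay_code_start:               ; hypothetical marker, placed just before the delay routines
        ; ... delay routines go here ...
delay_code_end:                 ; hypothetical marker, placed just after the final RTS

; Fail the build if the first and last byte of the block are on different pages.
; If ca65 cannot evaluate the expression itself, it hands the check to the linker.
.assert (>delay_code_start) = (>(delay_code_end - 1)), error, "delay routines cross a page boundary"
```

If the assertion fires, moving the routines or padding before `delay_code_start` should fix it; how the segment is aligned depends on the linker configuration, which is outside the scope of this sketch.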