Code:
<Bushytail> beam chasing refers more to the beam that makes scanlines on a CRT, and trying to make code for graphical effects fast enough to fit
<Bushytail> it comes up more on the 2600 for obvious reasons
This got me wondering just how many HBlank writes one can stuff into the NES. I chose to do this without WRAM, because that makes it actually possible to run continuously: rewriting an unrolled loop in WRAM takes a lot longer, and picture time is longer than VBlank, so there is very much not enough time in VBlank to rewrite all the HBlank code.
So, the biggest way to do it is to have a 240-byte table for the address and for the data of each write. Initially I was thinking only of PPU registers, not mapper registers… which would require more than one byte of rewritable address.
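As a rough illustration of the table layout (not part of the original routine), here is a Python sketch that packs one frame's worth of writes into page-sized tables, indexed the way the loop's X register would index them. The function name `build_tables`, the input format, and the choice to leave the first 16 bytes of each page unused are assumptions made for illustration.

```python
# Hypothetical helper: pack per-scanline writes into page-sized tables.
FIRST_X = 16          # X starts at 16, so 240 increments wrap it to 0
SCANLINES = 240
WRITES_PER_LINE = 6

def build_tables(frame):
    """frame: 240 scanlines, each a list of six (register, value) pairs.

    `register` is the low byte of a $20xx PPU register address and
    `value` is the byte to write there. Returns (addr, data): two lists
    of six 256-byte tables, with scanline n stored at offset 16 + n.
    """
    if len(frame) != SCANLINES:
        raise ValueError("expected 240 scanlines")
    addr = [bytearray(256) for _ in range(WRITES_PER_LINE)]
    data = [bytearray(256) for _ in range(WRITES_PER_LINE)]
    for line, writes in enumerate(frame):
        if len(writes) != WRITES_PER_LINE:
            raise ValueError("expected six writes per scanline")
        for slot, (register, value) in enumerate(writes):
            addr[slot][FIRST_X + line] = register
            data[slot][FIRST_X + line] = value
    return addr, data
```

Six address tables plus six data tables at 256 bytes each come to 3 KiB of table data per distinct frame of effects.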
Code:
;Maximum beamchasing...in RAM.
;Hblank is 28⅓, non is 85⅓ cycles
;Numbers in parentheses are running cycle counts.
ldx #16 ;at end of vblank; X counts 16..255, one step per scanline
looptop: ;(8)
lda write1addr,x ;256-byte tables of where we want each write
sta wr1+1 ;(16) +1 is the low address byte of the sta at wr1
lda write2addr,x
sta wr2+1 ;(24)
lda write3addr,x
sta wr3+1 ;(32)
lda write4addr,x
sta wr4+1 ;(40)
lda write5addr,x
sta wr5+1 ;(48)
lda write6addr,x
sta wr6+1 ;(56)
lda write4,x ;and what we want written
sta re4+1 ;(64)
lda write5,x
sta re5+1 ;(72)
lda write6,x
sta re6+1 ;(80)
lda write1,x ;(84)
ldy write2,x ;(88)
sty re2+1 ;(92)
ldy write3,x ;(96)
stx $FF ;(100) save the line counter
re2: ldx #$00 ;value overwritten
wr1: sta $2000 ;low address byte overwritten, write cycle is when hblank begins (103+1)
wr2: stx $2000 ;low address byte overwritten (5)
wr3: sty $2000 ;low address byte overwritten (9)
re4: lda #$00 ;value overwritten (11)
wr4: sta $2000 ;low address byte overwritten (15)
re5: lda #$00 ;value overwritten (17)
wr5: sta $2000 ;low address byte overwritten (21)
re6: lda #$00 ;value overwritten (23)
wr6: sta $2000 ;low address byte overwritten (27)
;…and we're out of hblank time. One cycle (and one-third) of leeway.
ldx $FF ;(3) non-blank time from here; restore the line counter
inx ;(5)
bne looptop ;(8)
;so, 102+28 if we're perfect: we have the cycle to spare in hblank, but not outside it
;92+28 if we're executing out of ZP
Should probably unroll ×3 at least, just to make it easy to deal with the third-cycles.
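To see why three is the natural unroll factor, here is a small Python sketch of the drift arithmetic. It assumes NTSC timing (341 PPU dots per scanline, 3 dots per CPU cycle); the function name `drift` is made up for illustration.

```python
from fractions import Fraction

# Assumption: NTSC timing, 341 PPU dots per scanline, 3 dots per CPU cycle.
CYCLES_PER_SCANLINE = Fraction(341, 3)   # 113 2/3 CPU cycles

def drift(cycles_per_pass, lines_per_pass=1, passes=1):
    """CPU cycles by which the code ends up behind (positive) or
    ahead of (negative) the beam after `passes` trips through a loop
    that spends `cycles_per_pass` cycles covering `lines_per_pass` lines."""
    return passes * (cycles_per_pass - lines_per_pass * CYCLES_PER_SCANLINE)

print(drift(113, passes=240))                   # -160
print(drift(114, passes=240))                   # 80
print(drift(341, lines_per_pass=3, passes=80))  # 0
```

No whole number of cycles per single line stays in step with the beam, but a three-line pass of exactly 341 cycles (for example 114 + 114 + 113) has no drift at all.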
Obviously, if one is running the code out of WRAM, one could unroll it all the way and just use ld# imm to easily fit, but that requires WRAM. I want to see if it can fit in ZP in such a way, because that makes it easy to "bankswitch" our arbitrary tables (rewrite the 12 values, which is relatively easy to fit in VBlank).
You can get two more writes if you're doing the $2006/$2005/$2005/$2006 thing, but obviously you have to find some cycles to put them in.
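For reference, the 2006/5/5/6 sequence is, as far as I understand it, the mid-frame scroll update described on the NESdev wiki. A hedged Python sketch of the four bytes it needs follows; the function name `split_scroll_writes` and its parameters are invented for illustration, and the byte formulas are the commonly documented ones rather than anything stated in this thread.

```python
def split_scroll_writes(x, y, nametable=0):
    """Return the four (register, value) writes that set the scroll
    position to pixel (x, y) of the given nametable mid-frame."""
    x &= 0xFF
    y &= 0xFF
    return [
        (0x2006, (nametable & 3) << 2),                 # nametable select
        (0x2005, y),                                    # fine Y and coarse Y
        (0x2005, x),                                    # fine X and coarse X
        (0x2006, (((y & 0xF8) << 2) | (x >> 3)) & 0xFF),  # low address byte
    ]
```

Only the last two writes need to land inside HBlank in the usual arrangement, which may be where the two extra writes come from.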
Presently 91 bytes, and those 92+28 cycles (if ZP)...unrolling 3 times will drop some cycles, and make it easier to deal with the ⅔ cycle per line accruing.
(pre-post edit: save a cycle by pointing the "save X" store at the operand of the load-X and making that an ldx #imm; this also means not having to reserve a ZP slot for it.)
Of course, if we fix two of the writes to the scroll registers, that saves rewriting their which-register bytes… which is enough to fit a 3×-unrolled loop into ZP, and also to get it actually under the cycle count, though sync cycles still need to be considered…
edit: or fix two to "disable render, enable render", which makes for THREE ditched tables (2× addr, 1× data for the disable-render value)... but at the cost of truly arbitrary writes.
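A back-of-the-envelope check of those trade-offs, as a Python sketch. It takes the 92-cycle and 91-byte figures quoted above at face value, assumes NTSC timing (256 visible dots at 3 dots per CPU cycle), and counts each ditched table as one `lda table,x` / `sta operand` pair with the operand in zero page. The name `budget` is hypothetical, and the size estimate treats every unrolled copy as a full 91 bytes, which overcounts slightly.

```python
from fractions import Fraction

VISIBLE_BUDGET = Fraction(256, 3)  # 85 1/3 CPU cycles outside HBlank (NTSC)
ZP_BYTES = 256

# Figures quoted in the post for the loop executing from zero page.
POST_VISIBLE_CYCLES = 92
POST_BYTES = 91

# One rewrite pair: lda abs,x (3 bytes, 4 cycles) + sta zp (2 bytes, 3 cycles).
PAIR_CYCLES = 4 + 3
PAIR_BYTES = 3 + 2

def budget(ditched_tables, unroll=3):
    """Estimate cycles and size after ditching some rewrite pairs."""
    visible = POST_VISIBLE_CYCLES - ditched_tables * PAIR_CYCLES
    size = (POST_BYTES - ditched_tables * PAIR_BYTES) * unroll
    return {
        "visible_cycles": visible,
        "visible_slack": VISIBLE_BUDGET - visible,  # negative means too slow
        "bytes": size,
        "fits_zp": size <= ZP_BYTES,
    }

for n in (0, 2, 3):
    print(n, budget(n))
```

Under these assumptions the fully arbitrary version is about 6⅔ cycles over and 273 bytes when unrolled three times, while ditching two or three tables brings it under both limits, before accounting for sync cycles or any other zero-page use.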
edit2: added leading explanation. It occurs to me that a CHR bankswitch might be a desired write as well, which would require making one of the writes have its hi-address rewritable. Also fixed the ldx, as there are only 240 scanlines to write.
edit3,4: In sum: "[How] Can we fit six arbitrary Hblank (PPU register/CHR bank/VRAM) writes in every scanline every frame? If not, how much freedom needs sacrificing to fit them in?"