Opcodes per frame

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#63696)
Hey all,

To all my fellow yanks reading this, Happy 4th! To everyone else, hi!

Anyway, I seem to remember reading a post on here a while back that I think was discussing instructions per second. I can't find it now. I'm just curious to find out roughly (I know it changes depending on the opcodes) how many opcodes the 2A03 can handle during vblank after the NMI fires, and how many during rendering time once vblank is done. Does anyone know offhand?

by on (#63697)
On NTSC:
341 PPU pixels per scanline
262 total scanlines

In one frame (starting from NMI):
20 Vblank scanlines
1 prerender scanline
240 visible scanlines
1 'pre-vblank' scanline

Divide by 3 to turn PPU pixels into CPU cycles.

So that's 20 * 341 / 3 = ~2273 total CPU cycles during vblank time. Of course, you don't get all of them, because entering an interrupt itself takes some cycles, and running all the logic to get ready to draw takes time too.

Most instructions you'd run during vblank time are 4 cycles long.
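
As a rough check of the arithmetic above, here is a minimal Python sketch that converts the quoted NTSC scanline counts into CPU cycles. The function and variable names are made up for illustration, and the 4-cycle average is only the rule of thumb from the post, not a measured figure.

```python
# NTSC figures quoted in the post above.
PPU_PIXELS_PER_SCANLINE = 341
PPU_PIXELS_PER_CPU_CYCLE = 3

# Scanline counts for one frame, starting from NMI.
VBLANK_SCANLINES = 20
PRERENDER_SCANLINES = 1
VISIBLE_SCANLINES = 240
PRE_VBLANK_SCANLINES = 1


def scanlines_to_cpu_cycles(scanlines):
    """Convert a scanline count into (fractional) CPU cycles."""
    return scanlines * PPU_PIXELS_PER_SCANLINE / PPU_PIXELS_PER_CPU_CYCLE


vblank_cycles = scanlines_to_cpu_cycles(VBLANK_SCANLINES)
rest_of_frame_cycles = scanlines_to_cpu_cycles(
    PRERENDER_SCANLINES + VISIBLE_SCANLINES + PRE_VBLANK_SCANLINES
)
frame_cycles = vblank_cycles + rest_of_frame_cycles

# Rule-of-thumb instruction estimate, assuming 4 cycles per instruction.
ASSUMED_CYCLES_PER_INSTRUCTION = 4
vblank_instructions = vblank_cycles / ASSUMED_CYCLES_PER_INSTRUCTION

print(f"vblank:        {vblank_cycles:.1f} cycles")
print(f"rest of frame: {rest_of_frame_cycles:.1f} cycles")
print(f"whole frame:   {frame_cycles:.1f} cycles")
print(f"vblank at 4 cycles/instruction: about {vblank_instructions:.0f} instructions")
```

The instruction estimate is an upper bound on paper only, since NMI entry and any setup code come out of the same budget.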

by on (#63698)
Let me answer what will probably be your next question.

To get the most out of vblank time, prepare a buffer in main RAM (for example, use unused parts of the stack at $0100-$019F) before vblank, and then copy from that buffer into VRAM during vblank. The limiting factor becomes how much you can stuff into VRAM. On NTSC, count on being able to copy 160 bytes to nametables using a moderately unrolled loop, plus one 256-byte display list to OAM.

Disch's doc explains more.
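
To see how that recommendation fits inside vblank, here is a back-of-the-envelope budget in Python. It is only a sketch under stated assumptions: the 8 cycles per byte assumes a fully unrolled PLA (4 cycles) plus STA $2007 (4 cycles) pair with no loop overhead, and the 513-cycle OAM DMA figure is the one quoted later in this thread. A moderately unrolled loop would cost somewhat more per byte.

```python
VBLANK_CYCLES = 20 * 341 // 3  # 2273, from the NTSC figures earlier in the thread

# Assumption: one PLA (4 cycles) plus one STA $2007 (4 cycles) per byte,
# fully unrolled, so no loop counter or branch overhead is included.
ASSUMED_CYCLES_PER_BYTE = 4 + 4

# Store to $4014 (4 cycles) plus the 513-cycle transfer it triggers.
OAM_DMA_CYCLES = 4 + 513


def vblank_budget(nametable_bytes, cycles_per_byte=ASSUMED_CYCLES_PER_BYTE):
    """Return (cycles used, cycles left over) for one vblank."""
    used = nametable_bytes * cycles_per_byte + OAM_DMA_CYCLES
    return used, VBLANK_CYCLES - used


used, left = vblank_budget(160)
print(f"used {used} cycles, {left} left for NMI entry, address setup and scroll")
```

Under these assumptions the leftover cycles have to cover interrupt entry, setting the VRAM address for each run of bytes, and resetting the scroll, which is presumably why the figure of 160 bytes is a conservative one.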

by on (#63699)
Probably the best answer to this question would be to profile a few common games to see the actual average number of instructions during each of those periods. Note that "opcode" refers to the operation code byte of an instruction, for example $A9 for LDA #imm. The opcode is examined to determine what instruction is executed.

by on (#63730)
Dwedit wrote:
So that's ~2273 total CPU cycles during vblank time.
Most instructions you'd run during vblank time are 4 cycles long.


So we're looking at approximately 500 instructions. Thanks Dwedit.

tepples wrote:
Let me answer what will probably be your next question.
To get the most out of vblank time, prepare a buffer in main RAM before vblank, and then copy from that buffer into VRAM during vblank.


Yep, that's how I do it. I load all the columns of background tiles that need to be drawn, score updates, etc. in the game loop and then have the NMI handler load them if there's been a change during the previous frame. I was just curious for curiosity's sake. But also it might be helpful later on. Thanks everyone.

by on (#63733)
bigjt_2 wrote:
So we're looking at approximately 500 instructions.

If you want my honest opinion, timing it in terms of "instructions" is a very bad idea. Typical VRAM-updating code uses instructions that take anywhere from 2 to 5 cycles. Most take 3 or 4, but I'm not really sure what a good average would be.

Also, if you have loops, you can't just look at the source file and use the number of lines your code takes as an estimate of how much time it will need to execute. You have to take into consideration how many times each loop will repeat.

Another important thing is that even though a sprite DMA is triggered by a 4-cycle instruction (ST* $4014), the actual data transfer takes 513 cycles, so the math will be really off if you time your update routines by counting instructions.

Have you tried debugging your code with Nintendulator? You could set up a breakpoint for when the video updates finish and, based on the timing information the emulator shows, you will know how much time you have left (or if you went past VBlank, which is not good!).
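
The points above about loops and sprite DMA can be illustrated with a small Python tally. The routine below is entirely hypothetical (the labels `oam_buffer` and `buffer` and the 32-byte loop are invented for the example), and the branch is simplified to 3 cycles on every pass. It compares real cycle counts against two naive "instructions times 4" estimates.

```python
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    cycles: int                  # cycles per execution
    repeats: int = 1             # how many times it executes
    is_instruction: bool = True  # False for stalls such as the DMA transfer


# Hypothetical update routine: one OAM DMA, then a 32-byte copy loop.
routine = [
    Step("LDA #>oam_buffer", 2),
    Step("STA $4014", 4),
    Step("OAM DMA transfer", 513, is_instruction=False),
    Step("LDA buffer,X", 4, repeats=32),
    Step("STA $2007", 4, repeats=32),
    Step("DEX", 2, repeats=32),
    Step("BNE loop", 3, repeats=32),  # simplification: counted as taken every time
]


def total_cycles(steps):
    return sum(s.cycles * s.repeats for s in steps)


def executed_instructions(steps):
    return sum(s.repeats for s in steps if s.is_instruction)


def source_lines(steps):
    return sum(1 for s in steps if s.is_instruction)


actual = total_cycles(routine)
by_source_lines = source_lines(routine) * 4
by_executed = executed_instructions(routine) * 4

print(f"actual cycles:                 {actual}")
print(f"source lines x 4:              {by_source_lines}")
print(f"executed instructions x 4:     {by_executed}")
```

Both naive estimates come out well below the actual total in this example, mostly because the DMA transfer does not show up as an instruction at all.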

by on (#63744)
tokumaru wrote:
Another important thing is that even though a sprite DMA is triggered by a 4-cycle instruction (ST* $4014), the actual data transfer takes 513 cycles, so the math will be really off if you time your update routines by counting instructions.


I didn't even think about sprite DMA. It takes that many cycles? I guess I'm not surprised when I consider it's transferring everything in sprite RAM to the PPU, but that's pretty interesting.

Thanks all. As always, I learned a lot from this.

by on (#63747)
bigjt_2 wrote:
I didn't even think about sprite DMA. It takes that many cycles? I guess I'm not surprised when I consider it's transferring everything in sprite RAM to the PPU, but that's pretty interesting.

Yeah, it transfers 256 bytes from CPU memory to OAM. 513 cycles may seem like a long time, but this is practically 2 cycles per byte, much faster than would be possible without DMA. Even with the fastest unrolled code possible, it would take 7 (if you use all of zero page for sprites, which is not practical at all) or 8 cycles for each byte, for a total of 1792 or 2048 cycles, nearly all of VBlank. If you look at it like that, 513 is pretty damn fast.
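
The comparison above works out as follows in a short Python sketch. The names are illustrative only, and the vblank length is the ~2273-cycle NTSC figure from earlier in the thread.

```python
OAM_BYTES = 256
DMA_CYCLES = 513
VBLANK_CYCLES = 20 * 341 / 3  # about 2273 on NTSC


def manual_copy_cycles(cycles_per_byte):
    """Cycles to copy all of OAM by hand at a fixed per-byte cost."""
    return OAM_BYTES * cycles_per_byte


dma_per_byte = DMA_CYCLES / OAM_BYTES

print(f"DMA: {DMA_CYCLES} cycles, {dma_per_byte:.2f} cycles/byte, "
      f"{DMA_CYCLES / VBLANK_CYCLES:.0%} of vblank")
for per_byte in (7, 8):
    cycles = manual_copy_cycles(per_byte)
    print(f"manual at {per_byte} cycles/byte: {cycles} cycles, "
          f"{cycles / VBLANK_CYCLES:.0%} of vblank")
```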