(I moved my own post from a different thread in the forum to here, as I had inadvertently hijacked a thread).
I would LOVE a traditional code profiler built into a NES emulator. One that creates a histogram of PC values so I can find hot-spots in my "common library code" that gets called from various routines each frame.
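The histogram part would need almost nothing from the emulator. A minimal sketch (all names here are invented, not from any real emulator): bump a bucket per executed PC, then dump the hottest addresses.

```c
#include <stdint.h>
#include <stdio.h>

/* One bucket per CPU address; the 6502 PC is 16-bit, so this is only 256 KiB. */
static uint32_t pc_histogram[0x10000];

/* Called once per executed instruction by the hypothetical emulator core. */
static void profile_pc(uint16_t pc) {
    pc_histogram[pc]++;
}

/* Dump every address executed at least `threshold` times. */
static void dump_hotspots(FILE *out, uint32_t threshold) {
    for (uint32_t pc = 0; pc < 0x10000; pc++)
        if (pc_histogram[pc] >= threshold)
            fprintf(out, "$%04X: %u\n", (unsigned)pc, (unsigned)pc_histogram[pc]);
}
```

A post-processing step could then map hot addresses back to labels using the assembler's listing file.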
I would also LOVE it if an emulator made it easy to count CPU cycles taken between two given PC values (or via fake instructions, like hypervisor escapes). Maybe this will make it clearer:
Let's say that I have a routine that takes a varying amount of CPU time when called (internal branches based on changing game state). I want to know, for each frame, how many cycles it took. I would want the output as a CSV (ASCII text) file with two values per line: frame number and CPU cycle count.
The emulator would keep an internal counter (the number of "counted" cycles in the frame). When the PPU frame ends, this value and the frame number are appended to the CSV file.
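The frame-end bookkeeping would be tiny. A sketch, assuming a hypothetical emulator main loop (every name below is invented for illustration):

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical per-frame profiling state -- names invented, not from
 * any real emulator. */
static uint32_t frame_number   = 0;
static uint32_t counted_cycles = 0;  /* cycles accumulated while "magic mode" is on */
static int      magic_mode     = 0;

/* Called once per emulated CPU cycle by the core. */
static void on_cpu_cycle(void) {
    if (magic_mode)
        counted_cycles++;
}

/* Called when the PPU signals end-of-frame: append "frame,cycles" and reset. */
static void on_frame_end(FILE *csv) {
    fprintf(csv, "%u,%u\n", (unsigned)frame_number, (unsigned)counted_cycles);
    frame_number++;
    counted_cycles = 0;
}
```

The resulting CSV loads straight into a spreadsheet or gnuplot for per-frame analysis.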
The counter counts CPU cycles ONLY when the emulator is in a magic "mode". This mode is entered when the CPU executes the fictitious 6502 instruction $02, and exited when it executes $12 (these are normally invalid opcodes that jam the CPU). For the purposes of CPU / PPU timing, "executing" these magic opcodes would consume 2 CPU cycles each.
Or use the "decimal" flag as the magic flag. "D" can be set and cleared easily enough (SED, CLD) and has zero effect on the rest of the NES.
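In an emulator core, the magic-opcode variant could be a couple of extra cases in the opcode dispatch. A sketch, assuming a simple switch-based interpreter ($02/$12 as proposed above; everything else is invented):

```c
#include <stdint.h>

static int      magic_mode     = 0;
static uint32_t counted_cycles = 0;

/* Executes one opcode and returns its cycle cost.  $02 and $12 toggle
 * counting and cost 2 cycles each, as proposed.  Design choice here:
 * the entry opcode's own 2 cycles are counted, the exit opcode's are not. */
static uint32_t step_opcode(uint8_t opcode) {
    uint32_t cycles;
    switch (opcode) {
    case 0x02: magic_mode = 1; cycles = 2; break;  /* enter counting mode */
    case 0x12: magic_mode = 0; cycles = 2; break;  /* leave counting mode */
    default:   cycles = 2; break;   /* placeholder for the real dispatch */
    }
    if (magic_mode)
        counted_cycles += cycles;
    return cycles;
}
```

The D-flag variant would instead test the emulated status register's decimal bit in the same place, with no new opcodes at all.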
Ex: Consider this code (completely made up)
Code:
wait_4_nmi_done:
        sed                     ; Turn on magic CPU timing mode.
spin:   asl nmi_spinlock
        bcc spin
        cld                     ; Turn off magic CPU timing mode.
        ;; do other crap, but don't accumulate the CPU cycles.
        sed
        ;; do more crap, want the CPU cycles counted.
        cld
        rts
Inside the NMI handler it does this:
Code:
nmi_handler:
        inc nmi_ticks
        sec
        ror nmi_spinlock
        rti
I want to analyze the average number of CPU cycles that I waste in my main thread while waiting for the NMI to complete. I know the trick of setting the PPU to gray-scale mode for one scan-line (thanks to Tepples for a post in 2008). That line gives a nice visual indication, but makes objective analysis difficult.
The above is a highly contrived example. Although it does represent a microcosm of one thing that I want to analyze, I also want to analyze CPU cycle usage in a wide variety of my functions (especially my fixed-point 16x16 signed multiplication and cos() lookups).
The above makes it easy to count cycles when different builds of the ROM move the SED/CLD around to different PC values, but makes it difficult to have more than one "counter" per frame. Fake opcodes could fix that (heck, make them 2-byte opcodes, with the second byte a "counter" index, allowing for 256 counters). Or just make the emulator take two PC values: the counter is toggled on/off when the PC equals either of them. No instruction-behavior hacking required for that.
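The two-PC-value variant could be as simple as a pair of watchpoints checked before each instruction. A sketch with invented names, supporting the 256-counter idea via one start/stop pair per counter:

```c
#include <stdint.h>

#define NUM_COUNTERS 256

/* One start/stop PC pair per counter, as proposed. */
typedef struct {
    uint16_t start_pc, stop_pc;
    int      active;
    uint32_t cycles;
} profile_counter_t;

static profile_counter_t counters[NUM_COUNTERS];

/* Called before each instruction with the current PC and the cycle
 * cost of the instruction about to run. */
static void profile_check(uint16_t pc, uint32_t cost) {
    for (int i = 0; i < NUM_COUNTERS; i++) {
        profile_counter_t *c = &counters[i];
        if (pc == c->start_pc)
            c->active = 1;
        else if (pc == c->stop_pc)
            c->active = 0;
        if (c->active)
            c->cycles += cost;
    }
}
```

The PC pairs could be loaded from the assembler's label/listing file, so they track the routine across rebuilds without touching the ROM itself.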
Does this request make sense?