NES Emulator debugging extension request

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
NES Emulator debugging extension request
by on (#72002)
(I moved my own post from a different thread in the forum to here, as I had inadvertently hijacked a thread).

I would LOVE a traditional code profiler built into a NES emulator. One that creates a histogram of PC values so I can find hot-spots in my "common library code" that gets called from various routines each frame.

I would also LOVE if an emulator would make it easy to count CPU cycles taken between two given PC values (or use fake instructions, like hypervisor escapes). Maybe this will make it clearer:

Lets say that I have a routine that takes a varying amount of CPU time when called (internal branches based on changing game state). I want to know, for each frame, how many cycles it took. I would want the output as a CSV (ascii text) file with two values: frame and cpu cycle count.

The emulator would keep an internal counter (# of "counted" cycles in a frame). When the PPU frame ends this value (and the frame number) are appended to the CSV file.

The counter counts CPU cycles ONLY when the emulator is in a magic "mode". This mode is entered when the CPU executes to fictitious 6502 instruction $02, and exited when the CPU executes $12 (these are normally invalid CPU opcodes that kill the CPU). For the purposes of CPU / PPU timing, "executing" these magic opcodes would consume 2 CPU cycles each.

Or use the "decimal" flag as the magic flag. "D" can be set and cleared easily enough (SED, CLD) and has zero effect n the rest of the NES.

Ex: Consider this code (completely made up)

Code:
wait_4_nmi_done:
  sed                       ; Turn on magic CPU timing mode.
spin: asl nmi_spinlock
  bcc spin
  cld                        ; Turn off magic CPU timing mode.
;; do other crap, but don't accumulate the CPU cycles.
  sed
;; do more crap, want the CPU cycles counted.
  cld
  rts


Inside the NMI handler it does this:
Code:
nmi_handler:
  inc nmi_ticks
  sec
  ror nmi_spinlock
  rti


I want to analyze the average amount of CPU cycles that I waste in my main thread while waiting for the NMI to complete. I know the trick of setting the PPU to gray-scale mode for one scan-line (Thanks to Tepples for a post in 2008). That line gives a nice visual indication, but makes objective analysis difficult.

The above is a highly contrived example. Although it does represent a microcosm of one thing that I want to analyze, I also want to analyze CPU cycle usage in a wide variety of my functions (especially my usage of fixed point 16x16 signed multiplication and cos() lookups).

The above makes it easy to count cycles when different builds of the ROM move the sed/cld around to different PC values, but make it difficult to have more than one "counter" per frame. Using fake opcodes (heck, make them 2-byte opcodes, with the second byte a "counter" index, allowing for 256 counters). Or just make the emulator take two PC values. The counter is toggled on/off when the PC equals either of these values. No instruction behavior hacking required for that.

Does this request make sense?
Re: NES Emulator debugging extension request
by on (#72004)
clueless wrote:
I would LOVE a traditional code profiler built into a NES emulator. One that creates a histogram of PC values so I can find hot-spots in my "common library code" that gets called from various routines each frame.

I intended for the Code/Data Logger visualizer to be useful for this in some ways. It's very much like ICU64's memory viewer. More frequently accessed memory is darker red/green/yellow/purple/cyan (depending on access type execute/read/write/APU-DMA/sprite-DMA. All I need is someone to help with the OpenGL text overlay and it'll be much more useful. You can zoom/pan also.

clueless wrote:
I would also LOVE if an emulator would make it easy to count CPU cycles taken between two given PC values (or use fake instructions, like hypervisor escapes). Maybe this will make it clearer:

One method for doing this I described in another post. http://nesdev.com/bbs/viewtopi ... sc&start=0 (look for SMB snapshot). More specifically to your point, the "Permanent Marker" feature of the assembler would meet the "move with my changing code" situation.

It doesn't output to CSV [yet] but all the info you'd need is in the data used to create the visualization. 8)

clueless wrote:
Or just make the emulator take two PC values. The counter is toggled on/off when the PC equals either of these values. No instruction behavior hacking required for that.


This was brought up later in that same discussion...having marking capability be just another thing you can do as a result of hitting a breakpoint. Idea not yet implemented. :oops:
Re: NES Emulator debugging extension request
by on (#72007)
clueless wrote:
The counter counts CPU cycles ONLY when the emulator is in a magic "mode". This mode is entered when the CPU executes to fictitious 6502 instruction $02, and exited when the CPU executes $12 (these are normally invalid CPU opcodes that kill the CPU). For the purposes of CPU / PPU timing, "executing" these magic opcodes would consume 2 CPU cycles each.

Or use the "decimal" flag as the magic flag. "D" can be set and cleared easily enough (SED, CLD) and has zero effect n the rest of the NES.

I'd prefer the decimal flag approach over the HLT approach even if only because the decimal flag approach won't screw anything up if it inadvertently ends up in production code.

Quote:
Using fake opcodes (heck, make them 2-byte opcodes, with the second byte a "counter" index, allowing for 256 counters).

For this, one might repurpose one of five SKB instructions.
Re: NES Emulator debugging extension request
by on (#72027)
NESICIDE wrote:
clueless wrote:
I would LOVE a traditional code profiler built into a NES emulator. One that creates a histogram of PC values so I can find hot-spots in my "common library code" that gets called from various routines each frame.

I intended for the Code/Data Logger visualizer to be useful for this in some ways. It's very much like ICU64's memory viewer. More frequently accessed memory is darker red/green/yellow/purple/cyan (depending on access type execute/read/write/APU-DMA/sprite-DMA. All I need is someone to help with the OpenGL text overlay and it'll be much more useful. You can zoom/pan also.

I've done that before in an OpenGL project. I'll find my code and try to post it.

NESICIDE wrote:
clueless wrote:
I would also LOVE if an emulator would make it easy to count CPU cycles taken between two given PC values (or use fake instructions, like hypervisor escapes). Maybe this will make it clearer:

One method for doing this I described in another post. http://nesdev.com/bbs/viewtopi ... sc&start=0 (look for SMB snapshot). More specifically to your point, the "Permanent Marker" feature of the assembler would meet the "move with my changing code" situation.

That's great, but I use ca65. I adopted ca65 many years ago while doing Apple IIe cross-platform development (hosted on Linux). I've not tried your assembler. My code is too tied to ca65's features to make it easy to port to another assembler's syntax (I say that without doing any actual research, so YMMV).

NESICIDE wrote:
It doesn't output to CSV [yet] but all the info you'd need is in the data used to create the visualization. 8)

clueless wrote:
Or just make the emulator take two PC values. The counter is toggled on/off when the PC equals either of these values. No instruction behavior hacking required for that.


This was brought up later in that same discussion...having marking capability be just another thing you can do as a result of hitting a breakpoint. Idea not yet implemented. :oops:
Re: NES Emulator debugging extension request
by on (#72030)
clueless wrote:
I've done that before in an OpenGL project. I'll find my code and try to post it.


Nesicide, I've opened up public read-only access on a Subversion code repository to an opengl project that I was working on years ago (unfinished, as usual). The game implements a 2-d text overlay using opengl on top of a 3-d world model. The relevent code is in the function "UpdateFPSDisplay", in "./src/gui/sdl_core.cpp". It uses two wrapper functions called "SDL_GL_Enter2DMode()" and "SDL_GL_Leave2DMode()" (also in same source file).

https://www.ecoligames.com/svn/hhd3d/src/gui/sdl_core.cpp

The game was called "Hungry Hungry Dragon" (inspired by hungry-hungry-hippo). It was going to be a flight-sim where you controlled a dragon and competed with other dragons. Your goal was to swoop down and gobble up humans and fight off other dragons. The networking stack works (udp, tolerates packet loss). The game "worlds" are stored in an sqlite database, which is replicated on-demand to each node during the game. I thought that it was a neat idea at the time.

Anyway, I got distracted with something else and never finished the game. I suck at creating 3-d models (the only drawn objects are spheres).

by on (#72032)
Yeah I've brought up this same topic in the past, would be great if some emulator implemented it. I might add something similar to Nintendulator some time in the future. Here's little something I've been using every now and then, it sorts the accessed PC values based on Nintendulator's file logs for some basic profiling, but it's far from perfect: http://nesdev.com/bbs/viewtopi ... 4047#54047

Another quote from past:
Quote:
I wish one of the vast library of NES emulators would support profiling similar to AMD's CodeAnalyst and Intel's VTune on x86. On the same note it could also keep track of RAM usage, access frequencies and stuff like that (and display the results visually). Would be optimal if it could also handle assembler generated symbols.


A special build of VirtuaNES also supports cycle counting with help of some special registers ($401E and $401F IIRC), search the forums to find it.

by on (#72053)
I once added profiling to VisualBoyAdvance, just to see what PocketNES and Goomba were doing that was most CPU intensive. Needless to say, it's whatever code is performing spin-wait polling. (Reading that memory mapped register, then the backward branch)

by on (#72075)
Dwedit wrote:
I once added profiling to VisualBoyAdvance, just to see what PocketNES and Goomba were doing that was most CPU intensive. Needless to say, it's whatever code is performing spin-wait polling.

So I guess emulators have a reason to optimize the idle loop, so that it can get back to vsync() faster.