Emulator update and question about timing and microcodes

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Emulator update and question about timing and microcodes
by on (#225472)
Hello!

I've been working on my emulator, Nintendoish, for the last 10 months or so.

Progress is going great! I've retargeted the emulator to be an iOS emulator. While there are Windows/Mac build targets, they don't have great UI and are mostly meant for ease of development. The main user experience for the emu is definitely as an iOS app.

I currently pass all of Blargg's PPU tests (including ppu_vbl_nmi) and all of his CPU tests except for the last test in cpu_interrupts_v2. I have to thank everyone on this forum for my progress. Even though I haven't posted many questions directly, I've almost always been able to find an answer by searching this forum.

My main problem right now is that the only way I can get the emu to pass ppu_vbl_nmi, get through Battletoads level 2, and not shake the status bar in Bart vs the Space Mutants is through dirty PPU hacks: delay VBlank a couple of cycles here, delay sprite 0 hit a couple of cycles there, etc.

Obviously that's not ideal. I want a hack-free PPU if possible.

My question is this: Is it possible to get those timings right without implementing a CPU that executes cycle-by-cycle microcodes? Has anyone successfully written an emulator that executes a whole instruction in one step, runs the PPU for that instruction's cycle count * 3, repeats with the next instruction, and still gets through Battletoads, doesn't shake Bart's status bar, and passes the PPU timing tests without PPU hacks?

If not, then it sounds like I should probably rewrite my CPU, because I really do want to get those timings right without dirty PPU hacks if possible.

For what it's worth, currently my CPU looks like this:
1. Load an instruction. (But don't execute)
2. Step the PPU as many times as the loaded instruction timing requires * 3.
3. Poll for interrupts and set them to pending.
4. Execute the loaded instruction.
5. Execute any pending interrupts.
6. Step the PPU for any additional cycles caught in execution. (Crossed pages.)
7. Goto Step 1.

Thank you!
Re: Emulator update and question about timing and microcodes
by on (#225483)
There was a discussion here.
Re: Emulator update and question about timing and microcodes
by on (#225590)
drewying wrote:
Is it possible to get those timings right without implementing a CPU that executes cycle-by-cycle microcodes? Has anyone successfully written an emulator that executes a whole instruction in one step, runs the PPU for that instruction's cycle count * 3, repeats with the next instruction, and still gets through Battletoads, doesn't shake Bart's status bar, and passes the PPU timing tests without PPU hacks?


FCEUX operates like that. But it's full of hacks to make things work. What's worse is that you won't be able to reproduce those hacks. It took many, many years of incremental tuning until FCEUX was able to play virtually all games, and that tuning was the effort of many, many contributors. As an individual emulator developer, if you want to make something that can run all games, then you'll have to strive for the highest accuracy. It's the only practical path to that goal. In other words, introduce microcodes.
Re: Emulator update and question about timing and microcodes
by on (#225663)
I remember Disch discussing queueing and dequeueing PPU events after a certain number of CPU cycles. Did he vanish from the forum... too?
Re: Emulator update and question about timing and microcodes
by on (#225694)
For accuracy, you will need to emulate all the weird quirks.

Dummy Reads:
* Dummy Read when correcting the high byte of an address
* Dummy Read in read-modify-write instructions
For example, Ironsword uses a dummy read to acknowledge an APU interrupt; without it, the game crashes on bootup.
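
To illustrate the first kind of dummy read, here is a minimal Python sketch of the bus reads an absolute,X read instruction makes once its two operand bytes have been fetched. The function name and the addresses are made up for illustration; nothing here is taken from Ironsword or from any particular emulator.

```python
def absolute_x_read_accesses(base, x):
    """Return the (kind, address) bus reads for the data fetch of an
    absolute,X read instruction. 'base' is the 16-bit operand, 'x' is 0..255."""
    low_sum = (base & 0xFF) + x
    # The CPU first reads with only the low byte added (high byte not yet fixed).
    uncorrected = (base & 0xFF00) | (low_sum & 0xFF)
    correct = (base + x) & 0xFFFF
    if uncorrected != correct:
        # Page crossed: the first read is a dummy read at the wrong address,
        # followed by the real read one cycle later.
        return [("dummy", uncorrected), ("real", correct)]
    # No page cross: the first read is already the real one.
    return [("real", correct)]


if __name__ == "__main__":
    for kind, addr in absolute_x_read_accesses(0x40F5, 0x20):
        print(kind, hex(addr))
```

With base $40F5 and X = $20 the dummy read lands on $4015, and reading $4015 clears the APU frame interrupt flag. That is the kind of side effect an emulator misses if it skips dummy reads.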

Cycle penalties:
* Cycle Penalty when ABS+X/Y crosses to a new high byte
Battletoads will shake the screen without the cycle penalty.
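
A minimal sketch of the page-cross penalty, assuming a 4-cycle base cost (as for LDA absolute,X) and 3 PPU dots per CPU cycle (NTSC). The function names are hypothetical.

```python
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC ratio; an assumption of this sketch


def absolute_indexed_read_cycles(base, index, base_cycles=4):
    """CPU cycles for an absolute,X / absolute,Y read instruction.
    One extra cycle is added when base + index crosses into a new page."""
    crossed = ((base & 0xFF) + index) > 0xFF
    return base_cycles + (1 if crossed else 0)


def ppu_dots_for(cycles):
    """PPU dots that elapse during the given number of CPU cycles."""
    return cycles * PPU_DOTS_PER_CPU_CYCLE


if __name__ == "__main__":
    print(absolute_indexed_read_cycles(0x1200, 0x20))  # no page cross
    print(absolute_indexed_read_cycles(0x12F0, 0x20))  # page cross
```

Missing the penalty makes the PPU run 3 dots short for every page-crossing read, which adds up quickly in a tight timing loop.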

There are other quirks too.

Then, to eliminate most scroll shaking issues, you need to have scroll writes apply as the write cycle of the instruction ends. This is usually the 4th cycle of an absolute write. Other IO writes need to happen at that time too.
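
As a rough worked example of that timing, the sketch below computes the PPU dot at which the write of an STA absolute takes effect, assuming 3 PPU dots per CPU cycle (NTSC) and ignoring scanline wraparound. The names are hypothetical.

```python
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC ratio; an assumption of this sketch

# Bus activity of STA absolute, one entry per CPU cycle.
STA_ABSOLUTE = [
    "fetch opcode",
    "fetch address low",
    "fetch address high",
    "write",
]


def write_effect_dot(start_dot, cycles=STA_ABSOLUTE):
    """Return the PPU dot at which the write takes effect, i.e. the dot
    reached when the write cycle ends. 'start_dot' is the dot at which the
    instruction begins. Scanline wraparound is deliberately ignored."""
    write_cycle = cycles.index("write") + 1  # 1-based cycle number
    return start_dot + write_cycle * PPU_DOTS_PER_CPU_CYCLE


if __name__ == "__main__":
    print(write_effect_dot(250))
```

For instructions whose write is not on the 4th cycle, pass a different cycle list; the same function then reports where that write lands.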

Also, "Microcode" isn't the right term, as the 6502 is a hardwired processor that does not use microcode.
Re: Emulator update and question about timing and microcodes
by on (#225696)
Dwedit wrote:
Also, "Microcode" isn't the right term, as the 6502 is a hardwired processor that does not use microcode.

I agree that more is hardwired in the 6502's unstructured decode logic than in some of its contemporaries. But the 6502 does contain 130 lines of decode ROM.
Re: Emulator update and question about timing and microcodes
by on (#225697)
Dwedit wrote:
"Microcode" isn't the right term, as the 6502 is a hardwired processor that does not use microcode.


This topic has come up in the past and you're right, of course. But, for lack of a better term, "microcode" conveys the concept.
Re: Emulator update and question about timing and microcodes
by on (#225900)
Thanks for the input, guys. Looks like "microcodes" is indeed my next step to go any further in my accuracy quest. :)
Re: Emulator update and question about timing and microcodes
by on (#225914)
Note that you don't need to be able to pause the CPU mid-instruction - you can still have it execute entire instructions at a time, just as long as you advance the PPU and APU by the appropriate amounts each time the CPU performs a single memory access.
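
A minimal sketch of that idea, assuming stub PPU/APU classes that only count time and a flat 64 KiB array standing in for the real memory map. All class and function names are hypothetical. The CPU still runs a whole instruction in one call, but every memory access advances the PPU by 3 dots and the APU by 1 cycle.

```python
class Ppu:
    """Stub PPU that only counts elapsed dots."""
    def __init__(self):
        self.dots = 0

    def step(self, dots):
        self.dots += dots


class Apu:
    """Stub APU that only counts elapsed CPU cycles."""
    def __init__(self):
        self.cycles = 0

    def step(self, cycles):
        self.cycles += cycles


class Bus:
    def __init__(self):
        self.ram = bytearray(0x10000)  # flat memory, no mirroring or registers
        self.ppu = Ppu()
        self.apu = Apu()

    def _tick(self):
        # One CPU cycle = 3 PPU dots (NTSC) and 1 APU cycle.
        self.ppu.step(3)
        self.apu.step(1)

    def read(self, addr):
        self._tick()  # tick first, so the access sees end-of-cycle state
        return self.ram[addr & 0xFFFF]

    def write(self, addr, value):
        self._tick()
        self.ram[addr & 0xFFFF] = value & 0xFF


def sta_absolute(bus, pc, a):
    """Run a whole STA absolute as four bus accesses; returns the new PC."""
    bus.read(pc)                   # cycle 1: opcode fetch
    lo = bus.read(pc + 1)          # cycle 2: address low byte
    hi = bus.read(pc + 2)          # cycle 3: address high byte
    bus.write((hi << 8) | lo, a)   # cycle 4: the actual write
    return pc + 3
```

Whether to tick before or after the access is a modelling choice; ticking first is used here so that a write takes effect as its cycle ends.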

The "simplest" way is to just model all CPU instructions as sequences of memory reads and writes (including dummy reads/writes, as Dwedit noted) and have your "ReadMem"/"WriteMem" functions run the PPU and APU by a single CPU cycle, but you can optimize things further by just queueing up those cycles and "catching them up" whenever some special interaction would be triggered (e.g. the CPU accessing an I/O register, or the PPU/APU generating an interrupt which you can predict fairly easily).
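
The catch-up variant might look roughly like this. It is a sketch under assumptions: the PPU is a stub that only counts dots, the I/O range check is simplified, and interrupt prediction is left out entirely. All names are hypothetical.

```python
class LazyPpu:
    """Stub PPU that only counts the dots it has actually emulated."""
    def __init__(self):
        self.dots = 0

    def run(self, dots):
        self.dots += dots


class CatchUpBus:
    def __init__(self):
        self.ram = bytearray(0x10000)  # flat stand-in for the memory map
        self.ppu = LazyPpu()
        self.pending_cycles = 0        # CPU cycles the PPU has not seen yet
        self.catch_ups = 0             # how many times the PPU was synced

    @staticmethod
    def _is_io(addr):
        # PPU registers and their mirrors, plus the APU/IO register area.
        return 0x2000 <= addr <= 0x401F

    def catch_up(self):
        """Run the PPU for all queued cycles (3 dots per CPU cycle)."""
        if self.pending_cycles:
            self.ppu.run(self.pending_cycles * 3)
            self.pending_cycles = 0
            self.catch_ups += 1

    def _account(self, addr):
        self.pending_cycles += 1
        if self._is_io(addr):
            # The access can observe or change PPU/APU state: sync first.
            self.catch_up()

    def read(self, addr):
        self._account(addr)
        return self.ram[addr & 0xFFFF]

    def write(self, addr, value):
        self._account(addr)
        self.ram[addr & 0xFFFF] = value & 0xFF
```

A real emulator would also need to call `catch_up()` before the predicted time of any PPU/APU interrupt and at the end of each frame; this sketch only covers the I/O register case.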