I am aware of a few CPU quirks, like the dummy write on INC. Is there a list somewhere of the other quirks, or can one of you walking NES information archives whip one up? Much appreciated! ;D
There's the page-wrap error in the JMP indirect instruction: ask for an indirect JMP at ($1FF) and it reads the 16-bit jump address from $1FF and $100 instead of $1FF and $200.
Then there's the extra cycle some instructions take when adding X or Y crosses a page boundary.
Then there are the zero page indexed instructions, which wrap back into the zero page when you add X to them instead of advancing into the $100 page.
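For what it's worth, both of those quirks boil down to address arithmetic that only carries within a byte. A minimal sketch (helper names are made up, not from any particular emulator):

```c
#include <stdint.h>

/* JMP ($xxFF) bug: the high byte of the target comes from the start of
   the SAME page, because the +1 only carries within the low byte. */
uint16_t jmp_indirect_hi_addr(uint16_t ptr)
{
    return (ptr & 0xFF00) | ((ptr + 1) & 0x00FF);
}

/* Zero page indexed addressing: the sum wraps within page zero and
   never advances into the $100 page. */
uint16_t zp_indexed(uint8_t zp, uint8_t x)
{
    return (uint8_t)(zp + x);
}
```

So a JMP ($1FF) fetches its high byte from $100, and LDA $FF,X with X=1 reads from $00, matching the behavior described above.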
Those are all good quirks; what about other dummy reads/writes? I'm trying to make my CPU not require a timing table, instead just adding cycles for reads/writes (and page boundary crossings where appropriate).
I think there are dummy reads at the wrong address when there's a penalty cycle. Not 100% sure here, but I think when an instruction like LDA xxxx,X crosses to the next page, it reads from the incorrect page first, then reads from the correct page.
Wonder if this could be used to read both PPUSTAT and PPUDATA from one instruction?
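If that's right (it matches the cycle-by-cycle listings I've seen), the dummy read address is just the base address with only its low byte indexed, one page below the fixed-up address. A sketch (the function name is made up):

```c
#include <stdint.h>

/* For LDA $xxxx,X with a page crossing, the CPU first reads from an
   address whose low byte is already indexed but whose high byte has
   not yet been fixed up; it re-reads from the correct address on the
   penalty cycle. */
uint16_t indexed_dummy_read_addr(uint16_t base, uint8_t x)
{
    return (base & 0xFF00) | ((base + x) & 0x00FF);
}
```

For example, LDA $20F0,X with X=$20 would dummy-read $2010, then read the real data from $2110.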
Alright, and am I correct that INC does a dummy write? Say you do INC $AA: it reads the value at $AA, writes that same value back to $AA, increments it, then writes to $AA again?
If so, does DEC do the same thing?
I think all the "read-modify-write" instructions (including shift instructions) do that.
What's bad is that some games want to see the dummy write, and not the real write afterwards. Good ol' MMC1 games using INC xxxx to write $FF as the dummy value.
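In bus terms the read-modify-write sequence is: read, dummy write of the old value, then write of the new value. Two writes hit the bus, which is exactly what mappers like MMC1 can notice. A sketch with a flat memory array and a write counter (all names hypothetical):

```c
#include <stdint.h>

static uint8_t ram[0x10000];            /* flat memory, for illustration */
static int bus_writes;                  /* counts writes seen on the bus */

static uint8_t mem_read(uint16_t a)             { return ram[a]; }
static void    mem_write(uint16_t a, uint8_t v) { ram[a] = v; bus_writes++; }

/* INC as a read-modify-write: the old value is written back while the
   increment happens internally, then the new value is written. */
static void inc_mem(uint16_t addr)
{
    uint8_t v = mem_read(addr);   /* cycle: read the old value           */
    mem_write(addr, v);           /* cycle: dummy write of the old value */
    v++;                          /* modify happens during that cycle    */
    mem_write(addr, v);           /* cycle: write the new value          */
}
```

DEC and the shift/rotate instructions would follow the same pattern with a different modify step.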
I couldn't find it right now, but I'm pretty sure someone in this board posted a link to a document that lists everything each instruction does every cycle, so it should contain all the dummy reads/writes. I'll update here if I can find it.
It seems that this is a good idea then! There are a few things still confusing me. Take for example LDA (xx,X) (assume all reads/writes consume 1 CPU cycle):
Read Instruction (PC)
Read Zero Page Address (PC + 1)
Add X, with wrapping
Read Low Byte ($00)
Read High Byte ($20)
Read Data ($2000)
This sequence totals 5 cycles, but the actual instruction requires 6... Where is the missing cycle coming from?
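For comparison, the cycle-by-cycle listings give (zp,X) a dummy read of the un-indexed pointer while X is being added, which would supply the sixth cycle. A sketch of the bus accesses, one read per cycle (the flat memory array and helper names are made up):

```c
#include <stdint.h>

static uint8_t ram[0x10000];
static uint16_t trace[8];              /* bus addresses touched, in order */
static int cycles_used;

static uint8_t rd(uint16_t a) { trace[cycles_used++] = a; return ram[a]; }

/* LDA (zp,X): six cycles, every one of them a bus read. */
static uint8_t lda_izx(uint16_t pc, uint8_t x)
{
    rd(pc);                                 /* 1: fetch opcode              */
    uint8_t zp = rd(pc + 1);                /* 2: fetch zero page operand   */
    rd(zp);                                 /* 3: dummy read while adding X */
    uint8_t lo = rd((uint8_t)(zp + x));     /* 4: pointer low byte          */
    uint8_t hi = rd((uint8_t)(zp + x + 1)); /* 5: pointer high byte         */
    return rd((uint16_t)((hi << 8) | lo));  /* 6: read the data             */
}
```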
EDIT: That would be great tokumaru!
I think this is the document. The last third of the document appears to detail several instructions.
The 6502.org forums have details about instruction execution and the PLA (programmable logic array, not to be confused with 'pull accumulator').
The following links mainly discuss how the 6502 handles the so-called "illegal instructions". However, they also contain clues about what the 6502 does on each clock cycle while executing an instruction.
http://forum.6502.org/viewtopic.php?t=1406
http://www.pagetable.com/?p=39
Once this data is gathered, I think that it would make an excellent addition to the wiki.
EDIT:
http://www.geocities.jp/team_zero_three/FC/index_e.html
(search for first use of the word "dummy")
http://nesdev.com/bbs/viewtopi ... 1288#11288
EDIT #2: Found this on the geocities.jp page:
Quote:
Although the FC/NES 6502 is not completely same as the general NMOS6502 chip, it has been verified that the FC/NES 6502 also has undocumented instructions and executes RMWW for RMW instructions.
It would be so counter-intuitive, but how awesome would it be to do cycle-by-cycle CPU emulation? Like LDA:
Code:
int LDA(int cycles)
{
    static int ldaCycles = 0;      // current cycle within this LDA

    while (cycles > 0)
    {
        cycles--;
        switch (ldaCycles++)
        {
        case 0: // fetch opcode
            break;
        case 1: // fetch address low byte
            break;
        case 2: // fetch address high byte
            break;
        case 3: // read data into A
            ldaCycles = 0;         // done; reset for the next LDA
            return cycles;         // hand back any unused cycles
        }
    }
    return 0;                      // cycle budget exhausted mid-instruction
}
It would be a huge pain in the ass, but I think it would be neato!
I seem to remember Nintendulator emulates cycle by cycle like that.
tepples wrote:
I seem to remember Nintendulator emulates cycle by cycle like that.
I didn't come to that conclusion looking at the source for v0.975.
Nestopia emulates the instructions cycle by cycle.
NESICIDE wrote:
tepples wrote:
I seem to remember Nintendulator emulates cycle by cycle like that.
I didn't come to that conclusion looking at the source for v0.975.
I initially thought that too, but actually it does emulate cycle by cycle. Check the functions which handle different addressing modes (AM_xxx), every MemGet()/MemSet() increases cycle count.
thefox wrote:
NESICIDE wrote:
tepples wrote:
I seem to remember Nintendulator emulates cycle by cycle like that.
I didn't come to that conclusion looking at the source for v0.975.
I initially thought that too, but actually it does emulate cycle by cycle. Check the functions which handle different addressing modes (AM_xxx), every MemGet()/MemSet() increases cycle count.
I had the same thought and, indeed, that's how my emulator does it too. However, halfway through posting my "there are tons of emulators out there that do cycle-by-cycle emulation" reply, I re-read the original suggestion.
The *difference* between the way we do it and the suggestion is that the proposed LDA function takes a number of cycles to run as an argument, keeps track of the current cycle of the LDA instruction statically, and returns when the passed-in number of cycles has been exhausted or when the instruction completes [whichever occurs first]. So, the LDA function may be called with a cycle count of 1, in which case it would only execute *one* memory-access cycle of the LDA and then return.
I do BRK/NMI/IRQ that way just so I can get the granularity necessary to pass the CPU interrupt tests. I thought about redoing my CPU core entirely to make every instruction take a cycles parameter and return when the cycles were depleted, not necessarily when the instruction was completed. But I wonder if an easier approach would be to just have a subordinate catch-up routine in the PPU that is called whenever the CPU notices that the number of cycles it's been asked to emulate has been depleted. That way the PPU can change state mid-CPU-instruction and the CPU can react as necessary. Of course, there'd need to be some way to stop the thing, otherwise the CPU would just keep asking the PPU for more forever.
I think there are two "cycle-based" ideas in conflict. One is the independent cycle-based component idea, where each core runs one cycle at a time exactly as its hardware equivalent would. Then there's the cycle-synced component idea, where each component is cycle-based and also synced at cycle granularity to the other components. For the CPU/APU this is easy since they're 1:1. For the CPU/PPU, whenever the CPU runs one cycle the PPU should run 3 (NTSC) or 3.2 (PAL). But in both our cases the CPU can run ahead of the PPU, because it might be doing multiple reads/writes to satisfy the completion of an LDA. In my emulator the first two cycles of any instruction are broken apart so that the PPU/CPU are cycle-synced for those two cycles. But once the instruction begins operation, the CPU will jump ahead of the PPU depending on how many cycles remain to be done for the instruction.
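One way to get the cycle-synced flavor without splitting instructions at all is to tick the PPU from inside every CPU bus access, so it can never fall behind even mid-instruction. A minimal sketch (3 PPU dots per CPU cycle, NTSC; all names are hypothetical):

```c
#include <stdint.h>

static long ppu_dots;                 /* total PPU cycles run so far */

static void ppu_tick(void)
{
    ppu_dots++;                       /* run one PPU dot's worth of work here */
}

static uint8_t ram[0x10000];

/* Every CPU bus access costs one CPU cycle = 3 PPU dots (NTSC). */
static uint8_t cpu_read(uint16_t a)
{
    ppu_tick(); ppu_tick(); ppu_tick();
    return ram[a];
}

static void cpu_write(uint16_t a, uint8_t v)
{
    ppu_tick(); ppu_tick(); ppu_tick();
    ram[a] = v;
}
```

The downside is the call overhead on every bus access; the catch-up approach trades that for more bookkeeping.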
Interesting thought... I might have to crack open my CPU core again.
It would be one hell of an undertaking, but it would be awesome once it's done! I imagine it would run horribly slowly though, possibly too slow for general use? The largest concern I have is: if an interrupt fires mid-instruction, what happens to that instruction after control is returned? Is it skipped, restarted, or does it go back to work as if nothing happened? I imagine the latter is true, but I can only speculate!
Indeed, having a static timer inside each instruction's execution method would make for tidier code, but again, if an interrupt occurred during, say, LDA, that cycle counter would have to be reset, or the next LDA called would continue at the previous instruction's step!
Let us all know how much progress you make, it would be nice to have a proof of concept at least!
beannaich wrote:
It would be one hell of an undertaking, but it would be awesome once it's done! I imagine it would run horribly slowly though, possibly too slow for general use? The largest concern I have is: if an interrupt fires mid-instruction, what happens to that instruction after control is returned? Is it skipped, restarted, or does it go back to work as if nothing happened? I imagine the latter is true, but I can only speculate!
Indeed, having a static timer inside each instruction's execution method would make for tidier code, but again, if an interrupt occurred during, say, LDA, that cycle counter would have to be reset, or the next LDA called would continue at the previous instruction's step!
Let us all know how much progress you make, it would be nice to have a proof of concept at least!
I think the interrupt waits until the instruction has finished executing.
You can't interrupt an instruction mid-execution; the interrupt is serviced between instructions. Otherwise I can't imagine it being very stable at all.
On the 6502, this is correct. But on the 65816, an instruction can be /ABORTed part-way through. This was intended to work with a mapper to support virtual memory.