When emulated, the stage select screen in Mega Man 3 shows a shifting scanline just above Shadow Man's picture. I always assumed it was a timing issue that emulators hadn't figured out. I hadn't played the game in 10 years on the real hardware and I figured Capcom and Nintendo would never allow such a visual glitch to appear. But having played MM3 on the real hardware today I can confirm that the scanline behaves exactly in the way it does in our more advanced emulators like Nintendulator. So, one issue laid to rest at last.
Now if they could eliminate the glitches in Micro Machines I would be very happy.
That MM3 glitch was well known already. Trying to figure out how Micro Machines works is really going to be a bitch. Any information on what reading $2004 does exactly when the PPU is rendering a picture? What's weird about the code is that -at least throughout the first screens or so- it reads $2004 usually 8 times within a timed loop. Maybe reading it would return $80 ORed with some value whenever a specific sprite line has been rendered?
It seems logical to me that $2003 would get changed as the PPU fetches sprite data, so reading $2004 would return the byte last fetched by the PPU. BT's doc implies that sprite Y data would be fetched every 4 cycles (256 cycles / 64 sprites... Y is fetched every 4 cycles to see if the sprite is in range). If the sprite is in range, it would presumably fetch the other sprite data in the following 3 cycles (before the next Y fetch). My hunch is that, were an emu to keep $2003 synced up with these fetches, it might give some positive results in Micro Machines. I've been meaning to tinker around and experiment with this, but haven't gotten around to it yet. It was definitely on my to-do list.
But anyway... that idea is completely 100% theory. I don't know if that's at all accurate, it just seems logical to me.
edit: implementation might be as simple as setting $2003 to the current scanline cycle?
Disch is correct in that reading $2004 while rendering DOES expose internal SPR-RAM reads. The actual read pattern has been investigated, but not fully deciphered. It seems to read a different sprite's Y-coordinate every two cycles, and then some more stuff if it's within range; remaining cycles seem to have it alternating between sprite N y-coordinates and sprite #63's Y-coordinate (i.e. 00, FC, 04, FC, 08, FC, 0C, FC, ...). HBlank seems to return either $FF or some other random-looking values.
Since the test I used only performed one read per scanline, it was not possible to construct an accurate model of what was going on (since each scanline had different sprites), though it may be possible to get better results by reading multiple times per scanline.
I was thinking just like Disch and tried a quick hack where I returned byte # [ppu cycle] when reading $2004. Didn't fix Micro Machines though.
If I recall, Micro Machines checks for bit 7 to be set, which according to Q's post would indicate it's in HBlank. Gonna try some more hacks in my emu.
-Martin
I've just finished improving my original test program so it reads all of the values during the first 3 scanlines, spread out across 42 frames - 12 values per scanline, 3 scanlines per every OTHER frame. With the sprite pattern I used, this will make it always fetch the same sprites on those scanlines, providing an accurate snapshot of exactly what is going on.
The data sequence starts at cycle ~339 and ends at cycle ~249, a total of 252 samples (the last 4 will always be zero).
The test program can be found at the following URL, in case you have a real NES to test it on:
http://www.qmtpro.com/~nes/demos/read2004.nes
[update]
After having the above program (and a few modifications) tested on Kevin Horton's CopyNES, I believe I have definitive information on the PPU's internal SPR-RAM access patterns:
Cycles 0-63: idle - reading $2004 will return $FF
Cycles 64-255: sprite evaluation
- read each sprite's Y coordinate twice, starting at sprite zero (S[00], S[00], S[04], S[04], S[08], S[08], ...)
- if it's in range, read the remaining bytes of that sprite, twice each (S[00], S[00], S[04], S[04], S[05], S[05], S[06], S[06], S[07], S[07], S[08], S[08], S[0C], S[0C], ...)
- once all 64 sprites have been evaluated, "thrash" between fetching sprite N Y-coord and sprite 63 Y-coord (S[00], S[FC], S[04], S[FC], S[08], S[FC], ...)
Cycles 256-319: sprite fetches (8 sprites, 8 cycles each)
- for each sprite, read S[n+0], S[n+1], S[n+2], S[n+3], S[foo], S[foo], S[foo], S[foo], where 'foo' is usually the X-coordinate (n+3)
- for the first unused sprite, read S[FC], followed by seven $FFs
- for remaining unused sprites, read eight $FFs
Cycles 320-340: read S[foo], where 'foo' is usually the Y-coordinate of the first sprite on the scanline
Nintendulator currently emulates this (based on the initial tests and some 'guesswork'), though it returns values consistent with no sprites being rendered at all.
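The evaluation phase described above (cycles 64-255) can be sketched as a function that lists which SPR-RAM address would be visible through $2004 on each cycle. This is only a rough model built from the pattern in this post, not tested hardware behavior: the name `sprite_eval_trace` and its parameters are made up for illustration, the in-range test is a plain 8-bit subtract-and-compare, and it makes no attempt to model what happens when more than 8 sprites are in range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EVAL_CYCLES = 192 };  /* cycles 64..255 of one scanline */

/* Hypothetical helper: trace[i] receives the SPR-RAM address read at
 * cycle 64 + i. "height" is 8 or 16 depending on the sprite size. */
void sprite_eval_trace(const uint8_t oam[256], int scanline, int height,
                       uint8_t trace[EVAL_CYCLES])
{
    assert(height == 8 || height == 16);
    size_t pos = 0;

    /* Each sprite: Y is read twice; if in range, bytes 1..3 twice each. */
    for (int n = 0; n < 64 && pos < EVAL_CYCLES; n++) {
        int base = n * 4;
        int in_range = (uint8_t)(scanline - oam[base]) < height;
        int last = in_range ? 3 : 0;
        for (int b = 0; b <= last; b++)
            for (int rep = 0; rep < 2 && pos < EVAL_CYCLES; rep++)
                trace[pos++] = (uint8_t)(base + b);
    }

    /* Leftover cycles: "thrash" between sprite N's Y and sprite 63's Y
     * (S[00], S[FC], S[04], S[FC], ...). */
    for (int k = 0; pos < EVAL_CYCLES; k++)
        trace[pos++] = (k & 1) ? 0xFC : (uint8_t)((k / 2) * 4);
}
```

With no sprites in range, the first 128 entries walk the 64 Y-coordinates in pairs and the remaining 64 alternate with $FC. Putting one sprite in range inserts six extra reads for its remaining bytes and shortens the thrash period by the same amount.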
Excellent! Way to go, Quietust... you're my hero.
I'll begin implementing this and see if it solves the Micro Machines problems once and for all. Thanks a bunch.
EDIT:
I'd assume this only takes effect outside VBlank if BG or SPR rendering is on... right? If both are off or it's in VBlank, it just returns the byte at the address specified by $2003, right?
If sprite 63's y-coordinate were to be in range, while the ppu is thrashing, would the ppu read the rest of the data for that sprite, as though it weren't in "thrash mode"?
I assume the S[foo] reads don't serve a purpose directly related to the rendering of the sprite, right?
and the sprite fetches are just the sprites that were in range?
Drag wrote:
If sprite 63's y-coordinate were to be in range, while the ppu is thrashing, would the ppu read the rest of the data for that sprite, as though it weren't in "thrash mode"?
I don't know, but I personally doubt it.
Quote:
I assume the S[foo] reads don't serve a purpose directly related to the rendering of the sprite, right?
Not that I'm aware.
Quote:
and the sprite fetches are just the sprites that were in range?
Of course - they correspond directly to the sprites that get rendered on the next scanline.
After adding a bit of extra logic to my emulator (namely, to not return $FF during cycles 320-340), the glitches in Micro Machines are completely gone. My $2004 read behavior is now mostly consistent with the real hardware when there are no sprites on the scanline; adding proper behavior when there ARE visible sprites will be an interesting task, though I will attempt it.
As usual, the current build can be found at
http://nintendulator.sf.net/
I still need to perform a few more tests, though, to answer a few questions:
1. What happens during cycles [256-319] when there are no sprites on the given scanline? (very first read would be either SPR_RAM[FC] or literal $FF)
2. When there are more than 8 sprites on a scanline, what happens during the remaining cycles in [64-255]? (will it attempt to fetch additional sprite data?)
3. What happens during the remaining cycles in [64-255] when sprite #63 is in range? (most likely nothing happens)
I tried implementing this and I've almost got Micro Machines working... but I'm still having some jitteriness problems. With my current implementation, the "?" racer at the top and racer list at the bottom get displayed fine... but the text in between them still messes up.
It seems to come down to the [320-340] range. The game doesn't seem to be reading in any of the other ranges (except for [0-63], but that's simple enough to implement) -- EDIT: crap I was getting the cycle wrong... it does read from under 320 sometimes... but my problem still remains.
My current implementation of reads when the PPU is in that cycle:
Code:
else // nScanCyc is in the [320-340] range
{
    // find first sprite on line
    for(i = 0; i < 256; i += 4)
    {
        if((u8)(nScanline - nSprRam[i]) < (bSpr16 ? 16 : 8)) break;
    }
    if(i < 256)
    {
        return nSprRam[nSprRam[i]];
    }
    else //no sprite on line
    {
        //not sure if this is right?
        return nSprRam[0xFC];
    }
}
Anything standing out as being wrong? There always seems to be a sprite found on the line (sprite 0 -- every time)
EDIT #2:
I cut off the [320-340] region code and just had it return 0 in that range... and that seemed to have solved everything. Micro Machines seems to run perfectly now. Is that what I should be doing? Or is my above code flawed somehow?
Hrm... nSprRam[i] instead of nSprRam[nSprRam[i]] also works. Maybe I just misinterpreted what you said before. Never mind ^^
Thanks a million Q, you are the NES master.
This is the code I use, and it gets Micro Machines glitch-free. It doesn't yet produce valid data when there are sprites on the scanline, though I'll be adding that later.
Code:
static int __fastcall Read4 (void)
{
    if (PPU.IsRendering)
    {
        if (PPU.Clockticks < 64)
            PPU.ppuLatch = 0xFF;
        else if (PPU.Clockticks < 192)
            PPU.ppuLatch = PPU.Sprite[((PPU.Clockticks - 64) << 1) & 0xFC];
        else if (PPU.Clockticks < 256)
            PPU.ppuLatch = (PPU.Clockticks & 1) ? PPU.Sprite[0xFC] : PPU.Sprite[((PPU.Clockticks - 192) << 1) & 0xFC];
        else if (PPU.Clockticks < 320)
            PPU.ppuLatch = 0xFF;
        else PPU.ppuLatch = PPU.Sprite[0];
    }
    else PPU.ppuLatch = PPU.Sprite[PPU.SprAddr];
    return PPU.ppuLatch;
}
ah, so you're just returning S[00] when the cycle is above 320.
From what I can tell... Micro Machines isn't exactly super strict about this behavior. All it really expects is to have the high bit clear when the cycle is between 320-340 and the high bit set if earlier or later.
It's good to know the details anyway. I've implemented the info you've posted in my emu. It's not very efficient though... if a game were to constantly read $2004 all frame my emu would eat CPU time like a mofo XD. But ah well.
Another milestone in NESemdev!
In performing some additional tests, I ran into another minor catch - sprite DMA actually takes 513 cycles, rather than the expected 512 (note that this does not include the 4 cycles necessary for the 'STA $4014' instruction).
I have subsequently updated my emulator to reflect this information.
Quietust wrote:
In performing some additional tests, I ran into another minor catch - sprite DMA actually takes 513 cycles, rather than the expected 512 (note that this does not include the 4 cycles necessary for the 'STA $4014' instruction).
I have subsequently updated my emulator to reflect this information.
Are you sure? Isn't it just because of the CPU instruction delay before the DMA kicks in merely giving the impression that it takes 513 cycles and not 512?
The 2A03's DMA controller adds a couple cycles of overhead because it has to make sure that the 6502 core is completely halted before the DMA controller starts a transfer.
That could be, but the overall time elapsed during an "STA $4014" is a total of 517 cycles.
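For emulator authors, the cycle accounting discussed above might be sketched like this. The handler name `sprite_dma` and its parameters are hypothetical, and the sketch assumes $2003 is 0 when the transfer starts so the page lands at the start of SPR-RAM; the only figures taken from this thread are the 513-cycle transfer and the 4-cycle `STA $4014`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    STA_ABS_CYCLES    = 4,    /* the 'STA $4014' instruction itself */
    SPRITE_DMA_CYCLES = 513   /* measured transfer time, not 512 */
};

/* Hypothetical $4014 write handler: copies one 256-byte CPU page into
 * SPR-RAM (assuming $2003 == 0) and returns how many cycles the CPU
 * should be stalled, on top of the STA instruction's own cycles. */
unsigned sprite_dma(uint8_t spr_ram[256], const uint8_t cpu_mem[0x10000],
                    uint8_t page)
{
    assert(spr_ram != NULL && cpu_mem != NULL);
    memcpy(spr_ram, &cpu_mem[(unsigned)page << 8], 256);
    return SPRITE_DMA_CYCLES;
}
```

Adding the returned stall to the instruction's 4 cycles gives the 517 total mentioned above.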
I've also discovered some startling information, which seems to explain the bizarre behavior we had noticed earlier with sprite overflow happening even when there were not 9 sprites on a scanline. Following is the sprite table used in my demo, evaluated cycle by cycle during scanlines 1-3. Fetches are shown in bold, and successful Y-evaluations are shown in italic.
$AA,$00,$00,$00 - sprite not in range
$01,$10,$01,$00 - first sprite in range
$00,$20,$01,$01 - 2nd
$00,$30,$01,$02
$00,$40,$02,$03
$00,$50,$02,$04
$00,$60,$02,$05
$00,$70,$03,$06
$00,$80,$03,$07 - and 8th, after which it starts thrashing between the first sprite's Y-coordinate and the coordinate being evaluated
$05,$90,$03,$08 - but here's where it goes horribly wrong..
$05,$A0,$40,$09 - it starts fetching DIAGONALLY through the sprite table
$05,$B0,$41,$0A
$05,$C0,$42,$0B
$05,$D0,$43,$0C
$05,$E0,$80,$0D
$05,$F0,$81,$0E
$05,$F1,$82,$0F
$05,$F2,$83,$10
$05,$F3,$00,$11
$05,$F4,$00,$12 - managing to match sprite 19's FLAGS value as a valid Y coordinate
$05,$F5,$00,$13 - after it finishes "fetching" this sprite (and setting the overflow flag), it realigns back at the beginning of this line
$05,$F6,$00,$14 - and then continues here on the next sprite
$05,$F7,$00,$15
$05,$F8,$00,$16
$05,$F9,$00,$17
etc.
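One way to read the "diagonal" walk in the table above is that, once 8 sprites have been found, a failed Y test advances both the sprite index and the byte offset within the sprite. The sketch below models only that reading and stops at the first false match; it does not model the realignment afterwards. The names `find_overflow`, `OverflowHit` and `load_demo_table` are invented for illustration, and the sprites hidden behind "etc." are filled with $F0 as a placeholder assumption (they are never reached in this example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    bool overflow;  /* true if a "9th sprite" match was found */
    int  sprite;    /* sprite index where it matched, or -1 */
    int  byte;      /* byte within that sprite read as Y, or -1 */
} OverflowHit;

static bool in_range(uint8_t y, int scanline, int height)
{
    return (uint8_t)(scanline - y) < height;
}

OverflowHit find_overflow(const uint8_t oam[256], int scanline, int height)
{
    assert(height == 8 || height == 16);
    OverflowHit hit = { false, -1, -1 };
    int n = 0, found = 0;

    /* Normal evaluation: test byte 0 (Y) until 8 sprites are found. */
    for (; n < 64 && found < 8; n++)
        if (in_range(oam[n * 4], scanline, height))
            found++;
    if (found < 8)
        return hit;

    /* Assumed buggy phase: a miss bumps the byte offset m as well as n. */
    for (int m = 0; n < 64; n++, m = (m + 1) & 3) {
        if (in_range(oam[n * 4 + m], scanline, height)) {
            hit.overflow = true;
            hit.sprite = n;
            hit.byte = m;
            return hit;
        }
    }
    return hit;
}

/* The 25 sprites listed in the post; the rest are placeholder $F0. */
void load_demo_table(uint8_t oam[256])
{
    static const uint8_t rows[25][4] = {
        {0xAA,0x00,0x00,0x00}, {0x01,0x10,0x01,0x00}, {0x00,0x20,0x01,0x01},
        {0x00,0x30,0x01,0x02}, {0x00,0x40,0x02,0x03}, {0x00,0x50,0x02,0x04},
        {0x00,0x60,0x02,0x05}, {0x00,0x70,0x03,0x06}, {0x00,0x80,0x03,0x07},
        {0x05,0x90,0x03,0x08}, {0x05,0xA0,0x40,0x09}, {0x05,0xB0,0x41,0x0A},
        {0x05,0xC0,0x42,0x0B}, {0x05,0xD0,0x43,0x0C}, {0x05,0xE0,0x80,0x0D},
        {0x05,0xF0,0x81,0x0E}, {0x05,0xF1,0x82,0x0F}, {0x05,0xF2,0x83,0x10},
        {0x05,0xF3,0x00,0x11}, {0x05,0xF4,0x00,0x12}, {0x05,0xF5,0x00,0x13},
        {0x05,0xF6,0x00,0x14}, {0x05,0xF7,0x00,0x15}, {0x05,0xF8,0x00,0x16},
        {0x05,0xF9,0x00,0x17},
    };
    memset(oam, 0xF0, 256);
    memcpy(oam, rows, sizeof rows);
}
```

Calling `find_overflow` on this table for scanlines 1-3 with 8-pixel sprites reports a match at sprite 19, byte 2, which lines up with the note above about sprite 19's flags value being taken as a Y coordinate.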
Gosh.
Damn.
I don't know which is more shocking: the complexity of the behavior of the PPU bug you describe, or that you managed to figure it out.
Just ... wow. What kind of twisted circuitry exists in the PPU to cause that?
Micro Machines also displays another instance of unusual, difficult to emulate behavior. (Which only Nintendulator seems to emulate correctly.) If you press the reset button while it is running in a NES, it will reset just fine during the races but not in the menu screens. In the Standalone Cartridge Version it frequently crashes the program, in the Aladdin Deck Enhancer Version it generally just cuts off the music until the next screen but still allows input. On most emulators, using the "Soft Reset" option should be identical to pressing the reset button of a NES. I wonder if this is related to the method Micro Machines uses to display its picture in these screens?
While this is an old post to resurrect, the issue I describe below may be the same type of error that is displayed in MM3, a true glitch in the real NES:
In the rather rare game Zombie Nation (and its Japanese counterpart), every emulator I have ever used it on has the top scanline of the bottom screen shift about 8-16 pixels or so very rapidly. I would ordinarily suppose this to be a glitch in the emulation, because the defect would be clearly noticeable on a TV. Many shifting scanlines are covered by the TV's overscan or are so far on the sides of the screen as not to be noticeable. The shifting scanlines as I see them on Zombie Nation cover the entire health meter heads. Is this an emulation problem that needs to be addressed, or sloppy programming?
If I didn't know any better, I'd think that I'm seeing PPU documentation approaching the level of the "VIC Article" for the VIC-II. Very impressive.