RMW instructions and DMA

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
RMW instructions and DMA
by on (#230611)
With all the new work happening around PPU writes, I was thinking about other areas that remained a general weakness of emulators. I wondered what the simplest remaining 'emulator check' would be for a game or demo maker to do that would trip up most everything except console.

Then I remembered a very simple case that I haven't even touched yet: RMW instructions targeting DMA addresses. This thread has a lot of work done with Visual 2A03, but nothing done yet on console.

I am wondering if anyone can maybe mock up a timing test (maybe using blargg's sync code) that can help establish exact results in such cases?

If nothing else having a test ROM for this will tie up a loose end.

Any thoughts? Any other 'emulator checks' anyone can think of?
Re: RMW instructions and DMA
by on (#230612)
One emulator check that I've found for multiple platforms is sub-frame timing of the start of a keypress relative to the start of vblank. Among Game Boy emulators, bgb randomizes timing if the game reads the controller multiple times in a frame (to simulate GB or GBC hardware), while using the simple method of changing state at vblank (for less input latency) if a game uses only simple reading. The Qt version of mGBA, on the other hand, always makes buttons change state at the moment of vblank. I wrote a test ROM called Telling LYs to demonstrate this, and you may be the second (after pwnskar) to request a port of Telling LYs to the NES.
Re: RMW instructions and DMA
by on (#230613)
All the APU registers are fully decoded, including the 6502's internal φ1 signal. (Yes, it makes sure φ1 is low, instead of making sure that φ2 is high)

While w4014 is true, _db7 is copied to "spr_page7", inverted into "spr_/page7", and finally inverted into spr_+addr15; while φ1 is high spr_page7 and spr_/page7 set up a feedback loop to hold the latched value.

The external address bus is a four-way analog multiplexer (= transmission gates) that enables one of four sources depending on things:
* When "ab_use_cpu" is true, the 6502's address is used
* When "ab_use_pcm" is true, the DPCM DMA address is used
* When "ab_use_spr_r" is true, "spr_+addr" is used
* When "ab_use_spr_w" is true, $2004 is used
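As a rough illustration of the four-way selection listed above, here is a minimal Python sketch. Only the four enable names and the $2004 constant come from the description; the function name, the argument layout, and the handling of zero or multiple enables are assumptions made for illustration, not something read off the die.

```python
from typing import Optional

OAM_DATA_PORT = 0x2004  # address used for the write half of OAM DMA


def external_address(ab_use_cpu: bool, ab_use_pcm: bool,
                     ab_use_spr_r: bool, ab_use_spr_w: bool,
                     cpu_addr: int, pcm_addr: int,
                     spr_addr: int) -> Optional[int]:
    """Hypothetical model of the external address bus multiplexer."""
    enabled = []
    if ab_use_cpu:
        enabled.append(cpu_addr)       # 6502 address
    if ab_use_pcm:
        enabled.append(pcm_addr)       # DPCM DMA address
    if ab_use_spr_r:
        enabled.append(spr_addr)       # "spr_+addr", the OAM DMA read address
    if ab_use_spr_w:
        enabled.append(OAM_DATA_PORT)  # OAM DMA write target
    if not enabled:
        return None  # assumption: no gate enabled, so nothing drives the bus
    if len(enabled) > 1:
        # assumption: the enables are mutually exclusive in normal operation
        raise ValueError("more than one address source enabled")
    return enabled[0]


# OAM DMA write cycle: the bus carries $2004 regardless of the CPU address.
print(hex(external_address(False, False, False, True, 0x0088, 0xC000, 0x0200)))
```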

The 6502 won't get off of the internal data bus except when it yields after /RDY is asserted. It looks like OAM DMA just holds the value loaded from the external bus in the capacitance of the internal data bus, and then drives the value from that back externally for the write cycle of OAM DMA.

So, in conclusion, I don't see how the value could be anything other than the last value written.
Re: RMW instructions and DMA
by on (#230616)
@tepples: yeah that is an obvious one, but to be more specific I am interested in cases that don't require user interaction. (In the same vein as what Furrtek is known to do in their GB games.)

@lidnariq: The address seems to be pretty straightforward, but I am more so interested in timing. That also seems pretty straightforward, but so far there is no test for it. For that matter, I think there is a delay in the execution of DMC DMA by itself, at least that's what I saw in Visual 2A03. There seems to be enough time between when a write to the DMA register happens and when RDY is pulled to do a read or two. Maybe some trickery can be done to find out exactly?
Re: RMW instructions and DMA
by on (#230617)
Looks like writes to $4014 immediately drive +RDY (oops, it's active high) low.

When w4014 is high, that pulls node 14126 low, which pulls node 14117 high, which pulls node 14063 low, which pulls nodes 15737+14039 high, which pulls RDY low. At this point, the 6502 will continue doing all the writes it wants to, and then will stop until RDY goes high again.

I guess the only question (and it's less straightforward to track down) is how long after the write to $4014 does it start the DMA address bus cadence.
Re: RMW instructions and DMA
by on (#230618)
Yeah, it's immediate for $4014, but I am interested in writes that would trigger a DMC DMA (enabling it with $4015). I remember testing it and seeing a noticeable delay, but I never confirmed it on console.
Re: RMW instructions and DMA
by on (#230619)
> a game or demo maker
> I am interested in cases that don't require user interaction.

Sounds like the demoscene. Here's a test that might border on parody of certain scene tendencies:

Test for 341/3.2 = 106.5625 CPU cycles per scanline as opposed to 341/3 = 113.6667, and for 312 scanlines per frame as opposed to 262. This distinguishes NTSC emulators from a PAL NES. Most emulators default to NTSC, while the demoscene has tended to make full use of the taller vblank of the PAL console at the expense of North American viewers.

If it turns out to be something other than a PAL NES, cut to a full-screen flag.
If 113.6667 by 312, show a Russian Federation flag (for Dendy, the best known regional name of the Micro Genius famiclone).
If 113.6667 by 262, and $4016 D3-4 are open bus, show a white screen with a red disk for Japan (for Famicom).
If 113.6667 by 262, and $4016 D3-4 are driven low, show the stars and stripes of the U.S. flag (for North American NES).
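The decision table above could be sketched roughly as follows in Python. The function name, the return strings, and the tolerance are hypothetical; the sketch also assumes the cycles-per-scanline figure, the scanline count, and the state of $4016 D3-4 have already been measured somehow, which is the hard part on real hardware.

```python
import math

NTSC_CYCLES_PER_SCANLINE = 341 / 3    # about 113.6667
PAL_CYCLES_PER_SCANLINE = 341 / 3.2   # about 106.5625


def classify_console(cycles_per_scanline: float, scanlines_per_frame: int,
                     d3_d4_open_bus: bool) -> str:
    """Map measured timing to the outcome described in the post (illustrative only)."""
    def close(a: float, b: float) -> bool:
        return math.isclose(a, b, abs_tol=0.01)  # tolerance is an assumption

    if close(cycles_per_scanline, PAL_CYCLES_PER_SCANLINE) and scanlines_per_frame == 312:
        return "PAL NES: run the demo"
    if close(cycles_per_scanline, NTSC_CYCLES_PER_SCANLINE):
        if scanlines_per_frame == 312:
            return "Dendy: Russian Federation flag"
        if scanlines_per_frame == 262:
            if d3_d4_open_bus:
                return "Famicom: Japanese flag"
            return "North American NES: U.S. flag"
    return "unknown: some other full-screen flag"


print(classify_console(341 / 3, 262, d3_d4_open_bus=False))
```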

Back this up by testing for all PAL DMC periods, which might be inaccurate in an emulator even if it claims to support PAL timing in general. Also test for NTSC quirks fixed in the PAL 2A07/2C07 chipset, such as DMC-triggered double reading of nametables and the defects in bytewise OAM access.

My rationale: Unless your demo effects depend on particular hardware quirks, why test for them? See "Object detection" on Quirks Mode. Why do Furrtek games test for emulation?
Re: RMW instructions and DMA
by on (#230666)
tepples wrote:
My rationale: Unless your demo effects depend on particular hardware quirks, why test for them? See "Object detection" on Quirks Mode. Why do Furrtek games test for emulation?


I think just for fun / to mess with people. I like hardware trickery so I find it amusing. XD

As for DMC DMA timing, the last time I looked at this I got the following:

Code:
cycle   ab   db   rw   Fetch   pc   a   x   y   s   p   c_rdy
1406   0089   ea   1   NOP   0089   1f   a2   00   bd   nv-BdIzc   1
1406   0089   ea   1      0089   1f   a2   00   bd   nv-BdIzc   1
1405   0089   ea   1      0089   1f   a2   00   bd   nv-BdIzc   1
1405   0088   ea   1   NOP   0088   1f   a2   00   bd   nv-BdIzc   1
1404   0088   ea   1   NOP   0088   1f   a2   00   bd   nv-BdIzc   1
1404   0088   ea   1      0088   1f   a2   00   bd   nv-BdIzc   1
1403   0088   ea   1      0088   1f   a2   00   bd   nv-BdIzc   1
1403   c000   00   1      0088   1f   a2   00   bd   nv-BdIzc   0
1402   c000   00   1      0088   1f   a2   00   bd   nv-BdIzc   0  vvvvv three dead cycles instead of usual 4
1402   0088   ea   1      0088   1f   a2   00   bd   nv-BdIzc   0
1401   0088   ea   1      0088   1f   a2   00   bd   nv-BdIzc   0
1401   0088   ea   1      0088   1f   a2   00   bd   nv-BdIzc   0
1400   0088   ea   1      0088   1f   a2   00   bd   nv-BdIzc   1
1400   0087   ea   1   NOP   0087   1f   a2   00   bd   nv-BdIzc   1  ^^^^ Dummy read
1399   0087   ea   1   NOP   0087   1f   a2   00   bd   nv-BdIzc   1
1399   0087   ea   1      0087   1f   a2   00   bd   nv-BdIzc   1
1398   0087   ea   1      0087   1f   a2   00   bd   nv-BdIzc   1
1398   0086   ea   1   NOP   0086   1f   a2   00   bd   nv-BdIzc   1
1397   0086   ea   1   NOP   0086   1f   a2   00   bd   nv-BdIzc   1
1397   4015   1f   0      0086   1f   a2   00   bd   nv-BdIzc   1
1396   4015   40   0      0086   1f   a2   00   bd   nv-BdIzc   1
1396   0085   40   1      0085   1f   a2   00   bd   nv-BdIzc   1
1395   0085   40   1      0085   1f   a2   00   bd   nv-BdIzc   1
1395   0084   15   1      0084   1f   a2   00   bd   nv-BdIzc   1
1394   0084   15   1      0084   1f   a2   00   bd   nv-BdIzc   1
1394   0083   8d   1   STA Abs   0083   1f   a2   00   bd   nv-BdIzc   1
1393   0083   8d   1   STA Abs   0083   1f   a2   00   bd   nv-BdIzc   1
1393   0083   8d   1      0083   1f   a2   00   bd   nv-BdIzc   1


Notice that there is an entire NOP that happens in between the write to $4015 and RDY going low. The question I'm trying to figure out is how to test this on console?
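To put a number on that gap, here is a small Python sketch that counts the trace rows between the last $4015 write row and the first row where c_rdy reads 0. The rows are copied from the trace above in chronological order (oldest first), keeping only the cycle, ab, rw and c_rdy columns. Treating each printed row as one half-cycle is an assumption based on the trace showing two rows per cycle number, and the function name is hypothetical.

```python
# (cycle, ab, rw, c_rdy), oldest first, cycles 1396 to 1403 of the trace
ROWS = [
    (1396, 0x0085, 1, 1),
    (1396, 0x4015, 0, 1),
    (1397, 0x4015, 0, 1),  # last row of the write to $4015
    (1397, 0x0086, 1, 1),
    (1398, 0x0086, 1, 1),
    (1398, 0x0087, 1, 1),
    (1399, 0x0087, 1, 1),
    (1399, 0x0087, 1, 1),
    (1400, 0x0087, 1, 1),
    (1400, 0x0088, 1, 1),
    (1401, 0x0088, 1, 0),  # first row with c_rdy low
    (1401, 0x0088, 1, 0),
    (1402, 0x0088, 1, 0),
    (1402, 0xC000, 1, 0),
    (1403, 0xC000, 1, 0),
    (1403, 0x0088, 1, 1),
]


def rows_between_write_and_rdy_low(rows, write_addr=0x4015):
    """Count rows strictly between the last write row and the first c_rdy=0 row after it."""
    last_write = max(i for i, (_, ab, rw, _) in enumerate(rows)
                     if ab == write_addr and rw == 0)
    first_low = next(i for i, (_, _, _, rdy) in enumerate(rows)
                     if i > last_write and rdy == 0)
    return first_low - last_write - 1


gap = rows_between_write_and_rdy_low(ROWS)
print(gap, "rows, i.e.", gap / 2, "cycles if each row is a half-cycle")
```

This only restates what Visual 2A03 printed; it says nothing about what a console does.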

Maybe with some carefully crafted code you can time it so that there is an observable difference between RDY being pulled low immediately and RDY being pulled low after the apparent extra cycles.

But what? Maybe executing code at $2001 where reading the address will trigger a missed NMI at the right time? Any other ideas? It's not a lot of time to work with.