question about DMA registers

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
question about DMA registers
by on (#100874)
Do the DMA registers get destroyed after a DMA block has been transferred? If they are not destroyed, does the source address register continue where it left off after a DMA block update, or does it still hold the starting address of the last DMA block? Same thing with the DMA length register: does it count down to zero during a DMA block update, or does it still hold the length of the previous DMA block?

I want to know if there is a faster way to upload an individual 16x16 sprite without updating them in groups of 8, or spending additional vblank time resetting the DMA control registers just to get the bottom half of a sprite in.
Re: question about DMA registers
by on (#100877)
I think the parameters for the transfer remain the same until you choose to change them, so it's always going to be the exact same transfer until you alter the registers. I could be wrong; write a test ROM and find out. I see no reason why the DMA registers would be altered by the DMA transfer itself, but it's possible. After all, the NES's VRAM pointer for $2006 is also used in rendering, so the DMA registers could be modified during DMA.

BSNES *probably* should have the behavior correct if you don't have a flash cart or copier to test on real hardware.
Re: question about DMA registers
by on (#101525)
Addresses are incremented (or decremented) during the transfer and contain the new address after the transfer.
Same for the length counter: it contains 0000h after the transfer.
HDMA is a special case: the start address is used as a reload value (and doesn't change).
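To connect this back to the 16x16 sprite question, here is a minimal Python sketch of a general-purpose DMA channel that behaves as described above: the A-bus address advances and the byte counter ends at 0000h. The class and function names are hypothetical, only the increment mode is modeled, the VRAM destination is simplified to a byte address (the real hardware goes through $2116/$2117), and it assumes the sprite's four 8x8 4bpp tiles are stored back to back in ROM (top-left, top-right, bottom-left, bottom-right). Because a counter of 0000h means a 65536-byte transfer, the length still has to be rewritten for the second half, but the source address does not.

```python
TILE_BYTES = 32  # one 8x8 tile at 4bpp


class DmaChannel:
    """Hypothetical model of one general-purpose DMA channel (A-bus -> VRAM)."""

    def __init__(self, source, length):
        self.source = source  # stands in for 43x2h/43x3h (A-bus address)
        self.length = length  # stands in for 43x5h/43x6h; 0 means 65536 bytes

    def run(self, rom, vram, dest):
        count = self.length or 0x10000
        for _ in range(count):
            vram[dest] = rom[self.source]
            dest += 1
            # Increment mode only; the bank byte is not touched, so wrap at 16 bits.
            self.source = (self.source + 1) & 0xFFFF
            self.length = (self.length - 1) & 0xFFFF  # ends at 0000h
        return dest


def upload_sprite(rom, vram, tile):
    """Upload one 16x16 sprite stored as four consecutive 8x8 tiles."""
    ch = DmaChannel(source=0, length=2 * TILE_BYTES)
    ch.run(rom, vram, tile * TILE_BYTES)         # top half -> tiles N and N+1
    # The source already points at the bottom half; only the length (now 0000h)
    # and the VRAM destination have to be written again.
    ch.length = 2 * TILE_BYTES
    ch.run(rom, vram, (tile + 16) * TILE_BYTES)  # bottom half -> tiles N+16 and N+17
    return ch
```

Sprite tiles sit in a 16-tile-wide grid, which is why the bottom half goes to tile N+16 rather than N+2.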
Re: question about DMA registers
by on (#101536)
MottZilla wrote:
BSNES *probably* should have the behavior correct if you don't have a flash cart or copier to test on real hardware.


To give you an idea of how thoroughly I've tested things ...

I found a glitch where, if the last HDMA indirect channel completes (i.e. no other HDMA channel is scheduled to run after it), the indirect address is updated to the low byte of the loaded address << 8, and the final read does not occur, so it finishes eight clock cycles sooner.
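Read literally, the glitch above can be sketched as follows. This illustrates the described behavior and is not bsnes's code; the function name and parameters are hypothetical, and it assumes each table-byte read costs 8 master clocks, which is where the "eight clock cycles" would come from.

```python
def load_indirect_address(read, hdma_addr, completed, active_channel_after):
    """Sketch of the indirect-address fetch after a new line counter is loaded.

    read: returns the byte at an HDMA table address
    completed: True if the line counter just loaded was 00h
    active_channel_after: True if a later HDMA channel is still scheduled
    Returns (indirect_addr, new_hdma_addr, clocks_spent).
    """
    indirect = read(hdma_addr) << 8  # first (low) byte lands in the high half
    hdma_addr += 1
    clocks = 8
    if not completed or active_channel_after:
        # Normal case: shift the low byte down and fetch the high byte.
        indirect = (indirect >> 8) | (read(hdma_addr) << 8)
        hdma_addr += 1
        clocks += 8
    # Glitch case: last channel just completed, so the second read is skipped
    # and the indirect address is left as low_byte << 8.
    return indirect & 0xFFFF, hdma_addr, clocks
```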

So yeah, it's well tested =)
Re: question about DMA registers
by on (#101566)
Well then I was right to believe you would have tested such behavior. Curious, are you aware of any games that rely on either of these DMA behaviors?
Re: question about DMA registers
by on (#101582)
I have myself relied on SNES DMA regs being updated as it runs. That's pretty common.

No game intentionally relies on the HDMA reload issue to my knowledge, but that tiny difference once per frame ended up throwing off the main menu in Circuit USA somehow.

What I'd really like to know more about is the DMA glitch on 1/1/1 SNES units. When a DMA starts near the end of an HDMA, or vice versa, the entire system crashes. I want to emulate it, but it's hard to test what's happening when the entire system crashes on you :P
Re: question about DMA registers
by on (#101593)
Emulating a system crashing hardware glitch... that sounds like all sorts of fun >.<
Re: question about DMA registers
by on (#101599)
Not too much different from stopping emulation on the STP opcode on the 65816 or any of the twelve STP opcodes (whose low nibble is $2) on the 6502.
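For comparison, a halt check of that kind might look like this. STP is $DB on the 65816; on the NMOS 6502 the twelve halting opcodes (often called KIL or JAM) are the low-nibble-$2 opcodes other than $82, $A2, $C2 and $E2. The `should_halt` helper is hypothetical.

```python
# 65816: STP is a single opcode.
STP_65816 = 0xDB

# NMOS 6502: the twelve halting ("KIL"/"JAM") opcodes with low nibble $2.
# $A2 is LDX #imm, and $82, $C2 and $E2 act as two-byte NOPs, so they are excluded.
JAM_6502 = frozenset(op for op in range(0x02, 0x100, 0x10)
                     if op not in (0x82, 0xA2, 0xC2, 0xE2))


def should_halt(cpu, opcode):
    """Return True if fetching `opcode` should stop emulation of `cpu`."""
    if cpu == "65816":
        return opcode == STP_65816
    if cpu == "6502":
        return opcode in JAM_6502
    raise ValueError(f"unknown CPU: {cpu}")
```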
Re: question about DMA registers
by on (#101608)
Far, far, far easier to emulate STP. And STP is useful for test ROMs. Less code than sei; stz $4210; stz $4211; bra $fe =)

This doesn't just freeze the CPU; I suspect what's happening is that the program counter is being corrupted. Sometimes really weird stuff happens, like the screen filling with gibberish and random sound effects starting to play. It's ... something to behold.
Re: question about DMA registers
by on (#101632)
If the PC is being corrupted and you can make it crash where you want, maybe the PC corruption is just causing it to be misaligned with the opcodes? Maybe stick some NOPs after where it would crash and see if you can make it recover? Just a random thought, though. I've never heard much about the DMA glitch other than that it existed on the original version of the hardware.
Re: question about DMA registers
by on (#101638)
How about this:

  • Fill as much of the memory space as you can with WAI opcodes.
  • At the first scanline of the active display period, set up the GP DMA / HDMA combo that will trigger the crash, and hope that whatever weird value PC is changed to will hit one of the WAIs.
  • Once the next vblank occurs and your NMI handler is called, fetch the return address from the stack and log it somehow (send it through a debug interface, print it to the screen, ...).

?
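As a sketch of the logging step, assuming the CPU is in native mode when the NMI arrives: the 65816 pushes PBR, PCH, PCL and P in that order, so inside the handler the return address sits at S+2..S+4. Since the pushed PC points just past the one-byte WAI, the WAI where the corrupted PC landed is one byte earlier. The function name and the RAM-snapshot argument are hypothetical.

```python
def nmi_landing_address(ram, s):
    """Locate the WAI that caught a corrupted PC, from a native-mode NMI frame.

    ram: snapshot of bank $00 memory covering the stack (hypothetical dump)
    s:   stack pointer value on entry to the NMI handler
    """
    # Native mode pushes PBR, PCH, PCL, P; the stack grows downwards, so
    # P is at S+1, PCL at S+2, PCH at S+3 and PBR at S+4.
    pc = ram[s + 2] | (ram[s + 3] << 8)
    pbr = ram[s + 4]
    # WAI is a one-byte opcode and the pushed PC points just past it;
    # PC wraps within its bank, so mask to 16 bits.
    return (pbr << 16) | ((pc - 1) & 0xFFFF)
```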
Re: question about DMA registers
by on (#101697)
Re: WAI method: why not just fill memory with $00 (BRK) and install a BRK/IRQ handler in which you pull the originating address off the stack and save it somewhere in DP or RAM, then in NMI draw the 16-bit address stored in DP/RAM?
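The practical difference between the two fills is how far the pushed return address overshoots the trapping opcode: BRK is treated as a two-byte instruction (opcode plus signature byte), so its return address is two bytes past it, versus one byte for WAI. A minimal sketch with hypothetical helper names, assuming 24-bit addresses whose bank byte does not change when PC wraps:

```python
WAI, BRK = 0xCB, 0x00

# How far past the trapping opcode the pushed return address points.
RETURN_OVERSHOOT = {
    WAI: 1,  # single-byte opcode
    BRK: 2,  # opcode plus signature byte (also $00 in a zero-filled region)
}


def filler(size, opcode):
    """Hypothetical helper: a region of `size` bytes filled with one trap opcode."""
    return bytes([opcode]) * size


def landing_address(return_addr, opcode):
    """24-bit address of the trap opcode, given the 24-bit pushed return address."""
    bank = return_addr & 0xFF0000
    return bank | ((return_addr - RETURN_OVERSHOOT[opcode]) & 0xFFFF)
```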
Re: question about DMA registers
by on (#101699)
Same but different. As long as the amount of non-WAI/BRK code is kept to a minimum to reduce the chance of hitting anything other than one of those instructions when PC gets corrupted (if that's what's happening). This obviously won't work if PC ends up somewhere in I/O space :P
Address bus conflict?
by on (#101700)
It's not a bus conflict between the DMA unit's address output and the CPU's address output, is it?
Re: question about DMA registers
by on (#101718)
Lots of great ideas, anyone here want to test them out on hardware and see? :D

I'm honestly pretty swamped on my end. Dumping more than 720 carts, databasing all the board layouts, hacking Far East of Eden Zero, trying to figure out a hires blending edge case in Marvelous, building an SNES expansion port connector, on and on.
Re: question about DMA registers
by on (#102432)
I am having two problems with HDMAs started mid-frame. Both are probably related to the "do_transfer" flag (as it is called in the anomie docs).

The first problem is that Super Ghouls & Ghosts isn't working in the current no$sns version. The anomie doc isn't too clear about if/when/how the do_transfer flags are initialized for HDMAs that are started mid-frame.
I got the game working by initializing the "do_transfer" flags of all disabled channels to zero in line 0.
That way the game works, but I am not sure if it's a correct reproduction of what the hardware does.

Alongside that, I've been writing a test program that sets up two HDMA channels, both with these values:
420Ch=00h ;stop all HDMA channels
43x0h=02h ;transfer two bytes to [bbus+0], and [bbus+0]
43x1h=88h ;dummy bbus destination address (unused port 2188h)
43x4h=abus.src.bank
43x8h=abus.src.offs.lo
43x9h=abus.src.offs.hi
43xAh=01h ;remain count
abus.src points to "02h,77h,55h,00h" (repeat/pause 2 scanlines (02h), transfer one data unit (77h,55h), and after the pause, finish the transfer (00h)).
Then I start the first of the two channels in line 128 and watch the src/remain values in 43x8h..43xAh, which behave as expected (src increases, and remain decreases from 02h down to 00h).
A few scanlines later, I start the second HDMA channel, which should do the same thing, but doesn't. Instead, it decreases the remain count in 43xAh from 55h downwards... Is that a known effect?

The second channel apparently starts with "do_transfer=1" (for whatever reason), causing it to transfer 02h,77h as data, and then fetch 55h as the repeat count for the next scanlines.
This happens only when starting the HDMA channels one after another; it doesn't happen when I start BOTH channels in scanline 128 (then both decrease remain from 02h down to 00h).
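The symptom can be reproduced with a minimal model of a direct-mode HDMA channel that follows the per-scanline steps as usually described in anomie's docs: transfer if do_transfer is set, decrement the line counter, take do_transfer from the repeat bit, and load a new table entry when the low 7 bits reach zero. The function is hypothetical, and reads past the end of the 4-byte table are assumed to return 00h. Starting with do_transfer=0 gives the expected 02h, 01h, 00h sequence, while do_transfer=1 transfers 02h,77h and then loads 55h as the line counter, matching the observation above.

```python
def run_hdma_channel(table, line_counter, do_transfer, unit_size=2, scanlines=4):
    """Trace one direct-mode HDMA channel, one entry per scanline.

    table: bytes the table address (43x8h/43x9h) points at
    line_counter: initial 43xAh value
    do_transfer: initial state of anomie's do_transfer flag
    Returns a list of (bytes transferred, 43xAh after the line).
    """
    def read(i):
        return table[i] if i < len(table) else 0  # assumption: past the table reads 00h

    addr = 0
    trace = []
    for _ in range(scanlines):
        sent = b""
        if do_transfer:
            sent = bytes(read(addr + k) for k in range(unit_size))
            addr += unit_size
        line_counter = (line_counter - 1) & 0xFF
        do_transfer = bool(line_counter & 0x80)  # repeat flag
        if line_counter & 0x7F == 0:
            line_counter = read(addr)            # next table entry
            addr += 1
            do_transfer = line_counter != 0
        trace.append((sent, line_counter))
        if line_counter == 0:
            break                                # channel terminated
    return trace
```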
Re: question about DMA registers
by on (#102435)
What anomie calls do_transfer is cleared at the start of every frame, regardless of whether HDMA is enabled or not. Similarly, the completed flag is cleared.

After that, only if HDMA is enabled, you stop any active DMA on a channel that has HDMA enabled, copy the source address to the HDMA address, and then reload the line counter (and thus the indirect address, if required). Since I'm mentioning it, keep in mind that any HDMA during DMA will kill that DMA channel mid-progress. That'll fix a few bugs for you (I think in Bugs Bunny and some football game).
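The frame-start sequence described above could be sketched like this. The field names loosely follow anomie's terms, but the data layout is hypothetical, and setting do_transfer after the reload (unless the new line counter is 00h) is an assumption based on the usual description of HDMA init rather than something stated in this post.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    hdma_enabled: bool = False  # this channel's bit in 420Ch
    dma_active: bool = False    # a general-purpose DMA is running on it
    indirect: bool = False      # 43x0h bit 6
    source_addr: int = 0        # 43x2h/43x3h
    hdma_addr: int = 0          # 43x8h/43x9h
    indirect_addr: int = 0      # 43x5h/43x6h
    line_counter: int = 0       # 43xAh
    do_transfer: bool = False
    completed: bool = False


def hdma_frame_init(channels, read):
    """Frame-start HDMA setup; read(addr) returns a table byte."""
    for ch in channels:            # every frame, whether HDMA is on or not
        ch.do_transfer = False
        ch.completed = False
    for ch in channels:
        if not ch.hdma_enabled:    # with no channel enabled, nothing more happens
            continue
        ch.dma_active = False      # HDMA kills a GP DMA on the same channel
        ch.hdma_addr = ch.source_addr
        ch.line_counter = read(ch.hdma_addr)
        ch.hdma_addr += 1
        ch.completed = ch.line_counter == 0
        ch.do_transfer = not ch.completed  # assumption, see lead-in
        if ch.indirect:
            lo = read(ch.hdma_addr)
            hi = read(ch.hdma_addr + 1)
            ch.indirect_addr = lo | (hi << 8)
            ch.hdma_addr += 2
```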

Technically, this process happens at the first cycle edge at or after V=0, H=12+8-dma_counter() for CPU revision 1, and V=0, H=12+dma_counter() for CPU revision 2, where dma_counter() = (total clocks executed since power-on) & 7 (or & 6, if you want).
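The timing formula as a small function. H is assumed to be counted in master clocks (as bsnes's hcounter is), and the function name is hypothetical.

```python
def hdma_init_hcounter(cpu_revision, clocks_since_power_on):
    """Earliest H position on V=0 (assumed master clocks) where HDMA init runs."""
    dma_counter = clocks_since_power_on & 7  # or & 6, per the post
    if cpu_revision == 1:
        return 12 + 8 - dma_counter
    if cpu_revision == 2:
        return 12 + dma_counter
    raise ValueError("only CPU revisions 1 and 2 are known")
```

So revision 1 lands somewhere in H=13..20 and revision 2 in H=12..19, depending on the power-on phase.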

Or, if you want, here's the implementation. I tried to follow anomie's names where possible, but I also emulate some stuff he never knew about:
http://gitorious.org/bsnes/bsnes/blobs/ ... ma/dma.cpp
http://gitorious.org/bsnes/bsnes/blobs/ ... timing.cpp

Not to derail, but did you ever get a chance to look at the ST018? I could never figure out what that one port was that looked like a timer (it was writing 21.47MHz/3 to it.)
Re: question about DMA registers
by on (#102440)
> What anomie calls do_transfer is cleared at the start of every frame,
> regardless of whether HDMA is enabled or not. Similarly, the completed flag is cleared.
Okay, then the bugfix for Ghouls & Ghosts wasn't so wrong.

> keep in mind that any HDMA during DMA will kill that DMA channel mid-progress.
> That'll fix a few bugs for you (I think in Bugs Bunny and some football game.)
Good to know, thanks! I've been a bit lazy there and hoped no game would rely on such things.

I've just tested bsnes for the first time (on a WinXP computer via remote desktop, hoping that it'd be able to run bsnes... it worked... but it took me an hour to figure out how to load a rom-image into it. In short: the browse-file GUI turned out to be misleading; it didn't do anything except occasionally crash. Loading ROMs via the command line worked... at least if they are padded to min 32kbytes, else nothing happens). And I thought no$sns was a bit old-fashioned, and the more modern emus would work by click-and-play :-)

Anyway, I got the test program running, and bsnes didn't reproduce the HDMA glitch either (i.e. it didn't fetch the 55h data byte as the reload/counter value). Looks as if there is at least one secret still hiding in the console.

> Not to derail, but did you ever get a chance to look at the ST018?
Don't know when I get around to extract my ARM assembler/disassembler/emulator from no$gba for use in no$sns (whenever that happens, I'll have a look at the st018 bios and ioports & let you know if I find out anything new).
Re: question about DMA registers
by on (#102447)
> it took me an hour to figure out how to load a rom-image into it

Don't worry, the next release is going to feature an external DLL to load files, and it'll accept files in the format you're used to.
I presume that will make koitsu very happy as well.

> at least if they are padded to min 32kbytes, else nothing happens

Not having enough ROM data to describe a reset vector isn't very fun :P

> And I thought no$sns was a bit old-fashioned, and the more modern emus would work by click-and-play :-)

I do things the more correct way, rather than the easiest way. I'm not in it for the popularity.
But yeah, the external plugin works nicely. Lets me put all the legacy code somewhere outside the emulator, which is something I can tolerate.

> Looks as if there is at least one secret still hiding in the console.

I'm skeptical, but do let me know when you figure it out, I'll confirm your findings on the five or so decks I have here.
For what it's worth, there are a lot more than one secret remaining.

CPU:
DMA/HDMA crash on CPU revision 1
If you start a DIV during a MUL, or vice versa, the results are psychotic
Initial values in WRAM are off the wall crazy; they vary per system and per reset ... approximating it to some degree would be good
On the very first frame only, the VIRQ value is off by one scanline
What is the exact timing involved in auto joypad polling? I only have an approximation because it is hard to test with manual input.
Need to wire up an Arduino to send deterministic data to it or something.

SMP:
The TEST register has two mystery bits that make the timers go crazy.

DSP:
When you mute a channel, it fades out and does not silence immediately

PPU:
We barely understand this at all. Almost no data on cycle timings here.
What if you use the MUL functionality during a Mode7 screen rendering?
What if you toggle a BGMODE in the middle of a scanline? (god help us if it actually gets acknowledged ...)

I wrote some fun tests that wrote text on the screen by changing the registers mid-scanline. But otherwise, nobody's interested in emulating or coding for the PPU like this ... I currently have the only dot-based PPU renderer, and it's as slow as you think it is.

SuperFX:
What happens if the secondary pixel cache fills up and the SFX CPU is accessing RAM? Does it interleave writes, or does one stall the other? If the CPU can stall the secondary pixel cache, you could have a RAM-only program permanently cause it to stay full.

SA-1:
Some registers can only be accessed from the SA1, some from the CPU.
The stuff no games use is not understood: H/VIRQs (how does it time the H/V position?), clock-IRQs, etc.
How in god's name do we emulate the SA-1 memory conflict controller that stalls its CPU when the SNES CPU is reading from ROM/RAM? That's going to require monstrous overhead. We can't afford to step both the CPU and SA-1 one clock tick at a time.

DSP-n/ST01n NEC DSP:
How do OV1 and S1 flags work? Documentation sucks.

Cx4 / Hitachi DSP:
How does the program RAM work? It caches pages, but what's the overhead on that? Are all opcodes one cycle each?
Currently my emulation causes Rockman to die in the intro, unless you set a different frequency rate :(
$70-77:0000-7fff always returns 0 when read. Looks like it has the option to have RAM pass through it, but MMX2/3 didn't use that.
What's with all the regs? $7f52 has to be 1 for MMX3 to read past 1MB; has to be 0 for MMX2 to read past 1MB.

ST018:
What's with the timer value thing?
Cydrak wrote a cool exploit to crush the stack and execute uploaded code out of RAM.

SPC7110:
What do $4808 and $58:0000-ffff do?
What in the hell was the intended usage of the interleave/skip functions on decompression? What are the timing restrictions there?
How do we emulate bad input data causing the decompressor to 'crash' and spit out junk?

SPC7110-RTC:
There are some interesting delays for certain actions; if you start reading too soon, you get crazy stuff happening that's not in the datasheet.

SDD1:
Can DMAs happen on any channel? If not, which bits control activating decompression DMA?
How the hell does the SDD1 know when a DMA is taking place?
What's with the crazy read data when you set bank selections bigger than the ROM size?
How do we emulate this chip crashing on bad input data (like a string of 0x00s for a long time)?

SRTC:
There's a test register we know nothing about.
I haven't mapped out how the BCD works on invalid values. Took me a fucking week to do that for the SPC7110-RTC.

> Don't know when I get around to extract my ARM assembler/disassembler/emulator from no$gba for use in no$sns

Yours may be a bit trickier because I bet you share ARMv4/v5 for your NDS emulation; but I use the same CPU core for the GBA and ST018 (ARMv3), and it works fine. I need to make a separate instruction table to omit the v4-only stuff though.
Re: question about DMA registers
by on (#102455)
byuu wrote:
Not having enough ROM data to describe a reset vector isn't very fun

It doesn't take 32K just to describe a reset vector, at least on the Super NES's predecessor. A 16K ROM on the NROM-128 board is mirrored into both $8000-$BFFF and $C000-$FFFF. One game is so small it's mapped into $8000-$9FFF, $A000-$BFFF, $C000-$DFFF, and $E000-$FFFF, and the most common file format for NES games just accepts that as an overdump. It's made official as of NES 2.0: "double it up and call it a day" says kevtris.
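The "double it up" rule can be sketched as a tiny normalization step for a raw NROM PRG image. This is an illustration that assumes power-of-two PRG sizes and uses a hypothetical function name; it is not an excerpt from any NES 2.0 tool.

```python
def normalize_nrom_prg(prg):
    """Repeat a small NROM PRG image until it is at least 16 KiB long.

    Because the board mirrors the chip across $8000-$FFFF anyway, the
    repeated copies are exactly what the CPU would see.
    """
    size = len(prg)
    if size == 0 or size & (size - 1):
        raise ValueError("expected a power-of-two PRG size")
    while len(prg) < 0x4000:
        prg = prg + prg
    return prg
```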

byuu wrote:
But yeah, the external plugin works nicely. Lets me put all the legacy code somewhere outside the emulator, which is something I can tolerate.

Is it flexible enough to allow multiple games in one file, such as for a plug-in that can load ROMs out of a .zip or .7z?

byuu wrote:
I wrote some fun tests that wrote text on the screen by changing the registers mid-scanline. But otherwise, nobody's interested in emulating or coding for the PPU like this ... I currently have the only dot-based PPU renderer, and it's as slow as you think it is.

I wonder how hard it'd be to port something like blargg's flowing palette demo to the Super NES.
Re: question about DMA registers
by on (#102462)
Quote:
PPU:
We barely understand this at all. Almost no data on cycle timings here.
What if you use the MUL functionality during a Mode7 screen rendering?
What if you toggle a BGMODE in the middle of a scanline? (god help us if it actually gets acknowledged ...)


I've been wondering lately what happens when you change the Mode 7 parameters mid-screen. The best-case scenario would be a split-screen Mode 7 effect, which could be useful for a giant front-facing angel boss with rotating wings.

For changing BGMODE mid-scanline, the best-case scenario would be a split-screen effect, but with a garbage tile in between. If that is the case, then maybe you can disable the BG layer right before the mode switch, switch to Mode 7, enable the BG layer again, let the Mode 7 layer rotate an object, then disable the BG layer, switch modes, and enable it again to continue the background layer. Then use sprites to patch up the rectangular hole in the background layer. This way you can have a BG layer and a Mode 7 layer without creating the entire background out of sprites.
Re: question about DMA registers
by on (#102464)
This would be incredibly cool, but I think the probability such a thing can be done in hardware is like 0.001%

I think it would remain in the current mode until the next scanline.

This should really be tested though.
Re: question about DMA registers
by on (#102466)
Bregalad wrote:
I think it would remain in the current mode until the next scanline.

Wouldn't that violate the hardware engineering rule of thumb to use the minimum number of flip-flops, such as those holding the current mode?
Re: question about DMA registers
by on (#102470)
Well, I don't know. In some cases one or two flip-flops could save several dozen gates.
The NES PPU is very "combinational" in its design; the C64 chip, however, is exactly the opposite. It is very "sequential", as you can fool it into thinking it has done something when it hasn't, or vice versa (you can force it to re-fetch the color table etc., while the NES fetches colors every scanline).

Anyways we're not going to re-invent the SNES PPU and this should be tested.
Re: question about DMA registers
by on (#102487)
> DMA/HDMA crash on CPU revision 1
I don't think my SNES has that revision... or is revision 1 the first revision (version 2) after the original version (version 1)? My bigger concern is that we don't seem to know which CPU (and PPU) versions do exist at all. That tiny detail should be reverse-engineered before talking about smarter differences between revisions. I know that my SNES has this chipset (with ID values in the right column):
Code:
 Board:     (C) 1992 Nintendo, SNSP-CPU-01        ;BOARD
 U1  100pin Nintendo, S-CPU A, 5A22-02, 2FF 7S    ;CPU  ;ID=2 in 4210h
 U2  100pin Nintendo, S-PPU1, 5C77-01, 2EU 64     ;PPU1 ;ID=1 in 213Eh
 U3  100pin Nintendo, S-PPU2 B, 5C78-03, 2EV 7G   ;PPU2 ;ID=3 in 213Fh

There seem to be different chip versions, and even a cost-down version with CPU+PPUs in one chip. The thing I'd like to see would be chip names & ID values typed-up from such consoles (or without typing-up: photos of the mainboard, bundled with screenshots of the ID values).

> PPU: What if you use the MUL functionality during a Mode7 screen rendering?
MUL writes are just treated as rotation/scaling parameters, and MUL reads work as described here: http://nocash.emubase.de/fullsnes.htm#s ... ionscaling (in the "M7A/M7B Port Notes" section).
The funny thing is that the PPU is actually doing 680 insane-fast multiplications per scanline (normally one would need 8 multiplications for the first pixel, and then add horizontal offsets for the following pixels).
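To illustrate why per-pixel multiplication matters for mid-scanline changes, here is a simplified Python comparison of both approaches, using the usual Mode 7 transform X = A*(x+H-CX) + B*(y+V-CY) + CX*256 (likewise for Y with C and D) in 8.8 fixed point. It ignores the real PPU's register widths, clipping and wrap-around, and the function names are hypothetical. With constant parameters both produce identical coordinates; only the per-pixel version would pick up a parameter change partway through a line.

```python
def mode7_coords_direct(a, b, c, d, cx, cy, hofs, vofs, sy, width=256):
    """Full matrix multiply for every pixel, as the PPU reportedly does."""
    coords = []
    dy = sy + vofs - cy
    for sx in range(width):
        dx = sx + hofs - cx
        x = a * dx + b * dy + (cx << 8)
        y = c * dx + d * dy + (cy << 8)
        coords.append((x >> 8, y >> 8))  # drop the 8 fractional bits
    return coords


def mode7_coords_incremental(a, b, c, d, cx, cy, hofs, vofs, sy, width=256):
    """Multiply once for the first pixel, then add A and C for each step right."""
    dx = hofs - cx
    dy = sy + vofs - cy
    x = a * dx + b * dy + (cx << 8)
    y = c * dx + d * dy + (cy << 8)
    coords = []
    for _ in range(width):
        coords.append((x >> 8, y >> 8))
        x += a
        y += c
    return coords
```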
Re: question about DMA registers
by on (#102489)
nocash wrote:
My bigger concern is that we don't seem to know which CPU (and PPU) versions do exist at all. That tiny detail should be reverse-engineered before talking about smarter differences between revisions.

In The Lion King, if you push B Button, A Button, R Button, R Button, Y Button (BARRY), it'll tell you what the IDs are.

Quote:
The funny thing is that the PPU is actually doing 680 insane-fast multiplications per scanline (normally one would need 8 multiplications for the first pixel, and then add horizontal offsets for the following pixels).

Patent workaround perhaps?
Re: question about DMA registers
by on (#102495)
> It doesn't take 32K just to describe a reset vector, at least on the Super NES's predecessor.

Sure, and technically if you use a board manifest with my emulator you can do it with less than 32K as well. You can even map RAM there instead.

But the heuristics are a cheap hack to get commercial software working, and all commercial software either has 32K or 64K banks, so nobody scans for headers at 16K, etc.

My guess is that nocash is assuming a reset vector of $8000 if a ROM is <32K, which isn't how the hardware would work, but I guess is nice if you don't want to pad the test ROM for some reason.

> Is it flexible enough to allow multiple games in one file, such as for a plug-in that can load ROMs out of a .zip or .7z?

Yes, I used to do that too with the old bsnes/Qt version of this concept (snesloader)
It was fun eating up 600MB of RAM and ten seconds to load Super Mario World from the GoodMerge set (997 hacks in one archive. Not exaggerating that number, it's an exact value.) [and that speed/RAM usage was with the official 7zip library code, as used in fex.] {by the way, fex is fucking fantastic if you've never tried it.}
A lot of people loved that feature, too.

> I wonder how hard it'd be to port something like blargg's flowing palette demo to the Super NES.

Due to the DRAM refresh in the middle of each rendering scanline, it would have a sharp bar of solid colors for ~10-15 pixels. But aside from that, it would only be nominally harder (have to add in variable memory access speed and penalty cycles.) Good news is that it should work in bsnes/accuracy, too.

I used the display brightness register to write text with my demo ROMs. It was nowhere near as visually pleasing as blargg's example.

> I think it would remain in the current mode until the next scanline.

Likewise. Or worse, it will be like turning the display on and off. It'll just fuck the graphics up royally for several pixels, and then recover.

> U1 100pin Nintendo, S-CPU A, 5A22-02, 2FF 7S ;CPU ;ID=2 in 4210h

That's a revision 2. Your CPU is immune to the DMA/HDMA crash.

> My bigger concern is that we don't seem to know which CPU (and PPU) versions do exist at all.

CPU has revision 1 & 2.
PPU1 has revision 1.
PPU2 has revision 1, 2 & 3. Revision 2 is hauntingly rare.

Known models:
CPU/PPU1/PPU2
1/1/1 (uncommon)
2/1/1 (rarest)
2/1/2 (really rare)
2/1/3 (common as dirt)
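A sketch of turning the three ID registers into the model numbers listed above. It assumes the version number sits in the low nibble of $4210 (CPU), $213E (PPU1) and $213F (PPU2), with the upper bits masked off as flags or open bus; the function names are hypothetical and the rarity labels are simply the ones from this post.

```python
KNOWN_MODELS = {
    (1, 1, 1): "uncommon",
    (2, 1, 1): "rarest",
    (2, 1, 2): "really rare",
    (2, 1, 3): "common as dirt",
}


def chip_versions(rdnmi, stat77, stat78):
    """Version numbers from the low nibble of $4210, $213E and $213F."""
    return (rdnmi & 0x0F, stat77 & 0x0F, stat78 & 0x0F)


def describe(rdnmi, stat77, stat78):
    versions = chip_versions(rdnmi, stat77, stat78)
    rarity = KNOWN_MODELS.get(versions, "unknown combination")
    dma_crash_prone = versions[0] == 1  # only CPU revision 1 is said to crash
    return versions, rarity, dma_crash_prone
```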

Once Nintendo moved to the one-chip design, they stopped updating the revisions, but still changed things.
The SNES Jr, for instance, has different SMP timer behavior (no glitching), and the PPU mid scanline effects still work, but seem to not work as well? Like, you lower the brightness from max to full black, yet you see onscreen a light gray color. WRAM/APURAM initialization patterns are totally different each time, too.

It's my personal opinion that the SNES Jr is an official clone (redesign) of the original system.

I've not found any differences between the PPU2 revisions. All the bugs I know of (X=256 priority issue, half-height on OAM size 6 interlace mode, EXTBG BG2 using the wrong scroll offset in one direction, etc) still exist.

> The funny thing is that the PPU is actually doing 680 insane-fast multiplications per scanline (normally one would need 8 multiplications for the first pixel, and then add horizontal offsets for the following pixels).

Wow, so they do reload and remultiply for every pixel? In that case, you could change the values mid-scanline.

> Patent workaround perhaps?

'604: "A method for adding to a number after having multiplied it."

Sadly, I could see the US patent office granting that.
Re: question about DMA registers
by on (#102554)
> My guess is that nocash is assuming a reset vector of $8000 if a ROM is <32K
No, at FFFC, as usual, with ROMs of 1K, 2K, 4K, 8K or 16K size being mirrored within the 32K area. I thought that'd be obvious. Now I am wondering if there has ever been something like a 1K compo for the SNES.
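A minimal sketch of that mapping, assuming bank $00 $8000-$FFFF and power-of-two ROM sizes up to 32K, so the reset vector at $FFFC/$FFFD always lands in the last bytes of the image. The function names are hypothetical.

```python
def mirrored_offset(addr16, rom_size):
    """ROM offset for a bank $00 address in $8000-$FFFF when a small,
    power-of-two ROM is mirrored across the 32 KiB window."""
    return (addr16 & 0x7FFF) & (rom_size - 1)


def reset_vector(rom):
    """Read the little-endian reset vector at $FFFC/$FFFD."""
    size = len(rom)
    if size == 0 or size & (size - 1) or size > 0x8000:
        raise ValueError("expected a power-of-two ROM of at most 32 KiB")
    lo = rom[mirrored_offset(0xFFFC, size)]
    hi = rom[mirrored_offset(0xFFFD, size)]
    return lo | (hi << 8)
```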

> the PPU mid scanline effects still work, but seem to not work as well?
> Like, you lower the brightness from max to full black, yet you see onscreen a light gray color.
You mean [2100h]=00h doesn't act as black (or even dark gray), and instead produces light gray?
Or that there are a few gray pixels displayed at the time of writing any value to 2100h?

> Known models:
> CPU/PPU1/PPU2
> 1/1/1 (uncommon)
> 2/1/1 (rarest)
> 2/1/2 (really rare)
> 2/1/3 (common as dirt)
Okay, and the newer stuff, including the cost-down single-chip version, returns 2/1/3 too? Then I'll take a guess:

CPU Versions
CPU.ID=1 100pin Nintendo, S-CPU, 5A22-01 (CPU) http://www.chipdb.org/img-nintendo-s-cpu-snes-5274.htm
CPU.ID=2 100pin Nintendo, S-CPU A, 5A22-02 (CPU) as found in my own SNES
CPU.ID=2 100pin Nintendo, S-CPU B, 5A22-02 (CPU) http://www.snescentral.com/article.php?id=1017
CPU.ID=2 160pin Nintendo, S-CPUN A, RF5A122 (CPU, PPU1, PPU2, S-CLK)

PPU1 Versions
PPU1.ID=1 100pin Nintendo, S-PPU1, 5C77-01 (PPU1) as found in my own SNES
PPU1.ID=1 160pin Nintendo, S-CPUN A, RF5A122 (CPU, PPU1, PPU2, S-CLK)

PPU2 Versions
PPU2.ID=1 100pin Nintendo, S-PPU2?, 5C78-01? (rarely mentioned in internet)
PPU2.ID=2 100pin Nintendo, S-PPU2 A?, 5C78-02?? (never mentioned in internet)
PPU2.ID=3 100pin Nintendo, S-PPU2 B, 5C78-03 (PPU2) as found in my own SNES
PPU2.ID=3 100pin Nintendo, S-PPU2 C, 5C78-03 (PPU2) http://www.snescentral.com/article.php?id=1017
PPU2.ID=3 160pin Nintendo, S-CPUN A, RF5A122 (CPU, PPU1, PPU2, S-CLK)

Could that be correct?
The "S-CPUN A" chip name suggests that there might have also been a "S-CPUN" (without "A")?