When to use DMA and when not to?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
When to use DMA and when not to?
by on (#170059)
While trying to do a simple DMA to upload an 8x8 ball graphic, I thought of something: when would you choose to DMA to VRAM, and when would you not? It occurred to me that updating a tilemap horizontally should be faster than vertically, because you're just DMAing a single line of data, whereas vertically, you're doing 4 bytes, then not doing anything for 128 bytes, 4 more bytes, skipping an additional 128 bytes, so on. That is, unless there's somehow a way to write 4 (or 1 or 2 bytes, just use multiple DMA channels) bytes and then skip 128 via DMA. I also thought that updating a BG for a score counter or something like that might be slower with DMA than without it.

Now, it's time for me to freeload off you guys. :lol: Do you see anything wrong with this? As I said, I'm trying to upload one 4bpp tile to the very beginning of VRAM. I haven't had much experience with DMA because macros took care of it, but I'm trying to stay away from those now. I didn't write the comments myself; they come from old code I didn't write, and I arranged everything according to it. A is 8 bit, while X and Y are 16 bit.
Code:
  lda #$F0      ;Increment VRAM address by 1 after write to $2119
  sta $2115
  ldx #$0000
  stx $2116      ;$2116: Word address for accessing VRAM

  ldx #.LOWORD(BallTile)
  stx $4302      ;Store Data offset into DMA source offset
  lda #.BANKBYTE(BallTile)
  sta $4304      ;Store data Bank into DMA source bank
  ldx #$20
  stx $4305      ;Store size of data block

  lda #$01
  sta $4300   ;Set DMA mode (word, normal increment)
  lda #$18   ;Set the destination register (VRAM write register)
  sta $4301
  lda #$01   ;Initiate DMA transfer (bit 0 of $420B = channel 0, the $430x registers)
  sta $420B

Just so you know, I looked at a manual detailing what each register does before asking for help. :roll:
Re: When to use DMA and when not to?
by on (#170060)
It looks like it should work, except "ldx #$20" might assemble wrong because it looks like an 8-bit value.
Re: When to use DMA and when not to?
by on (#170062)
Well, half of the problem, I noticed, is that I enabled NMI before this code gets run. :lol: I fixed that, and I also put two 0's in front of the $20, and it still doesn't work.
Re: When to use DMA and when not to?
by on (#170065)
Keep in mind that this code either needs to run during VBlank, or you need to have forced blanking on.
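
As a rough sketch of the forced-blank option (not taken from the original poster's code), the upload can be bracketed with writes to INIDISP ($2100). Bit 7 turns forced blanking on and the low four bits set the brightness. The final value of $0F (full brightness) is only an assumption about what the program wants afterward, and A is assumed to be 8-bit as in the first post.

```asm
.p816
.a8                       ; assumption: 8-bit accumulator
  lda #$80                ; INIDISP: bit 7 set = forced blank on
  sta $2100

  ; ... set VMAIN and the VRAM address, then run the DMA here ...

  lda #$0F                ; forced blank off, brightness 15 (assumed)
  sta $2100
```

Once the screen is on, the same upload would instead have to happen inside VBlank, typically from the NMI handler.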
Re: When to use DMA and when not to?
by on (#170066)
Espozo wrote:
vertically, you're doing 4 bytes, then not doing anything for 128 bytes, 4 more bytes, skipping an additional 128 bytes, so on. That is, unless there's somehow a way to write 4 (or 1 or 2 bytes, just use multiple DMA channels) bytes and then skip 128 via DMA.

Tilemap rows are 64 bytes (32 words - remember, VRAM uses word addressing), not 128. One tilemap entry is two bytes, not 4. Mode 7 tilemap rows are 128 bytes (one byte per tile), and since they're interleaved with tile data they span 128 words.

There is indeed a way to DMA a tilemap column in one shot. Read the VMAIN register description again.
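
For illustration, a column update along those lines might look like the sketch below. It relies on the two low bits of VMAIN ($2115), which select the address step: 00 steps by 1 word and 01 steps by 32 words, which is one row of a 32-entry-wide tilemap. The names `TILEMAP_BASE`, `COLUMN`, and `ColumnBuffer` and their values are hypothetical, A is assumed 8-bit and X/Y 16-bit, and the code still has to run in VBlank or forced blank.

```asm
.p816
.a8
.i16
TILEMAP_BASE = $7800      ; hypothetical VRAM word address of a 32x32 tilemap
COLUMN       = 5          ; hypothetical column to update (0-31)
ColumnBuffer = $7E2000    ; hypothetical WRAM buffer: 32 entries x 2 bytes

  lda #$81                ; VMAIN: step by 32 words, increment after $2119 write
  sta $2115
  ldx #TILEMAP_BASE + COLUMN
  stx $2116               ; VRAM word address of the column's top entry

  lda #$01                ; DMA mode 1: write $2118, then $2119
  sta $4300
  lda #$18                ; B-bus destination $2118
  sta $4301
  ldx #.LOWORD(ColumnBuffer)
  stx $4302               ; source offset
  lda #.BANKBYTE(ColumnBuffer)
  sta $4304               ; source bank
  ldx #$0040              ; 32 entries * 2 bytes = 64 bytes
  stx $4305
  lda #$01                ; start channel 0
  sta $420B
```

Any later code that expects the usual step of 1 would need VMAIN written back to $80 afterward.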
Re: When to use DMA and when not to?
by on (#170106)
Quote:
Read the VMAIN register description again.

Who said I ever read it before? :lol:
Re: When to use DMA and when not to?
by on (#170111)
Espozo wrote:
Quote:
Read the VMAIN register description again.

Who said I ever read it before? :lol:

Yeah. :?

From my personal yet insignificant point of view, you could easily go on like this forever -- i.e., put out random ideas too big for your boots anyway, and dismiss any substantial advice by shallow counterquestions and/or outright defensive behavior. (Don't bother asking for proof -- your >2300 post counter speaks for itself.)

Oh ye good Lord (which I know dost not exist anyway), please let there be brains in Mr Husband. :mrgreen:
Re: When to use DMA and when not to?
by on (#170133)
Espozo wrote:
Quote:
Read the VMAIN register description again.

Who said I ever read it before? :lol:

You did. Last line in the OP.

To be fair, the SNES register list is really quite a lot of information and it's not completely unreasonable to have missed something. Just... realize that for most of us, we learned most of what we know from the same docs you have access to. I've actually never updated a tilemap column in VRAM; I just saw the address increment bits in the description of $2115 and figured that was what they were for...

I don't see anything wrong with your code. If it doesn't work, it may be the fault of the context. Nicole's suggestion is the obvious one, but there may be other possibilities.

...

As for the thread title - basically, use DMA if it's faster, and just write to the data ports manually otherwise. If you're writing more than a few consecutive bytes, it's probably faster to use DMA. Count the cycles if you aren't sure (and you care enough to bother).

As you know, DMA is normally used to send data to VRAM/CGRAM/OAM during VBlank or forced blank (CGRAM is also accessible during HBlank, but it's usually better to use HDMA for that). However, it is also possible to use DMA to move data between ROM/WRAM/SRAM, using the WRAM gate on the B bus ($2180). (No, you cannot do WRAM-to-WRAM transfers this way.) Theoretically it'd be best to do this during active display so you don't limit your VBlank time - but unfortunately the launch-model CPU has a bug that can lock up the system if DMA and HDMA are used at the same time (I think it's when a DMA ends right near where an HDMA starts, but I'm not sure). If you can guarantee that the DMA/HDMA conflict won't be triggered (the easiest way to guarantee this is to not use HDMA), you should be able to use DMA during active display, as long as you don't try to access VRAM/CGRAM/OAM. Otherwise, don't, unless you don't care about rev.1 CPU compatibility...
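
A minimal sketch of the ROM-to-WRAM case, assuming the caveats above are satisfied: the destination is set through the WRAM address registers ($2181-$2183) and the DMA targets the data port $2180. `SourceData`, `DEST`, and `LENGTH` and their values are hypothetical; A is assumed 8-bit and X/Y 16-bit.

```asm
.p816
.a8
.i16
SourceData = $C08000      ; hypothetical ROM address of the data
DEST       = $7F0000      ; hypothetical WRAM destination
LENGTH     = $0400        ; hypothetical size in bytes

  ldx #.LOWORD(DEST)
  stx $2181               ; low 16 bits of the WRAM address
  lda #(.BANKBYTE(DEST) & $01)
  sta $2183               ; bit 0 selects bank $7E (0) or $7F (1)

  stz $4300               ; DMA mode 0: A bus -> B bus, single register
  lda #$80                ; B-bus destination $2180 (WRAM data port)
  sta $4301
  ldx #.LOWORD(SourceData)
  stx $4302               ; source offset
  lda #.BANKBYTE(SourceData)
  sta $4304               ; source bank
  ldx #LENGTH
  stx $4305
  lda #$01                ; start channel 0
  sta $420B
```

The source here is in ROM on purpose; pointing the source at WRAM as well is the case that does not work.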

As for writing to ARAM, you shouldn't DMA to those ports because the SPC700 has to manually pick up the data, and it's even slower than the CPU. If you can work out how to use HDMA to transfer data, great; otherwise it has to be manual.
Re: When to use DMA and when not to?
by on (#170143)
Ramsis wrote:
Yeah. :?

From my personal yet insignificant point of view, you could easily go on like this forever -- i.e., put out random ideas too big for your boots anyway, and dismiss any substantial advice by shallow counterquestions and/or outright defensive behavior. (Don't bother asking for proof -- your >2300 post counter speaks for itself.)

Oh ye good Lord (which I know dost not exist anyway), please let there be brains in Mr Husband. :mrgreen:


I've either completely misinterpreted you or that was the single most inflammatory forum post I've ever seen.

You could have said that in so many other ways. You could have just said 'Read it then' instead of tearing this guy down piece by piece. And why the hell did you pull religion into it? I'm atheist as well but that doesn't mean I go around with a 'fuck believers' shirt on. This is one of the best forums I've ever been on, please don't ruin it just because someone asked a question they might have been able to answer on their own.
Re: When to use DMA and when not to?
by on (#170145)
93143 wrote:
To be fair, the SNES register list is really quite a lot of information and it's not completely unreasonable to have missed something. Just... realize that for most of us, we learned most of what we know from the same docs you have access to.

And a lot of us programmed the NES before the Super NES and recognized analogous registers. For example, VMAIN on the S-PPU is analogous to bit 2 of $2000 on the NES.
Re: When to use DMA and when not to?
by on (#170151)
93143 wrote:
You did. Last line in the OP.

I see the problem: I read this, and it doesn't say VMAIN. https://en.wikibooks.org/wiki/Super_NES ... _Registers

Guilty wrote:
I've either completely misinterpreted you or that was the single most inflammatory forum post I've ever seen.

You read it right. I don't even post on any of his threads, but he insists on commenting on mine despite the fact he isn't here to make me any better of a programmer, just to make derogatory remarks.
Re: When to use DMA and when not to?
by on (#170154)
I'd really recommend http://problemkaputt.de/fullsnes.txt as a general resource, personally. It consolidates a lot of thorough information in one place.

There's also http://problemkaputt.de/fullsnes.htm which has formatting, but seems slightly older, being "extracted from no$sns v1.5" instead of "v1.6".
Re: When to use DMA and when not to?
by on (#170175)
Espozo wrote:
I see the problem: I read this, and it doesn't say VMAIN. https://en.wikibooks.org/wiki/Super_NES ... _Registers

It also does a hilariously bad job of explaining exactly what "0x2115" does...

The reference I usually use for registers is this. It's based on anomie's docs, and it's older than fullsnes (which I use as well) but IMO easier to find stuff in. The rest of the pages on superfamicom.org are good too, as you know, but the Registers page is a really good one-stop cheat sheet.
Re: When to use DMA and when not to?
by on (#170183)
Yeah, I should look at those sources.

Unsurprisingly, it was me being an idiot as to where I put it. It's good now.

If I have a BG scoreboard, I guess I'll copy the numbers or whatever else over in tilemap format from RAM into VRAM. I'm not sure what kind of performance benefit you're getting from something like 16 bytes though, considering how much you have to do to set up DMA.
Re: When to use DMA and when not to?
by on (#170185)
Espozo wrote:
It's good now.

Good.

Quote:
I'm not sure what kind of performance benefit you're getting from something like 16 bytes though, considering how much you have to do to set up DMA.

Add it up.

http://fdwr.tripod.com/docs/65c816.txt (instruction details (Table 9) start about 2/3 of the way through)
http://wiki.superfamicom.org/snes/show/Timing (DMA section about halfway down)
Re: When to use DMA and when not to?
by on (#170186)
Wait, right off the bat:

Quote:
DMA takes 8 master cycles per byte transferred

Isn't this potentially slower than lda, sta, lda, sta, etc? I know it depends on the size of the accumulator and what kind of addressing you're using, but if I recall correctly, with 16 bit addressing, plain old lda takes 2 cycles (3 for 16 bits), and the same for sta, so you could upload a byte in 4 cycles if the accumulator is 8 bit, or upload 2 bytes in 6 cycles if the accumulator is 16 bit. I'm fully aware of how much more limiting this is, but still, for something like FMV, it'd be perfect. Or wait... Is there some sort of other limitation affecting the amount you can upload to VRAM during VBlank?

Now I'm starting to stray off course... :lol:
Re: When to use DMA and when not to?
by on (#170188)
That's master cycles, not CPU cycles. The master clock is 21.477 MHz. Each CPU cycle is 6, 8, or 12 master cycles depending on the memory area accessed and the setting of $420D.

In other words, DMA is one "slow" CPU cycle per byte transferred. Overhead (not counting the instructions necessary to set up the DMA unit and start it) is one slow CPU cycle for the transfer, one for each channel, and 0.25-1 at each end for cycle alignment purposes.

Also, lda is only 2-3 cycles if it's using immediate addressing - that is, the operand is the data. (Needless to say, sta doesn't have an immediate addressing mode...) With simple addressing it's one CPU cycle for the opcode, between one and three for the address, and one or two for the actual data, and the same is true of sta. Indexing and indirect addressing can mix things up a bit.
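
To put rough numbers on this (a back-of-the-envelope sketch, not a measurement), consider an unrolled manual copy with a 16-bit accumulator. The cycle counts in the comments are the nominal 65816 figures for absolute addressing. `Source` is a hypothetical buffer that is assumed to be reachable with absolute addressing, and VMAIN and the VRAM address are assumed to be set up already.

```asm
.p816
.a16
Source = $0200            ; hypothetical low-WRAM buffer address

  lda Source+0            ; 5 CPU cycles (absolute, 16-bit A)
  sta $2118               ; 5 CPU cycles, writes $2118 and $2119
  lda Source+2            ; 5
  sta $2118               ; 5
  ; ... one lda/sta pair per further word ...
```

That is 10 CPU cycles per word, or 5 per byte. At 6 to 8 master cycles per CPU cycle, the manual copy costs on the order of 30 to 40 master cycles per byte, against 8 per byte for DMA. For the 16-byte case the unrolled copy comes to about 80 CPU cycles, while the DMA-specific setup in the first post is roughly 40 CPU cycles plus about 150 master cycles for the transfer and its overhead, so the two land in the same ballpark at that size and DMA pulls ahead quickly as the transfer grows.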
Re: When to use DMA and when not to?
by on (#170190)
In other words, DMA is so fast in comparison to any other method of copying data that there's no point in using anything else for large chunks of data.

As for small chunks, that depends on how much overhead there is. Also, it's worth noting that DMA can't copy between two PPU registers, nor copy between two parts of WRAM at once.

Specifically, DMA must be from the $2100-$21ff registers to somewhere outside of that, or vice versa.

You might get the idea that you could get around this for WRAM by copying between $7e0000-$7fffff and $2180, but it doesn't actually work; the WRAM chip simply can't handle it.