cpow ran some tests with visual2a03 to delve further into DMA behavior, though his notes were a bit scattered for my tastes and left me with a few unanswered questions.
So using his notes as a base, I ran my own tests on visual2a03 to verify them, and to fill in a few of the gaps.
Here are my findings:
I also uploaded the unpolished / scattered notes, which have the results of my tests as well as links to the test programs I ran:
https://www.dropbox.com/s/afvbxers66v9994/dma.txt?dl=0
Code:
==========================================
==========================================
==========================================
General
==========================================
DMA unit alternates between 'get' cycles and 'put' cycles. Values are read on 'get' cycles and written on 'put' cycles. 'get' cycles can never write -- 'put' cycles can read, but discard any value read. DMA unit seems to alternate between get/put even when DMA is not active -- effectively meaning that even cycles are 'get' cycles, and odd cycles are 'put' cycles.
"Dummy reads" ALWAYS seem to be performed from whatever address the CPU will want to read from next -- that is, whatever address will be read from once the DMA is complete.
When the DMA unit needs to cut into the CPU, it begins a 'halt' process. The process appears to be as follows:
1) The 'halt attempt' cycle -- Let the CPU start its next cycle.
    a) If this cycle is a write, perform it normally. Repeat step 1
    b) If this cycle is a read, hijack the read, discard the value, and prevent all other actions that occur on this cycle (PC not incremented, etc).
       Presumably, side-effects from performing the read still occur. Proceed to step 2
2) For DMC DMA ONLY -- do another dummy read, discarding the result.
3) If the DMA unit is currently on a 'put' cycle, do another dummy read ('alignment' cycle)
4) Actually perform the DMA
    a) For DMC, this performs a single read cycle, then returns control to main CPU logic
    b) For OAM, this performs 256 alternating reads/writes as you'd expect
Note that the DMA is effectively delayed while it waits for all CPU write cycles to complete, but this is only a delay and does not alter the length of the DMA.
What DOES alter the length of the DMA is the optional alignment cycle.
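The halt procedure above can be sketched as a small trace generator. This is only my reading of the steps, not something pulled out of visual2a03; the names `is_put`, `trace_dma` and `stolen_cycles` are made up for illustration, and the even = 'get' / odd = 'put' numbering is the assumption stated earlier in these notes.

```python
def is_put(cycle):
    # Assumption taken from the notes: even cycles are 'get', odd are 'put'.
    return cycle % 2 == 1


def trace_dma(kind, first_attempt_cycle, cpu_accesses=""):
    """Return a list of (cycle, label) pairs for one DMA.

    kind                -- "dmc" or "oam"
    first_attempt_cycle -- cycle number of the first halt attempt
    cpu_accesses        -- what the CPU does from that cycle on, as a string
                           of "W" (write) and "R" (read); reads are assumed
                           once the string runs out
    """
    trace = []
    cycle = first_attempt_cycle
    accesses = iter(cpu_accesses)

    # Step 1: CPU writes go through untouched; the first read is hijacked.
    while next(accesses, "R") == "W":
        trace.append((cycle, "cpu write"))
        cycle += 1
    trace.append((cycle, "halt"))
    cycle += 1

    # Step 2: DMC only, one extra dummy read.
    if kind == "dmc":
        trace.append((cycle, "dummy"))
        cycle += 1

    # Step 3: if we are on a 'put' cycle, burn one alignment cycle.
    if is_put(cycle):
        trace.append((cycle, "alignment"))
        cycle += 1

    # Step 4: the DMA itself.
    if kind == "dmc":
        trace.append((cycle, "dma read"))
    else:
        for _ in range(256):
            trace.append((cycle, "dma read"))
            trace.append((cycle + 1, "dma write"))
            cycle += 2
    return trace


def stolen_cycles(trace):
    # CPU writes run normally, so they are not counted as stolen.
    return sum(1 for _, label in trace if label != "cpu write")
```

For example, `trace_dma("dmc", 1)` starts on a 'put' cycle with no pending writes, and `trace_dma("dmc", 1, "W")` starts on a CPU write. For OAM, `first_attempt_cycle` is the cycle right after the $4014 write.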
==========================================
==========================================
==========================================
DMC
==========================================
DMC DMAs appear to try to halt during the 'put' phase -- meaning they will take 4 cycles normally:
1) 'put' - halt
2) 'get' - extra DMC dummy read
3) 'put' - dummy cycle for alignment
4) 'get' - DMA
When DMC halt happens "on a write cycle", this makes it take 3 cycles because alignment can be skipped:
*) 'put' - initial halt attempt -- but if a write cycle, it's delayed
1) 'get' - attempt to halt again -- successful this time because it's a read cycle
2) 'put' - extra DMC dummy read
alignment not needed
3) 'get' - DMA
(note the '*' write cycle is performed normally, and therefore does not count as a stolen cycle, hence DMC only steals 3 cycles here)
However, the DMC will steal 4 cycles if it attempts to halt during the first write of an RMW instruction (INC/DEC/etc):
*) 'put' - halt attempt - fails because CPU is writing (first RMW write)
*) 'get' - halt attempt - fails because CPU is writing (second RMW write)
1) 'put' - halt attempt - successful
2) 'get' - extra DMC dummy read
3) 'put' - alignment
4) 'get' - DMA
The same logic holds for 3 consecutive writes (interrupts/BRK): if the halt attempt is during the 1st or 3rd write, it'll steal 3 cycles, but if it's during the 2nd write, it'll steal 4.
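All of the DMC cases above seem to boil down to a parity check on how many write cycles the CPU still has to get through. Here is a minimal sketch of that arithmetic; `dmc_stolen_cycles` and `writes_pending` are hypothetical names, and it assumes (as the notes do) that the first halt attempt always lands on a 'put' cycle.

```python
def dmc_stolen_cycles(writes_pending):
    """Cycles stolen by one DMC DMA.

    writes_pending -- number of consecutive CPU write cycles still to run,
                      starting with the cycle of the first halt attempt
                      (0 if that cycle is already a read)
    """
    # Assumption from the notes: the first halt attempt is on a 'put' cycle
    # and get/put strictly alternate, so each pending write flips the phase
    # on which the halt finally succeeds.
    halt_lands_on_put = writes_pending % 2 == 0
    # halt + DMC dummy + DMA read, plus alignment only when halting on 'put'.
    return 3 + (1 if halt_lands_on_put else 0)
```

So `writes_pending` is 0 for the normal case, 1 for a plain store or the second RMW write, 2 for the first RMW write, and 3, 2, 1 for the 1st, 2nd and 3rd writes of an interrupt/BRK sequence.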
==========================================
==========================================
==========================================
OAM / 4014
==========================================
OAM DMA behaves similarly, but skips the DMC-only dummy read, meaning OAM DMA will take 513 or 514 cycles (1 halt cycle + 512 DMA cycles, plus 1 if the alignment cycle is needed).
Assuming the write is performed with a STA/STX/STY:
*) $4014 write cycle triggering OAM DMA
1) halt attempt - successful (next cycle is a read for the next opcode, or is an interrupt)
?) possible alignment
2) 'get' - read 1st byte
3) 'put' - write 1st byte
...
Writing to $4014 twice consecutively (INC/DEC/etc) follows the expected logic: both writes are performed, followed by the halt cycle, possible alignment, then 512 cycles of DMA.
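The 513 / 514 split can be written out as arithmetic on the cycle of the last $4014 write. A rough sketch, with `oam_dma_cycles` being a name I made up and the even = 'get' / odd = 'put' numbering again taken as an assumption from the General section:

```python
def oam_dma_cycles(last_write_cycle):
    """Cycles taken by OAM DMA, not counting the $4014 write(s) themselves."""
    halt_cycle = last_write_cycle + 1      # halt attempt right after the write
    next_cycle = halt_cycle + 1            # where the first DMA read would go
    needs_alignment = next_cycle % 2 == 1  # odd = 'put', so wait one cycle
    return 1 + (1 if needs_alignment else 0) + 512
```

In other words, a $4014 write that lands on a 'get' cycle gives 513, and one that lands on a 'put' cycle gives 514. For an RMW instruction, pass the cycle of the second write.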
==========================================
==========================================
==========================================
Both at the same time
==========================================
Things to note:
- DMC DMA trumps OAM DMA
- A DMC halt is considered successful if it happens on an OAM DMA cycle
- Under no circumstances can a DMC DMA cycle immediately follow a successful DMC halt cycle. There must be at least 1 dummy cycle, alignment cycle, or OAM DMA cycle between the halt and the DMA.
Examples:
Cycles marked with '+' are "DMC stolen"
p = must be a 'put' cycle (remember DMC always halts on a put cycle)
g = must be a 'get' cycle
* = normal, unaffected CPU cycle
DMC halts on the $4014 write cycle:
p *) $4014 write - unsuccessful DMC halt
g 1) DMC & OAM halt -- successful
p 2) DMC dummy / alignment   (not a DMC stolen cycle, since this would have to be alignment regardless)
g+3) DMC DMA
p+4) re-alignment
g 5) OAM DMA read 1
p 6) OAM DMA write 1
...
DMC halts 1 cycle after $4014 write:
g *) $4014 write
p 1) DMC & OAM halt - successful
g 2) OAM read 1 (DMC dummy)
p 3) OAM write 1 (DMC alignment)
g+4) DMC
p+5) re-alignment
g 6) OAM read 2
p 7) OAM write 2
...
This logic follows for 2 consecutive writes to $4014.
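Both example traces can be reproduced with a very small model. It is a sketch under assumptions, not a description of the hardware: it only covers the case where DMC and OAM share the halt cycle right after the $4014 write, it says nothing about a DMC DMA landing at the very end of OAM DMA, and `combined_trace` and `oam_bytes` are names invented here (`oam_bytes` is shortened from 256 just to keep the trace readable).

```python
def combined_trace(write_cycle, oam_bytes=2):
    """Trace a $4014 write whose following halt cycle is shared by DMC and OAM."""

    def is_get(cycle):
        # Assumption from the notes: even cycles are 'get', odd are 'put'.
        return cycle % 2 == 0

    trace = [(write_cycle, "$4014 write")]
    halt_cycle = write_cycle + 1
    trace.append((halt_cycle, "DMC & OAM halt"))

    # The DMC DMA may not directly follow the halt: at least one other
    # cycle (dummy, alignment or OAM DMA) has to come in between.
    dmc_ready_from = halt_cycle + 2
    dmc_done = False
    oam_left = oam_bytes * 2  # remaining OAM read + write cycles
    cycle = halt_cycle + 1

    while oam_left > 0 or not dmc_done:
        oam_wants_read = oam_left % 2 == 0
        byte_no = (oam_bytes * 2 - oam_left) // 2 + 1
        if not dmc_done and cycle >= dmc_ready_from and is_get(cycle):
            # DMC trumps OAM, but only reads on a 'get' cycle.
            trace.append((cycle, "DMC DMA"))
            dmc_done = True
        elif oam_left > 0 and is_get(cycle) == oam_wants_read:
            # OAM reads on 'get' and writes on 'put'.
            kind = "read" if oam_wants_read else "write"
            trace.append((cycle, "OAM %s %d" % (kind, byte_no)))
            oam_left -= 1
        else:
            trace.append((cycle, "alignment"))
        cycle += 1
    return trace
```

`combined_trace(1)` corresponds to the first example ($4014 write on a 'put' cycle) and `combined_trace(0)` to the second ($4014 write on a 'get' cycle).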
==========================================
==========================================
==========================================
What I was not able to test
==========================================
Visual2a03 gave EXTREMELY weird behavior for OAM DMA. I suspect it needs more warmup time. OAM DMA was fetching from the wrong address, and the address being read from was being mangled by DMC DMAs, which was resulting in 700+ stolen cycles... and would also result in extremely corrupted sprites on a real system. Because of this I was unable to test the following:
1) What happens in the edge case where a DMC DMA occurs at the very end of OAM DMA?
2) If you INC $4014, does it DMA from the pre-incremented value or post-incremented?