I've heard about some kind of crashing that happens when HDMA and DMA are used at the same time. Does this always happen, or can this be avoided somehow?
It can be avoided by not using a 1/1/1 system.
Revenant wrote:
It can be avoided by not using a 1/1/1 system.
You can't really demand that from your customers...
Half-serious answer aside, I'm pretty sure the only easy/reliable way around it is just to avoid doing DMA outside of vblank.
I don't remember personally seeing too many details of the DMA crash being discussed - I want to say it only happens if DMA and HDMA start at exactly the same time, not just overlap, but others could possibly clarify that.
The problem is clearly documented (with full explanations/timing diagrams) in the official SNES documentation. Chapter 25, Section 1 describes the problem. It only applies to SNES units using CPU revision 1. Several workarounds/solutions are available.
Basically: if you really need both to happen at the same time (i.e. you can't fit DMA into vblank), use the HV counter to ensure the DMA is timed at a point where it won't stop at the same time HDMA starts?
Or use an IRQ, or use cycle-counted code from a known start point.
In other words, it's no good for speeding up decompression in the presence of HDMA, because a fast decompressor needs to be free-running; it can't afford to wait around for a safe H-position. I stopped bugging psycopathicteen about HDMA audio streaming in the Bad Apple demo when I realized this.
Real-life example: The PowerPak loads data in chunks of 512 bytes from the CF card using DMA during active display. The original firmware didn't use HDMA at all, so when I implemented the latter for the first time, I had to find a way to avoid the glitch. ikari made the clever suggestion to correctly time the DMA transfers simply by waiting for Hblank before starting them.
So the way it's worked ever since v2.00-beta is roughly like this:
- Scanline N: Wait for Hblank and start DMA around the time HDMA finishes
- Scanline N+1: DMA is still running. When Hblank occurs, it is interrupted once by HDMA
- Scanline N+2: DMA finishes long before Hblank (and therefore, HDMA) occurs
So the glitch is never triggered, no matter the console revision.
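A minimal sketch of the Hblank wait described above, in ca65-style 65816 assembly. This is not the PowerPak firmware's actual code. The label names are made up, and it assumes an 8-bit accumulator, a data bank register pointing at a bank where the $42xx registers are visible ($00-$3F or $80-$BF), and a DMA channel 0 that has already been configured. The first loop (waiting for any Hblank already in progress to end) is an extra precaution that the post does not mention.

```asm
.p816
.a8

HVBJOY = $4212          ; bit 7 = Vblank flag, bit 6 = Hblank flag
MDMAEN = $420B          ; writing a 1 bit starts the matching DMA channel

StartDmaInHblank:
WaitActive:
    bit HVBJOY          ; with an 8-bit accumulator, V := bit 6 (Hblank flag)
    bvs WaitActive      ; already inside an Hblank: wait until it is over
WaitHblank:
    bit HVBJOY
    bvc WaitHblank      ; spin until the next Hblank begins
    lda #$01            ; bit 0 = channel 0
    sta MDMAEN          ; start the transfer shortly after Hblank began
    rts
```

Whether the $420B write really lands after HDMA has finished depends on the polling latency and on how many HDMA channels are active, so it is worth confirming in an emulator that the transfer ends well clear of a later Hblank.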
Either you give up on 1/1/1 users, or you only use DMA during Vblank.
There is unfortunately no other viable option. SNES timing is way too sporadic, and we don't even fully understand exactly what causes the crash yet, nor what actually happens that causes it. The SNES dev manual is too vague about its timing notes to emulate the behavior accurately. I have good theories, but the SNES is very resilient against logical analysis :P
I once came across a US football game that read out the CPU revision, and used DMA on r2, and a block move on r1. But even this was a terrible idea: if the speed is the same, just use the r1 version; and if the speed is different, what a horrible idea that half your users will be complaining of slowdowns. Regular end users have no way of knowing which revision their SNES is: they can't even remove the security screws to find out.
> Basically: if you really need both to happen at the same time (i.e. you can't fit DMA into vblank), use the HV counter to ensure the DMA is timed at a point where it won't stop at the same time HDMA starts?
This is honestly asking too much. There's 1324 usable cycles per scanline, the setup time to trigger the DMA is going to add some cycles to that, the CPU<>DMA and DMA<>CPU syncs add time, there's DRAM refresh in the mix, and you have to add the transfer lengths * 8 cycles per byte, and then you also have to consider if your DMA goes over HDMA transfers (which are even more complex, and can vary timing based on indirect transfers and reloading counters) ... and most likely, DMA transfers are going to be longer than one scanline, so that's most likely certain.
It's way too complex. I would have a ton of trouble implementing that reliably, even using emulation logs to proof everything, and I can't think of anyone who knows more about SNES timing than I do (a couple of people who know as much as I do.)
Pretty much the only way it'd work is if your DMA and HDMA always transferred the same amount of data each time, and your HDMA didn't have weird indirect fetching going on, and you could align the DMA start to a fixed point (whether via $4212's Hblank status, or a timed IRQ/WAI), and you could observe in an emulator that you weren't anywhere near the HDMA start when the DMA ended.
All of those conditions would probably only be true ~3% of the time you wanted to use DMA. And even then, you may be better off just using a block move operation if the length is small enough (I know, mvn/mvp are slow), compared to waiting for a blanking event or IRQ trigger before starting your DMA.
Thus, my strong advice is: don't do it. Find another way to make your code work. I don't care who pulled it off, it should never be done. You're playing with fire.
byuu wrote:
I once came across a US football game that read out the CPU revision, and used DMA on r2, and a block move on r1.
I think Uniracers does the same thing.
byuu wrote:
and if the speed is different, what a horrible idea that half your users will be complaining of slowdowns.
I assume the developers considered this difference an acceptable engineering tradeoff. Compare it to disc-based game consoles prior to 2013, which slow down if the disc gets scratched. (The PlayStation 4 and Xbox One install the entire game to an internal HDD and read only the lead-in to verify disc presence during gameplay.) Compare it to Game Boy Color dual-mode games, which can run the CPU at double speed and rely on fast LCD response time only in GBC mode. And compare it to PC games, which have to deal with everything from the Atom and Intel integrated graphics in a convertible tablet/laptop to 4K NVIDIA masturbation in a purpose-built desktop.
byuu wrote:
Regular end users have no way of knowing which revision their SNES is: they can't even remove the security screws to find out.
If you have a copy of The Lion King, BARRY will tell you.
byuu wrote:
Either you give up on 1/1/1 users, or you only use DMA during Vblank.
There is unfortunately no other viable option.
Well, there is a third option, which is obvious but should probably be listed as well: just don't use any HDMA when you need to DMA outside of Vblank.
Maybe that's why sometimes the IRQs are used instead of HDMA?
tepples wrote:
If you have a copy of The Lion King, BARRY will tell you.
Thanks to this, I now know that I have a 2/1/3 SNES.
Edit: Are there any examples of people getting different values on this screen? I've looked around and all I can see is 2/1/3. Any confirmation that this screen is accurate?
Edit 2: Found a video of someone getting 1/1/1 (in an emulator). So I guess 2/1/3 is a lot more common? Kinda sucks for the 1/1/1 users.
Edit 3 (sorry): Looks like bsnes-plus emulates CPU revision 2. At some point I might like to ask someone to test my game on a 1/1/1 system to see if it works properly there, unless bsnes/higan can be made to emulate the DMA/HDMA crash.
2/1/3 is probably the most common combination by far, partly because all of the 1CHIP revisions report it too, even though they don't strictly have the same CPU/PPUs as a "real" 2/1/3 system.
There are definitely others (I own a 2/1/1), but between all the numerous hardware revisions, version 1 S-CPUs are a definite minority. But if you do care about supporting them, I still think the only realistic option is to just keep DMA and HDMA separate at all times.
My Super NES is 1/1/1 and yellow as fucc, and I have the SNES PowerPak with MUFASA firmware.
> I think Uniracers does the same thing.
That is incorrect. Uniracers writes to OAM during Hblank, which the SNES "forbids".
But because it's done on a scanline with no sprites, the OAM write cursor is pointed by the PPU at address 0x218. I really don't have a clue how the Uniracers devs figured that out. I presume they just got lucky and their code worked by sheer accident.
> I assume the developers considered this difference an acceptable engineering tradeoff.
I should add, I played the game in both modes. I only found out about it because my HDMA code had a bug in it (this was pre-libco, so it was a terrifyingly complex state machine) back then, and the bug mysteriously went away when FitzRoy changed the CPU revision from 2 to 1.
There is no difference in speed. It's just creating two possible failure points instead of one.
> If you have a copy of The Lion King, BARRY will tell you.
Right, or select on Final Fantasy: Mystic Quest's status screen. Or you can heat a Bic pen to melting point and jam it into the screws to create a temporary bit (never worked when I tried it.) That's still an unreasonable requirement.
> Maybe that's why sometimes the IRQs are used instead of HDMA?
Very possible.
> Edit 3 (sorry): Looks like bsnes-plus emulates CPU revision 2.
I can't speak for all the forks, but I emulate some differences of CPU r1 and r2. Not all of them. But the setting is currently not changeable once the emulator is compiled.
I don't know how to emulate the HDMA/DMA crash of CPU r1, so that's why I default to r2. My emulator is a more faithful r2 emulator than r1 emulator.
tepples wrote:
yellow as fucc
That's easily reversible if you have access to hydrogen peroxide and sunlight. I've done this successfully many times.
byuu wrote:
I can't speak for all the forks, but I emulate some differences of CPU r1 and r2. Not all of them. But the setting is currently not changeable once the emulator is compiled.
bsnes-plus can emulate the r1 CPU (using a config file variable rather than a compile-time setting), which affects HDMA and DRAM refresh timing.
Quote:
This is honestly asking too much. There's 1324 usable cycles per scanline, the setup time to trigger the DMA is going to add some cycles to that, the CPU<>DMA and DMA<>CPU syncs add time, there's DRAM refresh in the mix, and you have to add the transfer lengths * 8 cycles per byte, and then you also have to consider if your DMA goes over HDMA transfers (which are even more complex, and can vary timing based on indirect transfers and reloading counters) ... and most likely, DMA transfers are going to be longer than one scanline, so that's most likely certain.
You can set up the DMA registers before waiting for H-blank and then all that's left is writing to $420b. Also, I don't need more than 64 bytes at once. I'm trying to make a self-updating level map, so I can have longer levels, and also do tricks like making a level repeat during a boss fight.
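A rough sketch of that "set everything up first" idea, in ca65-style 65816 assembly. The destination (WRAM at $7E2000 through the $2180 port) and the source address `MapChunk` are purely hypothetical, since the post does not say where the 64 bytes go. It also assumes an 8-bit accumulator, 16-bit index registers, and a data bank where the $21xx/$43xx registers are visible.

```asm
.p816
.a8
.i16

WMDATA    = $2180       ; WRAM data port on the B-bus
WMADDL    = $2181       ; WRAM port address, low byte ($2182 = middle byte)
WMADDH    = $2183       ; bit 0 selects bank $7E (0) or $7F (1)
DMAP0     = $4300       ; channel 0 transfer mode
BBAD0     = $4301       ; channel 0 B-bus register (low byte of $21xx)
A1T0L     = $4302       ; channel 0 source address, 16 bits
A1B0      = $4304       ; channel 0 source bank
DAS0L     = $4305       ; channel 0 byte count, 16 bits

MapChunk  = $808000     ; hypothetical ROM address of the 64 source bytes
CHUNK_LEN = 64

PrepareMapDma:
    ldx #$2000
    stx WMADDL          ; hypothetical destination $7E2000 (writes $2181/$2182)
    stz WMADDH          ; bank bit 0 -> bank $7E
    stz DMAP0           ; A-bus to B-bus, increment source, one register
    lda #(WMDATA & $FF)
    sta BBAD0           ; target register $2180
    ldx #(MapChunk & $FFFF)
    stx A1T0L           ; source offset
    lda #(MapChunk >> 16)
    sta A1B0            ; source bank
    ldx #CHUNK_LEN
    stx DAS0L           ; 64 bytes
    rts                 ; only the write of #$01 to $420B is left
```

Going by the figures quoted above, 64 bytes at 8 cycles per byte is 512 cycles plus some setup and sync overhead, against roughly 1324 usable cycles per scanline. A transfer triggered just after HDMA has run should therefore finish before the next Hblank, but that still deserves checking in an emulator.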
Er, is 64 bytes really worth the effort? I bet there are other things that could deserve optimization way more.
I guess now is a good time to try 10.6 coordinates since I got the game to work within a 1024x512 playfield. I'll make a copy of the source code, in case something goes wrong.
I might also simplify my animation engine a bit since I have a better idea of how many sprites I need. Instead of mixing 32x32 and 16x16 slots together, I might have 20 32x32s, 32 16x16s and 4 static 32x32 explosion frames. I'll still cheat when it comes to fireballs.
byuu wrote:
Either you give up on 1/1/1 users, or you only use DMA during Vblank.
I don't agree.
byuu wrote:
if the speed is the same, just use the r1 version; and if the speed is different, what a horrible idea that half your users will be complaining of slowdowns. Regular end users have no way of knowing which revision their SNES is: they can't even remove the security screws to find out.
How would the knowledge about CPU revisions help them, anyway?
byuu wrote:
This is honestly asking too much. There's 1324 usable cycles per scanline, the setup time to trigger the DMA is going to add some cycles to that, the CPU<>DMA and DMA<>CPU syncs add time, there's DRAM refresh in the mix, and you have to add the transfer lengths * 8 cycles per byte, and then you also have to consider if your DMA goes over HDMA transfers (which are even more complex, and can vary timing based on indirect transfers and reloading counters)
As a programmer, you know your program. You know what's going on. So what's your point?
byuu wrote:
It's way too complex.
It's complicated, yes. Complex, not really.
byuu wrote:
I can't think of anyone who knows more about SNES timing than I do (a couple of people who know as much as I do.)
You're my hero. (No joke/irony/sarcasm intended.)
byuu wrote:
Pretty much the only way it'd work is if your DMA and HDMA always transferred the same amount of data each time, and your HDMA didn't have weird indirect fetching going on, and you could align the DMA start to a fixed point (whether via $4212's Hblank status, or a timed IRQ/WAI)
These "ideal" conditions do occur, you know.
byuu wrote:
Thus, my strong advice is: don't do it. Find another way to make your code work. I don't care who pulled it off, it should never be done.
Whether you care or not, it has been done.
Apart from that, I reckon anyone even asking the OP's question is familiar with best practice in this regard anyway. So yes, it definitely should be done, even if only for testing purposes.
Ramsis wrote:
As a programmer, you know your program. You know what's going on. So what's your point?
Wishful thinking, unfortunately. If you're using any kind of abstraction layer, you're likely going to be a lot less informed about side effects caused by the underlying implementation. Yes, even the SNES DMA registers are an abstraction.
Is it [well-]understood how FFMQ (/USA) and The Lion King learn CPU revision data? Is it small enough to be run in the firmware? Is there a reason not to just include the check on the cart and possibly show a warning like "This may freeze on your SNES"?
Is there a testROM out there for it, as another option?
Yeah, it's just reported directly by reading from $004210.
The problem is that either you entirely avoid the possibility of DMA/HDMA conflicts, or you write defensively like Ramsis's code, or it will crash on a 1/1/1 system. You can't get away with a warning label like "might crash"; it's just too likely.
Quote:
reported directly
I'm surprised it didn't make it into more games, unless it was against lot-check regs or something.
Unless whatever you're doing is in a[some] cutscene(s) and you grant the option to skip (or auto-skip if 1/1/1).
HihiDanni wrote:
Ramsis wrote:
As a programmer, you know your program. You know what's going on. So what's your point?
Wishful thinking, unfortunately. If you're using any kind of abstraction layer, you're likely going to be a lot less informed about side effects caused by the underlying implementation. Yes, even the SNES DMA registers are an abstraction.
What do you mean by an "abstraction"?
Myask wrote:
I'm surprised it didn't make it into more games, unless it was against lot-check regs or something.
It's as easy as (assuming 8-bit accum) lda $004210 / and #$0f / cmp #$02 / blt CPURevIs1Or0 (blt == bcc). I imagine the logical code would just check this at power on or reset (see what else $4210 does for why, re: bit 7), and either set a variable in RAM or DP somewhere that DMA/HDMA code refers to (to know which CPU-revision-compatible methodology to use), or load the address of the CPU-revision-compatible DMA/HDMA routine and stick it in RAM/DP somewhere and use that via an indirect jump or equivalent; there are several approaches.
My point being: the detection process for CPU (and PPUs, for that matter ($213e and $213f)) revision is very simple and only needs to be done once.
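To make the "check once, store in RAM" approach concrete, here is a small sketch in ca65-style 65816 assembly. The variable locations and routine names are hypothetical, and the two transfer routines are empty placeholders. It assumes an 8-bit accumulator, a direct page register of $0000, and a data bank where $21xx/$42xx are visible.

```asm
.p816
.a8

RDNMI    = $4210        ; bit 7 = NMI flag (cleared by reading), bits 0-3 = CPU revision
STAT77   = $213E        ; bits 0-3 = PPU1 revision
STAT78   = $213F        ; bits 0-3 = PPU2 revision

cpu_rev  = $10          ; hypothetical direct-page variables
ppu1_rev = $11
ppu2_rev = $12

DetectRevisions:        ; call once at power on or reset
    lda RDNMI           ; note: this read also clears the NMI flag in bit 7
    and #$0F
    sta cpu_rev
    lda STAT77
    and #$0F
    sta ppu1_rev
    lda STAT78          ; note: this read also affects the H/V counter latch state
    and #$0F
    sta ppu2_rev
    rts

StartTransfer:          ; later, wherever a transfer is needed
    lda cpu_rev
    cmp #$02
    bcc TransferSafe    ; revision 0 or 1: avoid mixing DMA with HDMA
    jmp TransferWithDma ; revision 2 or later
TransferSafe:
    rts                 ; placeholder for e.g. a block move or Vblank-only DMA
TransferWithDma:
    rts                 ; placeholder for the DMA-based path
```

Storing a routine address and using an indirect jump, as mentioned above, would work just as well. The flag compare is simply the shortest variant to show.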
> I don't agree.
That's your right. People can listen to you and your experience with writing an SNES RPG (something I haven't done, but which shows a lot of knowledge of SNES development.) Or they can listen to me and my experience writing the most accurate SNES emulator. Or they can make up their own mind.
(I'm not in any way trying to imply either of us are more knowledgeable or should be listened to more. I was trying to reflect that we both have a lot of weight in knowing what we're talking about.)
> Wishful thinking, unfortunately.
Definitely. I can probably count on one hand the number of devs that can count the actual cycles of their code, including penalty cycles (conditions 2, 4 and 6 in the WDC manual especially.)
Now throw in bus hold delays, DRAM refresh, CPU->DMA sync, DMA->CPU sync, possible IRQs, HDMA indirect fetch overhead, etc and throw in the fact that we don't even know when exactly this crash occurs, and yeah.
What's going to end up happening is people tweaking things until the game "seems to work fine", which is a terrible way to program and it's why certain games end up breaking on some systems/emulators but not others.
I'm sure I could study for a bit and prepare my own fugu sushi, or rewire my circuit breaker box. But even if I succeed, and I got some added benefit out of it, it was still not a very wise idea to attempt it.
> My point being: the detection process for CPU (and PPUs, for that matter ($213e and $213f)) revision is very simple and only needs to be done once.
There's no reason to do it.
There is no PPU1r2, and no one knows of any differences between the three PPU2 revisions.
I discovered two CPUr2 revisions that are extremely minor (HDMA init timing, and DRAM refresh trigger position), and nobody understands the DMA<>HDMA crash well.
But even if someone did, again ... you don't want your game to run like garbage on a 1/1/1 deck. And I'd bet money that if there was a really substantial difference, Nintendo would have disapproved the software in QA testing. In fact, they may well have done just that, hence we have no examples to the contrary.
Even worse about the revisions ... they clearly stopped bumping them at 2/1/3, despite the fact that changes continued to be made. The SNES Jr pretends to be a 2/1/3, despite many glaring and severe changes.
That is to say, I'm surprised that so few "display to user the CPU revs" instances exist. Not that I'm surprised that nobody (or almost nobody? Uniracers?) did differential-coding for the revisions, or even 2/1/3-only code; that's the sort of Hardware-Dependent Mystery that would not be good for consumer confidence in product.
Funny how the version of the game (FFMQ) that is prone to freezing, fixed in a later revision, is the one that lets you see whether you have such a version of the SNES... which I seem to have. Hooray for 1/1/1?