> 0.4MIPS wait what?
Technically, it depends on a lot of factors. When I say 0.4MIPS, I'm usually rolling operations together like "clc; adc", "sec; sbc", and so forth.
But the chip clock runs at 21,477,272 Hz. Given normal memory speeds (8 clocks per cycle), that leaves roughly 2.68 million CPU cycles per second, and instructions are anywhere from 2 to 8 cycles long. The more cycles, the more useful.
In the absolute best case, you could execute at 1.342MIPS. But, good luck making a game with only:
Code:
clc, cld, cli, clv, sec, sed, sei,
tax, tay, txa, txy, tya, tyx,
tcd, tcs, tdc, tsc, tsx, txs,
inc, inx, iny, dec, dex, dey,
asl, lsr, rol, ror, nop, xce
(and you don't want to know why I just happen to have that list handy :P)
Even if it were possible, you would spend way more instructions working around the horrible limitations to do the same thing anyway.
So I tend to think of 6 cycles as being the average for doing something useful (clc + long-adc, jsl subroutine, stuff that e.g. the 680x0 can easily do, and then some, in one instruction). That gets you to 0.447MIPS.
If you want to be more optimistic, you could say 4 cycles is more reasonable, which only brings you to 0.671MIPS. FastROM can also help a bit, but is heavily crippled by not being usable on RAM.
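For anyone who wants to check the arithmetic, here is a rough sketch in Python. It only assumes the figures quoted above (the 21,477,272 Hz clock and 8 clocks per cycle); the names `MASTER_CLOCK_HZ`, `CLOCKS_PER_CPU_CYCLE` and `mips` are made up for illustration, and it deliberately ignores FastROM and anything else that changes memory timing.

```python
# Figures taken from the post; the names are illustrative only.
MASTER_CLOCK_HZ = 21_477_272      # chip clock
CLOCKS_PER_CPU_CYCLE = 8          # "normal memory speed"


def mips(cycles_per_instruction):
    """Millions of instructions per second at a given average cycle count."""
    cycles_per_second = MASTER_CLOCK_HZ / CLOCKS_PER_CPU_CYCLE
    return cycles_per_second / cycles_per_instruction / 1_000_000


if __name__ == "__main__":
    # 2 = best case, 4 = optimistic average, 6 = "useful work" average
    for cpi in (2, 4, 6):
        print(f"{cpi} cycles/instruction: {mips(cpi):.3f} MIPS")
```

Running it should reproduce the 1.342, 0.671 and 0.447 figures for 2, 4 and 6 cycles per instruction.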
Regardless, said performance is absolutely dreadful. You really do have to micro-optimize the living hell out of SNES ASM. Especially if you ever try something as reckless as "drawing proportional fonts." It can take entire frames to render a proportional font screen.
> I disagree, I still can read my old 68000 asm code just fine, it's perfectly doable.
You're comparing apples and carburetors. 680x0 is very readable compared to 65816.
Code:
mulu.l #23,d0 //(unsigned) multiply d0 by 23
vs
Code:
sta $00; asl; sta $02; asl; sta $04; asl; asl; clc; adc $04; clc; adc $02; clc; adc $00 //(unsigned) multiply by 23
Fun exercise for the reader: try and do muls.l #23,d0 (signed multiply) on the 65816.
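If you don't trust the shift-and-add sequence for the unsigned multiply by 23, here is a small Python model of it, step by step. This is a sketch, not an emulator: it assumes a 16-bit accumulator (which is what the $00/$02/$04 spacing suggests), assumes results simply wrap at 16 bits, and ignores the carry flag and cycle counts. The function name `mul23_shift_add` and the `t0`/`t2`/`t4` variables standing in for the direct page slots are invented for the example.

```python
MASK16 = 0xFFFF  # assumption: 16-bit accumulator, results wrap at 16 bits


def mul23_shift_add(a):
    """Model of the 65816 sequence: 23x = 16x + 4x + 2x + x."""
    a &= MASK16
    t0 = a                     # sta $00        -> x
    a = (a << 1) & MASK16      # asl            -> 2x
    t2 = a                     # sta $02
    a = (a << 1) & MASK16      # asl            -> 4x
    t4 = a                     # sta $04
    a = (a << 2) & MASK16      # asl; asl       -> 16x
    a = (a + t4) & MASK16      # clc; adc $04   -> 20x
    a = (a + t2) & MASK16      # clc; adc $02   -> 22x
    a = (a + t0) & MASK16      # clc; adc $00   -> 23x
    return a


if __name__ == "__main__":
    # Compare against a plain multiply for every 16-bit input.
    ok = all(mul23_shift_add(x) == (x * 23) & MASK16 for x in range(0x10000))
    print("matches x * 23 (mod 65536):", ok)
```

Note that the Python side gets to write `x * 23`, which is rather the point of the comparison.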
> I don't understand this fear of the assembly.
You haven't written enough.
I tried to make a fan translation for Dai Kaijuu Monogatari. I hacked 95% of the game, reprogrammed all of the text engines and window displays entirely. It was about 200 - 300KB of pure 65816 ASM, commented as best I could.
I sat around for a year asking for anyone to please help me translate the script. I went back to look at the code and realized ... it was only maybe 5% better than looking at a raw disassembly.
Why? Because all math was done on the accumulator (and optimized like my mul 23 above that renders it unreadable), speed optimizations resulted in nasty tricks with the stack and register sizes (no common calling convention or assumptions on A/X sizes inside called functions), branching was abused heavily for performance (the bad kind of goto), memory was all hard-coded addresses, and there was no grander structure beyond basic functions.
This isn't a problem at all for writing 65816. But when you come back to it after months of inactivity, you can't look at the code and intuitively understand all of the speed hacks and accumulator math at first glance. You basically have to study and relearn each function you wrote in order to be able to use them again.
With C++, all of my expressions are in pure math, I can work with all the variables I want at one time, I can build classes with member functions, inherit from base classes and obtain polymorphism transparently, and allocate memory dynamically on the heap.
Absolute world of difference. I know 65816 very well, and I've reverse engineered and programmed massive blocks of code in it. There are many as good as me, but quite possibly no one better at it. But I'm not going to pretend it's easy to write large-scale 65816 applications because of that.