65816 really 16 bit or just a 6502 with 16 bit registers

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176188)
Hi all, I'm having a discussion with Stef on a French forum.
As you already know, Stef dislikes the SNES, and especially the 65816.
For him this CPU is an 8-bit CPU because the bus is 8-bit; for me it's a 16-bit CPU (because the ALU is 16-bit), but a system built around a 65816 can't really be called 16-bit because of the bus.

For me the bitness of a CPU depends on its ALU, but then the Z80 has a 4-bit ALU multiplexed to 8 bits.

What do you think?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176193)
TOUKO wrote:
For him this CPU is an 8-bit CPU because the bus is 8-bit; for me it's a 16-bit CPU (because the ALU is 16-bit)

You are right. The Intel 8088 and the Motorola 68008 are also 16 bit CPUs.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176195)
The 8088 and 68008 are cut-down versions of the 8086 and 68000 with an 8-bit data bus, and those are both true 16-bit CPUs, whereas the 65C816 doesn't have a full 16-bit core design of its own (most of it comes from the 65C02).

@Touko> To be completely honest about it, what I said is that the 65816 is as much 16-bit as the 68000 is 32-bit. I also said that for me the 65816 is more an 8-bit CPU with extended 16-bit capabilities (enabling user-friendly 16-bit development).
And yeah, I dislike the 65C816 and the SNES from a technical point of view, as they are both weak pieces of hardware and badly designed... I can't do anything about that :-/ On the other hand I love playing on the SNES, and Super Metroid remains my unmatched best game ever :p
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176196)
Quote:
I said the 65816 is as much 16-bit as the 68000 is 32-bit

Not for me: the 68k, like the 65816, has a 16-bit ALU. I said that even if the 68k can do 32-bit ops, they are split internally into two 16-bit ops, unlike 16-bit ops on the 65816.

It seems to be the same thing: I think 8-bit ops on the Z80 (because of the 4-bit ALU) are close to 32-bit ops on the 68k. Maybe this is why the Z80's efficiency is not so fabulous (but it's not a bad or weak CPU, only less efficient).

Quote:
And yeah, I dislike the 65C816 and the SNES from a technical point of view, as they are both weak pieces of hardware and badly designed

Stef, I respect your opinion, although we don't often agree, but "weak piece of hardware" is a little bit strong, and false. The SNES is not perfect, and some parts are badly designed (I agree), but not the whole thing.
The MD has some weak parts too; there is no perfect hardware (except the PCE :P).
But I don't understand why you are saying that the 65816 is a bad CPU because it's only a 16-bit 6502 (this is entirely false, of course)!

I quote a Google translation of your interpretation of the 65816 design:
Quote:
And if you want to drive the nail in on the so-called 16-bit architecture of the 65816 (which is also officially presented as an 8/16-bit CPU):
In fact, the 65816 is directly derived from the 6502; the only changes are that certain registers become 16-bit, the ALU becomes 16-bit, and memory addressing goes to 24 bits. But, and it's a big but, it keeps exactly the same logic as the 6502 for the whole I/O and instruction decoding part, which stays 8-bit. And therein lies the problem, since all the data goes through it (which brings me back to the fact that the bus width is decisive). This choice is understandable, as extending the registers and the ALU is very simple, whereas reworking the whole I/O system and the instruction decoding would have been much more complex and required more investment (especially if they wished to maintain compatibility with the 6502). In fact they even kept the 16-bit PC register of the 6502, and 24-bit addressing is only possible thanks to an additional register, PB (8 bits), that contains the upper 8 bits.

For me the 65816 is rather an 8-bit CPU whose capabilities have been extended to make it 16-bit friendly.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176204)
The phrase "16-bit CPU" doesn't precisely mean anything.

There are CPUs with 16-bit buses, CPUs with 16-bit ALUs, and CPUs with 16-bit ISAs; at best, we might define a 16-bit CPU to be a CPU with all of the above, in which case neither the 65816 (8/16/16) nor the 68000 (16/16/32) are "true" 16-bit; at worst, the phrase is just marketing.
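
To make that three-way split concrete, here is a minimal Python sketch. The `Widths` type and the function names are made up for illustration; the only data is the two triples quoted in the post (8/16/16 and 16/16/32), read as bus/ALU/ISA.

```python
from typing import NamedTuple


class Widths(NamedTuple):
    bus: int  # external data bus width, in bits
    alu: int  # ALU width, in bits
    isa: int  # widest operand the instruction set handles, in bits


# Triples as given in the post above (bus/ALU/ISA).
CPUS = {
    "65816": Widths(bus=8, alu=16, isa=16),
    "68000": Widths(bus=16, alu=16, isa=32),
}


def is_strictly_n_bit(w: Widths, n: int) -> bool:
    """True only when bus, ALU and ISA widths are all exactly n bits."""
    return w.bus == n and w.alu == n and w.isa == n


def describe(name: str) -> str:
    """Format a CPU as 'name: bus/alu/isa'."""
    w = CPUS[name]
    return f"{name}: {w.bus}/{w.alu}/{w.isa}"


if __name__ == "__main__":
    for name, widths in CPUS.items():
        print(describe(name), "strictly 16-bit:", is_strictly_n_bit(widths, 16))
```

Under the strict "all three widths match" definition, neither entry qualifies, which is the point the post is making.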
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176205)
TOUKO wrote:
Stef, I respect your opinion, although we don't often agree, but "weak piece of hardware" is a little bit strong, and false. The SNES is not perfect, and some parts are badly designed (I agree), but not the whole thing.
The MD has some weak parts too; there is no perfect hardware (except the PCE :P).
But I don't understand why you are saying that the 65816 is a bad CPU because it's only a 16-bit 6502 (this is entirely false, of course)!


To be honest, I think that a major part of the SNES is badly designed... really.
For instance it has some nice sound capabilities, but they are severely limited by the small amount of dedicated memory... I can understand that the quantity was driven by cost, but then why design it in a way that we can't easily stream sound data to overcome that limited memory?
Same thing for the graphics capabilities: a large part of the video modes aren't really useful, the video chip is overcomplicated, and most of its features are useless. What were they thinking when they implemented the 64x64 sprite size, for instance? It looks like the SNES specs were decided by marketing guys who just wanted to show "big numbers", to show that the SNES can do more than the competitor even if in reality the features are not usable.

The real nice features in the SNES for me are the following ones:
- rich colorset and palette
- transparency effect
- pixel windowing
- mode 7
- HDMA

But all of this is completely hidden by the largely convoluted and complex design of the system itself. The MD has its weaknesses: the small number of palettes is terrible, and the sound part could have been done in a better way. But all in all, it's still much better designed than the SNES. Even the Saturn is, for me, not as terrible as the SNES in terms of design. The Saturn is very complex and convoluted, but at least it doesn't have real weaknesses like the SNES has. It's just a super complex system (but to be honest I don't like it either).

Quote:
I quote a Google translation of your interpretation of the 65816 design:
Quote:
And if you want to drive the nail in on the so-called 16-bit architecture of the 65816 (which is also officially presented as an 8/16-bit CPU):
...
For me the 65816 is rather an 8-bit CPU whose capabilities have been extended to make it 16-bit friendly.


No worries, I totally stand by what I said, even if the Google translation is a bit rusty :p
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176206)
Quote:
No worries, I totally stand by what I said, even if the Google translation is a bit rusty :p

Yes I know, sorry, but the text was too long to translate by hand. :(

As for the MD's weaknesses: I dislike the FM chip. It is very slow and a low-grade FM chip IMO; too often the sound FX are terrible, although the music is really excellent in some cases (Treasure, Technosoft, Midnight Resistance). I also dislike the volume setting, which is too high (it seems to be hard to set correctly).
The PCM capabilities are bad, because you have no volume control or panning, and you must use timed code on the Z80 for correct playback. I noticed very distorted samples with your XGM driver (it's not your driver's fault, but the difficulty of syncing the sample stream with the 2612's DAC), though not always; it depends on which sample you are playing. Plus there's the stupid Z80 banking system.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176213)
Is the Z80 banking any more stupid than MMC1? And is the PCM any worse than NES $4011? Hmmm... MMC1 and $4011... How many words can you make before the sun goes down?

As for the "bit" designation for an entire console, I choose the widest data bus in the system, measured in word width times words per clock.
  • For NES, Game Boy, and Master System, it's 8.
  • For TG16 and Super NES, it's 16 (VDC/PPU data bus).
  • For Genesis, it's 16 (CPU data bus) or 16 (VDP data bus, 8 bits times 2 transfers per clock).
  • For Jaguar, it's 64.
  • And for Nintendo 64, it's also 64 (9-bit RAM with parity, so really 8 bits, times 8 transfers per clock).
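
As a rough illustration of that metric, here is a small Python sketch of "word width times words per clock". The function name and the table layout are my own; the numbers are only the ones spelled out in the list above (entries given there as a bare total are left out rather than guessed).

```python
def console_bits(word_width_bits: int, transfers_per_clock: int) -> int:
    """Bits moved per clock on a bus: word width times words per clock."""
    return word_width_bits * transfers_per_clock


# (word width, transfers per clock) as stated in the list above.
BUSES = {
    "Genesis CPU data bus": (16, 1),
    "Genesis VDP data bus": (8, 2),
    "Nintendo 64 RAM (ignoring the parity bit)": (8, 8),
}

if __name__ == "__main__":
    for name, (width, per_clock) in BUSES.items():
        print(f"{name}: {console_bits(width, per_clock)}")
```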

The SPC700 was designed for maximum isolation from the rest of the system. This makes streaming harder (you need HDMA) but reduces bus contention, which you can sometimes get on a Genesis when the Z80 and 68000 try to access the cart at once.

The real problem with the 6502 is that it isn't really designed to run C well. The 65816 improves on this somewhat with the d,S and (d,S),Y addressing modes, as well as the base pointer register that TCD and PLD set. But arrays of structs in the heap are still a mess because there's no quick way to add a number other than 1 or -1 to an index register.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176215)
Quote:
Is the Z80 banking any more stupid than MMC1? And is the PCM any worse than NES $4011?

If I remember correctly, it's serial, 1 bit at a time; you spend 100+ cycles to map a bank.

Quote:
And is the PCM any worse than NES $4011?

Worse, no: they are not scratchy or muffled, only distorted, but it's very audible, and you must be careful about DMA contention, which can be problematic.

Quote:
The real problem with the 6502 is that it isn't really designed to run C well. The 65816 improves on this somewhat with the d,S and (d,S),Y addressing modes, as well as the base pointer register that TCD and PLD set. But arrays of structs in the heap are still a mess because there's no quick way to add a number other than 1 or -1 to an index register.

Of course, in that case you're right: 65xxx CPUs are not suited to high-level languages. This is why I'd go for a 68k in a PC, where you need this kind of CPU (large addressable memory, no bank mapping, good performance for high-level languages, no need for expensive memory, which is useful when you need a large amount).
For a game system with limited resource requirements I'd prefer a 65xxx: fast, a huge margin for optimisation, a fast interrupt system, fewer bytes taken by compiled code, not expensive at all, very well suited to game systems.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176216)
MMC1 on NES is also serial. PRG ROM banking on MMC1 (SGROM/SNROM, up to 2 Mbit) takes 56 half-cycles. (Counting half cycles to account for difference in clock speed and IPC between 6502 and Z80.)
Code:
sta $E000  ; 8 - first 4 bits are PRG A14-A17
lsr a      ; 4
sta $E000  ; 8
lsr a      ; 4
sta $E000  ; 8
lsr a      ; 4
sta $E000  ; 8
lda #$00   ; 4 - fifth is cart WRAM disable, so clear it
sta $E000  ; 8
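
For anyone who wants to see why five writes are needed, here is a minimal Python model of the serial port as I understand it: each write shifts in bit 0 of the value, least significant bit first, and the fifth write latches the 5-bit result into the register selected by that write's address. The `Mmc1` class and `set_prg_bank` are hypothetical names, and the model is simplified (a write with bit 7 set only clears the shift register here; other side effects are ignored).

```python
class Mmc1:
    """Simplified model of the MMC1 serial load port (illustrative only)."""

    def __init__(self) -> None:
        self.shift = 0  # bits collected so far
        self.count = 0  # number of bits collected
        self.regs = {0x8000: 0, 0xA000: 0, 0xC000: 0, 0xE000: 0}

    def write(self, addr: int, value: int) -> None:
        if value & 0x80:
            # Bit 7 set: abandon the sequence in progress.
            self.shift = 0
            self.count = 0
            return
        # Only bit 0 of each write is used, least significant bit first.
        self.shift |= (value & 1) << self.count
        self.count += 1
        if self.count == 5:
            # The address of the fifth write picks the target register.
            self.regs[addr & 0xE000] = self.shift
            self.shift = 0
            self.count = 0


def set_prg_bank(mapper: Mmc1, bank: int) -> None:
    """Mirror the sta/lsr sequence from the listing above."""
    a = bank & 0xFF
    mapper.write(0xE000, a)      # sta $E000
    for _ in range(3):
        a >>= 1                  # lsr a
        mapper.write(0xE000, a)  # sta $E000
    a = 0x00                     # lda #$00 (fifth bit cleared)
    mapper.write(0xE000, a)      # sta $E000


if __name__ == "__main__":
    m = Mmc1()
    set_prg_bank(m, 0x0B)
    print(hex(m.regs[0xE000]))
```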


SUROM (4 Mbit) has A18 over in another serial register, taking a total of 100 half-cycles.
Code:
sta $E000  ; 8 - first 4 bits are PRG A14-A17
lsr a      ; 4
sta $E000  ; 8
lsr a      ; 4
sta $E000  ; 8
lsr a      ; 4
sta $E000  ; 8
and #$02   ; 4 - clear WRAM disable bit but preserve fifth bit for next write
sta $E000  ; 8
sta $A000  ; 8 - SUROM doesn't use first 4 bits for anything important
sta $A000  ; 8
sta $A000  ; 8
sta $A000  ; 8
lsr a      ; 4
sta $A000  ; 8 - but the fifth bit is A18
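
The two totals can be checked by adding up the per-instruction costs from the comments in the listings. This is just that arithmetic in Python; the cost table copies the half-cycle figures from the comments (8 for each `sta`, 4 for `lsr a`, `lda #imm` and `and #imm`), and the list names are mine.

```python
# Half-cycle cost per instruction, as annotated in the listings above.
COST = {"sta": 8, "lsr": 4, "lda": 4, "and": 4}

# Instruction sequences, in order, from the two listings.
SNROM_SEQUENCE = ["sta", "lsr", "sta", "lsr", "sta", "lsr", "sta", "lda", "sta"]
SUROM_SEQUENCE = [
    "sta", "lsr", "sta", "lsr", "sta", "lsr", "sta", "and", "sta",  # $E000 writes
    "sta", "sta", "sta", "sta", "lsr", "sta",                        # $A000 writes
]


def half_cycles(sequence: list[str]) -> int:
    """Sum the half-cycle cost of a sequence of mnemonics."""
    return sum(COST[op] for op in sequence)


if __name__ == "__main__":
    print("SGROM/SNROM:", half_cycles(SNROM_SEQUENCE))
    print("SUROM:", half_cycles(SUROM_SEQUENCE))
```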
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176218)
Wow, MMC1 banking is very bad too .. :shock:

The difference is that spending 100 cycles to bank CHR ROM is in fact not a bad deal compared to how many cycles you must spend to transfer those tiles to VRAM.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176220)
tepples wrote:
The SPC700 was designed for maximum isolation from the rest of the system. This makes streaming harder (you need HDMA) but reduces bus contention, which you can sometimes get on a Genesis when the Z80 and 68000 try to access the cart at once.

Eh, the 68000 and Z80 fighting for the bus is nothing, it's only a handful of cycles which is very little compared to how much the code itself takes up and the Z80 isn't accessing ROM all the time either (as it's spending nearly all its time in its own RAM). The problems are more DMA (which forces the Z80 to stop for a while if it tries to use ROM) and when the 68000 wants to modify Z80 RAM (which again requires stopping the Z80), on top of bank switching wasting a significant amount of time, not to mention Z80-only engines that try to handle FM and PSG on the Z80 as well while it's struggling to get all PCM samples at the right moment.

There are lots of showstoppers for the Z80, but bus contention with the 68000 isn't one of those =P
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176221)
Quote:
The problems are more DMA (which forces the Z80 to stop for a while if it tries to use ROM)

Yes, but this can be "easily" solved with a RAM buffer, though this will inevitably introduce some delay before playing PCM.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176224)
If I were to design the SNES in a way I imagine it would cost about the same: (of course, I have no way of knowing...)

No SPC700 or 64KB of audio ram, just have the 65816 handle audio...
Give the 64KB of main ram to the video hardware for 128KB of vram...
Get rid of all the BS video modes (4,5,6, maybe 0) and use the space for more oam space, like adding 2 more bits for each sprite for size and tile selection...

And because I felt that this might be cheaper now from getting rid of the SPC700 and the audio ram, upgrade the SNES ram speed to fast rom speed. Also, the ram is now 64KB.

Of course, drooling over what the SNES could have been is stupid. I wouldn't call the SNES any weaker than the Genesis because although it's lacking in terms of CPU power, I feel the video hardware makes up for it, which is more important to me personally. CPU power is important with something like 3D on a system with no 3D hardware acceleration like this, but it's still not powerful enough to make a 3D game practical. I definitely agree that it's more poorly designed though, with tons of bad resource management that I addressed earlier. I don't know why they felt like chopping the vram bandwidth into a million different impractical and borderline useless ways when they could have used the PPU space more constructively.

My opinion personally, but I believe the number of bits in a CPU should be determined by how much the instruction set can handle. I mean, imagine if you had the two identical CPUs, except one with an 8 bit data bus that runs at 4MHz and another with a 16 bit data bus that runs at 2MHz. (This isn't related to the SNES and the Genesis, I'm just trying to prove a point about trying to determine the bits of a CPU) If anything, the first one is better, because the second one is wasteful when it comes to 8 bit instructions. In other words, if the 65816 is a 16 bit CPU, the 68000 is a 32 bit one. However, doesn't the 68000 not support 32 bits for every instruction, unlike the 65816 that supports 16 bits for every instruction?
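
The thought experiment with the two hypothetical CPUs can be put into numbers with a quick Python sketch. It assumes an idealised bus that moves one bus-width word per clock and nothing else limits throughput, which no real CPU matches; the function names are made up.

```python
def transfers_needed(operand_bits: int, bus_bits: int) -> int:
    """Bus transfers needed to move one operand (ceiling division)."""
    return -(-operand_bits // bus_bits)


def operands_per_second(operand_bits: int, bus_bits: int, clock_hz: int) -> int:
    """Operands moved per second, assuming one bus transfer per clock."""
    return clock_hz // transfers_needed(operand_bits, bus_bits)


if __name__ == "__main__":
    for bus_bits, clock_hz in ((8, 4_000_000), (16, 2_000_000)):
        for operand_bits in (8, 16):
            rate = operands_per_second(operand_bits, bus_bits, clock_hz)
            print(f"{bus_bits}-bit bus @ {clock_hz} Hz, "
                  f"{operand_bits}-bit operands: {rate}/s")
```

Under that assumption the two designs tie on 16-bit operands, and the 8-bit bus at 4MHz moves twice as many 8-bit operands, which is the "wasteful" case described above.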
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176225)
Quote:
No SPC700 or 64KB of audio ram, just have the 65816 handle audio...

Definitely not a good solution, because the CPU can be halted during the whole of vblank and cannot play any PCM in that period.

Quote:
Of course, drooling over what the SNES could have been is stupid. I wouldn't call the SNES any weaker than the Genesis because although it's lacking in terms of CPU power,

CPU power on the SNES is problematic because of the slow WRAM; rather than 128 KB, Nintendo should have used faster RAM, but less of it, like 32/64 KB.
The SNES's sprite system is also bad and needs to be rethought; this NES heritage should go and be replaced by a real SAT, with multiple sizes on screen at the same time.

But guys, do you think that the 65816 is a 16-bit processor or not? That's in fact also the question :P
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176227)
TOUKO wrote:
the CPU can be halted during the whole of vblank and cannot play any PCM in that period.

Wait, can you only upload audio data during Vblank? I really don't have a good understanding of how audio in the SNES works because I haven't gotten there yet. I feel that they spent too much money on the audio system though.

TOUKO wrote:
But guys, do you think that the 65816 is a 16-bit processor or not?

Yes. However, if the 65816 is a 16 bit processor, the 68000 is (at least for the most part) a 32 bit one.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176228)
A lot of external banking systems use serial writes for cost reasons (only 1 pin used for writes).
For me the Z80 bank register on the MD is not what hurts the most... I think DMA taking over the main bus is the biggest issue :-/
You can overcome it, but that still requires cleverly designing the driver around it.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176230)
Quote:
Wait, can you only upload audio data during Vblank?

Of course not with a classic sound chip, but with a PCM-based chip you must keep the PCM stream going, even in vblank, or else it would sound bad.
This is why the SPC exists: to avoid PCM interruptions when the 65816 is halted by DMA. It's the same for the Z80 in the MD (but not as critical as on the SNES, because PCM is not its main audio feature).

If you want to remove the SPC, you must replace the entire audio system with a classic one, like FM for example.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176231)
What could be nice for sound is just a simple ADPCM chip capable of reading samples directly from ROM, but with a small internal sample cache so it can use DMA transfers to feed it :)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176233)
Stef wrote:
What could be nice for sound is just a simple ADPCM chip capable of reading samples directly from ROM, but with a small internal sample cache so it can use DMA transfers to feed it :)


I agree, 1 sound generator + 1/2 PCM voices (like ADPCM) would be the best combo for me; plus an ADPCM chip only needs to be driven at the start of a sample, and is not affected by DMA contention.

Quote:
Yes. However, if the 65816 is a 16 bit processor, the 68000 is (at least for the most part) a 32 bit one.

However, it's very different, because the 68k splits 32-bit operations into two 16-bit ones, whereas the 65816 really does 16-bit ops in native mode (it doesn't split them into two 8-bit ones).
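
To show what "splitting" a 32-bit operation into two 16-bit ones means, here is a small Python sketch of a 32-bit add built from two 16-bit adds with a carry between them. It only illustrates the idea; it is not a model of how the 68000 actually sequences or times the operation, and the function names are invented.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """One 16-bit ALU pass: returns (16-bit result, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_via_16(a: int, b: int) -> tuple[int, int]:
    """32-bit add done as two 16-bit passes: low halves, then high halves."""
    lo, carry = add16(a, b)                      # low 16 bits first
    hi, carry = add16(a >> 16, b >> 16, carry)   # high 16 bits plus carry
    return (hi << 16) | lo, carry


if __name__ == "__main__":
    result, carry = add32_via_16(0x0001FFFF, 0x00000001)
    print(hex(result), carry)
```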

And personally I don't think that the 65816 is only a 6502 with 16-bit registers, as Stef thinks.
The 386SX in the FM Towns Marty is also a 32-bit CPU with a 16-bit data bus, but it's still considered 32-bit.

How about the SA-1? Is it simply a 65816 core or an improved/custom one? (I'm not talking about DMA and all the other embedded features, only the CPU core.)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176239)
Stef wrote:
For instance it has some nice sound capabilities but severely limited by the small amount of dedicated memory...

It was plenty good enough to produce some of the best game soundtracks of all time.

Whenever memory is limited, it is a problem for anyone working with it. This was always a major part of game development. If it had twice as much RAM you'd still be complaining. ;) It's like that myth about a goldfish that grows to meet the size of its bowl. Unless your goals are drastically smaller than the memory you have, it's going to be one of your biggest problems, and solving that problem is what a good developer does.

I think that's actually the strongest factor in the current "indie renaissance" going on in game development right now. Anybody can make a game now because its biggest problems have been obviated by cheap memory and computing power.


As for whether 65816 counts as 16-bit or 8-bit, I'm with Harrison Ford on this matter. :beer: :D
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176240)
Quote:
As for whether 65816 counts as 16-bit or 8-bit, I'm with Harrison Ford on this matter. :beer: :D

Ahahah, :mrgreen:
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176242)
For the sake of accessibility, and in case the talk show clip to which rainwarrior linked gets taken down by the talk show's copyright owner, let me summarize it for the record:

Harrison Ford is a guest on a late night talk show. (I'm guessing it's to promote a recently released Indiana Jones film.) When someone is quizzing him on the color of the tip of the whip carried by Jones, Ford replies: "Who gives a $#!+?"
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176243)
OK, I found this in WDC's official documentation:
Quote:
The WDC W65C816S is a fully static CMOS 16-bit microprocessor featuring software compatibility* with the 8-bit NMOS and CMOS 6500-series predecessors.
A software switch determines whether the processor is in the 8-bit "emulation" mode, or in the native mode, thus allowing existing systems to use the expanded features.


I think there is no doubt now .. :)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176247)
TOUKO wrote:
But guys, do you think that the 65816 is a 16-bit processor or not?

I think you're bent on dragging an ever so pointless debate from the '90s console wars era to the present. :P
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176248)
Ramsis wrote:
TOUKO wrote:
But guys, do you think that the 65816 is a 16-bit processor or not?

I think you're bent on dragging an ever so pointless debate from the '90s console wars era to the present. :P

Yes, I know, but personally I don't care about those details; even if the 65816 were a 4-bit CPU, for me only the performance counts. :wink:
But some MD fans claim that the 65816 is weak because it's an 8-bit CPU, and that the 68k is much faster because it's a 32-bit CPU; we all know the song.
This CPU is not perfect, it has a low clock frequency, coupled with crappy WRAM, and yet the SNES has some wonders like Super Aleste, RR², SF2, the DKC series, etc.

I like all the machines of that era, with their pros and cons.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176249)
TOUKO wrote:
But guys, do you think that the 65816 is a 16-bit processor or not? That's in fact also the question :P

I think it is a 16-bit processor because it has a 16-bit ALU, which is really the heart of the CPU. However, we have to admit the 8-bit data bus is a limitation, but also an advantage, as it can connect to more traditional ROMs and address decoding chips, which would be harder to use if the data bus were genuinely 16-bit.

TOUKO wrote:
The SNES's sprite system is also bad and needs to be rethought; this NES heritage should go and be replaced by a real SAT, with multiple sizes on screen at the same time.

Wow, this makes no sense, the SNES indeed allows multiple sprite sizes on screen at the same time - you are limited to 2 sizes but in software you can simulate anything with metasprites so it's not really a limitation.

Quote:
What were they thinking when they implemented the 64x64 sprite size, for instance?

64x64 sprite size is very useful to simulate an extra BG when mode 7 is used for the real BG. It's also possible with 32x32, but takes 4 times more sprites (you need 8x7 sprites to fill the screen as opposed to 4x4). A lot of games do this - I have not checked which ones use 32x32 or 64x64, though. Also, most do that only for parts of the screen, intelligently.
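
The 8x7 versus 4x4 figures can be reproduced with a short Python sketch. It assumes a 256x224 visible area, which is not stated in the post, and `sprites_to_cover` is a made-up helper name.

```python
def sprites_to_cover(screen_w: int, screen_h: int, size: int) -> tuple[int, int, int]:
    """Columns, rows and total square sprites of a given size to tile the screen."""
    cols = -(-screen_w // size)  # ceiling division
    rows = -(-screen_h // size)
    return cols, rows, cols * rows


if __name__ == "__main__":
    # Assumed visible area: 256x224 pixels.
    for size in (32, 64):
        cols, rows, total = sprites_to_cover(256, 224, size)
        print(f"{size}x{size}: {cols}x{rows} = {total} sprites")
```

With that screen size the counts are 56 and 16 sprites, so the ratio is 3.5, close to the "4 times" quoted above; the gap comes from the bottom row of 64x64 sprites hanging off the screen.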

Quote:
No SPC700 or 64KB of audio ram, just have the 65816 handle audio...

Considering how much the 65c816 is criticized for not being fast enough, I cannot imagine how poor it'd be at rendering audio. The 16 MHz ARM on the GBA is already the bare minimum, and most games have audio engines of poor quality.


Quote:
To be honest, I think that a major part of the SNES is badly designed... really.
For instance it has some nice sound capabilities, but they are severely limited by the small amount of dedicated memory... I can understand that the quantity was driven by cost, but then why design it in a way that we can't easily stream sound data to overcome that limited memory?
Same thing for the graphics capabilities: a large part of the video modes aren't really useful, the video chip is overcomplicated, and most of its features are useless.

Well, it saddens me to admit it, but you are partly right, especially about the SPC700, which would greatly benefit from extra memory. Nevertheless, it was probably supposed to be only 32k before the release of the console, so at least they expanded it to 64k.

The SPC/CPU design is fine, but they should have designed them in a way so that one CPU can interrupt the other (it doesn't really matter which), so that they could synchronize and stream data more easily.

If I were to "redesign the SNES" as Espozo suggests, I'd add NES backwards compatibility, and add accessible NES pAPU audio channels to SNES games, so it could sound similar to a GBA :) Even if this means sacrificing one or two sound channels for the SPC700.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176251)
Quote:
I think it is a 16-bit processor because it has a 16-bit ALU, which is really the heart of the CPU.

Yes, I think so too ..

Quote:
Wow, this makes no sense, the SNES indeed allows multiple sprite sizes on screen at the same time - you are limited to 2 sizes but in software you can simulate anything with metasprites so it's not really a limitation.

Of course, but it is far from being as practical as if the PPU allowed you to select multiple sizes directly in hardware, like on the MD or PCE ..
It's not normal to have to spend more CPU power to do that; it's not as if the CPU had plenty of cycles to spend on things like that. :wink:
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176252)
TOUKO wrote:
Yes, I know, but personally I don't care about those details

Oh really? :wink:

Let's see:

TOUKO wrote:
For me the bitness of a CPU depends on its ALU, but then the Z80 has a 4-bit ALU multiplexed to 8 bits.

TOUKO wrote:
Not for me: the 68k, like the 65816, has a 16-bit ALU. I said that even if the 68k can do 32-bit ops, they are split internally into two 16-bit ops, unlike 16-bit ops on the 65816.

It seems to be the same thing: I think 8-bit ops on the Z80 (because of the 4-bit ALU) are close to 32-bit ops on the 68k. Maybe this is why the Z80's efficiency is not so fabulous (but it's not a bad or weak CPU, only less efficient).

TOUKO wrote:
As for the MD's weaknesses: I dislike the FM chip. It is very slow and a low-grade FM chip IMO; too often the sound FX are terrible, although the music is really excellent in some cases (Treasure, Technosoft, Midnight Resistance). I also dislike the volume setting, which is too high (it seems to be hard to set correctly).
The PCM capabilities are bad, because you have no volume control or panning, and you must use timed code on the Z80 for correct playback. I noticed very distorted samples with your XGM driver (it's not your driver's fault, but the difficulty of syncing the sample stream with the 2612's DAC), though not always; it depends on which sample you are playing. Plus there's the stupid Z80 banking system.

TOUKO wrote:
Quote:
Is the Z80 banking any more stupid than MMC1? And is the PCM any worse than NES $4011?

If I remember correctly, it's serial, 1 bit at a time; you spend 100+ cycles to map a bank.

Quote:
And is the PCM any worse than NES $4011?

Worse, no: they are not scratchy or muffled, only distorted, but it's very audible, and you must be careful about DMA contention, which can be problematic.

Quote:
The real problem with the 6502 is that it isn't really designed to run C well. The 65816 improves on this somewhat with the d,S and (d,S),Y addressing modes, as well as the base pointer register that TCD and PLD set. But arrays of structs in the heap are still a mess because there's no quick way to add a number other than 1 or -1 to an index register.

Of course, in that case you're right: 65xxx CPUs are not suited to high-level languages. This is why I'd go for a 68k in a PC, where you need this kind of CPU (large addressable memory, no bank mapping, good performance for high-level languages, no need for expensive memory, which is useful when you need a large amount).
For a game system with limited resource requirements I'd prefer a 65xxx: fast, a huge margin for optimisation, a fast interrupt system, fewer bytes taken by compiled code, not expensive at all, very well suited to game systems.

TOUKO wrote:
the CPU can be halted during the whole of vblank and cannot play any PCM in that period.

(…)

CPU power on the SNES is problematic because of the slow WRAM; rather than 128 KB, Nintendo should have used faster RAM, but less of it, like 32/64 KB.
The SNES's sprite system is also bad and needs to be rethought; this NES heritage should go and be replaced by a real SAT, with multiple sizes on screen at the same time.

TOUKO wrote:
This is why the SPC exists: to avoid PCM interruptions when the 65816 is halted by DMA. It's the same for the Z80 in the MD (but not as critical as on the SNES, because PCM is not its main audio feature).

If you want to remove the SPC, you must replace the entire audio system with a classic one, like FM for example.

TOUKO wrote:
1 sound generator + 1/2 PCM voices (like ADPCM) would be the best combo for me; plus an ADPCM chip only needs to be driven at the start of a sample, and is not affected by DMA contention.

Whoa, sooooo many details … which you totally don't care about. :mrgreen:

The lesson to learn here, people, is to leave the past behind for good. We're not in the 1990s any more. Eat that. And stop bringing up pointless '90s discussions over and over again. Stop listening to those who're into the field simply for their own need of "nostalgia," which they'll typically bring up as an argument to support their untenable views.

Ramsis (signing off of this ridiculous thread)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176253)
Quote:
Of course, but it is far from being as practical as if the PPU allowed you to select multiple sizes directly in hardware, like on the MD or PCE ..

Indeed, especially considering there's 4 documented sizes, so selecting them directly would be doable with 2 bits, just 1 more than selecting small/big like it is done. Nevertheless, I do not think it is a major issue.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176261)
Lol Ramsis, you got me :D
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176262)
Just gonna say: the biggest bottleneck with the SNES is not the CPU per se, it's how setting up the hardware to do something is ridiculously complex (in terms of what operations the CPU has to do). Yeah, I'm talking about how bits are arranged, how memory is laid out and such. It definitely makes things harder to do than they could have been.

Bregalad wrote:
If I were to "redesign the SNES" as Espozo suggests, I'd add NES backwards compatibility, and add accessible NES pAPU audio channels to SNES games, so it could sound similar to a GBA :) Even if this means sacrificing one or two sound channels for the SPC700.

Honestly, I imagine the main reason NES compatibility is not in the SNES (despite there being traces of it in the design) is that games rely so absurdly much on every single timing detail of the NES that it would be completely unfeasible to pull it off with modified hardware short of having two separate PPUs (EDIT: yeah I know the S-PPU is already split in two, I mean adding the NES one alongside those as well). I don't think I need to explain that was never going to be worth the cost.

On the other hand it would be interesting to see how it'd have been if the SNES was some sort of improved NES instead of its own new thing. Even if they had to ditch backwards compatibility due to different timings.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176263)
I agree, Sik: the limited cycles available would not have been a problem if the hardware was clean.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176264)
TOUKO wrote:
How about the SA-1? Is it simply a 65816 core, or an improved/custom one? (I'm not talking about DMA and all the other embedded features, only the CPU core.)


It's a totally standard 65816, with a custom memory system wrapped around it to let it run at 10MHz without needing excessively fast (for 1996) external ROM and RAM. Specifically, it's got separate busses to ROM and to RAM, the ROM bus is 16-bit with a prefetch queue for instructions, the RAM bus has a write buffer, and there's 2KB of fast internal RAM that can be used for direct page and the stack.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176265)
I think any debates about which console is more powerful have ceased to be relevant decades ago. Do you know which console is even more powerful than the SNES or Genesis? The N64, PS1, Saturn, Dreamcast, PS2, GameCube, Xbox, PS3, Wii, Xbox 360, PS4, Wii U, Xbox One, and so on.

Let's face it, we aren't programming for these things because they're the best thing on the market. It's probably at least 70% nostalgia, and 30% interest in working with limited, old hardware. At that point, it doesn't matter which is more powerful, it's just a matter of personal taste when it comes to which constraints you want to deal with.

Which is more powerful, the NES or SNES? Obviously the SNES, but somehow these are still called the nesdev forums. Why? Because console power is irrelevant here.

rainwarrior wrote:
It's like that myth about a goldfish that grows to meet the size of its bowl.

While we're derailing endlessly... for the most part this isn't a myth; given the proper environment, goldfish grow to be more than a foot long, and can live for more than ten years.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176272)
AWJ wrote:
TOUKO wrote:
How about the SA-1? Is it simply a 65816 core, or an improved/custom one? (I'm not talking about DMA and all the other embedded features, only the CPU core.)


It's a totally standard 65816, with a custom memory system wrapped around it to let it run at 10MHz without needing excessively fast (for 1996) external ROM and RAM. Specifically, it's got separate busses to ROM and to RAM, the ROM bus is 16-bit with a prefetch queue for instructions, the RAM bus has a write buffer, and there's 2KB of fast internal RAM that can be used for direct page and the stack.

Thanks for the info :wink:
But what do you mean by "not excessively fast"? Does it add some wait states when accessing ROM or external RAM?

@Nicole: The topic here has derailed; it was more about the wrong info about the SNES CPU than a console war.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176273)
Nicole wrote:
rainwarrior wrote:
It's like that myth about a goldfish that grows to meet the size of its bowl.

While we're derailing endlessly... for the most part this isn't a myth; given the proper environment, goldfish grow to be more than a foot long, and can live for more than ten years.

The myth is that it's the size of the tank, which is not really the limiting factor, though related.

My point was that the complaints that "if it only had more memory" were about the size of your goals. When I was working on PS3, devs constantly complained about "only" having 256 MB of system memory, even though this is 8 times as much as the previous generation had. In another time, someone managed to make a complex 3D space shooter like Star Raiders with only 128 bytes of RAM.

Nowadays you can just have recorded music, or even complex mixed music with relatively modest memory and CPU requirements (compared to what tends to be available). Game audio just doesn't typically have needs that are constrained by the modern environment, and it had been a tremendously free situation. Eventually you can escape!

Indie games got to escape from practical constraints like that a lot earlier than AAA games, since they're not competing to use everything they can to make their game look better than others in a technical way. Modern AAA games are still a struggle, but the pace of expansion is slowing down, and the sub-AAA market is growing tremendously. There's a light at the end of this tunnel.


Anyhow, you could have twice as much music memory on the SNES, and it would make the thing you were struggling to do in 64k easy, but if you'd started with 128k as your constraint you would have filled that up quickly and be complaining that it's not 256k. Audio memory constraints became easy at ~20 megabytes, not 128k. ;) If you want to have a better time working with 64k, maybe practice with only 32k for a while.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176274)
Nicole wrote:
Do you know which console is even more powerful than the SNES or Genesis? The N64, PS1, Saturn, Dreamcast, PS2, GameCube, Xbox, PS3, Wii, Xbox 360, PS4, Wii U, Xbox One, and so on.

Then which unpatented and fully cracked console is more powerful than the Super NES or Genesis? Everything newer than N64 is still patented, which leaves developers open to suit over claims of the form "program storage medium for the computing machine of claim 1". I'm not sure whether the N64's CIC is cloned. Saturn disc protection was just recently figured out, and even then the details aren't public, nor can burners replicate the PS1 wobble track. And opening the Xbox One and Wii U to individual developers is also a very recent development (as in earlier this month).

Nicole wrote:
Let's face it, we aren't programming for these things because they're the best thing on the market. It's probably at least 70% nostalgia, and 30% interest in working with limited, old hardware.

Part of that 30% is that the graphics architecture is qualitatively different enough from OpenGL to be interesting. Otherwise, you could just stick with PC, which has lower entry cost. The other is they're cracked deeply enough for hobbyists to make original software and sell copies that work on unmodded systems.

rainwarrior wrote:
Nowadays you can just have recorded music

That's been true even on handhelds since the Game Boy Advance.

rainwarrior wrote:
If you want to have a better time working with 64k, maybe practice with only 32k for a while.

Case in point: the entire soundtrack of Haunted: Halloween '85 fits in 16K. At least on SPC700, you get to load in new samples and song data before each track.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176275)
TOUKO wrote:
AWJ wrote:
TOUKO wrote:
How about the SA-1? Is it simply a 65816 core, or an improved/custom one? (I'm not talking about DMA and all the other embedded features, only the CPU core.)


It's a totally standard 65816, with a custom memory system wrapped around it to let it run at 10MHz without needing excessively fast (for 1996) external ROM and RAM. Specifically, it's got separate busses to ROM and to RAM, the ROM bus is 16-bit with a prefetch queue for instructions, the RAM bus has a write buffer, and there's 2KB of fast internal RAM that can be used for direct page and the stack.

Thanks for the info :wink:
But what do you mean by "not excessively fast"? Does it add some wait states when accessing ROM or external RAM?


Nobody's done hardware tests to suss out all the details, but the external memories apparently run at half the speed of the CPU (i.e. 5MHz). According to the manual, the following operations incur a wait state:

-When the SA-1 and the S-CPU both access ROM at the same time
-Reading data from external RAM or ROM
-Writing to external RAM "when the write buffer is full" (no explanation given as to what conditions can cause that state)
-Jumping/branching to an unaligned (odd) ROM address

Also, unlike the S-CPU, the SA-1 can execute instructions and DMA at the same time, but one or the other has to stall if both are accessing the same memory (there's a register bit that controls whether the CPU or DMA has priority)

No SNES emulator to my knowledge even attempts to emulate SA-1 memory contention. bsnes always runs it at the full 10MHz and lets DMA occur instantaneously.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176277)
AWJ wrote:
Nobody's done hardware tests to suss out all the details, but the external memories apparently run at half the speed of the CPU (i.e. 5MHz). According to the manual, the following operations incur a wait state:

-When the SA-1 and the S-CPU both access ROM at the same time
-Reading data from external RAM or ROM
-Writing to external RAM "when the write buffer is full" (no explanation given as to what conditions can cause that state)
-Jumping/branching to an unaligned (odd) ROM address

Also, unlike the S-CPU, the SA-1 can execute instructions and DMA at the same time, but one or the other has to stall if both are accessing the same memory (there's a register bit that controls whether the CPU or DMA has priority)

No SNES emulator to my knowledge even attempts to emulate SA-1 memory contention. bsnes always runs it at the full 10MHz and lets DMA occur instantaneously.

Thanks, that is precious information. It is surprising that Nintendo could do that kind of stuff, and still went with that Ricoh version of the 816.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176279)
rainwarrior wrote:
My point was that the complaints that "if it only had more memory" were about the size of your goals.

Yeah, I certainly don't disagree with your point.

tepples wrote:
Then which unpatented and fully cracked console is more powerful than the Super NES or Genesis?

Well, PC, more or less, in any way that really matters to an indie dev. People don't develop for the SNES because it's the newest unencumbered thing out there.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176281)
rainwarrior wrote:
When I was working on PS3, devs constantly complained about "only" having 256 MB of system memory, even though this is 8 times as much as the previous generation had.

To be fair, that has a lot to do with comparing it to the 360. More specifically: the 360 has 512MB of unified RAM (i.e. for both CPU and GPU), while the PS3 had 256MB of RAM and 256MB of VRAM. This was already annoying as the PS3 imposed a limit on how memory could be used, but to top it off streaming became way more commonplace in that generation: in the case of the 360 you just streamed into memory and the GPU could use it directly, while on the PS3 you had to stream it into RAM, then copy it into VRAM, and that temporary copy is effectively lost memory.

Reminds me of how people complain that the SNES can only use 16KB for sprites even though there's 64KB worth of video memory =O)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176282)
tepples wrote:
This makes streaming harder (you need HDMA)

You don't need HDMA. Lots of people have done streaming engines where the S-CPU pokes in data manually.

In my personal opinion, you need HDMA to get good bandwidth on demand without crippling everything else that needs the S-CPU. But my definition of "good bandwidth" may differ from others'...

AWJ wrote:
the external memories apparently run at half the speed of the CPU (i.e. 5MHz)

Does this imply that the bank byte is demuxed from the data lines, allowing the effective memory speed to double? I mean, on top of the doubling from it being 16-bit; 5.37 MHz is still pretty fast for a SNES cartridge, and the SA-1 seems to have been a little too popular in Japan for it to have required extra fast memory on top of the cost of the chip...

And how much would it have cost to put this sort of memory controller (without the fancy collision control, presumably) in the actual Super Famicom in 1990?

I'm just kinda wondering if Nintendo passed up a legitimate opportunity to nearly quadruple the speed of the S-CPU without changing the memory response time spec...
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176288)
93143 wrote:
AWJ wrote:
the external memories apparently run at half the speed of the CPU (i.e. 5MHz)

Does this imply that the bank byte is demuxed from the data lines, allowing the effective memory speed to double?


No, it implies the SA-1 slows down to 5MHz whenever accessing external memory directly (not through the instruction prefetch queue or the RAM write buffer)

The S-CPU's external bus does demux the address and data lines (and has individual /RD and /WR strobes), but internally it still seems to use the standard 2-phase 65xx bus. I expect changing that would have required an almost total redesign of the 65816 core.

ETA: Oh wait, I see what you're asking. Yeah, I expect "phi1" on the SA-1 is always half a 10MHz cycle and "phi2" is stretched to 3 half-cycles. That corresponds to how the S-CPU works (where phi1 is always 3 master clocks and phi2 is 3 or 5 clocks for fast/slow address ranges). If I'm right, SA-1 would have exactly the same memory speed requirements as a normal "fast" (3.58MHz) SNES cart.
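
The arithmetic behind that guess can be checked with a small sketch. It assumes an NTSC master clock of roughly 21.477 MHz and that one half-cycle of the SA-1's ~10.74MHz clock equals one master clock; the struct and function names are made up for illustration, and the SA-1 figures are the unverified estimate from the post, not measured behavior.

```cpp
#include <cassert>

// Approximate NTSC SNES master clock in Hz (assumed value).
constexpr double kSnesMasterHz = 21477272.0;

// One bus cycle, split into its two phases, measured in master clocks.
struct BusCycle {
    int phi1;  // address phase
    int phi2;  // data phase: memory has to answer within this window
};

inline int totalClocks(BusCycle c) { return c.phi1 + c.phi2; }

// How many such bus cycles fit in one second.
inline double accessRateHz(BusCycle c) { return kSnesMasterHz / totalClocks(c); }

// Length of the phi2 window in nanoseconds.
inline double phi2WindowNs(BusCycle c) {
    assert(c.phi2 > 0);
    return c.phi2 * 1e9 / kSnesMasterHz;
}

// S-CPU: phi1 is 3 master clocks, phi2 is 3 (fast) or 5 (slow).
constexpr BusCycle kScpuFast{3, 3};   // 6 clocks -> ~3.58MHz
constexpr BusCycle kScpuSlow{3, 5};   // 8 clocks -> ~2.68MHz
// Hypothetical SA-1 external access: phi1 = 1 half-cycle, phi2 = 3 half-cycles.
constexpr BusCycle kSa1External{1, 3};  // 4 clocks -> ~5.37MHz
```

Under these assumptions the SA-1's external access rate comes out near 5.37MHz, while its phi2 window (3 master clocks, about 140 ns) is the same length as a fast S-CPU access.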
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176289)
Sik wrote:
rainwarrior wrote:
When I was working on PS3, devs constantly complained about "only" having 256 MB of system memory, even though this is 8 times as much as the previous generation had.

To be fair, that has a lot to do with comparing it to the 360.

It did, yes, if you were cross developing with it. I intentionally left that out because I was trying to avoid drawing out a 360 vs PS3 comparison in this thread, but you'd hear the same complaints if you were making an exclusive PS3 game (or if you were making an exclusive 360 game, or a Wii game, etc.). Nobody was ever satisfied with how much RAM they got. It's always full, no matter what you're doing. ;P 256 MB is a ridiculously large amount of memory for a game, but there's always a way to fill it.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176291)
AWJ wrote:
The S-CPU's external bus does demux the address and data lines

There must be something I don't get here. Why are the memory speeds (200/120 ns) apparently specified under the assumption that they can't start responding until the beginning of phi2?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176295)
Damn, four pages in one day. Slow down, people :P

The 65816 has a 24-bit address bus and an 8-bit data bus. And probably by necessity, is designed to do 16-bit operations split over two cycles each, with a few exceptions (like XBA.)

I would still call it a 16-bit CPU, but the designation has always been meaningless, driven by marketing for bullshit like the "64-bit Jaguar" and "128-bit Dreamcast."

Not worth the time to think about, let alone debate for one view or another.

Nicole wrote:
I think any debates about which console is more powerful have ceased to be relevant decades ago.


Exactly. The only thing that matters are the games.

The reason I worked on the SNES first is because I love the games. I'm a huge JRPG fan, and the SNES has: Lufia 1&2, Breath of Fire 1&2, Final Fantasy IV-VI, Chrono Trigger, Bahamut Lagoon, Rudra's Secret Treasure, Star Ocean, Tales of Phantasia, Tengai Makyou Zero, Dragon Quest I&IIR, IIIR, V, VI, Aretha 1&2, Dai Kaijuu Monogatari I&II, and on and on.

The Genesis has........... Phantasy Star IV. Which is also amazing.

But that we're still waging the "Genesis does what Nintendon't" war in 2016 is just beyond the pale ridiculous.

And indeed, I'm the #1 SNES fan out there (bought over 2100 games, know more about it than probably!! anyone else alive), and yet I'm working on a Genesis emulator now.

AWJ wrote:
No SNES emulator to my knowledge even attempts to emulate SA-1 memory contention. bsnes always runs it at the full 10MHz and lets DMA occur instantaneously.


I suspect we could fix the latter without much of a performance penalty. v100 shipped with broken SA-1 IRQs, still gotta do a patch release for that, but I've been too busy.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176297)
byuu wrote:
I suspect we could fix the latter without much of a performance penalty. v100 shipped with broken SA-1 IRQs, still gotta do a patch release for that, but I've been too busy.


I'm not worried about performance so much as about actually breaking games by changing the behavior without adequate research. It seems most likely that SA-1 DMA can either stall the SA-1 or run in parallel with it, depending on the priority register setting and on the memory regions used by the DMA, the SA-1 and the S-CPU. If you make DMA stall when it shouldn't, games will run too slowly and in the worst case communication between the CPUs might break down. If you let DMA run in the background when it should actually stall, games might try to use data that hasn't finished transferring yet (because on real hardware execution wouldn't resume until the transfer was finished)

The current behavior is definitely wrong but at least it lets all games work (though I am a bit suspicious of the unmapped accesses SD Gundam does)

Emulating the Genesis should give you some grounding in emulating bus contention in a heterogeneous multi-processor system (AFAIK the relationship between the 68K and the Z80 is somewhat close to that between the S-CPU and the SA-1)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176299)
Quote:
It seems most likely that SA-1 DMA can either stall the SA-1 or run in parallel with it, depending on the priority register setting and on the memory regions used by the DMA, the SA-1 and the S-CPU.


Oh, gotcha. That explains why I went with the choice I did, then. Thought it was weird I'd leave out some simple step(cycles) calls.

Quote:
Emulating the Genesis should give you some grounding in emulating bus contention in a heterogeneous multi-processor system (AFAIK the relationship between the 68K and the Z80 is somewhat close to that between the S-CPU and the SA-1)


Yeah, I'm "looking forward to it."

It's not that I don't understand the problem, it's that libco is an awful choice to emulate such a thing.

I may have to force the synchronization to be once-per-opcode for the 68K/Z80. We'll see how it goes. I'm currently averaging ~1800fps with only the 68K core running alone. Still need to add the Z80, YM2612, PSG, VDP.

Another possibility is to consider C++17 stackless coroutines. I might be able to write a scheduler wrapper around libco and it.

EDIT: you're gonna flip when you see the 68K core, too. For the sake of performance, I had to templatize the size parameter of instructions (8-bit, 16-bit, 32-bit modes ... so triplicates of most instructions and the effective address decoding routines.) I've only implemented about 50% of the instructions, and already the core is 72KiB of source code (the largest of all 15 processors in higan) and generates a 700KiB object file (the next largest is ~200KiB.)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176300)
byuu wrote:
It's not that I don't understand the problem, it's that libco is an awful choice to emulate such a thing.


I don't think the problem is with libco per se, but with your scheduler design that stores each device's time base in terms of clocks relative to a parent device. That works beautifully with the SNES where the devices form a natural tree (the PPU and cartridge coprocessors only communicate with the S-CPU, and the DSP only communicates with the SMP), but it falls apart when both CPUs can communicate with the video and audio hardware.

You probably hate how much I bring up MAME, but I think you should take a look at its scheduler design. In MAME each device's time base is stored in a common unit (attoseconds) relative to power-on. In MAME this requires using 128-bit integers to avoid dealing with overflow, but you could get away with using master clocks as the common unit of time since the MD only has one crystal (I think... Someone correct me if I'm wrong). Once per emulated frame or so, check if any of the time bases is close to overflowing and if any of them is, subtract the lowest time base from all of them (or just subtract the lowest time base unconditionally--probably simpler and faster actually).

I'm not surprised in the least that your 68K dwarfs any of the other CPU cores you've written. The 68K really is the Cadillac of 16 bit microprocessors. 68K bigots who don't consider the 65816 a "real" 16 bit CPU because of its Spartanness should realize that their own favorite CPU is just as much an outlier in the opposite direction.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176301)
Doesn't the Genesis still derive all clocks from the same 53.6931 MHz crystal? Z80 is f/15, dot clock is f/8 or f/10, and 68000 is f/7.
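
For reference, those dividers work out as in the sketch below. It assumes the NTSC crystal is 53.693175 MHz (the post rounds it to 53.6931); the constant and function names are hypothetical.

```cpp
#include <cassert>

// NTSC Mega Drive / Genesis master crystal in Hz (assumed exact value).
constexpr double kMdMasterHz = 53693175.0;

// Frequency of a clock derived from the master crystal by an integer divider.
inline double dividedHz(int divider) {
    assert(divider > 0);
    return kMdMasterHz / divider;
}

// Dividers quoted in the post.
constexpr int kM68kDivider = 7;        // ~7.67 MHz
constexpr int kZ80Divider = 15;        // ~3.58 MHz
constexpr int kDotDividerFast = 8;     // ~6.71 MHz
constexpr int kDotDividerSlow = 10;    // ~5.37 MHz
```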
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176305)
> I don't think the problem is with libco per se, but with your scheduler design that stores each device's time base in terms of clocks relative to a parent device.

You're correct in that the design to have one signed integer represent the time difference between two components (neither is the parent) won't work for the Genesis. It did work for the five emulators I've written previously, but here ... both the 68K and Z80 can talk with the PSG.

The solution isn't that complicated, though. We just need three counters instead of two:
68K <> Z80
Z80 <> PSG
68K <> PSG
The tricky part will be where to put them and how to name them.

But libco is indeed an issue here. The thing is, whenever the 68K accesses shared memory with the Z80, or vice versa, it has to switch to the other to catch up on time. This is always going to be immensely painful. The only magic trick is rollbacks, which Nemesis has already tried with the Genesis.

We know from experience that with the DSP being as simple a device as it is, that a state machine results in a speedup over using libco. You and I disagree on how much ... I observed maybe 3-5%, I think you said 8-10%? But given that's against the total emulator framerate, that's pretty significant for a minor change.

We'll probably be doing around one million of these switches a second with the 68K<>Z80, just like with the SMP<>DSP. The smarter move here would be to try and write the Z80 as a state machine, and enslave it to the 68K. But, you know how stubborn I can be with consistency.

> You probably hate how much I bring up MAME

I get it, at least. You're a MAME dev. Ostensibly you work on it because you believe in what it's doing and how it's designed. It's like me bringing up cooperative threading all the time. I respect that.

> I think you should take a look at its scheduler design

I don't think I'd be too successful tearing at its source code in isolation. But I'd be up for hearing more about it from you, if you were up for it. If not, no big deal.

> the MD only has one crystal

That is correct. Every chip is powered off clock dividers against it.

> The 68K really is the Cadillac of 16 bit microprocessors.

Agreed. From my perspective, the 68K is atrocious.

I understand that as a developer writing code for the system, the 68K is indeed like a Cadillac and the 65816 like a Ford Pinto.

But I admire simplicity more than anything else. I think that's evident in my extreme NIH and efforts to minimize code. I'm extremely proud of having an 8KiB ZIP decoder, an 8KiB PNG decoder (requires the ZIP decoder), a 20KiB web server, etc.

The 68K, from a backend hardware design perspective, is an absolute mess. Many instructions are missing certain effective addressing modes, for absolutely no discernible reason (sometimes you can see why; but often it just feels completely arbitrary whether you get to use the PC with index/displacement modes. Newer models start adding them back, so it's clearly possible.) The instruction encoding is just completely off the wall ... sometimes 00 = byte, 01 = word, 10 = long ... sometimes you get these "opmode" 3-bit prefixes to encode what's effectively only three values. Sometimes it's one bit. Sometimes it's two bits, but the bits are in a different ordering ... 10 = byte, 11 = word, 01 = long. For byte/word modes, address register ops usually sign extend, data register ops usually leave the upper bits alone. But there's always exceptions. MOVEM sign extends to data registers, too. MOVEQ doesn't sign extend, but fills all the bits of the data registers. Many instructions ignore the size prefix completely when registers are used as destination addresses. Shifting by zero in a register does weird things with flags, whereas shifting by zero with an immediate turns into shift-by-eight. Sometimes an immediate is limited to 3-bits, sometimes 8-bits, sometimes you can load 16-bits or 32-bits from the opcode extensions. Lots of instructions do spurious read cycles for no reason at all. I don't even want to get into 68020 and above ... those just make things a dozen times worse. This is by far the nastiest chip I've worked with, and I've emulated ARM7 and 80186. I feel like this chip had five or more designers creating instructions, and none of them worked with each other to ensure any sort of logical consistency.
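
Two of the quirks above can be illustrated with tiny decode helpers: the common two-bit size field (00 = byte, 01 = word, 10 = long) and the 3-bit immediate shift count where zero means eight. This is only a sketch; the names are hypothetical and not taken from higan's 68K core, and it assumes the shift count sits in bits 11-9 of the opcode word.

```cpp
#include <cassert>
#include <cstdint>

enum class Size { Byte, Word, Long, Invalid };

// Decode the common two-bit size field: 00 = byte, 01 = word, 10 = long.
// The value 11 is not a size in this encoding, so it is reported as Invalid.
inline Size decodeSizeField(std::uint8_t bits) {
    assert(bits <= 3);
    switch (bits) {
    case 0: return Size::Byte;
    case 1: return Size::Word;
    case 2: return Size::Long;
    default: return Size::Invalid;
    }
}

// Immediate shift/rotate count: a 3-bit field (bits 11-9 of the opcode word)
// where an encoded 0 stands for a shift by 8.
inline unsigned immediateShiftCount(std::uint16_t opcode) {
    unsigned field = (opcode >> 9) & 7u;
    return field == 0 ? 8u : field;
}
```

For example, the opcode word 0xE148 has a zero count field, so the helper reports a shift of 8, while 0xE348 reports a shift of 1.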

But again ... before the 68K defenders come in ... I get it! It's a dream to program for it! The 65816 only has one accumulator!! Madness! I'm right there with you on the user-end perspective.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176306)
byuu wrote:
The solution isn't that complicated, though. We just need three counters instead of two:
68K <> Z80
Z80 <> PSG
68K <> PSG
The tricky part will be where to put them and how to name them.


That seems really likely to get you into a paradoxical situation where device A is ahead of device B, device B is ahead of device C and device C is ahead of device A. And the number of counters explodes as you add devices (every device needs a counter for every other device that can possibly interact with it)

Here's how I'd design a scheduler for the MD: each device has one, unsigned counter. Each time it executes a cycle, it adds n to its counter where n is its divider (so the 68K adds 7 per cycle, the Z80 adds 15, etc.) To determine whether device A is ahead of or behind device B, you can just compare their counters, since they're all in the same unit (53MHz master clocks). To keep the counters from ever overflowing, once per frame go through all the counters, find the smallest value, and subtract that from all of the counters (so the smallest counter becomes zero and all the others are rebased relative to it)
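
A minimal sketch of that counter scheme, with made-up names (this is not higan's or MAME's actual scheduler code): every device counts in master clocks, comparisons are direct, and a periodic rebase keeps the counters small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One emulated device; the type and member names are illustrative only.
struct Device {
    std::uint64_t clock = 0;   // elapsed time in master clocks
    std::uint32_t divider;     // master clocks per device cycle (68K: 7, Z80: 15)

    explicit Device(std::uint32_t d) : divider(d) { assert(d > 0); }

    // Advance this device by a number of its own cycles.
    void step(std::uint32_t cycles) {
        clock += static_cast<std::uint64_t>(cycles) * divider;
    }
};

// True when a has run further than b; valid because both use the same unit.
inline bool isAhead(const Device& a, const Device& b) {
    return a.clock > b.clock;
}

// Meant to run about once per frame: subtract the smallest counter from all
// of them, so the slowest device sits at zero and the rest stay relative to it.
inline void rebase(std::vector<Device*>& devices) {
    if (devices.empty()) return;
    std::uint64_t lowest = devices.front()->clock;
    for (const Device* d : devices) lowest = std::min(lowest, d->clock);
    for (Device* d : devices) d->clock -= lowest;
}
```

Because every counter is in the same unit, adding a device means adding one counter rather than one counter per pair of devices, and "A ahead of B ahead of C ahead of A" cannot occur.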

If you're worried about performance, I think you'll have to bite the bullet and make a scheduler capable of handling both cothreaded and state-machine devices. Here's one way you could do it: make Processor a class with all the bits needed to interact with the scheduler except for a cothread pointer, and with a pure virtual method enter(). Make a subclass, CothreadedProcessor, that has the cothread pointer and overrides enter() with an implementation that switches to it. All cothreaded devices (mainly CPUs) will inherit from CothreadedProcessor. State machine devices, on the other hand, will inherit directly from Processor, and override enter() with the main loop of their state machine.

Performance should be pretty good because the compiler should be able to devirtualize most or all calls to enter(). By using virtual methods, none of the devices needs to know which of the other devices is cothreaded and which is a state machine (which I believe was the problem you had with the old, heavily hardcoded bsnes scheduler)
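
Roughly, the class layout could look like the sketch below. The cothread handle is replaced with a std::function placeholder so the example stands alone; a real build would store a libco cothread and switch to it instead. ToneGenerator is a hypothetical state-machine device, not an actual higan class.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Everything the scheduler needs to know about a device.
struct Processor {
    std::uint64_t clock = 0;        // time base in master clocks
    virtual ~Processor() = default;
    virtual void enter() = 0;       // run the device until it has to yield
};

// Devices that keep their own stack (mainly CPUs).
struct CothreadedProcessor : Processor {
    // Placeholder for the cothread handle (assumption: a real implementation
    // would switch to a libco cothread here rather than call a function).
    std::function<void()> resume;

    void enter() override {
        assert(resume);
        resume();
    }
};

// A simple state-machine device: each enter() call advances one step.
struct ToneGenerator final : Processor {
    int phase = 0;

    void enter() override {
        phase = (phase + 1) % 4;    // hypothetical 4-step state machine
        clock += 15;                // hypothetical cost of one step
    }
};
```

The scheduler only ever calls enter() through a Processor pointer, so it does not care which kind of device it is resuming. Marking concrete classes final, as with ToneGenerator here, gives the compiler a better chance of devirtualizing the call.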
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176307)
Quote:
Here's how I'd design a scheduler for the MD: each device has one, unsigned counter. Each time it executes a cycle, it adds n to its counter where n is its divider (so the 68K adds 7 per cycle, the Z80 adds 15, etc.) To determine whether device A is ahead of or behind device B, you can just compare their counters, since they're all in the same unit (53MHz master clocks). To keep the counters from ever overflowing, once per frame go through all the counters, find the smallest value, and subtract that from all of the counters (so the smallest counter becomes zero and all the others are rebased relative to it)


Thank you. I implemented this in my WonderSwan Color core, and the results are very impressive:
* Riviera: 177fps -> 195fps
* Final Fantasy: 158fps -> 165fps
Very impressive for such a simple change. And it perfectly eliminates the problem I was having with the 68K<>PSG<>Z80 scenario. And now I can build up the scheduler class to do a lot more heavy lifting.

It's trickier though with the SNES where there are multiple independent clock rates.

Quote:
If you're worried about performance, I think you'll have to bite the bullet and make a scheduler capable of handling both cothreaded and state-machine devices.


Indeed, that's exactly what I'm thinking of doing. I want to wait to see how well C++17 works out, because there's not an easy way to make a state machine right now without all the red tape.

Quote:
By using virtual methods, none of the devices needs to know which of the other devices is cothreaded and which is a state machine (which I believe was the problem you had with the old, heavily hardcoded bsnes scheduler)


That's exactly correct. It was an even bigger mess when I had the performance/balanced profiles, so it could change out from under you based on compilation flags.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176309)
rainwarrior wrote:
Stef wrote:
For instance it has some nice sound capabilities but severely limited by the small amount of dedicated memory...

It was plenty good enough to produce some of the best game soundtracks of all time.

Whenever memory is limited, it is a problem for anyone working with it. This was always a major part of game development. If it had twice as much RAM you'd still be complaining. ;) It's like that myth about a goldfish that grows to meet the size of its bowl. Unless your goals are drastically smaller than the memory you have, it's going to be one of your biggest problems, and solving that problem is what a good developer does.
...


I agree with that, and in fact, if you take my next sentence, I said that I understand there is a cost constraint anyway, which drives the amount of memory you can have. The problem for me is more that it is very complicated to stream any data to that memory because of the design. 64KB can be enough for a single piece of music but is really short when you add SFX and digitized voices on top of that... Having something similar to the MD design (having the SPC use the A-Bus in cycle-steal mode) would have made the whole sound system much more powerful overall.


byuu> MOVEQ *does* sign extend...
And I agree the M68000 instruction decoding part is a bit fudgy; having a good hexadecimal opcode table is important to avoid mistakes, but I guess you have a bunch of good documentation for that.
Still, looking back at my old C68K 68000 generator table, I don't think it's that much of a mess.
At least the EA (Effective Address) fields for both source and destination are always located at the same position, as is the size field (with the single exception of SUBA). Also, the size field can be encoded in 1 bit (instead of 2) when the only possible sizes are word or long (for Ax register operations), and that makes sense. I do agree there is some weirdness in instruction decoding sometimes... but for me it's not really worse than other CPUs I've played around with :p

Glad to hear you're working on an MD emulator, by the way :) I guess you already know about Exodus (written by Nemesis), which is the equivalent of bsnes but for the MD. I have an almost finished Gens 2 emulator (a rewrite from scratch) sitting somewhere on my hard drive (I wrote it a long time ago but never had the motivation to finish / release it), which was built on the same idea of using the master clock as the main synchronizer for all components. As you described, we run into trouble when the Z80 and 68000 can modify the same region; this is true for the PSG / VDP, but in fact it is also true for 68000 RAM (strangely, the Z80 can actually write it but not read it) or SRAM. But to be honest, I think that not a single game relies on it, as no game tries to write VDP, 68000 RAM or SRAM from the Z80. And when they access the PSG from the Z80 they don't do it from the 68000, and vice versa. So basically you can go ahead and choose to drive synchronization using a master reference such as the 68000 CPU (that was my case). As soon as the 68000 accesses or modifies the Z80 context or VDP context, you need to synchronize.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176314)
tepples wrote:
As for the "bit" designation for an entire console, I choose the widest data bus in the system, measured in word width times words per clock.
  • For NES, Game Boy, and Master System, it's 8.
  • For TG16 and Super NES, it's 16 (VDC/PPU data bus).
  • For Genesis, it's 16 (CPU data bus) or 16 (VDP data bus, 8 bits times 2 transfers per clock).
  • For Jaguar, it's 64.
  • And for Nintendo 64, it's also 64 (9-bit RAM with parity, so really 8 bits, times 8 transfers per clock).


Widest bus in SMS is VRAM bus which is 16 bits.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176318)
byuu wrote:
The Genesis has........... Phantasy Star IV. Which is also amazing.


When starting on Gen dev, I played through PS IV because it was a first-party title and said to be one of the best Gen games. Boy was I underwhelmed, it did almost nothing that pushed the hw, the battle char disappearing was completely unnecessary, 4/10 story clearly written by a seventh-grader, unmemorable music and graphics, bad UI, bad dialogue, and I even found a bug (in a certain castle, going to a certain corner always hung the game).

That... Kind of made me wonder why there are so few good Gen games, when the console would be fully capable of Pokemon Gold or FF 4. That was a highly marketed first-party title too.

Quote:
yet I'm working on a Genesis emulator now.


Excellent! You're one of the few people who care about portability, as it is, there are practically no good Gen emulators for Linux, especially non-32-bit x86.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176325)
calima wrote:
there are so few good Gen games
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176326)
I'm surprised nobody made a dirt cheap 16-bit RISC cpu. I'm thinking maybe having 8 24-bit registers, with most ALU instructions being 16-bit, but some being 24-bit.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176331)
Such CPUs exist and are called DSPs and MCUs. They just don't become popular as a general-purpose personal computer's primary CPU because personal computer users tend to demand binary compatibility with their existing proprietary applications. This is why Mac OS 7.5 through 9 for PowerPC include a 68LC040 emulator, why Mac OS X 10.4 through 10.6 for x86 include a PowerPC emulator, and why every version of Windows since Windows/386 has included a virtual machine for running applications for MS-DOS and/or previous versions of Windows.

As for devices that run graphical applications but not those made for desktop PCs, mobile devices have tended to implement Thumb, the 16-bit instruction set of ARM processors.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176332)
MSP430.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176333)
psycopathicteen wrote:
I'm surprised nobody made a dirt cheap 16-bit RISC cpu. I'm thinking maybe having 8 24-bit registers, with most ALU instructions being 16-bit, but some being 24-bit.


Nintendo could have made a custom 65816 with a 16-bit bus, thrown out the 6502 compatibility (which was becoming useless), and given it 1-cycle WRAM/ROM access.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176347)
Bregalad wrote:
Quote:
No SPC700 or 64KB of audio ram, just have the 65816 handle audio...

Considering how much the 65c816 is criticized for not being fast enough, I cannot imagine how poor it'd be at rendering audio. The 16 MHz ARM on the GBA is already the bare minimum, and most games have audio engines of poor quality.

Hardly the bare minimum. Just for clarification, I'm pretty sure he meant the cpu handling writing to the DSP registers, not generating the waveforms itself. But that said, I've done 4 channels, frequency scaled with volume control, on the PC Engine at 35% cpu resource - all software (and that's a 7.16mhz 8bit 65x). A simple set of 24bit fixed point auto increment regs/pointers hardware on cart would cut that down to 8% cpu resource. No need for a 16mhz arm chip.


The Genesis doesn't have a true 16bit data bus to the VDP, although it acts like it. I ran into this problem when setting odd values for the autoincrement settings (the two bytes of a word going to different places in video memory). The VDP access to vram itself is 8bit as well. My point being, none of that makes the slightest bit of difference.

Pointing out the inferiority of the SNES' 8bit data bus on the main cpu is about as irrelevant. All it really does is show ignorance of how the system works. The DMA is 8bit and damn fast at that. This whole "is the cpu 16bit or not" question is hardware engineering perspective VS software engineering perspective. The 65816 is a 16bit processor from both perspectives, while the 68000 is 16bit from a hardware perspective and 32bit from a software perspective. I have no problem calling the 68000 a 32bit processor, having written enough code for it (Genesis) - but it's the most underwhelming pathetic 32bit processor I've used. The ISA might be a dream to code with, but the thing is just so slow. A "crippled" '816 at less than half the frequency comes relatively close to it, and a hyper 8bit 7.16mhz processor often matches it. Not impressed.

Even when the SNES is running in 3.58mhz mode, it isn't really. Registers are essentially in ram (WRAM: address vectors) and the processor relies heavily on using ram - it's closer to ~3.1mhz. Try down clocking the Genesis to half speed (~3.84mhz) and it would choke. The SNES is doing more with less. A 7.67mhz 65816 with no delays would exceed the 68000 in these consoles. A 65816 with a 16bit bus.... would smack the 68000 around like a little biatch. Crazy that they never made one.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176353)
Genesis does what Nintendon't, and it has nothing to do with the data bus. The data bus is largely a wash, as the 68000 only accesses memory every four cycles. The real difference is Times. Not cycle times or The New York Times or Times New Roman, but *. The 68000 has a hardware 16x16=32 multiplier. The 65816 doesn't, and the multiplier in the 5A22 built around it is only 8x8. Hardware multiply and divide make basic 3D practical even without DSP or GSU.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176356)
tomaitheous wrote:
I'm pretty sure he meant the cpu handling writing to the DSP registers, not generating the waveforms itself.

Yes. I'm pretty sure the SPC700 doesn't actually generate waveforms, which is why I don't really think it's needed, but of course I have no clue how sound hardware works. It seems like you could have just wired up the DSP to a bus from the cartridge. I would just say main ram, but doesn't it need to constantly pull data to generate waveforms? If the DSP were using main ram, I imagine the CPU couldn't at the same time.

tomaitheous wrote:
Pointing out the inferiority of the SNES' 8bit data bus on the main cpu is about as irrelevant.

Yeah, I don't get the whole data bus thing. It doesn't matter if a cpu has a data bus that is twice as large if it only pulls from ram half as often, or if the clock speed is halved or whatever else. I actually kind of like the thought of a small data bus, because it's not wasteful, like if you have a 16 bit data bus and use an 8 bit instruction, you're wasting a theoretical cycle vs if the data bus were 8 bit, if that makes sense. However, I think arguments like "if the 65816 ran at 7.18MHz or the 68000 at 3.58MHz" are kind of silly, because I imagine getting the 65816 to run at 7.18MHz would be a lot more difficult and expensive than getting the 68000 to go at that speed, as it would be more powerful, which drives up the cost 90% of the time.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176359)
The normal multiplier is u8 * u8 -> u16, and requires waiting 8 CPU cycles, regardless of waitstates (it uses the CPU clock), before you can read the result.

However, there's also the mode 7 multiplier, which is i16 * i8 -> i24, and can be used when you have mode 0-6 set, or during blanking in mode 7. It's also significantly faster, letting you read the result immediately after writing the operands.
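To make the two multipliers described above concrete, here is a rough Python model of just their arithmetic (operand widths, signedness and result width). The function names are made up for illustration, and timing such as the 8-cycle wait on the normal multiplier is not modeled at all.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def cpu_multiply(a, b):
    """Normal multiplier: u8 * u8 -> u16 (hypothetical name)."""
    return ((a & 0xFF) * (b & 0xFF)) & 0xFFFF


def mode7_multiply(a, b):
    """Mode 7 multiplier: i16 * i8 -> i24, returned as a raw 24-bit pattern
    (hypothetical name)."""
    return (to_signed(a, 16) * to_signed(b, 8)) & 0xFFFFFF
```

For example, mode7_multiply(0xFFFF, 0x02) treats the first operand as -1, so the 24-bit result pattern is 0xFFFFFE, i.e. -2.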
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176361)
calima wrote:
Excellent! You're one of the few people who care about portability, as it is, there are practically no good Gen emulators for Linux, especially non-32-bit x86.

Eh, BlastEm is getting there (some games still broken due to random bugs, but most should work) (・・ ) (though requires 64-bit x86)

psycopathicteen wrote:
I'm surprised nobody made a dirt cheap 16-bit RISC cpu. I'm thinking maybe having 8 24-bit registers, with most ALU instructions being 16-bit, but some being 24-bit.

RISC gained traction when 32-bit was already common. Note that there are 32-bit RISC CPUs running off a 16-bit bus and with 16-bit opcodes (the SH2s on the 32X come to mind since if I recall correctly their bus is 16-bit, and the ARM in the GBA has Thumb mode as well as a 16-bit bus on everything but a portion of RAM).

tepples wrote:
Genesis does what Nintendon't, and it has nothing to do with the data bus. The data bus is largely a wash, as the 68000 only accesses memory every four cycles. The real difference is Times. Not cycle times or The New York Times or Times New Roman, but *. The 68000 has a hardware 16x16=32 multiplier. The 65816 doesn't, and the multiplier in the 5A22 built around it is only 8x8. Hardware multiply and divide make basic 3D practical even without DSP or GSU.

No, it didn't. Multiply is a really slow operation on the 68000, and division even moreso. On top of that, they mess with raster effects, since the 68000 can't process interrupts until the current instruction is finished (this is particularly bad for division, since each division eats up about 1/3 worth of scanline's time). You really want to avoid them at all costs... but at least you can get away without them for 3D if you really need to. But the 68000 is still too slow for 3D precisely because it's damn too slow at memory accesses (nearly all the time is spent on filling polygons, and this requires an absurd amount of memory accesses, in addition to the time spent transferring this to video memory once done).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176368)
The 68k multiply and divide timings are also clearly microcoded, and take variable time:
M68000 User's Manual wrote:
DIVS, DIVU — The divide algorithm used by the MC68000 provides less than 10% difference between the best- and worst-case [158 or 140 clocks respectively] timings.
MULS, MULU — The multiply algorithm requires 38+2n clocks where n is defined as:
MULU: n = the number of ones in the <ea>
MULS: n = concatenate the <ea> with a zero as the LSB; n is the resultant number of 10 or 01 patterns in the 17-bit source; i.e., worst case happens when the source is $5555.

This means the Genesis can do approximately 110k-190k 16×16→32 multiplications per second.
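The 38+2n formula quoted above can be sketched in a few lines of Python. This is only the raw instruction timing from the manual excerpt; the clock constant assumes an NTSC console (master clock divided by 7), the function names are made up, and operand fetches and surrounding code are ignored.

```python
M68K_CLOCK_HZ = 53_693_175 / 7  # assumed NTSC master clock / 7, about 7.67 MHz


def mulu_clocks(source):
    """MULU: n = number of one bits in the 16-bit source operand."""
    return 38 + 2 * bin(source & 0xFFFF).count("1")


def muls_clocks(source):
    """MULS: append a zero LSB, then count 10/01 patterns in the 17-bit value."""
    shifted = (source & 0xFFFF) << 1        # 17-bit value with LSB = 0
    differing = shifted ^ (shifted >> 1)    # bit i set where bits i and i+1 differ
    return 38 + 2 * bin(differing & 0xFFFF).count("1")  # 16 adjacent pairs


worst_rate = M68K_CLOCK_HZ / mulu_clocks(0xFFFF)  # 70 clocks per multiply
best_rate = M68K_CLOCK_HZ / mulu_clocks(0x0000)   # 38 clocks per multiply
```

Under these assumptions the worst case works out to a little under 110k multiplies per second and the best case to roughly 200k, which is in the same ballpark as the estimate above.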

The SNES's mode 7 multiplier would be so I/O bound (instead of internal processing limited) that I'm having a hard time making a fair comparison.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176376)
lidnariq wrote:
I'm having a hard time making a fair comparison.

Uh, what? I imagine that dealing with the mode 7 multiplication registers, you're only doing loads and stores. LDA can be from 2-7 cycles, (8 cycles for 16 bit) and STA can be from 3-7 cycles (8 for 16 bit). Just see the number of times you'll do either, and there you'll have it.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176380)
tomaitheous wrote:
Pointing out the inferiority of the SNES' 8bit data bus on the main cpu is about as irrelevant. All it really does is show ignorance of how the system works. The DMA is 8bit and damn fast at that. This whole "is the cpu 16bit or not" question is hardware engineering perspective VS software engineering perspective. The 65816 is a 16bit processor from both perspectives, while the 68000 is 16bit from a hardware perspective and 32bit from a software perspective. I have no problem calling the 68000 a 32bit processor, having written enough code for it (Genesis) - but it's the most underwhelming pathetic 32bit processor I've used. The ISA might be a dream to code with, but the thing is just so slow. A "crippled" '816 at less than half the frequency comes relatively close to it, and a hyper 8bit 7.16mhz processor often matches it. Not impressed.

Even when the SNES is running in 3.58mhz mode, it isn't really. Registers are essentially in ram (WRAM: address vectors) and the processor relies heavily on using ram - it's closer to ~3.1mhz. Try down clocking the Genesis to half speed (~3.84mhz) and it would choke. The SNES is doing more with less. A 7.67mhz 65816 with no delays would exceed the 68000 in these consoles. A 65816 with a 16bit bus.... would smack the 68000 around like a little biatch. Crazy that they never made one.


Oh sorry, I can't let that pass... Do you in fact understand how hardware works before accusing others of ignorance? You are still one of those comparing CPUs based on their frequency... really? Arguing that a 7.67 MHz 65816 can do more than a 7.67 MHz 68000 (and of course it does) actually just shows your ignorance and is totally irrelevant... The problem is all about memory speed. Please tell me how you could have had 7 MHz ROM (even 5 MHz) in 1990 with good capacity at a reasonable price? Of course the 8-bit bus of the SNES is a pain in the butt, and of course the 65x0/65816 architecture is really inefficient in terms of what it can do with a given memory speed compared to the 68000, and that is actually what mattered back then, as memory (ROM as well as RAM) speed/cost was a major deal.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176381)
Sik wrote:
But the 68000 is still too slow for 3D precisely because it's damn too slow at memory accesses (nearly all the time is spent on filling polygons, and this requires an absurd amount of memory accesses, in addition to the time spent transferring this to video memory once done).


Not that slow, of course you can't do advanced 3D at all but you can get something not far from what the SNES did with a SFX:
https://www.youtube.com/watch?v=YUZpF2JLF4s

I made benchmarks with my 3D maths methods in SGDK and you can just about do a complete 3D transformation of about 10000 vertices / second (that is, including 2D projection, so 11 multiplications + 11 additions + 1 division per vertex). If you spend 30% of your CPU time on it, that leaves you about 3000 vertices / second, or 200 vertices / frame (for a 15 FPS game). Not sure Star Fox on SNES pushes more than that.
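The budget arithmetic above is easy to redo with other numbers. A small Python sketch, using only the figures from this post (the function and constant names are made up for illustration):

```python
def vertex_budget(peak_vertices_per_second, cpu_percent, fps):
    """Return (vertices per second, vertices per frame) for a given CPU share."""
    per_second = peak_vertices_per_second * cpu_percent // 100
    return per_second, per_second // fps


# Figures from the post: 10000 vertices/s at 100% CPU, 30% share, 15 FPS.
per_second, per_frame = vertex_budget(10_000, 30, 15)

# Per-vertex operation counts from the post (transform plus 2D projection).
MULS_PER_VERTEX, ADDS_PER_VERTEX, DIVS_PER_VERTEX = 11, 11, 1
multiplies_per_frame = per_frame * MULS_PER_VERTEX
```

With those inputs, per_second is 3000 and per_frame is 200, matching the numbers given above.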

Also, I don't understand why you say that the "68000 is damn too slow at memory accesses" O_o? Compared to what? In fact I would say the total opposite. The 68000 is quite efficient at memory operations relative to its bus speed... and that "Star Fox demo" shows it (not bad for a CPU with < 2 MHz memory operation speed).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176382)
> On top of that, they mess with raster effects

... so I take it I'm going to have to implement the VDP as a dot-based pixel renderer, instead of a scanline-based renderer, eh? :P

Not that I was planning on doing a scanline-based renderer anyway ...
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176383)
I think you don't need dot-based rendering to get trouble with raster effects... if a 68k division happens right before an H-Int, then H-Int processing is postponed until after the division (let's say about 130 cycles too late); depending on what you are doing in the H-Int callback, you can definitely miss the period where the VDP is fetching the next scanline's data (and that is the case even with a scanline-based renderer).
You will need dot-based rendering anyway if you plan to support that bitmap DMA mode ;)

I just dug into my old Gens (rewrite) sources; it looks like I tried to implement all basic timing with a global event system with a centralized timer (split into global / current frame / current scanline timers). The basic idea was to push all relevant upcoming events for a whole scanline (H Blank flag change, VDP SAT prefetch, VDP line rendering, h-int trigger...), then execute CPU (all CPUs) cycles for 1 scanline, with the event handler system splitting the cycle slices accordingly so events occurred when expected. The idea was to use that framework to emulate the Sega Saturn as well, as it heavily relies on SMP.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176385)
Stef, you are always talking about costs, but there is a big margin between 2.58 and 7.67 MHz.
With simple 1-cycle RAM/ROM access like on the HuC6280, you could go above 6 MHz and use the same ROM/RAM speed as the MD (150 ns).
As for RAM, there was no need for 128 KB; 32/64 would have been enough, and as the cherry on the cake, you could boost your DMA too.

For me the SA-1 is a good example that costs are not as high as we think: a 10 MHz 65816 + embedded RAM + all the other features within a simple cartridge.

I even read that the Z80 RAM on the MD was 100 ns (at least on the VA0 revision). :shock:
http://segaretro.org/Mega_Drive_PCB_revisions
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176386)
Stef wrote:
I just dug into my old Gens (rewrite) sources; it looks like I tried to implement all basic timing with a global event system. The basic idea was to push all relevant upcoming events for a whole scanline (H Blank flag change, VDP SAT prefetch, VDP line rendering, h-int trigger...), then execute CPU (all CPUs) cycles for 1 scanline, with the event handler system splitting the cycle slices accordingly so events occurred when expected. The idea was to use that framework to emulate the Sega Saturn as well, as it heavily relies on SMP.


Yeah, I think that could really work well for the Genesis, being based off a single oscillator.

In many ways, I am starting to feel MAME's pain with supporting more and more systems. It's extremely rewarding intellectually, but my pride and strive for perfection really take a beating.

Anyway ... I would strongly recommend you look into binary min-heap arrays for this. Here's my implementation for reference: http://hastebin.com/raw/qurokahane

If you use this as a priority queue, it's pretty miraculous. The idea is, any time you know something is going to happen in N cycles, where N can be any number of cycles you want ... you can add it to the queue in logarithmic time. And whenever an event triggers, you can remove it in logarithmic time too. But the real magic that makes it so great ... as time passes, you can advance the queue by N cycles and trigger callback events in constant(!!) time ... which boils down to one compare.

My version above uses a trick to avoid having to normalize the queue periodically to avoid overflow. Makes it a bit harder to read, but easier to use since you never have to worry about that case.

So instead of having an add_cpu_cycles(uint N) loop that has to test if we need to fire an IRQ, an NMI, a DMA event, run the ALU, or do a bunch of other things like that ... you can test every single possible event with just one compare.
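As a rough illustration of the idea (not the implementation linked above), here is a minimal sketch in Python using the standard heapq module. The class and method names are made up, and because Python integers don't overflow, the normalization trick mentioned above isn't needed or reproduced here.

```python
import heapq
import itertools


class Scheduler:
    """Priority queue of (absolute cycle, callback) events on a binary min-heap."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._order = itertools.count()  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        """Fire `callback` in `delay` cycles from now (logarithmic-time insert)."""
        heapq.heappush(self._heap, (self.now + delay, next(self._order), callback))

    def advance(self, cycles):
        """Advance time; when nothing is due, this costs a single compare."""
        target = self.now + cycles
        while self._heap and self._heap[0][0] <= target:
            when, _, callback = heapq.heappop(self._heap)  # logarithmic-time removal
            self.now = when  # callbacks see the exact cycle they fired on
            callback()
        self.now = target
```

Usage would look like scheduler.schedule(10, fire_irq) followed by scheduler.advance(n) in the CPU loop; the common case where no event is due only peeks at the top of the heap.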

There may be better data structures than binary min-heap for this, but I loved the simplicity of it. It's very rare that I'm able to implement algorithms when described by mathematicians.

Anyway, a Gens reboot sounds pretty awesome! Gens was always my favorite Genesis emulator (sorry Steve, but I don't use closed source stuff) ... would be fun to talk shop with you sometime in the future after I learn a lot more :D

...

As for the Saturn, that's my ultimate dream console to emulate. But short of a 100-fold increase in processing power before I reach 40, I'm not going to attempt it. It would require too many accuracy sacrifices and nothing kills my enjoyment of emu coding more than that =(
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176389)
I've been thinking about multiplication, and how I would do it (for multijointed sprites) while in Mode 7. I thought of using the "square LUT" method with 10-bit signed integers instead of the $42xx registers.
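For reference, the usual form of the square-table trick is the quarter-square identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), which is exact for integers because a+b and a-b always have the same parity. A minimal sketch in Python, assuming 10-bit signed operands in the range -512..511 (the table layout and names are just one possible choice, not anything SNES-specific):

```python
# Quarter squares for |a + b| and |a - b|, which can reach 1024 for 10-bit operands.
QUARTER_SQUARES = [n * n // 4 for n in range(1025)]


def lut_multiply(a, b):
    """Multiply two 10-bit signed integers with two table lookups and a subtract."""
    assert -512 <= a <= 511 and -512 <= b <= 511
    return QUARTER_SQUARES[abs(a + b)] - QUARTER_SQUARES[abs(a - b)]
```

The table here has 1025 entries, compared with the 64K entries a direct 256x256 product table would need.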
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176390)
Espozo wrote:
It seems like you could have just wired up the DSP to a bus from the cartridge.

At the cost of about 25 more pins. The more pins, the sooner your cartridge and console become XBOX HUEG like a Neo Geo.

The Apple IIGS has 64K of dedicated sound RAM. It uses an access method very much like VRAM on the NES or Super NES, even down to needing a priming read when reading back from audio RAM. Somehow it multiplexes access by the Ensoniq DOC and access by the CPU, and I imagine that involves a "delay slots" concept analogous to that in the Master System and Genesis VDP.

Sik wrote:
Multiply is a really slow operation on the 68000, and division even moreso.

Is multiply slower than four 8-cycle multiplies on a 5A22? MULS/MULU is up to 70 cycles according to this table. The exact value is 38+2n, where n for MULU is the weight of the second factor, and n for MULS is the number of transitions in b, that is, the weight of b XOR (b << 1). Using a 2.5:1 clock ratio between 68000 and 65816 implies 28 cycles, which is less than the 32 for four 5A22 multiplies alone, let alone reading and writing the multiplier's MMIO port.

Sik wrote:
each division eats up about 1/3 worth of scanline's time

Each scanline is 228 Z80 cycles, or 228*15/7 = 488.6 68000 cycles, so 1/3 is about right. From the same table, DIVS is 90-100% of 158 cycles, and DIVU is 90-100% of 140 cycles. At 2.5:1 this corresponds to between about 54 and 64 65816 cycles. I don't think I even got 16/8 anywhere that fast on 6502, let alone 32/16.
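The scanline arithmetic above, spelled out as a small Python sketch. It assumes the Z80 runs at master clock / 15 and the 68000 at master clock / 7, and uses the worst-case divide timings quoted earlier; the names are made up for illustration.

```python
from fractions import Fraction

MASTER_PER_Z80 = 15        # master clocks per Z80 cycle
MASTER_PER_68K = 7         # master clocks per 68000 cycle
Z80_CYCLES_PER_LINE = 228

# 228 * 15 / 7 = 3420/7, about 488.6 68000 cycles per scanline.
m68k_cycles_per_line = Fraction(Z80_CYCLES_PER_LINE * MASTER_PER_Z80, MASTER_PER_68K)


def share_of_scanline(clocks):
    """Fraction of one scanline taken by an instruction of `clocks` 68000 cycles."""
    return float(clocks / m68k_cycles_per_line)


divs_share = share_of_scanline(158)  # worst-case DIVS
divu_share = share_of_scanline(140)  # worst-case DIVU
```

Both shares land a bit under one third of a scanline, which is where the "about 1/3" figure comes from.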

Sik wrote:
But the 68000 is still too slow for 3D precisely because it's damn too slow at memory accesses (nearly all the time is spent on filling polygons, and this requires an absurd amount of memory accesses, in addition to the time spent transferring this to video memory once done).

Does the tech demo using assets taken from Star Fox use DMA-assisted filling?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176394)
Quote:
because I imagine getting the 65816 to run at 7.18MHz would be a lot more difficult and expensive than getting the 68000 to go at that speed, as it would be more powerful, which drives up the cost 90% of the time.

Why more expensive ??
You are thinking as if the CPU were the only component in the system and could use whatever type of RAM it needs; in that case, OK, the 68k can use slower memory and of course slower ROM too.
But in a game system, fortunately, the CPU is not alone, and more components are involved, like DMA.
But DMA is fast and needs fast access, so fast RAM/ROM, and the cost savings of the 68k are void.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176395)
byuu wrote:
... so I take it I'm going to have to implement the VDP as a dot-based pixel renderer, instead of a scanline-based renderer, eh? :P

You were going to need that if you didn't want Overdrive to yell "YOUR EMULATOR SUXX" at you anyway (that particular scene needs practically perfect emulation of sprite fetching and rendering).

TOUKO wrote:
I even read that the Z80 RAM on the MD was 100 ns (at least on the VA0 revision). :shock:
http://segaretro.org/Mega_Drive_PCB_revisions

Yeah, I wonder if refresh logic would have countered the benefits of using DRAM, given it was just 8KB and it'd need to cope with two CPUs potentially having to cope with it.

tepples wrote:
Sik wrote:
Multiply is a really slow operation on the 68000, and division even moreso.

Is multiply slower than four 8-cycle multiplies on a 5A22? MULS/MULU is up to 70 cycles according to this table. The exact value is 38+2n, where n for MULU is the weight of the second factor, and n for MULS is the number of transitions in b, that is, the weight of b XOR (b << 1). Using a 2.5:1 clock ratio between 68000 and 65816 implies 28 cycles, which is less than the 32 for four 5A22 multiplies alone, let alone reading and writing the multiplier's MMIO port.

Sik wrote:
each division eats up about 1/3 worth of scanline's time

Each scanline is 228 Z80 cycles, or 228*15/7 = 488.6 68000 cycles, so 1/3 is about right. From the same table, DIVS is 90-100% of 158 cycles, and DIVU is 90-100% of 140 cycles. At 2.5:1 this corresponds to between about 54 and 64 65816 cycles. I don't think I even got 16/8 anywhere that fast on 6502, let alone 32/16.

Huh, good point.

But I wasn't really comparing to the SNES, the point was more that they were still quite slow in general for what they're worth, so you can't just go around using them lightly as people make it sound to be. Although admittedly multiplication against small values may stack up reasonably against a look-up table (especially if the other value can have much bigger ranges) depending on the situation (if the source operand is at most 8-bit in terms of magnitude, it'll be guaranteed to not take longer than 54 cycles - and because it's the bit pattern that matters, it also applies to bit shifted variants of said values).

(EDIT: to make it clear, 256×256 possible values would require 64K entries... if one of those values has a larger range, the table starts becoming really big)

tepples wrote:
Does the tech demo using assets taken from Star Fox use DMA-assisted filling?

Off the top of my head no (just an absurd amount of loop unrolling), though it does exploit DMA to get a sorta linear bitmap in RAM (to avoid having to convert from bitmap to tile arrangement).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176398)
Quote:
Does the tech demo using assets taken from Star Fox use DMA-assisted filling?


In this version:
https://www.youtube.com/watch?v=5rKgH11ng78
All filling is done in software by the 68k, even VRAM bitmap buffer transfer is done by the 68000 CPU because of the required linear bitmap buffer to tile conversion.

In fact gasega68k tried to use DMA for the VRAM transfer; he had to use 2 background planes with a special tilemap arrangement so he could still use a fast software filling algorithm:
https://www.youtube.com/watch?v=oHLc0AzD85g

We can see the frame rate can get up to 30 FPS when there are few elements on screen, but the background sacrifice is too great for the minor performance gain imo. I think the 100% software implementation is already good enough, and you can keep the nice tilting background :)
I also made my own Star Fox demo just some months before gasega68k's attempt:
https://www.youtube.com/watch?v=UuYFmIEtLLk

But my version is not as impressive; in fact I used a lot of C code and directly interpreted data from the original SNES game (I have part of the original ROM in my demo :-/), which is very inefficient but got the demo done faster :p

I have a general benchmark sample for the MD in SGDK which displays various numbers such as maths performance and even polygon fill rate, if you want to give it a try:
https://www.dropbox.com/s/el2bz4eiug7dkk1/rom.bin?dl=0
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176400)
Stef wrote:
But my version is not as impressive; in fact I used a lot of C code and directly interpreted data from the original SNES game (I have part of the original ROM in my demo :-/), which is very inefficient but got the demo done faster :p

Despite the single digit framerate, I actually like yours the best. There are some random little elements that yours gets right that the others don't, like using gradients on the polygons that just make it look more attractive overall, probably because of this: :lol:

Stef wrote:
I used a lot of C code and directly interpreted data from the original SNES game

And wow, the music is spot on! :shock:

tepples wrote:
At the cost of about 25 more pins.

:shock:

tepples wrote:
Is multiply slower than four 8-cycle multiplies on a 5A22?

It doesn't seem that fair to use the multiplication and division unit on the 5A22 for comparison, considering the Genesis doesn't even have an affine transformed background layer in hardware. If I'm not mistaken, the multiplication unit in the PPU is instantaneous, but it's still a pain to write to it. How do you even use the multiplication unit in the PPU? I'm guessing it's a dual write thing, so you'd need to have an 8 bit accumulator, which is slower. Actually, could you have a 16 bit accumulator, and write twice, just making sure the top 8 bits of the second store is what you want to multiply the 16 bit number by? If this is the case, it seems like you could just do lda, sta, lda, sta, lda, sta, lda, sta. (Edit: I forgot to put the two lda's at the end for whatever reason.)

Has anyone ever even done a legit 3D demo on the SNES, outside of a cube that probably has so many things hardcoded that it's not even useful?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176403)
Sik wrote:
(EDIT: to make it clear, 256×256 possible values would require 64K entries... if one of those values has a larger range, the table starts becoming really big)


Wouldn't that be 128kbyte (because the element size would be WORD?).
Stef wrote:

Oh sorry, I can't let that pass... Do you in fact understand how hardware works before accusing others of ignorance? You are still one of those comparing CPUs based on their frequency... really?

Bullshit. People who say that the problem with the SNES is the 8bit data, but don't know what they're talking about. That is what ignorance is. If the bus is enough to deliver what is needed for the processor to crank out operations, then who cares what the size is? Apparently people that don't have a clue about 65x processor and only read docs, that's who. 68k people looking at the '816 cringing at the 8bit data bus, having no clue what any of it means in context because all they know is the 68000 as is (and the crippled 68008) - that is ignorance.


Anyway, the point of talking about the clock speed is that it shows the '816 is doing more per clock cycle than the 68k is. The 68k just simply relies on brute force clock speed to get anywhere, because while it has powerful general instructions - it's slow overall. The '816 is like a sports car - not a lot of room, but it's fast. The 68k is like a cargo truck, not fast at all but it can carry a lot more. Fortunately for the '816, game code is made up of a lot of smaller, simpler operations overall.

Quote:
Arguing that a 7.67 MHz 65816 can do more than a 7.67 MHz 68000 (and of course it does) actually just shows your ignorance and is totally irrelevant... The problem is all about memory speed. Please tell me how you could have had 7 MHz ROM (even 5 MHz) in 1990 with good capacity at a reasonable price?

By doing what Hudson did, and redesigning the processor so that it's not a two phase requirement just to access memory. Not only is it possible, but Hudson ran 7.16mhz compatible memory (ram and rom - no wait states). If small little Hudson could pull it off, so could Big N. So ,yes - ram and rom existed at that speed and was available back in 1987, let alone 1990.

Quote:
Of course the 8-bit bus of the SNES is a pain in the butt, and of course the 65x0/65816 architecture is really inefficient in terms of what it can do with a given memory speed compared to the 68000, and that is actually what mattered back then, as memory (ROM as well as RAM) speed/cost was a major deal.


That context is these consoles. And the '816 is producing more per clock than the 68000. That's the point. Ignorant people look at the clock speed and think "it's low, that's why it's slow" - without realizing that it's much more efficient with that clock speed than the 68k. Convoluted/complex code? Sure. But the power is there. Is it a great general purpose processor like the 68k? No. But it's in the context of these old consoles, which take the burden off the cpu with video and sound hardware acceleration - the processor is free for optimization and acceleration at common tasks. This is what you fail to see, because you're looking at the whole picture (outside of these consoles).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176405)
After all, if we really wanted to compare just the 68k versus the 65816... we'd compare the Mac 512k vs the Apple ][gs, not the Genesis and SNES.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176406)
I don't know how this thread became 6 pages.

The answer to the initial post is: the 65816 is a 16-bit CPU. It contains a usable 256 instructions (vs. the 6502's 151 and the 65c02's 178), with full 16-bit registers (A, X, Y, but also D and S are 16-bit though not common-use registers). Yes it supports 65c02 emulation through e=1 at runtime. Yes it supports 8-bit registers, both through e=1 as well as dynamically at runtime through REP/SEP. Yes it has an 8-bit data bus for backwards-compatibility (as it was designed to be from day one). It offers several addressing modes that neither the 6502 nor the 65c02 has (I was reluctant to mention this because it's a bit beside the point).

In other words: yes it is a 16-bit CPU. It is not "just a 6502 with 16-bit registers". THE END.

I will punch the balls and/or vag of anyone who comes along and says:

* "but the undocumented opcodes on 6502 are technically usable...!"
* "but opcode $42/WDM isn't usable, it's just a NOP, so the CPU actually has 255 opcodes!"
* "but opcode $01/COP really isn't used by anything, so the CPU actually has 255 opcodes!"
* "but other 6502/65c02 implementations have more than 151/178!"
* ...or anything other weird edge-cased sperglord nonsense.

Do not nitpick, troll me, or be an asshole about it. You know damn well what is meant by this. If you even consider going this route, it means YOU KNOW you are intentionally doing it. Do not be that guy. Just don't.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176407)
Espozo wrote:
It doesn't seem that fair to use the multiplication and division unit on the 5A22 for comparison, considering the Genesis doesn't even have an affine transformed background layer in hardware.

Its affine transformed background mode is also its only packed pixel mode, as used by Wolfenstein 3D. Besides, how do you build any sort of 16x16 out of signed 16x8, or any sort of larger division out of 16/8?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176408)
tomaitheous wrote:
Sik wrote:
(EDIT: to make it clear, 256×256 possible values would require 64K entries... if one of those values has a larger range, the table starts becoming really big)


Wouldn't that be 128 KB (because the element size would be a WORD)?

64K entries, not bytes. Actual size depends on what you're trying to do (it's unlikely you'll be using plain multiplication as-is, e.g. if you're doing 3D stuff then you're probably using a reciprocal table instead, and you may want to use 16.16 there), and even then if you're going for larger ranges then a word is not going to suffice anymore (e.g. 258×255 already won't fit in 16-bit).
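
The sizes being discussed can be checked with a quick sketch. The helper names below are hypothetical and only illustrate the arithmetic: a full 256×256 table has 64K entries, takes 128 KB if each entry is a 16-bit word, and a slightly larger operand range such as 258×255 already needs 17 bits per product.

```python
# Hypothetical sizing helpers for a full lookup-table multiplier (illustration only).

def table_bytes(a_values, b_values, entry_bytes):
    """Bytes needed for a table holding one product per (a, b) pair."""
    return a_values * b_values * entry_bytes


def product_bits(a_max, b_max):
    """Bits needed to store the largest product a_max * b_max."""
    return (a_max * b_max).bit_length()


print(256 * 256)                  # number of entries: 65536 (64K)
print(table_bytes(256, 256, 2))   # 131072 bytes (128 KB) with 16-bit entries
print(product_bits(255, 255))     # 16: fits in a word
print(product_bits(258, 255))     # 17: no longer fits in a word
```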

tomaitheous wrote:
Convoluted/complex code? Sure.

Honestly, that's probably the biggest actual downside for the 65816. When you're trying to get everything done within a tight deadline (i.e. basically every commercial game ever made), stuff like that matters a lot, and you can safely throw away the idea of doing things the optimal way, especially near the end when you're trying to hack everything together and fix critical bugs at all costs.

But I guess that falls under the same camp as "what matters is the games".
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176410)
tepples wrote:
how do you build any sort of 16x16 out of signed 16x8, or any sort of larger division out of 16/8?

How do you build 16x16 out of 8x8, signed or not? Anyway, doesn't signed just mean the first bit indicates whether the number is negative or not? What difference does it make if you have 65536 vs. 32768?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176412)
To make 16x16 unsigned multiplication out of 8x8:

a = a_high * 256 + a_low
b = b_high * 256 + b_low
a * b = (a_high * 256 + a_low) * (b_high * 256 + b_low)
= a_high * b_high * 65536 + a_low * b_high * 256 + a_high * b_low * 256 + a_low * b_low

Is there an equivalent decomposition for signed multiplication?
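
The unsigned decomposition above can be written out as a small sketch. `mul8x8` is a hypothetical stand-in for an 8×8→16-bit hardware multiplier; it is not modelled on any particular console's registers.

```python
def mul8x8(x, y):
    """Stand-in for an unsigned 8x8 -> 16-bit hardware multiply (assumed primitive)."""
    assert 0 <= x <= 0xFF and 0 <= y <= 0xFF
    return x * y


def mul16x16(a, b):
    """Unsigned 16x16 -> 32-bit multiply built from four 8x8 partial products."""
    a_high, a_low = a >> 8, a & 0xFF
    b_high, b_low = b >> 8, b & 0xFF
    return ((mul8x8(a_high, b_high) << 16)   # a_high * b_high * 65536
            + (mul8x8(a_low, b_high) << 8)   # a_low  * b_high * 256
            + (mul8x8(a_high, b_low) << 8)   # a_high * b_low  * 256
            + mul8x8(a_low, b_low))          # a_low  * b_low


print(hex(mul16x16(0x1234, 0xABCD)))
```

Each shift corresponds to one of the ×256 or ×65536 factors in the expansion, so on real hardware the cost is four multiplies plus three multi-byte additions.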
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176418)
koitsu wrote:
I don't know how this thread became 6 pages.

The answer to the initial post is: the 65816 is a 16-bit CPU. It contains a usable 256 instructions (vs. the 6502's 151 and the 65c02's 178), with full 16-bit registers (A, X, Y, but also D and S are 16-bit though not common-use registers). Yes it supports 65c02 emulation through e=1 at runtime. Yes it supports 8-bit registers, both through e=1 as well as dynamically at runtime through REP/SEP. Yes it has an 8-bit data bus for backwards-compatibility (as it was designed to be from day one). It offers several addressing modes that neither the 6502 nor the 65c02 has (I was reluctant to mention this because it's a bit beside the point).

In other words: yes it is a 16-bit CPU. It is not "just a 6502 with 16-bit registers". THE END.
...


The end for you; we all have our own definition of a 16-bit CPU, as there is no real legit one.
The topic severely derailed (i'm not the one who started it...). Basically i said the 65816 is as much 16-bit as the 68000 is 32-bit, and it's true i can't consider the 65816 a real 16-bit CPU, as it is not able to operate on 16 bits of data at once because of its 8-bit data bus (which is a big part of the CPU architecture), coupled with the fact that operations have to be done with memory because of the single accumulator. So yes, the 65816 is definitely a 65C02 with 16-bit registers and a 16-bit ALU.
I can't convince you that i'm right, but you will never change my mind about that either, and honestly i think none of us really cares.
The END :p

Quote:
Bullshit. People who say that the problem with the SNES is the 8bit data, but don't know what they're talking about. That is what ignorance is. If the bus is enough to deliver what is needed for the processor to crank out operations, then who cares what the size is? Apparently people that don't have a clue about 65x processor and only read docs, that's who. 68k people looking at the '816 cringing at the 8bit data bus, having no clue what any of it means in context because all they know is the 68000 as is (and the crippled 68008) - that is ignorance.


I think almost everyone here knows what they are talking about... and i was the one complaining about the 65816's 8-bit data bus, and honestly i think i do know what i'm talking about.

Quote:
Anyway, the point of talking about the clock speed is that it shows the '816 is doing more per clock cycle than the 68k is. The 68k simply relies on brute-force clock speed to get anywhere, because while it has powerful general instructions, it's slow overall. The '816 is like a sports car - not a lot of room, but it's fast. The 68k is like a cargo truck, not fast at all but it can carry a lot more. Fortunately for the '816, game code is made up of a lot of smaller, simpler operations overall.


Really, it is so disappointing to read that from you... Still thinking with that absurd logic.
You are still comparing the CPUs on their speed... Who cares about what a CPU can do at a given speed if you can't clock it at more than 3 MHz while others can go up to 10 MHz?

Quote:
By doing what Hudson did, and redesigning the processor so that it doesn't require two phases just to access memory. Not only is it possible, but Hudson ran 7.16 MHz compatible memory (RAM and ROM, no wait states). If small little Hudson could pull it off, so could Big N. So, yes - RAM and ROM existed at that speed and were available back in 1987, let alone 1990.


That is definitely bullshit... I think you have a very biased view based on your experience with (and attachment to) the PCE architecture, but how much experience do you have with other systems? Hudson (and NEC in fact) was in a very particular situation where it was directly producing its own ROM chips, so it could easily lower its margins here. Still, the PCE has very little RAM (8 KB) compared to the Sega Genesis because of that, and HuCard capacity was limited, again compared to Genesis ROM. That solution was not possible for either Sega or Nintendo in terms of cost, even more so with the increasing ROM capacity required in that era. Do you realize that the SNES itself is the proof that you're wrong? Do you think Nintendo put a 2.68 MHz 65816 in the SNES because they thought "2.68 MHz is enough for everyone!"? Do you realize that "fast" ROM (allowing the 65816 to get up to 3.58 MHz) arrived late in the SNES's life... if they could have had it back in 1990, they would have made the SNES run at 3.58 MHz at least by default.
That is why imo 65xx-based architectures are crap... inefficient use of memory bandwidth. But ok, you get what you pay for, and the 65xx CPUs are cheap, and that counts.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176420)
Cool, so now that we've determined this thread serves absolutely zero purpose because it's filled with nothing but opinions, can it be locked given its uselessness? The previous post vs. the initial post should be all that's needed to justify that.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176421)
byuu wrote:
Yeah, I think that could really work well for the Genesis, being based off a single oscillator.

In many ways, I am starting to feel MAME's pain with supporting more and more systems. It's extremely rewarding intellectually, but my pride and strive for perfection really take a beating.


Yeah of course, a single-clock-based system makes things much easier :) In fact i was supporting Sega-CD as well (and 32X, but the latter uses the Megadrive clock), but i cheated and used an approximated value (795 cycles per scanline, which is really close to reality) so i could rely on the same unique global counter for it as well.

Quote:
Anyway ... I would strongly recommend you look into binary min-heap arrays for this. Here's my implementation for reference: http://hastebin.com/raw/qurokahane


I'm not used to templated C++ (the syntax is... well, not very readable :p) but i got it :)
My event structure was similar, with "cycle" and "callback" fields; i added an "id" so i could relocate or remove a given event if for any reason it had to be postponed (i also have an extra "param" to hold parameters for the callback, if any).
Your binary min-heap array looks like a binary tree that stores your events ordered by their counter.

Quote:
If you use this as a priority queue, it's pretty miraculous. The idea is, any time you know something is going to happen in N cycles, where N can be any number of cycles you want ... you can add it to the queue in logarithmic time. And whenever an event triggers, you can remove it in logarithmic time too. But the real magic that makes it so great ... as time passes, you can advance the queue by N cycles and trigger callback events in constant(!!) time ... which boils down to one compare.


Yeah, i expected that from the tree structure, and yours is specifically optimized for the minimal requirements. It's always important to choose the storage structure cleverly depending on what you plan to do most with it (insertions, removals or just retrievals) =)
Today i'm doing a lot of Java development for my professional job, and the basic API is well designed to deal with all these kinds of data structures:
https://commons.apache.org/proper/commo ... eList.html

I have to admit that back then (the sources date from 2005) i was doing almost everything in C (i wanted to port it to the Dreamcast), and i chose a simple array where i stored the index of the next incoming event, with other minor optimizations to make some operations fast, but the structure itself was a simple array. Still, that array only contained the events for a single scanline, so it was never that big ;)

Quote:
So instead of having an add_cpu_cycles(uint N) loop that has to test if we need to fire an IRQ, an NMI, a DMA event, run the ALU, or do a bunch of other things like that ... you can test every single possible event with just one compare.


Yeah, because the idea is that you push events into the queue first, then just check the counter against the first event's counter, which is exactly what i was doing.
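
As a rough illustration of the scheme described in this exchange (not byuu's actual implementation, which is C++ and only linked above), an event queue on top of Python's standard `heapq` module could look like the sketch below. The class and method names are assumptions made for the example.

```python
import heapq


class EventQueue:
    """Minimal cycle-based event scheduler backed by a binary min-heap."""

    def __init__(self):
        self.now = 0      # current cycle counter
        self._heap = []   # entries: (fire_cycle, sequence, callback)
        self._seq = 0     # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        """Queue callback to fire `delay` cycles from now (logarithmic time)."""
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, cycles):
        """Advance time; the common case is one compare against the heap's top."""
        self.now += cycles
        while self._heap and self._heap[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._heap)
            callback()


fired = []
queue = EventQueue()
queue.schedule(10, lambda: fired.append("irq"))
queue.schedule(4, lambda: fired.append("dma"))
queue.advance(3)    # nothing due yet
queue.advance(1)    # cycle 4: "dma" fires
queue.advance(10)   # cycle 14: "irq" fires
print(fired)
```

When no event is due, `advance` costs a single comparison against the earliest entry, which is the property praised in the quote; insertion and removal stay logarithmic.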

Quote:
There may be better data structures than binary min-heap for this, but I loved the simplicity of it. It's very rare that I'm able to implement algorithms when described by mathematicians.


When you want a sorted list with fast insert/remove and iterate operations, a tree is the structure to go with; there is no really better alternative. After that it all depends on the tree implementation itself :)

Quote:
Anyway, a Gens reboot sounds pretty awesome! Gens was always my favorite Genesis emulator (sorry Steve, but I don't use closed source stuff) ... would be fun to talk shop with you sometime in the future after I learn a lot more :D


Haha thanks, but unfortunately this was a (very) old project (2005) that i never completed for lack of motivation and because i turned more and more to Megadrive programming :p I could eventually release the sources, but i think it's not that interesting now that we have emulators such as BlastEm or Exodus.

Quote:
As for the Saturn, that's my ultimate dream console to emulate. But short of a 100-fold increase in processing power before I reach 40, I'm not going to attempt it. It would require too many accuracy sacrifices and nothing kills my enjoyment of emu coding more than that =(


The Saturn is a challenging system ;) As you said, i think we can forget about 100% accuracy for it, as it would require too much effort as well as CPU power.
In fact i worked a bit on Saturn emulation back then. I joined Sthief, who was just releasing his first version of SSE (a very old and discontinued Saturn emulator). I moved the emulator to Windows (it was DOS based), then i made a VDP1 software implementation (which was really lacking, as the OpenGL one was quite broken in accuracy) and also the first SCSP sound core, while fixing tons of bugs. I believe SSE was the first Saturn emulator to actually have real SCSP sound (the one you can hear in the BIOS logo) and not only CDDA playback. Too bad we never released that version... it's probably sitting somewhere on my hard drive :p

Edit: I found an old binary sitting on my hard drive:
https://dl.dropboxusercontent.com/u/933 ... n/wsse.zip

Sometimes you get up to the CD player panel, but most of the time it crashes when you launch it X'D
It was working better back then (i guess newer Windows versions don't help) :p
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176422)
Quote:
Still the PCE has a very small RAM (8 KB) compared to the Sega Genesis because of that, and hucard capacity were limited compared again to Genesis ROM.

The PCE was made for 8-bit competition, not 16-bit; 8 KB was enough, and don't forget that the PCE can access its VRAM at any time, so more RAM becomes useless in most cases.
Z80 RAM in the MD was 100/120/150/200 ns (on various revisions), which is strange for a very costly memory.
The PCE's RAM/ROM speed is 140 ns, for the MD it's 150, and like i always said (and tom too), Nintendo should have made a custom 65816 instead of using a vanilla one, mainly to relax the RAM/ROM speed needed and bring it to a PCE/MD level of 150/160 ns.
I think the classic 65816 was easy to put in the SNES; the low speed allowed Big N to reduce costs easily, because it was sure that the CPU could easily be replaced by a much more powerful one via cartridge at low cost.
SNES ROMs were expensive?? Yes, because third-party developers were forced to buy the cartridges from Nintendo and were not allowed to make their own.

150 ns RAM/ROM was expensive for Sega too, and that didn't discourage Sega from putting 100 ns RAM on its Z80 (purely useless, even for the 68k).

When Hudson made its chipset, it didn't know who was going to build its machine, and 140 ns memory was already available; the chipset was designed around memory that was easily feasible for all the primary manufacturers on the market, and it was manufactured not by NEC but by Epson.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176426)
Stef wrote:
I can't convince you that i'm right but you will never made my mind change about that too and honestly i think we all don't care.

That's because your whole view stems from a 68k-centric perspective. Until you adopt a more holistic perspective, you'll always have that bias and those preconceptions. I like you, Stef, and you create some great stuff, but you in particular are known for your extreme 68k and z80 bias.

I don't think the 65x is a great design. I think it makes a very poor general purpose processor (i.e. not a good wide range processor). I think it lacks features that the 6809 has (its closest comparison), which makes it feel a bit cheap in design. I think its constant bus hogging makes it limited in system design scope and appeal. Extending into 16-bit design, none of this was addressed, and it had limited application because of it - it was a simple and cheap upgrade. By comparison, the 68k was a much better design (linear PC, real/usable branch relative addressing, not hogging the bus, fuller ISA, hardware macro instructions (32-bit ones; two passes over the ALU)) - it was a forward-thinking model that easily extended itself to changes and upgrades down the line. Its range of application is also far greater (which made it a perfect processor for computer system designs).

What you call a bias on my part, I call a realistic understanding. I recognize that the 68k is better in soo many respects, but context is key. Context is a HUGE part of this comparison. And this... this is where you fail to see that the 68k and the 65x models are brought much closer to capability - specifically because of the limited role game logic requires of said processors. Are there some exceptions where game logic is more expansive in design? Sure. And requires some more processing power. But the norm brings these processors closer together. And that's what I think you completely fail to see.




Quote:
Really, it is so disappointing to read that from you... Still thinking with that absurd logic.
You are still comparing the CPUs on their speed... Who cares about what a CPU can do at a given speed if you can't clock it at more than 3 MHz while others can go up to 10 MHz?

It's not absurd at all; it speaks to the intrinsic characteristics of the processor (namely, efficiency in relation to clock cycles). You do this as well, except for other areas that show the 68k in good light. Take off the blinders... man.



Quote:
Do you realize that the SNES itself is the proof that you're wrong?


And do you realize there are a myriad of reasons why the SNES is designed the way it is? It's plainly obvious that additional hardware support on cartridge was part of the design scope of the system. From their perspective, having successfully extended the Famicom's life span with extended hardware, it made sense to go with a much cheaper processor and add hardware resources as the bar for software development began to push the boundaries. Apparently Nintendo planned ahead, given both the superior audio and color design of the system at the time; above and beyond the status quo. From Nintendo's perspective at the time, it made sense to go with a cheaper processor to keep the base price down (consider the cost of the SMP and sPPU design). I think it's a good indication that Nintendo saw the processing requirements increasing beyond whatever the initial design would provide, especially given the capability of the support hardware (video/hardware). Whether they executed this integration seamlessly or not is another thing entirely.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176429)
tepples wrote:
Is there an equivalent decomposition for signed multiplication?

Just apply signs rule (result is positive if both are same sign, negative if both are different sign, then you can just make both operands positive for the calculation and apply the sign after the fact).

Stef wrote:
So yes, the 65816 is definitely a 65C02 with 16-bit registers and a 16-bit ALU.

You realize the width of ALU operations is the usual determinant when it comes to determining the "bittage" of a CPU, right? (yes, I know the Z80's ALU works on 4-bit halves, but it always tries to do two halves in practice)
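
To make the "two halves" remark concrete, here is a purely illustrative sketch of an 8-bit addition performed as two passes through a 4-bit adder. It shows only the arithmetic idea; it is not a model of the Z80's actual internal timing or microarchitecture, and the function names are made up for the example.

```python
def alu4(x, y, carry_in):
    """4-bit adder: returns (4-bit sum, carry out)."""
    total = (x & 0xF) + (y & 0xF) + carry_in
    return total & 0xF, total >> 4


def add8(a, b):
    """8-bit add as two nibble-wide passes; returns (result, half_carry, carry)."""
    low, half_carry = alu4(a, b, 0)                  # first pass: low nibbles
    high, carry = alu4(a >> 4, b >> 4, half_carry)   # second pass: high nibbles
    return (high << 4) | low, half_carry, carry


print(add8(0x3C, 0x27))
```

Seen from the instruction set, the result is indistinguishable from a single 8-bit add, which is why ALU width alone is a shaky basis for counting bits.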

Seriously, you're just looking for excuses to say the 65816 is just a 6502 with 16-bit registers and built-in bank switching. That's like saying the 68020 is just a 68000 with a 32-bit bus.

(EDIT: typo)

koitsu wrote:
Cool, so now that we've determined this thread serves absolutely zero purpose because it's filled with nothing but opinions, can it be locked given its uselessness? The previous post vs. the initial post should be all that's needed to justify that.

There's like only one person in the entire forum who disagrees on the 65816 being 16-bit, and I haven't seen anybody anywhere else disagree on that either =P


Also really if we insist on arguing whether the 65816 was powerful enough or not... let's not forget the NES, running a 6502 at less than 2MHz, was consistently doing platformers at 60FPS. So huh yeah, there goes the whole argument for the entire generation.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176430)
Quote:
(yes, I know the Z80's ALU works on 4-bit halves, but it always tries to do two halves in practice)

Maybe that's why it's not very fast even with operations between registers??

Quote:
Seriously, you're just looking for excuses to stay the 65816 is just a 6502 with 16-bit registers and built-in bank switching. That's like saying the 68020 is just a 68000 with a 32-bit bus.

Yes, and this is why i created this thread .. :?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176432)
I just realized I typo'd "say" as "stay". Whoops ^_^;

TOUKO wrote:
Maybe that's why it's not very fast even with operations between registers??

I think it has ugly stuff going on regarding memory accesses too, every extra byte access in the instruction results in 3 extra cycles (even if it's just fetching). A faster ALU isn't going to help here.

EDIT: all accesses add 3 cycles, not just the bytes in the instruction itself. Still, point stands. 68000 suffers from something similar (every extra word access adds 4 cycles due to how bus accesses are handled)
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176434)
This 4-bit ALU is a very curious thing, really.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176436)
Sik wrote:
tepples wrote:
Is there an equivalent decomposition for signed multiplication?

Just apply signs rule (result is positive if both are same sign, negative if both are different sign, then you can just make both operands positive for the calculation and apply the sign after the fact).

I thought you couldn't "just make both operands positive" if trying to use the mode 7 multiplier to multiply signed by signed.

Quote:
koitsu wrote:
Cool, so now that we've determined this thread serves absolutely zero purpose because it's filled with nothing but opinions, can it be locked given its uselessness? The previous post vs. the initial post should be all that's needed to justify that.

There's like only one person in the entire forum who disagrees on the 65816 being 16-bit, and I haven't seen anybody anywhere else disagree on that either =P

Also really if we insist on arguing whether the 65816 was powerful enough or not

And that's why I'm not locking it just yet. Though the previous question is answered (the 65816 has a 16-bit ALU and 16-bit ISA), I find the derail about overall performance interesting, and splits are discouraged under new policy.

What we've proved so far: Data bandwidth is a wash, as is overall data processing rate where large multiplies and divides aren't expected. The segmented architecture and dearth of registers make high-level languages less efficient on 65816. 16x16 multiplies and 32/16 divides are slow in both cases but still faster on 68000, but the atomicity of DIVS/DIVU and lack of HDMA in the surrounding memory controller cause latency that makes them less useful in an engine relying heavily on hblank IRQ.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176441)
tepples wrote:
I thought you couldn't "just make both operands positive" if trying to use the mode 7 multiplier to multiply signed by signed.

No, it's more like a wrapper on the multiplication algorithm:

  1. Keep track of sign
  2. Make both operands positive
  3. Do multiplication as if it was unsigned
  4. Apply intended sign to the result
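
The four steps above can be sketched as a wrapper around any unsigned multiply. `mul_unsigned` is a hypothetical placeholder for whatever unsigned routine or hardware is available.

```python
def mul_unsigned(a, b):
    """Placeholder for an unsigned multiply routine (assumed primitive)."""
    assert a >= 0 and b >= 0
    return a * b


def mul_signed(a, b):
    """Signed multiply implemented as a wrapper around an unsigned one."""
    negative = (a < 0) != (b < 0)                # 1. keep track of the sign
    magnitude_a, magnitude_b = abs(a), abs(b)    # 2. make both operands positive
    product = mul_unsigned(magnitude_a, magnitude_b)  # 3. multiply as unsigned
    return -product if negative else product     # 4. apply the intended sign


print(mul_signed(-300, 25))
```

One caveat for fixed-width hardware: the magnitude of the most negative value (for example -32768 in 16 bits) needs the full unsigned range, so the unsigned routine must accept it.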

Although wait, do the mode 7 registers allow doing multiplication like it was unsigned? Because if not, that is going to be your problem. I suppose if you can do away with wasting a couple of bits (i.e. 14-bit instead of 16-bit) you could try pushing the higher half so the sign bit is always clear. I suppose it's still faster than doing the whole multiplication entirely in software.

tepples wrote:
16x16 multiplies and 32/16 divides are slow in both cases but still faster on 68000, but the atomicity of DIVS/DIVU and lack of HDMA in the surrounding memory controller cause latency that makes them less useful in an engine relying heavily on hblank IRQ.

Yep.

Note that with hblank IRQs it still depends. Some stuff will always only take effect on the next line no matter what (e.g. changing vertical scroll while not in scroll-per-two-cell mode) - for those it's most likely not an issue since timing is not that important (as long as it lands in the correct line, you're fine). The problem is for those where you want to get it done as early as possible, like palette stuff.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176444)
Sik wrote:
Although wait, do the mode 7 registers allow doing multiplication like it was unsigned? Because if not, that is going to be your problem. I suppose if you can do away with wasting a couple of bits (i.e. 14-bit instead of 16-bit) you could try pushing the higher half so the sign bit is always clear. I suppose it's still faster than doing the whole multiplication entirely in software.
Should still be possible to decompose the arithmetic even though it's s8×s16→s24. Might have to do some prep first instead of only after like with the unsigned-to-signed conversion.
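
One possible decomposition, offered as a sketch rather than a tested SNES routine: split the 16-bit operand into a signed high byte, the low seven bits (always non-negative, so safe to feed to a signed 8-bit input), and bit 7 of the low byte, which is handled with a shift. `mul_s16_s8` is a hypothetical stand-in for the s16×s8→s24 hardware multiply.

```python
def mul_s16_s8(a, b):
    """Stand-in for a signed 16x8 -> 24-bit hardware multiply (assumed primitive)."""
    assert -32768 <= a <= 32767 and -128 <= b <= 127
    return a * b


def mul_s16_s16(a, b):
    """Signed 16x16 -> 32-bit multiply built from two signed 16x8 multiplies."""
    b_high = b >> 8       # arithmetic shift: signed high byte, -128..127
    b_low = b & 0xFF      # unsigned low byte, 0..255
    # b == b_high * 256 + (b_low & 0x80) + (b_low & 0x7F)
    result = mul_s16_s8(a, b_high) << 8     # high byte contribution
    result += mul_s16_s8(a, b_low & 0x7F)   # low 7 bits never look negative
    if b_low & 0x80:
        result += a << 7                    # bit 7 of the low byte is worth a * 128
    return result


print(mul_s16_s16(-1234, 0x7FFF))
```

The extra shift-and-add is the "prep" cost of keeping the low byte from being misread as negative; the two hardware multiplies still do most of the work.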
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176448)
Quote:
...The pce's RAM/ROM speed is 140 ns, for Md it's 150...


Where did you see that? The first MD ROMs were much slower than that...

Quote:
That's because your whole view stems from a 68k centric perspective. Until you adapt a more holistic perspective, you'll always have that bias and preconceptions. I like you stef, and you create some great stuff, but you in particular are known for your extreme 68k and z80 bias.


I don't think i have a 68k-centric view... i do know the 68k is special by itself and i see it more as a 16/32-bit hybrid than a pure 16-bit CPU. And you make a mistake by assuming i'm totally biased towards the 68K and Z80 (just because the MD uses both). Yeah, I do like the 68k CPU, but honestly i think most people who have programmed it appreciate it, as it's a really comfortable CPU to develop for... but i really dislike the Z80 on the other hand; i think it's quite difficult to really use that CPU efficiently. For me the Z80 is not a really good 8-bit CPU and i far prefer the GBZ80 (the Game Boy's customized Z80), which is somehow a "fixed" version of the Z80 (simpler, more usable). I also like the 6809, a very nice and powerful 8-bit CPU. From other CPU eras i also like ARM Thumb or the SHx CPUs (really efficient and nice designs), but ok, that is off topic, just to tell you i'm not a 100% biased 68000 fan... The 6502 series CPUs have only one point of interest for me: their price... but at the cost of their poor efficiency and painful programming.

Quote:
You realize the width of ALU operations is the usual determinator when it comes to determining the "bittage" of a CPU, right? (yes, I know the Z80's ALU works on 4-bit halves, but it always tries to do two halves in practice)


But the ALU size is mostly just a design choice, as in the Z80... Internally the 68000 has three 16-bit ALUs, so can it do 3 times more than the 65816? MMX used a 64-bit ALU and SSE a 128-bit ALU, so were these CPUs 64-bit and 128-bit? That is part of the whole, but definitely the data processing capacity (directly linked to the memory capacity) is what matters.

Quote:
Seriously, you're just looking for excuses to say the 65816 is just a 6502 with 16-bit registers and built-in bank switching. That's like saying the 68020 is just a 68000 with a 32-bit bus.


The 68020 brings full 32-bit I/O logic to the 68000, exactly what the 65816 *does not* bring to the 6502, and that is a pretty big difference... But honestly, again, i don't care; it's just my opinion, i don't want to convince anyone.

Kisses :p
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176450)
Quote:
Where did you see that? The first MD ROMs were much slower than that...

If you want to use DMA (ROM -> anywhere you like) you have no choice: ROM must be as fast as WRAM, so 150 ns.
It's entirely dumb to have to copy into WRAM first and then into VRAM (with DMA) because of slow ROM, when you know that DMA can do it much, much faster.

Quote:
but at the cost of their poor efficiency

If this is not biased!!! 65xxx not efficient?? That's news to me!!
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176453)
tomaitheous wrote:
Hardly the bare minimum. Just for clarification, I pretty sure he meant the cpu handling writing to the DSP registers, not generating the waveforms itself. But that said, I've done 4 channels, frequency scaled with volume control, on the PC Engine at 35% cpu resource - all software (and that's a 7.16mhz 8bit 65x). A simple set of 24bit fixed point auto increment regs/pointers hardware on cart would cut that down to 8% cpu resource. No need for a 16mhz arm chip.

Oh, I didn't understand you meant it that way. Yeah, I guess the CPU tied to the S-DSP directly, without the CPU part of the SPC700 chip, would have made some sense, but then there'd be a major RAM problem. Either the whole RAM access is interleaved between the S-DSP and the CPU, and that means sacrificing a huge part of performance just to get sound, or the other option is to have a dedicated sound RAM, accessible through DMA. This would end up pretty much like what the PS1 chip is: it's basically 3x SNES DSP with 512k RAM, accessible directly by the CPU.

The main problem would be that games would mostly update their sound engine at the slow rate of 60/50 Hz, out of laziness, instead of updating it at faster rates like they do, which allows for more precision and effects in sound.

I know squat about the PCE or its hardware, but it sounds like what you did is a real achievement, if you actually managed to render sound in software on the 6502 alone (even if the 6502 is overclocked and has extra instructions). The reasons why GBA games have poor sound rendering are multiple and not only due to technical CPU constraints. However, some games use tricks such as pre-rendering instruments at 12 tones in order to make sound rendering much simpler/faster. My Final Fantasy sound restoration hacks use pre-filtered samples to compensate for the lack of an anti-aliasing filter in the sound engine. In other words: Low CPU usage, good sound quality, decent ROM usage, pick (at most) two.
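
As a rough sketch of the pre-rendering trick mentioned above: an instrument sample can be resampled offline at each of the 12 semitone ratios, so the playback engine only has to copy samples (octaves can then be reached by skipping or doubling samples). The function names and the nearest-neighbour resampling are assumptions for illustration; what any particular game actually shipped is not documented in this thread.

```python
def resample(samples, ratio):
    """Nearest-neighbour resample; ratio > 1 raises the pitch and shortens the output."""
    length = int(len(samples) / ratio)
    return [samples[int(i * ratio)] for i in range(length)]


def prerender_semitones(samples):
    """Return 12 copies of the sample, one per semitone (equal temperament)."""
    return [resample(samples, 2 ** (n / 12)) for n in range(12)]


instrument = list(range(1200))   # dummy waveform data
tables = prerender_semitones(instrument)
print([len(t) for t in tables])
```

The trade-off is the one named in the post: CPU time during playback drops, while ROM usage grows roughly ninefold for each instrument, since the higher-pitched copies are shorter than the original.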
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176474)
Bregalad wrote:
However, some games use tricks such as pre-rendering instruments at 12 tones in order to make sound rendering much simpler/faster.

I think I remember seeing that this was done on the Neo Geo, because I don't think the audio hardware can alter PCM samples in any way, maybe aside from volume.

Bregalad wrote:
The other option is to have a dedicated sound-RAM, accessible through DMA.

Okay, so it's basically getting rid of the SPC 700, but keeping the RAM. Actually, wait a minute, wouldn't the CPU have pretty much no time to write to the ram because the DSP would constantly be reading from it? Wait, then the SPC700 couldn't write to it either... :lol: When does the DSP pull from ram anyway? It can't be a whole frame, otherwise there'd be no way for the SPC700 to update it. But yeah, the SPC700 just seems to make it difficult to update audio ram, while doing processing that wouldn't even make the 65816 break a sweat. I also imagine that if the SPC700 were gone, there'd be more money for other things, like the slow ram.

Bregalad wrote:
Low CPU usage, good sound quality, decent ROM usage, pick (at most) two.

Bye bye ROM usage! :lol: Unless you're truly trying to make a GBA game (or a game on any other system) like it was back then (which why would you want to, there's hundreds of already existing ones) then ROM usage isn't even an issue, or at least to me. I'd like to see how the systems would work if ROM wasn't an issue like it was back in the day, and there are no pre existing games like that. Of course, you have to put some restraint. I almost had a copy of the level tilemap flipped sideways so I could DMA the rows and not use any CPU time to build a buffer, but I realized that that was kind of ridiculous... :lol:

Anyway though, I imagine because the GBA's CPU is such a beast and the games really aren't any more complicated than the SNES's, that you'd spend less than a quarter of the CPU time actually on game logic and the rest on sound. I'm almost certain that how developers programmed for the GBA wasn't as efficient as it was back then on the SNES because they have much more wiggle room (and I'm sure that the fact it was 2001 and not 1991 had something to do with it).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176479)
Espozo wrote:
Actually, wait a minute, wouldn't the CPU have pretty much no time to write to the ram because the DSP would constantly be reading from it? Wait, then the SPC700 couldn't write to it either... :lol: When does the DSP pull from ram anyway?

The audio RAM runs at 3.07 MHz: two slots for the DSP and one for the SMP. In your proposed change, it'd be two for the DSP and one for an interface similar to that of VRAM and the Apple IIGS's audio RAM.
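The slot split described above can be turned into per-unit access rates. This is only a rough sketch: it assumes the "3.07 MHz" figure is a rounded 3.072 MHz and that the three slots repeat in a fixed pattern; the names `AUDIO_RAM_HZ`, `SLOTS` and `accesses_per_second` are made up for illustration.

```python
# Assumption: 3.072 MHz is the unrounded value behind "3.07 MHz".
AUDIO_RAM_HZ = 3_072_000

# Assumption: a fixed repeating pattern of two DSP slots and one SMP slot.
SLOTS = ("DSP", "DSP", "SMP")


def accesses_per_second(owner: str) -> int:
    """Return how many RAM accesses per second the given unit gets."""
    return AUDIO_RAM_HZ * SLOTS.count(owner) // len(SLOTS)


if __name__ == "__main__":
    for unit in ("DSP", "SMP"):
        print(unit, accesses_per_second(unit), "accesses/s")
```

Under those assumptions the SMP's one slot in three works out to roughly 1.02 million accesses per second, and a VRAM-style CPU port taking over that slot would get the same budget.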

Quote:
Unless you're truly trying to make a GBA game (or a game on any other system) like it was back then (which why would you want to, there's hundreds of already existing ones) then ROM usage isn't even an issue, or at least to me.

GBA ROM is limited to 32 MiB for the first player and 256 KiB for players 2-4. There's only one cart I know of that uses a mapper to address more: Shrek and Shark Tale.

Quote:
I imagine because the GBA's CPU is such a beast and the games really aren't any more complicated than the SNES's, that you'd spend less than a quarter of the CPU time actually on game logic and the rest on sound.

A soft mixer at the typical rate (18 kHz) might take about 15% of the CPU. The GSM Full Rate compressed audio decoder used in Luminesweeper takes 60%. But then Doom doesn't use a mixer at all (PSG music, hardcoded samples) because it's spending almost all its CPU time on rendering a pseudo-3D view in software.

Quote:
I'm almost certain that how developers programmed for the GBA wasn't as efficient as it was back then on the SNES because they have much more wiggle room

So is Martin "nocash" Korth. Search for "HLL" in GBATEK.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176484)
tepples wrote:
The GSM Full Rate compressed audio decoder used in Luminesweeper takes 60%.

I honestly have no clue what that means, but if 18 kHz is 15%, then I imagine you could just have a sampler at CD quality (44.1 kHz) which I imagine is about as well as a human can even hear, (I don't know why else it would be a ridiculous number like that) so 15 / 18 x 44.1 = 36.75% of CPU time for audio, which leaves you with 63.25%, which for any game not doing 3D software rendering (maybe aside for the 500+ bullet hell demo) that is way more than enough.

tepples wrote:
There's only one cart I know of that uses a mapper to address more: Shrek and Shark Tale.

I imagine all the flash carts can address more, and they're what you're going to be playing any homebrew games on.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176486)
Espozo wrote:
tepples wrote:
The GSM Full Rate compressed audio decoder used in Luminesweeper takes 60%.

I honestly have no clue what that means

GSM Full Rate, aka GSM 06.10, is a lossy audio codec based on linear prediction. When run at 18 kHz mono, it encodes audio at 30 kbps.
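The 30 kbps figure can be sanity-checked with a little arithmetic. This sketch assumes the usual GSM 06.10 framing of 160 samples per frame and the byte-aligned 33-byte frame packing used by common software implementations (both are assumptions here, not taken from the post); `gsm_bitrate_bps` is a made-up helper name.

```python
# Assumptions: 160 samples per GSM 06.10 frame, packed into 33 bytes.
SAMPLES_PER_FRAME = 160
BYTES_PER_FRAME = 33


def gsm_bitrate_bps(sample_rate_hz: float) -> float:
    """Bit rate of a GSM Full Rate stream run at the given sample rate."""
    frames_per_second = sample_rate_hz / SAMPLES_PER_FRAME
    return frames_per_second * BYTES_PER_FRAME * 8


if __name__ == "__main__":
    for rate in (8_000, 18_000):
        print(rate, "Hz ->", gsm_bitrate_bps(rate) / 1000, "kbps")
```

At the codec's native 8 kHz this gives about 13 kbps, and scaling the sample rate to 18 kHz lands just under 30 kbps, in line with the number quoted above.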

Espozo wrote:
tepples wrote:
There's only one cart I know of that uses a mapper to address more: Shrek and Shark Tale.

I imagine all the flash carts can address more

Only if you can figure out the sequence of writes to switch banks (if it's NOR based) or access the inserted memory card (if it's CF or SD based). This differs from one make and model of flash cart to another, which is why "DLDI" was invented for DS homebrew. It's as if the PowerPak supported only one mapper and the EverDrive another, and you had to mapper-hack games bigger than a certain size to run on a different brand of flash cart. And even then, a CF or SD card will be a lot slower than ROM, making it useful for streaming but not for random access to, say, a large set of frames. Besides, which flash cart's bank switching registers do VBA and NO$GBA emulate?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176493)
tepples wrote:
GBA ROM is limited to 32 MiB for the first player and 256 KiB for players 2-4. There's only one cart I know of that uses a mapper to address more: Shrek and Shark Tale.

That's not how it works, at all.

There are two different methods to implement multiplayer. One involves requiring each player to have their own cartridge, in this case there isn't any limit besides what you can fit in ROM as usual. The other one involves having only one player own a cartridge, in this case the game needs to fit entirely in RAM for all the other players (which is probably what you're referring to). I suppose it could be possible to stream in data from the player that has the cartridge but I assume that's horribly slow.

Note that this also applies to games downloaded using the Game Boy Player (like the Chao Garden minigame or the Nights minigame in some of Sega's GameCube games), they have to fit entirely in RAM.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176494)
Tepples was certainly referring to the multiboot scheme for 2-4 players.

Espozo wrote:
I imagine I imagine I imagine

I imagine
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176498)
tepples wrote:
which flash cart's bank switching registers do VBA and NO$GBA emulate?

Well, how about this, does anything (flash cart or emulator) emulate the Shrek and Shark Tale mapper? I'm saying that it's possible to have more memory, just not easily possible, kind of like having character ram on a Neo Geo cart.

mikejmoffitt wrote:
Espozo wrote:
I imagine I imagine I imagine

I imagine

I should start to proofread my messages... :lol:
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176505)
Stef wrote:
The Saturn is very complex, convoluted but at least it doesn't have real weakness as the SNES can have.
This is from an old part of the conversation, but I really can't let it slide. The Saturn has multiple weaknesses that stem from atrocious design decisions. The SNES, on the other hand, has only one true weakness - the inexcusably slow CPU.

Let's start with the Saturn's two main Hitachi SH-2 CPUs, clocked at 28.6 MHz. Not dual-core, it's two separate chips, and they couldn't both access the memory bus at the same time. As individual chips that were not intended for multiprocessor systems, they had no bus snooping or cache-coherency. Without cache coherency, changes made to memory on one CPU are not guaranteed to be visible on the other CPU until the software running on the writing CPU flushes its relevant cache line and the software running on the reading CPU invalidates its relevant cache line. That's a weakness that practical dual-CPU systems don't have, including systems that predate the Saturn.

Compared to the Playstation's single MIPS R3000A at 33.9 MHz, the Saturn's CPUs are individually both slower and more obscure. MIPS is and was a popular and well-understood CPU architecture, in the same way that the 6502, Z80, and 68000 were popular architectures when designed into the NES, Game Boy, and Megadrive, respectively. The SuperH architecture used in the Hitachi SH-2 really only saw use in microcontrollers and some mobile phones, aside from the Saturn and Dreamcast.

Being hard to program for is not "complexity", it's a weakness. If you can't turn theoretical performance into real performance, it might as well not be there.

Now, let's talk about the Saturn's two Video Display Processors (VDPs). The VDP1 is really just an nVidia NV1 chip. You might have warm, fuzzy feelings about nVidia nowadays, but the NV1 was frankly bizarre. It couldn't do translucent polygons. It used quadrilaterals instead of triangles to build 3-D scenes, contrary to pretty much every 3-D accelerator before or since. The Playstation, on the other hand, only needed 1 chip to be competitive. (Of course, both Saturn and Playstation pale in comparison graphically to the N64, but that system came out almost two years later, so that comparison is unfair.)

Finally, let's talk about FMV. FMV was much better on the Playstation, which had a dedicated video decoder chip. You'd think that Sega would have learned from the Sega CD debacle, but FMV had to be decoded in software on the Saturn, unless the add-on Video CD Card was installed. And how many gamers bought that?

So yeah, the Saturn deserved to lose to the Playstation.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176508)
Quote:
The SNES, on the other hand, has only one true weakness - the inexcusably slow CPU.


A lot of people on this forum disagree with this statement.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176514)
Quote:
The VDP1 is really just an nVidia NV1 chip.


The VDP1 predates the NV1. It is similar in design and features, and that's why a lot of Saturn PC game ports used it.

The similarity is interesting, but I know of no documentation that clearly indicates nVidia designed the VDP1, let alone that it's actually just the NV1's GPU in disguise.

Quote:
A lot of people on this forum disagree with this statement.


Eeeeeh, kind of.

It was definitely atrocious for the generation of variable-width fonts. You could do it with dialogue text, but it was really inadequate for entire inventory screens full of 8x8 text. I had to resort to pre-rendering all possible item names into ROM data, which would never fly back in the days of super expensive mask ROM.

Instructions to left/right shift A by X would have saved it for that and a lot of other use cases.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176516)
psycopathicteen wrote:
Quote:
The SNES, on the other hand, has only one true weakness - the inexcusably slow CPU.

A lot of people on this forum disagree with this statement.

And I'm among them. The S-CPU is plenty fast for 2D games, so long as you have time and skill to write inner loops in assembly language rather than relying on a C compiler.

byuu wrote:
Eeeeeh, kind of.

It was definitely atrocious for the generation of variable-width fonts. You could do it with dialogue text, but it was really inadequate for entire inventory screens full of 8x8 text.

The NES can render a whole page of VWF text in a few frames. And that's with a 40% slower CPU, no 5A22 multiplier to shift the glyph slivers, and no DMA to copy rendered tiles to the PPU. See the help screens in Action 53, RHDE, and the 240p test suite. Once I finish my present paid project, should I port my VWF engine to the Super NES and make an e-book reader app?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176517)
It takes about 3-4 scanlines worth of CPU time to render one 2bpp 72x8 pixel item name. I used every possible trick in the book to optimize my VWF routine to that point. I am confident you couldn't beat my routines.

Unless you have a giant pool of RAM to buffer lots of items, you have to render and blit them one at a time. Including the setup and transfer to VRAM, you're eating 5 scanlines. Vblank is ~37 scanlines long, and you have to do other stuff during it. So, at best, you can get seven item names rendered and uploaded per frame. But more realistically, I could get in only 3-4 due to busy NMI routines.
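The budget above is simple integer division, sketched here with the post's own numbers (37 scanlines of vblank, 5 scanlines per item). The `reserved_scanlines` parameter is a hypothetical stand-in for whatever the rest of the NMI routine costs; it is not a measured value.

```python
VBLANK_SCANLINES = 37      # approximate vblank length from the post
SCANLINES_PER_ITEM = 5     # render + setup + VRAM transfer, from the post


def items_per_frame(reserved_scanlines: int = 0) -> int:
    """How many item names fit in one vblank after other NMI work."""
    available = VBLANK_SCANLINES - reserved_scanlines
    return max(available, 0) // SCANLINES_PER_ITEM


if __name__ == "__main__":
    print("best case:", items_per_frame())
    print("with 20 scanlines of other NMI work:", items_per_frame(20))
```

With nothing else running the ceiling is seven names per frame; reserving around 20 scanlines for other NMI work (an assumed figure) drops that to three, which matches the "3-4" estimate.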

Bahamut Lagoon had screens with 36 text names and 4 menu options on them. Having the entire game lock for 150ms every time you scroll one row in a giant list was completely unacceptable. Dejap's non-ZSNES patch was far worse than mine, and would take a full second, while you watched the entire screen fill with gibberish since they did the updates to the tiledata while the tilemap was still pointing at the old data. But that was kind of shitty programming on their part.

There were also many screens that used 4bpp fonts.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176519)
In a situation like that, my NES VWF engine would be rendering two 72x8 lines per frame. It doesn't cause the menu to "lock", as it's rendered in the background. Here it is in 240p test suite rendering to a 128x176 window (Action 53 is similar):

Attachment: vwf_rendering_speed.gif


If you scroll one row at a time, you can just render 2 lines into the pattern table and update the nametable to reflect different tile numbers.

But I'd be interested (in a separate topic) to see your routines.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176521)
Isn't that more of a RAM problem than a CPU problem if you have to rely on doing everything during vblank?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176530)
LightStruk wrote:
It couldn't do translucent polygons.

Huh, it could... just wasn't very useful. Only 50%, and only within the framebuffer which was a problem because Saturn games often used the tilemaps for the backgrounds and floors. Also it was glitchy because the way it rendered quads meant it would render some pixels twice (making them more opaque than usual). So yeah, it could do translucent polygons but it was better to do checkerboard in practice =P

Surprised you didn't mention the fact the Saturn had to do all 3D calculations in software in your rant - I presume this is the reason why there's a second SH2 on the system, especially since their devkit seemed to default to using it to do all the sorting and 3D math. The PS1 had dedicated hardware to do all the 3D math instead (albeit separate from the GPU which still only saw 2D triangles).

LightStruk wrote:
So yeah, the Saturn deserved to lose to the Playstation.

It was going to lose miserably even if it had the best platform ever because Sega of Japan and Sega of America were actively trying to sabotage each other at the time (and this is something that they were already starting to do by the end of the previous generation). Nothing can save a company that's trying to destroy itself =P
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176531)
LightStruk wrote:
the inexcusably slow CPU.

Even if 3.58 MHz isn't fast enough, the RAM still doesn't even go that fast.

Yeah though, if the two CPUs in the Saturn have to interrupt each other to pull data from ram, then you might as well only have one. :lol:

The N64 was terribly programmed for. Conker's Bad Fur Day is one of the few games I believe really show off the N64's capability. Perfect Dark also looks good, but the framerate rivals a Super FX game. :lol: I don't get what the whole "small texture cache" thing was, I mean, it's not like you can just swap it out. I guess the main problem is that if you had a 128x128 texture that you wanted to sprawl out forever, you'd need to have each 64x64 section be its own quad. Even if you decided to draw all the sections that used that 64x64 texture, and then swapped out the texture in the texture cache, it would still be slower than if you just drew one 64x64 texture that wraps around.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176536)
Quote:
I don't know why else it would be a ridiculous number like that

44100 Hz was taken because it is an integer multiple of both 50 Hz and 60 Hz. (Arguably, any multiple of 300 Hz fits that criterion so I don't know why they picked this instead of any other multiple of 300 above 40000 Hz).
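The divisibility claim is easy to check. This sketch only verifies the arithmetic in the post; the 40000 to 48000 Hz search window is an arbitrary choice for illustration.

```python
import math

# Any rate divisible by both 50 and 60 is a multiple of their LCM.
COMMON_STEP = math.lcm(50, 60)


def is_multiple_of_field_rates(rate_hz: int) -> bool:
    """True if rate_hz is an integer multiple of both 50 Hz and 60 Hz."""
    return rate_hz % 50 == 0 and rate_hz % 60 == 0


# Arbitrary window: every qualifying rate from 40 kHz up to 48 kHz.
candidates = [r for r in range(40_000, 48_001) if is_multiple_of_field_rates(r)]

if __name__ == "__main__":
    print("common step:", COMMON_STEP, "Hz")
    print("44100 qualifies:", is_multiple_of_field_rates(44_100))
    print("44100 / step =", 44_100 // COMMON_STEP)
    print("candidates in window:", len(candidates))
```

So 44100 Hz does qualify, but so do a couple of dozen other rates in the same window, which is exactly the open question raised above.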

Quote:
The SNES, on the other hand, has only one true weakness - the inexcusably slow CPU.

Once again, this statement is complete and utter bullshit.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176543)
Quote:
This is from an old part of the conversation, but I really can't let it slide. The Saturn has multiple weaknesses that stem from atrocious design decisions. The SNES, on the other hand, has only one true weakness - the inexcusably slow CPU.


For me the SNES has some other important flaws in the PPU design: split OAM, constrained sprites (size, tile capacity), VBlank-only access to VRAM... compared to earlier systems such as the Megadrive or the PCE it really hurts to meet that kind of design in 1990, even more so when the CPU is definitely not strong (not to say weak).

Quote:
Let's start with the Saturn's two main Hitachi SH-2 CPUs, clocked at 28.6 MHz. Not dual-core, it's two separate chips, and they couldn't both access the memory bus at the same time.


Of course it is not dual-core; back then dual-core chips did not exist (except for very specific purposes).

Quote:
As individual chips that were not intended for multiprocessor systems, they had no bus snooping or cache-coherency. Without cache coherency, changes made to memory on one CPU are not guaranteed to be visible on the other CPU until the software running on the writing CPU flushes its relevant cache line and the software running on the reading CPU invalidates its relevant cache line. That's a weakness that practical dual-CPU systems don't have, including systems that predate the Saturn.


Again, back then dual-CPU systems were quite rare; I would say that even a cache in the CPU was a recent addition, especially if you consider video game systems.
Of course they couldn't use a real SMP design, which only very costly servers had back then. But in fact the SH-2 was designed with an SMP approach in mind: it has many features to make an SMP architecture easier. For instance it allows you (directly from the memory map) to read/write through the cache or directly from/to memory, precisely to deal with the cache coherency issue between CPUs.

Quote:
Compared to the Playstation's single MIPS R3000A at 33.9 MHz, the Saturn's CPUs are individually both slower and more obscure. MIPS is and was a popular and well-understood CPU architecture, in the same way that the 6502, Z80, and 68000 were popular architectures when designed into the NES, Game Boy, and Megadrive, respectively. The SuperH architecture used in the Hitachi SH-2 really only saw use in microcontrollers and some mobile phones, aside from the Saturn and Dreamcast.


Have you ever tried to develop for the SuperH CPU? It's a very straightforward RISC CPU, very similar to the ARM / MIPS architectures. It uses a smart 16-bit fixed-length instruction set to allow better code density and better performance on systems with a 16-bit bus. ARM actually copied it for its Thumb instruction set.
The CPU by itself is very powerful and I would say probably better than the R3000A at the same clock.
It includes a powerful MAC (Multiply and Accumulate) instruction allowing fast 3D computation without requiring a special DSP for these operations. It also integrates many features such as a DMA controller, a watchdog timer, and dedicated communication channels to talk to a second CPU (used to communicate between CPUs without hogging the main data bus). Also I really liked the fact you could split the cache so you use one part as a very fast scratchpad RAM. All these features made the CPU very efficient. Honestly I think that a single SH-2 at 33 MHz would have been a better choice...

Quote:
Being hard to program for is not "complexity", it's a weakness. If you can't turn theoretical performance into real performance, it might as well not be there.


I do agree with that: a complex system is generally not a good system, as you won't be able to exploit all the performance. But what I meant is that the Saturn by itself is very powerful for what it was initially designed to be (a super 2D system) and does not have any specific performance weakness.

Quote:
Now, let's talk about the Saturn's two Video Display Processors (VDPs). The VDP1 is really just an nVidia NV1 chip. You might have warm, fuzzy feelings about nVidia nowadays, but the NV1 was frankly bizarre. It couldn't do translucent polygons. It used quadrilaterals instead of triangles to build 3-D scenes, contrary to pretty much every 3-D accelerator before or since. The Playstation, on the other hand, only needed 1 chip to be competitive. (Of course, both Saturn and Playstation pale in comparison graphically to the N64, but that system came out almost two years later, so that comparison is unfair.)


I do know all that, but in fact the VDP1 was a reasonable choice if you consider that the system was originally designed to be a "super 2D" system (Gigadrive) with a bit of 3D features, but definitely not a real 3D system like the PSX. Drawing quads (sprites) with many hardware capabilities (deformation, rotation, lighting...) was a really strong feature for making great-looking 2D games. The problem is that we always compare it to the PSX, which was designed to do 3D (and the PSX was really good at that); the Saturn can hardly compete in this domain as it was not initially intended to do that. They boosted the VDP1 (fill rate, lighting features...) so you could use quads (sprites) as polygons for 3D games, changed the main CPU for 2 SH2s, and added the SCU DSP to improve the 3D computation capabilities, but well, that was a quick and dirty addition. The problem is that the PSX brought 3D and Sega tried to follow with its Saturn instead of concentrating on the system's strengths: doing nice 2D games. I would say the best were mixed 2D/3D games such as Clockwork Knight; I think Sega should have stuck to that kind of game...
And the Saturn can actually do translucent polygons; as Sik explained, developers just rarely used it because the way quads are rendered leads to some pixels being rendered twice (and so transparency is applied twice in some parts). Also "mesh transparency" was a hardware feature and it was faster, so almost all developers used it.

Quote:
Finally, let's talk about FMV. FMV was much better on the Playstation, which had a dedicated video decoder chip. You'd think that Sega would have learned from the Sega CD debacle, but FMV had to be decoded in software on the Saturn, unless the add-on Video CD Card was installed. And how many gamers bought that?


Honestly I really don't care about FMV. It's nice that the PSX integrated a video decoder chip (not surprising from Sony), but why spend more bucks on that when you already have 2 SH2 CPUs, which are more than enough to unpack movies? The difference is that video quality was almost always the same on the PSX (and it was great) as the hardware decoder fixed the codec, whereas on the Saturn it really depended on the software codec used. Still, the best FMV on the Saturn (thinking about Panzer Dragoon Saga for instance) is IMO almost on par with the best FMV you had on the PSX.

Quote:
So yeah, the Saturn deserved to lose to the Playstation.


Again, for me the goal wasn't to compare Saturn versus PSX. The PSX is stronger in 3D, there is no debate... but the Saturn was not originally designed for that. I think Sega should have stayed with their initial plan instead of trying to change the architecture at the last minute and ending up with something so convoluted and expensive. That was a big mistake for sure...

Quote:
The VDP1 predates the NV1. It is similar in design and features, and that's why a lot of Saturn PC game ports used it.
The similarity is interesting, but I know of no documentation that clearly indicates nVidia designed the VDP1, let alone that it's actually just the NV1's GPU in disguise.


In fact it looks like the NV1 (STG-2000) was based on the Saturn VDP (and not the contrary). Don't know exactly how this happened... really difficult to find good information about it.

Quote:
Surprised you didn't mention the fact the Saturn had to do all 3D calculations in software in your rant - I presume this is the reason why there's a second SH2 on the system, especially since their devkit seemed to default to using it to do all the sorting and 3D math. The PS1 had dedicated hardware to do all the 3D math instead (albeit separate from the GPU which still only saw 2D triangles).


In fact the Saturn also has dedicated hardware to compute 3D math (the SCU DSP), but very few games actually use it. Probably because it was not easy to use (lack of documentation) and also because the SH2s are already quite capable of processing 3D math (thanks to the MAC instruction).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176544)
My understanding of what you said so far is that the Sega Saturn and Nintendo 64 have two CPUs, one to run the game and one to run the vertex shaders. On the Saturn, these are the two SH2s; on the N64, these are the CPU and RSP. The PlayStation is the odd man out, with T&L on a fixed-function GTE chip.

How far off is this?
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176545)
Bregalad wrote:
Quote:
The SNES, on the other hand, has only one true weakness - the inexcusably slow CPU.

Once again, this statement is complete and utter bullshit.
Since this thread seems to be inspiring fanboi rage, let me start by saying that the SNES is my favorite console of the 16-bit generation, by a very long shot. My frustration with its slow CPU is out of love, not spite.

With that out of the way, let's honestly compare the SNES to its primary competition, the Sega Genesis.

Excluding the CPU, the SNES is more powerful than the Genesis in every way. It could display many more colors at once, had very advanced sprite rotation and scaling (Mode 7), had better digital sound samples, more sophisticated music synthesis, more RAM, and had more buttons on its original controller with shoulder buttons.

The Sega Megadrive was released on October 29th, 1988, with a 7.6 MHz Motorola 68000 CPU. When released in the US as the Sega Genesis, it cost $190. The Super Famicom was released over two years later on November 21st, 1990, with a 3.58 MHz 65816 CPU. When released in the US as the SNES, it cost $200. That's why I say the slow CPU was inexcusable - it was later, slower, and came in a more expensive box.

Megahertz can be deceiving, so was the Megadrive / Genesis CPU really faster than the Super Famicom CPU, despite being two years older and costing less?

Yes. Yes it was.

The Motorola 68000 has 8 general purpose data registers, while the 65816 has 1 true general purpose register (the Accumulator) and 2 less flexible registers (X and Y). More registers means being able to do more calculations right on the CPU without having to incur the penalty of going to main memory. (Don't try to tell me that the 65816's zero-page is like having 256 registers; they're still slower than register-only instructions.) The two CPUs have similar cycle costs for their arithmetic instructions, so it's not like the 65816 has a higher IPC (instructions per clock).

Rumor has it that the Super Famicom was originally going to get a 10 MHz 68000, but combined with the leading-edge graphics and sound, it would have cost too much money. Therefore, Nintendo switched to the 65816. It wasn't a bad decision in some other ways, since it was very similar to the 6502 in the NES, and developers would hit the ground running with the new system. It's a pattern for Nintendo, actually.

SNES CPU - cheap, slow, slightly improved version of predecessor's CPU.
Wii CPU - cheap, slow, slightly improved version of predecessor's CPU.
Wii U CPU - cheap, slow, slightly improved version of predecessor's CPU.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176548)
Bregalad wrote:
(Arguably, any multiple of 300 Hz fits that criterion so I don't know why they picked this instead of any other multiple of 300 above 40000 Hz).
Apparently to give some headroom for the cut-off point in the analog side of the circuit:
Wikipedia wrote:
In addition, signals must be low-pass filtered before sampling to avoid aliasing. While an ideal low-pass filter would perfectly pass frequencies below 20 kHz (without attenuating them) and perfectly cut off frequencies above 20 kHz, in practice a transition band is necessary, where frequencies are partly attenuated. The wider this transition band is, the easier and more economical it is to make an anti-aliasing filter. The 44.1 kHz sampling frequency allows for a 2.05 kHz transition band.
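The 2.05 kHz figure in the quote falls straight out of the Nyquist limit. A small sketch, taking the 20 kHz audible ceiling from the quote; the constant and function names are made up for illustration.

```python
AUDIBLE_LIMIT_HZ = 20_000   # top of the passband, per the quote


def transition_band_hz(sample_rate_hz: int) -> float:
    """Width of the band between the audible limit and the Nyquist frequency."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz - AUDIBLE_LIMIT_HZ


if __name__ == "__main__":
    for rate in (40_200, 44_100, 48_000):
        print(rate, "Hz ->", transition_band_hz(rate) / 1000, "kHz of transition band")
```

A rate only just above 40 kHz would leave almost no room for the filter to roll off, which is one plausible reason not to pick the lowest multiple of 300 Hz available.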


Stef wrote:
In fact the Saturn also has dedicated hardware to compute 3D math (the SCU DSP), but very few games actually use it. Probably because it was not easy to use (lack of documentation) and also because the SH2s are already quite capable of processing 3D math (thanks to the MAC instruction).

The only thing I saw was a division unit that takes up 37 cycles (although the CPUs can go do other stuff in the meanwhile).

I need to look up again but I think multiplication on the SH2 is not single cycle. A quick look in this SH2 doc says (number in parenthesis is cycles contention, no idea what the 3/ means though)

  • "Executed in 1–3 states" for 16×16 MUL
  • "Executed in 2–4 states" for 32×32 MUL
  • "Executed in states 3/(2)" for 16×16 MAC ← (gap = 1-3?)
  • "Executed in 2–4 states 3/(2~4)" for 32×32 MAC

And on top of this there's extra cycles involved in bit shifting because it uses fixed point for non-integer calculations, and it's using matrices so there are quite a lot of these operations. I need to take a look at the PS1 again but I think the vector processor can do several of these operations all at once.
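For readers unfamiliar with the shifting overhead mentioned above, here is a minimal sketch of a fixed-point multiply-accumulate, the core of one matrix-row times vector step. The 16.16 format and all helper names are assumptions for illustration, not a description of any particular Saturn library.

```python
FRAC_BITS = 16  # assumption: 16.16 fixed point (16 integer bits, 16 fractional)


def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(f: int) -> float:
    """Convert a 16.16 fixed-point value back to a float."""
    return f / (1 << FRAC_BITS)


def fixed_dot(row, vec) -> int:
    """Dot product of two fixed-point sequences, as one matrix-row step."""
    acc = 0
    for a, b in zip(row, vec):
        acc += a * b           # multiply-accumulate; product has 32 fractional bits
    return acc >> FRAC_BITS    # the extra shift that brings the sum back to 16.16


if __name__ == "__main__":
    row = [to_fixed(0.5), to_fixed(2.0), to_fixed(-1.0)]
    vec = [to_fixed(4.0), to_fixed(0.25), to_fixed(1.5)]
    print(to_float(fixed_dot(row, vec)))
```

Accumulating at full width and shifting once per row keeps the shift count down, but a 3D transform still needs one such correction for every output component, on top of the multiply cycles listed above.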

The Saturn still had other problems anyway, like the complete inability to do texture modulation (only addition and subtraction, much like color blending on the SNES - this is why lighting on the Saturn is so different from the PS1 for anything textured, although whether it looks worse or not when you don't go realistic is a different matter =P).

tepples wrote:
My understanding of what you said so far is that the Sega Saturn and Nintendo 64 have two CPUs, one to run the game and one to run the vertex shaders. On the Saturn, these are the two SH2s; on the N64, these are the CPU and RSP. The PlayStation is the odd man out, with T&L on a fixed-function GTE chip.

How far off is this?

Not that off as far as I know.

LightStruk wrote:
Since this thread seems to be inspiring fanboi rage

The thread was fanboy bait by definition, what did you expect? =P

LightStruk wrote:
Rumor has it that the Super Famicom was originally going to get a 10 MHz 68000, but combined with the leading-edge graphics and sound, it would have cost too much money. Therefore, Nintendo switched to the 65816.

http://forums.sega.com/showthread.php?4 ... -processor

Sadly their source is gone, so I can't check to see if there were any links backing it up. Mentioning a name makes it sound like this is something you'd find in an interview. That said, the SNES address space really makes it look like NES games would have been directly intended to run on it instead of having a separate mode like Genesis vs Master System, so I find it really hard to believe it could have ever used anything that wasn't compatible with the 6502.

Then again apparently parts of Sega wanted the Saturn to have a 68020 (or 68030, sources disagree on this) but it was very obvious the engineers never considered that as an actual option. An overpowered Mega Drive with a faster CPU and 3D tacked on top (and probably CD drive and more RAM) may have fared better than the Saturn, honestly (even if just because it'd have been cheaper), but again Sega was too busy sabotaging itself anyway.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176550)
LightStruk wrote:
(SNES) had very advanced sprite rotation and scaling

Nope.

I'm really not getting the whole system sabotaging thing going on here, when it's obvious that the people slamming on the other systems by using as many negative adjectives (that all mean the same thing) as possible have never even programmed for said systems.

Sik wrote:
LightStruk wrote:
Since this thread seems to be inspiring fanboi rage

The thread was fanboy bait by definition, what did you expect? =P

Exactly, which is why this is deserving of a lock about now. Also, LightStruk, don't act all high and mighty like it isn't in you, because it is in me right now. It makes me angry, frustrated, and slightly upset.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176551)
Sik wrote:
The only thing I saw was a division unit that takes up 37 cycles (although the CPUs can go do other stuff in the meanwhile).


http://koti.kapsi.fi/~antime/sega/files ... 042795.pdf
The DSP can be programmed and works independently, as it has its own memory and even DMA capabilities. Normally it was used to compute matrix multiplications, but very few games use it. As you can see, the documentation is minimalist...
The division unit was external, as you pointed out, but the SH2 could compute divisions faster (1 bit of division per cycle, if I remember correctly).
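
For reference, "one bit per step" is how a classic shift-and-subtract divider works. The following is a minimal sketch assuming plain unsigned restoring division; `divide_bitwise` is a hypothetical name, and the SH2's real division step may well differ in detail.

```python
def divide_bitwise(dividend, divisor, bits=32):
    """Unsigned restoring division that produces one quotient bit per step."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


result = divide_bitwise(1000, 7)
```

Under this scheme a 32-bit quotient takes 32 such steps, so the total cost depends on how many cycles each step needs plus any setup.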

Quote:
I need to look up again but I think multiplication on the SH2 is not single cycle. A quick look in this SH2 doc says (number in parenthesis is cycles contention, no idea what the 3/ means though)

  • "Executed in 1–3 states" for 16×16 MUL
  • "Executed in 2–4 states" for 32×32 MUL
  • "Executed in states 3/(2)" for 16×16 MAC ← (gap = 1-3?)
  • "Executed in 2–4 states 3/(2~4)" for 32×32 MAC

And on top of this there's extra cycles involved in bit shifting because it uses fixed point for non-integer calculations, and it's using matrices so there are quite a lot of these operations. I need to take a look at the PS1 again but I think the vector processor can do several of these operations all at once.


I guess the vector processor is indeed more efficient at that, but the SH2 by itself is already quite capable. A 32x32 MAC in 2-4 states is damn fast. You also have quick 2-, 8- and 16-bit shifts (1 cycle), which is very convenient for fixed-point math.
I was really pleased when I discovered this CPU: a really nice and efficient design :)
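
As a rough illustration of why a fast 16-bit shift matters here, this is a minimal sketch of 16.16 fixed-point math. The 16.16 format and all names are assumptions for the example, not something taken from the SH2 documentation.

```python
FRAC_BITS = 16            # assumed 16.16 format: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to 16.16 fixed point."""
    return int(round(x * ONE))


def fixed_mul(a, b):
    """Multiply two 16.16 values; the raw product has 32 fractional bits,
    so shift right by 16 to get back to 16.16."""
    return (a * b) >> FRAC_BITS


def fixed_dot(row, vec):
    """Dot product: accumulate the full-width products first (as a
    multiply-accumulate would), then shift once at the end."""
    acc = 0
    for a, b in zip(row, vec):
        acc += a * b
    return acc >> FRAC_BITS


row = [to_fixed(0.5), to_fixed(2.0), to_fixed(-1.0)]
vec = [to_fixed(4.0), to_fixed(0.25), to_fixed(1.0)]
dot = fixed_dot(row, vec)   # 0.5*4 + 2*0.25 - 1*1 = 1.5
```

Each matrix row costs one accumulation pass plus a single shift, which is why the shift count per operation adds up quickly when transforming many vertices.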

Quote:
The Saturn still had other problems anyway, like the complete inability to do texture modulation (only addition and subtraction, much like color blending on the SNES - this is why lighting on the Saturn is so different from the PS1 for anything textured, although whether it looks worse or not when you don't go realistic is a different matter =P).


In fact, as far as I remember, it only supports 0.5 + 0.5 blending for sprites, but backgrounds have more blending operations available.
Still, on the Saturn you can create crazy effects by abusing the Gouraud shading on sprites (yeah, the VDP1 supports Gouraud shading). In fact, you could apply Gouraud shading to a paletted texture, in which case the palette index was interpolated (not the color), and you can obtain some weird results. I remember some people doing cel shading using that trick, but you can probably do much more :p
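
The palette-index trick can be sketched like this. It is a toy model only: the span interpolation, the 16-entry palette layout and every name here are assumptions for illustration, not a description of how the VDP1 actually computes it.

```python
def shade_span(base_index, shade_start, shade_end, length):
    """Linearly interpolate a shade offset across a span of pixels and
    add it to the palette index (not to the color itself)."""
    steps = max(length - 1, 1)
    return [
        base_index + shade_start + (shade_end - shade_start) * x // steps
        for x in range(length)
    ]


# Hypothetical palette laid out as four flat tones of four entries each.
PALETTE = ["dark"] * 4 + ["mid"] * 4 + ["light"] * 4 + ["highlight"] * 4

indices = shade_span(base_index=0, shade_start=0, shade_end=15, length=8)
bands = [PALETTE[i] for i in indices]
```

Because the gradient walks through palette entries instead of blending colors, a palette arranged in flat tones turns a smooth gradient into hard bands, which is the cel-shading look.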

Quote:
Then again apparently parts of Sega wanted the Saturn to have a 68020 (or 68030, sources disagree on this) but it was very obvious the engineers never considered that as an actual option. An overpowered Mega Drive with a faster CPU and 3D tacked on top (and probably CD drive and more RAM) may have fared better than the Saturn, honestly (even if just because it'd have been cheaper), but again Sega was too busy sabotaging itself anyway.


I heard they planned to use a 16 MHz V60 CPU (as in the System 32 and Model 1) but quickly discarded it, as it was judged too weak.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
by on (#176552)
An "overpowered Genesis with CD and 3D tacked on" would have been more like a 32X. The Saturn stuff is interesting; feel free to continue it in a new topic in Other Retro Dev.

But this topic has run its course, and we've got a vote to lock (koitsu), a second (Espozo), and a motive (derailed into flamebait).
