Developers complained about the SNES's CPU so much back in the day that I wonder why nobody complained much about the NES's CPU, since the NES was even slower.
Easy: the NES CPU wasn't slower than contemporary game systems, and it was faster than the previous generation (Atari 2600, etc.).
The NES CPU runs at the same speed as that of the Atari 7800 and (assuming a 2:1 IPC ratio*) the Master System. The 65816 in the Super NES likewise runs about the same instructions per second as the 68000 in the Genesis (proof), and it has almost the same data bus bandwidth. One problem is the 65816's relative lack of certain C-friendly constructs present in the 68000, such as 32-bit registers, autoincrement addressing modes, abuse of the address generator as an ALU with the LEA instruction, and 16-bit multiplication and division. (The built-in multiplier and divider on the S-CPU are 8-bit.) Switching between operating on 8- and 16-bit values requires setting a mode (using REP or SEP), which takes programmer care to ensure that the mode is correct at subroutine entry and exit. The packed pixel format of the Genesis VDP is also more convenient for certain data compression methods.
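To make that concrete, here is a minimal sketch of the REP/SEP bookkeeping (ca65-style syntax; the routine name and the "score" variable are hypothetical, not from any actual game):
Code:
        .a8                 ; A is 8 bits on entry
        .i16                ; X/Y are 16 bits on entry
add_bonus:
        php                 ; save the caller's M and X width flags
        rep #$20            ; clear M: A becomes 16 bits wide
        .a16
        lda score           ; "score" is a hypothetical 16-bit variable
        clc
        adc #1000
        sta score
        plp                 ; restore the caller's register widths
        .a8
        rts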
It is, however, an improvement over the 6502, with new addressing modes that take some of the load off the X and Y registers: the relocatable direct page acts as a frame pointer, freeing X; direct page indirect without indexing frees Y; and stack-relative addressing frees X. Dedicated instructions (JML, PHK, PLB, PHB) and long addressing modes make bank switching less painful. And the S-CPU has a DMA unit at $004300 that can perform both Blast Processing (hardware-accelerated memcpy to video memory) and automatic raster effects.
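For anyone curious what that "Blast Processing" amounts to in practice, a rough sketch of a DMA copy to VRAM on channel 0 follows (ca65-style syntax; "src" and "SIZE" are hypothetical placeholders, and this assumes it runs during vblank or forced blank):
Code:
        sep #$20            ; 8-bit A
        rep #$10            ; 16-bit X/Y
        .a8
        .i16
        lda #$80
        sta $2115           ; VMAIN: increment VRAM address after the high byte
        ldx #$0000
        stx $2116           ; VMADD: destination word address in VRAM
        lda #$01
        sta $4300           ; DMAP0: CPU to PPU, write two bytes to $2118/$2119
        lda #$18
        sta $4301           ; BBAD0: B-bus target = $2118 (VMDATA)
        ldx #.loword(src)
        stx $4302           ; A1T0: source address, low 16 bits
        lda #^src
        sta $4304           ; A1B0: source bank
        ldx #SIZE
        stx $4305           ; DAS0: number of bytes to copy
        lda #$01
        sta $420B           ; MDMAEN: fire channel 0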
* Let C64 fans who fought in the C64 vs. Speccy flame wars correct me.
Shiru wrote:
Easy: the NES CPU wasn't slower than contemporary game systems, and it was faster than the previous generation (Atari 2600, etc.).
Okay, that's a good point, but I still wonder why it seems like everyone totally forgot everything they knew about the 6502 when the SNES came out. There are some games that run smoother on the NES than their SNES counterpart, which doesn't make sense because they could've used practically the same code for both systems, and the SNES would've still run a bit faster.
The entire premise of this thread is wrong.
Everybody who works with any CPU has problems with it. Software engineering is difficult, and the CPU's architecture is a huge set of problems to deal with. You really haven't heard of anyone complaining about the NES CPU before? You aren't looking.
psycopathicteen wrote:
There are some games that run smoother on the NES than their SNES counterpart
I guess that by the time the SNES came out, programmers were already starting to abandon the practice of coding directly in ASM, favoring higher-level languages like C, so even when games weren't made in C, most programmers weren't experienced enough with the CPU to get the most out of it.
The previous generation of consoles used mostly the 6502 and the Z80, which were also present in a multitude of personal computers which people used when learning to code, so there were many more 6502 and Z80 experts than 65816 or 68000. Programmers that stayed in the industry for longer periods probably made better use of the 65816.
I'm just guessing, of course.
psycopathicteen wrote:
There are some games that run smoother on the NES than their SNES counterpart, which doesn't make sense because they could've used practically the same code for both systems
Not if Nintendo required a developer to make conspicuous use of the Super NES features on which Nintendo was spending loads of marketing money.
- NES doesn't have to stream sound effect samples to the SPC700; instead, it can bankswitch PRG ROM at $C000-$DFFF.
- NES doesn't have to memcpy huge tile data every frame for tile animation; instead, it can bankswitch CHR ROM.
- NES palette updates are smaller.
- NES doesn't have to update three layers of nametables, though the more orthogonal attribute system probably balances this out.
- NES doesn't have to update the size and high X flags to keep sprites from popping up at far left.
As for computers, the Apple IIGS was using the 65816, and the Mac, Amiga, and Atari ST were using the 68000. How many Genesis game developers came from the Amiga and ST scenes?
tokumaru wrote:
psycopathicteen wrote:
There are some games that run smoother on the NES than their SNES counterpart
I guess that by the time the SNES came out, programmers were already starting to abandon the practice of coding directly in ASM, favoring higher-level languages like C, so even when games weren't made in C, most programmers weren't experienced enough with the CPU to get the most out of it.
The previous generation of consoles used mostly the 6502 and the Z80, which were also present in a multitude of personal computers which people used when learning to code, so there were many more 6502 and Z80 experts than 65816 or 68000. Programmers that stayed in the industry for longer periods probably made better use of the 65816.
I'm just guessing, of course.
There were a lot of people who were experts with the 68000 because it was widely used in arcade games. I have always questioned whether SNES games were actually written in asm, given how time-consuming it is and the amount of slowdown in some games.
Viewpoint for Genesis is a port of an arcade game, and it runs on a 68K based console, but it still becomes molasses whenever more than one enemy is on the screen.
To answer the original question, I think nobody really had much of a problem with the SNES CPU except Sega fanboys. The CPU was the only part of the Megadrive's hardware that was not clearly much worse than the SNES's (the two are about equal in processing power, as shown by tepples). However, since it was clocked about 2x faster, Sega used it as a marketing argument to trick people into thinking the CPU was faster, with so-called "Blast Processing". Back then, nobody really knew anything about processor architecture except professionals, and the general public was under the wrong assumption that clock frequency was the only way to measure a CPU's performance.
Unfortunately, clock frequency only measures relative performance between two processors of the same architecture.
Quote:
Switching between operating on 8- and 16-bit values requires setting a mode (using REP or SEP), which takes programmer care to ensure that the mode is correct at subroutine entry and exit.
On a related note, is there any use for registers smaller than 16 bits, other than writing to 8-bit memory-mapped I/O? Or is the point to reduce the code or data size? Considering how large SNES carts and the system's RAM can be, I think it's much less of an issue than on the NES.
Bregalad wrote:
On a related note, is there any use for registers smaller than 16 bits, other than writing to 8-bit memory-mapped I/O? Or is the point to reduce the code or data size?
Or when another IC on the bus expects 8-bit data, such as a row of tile data in a proportional font engine or a software sprite occlusion engine, or the entries in a display list to be copied to OAM.
Quote:
Considering how large SNES carts and the system's RAM can be, I think it's much less of an issue than on the NES.
A lot of games have so much tile data, so much map data, or (especially in RPGs) so much text that even 8 bits are too much, and they need to compress the data. A lot of RPGs use digram coding (also called BPE or DTE), which allocates half of the 8-bit space to a binary tree representing common combinations of characters that appear together. For example, $80 could represent "qu" and $81 could represent $80 followed by "e", squeezing "que" into one byte. Others use Huffman coding, assigning short bit strings either to individual characters or to entire words.
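For the curious, a DTE decoder is only a screenful of code. Here is a rough, untested sketch in ca65-flavored 65816 assembly; every name in it (text_ptr, dte_first, dte_second, out_buf, and the scratch bytes) is hypothetical rather than taken from any shipping game, and text_ptr is assumed to be a 2-byte direct page pointer:
Code:
        .a8                 ; 8-bit A, 16-bit X/Y throughout
        .i16
dte_decode:
        ldy #0              ; input index
        ldx #0              ; output index
        stz pending         ; count of deferred second halves on the CPU stack
        stz tmp+1           ; high byte of the table index stays zero
fetch:
        lda (text_ptr),y    ; next encoded byte
        beq done            ; $00 terminates the string
        iny
expand:
        cmp #$80
        bcc emit            ; $00-$7F is a literal character
        sbc #$80            ; carry is already set here, so this is code - $80
        sta tmp             ; pair number
        phy                 ; save the input index
        ldy tmp
        lda dte_first,y     ; first half of the pair
        sta first
        lda dte_second,y    ; second half of the pair
        ply                 ; restore the input index
        pha                 ; defer the second half on the CPU stack
        inc pending
        lda first
        bra expand          ; the first half may itself be another pair code
emit:
        sta out_buf,x       ; literal: append it to the output buffer
        inx
        lda pending
        beq fetch           ; nothing deferred, so read the next input byte
        dec pending
        pla                 ; otherwise resume with the deferred half
        bra expand
done:
        sta out_buf,x       ; copy the terminating zero as well
        rts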
Star Ocean and Street Fighter Alpha 2 use so much compression on their tile data that it takes a separate IC in the cart to undo it.
psycopathicteen wrote:
There were a lot of people who were experts with the 68000 because it was widely used in arcade games. I have always questioned whether SNES games were actually written in asm, given how time-consuming it is and the amount of slowdown in some games.
Lots of SNES games, and I'm sure Genesis games, were programmed in assembly. Some were also programmed in C. At the end of the day, their performance is what matters. Just because a game suffers slowdown doesn't mean it was programmed in C. It's entirely possible that a game is programmed quite efficiently but simply has too large a workload to avoid slowdown, due to the game's design or the exact situation occurring.
I do recall hearing that Capcom CPS2 arcade games were programmed in C. Its era was from 1993 onward, and it of course used the 68000. I imagine a lot fewer console games prior to the 32-bit generation were programmed in C than in ASM. ASM may be more time-consuming, but we are talking about people for whom this was their job. It's different from being a hobbyist.
MottZilla wrote:
It's entirely possible that a game is programmed quite efficiently but simply has too large a workload to avoid slowdown, due to the game's design or the exact situation occurring.
The question was why a Super NES game would have a substantially larger workload than the NES version of the same game.
Tepples mentioned the NES not having to update CHR tiles every frame. I don't think many early SNES games had elaborate animation schemes. I think most early SNES games used DMA just for the player and bosses; everything else just remained in VRAM.
psycopathicteen wrote:
I wonder why nobody complained much about the NES's CPU, since the NES was even slower.
Well, I did put this poetry in the silent version of Solar Wars, in lieu of music:
Code:
Ode to the NES
When I first touched you, it was love at first sight.
Your games were so fun, and your graphics so bright.
The years have gone by, and you're great all the same,
So I learnt of your hardware and coded a game.
I studied your specs as I programmed about,
But I certainly didn't like what I had found out:
That your memory was minuscule, and your processor slow,
And your PPU limited, and your resolution low.
And your timing so finicky, it made my head spin,
And your carts that didn't work when they were pushed in.
Not to mention a palette so daft that it hurt,
But your composite display was by far the worst.
So, I'm leaving you, dear NES, maybe even for good,
I'd come back if you improved, but I don't think you would.
In your time you were a most spectacular machine,
But you haven't got shit on the Turbografx-16.
Oct 10, 1999
Bregalad wrote:
On a related note, is there any use for register smaller than 16-bit, other than writing to 8-bit memory mapped I/O ?
There is use. Imagine you reading a text string, you need to process one byte at a time, rather than a word. Also, 8 bit operations are faster on 65816.
All my 65816 programming has kept A 8-bit, for manipulation of registers and byte-oriented data, and X and Y as 16-bit, for loop counters, memory pointers, and light 16-bit arithmetic (compare, increment/decrement, load/store). I've only used 16-bit A where it's worth the hassle of switching and being careful about calling routines which assume A is 8 bits.
Actually, I think I have heard or read somewhere that someone actually complained about the lack of decimal mode in the 2A03 and wrote their complaints in an unused area of the ROM (if you dumped the ROM, you could read it).
No decimal mode? No problem. I just wrote a 6-scanline 16-bit binary-to-decimal converter for one game and base-100 scoring (with 8-bit conversion of each digit pair in less than one scanline) for another game. It's not that hard.
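For reference, the digit-pair half of that idea is only a handful of instructions. A hedged sketch (plain 6502-compatible, ca65 local labels; not the actual Solar Wars routine):
Code:
; Entry: A = binary value 0-99.  Exit: tens digit in X, ones digit in A.
bin_to_digit_pair:
        ldx #0
@tens:
        cmp #10
        bcc @done           ; less than 10 left: A holds the ones digit
        sbc #10             ; carry is already set, so this subtracts exactly 10
        inx                 ; count another ten
        bcs @tens           ; carry stays set because the result is still >= 0
@done:
        rts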
psycopathicteen wrote:
Tepples mentioned the NES not having to update CHR tiles every frame. I don't think many early SNES games had elaborate animation schemes. I think most early SNES games used DMA just for the player and bosses; everything else just remained in VRAM.
Zombies Ate My Neighbors actually DMAs each frame of animation when called for, even the player. That game is probably on the higher-end of early SNES games in terms of vram management.
strat wrote:
Zombies Ate My Neighbors actually DMAs each frame of animation when called for, even the player.
And that's even before you get to the Game Boy Advance, which has enough video memory bandwidth to choke a horse.
Cool. Is the vram organized into slots like DKC does?
The game keeps track of what objects are active and writes their animation frames to vram slots occupied by expired objects. Even the player graphics aren't always in the same vram location.
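A generic version of that bookkeeping might look something like this (a hedged sketch in ca65-style 65816 assembly; NUM_SLOTS, slot_owner, and the routine name are made up for illustration, and the real game surely organizes it differently):
Code:
NUM_SLOTS = 16              ; hypothetical number of per-object VRAM slots

; Entry: A = object ID (assumed < $FF).  Exit: X = claimed slot, or $FFFF if full.
; Assumes 8-bit A and 16-bit X/Y; slot_owner is a NUM_SLOTS-byte table
; holding $FF for each free slot.
alloc_vram_slot:
        pha                 ; keep the object ID while we scan
        ldx #0
@scan:
        lda slot_owner,x
        cmp #$FF            ; is this slot free (its owner expired)?
        beq @claim
        inx
        cpx #NUM_SLOTS
        bne @scan
        ldx #$FFFF          ; every slot is still owned by a live object
        pla
        rts
@claim:
        pla                 ; recover the object ID
        sta slot_owner,x    ; record the new owner
        rts                 ; caller then DMAs the object's frame into slot X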
tepples wrote:
strat wrote:
Zombies Ate My Neighbors actually DMAs each frame of animation when called for, even the player.
And that's even before you get to the Game Boy Advance, which has enough video memory bandwidth to choke a horse.
In the linked page, tepples wrote:
Sprite is a trademark of the Coca-Cola Company, but only if it looks like a soda can (see http://pics.pineight.com/nes/spritecans-2011.zip).
That made my day, and it has just begun
The main problem, as pointed out earlier, is that there's way more stuff to handle on the SNES compared to the NES.
The split X coordinate of the sprites certainly isn't helping on the SNES either...
TmEE wrote:
In the linked page, tepples wrote:
Sprite is a trademark of the Coca-Cola Company, but only if it looks like a soda can (see http://pics.pineight.com/nes/spritecans-2011.zip).
That made my day, and it has just begun
I never tire of that demo, and I haven't seen the 2011 remake yet. One of my favorites.
I wonder why Nintendo didn't extend the OAM to 8 bytes per sprite. They would've been able to use the entire 64kB of vram for sprite patterns and have flexible sprite sizes. Was it that much more expensive to have 1024 bytes instead of 544 bytes? Did Nintendo think 128 "bad" sprites were better than 64 "good" sprites?
Ugh, I wrote this long post without noticing that psycopathicteen was talking about the SNES instead. Oh well. But I put enough thought into it that I don't just want to discard it...
Anyway, I'm guessing Nintendo took a lot of heat over the NES's "only" having 64 sprites, and so they chose to provide 128 mediocre sprites instead of 64 adjustable ones.
The NES shows a lot of signs of being a "skunkworks" project, kinda like the original IBM PC: there are a lot of little weird nits that are odd in hindsight. Some of them are for simplicity or cost (PPU fetch pattern), or show certain assumptions as to how it would be used (noise frequencies). Others show a lack of in-depth understanding of the field (DPCM frequency choice), or a lack of time for testing (viz. nesdevwiki:Errata).
For example:
- The PPU keeps its "nametable - attribute - pattern - pattern" fetch cadence going from RAM for the entire scanline, 42 times. The spare bandwidth could have been used for a lot of purposes.
- The PPU's hardware palette could have been tweaked to significantly increase the number of colors available, especially in the ranges of desaturated or darker colors.
- The APU's lookup tables used for noise and DPCM are kinda (ok, a lot) off. And the LUTs take up space, so it's not even clear they're a win on complexity.
- The APU's noise channel tonal flag randomly provides period 31 or period 93, depending on the exact state of the LFSR when the tonal mode is switched.
- The APU's length counters are only useful in the simplest of music engines.
- The APU's DPCM sample length counter (16n+1 bytes) doesn't provide the most useful values, and the restriction that sample data (mostly) only come from the top quarter of the CPU's memory space is inconvenient, since that region cohabits with the hardware vectors.
- The APU's DACs use nMOSFET pullups rather than nMOSFET pulldowns, causing the nonlinear mixing of channels.
- The APU's DACs are internally implemented using 2ⁿ MOSFETs for the nth bit, when they could have used gate geometry to the same effect.
- The CPU's joypad interface looks like it was originally intended for parallel input rather than serial.
- The CPU's memory map could have been made less goofy by incorporating the functionality of the 74'139 into the CPU die.
- The CPU's lengthened M2 might be what causes OAMADDR write corruption.
- The CPU's DMA unit is minimalist and inflexible, and the PPU's primary bus is slow; being able to DMA to the PPU's primary bus would have been greatly useful.
psycopathicteen wrote:
I wonder why Nintendo didn't extend the OAM to 8 bytes per sprite. They would've been able to use the entire 64kB of vram for sprite patterns and have flexible sprite sizes. Was it that much more expensive to have 1024 bytes instead of 544 bytes? Did Nintendo think 128 "bad" sprites were better than 64 "good" sprites?
On the MD the sprite table lives in VRAM, but on the NES and SNES it is inside the PPU itself. 1024 bytes is almost twice as much as 544, and that needs quite a bit of extra die space...
The SNES even uses PSRAM as VRAM while the MD has DRAM, so I don't really see why they could not have had the sprite stuff in VRAM as well; PSRAM certainly has the bandwidth needed... That would have also left a lot of space on the chip itself for making the sprites nicer. It is kind of funny that the MD does more with inferior memory; there are even access slots midframe to do stuff (but not a whole lot).
I think the access slots that the Genesis uses for sprites and midframe stuff correspond to access slots that the Super NES uses for the third layer.
There are only 18 of them, though, byte- or word-wide depending on whether you access VRAM or CRAM/VSRAM.
HBL is fully dedicated to sprite processing for the coming line, and the active line is spent fetching BG tiles, tilemap entries, and VSCROLL values, plus refresh slots and free access slots (refresh steals every 4th access slot).
But yes, it is kinda the same deal as with the TG16/PCE, where you have full access to VRAM at all times, but only because there's only one BG layer. That machine also uses PSRAM as VRAM. It should have had a DMA unit to move data; the CPU cannot saturate the bandwidth in any case, and the chip can do way more than games show, much like the Master System, where the CPU can only use up 20% of the available bandwidth.
TmEE wrote:
But yes, it is kinda the same deal as with the TG16/PCE, where you have full access to VRAM at all times, but only because there's only one BG layer. That machine also uses PSRAM as VRAM. It should have had a DMA unit to move data; the CPU cannot saturate the bandwidth in any case
So that's why the TG16 lost to the Genesis in Latin language markets: no Blast Processing.
I thought it was because Bonk sucks? Or did Sonic just have too much rockin' cool 90's attitude?
I can confirm that BG3 is the reason why the SNES doesn't have access cycles or an OAM located in VRAM. It takes 3 pixels to fetch BG1, 3 pixels to fetch BG2, and 2 pixels to fetch BG3 (hence why BG3 is only 2bpp), which adds up to 8 pixels.
Has anyone actually tried to measure what is happening on the VRAM bus in the SNES?
Hooking some data lines to the RGB signal can produce some pretty good insight:
http://www.tmeeco.eu/BitShit/MoreSpriteFun.jpg
Maybe Keith Courage in Alpha Zones wasn't as captivating a pack-in title as Altered Beast.
http://wiki.superfamicom.org/snes/show/Timing
Whoever wrote this document might have done so. The way it is described sounds extremely similar to the VDP timing sheet on SpritesMind, except the sPPU data bus is half the size but twice the speed, and fetches one tile at a time instead of two tiles at once.
I know that the 65816 requires fast memory because it has a multiplexed data/address bus, but since the SNES uses a custom Ricoh chip, couldn't they have gotten the bus multiplexer removed and allowed the chip to run at 5.37 or 7.16 MHz instead? Because the DMA is built into the Ricoh chip, couldn't they have run the DMA at 5.37 or 7.16 MHz too?
Multiplexing has little to do with it. Even the non-multiplexed 6502 needs a half cycle to generate each address. A 6502 family CPU divides the CPU cycle into low and high halves. An address is ready by the start of the high half, and the data has to be ready by the end.
If it helps, think of each 65816 cycle as two cycles, address generation and memory access, because other CPUs like Z80 and 68000 that allow a whole cycle for memory access tend to be clocked faster to compensate for their lower IPC. If you put a divide by 2 in front of a 65816's clock input, that'd correspond to an "internal operation" cycle after each memory access. And just as the Amiga used the 68000's copious "internal operation" cycles for video, the Apple II used "internal operation" half-cycles for video.
I'm surprised how many bottlenecks there were with designing consoles back then. Requiring 2x fast memory was mostly an issue with CPUs, right? PPUs and support chips didn't need RAM that was 2x as fast as the access speed?
It's a lot easier to use slower memory for video memory, because the fetch order is not only almost entirely predictable but almost entirely in-order. So any high-latency high-bandwidth technique works here, such as increasing the bus width.
Except of course where there's indirection involved, such as tile fetches whose address depends on a nametable fetch.
tepples wrote:
And just as the Amiga used the 68000's copious "internal operation" cycles for video, the Apple II used "internal operation" half-cycles for video.
C64 is the same. The VIC-II is able to fetch most, but not all, of its data during those same half-cycles. Outside of that, on one of every eight scanlines, the VIC-II steals 40 cycles from the 6510 to read character pointers (analogous to nametable data) and two cycles for each sprite that's active.
What if the SNES kept the CPU at 2.68 and 3.58 MHz, but sped up the DMA to 5.37 MHz? Would that have worked?
I always thought of DMA as read on one half-cycle, write on the other. You can't be sure that the data coming out of the source is correct until the end of the read half-cycle, and the value needs to be held on the data bus during the write half-cycle.
I always thought it reads from bus A and writes to bus B at the same time.
tepples wrote:
Multiplexing has little to do with it. Even the non-multiplexed 6502 needs a half cycle to generate each address. A 6502 family CPU divides the CPU cycle into low and high halves. An address is ready by the start of the high half, and the data has to be ready by the end.
If it helps, think of each 65816 cycle as two cycles, address generation and memory access, because other CPUs like Z80 and 68000 that allow a whole cycle for memory access tend to be clocked faster to compensate for their lower IPC. If you put a divide by 2 in front of a 65816's clock input, that'd correspond to an "internal operation" cycle after each memory access. And just as the Amiga used the 68000's copious "internal operation" cycles for video, the Apple II used "internal operation" half-cycles for video.
The HuC6280 in the TG16 (a 65x variant) doesn't require a half cycle: 120 ns ROM runs all day with no problems at the CPU's 7.159 MHz bus access rate. If Hudson could pull that off (NEC didn't design any of the three main chips), then surely the Big N could have as well. And the PC-Engine came out before the SNES (2 years before IIRC).
The address bus is only guaranteed to be correct by the time φ2 rises, but in practice the transitions happen very shortly after the start of φ1. Most ROMs' /OE-to-data time is half that of their /CE-to-data time, so as long as /CE is only derived from the address bus on the TG16 (and not M2), I would think a plain 120ns ROM should work fine.
tomaitheous wrote:
And the PC-Engine came out before the SNES (2 years before IIRC).
3 years before (Oct 1987 vs. Nov 1990) but the SFC was announced as early as Sept. '87.
I think they should've run the CPU and PPU on a synchronized 32-bit bus with cartridge pinouts, and 128kB of shared RAM. The only downside would be screen resolution being tied to CPU speed and ROM latency, but other than that, it would be so much more flexible. It would be able to fetch sprite patterns directly from the ROM, without any complicated sprite management, and would be better equipped for enhancement chips.
And it probably would have been as expensive as a Neo-Geo AES.
I don't understand how it would be any more expensive than a Genesis or SNES. Think of it as the SNES's 65816 sharing the same data and address bus as the Sega Genesis's VDP, with alternating half cycles, with "slow ROM" being used with 32H mode and "fast ROM" being used with 40H mode.
ROM space was so expensive that it was cheaper to put dedicated compression ASICs on SNES cartridges than to double the capacity at the time.
At least with RAM for storing tile data, you can compress it.
I didn't say anything about increasing ROM size.
You did by implying graphics would be stored in ROM for access by the PPU, and therefore not compressible.
I said ROM access and 128kB of shared RAM that both the PPU and CPU can use.