AtariAge "CPU comparison"

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
AtariAge "CPU comparison"
by on (#169904)
http://atariage.com/forums/topic/197977 ... s-vs-tg16/

I know that this is beating a dead horse, but come on? More self-proclaimed experts complaining about the SNES's CPU who know diddly squat about it. These are the kinds of people who make me feel like I'm only doing SNES programming to win a stupid debate. If these people really did have performance issues with the SNES, it's because they purposefully avoided making any optimizations, because if they were actually trying to get good performance out of the SNES, it would go against their argument. It just really makes me cringe reading this.
Re: AtariAge "CPU comparison"
by on (#169905)
The SNES CPU was a bit slow, comparatively: 1/3 the speed of the Genesis, and only twice as fast as the NES, I think (not really an apples-to-apples comparison).
Re: AtariAge "CPU comparison"
by on (#169907)
The Genesis's CPU is only 1.9 MHz because it divides the clock by 4.
Re: AtariAge "CPU comparison"
by on (#169908)
Well, everything written in that thread is completely false, and what else can I say? This was written 4 years ago; what can you do about it today? What's the point of even bringing it here? We're in no way responsible for these people, Arkhan in particular, inventing completely false things about the SNES CPU being slower than the NES's.

I'll also add that the general level of those forums seems remarkably low.
Re: AtariAge "CPU comparison"
by on (#169911)
Gradius 3 was a launch title running on SlowROM and a home port of an arcade title that already had plenty of slowdown, so it's a pretty unfair example to use. Hell, it's actually quite performant compared to, say, Metal Slug 2. I've been doing a lot of planning on my own game and I think I could definitely come up with something on the level of Treasure's games. The CPU gives me enough cycles to work with that I think it is certainly attainable.

I'll be honest though, I'm also getting tired of comparisons between systems regarding which is "more powerful". A lot of such discussion is largely unproductive and borderline manipulative and often devolves into comparisons with the Neo Geo (which I find frankly bizarre). But I also feel like I have to prove a point, so I end up feeling torn. I try not to think about it too much given the level of discourse.

The SNES is a... weird machine, but that's half of why I want to make a game for it. After I'm done I'd like to write up a big technical postmortem about it. Discussion about system quirks and such is fascinating as long as it isn't about competition.
Re: AtariAge "CPU comparison"
by on (#169913)
Well, I think this has already been discussed quite a bit...
I'm the first to admit you just can't compare the SNES 65816 to the MD 68000 on clock speed, as these CPUs use completely different architectures. The SNES CPU runs synchronously with RAM and so can access it on every cycle, while the 68000's external frequency is only 1/4 of its internal speed. Given this information, you have:
- SNES CPU internal / external speed = ~3.1 MHz with fast ROM and 2.68 MHz with slow ROM.
- MD 68000 internal speed = 7.67 MHz / external speed = 1.92 MHz
And the external speed is *very* important, as it determines how much data you can exchange and more or less how fast you can fetch instructions...
So even if the MD 68000 has a faster internal speed, the SNES 65816 does have a faster memory cycle / external speed.
But then you have to consider that the 65816 only uses an 8-bit data bus while the 68000 uses a 16-bit bus, so despite its slow 1.92 MHz external clock you can still do faster transfers with the 68000. But if you're doing many 8-bit operations, the 16-bit bus can be wasteful...
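As a rough illustration of the bus-width point, peak transfer rate is just bus width times memory cycle rate. The sketch below is Python rather than console code, uses only the clock figures quoted in this post, and the function name is made up for the example. It gives an upper bound only; real code spends many bus cycles fetching instructions.

```python
# Peak bus throughput = bus width in bytes * memory cycles per second.
# Clock figures are the ones quoted in the post; this is an upper bound,
# not a measurement of real code.

def peak_bytes_per_second(bus_width_bits, external_mhz):
    """Upper bound on bytes moved per second over the data bus."""
    return (bus_width_bits // 8) * external_mhz * 1_000_000

snes_fast = peak_bytes_per_second(8, 3.1)    # 65816, fast ROM
snes_slow = peak_bytes_per_second(8, 2.68)   # 65816, slow ROM
md = peak_bytes_per_second(16, 1.92)         # 68000, one bus cycle per 4 clocks

print(f"SNES fast ROM: {snes_fast / 1e6:.2f} MB/s")
print(f"SNES slow ROM: {snes_slow / 1e6:.2f} MB/s")
print(f"MD 68000:      {md / 1e6:.2f} MB/s")
print(f"MD / SNES fast ratio: {md / snes_fast:.2f}")
```

On these numbers the 68000 comes out roughly a quarter ahead of the fast-ROM 65816 at peak, which fits the "faster transfers despite the slower external clock" remark.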
Given all these observations you may think these CPUs end up close in performance... the 65816 better for 8-bit operations while the 68000 does a better job for >= 16-bit operations. Well, in fact that's a bit true, but not exactly.
The *huge* advantage of the 68000 is its far more advanced architecture; the 65816 is a very simple CPU in comparison. After all, it's just a 16-bit extended 6502, and even the Z80 looks quite advanced compared to the 6502.
The 68000 has a 16/32-bit architecture and supports multiplication / division and other advanced instructions (such as dynamic shifts) that the 65816 doesn't have, and more importantly it has 8+8 32-bit registers and a very efficient instruction set compared to the 65816.

Still, of course you can optimize your code on the 65816 and get decent results... but you can do the same for the 68000; optimizations work on any CPU. And definitely, if the subject is what you can do with each CPU, you can do a lot more with the 68000. I roughly estimate the 7.67 MHz 68000 to be almost twice as fast as the 3.1 MHz 65816 (so with fast ROM).

A good example can be found in the topic where we were talking about LZ4 unpacking. LZ4 compression is well suited to 8-bit CPUs and there is indeed a very fast unpacking implementation for the 65816. Keeping the classic 8-bit format, I was able to obtain an implementation only about 15-20% faster on the 68000, but as soon as I modified the compression algorithm to take advantage of 16 bits I was able to obtain 120% faster unpacking code. The thing is that as you go further in the optimization process, you will always tend to use 16/32-bit instructions to process more data at once and keep as much data as possible in registers to reduce memory accesses, and in that case it will be really faster than the 65816.
Re: AtariAge "CPU comparison"
by on (#169923)
psycopathicteen wrote:
The Genesis's CPU is only 1.9 MHz because it divides the clock by 4.

More like 1 MIPS as you'll be doing a lot of 8 cycle operations... but the speed is completely useless without also figuring out how many instructions you actually need (that's the thing about the 68000, it's notoriously slow but generally allows getting stuff done in less instructions, which isn't much of a gain for simple things but can be pretty important for complex stuff).
Re: AtariAge "CPU comparison"
by on (#169940)
Stef wrote:
I roughly estimate the 7.67 MHz 68000 to be almost twice as fast as the 3.1 MHz 65816 (so with fast ROM).

I mean, I guess I'm not one to talk as I don't know enough about the 68000, but that seems a little exaggerated? In the case of an actual game from the time period, not when you're trying to do software 3D rendering which makes use of 32 bit operations and multiplication/division (Even the video hardware on the SNES is less suited for 3D due to the graphics format).

And yes, I did read your reasonings. To me, it seems like both processors are even taking these into account:

Stef wrote:
- SNES CPU internal / external speed = ~3.1 MHz with fast ROM and 2.68 MHz with slow ROM.
- MD 68000 internal speed = 7.67 MHz / external speed = 1.92 MHz

Stef wrote:
The *huge* advantage of the 68000 is its far more advanced architecture; the 65816 is a very simple CPU in comparison. After all, it's just a 16-bit extended 6502, and even the Z80 looks quite advanced compared to the 6502.

Even now...

Stef wrote:
But then you have to consider that the 65816 only uses an 8-bit data bus while the 68000 uses a 16-bit bus, so despite its slow 1.92 MHz external clock you can still do faster transfers with the 68000. But if you're doing many 8-bit operations, the 16-bit bus can be wasteful...

Now uneven. But as I said, it's not always needed, so in the case where you're only doing 32 bit moves, it's definitely faster. However, you could also make a program that only does rts's :lol:. I honestly just want to know why you feel the 68000 is 2x as fast as a 65816 at half the clock frequency, as you've said you've worked with both (and I've seen you code for the 68000).

HihiDanni wrote:
compared to, say, Metal Slug 2.
HihiDanni wrote:
often devolves into comparisons with the Neo Geo (which I find frankly bizarre).

Hmm... :lol:

Actually though, the performance of Gradius 3 is probably better than that of Metal Slug 2. Metal Slug 2 already runs at 30fps (despite not having 2x the action) and slows down with as little as two enemies onscreen, and not even Gradius 3 does that. In effect, it's a 20fps game (although really a 15 due to a bug that deals with slowdown).

I'm not entirely sure why Gradius 3 is always the one to blame for terrible slowdown. If I recall correctly, Super R-Type is actually a little worse in that it starts to slow down with less action onscreen, although one thing I've never previously thought of is that there are more collisions to be checked. (Options in Gradius don't stop enemy bullets, while the force and bits in R-Type do.)
Re: AtariAge "CPU comparison"
by on (#169942)
Re: Gradius 3: I've already discussed this before. It's not about the CPU. I'll use the Apple IIGS as a reference point: it runs at 2.8MHz, with a 1MHz bus (needed for classic Apple II/II+/IIC/IIE compatibility), and has virtually none of the graphical capability of the SNES (i.e. almost everything graphical has to be done CPU-side -- there is no "PPU" in a sense). The things you can accomplish with a SNES at stock 2.68MHz are mindblowing in comparison; use of 3.58MHz (for high-speed) isn't going to magically decrease CPU cycle times. So really, the SNES is a pretty amazing system.

TL;DR -- My opinion/view mirrors that of Bregalad. The AtariAge forums have a remarkably low signal-to-noise ratio, and are plagued with misinformation to boot. The times I've seen good/established information posted there I can count on one hand. I've always classified it as more of a "fan of system X/Y/Z" place, not a place of any technical merit.
Re: AtariAge "CPU comparison"
by on (#169943)
Even if the SNES is literally half the speed of the Genesis, if it's easy to get 80 sprites on the Genesis, then it should be pretty easy to get 40 sprites on the SNES (and it is), but for some reason tons of programmers have problems moving more than 4 or 5 sprites. It's like there's a book or something teaching people to program the SNES in a very discreet method that limits the programmer to 4 or 5 sprites.

BTW, why do people mention the fact that "some areas of memory slow the CPU down to 1.79 MHz," as if it is anything significant? The only things at 1.79 MHz are the joypad registers, something that only gets read once in a frame. How is this in any way more significant than the Genesis's 68000 getting cycles stolen by the Z80 fetching stuff from ROM?
Re: AtariAge "CPU comparison"
by on (#169948)
Yeah, I agree with koitsu in that I'd say overall, video hardware matters much more than the CPU in the case of old 2D hardware where there's no software rendering being done. (Or at least in something that's not a tech demo, which is a different story.) I just feel like a very powerful CPU for 2D hardware is more of a bragging right than anything, or it makes it good for higher level programming languages. Well, I mean, there's a point you want to be at, and then everything past it is purely overkill. I'd say the question is whether the 65816 is at least at that point, but I'd imagine it is, considering that there are a couple (as in 2) shooters on the SNES that display a bunch of enemies, so I'd say it's at that point, even if just barely. Of course, not every game genre even demands that much CPU power.

Of course, I haven't even finished programming Pong, so who am I to say anything. :lol:
Re: AtariAge "CPU comparison"
by on (#169954)
A few more thoughts I have:

I think arcade spec is kinda overrated because of how easy it is to make a powerful arcade system. Cost is largely a non-factor since you're just selling to arcade operators and not home users. Designing affordable hardware for home use is arguably far more challenging. If you want to make something people can buy, you're going to have to make some compromises. Making the most out of a $200 budget is far more interesting IMHO.

As far as slowdown in shmups, I'd imagine that the biggest factor here is the player's bullets. When you're testing collisions between A number of bullets and B number of enemies, the brute force method involves A * B comparisons which can add up quickly (16 bullets * 8 enemies = 128 comparisons, sheesh), far quicker than testing multiple enemy bullets against a single player-controlled object. There are ways to speed up collision handling, a topic that is still relevant today in the realm of physics engines - you can eliminate possibilities to reduce the number of tests, and you'll get a performance boost as long as the incurred overhead isn't greater than the savings. My current idea involves putting player bullets into a spatial list (two, actually), so that enemies only need to check the bullets within a given subregion. A full grid would likely be too slow so I'm going to have two 1D lists, each along one axis - one for horizontal/diagonal bullets and one for vertical bullets.
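To put numbers on the A * B argument, here is a small Python sketch (not SNES code) that counts overlap tests for the brute force loop and for a simplified version of the spatial list idea: bullets are binned into vertical strips, and each enemy only tests bullets in the strips it touches. The strip width, box layout and function names are all assumptions made up for the illustration, and it uses a single X-axis list rather than the two lists described above.

```python
from collections import defaultdict

STRIP = 32  # hypothetical strip width in pixels

def overlaps(a, b):
    """Axis-aligned overlap; boxes are (x, y, w, h) with a top-left origin."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def brute_force(bullets, enemies):
    """Test every enemy against every bullet: len(enemies) * len(bullets) tests."""
    hits, tests = [], 0
    for ei, enemy in enumerate(enemies):
        for bi, bullet in enumerate(bullets):
            tests += 1
            if overlaps(enemy, bullet):
                hits.append((ei, bi))
    return hits, tests

def strips_for(box):
    """Indices of the vertical strips a box touches."""
    x, _, w, _ = box
    return range(x // STRIP, (x + w - 1) // STRIP + 1)

def bucketed(bullets, enemies):
    """Bin bullets by strip, then test each enemy only against nearby bullets."""
    buckets = defaultdict(list)
    for bi, bullet in enumerate(bullets):
        for s in strips_for(bullet):
            buckets[s].append(bi)
    hits, tests = set(), 0
    for ei, enemy in enumerate(enemies):
        seen = set()  # a bullet spanning two strips is tested only once
        for s in strips_for(enemy):
            for bi in buckets.get(s, ()):
                if bi in seen:
                    continue
                seen.add(bi)
                tests += 1
                if overlaps(enemy, bullets[bi]):
                    hits.add((ei, bi))
    return sorted(hits), tests

# 16 bullets and 8 enemies, as in the example above.
bullets = [(i * 16, 100, 4, 4) for i in range(16)]
enemies = [(i * 32, 96, 16, 16) for i in range(8)]
brute_hits, brute_tests = brute_force(bullets, enemies)
bucket_hits, bucket_tests = bucketed(bullets, enemies)
print(brute_tests, bucket_tests)
```

With this evenly spread layout the bucketed version needs far fewer tests than the 128 of the brute force loop, but the binning itself costs time, so whether it wins on real hardware depends on the overhead mentioned above.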

I have to wonder though, just how many shmups take the brute force approach to doing collision tests?
Re: AtariAge "CPU comparison"
by on (#169971)
Here is some code for collision detection. It is for sprites with center-based coordinates, so it is slower than a corner-based coordinate system. It's approximately 100 cycles, which should allow about 600 collision tests per frame in the worst-case scenario.

Code:
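// Returns A = 1 if the two objects overlap, A = 0 otherwise.
// Trailing numbers are cycles for the instruction, then the running total.
object_collision:

// X axis: {temp} = sum of both objects' widths
lda {width}             //4
clc                     //2 6
adc.w {width},x         //6 12
sta {temp}              //5 17

// dx = this x - other x; in range if 0 <= dx < {temp}
lda {x_position}        //4 21
sec                     //2 23
sbc.w {x_position},x    //6 29
cmp {temp}              //5 34
bcc +                   //2 36
// otherwise in range only if dx + {temp} wraps around (dx was negative)
clc                     //2 38
adc {temp}              //5 43
bcc no_collision        //2 45
+;

// Y axis: same test using heights
lda {height}            //4 49
clc                     //2 51
adc.w {height},x        //6 57
sta {temp}              //5 62

lda {y_position}        //4 66
sec                     //2 68
sbc.w {y_position},x    //6 74
cmp {temp}              //5 79
bcc +                   //2 81
clc                     //2 83
adc {temp}              //5 88
bcc no_collision        //2 90
+;
lda #$0001              //3 93
rts                     //6 99


no_collision:
lda #$0000
rts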
Re: AtariAge "CPU comparison"
by on (#169980)
HihiDanni wrote:
I have to wonder though, just how many shmups take the brute force approach to doing collision tests?

Probably many, but even then there are some simple optimizations. Obvious one: keep bullets in their own list so collisions can be done quickly against just bullets rather than every other object. Not so obvious one: treat bullets as 1px large, since checking for a point in a box is faster than checking overlap between two boxes (this will work when bullets are small enough and if not then you can compensate by just making the box larger).
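A minimal Python sketch of the point-in-box idea, including the compensation of growing the target box by the bullet's radius. The box layout `(x, y, w, h)` and the function names are assumptions for illustration only.

```python
def point_in_box(px, py, box):
    """True if the point lies inside the box (x, y, w, h), top-left origin."""
    x, y, w, h = box
    return x <= px < x + w and y <= py < y + h

def grow_box(box, margin):
    """Enlarge a box on every side to make up for treating a bullet as a point."""
    x, y, w, h = box
    return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)

enemy = (40, 40, 16, 16)
bullet = (38, 48)  # bullet center, just left of the enemy box

print(point_in_box(*bullet, enemy))               # raw point test
print(point_in_box(*bullet, grow_box(enemy, 2)))  # box grown by a 2px bullet radius
```

The point test needs at most four comparisons and no additions per pair, since the grown box can be computed once per enemy rather than once per bullet.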
Re: AtariAge "CPU comparison"
by on (#169981)
Sik wrote:
Probably many

I hadn't even considered any other way... :lol:

Sik wrote:
Not so obvious one: treat bullets as 1px large

I've always thought this was one of the more obvious optimizations. It's funny, because the R-Type games actually do this backward, in that the ship's hitbox is 1 pixel large and everything else is regular and often slightly larger than the visual representation of the objects. Frankly though, using bullets makes more sense.
Re: AtariAge "CPU comparison"
by on (#169983)
psycopathicteen wrote:
BTW, why do people mention the fact that "some areas of memory slow the CPU down to 1.79 MHz," as if it is anything significant? The only things at 1.79 MHz are the joypad registers, something that only gets read once in a frame. How is this in any way more significant than the Genesis's 68000 getting cycles stolen by the Z80 fetching stuff from ROM?

Because those people have neither the technical knowledge nor the intellectual level to understand something as "complicated". They're just fanboys of a Sega system who take pleasure in bashing the SNES; they have barely any idea what a CPU is, know absolutely nothing about either computer science or electronics, and that's it. You're really wasting your time with them.

Those are the kind of people who think a fast CPU will make the character move faster on screen, when it actually has absolutely nothing to do with it, but hey, they have no idea what they're talking about.
Re: AtariAge "CPU comparison"
by on (#169984)
Espozo wrote:
I mean, I guess I'm not one to talk as I don't know enough about the 68000, but that seems a little exaggerated? In the case of an actual game from the time period, not when you're trying to do software 3D rendering which makes use of 32 bit operations and multiplication/division (Even the video hardware on the SNES is less suited for 3D due to the graphics format).

....

Now uneven. But as I said, it's not always needed, so in the case where you're only doing 32 bit moves, it's definitely faster. However, you could also make a program that only does rts's :lol:. I honestly just want to know why you feel the 68000 is 2x as fast as a 65816 at half the clock frequency, as you've said you've worked with both (and I've seen you code for the 68000).


As you may have noticed, I said:
Quote:
I roughly estimate the 7.67 MHz 68000 to be almost twice as fast as the 3.1 MHz 65816 (so with fast ROM).


So I didn't say the 68000 is twice as fast as a 65816 running at half the frequency (which would mean the 68000 is as fast as a 65816 running at the same frequency). I said *almost* twice as fast as a 3.1 MHz 65816... that is again pure estimation, but I think a 7.67 MHz 68000 is equivalent to a ~5.5 MHz 65816.
And now I can explain why that is my estimate. If you really want to push these CPUs to their maximum, you will always tend to unroll loops and use very simple instructions so the bottleneck code executes in an optimal way.
A good example is psycopathicteen's sprite rotation code, but there are tons of possible examples (polygon fill, unpacking code, collision checks...). At this point you get closer and closer to the maximum "data processing rate" of these CPUs, where you can almost reduce the view to a sort of "read / modify / write" operation.
In which case you get something like:

Code:
move.l (a0)+,d0
add.l d1, d0
move.l d0,(a1)+


for the 68000, which gives you 12+8+12 = 32 cycles to process 32 bits (1 cycle per bit, practical :p)

and for the 65816:

Code:
lda $00uu,x
adc #xxx
sta $00uu,y


which gives 5+3+6 = 14 cycles to process 16 bits (a bit less than 1 cycle per bit).

Of course you have to consider this a very rough estimate, but it still gives an idea and I think you understand the point. The 68000 also gives you some advantages, as you don't have to deal with page crossing / boundary stuff, but the 65816 has other advantages such as fast branching.
Given these numbers you can see where my estimate comes from. Of course it's not only numbers but also my experience working with many different CPUs.
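The arithmetic behind that read / modify / write comparison can be spelled out in a few lines of Python. It uses only the cycle counts and clock speeds given in this post; the function name is made up, and the result is a raw loop throughput figure rather than a benchmark.

```python
def bits_per_second(clock_hz, cycles, bits):
    """Data processed per second by a loop body costing `cycles` per `bits`."""
    return clock_hz / cycles * bits

# 68000: move.l / add.l / move.l -> 32 cycles for 32 bits at 7.67 MHz
m68k = bits_per_second(7_670_000, 32, 32)
# 65816: lda / adc / sta -> 5 + 3 + 6 = 14 cycles for 16 bits at 3.1 MHz
w65816 = bits_per_second(3_100_000, 5 + 3 + 6, 16)

ratio = m68k / w65816
print(f"68000: {m68k / 1e6:.2f} Mbit/s")
print(f"65816: {w65816 / 1e6:.2f} Mbit/s")
print(f"ratio: {ratio:.2f}")
```

On this one loop the ratio lands slightly above 2. The "almost twice" and ~5.5 MHz figures are a broader judgment that also weighs things this loop does not show, such as 8-bit work and branching.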

Quote:
Even if the SNES is literally half the speed of the Genesis, if it's easy to get 80 sprites on the Genesis, then it should be pretty easy to get 40 sprites on the SNES (and it is), but for some reason tons of programmers have problems moving more than 4 or 5 sprites. It's like there's a book or something teaching people to program the SNES in a very discreet method that limits the programmer to 4 or 5 sprites.


I totally agree with that, and I think your Alesha demo is a good example. Still, you have to consider how much time you spent on optimization; you just can't ask every developer to push optimization to that level during all development stages, as it is very time consuming (and so very costly). Also, even having a sprite engine capable of handling metasprites and resource allocation while being flexible enough to handle every kind of sprite is definitely not that easy. It becomes really hard when you want it to be really fast and to handle many sprites at once. In SGDK my current sprite engine is very slow and you can observe slowdown with only 10 sprites (and that is on MD)! OK, it's written in C and I'm doing resource allocation in a very lazy (and slow) way, but I really understand that some games used that kind of engine because they did not have time to do a better one. I'm currently rewriting my sprite engine (still in C) to obtain better performance, but it has become very complex, and for me it's really quite difficult to offer a good and flexible engine and provide good performance at the same time :-/
Re: AtariAge "CPU comparison"
by on (#169999)
What exactly do you mean by resource allocation?
Re: AtariAge "CPU comparison"
by on (#170002)
By resource allocation I mean VRAM, hardware sprite objects and the underlying Sprite structure (a small block of memory) which holds the current state of your sprite (the latter is not a big deal; you can even allocate it statically). My sprite engine allows you to just create a new sprite without having to worry about how VRAM and hardware sprites are allocated or about sending data to VRAM; basically you just specify the position and the current animation frame and everything is done automatically... But it also lets you manually define / allocate almost everything in case you want finer control (for example to use preloaded sprite data or static hardware sprite allocation...). All that flexibility has a cost... I understand that normally you tend to design your sprite engine around your game, but sometimes you want something easy to work with that does not impose strict constraints on what you can do in your game.
Re: AtariAge "CPU comparison"
by on (#170022)
I always thought it would be easier to manage sprite VRAM allocation on the Genesis because it's not as cramped, has slightly higher DMA bandwidth, and isn't arranged in columns and rows like the SNES. How much memory are you saving for backgrounds?
Re: AtariAge "CPU comparison"
by on (#170023)
On the one hand, the Genesis has no BG3. On the other hand, BG1 and BG2 are optionally 20% wider, and the horizontal scroll table and sprite display list sit in VRAM instead of main RAM (for HDMA) and dedicated OAM respectively.
Re: AtariAge "CPU comparison"
by on (#170031)
The Genesis has a "window plane" (which doesn't support transparency). I prefer having an open-air HUD even if the tile data for it is only 2BPP.

I'm not wild about the 16kb sprite char limit. Sometimes I wish I could have been able to fit in just a few more explosion sprites. Instead, I think I will take this opportunity to make the backgrounds especially beautiful.

I'll be honest, I kinda wish the SNES was 320px wide, as it would make widescreen display a lot less awkward. Its display in 4:3 is fine though, and I like the slight fattening it does to the graphics - I think it's actually kinda cute and endearing? When seeing screenshots on a webpage or whatever, the aspect ratio means you're looking at squares. But I think they are very pretty squares.
Re: AtariAge "CPU comparison"
by on (#170034)
psycopathicteen wrote:
I always thought it would be easier to manage sprite VRAM allocation on the Genesis because it's not as cramped, has slightly higher DMA bandwidth, and isn't arranged in columns and rows like the SNES. How much memory are you saving for backgrounds?


I think you have more freedom on the Genesis, as you can allocate sprite tile data anywhere in VRAM and you have no size limitation except the VRAM size itself. But actually all this freedom can be a bit tricky to deal with: how much VRAM do you allocate? Where do you allocate sprites in VRAM? You have to deal with all that "flexibility".
Currently in my sprite engine I allocate a default of 384 tiles (12 KB), but you can change it as you want. Ideally you should always keep a good margin for background data and tilemaps, but I know some games reserve a lot of VRAM just for sprites (Final Fight CD reserves almost half of VRAM just for sprites: ~30 KB).

Quote:
The Genesis has a "window plane" (which doesn't support transparency). I prefer having an open-air HUD even if the tile data for it is only 2BPP.


The window plane of the Genesis does support transparency; the problem is that it replaces the BGA plane, so you can't have the 2 BG planes + the window overlapping in the same area. For that reason many games just use an opaque color for the window background to hide anything below it (and more importantly hide the missing BGA). Still, some games were able to use it smartly so you don't even see that constraint.
Streets of Rage 2 comes to mind: player and enemy energy bars are rendered using the window plane, so you never have 2 background planes overlapping in that area, but levels were designed with that constraint in mind so you almost never notice it, really clever :)

Quote:
I'm not wild about the 16kb sprite char limit.


I think 16 KB can be a limit in very specific cases. Beat 'em ups with large sprites are one of them though. I sincerely think the sprite limitation we observe in the SNES Final Fight series is partially due to the 16 KB sprite limit. As I mentioned just before, the Mega CD version allocates ~30 KB just for the sprites. On SNES you could use BG3 for the HUD and lower the sprite VRAM requirement quite a bit though.
Re: AtariAge "CPU comparison"
by on (#170036)
What happens when you have all three planes overlapping in one tile at once? Does it just make the Plane A tile disappear? I wonder if this could be used for special effects?
Re: AtariAge "CPU comparison"
by on (#170038)
Stef wrote:
The window plane of the Genesis does support transparency; the problem is that it replaces the BGA plane

The Super NES equivalent of that would be setting up "window" parameters for BG1, then switching to "level" parameters in an hblank ISR.
Re: AtariAge "CPU comparison"
by on (#170040)
HihiDanni wrote:
What happens when you have all three planes overlapping in one tile at once? Does it just make the Plane A tile disappear? I wonder if this could be used for special effects?

What are you even talking about? If you mean that BGs will get clipped due to overdraw, the answer is never, 3 BGs will always display as 3 BGs. What I believe you're trying to describe only applies to sprites, and sprite overdraw is only affected by other sprites.

Anyway, even if only 2bpp, you can make some nice status bars if you swap out a color or two per line.

This is actually a near perfect representation of the Metal Slug status bar in that it fits in a 256 pixel wide display. It works in 4 4 color palettes, with I think two colors being swapped per line:

Attachment: 2bpp status bar.png

The "Press Start" letters just barely don't work, but if they are even displaying, that means a player isn't even there so you could definitely use their resources for that.

HihiDanni wrote:
Sometimes I wish I could have been able to fit in just a few more explosion sprites.

You should see the ambitious junk I tried to do in solving this problem, trying to fit each sprite into vram like a tetris piece, scanning through sprite vram to make sure there are no redundant tiles, stuff like that. About 50% (or more) of SNES games use 8x8 and 16x16 sized sprites, and in that case, it fits perfectly, as all 128 sprites could be 16x16 and have its own spot in vram. The problem with this could be bandwidth. In order to get any benefit from using 16x16 and 32x32 sized sprites, you'd need to check for redundant tiles when requesting tiles.
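The "fits perfectly" claim is easy to check with a bit of arithmetic. The Python sketch below assumes 4bpp sprite tiles (32 bytes per 8x8 tile), the 16 KB sprite character space mentioned earlier in the thread, and 128 hardware sprites; the names are made up for the example.

```python
BYTES_PER_TILE_4BPP = 8 * 8 * 4 // 8   # 8x8 pixels at 4 bits each = 32 bytes
SPRITE_CHAR_BYTES = 16 * 1024          # sprite character space discussed above
MAX_SPRITES = 128                      # hardware sprite count

def sprite_bytes(width_px, height_px):
    """VRAM needed for one unique sprite image of the given pixel size."""
    return (width_px // 8) * (height_px // 8) * BYTES_PER_TILE_4BPP

def sprites_that_fit(width_px, height_px):
    """How many unique images of this size fit in the sprite character space."""
    return SPRITE_CHAR_BYTES // sprite_bytes(width_px, height_px)

print(sprite_bytes(16, 16), sprites_that_fit(16, 16))
print(sprite_bytes(32, 32), sprites_that_fit(32, 32))
```

So 128 unique 16x16 images fill the space exactly, while only a quarter as many unique 32x32 images fit, which is why sharing redundant tiles starts to matter at the larger sizes.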

HihiDanni wrote:
I'll be honest, I kinda wish the SNES was 320px wide

But this would get rid of BG3, because there's not enough bandwidth. Then, you'd basically have the Genesis. I'd rather have an extra BG, even if only 2bpp, as looking on a TV that's displaying in the 4:3 aspect ratio, the difference between the two isn't that big.
Re: AtariAge "CPU comparison"
by on (#170042)
Espozo wrote:
What are you even talking about? If you mean that BGs will get clipped due to overdraw, the answer is never, 3 BGs will always display as 3 BGs.

That question was referring to the Genesis VDP. Sorry if that wasn't clear.

Espozo wrote:
You should see the ambitious junk I tried to do in solving this problem, trying to fit each sprite into vram like a tetris piece, scanning through sprite vram to make sure there are no redundant tiles, stuff like that. About 50% (or more) of SNES games use 8x8 and 16x16 sized sprites, and in that case, it fits perfectly, as all 128 sprites could be 16x16 and have its own spot in vram. The problem with this could be bandwidth. In order to get any benefit from using 16x16 and 32x32 sized sprites, you'd need to check for redundant tiles when requesting tiles.

My game will use 16x16 and 32x32 sprites. I want a sizable portion of the char sprite space to be statically allocated. The top portion of table 1 (player sprite) and all of table 2 (enemies, maybe other things) are for DMA. One thing I'll probably end up doing to save space is using four-way symmetry with four sprites for sparkles and things like that.
Re: AtariAge "CPU comparison"
by on (#170043)
And if you really want to optimize a Super NES game for a widescreen monitor, you can use mode 5 with its 512x224 pixel backgrounds. Total pixel count including borders is 560x240, giving a pixel aspect ratio of 240/560*16:9 = 16:21 = 0.762. That's close to the 0.75 PAR of the Apple IIGS (in super hi-res mode) and NTSC C64 or the 0.767 PAR of the CPS/CPS2 arcade board.
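The pixel aspect ratio figure above can be reproduced with exact fractions. This Python sketch only restates the arithmetic in the post (560x240 total pixels stretched over a 16:9 display); the function name is an assumption.

```python
from fractions import Fraction

def pixel_aspect_ratio(total_width_px, total_height_px, display_aspect):
    """Width of one pixel relative to its height on the given display."""
    return Fraction(total_height_px, total_width_px) * display_aspect

par = pixel_aspect_ratio(560, 240, Fraction(16, 9))
print(par, round(float(par), 3))
```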
Re: AtariAge "CPU comparison"
by on (#170044)
I actually thought about making a game using Mode 5 once! I'm not sure if the background depth trade-off is worth it though, and presumably the sprites would remain at standard resolution.
Re: AtariAge "CPU comparison"
by on (#170045)
HihiDanni wrote:
The Genesis has a "window plane" (which doesn't support transparency). I prefer having an open-air HUD even if the tile data for it is only 2BPP.

Transparency or translucency? Because if the former, yeah it does, it's just that one of the scroll planes can't show under it (but the other can, as well as sprites). Beat'em-ups and fighting games generally show health bars using the window plane without an opaque background.

HihiDanni wrote:
What happens when you have all three planes overlapping in one tile at once? Does it just make the Plane A tile disappear? I wonder if this could be used for special effects?

Plane A disappears (window plane replaces it), yeah. The other two planes aren't affected, layer priority even works the same way in fact.

As for special effects: the Dragon Ball Z game uses it for vertical split screen (only one background plane for each side but hey it works!), I wonder if Herzog Zwei does it as well. In Project MD I use it for the clipping in the level select, I imagine there are other games that do the same. In theory you could mix it with raster effects as well for funky stuff with plane A, but I haven't seen anything do it yet.
Re: AtariAge "CPU comparison"
by on (#170046)
Sik wrote:
As for special effects: the Dragon Ball Z game uses [window] for vertical split screen (only one background plane for each side but hey it works!)

I wonder how hard it'd be to combine this with Team Player or 4-Way Play for a 4-quadrant split screen.
Re: AtariAge "CPU comparison"
by on (#170048)
HihiDanni wrote:
I actually thought about making a game using Mode 5 once! I'm not sure if the background depth trade-off is worth it though, and presumably the sprites would remain at standard resolution.

Sprites can do interlace just fine; there's a separate PPU flag for that. Horizontal resolution is normal, though.
Re: AtariAge "CPU comparison"
by on (#170049)
(EDIT: on the vertical split screen thing) Well, as far as I'm aware the SNES can do the equivalent thing and in an even more flexible way. Yuu Yuu Hakusho (the first one) does this.

Also just noticed this so before I forget:
Espozo wrote:
But this would get rid of BG3, because there's not enough bandwidth. Then, you'd basically have the Genesis. I'd rather have an extra BG, even if only 2bpp, as looking on a TV that's displaying in the 4:3 aspect ratio, the difference between the two isn't that big.

Something like this would probably require a faster clock anyway, so that'd probably result in more bandwidth assuming the memory is fast enough. This is what happens on the Mega Drive, anyway (hence why its 256px mode has fewer sprites and a slower transfer speed).
Re: AtariAge "CPU comparison"
by on (#170051)
@Stef
Are you trying to allocate sprites of different sizes, without wasting any space? If so, then it looks like it would be more complicated with 16 different sprite sizes instead of 2.
Re: AtariAge "CPU comparison"
by on (#170056)
Sik wrote:
Something like this would probably require a faster clock anyway, so that'd probably result in more bandwidth assuming the memory is fast enough.

I mean, I'm assuming we're talking about sacrificing BG3 for more bandwidth for BG1, BG2, and sprites, which would allow them to display at 320 pixels wide. I'd imagine you'd have enough bandwidth sacrificing BG3 to have sprites cover the whole screen horizontally, and BGs pretty much have to do that and would in this case, so in effect, you'd get how the Genesis distributes its vram bandwidth. SNES and Genesis seem about the same in this regard, the designers of the SNES valued BG layers more while the designers of the Genesis valued horizontal resolution more.

psycopathicteen wrote:
@Stef
Are you trying to allocate sprites of different sizes, without wasting any space? If so, then it looks like it would be more complicated with 16 different sprite sizes instead of 2.

He's trying to go my crazy route? :lol: Well, at least the sprite tiles are in a straight line. It'd just be like seeing if a piece of whatever length would fit in this spot, and the possible lengths seem to be:

1 tile (1x1)
2 tiles (2x1, 1x2)
3 tiles (3x1, 1x3)
4 tiles (1x4, 4x1, 2x2)
6 tiles (3x2, 2x3)
8 tiles (4x2, 2x4)
9 tiles (3x3)
12 tiles (4x3, 3x4)
16 tiles (4x4)

So really, it's 9 sizes compared to 2, but I feel that the number is irrelevant at a certain point. I kind of wonder how you'd check to see if a sprite would fit, other than manually looking at each 8x8 tile to see if it's empty.
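The count above can be checked mechanically. A minimal sketch, assuming sprites are 1 to 4 cells wide and 1 to 4 cells tall and that sprite VRAM is tracked as a flat list of used/free flags (the function name `find_free_run` is made up for illustration). It also shows the "manual" tile-by-tile check described above:

```python
from itertools import product

# All (width, height) shapes in cells, assuming 1..4 in each direction.
shapes = list(product(range(1, 5), repeat=2))       # 16 shapes
tile_counts = sorted({w * h for w, h in shapes})    # distinct VRAM footprints

def find_free_run(used, length):
    """Naive first fit: index of the first run of `length` free tiles, or None.

    `used` is a hypothetical list of booleans, one per 8x8 tile slot.
    """
    run = 0
    for i, taken in enumerate(used):
        run = 0 if taken else run + 1
        if run == length:
            return i - length + 1
    return None
```

The scan is linear in the number of tile slots, which is the cost the post is worried about.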
Re: AtariAge "CPU comparison"
by on (#170057)
Espozo wrote:
I kind of wonder how you'd check to see if a sprite would fit, other than manually looking at each 8x8 tile to see if it's empty.

There's got to be some sort of data structure to help with that...
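One candidate structure is a free list of runs. This is only a sketch of a generic first-fit allocator, not anything a poster in this thread described using; the class and method names are hypothetical.

```python
class FreeList:
    """Sorted list of (start, length) runs of free tiles (hypothetical helper)."""

    def __init__(self, total_tiles):
        self.runs = [(0, total_tiles)]

    def alloc(self, length):
        """First fit: carve `length` tiles off the first run big enough."""
        for i, (start, size) in enumerate(self.runs):
            if size >= length:
                if size == length:
                    del self.runs[i]
                else:
                    self.runs[i] = (start + length, size - length)
                return start
        return None  # no run is large enough

    def free(self, start, length):
        """Return a run and merge it with adjacent free runs."""
        self.runs.append((start, length))
        self.runs.sort()
        merged = [self.runs[0]]
        for s, n in self.runs[1:]:
            prev_start, prev_len = merged[-1]
            if prev_start + prev_len == s:
                merged[-1] = (prev_start, prev_len + n)
            else:
                merged.append((s, n))
        self.runs = merged
```

With this, checking whether a sprite fits costs one pass over the free runs instead of one pass over every tile.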
Re: AtariAge "CPU comparison"
by on (#170058)
You know what? I've done the 32x32 and 16x16 slot searching thing for a while now, but I still haven't done any duplicate checking. So far the "super long table" method seems the most compatible with my engine (even though it sounds silly), except for having to go back and give every animation a starting index into the table. It's doable, but I want to save it for the next time I get sick with pneumonia.
Re: AtariAge "CPU comparison"
by on (#170061)
psycopathicteen wrote:
So far the "super long table" method seems the most compatible with my engine

What's the "super long table" method? Just what we've been talking about? Yeah, there's next to no benefit using 16x16 and 32x32 if you're not checking for duplicates, at least if a large portion of sprite vram is using this setup and not static things.
Re: AtariAge "CPU comparison"
by on (#170069)
Didn't you once mention having a list of how many of every animation frame are onscreen, and where they are in vram, if there are any?
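A list like that could look roughly like the sketch below: one entry per animation frame currently in VRAM, holding its address and how many onscreen objects use it. The class, the method names and the `upload` callback are all hypothetical, and this is not claimed to be how anyone's engine in this thread works.

```python
class FrameTable:
    """Maps a frame id to [vram_address, user_count] (hypothetical structure)."""

    def __init__(self):
        self.entries = {}

    def acquire(self, frame_id, upload):
        """Reuse the frame if it is already in VRAM, otherwise upload it.

        `upload` is an assumed callback that copies the tiles to VRAM
        and returns the address it used.
        """
        entry = self.entries.get(frame_id)
        if entry is None:
            entry = self.entries[frame_id] = [upload(frame_id), 0]
        entry[1] += 1
        return entry[0]

    def release(self, frame_id):
        """Drop one user; forget the frame when nobody uses it any more.

        Freeing the VRAM itself is left to the caller.
        """
        entry = self.entries[frame_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[frame_id]
```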
Re: AtariAge "CPU comparison"
by on (#170074)
psycopathicteen wrote:
@Stef
Are you trying to allocate sprites of different sizes, without wasting any space? If so, then it looks like it would be more complicated with 16 different sprite sizes instead of 2.


Actually that's exactly what the current implementation of my sprite engine does... At each frame update it reallocates all hardware VDP sprites and reallocates VRAM if needed, searching whether the TileSet (the name I gave to the tile data structure for a sprite) is already present in VRAM. For that I'm using a "TileSet cache" framework. Each TileSet has its own size of course, as sprites can have many different sizes on MD... so yeah, that definitely consumes a lot of time scanning all TileSet cache entries for each sprite... that's why I said the resource allocation was done in a lazy way :-/

I have almost rewritten everything from scratch now, though, and I no longer check for duplicated TileSet entries, as there is absolutely no benefit in doing that. Let me explain. In fact you have 2 possibilities:
- share the same TileSet sprite data between several sprites --> static/fixed allocation of VRAM for that TileSet, and the sprites point to it statically.
- dynamic and free TileSet sprite usage --> dynamic VRAM allocation without worrying about duplicated entries.

Even if you can save some VRAM by looking for duplicated entries during dynamic TileSet allocation, you have to consider that in the worst case you won't have any duplicated entries (each similar sprite is on a different animation frame), so you need enough VRAM to store each sprite's TileSet independently anyway... so don't even bother looking for duplicated TileSets to optimize VRAM usage here.
Re: AtariAge "CPU comparison"
by on (#170078)
Stef wrote:
At each frame update it [...] re-allocate VRAM if needed


Potentially reallocating VRAM for each animation frame sounds really excessive.
Performance aside, with higher reallocation rates, fragmentation tends to become more of a problem.
Also, it sounds like producing lots of arbitrary allocation misses once VRAM space gets tight, negating any possible benefit of temporarily shared tiles.

My personal preference is to scan all frames of all animation files for any given object/character (e.g. the player, an enemy etc.)
at compile time and allocate the maximum size of each sprite size once at object instantiation time.
If no VRAM space is left, I postpone or abort instantiation.
Frames are optimized to deviate as little as possible from each other for each sprite size to prevent wasting space.

To mitigate fragmentation, I allocate "big" sprite tiles from the bottom of sprite VRAM and "small" sprite tiles backwards from the top of sprite VRAM.
I feel this is making the best out of the only-two-sprite-sizes-concurrently-limitation of the SNES.
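A minimal sketch of the two-ended idea, assuming sprite VRAM is treated as a flat range of tile slots (which ignores how SNES sprite tiles are actually laid out in VRAM) and that nothing is ever freed. The class and method names are hypothetical.

```python
class TwoEndedVram:
    """Big sprites grow up from slot 0, small sprites grow down from the top.

    A bump allocator only: it never frees, which a real engine would need.
    """

    def __init__(self, total_tiles):
        self.low = 0              # next free slot for big sprites
        self.high = total_tiles   # one past the last free slot for small sprites

    def alloc_big(self, tiles):
        if self.low + tiles > self.high:
            return None           # would collide with the small-sprite area
        start = self.low
        self.low += tiles
        return start

    def alloc_small(self, tiles):
        if self.high - tiles < self.low:
            return None           # would collide with the big-sprite area
        self.high -= tiles
        return self.high
```

Because the two sizes never interleave, the only gap is the single free region between `low` and `high`.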

Apart from that, I agree that the twofold shareable static/individual dynamic allocation scheme really is the most sane and straight-forward way to go.
Re: AtariAge "CPU comparison"
by on (#170079)
d4s wrote:
Potentially reallocating VRAM for each animation frame sounds really excessive.
Performance aside, with higher reallocation rates, fragmentation tends to become more of a problem.
Also, it sounds like producing lots of arbitrary allocation misses once VRAM space gets tight, negating any possible benefit of temporarily shared tiles.


Of course it is :p It's a really inefficient way of making things work, but it was simpler to implement :) I wanted to provide a simple-to-use sprite API, so I started it that way but quickly realized it was too limited by its poor general performance :-/

Quote:
My personal preference is to scan all frames of all animation files for any given object/character (e.g. the player, an enemy etc.)
at compile time and allocate the maximum size of each sprite size once at object instantiation time.
If no VRAM space is left, I postpone or abort instantiation.
Frames are optimized to deviate as little as possible from each other for each sprite size to prevent wasting space.


That is more or less what I'm doing now: I store the maximum number of hardware VDP sprites in use and the maximum TileSet size for a sprite definition object, then I use that to allocate these resources only once, when you create / instantiate your sprite object.

Quote:
To mitigate fragmentation, I allocate "big" sprite tiles from the bottom of sprite VRAM and "small" sprite tiles backwards from the top of sprite VRAM.
I feel this is making the best out of the only-two-sprite-sizes-concurrently-limitation of the SNES.


Unfortunately that is a real issue on MD, as you can have many different sizes and end up with a lot of fragmentation in VRAM.
One solution is to avoid too much "live" sprite definition / allocation of different sizes, or to find an opportunity to release / reallocate everything at some point.
Re: AtariAge "CPU comparison"
by on (#170109)
And most games just statically assign slots anyway. Seriously, you lot are the only ones obsessed with the idea =P Most games actually would have those graphics compressed in ROM which makes streaming not much of a feasible option in the first place.

Espozo wrote:
I mean, I'm assuming we're talking about sacrificing BG3 for more bandwidth for BG1, BG2, and sprites, which would allow them to display at 320 pixels wide. I'd imagine you'd have enough bandwidth sacrificing BG3 to have sprites cover the whole screen horizontally, and BGs pretty much have to do that and would in this case, so in effect, you'd get how the Genesis distributes its vram bandwidth. SNES and Genesis seem about the same in this regard, the designers of the SNES valued BG layers more while the designers of the Genesis valued horizontal resolution more.

Again you're missing the important detail here: you can't use the same clock speed because 320 is not a multiple of 256 (unless you want to use a really fast clock which wasn't really feasible). What this means is that in a 320px mode you'd need to use a slightly faster clock speed to get smaller pixels, and the faster speed also means more memory accesses per line (i.e. more bandwidth).
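As a rough illustration of the clock argument, using Mega Drive figures that are assumptions here and not taken from the post (an NTSC master clock of 53.693175 MHz, divided by 10 per pixel in the 256px mode and by 8 in the 320px mode):

```python
MASTER_HZ = 53_693_175       # assumed NTSC Mega Drive master clock

dot_256 = MASTER_HZ / 10     # 256px mode: one pixel per 10 master clocks
dot_320 = MASTER_HZ / 8      # 320px mode: one pixel per 8 master clocks

# The faster dot clock squeezes 320 pixels into the time 256 used to take.
ratio = dot_320 / dot_256
print(f"{dot_256 / 1e6:.2f} MHz vs {dot_320 / 1e6:.2f} MHz, ratio {ratio:.2f}")
```

The ratio of the two dot clocks is the same as 320/256, so the line takes the same time while offering more memory access slots.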

The Mega Drive has a completely different problem, which is the fact it has slower memory altogether (all accesses have to be in bursts of four consecutive bytes, and it only has enough time to read two bytes per pixel). To be fair they could have probably gotten room for a third background plane if they got rid of the free slots (much like the SNES does), although some of those were reserved for memory refresh so that may have been an issue =O)
Re: AtariAge "CPU comparison"
by on (#170131)
Sik wrote:
The Mega Drive has a completely different problem, which is the fact it has slower memory altogether (all accesses have to be in bursts of four consecutive bytes, and it only has enough time to read two bytes per pixel).

All rendering accesses on the Super NES outside the mode 7 background occur in bursts of the two bytes that make up a word, also two bytes per pixel. So that's a wash.

Educated guess of the fetch pattern over the course of 16 pixels, at 2 pixels per 4-byte burst:
  1. BGA/window map (2 cells)
  2. BGB map (2 cells)
  3. BGA/window sliver for left tile of pair
  4. BGB sliver for left tile of pair
  5. sprite fetch?
  6. refresh?
  7. BGA/window sliver for right tile of pair
  8. BGB sliver for right tile of pair

Is that close?
Re: AtariAge "CPU comparison"
by on (#170132)
Sik wrote:
And most games just statically assign slots anyway. Seriously, you lot are the only ones obsessed with the idea =P Most games actually would have those graphics compressed in ROM which makes streaming not much of a feasible option in the first place.


Yes, most people are perfectly fine with giving each enemy only 4 frames each.
Re: AtariAge "CPU comparison"
by on (#170144)
tepples wrote:
Educated guess of the fetch pattern over the course of 16 pixels, at 2 pixels per 4-byte burst:
  1. BGA/window map (2 cells)
  2. BGB map (2 cells)
  3. BGA/window sliver for left tile of pair
  4. BGB sliver for left tile of pair
  5. sprite fetch?
  6. refresh?
  7. BGA/window sliver for right tile of pair
  8. BGB sliver for right tile of pair

Is that close?

Yeah now that you make me think about it I'm dumb, there's not enough space for a third background (maybe for more sprites instead? but if they were too strained for more color RAM I'm not hopeful about that). It goes like this, not necessarily in order though, and the fetches happen 16 pixels ahead of time (yes, it starts rendering from within border area already):

  • Plane A 2x cell entries
  • Plane B 2x cell entries
  • Plane A 1st cell sliver
  • Plane A 2nd cell sliver
  • Plane B 1st cell sliver
  • Plane B 2nd cell sliver
  • Sprite cell sliver
  • Free slot (sometimes refresh)

So yeah mostly like you said, although I think it just reads the pairs of slivers in a row instead of separately (being 16px ahead of time makes this feasible). I just arranged it in a neater way to understand =P (plane A can be either scroll or window, btw)
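The slot list is consistent with the "two bytes per pixel" figure quoted earlier in the thread. A small tally, assuming each slot is one 4-byte burst and the pattern repeats every 16 pixels as described:

```python
BYTES_PER_SLOT = 4       # every VRAM access is a 4-byte burst
PIXELS_PER_GROUP = 16    # the pattern repeats every two cells

slots = [
    "plane A map (2 cell entries)",
    "plane B map (2 cell entries)",
    "plane A sliver, 1st cell",
    "plane A sliver, 2nd cell",
    "plane B sliver, 1st cell",
    "plane B sliver, 2nd cell",
    "sprite cell sliver",
    "free slot (sometimes refresh)",
]

bytes_per_pixel = len(slots) * BYTES_PER_SLOT / PIXELS_PER_GROUP
```

Eight slots of four bytes over 16 pixels leaves no room for a third plane's map and sliver fetches, which matches the conclusion above.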

In any case they had to use a faster clock for 320px than for 256px which resulted in larger bandwidth, so if the SNES had to do the same it'd end up with the same result (can fetch more stuff per line, so it doesn't lose anything). The only question is whether the memory was fast enough for this.

EDIT: btw the fetches above show why there's a bug if window plane is at the left side and scroll A moves horizontally, there's a 16px column where it would need to fetch data for both but it can't do it, so instead it repeats what it had fetched for the window.

EDIT 2: by the way, sprite table fetches happen during hblank, if anybody wonders.
Re: AtariAge "CPU comparison"
by on (#170147)
Sik wrote:
I just arranged it in a neater way to understand

I arranged it to be more similar to the NES pattern (nametable, attribute, pattern plane 0, pattern plane 1).

Sik wrote:
by the way, sprite table fetches happen during hblank, if anybody wonders.

So the exact opposite of NES and Super NES, where sprite table scanning happens in parallel with background fetches and the sprite patterns are fetched in hblank.

Correct me if I'm wrong, but it sounds like sprites appearing on line n are fetched earlier on the Genesis VDP:
  • Nintendo: table scan during draw of line n-1, then pattern fetch during hblank before line n
  • Sega (I think): table scan during hblank before line n-1, then pattern fetch during draw of line n-1

Sega's way would thus appear to need more secondary OAM inside the VDP than Nintendo's way. Perhaps the Genesis VDP has so little CRAM because it needed more die space for secondary OAM to design around a Nintendo patent.
Re: AtariAge "CPU comparison"
by on (#170149)
Or maybe that was just the idea they came up with? I mean, they weren't shy about trying to make a D-pad despite Nintendo having owned a patent on it (though Sega's had a different mechanism internally I believe). Heck, the term "D-pad" actually comes from Sega.

Anyway: the crucial difference here is that the VDP keeps a cache with half of the sprite table (more specifically: Y coordinates, size and link order). It scans this list to figure out which (up to) 20 sprites may appear on that line, and then proceeds to fetch their X coordinates and tile IDs. This means that it doesn't need to retrieve the entire table from VRAM, just about a quarter of it. The time spent scanning the cache can be used to fetch other data from memory. And yeah, sprites are entirely rendered a line ahead of time, which is why if you reenable display mid-screen the first visible line won't show any sprites. Nearly nobody notices though =P
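A simplified model of that scan, as a sketch only: the cache is a list of (y, height in pixels, link) entries, the walk starts at sprite 0 and follows the links, and a link of 0 ends the list. The function name and the tuple layout are assumptions for illustration, and details such as the hardware's coordinate offset are left out.

```python
def sprites_on_line(cache, line, limit=20):
    """Return indexes of sprites covering `line`, in link order.

    cache[i] = (y, height_px, link); link 0 terminates the list.
    Stops after `limit` matches or after visiting every entry once.
    """
    found, index, steps = [], 0, 0
    while steps < len(cache) and len(found) < limit:
        y, height, link = cache[index]
        if y <= line < y + height:
            found.append(index)
        if link == 0:
            break
        index, steps = link, steps + 1
    return found
```

Only the sprites returned here would then need their X coordinate and tile ID fetched from VRAM.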

Actually, technically this should result in less internal memory, right? If I recall correctly the PPU and SPPU keep all of the table on-die, while the VDP only needs to keep half the table this way. I wonder how much die space was spent on the linebuffers though (since to make this work it'd need two lines worth of buffered data, at 7 bits per pixel).

EDIT: also remember Sega's original intention was to allow for scaling sprites. In this case fetching and rendering ahead of time was probably a good thing.
Re: AtariAge "CPU comparison"
by on (#170150)
Sik wrote:
most games just statically assign slots anyway. Seriously, you lot are the only ones obsessed with the idea =P

I'm not sure if you were talking directly to psycopathicteen, but I believe I was the one who came up with the idea because I wanted to use it, and Stef seems to be trying to do the same thing, so yeah... :lol:

Sik wrote:
Most games actually would have those graphics compressed in ROM which makes streaming not much of a feasible option in the first place.

I don't think the people here are trying to make "most games". :lol:

psycopathicteen wrote:
Yes, most people are perfectly fine with giving each enemy only 4 frames each.

Exactly.

Good luck trying to pull off anything like this without trying to dynamically allocate sprites in vram: https://www.youtube.com/watch?v=lMx4iLp-EAc
Re: AtariAge "CPU comparison"
by on (#170156)
Espozo wrote:
I'm not sure if you were talking directly to psycopathicteen

This forum in general.

Espozo wrote:
Good luck trying to pull off anything like this without trying to dynamically allocate sprites in vram: https://www.youtube.com/watch?v=lMx4iLp-EAc

Most of the time you won't be trying to go that far though, here I'm seeing people behave like you absolutely need this even for simpler games =P

For the record, you could still probably get by changing sprites slots every so often (and just because an address is reserved for something it doesn't mean you can't stream it). That's still different from dynamically assigning addresses all the time. Stream those backgrounds though, I don't think there's much tiling going on. Also I'd try to push the large sprites into backgrounds where possible, luckily there are usually only one or two and normally not overlapping in a bad way.
Re: AtariAge "CPU comparison"
by on (#170167)
Sik wrote:
This forum in general.

Yeah, I'm stupid, the way you said it even implied a group of people.

Sik wrote:
here I'm seeing people behave like you absolutely need this even for simpler games =P

Well, psycopathicteen and Stef are advanced programmers, and I already know psycopathicteen is doing something pretty ambitious. I don't know how much he needs to do this though, at least yet. However, I'm not a good programmer, I guess I'm good at coming up with ideas but not being able to implement them. :lol:

Sik wrote:
Stream those backgrounds though

It'd be impossible not to. I actually had a conversation about the feasibility of porting Metal Slug to the SNES (inside of the froyo topic) and it seems possible for the most part, you'd just need a giant cartridge. The most difficult part would be running into not having enough colors for sprites, and I said you could put them on a list vertically and swap out the colors when necessary. This takes a surprisingly large amount of CPU time when you think about it, but the game does run at 30fps to begin with...

Anyway, yeah.
Re: AtariAge "CPU comparison"
by on (#170179)
Enemies pretty much always come in groups of similar types (which means a lot of palette sharing), and you can't go backwards in the level, so you could probably just swap palettes when a new group comes in. That'd be pretty cheap actually.
Re: AtariAge "CPU comparison"
by on (#170182)
No, I mean it's down to the point that all objects onscreen at once will use too many palettes. However, it's very rare that there are objects with more than 8 different palettes on any given horizontal line. I tested this with several screenshots. It's possible for this to get messed up, but it would almost have to be deliberately done, at least on single player. Hell, if there are too many objects per line, they won't even display anyway, never mind the palette being incorrect (Of course, it's not like this is synched together, but most objects would be about 32x32 so no more than 8 objects in general.) The problem is changing the colors in time, which I think you can update a color per line per HDMA channel, but I think you can also time code to run during Hblank somehow and update more colors than possible with HDMA, they just have to be continuous.
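The screenshot test described above could be automated along these lines. A minimal sketch, assuming each object is given as (top_y, height, palette_id) and a 224-line screen; the function name is hypothetical.

```python
def worst_line(objects, screen_height=224):
    """Return (line, count) for the scanline with the most distinct palettes.

    objects: iterable of (top_y, height, palette_id).
    Ties go to the topmost line.
    """
    per_line = [set() for _ in range(screen_height)]
    for top, height, palette in objects:
        for y in range(max(top, 0), min(top + height, screen_height)):
            per_line[y].add(palette)
    line = max(range(screen_height), key=lambda y: len(per_line[y]))
    return line, len(per_line[line])
```

If the count never exceeds 8 for a given scene, per-line palette swapping has a chance of covering it.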

So yeah, if you wanted to get this to work, it'd require a crap ton of tile swapping and changing colors per line, but I see it working. 30 fps definitely helps to deal with dumb stuff like this that the Neo Geo original didn't even have to.
Re: AtariAge "CPU comparison"
by on (#170187)
Espozo wrote:
I think you can also time code to run during Hblank somehow and update more colors than possible with HDMA, they just have to be continuous.

That'd be an H- (or HV-)position-triggered interrupt with a DMA transfer inside. And yes, the colours would have to be in a small number of contiguous chunks, no more than four and perhaps only one chunk, otherwise you wouldn't be able to just DMA them in; you'd have to do a bunch of maneuvering and it would never fit. Technically you could do 8 chunks of one colour each, but then why aren't you using HDMA?

Theoretically, I think up to 21 colours should fit if they're all in one chunk, but the timing would have to be ridiculously precise, and IRQs aren't that precise; you'd need timed code, and doing it every line would eat most or all of your CPU time. A 15-colour sprite palette should work, but you wouldn't be able to use HDMA at all, for anything, lest it bump part of the DMA out into the next active line.
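One way to arrive at a figure like 21 is a back-of-the-envelope budget. The numbers below are assumptions, not taken from the post: 1364 master clocks per scanline, 4 master clocks per dot, 256 visible dots, 8 master clocks per DMA byte and 2 bytes per colour. DMA setup overhead is ignored, which is part of why the timing would have to be so tight.

```python
MASTER_CLOCKS_PER_LINE = 1364   # assumed typical scanline length
CLOCKS_PER_DOT = 4              # assumed
VISIBLE_DOTS = 256
CLOCKS_PER_DMA_BYTE = 8         # assumed DMA transfer rate
BYTES_PER_COLOUR = 2            # one 15-bit colour entry

# Master clocks on each line that fall outside the 256 visible dots.
blank_clocks = MASTER_CLOCKS_PER_LINE - VISIBLE_DOTS * CLOCKS_PER_DOT
max_colours = blank_clocks // (CLOCKS_PER_DMA_BYTE * BYTES_PER_COLOUR)
```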

The advantage of HDMA is that it's automatic and has very low overhead. Running an H-IRQ every line can eat a lot of CPU time even for simple tasks.
Re: AtariAge "CPU comparison"
by on (#170216)
Sik wrote:
And most games just statically assign slots anyway. Seriously, you lot are the only ones obsessed with the idea =P Most games actually would have those graphics compressed in ROM which makes streaming not much of a feasible option in the first place.


Hm, I don't really agree with that; I think most games use a mix of statically allocated and dynamically allocated sprites in their engine.
Having everything static is a big constraint for the game design: you have to count how many sprites of which kind / size you need for a level, how to recycle them etc... in terms of level / code design, having everything statically allocated can be really painful.
Dynamic resource allocation offers much more freedom and makes the graphics engine code simpler and less convoluted. You "just" need an efficient sprite engine capable of dealing with dynamic resource allocation and that's it.
Also, in my case I'm basically developing an API; I want it to be simple but still powerful and flexible so you can use it for almost any game situation. You always have the choice to build your own engine and only use low-level methods, but you will spend much more time doing that.

Quote:
The Mega Drive has a completely different problem, which is the fact it has slower memory altogether (all accesses have to be in bursts of four consecutive bytes, and it only has enough time to read two bytes per pixel). To be fair they could have probably gotten room for a third background plane if they got rid of the free slots (much like the SNES does), although some of those were reserved for memory refresh so that may have been an issue =O)


Not sure what you mean by slower memory altogether, but given the video memory on each system I don't think we can say that. The MD VRAM is specially designed to give (very) fast burst reads but slower random accesses. The VDP has been designed around this special memory to try to take advantage of it (actually it only partially does), and if you count the total bandwidth you obtain on both systems you end up with a bit more on MD than on SNES.
Re: AtariAge "CPU comparison"
by on (#170236)
For the Genesis, I think you can avoid fragmentation problems by having objects use multiples of 16 tiles, and then use power of 2 sizes for objects/sprites that are smaller than 32x32.
Re: AtariAge "CPU comparison"
by on (#170243)
psycopathicteen wrote:
For the Genesis, I think you can avoid fragmentation problems by having objects use multiples of 16 tiles, and then use power of 2 sizes, for objects/sprites that are smaller than 32x32.


Indeed it can help but honestly that hurts :p Having non-square sprites is one of the strengths of the MD sprite capabilities; it is really handy for limiting the number of sprites to use and the scanline sprite overflow.
Re: AtariAge "CPU comparison"
by on (#170252)
Nobody says you have to use the full slot when drawing a single sprite. You can just use a 16-tile slot for a 4x1 tile graphic, though since you have space leftover you might as well cram in a few frames of animation so there is less DMAing going on.

My game is actually going to be using 32x32 pixel slots, which will hold both 32x32 and 16x16 sprites.
Re: AtariAge "CPU comparison"
by on (#170255)
Stef wrote:
psycopathicteen wrote:
For the Genesis, I think you can avoid fragmentation problems by having objects use multiples of 16 tiles, and then use power of 2 sizes, for objects/sprites that are smaller than 32x32.


Indeed it can help but honestly that hurts :p Having non-square sprites is one of the strengths of the MD sprite capabilities; it is really handy for limiting the number of sprites to use and the scanline sprite overflow.


I meant power of 2 for smaller sized sprites/metasprites. Larger metasprites would be multiples of 16.

Quote:
I don't know how much he needs to do this though, at least yet.


You mean, how far it is to what I want it? For the most part, it's what I want. The only thing I'm still deciding is which animation method to do fire/energy balls, and elevators. It feels weird that there is always one static fireball and elevator sprite in VRAM, when I only use fireballs during boss fights, and I only use an elevator in one part of the level. I'll probably come up with an "automatic animation generator" mode, where certain metasprites trigger a generator to turn on, and there's a routine that automatically runs the animation just as long as there are that type of sprites onscreen.
Re: AtariAge "CPU comparison"
by on (#170272)
psycopathicteen wrote:
You mean, how far it is to what I want it? For the most part, it's what I want.

I mean how much you need this system for it to be possible to display the level of graphics you currently are. By checking duplicate tiles, the number of tiles across all the small robots onscreen would be greatly reduced.

Stef wrote:
psycopathicteen wrote:
For the Genesis, I think you can avoid fragmentation problems by having objects use multiples of 16 tiles, and then use power of 2 sizes, for objects/sprites that are smaller than 32x32.

Indeed it can help but honestly that hurts :p Having non-square sprites is one of the strengths of the MD sprite capabilities; it is really handy for limiting the number of sprites to use and the scanline sprite overflow.

It's kind of funny how the backwards sprite capabilities of the SNES actually kind of comes in handy here. There are less (by a lot) sprite sizes, but there's also about one and a half times the sprites on the SNES.

psycopathicteen wrote:
The only thing I'm still deciding is which animation method to do fire/energy balls, and elevators.

Personally, I'd do them the same way as any other object. They don't have to be erased from vram every time they're not onscreen, I'd code it to where they'd only be able to be overwritten when the boss is defeated. Shoot, I'd do every object outside of a status bar (if that even counts) or something extremely commonplace and non changing in size (like coins) this way.

One kind of random thing I've noticed is that sprite status bars never seem to take advantage of the fact that each tile within a sprite can be different. If I'm making a game with 16x16 and 32x32 sized sprites and I wanted to make the status bar out of sprites, I wouldn't give each number its own 16x16 tile, halving bandwidth where the score is. I'd put a line of unique 16x16's together and DMA tile data to whichever part of the sprite I want to overwrite, so instead of having 16x16 tiles of numbers 00-99, you'd just have 0-9. This further helps if you want to use the top or bottom 8 pixels for displaying something else.
Re: AtariAge "CPU comparison"
by on (#170283)
psycopathicteen wrote:
I meant power of 2 for smaller sized sprites/metasprites. Larger metasprites would be multiples of 16.


Yeah, I understood that, but even for larger sprites you need non-square sprites for the "edge" parts (if you really want to optimize your hardware sprite usage). And in my case, I allocate VRAM for the whole (meta)sprite given its maximum number of tiles; I'm not doing per-hardware-sprite VRAM allocation. So if you do many sprite allocation / release operations you end up with fragmented VRAM. Fortunately, in reality you can statically allocate a certain number of sprites, and "dynamic" sprites are allocated / released in blocks most of the time, so that is not much of an issue.

Quote:
It's kind of funny how the backwards sprite capabilities of the SNES actually kind of comes in handy here. There are less (by a lot) sprite sizes, but there's also about one and a half times the sprites on the SNES.


Yeah, having square sprites is convenient for some reasons, especially from a hardware perspective. But because of that you tend to use more sprites on SNES (using 2 16x16 sprites instead of one 8x24 one for instance), so you need more CPU processing (something you really don't want to waste) and at the same time the sprite pixel scanline limit can be reached faster.
Re: AtariAge "CPU comparison"
by on (#170294)
I didn't say make metasprites out of square sprites. I just said make the slot sizes multiples of 16. If you had a metasprite made of 3 24x24, it would be rounded up to 32 tiles.
Re: AtariAge "CPU comparison"
by on (#170297)
Oh, you mean having allocation slots of 16 tiles when you use 16 tiles or more for a sprite? I'm not sure how much it helps with fragmentation, it probably does, but if I understand your idea, then if you want to allocate 6 sprites using 18 tiles each, you end up allocating 6 x 32 tiles (2 x 16-tile slots per sprite)?
Re: AtariAge "CPU comparison"
by on (#170298)
It does help reduce how badly fragmented it gets when the gaps are the same size a sprite is likely to need.
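The rounding rule being discussed can be sketched as follows, assuming "smaller than 32x32" means fewer than 16 tiles. The function name `slot_tiles` is made up for illustration.

```python
def slot_tiles(tiles, big_slot=16):
    """Round a (meta)sprite's tile count up to its allocation size.

    16 tiles or more: next multiple of 16. Fewer: next power of two.
    """
    if tiles >= big_slot:
        return -(-tiles // big_slot) * big_slot   # ceiling to a multiple of 16
    size = 1
    while size < tiles:
        size *= 2
    return size

# The two examples from the posts above: 18 tiles and 3 sprites of 24x24 (27 tiles).
print(slot_tiles(18), slot_tiles(27))
```

The cost is visible in the 18-tile case: each sprite wastes 14 tiles, in exchange for gaps that are always a reusable size.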
Re: AtariAge "CPU comparison"
by on (#170302)
In such an allocation scheme, would it be worth it to add some sort of incremental defragmentation of sprite VRAM slots in the background?

Image
Remember me?
Re: AtariAge "CPU comparison"
by on (#170390)
There've been some recent edits to the SNES page on tvtropes, and I'm not sure I like them.

Is there actual evidence that Nintendo intentionally nerfed the S-CPU in favour of adding chips to cartridges, such that it should be asserted as fact?
Re: AtariAge "CPU comparison"
by on (#170391)
Are you referring to jnv11's recent edits claiming that the clock speed was cut on purpose to save money?

Otherwise, I've read rumors that something like the DSP-1 was originally going to be included on the mainboard but was cut.
Re: AtariAge "CPU comparison"
by on (#170393)
I really doubt they nerfed the CPU >_> They did remove a part that computed math which is why Pilotwings has a DSP (its whole purpose was to simulate the missing part since there wasn't enough time to rewrite the game to not use it).

That said:
tepples wrote:
Are you referring to jnv11's recent edits claiming that the clock speed was cut on purpose to save money?

Well, I could see the LoROM clock having been added for that (not to cut costs on the SNES, but on the cartridges, seeing as one would be spending lots more money buying games).
Re: AtariAge "CPU comparison"
by on (#170394)
The CPU limited RAM throughput?? Ummm, if that's the case then FastROM wouldn't have been a thing. I'm pretty sure the slow RAM throughput is because... the RAM is slow.

Also I'm pretty sure that 16-bit operations are at worst case one cycle slower than the same thing in 8-bit.

Has this person ever programmed a Super Nintendo or are they just going along with "correlation implies causation"?
Re: AtariAge "CPU comparison"
by on (#170396)
I read the edit(s) as saying that the memory speed limited the CPU speed, not the other way around (which is basically correct).

whoops, I was reading the text that was already there, not what the user added. The "Nintendo crippled the CPU in order to make cartridges more expensive" is definitely "[citation needed]" material.
Re: AtariAge "CPU comparison"
by on (#170397)
tepples wrote:
Otherwise, I've read rumors that something like the DSP-1 was originally going to be included on the mainboard but was cut.


The SNES could have been a beast had they included a NEC uPD7725 with program and data RAM (for per-game upload of firmware) instead of ROM.

But I still think a lot of games wouldn't bother with it, because programming in that thing's VLIW ISA was brutal.
Re: AtariAge "CPU comparison"
by on (#170400)
Revenant wrote:
whoops, I was reading the text that was already there, not what the user added.

The link I gave was to the most recent history. At the bottom there should be a link to expand the history so you can see when and by whom that particular phrase was added.
Re: AtariAge "CPU comparison"
by on (#170403)
This is basically the "circle of logic" everybody uses to "prove" the SNES is "slow."

Attachment: circle of logic.png
Re: AtariAge "CPU comparison"
by on (#170404)
tepples wrote:
Are you referring to jnv11's recent edits claiming that the clock speed was cut on purpose to save money?

Yes.

I've made some contributions and corrections to that page myself in the past, but I've been wary of saying too much since I know I'm biased and probably don't understand things as well as I think I do. Still, I think I understand things better than jnv11 does. Based on given edit reasons, this person seems to believe that the 8-bit bus "castrated" the console, and that the bit depth of the DSP was why it "blew out" competing sound chips.

I don't want to start an edit war, and I certainly don't want to stomp on valid observations with misinformation. Maybe I should start a discussion there...

HihiDanni wrote:
I'm pretty sure the slow RAM throughput is because... the RAM is slow.

Yeah, but with a 16-bit bus (and 16-bit memory) the throughput is doubled for the same memory speed. This is the reason the Mega Drive's bus speed in MB/s roughly matches that of the SNES (FastROM) despite the fact that the bus accesses happen roughly half as often.
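
A rough sketch of that arithmetic, assuming nominal NTSC clocks, one 8-bit access per 6 master clocks for SNES FastROM, and one 16-bit access per 4 CPU clocks for the 68000. These are peak figures only; real code never sustains them, and the function name is made up for illustration.

```python
# Peak bus bandwidth = accesses per second * bytes per access.
# All constants are nominal NTSC values, used here as assumptions.

def accesses_per_second(clock_hz: float, clocks_per_access: int) -> float:
    """Best-case number of bus accesses per second."""
    return clock_hz / clocks_per_access

SNES_MASTER_HZ = 21_477_272  # SNES master clock
MD_CPU_HZ = 7_670_454        # Mega Drive 68000 clock

snes_accesses = accesses_per_second(SNES_MASTER_HZ, 6)  # FastROM: 6 master clocks
md_accesses = accesses_per_second(MD_CPU_HZ, 4)         # 68000: 4 clocks per bus cycle

snes_mb_s = snes_accesses * 1 / 1e6  # 8-bit bus: 1 byte per access
md_mb_s = md_accesses * 2 / 1e6      # 16-bit bus: 2 bytes per access

print(f"SNES FastROM: {snes_mb_s:.2f} MB/s, Mega Drive: {md_mb_s:.2f} MB/s")
print(f"Access rate ratio (MD/SNES): {md_accesses / snes_accesses:.2f}")
```

Under these assumptions the two come out within about 10% of each other in MB/s, with the Mega Drive making roughly half as many accesses.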

The thing is, the bus width comes with the CPU. You can't criticize it in isolation; the proper place for complaints about the bus width is a technical discussion of the relative merits of different CPUs. The discussion as it currently stands on that page seems unbalanced and incomplete (and a fuller discussion should probably be inside a note because it's already getting bloated).
Re: AtariAge "CPU comparison"
by on (#170407)
psycopathicteen wrote:
A. BECAUSE THE SNES IS SLOW
B. BECAUSE I INTENTIONALLY PROGRAMMED IT TO BE SLOW
C. GOTO A

C. Because finding and hiring somebody who knew how to make a 65816 not be slow was too expensive within the time and money that the publisher allotted.

The 68000 had a pool of Amiga, Atari ST, and arcade programmers to draw from. The 65816 had only the Apple IIGS, and that platform wasn't quite as conducive to training programmers in action game logic optimization because of its dumb, unscrollable, single frame buffer. The fastest you could scroll on a GS was about 10 to 15 fps using the obscure "PEA field" technique that overlays the stack on the hardware frame buffer, and the back buffer is stored backwards as a block of self-modifying code.

93143 wrote:
Based on given edit reasons, this person seems to believe [...] that the bit depth of the DSP was why it "blew out" competing sound chips.

Compare the clarity of "Nintendo!" in Tetris Attack to "Say-Gah" in the Sonic games to see what a difference bit depth makes. Or compare the clarity of any Super NES game to its GBA port, which is usually a pile of hiss and aliasing because of its 8-bit DAC usually filled by software mixing at 18 kHz.

Quote:
The thing is, the bus width comes with the CPU. You can't criticize it in isolation; the proper place for complaints about the bus width is a technical discussion of the relative merits of different CPUs.

So you then have to disprove "65816 sux; 68000 roolz".

If you do decide to make major changes to the Super NES page at TV Tropes, could you consider doing the same for the corresponding page on All The Tropes? You can't copy text back and forth because of copyright issues, as the two wikis use incompatible copyright licenses, but you can put your own original words on both pages. All The Tropes is a fork that still uses the same free content license as Wikipedia, from a backup of TV Tropes from before July 2012 when TV Tropes changed to a less fork-friendly license (which change allegedly infringes its pre-July 2012 contributors' copyrights).
Re: AtariAge "CPU comparison"
by on (#170418)
tepples wrote:
Compare the clarity of "Nintendo!" in Tetris Attack to "Say-Gah" in the Sonic games to see what a difference bit depth makes. Or compare the clarity of any Super NES game to its GBA port, which is usually a pile of hiss and aliasing because of its 8-bit DAC usually filled by software mixing at 18 kHz.

Sample rate (and more importantly proper timing) makes an even bigger difference than bit depth does, though (the latter matters mostly for quiet noise, the former matters for whether it even sounds correct at all). Mind you, SNES still gets an edge there.

Also let's say Sega's sound engine wasn't exactly the best at PCM either, even by the system's standards... (in fact the Seeegaaa sound is played with code that's unique to that sample, with a busy loop, just for that reason, which is also why it can't properly play anything else during that time)
Re: AtariAge "CPU comparison"
by on (#170421)
Quote:
Also let's say Sega's sound engine wasn't exactly the best at PCM either, even by the system's standards... (in fact the Seeegaaa sound is played with code that's unique to that sample, with a busy loop, just for that reason, which is also why it can't properly play anything else during that time)

I think that's the case on almost all platforms; on the PCE you'd be surprised how bad the PCM drivers are, even in SF2.
Re: AtariAge "CPU comparison"
by on (#170425)
Do you really want to compare which version of SF2 has the worst PCM sound driver? :p
Re: AtariAge "CPU comparison"
by on (#170428)
Stef wrote:
Do you really want to compare which version of SF2 has the worst PCM sound driver? :p

LOL, can I take a wildcard??

The SF2 driver eats 10 to 15% for a single PCM channel, and close to 20% for two. PCM is not as bad as on the MD because, fortunately, even with such a bad driver, the timings for accessing the sound chip are not as tight as the 2612's: the PCE's sound chip runs at 3.58 MHz without any latency for reading or writing registers.

For managing sprites, I think a well-organised SAT is also important; it's not good to spend too many cycles changing something in the sprite table.
Re: AtariAge "CPU comparison"
by on (#170433)
Seeing stuff like the Genesis Z80 getting paused every frame for things such as end-of-frame DMA transfers makes me appreciate how isolated the SPC-700 is from the rest of the system, since it doesn't have to fight over shared resources. On the other hand it means a convoluted upload process and garbage loading times. I'm hopeful about getting two simultaneous streaming samples, but I'm going to need to modify the SNESGSS sound driver, and I know basically nothing about how the SPC-700 works.
Re: AtariAge "CPU comparison"
by on (#170434)
Quote:
Seeing stuff like the Genesis Z80 getting paused every frame for things such as end-of-frame DMA transfers makes me appreciate how isolated the SPC-700 is from the rest of the system, since it doesn't have to fight over shared resources

If I am right, the Z80 is stopped only if you want to access the 68K bus; otherwise it runs freely.
The SPC-700 is isolated, yes, but so isolated that it cannot access ROM data; for me that's not a better solution.
Re: AtariAge "CPU comparison"
by on (#170435)
The waveform stops updating when the Z80 is paused. On the SPC you're uploading data into a buffer space representing the sample that the DSP is reading continuously. I think the Genesis could have benefited from a PCM buffer rather than relying on a CPU to provide individual updates.
Re: AtariAge "CPU comparison"
by on (#170437)
But then the samples are on the cartridge, and the cartridge is on the 68K bus. Would it have been hard for the Z80 to use its 8K of exclusive RAM as a circular buffer? It'd be filled during draw time, when DMA is unlikely to be running, and emptied during vblank to leave room for BLAST PROCESSING. Can the Z80 see the vcount in Genesis mode?
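
A minimal model of that circular-buffer idea, written in Python rather than Z80 assembly. The class name, the buffer size in the usage example, and the 0x80 "silence" value on underrun are all assumptions for illustration; this is not how any particular driver is laid out.

```python
class PcmRingBuffer:
    """Fixed-size byte ring buffer: one producer (fill) and one consumer (DAC)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.size = size
        self.read_pos = 0
        self.write_pos = 0
        self.count = 0  # number of bytes currently buffered

    def fill(self, samples: bytes) -> int:
        """Copy in as many samples as fit; return how many were accepted."""
        accepted = min(len(samples), self.size - self.count)
        for value in samples[:accepted]:
            self.data[self.write_pos] = value
            self.write_pos = (self.write_pos + 1) % self.size
        self.count += accepted
        return accepted

    def next_sample(self, silence: int = 0x80) -> int:
        """Pop one sample for the DAC; on underrun, return `silence` instead."""
        if self.count == 0:
            return silence
        value = self.data[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.size
        self.count -= 1
        return value


# Usage: fill while the bus is free, then keep draining while it is not.
buf = PcmRingBuffer(8)
accepted = buf.fill(bytes(range(1, 11)))          # only 8 of the 10 bytes fit
first = [buf.next_sample() for _ in range(3)]     # drain three samples
buf.fill(bytes([9, 10]))                          # refill wraps around the end
rest = [buf.next_sample() for _ in range(8)]      # last value is an underrun
```

The buffer depth sets the trade-off discussed in the thread: a deeper buffer survives longer bus stalls but adds playback delay.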
Re: AtariAge "CPU comparison"
by on (#170438)
Quote:
I think the Genesis could have benefited from a PCM buffer rather than relying on a CPU to provide individual updates.

Of course you're right, but the SPC could also benefit from having DMA (or another mechanism) for updating its RAM without using the CPU. The Z80 also has some drawbacks.

Quote:
But then the samples are on the cartridge, and the cartridge is on the 68K bus. Would it have been hard for the Z80 to use its 8K of exclusive RAM as a circular buffer

Yes, that's how some MD PCM drivers work: they use a buffer and fill it during active display. Of course this causes some PCM delay (if I remember correctly, Stef's driver has up to a 4-frame delay) and steals some 68K cycles.
Another problem is having to use timed code to send data to the 2612 DAC, and the cherry on the cake is that the 2612 is damn slow, but the results are better than the PCM heard in SF2, for example: not scratchy, though they can be heavily distorted.

The SPC-700 is not perfect either and also has some drawbacks: mainly, with only 64K of RAM, sounds are perceived as muffled by many people.

Quote:
Can the Z80 see the vcount in Genesis mode?

Yes, but unfortunately it's the only IRQ seen by the Z80; the 2612's timers and the hsync lines are not connected, which is why you need timed code to play PCM.
Re: AtariAge "CPU comparison"
by on (#170439)
TOUKO wrote:
the SPC could also benefit from having DMA (or another mechanism) for updating its RAM without using the CPU


The SNES can DMA stuff to the APU I/O registers, but the audio-side CPU has to individually read each byte out of these. It would have been nice if the APU could finish the transfer of each byte without CPU intervention.
Re: AtariAge "CPU comparison"
by on (#170440)
It seems that some guys here have tried to feed SPC RAM with HDMA, but I don't know if it works or not.
Re: AtariAge "CPU comparison"
by on (#170442)
HDMA should work, since it's basically a mechanism for periodically writing bytes to registers, and these periods can be made long enough for the SPC to process each byte sent over. This means you can have your game doing other stuff with the streaming happening with periodic interrupts instead of the program spending time in a busy loop waiting to send the next byte. Streaming audio in this way is a great workaround to the 64kb limit, though I can imagine why it didn't see widespread usage since it basically needs driver-side support.
Re: AtariAge "CPU comparison"
by on (#170443)
Quote:
Streaming audio in this way is a great workaround to the 64kb limit

Yes, if it's really useful/doable in-game, I agree; even in psycopathicteen's Bad Apple demo it would be an accomplishment.
Re: AtariAge "CPU comparison"
by on (#170449)
tepples wrote:
But then the samples are on the cartridge, and the cartridge is on the 68K bus. Would it have been hard for the Z80 to use its 8K of exclusive RAM as a circular buffer? It'd be filled during draw time, when DMA is unlikely to be running, and emptied during vblank to leave room for BLAST PROCESSING. Can the Z80 see the vcount in Genesis mode?


It's feasible, and it's actually exactly how my XGM sound driver works. It does even more than that: it takes care of DMA bus contention and also allows playing up to 4 PCM channels at the same time (at 14 kHz) while playing the music, all of it running on the Z80.
That gives you an idea of how bad the SF2 driver is (100% of the Z80 CPU used for that awful 2-channel PCM mixing at 7.5 kHz).

Buffering PCM in Z80 memory is the basic idea that opens up many more possibilities for Megadrive PCM sound drivers.
As Touko said, it adds a bit of delay to the PCM playback, but honestly it's not very noticeable for SFX, and for music you can keep the delay at a fixed amount (which is what the XGM driver does).

About the VCounter: yes, you can access it, but accessing the VDP means using the 68K bus, so you should avoid accessing it during periods when DMA may occur. I used to use the V-Int for music tempo and to guess when DMA could occur, but now I'm using the 68K to give XGM its tempo, which offers more flexibility (for PAL/NTSC timing) and also gives finer control over the BUS contention issue.
Re: AtariAge "CPU comparison"
by on (#170450)
To be fair, honestly the interruption problem would have been solved if the Z80 got priority over the VDP during DMA. That'd have slowed down video transfers a bit but it'd mean PCM never has to get interrupted. (also this really was never a problem with FM and PSG, as the hardware keeps playing them and timing-wise at worst you get a delay of like 1/7 of the frame, practically inaudible?) Though the problem with Sega's driver is a flawed design instead =P (halt PCM while processing FM and PSG, which takes up a good chunk of the frame! that's why it sounds so scratchy)

I'd argue the SPC memory limitation is worse, honestly. I mean, it's a sample based system, where access to more memory is even more important. But of course there's the bus conflict issue, I wonder if that could have been handled with the priority I mentioned.
Re: AtariAge "CPU comparison"
by on (#170451)
Sik wrote:
To be fair, honestly the interruption problem would have been solved if the Z80 got priority over the VDP during DMA.


Totally agree... A shame they did not design the BUS arbiter to handle it that way. Maybe the VDP DMA couldn't handle that easily and requires exclusive BUS access, still that would have solved almost all issues (the others being the bad driver design).

Quote:
I'd argue the SPC memory limitation is worse, honestly. I mean, it's a sample based system, where access to more memory is even more important. But of course there's the bus conflict issue, I wonder if that could have been handled with the priority I mentioned.


Well, at least on the Megadrive they added a BUS arbiter to let the Z80 steal BUS cycles; they should have done at least the same for the SNES SPC. Having the SPC fetch samples directly from ROM would have been awesome :)
Re: AtariAge "CPU comparison"
by on (#170457)
To be fair, was the SPC700 designed exclusively for the SNES? If not that may explain why it's so self-contained.

Stef wrote:
Maybe the VDP DMA couldn't handle that easily and requires exclusive BUS access

100% sure it could have (it already can cope with the VDP hogging up most accesses during active scan, after all). Most likely they just didn't realize the issue on their part; it's even questionable whether they really cared about PCM when the hardware was being designed.
Re: AtariAge "CPU comparison"
by on (#170459)
I don't think Sony knew enough about how the SNES was going to work, and vice versa.
Re: AtariAge "CPU comparison"
by on (#170604)
Sik wrote:
To be fair, was the SPC700 designed exclusively for the SNES?


Yes: Ken Kutaragi designed it specifically for the SNES. Sort of in secret and against the explicit instructions of Sony management (who wanted nothing to do with video games), and then presented to them as a fait accompli, IIRC.

It was subsequently used in a bunch of other stuff, and the feature set of the Playstation's SPU looks very much like a next-gen S-SMP/SPC700, so I'd wager it was derived from it.
Re: AtariAge "CPU comparison"
by on (#170605)
Was BRR compression even used for anything outside the SNES and Playstation?
Re: AtariAge "CPU comparison"
by on (#170610)
CD-ROM XA ADPCM, used in both CD-i and PlayStation consoles, is similar to the PlayStation's VAG variant of BRR, but with sinc filtering (what nocash's psx-spx calls "25-point Zigzag Interpolation") instead of Gaussian filtering. CD-i oriented description

There have been plenty of proprietary ADPCM-family audio codecs. But later systems appear to have standardized on IMA ADPCM, such as the Nintendo DS.
Re: AtariAge "CPU comparison"
by on (#170655)
Guspaz wrote:
Sik wrote:
To be fair, was the SPC700 designed exclusively for the SNES?


Yes: Ken Kutaragi designed it specifically for the SNES. Sort of in secret and against the explicit instructions of Sony management (who wanted nothing to do with video games), and then presented to them as a fait accompli, IIRC.
citation?
Re: AtariAge "CPU comparison"
by on (#170659)
Closest citation I can find for that is in the first paragraph here: https://en.wikipedia.org/wiki/Ken_Kutaragi#Career

However, that entire paragraph lacks citations. The closest reference I can find for *that* is: http://www.eurogamer.net/articles/farew ... er-article --

Quote:
... It's typical of Kutaragi's approach to work that he didn't actually tell any of his superiors about the Nintendo deal. Sony had no interest in videogames, and it's unlikely that bosses at the firm would ever have approved of his working on a chip for a Nintendo console. Undeterred, Kutaragi simply set about designing the chip in secret - eventually producing the design for the SPC700, the groundbreaking audio chip which allowed the SNES to seriously outclass all of its rivals in terms of sound and music.

Sony's executives were apoplectic when they found out about the project, and not for the last time, Kutaragi's career had a near-death experience. However, he was rescued by the intervention of Norio Ohga, who approved of the project, and allowed Kutaragi to complete work on the chip. ...
Re: AtariAge "CPU comparison"
by on (#170681)
Hmmmm, I recall reading somewhere about him designing the SPC700 on his own then proposing it to Nintendo, but maybe it was just me misremembering this (or somebody misunderstanding it). Though yeah the SPC700 definitely was first used on the SNES, my question was because if it was like I said, then it meant Kutaragi had no way to make the SPC700 well integrated with the rest of the hardware (enforcing the isolated design it ended up having).
Re: AtariAge "CPU comparison"
by on (#170702)
My source was Console Wars, which went into the Sony/Nintendo interactions in significant detail. Both because of what it meant for the birth of the SNES, and because of what it meant for the birth of the Playstation. The book focuses more on Sega than anyone else, but after Nintendo and Sony broke up, Sony was going to do the Playstation with Sega, until Sega of Japan kiboshed it in a fit of arrogance.
Re: AtariAge "CPU comparison"
by on (#170720)
Wasn't Console Wars the romanticized one? (i.e. don't take everything from it seriously, some things were changed to make for a better story) There was another book from about the same time which was more accurate, but it focused solely on Sega if I recall correctly.
Re: AtariAge "CPU comparison"
by on (#170721)
I think dramatized is the word you're looking for, and yeah, it was, although as the article that Koitsu linked to indicates, Kutaragi's work on the SPC700 and its contentious nature inside of Sony wasn't exactly a secret.
Re: AtariAge "CPU comparison"
by on (#170851)
Revenant wrote:
The "Nintendo crippled the CPU in order to make cartridges more expensive" is definitely "[citation needed]" material.

All right, I'm going with that.

tepples wrote:
93143 wrote:
Based on given edit reasons, this person seems to believe [...] that the bit depth of the DSP was why it "blew out" competing sound chips.

Compare the clarity of "Nintendo!" in Tetris Attack to "Say-Gah" in the Sonic games to see what a difference bit depth makes. Or compare the clarity of any Super NES game to its GBA port, which is usually a pile of hiss and aliasing because of its 8-bit DAC usually filled by software mixing at 18 kHz.

8-bit PCM doesn't sound that bad; it's the time-domain aliasing that really kills it.

I know bit depth helps, but it wasn't the only reason or even the primary reason the SNES got a reputation for better sound.

tepples wrote:
So you then have to disprove "65816 sux; 68000 roolz".

More or less. It's a delicate task, especially for somebody whose 68000 knowledge is entirely secondhand... and I can't spare very much time for this sort of thing as it's kinda crunch time in the real world...

Quote:
If you do decide to make major changes to the Super NES page at TV Tropes, could you consider doing the same for the corresponding page on All The Tropes?

I don't see why not, once the edits have stabilized. Hopefully I can get knowledgeable people to at least review what I've written to make sure I'm not spreading propaganda...

TOUKO wrote:
It seems that some guys here have tried to feed SPC RAM with HDMA, but I don't know if it works or not.

It works in N-Warp Daisakusen.

HihiDanni wrote:
I'm hopeful about getting two simultaneous streaming samples, but I'm going to need to modify the SNESGSS sound driver, and I know basically nothing about how the SPC-700 works.

That sounds kinda like my situation.

I've tried to sketch out an ultra-high-bandwidth technique using self-modifying code (rather than the stack method used by d4s in NWD). I posted it in the general-purpose thread here and here. It was roundly ignored, possibly because tl;dr, or possibly because it's my first real attempt at coding SPC700 and is entirely untested - I haven't even tried to assemble it yet, so it's not like I've proved anything... but if it works (and my assumptions about the context are realizable), it should be possible to stream two samples at 32 kHz, or three at 22 kHz, with bandwidth to spare and without restricting music timing to VBlank. At least, that's the idea...
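
For a sense of scale, here is the raw data rate those targets imply, assuming the streams are BRR-encoded (9 bytes per 16-sample block), that "22 kHz" means exactly 22,000 Hz, and a round 60 frames per second. Protocol and handshake overhead are ignored, so this is a lower bound on what the transfer has to sustain, not a measurement of the technique described above.

```python
# BRR packs 16 samples into a 9-byte block (1 header byte + 8 data bytes).
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16
FRAMES_PER_SECOND = 60  # assumption: round NTSC figure


def brr_bytes_per_second(sample_rate_hz: int, channels: int) -> float:
    """Raw BRR payload rate for `channels` streams at `sample_rate_hz`."""
    return sample_rate_hz * channels * BRR_BLOCK_BYTES / BRR_BLOCK_SAMPLES


two_at_32k = brr_bytes_per_second(32_000, 2)
three_at_22k = brr_bytes_per_second(22_000, 3)

for label, rate in (("2 x 32 kHz", two_at_32k), ("3 x 22 kHz", three_at_22k)):
    print(f"{label}: {rate:.0f} bytes/s, {rate / FRAMES_PER_SECOND:.2f} bytes/frame")
```

Both cases land in the neighbourhood of 36 to 37 KB/s, or roughly 600 bytes per frame.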

TOUKO wrote:
even in psycopathicteen's Bad Apple demo it would be an accomplishment.

I'm not sure it would help in Bad Apple, since he's already used general-purpose DMA to speed up video decoding. In other words using HDMA for anything would force him to roll back the video decoder or else lose compatibility with rev.1 CPUs. (This is a very freaking annoying hardware bug. How did it manage to slip through testing and get released?)

I once proposed using almost all of audio RAM as a huge multi-second buffer, filled as the opportunity arises. This would allow the program to completely ignore audio for several seconds at a time if it needed to concentrate on video. I don't know what the program currently does, or even if he still considers it a work in progress...
Re: AtariAge "CPU comparison"
by on (#170853)
Are you planning on uploading instruments as needed?
Re: AtariAge "CPU comparison"
by on (#170856)
Revenant wrote:
The "Nintendo crippled the CPU in order to make cartridges more expensive" is definitely "[citation needed]" material.

"Citation needed" is for stuff you think is true, but doesn't have a source. I'd say that's just "delete on sight" material, author inserting their own opinion instead of objectively reporting.
Re: AtariAge "CPU comparison"
by on (#170904)
psycopathicteen wrote:
Are you planning on uploading instruments as needed?

My original goal was real-time multi-channel streaming at high sample rates, without restricting music timing to VBlank. But obviously if you can do that, it should also be possible to change out samples in RAM during playback. And it would be nice to have a reasonably fast bulk data loader that didn't freeze the game à la SFA2...

I suspect it'll be a while before I'm ready to actually try out this method in the field. By then I may have a better idea, or I may have realized that this one has a fatal flaw or something...

rainwarrior wrote:
Revenant wrote:
The "Nintendo crippled the CPU in order to make cartridges more expensive" is definitely "[citation needed]" material.

"Citation needed" is for stuff you think is true, but doesn't have a source. I'd say that's just "delete on sight" material, author inserting their own opinion instead of objectively reporting.

Technically what it said (following on from the part about the CPU being weak) was:
Quote:
This was intentional on the part of Nintendo because it wanted to lower the cost of the system, and it provided connections for cartridges to have their own coprocessors when high performance processing is needed. This naturally drove up the cost of games that required such coprocessors.

Which could be considered technically true, but gives the wrong impression. I deleted it, and gave the following reason (visible only in page history):
Quote:
This seems to be an unsupported extrapolation. The NES, Genesis and even N64 could and did use special hardware in cartridges; I am unaware of any indication that this feature was related to the choice of CPU in the case of the SNES.

But now I'm worried that I may have started an urban legend about coprocessors in N64 cartridges, which is not what I meant. If some dude on some forum starts claiming WDC or Conker or Naboo had an enhancement chip, you know who to blame...

Next up is to try to finesse some of the CPU stuff...
Re: AtariAge "CPU comparison"
by on (#170907)
Even the Atari 2600 did, really (and at some point the Odyssey was going to - that's a console that didn't even have a CPU for starters). I don't recall any extra hardware in N64 cartridges outside save RAM though, but I may be missing something obvious.

And yeah, people apparently were convinced Donkey Kong 64 really made some proper use of the Expansion Pak until one of the devs came out and admitted it was just to prevent a crash somewhere in the game, so I can see why some people may start thinking some games would need a coprocessor >_>
Re: AtariAge "CPU comparison"
by on (#170917)
Doubutsu no Mori's real-time clock, Mario no Photopi's flash card readers, and Morita Shougi 64's built-in modem. At least, that's what a quick Google turned up.

I didn't know that about DK64. Knowing Rare, it must have been one hell of a bug...
Re: AtariAge "CPU comparison"
by on (#170919)
93143 wrote:
Mario no Photopi's flash card readers

I wonder whether that game can be exploited to take full control, as Super Mario World and Pokémon Yellow and many Wii games were. Then it would become a flash cart. True, little else uses SmartMedia, but I think a few multi-format card writers that handle xD-Picture cards can still handle SM.
Re: AtariAge "CPU comparison"
by on (#170920)
93143 wrote:
Doubutsu no Mori's real-time clock, Mario no Photopi's flash card readers, and Morita Shougi 64's built-in modem. At least, that's what a quick Google turned up.

Touché.

And to Rare's credit, DK64 started as a 64DD game (and hence was made with the extra memory in mind). When they decided to turn it into a cartridge game they tried to cram it into less memory, and they did manage to pull it off... except for that one annoying crash, and it happened to stop if the extra memory was present. So they just made the game require the Expansion Pak and they claimed the game made use of the extra memory =P (also probably gave a good excuse to sell the Expansion Pak as it'd later be needed for Ocarina of Time, which was another 64DD to cartridge port)
Re: AtariAge "CPU comparison"
by on (#170921)
You mean Majora's Mask? Ocarina was a 64DD to cart port, but it came out a year before DK64.
Re: AtariAge "CPU comparison"
by on (#170967)
Right, my timeframes keep being all messed up =P (and yeah now that I remember, OOT was always cartridge, Majora's Mask was the disk one that latched onto OOT)
Re: AtariAge "CPU comparison"
by on (#171379)
I checked SOR 3's sprites, and I am surprised by the number of sprites used for the foes (and maybe for the players too).
Up to 10 sprites are used for each metasprite; it seems like a waste to me...
I don't know how many CPU cycles it takes to modify a single sprite entry on the MD, but with this system I think it's a lot.
Re: AtariAge "CPU comparison"
by on (#171393)
Yeah, but remember the MD has the ability of displaying rectangular-shaped sprites. In a beat'em-up the amount of tiles to be uploaded is a bigger issue (both for bandwidth and memory usage), so you're better off using more sprites in order to reduce waste from blank areas.
Re: AtariAge "CPU comparison"
by on (#171409)
Because audio streaming has been brought up, I wonder what a 1-bit ADPCM format would sound like.

Such as something with a formula similar to this:

y(t) = x(t) + 5/2*y(t-1) - 5/2*y(t-2) + 15/16*y(t-3)

where y(t) is the signed 8-bit output, and x(t) is either +8 or -8.
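
A small decoder sketch for that recurrence, to make it easy to experiment with. The zero initial history, the clamp to the signed 8-bit range, and feeding the clamped value back into the history are assumptions that the formula itself doesn't specify; nothing here says whether the filter behaves well on real audio.

```python
def decode_1bit(bits, step=8):
    """Decode a bit sequence with
    y(t) = x(t) + 5/2*y(t-1) - 5/2*y(t-2) + 15/16*y(t-3),
    where x(t) is +step for a 1 bit and -step for a 0 bit."""
    y1 = y2 = y3 = 0.0  # assumption: history starts at zero
    out = []
    for bit in bits:
        x = step if bit else -step
        y = x + 2.5 * y1 - 2.5 * y2 + 0.9375 * y3
        # Assumption: clamp to signed 8-bit and feed the clamped value back.
        y = max(-128.0, min(127.0, y))
        out.append(y)
        y1, y2, y3 = y, y1, y2
    return out


samples = decode_1bit([1, 0, 1, 0])
print(samples)
```

An encoder would run the same predictor and pick, for each sample, whichever bit brings the decoded output closer to the input.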
Re: AtariAge "CPU comparison"
by on (#171413)
CVSD, a form of 1-bit ADPCM, is in Sinistar.
Re: AtariAge "CPU comparison"
by on (#171421)
Quote:
Yeah, but remember the MD has the ability of displaying rectangular-shaped sprites. In a beat'em-up the amount of tiles to be uploaded is a bigger issue (both for bandwidth and memory usage), so you're better off using more sprites in order to reduce waste from blank areas.

Of course, I thought it was for maximising sprite bandwidth, but I think (maybe I am wrong) it's not really optimal in the number of sprites used; you could use bigger sprites here with the same result (for bandwidth).

For example, the baseball bat uses 5 sprites (8x8)... :shock: Good for bandwidth, but a little bit overkill, no?

Even so, I think the SOR series has a very good sprite engine.
Re: AtariAge "CPU comparison"
by on (#171429)
Aside from bandwidth, I believe it was done that way to optimize sprite-per-scanline rendering and avoid sprite flickering as much as possible. For me that is the big advantage of having rectangular sprites (8x32 for vertical objects), and sometimes you have to use many small hardware sprites when you have a thin diagonal object (pipe or sword)...

Quote:
Even so, I think the SOR series has a very good sprite engine.


Of course they have ;) I looked a bit at the SOR2 engine, and we can see they cut sprites (I mean rendering just half of one) when they are going off screen, an optimization that limits scanline flickering but requires more computation.
Actually, a good sprite engine is definitely not easy to get right. I'm certain that many games spent a big part of their CPU budget on the sprite engine (not even considering collisions, just displaying (meta)sprites in a reasonably optimized way given hardware resource limitations).
Re: AtariAge "CPU comparison"
by on (#171590)
I found an Audacity plugin that allows me to convert PCM to 1-bit DPCM quality, believe it or not. So far, setting the output to 4-bit has more white noise, whereas 8-bit is more muffled. I haven't tried 6-bit yet.
Re: AtariAge "CPU comparison"
by on (#171613)
tepples wrote:
The fastest you could scroll on a GS was about 10 to 15 fps using the obscure "PEA field" technique that overlays the stack on the hardware frame buffer, and the back buffer is stored backwards as a block of self-modifying code.


Another guy used the same kind of technique to get what looks like 30+fps scrolling.
http://iigs.dreamhosters.com/gte/gte.html
https://www.youtube.com/watch?v=IsXPn6OCMF8
8-)
Re: AtariAge "CPU comparison"
by on (#171714)
ehaliewicz wrote:
tepples wrote:
The fastest you could scroll on a GS was about 10 to 15 fps using the obscure "PEA field" technique that overlays the stack on the hardware frame buffer, and the back buffer is stored backwards as a block of self-modifying code.


Another guy used the same kind of technique to get what looks like 30+fps scrolling.
http://iigs.dreamhosters.com/gte/gte.html
https://www.youtube.com/watch?v=IsXPn6OCMF8
8-)

Both very fascinating! ...I wonder what happened to the last bits.
Re: AtariAge "CPU comparison"
by on (#172807)
byuu wrote:
The SNES could have been a beast had they included a NEC uPD7725 with program and data RAM (for per-game upload of firmware) instead of ROM.

I've looked up the datasheet for the μPD77C/P25, and now I'm wondering why the DSP-1 took so long to do stuff. According to the datasheet a 16x16 signed multiply is just one of several things that can all happen in one cycle, but the SNES manual lists that same multiply as taking 26 cycles. The datasheet says 2.58 μs for a sin/cos, but the SNES manual says 7.8 μs at about the same clock speed. Is there really that much overhead involved in getting this chip to do something on demand?

Also, at 50 mA peak, this thing draws as much current as a SNES mouse. I suppose that rules out its use with the Super FX or SA-1...

...

Regarding the tvtropes page, the CPU entry reads like this (I deleted the extra bit about how the slow CPU was an attempt to save money by shoving the load off onto special chips in cartridges):
Quote:
*Like the NES, the Super NES has a Central Processing Unit for main data processing, and a Picture Processing Unit for the graphics. Also like the NES, the Super NES CPU and PPU have a master clock speed of 21.477 MHz, but the CPU divides it down to between 1.79, 2.68 and 3.58 MHz due to slow (cheap) cartridge ROM, and it was cheaper to make the system with said clock speed. This led to the belief that the SNES is a slow system, and that too much on screen action would slow it down.

I was thinking about putting an expandable note at the end of that section, as follows:
Quote:
The reality seems to have been a bit more complicated. The 65C816 was more cycle-efficient than the 68000, especially with typical game logic of the era, meaning the difference in clock speed with the Sega Genesis was less important than it looked. However, the 65C816 wasn't nearly as popular/widespread or easy to use, and hackers have reported some fairly boneheaded programming in commercial games, particularly in early releases. In addition, while the graphics processor in the SNES was more powerful than its competition and loaded with features, it was complex and fiddly to work with. It was certainly possible to put a lot of action on screen in a SNES game without slowdown, as demonstrated by later games like ''Rendering Ranger R2''.

I should probably also finesse the part that implies that 1.79 MHz ROM access was a thing... Is it true that a game that relies exclusively on autopoll need never run at 1.79 MHz? I mean, the autopoll doesn't interrupt the CPU, right?
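
For reference, the three speeds quoted in that entry fall out of dividing the master clock by 12, 8 and 6. A minimal sketch, using the 21.477 MHz figure from the quoted text (the function name is just for illustration):

```python
MASTER_MHZ = 21.477  # master clock figure quoted in the entry above

def cpu_mhz(divider):
    """CPU speed in MHz for a given master clock divider."""
    return MASTER_MHZ / divider

# 12, 8 and 6 master cycles per CPU cycle
for divider in (12, 8, 6):
    print(f"master / {divider:2d} = {cpu_mhz(divider):.2f} MHz")
```

This only shows where the numbers come from; it says nothing about which accesses actually run at which speed.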

The sub-entry after the main CPU entry used to read:
Quote:
** The processor itself was a 65C816, a 16-bit successor to the 6502 used in the NES, Apple II, Commodore 64, and Atari consoles and computers. Nintendo actually used Apple IIGS computers as development systems, since they also used the 65C816.

The recent edits added the following to the sub-entry:
Quote:
Since the 6502 family has only one accumulator register, every operation that uses a second operand must reference the RAM. Accessing the RAM is limited by the 8-bit data bus. Therefore, 16-bit operations were slower than 8 bit operations, but the 16-bit operations were still faster than emulating them with 8-bit instructions.

I was thinking of replacing that with an expandable note that reads:
Quote:
This explains the lower clock speeds vs. the Sega Genesis. The 6502 was designed as a budget processor when RAM was faster than CPUs, and thus it used an accumulator-based architecture with very simple instructions that required it to access the bus almost every cycle. The 65C816 inherited this operational paradigm, along with the 8-bit data bus that forced it to access code and data one byte at a time (this meant that 16-bit operations were somewhat slower than 8-bit operations, though still far faster than the sequence of 8-bit operations that would be required to do the same task). The 68000, by contrast, used more complex instructions that took more cycles to execute, and accessed its 16-bit data bus only once every four CPU cycles, relying on its array of internal general-purpose registers to keep processing speed up. This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU while using slower, cheaper memory. Note however that the fast turnaround of the 65C816 makes it more powerful at a given clock speed than the 68000, so the advantage isn't as big as it looks (bus throughput is nearly identical on both systems; ironically the more sophisticated 68000 is better at moving bulk data and the more primitive 65C816 is better at navigating complicated logic).

The advantage of an expandable note is that it allows long explanations without cluttering the default view with walls of text, and is thus less likely to get flagged as "natter" and cut. Still, that's an awfully long explanation...

Comments?
Re: AtariAge "CPU comparison"
by on (#172906)
Quote:
This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU while using slower, cheaper memory.

I think this is correct for a CPU-only system (like the Atari ST or the Apple IIGS), but when things like DMA are involved, you need fast RAM/ROM too; the CPU is no longer the only thing that counts.
The MD's WRAM is 150 ns, and the ROM also needs to be 150 ns (if you use DMA); the PCE needs 140 ns RAM/ROM with its 7.16 MHz CPU.

The 68k was more expensive than the 816, even in the 16-bit era; really, the SNES's problem is more its architecture than its CPU speed. Of course I'm not saying a 65816 @ 2.6 MHz is enough, but its low frequency is a consequence of the nonsense in the SNES's architecture.

I think the 68k would have been better suited to the SNES's architecture: you can have a faster CPU clock with the SNES's slow memory.
Re: AtariAge "CPU comparison"
by on (#172910)
...hmm. I guess that's true; the VDP's DMA unit can use the whole 16-bit bus at two pixels per word, which is an equivalent bus speed to the SNES DMA unit even in H32 mode. In H40 mode it's nearly as fast as FastROM... Thanks for pointing that out.

I thought I remembered something about certain MD games using super slow ROM (~500 ns in one case) and still running well. Maybe that was nonsense, or maybe I misunderstood or remembered wrong...

How about this:
Quote:
This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU without needing correspondingly faster, more expensive memory.

No need to get into DMA speed comparisons at this point in the article...

EDIT: Found the thread: http://www.sega-16.com/forum/showthread ... -ROM-speed
From what I can tell, some games might have used chips that slow, but they'd have had to avoid using DMA to pull directly from ROM. Well, whatever; the new version is less misleading...


Further comments?
Re: AtariAge "CPU comparison"
by on (#172914)
Quote:
I thought I remembered something about certain MD games using super slow ROM (~500 ns in one case) and still running well. Maybe that was nonsense, or maybe I misunderstood or remembered wrong...

It's only possible if no DMA from ROM is used, and I don't think games, even early ones, avoided DMA from ROM entirely.
If the MD could transfer during active display (more than 4 words) it would be a different story, but with unlimited access only in vblank, it's not conceivable to me.

http://gendev.spritesmind.net/forum/vie ... 8&start=30

Quote:
Quote:
This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU without needing correspondingly faster, more expensive memory.


No need to get into DMA speed comparisons at this point in the article...

Of course we need to, because it's related to memory speed. That statement is true, as I said, for systems where the CPU is the fastest chip accessing data, and we aren't talking about a simple 68k/816 cost comparison, but about the use of those CPUs in two 2D game systems that are not only CPU-dependent and that involve other chips.

Usually the 65xx needs 1 cycle to access memory, like DMA in general; this is why the PCE's CPU needs 140 ns memory @ 7.16 MHz, and the MD needs 150 ns for its 6.67 MHz DMA.
I think Nintendo focused on the SNES's PPU and the audio chip, which were (very??) expensive, and cut costs on the other parts (CPU/RAM/ROM).
I really think that the MD's pair of CPUs (68k + Z80) was way more expensive than the 816 + its DMA controller.
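
The 140 ns and 150 ns figures above are simply the length of one cycle at those clock rates. A rough sketch of the arithmetic, assuming one memory access per cycle with no setup or hold margin (real parts need some):

```python
def cycle_ns(mhz):
    """Length of one clock period in nanoseconds."""
    return 1000.0 / mhz

# Clock rates taken from the post above; the labels are only descriptive.
for name, mhz in (("PCE CPU", 7.16), ("MD DMA", 6.67)):
    print(f"{name} @ {mhz} MHz: about {cycle_ns(mhz):.0f} ns per access")
```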
Re: AtariAge "CPU comparison"
by on (#172980)
The big thing about the 68000 was that there were a lot more people experienced with it, from its use both in computers and in arcades.

That said, I was under the impression ROM speed had to be 120ns? I mean, the access still has to happen in a single cycle after all, the reason the 68000 spends four cycles is because of its microcoded nature. But I could be wrong, I should look up 68000's bus cycle timings probably.

EDIT: issues the access in 2nd cycle, data appears in 3rd cycle, stops accessing in 4th cycle... OK I guess that reacting within 2 cycles (or maybe 1.5) works.
Re: AtariAge "CPU comparison"
by on (#172983)
TOUKO wrote:
Quote:
No need to get into DMA speed comparisons at this point in the article...

Of course we need, because it's related to memory speed

This note is in the CPU section. Memory is further down the page. (And it looks like it needs work too; there's been a rant added about how horrible the 8-bit bus was, as if it were independent of the choice of CPU.)
Re: AtariAge "CPU comparison"
by on (#172985)
So as far as I can tell:

Super NES: 3.58 MHz, reads 8 bits in I think half a cycle (140 ns), overall peak throughput 3.58 MB/s
Genesis: 7.67 MHz, reads 16 bits in 2 cycles (261 ns), overall peak throughput 3.84 MB/s

On the one hand, an 8-bit read or write will finish faster than it would on a half-speed 16-bit bus where 8-bit accesses are the same speed as 16-bit accesses. On the other hand, a 16-bit bus can use slower memory for a given throughput.
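
The two peak figures can be reproduced with a small sketch. It assumes the 65816 moves one byte every CPU cycle and the 68000 moves one 16-bit word every four clocks (the bus cycle length mentioned earlier in the thread); the function name is made up for illustration:

```python
def peak_mb_per_s(clock_mhz, bytes_per_access, clocks_per_access):
    """Peak bus throughput in MB/s, ignoring refresh and any wait states."""
    return clock_mhz * bytes_per_access / clocks_per_access

snes = peak_mb_per_s(3.58, 1, 1)     # 8 bits every cycle
genesis = peak_mb_per_s(7.67, 2, 4)  # 16 bits every four clocks

print(f"Super NES: {snes:.2f} MB/s")
print(f"Genesis:   {genesis:.3f} MB/s")
```

Note that the Genesis figure only works out if a full bus cycle takes four clocks, even though the data itself is read within about two of them.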
Re: AtariAge "CPU comparison"
by on (#172990)
I'm not sure it makes sense to use a 16-bit bus with a 65816, even with glue logic to split and merge the bytes. The 65816 needs byte-aligned access; a simple operation like inc $0C27 in 16-bit mode would require six accesses for just seven bytes. I suppose it could be handled with wait states... You'd have to increase the CPU speed a fair bit just to come out ahead.
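
Here is a rough sketch of where "six accesses for seven bytes" comes from. It assumes a plain 16-bit bus that can only fetch aligned words, no buffering between accesses, and an even-aligned opcode address (0x8000 is made up for the example):

```python
def word_accesses(start, length):
    """Number of aligned 16-bit words touched by `length` bytes at `start`."""
    first_word = start // 2
    last_word = (start + length - 1) // 2
    return last_word - first_word + 1

OPCODE_ADDR = 0x8000  # hypothetical location of the instruction

fetch = word_accesses(OPCODE_ADDR, 3)  # opcode plus two address bytes
read = word_accesses(0x0C27, 2)        # 16-bit operand at an odd address
write = word_accesses(0x0C27, 2)       # modified operand written back

total_accesses = fetch + read + write
total_bytes = 3 + 2 + 2
print(total_accesses, "accesses for", total_bytes, "bytes")
```

With the operand at an even address the read and write would each take one access instead of two, which is why alignment matters so much here.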
Re: AtariAge "CPU comparison"
by on (#172997)
Wait states like those used in the SA-1, a coprocessor with an embedded 10.7 MHz 65816? I seem to remember SA-1 games also using 16-bit ROM.
Re: AtariAge "CPU comparison"
by on (#173004)
But what was the ROM speed? With the ubiquity of random byte accesses in 65816 code, there's no way it could actually sustain 10.74 MHz on a ROM that couldn't respond at that speed.

The only wait states I'm aware of were to pause the SA-1 on a cycle-pair basis to allow the S-CPU unfettered access to shared memory. Because apparently the S-CPU is a juggernaut that doesn't understand the concept of a wait state...

EDIT: Hang on, there's something interesting in the manual... yep; apparently there can be wait cycles introduced on jumps and returns, on branches to odd addresses, and on data reads from ROM. Sounds like a 16-bit chip with glue logic to me...

This also explains something I was wondering about - how Nintendo managed to afford 50 ns ROM for the SA-1 (which was used by a ton of games that totally didn't need it, possibly as copy protection) despite even the later versions of the Super FX being limited to 5 master cycles per byte outside the instruction cache. The answer is apparently that they didn't...
Re: AtariAge "CPU comparison"
by on (#173360)
Did they also fix the 65816's half-cycle-long RAM accesses? If that's the case, could they have released the SNES with a SA-1? Can you imagine a SNES that can upload 24 kB in one frame!
Re: AtariAge "CPU comparison"
by on (#173376)
psycopathicteen wrote:
could they have released the SNES with a SA-1?

Probably not at 200 USD in fourth quarter 1991.

Quote:
Can you imagine an SNES, that can upload 24 kB in one frame!

DMA speed would have depended on how fast the S-PPU and other B bus devices can accept writes.

And on the hardware we did get, it depends on how far you're willing to letterbox. If your game is framed for a modern widescreen TV, you can get away with showing only 168 lines of active picture and the rest with forced blanking. (This also means the GBA port can use the same framing.)
Then you have 262 - (168+1) = 93 lines of blanking, because you need 1 to prime the sprite renderer. Then 93 * 1364 / 8 = 15856 bytes, or nearly all of sprite VRAM.
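
The same estimate as a small sketch, so other letterbox heights can be tried. It assumes 262 lines per NTSC frame, 1364 master cycles per scanline (the figure that reproduces the 15856 result) and 8 master cycles per DMA byte, and it ignores DMA setup time and DRAM refresh:

```python
LINES_PER_FRAME = 262            # NTSC
MASTER_CYCLES_PER_LINE = 1364    # assumed scanline length
MASTER_CYCLES_PER_DMA_BYTE = 8   # assumed DMA cost per byte

def dma_bytes_per_frame(active_lines, prime_lines=1):
    """Rough upper bound on bytes DMA can move during blanking each frame."""
    blank_lines = LINES_PER_FRAME - (active_lines + prime_lines)
    return blank_lines * MASTER_CYCLES_PER_LINE // MASTER_CYCLES_PER_DMA_BYTE

print(dma_bytes_per_frame(168))  # letterboxed to 168 lines
print(dma_bytes_per_frame(224))  # full 224-line picture
```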
Re: AtariAge "CPU comparison"
by on (#173384)
Seems like it could have at least accepted one byte per dot, since it reads faster than that (mind you, that's two 8-bit memories in parallel)... How expensive would it have been to add a bus terminal to the CPU die and use 16-bit busing everywhere else on the board? The CPU wouldn't have been able to sustain 5.37 MHz (or 7.16 MHz in FastROM), but it would have been noticeably faster than what we got, and DMA would have run at double speed except for a slight penalty on odd start/end addresses... and when updating only the low or high byte, like with Mode 7...

...how would such a system handle 8-bit writes? Would it be possible to assert a write and then just not put a signal on half the data lines, or would it have to read the word, modify it, and then store it back? I'm guessing the SA-1 didn't need to worry about this - from the description of the wait behaviour, it looks like only the ROM was 16-bit. You could redesign the SNES to work the same way, with the bus terminal in front of the ROM instead of on the CPU, and if the PPU bus could accept 8-bit writes at 5.37 MHz you could still double the DMA speed, at least when updating both low and high bytes in VRAM... uh, do DMA and wait states mix?

Running in WRAM would be an issue. Perhaps the bottom 8 KB could have been fast SRAM, to allow full-speed operation like with the SA-1's I-RAM...

Anybody have an idea of how feasible this sort of thing would have been in 1990?

EDIT: What I said about 8-bit writes on a 16-bit bus applies to any write, since they come from the CPU one byte at a time. You could buffer low bytes for one cycle to make sure there isn't a high byte coming, but that doesn't help much unless the bus terminal has access to the program counter or something so it can predict what the CPU will want next and intelligently interleave accesses. Even then, if making an 8-bit write to a 16-bit chip isn't possible (and from the way the secondary pixel cache on the Super FX works I suspect it's not), the only way to make 8-bit writes happen any faster than two memory cycles would be to use separate buses for each system component, so a fresh ROM fetch could happen in parallel with RAM access... Furthermore, even putting the bus terminal on cartridge access only doesn't fully solve the problem, because you still have to deal with SRAM and special chip registers. The only solution there seems to be to put the bus terminal in the cartridge itself, and then you've irretrievably lost the design philosophy of the S-CPU in which timing is determined on-die and wait states don't exist.

Hang on - a lot of PPU registers are write-only and 8 bits wide. I'm guessing you'd have to change that...
Re: AtariAge "CPU comparison"
by on (#173389)
93143 wrote:
...how would such a system handle 8-bit writes? Would it be possible to assert a write and then just not put a signal on half the data lines, or would it have to read the word, modify it, and then store it back?
Could do the same as the 68k, and have separate "upper byte" and "lower byte" strobes.
Re: AtariAge "CPU comparison"
by on (#173393)
lidnariq wrote:
93143 wrote:
...how would such a system handle 8-bit writes? Would it be possible to assert a write and then just not put a signal on half the data lines, or would it have to read the word, modify it, and then store it back?
Could do the same as the 68k, and have separate "upper byte" and "lower byte" strobes.

You mean writing each byte separately, in parallel but staggered by half a memory cycle? With memory constructed as a pair of 8-bit units instead of one 16-bit unit?

Yeah, I guess that'd work. Not perfect, but 8-bit writes wouldn't be horrible any more. Solves the PPU bus problem too. That's what I get for posting a stream of consciousness instead of taking the time to think it through, or at least waiting for an answer...

But the question remains - how hard would it have been to do this? Did Nintendo pass up an easy method of supercharging the console, or is there something about this that would have been prohibitive in 1990, or is there another theoretical issue I haven't thought of?
Re: AtariAge "CPU comparison"
by on (#173394)
With 1 cycle for accessing memory you need 180 ns chips (@ 5.36 MHz), which is easily doable, but you must cut the 128 KB of WRAM to 64/32 KB to reduce costs.
For ROMs, since Sega used 150 ns chips in its cartridges, I don't see why Nintendo could not do the same.

I really think that the SNES was scheduled to come out in '88/'89 but was delayed by the PPU and/or SPC development, because in the '90s the 65816 was much faster than 5/6 MHz, close to 14 if I remember correctly.
Re: AtariAge "CPU comparison"
by on (#173395)
The 8-bit WRAM is already good for 2.68 MHz. Therefore a 2x8-bit dual WRAM should be good for 5.37 MHz (with wait states for random access) if the hypothetical 16-bit memory controller can overlap low and high byte accesses, giving each one a full memory cycle. (This may assume a 7.16 MHz internal CPU speed, analogous to the current 3.58 MHz internal speed, which I suspect allows 5 master cycles for the WRAM to respond...? That would explain the ratio between the SlowROM and FastROM specifications... Okay, so everyone probably already knew this. I'm not a hardware guy, all right?)

Based on testing with the B bus WRAM gate, the S-WRAM can stably respond to at least a couple of sequential accesses at FastROM speeds, which implies at least ~150 ns performance or so.


Wait... it seems to me that if you were to demux the bank byte and use the full cycle for data, you could read at twice the speed (writes would be delayed half a cycle because the data's not there yet). Could the ROM in SA-1 games actually be ordinary SlowROM? Could the SNES have been designed so that the 120 ns FastROM spec was sufficient for 14.3 MHz? Or am I missing something, and it's already reading as fast as it can?
Re: AtariAge "CPU comparison"
by on (#173403)
93143 wrote:
You mean writing each byte separately, in parallel but staggered by half a memory cycle? With memory constructed as a pair of 8-bit units instead of one 16-bit unit?

The 68000 has a 16-bit bus but can do byte accesses. What it does is have a lower strobe and an upper strobe indicating which bytes it wants to access (lower strobe for low byte, upper strobe for high byte, both strobes for word). Those are two more signals (much like e.g. the address lines)

So the suggestion was to use strobes to let hardware know whether a byte or a word access is wanted =P
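
A tiny sketch of the idea, modelling only which strobes get asserted. It assumes the 68000's big-endian layout, where the byte at an even address sits on the upper half of the bus; the function name and the use of an exception for misaligned words are just for illustration:

```python
def strobes(address, size):
    """Return (upper_strobe, lower_strobe) for a byte (1) or word (2) access."""
    if size == 2:
        if address & 1:
            # The 68000 refuses word accesses at odd addresses.
            raise ValueError("word access at odd address")
        return (True, True)
    if size == 1:
        # Even address: upper byte lane. Odd address: lower byte lane.
        return (True, False) if address % 2 == 0 else (False, True)
    raise ValueError("size must be 1 or 2")

print(strobes(0x1000, 1))  # byte at even address
print(strobes(0x1001, 1))  # byte at odd address
print(strobes(0x1000, 2))  # aligned word
```

The memory then only latches the byte lanes whose strobe is active, so a byte write never touches the other half of the word.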
Re: AtariAge "CPU comparison"
by on (#173414)
But it cannot access an odd address... :(
Re: AtariAge "CPU comparison"
by on (#173423)
It cannot access a 16-bit word or 32-bit long at an odd address, but it does let byte addresses be unaligned.

As near as I can tell, basically every >8-bit cpu except x86 does the same (MIPS, ARM, POWER), and you can opt in to "fault instead of be slow" on x86: https://en.wikipedia.org/wiki/Bus_error#Example
Re: AtariAge "CPU comparison"
by on (#173425)
93143 wrote:

Wait... it seems to me that if you were to demux the bank byte and use the full cycle for data, you could read at twice the speed (writes would be delayed half a cycle because the data's not there yet). Could the ROM in SA-1 games actually be ordinary SlowROM? Could the SNES have been designed so that the 120 ns FastROM spec was sufficient for 14.3 MHz? Or am I missing something, and it's already reading as fast as it can?

I also wonder whether, if they had got rid of the multiplexing to begin with, they could've got the write data earlier in the cycle.
Re: AtariAge "CPU comparison"
by on (#173426)
93143 wrote:
byuu wrote:
The SNES could have been a beast had they included a NEC uPD7725 with program and data RAM (for per-game upload of firmware) instead of ROM.

I've looked up the datasheet for the μPD77C/P25, and now I'm wondering why the DSP-1 took so long to do stuff. According to the datasheet a 16x16 signed multiply is just one of several things that can all happen in one cycle, but the SNES manual lists that same multiply as taking 26 cycles. The datasheet says 2.58 μs for a sin/cos, but the SNES manual says 7.8 μs at about the same clock speed. Is there really that much overhead involved in getting this chip to do something on demand?


Yes, yes there is. The uPD7725 has no way of implementing anything like a jump table. A significant portion of the program ROM in the DSP-1 is dedicated to command decoding, which has to be done with a tree of test-and-branches on each bit of the command in turn. That's why the DSP-1 has so many mirrored commands, and the more important commands have more mirrors than the less important ones.

ETA: Here's what command decoding in the DSP-1B looks like (comments added, obviously). You can count for yourself how many cycles it takes to decode each command.

Code:
000: 97c000 jrqm   000
001: c10007 ld     0400,sr
002: c02006 ld     0080,dr
003: c03002 ld     00c0,b
004: 97c010 jrqm   004
005: 128081 mov    dr,a
            and    dr,b
006: 91800c jnzb   003
007: c00007 ld     0000,sr
008: 0b0000 shr1   a    ; xxxxxxx*
009: 9040a0 jca    028  ; xxxxxxx1
00a: 0b0000 shr1   a    ; xxxxxx*0
00b: 904080 jca    020  ; xxxxxx10
00c: 0b0000 shr1   a    ; xxxxx*00
00d: 904060 jca    018  ; xxxxx100
00e: 0b0000 shr1   a    ; xxxx*000
00f: 90404c jca    013  ; xxxx1000
010: 0b0000 shr1   a    ; xxx*0000
011: 9006b0 jnca   1ac  ; xxx00000 0x00, 0x20
012: 9046cc jca    1b3  ; xxx10000 0x10, 0x30
013: 0b0000 shr1   a    ; xxx*1000
014: 904784 jca    1e1  ; xxx11000 0x18, 0x38
015: 0b0000 shr1   a    ; xx*01000
016: 900740 jnca   1d0  ; xx001000 0x08
017: 9046f4 jca    1bd  ; xx101000 0x28
018: 0b0000 shr1   a    ; xxxx*100
019: 904074 jca    01d  ; xxxx1100
01a: 0b0000 shr1   a    ; xxx*0100
01b: 9007d0 jnca   1f4  ; xxx00100 0x04, 0x24
01c: 904800 jca    200  ; xxx10100 0x14, 0x34
01d: 0b0000 shr1   a    ; xxx*1100
01e: 9008fc jnca   23f  ; xxx01100 0x0c, 0x2c
01f: 904940 jca    250  ; xxx11100 0x1c, 0x3c
020: 0b0000 shr1   a    ; xxxxx*10
021: 904094 jca    025  ; xxxxx110
022: 0b0000 shr1   a    ; xxxx*010
023: 9009ec jnca   27b  ; xxxx0010 0x02, 0x12, 0x22, 0x32
024: 904d80 jca    360  ; xxxx1010 0x0a, 0x1a, 0x2a, 0x3a
025: 0b0000 shr1   a    ; xxxx*110
026: 900e8c jnca   3a3  ; xxxx0110 0x06, 0x16, 0x26, 0x36
027: 905068 jca    41a  ; xxxx1110 0x0e, 0x1e, 0x2e, 0x3e
028: 0b0000 shr1   a    ; xxxxxx*1
029: 9040b8 jca    02e  ; xxxxxx11
02a: 0b0000 shr1   a    ; xxxxx*01
02b: 0b0000 shr1   a    ; xxxx*x01
02c: 901120 jnca   448  ; xxxx0x01
02d: 905224 jca    489  ; xxxx1x01
02e: 0b0000 shr1   a    ; xxxxx*11
02f: 9040cc jca    033  ; xxxxx111
030: 0b0000 shr1   a    ; xxxx*011
031: 90128c jnca   4a3  ; xxxx0011
032: 9052f4 jca    4bd  ; xxxx1011
033: 0b0000 shr1   a    ; xxxx*111
034: 0b0000 shr1   a    ; xxx*x111
035: 9053a8 jca    4ea  ; xxx1x111 0x17, 0x1f, 0x37, 0x3f
036: 0b0000 shr1   a    ; xx*0x111
037: 90133c jnca   4cf  ; xx00x111 0x07, 0x0f
038: 9053c0 jca    4f0  ; xx10x111 0x27, 0x2f
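
If you'd rather not count by hand, here is a rough sketch that walks the decode tree from address 008 onward. It is not an emulator: it assumes shr1 shifts the accumulator right and puts the shifted-out bit in carry, that jca jumps when carry is set and jnca when it is clear, and it counts instructions executed rather than making any claim about exact cycle timing.

```python
# Transcription of addresses 008-038 from the listing above.
LISTING = """
008 shr1
009 jca 028
00a shr1
00b jca 020
00c shr1
00d jca 018
00e shr1
00f jca 013
010 shr1
011 jnca 1ac
012 jca 1b3
013 shr1
014 jca 1e1
015 shr1
016 jnca 1d0
017 jca 1bd
018 shr1
019 jca 01d
01a shr1
01b jnca 1f4
01c jca 200
01d shr1
01e jnca 23f
01f jca 250
020 shr1
021 jca 025
022 shr1
023 jnca 27b
024 jca 360
025 shr1
026 jnca 3a3
027 jca 41a
028 shr1
029 jca 02e
02a shr1
02b shr1
02c jnca 448
02d jca 489
02e shr1
02f jca 033
030 shr1
031 jnca 4a3
032 jca 4bd
033 shr1
034 shr1
035 jca 4ea
036 shr1
037 jnca 4cf
038 jca 4f0
"""

def load(listing):
    """Parse 'addr op [target]' lines into {addr: (op, target)}."""
    program = {}
    for line in listing.split("\n"):
        fields = line.split()
        if not fields:
            continue
        target = int(fields[2], 16) if len(fields) > 2 else None
        program[int(fields[0], 16)] = (fields[1], target)
    return program

PROGRAM = load(LISTING)

def decode(command, entry=0x008):
    """Return (handler address, instructions executed) for a command byte."""
    a, carry, pc, count = command, 0, entry, 0
    while pc in PROGRAM:  # stop once we jump out of the decode tree
        op, target = PROGRAM[pc]
        count += 1
        if op == "shr1":
            carry, a = a & 1, a >> 1
            pc += 1
        elif (op == "jca") == bool(carry):  # jca on carry, jnca on no carry
            pc = target
        else:
            pc += 1
    return pc, count

for cmd in (0x00, 0x01, 0x08, 0x10):
    handler, steps = decode(cmd)
    print(f"command {cmd:#04x} -> handler {handler:03x} after {steps} instructions")
```

Under those assumptions command 0x00 reaches its handler at 1ac after 10 instructions and 0x01 reaches 448 after 7, on top of the polling loop at 000-007.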
Re: AtariAge "CPU comparison"
by on (#173462)
Sik wrote:
93143 wrote:
You mean writing each byte separately, in parallel but staggered by half a memory cycle? With memory constructed as a pair of 8-bit units instead of one 16-bit unit?

The 68000 has a 16-bit bus but can do byte accesses. What it does is have a lower strobe and an upper strobe indicating which bytes it wants to access (lower strobe for low byte, upper strobe for high byte, both strobes for word). Those are two more signals (much like e.g. the address lines)

So the suggestion was to use strobes to let hardware know whether a byte or a word access is wanted =P

Yeah, I got that. I googled it, and that's where I found the bit about dual 8-bit memories - apparently it's a popular setup for a 68000, and obviously has no trouble taking a half-word write without clobbering the other half. The SNES uses this setup for VRAM.

But it seems to me that there are additional considerations with the 65816. It uses an 8-bit bus on the CPU side, so technically all memory accesses are 8-bit. So if you want the doubled speed from using 16-bit memory, it would be faster for the memory controller to stagger writes by one CPU cycle, or half a memory cycle, so as to use each byte as soon as it comes through rather than waiting until the whole word is ready. Getting any smarter than that seems like it would require CPU emulation in the glue logic.

AWJ wrote:
The uPD7725 has no way of implementing anything like a jump table.

Why not map an input register to the program space? Check for a new command and branch to the instruction register, which contains a jump to the desired program.

I shouldn't spend any time studying this chip right now. Busy busy...
Re: AtariAge "CPU comparison"
by on (#173463)
I think the point was that you couldn't do indirect jumps at all (i.e. address has to be hardcoded in the opcode), otherwise you could just copy the address from a table into a register and jump there. I guess self-modifying code wasn't an option either, right?
Re: AtariAge "CPU comparison"
by on (#173465)
That's what I meant. The uPD7725 has no indirect or computed jumps, only absolute. And it can't execute out of RAM; it's a pure Harvard architecture with program ROM, data ROM and data RAM as completely separate address spaces.
Re: AtariAge "CPU comparison"
by on (#173467)
No, I meant map an input port into the DSP's program ROM space, to allow the S-CPU to write in an absolute jump instruction (or the relevant part of one, the rest being ordinary ROM). The thing can at least do conditional jumps, so a normal input register could be checked for a 'start' command and the result used to branch to the mapped address. Or, better yet, you could map the input port directly to the address section of the conditional jump and eliminate the extra instruction...

Even a single externally writable byte in the program memory would be enough to implement a jump table, and two bytes would make a jump table unnecessary by allowing a direct jump to anywhere in the ROM (though that would require the S-CPU to write a 16-bit value, and S-CPU cycles might be more precious than DSP cycles).

I think there's an SPC loading program that does something similar (but far more timing-sensitive), having the S-SMP jump to the I/O ports so the S-CPU can operate it like a puppet to load the last few bytes of the RAM image...

I'm not sure exactly how tightly integrated the DSP and its program memory are in a hardware sense, or how easy it would be to insert this level of customization... though it strikes me that byuu's program RAM idea would trivially allow this sort of stunt if you used dual-ported memory...