BG2 in EXTBG is literally the same data as BG1. No extra reading required.
Oh, I get it now. Wouldn't that mean that you can only use 128 colors, or is it like BG 1 takes colors 0-127, and BG2 takes colors 128-255? Anyway though, couldn't they have implemented this just as easily on any other mode? The Irem M92 (and I think even the M72) has something almost exactly like this where the backgrounds can have half of the colors be higher priority. In the Hunt showcases this a lot, like where there is water that covers the submarine and the rest of the BG layer. Also, the underwater buildings on stage 2.
93143 wrote:
Long story short: you may not want to mess with it.
It seems like there are a lot of weird tricks I don't yet know about. I'm guessing this would be way too slow for trying to change the color of every pixel to create a 16bpp bitmap?
93143 wrote:
禁則事項です。
Google Translate?
(Nothing ever translates that well unless it was written in it, and even then, it can have trouble translating its own translated message.)
93143 wrote:
...okay, no, that's not true. I'm Canadian.
Enjoying your "froyo"?
(Seriously though, this website is home to the largest number of Canadians I've seen yet.)
93143 wrote:
Does that make me a hipster for saying Mega Drive instead of Genesis?
I'm assuming it was called the Genesis there also? Wasn't there some sort of copyright or something that prevented Sega from using the name Mega Drive? Kind of reminds me of how Star Fox was named Starwing in PAL regions due to some crappy Atari 2600 game.
Espozo wrote:
93143 wrote:
The nice part about the VRAM gate is that it's got one 8-bit register for each VRAM chip (at least, I'm pretty sure that's how the split works), and you can set it to increment the word address (by 1, 32, or 128) with a write to either one. This means that for Mode 7, you can use a single DMA transfer to update either the graphics data or the tilemap without touching the other one, meaning you don't have to store them interleaved in ROM, and you can render to Mode 7 (like Wolfenstein 3D) without worrying about skipping the tilemap bytes.
Cool. So is one entry in mode 7 tilemap really just 8 character bits?
Yes, in Mode 7 each "even" byte of VRAM is one 8-bit tilemap entry without any attribute bits, and each "odd" byte is one 8-bit pixel. Mode 3/4 also has 8bpp tiles but they're regular planar SNES tiles, and not interleaved with the tilemap.
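A minimal sketch of that interleaving, in Python purely for illustration. It assumes you have a raw byte dump of the Mode 7 VRAM region; the function name `split_mode7_vram` and the sample bytes are made up for the example.

```python
def split_mode7_vram(vram: bytes):
    """Split an interleaved Mode 7 VRAM dump into (tilemap, pixels).

    Assumes the layout described above: even byte addresses hold 8-bit
    tilemap entries, odd byte addresses hold 8-bit pixels.
    """
    tilemap = vram[0::2]  # even bytes: tile numbers, no attribute bits
    pixels = vram[1::2]   # odd bytes: one 8bpp pixel each
    return tilemap, pixels


# Tiny made-up dump: two 16-bit words of VRAM.
dump = bytes([0x01, 0xAA, 0x02, 0xBB])
tilemap, pixels = split_mode7_vram(dump)
print(tilemap.hex(), pixels.hex())
```

This is the same split the VRAM gate gives you in hardware: writing only the low or only the high byte of each word touches one half and leaves the other alone.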
Quote:
Quote:
adds a second BG layer to Mode 7, and this BG2 is exactly the same as the BG1 (scroll, matrix, etc.) except that the top bit in the colour index for each pixel is considered a priority bit instead of a colour bit.
Would it have been any more difficult for them to have just made the second BG layer a normal BG layer? Also, isn't the second BG in EXTBG just like a regular 8bpp BG in how the graphics and the tilemap are laid out? I would have thought all of this would be too much for how fast the PPUs can read from VRAM.
The EXTBG layer is completely "parasitic": it's a different interpretation of the same Mode 7 pixel data that the PPU is already fetching for BG1. Each of the VRAM data lines is literally connected to two different pins on PPU2.
The originally intended purpose of the EXTBG pins was almost certainly to mix in output from a third PPU of some sort (e.g. from a NES-compatible PPU, if backward compatibility was planned for the SNES at some point). When whatever that "PPU3" would have been was dropped from the design, some engineer evidently realized those pins could still be used to add a bit of extra functionality to Mode 7 at no hardware cost.
Quote:
The video hardware in just about any other 2D system I can think of is way more straightforward.
Arcade hardware tends to be simpler (meaning more fixed-function, not less powerful) than console hardware because for arcade games with different needs the manufacturer would use completely different hardware. Until the late 1990s (when hardware directly derived from consoles and PCs started taking over) no arcade manufacturer would use the same video chipset for a 2D shoot-em-up as for a behind-the-car driving game.
The Genesis video hardware does have "modes" if you count the Master System backward-compatibility ones. I won't dispute that the SNES is more complicated than the Genesis or the PC Engine (TurboGrafx) but if you want to see a crazy-complicated "2D" video system take a look at the Saturn. It makes the SNES look very simple indeed.
The GBA video hardware is actually quite a bit simpler than the SNES. Just to name two basic things, there are only two tile formats (8x8 4bpp and 8bpp) and the VRAM layout is much more hardwired than on the SNES. Most of the complexity of the SNES (and the Saturn) comes from the desire to squeeze every drop of use out of very limited RAM capacity and bandwidth. You have so many different tile formats because there's enough bandwidth to fetch and draw two 4bpp layers and a 2bpp layer, but not quite enough for three 4bpp layers. You can put anything (except Mode 7) anywhere in VRAM so that you never leave any VRAM unused, no matter what the balance of sprite to background graphics in your game is. You support 16x16 tiles because it reduces the amount of tilemap data the CPU has to create in WRAM and upload to VRAM by three quarters, but you also support 8x8 tiles because most games do need a score/HUD display (run Dogyuun or Knuckle Bash or Batsugun in MAME and take a look at how they displayed their HUDs using hardware that only supported 16x16 tiles; it's hilariously wasteful)
The PSX video hardware is also pretty damn simple, though it's fundamentally different from everything else discussed here (a blitter/rasterizer rather than sprites'n'tilemaps). The primary reason its emulation is such a black art is that nobody wants to play those first-generation 3D games in their original resolutions. Imagine that that HDNES abomination was the bare minimum that end-users would accept and you have the PSX emulation scene.
Espozo wrote:
93143 wrote:
BG2 in EXTBG is literally the same data as BG1. No extra reading required.
Oh, I get it now. Wouldn't that mean that you can only use 128 colors, or is it like BG 1 takes colors 0-127, and BG2 takes colors 128-255? Anyway though, couldn't they have implemented this just as easily on any other mode? The Irem M92 (and I think even the M72) has something almost exactly like this where the backgrounds can have half of the colors be higher priority. In the Hunt showcases this a lot, like where there is water that covers the submarine and the rest of the BG layer. Also, the underwater buildings on stage 2.
BG1 uses colors 0-255, BG2 uses colors 0-127, sprites use colors 128-255 (like they always do). Nothing is ever wasted on the SNES (except that one VRAM cycle in Mode 6).
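To make the two readings of the same byte concrete, here is a small sketch in Python (illustration only; the function name and dictionary keys are hypothetical). It assumes what is described in this thread: BG1 uses all 8 bits as a colour index, while BG2 treats the top bit as priority and the low 7 bits as the colour index.

```python
def decode_mode7_extbg(pixel: int):
    """Interpret one 8-bit Mode 7 pixel the way each EXTBG layer sees it."""
    if not 0 <= pixel <= 0xFF:
        raise ValueError("Mode 7 pixels are 8 bits wide")
    return {
        "bg1_colour": pixel,         # BG1: all 8 bits, colours 0-255
        "bg2_colour": pixel & 0x7F,  # BG2: low 7 bits, colours 0-127
        "bg2_priority": pixel >> 7,  # BG2: top bit selects priority
    }


print(decode_mode7_extbg(0x85))
```

So a pixel value of 0x85 would be colour 133 on BG1, and colour 5 with the priority bit set on BG2.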
For the regular BG modes Nintendo went with priority-per-tile rather than priority-per-pixel. The former is probably easier to implement in hardware than the latter when you have variable bitdepths like the SNES does, and probably also easier for artists to use. But both of them are luxury features; a lot of arcade hardware can't do either, and I can't think of any hardware that does both at once (though I'm sure it exists...)
AWJ wrote:
The EXTBG layer is completely "parasitic": it's a different interpretation of the same Mode 7 pixel data that the PPU is already fetching for BG1. Each of the VRAM data lines is literally connected to two different pins on PPU2.
I'm almost not really sure why it's called an extra BG when it doesn't really seem like it's enough to be considered one. If we're counting that as an extra BG, we could say the Irem M92 has 6 BGs.
AWJ wrote:
Arcade hardware tends to be simpler (meaning more fixed-function, not less powerful) than console hardware because for arcade games with different needs the manufacturer would use completely different hardware.
Well, maybe with Sega at least. I swear, it looks like they were developing for 8 different arcade machines at any given moment. If you look at a company that didn't feel like having 100 different arcade boards like Capcom, SNK, or Irem, they have many different games running on the same hardware. Then there's people like Konami, who I heard pretty much made a new arcade board for every game.
AWJ wrote:
run Dogyuun or Knuckle Bash or Batsugun in MAME and take a look at how they displayed their HUDs using hardware that only supported 16x16 tiles; it's hilariously wasteful
How do they do it? I've seen games use 16x16 sprites for one letter or number, even on the SNES. (Rendering Ranger R2, which I would have thought they could have gotten away with just updating the tiles instead)
AWJ wrote:
The PSX video hardware is also pretty damn simple, though it's fundamentally different from everything else discussed here (a blitter/rasterizer rather than sprites'n'tilemaps). The primary reason its emulation is such a black art is that nobody wants to play those first-generation 3D games in their original resolutions. Imagine that that HDNES abomination was the bare minimum that end-users would accept and you have the PSX emulation scene
I actually today felt like being a pirate and downloading Goldeneye because I was playing it at my father's house in Virginia during the summer on the N64 there, and it looked really weird, like how smooth everything looked contrasted with the rest of everything. This is kind of random, but let me tell you, the control stick options in Project 64 are terrible, or at least they were with my wired Xbox 360 controller. It's weird, because once you moved the control stick past halfway in either direction, it wouldn't increase the speed. I wish more shooters were like Doom, where you can hold down a run button for turning really quickly and let go of it for accurate aiming, because in every other game I've played with adjustable sensitivity, I've had to try and balance between the two because the control stick sensitivity is god awful. (I got a Wii U for Splatoon a couple of days ago and as fun as I find the game to be, the controls are some of the worst I've ever experienced. I literally turned the sensitivity as low as it goes.) People need to look at Super Monkey Ball or something for a good reference on how to properly implement analog controls.
Edit: New post incoming to stop my rant.
Quote:
Nothing is ever wasted on the SNES (except that one VRAM cycle in Mode 6)
I think Mode 6 wins the award for most useless SNES graphical mode. I mean, I can't even think of one thing that uses it. I would have rather seen a 512x448 8bpp mode, even if that could only fill 1/4 of the screen with unique tiles. It could look incredibly awesome for a title screen if it's fine to have repeating tiles.
Quote:
But both of them are luxury features; a lot of arcade hardware can't do either, and I can't think of any hardware that does both at once (though I'm sure it exists...)
Irem M92.
Edit: Is it me, or is Wikipedia (unsurprisingly) off on this?
Quote:
RAM is accessed at 3.072 MHz
Isn't RAM always accessed at SlowROM speed (2.56MHz?) while ROM can be accessed at either SlowROM or FastROM (3.58MHz?) speed depending on the cartridge?
Also, I'm assuming the Wikipedia article was written by someone over at Sega 16?
Quote:
higher bit-rate competition such as the Sega Genesis.
Quote:
As part of the overall plan for the SNES, rather than include an expensive CPU that would still become obsolete in a few years, the hardware designers made it easy to interface special coprocessor chips to the console
Espozo wrote:
It seems like there are a lot of weird tricks I don't yet know about.
This one is rare. As far as I know, mid-scanline BGMODE switching has been done only three times in the history of the SNES - once in a cracktro for Bubsy 2 done by Anthrox in 1994, once in a menu test screen for the RPG Ramsis is working on, and once (well, a few times) in my preliminary work on that shmup port I've been poking at.
Quote:
I'm guessing this would be way too slow for trying to change the color of every pixel to create a 16bpp bitmap?
You can write one 16-bit word (corresponding to a 15-bit RGB triplet plus one dummy bit) every four dots with DMA to CGRAM, except during DRAM refresh right in the middle of every scanline where the CPU pauses for 10 dots. To my knowledge this has not been tested and may not work, and even if it does the DRAM refresh will ensure it doesn't work well. Though I suppose you could rewrite the colours for that area during HBlank...
You can do it on the Mega Drive, though - since DMA to CRAM is 16-bit, you can write one colour every two pixels, which looks half-decent. And there's no DRAM refresh region, so the whole screen is accessible. This technique is called "FantomBitmap" and was developed fairly recently.
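For a rough sense of what those write rates mean for horizontal colour resolution, here is a back-of-the-envelope sketch in Python. The write intervals (4 dots on the SNES, 2 pixels on the Mega Drive) and the 10-dot refresh pause come from the posts above; the visible widths of 256 and 320 pixels and the function name are assumptions added for the example.

```python
def colour_columns(visible_pixels: int, pixels_per_write: int,
                   blocked_pixels: int = 0) -> int:
    """How many distinct colour writes fit across one visible scanline."""
    return (visible_pixels - blocked_pixels) // pixels_per_write


# SNES: one CGRAM word per 4 dots, minus the 10-dot DRAM refresh pause.
snes = colour_columns(256, 4, blocked_pixels=10)
# Mega Drive: one CRAM word per 2 pixels, no refresh gap (assumes 320-wide mode).
mega_drive = colour_columns(320, 2)
print(snes, mega_drive)
```

This only counts write slots; it says nothing about whether the SNES trick actually works on hardware, which the post above flags as untested.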
Quote:
Isn't ram always accessed at slowrom speed (2.56MHz?)
That's 2.68 MHz. 21.477 MHz master clock divided by 8. (Or 21.281 MHz for PAL, resulting in 2.66 MHz slow access.) Mind you, nothing happens during DRAM refresh, so globally FastROM and SlowROM are more like 3.47 and 2.61 MHz (3.44 and 2.58 for PAL).
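The arithmetic behind those figures can be sketched as follows (Python, illustration only). The master clocks and the divide-by-8 slow access are from the post above; the divide-by-6 fast access and the 10-dot refresh pause are taken from elsewhere in this thread. The 1364 master clocks per scanline and 4 master clocks per dot are assumptions added here, and all names are hypothetical.

```python
MASTER_NTSC_MHZ = 21.477
MASTER_PAL_MHZ = 21.281
SLOW_DIVIDER = 8         # master clocks per slow access
FAST_DIVIDER = 6         # master clocks per fast access
LINE_CLOCKS = 1364       # assumption: master clocks per scanline
REFRESH_CLOCKS = 10 * 4  # 10 dots of DRAM refresh at 4 master clocks per dot


def access_rates(master_mhz: float) -> dict:
    """Nominal and refresh-adjusted access rates in MHz."""
    duty = (LINE_CLOCKS - REFRESH_CLOCKS) / LINE_CLOCKS
    slow = master_mhz / SLOW_DIVIDER
    fast = master_mhz / FAST_DIVIDER
    return {
        "slow": slow,
        "fast": fast,
        "slow_effective": slow * duty,
        "fast_effective": fast * duty,
    }


for name, clock in (("NTSC", MASTER_NTSC_MHZ), ("PAL", MASTER_PAL_MHZ)):
    rates = access_rates(clock)
    print(name, {key: round(value, 2) for key, value in rates.items()})
```

Under those assumptions the rounded results line up with the numbers quoted above (2.68 / 3.47 / 2.61 for NTSC, 2.66 / 3.44 / 2.58 for PAL).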
If you happened to have some SRAM in the cartridge mapped/mirrored in the FastROM region, you could access it at high speed...
Quote:
93143 wrote:
禁則事項です。
Google Translate?
Anime reference.
Quote:
Enjoying your "froyo"?
I have honestly never seen that word before. Checking their website, it seems the nearest location is about 3 hours away by car...
Quote:
I'm assuming it was called the Genesis there also?
Yeah, Canada tends to get lumped in with the U.S. in these matters, because it's culturally similar and has comparable income levels, a huge open border, and 1/9 of the population (mostly living within a few hours of said border), so it isn't worth it to treat it as a whole separate region. (Regarding video games specifically, it doesn't hurt that Canada is an NTSC territory with the same power outlet standards as the U.S., so hardware that works there works here.)
But there are still weird little differences. Ever seen these?
93143 wrote:
This one is rare. As far as I know, mid-scanline BGMODE switching has been done only three times in the history of the SNES - once in a cracktro for Bubsy 2 done by Anthrox in 1994, once in a menu test screen for the RPG Ramsis is working on, and once (well, a few times) in my preliminary work on that shmup port I've been poking at.
Wow, of all things, Bubsy 2... You know, this technique seems really pretty useful. Couldn't you use it for column scrolling in non supported modes by changing BG layer Y coordinates during the scanline? Heck, you know, if this is something that has to be changed every scanline, couldn't you actually have it to where there's a diagonal window of Mode 1 to 3 or something?
93143 wrote:
If you happened to have some SRAM in the cartridge mapped/mirrored in the FastROM region, you could access it at high speed...
93143 wrote:
Anime reference.
Yeah, I wouldn't have gotten that...
93143 wrote:
I have honestly never seen that word before. Checking their website, it seems the nearest location is about 3 hours away by car...
Apparently, it's a Canadian-only frozen yogurt chain. "Froyo" is the "cool" way of saying frozen yogurt. Dang though, talk about product placement. I wouldn't be surprised if Nintendo owned Yogurty's (which has only recently become known because of them):
https://twitter.com/yogurtys
But the fun doesn't stop there...
https://www.facebook.com/yogenfruz (yes, this is an entirely different frozen yogurt chain.)
93143 wrote:
But there are still weird little differences. Ever seen these?
M&M's in a Smarties container?
Espozo wrote:
Wow, of all things, Bubsy 2...
Not the game itself. A cracktro (or so it appears) inserted by a team called Anthrox.
Behold! This technique will be possible in higan v095.
Quote:
You know, this technique seems really pretty useful. Couldn't you use it for column scrolling in non supported modes by changing BG layer Y coordinates during the scanline?
Maybe, but you'd have to line up the writes pretty precisely so they'd happen where you wanted them, and you'd sacrifice a lot of compute time doing it. There's a reason only one game (if I'm not mistaken) ever did that...
Quote:
Heck, you know, if this is something that has to be changed every scanline, couldn't you actually have it to where there's a diagonal window of Mode 1 to 3 or something?
Like I said, it eats a lot of real estate, as well as glitching the screen. (Not sure if Mode 1 -> Mode 3 causes glitching, but Mode 5 -> Mode 1 does, and Mode 7 -> anything causes a lot of it.) My game has a 32-pixel-wide column of sprites dedicated to masking the garbage that results from the mode change, and it only looks okay because all the writes fall in a dark area of the sprite mask so you can't see the remaining glitch pixels very well.
And the only reason that's true is that I used timed code in the interrupt to check the H-position and stagger-step in order to line up the writes tightly enough within a single 8-dot column to avoid visible glitching when the main code is using instructions as long as 7 cycles (I was aiming for 8, but there was some slight glitching outside the target column). After this, the interrupt zeroes BG1's scroll values, changes BGMODE (+ MOSAIC, since I'm using 16-bit writes) and CGADSUB + COLDATA, and does a decrement+branch on a line counter in WRAM so I can do a couple of things I ran out of HDMA channels for (one of which probably shouldn't be done with HDMA anyway).
This interrupt alone eats nearly half of the available compute time during the playfield display. Adding in my heavily loaded HDMA schedule bumps it to around two-thirds. Fortunately this happens to be a Super FX game, so if I find I'm running out of CPU time, the GSU can pick up the slack...
You might not need the full functionality of my IRQ, but keep in mind that the IRQ timer registers are internal to the CPU, so you can't write them with HDMA. This means you'd have to add that functionality to the interrupt if you wanted to move the trigger point every scanline.
Alternately, you could use timed code over the whole frame instead of an IRQ. Again, it's harder than it sounds even without trying to get any useful work done during the frame - or maybe that's just because I was running HDMA at the same time; I never did figure out what the problem was...
Quote:
https://www.facebook.com/yogenfruz (yes, this is an entirely different frozen yogurt chain.)
Now that name I know - our local Cineplex has one.
Quote:
mnm's in a smarties container?
Those are Smarties. And no, they taste nothing at all like M&M's, or as little like them as they possibly can while being basically the same thing...
93143 wrote:
Behold! This technique will be possible in higan v095.
Wow...
93143 wrote:
Maybe, but you'd have to line up the writes pretty precisely so they'd happen where you wanted them, and you'd sacrifice a lot of compute time doing it. There's a reason only one game (if I'm not mistaken) ever did that...
By wasting computing time, I assume you mean like checking how much processing has been done to know when to send the information, because it needs to be timed perfectly? Wait, didn't you say you could create an interrupt for this? What would be so bad processing-wise, then?
There are honestly dozens of possibilities for this, even if you do need an extra chip.
93143 wrote:
Now that name I know - our local Cineplex has one.
Did you ever try the nasty-looking product placement yogurt?
(Or should I say "froyo"?)
Espozo wrote:
93143 wrote:
This one is rare. As far as I know, mid-scanline BGMODE switching has been done only three times in the history of the SNES - once in a cracktro for Bubsy 2 done by Anthrox in 1994, once in a menu test screen for the RPG Ramsis is working on, and once (well, a few times) in my preliminary work on that shmup port I've been poking at.
Wow, of all things, Bubsy 2... You know, this technique seems really pretty useful. Couldn't you use it for column scrolling in non supported modes by changing BG layer Y coordinates during the scanline? Heck, you know, if this is something that has to be changed every scanline, couldn't you actually have it to where there's a diagonal window of Mode 1 to 3 or something?
No, mid-scanline raster effects really aren't very practical at all on the SNES which is why they're so rare. If you want an angled split screen like the DBZ fighting games, or a side-HUD like every console port of a shmup that originally ran on a vertical monitor, then using the clip windows or one of the offset-per-tile modes is much easier.
The problem with doing mid-scanline effects on the SNES is that a single CPU cycle takes from 1.5 to 2 pixels worth of time to execute, and the shortest instruction on the 65816 takes two CPU cycles, so the most precise positioning possible is about 1 tile, and even that requires tricky cycle counting which prevents you from doing much of anything else with the CPU at the same time (like, say, gameplay calculations). Furthermore, mode changes in particular produce a very large and visible stripe of garbage on the screen that you have to cover up with sprites.
Air Strike Patrol does exactly what you're suggesting, changing a Y scroll register in mid-scanline to make the word "READY" rotate as if the letters were on the faces of a row of cubes. They work around the precision issue by doing the effect on a BG layer that's mostly blank/transparent (it contains the HUD at the left edge of the screen, the "READY" in the middle, and nothing in between) and the CPU usage is irrelevant because the effect is displayed at the start of a stage before you actually start playing (also, the effect only spans a few scanlines, not the entire height of the screen)
AWJ wrote:
a single CPU cycle takes from 1.5 to 2 pixels worth of time to execute
How is it "1.5 to 2"? Don't all cycles take the exact same amount of time? Or is it really like all cycles take 1.75 pixels or something? Also though, if you're using a fast expansion chip, you shouldn't have to worry about this problem, should you?
Espozo wrote:
AWJ wrote:
a single CPU cycle takes from 1.5 to 2 pixels worth of time to execute
How is it "1.5 to 2"? Don't all cycles take the exact same amount of time? Or is it really like all cycles take 1.75 pixels or something? Also though, if you're using a fast expansion chip, you shouldn't have to worry about this problem, should you?
No, all cycles don't take the same amount of time on the SNES. A fast cycle (accessing FastROM or MMIO) is 1.5 pixels (6 master clocks), a slow cycle (accessing SlowROM or WRAM) is 2 pixels (8 master clocks).
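A quick sketch of what those two cycle lengths add up to, in Python for illustration only. It uses the 6 and 8 master clock figures above and the implied 4 master clocks per pixel; the function names are hypothetical, and it is pure arithmetic that ignores DRAM refresh, HDMA and interrupt jitter.

```python
FAST_CYCLE_CLOCKS = 6  # FastROM or MMIO access
SLOW_CYCLE_CLOCKS = 8  # SlowROM or WRAM access
CLOCKS_PER_DOT = 4     # master clocks per pixel


def cycles_to_dots(fast: int, slow: int) -> float:
    """Duration in dots of a run of fast and slow CPU cycles."""
    clocks = fast * FAST_CYCLE_CLOCKS + slow * SLOW_CYCLE_CLOCKS
    return clocks / CLOCKS_PER_DOT


def reachable_delays(max_dots: float) -> list:
    """Every delay (in dots) that some mix of fast and slow cycles can produce."""
    limit = int(max_dots * CLOCKS_PER_DOT)
    found = set()
    for fast in range(limit // FAST_CYCLE_CLOCKS + 1):
        for slow in range(limit // SLOW_CYCLE_CLOCKS + 1):
            clocks = fast * FAST_CYCLE_CLOCKS + slow * SLOW_CYCLE_CLOCKS
            if 0 < clocks <= limit:
                found.add(clocks / CLOCKS_PER_DOT)
    return sorted(found)


print(cycles_to_dots(2, 0), cycles_to_dots(0, 2))  # shortest two-cycle instruction
print(reachable_delays(5))
```

The shortest two-cycle instruction lands between 3 and 4 dots, and mixing cycle types only ever gives half-dot steps, with some small values (such as 2.5 dots) not reachable at all.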
Expansion chips in a cartridge can't directly write to the PPU (or to anything else on the console motherboard). Only the S-CPU can.
AWJ wrote:
No, all cycles don't take the same amount of time on the SNES. A fast cycle (accessing FastROM or MMIO) is 1.5 pixels (6 master clocks), a slow cycle (accessing SlowROM or WRAM) is 2 pixels (8 master clocks).
You know, what happens if you try to send this data on a "half pixel"? (will it count as 0 or 1 or something else?) I imagine that if you are using FastROM along with WRAM and you're wanting to just have a precise split down the screen or something, you could do a mixture of both 1.5 pixel and 2 pixel cycles to perfectly make any number over 1 (which I'm not sure why you'd do it then anyway).
AWJ wrote:
Expansion chips in a cartridge can't directly write to the PPU (or to anything else on the console motherboard). Only the S-CPU can.
I know, unfortunately. I'm still not sure why they didn't add a way, considering they didn't seem to have a problem with adding an expansion port on the bottom of the system that pretty much went unused.
Espozo wrote:
Is it me, or is Wikipedia (unsurprisingly) off on this?
Quote:
RAM is accessed at 3.072 MHz
Isn't RAM always accessed at SlowROM speed (2.56MHz?) while ROM can be accessed at either SlowROM or FastROM (3.58MHz?) speed depending on the cartridge?
Also, I'm assuming the Wikipedia article was written by someone over at Sega 16?
3.072 is right on for the speed of RAM access on the APU side. The S-SMP gets one cycle out of three and the S-DSP gets the other two. It's also coincidentally an average between 3.6 MHz (fast ROM and PPU ports) and 2.7 MHz (slow RAM). Does any reasonable Super NES emulator output the fraction of cycles that are slow or fast?
Quote:
Quote:
As part of the overall plan for the SNES, rather than include an expensive CPU that would still become obsolete in a few years, the hardware designers made it easy to interface special coprocessor chips to the console
Extensibility was intentionally engineered into the NES, according to two articles published during the third year of Nintendo Power (issues 13-24; I lack exact issue numbers). The second was titled "Why Game Paks Never Forget" or the like. I remember it using an analogy that one could upgrade a car by replacing its engine.
> Does any reasonable Super NES emulator output the fraction of cycles that are slow or fast?
No, but that's an easy mod to make if you wanted to track that. Hardest part would be hooking it up to display in a window.
> Also, I'm assuming the Wikipedia article was written by someone over at Sega 16?
Speaking of that, are there any mods left there? Been waiting over a week for an account activation.
byuu wrote:
Speaking of that, are there any mods left there? Been waiting over a week for an account activation.
The first time I tried I had to wait about a month before being denied.
The second time I tried I had to wait about a week. Slow as hell, and some of those guys seem REALLY hostile towards the SNES.
I wonder whether attempting to add a hypothetical bgen to higan might help you get an account sooner.
Espozo wrote:
Wait, didn't you say you could create an interrupt for this? What would be so bad processing-wise, then?
The fastest possible software-transparent interrupt takes either 48 or 53.5 dots, depending on the current register size setting:
Code:
(irq) ; 6 slow, 2 fast
pha ; 2 or 3 slow, 1 fast
lda $4211 ; 3 slow, 1 or 2 fast
pla ; 2 or 3 slow, 2 fast
rti ; 5 slow, 2 fast
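The 48 and 53.5 dot totals can be checked by adding up the cycle counts in the listing, taking a slow cycle as 2 dots and a fast cycle as 1.5 dots as discussed earlier in the thread. This is only a tally sketch in Python; the table layout and names are made up for the example, and the 341 dots per scanline used for the percentage is an assumption.

```python
SLOW_DOTS = 2.0
FAST_DOTS = 1.5
DOTS_PER_LINE = 341  # assumption: 1364 master clocks / 4

# (step, (slow, fast) with 8-bit accumulator, (slow, fast) with 16-bit accumulator)
HANDLER = [
    ("irq entry", (6, 2), (6, 2)),
    ("pha",       (2, 1), (3, 1)),
    ("lda $4211", (3, 1), (3, 2)),
    ("pla",       (2, 2), (3, 2)),
    ("rti",       (5, 2), (5, 2)),
]


def handler_dots(wide_accumulator: bool) -> float:
    """Total length of the minimal handler in dots."""
    total = 0.0
    for _step, narrow, wide in HANDLER:
        slow, fast = wide if wide_accumulator else narrow
        total += slow * SLOW_DOTS + fast * FAST_DOTS
    return total


for wide in (False, True):
    dots = handler_dots(wide)
    print(f"16-bit A: {wide}, {dots} dots, {dots / DOTS_PER_LINE:.0%} of a line")
```

Doubling either total lands at roughly 28 to 31 percent of a scanline under that assumption.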
If you want to make it actually do anything, you'll have to add at least one rep #$30 or similar at the beginning, plus whatever you want to make it actually do, plus any extra bookkeeping necessary to keep it transparent. I figure the practical minimum for a mode switch is the BGMODE write itself plus a scroll change (my code just zeroes the scroll values), plus a change to the IRQ H-timer if you're trying to do a non-vertical split. If you're just doing a scroll change, you can skip the BGMODE load/store ops, but you're probably using nonzero values for the scroll registers. Either way, you've probably about doubled the size of the above example, and now it's eating a third of your compute time. That's for one change per scanline.
Plus, interrupt positions jitter based on the size of the instructions in the interrupted code, because the processor will finish the current instruction before servicing the IRQ, and some instructions can take a couple of tile widths to execute. If you want to line up the raster writes within a narrower range than that, you have to include additional timed code like I did, or else just never use instructions beyond a certain length in your main code. I don't think there's even enough time in a scanline to stagger-step an interrupt all the way down to a single pixel for a perfect split in the presence of normal game code; it took me the majority of a scanline to null out a much smaller variance in a test ROM...
Quote:
There are honestly dozens of possibilities for this, even if you do need an extra chip.
That's true, but it's such a massive undertaking that unless you have a specific goal that can't be done any other way (or you think you can use the garbage effect artistically), you're probably better off using windowing or offset-per-tile or some such, as AWJ says.
I'm only doing it because I really want my port to reproduce the graphical look of the original as closely as possible, and I need a Super FX anyway for unrelated reasons. It's more of an exercise in stretching the SNES to its limits than an attempt to realize a creative vision...
Quote:
Did you ever try the nasty-looking product placement yogurt?
(Or should I say "froyo"?)
No. As I said, this is the first I've heard of it, and we don't have a branch in town.
It is kinda pleasant to see Nintendo having this much success with a new property, though... I don't remember anybody offering Pikmin-themed frozen yogurt...
Quote:
I imagine that if you are using FastROM along with WRAM and you're wanting to just have a precise split down the screen or something, you could do a mixture of both 1.5 pixel and 2 pixel cycles
I tried that. It didn't work, even though I carefully aligned the first write to within a single dot.
Granted, I haven't tried it without HDMA running, so that might be what was kicking the loop around, but I only barely managed a vertical split with a grumbly edge a few pixels wide. The only perfect vertical split I ever saw only worked on some frames; other frames had a diagonal split. It's touchy. I suppose one could run a bit of alignment code every scanline to eliminate the grumbliness...
And I didn't try to run any useful code during the screen - the whole thing was a timed loop. I can imagine trying to squeeze real processing into a timed loop being a hair-pulling experience. And without that, all you've got for processing is VBlank. I don't know about you, but my VBlank is stuffed with DMA...
Oh, and there's that pesky DRAM refresh in the middle of the screen. You flat-out can't do anything at all for 10 pixels in the middle of every scanline, and on later console revisions this area moves around a bit between scanlines and (I think) between frames (possibly to reduce the visibility of the associated vertical bar artifact, which seems to be due to heavy current draw). So it's actually impossible to do a vertical mode switch in the exact centre of the screen, and a diagonal split would probably need masking to look decent even if it didn't generate any garbage (mode switching does, at least when Mode 7 is involved; scroll changes don't).
...
This isn't intended to be the final word or anything. If you figure out a good way to do it, great. Tell me about it. But from what I've found out so far, it seems apparent that CPU-based mid-scanline raster effects are somewhat less useful than they sound.
> some of those guys seem REALLY hostile towards the SNES.
That they do. Utterly bizarre to me. Both systems have their pros and cons, but I'd agree the Genesis was the superior hardware. But to me, what matters are the games. And the SNES has ~15 times the JRPGs as the Genesis.
> I wonder whether attempting to add a hypothetical bgen to higan might help you get an account sooner.
I've been contemplating it (really want to play Lunar 1&2 again on the Mega CD; would be fun to use my own emulator for that), but I really just don't have the time.
byuu wrote:
That they do. Utterly bizarre to me.
Because they're utterly bizarre.
byuu wrote:
Both systems have their pros and cons, but I'd agree the Genesis was the superior hardware.
I'm just curious, and I'm not saying you're wrong because you have x100 more knowledge of both systems than I do, but what makes you think so? I'm pretty sure you don't believe in blast processing.
I always thought the SNES was backwards in how it was constructed (mostly with sprites), but I never really saw it as being weaker. The best part to me though is when people complain about slowdown in slowrom games.
byuu wrote:
But to me, what matters are the games. And the SNES has ~15 times the JRPGs as the Genesis.
Is that supposed to be a good thing?
Well you know 93143, just thinking, I imagine one trick you could reasonably do is have an object or something made of a BG on the left side of the screen, and then have another object on the right side of the screen on the same BG layer. What you could then do is have them move differently by doing the mid scan line trick thing, and if there's garbage in between, can't you clip it out with a window layer? This could almost be good for parallax, like if in a fighting game or something where you can't move very far to the left or the right, you have something to the left that moves not very fast and something to the right that moves faster when you move, because the thing on the right is supposed to be closer to you. They shouldn't overlap each other because you can't move enough to the right for this to happen.
You know though, with all this talk about dots, is it like the whole screen is how fast the CPU operates in one frame? Compared to the video hardware, the CPU really doesn't seem all that fast, which isn't too surprising to me considering that in the pictures I've seen, the video hardware as a whole (both PPUs) is twice as big as the CPU, because each PPU is about that size. I like it that way better though, as if the PPUs were better, we wouldn't be having this problem in the first place... (I'm not saying they were bad though.) I still think Nintendo should have put a bus from the cartridge to the PPUs though, because most of the problems with trying to do fancy stuff like this are the CPU's fault, mostly because of the slow DMA.
93143 wrote:
No. As I said, this is the first I've heard of it, and we don't have a branch in town.
Didn't you just say that that other place was by the movie theater there?
93143 wrote:
It is kinda pleasant to see Nintendo having this much success with a new property, though... I don't remember anybody offering Pikmin-themed frozen yogurt...
That's because one of the games is (in my humble opinion) infinitely better than the other, and yes, I own Pikmin. I still haven't changed my mind about the controls though.
93143 wrote:
I don't know about you, but my VBlank is stuffed with DMA...
Mine should be with all the explosions.
93143 wrote:
This isn't intended to be the final word or anything. If you figure out a good way to do it, great. Tell me about it. But from what I've found out so far, it seems apparent that CPU-based mid-scanline raster effects are somewhat less useful than they sound.
Yeah, unfortunately.
I'm just curious, but since the Genesis runs at a faster clockspeed, does anyone know how many pixels a cycle is on it? How many cycles is the fastest instruction?
Espozo wrote:
since the Genesis runs at a faster clockspeed, does anyone know how many pixels a cycle is on it?
Genesis 68000 CPU:
- Clock rate is 15/7 of S-CPU fast clock rate
- Data bus is twice as wide as that of the S-CPU
- Accesses memory every four cycles, unlike S-CPU that accesses memory every cycle
VDP:
- 256px mode: Dot rate is same as other TMS9918 derivatives and TMS9918-inspired Nintendo PPUs, 3/2 of S-CPU fast clock rate, 342 dots per line
- 320px mode: Dot rate is 5/4 of 256px mode dot rate, or 15/8 of S-CPU fast clock rate, 427.5 dots per line
In other topics, I've adopted a convention of assuming that each slow S-CPU cycle is worth three 68000 cycles and each fast cycle is worth two.
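To put numbers on the "how many pixels is a cycle" question, here is a rough sketch that uses only the ratios listed above. The constant and function names are made up for illustration, and everything is expressed relative to the S-CPU fast clock rather than in MHz.

```python
from fractions import Fraction

# Ratios quoted in the post, all relative to the S-CPU fast clock rate.
M68K_RATIO = Fraction(15, 7)      # 68000 clock
DOT_256_RATIO = Fraction(3, 2)    # VDP dot rate in 256px mode
DOT_320_RATIO = Fraction(15, 8)   # VDP dot rate in 320px mode


def m68k_cycles_per_dot(dot_ratio):
    """68000 clock cycles that elapse while the VDP outputs one dot."""
    return M68K_RATIO / dot_ratio


def m68k_cycles_per_line(dot_ratio, dots_per_line):
    """68000 clock cycles in one full scanline, blanking included."""
    return m68k_cycles_per_dot(dot_ratio) * dots_per_line


cycles_per_dot_256 = m68k_cycles_per_dot(DOT_256_RATIO)
cycles_per_dot_320 = m68k_cycles_per_dot(DOT_320_RATIO)
cycles_per_line_256 = m68k_cycles_per_line(DOT_256_RATIO, 342)
# 427.5 dots per line, written as an exact fraction
cycles_per_line_320 = m68k_cycles_per_line(DOT_320_RATIO, Fraction(855, 2))

print(cycles_per_dot_256, cycles_per_dot_320)
print(float(cycles_per_line_256), float(cycles_per_line_320))
```

So a 68000 cycle is a bit less than one dot in either mode, and both modes come out to the same number of 68000 cycles per scanline. Keep in mind that the 68000 only accesses memory every four of those cycles.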
Quote:
How many cycles is the fastest instruction?
Google "68000 cycles per instruction" brought this, which breaks each instruction into address generation and the remainder.
Espozo wrote:
Well you know 93143, just thinking, I imagine one trick you could reasonably do is have an object or something made of a BG on the left side of the screen, and then have another object on the right side of the screen on the same BG layer. What you could then do is have them move differently by doing the mid scan line trick thing, and if there's garbage in between, can't you clip it out with a window layer? This could almost be good for parallax, like if in a fighting game or something where you can't move very far to the left or the right, you have something to the left that moves not very fast and something to the right that moves faster when you move, because the thing on the right is supposed to be closer to you. They shouldn't overlap each other because you can't move enough to the right for this to happen.
I don't see why not. But isn't this a bit of a heavy-duty process for such a mild effect? I guess if it's indicated by the direction the game design has taken, you're out of BG layers and sprites but not CPU time, and a window mask effect by itself won't cut it...
BTW, the garbage only happens with mode changes (and possibly other changes). Scroll changes seem to be clean, as long as the layer in question is transparent during the switch so you can't see the seam. (Which I guess you could ensure by using a masking window.)
Air Strike Patrol, as stated, uses it to do a neat effect with some text while using the same BG layer for a vertical status bar. There's quite a bit of space between the sidebar and the text, it seems to only change the vertical scroll values, and the text doesn't cover very many scanlines; the effect probably uses very little CPU time compared with a full-screen raster split...
If you find an application for this that doesn't devastate the rest of the game, I see no reason not to go ahead and try it. The ASP case is a good example of a neat trick that doesn't cost very much. Just beware of the SNES Jr.; it doesn't behave the same as a normal SNES and these stunts can fail on it.
...
Hang on; wasn't this thread about a tilemap editor? In that case, it'll probably work fine with less exotic techniques like HDMA and windowing...
Quote:
You know though, with all this talk about dots, is it like the whole screen is how fast the CPU operates in one frame?
Mostly. The SNES is a raster display system, so you can time it based on dots and scanlines and frames, because the PPU is drawing the picture to the screen at a steady rate while the CPU is running. Of course, the CPU still runs during VBlank, so you actually get 262 scanlines of CPU time (assuming NTSC and non-interlaced). Note, however, that the CPU pauses for 10 dots (40 master cycles) in the middle of each scanline to allow the memory controller to refresh WRAM, so each scanline only spans 1324 master cycles of compute time, instead of the 1364 master cycles it takes on the wall clock (the 256 active dots only take 1024 master cycles to draw; the rest is HBlank).
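For anyone who wants the per-frame budget, the arithmetic above can be sketched like this. The names are mine, and it assumes every scanline is the same length, which is a simplification.

```python
# Figures taken from the paragraph above (NTSC, non-interlaced).
MASTER_CYCLES_PER_DOT = 4        # 10 dots == 40 master cycles
MASTER_CYCLES_PER_LINE = 1364    # wall-clock length of one scanline
REFRESH_DOTS = 10                # CPU paused for WRAM refresh
ACTIVE_DOTS = 256                # visible dots per scanline
LINES_PER_FRAME = 262            # includes VBlank lines

refresh_cycles = REFRESH_DOTS * MASTER_CYCLES_PER_DOT
compute_cycles_per_line = MASTER_CYCLES_PER_LINE - refresh_cycles
active_cycles = ACTIVE_DOTS * MASTER_CYCLES_PER_DOT
hblank_cycles = MASTER_CYCLES_PER_LINE - active_cycles

# Simplifying assumption: all 262 lines have identical timing.
compute_cycles_per_frame = compute_cycles_per_line * LINES_PER_FRAME

print(compute_cycles_per_line, active_cycles, hblank_cycles)
print(compute_cycles_per_frame)
```

That is the budget in master cycles; how many instructions fit depends on whether each access runs at the fast or slow rate.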
Quote:
Didn't you just say that that other place was by the movie theater there?
Whoops. I didn't even check that link; I assumed you just meant it was a totally different Canadian frozen yogurt chain (though this one does have U.S. locations, as I found after checking). Turns out it too is running a Splatoon promo... what are the odds, eh?
And their website design is really suspiciously similar...
The Financial Post wrote:
Aaron Serruya, who, with his brothers Michael and Simon, founded Yogen Früz in Toronto in 1986. [...] Now chief executive of Yogurty’s, Mr. Serruya
The Financial Post wrote:
The [Yogurty’s] brand is owned by Markham, Ont.-based International Franchise Inc., the company that acquired Yogen Früz in 2005
Oh.
To answer your question - no, I haven't. I don't go out much lately...
Espozo wrote:
byuu wrote:
But to me, what matters are the games. And the SNES has ~15 times the JRPGs as the Genesis.
Is that supposed to be a good thing?
That's a great thing! Who doesn't enjoy a good story?
DoNotWant wrote:
Espozo wrote:
byuu wrote:
But to me, what matters are the games. And the SNES has ~15 times the JRPGs as the Genesis.
Is that supposed to be a good thing?
That's a great thing! Who doesn't enjoy a good story?
If I want a good story, I prefer watching a movie.
Why such SNES hate? Personally, I always disliked the Genesis. The poor color rendition and the extremely limited tonal qualities of its sound were a huge downfall to me. Don't get me wrong, I enjoyed some of the games on the system, but I generally enjoyed them in spite of the system.
I'm curious what is the draw for Genesis love? Please don't say clock speed because clock speed isn't FUN. My iPhone has a higher clock speed and I have no desire to play games on it.
I recently started getting more into SHMUPs and I read a lot of people saying SNES sucks for SHMUPs and Genesis is the only way to go. But then I look at the highly regarded SHMUPs on that system, like Thunderforce, and I don't honestly think they look very good. I honestly think Space Megaforce looks better and plays better than the critically acclaimed MUSHA.
Granted there are only a couple great SHMUPs for the SNES, but when I look at Space Megaforce or Macross: Scrambled Valkyrie, I think this may have been a limitation on the skill of the programmers and not the system.
Espozo wrote:
Because they're utterly bizarre.
As some guys can be here belong the Megadrive...
Quote:
byuu wrote:
Both systems have their pros and cons, but I'd agree the Genesis was the superior hardware.
I'm just curious, and I'm not saying you're wrong because you have x100 more knowledge of both systems than I do, but what makes you think so? I'm pretty sure you don't believe in blast processing.
I always thought the SNES was backwards in how it was constructed (mostly with sprites), but I never really saw it as being weaker.
Did you at least know the Megadrive hardware ?
Byuu probably has good reasons to think that, and you know I share this feeling.
Byuu wrote the most famous SNES emulator along with other great things, so I guess you can trust him on that point. I believe that anyone with strong hardware knowledge, specifically of both systems, would tell you the Megadrive was the superior hardware. The SNES of course has the edge in some aspects (many more colors, more hardware GFX effects, orchestral sound), but for a system 2 years newer that is the least to expect... However, compared to the MD it has so many pitfalls, and it's when you develop on both systems that you understand you can definitely do more with the Megadrive. The SNES hardware is just so unbalanced and painful to develop with where the Megadrive is straightforward and cleverly designed, that is the point.
The 68000 is for me definitely a big advantage over the SNES but that is not the only one... column scrolling (doable with the offset per tile but again painful), packed tile data format, powerful and flexible sprite capabilities, flexible sound hardware.
The SNES has the nice HDMA thing which is definitely a great addition but imo almost everything else is not really worthy.
The Megadrive also has its weaknesses, such as the (very) limited number of color palettes, but that is almost the only real weakness of this system; I even bet that with 8 palettes instead of 4 the Megadrive story could have been different.
darryl.revok wrote:
Why such SNES hate? Personally, I always disliked the Genesis.
Why such Genesis hate ??
Quote:
The poor color rendition and the extremely limited tonal qualities of its sound were a huge downfall to me. Don't get me wrong, I enjoyed some of the games on the system, but I generally enjoyed them in spite of the system.
I'm curious what is the draw for Genesis love? Please don't say clock speed because clock speed isn't FUN. My iPhone has a higher clock speed and I have no desire to play games on it.
Personally, I owned both systems back in the day and I loved both... I would say I even played the SNES more than the Megadrive, but both systems were definitely great to play with. Today, as I'm developing on these systems (well, actually the Megadrive, but at least I tried on the SNES), my mind is biased and I'm judging them more on their hardware design and capabilities. I am even dismayed that the SNES was always considered the superior hardware by magazines when I was younger, as that was definitely not really the case; marketing work...
Quote:
I recently started getting more into SHMUPs and I read a lot of people saying SNES sucks for SHMUPs and Genesis is the only way to go. But then I look at the highly regarded SHMUPs on that system, like Thunderforce, and I don't honestly think they look very good. I honestly think Space Megaforce looks better and plays better than the critically acclaimed MUSHA.
Granted there are only a couple great SHMUPs for the SNES, but when I look at Space Megaforce or Macross: Scrambled Valkyrie, I think this may have been a limitation on the skill of the programmers and not the system.
That is just a matter of taste, but definitely the Megadrive has a better library when it comes to SHMUPs... I recommend Gynoug, which is my favorite on the Megadrive :p The SNES has some great titles (R-Type 3, Pop'n TwinBee, Phalanx) but fewer than the Megadrive in that department.
93143 wrote:
And their website design is really suspiciously similar...
The Financial Post wrote:
Aaron Serruya, who, with his brothers Michael and Simon, founded Yogen Früz in Toronto in 1986. [...] Now chief executive of Yogurty’s, Mr. Serruya
The Financial Post wrote:
The [Yogurty’s] brand is owned by Markham, Ont.-based International Franchise Inc., the company that acquired Yogen Früz in 2005
Oh.
To answer your question - no, I haven't. I don't go out much lately...
Can someone provide a sharp scan of the "yogurtys.com" text on a Yogurty's cup? I'd like to see what font it uses, and the cups in the photo for the Splatoon ad campaign aren't sharp enough for WhatTheFont/WhatFontIs.
How is column scrolling "painful" on the SNES? You just upload the scroll values of each column to the second row in BG3's tilemap with bit 13 set.
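As a rough illustration of what that upload would contain, here is a sketch that builds the words for that second tilemap row. Only "second row" and "bit 13" come from the post; the 32-entries-per-row figure, the 10-bit scroll mask and all the names are assumptions on my part, so check them against the hardware docs before relying on this.

```python
OPT_ENABLE_BIT13 = 1 << 13   # "bit 13 set", as described in the post
ROW_WORDS = 32               # assumption: 32 tilemap entries per row
SCROLL_MASK = 0x03FF         # assumption: scroll value fits in 10 bits


def build_vertical_offset_row(column_scrolls):
    """Return the 16-bit words to place in the second row of BG3's tilemap."""
    if len(column_scrolls) > ROW_WORDS:
        raise ValueError("more columns than one tilemap row can hold")
    return [(value & SCROLL_MASK) | OPT_ENABLE_BIT13 for value in column_scrolls]


def second_row_word_offset():
    """Word offset of the second row from the start of the tilemap."""
    return ROW_WORDS


# Three columns scrolled by 0, 8 and 16 pixels.
row = build_vertical_offset_row([0, 8, 16])
print([hex(word) for word in row], second_row_word_offset())
```

The resulting list is what would get copied to VRAM at the second-row offset, presumably with a DMA during VBlank.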
>> Both systems have their pros and cons, but I'd agree the Genesis was the superior hardware.
> I'm just curious, and I'm not saying you're wrong because you have x100 more knowledge of both systems than I do, but what makes you think so?
The CPU. As a developer wanting to make a game to run on a game console, being able to write in C is such an incredible advantage. The Genesis has a real CPU with 16 registers.
There have been attempts to make C compilers for 65xx processors, but they produce code that's unusably slow and horribly inefficient. C is just entirely incompatible with the notion of "one actual register, two index registers" (and even with the one actual math register; you can't natively multiply, divide, shift/rotate by more than one bit at a time, etc)
As a game developer, I'd take the video and sound hit (both of which also come at added complexity compared to the Genesis.) I would absolutely dread the idea of writing a full-fledged RPG engine in SNES assembler. And those damned M/X flags on P ... I hate them so very, very much.
Conversely, as an emulator developer / ROM hacker, I much prefer analyzing others' SNES ASM, because it's so much simpler, and you don't have to work your way through obscene "shell game" behavior that C compilers generate with variables in registers.
> Is that supposed to be a good thing?
For someone who likes JRPG/JSLGs like me, yes.
Genesis has a few, and I love those games. Shining Force 1&2, Phantasy Star IV (III was total shit), and if you go to the Mega CD ... Lunar 1&2.
But the SNES ... oh man. Lufia 1&2, BoF 1&2, Final Fantasy 4-6, Dragon Quest 1,2,3,5,6, Bahamut Lagoon, Tales of Phantasia, Chrono Trigger, Star Ocean, Tengai Makyou Zero, Der Langrisser, Dai Kaijuu Monogatari 1&2, Arabian Nights, Power of the Hired, Soul Blader, Terranigma, Illusion of Gaia, Chaos Seed, Ys 3-5, Seiken Densetsu 2&3, Aretha 1&2, Bakumatsu Oni, Kourinden Oni, Emerald Dragon, Romancing SaGa 1-3, Farland Story 1&2, Tactics Ogre, Bushi Seiryuuden, Last Bible III, ... and those are just my personal favorites. There's like five times more than that.
If the game libraries were reversed, I'd have been Nemesis, but in 2004.
I'd easily take an SNES over a PS4. So to me, arguing about which hardware was superior is just a waste of time. I don't play hardware, I play games.
> Granted there are only a couple great SHMUPs for the SNES, but when I look at Space Megaforce or Macross: Scrambled Valkyrie, I think this may have been a limitation on the skill of the programmers and not the system.
Frankly, you're wasting your time playing shmups on any old console.
Why dick around with MUSHA vs Space Megaforce when you could be playing Dodonpachi with your monitor tilted into portrait mode instead?
SHMUPS aren't RPGs. There's no compelling story to trump the graphical limitations.
https://youtu.be/iJwhq2pkhhc?t=2800 => "WHERE IS YOUR SPRITE LIMIT GOD NOW?!"
Stef wrote:
As some guys can be here belong the Megadrive...
I get the first part of what you said, but not "belong the Megadrive".
Stef wrote:
Did you at least know the Megadrive hardware ?
Not too much, and I never said that whatever reason he would say would be wrong. This is basically what I know:
Has a 256x224 screen mode, and a 320x224 screen mode.
In 256 pixel mode, there are 256 sprite pixels per line.
In 320 pixel mode, there are 320 sprite pixels per line.
In 256 pixel mode, there are 64 sprites.
In 320 pixel mode, there are 80 sprites.
(why in the world would you even choose 256 pixel mode then?)
Has a shadow and highlight mode that can either double the brightness or the darkness of a color under it? Can't this actually increase the total color count?
Has a 9bit color depth.
All tiles are 4bpp.
All tiles are packed pixel?
Has an interlaced mode that's used in the competition 2 player in Sonic 2. (I heard it took a lot of silicon real estate in the video processor that could have been used for something like more colors)
Has the hardware in the video chip for selecting up to 384 color entries, but there's only enough space for 64 colors and I don't think anything was built to be able to select that many palettes. (Wasn't there an arcade game based off the Genesis hardware that used this?)
Every sprite can be some combination of 8, 16, 24, or 32 by 8, 16, 24, or 32 pixels large.
Each sprite and BG tile can select 4 different 16 color palettes from a shared bank.
I think I remember that sprite information is stored in vram and that it's actually safe to update it mid screen.
Each sprite entry is (I think?) 8 bytes, and sprites (thankfully) have access to all tiles in vram.
There are 2 BG that have access to all tiles.
There is a scrolling table for BG layers because there isn't anything like hdma.
There is also a 16x16 column scrolling table, and I think I remember hearing that there's only 20 entries so one of them wraps around when the screen scrolls.
Has a 7.6 MHz 68000
Has, I think, around a 4 MHz Z80 for sound and the Master System?
Has one PCM channel for sound.
Has several (I don't know the specific number) FM channels for sound.
Has 64KB of main ram.
Has 64KB of video ram (but was potentially going to have 128KB).
Some have a bios that loads up "licensed by Sega" or something, but I think Sega ended up getting in trouble with Accolade?
Okay, you can tell I'm getting desperate...
Stef wrote:
The SNES hardware is just so unbalanced and painful to develop with where the Megadrive is straightforward and cleverly designed, that is the point.
I can't actually disagree with you there. :/
Stef wrote:
flexible sound hardware.
Isn't the sound hardware on the Genesis less flexible? I recall hearing a story about how in Mortal Kombat III on the Genesis, samples had to be played at half quality to try to have multiple sound effects going on, which is often the case for fighting games.
Stef wrote:
The Megadrive also has its weaknesses, such as the (very) limited number of color palettes, but that is almost the only real weakness of this system; I even bet that with 8 palettes instead of 4 the Megadrive story could have been different.
That was enough for me. The SNES may not have 512 colors, and the Turbo Graphics 16 may have only had 9bit color depth, but the Genesis got hit hard. I'll also take my 256 color and 16 color BGs at 256 pixels over 2 16 color BGs at 320 pixels, but that's just personal preference.
Stef wrote:
I am even dismayed that the SNES was always considered the superior hardware by magazines when I was younger, as that was definitely not really the case; marketing work...
https://www.youtube.com/watch?v=zlulSyBI2aY
Quote:
Phalanx
I... guess? Of course I'm going to agree with you about R-Type 3 though. Super R-Type would be better if it weren't in slowmo, but it's still a fairly mediocre attempt. I don't like Gradius, and I think Axelay is tremendously overrated, although I only did about 1 level.
Stef wrote:
Why such Genesis hate ??
I didn't really get darryl.revok talking about SNES hate, because I don't really see any of that here.
psycopathicteen wrote:
How is column scrolling "painful" on the SNES? You just upload the scroll values of each column to the second row in BG3's tilemap with bit 13 set.
I didn't get it either.
byuu wrote:
The CPU. As a developer wanting to make a game to run on a game console, being able to write in C is such an incredible advantage. The Genesis has a real CPU with 16 registers.
Quote:
There have been attempts to make C compilers for 65xx processors, but they produce code that's unusably slow and horribly inefficient. C is just entirely incompatible with the notion of "one actual register, two index registers" (and even with the one actual math register; you can't natively multiply, divide, shift/rotate by more than one bit at a time, etc)
More than once, this is where our opinions differ. I'd rather write in machine code than C, and I'd rather have a better PPU than a better CPU. I don't know about you, but if I don't have way too much I can do with the video hardware, I'll try and limit what I originally wanted to do, but if I'm not as limited and the CPU just can't keep up very well and there's slowdown, I'd either add an expansion chip, (not trying to imply that it's easy though) or I'd just shrug my shoulders and say oh well.
Quote:
Frankly, you're wasting your time playing shmups on any old console.
Quote:
Why dick around with MUSHA vs Space Megaforce when you could be playing Dodonpachi with your monitor tilted into portrait mode instead?
Why dick around with Final Fantasy 6 vs. Chrono Trigger when you could be playing Final Fantasy 3,000?
Quote:
SHMUPS aren't RPGs. There's no compelling story to trump the graphical limitations.
Are graphical limitations all that matter? The game you posted in the video actually looked rather boring, just a bunch of slow bullets that seem to disappear for whatever reason. I'd much rather stick to the "outdated" R-Type.
tepples wrote:
Can someone provide a sharp scan of the "yogurtys.com" text on a Yogurty's cup? I'd like to see what font it uses, and the cups in the photo for the Splatoon ad campaign aren't sharp enough for WhatTheFont/WhatFontIs.
That would be cool. I just noticed that I sort of made a bad pun... Should I instead say that would be "fresh"?
psycopathicteen wrote:
If I want a good story, I prefer watching a movie.
This.
Quote:
Quote:
The SNES hardware is just so unbalanced and painful to develop with where the Megadrive is straightforward and cleverly designed, that is the point.
I can't actually disagree with you there. :/
I wonder if it was possible back then to have VRAM accessed a little bit faster than the pixel clock, like having it access 10 words per 8 pixels. It seems like most of the complexity came from trying to shove as many bg layers into the sPPU VRAM bandwidth as possible.
> More than once, this is where our opinions differ. I'd rather write in machine code than C, and I'd rather have a better PPU than a better CPU.
And you are in the minority. The Genesis had a thriving unlicensed development scene. Complete games that could have easily been real official titles, not small homebrew games. Tons of great Chinese games like Fengshen Yingjiechuan and Beggar Prince. And of course, the venerable Pier Solar. Meanwhile, the SNES' most impressive unlicensed game is d4s' Super Road Blaster port. Aside from digging up games that were intended to be licensed (eg the shamefully bad Nightmare Busters) or Christian ROM hacks / loli porn rape dungeon games, the SNES hasn't seen a single, real full, substantial unlicensed game yet. And I'll probably finish FEoEZ before Project N comes out :P
> Why dick around with Final Fantasy 6 vs. Chrono Trigger when you could be playing Final Fantasy 3,000?
I already covered this. You play RPGs for their story.
And certainly, I will play "Zelda: Link's Awakening" and "Castlevania: The Adventure" when I'm out of other titles to play (and they are indeed worth playing through once); but "Zelda 3: Triforce of the Gods" and "Castlevania: Rondo of Blood" are clearly the superior, more enjoyable games.
Shmups though are a different class of game. They're more akin to pinball machines: you play the same game over, and over, and over. I've literally played through Dodonpachi at least 50-100 times. I would probably put fighting games in this same category. Whereas I have played through Chrono Trigger and Final Fantasy VI exactly once. And Zelda 3 probably four times.
So the better analogy would be, why play this:
http://1stadvantagesolutions.com/32440.jpg
When you can play this?
http://www.pinballpast.com/shop/files/t ... pfield.JPG
> just a bunch of slow bullets that seem to disappear for whatever reason
He's spamming bombs. But it takes forever because that boss becomes invincible while your bomb is in effect. It's basically the game's last ditch effort to steal quarters from even the most extreme players. I only showed that part to emphasize the fun of shmups not limited by a max of 64 sprites onscreen, or 20 on the same line.
The reason I like this game so much is because the bullets move at a speed that human beings can dodge. There's certainly not a technical reason they can't move ten times faster like a Touhou game.
> I'd much rather stick to the "outdated" R-Type.
Again, suit yourself.
You seem to be arguing like I've been stating objective facts. They're just my opinions. You're welcome to share yours too, of course, but no need to be a contrarian just because I responded to someone else with mine.
byuu wrote:
You seem to be arguing like I've been stating objective facts.
The "shoot em ups become outdated because you only play them for graphics" seemed kind of like you were trying to state objective facts. :/
Espozo wrote:
(why in the world would you even choose 256 pixel mode then?)
To save VRAM, or to draw the same graphics and use them for both Super NES and Genesis versions of a game.
Quote:
I think I remember that sprite information is stored in vram and that it's actually safe to update it mid screen.
Yes. Video on the Genesis VDP is most similar to S-PPU mode 2, with offset per 2 columns in the same way mode 2 has offset per tile. On the Genesis, the sprite table is inside VRAM and OPT data is in a separate VSRAM, but in S-PPU mode 2, OPT data is inside VRAM (at a position controlled by BG3's nametable base register) and the sprite table is in a separate OAM. Either way, the fetch pattern of the Genesis VDP leaves some time available for VRAM data access during active picture, which makes real-time decompression more practical as tile data can be sent directly to the VDP instead of being buffered for a vblank DMA.
Quote:
Has, I think, around a 4 MHz Z80 for sound and the Master System?
3.6 MHz. But the Z80 is about half as efficient clock for clock as the 6502, so it's about as fast as the NES CPU.
Quote:
Has one PCM channel for sound.
Plus 5 FM channels and the 4 Master System channels. In addition, specialized Z80 code can mix multiple samples into this mono PCM channel.
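To show what "mixing multiple samples into one PCM channel" means arithmetically, here is a minimal sketch. A real driver would do this in Z80 assembly under tight timing; this is Python purely for illustration, and it assumes unsigned 8-bit samples with 128 as silence. The function name is made up.

```python
def mix_unsigned_8bit(*streams):
    """Mix several unsigned 8-bit PCM streams into a single stream.

    Each sample is re-centred around zero, the streams are summed,
    and the sum is clipped back into the 8-bit range.
    """
    length = max((len(stream) for stream in streams), default=0)
    mixed = bytearray(length)
    for i in range(length):
        total = 0
        for stream in streams:
            if i < len(stream):          # shorter streams contribute silence
                total += stream[i] - 128
        total = max(-128, min(127, total))  # hard clip instead of wrapping
        mixed[i] = total + 128
    return bytes(mixed)


voice = bytes([128, 200, 255])
effect = bytes([128, 200, 255])
print(list(mix_unsigned_8bit(voice, effect)))
```

Hard clipping is the simplest choice; halving each input before summing avoids clipping but costs volume, which may be where the "half quality" complaints come from.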
Quote:
Some have a bios that loads up "licensed by Sega" or something, but I think Sega ended up getting in trouble with Accolade?
The VDP is locked until the BIOS is switched in, which is done by writing the string "SEGA" to an I/O port. The BIOS looks for "SEGA" or " SEGA" at a particular place in the ROM header, and if found, it displays the license screen for a few seconds and jumps back to the cartridge.
Accolade inserted the code to call the BIOS. Sega sued, claiming that the license screen display in an unlicensed game infringed Sega's trademark. The judge ruled in Accolade's favor, holding that Sega itself was misusing its trademark as an ersatz patent.
The boot sequence in the Game Boy, Game Boy Color, and Game Boy Advance works almost identically, except its BIOS is switched in on power-up, and the magic string in the ROM header is a bitmap of the name "Nintendo".
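The header check described above could be sketched like this. The post only says "a particular place in the ROM header", so the 0x100 offset used here is my assumption, and the function name is hypothetical.

```python
HEADER_OFFSET = 0x100  # assumption: where the console-name field starts


def has_sega_signature(rom, offset=HEADER_OFFSET):
    """Return True if the ROM header starts with "SEGA" or " SEGA"."""
    field = rom[offset:offset + 5]
    return field.startswith(b"SEGA") or field == b" SEGA"


# Fake ROM images: zero padding up to the header, then the name field.
padding = bytes(HEADER_OFFSET)
print(has_sega_signature(padding + b"SEGA GENESIS    "))
print(has_sega_signature(padding + b" SEGA MEGA DRIVE"))
print(has_sega_signature(padding + b"NOT A LICENSEE  "))
```

A cartridge that fails this check would simply never get the license screen.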
Quote:
Stef wrote:
flexible sound hardware.
Isn't the sound hardware on the Genesis less flexible? I recall hearing a story about how in Mortal Kombat III on the Genesis, samples had to be played at half quality to try to have multiple sound effects going on, which is often the case for fighting games.
Audio mixing required more CPU time, and you're probably right that professional programmers on a deadline didn't have time to optimize the soft mixer to demoscene levels of efficiency.
Quote:
tepples wrote:
Can someone provide a sharp scan of the "yogurtys.com" text on a Yogurty's cup? I'd like to see what font it uses, and the cups in the photo for the Splatoon ad campaign aren't sharp enough for WhatTheFont/WhatFontIs.
That would be cool. I just noticed that I sort of made a bad pun... Should I instead say that would be "fresh"?
I ask because the font sort of reminds me of Fink Heavy, the font used for the logo of Animal Crossing and a bunch of other things, but it isn't exact.
tepples wrote:
Espozo wrote:
(why in the world would you even choose 256 pixel mode then?)
To save VRAM, or to draw the same graphics and use them for both Super NES and Genesis versions of a game.
I'd still kind of like 80 sprites... I wouldn't mind 256 pixel mode if the sprite capabilities were kept intact. Isn't the way this is done by lowering the clockspeed of the video chip?
tepples wrote:
half as efficient clock for clock
I can think of another processor a little above "half as efficient clock for clock" compared to another CPU, but not going to say anything...
tepples wrote:
specialized Z80 code can mix multiple samples into this mono PCM channel.
Isn't that how the GBA does it, except it has two PCM channels, one for left, and one for right if you have headphones?
tepples wrote:
Audio mixing required more CPU time, and you're probably right that professional programmers on a deadline didn't have time to optimize the soft mixer to demoscene levels of efficiency.
So really, what I thought was a fairly clever idea was really just laziness... Can you think of any games that mix the audio together? I haven't paid enough attention to notice. (You know, how feasible would it be to mix audio together with the SPC700?)
Espozo wrote:
I get the first part of what you said, but not "belong the Megadrive".
Oh sorry, my English sucks; I meant "towards". I don't know why I'm always confusing these 2 words :p
Quote:
Not too much, and I never said that whatever reason he would say would be wrong. This is basically what I know:
Has a 256x224 screen mode, and a 320x224 screen mode.
In 256 pixel mode, there are 256 sprite pixels per line.
In 320 pixel mode, there are 320 sprite pixels per line.
In 256 pixel mode, there are 64 sprites.
In 320 pixel mode, there are 80 sprites.
...
Has several (I don't know the specific number) FM channels for sound.
Has 64KB of main ram.
Has 64KB of video ram (but was potentially going to have 128KB).
You definitely already have a good view of the Megadrive hardware, but I believe you really need to try some development on this system to understand why we prefer it. Of course you don't have the time for that; I myself would love to develop on more systems, but I lack the time for that.
Also, you said you don't care about C and prefer using assembly directly... Writing assembly for the 68000 is really pleasant; try it for some time and then you will have a hard time going back to 65816 assembly, trust me.
About the Megadrive hardware, I don't know where you got the 384 color entries (arcade board?), but at least it would have been really easy to have 4+4 palettes (4 for sprites and 4 for BG) and also 12-bit RGB color (internally, color data is encoded as 0000RRRxGGGxBBBx).
Quote:
Isn't the sound hardware on the Genesis less flexible? I recall hearing a story about how in Mortal Kombat III on the Genesis, samples had to be played at half quality to try to have multiple sound effects going on, which is often the case for fighting games.
Well, depending on the "sound driver" you can really change the sound capabilities of the system.
The Megadrive sound system is composed of the Z80 CPU (3.58 MHz), 8 KB of dedicated memory (holding the sound driver code and its data and variables), the YM2612 chip and the PSG chip.
What I like is that you have a full dedicated CPU to handle all the sound stuff, and that CPU can access the main bus (and so the ROM), which is a big difference from the SPC700, which needs any data to be pushed to it from outside.
It's true that by default the YM2612 offers you 6 FM channels or 5 FM + 1 PCM, and the PSG offers 3 square channels + 1 noise channel. Also, because they wanted to maintain backward compatibility with the Master System, you don't have any interrupt connected to the YM2612 timers, which is a pity and makes things a bit more complex...
But then everything is up to the developer's skill.
You spoke about MK3, but just take SF2... The Megadrive version is "famous" for its poor voice quality; as in MK3, they actually mix 2 PCM channels in software, but it is really poorly done :-/
I made a patch for this game to fix the voice quality :
https://www.youtube.com/watch?v=-iE5GJNkOqs
I just replaced the dumb default sound driver and that's it... It's just unbelievable they didn't put more effort into porting this game when you think how big its (negative?) impact was :-/
Also, I developed my own sound driver for the Megadrive for easy music and SFX handling in game development.
You can find it in my SGDK toolkit. I called it the XGM driver, and it basically allows you to have 5 FM + 4 PCM (@ 14 kHz) + 4 PSG, everything handled by the Z80 CPU only.
The samples played on the 4 PCM channels don't have any size limit.
It really extends the Megadrive sound capabilities, and this using only the Z80 CPU (so the main 68000 remains 100% available for other parts). I bet it would be really difficult to achieve something similar with the SPC700 (extra software PCM channels), and the limited data access will remain whatever happens.
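For anyone wondering what "software PCM mixing" boils down to, here is a rough sketch in Python. To be clear, this is not the XGM driver's code (that is cycle-counted Z80 assembly); the function name `mix_pcm`, the signed 8-bit input format and the saturating clamp are all assumptions made for illustration. The only hardware fact relied on is that the YM2612 DAC takes a single unsigned 8-bit value per sample, so several streams have to be summed into one before being written out.

```python
def mix_pcm(channels):
    """Mix several signed 8-bit PCM streams into one unsigned 8-bit stream.

    `channels` is a list of sequences of ints in -128..127 (hypothetical
    input format). Streams may have different lengths; a stream that has
    ended simply contributes silence.
    """
    length = max((len(c) for c in channels), default=0)
    out = bytearray(length)
    for i in range(length):
        total = 0
        for c in channels:
            if i < len(c):
                total += c[i]
        # Saturate instead of wrapping, so loud overlaps clip rather than
        # flipping sign (an assumed design choice, not taken from XGM).
        total = max(-128, min(127, total))
        # Convert signed to the unsigned form an 8-bit DAC expects.
        out[i] = total + 128
    return bytes(out)
```

A real driver does the same sum-and-clamp per output sample, just with a fixed cycle budget so the DAC is fed at a steady rate, which is where most of the Z80 time goes.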
Quote:
That was enough for me. The SNES may not have 512 colors, and the TurboGrafx-16 may have only had 9-bit color depth, but the Genesis got hit hard. I'll also take my 256-color and 16-color BGs at 256 pixels over two 16-color BGs at 320 pixels, but that's just personal preference.
Well, for me 4 palettes is definitely not enough, even if you can share them between sprites and BG. Just having 4+4 would have made a huge difference.
Stef wrote:
Do you know that the term "blast processing" actually came from the fact that you can DMA to CRAM fast enough to create a pseudo 16-bit (9 bits used) bitmap mode?
Quote:
Quote:
Phalanx
I... guess? Of course I'm going to agree with you about R-Type 3 though. Super R-Type would be better if it weren't in slowmo, but it's still a fairly mediocre attempt. I don't like Gradius, and I think Axelay is tremendously overrated, although I only did about 1 level.
I don't like Axelay either, nor Super Aleste or Musha Aleste :p
Phalanx is really underrated for me; it has its flaws (such as the slowdowns) but is very enjoyable in the end, and it's one of the rare SNES shmups to display more than 10 sprites at once (exaggerating, but that is the idea) :p
Quote:
psycopathicteen wrote:
How is column scrolling "painful" on the SNES? You just upload the scroll values of each column to the second row in BG3's tilemap with bit 13 set.
I didn't get it either.
Well, I never understood the documentation correctly then; I thought you had to update the whole BG3 tilemap to do it.
Espozo wrote:
tepples wrote:
specialized Z80 code can mix multiple samples into this mono PCM channel.
Isn't that how the GBA does it, except it has two PCM channels, one for left, and one for right if you have headphones?
Yes, but the GBA also has an ARM CPU, with lots of registers and fast multiplication, and a pair of DMA channels to feed its sample FIFOs so that mixed audio can be fed to the DAC in the background without needing the rest of the program to be timed code.
Stef wrote:
Also, I developed my own sound driver for the Megadrive for easy music and SFX handling in game development.
You can find it in my SGDK toolkit. I called it the XGM driver, and it basically allows you to have 5 FM + 4 PCM (@ 14 kHz) + 4 PSG, everything handled by the Z80 CPU only.
The samples played on the 4 PCM channels don't have any size limit.
It really extends the Megadrive sound capabilities, and this using only the Z80 CPU (so the main 68000 remains 100% available for other parts). I bet it would be really difficult to achieve something similar with the SPC700 (extra software PCM channels), and the limited data access will remain whatever happens.
Well done! I'm taking note of that sound driver, because I personally want a different way to make music other than VGMMM and Deflemask. Both of them suffer from desyncing with PCM (although in your case, it looks like I have fixed sample rates only... ah well, that's OK for me, as I can easily do resampling)... and I'm interested in using Kega Fusion for both creating VGM files and rendering them.
I actually understand the difficulty with emulating extra PCM channels quite well... especially in the SPC700's case when you have to render it either to BRR (in real time) or use the echo buffer. Both are quite CPU-intensive... on either end.
KungFuFurby wrote:
Well done! I'm taking note of that sound driver, because I personally want a different way to make music other than VGMMM and Deflemask. Both of them suffer from desyncing with PCM (although in your case, it looks like I have fixed sample rates only... ah well, that's OK for me, as I can easily do resampling)... and I'm interested in using Kega Fusion for both creating VGM files and rendering them.
I actually understand the difficulty with emulating extra PCM channels quite well... especially in the SPC700's case when you have to render it either to BRR (in real time) or use the echo buffer. Both are quite CPU-intensive... on either end.
Thanks
I do have a question: you said you don't want to use VGMMM or DefleMask; is there any other usable solution?
Also, what about this PCM desyncing issue? I never heard about it...
XGM uses frame-based timing. The 4-channel PCM software mixing eats the major part of the CPU time (about 70%), and to maintain good playback quality I had to cycle count the code, so there is no way to handle sub-frame timing accuracy for XGM instructions inside that. With the PCM mixing and buffering, PCM timing can be a bit off, but the delay is rarely above one frame. For SFX that is another story... (you can't avoid the internal mixing buffer delay).
This looks like it's gone way off-topic. If someone can pinpoint the exact post where it ceased to be about "SNES Tilemap Editor", please PM me.
Stef wrote:
Do you know that the term "blast processing" actually came from the fact that you can DMA to CRAM fast enough to create a pseudo 16-bit (9 bits used) bitmap mode?
I thought that was a relatively new discovery (coined "phantom bitmap", as I think tepples told me), and I'm 99 percent certain that Sonic 2, which they are showing off in the video, doesn't use it. I'm not denying this is a very impressive feat, and it would have improved the look of Sega CD games greatly, so we wouldn't have terrible-looking ones like Road Avenger that use one palette (which isn't the system's fault, but having access to all 512 colors sure as heck looks better than only 61).
I won't deny that the SNES had similar marketing, like how I remember reading an article where the people made it seem like the SNES could run every video mode put together, like saying 4 BG layers and then 512x448 resolution. The specifications would put the GBA to shame. I can't help but feel that things like 64x64 sized sprites and high resolution mode were only put in for bragging rights, as they are terribly impractical. Doesn't the TurboGrafx-16 actually have 256x224, 320x224, and 512x448 resolution modes?
tepples wrote:
Yes, but the GBA also has an ARM CPU, with lots of registers and fast multiplication, and a pair of DMA channels to feed its sample FIFOs so that mixed audio can be fed to the DAC in the background without needing the rest of the program to be timed code.
But can't even load a number in one instruction... I'm never going to let that go.
Stef wrote:
I made a patch for this game to fix the voice quality :
Wow. Good job! I've always wanted to see someone try to fix that game.
Stef wrote:
It's just unbelievable they didn't put more effort on porting this game when you think how much its (negative ?) impact was important :-/
I've always felt the same way. I always thought of it as unfair how the Genesis got shatted on with SF2 in that it uses the same small graphics as the SNES port, which wasn't even that great to begin with either. (tiny sprites even for the resolution, bad voice samples, worse music relative to the arcade, (whoever says otherwise is deaf)) Heck, I just now made this, showing how they could have just horizontally squashed the sprites (The middle is for the SNES. For the Genesis, use something exactly in between the CPS1 and SNES ones in size. I didn't touch anything up) and it would have looked better than what they did (I could only find Ken, but close enough)
Attachment:
Street Fighter 2 Sprites.png
Now here's everything in the 4x3 aspect ratio (the CPS1 has a crazy 12x7 aspect ratio (384x224) despite running on a 4x3 monitor)
Attachment:
Street Fighter 2 4x3 Sprites.png
I know the game has black bars on the top and bottom of the screen, but I don't have a clue as to why. By double buffering, you can easily have 2 Zangiefs, unless they were really trying not to have to double buffer anything even though it would fit in VRAM. I wouldn't be surprised if it were a cost-saving decision, because they'd need less ROM to store the smaller graphics, even though the graphics in the original SF2 SNES weren't even compressed... I bet they just wanted to get it done in time and knew it would sell like hotcakes regardless.
Hell, actually, this is how it could look on the Genesis if only the color depth was reduced to 9 bit:
Ryu after his weight loss program:
Attachment:
Street Fighter 2 Sprites.png
4x3 Aspect Ratio:
Attachment:
Street Fighter 2 4x3 Sprites.png
Espozo wrote:
Attachment:
Street Fighter 2 Sprites.png
Now here's everything in the 4x3 aspect ratio (the CPS1 has a crazy 12x7 aspect ratio (384x224) despite running on a 4x3 monitor)
Display aspect ratios aren't quite as important as pixel aspect ratios, which can be derived from the dot clock rate. Oddball pixel aspect ratios were common in the second through fourth generation because of the way NTSC's 4:3 frame was defined in Rec. 601. Atari 2600 and Apple II color have a 12:7 PAR, where each pixel is one cycle of color burst wide. ColecoVision, MSX, NES, SMS, Genesis 256px, and Super NES have an 8:7 PAR. Genesis 320px is 32:35. Commodore 64 has a narrow 3:4 PAR, and that of the CPS 1 and 2 is 135:176, which is close to C64.
So the proper factor when squashing from CPS to SNES is 135/176/(8/7), which is very close to two-thirds, and for CPS to Genesis it's 135/176/(32/35), which is close to five-sixths.
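The arithmetic above can be checked with a few lines of Python. This is only a sketch: the PAR values are the ones quoted in the post, while the names `squash_factor`, `PAR_CPS` and so on are made up for the example.

```python
from fractions import Fraction

# Pixel aspect ratios as quoted above (width:height of one pixel).
PAR_CPS = Fraction(135, 176)      # CPS 1 and 2
PAR_SNES = Fraction(8, 7)         # Super NES, Genesis 256px
PAR_GEN_320 = Fraction(32, 35)    # Genesis 320px


def squash_factor(src_par, dst_par):
    """Horizontal scale that keeps the on-screen width unchanged."""
    return src_par / dst_par


cps_to_snes = squash_factor(PAR_CPS, PAR_SNES)      # 945/1408
cps_to_gen = squash_factor(PAR_CPS, PAR_GEN_320)    # 4725/5632

# Sanity check against the 384-pixel-wide CPS screen mentioned earlier:
# the scaled widths land close to the 256 and 320 pixel modes.
snes_width = round(384 * cps_to_snes)   # 258
gen_width = round(384 * cps_to_gen)     # 322
```

As decimals the two factors are roughly 0.671 and 0.839, which is why 2/3 and 5/6 are good enough approximations for redrawing sprites by hand.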
Quote:
I bet they just wanted to get it done in time and knew it would sell like hotcakes regardless.
And when they did put more time into it, the result was Special Champion Edition, the Genesis counterpart to Turbo.
What annoys me the most about Street Fighter 2 is the huge-ass bars on top and bottom of the screen. There's just not enough stuff on-screen for SF2 to need extra v-blank.
Edit: I know why. They recycled code from Final Fight.
Does this thread have anything at all to do with froyo? What is going on? Why did you do this?
> Does this thread have anything at all to do with froyo?
They replaced the Cuzzin's chains up here with this "#froyo" place that is absolutely terrible. Nothing but ridiculous, gimmicky flavors now like "strawberry cheesecake", "oreo cookies", "triple fudge cake", etc. If I wanted those dessert flavors, I would eat those things.
Probably for the best that I eat less of it anyway. No fat, but I'm sure it was an obscene amount of sugar.
rainwarrior wrote:
Does this thread have anything at all to do with froyo? What is going on? Why did you do this?
Because it has even less to do with "SNES Tilemap Editor", the topic from which it was split.
Espozo wrote:
I thought that was a relatively new discovery, (coined "phantom bitmap" as I think tepples told me) and I'm 99% percent certain that Sonic 2 doesn't use it which they are showing off in the video.
A new discovery for the homebrew scene, but not for Sega developers
The term "blast processing" just comes from a sentence a developer used back in the day: "by blasting data with DMA into CRAM you can modify colors fast enough to produce a full colored line", or something close to that... The marketing guys liked the word "blast" and decided to use it as a reference to the general power of the machine rather than to a specific feature.
So it has nothing to do with Sonic 2; no game uses this trick.
Quote:
I'm not denying this is a very impressive feat, and would have increased the look of Sega CD games greatly so we wouldn't have terrible looking ones like Road Avenger that use one palette (which isn't the systems fault, but having access to all 512 colors sure as heck looks better than only 61).
On the Megadrive that trick is not of much interest: it requires a lot of memory (you can only fill half of the screen with the Genesis RAM) and it consumes a lot of CPU time, as it requires continuous DMA during the active bitmap display... Indeed it would have been much more interesting with the Sega CD, where the sub CPU could prepare the rendering in one part of the word RAM while the other part is being transferred by DMA to CRAM.
Quote:
I won't deny that the SNES had similar marketing, like how I remember reading an article where the people made it seem like the SNES could run every video mode put together, like saying 4 BG layers and then 512x448 resolution. The specifications would put the GBA to shame. I can't help but feel that things like 64x64 sized sprites and high resolution mode were only put in for bragging rights, as they are terribly impractical. Doesn't the TurboGrafx-16 actually have 256x224, 320x224, and 512x448 resolution modes?
A lot of the SNES features appear to exist just to exhibit bigger numbers, and honestly back then that worked :-p
We didn't know the constraints applying to these numbers...
Quote:
Wow. Good job! I've always wanted to see someone try to fix that game.
I had wanted to fix it for a long time, and honestly it was not difficult at all; the most difficult part was just integrating it well into their code.
Quote:
I've always felt the same way. I always thought of it as unfair how the Genesis got shatted on with SF2 in that it uses the same small graphics as the SNES port, which wasn't even that great to begin with either. (tiny sprites even for the resolution, bad voice samples, worse music relative to the arcade, (whoever says otherwise is deaf)) Heck, I just now made this, showing how they could have just horizontally squashed the sprites (The middle is for the SNES. For the Genesis, use something exactly in between the CPS1 and SNES ones in size. I didn't touch anything up) and it would have looked better than what they did (I could only find Ken, but close enough)
...
... I bet they just wanted to get it done in time and knew it would sell like hotcakes regardless.
Honestly there are weird things in the code of this game. For instance, it can transfer data to the same location in VRAM twice during a single frame, a real waste of bandwidth; I guess that explains why they needed to extend vblank (it's even worse on the SNES version). Another funny thing in the code of the MD version: when you enter the option screen and then exit it, the stack pointer is not restored to its initial position (some parameters probably never get released), meaning that you can potentially get the game to crash if you enter/exit the options many, many times :-p
byuu wrote:
> Does this thread have anything at all to do with froyo?
They replaced the Cuzzin's chains up here with this "#froyo" place that is absolutely terrible. Nothing but ridiculous, gimmicky flavors now like "strawberry cheesecake", "oreo cookies", "triple fudge cake", etc. If I wanted those dessert flavors, I would eat those things.
Probably for the best that I eat less of it anyway. No fat, but I'm sure it was an obscene amount of sugar.
And costs an obscene amount of money... I'd simply try the Splatoon froyo because it's Splatoon froyo, even though it looks terrible.
Man, too bad I don't live in Canada. I would have gotten my picture taken by this statue of an inkling with Thomas the Tank Engine's face superimposed on it.
tepples wrote:
So the proper factor when squashing from CPS to SNES is 135/176/(8/7), which is very close to two-thirds, and for CPS to Genesis it's 135/176/(32/35), which is close to five-sixths.
I did 2/3 and 5/6. It's convenient the vertical resolution is exactly the same.
psycopathicteen wrote:
What annoys me the most about Street Fighter 2 is the huge-ass bars on top and bottom of the screen. There's just not enough stuff on-screen for SF2 to need extrea v-blank.
Exactly. They could easily double buffer to fit everything in the available vblank. I think they take something like 16 pixels off the top and the bottom.
psycopathicteen wrote:
Edit: I know why. They recycled code from Final Fight.
I'm somehow not surprised, although this is a new low for Capcom... For such an anticipated game, you would have thought/hoped they would have put more time into it. Has the source code for the original CPS1 SF2 ever been released?
tepples wrote:
Because it has even less to do with "SNES Tilemap Editor", the topic from which it was split.
I think most of my threads turn into "anything goes".
(And I'm usually the one that makes it that way...) I don't mind though.
Quote:
On the Megadrive that trick is not of much interest: it requires a lot of memory (you can only fill half of the screen with the Genesis RAM) and it consumes a lot of CPU time, as it requires continuous DMA during the active bitmap display... Indeed it would have been much more interesting with the Sega CD, where the sub CPU could prepare the rendering in one part of the word RAM while the other part is being transferred by DMA to CRAM.
I wonder if the reason it was never done on the Sega CD is how much memory it would use, even for a CD. You know, how are color entries stored in C(G)RAM? (It's kind of odd there isn't a standard as to what to call these types of things...) I'd imagine it's like 64 bytes for 64 colors, plus 8 more bytes for the 9th bit, kind of like high OAM on the SNES, for 72 bytes total. The most convenient way I can think of to store this giant bitmap would be to have every pixel be 2 bytes, but that would be unnecessarily large.
Quote:
A lot of the SNES features appear to exist just to exhibit bigger numbers, and honestly back then that worked :-p
We didn't know the constraints applying to these numbers...
I think we know they forgot a pretty important number for marketing...
Stef wrote:
Honestly there are weird things in the code of this game. For instance, it can transfer data to the same location in VRAM twice during a single frame, a real waste of bandwidth; I guess that explains why they needed to extend vblank (it's even worse on the SNES version). Another funny thing in the code of the MD version: when you enter the option screen and then exit it, the stack pointer is not restored to its initial position (some parameters probably never get released), meaning that you can potentially get the game to crash if you enter/exit the options many, many times :-p
Expert programming.
(I think you've already heard psycopathicteen's views on Capcom in that regard...)
Quote:
Quote:
Edit: I know why. They recycled code from Final Fight.
I'm somehow not surprised, although this is a new low for Capcom... For such an anticipated game, you would have thought/hoped they would have put more time into it. Has the source code for the original CPS1 SF2 ever been released?
I was just making an educated guess, since Final Fight has the same bars.
Quote:
I wonder if the reason it was never done on the Sega CD is how much memory it would use, even for a CD. You know, how are color entries stored in C(G)RAM? (It's kind of odd there isn't a standard as to what to call these types of things...) I'd imagine it's like 64 bytes for 64 colors, plus 8 more bytes for the 9th bit, kind of like high OAM on the SNES, for 72 bytes total. The most convenient way I can think of to store this giant bitmap would be to have every pixel be 2 bytes, but that would be unnecessarily large.
Oh yeah, of course it would have consumed a lot of memory, but they could (and would have to) compress it somehow.
The 9-bit color is encoded in a word (16 bits): 0000 BBB0 GGG0 RRR0
Very straightforward to work with when generating bitmap rendering, but not so good in terms of raw size (2 bytes per pixel).
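To make the word layout concrete, here is a small Python sketch of packing and unpacking a color in the 0000 BBB0 GGG0 RRR0 format described above. The function names are invented for the example, and `bitmap_bytes` just restates the "2 bytes per pixel" cost for whatever dimensions you pass in; it does not assume any particular resolution for the DMA trick.

```python
def pack_cram(r, g, b):
    """Pack 3-bit R, G, B (0..7 each) into a word: 0000 BBB0 GGG0 RRR0."""
    for value in (r, g, b):
        if not 0 <= value <= 7:
            raise ValueError("each channel must be in 0..7")
    # Each channel sits in the top 3 bits of its nibble; bit 0 of the
    # nibble is unused.
    return (b << 9) | (g << 5) | (r << 1)


def unpack_cram(word):
    """Return (r, g, b) from a CRAM-format word, ignoring unused bits."""
    return (word >> 1) & 7, (word >> 5) & 7, (word >> 9) & 7


def bitmap_bytes(width, height):
    """Raw size of a bitmap stored as one CRAM word per pixel."""
    return width * height * 2
```

For example, `pack_cram(7, 7, 7)` gives `0x0EEE` (white), and `unpack_cram` masks off the unused bits, which matters if the word was read back from hardware.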
Quote:
Expert programming.
(I think you've already heard psycopathicteen's views on Capcom in that regard...)
Actually I can understand that the code is not the best; they had time and money constraints, and I would say that as long as the game is good enough we can accept it. They probably spotted the option screen memory leak but just didn't care about it, as nobody would ever notice it... Honestly I think the SNES version is definitely not bad, not bad at all even. The MD version is OK (they could have redrawn the graphics for the higher resolution, but that would definitely have been costly) and the palette work is not that bad... But the voices, just not acceptable!
Stef wrote:
The 9bits color is encoded in a word (16 bits) : 0000 BBB0 GGG0 RRR0
I thought the main reason they had 9 bit instead of 15 or 16 bit color is that it would have taken up almost twice the space in cram. Well, apparently not.
Stef wrote:
the palette work is not that bad...
I was honestly impressed too for what they had in that regard. Is it like both fighters have their own palette, and the BGs and the status bar have the remaining two?
Stef wrote:
But the voices, just not acceptable !
It's unfortunate that examples like this have led people to believe that the Genesis has very poor sound hardware when it's really just lazy programming. You may disagree with me, but I feel the same way about the SNES and games like Gradius 3 and Super R-Type that come to a crawl when there are 4+ objects onscreen. R-Type 3 makes up for it though, even though there is still a little bit of slowdown. It only really happens when the stage spins in level 1 (I'm assuming it's some sort of fancy collision detection?) and in some parts of the second run where there are 16+ enemies with 32+ projectiles. (It's actually wanted here).
Regarding the giant black bars on the top and bottom of the screen, I arranged the picture I posted into 18 sprites (I could have done it in a third as much and used about 4 more tiles, but there's only ever two characters anyway) and 45 tiles, less than a third of the DMA bandwidth.
Attachment:
SF2 Sprite Tiles.png
Espozo wrote:
18 sprites and 45 tiles, less than a quarter of the DMA bandwidth.
Fixed that for you.
What I don't get is how they managed to get Street Fighter Alpha 2 to hang for multiple seconds before every fight. Even with a 224-line active display, which they are not using, the SNES should be able to fill all of its memory (VRAM, WRAM, audio RAM) in a little over half a second. There's no SRAM. The S-DD1 can decompress graphics at the speed of DMA, and compressing BRR does nothing anyway. I can't imagine there's much precalculation to do. So what's the holdup?
Espozo wrote:
I thought the main reason they had 9 bit instead of 15 or 16 bit color is that it would have taken up almost twice the space in cram. Well, apparently not.
The CRAM is internal to the VDP die, and as it's static RAM it requires a lot of space. Even if the information is "presented" in word format, it looks like internally only the 9 meaningful bits are really stored; when you read the CRAM, the unused bits are usually set to random values.
Quote:
I was honestly impressed too for what they had in that regard. Is it like both fighters have their own palette, and the BGs and the status bar have the remaining two?
Exactly, which is the reason why 4 palettes is definitely really limiting! Only 2 palettes available for the background and status bar :-/
Stef wrote:
but I feel the same way about the SNES and games like Gradius 3 and Super R-Type that come to a crawl when there are 4+ objects onscreen. R-Type 3 makes up for it though, even though there is still a little bit of slowdown. It only really when the stage spins in level 1 (I'm assuming it's some sort of fancy collision detection?) and on some parts in the second run where there are 16+ enemies with 32+ projectiles. (It's actually wanted here).
These games could have been better coded, but remember they are first-generation games, and honestly the SNES sprite "features" (read: constraints) coupled with the "slow CPU" probably made it hard to sort out quickly (to get things released in time).
If you look at SNES Konami games you can see they really improved with time... still, even in their last released Parodius game you experience significant slowdowns when a lot of stuff happens on the screen. In this case you cannot really do much about it; you can always optimize your code more, but that's true for any system. In the end you still cannot animate as many sprites with the SNES CPU as with the MD one, and that counts for shmup-type games.
Quote:
Regarding the giant black bars on the top and bottom of the screen, I arranged the picture I posted into 18 sprites (I could have done it in a third as much and used about 4 more tiles, but there's only ever two characters anyway) and 45 tiles, less than a third of the DMA bandwidth.
We are definitely far from the bandwidth limit, even for the Zangief or Honda characters. But you cannot use the whole bandwidth for sprites; you have to consider that you need a bit of the VBlank for the SAT update, BG animations (tilemap update) and stuff like that. But still, they could have done a lot better overall (even better on the MD with the H40 mode).
93143 wrote:
What I don't get is how they managed to get Street Fighter Alpha 2 to hang for multiple seconds before every fight. Even with a 224-line active display, which they are not using, the SNES should be able to fill all of its memory (VRAM, WRAM, audio RAM) in a little over half a second. There's no SRAM. The S-DD1 can decompress graphics at the speed of DMA, and compressing BRR does nothing anyway. I can't imagine there's much precalculation to do. So what's the holdup?
I heard the game is busy unpacking data at this point, but why they weren't able to "hide" that unpacking process (in the versus screen for instance), no idea...
For some reason the earlier Konami SNES games are better known than their later games.
Some strange observation I've made is the more obscure an SNES game is, the less likely it is to have slowdown.
Stef wrote:
Espozo wrote:
I thought the main reason they had 9 bit instead of 15 or 16 bit color is that it would have taken up almost twice the space in cram. Well, apparently not.
The CRAM is internal to the VDP die, and as it's static RAM it requires a lot of space. Even if the information is "presented" in word format, it looks like internally only the 9 meaningful bits are really stored; when you read the CRAM, the unused bits are usually set to random values.
I theorize that VDP was supposed to operate on 4 bits per channel, like much arcade hardware at the time, and the decision to give it 3-bit color was last minute. I think this because it's weird that the LSB of each color nybble is disregarded.
Right now we have
xxxx bbbx gggx rrrx
If 9-bit color was planned, you would expect this more sane configuration:
xxxx xbbb xggg xrrr
So, I think that was a last minute snip from
xxxx bbbb gggg rrrr
Yeah, they considered it... too bad they ended up with 9-bit RGB in the end...
I think it would have been far better to have 12-bit RGB with 4+4 palettes instead of, for instance, the highlight/shadow effect and interlace mode... They really underestimated the impact of these choices. Anyway, past is past
The past is the past indeed.
I have been hard at work on a neat Megadrive port using SGDK... stay tuned
That sounds intriguing and interesting at the same time =)
Stef wrote:
93143 wrote:
What I don't get is how they managed to get Street Fighter Alpha 2 to hang for multiple seconds before every fight. Even with a 224-line active display, which they are not using, the SNES should be able to fill all of its memory (VRAM, WRAM, audio RAM) in a little over half a second. There's no SRAM. The S-DD1 can decompress graphics at the speed of DMA, and compressing BRR does nothing anyway. I can't imagine there's much precalculation to do. So what's the holdup?
I heard the game is busy unpacking data at this point, but why they weren't able to "hide" that unpacking process (in the versus screen for instance), no idea...
IIRC someone looked at what SFA2 is doing during the pause at the start of each round, and discovered that it's simply transferring data to the APU the entire time. The "Round... xxx... fight!" speech is quite high quality and takes up a big chunk of APU RAM; they overwrite it with the music/instrument/voice data for the match after playing it. They probably could have figured out a better way to do it without that horrible long pause, but I guess they didn't have a programmer with the SPC700 chops to do it or just didn't care.
AWJ wrote:
IIRC someone looked at what SFA2 is doing during the pause at the start of each round, and discovered that it's simply transferring data to the APU the entire time. The "Round... xxx... fight!" speech is quite high quality and takes up a big chunk of APU RAM; they overwrite it with the music/instrument/voice data for the match after playing it. They probably could have figured out a better way to do it without that horrible long pause, but I guess they didn't have a programmer with the SPC700 chops to do it or just didn't care.
I rather suspected that, considering the game hangs momentarily every time the announcer has to say something new.
I bet an HDMA audio streaming engine (plus a faster bulk APU loader) would get rid of the pauses and allow much higher-quality music. Maybe higher-quality voice samples too. I also wonder if it was really necessary to cut so much off the top and bottom of the image and still have such small sprites...
Kinda makes me want to re-port it. But that would be a gigantic task, and I'm already porting a game to the SNES, never mind that I have half a dozen other hobbies on the back burner, not to mention my actual work... and it's not like I'm a fighting game nut in the first place...
93143 wrote:
Kinda makes me want to re-port it. But that would be a gigantic task, and I'm already porting a game to the SNES, never mind that I have half a dozen other hobbies on the back burner, not to mention my actual work... and it's not like I'm a fighting game nut in the first place...
I'd be more interested in the original SF2, simply because I like it more. In fact, it's probably the only fighting game I like. And yes, the small sprites are unreasonable. I cleaned up the shrunken sprite I made in about 5 minutes, and it makes me wonder why Capcom didn't just do this, considering it's considerably more work to make it look good if you shrink it both vertically and horizontally. I also added a hadoken for the heck of it, and it looks perfectly fine also.
Attachment:
Clean Sprite.png
I might resize all the Ryu sprites, which should only take about a couple of hours tops to clean them all up. The one think I would be concerned about though is the tilemap... (If the game was horizontally squashed, you could at least use hdma to keep the palette and character area proportional)
Still, does no one know anything about the Street Fighter 2 source code?
Actually, who am I kidding, I've got more than enough things to do. I still think Capcom didn't need those giant bars.
I did this a while back, the first time I contemplated Alpha 2 (new file date due to minor tweaks and a new comparison sprite from the actual port):
Attachment:
Dan_SNES.png
Attachment:
Dan_SNESscreen.png
Mind you, Zangief is quite large in this game... Birdie, Sodom (Katana) and Vega (Bison) aren't small either...
AWJ wrote:
Stef wrote:
93143 wrote:
What I don't get is how they managed to get Street Fighter Alpha 2 to hang for multiple seconds before every fight. Even with a 224-line active display, which they are not using, the SNES should be able to fill all of its memory (VRAM, WRAM, audio RAM) in a little over half a second. There's no SRAM. The S-DD1 can decompress graphics at the speed of DMA, and compressing BRR does nothing anyway. I can't imagine there's much precalculation to do. So what's the holdup?
I heard the game is busy unpacking data at this point, but why they weren't able to "hide" that unpacking process (in the versus screen for instance), no idea...
IIRC someone looked at what SFA2 is doing during the pause at the start of each round, and discovered that it's simply transferring data to the APU the entire time. The "Round... xxx... fight!" speech is quite high quality and takes up a big chunk of APU RAM; they overwrite it with the music/instrument/voice data for the match after playing it. They probably could have figured out a better way to do it without that horrible long pause, but I guess they didn't have a programmer with the SPC700 chops to do it or just didn't care.
The first change I'd make to SFA2 on SNES is the awful instruments used for the music. It is as if the music was forgotten until the last day.
93143 wrote:
I also wonder if it was really necessary to cut so much off the top and bottom of the image and still have such small sprites...
How much ROM space do they take up?
It's very likely that they made the graphics small enough to fit in the cartridges being sold at the time. The arcade games were all at least 58.5 Mbit:
The console versions were a lot smaller. The original Street Fighter II for Super NES was 16 Mbit, Turbo was 20 Mbit to support player use of boss characters plus frames for a couple new moves, and Super was 32 Mbit to support the five new characters (including a new Ken replacing what was largely a head-swapped Ryu) and more new moves.
psycopathicteen claims that console Street Fighter II shares code with console Final Fight. Prototypes of Final Fight were called Street Fighter '89. I wonder how much code arcade Street Fighter II shares with arcade Final Fight.
And no, bars aren't necessarily bad. A game with a 256x176 pixel playfield (NTSC) or a 256x208 playfield (PAL) would have huge bars but would zoom well on a modern TV without cutting off more than a couple pixels at the top and bottom, especially if it uses interlace.
Star Ocean was released before SFA2, used the same chip plus a save RAM, and was 50% larger.
Still, I can certainly see Capcom deciding 32 Mb of fast ROM should be plenty and designing to that target...
If I were trying to improve on the port, which I'm not, it might be worth considering a two-stage effort: first, try to fix or at least ameliorate the pauses and the terrible music in the original without expanding the ROM; second, boost the size of the graphics and the quality of the sound and music by going to 48 or 64 Mb (or even higher if necessary). After all, the Nintendo 64 was out by this point - or were 8-bit ROMs more expensive for a given size?
It does seem to me that a moderate-sized S-DD1 cart (32 Mb or less?) should allow a more or less arcade-perfect Street Fighter II port, at least graphically...
...
Regarding HDMA streaming, does anyone know if it's reasonable to expect to be able to write a general-purpose audio engine capable of turning around fast enough to reliably grab a command byte and a couple bytes of sample data from the I/O ports every scanline while playing music? Or if one scanline is too fast, maybe a few bytes every two or three scanlines? I want this for my shmup port, so it's not an academic question...
...at 3 BRR bytes per shot, plus the line counter (if not firing every scanline) and an APU control byte, it seems storing audio in HDMA format is only slightly smaller than uncompressed 8-bit PCM... but it sure simplifies the CPU-side handling...
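To put rough numbers on that last estimate, here is a minimal sketch. It assumes the standard BRR layout of 9 bytes per 16 samples, and it assumes one line-counter byte and one control byte per HDMA "shot"; those two overhead sizes are illustrative guesses about the table format, not a tested engine.

```python
from fractions import Fraction

BRR_BLOCK_BYTES = 9      # one BRR block: 1 header byte + 8 data bytes
BRR_BLOCK_SAMPLES = 16   # samples decoded from one BRR block


def hdma_bytes_per_shot(brr_bytes=3, line_counter=1, control=1):
    """Bytes stored in ROM for one HDMA transfer (assumed layout)."""
    return brr_bytes + line_counter + control


def samples_per_shot(brr_bytes=3):
    """Average number of audio samples carried by one transfer."""
    return Fraction(brr_bytes * BRR_BLOCK_SAMPLES, BRR_BLOCK_BYTES)


stored = hdma_bytes_per_shot()
samples = samples_per_shot()
# Stored bytes per sample; uncompressed 8-bit PCM would be exactly 1.
bytes_per_sample = Fraction(stored) / samples

print(f"{stored} bytes stored per shot, {float(samples):.2f} samples per shot")
print(f"{float(bytes_per_sample):.4f} bytes per sample (8-bit PCM = 1.0)")
```

Under those assumptions the HDMA-ready stream costs 15/16 of a byte per sample, which matches the "only slightly smaller than 8-bit PCM" estimate. Dropping the line counter (firing every scanline) would bring it down to 3/4.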
tepples wrote:
And no, bars aren't necessarily bad. A game with a 256x176 pixel playfield (NTSC) or a 256x208 playfield (PAL) would have huge bars but would zoom well on a modern TV without cutting off more than a couple pixels at the top and bottom, especially if it uses interlace.
I'm not designing for a modern "TV", still less would Capcom have been. I don't see why I should sacrifice quality for a widescreen experience at SNES resolution, besides which a lot of the games I like (including the one I'm porting) don't work well with even a little bit of lag.
(I'm sorry; I'm just really annoyed that you can't plug an old console into an HDTV and expect a usable result.
I wonder if it's possible to make an HDTV that can straight-up emulate an NTSC CRT with sub-millisecond lag, converting each scanline as it comes in, overwriting the 1080p image buffer one 240p (or 480i) line at a time and updating the display progressively to minimize the buffering delay... and of course responding properly to slightly nonstandard signals that would work perfectly on a CRT but tend to confuse modern image conversion algorithms...
And now you've got me thinking about hires modes. Since "retro" graphics are in vogue, or were last time I checked, maybe a high-resolution SNES game could be well-received. You wouldn't have enough VRAM or PPU time for the florid multilayer design aesthetic seen in high-end official SNES games, but you'd have way more real estate, and widescreen wouldn't feel cramped.)
93143 wrote:
I wonder if it's possible to make an HDTV that can straight-up emulate an NTSC CRT with sub-millisecond lag, converting each scanline as it comes in, overwriting the 1080p image buffer one 240p (or 480i) line at a time and updating the display progressively to minimize the buffering delay... and of course responding properly to slightly nonstandard signals that would work perfectly on a CRT but tend to confuse modern image conversion algorithms...
Of course it's possible. Vendors have simply decided it's not worth the effort.
lidnariq wrote:
93143 wrote:
I wonder if it's possible to make an HDTV that can straight-up emulate an NTSC CRT with sub-millisecond lag, converting each scanline as it comes in, overwriting the 1080p image buffer one 240p (or 480i) line at a time and updating the display progressively to minimize the buffering delay... and of course responding properly to slightly nonstandard signals that would work perfectly on a CRT but tend to confuse modern image conversion algorithms...
Of course it's possible. Vendors have simply decided it's not worth the effort.
And people wonder why I still play all my games on a CRT.
Khaz wrote:
lidnariq wrote:
Of course it's possible. Vendors have simply decided it's not worth the effort.
And people wonder why I still play all my games on a CRT.
The biggest reasons I can think of why it's not worth the effort:
1. The people who actually realize there's a difference and would care enough to buy such a TV is a small subset of the (already relatively small) retro-gamers market.
2. Their solution would be competing with CRTs, which are currently very cheap and available.
3. Additional competition with existing partial solutions like the XRGB Framemeister.
I think the market is just too small for a TV manufacturer to want to address. There isn't enough to gain, and making TVs is a mass-market thing. The only thing that would get this done is a hobbyist who really cares about it, like what Kevtris is doing with the HD NES, or even the XRGB is in a category of devices that can be made by one person or a small company. I've never seen a hobbyist making a custom HDTV; I think that might be too big a project for such a small scale of return.
Khaz wrote:
And people wonder why I still play all my games on a CRT.
My mother got rid of mine.
93143 wrote:
Mind you, Zangief is quite large in this game...
Well, this is what I did:
Attachment:
Zangief SNES.png
It's 51 sprites (I could have easily done it in less than half the amount if I weren't trying to save tiles...) and in 92 tiles, which is almost exactly half. You'd definitely need to double buffer... (which it should then be perfectly fine.)
You know, just shrinking these sprites actually turned out pretty good, as I didn't even tamper with it. I honestly have no clue as to why Capcom would create an arcade machine with a crazy wide resolution when a ton of its games run on 3:4 monitors. I'd imagine they could have gone with 320x224 and added an extra BG layer or more sprite overdraw or something.
Espozo wrote:
Khaz wrote:
And people wonder why I still play all my games on a CRT.
My mother got rid of mine.
93143 wrote:
Mind you, Zangief is quite large in this game...
Well, this is what I did:
Attachment:
Zangief SNES.png
It's 51 sprites (I could have easily done it in less than half the amount if I weren't trying to save tiles...) and in 92 tiles, which is almost exactly half. You'd definitely need to double buffer... (which it should then be perfectly fine.)
You know, just shrinking these sprites actually turned out pretty good, as I didn't even tamper with it. I honestly have no clue as to why Capcom would create an arcade machine with a crazy wide resolution when a ton of its games run on 3:4 monitors. I'd imagine they could have gone with 320x224 and added an extra BG layer or more sprite overdraw or something.
The game isn't "wide", it is designed for 4:3. It has a tall pixel aspect ratio, similar to how the NES has an 8:7 fat pixel aspect ratio. Conversely the CPS1/2 and similar systems can pack in a little more horizontal detail. It is intended to be "squished" to a 4:3 aspect.
mikejmoffitt wrote:
3:4 monitors
Oops, I meant 4:3.
mikejmoffitt wrote:
Conversely the CPS1/2 and similar systems can pack in a little more horizontal detail. It is intended to be "squished" to a 4:3 aspect.
I thought the main goal was generally to have square pixels. If possible, I would have tried to cut some pixels off the sides and add them to the top and bottom for not as much horizontal detail, but more vertical detail. It would make me crazy trying to stretch out all the artwork for that aspect ratio. I suppose whatever they were using to make the artwork could have also been running at a similar aspect ratio. The main reason I thought they did this was for vertical shooters, and it looks fine squashed anyway so they didn't bother.
Square pixels were converged on solely because everything else is sillier. But there's no particular reason—especially on CRTs with "soft" pixels unlike modern LCD/plasma "hard" pixels—to pick any given PAR over any other.
Subpixel rendering on LCDs uses a PAR of 1:3.
The 256px/512px (8:7, 4:7) modes enforced by the ColecoVision, NES, SNES, and available on the Genesis, TG16, and PS1 were chosen for the convenience of having a width that was a power of 2.
The 160px/320px/640px (12:7, 6:7, 3:7) modes enforced by the Atari 2600, 5200, 7800, IBM CGA, and available on the Genesis and TG16 were chosen because then the same clock could be used for both pixels and color encoding.
The PS1's 320/370/640px (≈10:11/8:11/19:40 NTSC ; ≈10:9/5:6/21:40 PAL) modes are a little weirder, but they come from only being able to use pixel clocks that were integer divisions of (NTSC×15=53.7MHz ; PAL×12=53.2MHz)
The VIC-20 (3:2 NTSC or 5:3 PAL) and C64's (3:4 or 5:6) pixel clocks were chosen to produce an actually perfectly NTSC/PAL compliant signal (at least, other than progressive scan vs interlace), unlike the vast majority of their contemporaries.
Espozo wrote:
Khaz wrote:
And people wonder why I still play all my games on a CRT.
My mother got rid of mine.
93143 wrote:
Mind you, Zangief is quite large in this game...
Well, this is what I did:
Attachment:
Zangief SNES.png
It's 51 sprites (I could have easily done it in less than half the amount if I weren't trying to save tiles...) and in 92 tiles, which is almost exactly half. You'd definitely need to double buffer... (which it should then be perfectly fine.)
You know, just shrinking these sprites actually turned out pretty good, as I didn't even tamper with it. I honestly have no clue as to why Capcom would create an arcade machine with a crazy wide resolution when a ton of its games run on 3:4 monitors. I'd imagine they could have gone with 320x224 and added an extra BG layer or more sprite overdraw or something.
Or just delay the animation by a TV frame if both characters end up being animated on the same frame. DKC does that.
In a lot of these classic 240p systems, the pixel clock rate is chosen not to have perfectly square pixels (which is exactly 135/22 MHz per Rec. 601) as much as to have a nice ratio with other rates derived from the same oscillator. For example:
- The Missile Command hardware's dot clock is 5 MHz, which is 4 times its 6502 clock.
- Neo Geo's dot clock is 6 MHz, which is half its 68000 clock.
- CPS's dot clock is 8 MHz, which is the same as CPS1's 68000 clock and two-thirds of CPS2's 68000 clock.
- Namco systems from this era tend to have a 288x224 pixel picture with a 6.144 MHz dot clock, which is 128 times the DAT and DVD sample rate and conveniently about 0.12% away from square. Ms. Pac-Man clocks its Z80 at half dot clock, and ToyPop clocks its 68000 at dot clock.
When one of these oscillators produces NTSC modulation, you get nice ratios to the 315/88 = 3.58 MHz color burst. NTSC-related crystals are even used in arcades because they were so easy to get:
- Atari 2600 pixel clock equals color burst.
- Apple II HGR pixel rate is twice color burst, as are hires pixels in Atari 7800 and plenty of Atari arcade systems.
- Amiga and CGA can be set to 2 or 4 times color burst.
- The IBM PC's interval timer, used for the speaker, DMA refresh, and 18.2 Hz clock, is clocked at the same speed as the 2600's: one-third color burst or 1.19 MHz.
- NES's dot clock of 945/176 = 5.37 MHz is three times its CPU clock but also 3/2 times NTSC color burst.
- The dot clock of Commodore 64 and "super hi-res" mode in the Apple IIGS is 16/7 times color burst or 8.18 MHz, which produces 3:4 pixels (about as narrow as pixels get) but is exactly 8 times its 6502 clock.
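The ratios in these lists can be checked with exact fractions. This is only a sketch using the figures quoted in this post (315/88 MHz color burst, 135/22 MHz square-pixel clock); the `par` helper and the dictionary are names made up for the illustration.

```python
from fractions import Fraction

COLOR_BURST = Fraction(315, 88)    # NTSC color burst in MHz
SQUARE_CLOCK = Fraction(135, 22)   # dot clock in MHz that gives square pixels

# Dot clocks in MHz, taken from the figures quoted above.
dot_clocks = {
    "Atari 2600": COLOR_BURST,
    "NES": Fraction(945, 176),
    "Commodore 64": COLOR_BURST * Fraction(16, 7),
}


def par(dot_clock):
    """Pixel aspect ratio (width:height); a faster dot clock means narrower pixels."""
    return SQUARE_CLOCK / dot_clock


for name, clock in dot_clocks.items():
    ratio = par(clock)
    print(f"{name}: {float(clock):.2f} MHz, "
          f"{clock / COLOR_BURST} x color burst, "
          f"PAR {ratio.numerator}:{ratio.denominator}")
```

The results line up with the 12:7, 8:7 and 3:4 figures mentioned elsewhere in the thread.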
lidnariq wrote:
But there's no particular reason—especially on CRTs with "soft" pixels unlike modern LCD/plasma "hard" pixels—to pick any given PAR over any other.
Rotating an image by 90 degrees produces better results with a squarer PAR. Circular motion is easier with a squarer PAR. And if your game takes place on a grid, an oddball PAR may cause problems unless your engine can somehow compensate for PAR by sizing grid spaces with the opposite ratio. The PAR in Apple II's HGR mode is 6:7, but a 14x12 pixel grid space is perfectly square and meshes with the frame buffer's oddball 7-pixel slivers. The PAR of Super NES is 8:7, but Zoop for Super NES mostly compensates by drawing to 12x14 pixel grid spaces.
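A quick way to see how well a grid cell cancels out the PAR, as a sketch (the `cell_aspect` helper is a name invented for this example):

```python
from fractions import Fraction


def cell_aspect(width_px, height_px, par):
    """Displayed width:height of a grid cell; exactly 1 means square on screen."""
    return Fraction(width_px, height_px) * par


apple_hgr = cell_aspect(14, 12, Fraction(6, 7))   # Apple II HGR, PAR 6:7
zoop_snes = cell_aspect(12, 14, Fraction(8, 7))   # Super NES, PAR 8:7

print(f"Apple II 14x12 cell: {apple_hgr} ({float(apple_hgr):.3f})")
print(f"Zoop 12x14 cell: {zoop_snes} ({float(zoop_snes):.3f})")
```

The Apple II cell comes out exactly square, while the 12x14 Super NES cell is about 2% narrower than square, which fits "mostly compensates".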
psycopathicteen wrote:
Or just delay the animation by a TV frame if both characters end up being animated on the same frame. DKC does that.
So does the NES game I'm working on. It takes up to 2 frames to upload a sprite cel to CHR RAM, and other video memory updates can delay it. But it's double buffered, and one advantage of double buffering is that you can speculatively load, say, the next frame in a walk cycle.
tepples wrote:
It's very likely that they made the graphics small enough to fit in the cartridges being sold at the time. The arcade games were all at least 58.5 Mbit:
The console versions were a lot smaller. The original Street Fighter II for Super NES was 16 Mbit, Turbo was 20 Mbit to support player use of boss characters plus frames for a couple new moves, and Super was 32 Mbit to support the five new characters (including a new Ken replacing what was largely a head-swapped Ryu) and more new moves.
psycopathicteen claims that console Street Fighter II shares code with console Final Fight. Prototypes of Final Fight were called Street Fighter '89. I wonder how much code arcade Street Fighter II shares with arcade Final Fight.
And no, bars aren't necessarily bad. A game with a 256x176 pixel playfield (NTSC) or a 256x208 playfield (PAL) would have huge bars but would zoom well on a modern TV without cutting off more than a couple pixels at the top and bottom, especially if it uses interlace.
I wonder what fills arcade games up. I wonder if there are a lot of unused graphics, duplicates or unused space.
Tons and tons of tweens. And no compression, because these are plain CHR-ROM.
lidnariq wrote:
Tons and tons of tweens.
Like this?
"Tweens" here means frames of inbetweening.
I still think the arcade game was kind of choppy though. I think most of the ROM size difference comes from the bigger sprite size. A lot of it could be BG animation, since the BGs of fighting games don't have a lot of repeated tiles.
I'm assuming you mean Street Fighter 2? Honestly, after learning more about the SNES and looking at the arcade version of Street Fighter Alpha 2 again, there are so many things that I'd have no clue what to do in trying to port it over. One major thing I saw was a 4bpp BG3, (Street Fighter 2 really doesn't use it that often, and it can mostly be made with sprites or improvised easily with 2bpp graphics) and with that shadow trail thing during a special, you pretty much have to make it to where a fighter uses less than 32 sprites, because there are 3 shadow trail sprites along with the regular character. Although this shadow trail thing probably doesn't really use any DMA bandwidth (set it up to where the previous frames of the fighter are used for the shadow) I imagine it could fill up sprite vram quickly. I guess I'd try to add it, but it wouldn't be the biggest deal without it. Also, if you want to keep BG3, you'd have to make the scoreboard out of sprites. This wouldn't be "arcade perfect" like a SF2 port, but it would sure as hell be better than what Capcom did.
Anyway, If you're lazy like me, here's a video of the game:
https://www.youtube.com/watch?v=cFZLsfhw4m0 (you got to love the awful tilemap "rotation" thing they like to show off a lot.)
You know, I can't find Street Fighter Alpha 2 sprites anywhere. I wish MAME weren't so picky with ROMs... (It's got to be the right ROM for the right version or whatever. Does the arcade machine change whenever a new version of MAME gets released, because I don't think there should be any reason for the different ROM versions.)
Espozo wrote:
I'm assuming you mean Street Fighter 2? Honestly, after learning more about the SNES and looking at the arcade version of Street Fighter Alpha 2 again, there are so many things that I'd have no clue what to do in trying to port it over. One major thing I saw was a 4bpp BG3, (Street Fighter 2 really doesn't use it that often, and it can mostly be made with sprites or improvised easily with 2bpp graphics) and with that shadow trail thing during a special, you pretty much have to make it to where a fighter uses less than 32 sprites, because there are 3 shadow trail sprites along with the regular character. Although this shadow trail thing probably doesn't really use any DMA bandwidth (set it up to where the previous frames of the fighter are used for the shadow) I imagine it could fill up sprite vram quickly. I guess I'd try to add it, but it wouldn't be the biggest deal without it. Also, if you want to keep BG3, you'd have to make the scoreboard out of sprites. This wouldn't be "arcade perfect" like a SF2 port, but it would sure as hell be better than what Capcom did.
Anyway, If you're lazy like me, here's a video of the game:
https://www.youtube.com/watch?v=cFZLsfhw4m0 (you got to love the awful tilemap "rotation" thing they like to show off a lot.)
You know, I can't find Street Fighter Alpha 2 sprites anywhere. I wish MAME weren't so picky with ROMs... (It's got to be the right ROM for the right version or whatever. Does the arcade machine change whenever a new version of MAME gets released, because I don't think there should be any reason for the different ROM versions.)
If your MAME installation complains that CPS2 games are missing ROMs when you try to run them, then what you're probably missing is qsound.zip, which contains the internal ROM of the system's audio DSP. Same reason you can't run DSP-1 games, etc., on a SNES emulator with LLE'd coprocessors (e.g. bsnes) unless you have the appropriate firmware ROMs.
Thank you!
On an unrelated note, Jesus, how much overdraw is available on the CPS2? It's like Neo Geo class sprite capabilities but with 3 BG layers. (Look at Chun Li's stage with all the bikers, and whenever you win a match, a giant portrait of your character made of sprites is put on top of everything.)
Wow, that game sure has nice music and animation.
I think that would be a legitimate reason to use the MSU.
I agree. It's kind of neat, but I think FMV is kind of wasted potential. :/ (it's just simply DMAing tiles.) It's kind of saddening to see how many SNES and Genesis ports were butchered due to memory constraints and not actual hardware constraints.
Anyway, I tried a shot at the tilemap, and ohh boy... This is the kind of game that will use 4x the amount of palettes for just two extra colors, and will somehow use boatloads of gradients. (which get messed up when changing the size of the picture...) It's almost like they just randomly threw the tilemap together, as when I looked in MAME, I actually found palettes that were exactly the same as others except that they were missing a few colors (filled with magenta.) Literally, in BG 2, they use an extra palette for just 1 more color in a gradient.
Anyway, I said forget that, and I made this "inferior" one less color BG scaled to SNES size:
Attachment:
SFA2 SNES BG2.png
And I attempted BG 1, but the area where the snow and the tree meets is just a nightmare. Here's pretty much just a squashed version with some obvious color palette issues corrected. (colors from the tree were being used on the purple rock things to the left, and they don't share the same palette in the game)
Attachment:
SFA2 SNES BG1.png
You know, didn't Khaz make some sort of program that converts photos into 4bpp tilemap graphics? Is there a way to set it to where it doesn't use all 8 palettes and only uses 5 or something? Also, I know that you can change the color of one color every scanline using HDMA, but is it possible to have it to where say on line 64, it changes color 0, on line 65, it changes color 1, and on line 66, it changes color 2?
Also, somewhat unrelated, but I think I found a good use for the generally impractical mid scanline thing: a fighting game health bar. If the health bar is only something like 10 pixels tall, shouldn't this not be too hard at all?
Espozo wrote:
You know, didn't Khaz make some sort of program that converts photos into 4bpp tilemap graphics? Is there a way to set it to where it doesn't use all 8 palettes and only uses 5 or something?
When I was playing with the GBA, someone made a program called "Quither", but that's it.
Espozo wrote:
You know, didn't Khaz make some sort of program that converts photos into 4bpp tilemap graphics? Is there a way to set it to where it doesn't use all 8 palettes and only uses 5 or something
Why yes, I did do that and it can do that very easily! I think the number of output palettes is simply a command-line flag on the Python script; it defaults to 8.
If you find a use for my program like that please let me know, I'd be excited to hear about it!
Espozo wrote:
I agree. It's kind of neat, but I think FMV is kind of wasted potential. :/ (it's just simply DMAing tiles.) It's kind of saddening to see how many SNES and Genesis ports were butchered due to memory constraints and not actual hardware constraints.
Anyway, I tried a shot at the tilemap, and ohh boy... This is the kind of game that will use 4x the amount of palettes for just two extra colors, and will somehow use boatloads of gradients. (which get messed up when changing the size of the picture...) It's almost like they just randomly threw the tilemap together, as when I looked in MAME, I actually found palettes that were exactly the same as others except that they were missing a few colors (filled with magenta.) Literally, in BG 2, they use an extra palette for just 1 more color in a gradient.
Anyway, I said forget that, and I made this "inferior" one less color BG scaled to SNES size:
Attachment:
SFA2 SNES BG2.png
And I attempted BG 1, but the area where the snow and the tree meets is just a nightmare. Here's pretty much just a squashed version with some obvious color palette issues corrected. (colors from the tree were being used on the purple rock things to the left, and they don't share the same palette in the game)
Attachment:
SFA2 SNES BG1.png
You know, didn't Khaz make some sort of program that converts photos into 4bpp tilemap graphics? Is there a way to set it to where it doesn't use all 8 palettes and only uses 5 or something? Also, I know that you can change the color of one color every scanline using HDMA, but is it possible to have it to where say on line 64, it changes color 0, on line 65, it changes color 1, and on line 66, it changes color 2?
Also, somewhat unrelated, but I think I found a good use for the generally impractical mid scanline thing: a fighting game health bar. If the health bar is only something like 10 pixels tall, shouldn't this not be too hard at all?
What does the original BG look like in the arcade?
BTW, I found a website with the CPS2 game ROM sizes
http://digilander.libero.it/calimeroseg ... ps2_2.html and Street Fighter Alpha 2 is listed as 27,3MB, which I'm guessing the comma is supposed to be a decimal point. 27.3MB would be ~218 megabits. I'm surprised it's not that much bigger than Super Street Fighter 2 Turbo. Must've had a smaller PRG-ROM. If the game is mostly graphics, and because SNES graphics have 2/3 the resolution, then uncompressed, the game would take up ~146 megabits, and if you are able to compress the graphics with a 2:1 ratio, you can fit it in 73 megabits. In that case you probably could barely fit the game in a stock hardware ROM size. The only issue would be to find an algorithm that is fast and memory efficient enough.
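Spelling out that arithmetic as a sketch: it takes the listed 27.3 MB at face value and, like the estimate above, assumes the ROM is essentially all graphics, that SNES-sized art needs 2/3 of the data, and that a 2:1 compression ratio is achievable. None of those assumptions has been measured.

```python
ARCADE_MEGABYTES = 27.3        # size listed on the linked page
RESOLUTION_FACTOR = 2 / 3      # assumed: SNES graphics need 2/3 of the data
COMPRESSION_RATIO = 2          # assumed: 2:1 compression

arcade_mbit = ARCADE_MEGABYTES * 8
snes_uncompressed_mbit = arcade_mbit * RESOLUTION_FACTOR
snes_compressed_mbit = snes_uncompressed_mbit / COMPRESSION_RATIO

print(f"arcade:            {arcade_mbit:.1f} Mbit")
print(f"SNES uncompressed: {snes_uncompressed_mbit:.1f} Mbit")
print(f"SNES compressed:   {snes_compressed_mbit:.1f} Mbit")
```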
Having a MSU will still give it a benefit with audio, since it can stream music directly, bypassing the SPC700. With the MSU approach, I think it will be possible to use the 65816 to do sound effects mixing, while the MSU is running the music.
psycopathicteen wrote:
If the game is mostly graphics, and because SNES graphics have 2/3 the resolution, then uncompressed, the game would take up ~146 megabits, and if you are able to compress the graphics with a 2:1 ratio, you can fit it in 73 megabits. In that case you probably could barely fit the game in a stock hardware ROM size. The only issue would be to find an algorithm that is fast and memory efficient enough.
Well, the real one uses an S-DD1, which has a pretty efficient compression scheme designed for SNES bitplanes and can decompress at the speed of DMA. I believe it can also bankswitch to allow up to 256 MB (sic.) of ROM.
Quote:
Having a MSU will still give it a benefit with audio, since it can stream music directly, bypassing the SPC700. With the MSU approach, I think it will be possible to use the 65816 to do sound effects mixing, while the MSU is running the music.
How many sound effect channels does the arcade game use?
If HDMA audio streaming could be incorporated into the audio engine at a high enough bitrate, it could be used for voice work and most of the SPC700's RAM could be dedicated to high-quality instrument samples. An improved bulk APU loader could switch out data in the background without freezing the game, so each tune could use a unique sample set if necessary. With judicious use of the echo buffer, it should be possible (?) to do a pretty decent rendition of the arcade game's music without using too many channels...
I've got to admit; I'm dubious about using the MSU1 here. For a new game, or a port of a game that would be possible but for the memory limitations (like Road Blaster), or for a cool hack of an already-good game (like SMW or Zelda 3), it's great. But for something like this, where the problem is that the original game was (apparently) somewhat botched and we're (hypothetically) trying to fix it, I'd feel better about just pushing the period hardware to its limits, even if it ended up not quite as good.
Also, is there enough CPU time for software mixing without causing slowdown? Especially BRR mixing in stereo? Even if you were using MSU1, it still seems like a good idea to implement HDMA streaming for multiple independent voice samples (I suspect it would be much easier without a music engine in the way) and just preload regular sound effects into the audio RAM. The streaming bitrate requirement for a couple of 22 kHz BRRs would be far less than that for streaming directly to the echo buffer.
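A rough comparison of those two bitrates, as a sketch. It assumes BRR at 9 bytes per 16 samples, two voices at a nominal 22000 Hz, and an echo buffer holding 16-bit stereo samples at the DSP's 32000 Hz output rate; the constant names are made up for the example and nothing here has been tested on hardware.

```python
BRR_BLOCK_BYTES = 9       # bytes per BRR block
BRR_BLOCK_SAMPLES = 16    # samples per BRR block
DSP_RATE_HZ = 32000       # S-DSP output rate
ECHO_BYTES_PER_FRAME = 4  # assumed: 16-bit left + 16-bit right per sample


def brr_bytes_per_second(sample_rate_hz, voices=1):
    """Streaming rate needed to keep `voices` BRR samples fed."""
    return sample_rate_hz * BRR_BLOCK_BYTES / BRR_BLOCK_SAMPLES * voices


two_voices = brr_bytes_per_second(22000, voices=2)
echo_stream = DSP_RATE_HZ * ECHO_BYTES_PER_FRAME

print(f"two 22 kHz BRR voices: {two_voices:.0f} bytes/s")
print(f"echo buffer stream:    {echo_stream} bytes/s")
print(f"ratio: {echo_stream / two_voices:.1f}x")
```

On those assumptions the echo-buffer route needs roughly five times the bandwidth of two streamed BRR voices.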
(I think my issue is partly with offloading something to the MSU1 instead of fully utilizing the existing facilities. It feels less lazy if it's used in addition to energetic and creative use of the period hardware, though it still feels a bit like cheating in this context.)
Espozo wrote:
Also, somewhat unrelated, but I think I found a good use for the generally impractical mid scanline thing: a fighting game health bar. If the health bar is only something like 10 pixels tall, shouldn't this not be too hard at all?
I don't get it. What do you suppose a mid-scanline raster effect could do to a fighting game health bar?
93143 wrote:
I don't get it. What do you suppose a mid-scanline raster effect could do to a fighting game health bar?
You would change the scroll values for one of the BG planes to an area of the tilemap that's just the health bar because all the BGs are in use.
93143 wrote:
How many sound effect channels does the arcade game use?
Don't have a clue. The hardware appears to use the GBA kind of setup to where there are two audio channels for each speaker, but that all the sound effects get mixed together for a channel.
93143 wrote:
Well, the real one uses an S-DD1, which has a pretty efficient compression scheme designed for SNES bitplanes and can decompress at the speed of DMA. I believe it can also bankswitch to allow up to 256 MB (sic.) of ROM.
93143 wrote:
I've got to admit; I'm dubious about using the MSU1 here. For a new game, or a port of a game that would be possible but for the memory limitations (like Road Blaster), or for a cool hack of an already-good game (like SMW or Zelda 3), it's great. But for something like this, where the problem is that the original game was (apparently) somewhat botched and we're (hypothetically) trying to fix it, I'd feel better about just pushing the period hardware to its limits, even if it ended up not quite as good
How is this any worse? All we're really doing is just adding memory via a memory controller, or at least I'm pretty sure that that's all the MSU1 does. I'd say "cheating" if we were using an additional processor for game logic. I can't imagine fighting games require much processing, although Kirby Superstar and Super Mario RPG use the SA1 so I don't have a clue.
Also, about the SNES version having much worse music than the arcade version, I'm pretty sure half of the problem is the awful instrument choice.
psycopathicteen wrote:
How does the original BG look like in the arcade?
Like this:
Attachment:
BG1.png [ 14.31 KiB | Viewed 1658 times ]
Attachment:
BG2.png [ 3.38 KiB | Viewed 1658 times ]
Khaz wrote:
python
Don't tell me I need some sort of python compiler...
Espozo wrote:
You would change the scroll values for one of the BG planes to an area of the tilemap that's just the health bar because all the BGs are in use.
But the health bar in most fighters covers just about the whole screen horizontally. Wouldn't changing stuff around during HBlank accomplish the same thing?
Quote:
How is this any worse? All we're really doing is just adding memory via a memory controller, or at least I'm pretty sure that that's all the MSU1 does.
It also adds CD-quality audio streaming direct from the cartridge, bypassing the SNES APU. Which is what psycopathicteen was suggesting. As for me, I'd rather not go that route unless it's clear the APU is incapable of doing an adequate job no matter how clever we get. The idea here is that SFA2 could have been ported better at the time (at least, that's how I've been approaching it), and using MSU1 undercuts that argument.
Quote:
Also, about the SNES version having much worse music than the arcade version, I'm pretty sure half of the problem is the awful instrument choice.
It sounds to me like a combination of lackluster programming and sample work and a really tiny sample RAM pool, likely due to character voices hogging most of the space.
...oh wow. I'm just now listening to Adon's theme from the SNES port (Youtube for those who don't have the file). I forgot how embarrassingly bad it was. Adon's theme in the arcade version is EPIC, and I know the SPC700 can do better than this nonsense.
Espozo wrote:
Khaz wrote:
python
Don't tell me I need some sort of python compiler...
It's not a compiler, just a program that runs python scripts:
http://www.python.org/
Espozo wrote:
Also, about the SNES version having much worse music than the arcade version, I'm pretty sure half of the problem is the awful instrument choice.
But Super Street Fighter II for Super NES sounds OK. Or did Capcom mess it up for Alpha 2? If so, were voice samples the culprit?
Guile's theme (SSF2): Arcade version? | Super NES version
Quote:
Also, is there enough CPU time for software mixing without causing slowdown? Especially BRR mixing in stereo? Even if you were using MSU1, it still seems like a good idea to implement HDMA streaming for multiple independent voice samples (I suspect it would be much easier without a music engine in the way) and just preload regular sound effects into the audio RAM. The streaming bitrate requirement for a couple of 22 kHz BRRs would be far less than that for streaming directly to the echo buffer.
I just calculated it. If we're doing 32kHz 16-bit stereo, with 4 channels, it takes an entire frame. 16kHz mono will only take up 1/4 of a frame.
I calculated it too.
24 kHz mono would need 24000/16 = 1500 9-byte packets per second at 16 samples per packet. With an HDMA format that contains 3 data bytes and 1 control byte, each packet can be sent in 3 scanlines. So that's 4500 scanlines per second, or 75 scanlines per frame. Or 150 for stereo.
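For anyone who wants to plug in other sample rates, here is that arithmetic as a small Python sketch. The function name and parameters are made up for illustration, and it assumes 60 frames per second and that each packet occupies a whole number of scanlines.

```python
import math

def scanlines_per_frame(sample_rate_hz, channels=1, samples_per_packet=16,
                        bytes_per_packet=9, data_bytes_per_scanline=3,
                        frames_per_second=60):
    """Scanlines of HDMA per frame needed to stream BRR at the given rate."""
    # One BRR packet holds 16 samples, so this is packets per second per channel.
    packets_per_second = sample_rate_hz / samples_per_packet
    # 9 bytes at 3 data bytes per scanline -> 3 scanlines per packet.
    scanlines_per_packet = math.ceil(bytes_per_packet / data_bytes_per_scanline)
    scanlines_per_second = packets_per_second * scanlines_per_packet * channels
    return scanlines_per_second / frames_per_second

print(scanlines_per_frame(24000))              # 75.0 (mono)
print(scanlines_per_frame(24000, channels=2))  # 150.0 (stereo)
```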
The next question is whether the S-SMP can reliably receive data at this rate. There are about 64 SPC700 cycles per scanline. Anyone want to sketch a loop that receives 3 bytes in 64 cycles?
Another technique to try with stereo is "mid-side" aka "joint stereo", with a higher sample rate for L+R than for L-R.
Yet another technique is time-stretching waveforms. Stretching the "o", "u", and "e" in "Shoryuken" can in theory be done with the looping hardware. This was one of the tricks used to store lots of voice in games using ESS MX (aka DigiTalker), the others being 2-bit DPCM, cosinification of the waveform's phase that allowed use of ping-pong looping, and silencing the quiet half of the cosinified waveform. However, this requires some pitch detection algorithm to select the appropriate playback frequency for each looped portion.
This is starting to sound serious. When I get home, I might split the SFA2 stuff into a new SFA2 topic.
tepples wrote:
This is starting to sound serious. When I get home, I might split the SFA2 stuff into a new SFA2 topic.
Yes, you're absolutely right. It's about damn time we got back to discussing froyo. ಠ_ಠ
I figured out why the hdma wasn't working for Bad Apple. I misunderstood what the 7th bit of the length does.
tepples wrote:
24 kHz mono would need 24000/16 = 1500 9-byte packets per second at 16 samples per packet. With an HDMA format that contains 3 data bytes and 1 control byte, each packet can be sent in 3 scanlines. So that's 4500 scanlines per second, or 75 scanlines per frame. Or 150 for stereo.
Exactly.
I figured a single scanline was probably too fast to fit a high-precision music engine in around the data streaming, but two might just work, and three would be easier. At three bytes every three scanlines, 24 kHz mono just barely fits in a frame.
If you aren't doing music, you can probably just use VBlank to run the sound effects engine - perhaps have it branch on a command byte indicating end-of-frame, do its thing, and then resume polling the I/O ports. The data rate could be fairly high in this case - if you could actually get three bytes every scanline, you could do 32 kHz stereo with margin, or three 22 kHz mono voice samples (the announcer and both combatants) given appropriate command code design.
If you are doing music, this might still work (I don't know; I've never coded a music engine). But if the music engine had to be faster than 60 Hz, the data rate would probably have to be lower, so as to be able to fit a complete engine cycle in between port reads. Or perhaps one could use a bifurcated engine, with high-speed high-granularity stuff running between I/O reads and heavier, more complicated stuff running during VBlank. As with NES audio, I imagine consciously using VBlank would constrain the available tempo choices, and introduce a big difference between NTSC and PAL...
Quote:
The next question is whether the S-SMP can reliably receive data at this rate. There are about 64 SPC700 cycles per scanline. Anyone want to sketch a loop that receives 3 bytes in 64 cycles?
I'd try, but I've never coded SPC700 before, and I really should be working.
...oh, what the heck. This is my first cut at 8-bit 65816, no optimization, no bug checking. Supports two streaming channels at three bytes per shot with 252-byte sample buffers:
Code:
start:
lda IO0 ; 3 cycles - load from first I/O port
cmp OLD ; 2 cycles - check if command byte has changed
beq sequence ; 2 cycles (4 if taken) - if not, do something else that needs doing
bit DATA ; 2 cycles? (please tell me SPC700 has a BIT equivalent) - check if command byte is a new data indicator or something else
beq command ; 2 cycles (4 if taken) - if something else, handle it elsewhere
bit CHANNEL ; 2 cycles? - check where to put the data
bne channel2 ; 2 cycles (4 if taken) - go to duplicate routine if data is for second streaming channel
lda IO1 ; 3 cycles - if streaming data to first channel, load from second I/O port
sta sample1,x ; 6 cycles (?) - store to indexed sample position
inx ; 2 cycles - since BRR is in 9-byte chunks, X doesn't need bounds checking yet
lda IO2 ; 3 cycles
sta sample1,x ; 6 cycles
inx ; 2 cycles
lda IO3 ; 3 cycles
sta sample1,x ; 6 cycles
inx ; 2 cycles
cpx SAMPLE1END ; 2 cycles - check for end of buffer
bne start ; 4 cycles (2 if not taken)
ldx SAMPLE1 ; 4 cycles - if at end of buffer, reset pointer index
bra start ; 4 cycles
TOTAL: 60 cycles in maximal case, going by the per-instruction counts above (give or take a few if I've misunderstood the instruction set or something)
Notice that this requires precise timing, since an 8-bit index register doesn't allow for a very big buffer. Also, it's probably impossible to receive data every line with this scheme, since it runs a fast internal engine cycle if there's no new data rather than just branching back to the check, meaning that by the time it realizes the ports have been written, it can be too late to get the data before they're written again.
The rest of the code would have to be sure to preserve X and Y, since the above routine assumes X is the buffer index, and the "duplicate routine" for the second streaming channel would presumably do the same with Y.
psycopathicteen wrote:
I figured out why the hdma wasn't working for Bad Apple. I misunderstood what the 7th bit of the length does.
Oh, excellent. Do you mean the HDMA line counter, or something else?
93143 wrote:
As with NES audio, I imagine consciously using VBlank would constrain the available tempo choices, and introduce a big difference between NTSC and PAL
My NES music engine uses Bresenham's algorithm to support any tempo expressed in subdivisions per minute. Internally, subdivisions are called "rows", drawing from tracker terminology. For example, Thwaite's soundtrack is at 100 BPM with 3 rows per beat, and each song opens with a setTempo 300 command. Every frame, it adds the tempo to a 16-bit counter. When the counter becomes positive, it plays one row on all music tracks and subtracts 3606 (NTSC) or 3000 (PAL) from the counter, which is the number of frames per minute. But then that might be considered "low-precision" to some.
-1000
+ 300 = -700
+ 300 = -400
+ 300 = -100
+ 300 = 200 POSITIVE! Play row. - 3606 = -3406
+ 300 = -3106
+ 300 = -2806
[6 frames omitted]
+ 300 = -706
+ 300 = -406
+ 300 = -106
+ 300 = 194 POSITIVE! Play row. - 3606 = -3412
etc.
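The same counter can be sketched in Python to check the trace above. This is only an illustration of the idea, not the engine's actual code; the function name is made up, and "becomes positive" is taken literally as a greater-than-zero test.

```python
def row_frames(tempo, frames_per_minute, counter, num_frames):
    """Return the 1-based frame numbers on which a row is played."""
    played = []
    for frame in range(1, num_frames + 1):
        counter += tempo                  # every frame, add the tempo
        if counter > 0:                   # counter became positive: play a row
            played.append(frame)
            counter -= frames_per_minute  # 3606 for NTSC, 3000 for PAL
    return played

# Tempo 300 on NTSC, starting from -1000 as in the trace
print(row_frames(300, 3606, -1000, 40))   # [4, 16, 28, 40]
```

Over one minute's worth of NTSC frames (3606), this plays exactly 300 rows, so the long-run tempo is right even though individual rows land on whole frames.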
rainwarrior wrote:
a program that runs python scripts:
Is the python script runner literally called "Python"?
tepples wrote:
But Super Street Fighter II for Super NES sounds OK.
Definitely better than SFA2, but I still wouldn't call it the greatest.
tepples wrote:
Arcade version?
Yes. I think it's the best version, with the original SF2 arcade Guile theme behind it.
Khaz wrote:
tepples wrote:
This is starting to sound serious. When I get home, I might split the SFA2 stuff into a new SFA2 topic.
Yes, you're absolutely right. It's about damn time we got back to discussing froyo. ಠ_ಠ
Definitely.
(No Canadians here have ever tried it though? Sorry.)
93143 wrote:
But the health bar in most fighters covers just about the whole screen horizontally. Wouldn't changing stuff around during HBlank accomplish the same thing?
That's why.
93143 wrote:
...oh wow. I'm just now listening to Adon's theme from the SNES port (Youtube for those who don't have the file). I forgot how embarrassingly bad it was. Adon's theme in the arcade version is EPIC, and I know the SPC700 can do better than this nonsense.
From 00:06 to 00:12 on the arcade version sounds like something straight out of Doom. The main noise sounds like something from the Commodore 64.
93143 wrote:
The idea here is that SFA2 could have been ported better at the time (at least, that's how I've been approaching it),
It's really embarrassingly bad, especially for 1995! I bet they just recycled the original SF2 engine for SFA2, which wasn't even that great to begin with.
Espozo wrote:
rainwarrior wrote:
a program that runs python scripts:
Is the python script runner literally called "Python"?
Yes. The official interpreter for a language is usually just named after the language. See How do I installed Python?
Feel free to PM me a split point as to when this topic became completely about Street Fighter.
tepples wrote:
See How do I installed Python?
I hope you're not a native English speaker... Anyway though, thank you! it works.
tepples wrote:
Feel free to PM me a split point as to when this topic became completely about Street Fighter.
Like Khaz suggested, does this really even need a split?
Anyway though, Khaz, how do you get your picture converter thing to work? I tried copying and pasting the thing from Github, but I was also copying the line number... I suppose I could go through the entire thing, but I guarantee there's a better way and I want to know if I'm doing it right in the first place.
Espozo wrote:
tepples wrote:
See How do I installed Python?
I hope you're not a native English speaker
Someone's husband is behind on his memes.
tepples wrote:
behind
You mean by about 10 years?
Attachment:
World's Most Popular Meme.png [ 23.39 KiB | Viewed 1539 times ]
Exactly. I thought "How do I (past tense verb phrase)" was so long established that it would be understood.
I think most people remember a meme for about as long as it's popular. There's plenty of random crap all the way up to 2010 I've never heard about.
You know, one thing you got to love is a meme that's extremely popular for about 1 second:
https://www.google.com/trends/explore#q=squid%20kid (to be fair though, it hasn't been around that long. Just look at how the popularity of the other meme came back after the initial spike.)
Espozo wrote:
Like Khaz suggested, Does this really even need a split?
Anyway though, Khaz, how do you get your picture converter thing to work? I tried copying and pasting the thing from Github, but I was also copying the line number... I suppose I could go through the entire thing, but I guarantee there's a better way and I want to know if I'm doing it right in the first place.
I'm pretty sure the github page has a "download raw" button or something so you can get the actual source file directly.
It's a command-line tool, and how it works should be covered by its built-in "help" function that I personally forget how to access. iirc input has to be a 256 color bitmap. Give it a try and let me know if it works for you. I'll be home later and will be able to answer questions you may have better then.
Khaz wrote:
I'm pretty sure the github page has a "download raw" button or something so you can get the actual source file directly.
Yeah, you're right. Anyway though, I pasted the whole thing into Python and it said...
Quote:
SyntaxError: multiple statements found while compiling a single statement
>>>
I don't have a clue as to what this means...
If you're using the interactive interpreter directly, it only accepts one statement at a time (I don't know why). Pasting a whole script into one "input" line produces that (stupidly worded) error.
Or you can just paste it into a script file and run that script.
Yeah... Just download the raw .py file, then run it in a command line like "quantomatic img.bmp". You will probably need to pay attention to some of the optional flags to get the right results... (still not home yet)
Well, I noticed that the program only accepts bmp files, so I just converted what I wanted into a bmp file but then when I thought I'd have it done, the command line spat out this:
Quote:
Invalid DIB Header size. Non-Windows BMP DIB Headers are not supported at this time.
I use GIMP, if that makes a difference.
I also have a random question... How possible is it to change colors during hblank not using HDMA?
Okay, I got to admit, this is pretty awesome:
https://www.youtube.com/watch?v=gKe-AdS7npI
Espozo wrote:
Well, I noticed that the program only accepts bmp files, so I just converted what I wanted into a bmp file but then when I thought I'd have it done, the command line spat out this:
Quote:
Invalid DIB Header size. Non-Windows BMP DIB Headers are not supported at this time.
I use GIMP, if that makes a difference.
That really is strange, but I did indeed put that error in there myself:
Code:
if dibHeaderSize != 40:
    print ("Invalid DIB Header size. Non-Windows BMP DIB Headers are not supported at this time.")
    quit()
It MIGHT be because you used GIMP instead of MSPaint. Though now that I see my code, I also misspoke earlier when I said it takes a "256 color bitmap". From the header of the file: "#Takes in a 24bpp Bitmap file". So, if you were feeding it a 256 color file, that could also have something to do with it. Apologies for my mistake.
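One way to see what the file actually contains is to read the header size straight out of it. A minimal sketch, assuming the usual BMP layout (a 14-byte file header followed by the DIB header); the function name is made up for illustration and isn't part of the script. A size of 40 is the plain Windows BITMAPINFOHEADER; the newer V4 and V5 headers are 108 and 124 bytes, so if the editor wrote one of those, that would explain the error.

```python
import struct

def read_dib_info(data):
    """Return (dib_header_size, bits_per_pixel) from the start of a BMP file."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # The DIB header starts at offset 14; its first field is its own size
    # as a little-endian 32-bit integer.
    (dib_size,) = struct.unpack_from("<I", data, 14)
    if dib_size < 40:
        raise ValueError("older header layout; bit depth is stored elsewhere")
    # For 40-byte and larger headers, bits per pixel sits at file offset 28.
    (bpp,) = struct.unpack_from("<H", data, 28)
    return dib_size, bpp

# Hand-built fake header for demonstration: 124-byte DIB header, 8x8, 24bpp
fake = b"BM" + bytes(12) + struct.pack("<IiiHH", 124, 8, 8, 1, 24)
print(read_dib_info(fake))   # (124, 24)
```

On a real file you would pass in something like the first 30 bytes read in "rb" mode.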
Espozo wrote:
I also have a random question... How possible is it to change colors during hblank not using HDMA?
... H-Counter Interrupt, set to trigger right near the end of a scanline? You probably wouldn't want it triggering every single line though, so probably a combination. V-counter to turn on the H-counter when you get to the right scanline, H-counter triggers right near the end of the visible line so you can optimize it to write data immediately the moment it's capable of doing so, then turns itself back off until the next V-counter trigger.
That's all I got, anyhow.
Yeah, I was using an indexed color picture, but just now, I didn't and it still gave me the same error.
Espozo wrote:
Yeah, I was using an indexed color picture, but just now, I didn't and it still gave me the same error.
Have you tried opening it in MSPaint and re-saving it again? Do you have MSPaint?
Failing that, feel free to send me the file and I will take a look at it and find out what is up with it. It'll probably be simple to adjust my script to handle it... Hopefully...
Espozo wrote:
No Canadians here have ever tried it though?
I thought I was the only one we got any data on, and that conclusion is restricted to the Splatoon promo and to the first brand name you mentioned. (Also, I was unfamiliar with the 'hip' contraction of the product name.) I have indeed tried Yogen Früz, just not the promo stuff.
Quote:
93143 wrote:
But the health bar in most fighters covers just about the whole screen horizontally. Wouldn't changing stuff around during HBlank accomplish the same thing?
That's why.
I'm not sure I managed to adequately convey to you the degree of imprecision involved in mid-scanline raster effects. Based on my experience so far, either they're extraordinarily difficult to time exactly right or you have to put up with the change happening on an 8-pixel column boundary, or both, depending on what it is you're doing. And unless you're changing parameters for something that's completely transparent for the duration of the change attempt regardless of any intermediate states, you will get glitching (scroll changes take multiple writes). A pixel-perfect seam between two arbitrary shapes via scroll change is outright impossible.
Not to mention that the health bar in SFA2 is translucent in the damage area...
I think you'd have better luck reorganizing the BG layers so as to never need more than 3. It helps that each tile has its own priority and palette, and that you can switch around the scroll offsets (and even the BG3 priority bit in BGMODE) during HBlank. If you really had to, you could change scroll parameters mid-scanline between objects that are guaranteed to be on opposite sides of the screen and never get near one another...
Espozo wrote:
I also have a random question... How possible is it to change colors during hblank not using HDMA?
As said, H- or H/V-IRQ. Or timed code (preposterous), or polling HVBJOY for the HBlank flag (CPU hog and/or imprecise).
Why would you want to?
93143 wrote:
I'm not sure I managed to adequately convey to you the degree of imprecision involved in mid-scanline raster effects. Based on my experience so far, either they're extraordinarily difficult to time exactly right or you have to put up with the change happening on an 8-pixel column boundary, or both, depending on what it is you're doing. And unless you're changing parameters for something that's completely transparent for the duration of the change attempt regardless of any intermediate states, you will get glitching (scroll changes take multiple writes). A pixel-perfect seam between two arbitrary shapes via scroll change is outright impossible.
Well, would it be fine if you're covering a large part of it with sprites? I mean like on this picture, the edge of the rock wall would be sprites, and the rest of the rock wall and the ship would be one background layer. (I suppose you could make it one straight line down, but it would be better if it were in remotely the same shape as the wall)
Attachment:
BG split.png [ 36.8 KiB | Viewed 1398 times ]
I just really don't think I understand how difficult this really is. Just one, not very accurate split down the middle of the screen doesn't seem like it would be that difficult to do. What would be kind of difficult though is if you had to move the split point, like you would in the example above.
93143 wrote:
If you really had to, you could change scroll parameters mid-scanline between objects that are guaranteed to be on opposite sides of the screen and never get near one another...
Now you're talking!
93143 wrote:
Why would you want to?
Why not?
This in conjunction with HDMA should enable you to change more color per scanline, shouldn't it? One thing I'm really crazy about is 2bpp status bars. Because many games have their letters use simple gradients, you could do this instead of wasting HDMA channels for something that takes up the entire screen instead of 16 scanlines. Yeah, I'm crazy...
Okay, so all I did was delete that one little thing of python code, and after about 30 minutes, I finally got three files, being the tiles, the tilemap, and the palettes. I noticed that it doesn't actually give you a picture of what it made, and because I'm lazy, can anyone see how this looks?
Attachment:
Test Picture.zip [2.54 KiB]
Downloaded 46 times
This was the original picture: (I would have just jumped straight to the SFA2 picture if I knew it was going to take 30+ minutes...)
Attachment:
Doom.bmp [ 168.12 KiB | Viewed 1385 times ]
Uhm, your zip file there appears to only contain the tilemap and palettes, not the tile set.
Also, it was actually designed to output a bitmap of the final results, but that part was commented out. I was going to make it a command-line option of whether to output one or not, but never got around to it. If you want, you can edit the code yourself to do it. There's this part near the bottom:
Code:
#=======================================================================================================================
#Dump final bitmap
#print("Exporting Bitmap Reduced To {} Palettes...".format(fullPals))
#with open(outputBMPNameC, "wb") as f:
#Copy original header back verbatim
#    for i in range(pixelStartAddress):
#        f.write(bytelist[i])
#Then write new pixel data
#    for i in range(bitmapHeight):
#        for j in range(bitmapWidth):
#            tileCoordY = i%8
#            tileCoordX = j%8
#            mapCoordY = i//8
#            mapCoordX = j//8
#for each pixel, lookup which palette to use (FIND mapCoordY, mapCoordX IN tileMergePalTileList), find the closest colour match in that palette, then write that colour value
#            foundTileMatch = False
#            for n in range(fullPals):
#print("PALETTE {}: {} TILES".format(n, tileMergePalNumTiles[n]))
#                for m in range(tileMergePalNumTiles[n]):
#print("{} == {}, {} == {}".format(tileMergePalTileList[n][m][0], mapCoordY, tileMergePalTileList[n][m][1], mapCoordX))
#                    if tileMergePalTileList[n][m][0] == mapCoordY and tileMergePalTileList[n][m][1] == mapCoordX:
#                        foundTileMatch = True
#                        correctDrawPal = n
#                        break
#                if foundTileMatch == True:
#                    break
#            if foundTileMatch == False:
#                print("WTF Tile not found {}, {}".format(mapCoordY, mapCoordX))
#                quit()
#            bestPixelDistance = 999999
#            for n in range(15):
#                pixelDistance = ((tileMergePalR[correctDrawPal][n] - tileClrsR[mapCoordY][mapCoordX][(tileCoordY*8)+tileCoordX]) ** 2)
#                pixelDistance = pixelDistance + ((tileMergePalG[correctDrawPal][n] - tileClrsG[mapCoordY][mapCoordX][(tileCoordY*8)+tileCoordX]) ** 2)
#                pixelDistance = pixelDistance + ((tileMergePalB[correctDrawPal][n] - tileClrsB[mapCoordY][mapCoordX][(tileCoordY*8)+tileCoordX]) ** 2)
#                if pixelDistance < bestPixelDistance:
#                    bestPixelDistance = pixelDistance
#                    bestClrMatch = n
#            tileDataOutputArray[bitmapHeight - i - 1][j] = bestClrMatch + 1 #Plus One Because Palette Colour Zero is Transparent!
#            f.write(struct.pack("B", int(tileMergePalB[correctDrawPal][bestClrMatch] * 8)))
#            f.write(struct.pack("B", int(tileMergePalG[correctDrawPal][bestClrMatch] * 8)))
#            f.write(struct.pack("B", int(tileMergePalR[correctDrawPal][bestClrMatch] * 8)))
#=======================================================================================================================
#For when not dumping final bitmap, quickly and quietly populate tileDataOutputArray instead
#Write new pixel data
for i in range(bitmapHeight):
    for j in range(bitmapWidth):
        tileCoordY = i%8
        tileCoordX = j%8
        mapCoordY = i//8
        mapCoordX = j//8
        #for each pixel, lookup which palette to use (FIND mapCoordY, mapCoordX IN tileMergePalTileList), find the closest colour match in that palette, then write that colour value
        foundTileMatch = False
        for n in range(fullPals):
            for m in range(tileMergePalNumTiles[n]):
                if tileMergePalTileList[n][m][0] == mapCoordY and tileMergePalTileList[n][m][1] == mapCoordX:
                    foundTileMatch = True
                    correctDrawPal = n
                    break
            if foundTileMatch == True:
                break
        if foundTileMatch == False:
            print("WTF Tile not found {}, {}".format(mapCoordY, mapCoordX))
            quit()
        bestPixelDistance = 999999
        for n in range(15):
            pixelDistance = ((tileMergePalR[correctDrawPal][n] - tileClrsR[mapCoordY][mapCoordX][(tileCoordY*8)+tileCoordX]) ** 2)
            pixelDistance = pixelDistance + ((tileMergePalG[correctDrawPal][n] - tileClrsG[mapCoordY][mapCoordX][(tileCoordY*8)+tileCoordX]) ** 2)
            pixelDistance = pixelDistance + ((tileMergePalB[correctDrawPal][n] - tileClrsB[mapCoordY][mapCoordX][(tileCoordY*8)+tileCoordX]) ** 2)
            if pixelDistance < bestPixelDistance:
                bestPixelDistance = pixelDistance
                bestClrMatch = n
        tileDataOutputArray[bitmapHeight - i - 1][j] = bestClrMatch + 1 #Plus One Because Palette Colour Zero is Transparent!
#=======================================================================================================================
...All you have to do is uncomment the first block, and comment out the second one (after "For when not dumping final bitmap")...
But anyways, yeah... All you have to do to use the output is set up Mode 1, 8x8 tiles, and then DMA the tileset, map and palettes over. I have a little picture-displayer program I will probably put up on my github when I get a second
Khaz wrote:
Uhm, your zip file there appears to only contain the tilemap and palettes, not the tile set.
No!!! I think I deleted the actual tiles on accident. 30 minutes wasted...
Khaz wrote:
...All you have to do is uncomment the first block, and comment out the second one (after "For when not dumping final bitmap")...
Good. I noticed one thing about your picture converter thing, and it appears that the pictures cannot be over 256 pixels wide and tall? Well, it would take a millennium anyway. One thing that's ridiculous about my computer is that it seems that it never allows anything to use more than 50% of the (already small) CPU power, so I was playing Mame waiting for it.
Espozo wrote:
I just really don't think I understand how difficult this really is. Just one, not very accurate split down the middle of the screen doesn't seem like it would be that difficult to do. What would be kind of difficult though is that you had to move the split point, like you would in the example above.
The basic IRQ jitter is about two tiles wide, since that's about as long as instructions get. An arbitrary H/V scroll change takes a whole tile to write even if you use DMA, and nearly two if you don't. I believe scroll changes take effect on an 8-pixel column boundary. And if you want this to be able to happen anywhere on the screen, you need to account for DRAM refresh getting in the way, which is another 10 dots.
So for an arbitrary shape like that rock edge, you're talking at least a 32-pixel-wide sprite mask, and that's if you check the H-counter during the interrupt and use cycle-counted stagger-stepping to reduce the IRQ jitter. If you don't do that, you're looking at 40 pixels minimum, or more if you aren't using 8x8 sprites (or if you can't quite line it up precisely enough; there is no margin in those numbers and they could be a bit optimistic).
Moving the split point just adds time at the end of the interrupt. But the interrupt is already taking a fair bit of time, especially if you want a stagger-step, and the split is fairly tall in this case. You'd end up burning a large chunk of your CPU time per frame if you insisted on doing this. Of course, if you can pull it off without hurting the gameplay...
...I wonder if one could take advantage of the fact that the actual rock wall doesn't look like it has much detail on the inside. A lot of that could simply be masked off with a window. Even just replacing what detail there is with sprites shouldn't take much if any more sprite data than masking a scroll change, and you could skip the H-IRQ...
...
I really need to find out for sure what happens when you try to change OBSEL on a scanline full of sprites. I suspect one would have to change it before HBlank, if the PPU caches sprite graphics during HBlank, but I wonder if it would affect the picture if it were written mid-line... My display engine test writes it mid-line, and it works perfectly, but the write always happens to the right of all the sprite data on the line, so there's nothing left to render by the time the change happens...
Quote:
Why not?
This in conjunction with HDMA should enable you to change more color per scanline, shouldn't it?
Actually, changing one CGRAM entry with each channel (8 total) uses up basically all of HBlank. I'm kinda surprised it works at all (though I'm not clear on where exactly the overhead cycles happen)...
The only way to exceed this would be with ordinary DMA, most likely during an IRQ. But you'd have to be changing a block of contiguous colours, to save time writing the address; otherwise it wouldn't be a whole lot faster than HDMA.
Quote:
Because many games have their letters use simple gradients, you could do this instead of wasting HDMA channels for something that takes up the entire screen instead of 16 scanlines.
If you're using all your HDMA channels, you probably won't be able to target the remaining sliver of HBlank accurately enough anyway, even if there's technically enough time left in the first place.
If you're doing CGRAM changes for other reasons too, keep in mind that an HDMA channel targeting CGRAM can change any entry; you aren't limited to a single entry or subpalette. So unless you have to change more than one colour on a single line, you can use one channel for everything.
93143 wrote:
Moving the split point just adds time at the end of the interrupt. But the interrupt is already taking a fair bit of time, especially if you want a stagger-step, and the split is fairly tall in this case. You'd end up burning a large chunk of your CPU time per frame if you insisted on doing this. Of course, if you can pull it off without hurting the gameplay...
If it doesn't take more than a 10th of CPU time, it's fine with me. Really, if I plan on doing all this crap, I might as well use the SA-1 in conjunction with the SNES CPU, so one could do this stuff and the other could do "normal" game logic. It's not that the SNES's CPU is slow, it's that the PPUs have too many shortcomings...
93143 wrote:
The only way to exceed this would be with ordinary DMA, most likely during an IRQ. But you'd have to be changing a block of contiguous colours, to save time writing the address;
How many contiguous colors could you change doing this? I see that HDMA actually affects this kind of stuff? What happens if HBlank code isn't done in time?
93143 wrote:
I really need to find out for sure what happens when you try to change OBSEL on a scanline full of sprites.
If it works, it seems a bullet hell shooter would be possible on the SNES after all.
Espozo wrote:
One thing that's ridiculous about my computer is that it seems that it never allows anything to use more than 50% of the (already small) CPU power, so I was playing Mame waiting for it.
Sounds like a dual-core processor (50% = 1 core). If the program hasn't been designed to take advantage of multiple cores, that's what happens.
And Python is notorious for using only one core for each process because of the Global Interpreter Lock that simplifies its operation. To make use of more than one core, you have to import multiprocessing and send pickled objects back and forth. Some native code libraries release the GIL while running native code, allowing other threads to run, but many don't (such as Pillow).
Espozo wrote:
If it doesn't take more than a 10th of CPU time, it's fine with me.
I was thinking more like a third, and that's with no jitter reduction. That rock edge shows up on a large majority of the active scanlines in the frame. This is also not counting preparation time for the table of timer values, which would have to be updated and bounds-checked every frame. It also doesn't count HDMA for scroll reset.
Actually, I forgot about HDMA. Since the rock edge can be anywhere on the screen, you'd have to be prepared to fire the IRQ during or even before HBlank, and thus you'd have to account for variable amounts of HDMA introducing variable waiting periods in the IRQ, unless you just didn't use HDMA for anything other than restoring the scroll state (and even that would probably introduce a few pixels of extra jitter). So that's another factor increasing the necessary width of the sprite mask, at least in areas where HDMA is being used.
After seeing that part of the game in motion, I don't think my windowing idea will work; there's too much going on to the left. But you could still use windowing to reduce the required sprite mask coverage in spots, since a lot of the rock wall is totally featureless... in fact you could probably use it to eliminate the requirement to change the IRQ trigger point every line, saving some CPU time. In that case, it might take closer to a quarter of the available CPU time outside VBlank, counting scroll and window HDMA.
I was thinking of using BG3 for the ship/cliffs/waterfall and sprites (temporarily?) for the status display, but 2bpp might be too much of a quality hit even with HDMA and 8 subpalettes... on the other hand, the background is fairly pale, and it would make animating the waterfall easier...
Naturally, the actual programming is where the rubber meets the road; if you encounter a situation like this, you're the one who'll have to make the decisions...
Quote:
How many contiguous colors could you change doing this?
You have about 340 master cycles of HBlank, but you can probably write off at least a tile-width, maybe two, due to IRQ jitter and DMA sync behaviour. With jitter reduction, you might have about 310 guaranteed cycles. You have to write the address after rendering has stopped, so that's 16 cycles with DMA, plus channel overhead for the main colour transfer for 24 master cycles total. Now the actual transfer has 286 cycles to execute; each colour is two bytes, so dividing by 16 we get 17.875. Without jitter reduction, it's somewhere between 16 and 17. In other words, you could probably update a 4bpp subpalette during HBlank.
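For anyone who wants to plug in their own numbers, the budget above can be written out as a few lines of Python. This is only a sketch of the arithmetic in this post: the 310-cycle window, the 24 cycles of setup and the 8 master cycles per byte are the estimates quoted here, not measured values, and the function name is made up for illustration.

```python
# Rough HBlank budget for an IRQ + DMA palette update.
# All constants are the estimates quoted in the post, not measurements.
CYCLES_PER_BYTE = 8      # master cycles per DMA byte
BYTES_PER_COLOUR = 2     # one CGRAM entry is 15-bit colour in two bytes


def colours_per_hblank(guaranteed_cycles, setup_cycles=24):
    """Colours that fit once the address write and channel overhead are paid.

    setup_cycles = 16 (address written via DMA) + 8 (overhead of the
    colour channel), i.e. the "24 master cycles total" from the post.
    """
    transfer_cycles = guaranteed_cycles - setup_cycles
    return transfer_cycles / (CYCLES_PER_BYTE * BYTES_PER_COLOUR)


if __name__ == "__main__":
    budget = colours_per_hblank(310)   # with jitter reduction
    print(budget)                      # 17.875
    print(int(budget) >= 16)           # True: a 16-entry 4bpp subpalette fits
```

If the guaranteed window turns out to be shorter on real hardware, the same function shows how quickly the margin over 16 entries disappears.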
This is using the 512-PPU-accesses-per-line theory of CGRAM bandwidth, which seems to be most consistent with my observed results (I've never had trouble using all 8 HDMA channels for CGRAM even in very busy tests). But there might be some weird reason it doesn't work the way I've described above; if so, no one seems to know about it.
Quote:
I see that HDMA actually affects this kind of stuff?
HDMA pauses the CPU while it executes, just like regular DMA. It's ~18 master cycles overhead per transfer, plus 8 for each channel that hasn't terminated yet whether or not it has data to push, plus 8 master cycles per byte of data pushed by any channel. If you use a lot of 4-byte channels and indirect addressing (16 extra master cycles to load an address), HDMA can easily run past the far side of HBlank and into the next line, and all of that time is unavailable to the CPU.
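As a rough illustration of how that adds up, here is a small Python estimate using only the approximate costs listed above (~18 cycles per transfer, 8 per live channel, 8 per byte, 16 for an indirect address load). The function name and the way a channel is described as a `(bytes, loads_address)` pair are hypothetical; treat the output as a ballpark, not a cycle-exact model.

```python
# Ballpark cost of one scanline's HDMA, using the approximate figures
# from the post. Not a cycle-exact model of the hardware.
TRANSFER_OVERHEAD = 18   # ~18 master cycles once per HDMA run
PER_CHANNEL = 8          # every channel that has not terminated
PER_BYTE = 8             # every byte actually pushed
INDIRECT_LOAD = 16       # extra when an indirect address is loaded


def hdma_cycles_per_line(channels):
    """channels: list of (bytes_pushed, loads_indirect_address) per live channel."""
    if not channels:
        return 0
    total = TRANSFER_OVERHEAD
    for bytes_pushed, loads_indirect_address in channels:
        total += PER_CHANNEL + PER_BYTE * bytes_pushed
        if loads_indirect_address:
            total += INDIRECT_LOAD
    return total


if __name__ == "__main__":
    hblank = 340  # approximate HBlank length quoted earlier in the thread
    worst = hdma_cycles_per_line([(4, True)] * 8)
    print(worst, worst > hblank)   # 466 True: spills into the next line
```

Eight 4-byte channels that all reload an indirect address on the same line come out at 466 cycles, which is well past the ~340 cycles of HBlank.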
Quote:
What happens if HBlank code isn't done in time?
It runs into active display, and anything that doesn't work properly during active display won't work properly. I'm not an expert on the minutiae of PPU timing, but I'm pretty sure that's the gist of it.
Quote:
93143 wrote:
I really need to find out for sure what happens when you try to change OBSEL on a scanline full of sprites.
If it works, it seems a bullet hell shooter would be possible on the SNES after all.
OBSEL, not OAM. I'm changing where the sprite tables are located in VRAM. Mid-frame paging, essentially.
I do need this to work for my bullet hell shooter to be possible without substantial graphical compromises. But that's because of the way I've chosen to duck the global sprite count limit - I need enough sprite data to cover a 144x192 Mode 7 playfield in a rendered layer from the Super FX, plus some other stuff that takes more than the 2.5 kB I'd have left over in just two sprite tables...
I expect it to work in my case. I'm not as confident in the case of something like Metal Slug, where there's no sprite-free buffer zone on the right-hand side of the screen. It should still work...
93143 wrote:
I was thinking more like a third, and that's with no jitter reduction.
You know though, call me crazy, but I don't think there'd be any other way around it. The reason I brought up Metal Slug is because I remember thinking about if there were an SNES port and that dumb rock wall. That's just about the only time in the game where 3 "BG layers" are used, not including the fix layer. There are also several higher priority parts on the "BG layer" that are just sprites overlapping the rest of it, but because those sprites seem to use the palette of what's underneath it, you could do what Equinox does and make a sprite tilemap thing to occlude the sprites that are "underneath" the higher priority part of the BG layer.
Quote:
I was thinking of using BG3 for the ship/cliffs/waterfall and sprites (temporarily?) for the status display, but 2bpp might be too much of a quality hit even with HDMA and 8 subpalettes... on the other hand, the background is fairly pale, and it would make animating the waterfall easier...
I'd keep the status bar BG3 the whole time. This is, no joke, actually possible using BG3 and 4 sub palettes if you change the color every scanline: (The SNES's resolution is 48 pixels shorter than this game, so the status bar had to be condensed. I wouldn't adjust all the graphics for a difference that small)
Attachment:
Status Bar.png [ 5.24 KiB | Viewed 1773 times ]
93143 wrote:
you could probably update a 4bpp subpalette during HBlank.
Wow. That would make changing palettes for the BG a whole lot easier, because if you couldn't change a whole palette on one scanline, then you'd have to make sure the palette you're changing isn't being displayed at that point.
93143 wrote:
OBSEL, not OAM.
Wait a minute though, is it that OAM is only free to write to during vblank, or is it that it's only not free to write to during hblank?
Espozo wrote:
You know though, call me crazy, but I don't think there'd be any other way around it.
There is. Use BG3 for the far background and switch to sprites for the status bar once the rock edge gets far enough to the left that you need the far background and the status bar to coexist on the same scanlines.
On the other hand, except for the initial destruction of that wooden structure, it doesn't seem like an especially busy area anyway, so it could be feasible to just eat the CPU hit. With windowing, you might be able to get the CPU hit below 30% all-in (the detail down near the water is probably better served by more precise timing, since if you overload it with masking sprites you'll blow past the scanline limit and get clipping).
...Actually, that's another point to consider. If you do a mid-scanline scroll change, and enough is going on on screen to exceed the scanline limits on sprite tiles, the sprite mask could disappear and show the guts of the effect to the world. Using window masking (with the window's right edge exactly on the rock edge) in conjunction with sprite masking would solve this; you might get "background" dropouts, but the actual glitched BG wouldn't show up underneath. Updating the window's HDMA table would take a bit of extra time, perhaps on the order of 5% of the frame unless you want to dedicate a couple of banks to a table of tables. So I guess we're back to a third of the available CPU time...
Either way, this is looking really heavy on sprite memory. Perhaps the upper part of the rock wall would benefit from some rework to use less VRAM? Mid-screen paging could be necessary (hope it works), but would that require too many duplicates and eat up too much space for BG tiles? Another advantage of doing more BG graphics in 2bpp is that they'd make everything more likely to fit...
...I wonder if some of that wooden scaffolding could be done in BG3, assuming you go with the raster split...
Quote:
I'd keep the status bar BG3 the whole time. This is, no joke, actually possible using BG3 and 4 sub palettes if you change the color every scanline
A preliminary inspection suggests to me that it is almost possible to do this with two HDMA channels. It's so close I almost want to actually tabulate it and see...
Quote:
I wouldn't adjust all the graphics for a difference that small
There does seem, however, to be one definite instance of low-hanging fruit in there:
Attachment:
Status Bar mod.png [ 2.46 KiB | Viewed 1730 times ]
Quote:
That would make changing palettes for the BG a whole lot easier, because if you couldn't change a whole palette on one scanline, then you'd have to make sure the palette you're changing isn't being displayed at that point.
Or just make sure the colours you need are all available on the lines you need them on, either by staring at it for hours or by coding a Python script to automate it or something. (I found the "staring at it" method to work better than my Matlab script, but that was with one HDMA channel modifying a single 4bpp subpalette, so it wasn't intractable to my human brain - there's no way I could ever have managed hcolor without the script...)
But yeah, if the IRQ+DMA method works, you could use an arbitrary 4bpp palette on every line. As long as you don't need any other raster effects, because nothing else is going to fit in HBlank...
Quote:
Wait a minute though, is it that OAM is only free to write to during vblank, or is it that it's only not free to write to during hblank?
viewtopic.php?f=12&t=6758&p=65562#p65562
OAM is busy all scanline, with the internal address being continually changed by the PPU. During HBlank, the address is left at the last position checked by the PPU (which I imagine would be the top of the high table) and apparently cannot be rewritten for CPU access (Uniracers writes to the high table during HBlank, possibly to prevent sprites from crossing the screen split in two-player mode). As I understand it, to access the OAM address pointer normally, you have to be in either forced blank or VBlank. And if you force blank during HBlank, well, the prevailing opinion seems to be that you'd get either no sprites or glitched sprites on the following scanline, since HBlank is when the actual VRAM data tables are read and the line buffer filled. The remaining questions seem to be: 1) how much of HBlank is available for CPU writes?, and 2) does the internal address increment on such writes and wrap back to the beginning of the low table?
OBSEL ($2101) is a very different story; it seems to work any old where... though I admittedly haven't tried changing sprite sizes mid-screen; only table positions... and I still haven't tried changing it in the middle of a fully-loaded scanline; I should get to that... I imagine changing it during HBlank only worked because I only had one sprite, and thus it didn't happen to interrupt the graphics fetch process halfway; this could explain why higan accuracy showed a different result than my SNES when I used HDMA to change OBSEL but worked the same when I used a mid-scanline IRQ...
93143 wrote:
There is. Use BG3 for the far background and switch to sprites for the status bar once the rock edge gets far enough to the left that you need the far background and the status bar to coexist on the same scanlines.
But it won't look that good...
93143 wrote:
On the other hand, except for the initial destruction of that wooden structure, it doesn't seem like an especially busy area anyway, so it could be feasible to just eat the CPU hit.
It isn't even in view at that point anyway. (part of the rock ledge is, but not the waterfall and the ship, so initially, it wouldn't need any trickery)
93143 wrote:
If you do a mid-scanline scroll change, and enough is going on screen to exceed the scanline limits on sprite tiles, the sprite mask could disappear and show the guts of the effect to the world.
Ehh, you're thinking about it too much. Objects would be clipped long before that would, as they are higher priority. Anyway, I don't think anyone has ever cared too much about this kind of stuff if they have dealt with sprite clipping in the first place.
93143 wrote:
...I wonder if some of that wooden scaffolding could be done in BG3, assuming you go with the raster split...
You might as well forget about using 2bpp tiles on anything in the game, as it doesn't look very good at all. I've tried.
Anyway, I made part of the rock wall (the part where it's visible without going up) using a sprite sheet from someone (they didn't have the top part for whatever reason) and it's 31 sprites and 54 16x16 sprite tiles. The thing that's a pain is that it's more than 15 colors... (somehow) I think the more accurate but more CPU intensive method might be worth it for here. The wall as it is right now is exactly 40 pixels wide at its skinniest point. (I didn't plan it out, it's just how the tiles were set on the Neo Geo)
Attachment:
Rock Wall.png [ 8.81 KiB | Viewed 1699 times ]
93143 wrote:
A preliminary inspection suggests to me that it is almost possible to do this with two HDMA channels. It's so close I almost want to actually tabulate it and see...
I think it's like 5 with the "Push Start" (It's got a gradient) and 3 without. Honestly, considering the "Push Start" won't even appear without a player and you'd at least like to be able to have enough palettes for a second player, I think that's something that could be made with sprites. And yeah, your numbers for the timer look better. I actually did redo some of the graphics by getting rid of "arms" and "bombs" and replacing it with "ammo" in the same style because it was super crammed originally.
93143 wrote:
But yeah, if the IRQ+DMA method works, you could use an arbitrary 4bpp palette on every line. As long as you don't need any other raster effects, because nothing else is going to fit in HBlank...
Yeah, that's the problem... I was thinking of a way to try to make it seem like there are more sprite palettes on the SNES than there really are by setting up a system where a palette that is only used for some position on top of the screen would be converted into a different one for objects below it, like the palette for a helicopter would turn into the palette for a tank, and there are a surprising number of other instances where this would work.
Basically, this is where it would stand with the SNES's current palette situation:
1 player: should be fine, but could run into problems if you purposely try to overload the amount of colors.
2 player: expect a mess...
And this is where it would stand with the scanline changing thing:
1 player: should be perfectly fine no matter what.
2 player: should be fine, but could run into problems if you purposely try to overload the amount of colors.
Yeah, ironically, for the system that gets labeled as being super colorful, the lack of colors is probably my biggest problem with it. I don't care if the Genesis had a 3GHz CPU because I still wouldn't touch it. Anyway though, how is it that a system like the Neo Geo can have 16x the amount of palette entries, but not have 16x more or be 16x better at anything else? If I'm not mistaken, its palettes are stored on a chip external to the video processor, unlike the SNES or the Genesis, where they are stored on the video processor. I guess the reason they couldn't just make the amount of palette colors bigger is because of the limited size available on the video processors, which led me into wondering why they didn't just take the Neo Geo approach and put it on a separate chip, but I guess it would have cost too much to add that and connect it to the video processor... It's funny how simple it seems things like this would be to fix, but how hard they really are given the budget to design the thing. Although if they were really that concerned about money, I'm not sure why the SNES has a big expansion port on the bottom that only got used for a satellite thing in Japan, which probably could just as easily have gone through the cartridge slot like the hypothetical SNES CD attachment (MSU-1).
Espozo wrote:
Objects would be clipped long before that would, as they are higher priority.
...you know, I never quite understood how sprite overload works on the SNES, except that there's apparently a "bug" that causes sprites "in front" to be clipped first. I've also observed that the "cheese grater" effect seems to start from the left...
Quote:
Anyway, I made part of the rock wall (the part where it's visible without going up) using a sprite sheet from someone (they didn't have the top part for whatever reason) and it's 31 sprites and 54 16x16 sprite tiles.
That's most of a sprite table. The blank part would be unnecessary with a window mask helping out, but it's still substantial. Even if mid-screen OBSEL changes work reliably in combat, you can only take sprite paging so far before the required duplicates crowd out the fancy BGs...
Quote:
93143 wrote:
A preliminary inspection suggests to me that it is almost possible to do this with two HDMA channels.
I think it's like 5 with the "Push Start" (It's got a gradient) and 3 without.
Well, I just tabulated it (kinda), and unless I missed something it is possible to do it with two HDMA channels if you use sprites for "Push Start" and you're willing to use five subpalettes, but if either of those conditions is not met you have to use three channels.
Doing this up properly would take more brain power than I have this late at night...
Quote:
I was thinking of a way to try to make it seem like there are more sprite palettes on the SNES than there really are by setting up a system where a palette that is only used for some position on top of the screen would be converted into a different one for objects below it, like the palette for a helicopter would turn into the palette for a tank, and there are a surprising number of other instances where this would work.
Three HDMA channels could pull that off in just five scanlines, assuming all 15 colours have to change. Two could do it in eight. How fast do you actually need it to happen?
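Those two figures are just a ceiling division, assuming each HDMA channel rewrites one CGRAM colour per scanline (that assumption is what makes 15 colours over three channels come out to five lines). A tiny Python sketch, with a made-up function name, in case anyone wants to try other combinations:

```python
import math


def scanlines_to_swap(colours, channels, colours_per_channel_per_line=1):
    """Scanlines needed to rewrite `colours` CGRAM entries.

    Assumes every HDMA channel changes a fixed number of colours per
    scanline (one by default, which matches the figures in the post).
    """
    per_line = channels * colours_per_channel_per_line
    return math.ceil(colours / per_line)


if __name__ == "__main__":
    print(scanlines_to_swap(15, 3))   # 5
    print(scanlines_to_swap(15, 2))   # 8
```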
93143 wrote:
...you know, I never quite understood how sprite overload works on the SNES, except that there's apparently a "bug" that causes sprites "in front" to be clipped first. I've also observed that the "cheese grater" effect seems to start from the left...
I didn't think that that was "a bug". I thought it had just been how the sprite rendering system operates, in that it starts from the back and overwrites it, it's not like the SNES is going to have some sort of z buffer. However, clipping from the left first is still curious... I guess it reads right to left, and then "back" to "front"?
93143 wrote:
That's most of a sprite table. The blank part would be unnecessary with a window mask helping out, but it's still substantial. Even if mid-screen OBSEL changes work reliably in combat, you can only take sprite paging so far before the required duplicates crowd out the fancy BGs...
Yeah, I don't think any of the sprite table things are going to work... Anyway, the window would only help with one 16x16 tile, so yeah, it wouldn't really matter. I'd honestly just take some slowdown for just that one instance in the game, as it's about the only part of the game where there are 3 "BG layers".
93143 wrote:
Doing this up properly would take more brain power than I have this late at night...
Yeah, I have trouble going to sleep. (and even more trouble waking up...)
93143 wrote:
Three HDMA channels could pull that off in just five scanlines, assuming all 15 colours have to change. Two could do it in eight. How fast do you actually need it to happen?
You know just as well as I do. Here's a screenshot from the game:
In this picture, the "Metal Slug" letters at the top would be the "Press Start" letters, and that's one palette, the soldiers along with their weapons are one palette, the prisoner is one palette, the balcony is one palette, the car is one palette, the van is one palette, the player is one palette, the smoke and fire from the rockets and the rockets themselves are one palette, the bazooka rounds and the grenades are one palette, (they all flash gray, blue, and red which all are about 5 colors in that one palette) The recolored explosion to look like smoke is one palette, the wood debris falling right below the balcony is one palette, (it's the same color used throughout the whole game) and the rocket launcher explosion is one palette. In total, that's 12 palettes, but break it down vertically, (I tried to give a fair amount of space between) and it's like:
Metal Slug letters, soldiers, rocket explosion, grenade and bazooka rounds, prisoner. 5 palettes
Balcony, soldiers, rocket explosion, grenade and bazooka rounds, prisoner, recolored smoke explosion, wood debris. 7 palettes
Rocket smoke, soldiers, rocket explosion, grenade and bazooka rounds, recolored smoke explosion, wood debris. 7 palettes
Player, rocket smoke and rockets, soldiers, rocket explosion, grenade and bazooka rounds, recolored smoke explosion. 6 palettes
Car, van, player, soldiers, rocket smoke and rockets, 5 palettes
Keep in mind, this is also one of the worst parts of the game for palette overload next to the part with the sky bridge thing on the last level. I think this seems pretty good though and could probably be accomplished without glitching by changing 2 colors per line. The problem would be programming it...
Edit: You know, I wonder how well the SPC700 would fare with something like this:
https://www.youtube.com/watch?v=8xdv3fKqbYk
If I remember correctly, the Neo Geo has 7 PCM channels and 7 FM channels (7 seems like a bit of an odd number, but whatever). I have something called "Bridge M1" that can somehow play many of the songs in arcade games, and on it, you can adjust the volume of certain things, like I think the FM channels come from a different chip than the PCM channels do, like on the Irem M92, FM channels come from some sort of Yamaha chip while PCM channels come from something called the GA1 I think. Anyway, I turned off the PCM channels and there was almost no sound at all, so I don't think there should be too much of a problem there, and considering the SNES has one more PCM channel, I imagine you could fake FM sounds some way. I'd just hope there isn't ever any more than 8 channels total being used in Metal Slug, but if there are, I suppose it wouldn't be way too impossible to mix FM sounds into one channel?
Also, this is kind of random, but can't the SPC700 only be accessed using HDMA for whatever reason? I think I also remember hearing that you can only ever transfer information to it using 2 HDMA channels, and I was thinking, because there's only 64KB of audio RAM, maybe you could have it to where you'd swap out instruments and sound effects using the two HDMA channels, and if there's a moment where there's only enough bandwidth to swap out an instrument or a voice sample, the instrument wins and the voice sample just doesn't play I guess. I honestly have no clue how the SPC700 works other than that it has 8 PCM channels though.
Espozo wrote:
Also, this is kind of random, but can't the SPC700 only be accessed using HDMA for whatever reason?
No, that's not true.
The SNES APU is a black box (well, a silver box) that communicates via four I/O ports: $2140 through $2143 on the 5A22 side and $F4-$F7 on the SPC700 side. These ports are read/write from both sides, but IIUC reading returns the value last written on the other side, not the value last written on either side. The APU has no other means of communication with the SNES besides those ports. It even uses a separate master oscillator (meaning you have to watch out for sync issues when streaming audio continuously).
Due to this encapsulation, the APU is the only component of the SNES that has any storage memory (at least, any that's relevant to programmers): 64 bytes of IPL ROM that gets loaded into the top of audio RAM at boot and defines a communication protocol that the S-CPU has to follow in order to load data into the APU. Once a program has been loaded, you can tell the APU you're done and it will jump to the starting point of your program. If your program contains a different communication protocol, you can then use that, or you can jump back to the IPL boot loader if you want.
Since data going through the I/O ports has to be picked up and used/placed by the SPC700, it is not possible to DMA directly to audio RAM. You could DMA to the ports, but most of the data would be lost. The usual technique is to put both CPUs in a handshake loop, where the S-CPU (the main SNES CPU) pushes in some data, the S-SMP (the SPC700) takes it and writes an acknowledgement back, and then the S-CPU pushes in more data. This ties up the S-CPU, of course. So some programmers have used HDMA to write to the ports instead, because it happens at regular intervals without CPU intervention but not too fast for the SPC700 to keep up.
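To make the handshake idea concrete, here is a toy Python simulation of it. It is not the IPL boot protocol and not any real driver: the port layout (port 0 as a control byte, ports 1 to 3 as data), the counter scheme and all names are assumptions for illustration, and the two "CPUs" simply take turns inside one loop instead of running concurrently.

```python
class ApuPorts:
    """Four mailbox ports; each side reads what the *other* side last wrote."""

    def __init__(self):
        self.from_cpu = [0] * 4   # written via $2140-$2143, read at $F4-$F7
        self.from_apu = [0] * 4   # written via $F4-$F7, read at $2140-$2143


def stream(data):
    """Push `data` through the ports three bytes per handshake.

    Hypothetical protocol: port 0 carries a changing control byte,
    ports 1-3 carry payload. Returns what ended up in "audio RAM".
    """
    ports = ApuPorts()
    audio_ram = bytearray()
    counter = 0
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]

        # S-CPU side: payload first, control byte last so it acts as "go".
        for j, value in enumerate(chunk):
            ports.from_cpu[1 + j] = value
        counter = (counter % 255) + 1      # 1..255, never the reset value 0
        ports.from_cpu[0] = counter

        # S-SMP side: a control byte different from its last ack means new data.
        assert ports.from_cpu[0] != ports.from_apu[0]
        # Simplification: the toy receiver knows the length of a short final chunk.
        audio_ram.extend(ports.from_cpu[1:1 + len(chunk)])
        ports.from_apu[0] = ports.from_cpu[0]   # acknowledge

        # S-CPU side: wait for the ack before sending more.
        assert ports.from_apu[0] == counter
    return bytes(audio_ram)


if __name__ == "__main__":
    payload = bytes(range(10))
    print(stream(payload) == payload)   # True
```

The point of the control byte is visible in the loop: neither side moves on until it has seen the other side's latest write, which is exactly why the S-CPU is tied up for the whole transfer.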
You should only need one HDMA channel, or maybe two depending on how you're setting up the tables, because you can only write to each port once during HBlank or the SPC700 will miss writes no matter how tight the APU-side loop is. This defines a theoretical maximum HDMA-to-APU bandwidth of 3x225(?)x60 = 40,500 bytes per second in the general case (assuming overscan is off), because you need a control byte to tell the SPC700 the data is new, meaning you can only give it three bytes of actual data. It is probably possible to write a loop on the SPC700 tight enough to get three bytes every scanline; however, it would likely be unable to do anything else in between and would have to be told that VBlank had started so it could go run the audio engine. Lower bitrates, such that new data is only provided every two or three lines, are more forgiving and might allow for high-granularity interleaving of streaming and processing.
A tight handshake loop between the S-CPU and the SPC700 can reach much higher bitrates - blargg wrote a demo that streams 32 kHz stereo uncompressed to the echo buffer, which is 128,000 bytes per second, but between decompressing the audio (he used a custom lossless codec) and feeding the result to the APU, there wasn't much CPU time left over for anything else.
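The two bandwidth figures in this post can be reproduced with a few lines of Python. The 225 lines per frame is the uncertain "225(?)" from above, and 16-bit samples are an assumption implied by the 128,000 bytes per second figure; the function names are invented for the example.

```python
def hdma_apu_bytes_per_second(data_bytes_per_line=3, lines_per_frame=225,
                              frames_per_second=60):
    """Ceiling for HDMA-fed streaming: one control byte plus three data
    bytes per scanline. 225 lines is the post's own uncertain figure."""
    return data_bytes_per_line * lines_per_frame * frames_per_second


def pcm_bytes_per_second(sample_rate=32000, channels=2, bytes_per_sample=2):
    """Raw PCM rate; 16-bit samples assumed (implied by 128,000 B/s)."""
    return sample_rate * channels * bytes_per_sample


if __name__ == "__main__":
    hdma = hdma_apu_bytes_per_second()
    pcm = pcm_bytes_per_second()
    print(hdma)         # 40500
    print(pcm)          # 128000
    print(pcm > hdma)   # True: raw 32 kHz stereo needs the handshake loop
```

So uncompressed 32 kHz stereo needs a bit over three times what the HDMA route can deliver even in the best case.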
Quote:
I honestly have no clue how the SPC700 works other than that it has 8 PCM channels though.
Technically, the SPC700 has no audio capability at all. It's an 8-bit CPU based heavily on the 6502 but with extra features. Nintendo refers to it as the S-SMP. It controls the S-DSP, which is the actual audio chip; the S-DSP is the one with 8 gaussian-interpolated PCM channels using BRR-encoded samples, optional ADSR or manual gain, pitch modulation, the option to generate various flavours of noise on any channel instead of using a sample, and an echo feature that uses main audio RAM and can overwrite your program if you're not careful.
But most people just use "SPC700" to mean the whole APU.
93143 wrote:
A tight handshake loop between the S-CPU and the SPC700 can reach much higher bitrates - blargg wrote a demo that streams 32 kHz stereo uncompressed to the echo buffer, which is 128,000 bytes per second, but between decompressing the audio (he used a custom lossless codec) and feeding the result to the APU, there wasn't much CPU time left over for anything else.
Well, I don't think you'd really need to compress anything in this case. Just like the creators of the original game, I'm not the least bit concerned about memory.
I'd also much rather not waste precious HDMA channels...
93143 wrote:
Technically, the SPC700 has no audio capability at all. It's an 8-bit CPU based heavily on the 6502 but with extra features. Nintendo refers to it as the S-SMP. It controls the S-DSP, which is the actual audio chip; the S-DSP is the one with 8 gaussian-interpolated PCM channels using BRR-encoded samples, optional ADSR or manual gain, pitch modulation, the option to generate various flavours of noise on any channel instead of using a sample, and an echo feature that uses main audio RAM and can overwrite your program if you're not careful.
Oh, I get it now. It's like on the Irem M92, the V30 controls the GA20 (I called it by the wrong name earlier) that creates the PCM samples and the Yamaha YM2151 (which is apparently closely related to the Genesis's own Yamaha chip) for FM sound. (Apparently, the Irem M72 also uses the Yamaha YM2151, as seen in this video:
http://www.youtube.com/watch?v=m2gNq11ept4)
Also, I looked at the Neo Geo, and it has a Z80 for controlling the Yamaha YM2610, which is basically (from what I can tell) 1 "super" PCM channel, 6 normal PCM channels, 4 FM channels, and a bunch of other junk for a total of 15 different channels. This is it:
https://en.wikipedia.org/wiki/Yamaha_YM2610
I was always confused about what exactly the SPC700 is. The sound chip, or that secondary CPU that controls it. Why do so many systems use a second CPU for sound anyway? Wouldn't it be cheaper to just let the main CPU access it directly, and to occasionally steal cycles from the CPU to access RAM?
SPC700 is the name of a CPU, just as i386 and 65816 are the names of CPUs.
- S-CPU contains a 65816 CPU and a memory controller.
- S-SMP contains an SPC700 CPU, some timers, and a 64-byte bootloader.
- The "sound module" contains the S-SMP and the S-DSP. In the oldest console revisions, especially those with a detachable sound module, these were two separately packaged ICs.
These are the communication paths:
Code:
                    ROM
                     ^
                     v
S-PPU2, S-PPU1 <=> S-CPU <=> S-SMP <=> S-DSP
  ^       ^          ^         ^         ^
  v       v          v         v         v
    VRAM            WRAM        Audio RAM
DMA is done by connecting either ROM or WRAM to a different device through the S-CPU's memory controller.
In other words, SPC700 + I/O registers = S-SMP, and S-SMP + S-DSP + RAM = sound module?
Why are there so many Ss? The S stands for "Super" right? Why did they need to call everything "Super?"
psycopathicteen wrote:
In other words, SPC700 + I/O registers = S-SMP, and S-SMP + S-DSP + RAM = sound module?
Why are there so many Ss? The S stands for "Super" right? Why did they need to call everything "Super?"
All the SNES ASICs have names starting with "S-", probably to distinguish them from ASICs for other Nintendo products. S-CPU, S-PPU1, S-PPU2, S-ENC (the composite video encoder), S-SMP, S-DSP and S-WRAM. Yes, the 128KB RAM is a custom chip too--the logic that mirrors just the first 8KB over every bank, and the $2180-$2183 address port/data port registers, are both built into the RAM itself.
The S distinguishes the S-CPU from the NES CPU and the S-PPUs from the NES PPU.
I don't remember what Nintendo called the N64's MIPS-architecture CPU, but its PPU was the RCP (Reality Coprocessor). Just as the NES CPU comprises the CPU proper and APU, the RCP consists of the RSP (Reality Signal Processor), which handles vertex shading and apparently audio mixing, and the RDP (Reality Drawing Processor), a fixed-function triangle rasterizer. Starting with the GameCube, Nintendo followed Atari's practice in adopting human-readable names for each main chip: Gekko (PowerPC G3 CPU) with Flipper (ArtX fixed-function GPU) on the GameCube, Broadway (overclocked Gekko) with Hollywood (ArtX GPU and ARM9 IOP) on the Wii, and the Wii U multi-chip module containing Espresso (multicore PowerPC G3 CPU) and Latte (AMD Radeon GPGPU, ArtX GPU for Wii mode, ARM9 IOP, and 32 MB of VRAM).
psycopathicteen wrote:
I was always confused about what exactly the SPC700 is. The sound chip, or that secondary CPU that controls it. Why do so many systems use a second CPU for sound anyway? Wouldn't it be cheaper to just let the main CPU access it directly, and to occasionally steal cycles from the CPU to access RAM?
I don't really get it either... It's not like there's a separate CPU to handle the video hardware.
tepples wrote:
Starting with the GameCube, Nintendo followed Atari's practice in adopting human-readable names for each main chip
I personally think calling it by what it is makes more sense. The PPU is indeed a Picture Processing Unit, but the espresso isn't an espresso.
tepples wrote:
RCP
Kind of like this?
tepples wrote:
VRAM
I thought that had gone in favor of unified memory. I'm assuming it's for textures? I guess it could also be for the framebuffer. It seems too small for textures, but too large for a framebuffer.
Anyway, about changing colors mid-screen for sprites on the SNES, the way I thought of would be to have a list consisting of 224 entries (224 scanlines), with each entry containing 8 sub-entries (8 palettes). There would be a routine that would look at the height of an object, what vertical position it's at, and what palette it's requesting, and it would go to the list and fill it out accordingly. Obviously, the position of sprites with a different palette could overlap, so you'd need to find a palette slot that's open. One problem would be that something that's palette 0 at the top could be palette 1 at the bottom, so you'd have to make it intelligently rearrange it as it parses the giant list to make a table. Honestly though, this is going to be ungodly slow. Just think, it's got to check 8 palettes from the previous line against the 8 palettes from the current line, and it's going to have to do that 224 times. It's also going to have to know to not try and overwrite anything when there are empty palettes available. It's also going to have to decide as to what palettes are safe to overwrite and what aren't for when there are more than 8 palettes total. Again, very slow...
Espozo wrote:
It's not like there's a separate CPU to handle the video hardware.
There is on the N64 (the RSP), and there is on the Jaguar (Tom).
Espozo wrote:
tepples wrote:
RCP
Kind of like this?
Perhaps the RC-P90 (a second-source FN P90) was named after the Reality Coprocessor for the same reason that the Klobb (a second-source Škorpion vz. 61) was named after Ken Lobb.
Espozo wrote:
tepples wrote:
VRAM
I thought that had gone in favor of unified memory. I'm assuming it's for textures? I guess it could also be for the framebuffer. It seems too small for textures, but too large for a framebuffer.
Unused VRAM is reportedly used as an L3 cache by the CPU.
Espozo wrote:
Anyway, about changing colors mid-screen for sprites on the SNES, the way I thought of would be to have a list consisting of 224 entries (224 scanlines), with each entry containing 8 sub-entries (8 palettes). There would be a routine that would look at the height of an object, what vertical position it's at, and what palette it's requesting, and it would go to the list and fill it out accordingly.
Ewww, flashbacks to Atari 7800 "zone lists"...
tepples wrote:
Perhaps the RC-P90 (a second-source FN P90) was named after the Reality Coprocessor for the same reason that the Klobb (a second-source Škorpion vz. 61) was named after Ken Lobb.
tepples wrote:
Unused VRAM is reportedly used as an L3 cache by the CPU.
What is "L3"? Also, other than the "L3 cache", what is occupying it?
tepples wrote:
Ewww, flashbacks to Atari 7800 "zone lists"...
What in the world is that? Also, I didn't know you did anything with the Atari 7800.
Espozo wrote:
tepples wrote:
Unused VRAM is reportedly used as an L3 cache by the CPU.
What is "L3"? Also, other than the "L3 cache", what is occupying it?
Level 3 cache contains whatever is too big to fit in level 1 cache (the fastest memory closest to the CPU) and too big to fit in level 2 cache (the second fastest memory second closest to the CPU). Anything too big to fit in level 3 cache has to be dumped out to main RAM.
Quote:
tepples wrote:
Ewww, flashbacks to Atari 7800 "zone lists"...
What in the world is that? Also, I didn't know you did anything with the Atari 7800.
I never did anything with the Atari 7800, but I do recall discussions about it in #nesdev. The screen is divided into "zones", horizontal strips 8 or 16 pixels tall, each with its own sprite display list. An object that straddles zones has to be drawn in each zone that it crosses. During each scanline, the Maria GPU halts the CPU (by pulling RDY low), reads the display list for the current zone, and draws the pixels of the objects in the current zone to a line buffer. (At the same time, it's sending pixels in the other line buffer to the screen.) Once the display list is finished, it releases RDY and lets the CPU continue executing.
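To make the "object that straddles zones" point concrete, here is a minimal Python sketch of building one display list per zone. It only models the bookkeeping described above, not the real Maria display list format; the function name, the 16-line zone height and the 192-line screen height are placeholder assumptions.

```python
ZONE_HEIGHT = 16      # the post says zones are 8 or 16 lines tall; 16 is assumed here
SCREEN_HEIGHT = 192   # hypothetical visible height, only used for clipping


def build_zone_lists(objects, zone_height=ZONE_HEIGHT, screen_height=SCREEN_HEIGHT):
    """objects: iterable of (name, top_y, height). Returns one list of names per zone."""
    zone_count = (screen_height + zone_height - 1) // zone_height
    zones = [[] for _ in range(zone_count)]
    for name, top_y, height in objects:
        top = max(top_y, 0)
        bottom = min(top_y + height, screen_height)  # exclusive
        if top >= bottom:
            continue  # entirely off screen
        first_zone = top // zone_height
        last_zone = (bottom - 1) // zone_height
        # An object that crosses a zone boundary is listed in every zone it touches.
        for zone in range(first_zone, last_zone + 1):
            zones[zone].append(name)
    return zones


zones = build_zone_lists([("ship", 10, 16), ("shot", 40, 4)])
```

With these inputs, "ship" covers lines 10 to 25, so it lands in both zone 0 and zone 1, while "shot" only appears in zone 2.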
You could potentially do this palette switching thing with coarser granularity, maybe 8 or 16 lines. No need to check every scanline unless that's the only way to make it work. Even then you'd only have to do it if you got a conflict; sprites come in at least 8- or 16-line chunks, so there's no way to miss one if your granularity is at least that small. And if you're willing to use an SA-1, the CPU hit wouldn't be as bad...
It could still be pretty bad, though. I haven't coded anything, but might it be possible to partially preschedule this, since the game is fairly linear? That way you'd reduce the number of objects and palettes you had to check in real time, and it would be more likely to end up tractable...
Espozo wrote:
Well, I don't think you'd really need to compress anything in this case.
It's true that just the streaming by itself might only take about 30% of the available cycles, but the fact is the SPC700 is fully occupied just receiving the data; it's not waiting for the decompression at all. So no matter what you choose to do with the remaining S-CPU time, it has to fit inside the APU streaming loop (maybe 70 FastROM cycles) if you want to maintain the bandwidth, which can't be good for code efficiency.
In blargg's demo, the SPC700 is actually past its theoretical bandwidth limit, since you only have about 8 cycles to pick up and store each byte, including transfer control overhead. That's barely enough for writing to a fixed location, but if you want indexing (you do), it doesn't work - to say nothing of task selection for an actual game audio engine, or buffer selection for multiple streaming samples. So in a game-like scenario you'd have a longer loop spanning more S-CPU cycles.
HDMA is much more likely to give you decent bandwidth without eating a large chunk of your CPU time. Remember what happened to Street Fighter Alpha 2?
Quote:
Just like the creators of the original game, I'm not the least bit concerned about memory.
Then you could probably store samples in HDMA format with the control bytes baked in, so you'd only need one channel.
Quote:
I'd also much rather not waste precious HDMA channels...
Well, if you find you need all eight for other stuff, you may need to take a CPU hit or an audio quality hit, or both, but that doesn't sound like a decision that needs to be made before the game engine's design has firmed up.
Let's see... you need a minimum of two for colour, one or maybe two or maybe even three for scroll, perhaps one for a window... you probably shouldn't use HDMA for OBSEL for the reason I've already given (it would probably interrupt sprite compositing)... Does the original game use any raster effects you'd need HDMA for?
93143 wrote:
You could potentially do this palette switching thing with coarser granularity, maybe 8 or 16 lines. No need to check every scanline unless that's the only way to make it work.
Yeah, you're right. It's not like you can really change an entire palette in one line anyway. I'd probably make it 8 lines and use 2 HDMA channels, so that it will be easier, as you will change one palette in one chunk.
93143 wrote:
It could still be pretty bad, though. I haven't coded anything, but might it be possible to partially preschedule this, since the game is fairly linear?
You could, but the results wouldn't be nearly as good and it would overall just be a pain. (I'm not a huge fan of hardcoding a bunch of stuff like this.) It's going to be kind of like the sprite pixel per scanline limit: You just have to pray that some things will be at different heights so nothing happens, and just like the sprite pixel per scanline limit, if there's too much on one line, graphical glitches will occur (although worse in this case).
93143 wrote:
Remember what happened to Street Fighter Alpha 2?
That's the reason? Oh...
93143 wrote:
Does the original game use any raster effects you'd need HDMA for?
Not much. I haven't seen many Neo Geo games use raster effects, presumably because they are less needed with more flexible "BG"s. There might be one or two instances where I would use it where the original game wouldn't though, like here:
The tanks and the sky would all be a raster split.
I'm just saying though, all but the 2 for changing sprite palettes don't need an HDMA channel solely for it, as that will be hardcoded. If I need more channels for scrolling, I'll make them that. If I need more for changing BG colors, I'll do that.
I don't really care if the colors aren't exactly the same as the arcade, just as long as it's not obvious.
You may be psycho, but I guess I'm crazy because I care. Also, if you did that, some of the graphics might need to be reworked. If you have something that uses 8 shades of one color and 7 of a different color and something that uses 7 shades of a closely related color and 8 of a different closely related color, something is going to have to use one less color.
Basically though, what I'm thinking is you'll go through the object list, get the object's y position relative to the screen, divide it by 8, and then fill out the row of that number with the palette it's requesting. One thing though, you'd need to figure out how tall the object is. That makes me wonder, how do most games have the object's coordinates relative to the visual representation of the object? And also, say if you were standing in one place and the animation changed, like a kick, you wouldn't change the coordinates, would you? I mean, I'd think you'd have the coordinates be arbitrary in that you'd have registers that would say the position of the top left corner of the visual representation relative to the object coordinates, and the bottom right corner. Wait a minute, changing that wouldn't be any better than just changing the coordinates of the object and saying how tall and wide it is. But I just thought of something... The hit box often doesn't meet up perfectly with the visual representation, so you'd need a reference point, which would be the object's coordinates...
Don't you have to rework all the graphics anyway because of the different pixel aspect ratio? Otherwise it would look fat, and you wouldn't be able to see as much of the action.
93143 wrote:
Don't you have to rework all the graphics anyway because of the different pixel aspect ratio? Otherwise it would look fat, and you wouldn't be able to see as much of the action.
It's a 19 to 16 ratio in terms of screen size (8 pixels are clipped on both sides on the Neo Geo). Just cut off 24 pixels on each side and there you have it.
Attachment: 304 Pixels Wide.png
Attachment: 256 Pixels Wide.png
Attachment: 256 Pixels Wide Stretched to 304.png
I suppose some important things are cut off, but oh well.
TMS9918/NES/SNES dot clock: 945/176 = 5.37 MHz
Neo Geo dot clock: 6.00 MHz
Ideal squash ratio: 315/352 = 89.5% of width
which means you can maintain proportions by doing this:
- Squash 304 pixels to 272 using some sort of autohinting scaling algorithm analogous to Rotsprite.
- Chop off the 8 pixels at far left and right.
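The arithmetic above can be checked with exact fractions. This is just a sketch of the numbers already quoted in the post (the two dot clocks, the 304-pixel source width and the 8-pixel crop on each side); the variable names are made up for illustration.

```python
from fractions import Fraction

SNES_DOT_CLOCK_MHZ = Fraction(945, 176)   # about 5.37 MHz
NEO_GEO_DOT_CLOCK_MHZ = Fraction(6)       # 6.00 MHz

# A slower dot clock means wider pixels, so the art has to be squashed by this factor.
squash_ratio = SNES_DOT_CLOCK_MHZ / NEO_GEO_DOT_CLOCK_MHZ

neo_geo_width = 304
squashed_width = round(neo_geo_width * squash_ratio)
cropped_width = squashed_width - 2 * 8    # chop 8 pixels off the far left and right

print(squash_ratio, float(squash_ratio), squashed_width, cropped_width)
```

The ratio reduces to 315/352, the squashed width rounds to 272, and cropping 8 pixels per side leaves exactly the 256 pixels the SNES displays.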
I remember how time consuming it was to make the Gunstar Heroes sprites skinnier, and I didn't like the fact that you can't redraw the background to be skinnier, because you're stuck with the tile grid. You can probably dynamically load tile patterns as the level scrolls, but it will consume more CPU time, DMA time, and you'll have less flexible color usage. I'd probably try this out for the heck of it, when I have a long stretch of days when I'm not busy.
psycopathicteen wrote:
I remember how time consuming it was to make the Gunstar Heroes sprites skinnier, and I didn't like the fact that you can't redraw the background to be skinnier, because you're stuck with the tile grid.
I thought Neo Geo backgrounds were made out of sprites, not a tile grid. True, each sprite is a column of 16x16 pixel tiles, but they aren't reused nearly as tightly as, say, NES tiles. Can you show a map of how tiles are reused?
Quote:
You can probably dynamically load tile patterns as the level scrolls
As I do in my current NES project. Yet the artist still tries to use more than 256 distinct tiles in a single part of a level.
tepples wrote:
I thought Neo Geo backgrounds were made out of sprites, not a tile grid. True, each sprite is a column of 16x16 pixel tiles
Sprites can be "stuck" together in hardware. I think I remember each sprite has a "sticky" bit that attaches it to the right side of the lower numbered sprite. (The first sprite wraps all the way around.) When they are attached, each sprite's own tile height and vertical shrink are ignored, because it takes those values from the leftmost sprite in the chain. You can stick as many sprites as you want, and the horizontal shrink for one sprite can change, and the ones on the right of it will move over accordingly, but they won't shrink also. You manually have to horizontal shrink all of them.
tepples wrote:
but they aren't reused nearly as tightly as, say, NES tiles. Can you show a map of how tiles are reused?
I think palettes are the issue...
tepples wrote:
You can probably dynamically load tile patterns as the level scrolls
Well, in this case, you'd definitely have to.
Still though, how do most people attach the coordinates of the object to the actual object?
Also, one thing that would really help is if I somehow knew what palettes are being used for each sprite tile in the game.
Yeah, there's no way this is going to work without reducing the number of frames to 2...
Attachment: Again wishing the SNES had CHR ROM....png
(Who in the world would have the patience to draw that?)
Why? It's not like you have to switch out the whole image; just the parts that move. And I can already see some parts of the waterfall that seem to be re-used multiple times in a frame; careful reworking might decrease the tile count even more.
How are you planning to animate this? I figure you've got four options:
1) stick all the tiles and tilemaps in VRAM, and switch tilemaps via BGnSC.
2) stick all the tiles in VRAM, and DMA a new tilemap (can perhaps get away with a fairly small transfer if using 16x16 tiles).
3) leave the tilemap alone and DMA new tiles as required (could eat a lot of DMA).
4) use two tilemaps and double buffer the animated tiles (but obviously not the static tiles). This could let you use a moderate amount of VRAM and a moderate amount of DMA per frame without limiting the number of frames of animation, though the frame rate is limited by how much DMA you can spare.
93143 wrote:
Why? It's not like you have to switch out the whole image; just the parts that move.
Which is half the image...
93143 wrote:
I can already see some parts of the waterfall that seem to be re-used multiple times in a frame
Where? From what I see, it's all being animated, just very gradually. What you might be able to get away with is having more frames of animation for parts that seem to change more.
93143 wrote:
2) stick all the tiles in VRAM, and DMA a new tilemap (can perhaps get away with a fairly small transfer if using 16x16 tiles).
That's what I was thinking. And yes, 16x16 tiles are going to be used, as that's the tile size on the Neo Geo anyway.
93143 wrote:
3) leave the tilemap alone and DMA new tiles as required (could eat a lot of DMA).
That would eat about all of DMA...
Also, I think I came up with a solution for the box for the visual representation and the coordinates. The coordinates would be in the center of the object for flipping it (like if you had a picture of someone with their leg out, you wouldn't flip it in the center of the picture, just the center of their body) and the animation engine would load the size and position of the visual representation box. I really only need height, but I imagine width will come in handy.
Espozo wrote:
93143 wrote:
I can already see some parts of the waterfall that seem to be re-used multiple times in a frame
Where? From what I see, it's all being animated, just very gradually. What you might be able to get away with is having more frames of animation for parts that seem to change more.
Look closely at a single frame of the waterfall. Just about every feature in the top two thirds of the image shows up at least twice, sometimes more.
(Or were you already taking that into account?)
Does the original actually animate at 60 fps?
Okay, having checked the game again, I'll comment on some stuff...
93143 wrote:
Look closely at a single frame of the waterfall. Just about every feature in the top two thirds of the image shows up at least twice, sometimes more.
It doesn't appear so...
93143 wrote:
Does the original actually animate at 60 fps?
Yes.
Espozo wrote:
93143 wrote:
Look closely at a single frame of the waterfall. Just about every feature in the top two thirds of the image shows up at least twice, sometimes more.
It doesn't appear so...
Look closer.
I'm not talking about repeating tiles across animation frames. I'm talking about repeating tiles within a single frame.
The most obvious example is the big dark grey rock outcropping partway down, which shows up twice; once in the middle, once on the left and a bit lower down. But there are others. Even the spray at the bottom has a repeated chunk...
Quote:
93143 wrote:
Does the original actually animate at 60 fps?
Yes.
That seems like overkill to me. It could probably be taken down quite a bit before it started to look bad. On the flip side, animating just two frames that fast would definitely look bad...
93143 wrote:
I'm not talking about repeating tiles across animation frames. I'm talking about repeating tiles within a single frame.
Oh... I see. I thought you meant across multiple frames.
93143 wrote:
That seems like overkill to me. It could probably be taken down quite a bit before it started to look bad. On the flip side, animating just two frames that fast would definitely look bad...
Agreed. I think 4 frames at 30fps sounds doable. It's kind of strange how the giant waterfall is animated at 60fps when the (comparatively) small explosions are animated at about 30. (it slows down near the end).
One extremely odd thing I just noticed is that the player's grenades move at 30fps. I don't mean they're animated at that speed, but they were actually programmed to hang in the air for a frame. I'm guessing it just follows a table for the position of the grenade while airborne, and does a collision check with the ground to then use a "bouncing" table. I honestly couldn't care less about emulating odd behaviors like this.
Edit: Wait, what the heck?
Everything moves at 30fps. (Even though they felt the need to animate the giant waterfall at 60fps for whatever reason...) And no, this isn't some kind of weird MAME behavior, because it doesn't do this to other games. (GunForce 2 forever!)
Shoot, if there are problems with processing, can't you just make it run at 30fps on the SNES?
Also, it works just like this on Metal Slug 2. How the hell is it that that game can barely even run at 30fps?
You know, it really is a shame that you can't "freeze" the screen on the SNES to where you could devote a whole frame to vram updates... Well, you could have it flash back and forth.
Espozo wrote:
I think 4 frames at 30fps sounds doable.
Perhaps. There might be bandwidth issues, though, since on top of the main action you've got a bunch of water in the foreground too... Maybe you could do all 8 frames at 15 fps, and just accept that the animation is going to look a bit different...
Quote:
Wait, what the heck? Everything moves at 30fps. [...] Shoot, if there are problems with processing, can't you just make it run at 30fps on the SNES?
Well, doing 60 would be inaccurate, right?
If you ended up needing an SA-1, I don't think you'd have trouble with processing at 30 fps or maybe even 60. Even if the original were programmed with very high efficiency, which it probably isn't, a 10.74 MHz 65C816 should be able to beat a 12 MHz 68000 handily (and of course a 2.68 MHz 65C816 wipes the floor with a 4 MHz Z80).
93143 wrote:
Well, doing 60 would be inaccurate, right?
I don't think I've ever seen any objects that get animated that high, only the backgrounds for whatever reason. (On level 3 when you destroy the building, the background changes color every frame to make it look sunny I guess.) Anyway though, you can definitely save time considering the only thing that's happening on half of the frames is that the animation only for some things is being updated. You wouldn't have to do collisions or a majority of the processing for objects, and that whole palette scheme I came up with would only have to be figured out every other frame because nothing is moving to cause issues. Basically, the game runs at 30fps. Wow.
93143 wrote:
If you ended up needing an SA-1, I don't think you'd have trouble with processing at 30 fps or maybe even 60.
Hell, I don't think the stock SNES would have trouble with 60 if it weren't for all the extra crap the Neo Geo game doesn't do, like looking for empty slots in vram, the palette thing, that midscreen split, etc. And of course, all of these issues are tied to the PPUs... Maybe aside from the midscreen split that only occurs for one brief moment, there isn't a fiber in my body that says the SNES couldn't handle how it is, that is, (mostly) at 30fps.
93143 wrote:
Even if the original were programmed with very high efficiency, which it probably isn't
I know it isn't. Whenever you defeat a boss and shrapnel flies all over the place, the game comes to a stop (despite the fact that you can't even move) and everything starts flickering. (Not even the Contra III car explosion causes that.) I think it might have to do with this:
https://www.youtube.com/watch?v=-2PiaH8CO64 What the heck is even going on? I can understand something changing the sprites it's using if an object changed the number of sprites it was using, but this would be a minor difference, it's not like changing back and forth like that. It's like the game was thrown together using Elmer's glue.
Also, whenever the "MISSION COMPLETE" letters go away, the game also slows down. Oddly enough, Metal Slug 2 doesn't have this problem, only the first one. I guess they were so focused on making those letters move smoothly that they forgot about the rest of the game. (Kind of like they were so busy animating that waterfall that they couldn't get the rest of Metal Slug 1 to run at 60fps.)
93143 wrote:
a 10.74 MHz 65C816 should be able to beat a 12 MHz 68000 handily
It would probably be about 1.5 to 1.75x faster.
There is nothing I see in the Metal Slug games that couldn't be processed on the Genesis at 60fps. I got to give it to Stef in that Gunstar Heroes is more impressive from a processing point of view (definitely not artwork.)
You may not think the game looks pretty, but if the SNES can process everything going on here at 60fps without even a hint of slowdown, there shouldn't be a problem with Metal Slug in its 30fpsness.
https://www.youtube.com/watch?v=LUH32GRqqHI (look at the space shooter segments.)
I know I sound crazy, but I just lost a great deal of appreciation for Metal Slug after finding that out. Gunforce 2 may slow down 50% of the time, but at least it targets 60fps and when it does slow down, it's at least justified.
Okay, this is a bump (not like anyone here thankfully cares) but I was thinking this would work:
Code:
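lda VisualAreaYHeight            ; height of the visible area in pixels
beq next_object                  ; nothing to draw
lsr                              ; divide by 8 with LSR; ROR would rotate
lsr                              ; the carry flag into the top bit
lsr
tay                              ; Y = number of 8-line rows still to mark
lda YPosition
clc
adc VisualAreaYStartingPosition  ; top edge of the visible area
and #$FFF8                       ; snap to the start of its 8-line row
sta TemporaryA                   ; TemporaryA = offset of the current row
tax
line_checker:
lda a:PalettesUsedPerLine,x      ; palettes already claimed on this row
cmp #$0008
bcs preparation_for_next_row     ; row is full, skip it
inc a:PalettesUsedPerLine,x      ; claim a slot while X still points at the row
adc TemporaryA                   ; carry is clear here: A = row offset + slot
tax
lda Palette
sta a:PaletteTable,x
preparation_for_next_row:
dey
beq next_object
lda TemporaryA                   ; advance from the row offset, not the slot index
clc
adc #$0008
cmp #$00E0                       ; stop at the bottom of the screen (224 lines)
bcs next_object
sta TemporaryA
tax
bra line_checker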
But then I realized that if I did this, then the position of a color palette would go all over the place, and you can't change OAM during hblank... I'm honestly stumped as to how to go about this. I don't even think I'm going to worry about Metal Slug anymore (shocker!) but I still want to see how well this would work.
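For anyone who wants to see the "goes all over the place" problem without reading 65816, here is a rough Python sketch of the same first-free-slot idea. It is not a translation of the routine above; the table layout and the names (`fill_rows`, `ROWS`, `SLOTS_PER_ROW`) are assumptions made up for illustration.

```python
ROWS = 224 // 8        # 28 rows of 8 scanlines each
SLOTS_PER_ROW = 8      # the SNES has 8 sprite palettes


def fill_rows(objects):
    """objects: iterable of (top_y, height, palette). Returns table[row] = palettes in slot order."""
    table = [[] for _ in range(ROWS)]
    for top_y, height, palette in objects:
        if height <= 0:
            continue
        first_row = max(top_y, 0) // 8
        last_row = min(top_y + height - 1, 223) // 8
        for row in range(first_row, last_row + 1):
            slots = table[row]
            if palette in slots:
                continue                  # already loaded on this row
            if len(slots) < SLOTS_PER_ROW:
                slots.append(palette)     # take the first free slot
            # else: the row is full and this palette is simply dropped
    return table


# Two overlapping objects: A covers rows 0-1, B covers rows 1-2.
table = fill_rows([(0, 16, "A"), (8, 16, "B")])
```

Here B ends up in slot 1 on row 1 but in slot 0 on row 2, so a sprite using B would need a different palette number partway down, which is exactly what OAM can't give you mid-frame.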
...And to further take away from Metal Slug, I remember that the Neo Geo has an auto animate feature, where you can tell it how many tiles after the one you selected you want it to cycle through, as in if I chose it to start at tile 2 and wanted it to have 4 frames of animation, it would go to tile 2, then tile 3, then tile 4, then tile 5, and then back to tile 2 and would keep doing that over and over again.
So yeah, it's a 30fps 2D game... :/
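The auto-animate behaviour as described in this post boils down to a modulo on a shared counter. This sketch follows the description above only; how the real hardware selects the frame may differ, and the function name and `tick` parameter are hypothetical.

```python
def auto_animated_tile(start_tile, frame_count, tick):
    """Tile shown at animation step `tick` for a sprite that starts at `start_tile`."""
    return start_tile + (tick % frame_count)


# Start at tile 2 with 4 frames of animation, as in the example above.
sequence = [auto_animated_tile(2, 4, tick) for tick in range(6)]
```

Stepping `tick` from 0 to 5 yields tiles 2, 3, 4, 5 and then wraps back to 2, 3.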
Instead of going through an object list and assigning palettes, why not go through a palette list and check for objects using them? That way you could group similar palettes and arrange them to minimize artifacting in case of overload.
Caveat: My brain is a little fried because I had a very busy and stressful week, so I haven't actually thought this through...
Quote:
So yeah, it's a 30fps 2D game... :/
That does make me feel better about my shmup port. Most of the bullet patterns are going to have to be at 30 fps...
93143 wrote:
Instead of going through an object list and assigning palettes, why not go through a palette list and check for objects using them? That way you could group similar palettes and arrange them to minimize artifacting in case of overload. Caveat: My brain is a little fried because I had a very busy and stressful week, so I haven't actually thought this through...
Yeah, I'm completely lost as to what to do right now. :/ I guess you'd have to have code that could read the first line, see that palette #$FFFF is palette 0, see the second line, look for palette #$FFFF first, and then make that the first one on the second line and then keep doing the same thing, so in the case of checking 8 palettes against 8 palettes, that's 64 checks... Also, it will have to know to try to change palettes that are barely used instead of ones that go all the way across the screen except with some gaps in-between. I wonder if having a total list of palettes that are supposed to be onscreen regardless of height would help in some way.
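One way to read the "go through a palette list" suggestion is to give each palette a single slot that it keeps on every row it appears on, placing the palettes that cover the most rows first. The sketch below is only one greedy interpretation, not something anyone in the thread wrote; `assign_stable_slots` and the table layout are assumptions.

```python
ROWS = 224 // 8
SLOTS = 8


def assign_stable_slots(objects):
    """objects: iterable of (top_y, height, palette).

    Returns (assignment, occupied): assignment maps palette -> slot (or None on
    overload), occupied[row][slot] holds the palette loaded there.
    """
    rows_needed = {}
    for top_y, height, palette in objects:
        if height <= 0:
            continue
        first_row = max(top_y, 0) // 8
        last_row = min(top_y + height - 1, 223) // 8
        rows_needed.setdefault(palette, set()).update(range(first_row, last_row + 1))

    occupied = [[None] * SLOTS for _ in range(ROWS)]
    assignment = {}
    # Palettes spanning the most rows go first, so barely used ones lose out on overload.
    for palette in sorted(rows_needed, key=lambda p: -len(rows_needed[p])):
        rows = rows_needed[palette]
        for slot in range(SLOTS):
            if all(occupied[row][slot] is None for row in rows):
                for row in rows:
                    occupied[row][slot] = palette
                assignment[palette] = slot
                break
        else:
            assignment[palette] = None  # no slot is free on every row it needs
    return assignment, occupied


assignment, occupied = assign_stable_slots([(0, 16, "A"), (8, 16, "B")])
```

With the same two overlapping objects as before, A keeps slot 0 and B keeps slot 1 on both of its rows. The cost is that a slot can sit unused on some rows, so overload happens sooner than with per-row packing.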
How in the world can something that seems so simple be so processing intensive?
93143 wrote:
That does make me feel better about my shmup port. Most of the bullet patterns are going to have to be at 30 fps...
Only bullets? Wait a minute, I just thought of something... I know there isn't going to be enough time to check collisions at 60fps, but do you think you could still move the bullets at that? Sure, the hit boxes will be off by about a pixel or two during some frames, but then again, relative to 60fps, it's better to only have one probably not too noticeable thing be off (hangs for a frame) than to have that and the actual visual representation be off too.
Espozo wrote:
I know there isn't going to be enough time to check collisions at 60fps, but do you think you could still move the bullets at that?
Actually, I think I can do collisions at 60 fps on the GSU, if I'm clever about it. And I'd rather not have exploits show up involving fast bullets skipping over the player...
The problem is DMA. Regardless of how fast I can finish drawing a frame, I can't transfer it to VRAM faster than 30 fps unless I drop to 2bpp, which is usually (but not always) visually unacceptable.
Espozo wrote:
You can stick as many sprites as you want, and the horizontal shrink for one sprite can change, and the ones on the right of it will move over accordingly, but they won't shrink also. You manually have to horizontal shrink all of them.
This is in large part because sprites are always 16 pixels wide, and as such the shrink value goes from 1 pixel to 16 pixels (it's only 4 bits long). By having each sprite keep its horizontal shrink value, you still keep the ability to shrink with pixel granularity (precisely because not all the shrink values have to be identical).
Would have been easier if the shrink value was larger and only the first sprite had to be set, but oh well, probably hardware implementation details (given it uses a look-up table internally to do the shrinking, keeping it as small as possible was probably a good idea).
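Here is a small sketch of why per-sprite shrink values give pixel granularity for a whole chain. It assumes each sprite in the chain can be 1 to 16 pixels wide and that each attached sprite starts where the previous one ends, as described above. The function names are hypothetical, and the real hardware goes through its lookup table rather than anything like this.

```python
def chain_widths(sprite_count, target_width):
    """Spread target_width pixels as evenly as possible over sprite_count chained sprites."""
    if not sprite_count <= target_width <= 16 * sprite_count:
        raise ValueError("each sprite must end up between 1 and 16 pixels wide")
    return [
        (target_width * (i + 1)) // sprite_count - (target_width * i) // sprite_count
        for i in range(sprite_count)
    ]


def chain_positions(start_x, widths):
    """X position of each chained sprite: it sits right after the one to its left."""
    positions = []
    x = start_x
    for width in widths:
        positions.append(x)
        x += width
    return positions


widths = chain_widths(4, 50)               # a 64-pixel-wide chain shrunk to 50 pixels
positions = chain_positions(100, widths)
```

Shrinking four sprites to 50 pixels total gives widths of 12, 13, 12 and 13, which would be impossible if every sprite in the chain had to share one 4-bit value.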
93143 wrote:
I figured a single scanline was probably too fast to fit a high-precision music engine in around the data streaming, but two might just work, and three would be easier. At three bytes every three scanlines, 24 kHz mono just barely fits in a frame.
Okay, that's a load of rubbish. Try this on for size:
Code:
; A streaming HDMA table consists of multiple data chunks, with gaps to allow
; the APU to do tasks mid-frame. Each data chunk is preceded by a heads-up
; to the APU, followed by a gap long enough to guarantee that the audio
; engine will finish what it was doing, notice the incoming data flag, and
; begin polling port 0 for the data start flag. After this gap, the data
; start flag, data ID number (for multiple stream capability) and data length
; in scanlines is sent, and on the next scanline the data chunk begins.
data_incoming_HDMA:
mov A, #data_start_HDMA ; 2 cycles - load data start flag value
- cbne $F4, - ; 7 cycles - listen for the write
; This point is reached roughly 3-9 cycles after $2140 is written, assuming
; CBNE loads the comparison value before the branch target.
mov A, $F5 ; 3 cycles - load data ID number
mov temp, $F6 ; 5 cycles - load chunk size in scanlines
; The APU should have assigned the data ID to a buffer when it sent the
; data request, so all I have to do is find the buffer in question.
cbne buf1_id, buf2check ; 7/5 cycles - check buffer 1, skip next two instructions if no match
mov X, #buf1 ; 0/2 cycles - load direct-page address for buffer 1 data
jmp buf_found ; 0/3 cycles - skip ahead
buf2check:
cbne buf2_id, use_buf3 ; 7/5 cycles - check buffer 2, skip next two instructions if no match
mov X, #buf2 ; 0/2 cycles - load direct-page address for buffer 2 data
jmp buf_found ; 0/3 cycles - skip next instruction
use_buf3:
mov X, #buf3 ; 2/0 cycles - load direct-page address for buffer 3 data
; TOTALS: 18-25 since start flag noticed in $F4, 21-34 since $2140 written
; Okay, I've got the buffer (or, if none was assigned, I'll be corrupting buffer #3).
; Now I need to rewrite some MOV instructions in the streaming loop with the desired
; absolute addresses:
buf_found:
mov A, (X) ; 3 cycles - get low byte of buffer address
mov Y, $01+X ; 4 cycles - get high byte of buffer address
movw (get_data_HDMA+3), YA ; 5 cycles - write buffer address
clrc ; 2 cycles - not sure if needed
addw YA, one ; 5 cycles - I've wasted a byte of zero page memory on a constant
movw (get_data_HDMA+8), YA ; 5 cycles
addw YA, one ; 5 cycles - no need to CLRC as these cannot overflow
movw (get_data_HDMA+13), YA ; 5 cycles
addw YA, one ; 5 cycles
movw (get_data_HDMA+18), YA ; 5 cycles
; TOTALS: 44 cycles since buf_found, 65-78 since start flag written to $2140
; Okay, with that done, I just need to get the chunk size, zero X, and start the read loop:
mov Y, temp ; 3 cycles - pick up chunk size in scanlines
push X ; 4 cycles - store buffer pointer address
mov X, #$00 ; 2 cycles - set X to zero
jmp !get_data_HDMA ; 3 cycles - goto streaming loop in zero page
; TOTALS: 12 cycles since loop rewritten, 77-90 since start flag written to $2140
; Ideally, one scanline should be almost exactly 65 cycles long. The port reads
; are between cycles 3 and 30 past this point, putting them between 15 cycles after
; the first HDMA write and about 11 cycles before the fourth one on the next line.
; That should be good for at least several scanlines regardless of clock drift, no?
;===================================================================
; STREAMING LOOP IN ZERO PAGE:
get_data_HDMA:
mov A, $F4 ; 3 cycles
mov !buf+X, A ; 6 cycles
mov A, $F5 ; 3 cycles
mov !(buf+1)+X, A ; 6 cycles
mov A, $F6 ; 3 cycles
mov !(buf+2)+X, A ; 6 cycles
mov A, $F7 ; 3 cycles
mov !(buf+3)+X, A ; 6 cycles
inc X ; 2 cycles
inc X ; 2 cycles
inc X ; 2 cycles
inc X ; 2 cycles
cmp (X), (Y) ; waste 5 cycles
cmp (X), (Y) ; waste 5 cycles
cmp (X), (Y) ; waste 5 cycles
dbnz Y, get_data_HDMA ; 6/4 cycles
; TOTAL: 65 cycles
jmp !end_data_HDMA ; 3 cycles
; ZERO PAGE: 32 bytes code, 21 bytes buffer metadata, 2 bytes misc. storage
; = 55 bytes total, or ~21%. Maybe I should be using page 1 for this...
;===================================================================
end_data_HDMA:
mov A, X ; 2 cycles - load the buffer address index into A
pop X ; 4 cycles - pick up the buffer pointer address
clrc ; 2 cycles - clear carry
adc A, (X) ; 3 cycles - add the index to the low byte of the buffer pointer
mov (X), A ; 4 cycles - store the result back
mov A, Y ; 2 cycles - Y should be zero
adc A, $01+X ; 4 cycles - add zero to the high byte of the buffer pointer, with carry
mov $01+X, A ; 5 cycles - store the result back
cbne $05+X, done_HDMA ; 8/6 cycles - check high byte against buffer end address
mov A, (X) ; 0/3 cycles - pick up low byte
cbne $04+X, done_HDMA ; 0/8/6 cycles check low byte against buffer end address
mov A, $02+X ; 0/0/4 cycles - if end of buffer reached, load buffer start address low byte
mov (X)+, A ; 0/0/4 cycles - store to buffer pointer low byte and increment X
mov A, $02+X ; 0/0/4 cycles - load buffer start address high byte
mov (X), A ; 0/0/4 cycles - store to buffer pointer high byte
done_HDMA:
Caveat: I haven't tried this code, so I don't know if it's even correct, never mind if it works or not.
Well, it's good enough that it has sprite shrinking anyway. You know, can you really do much in terms of raster effects on the Neo Geo? I mean, you could pull off the Axelay effect if you were to change the first sprite's vertical shrinking value, but also shrinking horizontally would probably take far more bandwidth than what's available.
Sik wrote:
probably hardware implementation details
They didn't seem to have any trouble implementing other random features, like the hardware animation.
If it's just a lookup table though, why don't other systems have the same thing? I mean, no other system really has 512 sized sprites, so the lookup table wouldn't need to be that big, just like 32x32 or 64x64. You would also need it horizontally though.
Sik wrote:
you still keep the ability to shrink with pixel granularity
Do some systems that support scrolling not support it pixel per pixel? Well, I mean, I guess you could say the Atari 2600 has "sprite scaling".
I think it's kind of funny that this topic was brought up again. I'm sad that I never got to try the Splatoon yogurt though, and I'm also sad that there's no new free junk coming out for it.
The success of that game in Japan is pretty inkredible though. It sucks, because all the Japanese people occupy all the higher ranks in rank battle and my internet connection is about a solid second behind them.
93143 wrote:
Okay, that's a load of rubbish. Try this on for size:
What am I looking at?
That is an attempt at an APU-side HDMA streaming engine capable of approaching four bytes per scanline, with the ability to smoothly back off on the bandwidth in such a way that the audio engine can achieve good time resolution by doing processing in between data chunks. This should work regardless of how long the engine cycles are (within limits - an audio engine that takes most of a frame just to turn over once ain't gonna fit).
The idea is that a single control word need not indicate only a single shot of HDMA data. Here, there's an "incoming" command, which tells the APU to start polling $F4 as soon as it notices, a "start" command, which is bundled with some metadata and tells the APU to actually start receiving data, and an arbitrary number of actual data shots (the number is part of the metadata that came with the "start" command). This pattern can be repeated any number of times per frame. I got the idea from the streaming engine in Super SNESMod, which looks like it relies on timed code to minimize handshaking requirements.
I have a plan that requires both high-bandwidth HDMA streaming (well beyond 32 kHz mono) and high-granularity audio engine timing (one frame is nearly 17 ms, which I don't expect to be acceptable for what I'm trying to do). I'm hoping this code makes enough sense that something like it can actually work. (This is my first attempt at writing SPC700 code; the earlier example in this thread doesn't count.)
...
It just occurred to me that if the audio engine can be relied on to have set up the streaming buffer pointer beforehand, it could also set up the streaming code, and it shouldn't be necessary to do the code modification in between the "start" command and the beginning of data pickup. Moving it to afterwards would allow more than three streaming buffers and remove the requirement for the data pickup loop to be in direct page, but it would eat roughly an extra scanline of compute time after every data chunk, and it would require a separate data pickup loop for each buffer...
With more robust use of X, I could reduce the amount of code modification required (in fact I might try that; I think I could get the loop out of direct page), but I don't see how to eliminate it without either limiting buffer size to 256 bytes (which is bad for 32 kHz because a frame is 300 bytes) or using multiple data pickup loops for each buffer...
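As a rough check on the 300-byte figure: BRR packs 16 samples into each 9-byte block, so the per-frame byte count follows directly from the sample rate. This is only a back-of-envelope sketch in Python, assuming a nominal 60 frames per second; the function name is made up for illustration.

```python
def brr_bytes_per_frame(sample_rate_hz, fps=60.0):
    """Approximate BRR bytes consumed per video frame by one mono stream."""
    samples_per_frame = sample_rate_hz / fps
    return samples_per_frame * 9 / 16   # 9 bytes encode 16 samples


mono_32k = brr_bytes_per_frame(32000)        # one 32 kHz mono stream
stereo_32k = 2 * brr_bytes_per_frame(32000)  # two such streams
print(mono_32k, stereo_32k)
```

So a 256-byte buffer cannot hold even one frame of 32 kHz mono audio, which is why the 256-byte limit is a problem.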
Espozo wrote:
If it's just a lookup table though, why don't other systems have the same thing? I mean, no other system really has 512 sized sprites, so the lookup table wouldn't need to be that big, just like 32x32 or 64x64. You would also need it horizontally though.
Sega was going to implement it in the Mega Drive VDP, but ran out of die space (・~・) I suppose Space Harrier II and Super Thunder Blade were originally meant to use this feature?
And the problem with look-up tables is that, well, they take up a considerable amount of die space which made them really expensive compared to other stuff.
Sik wrote:
Sega was going to implement it in the Mega Drive VDP, but ran out of die space (・~・)
Same with everything else?
What was the Mega Drive originally supposed to be, an arcade machine?
Espozo wrote:
Sik wrote:
Sega was going to implement it in the Mega Drive VDP, but ran out of die space (・~・)
Same with everything else?
What was the Mega Drive originally supposed to be, an arcade machine?
It sounds like the Mega Drive was originally supposed to be a prototype Super Famicom.
More like they were trying to make something that could handle reasonably their then-current arcade games (including the superscaler ones). That didn't work out as you can see =P The Sega CD was originally an attempt to include the missing hardware, then somebody decided to add a CD drive (that was a last minute change) and then Digital Pictures lured Sega into thinking that FMV games were more important than, you know, all the other improvements >.<
Sik wrote:
then Digital Pictures lured Sega into thinking that FMV games were more important than, you know, all the other improvements >.<
Still, it all has to go through the VDP, which partially explains why the video looked like crap. (512 colors total isn't the greatest, but the bigger offender was the four 4bpp color palettes. I imagine they could have saved bandwidth and made the video larger if they cropped off the top and bottom of the screen, but I guess they liked the look of those ugly borders.) If only they knew about the phantom bitmap trick...
Sik wrote:
More like they were trying to make something that could handle reasonably their then-current arcade games (including the superscaler ones).
If only they could have gone a couple of years into the future to see this:
https://www.youtube.com/watch?v=MTzyz2TgGls
Espozo wrote:
I imagine they could have saved bandwidth and made the video larger if they cropped off the top and bottom of the screen, but I guess they liked the look of those ugly borders.
Eh, bandwidth wasn't the issue (you can easily get 15FPS without any sort of cheating), the problem was that videos had to be decompressed, since a 1x drive wasn't exactly capable of streaming uncompressed video (I made some calculations, it'd have gone at like 2FPS if they tried that). This is also why the video size kept increasing over time, they simply improved the codecs being used (case in point, that one is nearly fullscreen)
(EDIT: also thinking about it, how much uncompressed video can you cram into a CD, really?)
Espozo wrote:
If only they could have gone a couple of years into the future to see this:
https://www.youtube.com/watch?v=MTzyz2TgGls
To be fair, Sega wasn't aiming for kids either (a large part of why they could actually take over the NES in the US, their target audiences didn't exactly overlap).
Also what's at 1:01, F-Zero Kart? =P
Okay, let's try this again, a little less cryptic this time (and better code too, IMO):
I'm trying to get enough bandwidth for 32 kHz stereo streaming or better while preserving sub-frame audio processing resolution and not loading the S-CPU down very much. This is apropos of the Street Fighter Alpha 2 discussion earlier in the thread (three 22 kHz streams would be great), but I have applications of my own in mind for it.
So what I've done is I've tried to design an approach using HDMA, that uses single commands (well, pairs of) to set up block transfers with no fine-grained handshaking, so as to be able to feed the APU data in chunks during active display while allowing it to do other processing in between the chunks.
In my concept, the HDMA pattern would consist of a series of data blocks, each one consisting of (a) a "data incoming" command, (b) a gap long enough for the audio engine to notice said command and begin polling I/O, (c) a "data start" command, along with the data length in scanlines and a data ID for multi-stream support, and (d) four bytes of data per line, for the number of lines given in (c). Possibly also (e) a "no action" command, so the APU doesn't misinterpret part of the last data shot as some other command...
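One way to picture a single block of that pattern is as the HDMA table the S-CPU side would point a channel at. The sketch below builds such a table in Python, assuming the channel uses a transfer mode that writes four consecutive registers ($2140-$2143) per line. The command byte values and the function name are hypothetical, and the gap in (b) is modelled simply as the line count of the non-repeat "incoming" entry. The port layout (command in $2140, data ID in $2141, line count in $2142) follows the APU code in this post.

```python
CMD_INCOMING = 0xE0   # hypothetical "data incoming" command value
CMD_START = 0xE1      # hypothetical "data start" command value
CMD_NONE = 0x00       # hypothetical "no action" value


def build_block(data, data_id, gap_lines):
    """Return the HDMA table bytes for one data block (no table terminator)."""
    if len(data) % 4 != 0:
        raise ValueError("data must be a whole number of 4-byte shots")
    lines = len(data) // 4
    if not (1 <= lines <= 127 and 1 <= gap_lines <= 127):
        raise ValueError("line counts must fit in 7 bits")
    table = bytearray()
    # (a)+(b): write the command once, then hold it for gap_lines scanlines
    table += bytes([gap_lines, CMD_INCOMING, 0, 0, 0])
    # (c): start command, data ID, chunk length in scanlines
    table += bytes([1, CMD_START, data_id, lines, 0])
    # (d): repeat-mode entry, four fresh bytes on every scanline
    table += bytes([0x80 | lines]) + bytes(data)
    # (e): one line of "no action" so stale data is not read as a command
    table += bytes([1, CMD_NONE, 0, 0, 0])
    return bytes(table)


block = build_block(bytes(range(128)), data_id=1, gap_lines=9)
print(len(block))   # 16 bytes of framing plus the 128 data bytes
```

Several such blocks would be concatenated and followed by a zero byte to end the table.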
...
My first attempt, a bit upthread, used a lot of zero-page memory and was limited to three sample buffers at a time, because it relied on 16-bit code modification at runtime. Last night I modified the code to make better use of the index registers, freeing up a significant chunk of direct page and allowing up to 6 simultaneous buffers, although each buffer now has to start on a page boundary. I also moved the buffer metadata to page one, freeing up the rest of zero page (not sure how much that matters).
Unfortunately the new code imposes an additional restriction on the streaming format. The use of X as the low byte of the streaming buffer pointer means the data chunk size has to divide evenly into 256 bytes; otherwise I have to deal with overflow in the pickup loop, and there's no time for that. The buffer should probably also be a multiple of 9 bytes, unless the streaming data is already formatted with the buffer size in mind, studded with end-of-sample bits and padded with zeroes... actually that sounds like a good idea regardless...
The key question, which at the moment is totally outside my expertise, is how long a high-granularity but full-featured audio engine can be expected to take at most between I/O port checks. If my math is correct, an engine that turns around in 9 scanlines or so should allow up to 640 bytes per frame in 32-line (128-byte) chunks, with five engine slots per frame plus whatever fits in VBlank (roughly 4 at max length) for a total of about 37% of the total compute time (ie: streaming eats 63%). An engine that turns around in 3 scanlines (which may be unrealistic) could allow the same bandwidth in 16-line chunks, with ten engine slots per active display period (in this case streaming eats 70% of total compute time). Paired 16-line chunks (two chunks back-to-back with no processing in between) could do it given 6 scanlines of turnaround time, leaving room for five of those 6-line engine slots in active display. I haven't yet cycle-counted past the end of the streaming routine (partly because I haven't written anything more than this yet), so these numbers are approximate.
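The three scenarios can be sanity-checked with a small scanline budget. This Python sketch assumes 224 lines of active display, one line for each "start" command, and two lines of overhead after the last chunk of a block (the figure given at the end of the code below). Those overhead assumptions and the function names are illustrative guesses, not measured values.

```python
ACTIVE_LINES = 224   # assumed active display height
BYTES_PER_LINE = 4   # one four-byte data shot per scanline


def block_lines(turnaround, chunk_lines, chunks_per_block=1, tail=2):
    """Scanlines used by one engine slot plus the chunk(s) that follow it."""
    return turnaround + chunks_per_block * (1 + chunk_lines) + tail


def bytes_per_frame(turnaround, chunk_lines, chunks_per_block=1):
    """Blocks that fit in active display, and the data bytes they carry."""
    blocks = ACTIVE_LINES // block_lines(turnaround, chunk_lines, chunks_per_block)
    return blocks, blocks * chunks_per_block * chunk_lines * BYTES_PER_LINE


print(bytes_per_frame(9, 32))      # 9-line engine, 32-line chunks
print(bytes_per_frame(3, 16))      # 3-line engine, 16-line chunks
print(bytes_per_frame(6, 16, 2))   # 6-line engine, paired 16-line chunks
```

Under these assumptions all three cases land on 640 bytes per frame, with five, ten, and five engine slots respectively.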
Any thoughts? Keep in mind I haven't ever coded for the APU before, and this mess hasn't even been assembled, never mind run...
Code:
; BUFFER METADATA STRUCTURE (WIP):
; byte 0-1: current buffer write position
; byte 2: buffer start page
; byte 3-4: buffer end address
; byte 5: data ID
; In other words, using six buffers burns 36 bytes of direct page. If using
; zero page for this is acceptable, the SETP/CLRP instructions can be removed
; and the timing headroom goes from ~8 cycles to ~12.
; HDMA STREAMING CODE:
data_incoming_HDMA:
mov A, #data_start_HDMA ; 2 cycles - load data start flag value
- cbne $F4, - ; 7 cycles - listen for the write
; This point is reached roughly 3-9 cycles after $2140 is written, assuming
; CBNE loads the comparison value before the branch target.
mov A, $F5 ; 3 cycles - load data ID number
mov X, $F6 ; 3 cycles - load chunk size in scanlines
; TOTALS: 6 cycles since start code noticed in $F4, 9-15 cycles since $2140 written
; Find the buffer to which the data ID was assigned when the APU sent the data
; request (or processed a streaming SFX request from the S-CPU):
setp ; 2 cycles - switch to page one (optional)
cbne buf6_id, buf5check ; 7/5 cycles - check buffer 6, proceed to next if no match
mov Y, #$04 ; waste 0/2 cycles
- dbnz Y, - ; waste 0/22 cycles
cmp A, (X) ; waste 0/3 cycles
mov Y, #buf6 ; 0/2 cycles - load direct-page address for buffer 6 data
jmp buf_found ; 0/3 cycles - skip ahead
buf5check:
cbne buf5_id, buf4check ; 7/5 cycles - check buffer 5, proceed to next if no match
mov Y, #$03 ; waste 0/2 cycles
- dbnz Y, - ; waste 0/16 cycles
cmp A, (X) ; waste 0/3 cycles
mov Y, #buf5 ; 0/2 cycles - load direct-page address for buffer 5 data
jmp buf_found ; 0/3 cycles - skip ahead
buf4check:
cbne buf4_id, buf3check ; 7/5 cycles
cmp (X), (Y) ; waste 0/5 cycles
cmp (X), (Y) ; waste 0/5 cycles
nop ; waste 0/2 cycles
nop ; waste 0/2 cycles
mov Y, #buf4 ; 0/2 cycles
jmp buf_found ; 0/3 cycles
buf3check:
cbne buf3_id, buf2check ; 7/5 cycles
cmp (X), (Y) ; waste 0/5 cycles
nop ; waste 0/2 cycles
mov Y, #buf3 ; 0/2 cycles
jmp buf_found ; 0/3 cycles
buf2check:
cbne buf2_id, buf1_default ; 7/5 cycles
mov Y, #buf2 ; 0/2 cycles
jmp buf_found ; 0/3 cycles
buf1_default:
mov Y, #buf1 ; 2/0 cycles - no ID matched, fall back to buffer 1
buf_found:
; TOTALS: 39-40 since buffer check started, 48-55 since $2140 written
; If no assigned buffer was found, data will be sent to buffer #1. Now the
; data pickup loop must be rewritten to target the selected buffer:
mov A, $01+Y ; 4 cycles - get high byte of buffer pointer
mov !(get_data_HDMA+4), A ; 5 cycles - write buffer page address
mov !(get_data_HDMA+9), A ; 5 cycles
mov !(get_data_HDMA+14), A ; 5 cycles
mov !(get_data_HDMA+19), A ; 5 cycles
; TOTALS: 24 cycles since buf_found, 72-79 since start flag written to $2140
; The index registers will now be set up for the loop. The buffer metadata
; pointer will be saved for later, and X and Y will be loaded with the low
; byte of the buffer pointer and the chunk size in scanlines, respectively:
mov A, X ; 2 cycles - move chunk size from X to A
mov X, $00+Y ; 4 cycles - get low byte of buffer pointer
push Y ; 4 cycles - store buffer pointer address
mov Y, A ; 2 cycles - get chunk size in scanlines
clrp ; 2 cycles - switch back to page zero (if using page one)
; TOTALS: 14 cycles since loop rewritten, 86-93 since start flag written to $2140
; Ideally, one scanline should be almost exactly 65 cycles long. The port reads
; are between cycles 3 and 30 past this point, putting them between 24 cycles after
; the first HDMA write and about 8 cycles before the fourth one on the next line.
; That should be good for at least several scanlines regardless of clock drift, no?
; STREAMING LOOP:
get_data_HDMA:
mov A, $F4 ; 3 cycles - get byte 0 of the data shot
mov !$0000+X, A ; 6 cycles - write it to the current buffer position
mov A, $F5 ; 3 cycles - get byte 1
mov !$0001+X, A ; 6 cycles - write it to the current buffer position plus one
mov A, $F6 ; 3 cycles - get byte 2
mov !$0002+X, A ; 6 cycles
mov A, $F7 ; 3 cycles - get byte 3
mov !$0003+X, A ; 6 cycles
inc X ; 2 cycles - increment the current buffer position four times
inc X ; 2 cycles
inc X ; 2 cycles
inc X ; 2 cycles
cmp (X), (Y) ; waste 5 cycles
cmp (X), (Y) ; waste 5 cycles
cmp (X), (Y) ; waste 5 cycles
dbnz Y, get_data_HDMA ; 6/4 cycles - repeat for next scanline, or exit if done
; TOTAL: 65 cycles
; The final loop ends ~19-26 cycles after the first byte would be written on the line
; immediately following the last line of the data chunk.
; Now it remains only to store X back in the zero page data structure and check for
; page rollover and end-of-buffer, updating the high byte of the buffer pointer as
; appropriate:
end_data_HDMA:
setp ; 2 cycles - switch to page one (if using page one for buffer metadata)
mov A, X ; 2 cycles - load the new buffer address low byte from X
pop X ; 4 cycles - pick up the buffer pointer address
mov (X), A ; 4 cycles - store the new buffer pointer low byte
bne + ; 4/2 cycles - check if X had rolled over to zero (POP doesn't affect flags)
inc $01+X ; 5 cycles - increment high byte of buffer pointer
+ mov A, $01+X ; 4 cycles - pick up high byte
cbne $04+X, done_HDMA ; 8/6 cycles - check high byte against buffer end address
mov A, (X) ; 0/3 cycles - pick up low byte
cbne $03+X, done_HDMA ; 0/8/6 cycles check low byte against buffer end address
mov A, $02+X ; 0/0/4 cycles - if end of buffer reached, load buffer start page
mov $01+X, A ; 0/0/5 cycles - store to buffer pointer high byte
done_HDMA:
clrp ; 2 cycles - switch back to page zero (if using page one)
; This code ends ~49-75 cycles after the first non-chunk HDMA slot. In other words, it
; brackets the second slot, unless it will be more than 16 cycles until the next read.
; Which is quite probable. And that means the next read will get whatever was written
; TWO scanlines after the last data shot. Or, simply put, there are two scanlines of
; overhead after the chunk ends.
I've taken a cursory look at Super SNESMod, and at the APU code from N-Warp Daisakusen. The latter is interesting because it's doing almost exactly what I'm trying to do, but it's handled differently and seems to have some disadvantages compared with my approach (though to be fair, it is a field-proven capability, while mine is very much not). It also uses 66 cycles instead of 65 for the loop, but that seems to be a PAL thing.
Now that I think about it, if I wanted my code to be able to handle 32-line chunks on PAL, I'd probably have to partially unroll the pickup loop to take 131 cycles per two scanlines. PAL is nominally 65.632 cycles per scanline, vs. 65.033 on NTSC, give or take quite a bit (nearly 0.2 as I understand it), so over a chunk that long the timing would be unreliable with any single-line loop... and if I need two instances of the pickup code, it will take 20 extra cycles to overwrite the high byte, so I'm back to 3 buffers...
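Those per-scanline figures can be reproduced from the clock rates. The Python sketch below assumes the commonly quoted values: 1364 master clocks per scanline, a master clock of about 21.477 MHz (NTSC) or 21.281 MHz (PAL), and an SPC700 running at a nominal 1.024 MHz. The constant names are made up for illustration, and real consoles deviate from the nominal APU clock.

```python
MASTER_CLOCKS_PER_LINE = 1364
NTSC_MASTER_HZ = 21_477_272.7   # nominal NTSC master clock
PAL_MASTER_HZ = 21_281_370.0    # nominal PAL master clock
SPC_HZ = 1_024_000.0            # nominal SPC700 clock


def spc_cycles_per_line(master_hz):
    """SPC700 cycles that elapse during one scanline."""
    line_seconds = MASTER_CLOCKS_PER_LINE / master_hz
    return line_seconds * SPC_HZ


ntsc = spc_cycles_per_line(NTSC_MASTER_HZ)
pal = spc_cycles_per_line(PAL_MASTER_HZ)
print(round(ntsc, 3), round(pal, 3))
```

A 65-cycle loop is therefore a close match on NTSC, while PAL sits between 65 and 66, which is what motivates a 131-cycle two-line loop.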
Wait... I have 15 cycles in that loop during which nothing whatsoever is happening:
Code:
; FOR PAL, REPLACE 15-CYCLE TIME DELAY IN DATA PICKUP LOOP WITH:
mov A, Y ; 2 cycles
and A, #$01 ; 2 cycles
beq + ; 4/2 cycles
cmp A, (X) ; 0/3 cycles
+ nop ; 2 cycles
cmp (X), (Y) ; 5 cycles
That's either 15 or 16 cycles depending on the low bit of the line counter. Problem solved.
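To see why alternating helps, here is a rough drift model in Python. It uses the nominal 65.632 (PAL) and 65.033 (NTSC) cycles per scanline quoted above and ignores real clock tolerance. The function and lambda names are illustrative only.

```python
PAL_LINE = 65.632    # nominal SPC700 cycles per PAL scanline
NTSC_LINE = 65.033   # nominal SPC700 cycles per NTSC scanline


def drift_after(lines, loop_cycles, line_cycles):
    """Cycles the loop has gained (+) or lost (-) against HDMA after `lines` passes.

    loop_cycles(y) is the loop length for line-counter value y, counting down.
    """
    spent = sum(loop_cycles(y) for y in range(lines, 0, -1))
    return spent - lines * line_cycles


fixed_65 = lambda y: 65
fixed_66 = lambda y: 66
pal_patch = lambda y: 66 if y & 1 else 65   # extra cycle when the counter is odd

for name, loop in (("65", fixed_65), ("66", fixed_66), ("65/66", pal_patch)):
    print(name, round(drift_after(32, loop, PAL_LINE), 3))
```

Over a 32-line chunk on PAL, a fixed 65-cycle loop slips by about 20 cycles and a fixed 66-cycle loop by about 12, while the alternating version stays within about 4.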
On the subject of Capcom beat'mups, is there a reason for the 3-enemy limit in the SNES Final Fight games other than perceived CPU speed?
My first guess would be CHR RAM limits. The Super NES sprite page is 128x256 pixels, and that has to cover the player, the player's pickup weapon, enemies, and hit sparks. Though DMA is pretty fast at filling it, it takes about three frames without letterboxing to refill it, which is why Final Fight has letterboxing.
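The "about three frames" estimate can be reproduced with some rough arithmetic. This Python sketch assumes DMA moves one byte per 8 master clocks, 1364 master clocks per scanline with 40 lost to DRAM refresh, 262 lines per NTSC frame, and that every blanked line is spent on DMA (no time for OAM updates or setup). The 192-line letterboxed height is a hypothetical value, not Final Fight's actual figure.

```python
import math

SPRITE_PAGE_BYTES = 128 * 256 // 2        # 128x256 pixels at 4bpp
BYTES_PER_LINE = (1364 - 40) / 8          # DMA bytes per scanline
LINES_PER_FRAME = 262                     # NTSC


def frames_to_refill(visible_lines):
    """Frames needed to DMA the whole sprite page during blanked lines only."""
    blank_lines = LINES_PER_FRAME - visible_lines
    bytes_per_frame = blank_lines * BYTES_PER_LINE
    return math.ceil(SPRITE_PAGE_BYTES / bytes_per_frame)


print(frames_to_refill(224))   # full-height display
print(frames_to_refill(192))   # hypothetical letterboxed display
```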
My second guess would be overdraw. The Super NES's maximum total sprite width is 272 pixels (34 8x1 pixel slivers) per scanline. Say you have Haggar and three Andore, and they all decide to do an attack where they spread their arms to be 64 pixels wide. Then you've used up most of the available slivers. It's also why Mighty Final Fight for NES used much smaller sprites.
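The overdraw scenario is easy to put into numbers. A minimal Python sketch using the 34-sliver limit stated above (the function name is made up for illustration):

```python
SLIVER_LIMIT = 34   # 8x1 slivers per scanline, i.e. 272 pixels


def slivers_used(widths):
    """8-pixel slivers needed on one scanline for sprites of the given widths."""
    return sum((w + 7) // 8 for w in widths)


used = slivers_used([64, 64, 64, 64])   # Haggar plus three Andore, arms spread
print(used, SLIVER_LIMIT - used)
```

Four 64-pixel-wide characters take 32 of the 34 slivers, leaving room for only two more 8-pixel columns of anything else on that line.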
Definitely memory is the issue. I doubt sprite overflow was much of a worry, the scenario tepples described is probably not that common and can usually be ignored =P
You could have variable sized slots, and reorganize the slots when an enemy dies.
psycopathicteen wrote:
You could have variable sized slots, and reorganize the slots when an enemy dies.
Pfft, like anyone but the people here are going to even attempt that. One thing I always found humorous is that if 8x8 and 16x16 sprites are being used, there's no way you can run out of available VRAM, because you only have 128 sprites (128 sprites at 16x16 and 4bpp is exactly 16 KB). One thing I was thinking about that would minimize CPU usage, at the cost of increased bandwidth, would be to have every sprite have its own 16x16 slot, even if the sprite is only 8x8. (Less checking, and no comparing to see if the sprite is the same. Of course, you could upload 16x16 worth of data and only use an 8x8 sprite and change the sprite tile number for some animation.) Overall though, I find the 8x8 and 16x16 setup to be fairly useless, only because I don't believe it gives you enough sprite coverage. I guess if your game uses a lot of 24-pixel-wide objects and you want to avoid overdraw, it could come in handy, but if you're worrying about sprites per line, there's a good chance you'll also be worrying about sprites total. There's really no winning when it comes to selecting sprite sizes. There's definitely losing though... (any size configuration with 64x64.)
Well beat'mups tend to have only large sprites anyway, it doesn't need to worry about individual 16x16 sprites. They can be organized in rows of 8 16x16 sprites. A short character can be 2 rows, a medium character 3, and a tall character 4.
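A minimal Python sketch of that row-based scheme, combined with the "reorganize when an enemy dies" idea. It assumes the 128x256 sprite page gives 16 rows of eight 16x16 tiles. The class and method names are invented for illustration, and a real implementation would also have to re-upload or move the graphics of any character whose rows shift.

```python
TOTAL_ROWS = 16   # 256-pixel-tall sprite page / 16-pixel rows


class RowAllocator:
    """Hands out contiguous rows of 16x16 tiles and compacts on free."""

    def __init__(self, total_rows=TOTAL_ROWS):
        self.total_rows = total_rows
        self.slots = []   # (owner, rows) in VRAM order

    def used(self):
        return sum(rows for _, rows in self.slots)

    def alloc(self, owner, rows):
        """Return the first row of the new slot, or None if it does not fit."""
        if self.used() + rows > self.total_rows:
            return None
        start = self.used()
        self.slots.append((owner, rows))
        return start

    def free(self, owner):
        """Drop a slot; later slots slide down to close the gap."""
        self.slots = [(o, r) for o, r in self.slots if o != owner]

    def start_row(self, owner):
        row = 0
        for o, r in self.slots:
            if o == owner:
                return row
            row += r
        return None


vram = RowAllocator()
vram.alloc("player", 3)
vram.alloc("tall enemy", 4)
vram.alloc("short enemy", 2)
vram.free("tall enemy")
print(vram.start_row("short enemy"))   # slid down into the freed space
```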
You're forgetting little cosmetic things like items though.
Especially stuff like pipes and baseball bats that are usually large and diagonal, causing lots of wasted blank space. Especially when all your sprites have to be squares and not rectangles.
Then it makes sense that items would get their own dynamic slots as well.
I'm almost certain the sprite size limitation was the major issue in handling Final Fight on SNES even if FF2 and FF3 did a bit better job.
We can see how much the more flexible sprite capabilities of the MD bring an advantage here :
Still the MD version is on the edge with sprite overflow and we often see a bit of flickering when there are many items and enemies at same time.
Most of the "flickering" in the Sega CD version is not from sprite overflow though, it seems to be a flaw in the sorting algorithm. If too many objects happen to be at a similar depth, it'll outright refuse to draw some of those. I wonder if they just split the field into buckets and then added objects into them (and dropping would happen if a bucket was full).
Lesson: use a proper sorting algorithm. Insertsort does a fine job here, since objects remain sorted most of the time anyway (and when they don't, it's usually just two consecutive objects needing to be swapped).
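For illustration, here is a plain insertion sort over a list of objects keyed by depth, written in Python. The "y" field and the function name are assumptions made for the example. The function returns how many shifts it performed, to show how little work a nearly sorted list needs.

```python
def insertion_sort_by_depth(objs):
    """Sort objs in place by their 'y' depth key; return the number of shifts."""
    shifts = 0
    for i in range(1, len(objs)):
        cur = objs[i]
        j = i - 1
        # Move deeper objects one place right until cur's position is found
        while j >= 0 and objs[j]["y"] > cur["y"]:
            objs[j + 1] = objs[j]
            j -= 1
            shifts += 1
        objs[j + 1] = cur
    return shifts


# Last frame's order, after one enemy stepped past another
objects = [{"id": "a", "y": 10}, {"id": "c", "y": 31},
           {"id": "b", "y": 30}, {"id": "d", "y": 50}]
print(insertion_sort_by_depth(objects), [o["id"] for o in objects])
```

Because the sort is stable and keeps the previous frame's order as its starting point, two objects at the same depth do not swap back and forth between frames.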
Those screenshots don't look anywhere near maxing out the SNES sprites limit.
psycopathicteen wrote:
Those screenshots don't look anywhere near maxing out the SNES sprites limit.
Of course it doesn't. You can have 5 characters on screen at once in FF2 and FF3, which is much more than what those screenshots show, but even FF3 definitely lacks a lot of the sub-items we can find in the Sega CD version. I think having rectangular sprites (instead of only square ones) is a big advantage here... But it's true that using a smart sorting algorithm can help (using a Z sort, for instance, so the sprites behind flicker first).
Shit, I forgot that on overflow the SNES drops sprites from the front, not from the back (the MD does the opposite). I have absolutely no idea how they got to that behavior, but it's definitely a wreck for this situation.
Stef wrote:
We can see how much the more flexible sprite capabilities of the MD bring an advantage here :
It's probably more of the fact the sprite graphics have to fit in 16 KB. Really though, altogether, the Sega CD version stomps the SNES one in terms of graphics. It appears it was made later and had much more effort put into it though, so it might be better to compare it with Final Fight 2, but I still think this looks better. If only they'd taken my advice...
Sik wrote:
I forgot that on overflow the SNES drops sprites from the front, not from the back (the MD does the opposite).
Really? Why don't people ever list that as an advantage for the Genesis? The only real time I'd call the SNES dropout an "advantage" is when you're faking part of a BG layer (like things protruding out in the areas where there's rowscrolling), because it looks a little less odd when the actual objects of the game are disappearing.
Sik wrote:
I have absolutely no idea how they got to that behavior,
What do you mean? When filling out the linebuffer, you start with the lowest priority sprite, and then write the higher priority sprite over it. Eventually, it's going to run out of time trying to draw the sprites and will leave one of them half finished, and because of what I said, it's going to be the highest priority sprite. How does it work on the Genesis? Does it write the highest priority, and then go backwards, checking to see if every pixel that's being drawn is being obscured?
Sik wrote:
it's definitely a wreck for this situation.
What situation?
Espozo wrote:
Really? Why don't people ever list that as an advantage for the Genesis?
I only learned about this less than a month ago, so I'd say that probably a lot of people aren't even aware about this. You don't really see much use of intentional sprite overflow on the SNES (cutting can be done with the windows after all), so if you ever see it you'd only notice how awful it looks without actually paying attention to the actual order.
Espozo wrote:
What do you mean? When filling out the linebuffer, you start with the lowest priority sprite, and then write the higher priority sprite over it. Eventually, it's going to run out of time trying to draw the sprites and will leave one of them half finished, and because of what I said, it's going to be the highest priority sprite.
No, that's the thing. Earlier sprites in the list get drawn in the front. It's actually discarding the sprites that come in earlier, not the sprites that come in later. Yes, it caught me off guard too, but I was talking to somebody who was doing some SNES experiments some time ago (a couple of weeks maybe?) and this was brought up.
Espozo wrote:
How does it work on the Genesis? Does it write the highest priority, and then go backwards, checking to see if every pixel that's being drawn is being obscured?
Earliest sprite gets drawn on top. Later sprites only get drawn where there are blank pixels. It simply stops drawing if it runs out of time, plain and simple (so sprites later in the list will get discarded).
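The difference between the two behaviours can be shown with a toy one-scanline model in Python. It only mirrors what is described in these posts (front-first filling of blank pixels versus back-first overwriting, each stopping when a pixel budget runs out) and is not a hardware-accurate model. All names and the budget mechanism are invented for illustration.

```python
def render_front_first(sprites, width, budget):
    """MD-style as described: earliest sprite on top, later ones fill only blanks."""
    line = [None] * width
    for name, x, w in sprites:                # list order = front to back
        if budget < w:
            break                             # out of time: rear sprites dropped
        budget -= w
        for px in range(x, min(x + w, width)):
            if line[px] is None:
                line[px] = name
    return line


def render_back_first(sprites, width, budget):
    """SNES-style as described: rear sprites drawn first, front ones overwrite."""
    line = [None] * width
    for name, x, w in reversed(sprites):      # back to front
        if budget < w:
            break                             # out of time: front sprites dropped
        budget -= w
        for px in range(x, min(x + w, width)):
            line[px] = name
    return line


sprites = [("front", 0, 8), ("mid", 4, 8), ("back", 8, 8)]
md = render_front_first(sprites, 16, budget=16)
snes = render_back_first(sprites, 16, budget=16)
print(sorted(set(md) - {None}), sorted(set(snes) - {None}))
```

With time for only two of the three sprites, the front-first renderer loses "back" while the back-first renderer loses "front", which is the depth-sorting problem described here.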
Espozo wrote:
Sik wrote:
it's definitely a wreck for this situation.
What situation?
Sprite overflow when sprites have to be sorted by depth already. You can't work around that since that'd mess with the depth order.
Which way do priorities go on the NES?
Sik wrote:
Espozo wrote:
How does it work on the Genesis? Does it write the highest priority, and then go backwards, checking to see if every pixel that's being drawn is being obscured?
Earliest sprite gets drawn on top. Later sprites only get drawn where there are blank pixels. It simply stops drawing if it runs out of time, plain and simple (so sprites later in the list will get discarded).
Honestly, the way the MD renders sprites has always surprised me (front sprites first, meaning it only draws into blank pixels). I think it's way more complex than just rendering sprites in reverse order, as the SNES is probably doing. The fact that the SNES has a "fixed" OAM (the first sprite in the table is always top priority, and so on...) explains why they could do it that way, I guess. The MD uses a chained sprite list system, so they had to render sprites in the same order they are met in the chained list. They could have said the first sprite is lowest priority and the last one has top priority, so they could use a simpler rendering method, but I think they considered it would be more logical and convenient for developers to have it the opposite way, hopefully.
Quote:
Sprite overflow when sprites have to be sorted by depth already. You can't work around that since that'd mess with the depth order.
That is definitely an issue when you know your game will hit the sprite limit in certain situations, as you just can't hide it :-/ So it's even more important to do your best to avoid that situation...
psycopathicteen wrote:
Which way do priorities go on the NES?
What do you mean by "priorities go"? If you're talking about what gets drawn and what gets clipped first, it draws the bottommost sprites first (or apparently not, according to Sik?) and goes from left to right while drawing them. The right sides of the highest priority sprites are the first not to be drawn.
Stef wrote:
Honestly, the way the MD renders sprites has always surprised me (front sprites first, meaning it only draws into blank pixels). I think it's way more complex than just rendering sprites in reverse order, as the SNES is probably doing.
I imagine it takes about the same bandwidth though, so it's smarter in that sense because it's less ugly. Also, are you saying that the priorities of sprites aren't completely dependent on their order?
psycopathicteen wrote:
Which way do priorities go on the NES?
Lower indices have higher priority and are drawn on top. The higher the index, the higher the chances of the sprite being dropped. Sounds like the same as the Genesis.
Oh, I thought he said "SNES"...
Stef wrote:
I think they considered it would be more logical and convenient for developers to have it the opposite way, hopefully.
Probably the only time they considered how the developers would feel.
To be honest though, 16KB of sprite graphics really isn't too bad when you're using the space perfectly. If you were to reasonably use BG mode 3 (8bpp and 4bpp) like I plan on, then you'd have to cram all the sprites in a space that small anyway. To be honest (not that it's not possible, of course), I really haven't seen many, if any, Genesis games where you wouldn't be able to fit all the sprites onscreen into a space that small. I think Turbografx games generally used more space for sprites, but that's only because there's one BG layer. A large boss that would be a BG on the Genesis or the SNES would have to be sprites on the Turbografx.
I know I'm derailing this (we can always get back on "subject", although that would be Splatoon frozen yogurt), but could they have reasonably added a good deal more palette entries on the SNES had they not added stupid crap like half of the useless BG modes and 64x64 sprites and whatnot, or does this take next to no space while RAM takes up a bunch? I've just never been pleased with the SNES color situation, and it blows my mind how people were able to make things on the Genesis as good looking as that Final Fight CD game.
Quote:
I imagine it takes about the same bandwidth though, so it's smarter in that sense because it's less ugly. Also, are you saying that the priorities of sprites aren't completely dependent on their order?
Not exactly the same bandwidth, as you need 1 read and 1 write per pixel on the MD where you only need 1 write on the SNES.
Sprite priority is handled as on the SNES after all: first sprite on top, then the second sprite, then the third.
The difference is that sprite order on the MD is not defined by the sprite index in the OAM but by the chained ordering:
You can have sprites rendered like this (always starting at sprite 0):
spr 0 --> spr 10 --> spr 2 --> spr 14 --> spr 5 --> spr 6 --> 0 (= end)
front --> back
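As a rough illustration of the chained ordering described above, here is a minimal C sketch that walks the link fields starting at sprite 0. The `SpriteEntry` struct and the `sprite_draw_order` function are hypothetical names made up for this example (a real sprite table entry also carries position, size and tile fields), and the 80-entry limit assumes H40 mode.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_SPRITES 80  /* assumption: H40 mode sprite table size */

/* Hypothetical, stripped-down sprite table entry: only the link field is kept. */
typedef struct {
    uint8_t link;       /* index of the next sprite to render; 0 ends the chain */
} SpriteEntry;

/* Writes the sprite indices in render order (front first) into `order`.
 * Returns how many sprites are in the chain. `max_out` also protects
 * against a chain that loops back on itself without passing through 0. */
size_t sprite_draw_order(const SpriteEntry *sat, uint8_t *order, size_t max_out)
{
    size_t n = 0;
    uint8_t idx = 0;                    /* the chain always starts at sprite 0 */

    while (n < max_out) {
        order[n++] = idx;
        idx = sat[idx].link;
        if (idx == 0 || idx >= MAX_SPRITES)
            break;                      /* link 0 (or an invalid link) ends the list */
    }
    return n;
}
```

With the links set up as in the example chain (0 -> 10 -> 2 -> 14 -> 5 -> 6 -> 0), the function reports six sprites in exactly that order, regardless of their positions in the table.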
Stef wrote:
You can have sprites rendered like this (always starting at sprite 0):
spr 0 --> spr 10 --> spr 2 --> spr 14 --> spr 5 --> spr 6 --> 0 (= end)
So there's a list that defines the order that sprites get drawn?
Stef wrote:
Not exactly the same bandwidth as you need 1 read and 1 write per pixel on MD where you only need 1 write on SNES.
What I'd think would be cool is if somebody made a complete list as to how the vram bandwidth is divided on both systems and which has the greater bandwidth. I know the SNES has more bandwidth for BG layers, but less for sprites, so it's about even.
Quote:
Stef wrote:
I think they considered it would be more logical and convenient for developers to have it the opposite way, hopefully.
Probably the only time they considered how the developers would feel.
Are you speaking about the SNES or the MD? Because on the MD, except for the (very) limited number of palettes, the system is generally really straightforward (we cannot say the same about the SNES).
Quote:
To be honest though, 16KB of sprite graphics really isn't too bad when you're using the space perfectly. If you were to reasonably use BG mode 3 (8bpp and 4bpp) like I plan on, then you'd have to cram all the sprites in a space that small anyway. To be honest, (not that it's no possible of course) but I really haven't seen many, if any, Genesis games where you wouldn't be able to fit all the sprites onscreen into a space that small.
Well maybe Final Fight is a good example, in this area :
You can have 6 different characters (2 players + 4 enemies) on screen + the wooden box + barrel + knife + wood wreckage ... Not sure that actually fit in 16 KB. Also consider that on SNES you are using square sprites which means wasted space (and you can't use 8x8 sprites only given the size of the sprites).
Quote:
I've just never been pleased with the SNES color situation, and it blows my mind how people were able to make things on the Genesis as good looking as that Final Fight VD game.
The SNES color situation is really comfortable for me :p Of course, the more palettes you have the better, but having 8+8 is a good compromise I think. On the MD even having 4+4 would have been *so much* better, really.
Espozo wrote:
So there's a list that defines the order that sprites get drawn?
It's in the SAT / OAM :
http://wiki.megadrive.org/index.php?tit ... bute_Table
It's the "link" field which defines the index of the next sprite to render.
Stef wrote:
What I'd think would be cool is if somebody made a complete list as to how the vram bandwidth is divided on both systems and which has the greater bandwidth. I know the SNES has more bandwidth for BG layers, but less for sprites, so it's about even.
We already tried to calculate that on another forum; actually both systems are really close, with a minor edge for the MD when you take everything into account (in H40 mode), but at the time we assumed sprite rendering cost the same on both systems (i.e. only one write operation per pixel). I hope to find it again, it was quite an interesting discussion.
Stef wrote:
Honestly that way the MD is rendering sprites always surprised me (front sprite firsts, meaning it draws pixel only in blank pixel)
It's not that unusual though, it's called underdraw (where you're only allowed to draw on unoccupied space) and happens to match the behavior that was common for sprite systems at the time. Also it's not like overdraw would have been much better, in either case you'll have to check for transparent pixels (in the linebuffer for underdraw, or in the sprite itself for overdraw).
As for bandwidth: every 16 pixels there are eight slots (with up to 4 consecutive bytes each). Two slots are used to retrieve two indices from each scroll plane, then four slots are used to retrieve the referenced tiles, then one more slot for a sprite tile and then a free slot (used by 68000 or for refresh). Let's just say they packed it quite a bit. (as for when the sprite list is parsed, that's done in hblank period using an internal cache containing the portions of the sprite table needed to know which sprites are at any given line)
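To put numbers on the slot layout described above, here is a small C sketch of the arithmetic. It only covers the active display portion of a line and takes the figures from the post at face value (8 slots per 16 pixels, up to 4 bytes per slot, one sprite slot and one free slot per group); the struct and function names are hypothetical, and slots available during hblank are not counted.

```c
#include <stdint.h>

#define PIXELS_PER_GROUP 16u
#define SLOTS_PER_GROUP   8u   /* 2 map fetches + 4 tile fetches + 1 sprite + 1 free */
#define BYTES_PER_SLOT    4u   /* upper bound: "up to 4 consecutive bytes each" */

/* Hypothetical summary of the access budget for the visible part of one line. */
typedef struct {
    uint32_t slots;         /* total slots in active display */
    uint32_t max_bytes;     /* upper bound on bytes fetched */
    uint32_t sprite_slots;  /* slots used for sprite tile fetches */
    uint32_t free_slots;    /* slots left for the 68000 or refresh */
} LineBudget;

LineBudget active_line_budget(uint32_t width_pixels)
{
    uint32_t groups = width_pixels / PIXELS_PER_GROUP;
    LineBudget b;

    b.slots        = groups * SLOTS_PER_GROUP;
    b.max_bytes    = b.slots * BYTES_PER_SLOT;
    b.sprite_slots = groups;    /* one sprite slot per 16-pixel group */
    b.free_slots   = groups;    /* one free slot per 16-pixel group */
    return b;
}
```

For a 320-pixel line (H40) this gives 160 slots and at most 640 bytes during active display, of which 20 slots are sprite fetches and 20 are free.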
Quote:
It's not that unusual though, it's called underdraw (where you're only allowed to draw on unoccupied space) and happens to match the behavior that was common for sprite systems at the time. Also it's not like overdraw would have been much better, in either case you'll have to check for transparent pixels (in the linebuffer for underdraw, or in the sprite itself for overdraw).
Yeah about the operation itself yes, but with underdraw you still need to do :
- read sprite data
- read line buffer data
- if line buffer data == 0 --> write sprite data to line buffer
= 2 reads + 1 write.
with overdraw method :
- read sprite data
- if sprite data != 0 --> write sprite data to line buffer
= 1 read + 1 write.
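To make the comparison concrete, here is a small C sketch of both approaches on a one-line buffer. It is only a software model, assuming color index 0 means transparent; `draw_underdraw` and `draw_overdraw` are hypothetical names, and real hardware does not necessarily spend a separate memory access on each step.

```c
#include <stdint.h>

#define LINE_WIDTH 320

/* Underdraw: sprites are fed front to back, and a pixel is written
 * only if the line buffer is still empty at that position. */
void draw_underdraw(uint8_t *line, const uint8_t *spr, int x, int w)
{
    for (int i = 0; i < w; i++) {
        int px = x + i;
        if (px < 0 || px >= LINE_WIDTH)
            continue;
        if (line[px] == 0)          /* read line buffer */
            line[px] = spr[i];      /* write sprite pixel */
    }
}

/* Overdraw: sprites are fed back to front, and every opaque
 * sprite pixel replaces whatever is already in the line buffer. */
void draw_overdraw(uint8_t *line, const uint8_t *spr, int x, int w)
{
    for (int i = 0; i < w; i++) {
        int px = x + i;
        if (px < 0 || px >= LINE_WIDTH)
            continue;
        if (spr[i] != 0)            /* check sprite pixel */
            line[px] = spr[i];      /* write sprite pixel */
    }
}
```

Feeding the same sprites front-first to `draw_underdraw` and back-first to `draw_overdraw` produces the same line; the difference is only in which value gets tested before the write.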
Quote:
As for bandwidth: every 16 pixels there are eight slots (with up to 4 consecutive bytes each). Two slots are used to retrieve two indices from each scroll plane, then four slots are used to retrieve the referenced tiles, then one more slot for a sprite tile and then a free slot (used by 68000 or for refresh). Let's just say they packed it quite a bit. (as for when the sprite list is parsed, that's done in hblank period using an internal cache containing the portions of the sprite table needed to know which sprites are at any given line)
Actually I found the topic where we were debating VRAM bandwidth on the MD and SNES (and even the PCE).
We concluded the SNES could do 340 VRAM accesses per scanline per bus; as the PPU has 2x8-bit buses, you can access 640 bytes/scanline (read or write).
The MD VRAM is more complex as it uses a dual port PSRAM, but in the end it allows the MD VDP to do about 800 (byte) VRAM accesses per scanline. Not bad for a system released 2 years earlier, but honestly the PCE is not bad at this game either (I don't have the exact number, but it should be close to the SNES bandwidth with its 16-bit bus).
Stef wrote:
You can have 6 different characters (2 players + 4 enemies) on screen
Already, I think that's better than Final Fight 3, which I think has 2 players + 3 enemies? I don't know, I just remember playing it on bsnes and finding it underwhelming with the assistant CPU (which is a pretty cool feature, really). Ninja Baseball Batman is still my favorite beat em up.
Stef wrote:
Are you speaking about SNES or MD ?
Definitely SNES. I haven't worked with the MD, so I don't know how it is, but I have a fairly good understanding of the MD and it seems much easier to get the equivalent picture on the MD than on the SNES.
Stef wrote:
Not sure that actually fit in 16 KB.
That picture definitely would (the box uses many repeated tiles, the barrels are flipped horizontally) but with all the stuff you mentioned, maybe not. I'm not sure if the game would actually load 4 enemies there, because it almost seems like you'd run into overdraw problems there.
Believe it or not, I actually got this to fit into 16KB with 16x16 and 32x32 sprites when I was first starting with the SNES, but with virtually no room left:
[Attachment: 16KB.png]
Like I said, seeing something only a little smaller than this on the Turbografx isn't too incredibly uncommon, but despite the fact the Genesis has about the same sprite capabilities, you wouldn't see it there. I've seen a good bit of Turbografx games with bad overdraw problems because of this though, but there's no other way, unlike the Genesis and the SNES (where it's pretty much your only option).
Sik wrote:
It's not that unusual though, it's called underdraw (where you're only allowed to draw on unoccupied space) and happens to match the behavior that was common for sprite systems at the time.
From what I've seen, the Genesis can even do the same sprite masking trick that's possible on the NES, which consists in overlapping a high priority back sprite and a low priority front sprite to bring forward the background.
This is easily noticeable in Sonic 1's Marble Zone, when the platforms that float in lava catch fire. When it sinks into the lava, a hole in the platform in the shape of the fire allows the lava to show through. I don't think this was intentional in this particular place, but it shows the effect is possible.
Can the SNES do something like this?
Here's a screenshot of what I'm talking about:
[Attachment: marble12.png]
tokumaru wrote:
From what I've seen, the Genesis can even do the same sprite masking trick that's possible on the NES, which consists in overlapping a high priority back sprite and a low priority front sprite to bring forward the background.
This is easily noticeable in Sonic 1's Marble Zone, when the platforms that float in lava catch fire. When it sinks into the lava, a hole in the platform in the shape of the fire allows the lava to show through. I don't think this was intentional in this particular place, but it shows the effect is possible.
Can the SNES do something like this?
This effect is used in the Sonic 2 title screen just to hide the bottom part of Sonic's and Tails' bodies; Sonic 1 uses a more classic sprite masking feature to do it, I believe. I remember I had to rewrite almost all of my VDP render core in Gens to properly emulate that specific feature; I was wondering how the VDP worked internally to allow that sort of weirdness :p
To reply to your question, if the NES was already handling it, I guess the SNES does it as well.
Espozo wrote:
Already, I think that's better than Final Fight 3, which I think has 2 players + 3 enemies? I don't know, I just remember playing it on bsnes and finding it underwhelming with the assistant CPU (which is a pretty cool feature, really). Ninja Baseball Batman is still my favorite beat em up.
Yeah, the FF series on SNES was always limited to 3 enemies max on screen; at least with FF2 and FF3 you can play with 2 players.
The SegaCD version adds an extra enemy, but the gameplay is more dynamic and that, imo, makes the game much more pleasant to play than its SNES counterpart.
Quote:
That picture definitely would (the box uses many repeated tiles, the barrels are flipped horizontally) but with all the stuff you mentioned, maybe not. I'm not sure if the game would actually load 4 enemies there, because it almost seems like you'd run into overdraw problems there.
I think overdraw would indeed be a problem anyway, but even for just straight video memory usage I think 16 KB can be limiting.
Just checking in the debugger, it seems FFCD uses VRAM area 0x9000-0xFFFF just for sprites, almost half of VRAM! But some part of it could be replaced by BG patterns on the SNES (such as the time counter, if you use BG3 for that)... Still, 16 KB definitely seems too small for this game. I'm wondering how they can handle the background with so little dedicated VRAM space O_o...
Stef wrote:
gameplay is more dynamic and that, imo, makes the game really much more pleasant to play than its SNES counterpart.
Oh, I know. The SNES port sucks, and I know some of it has to do with memory, but none of the graphics are even compressed, so... yeah. Just a lousy effort.
Stef wrote:
i think 16 KB can be limiting.
It definitely can, I'm just not too sure in this case.
Stef wrote:
it seems FFCD use VRAM area 0x9000-0xFFFF just for sprites, almost half of VRAM !
Yeah, that's just wasteful... Like I said, no one back in the day would have done this, but it's possible to make every sprite in 8x8 and 16x16 mode use different tiles from one another, and many of the games used this mode and also used the full 16KB but had trouble managing the space inside (SMW). Many later SNES games used 16x16 and 32x32 and got much smarter about it by having object slots and stuff like that, but I have yet to see sprite slots with 8x8 and 16x16 or an even more complex scheme with 16x16 and 32x32 like I came up with. (This would have to check for lookalike tiles to give any advantage). A big problem is CPU usage, and many developers weren't so good about it...
Stef wrote:
I'm wondering how they can handle background with so few dedicated VRAM space O_o...
I wouldn't be surprised if they were changing out tiles as the level scrolls, not just from screen to screen.
tokumaru wrote:
From what I've seen, the Genesis can even do the same sprite masking trick that's possible on the NES, which consists in overlapping a high priority back sprite and a low priority front sprite to bring forward the background.
It's possible on the SNES. Look at Equinox:
https://www.youtube.com/watch?v=ns463PKC2G0. I've turned off the BG layers in the debugger, and a sprite mask for the BG appears in front of an object when it's under the background. (It must be doing BG collision and then spawning the overlay. Because the actual sprite mask is never seen, it's just whatever color so it doesn't take up a palette.) It's a pretty smart idea, (better than overlays, because it doesn't take up a palette) and it's the kind of thing (like dynamic sprite allocation) that you really don't see too much. It's odd how dated the actual graphics look though in comparison, but I guess actual artwork is different than technical effects.
Espozo wrote:
I know I'm derailing this, (we can always get back on "subject", although that would be Splatoon frozen yogurt)
Feel free to PM me some reasonable split points if you can find any.
Quote:
but could they have reasonable added a good deal more palette entries on the SNES had they not added stupid crap like half of the useless BG modes and 64x64 sprites and whatnot, or does this take next to no space and ram takes up a bunch?
Look at Visual 2C02. The 28x6-bit palette SRAM is already big. A 256x15-bit palette would be even bigger.
Quote:
it blows my mind how people were able to make things on the Genesis as good looking as that Final Fight VD game.
VD?
tepples wrote:
Feel free to PM me some reasonable split points if you can find any.
It's fine. This is pretty much the "anything goes" topic. There's only so long you can talk about frozen yogurt...
It's weird though... I didn't know taito made toys:
Guess where they're selling these... Well, frankly, if you play the game nowadays, it's only Japanese people anyway. (It's always been that way on ranked battle, which is annoying because of connection problems.)
tepples wrote:
VD?
What's next to "V" on the keyboard...
Stef wrote:
Yeah about the operation itself yes, but with underdraw you still need to do :
- read sprite data
- read line buffer data
- if line buffer data == 0 --> write sprite data to line buffer
= 2 reads + 1 write.
with overdraw method :
- read sprite data
- if sprite data != 0 --> write sprite data to line buffer
= 1 read + 1 write.
Don't forget that the sprite data can't be accessed while it's being fetched from video memory, so going by your logic it actually takes two steps (leaving it pretty much the same effort as underdraw). Once the data is cached on the die it doesn't need any additional effort to retrieve it either.
So how it'd go for underdraw is:
- Fetch sprite data from VRAM
- Check if pixel in linebuffer is transparent
- Replace pixel in linebuffer if so
For overdraw:
- Fetch sprite data from VRAM
- Check if pixel in sprite is transparent
- Replace pixel in linebuffer if so
In other words, there really isn't any significant gain from doing either and it boiled down to preference. Also if I recall correctly the Master System handles sprite overlap the same way (with underdraw), so it was in their best interests to ensure the end result was similar (assuming they adapted mode 4 to use the same sprite system as mode 5).
I played around with forming Final Fight sprites into blocks, and found the biggest sprites can fit within a 2kB slot. So VRAM-wise it can do 6 enemies, and have 4kB left for items.
Changing the subject a little bit, I wonder if sprites in games like Street Fighter Alpha, and Darkstalkers, can be compressed at crazy high compression ratios using more modern-ish compression algorithms. I know that h264 makes use of solid colored 4x4 blocks and motion vectors, why not make use of it for pixel animation?
Because trying to decode "modern-ish" graphics compression on a SNES in real time would be disastrously slow.
In fact, that's why Street Fighter Alpha 2 had a coprocessor to do the decompression.
Yeah, I think the suggestion was to do the same thing, but with a different algorithm.
In that case, I'm not sure to what extent motion compensation would be useful with indexed-color images, unless perhaps you're thinking of shearing as a frame interpolation method.
I don't think motion vectors are something that requires much CPU power. Isn't it just copying a block of pixels from the previous frame, and the motion vectors point to where the block is from the previous frame relative to the current frame?
I'm just not yet convinced that motion vectors are actually worthwhile to use for a lossless 4bpp animation, as the diligence to keep it lossless would likely outweigh any coding gain. Or are you instead recommending somehow making a lossy 4bpp animation? In either case, feel free to make a tech demo of motion vectors.
I'm not really interested in making a demo. I was just wondering if anybody has tried making a "crazy awesome" compression algorithm before, or if anybody has any idea for it.
You might try experimenting with Exomizer, which is (as far as I know) the best practical compression tool aimed at 8-bit processors. It might decompress fast enough on a SNES to allow some real-time animation of compressed sprites.
You'd have to port the decompression code from 6502 to 65816, since the stock code is aimed exclusively at 8-bit home computers, but it's doable.
Of course, the compression algorithm isn't specifically designed with graphics in mind. Any sort of actual image codec is probably very ill-suited to the SNES for a lot of reasons.
If I'm doing a compression heavy game, I'd probably use an SA-1.
I'm just doing some research now to try to find a good compression algorithm which can offer a good compression ratio and very fast unpacking (by very fast I mean ~10KB of data per frame, so you can use it to unpack sprite data for instance).
I implemented Sik's UFTC compression in SGDK and it provides a very good unpacking speed (about 13KB per frame) but it's limited to tile data and the compression ratio is not very good, especially for small amounts of data.
I tried to implement lzo1x; it provides better compression and should have fast unpacking, but I still have to optimize the unpacker as currently it only does about 2KB per frame (but the code is in C). I saw that lzo1x can have a very nice compression ratio, but the version I'm using is not that good (probably because it has been optimized for compression speed, which I don't care about). I really want to find a ratio-optimized lzo1x packer now, hoping the unpacker won't need to be modified!
psycopathicteen wrote:
If I'm doing a compression heavy game, I'd probably use an SA-1.
There is LZ4 compression; here is the link, with its 65816 implementation:
http://www.brutaldeluxe.fr/products/crossdevtools/lz4/
It's a realtime-oriented decompressor, which means very fast decompression with a good compression ratio (but inevitably not the best).
Thanks Touko
I actually experimented with LZ4 just before switching to LZO, but I could not find proper minimal LZ4 packer source code that worked correctly, and I definitely don't want a project of 10+ complex source files for a simple compression algorithm. I just need a pack(..) / unpack(..) pair of methods which can fit in a single (or 2 :p) source file so I can distribute it easily =) Also, the advantage of LZO is that it can get a better compression ratio when you increase the compression level; it looks like LZ4HC can compress better than LZ4, but I'm not sure by how much.
Well, it's definitely not easy to find a good compression algo for our old systems :p Ideally I would like to have only one compression method I could use for everything =)
Edit: I realized how tricky it is to have a "real time" unpacker when working on 8/16-bit systems. By real time I mean something you can use to unpack your sprite data every frame. For instance, for the MD I calculated I would ideally unpack up to 6 KB of graphics data per frame. 6 KB is not much, but for that you need an unpacking speed of at least 10 KB / frame so you keep a bit of CPU time (a bit less than half) for other tasks. And 10 KB per frame is *a lot*; just a simple copy loop like this one:
Code:
.loop:
move.b (a0)+,(a1)+
dbra d0, .loop
requires 22 cycles per byte transferred... at this speed you can only do ~5.7 KB per frame :-/
And as data is not word-aligned in the packed stream, you can hardly use word/dword data transfer instructions, what a shame...
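As a quick sanity check of the ~5.7 KB figure, here is the arithmetic as a C sketch. It assumes an NTSC Mega Drive 68000 clock of about 7.67 MHz and 60 frames per second (both are assumptions, not stated in the post) and ignores refresh, DMA and interrupt overhead; the function names are made up for the example.

```c
#include <stdint.h>

#define MD_68K_CLOCK_HZ 7670454u   /* assumption: NTSC Mega Drive 68000 clock */
#define FRAMES_PER_SEC  60u        /* assumption: NTSC frame rate, rounded */

/* CPU cycles available in one frame if nothing else ran. */
uint32_t cycles_per_frame(void)
{
    return MD_68K_CLOCK_HZ / FRAMES_PER_SEC;
}

/* Bytes a loop can move per frame, given its cost per byte in cycles. */
uint32_t bytes_per_frame(uint32_t cycles_per_byte)
{
    return cycles_per_frame() / cycles_per_byte;
}
```

At 22 cycles per byte this works out to 5810 bytes per frame, i.e. roughly 5.7 KB, which matches the estimate above. A 10 KB per frame target would need about 12 cycles per byte under the same assumptions.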
I think there is no magic solution; it depends on what you need, speed or ratio, but speed and ratio at the same time is a chimera...
Now there are some very good compressors, far better than those of the 80s/90s, so you can find a good compromise or one well suited to a particular project.
Quote:
Thanks Touko
I actually experimented LZ4 just before switching to LZO but i could not find a proper minimal LZ4 packer source code working correctly
You can start with the link I posted; I know it's not what you want, but LZ4 is simple enough to write from scratch, and to improve on the decompression side.
The compressor is a standard Windows .exe:
http://www.brutaldeluxe.fr/products/cro ... _Files.zip
Quote:
requires 22 cycles per byte transferred... at this speed you can only do ~5.7 KB per frame :-/
You can optimise with word copies, and do byte copies only when needed.
For example, if you have 13 bytes to copy, you can transfer 6 words and 1 byte; it needs more code but I think it's easily doable.
Of course a 100% 16-bit algorithm is better for the 68K, but 5-6 KB / frame is not that bad imo.
Quote:
move.b (a0)+,(a1)+
Isn't it 20 cycles per byte/word (12 + 8)??
How many cycles does dbra take (2 cycles)??
It's not easy to calculate the 68k's instruction cycles.
TOUKO wrote:
I think there is no magic solution, it depend what you need, speed or ratio ,but speed and ratio at same time is a chimera ..
Now, there are some very good compressor ,better by far than those in 80/90's,to find a good compromise or well suited for a particular project.
That is the idea: today we have blazingly powerful computers so we can use better compression algorithms; we are just limited by the decompression part, which should remain very fast.
Quote:
You can start with the link i posted, i know is not that you want, but LZ4 is simple enough to do it from scratch,and to enhance for the decompression part .
The compressor is a standard windows .exe :
http://www.brutaldeluxe.fr/products/cro ... _Files.zip
I really want to provide the source code for the compressor so anyone can compile the tools for their platform (linux / osx).
I downloaded the sources but had some trouble when I tried to compile them with my mingw GCC.
Quote:
You can optimise with words copy, and doing bytes only when needed.
For exemple if you have 13 bytes to copy, you can tranfert 6 words and 1 byte,needs more code but i think easily doable.
Of course a 100% 16 bits algorithm is better for the 68K,but 5/6 kB / frame is not that bad imo .
Well, unfortunately you can't really do that as the data isn't necessarily word-aligned; word/dword moves only work for word-aligned data.
Of course I could add some tests beforehand and take a different code path depending on whether source and dest are aligned, but doing that would waste some really precious cycles... In most cases the byte copy loop is quite small (2 or 3 bytes) :-/
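For reference, the alignment test being discussed could look something like the C sketch below. This is only an illustration of the idea, not code from SGDK: `copy_match` is a hypothetical name, the two byte stores in the aligned path stand in for a single word move, and whether the extra test pays off on a real 68000 for 2 or 3 byte copies is exactly the open question.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies len bytes forward from src to dst.
 * If both addresses are even, it moves two bytes per iteration
 * (standing in for one 68000 word move), then finishes with bytes.
 * Otherwise it falls back to a plain byte loop. */
void copy_match(uint8_t *dst, const uint8_t *src, size_t len)
{
    /* word moves on the 68000 need even addresses on both sides */
    if ((((uintptr_t)dst | (uintptr_t)src) & 1u) == 0) {
        while (len >= 2) {
            dst[0] = src[0];
            dst[1] = src[1];
            dst += 2;
            src += 2;
            len -= 2;
        }
    }
    /* leftover byte, or the whole copy when something is odd-aligned */
    while (len--)
        *dst++ = *src++;
}
```

With 13 bytes and even addresses this does 6 word-sized steps and 1 byte, as suggested earlier in the thread; with an odd source or destination it degrades to the 22-cycle byte loop.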
Code:
move.b (a0)+,(a1)+ = 12 cycles
dbra d0, .loop = 10 cycles when branch taken
So 22 cycles for the loop
Quote:
I really want to provide the source code for the compressor so any people can compile tools for its platform (linux / osx)
I downloaded sources but got some troubles when i tried to compile them from my mingw GCC.
Ah, that would be complicated then, but I understand your point (here).
Quote:
That is the idea, today we have blazing powerful computer so we can use better compression algorithm, we are just limited about the decompression part which should remain very fast.
Yes, but on an old machine you cannot have fast decompression with a modern compression scheme imo.
If it were possible, I think it would already exist.
Two methods could be used in your SGDK: a strong one with the best possible ratio, using all the CPU for between-level decompression, and a realtime one for decompressing data on the fly.
Quote:
move.b (a0)+,(a1)+ = 12 cycles
dbra d0, .loop = 10 cycles when branch taken
So 22 cycles for the loop
And don't you have +4 cycles (address calculation) for each memory access, (an)+ -> (an)+, so 8 cycles???
EDIT: maybe it's 4 cycles for a register copy, and +4 per memory access??
4+8 (8 because memory to memory) = 12.
Quote:
Yes but on an old machine you cannot have fast decompressing with modern compression scheme imo.
If it was possible, i think it already exists .
New compression algos arrive every day; lz4 and snappy are modern and still provide fast unpacking =)
Of course we can't use the best packers, which also require very complex unpacking, but at least we can use compression with complex packing and simple unpacking =)
Quote:
2 methods can be used in your SGDK, a strong with the best ratio possible, but all your CPU used for interlevel decompression, and a realtime one for decompressing datas on the fly .
Yeah, for now I have the aplib packer, which gives a good compression ratio but slow decompression, and now I want another general-purpose compressor which provides an acceptable compression ratio (not as good as aplib but not too bad either) with fast decompression (usable for sprite data at least) =)
Quote:
And you don't have a +4 cycles (addresses calculation) for each memory access (an)+ -> (an+),so 8 cycles ???
EDIT: it's may be 4 cycles for a register copy, and + 4 if memory ??
4+8(8 because memory to memory) = 12.
Actually it's quite simple: for almost every memory copy operation you can count 4 cycles per bus access:
- 1 access to fetch the instruction
- 1 access to read the data
- 1 access to write the data
so 12 cycles here...
As we use address register indirect addressing with post-increment, the address register increment is free (hidden by the memory accesses).
Wouldn't these algorithms compress better if they were modified to use pixels instead of bytes?
"Better" in a rate sense: Yes. The Codemasters codec, a lossless codec for 2bpp tiled images that tokumaru reverse-engineered, operates on a pixel level. And because it operates on a pixel level without hardware assistance, it's slow. In order to operate on a pixel level without slowdown, you'll need hardware.
Yeah, that's the whole problem. And even byte stream compression on a 68000 is not really efficient when you always have small numbers of bytes to copy (between 1 and 4). Typically LZx compression should perform really well on a 65816 CPU as it doesn't mind unaligned word operations, while the 68000 does. On the page Touko linked we can see they got LZ4 unpacking code running at a maximum rate of 5 KB per frame on a 2.6 MHz 65816, definitely not bad... Not sure I can go much higher with the 68000 (I'm using lzo1x, which is more complex to unpack, but still). Using a compression algo optimized for word access could allow much better performance on the 68000; I wonder if someone has already tried to derive an LZW algo with that constraint in mind.
I just want to know how much they can be compressed down. For example, how much would http://segaretro.org/Nemesis_compression save when doing large cartoony sprites in SFA or Dark Stalkers? 3:1? 6:1? 10:1?
It may have a good compression ratio, but it seems very slow too.
I really think you can find some modern algos that do the job far better, like pucrunch, Exomizer, LZMPi, and many others.
You can find this at the Nemesis link:
Quote:
While the Nemesis format was limited to only 3 tiles of decompressed data per frame, the Kosinski decompression routine may be able to decompress 100 tiles in the same time.
It confirms that Nemesis is slow, even in its fastest form (12 tiles/frame), but Kosinski requires a dictionary, and of course eats ROM space too.
Tried to optimize my LZO1X unpacking code for the 68k; I can't get better than 3.5 KB / frame for an average compressed file...
I guess I will have to stick with UFTC for live real-time sprite unpacking, or have a look at LZ4 again.
I'm curious about this LZO1X format but can't find a description of it anywhere... I only found the documentation on how to use the libraries and the source code of the decompressors. Do you have a link to a plain description of the format so I can see what the big deal is about this one?
Earlier today I thought up a new compression scheme that is loosely based on Kosinski compression, but is designed more for graphics than general data.
It starts out with a description word followed by a bunch of other bytes. The description word describes what to do with the following bytes. These are the description codes:
0: literal mode:
Next byte contains the following 2 pixels.
10: RLE mode:
Next byte contains length and pixel data in this format:
LLLLPPPP with L being a length from 3-14, and P being the pixel data.
If the RLE byte is $c0-$ff, it signifies an inline copy in the format:
11LLLRRR where L is the length from 3-10 and R is how many pixels back to copy from 2-9.
110: LZSS mode
This is for copying pixels from the previous row of pixels. Its format is this:
LLLLRRRR where L is length from 3-14 and R is the relative offset on previous line to copy from.
If the LZSS byte is $c0-$ff, then it goes into full range copy mode:
11LLLLRR RRRRRRRR where R is anywhere from 2-1025 pixels back.
111: end of description word:
Next 2 bytes are the new description word.
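To show how the RLE-mode byte above might be unpacked, here is a minimal C sketch. The field layouts come from the description, but the exact biases are an assumption on my part: a stored length of 0 is taken to mean 3, and a stored distance of 0 is taken to mean 2, which is what makes the quoted ranges (3-14, 3-10, 2-9) fit. `RleToken` and `decode_rle_byte` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical decoded form of one RLE-mode byte. */
typedef struct {
    int is_inline_copy;  /* 1 if the byte was $c0-$ff (11LLLRRR) */
    int length;          /* run or copy length in pixels */
    int pixel;           /* 4-bit pixel value (plain RLE only) */
    int back;            /* pixels back to copy from (inline copy only) */
} RleToken;

RleToken decode_rle_byte(uint8_t b)
{
    RleToken t = {0, 0, 0, 0};

    if (b >= 0xC0) {
        /* 11LLLRRR: inline copy */
        t.is_inline_copy = 1;
        t.length = ((b >> 3) & 7) + 3;   /* assumed bias: 0..7 -> 3..10 */
        t.back   = (b & 7) + 2;          /* assumed bias: 0..7 -> 2..9 */
    } else {
        /* LLLLPPPP with LLLL in 0..11: plain run of one pixel value */
        t.length = (b >> 4) + 3;         /* assumed bias: 0..11 -> 3..14 */
        t.pixel  = b & 0x0F;
    }
    return t;
}
```

For example, `$B7` would decode as a run of 14 pixels of color 7, while `$FF` would be an inline copy of 10 pixels from 9 pixels back.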
Differentiating between RLE and LZ is usually not necessary, because an LZ match with an offset of -1 is effectively the same as an RLE run.
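The point about offset -1 can be seen with a tiny C sketch of an LZ match copy. It is a generic illustration, not part of any of the formats discussed here, and `lz_copy` is a made-up name; the only requirement is that the copy runs forward one element at a time so that it can read pixels it has just written.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies len elements to out[pos..] from `offset` elements earlier
 * in the same buffer. The forward, element-by-element order matters:
 * when offset < len the source overlaps the destination. */
void lz_copy(uint8_t *out, size_t pos, size_t offset, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[pos + i] = out[pos + i - offset];
}
```

With one literal `7` already in the buffer, a match of length 4 at offset 1 fills the next four positions with `7`, which is exactly what a 4-pixel RLE run would do.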
tokumaru wrote:
I'm curious about this LZO1X format but can't find a description of it anywhere... I only found the documentation on how to use the libraries and the source code of the decompressors. Do you have a link to a plain description of the format so I can see what the big deal is about this one?
Unfortunately I couldn't find any description either; I can't even find much about the LZO format it is based on.
I tried to somewhat reverse engineer the LZO packing format: it's quite similar to LZ4 but with more block formats (both length and offset can have different sizes) to improve compression; there is also a special trick for the initial byte and the ending...
Here's my unpacking code in C; I derived it from more complex and slower original code, and tried to comment it a bit...
I think I got my C code to a point where I can't really improve it anymore; it unpacks at 3.1 KB / frame, which is actually quite close to my ASM version (3.6 KB / frame for the assembly version, a poor improvement; the C version was already quite good):
Code:
#define M1_MAX_OFFSET 0x0400
#define M2_MAX_OFFSET 0x0800
#define M3_MAX_OFFSET 0x4000
#define M4_MAX_OFFSET 0xbfff

u16 lzo_unpck(const u8 *src, u8 *dest)
{
    const u8 *ip;
    const u8 *m_pos;
    u8 *op;
    s16 t;
    // unsigned: it also holds the raw 16-bit offset word, which must not be sign-extended by ">> 2"
    u16 rawCopy = 0;

    op = dest;
    ip = src;

    // special case of initial unpacked block data
    if (*ip > 17)
    {
        t = *ip++ - 17;
        if (t < 4) rawCopy = t;
        else rawCopy = 4;
        while (t--) *op++ = *ip++;
    }

    for (;;)
    {
        // segment type
        t = *ip++;

        // 0000XXXX
        if (t < 16)
        {
            // rawCopy = 4 (special value indicating a previous big raw copy ?)
            if (rawCopy == 4)
            {
                // number of literal bytes to copy (b10)
                rawCopy = t & 3;
                // set m_pos to current - (specified 10bit (2 bits from t b32 + 8 bits from *ip) + M2_MAX_OFFSET)
                m_pos = op - (1 + M2_MAX_OFFSET + (t >> 2) + (*ip++ << 2));
                // only 3 bytes to copy from m_pos
                t = 3;
            }
            // rawCopy != 0 (1-3) --> small match
            else if (rawCopy)
            {
                // number of literal bytes to copy (b10)
                rawCopy = t & 3;
                // set m_pos to current - specified 10bit (2 bits from t b32 + 8 bits from *ip)
                m_pos = op - (1 + (t >> 2) + (*ip++ << 2));
                // only 2 bytes to copy from m_pos
                t = 2;
            }
            else
            // rawCopy = 0 --> can't do match
            {
                // special case of 0: extended length, each zero byte adds 255
                if (!t)
                {
                    u8 len;

                    while (!(len = *ip++))
                        t += 255;
                    t += 0xF + len;
                }

                // just literal copy to do (min size = 4)
                *op++ = *ip++;
                *op++ = *ip++;
                *op++ = *ip++;
                while (t--) *op++ = *ip++;

                // big literal copy just happened
                rawCopy = 4;
                continue;
            }
        }
        // 11xxxxxx - 10XXXXXX - 01XXXXXX (>= 64)
        else if (t >= 64)
        {
            // number of direct bytes to copy (b10)
            rawCopy = t & 3;
            // set m_pos to current - specified 11bit (3 bits from t b432 + 8 bits from *ip)
            m_pos = op - (1 + ((t >> 2) & 7) + (*ip++ << 3));
            // byte to copy from m_pos = t b765
            t = (t >> 5) - 1 + (3 - 1);
        }
        // 001XXXXX (32-63)
        else if (t >= 32)
        {
            // byte to copy from match (b43210)
            t &= 0x1F;
            // special case of 0: extended length
            if (!t)
            {
                u8 len;

                while (!(len = *ip++))
                    t += 255;
                t += 0x1F + len;
            }
            // minimum size = 2
            t += 3 - 1;

            // 16-bit little-endian word: 14-bit offset (b15-b2) + literal count (b10)
            rawCopy = *ip++ << 0;
            rawCopy |= *ip++ << 8;
            // set m_pos to current - specified 14bit (from *ip)
            m_pos = op - (1 + (rawCopy >> 2));
            // number of literal bytes to copy
            rawCopy &= 3;
        }
        // 0001XXXX (16-31)
        else
        {
            // set m_pos to current or current - 16384 (b3 of t is bit 14 of the offset)
            m_pos = op - ((t & 8) << 11);
            // byte to copy from match (b210)
            t &= 0x07;
            // special case of 0: extended length
            if (!t)
            {
                u8 len;

                while (!(len = *ip++))
                    t += 255;
                t += 0x7 + len;
            }
            // minimum size = 2
            t += 3 - 1;

            // 16-bit little-endian word: 14-bit offset (b15-b2) + literal count (b10)
            rawCopy = *ip++ << 0;
            rawCopy |= *ip++ << 8;
            // set m_pos to current - specified 14bit (from *ip)
            m_pos -= rawCopy >> 2;
            // number of literal bytes to copy
            rawCopy &= 3;

            // done ! (a zero offset is the end-of-stream marker)
            if (m_pos == op)
                return op - dest;

            // subtract M3_MAX_OFFSET from m_pos
            m_pos -= M3_MAX_OFFSET;
        }

        // t = 2 at least here
        *op++ = *m_pos++;
        *op++ = *m_pos++;
        t -= 2;
        // do copy from back buffer (already unpacked data)
        while (t--) *op++ = *m_pos++;

        t = rawCopy;
        // then copy literal bytes
        while (t--) *op++ = *ip++;
    }

    // return unpacked size (never reached, the end marker returns above)
    return op - dest;
}
I spent a lot of time on that LZO stuff, but I think I will just try the LZ4 format, which is much better documented and more widely used.
I was not able to get a recent precompiled LZO packer for Windows to test how well LZO1X performs with maximum compression. It should perform well, but the only LZO1X packer I was able to obtain / build is quite old and does not even pack as well as recent LZ4HC :-/ So I should definitely at least switch to LZ4 to see how much speed I can get from it, but I believe it won't be really good either (a bit better than LZO1X but not much)...
Just tested LZ4, well, definitely better than LZO... I can unpack at ~5 KB / frame using an optimized version of the unpacker.
I wonder if I can improve it, but it definitely looks like a better solution than LZO after all.
Edit: Can't get more than 5.5 KB / frame with other minor optimizations.
What a shame there are no word-aligned optimized LZW packers, so we could properly take advantage of the 68000 word/dword transfers.
Having that would allow at least 10 KB / frame, which would be awesome for LZ77 packed data
Maybe I have an idea...
Quote:
What a shame there are no word-aligned optimized LZW packers, so we could properly take advantage of the 68000 word/dword transfers.
Because it should be worse for the compression ratio.
Yeah, of course it won't compress as much, but I believe it might not be that bad and the speed improvement can be really high
The idea is to use it to (un)pack 4bpp GFX data anyway, so I guess it won't make a big difference in compression ratio.
At least I hope so :p
Stef wrote:
Yeah, of course it won't compress as much, but I believe it might not be that bad and the speed improvement can be really high
Of course the decompression speed should be very high, but I really think that the compression ratio will suffer a lot.
Wait and see.
It already suffers badly from using bytes instead of nibbles. Also pixelart doesn't really compress that well in the first place since there isn't much redundancy at all within a sprite (UFTC exploits the fact that among different sprites from the same animation there may be some repeated portions as well as a good chunk of blank pixels, but even then it barely makes it to about half the size in the best cases).
That too, you'll need to compress many graphics at once to get any real gain at all, and this means you need a way to get random access (or at least the ability to immediately jump to a given graphic within a set). That alone makes a lot of compression algorithms completely useless.
psycopathicteen wrote:
I played around with forming Final Fight sprites into blocks, and found the biggest sprites can fit within a 2kB slot. So VRAM-wise it can do 6 enemies, and have 4kB left for items.
Of course with 16kB you can go with 8 128x128 sprites, which is more than enough. The only problem with the SNES is that you cannot maximise the sprites on screen or the VRAM use, because you can't have more than 2 sprite sizes on screen.
And with that size you need a double buffer for your sprites, or else you can reach your DMA limit quickly.
Maybe I'm wrong, but it seems evident with your Alisha sprite that you waste some VRAM on empty space because of that.
What size did you use for Alisha?
EDIT: answer here:
viewtopic.php?f=12&t=14034
@sik: you're right, but half the size is not bad at all, unless the average compression is often close to 10/15% or less.
TOUKO wrote:
@sik: you're right, but half the size is not bad at all, unless the average compression is often close to 10/15% or less.
Half the size means twice as many graphics so yeah, it's definitely not a bad thing. But the discussion is about trying a modern algorithm to get better compression, and UFTC pales in comparison =P (then again SFA2 had the advantage of extra hardware to aid with decompression, so they could afford something better too)
Alisha alternates character animation updates whenever it reaches the DMA limit.
I finally completed my own packer based on LZ4 but optimized for the 68000 CPU (and so taking advantage of 16-bit moves). The idea was to get something fast enough for "live" sprite data unpacking that also gives a good compression rate.
OK, so to start with, speed (tested on Kega Fusion):
It varies between 7.5 and 10 KB/frame depending on the compression level of the input data (lower compression is usually slower to unpack), compared to an optimized LZ4 implementation which varies between 4.5 and 6 KB/frame. Unfortunately the improvement, even if still great, is not as large as expected (block header decoding still eats a lot) and we are still very far from the 12-14 KB/frame of UFTC :p
For me that is not good enough, as unpacking 5 KB of graphic data per frame will eat about 75% of the CPU time :-/ I really need to improve the speed to make it more usable for my requirements (I would like to have a minimum of 10 KB/frame).
Then the compression level:
This time it's good news: the compression level achieved by this new compression scheme (I called it LZ4W as it's simply based on LZ4 but optimized for word read/write) is definitely not as bad as we could expect from a word-based compression. It could even be better, but I preferred to simplify it to gain unpacking speed.
Here are some examples:
tiles data: original=12672 LZ4HC=6622 LZ4W=6746 UFTC=11524
single BMP sprite: original=4798 LZ4HC=2202 LZ4W=2168 UFTC=3580
text file: original=40992 LZ4HC=16088 LZ4W=27458 UFTC=44700
highly compressible tiles: original=346656 LZ4HC=24613 LZ4W=48202 UFTC=114932
map data: original=4576 LZ4HC=1367 LZ4W=1382 UFTC=3212
LZ4W performs badly on bigger files, which is expected as the search window is small (256 words)... anyway I plan to use it on small data, and hopefully it performs well in that case =)
I'm posting this here as I believe the 65816 can also benefit from this compression scheme, probably not as much as the 68000, but the 16-bit architecture can still give a boost here. I will post more info about the compression format soon, it's very simple. I developed the compressor in Java and will provide the sources as well...
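For reading the table above as percentages, a tiny helper like this is enough (a sketch only; packed_percent is a made-up name, and "percent" here means packed size relative to original size, so lower is better):

```c
#include <assert.h>
#include <stdint.h>

/* Packed size as a percentage of the original size (lower is better). */
static double packed_percent(uint32_t original, uint32_t packed)
{
    assert(original > 0);
    return 100.0 * (double)packed / (double)original;
}
```

Applied to the rows above, the tiles data comes out at roughly 53% with LZ4W (6746 / 12672), while the highly compressible tiles give about 7% with LZ4HC and about 14% with LZ4W, which is where the small window hurts.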
New update: I was able to optimize my unpacking code while keeping the same compression level.
I managed to obtain a decompression rate between 10.5 KB and 14-15 KB / frame, I am happy with that final result
I will probably keep only 2 compression schemes for the next SGDK:
- aplib (good compression but slow)
- lz4w (average compression and fast).
Wow! That's a pretty large improvement...
So even if you were maxing out the bandwidth to vram, you should still have well over half the CPU power left.
I always heard that the DKC games used some sort of really good compression. Has anyone ever looked at the format and found the compression ratio and whatnot? I assume compression on the SNES wouldn't exactly work the same as on the Genesis due to different graphic formats?
Quote:
Wow! That's a pretty large improvement...
So even if you were maxing out the bandwidth to vram, you should still have well over half the CPU power left.
Yeah, that was the idea. I really wanted something that lets me use the full bandwidth and still have enough CPU time to work with
Quote:
I always heard that the DKC games used some sort of really good compression. Has anyone ever looked at the format and found the compression ratio and whatnot? I assume compression on the SNES wouldn't exactly work the same as on the Genesis due to different graphic formats?
LZ4 is a general-purpose compression scheme, so I believe it should work reasonably well even for SNES graphics. Do you have a sample file containing preconverted raw SNES tile data so I could test it?
I think the method I used could be well suited to the 65816 CPU too. It won't be as fast as on the 68000, but I believe it could reach almost half the speed (which is already a good unpacking speed for the 65816, I think).
But now that you mention it, I think DKC is indeed probably using a very efficient compression scheme as well. I know this game uses a 4 MB ROM, but the amount of animation is still impressive, so I'm almost certain they had to compress the sprite data...
Maybe they use a large part of work RAM to unpack the sprite data at level loading, but I don't think so; I'm not sure you can store all the sprite animation required for a level in 128 KB.
Stef wrote:
Do you have a sample file containing preconverted raw SNES tile data so i could test it ?
What do you mean by preconverted? You mean like LZ4 compressed versions of graphics in the SNES's graphics format? If this is the case, I unfortunately don't have any.
Stef wrote:
i think that indeed DKC is probably using a very efficient compression scheme as well.
I thought I heard something about how it had a better compression ratio (at least for the same processing power) than anything previously done on the SNES, and even though the graphics are relatively small, there's an insane number of animation frames in the game (I saw it in a level editor, but I don't remember the number now.) Somewhat humorously, there was even space for some unused graphics (including those of an enemy from DKC2) according to TCRF.
Espozo wrote:
What do you mean by preconverted? You mean like LZ4 compressed versions of graphics in the SNES's graphics format? If this is the case, I unfortunately don't have any.
I should have said "raw SNES graphics file" instead
I just want a binary file containing SNES tile data (in bitplane format) so I can test how much my LZ4W packer can compress it. Ideally it would be nice to compare the compression ratio to existing SNES GFX packers
Quote:
I thought I heard something about how it had a better compression ratio (at least for the same processing power) than anything previously done on the SNES, and even though the graphics are relatively small, there's an insane number of animation frames in the game (I saw it in a level editor, but I don't remember the number now.) Somewhat humorously, there was even space for some unused graphics (including those of an enemy from DKC2) according to TCRF.
Well, I'm now really curious about the compression algorithm used in DKC, I think I will investigate that
Congrats Stef on this improvement. I am between 13 and up to 17 KB per frame on the Hu6280, this of course with classic LZ4.
It's not optimised to the max, but I don't think I can go higher.
Espozo: LZ4 essentially boils down to a simple byte copy, so it depends on how fast you can transfer bytes and on your CPU frequency. This is why it's so fast (you can transfer 19 KB in a frame with the Hu6280).
This is why the SNES's CPU cannot do much more than 5/5.5 KB per frame.
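To show why it is mostly "just copying", here is a minimal C sketch of a decoder for one LZ4 block, following the public LZ4 block format (token byte, optional extra length bytes, literals, 16-bit little-endian offset). The name lz4_block_decode is made up, there is no bounds checking on the output, and this is not the optimized 68000 / Hu6280 code discussed in this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one LZ4 block; returns the number of bytes written to dst.
 * Assumes a well-formed block and a dst buffer that is large enough. */
static size_t lz4_block_decode(const uint8_t *src, size_t src_len, uint8_t *dst)
{
    const uint8_t *ip = src;
    const uint8_t *end = src + src_len;
    uint8_t *op = dst;

    while (ip < end) {
        uint8_t token = *ip++;
        uint8_t b;
        const uint8_t *m_pos;
        size_t offset;

        /* literal run: high nibble, 15 means more length bytes follow */
        size_t len = token >> 4;
        if (len == 15) {
            do { b = *ip++; len += b; } while (b == 255);
        }
        memcpy(op, ip, len);
        op += len;
        ip += len;

        /* the last sequence of a block carries literals only */
        if (ip >= end)
            break;

        /* match: 16-bit little-endian offset, length = low nibble + 4 */
        offset = (size_t)ip[0] | ((size_t)ip[1] << 8);
        ip += 2;
        assert(offset > 0);
        len = token & 15;
        if (len == 15) {
            do { b = *ip++; len += b; } while (b == 255);
        }
        len += 4;

        /* byte-wise copy, because the match may overlap the output */
        m_pos = op - offset;
        while (len--)
            *op++ = *m_pos++;
    }
    return (size_t)(op - dst);
}
```

Everything outside the two copy loops is the per-sequence header decoding, which is the part that keeps real unpackers below the raw memory copy speed.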
Wow, 13 to 17 KB/frame?? Did you measure it? That is amazing, and even more than what I can achieve with the 68000. Actually, depending on the compression level it can go up to ~16 KB/frame, but usually it varies between 10 and 14 KB/frame.
LZ4 is really well suited to 8-bit CPUs (and the PCE uses a really fast one with the powerful TAI instruction), so I guess using LZ4W won't provide better performance in your case, but I think it can for the 65816 (not much, but still). LZ4 on a 2.68 MHz 65816 performs between 3 and 5 KB/frame I believe; with LZ4W we can probably obtain 4.5 to 7 KB/frame
Edit: Actually I'm realizing the LZ4 unpack code takes advantage of the MVx instructions, and unfortunately you cannot transfer a memory block faster than with these instructions on the 65C816, even in 16-bit accumulator mode... So you're right, there is no way of passing that 5 KB/frame limit :-/
By the way, I found some information about DKC graphics:
http://www.dkc-atlas.com/forum/viewtopi ... =38&t=1167
http://www.dkc-atlas.com/forum/viewtopic.php?f=56&t=226
Apparently the tile data itself is not packed... I'm not that surprised, as this game is already pushing bandwidth usage, so they would need blazing fast unpacking code to feed it. And even with blazing fast unpacking code they could not go higher than 5 KB/frame using all the CPU time. So there is just no way of unpacking that much sprite data live on a stock SNES...
Quote:
LZ4 essentially boils down to a simple byte copy, so it depends on how fast you can transfer bytes and on your CPU frequency. This is why it's so fast (you can transfer 19 KB in a frame with the Hu6280).
This is why the SNES's CPU cannot do much more than 5/5.5 KB per frame.
Actually you still have the compression decoding part, so 19 KB for the PCE (funny that the block move instruction is actually faster on the Hu6280 than on the 65C816) or 5.5 KB for the SNES are theoretical maximums for a simple memory copy (it's ~25 KB/frame on the MD using move.l), but in practice you are below that of course. At an average compression level (between 40 and 50%) you get about the worst unpacking speed, so about half (in the best case) of the maximum memory copy speed. LZ4W uses a simpler compression scheme than LZ4, so it can help a bit with average speed, but definitely not much for the PCE and SNES.
Yes, I measured it. Sorry Stef for not answering all of your post, but it's not very practical on my cell phone.
I started with the 6502 version and optimised it with Hu6280 opcodes, but still with lda/sta, which is "fast" because of the CPU frequency but not the best, and then I used the block transfer instructions.
For now it only decompresses to RAM, but I also want to do it directly to VRAM, and even for RAM it's not finished yet (for now it's limited to only 2 banks, 16 KB).
19 KB is doable with a single instruction,
Txx src,dst,size, but not so easily outside of that case, I think.
But I am curious about how many registers you used for decompression?
And about the compression ratio compared to classic LZ4?
OK, it looks like your LZ4 unpacking implementation still has limitations for your use case. I understand the TAI instruction can be problematic as well (for interrupts or just for bank crossing). The original LZ4 code for the 65C816 was also limited to a 64 KB bank for that reason. My version doesn't have that kind of limitation, but as we only have 64 KB of RAM on the MD we don't need to cover more anyway.
About the compression ratio, here are the numbers comparing LZ4W with LZ4HC (which is the best compression level for LZ4):
Tiles data: original=12672 LZ4HC=6622 LZ4W=6746 UFTC=11524
single BMP sprite: original=4798 LZ4HC=2202 LZ4W=2168 UFTC=3580
text file: original=40992 LZ4HC=16088 LZ4W=27458 UFTC=44700
highly compressible tiles: original=346656 LZ4HC=24613 LZ4W=48202 UFTC=114932
map data: original=4576 LZ4HC=1367 LZ4W=1382 UFTC=3212
Actually I slightly modified my encoder, so the numbers are now a bit different but should be very close to these.
About the number of registers used, my unpacking code is crazy simple! Basically it requires only 2 data registers and 3 address registers:
- 1 data reg hosting literal and match length
- 1 data reg for match offset
- 1 address reg hosting source buffer
- 1 address reg hosting destination buffer
- 1 address reg hosting temp source buffer for match copy
I'm using 2 extra address registers for internal use (jump table address and fast loop start jump), but you can do without them.
I calculated ~7.5 kB per frame for block moves, and the 65816 can outperform its block move by using an unrolled loop. If you have enough room in the first 8kB, you can use Direct Page and optimize it even more.
psycopathicteen wrote:
I calculated ~7.5 kB per frame for block moves, and the 65816 can outperform its block move by using an unrolled loop. If you have enough room in the first 8kB, you can use Direct Page and optimize it even more.
(2.68 MHz / 7 cycles) / 60 = ~6.5 KB per frame.
If we use fast ROM we can consider this: (((2.68 + 3.58) / 2) / 7) / 60 = ~7.5 KB per frame indeed!
How can you go faster with an unrolled loop? Even in 16-bit accumulator mode I can't see how to go faster than 7 cycles per byte.
5 cycles to load a word (minimum with indexing)
5 cycles to store it
2+2 cycles to increment both indexes
So 14 cycles for 2 bytes, which is the same as the MVx instruction speed.
Don't increment the index registers. Start with an offset of 254, and go down like this:
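The arithmetic above can be written as a small C helper (a sketch; bytes_per_frame is a made-up name, and it assumes 60 frames per second and that every cycle of the copy runs at the given clock, which is a simplification):

```c
#include <assert.h>

#define FRAMES_PER_SECOND 60.0

/* Bytes copied per frame when each byte costs cycles_per_byte cycles
 * and all of those cycles run at clock_hz. */
static double bytes_per_frame(double clock_hz, double cycles_per_byte)
{
    assert(cycles_per_byte > 0.0);
    return clock_hz / cycles_per_byte / FRAMES_PER_SECOND;
}
```

bytes_per_frame(2.68e6, 7.0) gives about 6380 bytes, and using the averaged 3.13 MHz clock gives about 7450 bytes, in line with the rounded figures above. The 14 cycles per word of the LDA/STA/INX/INY loop is the same 7 cycles per byte, so it lands on the same numbers.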
Code:
lda $00fe,x
sta $00fe,y
lda $00fc,x
sta $00fc,y
...
lda $0002,x
sta $0002,y
lda $0000,x
sta $0000,y
Use the length as an index into a LUT of entry points. If it's an odd number it gets rounded up.
I just thought of another trick. If you still have enough room in the first 8kB of RAM, you can copy data with just a series of "PEI ($xx)".
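For anyone who prefers to see the unrolled-copy idea (not the PEI variant) in C, here is a rough analogue. It is only a sketch: the names and the cap of 8 words are made up, and the switch stands in for the LUT of entry points into the LDA/STA sequence:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 8 /* hypothetical cap; the asm above unrolls 128 words */

/* Copy `words` 16-bit words (1..MAX_WORDS) without a loop counter or
 * index increments: the switch jumps into the unrolled sequence and
 * execution falls through from the highest offset down to offset 0. */
static void copy_words_unrolled(uint16_t *dst, const uint16_t *src, size_t words)
{
    assert(words >= 1 && words <= MAX_WORDS);
    switch (words) {
    case 8: dst[7] = src[7]; /* fall through */
    case 7: dst[6] = src[6]; /* fall through */
    case 6: dst[5] = src[5]; /* fall through */
    case 5: dst[4] = src[4]; /* fall through */
    case 4: dst[3] = src[3]; /* fall through */
    case 3: dst[2] = src[2]; /* fall through */
    case 2: dst[1] = src[1]; /* fall through */
    case 1: dst[0] = src[0];
    }
}

/* An odd byte length is rounded up to a whole number of words. */
static size_t bytes_to_words(size_t bytes)
{
    return (bytes + 1) / 2;
}
```

Note that rounding an odd length up means one extra byte gets written past the requested length, so the destination needs a byte of slack, or the next copy has to overwrite it.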
Stef wrote:
(2.68 + 3.58) / 2
With block move, it looks like the only slow cycle is the write to RAM, so that should be more like (2.68 + 6 * 3.58) / 7.
It seems to me that it might be helpful to use DMA to copy literals, as long as you avoided conflict with HDMA or else just didn't care about 1/1/1 compatibility...
EDIT2: This was not an attempt at faux-ninjaing tepples; I only changed the phrasing a bit.
Stef wrote:
How can you go faster with unrolled loop ? Even in 16 Acc mode i can't see how to do faster than 7 cycles per byte.
If you use an algorithm that uses a static dictionary rather than an algorithm where bytes refer to previous bytes (such as LZ77 family), DMA copies from ROM to WRAM's B bus port become possible at 1 slow cycle per byte. But I'm not sure whether the overhead of setting up a DMA copy for every few copied bytes is worth it, and I'm not sure how well a static dictionary method would work with the weird interleaving of tile planes in Super NES 4-bit tiles and TG16 background tiles.
Which tile data corpus are you using?
psycopathicteen wrote:
Don't increment the index registers. Start with an offset of 254, and go down like this:
Code:
lda $00fe,x
sta $00fe,y
lda $00fc,x
sta $00fc,y
...
lda $0002,x
sta $0002,y
lda $0000,x
sta $0000,y
Use the length as an index into a LUT of entry points. If it's an odd number it gets rounded up.
I just thought of another trick. If you still have enough room in the first 8kB of RAM, you can copy data with just a series of "PEI ($xx)".
Nice trick
You still have to increment X/Y for each decoded sequence, but that is already a nice gain. I would even use negative offsets with LDA/STA, so you can increment X/Y before the jump, while you still have the literal length in a register, and use it for the jump table. It would be interesting to try to put together the fastest algorithm and see how it performs.
This LDA/STA sequence costs 11 cycles to transfer a word if I'm not mistaken, so a bit more than 9 KB/frame at most.
Quote:
With block move, it looks like the only slow cycle is the write to RAM, so that should be more like (2.68 + 6 * 3.58) / 7.
Does that mean that, except on RAM access cycles, the CPU is *always* running at 3.58 MHz? I guess it's quite difficult to really know how the CPU actually handles its RAM access cycles.
Quote:
If you use an algorithm that uses a static dictionary rather than an algorithm where bytes refer to previous bytes (such as LZ77 family), DMA copies from ROM to WRAM's B bus port become possible at 1 slow cycle per byte. But I'm not sure whether the overhead of setting up a DMA copy for every few copied bytes is worth it, and I'm not sure how well a static dictionary method would work with the weird interleaving of tile planes in Super NES 4-bit tiles and TG16 background tiles.
Unfortunately I think the DMA setup overhead is way too heavy. In the general case you transfer very few bytes at once (3 to 5 bytes/words), which is why the block header decoding has to be pretty fast to allow a good unpacking speed.
I was using tile data for the MD (so chunky pixels), but I would really like to test compression on planar tile data to see the difference.
Ideally it would be nice to test the same tile set in chunky and in planar format
Stef wrote:
Does that mean that, except on RAM access cycles, the CPU is *always* running at 3.58 MHz?
The timing logic is built into the CPU, and I'm pretty sure it's cycle-based, not instruction-based. ROM in the upper half of the memory map is fast if $420D is set. I believe internal operations that don't use the bus are always fast, and special areas like the B bus ($21xx) and internal register access ($42xx/$43xx) are always fast. The DSP area and the ROM areas below bank $80 are always slow.
Quote:
Unfortunately i think the DMA setup overhead is way too heavy
It's not like you have to do the whole thing every time. In this case, the DMA type doesn't change, the destination (B bus WRAM gate) doesn't change, and if you're using tepples' fixed-dictionary scheme you don't even have to rewrite the WRAM gate address because it auto-increments. All you need to do is reset the source address and size, and start the transfer - two writes, one immediate load, and one more write, and I think all of the writes can be direct page (EDIT: except the actual transfer start; that's in the previous page and won't fit unless you use a non-page-aligned DP, which would remove the speed advantage). Or, if you don't need both index registers, you could eliminate the load and use a 16-bit write to start the transfer. This procedure is 100% fast cycles if $420D is set.
Have I forgotten something important? (It happens - I'm no expert...)
Thanks for the clarification about the CPU speed, it has always intrigued me
About the DMA, the question is: is it worth it if you need to transfer only a few bytes?
Maybe it is... after all, to use a software copy you also need to initialize stuff for the jump table...
You can find my LZ4W implementation for the 68k and the Java packer/unpacker here if you want to give it a try:
https://www.dropbox.com/sh/smgwbi8g6y50 ... 03vQa?dl=0
Nice trick, I forgot you can relocate the 65816's DP.
@stef: Txx can be problematic for interrupts if you transfer more than 32 bytes.
Here's some graphical data from my game that you can test with. I got it from the hex editor part of the bsnes debugger.
Code:
00 00 00 00 00 00 00 00 00 00 00 00 01 01 02 02
00 00 00 00 00 00 00 00 00 00 00 00 00 01 01 03
00 00 00 00 0f 0f 33 30 4f 40 bc 83 70 0f 60 1f
00 00 00 00 00 0f 0f 3f 3f 7f 7f ff ff ff ff ff
00 00 00 00 80 80 e0 60 f0 10 fb 1b 7f a4 6d ab
00 00 00 00 00 80 80 e0 e0 f0 f8 e3 e3 df e6 df
00 00 00 00 03 03 04 04 79 78 cf be b7 0d ef cb
00 00 00 00 00 03 03 07 07 7f 77 f9 f9 fe 35 fe
00 00 00 00 e0 e0 70 10 f8 08 ec 14 c4 3c c2 3e
00 00 00 00 00 e0 e0 f0 f0 f8 f8 fc f8 fc fc fe
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
02 02 04 04 05 04 09 08 09 08 0b 08 0b 08 0b 08
01 03 03 07 03 07 07 0f 07 0f 07 0f 07 0f 07 0f
c6 3f cc 3f 99 7e 93 7d b6 7a 26 fa 2c f4 0c f4
ff ff ff ff ff ff fe ff fc fe fc fe f8 fc f8 fc
fa 36 fc 74 b5 b4 3a 29 2b 28 1a 18 19 18 14 14
ed de 8b fc 0a bc 14 38 16 39 06 11 07 10 03 10
39 2b df 15 ff 0d 65 99 fd 05 d7 27 db 2b ea 0a
d6 3f ea 1f 62 0f 62 03 62 0b 60 0b f0 03 f0 02
ca 3e ca 3e 6d 1f 65 1f 65 1f 35 0f 35 0f b1 8e
fc fe fc fe fe ff fe ff fe ff fe ff fe ff 7f ff
00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 80
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
13 10 13 10 13 10 13 10 13 10 13 10 13 10 13 10
0f 1f 0f 1f 0f 1f 0f 1f 0f 1f 0f 1f 0f 1f 0f 1f
18 e8 18 e8 38 c8 a8 48 b0 50 b0 50 b0 50 f1 11
f0 f8 f0 f8 f0 f8 f0 f8 e0 f0 e0 f0 e0 f0 e0 f0
02 02 03 03 1c 1c 21 20 5b 43 ad 94 bb 88 49 28
01 00 00 00 03 00 1f 00 3f 00 7c 03 78 07 f8 07
10 10 e0 e0 3c 3c e2 02 f7 f1 fb 08 0e f5 de 25
e0 00 00 00 c0 00 fc 00 fe 00 0f f0 07 f8 07 f8
b1 8e b9 86 99 86 99 86 9b 84 db c4 db c4 da 44
7f ff 7f ff 7f ff 7f ff 7f ff 3f 7f 3f 7f bf 3f
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80
11 10 11 10 09 08 09 08 09 08 08 08 04 04 04 04
0f 1f 0f 1f 07 0f 07 0f 07 0f 07 0f 03 07 03 07
f1 11 f2 12 f5 14 f5 14 fa 19 db 18 d7 11 d1 11
e0 f0 e1 f0 e3 f0 e3 f0 e7 f0 e7 f0 ee e0 ee e0
78 18 af 6f 68 a8 c5 44 85 84 85 84 0b 08 09 08
e8 07 cf 00 cf 00 83 00 03 00 03 00 07 00 0f 00
7d 0c f3 f2 22 22 c1 41 c1 41 40 c0 40 c0 e0 20
0b f0 f1 00 c1 00 80 00 80 00 80 00 80 00 e0 00
5e c0 6e a0 af 61 bf 51 5d 31 dd a9 ad 99 7d 49
bf 3f df 1f de 1f ee 0f ee 0f 76 07 76 07 36 07
80 80 80 80 00 00 00 00 00 00 00 00 00 00 00 00
00 80 00 80 00 00 00 00 00 00 00 00 00 00 00 00
04 04 02 02 02 02 01 01 01 01 00 00 00 00 00 00
03 07 01 03 01 03 00 01 00 01 00 00 00 00 00 00
5e 1e 50 10 10 10 10 10 10 10 90 90 90 90 50 50
e0 f0 e0 f0 e0 f0 e0 f0 e0 f0 60 f0 60 f0 20 70
16 16 27 21 4d 42 ff fe e1 21 2d 21 2f 21 2b 25
17 08 21 1e 40 3f fe 01 1f 00 1e 00 1e 00 1e 00
f0 50 f8 c8 6c 14 3e 1e ee e8 88 88 9c 84 5c 44
d0 20 c8 30 04 f8 1e e0 f0 00 70 00 78 00 38 00
4a 4a 32 32 22 22 22 22 24 24 14 14 14 14 18 18
34 06 0c 0e 1c 3e 1c 3e 18 3c 08 1c 08 1c 00 18
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30 30 10 10 00 00 00 00 00 00 00 00 00 00 00 00
00 30 00 10 00 00 00 00 00 00 00 00 00 00 00 00
2a 26 3a 26 56 4a 54 4c 54 4c 74 4c ac 94 ac 94
1c 00 1c 00 3c 00 38 00 38 00 38 00 78 00 78 00
56 4a 3a 26 2b 25 1d 13 15 12 0e 09 0a 09 0b 08
3c 00 1c 00 1e 00 0e 00 0f 00 07 00 07 00 07 00
18 18 10 10 00 00 00 00 80 80 80 80 c0 40 c0 40
00 18 00 10 00 00 00 00 00 00 00 00 80 00 80 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 01 01 01 01 02 02 04 04 09 08 13 10 26 20
00 00 00 00 00 00 01 00 03 00 07 00 0f 00 1f 00
b8 88 38 08 28 08 10 10 d0 10 a0 20 40 40 80 80
70 00 f0 00 f0 00 e0 00 e0 00 c0 00 80 00 00 00
05 04 04 04 04 04 05 04 05 04 09 08 0a 09 0a 09
03 00 03 00 03 00 03 00 03 00 07 00 07 00 07 00
a0 20 a0 20 a0 20 c0 40 40 c0 40 c0 c0 40 c0 40
c0 00 c0 00 c0 00 80 00 80 00 80 00 80 00 80 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 01 03 03 02 02 02 02 07 07 04 04
00 00 00 00 00 00 03 00 02 01 02 01 07 00 04 00
4d 41 9a 82 34 04 88 88 7c 74 c6 3a 7e 02 fc fc
3e 00 7c 00 f8 00 f8 00 74 88 02 fc 02 fc fc 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0b 08 13 10 12 10 1c 1c 13 13 16 11 3b 38 27 27
07 00 0f 00 0f 00 1f 00 13 0c 10 0f 38 07 27 00
80 80 80 80 80 80 c0 c0 e0 a0 30 d0 f0 10 e0 e0
00 00 00 00 00 00 40 00 a0 40 10 e0 10 e0 e0 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Thanks, I just tested it. The original is 1536 bytes.
APLIB : 885 bytes (57%)
LZ4HC : 1061 bytes (69%)
LZ4W : 1048 bytes (68%)
I'm glad to see my custom LZ4W does even better than LZ4HC (by a very small margin) :p
The thing is that it's a really small dataset, so it can't compress much; even APLIB is not that good here...
but still, given the amount of null data (00), I think the result is not really good :-/
Here's 9kB of graphical data.
Quote:
Actually you still have the compression decoding part so 19KB for PCE (funny that block move instruction is actually faster on Hu6280 than on 65C816 CPU)
Yes, but not by much: 6 cycles/byte for the 6280 vs 7 for the 65816. It's the CPU frequency that makes the difference.
Quote:
Ok, it looks like you still have limitations in your use case of your LZ4 unpacking implementation.
Yes, but it is not a technical one. I limited myself to 2 banks for each side (source/dest); I think 16 KB to work with is enough for a "real time" decompressor (I cannot do more).
But it can be a limitation in some cases (large compressed files).
psycopathicteen wrote:
Here's 9kB of graphical data.
OK, here are the results:
Original: 9216 bytes
APLIB: 3968 bytes (43%)
LZ4HC: 4577 bytes (49%)
LZ4W : 5972 bytes (64%)
LZ4HC does a really good job here.
LZ4W starts to miss a lot as the data gets bigger (76 misses on 526 encoded segments).
Actually SNES GFX data does not compress that badly
TOUKO wrote:
Yes, but not by much: 6 cycles/byte for the 6280 vs 7 for the 65816. It's the CPU frequency that makes the difference.
Yeah, of course the CPU speed makes the difference. Still, I would have expected the cycle counts to be the other way around, with the 65816 having the better block move instruction
Quote:
Yes, but it is not a technical one. I limited myself to 2 banks for each side (source/dest); I think 16 KB to work with is enough for a "real time" decompressor (I cannot do more).
But it can be a limitation in some cases (large compressed files).
Honestly, I don't plan to use very large data blocks either; I will use it mainly for sprite tile data. When I was speaking about limitations, I was mainly referring to the fact that using the TAI instruction can be a problem for interrupts.
Quote:
I was mainly referring to the fact that using Txx instructions can be a problem for interrupts.
Yes, it can be, but I'll see if it really is in real use (i.e. in actual game conditions)...
But the big difference is that I can decompress directly to VRAM.
I don't know if it is the same for the MVx instructions on the 65C816, but I guess it is; the CPU probably can't interrupt one and restore the internal state.
Stef wrote:
Original: 9216 bytes
APLIB: 3968 bytes (43%)
LZ4HC: 4577 bytes (49%)
LZ4W : 5972 bytes (64%)
Huh, and I thought SLZ was doing badly because the smaller file performed awfully (but it's simple, so I guess I can't expect much either). I need to get around to implementing that XSLZ idea I had (SLZ isn't the most efficient thing out there); the problem is that I'm too lazy to make a compressor for it. Also SLZ is slow anyway, although I haven't measured it in a while =P
Anyway, with SLZ:
Smaller file: 1536 → 1096 bytes (71.35%)
Larger file: 9261 → 4861 bytes (52.49%)
These files seem to compress awfully in general though. How come nothing can make it shrink to around 30%?
The larger file, with my implementation of HAL's SNES-era compression:
9216 -> 4467 bytes (48.5%)
Exomizer:
9216 -> 3682 (40%)
9216 -> 3663 (39.7%) when crunching backwards
I have no idea what the performance for decompressing them would be like compared to the other algorithms mentioned so far, though.
I just tried 7zip in Ultra mode with LZMA2 and I got 39.6% (3650 bytes) on the 9 KB file and 62.4% (959 bytes) on the 1.5 KB file. Apparently it's not just '90s-friendly methods that have trouble with this data...
With LZ4:
9216 -> 5044 (54,73%)
1536 -> 1097 (71,42%)
Quote:
I have no idea what the performance for decompressing them would be like compared to the other algorithms mentioned so far, though.
Exomizer is like pucrunch: not fast, because it goes for a good ratio, as opposed to the LZ variants.
If you want the best ratio available, Exomizer is a good choice; if you want something fast for on-the-fly decompression, the better choice is LZx.
tepples wrote:
Oh, that is definitely a nice advantage over the Hu6280 TAI instruction, that can explain the extra cycle as well.
About the compression, there is unfortunately no way to obtain a good compression ratio for gfx data when you are using lossless compression (except for simple or cartoonish style gfx). A 50% rate is already nice for me, and if you can get 50% with very fast unpacking then you are done
Quote:
Oh, that is definitely a nice advantage over the Hu6280 TAI instruction, that can explain the extra cycle as well.
Yes, it is a good advantage, but it's Txx: TAI is only one of the block transfer instructions, the Hu6280 has 5.
TII
TDD
TAI
TIN
TIA
Quote:
A 50% rate is already nice for me, and if you can get 50% with very fast unpacking then you are done
I agree.