I've heard there is more of a performance hit with 65xx chips than with other CPUs. I want to know the reasons why, and to try to come up with ways to fix it.
CPUs don't run C code. C and all other languages get compiled into instructions the CPU can understand. C compilers for the 65xx chips aren't the best at optimizing C statements into efficient low-level instructions the way mature compilers for other platforms are. We're still at the point where handwritten asm is likely to be more performant. If you want to improve the situation, improve the C compilers.
The 6502 doesn't do stack-based indexing particularly well, and a naïve translation of C to machine code (as well as any instance requiring recursion) really wants fast stack-based indexing.
FrankWDoom wrote:
C compilers for the 65xx chips aren't the best at optimizing C statements into efficient low-level instructions the way mature compilers for other platforms are.
What psycopathicteen asks is why this is the case: whether it's an inherent limit of the architecture or just a matter of lack of interest in a sub-32-bit architecture among authors of Free compilers. GCC won't save you because "We don’t support 16-bit machines in GNU" (
GNU Coding Standards: Portability between CPUs).
For the 6502, the answer is simple: registers are smaller than int, smaller than size_t, and smaller than char *. Most C operations require promotion of char values to int. Nor can registers be paired into a pointer, making it harder to have automatic variables in a recursive program. So compilers have to use the zero page to store 16-bit values, including the pointer to the automatic variable stack. Accessing arrays through pointers stored as local variables on the stack is thus relatively slow. True, it would be possible to cache pointer variables in software-defined registers in the zero page, except for a couple of things. First, some people expect their programs to interoperate with the ROM BASIC interpreters on the Apple II, Commodore 64, and Atari 800, which take a large chunk of zero page space. Second, some people expect to write interrupt handlers in C, which may clobber the software-defined registers, especially when an NMI interrupts an IRQ that interrupted the main thread.
In addition, because there is no hardware multiply, accessing elements of an array can be slow if the access is not sequential and the size of an element is not a power of two.
The 65816 fixes the pointer problem, as a near pointer can be held in X, Y, or D. The d,s and (d,s),y addressing modes largely fix the problems with pointers to local variables. But it shares the lack of hardware multiply. Thus there were some decent C compilers, such as APW and ORCA/C, but they're non-free and still mostly payware. A program that relies on a non-free compiler is called Java trapped.
Quote:
If you want to improve the situation, improve the C compilers.
How would we go about attracting skilled compiler authors, especially for a niche platform like the 65816? Set up a Kickstarter campaign?
The major problems of "ANSI" C are:
- C was designed for 16-bit or wider CPUs. The ANSI standard says that "int" must be at least 16 bits. The 6502 has only three 8-bit registers, so the use of 16-bit operations and variables is a major performance hit if you "naively" write C code. Of course it can be fixed by using the "char" type everywhere...
- ...but the problem of automatic promotion remains. If you add two "chars" together, the ANSI standard specifies that the result is an "int". So if you conform to the standard you HAVE to do it the inefficient way. Of course it's possible to detect that the high byte will be unused later and delete those instructions internally, but this complicates the compiling process compared to, say, a 16-bit CPU.
- C was designed for CPUs that address the whole address space uniformly. As such, without some kind of clever trick a C compiler cannot take *any* advantage of the zero page, except for temporary storage. A C compiler also assumes all variables are on the stack, and on the 6502 accessing anything but the topmost element of the stack is tricky to say the least. It is possible, but it kills the X register. Finally, the stack is limited to 256 bytes.
- C was designed for CPUs with orthogonal instructions and addressing modes. It cannot take advantage of the $xxxx,X and $xxxx,Y addressing modes for addressing arrays, because the index of an array is always an "int", which again is 16-bit and does not fit in the X or Y register. Even worse, it's a signed int, so negative array indices have to be allowed by the standard, and they don't work on a 6502 except within the zero page, which is unfortunately unusable. So accessing "any" array, even a single-dimensional one, will copy the array address to ZP temporaries, add a 16-bit value to that temporary, and use that pointer to access the element, which is ridiculously bloated.
So the choice when compiling C for the 6502 would be either to depart from the standard substantially, or to make a super-clever compiler that handles all those cases and produces efficient code whenever it detects that doing so is possible.
A third approach would be to say that since C will be slow and bloated anyway, you just generate an intermediate bytecode and interpret that bytecode on the 6502. That's an approach I'd seriously consider, if only I could find a "suitable" bytecode that would be both simple and simple to interpret. Most open bytecodes I could find on the net are unfortunately way too complex.
Quote:
GCC won't save you because "We don’t support 16-bit machines in GNU"
But GCC has an AVR port, which is 8-bit. (Very different from the 6502, but still.)
Quote:
the size of an element is not a power of two
This is an extremely rare case and definitely not part of why C is slow on the 6502. GCC pads its structs to fix that problem by default, because on modern architectures it's better to waste memory than to waste time multiplying by the size of an oddly-sized element.
Would the work done by Intel on support for the 8086 and 80286 in LLVM help the 65816 any?
Quote:
Quote:
the size of an element is not a power of two
This is an extremely rare case
Not if you have an arbitrarily sized struct Actor.
Quote:
GCC pads its structs to fix that problem by default
I thought GCC padded up to the next multiple of the largest alignment of a member, such as 4 bytes if CHAR_BIT == 8 and the struct contains an int32_t. What makes you think it pads, say, a struct with 5 int32_t members from 20 bytes up to 32?
tepples wrote:
[LLVMdev] 16-bit x86 status update wrote:
In fact we've implemented no 16-bit ABI at all. This is really 32-bit
code, 32-bit object formats, 32-bit ABIs. Just expecting to run on a CPU
which happens to be in 16-bit mode and hence needs the 0x66 and 0x67
prefixes to be used. A lot.
So ... Doubtful.
I haven't studied it deeply, but LLVM is a very complex "bytecode" that natively supports variable-sized strings and similar concepts. It has nothing to do with the "simple" bytecode I was thinking of, which would be made of between 15 and 30 simple instructions at most.
psycopathicteen wrote:
What is the slowest part of 6502/65816 running C?
Honestly the architectural issues aren't really the big problem. The slowest thing is the existing compilers we have for this target. They're just not up to the optimization task.
Compilers are among the most difficult and complicated computing tasks. You can't expect a compiler like cc65, used by only a handful of people, to compare to an old workhorse like GCC, with millions of development hours behind it.
The second slowest thing is just the CPU itself. The targets we're talking about just aren't very powerful computers. There's always a loss of efficiency when using C, but if you've got lots of computing power it doesn't have to matter. In a case like the NES, the lack of power magnifies the impact of an efficiency loss like this, which is already bad because of the compiler quality.
These CPUs are from an era before C was popular, so there were never very good C tools for them. The problem isn't really C itself, or the processor.
Consider 65816 vs. 68000, processors from roughly the same hardware generation. I was under the impression that the existing widely available compilers targeting 68000 were a lot better because of the sixteen 32-bit registers (of which D0 and A4-A7 correspond to the five 16-bit registers AXYDS on 65816), instructions that make more orthogonal use thereof (apart from a few instructions that hardcode A7 as the stack pointer), and hardware 16x16 multiplier. Is this an advantage of the 68000 architecture, or is it just that Atari ST, Amiga, and Mac outnumbered Apple IIGS?
tepples wrote:
Atari ST, Amiga, and Mac outnumbered Apple IIGS?
Do you really consider Atari ST, Amiga and Mac the same generation as the Apple IIGS? I don't.
I thought the IIGS was popular for a long time just because it was cheap and backwards compatible, not because of computing power. (Kinda like Wii vs PS3?)
Also we're still talking about an era where assembly was vastly preferred for high performance applications like games. C existed, and a lot of people were using it on the 68000, especially hobbyists, but I don't think "good" C compilers really started to happen until some time in the 90s.
Quote:
C was designed for CPUs with orthogonal instructions and addressing modes. It cannot take advantage of the $xxxx,X and $xxxx,Y addressing modes for addressing arrays, because the index of an array is always an "int", which again is 16-bit and does not fit in the X or Y register. Even worse, it's a signed int, so negative array indices have to be allowed by the standard, and they don't work on a 6502 except within the zero page, which is unfortunately unusable. So accessing "any" array, even a single-dimensional one, will copy the array address to ZP temporaries, add a 16-bit value to that temporary, and use that pointer to access the element, which is ridiculously bloated.
So the 65816 got screwed over by having X and Y being unsigned instead of signed?
psycopathicteen wrote:
Quote:
C was designed for CPUs with orthogonal instructions and addressing modes. It cannot take advantage of the $xxxx,X and $xxxx,Y addressing modes for addressing arrays, because the index of an array is always an "int", which again is 16-bit and does not fit in the X or Y register. Even worse, it's a signed int, so negative array indices have to be allowed by the standard, and they don't work on a 6502 except within the zero page, which is unfortunately unusable. So accessing "any" array, even a single-dimensional one, will copy the array address to ZP temporaries, add a 16-bit value to that temporary, and use that pointer to access the element, which is ridiculously bloated.
So the 65816 got screwed over by having X and Y being unsigned instead of signed?
Not really. The indexing issue can be optimized away any time the index being used is an unsigned char. The ZP pointer issue can be optimized away any time the array is static.
Trying to use a negative index into an array on the 6502 is problematic, yes, but that has nothing to do with C. Same deal with using 16- or 32-bit numbers everywhere: that's a problem for the platform, not really for C itself. If you want your code to run well, you have to know your platform and make concessions for it. Just like having a multiplication instruction makes multiplication easier; again, that's not specific to C but to the platform. You avoid using that stuff on a platform that can't do it well.
Actually, that's basically why cc65 is somewhat usable, and someone like Shiru is able to develop games with it quickly. If you limit yourself to stuff that you know the platform + compiler will handle well, you can still have plenty of the utility of C without the code being too inefficient. Prefer static variables to locals, use unsigned char for everything, use static arrays instead of passing pointers around, etc. etc.
It's perfectly possible to write a much better C compiler for NES / SNES, but it's just not practical given the resources we have. (How many people are both capable and interested? Approximately zero, it seems.)
I may not speak for everyone, but I thought programming in C kind of ruins the fun. You're doing this for fun, not because you need to get the game out the door faster.
I'm still waiting for WDC's Terbium...
Espozo wrote:
I may not speak for everyone, but I thought programming in C kind of ruins the fun.
I hate C, along with most other high-level languages. They all have the weirdest ways of doing things, and you can tell the creators came up with a lot of stupid workarounds for things that could be very simple.
Espozo wrote:
I thought programming in C kind of ruins the fun. You're doing this for fun, not because you need to get the game out the door faster.
I wouldn't equate "slower development" with "more fun". Even if you're doing something for personal experience and not for any commercial consideration, that doesn't mean you want to do it more slowly than you have to.
How often do you see people starting new projects here? How often do you see people finishing anything? If you can accomplish something with 10 days of work instead of 100, doesn't that make you a lot likelier to finish it?
Why use assembly when you can use a hex editor? Is it fun to optimize a 16-bit comparison in assembly for the hundredth time, or would you rather just have it work already and get on to other problems? (I think most people burn out and give up on projects because it takes a lot of repetitive and tedious work to finish them, a lot more often than they run into difficult technical problems.)
rainwarrior wrote:
I wouldn't equate "slower development" with "more fun".
It doesn't. Assembly isn't fun because it's slow.
rainwarrior wrote:
How often do you see people starting new projects here? How often do you see people finishing anything? If you can accomplish something with 10 days of work instead of 100, doesn't that make you a lot likelier to finish it?
I like to think of slow development as a type of quality control. If you're really that dedicated, you'll finish it.
rainwarrior wrote:
Why use assembly when you can use a hex editor?
Because I don't exactly want to track down and change a ton of pointers if I increase the size of a table in ram or something? If it weren't for that, I might have.
Also, I'm pretty sure C won't really help me solve problems related to VRAM, only generate the code for the solution that I thought up, and that isn't incredibly hard for me to do. I mean, C wouldn't have made me think of the idea of 16x16 and 32x32 animation slots; from what I've seen it's still not a miracle worker.
Quote:
I like to think of slow development as a type of quality control. If you're really that dedicated, you'll finish it.
Well, I've been working on my game for more than 10 years, and I'm still only about halfway through its development. So, well, not really.
You obviously haven't been programming in assembly for long enough. Back then I would have agreed with you, but today I agree with rainwarrior instead. Micro-optimizing 6502 code isn't interesting to me any longer. Actually, I'm very seriously considering quitting NES dev to go to PC-pseudo-NES dev instead. The extra effort to make a game actually work on the real console is barely worth the additional time, considering 99% of people will play it emulated anyway.
Quote:
I hate C, [...]
Then any other mid-level language, such as Pascal, could do the trick as well. You could also look into FORTH; personally I didn't like it at all, but many people on 6502.org are fans of that language, and apparently it is an efficient mid-level language available for the 6502.
Quote:
It's perfectly possible to write a much better C compiler for NES / SNES, but it's just not practical given the resources we have. (How many people are both capable and interested? Approximately zero, it seems.)
Well, there's me, but I am stuck and don't know what to do. I have a fairly good list of ideas that would allow generating efficient code for the 6502, following the compile, assemble and link flow. Many of those ideas are crazy, and involve generating a lot of code at compile time only to remove it entirely at link time, but they should work well. I could even optimize zero-page usage using my ideas. The problem is: how do I implement them? Do I try to make a back-end for GCC, or for SDCC, or for LLVM? Or do I make an unofficial branch of cc65 or Quetzacoatl, which already compile code for the 6502, although poorly? All of those are extremely complex projects, and understanding someone else's code is much more complex than writing your own equivalent. Or would I be better off making my own non-standard C compiler?
I would really love to work on such a project, but the problem is that it is a huge amount of work, and I'm not being paid for it.
There are a number of early C compilers that targeted 8-bit, but they were for the 8080 or Z80. If source were available for these compilers, how hard would it be to adapt them to the 6502, or maybe recompile the binary? (Assuming they did a better job than cc65, which may not be the case.)
Bregalad wrote:
Well, I've been working on my game for more than 10 years, and I'm still about halfway in it's development. So, well, not really.
Well, I've been working on my game for a little over half a year and I'm over halfway done with the overhead, so I have no clue how it could take you 10 years unless you add one instruction every 2 weeks. It looks like psychopathicteen is almost done with the overhead and already has a couple of objects down. The thing that would take me the longest is graphics, and C won't help with that. I just have zero clue how it could possibly take you that long, and frankly, I'm about as interested in programming as I am in creating a video game, so if I don't finish, so be it.
I personally think my point still stands, but I'm not as experienced as you, so maybe I shouldn't talk.
Bregalad wrote:
The extra effort to make a game actually working on the real console is barely worth the additional time, considering 99% of people will play it emulated anyway.
Well, everything I've made so far works just fine in BSNES accuracy. I haven't actually tested it on a real console before though.
Movax12 wrote:
There are a number of early C compilers that targeted 8 bit, but they were for 8080 or Z80. If there was source available for these compilers how hard would it be to adapt them to 6502
An 8080 compiler might be useful for Game Boy, as (if I remember correctly) the Game Boy's CPU is an 8080 superset made by Sharp with some features borrowed from the Z80. But the 8080 family is probably a better match for C than the 6502 is, because the 8080 family has 16-bit register pairs (BC, DE, HL) and a 16-bit SP, each of which is big enough to hold an int or void *. It would also depend on the copyright license attached to the source code, as (for example) cc65's license does not qualify under the Debian Free Software Guidelines (or the nearly identical OSI Open Source Definition).
You are correct indeed; I didn't tell the true story. I have of course not been actively working on it for 10 years; it's just that I started the project 10 years ago (actually it'll be 11 years, since I started in late 2004 if I remember correctly).
Quote:
so I have no clue how it could take you 10 years unless you add one instruction every 2 weeks.
Basically yes, but it doesn't work like that. It's more like: I make a lot of progress during one weekend, then I encounter a problem and decide to postpone it instead of solving it, so I don't touch the project for 5 months. Then I decide to work on it again, but on a different part than the one where I had the problem. And so on and so forth. Now I'm at a point where all paths lead to problems I don't want to solve, so I've exhausted my possibilities for making progress on the game. Sometimes I fix a major problem, but the effort it takes is greater in assembly, when it involves complete rewrites of very important routines.
In addition to that, I dedicate a lot of free time to other romhacking-related projects (the sound restoration of all 3 Final Fantasy Advance games, for instance) and to completing various games. I also dedicate a lot of free time to playing music. We cannot do everything.
Quote:
I'm about as interested in programing as I am in creating a video game, so if I don't finish, so be it.
That is also my problem. I have lots of ideas for "games", but I lack ideas for the actual content within them. The main problem for me is enemy (especially boss) behaviour. I have now mostly decided who the enemies are and have even designed some of their graphics; however, programming their behaviour is really hard, complex and discouraging. I have been stuck on that particular problem since 2006, when I was already "halfway" through the design of the game, so it's not a "new" problem.
tepples wrote:
cc65's license does not qualify under the Debian Free Software Guidelines (or the nearly identical OSI Open Source Definition).
It does now. The entire project is licensed under the zlib licence.
rainwarrior wrote:
(I think most people burn out and give up on projects because it takes a lot of repetitive and tedious work to finish them, a lot more often than they run into difficult technical problems.)
It tends to be the other way around for me =/ (repetitive work means I can just copy, paste and edit, but technical problems tend to be awfully hard things for me to figure out)
Movax12 wrote:
The entire [cc65] project is licensed under the zlib licence.
If so, it must have been within the past three months, as that's when people were discussing removing the last vestiges of John R. Dunning's code. If you can back this up, I'll notify the Ubuntu project.
tepples wrote:
Movax12 wrote:
The entire [cc65] project is licensed under the zlib licence.
If so, it must have been within the past three months, as that's when people were discussing removing the last vestiges of John R. Dunning's code. If you can back this up, I'll notify the Ubuntu project.
https://github.com/cc65/cc65/commit/aeb849257277a6b98542de8579697b81c6dd70e6
Bregalad wrote:
Basically yes but it doesn't work like that. It's more like, I do a lot of progress during one weekend, then I encounter a problem and decide to postpone it instead of solve it, so I don't touch the project for 5 months.
Well, I can relate to that because of vram. It's just such a headache. (I've only waited about 2 months though.)
Bregalad wrote:
The main problem for me is enemy (especially boss) behaviour, I have now decided the most part who the enemies are and even designed some of their graphics, however programming their behaviour is really hard, complex and discouraging.
At least for what I'm doing (a run and gun), enemy behavior isn't very complex. It's not much more than walk, shoot in the direction of the player, walk, shoot in the direction of the player. I'm sure there will be plenty of random number generation. The hardest part for me, which still isn't all that hard, is figuring out something like how to program a missile so that it homes in on the player; practical problems, versus something like making a CPU opponent in a fighting game.
Espozo wrote:
rainwarrior wrote:
How often do you see people starting new projects here? How often do you see people finishing anything? If you can accomplish something with 10 days of work instead of 100, doesn't that make you a lot likelier to finish it?
I like to think of slow development as a type of quality control. If you're really that dedicated, you'll finish it.
Is this based on experience? I think most developers would like to agree with you in a purely idealistic sense, but out of realism will also admit that once a roadblock has been hit, this sort of much longer work can be a project killer.
Espozo wrote:
Also, I'm pretty sure C won't really help me solving problems related to vram, only generating the code for the solution that I though up of, and that isn't incredibly hard for me to do. I mean like C wouldn't have made me think of the idea of 16x16 and 32x32 animation slots; from what I've seen it's still not a miracle worker.
This sort of concept is somewhat language agnostic, much like discussion of many common data structures and algorithms. If you are more comfortable with language X, you may reach a good conclusion faster as you have a stronger feel for what your tools can do for you. It is not the language's job to make you come up with ideas.
Espozo wrote:
rainwarrior wrote:
Why use assembly when you can use a hex editor?
Because I don't exactly want to track down and change a ton of pointers if I increase the size of a table in ram or something? If it weren't for that, I might have.
*woosh*
mikejmoffitt wrote:
*woosh*
What was hard to understand about what I said? Just changing this:
Code:
ObjectTable: .res $1000
to this:
Code:
ObjectTable: .res $1800
would cause me a mountain of trouble for anything after it.
thefox wrote:
https://github.com/cc65/cc65/commit/aeb849257277a6b98542de8579697b81c6dd70e6
Thank you. Update sent to Launchpad.
Espozo wrote:
mikejmoffitt wrote:
*woosh*
What was hard to understand about what I said?
He was implying that you missed rainwarrior's point about why somebody might want to use C over assembly, in the same way somebody might want to use an assembler over editing the machine code manually in a hex editor.
I don't really think going from C to ASM loses anything as crippling as going from ASM to machine code does, because of what I said about manually changing pointers.
I don't have direct experience with being limited to 8-bit operations, but I don't see how C has any advantage over 65C816 assembly in the context of a Super NES game. Maybe this is because most of what I've done so far has been PPU-oriented register-poking rather than full-scale game logic, but despite my decade or so of C++ experience, I find SNES assembly coding simpler, more logical and easier to understand. It allows me to work directly with the hardware, so I can actually understand the logic flow and memory structure, rather than trying to cargo-cult my way through a bunch of abstractions dreamed up by brains that are not mine. (Also, I can count cycles. Try that in C.)
Programming modern computer systems is obviously quite different, but it seems to me that a NES or SNES is both simple enough and specialized enough that the abstractions of a high-level language would just get in the way.
I would tend to agree with Espozo that the major disadvantage of hex coding, other than the lack of readability and the difficulty of memorizing all the opcodes (mnemonics in ASM are called that for a reason), is that you can't really automate anything beyond simply using copy/paste. In assembly, you can use macros and labels and so forth, and you don't lose the direct connection to the hardware because it's fairly obvious what's going on.
93143 wrote:
I don't see how C has any advantage over 65C816 assembly in the context of a Super NES game.
"Can we get this game on Genesis too?"
93143 wrote:
I would tend to agree with Espozo that the major disadvantage of hex coding, other than the lack of readability and the difficulty of memorizing all the opcodes (mnemonics in ASM are called that for a reason), is that you can't really automate anything beyond simply using copy/paste. In assembly, you can use macros and labels and so forth, and you don't lose the direct connection to the hardware because it's fairly obvious what's going on.
Exactly. I honestly don't even really bother with macros, though. I just really don't want to have to worry about moving pointers to addresses and things in RAM if I decide to expand something, and not having to memorize that adding means $XX is a big plus that doesn't affect performance at all.
tepples wrote:
"Can we get this game on Genesis too?"
Sure, if we want:
1. Lower-quality music.
2. Potentially an entire background missing or the color depth of one of them being severely reduced.
3. No special hardware effects, like transparencies or an affine transformed BG.
4. Less colorful sprites and backgrounds, or less variation of objects onscreen to somewhat try to overcome the palette limit.
And a couple of others. You do get a screen 64 pixels wider and more sprite sizes, though, at the cost of fewer sprites, so it just about cancels out. I don't have any intention of porting my game to the Genesis, because I plan on designing the game completely around the SNES's video hardware. Sound can always easily be degraded, just like how it's easier to lower the framerate when porting a 3D game than it is to lower the polycount.
Actually the greater flexibility in sprite sizes is usually enough to counter the lower sprite count (because you can usually get away with fewer sprites to get the same stuff on screen). Also consider that there are enough sprites to cover the whole screen.
The biggest issue is that the code will be way more prone to slowdown on the SNES, not to mention that 68000 compilers usually expect 32-bit ints, while on the 65816 you'll most likely have 16-bit instead. For the record, GCC can be configured to use 16-bit int on the 68000:
https://gcc.gnu.org/onlinedocs/gcc-4.0. ... tions.html
Quote:
-mshort
Consider type int to be 16 bits wide, like short int. Additionally, parameters passed on the stack are also aligned to a 16-bit boundary even on targets whose API mandates promotion to 32-bit.
Espozo wrote:
more sprite sizes though, at the cost of less sprites so it just about cancels out.
Sik wrote:
Actually the larger flexibility on sprite size usually is enough to counter the lower sprite count (because you can usually get away with less sprites to get the same stuff on screen). Also consider there are enough sprites to cover the whole screen.
Of course, I could always be a jackass and have it to where there are more than 80 square sprites, but I could do the same with the Genesis, to where I have 80 24x24 sprites. No joke though, for what I want to do, I'm not entirely sure 80 sprites of any size would always cut it, because of projectiles and effects like smoke and explosions. (Slowdown ahoy!) To be honest though, I haven't seen a whole ton of SNES games go over 80 sprites, and I've only seen about 2 where there are more than 80 individual objects. The DKC games often push over 80 sprites, but each object (bananas obviously not included) is made of 4+ sprites. Neither really goes anywhere over 100.
Sik wrote:
The biggest issue is that the code will be way more prone to slow down on the SNES, not to mention that 68000 compilers usually expect 32-bit ints while on the 65816 you'll most likely have 16-bit instead. For the record, GCC can be configured to use 16-bit int on the 68000:
Yeah, unfortunately. One thing I've never understood is how quick some hardware-uninclined people are to judge the SNES for slowing down, when I can think of a dozen Neo Geo and other arcade games that do too (and no, there's more than just Metal Slug 2, although the slowdown in that game is still laughably bad at times), and those people have no clue that it's the same processor as the Genesis, except clocked faster. Do these people just not know or care about arcade games?
You know, this is kind of related to sprites on the SNES, but I feel there was a much simpler way to go about sprites: get rid of the 8x8 (and you might as well do the same to 64x64) sprite size and have each character bit actually step by 16x16 tiles, so you can use all of VRAM for sprites (I'm assuming the TurboGrafx-16 is like this?). Or, even better, you could keep the option of having 8x8 and 16x16, but have each character bit step through 8x8 tiles instead of 16x16. I'm just curious, but I've heard that sprite overdraw on the SNES is measured in 8x8 sprite tiles, so if it were measured in 16x16 tiles and there were only 16x16 and 32x32 sprite sizes, could you potentially get an overdraw increase somehow?
Yeah, it would probably fetch everything in steps of 16 pixels if it was made that way. (doesn't have to, but it's likely it would have)
For the record, this reminds me that the X68000 can display 128 sprites but they're always 16×16, no exception (it's the only size available), making it even more wasteful than the SNES in that sense. Make what you want out of that.
Espozo wrote:
93143 wrote:
I would tend to agree with Espozo that the major disadvantage of hex coding, other than the lack of readability and the difficulty of memorizing all the opcodes (mnemonics in ASM are called that for a reason), is that you can't really automate anything beyond simply using copy/paste. In assembly, you can use macros and labels and so forth, and you don't lose the direct connection to the hardware because it's fairly obvious what's going on.
Exactly. I honestly don't really even bother with macros though. I just really don't want to have to worry about moving pointers to addresses and things in RAM if I decide to expand on something, and not having to memorize that an add is opcode $XX is a big plus that doesn't affect performance at all.
Okay, can we not waste time talking about writing software in a hex editor, since nobody was forwarding that as something that should be taken seriously?
Espozo wrote:
tepples wrote:
"Can we get this game on Genesis too?"
Sure, if we want:
1. Lower-quality music.
2. Potentially an entire background missing or the color depth of one of them being severely reduced.
3. blah blah etc
Please don't start this sort of thing up...
Espozo wrote:
Sure, if we want:
1. Lower-quality music.
2. Potentially an entire background missing or the color depth of one of them being severely reduced.
3. No special hardware effects, like transparencies or an affine transformed BG.
4. Less colorful sprites and backgrounds, or less variation of objects onscreen to somewhat try to overcome the palette limit.
This was totally uncalled for and has nothing to do with this discussion.
Music is subjective... I for example tend to like the crispy sounding Genesis more than the low quality "head inside a bucket" SNES samples, but both can produce nice music when well utilized.
Also, it's not like a lot of SNES games make use of the special video effects. In fact, many games look silly trying to fit them in when they're completely unnecessary.
Regarding graphics in general, limitations don't necessarily mean bad results. After all, the SNES is more limited than a Playstation 2 but you still think it's a nice platform for making games on, instead of complaining that it doesn't even do 3D.
A lot of SNES games end up suffering from the smooth gradient syndrome, while Genesis games tend to look sharper and have better contrast. This is not an absolute rule though, the point is that a competent artist will make the best out of the platform he's working on, and limitations just mean they have to think differently.
Quote:
I don't have any intention in porting my game to the Genesis because I plan on designing the game completely around the SNES's video hardware.
Then there's really no point in porting it to other contemporary platforms, since they'll obviously not turn out well. The same could be said if the primary target was the Genesis.
Back in the commercial era though, it made sense to stick to the lowest common denominator and design games around the features that both consoles had, and reusing some C code.
Quote:
A lot of SNES games end up suffering from the smooth gradient syndrome, while Genesis games tend to look sharper and have better contrast. This is not an absolute rule though, the point is that a competent artist will make the best out of the platform he's working on, and limitations just mean they have to think differently.
Which is weird because gradients should be a good thing, but they tend to look really ugly. On the flip side, a lot of Genesis games use an ugly "dark yellow" color that always comes off looking dull and unnatural.
The color race didn't help matters. Back then color count mattered so everybody tried to cram in as many colors as possible (VGA had the worst case of this with many graphics simply looking ugly due to misused gradient overdose). These days it'd be a lot better - the Mega Drive's limited palette would force you to use a more limited hue set in order to get more lighting shades, but that's generally considered a good thing since it preserves the concept of using a limited palette to convey emotion.
Honestly in practice nothing forces you to use all colors on the SNES either, if you don't
need to then maybe just don't do it... (the extra palettes are probably more useful for small extra details, for heavy palette cycling or for cases where lots of palette swapping would be useful e.g. 4P games or beat'em-ups)
tokumaru wrote:
Back in the commercial era though, it made sense to stick to the lowest common denominator and design games around the features that both consoles had, and reusing some C code.
Not back then, C was still considered too slow by many (I imagine especially on the 65816, not really that bad on the 68000), and games got rewritten from scratch anyway. Heck, the norm was to outsource ports to other companies I believe.
There are exceptions to the rule as usual but usually it was like that.
mikejmoffitt wrote:
Okay, can we not waste time talking about writing software in a hex editor, since nobody was forwarding that as something that should be taken seriously?
Thank this:
Quote:
Why use assembly when you can use a hex editor?
mikejmoffitt wrote:
Please don't start this sort of thing up...
I just think it's silly to program in C to potentially make a port on the Genesis that probably isn't even going to be made anyway. The video hardware is different enough to where most people, or at least people not wanting to make a profit, won't bother.
tokumaru wrote:
This was totally uncalled for and has nothing to do with this discussion.
The whole topic of the discussion was about using C code, presumably to make programming easier, not for making ports, which started with:
tepples wrote:
"Can we get this game on Genesis too?"
I think what I've said can be agreed on, well, maybe not the first one to some people. I'm personally a bigger fan of the SNES's sound hardware, but I can understand tokumaru when he said:
tokumaru wrote:
Music is subjective... I for example tend to like the crispy sounding Genesis more than the low quality "head inside a bucket" SNES samples, but both can produce nice music when well utilized.
tokumaru wrote:
Also, it's not like a lot of SNES games make use of the special video effects. In fact, may games look silly trying to fit them in when they're completely unnecessary.
Actually, a lot of them do (at least that I've seen), but, like you said, they often end up looking silly, like any non-racing game trying to squeeze in Mode 7. Of course, both can look nice when they're used reasonably, like the fog in DKC.
tokumaru wrote:
A lot of SNES games end up suffering from the smooth gradient syndrome, while Genesis games tend to look sharper and have better contrast. This is not an absolute rule though, the point is that a competent artist will make the best out of the platform he's working on, and limitations just mean they have to think differently.
I also agree; I'd say it depends on the game, and this is actually a problem of the artists and not the hardware itself, but then we get to what you said about the PS2. In my opinion, with colors, you can either make something that looks better, like Irem's artwork, or something much worse, like the "European" artwork that uses excessive shading and completely forgets to include any detail whatsoever.
I guess I did kind of randomly attack the Genesis, and I think we know where that leads to...
Let me try to sum up how I perceive the digression, so that we can get back to topic: If you want to use exclusive hardware features, feel free to write the whole thing in assembly language. If you don't, go ahead and use assembly language in the part of the program that interacts directly with the hardware, but C for the game logic can save your programmers time. And SNES mode 2 and Genesis with VSRAM are comparable.
At this point we know GCC supports at least one architecture with 16-bit int: -m68000 -mshort. What's the big obstacle to doing so with 65816?
tepples wrote:
Let me try to sum up how I perceive the digression, so that we can get back to topic: If you want to use exclusive hardware features, feel free to write the whole thing in assembly language. If you don't, go ahead and use assembly language in the part of the program that interacts directly with the hardware, but C for the game logic can save your programmers time. And SNES mode 2 and Genesis with VSRAM are comparable.
Sounds fair.
I am still kind of curious as to whether using 16x16 sprite tiles would help with overdraw any. I heard that the SNES has a linebuffer for sprites, but I don't have a clue as to how that actually gets filled out. I would make a new topic for it so as not to disturb this one, but it seems like a bit of a waste as it should be able to be answered in one post. Edit: Never mind, I'll just ask.
tepples wrote:
At this point we know GCC supports at least one architecture with 16-bit int: -m68000 -mshort. What's the big obstacle to doing so with 65816?
The size of int is a complete non-issue for a well-written compiler; all GCC has to do is map int to a 16-bit data type instead of a 32-bit data type.
The bank separation inherent to the 65816 is a bigger obstacle. Most likely, GCC is designed on the assumption of a single, flat address space. As far as I know, this assumption holds true on every CPU architecture it supports, or at least GCC thinks it does. Adding support for banking would be a huge undertaking, and it's doubtful any of the GCC developers would see any practical reason to do so. It might be possible to make it work within a single bank, but there's much less use for a C compiler that's limited to a single 64kB chunk of address space.
Data space on 65816 is flat, at least in HiROM. Indexed address modes can cross from one bank to the next. It's just code sections that can't cross bank boundaries. Or are you referring to the concept of a "far pointer", where sizeof(size_t) < sizeof(intptr_t)?
tepples wrote:
Data space on 65816 is flat, at least in HiROM. Indexed address modes can cross from one bank to the next.
Hmm, that makes large data much more feasible. For some reason, I was under the impression that the bank wraparound applied to all addresses.
tepples wrote:
It's just code sections that can't cross bank boundaries.
This sets a hard limit of 64kB of code, since GCC doesn't understand how to jump to a new bank.
tepples wrote:
Or are you referring to the concept of a "far pointer", where sizeof(size_t) < sizeof(intptr_t)?
GCC doesn't do "far" pointers. If you make pointers 24 bits instead of 16, it will apply to all pointers. This tradeoff may be worth it, depending on what kind of program you're making.
Now that you mention it, the concept of "far" pointers is similar enough between 65816 and 8086 that an 8086 compiler could potentially be used as the base for a 65816 compiler.
GDB gained, and then lost, 65816 support some 15+ years ago (it was marked obsolete in version 5.1 released April 2001). I don't know if that means somebody planned to introduce any manner of 65816 support into GCC or binutils, but whatever initiative was there seems to be long gone.
As far as the obstacle of the SNES memory layout (and the 64k bank boundary problem) goes, that could probably be alleviated with a decent linker script, though it'd be a pain in the ass to have to use it for every build (although perhaps ld could just be made to use some sensible default when dealing with the 65816).
Joe wrote:
tepples wrote:
It's just code sections that can't cross bank boundaries.
This sets a hard limit of 64kB of code, since GCC doesn't understand how to jump to a new bank.
It only sets a hard limit on a single linking unit. You could always make several 64k banks separately. You'd just need to write a small amount of assembly code to facilitate jumps across banks.
Espozo wrote:
Actually, a lot of them do (at least that I've seen) but, like you said, they often end up looking silly, like any non racing game trying to squeeze in mode 7.
https://www.youtube.com/watch?v=y51i_v7F5ek
https://youtu.be/0MsHKmCVT6I?t=41
https://www.youtube.com/watch?v=AnP9pLsSMV4
(I don't care what people say about Mohawk and Headphone Jack, at least they were trying :v)
But yeah, I know what you mean, games adding in silly mode 7 effects for no real reason, especially landscape ones in a game that otherwise has entirely 2D-only graphics (at least rotation or scaling would make more sense for those). Jeez.
(EDIT: just realized the double post - whoops)
Quick remark: GCC on the 68000 uses a pseudo-instruction called JBSR. Once it finally knows the target address (which requires linking), it gets replaced with either JSR or BSR as appropriate, depending on how long the jump is. I'd imagine you could do something like that on the 65816 too for calls across banks. Then the only real issue left is that functions can't cross bank boundaries, but I wonder how the heck you'd make a single function that large anyway.
There's also the issue of arranging everything into place but that's the linker's job. Maybe write an intermediate tool that generates a linker file on the fly and feeds that to the linker to avoid most of the issues.
EDIT: also this is probably relevant
https://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_17.html
Sik wrote:
(I don't care what people say about Mohawk and Headphone Jack, at least they were trying :v)
Good lord! That game was made in 1995? It looks like an SNES game from 1992! It reminds me of Sonic the Hedgehog (it's obvious that they "took some inspiration" from it) and Super Mario Galaxy. It looks like it could have actually been pretty fun if put in the hands of a better developer, and I have to admit, the title screen was pretty awesome.
One game, although I still love it, that uses an excessive amount of Mode 7 is R-Type III. I can think of 4 parts in just the first level where it's used, although it is used fairly well, and I will say I was not expecting it at all the first time I played it, since you can't see it coming the way you can in a majority of SNES games where the background goes blank; in R-Type III, you're in space at that part and the stars in the background are sprites. I'll tell you one place I don't like it, and that's at the level 4 boss. If you die there and you don't have the cyclone force, you're screwed.
I just noticed I went off on a tangent on a totally unrelated game.
I vaguely remember hearing about Mohawk & Headphone Jack. I didn't know it was a 2D Mode 7 platformer... I didn't know there was such a thing, but it makes so much sense now and I can't imagine why more developers didn't use Mode 7 that way... Sure, you've only got 256 tiles with no flipping, but that's not all that bad, and they're 8bpp so you don't have colour count issues. The map is huge, so you can zoom out if you want to and still have plenty of headroom for rotation. And with EXTBG you can even do tricks with per-pixel layer priority.
One big disadvantage is that there's no parallax scrolling, but even that could be worked around somewhat if you really wanted to...
93143 wrote:
I didn't know it was a 2D Mode 7 platformer... I didn't know there was such a thing, but it makes so much sense now and I can't imagine why more developers didn't use Mode 7 that way...
Isn't that how this Super Ghouls 'n Ghosts level works? The background looks good enough to me, and the limitations aren't so apparent.
93143 wrote:
no flipping
Wait, really? So the opposite sides of the racetrack in Super Mario Kart are actually drawn flipped? I wonder if, to save memory in a situation like this, you could actually have code that flips the tiles.
Personally, I think the problem with this would be the fact that there is no hardware for scaling or rotating sprites, which I think is a bit silly, especially when launch titles did tricks to try and fake that there were, like in SMW, or how the racers in F-Zero are pre-rendered. (Would it have really been that hard to scale the cars in F-Zero in software? I mean, there are only ever about 3 other racers onscreen at once, and your car doesn't count, as it's always in the same spot.)
Quote:
Isn't that how this Super Ghouls 'n Ghosts level works? The background looks good enough to me, the limitations aren't so apparent.
I didn't have to look any further than the status bar... At least in the R-Type games, it's always like that.
You really need a huge tilemap for Mode 7, so as to get a lot of real estate onscreen at the same time. And the way it's implemented, even a 128x128-tile map takes half of VRAM (well, half of half of VRAM, but it's hard to do much of anything with the other half even if you don't need it for Mode 7 tile data). If they had added flipping, they'd have needed to either cut the tile count down by half for each added flip axis or add a whole other byte to each tilemap entry; the latter would either halve the size of the tilemap or restrict all of VRAM to just Mode 7 and square 2bpp sprites (NES-format graphics are just about the only thing that can fit in unused Mode 7 tile data areas without disturbing the tilemap, and even then you can't use transparent pixels because you can't guarantee a zero without the cooperation of the tilemap data).
Or they could have used 128 kB of VRAM, but... ($$$)
93143 wrote:
Or they could have used 128 kB of VRAM, but... ($$$)
I wouldn't be at all surprised if they originally planned on having 128KB of VRAM. As I've said before, I think I remember that several of the registers that hold information on where BG tiles and tile maps start can actually address 128KB of data, in that there's an additional unneeded bit. If the bit is on, it wraps around back to the beginning of VRAM. I know I've heard people say that the Genesis was originally supposed to have 128KB of VRAM, but does anyone know about the SNES? I've even heard that if you add an extra 64KB of VRAM to the Genesis, it works just fine, so I'm guessing you could do the same to the SNES?
About the Sega Genesis, actually it appears that it was originally designed to have only 64 KB of VRAM, but then later in the development process they considered 128 KB of VRAM (partially because of the Super Nintendo project). So they indeed modified the VDP to support 128 KB of VRAM; it even has extra registers just for that:
http://info.sonicretro.org/SCHG:VDP_Doc ... /Registers
http://gendev.spritesmind.net/forum/vie ... hp?p=18726
But to reply to your question about the SNES, at least the PPU registers do not give any indication about a possible 128 KB VRAM mode.
I believe mode 7 actually has the ability to access beyond 64KB somehow. Not sure on the details.
Quote:
But to reply your question about the SNES, at least the PPU registers do not give any indication about a possible 128 KB VRAM mode.
Actually they do. Tile maps can be located at 64 different addresses in 2kB chunks. Tile patterns can be located at 16 different addresses in 8kB chunks. Sprite patterns can be located at 8 different addresses in 16kB chunks.
psycopathicteen wrote:
Actually they do. Tile maps can be located at 64 different addresses in 2kB chunks. Tile patterns can be located at 16 different addresses in 8kB chunks. Sprite patterns can be located at 8 different addresses in 16kB chunks.
Are you sure?
Just looking at the BG tilemap address:
aaaaaayx = where aaaaaa is the tilemap address << 10 so at max you can locate it at 0xFC00 (64 * 1KB granularity)
BG Chr Address:
bbbbaaaa = where bbbb/aaaa is base address << 12 so again 0xF000 at max (16 * 4KB granularity)
So unless these addresses are "word" based (but it does not look like it), they are definitely defined for a 64 KB space, or am I missing something?
But it's true that VMADDL and VMADDH seem to define word addresses, and so a 128 KB space...
The addresses are word based.
To quote bazz's tutorials: (And, like I said, I've tried it before.)
bazz wrote:
Register $2107
BG1 Tile Map Location (1B/W)
aaaaaass a: Tile map address s: Screen size: 00=32x32 01=64x32 10=32x64 11=64x64
The a bits set the starting tilemap address. This can be set in intervals of $0400 words. So setting the a bits to 1 sets the address to $0400, incrementing that would set it to $0800, etc. If you want to convert the bits into the address itself to see, shift them left by 10. (1 left shift 10 = $0400) Since there is only 64K of VRAM, the Most Significant Bit (bit 7) must be ZERO. $2108-$210A are the same exact thing as $2107, only for BG2-BG4, respectively.
bazz wrote:
Register $210B
BG1 & BG2 Character location (1b/W)
aaaabbbb a: Base address for BG2.
b: Base address for BG1.
Register $210C
BG3 & BG4 Character Location (1b/W)
aaaabbbb a: Base address for BG4.
b: Base address for BG3.
The starting address here can be set in intervals of $1000 words. So 0 would be $0000, 1 would be $1000, etc. Because of the limited size of VRAM, the MSB must be 0 (you can’t go over address $8000). You can convert the bits to the address value by shifting them left by 12.
It's a bit unfortunate (Get it?) that they didn't just increase the granularity of where everything can start, but I don't usually have too much of a problem with this admittedly.
psycopathicteen wrote:
The addresses are word based.
Ok so it makes sense, my documentation wasn't clear enough about it.
It definitely means they had 128 KB of VRAM in mind as well, then!
Quote:
It's a bit unfortunate (Get it?) that they didn't just increase the granularity of where everything can start, but I don't usually have too much of a problem with this admittedly.
Yeah, the granularity becomes limited when it comes to the character base address of a BG plane, as you end up with only 8 possible positions in VRAM.
Now, the thing is, could you add an extra 64KB and have it successfully work? I think one thing Nintendo should have done is have it kind of like the N64, where there is an "expansion pak" that adds 64KB more of VRAM.
Yes. That's how saving is done anyway. Heck, a board that maps the program ROM as a HiROM at $C00000-$FFFFFF could even have several megabytes of work RAM and map it at $400000-$7DFFFF.
But that's only applicable to work/save RAM, not VRAM.
Any idea what the top two bits in SETINI were originally for? The EXTBG bit at least does something moderately useful as matters stand (though what the "external LSI" it was supposed to enable data from was isn't clear), but the "External Synchronization" bit appears to be purely an artifact. It's supposed to be "Used for super-imposing images, etc.", but...
93143 wrote:
Any idea what the top two bits in SETINI were originally for? The EXTBG bit at least does something moderately useful as matters stand (though what the "external LSI" it was supposed to enable data from was isn't clear)
Probably the same sort of "external LSI" that was supposed to feed 4-bit background pixels into the original NES PPU. In fact, early on when the Super NES was supposed to be back-compatible, it might have originally been an NES PPU in output mode, the same way the newer RGB NES mods work.
Quote:
, but the "External Synchronization" bit appears to be purely an artifact. It's supposed to be "Used for super-imposing images, etc.", but...
Appears intended for either a hypothetical Super Famicom Titler or the Nintendo Super System's OSD.
Or just to allow the PPU to be used with other PPUs in non-SNES systems (e.g. arcades).
Espozo wrote:
It's a bit unfortunate (Get it?) that they didn't just increase the granularity of where everything can start, but I don't usually have too much of a problem with this admittedly.
Eh, not really. Does that granularity match the maximum size of the table? Because that's probably a more likely reason (it makes it much easier to form an address by just grouping bits together).
Sik wrote:
Does that granularity match the maximum size of the table?
No. Well, tilemaps are indeed 2 kB, but BG layer data sections are 16, 32, or 64 kB depending on bit depth. OBJ data tables are 8 kB, and while the offset of the second table from the first table can be specified in 8 kB steps, the position of the first table itself is limited to 16 kB steps even though you have three bits for the address.
That granularity is only for positioning the nametables though, if I'm understanding correctly.
Nintendo's terminology is pretty funky. Let's see if I can't summarize clearly:
Tile map for a particular BG layer
size: 2 kB (x1, x2, or x4 depending on x and y extension bits)
address bits: 6
address precision: 2 kB
address range: 126 kB
Tile data for a particular BG layer
size: 1024 tiles (16, 32, or 64 kB depending on bit depth)
address bits: 4
address precision: 8 kB
address range: 120 kB
Tile data for sprites, first table
size: 8 kB
address bits: 3
address precision: 16 kB
address range: 112 kB
Tile data for sprites, second table
size: 8 kB
address bits: 2
address precision: 8 kB (offset from end of first table)
address range: 24 kB (plus 112 kB for first table)
Mode 7 interleaved tilemap and tiledata
size: 32 kB
address bits: 0
address precision: N/A
address range: 0 kB
93143 wrote:
a 2D Mode 7 platformer... I didn't know there was such a thing
Chalk up another one - it seems Septentrion did it too.
The 65816 should be good at C equations like this:
x = a + b + c & 3 | y
Just as long as you're not using 32-bit integers and multiplication.
I think the bank system is also an issue. I don't know if the 65816 can directly access its full 24-bit address space.
The 65816 supports "far" (24-bit absolute) addresses in LDA al and LDA al,X, and 24-bit pointers in LDA [d] and LDA [d],Y modes. It just can't fit a far pointer into a single register, a drawback it shares with 8086 real mode. But it's still possible to pass a far pointer in B:Y* in the same way that an 8086 program might pass a pointer in DS:SI.
* So long as it isn't running on a broken CPU with a misbehaving PLB instruction.
thanks tepples