Efficiency of development process using C versus 6502

by GradualGames on 2015-11-18 (#159345)

So, I'm still pretty new to writing C for the NES. I keep feeling like using it is actually going to wind up taking more of my time than if I just continue to build well-honed idioms in 6502. I also find myself somewhat irritated with all the "noise" it generates. There's so much stuff going on that I'm not sure I really want to be bothered to fully understand it. However, with regard to building full games in pure 6502, I really feel like I'm getting halfway decent at it at this point. I'm curious, though, whether some of the C adherents here can vouch for it resulting in greater productivity while developing a game. I know I saw Shiru's video where he prototypes an object in Jim Power in C and then re-codes it in assembly. But...I have a feeling that adopting that process will, for me personally, feel taxing. I may prefer to just continue to build games in 6502. Kinda hard to decide. It's kinda neat learning how one CAN write code in C on the NES, and I'm enjoying that, but...I'm not really sure if I'm gonna take the plunge and use it full tilt on my next project.

To put it in other terms, I suppose I feel like the power of abstraction that higher level languages offer would only really be useful if it was actually possible to forget about the hardware, with something really powerful like a modern PC or phone. On something as constrained as the NES, I'm just feeling like it's gonna get in the way. I mean....even with ca65 macros I got into hot water pretty quickly on my current project (code size grew much faster with them---thinking of specific use cases which were probably just bad design). I see headaches in my future if I use C. Haha.

*edit* One thing I thought of recently is, not too long ago I read about tokumaru's approach to enemy state machines essentially as continuations, obviating the need for annoying lists of state addresses and enums. It's something I'm looking forward to trying. I know one technically can fake this even in C, but...what is the point? It just seems like there are really good, really organized ways of coding in 6502 that can be just as fast (once you have enough practice) as developing in C.
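One way to approximate that idea in plain C is to store a function pointer in each enemy and call it once per frame, so the enemy itself remembers which routine runs next and no table of state addresses or enum is needed. The sketch below is only an illustration: `Enemy`, `enemy_walk`, `enemy_wait` and the field names are made up for the example, and unlike a true continuation each step restarts at the top of its function rather than resuming mid-routine.

```c
#include <assert.h>

struct Enemy;
typedef void (*EnemyStep)(struct Enemy *e);

/* Hypothetical enemy record: `resume` is the code to run on the next update. */
struct Enemy {
    EnemyStep resume;
    unsigned char x;
    unsigned char timer;
};

static void enemy_walk(struct Enemy *e);
static void enemy_wait(struct Enemy *e);

/* Walk one pixel per frame; on reaching x == 3, hand over to enemy_wait. */
static void enemy_walk(struct Enemy *e) {
    ++e->x;
    if (e->x == 3) {
        e->timer = 2;
        e->resume = enemy_wait;
    }
}

/* Stand still until the timer runs out, then hand back to enemy_walk. */
static void enemy_wait(struct Enemy *e) {
    --e->timer;
    if (e->timer == 0) {
        e->resume = enemy_walk;
    }
}

void enemy_init(struct Enemy *e) {
    e->resume = enemy_walk;
    e->x = 0;
    e->timer = 0;
}

/* Called once per frame: continue wherever this enemy left off. */
void enemy_update(struct Enemy *e) {
    e->resume(e);
}

/* Usage: three walking frames, two waiting frames, then walking again. */
void enemy_example(void) {
    struct Enemy e;
    unsigned char frame;
    enemy_init(&e);
    for (frame = 0; frame < 3; ++frame) enemy_update(&e);
    assert(e.x == 3);
    for (frame = 0; frame < 2; ++frame) enemy_update(&e);
    assert(e.x == 3);
    enemy_update(&e);
    assert(e.x == 4);
}
```

Whether an indirect call like this is cheap enough under cc65 is a separate question that would need measuring.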

*edit* Yet another thing I thought about was structures of arrays versus arrays of structures. C will always default to the latter (unless cc65 has a structures of arrays optimization I'm unaware of)? So if I code enemies in C first and then re-code them in assembly, I'm seeing two different entity update architectures being in place---one just for prototyping in C, and then one for the "real" 6502 implementation. This all just seems to be overkill.

Note: the speed I'm referring to here is *development* speed. Obviously C is always going to be slower when actually executing.

Thoughts?

Re: Efficiency of development process using C versus 6502
by dougeff on 2015-11-18 (#159351)

Why not take a hybrid approach...

Code 90% of the game in C, and any time you question how the C will compile (too many bytes, or too slowly) write that in ASM. You can call an ASM function from the C code.

(Of course I'm a newb at cc65 too).

Re: Efficiency of development process using C versus 6502
by tokumaru on 2015-11-18 (#159352)

I personally haven't coded in C for the NES yet (and I don't really plan to), but from what I've read about it in here, it seems like you constantly have to monitor what your C code is compiling to in order to make sure it's not turning into a complete mess, which apparently happens quite often.

I think that having to check the assembly all the time kinda defeats the purpose, and I'd rather write good assembly from the get go, than stay stuck going back and forth until the C code becomes something acceptable. I imagine that the more you code in C, the more intuitive it gets, so you don't have to monitor the output as much, but I still don't know if it's worth the trouble.

I honestly don't think that coding in assembly is a chore, I actually like it a lot. I like the power of assembly, and being able to control every little detail of my programs. There are also all the little shortcuts you can take and tricks you can perform only with the freedom of assembly, and to me that's a big part of the fun.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-18 (#159354)

tokumaru wrote:

I think I'm leaning in this direction, having tried C out for a few weeks. I mean it is kinda neat that it's possible. I like high level languages most of the time, but on the NES, it's turning out to be a bit cumbersome. I really think you need power to match abstraction. I am curious about Shiru's thoughts on this though, since he apparently uses C to prototype things quickly and then re-codes in assembly. For me, I've occasionally written down some pseudo code in comments...which is almost the same process, except you only have to get it working correctly once, in one language, rather than two.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2015-11-18 (#159355)

tokumaru wrote:

...but from what I've read about it in here, it seems like you constantly have to monitor what your C code is compiling to in order to make sure it's not turning into a complete mess, which apparently happens quite often.

It will happen very often when you're starting out, for sure. Once you get used to what works well, it becomes less of a problem.

Nobody ever comes to the board to say "hey I wrote some C code and it's working like I expected it to", so if you're going by the stuff that comes up here, you're getting a bit of a bias toward its problems.

tokumaru wrote:

I think that having to check the assembly all the time kinda defeats the purpose...

You don't need to check the assembly all the time. You need to check performance frequently. That's as easy as leaving a $2001 write at the end of your code to visually show you the frame's timing. Not really onerous at all, just leave that on while you're working, and run your code often.
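A minimal sketch of such a timing marker in C, under some assumptions: $2001 is the PPU mask register, whose bit 0 selects grayscale, so setting that bit once the game logic is done makes the rest of the frame render in gray and the boundary shows how far down the frame the code ran. The helper names are hypothetical, and the register is passed in as a pointer only so the same code can be exercised on a host machine; on the NES it would be `(volatile unsigned char *)0x2001`.

```c
#include <assert.h>

/* PPU mask ($2001) bits used here. */
#define MASK_GRAYSCALE 0x01
#define MASK_SHOW_BG   0x08
#define MASK_SHOW_SPR  0x10

/* Hypothetical helper: call after the frame's logic has finished. */
void timing_mark(volatile unsigned char *reg, unsigned char normal_mask) {
    *reg = (unsigned char)(normal_mask | MASK_GRAYSCALE);
}

/* Hypothetical helper: restore normal rendering, e.g. during vblank. */
void timing_clear(volatile unsigned char *reg, unsigned char normal_mask) {
    *reg = (unsigned char)(normal_mask & ~MASK_GRAYSCALE);
}

/* Usage with a stand-in variable instead of the real register. */
void timing_example(void) {
    volatile unsigned char fake_reg = 0;
    unsigned char mask = MASK_SHOW_BG | MASK_SHOW_SPR;
    timing_mark(&fake_reg, mask);
    assert(fake_reg == 0x19);
    timing_clear(&fake_reg, mask);
    assert(fake_reg == 0x18);
}
```

Which bit to toggle is a choice made for this sketch; the color emphasis bits would serve the same purpose.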

You only have to look at the assembly if you want to gain a deeper knowledge of what the compiler is doing.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2015-11-18 (#159356)

GradualGames wrote:

Yet another thing I thought about was structures of arrays versus arrays of structures. C will always default to the latter (unless cc65 has a structures of arrays optimization I'm unaware of)?

Structure of arrays in C looks exactly like it sounds:

Code:

// array of structures

struct Box {
    char a;
    char b;
    char c;
};

struct Box boxes[10];

boxes[5].c = 3;

// structure of arrays

struct Box {
    char a[10];
    char b[10];
    char c[10];
};

struct Box boxes;

boxes.c[5] = 3;

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-18 (#159357)

What if you want to deal with 16 or 24 bit values? I might have an array of lo bytes and an array of hi bytes in my 6502 code, what I was wondering is if C can translate that into just an access of an array of ints. The point I'm trying to make is even in C code, you're still gonna have to have a lot of knowledge of the hardware and wrestle with it constantly. I'm not sure I'm convinced it can really help on such a constrained system. Probably in the end it boils down to how one enjoys working. To me, it's always a bit tough to get code to work correctly, I'd rather do that once...so...pseudocode for me. Haha.

Re: Efficiency of development process using C versus 6502
by tokumaru on 2015-11-18 (#159358)

Prototyping is good, and I'll certainly do it if it's something that can be tested in isolation. Coding two versions of the same program in parallel, however, is a bit too extreme in my opinion, and I'll probably never do that.

Generally, I try to write very well structured assembly code, with lots of comments, and I also try to do things in a consistent way every time. Another thing I do is separate the logic from the hardware interactions as much as possible. As long as I do those things, my assembly code is just as easy for me to follow as any other program I've written using high-level languages, so using C wouldn't make my programs any easier to comprehend.

Admittedly, engineering things in assembly might be a little harder, and prototyping helps you avoid wasting time engineering solutions around the wrong ideas. This is something I unfortunately have wasted some time on, since I don't have a simple, easy way to prototype more complicated ideas.

Re: Efficiency of development process using C versus 6502
by tokumaru on 2015-11-18 (#159359)

rainwarrior wrote:

Nobody ever comes to the board to say "hey I wrote some C code and it's working like I expected it to"

Good point. I am picturing topics like that now though! :lol:

Re: Efficiency of development process using C versus 6502
by dougeff on 2015-11-18 (#159361)

Quote:

You only have to look at the assembly if you want to gain a deeper knowledge of what the compiler is doing.

I actually liked the way the C compiler does a few things. For one, it seems to keep Y at zero, so any time it needs a zero it can TYA (or STY). Another was this way of incrementing a 16-bit number... (I'm going from memory here).

Code:

    ldx highbyte
    lda lowbyte
    clc
    adc #1
    bcc +
    inx
+   sta lowbyte
    stx highbyte

This is about as efficiently as you can do 16 bit math.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2015-11-18 (#159362)

GradualGames wrote:

What if you want to deal with 16 or 24 bit values? ... I might have an array of lo bytes and an array of hi bytes in my 6502 code, what I was wondering is if C can translate that into just an access of an array of ints.

I've never seen a C implementation with a 24 bit type. As for arrays of 16 bit values, yes you're probably out of luck. I will remind you, though, that striped organization of arrays is merely an optimization for efficiency, and you already gave up that efficiency to be using C in the first place. The extra cycles to put a *2 on the address aren't quite as bad as with larger structures.

If you mean you already have striped arrays in your efficient assembly part of the program, and you just want convenient access to them from the C part, perhaps just write some macros to make the fetching convenient?

Code:

#define GET16(a,b,index) ((((unsigned int)(a)[index])<<8)|(b)[index])
short int x = GET16(hi,lo,25);

I think cc65 is smart enough not to turn <<8 into 8 shifts, so it shouldn't do too badly with this. This method could also work with 24 bit striped arrays too (i.e. temporarily unpack them to a long int to work with them in C, then repack them when you're done).
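The unpack and repack step for 24 bit striped arrays could look roughly like the sketch below. The array names, `COUNT`, `get24` and `put24` are all hypothetical, and nothing here has been checked against cc65's code generation; it only shows the idea of widening to a `long` for the arithmetic and splitting back into bytes afterwards.

```c
#include <assert.h>

#define COUNT 4  /* hypothetical number of entries */

/* Striped 24-bit values: one array per byte, as the assembly side might keep them. */
unsigned char val_lo[COUNT];
unsigned char val_mid[COUNT];
unsigned char val_hi[COUNT];

/* Unpack entry i into the low 24 bits of an unsigned long. */
unsigned long get24(unsigned char i) {
    return ((unsigned long)val_hi[i] << 16) |
           ((unsigned long)val_mid[i] << 8) |
           (unsigned long)val_lo[i];
}

/* Repack the low 24 bits of v into entry i; higher bits are discarded. */
void put24(unsigned char i, unsigned long v) {
    val_lo[i]  = (unsigned char)(v & 0xFF);
    val_mid[i] = (unsigned char)((v >> 8) & 0xFF);
    val_hi[i]  = (unsigned char)((v >> 16) & 0xFF);
}

/* Usage: adding 1 to $0102FF carries from the low byte into the middle byte. */
void striped_example(void) {
    put24(2, 0x0102FFUL);
    put24(2, get24(2) + 1);
    assert(get24(2) == 0x010300UL);
    assert(val_lo[2] == 0x00);
    assert(val_mid[2] == 0x03);
    assert(val_hi[2] == 0x01);
}
```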

If you mean you want to define the array as regular C code:

Code:

short int array[10];

I don't think the C compiler would be allowed to "optimize" this as two striped arrays, even if it wanted to. I think it might be against the C specification (not entirely sure) for it to reorganize data that way.
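For what it's worth, the C standard describes an array as a contiguously allocated set of objects, so the two bytes of each element have to be adjacent wherever the program can observe the layout. A small host-side check of that property (the function names are made up for the example):

```c
#include <assert.h>
#include <stddef.h>

short int array[10];

/* Distance in bytes between element i and element i + 1. */
size_t element_stride(unsigned char i) {
    return (size_t)((const char *)&array[i + 1] - (const char *)&array[i]);
}

/* Each element starts exactly sizeof(short int) bytes after the previous one. */
void layout_example(void) {
    assert(element_stride(0) == sizeof(short int));
    assert(element_stride(8) == sizeof(short int));
    assert(sizeof(array) == 10 * sizeof(short int));
}
```

A compiler could in principle store the data differently if no conforming program could tell, but nothing suggests cc65 attempts that.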

GradualGames wrote:

I'm not sure I'm convinced it can really help on such a constrained system.

I've written a bit of stuff in C on the NES, and I thought it was very much worthwhile. It made developing and iterating on the code significantly faster. There's no question in my mind about this.

It depends on what your goals are. I'm not using it for my current project, but that's largely because it's already being developed primarily in C++ / Win32. I don't really have to iterate on the NES assembly code much; it tends to port very easily once I have the details worked out in the C++ version. If the project was a little smaller in scope and didn't have the Win32 version, I'd probably be using C on the NES.

Re: Efficiency of development process using C versus 6502
by lidnariq on 2015-11-18 (#159363)

rainwarrior wrote:

I've never seen a C implementation with a 24 bit type.

Unimportant tangent: I've seen this show up in several 8-bit microprocessor C compilers (e.g. HI-TECH PICC, where it is a short long) and at least one 8051 C compiler.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-18 (#159364)

Another thing which gives me pause about C coding is the issue of far calls. I've implemented a macro and routine which helps me turn any jsr into a trampoline (and yes, I do preserve processor status for return values from far calls). As my game code began to sprawl across multiple 16kb roms this became incredibly useful. Note I did not use this setup for high performance cases; in those cases I hard-code the trampoline. I suppose I'm concerned that if I take the plunge into C and make something as large as my current project, I'm going to have a lot of really inefficient far-calling going on (probably much worse than this macro and routine I wrote for 6502 far calls). The sprawl I am guessing will be dramatically worse than plain 6502 code.
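For readers unfamiliar with the pattern, the logic of a far-call trampoline can be sketched in C as below. This is not the macro described above and not something cc65 provides: `set_bank`, `current_bank` and `far_call` are stand-ins invented for the example, the bank switch is simulated with a variable so the code runs on a host machine, and on real hardware the trampoline itself would have to live in a fixed bank with `set_bank` writing a mapper-specific register.

```c
#include <assert.h>

/* Simulated mapper state; real code would write a mapper register here. */
static unsigned char current_bank = 0;

static void set_bank(unsigned char bank) {
    current_bank = bank;
}

typedef unsigned char (*FarFunc)(unsigned char arg);

/* Trampoline: remember the caller's bank, switch, call, switch back. */
unsigned char far_call(unsigned char bank, FarFunc fn, unsigned char arg) {
    unsigned char saved = current_bank;
    unsigned char result;
    set_bank(bank);
    result = fn(arg);
    set_bank(saved);
    return result;
}

/* Pretend this routine lives in bank 3. */
static unsigned char bank3_double(unsigned char v) {
    assert(current_bank == 3);
    return (unsigned char)(v * 2);
}

/* Usage: call into bank 3 from bank 1 and end up back in bank 1. */
void far_call_example(void) {
    set_bank(1);
    assert(far_call(3, bank3_double, 21) == 42);
    assert(current_bank == 1);
}
```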

I'm also not exactly sure how I'd structure a game program. I like that in 6502 I can simply jump to another location intended for initializing a new game state, without modifying the stack. I'm sure I could just split it up into functions and use a switch case or what not...I dunno. I'm probably just wrestling with an overall resistance to change. Hey at least I'm trying it out. Haha
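The "functions and a switch case" structure might look something like this sketch. The state names, `enter_state`, `run_frame` and the two input flags are hypothetical. Because `run_frame` returns to the main loop every frame, changing state never leaves anything behind on the stack, which is roughly what the direct jump achieves in assembly.

```c
#include <assert.h>

enum GameState { STATE_TITLE, STATE_PLAY, STATE_GAME_OVER };

static enum GameState state = STATE_TITLE;
static unsigned char lives;

/* Run a state's one-time initialization, then make it current. */
void enter_state(enum GameState next) {
    switch (next) {
    case STATE_PLAY:
        lives = 3;
        break;
    case STATE_TITLE:
    case STATE_GAME_OVER:
        break;
    }
    state = next;
}

/* Called once per frame from the main loop. */
void run_frame(unsigned char start_pressed, unsigned char player_hit) {
    switch (state) {
    case STATE_TITLE:
        if (start_pressed) enter_state(STATE_PLAY);
        break;
    case STATE_PLAY:
        if (player_hit && --lives == 0) enter_state(STATE_GAME_OVER);
        break;
    case STATE_GAME_OVER:
        if (start_pressed) enter_state(STATE_TITLE);
        break;
    }
}

/* Usage: start a game, lose three lives, return to the title screen. */
void state_example(void) {
    enter_state(STATE_TITLE);
    run_frame(1, 0);
    assert(state == STATE_PLAY && lives == 3);
    run_frame(0, 1);
    run_frame(0, 1);
    assert(state == STATE_PLAY && lives == 1);
    run_frame(0, 1);
    assert(state == STATE_GAME_OVER);
    run_frame(1, 0);
    assert(state == STATE_TITLE);
}
```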

*edit* I thought of what you're suggesting Rainwarrior, write some utility macros to pull the bytes out of my arrays to rearrange them into 16 bit values etc. But it just feels hacky, in my mind.

I use 24 bit values all the time. 16 bit world coordinates and 8 bit sub-pixel precision. It feels really natural to work with. How would one work with these in C? With longs?

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2015-11-18 (#159368)

GradualGames wrote:

Another thing which gives me pause about C coding is the issue of far calls.

CC65 has no far call capability, and that's more or less something that would have to be built into the language. You can write assembly trampolines, and call them from C if you like, but there's just no language support for a C to C far call.

There are C compilers that have their own extensions for far calls (e.g. x86 compilers before the 386), but CC65 has no such extension. Far calls are not part of the ANSI C standard, so this is always a compiler-specific thing. In this case you'd really need not just a NES-aware C compiler (which CC65 is not), but an NES C compiler with a specific mapper in mind.

There's a few of us here who have taken a liking to BNROM's 32k banking, which would give you a lot more space to fit all your C code into a single bank (though confining the C code to two 16k banks, one fixed, one bankable, would be just as good, I suppose). Putting C code in more than one bank is somewhat possible, but it wouldn't be easy. Much easier to keep all your C in one bank, and any "far" calls would go to pure assembly.

GradualGames wrote:

But it just feels hacky, in my mind.

I'm usually more concerned with what code does, and how easy it is to write or change, than how it "feels". Nobody playing your game will be able to feel your code.

GradualGames wrote:

I use 24 bit values all the time. 16 bit world coordinates and 8 bit sub-pixel precision. It feels really natural to work with. How would one work with these in C? With longs?

Yes, I would probably temporarily convert them to 32 bit integers to work with them in C, or 16 bit integers if the particular routine was not interested in sub-pixel precision.
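As a rough sketch of that approach, a position with a 16 bit pixel part and an 8 bit sub-pixel part can be held in the low 24 bits of an `unsigned long`, with velocities expressed in 1/256ths of a pixel. The type and function names are hypothetical.

```c
#include <assert.h>

/* 16.8 fixed point in the low 24 bits: bits 8-23 pixel, bits 0-7 sub-pixel. */
typedef unsigned long fx24;

/* Build a position from a whole pixel coordinate. */
fx24 fx_from_pixels(unsigned int px) {
    return (fx24)px << 8;
}

/* Add a signed velocity (1/256 pixel units), wrapping at 24 bits. */
fx24 fx_add(fx24 pos, int vel) {
    return (pos + (unsigned long)(long)vel) & 0xFFFFFFUL;
}

/* Whole-pixel part, for routines that ignore sub-pixel precision. */
unsigned int fx_pixel(fx24 pos) {
    return (unsigned int)(pos >> 8);
}

/* Sub-pixel part. */
unsigned char fx_sub(fx24 pos) {
    return (unsigned char)(pos & 0xFF);
}

/* Usage: move 1.5 pixels per frame twice, then half a pixel back. */
void fx_example(void) {
    fx24 pos = fx_from_pixels(100);
    pos = fx_add(pos, 0x0180);
    assert(fx_pixel(pos) == 101 && fx_sub(pos) == 0x80);
    pos = fx_add(pos, 0x0180);
    assert(fx_pixel(pos) == 103 && fx_sub(pos) == 0x00);
    pos = fx_add(pos, -0x80);
    assert(fx_pixel(pos) == 102 && fx_sub(pos) == 0x80);
}
```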

Re: Efficiency of development process using C versus 6502
by Bregalad on 2015-11-19 (#159381)

Honestly, at this point I tend to believe the best option is to prototype the full game in any high level language (not necessarily) on a less limited platform, and port it to assembly and NES hardware as a second step.

I also think other high level languages might be more suited to the 6502 than ANSI C (I could be wrong).

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-19 (#159385)

Given my experience on my current project which is wrapping up---I'd be wary of trying the prototype approach. The reason is that---even though I'm well aware of the constraints I'm working with---I am still confronted with situations where I have to make a trade-off to get what I want. My concern is I could put all this effort into a prototype *thinking* it'll work great on the NES, but have to work through all such things anyway, with an end-result much different from what I prototyped. Using C is still the most appealing option (if I end up wanting to adopt a prototype-first development process..), I think, because at least I'll see my experiments running on the real system, without any illusions that it *should* work (which could happen if I prototyped on a PC, for example).

Re: Efficiency of development process using C versus 6502
by tomaitheous on 2015-11-20 (#159436)

dougeff wrote:

Quote:

You only have to look at the assembly if you want to gain a deeper knowledge of what the compiler is doing.

Code:

    ldx highbyte
    lda lowbyte
    clc
    adc #1
    bcc +
    inx
+   sta lowbyte
    stx highbyte

This is about as efficiently as you can do 16 bit math.

In C or in general? Loading and stowing the highbyte every time is a waste of cycles. Plus, you could have used the same approach but with inc lowbyte and branch if the result isn't zero, without loading anything. How that efficiency manifests itself elsewhere in related code, is another matter. The C compiler in all likelihood isn't going to benefit from preventing previous Acc and X data from being destroyed, but it might save a couple of cycles and definitely some bytes.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2015-11-20 (#159437)

dougeff's example is apocryphal anyway (from memory? O_o). Here's what CC65 actually does with a 16 bit increment:

Code:

// C code
static short int g;
++g;

; assembled into
   inc     _g
   bne     L0030
   inc     _g+1
L0030:

It's exactly what you should expect.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-20 (#159442)

I've been so tunnel-vision on shipping games that I somehow missed this much improved way to do a 16 bit increment. Thanks Rainwarrior.

Re: Efficiency of development process using C versus 6502
by dougeff on 2015-11-20 (#159443)

I'm looking at the test code now...

Code:

int Source; //ie 16bit
...
   Source++;

compiled into this...

Code:

   lda     _Source
   ldx     _Source+1
   clc
   adc     #$01
   bcc     L0020
   inx
L0020:   sta     _Source
   stx     _Source+1

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-20 (#159444)

That's interesting---I wonder if you are using a different version from Rainwarrior, an older one perhaps?

Re: Efficiency of development process using C versus 6502
by dougeff on 2015-11-20 (#159445)

Although, interestingly...

changing

Code:

Source++;

to...

Code:

++Source;

produced this...

Code:

   inc     _Source
   bne     L0003
   inc     _Source+1
L0003:   

So, yeah, I'm dumb, because I wasn't aware it would compile differently.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2015-11-20 (#159450)

I'd be interested to learn why the compiler would change the asm output for preincrement versus postincrement. Interestingly I do recall that, at least in the past, using preincrement is recommended (as an optimization in loops) for many compilers. I don't know why, though. The asm code has the same meaning either way, just...one of them is faster. Is it just convention that compilers interpret preincrement as a hint from the programmer to generate slightly faster code...?

Re: Efficiency of development process using C versus 6502
by DRW on 2015-11-20 (#159452)

GradualGames wrote:

I'd be interested to learn why the compiler would change the asm output for preincrement versus postincrement.

It probably has to do with the fact that both things mean something different in C:

Code:

int a = 5;
int b = ++a;

Line 2:
Increment a.
Return a.
Assign a to b.
(a equals 6 now.
b equals 6 now.)

Code:

int a = 5;
int b = a++;

Line 2:
Memorize a.
Increment a.
Return the memorized value.
Assign the memorized value to b.
(a equals 6 now.
b equals 5 now.)

And I assume the CC65 compiler simply isn't advanced enough to recognize that, in dougeff's example, you don't actually use the return value of the post increment, and therefore to optimize the statement Source++; into a preincrement ++Source;.

Re: Efficiency of development process using C versus 6502
by lidnariq on 2015-11-20 (#159453)

~~If I had to guess, I'd guess that "x++" is syntactic-sugar-ed to x=x+1, while "++x" is handled separately.~~

No, it's far weirder. ++x is syntactic-sugar-ed to x=x+1, generated via g_addeqstatic, where the value 1 is special-cased for 1- and 2- byte types. x++, on the other hand, appears to be generated via g_inc, which doesn't special-case 1.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2015-11-20 (#159456)

dougeff wrote:

Code:

Source++;

Ah, didn't think to try that way. I use pre-increment by force of habit. I basically never want a post-increment (it feels like a form of obfuscation to delay the effect, would rather just make it explicit).

Apparently CC65 doesn't bother to try and detect and optimize an empty post-increment. Maybe just another reason to stick with pre-increments.

Re: Efficiency of development process using C versus 6502
by DRW on 2015-11-21 (#159479)

rainwarrior wrote:

I basically never want a post-increment

I like to use it in parts of arrays that I cannot loop.

One hypothetical example:

Code:

Sprites[i++] = Y;
Sprites[i++] = Tile;
Sprites[i++] = Attributes;
Sprites[i++] = X;

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-02 (#161744)

So, I've continued to experiment with C, just to see how I like it. Today, I wrote a really dumb metasprite drawing routine. No clipping, just blast metasprite entries to sprite ram. No other intelligence like external tracking of position with sprite ram is used; it's basically just a test:

Code:

void sprite_draw_metasprite(int x, int y, unsigned char chr_handle, const unsigned char *metasprite) {
    int i;
    int sprite_offset = 0;
    const metasprite_entry *metasprite_entries = (const metasprite_entry*) (metasprite + 1);
    for(i = 0; i < metasprite[0]; i++) {
        sprite *current_sprite = &sprite_ram[sprite_offset];
        const metasprite_entry *current_entry = &metasprite_entries[i];
        current_sprite->x = x + current_entry->x;
        current_sprite->y = y + current_entry->y;
        current_sprite->tile = chr_handle + current_entry->tile;
        current_sprite->attribute = current_entry->attribute;
        sprite_offset++;
    }
}

I threw a metasprite at it with 18 entries. A rather large sprite. Still, this can be drawn in much less than a frame with a pure asm routine. Yet---with this C version, it takes more than a frame to draw just one metasprite!!! I didn't expect C code to be *this* bad. Obviously I would not use the above routine in an actual game. But, even if I was using C for entity logic, it seems it would add up really quickly, both in code bloat and bad performance.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-02 (#161745)

The first thing that I see: Don't use a pointer to set it to your data:
sprite *current_sprite = &sprite_ram[sprite_offset];
I'm pretty sure this makes the code slow because every time you access the pointer, it has to dereference the address and use an indirect access. Use the sprite_ram variable itself in every instance. (Same with the other pointer.)

Also: What is metasprite_entry? What type is it? I'm a bit confused since the variable metasprite is of type unsigned char *.

Then use ++i instead of i++.

That's my first advice. Let's see if we can find more.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-02 (#161746)

metasprite_entry is defined thus:

Code:

typedef struct {
    unsigned char y;
    unsigned char tile;
    unsigned char attribute;
    unsigned char x;
    unsigned char flipped_x;
} metasprite_entry;

Ported directly from a schema for metasprite entries I've been using in 6502 for ages. The reason I used unsigned char* for things was so that I could just have a pile of bytes starting with a count, followed by a bunch of entries of whatever length. I was trying to figure out how to represent this purely with an array of structs inside a struct, but I'm not sure how to do that where you don't specify the array of the structs inside the struct. Except to use a pointer, but then I have to split the metasprite header from the actual const data of the metasprite entries, which I didn't like.

Amusingly, I did not use those two pointers inside the loop initially---I actually was wondering if it would be faster that way, imagining that perhaps finding the correct location in the array four times rather than once was where the time was going---guess not!

Re: Efficiency of development process using C versus 6502
by tepples on 2016-01-02 (#161747)

Why store a separate flipped_x? You can just EOR is_flipped before adding the X position, where is_flipped is $FF if flipped or $00 if not.
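In C terms the trick amounts to the sketch below; the function name and parameters are made up for the example. XORing with $FF gives the one's complement, which in 8 bit two's complement arithmetic is -x - 1, so a flipped entry lands one pixel further left than a plain negation would put it and the base position has to account for that.

```c
#include <assert.h>

/* is_flipped is 0xFF when the metasprite is mirrored, 0x00 otherwise. */
unsigned char sprite_screen_x(unsigned char base_x,
                              unsigned char entry_x,
                              unsigned char is_flipped) {
    /* EOR first, then add the position, wrapping at 8 bits. */
    return (unsigned char)(base_x + (unsigned char)(entry_x ^ is_flipped));
}

/* Usage: an entry 8 pixels right of the origin, unflipped and flipped. */
void flip_example(void) {
    assert(sprite_screen_x(100, 8, 0x00) == 108);
    assert(sprite_screen_x(100, 8, 0xFF) == 91);   /* 100 + (-8 - 1) */
    assert(sprite_screen_x(100, 0, 0xFF) == 99);   /* 100 + (-0 - 1) */
}
```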

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-02 (#161748)

Cool tip, Tepples. Thanks!

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-02 (#161750)

When getting started with cc65, you have to look at the generated assembly to learn what kind of structures are overly expensive.

Basically what rainwarrior said: viewtopic.php?p=159355#p159355

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-02 (#161751)

Let's look at the generated code: (cl65 -Oirs --add-source --listing metasprites.lst metasprites.c)

I added the missing "sprite" struct in the source. Comments starting with ";;;" are mine. "cc65 stack" means cc65's own software-emulated stack.

tl;dr: The biggest problem is that the cc65 stack is used for arguments and local variables instead of statically allocated space. This forces cc65 to use indirect addressing all over the place.
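To illustrate what "statically allocated" could mean here, the sketch below passes the arguments through file-scope variables and keeps the loop counters in `static` locals, so nothing has to go through the cc65 stack. It is an assumption-laden rewrite rather than measured code: the `sprite` layout (y, tile, attribute, x) is inferred from the store offsets in the listing, the `draw_*` names are invented, and the 8 bit `src` index limits a metasprite to 51 entries. cc65 also documents a `--static-locals` option that makes local variables static, though it does not change how parameters are passed.

```c
#include <assert.h>

/* Assumed layout of one 4-byte sprite entry. */
typedef struct {
    unsigned char y;
    unsigned char tile;
    unsigned char attribute;
    unsigned char x;
} sprite;

sprite sprite_ram[64];

/* "Parameters" at fixed addresses instead of on the cc65 stack. */
unsigned char draw_x, draw_y, draw_chr;
const unsigned char *draw_data;  /* count byte, then 5-byte entries */

void sprite_draw_metasprite_static(void) {
    static unsigned char i, src, count;
    count = draw_data[0];
    src = 1;  /* byte offset of the current entry */
    for (i = 0; i < count; ++i) {
        sprite_ram[i].y         = (unsigned char)(draw_y + draw_data[src]);
        sprite_ram[i].tile      = (unsigned char)(draw_chr + draw_data[src + 1]);
        sprite_ram[i].attribute = draw_data[src + 2];
        sprite_ram[i].x         = (unsigned char)(draw_x + draw_data[src + 3]);
        src += 5;  /* entries are y, tile, attribute, x, flipped_x */
    }
}

/* Usage: a two-entry metasprite drawn at (100, 50) with tile base $20. */
void draw_example(void) {
    static const unsigned char data[] = {
        2,
        0, 0x10, 0x01, 0, 8,
        0, 0x11, 0x01, 8, 0
    };
    draw_x = 100;
    draw_y = 50;
    draw_chr = 0x20;
    draw_data = data;
    sprite_draw_metasprite_static();
    assert(sprite_ram[0].x == 100 && sprite_ram[0].y == 50);
    assert(sprite_ram[0].tile == 0x30 && sprite_ram[0].attribute == 0x01);
    assert(sprite_ram[1].x == 108 && sprite_ram[1].tile == 0x31);
}
```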

Code:

000100r 1               ; ---------------------------------------------------------------
000100r 1               ; void __near__ sprite_draw_metasprite (int, int, unsigned char, __near__ const unsigned char *)
000100r 1               ; ---------------------------------------------------------------
000100r 1               
000100r 1               .segment   "CODE"
000000r 1               
000000r 1               .proc   _sprite_draw_metasprite: near
000000r 1               
000000r 1               .segment   "CODE"
000000r 1               
000000r 1               ;
000000r 1               ; void sprite_draw_metasprite(int x, int y, unsigned char chr_handle, const unsigned char *metasprite) {
000000r 1               ;
000000r 1  20 rr rr        jsr     pushax      ;;; Allocates cc65 stack space for "i" (probably doesn't have to be 16-bit?)
000003r 1               ;
000003r 1               ; int sprite_offset = 0;
000003r 1               ;
000003r 1  20 rr rr        jsr     decsp2      ;;; Local variable "sprite_offset" allocated from cc65 stack. Static allocation is usually better.
000006r 1  20 rr rr        jsr     push0
000009r 1               ;
000009r 1               ; const metasprite_entry *metasprite_entries = (const metasprite_entry*) (metasprite + 1);
000009r 1               ;
000009r 1  A0 05           ldy     #$05        ;;; Needs to read the incoming "metasprite" parameter from cc65 stack.
00000Br 1  B1 rr           lda     (sp),y
00000Dr 1  AA              tax
00000Er 1  88              dey
00000Fr 1  B1 rr           lda     (sp),y
000011r 1  18              clc
000012r 1  69 01           adc     #$01
000014r 1  90 01           bcc     L0006
000016r 1  E8              inx
000017r 1  20 rr rr     L0006:   jsr     pushax
00001Ar 1               ;
00001Ar 1               ; for(i = 0; i < metasprite[0]; i++) {
00001Ar 1               ;
00001Ar 1  A0 04           ldy     #$04
00001Cr 1  A9 00           lda     #$00        ;;; Initializes "i"
00001Er 1  91 rr           sta     (sp),y
000020r 1  C8              iny
000021r 1  91 rr           sta     (sp),y
000023r 1  A0 07        L0007:   ldy     #$07    ;;; Checks loop condition
000025r 1  20 rr rr        jsr     pushwysp
000028r 1  A0 09           ldy     #$09        ;;; Reads "metasprite" from cc65 stack to ptr1
00002Ar 1  B1 rr           lda     (sp),y
00002Cr 1  85 rr           sta     ptr1+1
00002Er 1  88              dey
00002Fr 1  B1 rr           lda     (sp),y
000031r 1  85 rr           sta     ptr1
000033r 1  A2 00           ldx     #$00
000035r 1  A1 rr           lda     (ptr1,x)    ;;; Reads metasprite[0]    
000037r 1  20 rr rr        jsr     tosicmp0    ;;; Compare    
00003Ar 1  90 03 4C rr     jcs     L0008       ;;; Leaves loop    
00003Er 1  rr           
00003Fr 1               ;
00003Fr 1               ; sprite *current_sprite = &sprite_ram[sprite_offset];
00003Fr 1               ;
00003Fr 1  A0 03           ldy     #$03
000041r 1  B1 rr           lda     (sp),y      ;;; Read "sprite_offset" from cc65 stack
000043r 1  AA              tax
000044r 1  88              dey
000045r 1  B1 rr           lda     (sp),y
000047r 1  20 rr rr        jsr     aslax2      ;;; Multiply by 4 (struct size)
00004Ar 1  18              clc
00004Br 1  69 rr           adc     #<(_sprite_ram) ;;; Add sprite_ram
00004Dr 1  A8              tay
00004Er 1  8A              txa
00004Fr 1  69 rr           adc     #>(_sprite_ram)
000051r 1  AA              tax
000052r 1  98              tya
000053r 1  20 rr rr        jsr     pushax      ;;; Resulting local variable to cc65 stack again.
000056r 1               ;
000056r 1               ; const metasprite_entry *current_entry = &metasprite_entries[i];
000056r 1               ;
000056r 1  A0 05           ldy     #$05        ;;; Read metasprite_entries and i from cc65 stack.
000058r 1  20 rr rr        jsr     pushwysp
00005Br 1  A0 09           ldy     #$09
00005Dr 1  B1 rr           lda     (sp),y
00005Fr 1  AA              tax
000060r 1  88              dey
000061r 1  B1 rr           lda     (sp),y
000063r 1  20 rr rr        jsr     mulax5      ;;; Multiply by struct size
000066r 1  20 rr rr        jsr     tosaddax    ;;; Add
000069r 1  20 rr rr        jsr     pushax      ;;; Store on stack.
00006Cr 1               ;
00006Cr 1               ; current_sprite->x = x + current_entry->x;
00006Cr 1               ;
00006Cr 1  A0 05           ldy     #$05        ;;; Read the "x" param from cc65 stack
00006Er 1  20 rr rr        jsr     pushwysp    
000071r 1  A0 14           ldy     #$14
000073r 1  20 rr rr        jsr     pushwysp
000076r 1  A0 05           ldy     #$05
000078r 1  B1 rr           lda     (sp),y
00007Ar 1  85 rr           sta     ptr1+1
00007Cr 1  88              dey
00007Dr 1  B1 rr           lda     (sp),y
00007Fr 1  85 rr           sta     ptr1
000081r 1  88              dey
000082r 1  B1 rr           lda     (ptr1),y    ;;; Read "x" from current_entry
000084r 1  20 rr rr        jsr     tosadda0
000087r 1  A0 03           ldy     #$03
000089r 1  20 rr rr        jsr     staspidx    ;;; Store the result (3 = offset to struct)
00008Cr 1               ;
00008Cr 1               ; current_sprite->y = y + current_entry->y;
00008Cr 1               ;
00008Cr 1  A0 05           ldy     #$05
00008Er 1  20 rr rr        jsr     pushwysp
000091r 1  A0 12           ldy     #$12
000093r 1  20 rr rr        jsr     pushwysp
000096r 1  A0 05           ldy     #$05
000098r 1  B1 rr           lda     (sp),y
00009Ar 1  85 rr           sta     ptr1+1
00009Cr 1  88              dey
00009Dr 1  B1 rr           lda     (sp),y
00009Fr 1  85 rr           sta     ptr1
0000A1r 1  A2 00           ldx     #$00
0000A3r 1  A1 rr           lda     (ptr1,x)
0000A5r 1  20 rr rr        jsr     tosadda0
0000A8r 1  A0 00           ldy     #$00
0000AAr 1  20 rr rr        jsr     staspidx
0000ADr 1               ;
0000ADr 1               ; current_sprite->tile = chr_handle + current_entry->tile;
0000ADr 1               ;
0000ADr 1  A0 05           ldy     #$05
0000AFr 1  20 rr rr        jsr     pushwysp
0000B2r 1  A0 0E           ldy     #$0E
0000B4r 1  B1 rr           lda     (sp),y
0000B6r 1  20 rr rr        jsr     pusha0
0000B9r 1  A0 05           ldy     #$05
0000BBr 1  B1 rr           lda     (sp),y
0000BDr 1  85 rr           sta     ptr1+1
0000BFr 1  88              dey
0000C0r 1  B1 rr           lda     (sp),y
0000C2r 1  85 rr           sta     ptr1
0000C4r 1  A0 01           ldy     #$01
0000C6r 1  B1 rr           lda     (ptr1),y
0000C8r 1  20 rr rr        jsr     tosadda0
0000CBr 1  A0 01           ldy     #$01
0000CDr 1  20 rr rr        jsr     staspidx
0000D0r 1               ;
0000D0r 1               ; current_sprite->attribute = current_entry->attribute;
0000D0r 1               ;
0000D0r 1  A0 05           ldy     #$05
0000D2r 1  20 rr rr        jsr     pushwysp
0000D5r 1  A0 03           ldy     #$03
0000D7r 1  B1 rr           lda     (sp),y
0000D9r 1  85 rr           sta     ptr1+1
0000DBr 1  88              dey
0000DCr 1  B1 rr           lda     (sp),y
0000DEr 1  85 rr           sta     ptr1
0000E0r 1  B1 rr           lda     (ptr1),y
0000E2r 1  20 rr rr        jsr     staspidx
0000E5r 1               ;
0000E5r 1               ; sprite_offset++;
0000E5r 1               ;
0000E5r 1  A0 07           ldy     #$07        ;;; 16-bit increment
0000E7r 1  B1 rr           lda     (sp),y
0000E9r 1  AA              tax
0000EAr 1  88              dey
0000EBr 1  B1 rr           lda     (sp),y
0000EDr 1  18              clc
0000EEr 1  69 01           adc     #$01
0000F0r 1  90 01           bcc     L001E
0000F2r 1  E8              inx
0000F3r 1  20 rr rr     L001E:   jsr     staxysp
0000F6r 1               ;
0000F6r 1               ; }
0000F6r 1               ;
0000F6r 1  20 rr rr        jsr     incsp4
0000F9r 1               ;
0000F9r 1               ; for(i = 0; i < metasprite[0]; i++) {
0000F9r 1               ;
0000F9r 1  A0 05           ldy     #$05        ;;; i++
0000FBr 1  B1 rr           lda     (sp),y
0000FDr 1  AA              tax
0000FEr 1  88              dey
0000FFr 1  B1 rr           lda     (sp),y
000101r 1  18              clc
000102r 1  69 01           adc     #$01
000104r 1  90 01           bcc     L0010
000106r 1  E8              inx
000107r 1  20 rr rr     L0010:   jsr     staxysp
00010Ar 1  4C rr rr        jmp     L0007
00010Dr 1               ;
00010Dr 1               ; }
00010Dr 1               ;
00010Dr 1  A0 0D        L0008:   ldy     #$0D
00010Fr 1  4C rr rr        jmp     addysp
000112r 1               
000112r 1               .endproc

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-02 (#161763)

A metasprite routine is exactly the kind of thing you should write in assembly and stick in a library, anyway.

I don't think it'd be worth trying to make more efficient in C, if you're capable of analyzing the output I'm sure you could write an efficient assembly version in way less time than it would take you to play the C vs compiler game to make something faster (which would still be slower than hand written assembly at the end of it anyway).

Re: Efficiency of development process using C versus 6502
by dougeff on 2016-01-02 (#161769)

I agree. A struct for metasprite update (with parameters passed on the C stack) will take 10x longer than the equivalent ASM version.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-02 (#161776)

Well I said as much earlier. I wouldn't actually use this in a game. I was just surprised that drawing just this one metasprite took well over a full frame to draw. It's making me wonder what would I actually gain from C. I'm leaning towards just trying to use ca65 macros as a middle ground between nearly no abstraction and too much abstraction.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-03 (#161798)

I haven't analyzed your code any further. I also think that drawing the sprites should be an Assembly routine. But I'm pretty sure it's not just the fact that you use C that makes it so slow. I guess there are some other issues.

However, that's not a reason to abolish C completely.
I'm programming my game in C as well. I have the low level stuff like sprite drawing and PPU updates in Assembly, but the game logic is written in C. (Makes it much easier than hand-written Assembly.)
And I haven't run into a problem yet that I couldn't fix.

So, believe me, it's not C itself that makes problems. It's the way you use pointers and arrays. I'm pretty sure your code could be updated in a way that it doesn't take this long.
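
A rough sketch of the kind of restructuring this points at, assuming the usual cc65 advice of 8-bit indices, global parameters and static locals instead of struct pointers on the C stack. Every name here (oam_buf, oam_index, ms_x, ms_data and so on) is hypothetical and not taken from the listing above, and whether the generated code is actually fast enough would still have to be checked in the compiler output:

```c
#include <stdint.h>

/* All names are hypothetical; this is not the routine from the listing. */
uint8_t oam_buf[256];        /* shadow OAM, 4 bytes per sprite: y, tile, attr, x */
uint8_t oam_index;           /* next free byte; an 8-bit index, not a pointer */

/* Parameters live in globals instead of being pushed on the C stack. */
uint8_t ms_x, ms_y, ms_tile_base;
const uint8_t *ms_data;      /* count byte, then y, tile, attr, x per entry */

void draw_metasprite(void)
{
    /* Static locals get fixed addresses, so no stack-relative access. */
    static uint8_t left, src;

    left = ms_data[0];
    src = 1;                 /* 8-bit source index: limits a metasprite to 63 entries */
    while (left) {
        oam_buf[oam_index++] = (uint8_t)(ms_y + ms_data[src++]);
        oam_buf[oam_index++] = (uint8_t)(ms_tile_base + ms_data[src++]);
        oam_buf[oam_index++] = ms_data[src++];
        oam_buf[oam_index++] = (uint8_t)(ms_x + ms_data[src++]);
        --left;
    }
}
```

The caller sets ms_data, ms_x, ms_y and ms_tile_base, then calls draw_metasprite(). The trade-off is that the routine is no longer reentrant.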

As I said, specifically for the sprites update, I didn't even bother to find a good C routine because that's one of the ever-present, constant things in games, so I did it in Assembly right from the beginning.
But don't think that you cannot do an action game that constantly runs at 60 fps with C. I'm in the process of doing one.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-03 (#161799)

Thanks for the encouragement to continue on with C. I do keep feeling that, if I can stick with it, it may result in a better game because I'll be able to think more iteratively/quickly about the game logic. I can't count how many times I started on a new enemy in pure 6502 and groaned thinking about all of the unnatural asmy thought I'd have to go through to get it to work halfway decently. It was fun in the beginning because it felt novel and like I was doing something "hard," but the struggle remains even to this day. That either means I have little talent or there is a grain of truth in that asm just *is* harder to think in. In C that sort of thing would be a snap. Plus I should remember that enemy updates typically have no inner loop, it's just a brief state update, once per frame for each enemy (for a typical game engine..), dealing with far less data than blasting a metasprite to the screen. So, I think I'll press on. Thanks again.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-03 (#161802)

Yeah, game logic can really be a pain in the ass in Assembly.

You should have a look at Shiru's suggestions for writing C for the NES:
https://shiru.untergrund.net/articles/p ... s_in_c.htm

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-03 (#161808)

GradualGames wrote:

I was just surprised that drawing just this one metasprite took well over a full frame to draw.

There are a lot of hidden ways for naive code that looks sensible to accidentally kill performance. This is kind of unavoidable, but as long as you monitor performance regularly, you should at least be able to spot when it happens, and correct for it. Sometimes it might be worth digging into what the compiler is doing to learn to write better code for it, but sometimes you should just not write that piece of code in C. (BTW I find that writing in C first, then translating to assembly by hand is easier than writing in assembly directly.)

I found this kind of problem was common in modern game development too. Somebody adds a physics object to the scene, but puts it inside a wall by accident. Somebody adds a slow special case to a base character class, but it was accidentally used everywhere. Somebody adds one too many lights to a scene. Somebody puts a shadow casting light inside a box and doesn't realize it's there. This kind of stuff is much worse when you've got lots of people working on the project together. If you don't monitor performance constantly, it's hard to go back and figure out what the problem was, especially when it's not one problem but 20 less severe ones that have piled up over time. Often I'd have to spend a few days just analyzing problem areas of the game, and each time we tend to create new "well, don't do this" rules. If we could get the tool to report a known kind of problem, we'd do it, but for the most part they were just a big list of commandments that people would still occasionally forget about, and there's always, always, always a new and easy way to reduce the framerate to slideshow that we haven't thought of yet.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-03 (#161813)

rainwarrior wrote:

GradualGames wrote:

I was just surprised that drawing just this one metasprite took well over a full frame to draw.

(BTW I find that writing in C first, then translating to assembly by hand is easier than writing in assembly directly.)

You just might be right about this. I tried this for the first time today---prototyping a new metasprite routine in C, which pre-clips the sprite before drawing each individual sprite. I feel that this would have taken forever in asm. But I have a correct, pixel perfect clipping meta sprite routine in C just after an hour or two of fiddling. Translating that to asm should be a snap, now. I think maybe I've finally broken through mentally towards accepting C for my future games. Thanks everyone, this discussion has been immensely helpful and encouraging.

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-03 (#161819)

I've also gone the "write in C first, translate to asm by hand" route occasionally. Actually I think the first routine I wrote like that was also my first metasprite rendering routine (this was done in cc65). I've also prototyped map scrolling and collision routines in C++.

Shiru is also using a similar strategy: https://www.youtube.com/watch?v=w_xWcHdsOPY

Re: Efficiency of development process using C versus 6502
by tomaitheous on 2016-01-03 (#161833)

GradualGames wrote:

Just an observation:

I find trying to apply C or some other higher level language to these old consoles, especially the NES, as fighting against the grain. And at the same time, you're limiting yourself in the scope of your projects because of this inherent vice on potential speed/processing power. If you're capable of writing effective assembly code, to supplement the lacking area of the C generated code, then you're fully capable of doing the whole thing in assembly to begin with. I mean, if assembly feels foreign - then you haven't given it enough of a chance, enough exposure. And if you rely on C mixed with assembly, then you'll never really reach that point of being comfortable in assembly - thinking in terms of assembly structure. You're essentially trying to force something that really doesn't belong to this system, this architecture. And in essence, you're missing the mark and experience of what it's actually like to write code for these old systems.

For all this hard work, and time, you could have just applied it all to assembly code, becoming comfortable and efficient in writing capable code, to be able to think directly in terms of assembly code structure, and tapping into the system's full potential. Any other approach is really just an attempt to shortcut this, and an overall ineffective one at that.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-03 (#161839)

tomaitheous wrote:

GradualGames wrote:

You make some very good points here. I'm still trying to evaluate whether I want to adopt C or not. I keep waffling back and forth. I have a feeling now, later in the afternoon I am in fact going to revert to sticking with 6502...because...man, the number of things that can go wrong just multiply using C. I even got into trouble, once again, using plain old ca65 macros today. I'm a lot more comfortable with 6502 now than I was a few years ago, but it's still a bit slow-going for me. I enjoy it nonetheless. I accepted the reality that there are no shortcuts when I started back in 2009.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-03 (#161840)

tomaitheous wrote:

you're limiting yourself in the scope of your projects because of this inherent vice on potential speed/processing power. If you're capable of writing effective assembly code, to supplement the lacking area of the C generated code, then you're fully capable of doing the whole thing in assembly to begin with.

It's only limiting if your game concept is bound by processing power.

Personally I find the most difficult limitation of making games is just the overall scope of content/features and sticking with it for long enough to actually finish the game. CPU power is relatively easy to manage and deal with, compared to that (depends on the game though). C coding actually helps a lot with being able to finish something; not having to spend as much time coding is a big boon. I'm much more likely to finish a project if it doesn't have to take as long to make.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-03 (#161843)

GradualGames wrote:

You make some very good points here.

Meh. Proponents of Assembly make it sound like all you need is a bit of experience and you can write code as fluently as C. I don't think that's gonna happen very soon. C is a huge improvement when it comes to writing code. It's not just like you write code in C# and then you have to downstep to Visual Basic 6. It's more like building a house with modern machines vs. building a house with nothing but a hammer.

To take Shiru's example:

That's Assembly:

Code:

 lda my      ;multiply my by 32
 sta ptr_h   ;through shifting
 ldy #0      ;a 16-bit var (ptr_h,ptr_l)
 sty ptr_l   ;to the right for three times
dup 3
 lsr ptr_h   ;shift
 ror ptr_l
edup
 lda ptr_l   ;add mx as 16-bit value
 clc
 adc mx
 bcc @1
 inc ptr_h
@1:
 clc
 adc #<map   ;add map offset
 sta ptr_l
 lda ptr_h
 adc #>map
 sta ptr_h
 lda [ptr_l],y   ;read the value

And that's the same code in C:

Code:

n=map[(my<<5)+mx];
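
For anyone who wants to try that one-liner outside an NES build, here is a minimal self-contained sketch. The 32-tile row width follows from the shift by 5; MAP_W, MAP_H and map_tile_at are assumed names for illustration, and the height of 30 is just a guess at one nametable's worth of tiles:

```c
#include <stdint.h>

#define MAP_W 32   /* one row is 32 tiles, so the row offset is my << 5 */
#define MAP_H 30   /* assumed height, not stated in the thread */

uint8_t map[MAP_W * MAP_H];

uint8_t map_tile_at(uint8_t mx, uint8_t my)
{
    /* my is widened before the shift, so the index does not wrap at 256
       (the largest index here is 29 * 32 + 31 = 959). */
    return map[((unsigned)my << 5) + mx];
}
```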

GradualGames wrote:

man, the number of things that can go wrong just multiply using C.

I guess there are much more things that can go wrong with Assembly.

How many times did I run into this error: LDA SomeConstantValue and the program didn't work? Yeah, it should have been LDA #SomeConstantValue, but it took a lot of time to find this out. This cannot happen in C.

If you really want to finish a game, use C and only do some low-level functions in Assembly. When you run into problems, you can still check which part of the game is too slow and rewrite it in Assembly.

For example, in my game, at the moment there is only one thing that takes an overly long time that I might have to optimize: The collision checks when the player character attacks. As long as you don't press the attack button, there is a huge part of unneeded time per frame. So, what would I have gained from using Assembly right from the beginning?

On the other hand, if I try to imagine what it would have been writing the MovePlayerCharacter function in pure assembly: I might have quit out of frustration long ago.

I would suggest that you use C. And when you run into problems, you can still switch to Assembly in specific places.
But imagine you finish the game in C (and some low level and time critical Assembly stuff). Then you will have finished it long before you would have finished the game in Assembly.

Re: Efficiency of development process using C versus 6502
by dougeff on 2016-01-03 (#161855)

I feel like we've had this conversation before.

One thing I like from Shiru's examples, is he wrote a whole library of asm functions to handle all common game routines. They aren't explained very well, so I don't use them, but once you write a good 'update PPU with new tiles' asm function, you really don't have to do it again. Every game after that can reuse the same code.

This is how you write fast code in C. Have a huge library of ASM code to handle all basic NES functions.

Re: Efficiency of development process using C versus 6502
by tomaitheous on 2016-01-03 (#161860)

Quote:

One thing I like from Shiru's examples, is he wrote a whole library of asm functions to handle all common game routines. They aren't explained very well, so I don't use them, but once you write a good 'update PPU with new tiles' asm function, you really don't have to do it again. Every game after that can reuse the same code.

This is how you write fast code in C. Have a huge library of ASM code to handle all basic NES functions.

And if you need/require something different? What are you going to do then? If you don't have an effective grasp on assembly, then you're at the mercy of someone else's library. This is actually another negative strike against this C environment (it discourages coders from learning assembly until they absolutely have to, or just give up). If you were comfortable with assembly structure, writing any new library function/routine would be trivial.

rainwarrior wrote:

Personally I find the most difficult limitation of making games is just the overall scope of content/features and sticking with it for long enough to actually finish the game.

I agree, but what does C or Assembly have to do with any of this? I can think in terms of assembly structure very easily, and write code very effectively and fast. And while I'm there, in the midst of it, I'm already at an advantage point where I can write something that's effective and usable, instead of worrying about whether I need to optimize this with assembly support - because the compiler is spitting out some pitifully slow code, or just something that's OK or reasonable initially, but ends up compounding into a larger performance issue. Even if you write unoptimized assembly code, you're much less likely to run into this issue.

Quote:

C coding actually helps a lot with being able to finish something; not having to spend as much time coding is a big boon.

I highly, highly doubt that literally writing code, the physical aspect of typing it out, has any sort of tangible or measurable impact in the grand scheme of writing a game for this platform. And being efficient and comfortable with assembly negates any issue of trying to translate logic into assembly structure.

The act of actually writing code is near meaningless in the grand scheme of things. What takes up time is game design; coming up with tools for making tilemaps, collision maps, AI, specific tweaking to gameplay mechanics and overall design. If I know I need to access memory and it needs to be in the structure of a pointer, then it's simple as pie to write the pointer interfacing code. And while I'm there, I can very easily and simply optimize for sequential access (something C compilers for these old systems are terrible at, or just straight out don't do). I'm not talking about voodoo assembly, but your basic everyday building blocks of code.

DRW wrote:

Meh. Proponents of Assembly make it sound like all you need is a bit of experience and you can write code as fluently as C. I don't think that's gonna happen very soon.

I can write assembly as fluently, if not better, than C. All things have a learning curve. Are you in it to put something out quick and be done with it? Or actually interested in developing your skills and increasing the performance and scope of projects as you proceed to produce stuffs for this console? Soon, is relative. It will happen. It all depends if you're willing to invest the time to appropriate the necessary skills, or spend your time fighting with C.

Quote:

C is a huge improvement when it comes to writing code.

In general, absolutely. But in this context, not even close when it comes with a burden of performance deficits - not to mention other aspects (dealing with predefined design/structure of the compiler setup).

Quote:

To take Shiru's example:

That's Assembly:

Code:

 lda my      ;multiply my by 32
 sta ptr_h   ;through shifting
 ldy #0      ;a 16-bit var (ptr_h,ptr_l)
 sty ptr_l   ;to the right for three times
dup 3
 lsr ptr_h   ;shift
 ror ptr_l
edup
 lda ptr_l   ;add mx as 16-bit value
 clc
 adc mx
 bcc @1
 inc ptr_h
@1:
 clc
 adc #<map   ;add map offset
 sta ptr_l
 lda ptr_h
 adc #>map
 sta ptr_h
 lda [ptr_l],y   ;read the value

And that's the same code in C:

Code:

n=map[(my<<5)+mx];

And your point being? I look at the map fetch code and the first thing I think: I need a pointer and I need to build an index (y shifted by 5, added to x). I could write that in a matter of a minute (clear as day in my head). I've taken more time enjoying a cup of coffee than something like that. That's trivial. And while I'm there, do I need sequential access to fetch rows or columns? If this is for collision purposes, how many tiles do I need to access (tile grid alignment in relation to the object offset)? I can easily optimize accordingly and/or adapt it for future needs.
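
The sequential-access idea can be sketched like this: build the index once, then step through neighbouring tiles instead of redoing the shift and add for each one. It is written in C only to keep the thread's examples in one language; in assembly the same thing would be a pointer plus an incrementing Y register. row_has_solid, MAP_W and the 30-row height are hypothetical names and values:

```c
#include <stdint.h>

#define MAP_W 32                 /* row width implied by the shift by 5 */
uint8_t map[MAP_W * 30];         /* assumed map size */

/* Returns 1 if any of `count` tiles starting at (mx, my) and running
   along the row is nonzero. The caller keeps mx + count within MAP_W. */
uint8_t row_has_solid(uint8_t mx, uint8_t my, uint8_t count)
{
    const uint8_t *p = &map[((unsigned)my << 5) + mx];  /* index built once */

    while (count--) {
        if (*p++) {              /* step by 1 along a row; stepping by MAP_W
                                    would walk down a column instead */
            return 1;
        }
    }
    return 0;
}
```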

Quote:

How many times did I run into this error: LDA SomeConstantValue and the program didn't work? Yeah, it should have been LDA #SomeConstantValue, but it took a lot of time to find this out. This cannot happen in C.

That's the noobish/rookie-ish level mistake, though. When you post something like this, that tells me you have no experience with assembly. Does it happen? Sure. Is it easily tracked down? Very easily. Matter of fact, when you write your own code in assembly - you can automatically and easily identify it in the debugger. It makes the debugger so much more effective and useful. Have you ever tried wading through C generated assembly code in a debugger (even if it has symbol support)? It's a mess and it slows things down.

Quote:

For example, in my game, at the moment there is only one thing that takes an overly long time that I might have to optimize: The collision checks when the player character attacks. As long as you don't press the attack button, there is a huge part of unneeded time per frame. So, what would I have gained from using Assembly right from the beginning?

Quote:

On the other hand, if I try to imagine what it would have been writing the MovePlayerCharacter function in pure assembly: I might have quit out of frustration long ago.

Quote:

But imagine you finish the game in C (and some low level and time critical Assembly stuff). Then you will have finished it long before you would have finished the game in Assembly.

How long have you been coding for the NES? What have you put out so far? How long have you been working on your current project? If you took 3 to 6 months learning and writing assembly structured code (practice; writing experimental routines and getting feedback), you'd already be in the proper mindset to avoid a lot of problems you're exaggerating about. Quitting out of frustration??? You're coding for the NES! If you've lasted this long, then assembly is nothing. It's just a processor; there are more advanced things to consider and work with than simply writing assembly code for a processor (like the video and audio eccentricities and limitations).

I understand people want that C environment and will put up with whatever in order to make it work. I'm just stating that it's not the shortcut people think it is. At some point, you're going to have to bear the burden of learning assembly - or allow that untapped potential to go to waste. If you want to make that choice, that's fine. But one should know exactly what they're getting into, and if the investment into this console is more than a passing phase - there are more optimal ways to spend that time. Anything worth doing requires work. If you're afraid of work - then the NES isn't the right system/environment for you. You're dealing with some severe limitations here. If you're afraid of assembly because it looks daunting or alien, that's only because you haven't familiarized yourself with it. Assembly is easy. Do you have to re-invent the wheel sometimes? Definitely, but you've now developed a deeper understanding of that wheel in relation to where it belongs. That's what the NES represents in terms of coding, structure, and approach.

What I find though, is people making comments about C and assembly, with authority, when they clearly have little understanding of coding in Assembly - and all that it entails. If you can't write effectively and efficiently in assembly, easily think about logic in terms of assembly structure - then you really don't have anything valid to contribute on the matter. Insecure defensive excuses and falsehoods don't help people that might be on that fence - who have a longer investment or view point in developing for this console, who might want to reach the level of commercial software. If you're set on using C, then that's perfectly fine. I'm not here saying don't use C. Use whatever you want. But making false statements or over-exaggerations about the negative perceptions of assembly language helps no one. It comes off as people reassuring their own defense mechanisms and avoiding their insecurities.

I'm not interested in debating what is appropriate for you, personally. I'm just stating the facts and dispelling the exaggerations about assembly. I think someone should know the specifics and truth, before deciding which path to take - or continue on.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-03 (#161861)

tomaitheous wrote:

I highly, highly doubt that literally writing code, the physical aspect of typing it out, has any sort of tangible or measurable impact

I've said this many times before on this board, but WRITING CODE IS NOT JUST TYPING IT OUT. It's planning, thinking, reading, testing, debugging, revising, rewriting, and yes at some point you do have to type it.

I find C a lot faster to think about, to read and understand, and to revise, than equivalent assembly. That's really the whole point, here. It's not simply about verbosity, it's about how much time I have to spend working with it.

This is my personal experience with C on the NES. It's why I bothered to learn to use it in the first place, and why I'll probably use it again in a future project. I think it's quite useful, and it's a disservice to tell people that assembly is the only way to do things. I think it's a very valuable tool that's worth trying out. I don't think there's any one-size-fits-all development solution, and I don't use it for everything, but in the right hands, or the right project, cc65 can do a lot of good on the NES.

Re: Efficiency of development process using C versus 6502
by na_th_an on 2016-01-04 (#161869)

My stance in this kind of discussion is use whatever floats your boat.

C is enough for my needs (plus Shiru's routines in assembly) so that's what I've used. When I need assembly to speed up something, I will dig into it.

In my latest project (which is almost finished but won't be released until we can plan a proper physical release - the game will be still available for free download, though) I had to struggle a little bit to be able to fit my gameplay in a NTSC frame when I had two players on. But I managed to do it in C. And whenever I look at the code again I find a new bit that could be optimized - but won't, as it's not needed anymore as the game is already finished.

I had the game ready in a couple of weeks. I'm not a constant person. I get bored very easily. If I had to code games in assembly which is more time-taxing, I'm sure I wouldn't finish a single one. But that's me.

On the other hand, in another project (on hold) I had to rewrite a couple of funcs in assembly 'cause they just took too much time ~and~ it looked a simple enough task for a complete newbie to achieve. It took me a whole morning just to move around an array of bullets and check for simple collision, but I managed.

That's why, again, my stance is "whatever floats your boat". C does for me, most of the time.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-04 (#161873)

Here's one thing I worry about if I make a committed switch to C on my next big project. In my last game project, which was a large, scrolling game, I had quite a lot of engine and player code, spanning four (16kb, mapper2) banks or more perhaps. I ended up using a lot of far calls (using a routine I devised for making any function call a trampoline, preserving processor status across banks). Those far calls are probably not the most efficient thing in the game, but they're only used for higher level things, no engine core stuff, so they don't have a huge impact.

What I'm saying is, my own 6502 code ended up sprawling quite a few banks for a full-size NES game. I am a bit worried if I use C, the sprawl will be worse, and the temptation or necessity to use far calls will be a lot worse as well. I guess I feel more in control if I use 6502.
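
For reference, the far-call idea looks roughly like this when expressed in C. This is only a sketch under assumptions, not the routine described above: the bank register is simulated with a plain variable, it does not preserve processor status, and far_call, set_bank, current_bank and the demo callee are all made-up names. On real hardware far_call itself would have to live in the fixed bank:

```c
#include <stdint.h>

uint8_t current_bank;            /* shadow copy of the selected bank */

void set_bank(uint8_t bank)
{
    current_bank = bank;
    /* On real UxROM hardware the bank number would also be written
       to the mapper register here; omitted in this sketch. */
}

/* Trampoline: switch to the callee's bank, call it, switch back. */
void far_call(uint8_t bank, void (*fn)(void))
{
    uint8_t saved = current_bank;

    set_bank(bank);
    fn();
    set_bank(saved);
}

/* Demo callee (hypothetical): records which bank it ran in. */
uint8_t seen_bank;
void record_bank(void) { seen_bank = current_bank; }
```

Each far call costs two bank switches plus an indirect call, which is why keeping them out of the engine core matters in either language.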

I agree with tomaitheous about the immediate/absolute errors. I still make those mistakes now and then but I catch them relatively quickly. What's interesting to me is, if you code in 6502, the number of errors you can make is actually smaller, ironically, because each line of code is so self contained and atomic, and you really only have a few dozen commands. In C, errors can be hidden inside an expression and you may, ironically, have to think about it almost as much as you would hunting through an equivalent "tall" set of 6502 instructions rather than a "long" statement of C. So there's a weird trade-off there, where the gains are perhaps not quite as large as imagined. C also has a harder to learn syntax than 6502, which sometimes requires research to get right. I never have to research 6502 syntax anymore, by contrast.

Probably in the end na_th_an is correct, it's probably however you best enjoy working. There may be no objectively measurable impact on development timeline of using C, at least on this small and constrained computer.

Re: Efficiency of development process using C versus 6502
by tepples on 2016-01-04 (#161874)

For some people, C makes it easier to experiment with the game design well before anything's final. Others might choose to write more of the game in Python with a mindset toward converting it to assembly later.

Re: Efficiency of development process using C versus 6502
by na_th_an on 2016-01-05 (#161929)

You can always code the core game engine in assembly and then the "boring" parts in C, those "offline stuff" such as intro screens, menus, transitions, etc - which make your game look pro but are a bummer to do.

Re: Efficiency of development process using C versus 6502
by tomaitheous on 2016-01-05 (#161945)

na_th_an wrote:

Regardless of what I posted, I still find C related stuff on these old systems interesting from a hobbyist/curiosity standpoint. (I've worked on modifying a C compiler for the HuC6280 and have done support libs and new functionality in asm as well). I do think it would be interesting to have an assembler and C hybrid of sorts, or some sort of higher level of assist that you can directly write in your assembly listing. I've done stuff with macros that have added higher level functionality for non critical things such as you have mentioned - it makes the code more compact and thus more readable when parsing through it. IIRC, wasn't NESHLA something like this? I guess someone could always write a preprocessor that takes care of the "C" stuff inside the assembler listing, and spits out an asm file for final assembling.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-05 (#161946)

tomaitheous wrote:

na_th_an wrote:

I've developed a habit with 6502 where I sort of statically allocate zp space to work with within any given routine. It's become somewhat methodical..I have a bunch of single byte, and two byte variables in zp that I just re-use for local working space in a given routine. Instead of all these expensive pusha and pushax calls that cc65 generates, I keep wondering why isn't there a higher level compiler that can statically use zp space like a programmer writing handwritten assembly would. I haven't looked into NESHLA yet...or things like pynes, to see if anybody's already trying to go that direction. My focus is on building game projects though so I'm probably not going to try to create my own compiler which does this. I may certainly appreciate using one, should one be available or developed, eventually though. In the meantime...probably going to be satisfied with ca65 and judicious use of macros.

Re: Efficiency of development process using C versus 6502
by dougeff on 2016-01-05 (#161948)

Quote:

Instead of all these expensive pusha and pushax calls that cc65 generates, I keep wondering why isn't there a higher level compiler that can statically use zp space like a programmer writing handwritten assembly would

Just do like I do...

Instead of..

Code:

someFunction(arg1, arg2);

You define arg1, arg2 as global variables in the zero page, then...

Code:

arg1 = some_value;      /* arg1 and arg2 are globals in the zero page */
arg2 = some_value_2;
someFunction();         /* declared as: void someFunction(void); */

Or, perhaps create your own pseudostack and stack pointer that points to an array in zero-page RAM, where you keep temporary values and passed arguments.
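
A minimal sketch of both ideas in plain C. All names here (`arg1`, `arg2`, `result`, `add_args`, `pstack`, `psp`, `ppush`, `ppop`) are hypothetical and only for illustration. With cc65 you would additionally place these variables in the zero page (it has pragmas such as `#pragma bss-name` and `#pragma zpsym` for that); that part is left out so the sketch compiles with any C compiler.

```c
#include <stdint.h>

/* Hypothetical globals used instead of function parameters. */
uint8_t arg1, arg2, result;

/* Takes its "parameters" from the globals above, so nothing is pushed. */
void add_args(void)
{
    result = (uint8_t)(arg1 + arg2);
}

/* Hypothetical pseudostack: a small array plus an index that grows downward.
   No bounds checking is done; the caller must not push more than
   PSTACK_SIZE bytes or pop from an empty stack. */
#define PSTACK_SIZE 16
uint8_t pstack[PSTACK_SIZE];
uint8_t psp = PSTACK_SIZE;      /* empty when psp == PSTACK_SIZE */

void    ppush(uint8_t v) { pstack[--psp] = v; }
uint8_t ppop(void)       { return pstack[psp++]; }
```

A call then looks like `arg1 = 3; arg2 = 4; add_args();`, after which the answer is read from `result`. The trade-off is that two functions sharing `arg1`/`arg2` must never be active at the same time.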

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-05 (#161962)

As dougeff just pointed out, you can do static ZP parameters in the same way in C as you would do in assembly. The ability is still there. ;P You can also make a macro out of it, if you want it to look like a regular function call.

The reason a C compiler can't automate it is that it has no way of knowing* which functions call which, or if they might be recursive, etc. They have to go on the stack, by definition, more or less.

* More advanced compilers can do optimization with statically defined functions. If a function is not imported or exported, but fully defined in the current translation unit, a compiler is allowed to do things like replace it inline (i.e. copy paste the code instead of making a function call), and do other optimizations. In theory you could do some sort of partial inline that looks for the destination of call parameters, and sticks them there instead (if static) as part of the call rather than taking them off the stack, but that's kind of light years beyond the compilers we have right now.
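
A minimal sketch of the macro idea, assuming hypothetical names (`mul_a`, `mul_b`, `mul_out`, `mul_impl`, `mul8`) that are not part of cc65 or any library. The macro stores the arguments into static slots and then calls a parameterless worker, so the call site still reads like an ordinary function call.

```c
#include <stdint.h>

/* Hypothetical static parameter slots and result slot. */
uint8_t  mul_a, mul_b;
uint16_t mul_out;

/* Worker reads its inputs from the static slots instead of the C stack. */
void mul_impl(void)
{
    mul_out = (uint16_t)(mul_a * mul_b);
}

/* Store the arguments, call the worker, yield the result.
   The comma operator guarantees left-to-right evaluation. */
#define mul8(a, b) (mul_a = (a), mul_b = (b), mul_impl(), mul_out)
```

This also shows why a compiler cannot apply the trick blindly: in `mul8(x, mul8(y, z))` the inner call overwrites `mul_a` before the outer call runs, which is the same problem recursion or re-entrant calls would cause. When you write it by hand, you are the one promising that such calls never happen.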

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-06 (#162009)

Here's yet another thought I had. A typical configuration for CC65 (at least the default nes.cfg) sets aside 3 pages of RAM for stack/local space. That's kinda huge. I feel like that'd get in the way (assuming I don't use SRAM, and only use the standard 2K of RAM normally available). With pure 6502 I can maximize my use of RAM for things that are not just local space. I'm guessing the solution there is "just don't make that segment really large." If so, what's a reasonable size for it?

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162019)

Your BSS segment for RAM doesn't have to have a specified size. Segments can grow to fit whatever they need; the linker will give an error if you pack SEGMENTs that are too large into any given MEMORY region (if the enclosing MEMORY region has a fixed size).

The C stack can be placed wherever you like. If you keep things under control you might get away with 32 bytes? 64 or 128 might be more conservative. (You could even stick it on the bottom half of the $100 page and share space with the hardware stack.)

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162021)

rainwarrior wrote:

The C stack can be placed wherever you like. If you keep things under control you might get away with 32 bytes? 64 or 128 might be more conservative. (You could even stick it on the bottom half of the $100 page and share space with the hardware stack.)

What? I have to declare a segment for the C stack? Where?

My config file looks like this and I never ran into any errors with it:

Code:

MEMORY
{
   HEADER:  type = ro, start = $0000, size = $0010, fill = yes;
   PRG_ROM: type = ro, start = $8000, size = $8000, fill = yes;
   CHR_ROM: type = ro, start = $0000, size = $2000, fill = yes;
   ZP:      type = rw, start = $0000, size = $0100;
   RAM:     type = rw, start = $0200, size = $0600, define = yes;
}

SEGMENTS
{
   HEADER:            load = HEADER,  type = ro;
   CODE:              load = PRG_ROM, type = ro;
   RODATA:            load = PRG_ROM, type = ro;
   CHARS:             load = CHR_ROM, type = ro;
   VECTORS:           load = PRG_ROM, type = ro,  start = $FFFA;
   ZEROPAGE:          load = ZP,      type = zp;
   ZEROPAGE_FAMITONE: load = ZP,      type = zp,  start = $00FD, define = yes;
   SPRITES:           load = RAM,     type = bss,                define = yes;
   BSS:               load = RAM,     type = bss, start = $0300;
   BSS_FAMITONE:      load = RAM,     type = bss, start = $0700, define = yes;
}

So, is there anything missing after all when this file is used while programming with C?

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162022)

No, the C stack is manually defined somewhere in the CRT library I think? It can be moved around though, either by replacing the module in question, or there may be defines you can manipulate it with, I can't recall. (I just replaced the module that defined it.)

The stack should just be a .res somewhere, doesn't need a segment of its own.

Re: Efficiency of development process using C versus 6502
by lidnariq on 2016-01-06 (#162023)

Location of the C runtime stack is defined in the config file:

Code:

    __STACKSIZE__: type = weak, value = $0300; # 3 pages stack
[...]
    SRAM:   file = "", start = $0500, size = __STACKSIZE__, define = yes;

and implemented via crt0.s:

Code:

; Set up the stack.

        lda     #<(__SRAM_START__ + __SRAM_SIZE__)
        sta     sp
        lda     #>(__SRAM_START__ + __SRAM_SIZE__)
        sta     sp+1            ; Set argument stack ptr

Nothing seems to actually enforce the length __STACKSIZE__, though.

If you didn't initialize sp in your crt0 replacement, you're probably randomly scribbling over some portion of memory whenever C needs to use its software stack.
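
To make the top-down behaviour concrete, here is a simplified C model of what the crt0.s snippet above sets up. It is not the actual cc65 runtime code: `ram`, `soft_sp`, and `model_push` are hypothetical names, and only the start and size values are taken from the config lines quoted above.

```c
#include <stdint.h>

/* Values from the quoted config: SRAM starts at $0500 with size $0300. */
#define SRAM_START 0x0500u
#define SRAM_SIZE  0x0300u

uint8_t  ram[0x0800];                        /* simulated 2 KiB of NES RAM */
uint16_t soft_sp = SRAM_START + SRAM_SIZE;   /* the value crt0.s stores in sp */

/* Simplified model of pushing one byte: decrement first, then store. */
void model_push(uint8_t v)
{
    soft_sp--;
    ram[soft_sp] = v;
}
```

In this model the pointer starts at $0800, one past the end of the region, so the first pushed byte lands at $07FF and the next at $07FE. Nothing stops the pointer from continuing below $0500, which matches the observation that the stack size is not enforced.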

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162024)

I wrote it like this:

Code:

   ; The argument stack pointer for CC65 is initialized.
   LDA #<(__RAM_START__ + __RAM_SIZE__)
   STA sp
   LDA #>(__RAM_START__ + __RAM_SIZE__)
   STA sp + 1

Is this also correct? Or should I change it? (The corresponding config file is written above, in my previous post.)

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162026)

I'd recommend using a segment or .res or something to occupy that block and make sure nothing else gets placed in it.

Re: Efficiency of development process using C versus 6502
by lidnariq on 2016-01-06 (#162027)

If the C library actually checked for stack overflow, you might want to put the stack in its own region.

But it doesn't, and it grows top-down. Other than putting in a canary to check for stack overflow, I don't think you can meaningfully do better :/
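
A minimal sketch of the canary idea on a simulated stack region. Everything here (`cstack`, `cstack_top`, `CANARY`, `canary_init`, `canary_intact`, `cpush`) is hypothetical; on the NES the canary would be a byte at the lowest address of whatever block you reserved for the C stack.

```c
#include <stdint.h>

#define CSTACK_SIZE 64      /* hypothetical reserved stack size */
#define CANARY      0xA5    /* arbitrary marker value */

uint8_t cstack[CSTACK_SIZE];        /* index 0 is the lowest address */
uint8_t cstack_top = CSTACK_SIZE;   /* top-down: starts one past the end */

/* Plant the marker at the very bottom of the stack region. */
void canary_init(void)   { cstack[0] = CANARY; }

/* Returns nonzero while the marker is still untouched. */
int  canary_intact(void) { return cstack[0] == CANARY; }

/* Simplified push, with no checking of its own. */
void cpush(uint8_t v)    { cstack[--cstack_top] = v; }
```

Checking `canary_intact()` once per frame costs only a load and a compare. It reports an overflow after the fact rather than preventing it, and it misses the case where the overwriting byte happens to equal the marker.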

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162028)

I think there is some kind of option for stack overflow checking. There's #pragma check-stack or the --check-stack compiler option.

I've never used them, but they're there at least? I'm not sure what the check is supposed to do if it fails on the NES. If you're not using the IRQ, you could maybe write a BRK response to it and a corresponding handler?

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162030)

O.k., I think I get it now:

The C stack grows from end to start. So, when I declare the stack from $500 to $7FF, then the first variable that is used is at location $7FF, the next is $7FE, then $7FD etc.

Therefore, even if I didn't declare a special memory element for the stack, the program still works because the global variables start from the beginning while the stack variables start from the end, so as long as they don't cross paths, everything is alright.

Is that right so far?

But while there isn't a check for stack overflow, declaring the stack as a separate memory element is still a good thing:

Code:

RAM:     type = rw, start = $0200, size = $0300, define = yes;
SRAM:    type = rw, start = $0500, size = $0300;

Because in this case, the program might not be able to check when the stack grows too big, but at least the linker tells you when you declare too many global variables, so that they would spill into the space reserved for the stack.

Therefore, when you need too much RAM, you can monitor whether your stack can still work with less space. Or you can re-code your program and remove global variables.
It's an added security device.

In this case, you can even make the RAM segment as small as possible (since the toolchain always knows at build time how many global variables are used, but it doesn't know how much stack space is needed at any given time) and worry about stack overflows even less:
When my whole game uses 50 one-byte global variables, I can declare the RAM memory area with a size of exactly 50 bytes and know for sure that my stack space will definitely not be too small. And if I add a 51st variable, the linker immediately complains and I just add one byte to the RAM memory part.
This is much less error-prone than worrying whether I should reserve 128 or 256 or 512 bytes for the stack.

Is that correct?

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162031)

(I didn't see your post when I wrote the above one.)

rainwarrior wrote:

I think there is some kind of option for stack overflow checking.

But an overflow check would require CPU time.

I guess I wouldn't include it, even if it does exist. After all, if a stack overflow can happen, my program is faulty anyway, so it doesn't matter whether it crashes with a defined message or just glitches out.

And I don't expect my program to be faulty regarding stack overflows. I mean, from all the potential bugs, this one is the least likely. So, why waste any checks for something that most likely isn't there in the first place?

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162032)

lidnariq wrote:

If the C library actually checked for stack overflow, you might want to put the stack in its own region.

I like explicitly allocating the stack, not to avoid stack overflow, but to avoid RAM allocation overflow. Same result, I suppose, but you do know your static allocation size at compile time.

You can estimate how much stack you need. If you estimate you need 64 bytes, you can reserve it, and the linker will tell you when you've run out of RAM. The default setup is basically this but with a stack estimate of 0 bytes. I think you can probably make a guess that's better than that.

You can start with a conservatively large estimate for the stack size, and not worry about it until your RAM allocations grow too large. At that point you can measure your true stack usage, and adjust the estimate accordingly.
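
One common way to measure true stack usage is to fill the reserved block with a known pattern and later see how much of it was overwritten. The sketch below shows that idea on a simulated region; `paint_region`, `paint_stack`, `stack_high_water`, and the constants are hypothetical names and values, not anything provided by cc65.

```c
#include <stdint.h>
#include <string.h>

#define PAINT_SIZE 128    /* hypothetical reserved stack size */
#define PAINT_BYTE 0xCD   /* arbitrary fill pattern */

uint8_t paint_region[PAINT_SIZE];   /* stand-in for the reserved stack block */

/* Fill the whole region with the pattern before the stack is used. */
void paint_stack(void)
{
    memset(paint_region, PAINT_BYTE, sizeof paint_region);
}

/* The stack grows downward from the end of the region, so count the
   untouched bytes from the low end; the rest was used at some point. */
unsigned stack_high_water(void)
{
    unsigned untouched = 0;
    while (untouched < PAINT_SIZE && paint_region[untouched] == PAINT_BYTE)
        untouched++;
    return PAINT_SIZE - untouched;
}
```

On real hardware or in an emulator you would paint the block at reset, play through the game, and then inspect the block (for example in a hex viewer) instead of calling a function. The result is a lower bound: it only reflects the code paths that actually ran, and a stack byte that happens to equal the pattern is not counted.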

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162033)

DRW wrote:

But an overflow check would require CPU time.

Yes, that's why it's not on by default.

I think the expected use is that you find a crash bug, then you recompile temporarily with the checks on and run it again. The program would run slower but it'd be able to diagnose the crash. (Similar to debugging memory faults with valgrind.)

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162034)

I would say that if I do an NES game that actually runs into a stack overflow, then my programming concept is a major fuck-up.

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-06 (#162042)

DRW wrote:

After all, if a stack overflow can happen, my program is faulty anyway, so it doesn't matter whether it crashes with a defined message or just glitches out.

That is if it glitches out so that you actually can recognize it was because of a stack overflow. The check gives you a guarantee.

Side note: Lua scripting support in an emulator like NDX/FCEUX would allow you to add the stack overflow check without consuming any NES CPU time.

Side note #2:

Quote:

But while there isn't a check for stack overflow, declaring the stack as a separate memory element is still a good thing:

Code:

RAM:     type = rw, start = $0200, size = $0300, define = yes;
SRAM:    type = rw, start = $0500, size = $0300;

Naming the memory section "SRAM" will cause confusion.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162043)

I thought SRAM stands for stack RAM. What does it mean instead?

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-06 (#162044)

DRW wrote:

I thought SRAM stands for stack RAM. What does it mean instead?

In electronics: Static RAM. In NES circles also Save RAM (sometimes confusingly).

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-06 (#162045)

Should the stack even be its own memory section or just another segment inside the RAM section?

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-06 (#162050)

DRW wrote:

Should the stack even be its own memory section or just another segment inside the RAM section?

Can't think of any benefit for making a separate memory section for it offhand.

Re: Efficiency of development process using C versus 6502
by rainwarrior on 2016-01-06 (#162051)

DRW wrote:

Should the stack even be its own memory section or just another segment inside the RAM section?

~~Create a new MEMORY block if you want to give it a fixed address (e.g. $780-7FF).~~

Otherwise you can create a new SEGMENT (then .res the stack, and setup sp to use the res) if you just want allocation protection. You can give the SEGMENT a fixed starting address with a start attribute, but it should be the last SEGMENT listed for that memory block in the config file.

Either way seems fine to me. A ~~MEMORY block~~ fixed address might help with debugging.

Edit: remembered you can align a SEGMENT easily. Can't think of a good reason to use a separate MEMORY block.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-07 (#162061)

rainwarrior wrote:

Otherwise you can create a new SEGMENT (then .res the stack, and setup sp to use the res) if you just want allocation protection.

Why is using .res necessary here? When I use define = yes in the config file, I have constant names that I can use for the stack. No need to declare a further variable name, as far as I see it.

Re: Efficiency of development process using C versus 6502
by thefox on 2016-01-07 (#162065)

DRW wrote:

rainwarrior wrote:

Otherwise you can create a new SEGMENT (then .res the stack, and setup sp to use the res) if you just want allocation protection.

Why is using .res necessary here? When I use define = yes in the config file, I have constant names that I can use for the stack. No need to declare a further variable name, as far as I see it.

For segments, __FOO_SIZE__ will be the actual size of the segment after all data has been placed in it. In your case, if you have a special segment for the stack but don't put anything in it, the size would be 0. (For memory areas, __FOO_SIZE__ would be the size defined in the linker config, and __FOO_LAST__ would point to one past the last actually used address.)

There are many ways to set this thing up.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-07 (#162066)

thefox wrote:

For segments, __FOO_SIZE__ will be the actual size of the segment after all data has been placed in it.

O.k., yes, that makes sense. But in the current case, the sizes for the RAM segments have to be set with absolute values anyway: Regular RAM from $0200 to $04FF and stack from $0500 to $07FF.

Because otherwise, the whole splitting of the RAM segment wouldn't make sense anyway. Or at least, the information in the config file wouldn't be self-contained, but would rely on the fact that I will declare the actual size somewhere in the source code. Which completely goes against the purpose of having such a config file in the first place, in my opinion.

So, I think in the case of RAM vs. stack, declaring absolute values in the config file and not declaring anything with .res for the stack is the most elegant way.

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-07 (#162088)

I was able to complete converting a metasprite rendering routine from C to 6502 asm. To my surprise, the resulting routine ran ~20% faster than my previous 6502 implementation (checked with screenshots and the monochrome bit trick), which I wrote without the aid of prototyping in a high level language first. I'm sold. C it is! I enjoy the "Tetris" like game of improving 6502 code, but...it's really nice to think high level first. I think I'm realizing I can have my cake and eat it too.

*edit* Man, I waffled back again today to thinking I'll be most comfortable with pure 6502 rather than moving to C. I think I just plain feel happier working that way. There's something ineffably satisfying about feeling like you're building something out of atoms, atom by atom, yourself. That's kinda lost when I try to jump to something else.

Re: Efficiency of development process using C versus 6502
by mikejmoffitt on 2016-01-07 (#162098)

That is a great example of a truth in comparisons between assembly and higher level languages: the best assembly will beat or at least speed match C, but using assembly doesn't guarantee that will happen!

Re: Efficiency of development process using C versus 6502
by GradualGames on 2016-01-08 (#162129)

In my example, it was more that if you don't put much thought into an algorithm, 6502 will certainly give you MUCH better performance than C without trying very hard. But then, C helped me re-think the algorithm more clearly, make sure it was correct, then re-translate to handwritten assembly...getting something faster than the original 6502 implementation. I.e. I never wrote a metasprite rendering routine in 6502 that was anywhere near as slow as the C one...that would be downright embarrassing!

aside: This is making me want to examine metasprite rendering routines by others. I tried searching for a thread on the topic but had not found one.

Re: Efficiency of development process using C versus 6502
by DRW on 2016-01-08 (#162130)

GradualGames wrote:

This is making me want to examine metasprite rendering routines by others. I tried searching for a thread on the topic but had not found one.

I wrote a thread about my sprite routine and asked for suggestions, but literally no person wrote anything on-topic, instead discussing totally unrelated stuff:
viewtopic.php?f=2&t=13451