Returning to SNES homebrew (WIP demo, tips and observations)

by HihiDanni on 2017-05-13 (#195698)

So I'd been suffering from burnout for a while. My initial goals were maybe a bit too lofty (that project is moving over to a higher-level platform - the modern desktop computer). But this stuff is so interesting to me that I still want to have a go at _something_. So right now I'd like to make a small project (a short shmup), then release an open source template/framework for folks to build games off of. My hope is that this will accelerate homebrew growth on the SNES.

In its current state, object management has more or less fallen into place. I have a system where objects set up function pointers to thinker functions where they can do as much or as little work as they need. I also have efficient object creation/deletion working with an object pool system for fast constant-time slot allocation - the overhead is small enough that creating an object takes less than 1% of CPU. Additionally, you will be able to slice the object list and pool (total size up to 128 objects) into several smaller regions for more efficient iteration over, say, bullets or particles.

I'd say only one or two major obstacles are left before I would consider my framework "fully-featured". The next one is some kind of level format and scrolling system, for which I am taking heavy inspiration from Sonic 2 (8x8 tiles form 16x16 tiles, which in turn form 128x128 tiles). The scrolling logic for updating the on-screen tilemap might be complex, but I have a few ideas involving lookup tables and indirect access to simplify and speed up the logic. My goal is to have bi-directional scrolling with a max speed of at least 16 pixels per frame along both axes. If it becomes efficient enough I may raise the limit to 32 pixels per frame per axis.
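To make the nested tile idea concrete, here is a minimal sketch in C (used as a stand-in for 65816 assembly so it can be compiled and tested on a desktop). It is not the framework's level format, which did not exist yet at the time of this post. The table names, the map dimensions, and the choice of 8x8 blocks per 128x128 area and 2x2 tiles per 16x16 block are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* All sizes and table names below are hypothetical. */
#define MAP_W_CHUNKS 4             /* level width in 128x128 areas */
#define MAP_H_CHUNKS 2             /* level height in 128x128 areas */
#define NUM_CHUNKS   2             /* distinct 128x128 definitions */
#define NUM_BLOCKS   4             /* distinct 16x16 definitions */

static uint8_t  level_map[MAP_H_CHUNKS][MAP_W_CHUNKS]; /* chunk id per 128x128 cell */
static uint8_t  chunks[NUM_CHUNKS][8][8];              /* block id per 16x16 cell */
static uint16_t blocks[NUM_BLOCKS][2][2];              /* tile id per 8x8 cell */

/* Resolve a world pixel position to the 8x8 tile id that covers it. */
uint16_t tile_at(unsigned px, unsigned py) {
    assert(px < MAP_W_CHUNKS * 128u && py < MAP_H_CHUNKS * 128u);
    uint8_t chunk = level_map[py >> 7][px >> 7];                 /* which 128x128 area */
    uint8_t block = chunks[chunk][(py >> 4) & 7][(px >> 4) & 7]; /* which 16x16 inside it */
    return blocks[block][(py >> 3) & 1][(px >> 3) & 1];          /* which 8x8 inside that */
}
```

Every step is a shift, a mask, and a table read, which is the kind of lookup-table access the scrolling code could lean on when it refreshes a row or column of the on-screen tilemap.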

Here's the latest build of my engine demonstrating object creation (hold A to spawn objects, move them off-screen to despawn them):

Attachment:

superfeather-objectcreation.sfc [256 KiB]

And now, I'd like to take this space to document a number of the observations I made while working on this homebrew, in the hope that it will benefit someone:

- I've found that there are two general themes to the work I've been doing: 1. Provide convenience while minimizing overhead, and 2. Approach the same problem from multiple angles - after designing a few implementations you can decide on which one performs the best.

- Try to design around the worst case performance scenario. So far whenever I've been implementing something I've been testing it by applying it to 128 objects at once every frame. This causes any overhead introduced by your systems to be effectively amplified, making it easier to see just how fast or slow your code is, as well as stress testing the system to see just how much you can throw at it. If your code is well-written you'll have all 128 objects moving and thinking at full speed with significant headroom for some extra processing and it will feel really good.

- Know what to optimize. Functions that are called many times per frame are the most obvious candidates. For example, in my CreateObject, MoveObject, and DeleteObject routines, I removed the php and plp instructions and documented the fact that these functions simply expect the A/X/Y registers to already be set to 16-bit size. However, for my SPC upload functions I let loose with 24-bit long addressing for developer convenience, because the overhead there is dominated by the main CPU waiting on the audio CPU to acknowledge each byte sent. The convenience is that you can reference audio data in a different bank to upload (something that's much easier to do with DMA, but you can't use that with the SPC, so...)

- Familiarize yourself with the processor status flags and the different addressing modes. There are a number of features that will simplify your code and thus let you achieve more efficient designs. In two of my routines (CreateObject and AddSprite), I set the overflow flag in the status register if the object doesn't fit. This is a convenient way to return a boolean value from a function and works hand in hand with branching since it also operates on the status flags.

- As for addressing modes, indirect addressing can be very useful. If you've ever worked with function pointers or similar concepts in languages such as C, you'll know what I'm talking about. This can save you time and code complexity trying to decide how to do X - instead you can just go ahead and tell the CPU to do X. I haven't fully explored the possibilities, but between the direct page, indirect addressing, and the index registers there is a lot that you can effectively automate.

- Find ways to exploit your routines to minimize further processing later on. I mentioned setting the overflow flag in AddSprite above - this happens if the sprite is outside the screen. The check is meant to prevent placing a pointless off-screen sprite into one of the OAM slots, but it also doubles as a way to perform a rough despawning check - if overflow is set you can do a more precise "out of screen" check only in that scenario, instead of every frame.

- Set your Break interrupt handler to something that halts the game. I only mention this because I know some tutorial projects assign this handler to an EmptyHandler function that immediately returns. This is an extremely bad idea: BRK is opcode $00, so if the program counter gets lost, it will go right over any zero bytes and keep on marching, causing havoc in your program without you knowing what went wrong.

- On that same note, you should definitely be using an emulator with good debugging facilities. Memory viewing, breakpoints, and program step-thru have been the most useful features for me.
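The AddSprite trick from the list above (use the "rejected" result as a cheap trigger for a more expensive despawn test) can be modeled roughly like this. This is a C sketch rather than 65816 code: the bool return value stands in for the overflow flag, and the sprite size, the despawn margin, and the function names are assumptions, not values taken from the framework.

```c
#include <assert.h>
#include <stdbool.h>

#define SCREEN_W       256
#define SCREEN_H       224
#define MAX_SPRITES    128         /* OAM holds 128 sprites */
#define SPRITE_SIZE    16          /* assumed sprite size in pixels */
#define DESPAWN_MARGIN 64          /* assumed slack before an object is removed */

static int sprites_queued;

/* Returns true when the sprite was NOT added (models "overflow flag set"). */
bool add_sprite(int x, int y) {
    bool off_screen = x <= -SPRITE_SIZE || x >= SCREEN_W ||
                      y <= -SPRITE_SIZE || y >= SCREEN_H;
    if (off_screen || sprites_queued >= MAX_SPRITES) {
        return true;
    }
    sprites_queued++;
    assert(sprites_queued <= MAX_SPRITES);
    return false;
}

/* Draw the object; only if drawing was rejected, run the precise bounds test. */
bool should_despawn(int x, int y) {
    if (!add_sprite(x, y)) {
        return false;              /* visible: no further checks this frame */
    }
    return x < -DESPAWN_MARGIN || x > SCREEN_W + DESPAWN_MARGIN ||
           y < -DESPAWN_MARGIN || y > SCREEN_H + DESPAWN_MARGIN;
}
```

In assembly the same shape would be a call followed by a branch on the overflow flag, so the common on-screen case pays for a single branch and nothing more.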

Hopefully you'll see more from me soon!

Re: Returning to SNES homebrew (WIP demo, tips and observati
by psycopathicteen on 2017-05-13 (#195716)

Quote:

Additionally, you will be able to slice the object list and pool (total size up to 128 objects) into several smaller regions for more efficient iteration over, say, bullets or particles.

So does that mean you can save memory with simple objects such as bullets? How do you get around fragmentation?

How much memory are you devoting to objects? I use direct page, but I'm limited to ~52 objects because I need to fit them into 8kB and I need room for other stuff.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by HihiDanni on 2017-05-13 (#195718)

The object pool system effectively works like this:

- Initialize a region of memory to include all the possible object indexes
- Use it like a queue (FIFO), so have two values to keep track of the current head and tail positions of this queue
- When creating an object, pop the value at the head of the queue and use it as the index of the new object
- When deleting an object, push the index of the deleted object at the tail of the queue
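The four steps above can be sketched as follows. This is a C model for illustration, not the framework's 65816 code; the function names and the free_count guard are assumptions added so the sketch can report an exhausted pool.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS 128            /* pool size quoted in the thread */

/* Ring buffer holding the index of every currently unused slot. */
static uint8_t free_slots[MAX_OBJECTS];
static uint8_t head;               /* position of the next index to hand out */
static uint8_t tail;               /* position where the next freed index goes */
static uint8_t free_count;         /* hypothetical helper: detects an empty pool */

/* Fill the queue with every possible object index. */
void pool_init(void) {
    for (int i = 0; i < MAX_OBJECTS; i++) {
        free_slots[i] = (uint8_t)i;
    }
    head = 0;
    tail = 0;
    free_count = MAX_OBJECTS;
}

/* Pop a free index from the head; returns -1 when every slot is in use. */
int pool_create(void) {
    if (free_count == 0) {
        return -1;
    }
    int slot = free_slots[head];
    head = (uint8_t)((head + 1) % MAX_OBJECTS);
    free_count--;
    return slot;
}

/* Push a released index onto the tail. Only pass slots that are live. */
void pool_delete(int slot) {
    assert(free_count < MAX_OBJECTS);
    free_slots[tail] = (uint8_t)slot;
    tail = (uint8_t)((tail + 1) % MAX_OBJECTS);
    free_count++;
}
```

Both operations touch one array element and one counter, so their cost does not depend on how many objects are alive.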

Essentially you're keeping track of a list of unused slots with this technique, so there is no need to look for an empty slot by iterating over the main object list. There's no fragmentation for creating/deleting objects, but there is fragmentation for processing each living object per frame (as it has to iterate over any empty slots). My justification for not avoiding that fragmentation:

1. "Moving" an existing object to another index would require updating every single reference to that object's index, which to me isn't desirable or even feasible
2. Because of 1, we have gaps formed by still-living objects - the only way I can think of to optimize this, besides keeping the object pool sorted (huge overhead), would be dynamically altering the size of the object list, and that won't improve the worst-case scenario, which is largely what I'm optimizing for

Basically my design philosophy for this is simple solutions to simple problems, unless a complex solution provides a sufficient convenience-to-performance ratio.

Edit: Thinking about this some more, I suppose one strategy would be to take a linked list approach to reduce overhead on having just a partially filled list, but again I am unsure what effect this would have on worst case.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by psycopathicteen on 2017-05-13 (#195719)

How much memory are you using per object, and are you using DP or X or Y to index objects?

Re: Returning to SNES homebrew (WIP demo, tips and observati
by HihiDanni on 2017-05-13 (#195720)

Objects use the index register - convention is to use X for the "current" object, and it's what the various routines expect for functions involving objects. Each object takes up 33 bytes. As for memory layout, each field is stored as its own 384-byte array (128 objects x 3 bytes), and thus the X register is always a multiple of 3. This is to facilitate the 24-bit coordinate system.

Of course, objects don't necessarily need to use 33 bytes each. I can increase or decrease the number of fields available for each object based on the remaining amount of the first 8kB of RAM.
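As a rough picture of that layout, here is a C model (not the framework's actual source). It assumes every field is 3 bytes wide, which gives 33 / 3 = 11 fields; the field numbering and helper names are made up for the example. Values are stored low byte first, as the 65816 does.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS   128
#define BYTES_PER_VAL 3                              /* room for one 24-bit value */
#define FIELD_SIZE    (MAX_OBJECTS * BYTES_PER_VAL)  /* 384 bytes per field */
#define NUM_FIELDS    11                             /* assumed: 33 bytes / 3 */

/* One array per field instead of one struct per object. */
static uint8_t fields[NUM_FIELDS][FIELD_SIZE];

enum { FIELD_POS_X = 0, FIELD_POS_Y = 1 };           /* hypothetical numbering */

/* Convert a slot number (0..127) into the X offset, always a multiple of 3. */
uint16_t slot_to_x(uint8_t slot) {
    return (uint16_t)(slot * BYTES_PER_VAL);
}

/* Read a 24-bit little-endian value from one field of the object at offset x. */
uint32_t read24(int field, uint16_t x) {
    assert(x + 2 < FIELD_SIZE);
    const uint8_t *p = &fields[field][x];
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Write the low 24 bits of value into one field of the object at offset x. */
void write24(int field, uint16_t x, uint32_t value) {
    assert(x + 2 < FIELD_SIZE);
    uint8_t *p = &fields[field][x];
    p[0] = (uint8_t)(value & 0xFF);
    p[1] = (uint8_t)((value >> 8) & 0xFF);
    p[2] = (uint8_t)((value >> 16) & 0xFF);
}
```

Because the same offset works for every field, one X value addresses all of an object's data, and under these assumptions the whole table is 11 x 384 = 4224 bytes, which fits in the first 8kB of RAM.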

Re: Returning to SNES homebrew (WIP demo, tips and observati
by psycopathicteen on 2017-05-16 (#195925)

That's good that you can get 33 bytes per object, because memory usage is something I need to improve on. I currently use 112 bytes per object, but I'm slowly getting rid of registers I don't need.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by Erockbrox on 2017-05-18 (#196152)

If you made an actual game engine for the snes I would totally make a game with it.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by psycopathicteen on 2017-05-23 (#196390)

The hard part with making a game engine is that you can't tell if it's 100% finished until you've finished a game with it. If I released my game engine as is, there would be no easy way to end the game, because I haven't got there yet.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by tepples on 2017-05-23 (#196391)

Even an incomplete 1-level tech demo using the engine would serve as a starting point for hacking in exit functionality.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by HihiDanni on 2017-06-10 (#197897)

A small update, not much to really show but still fairly important: documentation! I've started working on an HTML manual explaining all the different concepts in this framework, including coding conventions, how to use game objects and object pools, the audio driver, and troubleshooting and common mistakes. I rather doubt there are any API documentation generators for 65816 assembly, but I do find this process enjoyable. I hope that it will be a useful resource to folks, even outside of this specific framework.

I'm still doing work on the framework itself simultaneously. I recently found a way to speed up the object thinker iteration to the point where I'm happy with the overhead involved in doing two passes over 128 objects! For a bit I was concerned I would have to lower the default limit to 96 when I previously said the engine was capable of 128 objects, but then I realized that I was effectively doing this:

Code:

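jsr (objThinker, x)   ; call the current object's thinker
; Object thinker code
rts                   ; return to the loop
inx                   ; advance X to the next object
inx
inx
jsr (objThinker, x)   ; call the next object's thinker
; Object thinker code
rts                   ; return to the loop
inx                   ; advance X again
inx
inx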

etc...

Can you guess the optimization? That's right, the answer is to replace the rts/jsr with a simple jmp, and this quite literally cuts the iteration overhead in half! So now I am very much comfortable having a second pass over all the objects so that they can draw their final positions, providing more accurate information to the player while still remaining performance-focused!

Additionally, the framework has been ported to the ca65 assembler, and I am now working on the animation system.

Re: Returning to SNES homebrew (WIP demo, tips and observati
by psycopathicteen on 2017-06-10 (#197904)

How does the routine end?

Re: Returning to SNES homebrew (WIP demo, tips and observati
by HihiDanni on 2017-06-10 (#197906)

Object thinkers end by advancing X and then doing an indirect jump to the next object thinker. Use of rts is also supported, but is slightly slower. Iteration is terminated by a function pointer at the end of the list that jumps to a label straight after the loop, which avoids the need for compare/branch instructions on each iteration.
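A C model of that chaining might look like the sketch below. C has no direct equivalent of jmp (addr,x), so each thinker finishes with a call in tail position to the next table entry, which a compiler may or may not turn into a plain jump. The thinker names, the tiny list size, and the "idle" thinker for slots with nothing to do are assumptions for illustration, and the index advances by 1 here instead of 3.

```c
#include <assert.h>

#define MAX_OBJECTS 4              /* small for the example; the thread uses 128 */

typedef void (*thinker_fn)(int x);

/* One thinker per object, plus a sentinel entry that ends the iteration. */
static thinker_fn obj_thinker[MAX_OBJECTS + 1];
static int pos[MAX_OBJECTS];
static int visited;

/* What every thinker does last: advance and jump to the next thinker. */
static void next_thinker(int x) {
    assert(x < MAX_OBJECTS);
    obj_thinker[x + 1](x + 1);
}

static void think_move_right(int x) {
    pos[x] += 2;
    visited++;
    next_thinker(x);
}

static void think_idle(int x) {
    visited++;
    next_thinker(x);
}

/* Sentinel: does not chain onward, so control returns past the "loop". */
static void end_of_list(int x) {
    (void)x;
}

void setup_example(void) {
    obj_thinker[0] = think_move_right;
    obj_thinker[1] = think_idle;
    obj_thinker[2] = think_move_right;
    obj_thinker[3] = think_idle;
    obj_thinker[MAX_OBJECTS] = end_of_list;
    for (int i = 0; i < MAX_OBJECTS; i++) {
        pos[i] = 0;
    }
    visited = 0;
}

/* Kick off the chain; there is no loop counter and no end-of-list compare. */
void run_thinkers(void) {
    obj_thinker[0](0);
}
```

The only place that knows where the list ends is the sentinel entry itself, which is why no per-object comparison is needed.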