So I'd been suffering from burnout for a while. My initial goals were maybe a bit too lofty (that project is moving over to a higher-level platform: the modern desktop computer). But this stuff is so interesting to me that I still want to have a go at _something_. So right now I'd like to make a small project (a short shmup), then release an open source template/framework for folks to build games on. My hope is that this will accelerate homebrew growth on the SNES.
In its current state, object management has more or less fallen into place. Each object sets up a function pointer to its own thinker function, which can do as much or as little work as the object needs. Object creation and deletion are backed by an object pool that allocates slots in constant time; the overhead is small enough that creating an object takes less than 1% of the CPU time in a frame. You will also be able to slice the object list and pool (up to 128 objects in total) into several smaller regions, so that, say, bullets or particles can be iterated over on their own.
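To make the pool idea concrete, here is a minimal C sketch (the engine itself is 65816 assembly, so this is only an illustration). It assumes each region keeps a stack of free slot indices, which is one common way to get constant-time allocation and deletion; the post does not say which technique the engine actually uses. All names (`Region`, `create_object`, `delete_object`, `update_region`, `move_thinker`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: names and layout are illustrative, not the engine's. */
#define MAX_OBJECTS 128

typedef struct Object Object;
typedef void (*Thinker)(Object *self);

struct Object {
    Thinker think;   /* NULL means the slot is free */
    int16_t x, y;
    int16_t dx, dy;
};

/* A region is a contiguous slice of the object array with its own free-slot stack. */
typedef struct {
    uint8_t first;                   /* index of the first slot in this region */
    uint8_t count;                   /* number of slots in this region */
    uint8_t free_top;                /* number of entries on the free stack */
    uint8_t free_stack[MAX_OBJECTS];
} Region;

static Object objects[MAX_OBJECTS];

/* Mark every slot in [first, first + count) as free. */
static void region_init(Region *r, uint8_t first, uint8_t count) {
    assert((unsigned)first + count <= MAX_OBJECTS);
    r->first = first;
    r->count = count;
    r->free_top = 0;
    /* Push in reverse so the lowest index is handed out first. */
    for (int i = count - 1; i >= 0; i--) {
        objects[first + i].think = NULL;
        r->free_stack[r->free_top++] = (uint8_t)(first + i);
    }
}

/* O(1): pop a free slot, or return NULL when the region is full. */
static Object *create_object(Region *r, Thinker think) {
    if (r->free_top == 0)
        return NULL;
    Object *o = &objects[r->free_stack[--r->free_top]];
    o->think = think;
    o->x = o->y = o->dx = o->dy = 0;
    return o;
}

/* O(1): return the slot to its region's free stack. */
static void delete_object(Region *r, Object *o) {
    assert(o->think != NULL);
    assert(o >= &objects[r->first] && o < &objects[r->first + r->count]);
    o->think = NULL;
    r->free_stack[r->free_top++] = (uint8_t)(o - objects);
}

/* Run the thinker of every live object in this region only. */
static void update_region(Region *r) {
    for (int i = r->first; i < r->first + r->count; i++) {
        Object *o = &objects[i];
        if (o->think)
            o->think(o);
    }
}

/* Example thinker: move by velocity each frame. */
static void move_thinker(Object *self) {
    self->x += self->dx;
    self->y += self->dy;
}
```

Giving bullets and particles separate regions means `update_region` only walks the slots that belong to that kind of object, which is the kind of iteration saving the slicing is meant to provide.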
Right now I'd say only one or two major obstacles are left before I would consider my framework "fully-featured". The main one is a level format and scrolling system, for which I am taking heavy inspiration from Sonic 2 (8x8 tiles form 16x16 blocks, which in turn form 128x128 chunks). The logic for updating the on-screen tilemap while scrolling might be complex, but I have a few ideas involving lookup tables and indirect access to simplify and speed it up. My goal is scrolling on both axes with a maximum speed of at least 16 pixels per frame per axis. If it turns out efficient enough, I may raise the limit to 32 pixels per frame per axis.
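A minimal C sketch of how a three-level layout like this can be resolved, assuming power-of-two sizes so every step is a shift and a mask. The `Level` struct, the fixed 4x2-chunk level size and the function names are hypothetical; the post does not describe the final format. `columns_to_update` shows why a 16 px/frame cap keeps the tilemap work bounded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Sonic 2-style format:
 * 8x8 tiles -> 16x16 blocks (2x2 tiles) -> 128x128 chunks (8x8 blocks). */
#define LEVEL_W_CHUNKS 4
#define LEVEL_H_CHUNKS 2

typedef struct {
    uint8_t layout[LEVEL_H_CHUNKS][LEVEL_W_CHUNKS]; /* chunk IDs */
    const uint8_t  (*chunks)[8][8];                 /* chunk ID -> 8x8 block IDs */
    const uint16_t (*blocks)[2][2];                 /* block ID -> 2x2 tilemap entries */
} Level;

/* Resolve a world pixel coordinate to the 8x8 tilemap entry covering it. */
static uint16_t tile_at(const Level *lv, unsigned px, unsigned py) {
    assert(px < LEVEL_W_CHUNKS * 128u && py < LEVEL_H_CHUNKS * 128u);
    uint8_t chunk = lv->layout[py >> 7][px >> 7];                  /* /128 */
    uint8_t block = lv->chunks[chunk][(py >> 4) & 7][(px >> 4) & 7]; /* (%128)/16 */
    return lv->blocks[block][(py >> 3) & 1][(px >> 3) & 1];        /* (%16)/8 */
}

/* Number of new 8x8 tile columns that come into view when the camera
 * moves from old_x to new_x (the same logic applies to rows on Y). */
static unsigned columns_to_update(unsigned old_x, unsigned new_x) {
    int diff = (int)(new_x >> 3) - (int)(old_x >> 3);
    return (unsigned)abs(diff);
}
```

Because 16 is a multiple of 8, a move of at most 16 pixels crosses at most two 8x8 column boundaries, so at most two tilemap columns (and two rows) need rewriting per frame; a 32 px/frame cap raises that to four.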
Here's the latest build of my engine demonstrating object creation (hold A to spawn objects, move them off-screen to despawn them):
And now, I'd like to take this space to document a number of the observations I made while working on this homebrew, in the hope that they will benefit someone:
- I've found two general themes in the work I've been doing: 1. provide convenience while minimizing overhead, and 2. approach the same problem from multiple angles; after designing a few implementations, you can keep whichever one performs best.
- Design around the worst-case performance scenario. So far, whenever I've implemented something, I've tested it by applying it to 128 objects at once, every frame. This amplifies any overhead your systems introduce, making it easier to see just how fast or slow your code is, and it stress-tests the system to show how much you can throw at it. If your code is well written, all 128 objects will move and think at full speed with plenty of headroom left for extra processing, and it will feel really good.
- Know what to optimize. Functions that are called many times per frame are the most obvious candidates. For example, in my CreateObject, MoveObject, and DeleteObject routines, I removed the php/plp pair (which saves and restores the status register, including the register-width flags) and instead documented that these routines expect the A, X, and Y registers to already be set to 16-bit. My SPC upload functions, on the other hand, use 24-bit long addressing freely for developer convenience, because their cost is dominated by the main CPU waiting for the audio CPU to acknowledge each byte sent. The convenience here is that you can upload audio data that lives in a different bank (something that's much easier to do with DMA, but DMA can't feed the SPC, so...).
- Familiarize yourself with the processor status flags and the different addressing modes. Several of their features will simplify your code and let you achieve more efficient designs. In two of my routines (CreateObject and AddSprite), I set the overflow flag in the status register if the object doesn't fit. This is a convenient way to return a boolean from a function (set it with sep #$40, clear it with clv), and it works hand in hand with branching, because the caller can test it directly with bvs or bvc.
- As for addressing modes, indirect addressing can be very useful. If you've ever worked with function pointers or similar concepts in languages such as C, you'll know what I'm talking about. Instead of spending time and code deciding how to do X, you can store the address of the routine that does X and tell the CPU to jump straight to it (for example with jmp (table,x)). I haven't fully explored the possibilities, but between the direct page, indirect addressing, and the index registers there is a lot you can effectively automate.
- Find ways to exploit your routines to minimize processing later on. I mentioned setting the overflow flag in AddSprite above; this happens if the sprite is outside the screen. The check is meant to avoid wasting an OAM slot on an off-screen sprite, but it also doubles as a rough despawning check: you only need to run a more precise "out of screen" check when overflow is set, instead of on every object every frame.
- Set your BRK interrupt handler to something that halts the game. I only mention this because I know some tutorial projects assign this handler to an EmptyHandler function that immediately returns. This is an extremely bad idea: if the program counter gets lost, it will often run into zero-filled memory, and since $00 is the BRK opcode, each zero byte just calls the empty handler and returns. The CPU keeps marching on and wreaking havoc in your program without you knowing what went wrong.
- On that same note, you should definitely be using an emulator with good debugging facilities. Memory viewing, breakpoints, and program step-through have been the most useful features for me.
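The indirect-addressing tip can be illustrated with a jump table. In 65816 code the table would hold 16-bit routine addresses and be dispatched with something like `jmp (table,x)`, with X holding the state number times two; the minimal C sketch below uses an array of function pointers as the stand-in. The enemy states, timings and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enemy state machine dispatched through a table of
 * function pointers instead of a chain of compares and branches. */
typedef enum { STATE_IDLE, STATE_CHASE, STATE_FIRE, STATE_COUNT } EnemyState;

typedef struct {
    EnemyState state;
    int timer;
    int shots;
} Enemy;

typedef void (*StateFn)(Enemy *e);

static void state_idle(Enemy *e) {
    if (++e->timer >= 30) { e->timer = 0; e->state = STATE_CHASE; }
}

static void state_chase(Enemy *e) {
    if (++e->timer >= 60) { e->timer = 0; e->state = STATE_FIRE; }
}

static void state_fire(Enemy *e) {
    e->shots++;
    e->state = STATE_IDLE;
}

/* Indexed by state: picking the handler is a single indexed lookup. */
static const StateFn state_table[STATE_COUNT] = {
    [STATE_IDLE]  = state_idle,
    [STATE_CHASE] = state_chase,
    [STATE_FIRE]  = state_fire,
};

static void enemy_think(Enemy *e) {
    assert(e->state < STATE_COUNT);
    state_table[e->state](e);
}
```

Adding a state means adding one function and one table entry; the dispatch cost stays the same no matter how many states there are.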
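The overflow-flag tips (returning a boolean from AddSprite and reusing it as a cheap despawn filter) can be sketched in C, with a `bool` return value standing in for the V flag that the assembly routine would set with `sep #$40` and the caller would test with `bvs`. This assumes 16x16 sprites, a 256x224 screen, a 128-entry OAM buffer and a hypothetical 32-pixel despawn margin; none of these names or numbers come from the engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_W 256
#define SCREEN_H 224
#define OAM_SLOTS 128
#define DESPAWN_MARGIN 32   /* assumed slack before an object is removed */

typedef struct { int16_t x, y; } OamEntry;

static OamEntry oam[OAM_SLOTS];
static unsigned oam_used;

static void oam_clear(void) { oam_used = 0; }

/* Returns true ("V set") if the 16x16 sprite was NOT placed, either because
 * it is fully off-screen or because OAM is full; false if it was written. */
static bool add_sprite(int x, int y) {
    assert(oam_used <= OAM_SLOTS);
    if (x <= -16 || x >= SCREEN_W || y <= -16 || y >= SCREEN_H)
        return true;   /* off-screen: don't waste an OAM slot */
    if (oam_used == OAM_SLOTS)
        return true;   /* OAM full: the sprite doesn't fit either */
    oam[oam_used].x = (int16_t)x;
    oam[oam_used].y = (int16_t)y;
    oam_used++;
    return false;
}

/* The more precise check, only run when add_sprite reported "didn't fit". */
static bool far_offscreen(int x, int y) {
    return x < -16 - DESPAWN_MARGIN || x >= SCREEN_W + DESPAWN_MARGIN ||
           y < -16 - DESPAWN_MARGIN || y >= SCREEN_H + DESPAWN_MARGIN;
}

/* Per-object draw step: returns true if the object should be deleted. */
static bool draw_and_check_despawn(int x, int y) {
    if (!add_sprite(x, y))
        return false;           /* common case: on-screen, no extra work */
    return far_offscreen(x, y); /* rare case: pay for the precise check */
}
```

On-screen objects, which are the common case, pay only for the check AddSprite already does; the wider despawn test runs only for the few objects that failed it.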
Hopefully you'll see more from me soon!