After making several small NROM projects, I want to try and start working on a bigger project that I've had in mind for a while. I know that at the very least I'll probably need MMC1, as I'm thinking that WRAM will definitely be necessary. I've messed around with CNROM a little, but bank switching, especially PRG bank switching, is still a pretty foreign concept to me.
From what I understand (and by all means, correct me if I've got this all wrong) from having read the MMC1 page on the wiki, you can have part of the ROM be fixed, and have another part be swapped. I'm guessing that I'd want the code for my engine in the fixed bank, and then all of my .db's and code for each game state and whatnot in the banks that are switchable. I've also heard that things like interrupt vectors have to be added to the end of each bank. So I guess what I'm asking here is: what is generally put in the switched banks?
There are really only two critical requirements when it comes to what data can go in a bank:
1. Any bank that can appear at $FFFA-FFFF should have a reset vector (and NMI/IRQ vectors if needed). You might also need some reset stub code somewhere in the bank so it has somewhere to point to.
2. When playing a DPCM sample, the bank it is in should remain resident, or else there will be audible errors when that bank is switched out. (Only affects $C000-FFFF. MMC3 and FME7 have a convenient 8k bank at $C000-DFFF which helps greatly for DPCM sample banking.)
If you are using a mapper with a fixed upper bank (e.g. UxROM), then #1 is really easy to satisfy. If you're using a mapper that could have any bank on reset, you'll need a reset vector and reset stub code in each bank. #2 isn't relevant if you simply don't use DPCM (also makes controller reading simpler).
Otherwise, just make sure that the correct bank is switched in before you try to fetch data from it, or jump to code in it. NMI and IRQ can be tricky in some cases; if your NMI needs to do some bankswitching, you'll want it to put the banks back as they were before you return from it.
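A minimal sketch of that save-and-restore pattern in ca65 syntax, assuming a UxROM-style mapper (one bank register, bus conflicts, fixed bank at $C000). The names current_bank, bank_table, MUSIC_BANK and music_update are made up for illustration and are not from anyone's actual project.

```asm
MUSIC_BANK = 3              ; hypothetical bank number
.import music_update        ; hypothetical routine living in MUSIC_BANK

.segment "ZEROPAGE"
current_bank: .res 1        ; shadow copy of the bank the main thread wants

.segment "CODE"             ; assumed to be linked into the fixed bank
bank_table:                 ; bank_table[n] = n, so the value written
    .byte 0, 1, 2, 3, 4, 5, 6, 7   ; matches the ROM byte (no bus conflict)

; Main-thread bankswitch. In: A = bank to map in. Clobbers Y.
set_bank:
    sta current_bank        ; update the shadow copy first...
    tay
    sta bank_table, y       ; ...then write the mapper register
    rts

nmi:
    pha                     ; save A, X, Y
    txa
    pha
    tya
    pha
    ; (PPU updates would go here)
    lda #MUSIC_BANK
    sta bank_table + MUSIC_BANK   ; switch without touching current_bank
    jsr music_update
    ldy current_bank
    tya
    sta bank_table, y       ; put back whatever the main thread had mapped
    pla                     ; restore Y, X, A
    tay
    pla
    tax
    pla
    rti
```

The order inside set_bank matters in this sketch: because the shadow copy is written before the register, an NMI that lands between the two stores restores the bank the main thread was about to select, not the stale one.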
I personally am very fond of AxROM/BNROM 32k banking. It's not very good for DPCM, but having 32k banks makes it really easy to organize data. All the music code and data goes together in a single bank, graphics unpacking code and data goes in a bank together, level data and loading code goes together, etc.
I tend to treat going to another bank as a function call, i.e. I jsr to a special banked-code entry function, which bankswitches, does the thing it's there to do, then switches back to the original bank before returning.
rainwarrior wrote:
I personally am very fond of AxROM/BNROM 32k banking. It's not very good for DPCM, but having 32k banks makes it really easy to organize data. All the music code and data goes together in a single bank, graphics unpacking code and data goes in a bank together, level data and loading code goes together, etc.
What do you do if you have 192 KiB of map data and 192 KiB of tile data? What I'm doing in my current project, which runs on a 512 KiB oversize BNROM, involves sticking copy-to-RAM and unpack routines in 192 bytes of RAM. Or is it common to put the unpacker in all ROM banks?
tepples wrote:
What do you do if you have 192 KiB of map data and 192 KiB of tile data? What I'm doing in my current project, which runs on a 512 KiB oversize BNROM, involves sticking copy-to-RAM and unpack routines in 192 bytes of RAM. Or is it common to put the unpacker in all ROM banks?
I can't imagine that you need me to provide for you a general rule to solve such a specific problem. Do what seems to fit best for your case (or just do anything that gets the job done, really).
Sogona wrote:
you can have part of the rom be fixed, and have another part be swapped.
It depends on the mapper. Some have a fixed part, some don't (the MMC1 lets you choose). When you don't have a fixed part, it's common to simulate one by replicating small pieces of code across multiple banks. CPU vectors and a reset stub should be present in all banks, and trampoline code should be present in banks that "talk" to each other.
Quote:
I'm guessing that I'd want the code for my engine in the fixed bank, and then all of my .db's and code for each game state and whatnot in the banks that are switchable.
That really depends on how much space each part of your game needs. I like to optimize things for the main game, so I'd put as much of the main game engine in the fixed bank as possible (physics, object management, etc.), allowing the switchable part to be used for data (level maps) and less common code (some enemy A.I. maybe). This might mean putting anything not related to the main game engine (reset code, splash screens, menus, etc.) in separate switchable banks, so as to not waste any space with things that are not necessary during the most important part of the program.
Without a fixed bank, my approach would be to dedicate 1 or 2 banks to controlling the game states, and have everything else be data along with the functions that make use of that data. For example, banks with level data would also have a function to check for collisions between objects and the level map.
Quote:
I've also heard that things like interrupt vectors have to be added to the end of each bank.
Only when the mapper doesn't have a fixed bank where the CPU vectors are. The MMC1 has bankswitching modes where the vectors are switchable, so in order to be completely safe you should have a reset stub at the end of every 16KB bank. This could be unnecessary depending on the power up state of the MMC1, but I couldn't find any information about that.
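For illustration, here is a sketch of such a stub in ca65 syntax, following the commonly used pattern of wrapping it in a macro and an anonymous scope so it can be repeated in every bank. The handler names (reset, nmi, irq) and the STUBn segment names are assumptions; the linker config is assumed to place each STUBn segment in the last 16 bytes of its bank with a run address of $FFF0.

```asm
; Hypothetical handlers, all assumed to live in the last 16 KB bank.
.import reset, nmi, irq

.macro RESET_STUB_IN segname
.segment segname
.scope                      ; anonymous scope: the label may repeat per bank
stub_entry:
    sei
    ldx #$FF
    txs                     ; stack pointer = $FF
    stx $8000               ; a write with bit 7 set resets the MMC1, which
                            ; fixes the last bank at $C000-$FFFF
    jmp reset               ; safe now: the last bank is mapped at $C000
    .addr nmi, stub_entry, irq    ; vectors, landing on $FFFA-$FFFF
.endscope
.endmacro

; One invocation per 16 KB bank (segment names are assumptions).
RESET_STUB_IN "STUB0"
RESET_STUB_IN "STUB1"
RESET_STUB_IN "STUB2"
RESET_STUB_IN "STUB3"
```

The stub is 10 bytes of code plus 6 bytes of vectors, so it fits exactly in $FFF0-$FFFF. Because every bank holds identical bytes at those addresses, execution continues seamlessly even though the bank at $C000 may change in the middle of the stub.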
tepples wrote:
Or is it common to put the unpacker in all ROM banks?
I'm leaning towards this approach. When you use 32KB PRG-ROM banks, it's a given that you'll be wasting space with redundancy. I'd rather keep things simple and fast, even if that means losing 1, 2 or 3KB out of every 32KB. That's not such a big deal.
Sogona wrote:
I'm guessing that I'd want the code for my engine in the fixed bank...
tokumaru wrote:
I'd put as much of the main game engine in the fixed bank as possible...
It's not like code in a switchable bank runs any slower than in a fixed bank. The bankswitch itself takes a handful of cycles, but as long as your banking structure isn't requiring you to bankswitch 100 times per frame it's probably not an appreciable difference. Only bad usage patterns will make a significant impact.
Group stuff by how you use it. "Engine" is too vague a category. Think about the tasks you need to do each frame, and what code needs to be called by a lot of things, and what code/data only needs to be called within a small group. For example, you could probably group all your character drawing code, sprite rendering code, and related data (metasprites, etc.) into a single bank.
If you group your code early on, it's easier to move around later too. You could start with all code in the fixed bank, and then just move code groups out of it when you need more space there as the project grows. The only kind of stuff that really needs to be in the fixed bank are things that are called/needed/referenced by more than one switched bank. Otherwise it's just the same as any other place to put it.
rainwarrior wrote:
It's not like code in a switchable bank runs any slower than in a fixed bank.
No, but it is easier/faster to manage the switchable banks from the fixed bank, rather than having one switchable bank call another. Another point of saving most of the fixed bank for the most complex part of the game (the main engine) is to make the most of the limited address space. Having more of the main engine available means less need to bankswitch.
Of course this is completely irrelevant to 32KB bankswitching. Without a fixed bank, you can max out every bank with relevant code/data, and it's mandatory that you have switchable banks call other switchable banks.
My point is that while having a fixed bank is handy in the sense that you can make this bank manage everything, it also makes things less versatile because you can have less dynamic stuff loaded at any given time. This is why I think you should think carefully about what to put in the fixed bank.
Personally, I like to start putting stuff in switchable banks right from the beginning, because it's much harder to move stuff out of the fixed bank than it is to move it back in. Running out of space in the fixed bank can be quite annoying to deal with.
As for mappers, my personal preference (based on functionality, not price) is something like FME-7, because it has:
- 8 KB PRG-RAM bank at $6000..7FFF
- 8 KB PRG-ROM bank at $8000..9FFF (could be used for data)
- 8 KB PRG-ROM bank at $A000..BFFF (could be used for code that operates on the data at $8000..9FFF)
- 8 KB PRG-ROM bank at $C000..DFFF (could be used for DPCM samples)
- 8 KB PRG-ROM bank at $E000..FFFF (fixed bank, for vectors, trampolines, etc)
The possibility of having an 8 KB switchable bank for data, as well as another 8 KB switchable bank for code means that the amount of data can be extended very easily without having to duplicate code (as long as the code doesn't need to see more than 8 KB of the data at a time).
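A rough sketch of how that data/code split could be driven on the FME-7, in ca65 syntax. The mapper uses a command register at $8000 and a parameter register at $A000; the routine and constant names below are invented for the example, and the code is assumed to sit in the fixed bank at $E000.

```asm
FME7_COMMAND   = $8000      ; selects which internal register to write
FME7_PARAMETER = $A000      ; value for the selected register

FME7_PRG_8000  = $09        ; command: 8 KB PRG bank at $8000-$9FFF
FME7_PRG_A000  = $0A        ; command: 8 KB PRG bank at $A000-$BFFF
FME7_PRG_C000  = $0B        ; command: 8 KB PRG bank at $C000-$DFFF

.segment "CODE"             ; assumed to be linked into the fixed bank

; In: A = bank holding the data to read. Clobbers X.
map_data_bank:
    ldx #FME7_PRG_8000
    stx FME7_COMMAND
    sta FME7_PARAMETER
    rts

; In: A = bank holding the code that works on that data. Clobbers X.
map_code_bank:
    ldx #FME7_PRG_A000
    stx FME7_COMMAND
    sta FME7_PARAMETER
    rts
```

One thing to watch for with this two-write scheme: if an interrupt handler also writes the command register, it can land between the two stores above and leave a different command selected, so the handler and the main thread need some agreed way of avoiding that.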
UxROM (or MMC1 in 16 KB mode) is quite annoying with its only one switchable PRG bank. I think I'd prefer 32 KB banking to it, although I haven't tried to use that in a real project yet.
Could you guys please explain to me what trampoline code is?
A trampoline is used to jump from one piece of code to another.
Trampoline (computing) in Wikipedia.
On the NES, a trampoline might be used to jump from code in one bank to code in another bank.
With a single fixed bank, as in UNROM (or MMC3 if you're using $C000-$DFFF for audio), you usually put code that operates on a single bank of data in the same bank as the data and code that operates on multiple banks of data in the fixed bank.
Sogona wrote:
Could you guys please explain to me what trampoline code is?
A trampoline is a piece of code that's either in a fixed bank or replicated across multiple switchable banks (simulating a fixed bank) that allows switchable banks to call each other. For example, my game engine is running from a 32KB switchable bank, and I need to read level data from another switchable bank. To do this, I can JSR to a piece of code that's present in both banks at the same memory location, which will make the switch and JMP to the actual routine that reads the level data. Once the data is read, the program jumps back to the trampoline code, which swaps the old bank back and returns to the location after the original JSR.
It's sort of a slow process, so you don't want to be doing this hundreds of times per frame.
Alright, well I seem to have gotten it to work so far
rainwarrior wrote:
I personally am very fond of AxROM/BNROM 32k banking. It's not very good for DPCM, but having 32k banks makes it really easy to organize data. All the music code and data goes together in a single bank, graphics unpacking code and data goes in a bank together, level data and loading code goes together, etc.
I completely agree with this, but I still haven't decided what the best way to handle inter-bank calls is in this case.
I know about solutions that take the bank index and subroutine address as parameters (which can be passed in registers, global variables, or even in .db/.dw statements after the subroutine call) and handle everything dynamically, but that always seemed so slow to me, and sometimes crippling, if the solution in question prevents you from using registers and/or the stack for passing/returning actual parameters to/from the subroutines being called.
I have considered making a separate trampoline for each subroutine that can be called, because the bank and the address would always be known and wouldn't have to be passed around. It would also be faster than handling indices and addresses dynamically. The obvious disadvantage is the space these routines would occupy in every bank. I don't think the typical game would have that many subroutines scattered across different banks though, seeing as most banks would be occupied by data, and the few routines necessary to interact with that data.
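As a sketch of what one such dedicated trampoline might look like (ca65 syntax, BNROM-style 32 KB banks with bus conflicts): the names LEVEL_BANK, read_level_data, current_bank and the TRAMPOLINES segment are assumptions for illustration only. The segment is assumed to be linked at the same address, with identical contents, in every bank.

```asm
LEVEL_BANK = 2              ; hypothetical bank number
.import read_level_data     ; hypothetical routine living in LEVEL_BANK

.segment "ZEROPAGE"
current_bank: .res 1        ; shadow copy of the mapped bank

.segment "TRAMPOLINES"      ; same address and same bytes in every bank
bank_table:                 ; bank_table[n] = n (bus-conflict-safe writes)
    .byte 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

; Dedicated far call: target bank and address are hardcoded.
; Clobbers A and Y; X passes through in both directions.
call_read_level_data:
    lda current_bank
    pha                     ; the caller's bank still has to be saved
    lda #LEVEL_BANK
    sta current_bank
    sta bank_table + LEVEL_BANK   ; switch to the level bank
    jsr read_level_data
    pla
    sta current_bank
    tay
    sta bank_table, y       ; switch back to the caller's bank
    rts
```

The target bank and address no longer need to be passed in, which frees a register compared with a generic trampoline. The caller's bank still goes through RAM and the stack, though, because the stub's bytes must be identical in every bank and so cannot hardcode where it was called from.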
In the kinds of games I have designed, I would expect to switch banks at least once for each active object that collides with the level map, but maybe more than that if objects need more complex A.I. or bigger lookup tables. Then a few times more for scrolling purposes, loading new objects, that kind of thing. Then again in the NMI for different types of VRAM updates and audio updates. All things considered, I expect around 50 round trips to other banks every frame, so optimizing for speed does sound like a good idea in this case.
What are the solutions you have personally used, or seen other games using? What are their advantages and disadvantages?
Okay, well since you asked, here is how my upcoming BxROM (32k banking) game does bankswitching:
My game is based on "rooms", so collision data is unpacked to RAM between room transitions. As such, there's no need to bankswitch to a collision-data bank for each collision test. If you want to know specifically how my game is arranged, I have basically 5 types of bank:
1. Level data and unpacking code
2. CHR data and upload code
3. Music data and code
4. Character update code and data
5. "Main" bank containing game loops, player control and data, NMI handler, etc.
Banks 1/2 are mostly only used during room transitions, no performance issue there.
Bank 3 is called once per frame after the NMI handler finishes with the PPU.
Bank 4 is called once per frame to update all the characters.*
My "banked function call" trampoline is a piece of code at the same position in every bank that looks like:
Code:
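bus_conflict: .byte 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 ; table[n] = n
bank_call:                  ; in: A = target bank
    tay
    lda bank                ; stores previous bank number
    pha                     ; keep it on the stack for bank_return
    tya
    sta bus_conflict, Y     ; switch to the target bank
    jmp bank_entry          ; the target bank's entry code
bank_return:
    pla                     ; previous bank number
    tay
    sta bus_conflict, Y     ; switch back
    rts                     ; return to the original caller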
So you lda #target_bank then jsr bank_call. When the bank is finished, it does jmp bank_return, and the whole operation is basically a "long" jsr. It adds 30 extra cycles compared to a non-banked jsr/rts, and you can use X as a parameter to the call. Not horrible, and it could be optimized for specific cases easily.
If I had to bank for collision, my first approach would probably just be to try it with the banked call and see what the overall performance was like. An extra ~1800 cycles for 60 collision calls (this is an upper bound estimate, I think most frames might have 15 or fewer collision calls) doesn't seem too bad to me; it's not wonderful, but might be good enough. If it was a problem, maybe I'd think about batching multiple collision calls into a single call (collision tests often come in groups). My collision routines are actually kind of slow anyway, since the data is bit-packed to save RAM.
During unpause, I have to reload part of the room to redraw the screen area covered by the pause overlay. My level banks unpacking code has an option to unpack 64 bytes at a time, which gets placed in my NMI update buffer. You can use RAM to shuttle blocks of data between banks like this.
* As my game grew, eventually I had a lot of character code and needed a second bank for it, so at this point some character updates are behind an additional banked call. (There are maximum 16 characters at once, and 2 functions for update and draw, so at most ~960 extra cycles? Typically far fewer, often 0.) There is extra overhead, but it's easy to prioritize characters that appear in CPU-heavy rooms to the "primary" character bank where they don't have to bankswitch. Any characters that don't appear in performance critical areas of the game (i.e. most of them) can be freely moved to the auxiliary bank. I could probably split the character banks into one for update code and one for draw code, which might eliminate any extra bankswitching, but I'd rather keep all the code for a single character in one place (no good reason for this other than I don't want to do the work to separate them now that they don't fit in one bank; not going to do that work until I actually have a performance problem to solve with it).
I don't really know how other games do it. Battletoads seemed to have all its music in a single bank. I expect banks are kind of dedicated to a particular kind of level, e.g. a bank for the vertical platforming levels, a bank for the vehicle riding levels, etc. but I haven't really looked into it.
Tepples suggested dedicating a little bit of RAM for a trampoline, or putting some simple unpacking code in RAM to avoid having to duplicate it in many banks (if ROM space is tight). However, I tend to think of RAM as more scarce than ROM, especially when you have PRG banking available, so I'd rather trade ROM space for RAM in most cases. (If I had WRAM it might be a different story.) Depends on your RAM budget though, it can be perfectly fine to use some of it for code or some large transfer buffer.
rainwarrior wrote:
My game is based on "rooms", so collision data is unpacked to RAM between room transitions.
That makes a lot of sense. I could never handle maps like this in my engine, unless I used WRAM. 8KB would be enough for an entire level and all of its metatiles.
Quote:
1. Level data and unpacking code
2. CHR data and upload code
3. Music data and code
4. Character update code and data
5. "Main" bank containing game loops, player control and data, NMI handler, etc.
I see. I'm working with a very similar structure, but since I still don't have much data or code ready, I haven't actually distributed everything across the correct banks yet.
My main problem is having to constantly jump back and forth between the object A.I. bank and the current level's bank, for collision purposes, since I can't possibly buffer a decent part of the level in RAM without resorting to WRAM. I guess this isn't such a big problem though... If a generic subroutine for inter-bank calls ends up being too slow, I can always optimize the few cases when it does make a difference.
Quote:
If I had to bank for collision, my first approach would probably just be to try it with the banked call and see what the overall performance was like. An extra ~1800 cycles for 60 collision calls (this is an upper bound estimate, I think most frames might have 15 or fewer collision calls) doesn't seem too bad to me; it's not wonderful, but might be good enough.
Yeah, that's probably the most sensible approach.
Quote:
* As my game grew, eventually I had a lot of character code and needed a second bank for it, so at this point some character updates are behind an additional banked call.
That's something I fear will happen to me too.
Quote:
(There are maximum 16 characters at once, and 2 functions for update and draw, so at most ~960 extra cycles? Typically far fewer, often 0.) There is extra overhead, but it's easy to prioritize characters that appear in CPU-heavy rooms to the "primary" character bank where they don't have to bankswitch. Any characters that don't appear in performance critical areas of the game (i.e. most of them) can be freely moved to the auxiliary bank.
That also makes a lot of sense.
Quote:
I could probably split the character banks into one for update code and one for draw code, which might eliminate any extra bankswitching
That's actually a pretty good idea. I was actually avoiding having to use separate update and draw routines because I didn't want to iterate over the objects twice, but I often find problems that could be easily solved by making that separation. Being able to do all the updates and then all the drawing by just bankswitching only once for each loop does sound like less overhead than constantly bankswitching for different objects.
Quote:
I don't really know how other games do it.
I looked at a few games, but didn't want to dedicate hours of debugging to understand them completely. I was surprised at the amount of hardcoded bank indices used, which I guess supports your theory that banks are dedicated to particular kinds of levels. A great deal of AxROM games were developed by RARE, so the techniques used don't differ very much. The Color Dreams games are messier.
Quote:
However, I tend to think of RAM as more scarce than ROM, especially when you have PRG banking available, so I'd rather trade ROM space for RAM in most cases.
I guess I agree with you.
This is an interesting topic. A while ago, I upgraded my NROM project to MMC1, 128k of PRG with CHR-RAM. I sort of left off at a point where I would have to make a lot of decisions about what goes where, and it seems that I've run out of room in the fixed bank ($C000-$FFFF). I think it really boils down to a few scenarios that present a problem. Running out of space in the fixed bank is, of course, one of them. But if you're executing a routine that performs a lot of reading, you obviously need to make sure that both the code and data are accessible at the same time. So you can't have your map-loading routine in the switched bank (assuming 1 switchable bank), and your map data in a different bank (assuming it's not in the fixed bank). You can put the map reading code (or the map data, but I don't see why you would) in the fixed bank, or put the map reading code AND the map data in the switched bank, or put the map data in RAM. The other problem scenario is if you're making frequent calls to a routine. Bankswitching to get to it a bunch of times will cost a lot of cycles.
It really depends on how often you will need to read from ROM, and from what routines. In any case, anything that I call often, and in any game mode, I will definitely put in the fixed bank. This is mostly stuff like random number generation, or multiplication code (then again, these may come with look-up tables, which means the code and its tables both have to be accessible at the same time). To eliminate tons of bankswitching, I would try to pair most code/data together in their own banks, if the two can fit. For instance, in my game, the sound data + sound engine won't exceed 16 KB, so I can put those two together. Map data might spread across a few banks, so the routines that read from ROM will be a part of the fixed bank.
Sometimes you won't have enough resources to follow the same logic for all components of your engine (you run out of space in the fixed bank, or run out of cycles to handle bankswitching, etc.), so you will end up doing a mixture of methods. When it comes down to that, it's more of an art than a science, and weighing sacrifices can be difficult.
Personally, I find it easier to structure things around 32KB banks than dealing with a fixed 16KB bank. 16KB is way too big for a fixed bank, which makes it harder to have everything you need mapped in. I honestly can't think of 16KB of stuff I would like to have mapped at all times, so I feel like I'd be wasting part of my precious addressing space with stuff I don't need.
IMO, the ideal setup would be 4 x 8KB without a fixed bank, although a fixed bank isn't so troublesome if you have 3 switchable 8KB banks, which make combining code/data much simpler. These mappers are pretty rare/expensive though, so I don't feel like using them.
I always thought that 32KB switching would be too limiting, but I'm enjoying being able to give each bank control of the entire address space. The only thing I'm worried about is having to switch between the level map and the object A.I. banks way too often because of collision detection, but I'm sure that can be optimized somehow if it really does become a problem.
I've honestly never even considered 32KB bankswitching. In fact, I'm sure I forgot that it existed because I automatically assumed it was a silly idea. But now that you mention it, it sounds quite interesting. So the advantage it really adds is not committing to a single fixed bank, as well as being able to spread things freely across the 32KB of data. I can see how those two things would be useful.
I'm not sure how I feel about 8KB banks. I feel like it would be hard to keep things neatly within banks, and make sure you don't end up with wrong address references by switching a bank that should go to $A000-$BFFF into $8000-$9FFF for some reason. Performance-wise, you could probably get a lot out of it, but on the maintenance end, it seems challenging.
What I'm finding with the single 16KB bank is that I have an obscure collection of code in it, with some random procedures being there just because I make frequent calls to them. It's a little harder to maintain, but it helps reduce overhead. The situation you present is the one I'm most concerned about when it comes to platformers. Thankfully my project allows me to decompress enough of the level into RAM so that I can access that information for collision detection, but I imagine with bigger/more complex projects, this may not be so easy. Decompression into RAM requires not only the space in RAM, but the cycles it takes to get that information in RAM. Depending on your compression style, it may or may not be worth it.
tokumaru wrote:
IMO, the ideal setup would be 4 x 8KB without a fixed bank.
How about 8 x 4k with no fixed bank + CHR RAM? ^_^
INL NSF
If by (right now it still seems like an off) chance I run out of room in my fixed $C000 bank, what other mappers would you guys recommend? How does something like FME-7 compare to something like MMC3?
I wouldn't recommend a change to a radically different mapper late in development. That could mean a complete reorganization of your code and data. I think that if you run out of space in the fixed bank (something that could happen with any mapper), you'll simply have to move things that are not used as often into the switchable banks and deal with the overhead of having to switch banks in order to access those things.
As long as you're not using cartridge RAM, and don't switch between the two different memory layouts that MMC3 supports, adapting your code from this subset of MMC3 to a subset of FME-7 with a bank at $6000 should be easy.
lidnariq wrote:
a subset of FME-7 with a bank at $6000 should be easy.
Hum... I hadn't considered the fact that the FME-7 can give you an extra 8KB slot. Yeah, I guess that would be one way to get a little more out of the limited addressing space. Most NES games shouldn't need something like that though...
It may sound weird, but one of the approaches I've been somewhat promoting with GTROM board (which has 32kB banks) is an arbitrary-size fixed bank. This comes at the expense of maximum program size. So if you want a 4kB fixed bank, and the mapper has 16 banks, the cost is 4kB * 16 = 64kB out of 512kB. From a production point of view, a 512kB flashrom is about $1 today, and I'm not going to care about burning off 13 cents worth of memory. That's pretty close to the cost of adding the extra hardware to the mapper for a real fixed bank, but there are further savings from not being set up to produce multiple boards with different options.
Right now my cart programming setup only supports 16kB fixed page, or 16 bytes fixed page (really just $FFF0-$FFF9) which is required for the bootloader reset. I figured if I wanted to, I could use just a couple bytes to specify the size and location of the fixed bank. And the loader would copy that memory from the first 32kB, and ignore it in the others. Of course, by the time you go through the trouble of padding out your .NES file however you like, you might as well just make the fixed bank yourself with the .INCBIN directive.
Not really suggesting you do what I'm talking about, but figured I'd throw in this oddball suggestion since it's a different way to look at 32kB banking.
Seeing as I'm only using about 192 bytes of code in RAM for inter-bank fetches, I'll have to see how much the half-kilobyte fixed bank approach will cause me to switch things around.
How would the .incbin method be done in something like ca65? It'd have to export the addresses of the entry points into fixed bank code somehow, unless we want to populate $FE00, $FE03, $FE06, $FE09, etc. with JMPs.
tepples wrote:
How would the .incbin method be done in something like ca65? It'd have to export the addresses of the entry points into fixed bank code somehow, unless we want to populate $FE00, $FE03, $FE06, $FE09, etc. with JMPs.
I don't think I ever found out a clean way to output the same code multiple times with ca65/ld65 without resorting to .scope, but maybe there is a way I overlooked. My 'best method' ended up being to have the linker output to different files, which I manually combined with a big ugly 'copy /b' command. For a different assembler I probably would have gone straight for .incbin, and one way I've thought it could be done (haven't tried this exactly) is to .incbin in the other 32kB banks the same code that you're assembling in the first 32kB bank. Of course it's not available yet because it hasn't been assembled, so you have to run the build process twice when you change any code in the fixed bank. To assemble the program for the first time, you would need to include a dummy file so it can assemble without error and "prime the pump". Sounds kinda hacky, but I think that should probably work with any assembler.
edit: To summarize, the fixed bank source is included when the whole program is assembled, but at the same time is included in binary form only for banks after the first one, to avoid all the label conflict errors.
edit 2: On second thought, maybe you could include a separate .asm file once in the global scope, then the same file again multiple times in their own unique scopes that are never actually invoked. The code would be there in the same place, but the main program would only see the addresses from the global scope.
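A rough, untested-in-this-thread sketch of that second idea in ca65 syntax. The segment names FIXED0..FIXED2 and the file name fixed.s are assumptions: the linker config is assumed to give each FIXEDn segment the same run address in a different bank, and fixed.s is assumed to contain only code and tables, with no .segment, .export or RAM reservations of its own.

```asm
; Copy for the first bank, assembled in the global scope.
; These are the labels the rest of the program actually refers to.
.segment "FIXED0"
.include "fixed.s"

; Further copies: same bytes at the same addresses, but every label
; is hidden inside a named scope, so nothing is defined twice.
.segment "FIXED1"
.scope fixed_copy_1
.include "fixed.s"
.endscope

.segment "FIXED2"
.scope fixed_copy_2
.include "fixed.s"
.endscope
```

Since all copies are linked at the same address, it does not matter whether a reference inside a scoped copy resolves to the local label or to the global one; both have the same value. Anything the file references outside itself is looked up in the enclosing scope as usual.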
I keep on thinking that there should be some way of (mis?)using cc65's overlays or extended memory drivers to present an interface for banking, but I haven't quite figured it out.
Memblers wrote:
Of course, by the time you go through the trouble of padding out your .NES file however you like, you might as well just make the fixed bank yourself with the .INCBIN directive.
I'm certainly using the "virtual fixed bank" approach, but I don't expect to make it more than a few hundred bytes long. And I don't use .incbin, I just include a source file that includes everything that goes in the fixed area.
tepples wrote:
How would the .incbin method be done in something like ca65?
I don't know about ca65, but if anyone reading this is wondering about ASM6, you can have a fixed section by creating a file like this:
Code:
;place this block at the very end of the program bank
.org $10000 - (+FixedSectionEnd - +FixedSectionStart)
Mapper.FixedSection = $
+FixedSectionStart:
.include "functions\Mapper.CallFunction.asm"
.include "interrupts\System.NMI.asm"
.include "interrupts\System.IRQ.asm"
.include "interrupts\System.Reset.asm"
.include "tables\Mapper.BankIndices.asm"
.include "tables\Mapper.MappedBankIndex.asm"
.include "tables\System.InterruptVectors.asm"
+FixedSectionEnd:
And then include it at the end of every bank. To avoid "label already defined" errors, you have to declare global labels like I did above (inside all the included files too), instead of the usual name + colon. I'm so used to this that I define every global label like this now, even the ones that are not used multiple times.
My strategy is to link each bank separately. That way the linker config has a fixed segment, and I have one fixed source file (fixed.s) that puts code in that segment. I assemble that once into an object file, and that same object gets linked into every bank.
Cross-bank symbols can be done too. Create a dummy segment for the other bank that does not output to the file, and simply include its objects in the link, which makes all of its exported symbols available in the current bank.
Mind if I step back to the basic basics of PRG bank switching? I'm just now trying to figure this out. It's a bit more tricky than CHR bankswitching.
I've got a lot of questions, and I couldn't really seem to find a resource for the things that are tripping me up.
For starters, I just want to even get my code in my game program in a way that it will run, before even bankswitching.
So, I figured I'd assemble my code and then include it in my program as a binary file. I couldn't use a normal include or I'd get warnings about labels already defined. Is this how this is typically done?
Do I need to be sure to .org to the end of the bank? For ex. $9FFF, and store some data, so that my file is exactly 8KB?
I put the banks as an incbin in my main program, and that worked fine as long as my program was still set to be 32KB, basically, ignoring those extra banks.
If I set my program to $04 banks, the program crashed. So I looked and saw that the fixed banks are the last and second-to-last banks. That throws off the way I have this set up.
I moved the .incbins to the top of the program, before .org $8000. Now it works. I'm wondering though, without writing a bank switch, how do I know which swappable banks are loaded? At this moment, they're all exactly the same so I couldn't tell.
I could of course swap the banks whenever, but I guess what I'm asking is, what are the default values of R6 and R7 on MMC3 $8000?
Edit: Through trial and error I found that $8000 gets set to the first bank, so I'm guessing that $A000 gets set to the second.
darryl.revok wrote:
So, I figured I'd assemble my code and then include it in my program as a binary file. I couldn't use a normal include or I'd get warnings about labels already defined. Is this how this is typically done?
For MMC3, probably not. My own MMC3 project code is set up so I can define which bank a piece of code or data should end up in, and the linker handles actually placing it into the correct bank.
darryl.revok wrote:
Do I need to be sure to .org to the end of the bank? For ex. $9FFF, and store some data, so that my file is exactly 8KB?
An appropriate linker configuration will ensure banks are the correct size automatically.
darryl.revok wrote:
I put the banks as an incbin in my main program, and that worked fine as long as my program was still set to be 32KB, basically, ignoring those extra banks.
If I set my program to $04 banks, the program crashed. So I looked and saw that the fixed banks are the last and second-to-last banks. That throws off the way I have this set up.
For MMC3, $E000-$FFFF is always fixed to the last bank. The second to last bank is also always present, but it can be switched between appearing at $C000-$DFFF and at $8000-$9FFF (so that you can have DPCM in a switchable bank).
darryl.revok wrote:
I moved the .incbins to the top of the program, before .org $8000. Now it works. I'm wondering though, without writing a bank switch, how do I know which swappable banks are loaded? At this moment, they're all exactly the same so I couldn't tell.
You could tell by having a canary at the same offset in every bank and checking its value to determine which bank is loaded, but other than that, there is no way to tell.
darryl.revok wrote:
I could of course swap the banks whenever, but I guess what I'm asking is, what are the default values of R6 and R7 on MMC3 $8000?
There are no default values. Don't trust emulators.
darryl.revok wrote:
Edit: Through trial and error I found that $8000 gets set to the first bank, so I'm guessing that $A000 gets set to the second.
There are no default values. Don't trust emulators.
darryl.revok wrote:
It's a bit more tricky than CHR bankswitching.
It's definitely trickier, since code runs off of PRG-ROM.
Quote:
For starters, I just want to even get my code in my game program in a way that it will run, before even bankswitching.
On MMC3, that's as simple as putting everything in the last 8KB of the ROM.
Quote:
So, I figured I'd assemble my code and then include it in my program as a binary file. I couldn't use a normal include or I'd get warnings about labels already defined. Is this how this is typically done?
There are ways around that, depending on the assembler you're using. In ASM6, for example, I define labels like this: MyLabel = $, instead of the typical MyLabel:, whenever I have to repeat the label in multiple banks.
Quote:
Do I need to be sure to .org to the end of the bank? For ex. $9FFF, and store some data, so that my file is exactly 8KB?
Again, depends on the assembler. In ASM6, .org $A000 will do, there's no need to store any data. I believe .align $2000 will work too, and it looks cleaner because you don't have to write specific addresses.
Quote:
If I set my program to $04 banks, the program crashed. So I looked and saw that the fixed banks are the last and second-to-last banks. That throws off the way I have this set up.
On MMC3, the last bank is guaranteed to be mapped to $E000-$FFFF, so that's where the reset code should be. It's a good idea to put the NMI and IRQ handlers there too.
Quote:
I moved the .incbins to the top of the program, before .org $8000. Now it works. I'm wondering though, without writing a bank switch, how do I know which swappable banks are loaded? At this moment, they're all exactly the same so I couldn't tell.
I usually have a .db statement write the bank index at a fixed location in every bank, so I can easily look that up to see what is mapped where.
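A minimal ASM6 sketch of that bank-index byte, using the Label = $ form so the same name can be declared in every bank. The name BankID, the choice of $8000 as its address, and the two example banks are assumptions made for illustration:

```asm
;bank 0
.base $8000
BankID = $          ;redefinable symbol, so every bank may declare it
.db 0               ;index of this bank
;STUFF
.align $2000

;bank 1
.base $8000
BankID = $          ;same address ($8000) as in bank 0
.db 1               ;index of this bank
;STUFF
.align $2000
```

With this in place, lda BankID returns the index of whichever bank currently sits at $8000-$9FFF.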
Quote:
I could of course swap the banks whenever, but I guess what I'm asking is, what are the default values of R6 and R7 on MMC3 $8000?
You shouldn't rely on default values. Different revisions of the chips behave differently, and most don't even have default values at all. Much like PPU registers, mapper registers have to be manually initialized in order to be considered stable.
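As a rough illustration of initializing the PRG registers by hand, here is a fragment of a reset routine for MMC3. It writes the register number to $8000 and the bank number to $8001. The label Reset and the choice of banks 0 and 1 are assumptions for this sketch, and the rest of a real init sequence is left out:

```asm
;Runs from the fixed bank at $E000-$FFFF.
;Banks 0 and 1 are arbitrary choices for this sketch.
Reset:
    sei
    cld
    ldx #$FF
    txs             ;set up the stack
    lda #$06        ;select R6; bits 6-7 clear = PRG mode 0, CHR mode 0
    sta $8000
    lda #0          ;bank 0 -> $8000-$9FFF
    sta $8001
    lda #$07        ;select R7
    sta $8000
    lda #1          ;bank 1 -> $A000-$BFFF
    sta $8001
    ;...rest of init (PPU warm-up, RAM clear, R0-R5 for CHR, and so on)
```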
Quote:
Edit: Through trial and error I found that $8000 gets sets to the first bank, so I'm guessing that $A000 gets set to the second.
Don't rely on that. Only the fixed bank is guaranteed to be mapped to a specific location.
Joe wrote:
and the linker handles actually placing it into the correct bank.
This is probably pretty basic, but what's a linker? This, I imagine:
https://en.wikipedia.org/wiki/Linker
Since the page says it compiles files, does that mean you're working with a higher-level language, or is that just a semantic?
So, am I correct in understanding that you built an app to put your project files together? Does it take the source code or binaries?
Quote:
There are no default values. Don't trust emulators.
Gotcha. Load desired map bank at startup.
Tokumaru wrote:
In ASM6, for example, I define labels like this: MyLabel = $, instead of the typical MyLabel:,
Okay, I'm trying to figure this out. I also am using ASM6.
So, any label that gets called from another bank will have to have a .org, so it can target the same place in any bank, let's say for example, MapStart at $8000.
So, would I define MapStart=$8000, so then a reference to MapStart would just call that point in the program counter? Then it doesn't seem like I'd need a "MapStart:" label. So I'd probably just put a commented out label for my own reference.
Is that the correct understanding?
Quote:
In ASM6, .org $A000 will do, there's no need to store any data.
$A000 doesn't write one byte into the next bank?
Quote:
It's a good idea to put the NMI and IRQ handlers there too.
Unless something changes, I'm going to have a unique NMI and IRQ for each level type. I was planning on having the NMI vector actually point to a trampoline that loads the right NMI bank in place of the map bank. I'll have to do the same for IRQ, and take that latency into consideration.
It's unfortunate that the MMC3 only has two swappable banks and one needs to be dedicated to music.
darryl.revok wrote:
It's unfortunate that the MMC3 only has two swappable banks and one needs to be dedicated to music.
Look into FME-7, then; INL has a CPLD-based clone for it, in addition to his MMC3 clone.
(Or look into my summary of more complicated mappers on the wiki)
darryl.revok wrote:
Okay, I'm trying to figure this out. I also am using ASM6.
Good, this means I can actually help you with this.
Quote:
So, any label that gets called from another bank will have to have a .org, so it can target the same place in any bank, let's say for example, MapStart at $8000.
If you have data of varying sizes, yes, you're going to have to .org the labels in place. Personally, I find it strange that you'd want something like level maps to be at fixed locations, since that would mean wasting space when levels are small. Normally you'd use pointers for this, and static positions only for data that's exactly or nearly the same size across banks.
For things that are the same size, there's no need to use .org statements for each label... just start from a known address (e.g. $8000) and include the things in the same order from there.
Quote:
So, would I define MapStart=$8000, so then a reference to MapStart would just call that point in the program counter? Then it doesn't seem like I'd need a "MapStart:" label. So I'd probably just put a commented out label for my own reference.
"$" without a hex number is a special symbol that means "the current PC". So doing Label = $ (no number!) creates a symbol containing the current PC, which is similar to how labels work, but, unlike labels, symbols can be redefined, so you can have that command many times without errors. Just make sure that you do it at the same address every time, otherwise "Label" will point to different places and your code will be buggy.
Quote:
$A000 doesn't write one byte into the next bank?
No. It prepares to write data from that point on (having padded all the way up there), but it doesn't write anything if you don't do anything that actually outputs data.
Creating 8KB banks for the MMC3 can be as simple as this:
Code:
;bank 0 (switchable; assembled for $8000-$9FFF)
.base $8000
;STUFF
.align $2000
;bank 1 (switchable; assembled for $8000-$9FFF)
.base $8000
;STUFF
.align $2000
;bank 2 (hardwired to $C000-$DFFF)
.base $C000
;STUFF
.align $2000
;bank 3 (hardwired to $E000-$FFFF)
.base $E000
;STUFF
.align $2000
Quote:
Unless something changes, I'm going to have a unique NMI and IRQ for each level type. I was planning on having the NMI vector actually point to a trampoline that loads the right NMI bank in place of the map bank. I'll have to do the same for IRQ, and take that latency into consideration.
Then make sure the trampoline is in the fixed bank, that it swaps the actual handlers in before jumping to them, and that it restores the previous bank before returning. Also make sure it copes with having interrupted a mapper write taking place in the main thread.
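One possible shape for such a trampoline on MMC3, sketched in ASM6. The RAM variables (BankSelect, CurBank6, NMIBank, NMIPtr) and the helper SetBank6 are hypothetical names, and the scheme assumes every piece of main-thread code stores to BankSelect before it writes $8000, so the NMI can redo an interrupted register selection on the way out:

```asm
.enum $0010
BankSelect: .dsb 1  ;copy of the last value written to $8000
CurBank6:   .dsb 1  ;bank the main thread wants in R6
NMIBank:    .dsb 1  ;bank holding the current NMI handler
NMIPtr:     .dsb 2  ;address of that handler
.ende

;main thread: map bank A into $8000-$9FFF
SetBank6:
    sta CurBank6
    ldx #$06
    stx BankSelect  ;record the selection before touching the mapper
    stx $8000
    sta $8001
    rts

;lives in the fixed bank; the NMI vector points here
NMI:
    pha
    txa
    pha
    tya
    pha
    lda #$06
    sta $8000
    lda NMIBank
    sta $8001       ;swap the handler's bank in
    jsr CallHandler
    lda #$06
    sta $8000
    lda CurBank6
    sta $8001       ;put the main thread's bank back
    lda BankSelect
    sta $8000       ;redo a register selection we may have interrupted
    pla
    tay
    pla
    tax
    pla
    rti

CallHandler:
    jmp (NMIPtr)    ;the handler ends with RTS, returning into NMI above
```

Because the trampoline does the RTI itself, the banked handlers are plain subroutines. An IRQ trampoline could follow the same pattern with its own bank and pointer variables.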
Quote:
It's unfortunate that the MMC3 only has two swappable banks and one needs to be dedicated to music.
If you want to bank switch DPCM samples, yes, there's no other way. Otherwise you can temporarily switch the audio stuff in, update the APU, and switch it out just fine.
tokumaru wrote:
Personally, I find it strange that you'd want something like level maps to be at fixed locations, since that would mean wasting space when levels are small. Normally you'd use pointers for this, and static positions only for data that's exactly or nearly the same size across banks.
I'm going to tackle this first, since before I start doing PRG switching, I need to make my code work for it.
I'm still debating how to best set this up. I did a little bit of planning. I have metatiles in fixed locations at the end of the bank. I've also got various data about the level, such as number and position of scanline splits, in fixed locations.
As for the map value, don't I need a fixed value for calculating my pointer locations?
For my levels, data is stored in columns. I can use hScroll Hi and Lo to calculate a value which designates the offset from MapStart, but don't I need a starting position for the map which is going to be the same in every bank?
I suppose though that I could have data fixed to the beginning of the bank, and then that reads the pointer value for MapStart, as well as pointers for the Metatile locations and otherwise.
I thought about trying to squeeze in enemy data if it's an enemy type unique to that level, or the NMI for that scroll type if I can fit it. Maybe just the IRQ for that level. If I used fixed values though, the amount of space I had available in every bank would be determined by the amount of space available in the most full bank.
I guess I do have a lot of options. The trampoline for NMI could load the proper bank, and then load the pointer for the NMI handler from a fixed position in that bank. Likewise with loading a map bank, and then loading the MapStart pointer from a fixed position in that bank.
Every bank would have to have a pointer for a routine type that's included in one of those banks, even if it's a null pointer, but that's a small sacrifice.
Does that seem like the right line of thinking?
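A minimal ASM6 sketch of that idea: a small table of pointers at the same address in every level bank, copied into RAM by a routine in the fixed bank. All names here (HeaderMapStart, HeaderNMI, MapPtr, LevelNMIPtr, Level1Map, Level1NMI) and the header layout are assumptions for illustration:

```asm
.enum $0020
MapPtr:      .dsb 2     ;where the current level's map starts
LevelNMIPtr: .dsb 2     ;where the current level's NMI code starts
.ende

;--- a level bank ---
.base $8000
HeaderMapStart = $      ;$8000 in every level bank
.dw Level1Map
HeaderNMI = $           ;$8002 in every level bank
.dw Level1NMI           ;use .dw 0 if the bank has no handler of its own
Level1Map:
.db $00, $01, $02       ;placeholder map data
Level1NMI:
rts
.align $2000

;--- fixed bank ---
.base $E000
;call after the level bank has been switched in at $8000
LoadBankPointers:
lda HeaderMapStart+0
sta MapPtr+0
lda HeaderMapStart+1
sta MapPtr+1
lda HeaderNMI+0
sta LevelNMIPtr+0
lda HeaderNMI+1
sta LevelNMIPtr+1
rts
```

Only the header has to sit at a fixed address. Everything it points to can float, so a nearly empty bank does not dictate the layout of a nearly full one.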
darryl.revok wrote:
I have metatiles in fixed locations at the end of the bank. I've also got various data about the level, such as number and position of scanline splits, in fixed locations.
Those sound reasonable. Level maps, on the other hand, usually vary in size, and I would also expect there to be more than 1 map per bank.
Quote:
I can use hScroll Hi and Lo to calculate a value which designates the offset from MapStart, but don't I need a starting position for the map which is going to be the same in every bank?
The normal thing is to have each map starting at a different memory location, and a table indicating where each one starts. Like this:
Code:
LevelMap0:
.incbin "levelmap0.bin"
LevelMap1:
.incbin "levelmap1.bin"
LevelMap2:
.incbin "levelmap2.bin"
LevelMapPointers:
.dw LevelMap0, LevelMap1, LevelMap2
And then, when a level starts, you use the index of that level to fetch and buffer its base address, that you can later add to column offsets and such:
Code:
lda LevelIndex
asl
tax
lda LevelMapPointers+0, x
sta LevelBaseAddress+0
lda LevelMapPointers+1, x
sta LevelBaseAddress+1
Don't forget that games without bankswitching have levels too, and the key to that (and to a lot of dynamic things in a game) is indexing pointers. When you have the luxury of being able to switch banks, that can make some of the dynamic stuff simpler, but you can never completely kill off indexed pointers.
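For the "add to column offsets" step, a sketch of how the buffered base address might be combined with a 16-bit offset. ColumnOffset, ColumnPtr and GetColumnPtr are hypothetical names, the variables are placed in zero page here as an assumption, and how the offset is derived from the scroll position is left out:

```asm
.enum $0000
LevelBaseAddress: .dsb 2  ;filled in when the level starts
ColumnOffset:     .dsb 2  ;byte offset of the wanted column
ColumnPtr:        .dsb 2  ;zero page, so it can be used with (zp),y
.ende

GetColumnPtr:
    clc
    lda LevelBaseAddress+0
    adc ColumnOffset+0
    sta ColumnPtr+0
    lda LevelBaseAddress+1
    adc ColumnOffset+1    ;carry from the low byte is included
    sta ColumnPtr+1
    ldy #0
    lda (ColumnPtr), y    ;first byte of the column
    rts
```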
Quote:
Does that seem like the right line of thinking?
Most of that sounds like it'd work just fine. You should go with whatever you're more comfortable with. It's very likely that you'll notice a few shortcomings of whatever method you choose later down the road, but that's normal, and you'll then be able to refine what you have to better suit your needs.
darryl.revok wrote:
Joe wrote:
and the linker handles actually placing it into the correct bank.
This is probably pretty basic, but what's a linker? This, I imagine:
https://en.wikipedia.org/wiki/Linker
Since the page says it compiles files, does that mean you're working with a higher-level language, or is that just a semantic?
So, am I correct in understanding that you built an app to put your project files together? Does it take the source code or binaries?
A linker is a program that takes object files (the output of an assembler like ca65) and puts them together into a single binary.
Here, "compiling" refers to combining several items into one. All of my code is still assembly. It should be possible to use C in my project, but I haven't figured out how since I want to write it in assembly.
I did not build a linker myself; I use ld65. It takes the binaries created by ca65. I did have to spend some time tweaking my linker script until I got something reasonable; let me know if you want a copy of that.
darryl.revok wrote:
It's unfortunate that the MMC3 only has two swappable banks and one needs to be dedicated to music.
I know someone has already mentioned this, but you only need to do that if you have DPCM and can't fit it into one of your fixed banks. If you only have a few small DPCM samples, you might be able to fit them into one of the fixed banks instead. If you have no DPCM, you don't need to worry about this at all.
lidnariq wrote:
Look into FME-7, then; INL has a CPLD-based clone for it, in addition to his MMC3 clone.
Actually, I'm starting to consider switching my project to FME-7 after reading about this.
One thing that concerns me is that the wiki states the maximum PRG ROM size is 512KB, but INL says their board supports 256KB. Does anybody know which number is accurate? I suppose they could both be accurate and perhaps INL's board doesn't allow for the full 512KB but I'm not sure.
In any case, it's not that big of a deal but I'm curious. Since I'm not using CHR RAM, I didn't expect to get close to 512KB of PRG ROM with my first game. I did expect to utilize all 256KB of CHR ROM though, so reducing that would be a deal breaker.
It looks like FME-7 has the features of MMC3 plus a little more. It has finer bankswitching for CHR and PRG, and the scanline counter is a bit more precise.
Using the FME-7 scanline counter concerns me a little because it's different, and the timing isn't automatically aligned to a scanline, however I get the impression that it would just take a little more careful planning, and overall the results would be a little better, since you won't have to burn up any cycles, which could add up with a lot of IRQs on the screen at once. It does seem like it would require tighter planning with the IRQs though. No branching inside of IRQ except on the very last one.
tokumaru wrote:
darryl.revok wrote:
I have metatiles in fixed locations at the end of the bank. I've also got various data about the level, such as number and position of scanline splits, in fixed locations.
Those sound reasonable. Level maps, on the other hand, usually vary in size, and I would also expect there to be more than 1 map per bank.
I'm not entirely sure on this to be honest. At first I had planned to have each level take up one bank. There's a good chance that it will stay that way, but there's no reason I couldn't change things down the line.
The way I'm planning this game, I'm going to focus on quantity of graphics and variety of areas rather than length. The game's going to be relatively short, but each stage should have a good challenge and I've got plans for things that will make replay very interesting. Each stage planned (10) has an almost entirely different appearance, and the only tiles that would be shared are really basic things like some grass in about three levels or pavement that may be shared between two or so.
I think though that I should adapt more flexibility with pointer locations however it ends up playing out. There really are a lot of options, like for example, certain levels or even scenes within levels have completely unique NMIs. Those could go in a level bank. Unique enemies could go in a level bank.
I was planning on going for the goal of having as little swapping during play as possible. If I could get a PRG set up that has all needed data available at the right time, it would be minimal cycle savings, but optimal nonetheless. I feel like with the FME-7 that would be a lot easier.
Quote:
It's very likely that you'll notice a few shortcomings of whatever method you choose later down the road, but that's normal, and you'll then be able to refine what you have to better suit your needs.
Definitely. I'm finally getting to the point though that I have enough of the picture that I can step back and make a rough outline for everything that will be needed.
Joe wrote:
A linker is a program that takes object files (the output of an assembler like ca65) and puts them together into a single binary.
So, are they compiled at time of assembly, or post-assembly? Would it require use of CA65? I'm using ASM6.
I'm really slow about adopting new software or new techniques as long as I have things that work. I still haven't even started using macros.
darryl.revok wrote:
One thing that concerns me is that the wiki states the maximum PRG ROM size is 512KB, but INL says their board supports 256KB. Does anybody know which number is accurate? I suppose they could both be accurate and perhaps INL's board doesn't allow for the full 512KB but I'm not sure.
The FME-7 definitely supports 512 KiB PRG, BUT no commercial games were released that used it.
The 5A and 5B only support 256 KiB PRG (because the audio mixing amplifier uses that pin).
INL's clone apparently doesn't support 512 KiB, but I don't know why. It could either be space within the CPLD, or could be a to-the-letter implementation of the 5A.
Either way, all definitely support 256 KiB CHR.
Quote:
Using the FME-7 scanline counter concerns me a little because it's different, and the timing isn't automatically aligned to a scanline, however I get the impression that it would just take a little more careful planning, and overall the results would be a little better, since you won't have to burn up any cycles, which could add up with a lot of IRQs on the screen at once. It does seem like it would require tighter planning with the IRQs though. No branching inside of IRQ except on the very last one.
As tepples has pointed out, the problem with an M2-based IRQ is that, combined with the variable IRQ latency of the 6502, the error on each additional IRQ adds linearly (unless you never actually stop the IRQ's counter).
(By which I mean, if you disable, reload, and reënable the IRQ, and the first one is two extra cycles late (because the IRQ fired during the beginning of a 3-cycle instruction), and the second one is one extra cycle late, the second one will end up being three total cycles late.)
Quote:
So, are they compiled at time of assembly, or post-assembly? Would it require use of CA65? I'm using ASM6.
Ld65 is specific to ca65 object files. You could rewrite your code to assemble in ca65 (only requires a few syntax changes and adding a few lines to define which segment each bit of code is). Or, you could go without a linker, and assemble everything with ASM6.
lidnariq wrote:
As tepples has pointed out, the problem with an M2-based IRQ is that, combined with the variable IRQ latency of the 6502, the error on each additional IRQ adds linearly (unless you never actually stop the IRQ's counter).
I was thinking that one could load the IRQ counter value to determine what the IRQ latency was, and offset the reload value with that. The counter doesn't seem to be accessible through CPU memory though.
That was disappointing, as for a moment, I thought it might be possible to get pixel accurate IRQs, which would be a game changer.
An inability to do complex mid-screen scroll splits is a problem. I've been using them pretty extensively. Aside from the MMC5, are any mappers known to be especially flexible with scanline IRQs? I'll look over the list.
Some IRQ sources are user-readable, such as Namco's 163. (Unfortunately, the 163 auto-stops)
Note that you still CAN do complex multiple splits, it's just that you either have to accept this error source, or align all your IRQs to the same remainder of 256cy (approximately 3 scanlines). Or mix and match; obviously you could do the first several IRQs all with perfect relative alignment, and then you can do several more at a different alignment—it's only writes to the LSB that allow drift.
But anyway, there is absolutely no existing IRQ generator that will produce cycle-perfect interrupts, because of the 6502's interrupt latency. That's why thefox suggested the C64's NOP slide technique (which gets it down to 2 cycles). The only way to get it down to 1 cycle would be something that injected clockslides.
Hmmm... That sounds problematic. I feel like an MMC3 with 4x8KB PRG windows and 8x1KB CHR windows might be better for me. I don't know if anyone's made one yet. INL's site said they were working on a RAMBO-1 board.
Does anyone know how hard it would be to make a mapper with pixel accurate IRQs? I don't know much at all about the hardware design, I just know what features would be really useful for making cool games.
If that was possible you could do layered backgrounds with so much flexibility. However, maybe it's harder than I'm making it out to be, because you'd have to be accurate to the PPU cycle count instead of just the CPU cycle count.
I'm probably getting ahead of myself but I've wondered for a while what goes into someone designing a mapper best suited to their purposes.
Pixel perfect IRQs are impossible; the CPU runs at 1/3 the speed of the PPU and with a remainder of 2/3 cycles per scanline, so there will always be a jitter of 1 or 2 pixels on each scanline.
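To put a number on that remainder, assuming the NTSC figure of 341 PPU dots per scanline (a value not stated in this thread) and 3 dots per CPU cycle:

```latex
341 = 3 \cdot 113 + 2
\quad\Longrightarrow\quad
\frac{341\ \text{dots}}{3\ \text{dots per CPU cycle}} = 113\tfrac{2}{3}\ \text{CPU cycles per scanline}
```

Since a scanline is not a whole number of CPU cycles, the CPU/PPU alignment differs between consecutive scanlines and only repeats every 3 scanlines.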
Cycle perfect IRQs are ... well, what I said. A special IRQ that would fire ~14cy earlier than when you requested, jump to a clockslide to make up the difference, and finally jump to your desired code at the right time. Or, as you were pointing out, if you had an IRQ where you could read the amount of error ... say, something like the 163's IRQ but that didn't auto-stop.
Also consider looking through
http://wiki.nesdev.com/w/index.php/User ... 3_Variants
Quote:
Pixel perfect IRQs
Port your game to SNES maybe?
I don't understand the need for cycle/pixel perfect IRQs though... All raster effects that are possible on the NES can be accomplished just fine despite the latency that's inherent of IRQs.
The only effect that's really problematic is the palette update, which is hard not really because of the interrupt latency, but because of the short hblank, that would remain short even if there was no IRQ latency at all.
tokumaru wrote:
I don't understand the need for cycle/pixel perfect IRQs though... All raster effects that are possible on the NES can be accomplished just fine despite the latency that's inherent of IRQs.
If it was possible, I would use it for layered backgrounds. The MMC5 doesn't even do that with its vertical scroll split mode. It apparently triggers off tile loading.
That being said, it's probably not possible without putting another PPU in the cartridge or something, I'm getting ahead of myself, and it's off-topic for the thread anyway.
There is absolutely no way to get attribute data finer than 8x1 pixels. (It's a fundamental property of the PPU: it does nothing but fetch a byte from the nametable, fetch two bits from the attribute table, fetch one bitplane from the pattern tables, and fetch the other bitplane; and then it repeats that 42 times on each scanline. 34 of those fetches are for background data; the other 8 are for sprites.) You can't usefully change fine X mid-scanline: at the very least it'll take 4 cycles = 12 pixels between the last writes to $2005 and $2006, during which the pixels will be all wrong. (Here's something related that Bananmos wrote several years ago)
The MMC5's left-and-right split mode is about as impressive as you can get, without being too unwieldy. You could spiff it up with something like MMC4 crossed with MMC5, where each tile or scanline changes the operating bank. Maybe lots of banks. Maybe finer attribute zones (down to 8x1) and feeding arbitrary tile data. But those are pipe dreams.
Now I wonder whether you could change the MMC5 left-and-right split on a scanline-by-scanline basis...
lidnariq wrote:
You could spiff it up with something like MMC4 crossed with MMC5, where each tile or scanline changes the operating bank. Maybe lots of banks. Maybe finer attribute zones (down to 8x1) and feeding arbitrary tile data.
It would be much simpler to move development over to the SNES or the Genesis than do something like this, wouldn't it?
tokumaru wrote:
It would be much simpler to move development over to the SNES or the Genesis than do something like this, wouldn't it?
I'd say definitely. Especially if the MMC5 hasn't even been fully tamed yet.
My perspective is part curiosity, plus I'm considering that if I ever wanted to have a sizable number of carts produced, I can only imagine the costs would be 1/2 or 1/4 if I got the boards manufactured.
Really though, at this point, I'd be really happy to get an MMC3 variant with 4 x 8KB PRG switching and 8 x 1K CHR switching. That would open up a lot of flexibility. I could do an unrolled loop for each level type instead of trying to make code in the fixed bank that will operate for all.
Even if there's not a board available yet, the PowerPak and Everdrive support a lot of those variants. VRC7 is said to have good support. It has one fixed bank but that's okay. The bankable PRG-RAM of the FME-7 could be really useful though.
There's a lot one could imagine doing if they had complete control of the hardware, it's probably best not to worry about it too much until I make a first game.
lidnariq wrote:
The MMC5's left-and-right split mode is about as impressive as you can get, without being too unwieldy.
That sounds reasonable, and that's still a board that isn't entirely attainable. I think the 8x8 attributes are one of the biggest advancements of that mapper. I mean, there's a ton of stuff really.
Is it possible to mod an MMC5 cart with a CF card and make an MMC5 only dev-cart?
Quote:
That's a great document. Sortable and everything. Cross referencing with PowerPak compatibility, there are many options for which I could start developing right away.
tokumaru wrote:
It would be much simpler to move development over to the SNES or the Genesis than do something like this, wouldn't it?
Absolutely. Hence "pipe dream".
darryl.revok wrote:
Is it possible to mod an MMC5 cart with a CF card and make an MMC5 only dev-cart?
Hmmm, clever. Yes, you could replace one of the two PRG-RAMs that the MMC5 supports with a CF card. Starting with EWROM (for 32 KiB of RAM) or ELROM (and add both CF and RAM).
Quote:
Yes, you could replace one of the two PRG-RAMs that the MMC5 supports with a CF card. Starting with EWROM (for 32 KiB of RAM) or ELROM (and add both CF and RAM).
I'm a bit confused about this but as I said I don't know much about the hardware portion. You mentioned replacing the PRG-RAM, but it's the PRG-ROM that would need to be exchanged with removable storage to make a dev cart that functions like a Powerpak, right?
Just curious, do you think anybody around here would be able and willing to do this for compensation? I could supply the cart. I suppose it would be Laser Invasion or Castlevania III.
The NES has to boot off non-volatile memory, not RAM that loses its contents when it loses power. And yet, at the same time, you need RAM to hold temporary code copied from the CF card (or else use up a large but limited number of rewrites in EEPROM ... or FRAM ...) *
* You could use an NVRAM instead of ROM, but that's just asking for a stray write to render the thing unbootable
Fortunately, the MMC5 supports addressing not one but two external RAMs (of up to 32 KiB each), either of which can be banked anywhere over $6000-$DFFF, and either could be easily replaced by the CF card (as an interface to reading its contents).
This is only the simplest hardware change; other things may be more useful but require more forethought.
Okay, so I adapted my code to VRC-7 to take advantage of its finer bankswitching. After considering the way its scanline counter works, it seems like it will have the same problems as the FME-7 though, so I may as well switch to the Sunsoft.
I'm wondering, if I don't do anything too tight with scanline IRQs (just scroll changing and bankswitching) could I use the FME-7 and just offset the IRQ each time by the average IRQ latency, and get it to line up in hBlank? I'd like the FME-7 if I could get the scanline counter to do complex scroll splitting without glitches. (without modifying the positions of my scroll splits. I have to align them to attributes.)
Anyway, back to PRG bankswitching, as the topic of this thread. It's always a little nerve-racking when I fully disassemble my program instead of making small changes, but thanks to debug breakpoints I managed to track down the bugs.
Right now, my setup is kind of like this:
Reset code is in the fixed bank
Right after reset, it goes to a load level routine
This routine accesses data in the fixed bank, which, by level number, loads:
Map data bank number
Level number inside that bank (So I can have varying numbers of levels inside map banks)
Level code bank number
Object bank number (this probably won't be entirely tied to the level number, but I'm not sure how I will handle it)
After loading all of that, it uses the "Level number in bank" number to pull pointers from map bank for:
MapType
MapStartPosition
MapPalette
NMIhandler
IRQhandler
ScrollingCode
ObjectControlCode
PlayerInitializeValues
Then it uses MapType to determine which way to set the mirroring, and whether or not it needs to run a routine to build code in RAM.
My main game loop is in my fixed bank, but it uses JMP(indirect)s to access the routines for things like scrolling and object control, so they can be different for each level. I figure I can even find a way to make screens like menus or cutscenes levels within this system.
I'm sure it will undoubtedly need some changing as things progress and I find out more what I'll need, but does this sound like a good approach?
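A rough model of the load-level flow described above, written in Python rather than 6502 so the control flow is easy to follow. All table contents, bank numbers and handler names here are made up for illustration; only the shape (fixed-bank table, then per-bank pointer table, then indirect dispatch from the main loop) comes from the post.

```python
from dataclasses import dataclass

@dataclass
class LevelEntry:
    map_bank: int       # bank holding the map data
    index_in_bank: int  # level number inside that map bank
    code_bank: int      # bank holding level-specific code
    object_bank: int    # bank holding object code/data

# Lives in the fixed bank, indexed by global level number (values are hypothetical).
LEVEL_TABLE = [
    LevelEntry(map_bank=1, index_in_bank=0, code_bank=4, object_bank=5),
    LevelEntry(map_bank=1, index_in_bank=1, code_bank=4, object_bank=5),
    LevelEntry(map_bank=2, index_in_bank=0, code_bank=4, object_bank=6),
]

def scroll_horizontal():
    return "h-scroll"

def scroll_xy():
    return "xy-scroll"

# Each map bank starts with a pointer table, one entry per level in that bank.
MAP_BANKS = {
    1: [{"ScrollingCode": scroll_horizontal}, {"ScrollingCode": scroll_xy}],
    2: [{"ScrollingCode": scroll_xy}],
}

class Engine:
    def __init__(self):
        self.current_bank = 0
        self.vectors = {}  # RAM copies of the pointers, the JMP (indirect) targets

    def load_level(self, level):
        entry = LEVEL_TABLE[level]
        self.current_bank = entry.map_bank  # bankswitch to the map bank
        pointers = MAP_BANKS[self.current_bank][entry.index_in_bank]
        self.vectors.update(pointers)       # copy the pointers into RAM
        return entry

    def run_frame(self):
        # The fixed-bank main loop dispatches through the RAM pointer.
        return self.vectors["ScrollingCode"]()

engine = Engine()
engine.load_level(1)
frame_result = engine.run_frame()
```

Because the main loop only ever goes through the RAM pointers, a menu or cutscene can be just another entry in the table with its own handlers.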
I would put the NMIhandler and IRQhandler in the fixed bank.
The NMI and IRQ vectors point to the fixed bank, but they point to a location with a JMP (indirect) with the current NMI or IRQ location.
I've got entirely different NMIs for different types of levels. For example, a level with only horizontal scrolling has a different NMI than one with X/Y scrolling, which is different from one that has X/Y scrolling and fetches tiles at varying rates due to scroll splits. Doing things like this was one of the main reasons I wanted to change to a mapper with 8KB x 4 PRG bankswitching.
I may be able to share one IRQ but I need to get further along to see if that's the case. I use them in different ways on different instances.
darryl.revok wrote:
The NMI and IRQ vectors point to the fixed bank, but they point to a location with a JMP (indirect) with the current NMI or IRQ location.
On the off chance that it hasn't occurred to you, it might be worth pointing out that pointing the vectors at RAM and having the RAM contain two JMP $xxyy instructions costs two more bytes of RAM in total, but is two cycles faster than having two JMP ($07zz) instructions somewhere in ROM.
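The arithmetic behind that trade-off, spelled out. The 6502 figures are standard (JMP absolute: 3 bytes, 3 cycles; JMP indirect: 3 bytes, 5 cycles); the variable names are only for illustration.

```python
JMP_ABSOLUTE_CYCLES = 3  # JMP $xxyy
JMP_INDIRECT_CYCLES = 5  # JMP ($07zz)
JMP_BYTES = 3            # opcode + 16-bit address
POINTER_BYTES = 2        # a bare 16-bit address

HANDLERS = 2             # NMI and IRQ

# Option A: vectors point into RAM, which holds a complete JMP $xxyy per handler.
ram_used_jmp_in_ram = HANDLERS * JMP_BYTES      # 6 bytes of RAM
# Option B: vectors point at JMP ($07zz) in ROM; RAM holds only the pointers.
ram_used_pointer_only = HANDLERS * POINTER_BYTES  # 4 bytes of RAM

extra_ram_bytes = ram_used_jmp_in_ram - ram_used_pointer_only
cycles_saved_per_interrupt = JMP_INDIRECT_CYCLES - JMP_ABSOLUTE_CYCLES

print(extra_ram_bytes, cycles_saved_per_interrupt)
```

The saving is per interrupt entry, so it matters most for IRQs that have to land at a precise point in the scanline.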
Coming back to this. I still don't feel like I've fully grasped how bank switching is properly done, but here's an idea for a ROM map I came up with:
Fixed Bank: - Main skeleton; NMI handler, reset stub, main loop trampoline
- Bankswitching routines
- Camera routines
- Music engine routines
- Object system routines
- Address tables for map data
- Code for player
- Metatile definition data
Switchable Bank 0: - Code for each object minus player
- Data for each object (Including player)
Switchable Bank 1: - Map data for each room in "Area 1"
- Music data for "Area 1"
Switchable Bank 2: - Map data for each room in "Area 2"
- Music data for "Area 2"
(This is somewhat specific to the game I have in mind, I'm planning on having a few big areas with a bunch of rooms connected to one another.)
Switchable Bank 3: - Code and data for other, smaller states. Title, game over, password/file screen, Pause screen (If I want to have a menu.)
I feel like having everything that pertains to an area in the same bank reduces the number of switches that there would need to be if all my map data and music data were just in their own dedicated banks. When the camera needs to move, or when music needs to be played, everything's already right there. The only thing I could see that would have lots of switches would be when objects need to be updated, since they'd have their own dedicated bank. And I guess then, when each object needs to check for collision against the map, it'd have to trampoline to the bank for the respective area. But otherwise, all the data for each object - Animation, hitbox, AI - is all together.
I guess the frame could start with the bank for the area being switched in, and then when each object is run, it trampolines to the object bank, and goes back and forth for background collision detection, and then the area bank is switched back before vblank, so that when it's NMI time the right bank'll be swapped in for music to be ready to be read. What do you guys think?
Are you gonna have so few metatiles that you can place them in the fixed bank? Personally, I'd place the metatiles along with the levels that use them, and put the music in another bank... Unless you indeed plan on having just a few metatiles that are reusable across different levels. As for the music, there's hardly any reason to have it in the same bank as the level it belongs to, since it's not used at all by the game engine. Unless you really don't have anything more game-related that could be using the space.
Having to switch banks for each object and again for testing map collisions does indeed sound like a significant overhead, especially with the MMC1, which requires several mapper writes for each switch (personally I'd rather use UxROM, if I didn't need other MMC1 features), but depending on the number of objects that are active this might not be a problem. You might have to tweak the level design later, to avoid having more active objects than your engine can comfortably handle. I'm handling objects this way myself, but I'm using a mapper with an interface that's not as anal as the MMC1's.
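To put a number on "several mapper writes": MMC1 registers are loaded serially, one bit per write, so a single PRG switch takes five writes, where UxROM takes one. Below is a simplified Python model of that shift register. It ignores real-hardware details such as the handling of writes on consecutive CPU cycles, and the class and function names are invented for this sketch.

```python
class MMC1:
    """Simplified model of the MMC1 serial load register."""

    REGISTER_NAMES = ("control", "chr0", "chr1", "prg")

    def __init__(self):
        self.shift = 0
        self.count = 0
        self.regs = {"control": 0x0C, "chr0": 0, "chr1": 0, "prg": 0}
        self.writes = 0  # total CPU writes seen, for counting overhead

    def write(self, addr, value):
        assert 0x8000 <= addr <= 0xFFFF
        self.writes += 1
        if value & 0x80:
            # Bit 7 set: reset the shift register and force PRG mode 3.
            self.shift = 0
            self.count = 0
            self.regs["control"] |= 0x0C
            return
        # Bit 0 is shifted in, least significant bit first.
        self.shift |= (value & 1) << self.count
        self.count += 1
        if self.count == 5:
            # Address bits 13-14 of the fifth write select the target register.
            name = self.REGISTER_NAMES[(addr >> 13) & 3]
            self.regs[name] = self.shift
            self.shift = 0
            self.count = 0

def set_prg_bank(mapper, bank):
    """Five writes to $E000-$FFFF, shifting the bank number right each time."""
    for _ in range(5):
        mapper.write(0xE000, bank)
        bank >>= 1

mapper = MMC1()
set_prg_bank(mapper, 5)
```

With two switches per object per frame, that is ten mapper writes per object before any real work is done, which is why the number of active objects matters here.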
The Programming UNROM article on the wiki is kinda confusing. It says you need to write to $8000 to switch banks, but in the example it looks like it's writing to $C000.
UNROM places one register across $8000-$FFFF. It doesn't matter where you write, so long as the value you write is the same as the value already in ROM (to avoid bus conflicts).
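A small Python sketch of why the written value has to match the ROM byte. It assumes the common emulator model in which a bus conflict resolves to the bitwise AND of the two values; real boards are not guaranteed to behave that way. The identity table (bytes 0 to 7 stored somewhere in ROM) is the usual workaround, and the names here are illustrative.

```python
def bus_conflict_write(rom_byte, cpu_value):
    # Assumed model: ROM and CPU both drive the data bus, result is the AND.
    return rom_byte & cpu_value

# Identity table stored in ROM: the byte at offset n is n.
BANK_TABLE = [0, 1, 2, 3, 4, 5, 6, 7]

def switch_bank(bank):
    """Write `bank` onto the ROM byte that already holds the same value."""
    rom_byte = BANK_TABLE[bank]
    latched = bus_conflict_write(rom_byte, bank)
    return latched & 0x07  # UNROM only looks at the low three bits

# Writing 5 onto an arbitrary ROM byte (here $00) would latch the wrong bank.
wrong = bus_conflict_write(0x00, 5)
right = switch_bank(5)
```

Since the register is mirrored across all of $8000-$FFFF, the table can sit wherever is convenient, as long as it is visible when the write happens.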
Okay, that makes sense. For whatever reason I couldn't get it to work before, but it's working fine now. Thanks, Tepples.
Okay, so if I have a switchable bank dedicated to object AI and data, another switchable bank with all the map data for Area 1, and then my collision reading routine in the fixed bank, how would I go about this?
1. In object bank
2. Save the X and Y of where the collision takes place as temporary parameters
3. Tell the program we need to switch to the Area 1 bank
4. JSR to the map reading routine in the fixed bank
5. Bank switch
6. Read map data from the Area 1 bank
7. Return the type of collision
8. Switch back to the object bank
9. RTS
I guess the scenario isn't that bad, but what if there were different subroutines across multiple switchable banks, or across 32kb banks? The only thing I can think to do would be to save the PC twice, but that seems pretty clunky.
IMHO...
Either:
1. The code lives in the fixed bank, swaps banks, and reads the data from the swapped bank, never having to JMP or JSR into the swapped bank.
2. Control code in the fixed bank swaps banks, JSRs to unique code in the swapped bank, which accesses data in that bank and then returns with RTS.
Honestly, there's probably a dozen ways to do this. I've seen a game that put all its bank swapping code in RAM. I've seen a game that had a big series of indirect jump addresses at the start of each swapped bank.
Sogona wrote:
2. Save the X and Y of where the collision takes place as temporary parameters
The object RAM should contain the object's coordinates and its speeds, which are sufficient to calculate the object's new position. Personally, I'd have a single routine move the object and handle collisions/ejections, as that would mean less bankswitching. This routine would reside in the fixed bank, and would do the following when called by an object:
1- remember which object bank is mapped;
2- switch to the current level's bank;
3- move the object horizontally and eject as necessary;
4- move the object vertically and eject as necessary;
5- switch back to the object's bank;
6- return;
That's only two switches per object movement, because the movement is handled outside of the object's bank. As an optional step, you could report back to the object the types of blocks it made contact with, horizontally and vertically, so it could react to special blocks (conveyor belts, lava, etc.).
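The six steps above, modelled in Python with a counter to confirm the switch count. The bank numbers, the tiny map and the "eject" rule (simply refusing the move) are placeholders, not anything from a real engine.

```python
class Mapper:
    def __init__(self, bank):
        self.bank = bank
        self.switches = 0

    def switch(self, bank):
        self.bank = bank
        self.switches += 1

LEVEL_BANK = 1
OBJECT_BANK = 3
SOLID_CELLS = {(5, 2)}  # hypothetical map data living in the level bank

def is_solid(mapper, x, y):
    # Map data is only readable while the level bank is mapped in.
    assert mapper.bank == LEVEL_BANK
    return (x, y) in SOLID_CELLS

def move_object(mapper, obj):
    """Fixed-bank routine: move one object and report what it hit."""
    saved_bank = mapper.bank       # 1- remember which object bank is mapped
    mapper.switch(LEVEL_BANK)      # 2- switch to the current level's bank
    new_x = obj["x"] + obj["dx"]   # 3- move horizontally, eject if blocked
    hit_x = is_solid(mapper, new_x, obj["y"])
    if not hit_x:
        obj["x"] = new_x
    new_y = obj["y"] + obj["dy"]   # 4- move vertically, eject if blocked
    hit_y = is_solid(mapper, obj["x"], new_y)
    if not hit_y:
        obj["y"] = new_y
    mapper.switch(saved_bank)      # 5- switch back to the object's bank
    return hit_x, hit_y            # 6- return (with optional contact report)

mapper = Mapper(OBJECT_BANK)
enemy = {"x": 4, "y": 2, "dx": 1, "dy": 1}
contacts = move_object(mapper, enemy)
```

The object code never touches the mapper itself; it just calls the routine and looks at the returned contact flags.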
Quote:
I guess the scenario isn't that bad, but what if there were different subroutines across multiple switchable banks, or across 32kb banks?
If you have a fixed bank, make good use of it. Try to put all routines that manipulate data in there, and use the switchable banks just for data. This is obviously not possible for the object routines, which should probably be the only routines to need the trampoline approach.
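For the cases that do need a trampoline, one common pattern (offered here as an assumption, not something anyone above described in detail) is a fixed-bank "far call" that saves the current bank, switches, calls, and restores on the way out. Saving the bank on a stack lets calls nest across several switchable banks. A Python model, with invented names throughout:

```python
class BankedCpu:
    def __init__(self):
        self.bank = 0
        self.saved_banks = []  # stands in for pushing the bank number on the stack
        self.routines = {}     # (bank, name) -> routine living in that bank
        self.trace = []        # every bank value mapped in, in order

    def far_call(self, bank, name, *args):
        """Fixed-bank trampoline: save bank, switch, call, restore."""
        self.saved_banks.append(self.bank)
        self.bank = bank
        self.trace.append(bank)
        try:
            return self.routines[(bank, name)](self, *args)
        finally:
            self.bank = self.saved_banks.pop()
            self.trace.append(self.bank)

OBJECT_BANK = 3
AREA1_BANK = 1

def read_tile(cpu, x, y):
    # Lives in the Area 1 bank.
    return "solid" if (x, y) == (5, 2) else "empty"

def update_enemy(cpu, x, y):
    # Lives in the object bank, needs map data from another switchable bank.
    return cpu.far_call(AREA1_BANK, "read_tile", x, y)

cpu = BankedCpu()
cpu.routines[(AREA1_BANK, "read_tile")] = read_tile
cpu.routines[(OBJECT_BANK, "update_enemy")] = update_enemy
result = cpu.far_call(OBJECT_BANK, "update_enemy", 5, 2)
```

The caller's bank is always restored before control returns to it, so no routine has to know who called it.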