I'm creating an assembler targeted solely at NES development, and I allow variables to be defined without any address assigned to them. The assembler will then arrange them in memory as it sees fit, which means it needs to decide which variables go in the zero page (ZP) and which don't get that privilege. (Variables can be forced in or out of the zero page as well, but those don't matter here.)
My thought is to hand out ZeroPage locations to variables in the following order:
1. Most referenced.
2. Arrays
3. Structures
Does this make sense?
So you're trying to make an automated tool to decide whether a variable should be placed in the equivalent of ca65's "ZEROPAGE" segment or its "BSS" segment. I could see a use for that. In that case:
Before 1 should be "0. Variables used with addressing modes that work only on zero page". This includes (d,x) used for pointer tables and (d),y used for pointers into big arrays.
And near the bottom should be "4. Arrays that need to be aligned to the start of a 256-byte page, or arrays that need to not cross a page". Pretty much every game will use a page for the display list that gets copied to OAM through $4014. LJ65 uses two more: a page for player 1's matrix, and a page for player 2's matrix.
And you may want to include a way to shuffle the allocation order at compile time, so that buffer overflows become easier to spot as they spill onto different variables.
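To make the shuffling idea concrete, here is a minimal Python sketch. The function name `assign_addresses`, the `(name, size)` tuple layout, and the `seed` parameter are all hypothetical and not taken from any existing assembler. Passing the seed explicitly keeps a given build reproducible, while changing it moves every variable's neighbours around.

```python
import random

def assign_addresses(variables, base, seed):
    """Lay out (name, size) variables contiguously from `base` in a shuffled order.

    Hypothetical helper: the same seed always yields the same layout.
    """
    order = list(variables)
    random.Random(seed).shuffle(order)  # private RNG, so global state is untouched
    addresses = {}
    next_addr = base
    for name, size in order:
        addresses[name] = next_addr
        next_addr += size
    return addresses

variables = [("score", 2), ("lives", 1), ("level", 1), ("name_buf", 8)]
layout_a = assign_addresses(variables, 0x0300, seed=1)
layout_b = assign_addresses(variables, 0x0300, seed=1)  # identical to layout_a
```

Building with a few different seeds and running the same test each time is one way an overflow out of `name_buf` might show up as a different symptom per build.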
I don't think it is a good idea to automatically allocate data between ZP and normal RAM. You lose an important bit of control with this.
And why yet another assembler, anyway? Aren't there a lot of them already?
I think it's best left up to the user, honestly. With the things tepples mentioned (the ZEROPAGE segment, or .zp for nesasm), variables can already be assigned to the zero page or not without a specific address.
That said, the most-referenced variables and anything used inside a loop that runs many times should be in the zero page. Any non-static RAM should be as well, because it can be reused without costing much extra. Also, any variable referenced in the NMI routine should probably get priority.
I actually wouldn't put arrays and structures there, because of the space they'd take up. Most of my actual arrays take an entire page/half a page/a quarter of a page, and need to be consistent across frames.
You make a very good point, tepples. If STX addr0,Y or STY addr0,X is used on a variable, it will need to be placed in the ZP before all others, since those two instructions have no absolute indexed form.
There is still the ability to mark a variable as ZP or not. Additionally, you can flat out specify the address too. So #4 is already covered. I simply lock my Sprite[64] array to 0x300 or wherever else I decide to store the sprite RAM for the DMA.
I'll make a note about non-static variables. Right now I'm not supporting them because I don't have anything in place to prevent collisions. Definitely going on the backlog though.
The kind of arrays that would be put on zero page might include
- an array of anything with one element for each player
- an array of pointers with one element for each music channel
A static analysis of the most-referenced variables might help when trying to trim size from your code. So might an actual profile when trying to add speed to your code.
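As a rough illustration of the static count, here is a small Python sketch. The helper name `count_references` and the sample source are made up for the example, and it assumes `;` starts a comment as in ca65. It only counts textual mentions, so one reference inside a loop counts the same as one outside it, which is why an actual profile can tell a different story.

```python
import re
from collections import Counter

def count_references(source, names):
    """Count how often each variable name appears in the code part of each line."""
    counts = Counter({name: 0 for name in names})
    for line in source.splitlines():
        code = line.split(";", 1)[0]  # drop everything after a comment marker
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
            if token in counts:
                counts[token] += 1
    return counts

source = """
    lda score      ; load
    clc
    adc #1
    sta score
    ldx lives      ; score in a comment is ignored
"""
counts = count_references(source, ["score", "lives", "level"])
```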
There is a way to handle non-static variables, and it depends on assembling a call graph of the program. Leaf functions, those that don't call anything else, have their variables at the very start, and if function A calls function B, function A's variables come after function B's. It breaks for recursive calls, but then recursive functions should be saving things on the stack anyway.
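The call-graph layout described above can be sketched in Python as follows. The names `overlay_offsets`, `calls`, and `local_sizes` are hypothetical, as are the example functions. Each function's locals start just past the furthest end of any callee's locals, so leaf functions start at offset 0, and a recursive call is reported as an error rather than laid out.

```python
def overlay_offsets(calls, local_sizes):
    """Return {function: start offset of its locals in a shared scratch area}.

    calls:       {function: [functions it calls]}
    local_sizes: {function: number of bytes of local variables}
    """
    offsets = {}
    in_progress = set()

    def visit(func):
        if func in offsets:
            return offsets[func]
        if func in in_progress:
            raise ValueError(f"recursive call involving {func}")
        in_progress.add(func)
        start = 0  # leaf functions keep this value
        for callee in calls.get(func, []):
            # The caller's locals must sit above everything the callee uses.
            start = max(start, visit(callee) + local_sizes[callee])
        in_progress.discard(func)
        offsets[func] = start
        return start

    for func in local_sizes:
        visit(func)
    return offsets

calls = {"main": ["update", "draw"], "update": ["read_pad"], "draw": [], "read_pad": []}
local_sizes = {"main": 2, "update": 3, "draw": 4, "read_pad": 1}
offsets = overlay_offsets(calls, local_sizes)
# read_pad and draw start at 0, update at 1, main at 4: 6 bytes total instead of 10.
```

This sketch assumes a single thread of execution. Anything reachable from the NMI handler can interrupt the main code, so it would need its own region rather than sharing this one.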
So based on everyone's feedback this is the order I think will work best:
-Variables that have an address specified in the source
-Variables referenced by STX zp,Y and STY zp,X must be placed in ZP (Compile error if this is not possible)
-Variables marked as ZP (Compile warning for each one that doesn't fit)
-Most referenced
-Arrays
-Structures
Additionally, the build output will note how many bytes, pointers, chars, arrays, structs, etc. were promoted to the ZP.
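Here is one way that ordering could look as a Python sketch. Everything in it is an assumption made for illustration: the `Var` fields, the `tier` ranking, first-fit placement starting at address 0, and treating "most referenced" as scalars sorted by reference count. It is not the actual implementation of the assembler being discussed.

```python
from dataclasses import dataclass
from typing import Optional

ZP_SIZE = 256

@dataclass
class Var:
    name: str
    size: int
    refs: int = 0
    kind: str = "scalar"               # "scalar", "array" or "struct"
    fixed_addr: Optional[int] = None   # address given in the source
    needs_zp: bool = False             # used with a ZP-only addressing mode
    marked_zp: bool = False            # user asked for ZP

def find_free(used, size):
    """First zero-page address with `size` contiguous free bytes, or None."""
    for start in range(ZP_SIZE - size + 1):
        if all(a not in used for a in range(start, start + size)):
            return start
    return None

def tier(v):
    if v.needs_zp:
        return 0
    if v.marked_zp:
        return 1
    return {"scalar": 2, "array": 3, "struct": 4}[v.kind]

def allocate_zp(variables):
    """Return ({name: address}, warnings); names left out did not get into ZP."""
    used, addresses, warnings = set(), {}, []
    for v in variables:                # fixed addresses are honoured first
        if v.fixed_addr is not None:
            addresses[v.name] = v.fixed_addr
            used.update(a for a in range(v.fixed_addr, v.fixed_addr + v.size)
                        if a < ZP_SIZE)
    floating = [v for v in variables if v.fixed_addr is None]
    floating.sort(key=lambda v: (tier(v), -v.refs))
    for v in floating:
        addr = find_free(used, v.size)
        if addr is None:
            if v.needs_zp:
                raise ValueError(f"{v.name} must be in ZP but does not fit")
            if v.marked_zp:
                warnings.append(f"{v.name} marked ZP but does not fit")
            continue                   # left for the non-ZP allocator
        addresses[v.name] = addr
        used.update(range(addr, addr + v.size))
    return addresses, warnings

variables = [
    Var("sprites", 256, kind="array", fixed_addr=0x0300),
    Var("ptr", 2, refs=3, needs_zp=True),
    Var("frame", 1, refs=40),
    Var("temp", 1, refs=90),
    Var("big", 250, refs=5, kind="array"),
    Var("player", 8, refs=10, kind="struct"),
]
addresses, warnings = allocate_zp(variables)
# ptr -> $00, temp -> $02, frame -> $03, big -> $04; player does not fit.
```

The promotion counts for the build output could then be tallied from whichever names ended up with an address below 256.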
It shouldn't be that crucial either way. ZP access is slightly faster, and each instruction that uses it is one byte shorter. It's only required for certain addressing modes, such as (zp),Y indirection. Otherwise it's just about optimizing performance somewhat, I suppose. Unless your project really needs to save every cycle, you don't need to stress about what is in ZP and what isn't. In general, just put the main variables in there: whatever gets used most often.
I default to putting everything in ZP, except for large structures and arrays, which usually fill up the other pages. Once I run out of space in ZP I start demoting some variables, but that's almost never necessary.
Maybe you should be generous at first, and start demoting the variables only when ZP space runs out, based on the priorities you have set.
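A tiny Python sketch of that "generous first, demote later" approach, assuming each variable carries a single numeric priority (the `(name, size, priority)` tuples and the function name `demote_until_fits` are hypothetical):

```python
def demote_until_fits(variables, capacity=256):
    """Start with everything in ZP, then demote the lowest priority until it fits.

    variables: list of (name, size, priority); higher priority stays in ZP longer.
    Returns (kept_in_zp, demoted) as lists of names.
    """
    kept = sorted(variables, key=lambda v: v[2], reverse=True)
    demoted = []
    while sum(size for _, size, _ in kept) > capacity:
        name, _, _ = kept.pop()  # lowest priority is at the end
        demoted.append(name)
    return [name for name, _, _ in kept], demoted

kept, demoted = demote_until_fits(
    [("ptr", 2, 100), ("temp", 1, 90), ("buf", 200, 10), ("table", 100, 5)]
)
# 303 bytes requested: "table" is demoted, leaving 203 bytes in ZP.
```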