More SNES questions

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
More SNES questions
by on (#180856)
I've been reviewing this SNES document...
[removed link to Nintendo's official manual - admin]

And, I have a few questions, mostly because this seems to be an official document, but it seems to differ from other sources.

1. It warns not to set a Sprite X to $100 (-256), maybe something about a 'time over' error that hasn't been resolved. I don't remember reading this warning anywhere else. Is this accurate?

2. It recommends this at start...
Code:
Sei
Clc
Xce
Jmp long to 80+ bank (faster)

And says CPU speed is automatically increased when accessing the 80+ banks. Is that right?

3. It said something about VRAM having $8000 words, but a potential $10000 (iirc). Is there such a thing as expansion VRAM?

4. It said that if you don't reset the NMI flag inside NMI, it can retrigger? And if you don't reset some flag after forced blank, it could immediately enter NMI after turning on rendering. Right? And also, disabling and reenabling NMI might immediately trigger an old NMI if the flag isn't reset?

5. Should I have an IRQ and NMI handler inside every bank?

6. Does anyone use the WRAM registers, or is that just for DMA transfers to WRAM?

7. If you set DMA to not increment, can you blank RAM with DMA, by continually writing the same byte (0)?

8. Is the CPU completely occupied with a DMA, or is it free to do other things?

9. Are there any tools for pixel editing in 256 color mode?

10. Multiply/Divide...do you need to wait to get the result? I remember reading that somewhere.

11. If you disable controller reads, then reenable, it says it takes a while for new controller reads to start working. How long?

12. It says many games set BG Y scroll to -1, to align to Sprites. Right? Like NES, they appear 1 pixel low?

13. Am I right to assume that the SNES doesn't care about the header?

14. Don't use screen math in mode 5-6? Another warning I've never seen before, with no explanation.

15. It says that you can do pixel perfect raster effects, with H/V counters? I've never seen any examples.

16. 2132 fixed color math. It says in the SFC wiki write once, but then their example shows multiple writes. (?)
http://wiki.superfamicom.org/snes/show/Registers

17. Anyone know what Offset-per-tile mode is?

I think that's enough for now.
Re: More SNES questions
by on (#180857)
dougeff wrote:
1. It warns not to set a Sprite X to $100 (-256), maybe something about a 'time over' error that hasn't been resolved. I don't remember reading this warning anywhere else. Is this accurate?
If a sprite is exactly at X=256, then it still consumes fetches from the 34-slivers-per-scanline limit (as though every single sliver were visible!), but none of it displays.

Quote:
And says CPU speed is automatically increased when accessing the 80+ banks. Is that right?
Only if you set the bit in $420D.

And it may not be as much of a speed improvement as you're hoping, because internal operations cycles and access to registers occur at 3.6MHz speed anyway.

Quote:
3. It said something about VRAM having $8000 words, but a potential $10000 (iirc). Is there such a thing as expansion VRAM?
Unfortunately, no. (discussion)

Quote:
5. Should I have an IRQ and NMI handler inside every bank?
IRQ and NMI vectors (in fact, ALL vectors) always point to bank 0. It would be actively difficult to be forced to have multiple handlers... but obviously if you had a use case you could fake an indirect long jump.

Quote:
7. If you set DMA to not increment, can you blank RAM with DMA, by continually writing the same byte (0)?
Yes.

Quote:
8. Is the CPU completely occupied with a DMA, or is it free to do other things?
The address busses and data bus are completely occupied during DMA, so the CPU couldn't do anything else anyway.

Quote:
9. Are there any tools for pixel editing in 256 color mode?
Any palette-using graphics editor should do here. Both GIMP and mtpaint can work.

Quote:
10. Multiply/Divide...do you need to wait to get the result? I remember reading that somewhere.
8 CPU cycles.

Quote:
13. Am I right to assume that the SNES doesn't care about the header?
The SNES itself only cares about the vectors.

Quote:
17. Anyone know what Offset-per-tile mode is?
It allows any given 8-pixel-wide column to have its own fine Y scroll and coarse X scroll. Try my simple demo of modes 0-6. (HDMA would let you change the vertical extent of these columns)
Re: More SNES questions
by on (#180858)
In public, it's better to cite SNES Development Wiki or Fullsnes instead of what appears to be a pirated copy of the official manual. I'll answer based on my recollection of those public resources.

dougeff wrote:
1. It warns not to set a Sprite X to $100 (-256), maybe something about a 'time over' error that hasn't been resolved. I don't remember reading this warning anywhere else. Is this accurate?

There appears to be a logic error in sprite evaluation that treats sprites at x=-256 as if they are at x=0, that is, with all tiles on screen. They're not displayed (which is correct), but they still count against the 34 sliver cap (which is the error). But you don't need to write the sprite at all if -256 <= x <= -64 (or -32 with sane OBSEL size settings), as it'll count against the 32 sprite cap even if -255 <= x <= -64.
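To make the 9-bit arithmetic concrete, here is a small Python model (not SNES code) of how a sprite's X value maps to a signed position, plus the "don't bother writing it" range described above. The function names are made up for illustration, and the skip rule only restates the range given in this post; treat it as a sketch rather than a verified model of the PPU.

```python
def sprite_x_signed(x9):
    """Interpret a 9-bit OAM X value (0..$1FF) as a signed position."""
    x9 &= 0x1FF
    return x9 - 0x200 if x9 & 0x100 else x9


def can_skip_sprite(x, width=64):
    """True if a sprite of the given width lies entirely left of the screen.

    width=64 is the largest sprite size; pass 32 for the smaller OBSEL settings.
    """
    return -256 <= x <= -width


print(sprite_x_signed(0x100))  # the problematic value: -256
print(sprite_x_signed(0x1FF))  # -1
print(can_skip_sprite(sprite_x_signed(0x100)))
```

A metasprite routine could run a check like `can_skip_sprite` before emitting an OAM entry, so the X=$100 case never reaches the hardware.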

Quote:
2. It recommends this at start...
Code:
Sei
Clc
Xce
Jmp long to 80+ bank (faster)

And says CPU speed is automatically increased when accessing the 80+ banks. Is that right?

Execution out of ROM in $008000-$7DFFFF is always slow (2.7 MHz). Execution out of ROM in $808000-$FFFFFF is fast if the memory controller has been told to use fast ROM. So make sure your init code jumps out of bank $00 at some point, and when you do, jump to bank $80 or higher. Do the same for any nontrivial NMI or IRQ handler.
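For a sense of scale, the two speeds can be worked out from the master clock. This Python sketch assumes the commonly documented NTSC figures (a master clock of about 21.477 MHz, 8 master cycles per slow ROM access and 6 per fast one); those numbers come from public SNES documentation, not from this thread.

```python
MASTER_CLOCK_HZ = 21_477_272  # NTSC master clock (assumed figure, ~21.477 MHz)


def cpu_rate_mhz(master_cycles_per_access):
    """Effective access rate in MHz for a given number of master cycles."""
    return MASTER_CLOCK_HZ / master_cycles_per_access / 1e6


slow = cpu_rate_mhz(8)  # slow ROM access
fast = cpu_rate_mhz(6)  # fast ROM access (banks $80+ with $420D bit 0 set)
print(f"slow: {slow:.2f} MHz, fast: {fast:.2f} MHz, ratio: {fast / slow:.2f}")
```

The ratio is the best case for ROM fetches only; internal cycles and register accesses are not sped up, so real code gains less.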

Quote:
3. It said something about VRAM having $8000 words, but a potential $10000 (iirc). Is there such a thing as expansion VRAM?

Not in retail consoles. I don't know if it was expanded in debug consoles or the Nintendo Super System arcade platform.

Quote:
4. It said that if you don't reset the NMI flag inside NMI, it can retrigger?

As far as I know, it's the same logic as the NES PPU's NMI: if you don't acknowledge the NMI by reading the status register, and you turn NMI generation off and on, it'll cause another NMI. On the NES, the NMI enable is $2000 bit 7, and the status register is at $2002. On the Super NES, the NMI generation enable is at $4200, and the status register is at $4210.

Quote:
5. Should I have an IRQ and NMI handler inside every bank?

No. Your vectors should be in $00FFE0-$00FFFF. And unless you're using certain types of coprocessor, $008000-$00FFFF is always readable.

Quote:
6. Does anyone use the WRAM registers, or is that just for DMA transfers to WRAM?

It might be useful for autoincrementing if the X and Y registers are taken up

Quote:
7. If you set DMA to not increment, can you blank RAM with DMA, by continually writing the same byte (0)?

Yes, but you have to set ROM (or possibly the VRAM read port) as the source because WRAM-to-WRAM transfers don't work.

Quote:
8. Is the CPU completely occupied with a DMA, or is it free to do other things?

General-purpose DMA completely occupies the CPU.

Quote:
9. Are there any tools for pixel editing in 256 color mode?

GIMP can save a PNG in 4-, 16, or 256-color indexed mode and then pilbmp2nes.py can convert it to tile data. I have used these tools both on Ubuntu and on Windows.

Quote:
10. Multiply/Divide...do you need to wait to get the result?

Yes, if you're referring to 5A22 multiply and divide, not the PPU multiplier that's usable when the background mode isn't 7. There are a couple games that depend on incomplete results of 5A22 mul/div, but I consider the result unspecified.

Quote:
13. Am I right to assume that the SNES doesn't care about the header?

The retail Super NES console doesn't care about anything in the header other than the vectors, but other tools are believed to read it.

Quote:
14. Don't use screen math in mode 5-6? Another warning I've never seen before, with no explanation.

The PPU can look up only two palette entries per pixel: one for the sub screen and one for the main screen, and then it does math on the result. Usually the PPU displays the color math output in the whole pixel. But in high-resolution modes, the PPU splits each pixel into two halves, displaying the unchanged sub screen color in the left half and the color math output in the right half.

Quote:
16. 2132 fixed color math. It says in the SFC wiki write once, but then their example shows multiple writes. (?)

If you're setting R, G, and B to different values, you have to poke in each value separately. There are three internal registers (COLDATA_R, COLDATA_G, COLDATA_B). Bits 7-5 of the written value select which registers to write (the "address" so to speak); bits 4-0 set the value to write to the selected registers.
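As an illustration of why the wiki can say "write once" and still show several writes, here is a Python model of the values that would be poked into $2132. It assumes the usual bit assignment from public register documentation (bit 5 selects red, bit 6 green, bit 7 blue); the function name is hypothetical and this is only a sketch.

```python
def coldata_writes(r, g, b):
    """Return the byte values to write to $2132 for 5-bit R, G, B intensities."""
    for v in (r, g, b):
        if not 0 <= v <= 31:
            raise ValueError("intensity must be 0..31")
    if r == g == b:
        # All three select bits set: one write updates every channel.
        return [0xE0 | r]
    # Otherwise one write per channel (assumed layout: bit5=R, bit6=G, bit7=B).
    return [0x20 | r, 0x40 | g, 0x80 | b]


print([hex(v) for v in coldata_writes(0, 0, 0)])
print([hex(v) for v in coldata_writes(31, 0, 8)])
```

So a grey or black fixed color needs a single write, while an arbitrary color needs up to three.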

Quote:
17. Anyone know what Offset-per-tile mode is?

It's used for scrolling individual columns of a background, such as the segments of the snake in the final boss of Aladdin, the floor in "Touch Fuzzy, Get Dizzy" in Yoshi's Island, or the playfield in Tetris Attack. It's also used for shearing the background of Star Fox to fake rotation. Have you used VSRAM on the Sega Genesis?
Re: More SNES questions
by on (#180859)
Sixteen questions is more than "a few".

1. A sprite's horizontal (X) position is stored in a 9-bit number, where the MSB defines the sign. The format of this data is non-linear; bit 8 (MSB) is stored in a different part of the OAM than the rest. Refer to A-3 (near end of document) for the OAM format. "Time over" is discussed in Chapter 20 (2-20-1) and 2-27-3 (see $213E bit 7). The simple explanation is: don't set a sprite's X position to $100 (bits 0-7 = $00, bit 8 = $01).

2. It depends on the memory mode being used (mode 20 vs. 21 vs. 30 vs. 31) and what $420D bit 0 is set to. Refer to Chapter 21 (2-21-1 to 2-21-5), 1-2-17, 1-2-25 to 1-2-35, and 2-28-5 (see $420D bit 0). The long jump is needed because it sets K (a.k.a. "PCB" or "PC bank" -- don't confuse this with the B register) to wherever you jump to. When the CPU powers on, K=$00.

3. To my knowledge, there is no extended VRAM (i.e. it's 64KBytes, or 32K of words (16-bit)). Expansion carts might offer something like this, but I know nothing about them. This subject has come up on the forum before.

4. Correct. Refer to 2-28-6 ($4210 bit 7; read the notes underneath) and B-3. B-3 has the full explanation of the situation. In short, it's simple: make sure that the code your NMI vector points to reads $4210 once.

5. The 65816 reads all the vectors (RESET, NMI, IRQ, etc.) from bank $00. This should explain why the SNES memory map is laid out the way it is, re: mirroring and banks $80, $C0, etc. Vectors are only 16 bits wide, which means you need to make sure that your vector routines either a) fit within the constraints of bank $00, or b) use 24-bit addressing (long JMP/long JSR, etc.) if spanning more than one bank.

If there's a vector for something you don't use (ex. IRQ, COP, BRK, etc.) then point it to an address containing nothing but an rti as a safety net.

6. (For readers: this is a question about registers $2180 through $2183, re: accessing WRAM a.k.a. $7E0000-7FFFFF). These registers are most commonly used for DMA transfer destinations (to read/write something from/to WRAM), but nothing stops you from using them yourself if you want. But in the latter case, it's probably faster to access the $7E0000-7FFFFF range natively yourself. :-)

7. Yes, absolutely -- and this is quite common.

8. This is covered in Chapter 17 (2-17-1 to 2-17-5). The CPU completely halts when a DMA transfer takes place (any 1 bits written to $420B (DMA) or $420C (HDMA)), and resumes when the transfers finish. If multiple transfers are initiated at once, they are all completed before the CPU resumes.

9. This should be posted as a separate thread; you're asking about tools for graphics, but don't state what OS you need, etc... Separate thread please.

10. Yes, there is a delay (the CPU keeps working during this time period, so you're free to do what you wish). The delay times (in CPU cycles) are fully documented: 2-28-2 describes the delay periods for both multiplication and division (read the notes).

11. I would think it would take one VBlank, based on the described implementation in Chapter 13 (2-13-1 to 2-23-2). Anomie's docs might have more precise details.

12. Don't have an answer for this -- someone else will need to provide one.

13. The SNES (console) itself does not care about the official header data in $FFB0-FFDF.

14. Someone will need to correct me, but: the description you've given here is vague (I need references). I don't know what "screen math" refers to either. My guess is that it's warning you about calculations being done because of the interlaced and pseudo-512 features that mode 5 and 6 offer. These are not commonly-used video modes, by the way, so my suggestion might be to not worry about it until/if you have need for them.

15. That's nice (re: "I've never seen any examples"). :-) Usually this type of effect is done using HDMA and/or with precise timing (HDMA is the more common feature -- doing specific DMA transfers per scanline). H/V counters are certainly possible to use too, but you might end up spending time spinning waiting on them. One of the most impressive demos -- Anthrox's SIDMANIA demo would certainly classify as this. (AFAIK emulators still cannot run this thing reliably given how timing-precise it is. There are no videos of it on YouTube either. I might have to do a video capture sometime...)

Just remember that the SNES is still primarily a tile-based console. The Genesis/MegaDrive, as I understand it, is more loose as far as really screwing around with raster effects. I won't discuss this item past this point.

16. Again: I'd need a reference. Off the top of my head I'm not aware of anything that mandates you can't write to $2132 more than once. Possibly this is an error in the wiki. The register is described in 2-27-18; it isn't a "multi-write" register. Examples using it are described in Chapter 7 (2-7-1 to 2-7-5).

17. Don't have an answer for this -- someone else will need to provide one.

P.S. -- Please don't ever do this again (re: "a few questions", then dump 16). It's too much to ask in one sitting. All this will do is create a massive 200-page thread that nobody can follow. A volume of questions like this should be asked in real-time somewhere (IRC, etc.).

P.P.S. -- I see the number of questions has been increased to 17. Please stop this. :P
Re: More SNES questions
by on (#180864)
Great responses. Thanks!
Re: More SNES questions
by on (#180868)
Your question about vectors and banks suggests that you aren't completely familiar with how address banks work on the 65816. They're very different from bank switching on an 8-bit system like the NES, and much more like segments in 16-bit x86 programming. Would you say you understand the concept of the program bank, the data bank and the relocatable direct page, or do you need a primer?

Quote:
But in high-resolution modes, the PPU splits each pixel into two halves, displaying the unchanged sub screen color in the left half and the color math output in the right half.


This isn't quite accurate. The right half of each pixel is calculated the same way as in standard resolution, but the left half is the result of color math done on the sub screen pixel and the previous main screen pixel (the one immediately to the left). Essentially, in hires mode you can only blend the screen with itself (resulting in a low-pass filter blurring effect, but only in the horizontal direction) or with the fixed color register. Some text boxes in Marvelous and the fullscreen pictures of the girls in Suchie-Pie do the former; the subscreen in Romancing SaGa 3 does the latter (to display the colored boxes around item/spell descriptions, etc.)

A quirk of hires color math (which most emulators don't emulate properly--AFAIK only the accuracy profile in bsnes, and some builds of snes9x) is that the "is math enabled on this layer" test (bits 0-5 of register $2131) is only done by the PPU once for every two pixels. This can result in ugly artifacts along the edges between math-enabled and math-disabled layers (again, see Suchie-Pie)
Re: More SNES questions
by on (#180888)
Quote:
you aren't completely familiar with how address banks work on the 65816.

I'm not familiar with 65816. I'm trying to learn. I see some vector info here...
https://en.wikipedia.org/wiki/Interrupt ... processors

My understanding is that the ROM is divided into $8000 byte chunks by the MAD-1, which is similar to banking on the NES (AxROM)...in that you have to change the data bank byte (or use long addressing) to change which $8000 byte chunk is being accessed (read/write).

EDIT - and use jump long (change PB) to execute from a different bank.
Re: More SNES questions
by on (#180890)
dougeff wrote:
Quote:
you aren't completely familiar with how address banks work on the 65816.

I'm not familiar with 65816. I'm trying to learn. I see some vector info here...
https://en.wikipedia.org/wiki/Interrupt ... processors

My understanding is that the ROM is divided into $8000 byte chunks by the MAD-1, which is similar to banking on the NES (AxROM)...in that you have to change the data bank byte (or use long addressing) to change which $8000 byte chunk is being accessed (read/write).

EDIT - and use jump long (change PB) to execute from a different bank.

Refer to Chapter 21 (2-21-1 to 2-21-5), 1-2-17, and 1-2-25 to 1-2-35 for SNES memory layouts (re: mode 20 and mode 21; mode 25 is also in there, but that's for advanced usage). The memory layouts/models/modes described there provide a diagram of each layout. I suggest when starting out, use mode 20 -- it makes your life a bit easier. I've discussed mode 20/21 at length in other snesdev posts in the past year, and I'd really rather not go over all of that again.

Thinking of the B register like a NES mapper (AxROM) isn't accurate. B plays a much bigger role than that. These are CPU-level features (same goes for 24-bit addressing), not things that happen as a result of intermediary mapper hardware "swapping" address bus lines around.

I strongly suggest you read the Programming the 65816 (including the 6502, 65C02, and 65802) by Western Design Center. Don't skim it, read it. The manual is written in such a way that it explains things to you in English step-by-step. I suggest the Eyes/Lichty version if at all possible. As someone who knows 6502 already, this book should be a godsend to you. (It was what I read and used exclusively back in my day to learn 65816 relating to the Apple IIGS -- there were no other book-based resources for the CPU, that I knew of, at the time)
Re: More SNES questions
by on (#180903)
tepples wrote:
In public, it's better to cite SNES Development Wiki or Fullsnes


While I love the Fullsnes document, it is not easy to cite, as it is a huge 1 page html file. And the wiki...while the register info is great, doesn't explain things well enough for a SNES noob like me.

I would write my own tutorial, but I don't have enough experience (obviously).

koitsu wrote:
read the Programming the 65816
I have read it, every page.

I had an Apple IIgs also. Too bad I never learned any 65816 programming back then.

And when I said I've never seen examples of raster effects on SNES using H/V counters, I meant code: short code examples of how it would/should work.
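As a starting point for that kind of example, the register setup for an H/V-counter IRQ can be modeled in Python. This is not SNES code, and the register meanings are taken from public register documentation rather than from this thread: $4207/$4208 hold the H target, $4209/$420A the V target, and $4200 bits 5-4 enable the V and H comparisons. The function name is made up for illustration.

```python
def hv_irq_setup(h, v, nmi=True, autojoy=True):
    """Return (register, value) writes that arm an IRQ at dot h of scanline v."""
    if not (0 <= h <= 0x1FF and 0 <= v <= 0x1FF):
        raise ValueError("h and v are 9-bit values")
    nmitimen = 0x30          # bits 5-4 set: IRQ when both V and H counters match
    if nmi:
        nmitimen |= 0x80     # keep vblank NMI enabled
    if autojoy:
        nmitimen |= 0x01     # keep automatic controller reading enabled
    return [
        (0x4207, h & 0xFF), (0x4208, h >> 8),   # H timer target, low/high
        (0x4209, v & 0xFF), (0x420A, v >> 8),   # V timer target, low/high
        (0x4200, nmitimen),
    ]


for reg, val in hv_irq_setup(0x20, 0x70):
    print(f"${reg:04X} <- ${val:02X}")
```

On real hardware the IRQ handler would then acknowledge the interrupt by reading $4211 and perform the mid-screen register change; getting a pixel-exact position still depends on interrupt latency, so this only covers the setup half.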

I'm trying to figure out how to write a homebrew for SNES. Sorry for all the questions. Much appreciated.
Re: More SNES questions
by on (#180908)
dougeff wrote:
While I love the Fullsnes document, it is not easy to cite, as it is a huge 1 page html file.

An HTTP URL can include a fragment identifier, the part after the # character. If the document is HTML, and the fragment identifier matches the value of the id= attribute associated with a particular section of that document, the browser will scroll to that section when viewing the URI. Currently id= is preferred, but for backward compatibility, the browser will also match name= attributes of <a> elements. For example, http://problemkaputt.de/fullsnes.htm#snesppuinterrupts instructs the browser to scroll to an element of fullsnes.htm whose id="snesppuinterrupts" or to an <a name="snesppuinterrupts">.

If you reached a section of the document by following links within the document, the URL including the fragment identifier should be in your browser's address bar. If you reached a section of the document by using "Find text within this page" (Ctrl+F or Cmd-F), a section's heading may include a link to self or a link back to the table of contents. Otherwise, you can right-click, Inspect Element, and root around for a nearby id= or <a name=>.
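The split between the document URL and the fragment identifier is easy to see with Python's standard `urllib.parse` module, using the example URL from above:

```python
from urllib.parse import urlsplit

url = "http://problemkaputt.de/fullsnes.htm#snesppuinterrupts"
parts = urlsplit(url)

print(parts.path)      # the document the server is asked for
print(parts.fragment)  # the id= / name= the browser scrolls to
```

The fragment is handled entirely by the browser, which is why it works even on a single huge HTML file.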

Quote:
I'm trying to figure out how to write a homebrew for SNES. Sorry for all the questions. Much appreciated.

I'd help if I weren't currently busy with The Curse of Possum Hollow.
Re: More SNES questions
by on (#180917)
It's not really bank-switching. The 65816 has a 24-bit address bus; it can access the entire address space from $000000 to $ffffff at all times.

It's entirely possible to run code in one bank that's reading data from another, for instance. And there's nothing stopping you from writing code like this, either:
Code:
; this code is in bank $89
lda $abcdef
sta $cd0123
jmp $ef8888

All changing DB does is affect which address something like lda $1234 refers to within the CPU itself; it doesn't really have anything to do with the cartridge in particular.

In fact, the basic LoROM/HiROM maps don't need a special mapping chip at all (it's just about how the 24 address lines are connected/not connected to ROM).

It's also worth noting that the difference between the banks isn't just what part of the ROM is there. For instance, the system area ($xx0000-xx7fff) is only mapped to banks $00-3f and $80-bf. Also, the SNES's 128 KB of WRAM is mapped to $7e0000-7fffff.
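To illustrate the "it's just wiring" point, here is a Python model of the simplest LoROM arrangement, where each bank's upper 32 KiB maps to the next 32 KiB of the ROM file. It is a deliberately reduced sketch: the function name is hypothetical, and it ignores mirrors in the lower half of banks $40 and up, SRAM, and coprocessors.

```python
def lorom_offset(addr24):
    """Map a 24-bit CPU address to a ROM file offset under plain LoROM wiring."""
    bank = (addr24 >> 16) & 0xFF
    offset = addr24 & 0xFFFF
    if bank in (0x7E, 0x7F):
        raise ValueError("banks $7E-$7F are WRAM, not ROM")
    if offset < 0x8000:
        raise ValueError("lower half of the bank is not ROM in this simple model")
    # Address line A15 is skipped and the bank's top bit is ignored.
    return ((bank & 0x7F) << 15) | (offset & 0x7FFF)


print(hex(lorom_offset(0x808000)))  # same ROM byte as $008000
print(hex(lorom_offset(0x818000)))
```

Because bank bit 7 is dropped, $008000 and $808000 read the same ROM byte; only the access speed can differ.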
Re: More SNES questions
by on (#180922)
I understand.
Re: More SNES questions
by on (#181164)
I have another question. I plan to use ca65, my question is about getting a long address from a label.

Let's say the program is in bank 0, and I have some compressed graphics in bank 1, and I plan to decompress it to WRAM at bank ff, before DMAing it to VRAM.

So my question is, when you make a linker cfg file, usually I would say bank 0 segment start assembling at address $8000, bank 1 segment start assembling at address $8000, etc...

Should I instead say...bank 0 segment, start assembling at address $808000, bank 1 segment, start assembling at address $818000...WRAM segment #2, start at $ff0000.

Then put the compressed graphics in bank 1 segment with a label on it, define an array in WRAM segment #2,...

And if I then (inside the bank 0 code) use the bank 1 label, will it know to use long addressing? And the same with the label for the array in WRAM, will it know to use long addressing when I use the name of my variables/array in the Bank 0 code?

I see you can add f: (24 bit), but I wasn't sure if it would know what to put in the bankbyte position.

EDIT, and then the opposite question. If I start the bank at $808000, wouldn't intra-bank addresses (read/write bank 0 from bank 0) be long...and I would have to use a: to force 16-bit?
Re: More SNES questions
by on (#181166)
Because I don't know quite how to answer your first array of questions but want to help, here's the linker file I use:

Code:
# ca65 linker config for 256 KiB (2 Mbit) sfc file

# Physical areas of memory
MEMORY {
  ZEROPAGE:   start =  $000000, size =  $0100;   # $0000-00ff -- zero page
                                                 # $0100-01ff -- stack
  BSS:        start =  $000200, size =  $1e00;   # $0200-1fff -- RAM
  BSS7E:      start =  $7e2000, size =  $e000;   # SNES work RAM, $7e2000-7effff
  BSS7F:      start =  $7f0000, size = $10000;   # SNES work RAM, $7f0000-$7fffff
  ROM0:       start =  $808000, size =  $8000, fill = yes;
  ROM1:       start =  $818000, size =  $8000, fill = yes;
  ROM2:       start =  $828000, size =  $8000, fill = yes;
  ROM3:       start =  $838000, size =  $8000, fill = yes;
  ROM4:       start =  $848000, size =  $8000, fill = yes;
  ROM5:       start =  $858000, size =  $8000, fill = yes;
  ROM6:       start =  $868000, size =  $8000, fill = yes;
  ROM7:       start =  $878000, size =  $8000, fill = yes;
}

# Logical areas code/data can be put into.
SEGMENTS {
  CODE:       load = ROM0, align =  $100;
  RODATA:     load = ROM0, align =  $100;
  SNESHEADER: load = ROM0, start = $80ffc0;
  CODE1:      load = ROM1, align =  $100, optional = yes;
  RODATA1:    load = ROM1, align =  $100, optional = yes;
  CODE2:      load = ROM2, align =  $100, optional = yes;
  RODATA2:    load = ROM2, align =  $100, optional = yes;
  CODE3:      load = ROM3, align =  $100, optional = yes;
  RODATA3:    load = ROM3, align =  $100, optional = yes;
  CODE4:      load = ROM4, align =  $100, optional = yes;
  RODATA4:    load = ROM4, align =  $100, optional = yes;
  CODE5:      load = ROM5, align =  $100, optional = yes;
  RODATA5:    load = ROM5, align =  $100, optional = yes;
  CODE6:      load = ROM6, align =  $100, optional = yes;
  RODATA6:    load = ROM6, align =  $100, optional = yes;
  CODE7:      load = ROM7, align =  $100, optional = yes;
  RODATA7:    load = ROM7, align =  $100, optional = yes;

  ZEROPAGE:   load = ZEROPAGE, type = zp;
  BSS:        load = BSS,   type = bss, align = $100, optional = yes;
  BSS7E:      load = BSS7E, type = bss, align = $100, optional = yes;
  BSS7F:      load = BSS7F, type = bss, align = $100, optional = yes;
}

dougeff wrote:
And if I then (inside the bank 0 code) use the bank 1 label, will it know to use long addressing

I'm fairly certain. It seems to take into account both what data bank you are on and what direct page is when generating code.

dougeff wrote:
If I start the bank at $808000, wouldn't intra-bank addresses (read/write bank 0 from bank 0) be long...and I would have to use a: to force 16-bit?

In the case of bank 0, you can use direct page, which (unfortunately) cannot move from bank 0. Additionally, while direct page is 16 bit (so it can be anywhere in bank 0), the address that gets added to it is only 8 bit. Also, setting direct page to anything other than a multiple of 256 will make it take an extra cycle and be no faster than absolute addressing.

So, when addressing any bank other than bank 0, you have to use absolute addressing, and when addressing bank 0, you can always use direct addressing, and if "B" is set for bank 0, you can also use absolute addressing. Of course, long addressing can be used to address anywhere, but is slower.
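The three addressing forms can be compared with a small Python model of how the effective address is built in each case. This is a sketch of the non-indexed forms only, assuming native mode; the function names are made up for illustration.

```python
def direct_page_address(d, operand):
    """`lda dp`: D register plus an 8-bit operand, always in bank $00."""
    return (d + (operand & 0xFF)) & 0xFFFF


def absolute_address(dbr, operand):
    """`lda abs`: the data bank register supplies the bank byte."""
    return ((dbr & 0xFF) << 16) | (operand & 0xFFFF)


def long_address(operand):
    """`lda long`: all 24 bits come from the instruction itself."""
    return operand & 0xFFFFFF


def dp_extra_cycle(d):
    """Direct page accesses cost one more cycle when D's low byte is nonzero."""
    return (d & 0xFF) != 0


print(hex(direct_page_address(0x2100, 0x32)))  # D=$2100 reaches PPU registers
print(hex(absolute_address(0x7E, 0x2000)))
print(dp_extra_cycle(0x2100), dp_extra_cycle(0x0001))
```

Pointing D at $2100 or $4300 is a common trick for register-heavy code, since every access then uses the short direct page form.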
Re: More SNES questions
by on (#181168)
There's no WRAM in bank $ff. You can access the first 8 KiB of WRAM at $0000-1fff in each of banks $00-3f and $80-bf, and you can access the full 128 KiB at $7e0000-7fffff, but that is not mirrored to $fe0000-$ffffff.
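The two views of WRAM described here can be written as a small address check. This Python sketch only encodes the ranges stated above; the function name is hypothetical.

```python
def wram_offset(addr24):
    """Return the offset into the 128 KiB of WRAM, or None if not WRAM."""
    bank = (addr24 >> 16) & 0xFF
    offset = addr24 & 0xFFFF
    if bank in (0x7E, 0x7F):
        return ((bank - 0x7E) << 16) | offset      # full 128 KiB view
    if (bank & 0x7F) <= 0x3F and offset < 0x2000:
        return offset                              # low 8 KiB mirror
    return None


for addr in (0x7F0000, 0x001FFF, 0x800100, 0xFF0000):
    print(f"${addr:06X} ->", wram_offset(addr))
```

Note that $FF0000 comes back as `None`: a buffer meant for the second WRAM bank has to be placed at $7F0000.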
Re: More SNES questions
by on (#181169)
Quote:
There's no WRAM in bank $ff...


I meant $7f.

Re:Espozo...thanks, that's very helpful.
Re: More SNES questions
by on (#181456)
Follow up...
I was able to figure out how to get the assembler (ca65) to do everything I wanted.

Labels put in the 'zeropage' segment correctly assemble as 8-bit. (Assuming DP = 0000)

All other labels assembled as 16-bit. If you use a label in another bank it needs an f: to get a long address.

All addresses written out in numbers take the bit-depth you write. Example LDA $812345 correctly assembles into a long address. LDA $01 correctly assembles into a DP address.

LDA #01 immediate, varies depending on directives...
.a16 (A9 01 00)
or .a8 (A9 01)
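The effect of the directives on the emitted bytes can be mimicked with a few lines of Python. This only models the encoding of `lda #imm` (opcode $A9, little-endian operand); it is not how ca65 is implemented, and the function name is made up.

```python
def encode_lda_immediate(value, a16):
    """Encode `lda #value` for 16-bit (.a16) or 8-bit (.a8) accumulator mode."""
    if a16:
        return bytes([0xA9, value & 0xFF, (value >> 8) & 0xFF])
    return bytes([0xA9, value & 0xFF])


print(encode_lda_immediate(0x01, a16=True).hex(" "))
print(encode_lda_immediate(0x01, a16=False).hex(" "))
```

The directive only tells the assembler how many operand bytes to emit; the CPU decides how many to fetch from the M flag at run time, so the two have to agree (for example, `.a16` should follow a `rep #$20`).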

I think I'm figuring it out. Baby steps.

Edit - I'm thinking of trying byuu's SPC assembler...at some point.