Dynamic Sprite Vram Routine Ideas

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Dynamic Sprite Vram Routine Ideas
by on (#140023)
Yes, yes I know the title is terrible. :roll: (couldn't think of what to call it) What I mean by this is, what kind of ways can you make it to where vram for sprites is as crammed as possible? On the metasprite code I made (which I got to be 16 bits! :mrgreen: ) (oh, and yes, this means I'm still using WLA... I found it had the best starter kit, even if it is a buggy POS) I wanted each metasprite to have an offset in vram that would get added to the character number of each individual sprite to give the final number that you would store into vram, much like what I did when I had a "controller" for x and y that everything was based off of, except for vram. The problem is, sprites on the SNES (as you already know) don't use all the different tiles in a straight line (like the GBA and the Genesis?). Instead, they have them in a box-like shape, making what I suggested earlier impossible unless you only use one sprite size. The way I would think would give you maximum vram density (but definitely not efficiency) would be if every tile, or group of tiles (in the case of 16x16, 32x32, and 64x64 sized sprites), had its own register. Every time you add a new sprite, you would look for a register that is 0, meaning that there are no tiles there. If you are uploading a sprite that is 4x larger than the smaller one, you would look every four registers. If you had a large sprite that is 16x larger than the small sprite, then you would look every 16 registers.

Here's a picture to show what I mean:

Attachment:
SpriteVram.png

Oh, (here's a given) this approach is obviously going to be better the larger the small sized sprite is. I personally think 16x16 and 32x32 is the best overall. (Seriously, why did they even consider making 64x64 sized sprites a possibility? Especially when you can only pick two different sprite sizes...)
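
A rough Python model of the "one register per tile" search described in this post. It is not from the original thread. The names `allocate`, `table` and `span` are made up for illustration, and it assumes the registers are ordered so that each aligned group of 4 (or 16) registers corresponds to one box of tiles, as the picture suggests.

```python
# Hypothetical model of the "one register per tile" idea; names are illustrative.
def allocate(table, span):
    """Claim `span` consecutive entries starting on a multiple of `span`.

    table: list of ints, 0 = free, nonzero = occupied.
    span:  1 for a small sprite, 4 for one 4x larger, 16 for one 16x larger.
    Returns the index of the first claimed entry, or None if nothing fits.
    """
    for start in range(0, len(table) - span + 1, span):
        if all(entry == 0 for entry in table[start:start + span]):
            for i in range(start, start + span):
                table[i] = 1
            return start
    return None


table = [0] * 32
first_small = allocate(table, 1)    # takes entry 0
first_big = allocate(table, 4)      # entries 0-3 are partly used, so it takes 4-7
second_small = allocate(table, 1)   # fills the gap at entry 1
```

Because large sprites only ever start on multiples of their own size, small sprites can keep filling the leftover entries of a partly used group without blocking a later large allocation elsewhere.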
Re: Dynamic Sprite Vram Routine Ideas
by on (#140034)
I was under the impression that if you had a lot of small projectiles, 8x8 and 16x16 might be the best so as not to exceed the 34-sliver limit on too many lines. It also lets you use 128x16 (1024 bytes) as the unit of sprite tile updates, uploading five different units plus OAM in each vblank if you aren't making any background changes.

The GBA can do both Super NES-style "box" mode and Genesis-style "line" mode.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140037)
The problem is, if you want a bunch of explosions (like I love) then you are going to want to use 32x32 sprites. I really wish there was something like an 8x16 and 16x32 mode, as all 128 32x32 sprites can't even fit on screen, but all 128 16x16 sprites only fill it about halfway. :?

Oh, can't the GBA actually do both "box" and "line" modes? (Wait you just said that. I'm an idiot...) (The Genesis would have a really hard time with box mode using 24x24 sized sprites...)
Re: Dynamic Sprite Vram Routine Ideas
by on (#140040)
Espozo wrote:
Oh, can't the GBA actually do both "box" and "line" modes? (Wait you just said that. I'm an idiot...) (The Genesis would have a really hard time with box mode using 24x24 sized sprites...)


Let's use proper terminology (because I had a hard time figuring out what either you or tepples were talking about). I dunno about the SNES or Genesis, but in GBAland, we call these modes 1 dimensional (sprite tiles are in a "line" in memory) or 2 dimensional (sprite tiles are in a "box" in memory). The GBA could do both, however, you could not mix and match (well you could with HBlanks and some nifty programming, but generally no).
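
To make the difference concrete, here is a small Python sketch (not from the original post) of how the tile index of each 8x8 piece of a sprite is found in the two GBA mappings. The function names are made up, and the 32-tile grid width is the usual figure for 4bpp sprite tiles on the GBA, stated here as an assumption.

```python
GRID_WIDTH_TILES = 32  # assumed width of the 2D tile matrix (4bpp sprite tiles)


def tile_index_1d(base, sprite_width_tiles, tx, ty):
    """1D mapping: a sprite's tiles follow each other in one line."""
    return base + ty * sprite_width_tiles + tx


def tile_index_2d(base, tx, ty):
    """2D mapping: the next tile row of a sprite is one grid row further on."""
    return base + ty * GRID_WIDTH_TILES + tx


# A 16x16 sprite (2x2 tiles) whose first tile is number 4:
line_layout = [tile_index_1d(4, 2, tx, ty) for ty in range(2) for tx in range(2)]
box_layout = [tile_index_2d(4, tx, ty) for ty in range(2) for tx in range(2)]
```

In the 1D case the four tiles are one contiguous block, which is why a single DMA can upload them; in the 2D case each tile row of the sprite is a separate run.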
Re: Dynamic Sprite Vram Routine Ideas
by on (#140041)
I really didn't know there was a "proper" name for them, I was just saying how they looked. Also, why in the world would you even want to use mode 2 (better? :wink: ) if you have mode 1? Mode 2 may be easier to look at in a vram viewer, but not much else. (It's even harder to dma tile data using it.) I wonder, is mode 2 easier for the hardware to use, or does the SNES just use it because it was simply built that way? (Mode 2 just really seems more complicated in about every aspect.) Just thinking, having non-square sized sprites on the GBA just seems like it would make mode 2 a complete mess.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140047)
Espozo wrote:
Also, why in the world would you even want to use mode 2


2D mode is probably meant to be used with graphics tools and a VRAM viewer. For the graphic artists who didn't care about the programming aspects of the game, having a simple grid they could throw up on the screen and know that's exactly how the hardware would see it was probably a decent benefit for them. With 2D mode, you can reserve only some of the sprite tiles (the ones that need to be animated) to occupy something like the first 4 rows of VRAM, and update them all via DMA while leaving the rest that don't need updates untouched. I've seen that done in a couple of GBA games that use 2D mapping (Super Mario Advance's intro in some parts), so DMAing wouldn't be terrible. You just need careful planning.

Espozo wrote:
Just thinking, having non-square sized sprites on the GBA just seems like it would make mode 2 a complete mess.


Nah, not really. You get enough VRAM so that there's plenty of space to mix and match sprite shapes. I believe Mega Man Zero does this quite well in many places (there is even often VRAM to spare/not update). 2D mode, however, seems more common in games Nintendo developed themselves rather than 3rd party developers, from what I've seen. I think Nintendo's internal tools were geared towards this (also, the GBA defaults to 2D mode, so...)

But enough about the GBA. This is SNESdev after all.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140065)
tepples wrote:
I was under the impression that if you had a lot of small projectiles, 8x8 and 16x16 might be the best so as not to exceed the 34-sliver limit on too many lines. It also lets you use 128x16 (1024 bytes) as the unit of sprite tile updates, uploading five different units plus OAM in each vblank if you aren't making any background changes.


...and then you'll pull your hair out in frustration having to rearrange every animation frame to fit into a 128x16 block.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140070)
psycopathicteen wrote:
...and then you'll pull your hair out in frustration having to rearrange every animation frame to fit into a 128x16 block.

Which is something software can easily do for you. Say your character is 32x32; you can have your tile sheet converter turn each cel into four 16x16 sprites and have two frames per unit.
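
As a sketch of what such a converter could do (Python, not from the original post; `pack_cel` and its constants are hypothetical names), each 32x32 cel is cut into four 16x16 pieces and the pieces are laid side by side in 128x16 strips, so two cels fill one strip:

```python
STRIP_WIDTH = 128   # pixels, one upload unit
PIECE = 16          # pixels, size of each hardware sprite


def pack_cel(cel_index, cel_size=32):
    """Map each 16x16 piece of a square cel to a position in 128x16 strips.

    Returns a list of (src_x, src_y, strip_number, strip_x) tuples, where
    src_x/src_y locate the piece inside the cel and strip_x is where it lands.
    """
    pieces_per_side = cel_size // PIECE
    pieces_per_cel = pieces_per_side * pieces_per_side
    pieces_per_strip = STRIP_WIDTH // PIECE
    placements = []
    for n in range(pieces_per_cel):
        slot = cel_index * pieces_per_cel + n
        src_x = (n % pieces_per_side) * PIECE
        src_y = (n // pieces_per_side) * PIECE
        placements.append((src_x, src_y,
                           slot // pieces_per_strip,
                           (slot % pieces_per_strip) * PIECE))
    return placements
```

The same table can double as the metasprite definition, since it already records where each hardware sprite's pixels came from inside the cel.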
Re: Dynamic Sprite Vram Routine Ideas
by on (#140084)
tepples wrote:
psycopathicteen wrote:
...and then you'll pull your hair out in frustration having to rearrange every animation frame to fit into a 128x16 block.

Which is something software can easily do for you. Say your character is 32x32; you can have your tile sheet converter turn each cel into four 16x16 sprites and have two frames per unit.

Wait a minute, wouldn't you need to arrange the tiles in real time? Otherwise, you would waste DMA time on a metasprite that has less than 64x32 pixels.

Edit: You know, one thing I always wondered was how the DKC games managed with this sort of stuff. Has anyone taken DKC apart enough to know?
Re: Dynamic Sprite Vram Routine Ideas
by on (#140096)
I think that's one of the things that is lacking on the sPPU side of the snes: sprite flexibility. Yeah, having to choose between only two sprite sizes is annoying, but the layout for sprites (IIRC, it's been a while for snes dev stuffs) is limiting too. This is where the PCE and Genesis have more flexibility; sprites (and tiles) can exist anywhere in the whole vram range (even inside tilemap areas). Though for your typical average game, I'm sure it's not a huge deal breaker.

I kinda understand your idea. How memory efficient is it, though?


On a related note, the original Sharp 68k only had one sprite size: 16x16. And 128 SAT entries. And, IIRC, it had 32k of vram for sprites. And 16 sprites per scanline (one site says 32, but a lot say 16. Might be res/mode dependent). Of course, later models upped the video specs (supposedly).
Re: Dynamic Sprite Vram Routine Ideas
by on (#140102)
tomaitheous wrote:
This is where the PCE and Genesis have more flexibility; sprites (and tiles) can exist anywhere in the whole vram range (even inside tilemap areas). Though for your typical average game, I'm sure it's not a huge deal breaker

But they only have about half (5/8 for the Genesis) the number of sprites. So with the Genesis, you can have a 32x16 sprite, but with the SNES, you can have two 16x16 sprites for the exact same effect, only losing slightly more sprites than the Genesis and the exact same amount as the TG16, and it's unlikely you won't have a single sprite that's perfectly square. In fact, explosions and bullets are usually one sprite each and usually take the most sprites, so... (I do wish the SNES had an extra bit for sprite size so you could have all the different sized sprites, but I don't care too much. Explosions and bullets, which are usually the largest and the smallest, usually appear together, so that's a bit of a pain...)

tomaitheous wrote:
I kinda understand your idea. How memory efficient is it, though?

If you mean wram, not at all. If you mean vram, yes. There are 512 tiles available for sprites, so you'd have to use 512 different registers with 8x8 tile sprites, and 128 registers with 16x16 tile sprites. You could always have a register represent 8 different tiles because there are 8 bits in a byte, but then you'd have to do an AND operation up to 8 times, which wastes time, so it would probably be better to spend more wram instead of processing time, as the SNES has plenty of ram. The one problem with this: how would you know where to write for animation updates? Maybe every time you successfully find an open place for a new sprite in vram, you store the result and look it up when you need to dma new tiles? This would use even more ram, because you need to store the answer in a register... (The register would actually just be a general offset for where a sprite is in vram. It would be for finding character data, and for seeing what part of vram needs to be updated.)
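
For comparison, a Python sketch of the bit-packed variant mentioned here, where one byte tracks 8 tiles (not from the original post; the class and method names are made up). It trades 512 bytes of table for 64 bytes, and a full byte can be skipped with one compare before any individual bit is tested:

```python
class TileBitmap:
    """Occupancy table with one bit per tile (hypothetical helper)."""

    def __init__(self, tile_count=512):
        self.bytes = bytearray(tile_count // 8)

    def is_used(self, tile):
        return bool(self.bytes[tile >> 3] & (1 << (tile & 7)))

    def mark(self, tile, used=True):
        mask = 1 << (tile & 7)
        if used:
            self.bytes[tile >> 3] |= mask
        else:
            self.bytes[tile >> 3] &= ~mask & 0xFF

    def find_free(self):
        """Return the lowest free tile number, or None if all are taken."""
        for byte_index, value in enumerate(self.bytes):
            if value == 0xFF:
                continue  # all eight tiles taken, skip without testing bits
            for bit in range(8):
                if not value & (1 << bit):
                    return byte_index * 8 + bit
        return None
```

Whether the smaller table is worth the extra masking on a 65816 is exactly the trade-off the post describes; the sketch only shows that the bookkeeping itself is simple.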
Re: Dynamic Sprite Vram Routine Ideas
by on (#140114)
Sorry to double post, but I think this is pretty significant. I think I kind of have a code that would work with 8x8 and 16x16 sprites. (I don't know how it would work using 16x16 and 32x32 sprites because 128 registers doesn't translate well with 512 tiles.) If there's anything that needs to be fixed, please tell me.

The code assumes 8 bit accumulator with 16 bit indexes
Code:
look_for_8x8_vram:
   cpy #$0100      ;512, because there are 512 slots for sprites (this number can be anything, just not over 512)
   beq no_slot_found   ;no space left for sprite
   lda Vram,y   
   beq 8x8_slot_found      ;there is space for another sprite
   iny
   bra look_for_8x8_vram   ;look again if the space is already occupied

look_for_16x16_vram:
   cpy #$0100            ;512, because there are 512 slots for sprites (this number can be anything, just not over 512)
   beq no_slot_found         ;no space left for sprite
   lda Vram,y            ;upper lefthand corner of square
   bne prepare_for_look_for_16x16_vram   ;look again if the space is already occupied
   lda Vram+1,y            ;upper righthand corner of square   
   bne prepare_for_look_for_16x16_vram   ;look again if the space is already occupied
   lda Vram+16,y            ;lower lefthand corner of square
   bne prepare_for_look_for_16x16_vram   ;look again if the space is already occupied
   lda Vram+17,y            ;lower righthand corner of square
   bne prepare_for_look_for_16x16_vram   ;look again if the space is already occupied
   bra 16x16_slot_found         ;there is space for another sprite         

prepare_for_look_for_16x16_vram:
   inx
   cpx   #$08
   beq   next_row
   iny
   iny
   bra look_for_16x16_vram

next_row:
   tya         ;If this is done right, this should skip every other row of tiles
   adc   #$08
   tay
   bra look_for_16x16_vram

8x8_slot_found:
   inc   Vram,y      ;say that one of the slots is now taken
   sty   TempY      ;TempY holds the vram offset
   rts

16x16_slot_found:
   lda   #$01
   sta   Vram,y      ;say that four of the slots are now taken
   sta   Vram+1,y
   sta   Vram+16,y
   sta   Vram+17,y
   sty   TempY      ;TempY holds the vram offset
   rts

no_slot_found:
   rts   ;nothing else you can really do...

Note: This has not been tested because I cannot currently fit it in with the rest of my code.

Edit: I forgot to make it to where 16x16 sized sprites skipped every other row of tiles vertically, so, if it is correct, I added that. I also fixed prepare_for_look_for_16x16_vram, because I accidentally incremented by 4 tiles instead of 2.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140129)
Espozo wrote:
Edit: You know, one thing I always wondered was how the DKC games managed with this sort of stuff. Has anyone taken DKC apart enough to know?


DKC has every metasprite crammed into a 128x16 box, like the method Tepples mentioned. It's a pretty VRAM efficient method, but be prepared to make a lot of metasprite tables if you don't have a computer program to automatically generate metasprite tables for you.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140131)
Espozo wrote:
tomaitheous wrote:
This is where the PCE and Genesis have more flexibility; sprites (and tiles) can exist anywhere in the whole vram range (even inside tilemap areas). Though for your typical average game, I'm sure it's not a huge deal breaker

But they only have about half (5/8 for the Genesis) the number of sprites. So with the Genesis, you can have a 32x16 sprite, but with the SNES, you can have two 16x16 sprites for the exact same effect, only losing slightly more sprites than the Genesis and the exact same amount as the TG16, and it's unlikely you won't have a single sprite that's perfectly square. In fact, explosions and bullets are usually one sprite each and usually take the most sprites, so... (I do wish the SNES had an extra bit for sprite size so you could have all the different sized sprites, but I don't care too much. Explosions and bullets, which are usually the largest and the smallest, usually appear together, so that's a bit of a pain...)


By half? You mean the size of the SAT (sprite attribute table, or OAM as Nintendo calls it)? Yeah, but I look at it the other way around; you need such a large SAT because of the limited sprite sizes on the SNES. I.e. you need a large SAT especially when building anything out of 8x8 sprites (meta or otherwise).

Just an observation/off topic:
Even if you limited the PCE and MD to 256 res mode, the 64 entry SAT will easily cover the entire screen because they have access to all their sprite sizes in a single frame and aren't limited to square sizes. For MD, that's up to 32x32 which at 64 entries, covers 65k pixels. A 256x224 screen is only 57k pixels. For PCE, with its max size of 32x64, can cover up to 128k pixels; way more coverage than it can show pixel wise. SAT size isn't a problem, even with the PCE's small sprite size of 16x16. Just watch a Lords of Thunder longplay on the hardest setting; revenge bullets on every enemy kill. 64 sprites is a LOT of sprites onscreen (objects, not metasprites). Same for the MD. And you have direct access to the full 64k range; no banking or name table limitations or wrapping issues, etc.
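
The coverage figures above are easy to check. A tiny Python calculation (not from the original post; the helper name is made up):

```python
def coverage(entries, width, height):
    """Total pixels covered if every table entry uses the given sprite size."""
    return entries * width * height


md_coverage = coverage(64, 32, 32)    # 64 entries of 32x32
pce_coverage = coverage(64, 32, 64)   # 64 entries of 32x64
screen_pixels = 256 * 224             # one 256x224 frame
```

That gives 65536 and 131072 sprite pixels against 57344 screen pixels, which matches the "65k", "128k" and "57k" quoted in the post.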

Of course, MD has its own problems: to get around the limited number of subpalettes, I'd use more metasprite cells to access more than one palette in a sprite (that or limit sprites to 8 colors each). Stef mentioned that there's a part in Contra: Hard Corps that maxes out the SAT (80 entries for high res mode), and I thought it was for this reason - but I'm not entirely sure. PCE has a huge number of subpalettes for sprites, but with all sizes being increments of 16 pixels (16x16, 16x32, 16x64, 32x16, 32x32, 32x64), it can waste space if the object is tiny or waste sprite-line-pixel bandwidth on transparent pixels. They all have their drawbacks, but I wouldn't say SAT size is one of them.


On topic:

Isn't the sprite wrap on the SNES sprite page/bank/whatever, at 16k?
Re: Dynamic Sprite Vram Routine Ideas
by on (#140133)
I see we have different opinions... Well, I respect yours anyway! :P The one thing I do have to say though is that having the whole 64KB range for sprites is cool, but maybe not all that practical because of BG tiles.

About vram though, I think it would have been cool if Nintendo (and Sega, for the Sega Genesis) had offered some sort of vram expansion for the SNES, like they did the N64. (Sprites having access to 64KB of vram would be very useful here.) I heard people have done this on the Genesis, but has it ever been done on the SNES? Is it even easily possible?

Oh, and does anyone have any good ideas on how to make my code use 16x16 and 32x32 sprite sizes? I don't really know how to get 128 slots for sprites to easily match up with 512 tiles...
Re: Dynamic Sprite Vram Routine Ideas
by on (#140146)
I see a couple mistakes. #$0100 is 256, #$0200 is 512. TYA and TAY only transfer the low 8 bits of Y because the accumulator is 8-bit, though this would not be a problem with 16x16s and 32x32s. After counting X up to 8, you didn't reset it.

Now this is how to calculate the tile number, when using 16x16s and 32x32s.

Code:
tya
asl
and #$f0
sta TempY
tya
and #$07
ora TempY
rep #$20
and #$00ff
asl
sta TempY
sep #$20
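
For readers who want to check the routine above without an assembler, here is a step-by-step Python model of it (not from the original post). It assumes the slot index in Y is `row * 8 + column` over a grid of 128 slots of 16x16 pixels; `slot_to_tile` is a made-up name.

```python
def slot_to_tile(y):
    """Follow the assembly above step by step for a slot index 0-127."""
    a = (y << 1) & 0xFF        # tya / asl with an 8-bit accumulator
    temp = a & 0xF0            # and #$f0 / sta TempY   -> row bits
    a = y & 0x07               # tya / and #$07         -> column bits
    a |= temp                  # ora TempY
    a &= 0x00FF                # rep #$20 / and #$00ff  -> clear the high byte
    return (a << 1) & 0xFFFF   # asl / sta TempY        -> final tile number


first_ten = [slot_to_tile(slot) for slot in range(10)]
```

The result is `row * 32 + column * 2`: each slot is two tiles wide, and each slot row skips two rows of 16 tiles.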
Re: Dynamic Sprite Vram Routine Ideas
by on (#140147)
tomaitheous wrote:
Isn't the sprite wrap on the SNES sprite page/bank/whatever, at 16k?

Huh?

SNES has two active 8 kB sprite tables (square, 128x128) at any given moment. They can be located more or less anywhere in VRAM, with the PPU informed of their locations via OBSEL ($2101); they do not need to be adjacent. I'm not sure if you can change OBSEL during HBlank (though I certainly hope so, and I should probably test this), but I know you can change it between frames.

The format is SNES bitplane 4bpp, of course, and each row of 16 tiles is contiguous in VRAM, followed immediately by the next row down. That's the only "wrap" I'm aware of.
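
A Python sketch of what that layout means for uploads (not from the original post; `upload_runs` is a hypothetical helper). A 4bpp 8x8 tile is 32 bytes and a table row is 16 tiles, so a sprite taller than 8 pixels needs one contiguous run per tile row. Offsets are in bytes relative to the start of the sprite table, and the sketch assumes the sprite does not run off the right or bottom edge of the table.

```python
BYTES_PER_TILE = 32      # 8x8 pixels at 4 bits per pixel
TILES_PER_ROW = 16       # one table row is 128 pixels wide


def upload_runs(tile, size_px):
    """Return (byte_offset, byte_length) runs for a square sprite's tiles.

    tile:    tile number of the sprite's top-left 8x8 tile (0-255).
    size_px: 8, 16, 32 or 64.
    """
    tiles_per_side = size_px // 8
    return [((tile + row * TILES_PER_ROW) * BYTES_PER_TILE,
             tiles_per_side * BYTES_PER_TILE)
            for row in range(tiles_per_side)]
```

So a 16x16 sprite at tile 2 is two 64-byte runs, 512 bytes apart, rather than one 128-byte block.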
Re: Dynamic Sprite Vram Routine Ideas
by on (#140159)
You can insert a 0 bit into a register by masking off the low bits and adding to itself. I learned this a couple days ago. Here's the basic idea in 6502 or in 65816 8-bit mode (sep #$20):
Code:
              ; C  A         tmp
              ; ?  hgfedcba  ????????
  sta tmp     ; ?  hgfedcba  hgfedcba
  and #$F8    ; ?  hgfed000  hgfedcba
  clc         ; 0  hgfed000  hgfedcba
  adc tmp     ; h  gfed0cba  hgfedcba


Or in 65816 16-bit mode (rep #$20), also inserting a bit at the bottom:
Code:
              ; C  BA                  tmp
              ; ?  ???????? hgfedcba   ???????? ????????
  and #$00FF  ; ?  00000000 hgfedcba   ???????? ????????
  asl a       ; 0  0000000h gfedcba0   ???????? ????????
  sta tmp     ; 0  0000000h gfedcba0   0000000h gfedcba0
  and #$01F0  ; 0  0000000h gfed0000   0000000h gfedcba0
  adc tmp     ; 0  000000hg fed0cba0   0000000h gfedcba0
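
The same mask-and-add trick in Python, as a quick way to convince yourself it works (not from the original post; `insert_zero_bit` is a made-up name). Adding the masked high part to the original value doubles the high bits, which shifts them left by one and leaves a 0 in the gap.

```python
def insert_zero_bit(value, position):
    """Insert a 0 bit at `position`, shifting the higher bits left by one."""
    high = value & ~((1 << position) - 1)   # mask off the low bits
    return value + high                     # adding the high part doubles it


# 8-bit example from above: the 0 goes in at bit 3 (and #$F8).
eight_bit = insert_zero_bit(0b11111111, 3)

# 16-bit example from above: shift left once, then insert at bit 4 (and #$01F0).
tiles = [insert_zero_bit(slot << 1, 4) for slot in range(10)]
```

Note that the Python result keeps the bit that the 8-bit version pushes out into the carry flag.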
Re: Dynamic Sprite Vram Routine Ideas
by on (#140173)
psycopathicteen wrote:
I see a couple mistakes. #$0100 is 256, #$0200 is 512. TYA and TAY only transfer the low 8 bits of Y because the accumulator is 8-bit, though this would not be a problem with 16x16s and 32x32s. After counting X up to 8, you didn't reset it.

Oh...

psycopathicteen wrote:
Now this is how to calculate the tile number, when using 16x16s and 32x32s.
Code:
tya
asl
and #$f0
sta TempY
tya
and #$07
ora TempY
rep #$20
and #$00ff
asl
sta TempY
sep #$20

Would that work like if a single register represented a single tile, or if a register represented a 16x16 tile? Also, sorry, but where would you put the code? When you jump back?

Here's the code I kind of converted, just changing the names of things, fixing the bugs from the other code (y is now 8bit, meaning tya and tay should be fine) and doing some stuff to act like you are using 32x32 sprites.

Code:
look_for_16x16_vram:
   cpy #$80      ;128, because there are 128 slots for 16x16 sprites (this number can be anything, just not over 128)
   beq no_slot_found   ;no space left for sprite
   lda Vram,y   
   beq 16x16_slot_found      ;there is space for another sprite
   iny
   bra look_for_16x16_vram   ;look again if the space is already occupied

look_for_32x32_vram:
   cpy #$80            ;128, because there are 128 slots for 16x16 sprites (this number can be anything, just not over 128)
   beq no_slot_found         ;no space left for sprite
   lda Vram,y            ;upper lefthand corner of square
   bne prepare_for_look_for_32x32_vram   ;look again if the space is already occupied
   lda Vram+1,y            ;upper righthand corner of square   
   bne prepare_for_look_for_32x32_vram   ;look again if the space is already occupied
   lda Vram+8,y            ;lower lefthand corner of square
   bne prepare_for_look_for_32x32_vram   ;look again if the space is already occupied
   lda Vram+9,y            ;lower righthand corner of square
   bne prepare_for_look_for_32x32_vram   ;look again if the space is already occupied
   bra 32x32_slot_found         ;there is space for another sprite         

prepare_for_look_for_32x32_vram:
   inx
   cpx   #$04
   beq   next_row
   iny
   iny
   bra look_for_32x32_vram

next_row:
   tya         ;If this is done right, this should skip every other row of slots (column 6 + 10 = column 0, two rows down)
   clc         ;cpx left the carry set, so clear it before adding
   adc   #$0A
   tay
   ldx   #$00
   bra look_for_32x32_vram

16x16_slot_found:
   inc   Vram,y      ;say that one of the slots is now taken
   sty   TempY      ;TempY holds the vram offset
   rts

32x32_slot_found:
   lda   #$01
   sta   Vram,y      ;say that four of the slots are now taken
   sta   Vram+1,y
   sta   Vram+8,y
   sta   Vram+9,y
   sty   TempY      ;TempY holds the vram offset
   rts

no_slot_found:
   rts   ;nothing else you can really do...


Oh wait, I'm dumb, are you, tepples, speaking of a way how to have one register represent 8 tiles because, you know, each register holds 8 bits?
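
A Python reference model of what the search above is trying to do, useful for checking the assembly against (not from the original post; function names are made up). It assumes a table of 128 slots laid out 8 per row, each slot standing for one 16x16 block, with 32x32 sprites only allowed on even columns and even rows.

```python
COLUMNS = 8     # 16x16 slots per row of the 128-pixel-wide table
SLOTS = 128


def find_16x16(table):
    """Claim the first free slot; return its index or None."""
    for slot in range(SLOTS):
        if table[slot] == 0:
            table[slot] = 1
            return slot
    return None


def find_32x32(table):
    """Claim the first free aligned 2x2 group of slots; return its top-left index."""
    for row in range(0, SLOTS // COLUMNS, 2):
        for col in range(0, COLUMNS, 2):
            slot = row * COLUMNS + col
            corners = (slot, slot + 1, slot + COLUMNS, slot + COLUMNS + 1)
            if all(table[c] == 0 for c in corners):
                for c in corners:
                    table[c] = 1
                return slot
    return None
```

The 32x32 candidates visited are 0, 2, 4, 6, 16, 18, and so on, which is the sequence the `next_row` step in the assembly has to reproduce.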
Re: Dynamic Sprite Vram Routine Ideas
by on (#140181)
I'm talking about how to efficiently "calculate the tile number, when using 16x16s and 32x32s" as psycopathicteen put it, how to efficiently index into the following sequence:

[0, 2, 4, 6, 8, 10, 12, 14, 32, 34, 36, 38, 40, 42, 44, 46, 64, 66, 68, 70, 72, 74, 76, 78, 96, ...]

Do I need to draw and post a diagram, as usual?
Re: Dynamic Sprite Vram Routine Ideas
by on (#140182)
Sorry! I just didn't really look at it... :oops: (You do have to remember that not everyone is as cool as you. :wink: )
Re: Dynamic Sprite Vram Routine Ideas
by on (#140208)
There is a way to check individual bits of a byte. There's an instruction called "bit abs,x" that works like "and abs,x" except it doesn't affect the accumulator, only the zero flag.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140353)
Espozo wrote:
I see we have different opinions... Well, I respect yours anyway! :P The one thing I do have to say though is that having the whole 64KB range for sprites is cool, but maybe not all that practical because of BG tiles.


It's not just 64k; there are no page or bank limits on those systems. That's a pretty nice advantage when you're creating games that need unique sprite frames for each object (unique for the frame) and/or double buffering too, and/or more room for stored animation. For example, a very common setup is 32k for sprites and 32k for tilemap and tiles. It's not the full 64k range, but still way more flexible than two banks of 8k (256 cells), and those said banks wrap (no falling through to the next bank). The SNES's larger SAT does nothing to address that issue, but having 6 or more sprite sizes readily available in each SAT/OAM entry does directly address the issue of a smaller SAT/OAM size; it helps negate that issue. Just look at the SNES itself: if you chose sprite sizes of 16x16/32x32, do you really think you need that full 128 SAT/OAM size?? No. But if you went with something like 8x8/16x16, then anything outside of those two sizes requires more than one OAM entry. 64 entries with a sprite size of 32x32 still have more accumulated pixel coverage than a 256x224/239 screen. I.e. more pixels than you can show. I mean, unless your game has no individual objects larger than 16x16 (Gauntlet clone?)

The larger OAM table entry size is more of a supplement to a specific limitation (8x8/16x16), than a general advantage by comparison. That's not to say there aren't specific case advantages (particle effects, etc). SNES has a lot of advantages in the video department of the other systems, but sprites in general isn't really one of them (with the exception of cells and objects per line). So I don't see it as a matter of opinion (or a difference of opinion), but a matter of application in practice (the proof of the pudding is in the eating?).

Quote:
That's the only "wrap" I'm aware of.

The wrap on the table of 16x16 cells (8kbyte page in this context), if the sprite size is larger than 8x8.

psycopathicteen wrote:
There is a way to check individual bits of a byte. There's an instruction called "bit abs,x" that works like "and abs,x" except it doesn't affect the accumulator, only the zero flag.


Should affect Z, N, and V flags. That's not an advantage in this context, I take it?
Re: Dynamic Sprite Vram Routine Ideas
by on (#140358)
The opinion part was mostly that I think that 16x16 and 32x32 on the SNES is about equal with the sprite configuration on the TurboGrafx-16/PC Engine. Unless Wikipedia's wrong (wouldn't surprise me :roll: ), the PCE uses 16x16 sized sprites as its smallest size, unlike the SNES and the Genesis, which use 8x8 sprites as the smallest size. Because the PCE has double-wide sprites (excluding the 16x64 size) but only half the number of sprites, the SNES using 16x16 and 32x32 and the PCE can cover the same amount of screen with sprites (not counting overdraw, but I'd hardly call 16 more pixels an "advantage" :roll: ), but there are different advantages for both. In the event you want the entire screen to be filled with 16x64 sized sprites, you can do this on the PCE, but not the SNES, because it doesn't have 4x the number of sprites. If you had the entire screen filled with 32x64 sprites on the PCE, you could do the same thing with the SNES because although you don't have 32x64 sized sprites, you still have 2x the number of sprites, and they are 32x32, so you could just piece them together. You could also have a 64x32 sized "sprite" on the SNES, and you can do the same thing on the PCE, but it will cost you twice as much. I think that both systems have different advantages (SNES having 2x the number of sprites, PCE having more sprite sizes and, more importantly, a lot more flexibility with sprite tiles), so it's impossible to really declare a "winner", as both are better for different situations. :wink:
Re: Dynamic Sprite Vram Routine Ideas
by on (#140362)
Espozo wrote:
you don't have 32x64 sized sprites

Yes you do. Set the top three bits of OBSEL to 110 and that's the Large sprite size. Small is 16x32.

Reportedly the vertical flip function works on each half separately, though, which might be why Nintendo didn't bother telling anyone about this feature...
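
For reference, a Python lookup of the size pairs selected by the top three bits of OBSEL (not from the original post). The table follows commonly circulated register documentation and should be treated as an assumption rather than something confirmed in this thread; `sprite_sizes` is a made-up name.

```python
# (small size, large size) as (width, height), indexed by OBSEL bits 7-5.
OBSEL_SIZES = {
    0b000: ((8, 8), (16, 16)),
    0b001: ((8, 8), (32, 32)),
    0b010: ((8, 8), (64, 64)),
    0b011: ((16, 16), (32, 32)),
    0b100: ((16, 16), (64, 64)),
    0b101: ((32, 32), (64, 64)),
    0b110: ((16, 32), (32, 64)),   # the setting described in this post
    0b111: ((16, 32), (32, 32)),
}


def sprite_sizes(obsel):
    """Return the (small, large) sprite sizes for an OBSEL register value."""
    return OBSEL_SIZES[(obsel >> 5) & 0b111]
```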
Re: Dynamic Sprite Vram Routine Ideas
by on (#140363)
I meant with the 16x16 and 32x32 combo. (probably should have specified that... :oops: ) Also just thinking, I know this isn't exactly a sprite issue (it's a BG one), but many games on the PCE use sprites for background elements, reducing the number of sprites left to use by a bit.
Re: Dynamic Sprite Vram Routine Ideas
by on (#140374)
I'm wondering. If you have the same sprite shown on two or more consecutive frames, and to save on DMA bandwidth, you're keeping the sprites in the same place in vram, how were you planning on keeping track of the locations? I have two ideas:

1) Each object slot holds the tile numbers for all sprites in the object's metasprite.
2) Each object slot holds a pointer to where in the OAM the object's metasprite was written to in the previous frame, so it can look up the tile numbers.
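
A minimal Python sketch of idea 1 (not from the original post). `ObjectSlot`, `update_object` and the three callbacks are hypothetical; `allocate`, `release` and `upload` stand in for whatever VRAM allocator and DMA queue the game actually has.

```python
class ObjectSlot:
    """Idea 1: the object remembers the tile numbers its metasprite uses."""

    def __init__(self):
        self.frame_id = None   # animation frame currently resident in VRAM
        self.tiles = []        # tile numbers holding that frame


def update_object(obj, frame_id, tiles_needed, allocate, release, upload):
    """Return the object's tile numbers, uploading only when the frame changed."""
    if obj.frame_id == frame_id:
        return obj.tiles               # same frame as last time: no DMA needed
    for tile in obj.tiles:
        release(tile)                  # give the old frame's tiles back
    obj.tiles = [allocate() for _ in range(tiles_needed)]
    for index, tile in enumerate(obj.tiles):
        upload(frame_id, index, tile)  # queue one tile of the new frame
    obj.frame_id = frame_id
    return obj.tiles
```

Idea 2 would replace the `tiles` list with a single pointer into last frame's OAM buffer, saving RAM per object at the cost of depending on that buffer still being intact.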
Re: Dynamic Sprite Vram Routine Ideas
by on (#140436)
Espozo wrote:
I meant with the 16x16 and 32x32 combo. (probably should have specified that... :oops: ) Also just thinking, I know this isn't exactly a sprite issue (it's a BG one), but many games on the PCE use sprites for background elements, reducing the number of sprites left to use by a bit.


Depends on the game. It's not that common, but Lords of Thunder and Dracula X: Rondo of Blood probably do it the most.
I made a video a while back showing a PCE game using sprites for non object stuffs:
https://www.youtube.com/watch?v=Mm4pKt_LE2k
https://www.youtube.com/watch?v=PQX8KfYT-zc
All real BG stuff is greyscale and all sprites are monochrome (green or red). If it looks like separate scrolling layers in the BG and it's not monochrome green/red, then it's either hsync line scrolls and/or dynamic tiles.

Watch the second video, then watch this. The pillars are sprites. As many sprites as there are on screen at that moment, the SAT isn't maxed out AFAIK. Though it's probably pretty close. But I find those results very acceptable, considering you really don't see snes games with such a number of objects on screen - let alone exceed it.

The problem with using sprites for more complex overlapping BG parts, is that it increases the risk of hitting the sprite scanline pixel limit - not maxing out the SAT. I have a WIP demo that uses a ton of sprites to simulate a large free-directional second BG layer on the PCE, way more than any official game and the SAT isn't maxed (enough entries left for a SMB3 or Sonic style game objects. It's the scanline limit that's the issue, not so much the SAT size). Anyway, I can't think of a better real example than the last video I linked (superhard mode for PCECD; revenge bullets), for how the SAT size isn't an issue at that res. Matter of fact, on the Genesis it's 64 too. It's only bumped to 80 when using 320 mode, because there's 25% more pixel coverage needed for the same screen real estate/area.

Quote:
Reportedly the vertical flip function works on each half separately,

Which kinda makes them useless. That, and the fixed rectangle size for both entries.
Quote:
because it said the PCE used 16x16 sized sprites as its smallest size

That's correct. Though there's a way to make it do vertical cell size of 8 pixel increments (XXby8,XXby16,XXby32), but no official games do this.

Quote:
If you had the entire screen filled with 32x64 sprites on the PCE, you could do the same thing with the SNES because although you don't have 32x64 sized sprites, you still have 2x the number of sprites, and they are 32x32, so you could just piece them together


But see, this is my point. As soon as you start to compare/argue/whatever larger sprite sizes than 16x16, it really becomes irrelevant. Because you're gonna hit that sprite scanline pixel limit way before you hit the full entry of the SAT/OAM. That's my point. From games that I played over the years of that era, 16x16, 32x16, 16x32, and 32x32 seem to be the most common sizes. Or close to those sizes but the same range (Genesis has more options for in between sizes).

Your examples aid my point; the larger SAT/OAM table is meaningless for sprite sizes outside of 8x8/16x16, because all these systems have a pretty limited sprite scanline limit, and you're gonna hit that limit well before you hit the table limit.

I do think the 8x8/16x16 mode with the 128-entry OAM has a lot of advantages and is the best mode of the SNES. It's just more cpu overhead (coupled with the layout of the upper OAM table, etc). 16x16/32x32 makes sense for reducing cpu overhead, but I don't think it's a better mode; just more practical in regards to cpu usage. I just don't see how the 128 table size is advantageous outside of that mode, especially when real world examples show otherwise.


If you're interested in discussing this further (examples, etc), maybe we should start a new thread instead of derailing/side tracking this one?

So this organization routine you're writing - it's for 16x16/32x32 mode, right? Because of how the sprite cell rows are interleaved?
Re: Dynamic Sprite Vram Routine Ideas
by on (#140448)
psycopathicteen wrote:
I'm wondering. If you have the same sprite shown on two or more consecutive frames, and to save on DMA bandwidth, you're keeping the sprites in the same place in vram, how were you planning on keeping track of the locations? I have two ideas:

1) Each object slot holds the tile numbers for all sprites in the object's metasprite.
2) Each object slot holds a pointer to where in the OAM the object's metasprite was written to in the previous frame, so it can look up the tile numbers.

The way I was thinking of doing it would be to go through the code, find the result, and then store it in a register offset by whichever sprite it is currently on (so there will be 128 of them). When my code jumps to the metasprite routine, it loads the values for the metasprite table into a sprite buffer offset by y. You would probably load the tile number byte offset by y (unfortunately, this means using 512 registers...) and then add the number to the bytes in OAM that hold the character selection bits. To be completely honest with you, I really haven't even thought this far. (And it's not even far at all... :oops: )
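As a rough Python model of idea 1 from the quote (each object slot remembers its own tile numbers), assuming hypothetical `allocate_tile`, `free_tile` and `queue_dma` callbacks that stand in for whatever the real engine provides:

```python
class ObjectSlot:
    """One game object; remembers the VRAM tile numbers of its metasprite."""
    def __init__(self):
        self.frame_id = None   # animation frame currently sitting in VRAM
        self.tiles = []        # one tile number per hardware sprite

def update_object(slot, frame_id, sprite_count,
                  allocate_tile, free_tile, queue_dma):
    """Re-upload graphics only when the animation frame changes."""
    if slot.frame_id == frame_id:
        return slot.tiles                  # same frame: reuse cached numbers
    for tile in slot.tiles:                # give the old tiles back
        free_tile(tile)
    slot.tiles = [allocate_tile() for _ in range(sprite_count)]
    for index, tile in enumerate(slot.tiles):
        queue_dma(frame_id, index, tile)   # schedule the graphics upload
    slot.frame_id = frame_id
    return slot.tiles
```

The cost is a few bytes of RAM per sprite per object; the benefit is that an unchanged frame needs no lookup in last frame's OAM and no DMA at all.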
Re: Dynamic Sprite Vram Routine Ideas
by on (#140619)
A bit of a bump but, you know, another thing I'm starting to wonder is how you are going to tell the metasprite routine where to look for the tiles in each sprite. It's easy to do it for x and y position, because you can always find each sprite's position relative to the metasprite's x and y, but you can't really do this for character data, because in the routine I made, there are plenty of times where there could be a gap in between sprites in a metasprite due to how it handles double sized sprites. I have my original metasprite code (that still has plenty of work to be done :wink: ) right here,

Code:
: lda MetaspriteCount       ; If MetaspriteCount is zero, then we're done.  Otherwise we have
  beq done                  ; metasprites to iterate over and populate for DMA (see VBlank)
  lda MetaspriteTable,x     ; 1st word = sprite X offset (value 0-255)
  clc                       ; Clear carry so it can't leak into the addition
  adc MetaspriteXPosition
  and #$00FF
  sta SpriteBuf1,y          ; Store sprite X position in SpriteBuf1,y
  lda MetaspriteTable+2,x   ; 2nd word = sprite Y offset (value 0-255)
  clc
  adc MetaspriteYPosition
  and #$00FF
  sta SpriteBuf1+1,y        ; Store sprite Y position in SpriteBuf1+1,y
  lda MetaspriteTable+4,x   ; 3rd word = character number and attributes
  sta SpriteBuf1+2,y        ; Store character/attribute word in SpriteBuf1+2,y
  txa
  clc
  adc #$0006                ; Increment X by 6 because each metasprite table entry is 3 words
  tax
  tya
  clc
  adc #$0004                ; Increment Y by 4 because each sprite in the OAM table has 4 bytes
  tay
  dec MetaspriteCount       ; Decrement MetaspriteCount by 1
  bra :-                    ; Back to the loop...


and the makeshift one right here. (The hard part was trying to index for 3 different things.) I really don't want to jump to the routine that finds tile numbers from inside the metasprite routine, because that would cause me to find the tile numbers of sprites that aren't even going to undergo any animation change.

Code:
: lda MetaspriteCount       ; If MetaspriteCount is zero, then we're done.  Otherwise we have
  beq done                  ; metasprites to iterate over and populate for DMA (see VBlank)
  lda MetaspriteTable,x     ; 1st word = sprite X offset (value 0-255)
  clc                       ; Clear carry so it can't leak into the addition
  adc MetaspriteXPosition
  and #$00FF
  sta SpriteBuf1,y          ; Store sprite X position in SpriteBuf1,y
  lda MetaspriteTable+2,x   ; 2nd word = sprite Y offset (value 0-255)
  clc
  adc MetaspriteYPosition
  and #$00FF
  sta SpriteBuf1+1,y        ; Store sprite Y position in SpriteBuf1+1,y
  lda MetaspriteTable+4,x   ; 3rd word = character number and attributes
  sty TempY1                ; Save the OAM buffer index while Y indexes the offset table
  ldy MetaspriteCharacterOffsetOffset ;(a bit redundant, but I can't think of a better title right now...)
  clc
  adc MetaspriteCharacterOffset,y ; Add this sprite's VRAM character offset
  iny
  sty MetaspriteCharacterOffsetOffset
  ldy TempY1                ; Restore the OAM buffer index
  and #$01FF                ; Keep the 9-bit character number (this also clears the attribute bits)
  sta SpriteBuf1+2,y        ; Store character word in SpriteBuf1+2,y
  txa
  clc
  adc #$0006                ; Increment X by 6 because each metasprite table entry is 3 words
  tax
  tya
  clc
  adc #$0004                ; Increment Y by 4 because each sprite in the OAM table has 4 bytes
  tay
  dec MetaspriteCount       ; Decrement MetaspriteCount by 1
  bra :-                    ; Back to the loop...

Just so you know, this is using ca65 assembler.
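For anyone who would rather read the idea than the assembly, here is a rough Python model of what the second loop is meant to do. The names are made up, and unlike the posted code this sketch keeps the attribute bits (flip, priority, palette) in the upper 7 bits of the word instead of masking them away.

```python
def build_oam(metasprite, base_x, base_y, char_offsets):
    """Model of the metasprite loop.

    metasprite:   list of (dx, dy, tile_word), one entry per hardware sprite
    char_offsets: per-sprite VRAM character offset, same order
    Returns a list of (x, y, tile_word) OAM entries.
    """
    oam = []
    for (dx, dy, tile_word), offset in zip(metasprite, char_offsets):
        x = (dx + base_x) & 0xFF
        y = (dy + base_y) & 0xFF
        attributes = tile_word & 0xFE00                 # upper 7 bits
        tile = ((tile_word & 0x01FF) + offset) & 0x01FF  # 9-bit character number
        oam.append((x, y, attributes | tile))
    return oam

# Two 16x16 sprites side by side, relocated by $20 tiles.
entries = build_oam([(0, 0, 0x3002), (16, 0, 0x3004)], 100, 50, [0x20, 0x20])
print([tuple(hex(v) for v in e) for e in entries])
```

Because every sprite carries its own offset, gaps between the sprites of one metasprite in VRAM stop mattering.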
Re: Dynamic Sprite Vram Routine Ideas
by on (#141770)
93143 wrote:
I'm not sure if you can change OBSEL during HBlank (though I certainly hope so, and I should probably test this), but I know you can change it between frames.

Well, last night I decided to check this, and now I'm sure. You can change at least the table locations by writing to OBSEL during HBlank. Transition is seamless, so far as I can tell.

(This test ROM lets you move a red square around the screen. Halfway down, it turns green. No modifications to CGRAM are involved.)

It's probable I'm not the first to find this out, since every emulator I tried (even ZSNES) gets it right. But I wanted to be certain, because this was one of the few remaining potential wrenches in the works that could have derailed my porting project. I have a lot of sprites that never move, and even without anything else going on they don't fit in 16 kB... unless I waste a lot of 8x8 sprites to avoid tile duplication, in which case they seem to just barely fit in 16 kB with little or no room for anything else...

EDIT: replaced .rar with .7z in deference to the moderators. Also note that this doesn't quite work properly in the latest version of higan; the switch seems to be reversed (green on top, red on bottom). In the accuracy core, the switches seem to happen one scanline late, the most obvious effect of which is that the top scanline is rendered in the same colours as the bottom half of the screen; this happens as far back as bsnes v072.
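For reference, a small Python sketch of how the OBSEL ($2101) value and the two sprite tile table addresses relate, going by the commonly documented `sssnnbbb` layout (size select, name select, name base). The function names are made up for the example.

```python
def obsel_value(size_select, name_select, name_base):
    """Pack the OBSEL ($2101) fields: sssnnbbb."""
    if not (0 <= size_select < 8 and 0 <= name_select < 4 and 0 <= name_base < 8):
        raise ValueError("field out of range")
    return (size_select << 5) | (name_select << 3) | name_base

def table_word_addresses(obsel):
    """VRAM word addresses of the first and second sprite tile tables."""
    base = ((obsel & 0x07) << 13) & 0x7FFF           # 8K-word steps
    gap = (((obsel >> 3) & 0x03) + 1) << 12          # 4K-word steps
    return base, (base + gap) & 0x7FFF

top = obsel_value(3, 0, 0)      # size select 3 = 16x16/32x32
bottom = obsel_value(3, 0, 1)   # same sizes, tables moved up by 8K words
print(hex(top), [hex(a) for a in table_word_addresses(top)])
print(hex(bottom), [hex(a) for a in table_word_addresses(bottom)])
```

Writing the second value to $2101 from an HBlank handler partway down the screen is the kind of switch the test ROM performs.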
Re: Dynamic Sprite Vram Routine Ideas
by on (#141788)
If you were doing this though, wouldn't you have to duplicate a bunch of tiles to be on both sprite tables? Basically, you have to reserve half of vram for sprites even if you really only need about 24KB. The main problem I see with this though is that you'd have to update vram twice for one object. Maybe your code could look to see where your object is vertically and only update the table that corresponds to what part of the screen it's on? You'd still have to write to both if it is in the dead center.
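Something like this, as a quick Python sketch (the names and the split line are hypothetical):

```python
def tables_to_update(top, height, split_line):
    """Which sprite tables need an object's tiles (0 = upper, 1 = lower)."""
    bottom = top + height - 1        # last scanline the object covers
    tables = []
    if top < split_line:
        tables.append(0)             # some of it is above the OBSEL switch
    if bottom >= split_line:
        tables.append(1)             # some of it is below the switch
    return tables

print(tables_to_update(10, 32, 112))    # fully above the split
print(tables_to_update(100, 32, 112))   # straddles the split
```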
Re: Dynamic Sprite Vram Routine Ideas
by on (#141812)
In the general case, yes, it'd be tricky. My game is a unique case in that a lot of the sprites have a defined position on the screen that doesn't change during gameplay.

It's a bullet hell port, that can easily have hundreds of bullets onscreen, and some of the backgrounds really want to be in Mode 7. This means (a) I need the Super FX chip or something similar to blit the bullets to a surface, and (b) the blitted surface sometimes has to be a tiled sprite layer, because Mode 7 doesn't let you use any other background layers. The sprites in the bullet layer never move, so they only need to be stored once for a given frame.

Plus, the sidebar with score and lives and whatnot would eat too many sprites if I just overlaid it on Mode 7, so I need to switch from Mode 7 to Mode 1 about two-thirds of the way across the screen. This generates a column of garbage a few tiles wide, which needs to be masked with - you guessed it - more sprites. Which also never move...

...

I figure I can probably get away with using one sprite table for all of the stuff that can show up anywhere plus maybe one or two of the 32x32 mode switch masking sprites. Then I can simply modify the offset of the second table to give me 3/2 buffering for the bullet layer, in which I use two tables for a frame, overwrite the third table during VBlank, use the same two tables (giving me 30 fps), overwrite one of the just-used tables, and then change to using the two newly-overwritten tables for the third frame; rinse and repeat. Minimal duplication. I may have to reserve more space for BG tiles, which would cut into the space available for Mode 7 data, but I'll cross that bridge when I come to it.
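One way to read that rotation, as a Python sketch. The table numbering and the generator name are made up, and this is only a model of the description above, not tested on hardware.

```python
def three_halves_buffering(cycles):
    """Yield (displayed_pair, table_written_in_the_following_vblank)."""
    first, second, spare = 0, 1, 2
    for _ in range(cycles):
        # Frame 1: show the pair, then fill the spare table during VBlank.
        yield (first, second), spare
        # Frame 2: show the same pair again (30 fps), then overwrite one of
        # the tables that was just displayed.
        yield (first, second), first
        # The two freshly written tables become the next displayed pair.
        first, second, spare = spare, first, second

for shown, written in three_halves_buffering(3):
    print(f"display tables {shown}, then overwrite table {written}")
```

Each table is only ever overwritten either while it is unused or right after its last display, so three tables are enough for a picture that needs two.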

Espozo wrote:
Maybe your code could look to see where your object is vertically and only update the table that corresponds to what part of the screen it's on? You'd still have to write to both if it is in the dead center.

Depending on what you were doing, you might even want to move the table switch up and down depending on where stuff is... That wouldn't work for my game, but I'm sure it's theoretically possible to come up with a scenario in which it'd be useful...
Re: Dynamic Sprite Vram Routine Ideas
by on (#141813)
With how much stuff you are doing (enhancement chips, buffers, changing object tables), couldn't you just have it to where the entire playing field is just a buffer, similar to Doom? If you're worried about frame rate, you could always flip the image on its side for more DMA bandwidth. Of course, then the picture's on its side. :?
Re: Dynamic Sprite Vram Routine Ideas
by on (#141815)
According to my calculations a GSU2 in high-speed mode (Super FX2) would be fairly heavily loaded just drawing the bullets at 4bpp and 30 fps. In fact I'm still not certain it's possible to do a 1:1 port, though I think it's close enough that if it comes to it, a few tricks ought to get me over the line. We'll see. If all else fails I might have to accept a bit of slowdown.

In any case doing a scaling/rotating background at the same time, especially an 8bpp one, would completely blow away my processor time budget. And if it were 8bpp, it would take twice as long to DMA. Flipping it on its side really isn't an option; the feel would be completely different, which isn't really acceptable for a port.

I could scale it down and not use Mode 7 for the backgrounds, but I don't want to, and I think the system can handle what I want to do.
Re: Dynamic Sprite Vram Routine Ideas
by on (#141816)
Who said you had to use the SuperFX... Anyway, if you run out of time using the normal SuperFX chip, you could use the version that's clocked twice as fast, unless you're still afraid of running out of time.
Re: Dynamic Sprite Vram Routine Ideas
by on (#141818)
I am using the version that's clocked twice as fast. And there are no other programmable special chips for the Super NES that are better at bitplane blitting. I'd have to make something up, which would essentially render this project non-executable.

(I've been busy with work, but I really should mock up a bullet-rendering engine and start getting a feel for what the real-world performance is like. Last I heard, emulated Super FX chips weren't exactly the same as real ones, but I think higan is pretty close...)
Re: Dynamic Sprite Vram Routine Ideas
by on (#142334)
I might as well use Espozo's idea since I ran into a little trouble today with vram overload in my Alisha's Adventure, though I want to fix some other issues first.

-player's collision box is too big.
-kicks look weird.
-explosions cause too much flicker and clutter up the view too much.
-button scheme needs a little reworking.
-BG collision routine takes too much CPU time.
Re: Dynamic Sprite Vram Routine Ideas
by on (#142382)
psycopathicteen wrote:
I might as well use Espozo's idea since I ran into a little trouble today with vram overload in my Alisha's Adventure, though I want to fix some other issues first.

Hold on now! You still need to pay me the royalties. (It's patented. :wink: )

psycopathicteen wrote:
-explosions cause too much flicker and clutter up the view too much.

Honestly, I don't think explosions really fit what you're using them for. I would recommend using a spiky cloud-like thing you would see in a comic book for whenever you hit an enemy, like the effect that plays whenever you jump on an enemy in DKC. You could use an explosion for when the enemy is defeated though, and make it to where it isn't permanently in vram like it is now.

psycopathicteen wrote:
kicks look weird.

Would you be interested in someone helping you with artwork?
Re: Dynamic Sprite Vram Routine Ideas
by on (#142387)
Maybe I'll use a forward flying kick as the normal attack, instead of half of a roundhouse kick.

Heck, it could be that I just over-animated it, and if I cut a frame or two out, it will look right.
Re: Dynamic Sprite Vram Routine Ideas
by on (#144587)
Code:
animation:

lda {graphics_address_request}
cmp {graphics_address}
bne +

lda {metasprite_request}
cmp {metasprite}
beq no_metasprite_pattern

+;

lda #$0000
ldy {metasprite_request}
beq no_animation_slot
lda $0008,y
beq no_animation_slot



sep #$20
lda $000a,y
sta $4202
lda $000c,y
sta $4203
nop #3
rep #$20
lda $4216
+;

sta {vram_size}
clc
adc {total_dma_legnth}
cmp #$0060
bcc yes_metasprite_pattern

lda {first_object_to_dma}
bne +
tdc
sta {first_object_to_dma}
+;

jmp no_metasprite_pattern

yes_metasprite_pattern:






jsr clear_vram_slot

ldy {metasprite_request}
lda $000a,y
sta {vram_width}
lda $000c,y
sta {vram_height}

bra dynamic_animation




no_animation_slot:
ldy {metasprite_request}
sty {metasprite}
tdc
tax
-;
lda $000e,y
sta {metasprite_table},x
beq +
lda $0010,y
sta {metasprite_table}+2,x
lda $0012,y
sta {metasprite_table}+4,x
lda $0014,y
sta {metasprite_table}+6,x
txa
clc
adc #$000a
tax
tya
clc
adc #$0008
tay
bra -

+;


no_metasprite_pattern:

rts



dynamic_animation:

lda {graphics_address_request}
sta {graphics_address}


lda {metasprite_request}
sta {metasprite}
tay
tdc
tax
-;
lda $000e,y
sta {metasprite_table},x
beq +
lda $0010,y
sta {metasprite_table}+2,x
lda $0012,y
sta {metasprite_table}+4,x
lda $0014,y
sta {metasprite_table}+8,x

jsr find_vram_slot



txa
clc
adc #$000a
tax
tya
clc
adc #$0008
tay
bra -

+;











ldy {metasprite}
lda $0006,y
clc
adc {graphics_address}
sta {temp}
lda $0008,y
sta {temp2}

tdc
tax

ldy {dma_updates}

-;

lda {metasprite_table},x
beq +
asl
sta {dma_rows},y
asl #5
sta {dma_legnth},y

lda {vram_width}
asl #5
sta {dma_increment},y

lda {metasprite_table}+8,x
and #$01f0
lsr #4
sep #$20
sta $4202
lda {vram_width}
sta $4203
nop #3
rep #$20
lda {metasprite_table}+8,x
and #$000f
clc
adc $4216
asl #5
clc
adc {temp}
sta {dma_address},y
lda {temp2}
sta {dma_bank},y


lda {metasprite_table}+6,x
and #$01ff
asl #4
sta {dma_destination},y

iny #2

lda #$0000
sta {dma_legnth},y

txa
clc
adc #$000a
tax
bra -

+;
sty {dma_updates}
lda {vram_size}
clc
adc {total_dma_legnth}
sta {total_dma_legnth}

jmp no_metasprite_pattern










find_vram_slot:
phx
lda {metasprite_table},x
cmp #$0002
beq ++

sep #$20
ldx #$0000
-;
lda {vram_slot_table},x
beq +
inx
cpx #$0080
bne -
rep #$20
pla
sec
sbc #$000a
tax
rts

+;
lda #$01
sta {vram_slot_table},x

rep #$20
stx {temp}
txa
and #$0078
clc
adc {temp}
asl
plx
sta {temp}

lda $0014,y
and #$fe00
ora {temp}
sta {metasprite_table}+6,x
rts





+;
ldx #$0000
phy
ldy #$0007
-;
lda {vram_slot_table},x
ora {vram_slot_table}+8,x
beq +
inx
dey
bne -
inx
ldy #$0007
cpx #$0030
bne -
ldx #$0040
-;
lda {vram_slot_table},x
ora {vram_slot_table}+8,x
beq +
inx
dey
bne -
inx
ldy #$0007
cpx #$0070
bne -
ply
pla
sec
sbc #$000a
tax

rts
+;
ply
lda #$0101
sta {vram_slot_table},x
sta {vram_slot_table}+8,x

stx {temp}
txa
and #$0078
clc
adc {temp}
asl
plx
sta {temp}

lda $0014,y
and #$fe00
ora {temp}
sta {metasprite_table}+6,x

rts










clear_vram_slot:
php


ldy {metasprite}
beq no_slot_to_clear
lda $0008,y
beq no_slot_to_clear


tdc
tax
-;
lda {metasprite_table},x
beq no_slot_to_clear
tay
phx

lda {metasprite_table}+6,x
and #$01e0
lsr
sta {temp}
lda {metasprite_table}+6,x
and #$000e
ora {temp}
lsr

tax

cpy #$0002
beq +
sep #$20
lda #$00
sta {vram_slot_table},x
rep #$20



bra ++
+;
lda #$0000
sta {vram_slot_table},x
sta {vram_slot_table}+8,x


+;




pla
clc
adc #$000a
tax
jmp -

no_slot_to_clear:

tdc
tax
lda #$0000
sta {metasprite_table},x


plp
rts


I just revamped my animation code to use your idea of individual 16x16s and 32x32s getting their own vram slot. I had the idea of giving each object slot its own metasprite table, just to make it easier to build the oam afterwards. I think the code could be simplified and optimized a bit more though.
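Stripped of the DMA bookkeeping, the slot search boils down to something like this Python sketch. It is a simplified model: the 128-entry table is treated as an 8x16 grid of 16x16 slots, and the 32x32 search scans every row, whereas the posted routine only scans some of them.

```python
COLUMNS, ROWS = 8, 16     # 16x16 slots on a 16-tile-wide, 32-tile-tall sprite page

def slot_to_tile(slot):
    """Tile number of a 16x16 slot: (slot + (slot & $78)) * 2."""
    return (slot + (slot & 0x78)) * 2

def find_small(table):
    """Claim the first free 16x16 slot; None means VRAM is full."""
    for slot, used in enumerate(table):
        if not used:
            table[slot] = 1
            return slot_to_tile(slot)
    return None

def find_large(table):
    """Claim the first free 2x2 block of slots for a 32x32 sprite."""
    for row in range(ROWS - 1):
        for col in range(COLUMNS - 1):      # a block can't start in the last column
            slot = row * COLUMNS + col
            block = (slot, slot + 1, slot + COLUMNS, slot + COLUMNS + 1)
            if not any(table[s] for s in block):
                for s in block:
                    table[s] = 1
                return slot_to_tile(slot)
    return None

table = [0] * (COLUMNS * ROWS)
print(find_small(table), find_small(table), find_large(table))
```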
Re: Dynamic Sprite Vram Routine Ideas
by on (#144595)
psycopathicteen wrote:
I just revamped my animation code to use your idea of individual 16x16s and 32x32s getting their own vram slot. I had the idea of giving each object slot its own metasprite table, just to make it easier to build the oam afterwards.
Didn't I say that earlier? I said you could reserve 16 or so bytes with one for each sprite.

This is pretty off topic, (and was actually the main reason I was going to post) but how do you use direct page without having to do this stuff?

stz a:HFlipMask

I know someone explained it at some point, but I can't find where... I was thinking about having objects being offset by direct page instead of y like I am now, so I have x and y free instead of just x.
Re: Dynamic Sprite Vram Routine Ideas
by on (#144600)
Wish I could comment on how ca65 does things, but the bottom line is that to use direct page you need to use instructions that have a one-byte address for an operand. In WLA at least, this should happen automatically when you use a label that's defined as one byte, like, ".EQU label $08". Later when you write "LDA label", it will translate that to "LDA $08" which will use direct page. (WLA is just a dick about it so that if you write ".EQU label $0008" it will still interpret it as "LDA $08" which is wrong - for any address under $0100 you need to explicitly tell WLA that you mean TWO BYTES.)

In other words, I don't THINK you should need the "a:" at all just to use direct page, so long as HFlipMask is defined as a single byte. I think the easiest way to figure it out would probably be to just try it, and then look at either a listing file or a trace to see if it correctly translated your direct page instructions to a one-byte-operand instruction.

I think that's what you were asking anyways...
Re: Dynamic Sprite Vram Routine Ideas
by on (#144603)
Well, whenever you use direct page, the things you want to use direct page for are using a "<" and the things that aren't are using a "a:". Otherwise, it seems that things get loaded from the wrong location. I think I remember something about moving something out of the BSS segment?
Re: Dynamic Sprite Vram Routine Ideas
by on (#144604)
Hope someone can confirm my assumption here, but sounds like the "<" is the same as WLA's ".b" and "a:" is WLA's ".w" - it's "operand hinting" to explicitly tell the assembler whether to take one or two bytes as the operand.

In other words, you only "need" those when ca65 isn't interpreting it the way you want by default. I would imagine, generally speaking, that you shouldn't require the "a:"s ever and only the <s, if ca65 always defaults to a two-byte ("absolute") address.
Re: Dynamic Sprite Vram Routine Ideas
by on (#144605)
Nope, I need the "a:"s. The code doesn't work correctly without them. However, it does if direct page hasn't been messed with.
Re: Dynamic Sprite Vram Routine Ideas
by on (#144606)
Well, that means that for some reason it's assembling your absolute-addressed instructions as one-byte direct page instructions. I cannot think of a reason why that should ever happen, so I'm afraid someone else who uses ca65 will have to step in...
Re: Dynamic Sprite Vram Routine Ideas
by on (#144609)
If you're trying to access variables located in $000000-$0000FF while D is pointed elsewhere, you need the a:.
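To make that concrete, here is a tiny Python model of the two addressing modes on the 65816 in native mode. The $0300 value for D and the $0008 variable address are hypothetical.

```python
def direct_page_address(d_register, operand):
    """Direct page: always bank 0, one-byte operand offset from D."""
    return (d_register + operand) & 0xFFFF

def absolute_address(data_bank, operand):
    """Absolute: 16-bit address in the current data bank."""
    return (data_bank << 16) | (operand & 0xFFFF)

# A variable that lives at $000008, accessed while D = $0300:
print(hex(direct_page_address(0x0300, 0x08)))   # what "stz HFlipMask" would hit
print(hex(absolute_address(0x00, 0x0008)))      # what "stz a:HFlipMask" would hit
```

With D at zero both forms land on the same byte, which is why the code behaves until D is moved.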
Re: Dynamic Sprite Vram Routine Ideas
by on (#144610)
So should I just push the variables out of that?
Re: Dynamic Sprite Vram Routine Ideas
by on (#144613)
You could use a LUT for converting between the slot number and CHR number. Since all CHR numbers are even, they can be stored as 8-bit values, and then shifted left by 1, to get a 9-bit CHR number.

You can also change the order of the sprite slots, so each group of 4 16x16 slots make a 32x32 slot.

014589CD
2367ABEF
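A Python sketch of building that LUT, assuming 128 16x16 slots on a 16-tile-wide sprite page and the slot order drawn above (the function names are made up):

```python
def slot_to_chr(slot):
    """CHR (tile) number of a 16x16 slot in the reordered layout."""
    group, sub = slot >> 2, slot & 3        # 32x32 group, position inside it
    band, column = group >> 2, group & 3    # four 32x32 groups per 16-tile row
    return band * 64 + column * 4 + (sub >> 1) * 32 + (sub & 1) * 2

# 8-bit LUT: store CHR >> 1, shift left once when reading it back.
SLOT_LUT = bytes(slot_to_chr(slot) >> 1 for slot in range(128))

def lookup(slot):
    """Recover the 9-bit CHR number from the 8-bit table entry."""
    return SLOT_LUT[slot] << 1

print([hex(lookup(slot)) for slot in range(16)])
```

Slots 4n to 4n+3 always form one 32x32 block, so a 32x32 sprite can simply take slot 4n's CHR number.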
Re: Dynamic Sprite Vram Routine Ideas
by on (#144615)
Khaz wrote:
Hope someone can confirm my assumption here, but sounds like the "<" is the same as WLA's ".b" and "a:" is WLA's ".w" - it's "operand hinting" to explicitly tell the assembler whether to take one or two bytes as the operand.

In other words, you only "need" those when ca65 isn't interpreting it the way you want by default. I would imagine, generally speaking, that you shouldn't require the "a:"s ever and only the <s, if ca65 always defaults to a two-byte ("absolute") address.

"<" in ca65 takes the low byte of a value, and since the assembler then knows that the value is in range 0..255 it uses zeropage/direct page addressing for the instruction. "a:" is a way to override the automatic addressing mode selection to use absolute addressing. ca65 doesn't default to absolute addressing (absolute addressing would be a bad default on 6502, but that doesn't stop NESASM from doing it...).

Khaz wrote:
Well, that means that for some reason it's assembling your absolute-addressed instructions as one-byte direct page instructions. I cannot think of a reason why that should ever happen, so I'm afraid someone else who uses ca65 will have to step in...

ca65 doesn't (and can't) keep track of whether the direct page register has been modified, so it works on the assumption that it is 0.

The reason it behaves like that is quite clearly because it started as a 6502 assembler and 65816 support was added as an afterthought. The 65816 support could definitely be improved, but it's not exactly trivial to do because of the object/link model used by cc65 (even if you were able to hint the assembler about the current value of the direct page register, its value could come from an external symbol and the assembler wouldn't know it at compile time).
Re: Dynamic Sprite Vram Routine Ideas
by on (#144734)
This is really getting hard to optimize. All this extra work just to make the robots slightly bigger.
Re: Dynamic Sprite Vram Routine Ideas
by on (#144737)
Are you running into slowdown? I know this sounds terrible, but I'd much rather have large, fluidly animated sprites than small choppy ones. At least one can be fixed by throwing an expansion chip at it... (Not saying it would be easy, I'm just saying it would be possible.) Maybe I'm not the one people should be getting opinions from, considering I like Metal Slug 2 over X. (I really just don't find slowdown that irritating, and in many situations where there is slowdown in these games, there's so much going on that it's almost welcome.)
Re: Dynamic Sprite Vram Routine Ideas
by on (#144793)
So far, I got it optimized down so that when it slows down, it only misses it by a couple scanlines. I only have a couple more tweaks left. One optimization technique that worked well was clearing OAM with DMA.

I had the idea of exclusively using 16x16 sprites for dynamically animated objects, so I can use a "stack of open slots" trick. I'll set up a stack of every possible open slot. When a slot gets used, it's pulled off the "stack", when it is cleared, it's pushed back on. I'm not sure if it is worth it or not though.
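In Python terms the trick would look roughly like this (the class name and the 128-slot default are assumptions for the sketch):

```python
class SlotStack:
    """Free list of 16x16 VRAM slots kept as a stack."""
    def __init__(self, slot_count=128):
        # Reversed so that slot 0 is handed out first.
        self.free = list(range(slot_count - 1, -1, -1))

    def allocate(self):
        """Pull a free slot off the stack, or None when VRAM is full."""
        return self.free.pop() if self.free else None

    def release(self, slot):
        """Push a cleared slot back on the stack."""
        self.free.append(slot)

slots = SlotStack(4)
a, b = slots.allocate(), slots.allocate()
slots.release(a)
print(a, b, slots.allocate())
```

Both operations are constant time with no searching, which is the appeal; the catch is that it only works when every slot is the same size.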
Re: Dynamic Sprite Vram Routine Ideas
by on (#144963)

Code:
animation:

lda {graphics_address_request}
cmp {graphics_address}
bne +

lda {metasprite_request}
cmp {metasprite}
beq no_metasprite_pattern

+;

lda #$0000
ldy {metasprite_request}
beq no_animation_slot
lda $0008,y
beq no_animation_slot



sep #$20
lda $000a,y
sta $4202
lda $000c,y
sta $4203
nop #3
rep #$20
lda $4216
+;

sta {vram_size}
clc
adc {total_dma_legnth}
cmp #$0060
bcc yes_metasprite_pattern

lda {first_object_to_dma}
bne +
tdc
sta {first_object_to_dma}
+;

no_metasprite_pattern:
rts

yes_metasprite_pattern:






jsr clear_vram_slot

ldy {metasprite_request}
lda $000a,y
asl #5
sta {vram_width}
lda $000c,y
sta {vram_height}

bra dynamic_animation




no_animation_slot:
jsr clear_vram_slot

ldy {metasprite_request}
sty {metasprite}
tdc
tax
-;
lda $000e,y
sta {metasprite_table},x
beq +
lda $0010,y
sta {metasprite_table}+24,x
lda $0012,y
sta {metasprite_table}+48,x
lda $0014,y
sta {metasprite_table}+72,x
inx #2
tya
clc
adc #$0008
tay
bra -

+;




rts



dynamic_animation:

lda {graphics_address_request}
sta {graphics_address}




lda {metasprite_request}
sta {metasprite}
tay
lda $0006,y
clc
adc {graphics_address}
sta {temp3}
lda $0008,y
sta {temp2}

tdc
tax
-;
lda $000e,y
sta {metasprite_table},x
beq +
lda $0010,y
sta {metasprite_table}+24,x
lda $0012,y
sta {metasprite_table}+48,x
lda $0014,y
sta {metasprite_table}+96,x

jsr find_vram_slot



inx #2
tya
clc
adc #$0008
tay
bra -

+;



















lda {vram_size}
clc
adc {total_dma_legnth}
sta {total_dma_legnth}

jmp no_metasprite_pattern










find_vram_slot:
phx
lda {metasprite_table},x
cmp #$0002
bne +
jmp large_slot
+;

phy
ldy #$0000
sep #$20
ldx #$0000
lda {vram_slot_table},x
cmp #$0f
bne +
-;
inx
lda {vram_slot_table},x
cmp #$0f
beq -
+;
-;
lsr
bcc +
iny
bra -

+;



cpx #$0020
bne +
ply
rep #$20
plx
dex #2
rts

+;
lda slot_to_name3,y
ora {vram_slot_table},x
sta {vram_slot_table},x

rep #$20
lda slot_to_name,x
ora slot_to_name2,y
and #$00ff
asl
sta {temp}
ply
plx


lda $0014,y
and #$fe00
ora {temp}
sta {metasprite_table}+72,x
phy
ldy {dma_updates}

lda #$0002
sta {dma_rows},y
lda #$0040
sta {dma_legnth},y

lda {vram_width}
sta {dma_increment},y

lda {metasprite_table}+96,x
and #$01ff
asl #5
clc
adc {temp3}
sta {dma_address},y
lda {temp2}
sta {dma_bank},y


lda {metasprite_table}+72,x
and #$01ff
asl #4
sta {dma_destination},y

iny #2

lda #$0000
sta {dma_legnth},y

sty {dma_updates}


ply
rts






large_slot:
sep #$20
ldx #$0000
lda {vram_slot_table},x
beq +
-;
inx
lda {vram_slot_table},x
bne -
+;
cpx #$0020
bne +
rep #$20
plx
dex #2

rts
+;
lda #$0f
sta {vram_slot_table},x
rep #$20

lda slot_to_name,x
and #$00ff
asl
sta {temp}

plx


lda $0014,y
and #$fe00
ora {temp}
sta {metasprite_table}+72,x

phy
ldy {dma_updates}

lda #$0004
sta {dma_rows},y
lda #$0080
sta {dma_legnth},y

lda {vram_width}
sta {dma_increment},y

lda {metasprite_table}+96,x
and #$01ff
asl #5
clc
adc {temp3}
sta {dma_address},y
lda {temp2}
sta {dma_bank},y


lda {metasprite_table}+72,x
and #$01ff
asl #4
sta {dma_destination},y

iny #2

lda #$0000
sta {dma_legnth},y

sty {dma_updates}


ply
rts










clear_vram_slot:
php


ldy {metasprite}
beq no_slot_to_clear
lda $0008,y
beq no_slot_to_clear


tdc
tax
-;
lda {metasprite_table},x
beq no_slot_to_clear
tay
phx

lda {metasprite_table}+72,x
and #$01ff
tax
lda name_to_slot2,x
sta {temp}
lda name_to_slot,x
tax

cpy #$0002
beq +
sep #$20
lda {temp}
eor #$ff
and {vram_slot_table},x
sta {vram_slot_table},x
rep #$20



bra ++
+;


sep #$20
lda #$00
sta {vram_slot_table},x
rep #$20


+;




plx
inx #2
jmp -

no_slot_to_clear:

tdc
tax
lda #$0000
sta {metasprite_table},x


plp
rts


slot_to_name:
db $00
db $02
db $04
db $06
db $20
db $22
db $24
db $26
db $40
db $42
db $44
db $46
db $60
db $62
db $64
db $66

db $80
db $82
db $84
db $86
db $a0
db $a2
db $a4
db $a6
db $c0
db $c2
db $c4
db $c6
db $e0
db $e2
db $e4
db $e6

slot_to_name2:
db $00
db $01
db $10
db $11

slot_to_name3:
db $01
db $02
db $04
db $08


name_to_slot:
dw $0000,$0000,$0001,$0001,$0002,$0002,$0003,$0003
dw $0000,$0000,$0001,$0001,$0002,$0002,$0003,$0003
dw $0000,$0000,$0001,$0001,$0002,$0002,$0003,$0003
dw $0000,$0000,$0001,$0001,$0002,$0002,$0003,$0003
dw $0004,$0004,$0005,$0005,$0006,$0006,$0007,$0007
dw $0004,$0004,$0005,$0005,$0006,$0006,$0007,$0007
dw $0004,$0004,$0005,$0005,$0006,$0006,$0007,$0007
dw $0004,$0004,$0005,$0005,$0006,$0006,$0007,$0007
dw $0008,$0008,$0009,$0009,$000a,$000a,$000b,$000b
dw $0008,$0008,$0009,$0009,$000a,$000a,$000b,$000b
dw $0008,$0008,$0009,$0009,$000a,$000a,$000b,$000b
dw $0008,$0008,$0009,$0009,$000a,$000a,$000b,$000b
dw $000c,$000c,$000d,$000d,$000e,$000e,$000f,$000f
dw $000c,$000c,$000d,$000d,$000e,$000e,$000f,$000f
dw $000c,$000c,$000d,$000d,$000e,$000e,$000f,$000f
dw $000c,$000c,$000d,$000d,$000e,$000e,$000f,$000f
dw $0010,$0010,$0011,$0011,$0012,$0012,$0013,$0013
dw $0010,$0010,$0011,$0011,$0012,$0012,$0013,$0013
dw $0010,$0010,$0011,$0011,$0012,$0012,$0013,$0013
dw $0010,$0010,$0011,$0011,$0012,$0012,$0013,$0013
dw $0014,$0014,$0015,$0015,$0016,$0016,$0017,$0017
dw $0014,$0014,$0015,$0015,$0016,$0016,$0017,$0017
dw $0014,$0014,$0015,$0015,$0016,$0016,$0017,$0017
dw $0014,$0014,$0015,$0015,$0016,$0016,$0017,$0017
dw $0018,$0018,$0019,$0019,$001a,$001a,$001b,$001b
dw $0018,$0018,$0019,$0019,$001a,$001a,$001b,$001b
dw $0018,$0018,$0019,$0019,$001a,$001a,$001b,$001b
dw $0018,$0018,$0019,$0019,$001a,$001a,$001b,$001b
dw $001c,$001c,$001d,$001d,$001e,$001e,$001f,$001f
dw $001c,$001c,$001d,$001d,$001e,$001e,$001f,$001f
dw $001c,$001c,$001d,$001d,$001e,$001e,$001f,$001f
dw $001c,$001c,$001d,$001d,$001e,$001e,$001f,$001f

name_to_slot2:
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0001,$0002,$0001,$0002,$0001,$0002,$0001,$0002
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008
dw $0004,$0008,$0004,$0008,$0004,$0008,$0004,$0008


This works fast enough. This makes me wonder how fast the Genesis would be if it had the same PPU as the SNES, and needed to do all this shit in order to have good animation, and vice-versa. If this was the Genesis, I would've used 80 equally sized 32x32 slots.
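For readers skimming past the assembly, the new bookkeeping is roughly this, as a Python model: one byte per 32x32 group, with one bit per 16x16 quadrant. The function names are made up, the formulas are my reading of the lookup tables, and the DMA queue setup is left out.

```python
QUADRANT_OFFSET = (0x00, 0x02, 0x20, 0x22)   # slot_to_name2 after the shift

def group_tile(group):
    """Tile number of 32x32 group 0-31 (slot_to_name after the shift)."""
    return (group >> 2) * 64 + (group & 3) * 4

def allocate_small(table):
    """Claim the lowest free quadrant of the first group that isn't full."""
    for group, mask in enumerate(table):
        if mask != 0x0F:
            quadrant = next(q for q in range(4) if not mask & (1 << q))
            table[group] |= 1 << quadrant
            return group_tile(group) + QUADRANT_OFFSET[quadrant]
    return None

def allocate_large(table):
    """Claim the first completely empty group for a 32x32 sprite."""
    for group, mask in enumerate(table):
        if mask == 0:
            table[group] = 0x0F
            return group_tile(group)
    return None

def release_small(table, tile):
    """Clear the quadrant bit that a 16x16 sprite's tile number maps to."""
    row, col = tile >> 4, tile & 0x0F
    group = (row >> 2) * 4 + (col >> 2)                      # name_to_slot
    bit = 1 << (((row >> 1) & 1) * 2 + ((col >> 1) & 1))     # name_to_slot2
    table[group] &= ~bit & 0xFF

table = [0] * 32
print(allocate_small(table), allocate_large(table), allocate_small(table))
```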