I've heard psycopathicteen talk about how he branches to an "animation engine" when he is done going through the code of all the objects. I'm not exactly sure how much his does; I imagine that every object slot would have information that the animation engine would look at and put on the screen in some way. One idea that I thought of is that every object just has one register (16 bits) that would say what animation frame you want to display. This number would offset multiple tables throughout the animation engine, like the address of where the tiles start and how many tiles the object is using. One thing that's frustrating is that I'm pretty sure vram updates have to be done during vblank, so that's a bit of an inconvenience. I'm not sure how I would store the addresses for all (or at least most) of the tiles in the sprite part of vram for it. The animation engine would find empty space in vram to add the new tiles, using the thing I thought of where it looks for 16x16 sized slots, or four of them for a 32x32 sized sprite. About the register in the object slots, it would also offset the table that holds addresses for a metasprite table, so it works for the metasprite creating routine. I have no idea how to translate this all into code... (the animation engine part.) The code also needs to know not to look for empty space in vram for the tiles if the object has already been created and the tiles are in vram.
How does everyone else deal with animation? Like I said, I'm aiming on just doing something like "lda #Player1Walk1" and it appearing on screen.
Personally, at the risk of repeating myself... I based my metasprite routine largely off psycopathicteen's reply to you back in December. I have buildMetasprite in my normal object loop right after I finish each object and I only iterate across all objects once.
When an object is spawned in my system, it's allocated its block in VRAM if it requires one. I'm using the DKC method (so I hear) of having 16 strips of VRAM for one object each, with one or two reserved for common objects. Each object contains a pointer to the address of its current sprite in ROM, if it has a VRAM block allocated. Whenever I change that ROM pointer, I set a bit that tells my vblank routine to DMA the new strip of sprite tiles over to that VRAM block.
So in my system, updating a VRAM-block-allocated object's image on screen really is as simple as storing the new ROM address and metasprite label into the object's internal registers. If it's a common object with no VRAM allocation, you either just change the tile number or write a new tile to that slot during vblank.
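As a rough illustration of the strip scheme described above, here is a small Python model (not SNES code). The class and function names, and the choice of two reserved strips, are assumptions made for the sketch; only the idea of 16 strips, a ROM pointer per object, and a "needs upload" bit comes from the post.

```python
# Hypothetical model of the "one VRAM strip per object" scheme.
NUM_STRIPS = 16   # from the post: 16 strips of sprite VRAM
RESERVED = 2      # assumed: strips kept back for common (shared) objects

free_strips = list(range(RESERVED, NUM_STRIPS))

class GameObject:
    def __init__(self):
        self.strip = None     # index of the owned VRAM strip, or None
        self.rom_ptr = None   # ROM address of the current sprite tiles
        self.dirty = False    # set when the strip must be re-uploaded

def spawn(needs_strip=True):
    """Allocate a strip at spawn time if the object needs one."""
    obj = GameObject()
    if needs_strip and free_strips:
        obj.strip = free_strips.pop(0)
    return obj

def set_image(obj, rom_ptr):
    """Changing the ROM pointer flags the object for a vblank upload."""
    if obj.strip is not None and obj.rom_ptr != rom_ptr:
        obj.rom_ptr = rom_ptr
        obj.dirty = True

def vblank(objects):
    """Return this frame's (rom_ptr, strip) DMA jobs and clear the flags."""
    jobs = []
    for obj in objects:
        if obj.dirty:
            jobs.append((obj.rom_ptr, obj.strip))
            obj.dirty = False
    return jobs
```

The point of the model is that game logic only ever calls something like `set_image`; the upload itself is deferred to the vblank pass.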
I know you could have problems with object-object interactions when drawing as you go through the first time, but I can't imagine them being too hard to just patch around and they'd mostly just be visual errors only visible on one frame, so hardly noticeable, right?
I do have something similar to a "frame number register." In Alisha's Adventure, I use two registers, one that holds a "metasprite number" and another that holds a "graphics address offset" number. The "graphics address offset" is there, so I can reuse the same metasprite for multiple frames, by just changing where in ROM it's DMAing the patterns from.
Khaz wrote:
Personally, at the risk of repeating myself... I based my metasprite routine largely off psycopathicteen's reply to you back in December. I have buildMetasprite in my normal object loop right after I finish each object and I only iterate across all objects once.
Why does that matter at all? I already have a metasprite building routine, I just want to have it to where I offset a table with metasprite table addresses so I can also do that with the vram slot finding thing. That's easy to implement.
Khaz wrote:
When an object is spawned in my system, it's allocated its block in VRAM if it requires one. I'm using the DKC method (so I hear) of having 16 strips of VRAM for one object each, with one or two reserved for common objects. Each object contains a pointer to the address of its current sprite in ROM, if it has a VRAM block allocated. Whenever I change that ROM pointer, I set a bit that tells my vblank routine to DMA the new strip of sprite tiles over to that VRAM block.
See, you don't have the problem because tile positions in vram are always going to be next to each other.
Basically, I have a number I store in an object slot (let's call the slot "Joe") that, at the start of the vram finding routine, is loaded to x and offsets a table that has the addresses of tables that say where the tiles for the frame are. In the metasprite routine, the number from "Joe" is loaded again and also offsets a table that has the addresses of the metasprite tables. Maybe this is just too much going on...
Code:
Animation:
ldx Joe
lda TileAddressTableTable,x
tay
lda TileAddressTable,y
Metasprite:
ldx Joe
lda MetaspriteTablePositionTableTable,x
tay
lda MetaspriteTablePositionTable,y
sta MetaspriteCount
iny
iny
sty MetaspriteOffset
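The single "frame ID" idea can be modelled in a few lines of Python. This is only a sketch: the table contents, addresses, and frame names are made up, and a real 65816 version would index word tables with the ID doubled.

```python
# Hypothetical tables, one entry per animation frame ID.
FRAME_NAMES = ["Player1Walk1", "Player1Walk2"]
TILE_ADDRESS = [0x8000, 0x8080]      # ROM address of each frame's first tile
TILE_COUNT = [4, 4]                  # how many tiles the frame uses
METASPRITE = [                       # (x, y, tile offset) per hardware sprite
    [(0, 0, 0)],
    [(0, 0, 0), (16, 0, 2)],
]

def lookup(frame_id):
    """One ID indexes every table, so the object slot stores a single word."""
    return TILE_ADDRESS[frame_id], TILE_COUNT[frame_id], METASPRITE[frame_id]

def frame_id_of(name):
    """Stand-in for an assembler constant such as #Player1Walk1."""
    return FRAME_NAMES.index(name)
```

With this layout, `lookup(frame_id_of("Player1Walk2"))` returns everything both the vram finder and the metasprite routine need, at the cost of one extra table lookup per use.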
Well correct me if I'm wrong, but it sounds to me like all you're talking about is combining the metasprite number and the tile source in ROM into a single "image ID" value that will represent that exact frame of animation. I just don't see the point in that; it might save you a byte or two in each object slot, but at the expense of more space and time for the lookup tables to convert your "image ID" into a metasprite and a ROM address.
I see what you're saying about trying to dynamically search for room in VRAM to fit new sprites in, but I'm a bit wary of that approach taking up too much processing and just plain being more of a pain. But in terms of how to implement it, it doesn't really seem much different from my approach. You could still probably store every image sequentially in ROM so you only need the start address and load the right number of tiles.
The hard part of that approach is keeping track of where everything is in VRAM. You'll need several bytes inside each object to track that if you're storing one metasprite in non-sequential blocks of space. If you have multiple tile sizes you'll also be prone to having wasted space, though I'm sure that happens no matter what you do. If you have large object slots with lots of spare bytes, I could see doing things that way and maybe squeezing out a bit more VRAM room.
I dunno. I'll just give my opinion and say that it just doesn't seem like you'll get any noticeable benefit out of it for all the work and time and processor load, as opposed to a simpler system.
Just give each object slot a designated tile-name table.
Well, I think I've kind of settled on what I want to do. I was originally using a "MetaspriteTableSize" byte in every one of my object slots along with a "MetaspriteTableOffset", but I'm just having it to where the first byte in the metasprite table is what says the size. As you can see below, the performance is pretty much the same, and I save on a word in ram and I also save on an extra load and store, so performance is probably slightly higher. I'm just going to have a separate register for the tile location, so I'll have 2 things instead of 3.
Code:
; Original code:
ldx MetaspriteTableOffset ; it actually needs to be loaded into x for the code
lda MetaspriteTableSize
sta a:MetaspriteCount
; New code:
ldx MetaspriteTableOffset
lda a:Zero,x ; but the x load has proven itself useful here
sta a:MetaspriteCount
My metasprite data format actually contains information on animation, as well as metasprite stuff.
Code:
Alisha_stance:
dw $0018,$0000,$0004
dw $4000,$00c0,$0006,$0008
dw $0002,$0010,$fff0,$3002
dw $0002,$0010,$0010,$301a
dw $0001,$fff8,$ffe8,$3000
dw $0001,$fff8,$fff8,$300c
dw $0001,$fff8,$0008,$3018
dw $0001,$fff8,$0018,$3024
dw $0000
What these numbers mean is this:
ntsc frames per animation loop, unused, ntsc frames per animation frame
graphics ROM address, graphics ROM bank, image width, image height
sprite size, sprite X, sprite Y, attributes and number of tiles offset from ROM address
repeat until sprite size is 0
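To make the layout concrete, here is a Python sketch that walks the Alisha_stance words using the field order listed above. The dictionary keys are names invented for the sketch, and treating sprite X and Y as signed 16-bit values is an assumption based on entries such as $fff0.

```python
ALISHA_STANCE = [
    0x0018, 0x0000, 0x0004,
    0x4000, 0x00C0, 0x0006, 0x0008,
    0x0002, 0x0010, 0xFFF0, 0x3002,
    0x0002, 0x0010, 0x0010, 0x301A,
    0x0001, 0xFFF8, 0xFFE8, 0x3000,
    0x0001, 0xFFF8, 0xFFF8, 0x300C,
    0x0001, 0xFFF8, 0x0008, 0x3018,
    0x0001, 0xFFF8, 0x0018, 0x3024,
    0x0000,
]

def signed16(word):
    """Interpret a 16-bit word as two's complement (assumed for X and Y)."""
    return word - 0x10000 if word & 0x8000 else word

def parse_metasprite(words):
    header = {
        "frames_per_loop": words[0],   # word 1 is unused
        "frames_per_step": words[2],
        "gfx_address": words[3],
        "gfx_bank": words[4],
        "width": words[5],
        "height": words[6],
    }
    sprites = []
    i = 7
    while words[i] != 0:               # a sprite size of 0 ends the list
        size, x, y, attr = words[i:i + 4]
        sprites.append((size, signed16(x), signed16(y), attr))
        i += 4
    return header, sprites
```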
For rotating sprites, it is similar.
Code:
head:
dw $8000,$00c1,$0002
dw $0000,$007f,$0004,$0004
dw $0002,$0000,$0000,$3400
dw $0000
graphics ROM address, graphics ROM bank, size of rotation steps
rotation buffer address, rotation buffer bank, window width, window height
sprite size, sprite X, sprite Y, attributes and number of tiles offset from rotation buffer address
repeat until sprite size is 0
I would have thought things like animation loops would have been done in the object's code.
Yes. The looping part is done as part of the object code.
Code:
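animate_object:
ldy {metasprite_request}
cpy {metasprite}
beq +
stz {animation_frame} // a different metasprite was requested: restart
+;
cpy #$0000
beq + // no metasprite: nothing to animate
sep #$20 // 8-bit A for the math registers
lda {animation_frame}
sta $4204 // dividend low
stz $4205 // dividend high
lda $0004,y // frames per animation frame
sta $4206 // divisor; writing it starts the division
nop #8 // wait for the divider
lda $4214 // quotient = current frame index
sta $4202
lda $000a,y // image width
sta $4203 // writing it starts the multiply
nop
lda $0008,y // graphics bank byte; zero takes the non-dynamic path
beq non_dynamic_animation
lda $4216 // frame index * width (low byte)
sta $4202
lda $000c,y // image height
sta $4203
nop #3
rep #$20 // back to 16-bit A
lda $4216 // frame index * width * height
asl #5 // times 32
sta {graphics_address_request}
-;
lda {animation_frame}
inc
sta {animation_frame}
cmp $0000,y // frames per animation loop
bne +
stz {animation_frame} // wrap back to the start of the loop
+;
rts
non_dynamic_animation:
rep #$20
lda $4216 // frame index * width
asl #4
sta {attributes}
stz {graphics_address_request}
stz {name_offset}
bra -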
BTW, now that I look at it, this routine looks kind of slow. I could probably optimize it a little more.
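For anyone following the arithmetic in the dynamic path of animate_object, here is a Python model of it. The function and parameter names are invented for the sketch, the byte masks mirror the 8-bit register writes in the routine, and it assumes frames_per_step is nonzero.

```python
def animate_step(frame_counter, frames_per_loop, frames_per_step, width, height):
    """Return (graphics offset in bytes, next frame counter)."""
    # $4204-$4206: 8-bit values are written, the quotient is read from $4214.
    index = (frame_counter & 0xFF) // (frames_per_step & 0xFF)
    # First multiply: frame index * image width.
    row = (index & 0xFF) * (width & 0xFF)
    # Second multiply: only the low byte of the first product is fed back in.
    tiles = (row & 0xFF) * (height & 0xFF)
    # asl #5: multiply by 32, kept to 16 bits.
    offset = (tiles << 5) & 0xFFFF
    # Advance the counter and wrap at the end of the animation loop.
    nxt = frame_counter + 1
    if nxt == frames_per_loop:
        nxt = 0
    return offset, nxt
```

With the Alisha_stance values (24 frames per loop, 4 frames per step, width 6, height 8), the offset grows by 1536 every 4 frames and the counter wraps after frame 23.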
I just thought of something, and I want to see if anyone approves. So you have a table with information on it relating to animation data that you index. The first word of the table says the bank the graphics are in. Everything but the first word will be addresses for tiles, but there will be special values that the animation engine will interpret as different things (tile addresses obviously can't be these). The table gets indexed one word higher every frame the object still exists (there will be an "animation timer" word in each object slot).
#$0000: End of Table Still Frame (The frame is displayed forever, as the animation timer never increases)
#$0001: End of Table Loop (The animation engine goes back to the second word in the table for an animation loop, and the animation timer is changed appropriately)
#$0002: No Animation Update (Makes the animation engine increase the timer by two and skips to the next object, or something like that)
Maybe Possible:
#$0003: Blank Frame (I'm not sure how this would work, but it would somehow mess with the metasprite routine)
I'm not exactly sure how I'm going to make it make sure to not count these values as addresses and how to do the appropriate action with each, but I'll get to it sometime.
Here are two examples of how a table would look using this format:
#$0004 (bank 4)
#$916F (nonsense address)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$6C87 (nonsense address)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$3B9E (nonsense address)
#$0002 (same frame as before)
#$0001 (loop to the beginning)
#$0007 (bank 7)
#$7D0D (nonsense address)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$1805 (nonsense address)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$4B6F (nonsense address)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$F781 (nonsense address)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$0002 (same frame as before)
#$9A76 (nonsense address)
#$0000 (do nothing but stop)
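One way to read the special values is sketched below in Python. It is an interpretation, not a finished design: the timer counts words rather than bytes, a loop command is assumed to process the first frame again immediately, and the word right after the bank is assumed to be a tile address.

```python
STILL, LOOP, HOLD = 0x0000, 0x0001, 0x0002

def step(table, timer):
    """Advance one frame. Returns (tile address to upload or None, new timer)."""
    word = table[timer]
    if word == STILL:
        return None, timer        # the timer never advances again
    if word == LOOP:
        timer = 1                 # back to the second word (the first is the bank)
        word = table[timer]
    if word == HOLD:
        return None, timer + 1    # same frame as before, no upload
    return word, timer + 1        # anything else is a tile address

# The first example table from the post, run for ten frames.
walk = [0x0004, 0x916F, 2, 2, 0x6C87, 2, 2, 2, 0x3B9E, 2, 1]
timer, uploads = 1, []
for _ in range(10):
    address, timer = step(walk, timer)
    uploads.append(address)
```

Because the special values are all below $0004, a single compare against that bound would be enough to tell a command from an address.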
Espozo wrote:
I just thought of something, and I want to see if anyone approves...
I like it. It's kinda similar to the format I have designed.
I use a different method to handle the loops. I have a command called DECREMENT_INDEX, and it takes a two-byte parameter that the index is decremented by. This allows for starting animations which continue into loops (e.g., StopWalkingLeftAndBlink). It also saves 2 bytes as I don't need to store the starting address of the current animation.
----
While we're all showing off our formats, I'll show you mine.
My format uses a bytecode syntax to prevent the linker from assigning a tileset/frame address <= $0008 (something that is possible with HIROM mapping).
I also don't store the tileset Bank in the animation list. That is set during entity load/Init (limiting me to 64KiB of tiles per entity, but I doubt I'll need more).
The current bytecodes I've designed are:
- LOAD_LEFT_TILES - Load 4 16x16 tiles to the left half of the entity's VRAM tiles.
- LOAD_RIGHT_TILES - Load 4 16x16 tiles to the right half of the entity's VRAM tiles.
- LOAD_PALETTE - Copies 16 colours into CGRAM (I'm not entirely sure how to handle this one)
- SET_METATILE_FRAME - Address of the table of the sprites frame.
- SET_ENTITY_SIZE - Sets the bounding box for collisions (crouching)
- WAIT_ONE_FRAME - Does nothing for one frame.
- DECREMENT_INDEX - Decrements the animation table position by # bytes.
Here is an example of my format using DECREMENT_INDEX to handle two animations:
Code:
Animation_Player_WalkLeft:
.byte Animation::LOAD_LEFT_GRAPHICS
.addr Player_HeadTiles
.byte Animation::LOAD_RIGHT_GRAPHICS
.addr Player_WalkTiles
_Animation_Player_WalkLeft_Loop:
.byte Animation::SET_FRAME
.addr MetaSpriteFrame_Player_WalkLeft1
.byte Animation::WAIT_FRAME
.byte 3
.byte Animation::SET_FRAME
.addr MetaSpriteFrame_Player_WalkLeft2
.byte Animation::WAIT_FRAME
.byte 3
.byte Animation::SET_FRAME
.addr MetaSpriteFrame_Player_WalkLeft3
.byte Animation::WAIT_FRAME
.byte 3
.byte Animation::DECREMENT_INDEX
.word * - _Animation_Player_WalkLeft_Loop
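A bytecode list like the one above could be interpreted roughly as follows. This Python sketch is not the engine's real code: the opcode numbers and addresses are made up, the command names follow the example listing, and it assumes the DECREMENT_INDEX operand is measured from the operand's own address, which is what ".word * - label" produces.

```python
LOAD_LEFT, LOAD_RIGHT, SET_FRAME, WAIT_FRAME, DECREMENT_INDEX = range(1, 6)

def read_word(code, i):
    return code[i] | (code[i + 1] << 8)

def run_until_wait(code, pc):
    """Execute commands from pc up to the next WAIT_FRAME.

    Returns (list of (opcode, address) events, frames to wait, new pc).
    A program with a loop that contains no WAIT_FRAME would never return.
    """
    events = []
    while True:
        op = code[pc]
        if op == WAIT_FRAME:
            return events, code[pc + 1], pc + 2
        if op == DECREMENT_INDEX:
            # Assumed: subtract the operand from the operand's own position.
            pc = (pc + 1) - read_word(code, pc + 1)
            continue
        events.append((op, read_word(code, pc + 1)))
        pc += 3

def word_bytes(value):
    return [value & 0xFF, value >> 8]

# A shortened walk animation with placeholder addresses.
walk = [LOAD_LEFT] + word_bytes(0x1000) + [LOAD_RIGHT] + word_bytes(0x2000)
loop_start = len(walk)
walk += [SET_FRAME] + word_bytes(0xA000) + [WAIT_FRAME, 3]
walk += [SET_FRAME] + word_bytes(0xA100) + [WAIT_FRAME, 3]
operand_at = len(walk) + 1
walk += [DECREMENT_INDEX] + word_bytes(operand_at - loop_start)
```

The tile loads run once, and every pass after that only replays the part between the loop label and DECREMENT_INDEX.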
UnDisbeliever wrote:
I also don't store the tileset Bank in the animation list. That is set during entity load/Init (limiting me to 64KiB of tiles per entity, but I doubt I'll need more).
You mean you're only going to use one bank for all the sprite tiles?
Wait a minute! I didn't even tell it anything on multiple sprites! Should I have it to where the values in the table are actually loaded to index a table with the addresses of other tables? The table might say the sprite size of each sprite, and the tiles for every sprite would be set up a certain way so it always works. You could have it like this maybe: (The code wouldn't have to differentiate these numbers from addresses because they will always alternate.)
#$0000: End of Table
#$0001: Small Sprite Size
#$0002: Large Sprite Size
A table would look like this:
#$0001 (Small Sprite)
#$6D89 (Nonsense Address)
#$0002 (Large Sprite Size)
#$F7C4 (Nonsense Address)
#$0001 (Small Sprite)
#$37DE (Nonsense Address)
#$0001 (Small Sprite)
#$5AA7 (Nonsense Address)
#$0000 (End of Table, No Sprites Left)
Maybe this is getting out of hand...
One thing though UnDisbeliever, are you coding in C? I think I remember looking at the source code for one of your games and it didn't look like anything I was used to.
Espozo wrote:
You mean you're only going to use one bank for all the sprite tiles?
No, each entity type (tank, walker, player, stomper) has their own Bank, leaving up to 64KiB tiles (2048 8x8 tiles) per enemy type. I may change this in the future.
Espozo wrote:
Wait a minute! I didn't even tell it anything on multiple sprites... You could have it like this maybe
I've tried to parse that paragraph a few times and I don't really understand it.
Is this what you're trying to explain?
Code:
vramPos = sprite VRAM position
repeat
    command = table[pos]
    pos += 2
    if command == 0
        break loop
    else if command == SmallSprite
        DelayedCopytoVRAM(bank, table[pos], vramPos, 16)
        pos += 2
        vramPos += 16 / 2
    else if command == LargeSprite
        DelayedCopytoVRAM(tile bank, table[pos], vramPos, 32) // top half
        DelayedCopytoVRAM(tile bank, table[pos] + 32, vramPos + 16 * 16, 32) // bottom half
        pos += 2
        vramPos += 32 / 2
    if vramPos & $0100
        // move to the next line so you don't overwrite the large tiles
        vramPos += $0100
And each command loads a single small or large tile into VRAM?
I think it's going to eat quite a bit of ROM space if you do that. I think you should add a size parameter to it. My version 1.0 is going to use a fixed size parameter of 256 bytes (128 bytes (8 small tiles) to the top half, 128 bytes for the lower half).
---
Espozo wrote:
One thing though UnDisbeliever, are you coding in C? I think I remember looking at the source code for one of your games and it didn't look like anything I was used to.
No, I'm coding using structure macros to bring a bit of High Level Assembly to the 65816.
These macros turn code like this:
Code:
LDA z:WES::walkLeftOnZero
IF_ZERO
LDA #JOY_LEFT
ELSE
LDA #JOY_RIGHT
ENDIF
into this:
Code:
LDA z:WES::walkLeftOnZero
BNE __STRUCTURE_001
LDA #JOY_LEFT
BRA __STRUCTURE_002
__STRUCTURE_001:
LDA #JOY_RIGHT
__STRUCTURE_002:
After I have written, tested and optimised the code, I verify its correctness by converting it into pseudo-code so that 'future me' can quickly understand it in order to fix bugs.
UnDisbeliever wrote:
I've tried to parse that paragraph a few times and I don't really understand it.
Basically, each frame of animation in each object consists of multiple sprites. It needs to look through vram to find an empty slot for each sprite in the frame, and the result of where it was finally found is transferred to the character bits in the metasprite routine. The reason you need to tell it if it's big or not is that it has to find a slot that size in vram. Do you get it now?
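The slot search described here could look something like the Python sketch below. It assumes 16 KiB of sprite tiles viewed as an 8 by 16 grid of 16x16-pixel slots, and it assumes 32x32 sprites are placed on even slot boundaries; both are choices made for the sketch, not something stated in the thread.

```python
COLS, ROWS = 8, 16   # assumed: 512 tiles seen as 16x16-pixel slots

def find_slot(used, large):
    """Return (col, row) of a free slot and mark it used, or None if full.

    used  -- set of occupied (col, row) slots, updated in place
    large -- False for 16x16, True for 32x32 (a 2x2 block of slots)
    """
    span = 2 if large else 1
    for row in range(0, ROWS - span + 1, span):
        for col in range(0, COLS - span + 1, span):
            block = {(col + dx, row + dy)
                     for dx in range(span) for dy in range(span)}
            if not block & used:
                used |= block
                return col, row
    return None

def tile_number(col, row):
    """First 8x8 tile of a slot, i.e. the value for the OAM character bits.

    Sprite VRAM is 16 tiles wide, and each slot is 2 tiles wide and tall.
    """
    return row * 32 + col * 2
```

The result of `tile_number` is what would be handed to the metasprite routine, and freeing an object means removing its slots from `used` again.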
UnDisbeliever wrote:
Is this what you're trying to explain?
I don't get that at all.
Quote:
No, I'm coding using structure macros to bring a bit of High Level Assembly to the 65816.
Weird...
Espozo wrote:
Basically, each frame of animation in each object consists of multiple sprites. It needs to look through vram to find an empty slot for each sprite in the frame, and the result of where it was finally found is transferred to the character bits in the metasprite routine. The reason you need to tell it if it's big or not is that it has to find a slot that size in vram. Do you get it now?
Yeah, I understand it now.
I was initially thinking that you had allocated a block of VRAM (say 32 tiles, or 2 lines) for each object when they first appear on screen.
On each frame the animation module would load the needed tiles (from the animation frame table) to that VRAM block and then the metasprite module would build the OAM table.
---
Quote:
Quote:
No, I'm coding using structure macros to bring a bit of High Level Assembly to the 65816.
Weird...
I initially thought so too. My first engine failed because it became too difficult to debug. After watching The Making of: ROM City Rampage (youtube), and looking at the example code published, I realised that is just what I needed to improve readability.
I know it's not for everyone, but I'm liking it.
UnDisbeliever wrote:
Yeah, I understand it now. I was initially thinking that you had allocated a block of VRAM (say 32 tiles, or 2 lines) for each object when they first appear on screen. On each frame the animation module would load the needed tiles (from the animation frame table) to that VRAM block and then the metasprite module would build the OAM table.
That seems to be what most people do. I don't think that would work for me, because objects are generally going to vary from 16x16 for projectiles and other small things all the way up to somewhere about 64x64 for big explosions and vehicles (I want to see if I can somehow double buffer things like that, but that'll be hard to implement). The way I'm doing it is more processor intensive, but I think it'll be worth it. psycopathicteen is doing the same thing.
This is pretty sad, but does anyone know where my "vram finder" code thing that I posted here is? I've been building up the courage to actually try to start this...
Edit: Never mind, I found it. Guess I better get to work...
Well, I just thought of a way to tell a routine during vblank which tiles to upload to vram. I'm guessing you could have it so that there is a big table that has all the information for the tiles you want to upload, kind of like this: (Data would be entered into the table in the code for the vram finder and animation engine.)
Word 1: Sprite Graphics Size (small or large)
Word 2: Position In Vram
Word 3: Tile Address
The table could probably hold about as many entries as the number of 16x16 sized sprites that could possibly be uploaded in a single frame, and for every slot you go through in the code, you have a counter for the amount of graphics you are uploading. (For example, a 16x16 sprite counts as 1, and a 32x32 sprite counts as 4.) When the number reaches the maximum amount that you can upload, you quit the routine. The request table would obviously be cleared every frame, so you could also have it to where when it reads that a sprite's size is 0 (1 will be small and 2 will be large), it will stop searching the table and will then clear it, like if there's no more bandwidth. Another thing you could possibly do is dump whatever tile requests haven't been completed into another table that will be uploaded later if possible, but it could potentially write over a new frame and would just be a hassle altogether.
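The counter idea can be modelled like this in Python. The budget of 8 units per frame is a placeholder, not a measured vblank figure, and the class and method names are invented for the sketch.

```python
MAX_UNITS = 8   # assumed budget: 16x16-sized uploads that fit in one vblank

class UploadQueue:
    def __init__(self):
        self.requests = []   # (size, vram position, tile address)
        self.units = 0       # 16x16 counts as 1, 32x32 counts as 4

    def request(self, size, vram_pos, tile_addr):
        """size: 1 = small (16x16), 2 = large (32x32).

        Returns False when the request would exceed this frame's budget.
        """
        cost = 1 if size == 1 else 4
        if self.units + cost > MAX_UNITS:
            return False
        self.requests.append((size, vram_pos, tile_addr))
        self.units += cost
        return True

    def flush(self):
        """Called in vblank: hand back this frame's requests and clear the table."""
        done, self.requests, self.units = self.requests, [], 0
        return done
```

A caller that gets False back can simply keep showing the old frame and ask again next frame, which avoids the second "leftovers" table.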
Maybe I should actually be doing something right now...
It'd probably be best just to write out all the transfers in a format similar to that used by the DMA engine itself.
0-1: Destination address in VRAM (for $2116-$2117, $FFFF at end of list)
2-4: Source address (for $4302-$4304)
5-6: Length in bytes (for $4305-$4306)
7: Write increment (for $2115)
This way, your NMI handler can just set up a 16-bit copy to VRAM ($4300=$01, $4301=$18), copy the appropriate bytes to the DMA and VRAM address registers, start a DMA copy on channel 0, and repeat.
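As a quick check of the 8-byte layout, here is a Python sketch that packs and walks such a list. The function names and the example addresses are made up; the default write increment of $80 (step by one word after the high byte is written) is a common setting for word transfers, but it is an assumption here.

```python
import struct

ENTRY = "<HHBHB"   # VRAM address, source address, source bank, length, $2115 value

def pack_entry(vram_addr, src_addr, src_bank, length, vmain=0x80):
    """One 8-byte record: bytes 0-1, 2-4, 5-6 and 7 as listed above."""
    return struct.pack(ENTRY, vram_addr, src_addr, src_bank, length, vmain)

def build_list(entries):
    """Concatenate records and append the $FFFF end marker."""
    return b"".join(pack_entry(*e) for e in entries) + struct.pack("<H", 0xFFFF)

def walk_list(data):
    """Decode the list the way an NMI handler would step through it."""
    out, i = [], 0
    while struct.unpack_from("<H", data, i)[0] != 0xFFFF:
        out.append(struct.unpack_from(ENTRY, data, i))
        i += 8
    return out
```

Since $FFFF marks the end, no real transfer can target VRAM address $FFFF, which is not a practical loss.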
tepples wrote:
It'd probably be best just to write out all the transfers in a format similar to that used by the DMA engine itself.
0-1: Destination address in VRAM (for $2116-$2117, $FFFF at end of list)
2-4: Source address (for $4302-$4304)
5-6: Length in bytes (for $4305-$4306)
7: Write increment (for $2115)
This way, your NMI handler can just set up a 16-bit copy to VRAM ($4300=$01, $4301=$18), copy the appropriate bytes to the DMA and VRAM address registers, start a DMA copy on channel 0, and repeat.
Yeah, that'd make more sense. One thing I'm not sure about is how the sprites aren't actually in a straight line, so you can't just do one transfer for one sprite. It'd be 2 for 16x16 and 4 for 32x32 sized sprites. Maybe you could have a table that a separate code reads and then transfers it to another table for DMA?
It'd probably be better to just have it to where you either had the table have slots for horizontal slivers that would be filled during the vram finder, or added another byte for size that the uploader code would take care of.
For 16x16 sprites, treat the table as having 16-byte entries:
0-1: Destination address in VRAM (for $2116-$2117, $FFFF at end of list)
2-4: Source address for top row (for $4302-$4304)
5-6: Length in bytes of one row, equal to 32 times number of tiles (for $4305-$4306)
7: Write increment (for $2115)
8-9: Destination address in VRAM, plus $0100
10-12: Source address for bottom row
13-15: Same as 5-7
And do the same thing for 32x32, except do 5-7 four times, and increase the transfer size? It could see the length in bytes for one row, and the code could decide if the sprite is either 16x16 or 32x32 because a 32x32 sized sprite is going to be twice as large horizontally as well as vertically, so you really don't need an extra byte for size. If the sprite is 16, the length of the transfer will be used twice, and if it is 32, the length of the transfer will be used four times. This should be simple enough to implement.
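The row splitting discussed above can be written down as a small Python function. It assumes 4bpp tiles (32 bytes each), a VRAM row stride of $0100 words, and that the rows of one sprite sit back to back in ROM; the function name is invented for the sketch.

```python
ROW_STRIDE = 0x0100   # VRAM words per 16-tile row of 4bpp tiles

def row_transfers(vram_addr, src_addr, size_px):
    """Split one 16x16 or 32x32 sprite into per-row DMA jobs.

    Returns a list of (VRAM word address, source address, length in bytes).
    A sprite that is N tiles wide is also N tiles tall, so the row length
    alone tells the uploader how many rows to send.
    """
    tiles_per_row = size_px // 8
    length = tiles_per_row * 32
    return [(vram_addr + row * ROW_STRIDE, src_addr + row * length, length)
            for row in range(tiles_per_row)]
```

For a 16x16 sprite this gives two 64-byte transfers, and for a 32x32 sprite four 128-byte transfers, each one row further down in VRAM.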
Well, here's something quick I made. There's still a fair amount to do, but just see if what I did so far makes sense. It's for uploading tiles. There is currently no loop or anything. Mostly see if I'm setting up DMA right, because I really haven't used it yet.
Here's a table for how the table is formatted:
Word 0: Spot In Vram
Word 1: Bank Number
Word 2: Tile Address
Word 3: Transfer/Sprite Size
And here's the code:
Code:
start_tile_uploader:
rep #$30 ; A=16, X/Y=16
ldx #$0000
lda #$1801 ; Set DMA mode (word, normal increment) and destination register (VRAM write register)
sta $4300
tile_uploader_begining:
lda TileRequestTable,x ;Spot in vram
sta $2116
lda TileRequestTable+2,x ;Bank number
sta $4303
lda TileRequestTable+4,x ;Tile address
sta $4302
lda TileRequestTable+6,x ;Transfer/sprite size
cmp #$0080
sta $4305
beq tile_uploader_32x32
;Bottom Half
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0040
sta $4302
bra ????
tile_uploader_32x32:
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0080
sta $4302
;Third Part
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0080
sta $4302
;Fourth Part
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0080
sta $4302
bra ????
Espozo wrote:
Well, here's something quick I made. There's still a fair amount to do, but just see if what I did so far makes sense. It's for uploading tiles. There is currently no loop or anything. Mostly see if I'm setting up DMA right, because I really haven't used it yet.
Firstly, I have to ask what this line is for.
Espozo wrote:
inc $2115
VMAIN ($2115) is a write-only register. When the 65816 tries to read it, it instead reads open bus, increments that, and then stores the now-unknown number into registers $2115 and $2116.
Secondly, you're also going to have to reset DAS0 ($4305 - Size) after every transfer. DMA actually decrements that register until it hits $0000, so you will need to set it again when processing the bottom half.
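A toy Python model shows why the size register has to be reloaded. It only tracks the source address and byte counter of one channel in increment mode; the class and field names are invented for the sketch.

```python
class DmaChannel:
    def __init__(self):
        self.a1t = 0   # source address ($43x2-$43x3)
        self.das = 0   # byte counter ($43x5-$43x6)

    def run(self):
        """Transfer bytes until the counter reaches zero; return how many moved.

        The counter is checked after each byte, so starting at zero moves
        65536 bytes, as on the real hardware.
        """
        moved = 0
        while True:
            self.a1t = (self.a1t + 1) & 0xFFFF   # source advances per byte
            self.das = (self.das - 1) & 0xFFFF   # counter counts down
            moved += 1
            if self.das == 0:
                return moved

channel = DmaChannel()
channel.a1t = 0x8000
channel.das = 64
first = channel.run()      # the intended 64-byte row
second = channel.run()     # counter was left at 0 and not reloaded
```

In this model `second` comes out as 65536, so a forgotten reload turns the next row into a full 64 KiB transfer.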
Here's a snippet of my 16x16 tile VBlank code. It uses 2 DMA channels to overcome that annoying limitation and saves precious VBlank time by not reloading values again.
It uses a structure of arrays to store the info (only need to decrement index by 2).
- BankAndType - lowbyte - data bank, high byte - type of transfer (0 = block, 1 = 16 px tile)
- DataPtr - location of tile data
- Size - number of bytes to transfer per row
- VramWordAddress - location to store the tiles.
- BottomHalfDataPtr - location of the bottom row of the tile data.
I chose to use a separate variable for the address of the bottom half instead of just adding $0200 to DataPtr.
Its value is set by the game loop and could be either Size or 512 bytes in size depending on the animation bytecode used.
It's very simple and allows me to upload multiple adjacent 16px tiles at once.
Prep (before loop)
Code:
.A16
.I8
LDY #VMAIN_INCREMENT_HIGH | VMAIN_INCREMENT_1
STY VMAIN
LDA #DMAP_DIRECTION_TO_PPU | DMAP_TRANSFER_2REGS | (.lobyte(VMDATA) << 8)
STA DMAP0 ; also sets BBAD0
STA DMAP1 ; also sets BBAD1
During the loop, X is table position + 2.
Code:
.A16
.I8
; Load data bank, ignore type.
LDY EntityAnimation__vramBuffer_BankAndType - 2, X
STY A1B0
STY A1B1
; destination already set.
LDA EntityAnimation__vramBuffer_DataPtr - 2, X
STA A1T0
LDA EntityAnimation__vramBuffer_BottomHalfDataPtr - 2, X
STA A1T1
LDA EntityAnimation__vramBuffer_Size - 2, X
STA DAS0
STA DAS1
LDA EntityAnimation__vramBuffer_VramWordAddress - 2, X
STA VMADD
LDY #MDMAEN_DMA0
STY MDMAEN
CLC
ADC #16 * 16
STA VMADD
LDY #MDMAEN_DMA1
STY MDMAEN
The code would not be difficult to modify for 32x32 pixel tiles, just use 4 DMA channels.
UnDisbeliever wrote:
Firstly, I have to ask what this line is for.
I had no clue what I was doing.
What register should I load from? Also, after setting up everything and doing the loop, is that when you should fire the four DMA channels?
UnDisbeliever wrote:
Secondly, you're also going to have to reset DAS0 ($4305 - Size) after every transfer. DMA will actually decrement that register until it hits $0000. In which case you will need to set it again when processing the bottom half.
Strange... I actually noticed that I was repeatedly loading the same value, and I had tried to optimize. Your solution sounds good. I think I'll fix up what I have now and finish it and then I'll put it up back here for inspection.
UnDisbeliever wrote:
I chose to use a separate variable for the address of the bottom half instead of just adding $0200 to DataPtr. Its value is set by the game loop and could be either Size or 512 bytes in size depending on the animation bytecode used. It's very simple and allows me to upload multiple adjacent 16px tiles at once.
You're saying that you want to have it to where the tiles for the top and the tiles for the bottom of a sprite are spread out? Also, what would you want to alter the transfer type for?
Espozo wrote:
You're saying that you want to have it to where the tiles for the top and the tiles for the bottom of a sprite are spread out?
Yeah, currently 16x16 tiles are spread out so the bottom half is 512 bytes after the top half (for simplicity - like in VRAM). In the future I may code a sprite editor app that would remove the unused bytes as needed. I'm just keeping my options open.
Espozo wrote:
Also, what would you want to alter the transfer type for?
My metaSprites are made up of a mixture of 8x8 and 16x16 Objects.
I use the Type byte to handle either one row being uploaded (8px tiles) or two (16px tiles). I decided not to use Size to determine that, as it allows me to upload multiple tiles together.
I'm also thinking of adding a third transfer type that would allow me to transfer multiple 16px tiles in a single command, but I'll need to make the sprite editor app first (building the data by hand would be a nightmare).
UnDisbeliever wrote:
I use the Type byte to handle either one row being uploaded (8px tiles) or two (16px tiles). I decided not to use Size to determine that, as it allows me to upload multiple tiles together.
So I should be fine with just that transfer type?
Espozo wrote:
So I should be fine with just that transfer type?
Kinda, the fastest way to do it is to have two transfer types (16px and 32px tile).
You just need to check the size of the transfer and then branch to either the code for 16px tiles or the code for 32px tiles (the one that uses 4 DMA channels).
UnDisbeliever wrote:
Kinda, the fastest way to do it is to have two transfer types (16px and 32px tile).
You just need to check the size of the transfer and then branch to either the code for 16px tiles or the code for 32px tiles (the one that uses 4 DMA channels).
I meant on the DMA side of things, like increment and stuff. If you look at the code I posted, I have it to where there are two transfer types.
Espozo wrote:
I meant on the DMA side of things, like increment and stuff. If you look at the code I posted, I have it to where there are two transfer types.
You're going to have to point it out to me. I only see DMA mode 1, SNES -> PPU, increment, VMDATA word writes. Nothing about a different DMA mode there.
By transfer type, I meant 16x16 and 32x32 sprites. You are right in that I'm only using DMA mode 1, SNES -> PPU, increment, VMDATA word writes.
Espozo wrote:
By transfer type, I meant 16x16 and 32x32 sprites. You are right in that I'm only using DMA mode 1, SNES -> PPU, increment, VMDATA word writes.
Ah, I see.
Still your fastest method is to have two transfer types in your VBlank (DMA) routine, just test the size and branch as needed.
UnDisbeliever wrote:
Ah, I see.
No, it's really my fault for not being specific enough. (Both are what I guess you'd call "transfer types".
Maybe we could call 16x16 and 32x32 "types", and the other thing "modes".)
UnDisbeliever wrote:
Still your fastest method is to have two transfer types in your VBlank (DMA) routine, just test the size and branch as needed.
I do. The first horizontal sliver is set up the same for both, so it gave me an opportunity to do that for both and then branch off when it checks the size.
Here is where it checks it:
Code:
lda TileRequestTable+6,x ;Transfer/sprite size
cmp #$0080
sta $4305
beq tile_uploader_32x32
Espozo wrote:
UnDisbeliever wrote:
Ah, I see.
No, it's really my fault for not being specific enough. (Both are what I guess you'd call "transfer types".
Maybe we could call 16x16 and 32x32 "types", and the other thing "modes".)
I'm also having an off day explaining things.
The fastest way (known to me) of loading a 32px tile to VRAM is to take my 16px tile transfer code and extend it to use 4 DMA channels. But that uses a different number of channels than the 16px code, so it's best to test the size before setting the DMA registers.
Code:
LDA size
CMP #$0080
BEQ upload_32_tile
// 16px tile DMA code goes here
BRA done
upload_32_tile:
// 32px tile DMA code goes here
done:
I hope that explains what I've been trying to say. I'm going to take a break here.
Why didn't my HL macro code work with 65816?
Movax12 wrote:
Why didn't my HL macro code work with 65816?
I'm not even going to lie, but I have no clue what you're talking about.
UnDisbeliever wrote:
But that uses a different number of channels than the 16px code, so it's best to test the size before setting the DMA registers.
Why does it matter? Both are going to use channel 0 for the first row, and you're going to fire all 4 DMA channels at the end of the whole thing anyway. I just have it to where it checks the size when it loads it for the next row and then branches to the 32x32 code if it is the number 128 so it saves a load and some rom space.
I don't know why, but I'm feeling really out of it today. I want to have it to where I do this if I'm using 8 bits:
lda #%00001111
sta $420B
But I'm using 16. Is this correct? I often forget what order the bytes are supposed to be in. (It doesn't matter if $420A is zeroed.)
lda #%0000111100000000
sta $420A
Also, why couldn't I do "inc $2115" again?
Espozo wrote:
Also, why couldn't I do "inc $2115" again?
Because inc addr is a read-modify-write instruction, and you can't read $2115?
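One common way around this, sketched here in ca65 syntax, is to keep a copy of the value in RAM, modify the copy, and then write it to the port. The name VMAIN_shadow is made up for this example, and the sketch assumes the data bank register points at a bank where low RAM is visible.

```asm
.p816

.bss
VMAIN_shadow: .res 1    ; hypothetical RAM copy of the last value written to $2115

.code
bump_vmain:
    sep #$20            ; 8-bit accumulator and memory accesses
    .a8
    inc VMAIN_shadow    ; the read-modify-write happens in RAM, which can be read
    lda VMAIN_shadow
    sta $2115           ; plain write to the write-only port
    rts
```

Whether incrementing the value in $2115 is actually what you want is a separate question, since that register selects the increment mode rather than holding an address.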
Espozo wrote:
I don't know why, but I'm feeling really out of it today. I want to have it to where I do this if I'm using 8 bits:
lda #%00001111
sta $420B
But I'm using 16. Is this correct? I often forget what order the bytes are supposed to be in. (It doesn't matter if $420A is zeroed.)
lda #%0000111100000000
sta $420A
You've got the bytes backwards. When you're spelling out a "word" in your code, it is in the easier-to-read, i.e. correct, order. In the actual ROM, though, it's little-endian, so backwards; e.g. .dw $4455 is $55, $44 in ROM.
EDIT: WAIT you're trying to write to 420B? Okay that should work. Sorry...
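To spell out the byte order, here is a small ca65 sketch of both versions. A 16-bit store writes the low byte of A to the given address and the high byte to the address after it, so the $0F has to sit in the high byte to land in $420B. One thing to keep in mind: $420A is the high byte of the V-timer (VTIMEH), so the 16-bit version also clears that, which should only be harmless if you aren't using V-count IRQs.

```asm
.p816
.code

fire_dma_8bit:
    sep #$20
    .a8
    lda #%00001111          ; channels 0-3
    sta $420B               ; MDMAEN: start the transfers
    rts

fire_dma_16bit:
    rep #$20
    .a16
    lda #%0000111100000000  ; = $0F00: low byte $00, high byte $0F
    sta $420A               ; $00 goes to $420A, $0F goes to $420B
    rts
```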
Khaz wrote:
EDIT: WAIT you're trying to write to 420B? Okay that should work. Sorry...
It's fine. I'm glad that I know it's right.
93143 wrote:
Espozo wrote:
Also, why couldn't I do "inc $2115" again?
Because inc addr is a read-modify-write instruction, and you can't read $2115?
What would happen if you tried to read it? Would you just get 0?
Reading a write-only port produces "open bus". The behavior of open bus differs based on A. exactly how the enable logic is implemented inside the PPUs, and B. whether the PPUs have their own internal data bus with substantial capacitance.
On the NES
It was confirmed in Riding the open bus that the NES PPU's internal data bus behaves as an 8-bit latch that stores the last value written to or read from any PPU port. Reading any write-only register (or reading the unimplemented bits of any readable port such as $2002) will return bits from this latch. I have developed a test ROM for NES open bus behavior.
On the Super NES
Open bus behavior on the Super NES is far more complicated according to "SNES Unpredictable Things" in Fullsnes. The PPU1 (address generator) and PPU2 (compositor) each have their own data latch, but they don't enable it for all reads the way the NES PPU does.
tepples wrote:
Reading a write-only port produces "open bus". The behavior of open bus differs based on A. exactly how the enable logic is implemented inside the PPUs, and B. whether the PPUs have their own internal data bus with substantial capacitance.
Wouldn't this make for a good, seemingly random number generator?
I know I'm sounding like a broken record here, but what exactly does $2115 even do? All I found was that it's the "VRAM Address Increment Value".
I just did some minor things with the code. I just don't know what it means by "increment value" when you write to $2116 for the value anyway, unless $2115 is optional in that whatever number it holds is added to $2116.
Table Setup:
Word 1: Spot In Vram
Word 2: Bank Number
Word 3: Tile Address
Word 4: Transfer/Sprite Size
Code:
start_tile_uploader:
rep #$30 ; A=16, X/Y=16
sta $4300
tile_uploader_loop:
cpx TileUploaderRequestCounter
bne tile_uploader_begining
ldx #$0000
lda #$1801 ; Set DMA mode (word, normal increment) and destination register (VRAM write register)
sta $4300
lda #%0000111100000000 ; Initiate DMA transfer (channel 0, 1, 2, and 3)
sta $420A
rts
tile_uploader_begining:
lda TileRequestTable,x ;Spot in vram
sta $2116
lda TileRequestTable+2,x ;Bank number
sta $4303
lda TileRequestTable+4,x ;Tile address
sta $4302
lda TileRequestTable+6,x ;Transfer/sprite size
cmp #$0080
sta $4305
sta $4315
beq tile_uploader_32x32
;Bottom Half
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0040
sta $4312
txa
clc
adc #$0008
tax
bra tile_uploader_loop
tile_uploader_32x32:
sta $4325
sta $4335
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0080
sta $4312
;Third Part
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0080
sta $4322
;Fourth Part
inc $2115
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda TileRequestTable+4,x ;Tile address (all the tiles for one sprite will be in the same bank)
clc
adc #$0080
sta $4332
txa
clc
adc #$0008
tax
bra tile_uploader_loop
What does this mean? I don't know why, but I assumed that the address would be incremented automatically so it wouldn't just do multiple writes in the same place. Would it be good if I kept it at #00?
Quote:
i = Address increment mode^:
0 => increment after writing $2118/reading $2139
1 => increment after writing $2119/reading $213a
Espozo wrote:
What does this mean? I don't know why, but I assumed that the address would be incremented automatically so it wouldn't just do multiple writes in the same place. Would it be good if I kept it at #00?
Quote:
i = Address increment mode^:
0 => increment after writing $2118/reading $2139
1 => increment after writing $2119/reading $213a
Normally you want that bit 1, so you can write two bytes to the same word address before it increments. If you're doing something that requires modifying every other byte, like writing Mode 7 pixel data, or Mode 7 tilemap data, or changing out BG tile palettes without moving the tiles, you can just target one of the two VRAM data ports and have it increment the word address after you've written that port. In the case of Mode 7 tilemap data, you want that bit 0 so you can write to the low byte and have the address go up.
Another thing I'm being stupid about is you have to fire DMA after every sprite, right?
Espozo wrote:
tepples wrote:
Reading a write-only port produces "open bus". The behavior of open bus differs based on A. exactly how the enable logic is implemented inside the PPUs, and B. whether the PPUs have their own internal data bus with substantial capacitance.
Wouldn't this make for a good, seemingly random number generator?
Usually not. Most open bus that you find in these old systems is deterministic, based on the last value that appeared on a particular bus. For "classic" open bus, it is usually the last byte of the opcode. For PPU open bus, it is the last byte read or written while the PPU was selected.
Quote:
I know I'm sounding like a broken record here, but what exactly does $2115 even do? All I found was that it's the "VRAM Address Increment Value".
Essentially the same thing bit 2 of $2000 does on the NES. It determines whether 1, 32, or 128 is added after a write to $2118/$2119. Use 1 for pattern tables or for horizontal runs in nametables, 32 for vertical runs in mode 0-6 nametables, or 128 for vertical runs in the mode 7 nametable.
Bit 7 of $2115 determines whether writes to $2118 or $2119 cause the VRAM address to be incremented. You normally want to leave this bit turned on (update on $2119 write) unless you're either A. updating the mode 7 nametable or B. updating bits 7-0 of a mode 0-6 nametable tile number without affecting the flip, priority, palette, or bits 9-8 of the tile number. One example of B would involve clearing a nametable to a particular 16-bit value: one DMA to $2118 with a constant source address and another to $2119.
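As a rough illustration of the values described above, here are three typical $2115 settings in ca65 syntax. The routine name is made up, and in practice you would write just the one value that matches the upload you are about to do.

```asm
.p816
.code

set_vmain_examples:
    sep #$20
    .a8
    lda #$80            ; bit 7 set, step 1: advance after $2119 (normal word writes, tile data)
    sta $2115
    lda #$81            ; bit 7 set, step 32: advance one tilemap row per word (vertical runs)
    sta $2115
    lda #$00            ; bit 7 clear, step 1: advance after $2118 (low bytes only)
    sta $2115
    rts
```

For the sprite tile uploads in this thread, $80 is the setting that matches DMA mode 1 writing $2118 and $2119 in pairs.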
tepples wrote:
Essentially the same thing bit 2 of $2000 does on the NES.
I'd have no clue what that is.
Because of $2116, you couldn't upload more than one transfer at a time by firing more than one DMA channel, could you? Would you have to do something like this?
Code:
;First Row
lda TileRequestTable+2,x ;Spot in vram
sta $2116
lda #%0000000100000000 ; Initiate DMA transfer (channel 0)
sta $420A
;Second Row
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0100
sta $2116
lda #%0000001000000000 ; Initiate DMA transfer (channel 1)
sta $420A
;Third Row
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0200
sta $2116
lda #%0000010000000000 ; Initiate DMA transfer (channel 2)
sta $420A
;Fourth Row
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0300
sta $2116
lda #%0000100000000000 ; Initiate DMA transfer (channel 3)
sta $420A
You could use 8 channels to set the address, data, address, data, address, data, address, and data.
tepples wrote:
You could use 8 channels to set the address, data, address, data, address, data, address, and data.
? There are four 4-tile slivers in a 32x32 sized sprite. That's why it's like that.
If this makes it any easier to understand:
Table Setup:
Word 1: Transfer/Sprite Size
Word 2: Spot In Vram
Word 3: Bank Number
Word 4: Tile Address
Code:
start_tile_uploader:
rep #$30 ; A=16, X/Y=16
ldx #$0000
lda #$1801 ; Set DMA mode (word, normal increment) and destination register (VRAM write register)
sta $4300
lda #$0080
sta $2115
tile_uploader_begining:
lda TileRequestTable,x ;Transfer/Sprite Size
sta $4305
sta $4315
cmp #$0080
beq tile_uploader_32x32
;16x16 Top Half
lda TileRequestTable+2,x ;Spot in vram
sta $2116
lda TileRequestTable+4,x ;Bank number
sta $4303
sta $4313
lda TileRequestTable+6,x ;Tile address
sta $4302
clc
adc #$0040
sta $4312
lda #%0000000100000000 ; Initiate DMA transfer (channel 0)
sta $420A
;16x16 Bottom Half
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0100
sta $2116
lda #%0000001000000000 ; Initiate DMA transfer (channel 1)
sta $420A
txa
clc
adc #$0008
tax
cpx TileUploaderRequestCounter
bne tile_uploader_begining
rts
tile_uploader_32x32:
sta $4325 ;A still holds the size; $4305 and $4315 were already set before the branch
sta $4335
lda TileRequestTable+2,x ;Spot in vram
sta $2116
lda TileRequestTable+4,x ;Bank number
sta $4303
sta $4313
sta $4323
sta $4333
lda TileRequestTable+6,x ;Tile address
sta $4302
clc
adc #$0080 ;each 32x32 row is 4 tiles = $80 bytes, assuming the rows are stored back to back
sta $4312
clc
adc #$0080
sta $4322
clc
adc #$0080
sta $4332
lda #%0000000100000000 ; Initiate DMA transfer (channel 0)
sta $420A
;Second Row
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0100
sta $2116
lda #%0000001000000000 ; Initiate DMA transfer (channel 1)
sta $420A
;Third Row
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0200
sta $2116
lda #%0000010000000000 ; Initiate DMA transfer (channel 2)
sta $420A
;Fourth Row
lda TileRequestTable+2,x ;Spot in vram
clc
adc #$0300
sta $2116
lda #%0000100000000000 ; Initiate DMA transfer (channel 3)
sta $420A
txa
clc
adc #$0008
cmp TileUploaderRequestCounter
beq done
tax
brl tile_uploader_begining
done:
rts
What he means is that rather than stopping after every row to manually write the new VRAM address, you could dedicate every other DMA channel to writing $2116/$2117, resulting in a single 8-channel DMA transfer (or 7-channel if you set up the initial address manually first) rather than four single-channel transfers.
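A cut-down sketch of that idea for a two-row (16x16) upload, in ca65 syntax: channel 0 carries the first row, channel 1 rewrites $2116/$2117, and channel 2 carries the second row. Row2VramAddr is a made-up RAM variable, and the sketch assumes vblank or forced blank, $2115 = $80, and that the source addresses and sizes for channels 0 and 2 have already been filled in the way the uploader above does it.

```asm
.p816

.bss
Row2VramAddr: .res 2        ; hypothetical: VRAM word address of the second row

.code
; In: A (16-bit) = VRAM word address of the first row.
; Assumes $4302-$4306 and $4322-$4326 already hold source and size for each row.
upload_two_rows:
    rep #$20
    .a16
    sta $2116               ; the first row's address is still written by hand
    clc
    adc #$0100              ; the second row is $100 words further on
    sta Row2VramAddr

    lda #$1801              ; channels 0 and 2: mode 1 to $2118/$2119
    sta $4300
    sta $4320
    lda #$1601              ; channel 1: mode 1 to $2116/$2117
    sta $4310
    lda #.loword(Row2VramAddr)
    sta $4312               ; source address of the 2-byte value
    lda #$0002
    sta $4315               ; two bytes: address low, then address high

    sep #$20
    .a8
    lda #^Row2VramAddr
    sta $4314               ; source bank
    lda #%00000111          ; channels run in order: 0 (data), 1 (address), 2 (data)
    sta $420B
    rts
```

Extending this to four rows would need channels 3 to 6 as well, each address channel with its own 2-byte source variable.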
93143 wrote:
What he means is that rather than stopping after every row to manually write the new VRAM address, you could dedicate every other DMA channel to writing $2116/$2117, resulting in a single 8-channel DMA transfer (or 7-channel if you set up the initial address manually first) rather than four single-channel transfers.
DMA actually doesn't fire everything at once, correct? Doesn't it do the first, then the second, then the third and so on? Also, I imagine DMAing to $2116/$2117 wouldn't really do much. There isn't actually a penalty for doing it one channel at a time, is there? I just wouldn't bother with the 8 channel thing because you still have to set up DMA for it so it probably wouldn't really offer much of a speed advantage.
Espozo wrote:
There isn't actually a penalty for doing it one channel at a time, is there?
The penalty for doing DMA one channel at a time mostly arises from having to use valuable vblank time to set up DMA registers. But this might be negligible.
tepples wrote:
Espozo wrote:
There isn't actually a penalty for doing it one channel at a time, is there?
The penalty for doing DMA one channel at a time mostly arises from having to use valuable vblank time to set up DMA registers. But this might be negligible.
Won't you still use vblank time to set up the 8 channel thing?
You can set up a DMA transfer during active picture and fire it during vblank.
tepples wrote:
You can set up a DMA transfer during active picture and fire it during vblank.
So I have it to where I only use channel 0,3,5, and 7 for actually uploading the sprite graphics and I use 2,4, and 6 to change the address for where it goes in vram? Wait a minute, what happens if there's slowdown and it doesn't get to the part where the vram addresses are filled out? I don't think I'll bother with this. Does the rest of my code look any good? I'm just not used to doing anything involving DMA or HDMA.
Espozo wrote:
Movax12 wrote:
Why didn't my HL macro code work with 65816?
I'm not even going to lie, but I have no clue what you're talking about.
See this post:
UnDisbeliever wrote:
No, I'm coding using structure macros to bring a bit of High Level Assembly to the 65816.
In structure.inc, UnDisbeliever says (referring to my macros), "I first tried using the HL macros for 65816 code, but they kept mixing 8/16 bit state and failing to generate correct code and thus tried to look for something simpler."
Maybe he could explain more? Without thinking about it too much (I don't know 65816 well) I am not sure why they would fail.
Movax12 wrote:
In structure.inc, UnDisbeliever says (referring to my macros), "I first tried using the HL macros for 65816 code, but they kept mixing 8/16 bit state and failing to generate correct code and thus tried to look for something simpler."
Maybe he could explain more? Without thinking about it too much (I don't know 65816 well) I am not sure why they would fail.
Sounds to me like the issue was some of the macros expect a specific processor state (8 or 16 bit A/X/Y) on entry so they'll only work if called in the right place. Alternately, the macro may set itself up to the correct register widths but the programmer may be unaware and thus the macro will change the processor state without his knowledge. So, if the problem happens to be the latter you could make things a tiny bit more "user-friendly" with a PHP-PLP around the macro, if you're not overly concerned about the efficiency of code made with your macros...
Though, if you have the macro source code handy, neither one should be much of a problem...
Well the macro code I wrote is designed for plain 6502, so it doesn't change register widths, not sure if a register was in 16 bit mode how it would affect the outcome of the flags and branches. I have to read more about 65816.
Movax12 wrote:
Well the macro code I wrote is designed for plain 6502, so it doesn't change register widths, not sure if a register was in 16 bit mode how it would affect the outcome of the flags and branches. I have to read more about 65816.
Short answer is, nine times out of ten, one instruction performed with the wrong register width will easily break everything; hence keeping track of which mode you're in is one of the first responsibilities of a 65816 programmer.
To get around the problem, you can simply add a SEP #$30 instruction at the start of every macro, which will guarantee 8-bit registers. To ensure complete compatibility, you can put a PHP before it and a PLP at the end of the macro so that your code will always run 8-bit and restore the processor state after it's done.
However, that's adding three (tiny) instructions worth of processor load/ROM to every macro. So, in all honesty, I think the macros are fine as-is given it's made clear that they all require you to be in 8-bit A/X/Y mode before you call them...
EDIT: Wrote REP instead of SEP. >.<
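For illustration, the PHP/SEP/PLP wrapping could look roughly like this in ca65. The macro names begin_8bit and end_8bit are invented for this sketch and are not part of anyone's macro package.

```asm
.p816

; Hypothetical wrapper macros.
.macro begin_8bit
    php                 ; save the caller's M and X flags
    sep #$30            ; 8-bit A, X and Y
    .a8
    .i8
.endmacro

.macro end_8bit
    plp                 ; restore whatever widths the caller had
.endmacro

.code
example:
    rep #$30
    .a16
    .i16
    begin_8bit
    lda #$12            ; assembled as a one-byte immediate
    end_8bit
    .a16                ; the assembler can't see what PLP restored, so tell it
    .i16
    lda #$1234
    rts
```

One catch worth knowing: switching X and Y to 8-bit clears their high bytes, and PLP does not bring those back, so a caller holding a 16-bit index across the macro would need to save it separately.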
Or you can use variables (if it has macros it has variables too, right?) that keep track of the current flag status and add opcodes to change it only when really needed. Then you only need to keep track of when the flag status is potentially not known (e.g. at the beginning of subroutines or when coming from a branch).
Movax12 wrote:
Well the macro code I wrote is designed for plain 6502, so it doesn't change register widths, not sure if a register was in 16 bit mode how it would affect the outcome of the flags and branches. I have to read more about 65816.
The primary issues were things like this:
Code:
eor #$80 ; toggle negative bit
do_array_syntax cmp, {#( .mid(1,255, OPERAND) ^ $80 )}
.if _HL_::__KEEP_REG_A_ .and _left_ = _COMP_MACRO_::REGA
eor #$80 ; put bit7 back if we are just restoring A
.endif
Which require the $80 to be $8000 for a 16 bit Accumulator.
There are also a few .lobyte instructions for wrapping signed numbers that would fail in 16 bit mode.
I tried editing it, but had difficulty trying to keep the state of the accumulator in check as the macro unfolded. Given the lack of check accumulator size pseudo-functions in ca65 I just gave up and went with something simpler (http://wilsonminesco.com/StructureMacros/index.html), which resulted in structure.inc.
Khaz wrote:
So, in all honesty, I think the macros are fine as-is given it's made clear that they all require you to be in 8-bit A/X/Y mode before you call them...
I agree.
I have no problem with the ca65hl macros themselves. It's just that my code uses a mixture of 8, 16, 24 and 32 bit values, and it would look messy if I used ca65hl for 40% of the comparisons and the other 60% used the more traditional assembly with branches.
UnDisbeliever wrote:
The primary issues were things like....
Thanks, I'll consider upgrading my code to support 65816.
Me wrote:
Does the rest of my code look any good?
I guess I'll take silence as a yes? Anyway, I'll try to make the rest of this. Wish me luck...
I think I know how to speed up looking for vram slots. Instead of each byte holding the flags of 4 16x16 slots that make up a 32x32 slot, it probably is faster to have each byte hold 1 of the 4 16x16 slots of 8 32x32 slots.
psycopathicteen wrote:
it probably is faster to have each byte hold 1 of the 4 16x16 slots of 8 32x32 slots.
Wouldn't that just make it harder to look for an empty 32x32 slot, because the bits will be spread out?
Anyway, I thought of something really stupid a while back. It's a simple alternator for whether you start at the beginning of the table and go forward, or start at the end of the table and go backward. Would this even be worth implementing? All it would really be is this, but I don't really like the thought of playing to the best case scenario, and it would require more ROM space. (Which I honestly couldn't care less about, but still.)
Code:
lda TableStartPosition
bne backwards
;forwards code here
inc TableStartPosition
rts
backwards:
;backwards code here
stz TableStartPosition
rts
You can pretty much do this and be done.
BTW, why does nesdev's code font always make code look longer than it really is?
Code:
sep #$10
rep #$20
ldx #$00
lda {vram_slot_table}
ora {vram_slot_table}+4
ora {vram_slot_table}+8
ora {vram_slot_table}+12
cmp #$ffff
beq check_other_bank
sep #$20
cmp #$ff
bne +
xba
inx
+;
tay
lda LUT,y
tsb {vram_slot_table}
tsb {vram_slot_table}+4
tsb {vram_slot_table}+8
tsb {vram_slot_table}+12
bra found_slot
check_other_bank:
ldx #$02
lda {vram_slot_table}+2
ora {vram_slot_table}+6
ora {vram_slot_table}+10
ora {vram_slot_table}+14
cmp #$ffff
beq no_slot_found
sep #$20
cmp #$ff
bne +
xba
inx
+;
tay
lda LUT,y
tsb {vram_slot_table}+2
tsb {vram_slot_table}+6
tsb {vram_slot_table}+10
tsb {vram_slot_table}+14
found_slot:
lda LUT2,y
ora LUT3,x
rep #$20
ora #$00ff
asl
sta {sprite_name}
rts
no_slot_found:
lda #$0200
sta {sprite_name}
rts
psycopathicteen wrote:
You can pretty much do this and be done.
I'm not even kidding, but I don't have the slightest clue as to what your code is doing. What does "LUT" mean? I think I can handle it. (To my knowledge, I actually came up with the idea in the first place.)
psycopathicteen wrote:
BTW, why does nesdev's code font always make code look longer than it really is?
Compared to what? I usually still use notepad because I like it the most, and the letters and the vertical distance between them is slightly smaller, so that could contribute to it.
I just realized that I have no clue how to do something: How would you index a value in the accumulator by x or y? It's because I want to load from a table that I'm indexing that contains the values of the addresses of other tables that I will then also be indexing.
Espozo wrote:
What does "LUT" mean?
LookUp Table.
Or Launch Umbilical Tower, but I don't think the context is quite right for that...
93143 wrote:
Espozo wrote:
What does "LUT" mean?
LookUp Table.
Oops. I probably shouldn't have asked a Google-able question.
Anyway though, for indexing the table, would it be kind of like this: (but wait, this won't allow you to load any of the other values from the table that the address table is pointing to, will it?)
Code:
lda (TableFullOfTableAdresses,x)
There's no addressing mode using a register as indirect address. Either use any of the direct page indirection modes, or push the address to the stack and use the stack relative indirect indexed addressing mode (that's a mouthful, and depending on what you need to do it'll probably be slower too).
Read all about it in "Programming the 65816" book from WDC, which I think is freely available on the net.
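A minimal sketch of the direct page indirect route in ca65 syntax, reusing the table name from the question. TablePtr, TableA and TableB are invented for the example, and it assumes the direct page register is $0000 and the data bank register points at a bank where both the pointer table and the target tables are visible.

```asm
.p816

.zeropage
TablePtr: .res 2            ; hypothetical direct-page pointer

.code
; In: A = byte offset into TableFullOfTableAdresses, Y = byte offset inside the chosen table.
; Out: A = the word read from the chosen table.
read_via_pointer:
    rep #$30
    .a16
    .i16
    tax
    lda TableFullOfTableAdresses,x
    sta TablePtr
    lda (TablePtr),y        ; the bank comes from the data bank register
    rts

.rodata
TableFullOfTableAdresses:
    .addr TableA, TableB
TableA: .word $1111, $2222
TableB: .word $3333, $4444
```

Because the table address lives in the pointer and the offset lives in Y, X stays free for something else, such as the index into the request table.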
I guess what I could do is I load from the table full of table addresses indexed by x, transfer the accumulator to x, (that's the value of the address for the table, and I won't need x again) and then index $0000 in ram, so I'm reading what I just indexed, if that makes sense.
Code:
ldx Whatever
lda TableFullOfTableAdresses,x
tax
lda Zero,x
Espozo wrote:
I guess what I could do is I load from the table full of table addresses indexed by x, transfer the accumulator to x, (that's the value of the address for the table, and I won't need x again) and then index $0000 in ram, so I'm reading what I just indexed, if that makes sense.
Code:
ldx Whatever
lda TableFullOfTableAdresses,x
tax
lda Zero,x
Sure, that would work. If you're doing a lot of accesses on the table it may be faster to push the address you've loaded into A to the direct page register, and do subsequent reads with lda z:$00,x (in ca65 parlance).
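Something like the following is one way to read that suggestion, in ca65 syntax. The routine name is made up. Direct page accesses always go to bank $00, so this only works if the table is reachable there, and the caller's direct page has to be saved and restored around it.

```asm
.p816
.code

; In: A (16-bit) = address in bank $00 of the table to read.
; Out: A = first word of the table, Y = second word.
read_with_dp:
    rep #$30
    .a16
    .i16
    phd                 ; save the caller's direct page
    tcd                 ; D = table address
    ldy z:$02           ; second word of the table
    lda z:$00           ; first word of the table
    pld                 ; restore the direct page (A and Y are untouched)
    rts
```

This clashes with code that already uses D to point at the current object, as the animation routine in this thread does, so it is probably only worth it when a table is read many times in a row.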
Well, I'm bored, so I feel like posting what I made of the animation table looker thing so far. I don't really have anything commented on because I'm lazy, but if someone is curious about something, I can if they want me to.
Animation Update Table Setup:
Word 1: Spot In Vram
Word 2: Transfer/Sprite Size
Word 3: Bank Number
Word 4: Tile Address
Animation Tables in Rom Setup: (It obviously doesn't have the spot in vram.)
Word 1: Transfer/Sprite Size
Word 2: Bank Number
Word 3: Tile Address
Code:
start_table_finder:
rep #$30
ldx #$0000
animation_table_finder:
lda AnimationFrame
clc
adc AnimationCounter
tay
lda a:AnimationTableTable,y
cmp #$0003
bcc special_animation_table_commands ;not yet implemented, but there should be 3 different commands.
tay
inc AnimationCounter
animation_table_reader_loop:
lda a:Zero,y
beq animation_table_reader_done
sta TileRequestTable+2,x
lda a:Zero+2,y
sta TileRequestTable+4,x
lda a:Zero+4,y
sta TileRequestTable+6,x
tya
clc
adc #$0006
tay
txa
clc
adc #$0008
tax
bra animation_table_reader_loop
animation_table_reader_done:
tdc
clc
adc #$30
tcd
cmp #6144+ObjectTable
bne animation_table_finder
stx TileUploaderRequestCounter
rts
One problem I'm having is that I have no clue how to implement the vram finder into this, because there aren't enough registers for both to go on at the same time.
Should I somehow try and do it after?
Edit: Doing it after really doesn't make any sense, because it will probably take more CPU time to look through the list again than it would to just preserve the registers.
One thing I thought about the vram finder is that with single objects that consist of lots of sprites, you could have it to where you find an open slot for the first sprite and then you know not to look before that for the next one, because it would have used one of the slots if they were open. Wait a minute, though: it could only know to do this if it went through the table already, which I thought wasn't a good idea. I have no clue what to do.
Edit 2: Actually, just dismiss what I just said. I'm not thinking straight today.
I just thought of something: I never even coded the vram finder to say where the sprites are stored in vram, which is fine for uploading, but you don't know their location for erasing it from vram because of an animation change.
I thought of one other thing though. If you could somehow make it to where all the 32x32 sprites went before the 16x16 ones, you could potentially condense vram more so you could store more things, because the 16x16 will just look for the first thing that's available, and not what fits the 32x32s. Why is it that I come up with all these ideas, but not have a clue how to implement them...
Edit: I just thought of something: For the objects, you could maybe have a table that says what sprite is going to what object? It would be 128 entries long so there are all the sprites, and each slot in the table would say what object the sprite is, and what spot in vram it's using. The problem is, you'd have to look through the table to find what each sprite goes to each object when you need to delete it from the table. Maybe in the object slot, you could have a register to say where in the table the first sprite is, and also something that says how many sprites there are, so it knows when to stop searching? Why am I getting so ahead of myself... I should probably worry about getting rid of objects from vram once I actually get them there...
Espozo wrote:
I thought of one other thing though. If you could somehow make it to where all the 32x32 sprites went before the 16x16 ones, you could potentially condense vram more so you could store more things, because the 16x16 will just look for the first thing that's available, and not what fits the 32x32s. Why is it that I come up with all these ideas, but not have a clue how to implement them...
That's what I'm doing. With my engine OBJ VRAM is split into two sections.
The top section is for meta-sprites and allocates 2 VRAM rows (32 tiles) to each Entity when it is about to appear on-screen. It consists of 14 slots.
The bottom section is for 16x16/8x8 singular sprites (for projectiles, coins, etc.); each Entity will be given a single 16x16 tile on activation. It consists of 16 slots (currently unimplemented).
Espozo wrote:
I just thought of something: I never even coded the vram finder to say where the sprites are stored in vram, which is fine for uploading, but you don't know their location for erasing it from vram because of an animation change.
For the meta-sprites I have a table called entityInVramSlots. It's currently 15 words in size and points to the address of the Entity using that VRAM slot. If any value is zero then it's free. Each slot represents 2 rows (32 tiles) in the VRAM.
On Entity Activation (it's almost on screen), the table is scanned for the first free slot. Once found, the entity's address is stored in the slot, a variable called EntityAnimationStruct::metaSpriteCharAttr is set to slot index * 16 (index is already a multiple of two), and EntityAnimationStruct::tileVramWordAddress is set to slot index * 16 * 16 + GAMELOOP_OAM_TILES.
On Entity deactivation (it's off-screen or dead) the VRAM slot is freed. This is easily done since (EntityAnimationStruct::tileVramWordAddress - GAMELOOP_OAM_TILES) / (16 * 16) is equal to the slot index.
Since you're using 32x32/16x16 pixel sprites, the formula to turn a slot index into a tile address would be different. I hope this helps.
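The two formulas above come down to a handful of shifts. Here is a hedged ca65 sketch: the two variables are plain stand-ins for the EntityAnimationStruct fields, and the value given to GAMELOOP_OAM_TILES is a placeholder, since the real constant isn't shown in the post.

```asm
.p816

GAMELOOP_OAM_TILES = $6000  ; placeholder value, not the real constant

.bss
metaSpriteCharAttr:  .res 2 ; stand-ins for the EntityAnimationStruct fields
tileVramWordAddress: .res 2

.code
; In: X = slot index as a byte offset into the word table (0, 2, 4, ...).
slot_to_vram:
    rep #$30
    .a16
    .i16
    txa
    asl
    asl
    asl
    asl                     ; index * 16 = first tile number of the slot
    sta metaSpriteCharAttr
    asl
    asl
    asl
    asl                     ; index * 256 = word offset (16 words per 4bpp tile)
    clc
    adc #GAMELOOP_OAM_TILES
    sta tileVramWordAddress
    rts
```

With an index of 2 (the second slot) this gives tile 32 and a word offset of 512, which matches 32 tiles per slot.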
What I mean by doing 32x32s before 16x16s is look at the picture below, where red are places in vram that have already been taken, and green are free slots.
Attachment: vram.png
If a 16x16 went before a 32x32, it would take the first thing that was open and take the first spot, making it to where the 32x32 couldn't fit. If the 32x32 went first, it would take the big open spot and the 16x16 could also fit in the other open spot. I think I've kind of settled on a solution to my problem. The vram uploader code will run after the animation engine, where it will read the DMA request table. This won't have it to where all the 32x32s go first, but I guess I'll figure that out later, if I even do implement it. This whole thing is a nightmare.
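For what it's worth, one simple way to get "32x32s first" without sorting anything is to walk the request table twice, which costs a second pass over the table. A rough ca65 sketch follows, using the request layout from the tables in this thread (size in the second word, 8 bytes per entry). The table size of 32 entries and the routines alloc_32x32 and alloc_16x16 are placeholders, and the routines are assumed to preserve X.

```asm
.p816

.bss
TileRequestTable:           .res 8 * 32   ; 8 bytes per request; 32 entries is a guess
TileUploaderRequestCounter: .res 2        ; byte offset just past the last request

.code
find_vram_two_pass:
    rep #$30
    .a16
    .i16
    ldx #$0000
pass_32:                        ; first pass: only the 32x32 requests
    cpx TileUploaderRequestCounter
    bcs pass_32_done
    lda TileRequestTable+2,x    ; transfer/sprite size
    cmp #$0080
    bne :+
    jsr alloc_32x32
:   txa
    clc
    adc #$0008
    tax
    bra pass_32
pass_32_done:
    ldx #$0000
pass_16:                        ; second pass: everything else
    cpx TileUploaderRequestCounter
    bcs pass_16_done
    lda TileRequestTable+2,x
    cmp #$0080
    beq :+
    jsr alloc_16x16
:   txa
    clc
    adc #$0008
    tax
    bra pass_16
pass_16_done:
    rts

alloc_32x32:                    ; placeholder: find a free 2x2 block, store it at TileRequestTable,x
    rts
alloc_16x16:                    ; placeholder: find a free single slot, store it at TileRequestTable,x
    rts
```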
How much easier it would have been if I had opted for a simpler system...
The only tables you need are:
1) Metasprite data table in ROM
2) VRAM location table for each object in RAM
3) DMA table
4) VRAM empty/used flag table
Well, I still need to implement the animation engine commands, the keeping track of tiles in vram, and do a good bit of cleanup, particularly on the vram finder, but here is all of it:
Animation Update Table Setup:
Word 1: Spot In Vram
Word 2: Transfer/Sprite Size
Word 3: Bank Number
Word 4: Tile Address
Animation Tables in Rom Setup: (It obviously doesn't have the spot in vram.)
Word 1: Transfer/Sprite Size
Word 2: Bank Number
Word 3: Tile Address
Animation Engine:
Code:
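start_table_finder:
rep #$30 ; A=16, X/Y=16
ldx #$0000 ;X = write offset into TileRequestTable
;D is expected to point at the first object slot in ObjectTable here
animation_table_finder:
lda AnimationFrame
clc
adc AnimationCounter
tay
lda a:AnimationTableTable,y ;address of this frame's tile list
cmp #$0003
bcc special_animation_table_commands ;not yet implemented, but there should be 3 different commands.
tay
inc AnimationCounter ;the table holds words, so step by 2
inc AnimationCounter
animation_table_reader_loop:
lda a:Zero,y ;Transfer/Sprite Size, 0 ends the list
beq animation_table_reader_done
sta TileRequestTable+2,x
lda a:Zero+2,y ;Bank Number
sta TileRequestTable+4,x
lda a:Zero+4,y ;Tile Address
sta TileRequestTable+6,x
tya
clc
adc #$0006 ;next 6 byte entry in rom
tay
txa
clc
adc #$0008 ;next 8 byte entry in the request table
tax
bra animation_table_reader_loop
animation_table_reader_done:
tdc
clc
adc #$0030 ;next object slot, 48 bytes each
tcd
cmp #6144+ObjectTable ;128 slots * 48 bytes
bne animation_table_finder
stx TileUploaderRequestCounter ;offset just past the last request
rts ;D is left pointing past the last object slot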
Vram Finder:
Code:
start_vram_finder:
rep #$30 ; A=16, X/Y=16
ldx #$0000
vram_finder_loop:
cpx TileUploaderRequestCounter
beq vram_finder_done ;every request has been handled
ldy #$0000 ;start each search at slot 0
stz 32x32ColumnSkipperCounter
lda TileRequestTable+2,x
cmp #$0080
beq look_for_32x32_vram
look_for_16x16_vram:
cpy #$0080 ;128, because there are 128 slots for sprites
beq vram_finder_done ;no space left for sprite
lda VramTable,y
and #$00FF ;A is 16 bits, so mask off the next slot's flag
beq 16x16_slot_found ;there is space for another sprite
iny
bra look_for_16x16_vram ;look again if the space is already occupied
look_for_32x32_vram:
cpy #$0080 ;128, because there are 128 slots for sprites
beq vram_finder_done ;no space left for sprite
lda VramTable,y ;16-bit read: upper lefthand and upper righthand corners of square
bne prepare_for_look_for_32x32_vram ;look again if the space is already occupied
lda VramTable+8,y ;16-bit read: lower lefthand and lower righthand corners of square
bne prepare_for_look_for_32x32_vram ;look again if the space is already occupied
bra 32x32_slot_found ;there is space for another sprite
prepare_for_look_for_32x32_vram:
iny
iny ;the next candidate is two slots to the right
inc 32x32ColumnSkipperCounter
lda 32x32ColumnSkipperCounter
cmp #$0004 ;4 candidates per row of 8 slots
bne look_for_32x32_vram
next_row:
stz 32x32ColumnSkipperCounter
tya ;Y is now on the odd row below, so skip it
clc
adc #$0008
tay
bra look_for_32x32_vram
16x16_slot_found:
lda VramTable,y
ora #$0001 ;say that one of the slots is now taken
sta VramTable,y
tya ;sty has no absolute,x addressing mode
sta TileRequestTable,x ;slot number 0-127, not yet a vram address
txa
clc
adc #$0008 ;each request is 8 bytes
tax
bra vram_finder_loop
32x32_slot_found:
lda #$0101
sta VramTable,y ;say that four of the slots are now taken
sta VramTable+8,y
tya
sta TileRequestTable,x ;slot number 0-127, not yet a vram address
txa
clc
adc #$0008 ;each request is 8 bytes
tax
bra vram_finder_loop
vram_finder_done:
rts
Tile Uploader:
Code:
start_tile_uploader:
rep #$30 ; A=16, X/Y=16
lda TileUploaderRequestCounter
bne tile_uploader_setup
rts ;nothing to upload this frame
tile_uploader_setup:
ldx #$0000
lda #$1801 ; Set DMA mode (word, normal increment) and destination register (VRAM write register)
sta $4300
sta $4310 ;channels 1-3 need the same setup
sta $4320
sta $4330
lda #$0080
sta $2115
tile_uploader_begining:
lda TileRequestTable+2,x ;Transfer/Sprite Size
sta $4305
sta $4315
cmp #$0080
beq tile_uploader_32x32
;16x16 Top Half
lda TileRequestTable,x ;Spot in vram
sta $2116
lda TileRequestTable+4,x ;Bank number
sta $4303 ;16-bit store, the high byte lands in $4304 (bank)
sta $4313
lda TileRequestTable+6,x ;Tile address
sta $4302 ;also overwrites $4303 with the real address high byte
clc
adc #$0040 ;second row of tiles is 64 bytes further on
sta $4312
lda #%0000000100000000 ; Initiate DMA transfer (channel 0)
sta $420A ;16-bit store, the high byte lands in $420B
;16x16 Bottom Half
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda #%0000001000000000 ; Initiate DMA transfer (channel 1)
sta $420A
txa
clc
adc #$0008
tax
cpx TileUploaderRequestCounter
bne tile_uploader_begining
rts
tile_uploader_32x32:
sta $4325 ;A still holds the transfer size
sta $4335
lda TileRequestTable,x ;Spot in vram
sta $2116
lda TileRequestTable+4,x ;Bank number
sta $4303
sta $4313
sta $4323
sta $4333
lda TileRequestTable+6,x ;Tile address
sta $4302
clc
adc #$0080 ;each row of a 32x32 is $80 bytes
sta $4312
clc
adc #$0080
sta $4322
clc
adc #$0080
sta $4332
lda #%0000000100000000 ; Initiate DMA transfer (channel 0)
sta $420A
;Second Row
lda TileRequestTable,x ;Spot in vram
clc
adc #$0100
sta $2116
lda #%0000001000000000 ; Initiate DMA transfer (channel 1)
sta $420A
;Third Row
lda TileRequestTable,x ;Spot in vram
clc
adc #$0200
sta $2116
lda #%0000010000000000 ; Initiate DMA transfer (channel 2)
sta $420A
;Fourth Row
lda TileRequestTable,x ;Spot in vram
clc
adc #$0300
sta $2116
lda #%0000100000000000 ; Initiate DMA transfer (channel 3)
sta $420A
txa
clc
adc #$0008
cmp TileUploaderRequestCounter
beq done
tax
brl tile_uploader_begining
done:
rts
(I wouldn't have to be doing any of this if the SNES had chr rom... I think I might settle down a little and see if I can make a Pong demo or something, seeing that everyone else is and that I actually haven't made anything complete yet.)
I looked through the first routine. Am I correct in guessing that "a:zero,y" is used to read metasprite data, and that "tile_request_table,x" is a temporary copy of the metasprite data in RAM so the animation engine can manipulate the CHR selection bits?
It's been a while, but yeah, I'm pretty sure "a:zero,y" is the information for each sprite in the frame (a sprite size of 0 marks the end of the sprite list). It gets used by the vram finder, and later by the vram uploader.
And yes, "tile_request_table,x" is the sprite information in ram. Notice how the character bits aren't uploaded to the table in the animation engine? It's the vram finder that finds it and then fills out the empty spot for every sprite in the table. Every entry in the tile request table is 8 bytes, and the first two bytes in every slot are the character bits. This was the format:
Word 1: Spot In Vram
Word 2: Transfer/Sprite Size
Word 3: Bank Number
Word 4: Tile Address
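One thing worth spelling out: the vram finder stores a slot number (0-127) in word 1, while the uploader writes word 1 straight to $2116, which wants a VRAM word address. A conversion along these lines could sit in between. It is only a sketch: SlotTemp and SpriteVramBase are hypothetical names, and the layout (8 slots per row, each 16x16 slot covering 2x2 tiles in a 16-tile-wide block) is read off the finder code rather than confirmed.

```asm
; Sketch only. SlotTemp (a RAM word) and SpriteVramBase are hypothetical.
.a16
; In:  A = slot number 0-127 (A=16)
; Out: A = VRAM word address of the slot's top left tile
slot_to_vram_address:
    pha
    and #$0078              ; slot row * 8
    asl a
    asl a                   ; row * 32: one slot row is two tile rows of 16
    sta SlotTemp
    pla
    and #$0007              ; slot column
    asl a                   ; column * 2 tiles
    ora SlotTemp            ; A = tile number (the character bits)
    asl a
    asl a
    asl a
    asl a                   ; 16 words per 4bpp tile
    clc
    adc #SpriteVramBase     ; word address where the sprite tiles start
    rts
```

The value in A just before the four shifts is the tile number, so the same routine could be split in two if the metasprite code needs the character bits on their own.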