Sure. However, I actually already deleted the flipping part of the code. It wasn't very well optimized, but at the same time, I think the speed increase from optimizing it would have been marginal.
Here's the metasprite code, though. I got rid of the single metasprite code because there's no need for it now. However, I might bring it back, because it'd be marginally faster: I wouldn't need to add the x or y position from the metasprite, I'd just use the object's x and y positions, so that would save me two sets of clc and adc!
I guess the other thing would be that I wouldn't need to check if we're at the end of the table again, but that's even less impressive in terms of saving CPU time. I don't care though; when you're going through a routine (at most) 128 times, any cycle saved helps.
Oh yeah, "big_metasprite" is the exact same thing except that the values for checking if it's out of bounds are different. I figured I'd waste a few cycles (and go against what I said earlier) to combat overdraw. The comments are pretty much useless, but I think they're accurate. The comments on the metasprite are, I know.
Code:
.proc metasprite_handler
rep #$30 ;A=16, X/Y=16
lda #ObjectTable
tcd
ldy #$0000
bra continue_metasprite_finder
metasprite_finder:
tdc
clc
adc #ObjectSlotSize
tcd
cmp #ObjectTable+ObjectTableSize
bne continue_metasprite_finder
sty a:SpriteCount
rts
continue_metasprite_finder:
ldx ObjectSlot::MetaspriteOffset
beq metasprite_finder
lda a:$0000,x
sta a:MetaspriteCount
metasprite_loop:
lda a:$0006,x
and #$0001
sta a:SpriteBuf3+2,y ;sprite size
bne big_sprite
lda a:$0002,x
clc
adc ObjectSlot::OnscreenXPosition
cmp #256
bcc sprite_x_not_out_of_bounds
cmp #65528
bcs sprite_x_not_out_of_bounds
txa
clc
adc #$0008
tax
dec a:MetaspriteCount ;decrement MetaspriteCount by 1
beq metasprite_finder_branch ;skipped sprite was the last one: next object
brl metasprite_loop ;back to the loop...
metasprite_finder_branch:
bra metasprite_finder
sprite_x_not_out_of_bounds:
and #$01FF
sta a:SpriteBuf1,y ;Store sprite X position to SpriteBuf1+y
sta a:SpriteBuf3,y ;Store the same X position to SpriteBuf3+y
lda a:$0004,x ;word at offset 4 = sprite Y position
clc
adc ObjectSlot::OnscreenYPosition
cmp #224
bcc sprite_y_not_out_of_bounds
cmp #65528
bcs sprite_y_not_out_of_bounds
txa
clc
adc #$0008
tax
dec a:MetaspriteCount ;decrement MetaspriteCount by 1
beq metasprite_finder_branch ;skipped sprite was the last one: next object
brl metasprite_loop ;back to the loop...
sprite_y_not_out_of_bounds:
sta a:SpriteBuf1+1,y
lda a:$0006,x
sta a:SpriteBuf3+2,y ;sprite size
lda a:$0008,x
ora ObjectSlot::Attributes
sta a:SpriteBuf1+2,y ;extra/character
iny
iny
iny
iny
cpy #$0200 ;sees if all 128 sprites are used up
bne continue_sprite_y_not_out_of_bounds
sty a:SpriteCount
rts
continue_sprite_y_not_out_of_bounds:
dec a:MetaspriteCount ;decrement MetaspriteCount by 1
beq metasprite_finder_branch
txa
clc
adc #$0008
tax
brl metasprite_loop ;back to the loop...
Code:
TestMetasprite:
.word $0002 ; Number of metasprite table entries below
;XPos YPos NextTile/Size Extra/Character
.word $0000,$0000,$0000,$0001
.word $0000,$0008,$0101,$4000
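In case the bounds checks are hard to follow in assembly, here's a rough C model of the small-sprite test. It's not the real routine, just the same arithmetic. The struct and function names are made up for this sketch, and it applies the 256/224 limits to every entry, whereas the real code sends entries with size bit 0 set down the big_sprite path, which uses different limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One table entry: four 16-bit words, matching the 8-byte stride in the loop.
   (Hypothetical C layout; the real data is the .word table above.) */
typedef struct {
    uint16_t x_offset;   /* added to the object's on-screen X */
    uint16_t y_offset;   /* added to the object's on-screen Y */
    uint16_t size;       /* bit 0 selects the large sprite size */
    uint16_t attributes; /* extra/character word */
} MetaspriteEntry;

/* Mirrors "cmp #limit / bcc keep / cmp #65528 / bcs keep":
   keep when pos < limit, or when pos has wrapped around to -8..-1. */
int on_screen(uint16_t pos, uint16_t limit) {
    return pos < limit || pos >= 0xFFF8u;
}

/* Counts the entries of one metasprite that pass both the X and Y test. */
size_t count_visible(const MetaspriteEntry *entries, size_t count,
                     uint16_t object_x, uint16_t object_y) {
    size_t visible = 0;
    assert(entries != NULL);
    for (size_t i = 0; i < count; i++) {
        /* 16-bit wraparound, same as clc/adc with a 16-bit accumulator */
        uint16_t x = (uint16_t)(entries[i].x_offset + object_x);
        uint16_t y = (uint16_t)(entries[i].y_offset + object_y);
        if (on_screen(x, 256) && on_screen(y, 224)) {
            visible++;
        }
    }
    return visible;
}

/* Same values as TestMetasprite, minus the leading count word. */
const MetaspriteEntry test_metasprite[2] = {
    { 0x0000, 0x0000, 0x0000, 0x0001 },
    { 0x0000, 0x0008, 0x0101, 0x4000 },
};
```

With the object at (0, 0) both entries pass. At X = 256 neither does, while X = $FFFC (that is, -4) still passes because of the 65528 compare.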
Yeah though, I actually got it to cover "only" half the screen now. Much better. The rest of my stuff takes up about a fourth of that, and I know that can be optimized more than this can. Additionally, like I said, I'm not using FastROM either. I think I've learned, though, that anything that doesn't have to be done at runtime (like flipping metasprites), I'm not doing at runtime.
I know I keep going on and on, but I'm confused: if you're doing a "rep" or "sep" while the accumulator is 16-bit, does that add 2 extra cycles? I've been trying to be a bit smarter about the size of the accumulator and of x and y. Unfortunately, I really can't do anything with x and y in the above routine. It's a pain in the ass that x and y can't be different sizes, because you could easily make two (really four now) different routines that deal with the different 256-byte halves of OAM. There's no feasible way to make x 8-bit here. (I suppose you could have a different routine for every 256 bytes... Yeah, put the metasprite data at the beginning of every bank...) It's also a pain in the ass that direct page can't escape bank 0, because it would be perfect for indexing metasprites, as each slot is only a handful of bytes. With the object table, anything you use is just as effective. The reason I'm using direct page on the object table is that it's the fastest, and most of my object routines are probably going to deal with data outside of the first 8KB of RAM or outside of bank $00.
Oh yeah, one final thing, an obvious optimization I saw with your hioam filling code is that you can use direct page instead of x or y to save a cycle for every "ora". X and y can then just be 8 bit, because you're only indexing 32 bytes instead of 512.
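To show where the 32 versus 512 comes from, here's a C sketch of the packing itself. This is not your routine, just a model that assumes the usual high OAM layout (2 bits per sprite: X bit 8, then the size bit, four sprites per byte). SpriteHighInfo and pack_hioam are names I made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPRITE_COUNT 128
#define HIOAM_SIZE   32   /* 128 sprites * 2 bits = 32 bytes */

/* Hypothetical per-sprite record: X with bit 8 intact, plus a size flag. */
typedef struct {
    uint16_t x;     /* 9-bit X position */
    uint16_t size;  /* bit 0: 1 = large sprite */
} SpriteHighInfo;

/* Packs X bit 8 and the size bit of each sprite into the 32-byte high table.
   Sprite i lands in byte i/4, at bit position (i%4)*2. */
void pack_hioam(const SpriteHighInfo *sprites, size_t count,
                uint8_t hioam[HIOAM_SIZE]) {
    assert(count <= SPRITE_COUNT);
    memset(hioam, 0, HIOAM_SIZE);
    for (size_t i = 0; i < count; i++) {
        uint8_t bits = (uint8_t)(((sprites[i].x >> 8) & 1u) |
                                 ((sprites[i].size & 1u) << 1));
        hioam[i >> 2] |= (uint8_t)(bits << ((i & 3u) * 2u));
    }
}
```

The destination index only ever runs from 0 to 31, which is why 8-bit index registers are enough on that side.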