Maybe faster way of ordering object palettes

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
Maybe faster way of ordering object palettes
by on (#177142)
I just thought of something that could potentially speed up the process of figuring out what palettes to change and where to change them for displaying more than 8 sprite palettes onscreen. Instead of having a buffer in ram that has screen zones (a couple of lines tall) with the 8 palettes they are using on them, you could have a table in ram that contains slots for all the palettes for the objects in the game. The slots would have a minimum and maximum y value for where it is being used on the screen (this system wouldn't support a bunch of palettes on the top of the screen, a bunch in the middle, and the same ones on the top on the bottom, but that seems really situational) and also a value linking it to another slot in the table to form a linked list.

(Keep in mind for the next part, that if a value has decreased, go through the linked list backward instead of forward. If there was no palette there to begin with, then start from the beginning forward. Edit: This makes no sense, because it shouldn't ever get smaller. The table gets reset every frame.)

Upon having the minimum y value changed, (if only the maximum y value is changed, go to the next paragraph.) you'd try and see where it would fit in the linked list by checking the minimum y value of the request to the maximum y value of any of the palettes that are already the linked list. When you find a spot that works in that the minimum y value of the request is greater than the maximum of the pre existing palette, check the maximum y value of the requested palette to the maximum y value of any palette afterword in the linked list table.

If the maximum position is greater than the pre existing palette, go to the palette in the linked list. If not, stop and put the requested palette in that spot on the linked list table.

I might have something wrong with how this works, because it's hard for me to visualize, but I think I've taken most everything into account (correct me if I'm wrong). This still seems very CPU intensive because of all the comparisons (because it is) but this is nothing in comparison to checking 8 palettes against 8 palettes in every zone like you would have to otherwise, and as a plus, you can initiate the palette swap wherever you want to instead of between zones. The only downside is that you can't have a palette switch to another and back, but this doesn't seem like it would be useful nearly as much as it should to justify the extra processing power involved in searching through something like 64 palettes instead of something like 16. Also, if an object using a palette falls between the maximum and minimum y values, then it doesn't have to search through the table.

When you're done with it all, go through the linked list and turn it into data for HDMA or something. The list should probably be erased every frame because you have to figure the objects will be moving.
Re: Maybe faster way of ordering object palettes
by on (#177398)
Wow! Hardcore! I'll try to read this when I'm in a better mental state.
Re: Maybe faster way of ordering object palettes
by on (#177838)
I haven't been able to get to this because I needed to help my sister prepare for college and also prepare for school myself. Anyway, I think I got the actual sorting part of it done mostly, I still need to implement a feature to where if the visual range of an object is off of the screen (as in over 224) or if the top is equal to or under the bottom (I was thinking about it only checking if they were equal which is the only thing that actually makes sense, but it doesn't take any more resources to do this), then the object is ignored. The reason I'm not sure how I want to do this is that I want the routine to use both an 8 bit accumulator and 8 bit index registers. (Currently, a 16 bit accumulator and 16 bit index registers are being used.) Because the visual top and bottom of an object could be over 256, this could mess up only using 8 bits. I'll have to have a separate routine for calculating the top and bottom of an object, and I could have that routine use a 16 bit accumulator, just have it to where if the visual top should really be above or under 224, force it to be 224 or 0 respectively. I currently have it to where the metasprite routine calculates the onscreen location of objects, but I think I'll make it its own routine because of things like the problem I'm currently having.

Anyway, I said the problem about having an 8 bit accumulator, but I haven't said why I'm having a problem with 16 bit index registers. The problem is that the object table is simply too large to by indexed by an 8 bit number. Luckily, I have 128 slots, and because the CPU is at max 16 bits, that's 256 bytes for each attribute table for objects, just enough to be indexed by an 8 bit number. I'll have to be a bit more wasteful with how ram is being used in that a variable that uses an odd number of bytes will have to use an even number and I'll potentially be a little slower with some things, like I may have to make an at most 32 bit buffer for if I want to write a word in the center (I'm mainly thinking of 24 bit coordinates and 16 bit velocity) but I'm absolutely sure that having almost all my routines use 8 bit index registers instead of 16 bit will far outweigh this.

So yeah, here's the routine. "PaletteLinkedListTable" actually needs to be zero filled every frame, I forgot about that. The number of palettes should justify whether or not DMA needs to be used for this. With 8 bit index registers, you'd have support for 127 (254 with bit shifting using 16 bit index registers in another routine) palettes with how my code is made. I'm thinking about actually copying the palettes over into ram, so that a palette being referenced by here could actually change, like if you have only one object use a palette ever, you can change it however you want.

Code:
.proc palette_sorter
  rep #$30   ;A=16, X/Y=16
  ldx #$0000
  stz PaletteSorterObjectTableOffset
  stz PaletteTableStart

next_object:
  lda ObjectTable+ObjectSlot::Palette,x
  beq determine_objects_checked
  tay
  lda PaletteLinkedListTable-2,y
  beq first_object_using_palette
  lda ObjectTable+ObjectSlot::VisualTop,x
  cmp PaletteMinimumHeightTable-2,y
  bcs over_minimum_height
  sta PaletteMinimumHeightTable-2,y
  lda ObjectTable+ObjectSlot::Palette,x
  tax
  bra check_minimum_height_loop

over_minimum_height:
  lda ObjectTable+ObjectSlot::VisualTop,x
  cmp PaletteMinimumHeightTable-2,y
  bcc under_maximum_height
  sta PaletteMaximumHeightTable-2,y
  lda ObjectTable+ObjectSlot::Palette,x
  tax
  bra check_maximum_height_loop

determine_objects_checked:
  ldx PaletteSorterObjectTableOffset
under_maximum_height:
  txa
  adc #ObjectSlotSize
  tax
  stx PaletteSorterObjectTableOffset
  cmp #ObjectTableSize
  bcs next_object
  rts



first_object_using_palette:
  lda ObjectTable+ObjectSlot::VisualTop,x
  sta PaletteMinimumHeightTable-2,y
  lda ObjectTable+ObjectSlot::VisualBottom,x
  sta PaletteMaximumHeightTable-2,y
  tyx
  ldy PaletteTableStart
  bne check_minimum_height_loop
  stx PaletteTableStart

check_minimum_height_loop:
  lda PaletteMinimumHeightTable-2,y
  cmp PaletteMaximumHeightTable-2,x
  bcs check_maximum_height_loop
  lda PaletteLinkedListTable-2,y
  sty PreviousPaletteSlot
  beq linked_list_end
  tay
  bra check_minimum_height_loop

check_maximum_height_loop:
  lda PaletteMaximumHeightTable-2,y
  cmp PaletteMaximumHeightTable-2,x
  bcc spot_found
  lda PaletteLinkedListTable-2,y
  sty PreviousPaletteSlot
  beq linked_list_end
  tay
  bra check_maximum_height_loop

spot_found:
  tya
  sta PaletteLinkedListTable-2,x
  ldy PreviousPaletteSlot
  txa
  sta PaletteLinkedListTable-2,y
  bra determine_objects_checked

linked_list_end:
  lda #$FFFF
  sta PaletteLinkedListTable-2,x
  ldy PreviousPaletteSlot
  txa
  sta PaletteLinkedListTable-2,y
  bra determine_objects_checked
.endproc

It really does suck that I can't test to see how this works right now (I can test to see if it crashes, but that's about it) in that I need to have it determine how the palettes are being uploaded when during vblank and hblank. Although I need to get that done, I think I'm going to try to split up almost all my tables into several tables with one attribute, particularly the object table. I feel it's the more pressing issue.

Edit: I fixed some issues I saw with "first_object_using_palette".