I just thought of something that could potentially speed up the process of figuring out which palettes to change, and where to change them, for displaying more than 8 sprite palettes onscreen. Instead of having a buffer in RAM made up of screen zones (each a couple of lines tall) that list the 8 palettes in use in that zone, you could have a table in RAM with one slot for every palette used by the objects in the game. Each slot would hold a minimum and maximum y value for where that palette is being used on the screen, plus a value linking it to another slot in the table so the slots form a linked list. (This system wouldn't support a bunch of palettes at the top of the screen, a different bunch in the middle, and then the same ones from the top again at the bottom, but that seems really situational.)
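To make the table idea a bit more concrete, here is a rough sketch in Python (not 65816 code). The names `PaletteSlot`, `reset_table`, `note_object` and the `NO_LINK` marker are all made up for illustration. Each slot holds the y range and a link, and registering an object just widens the range of the palette it uses.

```python
NO_LINK = -1  # hypothetical "end of list" marker


class PaletteSlot:
    """One table entry per palette: the y range it covers plus a list link."""

    def __init__(self):
        self.in_use = False
        self.min_y = 0
        self.max_y = 0
        self.next_slot = NO_LINK


def reset_table(num_palettes):
    """Build a fresh table, as would happen at the start of every frame."""
    return [PaletteSlot() for _ in range(num_palettes)]


def note_object(table, palette_id, top_y, bottom_y):
    """Widen a palette's y range to cover one object.

    Returns True if the range changed, meaning the slot's place in the
    linked list has to be (re)worked out, and False if the object already
    fell inside the recorded range.
    """
    slot = table[palette_id]
    if not slot.in_use:
        slot.in_use = True
        slot.min_y = top_y
        slot.max_y = bottom_y
        return True
    changed = False
    if top_y < slot.min_y:
        slot.min_y = top_y
        changed = True
    if bottom_y > slot.max_y:
        slot.max_y = bottom_y
        changed = True
    return changed
```

An object that lands entirely inside a range that is already recorded returns `False`, so nothing else has to be done for it.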
(Keep in mind for the next part that if a value has decreased, you go through the linked list backward instead of forward. If there was no palette there to begin with, start from the beginning and go forward. Edit: This makes no sense, because it shouldn't ever get smaller. The table gets reset every frame.)
Upon having the minimum y value changed (if only the maximum y value is changed, go to the next paragraph), you'd try to see where the palette fits in the linked list by checking the minimum y value of the request against the maximum y value of the palettes that are already in the linked list. When you find a spot that works, in that the minimum y value of the request is greater than the maximum y value of the pre-existing palette, check the maximum y value of the requested palette against the maximum y value of each palette afterward in the linked list.
If the requested palette's maximum y value is greater than that of the pre-existing palette, go to the next palette in the linked list. If not, stop and put the requested palette in that spot in the linked list.
I might have something wrong with how this works, because it's hard for me to visualize, but I think I've taken most everything into account (correct me if I'm wrong). This still seems very CPU intensive because of all the comparisons (because it is), but it's nothing compared to checking 8 palettes against 8 palettes in every zone like you would have to otherwise. As a plus, you can initiate the palette swap wherever you want instead of only between zones. The only downside is that you can't have a palette switch to another and back, but that doesn't seem like it would be useful nearly often enough to justify the extra processing power involved in searching through something like 64 palettes instead of something like 16. Also, if an object using a palette falls between that palette's existing minimum and maximum y values, then you don't have to search through the table at all.
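Here is a rough Python sketch of putting a slot into the linked list. It is a simplification and not exactly the comparison described above: it only keeps the list ordered by minimum y value and ignores the maximum. The dictionary keys, `NO_LINK`, `insert_sorted` and `walk` are all made-up names, and the links are table indices rather than pointers.

```python
NO_LINK = -1  # hypothetical "end of list" marker


def insert_sorted(slots, head, new):
    """Link slot index `new` into the list so min_y stays in rising order.

    `slots` is a list of dicts with "min_y", "max_y" and "next" keys,
    `head` is the index of the first slot (or NO_LINK if the list is
    empty). Returns the index of the new head.
    """
    prev = NO_LINK
    cur = head
    # Walk forward past every slot that starts at or above the new one.
    while cur != NO_LINK and slots[cur]["min_y"] <= slots[new]["min_y"]:
        prev = cur
        cur = slots[cur]["next"]
    slots[new]["next"] = cur
    if prev == NO_LINK:
        return new  # the new slot goes at the front
    slots[prev]["next"] = new
    return head


def walk(slots, head):
    """Return the slot indices in list order."""
    order = []
    cur = head
    while cur != NO_LINK:
        order.append(cur)
        cur = slots[cur]["next"]
    return order
```

For example, inserting slots that start at lines 50, 10 and 30, in that order, gives a list that walks 10, 30, 50.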
When you're done with it all, go through the linked list and turn it into data for HDMA or something. The list should probably be erased every frame because you have to figure the objects will be moving.
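As a rough idea of what that last pass could look like, here is a Python sketch that turns the ranges, taken in list order, into a list of swap events. This is not real HDMA table data. The event layout, `HW_SLOTS` and `build_swap_events` are assumptions for illustration, and it assumes the list is already ordered by minimum y value.

```python
HW_SLOTS = 8  # sprite palettes available at any one time


def build_swap_events(ranges):
    """Assign each palette range to a hardware palette slot.

    `ranges` is a list of (min_y, max_y, palette_id) tuples in list
    order (rising min_y). Returns (scanline, hw_slot, palette_id)
    tuples, where `scanline` is the line by which the palette has to
    be loaded into that hardware slot.
    """
    free_at = [0] * HW_SLOTS  # first line on which each hardware slot is free
    events = []
    for min_y, max_y, palette_id in ranges:
        for hw_slot in range(HW_SLOTS):
            if free_at[hw_slot] <= min_y:
                # The previous occupant is done before this palette starts.
                free_at[hw_slot] = max_y + 1
                events.append((min_y, hw_slot, palette_id))
                break
        else:
            raise ValueError("more than 8 palettes overlap on one line")
    return events
```

With eight palettes starting at line 0, where only the first one ends early (say at line 10), a ninth palette starting at line 20 gets the first hardware slot again. The events would still need to be converted into whatever format the HDMA channel expects.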