Has anybody every come up with a good masking pattern to produce the mirroring effects seen on the palette. Obviously, an AND with $1F will limit you down to the actual table, but within that table itself, four specific locations ($10, $14, $18, $1C) are mirrored down to ($00, $04, $08, $0C).
There isn't a very obvious mask for that pattern. Without one, emulators have to implement it using a series of conditional statements, which, for something like memory and pixel operations in an emulator, is very expensive. I'm not really aware of whether or not this is solved, but I have found a working pattern so I thought I'd share in case someone has use.
Essentially, you OR the bottom two bits of the address, shift 5 left, XOR with $FFFF, and then AND that with the address. What this effectively does is zero out the fifth bit, ONLY if the bottom two bits are clear. This is the one commonality I found among the mirrored table entries, so you can derive a mask for bit 5 from that.
Did I do something useful, or was this already obvious to someone? It sure seems useful to me.
What? NES palette is in the PPU's $3F00-$3F1F space and is mirrored to $3FFF.
3gengames wrote:
What? NES palette is in the PPU's $3F00-$3F1F space and is mirrored to $3FFF.
He's talking about the weird mirroring that exists with the first color (0) of each palette. The first color of the 4 background palettes can have unique values, but only the first color of the first palette gets displayed when rendering is enabled (the others can be displayed while rendering is off if the PPU address is pointing to them). The first color of the 4 sprite palettes don't exist at all, and are mirrors of the background palettes.
So, even though there are 32 memory positions between $3F00 and $3F1F, the NES only has 28 bytes of actual memory for palettes. What bmac6502 described was a way to locate the actual bytes based on the address without having to make logical decisions, which is not trivial because of the weird mirroring.
Use a 32-entry array of function pointers to optimized getters?
Actually, a 32-case "switch" statement would be optimized by the C/C++ compiler into an array of code pointers, so a regular old switch statement should be fine.
Have your compiler produce asm output from its code and look at when it emitted. (or just machine-level debug it in an IDE).
You can try a few different techniques and run them all (in a contrived test harness) under a code profiler to see which is fastest.
$00, $04, $08, $0C, $10, $14, $18, and $1C are all mirrored to $00, I thought. In this case, you'd also have to apply this logic to the 4th and 3rd bits as well as the 5th bit.
If you're looking for the speediest solution, only AND with $1F when you're rendering pixels. Whenever one of the aforementioned palette entries is written to, propagate the change to 00, 04, 08, 0C, 10, 14, 18, and 1C. Trust me, writing to those specific palette entries occurs much less frequently than looking up the color stored there, and only needing one AND operation when you're rendering is much better than having to perform a complex bit-shifting/masking operation for each pixel.
Wasn't color $10 color $0,$4,$8,$C if written last or something like that? The fox first said it in another forum post.
The wiki says:
Addresses $3F04/$3F08/$3F0C can contain unique data, though these values are not used by the PPU when rendering. Addresses $3F10/$3F14/$3F18/$3F1C are mirrors of $3F00/$3F04/$3F08/$3F0C.
So I think only writes to $3F00 change the background color, with the other "mirrors" just not having any effect. The method I posted can still be used if you apply it to a seperate array which contains the actual colors-to-be-displayed-for-these-palette-indices. However, you'd only need to propagate the writes if 00 or 10 is being written to. 04 08 0C 14 18 1C would just not have any visible effect.
You'd still need to store the physical value written to the addresses though, and 10, 14, 18, and 1C are indeed mirrors of 00, 04, 08, 0C, so the method the OP mentioned would work fine for this purpose.
Though, I have to wonder if something like
Code:
if (address & 0x0003) {
address unchanged
} else {
address & 0x3FEF
}
isn't faster than all of the bit masking/shifting. I mean, in 6502 code, having conditional branches would be faster than all of the shifting and bit manipulation.
Drag wrote:
So I think only writes to $3F00 change the background color, with the other "mirrors" just not having any effect.
I'm pretty sure you overwrite $3F00 when you write to $3F10.
But the byte $10 write would be mirrored to $00, changing the universal background color. Right? I am pretty sure the fox said that in another post not very long ago.
Yes, $3F10 is mirrored to $3F00, so when you're writing to $3F10, you're actually writing to $3F00, which is the only address that affects the background color. Sorry if I was a little confusing with my wording. :S
Ok, after working it all out...The values ARE only mirrored between the pairs. linking all of the values together does not produce correct results.
Also, ignoring pairs other than $00/$10 does not produce correct results either. It appears that in order to get correct behavior from the hardware, it must perform the full mirroring as described on the NESDev wiki.
With the in mind, I knew I had something with the identification that the two low bits provided an indication of mirrored values, but I definitely wasn't happy with the convoluted series of bit ops that I arrived at. Thanks to Drag, as both of his observations are excellent. Here is what I finally arrived at:
Code:
void write(addr, value)
{
addr&=0x3F1F;
palIdx[addr] = value;
if(addr & 0x3)
{
palIdx[addr&0x3F0F] = value;
}
}
byte read(addr)
{
return palIdx[addr&0x3F1F];
}
The way I've done it above, it incorporates both of drag's suggestions. Using an AND with a single conditional to test the lower two bits seems obvious in retrospect. Loading the heavier work on the write side since the reads occur per pixel during render is also a great idea. The write side is two ANDs, a memory store, and a conditional branch which might result in an additional AND + memory store, which isn't too horrible for an infrequent operation. The read is only a single AND at all times.
One more update. The above is not quite right. It mirrors the high four index values down, but does not mirror the low four index values up. Fixing it isn't hard though.