So, two years later and with some more practice, I'm trying again to do some simple stuff on the NES.
I've successfully written color palettes and nametables.
So now I'm trying to understand how to read color palettes back (because I will probably want to do computations on them).
For now I'm just trying to read the very first byte of the NES palette at $3F00. According to the wiki, it seems possible to read $2007 (PPU_DATA) when the screen is turned off or during VBlank.
I found various discussions about it across the forum, but never found code that shows it in practice.
Can you help me figure out how I should do it?
I tried different things; would the simplest be something like this? (The palette is already filled with colors.)
Code:
VBlankWait: ; wait for VBlank; the PPU is ready after this
bit PPU_STATUS ; copies bit 7 of PPU_STATUS ($2002) into the N flag
bpl VBlankWait ; if N = 0 (bit 7 of PPU_STATUS clear), loop back to VBlankWait
lda #%00001000 ; disable NMI, VRAM increment of 1 (bit 2 must be clear for +1)
sta PPU_CTRL ; = $2000
lda #%00000110 ; screen display off (bits 3 and 4 clear)
sta PPU_MASK ; = $2001
SetPaletteAddress: ; the palette always starts at $3F00 in PPU memory
lda #$3F
sta PPU_ADDR ; $2006, first write = high byte
lda #$00
sta PPU_ADDR ; second write = low byte
lda PPU_DATA ; $2007
sta $00 ; will this be the first color?
lda PPU_DATA
sta $01 ; just to check whether the first color shows up here, or whether this "must" be the second color
Source code for Debug
Attachment:
Debug.zip [37.21 KiB]
Thanks to those who know how to do it.
Insert an extra LDA PPU_DATA before the "real" one (see the documentation of $2007 reads for more). Edit: I should have looked at the documentation.
See if you can run this with a debugger so you can see for yourself whether the colors read back as you expect. Or set up an ASCII font on the NES and print the values you read back on screen. You need to get to the point where you have feedback like this so you can experiment directly.
FYI, palette reading doesn't work on some Famicoms (I think Famicom Titler, because it uses the RGB PPU, but maybe some others with old PPU revisions as well).
Even if it did work, I don't think it's a good idea to design your program in a way that you have to read it back.
Oh, and palette reading doesn't work in FCEUX if that's what you're testing with.
blargg wrote:
Insert an extra LDA PPU_DATA before the "real" one (see documentation of $2007 read for more).
He's reading the palette, so there's no need for the dummy read.
When reading $2007, there's a delay, so the first byte that returns is invalid. Just discard the first value, and use the following ones normally. As thefox mentioned, this doesn't apply to palettes.
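A minimal sketch of the difference described above, in ca65-style syntax. It assumes rendering is off (or VBlank is in progress) and that PPU_CTRL is set to increment by 1. The zero-page locations $00 and $01 are only placeholders, and the palette part only returns meaningful data on PPUs and emulators that support palette reads.

```asm
PPU_STATUS = $2002
PPU_ADDR   = $2006
PPU_DATA   = $2007

; Nametable read at $2000: goes through the PPU read buffer,
; so the first read returns stale data and is discarded.
    bit PPU_STATUS      ; reset the $2006 write latch
    lda #$20
    sta PPU_ADDR        ; high byte first
    lda #$00
    sta PPU_ADDR        ; then low byte
    lda PPU_DATA        ; dummy read, value discarded
    lda PPU_DATA        ; actual byte stored at $2000
    sta $00

; Palette read at $3F00: returned directly, no dummy read.
    bit PPU_STATUS
    lda #$3F
    sta PPU_ADDR
    lda #$00
    sta PPU_ADDR
    lda PPU_DATA        ; color at $3F00
    and #$3F            ; palette entries are 6 bits wide
    sta $01
```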
I have to ask though: Why do you need to read the palette back? Is it to perform modifications on it, such as fades and things like that? The palette is so small that I would advise you to keep a copy of it in RAM instead, so that you can manipulate it freely and use VBlank time just to write it to VRAM. VBlank time is pretty short, so if you use it to read, modify and write values back you might end up wasting too much time on this and not have enough left for more important stuff. Ideally, VBlank time should just be used for blasting previously calculated data to VRAM, if you are doing lots of calculations you're just wasting that time.
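As a rough sketch of the shadow-copy approach (ca65-style syntax): the game edits a 32-byte buffer in RAM whenever it likes, and VBlank only copies it to $3F00. The names `palette_buf` and `UploadPalette` and the segment names are assumptions for illustration, not something from this thread.

```asm
PPU_STATUS = $2002
PPU_ADDR   = $2006
PPU_DATA   = $2007

.segment "BSS"
palette_buf: .res 32    ; hypothetical shadow copy, modified outside VBlank

.segment "CODE"
; Call during VBlank, for example from the NMI handler.
; Assumes PPU_CTRL bit 2 is clear (VRAM increment of 1).
UploadPalette:
    bit PPU_STATUS      ; reset the $2006 write latch
    lda #$3F
    sta PPU_ADDR
    lda #$00
    sta PPU_ADDR        ; PPU address = $3F00
    ldx #0
@loop:
    lda palette_buf,x
    sta PPU_DATA        ; PPU address auto-increments after each write
    inx
    cpx #32
    bne @loop
    rts
```

This plain loop costs about 15 CPU cycles per byte; unrolling it would be faster, at the price of more code.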
Also, weren't there certain Famicom revisions that didn't allow palette reading? I'm not sure, but I seem to remember someone saying something like this. It could have been reading $2004 that wasn't allowed, I'm not sure. Someone else will have to confirm this.
So Thanks,
Quote:
thefox: Oh, and palette reading doesn't work in FCEUX if that's what you're testing with.
Ha, yes it was. Thanks for getting me out of this loop.
Quote:
tokumaru: Is it to perform modifications on it, such as fades and things like that?
In the first place yes, but I was also thinking it would save some RAM.
I've found a lot of discussion about it, and I've succeeded in doing color swaps during VBlank. (So that's not the problem.)
Quote:
thefox: Even if it did work, I don't think it's a good idea to design your program in a way that you have to read it back.
Yes, now I will never try to use it again.
Quote:
tokumaru: I would advise you to keep a copy of it in RAM instead
Yes, I will probably use an "already made" buffer, and be careful not to use it for other things while a fade is running.
Quote:
tokumaru: VBlank time is pretty short
So I was thinking about changing only 8 colors per NMI.
I've found that the "transparent color" needs to be swapped in one pass; is that true?
[edit]After some tests: changing the first byte (the transparent color) changes all entries of that color.[/edit]
As usual, tokumaru is exaggerating how short VBlank time is. It does not allow you to update the entire screen, but the bandwidth is still large enough for most applications. I use read-modify-write from VRAM for the attribute table in my engine, and it works well. HOWEVER, I agree that for the palette it'd be sub-optimal. Using 32 bytes of RAM for this is really not much; I doubt you'd ever use so much RAM that you don't even have 32 bytes somewhere for a shadow copy of the palette. Many games even use 2 shadow copies, one for the original palette and a second one with effects applied (or something like that).
Bregalad wrote:
As usual, tokumaru is exaggerating how short VBlank time is.
Keep in mind that he has to read the palette, execute the fade logic and write it back. Really, with loops and conditionals, this could easily eat most of VBlank, leaving little time for sprite and name table updates.
Raccoon, if you pre-calculate everything, you can definitely write more than 8 colors per NMI.
Yes, thanks Bregalad.
I already think updating the entire palette is possible (if that's the only thing you do in one NMI).
But tokumaru's advice is good because, with some more complexity, I would probably do this in another project while also updating nametable columns, and I want to keep the scrolling smooth, without lag.
Quote:
tokumaru: Raccoon, if you pre-calculate everything, you can definitely write more than 8 colors per NMI.
I think this point really needs to be mentioned: it's actually what I'm doing.
For now I'm running some tests; I will let you all know how I did it
(and will probably share some source and the resulting NES program).
You'd honestly have to try pretty hard not to be able to update the palette within a single NMI. You should be able to write the palette a good number of times. One way to make it fast is to put 32 tiles for scrolling and then the 32 bytes for the palette in the $0100 page of RAM with the stack, point the stack at the beginning of the data, and do a PLA/STA loop to unload it all fast. That's how I do it, at least. But uploading 64 bytes isn't very hard at all.
Vblank time length isn't really an issue here. You can comfortably transfer 192 bytes in a vblank, and that's enough to transfer a palette 6 times over.
edit: PLA/STA loop is equally fast as a LDA xxxx,X \ STA loop, just smaller code.
Too bad using PLA isn't faster, but it is better if you need to save code space, as well as RAM. (If you can live with a reduced stack.)
It's better because you only have to load 1 value and then run a PLA loop as the optimized loop (whatever that is called again), and if you keep all 3 arrays at the beginning of $0100, you don't have to reposition any pointer at all, whereas you would if you put them in other places. Now, an array of LDA/STA instructions in WRAM would be better, but most games don't use it, so I figured it'd be useless to mention.
The ridiculously optimized 30 fps version of the "Bad Apple" video uses a self-modifying sequence of LDA #$xx/STA $2007 to copy 240 bytes while still having time to read back from CHR ROM where a third of the video data is stored.
Getting slightly off topic here, but doing it the other way around, i.e. building the buffer with PHA, and writing it to PPU with STA could potentially be globally faster because PHA is only 3 cycles and you get automatic indexing. At least when I was writing my scrolling code I was running out of index registers all the time. The problem of course is that if subroutines have to be used in the scrolling code, SP has to be saved and restored each time. That's why I didn't do it in my own engine, I had already too many subroutine dependencies, but if I ever rewrite my scrolling code I'll consider it.
Another "problem" is that the buffer is built backwards, but that's not really a problem.
If you use interrupts, stack tricks for building the buffers are out of the question.
Of course they aren't. I use PLA \ STA in NMI, it works quite well. I start my vram buffer at $0100 and increment an index variable. This and the stack pointer move towards each other. In NMI I save the SP and set it to $FF. PLA now starts at $0100. Restore the SP when done.
Edit2: I see I misunderstood: BUILDING the buffers with stack instructions is the point of the previous post.
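A hedged sketch of the NMI-side PLA/STA flush described above (ca65-style syntax). The names `saved_sp`, `buf_len` and `FlushBuffer` and the segment names are made up for illustration; it assumes the buffer starts at $0100, that PPU_ADDR has already been set, and that nothing pushes to the stack (no JSR, no IRQ) while the stack pointer is redirected.

```asm
PPU_DATA = $2007

.segment "ZEROPAGE"
saved_sp: .res 1        ; hypothetical: holds the real stack pointer
buf_len:  .res 1        ; hypothetical: number of bytes queued at $0100

.segment "CODE"
; Called from the NMI handler. Assumes buf_len is non-zero.
FlushBuffer:
    tsx
    stx saved_sp        ; remember the real stack pointer
    ldx #$FF
    txs                 ; PLA increments SP first, so the first pull reads $0100
    ldy buf_len
@loop:
    pla                 ; fetch next buffered byte
    sta PPU_DATA        ; write it to VRAM
    dey
    bne @loop
    ldx saved_sp
    txs                 ; restore the real stack before returning
    rts
```

Because the real stack pointer is restored before the RTS, the return address pushed by the caller is still found where it was left.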
thefox, I don't understand how to use PHA to make things better, can you explain more? edit: I think I get it, but yes the buffer would be backwards resulting in slower vram writes.. it doesn't seem to be that useful.
Dwedit wrote:
If you use interrupts, stack tricks for building the buffers are out of the question.
I hadn't thought about that, but I don't think it should be a problem if the buffer has a couple of bytes of spare space. E.g. if an NMI occurs during the building, the return address will be written into the PPU buffer, yes, but not over the existing buffer values. When the NMI returns, the only result is that the buffer will have some garbage in it (which will get overwritten by more buffer data or a terminating byte anyway).
Movax12 wrote:
thefox, I don't understand how to use PHA to make things better, can you explain more? edit: I think I get it, but yes the buffer would be backwards resulting in slower vram writes.. it doesn't seem to be that useful.
The VRAM writes would be slightly slower, yes, because an index register (X) needs to be modified unlike with PLA, but if LDA foo,X is unrolled I don't think it should be significantly slower. I guess it could turn out to be significant if a lot of small transfers are made. Dunno.
thefox wrote:
if LDA foo,X is unrolled I don't think it should be significantly slower.
I don't think it would be slower at all. Unrolled PLAs and unrolled LDA foo, X require some logic to skip some of the leading writes, in case the amount of bytes to transfer is variable. Both require you to set the index of the first element to copy (the PLA way even needs TXS, which the other method doesn't need).
Dwedit wrote:
If you use interrupts, stack tricks for building the buffers are out of the question.
Why so? When a program pushes bytes on the stack and then pulls them off, interrupts don't disturb this. This is equivalent.
EDIT: added context, sorry. I had thought my post was just after Dwedit's, but hadn't seen that there was a second page. thefox covered what I was getting at regarding the non-interference of interrupts to this buffering.
Why what? The TXS? Well, once the NMI fires, if your buffer was at the top of the stack it isn't anymore. Even if you use "waiting for VBlank" instead of NMIs for VRAM updates you still might have more than one buffer or it/they might not necessarily be the last thing you put on the stack. If you're using the stack for buffering VRAM data, you'll most likely need to manipulate the stack pointer.
Oh, these guys are awesome.
Certainly this thread has drifted a little, but you've definitely helped me a lot and given me good advice.
At first I hadn't planned to use the stack to buffer my PPU work.
I simply hadn't thought of it. Wouldn't taking care of interrupts (and JSR/RTS) while building the buffer be a pain? Do you really need to save it somewhere else?
For now, because of the little time I have, I will leave this for later (but I may need to understand it better).
So now I have done my PPU palette data update entirely in one frame (the simplest way), using a buffer ($00 → $20). The buffer is built in the main program before it is uploaded. I've succeeded in doing fades (from black) and in generating random palette colors (just for fun).
If anyone wants them, just ask and I will share my "little devs".
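For readers wondering what "building the buffer in the main program" can look like for a fade, here is one common approach, sketched in ca65-style syntax. It is not necessarily what was done in this thread. The names `fade`, `target_pal`, `palette_buf` and `BuildFadedPalette` and the segment names are hypothetical: each color is darkened by subtracting a multiple of $10, and anything that underflows becomes black ($0F).

```asm
.segment "ZEROPAGE"
fade: .res 1            ; hypothetical: $00 = full colors ... $40 = all black

.segment "BSS"
target_pal:  .res 32    ; hypothetical: the un-faded palette
palette_buf: .res 32    ; hypothetical: buffer uploaded during VBlank

.segment "CODE"
; Run in the main program, outside VBlank.
BuildFadedPalette:
    ldx #31
@loop:
    lda target_pal,x
    sec
    sbc fade            ; darken by one or more brightness rows
    bcc @black          ; underflow: darker than the darkest row
    cmp #$0D            ; avoid $0D, the "blacker than black" entry
    bne @store
@black:
    lda #$0F            ; plain black
@store:
    sta palette_buf,x
    dex
    bpl @loop
    rts
```

Stepping `fade` from $40 down to $00 in steps of $10, a few frames apart, gives a fade in from black.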