What are some ideas to prevent the flashing effect between screens that require large amounts of PPU writes (enough that you have to disable rendering)?
I have an opening cutscene to an RPG I'm working on. Moving from one page to the next causes the screen to flash with the universal background color. It's very annoying and I want to get rid of it.
One option I was considering was using the palette to fade the image off screen. That way I could make several partial updates to the PPU over several frames (all hidden from view). Another approach was keeping 2 nametables in the PPU and switching displays (scrolling) once the next screen had finished loading. I'd always be updating the offscreen nametable. The problem with this is that the tilesets change between pages. I'd have to do all the CHR RAM writes in the same vblank to keep the flicker from occurring.
Any ideas?
The Mad Wizard did a thing where it fades out the background but leaves the player sprite, then drags that sprite across the screen while it's slowly updating the nametables during vblank. (Very similar technique used with Super Metroid, I think?)
About how much (bytes) can be written to the PPU during vblank? I know this depends heavily on the efficiency of the algorithm and what else needs to happen in the vblank.
Do you manage a PPU Stack in CPU Memory? And if so, what is your encoding? I was thinking 3 bytes: High Address, Low Address, New Value. But I could also see one that takes advantage of consecutive byte changes (like RLE).
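To put rough numbers on the two encodings floated above, here is a minimal Python sketch. It assumes updates are queued as (PPU address, value) pairs, and the function names and the run packet layout (address high, address low, length, data) are made up for illustration; this is not the NES Stripe Image format or any existing library's format.

```python
# Hypothetical update list: (PPU address, value) pairs queued during the frame.

def encode_triples(updates):
    """3 bytes per write: address high, address low, value."""
    out = bytearray()
    for addr, value in updates:
        out += bytes([addr >> 8, addr & 0xFF, value])
    return bytes(out)


def encode_runs(updates):
    """Group writes to consecutive addresses into packets.

    Assumed packet layout: address high, address low, length, then
    `length` data bytes. Length is capped at 255 to fit in one byte.
    """
    out = bytearray()
    i = 0
    while i < len(updates):
        start = i
        # Extend the run while the next address is exactly one higher.
        while (i + 1 < len(updates)
               and updates[i + 1][0] == updates[i][0] + 1
               and i + 1 - start < 255):
            i += 1
        i += 1
        addr = updates[start][0]
        data = [value for _, value in updates[start:i]]
        out += bytes([addr >> 8, addr & 0xFF, len(data)])
        out += bytes(data)
    return bytes(out)


# One full nametable row (32 tiles) starting at $2000.
row = [(0x2000 + i, i) for i in range(32)]
print(len(encode_triples(row)), len(encode_runs(row)))
```

For a full row the triple encoding needs 96 buffer bytes against 35 for the run encoding, while a single isolated write costs 3 bytes against 4. The triple form also re-sends the address for every byte, which costs v-blank time as well as RAM.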
Lucradan wrote:
About how much (bytes) can be written to the PPU during vblank?
I'd say about 160 generally.
Lucradan wrote:
Do you manage a PPU Stack in CPU Memory? And if so, what is your encoding? I was thinking 3 bytes: High Address, Low Address, New Value. But I could also see one that takes advantage of consecutive byte changes (like RLE).
I made Popslide, a VRAM update framework that uses an otherwise (normally) unused part of the stack page at $0108-$01BF. It expects packets in the NES Stripe Image format.
In my first game, I was only able to squeeze 64 bytes of PPU writing into v-blank. That was because I was doing the decoding DURING v-blank.
If you do the decoding DURING rendering (i.e., after v-blank) and pack the bytes into a super-fast writing routine, you could get over 200 bytes of PPU writing during v-blank. Keep in mind you still need to do the sprite DMA.
You say "stack", but stack implies last in / first out. I prefer buffer.
A more recent game I did, I wrote 128 bytes max per v-blank...for horizontal scrolling. For vertical scrolling, you'd need* max 136 bytes. Those are achievable goals.
EDIT: "need" is the wrong word here. You could split the writes over 2 v-blanks, or more. I like to group (mentally) 4 lines together, since they all share the same attribute bytes. 4 * 32 = 128 tiles, plus 8 attribute bytes, makes 136.
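The 136-byte figure and the idea of splitting it over several v-blanks can be worked through in a few lines of Python. This is only a sketch: the helper names are invented, and the 64-byte budget in the example is just the figure from the first game mentioned above, not a hardware limit.

```python
TILES_PER_ROW = 32        # one nametable row is 32 tiles wide
ROWS_PER_GROUP = 4        # 4 tile rows share one row of attribute bytes
ATTR_BYTES_PER_GROUP = 8  # each attribute byte covers 4x4 tiles: 32 / 4 = 8


def bytes_per_group():
    """Tile bytes plus attribute bytes for one 4-row group."""
    return ROWS_PER_GROUP * TILES_PER_ROW + ATTR_BYTES_PER_GROUP


def split_over_vblanks(total_bytes, budget):
    """Return how many bytes to send in each successive v-blank.

    `budget` is the per-v-blank byte limit and must be positive.
    """
    chunks = []
    while total_bytes > 0:
        n = min(budget, total_bytes)
        chunks.append(n)
        total_bytes -= n
    return chunks


print(bytes_per_group())                          # 136
print(split_over_vblanks(bytes_per_group(), 64))  # [64, 64, 8]
```

A real scheduler would probably split on row boundaries rather than raw byte counts so that each packet maps to a single PPU address setup, but the frame count comes out the same.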
Yeah, my rule of thumb is that 64 is very easy to reach, but you can more than double that if you really know what you're doing.
The mention of stack reminds me that my current favourite place to put the PPU update buffer is in the $100-$140 region so that I can abuse the stack instruction PLA for a "read and increment index" function. (Not actually the fastest way to do it, but it's convenient in some ways.)
You all are a bit higher level than me, but you have given me a lot to chew on. Right now I've got each page of the cutscene with its own independent loader code, name tables, chr tables, etc. Just to see that each page looks right. I've got to eventually merge it all together into something much more efficient and fluid.
On second thought, I think I split the scrolling writes because I was also doing HUD updates and color rotation.
So, I think it was 30+30 (scroll) + ~ 20 (HUD) + 1 color rotation...so about 81 max. Attribute bytes were done on a separate v-blank. That sounds about right. 81.
rainwarrior wrote:
The mention of stack reminds me that my current favourite place to put the PPU update buffer is in the $100-$140 region so that I can abuse the stack instruction PLA for a "read and increment index" function. (Not actually the fastest way to do it, but it's convenient in some ways.)
What's a faster way? Loading from non-zero-page and incrementing an index in 4 cycles is pretty doggone fast, but I'd love to use a faster method.
The pedal to the floor version is an unrolled chain of LDA #imm, STA $2007. Takes a lot of RAM or ROM though. (Can also use all three A/X/Y and try to skip repeated loads, but that's yet another layer of madness on top.)
In between the two is unrolled LDA zp, STA $2007. You gotta give up all your ZP for this pretty much though, so it's not likely useful.
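For a rough sense of what these three copy loops buy you, here is a back-of-the-envelope Python calculation. The timing constants are not from this thread: they assume NTSC (20 v-blank scanlines, 341 PPU dots per scanline, 3 dots per CPU cycle), the standard 6502 instruction timings, and about 514 cycles for sprite DMA. The results are upper bounds, since $2006 address setup, scroll restore, and NMI entry/exit are all ignored.

```python
NTSC_VBLANK_SCANLINES = 20
PPU_DOTS_PER_SCANLINE = 341
PPU_DOTS_PER_CPU_CYCLE = 3
OAM_DMA_CYCLES = 514  # sprite DMA takes 513 or 514 CPU cycles

# CPU cycles per byte written: load instruction + STA absolute (4 cycles).
CYCLES_PER_BYTE = {
    "PLA / STA $2007": 4 + 4,
    "LDA zp / STA $2007": 3 + 4,
    "LDA #imm / STA $2007": 2 + 4,
}


def vblank_cycles():
    """Approximate CPU cycles in one NTSC v-blank (rounded down)."""
    dots = NTSC_VBLANK_SCANLINES * PPU_DOTS_PER_SCANLINE
    return dots // PPU_DOTS_PER_CPU_CYCLE


def max_bytes(cycles_per_byte, overhead=OAM_DMA_CYCLES):
    """Upper bound on bytes written after subtracting fixed overhead."""
    return (vblank_cycles() - overhead) // cycles_per_byte


for name, cost in CYCLES_PER_BYTE.items():
    print(f"{name}: at most {max_bytes(cost)} bytes")
```

Under these assumptions the ceilings come out to 219, 251, and 293 bytes respectively, which lines up with the "over 200 bytes" estimate earlier in the thread once the writes are pre-decoded.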
rainwarrior wrote:
(Can also use all three A/X/Y and try to skip repeated loads, but that's yet another layer of madness on top.)
I've posted this before but here's C++ code for doing this
https://pastebin.com/raw/6j6pwM2Q (this is for CHR data but it can be adapted to any PPU write)
pubby wrote:
rainwarrior wrote:
(Can also use all three A/X/Y and try to skip repeated loads, but that's yet another layer of madness on top.)
I've posted this before but here's C++ code for doing this
https://pastebin.com/raw/6j6pwM2Q
Also included possible arithmetic instructions in lieu of loads! Hardcore.
rainwarrior wrote:
The pedal to the floor version is an unrolled chain of LDA #imm, STA $2007. Takes a lot of RAM or ROM though.
Oh yeah. That's fast.
So I've realized that I can't do the palette fade option because the frame and text use the same palette. Will have to go with the 2 screen option.