VRAM Buffer

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
VRAM Buffer
by on (#197314)
Hey, this is our first post on these forums. We have been working on making a NES game with NESASM. What do you think of our VRAM buffer code?

The format is:
Code:
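----------------------------------
Bytes         Purpose
----------------------------------
0              Length of data ($00 ends the buffer)
1-2            PPU write address (high byte, then low byte)
3              Mode flag
                   0 - Location
                   1 - RLE
4              RLE byte (RLE mode only)
4-5            Data read address (Location mode only; high byte, then low byte)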

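For illustration, here is roughly what a buffer holding two entries could look like under this format: a Location entry (6 bytes) and an RLE entry (5 bytes), followed by the $00 terminator. This is a sketch only: the label TextData and its tile numbers are hypothetical, and in the real program the main loop would write these bytes into GfxBuffer at runtime rather than assembling them into ROM.
Code:
ExampleBuffer:                        ;12 bytes, fits in the 36-byte GfxBuffer
  .db 5                               ;Location entry: copy 5 bytes
  .db $20, $42                        ;to PPU address $2042 (high byte first)
  .db 0                               ;mode 0 = Location
  .db HIGH(TextData), LOW(TextData)   ;read address, high byte first
  .db 32                              ;RLE entry: write 32 bytes
  .db $20, $60                        ;starting at PPU address $2060
  .db 1                               ;mode 1 = RLE
  .db $00                             ;the byte to repeat (tile $00)
  .db 0                               ;length $00 ends the buffer

TextData:                             ;hypothetical source data for the first entry
  .db $11, $0E, $15, $15, $18         ;hypothetical tile numbers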

For RAM, it uses 2 general-purpose pointer bytes and 3 other bytes in zero page, plus a 36-byte buffer in regular RAM at $0400.
Code:
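PPU_STATUS        = $2002     ;PPU registers used by the NMI code
PPU_ADDRESS       = $2006
PPU_DATA          = $2007
ModeLocation      = 0         ;mode flag values from the format above
ModeRLE           = 1
True              = 1         ;any nonzero value; the flush check only tests for zero

  .zp
  .org $0000
PointerLo         .ds 1       ;general purpose address pointer variables
PointerHi         .ds 1       ;low byte first, high byte immediately after
GfxBufferPtr      .ds 1       ;offset into GfxBuffer
GfxBufferLen      .ds 1       ;length of bytes to write
FlushGfxBuffer    .ds 1       ;should the buffer be flushed

  .bss
  .org $0400
GfxBuffer         .ds 36      ;36 byte buffer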

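The post doesn't show how the main loop fills GfxBuffer. Below is one possible helper, a sketch that assumes GfxBufferPtr holds the offset of the next free byte and that the rest of the buffer is still $00 from the last flush; QueueRLE and its ArgLen, ArgAddrHi and ArgAddrLo zero-page arguments are hypothetical. It writes the length byte last, so an NMI that arrives partway through never sees a half-written entry. It does not check for overflow, so the caller has to leave room for a terminating $00 within the 36 bytes.
Code:
; hypothetical helper: append an RLE entry to GfxBuffer
; in: A = byte to repeat, ArgLen = count, ArgAddrHi/ArgAddrLo = PPU address
QueueRLE:
  LDX <GfxBufferPtr       ;X = offset of the new entry
  STA GfxBuffer+4, x      ;byte 4: the byte to repeat
  LDA <ArgAddrHi
  STA GfxBuffer+1, x      ;byte 1: PPU address high byte
  LDA <ArgAddrLo
  STA GfxBuffer+2, x      ;byte 2: PPU address low byte
  LDA #ModeRLE
  STA GfxBuffer+3, x      ;byte 3: mode flag
  TXA
  CLC
  ADC #5                  ;an RLE entry is 5 bytes long
  STA <GfxBufferPtr       ;the next entry goes right after this one
  LDA <ArgLen
  STA GfxBuffer, x        ;byte 0 last, so NMI never sees a half-written entry
  RTS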

And then we have the following code in NMI:
Code:
  .code
  .bank 0
  .org $8000

NMI:                      ;everything happens in NMI
  PHA                     ;save A, X and Y for the interrupted code
  TXA
  PHA
  TYA
  PHA

  LDX #$00                ;start at the beginning
Draw_Gfx_Buffer:
  LDA GfxBuffer, x        ;load length byte from buffer
  BEQ .done               ;we are done if the length is $00

  STA <GfxBufferLen       ;store length
  LDA PPU_STATUS          ;read PPU status to reset the high/low latch
  INX                     ;increment pointer
  LDA GfxBuffer, x
  STA PPU_ADDRESS         ;write the high byte of the write address
  INX                     ;increment pointer
  LDA GfxBuffer, x
  STA PPU_ADDRESS         ;write the low byte of the write address

  INX
  LDA GfxBuffer, x        ;load flags byte from buffer
  CMP #ModeLocation       ;are we in location mode?
  BEQ .modeLocation
  CMP #ModeRLE            ;are we in RLE mode?
  BEQ .modeRLE
.modeLocation:            ;(unknown modes also end up here)
  INX
  LDA GfxBuffer, x
  STA <PointerHi          ;write high byte of the read address
  INX
  LDA GfxBuffer, x
  STA <PointerLo          ;write low byte of the read address
  INX                     ;point at the next entry

  LDY #$00                ;start at 0
.loop:
  LDA [PointerLo], y      ;load data from address
  STA PPU_DATA            ;write to PPU
  INY                     ;increment counter
  CPY <GfxBufferLen       ;compare Y to buffer length
  BNE .loop               ;loop until all bytes are written

  LDA #True
  STA <FlushGfxBuffer     ;we will need to flush the buffer
  JMP Draw_Gfx_Buffer
.modeRLE:
  INX                     ;point at the RLE byte
  LDA GfxBuffer, x        ;load the byte to repeat
  INX                     ;point at the next entry
  LDY <GfxBufferLen       ;Y counts down the number of writes
.rleLoop:
  STA PPU_DATA            ;write the same byte again
  DEY
  BNE .rleLoop

  LDA #True
  STA <FlushGfxBuffer     ;we will need to flush the buffer
  JMP Draw_Gfx_Buffer
.done:

Flush_Gfx_Buffer:
  LDA <FlushGfxBuffer     ;if FlushGfxBuffer
  BEQ .done               ;equals 0 (false), then we are done

  LDA #$00                ;fill with 0's
  LDY #$00                ;start at 0
.loop:
  STA GfxBuffer, y
  INY
  CPY #36
  BNE .loop
  STA <GfxBufferPtr       ;clear the buffer pointer
  STA <FlushGfxBuffer     ;and the flag, so the next NMI doesn't flush again
.done:
  ;the PPU_ADDRESS writes above disturb the scroll, so reset it here
  ;(PPU_CTRL/PPU_SCROLL) before returning
  PLA                     ;restore Y, X and A
  TAY
  PLA
  TAX
  PLA
  RTI
Re: VRAM Buffer
by on (#197316)
The code looks good, but that aside, I would be extremely hesitant to use generic VRAM update buffers. You only get so many cycles per VBLANK (roughly 2,270 CPU cycles on NTSC), so why waste so many on iteration code?

I think it's okay to use something like this for small, low-cost things that happen rarely, but if you're planning on using this for scrolling or status bars or anything that gets updated every frame in a predictable fashion, you're better off steering clear!
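
As a rough illustration of the alternative for updates that happen every frame in a predictable way: a dedicated routine can hardcode the address and unroll the copy, so each byte costs only an absolute load plus a store (4 + 4 cycles), with no header parsing or loop counter. The routine name DrawStatus, the StatusTiles array, the 4-byte length and the $2042 address are all hypothetical, and the PPU is assumed to be in +1 increment mode.
Code:
; hypothetical dedicated routine: redraw a 4-tile status display every frame
; assumes StatusTiles is a 4-byte array in RAM kept up to date by the main loop
DrawStatus:
  LDA PPU_STATUS          ;reset the high/low latch
  LDA #$20
  STA PPU_ADDRESS         ;fixed address $2042
  LDA #$42
  STA PPU_ADDRESS
  LDA StatusTiles+0       ;4 cycles
  STA PPU_DATA            ;4 cycles
  LDA StatusTiles+1
  STA PPU_DATA
  LDA StatusTiles+2
  STA PPU_DATA
  LDA StatusTiles+3
  STA PPU_DATA
  RTS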
Re: VRAM Buffer
by on (#197317)
Code:
  LDY #$00                ;start at 0
.loop:
  LDA [PointerLo], y      ;load data from address
  STA PPU_DATA            ;write to PPU
  INY                     ;increment counter
  CPY <GfxBufferLen       ;compare Y to buffer length
  BNE .loop


This is inefficient. If you need to move lots of bytes to VRAM, consider having a second, more efficient system.

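Also, think about flexibility. Suppose you had one update going left to right and a second update going top to bottom, both during the same V-blank. I don't think your plan would cover that, since the format has no way to select the PPU's address increment (+1 for across, +32 for down, chosen by bit 2 of PPU_CTRL at $2000) for each entry.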
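
One possible way to get that flexibility, sketched below, is to reserve a bit of the mode byte for the increment direction and copy it into bit 2 of PPU_CTRL before each entry's data is written. The choice of bit 7, the PPU_CTRL constant ($2000) and the PpuCtrlShadow zero-page variable (a RAM copy of the last value written to PPU_CTRL, so bits such as NMI enable are preserved) are all assumptions for this sketch. The existing mode comparisons would then run on the value with bit 7 stripped, as the last two lines do.
Code:
; hypothetical: runs with X pointing at the mode byte (byte 3) of the entry
  LDA GfxBuffer, x        ;load the mode byte
  BMI .down               ;bit 7 set: this entry runs top to bottom
  LDA <PpuCtrlShadow
  AND #%11111011          ;bit 2 clear: VRAM address +1 per write (across)
  JMP .setCtrl
.down:
  LDA <PpuCtrlShadow
  ORA #%00000100          ;bit 2 set: VRAM address +32 per write (down)
.setCtrl:
  STA PPU_CTRL            ;takes effect for the following PPU_DATA writes
  LDA GfxBuffer, x
  AND #%01111111          ;strip the direction bit before comparing modes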
Re: VRAM Buffer
by on (#197318)
Have you looked at how Popslide does things? It's a generic updater that still runs fast (about 8 cycles per data byte plus 50 per address change) because it's hardcoded to put the buffer in an otherwise unused part of the stack page at $0108-$01BF.
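
The speed of a stack-page buffer usually comes from reading it with PLA: PLA takes 4 cycles and advances the stack pointer on its own, so with an unrolled copy each data byte costs only PLA + STA PPU_DATA = 8 cycles, with no index increment or compare. The fragment below only illustrates that trick under those assumptions; it is not Popslide's actual code, and SavedSP is a hypothetical zero-page variable.
Code:
; illustration of the stack-page trick (not Popslide's source)
; assumes the bytes to send are stored starting at $0108
  TSX
  STX <SavedSP            ;save the real stack pointer (hypothetical zp variable)
  LDX #$07                ;PLA increments S before reading, so the first PLA reads $0108
  TXS
  PLA                     ;4 cycles: read $0108
  STA PPU_DATA            ;4 cycles
  PLA                     ;read $0109
  STA PPU_DATA
  PLA                     ;read $010A
  STA PPU_DATA
  ; ...one PLA / STA pair per byte, unrolled
  LDX <SavedSP
  TXS                     ;put the real stack pointer back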
Re: VRAM Buffer
by on (#197320)
Your copy loop is extremely slow, since it uses indirect addressing, increments the index for every byte, and compares the index to the end value after every byte as well. That adds up to 17 cycles per byte, which, considering the sprite DMA and all the overhead, will realistically let you transfer maybe 60 bytes per frame, barely enough for a row/column of tiles and the palette. Keep in mind that this style of VRAM update is terrible for columns of attribute bytes: the bytes of an attribute column are 8 apart, which neither the +1 nor the +32 address increment can step through, so each one needs its own 1-byte transfer, causing a lot of overhead.

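Here is where those 17 cycles come from, annotating the original loop with standard 6502 cycle counts (the indirect load takes one extra cycle when its read crosses a page boundary):
Code:
.loop:
  LDA [PointerLo], y      ;5 cycles (6 if the read crosses a page)
  STA PPU_DATA            ;4
  INY                     ;2
  CPY <GfxBufferLen       ;3
  BNE .loop               ;3 when taken
                          ;total: 17 cycles per byte

At 17 cycles per byte, 60 bytes already take 1,020 cycles before any per-entry overhead is counted.
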
A while ago I was looking for the optimal way (in terms of speed) to implement a VRAM update system. What I came up with needed 8 cycles per byte and as little overhead as I could possibly have, and could do about 200 bytes per frame. But even that wasn't ideal for attribute columns, and I found myself constantly missing opportunities for speeding things up, since many types of updates use redundant addresses and even data, so I eventually ditched the generic approach in favor of specific routines for each kind of update.

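As an example of such a specific routine, here is a rough sketch for one attribute column of the first nametable. The eight bytes of a column sit 8 bytes apart starting at $23C0 + column, so the address still has to be set once per byte, but there are no length or mode bytes to parse. DrawAttrColumn, AttrColumn (zero page, 0-7) and AttrColumnData (8 bytes, top row first) are hypothetical, and the PPU's high/low latch is assumed to be reset already.
Code:
; hypothetical: write 8 attribute bytes for one column of nametable 0
DrawAttrColumn:
  LDA <AttrColumn
  ORA #$C0                ;low byte of $23C0 + column
  TAY
  LDX #$00
.loop:
  LDA #$23
  STA PPU_ADDRESS         ;high byte of the attribute table address
  STY PPU_ADDRESS         ;low byte: $C0 + column + 8 * row
  LDA AttrColumnData, x
  STA PPU_DATA
  TYA
  CLC
  ADC #8                  ;the next row of attributes is 8 bytes further on
  TAY
  INX
  CPX #8
  BNE .loop
  RTS
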
I also created a lookup table indicating the amount of time needed for each type of update, so I could subtract those from the total vblank time to know when the vblank time was up and I should start rejecting update requests for the frame.
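
A minimal sketch of that bookkeeping, assuming costs are stored in units of 16 cycles so the remaining budget fits in one byte. ResetBudget, QueueUpdate, UpdateCost, VblankBudget and VBLANK_BUDGET are hypothetical names, and the numbers are placeholders rather than measured costs.
Code:
VBLANK_BUDGET = 100       ;placeholder: about 1,600 cycles, in units of 16 cycles

ResetBudget:              ;call once per frame, after NMI has serviced the requests
  LDA #VBLANK_BUDGET
  STA <VblankBudget
  RTS

; in: A = update type; out: carry clear = accepted, carry set = rejected this frame
QueueUpdate:
  TAX
  LDA <VblankBudget       ;time left this frame
  SEC
  SBC UpdateCost, x       ;minus the cost of this kind of update
  BCC .reject             ;borrow: not enough vblank time left
  STA <VblankBudget       ;accept and commit the new remaining time
  ; ...record the request (type in X) for NMI to service...
  CLC
  RTS
.reject:
  SEC
  RTS

UpdateCost:               ;placeholder costs per update type, in units of 16 cycles
  .db 12, 9, 4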