I've read that there are exactly 1232 PPU cycles per scanline, but how big is the PPU to VRAM bus and are the accesses every cycle or every two cycles?
VRAM is the second fastest memory in the GBA, after IWRAM. That's why both acNES and PocketNES put code there.
Accesses take one cycle each during vblank, or slightly longer outside of vblank because the CPU has to wait for the PPU to finish. However, unlike on the Super NES, there's no second CPU-to-PPU address bus, so the source and destination addresses have to take turns on the main address bus. EWRAM to VRAM copies take 4 cycles per 16 bits, ROM to VRAM copies take 3 cycles per 16 bits, and IWRAM to VRAM copies take 3 cycles per 32 bits.
But really, the PPU to VRAM bus is fast enough that you can rewrite all of the sprite pattern table (32 KiB, 0x06010000-0x06017FFE) in each vblank using a DMA copy or even an LDMIA/STMIA copy, as I described in this white paper back in 2002.
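As a rough check of that claim, the per-unit copy costs quoted in this post can be turned into a cycle budget. This is only a back-of-the-envelope sketch in Python, not measured data: the 68-line vblank length is the commonly documented GBA figure (228 total lines minus 160 visible lines) rather than something stated in this thread, and `COPY_COST` and `copy_cycles` are names made up for illustration.

```python
# Back-of-the-envelope vblank budget using the copy costs quoted above.
CYCLES_PER_LINE = 1232            # figure from the first post
VBLANK_LINES = 68                 # assumption: 228 total lines - 160 visible
SPRITE_PATTERN_BYTES = 32 * 1024  # sprite pattern table size

# (cycles per transfer unit, bytes per transfer unit), as quoted in the post
COPY_COST = {
    "EWRAM": (4, 2),
    "ROM": (3, 2),
    "IWRAM": (3, 4),
}

def copy_cycles(source, nbytes):
    """Estimated cycles to copy nbytes from the given source to VRAM."""
    cycles, unit = COPY_COST[source]
    return (nbytes // unit) * cycles

vblank_cycles = CYCLES_PER_LINE * VBLANK_LINES

for source in COPY_COST:
    used = copy_cycles(source, SPRITE_PATTERN_BYTES)
    print(f"{source:5s} -> VRAM: {used} of {vblank_cycles} vblank cycles")
```

Under these assumptions all three sources fit inside one vblank, with EWRAM being the tightest. The estimate ignores setup overhead and anything else the program does during vblank.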
I think I figured out how the PPU might work. (this is all just an educated guess though.)
The 96kB is divided into 3 RAM chips: one with 64kB and two with 16kB, each with a 16-bit bus.
For modes 0, 1 and 2, the backgrounds use the 64kB and the sprites use the two 16kB. For modes 3, 4 and 5, the backgrounds use the 64kB and one of the 16kB, while the sprites use the other.
Both the backgrounds and sprites are done simultaneously, with the PPU accessing VRAM once every two cycles. They have 616 VRAM accesses per line, with 2 accesses per pixel.
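The figures in this guess can be checked with a little arithmetic. The sketch below assumes 4 PPU cycles per dot, which is not stated in this post and gives 308 dots per line when hblank is included; the variable names are illustrative only.

```python
CYCLES_PER_LINE = 1232    # figure from the first post
CYCLES_PER_ACCESS = 2     # the guess in this post
CYCLES_PER_DOT = 4        # assumption, not stated in this post

accesses_per_line = CYCLES_PER_LINE // CYCLES_PER_ACCESS
dots_per_line = CYCLES_PER_LINE // CYCLES_PER_DOT   # includes hblank dots
accesses_per_dot = accesses_per_line // dots_per_line

print(accesses_per_line, dots_per_line, accesses_per_dot)  # 616 308 2
```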
Aren't the CPU, PPU, IWRAM, and VRAM all on the same chip?
Psyco probably meant "chip" in the sense that CGRAM and OAM are separate "chips" on the NES PPU or the GBA CPU: separate RAM cores on one die.
Does the GBA use the same interleaved pattern/map byte format for so-called "Mode 7" as the SNES does? I know that it uses bytes for both tile references and pixels, just like the SNES, but I can't find a document that says they are interleaved. If they are not interleaved, then why are the tile references 8-bit, and how would the CPU still have enough time to write to VRAM during active display? It would need to access memory 4 times per pixel in order to show 2 layers of "Mode 7." I know that on the SNES, there are actually two 8-bit VRAM banks that act as a single unit for everything but Mode 7, with one bank being the tile map and the other being the pattern table.
Having programmed a mode 1 game myself and watched mode 2 demos work in the VRAM viewer of VBA, I can tell you that the sort of interleave seen in Super NES mode 7 isn't present. The PPU appears to run at four times the dot clock rate, which in mode 2 allows for two map reads and two texture reads per pixel. This leaves the rest of the line for sprite fetches.
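To put numbers on that, here is a minimal sketch of the per-line fetch budget in mode 2. It assumes one VRAM read per PPU cycle and that background fetches only happen for the 240 visible dots; both are assumptions for illustration, not confirmed hardware behavior.

```python
CYCLES_PER_LINE = 1232   # figure from the first post
CYCLES_PER_DOT = 4       # "four times the dot clock rate"
VISIBLE_DOTS = 240       # visible width of a GBA scanline
BG_LAYERS = 2            # mode 2 has two rotation/scaling backgrounds
READS_PER_LAYER = 2      # one map read + one texture read per pixel

reads_per_dot = BG_LAYERS * READS_PER_LAYER
bg_cycles = VISIBLE_DOTS * reads_per_dot   # assumes one read per cycle
leftover = CYCLES_PER_LINE - bg_cycles     # cycles outside the visible dots

print(reads_per_dot, bg_cycles, leftover)  # 4 960 272
```

With these assumptions the background fetches use every cycle of the visible part of the line, and 272 cycles per line remain.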
I've read that VRAM access during active display just takes away from the sprite unit's accesses.
The Game Boy delays the processing of the rest of the scanline if it has to do certain kinds of processing. This is because the LCD doesn't need pixels clocked in quite as regularly as a CRT. Speculation: I wonder if this is true of the GBA. If all background fetching happens first, and then sprite pattern fetching, it's possible that VRAM accesses during active picture cause background accesses to be delayed, which in turn causes sprite accesses to be delayed, which in turn may cause dropout.
Does that mean there can only be 240 pixels of rotation/scaling sprites before flicker, and fewer while writing to VRAM?
"Less when writing to VRAM" yes. I don't know the exact timings; I never ran into significant overflow in any of my GBA productions.
One of the GBA docs said that the PPU renders sprites for the entire scanline, and takes 2 cycles per rotation/scaling pixel. If that is the case, then the 32kB of sprite patterns must be separate from the 64kB used for backgrounds.
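Taking those two figures at face value gives a rough upper bound on rotation/scaling sprite pixels per line. This is only a sketch: it ignores any per-sprite overhead, and `cycles_lost_to_cpu` is a hypothetical parameter modeling the speculation earlier in the thread that CPU writes to VRAM steal time from sprite fetches.

```python
CYCLES_PER_LINE = 1232        # figure from the first post
CYCLES_PER_AFFINE_PIXEL = 2   # figure quoted from the GBA doc

def max_affine_pixels(cycles_lost_to_cpu=0):
    """Upper bound on rotation/scaling sprite pixels in one scanline."""
    available = max(CYCLES_PER_LINE - cycles_lost_to_cpu, 0)
    return available // CYCLES_PER_AFFINE_PIXEL

print(max_affine_pixels())      # no CPU interference
print(max_affine_pixels(100))   # hypothetical: 100 cycles lost to CPU writes
```

If the quoted figures are right, the bound without CPU interference comes out well above 240 pixels per line.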
However, the background takes up half the sprite memory in bitmap modes. There must be something complicated going on here. What motivated you to investigate this? Are you trying to make a cycle-accurate GBA emulator or a demo that depends on cycle accuracy or pushes the sprite limit?
I am just interested in how the circuitry works.
I'm not too sure myself, but apparently the GBA uses an advanced memory bus which is ridiculously more complex than what is found in the NES or even the SNES.
The bus allows variable-length wait states, so that one device can "wait" for another to finish its transfer before it uses the bus. Because of this, every cycle can be put to good use, unlike on the NES, where there are dead cycles everywhere on both the CPU and PPU side.