Hi.
I made my journey through the PPU wiki, but everything is a little bit messy in my head. So I'm here to ask you for advice on how to start... basically. Any clue will be welcome. My CPU is fully working (maybe NMI and IRQ interrupts are a proper starting point?), but now I don't know what to do first (assume I understand the contents displayed on the wiki more or less). I don't want you to trace a route map for me, just some advice to start.
I think I could try to write my own PPU Viewer so it helps me out with the memory structure of the PPU. What do you think of this? However, I feel like I don't really understand the "working cycle" of the PPU: it draws 256 scanlines plus the hidden ones [post-render?] and after that VBLANK takes place, during which PPU "calculates" the next frame (sprite 0 hit, nametable switching [if it's set], etc). That's a summary of what I understand.
A common first step in PPU emulation is to code an "instantaneous" PPU, which draws an entire frame all at once every frame, sampling the PPU settings (scroll, palettes, etc.) from a specific time in the frame.
This is horribly inaccurate, and anything that requires any sort of CPU-PPU synchronization will fail, but a few games should be playable this way.
This will help you familiarize yourself with the formats of the pattern, name and attribute tables, as well as various other settings and components that impact the way the image is rendered.
Once you understand how the data is used to form a picture, you can increase the parallelism and simulate the PPU alongside the CPU, doing all the memory fetches and logic that the actual PPU does.
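For what it's worth, the "instantaneous" approach can be sketched roughly as below. This is only an illustration under some loud assumptions: `vram` is a flat 16 KiB copy of the PPU address space, there is no scrolling, only one nametable is drawn, and sprites are ignored. The function name and parameters are made up for this example.

```python
def render_background(vram: bytes, pattern_base: int = 0x0000,
                      nametable_base: int = 0x2000) -> list[list[int]]:
    """Draw one whole nametable at once (hypothetical helper, no scrolling).

    Returns 240 rows of 256 values, each an index (0..15) into the
    background half of palette RAM.
    """
    frame = [[0] * 256 for _ in range(240)]
    for ty in range(30):            # 30 rows of tiles
        for tx in range(32):        # 32 tiles per row
            tile = vram[nametable_base + ty * 32 + tx]
            # One attribute byte covers a 4x4-tile area; each 2x2-tile
            # quadrant of that area gets its own 2-bit palette number.
            attr = vram[nametable_base + 0x3C0 + (ty // 4) * 8 + (tx // 4)]
            shift = ((ty & 2) << 1) | (tx & 2)
            palette = (attr >> shift) & 0x03
            for row in range(8):
                lo = vram[pattern_base + tile * 16 + row]      # bit plane 0
                hi = vram[pattern_base + tile * 16 + row + 8]  # bit plane 1
                for col in range(8):
                    bit = 7 - col   # bit 7 is the leftmost pixel
                    pixel = ((lo >> bit) & 1) | (((hi >> bit) & 1) << 1)
                    # Pixel value 0 always shows the shared backdrop colour.
                    frame[ty * 8 + row][tx * 8 + col] = (
                        ((palette << 2) | pixel) if pixel else 0
                    )
    return frame
```

The background pattern table base would come from bit 4 of $2000 in a real emulator; here it is just a parameter.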
Do what tokumaru says. Additional points alongside his:
- Don't worry about mappers. Focus only on first-generation mapper 0 (NROM) games for now
- Use games like Donkey Kong, Mario Bros (NOT Super!), Clu Clu Land, and Pinball as starting games. They make for good test subjects. You *can* get these working with the model/method tokumaru describes
- Yes, you will need NMI, guaranteed. Bit 7 of $2000 will play a critical role in getting things working, along with bit 7 of $2002 (else you'll find game code spinning waiting for that bit)
- You will need to implement some generic joypad methodology for $4016/4017, else some games will spin waiting for certain behaviour
- Don't worry about IRQ for now
- Don't worry about audio for now
Edit: clarification (d7 of $2000 vs. $2002)
Surprisingly, Mappers aren't that hard, and games like Mega Man or Mega Man 2 will work fine even with an instantaneous rendering PPU.
Yeah, definitely, I'll do what tokumaru suggests. Thank you for your answers!
No offense, but I'll give my opinion: do NOT do what he's saying. The reason is that you are going to write inaccurate software that others will look down on. I experienced this back in 1998. Try to follow what's described in the PPU section (even if it looks cryptic) and always have a goal of ACCURACY. Otherwise, as I said, you're producing something NOT interesting, since we have tons of NES emulator projects, but only a few of them made a public release.
The choice is yours now.
I certainly didn't suggest releasing an emulator with a crude PPU implementation to the public, but as a private milestone, it makes sense to implement things in the simplest possible way to increase your understanding of that thing.
As long as people are capable of telling apart personal accomplishments from products worth releasing, it's OK to make things that are less than perfect. This is true for emulators, games, or anything really.
The one wrinkle in that: Certain well-known source code backup services tend to charge more for a private project than a public one.
That's only a wrinkle if you choose to make it one and happen to insist on using that particular code hosting service and don't want to or can't pay for a private repo.
And FWIW I don't see anything wrong with "making available publicly because I don't want to or can't pay for a private repo". Especially because, if you aren't going around advertising it, your random repo in a sea of millions will likely go unnoticed anyway.
The only question I have is whether hitching your emulator to a fundamentally inaccurate PPU will add technical debt that'll be a pain to pay off in the future. But I've never written an NES emulator, so I don't have much to add there.
Just to give you a high-level picture of what's going on, and clear up some misconceptions in your initial post:
The PPU doesn't calculate a whole frame at once (it wouldn't have enough RAM to store all that data!). Rather, it fetches and calculates stuff as the frame is being rendered and displayed on-screen. The CPU is still running as this happens, so a lot of effects like status bars and so on are done by having the CPU change PPU settings in the middle of a frame being rendered. (Though, the programmer has to be careful about what they change and when they change it, so that they don't end up interfering with the PPU and causing graphical glitches.)
Vblank is actually when the PPU is idle. This is why you can access things like PPU memory during vblank, which you aren't really able to do while the frame is rendering, since the PPU is accessing it at the time.
On NTSC systems (like in the US and Japan), there are 262 scanlines total:
- 0-239 are the visible scanlines, where the PPU is rendering. In other words, the game screen is 240 pixels tall.
- 240 is the post-render scanline, which could have been the start of Vblank, since the PPU is idle, except... for whatever reason the Vblank flag isn't actually set until the next scanline.
- 241-260 is Vblank, lasting for 20 scanlines. (On PAL systems, this would be 70 scanlines instead, since games run at 50 FPS instead of 60, so there's more time between frames.)
- 261 is the last scanline, where the PPU has to fetch some things to prepare for the next frame. You could also think of this as scanline -1.
There's some more detailed information here.
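The NTSC scanline layout listed above could be written down as a small lookup like the following. It is just a sketch of that list; the function name and the phase labels are made up for illustration.

```python
def ntsc_scanline_phase(scanline: int) -> str:
    """Classify an NTSC scanline number (0..261) per the list above."""
    if not 0 <= scanline <= 261:
        raise ValueError("NTSC frames have scanlines 0..261")
    if scanline <= 239:
        return "visible"       # the PPU is rendering the picture
    if scanline == 240:
        return "post-render"   # idle, but the vblank flag is not set yet
    if scanline <= 260:
        return "vblank"        # 20 scanlines, flag set at the start of 241
    return "pre-render"        # 261, also thought of as scanline -1
```

An emulator's main loop can use something like this to decide when to set or clear bit 7 of $2002.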
Nicole's summary is really great. I know there has been a debate on here recently about what should be on the Wiki. I would argue that a short, concise summary of a topic, in non-technical (but accurate) language, like the one Nicole gave above, should be at the top of every Wiki page. It would set the tone of the Wiki as approachable but accurate. Thank you, Nicole!
Yeah, thank you Nicole. Your post helped me clarify some misconceptions of mine.
I'm going with tokumaru's approach, Zepper. Of course it's only draft code that I hope will help me understand the PPU as a whole. After it's accomplished its mission, I'll discard it and write proper code with proper synchronization with the CPU, but, at the moment, I don't see the big picture.
About the private hosting inconvenience you mentioned... no worries. I have private backup source code hosting at the moment thanks to my university.
I came across something... strange. I started by drawing sprite tiles on screen, but they appear mirrored along the X axis. I'm using YYCHR to check my code's correctness and everything is drawn right except for this mirroring. I'm using SDL and I know X grows from left to right and Y from top to bottom, so that's not the cause. And I also know I can invert the tile drawing from right to left, but I'd rather not. What's the reason for this mirroring?
Just to clarify, this is the first tile (YYCHR):
This is what my program draws:
Not to be blunt, but if your graphics code draws sprites flipped horizontally, then flip them before drawing them so they show up correctly - you're going to have to figure out how to flip sprite tiles anyways (each sprite can be flipped horizontally and/or vertically), so you may as well do it now.
I know how to flip them, it's easy. However, I want to know if this is normal (PPU manages memory in a big-endian fashion, AFAIK), since the wiki states that the colour of a row of 8 pixels in an 8x8 tile is determined by using the corresponding byte of pattern table 0 as 0 bit values and the corresponding byte of pattern table 1 as 1 bit values and this is how I implemented my program. Is YYCHR flipping the sprites prior to drawing them?
HastatusXXI wrote:
However, I want to know if this is normal (PPU manages memory in a big-endian fashion, AFAIK), since the wiki states that the colour of a row of 8 pixels in an 8x8 tile is determined by using the corresponding byte of pattern table 0 as 0 bit values and the corresponding byte of pattern table 1 as 1 bit values and this is how I implemented my program.
According to the PPU pattern tables article on the wiki, the most significant bit of tile data (i.e. 0x80) represents the leftmost pixel of said tile, and the least significant bit (0x01) represents the rightmost pixel.
HastatusXXI wrote:
Is YYCHR flipping the sprites prior to drawing them?
No, it is not, since a given tile can be interpreted as either sprite or background, and background tiles are never flipped.
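To make the bit order concrete, here is a minimal sketch of decoding one row of a tile from its two bit-plane bytes, with 0x80 as the leftmost pixel and 0x01 as the rightmost. The function names are hypothetical, and the 16-byte tile layout (8 bytes of plane 0 followed by 8 bytes of plane 1) is assumed from the pattern table format.

```python
def decode_tile_row(plane0: int, plane1: int) -> list[int]:
    """Return 8 pixel values (0..3), leftmost pixel first."""
    return [
        ((plane0 >> bit) & 1) | (((plane1 >> bit) & 1) << 1)
        for bit in range(7, -1, -1)   # bit 7 first: that's the leftmost pixel
    ]


def decode_tile(tile_bytes: bytes) -> list[list[int]]:
    """Decode a 16-byte tile: bytes 0-7 are plane 0, bytes 8-15 are plane 1."""
    if len(tile_bytes) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    return [decode_tile_row(tile_bytes[r], tile_bytes[r + 8]) for r in range(8)]
```

Iterating from bit 0 upwards instead of from bit 7 downwards gives exactly the horizontally mirrored output described earlier in the thread.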
Okay, solved. My algorithm was flipping the pixels prior to drawing them. Thank you, Quietust.
Hello again, I'm making some progress here with the PPU. I'm using Nintendulator PPU and CPU memory maps dumps to comprehend image generation. I understand backgrounds now, but I need OAM dumps to practise everything concerning sprites. Is there any possibility to do this in Nintendulator? (I only see the option of dumping CPU and PPU, but nothing about OAM/SPR).
The "Dump PPU" button dumps everything related to the PPU: pattern tables, nametables, sprites, and palettes (in that order).
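If it helps, splitting such a dump into its parts might look like the sketch below. The order comes from the post above, but the section sizes are an assumption on my part: they are simply the hardware sizes of each region (8 KiB of pattern tables, four 1 KiB nametables, 256 bytes of OAM, 32 bytes of palette RAM) and have not been checked against what Nintendulator actually writes. All names are hypothetical.

```python
# ASSUMED section sizes, in the order given in the post above.
SECTIONS = (
    ("pattern_tables", 0x2000),
    ("nametables", 0x1000),
    ("oam", 0x100),
    ("palettes", 0x20),
)


def split_ppu_dump(data: bytes) -> dict[str, bytes]:
    """Split a PPU dump into named sections (sizes are an assumption)."""
    expected = sum(size for _, size in SECTIONS)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    parts = {}
    offset = 0
    for name, size in SECTIONS:
        parts[name] = data[offset:offset + size]
        offset += size
    return parts
```

If the file length doesn't match, the size check fails loudly, which is a hint that the assumed layout is wrong for that dump.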
Okay, I see. Thank you, Quietust.
I successfully rendered one frame of Donkey Kong (many things were hardcoded, such as the palettes). Now I'm trying to do it properly and, following the wiki article on PPU registers, I check PPUCTRL via the CPU dump at $2000 (I know that's not the way to do it, it's temporary). I found the value $A5, so pattern table 0 is chosen, but pattern table 1 (I hardcoded this previously) should be chosen instead. Of course the rendered image is a mess.
What's the problem here?
Debugging Donkey Kong I see writes of $14 at the beginning of NMI and $94 when it finishes during gameplay frames. (The two writes disable then re-enable NMI so that it won't interrupt itself, that's a pattern a few Nintendo games use.) Where did you get $A5?
I tried to use Mesen's Trace Logger feature to get me exactly what I needed (a log of writes to $2000), but it seems incredibly buggy/weird/busted in ways that are off-topic (I'll have to file a GitHub issue :-) ). So I stuck with a breakpoint on any writes to effective address $2000, with a conditional of value == $A5 -- nothing turned up from power-on onwards. I then changed the breakpoint to any write to effective address $2000. From power-on all the way into the attract mode (where the game plays by itself), these are the values you're going to see written to $2000 at varying points:
$10
$14
$90
$94
No other values were seen, at least not during the parts I tested.
$A5 sounds to me like either the wrong data being shown (e.g. an opcode; $A5 would be lda $zp), an internal value you might be using for PPU state, or maybe you reported the wrong value to us.
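For comparing values like $10, $14, $90, $94 and the suspicious $A5, a small decoder for the $2000 (PPUCTRL) bit fields can be handy. This is only a sketch; the function and key names are made up, and bit 6 (master/slave select) is deliberately left out.

```python
def decode_ppuctrl(value: int) -> dict:
    """Break a value written to $2000 into its fields (bit 6 omitted)."""
    return {
        "base_nametable": 0x2000 + 0x400 * (value & 0x03),   # bits 0-1
        "vram_increment": 32 if value & 0x04 else 1,          # bit 2
        "sprite_pattern_table": 0x1000 if value & 0x08 else 0x0000,      # bit 3
        "background_pattern_table": 0x1000 if value & 0x10 else 0x0000,  # bit 4
        "sprite_height": 16 if value & 0x20 else 8,            # bit 5
        "nmi_enabled": bool(value & 0x80),                     # bit 7
    }


if __name__ == "__main__":
    for value in (0x10, 0x14, 0x90, 0x94, 0xA5):
        print(f"${value:02X}: {decode_ppuctrl(value)}")
```

All four values seen in Donkey Kong select the background pattern table at $1000, while $A5 selects $0000 and 8x16 sprites, which is one more sign that $A5 was never really written to $2000.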
koitsu wrote:
I tried to use Mesen's Trace Logger feature to get me exactly what I needed (a log of writes to $2000), but it seems incredibly buggy/weird/busted in ways that are off-topic (I'll have to file a GitHub issue :-) )
Just tried on my end and it works as expected with the following condition: "address == $2000 && iswrite"
I'll file a GitHub issue for the behaviours I see that are bizarre/weird. There's no issue with the conditionals; the problems pertain to other things.
I got the value $A5 from a random CPU dump of the title screen of Donkey Kong using Nintendulator. The value I'm giving to you comes from the dump at position $2000, not from watching a breakpoint. In fact, Nintendulator only shows $FF when I trace a breakpoint at $2000. I'm puzzled.
If the last value read or written on any PPU register ($2000-$2007 or their mirrors at $2008-$3FFF) is $A5, then reading a write-only PPU register (such as $2000) will return $A5. This behavior, which FCEUX calls "PPUGenLatch", is a result of bus capacitance within the PPU. Coincidentally, the last time we discussed it also involved Donkey Kong. It's possible that Nintendulator's memory viewer is emulating PPUGenLatch for the $2000-$3FFF space, but I haven't tested that.
As far as I'm aware, the only NES program that actually uses PPUGenLatch on purpose is my controller test ROM when it's trying to tell the difference between an NES and a Famicom based on $4016 open bus behavior. Existing games don't, and if future games do, it might just be to set the default language to Japanese or English or change how specialty controllers are read.
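A rough way to model this latch in an emulator is sketched below. It is a simplification under stated assumptions: only the write-only registers are handled, reads of the readable registers ($2002, $2004, $2007) are left out entirely, and the decay of the latch over time is not modelled. The class and method names are hypothetical.

```python
class PpuRegisterBus:
    """Minimal model of the PPU I/O latch ("PPUGenLatch"), no decay."""

    # Register numbers (address & 7) that cannot be read back.
    WRITE_ONLY = frozenset({0, 1, 3, 5, 6})  # $2000 $2001 $2003 $2005 $2006

    def __init__(self) -> None:
        self.latch = 0

    @staticmethod
    def _register(addr: int) -> int:
        if not 0x2000 <= addr <= 0x3FFF:
            raise ValueError("not a PPU register address")
        return addr & 7          # registers are mirrored every 8 bytes

    def write(self, addr: int, value: int) -> None:
        self._register(addr)     # validate the address
        self.latch = value & 0xFF
        # A real emulator would also apply the register's side effects here.

    def read(self, addr: int) -> int:
        reg = self._register(addr)
        if reg in self.WRITE_ONLY:
            return self.latch    # nothing drives the bus: stale value
        raise NotImplementedError("readable registers are outside this sketch")
```

With this model, a write of $A5 to any PPU register followed by a read of $2000 returns $A5, which matches the situation described above.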
Does it mean I won't be able to get a PPU register value from a CPU dump from Nintendulator? There's no option to dump PPU registers and Quietust specified above that the PPU dump contains pattern tables, nametables, OAM and palettes, but he didn't mention PPU registers at all, so I inferred they would be available from the CPU memory map.
koitsu mentioned Mesen's Trace Logger. I'd rather use Nintendulator, since it's compatible with Wine. I'd try a Virtual Machine with Win 7 x64 if it's necessary (the one I have now runs Win Xp x32, but Mesen's emulator is not compatible with x86 arch).
Edit: Okay, I've seen Mesen is compatible with Linux.
Edit2: I can see the values koitsu gets for $2000 in the Trace Logger. However, all the PPU mapped registers stay $00, so I can't produce a dump to continue practising. How can I solve this? Is there any other way to get the value of the PPU registers?
Edit 3: Ok, I overlooked the paragraph on PPUGenLatch in the wiki article on PPU ports. So, the values in these registers start to decay about a frame after they've been written. Shouldn't it be possible to get a CPU dump just after the values have been written, either in Nintendulator or Mesen? Do they always dump the CPU memory map in a concrete instant of time?
SUMMARY: Is there any way I can dump the CPU memory map including the PPU registers values prior to decaying?
Most of the PPU state is visible in Mesen's debugger. There's a frame labelled "PPU Status" in there which has the state of $2000, $2001, the current address, and the current scanline/pixel.
In general, debuggers will not return the state of write-only registers from a memory dump of the CPU. There are different behaviours that may be implemented here (open bus, PPU bus storage, 00, FF, etc.) and this is not exactly consistent between emulators. Even with registers that can be read, a CPU memory view or dump may not show their contents, because reading from them may have side effects too. Again, it depends on the emulator; it's kind of a grey area what's appropriate to display for this kind of thing.
Hi, everyone.
I'm trying to resume this project, although at a slow pace...
I'm still trying to understand picture representation. I made a draft tool to represent a Donkey Kong frame from CPU and PPU dump files. Everything is drawn right except for Mario and a flame, which look reversed in one half (see the attached image).
Any ideas on why this might be happening?
Edit: since Peach is drawn properly I guess there's no trouble regarding my drawing algorithm, so is there a bit for a particular sprite that sets it as reversed?
Edit2: I answer my own question: yes, bits 6 and 7 of byte 2 (attributes) of each OAM entry control sprite flipping.
UPDATE: yes, that was the problem. Solved.
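For anyone hitting the same thing, decoding an OAM entry and applying the flip bits could look like the sketch below. The names are made up for illustration; the layout assumed is the usual 4 bytes per sprite (Y, tile index, attributes, X), with bit 6 of the attributes as horizontal flip and bit 7 as vertical flip.

```python
from typing import NamedTuple


class Sprite(NamedTuple):
    y: int                   # raw byte 0 of the entry
    tile: int                # byte 1
    palette: int             # attribute bits 0-1
    behind_background: bool  # attribute bit 5
    flip_h: bool             # attribute bit 6
    flip_v: bool             # attribute bit 7
    x: int                   # byte 3


def decode_oam_entry(oam: bytes, index: int) -> Sprite:
    """Decode sprite number `index` (0..63) from 256 bytes of OAM."""
    y, tile, attr, x = oam[index * 4:index * 4 + 4]
    return Sprite(y, tile, attr & 0x03, bool(attr & 0x20),
                  bool(attr & 0x40), bool(attr & 0x80), x)


def flip_tile(rows: list[list[int]], flip_h: bool, flip_v: bool) -> list[list[int]]:
    """Return a flipped copy of decoded tile rows."""
    rows = [list(row) for row in rows]
    if flip_h:
        rows = [row[::-1] for row in rows]   # mirror each row left-to-right
    if flip_v:
        rows = rows[::-1]                    # reverse the order of the rows
    return rows
```

A character built from two side-by-side sprites often reuses one tile with the horizontal flip bit set on one side, which would explain half of Mario looking reversed when the bit is ignored.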
I'm unable to get NMI working properly with Donkey Kong.
When the CPU arrives at this line
Code:
C7DC 8D 00 20 STA $2000 = 10 A:90 X:00 Y:00 P:A4 SP:FF PPU:208, 47 CYC:64980
bit 7 of $2000 is set to 1 and, given that this happens while a frame is rendering, when it reaches VBLANK this bit won't change, since it's already set. In the next frame this is solved, since STA is done with A:10, but, according to Nintendulator's DK log, this NMI shouldn't be skipped. Any idea on what's going on?
Another question regarding that: why does the first NMI take place on cycle 86972? Isn't it supposed to happen after cycle 27384? (http://wiki.nesdev.com/w/index.php/PPU_power_up_state)
Usually, at least the first two frames' vblank have no /NMI because the CPU is still waiting for the PPU to stabilize before enabling NMI generation in $2000.
That solves my second question, thank you!
However, I still have no clue about the first. Let me clarify a little bit, since my explanation might be confusing:
The first NMI takes place on cycle 86972. This line
Code:
C7DC 8D 00 20 STA $2000 = 10 A:90 X:00 Y:00 P:A4 SP:FF PPU:208, 47 CYC:64980
occurs in the same frame this first NMI happens. This line toggles bit 7 of $2000 and no other instruction accesses this address until some cycles after NMI has been handled. However, AFAIK, for NMI to occur PPU has to be in VBLANK (bit 7 of $2002 toggled) and bit 7 of $2000 must be toggled during VBLANK (i.e. 0->1, but being 1 before VBLANK starts shouldn't trigger NMI, should it? [edge-sensitivity?]).
So, my question is: Why is NMI, indeed, triggered, given what I described? Am I missing something?
HastatusXXI wrote:
for NMI to occur PPU has to be in VBLANK (bit 7 of $2002 toggled) and bit 7 of $2000 must be toggled during VBLANK (i.e. 0->1, but being 1 before VBLANK starts shouldn't trigger NMI, should it? [edge-sensitivity?]).
It is edge-triggered, but it's triggered on (reg2000 bitAND reg2002 bitAND 128). As long as either is true, a rising edge in the other will cause an NMI.
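In emulator terms, that rule could be sketched as below: compute the level `ctrl & status & 0x80` after every change to either register and report an NMI only when that level goes from false to true. The class and method names are hypothetical, and fine timing details (such as reads of $2002 racing with the start of vblank) are not modelled.

```python
class NmiLine:
    """Edge detector on (reg2000 AND reg2002 AND 0x80)."""

    def __init__(self) -> None:
        self.ctrl = 0        # last value written to $2000
        self.status = 0      # $2002; bit 7 is the vblank flag
        self._level = False  # previous value of the combined signal

    def _update(self) -> bool:
        level = bool(self.ctrl & self.status & 0x80)
        fired = level and not self._level   # rising edge only
        self._level = level
        return fired

    def write_ctrl(self, value: int) -> bool:
        self.ctrl = value & 0xFF
        return self._update()

    def start_vblank(self) -> bool:
        self.status |= 0x80
        return self._update()

    def end_vblank(self) -> bool:
        self.status &= 0x7F
        return self._update()

    def read_status(self) -> int:
        value = self.status
        self.status &= 0x7F   # reading $2002 clears the vblank flag
        self._update()
        return value
```

In the Donkey Kong case above, `write_ctrl(0x90)` during rendering returns False, and the later `start_vblank()` returns True, so the NMI fires even though bit 7 of $2000 did not change during vblank.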
That explains it. Thank you!
It occurs to me that the text on the wiki might have misled you. Any suggestions on things we might change to help with that?
Sure. This page http://wiki.nesdev.com/w/index.php/NMI (operation) is well suited to come up with an implementation for NMI, while this other one http://wiki.nesdev.com/w/index.php/CPU_interrupts explains the hardware behaviour (edge-sensitivity). However, the two might seem unrelated at first to novices like me. An effort to connect both pages should be made, in my opinion. Besides, this sentence "The PPU pulls /NMI low if and only if both NMI_occurred and NMI_output are true." in the operation section of the first page needs clarification. I'm unable to get what you explained to me from that.
Sorry for the delay. I'm unable to dedicate to this project all the time I'd like.