Need some help with the PPU

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Need some help with the PPU
by on (#116119)
Alright guys, bear with me please. I've read through all of the documents on the wiki and read through a ton of threads on here and there are still some things I just do not understand with the PPU. Maybe you guys can clear these things up for me.

1. How often are the PPU registers ($2000-$2007) looked at/updated? Does the PPU look at them all every cycle or every frame? Also, when are they checked? Eg. First thing at the start of a cycle/frame, or after a cycle/frame?
2. I'm still fuzzy on how some I/O registers work. For example $2007 is either a read or a write to PPU memory. From what I understand, the PPU checks $2006 twice to get the address that the CPU wants to read/write to and stores the address in a temporary VRAM address. Then the data written to $2007 is written into VRAM by the address specified in the temp address. After the write, the temporary address is incremented. If that's correct, then I get that part. How then, can $2007 be used as a read?
3. What is the base nametable that is referred to in register $2000? Does that just tell the PPU which nametable to read from first?
4. How does horizontal and vertical mirroring work? I've read multiple documents on this and I just don't understand how the nametables are arranged. Are the nametables physically mirrored, physically moved to different nametable locations in VRAM, or does the PPU just read from them in a different order? My original thought for horizontal was that nametable 1 would be on the left side (top and bottom) and nametable 2 would be on the right side (top and bottom). The wiki seems to tell me that all four nametables are used somehow.
5. My current idea on how to run my emulator would be to run the CPU and return the number of cycles the opcode that was executed took. Then I would take that information and run the PPU for 3 * cycles to catch up. For example, say some operation took 7 cycles, I would run the PPU for 21 cycles before executing another opcode. Is this an ok way of doing it?

This is all I have for now. The PPU has been worrying me since I started this project and it's proving to be quite confusing. I'm slowly getting there, though.

Thanks
Re: Need some help with the PPU
by on (#116125)
Dartht33bagger wrote:
1. How often are the PPU registers ($2000-$2007) looked at/updated? Does the PPU look at them all every cycle or every frame? Also, when are they checked? Eg. First thing at the start of a cycle/frame, or after a cycle/frame?

Writes to PPU registers take effect immediately. The CPU and PPU run in parallel, so as soon as the CPU performs a write to one of these registers, the value is sent to the PPU. The time it takes for a write to affect how the picture is rendered varies, though. For example, the first write to $2005 during rendering changes the fine X scroll immediately, but the coarse X scroll does not change until the scanline ends. The second write to $2005 (Y scroll) has no visible effect until the end of the frame. You can only emulate these effects correctly if you perform all the PPU tasks in the same order the PPU does them, in parallel with the CPU.

Quote:
2. I'm still fuzzy on how some I/O registers work. For example $2007 is either a read or a write to PPU memory. From what I understand, the PPU checks $2006 twice to get the address that the CPU wants to read/write to and stores the address in a temporary VRAM address. Then the data written to $2007 is written into VRAM by the address specified in the temp address. After the write, the temporary address is incremented. If that's correct, then I get that part. How then, can $2007 be used as a read?

Exactly the same way. Reads, just like writes, cause the address to increment. The only difference is that there's a delay when reading. When you read from $2007, a buffered value will be returned, and the contents of the address being read go into that buffer, so it will only be read on the next read. Games that read from $2007 will often throw out the first value read because of this. This delay does not happen when reading from the palettes though.
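A minimal C sketch of the buffered read described above. The `Ppu` struct, its field names, and the flat 16 KB array are assumptions made for illustration (a real emulator would route the access through mirroring and the cartridge). On palette reads this sketch simply leaves the buffer alone, which is a simplification.

```c
#include <stdint.h>

/* Hypothetical PPU state; all names are illustrative. */
typedef struct {
    uint8_t  vram[0x4000];  /* flat PPU address space, mirroring ignored */
    uint16_t addr;          /* current VRAM address, set through $2006 */
    uint8_t  read_buffer;   /* internal buffer behind $2007 */
    uint8_t  step;          /* 1 or 32, chosen by bit 2 of $2000 */
} Ppu;

/* Handle a CPU read of $2007. */
static uint8_t ppu_read_2007(Ppu *p)
{
    uint16_t a = p->addr & 0x3FFF;
    uint8_t result;

    if (a >= 0x3F00) {
        /* Palette reads are not delayed. */
        result = p->vram[a];
    } else {
        /* Return the old buffer contents, then refill the buffer. */
        result = p->read_buffer;
        p->read_buffer = p->vram[a];
    }

    /* Reads increment the address just like writes do. */
    p->addr = (uint16_t)((p->addr + p->step) & 0x3FFF);
    return result;
}
```

This is why games usually throw away the first byte after setting the address: it is whatever was left in the buffer.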

Quote:
3. What is the base nametable that is referred to in register $2000? Does that just tell the PPU which nametable to read from first?

Yes, that's the name table where rendering starts (i.e. the name table where the pixel that will show up at the top left corner of the screen is). Unless the scroll is (0, 0), you will see more than one name table at once.

Quote:
4. How does horizontal and vertical mirroring work? I've read multiple documents on this and I just don't understand how the nametables are arranged.

The NES has an addressing range of 4096 bytes dedicated to name tables (which are displayed as a 2x2 grid), but the console has only 2048 bytes of memory for this, so the other 2048 bytes are mirrors. Games get to pick whether the 2 available name tables are arranged horizontally (and mirrored vertically) or vertically (and mirrored horizontally):

Code:
Vertical mirroring (top and bottom look the same):
A B
A B

Horizontal mirroring (left and right look the same):
A A
B B


Quote:
Are the nametables physically mirrored, physically moved to different nametable locations in VRAM, or does the PPU just read from them in a different order?

The PPU address lines are manipulated so that the PPU reads from different parts of the available 2KB of memory.
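In an emulator, that address-line manipulation usually turns into a small mapping function. The sketch below assumes a 2 KB array for the console's nametable RAM; the enum and function name are hypothetical.

```c
#include <stdint.h>

typedef enum { MIRROR_VERTICAL, MIRROR_HORIZONTAL } Mirroring;

/* Map a PPU nametable address ($2000-$2FFF) to an offset into the
 * 2 KB of nametable RAM. Logical tables are numbered 0..3 as
 * top-left, top-right, bottom-left, bottom-right. */
static uint16_t nametable_offset(uint16_t addr, Mirroring m)
{
    uint16_t a      = addr & 0x0FFF;   /* position within the 4 KB range */
    uint16_t table  = a >> 10;         /* logical table 0..3 */
    uint16_t within = a & 0x03FF;      /* offset inside that table */
    uint16_t physical;

    if (m == MIRROR_VERTICAL)
        physical = table & 1;          /* A B / A B */
    else
        physical = table >> 1;         /* A A / B B */

    return (uint16_t)(physical * 0x0400 + within);
}
```

With vertical mirroring, $2000 and $2800 land on the same byte; with horizontal mirroring, $2000 and $2400 do.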

Quote:
My original thought for horizontal was that nametable 1 would be on the left side (top and bottom) and nametable 2 would be on the right side (top and bottom). The wiki seems to tell me that all four nametables are used somehow.

They are arranged that way when vertical mirroring is used. The PPU doesn't know that there are only 2KB, it thinks it's accessing 4KB of data, so as far as it's concerned there are 4 name tables. But since that extra 2KB don't exist, the 2KB that do exist are used twice. You should know that carts do have the option of disabling the internal 2KB of VRAM and supplying 4KB of its own for the name tables (this is called 4-screen "mirroring" - quotes used because there's no mirroring involved, despite the name).

Quote:
5. My current idea on how to run my emulator would be to run the CPU and return the number of cycles the opcode that was executed took. Then I would take that information and run the PPU for 3 * cycles to catch up. For example, say some operation took 7 cycles, I would run the PPU for 21 cycles before executing another opcode. Is this an ok way of doing it?

That should be OK for most games, but this is not how an actual console works. For example, if a program is waiting for a sprite 0 hit, it will constantly read $2002 waiting for the sprite hit flag to get set. If the LDA $2002 instruction starts before the sprite hit, and the hit happens before that instruction ends (i.e. within 12 pixels), your emulator will not see the flag set, even though it should be. You'd have to run things cycle by cycle to catch this.
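For reference, the instruction-level catch-up scheme from the question could look roughly like this in C. The `Cpu` and `Ppu` types here are stand-ins invented for the sketch (the fake CPU just reports a fixed cycle count), and the 3:1 ratio is the NTSC one.

```c
/* Stand-in types for illustration only. */
typedef struct { int next_cost; } Cpu;        /* cycles the next opcode takes */
typedef struct { unsigned long dots; } Ppu;   /* PPU cycles executed so far */

/* Pretend to execute one instruction and return its cycle count. */
static int cpu_step(Cpu *cpu) { return cpu->next_cost; }

/* Advance the PPU by one dot (one PPU cycle). */
static void ppu_tick(Ppu *ppu) { ppu->dots++; }

/* Run one CPU instruction, then let the PPU catch up. */
static void run_one_instruction(Cpu *cpu, Ppu *ppu)
{
    int cycles = cpu_step(cpu);
    for (int i = 0; i < cycles * 3; i++)   /* 3 PPU dots per CPU cycle */
        ppu_tick(ppu);
}
```

The limitation described above comes from the ordering: every memory access of the instruction happens before the PPU has advanced at all.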
Re: Need some help with the PPU
by on (#116166)
Thank you! Things are starting to make a lot more sense now. A few more quick questions.

1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?
2. When a $2007 read occurs, where does the data read go to? Does the PPU just send the read byte to $2007 for the CPU?
3. I want to try to get SMB1 to run since it has no mapper, but I know it's a tricky game to emulate. I also know it uses the sprite 0 hit flag. I'm assuming I need to use something like openMP to get the CPU and PPU running in parallel, but I'm stumped on how to run the CPU cycle by cycle. How do I, for example, split up an add with carry operation into multiple steps so that only a little of the operation happens per cycle?

I won't work on question 3 for a while since I just want to get a game working first.
Re: Need some help with the PPU
by on (#116168)
1. There's an R/W pin on the CPU. When it's clocked, the pin is "read" and the action is done on the PPU, whether it be a read or a write.

2. The byte first goes to a buffer and is not directly read out. All reads are delayed by one. I'm not exactly sure where or how it's stored, but that's how it works.

3. You do that by looking at the 6502 cycle-by-cycle operations for the instructions and implement them like that. :)
Re: Need some help with the PPU
by on (#116170)
3gengames wrote:
2. The byte first goes to a buffer and is not directly read out. All reads are delayed by one. I'm not exactly sure where or how it's stored, but that's how it works.

Except for reads from palette memory (Those aren't delayed). I don't know if any games care, though.
Re: Need some help with the PPU
by on (#116173)
Trust me games probably care, that's a big difference. But yep, that is right.
Re: Need some help with the PPU
by on (#116192)
Dartht33bagger wrote:
1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?

What? Where did you get this zero/non-zero idea from? Like 3gengames said, one of the CPU pins indicates whether it's trying to read or write data, and the PPU uses that to tell reads and writes apart.

Fun fact: the Atari 2600 doesn't send the R/W signal to the cart, so carts with extra RAM use one address range for writing and another range for reading. For example, a cart with 256 bytes of RAM will make this memory writable at $1000-$10FF and readable at $1100-$11FF (it uses an address line to select between reading/writing).

Quote:
2. When a $2007 read occurs, where does the data read go to? Does the PPU just send the read byte to $2007 for the CPU?

When $2007 is read, the buffered value is sent to the CPU and the value read from the PPU goes to the buffer.

Quote:
I'm assuming I need to use something like openMP to get the CPU and PPU running in parallel

It doesn't have to be anything fancy; you can simply alternate between emulating the 2 chips. Things only get complex if you want both accuracy AND speed. Things get much simpler if you only care about one or the other.

Quote:
but I'm stumped on how to run the CPU cycle by cycle. How do I, for example, split up an add with carry operation into multiple steps so that only a little of the operation happens per cycle?

There are documents such as this one (scroll down) that explain what happens on each cycle of the various instructions.

Quote:
I won't work on question 3 for a while since I just want to get a game working first.

Yeah, you shouldn't bother with cycle accurate emulation for now.
Re: Need some help with the PPU
by on (#116195)
Thank you!

One final question for now: How do I emulate the CPU pin for reads and writes on $2007? Do certain opcodes tell the CPU to read or write from that register?
Re: Need some help with the PPU
by on (#116205)
The most commonly encountered CPU instructions for this are LDA $2007 (read) and STA $2007 (write). There are others.


EDIT: mislead less
Re: Need some help with the PPU
by on (#116222)
Dartht33bagger wrote:
How do I emulate the CPU pin for reads and writes on $2007? Do certain opcodes tell the CPU to read or write from that register?

Emulators don't usually emulate individual pins. This would actually be a good idea if it weren't so painfully slow.

You will have to emulate all the instructions one by one, so you absolutely MUST know whether an instruction is reading or writing. Emulators usually have a method that handles CPU writes (that all store instructions can call) and another one that handles CPU reads (that all load instructions can call), and these methods perform a range check to know what to do when particular addresses are accessed. If you detect that a write is being made to $2000-$2007 (or mirrors of that range) you call the appropriate PPU methods to process the write. The same goes for reads, and you pass along to the CPU whatever the PPU returns.
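A rough C sketch of such read/write handlers, assuming a hypothetical `Bus` struct. The `ppu_regs` array is only a placeholder for real PPU register logic, and everything above $3FFF (APU, controllers, cartridge) is left out.

```c
#include <stdint.h>

/* Hypothetical bus state for illustration. */
typedef struct {
    uint8_t ram[0x0800];   /* 2 KB internal CPU RAM */
    uint8_t ppu_regs[8];   /* placeholder for real PPU register handlers */
} Bus;

static uint8_t ppu_reg_read(Bus *b, unsigned reg)
{
    return b->ppu_regs[reg];
}

static void ppu_reg_write(Bus *b, unsigned reg, uint8_t value)
{
    b->ppu_regs[reg] = value;
}

/* Every load-type instruction goes through here. */
static uint8_t cpu_read(Bus *b, uint16_t addr)
{
    if (addr < 0x2000) return b->ram[addr & 0x07FF];          /* RAM and mirrors */
    if (addr < 0x4000) return ppu_reg_read(b, addr & 0x0007); /* $2000-$2007 and mirrors */
    return 0;                                                 /* not handled in this sketch */
}

/* Every store-type instruction goes through here. */
static void cpu_write(Bus *b, uint16_t addr, uint8_t value)
{
    if (addr < 0x2000) b->ram[addr & 0x07FF] = value;
    else if (addr < 0x4000) ppu_reg_write(b, addr & 0x0007, value);
}
```

Because the decision is made by which handler is called, the instruction's opcode never needs to be inspected by the PPU code.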
Re: Need some help with the PPU
by on (#116223)
Speaking of $2007, there's something I'm unsure of. Looking at the "skinny on nes scrolling" page, I can see that X and Y scrolling updates every now and then during a frame, sometimes resulting in a wrap-around and nametable toggle. Does this apply to reads and writes to $2007? Based on most text on the wiki, it seems like 1 or 32 just gets added to the vram address, with no wrapping or anything. Is that correct?

Oh, and this: "If rendering is enabled" - is this if certain bits are set in $2001? 0x1E? Or is 0xA enough (for BG rendering)?
Re: Need some help with the PPU
by on (#116226)
fred wrote:
Speaking of $2007, there's something I'm unsure of. Looking at the "skinny on nes scrolling" page, I can see that X and Y scrolling updates every now and then during a frame, sometimes resulting in a wrap-around and nametable toggle. Does this apply to reads and writes to $2007? Based on most text on the wiki, it seems like 1 or 32 just gets added to the vram address, with no wrapping or anything. Is that correct?


Yes.

fred wrote:
Oh, and this: "If rendering is enabled" - is this if certain bits are set in $2001? 0x1E? Or is 0xA enough (for BG rendering)?

Enabling either background or sprite rendering is sufficient - even if only one is enabled, the PPU still does all of the work to render both (it just discards whichever one is turned off when it comes time to output the pixels themselves).
Re: Need some help with the PPU
by on (#116227)
fred wrote:
Speaking of $2007, there's something I'm unsure of. Looking at the "skinny on nes scrolling" page, I can see that X and Y scrolling updates every now and then during a frame, sometimes resulting in a wrap-around and nametable toggle. Does this apply to reads and writes to $2007? Based on most text on the wiki, it seems like 1 or 32 just gets added to the vram address, with no wrapping or anything. Is that correct?


Accessing $2007 during VBlank or when rendering is disabled (which means that neither background nor sprite rendering is enabled in $2001, i.e. that bits 3 and 4 are both zero) increments the address linearly by either 1 or 32. Accessing $2007 outside of VBlank with rendering enabled (this is seldom done) performs a glitchy update that takes its parts from the kinds of updates that are normally done during rendering.

Internally, the same register (loopy_v) is used both to hold the address for $2006/$2007 and during rendering to keep track of the current nametable location being rendered (along with fine x). Saves on hardware.
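The two checks discussed here can be sketched in C as follows. The function names are made up; the bit positions are the ones given in the thread for $2001 (bits 3 and 4) plus bit 2 of $2000 for the increment size. Only the simple linear case is covered, not the glitchy mid-rendering behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rendering counts as enabled if background (bit 3) or sprites (bit 4)
 * are turned on in $2001. */
static bool rendering_enabled(uint8_t mask)
{
    return (mask & 0x18) != 0;
}

/* Bit 2 of $2000 selects the $2007 address increment: 1 or 32. */
static uint16_t vram_step(uint8_t ctrl)
{
    return (ctrl & 0x04) ? 32 : 1;
}

/* Linear increment applied after a $2007 access during VBlank or with
 * rendering disabled. The address register is 15 bits wide. */
static uint16_t after_2007_access(uint16_t v, uint8_t ctrl)
{
    return (uint16_t)((v + vram_step(ctrl)) & 0x7FFF);
}
```

So a $2001 value of 0x0A already counts as rendering enabled, since bit 3 is set.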
Re: Need some help with the PPU
by on (#116230)
Ah, that clears it up. Thanks to you both!
Re: Need some help with the PPU
by on (#116243)
tokumaru wrote:
Dartht33bagger wrote:
1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?

What? Where did you get this zero/non-zero idea from? Like 3gengames said, one of the CPU pins indicates whether it's trying to read or write data, and the PPU uses that to tell reads and writes apart.


Avoid technical or low level things. He's writing an emulator.

How does the PPU know if $2007 wants a read or a write?
- First, you need the 6502 program code. You should trap reads at the LDA $2007 instruction and writes at the STA $2007 instruction. Look at the LDA/STA timing diagrams. Easy.
Re: Need some help with the PPU
by on (#116249)
Zepper wrote:
Avoid technical or low level things. He's writing an emulator.

Well, he did ask how the PPU does it, and that's how... I can't lie to him, or even dumb it down. But I didn't provide any details either, or say that he should emulate this control line.

Quote:
You should trap reads at the LDA $2007 instruction and writes at the STA $2007 instruction.

You and tepples are talking about LDA and STA like those are the ONLY instructions that can access $2007. Personally, I think that might confuse the OP. Several instructions (with different addressing modes!) can write/read to/from memory, and all of them can access $2007... no reason to hide that from him.
Re: Need some help with the PPU
by on (#116262)
Zepper wrote:
tokumaru wrote:
Dartht33bagger wrote:
1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?

What? Where did you get this zero/non-zero idea from? Like 3gengames said, one of the CPU pins indicates whether it's trying to read or write data, and the PPU uses that to tell reads and writes apart.


Avoid technical or low level things. He's writing an emulator.

How does the PPU know if $2007 wants a read or a write?
- First, you need the 6502 program code. You should trap reads at the LDA $2007 instruction and writes at the STA $2007 instruction. Look at the LDA/STA timing diagrams. Easy.


This is exactly what I needed. I didn't know that reading $2007 would tell the PPU to read data for the CPU. I was thinking that reading $2007 would just give you back the data that you wrote to $2007 in the past.
Re: Need some help with the PPU
by on (#116274)
tokumaru wrote:
Zepper wrote:
Avoid technical or low level things. He's writing an emulator.

Well, he did ask how the PPU does it, and that's how... I can't lie to him, or even dumb it down. But I didn't provide any details either, or say that he should emulate this control line.


He asked for an easy PPU read/write... from an emulation point of view.

Quote:
You and tepples are talking about LDA and STA like those are the ONLY instructions that can access $2007. Personally, I think that might confuse the OP. Several instructions (with different addressing modes!) can write/read to/from memory, and all of them can access $2007... no reason to hide that from him.


That's obvious. I gave a quick example, since I mentioned checking the instruction timing to see where reads or writes occur.
Re: Need some help with the PPU
by on (#116281)
Zepper wrote:

You know it's obvious, and I know it's obvious... but does he? :wink:
Re: Need some help with the PPU
by on (#116286)
tokumaru wrote:
Zepper wrote:

You know it's obvious, and I know it's obvious... but does he? :wink:


I don't know. I suppose he has 6502 emulation working, but I can't answer for him. -_-;;
Re: Need some help with the PPU
by on (#116379)
My 6502 emulator is working correctly :)

A few more questions for you guys. I'm having a hard time figuring out how an 8x8 tile is rendered in a scanline. From my understanding of how the PPU renders a background pixel from this diagram, it goes something like this:

1. Fetch name table byte
2. Fetch attribute table byte two cycles later
3. Fetch lower background byte two cycles later
4. Fetch upper background byte two cycles later
5. 8 cycles after the first fetch, the attribute byte and the two background tile bytes are loaded into shift registers.

Now the PPU shifts the registers every cycle and creates a pixel from the four bits it fetches from the shift registers. That's great and it makes sense. The part that confuses me is how I'm supposed to draw an 8x8 tile in a scanline. Does only the top 8x1 pixels in a tile get rendered on the first scanline? Does the PPU switch between many different tiles on the first scanline and then come back to the same tiles when it needs to draw the next scanline?

Also, almost everything about scrolling makes sense to me but this line:

Quote:
The low 3 bits of X and Y sent to $2005 are the fine pixel offset within the 8x8 tile. The X component goes into the separate x register, which just selects one of 8 pixels coming out of a set of shift registers.


How exactly does it "select" the pixel coming out of the shift register?
Re: Need some help with the PPU
by on (#116380)
Dartht33bagger wrote:
The part that confuses me is how I'm supposed to draw an 8x8 tile in a scanline. Does only the top 8x1 pixels in a tile get rendered on the first scanline?

Yes. Each scanline covers 34 tiles, and only a sliver of each tile gets rendered.

Quote:
Does the PPU switch between many different tiles on the first scanline and then come back to the same tiles when it needs to draw the next scanline?

Yes. It normally* visits a particular set of 34 tiles eight times.

Quote:
Quote:
The low 3 bits of X and Y sent to $2005 are the fine pixel offset within the 8x8 tile. The X component goes into the separate x register, which just selects one of 8 pixels coming out of a set of shift registers.

How exactly does it "select" the pixel coming out of the shift register?

With these:
[Images not preserved in this archive]

The eight pixels coming out of the shift register are fed into a set of 8-to-1 multiplexers indexed by the three bits of fine X scroll.


* The top and bottom row of tiles in a scrolled screen may be visited fewer than eight times, as are many tiles that form part of a raster effect.
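In software, the multiplexer usually becomes a bit select. The sketch below assumes a common emulator convention, not a confirmed hardware layout: each pattern shift register is 16 bits wide, shifts left once per dot, and holds the current tile in its high byte. The function name is hypothetical.

```c
#include <stdint.h>

/* Pick the 2-bit background pattern value for the current dot.
 * fine_x (0..7) chooses which of the 8 leading bits is tapped,
 * which is what the 8-to-1 multiplexers do in hardware. */
static uint8_t bg_pixel(uint16_t pattern_lo, uint16_t pattern_hi, uint8_t fine_x)
{
    uint16_t tap = (uint16_t)(0x8000u >> (fine_x & 7));
    uint8_t lo = (pattern_lo & tap) ? 1 : 0;   /* lower color bit */
    uint8_t hi = (pattern_hi & tap) ? 1 : 0;   /* upper color bit */
    return (uint8_t)((hi << 1) | lo);
}
```

With fine X at 0 the leftmost bit is used; larger values tap bits further along, which shifts the visible picture left by that many pixels.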
Re: Need some help with the PPU
by on (#116390)
I'm really close to understanding how this all works now. Two final questions (hopefully).

1. When the attribute shift registers are filled, the attribute byte fetched is fed through a 4 to 1 mux that selects two bits in the attribute byte based on bit 1 of coarse X and coarse Y. The two bits that are selected by the mux are used to fill the entirety of the two attribute shift registers, correct?

For example, say my attribute byte is 11100100. Then let's say bit 1 of coarse X is 1 and bit 1 of coarse Y is 0. That would make my mux choose 01 from the attribute byte, which would make my two attribute shift registers look like this: 01010101. Is this a correct understanding of the process?

2. How are the pattern table addresses created? I know how to find the nametable address and read a byte from that location, which will give me a value of up to $FF. Then I check the PPU register status to see what pattern table the background is held in. If it's held in the second table, I add $1000. That's fine.

However, I see that on the wiki, the valid pattern table addresses are from $0000-$0FF7 and $1000-$1FF7. How are these addresses made? The only way I could think of it happening is shifting the byte read from the nametable to the left 4 bits and then adding fine x scroll to the address (since fine x scroll can't be larger than 7).
Re: Need some help with the PPU
by on (#116394)
Dartht33bagger wrote:
For example, say my attribute byte is 11100100. Then let's say bit 1 of coarse X is 1 and bit 1 of coarse Y is 0. That would make my mux choose 01 from the attribute byte, which would make my two attribute shift registers look like this: 01010101. Is this a correct understanding of the process?


No - one shift register will get filled with 1s and the other will get filled with 0s.
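Worked through in C for the example byte, as a sketch with made-up function names: the selected pair is 01, so the register fed from the low bit becomes all 1s and the register fed from the high bit becomes all 0s.

```c
#include <stdint.h>

/* Select the 2 attribute bits for a tile from its attribute byte,
 * using bit 1 of coarse X and bit 1 of coarse Y (the 4-to-1 mux). */
static uint8_t attribute_bits(uint8_t attr_byte, uint8_t coarse_x, uint8_t coarse_y)
{
    unsigned shift = ((coarse_y & 2u) << 1) | (coarse_x & 2u);  /* 0, 2, 4 or 6 */
    return (uint8_t)((attr_byte >> shift) & 3);
}

/* Each attribute shift register is filled with copies of ONE of the
 * two selected bits, not with the 2-bit pattern repeated. */
static void fill_attribute_shifters(uint8_t bits, uint8_t *lo, uint8_t *hi)
{
    *lo = (bits & 1) ? 0xFF : 0x00;
    *hi = (bits & 2) ? 0xFF : 0x00;
}
```

For attribute byte 11100100 with coarse X bit 1 set and coarse Y bit 1 clear, `attribute_bits` yields 01, and the two registers end up as 11111111 and 00000000.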

Dartht33bagger wrote:
2. How are the pattern table addresses created? I know how to find the nametable address and read a byte from that location, which will give me a value of up to $FF. Then I check the PPU register status to see what pattern table the background is held in. If it's held in the second table, I add $1000. That's fine.

However, I see that on the wiki, the valid pattern table addresses are from $0000-$0FF7 and $1000-$1FF7. How are these addresses made? The only way I could think of it happening is shifting the byte read from the nametable to the left 4 bits and then adding fine x scroll to the address (since fine x scroll can't be larger than 7).


It does indeed shift the nametable byte to the left 4 bits, but it adds the fine Y scroll to the address, not the fine X scroll (why would it add fine X scroll when each byte represents one scanline?). Also, remember that the PPU fetches two pattern bytes per tile, where the second one is simply the first one plus 8 (and contains the upper color bits for that tile).
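As a small C sketch of that address calculation (function names are hypothetical; the background pattern table select is taken to be bit 4 of $2000):

```c
#include <stdint.h>

/* Address of the lower bit-plane byte for one row of a background tile. */
static uint16_t pattern_addr_lo(uint8_t ctrl, uint8_t tile, uint8_t fine_y)
{
    uint16_t base = (ctrl & 0x10) ? 0x1000 : 0x0000;   /* pattern table select */
    return (uint16_t)(base + ((uint16_t)tile << 4) + (fine_y & 7));
}

/* The upper bit-plane byte sits 8 bytes after the lower one. */
static uint16_t pattern_addr_hi(uint8_t ctrl, uint8_t tile, uint8_t fine_y)
{
    return (uint16_t)(pattern_addr_lo(ctrl, tile, fine_y) + 8);
}
```

Tile $FF on its last row gives $0FF7 for the lower plane, which matches the end of the range quoted from the wiki, and $0FFF for the upper plane.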
Re: Need some help with the PPU
by on (#116532)
So I've gotten the background part of my ppu written and I'm trying to test it, but I'm failing miserably. I'm just trying to get one frame rendered at the very least. Right now, I'm not even reaching any of my PPU functions because $2000 stays 0x00 and $2001 stays 0x10. I know my CPU works because it passed the nestest.nes log (well, at least up until the illegal opcodes kicked in), but my cpu gets stuck in a loop as well.

So I'm thinking at this point I'm missing something important about how the system starts up. I've read the power up states for both the PPU and the CPU on the wiki, but I'm having a hard time understanding how the system starts up. I know the system boots up, the program counter is set to the reset vector, and then the PPU does something for a certain amount of cycles before telling the CPU that it's clear to go.

How exactly does the "handshake" between the PPU and the CPU occur? Is the PPU rendering frames at this point? Do NMIs occur at this point, or do they wait until the handshake is complete?
Re: Need some help with the PPU
by on (#116534)
Dartht33bagger wrote:
So I've gotten the background part of my ppu written and I'm trying to test it, but I'm failing miserably. I'm just trying to get one frame rendered at the very least. Right now, I'm not even reaching any of my PPU functions because $2000 stays 0x00 and $2001 stays 0x10. I know my CPU works because it passed the nestest.nes log (well, at least up until the illegal opcodes kicked in), but my cpu gets stuck in a loop as well.

So I'm thinking at this point I'm missing something important about how the system starts up. I've read the power up states for both the PPU and the CPU on the wiki, but I'm having a hard time understanding how the system starts up. I know the system boots up, the program counter is set to the reset vector, and then the PPU does something for a certain amount of cycles before telling the CPU that it's clear to go.

How exactly does the "handshake" between the PPU and the CPU occur? Is the PPU rendering frames at this point? Do NMIs occur at this point, or do they wait until the handshake is complete?


The CPU and PPU run in isolation and only interact via reads/writes from the CPU and NMIs from the PPU. On the NES, the PPU always starts at (0,0) in this diagram on power-up and reset (I think the reset button on the Famicom doesn't reset the PPU, so there it could start anywhere within the frame). The PPU is always "running", so there's nothing special that needs to be done to get it going.

The only complication is that certain registers won't work (are zeroed out) until the PPU has run for one frame. On startup, you'll see games e.g. polling the VBlank flag in $2002 in a loop to determine when the PPU is safe to use. I don't know of any games that depend on certain PPU regs not working on start-up though, so you can safely treat all PPU regs as working right away when starting out.

If you haven't done so already, I'd recommend implementing CPU tracing so you can see what your emulator is doing. On startup you should get something like the following (taken from Donkey Kong):

Code:
C79E: sei
C79F: cld
C7A0: lda #$10
C7A2: sta $2000
C7A5: ldx #$FF
C7A7: txs     
C7A8: lda $2002 <-+
C7AB: and #$80    | VBlank flag polling loop
C7AD: beq $C7A8 --+


Once you get that, you can then see that the PPU register accesses in that code are handled as expected, using your favourite debugging method.
Re: Need some help with the PPU
by on (#116540)
Dartht33bagger wrote:
How exactly does the "handshake" between the PPU and the CPU occur?

Are you setting bit 7 of $2002 to indicate that VBlank has started? The PPU needs about a frame to become stable after power up, so all games will wait 2 or 3 frames by polling this flag before doing anything else. If you are not consistently setting/clearing the flag games will get stuck in this wait loop.
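A bare-bones C sketch of the flag handling, assuming NTSC timing (341 dots per scanline, 262 scanlines, flag set at scanline 241 dot 1 and cleared on the pre-render line). The struct and function names are invented, and the other side effects of reading $2002 are left out.

```c
#include <stdint.h>

/* Hypothetical minimal PPU state. */
typedef struct {
    uint8_t status;    /* $2002; bit 7 is the VBlank flag */
    int     scanline;  /* 0..261 on NTSC */
    int     dot;       /* 0..340 */
} Ppu;

/* Advance the PPU by one dot, updating the VBlank flag. */
static void ppu_tick(Ppu *p)
{
    if (p->scanline == 241 && p->dot == 1)
        p->status |= 0x80;             /* VBlank starts */
    if (p->scanline == 261 && p->dot == 1)
        p->status &= 0x7F;             /* cleared on the pre-render line */

    if (++p->dot == 341) {
        p->dot = 0;
        if (++p->scanline == 262)
            p->scanline = 0;
    }
}

/* CPU read of $2002: return the status, then clear the VBlank flag. */
static uint8_t ppu_read_2002(Ppu *p)
{
    uint8_t value = p->status;
    p->status &= 0x7F;
    return value;
}
```

If the flag never gets set, a polling loop like the one games run at startup will spin forever.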

Quote:
Is the PPU rendering frames at this point?

The PPU is always rendering something. Even if background and sprites are disabled, it will render the background color.

Quote:
Do NMIs occur at this point, or do they wait until the handshake is complete?

NMIs will only occur if bit 7 of $2000 is set. Most games will only enable NMIs after everything is stable and initialized. It should be possible to have them enabled right from the start though.
Re: Need some help with the PPU
by on (#116723)
Another question about rendering turned on/off. If background rendering is turned off, does the ppu still go through the attribute/nametable/tile fetch and all that, or does the ppu just idle and render pixels from whatever palette value is at $3F00?
Re: Need some help with the PPU
by on (#116725)
Dartht33bagger wrote:
If background rendering is turned off, does the ppu still go through the attribute/nametable/tile fetch and all that

Yes, so long as sprite rendering is turned on. If at least one of background rendering and sprite rendering is turned on, all the background processing still happens. The background pixels are just replaced with 0 before they hit the compositor (the part of the PPU where the pixels from the sprite unit and background unit are combined). In my block diagram, this replacement with 0 happens in the D-shaped AND gates just to the left of the compositor. This results in display of the color at $3F00 unless an opaque sprite is in the way.

If both background and sprite rendering are turned off, the output is the color at $3F00, unless the VRAM address is in $3F00-$3FFF in which case the output is the color at the VRAM address.
Re: Need some help with the PPU
by on (#116737)
So I've spent about 8 hours debugging now and I'm kind of stuck. I'm currently just trying to render the background before I move onto sprites, and everything looks like it *should* be working correctly in my code. The screen shows a different story, though. I'm using Donkey Kong as a test game and I'm getting black, orange and gray bars on the screen. Does anyone have any idea why this is happening? I can't seem to pinpoint the issue since my cpu seems to be filling VRAM correctly, and my PPU is reading from valid addresses.
Re: Need some help with the PPU
by on (#116738)
How do you know it's "filling VRAM correctly?" Did you take a PPU RAM dump from FCEUX and compare it to a dump from your emulator? I've attached a PPU RAM dump for Donkey Kong so you can compare the two (fc /b, vimdiff, etc.) and see what's going on. The PPU RAM dump was taken at the Donkey Kong title screen.

The number of things wrong in your shot is almost infinite.

I would recommend you run your emulator through blargg's CPU tests first. My gut feeling is that your 6502 code may be doing the wrong thing.

Also a good starting ROM is Mario Brothers (not Super, just standard Mario Brothers).
Re: Need some help with the PPU
by on (#116752)
Dartht33bagger wrote:
So I've spent about 8 hours debugging now and I'm kind of stuck. I'm currently just trying to render the background before I move onto sprites, and everything looks like it *should* be working correctly in my code. The screen shows a different story, though. I'm using Donkey Kong as a test game and I'm getting black, orange and gray bars on the screen. Does anyone have any idea why this is happening? I can't seem to pinpoint the issue since my cpu seems to be filling VRAM correctly, and my PPU is reading from valid addresses.


It's like the nametables are incorrectly being rendered, or incorrectly written into VRAM. Checking that bin koitsu posted against your VRAM will narrow that down. Or yeah, maybe the CPU. CPU emulation can be very hard if this is your first try. Another starter ROM you might want to try is Lode Runner. It's one of the first I got working in my emu.
Re: Need some help with the PPU
by on (#116856)
It seems that my CPU was off a little bit. I corrected two small errors and then decided I would add in all of the illegal opcodes as well so that I could complete the nestest.nes test. I wasn't able to find blargg's test anywhere, though.

I'm using this document to code my illegal opcodes and I've run into a roadblock. I cannot get my DCP (opcode $C3) instruction to output the correct status flags after an operation. In the document I've been using, it says that DCP only changes the carry flag. However, the nestest log seems to change the zero flag and negative flag as well.

For example, at PC = $E92E in the nestest.log the DCP instruction doesn't change the processor status flags at all, even though the result of the instruction should set the negative flag. A few instruction down at PC = $E949, the same DCP instruction is ran, and the negative flag and zero flag are turned off. I can't figure how to replicate this result.

So I'm wondering: is there a better document out there that explains the illegal opcodes like this site does for the legal opcodes? The illegal opcodes page on the wiki tells me nothing and I can't find any other document on the net that goes into detail about the instructions.
Re: Need some help with the PPU
by on (#116863)
What are you doing computation-wise for DCP? My CPU passes all of the nestest.nes illegal ops. For DCP I just do a DEC followed by a CMP operation, then move to the next instruction. That's where DCP gets its mnemonic. It's a concatenation of the two.
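To make the DEC-then-CMP idea concrete, here is a minimal sketch. The `Flags` struct and the `dcp` function are hypothetical names made up for illustration, not code from any emulator in this thread. The point it tries to show is that the compare is A against the decremented memory value, so all three of C, Z and N come from that comparison.

```cpp
#include <cstdint>

// Hypothetical flag holder, just for this sketch.
struct Flags {
    bool carry;
    bool zero;
    bool negative;
};

// DCP: decrement the memory operand, then compare A against the new value.
// 'mem' stands in for the already-fetched operand; a real core would also
// write the decremented value back to memory.
Flags dcp(std::uint8_t a, std::uint8_t &mem) {
    mem = static_cast<std::uint8_t>(mem - 1);                // DEC part
    std::uint8_t diff = static_cast<std::uint8_t>(a - mem);  // CMP part: A - M
    Flags f;
    f.carry    = a >= mem;             // set when no borrow is needed
    f.zero     = a == mem;
    f.negative = (diff & 0x80) != 0;   // bit 7 of A - M, not of M itself
    return f;
}
```

If N comes out wrong while Z and C look right, it may be worth checking which two values are actually being subtracted.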
Re: Need some help with the PPU
by on (#116871)
Dartht33bagger wrote:
It seems that my CPU was off a little bit. I corrected two small errors and then decided I would add in all of the illegal opcodes as well so that I could complete the nestest.nes test. I wasn't able to find blargg's test anywhere, though.

Relevant links:

viewtopic.php?p=111810#p111810
viewtopic.php?f=3&t=3966

In general you can find his test suites here: http://blargg.8bitalley.com/nes-tests/

I recommend you stay away from implementing illegal opcodes for the time being -- focus on getting your 6502 CPU core with legal/documented opcodes working correctly first. Worry about illegal opcodes later (there's only one commercial game that I know of which uses an illegal opcode anyway). You're going to spend a lot more time dealing with emulating the PPU and/or mappers anyway. :-)
Re: Need some help with the PPU
by on (#116875)
It looks like I'll have to implement MMC1 to use blargg's official opcode test from looking at the header. That's going to take some time...and I don't really understand how the output at $6004 works. Do I just dump a range of memory after a while to see what opcodes I missed?

As for how I was doing DCP, I originally was just decrementing the memory value since that's all the document I was using said it was. Now that I'm using a compare as well, the zero and carry flags are set correctly, but my negative flag is still being set when it shouldn't be. I need my processor status to be $64 and I'm getting $E4. Right now I'm comparing the original data to the original data minus one.
Re: Need some help with the PPU
by on (#116885)
I agree with koitsu about the illegal ops, but if you really want to have a look at this:
http://wiki.nesdev.com/w/index.php/Prog ... al_opcodes

That explains what they do pretty well. All of the useful ones are essentially just existing opcodes combined like DCP is. Here's how I handle them:

Code:
//undocumented instructions
#ifdef UNDOCUMENTED
    static void lax() {
        lda();
        ldx();
    }

    static void sax() {
        sta();
        stx();
        putvalue(a & x);
    }

    static void dcp() {
        dec();
        cmp();
    }

    static void isb() {
        inc();
        sbc();
    }

    static void slo() {
        asl();
        ora();
    }

    static void rla() {
        rol();
        and();
    }

    static void sre() {
        lsr();
        eor();
    }

    static void rra() {
        ror();
        adc();
    }
#else
    #define lax nop
    #define sax nop
    #define dcp nop
    #define isb nop
    #define slo nop
    #define rla nop
    #define sre nop
    #define rra nop
#endif


Pretty straightforward, really. putvalue just writes a value into the destination operand.
Re: Need some help with the PPU
by on (#116895)
Dartht33bagger wrote:
It looks like I'll have to implement MMC1 to use blargg's official opcode test from looking at the header. That's going to take some time...and I don't really understand how the output at $6004 works. Do I just dump a range of memory after a while to see what opcodes I missed?

There should be individual test ROMs that don't need any mapper. As for output, just dump $6000-$7FFF to a text file after running the test ROM for say a minute (you could run your emulator as fast as it can go so this happens more quickly). Then examine the text file.
Re: Need some help with the PPU
by on (#116913)
So I've been trying to get my output to dump correctly. My RAM is of type unsigned char and I'm having a hard time figuring out how to print out the output text. If I don't cast the data as (int), gedit refuses to open the output txt document because of "invalid UTF-8 input". So I've outputted the data as hex values, but I'm not sure how to read the output as hex values.

For example, my output from $6000 to $6016 looks like this right now(all hex values):

6000 - 0
6001 - de
6002 - b0
6003 - 61
6004 - a
6005 - 30
6006 - 31
6007 - 2d
6008 - 62
6009 - 61
600a - 73
600b - 69
600c - 63
600d - 73
600e - a
600f - a
6010 - 50
6011 - 61
6012 - 73
6013 - 73
6014 - 65
6015 - 64
6016 - a

Then there are zeros until another set of values start around $60F1.
Re: Need some help with the PPU
by on (#116914)
It's a null-terminated string at $6004:
readme.txt wrote:
Output at $6000
---------------
All text output is written starting at $6004, with a zero-byte terminator at the end. As more text is written, the terminator is moved forward, so an emulator can print the current text at any time.

The test status is written to $6000. $80 means the test is running, $81 means the test needs the reset button pressed, but delayed by at least 100 msec from now. $00-$7F means the test has completed and given that result code.

To allow an emulator to know when one of these tests is running and the data at $6000+ is valid, as opposed to some other NES program, $DE $B0 $G1 is written to $6001-$6003.
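As a rough sketch of how an emulator could consume this, the helpers below check the signature and pull out the text. `sram` is a hypothetical buffer where index 0 corresponds to $6000, and both function names are invented for this example. The third signature byte is taken as $61 because that is what the dump above shows at $6003; treat that as an assumption drawn from the dump rather than from the readme text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if $6001-$6003 hold the signature bytes seen in the dump
// posted above ($DE $B0 $61). Index 0 of 'sram' is assumed to be $6000.
bool hasTestSignature(const std::vector<std::uint8_t> &sram) {
    return sram.size() >= 4 &&
           sram[1] == 0xDE && sram[2] == 0xB0 && sram[3] == 0x61;
}

// Collects the zero-terminated text that starts at $6004 (offset 4 here).
std::string readTestText(const std::vector<std::uint8_t> &sram) {
    std::string text;
    for (std::size_t i = 4; i < sram.size() && sram[i] != 0; ++i) {
        text += static_cast<char>(sram[i]);
    }
    return text;
}
```

Printing the string as characters instead of casting each byte to int avoids having to decode hex by hand.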
Re: Need some help with the PPU
by on (#116915)
$G1? G?

I see "basics" and "Passed" while trying to decode ASCII in my head.
Re: Need some help with the PPU
by on (#116918)
I passed 01-basics and 02-implied. 03-immediate failed but only on illegal opcodes, so I moved on. Now 04-zero_page is really burning me. I'm not even getting a failed message, just a string of easy opcodes...

Output for 04-zero_page:

A6 LDX z, A4 LDY z, 85 STA z, 86 STX z, 84 STY z, E6 INC z, C6 DEC z, 06 ASL z, 46 LSR z
26 ROL z, 66 ROR z, 65 ADC z, E5 SBC z, 05 ORA z, 25 AND z, 45 EOR z, 24 BIT z, C5 CMP z
E4 CPX z, C4 CPY z, 04 DOP z, 44 DOP z, 64 DOP z, 07 SLO z

I don't understand how I'm missing $A6 LDX (or most of the others) since they seem so simple:
Code:
case 0xA6:   //Zeropage load X
         temp1 = memory->readRAM(PC, ppu);
         X = memory->readRAM(temp1, ppu);
         Z = !(X);
         N = X & 0x80;
         cycles =  3;
         PC++;
         break;
Re: Need some help with the PPU
by on (#116937)
Something else broken? Odd that the earlier tests don't fail. They do catch your unofficial instructions not being right, so it doesn't look like whatever's broken is something the tests rely on. What's your code for LDX immediate, to see how it differs from LDX zero-page?
Re: Need some help with the PPU
by on (#116952)
05-zp_xy gives the same results as 04-zero_page does. The same opcodes are showing up that showed up in the zeropage test.

Here is my code for $A2 immediate LDX:

Code:
case 0xA2:   //Immediate load X
         X = memory->readRAM(PC, ppu);
         Z = !(X);
         N = X & 0x80;
         cycles =  2;
         PC++;
         break;
Re: Need some help with the PPU
by on (#116968)
Dartht33bagger wrote:
05-zp_xy gives the same results as 04-zero_page does. The same opcodes are showing up that showed up in the zeropage test.

Here is my code for $A2 immediate LDX:

Code:
case 0xA2:   //Immediate load X
         X = memory->readRAM(PC, ppu);
         Z = !(X);
         N = X & 0x80;
         cycles =  2;
         PC++;
         break;


The logic looks right to me, but what is "ppu" for in readRAM? You're not reading the value from VRAM are you?
Re: Need some help with the PPU
by on (#116970)
If PC reads work, then zero-page reads should be working.
Re: Need some help with the PPU
by on (#116973)
blargg wrote:
If PC reads work, then zero-page reads should be working.


Should be, but not if he reads the opcode from CPU memory then reads some operands from VRAM due to typos. It looks like he's specifying the address space to take the data from each time he pulls in a byte for anything.
Re: Need some help with the PPU
by on (#116977)
Zeropage reads work correctly on nestest.nes, so I don't understand why they aren't here.

I pass the ppu to readRAM so that if $2002 is read I can clear the writeToggle or if $2007 is read I can use/increment the ppuAddress. I have a separate function for reading from VRAM creatively called readVRAM.

Edit: Somehow I screwed my cpu code up and it wasn't passing any of the tests except for basics. So I reverted back to an older file and everything is working again. I'm trying to figure out why my ROR instructions won't pass.
Re: Need some help with the PPU
by on (#116996)
Ah, okay that makes sense. My mistake.

For ROR, you should have a temp variable to store whether bit 0 of the operand is 0 or 1 before changing it. Now shift the value right by a bit. If the actual CPU carry flag is set, now set bit 7 of the operand. Then, replace the actual CPU carry flag with what you saved in the temp variable. Last, calculate the zero and negative flags. That should do it.
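Those steps can be sketched as follows. `rotateRight` is a hypothetical helper name, and the carry flag is passed by reference only to keep the example self-contained; a real core would use its own flag storage.

```cpp
#include <cstdint>

// Rotates 'value' right through the carry flag, updating 'carry' in place.
// Returns the rotated value; Z and N can then be derived from the result.
std::uint8_t rotateRight(std::uint8_t value, bool &carry) {
    bool oldBit0 = (value & 0x01) != 0;              // save the bit that falls off
    value = static_cast<std::uint8_t>(value >> 1);   // shift right by one
    if (carry) value |= 0x80;                        // old carry enters at bit 7
    carry = oldBit0;                                 // saved bit becomes the new carry
    return value;
}
```

Zero is then `result == 0` and negative is `(result & 0x80) != 0`, computed after the rotate.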
Re: Need some help with the PPU
by on (#117089)
For some reason only tests 01-03 print out a pass/fail message. 04, 05, and 06 only output the missed opcodes. If no opcodes are missed, then nothing gets printed out. Is that supposed to happen?

Here is what I'm getting right now for 04:
Quote:
�ްa
P
^Q
�^\4
�^P
�^P
/usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0
@
^A
^Q
�^\4
�^P
�^P
/usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0
DONE
Re: Need some help with the PPU
by on (#117093)
Yes, that's normal.
Re: Need some help with the PPU
by on (#117122)
Well I'm happy to report that I passed every test except for 11-stack. I've checked everything that has to do with the stack, and I cannot find my error.

Test output:
Quote:
48 PHA
08 PHP
68 PLA
28 PLP
9A TXS
BA TSX

11-stack

Failed


I'm guessing that since I'm missing pretty much every instruction that affects the stack, it must be how I wrap around. These are my two stack functions:

Code:
const void cpu::pushStack(memory* memory, unsigned char &data, ppu* ppu)
{
   memory->writeRAM(SP, data, ppu);
   SP--;            //Decrement after writing memory.
   if(SP == 0x00FF) SP = 0x01FF;
}

const unsigned char cpu::popStack(memory* memory,ppu* ppu)
{
   SP++;
   if(SP == 0x2000) SP = 0x0100;
   return( memory->readRAM(SP, ppu) );
}
Re: Need some help with the PPU
by on (#117124)
Quote:
if(SP == 0x2000) SP = 0x0100;

There's the problem, 0x2000 instead of 0x200.
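One way to sidestep this class of bug is to keep the stack pointer as an 8-bit value and add the $0100 page when forming the address, so the wraparound happens on its own. The sketch below uses a hypothetical `MiniCpu` struct; the starting value of $FD is only an assumption for the example.

```cpp
#include <cstdint>

// Hypothetical minimal CPU state: 2 KB of RAM and an 8-bit stack pointer.
struct MiniCpu {
    std::uint8_t ram[0x800] = {};
    std::uint8_t sp = 0xFD;       // assumed starting value for this sketch

    void push(std::uint8_t data) {
        ram[0x0100 | sp] = data;  // the stack lives in page 1
        --sp;                     // 0x00 wraps to 0xFF automatically
    }

    std::uint8_t pop() {
        ++sp;                     // 0xFF wraps to 0x00 automatically
        return ram[0x0100 | sp];
    }
};
```

With this layout there are no explicit boundary checks to get wrong.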
Re: Need some help with the PPU
by on (#117162)
Of course it was a dumb mistake like that. I checked that function multiple times and missed it. Now I passed, which means my CPU checks out.

After comparing my VRAM dump of Donkey Kong to the one posted earlier in this thread, I'm way off. I'm not exactly sure why, though. I went through and rechecked how all the registers worked and how nametable mirroring worked on the wiki to make sure my code looked ok. My function that writes to VRAM in my memory class looks ok, so I'm thinking that maybe my ppu is somehow at an incorrect address when I go to write data to VRAM. Maybe if I ask these few questions I'll be closer to figuring out why my VRAM isn't being filled correctly.

1. What is the ppu address supposed to be at startup? Currently I'm starting it up at $2000, but I have no idea if it matters or not.
2. Should the ppu be doing its normal rendering operations during the first few screens at startup? Right now, mine starts fetching data/shifting registers/outputting pixels from the instant the emulator starts. To me, this seems like it would be wrong since the game is going to be polling $2002 until Vblank happens - which means that nothing but the chr-rom will actually be filled. Could my ppu address be incorrect because I'm incrementing it from the get go?
3. The only way the game can fill up VRAM is by writing the address it wants to fill to $2006 and then writing the data to $2007, correct?

This is my code to fill VRAM:
Code:
void memory::writeVRAM(unsigned short address, unsigned char &data)
{
   //Address above 0x3FFF wrap around between the 0x0000 and 0x3FFF range.
   address &= 0x3FFF;

   if(address >= 0x2000 && address <= 0x2FFF)
   {
      if(horizontalMirror)
      {
         //$2000 = $2400 and $2800 = $2C00
         if(address < 0x2400)         //Address in first nametable
         {
            VRAM[address] = data;
            VRAM[address + 0x400] = data;
            VRAM[address + 0x1000] = data;
            VRAM[address + 0x1400] = data;   
         }
         else if(address < 0x2800)
         {
            VRAM[address] = data;
            VRAM[address - 0x400] = data;
            VRAM[address + 0xC00] = data;         //Mirror for $2000 range
            VRAM[address + 0x1000] = data;
         }
         else if(address < 0x2C00)
         {
            VRAM[address] = data;
            VRAM[address + 0x400] = data;
            VRAM[address + 0x1000] = data;
            if(address < 0x2B00) VRAM[address + 0x1400] = data;   //No mirror above 0x2EFF
         }
         else
         {
            VRAM[address] = data;
            VRAM[address - 0x400] = data;
            if(address < 0x2F00) VRAM[address + 0x1000] = data;   //No mirror above 0x2EFF
            VRAM[address + 0xC00] = data;
         }
      }
      else    //Vertical mirroring
      {
         //$2000 = $2800 and $2400 = $2C00
         if(address < 0x2400)         //Address in first nametable
         {
            VRAM[address] = data;
            VRAM[address + 0x800] = data;
            VRAM[address + 0x1000] = data;
            VRAM[address + 0x1800] = data;   
         }
         else if(address < 0x2800)      //$2400 range
         {
            VRAM[address] = data;
            VRAM[address + 0x800] = data;
            VRAM[address + 0x1000] = data;   
            if(address < 0x2700) VRAM[address + 0x1800] = data;
         }
         else if(address < 0x2C00)      //$2800 range
         {
            VRAM[address] = data;
            VRAM[address - 0x800] = data;
            VRAM[address + 0x800] = data;         //Mirror for $2000 range
            VRAM[address + 0x1000] = data;
         }
         else               //$2C00 range
         {
            VRAM[address] = data;
            VRAM[address - 0x800] = data;
            if(address < 0x2F00) VRAM[address + 0x1000] = data;   //No mirror above 0x2EFF
            VRAM[address + 0x800] = data;            //Mirror for $2400 range
         }
      }   
   }
   else   //Palette write
   {
      //Palette glitch where these areas of BG palette are also copied to sprite palette
      if(address == 0x3F00 || address == 0x3F04 || address == 0x3F08 || address == 0x3F0C)
      {
         //Sets and mirrors the data
         for(int i = address; i < 0x4000; i += 0x20)
            VRAM[i] = VRAM[i + 0x10] = data;
      }
      else
      {
         //Sets and mirrors the data
         for(int i = address; i < 0x4000; i += 0x20)
            VRAM[i] = data;
      }
   }
}
Re: Need some help with the PPU
by on (#117168)
The PPU begins with NMIs disabled and sprites and background disabled. If that's not enough, the game will also likely disable NMIs and disable sprites and backgrounds immediately at boot. So you'll be drawing a bunch of gray pixels (background color) until the game has really started up. Also before the first frame has rendered completely, writes to 2000 and 2001 are ignored, so you can't enable NMIs or enable sprites and backgrounds.
The Famicom is different in that the PPU is immediately ready after a system reset, so that's why games will disable NMIs and disable sprites and backgrounds. While it doesn't matter on the NES, it does on the Famicom.

Also, it sounds like you might be having problems with disabling rendering. When rendering is disabled (sprites and backgrounds turned off), the PPU doesn't do any memory fetches, and doesn't automatically increment its addresses. So at power on, it won't be incrementing the PPU address at all.

Initial VRAM address simply doesn't matter, nothing relies on it at all. But it's supposed to be 0000.

Also, remember that the first 2006 write sets the high byte, and the second 2006 write sets the low byte. It's the only Big Endian part of the entire system.
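A minimal sketch of that two-write sequence is below. The `AddressLatch` struct and its member names are hypothetical, and it models only the $2006 address behavior (high byte, then low byte, with a $2002 read resetting the toggle), not scrolling.

```cpp
#include <cstdint>

// Hypothetical holder for the $2006 address latch state.
struct AddressLatch {
    std::uint16_t temp = 0;     // temporary address being assembled
    std::uint16_t address = 0;  // address used by $2007 accesses
    bool secondWrite = false;   // false: the next write is the high byte

    void write2006(std::uint8_t data) {
        if (!secondWrite) {
            // First write: high byte (6 bits, since PPU addresses are 14 bits wide).
            temp = static_cast<std::uint16_t>((temp & 0x00FF) | ((data & 0x3F) << 8));
        } else {
            // Second write: low byte, then the full address takes effect.
            temp = static_cast<std::uint16_t>((temp & 0xFF00) | data);
            address = temp;
        }
        secondWrite = !secondWrite;
    }

    // Reading $2002 resets the toggle so the next write is a high byte again.
    void read2002() { secondWrite = false; }
};
```

Writing $3F then $00 leaves the address at $3F00, which is the usual way a game points at the palette.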
Re: Need some help with the PPU
by on (#117169)
Dartht33bagger wrote:
1. What is the ppu address supposed to be at startup? Currently I'm starting it up at $2000, but I have no idea if it matters or not.
2. Should the ppu be doing its normal rendering operations during the first few screens at startup? Right now, mine starts fetching data/shifting registers/outputting pixels from the instant the emulator starts. To me, this seems like it would be wrong since the game is going to be polling $2002 until Vblank happens - which means that nothing but the chr-rom will actually be filled. Could my ppu address be incorrect because I'm incrementing it from the get go?

Both questions answered in the Wiki: http://wiki.nesdev.com/w/index.php/PPU_power_up_state

Dwedit's comments on Famicom vs. NES apply as well, naturally.

Also, the above "VRAM writing code" (which is really just your code used to handle mirroring and certain PPU memory regions) sheds absolutely no light on how you're handling writes to $2000, $2005, $2006, and $2007.
Re: Need some help with the PPU
by on (#117171)
The way you're handling mirroring appears sort of suspect. You're executing each write more than once into a space larger than the actual memory. But most ASIC mappers can change the mirroring at runtime, and when it changes, games expect the existing data to get moved around. Say a program writes the following to the nametables while mirroring is set to vertical (not to scale):
Code:
Data wri  tten to
one name  table
stays th  ere when
nametabl  es move.

Data wri  tten to
one name  table
stays th  ere when
nametabl  es move.

If the program then proceeds to switch the mirroring to horizontal, it expects the data to instantly become logically rearranged as follows:
Code:
Data wri  Data wri
one name  one name
stays th  stays th
nametabl  nametabl

tten to   tten to
table     table
ere when  ere when
es move.  es move.

Rad Racer relies on this, as do a lot of games that use a status bar.

You'll especially notice this when you try to implement one-screen mirroring as used by mapper 7 and some MMC1 games.
Re: Need some help with the PPU
by on (#117175)
tepples has a pretty good point. Mirroring doesn't mean that the data is written to multiple locations... it means that there's only one physical location, which is accessed through multiple addresses. In emulators this is often implemented with pointers, which can be changed to point anywhere you want (and like tepples said, many mappers do constantly change the mirroring settings as they run).
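Here is a rough sketch of the pointer approach, assuming the standard 2 KB of nametable memory. The `Nametables` struct and its method names are hypothetical. Switching the mirroring only changes where the four slot pointers point, so data already written shows up at its new logical addresses without being copied.

```cpp
#include <cstdint>

// Hypothetical nametable mapper: 2 KB of physical memory, four logical slots.
struct Nametables {
    std::uint8_t physical[2][0x400] = {};
    std::uint8_t *slot[4] = {};   // slot[n] serves $2000 + n * $400

    void setVertical() {          // $2000 = $2800, $2400 = $2C00
        slot[0] = slot[2] = physical[0];
        slot[1] = slot[3] = physical[1];
    }

    void setHorizontal() {        // $2000 = $2400, $2800 = $2C00
        slot[0] = slot[1] = physical[0];
        slot[2] = slot[3] = physical[1];
    }

    // Maps a PPU address in $2000-$3EFF to the single byte that backs it.
    std::uint8_t &at(std::uint16_t address) {
        // Masking with 0x0FFF also folds the $3000-$3EFF mirror onto $2000-$2EFF.
        std::uint16_t offset = static_cast<std::uint16_t>((address - 0x2000) & 0x0FFF);
        return slot[offset >> 10][offset & 0x03FF];
    }
};
```

One-screen mirroring would just point all four slots at the same 1 KB page.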
Re: Need some help with the PPU
by on (#117201)
I completely forgot that I could use pointers to reference data pieces inside of an array. I'll come home and re-write my mirror functions accordingly later tonight with pointers.

Here is how I handle writes to registers for those who asked:

Code:
switch(address)
   {
      case 0x2000:
         ppu->ppuTempAddress &= ~0x0C00;         //Clears bits 10 and 11
         ppu->ppuTempAddress |= ((data & 0x03) << 10);   //Shifts the nametable select bits to bit 10 and 11 in the temp address
         break;
      case 0x2005:
         if(ppu->writeToggle)
         {
            ppu->ppuTempAddress &= 0x8C1F;            //Makes bits 5-9 and 12-14 zero
            ppu->ppuTempAddress |= (data & 0xF8) << 2;      //Shifts the data to fill bits 5-9
            ppu->ppuTempAddress |= (data & 0x07) << 12;      //Shifts the data to fill bits 12-14
            ppu->writeToggle = false;
            break;
         }
         else
         {
            ppu->ppuTempAddress &= ~0x001F;            //Makes the first 5 bits zero
            ppu->ppuTempAddress |= (data & 0xF8) >> 3;      //Gets the last 5 bits for the address
            ppu->fineXScroll = data & 0x07;            //Gets the first three bits
            ppu->writeToggle = true;
         }
         break;
      case 0x2006:
         if(ppu->writeToggle)
         {
            ppu->ppuTempAddress &= 0xFF00;            //Clears the lower 8 bits
            ppu->ppuTempAddress |= data;            //Lower byte of address
            ppu->ppuAddress = ppu->ppuTempAddress;         //Set after temp address is filled
            ppu->writeToggle = false;
         }
         else
         {
            ppu->ppuTempAddress &= 0x00FF;            //Clears upper 8 bits
            ppu->ppuTempAddress |= (data & 0x3F) << 8;      //Upper piece of address (|= keeps the lower byte intact)
            ppu->writeToggle = true;
         }
         break;
      case 0x2007:
         writeVRAM(ppu->ppuAddress, data);
         if(RAM[0x2000] & 0x04) ppu->ppuAddress += 32;         //Checks increment bit
         else ppu->ppuAddress++;
         break;
   }
Re: Need some help with the PPU
by on (#117202)
Comment in passing (have much to do today, sorry) -- thumbs up to using pointers for the mirroring. That's absolutely the proper way to do it, ditto with a large amount of mapper implementation (PRG/CHR page selection). If I see memcpy() I will stab. ;-)
Re: Need some help with the PPU
by on (#117209)
The PPU registers are mirrored all the way from $2000 to $3FFF. This means that writing to $2008 (for example) is the same as writing to $2000, so you shouldn't check for the exact addresses in your switch. Instead, check whether the address being accessed is in the $2000-$3FFF range, then discard all bits of the address except the first 3, which you can use to select a register (0 to 7).
Re: Need some help with the PPU
by on (#117225)
tokumaru wrote:
The PPU registers are mirrored all the way from $2000 to $3FFF. This means that writing to $2008 (for example) is the same as writing to $2000, so you shouldn't check for the exact addresses in your switch. Instead, check whether the address being accessed is in the $2000-$3FFF range, then discard all bits of the address except the first 3, which you can use to select a register (0 to 7).

Code:
switch (Address & 0xE007)
Re: Need some help with the PPU
by on (#117227)
Yep, and the cool thing is that the compiler will do all the optimizations tokumaru mentioned: masking the address, checking that it's in the $2000-$2007 range, then using either a binary search or jump table, depending on which is most efficient. Further, this kind of mask-then-compare is actually closest to how the hardware does it; the upper 3 address lines are decoded ('138?) then the lower 3 are used to select the register.
Code:
switch ( Address & 0xE007 ) {
case 0x2000: ... // handles all mirrors, e.g. 0x2008, 0x2010... 0x3F80
case 0x2001: ...
...
case 0x2007: ...
}
Re: Need some help with the PPU
by on (#117229)
Thanks for all the help, guys! I should have been using pointers from the start for mirroring - it makes it so much simpler. I rewrote all of my memory functions that have to do with RAM so that almost everything is done through pointers now.

However, a new problem has arisen. I decided to run back through blargg's CPU tests just to make sure that my new code worked correctly, and now tests past 03 give me a message that interrupts should not happen during the tests. For some reason, one BRK instruction is being executed per test. In 04 - zero_page, at one point during the test the program writes $00 to $3A6. Later on in the test, the program counter ends up at $3A6 and reads the opcode from that address. This of course makes a BRK instruction occur.

I'm not really sure if this is from me rewriting my memory functions or from the CPU. When I ran the tests before, tests 04-09 spit out garbage output, so I was told that I had passed them since no opcodes showed up.
Re: Need some help with the PPU
by on (#117230)
Binary search between the working version and this version until you find the set of code changes that broke it. Then break those changes roughly in half, if there is any independence. Repeat until you find the cause.

It'd be slightly interesting to add a check for BRK causing the vectoring rather than unwanted IRQ, and report this differently.
Re: Need some help with the PPU
by on (#117245)
Are you handling unofficial opcodes? Depending on what you do with them, I suspect they might be the culprit.
I tried running these tests in my emulator recently and pass some, others fail with the 'interrupts should not occur' message. My emulator fails spectacularly at unofficial opcodes, as I pretty much do nothing with them yet. Can't recall if I looked into the fail closely, but I took it as a "passed officials, crashed at unofficials".
Re: Need some help with the PPU
by on (#117246)
Treating unofficial opcodes as one-byte NOPs runs the risk of BRKing the program if a 2- or 3-byte unofficial opcode's operand is $00.
Re: Need some help with the PPU
by on (#117252)
Hmm, I never even thought of the unofficial opcodes. They are all implemented, but I'm not sure how well they work. They all move the program counter the right number, but that's about all I can guarantee. Three of them fail on test 03 but that's the only test that outputs any opcodes. The other tests that fail just complain about the BRK instruction.

So I grabbed a copy of my old memory function and ran the tests again. Now the old memory function is complaining about interrupts happening as well. I've been able to figure out that at some point during the test, opcode $91 (indirect, Y STA) is executed. A = $00 at this point and so $91 writes $00 to address $3A6. Then, later on in the test, the program counter ends up at $3A6 and reads $00, which throws the break instruction.

From what I just found, it looks like my accumulator isn't set to the right value when opcode $91 is executed. 05-zp_xy and 06-absolute have the same results: opcode $91 sets address $3A6 to $00 which later causes a BRK instruction.
Re: Need some help with the PPU
by on (#117255)
Dartht33bagger wrote:
So I grabbed a copy of my old memory function and ran the tests again. Now the old memory function is complaining about interrupts happening as well. I've been able to figure out that at some point during the test, opcode $91 (indirect, Y STA) is executed. A = $00 at this point and so $91 writes $00 to address $3A6. Then, later on in the test, the program counter ends up at $3A6 and reads $00, which throws the break instruction.

From what I just found, it looks like my accumulator isn't set to the right value when opcode $91 is executed. 05-zp_xy and 06-absolute have the same results: opcode $91 sets address $3A6 to $00 which later causes a BRK instruction.

The description you just gave of sta ($zp),y is wrong/makes no sense. The contents of the accumulator have no bearing on the indirect, nor indexing, operations.

Please provide the code you use for opcodes $81 and $91. It's fairly easy to tell when someone has this wrong (and many people do).

I would recommend you make your emulator halt/stop/throw some indication when BRK is executed; it's usually (99% of the time) a sign that someone has something somewhere that's broken in their 6502 core, or their mapper implementation. Same goes for making your emulator halt/stop/etc. when an invalid opcode is executed. Do this for now -- do not go about implementing unofficial opcodes at this point (sorry I'm repeating myself, I know I said this before), just stop the emu and dump the code around the area that induced the error (i.e. show -10 and +10 bytes disassembled around the area which broke, and all contents of registers (PC, S, A, X, Y, P, etc.)).

Also, not to get pedantic or off track, but FYI the addressing modes are generally referred to as the following:

Code:
sta $12     = zero page (ex. opcode $85)
sta $12,x   = zero page indexed X (ex. opcode $95)
stx $12,y   = zero page indexed Y (ex. opcode $96)
sta $1234   = absolute (ex. opcode $8d)
sta $1234,x = absolute indexed X (ex. opcode $9d)
sta $1234,y = absolute indexed Y (ex. opcode $99)
jmp ($1234) = indirect (ex. opcode $6c)
sta ($12,x) = indexed indirect X (sometimes called "pre-indexed X") (ex. opcode $81)
sta ($12),y = indirect indexed Y (sometimes called "post-indexed Y") (ex. opcode $91)

IMO, it's time you start looking at other people's 6502 emulation cores and comparing the code to yours. CPU opcode testers are not able to get everything correct because they rightfully have to assume addressing modes and opcodes are implemented correctly.
Re: Need some help with the PPU
by on (#117271)
I've looked over another core a few weeks ago and it looked to be doing the same thing that mine does. I'll try your dumping method though and see if I can't figure this out.

Here are $81 and $91:

Code:
case 0x81:   //Indirect,X store A to memory
         temp1 = (memory->readRAM(PC, ppu) + X) & 0xFF; //Wraps around if >255
         temp2 = (memory->readRAM((temp1 + 1) & 0xFF, ppu) << 8) | memory->readRAM(temp1, ppu); //Gets address
         memory->writeRAM(temp2, A, ppu);
         cycles =  6;
         PC++;
         break;
      case 0x91:   //Indirect,Y store A to memory
         temp1 = memory->readRAM(PC, ppu);         //Gets Zeropage address
         temp2 = ((memory->readRAM((temp1 + 1) & 0xFF, ppu) << 8) | memory->readRAM(temp1, ppu)) + Y; //Gets real address
         memory->writeRAM(temp2, A, ppu);
         cycles =  6;
         PC++;
         break;
Re: Need some help with the PPU
by on (#117278)
You've got a precedence error in your 0x91 handler; + has higher precedence than |. It's a good idea to avoid mixing bitwise and arithmetic operators without parentheses, even if you've memorized the precedence levels perfectly. Below, yours is like addr1, and addr2 shows what's being done first, whereas you want addr3, with the displacement added afterwards.
Code:
   int addr1 =  (0x01 << 8) |  0xFF  + 0x01 ; // 0x0100
   int addr2 =  (0x01 << 8) | (0xFF  + 0x01); // 0x0100
   int addr3 = ((0x01 << 8) |  0xFF) + 0x01 ; // 0x0200

In this case, it seems simpler to just use all arithmetic operators:
Code:
temp2 = memory->readRAM((temp1 + 1) & 0xFF, ppu)*0x100 + memory->readRAM(temp1, ppu) + Y; //Gets real address

though I have to say, that expression is too verbose to read easily, especially how Y is pushed to the end almost out of sight. This makes the regularity clear and the calculation uncluttered:
Code:
int low  = memory->readRAM( (temp1 + 0) & 0xFF, ppu);
int high = memory->readRAM( (temp1 + 1) & 0xFF, ppu);
int addr = high*0x100 + low + Y;
Re: Need some help with the PPU
by on (#117296)
Ok. I sat down tonight and wrote functions to get addresses in the CPU instead of having each instruction get the address. Could you guys check my functions out to make sure everything is right (it should be)?

Code:
//Get address functions
const unsigned short cpu::zeroPageX(memory* memory, ppu* ppu)
{
   unsigned short temp = memory->readRAM(PC, ppu);
   temp += X;
   temp &= 0xFF;      //Wraps around
   return temp;
}

const unsigned short cpu::zeroPageY(memory* memory, ppu* ppu)
{
   unsigned short address = memory->readRAM(PC, ppu);
   address += Y;
   address &= 0xFF;
   return address;
}

const unsigned short cpu::absolute(memory* memory, ppu* ppu)
{
   unsigned short high = memory->readRAM(PC + 1, ppu) << 8;
   unsigned short low = memory->readRAM(PC, ppu);
   unsigned short address = high | low;
   return address;
}
   
const unsigned short cpu::absoluteX(memory* memory, ppu* ppu)
{
   unsigned short high = memory->readRAM(PC + 1, ppu) << 8;
   unsigned short low = memory->readRAM(PC, ppu);
   unsigned short address = high | low;
   address += X;
   return address;
}

const unsigned short cpu::absoluteY(memory* memory, ppu* ppu)
{
   unsigned short high = memory->readRAM(PC + 1, ppu) << 8;
   unsigned short low = memory->readRAM(PC, ppu);
   unsigned short address = high | low;
   address += Y;
   return address;
}

const unsigned short cpu::indirectX(memory* memory, ppu* ppu)
{
   unsigned short zeropageAddress = zeroPageX(memory, ppu);
   unsigned short low = memory->readRAM(zeropageAddress, ppu);
   zeropageAddress++;
   zeropageAddress &= 0xFF;
   unsigned short high = memory->readRAM(zeropageAddress, ppu) << 8;
   unsigned short address = high | low;
   return address;
}

const unsigned short cpu::indirectY(memory* memory, ppu* ppu)
{
   unsigned short zeropageAddress = memory->readRAM(PC, ppu);
   unsigned short low = memory->readRAM(zeropageAddress, ppu);
   zeropageAddress++;
   zeropageAddress &= 0xFF;
   unsigned short high = memory->readRAM(zeropageAddress, ppu) << 8;
   unsigned short address = high | low;
   address += Y;
   return address;
}
Re: Need some help with the PPU
by on (#117299)
Edit: I took a longer stare at your indirectX() and zeroPageX() routines, and yeah, now I get it. They were confusing me because of your mention of PC, which to me (at that point in the CPU) would still be pointing to the opcode. However somewhere else in your code you're obviously doing PC++ before handling the actual functionality of the addressing mode. In other words I really expected to see PC+1 and PC+2 being used all over.

You need to be aware of 3 things relating to addressing modes: zero page wrapping, page boundary crossing, and the JMP indirect CPU bug:

a) "Zero page wrapping": any time a zero page (ZP) read/write operation happens, the effective address used for reading/writing needs to stay within the $00xx range (hence the name zero page). This is what &= 0xff is about, but it needs to be applied only where applicable. "Zero page wrapping" does not incur a cycle penalty (keep reading).

b) Actual "page boundary crossing", which is any time an effective address rolls over into the next page (i.e. $12ff -> $1300). You might think "why do I care about this, it's just simple 16-bit addition" -- you need to care about it because crossing a page actually costs an extra CPU cycle. Right now the first-generation NES games you're testing with tend to not be very "timing-dependent" but this will matter quite a lot later, trust me. And don't forget about something like $ffff -> $0000 too (that's also considered page crossing). Your current functions return only the final address, so the caller has no way of knowing whether a page was crossed.

c) There's an actual CPU-level bug in the 6502 which affects jmp ($xxxx) (opcode $6c) only, where (b) above does not happen correctly. In other words, jmp ($80ff) would read the effective 16-bit address low byte from $80ff and the high byte from $8000 -- not $8100 like you would expect. (And no, there is no additional CPU cycle penalty in that situation since the page never gets crossed)
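To make (a), (b) and (c) concrete, here is a rough C++ sketch, not a drop-in for your emulator. The names IndexedAddress, zeroPageIndexed, addIndex and jmpIndirectHighByteAddress are made up for illustration, and whether the extra cycle actually gets charged depends on the instruction, which is left to the caller here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the effective address plus a flag telling the
// caller whether adding the index carried into the high byte.
struct IndexedAddress {
    uint16_t address;
    bool pageCrossed;
};

// (a) Zero page indexed: the sum is truncated to 8 bits, so it stays in $00xx.
uint8_t zeroPageIndexed(uint8_t base, uint8_t index) {
    return static_cast<uint8_t>(base + index);
}

// (b) Absolute indexed: plain 16-bit addition, but report a page crossing.
IndexedAddress addIndex(uint16_t base, uint8_t index) {
    uint16_t address = static_cast<uint16_t>(base + index);  // $ffff + 1 wraps to $0000
    bool crossed = (base & 0xFF00) != (address & 0xFF00);
    return {address, crossed};
}

// (c) JMP ($xxxx): address the high byte of the target is fetched from.
// Only the low byte of the pointer is incremented, so $80ff is followed by $8000.
uint16_t jmpIndirectHighByteAddress(uint16_t pointer) {
    return static_cast<uint16_t>((pointer & 0xFF00) | ((pointer + 1) & 0x00FF));
}

// Checked usage, using the addresses mentioned above.
void checkAddressingExamples() {
    assert(zeroPageIndexed(0xF0, 0x20) == 0x10);            // wraps, no penalty
    assert(addIndex(0x12FF, 0x01).address == 0x1300);
    assert(addIndex(0x12FF, 0x01).pageCrossed);             // $12ff -> $1300
    assert(addIndex(0xFFFF, 0x01).pageCrossed);             // $ffff -> $0000
    assert(jmpIndirectHighByteAddress(0x80FF) == 0x8000);   // the JMP bug
}
```

Returning the flag alongside the address is only one option; the point is that the addressing helper has to hand the crossing information back somehow so the cycle count can be adjusted.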

Remaining part of my previous (non-edited) post, which I'll keep just for folks reading this thread (whose Subject is no longer accurate):

Here's some actual 6502 code with comments (I assume you speak 6502):

Code:
lda #$fe     ; A=$fe
sta $0622    ; Store value $fe at memory location $0622 (in RAM)
lda #$22     ; A=$22
sta $4c      ; Memory location $4c ($004c) now contains value $22 (low byte of 16-bit address)
lda #$06     ; A=$06
sta $4d      ; Memory location $4d ($004d) now contains value $06 (high byte of 16-bit address)
ldx #$3a     ; X=$3a
lda ($12,x)  ; Effective address is $12+X (thus $4c)
             ; Memory location $4c ($004c) contains value $22 (low byte of 16-bit address)
             ; Memory location $4d ($004d) contains value $06 (high byte of 16-bit address)
             ; Effective 16-bit address to read from is $0622
;
; A now contains value $fe
;

The difference between this and indirect indexed Y (e.g. lda ($12),y, opcode $b1) is where/when the indexing is applied. This is why some people call indexed indirect X "pre-indexed mode", and indirect indexed Y "post-indexed" mode.
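If it helps to see the two modes side by side in C++, here is a small sketch. It assumes a flat 64 KiB array standing in for memory (the Memory alias and the function names are hypothetical, not taken from your code) and ignores cycle counting entirely.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Memory = std::array<uint8_t, 0x10000>;  // hypothetical flat address space

// Read a 16-bit pointer stored in zero page. The fetch of the high byte
// wraps within $00xx, so a pointer at $ff takes its high byte from $00.
uint16_t readZeroPagePointer(const Memory& mem, uint8_t zp) {
    uint8_t low  = mem[zp];
    uint8_t high = mem[static_cast<uint8_t>(zp + 1)];
    return static_cast<uint16_t>((high << 8) | low);
}

// Indexed indirect, e.g. lda ($12,x): X is added BEFORE the pointer is read.
uint16_t indexedIndirectX(const Memory& mem, uint8_t operand, uint8_t x) {
    return readZeroPagePointer(mem, static_cast<uint8_t>(operand + x));
}

// Indirect indexed, e.g. lda ($12),y: Y is added AFTER the pointer is read.
uint16_t indirectIndexedY(const Memory& mem, uint8_t operand, uint8_t y) {
    return static_cast<uint16_t>(readZeroPagePointer(mem, operand) + y);
}

// Checked usage with the values from the 6502 listing above.
void checkIndirectExamples() {
    static Memory mem{};
    mem[0x4c] = 0x22;  // low byte of the pointer
    mem[0x4d] = 0x06;  // high byte of the pointer
    assert(indexedIndirectX(mem, 0x12, 0x3a) == 0x0622);  // $12 + $3a = $4c
    assert(indirectIndexedY(mem, 0x4c, 0x03) == 0x0625);  // $0622 + 3
}
```

Both modes share the zero page pointer read; the only difference is which side of that read the index register lands on.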

If you want me to do a little write-up like the above but for indirect indexed Y, let me know and I can.

Also a coding practise tip in passing: I would strongly suggest you use inttypes.h typedefs for integers, i.e. uint16_t for an unsigned 16-bit integer (what you call unsigned short). They're fewer characters to type and allow for better cross-architecture support since not all architectures (or compilers/environments for that matter) are identical. In case you think I'm kidding...

P.S. -- What compiler are you using that's letting you shove new variable declarations right in the smack dab centre of your code without making a new code block (e.g. { ... })? Awful that this is allowed. In other words, it should really look like this:

Code:
const unsigned short cpu::indirectY(memory* memory, ppu* ppu)
{
   unsigned short zeropageAddress, low, high, address;

   zeropageAddress = memory->readRAM(PC, ppu);
   low = memory->readRAM(zeropageAddress, ppu);
   zeropageAddress++;
   zeropageAddress &= 0xFF;
   high = memory->readRAM(zeropageAddress, ppu) << 8;
   address = high | low;
   address += Y;
   return address;
}
Re: Need some help with the PPU
by on (#117303)
koitsu wrote:
Also a coding practise tip in passing: I would strongly suggest you use inttypes.h typedefs for integers, i.e. uint16_t
[...]
P.S. -- What compiler are you using that's letting you shove new variable declarations right in the smack dab centre of your code without making a new code block (e.g. { ... })?

This has been allowed in C++ forever and in C since 1999. In fact, the same revision of C that added stdint.h (the standardized version of inttypes.h) added declaring variables anywhere. I don't think it's as awful as you appear to think it is because it allows variables to be initialized with a value when they come into existence. This is especially important in C++ where declaring a variable of a non-POD type and initializing it later causes a default constructor to run at the point of declaration followed by the class's operator = handler at initialization. Even in C, declaring late allows giving a defined value once you know a value, which makes your program less likely to encounter undefined behavior from using a variable before it is initialized.
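Here is a small sketch of the C++ point, using a made-up Tracked type that just counts which special member functions run. The type and the two functions are hypothetical and exist only to make the difference visible.

```cpp
#include <cassert>

// Hypothetical type that records how it was constructed and assigned.
struct Tracked {
    static int defaultConstructed;
    static int valueConstructed;
    static int assigned;

    int value;

    Tracked() : value(0) { ++defaultConstructed; }
    explicit Tracked(int v) : value(v) { ++valueConstructed; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++assigned;
        return *this;
    }
};

int Tracked::defaultConstructed = 0;
int Tracked::valueConstructed = 0;
int Tracked::assigned = 0;

// Declare at the top, fill in later: default constructor, then a temporary,
// then operator=.
int declareThenAssign() {
    Tracked t;
    t = Tracked(5);
    return t.value;
}

// Declare where the value is known: one constructor call and nothing else.
int declareAtFirstUse() {
    Tracked t(5);
    return t.value;
}

// Checked usage: reset the counters, then run each style once.
void compareStyles() {
    Tracked::defaultConstructed = Tracked::valueConstructed = Tracked::assigned = 0;

    declareThenAssign();
    assert(Tracked::defaultConstructed == 1);
    assert(Tracked::valueConstructed == 1);
    assert(Tracked::assigned == 1);

    declareAtFirstUse();
    assert(Tracked::defaultConstructed == 1);  // unchanged
    assert(Tracked::valueConstructed == 2);
    assert(Tracked::assigned == 1);            // unchanged
}
```

For plain integers like the ones in the CPU code the cost difference disappears, and what remains is the second benefit: the variable never exists in an uninitialized state.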

Quote:
it should really look like this:

Code:
const unsigned short cpu::indirectY(memory* memory, ppu* ppu)
{
   unsigned short zeropageAddress, low, high, address;
   zeropageAddress = memory->readRAM(PC, ppu);

At this point, if low, high, or address were accessed right now, that would be undefined.