Starting Out - NESdev BBS

Starting Out
by FreakSoftware on 2006-03-26 (#11221)

Hi,

I've always wanted to write an NES emulator and for some reason I've finally just gotten to the point where I actually started. I'm reading everything I can get my hands on (minus the 6502 manual :roll: ), but I'm having trouble figuring out, from a high-level perspective, what it is that needs to be written.

The problem I'm finding is that every document I read explains things either in great detail or very vaguely, either implying a lot between the lines or simply expecting you to know everything else. I haven't been able to put together a clear picture of how things work.

Starting with the 6502 core from iNES, I've written a bit of code implementing some of the basics of the memory and PPU, but I'm stuck on a general lack of understanding. Some questions I have follow. I feel foolish asking some of these, but I'd rather ask and make a fool of myself than do something completely wrong.

1) What big pieces need to be written? The CPU is taken care of, then there's the PPU, APU, memory, controllers (not sure how much there is to do there), and then the mappers (which I haven't quite fully digested the details of, but I basically get it).

2) When the processor starts running do I need to load data from the cartridge into memory, or does the MMC map straight to the cartridge in such a way that access to certain areas of CPU memory automatically read from the cartridge?

3) With the CPU memory, do I really need to mirror chunks of memory or do I just look at every single address coming in and reduce it to a common unmirrored chunk? For example, on the PPU, $4000 - $10000 mirrors $0000-$3FFF. Is there any point at all in having any data in $4000+? Am I correct in guessing that with $2008-$3FFF in CPU memory that as long as the address is >= $2000 and <= $2007 then it just looks at the least significant 4 bits? Is this why $3456 is the same as $2006?

It's actually difficult to formulate specific questions to ask because basically I'm just kind of confused as to where to start. Let's start with these for now, if you will.

Thanks guys

Re: Starting Out
by Disch on 2006-03-26 (#11222)

FreakSoftware wrote:

1) What big pieces need to be written?

CPU, APU, PPU, mappers. That's pretty much it. Basic controllers are very simple to implement, but nonstandard controllers (like the zapper, ROB, keyboard, other crap) are a bit harder.

If you have a CPU running code, a PPU drawing pixels, an APU playing sound, and mappers managing the game -- you have a fully working emulator.

PPU should be next highest priority after the CPU. Even if you don't have input... getting a game's title screen displayed or watching the intro demo of a game is very satisfying.

Quote:

or does the MMC map straight to the cartridge in such a way that access to certain areas of CPU memory automatically read from the cartridge?

Bingo.

You may instinctively think to have a big array of $10000 bytes that hosts the RAM and PRG, but it's actually somewhat harder to manage an emu with that setup.

I find it's best to load the ROM's PRG and CHR into their own buffers, then use a series of function pointers to cover certain addressing spaces.

I get into more details, and other people have recommendations in this thread:

http://nesdev.com/bbs/viewtopic.php?t=990
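
To make the function-pointer idea a little more concrete, here is a minimal sketch in C. Everything in it is an assumption made for illustration rather than something taken from an actual emulator: the 8 KB page granularity, the names `read_table` and `cpu_read`, the fixed 32 KB PRG buffer with no bank switching, and the placeholder handler that returns 0 for areas not covered yet.

```c
#include <stdint.h>

/* One read handler per 8 KB page of the CPU address space (assumed layout). */
typedef uint8_t (*ReadFn)(uint16_t addr);

static uint8_t ram[0x0800];   /* 2 KB of system RAM */
static uint8_t prg[0x8000];   /* hypothetical 32 KB PRG buffer, no banking */

static uint8_t read_ram(uint16_t a)  { return ram[a & 0x07FF]; }
static uint8_t read_prg(uint16_t a)  { return prg[a & 0x7FFF]; }
static uint8_t read_none(uint16_t a) { (void)a; return 0; }  /* placeholder */

/* Pages: $0000, $2000, $4000, $6000, $8000, $A000, $C000, $E000 */
static ReadFn read_table[8] = {
    read_ram,  read_none, read_none, read_none,
    read_prg,  read_prg,  read_prg,  read_prg
};

/* Dispatch on the top 3 address bits. */
uint8_t cpu_read(uint16_t addr)
{
    return read_table[addr >> 13](addr);
}
```

A mapper could then swap entries in `read_table` (or change what the PRG handlers index into) instead of copying ROM data around.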

Quote:

3) With the CPU memory, do I really need to mirror chunks of memory or do I just look at every single address coming in and reduce it to a common unmirrored chunk?

Mirroring can easily be accomplished by masking out the appropriate bits of the address. For example system RAM is at $0000-07FF but is also mirrored at $0800-1FFF. In an emu this is easily done with something like:

Code:

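typedef unsigned char  BYTE;  /* assumed definitions of BYTE and WORD */
typedef unsigned short WORD;
BYTE RAM[0x0800];             /* 2 KB of system RAM */

BYTE read_ram(WORD a)
{
  return RAM[ a & 0x07FF ];   /* keep the low 11 bits */
}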

where 'a' is an address between $0000-$1FFF.

Quote:

Is there any point at all of having any data in $4000+?

Not really. Whenever you're dealing with PPU memory accesses, just mask the address with 0x3FFF to have larger addresses mirror the first 16k.

Quote:

Am I correct in guessing that with $2008-$3FFF in CPU memory that as long as the address is >= $2000 and <= $2007 then it just looks at the least significant 4 bits? Is this why $3456 is the same as $2006?

Yeah, you have the right idea. When the CPU accesses $2000-3FFF, only the low 3 bits are significant.
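
The two masks mentioned in this reply can be written out as a short C sketch. The function names `ppu_reg_index` and `ppu_bus_addr` are made up for the example. Note that three bits (a mask of 7) are enough to pick one of the eight PPU registers, which is why $3456 lands on the same register as $2006.

```c
#include <stdint.h>

/* CPU side: for any address in $2000-$3FFF, pick one of 8 PPU registers. */
unsigned ppu_reg_index(uint16_t cpu_addr)
{
    return cpu_addr & 0x0007;      /* only the low 3 bits matter */
}

/* PPU side: fold any PPU address into the first 16 KB. */
uint16_t ppu_bus_addr(uint16_t ppu_addr)
{
    return ppu_addr & 0x3FFF;
}
```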

Re: Starting Out
by FreakSoftware on 2006-03-27 (#11225)

Disch wrote:

PPU should be next highest priority after the CPU. Even if you don't have input... getting a game's title screen displayed or watching the intro demo of a game is very satisfying.

I bet

Quote:

I find it's best to load the ROM's PRG and CHR into their own buffers, then use a series of function pointers to cover certain addressing spaces.

Now all of these map to the CPU's memory, not the PPU's right? The PPU's memory is physical?

Quote:

Mirroring can easily be accomplished by masking out the appropriate bits of the address.

Hey there's a good idea.

Quote:

Yeah, you have the right idea. When the CPU accesses $2000-3FFF, only the low 3 bits are significant.

Who came up with this stuff? :p

Ok, now just a quick overview of how the whole thing works...

The system turns on, and initial states are loaded into all of the registers. The CPU runs for a fixed amount of cycles (113.333 for NTSC etc) then the PPU draws the line to the screen. (Nothing happens during H-Blank?) This happens for 240 lines, then the V-Blank period starts. The CPU runs one cycle, performs (if enabled) an NMI (generated by hardware I assume), then runs the rest of the cycles per line, runs for the rest of the lines, then starts on the new frame.

Right? I'll gather my thoughts and re-read some things and ask more tomorrow.... err... today. It's getting early in the morning.

Thanks

by hap on 2006-03-27 (#11227)

Hey, it's always nice to see new people starting on an NES emulator.

I don't think it's a good approach to use an existing 6502 emulator, as it's the core and will make it hard for you to understand how it works. If you won't write your own CPU emulator, at least do some reading on the subject, and you'll see your general lack of understanding isn't as bad as you thought it was.

Quote:
Now all of these map to the CPU's memory, not the PPU's right? The PPU's memory is physical?

CHR is always mapped to the PPU memory area $0000-$1fff. With specific mappers it can also be mapped to the PPU nametable area, but you can worry about that later.

Quote:
The CPU runs for a fixed amount of cycles (113.333 for NTSC etc) then the PPU draws the line to the screen. (Nothing happens during H-Blank?) This happens for 240 lines, then the V-Blank period starts. The CPU runs one cycle, performs (if enabled) an NMI (generated by hardware I assume), then runs the rest of the cycles per line, runs for the rest of the lines, then starts on the new frame.

This quick overview describes the behaviour of an emulator with a scanline-based renderer. Is that what you meant? A real NES behaves differently: every component runs in parallel, and something does happen during hblank (the PPU fetches sprites and the first 2 tiles of the upcoming scanline).
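
For reference, a scanline-based frame loop of the kind described in the question might look roughly like the C sketch below. It is only an outline under stated assumptions: the commonly documented NTSC figures of 341 PPU dots per scanline, 262 scanlines per frame and 3 PPU dots per CPU cycle (341 / 3 is about 113.67 CPU cycles per line, not 113.333), with the vblank NMI raised at the start of scanline 241. The `Frame` struct and its counters are hypothetical stand-ins for real CPU and PPU calls.

```c
#include <stdint.h>

enum {
    DOTS_PER_LINE      = 341,  /* assumed NTSC figure */
    DOTS_PER_CPU_CYCLE = 3,    /* assumed NTSC figure */
    LINES_PER_FRAME    = 262,  /* assumed NTSC figure */
    VISIBLE_LINES      = 240,
    NMI_LINE           = 241   /* assumed line where vblank begins */
};

typedef struct {
    long cpu_cycles;   /* stand-in for "cycles handed to the CPU core" */
    int  lines_drawn;  /* stand-in for "scanlines rendered" */
    int  nmi_count;    /* stand-in for "NMIs requested" */
    int  dot_carry;    /* leftover dots, so fractional cycles are not lost */
} Frame;

void run_frame(Frame *f)
{
    for (int line = 0; line < LINES_PER_FRAME; line++) {
        if (line == NMI_LINE)
            f->nmi_count++;                  /* would call cpu_nmi() here */

        /* Convert this line's dots into whole CPU cycles, keeping the rest. */
        f->dot_carry += DOTS_PER_LINE;
        f->cpu_cycles += f->dot_carry / DOTS_PER_CPU_CYCLE;  /* cpu_run(n) */
        f->dot_carry  %= DOTS_PER_CPU_CYCLE;

        if (line < VISIBLE_LINES)
            f->lines_drawn++;                /* would call ppu_render_line() */
    }
}
```

Carrying the remainder between lines means some lines get 113 CPU cycles and others 114, which avoids drift without needing fractional cycle counts.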

by dXtr on 2006-03-27 (#11229)

If this is your first emulation project, I would recommend starting a little simpler by writing a Chip8 emulator.

by FreakSoftware on 2006-03-27 (#11234)

hap wrote:
I don't think it's a good approach to use an existing 6502 emulator, as it's the core and will make it hard for you to understand how it works. If you won't write your own CPU emulator, at least do some reading on the subject, and you'll see your general lack of understanding isn't as bad as you thought it was.

I'm well aware of how processors in general work (I made one in VHDL and Verilog) as well as assembly (80806 for a class), I just don't think I need to know the specifics of the 6502 right now. I do plan to write the core eventually.

Quote:
Now all of these map to the CPU's memory, not the PPU's right? The PPU's memory is physical?

CHR is always mapped to the PPU memory area $0000-$1fff. With specific mappers it can also be mapped to the PPU nametable area, but you can worry about that later.

Quote:
This quick overview would be the behaviour of an emulator with a scanline based renderer. Is that what you meant ? On the other hand, a real NES behaves differently; every component runs in parallel

Of course. I was actually playing with the idea of running separate components in threads just to see if it would work pretty well (I'm looking to emulate more complicated hardware later on, and being able to take advantage of multiple cores would be a big boost). Though I think I'll just get it working in a single thread for now.

A few more questions:

A) How does the NES hardware vsync itself with the TV? Or does it work the other way around? hsync?

B) bah... I'll ask more later, I have to go.

by teaguecl on 2006-03-27 (#11239)

FreakSW,
Glad to see another NES enthusiast join the community. You have a somewhat unique perspective on this since you are starting from scratch. I was thinking maybe you could help us out by testing our wiki out http://nesdevwiki.ath.cx/index.php/Main_Page

This is intended to be the collection of all NES technical info we know about. Theoretically you should be able to implement an emulator from scratch using info from the wiki. Of course we're not there yet, but you are in a good position to let us know what is missing or confusing. Regardless, it's a good reference for someone in your position, and it's difficult to find (not linked from main page). Good luck!

by FreakSoftware on 2006-03-27 (#11241)

teaguecl wrote:
FreakSW,
Glad to see another NES enthusiast join the community. You have a somewhat unique perspective on this since you are starting from scratch. I was thinking maybe you could help us out by testing our wiki out http://nesdevwiki.ath.cx/index.php/Main_Page

Oh believe me, I am.

There are a lot of details, but nothing really explains how it works from the top down in a high-level sense, and that's one thing I'm trying to put together.

by FreakSoftware on 2006-03-27 (#11242)

Is "one-screen mirroring" just a typical game like Mario Brothers would be?

A document describing MMC1 says that the chip has four registers:
Register 0 (reg0) - written to via $8000-$9FFF
Register 1 (reg1) - written to via $A000-$BFFF
Register 2 (reg2) - written to via $C000-$DFFF
Register 3 (reg3) - written to via $E000-$FFFF

I don't quite follow how you can write to a register via an address range. Can someone explain that, or is the above simply misleading?

by hap on 2006-03-27 (#11243)

You don't need to create multithreaded software to simulate components running in parallel. A 'catch-up' implementation, which has been described on these forums before, is sufficient: run the PPU to catch up with the CPU at each write or read of any PPU register, and at vblank.
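
A bare-bones C sketch of the catch-up idea, for illustration only. The `Clock` struct, `ppu_step` and `ppu_catch_up` are hypothetical names, `ppu_step` is an empty stand-in for real PPU work, and the factor of 3 PPU dots per CPU cycle assumes NTSC timing.

```c
#include <stdint.h>

typedef struct {
    uint64_t cpu_cycle;  /* how far the CPU has run, in CPU cycles */
    uint64_t ppu_dot;    /* how far the PPU has been simulated, in dots */
} Clock;

/* Stand-in for the real per-dot PPU work. */
static void ppu_step(Clock *c)
{
    c->ppu_dot++;
}

/* Call this before any PPU register read or write, and at vblank. */
void ppu_catch_up(Clock *c)
{
    uint64_t target = c->cpu_cycle * 3;   /* assumed NTSC ratio */
    while (c->ppu_dot < target)
        ppu_step(c);
}
```

The CPU core only has to keep `cpu_cycle` up to date. The PPU does no work at all until something actually needs its state.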

and A): The same way cable TV broadcasting does, by outputting 'blacker than black' (below the black level) synchronisation and screen transition pulses to let the television screen know when the beam must retrace.

*edit* As for that question about MMC1 registers, I suggest you do some reading on the subject of 'memory-mapped I/O'.

by FreakSoftware on 2006-03-27 (#11244)

hap wrote:
You don't need to create multithreaded software to simulate components running in parallel;

No, but running them in parallel a) is interesting b) can run faster.

Quote:
and A): The same way cable TV broadcasting does, by outputting 'blacker than black' (below the black level) synchronisation and screen transition pulses to let the television screen know when the beam must retrace.

So the signal source just says "VBlank now" and then goes from there. Ok.

by Disch on 2006-03-27 (#11245)

FreakSoftware wrote:
Is "one-screen mirroring" just a typical game like Mario Brothers would be?

No. One screen mirroring is typically for games which want to have a 4-way scrolling screen AND a stationary status bar of some sort. Examples of this are Conflict, Battletoads, RC Pro Am, etc.

Such games that use 1-screen in this manner can reserve the second nametable to hold nothing but the status bar... and have the scrolling map on its own nametable. This way the map will never overlap the status bar in the nametable as the screen scrolls.

Quote:
I don't quite follow how you can write to a register via an address range. Can someone explain that, or is the above simply misleading?

It's the same concept of mirroring.

The same reason the CPU address $3726 is the same as $2006. Within a certain range, only certain address bits are used.

On MMC1, when you write in the $8000-FFFF range, only bits 13-15 are significant for determining which of the 4 MMC1 regs you're writing to (well really only 13 and 14, since bit 15 is always 1).

i.e. writing to $9213 is the same as writing to $8000, because when you keep only bits 13-15 (address & 0xE000) you get the same value ($8000).

If it helps to think of it this way... you could say that $8001-9FFF are all mirrors of $8000, etc. (but only when writing)
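
In C, the register selection described above comes down to a shift and a mask. This is only a sketch, and the function name `mmc1_reg_index` is invented for the example:

```c
#include <stdint.h>

/* For a write anywhere in $8000-$FFFF, return which MMC1 register (0-3)
 * it targets. Bit 15 is always set in that range, so only bits 13 and 14
 * carry information. */
unsigned mmc1_reg_index(uint16_t addr)
{
    return (addr >> 13) & 3;
}
```

With this, $8000 and $9213 both give register 0, while an address such as $A234 gives register 1.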

by FreakSoftware on 2006-03-27 (#11250)

Disch wrote:
FreakSoftware wrote:
Is "one-screen mirroring" just a typical game like Mario Brothers would be?

No. One screen mirroring is typically for games which want to have a 4-way scrolling screen AND a stationary status bar of some sort. Examples of this are Conflict, Battletoads, RC Pro Am, etc.

Such games that use 1-screen in this manner can reserve the second nametable to hold nothing but the status bar... and have the scrolling map on its own nametable. This way the map will never overlap the status bar in the nametable as the screen scrolls.

Quote:
Quote:
I don't quite follow how you can write to a register via an address range. Can someone explain that, or is the above simply misleading?

It's the same concept of mirroring.

Oh, so they literally mean that every one of those addresses writes to the same register. That's crazy.

Quote:
On MMC1, when you write in the $8000-FFFF range, only bits 13-15 are significant for determining which of the 4 MMC1 regs you're writing to (well really only 13 and 14, since bit 15 is always 1).

i.e. writing to $9213 is the same as writing to $8000, because when you keep only bits 13-15 (address & 0xE000) you get the same value ($8000).

If it helps to think of it this way... you could say that $8001-9FFF are all mirrors of $8000, etc. (but only when writing)

See, now that's a beautiful explanation! You should describe everything in that way so I don't have to ask

Be back later.

by FreakSoftware on 2006-03-28 (#11253)

Continuing on MMC1 registers... If I understand this correctly, writing to $8000-$FFFF writes to the actual registers in the MMC, but reading from those addresses reads from the selected ROM banks defined by the bits in the registers?

So for example if we have Register 0 as 00001100, then a) writing 11111111 to $A234 would actually write 11111111 to register 0, but b) reading from $A234 would read from address $2234 of the low 16 KB PRGROM bank?

What ranges in the CPU address range are mapped and which ones are real? Is only $0000 to $07FF real? $2000-$2007 are mapped to PPU registers, $4000 to $401F is kind of sketchy so I don't know about them, but then from $4020 to $10000 it's all mapped somehow onto the cartridge?

Addresses lower than $2000 only use the least significant 11 bits, addresses from $2000 to $3FFF use the least significant 3 bits...

Yeah, I don't get the memory mapping. I'm doing my best, sorry.

A diagram would help.

by tepples on 2006-03-28 (#11261)

FreakSoftware wrote:
Continuing on MMC1 registers... If I understand this correctly, writing to $8000-$FFFF writes to the actual registers in the MMC, but reading from those addresses reads from the selected ROM banks defined by the bits in the registers?

Correct. Consider the CPU's read/write flag to be like another address bit in some cases.

Quote:
So for example if we have Register 0 as 00001100, then a) writing 11111111 to $A234 would actually write 11111111 to register 0, but b) reading from $A234 would read from address $2234 of the low 16 KB PRGROM bank?

Almost. The MMC1 is actually loaded through a serial port, so writing 11111111 would reset the mapper. If you want to write 11111 to register 0, then write 00000001 00000001 00000001 00000001 00000001, or 00011111 00001111 00000111 00000011 00000001, etc. (Bits 6 to 1 don't matter.)

Quote:
What ranges in the CPU address range are mapped and which ones are real? Is only $0000 to $07FF real? $2000-$2007 are mapped to PPU registers, $4000 to $401F is kind of sketchy so I don't know about them, but then from $4020 to $10000 it's all mapped somehow onto the cartridge?

That's about right.

by FreakSoftware on 2006-03-28 (#11262)

tepples wrote:
The MMC1 is actually loaded through a serial port, so writing 11111111 would reset the mapper.

That I knew/read, but didn't mention. I didn't realize it was serialized. So then, writing 00000001, 00000000, 00000001, 00000001, 00000000 would end up with:

00000001
00000010
00000101
00001011
00010110 <- final result

Interesting.

Quote:
That's about right.

About? What'd I miss?

by tepples on 2006-03-28 (#11263)

FreakSoftware wrote:
tepples wrote:
The MMC1 is actually loaded through a serial port

So then, writing 00000001, 00000000, 00000001, 00000001, 00000000 would end up with:

00000001
00000010
00000101
00001011
00010110 <- final result

Almost. The port is little endian. Writing 1, 0, 1, 1, 0 will end up with
10000
01000
10100
11010
01101 <- final result
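
The serial loading described in this thread could be modelled along these lines in C. Treat it as a sketch under assumptions, not a verified MMC1 implementation: the `Mmc1` struct and `mmc1_write` are made-up names, the target register is taken from the address of the fifth write, and a write with bit 7 set only clears the shift register here (any other side effects of a reset are not modelled).

```c
#include <stdint.h>

typedef struct {
    uint8_t shift;    /* bits collected so far, newest bit at position 4 */
    uint8_t count;    /* how many bits have been collected (0-4) */
    uint8_t regs[4];  /* the four 5-bit registers */
} Mmc1;

void mmc1_write(Mmc1 *m, uint16_t addr, uint8_t value)
{
    if (value & 0x80) {            /* bit 7 set: reset the serial port */
        m->shift = 0;
        m->count = 0;
        return;
    }

    /* Bit 0 is the data bit; the first bit written ends up as bit 0. */
    m->shift = (uint8_t)((m->shift >> 1) | ((value & 1) << 4));

    if (++m->count == 5) {         /* fifth write: latch into a register */
        m->regs[(addr >> 13) & 3] = m->shift;
        m->shift = 0;
        m->count = 0;
    }
}
```

Writing 1, 0, 1, 1, 0 to $8000 with this sketch leaves 01101 in register 0, matching the sequence listed above.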

Quote:
Quote:
That's about right.

About? What'd I miss?

Given that I myself have no way of buying NES development hardware, I can't test my answers on hardware before posting them as I do on gbadev.org, so I use weasel words to let people know that I haven't completely verified the behavior.

by Disch on 2006-03-28 (#11264)

Before worrying about medium-complexity mappers like MMC1, perhaps you should get more familiar with more basic mappers like 0, 2, 3.

Having existing experience with other mappers and other aspects of the NES will make understanding MMC1 much easier. But for now, I'd say "don't worry about it".

Quote:
What ranges in the CPU address range are mapped and which ones are real?

"Real" is a subjective term. EVERY address gets mapped to somewhere (unless it gets mapped to nowhere -- like if it's unused). Where it gets mapped to depends on the address itself. $0000-1FFF get routed to system RAM, $2000-3FFF get routed to PPU registers, $4000-40FF (roughly) get routed to CPU (and pAPU -- since the pAPU is part of the CPU) registers, and $4100-FFFF typically goes to the cartridge, however not all of that is used by the cartridge.

$0000-07FF isn't a special case or anything. The RAM doesn't really exist at that address -- but rather that address is mapped to the RAM. (I hope that doesn't confuse you even more).
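
The routing described here can be sketched in C as a simple range check. The `Region` enum and `cpu_region` are hypothetical names, and the exact boundary between the CPU/APU registers and the cartridge is an assumption: the sketch uses $4020, the figure from the earlier question, where this post only gives a rough range.

```c
#include <stdint.h>

typedef enum {
    REGION_RAM,        /* $0000-$1FFF: 2 KB of RAM, mirrored */
    REGION_PPU_REGS,   /* $2000-$3FFF: 8 PPU registers, mirrored */
    REGION_CPU_IO,     /* $4000-$401F: CPU/APU and controller registers */
    REGION_CARTRIDGE   /* $4020-$FFFF: whatever the cartridge maps there */
} Region;

Region cpu_region(uint16_t addr)
{
    if (addr < 0x2000) return REGION_RAM;
    if (addr < 0x4000) return REGION_PPU_REGS;
    if (addr < 0x4020) return REGION_CPU_IO;   /* assumed boundary */
    return REGION_CARTRIDGE;
}
```

Each region would then apply its own mask or handler, for example `addr & 0x07FF` for RAM and `addr & 7` for the PPU registers.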

by mozz on 2006-03-28 (#11266)

FreakSoftware wrote:
No, but running them in parallel a) is interesting b) can run faster.

It's likely to be slower if you use real OS threads. The context-switching and synchronization overhead would kill performance if you were switching every couple simulated cycles.

I suggest you think in terms of several simulated tasks which you switch between using co-operative multitasking (i.e. "green threads"). NT-based Windows offers some Win32 API functions for "fibers" which do this, but I would ignore those and do your own context-switching. As far as the OS is concerned, run your code in a single process/thread. That keeps things nice and deterministic. If you really want to, you could arrange to run image filters (HQ3X, Super 2xSAI, etc) of the previously-rendered frame in a separate thread. But I think that is truly a waste of effort, especially for something as simple as the NES.

The key realization for efficiently simulating multiple parallel hardware tasks (e.g. a CPU, a PPU and an APU) is that you DON'T have to simulate them in lock-step. You just need to make sure that side effects from one task onto another are observed in the correct order and at the correct (simulated) times. So each task can have its own idea of its own "current time" in some absolute sense. In other words, a cycle counter.

For example, you can let the simulation of the CPU "run ahead" of the simulation of the PPU, and when the CPU tries to write to a PPU address, you would switch tasks and simulate the PPU for a while until it "catches up" to where the CPU is. Then you would perform the write, and carry on. Another possibility (since the write is a one-directional effect) is to just remember (in a timestamped circular buffer, for example) the fact that a certain value was written to a PPU port at a certain time, and continue simulating the CPU. You wouldn't need to stop and switch to the PPU unless the CPU tried to *read* from a PPU register---in which case you'd need the PPU to "catch up" before you do the read, to ensure the correct value is available.
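
The timestamped circular buffer mentioned above might look something like this in C. It is a sketch only: the names `PendingWrite`, `WriteQueue`, `queue_push` and `queue_pop_due` and the queue size of 64 are all invented for the example, and timestamps are assumed to be pushed in increasing order.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 64u   /* assumed capacity; must be a power of two */

typedef struct {
    uint64_t time;    /* simulated time at which the CPU did the write */
    uint16_t addr;    /* PPU port that was written */
    uint8_t  value;
} PendingWrite;

typedef struct {
    PendingWrite items[QUEUE_SIZE];
    unsigned head;    /* index of the oldest entry (free-running) */
    unsigned tail;    /* index of the next free slot (free-running) */
} WriteQueue;

/* Record a write; returns false if the queue is full, in which case the
 * caller should let the PPU catch up before trying again. */
bool queue_push(WriteQueue *q, uint64_t time, uint16_t addr, uint8_t value)
{
    if (q->tail - q->head == QUEUE_SIZE)
        return false;
    q->items[q->tail % QUEUE_SIZE] = (PendingWrite){ time, addr, value };
    q->tail++;
    return true;
}

/* While the PPU advances to time `now`, fetch the oldest write that is due. */
bool queue_pop_due(WriteQueue *q, uint64_t now, PendingWrite *out)
{
    if (q->head == q->tail)
        return false;                         /* nothing pending */
    if (q->items[q->head % QUEUE_SIZE].time > now)
        return false;                         /* oldest write is in the future */
    *out = q->items[q->head % QUEUE_SIZE];
    q->head++;
    return true;
}
```

The CPU side calls `queue_push` on every PPU port write and keeps running. The PPU side drains entries with `queue_pop_due` as its own clock passes each timestamp, and a CPU read of a PPU register forces a full catch-up first.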

Note that unless you write (at least some parts of) your emulator in assembly code, things like context-switching are difficult. There are several emulators that simulate a pair of chips (say, a CPU and a sound chip) by having the CPU always stay "ahead of" the sound chip, and periodically the CPU emulation loop will call a function which simulates the sound chip until it catches up.

Which reminds me: whichever task you are simulating, you probably want to pause it every so often and let the other tasks run (i.e. simulate the CPU for a limited number of cycles, then stop). There are several reasons you might want to do this---one reason is to stop precisely at (or shortly before) the known time when a timer interrupt is going to occur. Another reason is to make sure that you don't run out of audio samples to send to the sound card. Etc.

Disclaimer: I have never written a complete emulator before, but I've given a lot of thought to how an efficient-but-accurate emulator could be written for the SNES. The SNES has a CPU and a sound chip which run on completely independent clocks. But I think the same ideas could be applied to the NES (and may even be overkill). I hope these ideas are useful to you.

EDIT: Let me quote blargg from this other thread http://nesdev.com/bbs/viewtopic.php?p=9201#9201, for he puts it more succinctly than I:
blargg wrote:
Any NES CPU emulator which includes the timestamp of memory accesses can be used as the basis for a "cycle-accurate" NES emulator. The general rule is, any number of hardware modules can be emulated on an as-needed ("catch-up") basis as long as the future effects of all but one module on others can easily be predicted in advance. This is the case for the NES, where the CPU is the only entity whose future effect can only be determined by doing the actual emulation.

by FreakSoftware on 2006-03-28 (#11278)

I appreciate your reply and will definitely refer back to it when the time comes. For now I am writing this all in a single thread, but before I can even get to the real code writing portion of it, I need to understand how it all works. I'm getting there. I'd guess I understand about 30% of it now, 40% of it is simply stuff I just need to read and re-read up on, and another 30% I need to continue asking about.