How is this method of timing?

by ens_leader on 2009-01-16 (#42034)

I came up with an interesting way of timing my 6502 CPU... Please let me know if this would work efficiently:

Code:

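void execute(void)
{
    /* Busy-wait until the real time taken by the last instruction has passed. */
    long start = get_time_in_microseconds();
    run_cpu_instruction();
    /* 1.79 MHz is about 1.79 cycles per microsecond, so divide the cycle count by it. */
    while (get_time_in_microseconds() - start < cycles_executed_from_last_instruction / 1.79) {}
}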

Would something like this work?

by Dwedit on 2009-01-16 (#42035)

Usually people use cycle countdown timers, which are decreased until the next "event" happens (such as the scanline changing, an interrupt firing, the PPU switching between backgrounds and sprites, etc.).
You could also use a timestamp and a "don't exceed this timestamp" limit instead of a countdown.
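
As a rough illustration of the countdown idea, here is a minimal sketch in C. Everything in it is an assumption made for the example: the `Countdown` struct, the stub CPU that always takes 3 cycles, and the round event period are hypothetical, not taken from any real emulator.

```c
#include <assert.h>

/* Hypothetical countdown state for a single recurring event. */
typedef struct {
    int cycles_until_event; /* CPU cycles left before the next event */
    int event_period;       /* CPU cycles between events (made-up value) */
    int events_fired;       /* number of events handled so far */
} Countdown;

/* Stand-in for a real CPU core: pretend every instruction takes 3 cycles. */
int run_cpu_instruction_stub(void)
{
    return 3;
}

/* Run instructions until `budget` cycles are used up, firing the event
 * each time the countdown reaches zero or goes below it. */
void run_for(Countdown *c, int budget)
{
    assert(c->event_period > 0);
    while (budget > 0) {
        int used = run_cpu_instruction_stub();
        budget -= used;
        c->cycles_until_event -= used;
        while (c->cycles_until_event <= 0) {
            c->events_fired++;
            /* Add the period instead of resetting, so any overshoot
             * carries over into the next countdown. */
            c->cycles_until_event += c->event_period;
        }
    }
}
```

Adding the period back, rather than resetting the counter, keeps the events evenly spaced even when an instruction overshoots the event by a cycle or two.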

by Disch on 2009-01-16 (#42036)

Interesting... but there are a few problems.

1) There's more to emulating than just the CPU, and those areas require more CPU time. You could still make this work if this is just one thread and you do everything else in separate threads, but you'd still have to sync them somehow.

2) It's opcode accurate and not cycle accurate. So if this method is used to sync with other subsystems of the NES (like the PPU), it wouldn't be as accurate as some alternatives, unless you spin in a loop like this per cycle, which I guess would work.

3) It's criminally inefficient. You're effectively making the computer spin in a clock checking loop which will burn 100% CPU time.

4) It makes fast-forward impossible. It also might have problems with frameskip, or if the emu can't run at full speed.

by ens_leader on 2009-01-17 (#42075)

Timestamp vs. countdown timer on event: which would be more accurate and efficient to implement? I just don't want to have to rewrite all this once I implement one solution.

by Disch on 2009-01-17 (#42078)

You don't need to space out instructions evenly over time -- it has no value. The only possible practical value it could have would be to handle joypad updates smoother, but since all games (except for one that I know of) poll joypad data once per frame, it's a moot point.

Nobody will notice if you do all the work for a frame as quickly as the computer is capable. What the user notices is the frames themselves. Therefore the generally accepted approach is to do things on a frame-by-frame basis. Rather than space out individual cycles or instructions, you just space out the frames as evenly as you can.

I tend to recommend the tried-and-true timestamp and catch-up approach. Keep a CPU timestamp which you update between instructions (or cycles), and keep timestamps for other systems (PPU, APU, mapper IRQ counters, etc) and use the timestamps to keep them synced up. Generally you run the CPU ahead of everything else, then when the CPU needs to interact with the PPU, you would "catch up" the PPU by running it until its timestamp reaches the CPU's timestamp.

You'd just run the CPU until its timestamp reaches the end of the frame, then catch up the PPU/APU/etc, output your frame of video and audio, then do the timing stuff and wait another 1/60th of a second before doing the next frame.

A properly designed timestamp and catchup system can be just as accurate as any other approach, and is far more efficient than some other approaches (though it may not be the most efficient -- these days the most efficient method would probably involve multiple threads).
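
The timestamp and catch-up approach described in this post could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Timing` struct, `cpu_step`, and `ppu_catch_up` are hypothetical names, the NTSC 3:1 ratio is hard-coded, and the catch-up is done after the whole instruction rather than at the exact cycle of the access.

```c
#include <assert.h>
#include <stdint.h>

enum { PPU_PER_CPU = 3 }; /* NTSC: 3 PPU cycles per CPU cycle */

/* Hypothetical timing state: one timestamp per subsystem. */
typedef struct {
    int64_t cpu_time; /* CPU timestamp, in CPU cycles */
    int64_t ppu_time; /* PPU timestamp, in PPU cycles */
} Timing;

/* Run the PPU until its timestamp reaches the CPU's timestamp. */
void ppu_catch_up(Timing *t)
{
    int64_t target = t->cpu_time * PPU_PER_CPU;
    assert(t->ppu_time <= target); /* the PPU never runs ahead of the CPU */
    while (t->ppu_time < target) {
        /* A real emulator would render one PPU cycle here. */
        t->ppu_time++;
    }
}

/* Account for one CPU instruction of `cycles` cycles. The PPU is only
 * caught up when the instruction actually touches a PPU register. */
void cpu_step(Timing *t, int cycles, int touches_ppu)
{
    assert(cycles > 0);
    t->cpu_time += cycles;
    if (touches_ppu) {
        ppu_catch_up(t);
    }
}
```

Instructions that never touch the PPU cost nothing beyond the timestamp update, which is where the efficiency of the approach comes from.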

by tepples on 2009-01-17 (#42080)

Timing accuracy in an emulator is measured relatively, where one chip should execute x cycles for every y cycles of another chip. For example, the NTSC NES's PPU executes 3 cycles for each cycle of the CPU and APU. Emulators often handle this with a timestamping scheme: the CPU runs "ahead" of the other units, and any writes to PPU or APU registers are logged with a cycle count. When the CPU does anything that depends on the precise state of the PPU and APU, the emulator runs the PPU and APU to "catch up" to the number of cycles that the CPU has run. These include reading PPUSTATUS ($2002) while in range of sprite 0, or executing just before other units are expected to issue an interrupt.

Accuracy does not depend on how many cycles of the emulated machine are executed for each cycle of the host CPU, except at one point: just before the first cycle of the "dummy" or "post-render" scanline (#240). That's when your emulator should catch up all units and then wait for DirectX (or another platform's counterpart) to make sure it's clear to send the finished pixels and samples for the frame that it just rendered.
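
A minimal sketch of the once-per-frame wait, assuming a plain 60 frames per second as used elsewhere in this thread and a monotonic clock in nanoseconds. The function names are hypothetical, and the actual sleep or vsync call is left to whatever the platform provides.

```c
#include <assert.h>
#include <stdint.h>

enum { FRAMES_PER_SECOND = 60 };       /* assumed target frame rate */
#define NS_PER_SECOND INT64_C(1000000000)

/* Absolute time at which frame number `frame_index` should be presented.
 * Computing it from the start time, instead of adding 1/60 s each frame,
 * keeps integer rounding errors from accumulating. */
int64_t frame_deadline_ns(int64_t start_ns, int64_t frame_index)
{
    assert(frame_index >= 0);
    return start_ns + frame_index * NS_PER_SECOND / FRAMES_PER_SECOND;
}

/* How long to wait before presenting; zero if the emulator is running
 * late, so a slow host never blocks on a deadline that has passed. */
int64_t wait_needed_ns(int64_t now_ns, int64_t deadline_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}
```

With this shape, fast-forward amounts to skipping the wait, which is one of the things the per-instruction busy loop could not do.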

EDIT: Disch posted while I was typing this. I'll add more:

As for multiple threads, PCs have been fast enough for the past eight years that one core can handle a single NES. If you want to make your emulator thread-safe, it might be for handling multiple consoles at once, such as nemulator's Wii-reminiscent picker.

by ens_leader on 2009-01-17 (#42082)

Disch wrote:
You don't need to space out instructions evenly over time -- it has no value. The only possible practical value it could have would be to handle joypad updates smoother, but since all games (except for one that I know of) poll joypad data once per frame, it's a moot point.

Nobody will notice if you do all the work for a frame as quickly as the computer is capable. What the user notices is the frames themselves. Therefore the generally accepted approach is to do things on a frame-by-frame basis. Rather than space out individual cycles or instructions, you just space out the frames as evenly as you can.

I tend to recommend the tried-and-true timestamp and catch-up approach. Keep a CPU timestamp which you update between instructions (or cycles), and keep timestamps for other systems (PPU, APU, mapper IRQ counters, etc) and use the timestamps to keep them synced up. Generally you run the CPU ahead of everything else, then when the CPU needs to interact with the PPU, you would "catch up" the PPU by running it until its timestamp reaches the CPU's timestamp.

You'd just run the CPU until its timestamp reaches the end of the frame, then catch up the PPU/APU/etc, output your frame of video and audio, then do the timing stuff and wait another 1/60th of a second before doing the next frame.

A properly designed timestamp and catchup system can be just as accurate as any other approach, and is far more efficient than some other approaches (though it may not be the most efficient -- these days the most efficient method would probably involve multiple threads).

Ok that sounds very efficient to me, I will start experimenting with that.

I would definitely like to work with threads, but I want to keep my code portable (the language is C), and I'd especially like it to work on mobile devices as well in a couple of years, maybe. Who knows. Thanks, I'll let you know how it turns out in the coming weeks/months.

by ens_leader on 2009-01-18 (#42103)


Disch -

I read your post on
http://nesdev.com/bbs/viewtopic.php?t=3720

and I had a question regarding your "master cycles" concept.

I noticed you used 5 for the PPU and 15 for the CPU. This makes sense since the PPU runs at 3 times the speed of the CPU. What would be the benefit of using 5 and 15 instead of, say, 1 and 3? Wouldn't 1 and 3 be just as efficient? Or even 10 and 30 to make things even.

I was just curious why you used 5 and 15 over the other numbers for representing ppu and cpu.

Thanks in advance

by Disch on 2009-01-18 (#42106)

ens_leader wrote:
I was just curious why you used 5 and 15 over the other numbers for representing ppu and cpu.

Thanks in advance

The NTSC ratio is 3:1. The PAL ratio is 3.2:1 (CPU is a little slower). To represent this in integers, rather than dealing with potentially lossy floating points, I use the following constants:

PPU (NTSC and PAL) = 5
CPU (NTSC) = 15
CPU (PAL) = 16

This provides the proper PPU:CPU cycle ratio for both NTSC and PAL emulation.
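
To make the arithmetic concrete, here is a small sketch using the constants above. The enum names and the helper function are hypothetical; only the values 5, 15 and 16 come from this post.

```c
#include <assert.h>
#include <stdint.h>

/* Master cycles per unit cycle, as listed above. */
enum {
    MASTER_PER_PPU      = 5,  /* NTSC and PAL */
    MASTER_PER_CPU_NTSC = 15,
    MASTER_PER_CPU_PAL  = 16
};

/* Number of whole PPU cycles that fit in `cpu_cycles` CPU cycles.
 * Integer division drops any partial PPU cycle, which stays behind as
 * leftover master cycles when timestamps are kept in master cycles. */
int64_t ppu_cycles_elapsed(int64_t cpu_cycles, int master_per_cpu)
{
    assert(cpu_cycles >= 0);
    return cpu_cycles * master_per_cpu / MASTER_PER_PPU;
}
```

Five NTSC CPU cycles give exactly 15 PPU cycles (3:1) and five PAL CPU cycles give exactly 16 (3.2:1), with no floating point involved. A single PAL CPU cycle covers 3 whole PPU cycles plus 1 master cycle left over.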

by Dwedit on 2009-01-18 (#42107)

If the only fractions you are using with floating point numbers are 16ths, floating point math is lossless up to integer values of 1048576 (2^20, assuming 32-bit floats with their 24-bit significand).

by Disch on 2009-01-18 (#42108)

3.2 is fifths, not sixteenths. :P

But yeah... I didn't mean to imply that floating points are unreliable -- I'm just saying that for something like this where exact precision is everything, integers just seem like the better option. Exact precision isn't something you can always get with floating points.

by Dwedit on 2009-01-18 (#42109)

Gah...
I was recently doing a bunch of fixed point math on the reciprocal of 3.2 (5/16), so all that was fresh on my mind as I made that post.

by JohnPublic on 2009-02-06 (#42961)

Disch -

Quote:
You'd just run the CPU until its timestamp reaches the end of the frame, then catch up the PPU/APU/etc, output your frame of video and audio, then do the timing stuff and wait another 1/60th of a second before doing the next frame.

1/60th of a second? Isn't NTSC 30fps, hence wait 1/30th of a second?

by Disch on 2009-02-06 (#42963)

Interleaved is 30 Hz. But progressive scan is 60 Hz. NES outputs progressive (60 full frames every second).

Though even in interleaved, the framerate would still sort of be 60 Hz -- it's just that you only output half the frame each time instead of the full frame each time. Interleaved is like a tradeoff -- half the framerate for twice the vertical resolution.

edit:

or is the term "interlaced" and not interleaved? Whatever.. same difference.

by koitsu on 2009-02-06 (#42972)

Disch wrote:
or is the term "interlaced" and not interleaved? Whatever.. same difference.

Interlaced is the correct term here. Reference material:

http://en.wikipedia.org/wiki/Interlace
http://www.labdv.com/leon-lab/video/interlace_en.htm

Most consoles (including many today!) use interlaced output (odd first, even second, I think -- or do I have the order reversed?). The assumption made is that the connected device is a TV, or otherwise will do deinterlacing itself.

Visual example of what I'm referring to:

http://videoanimal.files.wordpress.com/2008/03/de-interlaced.png
http://www.elurauser.com/articles/deinterlace_weave_lq.jpg

It's pretty sad how much video there is on the Internet which is intended for computer monitor use, yet remains interlaced. Most users don't seem to understand that the results look horrible. :-(

by ens_leader on 2009-02-07 (#42993)

tepples wrote:
Timing accuracy in an emulator is measured relatively, where one chip should execute x cycles for every y cycles of another chip. For example, the NTSC NES's PPU executes 3 cycles for each cycle of the CPU and APU. Emulators often handle this with a timestamping scheme: the CPU runs "ahead" of the other units, and any writes to PPU or APU registers are logged with a cycle count. When the CPU does anything that depends on the precise state of the PPU and APU, the emulator runs the PPU and APU to "catch up" to the number of cycles that the CPU has run. These include reading PPUSTATUS ($2002) while in range of sprite 0, or executing just before other units are expected to issue an interrupt.

Accuracy does not depend on how many cycles of the emulated machine are executed for each cycle of the host CPU, except at one point: just before the first cycle of the "dummy" or "post-render" scanline (#240). That's when your emulator should catch up all units and then wait for DirectX (or another platform's counterpart) to make sure it's clear to send the finished pixels and samples for the frame that it just rendered.

EDIT: Disch posted while I was typing this. I'll add more:

As for multiple threads, PCs have been fast enough for the past eight years that one core can handle a single NES. If you want to make your emulator thread-safe, it might be for handling multiple consoles at once, such as nemulator's Wii-reminiscent picker.

What happens if the CPU master cycle is on the second-to-last pixel in the frame and the CPU then executes a 3-cycle instruction that bleeds into the next frame? Should the PPU 'ignore' that instruction, catch up, and render the full 1/60th-second frame while missing the last two pixels? I ask because the CPU won't always end on the last pixel of the frame, given variable opcode cycle counts and such.

by Disch on 2009-02-07 (#43002)

The PPU would run until it reaches the timestamp of the CPU write. If the write occurs past the end of the frame, then the PPU would run through the full frame before the write is performed, and its timestamp would be into the next frame.

The "spillover" would need to be retained by all subsystems to keep the frames the right length. If a frame is (262*341*5) 446710 master cycles, then you would subtract 446710 from all of your timestamps at the end of the frame to adjust the timestamps for next frame. You would not reset timestamps to zero because then you lose the spillover.

Partly because of this, I arrange my frame like so:

--------------------
1 'idle' scanline
20 (NTSC) / 70 (PAL) VBlank scanlines
1 prerender scanline
240 rendered scanlines
--------------------

With the idle scanline and vblank scanlines first -- this makes it easy to allow writes like the example you described. Since the PPU is inactive for the time past the end of the frame, these writes can be allowed without disrupting rendering. And you don't have to worry about rendering the next frame before applying the write.
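
The spillover adjustment described in this post might look something like the sketch below. The `Timestamps` struct and `end_frame` are hypothetical names, and the frame length is the NTSC figure of 262*341*5 master cycles given above.

```c
#include <assert.h>
#include <stdint.h>

/* NTSC frame length in master cycles: 262 scanlines * 341 PPU cycles * 5. */
enum { FRAME_MASTER_CYCLES = 262 * 341 * 5 };

/* Hypothetical set of per-subsystem timestamps, all in master cycles. */
typedef struct {
    int64_t cpu;
    int64_t ppu;
    int64_t apu;
} Timestamps;

/* Rebase every timestamp for the next frame. Subtracting the frame
 * length, rather than resetting to zero, keeps whatever ran past the
 * end of the frame (the spillover). Assumes every subsystem has already
 * been caught up to at least the end of the frame. */
void end_frame(Timestamps *t)
{
    assert(t->cpu >= FRAME_MASTER_CYCLES);
    assert(t->ppu >= FRAME_MASTER_CYCLES);
    assert(t->apu >= FRAME_MASTER_CYCLES);
    t->cpu -= FRAME_MASTER_CYCLES;
    t->ppu -= FRAME_MASTER_CYCLES;
    t->apu -= FRAME_MASTER_CYCLES;
}
```

If the CPU's last instruction ran 25 master cycles past the end of the frame, its timestamp starts the next frame at 25 instead of 0, so frames stay the right length on average.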