Reliability of raster timing

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#40562)
Hello guys,

I am currently developing an NES game using the MMC3 mapper. It is basically a typical jump'n'run, with the screen split between the play area on top and a status display at the bottom.

For the raster split, I am using the MMC3 IRQ feature, which works quite well.

One feature of my raster routine is that it changes the palette mid-frame, so the status display gets a completely independent palette. As we all (probably) know, changing the palette requires turning off PPU rendering, which in turn means that writing the new colours produces coloured stripes in the blanked area. Fortunately, I was able to time my routine so that those stripes are hidden inside HBLANK.
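
To give an idea of how small that window is, here is a rough back-of-the-envelope sketch in C. The dot and cycle figures are the commonly cited NTSC values; treat the writes-per-line count as an approximation, not a guarantee.

Code:
/* Rough NTSC timing budget for hiding mid-frame palette writes in HBLANK. */
#include <stdio.h>

int main(void)
{
    const double dots_per_line = 341.0; /* PPU dots per scanline            */
    const double visible_dots  = 256.0; /* dots that actually output pixels */
    const double dots_per_cpu  = 3.0;   /* NTSC; PAL uses 3.2               */

    double blank_cpu = (dots_per_line - visible_dots) / dots_per_cpu;

    /* STA $2007 is a 4-cycle absolute store, so only a handful of
       palette writes fit into the blanked part of each line. */
    printf("blank window: %.1f CPU cycles -> ~%d writes per line\n",
           blank_cpu, (int)(blank_cpu / 4.0));
    return 0;
}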

I was able to get a stable routine that runs very well on both Nintendulator and Nestopia, which both seem to emulate the PPU behaviour correctly. However, I still haven't built my MMC3 devcart to test my routines on real hardware. So my question is: how reliable are these two emulators for writing timing-intensive raster routines?

by on (#40576)
Nestopia is very accurate; it even emulates the imperfections of the MMC3's counter quite well.
But you'd still want to test on real hardware after you've developed the routine, and possibly fine-tune the timing a little.

by on (#40586)
Quote:
Nestopia is very accurate; it even emulates the imperfections of the MMC3's counter quite well.
But you'd still want to test on real hardware after you've developed the routine, and possibly fine-tune the timing a little.


Absolutely, since there are slight differences between Nestopia and Nintendulator: in order to produce a clean, stable raster split on Nintendulator, I need to delay the writes to palette RAM by one more cycle than on Nestopia. But currently it looks perfect on all three major emulators (even FCE Ultra), so I am guessing (hoping) that it will work equally well on real hardware.

What I am most proud of is that the routine works equally well in both PAL and NTSC mode (after lots of tweaking), despite PAL having significantly fewer CPU cycles per line. Coming from the C64, this situation is much worse than the 63 vs. 64 vs. 65 cycles-per-line issue on the C64.

Now the other issue is: should homebrew software be made so it can run on emulators, or should it further explore the quirks of the original hardware.... ;)

by on (#40588)
Quote:
Now the other issue is: should homebrew software be made so it can run on emulators, or should it further explore the quirks of the original hardware.... ;)

I defy you to find things modern emulators don't implement, except a recently found trick about $2004 reading, which I don't think any emulator has implemented yet.

by on (#40589)
Bregalad wrote:
I defy you to find things modern emulators don't implement, except a recently found trick about $2004 reading, which I don't think any emulator has implemented yet.


Interesting answer, considering that in your previous post you recommended checking my timing against real hardware.

As soon as you deliberately modify registers during screen rendering, especially in the middle of a scanline, I am quite sure one can produce code that will not display correctly in an emulator.

Even on the best-explored systems, like the C64 or the Atari VCS, which absolutely require 99.9% perfect timing in order to run everything, there is still untapped potential left. I rather doubt that NES emulation has reached the same level of perfection, though Nestopia is a huge step forward compared to emulators like FCE Ultra.

Don't take it the wrong way, I don't mean to bash anyone. But I sense a lot of untapped potential in the NES. The one thing that undermines any attempt to unveil this potential is the huge difference between PAL and NTSC consoles: your timing will always be off depending on the region you develop for. Why on earth did Nintendo choose to do this? They could easily have divided the 26.58 MHz master clock by 15, and PAL users would have had a 1.77 MHz CPU speed.

Anyway, thanks for your help! :D

by on (#40591)
Quote:
Interesting answer, considering that in your previous post you recommended checking my timing against real hardware.

Of course you can't know whether it will work on real hardware before trying. But it's now hard to encounter a situation where Nestopia or Nintendulator are "wrong". Maybe if you play a lot with the registers as you said, but that will not output anything useful on the screen, so there is no point in doing it except for tests.
Quote:
Even on the best-explored systems, like the C64 or the Atari VCS, which absolutely require 99.9% perfect timing in order to run everything, there is still untapped potential left. I rather doubt that NES emulation has reached the same level of perfection, though Nestopia is a huge step forward compared to emulators like FCE Ultra.

This is possible, but I still defy you to find anything really new. I'm not saying it isn't possible; I'm just defying you.

by on (#40593)
Bregalad wrote:
Quote:
Interesting answer, considering that in your previous post you recommended checking my timing against real hardware.

Of course you can't know whether it will work on real hardware before trying. But it's now hard to encounter a situation where Nestopia or Nintendulator are "wrong".


Hmm, I have already encountered a case where the two emulators behave slightly differently: in order to hide the palette-write artifacts on Nintendulator, I have to add one extra cycle. Furthermore, the artifacts caused by updating $2006 too late (from when I was still figuring out the timing) look slightly different on the two emulators. So logically, one or both of those emulators must be wrong.

But at least they DO implement a pixel-based PPU renderer, which makes them behave close to each other. Otherwise, this kind of tweaking wouldn't have been possible at all. Even the worst flickering raster looks perfectly stable on FCE Ultra... ;)

Quote:
Maybe if you play a lot with the registers as you said, but that will not output anything useful on the screen, so there is no point in doing it except for tests.


Ummm, what makes you so sure? For example, have you ever played with the colour emphasis bits in the middle of a scanline? ;)

The reason Atari VCS and C64 emulators are so accurate is that lots of people tried lots of weird things on those machines over their lifetimes, with no one asking about "usefulness".

Quote:
This is possible, but I still defy you to find anything really new. I'm not saying it isn't possible; I'm just defying you.


Hmm, guess I need to collect some parts to build my devcart. ;)

by on (#40597)
6502freak wrote:
Why on earth did Nintendo choose to do this? They could easily have divided the 26.58 MHz master clock by 15, and PAL users would have had a 1.77 MHz CPU speed.

When Nintendo designed the Famicom, they probably had no intention of bringing it to the States, much less to Europe. Also, 26.58 MHz won't divide neatly to the NTSC colorburst (and multiples of it) for the PPU. Anyway, the CPU speed hardly matters in porting, with the completely different video timing and all.

by on (#40601)
kyuusaku wrote:
When Nintendo designed the Famicom, they probably had no intention of bringing it to the States, much less to Europe. Also, 26.58 MHz won't divide neatly to the NTSC colorburst


Yes, but it does divide to the PAL colourburst. PAL NES systems are clocked with a 26.58 MHz master clock: 26.58 MHz / 6 = 4.43 MHz = PAL colorburst.

NTSC NES systems are clocked with a 21.48 MHz master clock: 21.48 MHz / 6 = 3.58 MHz = NTSC colorburst.

Now, the PAL pixel clock is generated as:

26.58 MHz / 5 = 5.31 MHz

So far they solved this most elegantly, because the 5.31 MHz PAL pixel clock is very close to the NTSC pixel clock of 5.37 MHz (21.48 MHz / 4). With 340 PPU pixels per line, the PAL NES yields a 15.62 kHz line rate and 50.1 Hz using 312 lines.

But why on earth did they choose 26.58 MHz / 16 = 1.66 MHz instead of 26.58 MHz / 15 = 1.77 MHz, which would be nearly identical to the NTSC CPU speed (21.48 MHz / 12 = 1.79 MHz)? Most PAL/NTSC conversion problems would have been eliminated: the sound hardware wouldn't have needed a PAL update, because the difference between 1.77 and 1.79 MHz is hardly noticeable, and the formula 1 CPU cycle = 3 PPU pixels would have been the same on PAL and NTSC.

Quote:
(and multiples of it) for the PPU. Anyway, the CPU speed hardly matters in porting, with the completely different video timing and all.


Once you start writing timing-intensive code, it matters a lot, because on PAL you have quite a few fewer CPU cycles per line than on NTSC. Of course you would still have 50 Hz vs. 60 Hz, but at least the video AND the CPU timing would be nearly identical.
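
To put numbers on that, here is a quick sketch in C, assuming the usual figures of 341 PPU dots per line and a CPU/PPU clock ratio of 3 on NTSC versus 3.2 on PAL:

Code:
/* CPU cycles per scanline in each region, from the clock ratios above. */
#include <stdio.h>

int main(void)
{
    printf("NTSC: %.2f CPU cycles per line\n", 341.0 / 3.0); /* ~113.67 */
    printf("PAL:  %.2f CPU cycles per line\n", 341.0 / 3.2); /* ~106.56 */
    return 0;
}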

by on (#40612)
6502freak wrote:
But why on earth did they choose 26.58 MHz / 16 = 1.66 MHz instead of 26.58 MHz / 15 = 1.77 MHz, which would be nearly identical to the NTSC CPU speed (21.48 MHz / 12 = 1.79 MHz)? Most PAL/NTSC conversion problems would have been eliminated: the sound hardware wouldn't have needed a PAL update, because the difference between 1.77 and 1.79 MHz is hardly noticeable, and the formula 1 CPU cycle = 3 PPU pixels would have been the same on PAL and NTSC.


If I were to take a guess from what I learned in my digital systems lab, it allowed them to avoid a 4-input AND gate as the input to the clock divider's reset (/12: 4-bit divider, AND Q2 and Q3 and use that as the reset signal; /16: no reset signal at all; /15: must AND all 4 outputs). Since the NES dates to the era when ICs were designed almost purely by hand, I'm guessing it was easier to remove the (N)AND gate than to replace it with one approximately twice the size.
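
For anyone who wants to see the gate-count argument in action, here is a toy model in C (illustrative only, of course; the real divider is silicon, not software):

Code:
/* A 4-bit counter divides by 16 for free. Decoding a terminal count as a
   reset turns it into /12 (12 = 1100, the first count with Q3 and Q2 both
   set, so a 2-input gate suffices) or /15 (15 = 1111 needs all four
   outputs ANDed). */
#include <stdio.h>

static double divided_mhz(double clk_mhz, int reset_at)
{
    const int ticks = 1000000;
    int q = 0, pulses = 0;
    for (int i = 0; i < ticks; i++) {
        q = (q + 1) & 0x0F;  /* free-running 4-bit counter            */
        if (q == reset_at)   /* the decoded reset; never fires for 16 */
            q = 0;
        if (q == 0)
            pulses++;        /* one output pulse per wrap             */
    }
    return clk_mhz * pulses / ticks;
}

int main(void)
{
    printf("26.58 MHz /16 = %.3f MHz\n", divided_mhz(26.58, 16)); /* 1.661 */
    printf("26.58 MHz /15 = %.3f MHz\n", divided_mhz(26.58, 15)); /* 1.772 */
    printf("21.48 MHz /12 = %.3f MHz\n", divided_mhz(21.48, 12)); /* 1.790 */
    return 0;
}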

by on (#40620)
Quote:
Ummm, what makes you so sure? For example, have you ever played with the colour emphasis bits in the middle of a scanline? ;)

Yes. Commercial games did so too, and it takes effect instantly.

For NTSC and PAL, the rule is: if the VBlank code works on NTSC, it's possible to make it work on PAL; and if the raster timing works on PAL, it's possible to make it work on NTSC.
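
The budgets explain why: PAL has a much longer VBlank (roughly 70 blank lines versus 20 on NTSC), while NTSC has more CPU cycles per scanline. A rough sketch in C, using the same wiki figures as above:

Code:
/* Approximate VBlank CPU budgets, NTSC vs. PAL. */
#include <stdio.h>

int main(void)
{
    const double ntsc_line = 341.0 / 3.0; /* ~113.7 CPU cycles per line */
    const double pal_line  = 341.0 / 3.2; /* ~106.6 CPU cycles per line */

    printf("NTSC VBlank: ~%.0f CPU cycles (20 lines)\n", 20 * ntsc_line);
    printf("PAL  VBlank: ~%.0f CPU cycles (70 lines)\n", 70 * pal_line);
    return 0;
}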

by on (#40623)
lidnariq wrote:
If I were to take a guess from what I learned in my digital systems lab, it allowed them to avoid a 4-input AND gate as the input to the clock divider's reset (/12: 4-bit divider, AND Q2 and Q3 and use that as the reset signal; /16: no reset signal at all; /15: must AND all 4 outputs).


I very much doubt that a single tiny gate is the reason for this.

Quote:
Since the NES dates to the era when ICs were designed almost purely by hand, I'm guessing it was easier to remove the (N)AND gate than to replace it with one approximately twice the size.


That doesn't convince me, because the PAL colour encoding alone is more complex to implement than the NTSC one (you have to shift the phase on every odd line), so it required quite a rework anyway. There are also other changes in the PAL PPU; for example, the 341/340-cycle toggling is not present.

In the end, we'll never know for sure.

by on (#40624)
Bregalad wrote:
Quote:
Ummm, what makes you so sure? For example, have you ever played with the colour emphasis bits in the middle of a scanline? ;)

Yes. Commercial games did so too, and it takes effect instantly.


Which commercial games change the emphasis bits in the MIDDLE OF A SCANLINE? Note: SCANLINE, not SCREEN. ;)

by on (#40625)
6502freak wrote:
Which commercial games change the emphasis bits in the MIDDLE OF A SCANLINE? Note: SCANLINE, not SCREEN. ;)

At least Final Fantasy.

by on (#40628)
6502freak wrote:
That doesn't convince me, because the PAL colour encoding alone is more complex to implement than the NTSC one (you have to shift the phase on every odd line), so it required quite a rework anyway. There are also other changes in the PAL PPU; for example, the 341/340-cycle toggling is not present.

In the end, we'll never know for sure.


I think the PAL change is easy, and requires very little space... but I need to do some research on the wiki before I mouth off.

The CPU and PPU were entirely separate, and may well have been updated by entirely different teams. As far as I know, the only differences in the PAL CPU were that divide-by-16 and the changed lookup tables for noise and DPCM. The sound generator had already been fit onto the CPU by pulling out the 6502's BCD mode, so it's not altogether unreasonable to think that space was at a premium on the CPU die.

Some more guesses, then: the division needs to be even, because the hardware is tremendously easier for a /6 or /8 that produces the high- and low-going edges of the output clock, compared to two different comparisons that set it high and low respectively. Or the duty cycle of the resulting clock needed to be 50%, not 46% (a /15 counter spends 8 states high and 7 low, i.e. 7/15 ≈ 46.7%). Or they were lazy and removed the AND gate instead of drawing new silicon.

But yeah, we can only take educated guesses.

by on (#40630)
Bregalad wrote:
6502freak wrote:
Which commercial games change the emphasis bits in the MIDDLE OF A SCANLINE? Note: SCANLINE, not SCREEN. ;)

At least Final Fantasy.


It changes the monochrome bit mid-scanline, but I'm reasonably sure it doesn't change the emphasis bits mid-scanline (or even mid-frame).

by on (#40634)
True, but the two are related and have the same effect.