OpenCL anyone ever try it ?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#87896)
I wonder if anyone has ever tried using OpenCL to emulate, let's say, a PPU?
Would that even be possible? Practical?

by on (#87897)
well, it's possible but no it's not practical at all. the CPU and PPU in the NES need to be closely synchronized for games to run properly. you can't really separate them like that. that's why you can't effectively write a multi-threaded NES emulator with the CPU and PPU in separate threads.

almost all of the time is going to be spent on one waiting for the other to do something so you might as well just simplify things and keep them in the same thread. besides, today's CPUs are so powerful. there's no reason you would need to offload the PPU work.

i used to play NES games in FCEU at full speed on a Pentium 1 MMX at 233 MHz.

by on (#87899)
You could do it, but it'd be a huge waste since, as miker00lz said, the NES is not a machine where emulation benefits much from parallelism. You could perhaps offload some tasks, but as stated, a lot of time would be wasted with one component waiting for a response from the other, which means performance may actually be much worse.

by on (#87901)
miker00lz wrote:
well, it's possible but no it's not practical at all. the CPU and PPU in the NES need to be closely synchronized for games to run properly.

True, but the only points of synchronization are
  1. when the upper bits of $2002 change (two to four times per frame, and fairly predictable),
  2. when the CPU writes to $2000-$2007 (not very often barring raster effects, except during vblank when the sync need not be as tight),
  3. when the PPU fires an NMI (once per frame, and predictable), and
  4. when the PPU's memory access pattern causes the mapper to fire an IRQ (not very often barring raster effects, and still usually predictable).
These are why catch-up style emulation works. A multithreaded emulator would just do all the catching up in a separate thread that waits for batches of timestamped writes from the CPU.
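
To make the catch-up idea concrete, here is a minimal sketch, not taken from any real emulator. The names `CatchUpPPU`, `run_until`, `write`, and `drain` are hypothetical, the "rendering" is reduced to advancing a cycle counter, and the 3:1 PPU-to-CPU ratio is the NTSC one mentioned elsewhere in this thread. It only shows the PPU being run forward to the timestamp of each register write instead of every cycle.

```python
from dataclasses import dataclass, field

PPU_CYCLES_PER_CPU_CYCLE = 3  # assumption: NTSC ratio, as quoted in this thread


@dataclass
class CatchUpPPU:
    """Hypothetical PPU stand-in that only runs when something forces it to."""
    cycle: int = 0                            # PPU cycles emulated so far
    regs: dict = field(default_factory=dict)  # last value written per register
    log: list = field(default_factory=list)   # (ppu_cycle, addr, value) as applied

    def run_until(self, target_cycle):
        # A real PPU would render dots here; this sketch only advances the clock.
        if target_cycle > self.cycle:
            self.cycle = target_cycle

    def write(self, cpu_cycle, addr, value):
        # Catch up to the moment of the write, then apply it.
        self.run_until(cpu_cycle * PPU_CYCLES_PER_CPU_CYCLE)
        self.regs[addr] = value
        self.log.append((self.cycle, addr, value))


def drain(ppu, batch, end_cpu_cycle):
    """Apply a batch of (cpu_cycle, addr, value) writes, then finish the slice."""
    for cpu_cycle, addr, value in batch:
        ppu.write(cpu_cycle, addr, value)
    ppu.run_until(end_cpu_cycle * PPU_CYCLES_PER_CPU_CYCLE)


ppu = CatchUpPPU()
drain(ppu, [(10, 0x2000, 0x80), (25, 0x2005, 0x00)], end_cpu_cycle=100)
```

Reads of $2002, NMIs, and mapper IRQs would need the same kind of catch-up in the other direction, which this sketch leaves out.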

Quote:
besides, today's CPUs are so powerful

Except on mobile, where you want to reduce the clock frequency even as you increase the cores so that the backlight remains the biggest current sink. Or except if you want to do something like a Wii Menu or a virtual arcade with twelve or more simultaneous emulation instances.

Quote:
i used to play NES games in FCEU at full speed on a Pentium 1 MMX at 233 MHz.

True, but the more accurate "new PPU" of FCEUX needs a faster host CPU. Even so, Nestopia still fits into a P3 at 866 MHz.

by on (#87903)
It's not like any emulator couples timing at the hardware level, so I don't see how an emulator WOULDN'T benefit from parallelism. An efficient emulator uses the catch-up method, where the majority of time is spent out of sync, so it can benefit from separate threads. Plus, a thread for the APU and resampler would be smart, since accurate audio probably eats more CPU time than the CPU and PPU together.
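
As a rough illustration of the separate-thread arrangement, the sketch below has a worker thread consume batches of timestamped writes from a queue. Everything in it is an assumption made for illustration: the names `ppu_worker` and `run_frames`, the batch layout `(end_cpu_cycle, writes)`, and the `None` sentinel. It uses Python's standard `queue` and `threading` modules only to show the structure; Python threads would not give a real speedup for this kind of work.

```python
import queue
import threading

PPU_CYCLES_PER_CPU_CYCLE = 3  # assumption: NTSC ratio


def ppu_worker(batches, result):
    """Hypothetical PPU thread: block until a batch arrives, then catch up."""
    ppu_cycle = 0
    while True:
        batch = batches.get()
        if batch is None:  # sentinel: the CPU side is finished
            break
        end_cpu_cycle, writes = batch
        for cpu_cycle, addr, value in writes:
            # Advance to the write's timestamp (rendering omitted), then apply it.
            ppu_cycle = max(ppu_cycle, cpu_cycle * PPU_CYCLES_PER_CPU_CYCLE)
            result["applied"].append((ppu_cycle, addr, value))
        ppu_cycle = max(ppu_cycle, end_cpu_cycle * PPU_CYCLES_PER_CPU_CYCLE)
    result["ppu_cycle"] = ppu_cycle


def run_frames(frames):
    """CPU side: hand each batch to the PPU thread and keep going."""
    batches = queue.Queue()
    result = {"applied": [], "ppu_cycle": 0}
    worker = threading.Thread(target=ppu_worker, args=(batches, result))
    worker.start()
    for batch in frames:
        batches.put(batch)
    batches.put(None)
    worker.join()
    return result


result = run_frames([(100, [(10, 0x2000, 0x80)]), (200, [(150, 0x2001, 0x1E)])])
```

Anything the CPU needs back from the PPU, such as a $2002 read, would force the CPU side to wait for the worker, which is where the stalls described earlier in the thread come from.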

by on (#87909)
Is one core enough to emulate the NES in a cycle-perfect way? Say the PPU does 3 cycles for every 1 CPU cycle: would I switch to the CPU for 1 cycle (sub-instruction accuracy) and then to the PPU for 3?
(Say I have a quad-core i7 with HT running at 4.3 GHz.)

by on (#87910)
Sure, you can emulate NES at the cycle level on a sub-GHz machine if optimized right (catch-up).
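
For a back-of-the-envelope feel for the budget, the sketch below divides a host clock by the NTSC PPU dot rate. It assumes the NTSC master clock of about 21.477272 MHz with the PPU running at one quarter of it; the function name `host_cycles_per_ppu_dot` is made up for this example, and host cycles are only a crude proxy for real work done per dot.

```python
NTSC_MASTER_HZ = 21_477_272        # assumption: NTSC master clock
PPU_HZ = NTSC_MASTER_HZ / 4        # PPU dot clock, about 5.37 MHz
CPU_HZ = NTSC_MASTER_HZ / 12       # CPU clock, about 1.79 MHz


def host_cycles_per_ppu_dot(host_hz):
    """Host clock cycles available per emulated PPU dot at full speed."""
    return host_hz / PPU_HZ


# The two host machines mentioned in this thread.
p3_budget = host_cycles_per_ppu_dot(866e6)   # P3 at 866 MHz
i7_budget = host_cycles_per_ppu_dot(4.3e9)   # i7 at 4.3 GHz
```

Even the sub-GHz machine gets on the order of a hundred-plus host cycles per dot, which is why single-core cycle-level emulation is plausible.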

by on (#87916)
The only case I can see for using parallelism with the NES is something like what we saw a while back, where a ton of NES games are emulated at once. Then I suppose performance counts. But for any other purpose, any modern machine except those mobile devices should be more than enough to handle the NES, even with "cycle accurate" emulation.

I didn't think about using the catch-up method with parallelism, but I suppose that makes it a bit more reasonable.

by on (#87924)
Coldberg wrote:
Is one core enough to emulate the NES in a cycle-perfect way? Say the PPU does 3 cycles for every 1 CPU cycle: would I switch to the CPU for 1 cycle (sub-instruction accuracy) and then to the PPU for 3?
(Say I have a quad-core i7 with HT running at 4.3 GHz.)


The emulator in NESICIDE runs 3 PPU cycles, 1 CPU cycle, and 1 APU cycle, lather, rinse, repeat, in a single emulation thread. I haven't bothered with catch-up optimization [yet]. It runs fine on my Intel i5 laptop at 2.6 GHz. Not so good on others' machines, from what I hear...
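
The loop shape described above could be sketched roughly as follows. This is not NESICIDE's code: `Chip` and `run_lockstep` are hypothetical names, and each chip's `step` only counts cycles where a real emulator would do the actual work.

```python
class Chip:
    """Hypothetical stand-in for a CPU, PPU, or APU core."""

    def __init__(self):
        self.cycles = 0

    def step(self):
        # A real core would advance its internal state by one cycle here.
        self.cycles += 1


def run_lockstep(cpu, ppu, apu, cpu_cycles):
    """Interleave the chips one CPU cycle at a time, all in a single thread."""
    for _ in range(cpu_cycles):
        for _ in range(3):  # 3 PPU cycles per CPU cycle
            ppu.step()
        cpu.step()          # then 1 CPU cycle
        apu.step()          # then 1 APU cycle


cpu, ppu, apu = Chip(), Chip(), Chip()
run_lockstep(cpu, ppu, apu, cpu_cycles=1000)
```

The appeal of this shape is that no chip ever runs ahead of another by more than one CPU cycle; the cost is a function call per chip per cycle, which is what catch-up avoids.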