Hey everyone. Long time no see. I've been busy with "real life" nonsense... and emudev (and really, all hobby programming) has sort of drifted out of my life.
However, there's just that "something" about the NES that fascinates me and keeps bringing me back.
Anyway, I was kicking around ideas for a new emu. But with today's multicore and multithreaded CPUs, making a single-threaded emulator seems rather antiquated, especially since an emulator has to run several subsystems (CPU, PPU, APU) in parallel. So I figure if I start a new emu project, I'm going to try to take advantage of a multithreaded setup.
Of course, multithreading is trickier, so I thought it'd be fun/useful to open a design discussion on the topic. Has anyone here made such an emu? I know byuu has. How did you go about it?
My idea is fundamentally the same as the "catch up" approach most people here are probably familiar with. The difference is, you don't catch up the PPU on register reads/writes... instead it's constantly catching up in a parallel thread. The time it's "catching up" to is the current CPU timestamp, which would constantly be increasing as the CPU executes instructions.
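For a check like `ppu_time >= cpu_time` to mean anything, both timestamps need a common unit. One option (an assumption here, not something the approach requires) is to count NTSC master-clock ticks: the CPU divides the 21.477272 MHz master clock by 12 and the PPU divides it by 4, so one CPU cycle is 12 ticks and one PPU dot is 4 ticks, which gives the familiar 3 dots per CPU cycle. PAL uses different dividers. All names below are made up for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared timebase: NTSC master-clock ticks (21.477272 MHz).
using Timestamp = std::int64_t;
constexpr Timestamp kCpuCycle = 12;  // CPU clock = master clock / 12
constexpr Timestamp kPpuDot   = 4;   // PPU clock = master clock / 4

// Each thread writes its own timestamp and reads the other's, so both are atomic.
std::atomic<Timestamp> cpu_time{0};
std::atomic<Timestamp> ppu_time{0};

// CPU thread: publish progress after each instruction. The release RMW pairs
// with the PPU's acquire load, so the PPU also sees anything the CPU wrote
// before advancing (e.g. a queued register write).
void cpu_advance(int cycles) {
    cpu_time.fetch_add(cycles * kCpuCycle, std::memory_order_release);
}

// PPU thread: how many whole dots it may emulate before reaching the CPU.
Timestamp ppu_dots_available() {
    const Timestamp gap = cpu_time.load(std::memory_order_acquire) -
                          ppu_time.load(std::memory_order_relaxed);
    return gap > 0 ? gap / kPpuDot : 0;
}
```

For example, after `cpu_advance(2)` the CPU is 24 ticks in, so a PPU still at 0 may run 6 dots.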
There are still the same sync issues: the PPU can't surpass the CPU, and we need a way to sync them on register accesses.
I'm thinking that since the CPU is clocked slower (one CPU cycle per three PPU dots on NTSC) and generally performs simpler tasks, given the same amount of host CPU time the CPU's timestamp will advance much more quickly than the PPU's, which is beneficial. This means we can probably get away with having the PPU thread do a "dumb spin" while waiting for the CPU to advance. Something like:
Code:
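// cpu_time must be a std::atomic; with a plain variable the compiler may hoist the read out of the loop and spin forever.
while(ppu_time >= cpu_time);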
... or if you want to be a little more intelligent... possibly this instead:
Code:
while(ppu_time >= cpu_time) std::this_thread::yield();  // needs #include <thread>
This would be done every time the PPU emulates a cycle, to ensure it doesn't surpass the CPU.
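Here's a small sketch of that per-dot wait as a two-thread demo, assuming both timestamps are `std::atomic` and measured in NTSC master-clock ticks (12 per CPU cycle, 4 per PPU dot). `SharedClock`, `ppu_wait_for_cpu` and `run_demo` are hypothetical names, and the "emulate" comments stand in for real CPU/PPU work.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

using Timestamp = std::int64_t;
constexpr Timestamp kCpuCycle = 12;  // master-clock ticks per CPU cycle (NTSC)
constexpr Timestamp kPpuDot   = 4;   // master-clock ticks per PPU dot (NTSC)

struct SharedClock {
    std::atomic<Timestamp> cpu_time{0};
    std::atomic<Timestamp> ppu_time{0};
};

// PPU side: before each dot, yield until the CPU is strictly ahead.
void ppu_wait_for_cpu(SharedClock& c) {
    const Timestamp me = c.ppu_time.load(std::memory_order_relaxed);
    while (me >= c.cpu_time.load(std::memory_order_acquire))
        std::this_thread::yield();
}

// Runs a CPU thread and a PPU thread for `cpu_cycles` CPU cycles.
// Returns true if the PPU never got ahead of the CPU and ended in sync.
bool run_demo(int cpu_cycles) {
    SharedClock c;
    std::atomic<bool> ok{true};

    std::thread ppu([&] {
        const Timestamp end = cpu_cycles * kCpuCycle;
        while (c.ppu_time.load(std::memory_order_relaxed) < end) {
            ppu_wait_for_cpu(c);
            // ... emulate one PPU dot here ...
            const Timestamp t =
                c.ppu_time.fetch_add(kPpuDot, std::memory_order_release) + kPpuDot;
            if (t > c.cpu_time.load(std::memory_order_acquire)) ok = false;
        }
    });

    std::thread cpu([&] {
        for (int i = 0; i < cpu_cycles; ++i) {
            // ... emulate one CPU cycle here ...
            c.cpu_time.fetch_add(kCpuCycle, std::memory_order_release);
        }
    });

    cpu.join();
    ppu.join();
    return ok && c.ppu_time.load() == cpu_cycles * kCpuCycle;
}
```

With these units a dot that starts while `ppu_time < cpu_time` can never end past `cpu_time`, because both values are multiples of 4; `run_demo` checks that invariant after every dot.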
On the CPU side, however, I don't think we'd want to do this. The CPU will have to wait for the PPU to catch up on register accesses so that the two systems are synced. Since the PPU will likely take a while to catch up, a dumb spin on the CPU end would mostly just waste host CPU time.
I'm thinking that something like C++11's std::condition_variable could be used. The CPU would effectively sleep until the PPU signals that it has caught up.
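A minimal sketch of the condition_variable idea, assuming the same atomic timestamps. To avoid locking and notifying on every single dot, the PPU only takes the mutex when a hypothetical `cpu_waiting` flag says the CPU is asleep and the PPU has reached the CPU's time. `PpuSync`, `cpu_wait_for_ppu`, `ppu_advance` and the demo function are illustrative names, not from any existing emulator.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

using Timestamp = std::int64_t;

// Hypothetical sync object shared by the CPU and PPU threads. The atomics use
// the default seq_cst ordering: "CPU stores cpu_waiting, then reads ppu_time"
// and "PPU stores ppu_time, then reads cpu_waiting" rely on it so that at
// least one side always sees the other's store (no lost wakeup).
struct PpuSync {
    std::atomic<Timestamp> cpu_time{0};
    std::atomic<Timestamp> ppu_time{0};
    std::atomic<bool> cpu_waiting{false};
    std::mutex m;
    std::condition_variable caught_up;

    // CPU thread: call on a PPU register access. Sleeps until the PPU has
    // emulated up to the CPU's current timestamp.
    void cpu_wait_for_ppu() {
        const Timestamp target = cpu_time.load();
        if (ppu_time.load() >= target) return;  // fast path: already synced
        std::unique_lock<std::mutex> lock(m);
        cpu_waiting = true;
        caught_up.wait(lock, [&] { return ppu_time.load() >= target; });
        cpu_waiting = false;
    }

    // PPU thread: call after each emulated dot.
    void ppu_advance(Timestamp ticks) {
        const Timestamp now = (ppu_time += ticks);
        if (cpu_waiting.load() && now >= cpu_time.load()) {
            // Locking first means the CPU is either before its predicate
            // check or already blocked inside wait(), so the notify lands.
            std::lock_guard<std::mutex> lock(m);
            caught_up.notify_one();
        }
    }
};

// Demo: the CPU is 300 ticks ahead and hits a register access; a PPU thread
// running 4-tick dots catches up. Returns the PPU time seen after the wait.
Timestamp run_register_access_demo() {
    PpuSync s;
    s.cpu_time = 300;
    std::thread ppu([&] {
        while (s.ppu_time.load() < s.cpu_time.load()) s.ppu_advance(4);
    });
    s.cpu_wait_for_ppu();
    const Timestamp seen = s.ppu_time.load();
    ppu.join();
    return seen;
}
```

The flag check keeps the common case (CPU not waiting) down to one atomic load per dot; weakening the memory orderings would need more careful reasoning than this sketch does.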
The same thing could be done for the other subsystems, with each running in its own thread.
My main beef with the catch-up approach was that you had to write your PPU logic so that it could be exited and re-entered at any given cycle. With a separate thread, that's no longer the case: you can write the logic cleanly and straightforwardly, without having to allow it to be interrupted and restarted later. That's the hope, anyway... this is all still theory. I'm not sure how well it'd work in practice.
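To illustrate the "clean logic" point, here's a sketch of a PPU frame written as plain nested loops, with the per-dot sync hidden in a hypothetical `wait_for_cpu` callback (for example the spin/yield loop above). It assumes NTSC timing (262 scanlines of 341 dots, vblank flag set at scanline 241 dot 1 and cleared on the pre-render line 261 at dot 1) and ignores the skipped dot on odd frames; the actual rendering work is left as a comment.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical straight-line PPU frame loop. Because the PPU owns its thread,
// "pausing" happens inside wait_for_cpu() rather than by saving and restoring
// a state machine between calls.
struct Ppu {
    std::function<void()> wait_for_cpu;  // e.g. the spin/yield loop
    bool vblank = false;
    std::uint64_t dots_run = 0;           // dots emulated since construction
    std::uint64_t vblank_set_at_dot = 0;  // value of dots_run when vblank was set

    void run_frame() {
        for (int scanline = 0; scanline < 262; ++scanline) {
            for (int dot = 0; dot < 341; ++dot) {
                wait_for_cpu();  // never run ahead of the CPU
                if (scanline == 241 && dot == 1) {
                    vblank = true;
                    vblank_set_at_dot = dots_run;
                }
                if (scanline == 261 && dot == 1) vblank = false;  // pre-render line
                // ... fetch tiles, evaluate sprites, output a pixel ...
                ++dots_run;
            }
        }
    }
};
```

Nothing in the loop has to remember "where it was"; the loop variables themselves are the PPU's position in the frame.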
Anyone have any thoughts?
PS. This system would actually have quite a bit of "thrashing" between threads on things like $2002 spin loops, since every $2002 read makes the CPU wait for the PPU to catch up. Maybe it would be wiser to predict the $2002 status so that the CPU can read it and resume without waiting for the PPU. That gets tricky with the weird sprite overflow behavior, though. Maybe the thrashing wouldn't be so bad... I'd have to try it out and see.
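As a rough idea of what predicting $2002 could look like for the vblank bit (bit 7) alone, here's a sketch that derives the flag from a PPU dot count instead of asking the PPU thread. It assumes NTSC timing (262 × 341 dots per frame, 3 dots per CPU cycle), ignores the odd-frame dot skip and the race when a read lands exactly on the dot the flag is set, and doesn't attempt bits 5 and 6 (sprite overflow and sprite 0 hit), which is where the sprite overflow weirdness comes in. All names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical $2002 vblank-bit predictor, NTSC timing only.
constexpr std::int64_t kDotsPerLine   = 341;
constexpr std::int64_t kLinesPerFrame = 262;
constexpr std::int64_t kDotsPerFrame  = kDotsPerLine * kLinesPerFrame;  // 89342
constexpr std::int64_t kVblankSet     = 241 * kDotsPerLine + 1;  // scanline 241, dot 1
constexpr std::int64_t kVblankClear   = 261 * kDotsPerLine + 1;  // scanline 261, dot 1

// NTSC: 3 PPU dots per CPU cycle (CPU/PPU phase alignment ignored).
constexpr std::int64_t cpu_to_ppu_dot(std::int64_t cpu_cycle) { return cpu_cycle * 3; }

// Vblank flag at PPU dot `ppu_dot` (>= 0, counted from power-on), assuming
// every frame is the same length and no $2002 read has cleared the flag.
bool predicted_vblank(std::int64_t ppu_dot) {
    const std::int64_t d = ppu_dot % kDotsPerFrame;
    return d >= kVblankSet && d < kVblankClear;
}

// Same prediction, but honoring that reading $2002 clears the flag: if the
// last read happened after this vblank began, the CPU sees 0.
bool predicted_vblank_after_read(std::int64_t ppu_dot, std::int64_t last_read_dot) {
    if (!predicted_vblank(ppu_dot)) return false;
    const std::int64_t vblank_start = ppu_dot - ppu_dot % kDotsPerFrame + kVblankSet;
    return last_read_dot < vblank_start;
}
```

A $2002 read would then call `predicted_vblank_after_read(cpu_to_ppu_dot(cpu_cycle), last_read_dot)` and only fall back to syncing with the PPU thread for the bits this can't predict.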