From a quick look at the code, I think you're going to find that the vast majority of CPU time is spent in PPU:tick() and putpixel(). You really shouldn't be calling a separate function just to draw each pixel; that's pretty much the opposite of high-performance design. Every time you enter putpixel, you redo this whole calculation just to get that one pixel's pointer into the SDL surface:
Code:
uint8_t *p = (uint8_t*)screen->pixels + ((y * 256 + x) * 3);
For one thing, you should probably be using a 32-bit surface instead of a 24-bit one. Almost all modern systems run in 32-bit color anyway, so it takes less effort for the CPU to actually get your pixels onto the screen. Plus, you then get the benefit of being able to write pixels in a less ugly fashion by treating screen->pixels as a uint32_t array.
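To make the difference concrete, here is a rough sketch of the two ways of storing a pixel. It doesn't touch SDL at all: `pixels` stands in for screen->pixels, and the function names, the fixed 256-pixel row width, and the byte order in the 24-bit case are just assumptions for illustration.

```c
#include <stdint.h>

/* 24-bit layout: 3 bytes per pixel, so each pixel takes three byte stores.
   The byte order used here (blue first) is only an example. */
void put24(uint8_t *pixels, int x, int y, uint8_t r, uint8_t g, uint8_t b) {
    uint8_t *p = pixels + ((y * 256 + x) * 3);
    p[0] = b;
    p[1] = g;
    p[2] = r;
}

/* 32-bit layout: one 32-bit store per pixel, indexed like a plain array. */
void put32(void *pixels, int x, int y, uint32_t color) {
    ((uint32_t *)pixels)[y * 256 + x] = color;
}
```

The 32-bit version also keeps every pixel on a 4-byte boundary, which the 24-bit layout can't do.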
More importantly, you should probably calculate the color values on a per-scanline basis. Once you've calculated every pixel of the scanline, write them all out at once. That way you don't call an entire function for every single pixel and waste gobs of CPU, and you only do a single pointer calculation per scanline and then just increment your position in the surface with each pixel, eliminating a TON of multiplications that add up when you do them for every pixel.
Something along these lines:
Code:
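void drawscanline(uint16_t scanline, uint8_t scanline_buffer[256]) {
    /* One pointer calculation per scanline. pitch is the row length in bytes,
       which can be larger than w * 4, so use it instead of w. */
    uint32_t *dst = (uint32_t *)((uint8_t *)screen->pixels + (uint32_t)scanline * screen->pitch);
    for (int32_t x = 0; x < 256; x++) {
        /* Color indices are 6 bits, so mask to 0..63 before the lookup. */
        *dst++ = palette_table[scanline_buffer[x] & 0x3F];
    }
}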
That code assumes a 32-bit surface, and that palette_table is an array with every possible color already calculated into an RGB lookup table. This should be far faster than what you're doing now. There may be some other problems going on, but this jumped out at me immediately. You should also lock the SDL surface at the start of each frame with SDL_LockSurface(screen), and after the whole frame has been drawn call SDL_UnlockSurface(screen) followed by SDL_UpdateRect(screen, 0, 0, screen->w, screen->h). You don't need to use SDL_Flip at all.
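Putting the pieces together, a frame might be structured roughly like the sketch below. It is written without SDL so it can be compiled on its own: `struct surface_sketch` is a hypothetical stand-in for the few SDL_Surface fields involved, `render_scanline` is a placeholder for whatever your PPU actually produces, and the spots where the real SDL calls would go are marked in comments.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SDL_Surface fields used here. */
struct surface_sketch {
    void *pixels;
    int w, h;
    uint16_t pitch; /* bytes per row, may be larger than w * 4 */
};

/* Precalculated colors, one per 6-bit color index. */
uint32_t palette_table[64];

/* Placeholder for the PPU: fills one scanline with 6-bit color indices. */
void render_scanline(int scanline, uint8_t out[256]) {
    for (int x = 0; x < 256; x++) {
        out[x] = (uint8_t)((x + scanline) & 0x3F);
    }
}

/* One pointer calculation per scanline, then just increment. */
void draw_scanline(struct surface_sketch *s, int scanline, const uint8_t buf[256]) {
    uint32_t *dst = (uint32_t *)((uint8_t *)s->pixels + (uint32_t)scanline * s->pitch);
    for (int x = 0; x < 256; x++) {
        *dst++ = palette_table[buf[x] & 0x3F];
    }
}

void draw_frame(struct surface_sketch *s) {
    uint8_t buf[256];
    /* With real SDL: SDL_LockSurface(screen) goes here. */
    for (int y = 0; y < s->h; y++) {
        render_scanline(y, buf);
        draw_scanline(s, y, buf);
    }
    /* With real SDL: SDL_UnlockSurface(screen), then SDL_UpdateRect(...). */
}
```

The point is that the surface is locked once per frame and addressed once per scanline, rather than doing either per pixel.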
Btw, I've tried a few ROMs with your emu code so far and they all just show grey. Are there any that work yet? I'd like to try.
EDIT: FYI, if you do decide to go with a palette lookup table (you really should!), make sure you generate the RGB values using SDL_MapRGB(screen->format, redval, greenval, blueval) and not a fixed set of hardcoded 32-bit values. That way you can be sure SDL will come up with the right data for the surface format. If you hardcode values that happen to work on your x86 box, you can run into portability issues: the colors will be all screwed up on any system where the surface ends up with a different channel layout, such as big-endian machines like PowerPC, and vice versa.
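Here is a small sketch of why going through a mapping function matters. It doesn't use SDL itself: `struct layout_sketch` and `map_rgb_sketch` are hypothetical stand-ins for screen->format and SDL_MapRGB, covering only the simple case of 8 bits per channel in a 32-bit pixel, and the flat r,g,b input array is an assumption about how you might store your source palette.

```c
#include <stdint.h>

/* Hypothetical stand-in for the part of the pixel format that matters here:
   where each 8-bit channel sits inside the 32-bit pixel. */
struct layout_sketch {
    uint8_t rshift, gshift, bshift;
};

/* Roughly what a map-RGB call does for a 32-bit, 8-bits-per-channel surface. */
uint32_t map_rgb_sketch(const struct layout_sketch *fmt, uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)r << fmt->rshift) |
           ((uint32_t)g << fmt->gshift) |
           ((uint32_t)b << fmt->bshift);
}

/* Build the 64-entry lookup table once at startup.
   rgb holds 64 consecutive r,g,b triples (192 bytes). */
void build_palette(const struct layout_sketch *fmt, const uint8_t *rgb, uint32_t table[64]) {
    for (int i = 0; i < 64; i++) {
        table[i] = map_rgb_sketch(fmt, rgb[i * 3 + 0], rgb[i * 3 + 1], rgb[i * 3 + 2]);
    }
}
```

The same r,g,b triple comes out as a different 32-bit value depending on the layout, which is exactly what a hardcoded table gets wrong. With real SDL you would call SDL_MapRGB(screen->format, r, g, b) in place of map_rgb_sketch, after the video mode has been set.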