Scripted mapper plugins

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Scripted mapper plugins
by on (#78299)
The idea is that the functionality of any mapper can be implemented as a script that the emulator loads and runs. Of course, you'd be able to come up with some wildly unrealistic mappers, but that's beside the point.

Anyway, we should play around with this idea to see if anything might come of it. :P

I had the idea to set up breakpoint-like behavior. It seemed like it was efficient enough for code, but modular enough to not be hardcoded into the emulator (which is the whole point of this in the first place ;) ).

For NROM,
Code:
on CPU_Read:8000,FFFF {
  PRG_Data = PRG_Space[CPU_Addr];
}

on PPU_Read:0000,1FFF {
  CHR_Data = CHR_Space[PPU_Addr];
}
on PPU_Write:0000,1FFF {
  CHR_Space[PPU_Addr] = CHR_Data; // This does nothing if it's ROM.
}

PRG_Space is just the collection of bytes the CPU can access at any time, from any collection of ROMs on the cart. Basically, it's just a way to access the PRG section of the iNES ROM.
CHR_Space is the same thing, but in this case, it can either be a ROM or RAM.
I didn't specify what happens for PPU accesses to 2000-2FFF. In that case, the emulator should just assume everything is completely hardwired and determine the behavior based on the appropriate iNES header flags. Additionally, the same could be done if you don't specify CPU 8000-FFFF accesses, and PPU 0000-1FFF accesses.
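
As a rough illustration of how an emulator might host handlers like these, here is a minimal Python sketch. MapperScript, on, and dispatch are hypothetical names invented for this example, not part of any existing emulator, and the PRG contents are dummy data. It registers handlers per event and inclusive address range, runs every matching handler in registration order, and returns None when nothing handled the access so the emulator can fall back to its hardwired behavior.

```python
# Hypothetical sketch: hosting "on CPU_Read:8000,FFFF { ... }" style handlers.
# MapperScript, on() and dispatch() are invented names for illustration only.

class MapperScript:
    def __init__(self):
        self.handlers = []  # list of (event, lo, hi, function)

    def on(self, event, lo, hi):
        """Decorator that registers a handler for an inclusive address range."""
        def register(fn):
            self.handlers.append((event, lo, hi, fn))
            return fn
        return register

    def dispatch(self, event, addr, data=None):
        """Run all matching handlers in order; None means 'not handled'."""
        result = None
        for ev, lo, hi, fn in self.handlers:
            if ev == event and lo <= addr <= hi:
                value = fn(addr, data)
                if value is not None:
                    result = value
        return result  # None: emulator falls back to hardwired behavior


# NROM with 32 KiB of PRG ROM (dummy contents: each byte equals offset % 256).
prg_space = bytes(range(256)) * 128
nrom = MapperScript()

@nrom.on("CPU_Read", 0x8000, 0xFFFF)
def read_prg(addr, _data):
    return prg_space[addr & 0x7FFF]  # assumes the cart decodes A0-A14 only
```

With this setup, nrom.dispatch("CPU_Read", 0x8005) returns the byte at PRG offset 5, while an event with no registered handler, such as nrom.dispatch("PPU_Read", 0x2000), returns None.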

Let's try CNROM:
Code:
var bank [= initial value on reset];

on CPU_Write:8000,FFFF {
  bank = (PRG_Data & 3) << 13; // %bb..... ........
}

on CPU_Read:8000,FFFF {
  PRG_Data = PRG_Space[CPU_Addr];
}

on PPU_Read:0000,1FFF {
  CHR_Data = CHR_Space[bank | PPU_Addr]; // %bbppppp pppppppp
}
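
The CHR address arithmetic in the CNROM script can be checked with a few lines of Python. This is only a sketch: cnrom_chr_index is a hypothetical helper name, and it assumes a two-bit bank register selecting one of four 8 KiB CHR banks, as the bit-pattern comments suggest.

```python
def cnrom_chr_index(bank_reg: int, ppu_addr: int) -> int:
    """Combine a 2-bit bank number with a 13-bit PPU address (assumed layout)."""
    bank = (bank_reg & 0x03) << 13        # %bb..... ........
    return bank | (ppu_addr & 0x1FFF)     # %bbppppp pppppppp


# Bank 2, PPU address $0123 lands at CHR offset $4123.
offset = cnrom_chr_index(2, 0x0123)
```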

Now let's try AxROM, which has mapper-controlled mirroring, and PRG bankswitching.
Code:
var bank;
var nametable;

on CPU_Write:8000,FFFF {
  bank = (PRG_Data & 7) << 15; // %bb b....... ........
  nametable = (PRG_Data & 0x10) << 6; // %n.. ........
}

on CPU_Read:8000,FFFF {
  PRG_Data = PRG_Space[bank | CPU_Addr]; // %bb bccccccc cccccccc
}

on PPU_Read:0000,1FFF {
  PPU_Data = PPU_Space[PPU_Addr];
}
on PPU_Write:0000,1FFF {
  PPU_Space[PPU_Addr] = PPU_Data;
}
on PPU_Read:2000,2FFF {
  PPU_Data = CIRAM[nametable | (PPU_Addr & 0x3FF)];
}
on PPU_Write:2000,2FFF {
  CIRAM[nametable | (PPU_Addr & 0x3FF)] = PPU_Data;
}


There are a couple of kinks that would need to be ironed out, but I think this would be usable for describing mappers. :P
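
For anyone who wants to sanity-check the AxROM index arithmetic, here is a small Python sketch. The function names are hypothetical, and it assumes the register layout used in the script (bits 0-2 select a 32 KiB PRG bank, bit 4 selects the 1 KiB CIRAM page) and a 15-bit cartridge address.

```python
def axrom_prg_index(reg: int, cpu_addr: int) -> int:
    """32 KiB bank from bits 0-2 of the register, plus A0-A14 of the CPU address."""
    return ((reg & 0x07) << 15) | (cpu_addr & 0x7FFF)


def axrom_ciram_index(reg: int, ppu_addr: int) -> int:
    """Single-screen mirroring: bit 4 picks the 1 KiB page, low 10 bits pick the byte."""
    page = 0x400 if reg & 0x10 else 0x000
    return page | (ppu_addr & 0x3FF)
```

Masking with 0x3FF keeps the low ten bits of the PPU address, so all four nametable slots map onto the same 1 KiB page.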

by on (#78300)
Could a mapper description like this be compiled into code that runs in the emulator using LLVM?

by on (#78301)
Of course, if someone wrote the compiler.

I've always felt it'd be best to write the mapper emulation in 6502 code. Any NES emulator will already have a 6502 core ready, anyway. Just make a new CPU context, map all the cartridge pins to MMIO addresses, add some custom MMIO to it for common functionality, and have at it.

For timing, easiest way would be to run at 'infinite' frequency, having an MMIO reg you can write to in order to 'tick' one cycle, clocked at the same rate as the NES CPU.
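
A very rough Python sketch of that timing model, with loud caveats: there is no 6502 core here at all. A generator stands in for the second CPU context, each yield plays the role of a write to the 'tick' register, and the pins dictionary stands in for the MMIO-mapped cartridge pins. All names are hypothetical, and the mapper logic is a CNROM-like latch chosen only to keep the example short.

```python
# Hypothetical sketch: a generator stands in for the mapper's own 6502 context.
# Each `yield` plays the role of a write to the 'tick' MMIO register.

def mapper_program(pins):
    """CNROM-like latch: watch the cartridge pins, update the CHR bank."""
    while True:
        if pins["rw"] == "write" and pins["addr"] >= 0x8000:
            pins["chr_bank"] = pins["data"] & 0x03
        yield  # 'tick': hand control back until the next NES CPU cycle


def run_cycles(bus_cycles):
    """Drive the mapper program once per NES CPU bus cycle; return the final bank."""
    pins = {"addr": 0, "data": 0, "rw": "read", "chr_bank": 0}
    program = mapper_program(pins)
    for addr, data, rw in bus_cycles:
        pins.update(addr=addr, data=data, rw=rw)
        next(program)  # mapper code runs "infinitely fast" up to its next tick
    return pins["chr_bank"]


final_bank = run_cycles([(0x8000, 0x02, "write"), (0x0000, 0xFF, "read")])
```

The point of the design is that the host only has to resume the mapper context once per CPU cycle; how much work the mapper code does between ticks does not affect NES-side timing.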

Writing the code will be a bit harder, but will only ever need to be done once. Emulators can dynarec it to run faster if they want, but I doubt it would be all that demanding even on today's cell phones.

by on (#78308)
try coding it in javascript ;)

by on (#78310)
byuu wrote:
I've always felt it'd be best to write the mapper emulation in 6502 code. Any NES emulator will already have a 6502 core ready, anyway. Just make a new CPU context, map all the cartridge pins to MMIO addresses, add some custom MMIO to it for common functionality, and have at it.

Interesting idea, but I'll believe it when I see it. It's more work than current methods to time the CPUs correctly, and the emulator won't be able to run as fast since you'd have to run two CPUs at once. The only advantage I see is that any NES emulator could easily incorporate it if desired. But I don't really see any reason any NES emulator would want to. I respect your opinions as you are the author of a very good snes emulator, but I have questions to pose... did you code the snes coprocessor chips in bsnes using 65c816 assembly? If not, why not and are you considering doing this then?
Put the code in the games
by on (#78311)
If we are going to make universal mapper scripts like this and use byuu's idea of writing them in 6502 assembly, why not put the scripts in the actual ROM files? This is sort of my idea for a change in the header format. I mean, the real mappers exist on the game cartridges, not the NES itself, so doesn't it seem reasonable to design the emulation system like that as well? This would also do away with the need to CRC games to figure out which board of which mapper to use, because the right script will be in the right game. We could also just write the scripts and have users put them into the games themselves. They'd just need to check a list to see which one to use.

by on (#78317)
Lua is a good choice for this kind of functionality.

by on (#78319)
I was going to rant about how Lua is interpreted, and an interpreter run once for each of the nearly 4.5 million fetches a second has to be slow. But then I discovered that a JIT engine for Lua based on LLVM existed.
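
The "nearly 4.5 million" figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes NTSC timing (a 21.477272 MHz master clock, CPU at master/12, PPU at master/4), one CPU bus access per CPU cycle, and one PPU memory access per two PPU dots. It is an upper bound, since the PPU does not fetch during vertical blanking and not every CPU access reaches the cartridge.

```python
MASTER_HZ = 21_477_272             # NTSC master clock (assumed)

cpu_accesses = MASTER_HZ / 12      # one bus access per CPU cycle
ppu_accesses = MASTER_HZ / 4 / 2   # one memory access per two PPU dots
total_accesses = cpu_accesses + ppu_accesses

print(f"{total_accesses / 1e6:.2f} million accesses per second")
```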

by on (#78330)
6T4 wrote:
the emulator won't be able to run as fast since you'd have to run two CPUs at once


True, instead of a 100MHz machine, you'll need at least 150MHz. That may be a stretch for time travelers from 1989 :)

6T4 wrote:
did you code the snes coprocessor chips in bsnes using 65c816 assembly? If not, why not and are you considering doing this then?


SNES coprocessors are not mappers, they are actual processors that execute instructions given to them.

Example: the NEC uPD772x family has four opcodes. Load, Jump, Op, and Op+Return. Op can do 10 math functions. Total code size? 10KB. That code emulates the DSP-1, DSP-1B, DSP-2, DSP-3, DSP-4, ST-0010, ST-0011. Count them all as one processor with different programs.

SuperFX and SA-1 are similarly processors that execute instructions. S-DD1 and SPC7110 are decompression codecs, much like an MPEG decoder. S-RTC is a real-time clock. Cx4 and ST-0018 are just like the NEC uPD (the latter may even be an enhanced uPD): we are waiting on the chip PROM dumps. That only leaves the OBC1, which is indeed a simple mapper.

So in total we have (four or) five unique processors, two codecs, one clock, one mapper. The SuperFX and SA-1 even accept external programs, meaning they are not fixed-function. The rest benefit greatly from executing internal PROMs to get proper timing. Since these PROMs exist, and we have them, why simulate when we can emulate?

Now compare to the NES, where we have 100-200 mappers, and not a single emulator supports all of them. Also plenty of pirate carts that some emulators will rightly refuse to ever support. Almost all of the NES mappers are extremely simplistic, like GameBoy MBCs.

Look at my processor core for the NEC uPD: I support seven unique special chips with one interpreter that loads separate PROMs. That is the analogue I am making to NES mappers.

There are of course problematic NES mappers, like the MMC5 and VRC6/7. The most complicated may be best to implement on a per-emulator basis.

The important part is that emulators already support a 6502 virtual machine. If you want a scripting language, they will have to support that language as well. Even if you use one with a nice OSS C library, that will rule out non-C/C++ NES emulators from using it. Scripting will be just as slow as (possibly slower than) doing the same in 6502.

by on (#78337)
Just as a test, here's my attempt to describe MMC1 in this notation:
Code:
var shiftValue = 0; // Holds the shift register value
var bitCount = 0; // Counts how many bits have been written
var latch = 0; // Is set to shiftValue when it has 5 bits
var latchFull = 0; // Flag determining if the latch has just been loaded
var mirroring = 0; // mirroring behavior
var prgMode = 0; // PRG bank behavior
var chrMode = 0; // CHR bank behavior (flag)
var chrBankA = 0; // PPU $0000-0FFF or $0000-1FFF bank
var chrBankB = 0; // PPU $1000-1FFF bank
var prgBank = 0; // PRG bank
var ramDisabled = 0; // Flag determining if cart RAM is disabled

// Evaluate the shift register behavior first.
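on CPU_Write:8000,FFFF {
  if (!(PRG_Data & 0x80)) {
    // Bits arrive least-significant first, so shift right and insert at bit 4.
    shiftValue = (shiftValue >> 1) | ((PRG_Data & 1) << 4);
    bitCount++;
    if (bitCount == 5) {
      latch = shiftValue & 0x1F;
      bitCount = 0;
      latchFull = 1;
    }
  } else {
    // Bit 7 set: reset the shift register.
    shiftValue = 0;
    bitCount = 0;
  }
}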

// Afterwards, check which address range was written to.
// The appropriate code only runs when the latch has been freshly loaded from
// the previous routine.

on CPU_Write:8000,9FFF { // Control register
  if (latchFull) {
    latchFull = 0;
    mirroring = latch & 3;
    prgMode = (latch >> 2) & 3;
    chrMode = latch & 0x10; // Only need to know if this is zero or nonzero
  }
}

on CPU_Write:A000,BFFF { // CHR Bank A (0000-0FFF or 0000-1FFF)
  if (latchFull) {
    latchFull = 0;
    chrBankA = latch << 12; // %b bbbb.... ........
  }
}

on CPU_Write:C000,DFFF { // CHR Bank B (1000-1FFF)
  if (latchFull) {
    latchFull = 0;
    chrBankB = latch << 12; // %b bbbb.... ........
  }
}

on CPU_Write:E000,FFFF { // PRG Bank (8000-FFFF or 8000-BFFF or C000-FFFF)
  if (latchFull) {
    latchFull = 0;
    prgBank = (latch & 0xF) << 14; // %bb bb...... ........
    ramDisabled = latch & 0x10; // Only need to know if this is zero or nonzero
  }
}

// Cart RAM
on CPU_Read:6000,7FFF {
  if (!ramDisabled) {
    PRG_Data = PRG_RAM_Space[CPU_Addr & 0x1FFF];
  }
  // Assume open bus if PRG_Data isn't set during a read operation.
}


// PRG Banks
on CPU_Read:8000,BFFF {
  switch (prgMode) {
  case 0: // PRG Bank 8000-FFFF, ignore lowest bit
  case 1: // CPU_Addr will fill that in, actually
    PRG_Data = PRG_Space[(prgBank & 0x38000) | CPU_Addr];
    break;
  case 2: // 8000-BFFF fixed to first bank
    PRG_Data = PRG_Space[CPU_Addr & 0x3FFF];
    break;
  case 3: // PRG Bank 8000-BFFF
    PRG_Data = PRG_Space[prgBank | (CPU_Addr & 0x3FFF)];
    break;
  }
}
on CPU_Read:C000,FFFF {
  switch (prgMode) {
  case 0: // PRG Bank 8000-FFFF, ignore lowest bit
  case 1: // CPU_Addr will fill that in, actually
    PRG_Data = PRG_Space[(prgBank & 0x38000) | CPU_Addr];
    break;
  case 2: // PRG Bank C000-FFFF
    PRG_Data = PRG_Space[prgBank | (CPU_Addr & 0x3FFF)];
    break;
  case 3: // C000-FFFF fixed to last bank
    PRG_Data = PRG_Space[CPU_Addr | 0x3C000];
    break;
  }
}

// CHR Banks
on PPU_Read:0000,0FFF {
  if (chrMode) { // nonzero == low bit considered
    PPU_Data = PPU_Space[chrBankA | (PPU_Addr & 0x0FFF)];
  } else { // zero == low bit ignored (let PPU_Addr fill it in)
    PPU_Data = PPU_Space[(chrBankA & 0x1E000) | PPU_Addr];
  }
}
on PPU_Write:0000,0FFF {
  if (chrMode) { // nonzero == low bit considered
    PPU_Space[chrBankA | (PPU_Addr & 0x0FFF)] = PPU_Data;
  } else { // zero == low bit ignored (let PPU_Addr fill it in)
    PPU_Space[(chrBankA & 0x1E000) | PPU_Addr] = PPU_Data;
  }
}
on PPU_Read:1000,1FFF {
  if (chrMode) { // nonzero == use chrBankB
    PPU_Data = PPU_Space[chrBankB | (PPU_Addr & 0x0FFF)];
  } else { // zero == use chrBankA and let PPU_Addr fill in low bit
    PPU_Data = PPU_Space[(chrBankA & 0x1E000) | PPU_Addr];
  }
}
on PPU_Write:1000,1FFF {
  if (chrMode) { // nonzero == use chrBankB
    PPU_Space[chrBankB | (PPU_Addr & 0x0FFF)] = PPU_Data;
  } else { // zero == use chrBankA and let PPU_Addr fill in low bit
    PPU_Space[(chrBankA & 0x1E000) | PPU_Addr] = PPU_Data;
  }
}

// Nametable mirroring
on PPU_Read:2000,23FF { // Upper Left
  switch (mirroring) {
  case 0: // 1 Screen A
  case 2: // Vertical
  case 3: // Horizontal
    PPU_Data = CIRAM[PPU_Addr & 0x3FF];
    break;
  case 1: // 1 Screen B
    PPU_Data = CIRAM[(PPU_Addr & 0x3FF) | 0x400];
    break;
  }
}
on PPU_Write:2000,23FF { // Upper Left
  switch (mirroring) {
  case 0: // 1 Screen A
  case 2: // Vertical
  case 3: // Horizontal
    CIRAM[PPU_Addr & 0x3FF] = PPU_Data;
    break;
  case 1: // 1 Screen B
    CIRAM[(PPU_Addr & 0x3FF) | 0x400] = PPU_Data;
    break;
  }
}
on PPU_Read:2400,27FF { // Upper Right
  switch (mirroring) {
  case 0: // 1 Screen A
  case 3: // Horizontal
    PPU_Data = CIRAM[PPU_Addr & 0x3FF];
    break;
  case 1: // 1 Screen B
  case 2: // Vertical
    PPU_Data = CIRAM[(PPU_Addr & 0x3FF) | 0x400];
    break;
  }
}
on PPU_Write:2400,27FF { // Upper Right
  switch (mirroring) {
  case 0: // 1 Screen A
  case 3: // Horizontal
    CIRAM[PPU_Addr & 0x3FF] = PPU_Data;
    break;
  case 1: // 1 Screen B
  case 2: // Vertical
    CIRAM[(PPU_Addr & 0x3FF) | 0x400] = PPU_Data;
    break;
  }
}
on PPU_Read:2800,2BFF { // Lower Left
  switch (mirroring) {
  case 0: // 1 Screen A
  case 2: // Vertical
    PPU_Data = CIRAM[PPU_Addr & 0x3FF];
    break;
  case 1: // 1 Screen B
  case 3: // Horizontal
    PPU_Data = CIRAM[(PPU_Addr & 0x3FF) | 0x400];
    break;
  }
}
on PPU_Write:2800,2BFF { // Lower Left
  switch (mirroring) {
  case 0: // 1 Screen A
  case 2: // Vertical
    CIRAM[PPU_Addr & 0x3FF] = PPU_Data;
    break;
  case 1: // 1 Screen B
  case 3: // Horizontal
    CIRAM[(PPU_Addr & 0x3FF) | 0x400] = PPU_Data;
    break;
  }
}
on PPU_Read:2C00,2FFF { // Lower Right
  switch (mirroring) {
  case 0: // 1 Screen A
    PPU_Data = CIRAM[PPU_Addr & 0x3FF];
    break;
  case 1: // 1 Screen B
  case 2: // Vertical
  case 3: // Horizontal
    PPU_Data = CIRAM[(PPU_Addr & 0x3FF) | 0x400];
    break;
  }
}
on PPU_Write:2C00,2FFF { // Lower Right
  switch (mirroring) {
  case 0: // 1 Screen A
    CIRAM[PPU_Addr & 0x3FF] = PPU_Data;
    break;
  case 1: // 1 Screen B
  case 2: // Vertical
  case 3: // Horizontal
    CIRAM[(PPU_Addr & 0x3FF) | 0x400] = PPU_Data;
    break;
  }
}

There may be a better way to handle the mirroring than to duplicate the routines for both reads and writes...

Maybe a PPU_Access event which triggers on both reads and writes (with PPU_Data being ignored) can be used to set up an address variable, and then separate PPU_Read and PPU_Write routines can commit data.
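
One way to picture that shared address step is as a small lookup table. The Python sketch below is only an illustration under assumed names (MIRRORING_PAGES and ciram_index are hypothetical): it computes the CIRAM index once from the mirroring value and the PPU address, so a read routine and a write routine could both use the same result.

```python
# CIRAM page (0 = A, 1 = B) for each nametable slot, in the order
# upper left, upper right, lower left, lower right.
MIRRORING_PAGES = {
    0: (0, 0, 0, 0),  # 1 Screen A
    1: (1, 1, 1, 1),  # 1 Screen B
    2: (0, 1, 0, 1),  # Vertical
    3: (0, 0, 1, 1),  # Horizontal
}


def ciram_index(mirroring: int, ppu_addr: int) -> int:
    """Map a PPU address in $2000-$2FFF to an offset into 2 KiB of CIRAM."""
    slot = (ppu_addr >> 10) & 3               # which of the four nametables
    page = MIRRORING_PAGES[mirroring][slot]   # which 1 KiB CIRAM page backs it
    return (page << 10) | (ppu_addr & 0x3FF)
```

The table carries the same information as the eight switch statements, and the read and write paths differ only in the direction of the final data transfer.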

Also, it's worth noting that CPU_Addr only describes A0-A14, since the cart edge doesn't get A15. Though, if that's too much of a problem, CPU_Addr can represent all 16 bits of the address.
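
As a cross-check of the serial load at the top of the MMC1 description, here is a minimal Python sketch of the shift register on its own. Mmc1ShiftRegister is a hypothetical name, and the sketch assumes bits arrive least-significant first, with a write that has bit 7 set acting as a reset; it does not model the address decoding or any other side effects.

```python
class Mmc1ShiftRegister:
    """Collects five single-bit writes into one 5-bit value (LSB first)."""

    def __init__(self):
        self.value = 0
        self.count = 0

    def write(self, data: int):
        """Feed one CPU write; return the 5-bit value on the fifth write, else None."""
        if data & 0x80:                # bit 7 set: reset the shift register
            self.value = 0
            self.count = 0
            return None
        # Shift right and insert the new bit at bit 4.
        self.value = (self.value >> 1) | ((data & 1) << 4)
        self.count += 1
        if self.count < 5:
            return None
        latch = self.value
        self.value = 0
        self.count = 0
        return latch


# Loading %01101 means writing its bits in the order 1, 0, 1, 1, 0.
sr = Mmc1ShiftRegister()
results = [sr.write(bit) for bit in (1, 0, 1, 1, 0)]
```

The first four writes return None, and the fifth returns the assembled value, which the caller would then route to a register based on the address of that fifth write.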

by on (#78347)
Any way this can be merged with all the other times it was talked about?

I think it's another thing where everyone has their own implementation like the iNES successor.
How about a ฿ounty?
by on (#78348)
฿฿฿ MAKE MONEY FAST ฿฿฿

The first to implement practical scriptable mapper support in a Free emulator that runs at real time on an Atom subnotebook PC will earn some Bitcoins.

Anyway:
Of course the cart edge gets A15; it's just delayed by a fraction of a cycle.