Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-07 (#176882)

Hello!
While preparing firmware v0.1.7c for release I stumbled upon the Cx4 again - Rockman died in attract mode once again. After some nights of fiddling it turns out my timing is ass, and the Cx4 can pull off some more stuff than what the two games do. Bit of a shame actually it wasn't used for more stuff.

Anyway this should cover most of the unknown stuff about MMIO registers, pins, internal registers, instruction timing, cartridge RAM access, DMA, and some oddball stuff.

I did not touch unknown instructions and flags so far.

I tried to organize my notes a bit but it's probably still a mess, please ask if you are confused or need to know anything.

So without further ado, here are the notes. Also available at https://sd2snes.de/files/cx4_notes.txt

Code:

=======================================
Cx4 notes by ikari_01 <otakon@gmx.net>
https://sd2snes.de
-> Version: 0.2
    - add clarification on memory mapping
    - point out that the CPU is halted until caching is complete
    - cart bus only claimed on actual bus operations
    - add register $7f48
    - correct pin mapping (74 and 75 were swapped)

   Version: 0.1 (initial)
=======================================

These notes add some information about previously unknown/undocumented aspects
of the Capcom Cx4 custom chip. It is NOT a complete documentation of the Cx4
but adds bits of information missing in existing documentation.
They were compiled while working with the Cx4 and are a bit chaotic, please
ask if anything is unclear.


Hardware:
=========

Pin 74: global memory output enable
        If this is high, the Cx4 still exposes its MMIO, internal RAM etc. to
        the bus, but not the ROM or RAM connected to it.
        Probably for use with cart ROM/RAM connected "alongside" the Cx4 but
        independent of it.
Pin 75: Map select (0=LoROM; 1=HiROM)

Memory map
==========

LoROM mapping is widely known. Cart RAM is mapped at 70-7f:0000-7fff.

HiROM mapping is a bit botched at least on the MMX2 PCB. SNES A15 becomes A20
to the ROM, all other address lines are shifted down by one to close the gap.
So the mapping S-CPU -> Cart ROM goes as follows:

C0:0000 => 0x000000
C0:8000 => 0x100000
C1:0000 => 0x008000
C1:8000 => 0x108000
C2:0000 => 0x010000
 ...
DF:7FFF => 0x0FFFFF
DF:FFFF => 0x1FFFFF

ROM content must be rearranged to match, or rewired.
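
A minimal sketch of the address mangling above, in Python. The function name
hirom_mangled_offset is made up for illustration. It assumes one 16Mbit chip
(21 address bits) and only the C0-DF bank range shown in the table; chip
select for a second ROM is not modelled.

```python
def hirom_mangled_offset(snes_addr: int) -> int:
    """Map a 24-bit S-CPU address (banks C0-DF) to a ROM chip offset.

    Hypothetical helper: SNES A15 becomes ROM A20, and SNES A16-A20
    move down by one bit to ROM A15-A19, as described in the notes.
    """
    low15 = snes_addr & 0x7FFF            # SNES A0-A14 pass straight through
    a15 = (snes_addr >> 15) & 1           # SNES A15 -> ROM A20
    bank_bits = (snes_addr >> 16) & 0x1F  # SNES A16-A20 -> ROM A15-A19
    return low15 | (bank_bits << 15) | (a15 << 20)


for addr in (0xC00000, 0xC08000, 0xC10000, 0xC18000, 0xDFFFFF):
    offset = hirom_mangled_offset(addr)
    print(f"{addr >> 16:02X}:{addr & 0xFFFF:04X} => 0x{offset:06X}")
```

Walking a linear ROM image through the inverse of this mapping would be one
way to rearrange the content instead of rewiring the board.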

As a plus, in HiROM mode it is possible to use 32MBits of cart ROM in two
16Mbit chips (by leaving $7f52 at $01), from E0:0000 onward the second ROM
will be selected.

In HiROM mode ROM is mapped as follows (assuming $7f52 = $01)

   00-3F:8000-FFFF ROM1 0x100000-0x1FFFFF, ROM2 0x100000-0x1FFFFF
   40-7D:0000-FFFF NOTHING (open bus with a bit of noise)
   80-BF:8000-FFFF ROM1 0x100000-0x1FFFFF, ROM2 0x100000-0x1FFFFF
   C0-FF:0000-7FFF ROM1 0x000000-0x0FFFFF, ROM2 0x000000-0x0FFFFF
   C0-FF:8000-FFFF ROM1 0x100000-0x1FFFFF, ROM2 0x100000-0x1FFFFF

Cart RAM mapping:

   LoROM: 70-77:0000-7FFF
   HiROM: 30-3F:6000-7FFF, B0-BF:6000-7FFF

MMIO mapping:

   LoROM: 00-3F:6000-7FFF, 80-BF:6000-7FFF
   HiROM: 00-2F:6000-7FFF, 80-AF:6000-7FFF (to make room for Cart RAM)


DMA
===

DMA source and destination CAN reference the same bus but not the same chip,
e.g. cart ROM <-> cart RAM is allowed but cart RAM <-> cart RAM isn't!
Same-bus DMA takes WS1+WS2 extra waitstates per cycle.
DMA from/to internal RAM only takes WS1 or WS2 extra waitstates depending
on the referenced mapping area.
Neither DMA source nor destination may point to unmapped areas (-> lockup).
ROM is disallowed as a DMA destination (-> lockup).
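
The rules above can be sketched as a small check, in Python. The area names
("rom", "cart_ram", "internal", "unmapped") and the function name are labels
invented for this sketch, not anything the chip or existing docs define.
Internal <-> internal transfers are not covered by the notes, so the sketch
refuses them rather than guessing.

```python
AREAS = {"rom", "cart_ram", "internal", "unmapped"}


def dma_extra_waitstates(src: str, dst: str, ws1: int, ws2: int) -> int:
    """Extra waitstates per DMA cycle, or ValueError for a setup the
    notes rule out. ws1 applies to cart ROM, ws2 to cart RAM."""
    if src not in AREAS or dst not in AREAS:
        raise ValueError("unknown area label")
    if "unmapped" in (src, dst):
        raise ValueError("unmapped source or destination (lockup)")
    if dst == "rom":
        raise ValueError("ROM as DMA destination (lockup)")
    if src == dst == "cart_ram":
        raise ValueError("cart RAM <-> cart RAM is not allowed")
    if src == dst == "internal":
        raise ValueError("internal <-> internal is not covered by the notes")
    if "internal" not in (src, dst):
        return ws1 + ws2                  # same-bus transfer, e.g. ROM -> cart RAM
    cart_side = dst if src == "internal" else src
    return ws1 if cart_side == "rom" else ws2
```

For example, dma_extra_waitstates("rom", "cart_ram", 4, 4) gives 8, while a
ROM -> internal RAM transfer with the same settings only pays WS1.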


CPU misc. (caching)
===================

Cx4 has two program cache pages. They are the only way it can execute code,
the CPU CANNOT run directly from cart ROM/RAM.
The pages have tags to indicate what program page (from ROM) they currently
contain. The CPU will use these to determine whether a jump across page
boundaries requires re-buffering of one of the cache pages.

If execution reaches the end of cache page 0 and there is no STOP instruction,
page 1 will be buffered (according to contents of P register?) and execution
continues. On end of page 1, execution halts (implied STOP instruction) and
CPU goes idle.

During caching of a program page the CPU is halted until all bytes are copied.

For more details on caching see $7f4c.

For cartridge ROM/RAM access, there are two different configurable waitstate
counts (called WS1+WS2 here; see $7f50 below).
WS1 applies to cart ROM, WS2 applies to cart RAM.


Registers
=========

(for sake of completeness in conjunction with $7f47, $7f40-$7f46 are listed
here.)
$7f40: DMA source low byte
$7f41: DMA source high byte
$7f42: DMA source bank
$7f43: DMA length low byte
$7f44: DMA length high byte
$7f45: DMA destination low byte
$7f46: DMA destination high byte
$7f47: DMA destination bank (!)
       ALSO: Trigger GPDMA (BUS<->internal map)

$7f48: R/W
       Trigger program page caching
       0: Page select (0/1)
       This preloads a cache page with bus data (cart ROM/RAM) pre-set by the
       offset select ($7f49-$7f4b) and program page select ($7f4d-$7f4e)
       registers. The appropriate number of waitstates for the designated
       memory type applies.
       
$7f4c: R/W
       1: cache page 1 lock (1=locked)
       0: cache page 0 lock (1=locked)

       The cache page lock flags are used to prevent the CPU from buffering
       to the corresponding cache page at runtime (e.g. when the pgm_page
       register is prepared and a JMP/CALL P instruction is executed).
       The cache pages can still be filled by writing to $7f48.

       This is more or less a tuning mechanism for the developer who can decide
       to keep certain code cached at all times.

       Several constellations must be considered when code is executed in one
       of the cache pages:

       no pages locked:
       ================
       If the other page already contains the program page required, the CPU
       will just jump there. Otherwise the other page will be loaded with the
       program page contents from ROM prior to jumping and its tag will be
       updated.

       either of the pages locked:
       ===========================
       The other page cannot be used for buffering so unless it already
       contains the desired program page the same page is used, overwriting
       the code that is currently executed. If a RET occurs, the previous
       program page is swapped back in. This requires reading 512 bytes from
       cartridge ROM every time so it can get very slow.

       both pages locked:
       ==================
       ONLY the program pages that have been pre-cached by writing $7f4d/e and
       $7f48 are available to the CPU. If a different program page is requested
       prior to execution either by $7f4d/e -> $7f4f, or by a JMP/CALL P
       instruction at runtime, execution will stop immediately.
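
       One possible reading of the three cases above, as a Python sketch.
       The function name and the ("jump"/"load"/"stop", page) return values
       are invented here. The case "current page locked, other page
       unlocked" is not spelled out in the notes; the sketch ASSUMES the
       unlocked other page gets loaded then.

```python
def resolve_page(current, tags, locked, wanted):
    """Decide what a JMP/CALL P to program page `wanted` would do.

    current: cache page executing now (0 or 1)
    tags:    [tag0, tag1], program page held by each cache page
    locked:  [lock0, lock1], the $7f4c lock bits
    """
    for page in (0, 1):
        if tags[page] == wanted:
            return ("jump", page)         # already cached, no bus traffic
    other = 1 - current
    if not locked[other]:
        return ("load", other)            # buffer into the other page
    if not locked[current]:
        return ("load", current)          # overwrite the running page
    return ("stop", None)                 # both locked: execution stops


print(resolve_page(0, [5, 6], [False, True], 7))
```

       A RET back into an overwritten page would go through the same
       decision again, which is where the repeated 512-byte reloads come
       from.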

$7f50: 7: -
       6-4: WS1 (ROM read waitstates) (0-7)
       3: -
       2-0: WS2 (cart RAM read/write waitstates) (0-7)
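
       The bit layout above, as a Python sketch for packing and unpacking
       the register value. Function names are made up for illustration.

```python
WS_MASK = 0x77  # bits 7 and 3 are unused


def pack_waitstates(ws1: int, ws2: int) -> int:
    """Build a $7f50 value from WS1 (cart ROM) and WS2 (cart RAM)."""
    if not (0 <= ws1 <= 7 and 0 <= ws2 <= 7):
        raise ValueError("waitstate counts are 3-bit values (0-7)")
    return (ws1 << 4) | ws2


def unpack_waitstates(value: int):
    """Return (ws1, ws2) from a $7f50 value, ignoring the unused bits."""
    value &= WS_MASK
    return (value >> 4) & 7, value & 7


print(hex(pack_waitstates(4, 4)), unpack_waitstates(0xFF))
```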

$7f51: 0: IRQ ack / inhibit
       (write 1 to ACK and disable further IRQ
        write 0 to enable IRQ)

$7f52: ROM configuration select (bit 0; 0/1 below is the bit's value)
       LoROM: 0: 2x  8Mbit (A21 switches between ROM /CE1 and /CE2)
              1: 1x 16Mbit (maybe A22 switches but 40-7f/c0-ff are inactive)

       HiROM: 0: 2x  8Mbit (A20 switches)
              1: 2x 16Mbit (A21 switches)

$7f53: READ (mirrors: $7f54-$7f57, $7f59, $7f5b-$7f5f)
       7: CPU is accessing ROM bus (SNES cut off)
       6: CPU is running
       5-2: -
       1: IRQ pending
       0: Cx4 suspended (see $7f55-$7f5d)

       WRITE (no mirrors)
       Any write access returns the Cx4 to idle state immediately - useful to
       recover from infinite loops ;)
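
       A small Python sketch decoding the status bits listed above. The
       key names are invented for readability; only the bit positions come
       from these notes.

```python
def decode_status(value: int) -> dict:
    """Split a $7f53 status read into its documented flags."""
    return {
        "bus_claimed": bool(value & 0x80),   # bit 7: Cx4 on ROM bus, SNES cut off
        "running": bool(value & 0x40),       # bit 6: CPU is running
        "irq_pending": bool(value & 0x02),   # bit 1
        "suspended": bool(value & 0x01),     # bit 0
    }


print(decode_status(0xC0))
```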

$7f55: Any write access indefinitely suspends the Cx4 (registers can be
       read and written but the CPU shows no reaction: no buffering occurs,
       no code is run and $7f53 is not updated).
       Cx4 status bit 0 is set.

$7f56: Any write: Suspend Cx4 for  32 cycles ( 1.6µs @20MHz)
$7f57: Any write: Suspend Cx4 for  64 cycles ( 3.2µs @20MHz)
$7f58: Any write: Suspend Cx4 for  96 cycles ( 4.8µs @20MHz)
$7f59: Any write: Suspend Cx4 for 128 cycles ( 6.4µs @20MHz)
$7f5a: Any write: Suspend Cx4 for 160 cycles ( 8.0µs @20MHz)
$7f5b: Any write: Suspend Cx4 for 192 cycles ( 9.6µs @20MHz)
$7f5c: Any write: Suspend Cx4 for 224 cycles (11.2µs @20MHz)

These registers can be used to obtain guaranteed access to ROM/RAM from the
SNES side while the Cx4 is running. CPU and/or ongoing DMA transfers are
suspended.
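
The durations in the table follow a simple pattern (32 cycles per register
step), which this Python sketch reproduces. The function names are made up
for illustration; 50ns per cycle assumes the 20MHz clock used above.

```python
CYCLE_NS = 50  # one Cx4 cycle at 20MHz


def suspend_cycles(reg: int) -> int:
    """Suspend length in cycles for a write to $7f56-$7f5c."""
    if not 0x7F56 <= reg <= 0x7F5C:
        raise ValueError("only $7f56-$7f5c suspend for a fixed time")
    return 32 * (reg - 0x7F55)


def suspend_ns(reg: int) -> int:
    """Same duration in nanoseconds."""
    return suspend_cycles(reg) * CYCLE_NS


for reg in range(0x7F56, 0x7F5D):
    print(f"${reg:04x}: {suspend_cycles(reg):3d} cycles, {suspend_ns(reg)} ns")
```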

$7f5d: Any write clears the Cx4 suspend flag and the chip becomes
       responsive again (presumably resumes execution).

$7f5e: Any write clears the IRQ pending flag WITHOUT touching the actual
       cart IRQ signal (remains low)

If IRQ is enabled, /IRQ goes high->low when the Cx4 CPU stops, and stays low.
Software must ACK by writing 1 to $7f51 bit 0 -> /IRQ will go high again.
Software must then write 0 again to re-enable IRQ triggering for the next
execution.
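
The ACK/re-enable handshake can be sketched as a tiny state model in Python.
Class and method names are invented for this sketch. The power-on state of
the inhibit bit is NOT known from these notes; the model simply assumes IRQ
starts inhibited. Only the /IRQ line is modelled, not the pending flag.

```python
class IrqLineModel:
    """Toy model of the cart /IRQ line as driven via $7f51 bit 0."""

    def __init__(self):
        self.inhibit = True        # assumption: start with IRQ inhibited
        self.irq_low = False       # True while /IRQ is asserted (low)

    def write_7f51(self, value: int) -> None:
        self.inhibit = bool(value & 1)
        if self.inhibit:
            self.irq_low = False   # writing 1 ACKs: /IRQ goes high again

    def cpu_stopped(self) -> None:
        if not self.inhibit:
            self.irq_low = True    # /IRQ goes high->low and stays low


m = IrqLineModel()
m.write_7f51(0)    # enable IRQ
m.cpu_stopped()    # Cx4 finishes, /IRQ asserted
m.write_7f51(1)    # ACK and inhibit
m.write_7f51(0)    # re-enable for the next run
```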


CPU registers
=============

$20: PC  (PC of current instruction + 1)

$28: ??? (always seems to return $2e)

$2e: cart ROM bus port (triggers cart ROM reads),
     to be used with $61 opcode!
     Waitstates = $7f50 bits 6-4.

$2f: cart RAM bus port (triggers cart RAM reads/writes),
     to be used with $61 / $e1 opcode!
     Waitstates = $7f50 bits 2-0.

$70-$7f are mirrors of $60-$6f ("R0-R15").
internal register address appears to be 7-bit, e.g. $e0-$ff are the same
as $60-$7f.


Instruction cycles
==================

1 cycle = 50ns (@20MHz) duh
The vast majority of instructions execute in a single cycle.
Exceptions/noteworthy details are:

 - jmp/call takes 1 cycle if branch not taken, 3 cycles if taken
   (regardless of p flag, crossing page boundaries comes at no extra cost)

 - ret takes 3 cycles

 - skip takes 1 cycle for itself, but it makes the skipped instruction count
   for 1 cycle (injected NOP or equivalent)

 - internal data rom/ram access takes 1 cycle only.

 - cart ROM access from Cx4 code:

   * Cartridge ROM is accessed by reading from register $2e to a special
     internal register (fullsnes: ext_dta). (Opcode: $612e)
   * The read itself is 0-waitstate and executes in a single cycle. However
     the result will not be valid before the appropriate number of waitstates
     is reached and the data is actually pulled in from the ROM.
   * The CPU may execute other code in the meantime.
   * To stall the CPU until the ROM read operation is complete, a wait
     instruction can be issued ($1c00).
   * The external bus address does not auto-increment; to do so, a special
     instruction can be issued ($4000). There may be a decrement instruction
     as well (as of yet unknown). It is useful to do this before the wait
     instruction to save a cycle.
   * The number of waitstates can be configured by setting $7f50 bits 6-4.

 - cart RAM access from Cx4 code:

   * Cartridge RAM is accessed by reading from register $2f to a special
     internal register, or by writing to register $2f from the same.
     (Opcode: $612f (read) / $e12f (write))
   * Access handling appears to be the same as for cart ROM, and the
     same for reading and writing:
      issue read/write; alter bus address; wait for complete (or do
      something else in the meantime)
   * The number of waitstates can be configured by setting $7f50 bits 2-0.
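
A rough timing model for the access pattern above, as a Python sketch. It
matches the measurements in the scribble section further down (read+wait at
WS = 0, 1, 2, 4 and read+inc+wait at WS = 4). Anything beyond those data
points is an assumption of the model, and the function name is made up.

```python
CYCLE_NS = 50  # one Cx4 cycle at 20MHz


def bus_access_cycles(ws: int, filler: int = 0) -> int:
    """Cycles for: bus read ($612e/$612f/$e12f), then `filler` other
    single-cycle instructions (e.g. $4000), then wait ($1c00)."""
    issued = 1 + filler + 1      # each instruction costs at least 1 cycle
    data_ready = 1 + ws          # access completes ws cycles after the read
    return max(issued, data_ready)


for ws in (0, 1, 2, 4):
    cycles = bus_access_cycles(ws)
    print(f"WS={ws}: {cycles} cycles, {cycles * CYCLE_NS} ns")
```

In this model the increment ($4000) is free as long as WS is 2 or more, since
it runs while the access is still in flight.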

Cartridge bus is only claimed by the Cx4 when DMA or caching occurs, or a bus
access is carried out by Cx4 code. At all other times the SNES address+data
buses are forwarded to the ROM/RAM even if the Cx4 CPU is running.

Flags
=====

As of yet untouched. higan seems to do a decent job at them already.


scribble / internal notes from testing
======================================

This is probably useless but eh.

Cx4 code base: 01:8000
PC always 00

1 cycle = 50ns

255 NOP + 1 STOP in 12.8us -> 50ns/inst  -> 1 cycle/inst
255 JMP + 1 STOP in 38.4us -> 150ns/inst -> 3 cycles/inst
255 MOV -> A/R0 + 1 STOP in 12.8 us -> 50ns/inst -> 1 cycle/inst
255 RDWR ROM/RAM + 1 STOP in 12.8 us -> 50ns/inst -> 1 cycle/inst
255 RDBUS (612E) -> CRASH
127 RDBUS+WAIT + 1 STOP in 32us -> 250ns/pair -> 5 cycles total (WS = 4)
127 RDBUS+WAIT + 1 STOP in 19.2us -> 150ns/pair -> 3 cycles total (WS = 2)
127 RDBUS+WAIT + 1 STOP in 12.8us -> 100ns/pair -> 2 cycles total (WS = 1)
127 RDBUS+WAIT + 1 STOP in 12.8us -> 100ns/pair -> 2 cycles total (WS = 0!)
127 RDBUS+INC + 1 STOP in 12.8us + CRASH -> 100ns/pair -> 2 cycles total (WS = 4!!!!!)
 85 RDBUS+INC+WAIT + 1 STOP in 21.4us -> 250ns/triple -> 5 cycles total (WS = 4)
 85 WRBUS?!+INC+WAIT + 1 STOP in 21.4us -> 250ns/triple -> 5 cycles total (WS = 4)

Page    file
======================
 00     cx4_00_nop.bin            1
 01     cx4_08_jmp.bin            3
 02     cx4_64_mova.bin           1
 03     cx4_e0_movr0.bin          1
 04     cx4_70_rdrom.bin          1
 05     cx4_68_rdram_r0.bin       1
 06     cx4_6c_rdram_imm.bin      1
 07     cx4_e8_wrram_r0.bin       1
 08     cx4_ec_wrram_imm.bin      1
 09     cx4_40_rdbus_wait.bin     ?!
 0a     cx4_40_rdbus_nowait.bin   1
 0b     cx4_24_skip_unknown.bin   1
 0c     cx4_25_skip_nc.bin        1
 0d     cx4_25_skip_c.bin         1
 0e     cx4_28_call.bin           3
 0f     cx4_81_add_shl1.bin       1
 10     cx4_61_bustest.bin        - (1)
 11     cx4_61_bustest_1c.bin     1+WS (1 cycle 612e; ->WS cycles 1c00)
 12     cx4_61_bustest_40.bin     2+crash (1 cycle 612e; 1 cycle 4000)
 13     cx4_61_bustest_401c.bin   2+(WS-1) (1 cycle 612e; 1 cycle 4000; ->WS cycles 1c00)
 14     cx4_e0_bustest_401c.bin   1
 15     cx4_e1_bustest_401c.bin   1
 16     cx4_e12f_bustest_401c.bin 2+WS2-1 !!!!
 17     cx4_612f_bustest_401c.bin 2+WS2-1 !!!!
 18     cx4_e02f_bustest_401c.bin 3 (doesn't work!)
 19     cx4_0a_jmp_p1.bin         3 (P test not applicable - see 1d+1e)
 1a     cx4_0a_jmp_p2.bin         3 (P test not applicable - see 1d+1e)
 1b     cx4_0c_jz.bin             1 (not taken), 3 (taken)
 1c     cx4_10_jc.bin             1 (not taken), 3 (taken)
 1d     cx4_3c_ret.bin            3
 1e     cx4_2a_call_p.bin         1 (not taken), 3 (taken)
 1f     cx4_25_skiptest_nc.bin    1 (not taken), 2 (taken)
 20     cx4_25_skiptest_c.bin     1 (not taken), 2 (taken)
 21     cx4_25_skiptest_nc_alljmp.bin
 22     cx4_25_skiptest_c_alljmp.bin
 23     cx4_xx_infloop.bin        -
 24     cx4_xx_busloop.bin        (for 0x80 flag testing)
 25     cx4_xx_dumpflags.bin      (hopefully)
 26     cx4_f0_xchg.bin           1
 27     cx4_xx_opdump.bin         -
 28     cx4_xx_flagloop.bin       -
 29     cx4_xx_opdump2.bin        -
 2a     cx4_xx_opdump3.bin        -

opdump:

PC       regs
---------------
00       04,05,06,07,09,0a,0b,0d,0e,0f,10,11,12,14,15,16
28       17,18,19,1a,1b,1d,1e,1f,20,21,22,23,24,25,26,27
50       28,29,2a,2b,2c,2d,30,31,32,33,34,35,36,37,38,39
78       3a,3b,3c,3d,3e,3f,40,41,42,43,44,45,46,47,48,49
a0       4a,4b,4c,4d,4e,4f,70,71,72,73,74,75,76,77,78,79
c8       7a,7b,7c,7d,7e,7f,80,81,82,83,84,85,86,87,88,89
f0       8a,8b,8c,8d,8e,8f,90

opdump2:

PC       regs
---------------
00       91,92,93,94,95,96,97,98,99,9a,9b,9c,9d,9e,9f,a0
28       a1,a2,a3,a4,a5,a6,a7,a8,a9,aa,ab,ac,ad,ae,af,b0
50       b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf,c0
78       c1,c2,c3,c4,c5,c6,c7,c8,c9,ca,cb,cc,cd,ce,cf,d0
a0       d1,d2,d3,d4,d5,d6,d7,d8,d9,da,db,dc,dd,de,df,e0
c8       e1,e2,e3,e4,e5,e6,e7,e8,e9,ea,eb,ec,ed,ee,ef,f0
f0       f1,f2,f3,f4,f5,f6,f7

opdump3:

PC       regs
---------------
00       f8,f9,fa,fb,fc,fd,fe,ff

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2016-08-07 (#176888)

Great work, thank you very much for sharing it with us!

I can't say I understand all of your notes, but there's some stuff I should be able to improve. Really surprised that almost all instructions really do run at 20MIPS ... this thing is indeed a beast!

When a program cache needs to be loaded, does the Cx4 stall and read in the entire page all at once? Or is it an on-demand thing, fetching the bytes as they execute? Given the way the Cx4 has to pull the ROM off the bus, I assume the former.

Also, I'm unsure about emulating the HiROM mapping mode. I want to, but the weird address mangling you mention isn't something I'm going to want to do. If I do support it, I'll probably allow a non-mangled ROM to be used instead. Add something like mode=banked|linear to the cx4 node in the manifest.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-08 (#176909)

byuu wrote:

When a program cache needs to be loaded, does the Cx4 stall and read in the entire page all at once? Or is it an on-demand thing, fetching the bytes as they execute? Given the way the Cx4 has to pull the ROM off the bus, I assume the former.

The former is correct. It always reads an entire cache page's worth of bytes (512) from cart ROM/RAM, that is its granularity for the cache tags.

Oh, it may be worth mentioning that the Cx4 only acquires the bus when it is actually accessing it. This is a given for buffering/DMA but it also applies to individual bus read instructions carried out by Cx4 code.
E.g. with 4 waitstates (as set by RMX2/3), when a $612e;$4000;$1c00 happens, it will only acquire the bus and apply its own address for 250ns, and forward the SNES CPU bus at all other times.

byuu wrote:

Also, I'm unsure about emulating the HiROM mapping mode. I want to, but the weird address mangling you mention isn't something I'm going to want to do.

I'd say it is safe to assume a "classical" HiROM mapping, my guess is that the pin was just intended for the Cx4 to respond on HiROM-ish addresses, requiring a dedicated PCB layout for the rest. It is set by a pin, not a register after all.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by AWJ on 2016-08-08 (#176911)

ikari_01 wrote:

byuu wrote:

The former is correct. It always reads an entire cache page's worth of bytes (512) from cart ROM/RAM, that is its granularity for the cache tags.

I think what he meant to ask is does the Cx4 wait for the entire cache page to be loaded before resuming execution, or does it start executing as soon as the address being jumped to is in the cache? I.e. is jumping to the first address in a 512-byte page any faster than jumping to an address near the end of the page?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-08 (#176912)

Oh ok. No, the CPU is halted until the entire page has been filled.

EDIT: Oh, looks like I forgot to mention one other thing:
Writing to $7f48 is used by SNES code to preload at least one of the cache pages. This is required before starting the CPU using $7f4f. Bit 0 determines which of the two pages to write to. Waitstates as usual are according to $7f50 and the source address set at $7f49-$7f4b.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by tepples on 2016-08-08 (#176920)

byuu wrote:

Also, I'm unsure about emulating the HiROM mapping mode. I want to, but the weird address mangling you mention isn't something I'm going to want to do. If I do support it, I'll probably allow a non-mangled ROM to be used instead.

Agreed. The situation reminds me of the "swapbin" utility. Rearrange the banks of the ROM image into what the program sees, and just assume the HiROM boards have a funny swapped ROM pinout.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by AWJ on 2016-08-08 (#176948)

In which 65816 banks are the Cx4 registers and internal RAM mapped in each mode (LoROM/HiROM)? The HiROM mapping you've shown for external RAM overlaps with the register/internal RAM mapping.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by qwertymodo on 2016-08-08 (#176951)

Great to see somebody finally figured out pins 74 and 75. Now we have a complete pinout

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Markfrizb on 2016-08-08 (#176960)

So if you hold pin75 high, is this basically tri-stating the outputs of the CX4?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by qwertymodo on 2016-08-08 (#176967)

Markfrizb wrote:

So if you hold pin75 high, is this basically tri-stating the outputs of the CX4?

If I read it right, you can still read and write to the Cx4 itself (i.e. the internal registers) but it no longer passes through reads and writes to the ROM/SRAM. Theoretically, you could have a ROM/RAM dedicated to just the Cx4, and then a separate ROM/RAM for the main cart address space, though you'd probably need a separate address decoder in that case.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2016-08-08 (#176983)

qwertymodo wrote:

I would think that would make the chip useless. If you don't have ROM, you can't load in program cache pages. If you don't have RAM, you don't really have space to write out any results from your code executing. If you don't have either, you have a 20MHz paperweight.

If the Cx4 had its own little pool of program RAM and data RAM, that might be a useful feature to exist.

At any rate, I'm definitely not emulating this mode.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-08 (#176988)

AWJ wrote:

In which 65816 banks are the Cx4 registers and internal RAM mapped in each mode (LoROM/HiROM)? The HiROM mapping you've shown for external RAM overlaps with the register/internal RAM mapping.

LoROM: 00-3f:6000-7fff, 80-bf:6000-7fff
HiROM: 00-2f:6000-7fff, 80-af:6000-7fff (30-3f, b0-bf is used for cart RAM)
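
For emulator authors, the two lines above boil down to something like the following Python sketch. The function name is made up, and it only looks at the xx:6000-7fff window in banks 00-3f/80-bf; anything else returns None because it is simply outside what this sketch covers.

```python
from typing import Optional


def window_6000(bank: int, hirom: bool) -> Optional[str]:
    """What answers at xx:6000-7fff for banks 00-3f/80-bf (sketch only)."""
    b = bank & 0x7F                  # 80-bf behave like 00-3f here
    if b > 0x3F:
        return None                  # outside the banks this sketch covers
    if hirom and b >= 0x30:
        return "cart_ram"            # 30-3f/b0-bf give way to cart RAM
    return "mmio"


print(window_6000(0x30, hirom=False), window_6000(0x30, hirom=True))
```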

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-08 (#176989)

qwertymodo wrote:

This. I could only imagine this to be useful when in need of vastly more ROM than the Cx4 can address itself, or when it's absolutely necessary for the SNES to run from ROM while the Cx4 is working.
IIRC the only real world example that uses such a setup is the SPC7110.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by AWJ on 2016-08-08 (#176991)

byuu wrote:

qwertymodo wrote:

The Cx4 would have ROM, it just wouldn't be visible to the S-CPU (and vice-versa, the S-CPU's ROM wouldn't be visible to the Cx4). It would be a bit similar to the GSU configuration that's described in documents but not used by any actual games, where SNES-only ROM is mapped in the fast banks.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-10 (#177101)

Updated the notes with some feedback, also I got the pins the wrong way around :oops:

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Optiroc on 2016-08-11 (#177148)

Ikari: Do you plan to implement any way of loading custom Cx4 code with SD2SNES? For example by using the format byuu proposed here.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by AWJ on 2016-08-11 (#177161)

The Cx4 "firmware" is nothing but trig and reciprocal lookup tables. The actual program is entirely in the external ROM. I suspect you could construct lookup tables programmatically at startup (or even at compile time with what modern C++ allows in a constexpr) that exactly match the internal ROM's; you would just have to be careful about rounding.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2016-08-11 (#177253)

The Cx4 is unique in that it's the only SNES cartridge firmware that we could legally distribute with emulators, since you can't copyright math constants.

The one advantage of allowing the firmware to be separate would be if you wanted to use your own data ROM instead. As it's ROM data, there's no reason it couldn't be changed at manufacturing time ala the NEC uPD DSPs, but it would certainly be much less important since it's only the data ROM.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by nocash on 2016-08-14 (#177427)

Good findings! Including some really unexpected low-level stuff... how did you figure out the "cache page lock" and "Suspend Cx4 for NN cycles" features - just by examining cpu's code execution timings?

Are port 7F40h and up R/W? Most of the 7F4xh registers are only known to be writeable, but reading might work as well?

Or better, what about the whole 6C00h..7FFFh area? Is there anything read-able in there, mirrors or so?

What happens on reading unused bits (eg. 2 unused bits in the waitstate register, etc.)? Do they always return zero or so? And did you test if they ignore attempts to write different values to them?

Did you check if 7F60h and up can change exception vectors other than NMI/IRQ? Might also support COP/BRK, and possibly also exception vectors in 6502-emulation mode.
And when are those vectors mapped to FFExh, is there an enable bit for that? Or they just mapped anytime when the CX4 is busy (and unmapped when it's ready)?

For Port "$7f53: READ (mirrors: $7f54-$7f57, $7f59, $7f5b-$7f5f)", the official address should be 7F5Eh (the games are using that address to test the busy flag in bit6), ie. 7F53h should be treated as "mirror" of 7F5Eh. (That of course, for READing only) (the 7F53h WRITE description should be kept as is).
NB. the games are testing 7F5Eh.bit6 not only for "CPU running", but also for "DMA running" or "Cache load running", so it seems to be a more general busy flag, not solely for CPU code execution.

Having the 7F52h ROM config description is nice (there has been a bunch of conflicting info whether one should write 00h or 01h to 7F48h or 7F52h for mapping two ROMs), but it's much clearer. Oh, only, the notation in your txt is slightly confusing for that port, everywhere else "0: and 1:" are used for "bit0 and bit1," but I guess here they do mean "set bit0 to 0 or 1", right?
For 2x16Mbit ROM, did you also test if A23 toggles between /CE1 and /CE2 in LoROM mode? Aside from A22, that would be the other possible address line for mapping more than 2Mbyte LoROM.

---

And, for the CPU operands...

"PC of current instruction + 1" is that the 8bit program counter, or the whole 24bit address? In the cx4-style NNNN:NN notation (rather than SNES-style NN:NNNN notation)?
And then, "+1" means plus one 16bit word, ie +2 bytes, right?

"$28: ??? (always seems to return $2e)" that's (as known so far) unrelated to operand number 2Eh? And just returns a (seemingly) constant value of 00002Eh? Might be some internal status value. Btw. if there's a way to read PC, did you check if one could read SP for call/ret stack, too?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-14 (#177441)

nocash wrote:

I actually figured out cache page lock when first doing the Cx4 core for the sd2snes. It kept working improperly and the game made use of $7f4c in the corresponding places... I was like "hm, this could only work if the cache behaved something like this..." and at one point it became clear.

The suspend feature was found by accident shortly before posting my notes when messing around with the DMA feature. Before finding out that $7f47 is actually just the destination bank number instead of some mode setting register I wondered why it locked up for certain values. One idea was that it might be used to "DMA" values poked by the S-CPU to a specific register so I started poking some registers during lockup and found that for $7f56-$7f5c the SNES address appeared on the ROM data bus instead of the Cx4 bus address for a number of different durations. This also conveniently explained the bit 0 flag in $7f5e getting set not only for $7f55 writes but also $7f56-7f5c.

nocash wrote:

Are port 7F40h and up R/W? Most of the 7F4xh registers are only known to be writeable, but reading might work as well?
Or better, what about the whole 6C00h..7FFFh area? Is there anything read-able in there, mirrors or so?
What happens on reading unused bits (eg. 2 unused bits in the waitstate register, etc.)? Do they always return zero or so? And did you test if they ignore attempts to write different values to them?

$6000-$6bff is fully r/w (3kB RAM)
$6c00-$6fff returns random values with a bit of noise.
$7000-$7bff is fully r/w (mirror of $6000-$6bff)
$7c00-$7f3f returns the same kind of garbage as above.
$7f40-$7f52 reflect the values written but only for the bits actually used. All other bits are read as 0. (e.g. writing $ff to $7f52 will result in $7f52 reading $01.)
$7f60-$7f7f is fully r/w. (vector area)
$7f80-$7faf is fully r/w. (16x24bit registers)
$7fb0-$7fbf is always 0.
$7fc0-$7fef is fully r/w. (mirror of $7f80-$7faf)
$7ff0-$7fff is always 0. (mirror of $7fb0-$7fbf)
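
The readback list above can be turned into a lookup table, e.g. for logging accesses in an emulator. This is only a Python sketch; the names are made up, and $7f53-$7f5f is left out because it is not part of the list above.

```python
REGIONS = [
    (0x6000, 0x6BFF, "internal RAM"),
    (0x6C00, 0x6FFF, "unmapped (noise)"),
    (0x7000, 0x7BFF, "internal RAM (mirror)"),
    (0x7C00, 0x7F3F, "unmapped (noise)"),
    (0x7F40, 0x7F52, "MMIO registers (unused bits read 0)"),
    (0x7F60, 0x7F7F, "vector area"),
    (0x7F80, 0x7FAF, "16x24bit registers"),
    (0x7FB0, 0x7FBF, "always 0"),
    (0x7FC0, 0x7FEF, "16x24bit registers (mirror)"),
    (0x7FF0, 0x7FFF, "always 0"),
]


def describe(offset: int) -> str:
    """Name the readback behaviour of an offset in $6000-$7fff."""
    for start, end, name in REGIONS:
        if start <= offset <= end:
            return name
    return "not covered by this list"


print(describe(0x7123), "/", describe(0x7F55))
```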

The "random" pattern for the unmapped work RAM areas ($6c00-$6fff, $7c00-$7f3f) is a bit strange, it is a 32 byte pattern repeating itself.

nocash wrote:

Did you check if 7F60h and up can change exception vectors other than NMI/IRQ? Might also support COP/BRK, and possibly also exception vectors in 6502-emulation mode.
And when are those vectors mapped to FFExh, is there an enable bit for that? Or they just mapped anytime when the CX4 is busy (and unmapped when it's ready)?

$7f60-$7f7f is always mapped to $ffe0-$ffff while the Cx4 is busy and always unmapped when it is idle. All 32 bytes can be used. There is no separate enable flag.
Also, $7f40-$7f5f is always mapped to $ffc0-$ffdf while the Cx4 is running, whatever that may be good for.

nocash wrote:

For Port "$7f53: READ (mirrors: $7f54-$7f57, $7f59, $7f5b-$7f5f)", the official address should be 7F5Eh (the games are using that address to test the busy flag in bit6), ie. 7F53h should be treated as "mirror" of 7F5Eh. (That of course, for READing only) (the 7F53h WRITE description should be kept as is).

I wonder. I think I've seen MMX2 poll $7f56 on one occasion, might have been the Cx4 tests though.

nocash wrote:

NB. the games are testing 7F5Eh.bit6 not only for "CPU running", but also for "DMA running" or "Cache load running", so it seems to be a more general busy flag, not solely for CPU code execution.

True, and these other actions also affect the IRQ flag.

nocash wrote:

[$7f52]Oh, only, the notation in your txt is slightly confusing for that port, everywhere else "0: and 1:" are used for "bit0 and bit1," but I guess here they do mean "set bit0 to 0 or 1", right?
For 2x16Mbit ROM, did you also test if A23 toggles between /CE1 and /CE2 in LoROM mode? Aside from A22, that would be the other possible address line for mapping more than 2Mbyte LoROM.

Oops, yes, it's the value 0 or 1 for bit 0. Going to correct that too.
A23 does nothing unfortunately so it really seems the cap for LoROM is 16Mbits.

nocash wrote:

"PC of current instruction + 1" is that the 8bit program counter, or the whole 24bit address? In the cx4-style NNNN:NN notation (rather than SNES-style NN:NNNN notation)?
And then, "+1" means plus one 16bit word, ie +2 bytes, right?

It is the 8-bit program counter within the cache page, so it reads 0000NN. I don't know yet what happens when reading the register at PC=$ff. "+1" means one instruction, so one 16-bit word.

nocash wrote:

"$28: ??? (always seems to return $2e)" that's (as known so far) unrelated to operand number 2Eh? And just returns a (seemingly) constant value of 00002Eh? Might be some internal status value. Btw. if there's a way to read PC, did you check if one could read SP for call/ret stack, too?

I see no obvious connection to operand $2e, though I haven't tried writing to $28 yet. Regarding the stack pointer, all other operands in the 00-ff range, save for the constants and other documented ones, have read 000000. I shall try manipulating the stack (e.g. call) and do another readout some time.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-14 (#177443)

Also, while the Cx4 is busy, from the SNES's perspective:

- when Cx4 CPU is running or DMA to internal RAM in progress, $6000-$7f3f return $ff
- when Cx4 CPU is running (but not when DMA takes place) $7f80-$7faf and $7fc0-$7fef return various things, sometimes R0 is repeated throughout, sometimes it changes wildly. Could be the current contents of the CPU's IDB.
- when Cx4 is accessing the bus (in any way), ROM/cartRAM reads return $00.
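
A minimal sketch of the three observations above, as an emulator might model the SNES-side read path. The function name, the `region` labels and the flag names are hypothetical, and the "various things" seen at $7f80-$7faf / $7fc0-$7fef are deliberately not modelled:

```python
def snes_read(region, offset, value, cpu_running=False,
              dma_to_internal_ram=False, bus_claimed=False):
    """Return what the SNES would see for one read.

    region: "cart" for ROM/cart RAM, "cx4" for the $6000-$7fff window
            (hypothetical labels, not from the notes).
    value:  the byte that would be returned if the Cx4 were idle.
    """
    if region == "cart":
        # ROM/cart RAM reads return $00 while the Cx4 is using the bus.
        return 0x00 if bus_claimed else value
    if 0x6000 <= offset <= 0x7F3F and (cpu_running or dma_to_internal_ram):
        # Internal RAM window reads back $ff while the chip is busy.
        return 0xFF
    return value
```

For example, `snes_read("cx4", 0x6000, 0x12, cpu_running=True)` gives `0xFF`, while the same read at `0x7F40` passes the value through.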

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by qwertymodo on 2016-08-14 (#177455)

In LoROM, you could probably get 24 MBit ROM with a 16MBit ROM1 and 8MBit ROM2 and "bank switching" the second half of ROM1 and all of ROM2 in the upper half of the ROM address space (I forget which register that is, but there's the one that switches between 1x16 and 2x8).

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-14 (#177458)

That should actually work. Crafty!

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by qwertymodo on 2016-08-14 (#177495)

Good luck deciding how to map the resulting ROM file for emulation though

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2016-08-14 (#177500)

And another one: $7f58 and $7f5a always read as $00.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Revenant on 2017-08-28 (#203426)

Some things I'm curious/unclear about after reading these notes:

- Does reading registers $2e/$2f using other register read instructions (i.e. $602e instead of $612e) still get meaningful data from the bus somehow (after waitstates)?
- Does writing $2f using a register write instruction allow writing to the bus from e.g. the A register instead of ext_dta?
- Does reading/writing other registers using $61xx/e1xx still use/change the value of ext_dta?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Revenant on 2017-08-31 (#203630)

Anyway, I've started implementing these timing notes in bsnes-plus. Program page caching (and preloading), wait states, the RAM/ROM ports, and various other register and memory changes are in so far.

Here's the current state of the MMX2 attract demo:
https://www.youtube.com/watch?v=sYyPDf49JAQ

Definitely an improvement, although it still seems a little wonky. I don't have a real cartridge for comparison, so I can't tell how much of the things happening in the video are timing mistakes on my part or just normal gameplay mistakes.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by FitzRoy on 2017-09-01 (#203646)

Revenant wrote:

Definitely an improvement, although it still seems a little wonky. I don't have a real cartridge for comparison, so I can't tell how much of the things happening in the video are timing mistakes on my part or just normal gameplay mistakes.

I dunno, my first impression is.. that could be perfect. Mega Man survives the whole ordeal and the scene ends with him about to deliver a killing blow. Need someone with the cart to do a recording.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by fred on 2017-09-01 (#203667)

Got a friend with a rockman X2 cart who did a quick recording: https://youtu.be/F_P7I3dqj30
For some reason the capture occasionally drops to 30 fps recording, but I hope it's still serviceable.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Revenant on 2017-09-01 (#203668)

Thanks for the recording! I didn't compare frame-by-frame or anything but I'm surprised to see that my attempt so far seems to be pretty much right on target.

I guess I'll push the current WIP branch to github tonight after I finish tweaking a couple of other things. There's still some other stuff (mostly w/r/t memory mapping details) from the notes in this thread that I haven't implemented yet but I guess the majority of the crucial stuff is there now.

(edit: https://github.com/devinacker/bsnes-plus/tree/cx4)

Big thanks to ikari for researching and documenting all this stuff, too.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by jbo_85 on 2017-09-02 (#203733)

Cool! That sped up the spinning wireframe head in the intro which was quite demanding before.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Revenant on 2017-09-03 (#203842)

Almost everything in this thread is now emulated in the bsnes-plus master branch, excluding some of the details about $6000-7fff memory access while the Cx4 is running. I might add that stuff later but it's nothing that either MMX2 or MMX3 relies on to work correctly, I think.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2018-09-03 (#224967)

A year late, but not forgotten ...

Questions in the event anyone knows.

Is it confirmed in LoROM mode that the Cx4 bus can't see ROM at $c0-ff:0000-ffff?

Quote:

DMA source and destination CAN reference the same bus but not the same chip

I mean, what other bus could it access? It's not like the Cx4 can transfer data to VRAM. The only valid transfers seem to be cart ROM->cart RAM or Cx4 DRAM, cart RAM->DRAM, or Cx4 DRAM->cart RAM. Unknown if DMA to/from IO registers work, but hey, maybe. Then you get the fun possibility of a DMA transfer firing off a DMA transfer.

Are the DMA transfers staggered like SNES DMA? Eg:

cycle 1: read source[0]
cycle 2: write target[0], read source[1]
cycle n: write target[n-1], read source[n]
cycle n+1: write target[n]

Or are they more like this?

cycle 1: read source[0]
cycle 2: write target[0]
cycle n*2: read source[n]
cycle n*2+1: write target[n]
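
To make the difference between the two proposed models concrete, here is a small sketch that builds the per-cycle schedule for each and counts cycles. It ignores wait states entirely and assumes one bus operation slot per cycle as in the listings above; the function names are made up for illustration:

```python
def staggered_schedule(count):
    """Reads overlap the previous byte's write: count + 1 cycles."""
    if count == 0:
        return []
    cycles = []
    for i in range(count + 1):
        ops = []
        if i > 0:
            ops.append(("write", i - 1))   # write the byte read last cycle
        if i < count:
            ops.append(("read", i))        # fetch the next source byte
        cycles.append(ops)
    return cycles


def sequential_schedule(count):
    """Read and write never overlap: 2 * count cycles."""
    cycles = []
    for i in range(count):
        cycles.append([("read", i)])
        cycles.append([("write", i)])
    return cycles
```

Under these assumptions a 4-byte transfer takes 5 cycles in the staggered model and 8 in the sequential one, so the two should be distinguishable by timing a long transfer on hardware.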

What exactly is the purpose of $7f49-7f4b? You can address every possible location with $7f4d-7f4f already. Does $7f49-7f4b get taken into account with program RAM page tagging?

What happens if I write $7f48, but the requested page is already cached? Will it forcefully reload it again? Seems like we'd kind of want it to, right? Eg if you were loading from cart RAM, the data may be modified.

Is $7f4d-7f4f cached to an internal program counter register, or is $7f4d-7f4f the actual program counter itself? Eg if I write to it while the Cx4 is running, do horrible things happen?

Does writing to $7f53 get the Cx4 out of a lockup from a bad DMA transfer as well?

Quote:

$7f55: Any write access indefinitely suspends the Cx4 (registers can be
read and written but the CPU shows no reaction: no buffering occurs,
no code is run and $7f53 is not updated).
Cx4 status bit 0 is set.

Apparently $7f53 is updated if status bit 0 is set.

What happens if I write register $20? Can I actually simulate a (short) PC jump that way? Very bizarre it's not the full 24-bit PC, but given the way $7f4d-7f4f are laid out (bytes 8-15, bytes 16-23, THEN bytes 0-7), I suspect the PC is only 8-bit, and the extra 16-bits are just the "active" P register.

With $2e and $2f, what happens if I change the bus address during the fetch period? Will it screw up the read/write, or is the address cached? I suspect it must be cached, because if $4000 (increment bus address) occurs before the wait, then well ... that's not good, is it?

nocash doesn't seem to have the ability to test on real hardware, so with:
http://problemkaputt.de/fullsnes.htm#sn ... cx4opcodes
6000h+nnoooooooo ?? ??? mov A/ext_dta/?/prg_page,<op>

Where does the prg_page part come from? It's always so hard to tell in nocash's docs what's speculation and what's confirmed behavior. MMX2/3 doesn't use $63xx instructions, so I don't see how he'd know that. It's not obvious from other instructions that it's some sort of pattern.

Just to be clear, are the ROM/RAM waitstates the exact # of cycles required, or the # of ADDITIONAL cycles required, above 1? Eg WS1=3, is that "the data's ready in 3 cycles", or "the data's ready in 4 cycles"?

What happens if I start a DMA or cache page operation while the Cx4 is active? Will it take priority over the Cx4 instruction processing, stay pending until the Cx4 instructions are halted, or just do nothing?

What happens if the bus address used for $2e isn't ROM, or for $2f isn't RAM? Will it just transform the address and access said bus ANYWAY, ignore the operation, or lock the chip up?

What happens if a DMA crosses from RAM into ROM during the transfer?

Are the DMA source/length/target values updated during a transfer?

What does a DMA transfer length of zero do?

What's the actual overhead of DMA transfers per byte? Eg you say "EXTRA waitstates" on cart ROM/RAM accesses, so does that imply that accesses to internal RAM take one cycle? Or that the wait states are really waitstates+1?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Overload on 2018-09-05 (#225042)

Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011. Information regarding register mapping and the entire instruction set was published. The complete instruction set has been on the snes9x dsp website for 7 years.

http://users.tpg.com.au/advlink/dsp/cx4.html

The only thing I couldn't test on my mashmods flash programmer was RDBUS which most likely needs the snes clock signal to function.

I didn't find anything at $2e or $2f. Could it be that these addresses also require a clocking signal from the snes?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by nocash on 2018-09-05 (#225060)

Overload wrote:

Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011. Information regarding register mapping and the entire instruction set was published. The complete instruction set has been on the snes9x dsp website for 7 years.
http://users.tpg.com.au/advlink/dsp/cx4.html
The only thing I couldn't test on my mashmods flash programmer was RDBUS which most likely needs the snes clock signal to function.
I didn't find anything at $2e or $2f. Could it be that these addresses also require a clocking signal from the snes?

Very nice document. That was available in 2011??? I implemented the CX4 stuff in no$sns in December 2011; back then I only found two txt files (attached below), maybe those were much earlier versions of your doc? The cx4.html file appears in the internet archive starting at 2013, though I missed it back then, too, and never heard about it until today : /

Going by ikari's findings, opcode 4000h does merely do an address increment (it isn't RDBUS).

The CLEAR opcode and XNOR opcodes are new (to me), good to know about them.
Some opcodes can use only "Rx" (r0-r15) instead of "reg" (all registers)? Nasty : (

Having info about affected flags is useful (and knowing about the overflow flag to exist).

What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it? Or wait, is it some sort of "signed carry"? Ie. the Sign and Overflow flags being merged into a single flag?

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2018-09-05 (#225127)

Quote:

Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011.

"Most" feels like a strong overstatement.

Your page states the meaning of register $28, and has a few more opcodes (super, super awesome, by the way, thank you so much!) ... but there's nothing about program RAM caching, the extra MMIO registers, wait state on ROM/RAM accesses, or answers to any? of my questions.

On your instruction set ... you're missing $1c00 (wait for external bus reads), and it seems there's permanent confusion about the sequence of $612e,$4000,$1c00's purpose. So ... it's definitely not 100% complete. Again, this is awesome stuff and I'm not trying to be negative.

Code:

101000SS .xxxxxxx   A0   XNOR   A, reg   Inverse Exclusive OR   A = A ^ !reg   N Z
101001SS xxxxxxxx   A4   XNOR   A, imm   Inverse Exclusive OR   A = A ^ !imm   N Z

For XNOR, when imm is 8-bit, is it a = a ^ ~(uint8_t)imm, or is it a & ~imm? Eg what happens to bits 8-23 of the 8-bit immediate value?

Code:

V Overflow Flag 0

Holy heck I was missing an entire flag o_O

Can you please share the algorithm for computing this flag? Is it just the standard?

ADD: V = ~(A ^ data) & (A ^ result) & 0x8000;
SUB: V = ~(A ^ data) & (A ^ result) & 0x8000;

Or is it the slightly modified variant?

ADD: V = ~(A ^ data) & (A ^ result) & 0x8000;
SUB: V = (A ^ data) & (A ^ result) & 0x8000;

I guess you're probably not too interested in updating the document at this point with more details, heh.

Code:

03   MBR   Memory Buffer Register   8*   RW   00
08   ROMB   Immediate ROM   24   R   DATA_ROM[0]
0C   RAMB   RAM Buffer   24   RW   000000
13   MAR   Memory Address Register   24   RW   FFFFFF
1C   DPR   RAM Address   12*   RW   000

It really feels like 18 would be a data ROM address, but obviously that isn't necessary ... wow, the $74xx a=dataROM[abs] opcode was quite the find!

Code:

01100001 .xxxxxxx   61   MOV   MBR, reg**   Data Transfer Instruction   MBR = reg**
01100010 ....xxxx   62   MOV   MAR, Rx   Data Transfer Instruction   MAR = Rx

Code:

11100000 .xxxxxxx   E0   MOV   reg, A   Data Transfer Instruction   reg = A
11100001 .xxxxxxx   E1   MOV   reg**, MBR   Data Transfer Instruction   reg** = MBR

... is it really the case that $e0xx can write to all registers, but not $e1xx? That makes zero sense ... I mean, really ... absolutely zero sense. Why on earth would they impose such a limitation? Same for 62xx/61xx and reads. In the read case, does $61xx from registers 00-5f just always return 0, or just not set the MBR at all, or mirror the GPRs throughout the whole space?

Quote:

What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it?

Yeah, it seems to be. Your document also says:

Quote:

Carry Flag is CLEARED on borrow (ie. opposite as on 80x86 CPUs).

I have it as:

int r = ri() - sa();
cf = r >= 0;

I really don't think MMX2/3 would be playable if we had the carry flag implemented incorrectly ... right?
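
The two-line snippet above can be spelled out as a runnable sketch. It assumes 24-bit operands and uses placeholder names `lhs`/`rhs` for whatever `ri()` and `sa()` return; the point is only that carry is SET when no borrow occurs:

```python
MASK24 = 0xFFFFFF


def sub_with_carry(lhs, rhs):
    """Return (24-bit result, carry); carry is True when lhs >= rhs (no borrow)."""
    r = (lhs & MASK24) - (rhs & MASK24)
    return r & MASK24, r >= 0
```

So `sub_with_carry(5, 3)` yields `(2, True)` and `sub_with_carry(3, 5)` yields `(0xFFFFFE, False)`, which is the "cleared on borrow" behaviour quoted from the doc.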

Also from your docs:

Quote:

Call Stack is reportedly 16 levels deep, at least 16bits per level.

Pretty sure it's 8x23-bits, not 16x16+-bits.

Quote:

Stack Circular, 8 x 23-Bits
Internal RAM 4 x 384 x 16-Bits

Also, what the heck with the data RAM? Isn't it just 3072x8-bits? The way the RDRAM and WRRAM instructions work doesn't jive with it being 4 x ... x 16-bits at all. If it were 1 x ... 24-bits, then we wouldn't need L,M,H (L,H,B) variant instructions, presumably. And if it were 16-bits, we would only have two instead of three of them ...

Quote:

28 P Page Select 15* RW 00FF

Are you sure this is P and not the upper 15-bits of IP/PC? You say P, so I'm going with that, but good to confirm I guess.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2018-09-05 (#225152)

Also ...

Code:

000011F. xxxxxxxx   0C   BEQ   imm   Branch on Equal (Z=1)
000100F. xxxxxxxx   10   BGE   imm   Branch on Greater or Equal (T=1)
000101F. xxxxxxxx   14   BMI   imm   Branch on Minus (N=1)
000110F. xxxxxxxx   18   BVS   imm   Branch on Overflow (V=1)

We have the unused bit at d8 ... why the heck isn't this used to implement BNE, BLT, BPL, BVC? Augh ... so ridiculous.

I'm very torn between implementing the unused bits as opcode mirrors versus as no operations. Overload, what is your confidence in the .s being mirrors? Eg I notice for NOP, it could really have 11 dots instead of 10 to fill in a missing gap there.

Code:

01011001 ........   59   EXTS   A   Sign Extension (8 bits)      N Z
01011010 ........   5A   EXTS   A   Sign Extension (16 bits)      N Z

Wow, this one's a lot more limited than I thought. I figured it would take a register-or-immediate. I guess it would have been awkward since there were no shift bits. Also, I wasn't setting flags on these, fun.
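
A sketch of what the two EXTS forms would do under the table's description. It assumes A is 24 bits wide, that N mirrors bit 23 of the result and that Z means the result is zero; the function name is made up:

```python
def exts(a, bits):
    """Sign-extend the low `bits` (8 or 16) of A to 24 bits.

    Returns (new_A, N, Z).
    """
    sign = 1 << (bits - 1)
    low = a & ((1 << bits) - 1)
    # Classic xor/subtract trick: flips the sign bit, then removes its weight.
    result = ((low ^ sign) - sign) & 0xFFFFFF
    n = bool(result & 0x800000)
    z = result == 0
    return result, n, z
```

With these assumptions, `exts(0x000080, 8)` gives `(0xFFFF80, True, False)` and `exts(0x12347F, 8)` gives `(0x00007F, False, False)`.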

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Overload on 2018-09-06 (#225238)

nocash wrote:

Yes it was. It's been on the internet since the 17th of November 2011, that's the date listed on the main page. Until yesterday the site hadn't been updated since 2011. I also dumped the CX4 data ROM on the 8th of June 2011 in case you are interested.

nocash wrote:

Going by ikari's findings, opcode 4000h does merely do an address increment (it isn't RDBUS).

This was segher's interpretation of the opcode. As it says in the notes, I couldn't get it to work, and reading from $2e only moved zero into MBR, as was the case for all addresses from $0 to $5f.

nocash wrote:

What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it? Or wait, is it some sort of "signed carry"? Ie. the Sign and Overflow flags being merged into a single flag?

Mnemonics are based on Hitachi namings. "Greater Than Flag" is a carry Flag.

Cx4info.txt is segher's doc and cx4opcodes.txt is from the "CX4 Program ROM thread" on byuu's forum. Some of those are my comments including the exploit I used to dump the data rom.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Overload on 2018-09-06 (#225266)

byuu wrote:

"Most" feels like a strong overstatement.

Maybe not most but a lot of it was already known. I have information that is unpublished as well. I'm pretty sure some of that information was already in the "CX4 Program ROM" thread on your forums.

byuu wrote:

On your instruction set ... you're missing $1c00 (wait for external bus reads), and it seems there's permanent confusion about the sequence of $612e,$4000,$1c00's purpose. So ... it's definitely not 100% complete. Again, this is awesome stuff and I'm not trying to be negative.

Is it confirmed, is there definite proof that it is a wait instruction and not simply a nop or double nop?

byuu wrote:

For XNOR, when imm is 8-bit, is it a = a ^ ~(uint8_t)imm, or is it a & ~imm? Eg what happens to bits 8-23 of the 8-bit immediate value?

The immediate value is zero extended to 24 bits and the whole 24bits inverted.
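
As a sketch of that answer: the 8-bit immediate is widened with zeroes, all 24 bits are inverted, and the result is XORed into A. The optional shift on A (the SS field) is left out here, and the function name is hypothetical:

```python
MASK24 = 0xFFFFFF


def xnor_imm(a, imm8):
    """A = A ^ ~imm, with imm zero-extended to 24 bits before inversion."""
    operand = imm8 & 0xFF            # zero-extended: bits 8-23 are 0
    inverted = ~operand & MASK24     # so bits 8-23 become 1 after inversion
    return (a ^ inverted) & MASK24
```

This means bits 8-23 of A are always flipped by the immediate form, e.g. `xnor_imm(0xFFFFFF, 0xFF)` gives `0x0000FF`.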

byuu wrote:

Holy heck I was missing an entire flag o_O

Can you please share the algorithm for computing this flag? Is it just the standard?

standard overflow
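
For reference, a sketch of the usual two's-complement overflow rule, assuming that is what "standard" means here. It takes bit 23 as the sign bit since A is 24 bits wide (a 0x8000 mask would test bit 15 instead); the function names are made up:

```python
MASK24 = 0xFFFFFF
SIGN24 = 0x800000


def add_overflow(a, b):
    """Overflow on a + b: operands share a sign and the result's sign differs."""
    r = (a + b) & MASK24
    return bool(~(a ^ b) & (a ^ r) & SIGN24)


def sub_overflow(a, b):
    """Overflow on a - b: operands differ in sign and the result's sign differs from a."""
    r = (a - b) & MASK24
    return bool((a ^ b) & (a ^ r) & SIGN24)
```

Whether the hardware's SUB/SUBR forms follow exactly this operand order has not been confirmed in this thread.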

byuu wrote:

I guess you're probably not too interested in updating the document at this point with more details, heh.

I just updated it with some more info. I will update it if any of the information turns out to be incorrect or gets proven, or if I feel like adding more info. I am still involved in emulation, kindred is still an active project.

byuu wrote:

It really feels like 18 would be a data ROM address, but obviously that isn't necessary ...

I think they threw darts at a dart board when they were deciding the address numbers.

byuu wrote:

It does in the sense that MBR, the target is a Special Purpose Register and logically you can't be reading and writing to an SPR at the same time as they would exist in the same array. The General Purpose Registers must exist in a separate array with a datapath between them and the SPR array. You can't transfer from SPR to SPR. I hope that makes sense?

byuu wrote:

Pretty sure it's 8x23-bits, not 16x16+-bits.

Definitely 8x23-bits, I tested that. More than 8 pushes will overwrite previous stack pushes.
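
A minimal model of such a stack, for illustration. The depth and width come from the statement above; how the hardware actually moves its stack pointer is an assumption, and the class name is made up:

```python
class CallStack:
    """Circular call stack: 8 entries of 23 bits, oldest entry overwritten."""

    DEPTH = 8
    MASK23 = (1 << 23) - 1

    def __init__(self):
        self.slots = [0] * self.DEPTH
        self.sp = 0

    def push(self, value):
        self.slots[self.sp] = value & self.MASK23
        self.sp = (self.sp + 1) % self.DEPTH   # wraps, so push 9 lands on slot 0

    def pop(self):
        self.sp = (self.sp - 1) % self.DEPTH
        return self.slots[self.sp]
```

After nine pushes of the values 0 through 8, popping returns 8, 7, ..., 1 and then 8 again: the first return address (0) is gone.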

byuu wrote:

I can't remember whether it was I or somebody else who came up with that. Maybe it has something to do with how it is laid out on the die. I'll have to think about it some.

byuu wrote:

Are you sure this is P and not the upper 15-bits of IP/PC? You say P, so I'm going with that, but good to confirm I guess.

Correct. Not the upper bits of IP/PC.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by Overload on 2018-09-06 (#225270)

byuu wrote:

I was looking at these last night. My test program and kindred both have the following, which has got me scratching my head. It's been so long since I worked on this.

Code:

000010F0 xxxxxxxx   08   BRA   imm   Branch (Always)
000011F0 xxxxxxxx   0C   BEQ   imm   Branch on Equal (Z=1)
000100F0 xxxxxxxx   10   BGE   imm   Branch on Greater or Equal (T=1)
000101F0 xxxxxxxx   14   BMI   imm   Branch on Minus (N=1)
000110F0 xxxxxxxx   18   BVS   imm   Branch on Overflow (V=1)
001010F0 xxxxxxxx   28   BSR   imm   Branch Subroutine
001011F0 xxxxxxxx   2C   BSREQ   imm   Branch Subroutine on Equal (Z=1)
001100F0 xxxxxxxx   30   BSRGE   imm   Branch Subroutine on Greater or Equal (T=1)
001101F0 xxxxxxxx   34   BSRMI   imm   Branch Subroutine on Minus (N=1)
001110F0 xxxxxxxx   38   BSRVS   imm   Branch Subroutine on Overflow (V=1)

I would think that these are probably correct as the test program ran in parallel with the hardware. So $09xx, $0Bxx, $0Dxx, etc.. I have as nops.
How can we be certain that $04xx is a nop? It could be a mirror of $1cxx ;)

There are gaps everywhere that all seem to be nops, so many nops.
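
The branch table above can be turned into a small decoder sketch. It follows the table literally: the top six bits select the operation, bit 9 is the F bit (returned raw, since its meaning is not spelled out in this post), and an opcode with bit 8 set is treated as a nop as described. The names are hypothetical:

```python
BRANCH_OPS = {
    0b000010: "BRA",   0b000011: "BEQ",   0b000100: "BGE",
    0b000101: "BMI",   0b000110: "BVS",
    0b001010: "BSR",   0b001011: "BSREQ", 0b001100: "BSRGE",
    0b001101: "BSRMI", 0b001110: "BSRVS",
}


def decode_branch(opcode):
    """Decode a 16-bit opcode; returns None if it is not in the branch table."""
    name = BRANCH_OPS.get((opcode >> 10) & 0x3F)
    if name is None:
        return None
    if opcode & 0x0100:
        # Bit 8 must be 0; $09xx, $0Bxx, $0Dxx etc. behave as nops.
        return ("NOP",)
    return (name, (opcode >> 9) & 1, opcode & 0xFF)
```

For example `decode_branch(0x0A34)` gives `("BRA", 1, 0x34)` and `decode_branch(0x0934)` gives `("NOP",)`.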

byuu wrote:

I hope it doesn't break anything

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by nocash on 2018-09-07 (#225338)

I am slowly working through the new findings. Phew, there are quite a lot of details that need to be changed... and I'll need to change most things four times: for assembler + disassembler + emulator + specifications : (

Overload wrote:

Mnemonics are based on Hitachi namings. "Greater Than Flag" is a carry Flag.

The Greater naming is really weird: first off, it should be called GreaterOrEqual, and second, Less/Greater does conventionally imply signed comparisons (as opposed to Above/Below or Higher/Lower for unsigned comparisons) (but as far as I understand the CX4 "Greater" is meant to be unsigned). So, please rename it to Carry (or add some caution saying that Greater doesn't actually mean Greater).
Oh, and, just to be sure: Hitachi didn't release any actual CX4 opcode specs, or did they?

I am almost 100% sure that opcode 4000h does just do "inc ext_ptr". Ikari said so, and I have guessed/used it that way since no$sns v1.0, too. And the CX4 disassembly doesn't make too much sense otherwise (it's here and there using that opcode just for incrementing the ext_ptr, without actually doing any memory access in those places).
I am quite sure that you can test opcode 4000h with your hardware setup and won't need a clock signal for it.
The actual memory access should consist of opcodes 612Eh+1C00h, that might actually hang on your hardware (assuming that they need a clock source for the waitstate counter).
Ikari seems to have encountered crashes when using 612Eh without trailing 1C00h. Though maybe one could replace 1C00h by four NOPs (equivalent to the usual 4 waitstates), or maybe the hardware still screws up somehow despite the NOPs.

Overload wrote:

Is it confirmed, is there definite proof that it is a wait instruction and not simply a nop or double nop?

Yes, I would say so. Ikari seems to have tested the timings with different waitstate settings (affecting the cycles for 1C00h), and also with/without 4000h (taking one cycle less on 1C00h if opcode 4000h was already taking up one cycle). For details, search for "1C00" and "4000" in ikari's specs.

---

The older specs (in the cx4opcodes.txt file posted above) did include "reg=00h" for using "A" as operand.
The newer specs (cx4.html file) leave reg=00h undefined.
Which one is correct?

If "A" can be used then one could do stuff like "ADD A,A*2,A" (aka multiply by 3) or "CMP A,A" (aka clear N,Z flags).
If "A" cannot be used then CPU emulation would be much easier/faster (as I have "A" stored in a 80x86 register).

Alongside that, I've noticed that the older specs did permit accessing internal rom/ram via [reg], and that existing code is actually using "[A]" in that place - but the newer specs say that those opcodes can use only [A] (ie. not [reg]).
I guess that might have been the reason for originally believing that "reg=00h" would mean "A".

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2018-09-07 (#225341)

Quote:

I'm pretty sure some of that information was already in the "CX4 Program ROM" thread on your forums.

That's indeed quite possible. It was a hectic time. We tore through all of the DSPs in short order.

Quote:

The immediate value is zero extended to 24 bits and the whole 24bits inverted.

Ah, good, so it should follow traditional XNOR equality, then: ~(A^B) == ~A^B == A^~B

Quote:

I hope that makes sense?

Mostly, yes. I'll trust your expertise and add the limitation, then.

Quote:

It's been so long since I worked on this.

Indeed ... the time goes by so fast anymore.

Nonetheless, I greatly appreciate you jumping back into this and rehashing this for nocash and me.
I regret it took me so long to get back to the Cx4, but your help is truly invaluable here, thank you.

Quote:

I hope it doesn't break anything

Genius me, I decided to rewrite the entire CPU core to not be a mess of else if((addr&mask)==pattern) blocks. So I'm sure I'll end up breaking all kinds of things :3

Quote:

Phew, there are quite a lot of details that need to be changed... and I'll need to change most things four times: for assembler + disassembler + emulator + specifications : (

Right? >_<

Code:

01100001 .xxxxxxx   61   MOV   MBR, reg**   Data Transfer Instruction   MBR = reg**
01100010 ....xxxx   62   MOV   MAR, Rx   Data Transfer Instruction   MAR = Rx

So bizarre that both aren't just ....xxxx Rx, or both 011.xxxx reg**
What exactly happens on MOV MBR,reg[00-5f]? Does it get loaded with zero?

Code:

11100000 .xxxxxxx   E0   MOV   reg, A   Data Transfer Instruction   reg = A
11100001 .xxxxxxx   E1   MOV   reg**, MBR   Data Transfer Instruction   reg** = MBR

And then no E2,E3 ... this is truly a bizarre architecture.

...

I wonder what happens if we read past the end of data RAM ... whether it mirrors (000-3ff,400-7ff,800-bff,800-bff) or just returns zeroes.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by nocash on 2018-09-07 (#225342)

byuu wrote:

SNES DMA doesn't work like that. The two basic DMA schemes are this:

Code:

 Transfer with src+dst on different address busses:
  1st cycle: read source[0], write dest[0]   ;aka [dst+0]=[src+0]
  2nd cycle: read source[1], write dest[1]   ;aka [dst+1]=[src+1]
 Transfer with src+dst on same address busses:
  1st cycle: read source[0], write temp      ;aka temp=[src+0]
  2nd cycle: read temp, write dest[0]        ;aka [dst+0]=temp
  3rd cycle: read source[1], write temp      ;aka temp=[src+1]
  4th cycle: read temp, write dest[1]        ;aka [dst+1]=temp

The first case is faster, and SNES DMA works like that. And I would assume that CX4 DMA to other address bus does also work as so (so the DMA would probably take only 1+WS cycles per byte) (though ikari wasn't perfectly clear about DMA timings, it might be also 0+WS, or 2+WS or whatever).
The second case is slower, and it's used for CX4 DMA on same bus (same bus means CartROM to CartRAM) (as opposed to internal CX4RAM which probably doesn't use the Cart bus). I guess transfer time would be 1+WS1+1+WS2 per byte (again, ikari wasn't too clear, it might be also 0+WS1+0+WS2 or whatever).
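
The two per-byte estimates can be written down as a quick calculator. These formulas are the guesses stated above (1+WS for a transfer between different buses, 1+WS1+1+WS2 on the same bus), not measured values, and the function names are made up:

```python
def dma_cycles_other_bus(length, ws):
    """Cart bus <-> internal Cx4 RAM: guessed at 1 + WS cycles per byte."""
    return length * (1 + ws)


def dma_cycles_same_bus(length, ws1, ws2):
    """CartROM -> CartRAM via a temp latch: guessed at 1 + WS1 + 1 + WS2 per byte."""
    return length * (1 + ws1 + 1 + ws2)
```

For a 16-byte transfer with WS=4 that would be 80 cycles to internal RAM, or 128 cycles ROM to RAM with WS1=4 and WS2=2; swapping in the 0+WS or 2+WS variants only changes the constants.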

byuu wrote:

What exactly is the purpose of $7f49-7f4b? You can address every possible location with $7f4d-7f4f already. Does $7f49-7f4b get taken into account with program RAM page tagging?

Main purpose is probably just using it as linker base address (so the CX4 code can be linked more easily with SNES code; it affects only opcode fetches, not data fetches, i.e. it doesn't work out for cases where the CX4 code reads data from 24bit CartROM addresses; hence the changed bank numbers in CX4 code in MegamanX2 vs MegamanX3).
Aside from linking, the base address might also help on fitting 16bit program bank numbers into 8bit immediates (even if the CX4 code is located at higher memory addresses).

But what do you mean by Program RAM? I don't know of a way to execute CX4 in CartRAM, nor in internal CX4RAM, do you?
If there's a way to do such a thing then it would likewise require changing some special flag. Just changing the "ROM address" into a "RAM address" probably won't do it (since CX4 also needs different opcodes when reading data from CartROM vs CartRAM).
Hmmmm, or well, ikari seems to be saying that DMA works for CartROM+CartRAM (as far as I can see without needing to change any flags for CartROM vs CartRAM vs InternalCX4RAM). And also says that $7f48 could cache CartROM and CartRAM (though the part about "CPU misc. (caching)" does mention ROM only).
Don't know if there's a way to execute code in CartRAM (but existing retail carts don't have any CartRAM installed anyways).

byuu wrote:

What happens if I write register $20? Can I actually simulate a (short) PC jump that way?

Might be so. If it works then it would probably end up with a 1-2 cycle branch delay (like branch delays on MIPS processors).
Not that it would be too useful (or good practice) to do such things.

byuu wrote:

With $2e and $2f, what happens if I change the bus address during the fetch period?

The whole idea about the "inc ext_ptr" opcode is that you do change the bus address during the fetch period. You could probably also issue more than one "inc ext_ptr" during fetch.
There might be some restrictions about using other opcodes than "inc ext_ptr" during the fetch period (for example, maybe things could screw up when accessing r0-r15 during fetch).

byuu wrote:

Just to be clear, are the ROM/RAM waitstates the exact # of cycles required, or the # of ADDITIONAL cycles required, above 1? Eg WS1=3, is that "the data's ready in 3 cycles", or "the data's ready in 4 cycles"?

Concerning opcodes it's "1+WS" cycles, ikari is listing a bunch of test cases (at the end of his doc), and also mentions 250ns for WS=4.
Concerning DMA it's probably also 1+WS (or 1+WS1+1+WS2 for same bus dma).
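
As a sanity check on the "1+WS" reading, the 250ns figure for WS=4 works out if the Cx4 runs at 20 MHz (50ns per cycle). The clock value is an assumption here, not something stated in this post, and the names are hypothetical:

```python
CX4_CLOCK_HZ = 20_000_000  # assumption: 20 MHz Cx4 clock, i.e. 50 ns per cycle


def access_cycles(ws):
    """Cycles until bus data is ready for an opcode access: 1 + WS."""
    return 1 + ws


def access_time_ns(ws, clock_hz=CX4_CLOCK_HZ):
    """Same access expressed in nanoseconds."""
    return access_cycles(ws) * 1_000_000_000 / clock_hz
```

With WS=4 this gives 5 cycles, i.e. 250 ns, matching the figure quoted from ikari's doc; the "exact # of cycles" reading would give 200 ns instead.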

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2018-09-07 (#225344)

Quote:

The first case is faster, and SNES DMA works like that.

It can't work like that. To write a value, the data bus has to be valid for the entire duration of the cycle. And to read a value, the data bus won't be populated for some time, potentially up to the entire length of the cycle, but usually, in DMA's case, halfway through the cycle (4 of 8 clocks.)

SNES DMA + HDMA very very conveniently have an extra cycle (8 clocks) that suddenly makes sense if you consider the second, staggered approach. Otherwise under your model, there's an extra 8 clocks of setup time that doesn't really seem necessary.

I'm not too sure how we could prove this either way. It's possible to read $2137 to latch the counters during a DMA B bus read, but not to latch the counters from a DMA B bus write, so we can't simply use that to determine when the relevant reads and write occur within the DMA itself.

I guess we'd need a logic analyzer to prove whose theory is correct here.

Quote:

Aside from linking, the base address might also help on fitting 16bit program bank numbers into 8bit immediates (even if the CX4 code is located at higher memory addresses).

Oooh, very clever! This sounds the most plausible reason to me. I hadn't thought of that.

Quote:

But what do you mean by Program RAM?

Oh, that's just what I'm calling the 2x256x16-bit instruction cache.

Quote:

The whole idea about the "inc ext_ptr" opcode is that you do change the bus address during the fetch period.

Yeah, so we pretty much *need* to cache the current bus address while we wait for the value to be populated.

............

Code:

01100001 .xxxxxxx   61   MOV   MBR, reg**   Data Transfer Instruction   MBR = reg**
** Registers 60-7F only.

Implementing this breaks sprites in Rockman X2's opening sequence, the 2 on the X2 title screen, and I stopped looking after that.

Allowing it to access 00-7f makes the game work again.

The instruction being executed is $612e, which is of course the first of the three-opcode sequence for reading from the bus.

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by byuu on 2018-09-08 (#225363)

For whatever this is worth ...

Code:

0000 00.. .... ....  NOP
0000 01.. .... ....  ???
0000 10f. dddd dddd  JMP imm
0000 11f. dddd dddd  JMP EQ,imm
0001 00f. dddd dddd  JMP GE,imm
0001 01f. dddd dddd  JMP MI,imm
0001 10f. dddd dddd  JMP VS,imm
0001 11.. .... ....  WAIT
0010 00.. .... ....  ???
0010 0100 .... ...t  SKIP V
0010 0101 .... ...t  SKIP C
0010 0110 .... ...t  SKIP Z
0010 0111 .... ...t  SKIP N
0010 10f. dddd dddd  JSR
0010 11f. dddd dddd  JSR EQ,imm
0011 00f. dddd dddd  JSR GE,imm
0011 01f. dddd dddd  JSR MI,imm
0011 10f. dddd dddd  JSR VS,imm
0011 11.. .... ....  RTS
0100 00.. .... ....  INC MAR
0100 01.. .... ....  ???
0100 10ss .rrr rrrr  CMPR A<<s,reg
0100 11ss iiii iiii  CMPR A<<s,imm
0101 00ss .rrr rrrr  CMP A<<s,reg
0101 01ss iiii iiii  CMP A<<s,imm
0101 1000 .... ....  ???
0101 1001 .... ....  SXB
0101 1010 .... ....  SXW
0101 1011 .... ....  ???
0101 11.. .... ....  ???
0110 0000 .rrr rrrr  LD A,reg
0110 0001 .rrr rrrr  LD MDR,reg
0110 0010 .rrr rrrr  LD MAR,reg
0110 0011 .... rrrr  LD P,Rn
0110 0100 iiii iiii  LD A,imm
0110 0101 iiii iiii  LD MDR,imm
0110 0110 iiii iiii  LD MAR,imm
0110 0111 iiii iiii  LD P,imm
0110 1000 .... ....  RDRAM 0,A
0110 1001 .... ....  RDRAM 1,A
0110 1010 .... ....  RDRAM 2,A
0110 1011 .... ....  ???
0110 1100 iiii iiii  RDRAM 0,imm
0110 1101 iiii iiii  RDRAM 1,imm
0110 1110 iiii iiii  RDRAM 2,imm
0110 1111 .... ....  ???
0111 00.. .... ....  RDROM A
0111 01ii iiii iiii  RDROM imm
0111 10.. .... ....  ???
0111 1100 iiii iiii  LD PL,imm
0111 1101 .iii iiii  LD PH,imm
0111 111. .... ....  ???
1000 00ss .rrr rrrr  ADD A<<s,reg
1000 01ss iiii iiii  ADD A<<s,imm
1000 10ss .rrr rrrr  SUBR A<<s,reg
1000 11ss iiii iiii  SUBR A<<s,imm
1001 00ss .rrr rrrr  SUB A<<s,reg
1001 01ss iiii iiii  SUB A<<s,imm
1001 10.. .rrr rrrr  MUL reg
1001 11.. iiii iiii  MUL imm
1010 00ss .rrr rrrr  XNOR A<<s,reg
1010 01ss iiii iiii  XNOR A<<s,imm
1010 10ss .rrr rrrr  XOR A<<s,reg
1010 11ss iiii iiii  XOR A<<s,imm
1011 00ss .rrr rrrr  AND A<<s,reg
1011 01ss iiii iiii  AND A<<s,imm
1011 10ss .rrr rrrr  OR A<<s,reg
1011 11ss iiii iiii  OR A<<s,imm
1100 00.. .rrr rrrr  SHR A,reg
1100 01.. ...i iiii  SHR A,imm
1100 10.. .rrr rrrr  ASR A,reg
1100 11.. ...i iiii  ASR A,imm
1101 00.. .rrr rrrr  ROR A,reg
1101 01.. ...i iiii  ROR A,imm
1101 10.. .rrr rrrr  SHL A,reg
1101 11.. ...i iiii  SHL A,imm
1110 0000 .rrr rrrr  ST reg,A
1110 0001 .rrr rrrr  ST reg,MDR
1110 001. .... ....  ???
1110 01.. .... ....  ???
1110 1000 .... ....  WRRAM 0,A
1110 1001 .... ....  WRRAM 1,A
1110 1010 .... ....  WRRAM 2,A
1110 1011 .... ....  ???
1110 1100 iiii iiii  WRRAM 0,imm
1110 1101 iiii iiii  WRRAM 1,imm
1110 1110 iiii iiii  WRRAM 2,imm
1110 1111 .... ....  ???
1111 00.. .... rrrr  SWAP A,Rn
1111 01.. .... ....  ???
1111 10.. .... ....  CLEAR
1111 11.. .... ....  HALT
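
One way to use bit patterns in this notation in an emulator or disassembler is to compile each one into a mask/value pair plus named fields. The Python sketch below is only an illustration: the function names are invented, patterns are given as full 16-bit strings, and the one pattern used is the LD MDR,reg encoding (01100001 .xxxxxxx, i.e. $61) already quoted earlier in the thread.

```python
def compile_pattern(pattern):
    """Turn a 16-character pattern into (mask, value, fields).

    '0'/'1' are fixed bits, '.' is don't-care, any other letter names a field.
    """
    bits = pattern.replace(" ", "")
    if len(bits) != 16:
        raise ValueError("pattern must describe exactly 16 bits")
    mask = value = 0
    fields = {}
    for i, ch in enumerate(bits):
        bit = 15 - i                      # leftmost character is bit 15
        if ch in "01":
            mask |= 1 << bit
            value |= int(ch) << bit
        elif ch != ".":
            fields.setdefault(ch, []).append(bit)
    return mask, value, fields


def match(pattern, word):
    """Return the decoded fields if word matches pattern, else None."""
    mask, value, fields = compile_pattern(pattern)
    if word & mask != value:
        return None
    decoded = {}
    for name, positions in fields.items():
        field = 0
        for bit in positions:             # positions are stored MSB first
            field = (field << 1) | ((word >> bit) & 1)
        decoded[name] = field
    return decoded


LD_MDR_REG = "0110 0001 .rrr rrrr"
result = match(LD_MDR_REG, 0x612E)
```

A full decoder would simply try each table row in order and take the first match.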

To analyze the unknown instructions ...

Code:

0000 01.. .... ....  probably a valid instruction; but possibly NOP is 0000 0... .... ....
0010 00.. .... ....  very likely to be a valid instruction
0100 01.. .... ....  perhaps INC MDR, INC P, or INC DPR?
0101 1000 .... ....  would be sign-extend 0-bits ... may set A to zero?
0101 1011 .... ....  would be sign-extend 24-bits (no change) ... may set N/Z flags?
0101 11.. .... ....  very likely to be a valid instruction
0110 1011 .... ....  would be RDRAM 3,A ... most likely a no-op
0110 1111 .... ....  would be RDRAM 3,imm ... most likely a no-op
0111 10.. .... ....  could be a RDROM variant?
0111 111. .... ....  could be wasted on LD P[16-23],[24-31] which doesn't exist ...
1110 001. .... ....  likely to be ST reg,MAR and ST reg,P
1110 01.. .... ....  very likely to be a valid instruction
1110 1011 .... ....  would be WRRAM 3,A ... most likely a no-op
1110 1111 .... ....  would be WRRAM 3,imm ... most likely a no-op
1111 01.. .... ....  almost guaranteed to be a valid instruction; sitting between SWAP and CLEAR

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by ikari_01 on 2018-09-09 (#225479)

Wow, some interesting questions here. I'm going to answer part of them for now; I can't tackle them all at once.

First of all, WS1+WS2 should probably be referred to as just "waitstates" or "extra clocks per cycle". The two games set these to 4, and as a result the Cx4 accesses ROM at 4MHz. (20MHz / 5).
The power-on default is 3 for both, thus it would access ROM and RAM at 5MHz. I can take logic analyzer traces of all that if required (which is what I did in the first place to determine waitstates, measure instruction times, etc... but I did not save the traces so I'd need to record them again.)
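
The arithmetic behind those two figures, as a tiny Python sketch. The relation "one base clock plus the waitstate count per access" is inferred from the numbers above (20MHz / 5 for a setting of 4), and the names are made up for illustration.

```python
CX4_CLOCK_HZ = 20_000_000  # 20MHz, as stated above


def access_rate_hz(waitstates):
    """Bus access rate, assuming each access takes 1 + waitstates clocks."""
    return CX4_CLOCK_HZ / (1 + waitstates)


game_setting = access_rate_hz(4)      # value the two games program
power_on_default = access_rate_hz(3)  # power-on default
```

With these assumptions a setting of 4 gives 4MHz and the default of 3 gives 5MHz, matching the figures above.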

The DMA transfers work as described by nocash (simultaneous for independent buses, sequential for same bus).
DMA would usually take place between cartridge ROM and internal RAM, or internal RAM and cartridge RAM.

A small digression that might deserve a thread of its own:
SNES DMA actually also works the way nocash described.

Source and destination addresses are put on their respective buses simultaneously and RD+PAWR or PARD+WR are asserted depending on DMA direction.

For a memory write cycle, data usually only has to be valid a short period of time before and during the rising edge (i.e. end) of the write strobe (like 30ns or so). Thus the source has plenty of time during the active period of the cycle to put its data on the bus before it is required at the destination.
There seems to be an extra cycle before every DMA block (usually with a bogus address on the buses, and with no read or write control signals asserted) but I don't know what that's for.

So far I cannot confirm an extra cycle of overhead per HDMA channel; usually I'm just seeing one extra cycle before all HDMA channels fire in a row. Of course it would need to pause to fetch a new pointer from an indirect table, or a new count value, but as long as that isn't happening there don't seem to be any extra cycles.
Then there's the seemingly arbitrary prolongation of the previous CPU cycle, which actually just pads out to the next multiple of 8 master clocks since reset. I overlaid the RGB signal with PAWR so the B bus writes actually became visible, and HDMA is always nicely lined up in a vertical row, surrounded by regular CPU cycles jittering about around it. It recently turned out that it's important to keep the phase relationship between DMA access and dot clock, but that's a different topic... (maybe somebody has seen the Chrono Trigger frog glitch in the demo roll, or flickering pixels around sprites in Star Ocean or Kirby Super Star... https://github.com/RedGuyyyy/sd2snes/issues/6)
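
The padding rule can be written down in a few lines of Python. This is just a sketch of "pad out to the next multiple of 8 master clocks since reset"; the function name is made up, and it says nothing about the extra cycle before the DMA block.

```python
def dma_align_padding(master_clocks_since_reset, period=8):
    """Clocks added to the previous CPU cycle so DMA starts on a multiple of period."""
    return -master_clocks_since_reset % period


padding = dma_align_padding(13)   # 13 clocks since reset: pad up to 16
```

A counter that already sits on a multiple of 8 needs no padding, which is why HDMA lines up in a neat vertical row regardless of how the surrounding CPU cycles jitter.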

Anyway, back to topic:

byuu wrote:

Is it confirmed in LoROM mode that the Cx4 bus can't see ROM at $c0-ff:0000-ffff?

Just re-checked, the following regions are open bus:
C0:0000-FF:FFFF
40:0000-6F:FFFF
70:8000-77:FFFF (70:0000-77:7FFF is cartridge RAM)
78:0000-7D:FFFF
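
For emulator authors, the list above can be expressed as a simple lookup. The Python sketch below treats each range as a rectangle of banks and in-bank offsets, which is my reading of the notation (in particular for banks 70-77, where only the upper half is open bus); the names are hypothetical.

```python
# (first bank, last bank, first offset, last offset), LoROM mode
OPEN_BUS_REGIONS = [
    (0xC0, 0xFF, 0x0000, 0xFFFF),
    (0x40, 0x6F, 0x0000, 0xFFFF),
    (0x70, 0x77, 0x8000, 0xFFFF),  # lower half of these banks is cartridge RAM
    (0x78, 0x7D, 0x0000, 0xFFFF),
]


def is_open_bus(bank, offset):
    """Return True if bank:offset falls in one of the open-bus regions listed above."""
    return any(
        lo_bank <= bank <= hi_bank and lo_off <= offset <= hi_off
        for lo_bank, hi_bank, lo_off, hi_off in OPEN_BUS_REGIONS
    )
```

For example, `is_open_bus(0x70, 0x8000)` is true while `is_open_bus(0x70, 0x0000)` is false, since the latter address is cartridge RAM.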

byuu wrote:

Does writing to $7f53 get the Cx4 out of a lockup from a bad DMA transfer as well?

A bad DMA transfer just seems to keep the CPU stalled in memory access ($7f53 bits 7 and 6 are set). Writing to $7f53 clears those bits and you can talk to the Cx4 again.

byuu wrote:

What happens if I start a DMA or cache page operation while the Cx4 is active? Will it take priority over the Cx4 instruction processing, stay pending until the Cx4 instructions are halted, or just do nothing?

It simply ignores DMA or cache page triggers as long as $7f53.6 is set. It does not execute them afterwards. It does accept all the register values though, so you could prepare everything, then just trigger DMA or cache page as soon as the Cx4 becomes idle.
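
Put together with the previous answer, the trigger behavior could be modeled roughly like this in Python. It is a sketch under the assumptions stated in the comments: class, method, and parameter names are invented, and setting the status bits while an operation runs is left out.

```python
class Cx4TriggerModel:
    """Toy model of DMA / cache-page triggers versus the $7f53 status bits."""

    BIT6 = 0x40
    BIT7 = 0x80

    def __init__(self):
        self.status = 0     # models $7f53
        self.params = {}    # DMA / cache parameter registers (hypothetical names)
        self.started = []   # operations that were actually started

    def write_param(self, name, value):
        # Register values are accepted even while the chip is busy.
        self.params[name] = value

    def trigger(self, operation):
        # Triggers are dropped, not queued, while $7f53.6 is set.
        if self.status & self.BIT6:
            return False
        self.started.append((operation, dict(self.params)))
        return True

    def write_status(self):
        # Any write to $7f53 clears bits 7 and 6 (recovers from a stuck DMA).
        self.status &= ~(self.BIT6 | self.BIT7)


cx4 = Cx4TriggerModel()
cx4.status = 0xC0                      # stalled in memory access
cx4.write_param("length", 0x100)       # still accepted
ignored = cx4.trigger("dma")           # dropped while bit 6 is set
cx4.write_status()                     # clear bits 7 and 6
accepted = cx4.trigger("dma")          # now it starts, using the prepared values
```

The point of the model is the ordering: prepare the registers at any time, but only pull the trigger once bit 6 has cleared.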

Gotta catch some sleep now

Re: Some tidbits about the Cx4 (attn: byuu, nocash)
by nocash on 2018-09-12 (#225698)

I am still wondering if one can use operand=00h to access "Register A" or not. The question (and answer) affects many opcodes, almost everything that doesn't use immediates as parameters. At the moment, my emulation allows operand=00h for A; if that's wrong I could change it, but it would be nice if somebody could confirm whether it's really wrong.

And, I am wondering about CartROM vs CartRAM... especially about three different cases:

For DMA, one simply specifies different src/dst addresses to select between CartROM/CartRAM (e.g. without needing to set a SourceType=CartRAM flag or so), right?

For Program Code/cache loading, does that work, too? That would allow executing code that comes from CartRAM. It would be interesting to know, and if it works, it might be a bit easier to test custom code on the CX4 (provided one has a way to run SNES code in WRAM, which could then relocate CX4 code to CartRAM); installing CartRAM would probably be easier than installing (writeable) CartROM.

For CX4 data reads from CartROM/CartRAM - if DMA (and maybe also cache loading) works just by using different addresses - then I am wondering why one needs to use different opcodes (612Eh/612Fh) for reading CartROM/CartRAM. Or could it be possible to use either opcode for either CartROM/CartRAM? If so, maybe the opcodes differ only by using different waitstates, WS1/WS2?