Questions re Crystalis and Mesen's CDLs, disassembly, etc.

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#234285)
Hi all, apologies if these have already been answered. I've searched, read through what I think are the relevant sections of Mesen's manual, and played with the menus, but haven't found answers yet.

I'm taking a stab at reverse engineering Crystalis. My goal is to get some kind of rebuildable disassembly, although I know this ROM is a bear. My strategy so far is to use Mesen's debugger to help with code identification and then attempt a literal disassembly to get a rebuildable code base that I can work from going forward. I've finished a fairly thorough playthrough, trying to cover as many edge cases as possible, so I think the CDL is fairly complete in terms of regular gameplay. Now I have questions about how to interpret Mesen's CDL files and its disassembly output.

* The main frustration I'm having is completing the CDL. I tried scanning it for unidentified data and overwriting those parts of the ROM with zeroes. This does affect the game, so why was this data "unknown" in the CDL when it's clearly doing something? I get that some of it wouldn't be encountered during a regular playthrough, but I think there's something else going on.
* As far as disassembling the ROM goes, Mesen adds labels, which is great, but I also want a literal, rebuildable disassembly. I read Mesen's manual and looked through all the menus, but I can't figure out a way to turn off the auto labels and get the output I want. Is there a way to do this, or will I need to address it programmatically? (I've also tried a few disassemblers that are supposed to work with FCEUX's CDL files, but no luck, although I know Mesen uses a slightly different CDL format.)
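As a sketch of the scan described above, this lists runs of CDL bytes that are zero, i.e. bytes the logger never saw executed or read. It assumes an FCEUX-style layout where the CDL's first bytes map one-to-one onto PRG ROM (so CDL offset N is PRG ROM byte N, which sits at file offset 16 + N in an iNES file without a trainer). `unknown_ranges` and `min_len` are hypothetical names. Keep in mind that a zero flag only means "not seen while logging", so unknown bytes can still be live data reached by code paths that were never triggered.

```python
def unknown_ranges(cdl: bytes, min_len: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, of runs of CDL bytes equal to 0.

    A zero flag byte means the byte was never logged as code or data.
    Runs shorter than min_len are skipped.
    """
    ranges = []
    start = None  # start offset of the current run of zero flags, if any
    for offset, flags in enumerate(cdl):
        if flags == 0:
            if start is None:
                start = offset
        elif start is not None:
            if offset - start >= min_len:
                ranges.append((start, offset))
            start = None
    # A run that reaches the end of the file is still open here.
    if start is not None and len(cdl) - start >= min_len:
        ranges.append((start, len(cdl)))
    return ranges
```

Passing a larger `min_len` is a simple way to separate long unlogged blocks from isolated single bytes inside otherwise logged regions.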

Thank you!
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234298)
What version of Mesen are you using? (Help -> About) It matters greatly here.

Sour can explain the changes to the CDL format he used. (I asked about some of these in a PM with him and he was happy to explain them.)

Getting a usable disassembly (for an entire game, vs. small snippets) isn't something Mesen has right now. You'll need something like disasm6, which can use CDL files (but I'm not sure if it can use Mesen's CDLs, as they aren't 100% identical to FCEUX). Except disasm6 doesn't have MMC3 support, so you'll likely have to manually disassemble PRG banks one by one, or make hand-made CDL files for each of them, or just keep a gigantic list of addresses you care about along with each bank. You're welcome.
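A minimal sketch of the "split it up by bank" approach, assuming an iNES 1.0 header (NES 2.0 size extensions are ignored) and a CDL whose first PRG-ROM-size bytes are the PRG flags, as in FCEUX's layout. Both files are cut into 8 KiB pieces because MMC3 switches PRG ROM in 8 KiB banks. The function names and the `bank_NN` output filenames are made up for this example.

```python
from pathlib import Path

PRG_UNIT = 16 * 1024      # iNES header byte 4 counts PRG ROM in 16 KiB units
MMC3_PRG_BANK = 8 * 1024  # MMC3 maps PRG ROM in 8 KiB banks


def prg_rom(ines: bytes) -> bytes:
    """Extract PRG ROM from an iNES 1.0 image."""
    if ines[:4] != b"NES\x1a":
        raise ValueError("missing iNES magic")
    start = 16 + (512 if ines[6] & 0x04 else 0)  # header, plus optional 512-byte trainer
    size = ines[4] * PRG_UNIT
    prg = ines[start:start + size]
    if len(prg) != size:
        raise ValueError("file is shorter than the header claims")
    return prg


def split_banks(data: bytes, bank_size: int = MMC3_PRG_BANK) -> list[bytes]:
    """Cut data into equal-sized banks."""
    if len(data) % bank_size:
        raise ValueError("data is not a whole number of banks")
    return [data[i:i + bank_size] for i in range(0, len(data), bank_size)]


def write_banks(rom_path: str, cdl_path: str, out_dir: str) -> int:
    """Write bank_NN.bin / bank_NN.cdl pairs and return the bank count.

    Assumes the CDL starts with one flag byte per PRG ROM byte (FCEUX layout).
    """
    prg = prg_rom(Path(rom_path).read_bytes())
    cdl = Path(cdl_path).read_bytes()[:len(prg)]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    banks = split_banks(prg)
    for n, (code, flags) in enumerate(zip(banks, split_banks(cdl))):
        (out / f"bank_{n:02d}.bin").write_bytes(code)
        (out / f"bank_{n:02d}.cdl").write_bytes(flags)
    return len(banks)
```

Each pair can then be disassembled on its own; you still have to decide which CPU window ($8000, $A000, $C000 or $E000) each bank is meant to be read at.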

clever-disasm might be another alternative, except... well... I'll keep my opinions to myself (hint: good luck with that).

I think you'll find that "the disassembler you want" for this task doesn't really exist. In fact, as the author of a disassembler in the mid-to-late 90s, I would say the situation today is really not all that great for something "easy". You really do have to put in the long hours splitting stuff up. It's a *lot* of work, on top of the work you have to do just to understand the code itself.

P.S. -- You might try asking the Stardust Crusaders folks or on the romhacking.net forum to see if someone else has already done the initial work/pain for you, to get something reassemble-able.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234299)
koitsu wrote:
clever-disasm might be another alternative, except... well... I'll keep my opinions to myself (hint: good luck with that).
From prior experience, it's really not up for MMC3 disassemblies. Regardless, it requires a lot of guidance; I just find that it requires a lot less than generating a sufficiently complete CDL. (Specifically, I mean that I've generated disassemblies using an FCEUX CDL file and disasm6, and I've generated disassemblies using clever-disasm, and I think it took less effort to get something comparable with clever-disasm. It's just that that effort is "repeatedly run it and edit the descriptor file until the ambiguities are cleared up" instead of "play through the game until sufficiently close to all the code in the ROM is marked".)

Either way, 90%+ of the work is still converting all the automatic labels and automatic names into human-comprehensible ones.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234304)
Quote:
Except disasm6 doesn't have MMC3 support, so you'll likely have to manually disassemble PRG banks one by one, or make hand-made CDL files for each of them, or just keep a gigantic list of addresses you care about along with each bank. You're welcome.


That fixed it, and it works with the split-up CDL, too. It didn't occur to me that this was the problem, since without the CDL it would output a disassembly, just with code and data mixed. Thanks :facepalm:

FWIW I'm using Mesen 0.9.7. The CDLs are the same as FCEUX's except that Mesen sets bit #7 to mark entry points.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234305)
taters wrote:
FWIW I'm using Mesen 0.9.7. The CDLs are the same as FCEUX's except that Mesen sets bit #7 to mark entry points.
As a heads-up, the current dev builds also change the meaning of the $10 bit to mark bytes that are the destination of a jump/branch instruction. AFAIK, that PHP disassembler only uses the $01/$02 bits in the CDL file.
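For reference, a small decoder for one PRG CDL byte. The bit layout follows FCEUX's documentation (code, data, two bits for the $8000/$A000/$C000/$E000 window, indirect code, indirect data, PCM); the comments on $80 and $10 reflect what is said in this thread about Mesen, and whether Mesen fills in the window bits the same way FCEUX does is an assumption. `strip_to_code_data` masks a file down to the $01/$02 bits, which is all disasm6 reportedly reads. All function names here are hypothetical.

```python
CODE = 0x01
DATA = 0x02
WINDOW_MASK = 0x0C    # CPU window the byte was seen in: $8000/$A000/$C000/$E000
INDIRECT_CODE = 0x10  # FCEUX: accessed indirectly as code; newer Mesen dev builds: jump/branch target (per this thread)
INDIRECT_DATA = 0x20
PCM_DATA = 0x40
ENTRY_POINT = 0x80    # unused by FCEUX; Mesen 0.9.7 marks entry points here (per this thread)


def describe(flags: int) -> dict:
    """Decode one PRG CDL byte into named fields (window is only meaningful if code or data is set)."""
    return {
        "code": bool(flags & CODE),
        "data": bool(flags & DATA),
        "window": 0x8000 + ((flags & WINDOW_MASK) >> 2) * 0x2000,
        "bit4": bool(flags & INDIRECT_CODE),
        "indirect_data": bool(flags & INDIRECT_DATA),
        "pcm": bool(flags & PCM_DATA),
        "entry_point": bool(flags & ENTRY_POINT),
    }


def cpu_address(prg_offset: int, flags: int) -> int:
    """CPU address a logged byte was seen at, assuming 8 KiB PRG banking (as on MMC3)."""
    return describe(flags)["window"] + (prg_offset % 0x2000)


def strip_to_code_data(cdl: bytes) -> bytes:
    """Keep only the code/data bits so extra Mesen-specific bits cannot confuse other tools."""
    return bytes(b & (CODE | DATA) for b in cdl)
```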

You can remove the default labels by just selecting them and pressing Delete, or remove them permanently by going to File->Workspace->Disable default labels.

It's possible that the unknown data is caused by a bug (although I'm not too sure how it could happen). If you have a way to reproduce the problem, I'm happy to take a look.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234307)
Sour wrote:
AFAIK, that PHP disassembler only uses the $01/$02 bits in the CDL file.

As of disasm6 v1.5, I believe that's correct (from looking at the PHP code): it only cares about bits 0 and 1 in the CDL file.

As for the OP encountering issues in Mesen where some bytes are considered unknown despite being accessed/used: I may have seen this. I definitely had some scenarios where a single byte (it was ALWAYS a single byte, too) in a block of data I'd previously marked as code or data was unexpectedly marked "unknown". Going back and re-assigning it to code/data would alleviate the problem. I spent some time trying to figure out how this could happen, but couldn't reproduce it reliably (read: it could be reproduced, but I wasn't able to pin down the exact scenario, which makes reporting it very hard). It was on an older version of Mesen, however, so I'd be better off starting over with the latest Mesen debug build and fresh RE project files and seeing if I could reproduce it there.

If this isn't the problem the OP saw, then I guess there could be two bugs, haha. :-)
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234329)
Sour wrote:

You can remove the default labels by just selecting them and pressing Delete, or remove them permanently by going to File->Workspace->Disable default labels.



Do you mean the list on the right side of the debugger? I saw that in the instructions but it doesn't work for me (nor the option under the Workspace menu).

koitsu wrote:
As for the OP encountering issues in Mesen where some bytes are considered unknown despite being accessed/used: I may have seen this. I definitely had some scenarios where a single byte (it was ALWAYS a single byte, too) in a block of data I'd previously marked as code or data was unexpectedly marked "unknown". Going back and re-assigning it to code/data would alleviate the problem. I spent some time trying to figure out how this could happen, but couldn't reproduce it reliably (read: it could be reproduced, but I wasn't able to pin down the exact scenario, which makes reporting it very hard). It was on an older version of Mesen, however, so I'd be better off starting over with the latest Mesen debug build and fresh RE project files and seeing if I could reproduce it there.

Sounds like it may be the same or a similar issue. I haven't noticed it today, but I'll keep trying to reproduce it.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234366)
Are you talking about the automatic jump labels, e.g. L8040, L88A9, etc.?
If so, you need to disable that option first (in Options->Auto-create jump labels), and then File->Workspace->Reset labels should clear all the labels.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234376)
Yes, and that worked!

Thank you!
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234537)
I have another question about configuring Mesen's disassembly output. I've noticed that after a branch instruction, even if the CDL has logged the subsequent bytes as data, Mesen disassembles them as code, I guess to show what the instructions would be if the branch isn't taken.

For example:

LDA #$00
STA $09
BEQ L3C900
... [some data bytes that get disassembled into nonsense instructions (and worse, throw off the reading frame)]

The affected data is highlighted in green, so at least I can visually see that it's not actually code. But I don't know how to stop Mesen from disassembling it. I've tried a few different settings but no luck so far. Any way to stop the auto disassembly?

P.S. I apologize for asking what are obviously simple questions that may be answered in Mesen's documentation or via search, but I didn't find the answers or they weren't clear to me.
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234539)
There isn't a way to do this: Mesen will always assume that whatever follows a conditional branch is potential code, because that will be the case 99.9% of the time. The green color means that it's been marked as code but hasn't been executed yet.

Changing this would make the debugger somewhat less user-friendly, since the alternative would require all branches to be taken before they're actually marked as code (which means they wouldn't be disassembled at all, depending on your settings).

I'm not sure what you mean by "throwing off the reading frame", though?
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234541)
Unlike the 65C02 and 65816 (BRA) or the Z80 and SM83 (JR), the MOS 6502 has no unconditional relative branch, and an unconditional conditional branch saves 1 byte relative to JMP (2 bytes versus 3). So I imagine one might approximate detecting the most common unconditional-conditional-branch setups by adding a flag to the debugger.

  • Set an internal flag when N and Z are set by loading a known value:
    LDA/LDX/LDY #immediate, or LDA/LDX/LDY absolute from ROM in the same bank or a fixed bank
  • Clear it when N and Z are set by any other instruction
  • Clear it when PC is changed (JMP, JSR, B??, RTS, RTI)
  • If the flag is set, then a taken BMI, BPL, BEQ, or BNE should not mark the untaken side as code
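A minimal sketch of that heuristic, run statically over an already-decoded, straight-line list of instructions rather than inside a debugger. `Insn`, its mode tags, and `read_const_rom` are hypothetical; the caller supplies `read_const_rom` so that it returns a byte only when the address is ROM in the same or a fixed bank, and `None` otherwise. One deliberate deviation: instead of a boolean flag it remembers the loaded value, so it can decide without running the code whether the branch is always taken.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Insn:
    addr: int
    mnemonic: str                  # upper-case 6502 mnemonic, e.g. "LDA"
    mode: str                      # hypothetical tags: "imm", "abs", "zp", "rel", "impl", ...
    operand: Optional[int] = None  # immediate value, address, or branch target


# Non-load instructions that overwrite N/Z, losing the known value.
NZ_SETTERS = {"ADC", "AND", "ASL", "BIT", "CMP", "CPX", "CPY", "DEC", "DEX", "DEY",
              "EOR", "INC", "INX", "INY", "LSR", "ORA", "PLA", "PLP", "ROL", "ROR",
              "SBC", "TAX", "TAY", "TSX", "TXA", "TYA"}
# Instructions that change PC (and so clear the flag).
FLOW = {"JMP", "JSR", "RTS", "RTI", "BRK", "BCC", "BCS", "BEQ", "BNE",
        "BMI", "BPL", "BVC", "BVS"}
# For each flag-testing branch: is it taken, given the known loaded value?
TAKEN_IF = {
    "BEQ": lambda v: v == 0,
    "BNE": lambda v: v != 0,
    "BMI": lambda v: (v & 0x80) != 0,
    "BPL": lambda v: (v & 0x80) == 0,
}


def always_taken_branches(insns: list[Insn],
                          read_const_rom: Callable[[int], Optional[int]]) -> set[int]:
    """Addresses of BEQ/BNE/BMI/BPL whose fall-through should not be marked as code."""
    known: Optional[int] = None  # value that last set N/Z, if it is a constant
    result: set[int] = set()
    for insn in insns:
        m = insn.mnemonic
        if m in ("LDA", "LDX", "LDY"):
            if insn.mode == "imm":
                known = insn.operand & 0xFF
            elif insn.mode == "abs":
                known = read_const_rom(insn.operand)
            else:
                known = None  # zero page, indexed, indirect: value unknown
        elif m in FLOW:
            if m in TAKEN_IF and known is not None and TAKEN_IF[m](known):
                result.add(insn.addr)
            known = None
        elif m in NZ_SETTERS:
            known = None
        # Anything else (STA, TXS, CLC, NOP, ...) leaves N/Z untouched.
    return result
```

With the LDA #$00 / STA $09 / BEQ sequence from the example above, the BEQ is reported because STA does not touch N or Z.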
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234545)
Sour wrote:
There isn't a way to do this: Mesen will always assume that whatever follows a conditional branch is potential code, because that will be the case 99.9% of the time. The green color means that it's been marked as code but hasn't been executed yet.

Changing this would make the debugger somewhat less user-friendly, since the alternative would require all branches to be taken before they're actually marked as code (which means they wouldn't be disassembled at all, depending on your settings).


I appreciate the heuristics; I was just wondering if there was a way to switch them off, since the fall-through side of that branch is never going to execute.

Sour wrote:
I'm not sure what you mean by "throwing off the reading frame", though?


In the example I pasted, the last byte of the inappropriately disassembled data matches the opcode for CMP. The next two bytes are 0xA9 0x09, which should be disassembled as LDA #$09, but 0xA9 ends up being treated as the operand of the CMP.
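To show how one misread byte shifts the reading frame, here is a toy linear-sweep disassembler over a deliberately tiny, hypothetical subset of the 6502 opcode table; any opcode outside the subset is emitted as `.byte`. The input bytes are made up to mirror the situation described: a data byte equal to $C9 (CMP #imm) directly in front of A9 09 (LDA #$09). Forcing that first byte to be data, which is what a CDL data bit or a disassembler's "treat as byte" setting amounts to, puts the decoder back in step.

```python
# Tiny illustrative subset: opcode -> (mnemonic, operand format, instruction length).
OPCODES = {
    0x09: ("ORA", "#${:02X}", 2),
    0x60: ("RTS", "", 1),
    0x85: ("STA", "${:02X}", 2),
    0xA9: ("LDA", "#${:02X}", 2),
    0xC9: ("CMP", "#${:02X}", 2),
}


def linear_sweep(data: bytes, forced_data: frozenset = frozenset()) -> list[str]:
    """Decode bytes front to back; offsets in forced_data are always emitted as .byte."""
    lines = []
    pc = 0
    while pc < len(data):
        op = data[pc]
        entry = OPCODES.get(op)
        # Emit a raw byte for forced data, unknown opcodes, or a truncated instruction.
        if pc in forced_data or entry is None or pc + entry[2] > len(data):
            lines.append(f".byte ${op:02X}")
            pc += 1
            continue
        mnemonic, fmt, size = entry
        operand = fmt.format(data[pc + 1]) if size == 2 else ""
        lines.append(f"{mnemonic} {operand}".rstrip())
        pc += size
    return lines


data = bytes([0xC9, 0xA9, 0x09, 0x85, 0x10, 0x60])
print(linear_sweep(data))                    # frame thrown off by the stray $C9
print(linear_sweep(data, frozenset({0})))    # $C9 forced to data, frame restored
```

The first call yields `CMP #$A9`, `ORA #$85`, `.byte $10`, `RTS`; the second yields `.byte $C9`, `LDA #$09`, `STA $10`, `RTS`.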
Re: Questions re Crystalis and Mesen's CDLs, disassembly, et
by on (#234546)
For this kind of work I use Regenerator (https://csdb.dk/release/?id=149429). It won't understand the NES's banking, so you would need to split the files into banks manually. While Mesen is handy for working things out, I find Regenerator is better at making source code that will reassemble. For example, you could mark the CMP as a byte so it will output it as data and then disassemble the LDA. It has zero smarts, so you need to tell it what is where. You can also set up lo/hi tables and have labels made for them, etc.
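This does not reproduce Regenerator itself; it is a hedged sketch of the idea behind lo/hi table labelling. It pairs a table of low bytes with a table of high bytes, turns each pair into a 16-bit address, and emits ca65-style `.byte <label` / `.byte >label` lines so the tables survive reassembly at different addresses. `lohi_table_source`, the `Table` name, and the `L%04X` label style (mirroring Mesen's auto labels such as L8040) are assumptions. The labels still have to be defined at the right spots, and on MMC3 the same CPU address can belong to several banks, so you have to know which bank is mapped when the table is used.

```python
def lohi_table_source(lo: bytes, hi: bytes, name: str = "Table") -> tuple[list[int], list[str]]:
    """Combine split lo/hi byte tables into addresses and label-based source lines."""
    if len(lo) != len(hi):
        raise ValueError("lo and hi tables must have the same length")
    addresses = [l | (h << 8) for l, h in zip(lo, hi)]
    labels = [f"L{a:04X}" for a in addresses]
    lines = [
        f"{name}Lo: .byte " + ", ".join(f"<{lab}" for lab in labels),  # low bytes
        f"{name}Hi: .byte " + ", ".join(f">{lab}" for lab in labels),  # high bytes
    ]
    return addresses, lines


addrs, src = lohi_table_source(bytes([0x40, 0xA9]), bytes([0x80, 0x88]))
print([hex(a) for a in addrs])
print("\n".join(src))
```

For the two-entry example this produces the addresses $8040 and $88A9 and the lines `TableLo: .byte <L8040, <L88A9` and `TableHi: .byte >L8040, >L88A9`.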