I'm trying to do some static analysis of FDS titles, identifying and counting which FDS BIOS APIs they call. Following the classic strategy of "do the simplest thing that could possibly work," my first attempt just scanned the ROM for byte sequences of $20 $xx [$Ex|$Fx], in other words, "JSR $Exxx" or "JSR $Fxxx."
That resulted in lots of false positives, as you might imagine. I have reduced the false positives by:
1. Only scanning the PRG files on the disks (block type 4, file type 0).
2. Excluding results that are immediately followed by an illegal opcode, on the unproven assumption that no FDS title would use "unofficial" opcodes.
3. Rejecting matches that point before the first "public" API, which starts at $E149, because the region before it is known to be character data.
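For reference, here is a minimal Python sketch of the scan with filters 2 and 3 applied. It is only an illustration of the approach, not my actual tool: `scan_jsr_candidates`, `prg`, and `load_addr` are made-up names, and it assumes the caller has already extracted one PRG file's bytes (filter 1) along with its load address.

```python
from collections import Counter

# The 151 documented 6502 opcodes; any other byte counts as "unofficial" here.
OFFICIAL_OPCODES = frozenset(bytes.fromhex(
    "00 01 05 06 08 09 0A 0D 0E 10 11 15 16 18 19 1D 1E "
    "20 21 24 25 26 28 29 2A 2C 2D 2E 30 31 35 36 38 39 3D 3E "
    "40 41 45 46 48 49 4A 4C 4D 4E 50 51 55 56 58 59 5D 5E "
    "60 61 65 66 68 69 6A 6C 6D 6E 70 71 75 76 78 79 7D 7E "
    "81 84 85 86 88 8A 8C 8D 8E 90 91 94 95 96 98 99 9A 9D "
    "A0 A1 A2 A4 A5 A6 A8 A9 AA AC AD AE "
    "B0 B1 B4 B5 B6 B8 B9 BA BC BD BE "
    "C0 C1 C4 C5 C6 C8 C9 CA CC CD CE D0 D1 D5 D6 D8 D9 DD DE "
    "E0 E1 E4 E5 E6 E8 E9 EA EC ED EE F0 F1 F5 F6 F8 F9 FD FE"
))

FIRST_PUBLIC_API = 0xE149  # everything below this is rejected (filter 3)


def scan_jsr_candidates(prg, load_addr):
    """Return (call_site, target) pairs for JSR-shaped byte patterns.

    prg is the raw content of one PRG file; load_addr is its load address.
    """
    hits = []
    for i in range(len(prg) - 2):
        if prg[i] != 0x20:  # JSR absolute
            continue
        target = prg[i + 1] | (prg[i + 2] << 8)  # little-endian operand
        if target < FIRST_PUBLIC_API:
            continue
        nxt = i + 3
        # Filter 2: the byte after the JSR must be a documented opcode.
        if nxt < len(prg) and prg[nxt] not in OFFICIAL_OPCODES:
            continue
        hits.append((load_addr + i, target))
    return hits
```

Counting calls per target is then something like `Counter(t for _, t in scan_jsr_candidates(prg, load_addr))`. Because every byte offset is tested, this still matches JSR patterns that straddle real instruction boundaries or sit inside data.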
I have considered rejecting matches that fall between entrypoints in the BIOS, or in other words, JSRs into the middle of BIOS functions. That said, I'm not convinced that all FDS titles are this well-behaved.
I still get a ton of false positives, likely because of instruction alignment. What other strategies might I use to reliably identify JSRs into the FDS BIOS?
I imagine that the "right" way to do this is to disassemble the PRG files starting from some known good address, but that's harder than it sounds. I can place each file in RAM correctly using the load address field in the preceding block 3.
- If I disassemble from the beginning of the loaded file, that assumes that byte 0 of each file is program code, which it might not be.
- If there are gaps between functions, they may contain garbage data that looks like code and throws off subsequent disassembly.
- If I try to do some kind of static execution tree analysis, following JSRs and branches, I would be stymied by jumps from one file to another, as I won't know which files are intended to be loaded simultaneously.
- I could reject apparent jumps that land between instructions of the FDS BIOS, which I could get by looking at the FDS BIOS disassembly. That doesn't solve the false positives that land on opcodes, however.
- Finally, I'm pretty much SOL if there's any self-modifying code that writes or modifies JSR instructions. I don't know if any FDS software does that.
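To make the execution-tree idea concrete, here is a rough Python sketch of tracing from known entry points instead of scanning every byte. Everything in it is an assumption for illustration: `trace_bios_calls`, `mem`, and `entries` are made-up names, `mem` is a dict of address to byte for whichever files I guess are loaded together, and the entry points have to come from somewhere else. It also assumes execution resumes right after every JSR (which would be wrong for any routine that reads inline parameters following the call), stops at indirect jumps, and ignores self-modifying code.

```python
# The 151 documented 6502 opcodes; tracing stops at anything else.
OFFICIAL_OPCODES = frozenset(bytes.fromhex(
    "00 01 05 06 08 09 0A 0D 0E 10 11 15 16 18 19 1D 1E "
    "20 21 24 25 26 28 29 2A 2C 2D 2E 30 31 35 36 38 39 3D 3E "
    "40 41 45 46 48 49 4A 4C 4D 4E 50 51 55 56 58 59 5D 5E "
    "60 61 65 66 68 69 6A 6C 6D 6E 70 71 75 76 78 79 7D 7E "
    "81 84 85 86 88 8A 8C 8D 8E 90 91 94 95 96 98 99 9A 9D "
    "A0 A1 A2 A4 A5 A6 A8 A9 AA AC AD AE "
    "B0 B1 B4 B5 B6 B8 B9 BA BC BD BE "
    "C0 C1 C4 C5 C6 C8 C9 CA CC CD CE D0 D1 D5 D6 D8 D9 DD DE "
    "E0 E1 E4 E5 E6 E8 E9 EA EC ED EE F0 F1 F5 F6 F8 F9 FD FE"
))


def insn_length(op):
    """Byte length of a documented 6502 instruction, from its opcode."""
    if op == 0x20:                      # JSR abs
        return 3
    if op in (0x00, 0x40, 0x60):        # BRK, RTI, RTS (BRK treated as 1 byte)
        return 1
    low = op & 0x1F
    if low in (0x08, 0x0A, 0x18, 0x1A):  # implied / accumulator
        return 1
    if low in (0x0C, 0x0D, 0x0E, 0x19, 0x1C, 0x1D, 0x1E):  # absolute modes
        return 3
    return 2                            # immediate, zero page, indirect, branches


def trace_bios_calls(mem, entries, bios_start=0xE000):
    """Follow code from the entry points; return {(call_site, bios_target)}."""
    seen, calls, work = set(), set(), list(entries)
    while work:
        pc = work.pop()
        while pc in mem and pc not in seen:
            op = mem[pc]
            n = insn_length(op)
            if op not in OFFICIAL_OPCODES or any(pc + k not in mem for k in range(1, n)):
                break                   # not plausible code; abandon this path
            seen.add(pc)
            if op in (0x20, 0x4C):      # JSR abs / JMP abs
                target = mem[pc + 1] | (mem[pc + 2] << 8)
                if target < bios_start:
                    work.append(target)  # stays inside the game's own code
                elif op == 0x20:
                    calls.add((pc, target))
                if op == 0x4C:
                    break               # JMP never falls through
            elif op & 0x1F == 0x10:     # conditional branch: follow both sides
                off = mem[pc + 1]
                work.append((pc + 2 + (off - 256 if off >= 0x80 else off)) & 0xFFFF)
            elif op in (0x00, 0x40, 0x60, 0x6C):  # BRK, RTI, RTS, JMP (ind)
                break
            pc += n
    return calls
```

A JSR-shaped byte pattern that is never reached from an entry point is simply not reported, which is the whole point, but the result is only as good as the entry points and the guess about which files share memory.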
FWIW, yes, I know the public FDS documentation on the Wiki says that there are no public APIs in the $Fxxx range, but I've already identified some errors in the Wiki, so I need to prove this for myself.