For games that have lots of slowdown, would changing the internal header setting to FastROM make any difference?
No, because nothing actually inspects the header (other than emulators). On hardware, the game has to reconfigure the
SNES to address ROM at the faster speed, and even then, only some of it. ($808000-$BFFFFF and $C00000-$FFFFFF)
Maybe I'm using the wrong terminology... The internal header... Does it have a setting or indicator that tells the console whether it's FastROM or not? I'm not referring to the 512-byte added header that emulators use.
No, see previous answer. To use FastROM, you simply set bit 0 of $420D to 1, after which any CPU access (not DMA) to the FastROM region will be 6 clocks per byte instead of 8.
This means that if you want your code to speed up noticeably, you need to point the program counter at the FastROM region; this is usually done with jml.
The internal header specifies whether Nintendo was informed that the game needed to be produced using 120ns ROMs. Nothing more... unless the game itself checks the bytes in the header and uses that information to decide whether to do those two things. That'd be silly, though, because they know the header they were going to include, so there's no reason to conditionally relocate the code.
I see. Thanks for the explanation!
lidnariq wrote:
That'd be silly, though, because they know the header they were going to include, so there's no reason to conditionally relocate the code.
Unless they hadn't decided which ROM type to use until last moment (which mattered depending on how much money they had), or the game started as SlowROM and they were trying to turn it into FastROM. Changing bank numbers at assembly time (i.e. using constants) probably would have been a better idea though, yeah (but you never know what a programmer could decide is easier to implement).
The speed difference between SlowROM and FastROM in a game that suffers slowdown may not be significant enough to have a positive effect anyway.
Remember that the SNES CPU does not run at 2.68 MHz or 3.58 MHz. It changes all the time, meaning the average rate is somewhere between those two numbers. FastROM means that those cycles which read FastROM are slightly faster, but all your RAM access is still "slow". Even in SlowROM you'll have the CPU running at "fast" speed for some percentage of the cycles.
There is a boost in performance, but it's not a huge one. Maybe if a game was just *barely* going over the CPU frame time, then using FastROM could eliminate that slowdown. psycopathicteen said that Gradius III, which suffers slowdown, is pretty unoptimized, which suggests that FastROM alone wouldn't even come close to relieving the slowdown issues.
I'm thinking, is it possible to make a program that can take a ROM, and automatically optimize it? I think one of the hardest parts would be making a branching map of the entire thing, especially with things like JSR (abs,x) where you have to figure out all the possible numbers X could be.
I don't see how this is possible with a single utility or program, i.e. there's no 100% reliable way to "analyse a ROM file" and accomplish this.
The only way this would work is through an actual emulator, where it had some heuristics and logic to keep track of lots and lots of data and let you reverse-engineer its collected data from there. Think of it like FCEUX's Code/Data Logger, just with a different purpose.
My opinion is generally the same as MottZilla's.
psycopathicteen wrote:
I'm thinking, is it possible to make a program that can take a ROM, and automatically optimize it?
If "halting problem" and "Kolmogorov complexity" have any meaning to you, you'll see why the answer is "intractable".
Quote:
I think one of the hardest parts would be making a branching map of the entire thing
I made a tool to map the call graph of the source code of game Thwaite. Do you want me to go dig it up?
Quote:
especially with things like JSR (abs,x) where you have to figure out all the possible numbers X could be.
In my call graph visualizer, a lookup table would be represented as the JSR (abs,x) calling the jump table, and then the jump table calling all its members.
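To make the idea concrete, here is a minimal Python sketch of that representation (all the routine names and edges are hypothetical): the caller points at the jump table node, and the table node points at every routine whose address appears in it, so ordinary graph traversal still finds everything reachable.

```python
# Hypothetical call graph: JSR (ai_jump_table,x) is modeled as the caller
# calling the table, and the table "calling" each of its members.
call_graph = {
    "update_objects": ["ai_jump_table"],
    "ai_jump_table":  ["ai_walker", "ai_shooter", "ai_boss"],
    "ai_walker":      [],
    "ai_shooter":     [],
    "ai_boss":        [],
}

def reachable(graph, start):
    """Collect every routine reachable from `start` (depth-first)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

print(sorted(reachable(call_graph, "update_objects")))
```

The key design choice is that the table is a first-class node, so you never have to know at graph-build time which index X will take at runtime.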
I thought someone was making a disassembler that starts with the reset vector and takes all possible routes through the code (back tracing, etc.) to build out a disassembly. If you took that approach, you should be able to reliably (or as close as possible) identify the opcodes you want to change. Unless code is dynamically built in RAM, in which case this would fail for that part.
The other method is what koitsu mentioned: using a code/data logger, you map out the entire ROM as you play through the game. The CDL file tells you whether a specific byte is the first byte of an instruction, so you get alignment for disassembly. I think the BizHawk emulator supports the SNES (can't remember; it has CDL for other consoles besides the NES).
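As a rough illustration of how a tool would consume such a log, here is a Python sketch assuming a simplified format where each ROM byte has one flag byte and bit 0 means "executed as the start of an instruction" (real CDL formats vary per emulator, so check the specific docs; the toy data below is made up):

```python
# Simplified CDL walk: yield every ROM offset the log marked as the
# first byte of an executed instruction. Those offsets give you safe
# alignment points for disassembly.
CODE_START = 0x01  # assumed flag bit; real formats differ

def instruction_starts(cdl_bytes):
    for offset, flags in enumerate(cdl_bytes):
        if flags & CODE_START:
            yield offset

log = bytes([0x01, 0x00, 0x00, 0x01, 0x02])  # toy log data
print(list(instruction_starts(log)))
```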
Or do the last method and have an app scan for whatever target opcode. Set boundaries (like obvious data areas), and maybe add a simple check that looks at the opcodes before and after the match to weed out false positives (you'll most likely still get some).
Though that assumes it's just simple JSR/JMP opcodes to change, and not something weird like building addresses in RAM that you jump to indirectly, or weird stack pushing/popping stuff.
I would just pick one game to work on, using any of the methods above to assist, instead of trying an automatic conversion route.
tomaitheous wrote:
not something weird like building addresses in ram that you jump indirectly from
That's not weird, that's how jump tables (a common programming technique) are implemented. I expect most games to use jump tables for processing object A.I., for example.
Quote:
or weird stack pushing/popping stuffs.
That could also be related to jump tables, or multi-threading. These are perfectly valid programming techniques, and there's nothing "weird" about them.
Okay, so if somebody were to make a game optimizer, this is how it would work.
Step 1: The program will disassemble the game, as it is being played on an emulator.
Step 2: It marks every ROM region that is used for code.
Step 3: It does the optimizations.
Step 4: It reallocates the new code to fit into the same region as the old code.
Some rules:
- Routines accessed by indirect jumps will retain their original starting point.
- Routines accessed by stack returns will retain their original starting point, unless it is returning from a subroutine.
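A toy Python sketch of the constraint those rules impose on step 4 (all names and data here are hypothetical): routines that are reached through indirect jumps or stack tricks are "pinned", so their optimized bodies get padded back to the original length and nothing after them moves.

```python
NOP = 0xEA  # 65816 NOP opcode

def reallocate(routines, pinned):
    """routines: {name: (start, original_length, optimized_code)}.
    Returns {name: (start, code)}, padding pinned routines back to
    their original length so every original entry point stays valid."""
    out = {}
    for name, (start, length, code) in routines.items():
        if len(code) > length:
            raise ValueError(f"{name}: optimized code must not grow")
        if name in pinned:
            code = code + bytes([NOP]) * (length - len(code))
        out[name] = (start, code)
    return out

# A routine that shrank from 6 bytes to 3 (LDA #$00 : RTS), but is a
# known indirect-jump target, so it must stay put and get padded.
routines = {"ai_dispatch": (0x8000, 6, b"\xa9\x00\x60")}
print(reallocate(routines, pinned={"ai_dispatch"}))
```

Note the padding lands after the routine's RTS, so the filler NOPs never execute.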
What kind of optimizations do you expect to be able to automate?
tokumaru wrote:
tomaitheous wrote:
not something weird like building addresses in ram that you jump indirectly from
That's not weird, that's how jump tables (a common programming technique) are implemented. I expect most games to use jump tables for processing object A.I., for example.
No, that would be a subroutine pointer list. A jump table is a list of jump instructions to each subroutine (rather than a list of their addresses), where you do a relative jump to the entry you want and it then jumps again into the expected subroutine.
In the end both are the same kind of hell for disassembling anyway, because the disassembler has no way to tell for sure up to where it can end up going. Jump tables and pointer lists need to be specified manually.
I think if you want to optimize a game to reduce or eliminate slowdown, you are better off doing so manually. Any automated program for such a thing is likely to do a poorer job compared to a person who knows the whole picture, or at least the bigger picture. I don't think there would be a one-size-fits-all solution for optimizing these sorts of games. If you want something more like that, I think you just need to look into overclocking the CPU or otherwise getting more CPU time. On an emulator it could be done more easily, of course.
Sik wrote:
In the end both are the same kind of hell for disassembling anyway, because the disassembler has no way to tell for sure up to where it can end up going. Jump tables and pointer lists need to be specified manually.
Yup, which is exactly why I said the only way for this to be done even remotely reliably is through an emulator -- a "code tracer" (disassembler or something like IDA Pro) can't do this accurately unless the game/title is super simple (talking about like bare-bones homebrew demos).
Tepples also pointed out the larger issue, which is the halting problem. Computers are pretty poor at making decisions based on examining code. Optimizing compilers, for example, work well because the programming "rules" of C forbid the programmer from writing outside of defined variables, arrays, etc., so the optimizer knows (well enough) which data/variables never get used in the code and can eliminate them.
With assembly code on the (S)NES, it's a free-for-all: code can jump or read/write anywhere, anytime, and that freedom is built into the assembly instructions. Software can't catch all the possibilities; it can only emulate the code, show hotspots, etc., to the user, and maybe suggest optimizations. The implementation would still be up to the programmer.
rainwarrior wrote:
What kind of optimizations do you expect to be able to automate?
-Using Direct Page in places where it repeatedly accesses the same page
-Switching to 16-bit mode in places that should've used 16-bit mode
-Getting rid of redundant loads and stores (except in I/O registers, where you can expect different results from a register without writing to it)
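For the third bullet, here is a toy Python peephole pass over (opcode, address) tuples (a deliberately simplified model: it ignores processor flags, which LDA sets and a real pass would have to preserve, and the I/O range is approximate):

```python
# Drop an LDA that re-reads the address whose value the accumulator
# already mirrors, except for I/O registers, where reads have side
# effects and values can change without a write.
IO_RANGE = range(0x2100, 0x4400)  # SNES PPU/CPU registers (simplified)

def drop_redundant_loads(code):
    out, acc_addr = [], None
    for op, addr in code:
        if op == "LDA" and addr == acc_addr and addr not in IO_RANGE:
            continue            # accumulator already holds this value
        if op in ("LDA", "STA"):
            acc_addr = addr     # A now mirrors this memory location
        else:
            acc_addr = None     # conservatively forget on anything else
        out.append((op, addr))
    return out

prog = [("LDA", 0x0010), ("STA", 0x0020), ("LDA", 0x0020), ("LDA", 0x4210)]
print(drop_redundant_loads(prog))
```

The reload of $0020 right after the store is dropped; the read of $4210 is kept because the accumulator holds a different address's value (and it's an I/O register anyway).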
The problem with things like this is: if you got rid of even one LDA, wouldn't it shift the entire game and cause pointers to go to the wrong places and mess everything up? I guess you could write a NOP instead, but that still takes 2 clock cycles. The only real non-crazy way to do something like this that I can think of would be if you had the source code, but I'm sure even the developers haven't kept track of it...
For each byte of code you remove from a subroutine, you'd insert NOPs after its RTS.
That won't help if code elsewhere jumps somewhere in the middle of that subroutine (incidentally, IDA tends to get confused by this since it makes it hard to tell what are the real subroutine boundaries). Also yes, this seems to be pretty common, especially for code written in assembly.
Sik wrote:
That won't help if code elsewhere jumps somewhere in the middle of that subroutine
A debugging emulator that tracks the set of all (current instruction PC, previous instruction PC) pairs can help ferret out these inbound jumps so that they can be fixed up.
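A minimal Python sketch of that bookkeeping, assuming the emulator logs a (previous PC, current PC) pair at each control transfer (all addresses below are made up):

```python
def inbound_jumps(pairs, routine_start, routine_end):
    """Flag control transfers that enter the routine anywhere other
    than its first instruction, coming from outside its body."""
    hits = set()
    for prev, pc in pairs:
        entered_mid = routine_start < pc < routine_end
        came_from_outside = not (routine_start <= prev < routine_end)
        if entered_mid and came_from_outside:
            hits.add(pc)
    return hits

# Toy trace: one call into the routine's start region, one jump into
# its middle from elsewhere, one transfer within the routine itself.
trace = [(0x8000, 0x8003), (0x9000, 0x8105), (0x8105, 0x8107)]
print(sorted(inbound_jumps(trace, 0x8100, 0x8120)))
```

Only the jump from $9000 into $8105 gets flagged; transfers inside the routine are ignored.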
I watched a video about compiler optimizations, and according to the video, compilers divide code into "basic blocks": stretches of code that are entered only at the top and exited only at the bottom, with no jumps in between.
In order to pad a basic block, you can insert a NOP at the end if there is one byte left. If two or more, you can insert a branch.
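That padding rule can be sketched in Python as follows (NOP and BRA are the real 65816 opcodes; the example block, LDA #$00, is arbitrary):

```python
NOP, BRA = 0xEA, 0x80  # 65816 opcodes

def pad_block(code, target_len):
    """Pad an optimized basic block back out to its original length:
    one leftover byte becomes a NOP; two or more become a forward BRA
    over NOP filler, which executes faster than the NOPs it skips."""
    gap = target_len - len(code)
    if gap < 0:
        raise ValueError("optimized block grew past its slot")
    if gap == 0:
        return code
    if gap == 1:
        return code + bytes([NOP])
    return code + bytes([BRA, gap - 2]) + bytes([NOP]) * (gap - 2)

print(pad_block(b"\xa9\x00", 6).hex())  # LDA #$00, then BRA over filler
```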
So what game would be a prime candidate for this? Couldn't you modify an emulator to run all ROM at 3.58 MHz to see how a specific game would react, before going through all this work?
tokumaru wrote:
tomaitheous wrote:
not something weird like building addresses in ram that you jump indirectly from
That's not weird, that's how jump tables (a common programming technique) are implemented. I expect most games to use jump tables for processing object A.I., for example.
Quote:
or weird stack pushing/popping stuffs.
That could also be related to jump tables, or multi-threading. These are perfectly valid programming techniques, and there's nothing "weird" about them.
OK, maybe 'weird' isn't the proper description; 'annoying' would be better, relative to the task at hand. The 65C02 and later models have jmp [table,x]. For jsr [table,x], which didn't exist on the 6280, I just jsr'd to the jmp [table,x] instruction. I never needed to build anything out in RAM, but I've seen it enough times in commercial games (PCE mostly). I haven't seen anything for stack use that couldn't have been done another way (with the same amount of effort), outside my own convoluted use of such things (making a BRA.long macro for code that uses only relative branches, allowing it to work at different CPU logical addresses) -- but that hasn't seemed to deter devs from doing it (macros maybe? I think FF1 on the NES does something like what I'm referring to).
How are we going to distribute the ROMs?
Is the disassembler program finished yet? And does anybody have a full list of games that need fixing?
psycopathicteen wrote:
How are we going to distribute the ROMs?
One way is as an IPS, xdelta, or other binary patch. Such a patch would change only the inner loops that cause the most slowdown, allowing the vast majority of the file to remain unchanged. Another is as a list of discovered basic blocks (including inbound jumps) and optimization choices that a custom patcher could apply. This would allow larger changes that move big things around but might require end users to install a runtime environment (such as Python or a C compiler) to run the patcher.
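For the first option, a minimal IPS writer is only a few lines of Python. This sketch handles the simple case only: plain records, no RLE, offsets under 24 bits, and each record under 64 KiB (the example offset and bytes are arbitrary):

```python
def write_ips(changes):
    """changes: list of (rom_offset, replacement_bytes) pairs.
    IPS format: 'PATCH' header, then per record a 3-byte big-endian
    offset, 2-byte big-endian length, and the data; 'EOF' footer."""
    out = bytearray(b"PATCH")
    for offset, data in changes:
        assert 0 <= offset < (1 << 24), "IPS offsets are 24-bit"
        assert 0 < len(data) < (1 << 16), "record too large for IPS"
        out += offset.to_bytes(3, "big")
        out += len(data).to_bytes(2, "big")
        out += data
    out += b"EOF"
    return bytes(out)

patch = write_ips([(0x1234, b"\xea\xea")])  # NOP out two bytes
print(patch.hex())
```

One known quirk to watch for: a record at offset 0x454F46 collides with the literal bytes "EOF", which is one reason some patchers disagree on edge cases.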
Seconding the recommendation for IPS or other binary patches. It's the best way** to ensure you don't get dinged for copyright infringement.
Please make sure whatever patching method you use is in a format that's supported by easy-to-use tools for both Windows and *IX systems. For example, there are several "IPS extensions" that many IPS patchers on Windows do not support, thus bail out or crash (or some do the wrong thing entirely) when encountered.
** -- I will not be catering to Aspies who want to argue/debate/discuss what a better way is, or how it's still possible that IP violations could happen even with IPS (yes it's possible but I've yet to hear of a single major game company complaining over such things. Full ROMs, incl. modified, are what they go looking for usually -- exceptions are situations like containing copyrighted graphics. I can tell a story that happened with Parodius and a takedown notice if people want an example)