Why aren't you using zero page addressing for all the RAM you have?
For instance, CompressFlag is at $0301, so every CMP against it uses absolute addressing and costs 4 cycles. On the zero page it would cost 3. Let's say there are 18 "runs" of data: that's 18 cycles lost for absolutely no reason. You're not tight on RAM, so why? The same goes for BIT DecompressSettings. Every time you do that, you lose a cycle because DecompressSettings is not on the zero page.
And if you do decide to move those things to the zero page, unfortunately nesasm won't use zero page addressing for them automatically.
Code:
;You have to put < before each zero page RAM label that you want to use zero page addressing for
;So do
BIT <DecompressSettings
;rather than
BIT DecompressSettings
;And when that byte is on the zero page, it will be faster.
CMP <CompressFlag
INC <DecompressOutput
;etc
Unless you are tight on zero page RAM, you should do the above. It not only makes things faster, it also makes your code smaller, since a zero page instruction is 2 bytes where the absolute version is 3.
Generally, you have to choose between making your code faster or making it smaller. If you want small code, you have the right idea. But if you want fast code, avoid doing anything in a loop that doesn't need to be done.
Will the value of DecompressSettings EVER change during this subroutine? It doesn't look like it. So checking it is a waste of time for EVERY SINGLE run.
If I were you, I would check DecompressSettings once, right at the top of the subroutine, and then branch to one of two loops: one that never increments, used when the minus bit (bit 7, which BIT copies into the N flag) is clear, and one that always does, used when it is set. This means you have to duplicate some code, but it also means you only check DecompressSettings ONCE per call.
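Here's a rough sketch of that idea in C, since you have a C version too. The run format (length/value pairs), the Run struct, INCREMENT_BIT and both function names are made up for illustration, and I'm only guessing at what your routine actually increments. The point is just the shape: test the flag once, then pick a loop.
Code:
```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 7 of the settings byte selects the incrementing mode. */
#define INCREMENT_BIT 0x80u

/* Hypothetical run format for this sketch: a length and a starting value. */
typedef struct {
    uint8_t length;
    uint8_t value;
} Run;

/* Before: the settings byte is tested again for every run. */
size_t decode_checked(uint8_t settings, const Run *runs, size_t count, uint8_t *out)
{
    size_t n = 0;
    for (size_t r = 0; r < count; r++) {
        /* This test never changes inside the routine, yet it runs per run. */
        uint8_t step = (settings & INCREMENT_BIT) ? 1 : 0;
        uint8_t v = runs[r].value;
        for (uint8_t i = 0; i < runs[r].length; i++) {
            out[n++] = v;
            v = (uint8_t)(v + step);
        }
    }
    return n;
}

/* After: one test at the top, then one of two duplicated loops. */
size_t decode_hoisted(uint8_t settings, const Run *runs, size_t count, uint8_t *out)
{
    size_t n = 0;
    if (settings & INCREMENT_BIT) {
        /* Loop that always increments. */
        for (size_t r = 0; r < count; r++) {
            uint8_t v = runs[r].value;
            for (uint8_t i = 0; i < runs[r].length; i++)
                out[n++] = v++;
        }
    } else {
        /* Loop that never increments. */
        for (size_t r = 0; r < count; r++) {
            uint8_t v = runs[r].value;
            for (uint8_t i = 0; i < runs[r].length; i++)
                out[n++] = v;
        }
    }
    return n;
}
```
Both versions write the same bytes. The second one is bigger, which is the size-for-speed trade mentioned above.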
In fact, you could ignore DecompressSettings entirely, and just make two separate routines. Think about it this way: The user has to set beforehand which one they're using. So why not save the writes required to do that, and just have them jsr to a different routine?
This plan saves the cycles for the write (6 cycles), and 4 cycles for every time BIT DecompressSettings would have happened.
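To put a number on it, here's the arithmetic as a tiny C function. I'm assuming the write is an LDA #immediate (2 cycles) followed by an STA absolute (4 cycles), and that BIT runs once per run with absolute addressing (4 cycles). The function name is made up, and this doesn't count the branch after each BIT, so the real saving should be a bit higher.
Code:
```c
/* 6502 cycle counts: LDA #imm = 2, STA abs = 4, BIT abs = 4. */
enum { LDA_IMM_CYCLES = 2, STA_ABS_CYCLES = 4, BIT_ABS_CYCLES = 4 };

/* Cycles saved by calling a dedicated routine instead of setting
   DecompressSettings and testing it once per run (assumed usage). */
unsigned cycles_saved_by_split(unsigned runs)
{
    unsigned setup = LDA_IMM_CYCLES + STA_ABS_CYCLES; /* the caller's write */
    return setup + BIT_ABS_CYCLES * runs;             /* plus one BIT per run */
}
```
With the 18 runs from the earlier example, that is 6 + 4 * 18 = 78 cycles.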
One thing is a little strange in your C code: the loop condition.
You know you can just do
Code:
while(1)
Right?
Or even just
Code:
while(AlwaysOne)
There's no need for
Code:
while(AlwaysOne == 1)
Still, not bad for a generic solution. I'll mess with it a bit more and see if I can find anything else.