Opcodes: huge switch statement or goto label?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
Opcodes: huge switch statement or goto label?
by on (#28189)
Hello again!

I am currently writing the opcode handling routine, and I fear the routine is going to blow up into a massive switch-case-web. So, in the light of optimizing, is it worth rewriting the part that I have (approximately 40 opcodes) into a different format?

I am thinking of replacing the case-statements with a label prefix and executing a (computer science course teachers, please beware) goto at the beginning of the handling routine. I understand that this will introduce an additional jump-instruction at the start of each opcode, but it removes the need of comparing the fetched byte to each and every opcode. Additionally, this will make sure that the opcodes at the end of the list start almost as fast as those at the beginning. Ofcourse one could use a list of opcode stastistics (I swear I have seen one, I forgot where) to determine opcode placement in the switch-statement.

Can anybody shed some light on this issue? My arguments seem to favor the goto-approach, but I'd like to know if I'm forgetting something before making an important design decision like this one.
Dynamic goto label?
by on (#28190)
Hm.. Come to think of it: Is it at all possible to use a dynamic goto label like LBL_(OPCODE) in C?

by on (#28192)
you could do:

switch(opcode)
{
case 0x00: goto brk or brk();
...
}

and hope it becomes a jump table.

by on (#28193)
large switch statements usually get compiled to a jump table.

therefore a single, large switch is usually the way to go. From there I generally use macros or inline functions for addressing mode/instruction combos.

by on (#28203)
Yep. The straight forward switch is the best place to start:
Code:
switch ( opcode )
{
case 0x00: // BRK
    ...
    break;

case 0x01: // ORA (zp,x)
    ...
    break;
...
}

If you've got over 100 case statements, most compilers will use a jump table for the switch, since that's faster than the 7 comparisons that would otherwise be required to determine the opcode (7 because they would use a binary search, not the 100 that a linear search would require).

by on (#28221)
Good, thanks. I guess I underestimated the compiler's (gcc) capabilities here.

by on (#28235)
With gcc(this uses a gcc extension), I've done something like this before:

Code:
static const void* GotoTable[256] =
{
 &&Op00, &&Op01, &&Op02, [...]
};

goto *GotoTable[happy_opcode];


Whether or not this is good programming practice is left to the reader. ;)
I actually had an instance where doing this was measurably faster than a switch() statement, so I guess gcc's optimizer is a bit flaky sometimes. But, the NES has enough opcodes, that it should always cause gcc to generate a jump table automatically with an opcode switch() statement.

by on (#28274)
Yes, that's my idea, and I could simplify a lot of code. However, I have no clues about seeing the compiler level of anything...

by on (#28284)
Fx3 wrote:
Yes, that's my idea, and I could simplify a lot of code. However, I have no clues about seeing the compiler level of anything...

If you're looking for ideas on what not to hand-optimize, and you can read i386 assembly language, you can look at the asm code that GCC generates:

gcc -Wall -O -c hello.c
produces object code in hello.o

gcc -Wall -O -S hello.c
produces asm code in hello.s

by on (#28376)
Uh... yes, but I meant another point. By using multiple jumps (goto), does it take out a chance to optimize the code for the compiler? A "major" programmer said that to me, as if this piece of code "locks" any optimizing... or almost.