I have a working 2A03 emulator, but would like to optimize it as much as possible. Blargg's website has some good information, but there are a couple of points that are still unclear.
*The same addressing modes are re-used numerous times. For instance, LDA ($nn),Y will use the same effective address calculation as ORA ($nn),Y. In general, on an x86 platform, is it faster to inline the effective address calculation (thus minimizing CALL/RET overhead), or is it faster to use subroutines for each form of calculation (thus minimizing code size and making better use of the L1 cache)?
*One suggestion I've heard is to not calculate the N and Z flags on every opcode that sets them (almost all of them), but instead to keep a variable that contains the last data byte that affected N/Z, and only decode the flags when needed. BEQ/BNE would then simply check whether the last data byte was 0, BMI/BPL would check whether bit 7 was set, and it would only be necessary to convert the flags into 2A03 format for PHP or interrupts. But if this method is used, how can the emulator handle BIT, PLP, or RTI, which can set N and Z simultaneously? No single byte is both zero and has bit 7 set.
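To make the second point concrete, here is a rough sketch of the lazy N/Z idea in C. It is not taken from Blargg's code or any other emulator; `LazyFlags`, `set_nz`, `op_bit`, `unpack_p`, and `pack_p` are names made up for illustration. It also shows one possible way around the BIT/PLP/RTI problem, under the assumption that two source bytes are kept instead of one, so that N and Z can be driven independently:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lazy-flag state; all names here are illustrative only. */
typedef struct {
    uint8_t z_src;  /* Z is considered set when this byte is zero      */
    uint8_t n_src;  /* N is considered set when bit 7 of this byte is 1 */
    uint8_t other;  /* V, D, I, C kept in their 2A03 bit positions      */
} LazyFlags;

/* Common case (LDA, ORA, INX, ...): one store per source, no decoding. */
static void set_nz(LazyFlags *f, uint8_t result) {
    f->z_src = result;
    f->n_src = result;
}

/* Branches test the stored bytes directly. */
static bool flag_z(const LazyFlags *f) { return f->z_src == 0; }
static bool flag_n(const LazyFlags *f) { return (f->n_src & 0x80) != 0; }

/* BIT: Z comes from A & M, while N and V come from bits 7 and 6 of M. */
static void op_bit(LazyFlags *f, uint8_t a, uint8_t m) {
    f->z_src = a & m;
    f->n_src = m;
    f->other = (uint8_t)((f->other & ~0x40) | (m & 0x40));
}

/* PLP / RTI: unpack a status byte pulled from the stack. */
static void unpack_p(LazyFlags *f, uint8_t p) {
    f->n_src = p;                   /* bit 7 of P is N                  */
    f->z_src = (p & 0x02) ? 0 : 1;  /* Z set -> store a zero byte       */
    f->other = p & 0x4D;            /* keep V (40), D (08), I (04), C (01) */
}

/* PHP / BRK / IRQ / NMI: pack the status byte only when it is pushed. */
static uint8_t pack_p(const LazyFlags *f, bool brk) {
    uint8_t p = (uint8_t)(f->other | 0x20);  /* bit 5 is always pushed as 1 */
    if (flag_n(f)) p |= 0x80;
    if (flag_z(f)) p |= 0x02;
    if (brk)       p |= 0x10;  /* B is set for PHP/BRK, clear for IRQ/NMI */
    return p;
}
```

With this layout the common opcodes pay for two byte stores instead of flag decoding, and BIT or PLP can put the emulated CPU into the "N and Z both set" state without any special casing. Whether it is actually faster than computing the flags eagerly would have to be measured on the target machine.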