precalculated flags N and Z?

by Anes on 2005-08-15 (#3736)

I'm sure a lot of people have implemented this already, but I'm posting it here anyway in case someone hasn't done it this way. I think it's faster than testing bits. It's only for the N and Z flags.

Code:

BYTE g_Flags[256];
...
void cpuinit(void)
{
WORD i;

   for (i = 0; i < 256; i++)
     g_Flags[i] = (i == 0 ? 0x02 : 0x00) | (i & 0x80);

}

So when we have to set or clear the flags for an instruction (suppose it's an LDA, which only affects N and Z):

Code:

.. // Code for LDA here

g_CpuContext.P &= 0x7D;
g_CpuContext.P |= g_Flags[g_CpuContext.A];

First we clear the N and Z flags (0x7D is the inverse of N | Z, 0x82), because we don't know whether the new value needs them cleared. Then we OR in what we pre-calculated in the cpuinit() routine.
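
The same table lookup can be wrapped in a small helper so that the mask and the flag bits are named instead of written as magic numbers. This is only a minimal sketch: the names nz_table, nz_table_init, set_nz, FLAG_N and FLAG_Z are made up for illustration and are not from the post above.

```c
#include <stdint.h>

#define FLAG_Z 0x02u   /* zero flag, bit 1 of P */
#define FLAG_N 0x80u   /* negative flag, bit 7 of P */

/* Hypothetical name; plays the role of g_Flags in the post. */
static uint8_t nz_table[256];

/* Fill the table once at start-up: one N/Z pattern per 8-bit value. */
static void nz_table_init(void)
{
    for (int i = 0; i < 256; i++)
        nz_table[i] = (uint8_t)((i == 0 ? FLAG_Z : 0u) | ((unsigned)i & FLAG_N));
}

/* Return p with its N and Z bits replaced by the flags for value.
   The AND mask is ~(N | Z) truncated to 8 bits, i.e. 0x7D. */
static uint8_t set_nz(uint8_t p, uint8_t value)
{
    uint8_t keep = (uint8_t)~(FLAG_N | FLAG_Z);
    return (uint8_t)((p & keep) | nz_table[value]);
}
```

After calling nz_table_init() once, an LDA handler would then end with something like P = set_nz(P, A).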

If this post helps someone, I'm glad. If not, admin, please delete it.

by hap on 2005-08-15 (#3739)

No it shouldn't be deleted, don't be so hard on yourself :p
It looks like a fast and easy method, but I'm doing it the blargg way: http://www.slack.net/~ant/nes-emu/6502.html

by Disch on 2005-08-15 (#3742)

I never fully understood blargg's method, but took its idea and spun it into my own. Rather than keeping one NZ flag I keep both N and Z, but set them both during each instruction. You might see a lot of this in my code:

Code:
fN = fZ = A;

For example my ORA emulation looks like this:

Code:
#define ORA() \
   fN = fZ = A |= val

Then the Z flag is set whenever fZ is zero (and is cleared when fZ is nonzero -- this is kind of backwards) -- whereas the high bit of fN is used to determine the N flag -- as my BMI/BPL/BNE/BEQ emulation demonstrates:

Code:
case 0x10: BRANCH(!(fN & 0x80)); break; /* BPL */
case 0x30: BRANCH( (fN & 0x80)); break; /* BMI */
case 0xD0: BRANCH( fZ ); break; /* BNE */
case 0xF0: BRANCH( !fZ ); break; /* BEQ */

Probably isn't as efficient as blargg's way since it requires two vars instead of one, but I've gotten used to it.

I used to use a pre-built NZ table like that and keep all the status bits in one byte -- but I found this to be troublesome, as it requires operations to flip bits off and on in every instruction. For example, when the NZ bits change, not only do you have to OR your status var with the value in that table, but you also have to AND it with the inverse of the NZ bits before ORing (to flip off N and Z before flipping them back on). It just seemed like such a waste.

Keeping separate vars for each flag is definitely the way to go, imo. That way you only have to combine them into a single byte when the status is pushed/pulled -- which doesn't occur very much at all.
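
To make the "combine only on push/pull" step concrete, here is a rough sketch of how fN and fZ could be folded into a status byte and split back out. Only fN, fZ and their meaning come from the post above; other_flags, pack_status and unpack_status are hypothetical names, and the handling of the B bit and bit 5 on a push is left out.

```c
#include <stdint.h>

#define FLAG_Z 0x02u   /* zero flag, bit 1 of P */
#define FLAG_N 0x80u   /* negative flag, bit 7 of P */

static uint8_t fN;           /* N is bit 7 of this value */
static uint8_t fZ;           /* Z is set when this value is zero */
static uint8_t other_flags;  /* hypothetical: remaining bits, kept in P layout */

/* Build the hardware status byte, e.g. for PHP or an interrupt. */
static uint8_t pack_status(void)
{
    uint8_t p = other_flags & (uint8_t)~(FLAG_N | FLAG_Z);
    if (fN & 0x80) p |= FLAG_N;
    if (fZ == 0)   p |= FLAG_Z;
    return p;
}

/* Split a status byte back into the internal form, e.g. for PLP or RTI. */
static void unpack_status(uint8_t p)
{
    other_flags = p & (uint8_t)~(FLAG_N | FLAG_Z);
    fN = p & FLAG_N;                 /* only bit 7 matters */
    fZ = (p & FLAG_Z) ? 0 : 1;       /* inverted: zero means Z is set */
}
```

Because N and Z live in separate variables, a pulled status byte with both N and Z set (which no single 8-bit result can produce) round-trips without any special case.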

by blargg on 2005-08-15 (#3746)

Disch described the idea quite well. The version I use merges n and z together simply to reduce the number of variables, but it's really convoluted and probably not worth it. Here's a summary of one evolutionary path:

1) Keep flags as booleans and calculate in each instruction:

zero = (result == 0);
negative = (result < 0);

2) Use table:

flags &= ~(negative_mask | zero_mask);
flags |= nzc_table [result];

3) Defer testing until flag is actually needed by using the native processor's own comparison instructions:

zero = result;
negative = result;
carry = result;

if ( zero == 0 ) ...
if ( negative < 0 ) ...
if ( carry & 0x100 ) ...

In most cases the status flag never needs to be calculated, which this scheme takes advantage of.

This scheme is part of an important general pattern of keeping data in the most efficient form for emulation and converting it to the actual hardware format only when needed. For 6502 emulation, the hardware format is needed only when saving/restoring the status register on the stack; all other uses of the status flags can be of the internal format you choose for your emulator. For PPU emulation you might keep the pattern data in a format that's faster to draw to the screen, perhaps expanding it to 8 bits per pixel.
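
As an illustration of the PPU case, the sketch below expands one pattern-table tile into one byte per pixel. It assumes the standard NES tile layout (16 bytes per tile: eight rows of bit plane 0 followed by eight rows of bit plane 1, with bit 7 as the leftmost pixel). The function name decode_tile is made up, and a real renderer might well pick a different cached format.

```c
#include <stdint.h>

/* Expand one 16-byte tile (two 8-byte bit planes) into 64 bytes,
   one pixel per byte, each holding a 2-bit colour index 0..3. */
static void decode_tile(const uint8_t src[16], uint8_t dst[64])
{
    for (int row = 0; row < 8; row++) {
        uint8_t lo = src[row];       /* plane 0: low bit of each pixel */
        uint8_t hi = src[row + 8];   /* plane 1: high bit of each pixel */
        for (int col = 0; col < 8; col++) {
            int bit = 7 - col;       /* bit 7 is the leftmost pixel */
            dst[row * 8 + col] =
                (uint8_t)(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1));
        }
    }
}
```

The decoded tile would have to be refreshed whenever the underlying pattern memory is written, which is the price of keeping data in a non-hardware format.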

by Anes on 2005-08-15 (#3749)

I'm actually not using separate variables for the flags. I only use one variable, "P", which emulates the 6502's P register. It seems that's not a good idea, eh?

Disch: your method is cool, but it means I'd have to rearrange a lot of code. It's a pain that I didn't post this topic earlier.

Blargg: I still don't understand what you mean with the "ifs".
Are they for testing the flags? I don't like to use an "if" that is commonly translated to an x86 CMP plus a jump, which is slower (assuming we are comparing BYTEs). Of course this doesn't include "ifs" like (if (a & 0x80)), which I think compiles to a "test" instruction, which is faster.
(Correct me if I misunderstood something.)

by blargg on 2005-08-15 (#3752)

In this example, zero and negative are two extra variables. They don't store simple boolean results; rather, they store the last 8-bit value that the flag would have been computed from. When the actual flag is needed, the variable is tested for that condition. The optimization is that this condition isn't tested until it's actually needed, and for branches the native test instruction can be used rather than the somewhat inefficient conversion to a boolean value: flag = (zero == 0)

Code:
// AND #imm
case 0x29:
a = a & read_memory( pc + 1 );
pc = pc + 2;
zero = a;
negative = a;
break;

// BNE
case 0xD0:
if ( zero != 0 )
{
   int offset = (signed char) read_memory( pc + 1 ); // plain char may be unsigned
   pc = pc + offset;
}
pc = pc + 2;
break;

// BMI
case 0x30:
if ( negative & 0x80 )
{
   int offset = (signed char) read_memory( pc + 1 ); // plain char may be unsigned
   pc = pc + offset;
}
pc = pc + 2;
break;

by tepples on 2005-08-15 (#3780)

hap wrote:
It looks like a fast and easy method, but I'm doing it the blargg way: http://www.slack.net/~ant/nes-emu/6502.html

I'm not inclined to put much faith in this statement: "Profiling shows that BMI and BPL aren't that frequent" given the ' bit / bpl ' sequences in NES programs' init codes and in any zero-based loop using ' dex / bpl :- '.

by hap on 2005-08-16 (#3793)

I handle BIT and PLP flag changing differently. If Z and N are both set, the MSB of NZ is set, and the low byte is 0, so instead of:

is_negative = (((nz + 0x200) >> 3) | nz) & 0x80
it's: is_negative = nz & 0x8080
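
One possible reading of this variant, sketched in C: nz is a 16-bit value whose low byte is the last result, and bit 15 is used only to force N when Z must be set at the same time. The expression nz & 0x8080 and the "MSB set, low byte 0" case come from the post; the function names and the encodings chosen for the other three PLP cases are assumptions made for illustration.

```c
#include <stdint.h>

/* Low byte: last result. Bit 15: forced N (assumed layout). */
static uint16_t nz;

/* Normal instructions just store the 8-bit result. */
static void set_nz_from_result(uint8_t result)
{
    nz = result;
}

/* PLP/BIT-style update from explicit N (bit 7) and Z (bit 1) flags in p. */
static void set_nz_from_status(uint8_t p)
{
    int n = (p & 0x80) != 0;
    int z = (p & 0x02) != 0;
    if (z)
        nz = n ? 0x8000 : 0x0000;  /* low byte 0 means Z set; bit 15 carries N */
    else
        nz = n ? 0x0080 : 0x0001;  /* any non-zero low byte means Z clear */
}

static int is_zero(void)     { return (nz & 0x00FF) == 0; }
static int is_negative(void) { return (nz & 0x8080) != 0; }
```

The point of the layout is that both tests stay a single mask operation, even for the N-and-Z-both-set state that an ordinary 8-bit result can never produce.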

by blargg on 2005-08-16 (#3799)

tepples wrote:
I'm not inclined to put much faith in this statement: "Profiling shows that BMI and BPL aren't that frequent" given the ' bit / bpl ' sequences in NES programs' init codes and in any zero-based loop using ' dex / bpl :- '.

Argh, I must have removed the reference to that being a profile of various NSFs, which of course don't have a VBL wait loop.