I noticed via Twitter that byuu acquired additional upd7725 documentation and changed the implementation of the overflow flags in higan. The upd7725 overflow flags are something I had puzzled over and discussed with Lord Nightmare once (quite a long time ago, probably when I was initially backporting the DSP LLE into bsnes-classic), because I'd noticed that he'd changed the way MAME calculated the flags from how bsnes did it. Neither the original bsnes implementation nor LN's modified MAME implementation looked quite right to me, but while I understood how the flags were meant to be used, I couldn't figure out how to calculate them so that they would behave that way.
Thanks to the new documentation, specifically the explanation of how the S1 flag is calculated and the "1, 0, 1" (overflow, no overflow, overflow) case, I think I've figured out how everything works, and it's a good bit simpler than byuu's new implementation. In particular, I believe that there is no need for a flag "history buffer" and that the chip contains no such thing.
First of all, we have to understand what the OV0 and OV1 flags mean arithmetically. Basically, whereas OV0 indicates whether the most recent operation produced a signed overflow, OV1 indicates whether the value in the accumulator is in bounds (between -32768 and +32767) or has overflowed. Let's think about how to calculate that, and build up a truth table.
First, the easy cases. If the accumulator previously contained an in-bounds value, and no overflow occurred in the last operation, then the accumulator must still contain an in-bounds value. Likewise, if the accumulator previously contained an in-bounds value and an overflow occurred, the value in the accumulator is now out of bounds.
Code:
OV1in  OV0 | OV1out
-----------+--------
  0     0  |   0
  0     1  |   1
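OV0 itself is just the ordinary two's-complement signed-overflow flag for the operation. A minimal Python sketch of how OV0 and S0 fall out of one 16-bit addition, assuming the accumulator and operand are held as unsigned 16-bit integers (`add16` is a hypothetical helper, not code from bsnes, higan or MAME):

```python
MASK = 0xFFFF
SIGN = 0x8000

def add16(acc, operand):
    """Add two 16-bit two's-complement values held as unsigned ints.

    Returns (result, s0, ov0): the wrapped 16-bit result, its sign bit,
    and 1 if the addition produced a signed overflow.
    """
    result = (acc + operand) & MASK
    s0 = 1 if result & SIGN else 0
    # Signed overflow: the result's sign differs from the sign of both operands.
    ov0 = 1 if (acc ^ result) & (operand ^ result) & SIGN else 0
    return result, s0, ov0

# With OV1 clear on entry, OV1out is simply this OV0.
print(add16(0x1234, 0x0001))  # (4661, 0, 0): still in bounds
print(add16(0x7FFF, 0x0001))  # (32768, 1, 1): now out of bounds
```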
Next, if the accumulator was previously out of bounds, and no overflow occurred in the last operation, then the accumulator is still out of bounds. This is perhaps not quite as easy to intuit as the first two cases, but think about it: the only way the accumulator can go from out of bounds back to in bounds is if a second overflow occurs, in the opposite direction of the original overflow.
Code:
OV1in  OV0 | OV1out
-----------+--------
  1     0  |   1
Finally, the hard case: what happens if the accumulator was out of bounds and another overflow occurs? Let's look at a couple of examples:
Code:
32767 + 1 + (-2) (hex: $7FFF + $0001 + $FFFE)
$7FFF + $0001 = $8000 + overflow
$8000 + $FFFE = $7FFE + overflow
Adding $7FFF to $0001 gives a result of $8000 with an overflow (positive + positive = negative). Adding $FFFE to the result gives a result of $7FFE and a second overflow (negative + negative = positive). However, despite the two overflows, the final result is correct and in bounds: 32767 + 1 + (-2) equals 32766. The two overflows have cancelled each other out. Now let's look at another example:
Code:
32767 + 32767 + 32767 + 32767 (hex: $7FFF + $7FFF + $7FFF + $7FFF)
$7FFF + $7FFF = $FFFE + overflow
$FFFE + $7FFF = $7FFD
$7FFD + $7FFF = $FFFC + overflow
On the first addition, an overflow occurs (positive + positive = negative). No overflow occurs on the second addition, but on the third addition another positive + positive = negative overflow occurs. This time, the final result is not in bounds: 32767 + 32767 + 32767 + 32767 isn't -4 or even 65532, it's 131068 (hex $1FFFC).
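Both examples can be checked by carrying the exact (unbounded) sum alongside the wrapped 16-bit accumulator. In this sketch `trace` and `to_signed` are hypothetical helpers, and each overflow is reported by direction: +1 for positive + positive = negative, -1 for negative + negative = positive.

```python
def to_signed(x):
    """Interpret a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x

def trace(values):
    """Add 16-bit values left to right.

    Returns (final accumulator, exact sum, overflow direction per addition).
    """
    acc = values[0]
    directions = []
    for v in values[1:]:
        s = to_signed(acc) + to_signed(v)
        directions.append(1 if s > 32767 else -1 if s < -32768 else 0)
        acc = s & 0xFFFF  # wrap to 16 bits, as the accumulator does
    exact = sum(to_signed(v) for v in values)
    return acc, exact, directions

for values in ([0x7FFF, 0x0001, 0xFFFE], [0x7FFF] * 4):
    acc, exact, dirs = trace(values)
    print(f"${acc:04X} -> {to_signed(acc)}, exact {exact}, overflows {dirs}")
# $7FFE -> 32766, exact 32766, overflows [1, -1]
# $FFFC -> -4, exact 131068, overflows [1, 0, 1]
```

In the first case the overflows point in opposite directions and the wrapped accumulator matches the exact sum; in the second they point the same way and it doesn't.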
The difference between these two cases is that in the first case the two overflows were in opposite directions, and in the second case both overflows were in the same direction. The purpose of the S1 flag is to distinguish between these two cases. According to the datasheet, the S1 flag contains the sign of the result of the last operation that took place with the incoming OV1 flag clear; in other words, the last operation that took place while the incoming accumulator was in bounds. If the S1 flag is the same as the S0 flag produced by the current overflowing operation, then two overflows in the same direction have occurred and the accumulator is still out of bounds. If the S1 flag and the S0 flag are different, then two overflows in opposite directions have occurred, meaning the accumulator went out of bounds and then back in bounds.
So here are the complete truth tables for S1 and OV1:
Code:
OV1in | S1
------+----
  0   | S0
  1   | unchanged

OV1in  OV0 | OV1out
-----------+--------
  0     0  |   0
  0     1  |   1
  1     0  |   1
  1     1  | (S0 == S1)
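The two tables translate almost line for line into code. Below is a Python sketch (not the higan, bsnes or MAME code; `update_flags` and `run` are hypothetical names) that applies them after each 16-bit addition, assuming OV1 was cleared before the first addition, and replays both examples from above.

```python
def update_flags(ov1_in, s1_in, s0, ov0):
    """Return (s1_out, ov1_out) from the S1 and OV1 truth tables.

    ov1_in, s1_in: flags before the operation.
    s0, ov0: sign and signed-overflow flags produced by the operation.
    """
    if ov1_in == 0:
        # Accumulator was in bounds: S1 latches the new sign, OV1 follows OV0.
        return s0, ov0
    if ov0 == 0:
        # Still out of bounds; S1 keeps the sign from the first overflow.
        return s1_in, 1
    # Second overflow: same direction as the first (S0 == S1) stays out of bounds.
    return s1_in, int(s0 == s1_in)

def run(values):
    """Sum 16-bit values with OV1 cleared beforehand; return (acc, S1, OV1)."""
    acc, s1, ov1 = values[0], 0, 0
    for v in values[1:]:
        result = (acc + v) & 0xFFFF
        s0 = result >> 15
        ov0 = (((acc ^ result) & (v ^ result)) >> 15) & 1
        s1, ov1 = update_flags(ov1, s1, s0, ov0)
        acc = result
    return acc, s1, ov1

print(run([0x7FFF, 0x0001, 0xFFFE]))  # (32766, 1, 0): overflows cancelled
print(run([0x7FFF] * 4))              # (65532, 1, 1): still out of bounds
```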
byuu was wondering whether the OV1 test should use the new or old value of S1, but you can see from these truth tables that it doesn't matter. The value of S1 only changes if the previous OV1 was clear, while OV1 only depends on S1 if the previous OV1 was set.
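Since there are only 16 combinations of OV1in, OV0, S0 and the old S1, this can be checked exhaustively. This hypothetical sketch evaluates the OV1 table once with the old S1 and once with the updated S1 and collects any combinations where they disagree.

```python
from itertools import product

def s1_next(ov1_in, s1_in, s0):
    """S1 after the operation: latches S0 only if OV1 was clear on entry."""
    return s0 if ov1_in == 0 else s1_in

def ov1_next(ov1_in, ov0, s0, s1):
    """OV1 truth table, using whichever S1 value is passed in."""
    if ov1_in == 0:
        return ov0
    if ov0 == 0:
        return 1
    return int(s0 == s1)

mismatches = [
    (ov1_in, ov0, s0, s1_old)
    for ov1_in, ov0, s0, s1_old in product((0, 1), repeat=4)
    if ov1_next(ov1_in, ov0, s0, s1_old)
    != ov1_next(ov1_in, ov0, s0, s1_next(ov1_in, s1_old, s0))
]
print(mismatches)  # []
```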
The datasheet implies that "overflow, no overflow, overflow" is some kind of special case that the chip explicitly checks for, but in fact it's just a consequence of the math. Two consecutive operations can't both overflow in the same direction; just look at the results from adding $7FFF (the largest possible positive number) to itself. If you work out the results of repeatedly adding $8000 (the most negative possible number) to itself, it's the same. You can only have two overflows in the same direction if there is at least one non-overflowing operation between them.
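A brute-force check of this claim, restricted to additions. Testing every pair of 16-bit operands would mean 2^32 cases per step, too many for a quick Python script, so this sketch runs the same search at hypothetical 4-bit and 8-bit accumulator widths. The assumption is that width doesn't matter: after an upward overflow the wrapped result is at most -2, so even adding the largest positive value can't overflow upward again, and after a downward overflow the result is at least 0, so adding the most negative value can't overflow downward again.

```python
def to_signed(x, bits):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    return x - (1 << bits) if x >> (bits - 1) else x

def overflow_direction(a, b, bits):
    """+1 if a + b overflows upward, -1 if downward, 0 otherwise (signed a, b)."""
    s = a + b
    if s > (1 << (bits - 1)) - 1:
        return 1
    if s < -(1 << (bits - 1)):
        return -1
    return 0

def consecutive_same_direction_possible(bits):
    """Can two back-to-back additions overflow in the same direction?"""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    # Every (wrapped result, direction) an overflowing addition can produce.
    after_overflow = set()
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            d = overflow_direction(a, b, bits)
            if d:
                after_overflow.add((to_signed((a + b) & mask, bits), d))
    # Try every possible next operand against each of those results.
    return any(
        overflow_direction(r, c, bits) == d
        for r, d in after_overflow
        for c in range(lo, hi + 1)
    )

for bits in (4, 8):
    print(bits, consecutive_same_direction_possible(bits))  # prints "4 False" then "8 False"
```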
Note that in order to make use of this overflow mechanism, it is essential that the OV1 flag be cleared before you start doing your additions, or the S1 flag won't be updated when it should be. According to the datasheet, any ALU operation other than an addition or subtraction clears the OV1 flag, and if you look at a disassembly of the DSP1 program or Lord Nightmare's prose2k DSP program, you can see that they in fact do xor a,a or and a,a prior to any sequence of calculations that use the OV1 flag or the SGN pseudo-register.
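To see why clearing matters, this sketch runs the same overflowing addition from two flag states. The starting state S1 = 0, OV1 = 1 is a made-up example of flags left over from an earlier calculation, `add_step` and `run` are hypothetical helpers, and starting with OV1 = 0 stands in for the `xor a,a` or `and a,a` the programs actually use.

```python
def add_step(acc, v, s1, ov1):
    """One 16-bit addition followed by the S1/OV1 update from the truth tables."""
    result = (acc + v) & 0xFFFF
    s0 = result >> 15
    ov0 = (((acc ^ result) & (v ^ result)) >> 15) & 1
    if ov1 == 0:
        s1, ov1 = s0, ov0        # in bounds on entry: S1 latches, OV1 follows OV0
    elif ov0:
        ov1 = int(s0 == s1)      # second overflow: compare against the latched S1
    # ov1 == 1 and ov0 == 0: S1 and OV1 are both left unchanged
    return result, s1, ov1

def run(values, s1, ov1):
    """Sum values starting from the given (possibly stale) S1/OV1; return (acc, OV1)."""
    acc = values[0]
    for v in values[1:]:
        acc, s1, ov1 = add_step(acc, v, s1, ov1)
    return acc, ov1

# 32767 + 1 = 32768 is out of bounds, so OV1 should end up set.
print(run([0x7FFF, 0x0001], s1=0, ov1=0))  # (32768, 1): OV1 cleared beforehand
print(run([0x7FFF, 0x0001], s1=0, ov1=1))  # (32768, 0): stale OV1 = 1, S1 = 0
```

With the stale flags, the first overflow is treated as a second overflow in the opposite direction, so OV1 is cleared just as the accumulator leaves the valid range.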
One more thing: why is this overflow mechanism only good for three operations? Let's look at what happens if you do four additions in a row with the following values:
Code:
32767 + 32767 + 32767 + 32767 + -32768 (hex: $7FFF + $7FFF + $7FFF + $7FFF + $8000)
$7FFF + $7FFF = $FFFE S0 = 1 S1 = 1 OV0 = 1 OV1 = 1
$FFFE + $7FFF = $7FFD S0 = 0 S1 = 1 OV0 = 0 OV1 = 1
$7FFD + $7FFF = $FFFC S0 = 1 S1 = 1 OV0 = 1 OV1 = 1
$FFFC + $8000 = $7FFC S0 = 0 S1 = 1 OV0 = 1 OV1 = 0
We've already gone over the first three additions, so just look at what happens with the fourth. An overflow occurs (OV0 = 1) and S1 and S0 have opposite values, so the OV1 flag is cleared, which means the accumulator is considered to be in bounds. But this is wrong: 32767 + 32767 + 32767 + 32767 + (-32768) is 98300, not 32764! If you do four additions in a row, it becomes possible for two overflows in one direction to be followed by an overflow in the opposite direction, resulting in a false negative.
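The four-addition failure can be replayed the same way. This sketch (hypothetical `trace` helper, additions only, OV1 assumed cleared before the first addition) prints one line per addition in the same layout as the table above, then compares the final OV1 with the exact sum.

```python
def trace(values):
    """Return (acc, operand, result, S0, S1, OV0, OV1) for each addition."""
    acc, s1, ov1 = values[0], 0, 0  # OV1 cleared beforehand (e.g. xor a,a)
    rows = []
    for v in values[1:]:
        result = (acc + v) & 0xFFFF
        s0 = result >> 15
        ov0 = (((acc ^ result) & (v ^ result)) >> 15) & 1
        if ov1 == 0:
            s1, ov1 = s0, ov0       # S1 latches while in bounds
        elif ov0:
            ov1 = int(s0 == s1)     # second overflow: same direction stays out
        rows.append((acc, v, result, s0, s1, ov0, ov1))
        acc = result
    return rows

values = [0x7FFF, 0x7FFF, 0x7FFF, 0x7FFF, 0x8000]
rows = trace(values)
for acc, v, result, s0, s1, ov0, ov1 in rows:
    print(f"${acc:04X} + ${v:04X} = ${result:04X}  S0 = {s0} S1 = {s1} OV0 = {ov0} OV1 = {ov1}")

exact = sum(v - 0x10000 if v & 0x8000 else v for v in values)
print(exact, -32768 <= exact <= 32767)  # 98300 False, yet the final OV1 is 0
```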