Gauss Table Creation

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Gauss Table Creation
by on (#119449)
Sony's sound chips (as used in the SNES and PSX) use a "gauss_table" with 512 entries for pitch interpolation:
PSX: http://nocash.emubase.de/psx-spx.htm#spuadpcmpitch
SNES: http://nocash.emubase.de/fullsnes.htm#s ... spbrrpitch
The docs above contain the table contents as extracted from real hardware.

So far, the table contents are known, and everything is fine. But, just out of curiosity: How did Sony create those tables?

The guy who originally dumped one of those tables (and who coined the name "gauss" table) later admitted that he doesn't have a clue whether the table/interpolation has any relation to Gauss at all. Anyway, the table content does seem to resemble something called a "Gaussian normal distribution", so after all, the name "gauss_table" seems to be correct.

Here's a formula that does - more or less - reproduce the contents for table[0..511]:

table[i] = (e^(-((siz-i)^2)/curv)) * (volume) - offset

e = Euler's Number (2.718281828...)
siz = index of last table entry (511) (or maybe 512 in case the table excludes the highest point)
offset = some small offset, needed to get table[0]=0 for SNES, and table[0]=-1 for PSX
volume = volume factor (should be equal to "table[siz]+offset")
curv = some constant that somehow affects the shape of the curve

The "volume" should be kinda obvious since
table[siz] = (e^(0/curv)) * (volume) - offset
If siz should be 512 rather than 511, then (to obtain the "volume" value) one must guess the value for table[512]; the value should be the same as, or maybe one bigger than, table[511].

The "offset" value is needed to get table[0]=0 for SNES, and table[0]=-1 for PSX. Without subtracting that offset, table[0] would be some positive value (the formula won't reach zero until somewhere at table[-infinity]).
With some experimentation, I ended up with these offset values:
offset=circa 10 for SNES
offset=circa 50 for PSX

That is assuming that the offset is constant for all table entries. It might also be variable, something like "offset=(512-i)/10" instead of "offset=50" or whatever. But anyway, the offset is needed only for fine-tuning.

And "curv" can be calculated as:
curv = -(siz-i)^2 / log.e((table[i]+offset)/volume)
aka
curv = -(siz-i)^2 / ln((table[i]+offset)/volume)
aka, when picking i=256 for example,
curv = -(siz-256)^2 / ln((table[256]+offset)/volume)
and then I got these curv values from the above formula:
curv=circa 53240 for SNES
curv=circa 42484 for PSX

So, with above stuff, three table entries are used as reference points:
table[siz]=highest point, used to calculate "volume"
table[256]=some random point, used to calculate "curv"
table[0]=lowest point, used to compute "offset" (done by experimentation, not really calculated)

and with the computed volume, curv, offset values, it should be theoretically possible to calculate all other table entries.

Unfortunately, the results are still far away from perfection. Maybe I got something wrong, or maybe Sony used some rounded value like e=2.7 rather than e=2.718281828... or the overall formula isn't correct at all.

Any ideas?

PS. Credits to Felix Laepple for pointing me to the basic formula.
Re: Gauss Table Creation
by on (#119454)
Here's a generator I wrote a while back that comes close (about 12 units off at the extremes):
Code:
double e = 2.718281828;
for ( int i = 0; i < 512; i++ ) {
   double x = i / 511.0 * 2.31 - 0.05;
   double y = pow( e, -x * x ) * 1305.64;
   gauss [i] = y - 8.54;
}
Re: Gauss Table Creation
by on (#119476)
Thanks! That looks similar...
Your "i / 511.0 * 2.31" would be equivalent to "curv=(511.0/2.31)^2" (i.e. curv=48934.8), a good bit different from the curv=53240 that I came up with for SNES.
The "-0.05" is something that I didn't have; did you use that to keep the highest point at 1305 despite the -8.54 subtraction?
Big difference is that "i" instead of "siz-i" will reverse the table, ie. table[0]=highest point instead of lowest point.
Re: Gauss Table Creation
by on (#119478)
I opened numpy, graphed the log FFT of the whole thing, and found a huge notch in the frequency response around period 256 (one input sample). It's as if they took their original curve and tweaked it to resolve one special case of whining at the sample frequency. I was able to produce a very similar-looking (and similar-functioning but nowhere near bit-exact) curve by convolving four boxcar functions of length 256, or three 256-boxcars and two 128-boxcars. Next I might see what I can do with products of the bell curve at various scales and various window functions (Hann, Blackman, etc.).
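
The boxcar experiment can be sketched in C as follows. The function names and the buffer handling are illustrative assumptions, not the numpy code that was actually used. Each length-256 boxcar has zero frequency response at period 256, which is consistent with the notch described above.

```c
#include <stddef.h>

/* Convolve `in` (length n >= 1) with a boxcar of `width` ones.
 * `out` must hold n + width - 1 doubles. Returns the output length. */
size_t convolve_boxcar(const double *in, size_t n, size_t width, double *out)
{
    size_t out_len = n + width - 1;
    for (size_t j = 0; j < out_len; j++) {
        /* sum in[k] for every valid k with 0 <= j - k < width */
        size_t lo = (j + 1 > width) ? j + 1 - width : 0;
        size_t hi = (j < n - 1) ? j : n - 1;
        double acc = 0.0;
        for (size_t k = lo; k <= hi; k++)
            acc += in[k];
        out[j] = acc;
    }
    return out_len;
}

/* Build a bell-like curve by convolving `count` boxcars of equal width.
 * `a` and `b` are scratch buffers of at least count*(width-1)+1 doubles;
 * the returned pointer is whichever buffer holds the result. */
double *boxcar_bell(size_t width, int count, double *a, double *b, size_t *len)
{
    size_t n = width;
    for (size_t i = 0; i < width; i++)
        a[i] = 1.0;                     /* the first boxcar */
    for (int c = 1; c < count; c++) {
        n = convolve_boxcar(a, n, width, b);
        double *t = a; a = b; b = t;    /* ping-pong the buffers */
    }
    *len = n;
    return a;
}
```

Calling boxcar_bell(256, 4, ...) yields a symmetric curve of 1021 points; one half of it, after scaling, would be the candidate to compare against the 512-entry table.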

Is one of the goals some way to compress the table for use in an emulator?
Re: Gauss Table Creation
by on (#119500)
No, no goal, just for curiosity. The whole table takes only 1kbyte, the floating point math needed to calculate the table might even eat up more memory.

The table contains a 'smooth' waveform without notches; where did you see notches? Or did you go a step further and analyze the interpolation filtering characteristics... or whatever?

Tried the SNES table with my formula and curv=53240; that gave me errors around +/-16, a bit worse than blargg's results. Then I tried curv=53240-3000+i*6000/512; that dropped the error to around +/-8, which is even a bit better than blargg's results. Maybe this is the right direction and gives perfect results when fine-tuning the "-3000+i*6000" values.
Re: Gauss Table Creation
by on (#119502)
nocash wrote:
The table contains a 'smooth' waveform without notches; where did you see notches? Or did you go a step further and analyze the interpolation filtering characteristics... or whatever?

Yes, I took the Fourier transform to analyze its filtering characteristics. By "notch", I meant near-zero response at a particular frequency.
Re: Gauss Table Creation
by on (#119503)
Another try, curv=46440+i*24 (and offset=5, siz=511, volume=1305.5+offset), gives error in range -0..+6, yet a good bit closer.

tepples wrote:
By "notch", I meant near-zero response at a particular frequency.

Yikes, frequency responses are sounding difficult.
With my plus/minus/xor integer math skills, even multiplications and exponents are already looking horribly complicated to me :-)
Re: Gauss Table Creation
by on (#119504)
Tried to calculate "curv" values for three points...
i=128, table[128]=029h=41, curv=-383^2 / ln((41+5)/1310.5) = 43794
i=256, table[256]=176h=374, curv=-255^2 / ln((474+5)/1310.5) = 64607
i=384, table[384]=3C9h=969, curv=-127^2 / ln((969+5)/1310.5) = 54351

Hmmmm... not the expected result. I was expecting that curv would increase along with i.
But of those three values, it's reaching the highest point at i=256.
NB. at i=256, the "ln(...)" result is close to -1.00.


EDIT: Oops, typo, used 474 instead of 374. Correct should be:
i=128, table[128]=029h=41, curv=-383^2 / ln((41+5)/1310.5) = 43794
i=256, table[256]=176h=374, curv=-255^2 / ln((374+5)/1310.5) = 52412
i=384, table[384]=3C9h=969, curv=-127^2 / ln((969+5)/1310.5) = 54351

So curv does increase along with i, but not linearly.
Or maybe curv is constant and I am just trying to workaround a mistake elsewhere in the formula.
Re: Gauss Table Creation
by on (#119522)
Just noting, in case you weren't aware ... but the S-DSP stores the gaussian table in a 512x12-bit (6144-bit) mask ROM table.

Unfortunately we haven't been able to extract the table yet, the die scans needed another layer removed to see it and this hasn't been done.

It is possible, though it'd be extreme and unlikely, that they hand-tweaked entries in this table.
Re: Gauss Table Creation
by on (#119791)
Okay, I've written a utility that can display the graphs, calculate graphs using FPU opcodes, and display differences between original and calculated graphs... (source code for Borland TASM attached as Gauss.zip).

First off, here are the original SNES and PSX tables, shown at their 12bit/16bit ranges (max=800h/8000h for SNES/PSX), and scaled to max=519h/59B3h (i.e. the highest points of the SNES/PSX graphs).
The sum of entries gauss[000h+i]+gauss[0FFh-i]+gauss[100h+i]+gauss[1FFh-i] is approximately 800h/8000h for SNES/PSX; the difference is that the PSX graph has a steeper+higher peak level, but a less steep+high bottom.
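
As a small illustration of that sum in C (the function name is hypothetical; the indexing is the one given above):

```c
/* Sum of the four table entries that belong to one fractional phase i
 * (i = 0..127 covers every group exactly once). */
int phase_sum(const int table[512], int i)
{
    return table[0x000 + i] + table[0x0FF - i]
         + table[0x100 + i] + table[0x1FF - i];
}
```

For the SNES table listed later in this thread, phase 0 gives 0 + 370 + 374 + 1305 = 801h, one above the nominal 800h.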
Attachment: gauss1.gif


Next, here are some attempts to compute the SNES table in software.
The upper picture shows what happens when using bigger/smaller "curv" values. The bold lines are the actual graphs, the thin lines are showing the difference between original (red graph) and computed graphs at higher resolution - ideally this should be a straight horizontal line (=no difference).
The lower picture shows some more attempts:
curv=55968 is quite fine on the left side, but goes wrong at the right side
curv=58700+x*24 is quite fine, but still has some up/down error
Attachment: gauss2.gif
Re: Gauss Table Creation
by on (#119792)
This is computing the required "curv[i]" values... the jitter at the left/right sides is due to the low resolution of the 12bit table entries... but even without that glitch, the result doesn't look too useful :-/
Attachment: gauss3.gif


So, next attempt: replacing the "((x)^2)/curv" idea by "(something)^2", and now computing that "something":
The upper picture is using factor=256, which looks fine. The lower one uses factor=235, which looks even better.
And the graphs with offset=0 (cyan) are finally showing a "constantly" rising waveform, yeah :-)
Attachment: gauss4.gif


And my first attempt to calculate "(something)" in software. Not perfect, but it looks as if it could give perfect results when fine-tuning the 7000 and 532 values.
EDIT: And, the "235" in the reference-graph may also need some fine-tuning.
Attachment: gauss5.gif
Re: Gauss Table Creation
by on (#119806)
Got bored of calculating or guessing numbers, and instead used a dumb brute-force approach for finding better values than 7000, 532, 235. This threw out the values 11580, 551, 244, giving these graphs:
Attachment: gauss6.gif

The difference to the original SNES table is now within -1..+3. The formula is most probably correct, and errors may be due to rounding issues on the final result, or fractional parts of the constants (like maybe 551.4 instead of 551, or 1305.5 instead of 1305, etc).
Re: Gauss Table Creation
by on (#119807)
The remaining waviness may be a post-processing step to make sure all sums of four corresponding values are near $800, so that DC interpolates to DC.
Re: Gauss Table Creation
by on (#119828)
tepples wrote:
The remaining waviness may be a post-processing step to make sure all sums of four corresponding values are near $800, so that DC interpolates to DC.

Theoretically yes, but the formula spits out sums near 800h anyway (don't know how and why, but it does). And Sony definitely didn't care about getting exact sums of 800h (causing the nasty hardware glitch when the sum becomes 801h).

I've changed the brute-force stuff a bit, allowing it to span bigger range at better resolutions (with steps smaller than 1.0). The problem is that some of the FPU opcodes are quite slow, computing a few million graphs with 512 points each can take up a whole minute, or even several hours when using slightly bigger ranges for the separate constants.

Some nice constants are 16185, 580.0, 255.0, 1305.0. Used like this:
n = (x + 16185/(580-x) - 16185/580) / 255
table[x] = (e^(-(n^2))) * 1305

Results are very close to the original SNES table (with errors in the range -1..+1).
Though there are various other constants that give similar (or possibly even better) results, so it's hard to tell which values are best.

I think some of the remaining errors could be blamed to rounding errors. One thing that is definitely wrong is that my tool spits out table[0]=(e^0)*1305=1304. And Sony's original program may have similar rounding errors, which would make it difficult to get the same results without knowing the original FPU rounding mode and FPU resolution.

Oh, and I've replaced "e=2.718281828" by "e=(1.0)*(2^log.2(e))" (using fld1,fldl2e,fscale opcodes, which is hopefully more accurate).

EDIT: Just changed the resolution of FPU memory operands from 64bit (qword) to the full 80bit (tbyte) resolution - that has fixed the "(e^0)<1" error.

EDIT: I was somehow thinking that table[511] might be rounded down to 1305. But actually, it might be rounded up to 1305, with constants like 16151.9, 580.1, 255.0, 1304.5. Results possibly look a bit better that way. Only, again the FPU is giving me table[0]=(e^0)*1304.5=1304, and the same for (e^0)*1304.6; despite round-to-nearest mode, the damn thing just isn't rounding as desired.
Re: Gauss Table Creation
by on (#119837)
Tried to do the same for the PSX table - should have been doing that anyways since the 16bit entries are having 16x better precision than the 12bit SNES values.
Attachment: gauss7.gif

Hmmmm, the error (the thin line at the bottom of the image) ranging from +25..-10 doesn't look good. I got a similar error for the SNES (but barely visible, ranging only from +1..-1 due to the lower resolution).
Anyways, I am afraid that +25..-10 can't be blamed on rounding errors, so there's probably still something wrong/missing in the overall formula :-/
EDIT: As tepples mentioned the entries should sum up properly (to 7F7Fh..7F81h for the PSX), which won't work with the above errors. Either Sony has applied post processing to get the 7F7Fh..7F81h range, or they used a better formula that didn't need post processing...
Re: Gauss Table Creation
by on (#121231)
I've just got back to the basic table[x]=(e^(-(x^2)/curv))*volume formula, and tested what happens when using values other than e=2.718281828, like this:
curv=36000, volume=1305, e=2.000
curv=55000, volume=1305, e=2.718281828 (euler's number)
curv=72000, volume=1305, e=4.000

The curv values are needed to match the result close to the middle table entry. And I was expecting to see huge differences on other tables entries. But, no, the results are identical in all three cases. Surprise. Magic. I've no clue how that is possible (yes my maths are very poor). At least, I've now figured out that it doesn't seem to matter if one is using 2.718281828 or other values (except, supposedly, e=1.000 couldn't work).
Please excuse my amateurish attempts. Are there any maths gurus out there who could jump in?
Re: Gauss Table Creation
by on (#121232)
A number raised to the power of a product of two factors is the number raised to the first factor, then raised to the other factor.

a^(b * c) = (a^b)^c

This means that the following identities hold:

e^(y / curvA) = (e^(1 / curvA))^y
2^(y / curvB) = (2^(1 / curvB))^y
4^(y / curvC) = (4^(1 / curvC))^y

What you're seeing is that (2^(1 / curvB)) = (4^(1 / curvC)) because 2^2 = 4^1, and (e^(1 / curvA)) isn't too different.

2.000^(1/36000) = 1.000019254
2.718281828^(1/55000) = 1.000018182
4.000^(1/72000) = 1.000019254
Re: Gauss Table Creation
by on (#121233)
Thanks! I am really not familiar with that stuff (though I suspect that I might have learned it at school ages ago). Some more working e:curv pairs:
curv=-36000, volume=1305, e=0.500
curv=36000, volume=1305, e=2.000
curv=55000, volume=1305, e=2.718281828 (euler's number)
curv=72000, volume=1305, e=4.000
curv=108000, volume=1305, e=8.000
curv=1, volume=1305, e=1.00001925

The latter one with curv=1 and e=1.00001925 means that one could simplify the formula to table[x]=(e^(-(x^2)))*volume.
The only problem is that the formula is wrong either way. The middle table entry at x=256 is okay, but higher values near x=128 are a bit too small, and the lower values near x=384 a bit too large. Any ideas how to fix that?
Re: Gauss Table Creation
by on (#237776)
This gets fairly close for SNES, but it seems like the window function needs another cosine term or something:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

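/* Build a symmetric windowed-sinc low-pass kernel with num_coeffs taps.
   tb_center sets the cutoff frequency (in cycles per tap); a and b are
   the weights of the two cosine terms of the window. */
static void generate_lp(double* coeffs, unsigned num_coeffs, double tb_center, double a, double b)
{
 double f_s = tb_center;
 int N = num_coeffs / 2;

 for(int i = 0; i < N; i++)
 {
  double k = (0.5 + i);  /* distance from the kernel center, in taps */
  double pik_N = M_PI * 2.0 * k / (num_coeffs - 1);
  double c_k = sin(M_PI * 2.0 * k * f_s) / k;  /* sinc term, not yet normalized */
  /* Cosine-sum window; a = 0.50 and b = 0.08 give the Blackman coefficients 0.42, 0.50, 0.08. */
  double w_k = (1.0 - (a + b)) + a * cos(pik_N) + b * cos(2.0 * pik_N);
  double r = c_k * w_k;

  coeffs[N + i] = r;      /* right half */
  coeffs[N - i - 1] = r;  /* mirrored left half */
 }
}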

static void normalize(double* coeffs, unsigned num_coeffs, double v)
{
 double sum = 0;
 for(unsigned i = 0; i < num_coeffs; i++)
  sum += coeffs[i];

 double multiplier = v / sum;

 for(unsigned i = 0; i < num_coeffs; i++)
  coeffs[i] *= multiplier;
}

static const int apu_halfimp[512] =
{
    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    2,    2,    2,    2,    2,
    2,    2,    3,    3,    3,    3,    3,    4,    4,    4,    4,    4,    5,    5,    5,    5,
    6,    6,    6,    6,    7,    7,    7,    8,    8,    8,    9,    9,    9,   10,   10,   10,
   11,   11,   11,   12,   12,   13,   13,   14,   14,   15,   15,   15,   16,   16,   17,   17,
   18,   19,   19,   20,   20,   21,   21,   22,   23,   23,   24,   24,   25,   26,   27,   27,
   28,   29,   29,   30,   31,   32,   32,   33,   34,   35,   36,   36,   37,   38,   39,   40,
   41,   42,   43,   44,   45,   46,   47,   48,   49,   50,   51,   52,   53,   54,   55,   56,
   58,   59,   60,   61,   62,   64,   65,   66,   67,   69,   70,   71,   73,   74,   76,   77,
   78,   80,   81,   83,   84,   86,   87,   89,   90,   92,   94,   95,   97,   99,  100,  102,
  104,  106,  107,  109,  111,  113,  115,  117,  118,  120,  122,  124,  126,  128,  130,  132,
  134,  137,  139,  141,  143,  145,  147,  150,  152,  154,  156,  159,  161,  163,  166,  168,
  171,  173,  175,  178,  180,  183,  186,  188,  191,  193,  196,  199,  201,  204,  207,  210,
  212,  215,  218,  221,  224,  227,  230,  233,  236,  239,  242,  245,  248,  251,  254,  257,
  260,  263,  267,  270,  273,  276,  280,  283,  286,  290,  293,  297,  300,  304,  307,  311,
  314,  318,  321,  325,  328,  332,  336,  339,  343,  347,  351,  354,  358,  362,  366,  370,
  374,  378,  381,  385,  389,  393,  397,  401,  405,  410,  414,  418,  422,  426,  430,  434,
  439,  443,  447,  451,  456,  460,  464,  469,  473,  477,  482,  486,  491,  495,  499,  504,
  508,  513,  517,  522,  527,  531,  536,  540,  545,  550,  554,  559,  563,  568,  573,  577,
  582,  587,  592,  596,  601,  606,  611,  615,  620,  625,  630,  635,  640,  644,  649,  654,
  659,  664,  669,  674,  678,  683,  688,  693,  698,  703,  708,  713,  718,  723,  728,  732,
  737,  742,  747,  752,  757,  762,  767,  772,  777,  782,  787,  792,  797,  802,  806,  811,
  816,  821,  826,  831,  836,  841,  846,  851,  855,  860,  865,  870,  875,  880,  884,  889,
  894,  899,  904,  908,  913,  918,  923,  927,  932,  937,  941,  946,  951,  955,  960,  965,
  969,  974,  978,  983,  988,  992,  997, 1001, 1005, 1010, 1014, 1019, 1023, 1027, 1032, 1036,
 1040, 1045, 1049, 1053, 1057, 1061, 1066, 1070, 1074, 1078, 1082, 1086, 1090, 1094, 1098, 1102,
 1106, 1109, 1113, 1117, 1121, 1125, 1128, 1132, 1136, 1139, 1143, 1146, 1150, 1153, 1157, 1160,
 1164, 1167, 1170, 1174, 1177, 1180, 1183, 1186, 1190, 1193, 1196, 1199, 1202, 1205, 1207, 1210,
 1213, 1216, 1219, 1221, 1224, 1227, 1229, 1232, 1234, 1237, 1239, 1241, 1244, 1246, 1248, 1251,
 1253, 1255, 1257, 1259, 1261, 1263, 1265, 1267, 1269, 1270, 1272, 1274, 1275, 1277, 1279, 1280,
 1282, 1283, 1284, 1286, 1287, 1288, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1297, 1298,
 1299, 1300, 1300, 1301, 1302, 1302, 1303, 1303, 1303, 1304, 1304, 1304, 1304, 1304, 1305, 1305
};

int main(int argc, char* argv[])
{
 double coeffs[1024];

 generate_lp(coeffs, 1024, 0.16 / 256.0, 0.50, 0.08);
 normalize(coeffs, 1024, 2048 * 256.0);

 for(unsigned i = 0; i < 512; i++)
 {
  int c = floor(0.5 + coeffs[i]);

  printf("0x%03x: APU=%d Calced=%d (%f)", i, apu_halfimp[i], c, coeffs[i]);

  if(apu_halfimp[i] != c)
   printf("  MISMATCH (%f)", 0.5 - (coeffs[i] - floor(coeffs[i])));

  printf("\n");
 }

 return 0;
}
Re: Gauss Table Creation
by on (#237788)
Had an idea that fixed it so it generates a 100% match, new version attached.
Re: Gauss Table Creation
by on (#237789)
Cool. With the sine and cosine it's very different from the exponential stuff that I had been trying. And your constants with not more than 2 fractional digits are looking quite simple. Or they do even look like percent values, originally specified as plain integers without any fractional digits. Are you going to try the same formula for the PSX gauss table, too?
Re: Gauss Table Creation
by on (#237826)
I can get close with PS1, but still a lot of off-by-one errors.

The number of -1 values is difficult to reconcile. I'm wondering if I dumped/logged the SPU table incorrectly (from incorrect assumptions about the SPU's interpolation math?), in such a way that wouldn't really have much of an audible effect, but would complicate trying to recreate the original generation algorithm.
Re: Gauss Table Creation
by on (#237836)
I have these values in the table at http://problemkaputt.de/psx-spx.htm#spuadpcmpitch
I guess the main difference is using values other than 0.16 and 0.08 for PSX?
And for the larger fractional part... replace 800h/sum by 7f80h/sum?

EDIT: Or did you mean the problem is getting -1 at all, because the formula normally outputs 0 as smallest value?
That might be a rounding issue, subtract 0.5 from the result?

EDIT: Gave it a try, too. Replacing 800h by 7F80h, and 0.16 by 0.256, gets relatively close for PSX.
But it's still off by +/-2 or so, so you might already have better results. Which values are you using currently?
Re: Gauss Table Creation
by on (#237880)
nocash wrote:
EDIT: Gave it a try, too. Replacing 800h by 7F80h, and 0.16 by 0.256, gets relatively close for PSX.
But it's still off by +/-2 or so, so you might already have better results. Which values are you using currently?


0.256 as well, with about the same results, even after brute-force checking different rounding modes and scaling values.
Re: Gauss Table Creation
by on (#237881)
Does the PlayStation's table have any defects where the four values for one fractional phase (table[x], table[256 + x], table[511 - x], and table[255 - x]) add up to other than 100%? I know the Super NES table has a few such values. If there aren't such values in the PlayStation's table, but some values there are +/- 1, they might be where particular sets of 4 have been corrected in a post-processing step.
Re: Gauss Table Creation
by on (#237882)
Mednafen wrote:
Had an idea that fixed it so it generates a 100% match, new version attached.


That's incredible! Really awesome work, thank you for sharing~ ^-^
Re: Gauss Table Creation
by on (#237885)
tepples wrote:
Does the PlayStation's table have any defects where the four values for one fractional phase (table[x], table[256 + x], table[511 - x], and table[255 - x]) add up to other than 100%? I know the Super NES table has a few such values. If there aren't such values in the PlayStation's table, but some values there are +/- 1, they might be where particular sets of 4 have been corrected in a post-processing step.

Yes and no. The inaccuracy in sum +/-1 is there, but it won't cause overflows on PSX.
On SNES the sum of four values is 800h+/-1 (that's bad because it may exceed 800h)
On PSX the sum of four values is 7F80h+/-1 (that's better because it won't exceed 8000h)

I've re-arranged the code from Mednafen a bit to make it easier to see which immediates are used in which place:
Code:
  for i=0 to 511
    k = (0.5 + i)
    s = (sin(PI * k * 1.280 / 1024)  )          ;for PSX: Use 2.048 instead of 1.280
    t = (cos(PI * k * 2.000 / 1023)-1)*0.50
    u = (cos(PI * k * 4.000 / 1023)-1)*0.08
    table[511-i] = s * (t + u + 1.0) / k
  next i
The output from that formula does still need some scaling: For SNES, multiply all values by ~511C0h, and for PSX by ~37C500h. And then round the results to nearest integer. That is giving almost perfect values for SNES and PSX (with +/-1 error).
For SNES that +/-1 error can be fixed by computing the perfect scaling factor for each group of four values. But that doesn't help on PSX.

In the above formula, the /1024 versus /1023 looks a bit odd. But trying to use only /1024 (or /1023 or /1023.5) is making things worse.
Maybe PSX needs another cosine, multiplied by some small value like 0.0000x or so? That might add some fractional bits that aren't needed on SNES.
Re: Gauss Table Creation
by on (#237891)
I've got exact matches for PSX! The trick is do the scaling & error adjustment in separate steps:
  • First, calc sum of ALL table entries, and scale all entries accordingly (alike the normalize function in your old code version).
  • Next, calc sum of each FOUR table entries (alike your newer code), but don't use that as scaling value, instead compute difference=sum-7F80h, and then fix the four table entries by subtracting difference/4.
To some level one can even "see" that it's supposed to be done that way (the unadjusted graph has "too large" values at begin+middle+end, so those large values must be fixed by subtraction (multiplication would affect only the bigger values, and have little effect on near-zero values)).
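
The two steps could look like this in C. This is a sketch only: the function names are made up, the overall target of 128 times the group sum is an assumption, and the post does not say whether rounding to integers happens before or after the subtraction (here the table stays in floating point throughout).

```c
/* Step 1: scale so that all 512 entries sum to `total`
 * (assumed here to be 128 * 7F80h for PSX). */
void scale_to_total(double table[512], double total)
{
    double sum = 0.0;
    for (int i = 0; i < 512; i++)
        sum += table[i];
    double factor = total / sum;
    for (int i = 0; i < 512; i++)
        table[i] *= factor;
}

/* Step 2: shift each group of four entries by a quarter of its error,
 * so that every group sums to `target` (7F80h for PSX). */
void fix_group_sums(double table[512], double target)
{
    for (int i = 0; i < 128; i++) {
        double sum = table[i] + table[255 - i] + table[256 + i] + table[511 - i];
        double adjust = (sum - target) / 4.0;
        table[i]       -= adjust;
        table[255 - i] -= adjust;
        table[256 + i] -= adjust;
        table[511 - i] -= adjust;
    }
}
```

Because the 128 groups do not share entries, the second step leaves every group summing to the target regardless of the order in which the groups are processed.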

However, that's working for PSX only. Doing the same on SNES is messing up a few values. Either one shouldn't do that on SNES... or maybe one could do so when adjusting some constants for the SNES.
Re: Gauss Table Creation
by on (#237906)
What I don't understand about the formula,
Code:
  for i=0 to 511
    k = (0.5 + i)
    s = (sin(PI * k * 1.280 / 1024)  )          ;for PSX: Use 2.048 instead of 1.280
    t = (cos(PI * k * 2.000 / 1023)-1)*0.50
    u = (cos(PI * k * 4.000 / 1023)-1)*0.08
    table[511-i] = s * (t + u + 1.0) / k
  next i
how and why does that magically result in,
Code:
  table[0+i] + table[255-i] + table[256+i] + table[511-i] = constant     ;with "i=0..127", and "constant" being same in each case
is there some maths rule that could explain that effect?

Currently, the groups of four values don't sum up to the exact same constant. But it's almost perfect, and when understanding why it is so close, then one could perhaps improve the formula to get totally perfect results (and no longer needing the extra steps for fixing the errors in sum of four values).

There are some webpages explaining what happens when adding or multiplying sine and cosine values, maybe that could explain how the formula works. The extra difficulty is the "divide by k" step.

Hmmmm, or is the formula just working because it was supposed to match Sony's gauss tables? I wouldn't be surprised if changing a few parameters in the formula could be used to reproduce the shape of a chicken's egg - which wouldn't mean that Sony (or the chicken) had actually used that formula for creating eggs.
Re: Gauss Table Creation
by on (#237911)
nocash wrote:
Hmmmm, or is the formula just working because it was supposed to match Sony's gauss tables? I wouldn't be surprised if changing a few parameters in the formula could be used to reproduce the shape of a chicken's egg - which wouldn't mean that Sony (or the chicken) had actually used that formula for creating eggs.


Indeed. The importance of algorithms over constant tables isn't just to remove a few lines from a source code file, it's to show you what the constants mean. That's why the NES color generation formula is worth its weight in gold. There's an infinite number of algorithms to produce any table, and we'll never know which one is correct here unless Sony tells us.

But by finally having a bit-perfect algorithm, it's a solid base. Now we can try and understand why values are what they are, and what significance they have. It's a critical starting point. I'm excited to see what happens from here.