lidnariq wrote:
Honestly, even using 32 KiW isn't worthwhile. The lookup table approach is only useful on architectures where floating point or division on integers is really expensive, i.e. "before the pentium".
Let's put that claim to the test. Below is a Java application that compares 32-bit floating-point operations to lookup tables:
Code:
public final class BenchmarkApuMixers {

  // Lookup tables indexed by the combined channel levels.
  private static final float[] pulseTable = new float[31];
  private static final float[] tndTable = new float[203];

  static {
    // At i == 0 the float division yields Infinity, so entry 0 comes out as 0.
    for(int i = pulseTable.length - 1; i >= 0; i--) {
      pulseTable[i] = 95.52f / (8128f / i + 100f);
    }
    for(int i = tndTable.length - 1; i >= 0; i--) {
      tndTable[i] = 163.67f / (24329f / i + 100f);
    }
  }

  // Mixes the five channels with floating-point arithmetic only.
  public static float noTableTest(final float pulse1, final float pulse2,
      final float triangle, final float noise, final float dmc) {
    return 95.88f / (8128f / (pulse1 + pulse2) + 100f) + 159.79f
        / ((1f / (triangle / 8227f + noise / 12241f + dmc / 22638f)) + 100f);
  }

  // Mixes the five channels with two table lookups.
  public static float lookupTableTest(final int pulse1, final int pulse2,
      final int triangle, final int noise, final int dmc) {
    return pulseTable[pulse1 + pulse2] + tndTable[3 * triangle + (noise << 1)
        + dmc];
  }

  public static void main(final String... args) throws Throwable {

    // Pass 1: floating-point mixer. x cycles through 0..15, y through 0..127.
    float result = 0;
    int x = 0;
    int y = 0;
    long startTime = System.nanoTime();
    // Counts down exactly 200,000,000 times, matching the divisor below.
    for(int i = 200_000_000; i > 0; i--) {
      if (++x == 16) {
        x = 0;
      }
      if (++y == 128) {
        y = 0;
      }
      result += noTableTest(x, x, x, x, y);
    }
    System.out.format("floating-point: %f nanos/iteration%n",
        (System.nanoTime() - startTime) / 200_000_000.0);

    // Pass 2: lookup-table mixer over the same input pattern.
    result = 0;
    x = 0;
    y = 0;
    startTime = System.nanoTime();
    for(int i = 200_000_000; i > 0; i--) {
      if (++x == 16) {
        x = 0;
      }
      if (++y == 128) {
        y = 0;
      }
      result += lookupTableTest(x, x, x, x, y);
    }
    System.out.format("lookup tables: %f nanos/iteration%n",
        (System.nanoTime() - startTime) / 200_000_000.0);

    // Print the sum so the second loop's work is actually used.
    System.out.println(result);
  }
}
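For reference, here is what the two methods compute, read straight off the code above. The symbols are just my shorthand, not anything official: p1 and p2 are the pulse levels, t, n and d are the triangle, noise and DMC levels, and i and j are the table indices.

```latex
% noTableTest: direct floating-point evaluation
\[
\text{pulse} = \frac{95.88}{\dfrac{8128}{p_1 + p_2} + 100},
\qquad
\text{tnd} = \frac{159.79}{\dfrac{1}{\dfrac{t}{8227} + \dfrac{n}{12241} + \dfrac{d}{22638}} + 100}
\]

% lookupTableTest: precomputed table entries
\[
\text{pulseTable}[i] = \frac{95.52}{\dfrac{8128}{i} + 100},
\quad i = p_1 + p_2,
\qquad
\text{tndTable}[j] = \frac{163.67}{\dfrac{24329}{j} + 100},
\quad j = 3t + 2n + d
\]
```

Note that the constants differ between the two (95.88 vs. 95.52 and 159.79 vs. 163.67), so the table version approximates the floating-point version rather than reproducing it exactly. Both should still do comparable amounts of work per call, which is all the timing comparison needs.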
The results:
floating-point: 30.239289 nanos/iteration
lookup tables: 2.060711 nanos/iteration
That can't be right. The iterations are way too brief: on this 3 GHz machine, 2.06 ns works out to only about 6 clock cycles per lookup-table iteration.
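One thing I could try is to take the JIT out of the picture as much as possible: warm both methods up before timing, repeat each measurement a few times, and store every result somewhere the compiler can't ignore. Below is a minimal sketch of that idea, not a definitive harness. The class name `MixerBenchSketch`, the `Mixer` interface, the `sink` field and the round counts are all made up for illustration, and I'm assuming that routing both mixers through one shared loop doesn't itself distort the comparison.

```java
public final class MixerBenchSketch {

    // Hypothetical helper interface so both mixers share one timing loop.
    public interface Mixer {
        float mix(int pulse1, int pulse2, int triangle, int noise, int dmc);
    }

    private static final float[] PULSE_TABLE = new float[31];
    private static final float[] TND_TABLE = new float[203];

    static {
        // Same table formulas as the original; index 0 evaluates to 0.
        for (int i = 0; i < PULSE_TABLE.length; i++) {
            PULSE_TABLE[i] = 95.52f / (8128f / i + 100f);
        }
        for (int i = 0; i < TND_TABLE.length; i++) {
            TND_TABLE[i] = 163.67f / (24329f / i + 100f);
        }
    }

    public static float noTable(int pulse1, int pulse2, int triangle, int noise, int dmc) {
        return 95.88f / (8128f / (pulse1 + pulse2) + 100f)
                + 159.79f / (1f / (triangle / 8227f + noise / 12241f + dmc / 22638f) + 100f);
    }

    public static float lookupTable(int pulse1, int pulse2, int triangle, int noise, int dmc) {
        return PULSE_TABLE[pulse1 + pulse2] + TND_TABLE[3 * triangle + (noise << 1) + dmc];
    }

    // Volatile sink: every checksum is stored here so the work stays observable.
    public static volatile float sink;

    // Feeds the mixer the same x/y pattern as the original loops; returns a checksum.
    public static float run(Mixer mixer, int iterations) {
        float sum = 0;
        int x = 0;
        int y = 0;
        for (int i = 0; i < iterations; i++) {
            if (++x == 16) {
                x = 0;
            }
            if (++y == 128) {
                y = 0;
            }
            sum += mixer.mix(x, x, x, x, y);
        }
        return sum;
    }

    // Times several rounds and reports the fastest one, in nanoseconds per iteration.
    public static double bestNanosPerIteration(Mixer mixer, int iterations, int rounds) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink = run(mixer, iterations);
            best = Math.min(best, System.nanoTime() - start);
        }
        return (double) best / iterations;
    }

    public static void main(String[] args) {
        Mixer floatMixer = MixerBenchSketch::noTable;
        Mixer tableMixer = MixerBenchSketch::lookupTable;

        // Warm-up rounds (counts are arbitrary guesses); timings are thrown away.
        bestNanosPerIteration(floatMixer, 1_000_000, 5);
        bestNanosPerIteration(tableMixer, 1_000_000, 5);

        System.out.format("floating-point: %f nanos/iteration%n",
                bestNanosPerIteration(floatMixer, 200_000_000, 5));
        System.out.format("lookup tables: %f nanos/iteration%n",
                bestNanosPerIteration(tableMixer, 200_000_000, 5));
    }
}
```

I haven't run this yet, so I have no numbers from it to report. A dedicated harness such as JMH would probably be the more rigorous route, since hand-rolled loops like this one can still be reshaped by the JIT in ways I can't see.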
Any suggestions for better ways to benchmark this?