Drew Sebastino wrote:
lidnariq wrote:
Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)
Well, it's not like it's never been "meaningfully slow", but it'll always be slower than integer addition, I'd imagine.
How do you make a 1 cycle instruction faster than another 1 cycle instruction?
Drew Sebastino wrote:
pubby wrote:
It takes more time to read from memory nowadays than it does to multiply two numbers together.
Even if in cache? (Depends on what level of cache, but you get the point)
Yes.
Drew Sebastino wrote:
rainwarrior wrote:
If you want to know how fast or slow they are, time it.
With what, a stopwatch?
That's not an invalid way of timing a test that takes long enough, but of course you should probably use something designed for the purpose of timing software.
Drew Sebastino wrote:
rainwarrior wrote:
If you've got something that's conceptually a 2D grid, what's your alternative way of dealing with it? You need to turn 2 coordinates into an address. How else do you propose to do that?
The starting offset will be generated the same way no matter what, which is why I said it wouldn't be slower for a single access. But say you wanted to increment all the data in the fifth (index 4) row of a 20x20 array; this:
Code:
i = 4 * 20;
j = i + 20;
for (; i < j; i++)
    array[i]++;
should be faster than this:
Code:
j = 4;
for (i = 0; i < 20; i++)
    array[j][i]++;
Edit: I think ap9 was talking about the same thing?
No, in that case it won't be faster at all. ap9 was saying that in that example iterating over j will be slower than iterating over i. Iterating directly on the row won't give any meaningful improvement. For a loop like that the compiler would definitely know that j is a loop invariant here, and will operate accordingly. It would generally be capable of doing this even if you put j = 4 inside the loop. (Identifying loop invariants is something compilers usually do very well.) ...but even if it didn't, it might not matter for performance due to latency/pipelines, and if it did there are certainly lower-impact ways to fix the problem than having to change your data structure everywhere else it's used.
Drew Sebastino wrote:
The obvious thing would be just don't use a multidimensional array for this then. Except, I think there are many people who are not aware that the above code would be faster...
...except it isn't faster. There's a reason I suggest that you time these things. What seems "obvious" to you is factually wrong. You won't be able to correct your intuition about this without finding some way to actually verify what you believe.
Also, talking about shifts vs. multiplies: even though the instruction speed is probably no different, there are often still advantages to keeping rows of your structures a power of 2 in length. Sometimes it helps with caching, sometimes it helps with other optimizations. (Most of these there's no reason to do "by hand".)
Even on older systems where e.g. a multiply by 10 might potentially have been a slower instruction than two shifts and an add, compilers would automatically compile * 10 into those shift and add instructions. Writing * does not specify imul; the compiler is allowed to get that operation done in whatever way it knows how. The point of optimizing compilers is to keep the programmer from having to make brainless transpositions of the code like this and let them write something easier to read.