Drew Sebastino wrote:
lidnariq wrote:
Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)
Well, it's not like it's never been "meaningfully slow", but it'll always be slower than integer addition, I'd imagine.
How do you make a 1 cycle instruction faster than another 1 cycle instruction?
Drew Sebastino wrote:
pubby wrote:
It takes more time to read from memory nowadays than it does to multiply two numbers together.
Even if in cache? (Depends on what level of cache, but you get the point)
Yes.
Drew Sebastino wrote:
rainwarrior wrote:
If you want to know how fast or slow they are, time it.
With what, a stopwatch?
That's not an invalid way of timing a test that takes long enough, but of course you should probably use something designed for the purpose of timing software.
Drew Sebastino wrote:
rainwarrior wrote:
If you've got something that's conceptually a 2D grid, what's your alternative way of dealing with it? You need to turn 2 coordinates into an address. How else do you propose to do that?
The starting offset will be generated the same way no matter what, which is why I said it wouldn't be slower for a single access. But say you wanted to increment all the data in the fifth (index 4) row of a 20x20 array; this:
Code:
i = 4 * 20;
j = i + 20;
for (; i < j; i++)
    array[i]++;
should be faster than this:
Code:
j = 4;
for (i = 0; i < 20; i++)
    array[j][i]++;
Edit: I think ap9 was talking about the same thing?
No, in that case it won't be faster at all. ap9 was saying that in that example iterating over j will be slower than iterating over i. Iterating directly on the row won't give any meaningful improvement. For a loop like that the compiler would definitely know that j is a loop invariant here, and will operate accordingly. It would generally be capable of doing this even if you put j = 4 inside the loop. (Identifying loop invariants is something compilers usually do very well.) ...but even if it didn't, it might not matter for performance due to latency/pipelines, and if it did there are certainly lower-impact ways to fix the problem than having to change your data structure everywhere else it's used.
Drew Sebastino wrote:
The obvious thing would be just don't use a multidimensional array for this then. Except, I think there are many people who are not aware that the above code would be faster...
...except it isn't faster. There's a reason I suggest that you time these things. What seems "obvious" to you is factually wrong. You won't be able to correct your intuition about this without finding some way to actually verify what you believe.
Also, talking about shifts vs. multiplies: even though the instruction speed is probably no different, there are often still advantages to keeping rows of your structures a power of 2 in length. Sometimes it helps with caching, sometimes it helps with other optimizations. (Most of these there's no reason to do "by hand".)
Even on older systems where e.g. a multiply by 10 might potentially have been a slower instruction than two shifts and an add, compilers would automatically compile * 10 into those shift and add instructions. Writing * does not specify imul; the compiler is allowed to get that operation done in whatever way it knows how. The point of optimizing compilers is to keep the programmer from having to make brainless transpositions of the code like this and let them write something easier to read.