Pointers And Arrays (intermediate)


This item was added on: 2003/11/26

One of the first things a new student learns when studying C and C++ is that pointers and arrays are equivalent. This couldn't be further from the truth, and it is often a source of bugs even for experienced programmers who were never taught otherwise. If you want proof, here is a simple test using header files. Define an array in a header file named list.h, such as int a[32];. Now create a source file that includes the header and references a, but declares it as an extern pointer. If arrays and pointers were interchangeable, this would work just peachy:


#include "list.h" 

extern int  *a;

int main(void)
{
  a[0] = 5;

  return(0);
}


Unfortunately, your compiler complained, probably with a diagnostic along the lines of a type mismatch or a conflicting redeclaration. Now, if you change the extern declaration to extern int a[]; it works just fine. Are you suspicious about the relationship of pointers and arrays now? The problem is that most C and C++ texts and teachers aren't familiar enough with the details of when pointers are and are not equivalent to arrays, so they don't discuss the issues in detail. You would get the same kind of diagnostic with int a; in the header file and extern double a; in the source file: as far as the compiler is concerned, an array of int and a pointer to int are simply two different types. This tutorial will describe exactly when a pointer is equivalent to an array. Yay!

The first big difference is how pointers are accessed in memory. It may surprise you that pointer accesses are quite different from array accesses. Take the following code for example:


#include <stdio.h> 
#include <stdlib.h> 

int main(void)
{
  int a[1];
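  int *b = malloc(1 * sizeof *b);
  if (b == NULL) return EXIT_FAILURE;  /* malloc can fail */
  a[0] = 1;
  b[0] = 1;

  free(b);  /* release the allocation before exiting */
  return(0);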
}


If you take the time to look at the assembler output for the lines that assign 1 to a[0] and b[0] then you might be surprised to see something like this:


  9:     a[0] = 1;
 0040102E   mov         dword ptr [ebp-4],1

 10:    b[0] = 1;
 00401025   mov         eax,dword ptr [ebp-8]
 00401028   mov         dword ptr [eax],1


Sparing you the gory details of assembler language, the array access simply moves directly to the location in memory referenced by a[0] and places the value 1 there. There are no other significant operations. The pointer access, however, is more complicated. The pointer itself is read from memory first; its contents are the address of the memory being pointed to. Next, the value of the subscript, 0 in this case, is added to the address just acquired from the pointer (with a subscript of 0 the compiler needs no explicit add instruction, which is why none appears in the listing). Only then is the memory referenced by b[0] accessed. The obvious conclusion is that arrays and pointers are not equivalent in this case.

I can hear you yelling, "So when are they the same?". Put simply, it depends on how the array name is used:


 Rule 1:
  An array name in an expression (in contrast to a declaration) is 
  treated by the compiler as a pointer to the first element of the array.

 Rule 2:
  A subscript is always equivalent to an offset from a pointer.

 Rule 3:
  An array name in the declaration of a function parameter is 
  treated by the compiler as a pointer to the first element of the array.


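Generally, whenever the name of an array appears in an expression, it is converted to a pointer to the first element of the array. The notable exceptions are when the name is the operand of sizeof or of the unary & operator, where it still stands for the whole array. This makes perfect sense, but can cause a few gotchas if you aren't careful.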

Rule 1 means that an array can always be accessed using either pointer or array notation: the expressions a[i] and *( a + i ) are equivalent, since the array name is treated as a pointer to the first element. In reality, the compiler converts any use of array notation to pointer notation anyway. Because of this you can use array notation with pointers, and you can create some interesting constructs: i[a] is legal and equivalent to a[i], though you should never exploit this fact.

Rule 2 restates what I explained above: array notation is always treated as an offset from a pointer. Because of this, the common belief that pointers are faster than arrays for the same operations is usually incorrect.

Rule 3 states that as the formal parameter to a function, an array is always converted to a pointer to the first element. This is good for efficiency and other interesting tricks such as passing a slice of an array to a function, but it gets hairy when you want to take the size of the array. Since the parameter is converted to a pointer, this will not work as you want:


#include <stdio.h> 

void func(char a[])
{
  printf("%zu\n", sizeof a);  /* %zu is the C99 conversion for size_t */
}

int main(void)
{
  char  a[] = "12345";

  func(a);

  return(0);
}


What does this print? If you said 6, then you should reread the previous paragraph. If you said the size of a pointer, then you get a cookie because you're correct. Since a is converted to a pointer when you pass it to func, all size information is lost to func, so sizeof can only give you the size of the pointer to the first element. This causes some problems, but it can usually be fixed by passing an extra argument containing the size of the array:


#include <stdio.h>  

void func(char a[], size_t s)
{
  printf("%zu\n", s);  /* %zu is the C99 conversion for size_t */
}

int main(void)
{
  char  a[] = "12345";

  func(a, sizeof a);

  return(0);
}

Now the output is 6. Note that I used size_t as the type of s, since that is the type of the result of the sizeof operator. To avoid type mismatch problems, it is a good idea to always keep things matching.

The crash course in pointer and array relationships is now over. Consider yourself a better programmer with this knowledge: you can now avoid problems related to pointers and arrays, or at least you have the knowledge needed to fix them when they show up. Don't stop here though. Pointers and arrays are an interesting subject, and you should learn as much as you can about them since they are used so much in C and C++.

Have fun!
