Help with malloc(): datum ** and [i][j]

Mon Aug 20 07:22:58 AEST 1984

(Warning:  what follows is a long explanation of various subscripting
phenomena.)

	From: bbanerje at sjuvax.UUCP (B. Banerjee)

	What I would like to do is something like this -

	datum **myarray;	/* datum is typedef'ed */

	myarray = calloc(num_rows * num_cols, sizeof(datum));

	The trouble is, what should I cast calloc to in order that I may
	address myarray using indices?  I thought of doing something like
	this:

	myarray = (datum **) calloc ( ....)

	However, the datum that "myarray[i][j]" would address is not
	clear to me.  The situation is even murkier when using higher
	dimensioned arrays.

(Tap tap tap)  Quiet in the back row!  There, that's better.  Can
everyone hear me?  What?  Is that better?  Ok, good.

Today we're going to talk about multi-dimensional arrays.  We all
understand single dimensional arrays, right?  One simply has a large
area of storage that is accessed by adding an offset (the subscript) to
the start of the memory area and poking around in there.  No problem.

Well, in many compiled languages, like C for example [of course!],
multi-dimensional arrays are really the same thing, only fancier.
Think of it this way:  suppose the array is declared as

	int a[20][50];

Then we can just multiply the two constants together and make a big
block of integers (in this case 1000 of them).  This is our storage
area.  To access one of these, we have two choices:  we can use what's
called ``row major'' order or ``column major order''.

Suppose we're asked to fetch a[6][4].  (Let's assume we have zero-based
arrays, so we go from a[0][0] to a[19][49].)  We can either multiply
the first subscript by the second subscript size and add the second
subscript.  This gives 6 times 50 plus 4 which is 304.  So we add 304
to the storage area for ``a'' and grab an integer from there.  This is
the one called ``row major'' (I think).

The other way we could do it is multiply the 4 times the 50 and add
6; that is, start with the second subscript and multiply it by the
second subscript size, then add the first subscript.  From here we
do the same thing as before.  So add 206 to ``a'' and fetch from there.
If I haven't got them reversed this is ``column major'' order.

Going back to the first, let's take a look at a three dimensional
array.  Given ``int a[4][6][5];'' as the declaration and ``a[2][3][4]''
as the reference, we take 2, multiply by six, add three, multiply the
result by five, and add four, giving 79.

We can generalize the first as follows:

	offset = 0;
	for (i = 0, j = num_dimens - 1; i < num_dimensions; i++, j--) {
		offset = offset + array_index[i];
		offset = offset * array_sub[j];
	}

where num_dimens is the number of dimensions in the array,
array_index[] contains the constants used in the declaration of the
array, array_sub[] contains the values used in the reference, and
offset computes the offset from the base of the storage area.

Comment?  What?  Yes, I know it's not optimized C code.  This is
just an illustration.  Also keep in mind that for a compiler the
subscripts (array_sub[] elements) might well be variables or
expressions, and the offset will have to be computed at RUN time.

Everyone got that?  Too bad, we're moving on anyway.

All those multiplications done at run time can really slow things
down.  Wouldn't it be nice if there were some other way to get to the
various array elements?  Also, wouldn't it be nice not to have to have
60 columns for every row, when only one or two of them really need all
of those?

I see some smiles.  You, you look like you want to say something.

Did everyone hear that?  No?  He's right; he said ``pointers''.  We can
use pointers to weasel out of multiplications.  Not only that but we
can make different ``subscripts'' have different numbers of
``sub-subscripts''.  Of course there's a price.  We have to pay for
this with storage and much head-scratching.

Let's look at a ``two-dimensional'' ``array'' using pointers.
Think of it this way:  instead of having a big rectangular area
of storage like this:

	a[0][0]		a[0][1]		a[0][2]
	a[1][0]		a[1][1]		a[1][2]

we set up one vector like this:

	a[0]
	a[1]

and make each point to another vector.  Now we have something like
this:

	a[0]===>[0] [1] [2]
	a[1]===>[0] [1] [2]

That is, one two-element vector of pointers, and two three-element
storage areas.  We're using more space now, because we have, in
addition to the six storage areas for values, two more for pointers.

But, we can now access ``a'' without multiplies.  If we need to
generate code for ``a[1][0]'', we just add 1 to the vector of pointers
and fetch that, then add 0 to that and fetch from there.  In other
words, follow the pointer for a[1] and take [0] from there.

Also, we can set up the vectors like this instead:

	a[0]===>[0] [1] [2]
	a[1]===>[0] [1] [2] [3] [4] [5] [6]

Since we aren't committed to ``rectangular'' storage areas, and in fact
can put the data in a[0][] ``far away'' from the data in a[1][], we
aren't limited to fixed sizes.  Of course, it gets confusing if a[0]
can access only [0] through [2] but a[1] can access [0] through [6].

We can make ``higher-dimensional'' ``arrays'' the same way.  If ``a''
is supposed to be a five by ten by four array, we can set up five
vectors pointing to ten vectors pointing to storage, like this:

	a[0]===>[0] [1] [2] ... [9]
	a[1]\	 v   v   v       v
	a[2]\\	[0] [0] [0] ... [0]
	a[3]\\\	[1] [1] [1] ... [1]
	a[4] |||[2] [2] [2] ... [2]
	  |  |||[3] [3] [3] ... [3]
	  |  |||
	  |  ||\[0] [1] [2] ... [9]
	  |  ||	 v   v   v       v
	  |  ||	[0] [0] [0] ... [0]
	  |  ||	[1] [1] [1] ... [1]
	  |  ||	[2] [2] [2] ... [2]
	  |  ||	[3] [3] [3] ... [3]
	  |  ||
	  |  |`=>.   .   .  ...  .
	  |  `==>.   .   .  ...  .
	  |
	  `====>[0] [1] [2] ... [9]
		 v   v   v       v
		[0] [0] [0] ... [0]
		[1] [1] [1] ... [1]
		[2] [2] [2] ... [2]
		[3] [3] [3] ... [3]

(Ok, so I'm no artist.)

This time, to access a[2][7][3], we add 2 to the a[] area, follow
that pointer, add 7 to that area, follow that pointer, and add 3
to that area.

Ok, so how to we go about building such a thing, in C?  Suppose
that you want to read two numbers and set up an array of that size.
The array is to hold things of type ``datum'' (a typedef, for what
we don't care).  Here's such a routine:

	datum **a;
	int a_rows, a_cols;

	build_a () {
		int i;

		if (scanf ("%d %d", &a_rows, &a_cols) < 2)
			return (-1);
		a = (datum **) calloc (a_rows, sizeof (datum *));
		if (a == 0)
			return (-1);
		for (i = 0; i < a_rows; i++) {
			a[i] = (datum *) alloc (a_cols, sizeof (datum));
			if (a[i] == 0)
				return (-1);
		}
		return (0);
	}

(Note the tests to make sure that ``calloc'' worked.)

Rewriting this for three, four, or more dimensions is left as an
exercise to the student.  (In other words, I sure don't want to
work that hard.)

Ok, that's all for today.  Quiz Thursday.  Good night.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at maryland