Array bounds checking: what is legal
Chris Torek
chris at mimsy.umd.edu
Sun Sep 2 06:37:35 AEST 1990
In article <26196 at mimsy.umd.edu> I wrote:
>`&arr[sizeof arr/sizeof *arr]' ... is Officially Legal.
(Those who would dispute this are advised to see ANSI Standard
X3.159-1989, otherwise known as `The ANSI C Standard', sections 3.2.2.1
(Lvalues and function designators), 3.3.3.4 (The sizeof operator), and
3.3.6 (Additive operators).)
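The idiom is easy to try directly. What follows is a minimal sketch in present-day C, not part of the original post. The macro `NELEM' and the array `sample' are illustrative names, not anything the Standard defines. (C99 later made the rule explicit: for `&E[n]', neither the `&' nor the implied `*' is evaluated, so the expression is simply `E + n'.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element count of a true array (not a pointer). */
#define NELEM(arr) (sizeof (arr) / sizeof *(arr))

static int sample[4] = { 10, 20, 30, 40 };

/* Return &sample[NELEM(sample)], the one-past-the-end address.
 * Computing and returning it is legal; dereferencing it is not. */
int *sample_end(void)
{
    return &sample[NELEM(sample)];
}

/* Read the last element by stepping back from the end pointer. */
int sample_last(void)
{
    int *end = sample_end();

    assert(end - sample == (ptrdiff_t)NELEM(sample));
    return end[-1];             /* *(end - 1), i.e. sample[3] */
}
```

`NELEM' only works on an actual array object. Applied to a pointer parameter it silently computes sizeof(pointer) / sizeof(element), which is meaningless.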
This seems to be rather universally misunderstood. To amplify a bit:
In article <29051 at nigel.ee.udel.edu> gdtltr at freezer.it.udel.edu (Gary Duzan)
writes:
>I don't believe accessing the element after is legal, but the pointer
>is still legal.
Correct. Given `int a[4];', the following holds:
    int *p = a;                     /* legal */
    a[0], a[1], a[2], a[3];         /* all legal */
    p[0], p[1], p[2], p[3];         /* all legal */
    p = &a[4];                      /* legal */
    *p;                             /* illegal (a[4] does not exist) */
    p--;                            /* legal */
    p = a;                          /* legal */
    p--;                            /* illegal */
    p = &a[4];                      /* legal */
    p[-4], p[-3], p[-2], p[-1];     /* all legal */
Note the last carefully: it is not the subscript itself that makes a
given x[i] legal or illegal, but rather whether x+i yields a legal address
and, if so, whether *(x+i) is also legal.
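A small sketch of the table and the rule, under stated assumptions. `legal_ops_demo' runs only the lines marked legal; the two illegal ones are left out, since executing them is undefined behaviour. `classify' restates the rule for a pointer x == base + k into an array of n elements. The function names, the enum and the parameters k and n are hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Run only the operations marked legal in the table, on int a[4].
 * Deliberately omitted: *p with p == &a[4], and p-- with p == a. */
int legal_ops_demo(void)
{
    int a[4] = { 1, 2, 3, 4 };
    int *p = a;
    int sum = p[0] + p[1] + p[2] + p[3];    /* all inside a */

    p = &a[4];          /* one past the end: may be held, not read */
    p--;                /* back to &a[3] */
    assert(*p == 4);

    p = &a[4];
    sum += p[-4] + p[-3] + p[-2] + p[-1];   /* a[0] .. a[3] again */
    return sum;         /* 10 + 10 */
}

/* The rule restated: for x == base + k, where base is an array of n
 * elements, x[i] depends only on j = k + i, never on the sign of i. */
enum access { ACC_INVALID, ACC_ADDRESS_ONLY, ACC_READABLE };

enum access classify(ptrdiff_t k, ptrdiff_t i, ptrdiff_t n)
{
    ptrdiff_t j = k + i;        /* x + i == base + j */

    assert(0 <= k && k <= n);   /* x itself must be a legal address */
    if (j < 0 || j > n)
        return ACC_INVALID;     /* computing x + i is already undefined */
    if (j == n)
        return ACC_ADDRESS_ONLY;    /* &x[i] is fine, x[i] is not */
    return ACC_READABLE;        /* *(x + i) is fine */
}
```

For example, with n == 4, classify(4, -4, 4) is ACC_READABLE (p[-4] with p == &a[4]), while classify(0, -1, 4) is ACC_INVALID (a[-1]).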
Now, as to why &a[4] is legal when a[4] is not, consider:
    int i;
    for (i = 0; i < 4; i++)
        printf("%d\n", i);
When this code is run, i takes on five values, namely 0, 1, 2, 3, and 4.
Even if we alter the loop slightly to get rid of the `4', i still takes
on the value 4:
    for (i = 0; i <= 3; i++)
        ...
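The counting argument can be checked mechanically. In this sketch (the function name `final_counter' is hypothetical), the loop body runs `last + 1' times, yet the variable finishes one past `last':

```c
#include <assert.h>
#include <limits.h>

/* Return the value i holds after `for (i = 0; i <= last; i++)'.
 * The body runs last + 1 times, but i takes last + 2 values: the
 * extra one is the value that makes the test fail. */
int final_counter(int last)
{
    int i;
    int iterations = 0;

    assert(last >= 0 && last < INT_MAX);    /* i++ must not overflow */
    for (i = 0; i <= last; i++)
        iterations++;
    assert(iterations == last + 1);
    return i;
}
```

So final_counter(3) returns 4, matching the five values 0 through 4 listed above.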
Now what happens if we loop `p' over the various elements in `a'?
    for (p = &a[0]; p < &a[4]; p++)
        ...
p must eventually take on the value &a[4]. There is no way around it;
even if we get rid of the `&a[4]' in the loop, p still winds up with
&a[4] as its final value:
    for (p = &a[0]; p <= &a[3]; p++)
        ...
    /* now p == &a[4] */
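The same count applies to pointers. A sketch, assuming the caller passes a genuine array of `n' elements (`walk_to_end' is a hypothetical name): the `<=' form of the loop still leaves p at &a[n].

```c
#include <assert.h>
#include <stddef.h>

/* Sum a[0..n-1] with the `p <= &a[n - 1]' form of the loop and return
 * where p ends up.  Although the bound names only the last element,
 * p finishes at &a[n], the one-past-the-end address. */
int *walk_to_end(int *a, size_t n, long *sum)
{
    int *p;

    assert(n > 0);              /* &a[n - 1] must exist */
    *sum = 0;
    for (p = &a[0]; p <= &a[n - 1]; p++)
        *sum += *p;
    return p;                   /* legal to hold and compare, not to read */
}
```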
Since this sort of thing happens all the time in existing code, there was
no choice but to make it Officially Legal and require all C compilers to
support it. This, on the other hand, is not legal:
    for (p = &a[3]; p >= &a[0]; p--)    /* illegal */
        ...
This loop supposedly terminates when p takes on the value &a[-1]; but as
noted above, &a[-1] is not a legal address, and in fact this code fails
on some machines---for instance, on a 68000 where the C compiler starts
the data space at location 2, and `a' is a global array of 32-bit `int's
that happens to be the first object in the data segment. The code turns
into, e.g.,
    loop:
            ...
            subql   #4,a2           # p--
            cmpl    #2,a2           # (unsigned long)p < 2?
            jcs     out             # if so, exit loop
            jra     loop            # otherwise continue
and when p==&a[0], p==2, so the p-- (subql #4) puts 0xfffffffe into p,
which, compared as an unsigned value, is still greater than or equal to
2, so the loop does not exit.
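The failure can be modelled without a 68000. This sketch assumes only what the example states (data starting at address 2, 4-byte ints, the unsigned comparison shown) and simulates the generated code with 32-bit unsigned arithmetic. `simulate_68k_loop' is a hypothetical name, and the loop body itself is not modelled. Because p stays congruent to 2 modulo 4 as it wraps, it can never become 0 or 1, so the exit test never fires and the simulation only stops at its iteration limit:

```c
#include <assert.h>
#include <stdint.h>

/* Model the 68000 loop with 32-bit unsigned addresses.  Assumptions
 * taken from the example: a[] starts at address 2 and each int is
 * 4 bytes.  Runs at most `limit' iterations and returns how many were
 * performed; the intended answer would be 4. */
unsigned long simulate_68k_loop(unsigned long limit)
{
    uint32_t p = 2 + 3 * 4;     /* &a[3] */
    unsigned long n = 0;

    while (n < limit) {
        n++;                    /* loop body (not modelled) */
        p -= 4;                 /* subql #4,a2: wraps modulo 2^32 */
        assert(p % 4 == 2);     /* p is always 2 mod 4, so never 0 or 1 */
        if (p < 2)              /* cmpl #2,a2 ; jcs out */
            break;
    }
    return n;
}
```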
This is the same old fencepost problem that occurs everywhere.
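For comparison, one common legal way to walk an array backwards is to start at the one-past-the-end address and decrement before each use, so the pointer never goes below &a[0]. This idiom is not given in the post, and `reverse_copy' is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Copy a[0..n-1] into out[] in reverse order.  p starts at the legal
 * address &a[n] and is decremented only while p > a, so the lowest
 * value it ever holds is &a[0], never &a[-1]. */
void reverse_copy(const int *a, size_t n, int *out)
{
    const int *p = a + n;

    while (p > a) {
        --p;
        *out++ = *p;
    }
    assert(p == a);
}
```

The test `p > a' happens before the decrement, which is exactly the fencepost the illegal loop gets wrong.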
Incidentally, there is a way to keep p from taking on &a[4]:
    for (p = a;; p++) {
        ...
        if (p == &a[3])
            break;
    }
This is the same solution required for loops that purport to run to
MAXINT or MAXULONG or other such maxima, and it shares their drawback:
such loops are exceedingly ugly.
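The `MAXINT' case can be sketched with the smallest such maximum. Assuming only the standard <limits.h> constant UCHAR_MAX, `count_all_uchar' (a hypothetical name) visits every unsigned char value once. The obvious `for (c = 0; c <= UCHAR_MAX; c++)' would never stop, because c wraps to 0 instead of reaching UCHAR_MAX + 1.

```c
#include <assert.h>
#include <limits.h>

/* Visit every unsigned char value 0 .. UCHAR_MAX exactly once by
 * testing for the last value at the bottom of the body, the same
 * shape as the `p == &a[3]' loop. */
unsigned long count_all_uchar(void)
{
    unsigned char c;
    unsigned long visited = 0;

    for (c = 0;; c++) {
        visited++;              /* loop body */
        if (c == UCHAR_MAX)
            break;              /* c never has to exceed UCHAR_MAX */
    }
    assert(visited == (unsigned long)UCHAR_MAX + 1);
    return visited;
}
```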
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris at cs.umd.edu Path: uunet!mimsy!chris