Question about C array initialization

nbtrap · 2012-06-25 03:10:26

I understand that the statement

char string[] = "chars";

initializes string[5] to 0 ('\000'). My question is does the statement

char *strings[] = { "string1", "string2", "string3" };

initialize strings[3] to (void *) 0, i.e. NULL? I can't find the answer to this question in the ansi standard. GCC seems to tack on a NULL pointer at the end (even with the -ansi and -pedantic-errors flags), which is really convenient for finding the boundary, but I'm not sure whether it's a GCC thing or an official C thing. I see where the standard talks about terminating char arrays initialized by a string literal with a null byte, but it doesn't talk about the analogous case with arrays of pointers.

Last edited by nbtrap (2012-06-25 03:18:20)

lotuskip · 2012-06-25 06:18:38

This

char string[] = "chars";

allocates 6 bytes of memory and initializes all of it. Note that it's equivalent to

char string[] = { 'c', 'h', 'a', 'r', 's', '\0' };

This, however,

char *strings[] = { "string1", "string2", "string3" };

allocates memory for 3 pointers and initializes them. There is no reason why "strings[3]" (which doesn't even get allocated!) would get initialized.

mloskot · 2012-06-25 09:35:09

nbtrap wrote:

I understand that the statement
char string[] = "chars";
initializes string[5] to 0 ('\000').

Wrong.

nbtrap wrote:

I can't find the answer to this question in the ansi standard

Here is the relevant quote from ANSI C (draft of ANSI X3.159-1989):

3.5.7 Initialization
(...)
char s[] = "abc", t[3] = "abc";
defines ``plain'' char array objects s and t whose members are
initialized with character string literals. This declaration is
identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
(...)

nbtrap · 2012-06-25 10:57:50

lotuskip wrote:

This
char string[] = "chars";
allocates 6 bytes of memory and initializes all of it. Note that it's equivalent to
char string[] = { 'c', 'h', 'a', 'r', 's', '\0' };
This, however,
char *strings[] = { "string1", "string2", "string3" };
allocates memory for 3 pointers and initializes them. There is no reason why "strings[3]" (which doesn't even get allocated!) would get initialized.

There is so a reason why strings[3] would get allocated, and it's the same reason why string[5] in the above example does in fact get allocated, namely, for easier array boundary management. Just as it's easy to know how many chars the above declaration for string initializes by finding the null byte, it would be very helpful to be able to do the same kind of thing for an array of pointers when the array is of unknown size, i.e. when there is no integer constant between the brackets. Here's an example of what I mean:

char *strings[] = { "string1", "string2", "string3", ... };

/* return true if msg matches a string in the pointer array strings, false otherwise */
int matcher(char *msg)
{
    int i;
    for (i = 0; strings[i]; i++) {
        if (!strcmp(msg, strings[i]))
            return 1;
    }
    return 0;
}

Think of the "..." in the strings declaration as a placeholder for some number of strings, a number you would rather not have to know. In that case, the code above only works if the array really is initialized with an extra null pointer at the end. So please tell me, if the standard itself doesn't say so, then why does GCC consistently initialize variables like strings with an extra null pointer on the end, even with the -ansi and -pedantic-errors flags? Go ahead and see for yourself.

Edit: Even a declaration such as

int *ints[] = { (int *) 1, (int *) 2, (int *) 3 };

initializes ints[3] to the null pointer with "gcc -ansi -pedantic-errors", at least when the array is statically allocated, i.e. outside of any block of code and/or with the "static" keyword. My guess is it does the same for automatic variables too--I just haven't had the time to test it.

Last edited by nbtrap (2012-06-25 11:10:00)

nbtrap · 2012-06-25 10:58:40

mloskot wrote:

nbtrap wrote:
I understand that the statement
char string[] = "chars";
initializes string[5] to 0 ('\000').
Wrong.

Don't you mean "right"?

lotuskip · 2012-06-25 11:25:08

nbtrap wrote:

There is so a reason why strings[3] would get allocated

I thought I was clear enough in my earlier post...

"strings[3]" doesn't even make any sense. The array only has 3 elements. "strings[3]" might segfault, or might give you something completely random. String constants in C by definition contain a '\0' at the end. Arrays of pointers are a completely different thing. You can manually add the NULL there if you want. I agree that it is useful on many occasions.

Last edited by lotuskip (2012-06-25 11:29:29)

nbtrap · 2012-06-25 12:17:06

lotuskip wrote:

nbtrap wrote:
There is so a reason why strings[3] would get allocated
String constants in C by definition contain a '\0' at the end.

Not necessarily. A declaration like:

char arr[3] = "str";

doesn't get null-terminated. And besides, my point is that initializing strings[3] *would* make sense for the same reason that initializing

char foo[] = "bar";

with a terminating null byte *does* make sense.

I know that arrays of pointers are not the same. My question assumes that much. The fact of the matter is, however, GCC terminates pointer arrays of unknown size with a null pointer, and my question is simply "is this an ansi C thing or a GCC thing?".

In fact, it seems that GCC terminates all arrays of unknown size that are explicitly initialized with the appropriate number of null bytes. For example:

struct tag { int i; char c; char *str; } tags[] = {
    { 1, 'a', "string1" },
    { 2, 'b', "string2" },
    { 3, 'c', "string3" }
};

initializes tags[3] to sizeof (struct tag) null bytes. Try it for yourself.

I find this a very useful feature and really hope it's in the definition and not just a GNU extension. If it is a GNU extension, I may need to report it as a GCC bug insofar as it happens with the -ansi and -pedantic flags, unless of course the behavior is undefined by the standard, which seems to be the case.

lotuskip · 2012-06-25 12:43:46

nbtrap wrote:

Not necessarily. A declaration like:
char arr[3] = "str";
doesn't get null-terminated.

What doesn't get '\0'-terminated is your array. The string constant "str" is always '\0'-terminated. During the initialization, as much as possible of the string constant is copied to your array, in this case 3 bytes, so the '\0' does not get copied.

nbtrap wrote:

In fact, it seems that GCC terminates all arrays of unknown size that are explicitly initialized with the appropriate number of null bytes. For example:
struct tag { int i; char c; char *str; } tags[] = {
    { 1, 'a', "string1" },
    { 2, 'b', "string2" },
    { 3, 'c', "string3" }
};
initializes tags[3] to sizeof (struct tag) null bytes.

You might want to check the value of "sizeof(tags) / sizeof(struct tag)". That should be 3, so that "tags[3]" is just a poor expression for the memory coming after the array.

Your described behaviour would be bizarre if it were real in general. What if the struct is absolutely huge, like a megabyte? Will you still find a million zeros after the array? Some zeros appearing after the array might be coincidental.

Last edited by lotuskip (2012-06-25 12:50:45)

nbtrap · 2012-06-25 12:59:34

lotuskip wrote:

nbtrap wrote:
Not necessarily. A declaration like:
char arr[3] = "str";
doesn't get null-terminated.
What doesn't get '\0'-terminated is your array. The string constant "str" is always '\0'-terminated. During the initialization, as much as possible of the string constant is copied to your array, in this case 3 bytes, so the '\0' does not get copied.
nbtrap wrote:
In fact, it seems that GCC terminates all arrays of unknown size that are explicitly initialized with the appropriate number of null bytes. For example:
struct tag { int i; char c; char *str; } tags[] = {
    { 1, 'a', "string1" },
    { 2, 'b', "string2" },
    { 3, 'c', "string3" }
};
initializes tags[3] to sizeof (struct tag) null bytes.
You might want to check the value of "sizeof(tags) / sizeof(tag)". That should be 3, so that "tags[3]" is just a poor expression for the memory coming after the array.
Your described behaviour would be bizarre if it were real in general. What if the struct is absolutely huge, like a megabyte? Will you still find a million zeros after the array? Some zeros appearing after the array might be coincidental.

You're right that "sizeof(tags) / sizeof(struct tag)" yields 3, but I am convinced that the null bytes coming at the end are not a coincidence--you can try changing the size of tag by adding members, and the number of null bytes at the end is always exactly sizeof(struct tag). I think it's safe to conclude that it is something not defined by the standard but handled logically by GCC.

That said, you've answered a more important question that I never really asked in the first place. What I'm really looking for is an easy way to do array boundary management on an array of unknown size without having to count the elements and hard-code the size of the array into the program. What I never considered was the simple "sizeof array / sizeof arraymember". I guess I always (mistakenly) assumed that "sizeof" couldn't work on arrays. Thanks for your help.

Xyne · 2012-06-25 13:33:04

nbtrap wrote:

Not necessarily. A declaration like:
char arr[3] = "str";
doesn't get null-terminated.

It doesn't get null-terminated because you don't leave enough room in the array for the null character. "char arr[3]" can only contain 3 characters, the same way that 'char arr[2] = "str"' will only contain 2. The second example generates a warning whereas the first does not, but only because it is very common to work with non-null-terminated strings.

nbtrap wrote:

And besides, my point is that initializing strings[3] *would* make sense for the same reason that initializing
char foo[] = "bar";
with a terminating null byte *does* make sense.
I know that arrays of pointers are not the same. My question assumes that much. The fact of the matter is, however, GCC terminates pointer arrays of unknown size with a null pointer, and my question is simply "is this an ansi C thing or a GCC thing?".
In fact, it seems that GCC terminates all arrays of unknown size that are explicitly initialized with the appropriate number of null bytes. For example:
struct tag { int i; char c; char *str; } tags[] = {
    { 1, 'a', "string1" },
    { 2, 'b', "string2" },
    { 3, 'c', "string3" }
};
initializes tags[3] to sizeof (struct tag) null bytes. Try it for yourself.

I get the following:

test.c

#include <stdio.h>

int
main(int argc, char * * argv)
{
  int i;
  char *strings[] = { "string1", NULL, "string3"};

  for (i=0;i<5;i++)
  {
    printf("strings[%d]: %s\n", i, strings[i]);
  }

  return 0;
}

output

strings[0]: string1
strings[1]: (null)
strings[2]: string3
Segmentation fault

test2.c

#include <stdio.h>
#include <string.h>

int
main(int argc, char * * argv)
{
  int i;
  struct tag {int i; char c; char *str;};
  struct tag null;

  memset(&null, 0, sizeof(struct tag));
  struct tag tags[] =
  {
    { 1, 'a', "string1" },
//     { 2, 'b', "string2" },
    null,
    { 3, 'c', "string3" }
  };

  for (i=0;i<5;i++)
  {
    printf(
      "tags[%d]: (%p) %1d %c %s\n",
      i, (void *)(tags + i), tags[i].i, tags[i].c, tags[i].str
    );
  }
  return 0;
}

output

tags[0]: (0x7fffd2df4e80) 1 a string1
tags[1]: (0x7fffd2df4e90) 0  (null)
tags[2]: (0x7fffd2df4ea0) 3 c string3
tags[3]: (0x7fffd2df4eb0) 0  (null)
Segmentation fault

The above behavior appears to conflict.

Given that string constants in C are defined as null-terminated, i.e. "str" is equivalent to "{'s', 't', 'r', '\0'}", the assignment 'arr[] = "str"' explicitly includes a null terminator. The other examples above do not include such a terminator. Even if it would be convenient sometimes, the standard should adopt a minimally invasive approach. It's easy to add a NULL at the end of an array declaration, but impossible to remove an extra element from such a declaration.

This may be informative too:
test3.c

#include <stdio.h>
#include <string.h>

int
main(int argc, char * * argv)
{
  char str[] = "test";
  char *strings[] = {"string1", NULL, "string3"};
  struct tag {int i; char c; char *str;};
  struct tag null;

  memset(&null, 0, sizeof(struct tag));
  struct tag tags[] =
  {
    { 1, 'a', "string1" },
//     { 2, 'b', "string2" },
    null,
    { 3, 'c', "string3" }
  };

  printf(
    "str\n  size: %lu\n  length: %lu\n"
    "strings\n  size: %lu\n  length: %lu\n"
    "tags\n  size: %lu\n  length: %lu\n",
    sizeof(str), sizeof(str)/sizeof(char),
    sizeof(strings), sizeof(strings)/sizeof(char *),
    sizeof(tags), sizeof(tags)/sizeof(struct tag)
  );

  return 0;
}

output

str
  size: 5
  length: 5
strings
  size: 24
  length: 3
tags
  size: 48
  length: 3

The string length (actual number of chars in the array) is reported as 5 ('t', 'e', 's', 't', '\0'), which is expected. The arrays of strings and tags are both reported as having 3 elements, even though you can access a null value after the last initialized index of the tags array. If that null value were really supposed to be there, sizeof would account for it, just as it does for the null character in the string.

Maybe the ultimate null element of the tags array is some artefact of memory alignment.

Xyne · 2012-06-25 13:42:59

nbtrap wrote:

I guess I always (mistakenly) assumed that "sizeof" couldn't work on arrays.

Be careful with "sizeof" and arrays. The declaration must be visible to "sizeof". For example,

struct foo bar[50];
...

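// This works: the array declaration is visible, so this is the size of all 50 elements.
unsigned int bar_size = sizeof(bar);

// This does not work: baz is a pointer, so sizeof(baz) is the size of a pointer.
void test(struct foo * baz)
{
  unsigned int baz_size = sizeof(baz);
}
test(bar);

// This does not work either: an array parameter is adjusted to a pointer.
void test2(struct foo baz[])
{
  unsigned int baz_size = sizeof(baz);
}
test2(bar);

// This does: the caller passes the size explicitly.
void test3(struct foo * baz, unsigned int baz_size)
{
}
test3(bar, sizeof(bar));

// So does this, because bar itself is visible here.
void test4(void)
{
  unsigned int baz_size = sizeof(bar);
  struct foo * baz = bar; // or e.g. allocate and copy memory for bar based on baz_size
}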

Arch Linux
