[SOLVED] C programming help

NYFinest · 2011-07-06 14:03:20

Hello everyone,

Hopefully this is the appropiate spot to post this, I need help with a C programming HW problem. I need to create a program that that finds how many repeated characters in a string. For example the word "missing" has 2 repeating characters and the word "hisssss" has 1 repeating character. In my code posted below, the word "missing" returns 4 repeated characters so I assume my approach was wrong. I also have another similar problem that I need to do, but if i can figure this one out, I can do the next one. If anyone can point me in the right direction I'll greatly appreciate it.

Thanks.

Also, I need to use the function prototype I used in the code.

#include <stdio.h>
#include <string.h>
#define MAX 64

int repeat(char *ptr) {

    int length, i, j, count = 0;
    char comp[MAX];

    length = strlen(ptr);

    for (i = 0; i < length; i++)
        for (j = 0; j < length; j++) {

            if (i != j && ptr[i] == ptr[j])

                count++;

        } //for 

    return count;
} //repeat


int main () {

    char word[MAX];
    char character = ' ';
    int total;


    printf("word: ");
    scanf("%s", word);

    total = repeat(word);
    printf ("repeated letters: %d", total);

} //main

Last edited by NYFinest (2011-07-08 02:35:10)

ewaller · 2011-07-06 14:55:59

Well, you said this is a homework problem, so I'll treat it as such. BTW, thank you for coming right out and saying it, because homework problems masquerading as real problems are disallowed (and are usually detected).

You are comparing each character in the string to every other character in the string -- without keeping track of what you have seen.
You will need to step through each character in the string, and determine if you have already seen it. Then, if you have, handle the repeat condition; if not, add the new character to the list of characters you have seen.

SoleSoul · 2011-07-06 17:12:13

I would have done it this way for simplicity (not for performance):

Create an array of 26 integers initialized to 0, each represents one letter. For each letter in the string, add one to the correct cell. You can do it like that: a[X-A] += 1; when X is the ascii value of the letter you've got and A is the ascii value of the letter 'a' minus 1.

After you have the full array, it's just a matter of counting the number of cells which contain a number greater than 1.

Good luck.

(I hate the fake complexity of homework

tavianator · 2011-07-06 17:44:22

SoleSoul wrote:

You can do it like that: a[X-A] += 1; when X is the ascii value of the letter you've got and A is the ascii value of the letter 'a' minus 1.

You should not do it that way because C does not specify that the character encoding is ASCII; 'b' is not guaranteed to be 'a' + 1. Also, if one of the characters is less than 'a' - 1, you will index the array with a negative value, which is very bad.

No harm in "int used_chars[256] = { 0 };" IMO.

Last edited by tavianator (2011-07-06 17:45:48)

NYFinest · 2011-07-06 19:08:38

Thank you all for the replies.
I was thinking of creating another array and while the for loop is doing its job somehow put the character into the new array. Then with another loop search to see if any of the characters are equal and remove 1 from the counter. IMO this seems kind of messy but I still have not tried it yet. I also think I am making it over complicated. I will try out other ways pointed out and any suggestions are welcomed.

Thanks

listdata · 2011-07-07 18:57:32

I second ewaller's pseudocode --- this way, you can handle any arbitrary set of characters.

I think the missing key here is a hashing function to convert a particular character symbol into a unique integer, which can be used as a sort of UUID on a character. This UUID is just a simple integer number, which can in turn be used to increment the _correct_ value each time you encounter a repeated character. It may sound advanced, but you can do it with just 1 for-loop, over an array called "seen_chars[]" or something like that.

Also, you should initialize the "word" array you have with NULL bytes, so that when you read in the word with scanf, it is always terminated with a NULL byte ('\0') (provided that the word is 63 characters or less, of course). Then your strlen() calls will actually function properly (they're broken right now, since your "word" array is initialized with random values, with or without possible NULL bytes).

Another tip: put in a bunch of printf() calls inside your loops, so you can actually see your code work. This is a powerful preemptive debugging concept that beginners tend to overlook...

Last edited by listdata (2011-07-07 19:00:33)

juster · 2011-07-08 00:07:25

Haha this thread is funny.

SoleSoul wrote:

I would have done it this way for simplicity (not for performance): ...

This seems out of place because your method has better performance than the original! Right? It is O(n) instead of O(n^2). Doing it SoleSoul's way also shows you know how to use a simple data structure. An array! That might be handy. You don't even need to malloc it.

I think it's pretty safe to assume NY is using ASCII. Character encodings seem way too advanced for this sort of homework. That's just too funny... make sure you implement a UTF8 parser as well. Oh and a hash table. Oh you ran out of time and got an F? Bummer.

Hehe. Okay, let's see. Scanf will null terminate the string for you. You maybe should specify a width for the "%s". Like "%63s" or else you can overflow the string. Overflow you ask? Like, try typing in more than 64 chars with what you have. Maybe your teacher doesn't care though? If that hasn't been covered, don't sweat it. Mkay, here's the "hashing function" mentioned earlier:

int super_awesome_uuid (char ch)
{
    if (ch < 'a' || ch > 'z') { printf("Argh!?\n"); exit(0); } /* or whatnot */
    return ch - 'a';
}

tavianator · 2011-07-08 01:56:18

juster wrote:

Mkay, here's the "hashing function" mentioned earlier:

int super_awesome_uuid (char ch)
{
    if (ch < 'a' || ch > 'z') { printf("Argh!?\n"); exit(0); } /* or whatnot */
    return ch - 'a';
}

Lol. Here's mine:

int super_awesome_uuid(char ch)
{
  return ch;
}

juster · 2011-07-08 02:32:37

You forget to check if it's negative.

NYFinest · 2011-07-08 02:34:50

lol, I would like to thank you all for your responses. I did a few adjustments to the code and I got it to work but not 100% and submitted it. I got it back and according to the professor nothing wrong with it.

Once again, thanks for the tips, ill be keeping them in hand for any future coding

listdata · 2011-07-08 05:05:44

juster wrote:

int super_awesome_uuid (char ch)
{
    if (ch < 'a' || ch > 'z') { printf("Argh!?\n"); exit(0); } /* or whatnot */
    return ch - 'a';
}

I guess that works... and technically you could trivially expand the range to include all useful ASCII input characters like numbers, punctuation, etc. But now you have to figure out the exact array size of the "hashtable" though...

tavianator wrote:

Lol. Here's mine:

int super_awesome_uuid(char ch)
{
  return ch;
}

The OP's original program only used a 64-char string, so, your solution would require the creation of a needlessly large array (even within just the range of ASCII chars from a QWERTY keyboard) --- not to mention having lots of dead space for your array's first 33 elements (if we start with the '!' character as the lower ASCII char bound).

Here's my take:

#define MAX 64

/* return unique number for a given character function, and -1 if not found */
int get_idx(char *history, char *c) {
    int i;

    for (i = 0; i < MAX; i++) { // actually, can use 32 instead of MAX (64) to save some space, because that's the maximum number of unique, repeating characters in a 64-char string (EDIT: oops, no, because if there are 64 unique characters, history[] needs to be 64 characters big!)
        if (history[i] == *c)
            return i;
    }

    return -1;
}

You need to pass in the "hashtable" (char *history), along with the character you're looking at. No need to figure out the size of the UUID array. Works like a charm. The only "trick" not mentioned here is the actual population of the history[] array --- but that should be pretty easy to figure out (hint: make use of the -1 return value).

Last edited by listdata (2011-07-09 15:05:34)

tavianator · 2011-07-08 16:06:40

listdata wrote:

The OP's original program only used a 64-char string, so, your solution would require the creation of a needlessly large array (even within just the range of ASCII chars from a QWERTY keyboard) --- not to mention having lots of dead space for your array's first 33 elements (if we start with the '!' character as the lower ASCII char bound).

You could store the counts in an unsigned char array, for example, thus taking up all of 256 bytes. Even an int array will take up only 1024 bytes. This is likely to be about .00002384185791015600% of your available memory. And, why would you write a solution that doesn't work for arbitrary strings, when the more general solution is both easier to write and faster? If you write the OP's repeat() function this way, it's 7 lines long.

I'd also like to point out that I should've written

int super_awesome_uuid(char ch)
{
  return (unsigned char)ch;
}

in case the char type is signed on your platform.

Cdh · 2011-07-08 19:29:26

tavianator wrote:

You could store the counts in

But that isn't the original task.

I believe the most "minimal" solution (because no storage required) would be:

#include <stdio.h>
#include <string.h>
#define MAX 64

int repeat(char *ptr) {
    int length, i, j, count, tempcount = 0;
    char currentchar;

    for (i = 0; i < strlen(ptr); i++) {
        tempcount = 0;
        currentchar = *(ptr + i);

        for (j = i - 1; j >= 0 && tempcount <= 1; j--) {
            if (*(ptr + j) == currentchar) {
                tempcount += 1;
            }
        }

        if (tempcount == 1) {
            count += 1;
        }
    }
    return count;
}

int main () {
    char word[MAX];
    char character = ' ';

    printf("word: ");
    fgets(word, MAX, stdin);
    printf ("repeated letters: %d\n", repeat(word));
}

//edit: cleaned up a little
But of course that's slower than the array storage way.

Still O(n^2) though... The worst case is like "abcdabcd" so in the first half of the string for every character in the string I go back to the beginning without finding another occurence (~n/2*n1/4) and for the second part I go back through half the string length for every character too (n/2*n/2)... right?

Also, use (f)gets to read, since scanf only reads until a whitespace...

Last edited by Cdh (2011-07-08 19:39:06)

tavianator · 2011-07-08 20:33:43

Cdh wrote:

tavianator wrote:
You could store the counts in
But that isn't the original task.

Hrm? It's a way to implement the original task. Also, this is a homework question, so don't post complete solutions (https://wiki.archlinux.org/index.php/Fo … e#Homework). I suppose the saving grace is that as written, yours doesn't work (count is uninitialised).

Still O(n^2) though... The worst case is like "abcdabcd" so in the first half of the string for every character in the string I go back to the beginning without finding another occurence (~n/2*n1/4) and for the second part I go back through half the string length for every character too (n/2*n/2)... right?

The worst case would be "abcdefgh...". That way the nth character would always take n-1 comparisons, so you'd have n*(n-1)/2 comparisons total. Of course, n is bounded here 'cause there's only a few possible characters. For unbounded n, I'm not sure how to construct the combinatoric worst case.

Cdh · 2011-07-08 20:50:15

Eh, even if it has long been marked as [SOLVED] and told us he successfully submitted his solution?

Well, he could be lying about that, but I don't think so.

Hm, it does work with gcc default settings. It initializes variables with 0 apparently. But it's obviously not in the standard because pathcc doesn't do it.

#include <stdio.h>
int main() {
   int a, b = 5;
   printf("%d, %d\n", a, b);
   return 0;
}

Default settings of compilers:

chris@chrisl ~/temp % gcc -o vartest vartest.c
chris@chrisl ~/temp % ./vartest
0, 5
chris@chrisl ~/temp % pathcc -o vartest vartest.c
chris@chrisl ~/temp % ./vartest
1187462416, 5

Hm, you could argue about the worst case. Since the amount of characters is limited it actually is only a constant factor of 3* the alphabet or so.
I'm not so sure about the general behaviour either. The "..?.." isn't obvious.

Last edited by Cdh (2011-07-08 20:52:40)

listdata · 2011-07-09 15:04:29

tavianator wrote:

listdata wrote:
The OP's original program only used a 64-char string, so, your solution would require the creation of a needlessly large array (even within just the range of ASCII chars from a QWERTY keyboard) --- not to mention having lots of dead space for your array's first 33 elements (if we start with the '!' character as the lower ASCII char bound).
You could store the counts in an unsigned char array, for example, thus taking up all of 256 bytes. Even an int array will take up only 1024 bytes. This is likely to be about .00002384185791015600% of your available memory. And, why would you write a solution that doesn't work for arbitrary strings, when the more general solution is both easier to write and faster? If you write the OP's repeat() function this way, it's 7 lines long.
I'd also like to point out that I should've written
int super_awesome_uuid(char ch)
{
  return (unsigned char)ch;
}
in case the char type is signed on your platform.

True, but I like my approach better because: (1) it works for arbitrarily dense sets of symbols, and (2) does not needlessly waste space. The advantage of your approach is that you use almost no CPU cycles to calculate the UUID. So for very large input strings, it will be very, very fast (although at the cost of more memory use depending on set density). I was just noting how your approach used more memory, which is not an insignificant point.

Last edited by listdata (2011-07-09 15:45:39)

listdata · 2011-07-09 16:19:30

Cdh wrote:

tavianator wrote:
You could store the counts in
But that isn't the original task.
I believe the most "minimal" solution (because no storage required) would be:
#include <stdio.h>
#include <string.h>
#define MAX 64

int repeat(char *ptr) {
    int length, i, j, count, tempcount = 0;
    char currentchar;

    for (i = 0; i < strlen(ptr); i++) {
        tempcount = 0;
        currentchar = *(ptr + i);

        for (j = i - 1; j >= 0 && tempcount <= 1; j--) {
            if (*(ptr + j) == currentchar) {
                tempcount += 1;
            }
        }

        if (tempcount == 1) {
            count += 1;
        }
    }
    return count;
}

int main () {
    char word[MAX];
    char character = ' ';

    printf("word: ");
    fgets(word, MAX, stdin);
    printf ("repeated letters: %d\n", repeat(word));
}
//edit: cleaned up a little
But of course that's slower than the array storage way.
Still O(n^2) though... The worst case is like "abcdabcd" so in the first half of the string for every character in the string I go back to the beginning without finding another occurence (~n/2*n1/4) and for the second part I go back through half the string length for every character too (n/2*n/2)... right?
Also, use (f)gets to read, since scanf only reads until a whitespace...

Amazing. No additional storage ("hashtable" discussion above) needed! I'm not trained in Big O notation so can't help you there, but here's a slightly cleaned up version, Linux Kernel style:

int repeat(char *ptr) {
    int i, j, count, tempcount;
    count = 0;

    for (i = 0; i < strlen(ptr); i++) {
        tempcount = 0;

        for (j = 0; j < i && tempcount < 2; j++) {
            if (ptr[j] == ptr[i])
                tempcount++;
        }

        if (tempcount == 1)
            count++;
    }
    return count;
}

Last edited by listdata (2011-07-09 16:46:03)

tavianator · 2011-07-10 22:25:03

Cdh wrote:

Eh, even if it has long been marked as [SOLVED] and told us he successfully submitted his solution?
Well, he could be lying about that, but I don't think so.

Oops, I missed that post when I scanned the thread. Sorry. Here's mine:

int
repeat(const char *ptr)
{
  size_t i, count = 0, used_chars[256] = { 0 };

  for (i = 0; ptr[i] != '\0'; ++i) {
    ++used_chars[(unsigned char)ptr[i]];
  }

  for (i = 0; i < 256; ++i) {
    if (used_chars[i] > 1) {
      ++count;
    }
  }

  return count;
}

Compared to listdata/Cdh's implementation, my way takes 0.006s on a 1MB input, compared to 0.811s.

Hm, it does work with gcc default settings. It initializes variables with 0 apparently. But it's obviously not in the standard because pathcc doesn't do it.

This is why "gcc -Wall" is a good idea.

listdata wrote:

True, but I like my approach better because: (1) it works for arbitrarily dense sets of symbols, and (2) does not needlessly waste space. The advantage of your approach is that you use almost no CPU cycles to calculate the UUID. So for very large input strings, it will be very, very fast (although at the cost of more memory use depending on set density). I was just noting how your approach used more memory, which is not an insignificant point.

1) What do you mean by dense?
2) Both use O(1) space, mine just happens to use slightly more. For unicode characters obviously my way would be a huge waste, but you could use a hash table or trie or something to keep the space proportional to the number of distinct characters found.

Also, the first U in UUID stands for "universal," which is certainly not true of any of the hashing methods listed here. UID would make sense though.

Cdh · 2011-07-11 10:21:11

tavianator wrote:

Compared to listdata/Cdh's implementation, my way takes 0.006s on a 1MB input, compared to 0.811s.

Yes, this damned O(n^2).

But I had another idea today:

Since Characters are integers they can be trivially sorted via inplace radix sort in O(n). Then solving the original problem is trivial in O(n). All of that is not requiring any additional space.

tavianator · 2011-07-11 15:35:50

Cdh wrote:

tavianator wrote:
Compared to listdata/Cdh's implementation, my way takes 0.006s on a 1MB input, compared to 0.811s.
Yes, this damned O(n^2).
But I had another idea today:
Since Characters are integers they can be trivially sorted via inplace radix sort in O(n). Then solving the original problem is trivial in O(n). All of that is not requiring any additional space.

Clever! Actually, my method is similar to counting sort when you think about it, but without the actual sorting step. However, in-place radix sort uses k levels of recursion for sorting k-bit integers, so it's not quite true that it requires no additional space.

Cdh · 2011-07-11 20:37:02

Does this mean it would need 32 or 64 times the input length memory? *eek* But it's all on the stack, right?

listdata · 2011-07-11 23:37:15

tavianator wrote:

listdata wrote:
True, but I like my approach better because: (1) it works for arbitrarily dense sets of symbols, and (2) does not needlessly waste space. The advantage of your approach is that you use almost no CPU cycles to calculate the UUID. So for very large input strings, it will be very, very fast (although at the cost of more memory use depending on set density). I was just noting how your approach used more memory, which is not an insignificant point.
1) What do you mean by dense?
2) Both use O(1) space, mine just happens to use slightly more. For unicode characters obviously my way would be a huge waste, but you could use a hash table or trie or something to keep the space proportional to the number of distinct characters found.
Also, the first U in UUID stands for "universal," which is certainly not true of any of the hashing methods listed here. UID would make sense though.

This discussion is getting a little too hairy for me, so just to clear up any misunderstandings, here's my (slightly modified) full solution:

/*
 * Return unique number for a given character, based on the history buffer. If
 * not found in the history, then add the character to the history and return
 * its index.
 */
int get_idx(char *history, char *c) {
    int i;

    for (i = 0; i < MAX; i++) {
        if (history[i] == *c)
            return i;
        if (history[i] == '\0') {
            history[i] = *c;
            return i;
        }
    }

    /*
     * History buffer is full, which should never happen as long as both the
     * string where *c comes from and the buffer is both MAX big.
     */
    assert(0);
}

/*
 * Only use 1 for-loop, and only call get_idx() once per iteration. Drawback:
 * get_idx() modifies the history[] array!
 */
int count_reps(char *str) {
    int i, idx, reps;
    char history[MAX] = {'\0'};
    int rep_chars[MAX] = {0};

    for (i = 0, reps = 0; str[i] != '\0'; i++) {
        idx = get_idx(history, &str[i]);
        rep_chars[idx]++;
        if (rep_chars[idx] == 2)
            reps++;
    }

    return reps;
}

The get_idx() function used to just return 1 or -1 depending on whether the character was found in the hash already, but now it just modifies the history buffer to cut down on the logic. It's still far less impressive than Cdh's solution, though...

Regarding point (1): My approach requires another copy of the string; it's difference from your approach is that it always only requires the same memory space as the original string. Your approach's memory use grows with the number of symbols/characters in the string type's set. That's what I meant by "arbitrary density"...

Point (2): As you can probably see now, my solution can be very slow and certainly is not O(n)...

listdata · 2011-07-12 00:00:55

Cdh wrote:

tavianator wrote:
Compared to listdata/Cdh's implementation, my way takes 0.006s on a 1MB input, compared to 0.811s.
Yes, this damned O(n^2).
But I had another idea today:
Since Characters are integers they can be trivially sorted via inplace radix sort in O(n). Then solving the original problem is trivial in O(n). All of that is not requiring any additional space.

Well, it doesn't matter which sorting algorithm you use, since we're only concerned about the uniqueness of each character in the string. The only desirable property after the sorting is that the characters would be grouped together appropriately (e.g., "abcdabbbbc" -> "aabbbbbccd") and then it's just a trivial "does n != n+1" question with a single loop across the string to determine if that character repeats. Maybe something like:

/* str[] is already sorted */
char repchar = '\0';
int reps = 0;
for (i = 0; str[i] != '\0'; i++) {
    if (str[i] == repchar) {
        continue;
    }
    if (str[i] == str[i+1]) {
        repchar = str[i];
        reps++;
        i++;
        continue;
    }
}

juster · 2011-07-12 03:21:07

Woo you guys are still going at it! Cdh, I think your algorithm is O(log n) though I forget exactly how to calculate it. Never totally understood big O crap... has something to do with best case being 1 and worst being n/2 or some such voodoo. Congrats!

Here is a version I wrote for kicks a few days ago. You memory-shy types might enjoy it. Obviously, I prefer tav's and SoleSoul's approach myself. Much simpler. This is just a perversion of this simple approach.

#include <stdio.h>
#include <stdint.h>

int main ()
{
    uint32_t seen, dups, mask;
    int count, i;
    char ch;

    seen = dups = 0;

    printf("word: ");
    while (1) {
        ch = getchar();
        if (ch == '\n' || ch == EOF) break;
        if (ch < 'a' || ch > 'z') continue;

        mask = 1 << (ch - 'a');
        if (dups & mask) continue;

        if (seen & mask) dups |= mask;
        else seen |= mask;
    }

    for (i=0; i<28; ++i) {
        if (dups & (1<<i)) count++;
    }

    printf("%d repeated characters.\n", count);

    return 0;
}

it's probably less efficient than just using a plain array and a little wacky in general.

newcomer · 2011-07-12 16:04:52

/* Count duplicates in a string */

#include<stdio.h>

unsigned int dup(const char *str) {
    unsigned int cs, r = 0, high[8] = {0}, low[8] = {0};
    unsigned char cc;
    while(cc = (unsigned char)(*str++)) {
        if(low[cc/32] & 1 << cc%32)
            high[cc/32] |= 1 << cc%32;
        else
            low[cc/32] |= 1 << cc%32;
    }
    for(cs = 0; cs != 256; ++cs)
        if(high[cs/32] & 1 << cs%32)
            ++r;
    return r;
}

int main(int argc, char *argv[]) {
    while(argc-- > 1)
        printf("%s -> %d\n", argv[argc], dup(argv[argc]));
}

Very similar to the array solution, but with less wasted space I think. No idea about the big O efficiency.

Arch Linux

#1 2011-07-06 14:03:20

[SOLVED] C programming help

#2 2011-07-06 14:55:59

Re: [SOLVED] C programming help

#3 2011-07-06 17:12:13

Re: [SOLVED] C programming help

#4 2011-07-06 17:44:22

Re: [SOLVED] C programming help

#5 2011-07-06 19:08:38

Re: [SOLVED] C programming help

#6 2011-07-07 18:57:32

Re: [SOLVED] C programming help

#7 2011-07-08 00:07:25

Re: [SOLVED] C programming help

#8 2011-07-08 01:56:18

Re: [SOLVED] C programming help

#9 2011-07-08 02:32:37

Re: [SOLVED] C programming help

#10 2011-07-08 02:34:50

Re: [SOLVED] C programming help

#11 2011-07-08 05:05:44

Re: [SOLVED] C programming help

#12 2011-07-08 16:06:40

Re: [SOLVED] C programming help

#13 2011-07-08 19:29:26

Re: [SOLVED] C programming help

#14 2011-07-08 20:33:43

Re: [SOLVED] C programming help

#15 2011-07-08 20:50:15

Re: [SOLVED] C programming help

#16 2011-07-09 15:04:29

Re: [SOLVED] C programming help

#17 2011-07-09 16:19:30

Re: [SOLVED] C programming help

#18 2011-07-10 22:25:03

Re: [SOLVED] C programming help

#19 2011-07-11 10:21:11

Re: [SOLVED] C programming help

#20 2011-07-11 15:35:50

Re: [SOLVED] C programming help

#21 2011-07-11 20:37:02

Re: [SOLVED] C programming help

#22 2011-07-11 23:37:15

Re: [SOLVED] C programming help

#23 2011-07-12 00:00:55

Re: [SOLVED] C programming help

#24 2011-07-12 03:21:07

Re: [SOLVED] C programming help

#25 2011-07-12 16:04:52

Re: [SOLVED] C programming help

Board footer