awk -F '|' '// { Count[$3 "|" $5]++; } END { for (i in Count) { printf "%s|%s\n", i, Count[i]; }}' /path/to/file
As for not wanting to use awk: you started the thread by saying you could use any tool that would be good for this. Bash is not good for this. Yes, your version that included awk was even slower - but that was because you completely misused it. If you want to do this in pure bash, good luck.
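To show what the one-liner produces, here is a run against hypothetical sample input in the field layout shown later in the thread (the file paths are made up):

```shell
# Three hypothetical lines in the thread's format:
cat > /tmp/sample.txt <<'EOF'
./a.jpg|W|119|H|170|Format|jpeg|Errors|0|
./b.jpg|W|119|H|170|Format|jpeg|Errors|0|
./c.jpg|W|640|H|480|Format|png|Errors|0|
EOF

# Field 3 is the width, field 5 the height; count each W|H pair:
awk -F '|' '{ Count[$3 "|" $5]++ } END { for (i in Count) printf "%s|%s\n", i, Count[i] }' /tmp/sample.txt
```

This prints `119|170|2` and `640|480|1` (awk's `for (i in ...)` traversal order is unspecified, so the two lines may come out in either order).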
Your edit doesn't help. SHOW what kind of output you would want for this hypothetical input.
In any case, from everything I can gather about what you are describing, the following awk script would work, and would read the data only once. It assumes the elements of the list are newline-separated (which it looks like the original was, before you smashed it into an ugly bash array), and it prints the output one entry per line:
#!/bin/bash
awk '
// {
    W=...;
    H=...;
    WxH = W "x" H;
    Count[WxH]++;
}
END {
    for (i in Count) {
        printf "%s=%s\n", i, Count[i];
    }
}
' /path/to/your/input
You just need to fill in the ellipses for W and H, or preferably use a single string operation to extract WxH directly.
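One way those ellipses could be filled in, assuming the field layout shown elsewhere in the thread (`path|W|<width>|H|<height>|Format|...`, so with `-F'|'` the width is field 3 and the height field 5); the sample file and its contents here are hypothetical:

```shell
# Hypothetical sample input in the thread's format:
printf '%s\n' \
    './a.jpg|W|119|H|170|Format|jpeg|Errors|0|' \
    './b.jpg|W|119|H|170|Format|jpeg|Errors|0|' \
    './c.jpg|W|640|H|480|Format|png|Errors|0|' > /tmp/input.txt

awk -F '|' '
// {
    WxH = $3 "x" $5;    # build "119x170" straight from the fields
    Count[WxH]++;
}
END {
    for (i in Count) {
        printf "%s=%s\n", i, Count[i];
    }
}
' /tmp/input.txt
```

For this input it prints `119x170=2` and `640x480=1`, in unspecified order.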
X[2]="${A[1]}$T"
echo ${X[2]}
aa10
calculates how many are similar and adds that count as part of a string to a new array
You say your goal is to compare two arrays, and that any language that can do it will be fine. But then the only description of what you really want is a potential solution in bash. Then, as we move along, we find that these arrays are filled from files.
So can you please describe what you actually want to achieve?
I will start by saying that bash will most likely not be good for such large arrays. It could do it ... but it shouldn't.
If you have the list in a file, why not just `sort -u`?
EDIT: I just reread your bash version - no wonder it takes hours. With all that weird variable processing, you repeat the exact same work on every single array element over 15 thousand times: you extract Width and Height from each element, and then, for each element, you extract W and H from every element again - so you end up extracting width and height from each array element one more time than there are elements in the array. Why on earth are you doing that? Preprocess every array element once into the parts you care about:
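A minimal sketch of that preprocessing, assuming the `|`-delimited format shown elsewhere in the thread (the sample strings and the `Widths`/`Heights` names are my own):

```shell
# Hypothetical sample data in the thread's format:
ArrayOfFiles=(
    './a.jpg|W|119|H|170|Format|jpeg|Errors|0|'
    './b.jpg|W|640|H|480|Format|png|Errors|0|'
)
Widths=() ; Heights=()
for i in "${!ArrayOfFiles[@]}"; do
    # Parse each element exactly once: field 3 is W, field 5 is H.
    IFS='|' read -r _ _ w _ h _ <<< "${ArrayOfFiles[$i]}"
    Widths[$i]=$w
    Heights[$i]=$h
done
# All later comparisons use ${Widths[$i]} and ${Heights[$i]} directly,
# instead of re-extracting them from the full string every time.
```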
Rule of Representation: Fold knowledge into data so program logic can be stupid and robust.
EDIT: as for awk and sed slowing down the script "very much" - it's because you are using them completely wrong. Don't launch a new awk subprocess for each variable of interest, for each element of the array, for every other element of the array (which would mean launching around 15000*15000*2 subprocesses). Use one awk process to preprocess the input into a format that can be easily compared, then go through that input only once.
This is what I am using to fill the array in the script from a file:
Is zsh faster than bash?
Does it have better ways to handle arrays?
Is programming much different in them?
I have no idea whether zsh is any faster or slower. zsh is somewhat more powerful in array handling, and has a lot more sugar in e.g. its parameter expansion stuff, but these are really just conveniences to me. zsh is a good interactive shell, but spending time learning to program in it is not all that useful, certainly not before you're comfortable with the basic, portable shell stuff -- learn bash first. Also, don't shun external tools: they're often very efficient at what they do, and you can be sure that a bad implementation in shell will be slower than using external specialty tools.
ArrayFillCount=0
Count=0
while read line ; do
    ArrayFillCount=$((ArrayFillCount+1))
    ArrayOfFiles[$ArrayFillCount]="$line"
done < myinfo.txt
And I wrote above what each string looks like.
if [[ "$Width" -eq "$W" && "$Height" -eq "$H" ]]; then
    TotalDupes=$((TotalDupes+1))
fi
It compares only the W and H numbers from the whole line.
And this is the way I separate the parts of a line:
TMPA="${ArrayOfFiles[$DupCount]}";
TMPwB="${TMPA/|H|*/}";
W=${TMPwB/*W|/}
TMPhB="${TMPA/|Format|*/}";
H="${TMPhB/*|/}"
Instead of other external programs or commands that write output to the display, like:
# Does the same as above.
# This is an example of extracting the values from a string in the array using echo and awk:
W=$(echo "./Sorted/jpeg/ImgSize_W_119_H_170/all-web-images_50934_f124382360.jpg|W|119|H|170|Format|jpeg|Errors|0|" | awk -F'|' '{print $3}')
H=$(echo "./Sorted/jpeg/ImgSize_W_119_H_170/all-web-images_50934_f124382360.jpg|W|119|H|170|Format|jpeg|Errors|0|" | awk -F'|' '{print $5}')
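To make those parameter expansions concrete, here is what each step yields for the sample string above:

```shell
TMPA='./Sorted/jpeg/ImgSize_W_119_H_170/all-web-images_50934_f124382360.jpg|W|119|H|170|Format|jpeg|Errors|0|'
TMPwB="${TMPA/|H|*/}"        # everything before "|H|":      ...jpg|W|119
W=${TMPwB/*W|/}              # strip through the last "W|":  119
TMPhB="${TMPA/|Format|*/}"   # everything before "|Format|": ...|W|119|H|170
H="${TMPhB/*|/}"             # strip through the last "|":   170
echo "${W}x${H}"             # prints 119x170
```

Note this only works because the literal substrings `|H|`, `W|`, and `|Format|` never occur inside the path itself (the path uses `_W_` and `_H_`).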
Is zsh faster than bash?
Does it have better ways to handle arrays?
Is programming much different in them?
Again, like 2ManyDogs, I assume you want to remove duplicates; another way to do that:
#!/bin/bash
mapfile -t array < <(printf '%s\n' "${array[@]}"|sort -u)
or if you can use zsh, simply use
${(u)array}
or
declare -Ua array
array=(... ...)
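A quick sanity check of the bash `mapfile` approach, using a throwaway array:

```shell
array=( 1 2 3 4 5 7 2 6 7 )
mapfile -t array < <(printf '%s\n' "${array[@]}" | sort -u)
echo "${array[@]}"   # prints: 1 2 3 4 5 6 7
```

The duplicates (2 and 7) are gone, at the cost of the array coming back sorted rather than in its original order.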
A search for "bash find duplicates in an array" gave me this page: http://stackoverflow.com/questions/2205 … ment-array
and this code does work to show the duplicate entries in my test array, but I'm not sure it is what you want, or that it will work with your array:
#!/bin/bash
array=( 1 2 3 4 5 7 2 6 7 )
printf '%s\n' "${array[@]}"|awk '!($0 in seen){seen[$0];next} 1'
There are many people here with more awk and bash skills, so if you can be a little more precise about what you have and what you want, I'm sure someone can give you a better answer. But in the meantime, there are many web pages with examples that may help you.
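For the test array above, the awk filter prints each value the second (and any later) time it appears:

```shell
array=( 1 2 3 4 5 7 2 6 7 )
printf '%s\n' "${array[@]}" | awk '!($0 in seen){seen[$0];next} 1'
# prints:
# 2
# 7
```

First occurrences are recorded in `seen` and skipped by `next`; anything already in `seen` falls through to the bare `1`, which prints the line.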
while [ $ZZ != $TotalItems ]; do
    TMPA="${ArrayOfFiles[$ZZ]}"
    TMPwB="${TMPA/|H|*/}"
    Width=${TMPwB/*W|/}
    TMPhB="${TMPA/|Format|*/}"
    Height="${TMPhB/*|/}"
    DupCount="0"
    while [ $DupCount != $TotalItems ]; do
        TMPA="${ArrayOfFiles[$DupCount]}"
        TMPwB="${TMPA/|H|*/}"
        W=${TMPwB/*W|/}
        TMPhB="${TMPA/|Format|*/}"
        H="${TMPhB/*|/}"
        if [[ "$Width" -eq "$W" && "$Height" -eq "$H" ]]; then
            TotalDupes=$((TotalDupes+1))
        fi
        DupCount=$((DupCount+1))
    done
    CollectDupes[$ZZ]="${ArrayOfFiles[$ZZ]}Duplicates|$TotalDupes"
    #echo ${CollectDupes[$ZZ]} >> /tmp/tmpXX.txt
    ZZ=$((ZZ+1))
done
Here are the time stamps:
19:17:59
DONE FILL IN ARRAY
19:18:00
Total: 15210
21:53:25
It took two and a half hours (19:18:00 to 21:53:25) just to calculate; bash used only one of four cores, at 100%. I also used date '+%T' to get the time for each task.