You are not logged in.

#1 2013-04-23 18:26:50

instantaphex
Member
Registered: 2012-07-07
Posts: 67

[SOLVED] Script to concatenate csv fields by a common column

I have a csv file that has a list of many URLs in one column as well as an adjacent column containing the email address of the owner pulled from the whois data.  Many of the URLs have the same owner.  What I am trying to do is to concatenate all of the URLs into a single csv cell and have just one email address per list of URLs.  Here is the (horribly messy and ugly) code I have so far.

#!/bin/bash
lastemail=$(head -1 merge_csv.csv | awk -F, '{print $2}')
isfirst="yes"
while read line
do
	email=$(echo $line | awk -F, '{print $2}')
	source=$(echo $line | awk -F, '{print $1}')
	if [[ "$email" = "$lastemail" ]]
	then
		if [[ $isfirst = "yes" ]]
		then
			echo -n \""$source" >> results.csv
			isfirst="no"
		else
			echo -n ^M"$source" >> results.csv
		fi
	else
		echo  -n \",\""$lastemail"\"^M >> results.csv
		echo -n \""$source" >> hope.csv
		lastemail=$(echo $line | awk -F, '{print $2}')
		isfirst="yes"
	fi
done < merge_csv.csv

It works for the very first row and then from there on it lists all the URLs in their own cell with no email address until the very last one.  The strange thing is that when I take a look at the csv file in vim, they seem to be formatted exactly as I expected them to be.  Like this:

"http://www.example.com/1
http://www.example.com/2
http://www.example.com/3
http://www.example.com/4
http://www.example.com/5", "owner@example.com"

I suspect that there is an aspect to the csv format that I'm not quite getting.  Why does it work on the first group of entries but not on the next 4000 when the formatting is exactly the same?  Should I be doing the linefeed with \n instead?

Thanks!

Last edited by instantaphex (2013-04-29 18:03:04)

Offline

#2 2013-04-25 09:17:14

ap0calypse
Member
From: Austria
Registered: 2012-03-12
Posts: 54
Website

Re: [SOLVED] Script to concatenate csv fields by a common column

Does it have to be bash? This would be easier in Perl ....

EDIT: It would also be quite helpful, if you could post at least some lines from the text you want to parse, just to make sure everyone understands your problem smile

EDIT2: Nevermind.... if you want a non-bash version, here it is:

#!/usr/bin/perl
use strict;

my %h;
for (<>) {
    my ($url, $admin) = ((split /,/, $_)[0], (split /,/, $_)[1]);
    chomp($url, $admin);
    push @{$h{$admin}}, $url;
}
for (keys %h) {
    print '"', join (" ", @{$h{$_}}) . '","' . $_ . '"' . "\n";
}

Maybe it helps.

EDIT3 (and last) :
sorry, forgot to tell you that my version works on STDIN. If you want to test it you would have to call it like this: "cat your_url_list.txt | perl myprog > result.csv"

Last edited by ap0calypse (2013-04-26 10:46:16)


projects: whistle | bazinga
open minds, open sources, open future.

Offline

#3 2013-04-25 17:20:41

instantaphex
Member
Registered: 2012-07-07
Posts: 67

Re: [SOLVED] Script to concatenate csv fields by a common column

Thanks for the reply.  To answer your question, no, it doesn't have to be in bash. 

I wasn't quite able to get your script to do what I was looking for.  Let me give a bit more information so that you can help me see where I've gone wrong.  What I have is a csv file that (in vim) looks like this:

http://www.example.com/1, owner@example.com
http://www.example.com/2, owner@example.com
http://www.example.com/3, owner@example.com
http://www.example.com/4, owner@example.com
http://www.otherexample.com/1, owner@otherexample.com
http://www.otherexample.com/2, owner@otherexample.com
http://www.otherexample.com/3, owner@otherexample.com
http://www.otherexample.com/4, owner@otherexample.com

and so it goes... for several thousand lines.  What I am trying to do, is to get all of the URLs from the same owner to be rendered in the same cell when I open the csv file in excel (I later intend to do a mail merge to email all of the owners so I need to use excel).  To figure out what excel was looking for, I opened excel initially and created a simple csv file with more or less the type of format I wanted, then I opened it in vim to check out what was going on with it.  What excel spit out was this:

"http://www.example.com/1
http://www.example.com/2
http://www.example.com/3",owner@example.com

There are a few gotchas with some of these URLs that I noticed while looking through the output of your perl script.  Some of the URLs contain query strings with spaces in them:

http://www.example.com/index.php?id=this has spaces

The perl script you provided ends up lumping the things after the space into the email address.

Another problem that I'm finding is that I'm not entirely sure what excel is looking for in the csv file in order to put all of the URLs from the same owner into one cell.  The bash script I provided in the OP actually does (or in its first iteration... did.  I then switched to enclosing the email address in quotes as well.  This had no effect) exactly what excel gave me (from the example above).  For some reason excel doesn't render it like I was looking for.  Maybe I'm missing something in the csv format?

Offline

#4 2013-04-25 18:54:10

briest
Member
From: Katowice, PL
Registered: 2006-05-04
Posts: 468

Re: [SOLVED] Script to concatenate csv fields by a common column

OMG. Awk:

awk -F, '{a[$2]=(a[$2] " " $1)}END{for (i in a) { sub(/^ /,"\"",a[i]); print a[i] "\"," i}}' /path/to/file

(unless there are commas in URLs)

Last edited by briest (2013-04-25 18:57:30)

Offline

#5 2013-04-25 19:12:57

briest
Member
From: Katowice, PL
Registered: 2006-05-04
Posts: 468

Re: [SOLVED] Script to concatenate csv fields by a common column

MySQL wink

create temporary table addr (urls text, email varchar(255)); 
load data infile '/path/to/file' into table addr FIELDS TERMINATED BY ',';
select group_concat(urls separator " "), email group by email into outfile '/path/to/outfile' fields terminated by ',' optionally enclosed by '"';

Offline

#6 2013-04-26 10:45:34

ap0calypse
Member
From: Austria
Registered: 2012-03-12
Posts: 54
Website

Re: [SOLVED] Script to concatenate csv fields by a common column

aaah, ok i thought your stuff was separated by spaces in your initial textfile, now we are getting closer. smile

I modified my script from the above posting, it should work now smile

Last edited by ap0calypse (2013-04-26 11:16:01)


projects: whistle | bazinga
open minds, open sources, open future.

Offline

#7 2013-04-29 18:02:48

instantaphex
Member
Registered: 2012-07-07
Posts: 67

Re: [SOLVED] Script to concatenate csv fields by a common column

I really appreciate all of your help.  The problem seems to be with excel and csv files.  The original script that I wrote does what I expected but it doesn't render correctly in excel.  I'm going to mark this as solved.

Offline

Board footer

Powered by FluxBB