#1 2010-01-16 11:44:07

milomouse
Member
Registered: 2009-03-24
Posts: 940

SH: compare and remove found instances

Forewarning: I'm a scripting beginner and haven't posted in this section before, so I hope I can describe this properly. I'm trying to accomplish this in shell (be it sh, bash, or zsh)...

Basically, I'm trying to compare output2 against output1 and, if anything from output2 is found in output1, remove it from output1. So if I'm checking for something I don't want in output1, I'll use output2 as a reference, and if output1 contains, say, two values from output2, both will be removed. Example:

"output1" (good values) | "output2" (bad values)
-------------------------------------------------
good mouse              | bad cat
good mouse X            | bad cat X
bad cat XXX             | bad cat XX
good mouse XX           | bad cat XXX
bad cat                 | bad cat XXXX

output1 is compared against output2 as a reference, and two entries are found that aren't supposed to be there ("bad cat XXX" and "bad cat"), regardless of position in either list. Of course, when I compare the real data it won't be as simple as everything starting with the same letters/words. I'm going to be comparing URLs from two outputs.

I know I can "diff" them like: diff <(output1) <(output2) but like I said they won't be in order and they can't be sorted alphabetically because both lists will usually have different values. I'm just trying to remove a value from one list if it's found in both.

I've found a simple way of checking a single output for duplicates but this only removes the duplicates and not every instance.

awk ' !x[$0]++' file
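
A possible extension of the same awk idea, offered as a sketch rather than a tested solution for the real URL lists: read the "bad" list first, remember every line, then print only the lines of the "good" list that were not seen. The scratch directory and the file names output1/output2 are assumptions made for the demo; the sample data is the mouse/cat table above.

```shell
#!/bin/sh
# Demo data in a scratch directory (hypothetical file names).
dir=$(mktemp -d)
printf '%s\n' 'good mouse' 'good mouse X' 'bad cat XXX' 'good mouse XX' 'bad cat' > "$dir/output1"
printf '%s\n' 'bad cat' 'bad cat X' 'bad cat XX' 'bad cat XXX' 'bad cat XXXX' > "$dir/output2"

# While reading the first file (NR == FNR), store each line as an array key.
# For the second file, print only lines that are not among those keys.
filtered=$(awk 'NR == FNR { bad[$0]; next } !($0 in bad)' "$dir/output2" "$dir/output1")

printf '%s\n' "$filtered"
rm -r "$dir"
```

Unlike diff, this needs no sorting and keeps the original order of output1. One caveat: the NR == FNR test misfires if the first file is empty, because the second file's lines are then treated as the "bad" list.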

Also, I hate using temp files. Maybe I could pipe both outputs to fifo? Eh, that's kinda the same thing and not really my main issue.
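
Process substitution, the same <(...) syntax used with diff above, is one way to avoid both temp files and hand-made FIFOs. The sketch below assumes bash or zsh (plain sh does not have it); the two functions are hypothetical stand-ins for whatever commands really print the URL lists.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the two real commands that print URL lists.
potential_urls() { printf '%s\n' 'http://a.example/' 'http://b.example/' 'http://c.example/'; }
rejected_urls()  { printf '%s\n' 'http://c.example/' 'http://z.example/'; }

# Each <(...) gives grep a readable file name connected to that command's output.
# -F literal strings, -x whole-line match, -v invert, -f read patterns from a file.
confirmed=$(grep -Fxvf <(rejected_urls) <(potential_urls))

printf '%s\n' "$confirmed"
```

The shell sets up and tears down the pipes itself, so there is nothing to mkfifo or rm afterwards.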

So, to make things easier to understand: I'm going to be dumping two outputs of URLs and comparing them. One list will have "potential URLs" and the other will have "definitely not". By comparing the two I can tell whether a "potential URL" is a "definitely not", and if so remove it from the "potential" list. The end result will be a single output of "potentials" that were confirmed good (as they weren't in the "definitely not" output). There's more to the grand scheme, but I've already figured out how to prepend and append things to every line (or N lines) and organize the whole mess; I just haven't figured out this part, so any help is appreciated, even if it's just a link for me to read. Be gentle. :|

Last edited by milomouse (2010-01-16 11:48:41)


#2 2010-01-16 13:19:47

tlvb
Member
From: Sweden
Registered: 2008-10-06
Posts: 297

Re: SH: compare and remove found instances

This snippet outputs lines from output1 that do not (-v) match any whole line (-x) in the file (-f) output2.

grep -xvf output2 output1
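
Since the real lists hold URLs, adding -F may be worth considering: without it, every line of output2 is treated as a regular expression, so a "." in a host name matches any character. The sketch below uses made-up URLs and a scratch directory (both assumptions for the demo) to show the difference.

```shell
#!/bin/sh
# Demo data in a scratch directory (made-up URLs, hypothetical file names).
dir=$(mktemp -d)
printf '%s\n' 'http://a.example/' 'http://aXexample/' 'http://b.example/' > "$dir/output1"
printf '%s\n' 'http://a.example/' > "$dir/output2"

# Patterns read as regular expressions: "." matches any character,
# so the unrelated line http://aXexample/ is removed as well.
as_regex=$(grep -xvf "$dir/output2" "$dir/output1")

# Patterns read as fixed strings (-F): only the exact line is removed.
as_literal=$(grep -Fxvf "$dir/output2" "$dir/output1")

printf 'regex patterns leave:\n%s\n' "$as_regex"
printf 'fixed strings leave:\n%s\n' "$as_literal"
rm -r "$dir"
```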

Last edited by tlvb (2010-01-16 14:29:19)


I need a sorted list of all random numbers, so that I can retrieve a suitable one later with a binary search instead of having to iterate through the generation process every time.


#3 2010-01-16 15:02:32

milomouse
Member
Registered: 2009-03-24
Posts: 940

Re: SH: compare and remove found instances

Ooh, a nice cat. :D Yeah, that's what I've been using (also used to be a part of my zsh prompt) but I was trying to find a cleaner solution because I generally abuse grep, cat, awk and pipes. I'm horrible for it. Here's a dirty mock-up of what I'm trying to accomplish:

function mrr() {
  local env=/tmp/newmirrors
  local fifo1=/tmp/fifo1
  local fifo2=/tmp/fifo2
  mkfifo /tmp/fifo{1,2}
  links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=lastsync;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`/" >! $fifo1 &
  links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=avgtime;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`/" >! $fifo2 &
  grep -xvf $fifo2 <$fifo1|grep -vi i686|head -n 5|sed 's/^/Server = /;s/$//' >! $env
  rm /tmp/fifo{1,2} &>/dev/null
  rankmirrors -n 5 $env >! /tmp/good
  rm $env ; sudo mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist-old
  sudo mv /tmp/good /etc/pacman.d/mirrorlist
  }

So, yeah. Basically a crappy attempt at what I believe Xyne's Perl Reflector is doing (although I can't read enough Perl to tell). I just wanted something I understood.

Anyway, I hate how I use FIFOs as tmp files, but I didn't know how else to do it. Also, I had to add the bit about i686 because some servers on the list are i686-only and specify it in their URL, so I had to remove them (since I have x86_64). Oh, and the use of "sudo", blech. I guess it works though. Basically it grabs a list of the most recently updated servers and then weeds out the servers with bad response time (if they're on that list, which a few were when I last tested this). Rankmirrors surprisingly yields different results despite the response time. I guess I could use ping instead of Rankmirrors, but I'll do that in the script, not this function. This is the basic reason I needed to compare them, which I hope to expand upon later. I'm sure there's an easier way to do all this but I simply lack the know-how. Again, this is a temporary function I have in my .zsh* file; the script will be different [if I ever finish it].

edit: typo

Last edited by milomouse (2010-01-16 15:31:18)


#4 2010-01-16 16:43:31

jac
Member
From: /home/jac
Registered: 2009-05-19
Posts: 431

Re: SH: compare and remove found instances

Could comm help you out at all? The inputs have to be sorted, but I don't think that is the same problem you were having with diff. Also, it uses input files, so...
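
A minimal sketch of the comm approach, assuming made-up URLs and scratch files: both inputs are sorted first, then comm -23 drops the lines unique to the second file (-2) and the lines common to both (-3), leaving only what is unique to the first.

```shell
#!/bin/sh
# Demo data in a scratch directory (made-up URLs, hypothetical file names).
dir=$(mktemp -d)
printf '%s\n' 'http://c.example/' 'http://a.example/' 'http://b.example/' > "$dir/output1"
printf '%s\n' 'http://b.example/' 'http://z.example/' > "$dir/output2"

# comm requires sorted input; LC_ALL=C keeps sort and comm on the same byte-wise collation.
LC_ALL=C sort "$dir/output1" > "$dir/sorted1"
LC_ALL=C sort "$dir/output2" > "$dir/sorted2"

# Lines that appear only in the first list.
only_in_first=$(LC_ALL=C comm -23 "$dir/sorted1" "$dir/sorted2")

printf '%s\n' "$only_in_first"
rm -r "$dir"
```

The trade-off is that the result comes out in sorted order, so any ranking in the original list (for example by last sync time) is lost.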


#5 2010-01-17 05:35:36

milomouse
Member
Registered: 2009-03-24
Posts: 940

Re: SH: compare and remove found instances

Neat, I didn't know about comm. It seems to do about the same thing grep was doing, but I think I'm able to do a little more with it without grepping on both sides. Although, yes, I'm still stuck with my FIFOs! Oh well, I'll play around with this. :) Next I'mma look into something besides Links for dumping URLs. Maybe sed with the aid of something else...

Last edited by milomouse (2010-01-17 05:38:13)


#6 2010-01-17 11:08:03

tlvb
Member
From: Sweden
Registered: 2008-10-06
Posts: 297

Re: SH: compare and remove found instances

If you prefer comparing bash variables instead of files it would be possible to do something like:

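out1=$(cat output1); out2=$(cat output2) # as a reference to my previous example
# Quote both variables so their newlines survive; -e passes each line of out2
# as a separate pattern, and -F matches those patterns literally.
printf '%s\n' "$out1" | grep -Fxv -e "$out2"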

Last edited by tlvb (2010-01-17 11:17:07)



#7 2010-01-17 13:10:37

milomouse
Member
Registered: 2009-03-24
Posts: 940

Re: SH: compare and remove found instances

Well, for the most part that works, but I'm having difficulties handling Links. I can't "cat" the output, and when I "echo" the Links output the URLs aren't on separate lines, so I have to append \n to each one so echo displays each URL on its own line. This causes a problem because the output will look like this for whatever reason:

http://URL.com
 http://URL.com
 http://URL.com

So there's this space in front of every line after the first one, and when they're compared to the other Links output with your example I get a final output of:

Server = http://URL.com
Server = 
Server = http://URL.com
Server = 
Server = http://URL.com

Basically I have this in my secondary test:

function mrr2() {
  local env=/tmp/newmirrors
  out1=$(links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=lastsync;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`\\\n/")
  out2=$(links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=avgtime;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`\\\n/")
  echo $out1|grep -xv $out2|grep -vi i686|head -n 5|sed 's/^/Server = /;s/$//' >! $env
}

I tried to use "echo -n" to kill that last empty newline but it still ends up with the two blank "Server = " lines. I think I ultimately need a different way to obtain the output of the websites instead of Links unless I'm missing something. The only other way to amend this I think is to [dumbly] remove every other line after comparison before giving the final output.
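
A likely explanation, offered as a sketch rather than a diagnosis of Links itself: in sh and bash an unquoted variable is split into words and echo joins them with single spaces, so the newlines captured by $(...) disappear. Appending \n then brings the line breaks back but leaves the joining space at the start of each following line. Quoting the variable keeps the original newlines, so no \n needs to be appended. The URLs below are made up for the demo.

```shell
#!/bin/sh
# Made-up stand-ins for the captured Links output; $repo is meant literally here.
out1='http://a.example/$repo/os/x86_64
http://b.example/$repo/os/x86_64
http://c.example/$repo/os/x86_64'
out2='http://b.example/$repo/os/x86_64
http://z.example/$repo/os/x86_64'

# Unquoted: sh and bash split on whitespace and echo joins the words with spaces,
# so everything lands on one line (zsh does not split unquoted variables by default).
flat=$(echo $out1)

# Quoted: newlines survive. -e passes each line of out2 as a separate literal pattern.
kept=$(printf '%s\n' "$out1" | grep -Fxv -e "$out2" | sed 's/^/Server = /')

printf '%s\n' "$kept"
```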


#8 2010-01-17 13:40:34

Bralkein
Member
Registered: 2004-10-26
Posts: 354

Re: SH: compare and remove found instances

Hi, I think this script more or less does what you want. It uses only one temp file at the end because it's kind of a PITA to do it without due to the way sudo works.

#!/bin/bash                                                                         
maybe_mirrors=$(links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=lastsync;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`/")                           
bad_mirrors=$(links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=avgtime;sort=1'|\   
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`/")
good_mirrors=()

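for i in $maybe_mirrors; do
  # Keep the URL unless it equals a whole line (-x) of the bad list, matched literally (-F).
  if ! grep -Fxq -- "$i" <<< "$bad_mirrors"; then
    good_mirrors+=( "$i" )
  fi
done

# printf emits one array element per line, so no "\n" has to be glued onto each URL.
good_mirrors_ranked=$(printf '%s\n' "${good_mirrors[@]}" | sed 's/^/Server = /' |\
  head -n 5 | rankmirrors -n 5 - )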

echo "$good_mirrors_ranked" > /tmp/good
sudo mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist-old
sudo mv /tmp/good /etc/pacman.d/mirrorlist

Edit: Made the for loop shorter/better

Last edited by Bralkein (2010-01-17 14:06:14)


#9 2010-01-18 03:17:07

milomouse
Member
Registered: 2009-03-24
Posts: 940

Re: SH: compare and remove found instances

Awesome, that works great. Thanks! :D After thinking it over, I figured it was better to remove "sudo" from a script that can be run as a user; this also removes the need for a temp file. (I also added the bit about i686 back, since I personally still avoid that.) I did make a small change to check whether the file actually exists, because while I was testing different methods I noticed it would always try to move the mirrorlist file even if it wasn't there. Anyway, in order to run the script I decided to make sure whoever was doing it had root privileges, since it's messing with system (pacman) stuff anyway. Who knows the terrors someone could inflict if they change your servers to something else. I'm still going to investigate the URL dumping without "Links" and possibly a rankmirrors replacement, if only for learning purposes. I guess I'll dig through "dnsutils" or something... maybe some more googling. I hope this isn't out of the range of sh/bash.

#!/bin/bash
if [ "$(id -u)" -eq 0 ]; then  # effective UID 0 means root; $USER can be stale or overridden
maybe_mirrors=$(links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=lastsync;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`/")
bad_mirrors=$(links -dump 'https://www.archlinux.de/?page=MirrorStatus;orderby=avgtime;sort=1'|\
  grep "tp:/"|head -n 10|awk '{print $1}'|sed "s/$/\$repo\/os\/`arch`/")
good_mirrors=()

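for i in $maybe_mirrors; do
  # Keep the URL unless it equals a whole line (-x) of the bad list, matched literally (-F).
  if ! grep -Fxq -- "$i" <<< "$bad_mirrors"; then
    good_mirrors+=( "$i" )
  fi
done

# printf emits one array element per line, so no "\n" has to be glued onto each URL.
good_mirrors_ranked=$(printf '%s\n' "${good_mirrors[@]}" | grep -vi i686 | sed 's/^/Server = /' |\
  head -n 5 | rankmirrors -n 5 - )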

if [ -f /etc/pacman.d/mirrorlist ]; then
  mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist-old
fi
echo "$good_mirrors_ranked" > /etc/pacman.d/mirrorlist
else
  echo "User \"$USER\" does not have the required privileges to run this script."
fi

edit: typo

Last edited by milomouse (2010-01-18 03:19:53)
