Pages: 1
Hey guys,
I know there are already a lot of AUR helpers out there, but I'm doing this just for fun and to learn something.
My problem is that I don't know how to use the following output of my method.
Method
import sys
import urllib.request

def showSearchResults(keyword):
    # Fetch the AUR search page and print the raw response, one bytes line at a time.
    f = urllib.request.urlopen('http://aur.archlinux.org/packages.php?O=0&K='+keyword+'&do_Search=Los')
    for line in f:
        print(line)

searchPattern = sys.argv[1]
print('you were searching for: ' + searchPattern)
showSearchResults(searchPattern)
Result with Groovy
you were searching for: groovy
b'<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
b' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
b'<html xmlns="http://www.w3.org/1999/xhtml"\n'
b'\txml:lang="en" lang="en">\n'
b' <head>\n'
b' <title>AUR (en) - Search Criteria: groovy</title>\n'
b"\t<link rel='stylesheet' type='text/css' href='css/fonts.css' />\n"
b"\t<link rel='stylesheet' type='text/css' href='css/containers.css' />\n"
b"\t<link rel='stylesheet' type='text/css' href='css/arch.css' />\n"
b"\t<link rel='stylesheet' type='text/css' href='css/archnavbar/archnavbar.css' />\n"
b"\t<link rel='shortcut icon' href='images/favicon.ico' />\n"
b"\t<link rel='alternate' type='application/rss+xml' title='Newest Packages RSS' href='rss.php' />\n"
b'\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
b' </head>\n'
b'\t<body>\n'
b'\t\t<div id="archnavbar" class="anb-aur">\n'
b'\t\t\t<div id="archnavbarlogo"><h1><a href="/" title="Return to the main page">Arch Linux</a></h1></div>\n'
b'\t\t\t<div id="archnavbarmenu">\n'
b'\t\t\t\t<ul id="archnavbarlist">\n'
b'\t\t\t\t\t<li id="anb-home"><a href="http://www.archlinux.org/" title="Arch news, packages, projects and more">Home</a></li>\n'
b'\t\t\t\t\t<li id="anb-packages"><a href="http://www.archlinux.org/packages/" title="Arch Package Database">Packages</a></li>\n'
b'\t\t\t\t\t<li id="anb-forums"><a href="https://bbs.archlinux.org/" title="Community forums">Forums</a></li>\n'
b'\t\t\t\t\t<li id="anb-wiki"><a href="https://wiki.archlinux.org/" title="Community documentation">Wiki</a></li>\n'
b'\t\t\t\t\t<li id="anb-bugs"><a href="https://bugs.archlinux.org/" title="Report and track bugs">Bugs</a></li>\n'
b'\t\t\t\t\t<li id="anb-aur"><a href="https://aur.archlinux.org/" title="Arch Linux User Repository">AUR</a></li>\n'
b'\t\t\t\t\t<li id="anb-download"><a href="http://www.archlinux.org/download/" title="Get Arch Linux">Download</a></li>\n'
b'\t\t\t\t</ul>\n'
b'\t\t\t</div>\n'
b'\t\t</div><!-- #archnavbar -->\n'
b'\n'
b'\t\t<div id="archdev-navbar">\n'
b'\t\t\t<ul>\n'
b'\t\t\t\t<li><a href="index.php">AUR Home</a></li>\n'
b'\t\t\t\t<li><a href="account.php">Accounts</a></li>\n'
b'\t\t\t\t<li><a href="packages.php">Packages</a></li>\n'
b'\t\t\t\t<li><a href="http://bugs.archlinux.org/index.php?tasks=all&project=2">Bugs</a></li>\n'
b'\t\t\t\t<li><a href="http://archlinux.org/mailman/listinfo/aur-general">Discussion</a></li>\n'
b'\t\t\t\t\t\t\t</ul>\n'
b'\t\t</div><!-- #archdev-navbar -->\n'
b'\n'
b'\t\t<div id="login_bar" class="pgbox">\n'
b"<span class='error'>\n"
b'\tHTTP login is disabled. Please <a href="https://aur.archlinux.org/packages.php?O=0&K=groovy&do_Search=Los">switch to HTTPs</a> if you want to login.</span>\n'
b'</div>\n'
b'\n'
b'\t<div id="lang_sub">\n'
b'<a href="/packages.php?setlang=ca" title="Catal\xc3\xa0">ca</a>\n'
b'<a href="/packages.php?setlang=cs" title="\xc4\x8desky">cs</a>\n'
b'<a href="/packages.php?setlang=da" title="Dansk">da</a>\n'
b'<a href="/packages.php?setlang=de" title="Deutsch">de</a>\n'
b'<a href="/packages.php?setlang=en" title="English">en</a>\n'
b'<a href="/packages.php?setlang=el" title="\xce\x95\xce\xbb\xce\xbb\xce\xb7\xce\xbd\xce\xb9\xce\xba\xce\xac">el</a>\n'
b'<a href="/packages.php?setlang=es" title="Espa\xc3\xb1ol">es</a>\n'
b'<a href="/packages.php?setlang=fi" title="Finnish">fi</a>\n'
b'<a href="/packages.php?setlang=fr" title="Fran\xc3\xa7ais">fr</a>\n'
b'<a href="/packages.php?setlang=he" title="\xd7\xa2\xd7\x91\xd7\xa8\xd7\x99\xd7\xaa">he</a>\n'
b'<a href="/packages.php?setlang=hr" title="Hrvatski">hr</a>\n'
b'<a href="/packages.php?setlang=hu" title="Magyar">hu</a>\n'
b'<a href="/packages.php?setlang=it" title="Italiano">it</a>\n'
b'<a href="/packages.php?setlang=nb_NO" title="Norsk">nb_no</a>\n'
b'<a href="/packages.php?setlang=nl" title="Dutch">nl</a>\n'
b'<a href="/packages.php?setlang=pl" title="Polski">pl</a>\n'
b'<a href="/packages.php?setlang=pt" title="Portugu\xc3\xaas">pt</a>\n'
b'<a href="/packages.php?setlang=pt_BR" title="Portugu\xc3\xaas (Brasil)">pt_br</a>\n'
b'<a href="/packages.php?setlang=ro" title="Rom\xc3\xa2n\xc4\x83">ro</a>\n'
b'<a href="/packages.php?setlang=ru" title="\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9">ru</a>\n'
b'<a href="/packages.php?setlang=sr" title="Srpski">sr</a>\n'
b'<a href="/packages.php?setlang=tr" title="T\xc3\xbcrk\xc3\xa7e">tr</a>\n'
b'<a href="/packages.php?setlang=uk" title="\xd0\xa3\xd0\xba\xd1\x80\xd0\xb0\xd1\x97\xd0\xbd\xd1\x81\xd1\x8c\xd0\xba\xd0\xb0">uk</a>\n'
b'<a href="/packages.php?setlang=zh_CN" title="\xe7\xae\x80\xe4\xbd\x93\xe4\xb8\xad\xe6\x96\x87">zh_cn</a>\n'
b'\t</div>\n'
b'\t<!-- Start of main content -->\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b"<div class='pgbox'>\n"
b"<form action='packages.php' method='get'>\n"
b"<div class='pgboxtitle'>\n"
b"\t<span class='f3'>Search Criteria</span>\n"
b"\t<input type='hidden' name='O' value='0' />\n"
b'\t<input type=\'text\' name=\'K\' size=\'30\' value="groovy" maxlength=\'35\' />\n'
b"\t<input type='submit' style='min-width:80px' class='button' name='do_Search' value='Go' />\n"
b'\t\t<a href="?O=0&K=groovy&do_Search=Los&PP=50&detail=1">Advanced</a>\n'
b'</div>\n'
b'\n'
b'\t\t\t</form>\n'
b'</div>\n'
b"\t<form action='packages.php?O=0&K=groovy&do_Search=Los' method='post'>\n"
b'\t\t<div class="pgbox">\n'
b'\t\t\t<div class="pgboxtitle">\n'
b"\t\t\t\t<span class='f3'>Package Listing</span>\n"
b'\t\t\t</div>\n'
b'\n'
b'\n'
b'\n'
b'\n'
b"<table width='100%' cellspacing='0' cellpadding='2'>\n"
b'<tr>\n'
b'\t\n'
b"\t<th style='border-bottom: #666 1px solid; vertical-align: bottom'><span class='f2'>\n"
b"\t\t<a href='?O=0&K=groovy&do_Search=Los&PP=50&SB=c&SO=d'>Category</a>\n"
b'\t</span></th>\n'
b"\t<th style='border-bottom: #666 1px solid; vertical-align: bottom; text-align: center;'><span class='f2'>\n"
b"\t\t<a href='?O=0&K=groovy&do_Search=Los&PP=50&SB=n&SO=d'>Name</a>\n"
b'\t</span></th>\n'
b"\t<th style='border-bottom: #666 1px solid; vertical-align: bottom'><span class='f2'>\n"
b"\t\t<a href='?O=0&K=groovy&do_Search=Los&PP=50&SB=v&SO=d'>Votes</a>\n"
b'\t</span></th>\n'
b'\n'
b"\t\t<th style='border-bottom: #666 1px solid; vertical-align: bottom; text-align: center;'><span class='f2'>Description</span></th>\n"
b"\t<th style='border-bottom: #666 1px solid; vertical-align: bottom'><span class='f2'>\n"
b"\t\t<a href='?O=0&K=groovy&do_Search=Los&PP=50&SB=m&SO=d'>Maintainer</a>\n"
b'\t</span></th>\n'
b'</tr>\n'
b'\n'
b'<tr>\n'
b"\t\t<td class='data1'><span class='f5'><span class='blue'>devel</span></span></td>\n"
b"\t<td class='data1'><span class='f4'><a href='packages.php?ID=43399'><span class='black'>gant 1.9.3-1</span></a></span></td>\n"
b'\t<td class=\'data1\' style="text-align: right"><span class=\'f5\'><span class=\'blue\'>1</span></span></td>\n'
b"\t\t<td class='data1'><span class='f4'><span class='blue'>\n"
b'\tA Groovy-based build system that uses Ant tasks, but no XML</span></span></td>\n'
b"\t<td class='data1'><span class='f5'><span class='blue'>\n"
b"\t\t<a href='packages.php?K=szym&SeB=m'>szym</a>\n"
b'\t\t</span></span></td>\n'
b'</tr>\n'
b'<tr>\n'
b"\t\t<td class='data2'><span class='f5'><span class='blue'>devel</span></span></td>\n"
b"\t<td class='data2'><span class='f4'><a href='packages.php?ID=12645'><span class='black'>grails 1.3.7-3</span></a></span></td>\n"
b'\t<td class=\'data2\' style="text-align: right"><span class=\'f5\'><span class=\'blue\'>73</span></span></td>\n'
b"\t\t<td class='data2'><span class='f4'><span class='blue'>\n"
b'\tGroovy on rails</span></span></td>\n'
b"\t<td class='data2'><span class='f5'><span class='blue'>\n"
b"\t\t<a href='packages.php?K=trontonic&SeB=m'>trontonic</a>\n"
b'\t\t</span></span></td>\n'
b'</tr>\n'
b'<tr>\n'
b"\t\t<td class='data1'><span class='f5'><span class='blue'>devel</span></span></td>\n"
b"\t<td class='data1'><span class='f4'><a href='packages.php?ID=294'><span class='black'>groovy 1.8.2-1</span></a></span></td>\n"
b'\t<td class=\'data1\' style="text-align: right"><span class=\'f5\'><span class=\'blue\'>135</span></span></td>\n'
b"\t\t<td class='data1'><span class='f4'><span class='blue'>\n"
b'\tGroovy is a Java based scripting language, similar to Python, Ruby and Smalltalk</span></span></td>\n'
b"\t<td class='data1'><span class='f5'><span class='blue'>\n"
b"\t\t<a href='packages.php?K=Musikolo&SeB=m'>Musikolo</a>\n"
b'\t\t</span></span></td>\n'
b'</tr>\n'
b'<tr>\n'
b"\t\t<td class='data2'><span class='f5'><span class='blue'>devel</span></span></td>\n"
b"\t<td class='data2'><span class='f4'><a href='packages.php?ID=31909'><span class='black'>groovy-docs 1.8.2-1</span></a></span></td>\n"
b'\t<td class=\'data2\' style="text-align: right"><span class=\'f5\'><span class=\'blue\'>6</span></span></td>\n'
b"\t\t<td class='data2'><span class='f4'><span class='blue'>\n"
b'\tDocumentation for the Groovy programming language.</span></span></td>\n'
b"\t<td class='data2'><span class='f5'><span class='blue'>\n"
b"\t\t<a href='packages.php?K=bruce&SeB=m'>bruce</a>\n"
b'\t\t</span></span></td>\n'
b'</tr>\n'
b'<tr>\n'
b"\t\t<td class='outofdate'><span class='f5'><span class='blue'>none</span></span></td>\n"
b"\t<td class='outofdate'><span class='f4'><a href='packages.php?ID=47672'><span class='black'>groovyserv 0.6-1</span></a></span></td>\n"
b'\t<td class=\'outofdate\' style="text-align: right"><span class=\'f5\'><span class=\'blue\'>1</span></span></td>\n'
b"\t\t<td class='outofdate'><span class='f4'><span class='blue'>\n"
b'\tGroovyServ makes Groovy's startup time much faster, by pre-invoking Groovy as a server.</span></span></td>\n'
b"\t<td class='outofdate'><span class='f5'><span class='blue'>\n"
b"\t\t<span style='color: blue; font-style: italic;'>orphan</span>\n"
b'\t\t</span></span></td>\n'
b'</tr>\n'
b'\n'
b'\t</table>\n'
b'</div> <!-- .pgbox ??! -->\n'
b'\n'
b'\n'
b'\t\t<div class="pgbox pkg_search_results_footer">\n'
b'\t\t\t<div class="legend_and_actions">\n'
b'\t\t\t\t<div class="legend">\n'
b"\t\t\t\t\t<span class='f3'>Legend</span>\n"
b'\t\t\t\t\t<span class="outofdate">Out of Date</span>\n'
b'\t\t\t\t</div>\n'
b'\t\t\t\t\t\t\t</div> <!-- .legend_and_actions -->\n'
b'\t\t\t<div class="page_links">\n'
b'\t\t\t\t<div class="f4 blue">\n'
b'\t\t\t\t\tShowing results 1 - 5 of 5\t\t\t\t</div>\n'
b'\t\t\t\t<div class="page_nav">\n'
b'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<span class="page_sel">1</span>\n'
b'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</div>\n'
b'\t\t\t</div> <!-- .page_links -->\n'
b'\t\t</div> <!-- .pgbox .pkg_search_results_footer -->\n'
b'\t</form>\n'
b'\n'
b'\t<!-- End of main content -->\n'
b'<div class="pgbox version">v1.9.0</div>\t</body>\n'
b'</html>\n'
Now I have no clue how to proceed. I am only interested in the results (in the Groovy example, the 5 shown here: http://aur.archlinux.org/packages.php?O … earch=Los)
Any ideas or clues? Maybe with some regular expressions?
Thanks in advance.
Last edited by hueck (2011-10-08 23:28:51)
Offline
Do NOT use regular expressions for this. They won't work reliably, because HTML and XML are nested and regular expressions cannot match arbitrarily nested structures. Never use regular expressions to parse HTML or XML (or any other syntax) unless you're using them to write some kind of tokenizer.
What you want is lxml: http://lxml.de/index.html
You can read more about lexers and parsers here to get a general overview of what lxml does: http://en.wikipedia.org/wiki/Lexical_analysis
It might also be helpful to do a bit of research on XML and understand how it works.
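To give a rough idea of what "use a parser" looks like in practice, here is a minimal sketch. It does not use lxml; it swaps in html.parser from Python's standard library so it runs without extra packages. The class name ResultRowParser and the function parse_results are made up for this example, and the sketch assumes the result cells keep the class names data1, data2 and outofdate seen in the output above.

```python
from html.parser import HTMLParser


class ResultRowParser(HTMLParser):
    """Collect the text of result cells, grouped per table row.

    Hypothetical helper: only <td> elements whose class is one of
    RESULT_CLASSES (taken from the page dump above) are kept.
    """

    RESULT_CLASSES = {"data1", "data2", "outofdate"}

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per result row
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            if dict(attrs).get("class") in self.RESULT_CLASSES:
                self._cell = []

    def handle_data(self, data):
        # Text inside nested <span> and <a> tags ends up here as well.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which have no result cells
                self.rows.append(self._row)
            self._row = None


def parse_results(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = ResultRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

To use it on the real page you would decode the response first, for example parse_results(f.read().decode('utf-8')), and then print each row. The parser keeps track of which tag it is inside, which is exactly the part a regular expression cannot do.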
Last edited by Nisstyre56 (2011-10-09 00:26:25)
In Zen they say: If something is boring after two minutes, try it for four. If still boring, try it for eight, sixteen, thirty-two, and so on. Eventually one discovers that it's not boring at all but very interesting.
~ John Cage
Offline
Try this: (see http://docs.python.org/py3k/library/xml … ttree.html)
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import sys
import xml.etree.ElementTree
import urllib.request

search_url = ''.join(['http://aur.archlinux.org/packages.php?O=0&K=', sys.argv[1], '&do_Search=Los'])
# Decode every line of the response and parse the page as XML.
dstr = [str(line, 'utf-8') for line in urllib.request.urlopen(search_url)]
element = xml.etree.ElementTree.fromstringlist(dstr)

oldv = ''
line = ''
# Result rows alternate their td class, so a change of class marks a new row.
for e in element.iter('{http://www.w3.org/1999/xhtml}td'):
    v = e.get('class')
    if v != oldv:
        if len(line) > 0:
            print(line)
        line = ''
    # Append the text of this cell to the current output line.
    for s in e.itertext():
        s = s.strip()
        if len(line) == 0:
            line = s
        else:
            line = ' '.join([line, s])
    oldv = v
print(line)
Last edited by rockin turtle (2011-10-09 06:32:21)
Offline
Thanks, I'm going to do some research and try it out.
Offline
Use the API instead of parsing the page, seriously.
[oh@Alice][~]% curl "https://aur.archlinux.org/rpc.php?type=search&arg=groovy" 2>/dev/null | json_pp
{
   "type" : "search",
   "results" : [
      {
         "URL" : "http://groovy.codehaus.org",
         "ID" : "294",
         "FirstSubmitted" : "1113416437",
         "Maintainer" : "Musikolo",
         "OutOfDate" : "0",
         "CategoryID" : "3",
         "License" : "BSD/Apache style licence",
         "URLPath" : "/packages/gr/groovy/groovy.tar.gz",
         "NumVotes" : "135",
         "Version" : "1.8.2-1",
         "Name" : "groovy",
         "Description" : "Groovy is a Java based scripting language, similar to Python, Ruby and Smalltalk",
         "LastModified" : "1315523364"
      },
      {
         "URL" : "http://grails.org/",
         "ID" : "12645",
         "FirstSubmitted" : "1187846781",
         "Maintainer" : "trontonic",
         "OutOfDate" : "0",
         "CategoryID" : "3",
         "License" : "Apache",
         "URLPath" : "/packages/gr/grails/grails.tar.gz",
         "NumVotes" : "73",
         "Version" : "1.3.7-3",
         "Name" : "grails",
         "Description" : "Groovy on rails",
         "LastModified" : "1305713109"
      },
      {
         "URL" : "http://groovy.codehaus.org",
         "ID" : "31909",
         "FirstSubmitted" : "1257899383",
         "Maintainer" : "bruce",
         "OutOfDate" : "0",
         "CategoryID" : "3",
         "License" : "APACHE",
         "URLPath" : "/packages/gr/groovy-docs/groovy-docs.tar.gz",
         "NumVotes" : "6",
         "Version" : "1.8.2-1",
         "Name" : "groovy-docs",
         "Description" : "Documentation for the Groovy programming language.",
         "LastModified" : "1315709403"
      },
      {
         "URL" : "http://gant.codehaus.org",
         "ID" : "43399",
         "FirstSubmitted" : "1289426860",
         "Maintainer" : "szym",
         "OutOfDate" : "0",
         "CategoryID" : "3",
         "License" : "APACHE",
         "URLPath" : "/packages/ga/gant/gant.tar.gz",
         "NumVotes" : "1",
         "Version" : "1.9.3-1",
         "Name" : "gant",
         "Description" : "A Groovy-based build system that uses Ant tasks, but no XML",
         "LastModified" : "1289426860"
      },
      {
         "URL" : "http://kobo.github.com/groovyserv/index.html",
         "ID" : "47672",
         "FirstSubmitted" : "1300877392",
         "Maintainer" : null,
         "OutOfDate" : "1",
         "CategoryID" : "1",
         "License" : "apache-ant",
         "URLPath" : "/packages/gr/groovyserv/groovyserv.tar.gz",
         "NumVotes" : "1",
         "Version" : "0.6-1",
         "Name" : "groovyserv",
         "Description" : "GroovyServ makes Groovy's startup time much faster, by pre-invoking Groovy as a server.",
         "LastModified" : "1300877392"
      }
   ]
}
[oh@Alice][~]%
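Since the original script is Python, the same call can be sketched with the standard library alone. This is only a sketch: it assumes the rpc.php endpoint and the JSON layout shown in the curl output above (all values as strings, "Maintainer" set to null for orphans), and the function names build_search_url, summarize and search are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as used in the curl call above.
RPC_URL = "https://aur.archlinux.org/rpc.php"


def build_search_url(keyword):
    """Build the search URL, escaping the keyword properly."""
    query = urllib.parse.urlencode({"type": "search", "arg": keyword})
    return RPC_URL + "?" + query


def summarize(response_text):
    """Turn the JSON text of a search response into one line per package.

    Assumes the field names seen in the output above.
    """
    data = json.loads(response_text)
    if data.get("type") != "search":
        # Anything other than a search result (e.g. an error) yields no lines.
        return []
    lines = []
    for pkg in data.get("results", []):
        maintainer = pkg["Maintainer"] or "orphan"
        flag = " (out of date)" if pkg["OutOfDate"] == "1" else ""
        lines.append("{} {} [{} votes, {}]{}: {}".format(
            pkg["Name"], pkg["Version"], pkg["NumVotes"],
            maintainer, flag, pkg["Description"]))
    return lines


def search(keyword):
    """Query the RPC interface over the network and return summary lines."""
    with urllib.request.urlopen(build_search_url(keyword)) as resp:
        return summarize(resp.read().decode("utf-8"))
```

Calling search(sys.argv[1]) and printing each returned line would replace the whole HTML parsing step. No parser for the page is needed, because json.loads already gives you a plain dict with a list of results.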
Last edited by Mr.Elendig (2011-10-15 13:26:58)
Evil #archlinux@libera.chat channel op and general support dude.
. files on github, Screenshots, Random pics and the rest
Offline