I've been visiting the #archlinux irc channel using weechat, and using the python program (script?) urlgrab.py to view any urls that are posted. That all works fine.
However, it doesn't recognize urls that are bracketed by <>, like this:
I located where it finds a url and modified it to recognize these as urls. Here's the code:
def urlGrabCheckMsgline(server, chan, message):
    # Ignore output from 'tinyurl.py'
    if message.startswith("[AKA] http://tinyurl.com"):
        return weechat.PLUGIN_RC_OK
    # Check for URLs
    for word in message.split(" "):
        # Parentheses instead of backslashes: a comment line cannot sit
        # between backslash-continued lines, but it can inside parentheses.
        if (word[0:7] == "http://" or
                word[0:8] == "https://" or
                # try this 3/22/08 - phrik places urls within <>
                word[0:8] == "<http://" or
                word[0:9] == "<https://" or
                # end
                word[0:6] == "ftp://"):
            urlGrab.addUrl(word, chan, server)
But when they get passed to Firefox, they still have the <> around them. How do I strip those off?
I don't know if this is the best way to remove the <>, but to remove the first and last characters of a string:
a[1:-1]
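Slicing with a[1:-1] always drops two characters, so applied blindly it would also clip plain URLs such as http://example.com/. A minimal guarded version, assuming you only want to strip a pair of brackets when both are actually present (the helper name strip_angle_brackets is hypothetical):

```python
def strip_angle_brackets(word):
    """Remove one surrounding pair of <> only if both ends are present."""
    # a[1:-1] on its own would also chop the first and last characters
    # of an unbracketed URL, so check both ends before slicing.
    if word.startswith("<") and word.endswith(">"):
        return word[1:-1]
    return word


print(strip_angle_brackets("<http://example.com/>"))  # http://example.com/
print(strip_angle_brackets("http://example.com/"))    # http://example.com/
```

A shorter alternative is word.strip("<>"), which removes any run of leading or trailing < and > characters rather than exactly one matched pair.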
Hi,
I would simply use re:
import re
grab_url = re.compile(r'((https?://|ftp://|www\.)[-A-Za-z/.?_=&0-9#]*)', re.I)
vlad
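One way to apply that pattern to a whole message, sketched for Python 3. The pattern has two groups, so re.findall would return tuples; iterating with finditer and taking group(1) gives plain strings instead. The wrapper name find_urls is hypothetical, not part of urlgrab.py:

```python
import re

# DonVla's pattern, unchanged. The outer group is the whole URL; the inner
# group only records which prefix ("http://", "www.", ...) matched.
grab_url = re.compile(r'((https?://|ftp://|www\.)[-A-Za-z/.?_=&0-9#]*)', re.I)


def find_urls(message):
    # finditer scans the whole string, so a leading "<" is simply skipped,
    # and ">" is not in the character class, so each match stops before it.
    return [m.group(1) for m in grab_url.finditer(message)]


print(find_urls("see <http://example.com/page?id=3> and www.archlinux.org"))
# ['http://example.com/page?id=3', 'www.archlinux.org']
```

Because the brackets never become part of the match, no separate stripping step is needed with this approach.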
DonVla's regex does a pretty reasonable job; you'd do well to get into re syntax. I don't think it's spot on, though. For example, what about URLs with % symbols in them? And do Unicode URLs come into play at all?
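To make the % point concrete: the character class has no %, so a percent-encoded URL is cut short at the first escape. The widened class below is only one possible tweak (an assumption, not something proposed in the thread); it also still matches ASCII letters only, so it does nothing for Unicode URLs:

```python
import re

grab_url = re.compile(r'((https?://|ftp://|www\.)[-A-Za-z/.?_=&0-9#]*)', re.I)

# '%' is not in the character class, so the match stops right before it.
print(grab_url.search("http://example.com/some%20page").group(1))
# http://example.com/some

# Hypothetical variant: add '%', '~', '+' and ':' (e.g. for host:port).
grab_url_pct = re.compile(r'((https?://|ftp://|www\.)[-A-Za-z/.?_=&0-9#%~+:]*)', re.I)
print(grab_url_pct.search("http://example.com/some%20page").group(1))
# http://example.com/some%20page
```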
If we stick with the original design, which simply checks whether the start of each word follows a URL pattern, then we could do something like this:
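import re

for word in message.split(" "):
    # phrik wraps URLs in <...>, so drop the brackets first
    if word.startswith('<') and word.endswith('>'):
        word = word[1:-1]
    # re.match only checks the beginning of the string
    if re.match(r'(https?|ftp)://', word):
        urlGrab.addUrl(word, chan, server)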
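To try the loop outside weechat, it can be wrapped in a function that collects the words instead of calling urlGrab.addUrl(). The function name urls_in and the returned list are hypothetical stand-ins for the plugin call:

```python
import re

URL_START = re.compile(r'(https?|ftp)://')


def urls_in(message):
    """Return the words arooaroo's loop would hand to urlGrab.addUrl()."""
    found = []  # stands in for urlGrab.addUrl(word, chan, server)
    for word in message.split(" "):
        # Strip one surrounding <...> pair, as phrik posts URLs that way.
        if word.startswith('<') and word.endswith('>'):
            word = word[1:-1]
        # match() anchors at the start of the word only.
        if URL_START.match(word):
            found.append(word)
    return found


print(urls_in("phrik: <https://wiki.archlinux.org/> and ftp://ftp.archlinux.org"))
# ['https://wiki.archlinux.org/', 'ftp://ftp.archlinux.org']
```

Note that a bracketed URL followed by punctuation, such as "<http://example.com/>," ends in "," rather than ">", so its brackets would not be stripped.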
Thanks, arooaroo. That works perfectly. And so elegant too! I had made such a horrible mess of ifs that you would have barfed. Even I could tell it was ugly code.
kazuo, DonVla - Thank you as well. I learned a lot from trying to implement your ideas.
Why not just replace the unwanted symbols with empty strings?
# str.replace returns a new string, so assign the result back
message = message.replace('<', '')
message = message.replace('>', '')
Last edited by barebones (2008-03-28 16:18:47)
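A sketch of how that replace-first approach combines with the original prefix check, assuming the brackets are removed once from the whole message before it is split (the helper name urls_after_replace is hypothetical, and the list stands in for urlGrab.addUrl()):

```python
import re

URL_START = re.compile(r'(https?|ftp)://')


def urls_after_replace(message):
    # Drop every '<' and '>' in the message up front...
    message = message.replace('<', '').replace('>', '')
    # ...then keep the words that start like a URL.
    return [word for word in message.split(" ") if URL_START.match(word)]


print(urls_after_replace("<phrik> see <http://example.com/>"))
# ['http://example.com/']
```

This is safe for the URLs themselves, since RFC 3986 does not allow unencoded < or > inside a URI. The trade-off is that the rest of the message (for example a nick shown as <phrik>) is altered too, which only matters if message is used again afterwards.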
Good question. I'll check it out.