Hope this is an appropriate place to post this question.
Looking for suggestions on how to approach this problem [hoping for an ArchWay approach]:
1. Log in to an SSL-secured website with a user ID and password [CURL?]
2. Perform a search for an article/award/bid notification with a set of keywords [CURL?]
3. Parse (tokenize) the results to identify amounts greater than $5,000 - [AeroText, LingPipe, NetOwl, Inxight... others???]
4. Produce an alert with results and links to specific items [db and rss feed or some alerting mechanism]
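Step 3 may not need a full entity-extraction toolkit if the amounts on the results page follow a conventional format. A minimal sketch in Ruby, assuming amounts are written like `$12,500` or `$4,999.99` (leading dollar sign, optional thousands separators and cents); the helper names `dollar_amounts` and `over_threshold?` and the sample snippets are hypothetical:

```ruby
# Hypothetical pattern: "$" + digits with optional thousands separators,
# plus optional cents, e.g. "$5000", "$12,500", "$4,999.99".
AMOUNT_PATTERN = /\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?/

# Return every dollar amount found in the text as a Float.
def dollar_amounts(text)
  text.scan(AMOUNT_PATTERN).map do |whole, cents|
    # Drop the thousands separators, then re-attach any cents.
    "#{whole.delete(',')}.#{cents || '0'}".to_f
  end
end

# True if any amount in the text is strictly greater than the threshold.
def over_threshold?(text, threshold = 5_000)
  dollar_amounts(text).any? { |amount| amount > threshold }
end

snippets = [
  'Award issued for $4,999.99 to Acme Corp',
  'Bid notification: contract valued at $12,500',
  'No amount listed'
]
snippets.select { |s| over_threshold?(s) }.each { |s| puts s }
```

A pattern like this will miss amounts written out in words or in other currencies; those are the cases where evaluating one of the dedicated extractors listed above would make sense.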
I looked at CURL but am not sure whether there is another framework I could leverage (I can start from scratch, but I'm trying not to reinvent the wheel).
Any ideas or suggestions from the community would be greatly appreciated.
Thanks,
Dave
The Mechanize gem for Ruby is well suited to automating sites like this:
gem install mechanize
The (untested) example below:
- logs in to an SSL website (handling cookies, etc.)
- searches for a list of keywords
- stores the results in a hash for later storage, email, etc.
```ruby
#!/usr/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'mechanize'

# create a browser agent and set its alias
agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }

words = %w( i can haz taco? )            # list of keywords
result = Hash.new { |h, k| h[k] = [] }   # hash default is an array

agent.get('https://www.tacos-vs-poutine.com/') do |page|
  # log in to the site
  form = page.form_with(:name => 'login')
  form.email = 'vegemite'
  form.password = 'sandwich'
  page = agent.submit(form)

  # search the site for each word
  words.each do |word|
    search_result = page.form_with(:name => 'search') do |search|
      search.q = word
    end.submit

    # keep links whose text contains a dollar amount over $5,000;
    # plain String#to_i would turn "$12,500" into 0
    search_result.links.each do |link|
      amount = link.text[/\$\s?([\d,]+(?:\.\d+)?)/, 1]
      next unless amount && amount.delete(',').to_f > 5000
      result[word] << { :text => link.text, :href => link.href }
    end
  end

  # store results in DB, convert to RSS feed, email etc
  # eg: Sequel + Sqlite3, RubyRSS, Mail gems
  # ...
end
```
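The closing comments leave step 4 (the alert) open. One low-dependency option is to write the matches out as an RSS 2.0 feed that any feed reader can poll. This is a sketch under stated assumptions, not the RubyRSS or Mail gem route mentioned in the comments: it builds the XML by hand, escapes values with `ERB::Util.html_escape` from the standard library, and assumes each keyword maps to a list of hashes with `:text` and `:href` keys. The method name `results_to_rss`, the feed title, and the example.com URLs are hypothetical.

```ruby
require 'erb'
require 'time'

# Escape text for safe inclusion in XML (&, <, >, " and ').
def xml_escape(value)
  ERB::Util.html_escape(value.to_s)
end

# Build a minimal RSS 2.0 document from a hash of keyword => matches.
# Each match is assumed to be a Hash with :text and :href keys.
def results_to_rss(results, title:, link:)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<rss version="2.0">', '<channel>']
  lines << "<title>#{xml_escape(title)}</title>"
  lines << "<link>#{xml_escape(link)}</link>"
  lines << '<description>Search results above the alert threshold</description>'
  lines << "<lastBuildDate>#{Time.now.rfc2822}</lastBuildDate>"
  results.each do |word, matches|
    matches.each do |match|
      lines << '<item>'
      lines << "<title>#{xml_escape(match[:text])}</title>"
      lines << "<link>#{xml_escape(match[:href])}</link>"
      lines << "<category>#{xml_escape(word)}</category>"
      lines << '</item>'
    end
  end
  lines << '</channel>' << '</rss>'
  lines.join("\n")
end

results = {
  'taco' => [{ text: 'Award: $12,500 to Taco Co', href: 'https://example.com/awards/1?a=1&b=2' }]
}
puts results_to_rss(results, title: 'Bid alerts', link: 'https://example.com/')
```

Writing the returned string to a file on a schedule (for example from cron) turns the scraper into a feed that a reader polls for new items.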
** EDIT: fixed syntax
Last edited by awkwood (2010-09-23 01:31:05)
Thank you both very much for the quick replies.
Would anyone be interested in tackling this problem?