
#1 2010-09-22 13:06:53

daveg55
Member
Registered: 2010-09-22
Posts: 2

Creating alert from search results on web content *DEEPWEB*?

I hope this is an appropriate place to post this question.

I'm looking for suggestions on how to approach this problem [hoping for an Arch Way approach :)]

1. Log in to an SSL-secured website with a user ID and password [CURL?]
2. Perform a search for an article/award/bid notification with a set of keywords [CURL?]
3. Parse (tokenize) the results to identify amounts greater than $5,000 - [AeroText, LingPipe, NetOwl, Inxight... others???]
4. Produce an alert with results and links to specific items [db and rss feed or some alerting mechanism]

I have looked at CURL, but I'm not sure whether there is another framework I could leverage (I can start from scratch, but I'm trying not to reinvent the wheel).

Any ideas or suggestions from the community would be greatly appreciated.

Thanks,

Dave


#2 2010-09-22 13:16:04

Stebalien
Member
Registered: 2010-04-27
Posts: 1,239

Re: Creating alert from search results on web content *DEEPWEB*?

Perl. Personally I prefer Python, but this is a perfect job for Perl.


Steven [ web : git ]
GPG:  327B 20CE 21EA 68CF A7748675 7C92 3221 5899 410C


#3 2010-09-22 15:29:31

awkwood
Member
From: .au <=> .ca
Registered: 2009-04-23
Posts: 91

Re: Creating alert from search results on web content *DEEPWEB*?

The Mechanize gem is perfect for automating sites like this (Ruby).

gem install mechanize

The (untested) example below:
  - logs into an SSL website (handling cookies etc)
  - searches for a list of keywords
  - stores results in a hash for storage/email etc.

#!/usr/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'mechanize'

# create a browser agent and set its alias
agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }

words  = %w( i can haz taco? )           # list of keywords
result = Hash.new { |h, k| h[k] = [] }   # hash default is an array

agent.get('https://www.tacos-vs-poutine.com/') do |page|

  # login to the site
  form          = page.form_with(:name => 'login')
  form.email    = 'vegemite'
  form.password = 'sandwich'
  page          = agent.submit(form)

  # search site for each word
  words.each do |word|
    search_result = page.form_with(:name => 'search') do |search|
      search.q = word
    end.submit

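    # store matching results in a hash
    search_result.links.each do |link|
      # "$5,000".to_i is 0, so strip everything but digits and "." first
      # (assumes the link text contains a single amount and no other numbers)
      amount = link.text.delete('^0-9.').to_f
      result[word] << link.text if amount > 5000
    end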
  end

  # store results in DB, convert to RSS feed, email etc
  # eg: Sequel + Sqlite3, RubyRSS, Mail gems
  # ...
end
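
For the "amounts greater than $5,000" step, here is a rough sketch of one way to pull dollar figures out of arbitrary text with a regular expression, rather than relying on the link text being a bare number. The names AMOUNT_RE, dollar_amounts and over_threshold? are made up for illustration and are not part of Mechanize, and the pattern assumes US-style amounts such as "$5,000" or "$12,500.75".

```ruby
# Hypothetical helpers (not part of Mechanize): find US-style dollar amounts.
# Matches "$5000", "$5,000" and "$12,500.75"; other formats are not handled.
AMOUNT_RE = /\$\s?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?/

# Return every dollar amount found in the text as a Float.
def dollar_amounts(text)
  text.scan(AMOUNT_RE).map do |whole, fraction|
    # drop thousands separators, then re-attach the optional ".cents" part
    (whole.delete(',') + fraction.to_s).to_f
  end
end

# True if any amount in the text is strictly greater than the threshold.
def over_threshold?(text, threshold = 5000)
  dollar_amounts(text).any? { |amount| amount > threshold }
end
```

Inside the links loop this could be used as something like `result[word] << link.href if over_threshold?(link.text)`, which would also keep the URL for the alert instead of just the link text.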

** EDIT: fixed syntax

Last edited by awkwood (2010-09-23 01:31:05)


#4 2010-09-28 15:42:49

daveg55
Member
Registered: 2010-09-22
Posts: 2

Re: Creating alert from search results on web content *DEEPWEB*?

Thank you both very much for the quick replies.

Would anyone be interested in tackling this problem?

