#1 2011-10-20 21:03:43

avx
Member
Registered: 2011-07-05
Posts: 71

newsfiler - simple RSS to file with fulltext support

situation: imho, all RSS readers suck
thoughts: "everything is a file", so why not news?
motivation: handle news like normal text files when it comes to linking, sharing, searching, archiving, and so on

(my) solution: newsfiler

newsfiler is a simple script that iterates over a list of feed URLs and turns each feed's items into text files.

Short description of functionality:

a) read in a list of feed URLs
b) extract the direct link of each news item
c) pass these links to a readability port to get the full text, since most feeds still don't provide it themselves
d) do some formatting
e) write the contents to $HOME/news/$feedtitle/[YYYY-mm-dd] @ [HH:mm], with the content laid out as:

Feed Title: Item Title [DATE TIME]
Original Link

Full Content

available features:

  • receive fulltext for any feed

  • fix feeds not giving a date on their items

todo:

  • replace links with 'linktext [X]' and append a list of '[X] reallink' at the end

  • somehow implement parallel threads; it's currently not really fast

  • some kind of per-feed configuration, e.g. timestamps so a feed is only fetched at certain intervals

  • pass the site hosting the feed as referrer *done*

  • maybe a mode to write every item of a feed to the same file, so it can easily be followed with e.g. `tail -f`

  • maybe a mode to keep html, so it can easily be formatted and looked at in the browser

  • code clean-up, maybe try to get rid of some deps
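
The first todo item (links as 'linktext [X]' plus a list at the end) could look something like the sketch below. It uses only the standard library's `html.parser` instead of BeautifulSoup to keep it dependency-free; the names `LinkFootnoter` and `footnote_links` are hypothetical, and it only handles simple, non-nested `<a href="...">` tags.

```python
from html.parser import HTMLParser


class LinkFootnoter(HTMLParser):
    """Collect text, turning <a href="url">text</a> into 'text [n]'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # pieces of the output text
        self.links = []        # collected URLs, in order of appearance
        self._href_stack = []  # hrefs of currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href_stack.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self._href_stack:
            href = self._href_stack.pop()
            if href:  # anchors without href get no marker
                self.links.append(href)
                self.parts.append(f" [{len(self.links)}]")

    def handle_data(self, data):
        self.parts.append(data)


def footnote_links(html):
    """Return the text with numbered link markers and a link list at the end."""
    parser = LinkFootnoter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    body = "".join(parser.parts)
    if not parser.links:
        return body
    notes = "\n".join(f"[{i}] {url}" for i, url in enumerate(parser.links, 1))
    return f"{body}\n\n{notes}"
```

All other tags are simply dropped and their text kept, so this would run after the fulltext extraction and replace whatever tag stripping is done there.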

written in: Python
deps: feedparser, BeautifulSoup, readability-lxml

download: http://phorcix.org/code/newsfiler

contact: here or see email-comment in the source

Ideas/patches welcome, flames too :)

Enjoy, or not?!
avx

Edit 1:

  • quick fix/addition: added a dupe check based on the article's title and date (where available); feeds without dates still need a proper fix.

  • some cleanup, more "Python-style" coding
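
A dupe check on title and date can be sketched in a few lines. This is only an illustration of the idea, not the actual code in the script: `item_key` and `filter_new` are made-up names, items are assumed to be plain dicts with a "title" and an optional "date", and the `seen` set would have to be saved somewhere between runs.

```python
import hashlib


def item_key(title, date=None):
    """Fingerprint an item by title and date; the date may be missing."""
    raw = title.strip().lower() + "\n" + (date or "")
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def filter_new(items, seen):
    """Return only items whose key is not in `seen`, and record their keys."""
    fresh = []
    for item in items:
        key = item_key(item["title"], item.get("date"))
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

For feeds without dates this falls back to the title alone, so two different articles with the same title would be treated as dupes, which is exactly the case that still needs a proper fix.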

Edit 2:

  • fixed a stupid error that kept anything from being saved

  • added support for custom user-agent and referrer
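
Roughly how a custom user-agent and the hosting site as referrer can be set with the standard library's `urllib`. This is a sketch under assumptions: `build_request` and the default user-agent string are made up here, and the real script may build its requests differently.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_request(feed_url, user_agent="newsfiler-sketch/0.1"):
    """Build a request that sends the feed's own site as the referrer."""
    parts = urlsplit(feed_url)
    # Assumption: "the site hosting the feed" means scheme + host of the feed URL.
    referrer = f"{parts.scheme}://{parts.netloc}/"
    headers = {
        "User-Agent": user_agent,
        "Referer": referrer,  # the HTTP header name really is spelled this way
    }
    return Request(feed_url, headers=headers)
```

The returned object can be passed to `urllib.request.urlopen()` to do the actual fetch.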

Last edited by avx (2011-10-22 23:18:16)
