situation: imho, all RSS readers suck
thoughts: "everything is a file", why not news?
motivation: handle news like normal text files when it comes to linking, sharing, searching, archiving, ...
(my) solution: newsfiler
newsfiler is a simple script that iterates over a list of feed URLs and turns their items into text files.
Short description of functionality:
a) read in a list of feed urls
b) extract the direct link of each news item
c) pass these to the readability port (readability-lxml) to get the full text, since most feeds still don't provide it themselves
d) do some formatting
e) write the contents to $HOME/news/$feedtitle/[YYYY-mm-dd] @ [HH:mm], with the content:
   Feed Title: Item Title [DATE TIME]
   Original Link
   Full Content
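The output step (d/e) could look roughly like this standard-library Python sketch. Reading "[YYYY-mm-dd] @ [HH:mm]" as a filename like "2011-10-22 @ 23:18", the blank line before the content, and the helper names (safe_name, item_path, item_text, write_item) are assumptions for illustration, not newsfiler's actual code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def safe_name(name: str) -> str:
    """Make a feed title usable as a directory name ('/' and NUL are not allowed on POSIX)."""
    cleaned = name.replace("/", "-").replace("\0", "").strip()
    return cleaned or "untitled"


def item_path(feed_title: str, when: datetime, base: Optional[Path] = None) -> Path:
    """Return <base>/<feed title>/<YYYY-mm-dd> @ <HH:MM>, base defaulting to $HOME/news."""
    base = base if base is not None else Path.home() / "news"
    return base / safe_name(feed_title) / when.strftime("%Y-%m-%d @ %H:%M")


def item_text(feed_title: str, item_title: str, when: datetime,
              link: str, content: str) -> str:
    """Header line, original link, a blank line (assumed), then the full content."""
    header = f"{feed_title}: {item_title} [{when:%Y-%m-%d %H:%M}]"
    return "\n".join([header, link, "", content]) + "\n"


def write_item(feed_title: str, item_title: str, when: datetime, link: str,
               content: str, base: Optional[Path] = None) -> Path:
    """Create the feed directory if needed, write one item file, return its path."""
    path = item_path(feed_title, when, base)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(item_text(feed_title, item_title, when, link, content),
                    encoding="utf-8")
    return path
```

Note that two items from the same feed published in the same minute map to the same file, so a real version would need a tie-breaker such as a counter suffix.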
available features:
receive the full text for any feed
fix feeds that don't give a date on their items
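A minimal sketch of the missing-date fix, assuming feedparser-style entries, which expose parsed dates as UTC struct_time values under published_parsed and updated_parsed: use the item's own date if it has one, otherwise the time of fetching. The function name and the fallback order are illustrative assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def item_datetime(entry: Mapping[str, Any], now: Optional[datetime] = None) -> datetime:
    """Pick a date for a feed item, falling back to the fetch time.

    `entry` is anything dict-like, such as a feedparser entry; either
    'published_parsed' or 'updated_parsed' may be missing or None.
    """
    parsed = entry.get("published_parsed") or entry.get("updated_parsed")
    if parsed:
        # struct_time / 9-tuple: the first six fields are Y, m, d, H, M, S (UTC).
        return datetime(*parsed[:6], tzinfo=timezone.utc)
    # No usable date in the feed: use the time we fetched it instead.
    return now if now is not None else datetime.now(timezone.utc)
```

The result is in UTC; calling .astimezone() on it gives local time if the filenames should use that.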
todo:
replace links with 'linktext [X]' and add a list of '[X] reallink' at the end
somehow implement parallel threads; it's currently not very fast
some kind of per-feed configuration, e.g. timestamps so a feed is only fetched at certain intervals
pass the site hosting the feed as referrer *done*
maybe a mode to write every item of a feed to the same file, so it can easily be followed with e.g. `tail -f`
maybe a mode to keep the HTML, so it can easily be formatted and viewed in a browser
code clean-up, maybe try to get rid of some deps
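For the link todo item, one possible approach using only the standard library's html.parser: collect the text, follow each <a href> with a running [N] marker, and append the "[N] URL" list at the end. The class and function names are hypothetical, and it drops all other markup, so it sketches the link handling only, not the whole formatting step.

```python
from html.parser import HTMLParser
from typing import List, Optional


class LinkFootnoter(HTMLParser):
    """Collect text, turning each <a href=...>text</a> into 'text [N]'."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []      # output text pieces
        self.links: List[str] = []      # links[N-1] is the URL for marker [N]
        self._href: Optional[str] = None  # href of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append(self._href)
            self.parts.append(f" [{len(self.links)}]")
            self._href = None

    def handle_data(self, data):
        self.parts.append(data)


def footnote_links(html: str) -> str:
    """Return the plain text with [N] markers, followed by the '[N] URL' list."""
    parser = LinkFootnoter()
    parser.feed(html)
    parser.close()
    body = "".join(parser.parts)
    refs = "\n".join(f"[{i}] {href}" for i, href in enumerate(parser.links, 1))
    return body + ("\n\n" + refs if refs else "")
```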
written in: Python
deps: feedparser, BeautifulSoup, readability-lxml
download: http://phorcix.org/code/newsfiler
contact: here or see the email comment in the source
Ideas/patches welcome, flames too
Enjoy, or not?!
avx
Edit 1:
quick fix/addition: added a dupe check based on the article's title and date (where available); it still needs a proper fix for feeds without dates.
some cleanup, more "python-style" coding
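A sketch of such a dupe check, assuming items are keyed by title plus date string and the seen keys are kept in a set (in practice loaded from and saved to a file, one key per line). dupe_key and is_new are hypothetical names. Items without a date fall back to the title alone, which is exactly where two different articles with the same title would be wrongly treated as duplicates.

```python
import hashlib
from typing import Optional, Set


def dupe_key(title: str, date: Optional[str]) -> str:
    """Fixed-length key for an item: SHA-1 of its title plus its date, if any.

    Without a date only the title counts, so same-titled items collide.
    """
    raw = f"{title.strip()}\x00{date or ''}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def is_new(title: str, date: Optional[str], seen: Set[str]) -> bool:
    """Return True and remember the item if it hasn't been seen before."""
    key = dupe_key(title, date)
    if key in seen:
        return False
    seen.add(key)
    return True
```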
Edit 2:
fixed a stupid error that caused nothing to be saved
added support for custom user-agent and referrer
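For the article pages, a custom user-agent and the feed's site as referrer can be sent with urllib from the standard library. This is a sketch: the helper names and the default agent string are hypothetical. For the feed request itself, feedparser.parse() already accepts agent= and referrer= keyword arguments.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

DEFAULT_AGENT = "newsfiler"  # hypothetical default user-agent string


def site_of(feed_url: str) -> str:
    """Return scheme://host[:port]/ of the feed URL, used as the referrer."""
    parts = urlsplit(feed_url)
    return f"{parts.scheme}://{parts.netloc}/"


def build_request(article_url: str, feed_url: str, agent: str = DEFAULT_AGENT) -> Request:
    """Request for an article page with a custom User-Agent and the feed's site as referrer."""
    # HTTP spells this header "Referer".
    return Request(article_url, headers={"User-Agent": agent, "Referer": site_of(feed_url)})


def fetch_html(article_url: str, feed_url: str, agent: str = DEFAULT_AGENT,
               timeout: float = 30) -> str:
    """Download the article page and decode it using the charset the server declares."""
    with urlopen(build_request(article_url, feed_url, agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned HTML is what would then be handed to the readability step.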
Last edited by avx (2011-10-22 23:18:16)