#1 2012-12-31 17:18:50

pilotkeller
Member
From: Canada
Registered: 2011-12-22
Posts: 513

Cron RSS Reader

So I've been using newsbeuter, like many people, for over a year now. However, the one complaint I and many others have voiced on the forums is that there is no cron-friendly RSS reader. Sure, you can run newsbeuter 24/7 with the auto-refresh feature enabled, but who wants that? So a week ago, at the start of the holidays, I wrote my own. After a week of use it has worked well enough that I think it's ready for a beta release. Please understand that this is designed in my favorite way: stupid simple. It is configured by editing the source and needs only two files: a history file and the program itself. I wrote it in Python because of the wonderful feedparser library, and because that way you can edit your feeds on the fly without needing to recompile anything.

For anyone who wishes, feel free to play around with this and give me your feedback.

To get it working just run the following to ensure you have the deps:

sudo pacman -S --needed coreutils curl python python-feedparser youtube-dl

Next, put the source below in a file and chmod 700 it. Now you can ./rss, or put it in ~/bin and add that to your PATH so you can run it like any other binary.

But wait! Before you run it, you will more than likely want to edit the config, unless you want all of my feeds and happen to have a /home/keller/downloads folder you want everything dumped into. To edit the URLs, just follow the examples I left in. Entries are in the style ("Name Of Feed", "http://example.com/rss/url.xml"), where the first entry in a category starts with an additional opening parenthesis, each entry that is not the last ends with a comma, and the last one ends with an additional closing parenthesis. If a category has no feeds at all, just use (). Once it's configured, give it a run with the --no-download option appended; if you don't, the first run is going to take a while, because it will act on every feed entry it sees.
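For example, here is what a category with two feeds, an empty category, and a category with a single feed look like (the names and URLs are taken from my defaults; note that a single entry needs a trailing comma so Python still treats it as a tuple of tuples):

link_urls = (("XKCD", "http://xkcd.com/rss.xml"),
             ("Arch Linux News", "http://www.archlinux.org/feeds/news/"))

deviant_art_urls = ()

podcast_urls = (("SANS News", "https://isc.sans.edu/dailypodcast.xml"),)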

To quote --help:

Usage: rss [Options]
This is a zero hassle RSS reader. It is designed to be source configured and
run without a care in the world. Cron it, script it, just plain run it...

Options:
  -h, --help                 Show me this message.
      --no-download          Do not act on new entries.
                               (Simply mark as old.)
  -q, --quiet                Don't talk.
  -v, --say-something        The opposite of -q.

In terms of action, it will:
First: Append new links to a file called links.html in the downloads folder, which you can view with the browser of your choice (the format of each entry is shown just below this list). This is the fastest step, so you can read your feeds while you wait on the downloads.
Second: Download the attached image from each Deviant Art entry.
Third: Download every YouTube video at a maximum of 720p (to speed up the download and conserve bandwidth).
Lastly: Download all of your podcasts.
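Each entry appended to links.html is a single anchor line of the form below, where the feed name, link, and title come straight from the feed, so any browser will render the file as a running list:

<a href="http://example.com/new-post">Name Of Feed : Title Of Post</a><br />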

I was debating adding Twitter, but I don't use RSS for Twitter (or Twitter, for that matter), so if there is a desire for it (or for other feeds), let me know.
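Since the whole point is to run it from cron, a crontab entry along these lines should do the trick (the path and the hourly schedule are just examples; point it at wherever you put the script):

0 * * * * /home/keller/bin/rss --quiet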

#!/usr/bin/python
# Depends on:
#     * coreutils
#     * curl
#     * python3
#     * python-feedparser
#     * youtube-dl
# - - - - - - - - - - - - - - Configuration and URLs - - - - - - - - - - - - - #

be_quiet = False # True disables output (Good for cron)

download_dir = "/home/keller/downloads/"
history_file = "/home/keller/.rss_history"

# Appends to links.html page in download directory
link_urls = (("Extra Ordinary", "http://www.exocomics.com/feed"),
             ("Doghouse Diaries", "http://feeds2.feedburner.com/thedoghousediaries/feed"),
             ("Cyanide & Happiness", "http://feeds.feedburner.com/Explosm"),
             ("XKCD", "http://xkcd.com/rss.xml"),
             ("Scandinavia And The World", "http://feeds.feedburner.com/satwcomic?format=xml"),
             ("Surviving The World", "http://survivingtheworld.net/feed.xml"),
             ("Something Of That Ilk", "http://somethingofthatilk.com/rss/index.php"),
             ("Invisible Bread", "http://feeds.feedburner.com/InvisibleBread"),
             ("Happle Tea", "http://www.happletea.com/feed/"),
             ("Dilbert", "http://feed.dilbert.com/dilbert/daily_strip"),
             ("What-If", "http://what-if.xkcd.com/feed.atom"),
             ("Networking Nerd", "http://networkingnerd.net/feed/"),
             ("Fry Guy's Blog", "http://www.fryguy.net/feed/"),
             ("Ethereal Mind", "http://feeds.feedburner.com/etherealmind?format=xml"),
             ("Packet Pushers", "http://feeds.feedburner.com/PacketPushersBlogsOnly"),
             ("Lone SysAdmin", "http://lonesysadmin.net/feed/"),
             ("Arch Linux News", "http://www.archlinux.org/feeds/news/"),
             ("Schneier on Security", "http://feeds.feedburner.com/schneier/excerpts"))

# Deviant Art RSS
deviant_art_urls = (("Isbjorg's Main Gallery", "http://backend.deviantart.com/rss.xml?q=gallery%3Aisbjorg%2F9742889&type=deviation"))

# Youtube RSS - Youtube Username Only
youtube_users = (("Phillip DeFranco", "sxephil"),
                 ("Freddie", "freddiew"),
                 ("Freddie BTS", "freddiew2"),
                 ("Corridor Digital", "corridordigital"),
                 ("Corridor Digital BTS", "samandniko"),
                 ("Jenna Marbles", "jennamarbles"),
                 ("Source Fed", "sourcefed"),
                 ("Minute Physics", "minutephysics"),
                 ("VSauce", "vsauce"),
                 ("Numberphile", "numberphile"),
                 ("Veritasium", "1veritasium"),
                 ("Sixty Symbols", "sixtysymbols"),
                 ("Periodic Videos", "periodicvideos"))

# Podcasts - Audio/Video linked content download
podcast_urls = (("Security Now", "http://feeds.twit.tv/sn_video_large"),
                ("Ted Talks", "http://feeds.feedburner.com/tedtalks_video"),
                ("Scam School", "http://revision3.com/scamschool/feed/MP4-Large"),
                ("Hak 5", "http://feeds.feedburner.com/hak5hd?format=xml"),
                ("Film Riot", "http://revision3.com/filmriot/feed/MP4-hd30"),
                ("SANS News", "https://isc.sans.edu/dailypodcast.xml"),
                ("The Techie Geek", "http://feeds.feedburner.com/thetechiegeek/ogg?format=xml"))

# - - - - - - - - - - - - No need to modify below here - - - - - - - - - - - - #
from feedparser import parse
from os import system
from sys import argv
import pickle

# -- Argument Parse

no_download = False

if "--help" in argv or "-h" in argv:
    print("""Usage: rss [Options]
This is a zero hassle RSS reader. It is designed to be source configured and
run without a care in the world. Cron it, script it, just plain run it...

Options:
  -h, --help                 Show me this message.
      --no-download          Do not act on new entries.
                               (Simply mark as old.)
  -q, --quiet                Don't talk.
  -v, --say-something        The opposite of -q.
""")
    exit()

if "--no-download" in argv:
    no_download = True

if "--quiet" in argv or "-q" in argv:
    be_quiet = True

if "--say-something" in argv or "-v" in argv:
    be_quiet = False

# -- Unpickle History

try:
    with open(history_file, "rb") as f:
        history = pickle.load(f)
except (OSError, EOFError, pickle.UnpicklingError):
    # No usable history file yet -- start with an empty history
    history = {"podcast" : [], "deviant_art" : [], "youtube" : [], "link" : []}

current_links = [] # Holds all current links so we can prune ancient history

# -- Link

for url in link_urls:
    if not be_quiet : print("Checking", url[0] + "...")
    for entry in parse(url[1]).entries:
        current_links.append(entry.link)
        if entry.link not in history["link"]: # If is a new link
            if not be_quiet : print(" * New Content Found!")
            if no_download or system('echo "<a href=\"' + entry.link + '\">' + url[0] + ' : ' + entry.title + '</a><br />" >> ' + download_dir + 'links.html') == 0: # Append to file
                history["link"].append(entry.link)

# -- Deviant Art

for url in deviant_art_urls:
    if not be_quiet : print("Checking", url[0] + "...")
    for entry in parse(url[1]).entries:
        if entry.media_content[0]["url"][-4] == '.': # Check it's a file
            current_links.append(entry.media_content[0]["url"])
            if entry.media_content[0]["url"] not in history["deviant_art"]: # If is a new link
                if not be_quiet : print(" * Downloading:", entry.media_content[0]["url"][entry.media_content[0]["url"].rfind('/') + 1:])
                if no_download or system('curl -so "' + download_dir + entry.media_content[0]["url"][entry.media_content[0]["url"].rfind('/') + 1:] + '" "' + entry.media_content[0]["url"] + '"') == 0: # Download
                    history["deviant_art"].append(entry.media_content[0]["url"])

# -- Youtube

for url in youtube_users:
    if not be_quiet : print("Checking", url[0] + "...")
    for entry in parse("https://gdata.youtube.com/feeds/api/users/" + url[1] + "/uploads").entries:
        current_links.append(entry.link)
        if entry.link not in history["youtube"]: # If is a new link
            if no_download or system('youtube-dl  --max-quality 22 -o "' + download_dir + '%(title)s.%(ext)s" "' + entry.link + '"' + ['', ' -q'][be_quiet]) == 0: # Download
                history["youtube"].append(entry.link)

# -- Podcast

for url in podcast_urls:
    if not be_quiet : print("Checking", url[0] + "...")
    for entry in parse(url[1]).entries:
        for link in entry.links:
            if link.type[0:5] == "video" or link.type[0:5] == "audio": # If _IT_ describes itself as video or audio
                current_links.append(link.href)
                if link.href not in history["podcast"]: # If is a new link
                    if not be_quiet : print(" * Downloading:", link.href[link.href.rfind('/') + 1:])
                    if no_download or system('curl -#Lo "' + download_dir + link.href[link.href.rfind('/') + 1:] + '" "' + link.href + '"' + ['', ' -s'][be_quiet]) == 0: # Download
                        history["podcast"].append(link.href)

# -- Prune History (drop links that no longer appear in any feed)

for key in history:
    # Rebuild each list rather than removing items while iterating over it,
    # which would silently skip entries
    history[key] = [link for link in history[key] if link in current_links]

# -- Pickle History

with open(history_file, "wb") as f:
    pickle.dump(history, f)

That's all, folks. If you do give it a try, please give feedback, even if it's just that the whole thing sucks. I'd love to know which feeds don't work, whether there is anything you want added, or if you just think it's swell.

If there appears to be some interest, I'll look at putting this in the AUR, maybe with a mode for config files so that each user can have their feeds checked independently on each run under cron (a very rough idea of what that could look like is sketched below). That will take some work to get just right, though, so I'm only doing it if people are sure they would like such a feature. So yeah, feel free to ask for the moon; I'll see what I can do in most cases.
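Just to give a rough idea, and nothing more (neither the file name nor the format below exists in the script yet; they are made up for illustration), a config-file mode could be as simple as reading one feed per line from something like ~/.rssrc and sorting the entries into the same tuples the script uses now:

# Hypothetical sketch only -- ~/.rssrc and its "category|name|url" format are made up
from os.path import expanduser

def load_feeds(path="~/.rssrc"):
    # Example line in the file:  link|XKCD|http://xkcd.com/rss.xml
    feeds = {"link": [], "deviant_art": [], "youtube": [], "podcast": []}
    with open(expanduser(path)) as conf:
        for line in conf:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # Skip blank lines and comments
            category, name, url = line.split("|", 2)
            feeds[category].append((name, url))
    return feeds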

Lastly, I am humbly sorry that I suck at documenting how exactly it works here. If you have any questions at all, please feel free to post away and I will do my best to answer them.
