First of all, I'm sorry. I know some people don't like this kind of thread, but I fear that on my own I'll get nowhere fast.
My requirements are actually quite simple. I need to parse an RSS feed, specifically the Arch news feed, and show only the titles from within a given time frame (from some arbitrary date up to the current date). The output should be in a simple format that I can then manipulate further to my liking.
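For comparison, this requirement can be sketched with only the Python standard library, without a feed library. This is a minimal sketch, not a replacement for a proper feed parser: it assumes the feed is RSS 2.0, i.e. each `<item>` carries a `<title>` and an RFC 822 `<pubDate>`, and the function name `titles_since` is hypothetical. It takes the feed XML as a string so it can be tried offline.

```python
import datetime
import email.utils
import xml.etree.ElementTree as ET


def titles_since(rss_xml, since, now=None):
    """Return titles of RSS 2.0 <item>s published between `since` and `now`.

    Assumes each <item> has a <title> and an RFC 822 <pubDate>.
    `since` and `now` must be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter('item'):
        title = item.findtext('title', default='').strip()
        pub = item.findtext('pubDate')
        if not pub:
            continue  # skip items without a date
        published = email.utils.parsedate_to_datetime(pub)
        if published.tzinfo is None:
            # A "-0000" offset yields a naive datetime; treat it as UTC.
            published = published.replace(tzinfo=datetime.timezone.utc)
        if since <= published <= now:
            titles.append(title)
    return titles
```

To try it against the live feed, the XML could be fetched with `urllib.request.urlopen('https://www.archlinux.org/feeds/news/').read()`; `ET.fromstring` accepts the resulting bytes directly.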
I've spent quite a lot of time on this problem by now, but haven't even been able to find a simple terminal-based RSS reader. I came across one or two that seemed promising at first, but they were so complicated that I might as well write a custom script, and as far as I can tell, none of them offered filtering by time frame.
I'm not great at scripting in any language, so before I begin the process, which will undoubtedly take weeks, not to mention the mediocre result I'll produce, I thought I'd ask for advice here first.
Hoping that you'll point me in the right direction, best regards.
Last edited by zacariaz (2013-08-30 08:49:44)
I am a philosopher, of sorts, not a troll or an imbecile.
My apologies that this is not always obvious, despite my best efforts.
What have you found so far? I have come across many scripts like the one you describe in these forums. I think the "handy self-made scripts" thread has at least a few.
Would parsing https://mailman.archlinux.org/pipermail/arch-announce/ be easier?
karol wrote: Would parsing https://mailman.archlinux.org/pipermail/arch-announce/ be easier?
That would require a lot of scraping.
Here's a custom script. I didn't know if you wanted to specify the cutoff by a fixed amount of time (e.g. the last 30 days) or by a fixed date (e.g. since Aug 1), so I have added options for both. I have also included comments to make it easier for you to edit it.
#!/usr/bin/env python3
# Depends on python-feedparser
import argparse
import calendar
import datetime

import feedparser

FEED_URL = 'https://www.archlinux.org/feeds/news/'

# Command-line argument parser.
parser = argparse.ArgumentParser(description='List Arch Linux news titles.')
parser.add_argument(
    '-d', '--days', metavar='n', type=int, default=30,
    help='Show titles from the last n days. Default: %(default)s.'
)
parser.add_argument(
    '-s', '--since', metavar='yyyy-mm-dd',
    help='Show titles between now and the given date.'
)

# Convert a UTC time struct (feedparser normalizes its *_parsed dates to UTC)
# to a naive local datetime, so it can be compared with datetime.now().
def ts_to_dt(ts):
    return datetime.datetime.fromtimestamp(calendar.timegm(ts))

# Main function to print out titles.
def main(args=None):
    args = parser.parse_args(args)
    # Load the feed.
    feed = feedparser.parse(FEED_URL)
    # Get the current (local) time.
    now = datetime.datetime.now()
    if args.since:
        # The given date is taken as local midnight.
        dt = datetime.datetime.strptime(args.since, '%Y-%m-%d')
        cutoff = now - dt
    else:
        cutoff = datetime.timedelta(days=args.days)
    # Find all entries not older than the cutoff.
    for entry in feed.entries:
        ts = entry.get('published_parsed')
        if ts is None:
            continue  # skip entries without a parseable date
        dt = ts_to_dt(ts)
        if (now - dt) <= cutoff:
            print(entry.title)

# Run the main function when invoked as a script.
# This will catch keyboard interrupts (ctrl+c) and broken pipe errors.
if __name__ == '__main__':
    try:
        main()
    except (KeyboardInterrupt, BrokenPipeError):
        pass
Usage
Save it as e.g. archtitles somewhere in your PATH and make it executable (chmod +x archtitles). To get the feed titles from the last 60 days:
$ archtitles -d 60
PHP 5.5 available in the [extra] repository
TeXLive 2013 update may require user intervention
To get the titles since June 1:
$ archtitles -s 2013-06-01
PHP 5.5 available in the [extra] repository
TeXLive 2013 update may require user intervention
Binaries move to /usr/bin requiring update intervention
Dependencies
python-feedparser
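Both options boil down to a single comparison against a cutoff time. The following is a minimal, offline-testable sketch of that filtering step using timezone-aware datetimes, so UTC feed timestamps and local dates compare without manual offset handling. The helper names are hypothetical, and it works on plain `(title, struct_time)` pairs rather than feedparser entries, assuming the struct_time values are in UTC as feedparser's `*_parsed` fields are.

```python
import datetime


def cutoff_from_options(now, days=30, since=None):
    """Return the oldest allowed publication time as an aware datetime.

    `now` must be timezone-aware. `since` is a 'yyyy-mm-dd' string taken as
    local midnight, like the script's --since; otherwise the cutoff is
    `days` before `now`, like --days.
    """
    if since:
        # astimezone() on a naive datetime assumes local time (Python 3.6+).
        return datetime.datetime.strptime(since, '%Y-%m-%d').astimezone()
    return now - datetime.timedelta(days=days)


def struct_to_utc(ts):
    """Turn a UTC time.struct_time into an aware UTC datetime."""
    return datetime.datetime(*ts[:6], tzinfo=datetime.timezone.utc)


def recent_titles(entries, cutoff):
    """Yield titles from (title, struct_time) pairs published at or after cutoff."""
    for title, ts in entries:
        if ts is not None and struct_to_utc(ts) >= cutoff:
            yield title
```

The script itself converts everything to naive local time instead; either approach works as long as both sides of the comparison use the same reference.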
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
Are you looking for something like pacmatic?
end ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'the machine is not the end to the means., we are. In history, in board rooms and politic the greatest decision and effort
evolves from passion, lust for life, and a common sense of humanity. Never forget what you are and why'. -me
Wow, I didn't expect this kind of reply. Now I believe I have something to work with.
Thanks a bunch to all, really. I'll return if I have any issues (at least I'll post the final result), but I think I can take it from here.
Best regards.
Still at work, but I just had the chance to take a closer look at your script, and let me tell you, it's absolutely brilliant.
Together with something like:
tac /var/log/pacman.log | grep -m 1 "starting full system upgrade" | cut -c2-11
although not foolproof, it will most certainly do the job to my liking.
Don't think I have to change a thing! (though I probably will anyway)
This one is definitely solved.
Thanks again and best regards.
Last edited by zacariaz (2013-08-30 08:52:01)