Does a tool like this exist?
What I'm thinking of:
cat $html | tool_i_need --match-post='<p class="newspost">' > output.rss
Does something like this exist? Note: I am not interested in online services that do it for me.
The main use case: a lot of blogs have comments which I want to follow, but they offer no RSS feeds for the comments, so I want to create my own feed.
Last edited by Dieter@be (2010-10-30 09:24:13)
< Daenyth> and he works prolifically
4 8 15 16 23 42
Yes. It is called "Python". Or any other programming language. Basic parser transformation. HTML parser -> filter out the interesting stuff -> generate the RSS subset of XML.
Beautiful Soup is probably the easiest HTML parser to use. (It does have downsides, but worry about them later.) soup.findAll('p', {'class': 'newspost'}) gets you halfway there.
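The parse -> filter -> generate pipeline described above could be sketched roughly as follows. To stay dependency-free, this sketch swaps in the standard library's html.parser for Beautiful Soup; the names PostExtractor and html_to_rss are made up for illustration, and using the first 60 characters of a post as its item title is an arbitrary assumption.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class PostExtractor(HTMLParser):
    """Collect the text of every <tag> element carrying a given class.

    Hypothetical helper; assumes the matched elements are properly closed.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0    # > 0 while we are inside a matching element
        self.posts = []   # accumulated text, one entry per match

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same name so we know when the match ends.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self.posts.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.posts[-1] += data


def html_to_rss(html, page_url, title, tag="p", css_class="newspost"):
    """Return an RSS 2.0 document (as a string) with one <item> per match."""
    parser = PostExtractor(tag, css_class)
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = title
    for raw in parser.posts:
        text = " ".join(raw.split())  # collapse runs of whitespace
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text[:60]  # assumption: no real title available
        ET.SubElement(item, "link").text = page_url
        ET.SubElement(item, "description").text = text
    return ET.tostring(rss, encoding="unicode")
```

Fed the page on stdin, something like print(html_to_rss(sys.stdin.read(), url, title)) would give roughly the "html in, rss out" filter asked for in the first post. Real pages with unclosed tags or odd nesting will need more care than this.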
Thanks, I'll keep that in mind. I'd prefer to save some time though, so if something like this already exists (e.g. example code), I'd rather start from that.
I'm looking for a generic tool, not something that only supports some specific sites.
For XHTML (blogs are using it, aren't they?) there is xsltproc. You just need a stylesheet like this:
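<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- NOTE: these paths assume un-namespaced elements. If the page declares
       xmlns="http://www.w3.org/1999/xhtml", bind a prefix to that namespace
       and use it in every path (e.g. /x:html/x:head/x:title, x:p). -->
  <!-- Channel title and description default to the page title. -->
  <xsl:param
      name="channel-title"
      select="/html/head/title" />
  <xsl:param
      name="channel-description"
      select="/html/head/title" />
  <!-- No sensible default for these: pass them on the command line. -->
  <xsl:param
      name="channel-url" />
  <xsl:param
      name="page-url" />
  <xsl:template match="/">
    <rss version="2.0">
      <channel>
        <title><xsl:value-of select="$channel-title"/></title>
        <link><xsl:value-of select="$channel-url"/></link>
        <description><xsl:value-of select="$channel-description"/></description>
        <xsl:apply-templates/>
      </channel>
    </rss>
  </xsl:template>
  <!-- One item per matching paragraph. -->
  <xsl:template match="//p[@class = 'newspost']">
    <item>
      <title><xsl:value-of select=".//*[@id = 'title-id']"/></title>
      <link><xsl:value-of select="$page-url"/>#<!-- find anchor--></link>
      <description></description>
    </item>
  </xsl:template>
  <!-- Suppress the built-in rule that would copy all other page text into the feed. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>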
Parameter values that can't be guessed from the source file may be set from the command line.
While xsltproc is a generic tool, stylesheets are mostly site-specific.
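As a rough illustration of setting those parameters from the command line, here is a small sketch that builds the xsltproc invocation. --stringparam and --html are real xsltproc options; the helper name xsltproc_command and the file names blog2rss.xsl and page.xhtml are made up for the example.

```python
import shlex


def xsltproc_command(stylesheet, page, params, html_input=False):
    """Build an xsltproc argument list (hypothetical helper).

    params maps stylesheet parameter names to plain string values;
    each one is passed with --stringparam.
    """
    cmd = ["xsltproc"]
    if html_input:
        # --html makes xsltproc parse the input as HTML rather than XML.
        cmd.append("--html")
    for name, value in params.items():
        cmd += ["--stringparam", name, value]
    cmd += [stylesheet, page]
    return cmd


# Assumed file names and URLs, purely for illustration.
cmd = xsltproc_command(
    "blog2rss.xsl",
    "page.xhtml",
    {
        "channel-url": "http://example.com/",
        "page-url": "http://example.com/post.html",
    },
)
print(shlex.join(cmd))  # a command line you could paste into a shell
```

The list can be handed to subprocess.run(cmd) as-is; redirect its standard output to a file to get the feed.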
Dieter@be: do you have a link to a sample page? I'd like to test this crap.
Last edited by diegonc (2010-10-30 17:05:32)