I want to follow HTML pages that do not offer an RSS feed in newsboat. I keep finding "free" web-based services that do the conversion, but I would prefer something I can run myself.
Googling for a shell or Perl script that does this is surprisingly difficult. Is anyone else doing it and willing to share a link to a script?
Initial target is: https://mirrors.edge.kernel.org/pub/lin … le-review/
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
I found this project: https://html2rss.github.io/
Sources:
https://github.com/gildesmarais/html2rss
https://github.com/gildesmarais/html2rss-configs
https://github.com/gildesmarais/html2rss-web
Otherwise, you should be able to use xidel to extract the data and generate the RSS: http://www.videlibri.de/xidel.html
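If neither tool fits, the same extract-and-generate idea can be sketched with nothing but the Python standard library. This is only a rough illustration, not how html2rss or xidel work internally: it assumes an nginx-style directory listing in which each link is followed by plain text, and the names `ListingParser`, `build_rss`, `SAMPLE` and the example.invalid URL are all made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up listing in the shape of an autoindex page (hypothetical file names).
SAMPLE = """<html><body><pre>
<a href="../">../</a>
<a href="patch-0.0.1-rc1.gz">patch-0.0.1-rc1.gz</a>   24-Aug-2020 08:31   100K
<a href="patch-0.0.2-rc1.xz">patch-0.0.2-rc1.xz</a>   01-Sep-2020 15:10   90K
<a href="sha256sums.asc">sha256sums.asc</a>   01-Sep-2020 15:11   2K
</pre></body></html>"""


class ListingParser(HTMLParser):
    """Collect (href, text following the link) pairs from a listing page."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None
        self._in_a = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._in_a = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False

    def handle_data(self, data):
        # The first text chunk after a closed link holds the date and size.
        if self._href is not None and not self._in_a:
            self.entries.append((self._href, " ".join(data.split())))
            self._href = None


def build_rss(html, base_url, title):
    """Return an RSS 2.0 document (as a string) for the patch files in html."""
    parser = ListingParser()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = title
    # Listing order is reversed so that later entries come first in the feed.
    for href, trailing in reversed(parser.entries):
        if not (href.startswith("patch") and href.endswith("z")):
            continue
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = href
        ET.SubElement(item, "link").text = base_url + href
        # The file name is not a URL, so mark the guid as a non-permalink.
        ET.SubElement(item, "guid", isPermaLink="false").text = href
        ET.SubElement(item, "description").text = f"Release of {href} ({trailing})"
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(SAMPLE, "https://example.invalid/stable-review/", "Example feed"))
```

To use it on a real page you would fetch the HTML yourself (for example with `urllib.request`) and pass it to `build_rss`. As far as I know, newsboat can read the output of such a script through an `exec:` entry in its urls file, but check the newsboat documentation for the exact syntax.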
Last edited by progandy (2020-08-29 15:30:16)
| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |
Thanks, I stumbled upon the Ruby project as well. I have also started hacking a shell script together. I will check out xidel.
Here's an example for xidel with xquery3. The most annoying part is creating a valid date string.
xidel https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/stable-review/ --extract-kind=xquery3 --extract-file=kernelrss.xq --output-format xml >test.rss
kernelrss.xq:
(: Turn a listing timestamp of the form "dd-Mon-yyyy HH:MM" into an RSS date string :)
declare function formatTime($t) {
  let $months:=("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
  let $days:=("Mon","Tue","Wed","Thu","Fri","Sat","Sun")
  (: split into date and time, then split the date into day, month name, year :)
  let $DT:=tokenize($t, " ")
  let $D:=tokenize($DT[1], "-")
  let $m:=format-integer(index-of($months, $D[2]), "00")
  let $date:=xs:dateTime($D[3] || "-" || $m || "-" || $D[1] || "T" || $DT[2] || ":00")
  (: day of week, counted from Monday 1970-01-05, as a 1-based index into $days :)
  let $dow:=((int((xs:date($date) - xs:date('1970-01-05')) div xs:dayTimeDuration('P1D')) mod 7 + 7) mod 7) +1
  return format-dateTime($date, $days[$dow]||", [D,2] "||$months[int($m)]||" [Y,4] [H,2]:[M,2]:[s,2] [Z] UTC")
};
<rss version="2.0">
  <channel>
    <title>Linux kernel v5.x stable review</title>
    <link>https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/stable-review/</link>
    <description>Signed releases for linux kernel v5.x stable review</description>
    <language>en-us</language>
    <copyright>kernel.org</copyright>
    <pubDate>{formatTime(normalize-space(//a[@href="sha256sums.asc"]/following-sibling::node()[1]))}</pubDate>
    {for $a in reverse(//a[starts-with(@href, "patch") and ends-with(@href,"z")])
    return <item>
      <title>{$a/text()}</title>
      <description>Release of {$a/text()}</description>
      <link>https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/stable-review/</link>
      <author>kernel.org</author>
      <guid>{$a/text()}</guid>
      <pubDate>{formatTime(normalize-space($a/following-sibling::node()[1]))}</pubDate>
    </item>
    }
  </channel>
</rss>
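For comparison, the date-string step is much shorter in a language that ships an RFC 822 formatter. The sketch below is not part of the xidel solution. It assumes the listing text starts with "dd-Mon-yyyy HH:MM" (the shape the tokenize calls above expect), that the times are UTC, and that month abbreviations are parsed in an English/C locale. The function name `listing_date_to_rfc822` is made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def listing_date_to_rfc822(text):
    """Convert 'dd-Mon-yyyy HH:MM [size]' into an RFC 822 date for <pubDate>."""
    # Only the first two whitespace-separated fields matter; a trailing
    # file size, if present, is ignored.
    day_part, time_part = text.split()[:2]
    # %b depends on the locale; English month names are assumed here.
    parsed = datetime.strptime(f"{day_part} {time_part}", "%d-%b-%Y %H:%M")
    # Assumption: the server lists times in UTC.
    return format_datetime(parsed.replace(tzinfo=timezone.utc))


if __name__ == "__main__":
    print(listing_date_to_rfc822("29-Aug-2020 15:30   1234"))
    # Sat, 29 Aug 2020 15:30:00 +0000
```

`email.utils.format_datetime` works out the weekday itself, so the manual day-of-week arithmetic is not needed in this variant.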
Last edited by progandy (2020-08-29 18:22:32)
@progandy - Thanks for sharing that code. I looked into xidel, but it seems a bit complex, so I wrote a bash script instead. The script seems to work, but I cannot get newsboat to read its output. See: https://bbs.archlinux.org/viewtopic.php?id=258620
Last edited by graysky (2020-08-29 18:23:12)