Hostsblock is a bash script designed to take advantage of the HOSTS file available in all operating systems to provide system-wide blocking of internet advertisements, malicious domains, trackers, and other undesirable content. To do so, it downloads a configurable set of blocklists and processes their entries into a single HOSTS file.
Hostsblock also includes hostsblock-urlcheck, a command-line utility that allows you to block and unblock specific websites and any other domains those websites pull in, in case the included blocklists block too little or too much on a particular site.
Features:
-System-wide blocking (all non-proxied connections use the HOSTS file)
-Zip- and 7zip-capable (can download and process zip- and 7zip-compressed files)
-Non-interactive (can be run as a periodic cronjob without needing user interaction)
-Extensive configurability (allows for custom black and white listing, redirection, post-processing scripting, target HOSTS file, etc.)
-Bandwidth-efficient (only downloads blocklists that have changed, uses compression when available)
-Resource-efficient (only processes blocklists when changes are registered, uses minimal pipes)
-High-performance blocking (when using DNS caching and pseudo-server daemons)
-Extensive choice of blocklists included (allows user to customize how much or how little is blocked)
-Redirection capability (combats DNS cache poisoning)
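The bandwidth- and resource-efficiency points boil down to "skip work when a list has not changed." A minimal sketch of the processing side, assuming one checksum stamp file per blocklist (the `list_changed` helper and the stamp-file layout are hypothetical, not hostsblock's actual code). On the download side, curl's `-z FILE` (only fetch if the remote copy is newer than FILE's modification time) and `--compressed` cover the bandwidth half.

```shell
# list_changed FILE STAMP  (hypothetical helper)
#   Returns 0 if FILE's checksum differs from the one recorded in STAMP and
#   records the new checksum; returns 1 if nothing changed; 2 if FILE is unreadable.
list_changed() {
    local file=$1 stamp=$2 new old=""
    # cksum prints "CRC byte-count"; reading stdin keeps the file name out of it
    new=$(cksum < "$file") || return 2
    [ -r "$stamp" ] && old=$(cat "$stamp")
    if [ "$new" = "$old" ]; then
        return 1
    fi
    printf '%s\n' "$new" > "$stamp"
}
```

A caller would only run the expensive grep/sed/sort step when at least one `list_changed` call returns 0.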
Hostsblock is available via the AUR.
More information can be found at the hostsblock homepage and below.
Please submit all comments, feedback, and bugs below.
Old stuff----------------------------------------------------------------------------------------------------------------------------------------------------------------
EDITED: The script has changed rapidly over the past couple of weeks. You can now find it on the AUR (see news below).
FEATURES
Non-interactive downloading and combining of a (possibly unlimited) number of hosts file lists into a single file.
Filters out redundant entries, excessive white spaces, and comments to minimize resultant file size.
Can use links to zipped files (using the "addzip" subroutine) or plaintext files ("addurl").
Preserves the redirect IP address (e.g. 127.0.0.1) of the original files so that lists designed to combat DNS pollution can be used (e.g. in China).
Includes a whitelist area
Recycles entries from the original hosts file in case one of the included servers is momentarily down or you are using your plain old /etc/hosts file (preserves your loopback entries).
File processing is just a single "one liner" (broken up with \'s for legibility sake).
SUGGESTED ADDITIONS
Use dnsmasq as a caching server instead of merely using this to override your existing /etc/hosts file. The default hosts file is HUGE (8.6 MB) and may bog down slower machines. Using dnsmasq resolves this (at the expense of memory usage) and should speed up repeat DNS resolutions. Follow the directions here (https://wiki.archlinux.org/index.php/Dn … ache_Setup) and then add /etc/hosts.adblock (the default) to dnsmasq.conf as an additional hosts file to make use of this.
Use pixelserv (http://proxytunnel.sourceforge.net/pixelserv.php), a tiny web server that answers every request sent to localhost with a 1x1 transparent gif image. This cleans up the blank spaces left by missing ads and should speed up page rendering (instead of waiting for a connection to time out, the browser just renders the 1x1 gif).
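For the dnsmasq route, the relevant dnsmasq option is `addn-hosts=FILE`, which makes dnsmasq read FILE in addition to /etc/hosts. A small sketch that appends that line only once (the `add_dnsmasq_hosts` helper is hypothetical; the config path depends on your setup):

```shell
# add_dnsmasq_hosts CONF HOSTSFILE  (hypothetical helper)
#   Appends "addn-hosts=HOSTSFILE" to the dnsmasq config unless that exact line exists.
add_dnsmasq_hosts() {
    local conf=$1 hostsfile=$2
    local line="addn-hosts=$hostsfile"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Usage (as root), then restart dnsmasq:
#   add_dnsmasq_hosts /etc/dnsmasq.conf /etc/hosts.adblock
```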
PLANS (i.e. stuff I wouldn't mind help with)
Move all items in need of user configuration (necessary variables, blacklists, whitelists) to an external config file.
Package this up for inclusion in the AUR, or even merge with hosts_update (https://aur.archlinux.org/packages.php?ID=44930)
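One way the first plan could look, as a sketch: the script sets its defaults and then sources a bash-syntax config file that may override them. `tmpdir` and `hostsfile` come from the script below, `blocklists` is the variable name mentioned in the 0.2 notes, and /etc/hostsblock.conf is the path later releases use; `load_config`, `redirecturl`, and `whitelist` are hypothetical names. Because the file is executed as root, it should be root-owned and not world-writable.

```shell
# load_config [CONF]  (hypothetical helper)
#   Sets defaults, then lets a bash-syntax config file override them.
load_config() {
    local conf=${1:-/etc/hostsblock.conf}
    tmpdir=/dev/shm
    hostsfile=/etc/hosts.adblock
    redirecturl=127.0.0.1
    blocklists=()
    whitelist=()
    # Only source the file if it exists and is readable
    if [ -r "$conf" ]; then
        # shellcheck source=/dev/null
        . "$conf"
    fi
}
```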
/etc/cron.daily/update-adblock
#!/bin/bash
## NECESSARY VARIABLES
# temporary directory where files will be downloaded and unzipped
tmpdir=/dev/shm
# final resulting composite file.
hostsfile=/etc/hosts.adblock # for use with dnsmasq
#hostsfile=/etc/hosts # for use without dnsmasq
# commands to execute at the end of the process
dnsmasq_restart() { #
/etc/rc.d/dnsmasq restart # if using dnsmasq
} #
#dnsmasq_restart() { #
#/bin/true # if NOT using dnsmasq
#} #
## Subroutines
addurl() {
echo "$n: $@..."
curl -s -o $tmpdir/hosts.adblock.d/hosts.adblock.$n "$@"
let "n+=1"
}
addzip() {
[ -d $tmpdir/tmp ] && rm -r $tmpdir/tmp
mkdir $tmpdir/tmp
echo "$n: $@..."
curl -s -o $tmpdir/tmp/hosts.adblock.zip "$@"
cd $tmpdir/tmp
echo " Extracting..."
ionice -c 3 nice -n 19 unzip -jq hosts.adblock.zip
grep -Ih -- "^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" ./* > $tmpdir/hosts.adblock.d/hosts.adblock.$n
cd /tmp
rm -r $tmpdir/tmp
let "n+=1"
}
## Main routine
[ -d $tmpdir/hosts.adblock.d ] && rm -r $tmpdir/hosts.adblock.d
mkdir $tmpdir/hosts.adblock.d
cp $hostsfile $tmpdir/hosts.adblock.d/hosts.adblock.0
n=1
echo "DOWNLOADING BLACKLISTS:"
## BLACKLISTS TO DOWNLOAD
# hphosts main file
addzip "http://support.it-mate.co.uk/downloads/hphosts.zip"
# hphosts file to block yahoo
addzip "http://hosts-file.net/download/yahoo_servers.zip"
# hphosts partial file (for updates between main releases)
addurl "http://hosts-file.net/hphosts-partial.asp"
# hostsfile.org BADHOSTS file (currently defunct?)
addzip "http://hostsfile.org/Downloads/BadHosts.unx.zip"
# hostsfile.mine.nu
addzip "http://hostsfile.mine.nu/Hosts.zip"
# the extremely popular mvps hostsfile
addzip "http://winhelp2002.mvps.org/hosts.zip"
# yoyo.org files
addurl "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=1&startdate%5Bday%5D=&startdate%5Bmonth%5D=&startdate%5Byear%5D=&mimetype=plaintext"
# malwaredomainlist.com
addurl "http://www.malwaredomainlist.com/hostslist/hosts.txt"
# securemecca (should be a mirror of hostsfile.org)
addurl "http://www.securemecca.com/Downloads/hosts.txt"
# hostsfile.org (mirror of securemecca)
addurl "http://www.hostsfile.org/Downloads/hosts.txt"
# defunct? Coming back?
addurl "http://someonewhocares.org/hosts/hosts"
# sysctl.org
addurl "http://sysctl.org/cameleon/hosts"
# a file specifically for smartphones
addurl "http://www.ismeh.com/HOSTS"
# another file I randomly found
addurl "http://www.modyouri.com/adblock_hosts/hosts"
# Process files
echo "PROCESSING FILES..."
grep -Ih -- "^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" $tmpdir/hosts.adblock.d/* |\
sed -e 's/[[:space:]][[:space:]]*/ /g' -e "s/\r//g" -e "s/\#.*//g" -e "s/ $//g" |\
sort -u |\
# WHITELIST
sed -e "/\.dropbox\.com/d" -e \
"/ www\.malwaredomainlist\.com/d" -e \
"/ www\.arcamax\.com/d" -e \
"/ www\.instructables\.com/d" -e \
"/ goo\.gl/d" -e \
"/ www\.reddit\.com/d" -e \
"/ t\.co/d" -e \
"/ bit\.ly/d" -e \
"/ www\.viddler\.com/d" -e \
"/ viddler\.com/d" -e \
"/ tinyurl\.com/d" -e \
"/ www\.pcmag\.com/d" -e \
"/ www\.forbes\.com/d" -e \
"/ www\.hydrogenaudio\.org/d" -e \
"/ www\.flickr\.com/d" -e \
"/ adjax\.flickr\.yahoo\.com/d" -e \
"/ l-stat\.livejournal\.com/d" -e \
"/ stat\.livejournal\.com/d" -e \
"/\.about\.com/d" -e \
"/ hosts-file\.net/d" -e \
"/ community\.livejournal\.com/d" -e \
"/ netflix\.com/d" -e \
"/ torrentfreak\.com/d" -e \
"/\.linkedin\.com/d" -e \
"/\.espn\.go\.com/d" -e \
"/ l\.yimg\.com/d" > $hostsfile
# clean up
echo "CLEANING UP..."
[ -d $tmpdir/tmp ] && rm -r $tmpdir/tmp
rm -r $tmpdir/hosts.adblock.d
dnsmasq_restart
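The "Process files" step of the script above can be isolated into a function so it can be tested on its own. A minimal sketch (`normalize_hosts` is a hypothetical name and the whitelist step is left out); it strips comments before squeezing whitespace so no trailing space is left behind, and, like the original, relies on GNU sed understanding `\r`:

```shell
# normalize_hosts [FILE...]  (hypothetical helper; reads stdin if no files given)
#   Keeps lines starting with an IPv4 address, strips CRs and comments,
#   squeezes whitespace to single spaces, and de-duplicates.
normalize_hosts() {
    grep -Ih -- '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$@" |
        sed -e 's/\r//g' -e 's/#.*//' -e 's/[[:space:]][[:space:]]*/ /g' -e 's/ $//' |
        sort -u
}

# Usage: normalize_hosts "$tmpdir"/hosts.adblock.d/* > "$hostsfile"
```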
Last edited by jasonwryan (2015-02-17 20:16:05)
Check out hostsblock for system-wide ad- and malware-blocking.
Offline
https://aur.archlinux.org/packages.php?ID=58976
Just uploaded an updated version of this script to the AUR, complete with significant improvements (in bold below).
FEATURES
-Non-interactive downloading and combining of a (possibly unlimited) number of hosts file lists into a single file.
-Filters out redundant entries, excessive white spaces, and comments to minimize resultant file size.
-Can seamlessly use links to zipped files or plaintext files.
-Preserves the redirect IP address (e.g. 127.0.0.1) of the original files so that lists designed to combat DNS pollution can be used (e.g. in China).
-Customizable redirect IP address (e.g. 127.0.0.1) for blocked hosts
-Whitelist
-Recycles entries from the original hosts file in case one of the included servers is momentarily down or you are using your plain old /etc/hosts file (preserves your loopback entries).
-options now configurable with a config file (/etc/hostsblock.conf)
-installable via the AUR!
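The customizable redirect address has to coexist with the preserved addresses from anti-pollution lists. A sketch under the assumption that only 127.0.0.1 and 0.0.0.0 count as "blocking" addresses and that localhost entries are left alone (`set_redirect` is a hypothetical name, not hostsblock's code):

```shell
# set_redirect IP  (hypothetical helper; filters normalized hosts lines on stdin)
#   Rewrites the address of blocking entries to IP and leaves every other
#   address untouched, so lists that correct DNS pollution keep their real IPs.
set_redirect() {
    local ip=$1
    awk -v ip="$ip" '
        ($1 == "127.0.0.1" || $1 == "0.0.0.0") && $2 !~ /^localhost/ { $1 = ip }
        { print }
    '
}
```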
SUGGESTED ADDITIONS
-Use dnsmasq as a caching server instead of merely using this to override your existing /etc/hosts file. The default hosts file is HUGE (8.6 MB) and may bog down slower machines. Using dnsmasq resolves this (at the expense of memory usage) and should speed up repeat DNS resolutions. Follow the directions here (https://wiki.archlinux.org/index.php/Dn … ache_Setup) and then add /etc/hosts.adblock (the default) to dnsmasq.conf as an additional hosts file to make use of this.
-Use pixelserv (now in the AUR at https://aur.archlinux.org/packages.php?ID=58975), a tiny web server that answers every request sent to localhost with a 1x1 transparent gif image. This cleans up the blank spaces left by missing ads and should speed up page rendering (instead of waiting for a connection to time out, the browser just renders the 1x1 gif).
Have you ever compared or benchmarked this solution against privoxy and/or the adblock extension (in the chromium browser)?
@el mariachi: I have not compared the two empirically, but a hosts-file implementation like hostsblock has certain advantages and disadvantages:
+protection for the whole operating system, not just for chromium
+once activated, no further configuration of apps is needed (vs. privoxy, which requires apps to support a proxy or requires you to redirect everything via redsocks)
+lists that fix DNS poisoning can also be included (e.g. for people behind the Great Firewall of China)
-unlike adblock, it can't use wildcards for matching: each blacklist must list every offending subdomain separately
-a small number of ads internal to websites (e.g. distrowatch hosts its own ads, and gmail works around this too) can't be blocked this way, but can be via adblock
-certainly not as sophisticated in its filtering as privoxy, but with the benefit of a lesser performance penalty than privoxy.
+/-theoretical performance penalties (but with solutions):
-extremely large hosts files can slow down DNS resolution, BUT
+if you use dnsmasq and add the second file as an additional hosts file, the entries are cached and should be resolved nearly instantaneously (see the wiki page on dnsmasq dns caching)
-since the hosts file redirects a matching blacklisted subdomain to localhost, pages might take longer to load while those requests time out on localhost, and the empty ad boxes remain on the page, BUT
+if you use pixelserv, redirected blacklisted subdomains load nearly instantly with a 1x1 transparent gif pixel (like with privoxy)
>Theoretically, with pixelserv and dnsmasq caching, hostsblock should be significantly faster than adblock and privoxy. Privoxy is notorious for slowing down connections, and adblock (to the best of my knowledge) requires yet another scan of incoming data.
Bottom line: I like having hostsblock for the security of my whole machine, but I also like using adblock for any extra protection it provides.
(I also use Do Not Track+ and HTTPS Everywhere, and I disabled javascript and cookies in chromium, using chromium's whitelist feature for the select domains I do want to allow in.)
Great answer, thank you! For pixelserv to work with your script, do I have to do anything special, or simply start the daemon?
I'm trying it now
If you have pixelserv installed and "pixelserv=1" defined in the hostsblock.conf file, then hostsblock will automatically check to see if pixelserv is running and start it if it isn't. Otherwise, just starting the daemon will work just fine.
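The "start it if it isn't running" check can be sketched with `pgrep -x`, which matches process names exactly. This is an illustration, not hostsblock's code: `ensure_running` is a hypothetical helper, and whether pixelserv's process is literally named `pixelserv` depends on how it was installed (`pgrep -f` matches the full command line instead).

```shell
# ensure_running NAME COMMAND...  (hypothetical helper)
#   Runs COMMAND only when no process named exactly NAME exists.
ensure_running() {
    local name=$1
    shift
    if ! pgrep -x "$name" >/dev/null 2>&1; then
        "$@"
    fi
}

# Example (as root, using the initscripts-era path from this thread):
#   ensure_running pixelserv /etc/rc.d/pixelserv start
```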
UPDATE: https://aur.archlinux.org/packages.php?ID=58976
Update 0.2:
-Added personal blacklist functionality
-Included more info in the config file
WARNING: there has been a minor change in the config file: "blacklists" is now "blocklists".
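A personal blacklist only needs to be turned into hosts-style lines before it joins the downloaded lists. A sketch assuming one domain per line with optional `#` comments (the file format and the `blacklist_to_hosts` name are assumptions, not the real config syntax):

```shell
# blacklist_to_hosts IP FILE  (hypothetical helper)
#   Prints "IP domain" for every non-empty, non-comment line of FILE.
blacklist_to_hosts() {
    local ip=$1 file=$2
    # sub() drops comments and re-splits fields; NF skips lines left empty
    awk -v ip="$ip" '{ sub(/#.*/, "") } NF { print ip, $1 }' "$file"
}
```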
I must be doing something wrong. I whitelisted ".eztv.it" (I don't understand why it is blacklisted in the first place) and I still can't access it.
edit: eztv.it works with " eztv.it" instead of ".eztv.it", but ompldr.org doesn't, and flickr is seriously mangled.
Last edited by el mariachi (2012-05-06 14:42:31)
That is definitely one of the downsides of having lots of aggressive blocklists together. One way I always check what is missing is by using grep on the resulting file, e.g.
$ grep eztv /etc/hosts.block
127.0.0.1 eztvefnet.org
127.0.0.1 eztv.it
127.0.0.1 eztvlinks.com
127.0.0.1 eztv.tracker.prq.to
127.0.0.1 www.eztvefnet.org
127.0.0.1 www.eztv.it
127.0.0.1 www.eztvlinks.com
With that, I would add both " www.eztv.it" and " eztv.it" to your whitelist. (I like being as precise as possible so that these other look-alike domains don't sneak in.)
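A plain `grep eztv` also matches eztvefnet.org, eztvlinks.com, and so on, which is why the whitelist entries above are written so precisely. To check whether one exact hostname is blocked, compare whole fields instead. A sketch (`is_blocked` is a hypothetical helper):

```shell
# is_blocked HOST FILE  (hypothetical helper)
#   Prints the hosts lines listing exactly HOST; returns 0 if any exist.
is_blocked() {
    awk -v host="$1" '
        { for (i = 2; i <= NF; i++) if ($i == host) { print; found = 1 } }
        END { exit !found }
    ' "$2"
}

# Usage: is_blocked eztv.it /etc/hosts.block
```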
ompldr.org:
$ grep ompldr.org /etc/hosts.block
127.0.0.1 ompldr.org
127.0.0.1 www.ompldr.org
whitelist: " ompldr.org" and " www.ompldr.org"
After re-running hostsblock and restarting chromium (which seems to cache DNS queries too), I can now get through. I don't know if the page appears as it should; if not, I usually look at the page source to see if there are any other subdomains listed that might need to be unblocked.
flickr is a tougher cookie to crack, since it is all entangled with yahoo. When I just tried to log in to my old account, it stopped at https://login.yahoo.com.
$ grep login.yahoo.com /etc/hosts.block
127.0.0.1 login.yahoo.com
127.0.0.1 www.login.yahoo.com
Whitelisting "login.yahoo.com" alone should take care of that.
Now I'll run hostsblock again to refresh the list and see if I can get logged in.
Oi, that is mangled. I'll check the page source for domains. Here are the domains I found by a quick scan:
l.yimg.com
us.js.yimg.com
Let's check that out in the blocklist. Oi. By grepping just .yimg.com, I get a huge list of subdomains... I suspect that some flickr users use their accounts to spam.
Let's check out just .js.yimg.com: Oh, nothing there, so I would just add " l.yimg.com" to the list.
Re-run hostsblock, restart chromium, and everything is clean.
I should look into making this process a little easier, possibly through some sort of command line tool that can detect possible subdomains in a particular website. Something for the TODO list.
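The manual "scan the page source for domains" step can be approximated with a pattern match. Hostsblock later shipped hostsblock-urlcheck for this job; the following is not its code, only a sketch (`page_domains` is hypothetical). Being regex-based, it misses hosts assembled by JavaScript at runtime and may pick up the odd false positive.

```shell
# page_domains  (hypothetical helper; HTML on stdin, e.g. from curl -s URL)
#   Prints each hostname found in http://, https:// or protocol-relative // URLs once.
page_domains() {
    grep -oE '(https?:)?//[A-Za-z0-9.-]+\.[A-Za-z]{2,}' |
        sed -E 's#^(https?:)?//##' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Usage: curl -s http://www.flickr.com/ | page_domains
```

Each printed name can then be checked against the blocklist before deciding what to whitelist.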
Version 0.3 (06.04.2012) https://aur.archlinux.org/packages.php?ID=58976
*improved performance by replacing the whitelist for-loop with a sed file in the main processing routine
*added automatic backing up of original target hosts file
*added install-time warning to backup /etc/hosts file
*changed install spot to /usr/sbin instead of /usr/bin (requires root to run anyway)
*fixed typo in default variables (blacklists to blocklists)
*added changelog
*updated conf file with whitelist entries to unblock flickr.com and ompldr.org
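The automatic backup of the original target hosts file (gzip-compressed as of 0.4) can be sketched as a one-time copy that later runs never overwrite. `backup_once` and the idea of skipping an existing backup are assumptions, not taken from the release:

```shell
# backup_once TARGET BACKUP  (hypothetical helper)
#   Writes a gzip-compressed copy of TARGET to BACKUP unless BACKUP already exists,
#   so the pristine original survives later runs.
backup_once() {
    local target=$1 backup=$2
    [ -e "$backup" ] && return 0
    [ -r "$target" ] || return 1
    gzip -c "$target" > "$backup" || { rm -f "$backup"; return 1; }
}

# Usage (as root): backup_once /etc/hosts /etc/hosts.bak.gz
```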
TODO
*add command-line tool to interactively examine a given site, provide a list of domains involved in that site (and whether they are blocked or not), and add entries to the blacklist and whitelist accordingly.
*fix issue with whitespaces in whitelist entries (disappears in temporary .sed file)
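The whitespace problem with the temporary .sed file usually comes from `read` trimming leading blanks. Reading with `IFS= read -r` keeps the leading space that makes " eztv.it" match only that exact hostname. A sketch, assuming entries contain only hostname characters (so escaping dots is enough) and anchoring each pattern at end of line, which is a choice of this sketch, not the original's unanchored patterns:

```shell
# make_whitelist_sed  (hypothetical helper; whitelist entries on stdin)
#   Emits one sed delete command per entry, e.g. " eztv.it" -> "/ eztv\.it$/d".
make_whitelist_sed() {
    local entry escaped
    # IFS= keeps leading/trailing spaces; the || handles a final line without newline
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -z "$entry" ] && continue
        escaped=$(printf '%s' "$entry" | sed 's/\./\\./g')
        printf '/%s$/d\n' "$escaped"
    done
}

# Usage: make_whitelist_sed < whitelist.txt > whitelist.sed
#        ... | sed -f whitelist.sed > "$hostsfile"
```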
I confirm the fix works. Thanks
New version out:
Version 0.4 (09.04.2012) https://aur.archlinux.org/packages.php?ID=58976
*Post-extraction scan now recursively searches extracted file tree for potential entries
*Added 7z blocklist functionality
*Moved unzip (with p7zip) to optdep
*Added check for unzip and 7za to decompression routines
*Added new entries from rlwpx.free.fr (requires p7zip)
*Now gzips backup hosts file to save space (added gzip as dependency)
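The recursive post-extraction scan and the unzip/7za checks can be sketched as two small functions. The names are hypothetical; `unzip -q -o -d DIR` and `7za x -y -oDIR` are the standard extraction invocations of those tools.

```shell
# extract_archive ARCHIVE DEST  (hypothetical helper)
#   Extracts .zip or .7z files, failing cleanly if the optional tool is missing.
extract_archive() {
    local archive=$1 dest=$2
    mkdir -p "$dest"
    case $archive in
        *.zip)
            command -v unzip >/dev/null || { echo "unzip not installed" >&2; return 1; }
            unzip -q -o "$archive" -d "$dest" ;;
        *.7z)
            command -v 7za >/dev/null || { echo "7za (p7zip) not installed" >&2; return 1; }
            7za x -y -o"$dest" "$archive" >/dev/null ;;
        *)
            echo "unknown archive type: $archive" >&2; return 1 ;;
    esac
}

# scan_tree DIR: print every line, in any text file below DIR, that starts with an IPv4 address
scan_tree() {
    grep -rIh -- '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$1"
}
```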
TODO
*add command-line tool to interactively examine a given site, provide a list of domains involved in that site (and whether they are blocked or not), and add entries to the blacklist and whitelist accordingly.
*fix issue with whitespaces in whitelist entries (disappears in temporary .sed file)
*add color output
Last edited by gaenserich (2012-05-10 13:20:55)
Do be warned: the blocklists from rlwpx.free.fr are HUGE and in most cases probably overkill (i.e. you'll have to make a lot of whitelist exceptions). Use only if you're brave or borderline paranoid.
humm I still see "page not found" errors. I have pixelserv and dnsmasq. Do I have to redirect my browser to somewhere?
A while back, I found that pixelserv would shut down unexpectedly. Does "ps aux | grep -v grep | grep pixelserv" produce anything? If not, the way I worked around it was adding a little entry to my crontab, e.g. as root: crontab -e, then add the line
*/10 * * * * ps aux | grep -v grep | grep pixelserv >/dev/null 2>&1 || /etc/rc.d/pixelserv restart
This will check every 10 minutes to make sure pixelserv is running.
that'll do for now, but can you look into it and see if there's a "cleaner" solution? why is it shutting down?
Sorry if I sound ungrateful, I don't have much now, I'll help you out soon
There isn't much documentation out there on pixelserv, which hasn't been maintained for a while, and I personally don't know enough about perl to get my hands dirty with it. It hasn't done this to me in a while, so I wouldn't know where to start either. If possible, run it directly (i.e. not through the rc script) in the foreground for debugging info.
running in the foreground doesn't tell much. It just "crashes".
Reading pixelserv.pl didn't help me much either
I've placed @pixelserv in my rc.conf. Haven't had an issue with it. Perhaps you're not starting it right, Mariachi?
Just test
sudo rc.d start pixelserv
If that doesn't work, then I have no clue.
Last edited by ObliviousGmn (2012-05-10 23:29:20)
After rooting around, I found an alternative version of pixelserv written in C: http://www.linksysinfo.org/index.php?th … 509/page-7. I'll try compiling it once I'm home. This would work much better than the perl version anyway!
This script is developing very fast, but I see that ontobelli's script (https://bbs.archlinux.org/viewtopic.php?id=135884) produces a cleaner and better-looking hosts file -- probably due to the different (order of) commands used at the stage of processing downloaded blocklists as well as original hosts file entries and user lists of domains to be denied or allowed access.
Perhaps a solution might be sorting all the lists and simply adding them under the original hosts entries.
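The suggestion above (keep the original entries on top and add the sorted lists underneath) can be sketched as follows; `merge_under_header` is a hypothetical helper, not code from either script. It also avoids the redundant localhost lines mentioned in the reply below by dropping blocklist lines that already appear verbatim in the original file.

```shell
# merge_under_header ORIGINAL BLOCKLIST  (hypothetical helper)
#   Prints ORIGINAL verbatim, then the sorted, de-duplicated BLOCKLIST lines
#   that are not already present (as whole lines) in ORIGINAL.
merge_under_header() {
    local original=$1 blocklist=$2
    cat "$original"
    # -v invert, -x whole-line match, -F fixed strings, -f read patterns from file
    sort -u "$blocklist" | grep -vxF -f "$original"
    local s=("${PIPESTATUS[@]}")
    # grep exits 1 when every entry was already present; treat that as success
    [ "${s[0]}" -eq 0 ] && [ "${s[1]}" -le 1 ]
}
```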
Additional suggestions:
1) using 3 separate files for user allowed and denied domain lists and for url addresses of downloadable blocklists
2) avoiding the use of zip files for the sake of 100% MacOSX-compatibility.
Re: C-implemented pixelserv: I tried compiling it last night, but I'm missing a library or something for it to run. It was specifically written for linksys firmware and its derivatives, e.g. tomato and dd-wrt, so it might be a tangle to get it to work on x86.
@el mariachi: also, I believe pixelserv needs to be run as root (since it binds to a privileged port).
@sadi
Re: cleaner output hosts file:
Is there a certain sense in which you mean clean here? Assuming you mean that it preserves the original file's header, I could implement a little step that detects use of /etc/hosts (instead of /etc/hosts.block, for example) and appends the remainder accordingly. However, that would leave redundant entries (most blocklists also include their own localhost entries). Be assured that your localhost entry never disappears, but rather just gets buried in alphanumeric order among all the other entries. Moreover, hostsblock also does everything it can to remove excess whitespace and comments to reduce file size.
Vs. ontobelli's script, hostsblock is a little bit more powerful via its configurability (you don't have to edit the script itself to customize how it works) and the simplicity of its processing step (fewer commands linked by pipes during the processing step significantly cuts down the amount of time needed to process, on my machine, 29M of blocklists, especially on my wimpy little proxy server)
Re: Separate files. I thought about that, and I certainly like the idea; doing that with the whitelists and blacklists would be trivial to implement. I don't necessarily see the benefit over a single centralized file, however, as long as the lists of entries stay moderately short, as they are now. I could be persuaded otherwise.
Re: zip files. Feel free to supplant the given zip entries for their equivalent plaintext types (although some are only available as zip files) in your own config file. I included zip defaults to help reduce the amount of bandwidth required for daily updates (one domain on the list actually blocked me for a while for overusing bandwidth while I was testing the script).
Making the script 100% compatible is pretty trivial. When I install this script on os x, I typically just swap out the "unzip -jq" command with "unzip -f" or something, I don't remember off the top of my head.
I have a dd-wrt router, maybe I'll try that instead. Is the router supposed to be ARM? It's a linksys router, Broadcom chip. Anyway, I'll see what's needed to use it in x86.
@tlvince: Where has kwakd been all my life! This is exactly what I've been searching for!