#1 2022-11-19 16:29:45

lzer0
Member
Registered: 2022-05-17
Posts: 8

[SOLVED] help translating PCRE (Acrylic DNS) pattern to ERE (Squid)

I have set up a proxy with Squid to block certain domains, using a url_regex pattern that looks like this:

^https?://([a-z0-9]\.?)*yahoo\.([a-z]{2,}.*)$

This pattern blocks every subdomain, the domain itself, and any top-level domain after it.
But there is a problem: it also blocks unwanted entries like yahooyahoo.* and ayahoo.*, because the ([a-z0-9]\.?)* group accepts any run of letters or digits in front of "yahoo". That is not what I want. I want typos to stay reachable and to block only domains whose name is exactly "yahoo" in this example.
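
To see the over-matching concretely, here is a minimal sketch that runs the url_regex pattern above against a few URLs. It uses Python's re module as a stand-in for Squid's POSIX ERE engine (an assumption; for this particular pattern the two should agree on what matches). The sample URLs and the is_blocked helper are made up for illustration.

```python
import re

# The original url_regex pattern from the post, unchanged.
ORIGINAL = re.compile(r"^https?://([a-z0-9]\.?)*yahoo\.([a-z]{2,}.*)$")

# Hypothetical sample URLs, chosen only for illustration.
SAMPLES = [
    "http://yahoo.com/",
    "https://mail.yahoo.com/inbox",
    "http://ayahoo.com/",
    "http://yahooyahoo.com/",
    "http://example.com/",
]


def is_blocked(url: str) -> bool:
    """Return True if the original pattern matches the URL."""
    return ORIGINAL.search(url) is not None


if __name__ == "__main__":
    for url in SAMPLES:
        print(f"{url:35} blocked={is_blocked(url)}")
```

Of these samples, only the example.com URL is left alone. The ayahoo and yahooyahoo hosts are matched as well, since the repeated group in front of "yahoo" accepts letters with or without a dot after them.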

On Windows, I have set up a DNS proxy with Acrylic DNS, which follows this pattern syntax:
https://i.imgur.com/79dtR5x.png
This means a domain can be blocked completely with the following pattern:

>yahoo.*

This pattern does not block entries that don't exactly match the keyword: typos like ayahoo.* and yahooooo.* are allowed, which is the behavior I want but cannot replicate with Squid because the regex rules are different. Also, other filters like dansguardian and e2guardian are deprecated (both fail to compile from the AUR). The only one that managed to install is squidguard (AUR), but it's outdated and orphaned, which I guess makes it a security risk just to have installed.

In this case Squid is very similar to Acrylic DNS, as both use patterns to block undesirable stuff, but their syntax differs, so I'm stuck as to what the url_regex equivalent of this would be:

>keyword.*

(">" matches any subdomain before keyword)
("*" matches any top-level domain after the dot)
Typos like kkeyword, keywordd or keywordkeyword aren't matched, so access to them is allowed. This is the behavior I want but couldn't achieve at first. Never mind, I figured it out (see the edits below).
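
As a rough model of the behavior described above, here is a sketch that turns ">keyword.*" into a hostname regex. It assumes the two notes above are the whole rule (any number of subdomain labels, then the exact keyword, then a dot and anything). The function name acrylic_like_regex is hypothetical and this is not Acrylic's own implementation.

```python
import re


def acrylic_like_regex(keyword: str) -> re.Pattern:
    """Build a hostname regex that mimics '>keyword.*' as described in the post.

    Assumed semantics: zero or more 'label.' prefixes (the '>'),
    the exact keyword as its own label, then a dot and anything (the '*').
    """
    return re.compile(r"^(?:[^.]+\.)*" + re.escape(keyword) + r"\..+$")


if __name__ == "__main__":
    rule = acrylic_like_regex("keyword")
    for host in ["keyword.com", "www.keyword.co.uk", "kkeyword.com",
                 "keywordd.com", "keywordkeyword.com"]:
        print(f"{host:22} matched={rule.match(host) is not None}")
```

The point of the sketch is that the keyword has to be a whole label: it is bounded by the start of the hostname or a dot on the left, and by a dot on the right, so the typo variants fall through.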

EDIT1: OK, I kept messing with the syntax for a while and finally made a pattern that behaves exactly as I want (Version 1):

[^a-z0-9](keyword1|keyword2)\.[a-z]{2,}.*$

EDIT2: I forgot to add the start-of-line anchor. This matches the same URLs as Version 1, because url_regex already searches anywhere in the URL, so the leading ^.* only makes the pattern more explicit (Version 2):

^.*[^a-z0-9]\.?(keyword1|keyword2)\.[a-z]{2,}.*$
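
A quick way to sanity-check both versions is to run them over a handful of URLs. The sketch below again uses Python's re module in place of Squid's regex engine (an assumption), substitutes "yahoo" for keyword1|keyword2, and uses made-up sample URLs; blocked_by is a hypothetical helper.

```python
import re

KEYWORDS = "yahoo"  # stands in for keyword1|keyword2 in the post

# Version 1 and Version 2 from the post, with the keyword filled in.
V1 = re.compile(r"[^a-z0-9](" + KEYWORDS + r")\.[a-z]{2,}.*$")
V2 = re.compile(r"^.*[^a-z0-9]\.?(" + KEYWORDS + r")\.[a-z]{2,}.*$")

# Hypothetical URLs: exact keyword (should be blocked) vs typos (should pass).
BLOCKED = ["http://yahoo.com/", "https://mail.yahoo.com/", "http://yahoo.co.uk/"]
ALLOWED = ["http://ayahoo.com/", "http://yahooo.com/", "http://yahooyahoo.com/"]


def blocked_by(pattern: re.Pattern, url: str) -> bool:
    """Unanchored search, which is how a url_regex ACL is applied."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in BLOCKED + ALLOWED:
        print(f"{url:28} v1={blocked_by(V1, url)} v2={blocked_by(V2, url)}")
```

One thing to keep in mind: the pattern is applied to the whole URL, not only the hostname, so a URL such as http://example.com/yahoo.html also matches, because "/yahoo.html" looks like a separator, the keyword, a dot and two or more letters.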

This blocks only the exact keywords while allowing typos, so unrelated pages won't get blocked.
Oh, and by the way, the pattern doesn't support spaces or line breaks of any kind: you have to write one keyword after the other with only the pipe operators separating them and nothing else. The line gets long horizontally, so enable word wrap in your text editor.
I will mark this thread as solved.
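
Since the keyword list has to end up on one line with nothing but pipes between entries, it can be easier to keep the keywords in a normal list and generate the line. Here is a minimal sketch of that idea; build_pattern and the example keywords are hypothetical, and the output is the Version 1 pattern with the alternation filled in.

```python
import re


def build_pattern(keywords):
    """Join keywords into the single-line Version 1 pattern from the post.

    Keywords are trimmed and lowercased; anything other than letters,
    digits and hyphens is rejected rather than escaped, so no spaces or
    line breaks can leak into the result.
    """
    cleaned = []
    for raw in keywords:
        kw = raw.strip().lower()
        if not kw:
            continue  # skip blank entries
        if not re.fullmatch(r"[a-z0-9-]+", kw):
            raise ValueError(f"unsupported keyword: {raw!r}")
        cleaned.append(kw)
    if not cleaned:
        raise ValueError("no keywords given")
    return r"[^a-z0-9](" + "|".join(cleaned) + r")\.[a-z]{2,}.*$"


if __name__ == "__main__":
    # Example keywords only; replace with your own list.
    print(build_pattern(["yahoo", " example ", ""]))
```

The printed line can then be pasted wherever the url_regex pattern lives, and the list itself stays readable with one keyword per entry.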

Mod Edit - Replaced oversized image with link.
CoC - Pasting pictures and code

Last edited by lzer0 (2022-11-19 23:20:30)
