
#1 2019-04-04 00:48:56

wis
Member
Registered: 2019-02-15
Posts: 14

should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

in aurweb and AUR tools, to prevent malware distribution attacks?

An attacker can register a domain containing a Unicode character that looks like an ASCII character, host malware on it, and use it in a PKGBUILD's source URL.
All modern browsers and terminals render Unicode.

more info:
https://en.wikipedia.org/wiki/Internati … g_concerns
https://en.wikipedia.org/wiki/IDN_homograph_attack
https://www.wandera.com/mobile-security … e-attacks/

original title: should URLs in PKGBUILDs' source be displayed in punycode encoding?

Last edited by wis (2019-04-04 23:03:15)

Offline

#2 2019-04-04 01:24:16

Scimmia
Fellow
Registered: 2012-09-01
Posts: 11,565

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

What's the concern here? What's displayed is of no consequence. You're worried about a malicious PKGBUILD?

Last edited by Scimmia (2019-04-04 01:24:51)

Offline

#3 2019-04-04 01:40:50

wis
Member
Registered: 2019-02-15
Posts: 14

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Scimmia wrote:

What's the concern here? What's displayed is of no consequence. You're worried about a malicious PKGBUILD?

Yes.
This practically solves the issue: you can catch it while viewing the PKGBUILD in the terminal, without opening the URL in a browser to make sure that, for example, a URL containing the domain spotify.com, which looks legit, is not actually spotify.com with a Cyrillic s Unicode character.
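To make the lookalike concrete, here is a minimal Python sketch that lists every non-ASCII character in a string together with its code point and Unicode name. The function name describe_non_ascii and the sample domain (which swaps in U+0455, one of several Cyrillic letters that resemble a Latin "s") are assumptions for illustration only, not taken from a real package.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical lookalike: the first letter is U+0455, not a Latin "s".
lookalike = "\u0455potify.com"
for index, codepoint, name in describe_non_ascii(lookalike):
    print(index, codepoint, name)
```

The plain ASCII spelling yields an empty list, so anything this reports is worth a second look.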

Offline

#4 2019-04-04 02:27:21

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

So you're relying on whether the domain the source is retrieved from looks "familiar" to you?

You could just set a syntax highlighting rule for PKGBUILDs in your text editor to colorize characters > 128
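Outside of an editor, the same idea can be sketched as a small filter, assuming a terminal that understands ANSI escape sequences. The function name highlight_non_ascii and the choice of reverse-video red are illustrative assumptions, not an existing tool; any code point above 127 is treated as non-ASCII.

```python
# ANSI escape sequences: reverse video + red, and reset.
HIGHLIGHT = "\x1b[7;31m"
RESET = "\x1b[0m"


def highlight_non_ascii(text):
    """Wrap every non-ASCII character in ANSI escapes so it stands out."""
    out = []
    for ch in text:
        if ord(ch) > 127:
            out.append(f"{HIGHLIGHT}{ch}{RESET}")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical PKGBUILD line; the "c" in "com" is Cyrillic U+0441.
line = 'source=("https://spotify.\u0441om/client.tar.gz")'
print(highlight_non_ascii(line))
```

Pure-ASCII input passes through unchanged, so this could sit in front of any pager that passes escape sequences through.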

Wait, is this about the aurweb interface display, or what you see when you view the PKGBUILD on your system?

Last edited by Trilby (2019-04-04 02:28:20)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#5 2019-04-04 02:46:38

ngoonee
Forum Fellow
From: Between Thailand and Singapore
Registered: 2009-03-17
Posts: 7,356

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Trilby wrote:

So you're relying on whether the domain the source is retrieved from looks "familiar" to you?

You could just set a syntax highlighting rule for PKGBUILDs in your text editor to colorize characters > 128

Wait, is this about the aurweb interface display, or what you see when you view the PKGBUILD on your system?


He mentions aurweb in his first sentence. I'm not sure whether this is even a concern, you'd know better...


Allan-Volunteer on the (topic being discussed) mailing lists. You never get the attention of the people who matter on the forums.
jasonwryan-Installing Arch is a measure of your literacy. Maintaining Arch is a measure of your diligence. Contributing to Arch is a measure of your competence.
Griemak-Bleeding edge, not bleeding flat. Edge denotes falls will occur from time to time. Bring your own parachute.

Offline

#6 2019-04-04 03:58:20

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

That's why I asked, as the OP's first post was all about the web interface, but the second one referred to viewing a PKGBUILD in a terminal.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#7 2019-04-04 05:48:15

wis
Member
Registered: 2019-02-15
Posts: 14

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Trilby wrote:

So you're relying on whether the domain the source is retrieved from looks "familiar" to you?

You could just set a syntax highlighting rule for PKGBUILDs in your text editor to colorize characters > 128

Wait, is this about the aurweb interface display, or what you see when you view the PKGBUILD on your system?

Yes. and Yes.
I think it's a real concern; it should be implemented in aurweb, then AUR helpers will follow suit.
I currently use yay. I think it uses less or $PAGER as its pager, which I would like to be vim at some point in the future, when I can set it up; probably all that is needed is a vim plugin.
The punycode (IDNA) form of an ASCII domain is the same ASCII domain, so 99% of PKGBUILDs will look the same.

Last edited by wis (2019-04-04 05:49:59)

Offline

#8 2019-04-04 10:59:31

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

I don't know of any AUR helpers that have their own tool for viewing PKGBUILDs.

wis wrote:

I currently use yay. I think it uses less as a pager or $PAGER which I would like it to be vim at some point in the future...

So you are waiting for the author of yay to take your choice away and force everyone to use vim so you don't have to simply set an environment variable to view the PKGBUILD how you want?

As I said in my previous post, it is currently trivial to set a vim syntax highlighting rule to highlight characters outside the ASCII range.  You don't need anything on aurweb to change in order to do this.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#9 2019-04-04 11:20:37

ayekat
Member
Registered: 2011-01-17
Posts: 1,590

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

wis wrote:

I think it's a real concern; it should be implemented in aurweb, then AUR helpers will follow suit.

How do you implement this in aurweb? You would either need to parse the PKGBUILD (which is a big no-no) to determine what part of the displayed file is the source array, or you would just display the entire file in Punycode, which has some odd side effects (comments, or maintainer names). Not to mention that the file would then no longer be displayed as-is, which I would strongly oppose.

Also, the AUR helper authors don't really care about how aurweb displays things (why should they?). You'd have to ask them individually (and there are dozens of AUR helpers out there, and possibly a 3- or 4-digit number of personal AUR helper scripts, so have fun). And what should they do, exactly?

I think Trilby's suggestion of just having your text editor display such characters in a special way (or more sophisticated: display such characters in a URL in a special way) is the simplest (and most reliable) one.


pkgshackscfgblag

Offline

#10 2019-04-04 11:54:33

goeb
Member
Registered: 2015-06-03
Posts: 11

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

ayekat wrote:

How do you implement this in aurweb? You would either need to parse the PKGBUILD (which is a big no-no) to determine what part of the displayed file is the source array, or you would just display the entire file in Punycode, which has some odd side effects (comments, or maintainer names). Not to mention that the file would then no longer be displayed as-is, which I would strongly oppose.

There's already syntax highlighting for the PKGBUILD, so highlighting any offending characters should be possible for the web interface, too. And you could still show the punycode URL in the Sources section on the packages' main pages, since that information comes from .SRCINFO. However, I suspect that you could upload a .SRCINFO with a URL that differs from the URL in the PKGBUILD, so the user will have to take care about PKGBUILD contents anyway.
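As a rough sketch of the .SRCINFO side of this, the following Python reads "source = ..." entries from .SRCINFO text and reports those whose host contains non-ASCII characters. It assumes the usual "key = value" layout and an optional "filename::" prefix on entries; the function names and the sample data are hypothetical. As noted above, it says nothing about what the PKGBUILD itself does.

```python
from urllib.parse import urlsplit


def srcinfo_sources(srcinfo_text):
    """Collect the URL part of every source / source_<arch> entry."""
    urls = []
    for line in srcinfo_text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if not sep or not key.startswith("source"):
            continue
        # Drop an optional "filename::" prefix, but only if it precedes the scheme.
        head, marker, tail = value.partition("::")
        urls.append(tail if marker and "://" not in head else value)
    return urls


def non_ascii_hosts(urls):
    """Return the URLs whose host part contains a non-ASCII character."""
    flagged = []
    for url in urls:
        host = urlsplit(url).hostname  # None for plain local file names
        if host and any(ord(ch) > 127 for ch in host):
            flagged.append(url)
    return flagged


# Hypothetical .SRCINFO content; the second host uses Cyrillic U+0441 for "c".
SAMPLE = (
    "pkgbase = demo\n"
    "\tsource = demo.tar.gz::https://example.org/demo.tar.gz\n"
    "\tsource = https://spotify.\u0441om/x.tar.gz\n"
    "\tsource = demo.patch\n"
)
print(non_ascii_hosts(srcinfo_sources(SAMPLE)))
```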

Offline

#11 2019-04-04 13:44:04

wis
Member
Registered: 2019-02-15
Posts: 14

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Trilby wrote:

I don't know of any AUR helpers that have their own tool for viewing PKGBUILDs.

wis wrote:

I currently use yay. I think it uses less as a pager or $PAGER which I would like it to be vim at some point in the future...

So you are waiting for the author of yay to take your choice away and force everyone to use vim so you don't have to simply set an environment variable to view the PKGBUILD how you want?

As I said in my previous post, it is currently trivial to set a vim syntax highlighting rule to highlight characters outside the ASCII range.  You don't need anything on aurweb to change in order to do this.

No, that's not what I was saying at all; my bad for not writing as clearly as I should have. I should've written "all that is needed for me to do", and that my ideal choice of pager for everything is vim.
I'm not advocating forcing a change to anyone's choices, but I want this discussion to happen.
I think I know how I can do it: Vim has a -c option, which can be used to enable a plugin, so I can make a wrapper script for yay and set
PAGER='vim -R -c ":enablePkgViewPlugin" -' before calling yay.


ayekat wrote:
wis wrote:

I think it's a real concern; it should be implemented in aurweb, then AUR helpers will follow suit.

How do you implement this in aurweb? You would either need to parse the PKGBUILD (which is a big no-no) to determine what part of the displayed file is the source array, or you would just display the entire file in Punycode, which has some odd side effects (comments, or maintainer names). Not to mention that the file would then no longer be displayed as-is, which I would strongly oppose.

To my knowledge, punycode encoding applies only to (IDN) domain names, so only the domain part of the source URL would be rendered differently.
I thought the naming of the PKGBUILD's $source variable is standard, isn't it?

ayekat wrote:

How do you implement this in aurweb?

from this repo search: https://github.com/lfos/aurweb/search?q … =highlight
it looks like aurweb uses highlight:
http://www.andre-simon.de/doku/highligh … hlight.php
an extensible highlighter. One way, as goeb mentioned, is to highlight Unicode characters in the domain parts of URLs by wrapping them in span elements with a different CSS color than the rest of the URL.

ayekat wrote:

And what should they do, exactly?

Punycode-encode the domain part of URLs in the $source var of the PKGBUILD in a temporary copy, before opening it with the pager.
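For a single URL, that conversion could look roughly like the sketch below. It uses Python's built-in "idna" codec (which implements the older IDNA 2003 rules) and leaves everything but the host untouched; the function name punycode_host is an assumption, and finding the URLs inside a PKGBUILD is deliberately out of scope here.

```python
from urllib.parse import urlsplit, urlunsplit


def punycode_host(url):
    """Return url with only its host converted to the ASCII (xn--) form."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None or ":" in host:
        # No host (local file) or an IPv6 literal: leave the URL alone.
        return url
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc
    return urlunsplit(parts._replace(netloc=netloc))


# ASCII hosts come back unchanged; the lookalike (Cyrillic U+0441) gains an xn-- label.
print(punycode_host("https://github.com/lfos/aurweb"))
print(punycode_host("https://spotify.\u0441om/client.tar.gz"))
```

Because ASCII hosts pass through as they are, most source URLs would look exactly the same after this step.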

Last edited by wis (2019-04-04 14:49:45)

Offline

#12 2019-04-04 14:11:41

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

If you quote, please take care to quote accurately (i.e., use the right names for each bit of quoted content).

Displaying something other than what is actually in the PKGBUILD is a horrible idea.  That rather defeats the purpose of displaying the information at all.  And again, this is trivial to do on your end.  If you use vim, really, just add a syntax rule for PKGBUILDs that highlights characters outside the ascii range.  There is no need for a plugin, and no need for wrapper scripts.

You're proposing significant changes with some downsides all to avoid adding a simple syntax rule to your editor to satisfy your own desire to highlight these characters.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#13 2019-04-04 15:14:05

wis
Member
Registered: 2019-02-15
Posts: 14

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

fixed the quote.
I'm definitely doing that. I started this discussion because I think this is a valid concern and I don't want to see another incident similar to this one: https://thehackernews.com/2018/07/arch- … lware.html
Is displaying the punycode encoding of the domain part of URLs in $source really displaying something other than what's actually in the PKGBUILD? They map 1 to 1.
Again, 99% of PKGBUILDs will look the same, because the punycode (IDNA) form of an ASCII domain is the same ASCII domain.
So you think characters in the domain part of URLs in $source that are in the set of Unicode characters and not in the set of (hostname-label-legal) ASCII characters should only be highlighted in aurweb and pagers, and that different PKGBUILD content should never be displayed?

Last edited by wis (2019-04-04 15:15:09)

Offline

#14 2019-04-04 15:48:11

ayekat
Member
Registered: 2011-01-17
Posts: 1,590

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

goeb wrote:

There's already syntax highlighting for the PKGBUILD, so highlighting any offending characters should be possible for the web interface, too.

Yes, syntax highlighting could be adapted to do that, I guess. But OP is asking for punycode encoding, which might be a bit less trivial (but still possible, I guess: one could extend the syntax highlighter to actually modify all URLs in the PKGBUILD by encoding them with punycode, but… again, I don't like the idea of not displaying a file as-is).

wis, that incident you've linked has nothing to do with a URL not being encoded with punycode; it was simply a malicious PKGBUILD.
And I believe that changing how the aurweb displays a PKGBUILD won't really have any effect (or only annoying ones, as previously mentioned), since the majority of users are probably reading PKGBUILDs in their own text editors anyway.

Concerning AUR helpers automatically modifying PKGBUILDs to encode all URLs with punycode… that would be up to each helper's authors individually, but I feel like that will introduce more issues than solve any.
For one, if you want to be rigorous, URLs would not only need to be changed in the PKGBUILD itself, but any additional files (e.g. .install files) as well. And going full find+replace on all kinds of files without exactly knowing what they contain is not a good idea IMHO. What happens with the file checksums, for instance?


pkgshackscfgblag

Offline

#15 2019-04-04 18:39:39

progandy
Member
Registered: 2012-05-17
Posts: 5,199

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

ayekat wrote:

wis, that incident you've linked has nothing to do with a URL not being encoded with punycode; it was simply a malicious PKGBUILD.
And I believe that changing how the aurweb displays a PKGBUILD won't really have any effect (or only annoying ones, as previously mentioned), since the majority of users are probably reading PKGBUILDs in their own text editors anyway.

Who would even trouble themselves with registering a look-alike IDN? I believe just seeing a github repository or gist looks like a trustworthy enough URL to many users.

Better to use a DLAGENT that can whitelist URL patterns and ask for permission when it encounters unknown domains, or maybe just IDN domains. It would also be a good idea for AUR helpers to notify you if the maintainer changes, similar to aurto.
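The decision such a download agent would have to make can be sketched in a few lines of Python. Everything named here (ALLOWED_HOSTS, is_idn, check_source, and the hosts in the set) is a hypothetical illustration; hooking it into makepkg's DLAGENTS and prompting the user are left out.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real one would be user-maintained.
ALLOWED_HOSTS = {"github.com", "example.org"}


def is_idn(host):
    """True if the host has non-ASCII characters or an already-encoded xn-- label."""
    if any(ord(ch) > 127 for ch in host):
        return True
    return any(label.startswith("xn--") for label in host.split("."))


def check_source(url, allowed=ALLOWED_HOSTS):
    """Classify a source URL as 'local', 'allowed', 'idn' or 'unknown'."""
    host = urlsplit(url).hostname
    if host is None:
        return "local"
    if host in allowed:
        return "allowed"
    if is_idn(host):
        return "idn"
    return "unknown"


for url in ("https://github.com/a/b.git",
            "https://spotify.\u0441om/x.tar.gz",
            "https://downloads.example.net/x.tar.gz"):
    print(check_source(url), url)
```

A wrapper could then download "allowed" sources silently, ask about "unknown" ones, and refuse or warn loudly on "idn".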

Last edited by progandy (2019-04-04 18:44:06)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

Offline

#16 2019-04-04 21:05:21

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

wis wrote:

... so you think characters in the domain part of URLs in $source that are in the set of Unicode characters and not in the set of (hostname-label-legal) ASCII characters should only be highlighted in aurweb and pagers, and that different PKGBUILD content should never be displayed?

No.  I don't think either of these.  But if you wanted to have your editor/pager highlight them, I'd be happy to help you scratch that personal itch.

As noted punycode encoding would not prevent any of the issues in that article.  I'd certainly prefer there wasn't malicious code in the AUR, but can you actually find any examples that would be avoided by the changes you are proposing?

Propose changes that will increase security, sure, but don't propose changes with the sole intent of improving security if they are - in fact - completely ineffective in doing so.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#17 2019-04-04 21:51:33

wis
Member
Registered: 2019-02-15
Posts: 14

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

forget punycode, I concede that Unicode characters highlighting in IDNs is a better idea.

Trilby wrote:

but can you actually find any examples that would be avoided by the changes you are proposing?

I gave an example in comment #3. and that question is answerable in practice if I had a dataset of all the PKGBUILDs of the AUR, to try finding malicious/deceptive IDNs, or I'd need to scrape and download the PKGBUILDs?

Trilby wrote:

Propose changes that will increase security, sure, but don't propose changes with the sole intent of improving security if they are - in fact - completely ineffective in doing so.

can you explain how so and why is it "completely ineffective" based on the example I gave?

Offline

#18 2019-04-07 13:46:19

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

wis wrote:

I gave an example in comment #3.

No, you didn't.  None of the cases provided in that link had any deceptive IDNs.  The steps suggested here would not have avoided *any* of those.

EDIT: oops, I looked at post #13.  You did not provide an example in post #3, you posed a hypothetical.  Where is/was the package that had this?  Even your hypothetical is faulty as you did not even give a domain name with a non-ascii character, you just asked us to imagine one.  If you're interested in solving imaginary problems that we can't even point to a single real example of, then I've got some 100% effective unicorn repellant to sell you.

wis wrote:

... and that question is answerable in practice if I had a dataset of all the PKGBUILDs of the AUR, to try finding malicious/deceptive IDNs, or I'd need to scrape and download the PKGBUILDs?

Well, I'd think you should be able to find at least one example.  There are 52437 packages in the AUR, have you not even yet seen a single example of the type of problem you aim to fix?  If you've not even seen a single example, why do you think it is a problem worthy of a solution?

Ideally an assessment might include some numbers of how common such a problem is, or how popular those packages (or the ones they intend to masquerade as) are and how many users might be affected.  But really, if you can't even identify a single case where this has been done, then it's completely a moot point.

wis wrote:

can you explain how so and why is it "completely ineffective" based on the example I gave?

This has been done several times in this thread: the only examples you provided did not have non-ASCII characters in their domain names in order to masquerade as a different, familiar domain.  None of them even had non-ASCII characters.  So nothing in those malicious packages would have been highlighted as non-ASCII in the aurweb interface.  Your proposed change would not have prevented a single one of these malicious packages from causing trouble.  Zero effect = completely ineffective.

Last edited by Trilby (2019-04-07 13:51:22)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#19 2019-04-07 16:00:31

loqs
Member
Registered: 2014-03-06
Posts: 17,378

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

I think wis is proposing an example of

spotify.сom
spotify.\xd1\x81om
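The second line is simply the UTF-8 byte view of the first. A small Python sketch (the helper name escape_non_ascii is made up for this post) reproduces that escaped form:

```python
def escape_non_ascii(text):
    """Show each non-ASCII character as the \\xNN escapes of its UTF-8 bytes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"\\x{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


# U+0441 (CYRILLIC SMALL LETTER ES) stands in for the Latin "c".
lookalike = "spotify.\u0441om"
print(escape_non_ascii(lookalike))  # spotify.\xd1\x81om
print(lookalike == "spotify.com")   # False
```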

I do not believe there is such a top level domain.
My understanding is that the data used by aurweb comes from .SRCINFO, which is generated by the uploader of the PKGBUILD.
Bash provides numerous ways to change the source array content when the PKGBUILD is executed.

Offline

#20 2019-04-07 16:38:48

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

I know what he is proposing - but it's completely fictional at this point.  Is there a single example of this actually being used as a strategy to distribute malicious code such that the proposed changes could have any beneficial effect?

And if someone is going to the lengths of making a look-alike domain to distribute malicious code and they make a PKGBUILD to facilitate further distribution of it, they could also just manually modify the .SRCINFO to render even this proposed strategy of catching such mythical attackers impotent.

We have a purely hypothetical attack for which not a single real case has been produced and a proposed solution that doesn't even protect against this hypothetical attack.

Last edited by Trilby (2019-04-07 16:41:28)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#21 2019-04-08 08:18:00

schard
Forum Moderator
From: Hannover
Registered: 2016-05-06
Posts: 1,989
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Having a security vulnerability that may not currently be exploited does not make it less worth fixing, imho.
There are plenty of examples, such as https://www.аррӏе.com/, that show that certain attacks are quite possible.
On the other hand, PKGBUILDs are just text files that contain configuration and functions, representing packaging instructions for makepkg.
I personally do not think that restricting the URLs is a feasible solution.
Maybe this is a good place for a tool that can be invoked to check PKGBUILDs for suspicious content, such as URLs that require punycode encoding:

>>> def suspicious(url):
...     return url.encode('punycode').decode() != url + '-'
... 
>>> suspicious('https://apple.com/')
False
>>> suspicious('https://www.аррӏе.com/')
True
>>> 

macro_rules! yolo { { $($tokens:tt)* } => { unsafe { $($tokens)* } }; }

Offline

#22 2019-04-08 11:57:23

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,928

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Interesting link, schard.
Only after setting network.IDN_show_punycode to true in Firefox can I see the real URL in your post.

I do see the potential for abuse.
There aren't that many fields in a PKGBUILD that accept URIs; maybe --printsrcinfo could convert all URIs to punycode?


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)

Offline

#23 2019-04-08 12:25:12

progandy
Member
Registered: 2012-05-17
Posts: 5,199

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Lone_Wolf wrote:

I do see the potential for abuse.
There aren't that many fields in a PKGBUILD that accept uris, maybe --printsrcinfo could convert all URIs to punycode ?

That won't stop those who handcraft the .SRCINFO and only use IDN URLs in the PKGBUILD. Maybe a two-stage approach would be best. First highlight IDN URLs in the aurweb interface (doesn't help if .SRCINFO and PKGBUILD differ), and then block or warn about IDN when makepkg tries to download the files. This could even be done with a curl wrapper for DLAGENTS like this (requires libidn2):

#!/usr/bin/bash
# /usr/local/lib/curl-idnblock.sh
# 
# depends on libidn2, sed, bash
#
# Requires the URL as the last parameter (%u in makepkg.conf), e.g. :
# /usr/bin/curl -gqb "" -fLC - --retry 3 --retry-delay 3 -o %o %u
# convert to 
# /usr/local/lib/curl-idnblock.sh -gqb "" -fLC - --retry 3 --retry-delay 3 -o %o %u

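# Extract the host from the URL passed as the last argument.
domain="$(sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/" <<<"${!#}")"
# idn2 -l (lookup) prints the ASCII form, which differs from the input for IDN hosts.
encoded="$(idn2 -l "$domain")"
# Quote the right-hand side so it is compared literally, not as a glob pattern.
if [[ "$encoded" != "$domain" ]]; then
	# a whitelist would be better here, maybe with interactive entry?
	echo "Blocked IDN domain: $encoded ($domain)" >&2
	exit 1
fi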
exec /usr/bin/curl "$@"

Last edited by progandy (2019-04-08 12:43:02)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

Offline

#24 2019-04-08 12:49:39

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,534
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

schard wrote:

Having a security vulnerability that may not currently be exploited does not make it less worth fixing, imho.

No, but I was looking for a specific example so we could demonstrate that the suggested changes would not actually protect against it.

Let me try again: I'm looking for an example, even one, just one example for which the proposed changes would actually provide any protection.

Don't buy fancy locks for the skylights in your house that are too small for all but the smallest humans on the planet to fit through when your front door doesn't even have a latch.  The fact that you could dream up a scenario where a tiny contortionist might be able to crawl in through the skylight does not make putting a lock on it any less ridiculous: it still offers zero protection to your home.

And to go back to it once again, in contrast to highlighting anything on the aurweb interface, syntax highlighting non-ascii characters in your own text editor that you use to preview a PKGBUILD would actually protect against this not-yet-ever-observed attack.  And setting your text editor up to do that highlighting would take less time than it would to read a single post in this thread.

Last edited by Trilby (2019-04-08 12:57:53)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#25 2019-04-08 21:45:21

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,396
Website

Re: should Unicode chars in IDN of PKGBUILD's source URLs be highlighted?

Trilby wrote:
schard wrote:

Having a security vulnerability that may not currently be exploited does not make it less worth fixing, imho.

No, but I was looking for a specific example so we could demonstrate that the suggested changes would not actually protect against it.

schard wrote:

There are plenty of examples, such as https://www.аррӏе.com/ that show, that certain attacks are quite possible.

Offline
