
#1 2025-12-24 22:26:47

system72
Member
Registered: 2025-11-22
Posts: 338
Website

[solved]scraping

Hi, I'm getting scraped by several bots that don't know the concept of boundaries; they are even targeting SSH through multiple, albeit laughable, attempts.. Now, for everything other than SSH, should I set up Anubis? I am quite hesitant about this though.. just looking for advice.

This is even funnier:

[screenshot: SSH auth log showing repeated failed log-in attempts for a user named "minecraft"]

Last edited by system72 (Yesterday 22:43:36)

Offline

#2 Yesterday 01:01:43

Succulent of your garden
Member
From: Majestic kingdom of pot plants
Registered: 2024-02-29
Posts: 1,157

Re: [solved]scraping

That sounds more like a thing of the season, which is the holidays plus the new year. As far as I know, attacks and hacking attempts usually increase in this period of the year, because IT folks want to be with family or do something else and be away from the job for a while. In this case it seems the guy is trying to guess whether you have a Minecraft server, I guess tongue

To be honest, if you are not a well-known person, a quick solution is probably to just turn off your server (if I'm not wrong you are self-hosting, right?) and wait until next year. That's the lazy sloth approach.

But if you want to step things up, well, maybe yeah. I mean, it's a good idea to have the Anubis thing, because it forces the client to do a proof of work before it gets through, so a long list of scrapers are not going to pass it (there is a toy sketch of the proof-of-work idea after the list below). There is always the chance that a very good scraper gets through, but I don't think that's your case, because I believe there are two types of scrapers:

1) The ones that are scraping to collect prices of stuff across many stores, or that need a lot of data to build a dataset, or something similar.
2) The ones that want to hack you in some way.

The Minecraft thing suggests the second one, in my humble opinion.
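
To illustrate the proof-of-work idea mentioned above (a toy sketch only, not Anubis's actual algorithm or parameters): the client has to burn CPU searching for a nonce, while the server verifies the answer with a single hash.

    import hashlib
    import itertools

    def solve(challenge: str, difficulty_bits: int = 20) -> int:
        # find a nonce so that sha256(challenge + nonce) has `difficulty_bits`
        # leading zero bits: expensive to find, one hash to verify
        target = 1 << (256 - difficulty_bits)
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    def verify(challenge: str, nonce: int, difficulty_bits: int = 20) -> bool:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

    nonce = solve("example-challenge")
    print(nonce, verify("example-challenge", nonce))

A cheap browser does this once per visit and barely notices; a scraper hitting thousands of pages pays for every request.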

You are using a reverse proxy, right? As far as I know, reverse proxies like Cloudflare can be tweaked to mitigate DDoS attacks. If that is more your situation, then it's probably better to make some adjustments on the reverse proxy / load balancer. Does that IP address appear many times? Or do you have many folks trying to get into the "minecraft" account at the same time? Then add fail2ban (https://en.wikipedia.org/wiki/Fail2ban , https://wiki.archlinux.org/title/Fail2ban) if you don't have it [but something tells me that you already have it] and allow something like 3 or 5 attempts.
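
Something like this in /etc/fail2ban/jail.local would be a minimal sketch ([sshd] is the jail fail2ban already ships; the retry count and times are just examples, tune them to your setup):

    [sshd]
    enabled  = true
    # ban after 3 failures seen within 10 minutes
    maxretry = 3
    findtime = 10m
    # keep bans short, botnet addresses get recycled quickly anyway
    bantime  = 15m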

But if you are not dealing with DDoS, then maybe you should ask yourself how vulnerable your webpage is. Long story short: if it's just a static website, there is probably not much for the bad hacker to do lol, but if you have databases, or, oh no, React gets hacked again, then it's probably a good idea to harden things over there.

I guess you can do many things with the reverse proxy plus fail2ban to avoid needing Anubis. Also, many pages have the robots.txt thing. Many pages tell you what you are allowed to scrape, and you can see that with something like blablalba.com/robots.txt . Maybe you can add one if you don't have it, but I think very few bots are going to respect it, to be honest. But if you just want to make things harder for scrapers, put Anubis in front.
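
For example, a minimal robots.txt served at your domain root could look like this (the bot name is just one well-known crawler picked as an example; only cooperative bots will honour it):

    # served at https://yourdomain/robots.txt
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Disallow: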


str( @soyg ) == str( @potplant ) btw!

Offline

#3 Yesterday 01:42:07

system72
Member
Registered: 2025-11-22
Posts: 338
Website

Re: [solved]scraping

robots.txt is a suggestion; if the bot doesn't want to follow it, it doesn't have to.

No, I am not using a reverse proxy and I do not use Cloudflare; I am just using nginx. Technically Cloudflare handles DNS, but that's my registrar's doing.

No, I do not have a Minecraft server, I just like Minecraft.. it's literally Minecraft, oh my god.. no.. I just thought it was funny they were trying to log into a user named "minecraft".

I have password authentication disabled and use a key for SSH; I do not use fail2ban because the wiki article says it's redundant in my case.

No, my server is not that vulnerable; well, that depends on my services really.. or a 0-day.

If I use Anubis I'm pretty sure it stops your site from being indexed by search engines, so it's really my last resort. I might be wrong though.

Offline

#4 Yesterday 06:34:14

Everything2067
Member
Registered: 2025-06-29
Posts: 24

Re: [solved]scraping

Web scrapers do not try to log in to the server. That looks like someone who hosts their own Minecraft server trying to log in to their server, or someone who thinks this is a Minecraft server and is trying to get access to it.
Related: https://bbs.archlinux.org/viewtopic.php?id=12192
https://wiki.archlinux.org/title/OpenSSH#Protection

Offline

#5 Yesterday 11:29:58

mpan
Member
Registered: 2012-08-01
Posts: 1,536
Website

Re: [solved]scraping

system72: this is not scraping, and nothing to be worried about either. Normal internet weather. Make sure you follow basic security guidelines and you’re fine.

Turn off password log-ins completely and only allow SSH key authentication. This completely thwarts this kind of attack and additionally binds the actors' resources in a futile task. If for any reason you need to allow password log-ins, only allow strong ones. Strong means generated by a random source offering a high enough entropy level: something like 20 random alphanumeric characters (if stored in a password manager) or a 4–5 word Diceware passphrase (if stored in human memory). But preferably just don't allow password log-ins.
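
For reference, on the server side that boils down to a few lines in /etc/ssh/sshd_config (option names as in current OpenSSH; reload sshd after editing):

    PasswordAuthentication no
    KbdInteractiveAuthentication no
    PubkeyAuthentication yes

As a rough entropy check: 20 random alphanumeric characters give about 20 × log2(62) ≈ 119 bits, and a 5-word Diceware passphrase about 5 × log2(7776) ≈ 65 bits.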

This is not scraping. It has nothing to do with Minecraft. It is unrelated to X-mas or any other time of year. This is not DDoS or any other DoS. Most notably this is SSH, not HTTP, so HTTP-related techniques are meaningless. It's just plain old credential stuffing.

If the log noise bothers you, using a non-standard port number and fail2ban is the way to limit it. But you do not get any additional security from it, just less information in the logs. Make sure the bans are short: the addresses are recycled quickly, usually already belong to innocent victims of other attacks, and may be given to somebody else a few hours later. Including you, coincidentally, if you depend on any kind of dynamic IP address.

Also… why post a piece of text as an image? ;_;


Paperclips in avatars? | Sometimes I seem a bit harsh — don’t get offended too easily!

Offline

#6 Yesterday 13:19:17

system72
Member
Registered: 2025-11-22
Posts: 338
Website

Re: [solved]scraping

Buddy.. I told everyone in the original post that it was unrelated to SSH.. please everyone, re-read: they are scraping every commit in my cgit instance, I know from the logs.

Don't reply if you cannot understand what I am saying. I literally already said I use passwordless SSH; I posted a picture of the SSH log-in because I thought it was funny and laughable.

I already said I do not use fail2ban because it's redundant in my case.

Last edited by system72 (Yesterday 13:21:18)

Offline

#7 Yesterday 13:43:39

Succulent of your garden
Member
From: Majestic kingdom of pot plants
Registered: 2024-02-29
Posts: 1,157

Re: [solved]scraping

system72 wrote:

No, I am not using a reverse proxy and I do not use Cloudflare; I am just using nginx. Technically Cloudflare handles DNS, but that's my registrar's doing.

But then how do you handle DNS queries from the outside world? Do you have a static IP address?
If you implement a reverse proxy / load balancer you can do a lot of things. It doesn't need to be Cloudflare, but having something like that in front of the web server can help you. I know that nginx can do that, but having it on a separate machine in front is in some cases a good option; not sure if that's your case, to be honest.
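
As a minimal sketch of what I mean, with made-up names and ports (the certificate paths and backend address are whatever your setup uses):

    server {
        listen 443 ssl;
        server_name example.com;                       # placeholder domain
        ssl_certificate     /etc/ssl/example.com.pem;  # placeholder paths
        ssl_certificate_key /etc/ssl/example.com.key;

        location / {
            # nginx terminates TLS and forwards to the app behind it
            proxy_pass http://127.0.0.1:8080;          # example backend port
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }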

system72 wrote:

(…) I just thought it was funny they were trying to log into a user named "minecraft".

Well yeah, me too. I guess sometimes people are lazy in naming accounts and passwords and get pwned tongue

system72 wrote:

If I use Anubis I'm pretty sure it stops your site from being indexed by search engines, so it's really my last resort. I might be wrong though.

I guess it depends. Maybe it can, but some pages that have Anubis on them are still being shown to me by search engines. But also remember that shodan.io exists, so with that people can try to find vulnerable devices across the whole internet; keep that in mind too.

So I think you can put up Anubis if you want to. I mean, it's open source, and anyone who visits your website is probably tech-savvy enough to know what it is and not get scared about that loading first tongue. But it is not going to solve your issue 100%. Also, the Minecraft account attempts are still going to be a thing, but if you don't care about that and want to keep the funny logs then it's up to you tongue
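
If you do go with it, as far as I understand Anubis runs as its own little HTTP proxy between nginx and the app: you point Anubis at the real backend (cgit, redlib, whatever) and point nginx at Anubis. The nginx side is then just another proxy_pass; the port below is hypothetical, check the Anubis docs for its side of the config:

    location / {
        # everything goes through Anubis first; it serves the proof-of-work
        # challenge and only forwards verified clients to the real service
        proxy_pass http://127.0.0.1:8923;              # hypothetical Anubis port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }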


str( @soyg ) == str( @potplant ) btw!

Offline

#8 Yesterday 17:53:34

system72
Member
Registered: 2025-11-22
Posts: 338
Website

Re: [solved]scraping

I have an A record pointed at my IP.

Offline

#9 Yesterday 18:07:52

mpan
Member
Registered: 2012-08-01
Posts: 1,536
Website

Re: [solved]scraping

system72 wrote:

Buddy.. I told everyone in the original post that it was unrelated to SSH.. please everyone, re-read: they are scraping every commit in my cgit instance, I know from the logs (…)

To paraphrase your own words: don't reply if you cannot understand what you yourself are saying.

Read your own question once more and see what it is asking about. Maybe you wished to write about e.g. a webserver and explain it's not SSH, but that seems to have never left your head.

My reply is to what you wrote, not to what I'd extract from your mind with my crystal ball. It's meant to be as comprehensible as possible to an arbitrary reader too, regardless of what partial, scattered information was already uttered, or whether it aligns with the OP's attempts to address the issue at hand.

Do whatever you wish with that advice. 10 minutes is the time I needed to re-read and dissect the messages in this thread to check whether it wasn't indeed a mistake on my end. Not going to waste another second on this thread.


Paperclips in avatars? | Sometimes I seem a bit harsh — don’t get offended too easily!

Offline

#10 Yesterday 19:13:03

system72
Member
Registered: 2025-11-22
Posts: 338
Website

Re: [solved]scraping

Alright, I think I'll be going with Anubis then since it's my only choice.. they are even trying to scrape my Redlib instance.. this is ruining my Christmas.

Offline

#11 Yesterday 22:42:55

system72
Member
Registered: 2025-11-22
Posts: 338
Website

Re: [solved]scraping

Alright, seems to be working: https://system72.dev/store/61PJBT-w.txt . Just need to set it up for cgit now; marking as solved.

Offline

#12 Yesterday 23:55:57

Succulent of your garden
Member
From: Majestic kingdom of pot plants
Registered: 2024-02-29
Posts: 1,157

Re: [solved]scraping

Share your experience with us if you find that Anubis does what you want ^^


str( @soyg ) == str( @potplant ) btw!

Offline

#13 Today 00:05:34

system72
Member
Registered: 2025-11-22
Posts: 338
Website

Re: [solved]scraping

Well yes, it's doing what I want and blocking all of the crawlers spamming my Redlib instance.

Offline
