Running Apache, question about spiders and bots

jskier · 2003-09-16 00:00:02

I'm running Apache and have set up a PHP traffic analyzer. I can see that bots and spiders are requesting pages and directories that are not linked from any page. How on earth do they figure out my directory structure? And how do I stop it? It makes me uneasy (I tried meta tags, but those only stop them from posting the content). Any help would be appreciated. Thanks in advance,

jskier

andy · 2003-09-16 07:37:10

Some robots (or spiders) simply guess typical names. But I think all the well-behaved ones adhere to robots.txt; see
http://www.robotstxt.org/wc/robots.html (or simply plug robots.txt into Google, you'll find a lot).
Or was that what you meant by meta tags?
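
In case it helps, here is a rough sketch of how a well-behaved crawler would read a robots.txt, using Python's standard urllib.robotparser module. The directory names and the "ExampleBot" user agent are made up for illustration; they are not from your site. Keep in mind that robots.txt is only a request: a rude bot can ignore it, and the file itself is publicly readable, so it also tells anyone who looks which directories you would rather keep quiet.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are assumptions
# chosen only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /stats/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a compliant crawler named `agent` may fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/notes.html", "/stats/"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

Only crawlers that choose to perform a check like this will stay out of the listed directories.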

The next thing you can do is look into the options listed at:
http://httpd.apache.org/docs/howto/auth.html
especially "access control". In case you only skim the page ;-), here is an important snippet:

These directives may be placed in a .htaccess file in the particular directory being protected, or may go in the main server configuration file, in a <Directory> section, or other scope container.

Other than that: not linking to pages, or not showing information about them, does not protect you in any way, and the web is not designed to work that way.

jskier · 2003-09-17 03:02:56

Thanks, the link was useful. I still don't understand how these bots are getting into the less common folder names. Oh well, at least the folders are secure now.
This sort of thing never happened in IIS, but at least I don't have to worry about patching every other day.

jskier

Arch Linux
