Preventing Crawlers

AbbieRose

New member
Someone told me that it is possible to stop spiders from crawling certain pages that you don't want them to index. I can't remember the name of the file, so I can't look up how to create one. Can someone assist?
 
You may be thinking of the .htaccess file. I just did a quick Google search on the string

block crawlers htaccess

and the first listed result was titled "How to block user agents"
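
For example, a minimal .htaccess sketch along those lines (the bot names are placeholders; substitute the user agents you actually want to block):

RewriteEngine On
# Match any of the listed user-agent substrings, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule .* - [F,L]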
 
You can use robots.txt to tell spiders what they should and shouldn't crawl. However, it doesn't actually forbid access: search engine spiders like Googlebot will obey those rules, but less scrupulous crawlers like spambots may not bother. If you want extra protection, you can add .htaccess rules that block those user agents at the web server level.
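
As a sketch, a robots.txt file in your site root could look like this (the directory name is just an example):

# Applies to all crawlers that honor robots.txt
User-agent: *
# Ask them not to crawl anything under this path
Disallow: /work-in-progress/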
 
Thank you. My main concern is duplicate pages: work-in-progress copies that I leave on the server but don't link to anything. I don't want them to be found, since the search engines might discount the real pages if they come across these unfinished, unlinked duplicates.

I'll look into both methods, thanks.
 
Hi AbbieRose,
You can simply use a meta tag to tell robots not to index a page's contents or follow its links.
Add the following meta tag to the <head> section of your page.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

This won't stop malware robots or email address harvesters, though.

All the best.
 