Preventing Crawlers

AbbieRose

New member
Someone told me that it is possible to stop spiders from crawling certain pages that you don't want them to index. I can't remember the name of the file, so I can't look up how to create one. Can someone assist?
 
You may be thinking of the .htaccess file. I just did a quick Google search on the string

block crawlers htaccess

and the first listed result was titled "How to block user agents"
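
For example, a minimal .htaccess sketch along those lines (the bot names are placeholders; substitute the user agents you actually want to block):

RewriteEngine On
# Match any of the listed user-agent substrings, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule .* - [F,L]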
 
You can use robots.txt to tell spiders what they should and shouldn't crawl. However, it doesn't actually forbid access: search engine spiders like Googlebot will obey those rules, but less scrupulous crawlers like spambots may not bother. If you want extra protection, you can add .htaccess rules that block those user agents at the web server level.
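
As a sketch, a robots.txt file in your site root could look like this (the directory name is just an example):

# Applies to all crawlers that honor robots.txt
User-agent: *
# Ask them not to crawl anything under this path
Disallow: /work-in-progress/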
 
Thank you. My main concern is duplicate pages: work-in-progress copies that I leave on the server but don't link to anything. I don't want them to be found, since the search engines might discount the real pages if they come across these unfinished, unlinked duplicates.

I'll look into both methods, thanks.
 
Hi AbbieRose,
You can simply use a meta tag to tell robots not to index a page's contents or follow its links.
Add the following meta tag to the <head> section of your page.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

This won't stop malware robots or email address harvesters, though.

All the best.
 