Monday, August 22, 2005

Robots.txt -- Why This File Is Important

Search engine crawlers can skew web site statistics by inflating the number of hits your site appears to receive. A trained eye, and the organizations that rank sites, can tell spider or bot hits from actual visitor hits. In most cases a ranking organization will not only disregard these hits, but also penalize the site in its rankings.
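As a rough illustration of how spider hits can be told apart from visitor hits, here is a minimal Python sketch that sorts hits by their User-Agent string. The marker list and the sample User-Agent strings are assumptions chosen for illustration, not a complete or authoritative list, and real statistics packages are likely to use more thorough methods.

```python
# Hypothetical substrings that often appear in crawler User-Agent strings.
# This list is an assumption for illustration; it is far from complete.
BOT_MARKERS = ("bot", "spider", "crawler", "slurp")


def is_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_hits(user_agents):
    """Split a list of User-Agent strings into crawler and visitor counts."""
    bots = sum(1 for ua in user_agents if is_crawler(ua))
    return {"crawler": bots, "visitor": len(user_agents) - bots}


# Sample User-Agent strings, one per hit (illustrative values only).
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
]

print(count_hits(sample))
```

A check like this only catches crawlers that identify themselves honestly, which is one reason it complements a robots.txt file and does not replace it.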

The robots.txt file helps keep spiders from inflating your site's hit counters. It is placed in the root directory of the domain or sub-domain it applies to, because that is where crawlers look for it. An example of the contents of this file [courtesy SitePro News] is shown on the left. The SitePro News article is worth a read: "Search Engine Spiders Lost Without Guidance".

The article also mentions a site that describes how robots.txt files are used and written: Robotstxt.org

The robots.txt file in the blog.qisoftware.com sub-domain is here: robots.txt file for blog sub-domain. I maintain a robots.txt file at this level to ensure this blog's statistics remain accurate. I also maintain robots.txt files on all of the other qisoftware.com sub-domains. There is also a robots.txt file in the root directory of my domain [qisoftware.com], which disallows crawlers and spiders from accessing specific data directories I maintain within that directory structure.
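To make the idea of disallowing directories concrete, here is a minimal Python sketch that feeds a small robots.txt to the standard library's urllib.robotparser and asks which paths a well-behaved crawler may fetch. The directory names /data/ and /private/ and the crawler name ExampleBot are hypothetical; they are not taken from the qisoftware.com files.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every crawler is asked to stay out of two
# directories. The directory names are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
Disallow: /private/
"""

# Parse the rules from the text above instead of fetching them from a site.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the rules let the given crawler fetch the path."""
    return parser.can_fetch(agent, path)


print(allowed("/index.html"))       # outside the disallowed directories
print(allowed("/data/report.csv"))  # inside a disallowed directory
```

The rules are only a request: crawlers that honor robots.txt check them before fetching a page, while badly behaved ones can ignore the file altogether.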

There are other precautions you can take to keep your site from being inundated with spiders and crawlers [yes, I use other means in addition to those outlined here]; however, the robots.txt file is very important.
