When you create your site, you will most likely want it to appear on Google and other search engines. Fortunately, search engines run their own bots, or “crawlers,” that scour the internet to index sites. You can control how these bots interact with your site through a text file named robots.txt. This file contains rules that the bots visiting your site are asked to follow. That is generally a good thing, but it can hurt your search ranking if the file is set up incorrectly.
What Does A Robots.txt File Look Like?
Here are a few examples of what this file could look like for you.
Allow All Bots Full Access
User-agent: *
Disallow:
Block All Access For Bots
User-agent: *
Disallow: /
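One way to confirm how the two templates above behave is to load them into a parser. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `is_allowed`, the bot name `ExampleBot`, and the `example.com` URL are illustrative assumptions, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# The two templates shown above, written out as file contents.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/page.html"  # hypothetical URL
    print("allow-all template:", is_allowed(ALLOW_ALL, "ExampleBot", url))
    print("block-all template:", is_allowed(BLOCK_ALL, "ExampleBot", url))
```

An empty `Disallow:` value blocks nothing, while `Disallow: /` matches every path on the site, which is why a single slash is the whole difference between the two files.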
Set Crawl Delay to 10 Seconds
User-agent: *
Crawl-delay: 10
Setting a crawl delay of 10 seconds asks search engines to wait 10 seconds between requests, which limits how quickly they can index your website. Not every crawler honors this directive; Googlebot, for example, ignores it. If you have a VPS with a limited amount of resources and your pages are not optimized to handle a sudden spike of web traffic, you may want to consider adding a crawl-delay and upgrading your Cloud VPS Server to have more resources.
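To see how a well-behaved crawler might read this setting, here is a small sketch that uses Python's standard `urllib.robotparser` module. The helper name `crawl_delay_for` and the bot name `ExampleBot` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The crawl-delay template from above, written out as file contents.
ROBOTS_TXT = "User-agent: *\nCrawl-delay: 10\n"


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) for user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    delay = crawl_delay_for(ROBOTS_TXT, "ExampleBot")  # hypothetical bot name
    print("Seconds to wait between requests:", delay)
```

A crawler that respects the directive would pause for that many seconds between requests. A file with no `Crawl-delay` line yields `None`, leaving the pace up to the crawler.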
Those are just a few very general templates for the robots.txt file. You can also use it to block access to a single folder, or even a single file. Each entry in the robots.txt file begins with a User-agent field followed by a value. An asterisk is a wildcard, meaning the entry applies to all bots. To target a specific bot, put that bot's name in this field instead. Any lines after that apply to that user agent until a new one is specified. For example, the file below blocks the /tmp folder for Google, while all other bots are allowed to access every folder (including /tmp).
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow:
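The per-bot behavior of the file above can be checked with a short sketch based on Python's standard `urllib.robotparser` module. The helper name `is_allowed`, the `example.com` URLs, and the choice of `Bingbot` as a stand-in for "all other bots" are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example file from above: one group for Googlebot, one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    tmp_url = "https://example.com/tmp/file.html"  # hypothetical URL
    for bot in ("Googlebot", "Bingbot"):
        print(bot, "may fetch /tmp/file.html:", is_allowed(bot, tmp_url))
```

The Googlebot group only restricts /tmp/, so Googlebot can still reach the rest of the site, and any bot without its own group falls back to the wildcard entry.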
As you can see, this can be used to keep bots from indexing private files or unnecessary directories. You do not need to memorize every bot that exists on the internet; various resources are available that list them.
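As a rough illustration of blocking a single file and a single directory, the sketch below generates a robots.txt from a small mapping and then checks it with Python's standard `urllib.robotparser` module. The function `build_robots_txt`, the paths `/private/report.pdf` and `/cache/`, and the bot name `ExampleBot` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    groups = []
    for user_agent, paths in rules.items():
        lines = [f"User-agent: {user_agent}"]
        # An empty Disallow line means nothing is blocked for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    # Separate groups with a blank line, as in the examples above.
    return "\n\n".join(groups) + "\n"


# Hypothetical paths: one private file and one unnecessary directory.
ROBOTS_TXT = build_robots_txt({"*": ["/private/report.pdf", "/cache/"]})

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    print(ROBOTS_TXT)
    for path in ("/private/report.pdf", "/private/other.pdf", "/cache/a.html"):
        allowed = parser.can_fetch("ExampleBot", "https://example.com" + path)
        print(path, "allowed:", allowed)
```

Rules match by path prefix, so blocking `/private/report.pdf` leaves the other files in `/private/` reachable, while the trailing slash on `/cache/` covers everything inside that directory.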
There are many things that the robots.txt file can do to influence how crawlers treat your site, but preventing access is the most common. Another option is crawl-delay, which makes a bot wait between requests as it crawls the pages on the site. More about what can be done with the robots.txt file can be found on this organization's site here. They even have a complete list of the bots that are known to them available here.
- What Is The Maximum CPU Load For Your Cloud/Dedicated Servers?
- My Server is Pinging Fine, but my Website is not Loading
- How to Upgrade or Customize Your Server
- More details about using a robots.txt file
- Google Search Console
- Bing Bot Blog Post
- Bing webmaster tools – Help with Yahoo web crawler (Slurp) and Bing’s web crawler
- Yandex Support Guide for robots.txt
If you have any questions or would like assistance, feel free to contact us through Live Chat, by phone, or by submitting a ticket with our Technical Support team.