When you create your site, you will most likely want it to appear on Google and other search engines. Luckily, search engines have their own bots, or "crawlers," that scour the internet to index sites. You can control how these bots interact with your site through a text file named robots.txt. This file contains rules that the bots visiting your site are asked to follow. Set up correctly, it works in your favor, but a misconfigured file can hurt your search ranking.
What Does A Robots.txt File Look Like?
Here are a few examples of what this file could look like for you.
Allow All Bots Full Access
User-agent: *
Disallow:
Block All Access For Bots
User-agent: *
Disallow: /
Set Crawl Delay to 10 Seconds
User-agent: *
Crawl-delay: 10
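By setting a crawl delay of 10 seconds, you ask crawlers to wait 10 seconds between requests, which limits how quickly search engines can crawl and index your website. Keep in mind that not every crawler honors this directive; Googlebot, for example, ignores it. If you have a VPS with a limited amount of resources and your pages are not optimized to handle a sudden spike in web traffic, you may want to consider adding a crawl delay, upgrading your Cloud VPS Server to have more resources, or both.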
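If you want to confirm how a parser reads the crawl-delay rule, one option is Python's standard urllib.robotparser module. The following is only a minimal sketch: the helper name read_crawl_delay and the bot name ExampleBot are made up for illustration, and real crawlers may interpret the directive differently or ignore it.

```python
from urllib.robotparser import RobotFileParser

# The crawl-delay template, written with one directive per line.
CRAWL_DELAY_RULES = """\
User-agent: *
Crawl-delay: 10
"""


def read_crawl_delay(rules: str, agent: str):
    """Return the crawl delay (in seconds) that applies to `agent`,
    or None if the rules do not set one.

    `read_crawl_delay` is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(rules.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    # "ExampleBot" is a made-up bot name; the wildcard group applies to it.
    print(read_crawl_delay(CRAWL_DELAY_RULES, "ExampleBot"))
```

Because the rules use the wildcard user agent, any bot name passed in should get the same delay back.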
Those are just a few very general templates for the robots.txt file. You can also use it to block access to a single folder or even a single file. Each group of rules in the robots.txt file begins with a User-agent field that names the bot the rules apply to. An asterisk is a wildcard, meaning the rules apply to all bots. If you want to target a specific bot, put its name in this field instead. Any lines after that apply to that user agent until a new User-agent line is specified. For example, in the file below, Googlebot is blocked from the /tmp/ folder on the site, but all other bots are allowed into every folder (including the /tmp/ folder).
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow:
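To see how per-bot rules like these are evaluated, you can feed them to Python's standard urllib.robotparser module. This is a rough sketch rather than a definitive test: the helper name is_allowed, the bot name OtherBot, and the paths /tmp/cache.html and /index.html are assumptions chosen for illustration, and an actual search engine's parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The per-bot example, written with one directive per line.
PER_BOT_RULES = """\
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow:
"""


def is_allowed(agent: str, path: str, rules: str = PER_BOT_RULES) -> bool:
    """Return True if `agent` may fetch `path` under `rules`.

    `is_allowed` is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Paths and the "OtherBot" name are made up for this example.
    for agent in ("Googlebot", "OtherBot"):
        for path in ("/tmp/cache.html", "/index.html"):
            print(agent, path, is_allowed(agent, path))
```

Under these rules, only the Googlebot request for a path inside /tmp/ should come back as not allowed; every other combination falls through to the wildcard group, whose empty Disallow line permits everything.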
As you can see, you can use this to keep bots from crawling private files or unnecessary directories. Memorizing every bot that exists on the internet is impractical, but various resources are available to look them up.
There are many things that the robots.txt file can do to influence the crawlers visiting your site, but preventing access is the most common. Another is the Crawl-delay option, which asks a bot to wait a set number of seconds between requests for pages on the site.
If you should have any questions or would like assistance, please contact us through Live Chat or by submitting a ticket with our Technical Support team.