What Is The Robots.txt File?

When you create your site, you will most likely want it to appear on Google and other search engines. These search engines run their own bots, or “crawlers”, that scour the internet to index sites. You can control how these bots interact with your site through a text file called robots.txt. This file contains rules that the bots visiting your site are asked to follow. That is generally a good thing, but it can hurt your search ranking if the file is set up incorrectly.

What Does A Robots.txt File Look Like?

Here are a few examples of what this file could look like for you.

Allow All Bots Full Access

User-agent: *
Disallow:

Block All Access For Bots

User-agent: *
Disallow: /

Set Crawl Delay to 10 Seconds

User-agent: *
Crawl-delay: 10

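Setting a crawl delay of 10 seconds asks crawlers that honor the directive to wait about 10 seconds between requests, which limits how quickly they can index your website. If you have a VPS with a limited amount of resources and your pages are not optimized to handle a sudden spike in web traffic, you may want to consider both upgrading your Cloud VPS Server to have more resources and adding a crawl-delay. Crawl-delay is not part of the original robots.txt standard, and some crawlers, including Googlebot, ignore it.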
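To see how a crawler might read this setting, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `read_crawl_delay` and the bot name `ExampleBot` are made up for illustration; a real crawler decides for itself whether to honor the value.

```python
from urllib.robotparser import RobotFileParser

# The crawl-delay example from above, embedded as a string.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def read_crawl_delay(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical crawler name; the wildcard entry covers it.
    print(read_crawl_delay(ROBOTS_TXT, "ExampleBot"))
```

A polite crawler would sleep for the returned number of seconds between requests, and fall back to its own default when the function returns `None`.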

Those are just a few very general templates for the robots.txt file. You can also use it to block access to a single folder, or even just a single file. Each entry within the robots.txt file starts with a User-agent field followed by a value. An asterisk is a wildcard, meaning the entry applies to all bots. If you want an entry to apply to a specific bot, put that bot's name in this field instead. Any lines after that apply to that user agent until a new one is specified. For example, the file below blocks the /tmp folder for Google's crawler, while all other bots are allowed into every folder (including the /tmp folder).

User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow:

As you can see, this can be used to keep bots from crawling private files or unnecessary directories. Memorizing every bot that exists on the internet is impractical, but there are various resources available for looking them up.
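The per-bot behavior of the Googlebot example above can be checked with a short sketch, again assuming Python's standard-library `urllib.robotparser`. The helper name `can_crawl`, the bot name `OtherBot`, and the file paths are hypothetical and only serve the illustration.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot example from above. The empty "Disallow:" line is the
# explicit way of saying "nothing is blocked" for the wildcard entry.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow:
"""


def can_crawl(robots_txt, user_agent, path):
    """Return True if the rules allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # /tmp/cache.html and /index.html are made-up paths.
    print(can_crawl(ROBOTS_TXT, "Googlebot", "/tmp/cache.html"))  # False
    print(can_crawl(ROBOTS_TXT, "Googlebot", "/index.html"))      # True
    print(can_crawl(ROBOTS_TXT, "OtherBot", "/tmp/cache.html"))   # True
```

Only the entry whose User-agent matches the bot is applied, which is why the /tmp folder stays open to every bot except Googlebot.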

There are many things that the robots.txt file can do to affect how crawlers treat your site, but preventing access is the most common. Another option is crawl-delay, which makes a bot pause between requests for pages on the site. More about what can be done with the robots.txt file can be found on this organization's site here. They even have a complete list of the bots that are known to them available here.
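Since preventing access is the most common use, here is a rough sketch of blocking a single folder and a single file, and of checking the result with Python's standard-library `urllib.robotparser`. The paths `/private/` and `/drafts/notes.html`, the bot name `ExampleBot`, and the helper `is_allowed` are all invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one folder and one individual file for all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if the rules allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/drafts/notes.html",
                 "/drafts/other.html", "/"):
        # Prints each path with whether the hypothetical bot may fetch it.
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", path))
```

Disallow rules match by path prefix, so the folder rule covers everything under /private/ while the file rule leaves the rest of /drafts/ open.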

