This article is for beginners who have only recently started exploring the creation and promotion of websites. If you have used the WordPress CMS before, you probably already know something about robots.txt. A robots.txt file can be used with any kind of website or blog. Here you will learn how to create a robots.txt file and what its basic commands and directives do.
What is Robots.txt
Robots.txt is an important element of almost every website on the Internet. It helps improve a site's rankings and SEO traffic. Robots.txt is a special file, placed in the root of a site, that contains the indexing rules used by search engines.
The file is always named “robots.txt”. Its main tasks are to block the indexing of certain pages, show where the sitemap file is located and specify the site's primary mirror. Robots.txt also helps to avoid leakage of sensitive data from your website.
Also read: 6 Quick Tips for Improving Your WordPress Security
Why we need Robots.txt
Incorrect use of the robots.txt file may lead to your website not being indexed by search engines. If you don't use it properly you may end up indexing junk pages, and your sensitive data may become visible to your customers and a wide range of other people. If you want to check your robots.txt file, you can use services such as the Yandex.Webmaster robots.txt analysis tool.
Basic commands in Robots.txt
To create a robots.txt file you don't need any specialized programs. You only need a text editor and a few directives to enter into the robots.txt file. After creating the robots.txt file, place it in the root directory of the website. After that, search engines crawl the site and begin indexing according to these rules.
Example 1:
User-agent: *
Disallow: /
Disallow and User-agent commands
The Disallow command specifies that the page or path mentioned should not be indexed, while the User-agent command refers to the search engine crawlers the rules apply to. Creating a robots.txt file is rarely complete without the Allow directive, which is the opposite of Disallow. Together they define which parts or pages should be indexed and which should not.
In the example above, User-agent: * simply refers to all types of crawlers and bots indexing the site. The next line, Disallow: /, means no crawler should index the whole website.
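As a quick illustration of Allow working together with Disallow (the paths below are only examples), most major crawlers, such as Googlebot, apply the more specific rule, so a folder can be blocked while one of its subfolders stays crawlable:
User-agent: *
Disallow: /wp-content/
Allow: /wp-content/uploads/
Here the /wp-content/ folder is blocked, but files under /wp-content/uploads/ may still be indexed.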
Example 2:
User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
The above code means crawlers should not index the /wp-admin/ and /trackback/ folders of the website.
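For comparison, an empty Disallow value blocks nothing at all, so the following rules let every crawler index the entire site:
User-agent: *
Disallow: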
Additional commands in Robots.txt
There are several additional directives that can be used in robots.txt; they are as follows, and a combined example is shown after the list.
- Host – It specifies the primary mirror of your site, if there are several.
- Sitemap – This directive tells the search engine where your sitemap file is located.
- Crawl-delay – It is used to create a delay between page loads. This directive is only useful for sites with many pages. For example, Crawl-delay: 10 creates a delay of 10 seconds between the loading of pages.
- Request-rate – It allows you to set the frequency of loading pages. For example, Request-rate: 1/2 loads one page every two seconds.
- Visit-time – This directive allows the crawler to index the site only within a strictly allotted time window, specified in UTC.
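Putting these directives together, a robots.txt using them might look like the sketch below; the domain, sitemap path and time window are placeholders, and support for Host, Crawl-delay, Request-rate and Visit-time varies between search engines:
User-agent: *
Crawl-delay: 10
Request-rate: 1/2
Visit-time: 0200-0600
Host: example.com
Sitemap: https://example.com/sitemap.xml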
Example 3: Allowing Googlebot-Image to index all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
Example 4: WordPress robots.txt
User-agent: *
Crawl-delay: 2
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /cgi-bin
Disallow: /category
Disallow: /tag
Disallow: /author
Disallow: /*.html$
Disallow: /*feed*
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
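In this WordPress example, the core system folders (wp-admin, wp-includes, and the plugin, cache and theme folders) are kept out of the index, the category, tag and author archives are blocked, and the wildcard rules block feeds, trackbacks and any URL containing a question mark (query parameters), while Crawl-delay: 2 asks crawlers to pause two seconds between requests. Wildcard and Crawl-delay support varies between search engines, so treat this as a starting template rather than a definitive configuration.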