First of all, robots.txt is nothing more than a plain text file (ASCII or UTF-8) located in your domain's root directory, which blocks (or allows) search engine access to certain areas of your site.

 

The robots.txt file contains a simple set of commands (or directives) and is typically used to restrict crawler traffic to your server, thus preventing unwanted resource usage.

 

Search engines use so-called crawlers (or bots) to index parts of a website and return them as search results. You might want certain sensitive data stored on your server to be inaccessible to web searches. The robots.txt file helps you do just that.

 

Note: Files or pages on your website are not completely cut off from crawlers if they are indexed/referenced from other websites. To reliably keep a URL out of Google search results, you can password-protect the files directly on your server.
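
For illustration, a rule like the following (the /private/ directory is just a made-up example) tells compliant crawlers to stay out of a folder, but it does not hide or protect the files themselves:

User-agent: *
Disallow: /private/

Anyone who has (or finds) a direct link to a file in that folder can still open it.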

 

How to create the robots.txt file

 

To create your robots.txt file (if it does not already exist), simply follow these steps:

 

Log in to your cPanel account >> Click on File Manager >> Select a domain >> Go to the website folder >> Click on “New File” >> Type in “robots.txt” >> Click on “Create New File”.

 

Now you are free to edit the content of this file by double-clicking on it.
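
If you are not sure what to put in it yet, a safe starting point is a permissive rule set that lets every crawler access the entire site (the same rule shown in example 3 below):

User-agent: *
Allow: /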

 

Note: You can create only one robots.txt file per domain; duplicates are not allowed on the same root path. Each domain or sub-domain must have its own robots.txt file.
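
For example (using a placeholder domain), each of these files is read independently:

https://example.com/robots.txt – applies only to example.com
https://blog.example.com/robots.txt – applies only to blog.example.com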

 

Examples of usage and syntax rules

 

Usually, a robots.txt file contains one or more rules, each on its own line. Each rule blocks or allows a given crawler's access to a specified file path or to the entire website.

 

Here are a few examples of robots.txt files:

 

1. Block all crawlers (user-agents) from accessing the logs and ssl directories.

User-agent: *
Disallow: /logs/
Disallow: /ssl/

 

2. Block all crawlers from indexing the whole site.

User-agent: *
Disallow: /

 

3. Allow all user agents to access the entire site.

User-agent: *
Allow: /

 

4. Block a specific crawler from indexing the whole site.

User-agent: Bot1
Disallow: /

 

5. Allow a specific web crawler to index the site and prevent all others from doing so.

User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /

 

Under User-agent: you can type in the specific crawler's name. You can also target all crawlers at once simply by typing the star (*) symbol. Note that this wildcard matches all crawlers except AdsBot crawlers, which must be named explicitly. Lists of common crawler names can be found on the internet.
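
For instance, if you wanted a directory (here a hypothetical /checkout/ path) to be off-limits for Google's AdsBot as well, you would need to add a separate group for it, because AdsBot ignores the * wildcard:

User-agent: *
Disallow: /checkout/

User-agent: AdsBot-Google
Disallow: /checkout/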

Additionally, for the Allow and Disallow directives to apply only to a specific file or folder, you must always write the path relative to the root and enclose folder names in “/” characters (e.g. /logs/).
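
For example, to block a single folder and a single file (the file path below is made up for illustration):

User-agent: *
Disallow: /logs/
Disallow: /docs/report.pdf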

Also note that the paths in both directives are case-sensitive. It is especially relevant to know that, by default, crawlers may access any page or directory that is not blocked by a Disallow: rule.
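
A minimal sketch of both points, using a hypothetical /Logs/ folder:

# Matches /Logs/ but not /logs/, because paths are case-sensitive;
# everything not listed under Disallow: remains crawlable by default.
User-agent: *
Disallow: /Logs/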

Note: You can find a complete set of rules and syntax examples here.

 
