How to block a site from indexing in robots.txt: instructions and recommendations

The work of an SEO specialist covers a lot of ground. Beginners are advised to write down the optimization algorithm so as not to miss any steps; otherwise the promotion can hardly be called successful, because the site will keep running into failures and errors that take a long time to fix.

One of the optimization steps is working with the robots.txt file. Every resource should have this document, because without it optimization becomes harder to manage. The file performs a number of functions that you will need to understand.

Robot Assistant

The robots.txt file is a plain text document that can be viewed in an ordinary editor such as the system Notepad. When creating it, set the encoding to UTF-8 so that it can be read correctly. The file works over the HTTP, HTTPS and FTP protocols.

This document is an assistant to search robots. In case you are not familiar with them, every search engine uses "spiders" that quickly crawl the World Wide Web in order to return sites relevant to users' queries. These robots need access to the resource data, and that is what robots.txt is for.

For the spiders to find their way around, the robots.txt document must be placed in the root directory. To check whether a site has this file, enter "https://site.com.ua/robots.txt" into the browser's address bar, replacing "site.com.ua" with the address of the resource you need.

Working with robots.txt

Document functions

The robots.txt file gives crawlers several kinds of access. Partial access lets the "spider" scan only specific elements of the resource. Full access allows it to check all available pages. A complete ban prevents robots from even starting the check, and they leave the site.

After visiting the resource, the "spiders" receive an appropriate response to their request. There can be several of them, depending on the state of robots.txt. For example, if the crawl was successful, the robot receives a 2xx code.

It may also be that the site redirects from one page to another. In this case the robot receives a 3xx code. If this code occurs repeatedly, the spider will follow the redirects until it gets a different response, although as a rule it makes no more than five attempts. After that, the familiar 404 error is recorded.

If the response is 4xx, the robot assumes it may crawl all of the site's content. With a 5xx code, the check may stop completely, since this often indicates temporary server errors.

Search robots

What is robots.txt needed for?

As you may have guessed, this file is a guide for robots in the root of the site. Nowadays it is used to partially restrict access to content that should not appear in search:

  • pages with personal information of users;
  • mirror sites;
  • search results;
  • data submission forms, etc.

If there is no robots.txt file in the site root, the robot will crawl absolutely all content. As a result, unwanted data may appear in the search results, and both you and the site will suffer for it. If the robots.txt document contains special instructions, the "spider" will follow them and show only the information the resource owner wants shown.
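For illustration, a minimal sketch of such instructions might look like the example below; the paths /search/, /user/ and /feedback/ are made up here and stand for search results pages, pages with personal information and data submission forms on some hypothetical site:

  User-agent: *
  Disallow: /search/
  Disallow: /user/
  Disallow: /feedback/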

Working with a file

To close a site from indexing using robots.txt, you need to figure out how to create this file. To do this, follow the instructions:

  1. Create a document in Notepad or Notepad++.
  2. Set the file extension ".txt".
  3. Enter the required data and commands.
  4. Save the document and upload it to the site root.

As you can see, at one of the stages you need to set commands for the robots. They come in two types: allowing (Allow) and prohibiting (Disallow). In addition, some optimizers specify the crawl rate, the host, and a link to the resource's site map.
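As a rough sketch of how these commands can sit together in one file (the paths and the sitemap address are invented for the example; Crawl-delay and Host are older directives that Google now ignores, so treat them as optional):

  User-agent: *
  Disallow: /admin/
  Allow: /admin/help.html
  Crawl-delay: 5
  Host: https://site.com.ua
  Sitemap: https://site.com.ua/sitemap.xml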

How to close a site from indexing

In order to start working with robots.txt and completely close the site from indexing, you also need to understand the symbols used. For example, the document uses "/", which indicates that the entire site is selected. If "*" is used, it stands for any sequence of characters. In this way you can point to a specific folder that either may or may not be scanned.
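A short sketch of how the two symbols read in practice (the /images/ folder is a made-up example, and the lines below are alternatives rather than one working file):

  Disallow: /                  # the entire site is selected
  Disallow: /images/           # only the /images/ folder is selected
  Disallow: /images/*.pdf      # any .pdf file inside /images/ is selected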

Features of the bots

"Spiders" for search engines are different, so if you work for several search engines at once, then you will have to take this moment into account. Their names are different, which means that if you want to contact a specific robot, you will have to specify its name: “User Agent: Yandex” (without quotes).

If you want to set directives for all search engines, use the command "User-agent: *" (without quotes). To properly block the site from indexing using robots.txt, you need to know the specifics of the popular search engines.

The fact is that the most popular search engines, Yandex and Google, each have several bots, and each of them has its own tasks. For example, YandexBot and Googlebot are the main "spiders" that crawl the site. Knowing all the bots makes it easier to fine-tune the indexing of your resource.
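For instance, a sketch that addresses the two main "spiders" separately might look like this (the closed folders are invented for the example):

  User-agent: Googlebot
  Disallow: /drafts/

  User-agent: Yandex
  Disallow: /temp/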

How the robots.txt file works

Examples

So, with the help of robots.txt, you can close the site from indexing with simple commands, the main thing is to understand what you need specifically. For example, if you want Googlebot not to approach your resource, you need to give it the appropriate command. It will look like: "User-agent: Googlebot Disallow: /" (without quotes).

Now let's break down what this command consists of and how it works. "User-agent" is used to address one of the bots directly; next, we indicate which one, in our case Googlebot. The "Disallow" command must start on a new line and forbids the robot from entering the site. The slash character here means that all pages of the resource are covered by the command.
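Written out as it would actually appear in the file, with each directive on its own line, the same instruction looks like this:

  User-agent: Googlebot
  Disallow: /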

What is robots.txt for?

In robots.txt, you can disable indexing for all search engines with the simple command "User-agent: * Disallow: /" (without quotes). The asterisk character here denotes all search robots. Such a command is typically needed to pause indexing of the site while major work is carried out on it, work that could otherwise harm optimization.
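Laid out line by line, this full ban for all robots looks like this:

  User-agent: *
  Disallow: /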

If the resource is large and has many pages, it often contains information that is either undesirable to disclose or could negatively affect promotion. In that case you need to understand how to close an individual page from indexing in robots.txt.

You can hide either a folder or a file. In the first case, you again start by addressing a specific bot or all of them, so we use the "User-agent" command, and below it we specify the "Disallow" command for a specific folder. It will look like this: "Disallow: /folder/" (without quotes). This way you hide the entire folder. If it contains an important file that you do want to show, write the command below it: "Allow: /folder/file.php" (without quotes).
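Put together, the fragment described above would look like this ("folder" and "file.php" are just placeholders from the example):

  User-agent: *
  Disallow: /folder/
  Allow: /folder/file.php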

Checking the file

If you have managed to close the site from indexing using robots.txt but are not sure whether all your directives work correctly, you can check that everything is set up properly.

First, check the placement of the document again. Remember that it must be located strictly in the root folder; if it sits anywhere else, it will not work. Then open the browser and enter the following address: "https://yoursite.com/robots.txt" (without quotes). If your web browser returns an error, the file is not where it should be.

How to close a folder from indexing

Directives can be checked in special tools used by almost all webmasters, namely the Google and Yandex products. For example, Google Search Console has a panel where you open "Crawl" and then run the "Robots.txt File Inspection Tool". Copy all the data from the document into the window and start the scan. Exactly the same check can be performed in Yandex.Webmaster.
