The robots.txt file is a very underrated on-page SEO factor, and not everybody realizes the value it brings to the table. The robots.txt file acts as an access control system that tells crawler bots which pages should be crawled and which should not. It is a rule book for your website that the various web spiders read before they attempt to crawl your site.
There are tons of amazing Drupal SEO modules for Drupal 9 and 8 that make our jobs easier and boost SEO rankings, and one of them is the RobotsTxt module. The RobotsTxt module in Drupal 9 (and 8) is a handy utility that gives you easy control over the robots.txt file in a multisite Drupal environment: you can dynamically create and edit a robots.txt file for each site via the UI. Let’s learn more about this utility module and how to implement it in Drupal 9.
But how does robots.txt help with SEO?
So, a robots.txt file restricts crawlers from crawling certain pages. But why wouldn’t you want all your pages and files to be crawled? Why do you need any restrictions at all? Well, in this case, more isn’t always merrier.
- Without a robots.txt file, you are allowing web spiders to crawl all your web pages, sections and files. This uses up your crawl budget (yes, that’s a thing), which can affect your SEO.
- A crawl budget is the number of your pages that web spiders (Googlebot, Yahoo, Bing, etc.) crawl in a given timeframe. Too many pages to crawl could reduce your chances of being indexed quickly. Not only that, you might also lose out on getting your important pages indexed!
- Not all your pages need to be crawled. For example, I’m sure you wouldn’t want Google to crawl your development/staging environment web pages or your internal login pages.
- You might want to restrict media files (images, videos or other documents) from being crawled.
- If you have a significant number of duplicate content pages, it is a better idea to disallow them in the robots.txt file instead of adding canonical links to each of those pages (see the sample robots.txt after this list).
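For illustration only, a robots.txt covering cases like these might look something like the snippet below. The paths are placeholders, so substitute the ones that actually apply to your site.

# Illustrative example only – replace these paths with your own
User-agent: *
Disallow: /user/login
Disallow: /admin/
# Hypothetical paths for media files and duplicate (print) versions of pages
Disallow: /sites/default/files/videos/
Disallow: /print/
Sitemap: https://yoursitename.com/sitemap.xml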
How to Install and Implement the RobotsTxt Module in Drupal 9
The RobotsTxt module for Drupal 9 is great when you want to dynamically generate a robots.txt file for each of your websites while running many sites from one codebase (a multisite environment).
Step 1: Install the RobotsTxt Module for Drupal 9
Using composer:
composer require 'drupal/robotstxt:^1.4'
Step 2: Enable the module
Go to Home > Administration > Extend (/admin/modules) and enable the RobotsTxt module.
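If you have Drush available, you can also enable the module from the command line; a quick sketch:

# enable the RobotsTxt module (answer yes to prompts)
drush pm:enable robotstxt -y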
Step 3: Remove the existing Robots.txt file
Once the module is installed, make sure to delete (or rename) the robots.txt file in the root of your Drupal installation so that the module can serve its own robots.txt file(s). Otherwise, the module cannot intercept requests for the /robots.txt path.
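Here is a minimal sketch of that step from the command line, assuming a typical Composer-based install whose docroot is web/ (adjust the path if yours differs). Renaming keeps a backup you can restore later.

# run from the project root; adjust web/ to your docroot
mv web/robots.txt web/robots.txt.bak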
Step 4: Configure
Navigate to Home > Administration > Configuration > Search and metadata > RobotsTxt (/admin/config/search/robotstxt) and add your directives to the “Contents of robots.txt” field. Save the configuration.
Step 5: Verify
To verify your changes, visit https://yoursitename.com/robots.txt
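You can also fetch the file from a terminal (replace the domain with your own); a 200 response with your directives in the body confirms the module is serving it:

curl -i https://yoursitename.com/robots.txt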
RobotsTxt API
If you want to implement a shared list of directives across your multisite environment, you can use the RobotsTxt API. The module has a single hook, hook_robotstxt(), which allows you to define extra directives via code.
The example below adds a Disallow rule for /foo and /bar to the bottom of the robots.txt file without you having to add them manually to the “Contents of robots.txt” field in the UI.
/**
 * Add additional lines to the site's robots.txt file.
 *
 * @return array
 *   An array of strings to add to the robots.txt.
 */
function hook_robotstxt() {
  return [
    'Disallow: /foo',
    'Disallow: /bar',
  ];
}
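In an actual implementation, “hook” is replaced by the machine name of your module. For example, in a hypothetical custom module named mymodule, the function would live in mymodule.module:

/**
 * Implements hook_robotstxt().
 */
function mymodule_robotstxt() {
  // These directives are appended to the module's generated robots.txt output.
  return [
    'Disallow: /foo',
    'Disallow: /bar',
  ];
}

Remember to rebuild the cache (for example with drush cr) so Drupal picks up the new hook implementation.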