How to add a robots.txt to your Django site
2020-02-10
robots.txt is a standard file to communicate to “robot” crawlers, such as Google’s Googlebot, which pages they should not crawl.
You serve it on your site at the root URL /robots.txt.
To add such a file to a Django application, you have a few options.
You could serve it from a web server outside your application, such as nginx. The downside of this approach is that if you move your application to a different web server, you’ll need to redo that configuration. Also you might be tracking your application code in Git, but not your web server configuration, and it’s best to track changes to your robots rules.
The approach I favour is serving it as a normal URL from within Django. It becomes another view that you can test and update over time. Here are a couple of approaches to do that.
With a Template
This is the easiest approach. It keeps the robots.txt file in a template and simply renders it at the URL.
First, add a new template called robots.txt in your root templates directory, or in your “core” app’s templates directory:
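As a minimal sketch, the template might contain rules like the following. The two disallowed paths are hypothetical placeholders; substitute the paths you actually want crawlers to skip.

```text
User-Agent: *
Disallow: /private/
Disallow: /junk/
```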
Second, add a URLconf entry:
This creates a new view directly inside the URLconf, rather than importing it from a views module.
This is not the best idea, since it mixes layers in one file, but it’s often done pragmatically to avoid extra lines of code for simple views.
We need to set the content type to text/plain to serve it as a text document, rather than the default text/html.
After this is in place, you should be able to run python manage.py runserver and see the file served at http://localhost:8000/robots.txt (or similar for your runserver URL).
With a Custom View
This is a slightly more flexible approach.
Using a view, you can add custom logic, such as checking the Host header and serving different content per domain.
It also means you don’t need to worry about variables being HTML escaped in your template, which might end up incorrect for the text format.
First, add a new view, in your “core” app:
We’re using Django’s require_GET decorator to restrict to only GET requests.
Class-based views already do this, but we need to think about it ourselves for function-based views.
We generate the robots.txt content inside Python, by combining a list of lines into a single newline-separated string.
Second, add a URLconf entry:
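A sketch of that entry, assuming the view function is named robots_txt and lives in core/views.py (both names are illustrative, so adjust the import to match your project):

```python
from django.urls import path

# Assumed location of the view; change this import if yours lives elsewhere.
from core.views import robots_txt

urlpatterns = [
    # ... your other URL patterns would sit alongside this one ...
    path("robots.txt", robots_txt),
]
```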
Again, you should be able to check this with runserver.
As I wrote above, one of the advantages of serving this from Django is that we can test it. Automated tests will guard against accidental breakage of the code, or removal of the URL.
You can add some basic tests in a file like core/tests/test_views.py:
Run the tests with Django’s test command, python manage.py test core.tests.test_views.
It’s also a good idea to check they are being run by making them fail, for example by commenting out the entry in the URLconf.
If you have a complicated set of robots.txt rules, you’ll want to run a checker after you deploy them. Google’s seems to be the de facto standard; see their webmasters page.
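For a quick local sanity check before reaching for an external checker, Python's standard-library urllib.robotparser can parse a set of rules and answer whether a given user agent may fetch a path. This is only a sketch: the rules and paths below are hypothetical placeholders, and this parser may not interpret every rule exactly the way Google's crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; paste in the content your site actually serves.
rules = "\n".join(
    [
        "User-Agent: *",
        "Disallow: /private/",
        "Disallow: /junk/",
    ]
)

parser = RobotFileParser()
# parse() takes an iterable of lines rather than a single string.
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "/private/page/"))  # False
print(parser.can_fetch("Googlebot", "/blog/"))  # True
```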
If you want to control your robots.txt rules in your database, there’s a Jazzband package called django-robots. I haven’t used it, but it seems well maintained. It also adds some less standard rules, like directing to the sitemap.
Hope this helps you control those robots.