Introduction
The Django select_related method allows you to speed up your querysets and therefore your application.
In this post, I’ll show you how you can minimize database accesses and thus maximize the speed of your Django app.
To learn how to use the select_related method, we will work through a concrete example. We will start by creating the models and then compare two approaches: first the slow one, without select_related, and then the fast one that uses it.
You will see that the difference in database accesses is dramatic.
Speed up Django queryset using select_related method
The Django documentation describes the select_related method with this definition:
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query
Django documentation
In the following sections we will see how best to use this method in a real case.
Preconditions
In this section I will show you the steps needed to replicate this example in your local environment.
Of course, if you’re only interested in how to use the select_related method, skip ahead to the next section.
To follow along, you will need to have the Django Python package installed.
My suggestion is to create a virtualenv and then install Django.
If you don’t know what a virtualenv is or you don’t know how to create one, then take a look at this tutorial.
Once you’ve created the virtualenv, you can install Django with the following command from your command line.
pip install Django
This will install the latest version of Django in your local environment.
Once this is done, the last step you need is to create a Django app.
For this I advise you to follow the steps of the tutorial that you can find on the official Django documentation.
With that done, you’re ready to start viewing the rest of the tutorial.
Models and records creation
First of all, let’s edit the Django app’s models.py file.
Suppose we have two models, Artist and Album.
For simplicity let’s minimize the fields of each model and just specify a name attribute for both and a foreign key from Album to Artist.
Let’s make another simplification and say that an album can only have one artist.
from django.db import models

class Artist(models.Model):
    name = models.CharField(max_length=10)

class Album(models.Model):
    name = models.CharField(max_length=30)
    artist = models.ForeignKey(Artist, on_delete=models.CASCADE)
Once this is done and the migrations have been applied, we can run this simple code (for example, from the Django shell) to insert 100 artists and 100 albums. Each artist is associated with one album.
for idx in range(100):
    artist_name = "artist_{}".format(idx)
    artist_obj = Artist.objects.create(name=artist_name)
    album_name = "album_{}".format(idx)
    Album.objects.create(name=album_name, artist=artist_obj)
Now that the models and the records are ready, the next sections show how to limit accesses to the database using the select_related method.
Without Django select_related method – Slow approach
In the code below we simply want to print the associated artist for each album.
To do this, we first collect all the albums and then use a for loop to print the information we need for each one.
from django.db import connection

print("Initial number of queries: {}".format(len(connection.queries)))
album_qs = Album.objects.all()
for album in album_qs:
    artist = album.artist
    print("Album name: {} - Artist name: {}".format(album.name, artist.name))
print("Final number of queries: {}".format(len(connection.queries)))
As we said, the initial queryset is on the Album model.
After this, we use a for loop to print each album and the linked artist.
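Django provides a way to count how many queries are made: the queries attribute of the connection object, which you import from django.db. Keep in mind that queries are only recorded when DEBUG is set to True in your settings.
As you can see from the above code, just import it and use it.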
from django.db import connection
queries = connection.queries
Note that it returns a list, so in our example we use len(connection.queries) to count the accesses to the database.
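Because the list keeps growing for as long as the connection is alive, it can be handy to count only the queries issued by one block of code. Below is a minimal sketch of such a helper. The names count_queries and FakeConnection are hypothetical and introduced only for illustration; FakeConnection is a stand-in so that the snippet runs without a Django project, and in a real project you would pass django.db.connection instead.

```python
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Count the queries logged on `conn` while the with-block runs."""
    stats = {"count": 0}
    start = len(conn.queries)
    try:
        yield stats
    finally:
        # Re-read conn.queries here: it may return a fresh list on each access
        stats["count"] = len(conn.queries) - start


class FakeConnection:
    """Hypothetical stand-in for django.db.connection (not a Django class)."""

    def __init__(self):
        self._log = []

    @property
    def queries(self):
        return list(self._log)

    def execute(self, sql):
        # Pretend to run the statement and record it in the log
        self._log.append({"sql": sql})


fake = FakeConnection()
fake.execute("SELECT 0")  # issued before the block, so it is not counted

with count_queries(fake) as stats:
    fake.execute("SELECT 1")
    fake.execute("SELECT 2")

print("Queries inside the block:", stats["count"])
```

Django also ships django.test.utils.CaptureQueriesContext, which serves a similar purpose if you prefer not to write your own helper.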
Running the album loop shown earlier prints the following:
Initial number of queries: 0
Album name: album_0 - Artist name: artist_0
Album name: album_1 - Artist name: artist_1
Album name: album_2 - Artist name: artist_2
Album name: album_3 - Artist name: artist_3
Album name: album_4 - Artist name: artist_4
Album name: album_5 - Artist name: artist_5
...
Album name: album_95 - Artist name: artist_95
Album name: album_96 - Artist name: artist_96
Album name: album_97 - Artist name: artist_97
Album name: album_98 - Artist name: artist_98
Album name: album_99 - Artist name: artist_99
Final number of queries: 101
The first query is done when all Albums are retrieved, but what about the other 100?
Basically, every time the artist attribute of an album is accessed, there is an additional database hit.
artist = album.artist
As you can imagine, this is a real problem. In this small example you may not notice a difference in performance, but in a web application, repeatedly querying the database can noticeably slow down every request.
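To make the pattern concrete outside of Django, here is a rough sketch that reproduces it with Python’s built-in sqlite3 module. The table names, the executed list and the run helper are assumptions made for this illustration; the SQL that Django actually generates will differ in its details.

```python
import sqlite3

# In-memory database with tables shaped like the Artist and Album models
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        name TEXT,
        artist_id INTEGER REFERENCES artist (id)
    );
    """
)
for idx in range(100):
    conn.execute(
        "INSERT INTO artist (id, name) VALUES (?, ?)", (idx, f"artist_{idx}")
    )
    conn.execute(
        "INSERT INTO album (id, name, artist_id) VALUES (?, ?, ?)",
        (idx, f"album_{idx}", idx),
    )

executed = []  # every SELECT we send, similar in spirit to connection.queries


def run(sql, params=()):
    """Execute one statement, log it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# One query to fetch the albums...
albums = run("SELECT name, artist_id FROM album ORDER BY id")
pairs = []
for album_name, artist_id in albums:
    # ...plus one extra query per album to look up its artist
    artist_name = run("SELECT name FROM artist WHERE id = ?", (artist_id,))[0][0]
    pairs.append((album_name, artist_name))

print("Number of queries:", len(executed))
```

The count grows with the number of albums: one query for the list, then one more for each row.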
In the next section we will see how to fix this problem and thus speed up your Django application.
Using Django select_related method – Faster approach
Starting from the code presented in the previous section, how can we speed up our code?
In this case, the bottleneck is the number of database accesses, but no problem!
To solve this problem we can use the select_related method.
print("Initial number of queries: {}".format(len(connection.queries)))
album_qs = Album.objects.all().select_related("artist")
for album in album_qs:
    artist = album.artist
    print("Album name: {} - Artist name: {}".format(album.name, artist.name))
print("Final number of queries: {}".format(len(connection.queries)))
The only difference is on the second line: with select_related, Django ‘pre-loads’ the artist of each album by joining the two tables in a single SQL query.
This means that every time we look up the artist via the album, we no longer access the database.
Let’s try running the code again now and see the difference:
Initial number of queries: 0
Album name: album_0 - Artist name: artist_0
Album name: album_1 - Artist name: artist_1
Album name: album_2 - Artist name: artist_2
Album name: album_3 - Artist name: artist_3
Album name: album_4 - Artist name: artist_4
Album name: album_5 - Artist name: artist_5
...
Album name: album_95 - Artist name: artist_95
Album name: album_96 - Artist name: artist_96
Album name: album_97 - Artist name: artist_97
Album name: album_98 - Artist name: artist_98
Album name: album_99 - Artist name: artist_99
Final number of queries: 1
Amazing! That single change reduced the number of database hits from 101 to 1.
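One query is enough because the related table is joined into the original SELECT. The sketch below shows the idea with Python’s built-in sqlite3 module. The table names, the executed list and the run helper are assumptions made for this illustration, and the SQL that Django really emits will differ in table names and selected columns.

```python
import sqlite3

# In-memory database with tables shaped like the Artist and Album models
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        name TEXT,
        artist_id INTEGER REFERENCES artist (id)
    );
    """
)
for idx in range(100):
    conn.execute(
        "INSERT INTO artist (id, name) VALUES (?, ?)", (idx, f"artist_{idx}")
    )
    conn.execute(
        "INSERT INTO album (id, name, artist_id) VALUES (?, ?, ?)",
        (idx, f"album_{idx}", idx),
    )

executed = []  # every SELECT we send, similar in spirit to connection.queries


def run(sql, params=()):
    """Execute one statement, log it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# A single query returns each album together with its artist
joined_rows = run(
    """
    SELECT album.name, artist.name
    FROM album
    INNER JOIN artist ON artist.id = album.artist_id
    ORDER BY album.id
    """
)

for album_name, artist_name in joined_rows[:3]:
    print("Album name: {} - Artist name: {}".format(album_name, artist_name))
print("Number of queries:", len(executed))
```

The trade-off is that each row carries more columns, which is usually a good deal when you know you will need the related object anyway.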
Conclusion
With this brief use case we have seen how to decrease the number of database accesses using the select_related queryset method.
Django offers other methods for speeding up queries that are just as effective. In the next posts I will analyze others.
As always, if you have any questions or run into trouble, feel free to leave a comment. Otherwise, take a look at the latest posts!