Blog

Understanding how to improve query optimisation in Django

Date

13th January 2020

Read

5 min

Creator

Pranjal Singh

I’ve found it really intriguing how fast people learn Django. Because of it’s simplicity & effectiveness‚ it appears that it’s quite appealing to newbies. 

But I’ve also found it strange how people miss out on slight optimisations that can result in almost 20 – 200 % improvement in time complexity! Let me put the above into context. Consider the relationship below:

### models.pyfrom django.db import modelsclass Author(models.Model): name = models.CharField(max_length=255) def __str__(self): return "Author - {}".format(self.name) class Book(models.Model) name = models.CharField(max_length=255) author = models.ForeignKey(Author‚ related_name="books") def __str__(self): return "Book - {}".format(self.name)

The above denotes that an author a can have many books. So‚ let’s create a list of authors & books.

Authors :-ID | Name1 | J.K. Rowling2 | George R.R MartinBooks:-ID | Name | Author1 | Harry Potter & the chamber of secrets | 12 | Harry Potter & the deathly hallows | 1 3 | Harry Potter & the prisoner of Azkaban | 14 | A Game of Thrones | 25 | A Clash of Kings | 26 | A Dance with Dragons | 2

As you can see‚ I’m a big fan of both the Harry Potter series & Game of Thrones!

Now‚ let’s write an API‚ via which we can retrieve the list of books along with their authors.

### api.pyclass AllBooksAPI(ListAPIView): serializer_class = BookSerializer queryset = Book.objects.all() ### serializer.pyclass BookSerializer(serializers.ModelSerializer): author = serializer.SerializerMethodField() def get_author(self‚ obj): return obj.author.nameclass Meta: model = Book fields = ['id'‚ 'name'‚ 'author']

As expected‚ we hit the API url. Ta-da! All works perfectly fine – or does it? Let’s look at the log of the queries. We can log the queries by adding this snippet to settings.py

LOGGING = { ‘version’: 1‚ ‘filters’: { ‘require_debug_true’: { ‘()’: ‘django.utils.log.RequireDebugTrue’‚ }}‚’handlers’: { ‘console’: { ‘level’: ‘DEBUG’‚< ‘filters’: [‘require_debug_true’]‚ ‘class’: ‘logging.StreamHandler’‚ }}‚ ‘loggers’: { ‘django.db.backends’: { ‘handlers’: [‘console’]‚ } }}

The log looks something like this.

(0.001) SELECT “Book_book”.”id”‚ “Book_book”.”name”‚ “Book_book”.”author_id” FROM “Book_book”; args=()(0.000) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” = 1; args=(1‚)(0.000) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” = 1; args=(1‚)(0.000) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” = 1; args=(1‚)(0.000) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” = 2; args=(2‚)(0.000) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” = 2; args=(2‚)(0.003) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” = 2; args=(2‚)

Wait‚ what? We executed 7 different queries to fetch the list of 6 books & their authors?! Let’s analyse this further. The first query fetches the book ID‚ name & the author ID from the Book table. That seems okay. But where are the extra 6 queries coming from? The problem lies with this statement inside the BookSerializer.

def get_author(self‚ obj): return obj.author.name

As you can probably guess‚ we are fetching the author from the database for every book! This certainly doesn’t seem efficient. We are reaching out to the DB every time we want to access an author of a book. How about we try & do it in 1 single connection – saving us 6 roundtrips! Django provides us with select_related & prefetch_related for this matter. This helps to fetch the related data (other models) instead of just retrieving the column value. In the above example‚ we want to retrieve the data from the Author table for all the books.

It looks something like this.

queryset = Book.objects.select_related(‘author’).all()

Let’s review our logs to see how our new query is working.

(0.001) SELECT “Book_book”.”id”‚ “Book_book”.”name”‚”Book_book”.”author_id”‚ “Author_author”.”id”‚”Author_author”.”name” FROM “Book_book” INNER JOIN “Author_author” ON (“Book_book”.”author_id” = “Author_author”.”id”); args=()

Perfect! We are now making just one roundtrip to execute the whole query. To add to it‚ the new query takes only 0.001 seconds in comparison to the old one which takes 0.004 seconds (adding the time taken to execute all the 7 queries) That’s a 300% improvement in terms of time complexity.

Imagine this multiplied. The gains for complex models & querysets could be manyfold. There is one limitation of select_related though. We cannot use it with a many-to-many field. For that purpose though‚ we do have prefetch_related.

In the above example‚ usage of prefetch_related will look something like this.

queryset = Book.objects.prefetch_related(‘author’).all() #### 2 database hits

Checking the logs‚ you’ll notice that it takes 2 queries (i.e 2 database hits).

(0.000) SELECT “Book_book”.”id”‚ “Book_book”.”name”‚ “Book_book”.”author_id” FROM “Book_book”; args=()(0.000) SELECT “Author_author”.”id”‚ “Author_author”.”name” FROM “Author_author” WHERE “Author_author”.”id” IN (1‚ 2); args=(1‚ 2)

This is because of the fact that prefetch_related first fetches the list of books & then the list of related authors. After this‚ it performs a join in python. The overall result of both select_related & prefetch_related is same‚ but they differ in terms of their execution.

To sum it up for the above example

Book.objects.all() #### 7 database hitsBook.objects.select_related(‘author’).all() #### 1 database hitBook.objects.prefetch_related(‘author’).all() #### 2 database hits“`