If you’re here just for the solution, skip ahead to the Solution section below.

At Q5, we love using Django as the backend for our projects. Given a choice, we’ll almost always choose Django.

Django’s native ORM gives us a really nice, object-oriented API over the database. The object orientation is nice because we don’t have to think about the SQL queries being made and can focus on building out the rest of the application.

Combine that with Django REST Framework (DRF) and we can serialize (turn into JSON) our Django models, which is nice because we can quickly return exactly what our frontend needs with a few small serializer calls.

With how easy DRF + Django make it to stand up a REST API, it’s easy to think your application is built to scale … until it isn’t.

One really big pitfall to watch out for with Django’s ORM is that it’s easy to make way too many DB calls. That’s because the Django ORM is “lazy loading”: Django doesn’t actually fetch anything from the DB until it truly needs it.

Assume we have a UserProfile model set up like this:
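
Something along these lines; this is a minimal sketch, and the exact field names here are assumptions for illustration:

from django.contrib.auth.models import User
from django.db import models


class UserSettings(models.Model):
    # Reachable from a User instance as user.settings
    user = models.OneToOneField(User, on_delete=models.CASCADE, related_name='settings')
    dark_mode = models.BooleanField(default=False)


class UserProfile(models.Model):
    # Reachable from a UserProfile instance as profile.user
    user = models.OneToOneField(User, on_delete=models.CASCADE, related_name='profile')
    bio = models.TextField(blank=True, default='')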

Because it’s so easy to access something like UserProfile.user or User.settings, you don’t realize that you’re actually making 1 extra DB call every time you do, because Django hasn’t fetched that related object yet.
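
For example, using the hypothetical models sketched above, each of these accesses triggers its own query:

profile = UserProfile.objects.get(id=1)  # query 1: the profile row
user = profile.user                      # query 2: lazily fetches the related User
settings = user.settings                 # query 3: lazily fetches the related UserSettings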

Problem

Maybe in your viewsets it won’t really matter, because 1 or 2 extra DB calls per request is a constant cost and doesn’t make a huge difference.

Where this really becomes a problem is when you have nested serializers and a GET endpoint that returns a list of them.

Let’s take this code below as an example:

Models

from django.db import models

# OnlineCourse (not shown here) is the parent course model.


class Module(models.Model):
    """ A specific module for a course """
    title = models.CharField(max_length=255, default='')
    course = models.ForeignKey(OnlineCourse, on_delete=models.CASCADE, null=True, related_name="modules")


class Lesson(models.Model):
    """ A specific lesson in a module """
    title = models.CharField(max_length=255, default='')
    index = models.PositiveIntegerField(default=0)  # ordering used by the serializers below
    module = models.ForeignKey(Module, on_delete=models.CASCADE, null=True, related_name="lessons")


class Exercise(models.Model):
    """ A specific exercise for a lesson """
    TYPES = (
        ('vid', 'Video'),
        ('mcq', 'Multiple Choice Quiz'),
        ('fib', 'Fill in Blank'),
        ('text', 'Text Lesson'),
    )
    type = models.CharField(max_length=255, default='vid', choices=TYPES)
    index = models.PositiveIntegerField(default=0)  # ordering used by the serializers below
    lesson = models.ForeignKey(Lesson, on_delete=models.CASCADE, null=True, related_name="exercises")

Serializers

from rest_framework import serializers


class ExerciseSerializer(serializers.ModelSerializer):
    class Meta:
        model = Exercise
        fields = '__all__'


class LessonSerializer(serializers.ModelSerializer):
    exercises = serializers.SerializerMethodField()

    class Meta:
        model = Lesson
        fields = '__all__'

    def get_exercises(self, obj):
        # One extra query per lesson: nothing has been prefetched yet
        qs = obj.exercises.all().order_by('index')
        return ExerciseSerializer(qs, many=True, read_only=True).data


class ModuleSerializer(serializers.ModelSerializer):
    lessons = serializers.SerializerMethodField()

    class Meta:
        model = Module
        fields = '__all__'

    def get_lessons(self, obj):
        # One extra query per module for its lessons
        return LessonSerializer(obj.lessons.all().order_by('index'), many=True, read_only=True).data
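
A list endpoint over these serializers is where the trouble starts, for example a hypothetical viewset like this one (the actual view isn’t shown in the snippet above):

from rest_framework import viewsets


class ModuleViewSet(viewsets.ReadOnlyModelViewSet):
    # GET /modules/ serializes every module, and each one triggers
    # its own queries for lessons and exercises via the serializers above.
    queryset = Module.objects.all()
    serializer_class = ModuleSerializer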

As you can see, we have nested serializers: for each module we get every lesson, and for each lesson every exercise. In the serializers we grab all the lessons and all the exercises and return them back up through the ModuleSerializer.

It works beautifully, but the only problem? We’re making *tons* of DB calls this way. For every module, we make one extra DB call with obj.lessons.all(), and for every lesson, one extra DB call with obj.exercises.all(). So if a module has 100 lessons, we make 1 extra call for the lessons plus 100 more for their exercises, because every call to .exercises.all() is its own query.
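
You can watch this happen by counting queries with Django’s test utilities, roughly like this:

from django.db import connection
from django.test.utils import CaptureQueriesContext

with CaptureQueriesContext(connection) as ctx:
    data = ModuleSerializer(Module.objects.all(), many=True).data

# Grows roughly as 1 + (number of modules) + (number of lessons)
print(len(ctx.captured_queries))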

The N+1 DB Problem.

Not a huge problem if your DB is sitting on a really big instance, but if not, then it’s really expensive! And really easy to miss unless you’re looking for it.

Solution

We use Django Silk to monitor our API endpoints to see if they’re taking too long.
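
If you haven’t used it, Silk sits in your middleware, records the SQL queries behind every request, and gives you a dashboard to browse them. A minimal setup looks roughly like this:

# settings.py
INSTALLED_APPS = [
    # ...
    'silk',
]

MIDDLEWARE = [
    # ...
    'silk.middleware.SilkyMiddleware',
]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ...
    path('silk/', include('silk.urls', namespace='silk')),
]

# Silk stores the captured requests/queries in the DB, so run migrations afterwards:
#   python manage.py migrate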

We noticed in our Silk dashboard that exactly this was happening: every time we added a new “exercise” to a “lesson”, the list endpoint made 1 extra DB call. As the DB grew, that wasn’t going to scale.

Enter prefetch_related

Django has a nice queryset method called prefetch_related which solves this issue. Instead of lazily loading each related object as it’s accessed, prefetch_related fetches all of the specified related objects up front, in one extra query per relation.
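
For example, with the models above, the following runs two queries in total, one for the modules and one for all of their lessons, no matter how many modules there are:

modules = Module.objects.prefetch_related('lessons')

for module in modules:                   # query 1 (modules) + query 2 (all their lessons)
    for lesson in module.lessons.all():  # served from the prefetch cache, no new query
        print(module.title, lesson.title)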

Now we can assign that queryset to a variable and reuse the already-fetched data throughout each nested serializer.

For example:

queryset = OnlineCourse.objects.filter(published=True).prefetch_related(
    'modules',
    'modules__lessons',
    'modules__lessons__exercises',
)

Now we can pass this queryset down through our serializers via the context, like so:

CourseSerializer(course, context={'queryset': queryset})

And now we can re-write our serializers to account for the context and continue passing it down to each nested serializer:

from rest_framework import serializers


class ExerciseSerializer(serializers.ModelSerializer):
    class Meta:
        model = Exercise
        fields = '__all__'


class LessonSerializer(serializers.ModelSerializer):
    # Declared as a nested field, so DRF serializes the prefetched exercises
    # without issuing any new queries.
    exercises = ExerciseSerializer(many=True, read_only=True)
    completed_exercises = serializers.SerializerMethodField()

    class Meta:
        model = Lesson
        fields = '__all__'

    def get_completed_exercises(self, obj):
        # The context queryset is assumed to be a prefetched set of per-user
        # completion records (each with an `exercise` and a `complete` flag),
        # so this count happens in Python rather than in extra SQL queries.
        qs = self.context.get("queryset")
        if qs:
            return sum(es.exercise.lesson_id == obj.id and es.complete for es in qs)
        return 0


class ModuleSerializer(serializers.ModelSerializer):
    lessons = LessonSerializer(many=True, read_only=True)

    class Meta:
        model = Module
        fields = '__all__'

Because LessonSerializer and ExerciseSerializer are declared directly as nested fields, DRF automatically passes the parent’s context down to them; the above is equivalent to:

class LessonSerializer(serializers.ModelSerializer):
    # Illustrative only: DRF wires the parent's context into nested serializers for you
    exercises = ExerciseSerializer(many=True, read_only=True, context=self.context)

And now, instead of the N+1 DB problem, where every additional exercise or lesson means 1 more DB call, we make our DB calls once at the beginning of serialization, fetch every related object we need, pass that prefetched data down through each serializer, and end up with a constant number of queries (one per prefetched relation) no matter how much data there is.

We’ve dramatically reduced our DB load: from a query for every lesson and exercise when we had 100 of them, down to a constant handful of queries even if we have 5,000 exercises.

Tradeoffs

In the sample problem above, there are also tradeoffs to consider. For example, maybe we don’t want to load all 5,000 exercises in one DB call. We may have memory constraints, or serializing all 5,000 of them at once might take longer than making 5,000 DB calls. Maybe we serialize them 1,000 at a time and make 5 DB calls instead of just 1.
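
One way to do that batching, as a rough sketch using Django’s Paginator (the batch size of 1,000 is arbitrary):

from django.core.paginator import Paginator

exercises = Exercise.objects.order_by('id')
paginator = Paginator(exercises, 1000)  # 1,000 rows per DB round trip

serialized = []
for page_number in paginator.page_range:
    batch = paginator.page(page_number).object_list  # one query per page (plus one COUNT up front)
    serialized.extend(ExerciseSerializer(batch, many=True).data)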

Having the database as a bottleneck is one issue, but there are other constraints to weigh, based on the capacity of our servers, if we want a fully performant and scalable app.