Joins vs Includes

Sometimes, joins and includes can be confusing because there are scenarios where they seem to be interchangeable, but there are specific reasons to use one or the other.

joins

joins allow us to specify a relation to be included in the query. The resulting query will include a JOIN clause (SQL).

Let’s say we have these two models: Employee that has one Computer.

#app/models/employee.rb
class Employee < ActiveRecord::Base
  has_one :computer
end

#app/models/computer.rb
class Computer < ActiveRecord::Base
  belongs_to :employee
end

If we want to know all the employees whose name is ‘Fernando’ and has a Dell computer, we can use joins.

Employee.
  joins(:computer).
  where(name: 'Fernando', computers: { brand: 'Dell' })

This query works well and returns all the employees who fulfill the conditions.

The SQL will be something like this:

SELECT "employees".*
 FROM "employees"
 INNER JOIN "computers" ON "computers"."employee_id" = "employees"."id"
 WHERE "employees"."name" = 'Fernando' AND "computers"."brand" = 'Dell'

includes

As we explained in a previous post, if we want to load all the employees and all their computers in a single query, we can use includes,

Employee.includes(:computer)

This way, Rails hits the database just once, and load all the data from employees and computers. The confusion might start here, because using includes we can also perform some conditionals in the where clause. For, example:

Employee.
  includes(:computer).
  where(name: 'Fernando', computers: { brand: 'Dell' })

As we can see, it is possible to include conditionals that affect both relations, just like we did with joins.

However, the resulting SQL is different. Let’s see:

SELECT "employees"."id" AS t0_r0,
 "employees"."name" AS t0_r1,
 "employees"."created_at" AS t0_r2,
 "employees"."updated_at" AS t0_r3,
 "computers"."id" AS t1_r0,
 "computers"."brand" AS t1_r1,
 "computers"."model" AS t1_r2,
 "computers"."employee_id" AS t1_r3,
 "computers"."created_at" AS t1_r4,
 "computers"."updated_at" AS t1_r5
 FROM "employees"
 LEFT OUTER JOIN "computers" ON "computers"."employee_id" = "employees"."id"
 WHERE "employees"."name" = 'Fernando' AND "computers"."brand" = 'Dell'

As we can see, includes adds all the fields from computer, as expected, while joins doesn’t. And this little difference is what we must take into account.

When we just want to filter the result of a query based on a field that belongs to a secondary relation, but we aren’t going to use data of this relation later, we must use joins. Otherwise, we will load a lot of data that we are not going to use.

Example:

employees = Employee.
  joins(:computer).
  where(name: 'Fernando', computer: { brand: 'Dell' })
puts employees.map(&:name)

On the other hand, if we need to use data from the secondary relation, then includes is mandatory, otherwise, we will have N+1 query issues.

Example:

employees = Employee.
  joins(:computer).
  where(name: 'Fernando', computer: { brand: 'Dell' })
puts employees.map{|e| "#{e.full_name} has a Dell #{e.computer.model}" }

It is not so confusing after all, right?

comments powered by Disqus