Django ORM中的select_related和prefetch_related有什么区别?

问题:Django ORM中的select_related和prefetch_related有什么区别?

在Django文件中,

select_related() “遵循”外键关系,在执行查询时选择其他相关对象数据。

prefetch_related() 对每个关系进行单独的查找,并在Python中执行“联接”。

“在python中进行连接”是什么意思?有人可以举例说明吗?

我的理解是,对于外键关系,使用select_related; 对于M2M关系,请使用prefetch_related。它是否正确?

In Django doc,

select_related() “follows” foreign-key relationships, selecting additional related-object data when it executes its query.

prefetch_related() does a separate lookup for each relationship, and does the “joining” in Python.

What does it mean by “doing the joining in python”? Can someone illustrate with an example?

My understanding is that for foreign key relationship, use select_related; and for M2M relationship, use prefetch_related. Is this correct?


回答 0

您的理解基本上是正确的。您可以使用select_related时,你将要选择的对象是一个对象,所以OneToOneField还是ForeignKey。您可以使用prefetch_related时,你会得到一个东西的“设置”,所以ManyToManyFieldS作为你陈述或反向ForeignKey秒。为了阐明我的意思是“ reverse ForeignKeys”,这里有一个例子:

class ModelA(models.Model):
    pass

class ModelB(models.Model):
    a = ForeignKey(ModelA)

ModelB.objects.select_related('a').all() # Forward ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all() # Reverse ForeignKey relationship

区别在于select_related执行SQL连接,因此从SQL Server将结果作为表的一部分返回。prefetch_related另一方面,执行另一个查询,因此减少了原始对象中的冗余列(ModelA在上面的示例中)。您可以使用prefetch_related任何可以使用的东西select_related

折衷方案是prefetch_related必须创建并发送ID列表以选择回服务器,这可能需要一段时间。我不确定在事务中是否有很好的方法,但是我的理解是Django总是只发送一个列表并显示SELECT … WHERE PK IN(…,…,…)基本上。在这种情况下,如果预取的数据稀疏(例如,将美国国家对象链接到人们的地址),这可能会很好,但是,如果它们之间的关系更接近一对一,则会浪费大量通信资源。如有疑问,请尝试两者并查看哪种效果更好。

上面讨论的所有内容基本上都与与数据库的通信有关。但是,在Python方面prefetch_related具有额外的好处,即使用单个对象表示数据库中的每个对象。使用select_related重复的对象将在Python中为每个“父”对象创建。由于Python中的对象具有相当大的内存开销,因此这也是一个考虑因素。

Your understanding is mostly correct. You use select_related when the object that you’re going to be selecting is a single object, so OneToOneField or a ForeignKey. You use prefetch_related when you’re going to get a “set” of things, so ManyToManyFields as you stated or reverse ForeignKeys. Just to clarify what I mean by “reverse ForeignKeys” here’s an example:

class ModelA(models.Model):
    pass

class ModelB(models.Model):
    a = ForeignKey(ModelA)

ModelB.objects.select_related('a').all() # Forward ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all() # Reverse ForeignKey relationship

The difference is that select_related does an SQL join and therefore gets the results back as part of the table from the SQL server. prefetch_related on the other hand executes another query and therefore reduces the redundant columns in the original object (ModelA in the above example). You may use prefetch_related for anything that you can use select_related for.

The tradeoffs are that prefetch_related has to create and send a list of IDs to select back to the server, this can take a while. I’m not sure if there’s a nice way of doing this in a transaction, but my understanding is that Django always just sends a list and says SELECT … WHERE pk IN (…,…,…) basically. In this case if the prefetched data is sparse (let’s say U.S. State objects linked to people’s addresses) this can be very good, however if it’s closer to one-to-one, this can waste a lot of communications. If in doubt, try both and see which performs better.

Everything discussed above is basically about the communications with the database. On the Python side however prefetch_related has the extra benefit that a single object is used to represent each object in the database. With select_related duplicate objects will be created in Python for each “parent” object. Since objects in Python have a decent bit of memory overhead this can also be a consideration.


回答 1

两种方法可以达到相同的目的,从而放弃不必要的数据库查询。但是他们使用不同的方法来提高效率。

使用这两种方法的唯一原因是,当单个大型查询优于许多小型查询时。Django使用大型查询来抢先在内存中创建模型,而不是针对数据库执行按需查询。

select_related对每个查找执行联接,但将选择范围扩展为包括所有联接表的列。但是,这种方法有一个警告。

联接有可能使查询中的行数相乘。当您通过外键或一对一字段执行联接时,行数不会增加。但是,多对多联接没有此保证。因此,Django限制select_related了不会意外导致大规模联接的关系。

对于“ join in python”来说prefetch_related,应该比它还要令人震惊。它为要连接的每个表创建一个单独的查询。它使用WHERE IN子句过滤每个表,例如:

SELECT "credential"."id",
       "credential"."uuid",
       "credential"."identity_id"
FROM   "credential"
WHERE  "credential"."identity_id" IN
    (84706, 48746, 871441, 84713, 76492, 84621, 51472);

每个表都被拆分成一个单独的查询,而不是执行可能包含太多行的单个联接。

Both methods achieve the same purpose, to forego unnecessary db queries. But they use different approaches for efficiency.

The only reason to use either of these methods is when a single large query is preferable to many small queries. Django uses the large query to create models in memory preemptively rather than performing on demand queries against the database.

select_related performs a join with each lookup, but extends the select to include the columns of all joined tables. However this approach has a caveat.

Joins have the potential to multiply the number of rows in a query. When you perform a join over a foreign key or one-to-one field, the number of rows won’t increase. However, many-to-many joins do not have this guarantee. So, Django restricts select_related to relations that won’t unexpectedly result in a massive join.

The “join in python” for prefetch_related is a little more alarming then it should be. It creates a separate query for each table to be joined. It filters each of these table with a WHERE IN clause, like:

SELECT "credential"."id",
       "credential"."uuid",
       "credential"."identity_id"
FROM   "credential"
WHERE  "credential"."identity_id" IN
    (84706, 48746, 871441, 84713, 76492, 84621, 51472);

Rather than performing a single join with potentially too many rows, each table is split into a separate query.


回答 2

如Django文档所述:

prefetch_related()

返回一个QuerySet,该查询集将自动为每个指定的查询分批检索相关对象。

这与select_related具有相似的目的,因为两者均旨在阻止由于访问相关对象而导致的数据库查询泛滥,但是策略却大不相同。

select_related通过创建SQL连接并将相关对象的字段包括在SELECT语句中来工作。因此,select_related在同一数据库查询中获取相关对象。但是,为了避免跨“许多”关系进行联接会产生更大的结果集,select_related仅限于单值关系-外键和一对一关系。

另一方面,prefetch_related对每个关系进行单独的查找,并在Python中进行“联接”。除了select_related支持的外键和一对一关系之外,这还允许它预取多对多和多对一对象,这不能使用select_related完成。它还支持GenericRelation和GenericForeignKey的预取,但是,必须将其限制为同类结果。例如,仅当查询仅限于一个ContentType时,才支持预取GenericForeignKey引用的对象。

有关此的更多信息:https : //docs.djangoproject.com/en/2.2/ref/models/querysets/#prefetch-related

As Django documentation says:

prefetch_related()

Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.

This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different.

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships – foreign key and one-to-one.

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.

More information about this: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#prefetch-related


回答 3

仔细阅读已经发布的答案。只是认为如果我添加一个带有实际示例的答案会更好。

假设您有3个相关的Django模型。

class M1(models.Model):
    name = models.CharField(max_length=10)

class M2(models.Model):
    name = models.CharField(max_length=10)
    select_relation = models.ForeignKey(M1, on_delete=models.CASCADE)
    prefetch_relation = models.ManyToManyField(to='M3')

class M3(models.Model):
    name = models.CharField(max_length=10)

在这里,您可以使用字段和使用字段的对象查询M2模型及其相关M1对象。select_relationM3prefetch_relation

但是正如我们所提到M1的关系由M2ForeignKey,它只返回只有1对任何记录M2对象。同样的事情也适用OneToOneField

但是M3与的关系来自M2ManyToManyField它可能返回任意数量的M1对象。

考虑这样一种情况:您有2个M2对象m21m22这些对象具有相同的5个M3具有ID的关联对象1,2,3,4,5。当您M3为每个对象获取关联的M2对象时,如果使用select related,则它将如何工作。

脚步:

  1. 查找m21对象。
  2. 查询M3m21ID为的对象相关的所有对象1,2,3,4,5
  3. m22对象和所有其他M2对象重复相同的操作。

因为我们有相同1,2,3,4,5的ID两个m21m22对象,如果我们使用select_related选项,它会查询数据库两次,这已经获取相同的ID。

相反,如果您使用prefetch_related,则当您尝试获取M2对象时,它将在查询M2表时记下对象返回的所有ID(注意:仅这些ID),并且作为最后一步,Django将对M3表进行查询以及您的M2对象已返回的所有ID的集合。并M2使用Python而不是数据库将它们连接到对象。

这样,您M3只查询一次所有对象,从而提高了性能。

Gone through the already posted answers. Just thought it would be better if I add an answer with actual example.

Let’ say you have 3 Django models which are related.

class M1(models.Model):
    name = models.CharField(max_length=10)

class M2(models.Model):
    name = models.CharField(max_length=10)
    select_relation = models.ForeignKey(M1, on_delete=models.CASCADE)
    prefetch_relation = models.ManyToManyField(to='M3')

class M3(models.Model):
    name = models.CharField(max_length=10)

Here you can query M2 model and its relative M1 objects using select_relation field and M3 objects using prefetch_relation field.

However as we’ve mentioned M1‘s relation from M2 is a ForeignKey, it just returns only 1 record for any M2 object. Same thing applies for OneToOneField as well.

But M3‘s relation from M2 is a ManyToManyField which might return any number of M1 objects.

Consider a case where you have 2 M2 objects m21, m22 who have same 5 associated M3 objects with IDs 1,2,3,4,5. When you fetch associated M3 objects for each of those M2 objects, if you use select related, this is how it’s going to work.

Steps:

  1. Find m21 object.
  2. Query all the M3 objects related to m21 object whose IDs are 1,2,3,4,5.
  3. Repeat same thing for m22 object and all other M2 objects.

As we have same 1,2,3,4,5 IDs for both m21, m22 objects, if we use select_related option, it’s going to query the DB twice for the same IDs which were already fetched.

Instead if you use prefetch_related, when you try to get M2 objects, it will make a note of all the IDs that your objects returned (Note: only the IDs) while querying M2 table and as last step, Django is going to make a query to M3 table with the set of all IDs that your M2 objects have returned. and join them to M2 objects using Python instead of database.

This way you’re querying all the M3 objects only once which improves performance.