Tag archive: django-orm

How to create an object for a Django model with a many-to-many field?

Question: How to create an object for a Django model with a many-to-many field?

My model:

class Sample(models.Model):
    users = models.ManyToManyField(User)

I want to save both user1 and user2 in that model:

user1 = User.objects.get(pk=1)
user2 = User.objects.get(pk=2)
sample_object = Sample(users=user1, users=user2)
sample_object.save()

I know that’s wrong, but I’m sure you get what I want to do. How would you do it?


Answer 0

You cannot create m2m relations from unsaved objects. If you have the pks, try this:

sample_object = Sample()
sample_object.save()
sample_object.users.add(1,2)

Update: After reading saverio’s answer, I decided to investigate the issue in more depth. Here are my findings.

This was my original suggestion. It works, but isn’t optimal. (Note: I’m using Bars and a Foo instead of Users and a Sample, but you get the idea).

bar1 = Bar.objects.get(pk=1)
bar2 = Bar.objects.get(pk=2)
foo = Foo()
foo.save()
foo.bars.add(bar1)
foo.bars.add(bar2)

It generates a whopping total of 7 queries:

SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 1
SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 2
INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

I’m sure we can do better. You can pass multiple objects to the add() method:

bar1 = Bar.objects.get(pk=1)
bar2 = Bar.objects.get(pk=2)
foo = Foo()
foo.save()
foo.bars.add(bar1, bar2)

As we can see, passing multiple objects saves one SELECT:

SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 1
SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 2
INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

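I wasn’t aware that you can also assign a list of objects (direct assignment to a many-to-many relation was removed in Django 2.0; on newer versions, foo.bars.set([bar1, bar2]) is the replacement):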

bar1 = Bar.objects.get(pk=1)
bar2 = Bar.objects.get(pk=2)
foo = Foo()
foo.save()
foo.bars = [bar1, bar2]

Unfortunately, that creates one additional SELECT:

SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 1
SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 2
INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."id", "app_foo_bars"."foo_id", "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE "app_foo_bars"."foo_id" = 1
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

Let’s try to assign a list of pks, as saverio suggested:

foo = Foo()
foo.save()
foo.bars = [1,2]

As we don’t fetch the two Bars, we save two SELECT statements, resulting in a total of 5:

INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."id", "app_foo_bars"."foo_id", "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE "app_foo_bars"."foo_id" = 1
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

And the winner is:

foo = Foo()
foo.save()
foo.bars.add(1,2)

Passing pks to add() gives us a total of 4 queries:

INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)
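The query logs above suggest the pattern behind add(): one SELECT to find the links that already exist, then INSERTs for the missing ones only. Here is a minimal sketch of that pattern in plain SQL, using Python's sqlite3 module rather than Django. The table name mirrors the log; add_links is a hypothetical helper written for illustration, not Django's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_foo_bars ("
    "id INTEGER PRIMARY KEY, foo_id INTEGER, bar_id INTEGER, "
    "UNIQUE (foo_id, bar_id))"
)


def add_links(conn, foo_id, bar_ids):
    """Hypothetical helper: link foo_id to each bar id not linked yet."""
    placeholders = ", ".join("?" for _ in bar_ids)
    # One SELECT finds the links that already exist ...
    rows = conn.execute(
        "SELECT bar_id FROM app_foo_bars "
        f"WHERE foo_id = ? AND bar_id IN ({placeholders})",
        [foo_id, *bar_ids],
    ).fetchall()
    existing = {bar_id for (bar_id,) in rows}
    missing = [b for b in bar_ids if b not in existing]
    # ... and only the missing ones are inserted.
    conn.executemany(
        "INSERT INTO app_foo_bars (foo_id, bar_id) VALUES (?, ?)",
        [(foo_id, b) for b in missing],
    )
    return missing


first = add_links(conn, 1, [1, 2])   # nothing linked yet
second = add_links(conn, 1, [2, 3])  # bar 2 is already linked
```

Passing all ids in one call keeps the existence check to a single SELECT, which matches the saving seen in the logs when add() receives several values at once.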

Answer 1

For future visitors: you can create an object and all of its m2m objects in 2 queries using bulk_create, which was new in Django 1.4. Note that this is only usable if you don’t require any pre- or post-processing on the data with save() methods or signals. What you insert is exactly what will be in the DB.

You can do this without specifying a “through” model on the field. For completeness, the example below creates a blank Users model to mimic what the original poster was asking.

from django.db import models

class Users(models.Model):
    pass

class Sample(models.Model):
    users = models.ManyToManyField(Users)

Now, in a shell or other code, create 2 users, create a sample object, and bulk add the users to that sample object.

Users().save()
Users().save()

# Access the through model directly
ThroughModel = Sample.users.through

users = Users.objects.filter(pk__in=[1,2])

sample_object = Sample()
sample_object.save()

ThroughModel.objects.bulk_create([
    ThroughModel(users_id=users[0].pk, sample_id=sample_object.pk),
    ThroughModel(users_id=users[1].pk, sample_id=sample_object.pk)
])
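As a rough illustration of why this saves queries: a bulk insert sends all join rows in one INSERT statement instead of one statement per row. The sketch below shows that idea in plain SQL with Python's sqlite3 module. The table and column names (app_sample_users, sample_id, users_id) are assumptions modeled on Django's default naming, not taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_sample_users ("
    "id INTEGER PRIMARY KEY, sample_id INTEGER, users_id INTEGER)"
)

sample_id = 1
user_ids = [1, 2]

# Build a single multi-row INSERT: VALUES (?, ?), (?, ?)
values_sql = ", ".join("(?, ?)" for _ in user_ids)
params = [value for uid in user_ids for value in (sample_id, uid)]
conn.execute(
    f"INSERT INTO app_sample_users (sample_id, users_id) VALUES {values_sql}",
    params,
)

links = conn.execute(
    "SELECT sample_id, users_id FROM app_sample_users ORDER BY users_id"
).fetchall()
```

Because the rows go straight into the join table, nothing checks for existing links first, which is why this approach only suits cases where you know the links are new.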

Answer 2

Django 1.9
A quick example:

sample_object = Sample()
sample_object.save()

list_of_users = DestinationRate.objects.all()
sample_object.users.set(list_of_users)

Answer 3

RelatedObjectManagers are different “attributes” from the fields in a Model. The simplest way to achieve what you are looking for is:

sample_object = Sample.objects.create()
sample_object.users = [1, 2]

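That’s the same as assigning a User list, but without the additional queries and the model building. (Direct assignment like this was removed in Django 2.0; there, sample_object.users.set([1, 2]) does the same job.)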

If the number of queries is what bothers you (instead of simplicity), then the optimal solution requires three queries:

sample_object = Sample.objects.create()
sample_id = sample_object.id
sample_object.users.through.objects.create(user_id=1, sample_id=sample_id)
sample_object.users.through.objects.create(user_id=2, sample_id=sample_id)

This works because we already know that the ‘users’ list is empty, so we can create the rows without checking for existing ones.


Answer 4

You could replace the set of related objects in this way (new in Django 1.9):

new_list = [user1, user2, user3]
sample_object.related_set.set(new_list)

Answer 5

If someone is looking to apply David Marbles’ answer to a self-referencing ManyToMany field: the ID fields of the through model are named “from_<model_name>_id” and “to_<model_name>_id”.

If that doesn’t work, you can check the Django connection.


Django self-referential foreign key

Question: Django self-referential foreign key

I’m kind of new to webapps and database stuff in general so this might be a dumb question. I want to make a model (“CategoryModel”) with a field that points to the primary id of another instance of the model (its parent).

class CategoryModel(models.Model):
    parent = models.ForeignKey(CategoryModel)

How do I do this? Thanks!


Answer 0

You can pass in the name of a model as a string to ForeignKey and it will do the right thing.

So:

parent = models.ForeignKey("CategoryModel")

Or you can use the string “self”

parent = models.ForeignKey("self")

Answer 1

You can use the string ‘self’ to indicate a self-reference.

class CategoryModel(models.Model):
    parent = models.ForeignKey('self')

https://docs.djangoproject.com/en/dev/ref/models/fields/#foreignkey


Answer 2

https://books.agiliq.com/projects/django-orm-cookbook/en/latest/self_fk.html

class Employee(models.Model):
    manager = models.ForeignKey('self', on_delete=models.CASCADE)

OR

class Employee(models.Model):
    manager = models.ForeignKey("app.Employee", on_delete=models.CASCADE)

https://stackabuse.com/recursive-model-relationships-in-django/


Answer 3

You also need to set null=True and blank=True:

class CategoryModel(models.Model):
    parent = models.ForeignKey("self", on_delete=models.CASCADE, null=True, blank=True)

null=True allows NULL in the database.
blank=True allows an empty value in form validation.
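To see why the nullable parent matters, here is a minimal sketch of a self-referential table in plain SQL via Python's sqlite3 module (not Django). The root row has no parent, so its parent_id has to be NULL. The category table, its rows, and the ancestors helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES category (id))"
)
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    # The root has no parent, which is only possible with a nullable column.
    [(1, "root", None), (2, "child", 1), (3, "grandchild", 2)],
)


def ancestors(conn, category_id):
    """Return the names from the given category up to the root."""
    names = []
    while category_id is not None:
        name, category_id = conn.execute(
            "SELECT name, parent_id FROM category WHERE id = ?",
            (category_id,),
        ).fetchone()
        names.append(name)
    return names


path = ancestors(conn, 3)
```

Without a nullable parent column there would be no way to store the top of the tree, short of pointing a row at itself.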


How to perform an OR condition in a Django queryset?

Question: How to perform an OR condition in a Django queryset?

I want to write a Django query equivalent to this SQL query:

SELECT * from user where income >= 5000 or income is NULL.

How to construct the Django queryset filter?

User.objects.filter(income__gte=5000, income=0)

This doesn’t work, because it ANDs the filters. I want to OR the filters to get the union of the individual querysets.


Answer 0

from django.db.models import Q
User.objects.filter(Q(income__gte=5000) | Q(income__isnull=True))

via Documentation
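For comparison, here is a small plain-SQL sketch (Python's sqlite3 module, not Django) of the WHERE clauses involved. It contrasts the OR condition the question asks for with the AND that filter(income__gte=5000, income=0) produces, and shows why NULL has to be tested with IS NULL. The sample rows and the ids helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, income INTEGER)")
conn.executemany(
    "INSERT INTO user (id, income) VALUES (?, ?)",
    [(1, 6000), (2, 3000), (3, None), (4, 0)],
)


def ids(where):
    """Return the ids matching a WHERE clause (trusted input only)."""
    sql = f"SELECT id FROM user WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# What the Q expression is meant to produce.
or_ids = ids("income >= 5000 OR income IS NULL")
# Two keyword arguments in one filter() call are ANDed instead.
and_ids = ids("income >= 5000 AND income = 0")
# Comparing with "= NULL" never matches; IS NULL is required.
eq_null_ids = ids("income >= 5000 OR income = NULL")
```

This is why the answer uses income__isnull=True rather than comparing income against a value.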


Answer 1

Because QuerySets implement the Python __or__ operator (|), or union, it just works. As you’d expect, the | binary operator returns a QuerySet so order_by(), .distinct(), and other queryset filters can be tacked on to the end.

combined_queryset = User.objects.filter(income__gte=5000) | User.objects.filter(income__isnull=True)
ordered_queryset = combined_queryset.order_by('-income')

Update 2019-06-20: This is now fully documented in the Django 2.1 QuerySet API reference. More historic discussion can be found in DjangoProject ticket #21333.


Answer 2

Both options are already mentioned in the existing answers:

from django.db.models import Q
q1 = User.objects.filter(Q(income__gte=5000) | Q(income__isnull=True))

and

q2 = User.objects.filter(income__gte=5000) | User.objects.filter(income__isnull=True)

However, there seems to be some confusion regarding which one to prefer.

The point is that they are identical on the SQL level, so feel free to pick whichever you like!

The Django ORM Cookbook talks about this in some detail; here is the relevant part:


queryset = User.objects.filter(
    first_name__startswith='R'
) | User.objects.filter(
    last_name__startswith='D'
)

leads to

In [5]: str(queryset.query)
Out[5]: 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
"auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
"auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
WHERE ("auth_user"."first_name"::text LIKE R% OR "auth_user"."last_name"::text LIKE D%)'

and

qs = User.objects.filter(Q(first_name__startswith='R') | Q(last_name__startswith='D'))

leads to

In [9]: str(qs.query)
Out[9]: 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
 "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
  "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
  "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
  WHERE ("auth_user"."first_name"::text LIKE R% OR "auth_user"."last_name"::text LIKE D%)'

source: django-orm-cookbook



What is the difference between select_related and prefetch_related in the Django ORM?

Question: What is the difference between select_related and prefetch_related in the Django ORM?

In the Django docs:

select_related() “follows” foreign-key relationships, selecting additional related-object data when it executes its query.

prefetch_related() does a separate lookup for each relationship, and does the “joining” in Python.

What does it mean by “doing the joining in python”? Can someone illustrate with an example?

My understanding is that for foreign key relationship, use select_related; and for M2M relationship, use prefetch_related. Is this correct?


Answer 0

Your understanding is mostly correct. You use select_related when the object that you’re going to be selecting is a single object, so a OneToOneField or a ForeignKey. You use prefetch_related when you’re going to get a “set” of things, so ManyToManyFields as you stated, or reverse ForeignKeys. Just to clarify what I mean by “reverse ForeignKeys”, here’s an example:

from django.db import models

class ModelA(models.Model):
    pass

class ModelB(models.Model):
    a = models.ForeignKey(ModelA, on_delete=models.CASCADE)

ModelB.objects.select_related('a').all() # Forward ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all() # Reverse ForeignKey relationship

The difference is that select_related does an SQL join and therefore gets the results back as part of the table from the SQL server. prefetch_related on the other hand executes another query and therefore reduces the redundant columns in the original object (ModelA in the above example). You may use prefetch_related for anything that you can use select_related for.

The tradeoff is that prefetch_related has to create and send a list of IDs to select back to the server, which can take a while. I’m not sure if there’s a nice way of doing this in a transaction, but my understanding is that Django always just sends a list and says, basically, SELECT … WHERE pk IN (…,…,…). In this case, if the prefetched data is sparse (let’s say U.S. State objects linked to people’s addresses), this can be very good; however, if it’s closer to one-to-one, this can waste a lot of communication. If in doubt, try both and see which performs better.

Everything discussed above is basically about the communication with the database. On the Python side, however, prefetch_related has the extra benefit that a single object is used to represent each object in the database. With select_related, duplicate objects will be created in Python for each “parent” object. Since objects in Python have a decent bit of memory overhead, this can also be a consideration.


Answer 1

Both methods serve the same purpose: avoiding unnecessary database queries. But they use different approaches for efficiency.

The only reason to use either of these methods is when a single large query is preferable to many small queries. Django uses the large query to create models in memory preemptively rather than performing on demand queries against the database.

select_related performs a join with each lookup, but extends the select to include the columns of all joined tables. However this approach has a caveat.

Joins have the potential to multiply the number of rows in a query. When you perform a join over a foreign key or one-to-one field, the number of rows won’t increase. However, many-to-many joins do not have this guarantee. So, Django restricts select_related to relations that won’t unexpectedly result in a massive join.

The “join in Python” for prefetch_related is a little more alarming than it should be. It creates a separate query for each table to be joined. It filters each of these tables with a WHERE IN clause, like:

SELECT "credential"."id",
       "credential"."uuid",
       "credential"."identity_id"
FROM   "credential"
WHERE  "credential"."identity_id" IN
    (84706, 48746, 871441, 84713, 76492, 84621, 51472);

Rather than performing a single join with potentially too many rows, each table is split into a separate query.
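To make the “join in Python” concrete, here is a minimal sketch using Python's sqlite3 module rather than Django. It runs one query for the parent rows, one WHERE ... IN query for all related rows, and then stitches the two result sets together in Python. The credential table follows the SQL above; the identity table and the sample rows are assumptions made up for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE credential (
        id INTEGER PRIMARY KEY, uuid TEXT, identity_id INTEGER
    );
    INSERT INTO identity VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO credential VALUES (10, 'a-1', 1), (11, 'a-2', 1), (12, 'b-1', 2);
    """
)

# Query 1: the parent rows.
identities = conn.execute("SELECT id, name FROM identity ORDER BY id").fetchall()
identity_ids = [identity_id for identity_id, _ in identities]

# Query 2: every related row at once, filtered with WHERE ... IN.
placeholders = ", ".join("?" for _ in identity_ids)
credentials = conn.execute(
    "SELECT uuid, identity_id FROM credential "
    f"WHERE identity_id IN ({placeholders}) ORDER BY id",
    identity_ids,
).fetchall()

# The "join" happens here, in Python: group children by parent id.
by_identity = defaultdict(list)
for uuid, identity_id in credentials:
    by_identity[identity_id].append(uuid)

result = {name: by_identity[identity_id] for identity_id, name in identities}
```

The number of queries stays at two no matter how many parent rows there are; only the length of the IN list grows.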


Answer 2

As the Django documentation says:

prefetch_related()

Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.

This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different.

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships – foreign key and one-to-one.

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.

More information about this: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#prefetch-related


Answer 3

I went through the already posted answers and thought it would help to add an answer with an actual example.

Let’s say you have 3 Django models which are related.

class M1(models.Model):
    name = models.CharField(max_length=10)

class M2(models.Model):
    name = models.CharField(max_length=10)
    select_relation = models.ForeignKey(M1, on_delete=models.CASCADE)
    prefetch_relation = models.ManyToManyField(to='M3')

class M3(models.Model):
    name = models.CharField(max_length=10)

Here you can query the M2 model together with its related M1 objects using the select_relation field, and its M3 objects using the prefetch_relation field.

However, as mentioned, M1’s relation from M2 is a ForeignKey, so it returns only 1 record for any M2 object. The same applies to OneToOneField.

But M3’s relation from M2 is a ManyToManyField, which might return any number of M3 objects.

Consider a case where you have 2 M2 objects, m21 and m22, which share the same 5 associated M3 objects with IDs 1,2,3,4,5. select_related cannot follow a many-to-many field, so if you fetch the associated M3 objects for each of those M2 objects without prefetch_related, this is how it’s going to work.

Steps:

  1. Find the m21 object.
  2. Query all the M3 objects related to the m21 object, whose IDs are 1,2,3,4,5.
  3. Repeat the same thing for the m22 object and all other M2 objects.

As m21 and m22 share the same IDs 1,2,3,4,5, this approach queries the DB twice for the same M3 objects, which were already fetched.

Instead, if you use prefetch_related, Django notes the IDs of the M2 objects returned while querying the M2 table (only the IDs) and, as a last step, makes one query for the M3 rows related to that whole set of IDs. It then joins them to the M2 objects in Python instead of in the database.

This way you’re querying all the M3 objects only once which improves performance.
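The difference in query counts can be sketched in plain SQL with Python's sqlite3 module (not Django). The first pass issues one query per M2 row; the second fetches all related M3 rows with a single IN query and groups them in Python. The table names m2, m3 and m2_m3 and the run helper are assumptions for illustration, and the SQL only roughly approximates what the ORM would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE m2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE m3 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE m2_m3 (m2_id INTEGER, m3_id INTEGER);
    INSERT INTO m2 VALUES (1, 'm21'), (2, 'm22');
    INSERT INTO m3 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO m2_m3 SELECT m2.id, m3.id FROM m2, m3;
    """
)

query_log = []


def run(sql, params=()):
    """Hypothetical helper: execute a query and record that it ran."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# Lazy access: one query for the M2 rows, then one more per M2 object.
lazy = {}
for (m2_id,) in run("SELECT id FROM m2 ORDER BY id"):
    rows = run(
        "SELECT m3.id FROM m3 JOIN m2_m3 ON m3.id = m2_m3.m3_id "
        "WHERE m2_m3.m2_id = ? ORDER BY m3.id",
        (m2_id,),
    )
    lazy[m2_id] = [m3_id for (m3_id,) in rows]
lazy_queries = len(query_log)

query_log.clear()

# Prefetch style: one query for M2, one for all related M3 rows.
m2_ids = [m2_id for (m2_id,) in run("SELECT id FROM m2 ORDER BY id")]
placeholders = ", ".join("?" for _ in m2_ids)
prefetched = {m2_id: [] for m2_id in m2_ids}
for m2_id, m3_id in run(
    "SELECT m2_m3.m2_id, m3.id FROM m3 JOIN m2_m3 ON m3.id = m2_m3.m3_id "
    f"WHERE m2_m3.m2_id IN ({placeholders}) ORDER BY m3.id",
    m2_ids,
):
    prefetched[m2_id].append(m3_id)
prefetch_queries = len(query_log)
```

With two M2 rows the saving is a single query, but the lazy count grows by one for every extra M2 row while the prefetch count stays at two.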