标签归档:django-aggregation

如何在Django中过滤用于计数注释的对象?

问题:如何在Django中过滤用于计数注释的对象?

考虑简单的Django模型EventParticipant

class Event(models.Model):
    title = models.CharField(max_length=100)

class Participant(models.Model):
    event = models.ForeignKey(Event, db_index=True)
    is_paid = models.BooleanField(default=False, db_index=True)

使用参与者总数来注释事件查询很容易:

events = Event.objects.all().annotate(participants=models.Count('participant'))

如何用筛选的参与者计数进行注释is_paid=True

我需要查询所有事件,而与参与者人数无关,例如,我不需要按带注释的结果进行过滤。如果有0参与者,那没关系,我只需要带有0注释的值即可。

文档中的示例在这里不起作用,因为它从查询中排除了对象,而不是使用注释了对象0

更新。Django 1.8具有新的条件表达式功能,因此我们现在可以这样做:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0,
        output_field=models.IntegerField()
    )))

更新 2。Django 2.0具有新的条件聚合功能,请参阅下面的可接受答案

Consider simple Django models Event and Participant:

class Event(models.Model):
    title = models.CharField(max_length=100)

class Participant(models.Model):
    event = models.ForeignKey(Event, db_index=True)
    is_paid = models.BooleanField(default=False, db_index=True)

It’s easy to annotate events query with total number of participants:

events = Event.objects.all().annotate(participants=models.Count('participant'))

How to annotate with count of participants filtered by is_paid=True?

I need to query all events regardless of number of participants, e.g. I don’t need to filter by annotated result. If there are 0 participants, that’s ok, I just need 0 in annotated value.

The example from documentation doesn’t work here, because it excludes objects from query instead of annotating them with 0.

Update. Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0,
        output_field=models.IntegerField()
    )))

Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below.


回答 0

Django 2.0中的条件聚合可让您进一步减少过去的流量。这也将使用Postgres的filter逻辑,该逻辑比求和的情况要快一些(我见过像20-30%这样的数字被打乱)。

无论如何,就您的情况而言,我们正在研究以下简单内容:

from django.db.models import Q, Count
events = Event.objects.annotate(
    paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)

在文档中有一个单独的部分,关于对注释进行过滤。它和条件聚合是一样的东西,但是更像上面的例子。无论哪种方式,这都比我以前做的粗糙子查询要健康得多。

Conditional aggregation in Django 2.0 allows you to further reduce the amount of faff this has been in the past. This will also use Postgres’ filter logic, which is somewhat faster than a sum-case (I’ve seen numbers like 20-30% bandied around).

Anyway, in your case, we’re looking at something as simple as:

from django.db.models import Q, Count
events = Event.objects.annotate(
    paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)

There’s a separate section in the docs about filtering on annotations. It’s the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.


回答 1

刚刚发现Django 1.8具有新的条件表达式功能,因此现在我们可以这样做:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0, output_field=models.IntegerField()
    )))

Just discovered that Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0, output_field=models.IntegerField()
    )))

回答 2

更新

Django 1.11现在通过subquery-expressions支持了我提到的子查询方法。

Event.objects.annotate(
    num_paid_participants=Subquery(
        Participant.objects.filter(
            is_paid=True,
            event=OuterRef('pk')
        ).values('event')
        .annotate(cnt=Count('pk'))
        .values('cnt'),
        output_field=models.IntegerField()
    )
)

我更喜欢这种方法而不是聚合(sum + case),因为它应该更快,更容易被优化(使用适当的索引)

对于较旧的版本,可以使用 .extra

Event.objects.extra(select={'num_paid_participants': "\
    SELECT COUNT(*) \
    FROM `myapp_participant` \
    WHERE `myapp_participant`.`is_paid` = 1 AND \
            `myapp_participant`.`event_id` = `myapp_event`.`id`"
})

UPDATE

The sub-query approach which I mention is now supported in Django 1.11 via subquery-expressions.

Event.objects.annotate(
    num_paid_participants=Subquery(
        Participant.objects.filter(
            is_paid=True,
            event=OuterRef('pk')
        ).values('event')
        .annotate(cnt=Count('pk'))
        .values('cnt'),
        output_field=models.IntegerField()
    )
)

I prefer this over aggregation (sum+case), because it should be faster and easier to be optimized (with proper indexing).

For older version, the same can be achieved using .extra

Event.objects.extra(select={'num_paid_participants': "\
    SELECT COUNT(*) \
    FROM `myapp_participant` \
    WHERE `myapp_participant`.`is_paid` = 1 AND \
            `myapp_participant`.`event_id` = `myapp_event`.`id`"
})

回答 3

我建议改用.values您的Participantqueryset 方法。

简而言之,您想要做的是:

Participant.objects\
    .filter(is_paid=True)\
    .values('event')\
    .distinct()\
    .annotate(models.Count('id'))

完整的示例如下:

  1. 创建2 Event秒:

    event1 = Event.objects.create(title='event1')
    event2 = Event.objects.create(title='event2')
    
  2. Participants 添加到他们:

    part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0))\
              for _ in range(10)]
    part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0))\
              for _ in range(50)]
    
  3. 将所有Participants按其event字段分组:

    Participant.objects.values('event')
    > <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>
    

    这里需要与众不同:

    Participant.objects.values('event').distinct()
    > <QuerySet [{'event': 1}, {'event': 2}]>
    

    什么.values.distinct正在做的事情是,他们正在创造的两个水桶Participant用元的分组小号event。请注意,这些存储桶包含Participant

  4. 然后,您可以注释这些存储桶,因为它们包含原始集Participant。在这里,我们要计算的数量Participant,只需通过计算id这些存储区中的元素的s即可(因为它们是Participant):

    Participant.objects\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
    
  5. 最后,您只Participant需要一个is_paidbeing True,您可以只在前一个表达式的前面添加一个过滤器,这将产生上面显示的表达式:

    Participant.objects\
        .filter(is_paid=True)\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
    

唯一的缺点是Event您只能id从上面的方法中获取,因此您必须检索之后的内容。

I would suggest to use the .values method of your Participant queryset instead.

For short, what you want to do is given by:

Participant.objects\
    .filter(is_paid=True)\
    .values('event')\
    .distinct()\
    .annotate(models.Count('id'))

A complete example is as follow:

  1. Create 2 Events:

    event1 = Event.objects.create(title='event1')
    event2 = Event.objects.create(title='event2')
    
  2. Add Participants to them:

    part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0))\
              for _ in range(10)]
    part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0))\
              for _ in range(50)]
    
  3. Group all Participants by their event field:

    Participant.objects.values('event')
    > <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>
    

    Here distinct is needed:

    Participant.objects.values('event').distinct()
    > <QuerySet [{'event': 1}, {'event': 2}]>
    

    What .values and .distinct are doing here is that they are creating two buckets of Participants grouped by their element event. Note that those buckets contain Participant.

  4. You can then annotate those buckets as they contain the set of original Participant. Here we want to count the number of Participant, this is simply done by counting the ids of the elements in those buckets (since those are Participant):

    Participant.objects\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
    
  5. Finally you want only Participant with a is_paid being True, you may just add a filter in front of the previous expression, and this yield the expression shown above:

    Participant.objects\
        .filter(is_paid=True)\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
    

The only drawback is that you have to retrieve the Event afterwards as you only have the id from the method above.


回答 4

我正在寻找什么结果:

  • 将任务添加到报告中的人员(受让人)。-唯一身份人员总数
  • 将任务添加到报告中但仅针对计费性大于0的任务的人员。

通常,我将不得不使用两个不同的查询:

Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()

但我想在一个查询中两者。因此:

Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))

结果:

<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>

What result I am looking for:

  • People (assignee) who have tasks added to a report. – Total Unique count of People
  • People who have tasks added to a report but, for task whose billability is more than 0 only.

In general, I would have to use two different queries:

Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()

But I want both in one query. Hence:

Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))

Result:

<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>