How to pull a random record using Django's ORM?

Question: How to pull a random record using Django's ORM?

I have a model that represents paintings I present on my site. On the main webpage I'd like to show some of them: the newest, the one that has gone unvisited the longest, the most popular one, and a random one.

I’m using Django 1.0.2.

While the first three are easy to pull using Django models, the last one (random) causes me some trouble. I can of course code it in my view, something like this:

import random

number_of_records = models.Painting.objects.count()
random_index = int(random.random()*number_of_records)+1
random_paint = models.Painting.objects.get(pk = random_index)

It doesn't look like something I'd like to have in my view, though: this is entirely part of the database abstraction and should be in the model. Also, here I need to take care of removed records (the record count then won't cover all possible key values) and probably lots of other things.

Any other options for how I can do it, preferably somehow inside the model abstraction?


Answer 0

Using order_by('?') will kill the DB server on the second day in production. A better way is something like what is described in "Getting a random row from a relational database".

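from random import randint

from django.db import models
from django.db.models import Count

class PaintingManager(models.Manager):
    def random(self):
        # One COUNT query, then one OFFSET ... LIMIT 1 query.
        count = self.aggregate(count=Count('id'))['count']
        if count == 0:
            return None  # empty table: randint(0, -1) would raise ValueError
        random_index = randint(0, count - 1)
        return self.all()[random_index]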

Answer 1

Simply use:

MyModel.objects.order_by('?').first()

It is documented in the QuerySet API.


Answer 2

The solutions with order_by('?')[:N] are extremely slow even for medium-sized tables if you use MySQL (I don't know about other databases).

order_by('?')[:N] will be translated to a SELECT ... FROM ... WHERE ... ORDER BY RAND() LIMIT N query.

This means that the RAND() function is executed for every row in the table, then the whole table is sorted by the value of this function, and only then are the first N records returned. If your tables are small, this is fine. But in most cases this is a very slow query.

I wrote a simple function that works even if the ids have holes (some rows were deleted):

import math
import random

from django.db.models import Max

def get_random_item(model, max_id=None):
    if max_id is None:
        max_id = model.objects.aggregate(max_id=Max('id'))['max_id']
    min_id = math.ceil(max_id*random.random())
    # First existing row at or above the random threshold.
    return model.objects.filter(id__gte=min_id).order_by('id')[0]

It is faster than order_by('?') in almost all cases.
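
One caveat that may be worth checking before relying on this: taking the first id at or above a random threshold is not uniform when the ids have holes, because a row that follows a gap also "collects" all the thresholds that fall inside the gap. Below is a minimal pure-Python sketch of the same selection rule, with a sorted list standing in for the table. The names pick_id and ids are made up for illustration and are not part of the answer above.

```python
import bisect
import math
import random
from collections import Counter

def pick_id(sorted_ids, rng=random):
    """Mimic 'first row with id >= random threshold' on a sorted id list."""
    max_id = sorted_ids[-1]
    threshold = math.ceil(max_id * rng.random())
    # bisect_left returns the position of the first id >= threshold.
    return sorted_ids[bisect.bisect_left(sorted_ids, threshold)]

ids = [1, 2, 3, 10]        # hypothetical table where ids 4..9 were deleted
rng = random.Random(0)     # seeded only to make the run repeatable
counts = Counter(pick_id(ids, rng) for _ in range(10_000))
```

In this toy table, id 10 absorbs every threshold from 4 to 10, so it comes up far more often than ids 1 to 3. Whether that skew matters depends on how many rows are deleted and how random the result really needs to be.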


Answer 3

Here’s a simple solution:

from random import randint

count = Model.objects.count()
random_object = Model.objects.all()[randint(0, count - 1)] #single random object

Answer 4

You could create a manager on your model to do this sort of thing. To understand what a manager is: Painting.objects is a manager that provides all(), filter(), get(), etc. Creating your own manager allows you to pre-filter results and have all these same methods, as well as your own custom methods, work on the results.

EDIT: I modified my code to reflect the order_by('?') method. Note that the manager returns an unlimited number of random models. Because of this I've included a bit of usage code to show how to get just a single model.

from django.db import models

class RandomManager(models.Manager):
    # Named get_queryset() in Django 1.6 and later.
    def get_query_set(self):
        return super(RandomManager, self).get_query_set().order_by('?')

class Painting(models.Model):
    title = models.CharField(max_length=100)
    author = models.CharField(max_length=50)

    objects = models.Manager() # The default manager.
    randoms = RandomManager() # The random-specific manager.

Usage

random_painting = Painting.randoms.all()[0]

Lastly, you can have many managers on your models, so feel free to create a LeastViewsManager() or MostPopularManager().


Answer 5

The other answers are either potentially slow (using order_by('?')) or use more than one SQL query. Here’s a sample solution with no ordering and just one query (assuming Postgres):

random_instance = Model.objects.raw('''
    select * from {0} limit 1
    offset floor(random() * (select count(*) from {0}))
'''.format(Model._meta.db_table))[0]

Be aware that this will raise an index error if the table is empty. Write yourself a model-agnostic helper function to check for that.


Answer 6

Just a simple idea of how I do it:

from random import randint

def _get_random_service(self, professional):
    services = Service.objects.filter(professional=professional)
    i = randint(0, services.count() - 1)
    return services[i]

Answer 7

Hi, I needed to select a random record from a queryset whose length I also needed to report (i.e. the web page showed the described item and the number of records left).

q = Entity.objects.filter(attribute_value='this or that')
item_count = q.count()
random_item = q[random.randint(0, item_count - 1)]

took half as long (0.7s vs 1.7s) as:

item_count = q.count()
random_item = random.choice(q)

I'm guessing it avoids pulling down the whole queryset before selecting the random entry, and it made my system responsive enough for a page that is accessed repeatedly for a repetitive task where users want to see the item_count count down.
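
The index bounds are easy to get wrong with this pattern: random.randint(a, b) includes both endpoints, so the valid zero-based indices for n items are randint(0, n - 1), or equivalently random.randrange(n). A small check, assuming a plain list as a stand-in for the queryset:

```python
import random

items = ["a", "b", "c", "d"]   # stand-in for the rows of a queryset
n = len(items)
rng = random.Random(42)        # seeded only to make the run repeatable

# randint is inclusive on both ends; randrange excludes the stop value.
indices_randint = {rng.randint(0, n - 1) for _ in range(1000)}
indices_randrange = {rng.randrange(n) for _ in range(1000)}

random_item = items[rng.randrange(n)]
```

Both calls only ever produce indices 0 to n - 1, so neither can index past the end of the sequence.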


Answer 8

Randomization in the DB feels nasty; it is better done in Python. But at the same time, it's not a good idea to bring all the data from the DB into Python memory just to ignore most of the results (especially in a production environment). We might also need some sort of filtering.

  1. So basically we have the data in the DB,
  2. we want to use Python's random functions,
  3. and afterwards fetch only the required data from the DB.

Basically, using 2 queries will be much less expensive than picking at random on the DB's CPU (computing in the DB) or loading the whole data set (heavy network utilization). A solution has to scale; guessing random ids won't work in a production environment, especially with filters, soft/hard deletes, or even an is_public flag, because the random id we generated might have been deleted from the database or be excluded by the filters. It's bad practice to assume max_id(records) == count(records).

(Of course, if you don't delete a significant share of your data, you don't want to use any kind of filter, and you are confident, you can go ahead with a random id.)

If you want only one item (refer to @Valter Silva):

import random

mgr = models.Painting.objects
qs = mgr.filter(...)
random_index = random.randint(1, qs.count()) - 1  # <--- [ First Query Hit ]

random_paint = qs[random_index]  # <--- [ Second Query Hit ]

If you want 'n' items:

import random

req_no_of_random_items = 8  # I need 8 random items.
qs = models.Painting.objects.filter(...)

# If you use random values often, you can keep this list in a cache.
possible_ids = list(qs.values_list('id', flat=True))  # <--- [ First Query Hit ]

# random.sample picks distinct ids; random.choices could repeat an id.
possible_ids = random.sample(possible_ids, min(req_no_of_random_items, len(possible_ids)))
random_paint = qs.filter(pk__in=possible_ids)  # generic case: get 'n' items

Or, if you want more optimized code for production, use a cache function to get the ids of the products:

from django.core.cache import cache

def id_set_cache(qs):
    key = "some_random_key_for_cache"
    id_set = cache.get(key)
    if id_set is None:
        id_set = list(qs.values_list('id', flat=True))
        cache.set(key, id_set)
    return id_set
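
When drawing several ids from a list, it may help to keep in mind that random.choices samples with replacement (the same id can come back more than once, so a pk__in filter may match fewer than n rows), while random.sample returns distinct elements. A small sketch with a made-up id list (the values are hypothetical, not from the answer):

```python
import random

possible_ids = [3, 7, 8, 15, 21, 40]   # hypothetical ids from values_list
rng = random.Random(1)                  # seeded only to make the run repeatable

with_replacement = rng.choices(possible_ids, k=5)     # may contain repeats
without_replacement = rng.sample(possible_ids, k=5)   # always 5 distinct ids

# Guard against asking for more items than exist; sample would raise ValueError.
n = 10
safe_sample = rng.sample(possible_ids, k=min(n, len(possible_ids)))
```

The min() guard is one possible way to handle a filter that matches fewer rows than requested; returning fewer items silently may or may not be what a given page wants.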

Answer 9

Just to note a (fairly common) special case: if the table has an indexed auto-increment column and no deletes, the optimal way to do a random select is a query like:

SELECT * FROM table WHERE id = RAND() LIMIT 1

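This assumes the table has such a column named id. Note that MySQL's RAND() returns a float in [0, 1), so the value has to be scaled to the id range before the comparison can match a row. In Django you can run a raw query like this with:

Painting.objects.raw('SELECT * FROM appname_painting WHERE id = RAND() LIMIT 1')

where you must replace appname with your application name.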

In general, with an id column, the order_by('?') can be done much faster with:

Painting.objects.raw(
        'SELECT * FROM appname_painting WHERE id>=RAND() * (SELECT MAX(id) FROM appname_painting) LIMIT %d'
    % needed_count)

Answer 10

This is highly recommended: "Getting a random row from a relational database".

Using the Django ORM to do something like this will make your DB server angry, especially if you have a big data table :|

The solution is to provide a model manager and write the SQL query by hand ;)

Update:

Another solution, which works on any database backend (even non-relational ones) without writing a custom ModelManager: "Getting Random objects from a Queryset in Django".


Answer 11

You may want to use the same approach that you'd use to sample any iterator, especially if you plan to sample multiple items to create a sample set. @MartijnPieters and @DzinX put a lot of thought into this:

import random

def random_sampling(qs, N=1):
    """Sample any iterable (like a Django QuerySet) to retrieve N random elements

    Arguments:
      qs (iterable): Any iterable (like a Django QuerySet)
      N (int): Number of samples to retrieve at random from the iterable

    References:
      @DzinX:  https://stackoverflow.com/a/12583436/623735
      @MartijnPieters: https://stackoverflow.com/a/12581484/623735
    """
    samples = []
    iterator = iter(qs)
    # Get the first `N` elements and put them in your results list to preallocate memory
    try:
        for _ in range(N):
            samples.append(next(iterator))
    except StopIteration:
        raise ValueError("N, the number of requested samples, is larger than the length of the iterable.")
    random.shuffle(samples)  # Randomize your list of N objects
    # Now replace each element by a truly random sample.
    # Continue from the same iterator so the first N items are not visited twice.
    for i, v in enumerate(iterator, N):
        r = random.randint(0, i)
        if r < N:
            samples[r] = v  # at a decreasing rate, replace random items
    return samples

Answer 12

One much easier approach is to filter down to the record set of interest and use random.sample to select as many as you want:

from myapp.models import MyModel
import random

my_queryset = MyModel.objects.filter(criteria=True)  # Returns a QuerySet
candidates = list(my_queryset)  # random.sample needs a sequence; this loads every matching row
my_object = random.sample(candidates, 1)[0]  # get a single random element from my_queryset
my_objects = random.sample(candidates, 5)  # get five random elements from my_queryset

Note that you should have some code in place to verify that my_queryset contains enough elements; random.sample raises a ValueError (sample larger than population) if the first argument contains too few elements.


Answer 13

Method for auto-incrementing primary key with no deletes

If you have a table where the primary key is a sequential integer with no gaps, then the following method should work:

import random
max_id = MyModel.objects.last().id
random_id = random.randint(1, max_id)  # auto-increment ids normally start at 1
random_obj = MyModel.objects.get(pk=random_id)

This method is much more efficient than the other methods here that iterate through all rows of the table. While it does require two database queries, both are trivial. Furthermore, it's simple and doesn't require defining any extra classes. However, its applicability is limited to tables with an auto-incrementing primary key where rows have never been deleted, so that there are no gaps in the sequence of ids.

In the case where rows have been deleted so that there are gaps, this method can still work if it is retried until an existing primary key is randomly selected.
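
The retry idea can be sketched without a database. In the sketch below, lookup is a hypothetical callable that returns the row for an id, or None when that id was deleted; in Django it could wrap something like MyModel.objects.filter(pk=candidate).first(). The function name, the max_attempts cap and the dict used as a table are all assumptions for illustration.

```python
import random

def random_existing(lookup, max_id, rng=random, max_attempts=100):
    """Draw ids in 1..max_id until lookup(id) finds an existing row."""
    for _ in range(max_attempts):
        candidate = rng.randint(1, max_id)
        row = lookup(candidate)
        if row is not None:
            return row
    # Give up instead of looping forever on a very sparse (or empty) table.
    raise LookupError("no existing id found within max_attempts draws")

# Stand-in table: ids 2 and 4 were deleted.
table = {1: "one", 3: "three", 5: "five"}
row = random_existing(table.get, max_id=5, rng=random.Random(7))
```

Because every draw is uniform over 1..max_id and misses are simply discarded, each surviving row is equally likely to be returned. The cost is one extra query per miss, so this only pays off while the share of deleted ids stays small.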


Answer 14

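I have a very simple solution: make a custom manager:

import random

from django.db import models

class RandomManager(models.Manager):
    def random(self):
        # Note: random.choice evaluates the whole queryset before picking.
        return random.choice(self.all())

and then add it to the model: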

class Example(models.Model):
    name = models.CharField(max_length=128)
    objects = RandomManager()

Now, you can use it:

Example.objects.random()