Tag archive: filter

How to filter a DateTimeField by date in Django?

I am trying to filter a DateTimeField by comparing it with a date. I mean:

MyObject.objects.filter(datetime_attr=datetime.date(2009,8,22))

I get an empty queryset list as an answer because (I think) I am not considering time, but I want “any time”.

Is there an easy way to do this in Django?

The stored datetimes do have their time set; it is not 00:00.


Answer 0

Such lookups are implemented in django.views.generic.date_based as follows:

{'date_time_field__range': (datetime.datetime.combine(date, datetime.time.min),
                            datetime.datetime.combine(date, datetime.time.max))} 

Because this is quite verbose, there are plans to improve the syntax with a __date operator. See ticket “#9596 Comparing a DateTimeField to a date is too hard” for more details.
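As a minimal sketch of the range approach above, the helper below builds the keyword arguments for a `__range` lookup that covers one whole day. The helper name `day_range_lookup` is hypothetical; in a Django project the returned dictionary would be passed as `MyObject.objects.filter(**lookup)`. Only the standard `datetime` module is used, so the bounds can be checked without a database.

```python
import datetime

def day_range_lookup(field_name, day):
    """Build filter kwargs matching every datetime that falls on `day`."""
    # datetime.time.min is 00:00:00 and datetime.time.max is 23:59:59.999999
    start = datetime.datetime.combine(day, datetime.time.min)
    end = datetime.datetime.combine(day, datetime.time.max)
    return {field_name + "__range": (start, end)}

lookup = day_range_lookup("datetime_attr", datetime.date(2009, 8, 22))
# In a Django project (assumed): MyObject.objects.filter(**lookup)
print(lookup)
```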


Answer 1

YourModel.objects.filter(datetime_published__year='2008', 
                         datetime_published__month='03', 
                         datetime_published__day='27')

Edit after comments:

YourModel.objects.filter(datetime_published=datetime(2008, 3, 27))

does not work, because it creates a datetime object with its time set to 00:00:00, so it does not match the times stored in the database.
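The point about the time component can be checked in plain Python. The sketch below uses an in-memory list as a stand-in for database rows (an assumption for illustration; no Django involved): comparing against a midnight datetime finds nothing, while comparing year, month and day finds the row, which is what the __year/__month/__day lookups do.

```python
from datetime import datetime

# Hypothetical stored values; the first one falls on 27 March 2008.
stored = [datetime(2008, 3, 27, 14, 30), datetime(2008, 3, 28, 9, 0)]

# datetime(2008, 3, 27) means 2008-03-27 00:00:00, so equality fails.
target = datetime(2008, 3, 27)
exact = [dt for dt in stored if dt == target]

# Comparing only the date parts mirrors __year/__month/__day.
by_parts = [dt for dt in stored if (dt.year, dt.month, dt.day) == (2008, 3, 27)]

print(exact)  # []
print(by_parts)
```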


Answer 2

Here are the results I got with IPython’s %timeit magic:

from datetime import date
today = date.today()

%timeit Model.objects.filter(date_created__year=today.year, date_created__month=today.month, date_created__day=today.day)
1000 loops, best of 3: 652 us per loop

%timeit Model.objects.filter(date_created__gte=today)
1000 loops, best of 3: 631 us per loop

%timeit Model.objects.filter(date_created__startswith=today)
1000 loops, best of 3: 541 us per loop

%timeit Model.objects.filter(date_created__contains=today)
1000 loops, best of 3: 536 us per loop

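contains seems to be faster. Note, however, that Django QuerySets are lazy, so these timings only measure building each query, not running it against the database. Also, __gte=today is not equivalent to the others, because it matches every later date as well.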


Answer 3

Django’s development version now has a __date queryset lookup for querying datetime fields against dates, so it should be available soon in 1.9.


Answer 4

Mymodel.objects.filter(date_time_field__contains=datetime.date(1986, 7, 28))

The above is what I’ve used. Not only does it work, it also has some inherent logical backing.


Answer 5

As of Django 1.9, the way to do this is to use the __date lookup on the DateTimeField.

For example: MyObject.objects.filter(datetime_attr__date=datetime.date(2009,8,22))


Answer 6

This produces the same results as using __year, __month, and __day and seems to work for me:

YourModel.objects.filter(your_datetime_field__startswith=datetime.date(2009,8,22))

Answer 7

Assuming active_on is a date object, increment it by one day and then use a range lookup:

next_day = active_on + datetime.timedelta(1)
queryset = queryset.filter(date_created__range=(active_on, next_day) )
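One detail worth knowing: Django’s `__range` lookup is inclusive at both ends (SQL `BETWEEN`), so with the code above a record stamped exactly at midnight on `next_day` would also match. A half-open interval avoids that. The sketch below shows the boundary logic in plain Python; the helper names `day_bounds` and `on_day` are hypothetical, and in a queryset the same bounds would go into `date_created__gte=start, date_created__lt=end`.

```python
import datetime

def day_bounds(day):
    """Return (start, end) for the half-open interval [start, end) of one day."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    return start, end

def on_day(value, day):
    """True if the datetime `value` falls on the date `day`."""
    start, end = day_bounds(day)
    # Same test as filter(field__gte=start, field__lt=end): `end` is excluded.
    return start <= value < end

day = datetime.date(2009, 8, 22)
print(on_day(datetime.datetime(2009, 8, 22, 23, 59), day))  # True
print(on_day(datetime.datetime(2009, 8, 23, 0, 0), day))    # False
```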

Answer 8

Here is an interesting technique: I leveraged the startswith lookup, as Django implements it on MySQL, to look up datetimes using only the date. When Django performs the lookup, the MySQL DATETIME value is compared as a string, so you can filter on that string and leave out the time portion. That way LIKE matches only the date part, and you get every timestamp on the given date.

datetime_filter = datetime(2009, 8, 22) 
MyObject.objects.filter(datetime_attr__startswith=datetime_filter.date())

This will perform the following query:

SELECT (values) FROM myapp_my_object
WHERE myapp_my_object.datetime_attr LIKE BINARY '2009-08-22%'

In this case, LIKE BINARY matches everything on that date, regardless of the time, including values like:

+---------------------+
| datetime_attr       |
+---------------------+
| 2009-08-22 11:05:08 |
+---------------------+

Hopefully this helps everyone until Django comes out with a solution!
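The idea can be illustrated outside the database: Python’s `str()` of a `datetime` uses the same `YYYY-MM-DD HH:MM:SS` layout, so a prefix test on that text behaves like `LIKE '2009-08-22%'`. This is only an analogy under that assumption; how a particular backend actually casts the column is up to the backend. The helper name `like_prefix_match` is hypothetical.

```python
from datetime import date, datetime

def like_prefix_match(stored, day):
    """Mimic `stored LIKE 'YYYY-MM-DD%'` using the text form of a datetime."""
    # str(datetime) -> 'YYYY-MM-DD HH:MM:SS', str(date) -> 'YYYY-MM-DD'
    return str(stored).startswith(str(day))

rows = [datetime(2009, 8, 22, 11, 5, 8), datetime(2009, 8, 23, 0, 0)]
matches = [r for r in rows if like_prefix_match(r, date(2009, 8, 22))]
print(matches)
```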


Answer 9

There’s a fantastic blog post that covers this: Comparing Dates and Datetimes in the Django ORM

The best solution posted for Django > 1.7, < 1.9 is to register a transform:

from django.db import models

class MySQLDatetimeDate(models.Transform):
    """
    This implements a custom SQL lookup when using `__date` with datetimes.
    To enable filtering on datetimes that fall on a given date, import
    this transform and register it with the DateTimeField.
    """
    lookup_name = 'date'

    def as_sql(self, compiler, connection):
        lhs, params = compiler.compile(self.lhs)
        return 'DATE({})'.format(lhs), params

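    @property
    def output_field(self):
        return models.DateField()

# Register the transform so that `__date` works on every DateTimeField.
models.DateTimeField.register_lookup(MySQLDatetimeDate)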

Then you can use it in your filters like this:

Foo.objects.filter(created_on__date=date)

EDIT

This solution is definitely backend-dependent. From the article:

Of course, this implementation relies on your particular flavor of SQL having a DATE() function. MySQL does. So does SQLite. On the other hand, I haven’t worked with PostgreSQL personally, but some googling leads me to believe that it does not have a DATE() function. So an implementation this simple seems like it will necessarily be somewhat backend-dependent.


Answer 10

Hm... my solution works:

Mymodel.objects.filter(date_time_field__startswith=datetime.datetime(1986, 7, 28))

Answer 11

Model.objects.filter(datetime__year=2011, datetime__month=2, datetime__day=30)

Answer 12

This works in Django 1.7.6:

MyObject.objects.filter(datetime_attr__startswith=datetime.date(2009,8,22))

Answer 13

See the Django documentation:

ur_data_model.objects.filter(ur_date_field__gte=datetime(2009, 8, 22), ur_date_field__lt=datetime(2009, 8, 23))

Django filter queryset __in for *every* item in list

Let’s say I have the following models:

from django.db import models

class Photo(models.Model):
    tags = models.ManyToManyField('Tag')

class Tag(models.Model):
    name = models.CharField(max_length=50)

In a view I have a list with active filters called categories. I want to filter Photo objects which have all tags present in categories.

I tried:

Photo.objects.filter(tags__name__in=categories)

But this matches any item in categories, not all items.

So if categories is ['holiday', 'summer'], I want the photos that have both a holiday tag and a summer tag.

Can this be achieved?
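To make the difference concrete, here is a plain-Python model of the two semantics (no Django involved; the `photos` mapping of photo id to tag names is made up for illustration). `tags__name__in` behaves like the ANY test, while the question asks for the ALL test.

```python
# Hypothetical data: photo id -> set of tag names attached to it.
photos = {
    1: {'holiday', 'summer'},
    2: {'holiday'},
    3: {'winter'},
}
categories = ['holiday', 'summer']
wanted = set(categories)

# ANY: the photo shares at least one tag with categories (like __in).
any_match = sorted(pid for pid, tags in photos.items() if tags & wanted)

# ALL: every category is among the photo's tags (what the question wants).
all_match = sorted(pid for pid, tags in photos.items() if wanted <= tags)

print(any_match)  # [1, 2]
print(all_match)  # [1]
```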


Answer 0

Summary:

One option is, as suggested by jpic and sgallen in the comments, to add a .filter() for each category. Each additional filter adds more joins, which should not be a problem for a small set of categories.

There is also the aggregation approach. This query is shorter and may be quicker for a large set of categories.

You also have the option of using custom queries.


Some examples

Test setup:

class Photo(models.Model):
    tags = models.ManyToManyField('Tag')

class Tag(models.Model):
    name = models.CharField(max_length=50)

    def __unicode__(self):
        return self.name

In [2]: t1 = Tag.objects.create(name='holiday')
In [3]: t2 = Tag.objects.create(name='summer')
In [4]: p = Photo.objects.create()
In [5]: p.tags.add(t1)
In [6]: p.tags.add(t2)
In [7]: p.tags.all()
Out[7]: [<Tag: holiday>, <Tag: summer>]

Using chained filters approach:

In [8]: Photo.objects.filter(tags=t1).filter(tags=t2)
Out[8]: [<Photo: Photo object>]

Resulting query:

In [17]: print Photo.objects.filter(tags=t1).filter(tags=t2).query
SELECT "test_photo"."id"
FROM "test_photo"
INNER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
INNER JOIN "test_photo_tags" T4 ON ("test_photo"."id" = T4."photo_id")
WHERE ("test_photo_tags"."tag_id" = 3  AND T4."tag_id" = 4 )

Note that each filter adds more JOINS to the query.

Using annotation approach:

In [29]: from django.db.models import Count
In [30]: Photo.objects.filter(tags__in=[t1, t2]).annotate(num_tags=Count('tags')).filter(num_tags=2)
Out[30]: [<Photo: Photo object>]

Resulting query:

In [32]: print Photo.objects.filter(tags__in=[t1, t2]).annotate(num_tags=Count('tags')).filter(num_tags=2).query
SELECT "test_photo"."id", COUNT("test_photo_tags"."tag_id") AS "num_tags"
FROM "test_photo"
LEFT OUTER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
WHERE ("test_photo_tags"."tag_id" IN (3, 4))
GROUP BY "test_photo"."id", "test_photo"."id"
HAVING COUNT("test_photo_tags"."tag_id") = 2

ANDed Q objects would not work:

In [9]: from django.db.models import Q
In [10]: Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer'))
Out[10]: []
In [11]: from operator import and_
In [12]: Photo.objects.filter(reduce(and_, [Q(tags__name='holiday'), Q(tags__name='summer')]))
Out[12]: []

Resulting query:

In [25]: print Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer')).query
SELECT "test_photo"."id"
FROM "test_photo"
INNER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
INNER JOIN "test_tag" ON ("test_photo_tags"."tag_id" = "test_tag"."id")
WHERE ("test_tag"."name" = holiday  AND "test_tag"."name" = summer )
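Why the ANDed `Q` version returns nothing while the annotation version works can be modelled in plain Python over the joined rows. This is an illustration of the SQL above, not Django code; the `rows` list is made-up data standing in for the result of the join.

```python
from collections import Counter

# Hypothetical join rows (photo_id, tag_name): one row per photo/tag link,
# like the rows produced by joining test_photo -> test_photo_tags -> test_tag.
rows = [(1, 'holiday'), (1, 'summer'), (2, 'holiday'), (3, 'winter')]
categories = {'holiday', 'summer'}

# ANDed Q objects: both conditions are checked against the SAME joined row,
# and one row cannot carry two different tag names at once.
anded = sorted({pid for pid, name in rows
                if name == 'holiday' and name == 'summer'})

# Annotation approach: WHERE name IN (...), GROUP BY photo,
# HAVING COUNT(...) = number of categories.
counts = Counter(pid for pid, name in rows if name in categories)
annotated = sorted(pid for pid, n in counts.items() if n == len(categories))

print(anded)      # []
print(annotated)  # [1]
```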

Answer 1

Another approach that works, although only on PostgreSQL, is to use django.contrib.postgres.fields.ArrayField:

Example copied from docs:

>>> Post.objects.create(name='First post', tags=['thoughts', 'django'])
>>> Post.objects.create(name='Second post', tags=['thoughts'])
>>> Post.objects.create(name='Third post', tags=['tutorial', 'django'])

>>> Post.objects.filter(tags__contains=['thoughts'])
<QuerySet [<Post: First post>, <Post: Second post>]>

>>> Post.objects.filter(tags__contains=['django'])
<QuerySet [<Post: First post>, <Post: Third post>]>

>>> Post.objects.filter(tags__contains=['django', 'thoughts'])
<QuerySet [<Post: First post>]>

ArrayField has some more powerful features such as overlap and index transforms.


Answer 2

This also can be done by dynamic query generation using Django ORM and some Python magic :)

from functools import reduce
from operator import and_
from django.db.models import Q

categories = ['holiday', 'summer']
res = Photo.objects.filter(reduce(and_, [Q(tags__name=c) for c in categories]))

The idea is to generate an appropriate Q object for each category and then combine them with the AND operator into a single filter. For your example, it would be equivalent to:

res = Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer'))

Answer 3

I use a small function that chains one filter per value in a list, for a given lookup operator and column name:

def exclusive_in(cls, column, operator, value_list):
    # Build the lookup name, e.g. 'tags__name' + '__' + 'iexact'.
    myfilter = column + '__' + operator
    query = cls.objects
    # Chain one .filter() per value so that every value must match.
    for value in value_list:
        query = query.filter(**{myfilter: value})
    return query

The function can be called like this:

exclusive_in(Photo, 'tags__name', 'iexact', ['holiday', 'summer'])

It also works with any model class and any number of tags in the list; the operator can be any lookup, such as ‘iexact’, ‘in’, ‘contains’, …


Answer 4

queryset = Photo.objects.filter(tags__name="vacaciones") | Photo.objects.filter(tags__name="verano")

Answer 5

To do it dynamically, follow this example:

tag_ids = [t1.id, t2.id]
qs = Photo.objects.all()

for tag_id in tag_ids:
    qs = qs.filter(tags__id=tag_id)

print qs

How to remove non-ASCII characters but keep periods and spaces using Python?

I’m working with a .txt file. I want a string of the text from the file with no non-ASCII characters. However, I want to leave spaces and periods. At present, I’m stripping those too. Here’s the code:

def onlyascii(char):
    if ord(char) < 48 or ord(char) > 127: return ''
    else: return char

def get_my_string(file_path):
    f=open(file_path,'r')
    data=f.read()
    f.close()
    filtered_data=filter(onlyascii, data)
    filtered_data = filtered_data.lower()
    return filtered_data

How should I modify onlyascii() to leave spaces and periods? I imagine it’s not too complicated but I can’t figure it out.


Answer 0

You can filter all characters from the string that are not printable using string.printable, like this:

>>> s = "some\x00string. with\x15 funny characters"
>>> import string
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'somestring. with funny characters'

string.printable on my machine contains:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

EDIT: On Python 3, filter will return an iterable. The correct way to obtain a string back would be:

''.join(filter(lambda x: x in printable, s))
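For Python 3, the same filter can be wrapped in a small helper. The function name `keep_printable` is hypothetical; note that `string.printable` also contains whitespace such as tabs and newlines, so those survive too, while spaces and periods are kept as the question asks.

```python
import string

# string.printable: digits, ASCII letters, punctuation and whitespace.
PRINTABLE = set(string.printable)

def keep_printable(text):
    """Drop every character that is not in string.printable (Python 3)."""
    return ''.join(ch for ch in text if ch in PRINTABLE)

print(keep_printable("some\x00string. with\x15 funny characters"))
```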

Answer 1

An easy way to convert to a different codec is to use encode() or decode(). In your case, you want to convert to ASCII and ignore all symbols that are not supported. For example, the Swedish letter å is not an ASCII character:

>>> s = u'Good bye in Swedish is Hej d\xe5'
>>> s = s.encode('ascii', errors='ignore')
>>> print s
Good bye in Swedish is Hej d

Edit:

Python3: str -> bytes -> str

>>> "Hej då".encode("ascii", errors="ignore").decode()
'Hej d'

Python2: unicode -> str -> unicode

>>> u"hej då".encode("ascii", errors="ignore").decode()
u'hej d'

Python2: str -> unicode -> str (decode and encode in reverse order)

>>> "hej d\xe5".decode("ascii", errors="ignore").encode()
'hej d'
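Applied to the question, a Python 3 sketch could wrap the str -> bytes -> str round trip in a function (the name `to_ascii` is hypothetical). Spaces and periods are ASCII characters, so they are kept.

```python
def to_ascii(text):
    """Drop characters that cannot be encoded as ASCII (Python 3 str -> str)."""
    # encode() with errors="ignore" silently skips non-ASCII characters;
    # decode() turns the resulting bytes back into a str.
    return text.encode("ascii", errors="ignore").decode("ascii")

print(to_ascii("Good bye in Swedish is Hej då."))
```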

Answer 2

According to @artfulrobot, this should be faster than filter and lambda:

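import re
re.sub(r'[^\x00-\x7f]', r'', your_non_ascii_string)

See more examples here: Replace non-ASCII characters with a single space (http://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space/20079244#20079244)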
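A minimal sketch of the regex approach with the pattern compiled once, which is a common choice when the same pattern is applied to many strings (the helper name `strip_non_ascii` is hypothetical; no speed claim is made here).

```python
import re

# Matches any character outside the ASCII range U+0000..U+007F.
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def strip_non_ascii(text):
    """Remove every non-ASCII character, keeping spaces and periods."""
    return NON_ASCII.sub('', text)

print(strip_non_ascii("123456790 ABC#%? .(朱惠英)"))
```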


Answer 3

Your question is ambiguous; the first two sentences taken together imply that you believe that space and “period” are non-ASCII characters. This is incorrect. All chars such that ord(char) <= 127 are ASCII characters. For example, your function excludes the space and these characters !"#$%&'()*+,-./ but includes several others, e.g. []{}.

Please step back, think a bit, and edit your question to tell us what you are trying to do, without mentioning the word ASCII, and why you think that chars such that ord(char) >= 128 are ignorable. Also: which version of Python? What is the encoding of your input data?

Please note that your code reads the whole input file as a single string, and your comment (“great solution”) to another answer implies that you don’t care about newlines in your data. If your file contains two lines like this:

this is line 1
this is line 2

the result would be 'this is line 1this is line 2' … is that what you really want?

A better solution would include:

  1. a better name for the filter function than onlyascii
  2. recognition that a filter function merely needs to return a truthy value if the argument is to be retained:

    def filter_func(char):
        return char == '\n' or 32 <= ord(char) <= 126
    # and later:
    filtered_data = filter(filter_func, data).lower()
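A Python 3 sketch of the two points above: a descriptively named predicate that keeps printable ASCII plus newlines, and `''.join()` around `filter()`, since in Python 3 `filter()` returns an iterator rather than a string. The names `keep_char` and `clean_text` are hypothetical.

```python
def keep_char(char):
    """Keep newlines and printable ASCII (space through tilde)."""
    return char == '\n' or 32 <= ord(char) <= 126

def clean_text(data):
    # filter() yields the retained characters; join them back into a str.
    return ''.join(filter(keep_char, data)).lower()

print(clean_text("Hej Då.\nBye"))
```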
    

Answer 4

You may use the following code to remove non-ASCII characters (such as the Chinese characters below):

import re
text = "123456790 ABC#%? .(朱惠英)"
result = re.sub(r'[^\x00-\x7f]', r'', text)
print(result)

This will return

123456790 ABC#%? .()


Answer 5

If you want printable ASCII characters, you should probably correct your code to:

if ord(char) < 32 or ord(char) > 126: return ''

This is equivalent to string.printable (the answer from @jterrace), except that it drops the whitespace control characters (‘\t’, ‘\n’, ‘\x0b’, ‘\x0c’ and ‘\r’). Note that it does not correspond to the range in your question.


Answer 6

Working my way through Fluent Python (Ramalho) – highly recommended. List comprehension one-ish-liners inspired by Chapter 2:

onlyascii = ''.join([s for s in data if ord(s) < 127])
onlymatch = ''.join([s for s in data if s in
              'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'])

Python: Check if one dictionary is a subset of another larger dictionary

I’m trying to write a custom filter method that takes an arbitrary number of kwargs and returns a list containing the elements of a database-like list that contain those kwargs.

For example, suppose d1 = {'a':'2', 'b':'3'} and d2 = the same thing. d1 == d2 results in True. But suppose d2 = the same thing plus a bunch of other things. My method needs to be able to tell whether d1 is in d2, but Python can’t do that with dictionaries.

Context:

I have a Word class, and each object has properties like word, definition, part_of_speech, and so on. I want to be able to call a filter method on the main list of these words, like Word.objects.filter(word='jump', part_of_speech='verb-intransitive'). I can’t figure out how to manage these keys and values at the same time. But this could have larger functionality outside this context for other people.


Answer 0

Convert to item pairs and check for containment.

all(item in superset.items() for item in subset.items())

Optimization is left as an exercise for the reader.


Answer 1

In Python 3, you can use dict.items() to get a set-like view of the dict items. You can then use the <= operator to test if one view is a “subset” of the other:

d1.items() <= d2.items()

In Python 2.7, use the dict.viewitems() to do the same:

d1.viewitems() <= d2.viewitems()

In Python 2.6 and below you will need a different solution, such as using all():

all(key in d2 and d2[key] == d1[key] for key in d1)
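Tying this back to the question’s context, a Python 3 sketch could use the items-view test inside a filter over plain objects. `Word`, `filter_words`, and the sample data are hypothetical stand-ins, not the asker’s actual code; `vars(w)` returns the instance’s attribute dictionary, so the keyword arguments are compared against it as a sub-dictionary.

```python
def is_subdict(small, big):
    """True if every key/value pair in `small` is also in `big` (Python 3)."""
    # dict.items() is a set-like view, so <= performs a subset test.
    return small.items() <= big.items()

class Word:
    """Hypothetical record type standing in for the question's Word class."""
    def __init__(self, word, definition, part_of_speech):
        self.word = word
        self.definition = definition
        self.part_of_speech = part_of_speech

def filter_words(words, **criteria):
    """Keep the words whose attributes include all of the given criteria."""
    return [w for w in words if is_subdict(criteria, vars(w))]

words = [
    Word('jump', 'to leap', 'verb-intransitive'),
    Word('jump', 'a leap', 'noun'),
]
result = filter_words(words, word='jump', part_of_speech='verb-intransitive')
print([w.part_of_speech for w in result])  # ['verb-intransitive']
```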

Answer 2

Note for people that need this for unit testing: there’s also an assertDictContainsSubset() method in Python’s TestCase class.

http://docs.python.org/2/library/unittest.html?highlight=assertdictcontainssubset#unittest.TestCase.assertDictContainsSubset

However, it has been deprecated since Python 3.2; I’m not sure why, maybe there’s a replacement for it.
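One possible substitute, sketched under the assumption that a key/value subset check is what you want: `assertLessEqual` applied to `items()` views. The test class and the helper `assertDictIsSubset` are hypothetical names; this is not an official replacement API.

```python
import unittest

class DictSubsetTest(unittest.TestCase):
    def assertDictIsSubset(self, subset, dictionary):
        # items() views support set-style comparison, so <= is a subset test;
        # assertLessEqual raises AssertionError when it does not hold.
        self.assertLessEqual(subset.items(), dictionary.items())

    def test_subset_passes(self):
        self.assertDictIsSubset({'a': '2'}, {'a': '2', 'b': '3'})

    def test_mismatch_fails(self):
        with self.assertRaises(AssertionError):
            self.assertDictIsSubset({'a': '9'}, {'a': '2', 'b': '3'})
```

Run it with `python -m unittest` as usual.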


Answer 3

To check keys and values, use: set(d1.items()).issubset(set(d2.items())) (this requires the values to be hashable)

If you only need to check keys, use: set(d1).issubset(set(d2))


Answer 4

For completeness, you can also do this:

def is_subdict(small, big):
    return dict(big, **small) == big

However, I make no claims whatsoever concerning speed (or lack thereof) or readability (or lack thereof).
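A caveat about the `dict(big, **small)` trick: in Python 3, `**` unpacking in a function call requires string keys, so it raises `TypeError` when `small` has, for example, integer keys. A variant using dict-display unpacking (Python 3.5+) avoids that restriction; the function name is kept from the answer, the rest is a sketch.

```python
def is_subdict(small, big):
    """True if merging `small` into `big` leaves `big` unchanged."""
    # {**big, **small} copies big, then overwrites with small's pairs.
    # Unlike dict(big, **small), this accepts non-string keys.
    return {**big, **small} == big

print(is_subdict({1: 'x'}, {1: 'x', 2: 'y'}))  # True
```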


Answer 5

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True

context:

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> list(d1.iteritems())
[('a', '2'), ('b', '3')]
>>> [(k,v) for k,v in d1.iteritems()]
[('a', '2'), ('b', '3')]
>>> k,v = ('a','2')
>>> k
'a'
>>> v
'2'
>>> k in d2
True
>>> d2[k]
'2'
>>> k in d2 and d2[k]==v
True
>>> [(k in d2 and d2[k]==v) for k,v in d1.iteritems()]
[True, True]
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems())
<generator object <genexpr> at 0x02A9D2B0>
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems()).next()
True
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True
>>>

Answer 6

My function for the same purpose, doing this recursively:

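def dictMatch(patn, real):
    """does real dict match pattern?"""
    try:
        for pkey, pvalue in patn.items():
            if type(pvalue) is dict:
                # Recurse into nested dicts; real[pkey] may raise KeyError.
                if not dictMatch(pvalue, real[pkey]):
                    return False
            elif real[pkey] != pvalue:
                return False
    except (KeyError, TypeError):
        # Missing key, or real[pkey] is not subscriptable like a dict.
        return False
    # Every pattern entry matched (an empty pattern matches anything).
    return True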

In your example, dictMatch(d1, d2) should return True even if d2 has other stuff in it, plus it applies also to lower levels:

d1 = {'a':'2', 'b':{3: 'iii'}}
d2 = {'a':'2', 'b':{3: 'iii', 4: 'iv'},'c':'4'}

dictMatch(d1, d2)   # True

Notes: There could be an even better solution that avoids the explicit dict type check and applies to an even wider range of cases (like lists of dicts). Also, recursion depth is not limited here, so use at your own risk. ;)


Answer 7

Here is a solution that also properly recurses into lists and sets contained within the dictionary. You can also use this for lists containing dicts etc…

def is_subset(subset, superset):
    if isinstance(subset, dict):
        return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())

    if isinstance(subset, list) or isinstance(subset, set):
        return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)

    # assume that subset is a plain value if none of the above match
    return subset == superset

Answer 8

This seemingly straightforward issue cost me a couple of hours of research to find a 100% reliable solution, so I documented what I found in this answer.

  1. “Pythonic-ally” speaking, small_dict <= big_dict would be the most intuitive way, but unfortunately it won't work. {'a': 1} < {'a': 1, 'b': 2} seemingly works in Python 2, but it is not reliable because the official documentation explicitly calls it out. Go search “Outcomes other than equality are resolved consistently, but are not otherwise defined.” in this section. Not to mention, comparing 2 dicts with < or <= in Python 3 raises a TypeError.

  2. The second most intuitive thing is small.viewitems() <= big.viewitems() for Python 2.7 only, and small.items() <= big.items() for Python 3. But there is one caveat: it is potentially buggy. If your program could be run on Python <= 2.6, d1.items() <= d2.items() actually compares 2 lists of tuples in no particular order, so the result is unreliable and becomes a nasty bug in your program. I am not keen to write yet another implementation for Python <= 2.6, but I still don't feel comfortable shipping code with a known bug (even if it is on an unsupported platform). So I abandoned this approach.

  3. I settled on @blubberdiblub's answer (credit goes to him):

    def is_subdict(small, big): return dict(big, **small) == big

    It is worth pointing out that this answer relies on the == behavior between dicts, which is clearly defined in the official documentation, hence it should work in every Python version. Go search:

    • “Dictionaries compare equal if and only if they have the same (key, value) pairs.” is the last sentence in this page
    • “Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and elements enforces reflexivity.” in this page
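One caveat is worth checking before relying on the one-liner above. In Python 3, dict(big, **small) passes the keys of small as keyword arguments, so it raises TypeError when small has non-string keys. The sketch below shows two alternatives that avoid this. The {**big, **small} unpacking syntax needs Python 3.5+, and the three function names are hypothetical.

```python
def is_subdict_kwargs(small, big):
    # The one-liner from the answer: only safe when every key of `small` is a str
    return dict(big, **small) == big

def is_subdict_unpack(small, big):
    # Same idea, but dict unpacking in a display accepts any hashable key
    return {**big, **small} == big

def is_subdict_items(small, big):
    # items() views are set-like in Python 3, so <= tests pair-wise containment
    return small.items() <= big.items()

big = {1: 'one', 'b': 2, 'c': 3}

print(is_subdict_unpack({1: 'one'}, big))   # True
print(is_subdict_items({1: 'one'}, big))    # True
try:
    is_subdict_kwargs({1: 'one'}, big)
except TypeError as exc:
    print('TypeError:', exc)
```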

Answer 9

Here’s a general recursive solution for the problem given:

import traceback
import unittest

def is_subset(superset, subset):
    for key, value in subset.items():
        if key not in superset:
            return False

        if isinstance(value, dict):
            if not is_subset(superset[key], value):
                return False

        elif isinstance(value, str):
            if value not in superset[key]:
                return False

        elif isinstance(value, list):
            if not set(value) <= set(superset[key]):
                return False
        elif isinstance(value, set):
            if not value <= superset[key]:
                return False

        else:
            if not value == superset[key]:
                return False

    return True


class Foo(unittest.TestCase):

    def setUp(self):
        self.dct = {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
            'f': {
                'a': 'hello world',
                'b': 12345,
                'c': 1.2345,
                'd': [1, 2, 3, 4, 5],
                'e': {1, 2, 3, 4, 5},
                'g': False,
                'h': None
            },
            'g': False,
            'h': None,
            'question': 'mcve',
            'metadata': {}
        }

    def tearDown(self):
        pass

    def check_true(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), True)

    def check_false(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), False)

    def test_simple_cases(self):
        self.check_true(self.dct, {'a': 'hello world'})
        self.check_true(self.dct, {'b': 12345})
        self.check_true(self.dct, {'c': 1.2345})
        self.check_true(self.dct, {'d': [1, 2, 3, 4, 5]})
        self.check_true(self.dct, {'e': {1, 2, 3, 4, 5}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
        }})
        self.check_true(self.dct, {'g': False})
        self.check_true(self.dct, {'h': None})

    def test_tricky_cases(self):
        self.check_true(self.dct, {'a': 'hello'})
        self.check_true(self.dct, {'d': [1, 2, 3]})
        self.check_true(self.dct, {'e': {3, 4}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'h': None
        }})
        self.check_false(
            self.dct, {'question': 'mcve', 'metadata': {'author': 'BPL'}})
        self.check_true(
            self.dct, {'question': 'mcve', 'metadata': {}})
        self.check_false(
            self.dct, {'question1': 'mcve', 'metadata': {}})

if __name__ == "__main__":
    unittest.main()

NOTE: The original code would fail in certain cases; credit for the fix goes to @olivier-melançon


Answer 10

If you don't mind using pydash, its is_match does exactly that:

import pydash

a = {1:2, 3:4, 5:{6:7}}
b = {3:4.0, 5:{6:8}}
c = {3:4.0, 5:{6:7}}

pydash.predicates.is_match(a, b) # False
pydash.predicates.is_match(a, c) # True

Answer 11

I know this question is old, but here is my solution for checking if one nested dictionary is a part of another nested dictionary. The solution is recursive.

def compare_dicts(a, b):
    for key, value in a.items():
        if key in b:
            if isinstance(value, dict):
                # b[key] must also be a dict for a nested match
                if not isinstance(b[key], dict) or not compare_dicts(value, b[key]):
                    return False
            elif value != b[key]:
                return False
        else:
            return False
    return True

Answer 12

This function works for non-hashable values. I also think that it is clear and easy to read.

def isSubDict(subDict,dictionary):
    for key in subDict.keys():
        if (not key in dictionary) or (not subDict[key] == dictionary[key]):
            return False
    return True

In [126]: isSubDict({1:2},{3:4})
Out[126]: False

In [127]: isSubDict({1:2},{1:2,3:4})
Out[127]: True

In [128]: isSubDict({1:{2:3}},{1:{2:3},3:4})
Out[128]: True

In [129]: isSubDict({1:{2:3}},{1:{2:4},3:4})
Out[129]: False

Answer 13

A short recursive implementation that works for nested dictionaries:

def compare_dicts(a,b):
    if isinstance(a, dict):
        if not a: return True  # nothing left to check
        key, val = a.popitem()
        return isinstance(b, dict) and key in b and compare_dicts(val, b.pop(key)) and compare_dicts(a, b)
    # non-dict values (including falsy ones such as 0 or '') must compare equal
    return a == b

This will consume the a and b dicts. If anyone knows of a good way to avoid that without resorting to partially iterative solutions as in other answers, please tell me. I would need a way to split a dict into head and tail based on a key.

This code is more useful as a programming exercise, and it is probably a lot slower than the other solutions here that mix recursion and iteration. @Nutcracker's solution is pretty good for nested dictionaries.
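This answer asks how to avoid consuming a and b. If a partly iterative version is acceptable after all, one option is to keep the same recursive shape but iterate with all() instead of popping, so neither argument is mutated. The sketch below does that; is_nested_subdict is a hypothetical name.

```python
def is_nested_subdict(a, b):
    """Return True if `a` is contained in `b`, recursing into nested dicts.

    Nothing is popped, so neither argument is modified.
    """
    if isinstance(a, dict):
        return isinstance(b, dict) and all(
            key in b and is_nested_subdict(val, b[key]) for key, val in a.items()
        )
    # Non-dict values must simply compare equal
    return a == b

a = {'x': 1, 'y': {'z': 2}}
b = {'x': 1, 'y': {'z': 2, 'w': 3}, 'v': 4}
print(is_nested_subdict(a, b))   # True
print(a, b)                      # both dicts are unchanged
```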


How to filter a pandas dataframe by multiple columns

Question: How to filter a pandas dataframe by multiple columns

To filter a dataframe (df) by a single column, if we consider data with males and females we might:

males = df[df['Gender']=='Male']

Question 1 – But what if the data spanned multiple years and I wanted to only see males for 2014?

In other languages I might do something like:

if A = "Male" and if B = "2014" then 

(except I want to do this and get a subset of the original dataframe in a new dataframe object)

Question 2. How do I do this in a loop, and create a dataframe object for each unique set of year and gender (i.e. a df for: 2013-Male, 2013-Female, 2014-Male, and 2014-Female)?

for y in year:
    for g in gender:
        df = .....

Answer 0

When using the & operator, don't forget to wrap each condition in parentheses:

males = df[(df['Gender']=='Male') & (df['Year']==2014)]

To store your dataframes in a dict using a for loop:

dic = {}
for g in ['Male', 'Female']:
    dic[g] = {}
    for y in [2013, 2014]:
        # store each filtered DataFrame in a dict of dicts: dic[gender][year]
        dic[g][y] = df[(df['Gender']==g) & (df['Year']==y)]

EDIT:

A demo for your getDF:

def getDF(dic, gender, year):
    return dic[gender][year]

print(getDF(dic, 'Male', 2014))
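For Question 2, a possible alternative to the nested loop is groupby. Grouping on both columns yields one sub-DataFrame per (Year, Gender) combination that actually occurs in the data, so nothing has to be listed by hand. The sketch below uses a small made-up DataFrame with the column names from the question.

```python
import pandas as pd

# Small made-up example data with the column names used in the question
df = pd.DataFrame({
    'Year':   [2013, 2013, 2014, 2014, 2014],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Value':  [1, 2, 3, 4, 5],
})

# One DataFrame per (year, gender) pair present in df; keys are tuples
frames = {key: group for key, group in df.groupby(['Year', 'Gender'])}

print(sorted(frames))
print(frames[(2014, 'Male')])
```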

Answer 1

For more general boolean functions that you would like to use as a filter and that depend on more than one column, you can use:

df = df[df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)]

where f is a function that is applied to every pair of elements (x1, x2) from col_1 and col_2 and returns True or False depending on any condition you want on (x1, x2).
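As a concrete illustration, here is one possible f, assuming columns named Gender and Year as in the question. It receives one value from each selected column per row, in column order. Because apply calls f once per row, this is likely slower than a vectorized & mask for simple conditions.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Year':   [2014, 2014, 2013],
})

def f(gender, year):
    # Arbitrary per-row condition on two columns
    return gender == 'Male' and year == 2014

# apply(..., axis=1) passes each row of the two selected columns as a Series;
# *x unpacks its values into f(gender, year)
filtered = df[df[['Gender', 'Year']].apply(lambda x: f(*x), axis=1)]
print(filtered)
```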


Answer 2

Starting with pandas 0.13, you can also use query(), which can be the most efficient option on large DataFrames:

df.query('Gender=="Male" & Year=="2014" ')
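If the filter values live in Python variables, as in the loop from Question 2, query() can refer to them with the @ prefix instead of formatting them into the string. The sketch below uses made-up data, and Year is compared as an integer here.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Male'],
    'Year':   [2014, 2014, 2013, 2014],
})

gender, year = 'Male', 2014
# @name refers to a local Python variable inside the query string
males_2014 = df.query('Gender == @gender and Year == @year')
print(males_2014)
```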

Answer 3

In case anyone wonders which way of filtering is faster (the accepted answer or the one from @redreamality):

import pandas as pd
import numpy as np

length = 100_000
df = pd.DataFrame()
df['Year'] = np.random.randint(1950, 2019, size=length)
df['Gender'] = np.random.choice(['Male', 'Female'], length)

%timeit df.query('Gender=="Male" & Year=="2014" ')
%timeit df[(df['Gender']=='Male') & (df['Year']==2014)]

Results for 100,000 rows:

6.67 ms ± 557 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.54 ms ± 536 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Results for 10,000,000 rows:

326 ms ± 6.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
472 ms ± 25.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the results depend on the size of the data. On my laptop, query() becomes faster beyond about 500k rows. Also, Year is an integer column here, so Year=="2014" compares it against a string. That adds unnecessary overhead and will not match the integer years; Year==2014 is faster.


Answer 4

You can create your own filter function using query in pandas. Here, df is filtered by all of the kwargs parameters. Don't forget to add some validation of the kwargs to adapt the filter function to your own df.

def filter(df, **kwargs):
    query_list = []
    for key, value in kwargs.items():
        # !r quotes strings ('Male') but not numbers (2014), so numeric columns still match
        query_list.append(f'{key} == {value!r}')
    query = ' & '.join(query_list)
    return df.query(query)

Answer 5

You can filter by multiple columns (more than two) by applying np.logical_and.reduce to a list of masks in place of & (or np.logical_or.reduce in place of |).

Here’s an example function that does the job, if you provide target values for multiple fields. You can adapt it for different types of filtering and whatnot:

def filter_df(df, filter_values):
    """Filter df by matching targets for multiple columns.

    Args:
        df (pd.DataFrame): dataframe
        filter_values (None or dict): Dictionary of the form:
                `{<field>: <target_values_list>}`
            used to filter columns data.
    """
    import numpy as np
    if filter_values is None or not filter_values:
        return df
    return df[
        np.logical_and.reduce([
            df[column].isin(target_values) 
            for column, target_values in filter_values.items()
        ])
    ]

Usage:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4]})

filter_df(df, {
    'a': [1, 2, 3],
    'b': [1, 2, 4]
})
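The same reduction can also be written without NumPy. The boolean Series are folded together with functools.reduce and operator.and_, which is equivalent to chaining & between the masks. The sketch below reuses the usage data above, and filter_df_reduce is a hypothetical name.

```python
import functools
import operator

import pandas as pd

def filter_df_reduce(df, filter_values):
    """Keep rows where every column in filter_values is in its target list."""
    if not filter_values:
        return df
    masks = [df[column].isin(targets) for column, targets in filter_values.items()]
    # reduce(operator.and_, [m1, m2, m3]) == (m1 & m2) & m3
    return df[functools.reduce(operator.and_, masks)]

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4]})
result = filter_df_reduce(df, {'a': [1, 2, 3], 'b': [1, 2, 4]})
print(result)
```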

How to filter a dictionary according to an arbitrary condition function?

Question: How to filter a dictionary according to an arbitrary condition function?

I have a dictionary of points, say:

>>> points={'a':(3,4), 'b':(1,2), 'c':(5,5), 'd':(3,3)}

I want to create a new dictionary with all the points whose x and y value is smaller than 5, i.e. points ‘a’, ‘b’ and ‘d’.

According to the book, each dictionary has the items() function, which returns a list of (key, value) tuples:

>>> points.items()
[('a', (3, 4)), ('c', (5, 5)), ('b', (1, 2)), ('d', (3, 3))]

So I have written this:

>>> points_small = {}
>>> for item in [i for i in points.items() if i[1][0]<5 and i[1][1]<5]:
...     points_small[item[0]]=item[1]
...
>>> points_small
{'a': (3, 4), 'b': (1, 2), 'd': (3, 3)}

Is there a more elegant way? I was expecting Python to have some super-awesome dictionary.filter(f) function…


Answer 0

Nowadays, in Python 2.7 and up, you can use a dict comprehension:

{k: v for k, v in points.iteritems() if v[0] < 5 and v[1] < 5}

And in Python 3:

{k: v for k, v in points.items() if v[0] < 5 and v[1] < 5}
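Since the question asks for something like dictionary.filter(f), the comprehension can be wrapped in a small helper that takes an arbitrary predicate. Python has no built-in dict.filter. The name filter_dict below is hypothetical, and the predicate is assumed to take (key, value).

```python
def filter_dict(d, predicate):
    """Return a new dict with the items for which predicate(key, value) is true."""
    return {k: v for k, v in d.items() if predicate(k, v)}

points = {'a': (3, 4), 'b': (1, 2), 'c': (5, 5), 'd': (3, 3)}

# Keep points whose coordinates are all smaller than 5
points_small = filter_dict(points, lambda k, v: all(x < 5 for x in v))
print(points_small)   # {'a': (3, 4), 'b': (1, 2), 'd': (3, 3)}
```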

Answer 1

dict((k, v) for k, v in points.items() if all(x < 5 for x in v))

You could choose to call .iteritems() instead of .items() if you’re in Python 2 and points may have a lot of entries.

all(x < 5 for x in v) may be overkill if you know for sure each point will always be 2D only (in that case you might express the same constraint with an and) but it will work fine;-).


Answer 2

points_small = dict(filter(lambda (a,(b,c)): b<5 and c < 5, points.items()))

Note: this relies on tuple parameter unpacking in the lambda, which only works in Python 2; it was removed in Python 3 (PEP 3113).
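A Python 3 equivalent of the same filter() call unpacks the item inside the lambda body instead of in its parameter list. This sketch assumes each value is an (x, y) pair.

```python
points = {'a': (3, 4), 'b': (1, 2), 'c': (5, 5), 'd': (3, 3)}

# Each item is (key, (x, y)); index into it instead of unpacking in the signature
points_small = dict(filter(lambda item: item[1][0] < 5 and item[1][1] < 5, points.items()))
print(points_small)   # {'a': (3, 4), 'b': (1, 2), 'd': (3, 3)}
```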

Answer 3

>>> points = {'a': (3, 4), 'c': (5, 5), 'b': (1, 2), 'd': (3, 3)}
>>> dict(filter(lambda x: x[1][0] < 5 and x[1][1] < 5, points.items()))
{'a': (3, 4), 'b': (1, 2), 'd': (3, 3)}
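Tuple comparison in Python is lexicographic. A test such as (x, y) < (5, 5) is therefore not the same as requiring both coordinates to be below 5. The small check below illustrates the difference.

```python
point = (4, 10)

# Tuples compare lexicographically: 4 < 5 decides, so 10 is never compared with 5
lexicographic = point < (5, 5)
# An element-wise check looks at every coordinate
elementwise = all(c < 5 for c in point)

print(lexicographic, elementwise)   # True False
```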

Answer 4

dict((k, v) for (k, v) in points.iteritems() if v[0] < 5 and v[1] < 5)

Answer 5

I think that Alex Martelli's answer is definitely the most elegant way to do this, but I just wanted to add a way to satisfy your wish for a super awesome dictionary.filter(f) method in a Pythonic sort of way:

class FilterDict(dict):
    def filter(self, criteria):
        # iterate over a snapshot of the items so keys can be removed safely
        for key, value in list(self.items()):
            # keep only the entries for which criteria(value) is true
            if not criteria(value):
                self.pop(key)

my_dict = FilterDict( {'a':(3,4), 'b':(1,2), 'c':(5,5), 'd':(3,3)} )
my_dict.filter(lambda x: x[0] < 5 and x[1] < 5)

Basically we create a class that inherits from dict (whose constructor already copies the input dict) and adds the filter method. We need to iterate over a copy of the items, list(self.items()), because removing keys while iterating over the dict itself raises an exception.


Answer 6

dict((k, v) for (k, v) in points.iteritems() if v[0] < 5 and v[1] < 5)

Remove None value from a list without removing the 0 value

Question: Remove None value from a list without removing the 0 value

This is what I started with.

My list:

L = [0, 23, 234, 89, None, 0, 35, 9]

When I run this:

L = filter(None, L)

I get this result:

[23, 234, 89, 35, 9]

But this is not what I need. What I really need is:

[0, 23, 234, 89, 0, 35, 9]

Because I'm calculating percentiles of the data, the 0s make a lot of difference.

How do I remove the None values from a list without removing the 0 values?


Answer 0

>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9]

Just for fun, here's how you can adapt filter to do this without using a lambda (I wouldn't recommend this code; it's just for scientific purposes):

>>> from operator import is_not
>>> from functools import partial
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> list(filter(partial(is_not, None), L))
[0, 23, 234, 89, 0, 35, 9]

Answer 1

FWIW, Python 3 makes this problem easy:

>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> list(filter(None.__ne__, L))
[0, 23, 234, 89, 0, 35, 9]

In Python 2, you would use a list comprehension instead:

>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9]

A list comprehension is likely the cleanest way:

>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9]

There is also a functional programming approach but it is more involved:

>>> from operator import is_not
>>> from functools import partial
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> list(filter(partial(is_not, None), L))
[0, 23, 234, 89, 0, 35, 9]

Answer 2

For Python 2.7 (see Raymond's answer for the Python 3 equivalent):

Checking whether something “is not None” is so common in Python (and other OO languages) that in my Common.py (which I import into each module with “from Common import *”), I include these lines:

def exists(it):
    return (it is not None)

Then to remove None elements from a list, simply do:

filter(exists, L)

I find this easier to read than the corresponding list comprehension (which Raymond shows as his Python 2 version).


Answer 3

Using a list comprehension, this can be done as follows:

l = [i for i in L if i is not None]

The value of l is:

[0, 23, 234, 89, 0, 35, 9]

Answer 4

@jamylak's answer is quite nice; however, if you don't want to import a couple of modules just for this simple task, write your own lambda in place:

>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> list(filter(lambda v: v is not None, L))
[0, 23, 234, 89, 0, 35, 9]

Answer 5

Iteration vs. space usage could be an issue. In different situations, profiling may show either approach to be “faster” and/or less memory intensive.

# first
>>> L = [0, 23, 234, 89, None, 0, 35, 9, ...]
>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9, ...]

# second
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> for i in range(L.count(None)): L.remove(None)
...
>>> L
[0, 23, 234, 89, 0, 35, 9]

The first approach (as also suggested by @jamylak, @Raymond Hettinger, and @Dipto) creates a duplicate list in memory, which could be costly for a large list with few None entries.

The second approach goes through the list once to count the Nones, and then again for each remove() call, which scans from the start until it reaches a None. This could be less memory intensive, and the list gets smaller as it goes. The shrinking list could speed things up when many None entries are at the front, but the worst case is when many None entries are at the back.

The second approach would likely always be slower than the first approach. That does not make it an invalid consideration.

Parallelization and in-place techniques are other approaches, but each has its own complications in Python. Knowing the data and the runtime use cases, as well as profiling the program, is where to start for intensive operations or large data.

Choosing either approach will probably not matter in common situations; it becomes more a matter of notational preference. In those uncommon circumstances, numpy (for example, if L is a numpy.array: L = L[L != numpy.array(None)] (from here)) or cython may be worthwhile alternatives to micromanaging Python optimizations.
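One example of the in-place techniques mentioned above is slice assignment. It replaces the list's contents while keeping the same list object, so other references to it see the change. This is a sketch, not a benchmark result: the comprehension still builds a temporary list before it is copied back.

```python
L = [0, 23, 234, 89, None, 0, 35, 9]
alias = L  # another reference to the same list object

# Replace the contents in place; `alias` sees the filtered values too
L[:] = [x for x in L if x is not None]

print(L)           # [0, 23, 234, 89, 0, 35, 9]
print(alias is L)  # True
```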


Answer 6

from operator import is_not
from functools import partial   

filter_null = partial(filter, partial(is_not, None))

# A test case
L = [1, None, 2, None, 3]
L = list(filter_null(L))

Answer 7

If it is all a list of lists, you could modify sir @Raymond's answer:

L = [[None], [123], [None], [151]]
no_none_val = list(filter(None.__ne__, [x[0] for x in L]))

For Python 2, however:

no_none_val = [x[0] for x in L if x[0] is not None]

Both return [123, 151].

In general: [variable[0] for variable in List if variable[0] is not None]


Answer 8

Say the list is like below

iterator = [None, 1, 2, 0, '', None, False, {}, (), []]

This will return only those items whose bool(item) is True

print filter(lambda item: item, iterator)
# [1, 2]

This is equivalent to

print [item for item in iterator if item]

To just filter None:

print filter(lambda item: item is not None, iterator)
# [1, 2, 0, '', False, {}, (), []]

Equivalent to:

print [item for item in iterator if item is not None]

To get all the items that evaluate to False

print filter(lambda item: not item, iterator)
# Will print [None, 0, '', None, False, {}, (), []]
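The snippets above use Python 2 syntax: the print statement, and filter() returning a list. Here is a sketch of the same three filters in Python 3. There, filter() returns a lazy iterator that has to be wrapped in list(). Passing None as the function to filter() keeps the truthy items, just like lambda item: item.

```python
items = [None, 1, 2, 0, '', None, False, {}, (), []]

# filter() returns a lazy iterator in Python 3, so wrap it in list() to see the items
truthy = list(filter(None, items))                 # function None: keep items where bool(item) is True
not_none = list(filter(lambda item: item is not None, items))
falsy = [item for item in items if not item]       # everything that evaluates to False

print(truthy)    # [1, 2]
print(not_none)  # [1, 2, 0, '', False, {}, (), []]
print(falsy)     # [None, 0, '', None, False, {}, (), []]
```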

How to use filter, map, and reduce in Python 3

Question: How to use filter, map, and reduce in Python 3

filter, map, and reduce work perfectly in Python 2. Here is an example:

>>> def f(x):
        return x % 2 != 0 and x % 3 != 0
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]

>>> def cube(x):
        return x*x*x
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

>>> def add(x,y):
        return x+y
>>> reduce(add, range(1, 11))
55

But in Python 3, I receive the following outputs:

>>> filter(f, range(2, 25))
<filter object at 0x0000000002C14908>

>>> map(cube, range(1, 11))
<map object at 0x0000000002C82B70>

>>> reduce(add, range(1, 11))
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    reduce(add, range(1, 11))
NameError: name 'reduce' is not defined

I would appreciate it if someone could explain to me why this is.



Answer 0

You can read about the changes in What’s New In Python 3.0. You should read it thoroughly when you move from 2.x to 3.x since a lot has been changed.

The rest of this answer is quoted from the documentation.

Views And Iterators Instead Of Lists

Some well-known APIs no longer return lists:

  • […]
  • map() and filter() return iterators. If you really need a list, a quick fix is e.g. list(map(...)), but a better fix is often to use a list comprehension (especially when the original code uses lambda), or rewriting the code so it doesn’t need a list at all. Particularly tricky is map() invoked for the side effects of the function; the correct transformation is to use a regular for loop (since creating a list would just be wasteful).
  • […]

Builtins

  • […]
  • Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.
  • […]
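Here is a short sketch of the three recommendations quoted above. It shows functools.reduce(), then the explicit for loop that usually reads better, then a plain for loop in place of map() called only for its side effects. In Python 3 a bare map() call does nothing until its iterator is consumed. The numbers are arbitrary.

```python
import operator
from functools import reduce

numbers = range(1, 11)

# reduce() is no longer a builtin; it lives in functools
total_reduce = reduce(operator.add, numbers)

# the explicit loop the documentation says is usually more readable
total_loop = 0
for n in numbers:
    total_loop += n

# map() used only for side effects: the iterator is lazy, so nothing runs yet
seen = []
pending = map(seen.append, numbers)
assert seen == []  # no append has happened

# the recommended translation: a regular for loop
for n in numbers:
    seen.append(n)

print(total_reduce, total_loop, seen)  # 55 55 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```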

Answer 1

The functionality of map and filter was intentionally changed to return iterators, and reduce was removed from being a built-in and placed in functools.reduce.

So, for filter and map, you can wrap them with list() to see the results like you did before.

>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> list(filter(f, range(2, 25)))
[5, 7, 11, 13, 17, 19, 23]
>>> def cube(x): return x*x*x
...
>>> list(map(cube, range(1, 11)))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
>>> import functools
>>> def add(x,y): return x+y
...
>>> functools.reduce(add, range(1, 11))
55
>>>

The recommendation now is that you replace your usage of map and filter with generator expressions or list comprehensions. Example:

>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> [i for i in range(2, 25) if f(i)]
[5, 7, 11, 13, 17, 19, 23]
>>> def cube(x): return x*x*x
...
>>> [cube(i) for i in range(1, 11)]
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
>>>
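The recommendation above also mentions generator expressions, which the example does not show. The sketch below reuses f and cube. Like Python 3's filter() and map(), a generator expression is lazy, so it can be fed straight into a consumer such as list() or sum() without building an intermediate list. Using sum() for the reduce(add, ...) case is a substitution, not part of the original answer.

```python
def f(x):
    return x % 2 != 0 and x % 3 != 0

def cube(x):
    return x * x * x

# generator expressions: evaluated lazily, one item at a time
selected = (i for i in range(2, 25) if f(i))
cubes = (cube(i) for i in range(1, 11))

selected_list = list(selected)   # [5, 7, 11, 13, 17, 19, 23]
cube_total = sum(cubes)          # 3025, without storing the ten cubes in a list
print(selected_list, cube_total)
```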

They say that for loops are 99 percent of the time easier to read than reduce, but I’d just stick with functools.reduce.

Edit: The 99 percent figure is pulled directly from the What’s New In Python 3.0 page authored by Guido van Rossum.


Answer 2

As an addendum to the other answers, this sounds like a fine use-case for a context manager that will re-map the names of these functions to ones which return a list and introduce reduce in the global namespace.

A quick implementation might look like this:

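from contextlib import contextmanager
from functools import reduce as _reduce

_MISSING = object()

@contextmanager
def noiters(*funcs):
    # Note: globals() is the namespace of the module that defines noiters,
    # so the swap is only visible to code in that same module.
    if not funcs:
        funcs = [map, filter, zip]  # etc
    g = globals()
    names = ['reduce'] + [func.__name__ for func in funcs]
    saved = {name: g.get(name, _MISSING) for name in names}  # remember current bindings
    g['reduce'] = _reduce
    for func in funcs:
        # func=func binds each builtin now, so every wrapper calls its own function
        g[func.__name__] = lambda *ar, func=func, **kwar: list(func(*ar, **kwar))
    try:
        yield
    finally:
        # put back whatever the names referred to before, or remove them again
        for name, value in saved.items():
            if value is _MISSING:
                g.pop(name, None)
            else:
                g[name] = value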

With a usage that looks like this:

with noiters(map):
    from operator import add
    print(reduce(add, range(1, 20)))
    print(map(int, ['1', '2']))

Which prints:

190
[1, 2]

Just my 2 cents :-)


Answer 3

Since reduce has been removed from the built-in functions in Python 3, don’t forget to import functools in your code. Please look at the code snippet below.

import functools
my_list = [10,15,20,25,35]
sum_numbers = functools.reduce(lambda x, y: x + y, my_list)
print(sum_numbers)
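Here are two related details, as a sketch. First, operator.add can replace the lambda. Second, reduce() accepts an optional initial value as a third argument, which also makes it safe on an empty list. Without that argument, reduce() raises TypeError on an empty sequence.

```python
import functools
import operator

my_list = [10, 15, 20, 25, 35]

# operator.add does the same job as lambda x, y: x + y
total = functools.reduce(operator.add, my_list)        # 105

# the third argument is the start value; it is returned for an empty input
empty_total = functools.reduce(operator.add, [], 0)    # 0

# without a start value, an empty input raises TypeError
try:
    functools.reduce(operator.add, [])
    empty_error = None
except TypeError as exc:
    empty_error = exc

print(total, empty_total, type(empty_error).__name__)  # 105 0 TypeError
```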

Answer 4

Here are examples of the filter, map and reduce functions.

numbers = [10,11,12,22,34,43,54,34,67,87,88,98,99,87,44,66]

# Filter

oddNumbers = list(filter(lambda x: x%2 != 0, numbers))

print(oddNumbers)

# Map

multiplyOf2 = list(map(lambda x: x*2, numbers))

print(multiplyOf2)

# Reduce

The reduce function, since it is not commonly used, was removed from the built-in functions in Python 3. It is still available in the functools module, so you can do:

from functools import reduce

sumOfNumbers = reduce(lambda x,y: x+y, numbers)

print(sumOfNumbers)
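For comparison, the same three operations can be written with comprehensions and the built-in sum(), which the rest of this thread discusses. The assertions check that each version matches its filter/map/reduce counterpart on the numbers list above.

```python
from functools import reduce

numbers = [10, 11, 12, 22, 34, 43, 54, 34, 67, 87, 88, 98, 99, 87, 44, 66]

odd_numbers = [x for x in numbers if x % 2 != 0]   # same as list(filter(...))
doubled = [x * 2 for x in numbers]                  # same as list(map(...))
total = sum(numbers)                                # same as reduce(lambda x, y: x + y, numbers)

assert odd_numbers == list(filter(lambda x: x % 2 != 0, numbers))
assert doubled == list(map(lambda x: x * 2, numbers))
assert total == reduce(lambda x, y: x + y, numbers)
print(odd_numbers)  # [11, 43, 67, 87, 99, 87]
print(total)        # 856
```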


Answer 5

One of the advantages of map, filter and reduce is how legible they become when you “chain” them together to do something complex. However, the built-in syntax isn’t legible and is all “backwards”. So, I suggest using the PyFunctional package (https://pypi.org/project/PyFunctional/). Here’s a comparison of the two:

flight_destinations_dict = {'NY': {'London', 'Rome'}, 'Berlin': {'NY'}}

PyFunctional version

Very legible syntax. You can say:

“I have a sequence of flight destinations. Out of which I want to get the dict key if city is in the dict values. Finally, filter out the empty lists I created in the process.”

from functional import seq  # PyFunctional package to allow easier syntax

def find_return_flights_PYFUNCTIONAL_SYNTAX(city, flight_destinations_dict):
    return seq(flight_destinations_dict.items()) \
        .map(lambda x: x[0] if city in x[1] else []) \
        .filter(lambda x: x != []) \
        .to_list()  # materialize the lazy Sequence as a plain list, like the default version

Default Python version

It’s all backwards. You need to say:

“OK, so, there’s a list. I want to filter empty lists out of it. Why? Because I first got the dict key if the city was in the dict values. Oh, the list I’m doing this to is flight_destinations_dict.”

def find_return_flights_DEFAULT_SYNTAX(city, flight_destinations_dict):
    return list(
        filter(lambda x: x != [],
               map(lambda x: x[0] if city in x[1] else [], flight_destinations_dict.items())
               )
    )
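For comparison, here is the same lookup written as a plain list comprehension, using only the standard library. It also reads left to right, and it never creates the placeholder empty lists that the map/filter versions have to remove again. The function name is illustrative.

```python
flight_destinations_dict = {'NY': {'London', 'Rome'}, 'Berlin': {'NY'}}

def find_return_flights_COMPREHENSION(city, flight_destinations_dict):
    # keep each origin whose destination set contains `city`
    return [origin for origin, destinations in flight_destinations_dict.items()
            if city in destinations]

print(find_return_flights_COMPREHENSION('NY', flight_destinations_dict))      # ['Berlin']
print(find_return_flights_COMPREHENSION('London', flight_destinations_dict))  # ['NY']
```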

List comprehension vs. lambda + filter

Question: List comprehension vs. lambda + filter

I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.

My code looked like this:

my_list = [x for x in my_list if x.attribute == value]

But then I thought, wouldn’t it be better to write it like this?

my_list = filter(lambda x: x.attribute == value, my_list)

It’s more readable, and if needed for performance the lambda could be taken out to gain something.

Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way™ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?


Answer 0

It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.

There are two things that may slow down your use of filter.

The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn’t think much about performance until you’ve timed your code and found it to be a bottleneck, but the difference will be there.

The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessing value through a closure and this difference won’t apply.

The other option to consider is to use a generator instead of a list comprehension:

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

Then in your main code (which is where readability really matters) you’ve replaced both list comprehension and filter with a hopefully meaningful function name.
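Here is a usage sketch of filterbyvalue. Item is a hypothetical namedtuple standing in for the objects with an attribute field. Because the function is a generator, it yields matches lazily, so wrap it in list() when an actual list is needed.

```python
from collections import namedtuple

# hypothetical record type standing in for "items with an attribute"
Item = namedtuple("Item", ["name", "attribute"])

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

items = [Item("a", 1), Item("b", 2), Item("c", 1)]
matches = list(filterbyvalue(items, 1))
print([m.name for m in matches])  # ['a', 'c']
```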


Answer 1

This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.

Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value] as all the behaviour is on the surface not inside the filter function.

I would not worry too much about the performance difference between the two approaches, as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application, which is unlikely.

Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)


Answer 2

Since any speed difference is bound to be minuscule, whether to use filter or list comprehensions comes down to a matter of taste. In general I’m inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer filter.

A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):

[x for x in X if P(x)]

but sometimes you want to apply some function to the values first:

[f(x) for x in X if P(f(x))]


As a specific example, consider

primes_cubed = [x*x*x for x in range(1000) if prime(x)]

I think this looks slightly better than using filter. But now consider

prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]

In this case we want to filter against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I’d be apt to use

prime_cubes = filter(prime, [x*x*x for x in range(1000)])
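In Python 3 the last line produces a lazy filter object, so wrap it in list() if you need a list. Also, the cube of an integer is never prime, so prime_cubes above would come out empty. The runnable sketch below therefore swaps in a hypothetical f(x) = x*x + 1 and a simple trial-division is_prime; neither comes from the answer. It also shows the assignment expression (:=, Python 3.8+), which avoids writing and computing the expression twice inside a comprehension.

```python
def is_prime(n):
    # simple trial division; fine for small illustrative inputs
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def f(x):
    # hypothetical "expensive" transformation standing in for x*x*x
    return x * x + 1

# filter over the computed values: f is evaluated once per element
via_filter = list(filter(is_prime, (f(x) for x in range(10))))

# Python 3.8+ alternative: the walrus operator also avoids computing f(x) twice
via_walrus = [y for x in range(10) if is_prime(y := f(x))]

print(via_filter)  # [2, 5, 17, 37]
```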

Answer 3

Although filter may be the “faster way”, the “Pythonic way” would be not to care about such things unless performance is absolutely critical (in which case you wouldn’t be using Python!).


Answer 4

I thought I’d just add that in Python 3, filter() is actually an iterator object, so you’d have to pass your filter call to list() in order to build the filtered list. So in Python 2:

lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)

Lists b and c have the same values and were completed in about the same time, as filter() was equivalent to [x for x in y if z]. However, in Python 3, this same code would leave lst_c containing a filter object, not a filtered list. To produce the same values in Python 3:

lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num %2 == 0, lst_a))

The problem is that list() takes an iterable as its argument and creates a new list from that argument. The result is that using filter with a lambda this way in Python 3 can take up to twice as long as the [x for x in y if z] method. Note that list() does not iterate twice: it consumes the filter iterator, which walks the original list once, so the extra cost comes from calling the lambda for every element.
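You can see the Python 3 behaviour directly. The filter object is a lazy, one-shot iterator: once list() has consumed it, a second pass yields nothing.

```python
lst_a = range(25)  # arbitrary data
evens = filter(lambda num: num % 2 == 0, lst_a)

print(type(evens).__name__)  # filter
first_pass = list(evens)     # list() walks the iterator (and lst_a) once
second_pass = list(evens)    # the iterator is now exhausted
print(first_pass == [num for num in lst_a if num % 2 == 0])  # True
print(second_pass)           # []
```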


Answer 5

An important difference is that a list comprehension returns a list, while filter returns a filter object, which you cannot manipulate like a list (i.e. you cannot call len on it).

My own self-learning brought me to some similar issue.

That being said, if there is a way to have the resulting list from a filter, a bit like you would do in .NET when you do lst.Where(i => i.something()).ToList(), I am curious to know it.

EDIT: This is the case for Python 3, not 2 (see discussion in comments).
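In Python 3, the counterpart of .ToList() is simply to pass the filter object to list(), after which len() and indexing work. The Thing class below is a hypothetical stand-in for objects with a something() method.

```python
class Thing:
    """Hypothetical item type with a something() predicate, mirroring the .NET example."""
    def __init__(self, flag):
        self.flag = flag

    def something(self):
        return self.flag

lst = [Thing(True), Thing(False), Thing(True)]

lazy = filter(lambda i: i.something(), lst)
try:
    len(lazy)                  # filter objects have no length
except TypeError:
    print("len() is not supported on a filter object")

as_list = list(filter(lambda i: i.something(), lst))   # roughly .Where(...).ToList()
print(len(as_list))  # 2
```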


Answer 6

I find the second way more readable. It tells you exactly what the intention is: filter the list.
PS: do not use ‘list’ as a variable name


Answer 7

Generally, filter is slightly faster if it is used with a builtin function.

I would expect the list comprehension to be slightly faster in your case.
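Here is a rough way to check both claims on your own machine. str.isdigit is implemented in C, so filter can call it without running Python bytecode per element. The lambda version pays for a Python function call on every element. The word list is made up, and relative timings depend on the Python version and the data, so no numbers are given here.

```python
import timeit

words = ["abc", "123", "4x", "567", ""] * 2000  # made-up sample data

def with_builtin():
    # filter with a C-implemented method: no Python-level call per element
    return list(filter(str.isdigit, words))

def with_lambda():
    # filter with a lambda: one Python function call per element
    return list(filter(lambda w: w.isdigit(), words))

def with_listcomp():
    return [w for w in words if w.isdigit()]

assert with_builtin() == with_lambda() == with_listcomp()

for fn in (with_builtin, with_lambda, with_listcomp):
    print(fn.__name__, timeit.timeit(fn, number=50))
```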


Answer 8

Filter is just that: it filters out elements of a list, and the definition in the official documentation says the same. A list comprehension, on the other hand, produces a new list after doing something to each element of the previous list. (Both filter and list comprehensions create a new list rather than modifying the old list in place. A "new list" here means something like a list with, say, an entirely new data type, e.g. integers converted to strings.)

In your example, as per the definition, it is better to use filter than a list comprehension. However, if you want to retrieve, say, other_attribute from the list elements as a new list, you can use a list comprehension:

return [item.other_attribute for item in my_list if item.attribute==value]

This is how I remember the difference: to remove a few things from a list and keep the other elements intact, use filter; to apply your own logic to the elements and build a transformed list suited to some purpose, use a list comprehension.
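Here is a small sketch of that rule of thumb. Item is a hypothetical record standing in for the objects in the question. filter keeps the matching items themselves, while the comprehension builds a new list of a different kind (here, the other_attribute values).

```python
from collections import namedtuple

Item = namedtuple("Item", ["attribute", "other_attribute"])  # hypothetical record type

my_list = [Item(1, "a"), Item(2, "b"), Item(1, "c")]
value = 1

# filter: same elements, some removed
kept = list(filter(lambda item: item.attribute == value, my_list))

# list comprehension: filter and transform in one step
others = [item.other_attribute for item in my_list if item.attribute == value]

print(kept)    # [Item(attribute=1, other_attribute='a'), Item(attribute=1, other_attribute='c')]
print(others)  # ['a', 'c']
```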


Answer 9

Here’s a short piece I use when I need to filter on something after the list comprehension. Just a combination of filter, lambda, and lists (otherwise known as the loyalty of a cat and the cleanliness of a dog).

In this case I’m reading a file, stripping out blank lines, commented out lines, and anything after a comment on a line:

# Throw out blank lines and comments
with open('file.txt', 'r') as lines:        
    # From the inside out:
    #    [s.partition('#')[0].strip() for s in lines]... Throws out comments
    #   filter(lambda x: x!= '', [s.part... Filters out blank lines
    #  y for y in filter... Converts filter object to list
    file_contents = [y for y in filter(lambda x: x != '', [s.partition('#')[0].strip() for s in lines])]
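Here is a runnable variant of the same idea. It assumes an in-memory io.StringIO in place of file.txt so it runs without a file. It uses a generator expression for the inner step and filter(None, ...), which drops empty strings just as lambda x: x != '' does here.

```python
import io

# io.StringIO stands in for open('file.txt') so the sketch runs without a file
sample = io.StringIO("alpha\n\n# a full-line comment\nbeta  # trailing comment\n   \ngamma\n")

stripped = (s.partition('#')[0].strip() for s in sample)   # drop comments and surrounding whitespace
file_contents = list(filter(None, stripped))                 # filter(None, ...) drops the empty strings
print(file_contents)  # ['alpha', 'beta', 'gamma']
```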

Answer 10

In addition to the accepted answer, there is a corner case when you should use filter instead of a list comprehension. If the list is unhashable, you cannot directly process it with a list comprehension. A real-world example is using pyodbc to read results from a database: the results of fetchall() on a cursor are an unhashable list. In this situation, to manipulate the returned results directly, filter should be used:

cursor.execute("SELECT * FROM TABLE1;")
data_from_db = cursor.fetchall()
processed_data = filter(lambda s: 'abc' in s.field1 or s.StartTime >= start_date_time, data_from_db) 

If you use list comprehension here you will get the error:

TypeError: unhashable type: 'list'
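For what it's worth, a list comprehension does not itself require the items to be hashable. The sketch below filters a list of lists with both a comprehension and filter() and gets the same result; the plain lists stand in for database rows, which is an assumption. If you hit the unhashable-type error, check whether something else in the expression needs a hash, such as using a row as a dict key or set member.

```python
rows = [["abc1", 5], ["xyz", 10], ["abc2", 1]]   # hypothetical rows: [field1, count]

via_comprehension = [r for r in rows if "abc" in r[0]]
via_filter = list(filter(lambda r: "abc" in r[0], rows))
print(via_comprehension == via_filter)  # True

# hashing a list is what raises the "unhashable type" TypeError, e.g. putting it in a set
try:
    {rows[0]}
except TypeError as exc:
    print(exc)
```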


Answer 11

It took me some time to get familiar with the higher-order functions filter and map. So I got used to them, and I actually liked filter, as it was explicit that it filters by keeping whatever is truthy, and I felt cool that I knew some functional programming terms.

Then I read this passage (Fluent Python Book):

The map and filter functions are still builtins in Python 3, but since the introduction of list comprehensions and generator expressions, they are not as important. A listcomp or a genexp does the job of map and filter combined, but is more readable.

And now I think: why bother with the concept of filter/map if you can achieve the same with already widespread idioms like list comprehensions? Furthermore, map and filter are themselves functions that take a function argument, and for that argument I prefer anonymous functions (lambdas).

Finally, just for the sake of having it tested, I’ve timed both methods (map and listComp) and I didn’t see any relevant speed difference that would justify making arguments about it.

from timeit import Timer

timeMap = Timer(lambda: list(map(lambda x: x*x, range(10**7))))
print(timeMap.timeit(number=100))

timeListComp = Timer(lambda:[(lambda x: x*x) for x in range(10**7)])
print(timeListComp.timeit(number=100))

#Map:                 166.95695265199174
#List Comprehension   177.97208347299602
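One thing to watch in the benchmark above: the comprehension [(lambda x: x*x) for x in range(10**7)] builds a list of ten million lambda objects rather than squares, so the two timings do not measure the same work. The sketch below is a like-for-like comparison, with smaller sizes so it finishes quickly. No expected timings are given because they vary by machine and Python version.

```python
from timeit import Timer

N = 10**5

# map with a lambda: one Python-level function call per element
time_map = Timer(lambda: list(map(lambda x: x * x, range(N)))).timeit(number=20)
# list comprehension computing the same squares (not a list of lambda objects)
time_listcomp = Timer(lambda: [x * x for x in range(N)]).timeit(number=20)

assert list(map(lambda x: x * x, range(N))) == [x * x for x in range(N)]
print(f"map: {time_map:.3f}s  listcomp: {time_listcomp:.3f}s")
```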

Answer 12

Curiously, on Python 3 I see filter performing faster than list comprehensions.

I always thought that list comprehensions would be more performant. For something like [name for name in brand_names_db if name is not None], the bytecode generated is a bit better.

>>> from dis import disassemble
>>> def f1(seq):
...     return list(filter(None, seq))
>>> def f2(seq):
...     return [i for i in seq if i is not None]
>>> disassemble(f1.__code__)
2         0 LOAD_GLOBAL              0 (list)
          2 LOAD_GLOBAL              1 (filter)
          4 LOAD_CONST               0 (None)
          6 LOAD_FAST                0 (seq)
          8 CALL_FUNCTION            2
         10 CALL_FUNCTION            1
         12 RETURN_VALUE
>>> disassemble(f2.__code__)
2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10cfcaa50, file "<stdin>", line 2>)
          2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
          4 MAKE_FUNCTION            0
          6 LOAD_FAST                0 (seq)
          8 GET_ITER
         10 CALL_FUNCTION            1
         12 RETURN_VALUE

But they are actually slower:

>>> from timeit import timeit
>>> timeit(stmt="f1(range(1000))", setup="from __main__ import f1,f2")
21.177661532000116
>>> timeit(stmt="f2(range(1000))", setup="from __main__ import f1,f2")
42.233950221000214
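A caveat worth checking: filter(None, seq) drops every falsy item (0, '', ...), not only None, so f1 and f2 are not equivalent. On range(1000), for example, f1 also drops 0. The sketch below compares like with like. f3 uses partial(is_not, None) to drop only None through filter, and f4 is a comprehension that drops all falsy items. The sample data is made up, and timings will vary.

```python
from functools import partial
from operator import is_not
from timeit import timeit

seq = [0, 1, None, 2, '', None, 3] * 1000  # made-up data with None and other falsy items

def f1(seq):
    return list(filter(None, seq))                    # drops every falsy item

def f2(seq):
    return [i for i in seq if i is not None]          # drops only None

def f3(seq):
    return list(filter(partial(is_not, None), seq))   # drops only None, like f2

def f4(seq):
    return [i for i in seq if i]                      # drops every falsy item, like f1

assert f1(seq) == f4(seq)
assert f2(seq) == f3(seq)
assert f1(seq) != f2(seq)  # 0 and '' survive in f2 but not in f1

for fn in (f1, f2, f3, f4):
    print(fn.__name__, timeit(lambda: fn(seq), number=100))
```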

Answer 13

My take

def filter_list(items, key, value, limit=None):
    # keep items whose [key] equals value; a limit of None returns them all
    return [i for i in items if i[key] == value][:limit]
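The version above builds the complete filtered list before slicing it. Here is a sketch of a lazy variant using itertools.islice, which stops scanning once limit matches have been found. islice with a stop of None yields everything, mirroring [:None]. The parameter is named items so it does not shadow the built-in list, and the sample rows are made up.

```python
from itertools import islice

def filter_list_lazy(items, key, value, limit=None):
    # generator expression: matching items are produced one at a time
    matches = (i for i in items if i[key] == value)
    # islice stops after `limit` matches; limit=None keeps all of them
    return list(islice(matches, limit))

rows = [{"type": "a", "n": 1}, {"type": "b", "n": 2}, {"type": "a", "n": 3}, {"type": "a", "n": 4}]
print([r["n"] for r in filter_list_lazy(rows, "type", "a")])           # [1, 3, 4]
print([r["n"] for r in filter_list_lazy(rows, "type", "a", limit=2)])  # [1, 3]
```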