问题:Python:根据条件拆分列表?

从美学角度和性能角度来看,基于条件将项目列表拆分为多个列表的最佳方法是什么?相当于:

good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]

有没有更优雅的方法可以做到这一点?

更新:这是实际用例,以更好地解释我正在尝试做的事情:

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

What’s the best way, both aesthetically and from a performance perspective, to split a list of items into multiple lists based on a conditional? The equivalent of:

good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]

is there a more elegant way to do this?

Update: here’s the actual use case, to better explain what I’m trying to do:

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

回答 0

good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]

有没有更优雅的方法可以做到这一点?

该代码完全可读,而且非常清晰!

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

再次,这很好!

使用集合可能会稍微改善性能,但这是一个微不足道的区别,我发现列表理解要容易阅读得多,您不必担心顺序被弄乱了,重复项被删除等等。

实际上,我可能会“向后”前进另一步,只需使用简单的for循环即可:

images, anims = [], []

for f in files:
    if f.lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)

列表理解或使用set()都很好,除非您需要添加其他检查或其他逻辑-说您要删除所有0字节jpeg,只需添加类似内容即可。

if f[1] == 0:
    continue
good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]

is there a more elegant way to do this?

That code is perfectly readable, and extremely clear!

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

Again, this is fine!

There might be slight performance improvements using sets, but it’s a trivial difference, and I find the list comprehension far easier to read, and you don’t have to worry about the order being messed up, duplicates being removed as so on.

In fact, I may go another step “backward”, and just use a simple for loop:

images, anims = [], []

for f in files:
    if f.lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)

The a list-comprehension or using set() is fine until you need to add some other check or another bit of logic – say you want to remove all 0-byte jpeg’s, you just add something like..

if f[1] == 0:
    continue

回答 1

good, bad = [], []
for x in mylist:
    (bad, good)[x in goodvals].append(x)
good, bad = [], []
for x in mylist:
    (bad, good)[x in goodvals].append(x)

回答 2

这是惰性迭代器方法:

from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

它对每个项目评估一次条件,然后返回两个生成器,第一个生成器从条件为true的序列中产生值,另一个从条件为false的序列中产生值。

因为它很懒,所以您可以在任何迭代器上使用它,甚至可以在无限迭代器上使用它:

from itertools import count, islice

def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))

primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))

通常,虽然非惰性列表返回方法更好:

def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b

编辑:对于您更具体的用例,可以通过某个键将项目拆分为不同的列表,下面是一个通用函数:

DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

        seq - input sequence
        resultmapping - a dictionary that maps from target lists to keys that go to that list
        keyfunc - function to calculate the key of an input value
        default - the target where items that don't have a corresponding key go, by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)

    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE

    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)

    return result_lists

用法:

def file_extension(f):
    return f[2].lower()

split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']

Here’s the lazy iterator approach:

from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

It evaluates the condition once per item and returns two generators, first yielding values from the sequence where the condition is true, the other where it’s false.

Because it’s lazy you can use it on any iterator, even an infinite one:

from itertools import count, islice

def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))

primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))

Usually though the non-lazy list returning approach is better:

def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b

Edit: For your more specific usecase of splitting items into different lists by some key, heres a generic function that does that:

DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

        seq - input sequence
        resultmapping - a dictionary that maps from target lists to keys that go to that list
        keyfunc - function to calculate the key of an input value
        default - the target where items that don't have a corresponding key go, by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)

    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE

    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)

    return result_lists

Usage:

def file_extension(f):
    return f[2].lower()

split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']

回答 3

所有提出的解决方案的问题在于它将扫描并应用两次过滤功能。我会做一个简单的小函数,像这样:

def SplitIntoTwoLists(l, f):
  a = []
  b = []
  for i in l:
    if f(i):
      a.append(i)
    else:
      b.append(i)
 return (a,b)

这样一来,您不会处理任何事情两次,也不会重复代码。

Problem with all proposed solutions is that it will scan and apply the filtering function twice. I’d make a simple small function like this:

def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b

That way you are not processing anything twice and also are not repeating code.


回答 4

我接受。我提出了一个惰性的单遍partition函数,该函数可以保留输出子序列中的相对顺序。

1.要求

我认为要求是:

  • 保持元素的相对顺序(因此,没有集合和字典)
  • 每个元素仅评估一次条件(因此不使用(ifiltergroupby
  • 允许任意使用任一序列的延迟消耗(如果我们能够对它们进行预先计算,那么幼稚的实现也可能是可以接受的)

2. split图书馆

我的partition函数(在下面介绍)和其他类似的函数已将其集成到一个小的库中:

通常可以通过PyPI安装:

pip install --user split

要根据条件拆分列表,请使用partition函数:

>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]

3. partition功能说明

在内部,我们需要一次构建两个子序列,因此仅消耗一个输出序列将迫使另一序列也要计算。并且我们需要在用户请求之间保持状态(存储已处理但尚未请求的元素)。为了保持状态,我使用了两个双端队列(deques):

from collections import deque

SplitSeq 全班负责客房整理:

class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)

魔术发生在它的.getNext()方法上。它几乎就像.next() 迭代器一样,但是允许指定这次我们想要哪种元素。在后台,它不会丢弃被拒绝的元素,而是将它们放入两个队列之一:

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1: # exit on StopIteration
                n = self.seq.next()
                if cond(n):
                    return n
                else:
                    those.append(n)

最终用户应该使用partition功能。它接受一个条件函数和一个序列(就像mapfilter),并返回两个生成器。第一个生成器构建条件成立的元素的子序列,第二个生成器构建互补子序列。迭代器和生成器允许延迟分解甚至很长或无限的序列。

def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition == None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            yield ss.getNext(getGood=True)
    def bads():
        while 1:
            yield ss.getNext(getGood=False)
    return goods(), bads()

我选择了测试功能是第一个参数,以便在未来的部分应用程序(类似于如何mapfilter 有测试功能作为第一个参数)。

My take on it. I propose a lazy, single-pass, partition function, which preserves relative order in the output subsequences.

1. Requirements

I assume that the requirements are:

  • maintain elements’ relative order (hence, no sets and dictionaries)
  • evaluate condition only once for every element (hence not using (i)filter or groupby)
  • allow for lazy consumption of either sequence (if we can afford to precompute them, then the naïve implementation is likely to be acceptable too)

2. split library

My partition function (introduced below) and other similar functions have made it into a small library:

It’s installable normally via PyPI:

pip install --user split

To split a list base on condition, use partition function:

>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]

3. partition function explained

Internally we need to build two subsequences at once, so consuming only one output sequence will force the other one to be computed too. And we need to keep state between user requests (store processed but not yet requested elements). To keep state, I use two double-ended queues (deques):

from collections import deque

SplitSeq class takes care of the housekeeping:

class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)

Magic happens in its .getNext() method. It is almost like .next() of the iterators, but allows to specify which kind of element we want this time. Behind the scene it doesn’t discard the rejected elements, but instead puts them in one of the two queues:

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1: # exit on StopIteration
                n = self.seq.next()
                if cond(n):
                    return n
                else:
                    those.append(n)

The end user is supposed to use partition function. It takes a condition function and a sequence (just like map or filter), and returns two generators. The first generator builds a subsequence of elements for which the condition holds, the second one builds the complementary subsequence. Iterators and generators allow for lazy splitting of even long or infinite sequences.

def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition == None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            yield ss.getNext(getGood=True)
    def bads():
        while 1:
            yield ss.getNext(getGood=False)
    return goods(), bads()

I chose the test function to be the first argument to facilitate partial application in the future (similar to how map and filter have the test function as the first argument).


回答 5

我基本上喜欢Anders的方法,因为它非常笼统。这是一个将分类程序放在首位(以匹配过滤器语法)并使用defaultdict(假定已导入)的版本。

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d

I basically like Anders’ approach as it is very general. Here’s a version that puts the categorizer first (to match filter syntax) and uses a defaultdict (assumed imported).

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d

回答 6

首次使用(OP前编辑):使用集:

mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]

myset = set(mylist)
goodset = set(goodvals)

print list(myset.intersection(goodset))  # [1, 3, 7]
print list(myset.difference(goodset))    # [2, 4, 5, 6]

这对于可读性(IMHO)和性能都很好。

第二遍(OP编辑后):

创建一组好的扩展列表:

IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])

这将提高性能。否则,你的情况对我来说很好。

First go (pre-OP-edit): Use sets:

mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]

myset = set(mylist)
goodset = set(goodvals)

print list(myset.intersection(goodset))  # [1, 3, 7]
print list(myset.difference(goodset))    # [2, 4, 5, 6]

That’s good for both readability (IMHO) and performance.

Second go (post-OP-edit):

Create your list of good extensions as a set:

IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])

and that will increase performance. Otherwise, what you have looks fine to me.


回答 7

itertools.groupby几乎可以满足您的要求,只是它要求对项目进行排序以确保获得单个连续范围,因此您需要首先按关键字进行排序(否则每种类型将获得多个交错的组)。例如。

def is_good(f):
    return f[2].lower() in IMAGE_TYPES

files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]

for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
    print key, list(group)

给出:

False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]

与其他解决方案类似,可以将密钥功能定义为分为任意数量的组。

itertools.groupby almost does what you want, except it requires the items to be sorted to ensure that you get a single contiguous range, so you need to sort by your key first (otherwise you’ll get multiple interleaved groups for each type). eg.

def is_good(f):
    return f[2].lower() in IMAGE_TYPES

files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]

for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
    print key, list(group)

gives:

False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]

Similar to the other solutions, the key func can be defined to divide into any number of groups you want.


回答 8

good.append(x) if x in goodvals else bad.append(x)

@dansalmo给出的这个简洁明了的答案掩盖在评论中,所以我只是在这里重新张贴它作为答案,这样它才能得到应有的重视,特别是对于新读者而言。

完整的例子:

good, bad = [], []
for x in my_list:
    good.append(x) if x in goodvals else bad.append(x)
good.append(x) if x in goodvals else bad.append(x)

This elegant and concise answer by @dansalmo showed up buried in the comments, so I’m just reposting it here as an answer so it can get the prominence it deserves, especially for new readers.

Complete example:

good, bad = [], []
for x in my_list:
    good.append(x) if x in goodvals else bad.append(x)

回答 9

如果要以FP样式制作:

good, bad = [ sum(x, []) for x in zip(*(([y], []) if y in goodvals else ([], [y])
                                        for y in mylist)) ]

这不是最易读的解决方案,但是至少仅一次迭代mylist。

If you want to make it in FP style:

good, bad = [ sum(x, []) for x in zip(*(([y], []) if y in goodvals else ([], [y])
                                        for y in mylist)) ]

Not the most readable solution, but at least iterates through mylist only once.


回答 10

我个人喜欢您引用的版本,假设您已经有一个列表goodvals。如果没有,则类似于:

good = filter(lambda x: is_good(x), mylist)
bad = filter(lambda x: not is_good(x), mylist)

当然,这实际上与您最初使用列表理解非常相似,但是使用函数而不是查找:

good = [x for x in mylist if is_good(x)]
bad  = [x for x in mylist if not is_good(x)]

总的来说,我发现列表理解的美学非常令人愉悦。当然,如果您实际上不需要保留顺序并且不需要重复,则在集合上使用intersectiondifference方法也可以很好地工作。

Personally, I like the version you cited, assuming you already have a list of goodvals hanging around. If not, something like:

good = filter(lambda x: is_good(x), mylist)
bad = filter(lambda x: not is_good(x), mylist)

Of course, that’s really very similar to using a list comprehension like you originally did, but with a function instead of a lookup:

good = [x for x in mylist if is_good(x)]
bad  = [x for x in mylist if not is_good(x)]

In general, I find the aesthetics of list comprehensions to be very pleasing. Of course, if you don’t actually need to preserve ordering and don’t need duplicates, using the intersection and difference methods on sets would work well too.


回答 11

我认为基于N个条件拆分iterable的一般方法很方便

from collections import OrderedDict
def partition(iterable,*conditions):
    '''Returns a list with the elements that satisfy each of condition.
       Conditions are assumed to be exclusive'''
    d= OrderedDict((i,list())for i in range(len(conditions)))        
    for e in iterable:
        for i,condition in enumerate(conditions):
            if condition(e):
                d[i].append(e)
                break                    
    return d.values()

例如:

ints,floats,other = partition([2, 3.14, 1, 1.69, [], None],
                              lambda x: isinstance(x, int), 
                              lambda x: isinstance(x, float),
                              lambda x: True)

print " ints: {}\n floats:{}\n other:{}".format(ints,floats,other)

 ints: [2, 1]
 floats:[3.14, 1.69]
 other:[[], None]

如果元素可以满足多个条件,请删除中断。

I think a generalization of splitting a an iterable based on N conditions is handy

from collections import OrderedDict
def partition(iterable,*conditions):
    '''Returns a list with the elements that satisfy each of condition.
       Conditions are assumed to be exclusive'''
    d= OrderedDict((i,list())for i in range(len(conditions)))        
    for e in iterable:
        for i,condition in enumerate(conditions):
            if condition(e):
                d[i].append(e)
                break                    
    return d.values()

For instance:

ints,floats,other = partition([2, 3.14, 1, 1.69, [], None],
                              lambda x: isinstance(x, int), 
                              lambda x: isinstance(x, float),
                              lambda x: True)

print " ints: {}\n floats:{}\n other:{}".format(ints,floats,other)

 ints: [2, 1]
 floats:[3.14, 1.69]
 other:[[], None]

If the element may satisfy multiple conditions, remove the break.


回答 12

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

检查一下

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

Check this


回答 13

有时候,列表理解似乎不是最好的选择!

我根据人们对此主题的回答进行了一些测试,并在随机生成的列表上进行了测试。这是列表的生成(可能有更好的方法,但这不是重点):

good_list = ('.jpg','.jpeg','.gif','.bmp','.png')

import random
import string
my_origin_list = []
for i in xrange(10000):
    fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(good_list)
    else:
        fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

现在我们开始

# Parand
def f1():
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def f2():
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def f3():
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# Ants Aasma
def f4():
    l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
    return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def f5():
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def f6():
    return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)

使用cmpthese函数,最好的结果是dbr答案:

f1     204/s  --    -5%   -14%   -15%   -20%   -26%
f6     215/s     6%  --    -9%   -11%   -16%   -22%
f3     237/s    16%    10%  --    -2%    -7%   -14%
f4     240/s    18%    12%     2%  --    -6%   -13%
f5     255/s    25%    18%     8%     6%  --    -8%
f2     277/s    36%    29%    17%    15%     9%  --

Sometimes, it looks like list comprehension is not the best thing to use !

I made a little test based on the answer people gave to this topic, tested on a random generated list. Here is the generation of the list (there’s probably a better way to do, but it’s not the point) :

good_list = ('.jpg','.jpeg','.gif','.bmp','.png')

import random
import string
my_origin_list = []
for i in xrange(10000):
    fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(good_list)
    else:
        fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

And here we go

# Parand
def f1():
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def f2():
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def f3():
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# Ants Aasma
def f4():
    l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
    return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def f5():
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def f6():
    return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)

Using the cmpthese function, the best result is the dbr answer :

f1     204/s  --    -5%   -14%   -15%   -20%   -26%
f6     215/s     6%  --    -9%   -11%   -16%   -22%
f3     237/s    16%    10%  --    -2%    -7%   -14%
f4     240/s    18%    12%     2%  --    -6%   -13%
f5     255/s    25%    18%     8%     6%  --    -8%
f2     277/s    36%    29%    17%    15%     9%  --

回答 14

这个问题的又一个解决方案。我需要一个尽可能快的解决方案。这意味着仅对列表进行一次迭代,最好是O(1)以便将数据添加到结果列表之一。这与sastanin提供的解决方案非常相似,但更短:

from collections import deque

def split(iterable, function):
    dq_true = deque()
    dq_false = deque()

    # deque - the fastest way to consume an iterator and append items
    deque((
      (dq_true if function(item) else dq_false).append(item) for item in iterable
    ), maxlen=0)

    return dq_true, dq_false

然后,您可以通过以下方式使用该功能:

lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)

selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})

如果你不罚款所产生的deque对象,你可以很容易地将其转换为listset,不管你喜欢(例如list(lower))。转换要快得多,直接建立列表即可。

此方法保持项目的顺序以及所有重复项。

Yet another solution to this problem. I needed a solution that is as fast as possible. That means only one iteration over the list and preferably O(1) for adding data to one of the resulting lists. This is very similar to the solution provided by sastanin, except much shorter:

from collections import deque

def split(iterable, function):
    dq_true = deque()
    dq_false = deque()

    # deque - the fastest way to consume an iterator and append items
    deque((
      (dq_true if function(item) else dq_false).append(item) for item in iterable
    ), maxlen=0)

    return dq_true, dq_false

Then, you can use the function in the following way:

lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)

selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})

If you’re not fine with the resulting deque object, you can easily convert it to list, set, whatever you like (for example list(lower)). The conversion is much faster, that construction of the lists directly.

This methods keeps order of the items, as well as any duplicates.


回答 15

例如,按偶数和奇数拆分列表

arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))

或一般来说:

def split(predicate, iterable):
    return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))

优点:

  • 最短的可能方式
  • 谓词对每个元素仅应用一次

缺点

  • 需要功能编程范例的知识

For example, splitting list by even and odd

arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))

Or in general:

def split(predicate, iterable):
    return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))

Advantages:

  • Shortest posible way
  • Predicate applies only once for each element

Disadvantages

  • Requires knowledge of functional programing paradigm

回答 16

受@gnibbler 出色的答案(但简洁!)的启发,我们可以将这种方法映射到多个分区:

from collections import defaultdict

def splitter(l, mapper):
    """Split an iterable into multiple partitions generated by a callable mapper."""

    results = defaultdict(list)

    for x in l:
        results[mapper(x)] += [x]

    return results

然后splitter可以如下使用:

>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0)  # partition l into odds and evens
>>> split.items()
>>> [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]

这适用于两个以上具有更复杂映射的分区(也适用于迭代器):

>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
 (1, [2]),
 (2, [3]),
 (3, [4, 5, 6]),
 (4, [7, 8, 9]),
 (5, [10, 11, 12, 13, 14, 15]),
 (6, [16, 17, 18, 19, 20, 21, 22])]

或使用字典进行映射:

>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]

Inspired by @gnibbler’s great (but terse!) answer, we can apply that approach to map to multiple partitions:

from collections import defaultdict

def splitter(l, mapper):
    """Split an iterable into multiple partitions generated by a callable mapper."""

    results = defaultdict(list)

    for x in l:
        results[mapper(x)] += [x]

    return results

Then splitter can then be used as follows:

>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0)  # partition l into odds and evens
>>> split.items()
>>> [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]

This works for more than two partitions with a more complicated mapping (and on iterators, too):

>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
 (1, [2]),
 (2, [3]),
 (3, [4, 5, 6]),
 (4, [7, 8, 9]),
 (5, [10, 11, 12, 13, 14, 15]),
 (6, [16, 17, 18, 19, 20, 21, 22])]

Or using a dictionary to map:

>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]

回答 17

bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]

append返回None,因此可以正常工作。

bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]

append returns None, so it works.


回答 18

要获得性能,请尝试itertools

所述itertools模块标准化一组核心快速的,这是通过单独或组合使用的存储器有效的工具。它们共同构成了一个“迭代器代数”,从而可以在纯Python中简洁高效地构建专用工具。

请参阅itertools.ifilter或imap。

itertools.ifilter(谓词,可迭代)

创建一个迭代器以从可迭代的元素中过滤出元素,仅返回谓词为True的元素

For perfomance, try itertools.

The itertools module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.

See itertools.ifilter or imap.

itertools.ifilter(predicate, iterable)

Make an iterator that filters elements from iterable returning only those for which the predicate is True


回答 19

有时您不需要列表的另一半。例如:

import sys
from itertools import ifilter

trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]

myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)

print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')

Sometimes you won’t need that other half of the list. For example:

import sys
from itertools import ifilter

trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]

myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)

print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')

回答 20

这是最快的方法。

它使用if else,(类似于dbr的答案),但首先创建一个集合。一组运算将操作数从O(m * n)减少到O(log m)+ O(n),从而使速度提高45%以上。

good_list_set = set(good_list)  # 45% faster than a tuple.

good, bad = [], []
for item in my_origin_list:
    if item in good_list_set:
        good.append(item)
    else:
        bad.append(item)

短一点:

good_list_set = set(good_list)  # 45% faster than a tuple.

good, bad = [], []
for item in my_origin_list:
    out = good if item in good_list_set else bad
    out.append(item)

基准结果:

filter_BJHomer                  80/s       --   -3265%   -5312%   -5900%   -6262%   -7273%   -7363%   -8051%   -8162%   -8244%
zip_Funky                       118/s    4848%       --   -3040%   -3913%   -4450%   -5951%   -6085%   -7106%   -7271%   -7393%
two_lst_tuple_JohnLaRoy         170/s   11332%    4367%       --   -1254%   -2026%   -4182%   -4375%   -5842%   -6079%   -6254%
if_else_DBR                     195/s   14392%    6428%    1434%       --    -882%   -3348%   -3568%   -5246%   -5516%   -5717%
two_lst_compr_Parand            213/s   16750%    8016%    2540%     967%       --   -2705%   -2946%   -4786%   -5083%   -5303%
if_else_1_line_DanSalmo         292/s   26668%   14696%    7189%    5033%    3707%       --    -331%   -2853%   -3260%   -3562%
tuple_if_else                   302/s   27923%   15542%    7778%    5548%    4177%     343%       --   -2609%   -3029%   -3341%
set_1_line                      409/s   41308%   24556%   14053%   11035%    9181%    3993%    3529%       --    -569%    -991%
set_shorter                     434/s   44401%   26640%   15503%   12303%   10337%    4836%    4345%     603%       --    -448%
set_if_else                     454/s   46952%   28358%   16699%   13349%   11290%    5532%    5018%    1100%     469%       --

Python 3.7的完整基准代码(从FunkySayu修改):

good_list = ['.jpg','.jpeg','.gif','.bmp','.png']

import random
import string
my_origin_list = []
for i in range(10000):
    fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(list(good_list))
    else:
        fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

# Parand
def two_lst_compr_Parand(*_):
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def if_else_DBR(*_):
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# # Ants Aasma
# def f4():
#     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
#     return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def zip_Funky(*_):
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def filter_BJHomer(*_):
    return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list,                                                                             my_origin_list))

# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list else bad.append(e)
    return good, bad

# ChaimG's answer; as a set.
def set_1_line(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list_set else bad.append(e)
    return good, bad

# ChaimG set and if else list.
def set_shorter(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        out = good if e[2] in good_list_set else bad
        out.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def set_if_else(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_set:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
    good_list_tuple = tuple(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_tuple:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

def cmpthese(n=0, functions=None):
    results = {}
    for func_name in functions:
        args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
        t = Timer(*args)
        results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec

    functions_sorted = sorted(functions, key=results.__getitem__)
    for f in functions_sorted:
        diff = []
        for func in functions_sorted:
            if func == f:
                diff.append("--")
            else:
                diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
        diffs = " ".join(f'{x:>8s}' for x in diff)

        print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")


if __name__=='__main__':
    from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))

This is the fastest way.

It uses if else, (like dbr’s answer) but creates a set first. A set reduces the number of operations from O(m * n) to O(log m) + O(n), resulting in a 45%+ boost in speed.

good_list_set = set(good_list)  # 45% faster than a tuple.

good, bad = [], []
for item in my_origin_list:
    if item in good_list_set:
        good.append(item)
    else:
        bad.append(item)

A little shorter:

good_list_set = set(good_list)  # 45% faster than a tuple.

good, bad = [], []
for item in my_origin_list:
    out = good if item in good_list_set else bad
    out.append(item)

Benchmark results:

filter_BJHomer                  80/s       --   -3265%   -5312%   -5900%   -6262%   -7273%   -7363%   -8051%   -8162%   -8244%
zip_Funky                       118/s    4848%       --   -3040%   -3913%   -4450%   -5951%   -6085%   -7106%   -7271%   -7393%
two_lst_tuple_JohnLaRoy         170/s   11332%    4367%       --   -1254%   -2026%   -4182%   -4375%   -5842%   -6079%   -6254%
if_else_DBR                     195/s   14392%    6428%    1434%       --    -882%   -3348%   -3568%   -5246%   -5516%   -5717%
two_lst_compr_Parand            213/s   16750%    8016%    2540%     967%       --   -2705%   -2946%   -4786%   -5083%   -5303%
if_else_1_line_DanSalmo         292/s   26668%   14696%    7189%    5033%    3707%       --    -331%   -2853%   -3260%   -3562%
tuple_if_else                   302/s   27923%   15542%    7778%    5548%    4177%     343%       --   -2609%   -3029%   -3341%
set_1_line                      409/s   41308%   24556%   14053%   11035%    9181%    3993%    3529%       --    -569%    -991%
set_shorter                     434/s   44401%   26640%   15503%   12303%   10337%    4836%    4345%     603%       --    -448%
set_if_else                     454/s   46952%   28358%   16699%   13349%   11290%    5532%    5018%    1100%     469%       --

The full benchmark code for Python 3.7 (modified from FunkySayu):

good_list = ['.jpg','.jpeg','.gif','.bmp','.png']

import random
import string
my_origin_list = []
for i in range(10000):
    fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(list(good_list))
    else:
        fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

# Parand
def two_lst_compr_Parand(*_):
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def if_else_DBR(*_):
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# # Ants Aasma
# def f4():
#     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
#     return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def zip_Funky(*_):
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def filter_BJHomer(*_):
    return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list,                                                                             my_origin_list))

# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list else bad.append(e)
    return good, bad

# ChaimG's answer; as a set.
def set_1_line(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list_set else bad.append(e)
    return good, bad

# ChaimG set and if else list.
def set_shorter(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        out = good if e[2] in good_list_set else bad
        out.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def set_if_else(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_set:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
    good_list_tuple = tuple(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_tuple:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

def cmpthese(n=0, functions=None):
    results = {}
    for func_name in functions:
        args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
        t = Timer(*args)
        results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec

    functions_sorted = sorted(functions, key=results.__getitem__)
    for f in functions_sorted:
        diff = []
        for func in functions_sorted:
            if func == f:
                diff.append("--")
            else:
                diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
        diffs = " ".join(f'{x:>8s}' for x in diff)

        print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")


if __name__=='__main__':
    from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))

回答 21

如果您坚持聪明,可以采用Winden的解决方案,并且还需要一些虚假的聪明:

def splay(l, f, d=None):
  d = d or {}
  for x in l: d.setdefault(f(x), []).append(x)
  return d

If you insist on clever, you could take Winden’s solution and just a bit spurious cleverness:

def splay(l, f, d=None):
  d = d or {}
  for x in l: d.setdefault(f(x), []).append(x)
  return d

回答 22

这里已经有很多解决方案,但是另一种解决方法是-

anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]

仅对列表进行一次迭代,并且看起来更具Python风格,因此对我来说是可读的。

>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>

Already quite a few solutions here, but yet another way of doing that would be –

anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]

Iterates over the list only once, and looks a bit more pythonic and hence readable to me.

>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>

回答 23

我将采用两遍方法,将对谓词的评估与对列表的过滤分开:

def partition(pred, iterable):
    xs = list(zip(map(pred, iterable), iterable))
    return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]

就性能而言(除了pred仅对的每个成员进行一次评估iterable),这样做的好处是,它将大量逻辑移出了解释器,并转移到了高度优化的迭代和映射代码中。如本答案所述这可以加快长迭代次数的迭代速度。

在表达方面,它利用了诸如理解和映射之类的表达习语。

I’d take a 2-pass approach, separating evaluation of the predicate from filtering the list:

def partition(pred, iterable):
    xs = list(zip(map(pred, iterable), iterable))
    return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]

What’s nice about this, performance-wise (in addition to evaluating pred only once on each member of iterable), is that it moves a lot of logic out of the interpreter and into highly-optimized iteration and mapping code. This can speed up iteration over long iterables, as described in this answer.

Expressivity-wise, it takes advantage of expressive idioms like comprehensions and mapping.


回答 24

from itertools import tee

def unpack_args(fn):
    return lambda t: fn(*t)

def separate(fn, lx):
    return map(
        unpack_args(
            lambda i, ly: filter(
                lambda el: bool(i) == fn(el),
                ly)),
        enumerate(tee(lx, 2)))

测试

[even, odd] = separate(
    lambda x: bool(x % 2),
    [1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])

solution

from itertools import tee

def unpack_args(fn):
    return lambda t: fn(*t)

def separate(fn, lx):
    return map(
        unpack_args(
            lambda i, ly: filter(
                lambda el: bool(i) == fn(el),
                ly)),
        enumerate(tee(lx, 2)))

test

[even, odd] = separate(
    lambda x: bool(x % 2),
    [1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])

回答 25

如果您不介意使用外部库,那么我知道两个可以自然地实现此操作:

>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
  • iteration_utilities.partition

    >>> from iteration_utilities import partition
    >>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
    >>> notimages
    [('file2.avi', 999, '.avi')]
    >>> images
    [('file1.jpg', 33, '.jpg')]
  • more_itertools.partition

    >>> from more_itertools import partition
    >>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
    >>> list(notimages)  # returns a generator so you need to explicitly convert to list.
    [('file2.avi', 999, '.avi')]
    >>> list(images)
    [('file1.jpg', 33, '.jpg')]

If you don’t mind using an external library there two I know that nativly implement this operation:

>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
  • iteration_utilities.partition:

    >>> from iteration_utilities import partition
    >>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
    >>> notimages
    [('file2.avi', 999, '.avi')]
    >>> images
    [('file1.jpg', 33, '.jpg')]
    
  • more_itertools.partition

    >>> from more_itertools import partition
    >>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
    >>> list(notimages)  # returns a generator so you need to explicitly convert to list.
    [('file2.avi', 999, '.avi')]
    >>> list(images)
    [('file1.jpg', 33, '.jpg')]
    

回答 26

不确定这是否是一种好方法,但是也可以通过这种方式完成

IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))

Not sure if this is a good approach but it can be done in this way as well

IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))

回答 27

如果列表由组和间歇性分隔符组成,则可以使用:

def split(items, p):
    groups = [[]]
    for i in items:
        if p(i):
            groups.append([])
        groups[-1].append(i)
    return groups

用法:

split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

If the list is made of groups and intermittent separators, you can use:

def split(items, p):
    groups = [[]]
    for i in items:
        if p(i):
            groups.append([])
        groups[-1].append(i)
    return groups

Usage:

split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

回答 28

images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f not in images]

当条件较长时很好,例如您的示例。读者不必弄清楚负面条件以及它是否能捕获所有其他情况。

images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f not in images]

Nice when the condition is longer, such as in your example. The reader doesn’t have to figure out the negative condition and whether it captures all other cases.


回答 29

另一个答案,简短而又“邪恶”(针对列表理解的副作用)。

digits = list(range(10))
odd = [x.pop(i) for i, x in enumerate(digits) if x % 2]

>>> odd
[1, 3, 5, 7, 9]

>>> digits
[0, 2, 4, 6, 8]

Yet another answer, short but “evil” (for list-comprehension side effects).

digits = list(range(10))
odd = [x.pop(i) for i, x in enumerate(digits) if x % 2]

>>> odd
[1, 3, 5, 7, 9]

>>> digits
[0, 2, 4, 6, 8]

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。