Python: Selecting a subset from a list based on a set of indices

Question

I have several lists having all the same number of entries (each specifying an object property):

property_a = [545., 656., 5.4, 33.]
property_b = [ 1.2,  1.3, 2.3, 0.3]
...

and a list of flags of the same length:

good_objects = [True, False, False, True]

(which could easily be replaced by an equivalent index list):

good_indices = [0, 3]

What is the easiest way to generate new lists property_asel, property_bsel, …, which contain only the values indicated either by the True entries or by the indices? The expected result is:

property_asel = [545., 33.]
property_bsel = [ 1.2, 0.3]
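The flag list and the index list really are interchangeable. A minimal sketch of converting in both directions, assuming plain Python lists; `flags_to_indices` and `indices_to_flags` are hypothetical helper names chosen for illustration:

```python
def flags_to_indices(flags):
    """Hypothetical helper: return the positions whose flag is truthy."""
    return [i for i, flag in enumerate(flags) if flag]


def indices_to_flags(indices, length):
    """Hypothetical helper: return `length` booleans, True at each given index."""
    flags = [False] * length
    for i in indices:
        flags[i] = True
    return flags


good_objects = [True, False, False, True]
good_indices = flags_to_indices(good_objects)
round_trip = indices_to_flags(good_indices, len(good_objects))
print(good_indices)  # [0, 3]
print(round_trip)    # [True, False, False, True]
```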

Answer 0

You could just use a list comprehension:

property_asel = [val for is_good, val in zip(good_objects, property_a) if is_good]

or

property_asel = [property_a[i] for i in good_indices]

The latter is faster when good_indices is shorter than property_a, because it only visits the selected positions instead of walking the whole list. This assumes good_indices is precomputed rather than generated on the fly.
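Whether the index version actually wins depends on the data, so it is worth measuring. A rough `timeit` sketch follows; the list size (10,000) and the every-tenth-element selection are arbitrary assumptions, and no timings are shown because they vary between machines and Python versions.

```python
import timeit

# Arbitrary test data (assumption): 10,000 values, every tenth one selected.
n = 10_000
property_a = [float(i) for i in range(n)]
good_objects = [i % 10 == 0 for i in range(n)]
good_indices = [i for i, flag in enumerate(good_objects) if flag]  # precomputed once


def by_flags():
    # Walks all n elements, keeping those whose flag is True.
    return [val for is_good, val in zip(good_objects, property_a) if is_good]


def by_indices():
    # Touches only the len(good_indices) selected positions.
    return [property_a[i] for i in good_indices]


assert by_flags() == by_indices()  # both approaches select the same values

for func in (by_flags, by_indices):
    seconds = timeit.timeit(func, number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```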


Edit: The first option is equivalent to itertools.compress, available since Python 2.7/3.1. See @Gary Kerr's answer.

import itertools
property_asel = list(itertools.compress(property_a, good_objects))
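`itertools.compress` returns a lazy iterator, which is why it is wrapped in `list()`. A minimal sketch applying the same flags to several property lists in one statement, assuming the lists from the question:

```python
from itertools import compress

property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]
good_objects = [True, False, False, True]

# compress() yields the items of its first argument whose selector is truthy;
# it is an iterator, so list() materializes each result.
property_asel, property_bsel = (
    list(compress(prop, good_objects)) for prop in (property_a, property_b)
)
print(property_asel)  # [545.0, 33.0]
print(property_bsel)  # [1.2, 0.3]
```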

Answer 1

I see 2 options.

  1. Using numpy:

    import numpy
    property_a = numpy.array([545., 656., 5.4, 33.])
    property_b = numpy.array([ 1.2,  1.3, 2.3, 0.3])
    good_objects = [True, False, False, True]
    good_indices = [0, 3]
    property_asel = property_a[good_objects]
    property_bsel = property_b[good_indices]
    
  2. Using a list comprehension with zip:

    property_a = [545., 656., 5.4, 33.]
    property_b = [ 1.2,  1.3, 2.3, 0.3]
    good_objects = [True, False, False, True]
    good_indices = [0, 3]
    property_asel = [x for x, y in zip(property_a, good_objects) if y]
    property_bsel = [property_b[i] for i in good_indices]
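One caveat with the NumPy option: the selection acts as a mask only when its elements are booleans. A list of 0/1 integers is treated as a list of indices instead, which silently gives a different result. A small sketch, assuming a reasonably recent NumPy; `numpy.flatnonzero` turns a mask into the equivalent index array:

```python
import numpy as np

property_a = np.array([545., 656., 5.4, 33.])

bool_mask = [True, False, False, True]
int_flags = [1, 0, 0, 1]

# Booleans select positions 0 and 3.
print(property_a[bool_mask].tolist())  # [545.0, 33.0]
# Integers are read as the indices 1, 0, 0, 1.
print(property_a[int_flags].tolist())  # [656.0, 545.0, 545.0, 656.0]

# Convert 0/1 flags to a real boolean mask, or to the equivalent index array.
mask = np.asarray(int_flags, dtype=bool)
print(property_a[mask].tolist())       # [545.0, 33.0]
print(np.flatnonzero(mask).tolist())   # [0, 3]
```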
    

Answer 2

Use the built-in function zip:

property_asel = [a for (a, truth) in zip(property_a, good_objects) if truth]

EDIT

Just looking at the new features of Python 2.7: there is now a function in the itertools module, itertools.compress, that does the same thing as the code above.

http://docs.python.org/library/itertools.html#itertools.compress

import itertools
list(itertools.compress('ABCDEF', [1, 0, 1, 0, 1, 1]))  # ['A', 'C', 'E', 'F']
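Both `zip` and `itertools.compress` stop at the end of the shorter input, so a flag list of the wrong length is not reported as an error. A minimal sketch of that behaviour plus an explicit length check; `select` is a hypothetical helper name:

```python
from itertools import compress

property_a = [545., 656., 5.4, 33.]
short_flags = [True, False, False]  # one flag too few

# Both silently ignore the last value instead of raising an error.
print(list(compress(property_a, short_flags)))                    # [545.0]
print([a for a, truth in zip(property_a, short_flags) if truth])  # [545.0]


def select(values, flags):
    """Hypothetical helper: like compress(), but insists on equal lengths."""
    if len(values) != len(flags):
        raise ValueError(f"{len(values)} values but {len(flags)} flags")
    return list(compress(values, flags))


print(select(property_a, [True, False, False, True]))  # [545.0, 33.0]
```

On Python 3.10 and later, `zip(..., strict=True)` gives the same protection for the comprehension version by raising `ValueError` when the lengths differ.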

Answer 3

Assuming you only have the list of items and a list of true/required indices, this should be the fastest:

property_asel = [property_a[index] for index in good_indices]

This means the property selection only loops as many times as there are true/required indices. If you have many property lists that all follow a single list of True/False flags, you can create the index list once, using the same list-comprehension principle:

good_indices = [index for index, item in enumerate(good_objects) if item]

This iterates through each item in good_objects (while remembering its index with enumerate) and returns only the indices where the item is true.


For anyone not getting the list comprehension, here is a plain-English reading of it:

List the index, for every (index, item) pair in the enumeration of good_objects, if (where) the item is True.
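The same comprehension can also be written as an explicit loop, which follows the English reading step by step and builds exactly the same list:

```python
good_objects = [True, False, False, True]

good_indices = []
for index, item in enumerate(good_objects):  # every (index, item) pair
    if item:                                  # only where the item is True
        good_indices.append(index)            # record its index

print(good_indices)  # [0, 3]
```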


Answer 4

The Matlab and Scilab languages offer a simpler and more elegant syntax than Python for what you're asking, so I think the best you can do is to mimic Matlab/Scilab with the NumPy package. This makes the solution to your problem very concise and elegant:

from numpy import array
property_a = array([545., 656., 5.4, 33.])
property_b = array([ 1.2,  1.3, 2.3, 0.3])
good_objects = [True, False, False, True]
good_indices = [0, 3]
property_asel = property_a[good_objects]
property_bsel = property_b[good_indices]
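If all the properties are numeric, a further NumPy step is to stack them into one 2-D array (one row per property), so a single mask selects the matching column of every property at once. A sketch under that assumption; the name `properties` is illustrative:

```python
import numpy as np

property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]
good_objects = np.array([True, False, False, True])

# Row 0 is property_a, row 1 is property_b; each column is one object.
properties = np.array([property_a, property_b])

# Keep every row, but only the columns (objects) flagged True.
selected = properties[:, good_objects]
property_asel, property_bsel = selected  # iterating a 2-D array yields its rows
print(property_asel.tolist())  # [545.0, 33.0]
print(property_bsel.tolist())  # [1.2, 0.3]
```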

NumPy tries to mimic Matlab/Scilab, but it comes at a cost: you need to wrap every list in a call to array(), which clutters your script (this problem doesn't exist in Matlab/Scilab). Note that this approach is best suited to arrays of numbers, which is the case in your example.