Tag archive: indexing

How to get the indices of a sorted array in Python

Question: How to get the indices of a sorted array in Python

I have a numerical list:

myList = [1, 2, 3, 100, 5]

If I sort this list, I get [1, 2, 3, 5, 100]. What I want is the indices of the elements of the original list in sorted order, i.e. [0, 1, 2, 4, 3], à la MATLAB's sort function, which returns both values and indices.


Answer 0

If you are using numpy, you have the argsort() function available:

>>> import numpy
>>> numpy.argsort(myList)
array([0, 1, 2, 4, 3])

http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

This returns the indices that would sort the array or list.
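To mimic MATLAB's two-output sort with NumPy, one option is to compute the permutation once with argsort and then use it to index the array. This is a minimal sketch that assumes NumPy is installed. The helper name sort_with_indices is hypothetical. Passing kind="stable" keeps equal elements in their original order, which NumPy's default quicksort does not guarantee.

```python
import numpy as np


def sort_with_indices(values):
    """Return (sorted_values, indices), similar to MATLAB's two-output sort."""
    arr = np.asarray(values)
    # A stable sort keeps ties in their original relative order.
    idx = np.argsort(arr, kind="stable")
    # Indexing the array with the permutation yields the sorted values.
    return arr[idx], idx


sorted_vals, idx = sort_with_indices([1, 2, 3, 100, 5])
print(sorted_vals.tolist())  # [1, 2, 3, 5, 100]
print(idx.tolist())          # [0, 1, 2, 4, 3]
```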


Answer 1

Something like the following:

>>> myList = [1, 2, 3, 100, 5]
>>> [i[0] for i in sorted(enumerate(myList), key=lambda x:x[1])]
[0, 1, 2, 4, 3]

enumerate(myList) gives you a list containing tuples of (index, value):

[(0, 1), (1, 2), (2, 3), (3, 100), (4, 5)]

You sort the list by passing it to sorted and specifying a function that extracts the sort key (the second element of each tuple; that's what the lambda is for). Finally, the original index of each sorted element is extracted with the [i[0] for i in ...] list comprehension.
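You may also want the sorted values, as MATLAB's sort returns them, without using NumPy. The same enumerate approach can return both lists at once. In this sketch, zip(*...) transposes the sorted (index, value) pairs into an index tuple and a value tuple. The helper name sort_with_indices is hypothetical, and the input is assumed to be a plain list.

```python
def sort_with_indices(seq):
    """Return (sorted_values, indices) as two lists, using only built-ins."""
    if not seq:
        # zip(*[]) yields nothing, so two-way unpacking would fail on an empty list.
        return [], []
    # Sort (index, value) pairs by value; Python's sort is stable, so ties keep their order.
    pairs = sorted(enumerate(seq), key=lambda pair: pair[1])
    # Transpose the pairs into a tuple of indices and a tuple of values.
    indices, values = zip(*pairs)
    return list(values), list(indices)


values, indices = sort_with_indices([1, 2, 3, 100, 5])
print(values)   # [1, 2, 3, 5, 100]
print(indices)  # [0, 1, 2, 4, 3]
```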


Answer 2

myList = [1, 2, 3, 100, 5]
sorted(range(len(myList)), key=myList.__getitem__)

[0, 1, 2, 4, 3]

This sorts the indices 0 to n-1, using the value stored at each index in myList as that index's sort key.
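The same key trick gives a descending order if you pass reverse=True. According to the Python documentation, reverse still preserves sort stability, so equal values keep their original relative order. Here is a small sketch, where the name argsort_desc is hypothetical:

```python
def argsort_desc(seq):
    """Indices that would sort seq from largest to smallest."""
    # Each index is ranked by the value it points to; reverse=True flips the order.
    return sorted(range(len(seq)), key=seq.__getitem__, reverse=True)


print(argsort_desc([1, 2, 3, 100, 5]))  # [3, 4, 2, 1, 0]
print(argsort_desc([5, 1, 5]))          # [0, 2, 1]
```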

Answer 3

The answers with enumerate are nice, but I personally don't like the lambda used to sort by value. The following just swaps the index and the value and sorts the resulting tuples, so it sorts first by value, then by index.

sorted((e,i) for i,e in enumerate(myList))
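That expression returns (value, index) tuples, not bare indices. To get the list the question asks for, a minimal follow-up keeps only the second element of each tuple. Ties are broken by index, so the result matches a stable sort. This assumes the values can be compared with each other.

```python
myList = [1, 2, 3, 100, 5]

# Swap to (value, index) so tuple comparison sorts by value first, then by index.
pairs = sorted((e, i) for i, e in enumerate(myList))
# Keep only the original indices.
indices = [i for e, i in pairs]
print(indices)  # [0, 1, 2, 4, 3]
```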

Answer 4

Updated answer with enumerate and itemgetter:

a = [1, 2, 3, 100, 5]
sorted(enumerate(a), key=lambda x: x[1])
# [(0, 1), (1, 2), (2, 3), (4, 5), (3, 100)]

enumerate pairs each value with its index: the first element of each tuple is the index and the second is the value. The pairs are then sorted by the second value of each tuple, x[1], where x is the tuple.

Or use itemgetter from the operator module:

from operator import itemgetter
sorted(enumerate(a), key=itemgetter(1))

Answer 5

I did a quick performance check on these with perfplot (a project of mine) and found that it's hard to recommend anything but numpy (note the log scale):

[Figure: benchmark plot of the four approaches below, runtime vs. len(x), log scale]


Code to reproduce the plot:

import perfplot
import numpy


def sorted_enumerate(seq):
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]


def sorted_enumerate_key(seq):
    return [x for x, y in sorted(enumerate(seq), key=lambda x: x[1])]


def sorted_range(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)


def numpy_argsort(x):
    return numpy.argsort(x)


perfplot.save(
    "argsort.png",
    setup=lambda n: numpy.random.rand(n),
    kernels=[sorted_enumerate, sorted_enumerate_key, sorted_range, numpy_argsort],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(x)",
)

Answer 6

If you do not want to use numpy,

sorted(range(len(seq)), key=seq.__getitem__)

is the fastest, as the benchmark in the previous answer demonstrates.


Answer 7

Essentially you need an argsort. Which implementation you need depends on whether you want to use an external library (e.g. NumPy) or stay pure-Python without dependencies.

The question you need to ask yourself is: Do you want the

  • indices that would sort the array/list
  • indices that the elements would have in the sorted array/list

Unfortunately the example in the question doesn’t make it clear what is desired because both will give the same result:

>>> arr = np.array([1, 2, 3, 100, 5])

>>> np.argsort(np.argsort(arr))
array([0, 1, 2, 4, 3], dtype=int64)

>>> np.argsort(arr)
array([0, 1, 2, 4, 3], dtype=int64)

Choosing the argsort implementation

If you have NumPy at your disposal you can simply use the function numpy.argsort or method numpy.ndarray.argsort.

An implementation without NumPy was already mentioned in some other answers, so I'll just recap the fastest solution according to the benchmark answer:

def argsort(l):
    return sorted(range(len(l)), key=l.__getitem__)

Getting the indices that would sort the array/list

To get the indices that would sort the array/list, you can simply call argsort on the array or list. I'm using the NumPy version here, but the Python implementation should give the same results:

>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(arr)
array([1, 2, 0, 3], dtype=int64)

The result contains the indices that are needed to get the sorted array.

Since the sorted array would be [1, 2, 3, 4], the argsorted array contains the indices of these elements in the original.

  • The smallest value is 1, and it is at index 1 in the original, so the first element of the result is 1.
  • The 2 is at index 2 in the original, so the second element of the result is 2.
  • The 3 is at index 0 in the original, so the third element of the result is 0.
  • The largest value, 4, is at index 3 in the original, so the last element of the result is 3.

Getting the indices that the elements would have in the sorted array/list

In this case you would need to apply argsort twice:

>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(np.argsort(arr))
array([2, 0, 1, 3], dtype=int64)

In this case:

  • the first element of the original is 3, which is the third-smallest value, so it would have index 2 in the sorted array/list; hence the first element is 2.
  • the second element of the original is 1, which is the smallest value, so it would have index 0 in the sorted array/list; hence the second element is 0.
  • the third element of the original is 2, which is the second-smallest value, so it would have index 1 in the sorted array/list; hence the third element is 1.
  • the fourth element of the original is 4, which is the largest value, so it would have index 3 in the sorted array/list; hence the last element is 3.
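Applying argsort twice costs two sorts. Because the argsort result is a permutation, you can also get the ranks by inverting that permutation in linear time. Below is a pure-Python sketch of that idea. The helper name ranks is hypothetical, and argsort is redefined so the block runs on its own. With NumPy, the equivalent inversion would be r = np.empty_like(order); r[order] = np.arange(len(order)).

```python
def argsort(l):
    # Indices that would sort l (same as the pure-Python argsort above).
    return sorted(range(len(l)), key=l.__getitem__)


def ranks(l):
    """Position each element would have in the sorted list (argsort applied twice)."""
    order = argsort(l)
    result = [0] * len(l)
    # order[pos] is the original index of the element that lands at position pos,
    # so inverting the permutation gives each element's sorted position.
    for pos, original_index in enumerate(order):
        result[original_index] = pos
    return result


print(argsort([3, 1, 2, 4]))  # [1, 2, 0, 3]
print(ranks([3, 1, 2, 4]))    # [2, 0, 1, 3]
```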

Answer 8

The other answers are WRONG.

Running argsort once is not the solution. For example, the following code:

import numpy as np
x = [3,1,2]
np.argsort(x)

yields array([1, 2, 0], dtype=int64) which is not what we want.

The answer should be to run argsort twice:

import numpy as np
x = [3,1,2]
np.argsort(np.argsort(x))

gives array([2, 0, 1], dtype=int64) as expected.


Answer 9

import numpy as np

FOR INDEX

S = [11, 2, 44, 55, 66, 0, 10, 3, 33]

r = np.argsort(S)

[output] = array([5, 1, 7, 6, 0, 8, 2, 3, 4])

argsort returns the indices of S in sorted order.

FOR VALUE

np.sort(S)

[output] = array([ 0,  2,  3, 10, 11, 33, 44, 55, 66])

Answer 10

We create another array of indices from 0 to n-1, zip it with the original array, and then sort the pairs by the original values:

ar = [1,2,3,4,5]
new_ar = list(zip(ar, range(len(ar))))
new_ar.sort()  # sorts (value, index) pairs by value, then by index
indices = [i for value, i in new_ar]  # keep only the original indices


How to insert a column at a specific column index in pandas?

Question: How to insert a column at a specific column index in pandas?

Can I insert a column at a specific column index in pandas?

import pandas as pd
df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
df['n'] = 0

This will put column n as the last column of df, but isn’t there a way to tell df to put n at the beginning?


Answer 0

See the docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.insert.html

Using loc=0 will insert at the beginning:

df.insert(loc, column, value)

df = pd.DataFrame({'B': [1, 2, 3], 'C': [4, 5, 6]})

df
Out: 
   B  C
0  1  4
1  2  5
2  3  6

idx = 0
new_col = [7, 8, 9]  # can be a list, a Series, an array or a scalar   
df.insert(loc=idx, column='A', value=new_col)

df
Out: 
   A  B  C
0  7  1  4
1  8  2  5
2  9  3  6
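In the question, the column n already exists. insert raises a ValueError if the column name is already present, unless allow_duplicates=True is passed. One way to move the existing column to the front is to pop it and insert it again. The sketch below uses the question's DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'l': ['a', 'b', 'c', 'd'], 'v': [1, 2, 1, 2]})
df['n'] = 0

# pop removes the column from df and returns it as a Series...
col = df.pop('n')
# ...so it can be inserted again at position 0 without a duplicate-name error.
df.insert(0, 'n', col)
print(df.columns.tolist())  # ['n', 'l', 'v']
```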

Answer 1

You could extract the columns as a list, rearrange it however you need, and reindex your dataframe (reindex returns a new DataFrame, so assign the result back if you want to keep it):

>>> cols = df.columns.tolist()
>>> cols = [cols[-1]]+cols[:-1] # or whatever change you need
>>> df.reindex(columns=cols)

   n  l  v
0  0  a  1
1  0  b  2
2  0  c  1
3  0  d  2

EDIT: this can be done in one line; however, it looks a bit ugly. Maybe someone will come up with a cleaner proposal…

>>> df.reindex(columns=['n']+df.columns[:-1].tolist())

   n  l  v
0  0  a  1
1  0  b  2
2  0  c  1
3  0  d  2

Answer 2

If you want a single value for all rows:

df.insert(0,'name_of_column','')
df['name_of_column'] = value

Edit:

You can also:

df.insert(0,'name_of_column',value)

Answer 3

Here is a very simple answer to this (only one line).

You can do it after you have added the 'n' column to your df, as follows.

import pandas as pd
df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
df['n'] = 0

df
    l   v   n
0   a   1   0
1   b   2   0
2   c   1   0
3   d   2   0

# here you can add the below code and it should work.
df = df[list('nlv')]
df

    n   l   v
0   0   a   1
1   0   b   2
2   0   c   1
3   0   d   2



However, if your column names are words instead of single letters, pass the names as a list, which means two brackets around your column names:

import pandas as pd
df = pd.DataFrame({'Upper':['a','b','c','d'], 'Lower':[1,2,1,2]})
df['Net'] = 0
df['Mid'] = 2
df['Zsore'] = 2

df

    Upper   Lower   Net Mid Zsore
0   a       1       0   2   2
1   b       2       0   2   2
2   c       1       0   2   2
3   d       2       0   2   2

# here you can add the line below and it should work
df = df[['Mid', 'Upper', 'Lower', 'Net', 'Zsore']]
df

   Mid  Upper   Lower   Net Zsore
0   2   a       1       0   2
1   2   b       2       0   2
2   2   c       1       0   2
3   2   d       2       0   2

Access multiple elements of a list knowing their indices

Question: Access multiple elements of a list knowing their indices

I need to choose some elements from a given list, knowing their indices. Say I would like to create a new list containing the elements at indices 1, 2 and 5 of the list [-2, 1, 5, 3, 8, 5, 6]. What I did is:

a = [-2,1,5,3,8,5,6]
b = [1,2,5]
c = [ a[i] for i in b]

Is there a better way to do it? Something like c = a[b]?


Answer 0

You can use operator.itemgetter:

from operator import itemgetter 
a = [-2, 1, 5, 3, 8, 5, 6]
b = [1, 2, 5]
print(itemgetter(*b)(a))
# Result:
(1, 5, 5)

Or you can use numpy:

import numpy as np
a = np.array([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]
print(list(a[b]))
# Result:
[1, 5, 5]

But really, your current solution is fine. It’s probably the neatest out of all of them.
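operator.itemgetter has one caveat here. When b contains a single index, it returns that element itself rather than a 1-tuple. When b is empty, the call itemgetter() raises TypeError. Below is a small wrapper, with the hypothetical name pick, that always returns a list:

```python
from operator import itemgetter


def pick(seq, indices):
    """Return [seq[i] for i in indices] as a list, whatever len(indices) is."""
    if not indices:
        # itemgetter() with no arguments would raise TypeError.
        return []
    if len(indices) == 1:
        # itemgetter with one argument returns the bare element, not a tuple.
        return [seq[indices[0]]]
    return list(itemgetter(*indices)(seq))


a = [-2, 1, 5, 3, 8, 5, 6]
print(itemgetter(1)(a))    # 1  (not (1,))
print(pick(a, [1, 2, 5]))  # [1, 5, 5]
print(pick(a, [1]))        # [1]
print(pick(a, []))         # []
```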


Answer 1

Alternatives:

>>> list(map(a.__getitem__, b))  # map is lazy in Python 3
[1, 5, 5]

>>> import operator
>>> operator.itemgetter(*b)(a)
(1, 5, 5)

Answer 2

Another solution could be via pandas Series:

import pandas as pd

a = pd.Series([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]
c = a[b]

You can then convert c back to a list if you want:

c = list(c)
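When the Series has an integer index, a[b] with a list of integers looks the values up by label, not by position. With the default RangeIndex, labels and positions coincide. After sorting or filtering, they may not. Using .iloc makes the positional intent explicit. Here is a sketch, assuming pandas is installed:

```python
import pandas as pd

a = pd.Series([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]

# Default RangeIndex: labels equal positions, so both give [1, 5, 5].
print(a[b].tolist(), a.iloc[b].tolist())

# After sorting, labels no longer match positions.
shuffled = a.sort_values()
print(shuffled.loc[b].tolist())   # by label:    [1, 5, 5]
print(shuffled.iloc[b].tolist())  # by position: [1, 3, 6]
```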

Answer 3

Basic and not very extensive testing comparing the execution time of the five supplied answers:

import operator
import numpy as np

def numpyIndexValues(a, b):
    na = np.array(a)
    nb = np.array(b)
    out = list(na[nb])
    return out

def mapIndexValues(a, b):
    out = map(a.__getitem__, b)
    return list(out)

def getIndexValues(a, b):
    out = operator.itemgetter(*b)(a)
    return out

def pythonLoopOverlap(a, b):
    c = [ a[i] for i in b]
    return c

multipleListItemValues = lambda searchList, ind: [searchList[i] for i in ind]

using the following input:

a = range(0, 10000000)
b = range(500, 500000)

The simple Python loop was the quickest, with the lambda a close second. mapIndexValues and getIndexValues were consistently similar to each other. The numpy method was significantly slower when the lists first had to be converted to numpy arrays; if the data is already in numpy arrays, the numpyIndexValues method with the numpy.array conversion removed is the quickest.

numpyIndexValues -> time:1.38940598 (when converted the lists to numpy arrays)
numpyIndexValues -> time:0.0193445 (using numpy array instead of python list as input, and conversion code removed)
mapIndexValues -> time:0.06477512099999999
getIndexValues -> time:0.06391049500000001
multipleListItemValues -> time:0.043773591
pythonLoopOverlap -> time:0.043021754999999995

Answer 4

I'm sure this has already been considered: if the number of indices in b is small and constant, you could just write the result as:

c = [a[b[0]]] + [a[b[1]]] + [a[b[2]]]

Or, even simpler, if the indices themselves are constants:

c = [a[1]] + [a[2]] + [a[5]]

Or, if there is a consecutive range of indices:

c = a[1:3] + [a[5]]

Answer 5

Here’s a simpler way:

a = [-2,1,5,3,8,5,6]
b = [1,2,5]
c = [e for i, e in enumerate(a) if i in b]
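Two things are worth keeping in mind with this approach. First, i in b scans the list b for every element of a. Second, the result follows the order of a rather than the order of b, and repeated indices in b are collapsed. Converting b to a set makes each membership test O(1) on average. Here is a sketch:

```python
a = [-2, 1, 5, 3, 8, 5, 6]
b = [1, 2, 5]

wanted = set(b)  # O(1) average-time membership tests
c = [e for i, e in enumerate(a) if i in wanted]
print(c)  # [1, 5, 5]

# Order follows a, not b: asking for indices 5 then 1 still yields a[1] before a[5].
print([e for i, e in enumerate(a) if i in {5, 1}])  # [1, 5]
```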

Answer 6

My answer does not use numpy or python collections.

One trivial way to find elements would be as follows:

a = [-2, 1, 5, 3, 8, 5, 6]
b = [1, 2, 5]
c = [i for i in a if i in b]

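Drawback: this tests whether each value of a is in b, not its index, so it matches the index-based result here only by coincidence (a[1] == 1 and a[2] == a[5] == 5). It also checks membership in the list b for every element, so it scales poorly; using numpy is recommended for larger lists.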


Answer 7

Static indexes and small list?

Don’t forget that if the list is small and the indexes don’t change, as in your example, sometimes the best thing is to use sequence unpacking:

_,a1,a2,_,_,a3,_ = a

The performance is much better and you can also save one line of code:

%timeit _,a1,b1,_,_,c1,_ = a
10000000 loops, best of 3: 154 ns per loop
%timeit itemgetter(*b)(a)
1000000 loops, best of 3: 753 ns per loop
%timeit [ a[i] for i in b]
1000000 loops, best of 3: 777 ns per loop
%timeit map(a.__getitem__, b)
1000000 loops, best of 3: 1.42 µs per loop

Answer 8

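A somewhat Pythonic way:

c = [x for x in a if a.index(x) in b]

Note that a.index(x) returns the index of the first occurrence of x, so this breaks when a contains duplicates: with b = [1, 5] it returns [1] instead of [1, 5]. It also rescans a for every element.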

Python Pandas: Get the index of rows whose column matches a given value

Question: Python Pandas: Get the index of rows whose column matches a given value

Given a DataFrame with a column “BoolCol”, we want to find the indexes of the DataFrame in which the values for “BoolCol” == True

I currently have an iterative way to do it, which works perfectly:

for i in range(100,3000):
    if df.iloc[i]['BoolCol']== True:
        print(i, df.iloc[i]['BoolCol'])

But this is not the correct pandas way to do it. After some research, I am currently using this code:

df[df['BoolCol'] == True].index.tolist()

This gives me a list of indexes, but they don't match when I check them with:

df.iloc[i]['BoolCol']

The result is actually False!!

Which would be the correct Pandas way to do this?


Answer 0

df.iloc[i] returns the ith row of df. i does not refer to the index label; i is a 0-based position.

In contrast, the attribute index returns actual index labels, not numeric row-indices:

df.index[df['BoolCol'] == True].tolist()

or equivalently,

df.index[df['BoolCol']].tolist()

You can see the difference quite clearly by playing with a DataFrame whose non-default index does not equal the rows' numerical positions:

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
       index=[10,20,30,40,50])

In [53]: df
Out[53]: 
   BoolCol
10    True
20   False
30   False
40    True
50    True

[5 rows x 1 columns]

In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]

If you want to use the index,

In [56]: idx = df.index[df['BoolCol']]

In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')

then you can select the rows using loc instead of iloc:

In [58]: df.loc[idx]
Out[58]: 
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

Note that loc can also accept boolean arrays:

In [55]: df.loc[df['BoolCol']]
Out[55]: 
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

If you have a boolean array, mask, and need ordinal index values, you can compute them using np.flatnonzero:

In [110]: np.flatnonzero(df['BoolCol'])
Out[110]: array([0, 3, 4])

Use df.iloc to select rows by ordinal index:

In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]: 
   BoolCol
10    True
40    True
50    True
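The IPython snippets above can be combined into one self-contained script that contrasts the two kinds of index. This assumes pandas and NumPy are installed. Index labels are used with loc, and ordinal positions are used with iloc.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])
mask = df['BoolCol']

labels = df.index[mask].tolist()           # index labels -> use with df.loc
positions = np.flatnonzero(mask).tolist()  # 0-based positions -> use with df.iloc

print(labels)     # [10, 40, 50]
print(positions)  # [0, 3, 4]

# Both selections return the same rows.
assert df.loc[labels].equals(df.iloc[positions])
```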

Answer 1

This can be done with the numpy where() function:

import pandas as pd
import numpy as np

In [716]: df = pd.DataFrame({"gene_name": ['SLC45A1', 'NECAP2', 'CLIC4', 'ADC', 'AGBL4'] , "BoolCol": [False, True, False, True, True] },
       index=list("abcde"))

In [717]: df
Out[717]: 
  BoolCol gene_name
a   False   SLC45A1
b    True    NECAP2
c   False     CLIC4
d    True       ADC
e    True     AGBL4

In [718]: np.where(df["BoolCol"] == True)
Out[718]: (array([1, 3, 4]),)

In [719]: select_indices = list(np.where(df["BoolCol"] == True)[0])

In [720]: df.iloc[select_indices]
Out[720]: 
  BoolCol gene_name
b    True    NECAP2
d    True       ADC
e    True     AGBL4

You don't always need the index for a match, but in case you do:

In [796]: df.iloc[select_indices].index
Out[796]: Index([u'b', u'd', u'e'], dtype='object')

In [797]: df.iloc[select_indices].index.tolist()
Out[797]: ['b', 'd', 'e']

Answer 2

A simple way is to reset the index of the DataFrame before filtering; the result is then the 0-based row positions rather than the original labels:

df_reset = df.reset_index()
df_reset[df_reset['BoolCol']].index.tolist()

Bit hacky, but it’s quick!


Answer 3

First, you can use query when the target column is of type bool (see the DataFrame.query documentation for how to use it):

df.query('BoolCol')
Out[123]: 
    BoolCol
10     True
40     True
50     True

After filtering the original df by the Boolean column, we can pick the index:

df=df.query('BoolCol')
df.index
Out[125]: Int64Index([10, 40, 50], dtype='int64')

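pandas also used to have Series.nonzero (deprecated in 0.24 and removed in 1.0). The same idea works through NumPy: select the positions of the True rows and use them to slice the DataFrame or its index:

df.index[df.BoolCol.to_numpy().nonzero()[0]]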
Out[128]: Int64Index([10, 40, 50], dtype='int64')

Answer 4

If you want to refer to your DataFrame object only once (for example, inside a method chain), use:

df['BoolCol'].loc[lambda x: x==True].index

Answer 5

I extended this question: how do you get the row, column and value of all matching values?

Here is a solution:

import pandas as pd
import numpy as np


def search_coordinate(df_data: pd.DataFrame, search_set: set) -> list:
    nda_values = df_data.values
    # (row, col) positions of every cell whose value is in search_set
    tuple_index = np.where(np.isin(nda_values, list(search_set)))
    return [(row, col, nda_values[row][col]) for row, col in zip(tuple_index[0], tuple_index[1])]


if __name__ == '__main__':
    test_datas = [['cat', 'dog', ''],
                  ['goldfish', '', 'kitten'],
                  ['Puppy', 'hamster', 'mouse']
                  ]
    df_data = pd.DataFrame(test_datas)
    print(df_data)
    result_list = search_coordinate(df_data, {'dog', 'Puppy'})
    print(f"\n\n{'row':<4} {'col':<4} {'name':>10}")
    for row, col, name in result_list:
        print(f"{row:<4} {col:<4} {name:>10}")

Output:

          0        1       2
0       cat      dog        
1  goldfish           kitten
2     Puppy  hamster   mouse


row  col        name
0    1           dog
2    0         Puppy

Pandas: get the first row value of a given column

Question: Pandas: get the first row value of a given column

This seems like a ridiculously easy question… but I’m not seeing the easy answer I was expecting.

So, how do I get the value at an nth row of a given column in Pandas? (I am particularly interested in the first row, but would be interested in a more general practice as well).

For example, let’s say I want to pull the 1.2 value in Btime as a variable.

What's the right way to do this?

df_test =

  ATime   X   Y   Z   Btime  C   D   E
0    1.2  2  15   2    1.2  12  25  12
1    1.4  3  12   1    1.3  13  22  11
2    1.5  1  10   6    1.4  11  20  16
3    1.6  2   9  10    1.7  12  29  12
4    1.9  1   1   9    1.9  11  21  19
5    2.0  0   0   0    2.0   8  10  11
6    2.4  0   0   0    2.4  10  12  15

Answer 0

To select the ith row, use iloc:

In [31]: df_test.iloc[0]
Out[31]: 
ATime     1.2
X         2.0
Y        15.0
Z         2.0
Btime     1.2
C        12.0
D        25.0
E        12.0
Name: 0, dtype: float64

To select the ith value in the Btime column you could use:

In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2

There is a difference between df_test['Btime'].iloc[0] (recommended) and df_test.iloc[0]['Btime']:

DataFrames store data in column-based blocks (where each block has a single dtype). If you select by column first, a view can be returned (which is quicker than returning a copy) and the original dtype is preserved. In contrast, if you select by row first, and if the DataFrame has columns of different dtypes, then Pandas copies the data into a new Series of object dtype. So selecting columns is a bit faster than selecting rows. Thus, although df_test.iloc[0]['Btime'] works, df_test['Btime'].iloc[0] is a little bit more efficient.
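
To see the dtype difference described above, here is a minimal sketch on a small hypothetical frame (the column names n and name are made up; df_test itself is all-numeric, which is why its first row came back as float64 rather than object):

```python
import pandas as pd

# Hypothetical frame with one integer column and one string column.
df = pd.DataFrame({"n": [10, 20, 30], "name": ["a", "b", "c"]})

# Column first: the selected Series keeps the column's own dtype.
col = df["n"]
col_first = col.iloc[0]

# Row first: the row has to hold an int and a str side by side,
# so pandas builds it as an object-dtype Series.
row = df.iloc[0]
row_first = row["n"]

print(col.dtype, row.dtype)  # int64 object
```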

There is a big difference between the two when it comes to assignment. df_test['Btime'].iloc[0] = x affects df_test, but df_test.iloc[0]['Btime'] may not. See below for an explanation of why. Because a subtle difference in the order of indexing makes a big difference in behavior, it is better to use single indexing assignment:

df.iloc[0, df.columns.get_loc('Btime')] = x

df.iloc[0, df.columns.get_loc('Btime')] = x (recommended):

The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew,

df.loc[df.index[n], 'Btime'] = x

or

df.iloc[n, df.columns.get_loc('Btime')] = x

The latter method is a bit faster, because df.loc has to convert the row and column labels to positional indices, so there is a little less conversion necessary if you use df.iloc instead.
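
A minimal sketch of both single-step assignments, on a small hypothetical stand-in for df_test whose index (10, 20, 30) is deliberately not 0..n-1, so that labels and positions differ:

```python
import pandas as pd

# Small stand-in for df_test; labels 10, 20, 30 are not positions.
df = pd.DataFrame({"ATime": [1.2, 1.4, 1.5], "Btime": [1.2, 1.3, 1.4]},
                  index=[10, 20, 30])

# Label-based: turn row position 0 into its label (10) first.
df.loc[df.index[0], "Btime"] = 9.9

# Position-based: turn the column label into its position first.
df.iloc[1, df.columns.get_loc("Btime")] = 8.8

print(df["Btime"].tolist())  # [9.9, 8.8, 1.4]
```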


df['Btime'].iloc[0] = x works, but is not recommended:

Although this works, it relies on the way DataFrames are currently implemented, and pandas does not guarantee that it will keep working this way. In particular, it relies on the fact that (currently) df['Btime'] always returns a view (not a copy), so df['Btime'].iloc[n] = x can be used to assign a new value at the nth location of the Btime column of df. With Copy-on-Write enabled (planned to become the default in pandas 3.0), this kind of chained assignment no longer modifies df at all.

Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing generally raise a SettingWithCopyWarning, even though in this case the assignment succeeds in modifying df:

In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

In [26]: df
Out[26]: 
  foo  bar
0   A   99  <-- assignment succeeded
2   B  100
1   C  100

df.iloc[0]['Btime'] = x does not work:

In contrast, assignment with df.iloc[0]['bar'] = 123 does not work because df.iloc[0] is returning a copy:

In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [67]: df
Out[67]: 
  foo  bar
0   A   99  <-- assignment failed
2   B  100
1   C  100

Warning: I had previously suggested df_test.ix[i, 'Btime'], which has since been deprecated (and removed in pandas 1.0). But this is not guaranteed to give you the ith value, since ix tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i] will return the row labeled i rather than the ith row. For example,

In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])

In [2]: df
Out[2]: 
  foo
0   A
2   B
1   C

In [4]: df.ix[1, 'foo']
Out[4]: 'C'
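
Since .ix is no longer available in current pandas (it was removed in 1.0), the sketch below asks the same frame the same question with the explicit indexers; iloc and loc disagree exactly where ix was ambiguous:

```python
import pandas as pd

df = pd.DataFrame({"foo": list("ABC")}, index=[0, 2, 1])

# Position-based: the row at position 1 is labelled 2 and holds "B".
by_position = df["foo"].iloc[1]

# Label-based: the row labelled 1 sits at position 2 and holds "C".
by_label = df.loc[1, "foo"]

print(by_position, by_label)  # B C
```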

Answer 1

Note that the answer from @unutbu works for reading a value, but when you want to set the value to something new, chained indexing will not work if your DataFrame is a view, and pandas warns about it:

In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

Another approach that will consistently work with both setting and getting is:

In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
  foo  bar
0   A   99
2   B  100
1   C  100
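
For a single cell, .at is the scalar counterpart of .loc and likewise works for both getting and setting. A minimal sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame({"foo": list("ABC")}, index=[0, 2, 1])
df["bar"] = 100

first_label = df.index[0]           # label of the first row (here 0)

# .at takes exactly one row label and one column label.
value = df.at[first_label, "foo"]   # read
df.at[first_label, "bar"] = 99      # write, in a single indexing step

print(value, df["bar"].tolist())    # A [99, 100, 100]
```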

Answer 2

Another way to do this:

first_value = df['Btime'].values[0]

This way seems to be faster than using .iloc:

In [1]: %timeit -n 1000 df['Btime'].values[20]
5.82 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [2]: %timeit -n 1000 df['Btime'].iloc[20]
29.2 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
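
If you prefer this array-based route, a sketch assuming a reasonably recent pandas: the pandas documentation recommends .to_numpy() over .values, and .iat is the positional scalar accessor. No timing claim is made here; measure on your own data:

```python
import pandas as pd

df = pd.DataFrame({"Btime": [1.2, 1.3, 1.4]})  # hypothetical stand-in

# Positional lookup on the underlying NumPy array.
first_np = df["Btime"].to_numpy()[0]

# Positional scalar lookup without leaving pandas.
first_iat = df["Btime"].iat[0]

print(first_np, first_iat)  # 1.2 1.2
```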

Answer 3

  1. df.iloc[0].head(1) – only the first value of the first row (returned as a one-element Series).
  2. df.iloc[0] – the entire first row (as a Series).
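
A small sketch (hypothetical two-column frame) of what each expression returns, plus df.iloc[[0]], which keeps the first row as a one-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"ATime": [1.2, 1.4], "Btime": [1.2, 1.3]})

row = df.iloc[0]             # Series: the whole first row
first_cell = row.head(1)     # Series with a single entry ('ATime')
row_frame = df.iloc[[0]]     # DataFrame: first row, index label kept

print(type(row).__name__, list(first_cell.index), row_frame.shape)
# Series ['ATime'] (1, 2)
```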

Answer 4

In general, if you want to pick the first N rows of column J (J being the column’s integer position) from a pandas DataFrame, you can use iloc:

data = dataframe.iloc[0:N, J]
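
A chained form such as dataframe[0:N][:,J] fails, because DataFrame[...] does not accept a (rows, column) pair. A minimal sketch with a hypothetical frame, where N and J are values you choose:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
N, J = 2, 1   # first two rows, second column (position 1)

data = df.iloc[0:N, J]        # rows 0..N-1 of column J, as a Series
print(data.tolist())          # [5, 6]

# If the column is known by name instead, convert the row positions to labels:
data_by_name = df.loc[df.index[:N], "b"]
```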

Answer 5

To get, e.g., the value from column ‘test’ in the first row, this works:

df[['test']].values[0][0]

because df[['test']].values[0] on its own gives back an array (the whole first row) rather than a scalar.
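
A quick sketch (hypothetical column test) of why two [0] lookups are needed with the double-bracket form, and the single-bracket alternative:

```python
import pandas as pd

df = pd.DataFrame({"test": [7, 8, 9]})

arr = df[["test"]].values   # double brackets: a one-column DataFrame -> 2-D array
print(arr.shape)            # (3, 1)
print(arr[0])               # [7]
print(arr[0][0])            # 7

# Single brackets give a Series, so one positional lookup is enough.
same_value = df["test"].iloc[0]
```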


Answer 6

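Another way of getting the first row(s) while preserving the index. This only works when the DataFrame has a DatetimeIndex, and DataFrame.first is deprecated in recent pandas versions:

x = df.first('d') # Returns the rows from the first day. '3d' gives the first three days.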
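
If the goal is simply the first row with its index label, a sketch that depends neither on a DatetimeIndex nor on DataFrame.first (the timestamps below are made up for illustration). The last line keeps rows less than one day after the first timestamp, a rough stand-in for first('d'), not an exact reimplementation of it:

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2024-01-01 09:00", "2024-01-01 17:00", "2024-01-02 09:00"]),
)

first_row = df.iloc[[0]]   # list inside iloc -> one-row DataFrame, index kept
also_first = df.head(1)    # same result

# Rows less than one day after the first timestamp.
first_day = df.loc[df.index < df.index[0] + pd.Timedelta(days=1)]
```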

How are iloc, ix and loc different?

Question: How are iloc, ix and loc different?

Can someone explain how these three methods of slicing are different?
I’ve seen the docs, and I’ve seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in use is clearer?


Answer 0

Note: in pandas version 0.20.0 and above, ix is deprecated (it was removed entirely in pandas 1.0), and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.


First, here’s a recap of the three methods:

  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

It’s important to note some subtleties that can make ix slightly tricky to use:

  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.


To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We’ll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns us the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns us the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn’t in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can’t find a 6 in the index. Because our index is of integer type, ix doesn’t fall back to behaving like iloc.

If, however, our index was of mixed type, given an integer ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you’re only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results – try not to use ix.
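
A minimal sketch of the same comparisons with only loc and iloc, runnable on current pandas (where ix no longer exists):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

first_three = s.iloc[:3]   # 3 is a position: the first three rows
up_to_3 = s.loc[:3]        # 3 is a label: everything up to and including label 3

print(list(first_three.index))  # [49, 48, 47]
print(list(up_to_3.index))      # [49, 48, 47, 46, 45, 1, 2, 3]

# 6 is not a label and the index is unsorted, so a label slice up to 6 fails.
raised_key_error = False
try:
    s.loc[:6]
except KeyError:
    raised_key_error = True
print(raised_key_error)         # True
```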


Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including ‘c’ and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly – we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning “get the position of the label in this index”. Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row ‘c’ as well.
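
Alternatively, the selection can be expressed entirely in labels. A sketch on the same frame, where df.columns[:4] converts "the first four columns" into their labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list("abcde"), columns=["x", "y", "z", 8, 9])

# Everything as positions (the approach above).
by_iloc = df.iloc[: df.index.get_loc("c") + 1, :4]

# Everything as labels: a label slice includes 'c' without the +1.
by_loc = df.loc[:"c", df.columns[:4]]

print(by_loc.shape)  # (3, 4)
```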

There are further examples in pandas’ documentation here.


Answer 1

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let’s set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

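and so on. Now, it’s probably worth pointing out that the default row and column indices for a DataFrame are integers starting from 0, and in this case iloc and loc mostly work in the same way. This is why your three examples look interchangeable. They are not quite equivalent, though: a label slice includes its endpoint, so df.loc[:5] (and df.ix[:5], which behaves like loc on an integer index) returns six rows, labels 0 to 5, while df.iloc[:5] returns five. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.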
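
A two-line check of the endpoint difference on a default RangeIndex (hypothetical single-column frame):

```python
import pandas as pd

df = pd.DataFrame({"v": range(10)})   # default RangeIndex 0..9

print(len(df.iloc[:5]))   # 5  (positions 0..4, endpoint excluded)
print(len(df.loc[:5]))    # 6  (labels 0..5, endpoint included)
```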

Also, you can do column retrieval just by using the data frame’s __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix positional and named indexing, that is, indexing using positions on rows and names on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in (in pandas versions before 1.0; it has since been removed):

df.ix[:2, 'time']    # the first two rows of the 'time' column
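
.ix no longer exists in current pandas, so here is a sketch of two equivalents on the same empty frame; each converts one axis so that the whole selection is either positional or label-based:

```python
import pandas as pd

df = pd.DataFrame(index=["a", "b", "c"], columns=["time", "date", "name"])

# Convert the column name to a position, then select purely by position.
first_two = df.iloc[:2, df.columns.get_loc("time")]

# Or convert the row positions to labels, then select purely by label.
same = df.loc[df.index[:2], "time"]

print(list(first_two.index), list(same.index))  # ['a', 'b'] ['a', 'b']
```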

I think it’s also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

 b = [True, False, True]
 df.loc[b] 

Will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'
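
A sketch of the boolean-list selection and assignment on the same empty frame; here the assigned values are given as a list, one per True entry:

```python
import pandas as pd

df = pd.DataFrame(index=["a", "b", "c"], columns=["time", "date", "name"])
b = [True, False, True]

selected = df.loc[b]                   # rows 'a' and 'c'
df.loc[b, "name"] = ["Mary", "John"]   # one value per True entry, in order

print(list(selected.index), df.at["a", "name"], df.at["c", "name"])
# ['a', 'c'] Mary John
```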

Answer 2

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead, prefer integer location as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER – .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more


.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let’s take a look at a sample DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

When the DataFrame is displayed (for example in a Jupyter notebook), the labels are shown in bold. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina, Cornelia are used for the index.


The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follow its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead, default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc:

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list:

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined, so it defaults to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
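
To check the three kinds of .loc input without the screenshots, a sketch that rebuilds the sample df so it runs on its own; the last selection adds a step to a label slice, as with Python lists:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

one = df.loc["Penelope"]                      # Series (a single row)
some = df.loc[["Cornelia", "Jane", "Dean"]]   # DataFrame, rows in list order
span = df.loc["Aaron":"Dean"]                 # label slice: stop is included
every_other = df.loc["Jane":"Dean":2]         # label slice with a step of 2

print(list(some.index))         # ['Cornelia', 'Jane', 'Dean']
print(list(span.index))         # ['Aaron', 'Penelope', 'Dean']
print(list(every_other.index))  # ['Jane', 'Aaron', 'Dean']
```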

.iloc selects data only by integer location

Let’s now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc:

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows:

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]
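
The same check for the three kinds of .iloc input, with the sample df rebuilt so the block is self-contained:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

fifth = df.iloc[4]            # Series for the row at position 4
picked = df.iloc[[2, -2]]     # third row and second-to-last row
stepped = df.iloc[:5:3]       # positions 0 and 3

print(fifth.name, list(picked.index), list(stepped.index))
# Dean ['Aaron', 'Christina'] ['Jane', 'Penelope']
```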


Simultaneous selection of rows and columns with .loc and .iloc

One excellent feature of .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows; we simply need to separate the row and column selections with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object
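
A short check of both simultaneous selections above, with the sample df rebuilt:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# Labels for rows, a label slice for columns (stop-inclusive, open-ended here).
rows_cols = df.loc[["Jane", "Dean"], "height":]

# Integer positions on both axes.
food = df.iloc[[1, 4], 2]

print(list(rows_cols.columns))  # ['height', 'score', 'state']
print(food.tolist())            # ['Lamb', 'Cheese']
```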

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer locations, which was useful but at times confusing and ambiguous, and thankfully it has been deprecated. If you need to make a selection with a mix of labels and integer locations, you will have to convert your selections so that they are either all labels or all integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
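
Index.get_indexer is a vectorised alternative to the list comprehension: it converts a whole list of labels to positions at once. A sketch with the sample df rebuilt:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

labels = ["Nick", "Cornelia"]
positions = df.index.get_indexer(labels)   # array of row positions: [1, 6]

by_positions = df.iloc[positions, [2, 4]]
by_labels = df.loc[labels, df.columns[[2, 4]]]

print(by_positions.equals(by_labels))  # True
```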

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc but you cannot pass it a boolean series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 
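
A sketch confirming that both boolean selections return the same rows; .to_numpy() is used here in place of .values (either converts the boolean Series to a NumPy array):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

mask = df["age"] > 30                               # boolean Series
via_loc = df.loc[mask, ["food", "score"]]
via_iloc = df.iloc[mask.to_numpy(), [2, 4]]         # plain array for iloc

print(list(via_loc.index))       # ['Dean', 'Christina', 'Cornelia']
print(via_loc.equals(via_iloc))  # True
```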

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]
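
A quick check of which columns that stepped label slice keeps (sample df rebuilt):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# ':' keeps every row; 'color':'score' spans color, food, height, score,
# and the step of 2 keeps every other one of those.
cols = df.loc[:, "color":"score":2]

print(list(cols.columns))  # ['color', 'height']
```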


The indexing operator, [], can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row label or by integer location. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously; trying to do so raises an error (the exact exception depends on the pandas version):

df[3:5, 'color']
TypeError: unhashable type: 'slice'
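
A sketch of the [] behaviours described above, ending with an .iloc form of the failed df[3:5, 'color'] (sample df rebuilt):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

one_col = df["food"]                    # string -> one column as a Series
two_cols = df[["food", "score"]]        # list -> columns as a DataFrame
by_label = df["Penelope":"Christina"]   # slice of labels -> rows, stop included
by_position = df[2:6:2]                 # slice of ints -> rows by position

print(list(by_label.index))     # ['Penelope', 'Dean', 'Christina']
print(list(by_position.index))  # ['Aaron', 'Dean']

# Rows and a column together: convert the column label, then use .iloc.
subset = df.iloc[3:5, df.columns.get_loc("color")]
print(subset.tolist())          # ['white', 'gray']
```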

How to avoid Python/Pandas creating an index in a saved csv?

Question: How to avoid Python/Pandas creating an index in a saved csv?

I am trying to save a csv to a folder after making some edits to the file.

Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.

I tried:

pd.read_csv('C:/Path to file to edit.csv', index_col = False)

And to save the file…

pd.to_csv('C:/Path to save edited file.csv', index_col = False)

However, I still got the unwanted index column. How can I avoid this when I save my files?


Answer 0

Use index=False.

df.to_csv('your.csv', index=False)
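To see the effect of index=False without writing a file, you can call to_csv() with no path, which returns the CSV text as a string. The two-row frame below is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # hypothetical two-row frame

with_index = df.to_csv()                 # no path: returns the CSV text as a string
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,a,b  (the unnamed index column comes first)
print(without_index.splitlines()[0])  # a,b

# Reading the index-free text back yields only the original columns
round_trip = pd.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))  # ['a', 'b']
```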

Answer 1

There are two ways to handle the situation where we do not want the index to be stored in the csv file.

  1. As others have stated, you can use index=False while saving your dataframe to a csv file.

    df.to_csv('file_name.csv', index=False)

  2. Or you can save your dataframe as it is, with an index, and while reading it back just drop the column named 'Unnamed: 0' that contains your previous index. Simple!

    df.to_csv('file_name.csv')
    df_new = pd.read_csv('file_name.csv').drop(['Unnamed: 0'], axis=1)
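A sketch of option 2, using an in-memory buffer instead of a real file and hypothetical data. When the saved index has no name, read_csv labels that leftover column 'Unnamed: 0' (capital U, with a colon), so that is the exact name to drop; alternatively, index_col=0 turns it back into the index.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # hypothetical data
csv_text = df.to_csv()  # the index is written as a first column with an empty header

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'a', 'b']

# Either drop the leftover column...
dropped = raw.drop(columns=["Unnamed: 0"])

# ...or turn the first column back into the index while reading
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(as_index.columns))  # ['a', 'b']
```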


Answer 2

If you don't want the index as an extra column, read the file using index_col=0, so that the first column becomes the index again:

import pandas as pd
df = pd.read_csv('file.csv', index_col=0)

and save it using:

df.to_csv('file.csv', index=False)

Answer 3

As others have stated, if you don’t want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)

However, since the data you usually work with has some sort of index of its own, say a ‘timestamp’ column, I would keep the index and load the data using it.

So, to save the indexed data, first set its index and then save the DataFrame:

df = df.set_index('timestamp')  # set_index returns a new DataFrame rather than modifying df
df.to_csv('processed.csv')

Afterwards, you can either read the data with the index:

pd.read_csv('processed.csv', index_col='timestamp')

or read the data, and then set the index:

df = pd.read_csv('filename.csv')
df = df.set_index('column_name')
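A round-trip sketch of this approach, with hypothetical data and an io.StringIO buffer standing in for processed.csv. It assumes the timestamps are kept as plain strings (no date parsing).

```python
import io
import pandas as pd

# Hypothetical data with a 'timestamp' column
df = pd.DataFrame({"timestamp": ["2021-01-01", "2021-01-02"], "value": [10, 20]})

df = df.set_index("timestamp")  # set_index returns a new DataFrame, so reassign it

buffer = io.StringIO()          # stands in for 'processed.csv'
df.to_csv(buffer)               # the named index is written as the first column
buffer.seek(0)

loaded = pd.read_csv(buffer, index_col="timestamp")
print(loaded)
```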

Answer 4

Another solution, if you want to keep this column as the index:

pd.read_csv('filename.csv', index_col='Unnamed: 0')

Answer 5

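If you want a well-defined format, the following statement does the job:

dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)

In this case you get a csv file with ',' as the separator between columns and UTF-8 encoding. In addition, the numerical index won't appear. (sep=',' and encoding='utf-8' are already the defaults for to_csv, so index=False is the part that actually removes the index.)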


Selecting a row of a pandas Series/DataFrame by integer index

Question: Selecting a row of a pandas Series/DataFrame by integer index

I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.

In [26]: df.ix[2]
Out[26]: 
A    1.027680
B    1.514210
C   -1.466963
D   -0.162339
Name: 2000-01-03 00:00:00

In [27]: df[2:3]
Out[27]: 
                  A        B         C         D
2000-01-03  1.02768  1.51421 -1.466963 -0.162339

I would expect df[2] to work the same way as df[2:3] to be consistent with Python indexing convention. Is there a design reason for not supporting indexing row by single integer?


Answer 0

Echoing @HYRY, see the new docs in 0.11:

http://pandas.pydata.org/pandas-docs/stable/indexing.html

Here we have new operators: .iloc to explicitly support only integer indexing, and .loc to explicitly support only label indexing.

For example, imagine this scenario:

In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))

In [2]: df
Out[2]: 
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654

In [5]: df.iloc[[2]]
Out[5]: 
          A         B
4 -0.284561  0.756029

In [6]: df.loc[[2]]
Out[6]: 
          A         B
2 -0.470056  1.192211

[] with a slice selects rows only (never columns).
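The same comparison with fixed values instead of random ones, so the results are reproducible (the frame is hypothetical). Because the row labels are 0, 2, 4, 6, 8, position 2 and label 2 point at different rows.

```python
import pandas as pd

# Fixed values instead of random ones; row labels are 0, 2, 4, 6, 8
df = pd.DataFrame(
    {"A": [0.0, 1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0, 9.0]},
    index=range(0, 10, 2),
)

by_position = df.iloc[[2]]  # third row, whose label is 4
by_label = df.loc[[2]]      # the row labelled 2, which is the second row
row = df.iloc[2]            # a scalar position returns that row as a Series
rows = df[1:3]              # a slice in [] selects rows by position here: labels 2 and 4

print(by_position, by_label, row, rows, sep="\n\n")
```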


Answer 1

The primary purpose of the DataFrame indexing operator, [], is to select columns.

When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.

So, in the question above: df[2] searches for a column name matching the integer value 2. This column does not exist and a KeyError is raised.


The DataFrame indexing operator completely changes behavior to select rows when slice notation is used

Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.

df[2:3]

This will slice beginning from the row with integer location 2 up to 3, exclusive of the last element. So, just a single row. The following selects rows beginning at integer location 6 up to but not including 20 by every third row.

df[6:20:3]

You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.

I almost never use this slice notation with the indexing operator, as it is not explicit and is rarely used. When slicing by rows, stick with .loc/.iloc.
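A small sketch of this behaviour on a hypothetical four-row frame with the default integer row labels: a bare integer is treated as a column key and raises KeyError, while a slice selects rows.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]})  # hypothetical data

try:
    df[2]                      # a bare integer is looked up as a column label
    found_column = True
except KeyError:
    found_column = False       # there is no column named 2

one_row = df[2:3]              # a slice selects rows: just position 2, as a DataFrame
row = df.iloc[2]               # explicit positional row selection, as a Series

print(found_column)            # False
print(one_row)
print(row)
```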


Answer 2

You can think of a DataFrame as a dict of Series. df[key] tries to select the column labelled key and returns a Series object.

However, slicing inside [] slices the rows, because that is a very common operation.

You can read the documentation for details:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
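A sketch of the dict-of-Series view, using hypothetical data: a key lookup behaves like a dict lookup and returns a column, while a slice falls through to the rows.

```python
import pandas as pd

# A DataFrame built from a dict of Series (hypothetical data)
columns = {"A": pd.Series([1, 2, 3]), "B": pd.Series([4, 5, 6])}
df = pd.DataFrame(columns)

col_b = df["B"]      # key lookup returns the column as a Series, like a dict lookup
first_two = df[0:2]  # a slice, however, selects rows

print(col_b.tolist())            # [4, 5, 6]
print(first_two.index.tolist())  # [0, 1]
```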


Answer 3

For index-based access to a pandas table, one can also convert the table to a NumPy array:

np_df = df.to_numpy()  # older pandas: df.as_matrix(), removed in pandas 1.0

and then

np_df[i]

would work, returning row i as a 1-D array.
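A short sketch of the NumPy route, with hypothetical data. Note that to_numpy() drops the row and column labels and combines all columns into one array, so a frame with mixed column types would typically give an object array.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # hypothetical data

np_df = df.to_numpy()  # 2-D NumPy array; row and column labels are dropped
row = np_df[1]         # plain integer indexing now selects a row

print(row)             # [2 5]
print(np_df[1, 0])     # 2: row 1, column 0
```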


Answer 4

You can take a look at the source code.

DataFrame has a private method, _slice(), to slice the DataFrame, and it takes an axis parameter that determines which axis to slice. DataFrame.__getitem__() doesn't set the axis when it invokes _slice(), so _slice() slices along its default axis, 0 (the rows).

You can try a simple experiment that might help (note that _slice() is private, so its behavior may change between pandas versions):

print(df._slice(slice(0, 2)))
print(df._slice(slice(0, 2), 0))
print(df._slice(slice(0, 2), 1))

Answer 5

You can loop through the rows of the DataFrame like this:

for ad in range(len(dataframe_c)):  # len() is the number of rows; .size would be rows * columns
    print(dataframe_c.values[ad])

Getting the last element of a list

Question: Getting the last element of a list

In Python, how do you get the last element of a list?


Answer 0

some_list[-1] is the shortest and most Pythonic.

In fact, you can do much more with this syntax. The some_list[-n] syntax gets the nth-to-last element. So some_list[-1] gets the last element, some_list[-2] gets the second to last, etc., all the way down to some_list[-len(some_list)], which gives you the first element.

You can also set list elements in this way. For instance:

>>> some_list = [1, 2, 3]
>>> some_list[-1] = 5 # Set the last element
>>> some_list[-2] = 3 # Set the second to last element
>>> some_list
[1, 3, 5]

Note that getting a list item by index will raise an IndexError if the expected item doesn’t exist. This means that some_list[-1] will raise an exception if some_list is empty, because an empty list can’t have a last element.
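A quick check of the negative-index rule above (the list is illustrative): some_list[-n] is the same element as some_list[len(some_list) - n], and indexing an empty list raises IndexError.

```python
some_list = ["a", "b", "c", "d"]

# some_list[-n] refers to the same element as some_list[len(some_list) - n]
for n in range(1, len(some_list) + 1):
    assert some_list[-n] == some_list[len(some_list) - n]

print(some_list[-1])                # d (the last element)
print(some_list[-len(some_list)])   # a (the first element)

try:
    [][-1]
except IndexError as exc:
    print("empty list:", exc)       # empty list: list index out of range
```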


Answer 1

If your str() or list() objects might end up being empty, e.g. astr = '' or alist = [], then you might want to use alist[-1:] instead of alist[-1] for object “sameness”.

The significance of this is:

alist = []
alist[-1]   # will generate an IndexError exception whereas 
alist[-1:]  # will return an empty list
astr = ''
astr[-1]    # will generate an IndexError exception whereas
astr[-1:]   # will return an empty str

The distinction being made is that returning an empty list object or empty str object is more “last element”-like than raising an exception.


Answer 2

You can also do:

alist.pop()

Whether this is appropriate depends on what you want to do with your list, because the pop() method removes the last element from it (and raises IndexError if the list is empty).


Answer 3

The simplest way to display the last element in Python is

>>> a_list = [1, 2, 3]
>>> a_list[-1:]  # slicing returns a one-element list
[3]
>>> a_list[-1]   # indexing returns the value itself
3

There are many other ways to achieve this, but these are short and sweet to use.


Answer 4

In Python, how do you get the last element of a list?

To just get the last element,

  • without modifying the list, and
  • assuming you know the list has a last element (i.e. it is nonempty)

pass -1 to the subscript notation:

>>> a_list = ['zero', 'one', 'two', 'three']
>>> a_list[-1]
'three'

Explanation

Indexes and slices can take negative integers as arguments.

I have modified an example from the documentation to indicate which item in a sequence each index references, in this case, in the string "Python", -1 references the last element, the character, 'n':

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5 
  -6  -5  -4  -3  -2  -1

>>> p = 'Python'
>>> p[-1]
'n'

Assignment via iterable unpacking

This method may unnecessarily materialize a second list just to get the last element, but it is included for the sake of completeness (and because it supports any iterable, not just lists):

>>> *head, last = a_list
>>> last
'three'

The variable name, head is bound to the unnecessary newly created list:

>>> head
['zero', 'one', 'two']

If you intend to do nothing with that list, this would be more apropos:

*_, last = a_list

Or, really, if you know it’s a list (or at least accepts subscript notation):

last = a_list[-1]

In a function

A commenter said:

I wish Python had a function for first() and last() like Lisp does… it would get rid of a lot of unnecessary lambda functions.

These would be quite simple to define:

def last(a_list):
    return a_list[-1]

def first(a_list):
    return a_list[0]

Or use operator.itemgetter:

>>> import operator
>>> last = operator.itemgetter(-1)
>>> first = operator.itemgetter(0)

In either case:

>>> last(a_list)
'three'
>>> first(a_list)
'zero'

Special cases

If you’re doing something more complicated, you may find it more performant to get the last element in slightly different ways.

If you’re new to programming, you should avoid this section, because it couples otherwise semantically different parts of algorithms together. If you change your algorithm in one place, it may have an unintended impact on another line of code.

I try to provide caveats and conditions as completely as I can, but I may have missed something. Please comment if you think I’m leaving a caveat out.

Slicing

A slice of a list returns a new list – so we can slice from -1 to the end if we are going to want the element in a new list:

>>> a_slice = a_list[-1:]
>>> a_slice
['three']

This has the upside of not failing if the list is empty:

>>> empty_list = []
>>> tail = empty_list[-1:]
>>> if tail:
...     do_something(tail)

Whereas attempting to access by index raises an IndexError which would need to be handled:

>>> empty_list[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

But again, slicing for this purpose should only be done if you need:

  • a new list created
  • and the new list to be empty if the prior list was empty.

for loops

As a feature of Python, there is no inner scoping in a for loop.

If you’re performing a complete iteration over the list already, the last element will still be referenced by the variable name assigned in the loop:

>>> def do_something(arg): pass
>>> for item in a_list:
...     do_something(item)
...     
>>> item
'three'

This is not semantically the last thing in the list. This is semantically the last thing that the name, item, was bound to.

>>> def do_something(arg): raise Exception
>>> for item in a_list:
...     do_something(item)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 1, in do_something
Exception
>>> item
'zero'

Thus this should only be used to get the last element if you

  • are already looping, and
  • you know the loop will finish (not break or exit due to errors), otherwise it will point to the last element referenced by the loop.

Getting and removing it

We can also mutate our original list by removing and returning the last element:

>>> a_list.pop(-1)
'three'
>>> a_list
['zero', 'one', 'two']

But now the original list is modified.

(-1 is actually the default argument, so list.pop can be used without an index argument):

>>> a_list.pop()
'two'

Only do this if

  • you know the list has elements in it, or are prepared to handle the exception if it is empty, and
  • you do intend to remove the last element from the list, treating it like a stack.

These are valid use-cases, but not very common.

Saving the rest of the reverse for later:

I don’t know why you’d do it, but for completeness, since reversed returns an iterator (which supports the iterator protocol) you can pass its result to next:

>>> next(reversed([1,2,3]))
3

So it’s like doing the reverse of this:

>>> next(iter([1,2,3]))
1

But I can’t think of a good reason to do this, unless you’ll need the rest of the reverse iterator later, which would probably look more like this:

reverse_iterator = reversed([1,2,3])
last_element = next(reverse_iterator)

use_later = list(reverse_iterator)

and now:

>>> use_later
[2, 1]
>>> last_element
3

Answer 5

To prevent IndexError: list index out of range, use this syntax:

mylist = [1, 2, 3, 4]

# Gives mylist itself, i.e. [] rather than None, when mylist is empty:
value = mylist and mylist[-1]

# With specified default value (option 1); note that this also gives 'default'
# when the last element is falsy, e.g. 0 or '':
value = mylist and mylist[-1] or 'default'

# With specified default value (option 2, safe for falsy elements):
value = mylist[-1] if mylist else 'default'
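A sketch of why option 2 is the safer choice: option 1 also returns the default when the last element is falsy. The helper last_or is hypothetical, not part of Python's standard library.

```python
def last_or(items, default=None):
    """Hypothetical helper: the last element of items, or default if items is empty."""
    return items[-1] if items else default


values = [1, 2, 0]

print(values and values[-1] or "default")  # default  (0 is falsy, so `or` replaces it)
print(last_or(values, "default"))          # 0
print(last_or([], "default"))              # default
print([] and [][-1])                       # []  (`and` returns the empty list itself)
```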

Answer 6

Another method (note that reverse() reverses the list in place, so this modifies some_list):

some_list.reverse() 
some_list[0]

Answer 7

lst[-1] is the best approach, but with general iterables, consider more_itertools.last:

Code

import more_itertools as mit


mit.last([0, 1, 2, 3])
# 3

mit.last(iter([1, 2, 3]))
# 3

mit.last([], "some default")
# 'some default'

Answer 8

list[-1] will retrieve the last element of the list without changing the list. list.pop() will retrieve the last element of the list, but it will mutate/change the original list. Usually, mutating the original list is not recommended.

Alternatively, if, for some reason, you’re looking for something less pythonic, you could use list[len(list)-1], assuming the list is not empty.


Answer 9

You can also use the code below, if you do not want to get IndexError when the list is empty.

next(reversed(some_list), None)

Answer 10

OK, but what about items[len(items) - 1], the way it is done in almost every language? IMO this is the easiest way to get the last element, because it does not require any Python-specific knowledge.


Finding the index of an item in a list

Question: Finding the index of an item in a list

Given a list ["foo", "bar", "baz"] and an item in the list "bar", how do I get its index (1) in Python?


Answer 0

>>> ["foo", "bar", "baz"].index("bar")
1

Reference: Data Structures > More on Lists

Caveats follow

Note that while this is perhaps the cleanest way to answer the question as asked, index is a rather weak component of the list API, and I can’t remember the last time I used it in anger. It’s been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index follow. It is probably worth initially taking a look at the documentation for it:

list.index(x[, start[, end]])

Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item.

The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.
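A short illustration of the start and end arguments (the list is made up): the search window is limited, but the returned index still counts from the beginning of the full list.

```python
letters = ["a", "b", "a", "b", "a"]

print(letters.index("b"))        # 1: first match in the whole list
print(letters.index("b", 2))     # 3: the search starts at position 2, the result is still absolute
print(letters.index("a", 1, 3))  # 2: only positions 1 and 2 are searched (end is exclusive)

try:
    letters.index("b", 4)        # position 4 holds 'a', so there is no match
except ValueError as exc:
    print(exc)                   # 'b' is not in list
```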

Linear time-complexity in list length

An index call checks every element of the list in order, until it finds a match. If your list is long, and you don’t know roughly where in the list it occurs, this search could become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is roughly four orders of magnitude (about 20,000 times) faster than straight l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:

>>> import timeit
>>> timeit.timeit('l.index(999_999)', setup='l = list(range(0, 1_000_000))', number=1000)
9.356267921015387
>>> timeit.timeit('l.index(999_999, 999_990, 1_000_000)', setup='l = list(range(0, 1_000_000))', number=1000)
0.0004404920036904514

Only returns the index of the first match to its argument

A call to index searches through the list in order until it finds a match, and stops there. If you expect to need indices of more matches, you should use a list comprehension, or generator expression.

>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2

Most places where I once would have used index, I now use a list comprehension or generator expression because they’re more generalizable. So if you’re considering reaching for index, take a look at these excellent Python features.

Throws if element not present in list

A call to index results in a ValueError if the item’s not present.

>>> [1, 1].index(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 2 is not in list

If the item might not be present in the list, you should either

  1. Check for it first with item in my_list (clean, readable approach), or
  2. Wrap the index call in a try/except block which catches ValueError (probably faster, at least when the list to search is long, and the item is usually present.)

Answer 1

One thing that is really helpful in learning Python is to use the interactive help function:

>>> help(["foo", "bar", "baz"])
Help on list object:

class list(object)
 ...

 |
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value
 |

which will often lead you to the method you are looking for.


Answer 2

The majority of answers explain how to find a single index, but their methods do not return multiple indexes if the item is in the list multiple times. Use enumerate():

for i, j in enumerate(['foo', 'bar', 'baz']):
    if j == 'bar':
        print(i)

The index() method only returns the first occurrence, while the enumerate() approach finds all occurrences.

As a list comprehension:

[i for i, j in enumerate(['foo', 'bar', 'baz']) if j == 'bar']

Here’s also another small solution with itertools.count() (which is pretty much the same approach as enumerate):

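from itertools import count  # Python 3's built-in zip is already lazy; itertools.izip exists only in Python 2
[i for i, j in zip(count(), ['foo', 'bar', 'baz']) if j == 'bar']

In the author's Python 2 timings (the benchmark imports itertools.izip, which does not exist in Python 3), this was slightly more efficient for larger lists than using enumerate():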

$ python -m timeit -s "from itertools import izip as zip, count" "[i for i, j in zip(count(), ['foo', 'bar', 'baz']*500) if j == 'bar']"
10000 loops, best of 3: 174 usec per loop
$ python -m timeit "[i for i, j in enumerate(['foo', 'bar', 'baz']*500) if j == 'bar']"
10000 loops, best of 3: 196 usec per loop

Answer 3

To get all indexes:

indexes = [i for i,x in enumerate(xs) if x == 'foo']

Answer 4

index() returns the first index of value!

| index(…)
| L.index(value, [start, [stop]]) -> integer — return first index of value

def all_indices(value, qlist):
    indices = []
    idx = -1
    while True:
        try:
            idx = qlist.index(value, idx+1)
            indices.append(idx)
        except ValueError:
            break
    return indices

all_indices("foo", ["foo","bar","baz","foo"])

Answer 5

A problem will arise if the element is not in the list. This function handles the issue:

# if element is found it returns index of element else returns None

def find_element_in_list(element, list_element):
    try:
        index_element = list_element.index(element)
        return index_element
    except ValueError:
        return None

Answer 6

a = ["foo","bar","baz",'bar','any','much']

indexes = [index for index in range(len(a)) if a[index] == 'bar']

Answer 7

You have to check whether the element you're searching for is in the list first:

if 'your_element' in mylist:
    print(mylist.index('your_element'))
else:
    print(None)

Answer 8

All of the proposed functions here reproduce inherent language behavior but obscure what’s going on.

[i for i in range(len(mylist)) if mylist[i]==myterm]  # get the indices

[each for each in mylist if each==myterm]             # get the items

mylist.index(myterm) if myterm in mylist else None    # get the first index and fail quietly

Why write a function with exception handling if the language provides the methods to do what you want itself?


Answer 9

If you want all indexes, then you can use NumPy:

import numpy as np

array = [1, 2, 1, 3, 4, 5, 1]
item = 1
np_array = np.array(array)
item_index = np.where(np_array==item)
print item_index
# Out: (array([0, 2, 6], dtype=int64),)

It is a clear, readable solution.
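np.where returns a tuple with one index array per dimension, which is why the output above is wrapped in (...,). For a 1-D list you can take element [0] of that tuple, or use np.flatnonzero, which returns the index array directly. This sketch assumes NumPy is installed:

```python
import numpy as np

array = [1, 2, 1, 3, 4, 5, 1]
item = 1

np_array = np.array(array)
# np.where(cond) gives a tuple of index arrays; [0] picks the only dimension
via_where = np.where(np_array == item)[0]
# np.flatnonzero returns the indices of the True entries of the flattened array
via_flatnonzero = np.flatnonzero(np_array == item)

print(via_where.tolist())        # [0, 2, 6]
print(via_flatnonzero.tolist())  # [0, 2, 6]
```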


Answer 10

Finding the index of an item given a list containing it in Python

For a list ["foo", "bar", "baz"] and an item in the list "bar", what’s the cleanest way to get its index (1) in Python?

Well, sure, there’s the index method, which returns the index of the first occurrence:

>>> l = ["foo", "bar", "baz"]
>>> l.index('bar')
1

There are a couple of issues with this method:

  • if the value isn’t in the list, you’ll get a ValueError
  • if the value occurs more than once, you only get the index of the first occurrence

No values

If the value could be missing, you need to catch the ValueError.

You can do so with a reusable definition like this:

def index(a_list, value):
    try:
        return a_list.index(value)
    except ValueError:
        return None

And use it like this:

>>> print(index(l, 'quux'))
None
>>> print(index(l, 'bar'))
1

The downside is that callers will probably have to check whether the returned value is None:

result = index(a_list, value)
if result is not None:
    do_something(result)
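One way to soften that downside is to let the caller choose what comes back on a miss, much as dict.get does. Here is a sketch. index_or is a hypothetical helper name, not a standard function:

```python
def index_or(a_list, value, default=None):
    """Hypothetical helper: first index of value in a_list, or default if absent."""
    try:
        return a_list.index(value)
    except ValueError:
        return default


l = ["foo", "bar", "baz"]
print(index_or(l, "bar"))       # 1
print(index_or(l, "quux"))      # None
print(index_or(l, "quux", -1))  # -1
```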

More than one value in the list

If the value can occur more than once, list.index won’t give you complete information:

>>> l.append('bar')
>>> l
['foo', 'bar', 'baz', 'bar']
>>> l.index('bar')              # nothing at index 3?
1

To get every index, use enumerate inside a list comprehension:

>>> [index for index, v in enumerate(l) if v == 'bar']
[1, 3]
>>> [index for index, v in enumerate(l) if v == 'boink']
[]

If there are no occurrences, you can detect that with a boolean check of the result, or simply do nothing when you loop over it:

indexes = [index for index, v in enumerate(l) if v == 'boink']
for index in indexes:
    do_something(index)
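The boolean check works because an empty list is falsy. Here is a short sketch using the same list:

```python
l = ["foo", "bar", "baz", "bar"]

found = [index for index, v in enumerate(l) if v == "bar"]
missing = [index for index, v in enumerate(l) if v == "boink"]

# An empty list is falsy, so "if indexes" distinguishes hits from misses
for name, indexes in (("bar", found), ("boink", missing)):
    if indexes:
        print(name, "found at", indexes)  # bar found at [1, 3]
    else:
        print(name, "not found")          # boink not found
```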

Better data munging with pandas

If you have pandas, you can easily get this information with a Series object:

>>> import pandas as pd
>>> series = pd.Series(l)
>>> series
0    foo
1    bar
2    baz
3    bar
dtype: object

A comparison check will return a series of booleans:

>>> series == 'bar'
0    False
1     True
2    False
3     True
dtype: bool

Pass that series of booleans to the series via subscript notation, and you get just the matching members:

>>> series[series == 'bar']
1    bar
3    bar
dtype: object

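If you want just the indexes, the index attribute returns an Index of integers:

>>> series[series == 'bar'].index
Int64Index([1, 3], dtype='int64')

(Int64Index was removed in pandas 2.0. In current versions this displays as Index([1, 3], dtype='int64').)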

And if you want them in a list or tuple, just pass them to the constructor:

>>> list(series[series == 'bar'].index)
[1, 3]

Yes, you could use a list comprehension with enumerate too, but that’s just not as elegant, in my opinion – you’re doing tests for equality in Python, instead of letting builtin code written in C handle it:

>>> [i for i, value in enumerate(l) if value == 'bar']
[1, 3]

Is this an XY problem?

The XY problem is asking about your attempted solution rather than your actual problem.

Why do you think you need the index given an element in a list?

If you already know the value, why do you care where it is in a list?

If the value isn’t there, catching the ValueError is rather verbose – and I prefer to avoid that.

I’m usually iterating over the list anyway, so I’ll usually keep a pointer to any interesting information, getting the index with enumerate.

If you’re munging data, you should probably be using pandas – which has far more elegant tools than the pure Python workarounds I’ve shown.

I do not recall needing list.index, myself. However, I have looked through the Python standard library, and I see some excellent uses for it.

There are many, many uses for it in idlelib, for GUI and text parsing.

The keyword module uses it to find comment markers in the module to automatically regenerate the list of keywords in it via metaprogramming.

In Lib/mailbox.py it seems to be using it like an ordered mapping:

key_list[key_list.index(old)] = new

and

del key_list[key_list.index(key)]

In Lib/http/cookiejar.py, it is used to convert a month abbreviation into a month number (1 to 12):

mon = MONTHS_LOWER.index(mon.lower())+1
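To illustrate that idiom: with a list of lowercase month abbreviations, index() gives a 0-based position, and adding 1 gives the calendar month number. MONTHS_LOWER is defined here for illustration only and reflects the idea, not necessarily cookiejar’s exact definition. month_number is a hypothetical name:

```python
# Illustrative table of lowercase month abbreviations
MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]


def month_number(mon):
    """Hypothetical helper: 'Jan' -> 1, ..., 'Dec' -> 12."""
    # index() is 0-based, calendar months are 1-based, hence the + 1
    return MONTHS_LOWER.index(mon.lower()) + 1


print(month_number("Jan"))  # 1
print(month_number("Dec"))  # 12
```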

In Lib/tarfile.py, as in distutils, it is used to take a slice up to a given item:

members = members[:members.index(tarinfo)]

In Lib/pickletools.py:

numtopop = before.index(markobject)

What these usages have in common is that they operate on lists of limited size (important because list.index is an O(n) lookup), and they are mostly used for parsing (and, in the case of IDLE, for the UI).

While there are use-cases for it, they are fairly uncommon. If you find yourself looking for this answer, ask yourself if what you’re doing is the most direct usage of the tools provided by the language for your use-case.


Answer 11

All indexes with the zip function:

get_indexes = lambda x, xs: [i for (y, i) in zip(xs, range(len(xs))) if x == y]

print(get_indexes(2, [1, 2, 3, 4, 5, 6, 3, 2, 3, 2]))  # [1, 7, 9]
print(get_indexes('f', 'xsfhhttytffsafweef'))          # [2, 9, 10, 13, 17]
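PEP 8 recommends a def over binding a lambda to a name, and enumerate already pairs each item with its index. Here is the same helper written that way, as a sketch with the same behavior:

```python
def get_indexes(x, xs):
    """Return every index i where xs[i] == x (works for lists and strings)."""
    return [i for i, y in enumerate(xs) if y == x]


print(get_indexes('a', 'banana'))     # [1, 3, 5]
print(get_indexes(2, [1, 2, 3, 2]))   # [1, 3]
```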

Answer 12

Getting all the occurrences and the position of one or more (identical) items in a list

enumerate(alist) yields (index, item) pairs. Keep the index n whenever the item x equals what you are looking for:

>>> alist = ['foo', 'spam', 'egg', 'foo']
>>> foo_indexes = [n for n,x in enumerate(alist) if x=='foo']
>>> foo_indexes
[0, 3]
>>>

Let’s turn this into a function, indexlist.

It takes the item and the list (or string) as arguments and returns every position of the item, as we saw before.

def indexlist(item2find, list_or_string):
  "Returns all indexes of an item in a list or a string"
  return [n for n,item in enumerate(list_or_string) if item==item2find]

print(indexlist("1", "010101010"))

Output:

[1, 3, 5, 7]

Or, more simply, with a plain loop:

for n, i in enumerate([1, 2, 3, 4, 1]):
    if i == 1:
        print(n)

Output:

0
4

Answer 13

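For a list of lists, you can search the first element of each sublist like this (a ValueError is raised if an item of b is missing):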

a = [['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']]
b = ['phone', 'lost']

res = [[x[0] for x in a].index(y) for y in b]
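The expression above rebuilds the list of first elements once for every y in b, and it raises a ValueError if any y is missing. The following sketch builds the lookup once and returns -1 for absent keys. The extra 'keys' entry and the -1 sentinel are illustrative choices, and first_col is a hypothetical name:

```python
a = [['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']]
b = ['phone', 'lost', 'keys']  # 'keys' is added to show the missing case

# Map each sublist's first element to the index of its first occurrence
first_col = {}
for i, row in enumerate(a):
    first_col.setdefault(row[0], i)

res = [first_col.get(y, -1) for y in b]
print(res)  # [1, 2, -1]
```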

Answer 14

Another option combines count() with the start argument of index():

>>> a = ['red', 'blue', 'green', 'red']
>>> b = 'red'
>>> offset = 0
>>> indices = list()
>>> for i in range(a.count(b)):
...     indices.append(a.index(b,offset))
...     offset = indices[-1]+1
... 
>>> indices
[0, 3]
>>> 
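The count() call above scans the list once more than necessary. This sketch relies only on the start argument of index() and stops at the first ValueError. indices_of is a hypothetical name. Because str.index also accepts a start argument, the same loop works on strings:

```python
def indices_of(seq, value):
    """Hypothetical helper: all indices of value in seq, via repeated index() calls."""
    result = []
    start = 0
    while True:
        try:
            # Search only from just after the previous hit
            i = seq.index(value, start)
        except ValueError:
            return result
        result.append(i)
        start = i + 1


print(indices_of(['red', 'blue', 'green', 'red'], 'red'))  # [0, 3]
print(indices_of('abcdaababb', 'a'))                       # [0, 4, 5, 7]
```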

Answer 15

And now, for something completely different…

… like confirming the existence of the item before getting the index. The nice thing about this approach is the function always returns a list of indices — even if it is an empty list. It works with strings as well.

def indices(l, val):
    """Always returns a list containing the indices of val in l"""
    retval = []
    last = 0
    while val in l[last:]:
        i = l[last:].index(val)
        retval.append(last + i)
        last += i + 1
    return retval

l = ['bar','foo','bar','baz','bar','bar']
q = 'bar'
print(indices(l,q))
print(indices(l,'bat'))
print(indices('abcdaababb','a'))

When pasted into an interactive python window:

Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def indices(the_list, val):
...     """Always returns a list containing the indices of val in the_list"""
...     retval = []
...     last = 0
...     while val in the_list[last:]:
...             i = the_list[last:].index(val)
...             retval.append(last + i)
...             last += i + 1   
...     return retval
... 
>>> l = ['bar','foo','bar','baz','bar','bar']
>>> q = 'bar'
>>> print indices(l,q)
[0, 2, 4, 5]
>>> print indices(l,'bat')
[]
>>> print indices('abcdaababb','a')
[0, 4, 5, 7]
>>> 

Update

After another year of heads-down Python development, I’m a bit embarrassed by my original answer, so to set the record straight: one can certainly use the above code, but the much more idiomatic way to get the same behavior is a list comprehension combined with the enumerate() function.

Something like this:

def indices(l, val):
    """Always returns a list containing the indices of val in l"""
    return [index for index, value in enumerate(l) if value == val]

l = ['bar','foo','bar','baz','bar','bar']
q = 'bar'
print(indices(l,q))
print(indices(l,'bat'))
print(indices('abcdaababb','a'))

Which, when pasted into an interactive Python window, yields:

Python 2.7.14 |Anaconda, Inc.| (default, Dec  7 2017, 11:07:58) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def indices(l, val):
...     """Always returns a list containing the indices of val in the_list"""
...     return [index for index, value in enumerate(l) if value == val]
... 
>>> l = ['bar','foo','bar','baz','bar','bar']
>>> q = 'bar'
>>> print indices(l,q)
[0, 2, 4, 5]
>>> print indices(l,'bat')
[]
>>> print indices('abcdaababb','a')
[0, 4, 5, 7]
>>> 

And now, after reviewing this question and all the answers, I realize that this is exactly what FMc suggested in his earlier answer. At the time I originally answered this question, I didn’t even see that answer, because I didn’t understand it. I hope that my somewhat more verbose example will aid understanding.

If the single line of code above still doesn’t make sense to you, I highly recommend you Google ‘python list comprehension’ and take a few minutes to familiarize yourself. It’s just one of the many powerful features that make it a joy to use Python to develop code.


Answer 16

A variant on the answer from FMc and user7177 will give a dict that can return all indices for any entry:

>>> a = ['foo','bar','baz','bar','any', 'foo', 'much']
>>> l = dict(zip(set(a), map(lambda y: [i for i,z in enumerate(a) if z == y], set(a))))
>>> l['foo']
[0, 5]
>>> l['much']
[6]
>>> l
{'baz': [2], 'foo': [0, 5], 'bar': [1, 3], 'any': [4], 'much': [6]}
>>> 

You could also use this as a one-liner to get all indices for a single entry. It is not efficient, though: it rescans the whole list once per distinct value. Using set(a) only ensures the lambda is called once per distinct value rather than once per element.


Answer 17

This solution is not as powerful as the others, but if you’re a beginner who only knows about for loops, you can still find the first index of an item while avoiding the ValueError:

def find_element(p,t):
    i = 0
    for e in p:
        if e == t:
            return i
        else:
            i +=1
    return -1
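Once you are comfortable with for loops, enumerate removes the manual counter. Here is a sketch with the same behavior (it returns -1 when t is absent):

```python
def find_element(p, t):
    # enumerate supplies the index alongside each element
    for i, e in enumerate(p):
        if e == t:
            return i
    return -1


print(find_element(["foo", "bar", "baz"], "bar"))  # 1
print(find_element(["foo", "bar", "baz"], "qux"))  # -1
```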

Answer 18

To find the index of item x in list L, or -1 if x is not in L:

idx = L.index(x) if (x in L) else -1
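When x is present, that expression walks the list twice (once for `in`, once for index()). A single-pass alternative passes a generator expression to next() with a default. Here is a sketch. It is not necessarily faster, since `in` and index() run in C. first_index is a hypothetical name:

```python
def first_index(L, x):
    """Hypothetical helper: first index of x in L, or -1."""
    # next() returns the first index produced, or -1 once the generator is exhausted
    return next((i for i, v in enumerate(L) if v == x), -1)


L = ["foo", "bar", "baz", "bar"]
print(first_index(L, "bar"))   # 1
print(first_index(L, "quux"))  # -1
```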

Answer 19

Since Python lists are zero-based, zipping range(len(haystack)) with the list pairs each item with its index:

>>> [i for i,j in zip(range(len(haystack)), haystack) if j == 'needle' ]

where “haystack” is the list in question and “needle” is the item to look for.

(Note: Here we are iterating using i to get the indexes, but if we need rather to focus on the items we can switch to j.)


Answer 20

name = "bar"
pairs = [["foo", 1], ["bar", 2], ["baz", 3]]
new_list = []
for item in pairs:
    new_list.append(item[0])
print(new_list)
try:
    location = new_list.index(name)
except ValueError:
    location = -1
print(location)

This also handles the case where the string is not in the list: if it isn’t, location is -1.


Answer 21

Python’s index() method raises a ValueError if the item is not found. To make it behave like JavaScript’s indexOf(), which returns -1 when the item is not found, catch the exception:

try:
    index = array.index('search_keyword')
except ValueError:
    index = -1
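For strings specifically, Python already has an indexOf-style method: str.find returns -1 instead of raising. Lists have no find method, so for lists you need a small wrapper. index_of below is a hypothetical helper:

```python
def index_of(seq, item):
    """Hypothetical helper: first index of item in seq, or -1 if it is absent."""
    try:
        return seq.index(item)
    except ValueError:
        return -1


print("hello".find("l"))               # 2
print("hello".find("z"))               # -1
print(index_of(["a", "b", "c"], "c"))  # 2
print(index_of(["a", "b", "c"], "z"))  # -1
```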

Answer 22

There is a more functional answer to this.

list(filter(lambda x: x[1]=="bar",enumerate(["foo", "bar", "baz", "bar", "baz", "bar", "a", "b", "c"])))
# [(1, 'bar'), (3, 'bar'), (5, 'bar')]

A more generic form that returns only the indices:

def get_index_of(lst, element):
    return list(map(lambda x: x[0],
                    filter(lambda x: x[1] == element, enumerate(lst))))
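The same filter/map pipeline can also be written with itertools.compress, which keeps the items of one iterable whose matching selector is truthy. This is a sketch of that alternative, not the original answer’s code:

```python
from itertools import compress


def get_index_of(lst, element):
    # Keep positions from range(len(lst)) whose item equals element
    return list(compress(range(len(lst)), (x == element for x in lst)))


data = ["foo", "bar", "baz", "bar", "baz", "bar", "a", "b", "c"]
print(get_index_of(data, "bar"))  # [1, 3, 5]
```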

Answer 23

Let’s call your list lst. You can convert lst to a NumPy array and then use numpy.where to get the index of the chosen item. Note that the trailing [0][0] raises an IndexError if the item is not in the list:

import numpy as np

lst = ["foo", "bar", "baz"]  # lst is a plain Python list
print(np.where(np.array(lst) == 'bar')[0][0])

# Output: 1

Answer 24

If, like me, you come from another language, a simple loop may be easier to understand and use:

mylist = ["foo", "bar", "baz", "bar"]
newlist = enumerate(mylist)
for index, item in newlist:
  if item == "bar":
    print(index, item)

I am thankful for the post “So what exactly does enumerate do?”, which helped me understand this.
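To see what enumerate produces, turn it into a list. Its optional start argument shifts the numbering, which helps when you want 1-based positions for display. Here is a short sketch:

```python
mylist = ["foo", "bar", "baz", "bar"]

pairs = list(enumerate(mylist))
print(pairs)  # [(0, 'foo'), (1, 'bar'), (2, 'baz'), (3, 'bar')]

# start=1 numbers the items from 1 instead of 0
positions = [pos for pos, item in enumerate(mylist, start=1) if item == "bar"]
print(positions)  # [2, 4]
```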


Answer 25

If you only need to find an index once, the index method is fine. If you are going to search your data more than once, I recommend the bisect module. Keep in mind that bisect requires sorted data, so sort the data once and then search it with bisect. The position it returns refers to the sorted data, not the original list. On my machine, bisect was about 20 times faster than index.

Here is an example that uses Python 3.8+ syntax (the := assignment expression and the = specifier in f-strings):

import bisect
from timeit import timeit

def bisect_search(container, value):
    return (
      index 
      if (index := bisect.bisect_left(container, value)) < len(container) 
      and container[index] == value else -1
    )

data = list(range(1000))
# value to search
value = 666

# times to test
ttt = 1000

t1 = timeit(lambda: data.index(value), number=ttt)
t2 = timeit(lambda: bisect_search(data, value), number=ttt)

print(f"{t1=:.4f}, {t2=:.4f}, diffs {t1/t2=:.2f}")

Output:

t1=0.0400, t2=0.0020, diffs t1/t2=19.60
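Because bisect works on sorted data, the position it finds belongs to the sorted copy. In the benchmark above the two coincide only because range(1000) is already sorted. One way to recover positions in the original list is to sort the indices by value once (similar in spirit to NumPy’s argsort) and bisect the sorted values. This sketch assumes one O(n log n) sort up front is affordable. The names order, sorted_vals and original_index are hypothetical:

```python
import bisect

data = [40, 10, 30, 20, 10]

# Sort the positions by their values (stable, so equal values keep original order)
order = sorted(range(len(data)), key=data.__getitem__)
sorted_vals = [data[i] for i in order]


def original_index(value):
    """Hypothetical helper: first original index of value in data, or -1."""
    pos = bisect.bisect_left(sorted_vals, value)
    if pos < len(sorted_vals) and sorted_vals[pos] == value:
        return order[pos]
    return -1


print(original_index(10))  # 1, the first 10 in data
print(original_index(30))  # 2
print(original_index(25))  # -1
```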

Answer 26

If performance is a concern:

As numerous answers mention, the built-in list.index(item) method is an O(n) algorithm. That is fine if you only need it once. But if you need to look up the indices of elements many times, it makes more sense to first build a dictionary of item-index pairs (O(n)) and then look up each index in O(1) on average.

If you are sure that the items in your list are never repeated, you can simply do:

myList = ["foo", "bar", "baz"]

# Create the dictionary
myDict = dict((e,i) for i,e in enumerate(myList))

# Lookup
myDict["bar"] # Returns 1
# myDict.get("blah") if you don't want an error to be raised if element not found.

If you may have duplicate elements, and need to return all of their indices:

from collections import defaultdict as dd
myList = ["foo", "bar", "bar", "baz", "foo"]

# Create the dictionary
myDict = dd(list)
for i,e in enumerate(myList):
    myDict[e].append(i)

# Lookup
myDict["foo"] # Returns [0, 4]
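One subtlety with the first dictionary: if the list does contain duplicates, later entries overwrite earlier ones, so each key maps to its last index. If you want the first index instead (matching list.index), dict.setdefault keeps the first value it sees. Here is a sketch:

```python
my_list = ["foo", "bar", "bar", "baz", "foo"]

# Later duplicates overwrite earlier ones, so each key ends up with its LAST index
last = {e: i for i, e in enumerate(my_list)}

# setdefault only stores a value the first time a key is seen, giving the FIRST index
first = {}
for i, e in enumerate(my_list):
    first.setdefault(e, i)

print(last)   # {'foo': 4, 'bar': 2, 'baz': 3}
print(first)  # {'foo': 0, 'bar': 1, 'baz': 3}
```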

Answer 27

As indicated by @TerryA, many answers discuss how to find one index.

more_itertools is a third-party library with tools to locate multiple indices within an iterable.

Given

import more_itertools as mit


iterable = ["foo", "bar", "baz", "ham", "foo", "bar", "baz"]

Code

Find indices of multiple occurrences:

list(mit.locate(iterable, lambda x: x == "bar"))
# [1, 5]

Test multiple items:

list(mit.locate(iterable, lambda x: x in {"bar", "ham"}))
# [1, 3, 5]

See the more_itertools.locate documentation for more options. Install via pip install more_itertools.
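If you would rather not add a dependency, a small generator over enumerate covers these basic cases. This is a stdlib-only stand-in with a hypothetical name, not more_itertools’ implementation:

```python
def locate_indices(iterable, pred):
    """Hypothetical stand-in: yield the index of every item for which pred(item) is truthy."""
    for i, item in enumerate(iterable):
        if pred(item):
            yield i


iterable = ["foo", "bar", "baz", "ham", "foo", "bar", "baz"]
print(list(locate_indices(iterable, lambda x: x == "bar")))           # [1, 5]
print(list(locate_indices(iterable, lambda x: x in {"bar", "ham"})))  # [1, 3, 5]
```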


Answer 28

Use a dictionary: process the list once, appending each word’s index to its entry, and then look up any word directly:

from collections import defaultdict

index_dict = defaultdict(list)    
word_list =  ['foo','bar','baz','bar','any', 'foo', 'much']

for word_index, word in enumerate(word_list):
    index_dict[word].append(word_index)

word_to_find = 'foo'
print(index_dict[word_to_find])

# output :  [0, 5]

Answer 29

In my opinion, ["foo", "bar", "baz"].index("bar") is good, but it isn’t enough: if “bar” isn’t in the list, a ValueError is raised. So you can use this function:

def find_index(arr, name):
    try:
        return arr.index(name)
    except ValueError:
        return -1

if __name__ == '__main__':
    print(find_index(["foo", "bar", "baz"], "bar"))

The result is:

1

If name isn’t in arr, the function returns -1. For example:

print(find_index(["foo", "bar", "baz"], "fooo"))

-1