Tag Archives: indexing

pandas loc vs. iloc vs. ix vs. at vs. iat?

Question: pandas loc vs. iloc vs. ix vs. at vs. iat?


Recently began branching out from my safe place (R) into Python and am a bit confused by the cell localization/selection in Pandas. I’ve read the documentation but I’m struggling to understand the practical implications of the various localization/selection options.

  • Is there a reason why I should ever use .loc or .iloc over the most general option .ix?
  • I understand that .loc, iloc, at, and iat may provide some guaranteed correctness that .ix can’t offer, but I’ve also read where .ix tends to be the fastest solution across the board.
  • Please explain the real-world, best-practices reasoning behind utilizing anything other than .ix?

Answer 0


loc: only works on the index
iloc: works on integer positions
ix: you can get data from the dataframe without it being in the index
at: get scalar values; it’s a very fast loc
iat: get scalar values; it’s a very fast iloc

http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html

Note: As of pandas 0.20.0, the .ix indexer is deprecated in favour of the more strict .iloc and .loc indexers.
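
As a rough sketch of the migration (df and 'colname' here are just illustrative, not from the question), a mixed .ix lookup is usually rewritten in one of two ways:

# old, deprecated mixed indexer:
#   df.ix[0, 'colname']

# label-based replacement
df.loc[df.index[0], 'colname']

# position-based replacement
df.iloc[0, df.columns.get_loc('colname')]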


Answer 1


Updated for pandas 0.20 given that ix is deprecated. This demonstrates not only how to use loc, iloc, at, iat and set_value, but also how to accomplish mixed positional/label-based indexing.


loc: label based
Allows you to pass 1-D arrays as indexers. Arrays can be either slices (subsets) of the index or column, or they can be boolean arrays which are equal in length to the index or columns.

Special Note: when a scalar indexer is passed, loc can assign a new index or column value that didn’t exist before.

# label based, but we can use position values
# to get the labels from the index object
df.loc[df.index[2], 'ColName'] = 3

df.loc[df.index[1:3], 'ColName'] = 3

iloc: position based
Similar to loc except with positions rather than index values. However, you cannot assign new columns or indices.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iloc[2, df.columns.get_loc('ColName')] = 3

df.iloc[2, 4] = 3

df.iloc[:3, 2:4] = 3

at: label based
Works very similar to loc for scalar indexers. Cannot operate on array indexers. Can! assign new indices and columns.

Advantage over loc is that this is faster.
Disadvantage is that you can’t use arrays for indexers.

# label based, but we can use position values
# to get the labels from the index object
df.at[df.index[2], 'ColName'] = 3

df.at['C', 'ColName'] = 3

iat: position based
Works similarly to iloc. Cannot work in array indexers. Cannot! assign new indices and columns.

Advantage over iloc is that this is faster.
Disadvantage is that you can’t use arrays for indexers.

# position based, but we can get the position
# from the columns object via the `get_loc` method
IBM.iat[2, IBM.columns.get_loc('PNL')] = 3

set_value: label based
Works very similar to loc for scalar indexers. Cannot operate on array indexers. Can! assign new indices and columns

Advantage Super fast, because there is very little overhead!
Disadvantage There is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

# label based, but we can use position values
# to get the labels from the index object
df.set_value(df.index[2], 'ColName', 3)

set_value with takable=True: position based
Works similarly to iloc. Cannot work in array indexers. Cannot! assign new indices and columns.

Advantage Super fast, because there is very little overhead!
Disadvantage There is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.set_value(2, df.columns.get_loc('ColName'), 3, takable=True)
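
A side note on newer versions (not part of the original answer): set_value was deprecated in pandas 0.21 and removed in 1.0, so on current pandas the fast scalar setters are .at and .iat, for example:

df.at[df.index[2], 'ColName'] = 3               # label based
df.iat[2, df.columns.get_loc('ColName')] = 3    # position based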

Answer 2


There are two primary ways that pandas makes selections from a DataFrame.

  • By Label
  • By Integer Location

The documentation uses the term position for referring to integer location. I do not like this terminology as I feel it is confusing. Integer location is more descriptive and is exactly what .iloc stands for. The key word here is INTEGER – you must use integers when selecting by integer location.

Before showing the summary let’s all make sure that …

.ix is deprecated and ambiguous and should never be used

There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let’s summarize them:

  • [] – Primarily selects subsets of columns, but can select rows as well. Cannot simultaneously select rows and columns.
  • .loc – selects subsets of rows and columns by label only
  • .iloc – selects subsets of rows and columns by integer location only

I almost never use .at or .iat as they add no additional functionality, only a small performance increase. I would discourage their use unless you have a very time-sensitive application. Regardless, we have their summary:

  • .at selects a single scalar value in the DataFrame by label only
  • .iat selects a single scalar value in the DataFrame by integer location only

In addition to selection by label and integer location, boolean selection also known as boolean indexing exists.


Examples explaining .loc, .iloc, boolean selection and .at and .iat are shown below

We will first focus on the differences between .loc and .iloc. Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let’s take a look at a sample DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

All the words in bold are the labels. The labels, age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina, Cornelia are used as labels for the rows. Collectively, these row labels are known as the index.
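
If you want to see the two sets of labels directly, they live on the columns and index attributes; for the sample frame above this should look roughly like:

df.columns
# Index(['age', 'color', 'food', 'height', 'score', 'state'], dtype='object')

df.index
# Index(['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'], dtype='object')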


The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns but it is easier to just focus on rows for now. Also, each of the indexers use a set of brackets that immediately follow their name to make their selections.

.loc selects data only by labels

We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length (number of rows) of the DataFrame.

There are many different inputs you can use for .loc; three of them are:

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list:

Selecting multiple rows with .loc with slice notation

Slice notation is defined by a start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaulted to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
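
For example, a label slice can also carry an explicit step (reusing the sample df above):

df.loc['Jane':'Cornelia':2]   # every second row from Jane through Cornelia: Jane, Aaron, Dean, Cornelia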

.iloc selects data only by integer location

Let’s now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are many different inputs you can use for .iloc; three of them are:

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second to last rows:

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]


Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane, and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location which was useful but confusing and ambiguous at times and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to make both your selections labels or integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and return just the food and score columns we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc but you cannot pass it a boolean series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]
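
For comparison, the positional equivalent of the .loc call above with .iloc (in the sample df, color and height sit at integer positions 1 and 3) should be:

df.iloc[:, 1:5:2]   # columns 'color' and 'height'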


The indexing operator, [], can slice to select rows and columns too, but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'

Selection by .at and .iat

Selection with .at is nearly identical to .loc but it only selects a single ‘cell’ in your DataFrame. We usually refer to this cell as a scalar value. To use .at, pass it both a row and column label separated by a comma.

df.at['Christina', 'color']
'black'

Selection with .iat is nearly identical to .iloc but it only selects a single scalar value. You must pass it an integer for both the row and column locations

df.iat[2, 5]
'FL'

Answer 3

df = pd.DataFrame({'A':['a', 'b', 'c'], 'B':[54, 67, 89]}, index=[100, 200, 300])

df

                        A   B
                100     a   54
                200     b   67
                300     c   89
In [19]:    
df.loc[100]

Out[19]:
A     a
B    54
Name: 100, dtype: object

In [20]:    
df.iloc[0]

Out[20]:
A     a
B    54
Name: 100, dtype: object

In [24]:    
df2 = df.set_index([df.index,'A'])
df2

Out[24]:
        B
    A   
100 a   54
200 b   67
300 c   89

In [25]:    
df2.ix[100, 'a']

Out[25]:    
B    54
Name: (100, a), dtype: int64
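
Since .ix has been removed from current pandas, the last lookup on the MultiIndex would now be written with .loc and a tuple (a hedged equivalent, not part of the original answer):

df2.loc[(100, 'a')]

# B    54
# Name: (100, a), dtype: int64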

Answer 4


Let’s start with this small df:

import pandas as pd
import time as tm
import numpy as np
n=10
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))

We’ll then have

df
Out[25]: 
        0   1   2   3   4   5   6   7   8   9
    0   0   1   2   3   4   5   6   7   8   9
    1  10  11  12  13  14  15  16  17  18  19
    2  20  21  22  23  24  25  26  27  28  29
    3  30  31  32  33  34  35  36  37  38  39
    4  40  41  42  43  44  45  46  47  48  49
    5  50  51  52  53  54  55  56  57  58  59
    6  60  61  62  63  64  65  66  67  68  69
    7  70  71  72  73  74  75  76  77  78  79
    8  80  81  82  83  84  85  86  87  88  89
    9  90  91  92  93  94  95  96  97  98  99

With this we have:

df.iloc[3,3]
Out[33]: 33

df.iat[3,3]
Out[34]: 33

df.iloc[:3,:3]
Out[35]: 
    0   1   2   3
0   0   1   2   3
1  10  11  12  13
2  20  21  22  23
3  30  31  32  33



df.iat[:3,:3]
Traceback (most recent call last):
   ... omissis ...
ValueError: At based indexing on an integer index can only have integer indexers

Thus we cannot use .iat for subsets; there we must use .iloc only.

But let’s try both to select from a larger df and let’s check the speed …

# -*- coding: utf-8 -*-
"""
Created on Wed Feb  7 09:58:39 2018

@author: Fabio Pomi
"""

import pandas as pd
import time as tm
import numpy as np
n=1000
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))
t1=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iloc[j,i]
t2=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iat[j,i]
t3=tm.time()
loc=t2-t1
at=t3-t2
prc = loc/at *100
print('\nloc:%f at:%f prc:%f' %(loc,at,prc))

loc:10.485600 at:7.395423 prc:141.784987

So with .iloc (and .loc) we can manage subsets, while .iat (and .at) only take a single scalar, but .iat is faster than .iloc (note that the loop above actually times .iloc against .iat, despite the loc/at variable names).

:-)
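
A rough way to see the same effect for single scalar lookups is timeit (this sketch is an editorial addition; numbers will vary by machine):

import timeit
import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame(np.arange(n * n).reshape(n, n))

# .iat skips most of the indexing machinery that .iloc goes through
print(timeit.timeit(lambda: df.iloc[500, 500], number=100_000))
print(timeit.timeit(lambda: df.iat[500, 500], number=100_000))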


In Python, how do I index a list with another list?

Question: In Python, how do I index a list with another list?


I would like to index a list with another list like this

L = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Idx = [0, 3, 7]
T = L[ Idx ]

and T should end up being a list containing [‘a’, ‘d’, ‘h’].

Is there a better way than

T = []
for i in Idx:
    T.append(L[i])

print T
# Gives result ['a', 'd', 'h']

Answer 0

T = [L[i] for i in Idx]

Answer 1


If you are using numpy, you can perform extended slicing like that:

>>> import numpy
>>> a=numpy.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
>>> Idx = [0, 3, 7]
>>> a[Idx]
array(['a', 'd', 'h'], 
      dtype='|S1')

…and is probably much faster (if performance is enough of a concern to bother with the numpy import)


Answer 2


A functional approach:

a = [1,"A", 34, -123, "Hello", 12]
b = [0, 2, 5]

from operator import itemgetter

print(list(itemgetter(*b)(a)))
[1, 34, 12]
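
One caveat worth adding (not part of the original answer): itemgetter with a single index returns the bare element rather than a 1-tuple, so the list(...) wrapper only behaves uniformly for two or more indices:

from operator import itemgetter

itemgetter(0, 2)([10, 20, 30])   # (10, 30) -- a tuple
itemgetter(0)([10, 20, 30])      # 10       -- a bare element, not (10,)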

Answer 3

T = map(lambda i: L[i], Idx)
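
Note that in Python 3, map returns a lazy iterator, so you would wrap it to get a list back:

T = list(map(lambda i: L[i], Idx))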

Answer 4


I wasn’t happy with any of these approaches, so I came up with a Flexlist class that allows for flexible indexing, either by integer, slice or index-list:

class Flexlist(list):
    def __getitem__(self, keys):
        if isinstance(keys, (int, slice)): return list.__getitem__(self, keys)
        return [self[k] for k in keys]

Which, for your example, you would use as:

L = Flexlist(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
Idx = [0, 3, 7]
T = L[ Idx ]

print(T)  # ['a', 'd', 'h']

Answer 5


L= {'a':'a','d':'d', 'h':'h'}
index= ['a','d','h'] 
for keys in index:
    print(L[keys])

I would use a dict, adding the desired keys to index by.


Answer 6


You could also use the __getitem__ method combined with map like the following:

L = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Idx = [0, 3, 7]
res = list(map(L.__getitem__, Idx))
print(res)
# ['a', 'd', 'h']

Explicitly select items from a list or tuple

Question: Explicitly select items from a list or tuple


I have the following Python list (can also be a tuple):

myList = ['foo', 'bar', 'baz', 'quux']

I can say

>>> myList[0:3]
['foo', 'bar', 'baz']
>>> myList[::2]
['foo', 'baz']
>>> myList[1::2]
['bar', 'quux']

How do I explicitly pick out items whose indices have no specific patterns? For example, I want to select [0,2,3]. Or from a very big list of 1000 items, I want to select [87, 342, 217, 998, 500]. Is there some Python syntax that does that? Something that looks like:

>>> myBigList[87, 342, 217, 998, 500]

Answer 0


list( myBigList[i] for i in [87, 342, 217, 998, 500] )

I compared the answers with python 2.5.2:

  • 19.7 usec: [ myBigList[i] for i in [87, 342, 217, 998, 500] ]

  • 20.6 usec: map(myBigList.__getitem__, (87, 342, 217, 998, 500))

  • 22.7 usec: itemgetter(87, 342, 217, 998, 500)(myBigList)

  • 24.6 usec: list( myBigList[i] for i in [87, 342, 217, 998, 500] )

Note that in Python 3, the 1st was changed to be the same as the 4th.


Another option would be to start out with a numpy.array which allows indexing via a list or a numpy.array:

>>> import numpy
>>> myBigList = numpy.array(range(1000))
>>> myBigList[(87, 342, 217, 998, 500)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index
>>> myBigList[[87, 342, 217, 998, 500]]
array([ 87, 342, 217, 998, 500])
>>> myBigList[numpy.array([87, 342, 217, 998, 500])]
array([ 87, 342, 217, 998, 500])

A tuple doesn’t work the same way, since numpy interprets a tuple as a multi-dimensional index rather than a sequence of indices.


Answer 1


What about this:

from operator import itemgetter
itemgetter(0,2,3)(myList)
('foo', 'baz', 'quux')

Answer 2


It isn’t built-in, but you can make a subclass of list that takes tuples as “indexes” if you’d like:

class MyList(list):

    def __getitem__(self, index):
        if isinstance(index, tuple):
            return [self[i] for i in index]
        return super(MyList, self).__getitem__(index)


seq = MyList("foo bar baaz quux mumble".split())
print seq[0]
print seq[2,4]
print seq[1::2]

printing

foo
['baaz', 'mumble']
['bar', 'quux']

Answer 3


Maybe a list comprehension is in order:

L = ['a', 'b', 'c', 'd', 'e', 'f']
print [ L[index] for index in [1,3,5] ]

Produces:

['b', 'd', 'f']

Is that what you are looking for?


Answer 4


>>> map(myList.__getitem__, (2,2,1,3))
('baz', 'baz', 'bar', 'quux')

You can also create your own List class which supports tuples as arguments to __getitem__ if you want to be able to do myList[(2,2,1,3)].


Answer 5


I just want to point out that even though the syntax of itemgetter looks really neat, it’s kind of slow when performed on a large list.

import timeit
from operator import itemgetter
start=timeit.default_timer()
for i in range(1000000):
    itemgetter(0,2,3)(myList)
print ("Itemgetter took ", (timeit.default_timer()-start))

Itemgetter took 1.065209062149279

start=timeit.default_timer()
for i in range(1000000):
    myList[0],myList[2],myList[3]
print ("Multiple slice took ", (timeit.default_timer()-start))

Multiple slice took 0.6225321444745759


Answer 6


Another possible solution:

sek=[]
L=[1,2,3,4,5,6,7,8,9,0]
for i in [2, 4, 7, 0, 3]:
   a=[L[i]]
   sek=sek+a
print (sek)

Answer 7


As is often the case, when you have a boolean numpy array as a mask:

[mylist[i] for i in np.arange(len(mask), dtype=int)[mask]]

A lambda that works for any sequence or np.array:

subseq = lambda myseq, mask : [myseq[i] for i in np.arange(len(mask), dtype=int)[mask]]

newseq = subseq(myseq, mask)


What rules does Pandas use to generate a view versus a copy?

Question: What rules does Pandas use to generate a view versus a copy?


I’m confused about the rules Pandas uses when deciding that a selection from a dataframe is a copy of the original dataframe, or a view on the original.

If I have, for example,

df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=range(1,9))

I understand that a query returns a copy so that something like

foo = df.query('2 < index <= 5')
foo.loc[:,'E'] = 40

will have no effect on the original dataframe, df. I also understand that scalar or named slices return a view, so that assignments to these, such as

df.iloc[3] = 70

or

df.ix[1,'B':'E'] = 222

will change df. But I’m lost when it comes to more complicated cases. For example,

df[df.C <= df.B] = 7654321

changes df, but

df[df.C <= df.B].ix[:,'B':'E']

does not.

Is there a simple rule that Pandas is using that I’m just missing? What’s going on in these specific cases; and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I’m attempting to do in the last example above)?


Note: This is not the same as this question; and I have read the documentation, but am not enlightened by it. I’ve also read through the “Related” questions on this topic, but I’m still missing the simple rule Pandas is using, and how I’d apply it to — for example — modify the values (or a subset of values) in a dataframe that satisfy a particular query.


Answer 0


Here are the rules; subsequent ones override prior ones:

  • All operations generate a copy

  • If inplace=True is provided, it will modify in-place; only some operations support this

  • An indexer that sets, e.g. .loc/.iloc/.iat/.at will set inplace.

An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy as it’s evaluated by numexpr.)

  • An indexer that gets on a multiple-dtyped object is always a copy.

Your example of chained indexing

df[df.C <= df.B].loc[:,'B':'E']

is not guaranteed to work (and thus you should never do this).

Instead do:

df.loc[df.C <= df.B, 'B':'E']

as this is faster and will always work

The chained indexing is 2 separate python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
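
To make the last point concrete, a minimal sketch of the difference (using the question’s column names):

# chained indexing: two separate operations; the assignment may hit a temporary copy
df[df.C <= df.B]['E'] = 40        # typically raises SettingWithCopyWarning; df may be left unchanged

# a single .loc call: pandas sees one set operation and writes into df
df.loc[df.C <= df.B, 'E'] = 40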


Best way to handle list.index (might-not-exist) in python?

Question: Best way to handle list.index (might-not-exist) in python?


I have code which looks something like this:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

ok so that’s simplified but you get the idea. Now thing might not actually be in the list, in which case I want to pass -1 as thing_index. In other languages this is what you’d expect index() to return if it couldn’t find the element. In fact it throws a ValueError.

I could do this:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

But this feels dirty, plus I don’t know if ValueError could be raised for some other reason. I came up with the following solution based on generator functions, but it seems a little complex:

thing_index = ( [(i for i in xrange(len(thing_list)) if thing_list[i]==thing)] or [-1] )[0]

Is there a cleaner way to achieve the same thing? Let’s assume the list isn’t sorted.


Answer 0


There is nothing “dirty” about using try-except clause. This is the pythonic way. ValueError will be raised by the .index method only, because it’s the only code you have there!

To answer the comment:
In Python, the “easier to ask forgiveness than to get permission” philosophy is well established, and no, index will not raise this type of error for any other issue. Not that I can think of any.


Answer 1


thing_index = thing_list.index(elem) if elem in thing_list else -1

One line. Simple. No exceptions.


Answer 2

The dict type has a get function, where if the key doesn’t exist in the dictionary, the 2nd argument to get is the value that it should return. Similarly there is setdefault, which returns the value in the dict if the key exists, otherwise it sets the value according to your default parameter and then returns your default parameter.

You could extend the list type to have a getindexdefault method.

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

Which could then be used like:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )

Answer 3


There is nothing wrong with your code that uses ValueError. Here’s yet another one-liner if you’d like to avoid exceptions:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)
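
For example, a quick illustration of the one-liner above:

thing_list = ['a', 'b', 'c']
next((i for i, x in enumerate(thing_list) if x == 'b'), -1)   # 1
next((i for i, x in enumerate(thing_list) if x == 'z'), -1)   # -1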

Answer 4


This issue is one of language philosophy. In Java for example there has always been a tradition that exceptions should really only be used in “exceptional circumstances” that is when errors have happened, rather than for flow control. In the beginning this was for performance reasons as Java exceptions were slow but now this has become the accepted style.

In contrast, Python has always used exceptions to indicate normal program flow, like raising a ValueError as we are discussing here. There is nothing “dirty” about this in Python style and there are many more where that came from. An even more common example is the StopIteration exception, which is raised by an iterator's next() method to signal that there are no further values.


Answer 5


If you are doing this often then it is better to stow it away in a helper function:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 

Answer 6


What about this 😃 :

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4:3 , 5: 4}
li.get(20) # None 
li.get(1)  # 0 
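
If you want -1 rather than None for a missing key, note that dict.get also takes a default as its second argument:

li.get(20, -1)   # -1
li.get(1, -1)    # 0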

Answer 7


What about this:

otherfunction(thing_collection, thing)

Rather than expose something so implementation-dependent like a list index in a function interface, pass the collection and the thing and let otherfunction deal with the “test for membership” issues. If otherfunction is written to be collection-type-agnostic, then it would probably start with:

if thing in thing_collection:
    ... proceed with operation on thing

which will work if thing_collection is a list, tuple, set, or dict.

This is possibly clearer than:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

which is the code you already have in otherfunction.


Answer 8


What about like this:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1

Answer 9


I have the same issue with the “.index()” method on lists. I have no issue with the fact that it throws an exception but I strongly disagree with the fact that it’s a non-descriptive ValueError. I could understand if it would’ve been an IndexError, though.

I can see why returning “-1” would be an issue too because it’s a valid index in Python. But realistically, I never expect a “.index()” method to return a negative number.

Here goes a one-liner (ok, it’s a rather long line…); it goes through the list exactly once and returns None if the item isn’t found. It would be trivial to rewrite it to return -1, should you so desire.

from functools import reduce

# first index of `thing` in `lst`, or None if it is not present
indexOf = lambda lst, thing: reduce(
    lambda acc, pair: pair[0] if acc is None and pair[1] == thing else acc,
    enumerate(lst),
    None)

How to use:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>

Answer 10


I don’t know why you should think it is dirty… because of the exception? If you want a one-liner, here it is:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

but I would advise against using it; I think Ross Rogers’ solution is the best, use an object to encapsulate your desired behaviour, don’t try pushing the language to its limits at the cost of readability.


Answer 11


I’d suggest:

if thing in thing_list:
  list_index = thing_list.index(thing)
else:
  list_index = -1

Index all *except* one item in python

Question: Index all *except* one item in python


Is there a simple way to index all elements of a list (or array, or whatever) except for a particular index? E.g.,

  • mylist[3] will return the item in position 3

  • milist[~3] will return the whole list except for 3


Answer 0


For a list, you could use a list comp. For example, to make b a copy of a without the 3rd element:

a = range(10)[::-1]                       # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
b = [x for i,x in enumerate(a) if i!=3]   # [9, 8, 7, 5, 4, 3, 2, 1, 0]

This is very general, and can be used with all iterables, including numpy arrays. If you replace [] with (), b will be an iterator instead of a list.

Or you could do this in-place with pop:

a = range(10)[::-1]     # a = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
a.pop(3)                # a = [9, 8, 7, 5, 4, 3, 2, 1, 0]

In numpy you could do this with a boolean indexing:

a = np.arange(9, -1, -1)     # a = array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
b = a[np.arange(len(a))!=3]  # b = array([9, 8, 7, 5, 4, 3, 2, 1, 0])

which will, in general, be much faster than the list comprehension listed above.


Answer 1


>>> l = range(1,10)
>>> l
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l[:2] 
[1, 2]
>>> l[3:]
[4, 5, 6, 7, 8, 9]
>>> l[:2] + l[3:]
[1, 2, 4, 5, 6, 7, 8, 9]
>>> 

See also

Explain Python’s slice notation


Answer 2


The simplest way I found was:

mylist[:x] + mylist[x+1:]

that will produce your mylist without the element at index x.

Example

mylist = [0, 1, 2, 3, 4, 5]
x = 3
mylist[:x] + mylist[x+1:]

Result produced

mylist = [0, 1, 2, 4, 5]
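
Wrapped in a small helper this also works for tuples, since slicing and + preserve the input type (a sketch, not part of the original answer):

def drop_index(seq, x):
    """Return seq without the element at index x (works for lists and tuples)."""
    return seq[:x] + seq[x + 1:]

drop_index([0, 1, 2, 3, 4, 5], 3)     # [0, 1, 2, 4, 5]
drop_index(('a', 'b', 'c', 'd'), 1)   # ('a', 'c', 'd')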

Answer 3


If you are using numpy, the closest, I can think of is using a mask

>>> import numpy as np
>>> arr = np.arange(1,10)
>>> mask = np.ones(arr.shape,dtype=bool)
>>> mask[5]=0
>>> arr[mask]
array([1, 2, 3, 4, 5, 7, 8, 9])

Something similar can be achieved using itertools without numpy

>>> from itertools import compress
>>> arr = range(1,10)
>>> mask = [1]*len(arr)
>>> mask[5]=0
>>> list(compress(arr,mask))
[1, 2, 3, 4, 5, 7, 8, 9]

Answer 4


Use np.delete! It does not actually delete anything in place.

Example:

import numpy as np
a = np.array([[1,4],[5,7],[3,1]])                                       

# a: array([[1, 4],
#           [5, 7],
#           [3, 1]])

ind = np.array([0,1])                                                   

# ind: array([0, 1])

# a[ind]: array([[1, 4],
#                [5, 7]])

all_except_index = np.delete(a, ind, axis=0)                                              
# all_except_index: array([[3, 1]])

# a: (still the same): array([[1, 4],
#                             [5, 7],
#                             [3, 1]])
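
For the flat case from the question, np.delete works the same way and also accepts a single index (a small addition, not part of the original answer):

import numpy as np

mylist = np.arange(6)          # array([0, 1, 2, 3, 4, 5])
np.delete(mylist, 3)           # array([0, 1, 2, 4, 5])
np.delete(mylist, [0, 2, 3])   # array([1, 4, 5])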

Answer 5


I’m going to provide a functional (immutable) way of doing it.

  1. The standard and easy way of doing it is to use slicing:

    index_to_remove = 3
    data = [*range(5)]
    new_data = data[:index_to_remove] + data[index_to_remove + 1:]
    
    print(f"data: {data}, new_data: {new_data}")
    

    Output:

    data: [0, 1, 2, 3, 4], new_data: [0, 1, 2, 4]
    
  2. Use list comprehension:

    data = [*range(5)]
    new_data = [v for i, v in enumerate(data) if i != index_to_remove]
    
    print(f"data: {data}, new_data: {new_data}") 
    

    Output:

    data: [0, 1, 2, 3, 4], new_data: [0, 1, 2, 4]
    
  3. Use filter function:

    index_to_remove = 3
    data = [*range(5)]
    new_data = [*filter(lambda i: i != index_to_remove, data)]
    

    Output:

    data: [0, 1, 2, 3, 4], new_data: [0, 1, 2, 4]
    
  4. Using masking. Masking is provided by itertools.compress function in the standard library:

    from itertools import compress
    
    index_to_remove = 3
    data = [*range(5)]
    mask = [1] * len(data)
    mask[index_to_remove] = 0
    new_data = [*compress(data, mask)]
    
    print(f"data: {data}, mask: {mask}, new_data: {new_data}")
    

    Output:

    data: [0, 1, 2, 3, 4], mask: [1, 1, 1, 0, 1], new_data: [0, 1, 2, 4]
    
  5. Use itertools.filterfalse function from Python standard library

    from itertools import filterfalse
    
    index_to_remove = 3
    data = [*range(5)]
    new_data = [*filterfalse(lambda i: i == index_to_remove, data)]
    
    print(f"data: {data}, new_data: {new_data}")
    

    Output:

    data: [0, 1, 2, 3, 4], new_data: [0, 1, 2, 4]
    

Answer 6


If you don’t know the index beforehand, here is a function that will work:

def reverse_index(l, index):
    try:
        l.pop(index)
        return l
    except IndexError:
        return False

Answer 7


Note that if variable is list of lists, some approaches would fail. For example:

v1 = [[range(3)] for x in range(4)]
v2 = v1[:3]+v1[4:] # this fails
v2

For the general case, use

removed_index = 1
v1 = [[range(3)] for x in range(4)]
v2 = [x for i,x in enumerate(v1) if i!=removed_index]
v2

Answer 8


If you want to cut out the last or the first do this:

list = ["This", "is", "a", "list"]
listnolast = list[:-1]
listnofirst = list[1:]

If you change 1 to 2, the first 2 elements will be removed, not the second. Hope this still helps!


Update row values where certain condition is met in pandas

Question: Update row values where certain condition is met in pandas


Say I have the following dataframe:
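
(The frame itself was shown as an image that has not survived; judging from the answers below, it looks roughly like this reconstruction with placeholder values:)

import pandas as pd

df1 = pd.DataFrame({'stream': [1, 2, 2, 3],
                    'feat': ['some_value'] * 4,
                    'another_feat': ['some_value'] * 4},
                   index=list('abcd'))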

What is the most efficient way to update the values of the columns feat and another_feat where the stream is number 2?

Is this it?

for index, row in df.iterrows():
    if df1.loc[index,'stream'] == 2:
       # do something

UPDATE: What to do if I have more than a 100 columns? I don’t want to explicitly name the columns that I want to update. I want to divide the value of each column by 2 (except for the stream column).

So to be clear what my goal is:

Dividing all values by 2 of all rows that have stream 2, but not changing the stream column


Answer 0


I think you can use loc if you need to update two columns to the same value:

df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print df1
   stream        feat another_feat
a       1  some_value   some_value
b       2        aaaa         aaaa
c       2        aaaa         aaaa
d       3  some_value   some_value

If you need to update them separately, one option is to use:

df1.loc[df1['stream'] == 2, 'feat'] = 10
print df1
   stream        feat another_feat
a       1  some_value   some_value
b       2          10   some_value
c       2          10   some_value
d       3  some_value   some_value

Another common option is to use numpy.where:

df1['feat'] = np.where(df1['stream'] == 2, 10,20)
print df1
   stream  feat another_feat
a       1    20   some_value
b       2    10   some_value
c       2    10   some_value
d       3    20   some_value

EDIT: If you need to divide all columns except stream, on the rows where the condition is True, use:

print df1
   stream  feat  another_feat
a       1     4             5
b       2     4             5
c       2     2             9
d       3     1             7

#filter columns all without stream
cols = [col for col in df1.columns if col != 'stream']
print cols
['feat', 'another_feat']

df1.loc[df1['stream'] == 2, cols ] = df1 / 2
print df1
   stream  feat  another_feat
a       1   4.0           5.0
b       2   2.0           2.5
c       2   1.0           4.5
d       3   1.0           7.0
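
A variant of the same idea that avoids typing the column names at all: Index.difference builds "every column except stream", and the assignment stays restricted to the masked rows. A sketch, assuming the numeric df1 from the EDIT above:

mask = df1['stream'] == 2
cols = df1.columns.difference(['stream'])      # all columns except 'stream'
df1.loc[mask, cols] = df1.loc[mask, cols] / 2  # halve only the masked rows, in place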

回答 1

您也可以用 .ix 进行相同的操作,如下所示:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.ix[df.a>0, ['b','c']] = 0

In [4]: df
Out[4]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

编辑

在获得额外信息之后,以下代码会返回满足条件的行中所有其余列的值减半后的结果:

>> condition = df.a > 0
>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)

我希望这有帮助!

You can do the same with .ix, like this:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.ix[df.a>0, ['b','c']] = 0

In [4]: df
Out[4]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

EDIT

After the extra information, the following will return all columns – where some condition is met – with halved values:

>> condition = df.a > 0
>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)

I hope this helps!
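
Since .ix has been removed from current pandas, a rough equivalent of the same update with .loc, which also modifies the frame in place instead of returning a copy (a sketch, using the df and condition column 'a' from the example above):

mask = df['a'] > 0
other_cols = df.columns.drop('a')                        # every column except 'a'
df.loc[mask, other_cols] = df.loc[mask, other_cols] / 2  # halved values written back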


如何索引字典?

问题:如何索引字典?

我在下面有一个字典:

colors = {
    "blue" : "5",
    "red" : "6",
    "yellow" : "8",
}

如何索引字典中的第一个条目?

colors[0] 由于显而易见的原因会引发 KeyError。

I have a Dictionary below:

colors = {
    "blue" : "5",
    "red" : "6",
    "yellow" : "8",
}

How do I index the first entry in the dictionary?

colors[0] will return a KeyError for obvious reasons.


回答 0

在 Python 3.6 及之前的版本中,字典是无序的。如果您不关心条目的顺序,但仍想通过索引访问键或值,可以使用 d.keys()[i]、d.values()[i] 或 d.items()[i]。(请注意,在 Python 2.x 中这些方法会创建包含所有键、值或项的列表。因此,如果需要多次使用,请把列表存入变量以提高性能。)

如果您确实关心条目的顺序,那么从 Python 2.7 开始可以使用 collections.OrderedDict,或者使用由键值对组成的列表

l = [("blue", "5"), ("red", "6"), ("yellow", "8")]

前提是您不需要按键访问。(顺便问一下,为什么您的数字是字符串?)

在 Python 3.7 中,普通字典就是有序的,因此不再需要使用 OrderedDict(但仍然可以使用——它基本上是同一种类型)。Python 3.6 的 CPython 实现已经包含了这一改动,但由于它不是语言规范的一部分,所以在 Python 3.6 中不能依赖这一点。

Dictionaries are unordered in Python versions up to and including Python 3.6. If you do not care about the order of the entries and want to access the keys or values by index anyway, you can use d.keys()[i] and d.values()[i] or d.items()[i]. (Note that these methods create a list of all keys, values or items in Python 2.x. So if you need them more than once, store the list in a variable to improve performance.)

If you do care about the order of the entries, starting with Python 2.7 you can use collections.OrderedDict. Or use a list of pairs

l = [("blue", "5"), ("red", "6"), ("yellow", "8")]

if you don’t need access by key. (Why are your numbers strings by the way?)

In Python 3.7, normal dictionaries are ordered, so you don’t need to use OrderedDict anymore (but you still can – it’s basically the same type). The CPython implementation of Python 3.6 already included that change, but since it’s not part of the language specification, you can’t rely on it in Python 3.6.
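
In Python 3.7+, where insertion order is guaranteed, the first entry can also be read without materializing a list; a small sketch with the colors dict from the question:

colors = {"blue": "5", "red": "6", "yellow": "8"}
first_key = next(iter(colors))             # 'blue'
first_value = next(iter(colors.values()))  # '5'
first_item = next(iter(colors.items()))    # ('blue', '5')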


回答 1

如果仍然有人在看这个问题,那么当前接受的答案现在已经过时了:

从 Python 3.7* 开始,字典会保留插入顺序,也就是说它们现在的行为与 collections.OrderedDict 过去的行为完全相同。不幸的是,仍然没有专门的方法来按索引访问字典的 keys()/values(),因此可以用以下方法获取字典中的第一个键/值:

first_key = list(colors)[0]
first_val = list(colors.values())[0]

或者(避免将键视图实例化为列表):

def get_first_key(dictionary):
    for key in dictionary:
        return key
    raise IndexError

first_key = get_first_key(colors)
first_val = colors[first_key]

如果您需要第 n 个键,可以类似地写:

def get_nth_key(dictionary, n=0):
    if n < 0:
        n += len(dictionary)
    for i, key in enumerate(dictionary.keys()):
        if i == n:
            return key
    raise IndexError("dictionary index out of range") 

(* CPython 3.6已经包含有序字典,但这只是实现细节。语言规范包括3.7以后的有序字典。)

If anybody still looking at this question, the currently accepted answer is now outdated:

Since Python 3.7*, dictionaries are order-preserving, that is, they now behave exactly as collections.OrderedDicts used to. Unfortunately, there is still no dedicated method to index into keys() / values() of the dictionary, so getting the first key / value in the dictionary can be done as

first_key = list(colors)[0]
first_val = list(colors.values())[0]

or alternatively (this avoids instantiating the keys view into a list):

def get_first_key(dictionary):
    for key in dictionary:
        return key
    raise IndexError

first_key = get_first_key(colors)
first_val = colors[first_key]

If you need an n-th key, then similarly

def get_nth_key(dictionary, n=0):
    if n < 0:
        n += len(dictionary)
    for i, key in enumerate(dictionary.keys()):
        if i == n:
            return key
    raise IndexError("dictionary index out of range") 

(*CPython 3.6 already included ordered dicts, but this was only an implementation detail. The language specification includes ordered dicts from 3.7 onwards.)
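
An alternative sketch for the n-th key that avoids the explicit loop, using itertools.islice; out-of-range values are mapped to IndexError to match the version above:

from itertools import islice

def get_nth_key(dictionary, n=0):
    if n < 0:
        n += len(dictionary)
    if not 0 <= n < len(dictionary):
        raise IndexError("dictionary index out of range")
    return next(islice(dictionary, n, None))   # iterating a dict yields its keys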


回答 2

处理字典中的元素就像坐在驴上,享受旅程。

作为Python的规则,字典是无序的

如果有

dic = {1: "a", 2: "aa", 3: "aaa"}

现在假设我执行 dic[10] = "b",它并不总是像下面这样追加

dic = {1:"a",2:"aa",3:"aaa",10:"b"}

可能像

dic = {1: "a", 2: "aa", 3: "aaa", 10: "b"}

要么

dic = {1: "a", 2: "aa", 10: "b", 3: "aaa"}

要么

dic = {1: "a", 10: "b", 2: "aa", 3: "aaa"}

或任何此类组合。

所以经验法则是字典无序的

Addressing an element of a dictionary is like sitting on a donkey and enjoying the ride.

As a rule in Python, a DICTIONARY is orderless.

If there is

dic = {1: "a", 2: "aa", 3: "aaa"}

Now suppose I do dic[10] = "b"; it will not always be added like this:

dic = {1:"a",2:"aa",3:"aaa",10:"b"}

It may be like

dic = {1: "a", 2: "aa", 3: "aaa", 10: "b"}

Or

dic = {1: "a", 2: "aa", 10: "b", 3: "aaa"}

Or

dic = {1: "a", 10: "b", 2: "aa", 3: "aaa"}

Or any such combination.

So the rule of thumb is: a DICTIONARY is orderless!


回答 3

如果需要有序字典,可以使用odict

If you need an ordered dictionary, you can use odict.


回答 4

实际上,我找到了一个新颖的解决方案,确实帮了我大忙。如果您特别关心某个值在列表或数据集中的索引,可以把字典的值设为那个索引:

只是看:

list = ['a', 'b', 'c']
dictionary = {}
counter = 0
for i in list:
   dictionary[i] = counter
   counter += 1

print(dictionary) # dictionary = {'a':0, 'b':1, 'c':2}

现在,借助哈希图的强大功能,您可以在恒定时间内(也就是快得多)拉入条目的索引

Actually, I found a novel solution that really helped me out. If you are especially concerned with the index of a certain value in a list or data set, you can just set the dictionary's value to that index:

Just watch:

list = ['a', 'b', 'c']
dictionary = {}
counter = 0
for i in list:
   dictionary[i] = counter
   counter += 1

print(dictionary) # dictionary = {'a':0, 'b':1, 'c':2}

Now, through the power of hashmaps, you can pull the index of your entries in constant time (aka a whole lot faster).
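
The same index mapping can be built more idiomatically with enumerate (or a dict comprehension), which also avoids shadowing the built-in list name; a small sketch:

items = ['a', 'b', 'c']
index_of = {value: i for i, value in enumerate(items)}
print(index_of)       # {'a': 0, 'b': 1, 'c': 2}
print(index_of['b'])  # 1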


回答 5

哦,这是个棘手的问题。基本上,这里的每个项目都有两个值,然后您试图用数字作为键来访问它们。不幸的是,您的其中一个值已经被用作键了!

试试这个:

colors = {1: ["blue", "5"], 2: ["red", "6"], 3: ["yellow", "8"]}

现在,您可以按数字调用键,就像它们像列表一样被索引。您还可以通过颜色和数字在列表中的位置进行引用。

例如,

colors[1][0]
// returns 'blue'

colors[3][1]
// returns '8'

当然,您将不得不想出另一种方式来跟踪每种颜色的位置。也许您可以拥有另一本字典来存储每种颜色的键值。

colors_key = {'blue': 1, 'red': 2, 'yellow': 3}

然后,您也可以根据需要查找颜色键。

colors[colors_key['blue']][0] 将返回 'blue'

这样的事情。

然后,当您使用它时,可以使用数字值作为键进行字典操作,以便始终可以使用它们来查找颜色(如果需要)。

values = {5: [1, 'blue'], 6: [2, 'red'], 8: [3, 'yellow']}

然后,colors[colors_key[values[5][1]]][0] 将返回 'blue'。

或者,您可以使用列表列表。

祝好运!

Oh, that's a tough one. What you have here, basically, is two values for each item. Then you are trying to call them with a number as the key. Unfortunately, one of your values is already set as the key!

Try this:

colors = {1: ["blue", "5"], 2: ["red", "6"], 3: ["yellow", "8"]}

Now you can call the keys by number as if they are indexed like a list. You can also reference the color and number by their position within the list.

For example,

colors[1][0]
// returns 'blue'

colors[3][1]
// returns '8'

Of course, you will have to come up with another way of keeping track of what location each color is in. Maybe you can have another dictionary that stores each color's key as its value.

colors_key = {'blue': 1, 'red': 2, 'yellow': 3}

Then, you will be able to also look up the colors key if you need to.

colors[colors_key['blue']][0] will return 'blue'

Something like that.

And then, while you’re at it, you can make a dict with the number values as keys so that you can always use them to look up your colors, you know, if you need.

values = {5: [1, 'blue'], 6: [2, 'red'], 8: [3, 'yellow']}

Then, colors[colors_key[values[5][1]]][0] will return 'blue'.

Or you could use a list of lists.

Good luck!


回答 6

您不能这么做,因为 dict 是无序的。您可以用 .popitem() 获取任意一项,但这会把它从 dict 中删除。

You can't, since dict is unordered. You can use .popitem() to get an arbitrary item, but that will remove it from the dict.
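
Note that in Python 3.7+ dicts preserve insertion order and .popitem() is documented to remove the last inserted pair (LIFO); a tiny sketch with the colors dict from the question:

colors = {"blue": "5", "red": "6", "yellow": "8"}
key, value = colors.popitem()   # ('yellow', '8') in Python 3.7+, and it is removed from colors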


按位置选择熊猫列

问题:按位置选择熊猫列

我只是想通过整数访问命名的熊猫列。

您可以用 df.ix[3] 按位置选择一行。

但是如何按整数选择一列呢?

我的数据框:

df=pandas.DataFrame({'a':np.random.rand(5), 'b':np.random.rand(5)})

I’m simply trying to access named pandas columns by an integer.

You can select a row by location using df.ix[3].

But how to select a column by integer?

My dataframe:

df=pandas.DataFrame({'a':np.random.rand(5), 'b':np.random.rand(5)})

回答 0

我想到两种方法:

>>> df
          A         B         C         D
0  0.424634  1.716633  0.282734  2.086944
1 -1.325816  2.056277  2.583704 -0.776403
2  1.457809 -0.407279 -1.560583 -1.316246
3 -0.757134 -1.321025  1.325853 -2.513373
4  1.366180 -1.265185 -2.184617  0.881514
>>> df.iloc[:, 2]
0    0.282734
1    2.583704
2   -1.560583
3    1.325853
4   -2.184617
Name: C
>>> df[df.columns[2]]
0    0.282734
1    2.583704
2   -1.560583
3    1.325853
4   -2.184617
Name: C

编辑:原先的答案建议使用 df.ix[:,2],但这一用法现已弃用。用户应改用 df.iloc[:,2]。

Two approaches that come to mind:

>>> df
          A         B         C         D
0  0.424634  1.716633  0.282734  2.086944
1 -1.325816  2.056277  2.583704 -0.776403
2  1.457809 -0.407279 -1.560583 -1.316246
3 -0.757134 -1.321025  1.325853 -2.513373
4  1.366180 -1.265185 -2.184617  0.881514
>>> df.iloc[:, 2]
0    0.282734
1    2.583704
2   -1.560583
3    1.325853
4   -2.184617
Name: C
>>> df[df.columns[2]]
0    0.282734
1    2.583704
2   -1.560583
3    1.325853
4   -2.184617
Name: C

Edit: The original answer suggested the use of df.ix[:,2] but this function is now deprecated. Users should switch to df.iloc[:,2].


回答 1

您还可以用 df.icol(n) 按整数访问列。

更新:icol 已弃用,可以通过以下方式实现相同的功能:

df.iloc[:, n]  # to access the column at the nth position

You can also use df.icol(n) to access a column by integer.

Update: icol is deprecated and the same functionality can be achieved by:

df.iloc[:, n]  # to access the column at the nth position

回答 2

您可以使用基于标签的 .loc 或基于整数位置的 .iloc 方法来做列切片,包括列范围切片:

In [50]: import pandas as pd

In [51]: import numpy as np

In [52]: df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))

In [53]: df
Out[53]: 
          a         b         c         d
0  0.806811  0.187630  0.978159  0.317261
1  0.738792  0.862661  0.580592  0.010177
2  0.224633  0.342579  0.214512  0.375147
3  0.875262  0.151867  0.071244  0.893735

In [54]: df.loc[:, ["a", "b", "d"]] ### Selective columns based slicing
Out[54]: 
          a         b         d
0  0.806811  0.187630  0.317261
1  0.738792  0.862661  0.010177
2  0.224633  0.342579  0.375147
3  0.875262  0.151867  0.893735

In [55]: df.loc[:, "a":"c"] ### Selective label based column ranges slicing
Out[55]: 
          a         b         c
0  0.806811  0.187630  0.978159
1  0.738792  0.862661  0.580592
2  0.224633  0.342579  0.214512
3  0.875262  0.151867  0.071244

In [56]: df.iloc[:, 0:3] ### Selective index based column ranges slicing
Out[56]: 
          a         b         c
0  0.806811  0.187630  0.978159
1  0.738792  0.862661  0.580592
2  0.224633  0.342579  0.214512
3  0.875262  0.151867  0.071244

You could use label based using .loc or index based using .iloc method to do column-slicing including column ranges:

In [50]: import pandas as pd

In [51]: import numpy as np

In [52]: df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))

In [53]: df
Out[53]: 
          a         b         c         d
0  0.806811  0.187630  0.978159  0.317261
1  0.738792  0.862661  0.580592  0.010177
2  0.224633  0.342579  0.214512  0.375147
3  0.875262  0.151867  0.071244  0.893735

In [54]: df.loc[:, ["a", "b", "d"]] ### Selective columns based slicing
Out[54]: 
          a         b         d
0  0.806811  0.187630  0.317261
1  0.738792  0.862661  0.010177
2  0.224633  0.342579  0.375147
3  0.875262  0.151867  0.893735

In [55]: df.loc[:, "a":"c"] ### Selective label based column ranges slicing
Out[55]: 
          a         b         c
0  0.806811  0.187630  0.978159
1  0.738792  0.862661  0.580592
2  0.224633  0.342579  0.214512
3  0.875262  0.151867  0.071244

In [56]: df.iloc[:, 0:3] ### Selective index based column ranges slicing
Out[56]: 
          a         b         c
0  0.806811  0.187630  0.978159
1  0.738792  0.862661  0.580592
2  0.224633  0.342579  0.214512
3  0.875262  0.151867  0.071244

回答 3

您可以通过将列索引列表传递给dataFrame.ix来访问多个列。

例如:

>>> df = pandas.DataFrame({
             'a': np.random.rand(5),
             'b': np.random.rand(5),
             'c': np.random.rand(5),
             'd': np.random.rand(5)
         })

>>> df
          a         b         c         d
0  0.705718  0.414073  0.007040  0.889579
1  0.198005  0.520747  0.827818  0.366271
2  0.974552  0.667484  0.056246  0.524306
3  0.512126  0.775926  0.837896  0.955200
4  0.793203  0.686405  0.401596  0.544421

>>> df.ix[:,[1,3]]
          b         d
0  0.414073  0.889579
1  0.520747  0.366271
2  0.667484  0.524306
3  0.775926  0.955200
4  0.686405  0.544421

You can access multiple columns by passing a list of column indices to dataFrame.ix.

For example:

>>> df = pandas.DataFrame({
             'a': np.random.rand(5),
             'b': np.random.rand(5),
             'c': np.random.rand(5),
             'd': np.random.rand(5)
         })

>>> df
          a         b         c         d
0  0.705718  0.414073  0.007040  0.889579
1  0.198005  0.520747  0.827818  0.366271
2  0.974552  0.667484  0.056246  0.524306
3  0.512126  0.775926  0.837896  0.955200
4  0.793203  0.686405  0.401596  0.544421

>>> df.ix[:,[1,3]]
          b         d
0  0.414073  0.889579
1  0.520747  0.366271
2  0.667484  0.524306
3  0.775926  0.955200
4  0.686405  0.544421
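
With .ix gone from current pandas, the positional equivalent for picking several columns is .iloc; a minimal sketch with the frame above:

df.iloc[:, [1, 3]]   # columns 'b' and 'd' by integer position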

回答 4

.transpose()方法将列转换为行,将行转换为列,因此您甚至可以编写

df.transpose().ix[3]

The method .transpose() converts columns to rows and rows to column, hence you could even write

df.transpose().ix[3]
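
In current pandas the transpose (and .ix) is unnecessary; the column can be taken directly by position. A sketch, keeping 3 as the example position:

df.iloc[:, 3]   # the fourth column, equivalent to df.transpose().ix[3]
df.T.iloc[3]    # same result if you do transpose first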

从python pandas中的列名获取列索引

问题:从python pandas中的列名获取列索引

在R中,当您需要根据列名检索列索引时,可以执行此操作

idx <- which(names(my_data)==my_colum_name)

有没有办法对熊猫数据框做同样的事情?

In R when you need to retrieve a column index based on the name of the column you could do

idx <- which(names(my_data)==my_colum_name)

Is there a way to do the same with pandas dataframes?


回答 0

当然可以使用.get_loc()

In [45]: df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

In [46]: df.columns
Out[46]: Index([apple, orange, pear], dtype=object)

In [47]: df.columns.get_loc("pear")
Out[47]: 2

虽然老实说,我自己通常不需要这个。通常,通过名称访问就能实现我想要的功能(df["pear"]、df[["apple", "orange"]],或者 df.columns.isin(["orange", "pear"])),尽管我确实能想到一些需要索引号的情况。

Sure, you can use .get_loc():

In [45]: df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

In [46]: df.columns
Out[46]: Index([apple, orange, pear], dtype=object)

In [47]: df.columns.get_loc("pear")
Out[47]: 2

although to be honest I don’t often need this myself. Usually access by name does what I want it to (df["pear"], df[["apple", "orange"]], or maybe df.columns.isin(["orange", "pear"])), although I can definitely see cases where you’d want the index number.


回答 1

这是一个使用列表推导式的解决方案。cols 是要获取索引的列的列表:

[df.columns.get_loc(c) for c in cols if c in df]

Here is a solution through list comprehension. cols is the list of columns to get index for:

[df.columns.get_loc(c) for c in cols if c in df]
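
If you want a vectorized lookup for several names at once, pandas' Index.get_indexer does the same job (assuming the column labels are unique; it returns -1 for names that are not present):

df.columns.get_indexer(cols)   # e.g. array([0, 2, -1]) when the last name is missing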

回答 2

DSM 的解决方案有效,但如果您想要与 which 直接等效的写法,可以用 (df.columns == name).nonzero()。

DSM’s solution works, but if you wanted a direct equivalent to which you could do (df.columns == name).nonzero()
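
A slightly shorter spelling of the same idea uses numpy.flatnonzero, which skips the one-element tuple that nonzero() returns:

import numpy as np
np.flatnonzero(df.columns == name)   # array of positions where the column name matches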


回答 3

当您需要查找多个列的匹配项时,可以使用基于 searchsorted 方法的向量化解决方案。因此,以 df 作为数据框、query_cols 作为要搜索的列名,实现如下——

def column_index(df, query_cols):
    cols = df.columns.values
    sidx = np.argsort(cols)
    return sidx[np.searchsorted(cols,query_cols,sorter=sidx)]

示例运行——

In [162]: df
Out[162]: 
   apple  banana  pear  orange  peach
0      8       3     4       4      2
1      4       4     3       0      1
2      1       2     6       8      1

In [163]: column_index(df, ['peach', 'banana', 'apple'])
Out[163]: array([4, 1, 0])

When you might be looking to find multiple column matches, a vectorized solution using searchsorted method could be used. Thus, with df as the dataframe and query_cols as the column names to be searched for, an implementation would be –

def column_index(df, query_cols):
    cols = df.columns.values
    sidx = np.argsort(cols)
    return sidx[np.searchsorted(cols,query_cols,sorter=sidx)]

Sample run –

In [162]: df
Out[162]: 
   apple  banana  pear  orange  peach
0      8       3     4       4      2
1      4       4     3       0      1
2      1       2     6       8      1

In [163]: column_index(df, ['peach', 'banana', 'apple'])
Out[163]: array([4, 1, 0])

回答 4

如果您希望根据列的位置获得列名(与 OP 的问题方向相反),可以使用:

>>> df.columns.get_values()[location]

使用@DSM示例:

>>> df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

>>> df.columns

Index(['apple', 'orange', 'pear'], dtype='object')

>>> df.columns.get_values()[1]

'orange'

其他方法:

df.iloc[:,1].name

df.columns[location] #(thanks to @roobie-nuby for pointing that out in comments.) 

In case you want the column name from the column location (the other way around to the OP question), you can use:

>>> df.columns.get_values()[location]

Using @DSM Example:

>>> df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

>>> df.columns

Index(['apple', 'orange', 'pear'], dtype='object')

>>> df.columns.get_values()[1]

'orange'

Other ways:

df.iloc[:,1].name

df.columns[location] #(thanks to @roobie-nuby for pointing that out in comments.) 
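
Note that Index.get_values() has since been deprecated in newer pandas releases, so the plain indexer is the safer spelling today; a small sketch with the same frame:

df.columns[1]           # 'orange'
df.columns.tolist()[1]  # same result via a plain Python list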

回答 5

这个怎么样:

df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
out = np.argwhere(df.columns.isin(['apple', 'orange'])).ravel()
print(out)
[1 2]

how about this:

df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
out = np.argwhere(df.columns.isin(['apple', 'orange'])).ravel()
print(out)
[1 2]