Extracting specific selected columns to new DataFrame as a copy

Question: Extracting specific selected columns to new DataFrame as a copy

I have a pandas DataFrame with 4 columns and I want to create a new DataFrame that has only three of those columns. This question is similar to "Extracting specific columns from a data frame", but for pandas rather than R. The following code does not work: it raises an error, and it is certainly not the idiomatic pandas way to do it.

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = pd.DataFrame(zip(old.A, old.C, old.D)) # raises TypeError: data argument can't be an iterator 

What is the idiomatic pandas way to do it?


Answer 0

There is a way to do this, and it actually looks similar to R:

new = old[['A', 'C', 'D']].copy()

Here you are just selecting the columns you want from the original DataFrame and assigning the result to a new variable. If you plan to modify the new DataFrame at all, you'll probably want to use .copy() to avoid a SettingWithCopyWarning.

An alternative method is to use filter, which creates a copy by default:

new = old.filter(['A','C','D'], axis=1)

Finally, depending on the number of columns in your original DataFrame, it may be more succinct to express this with drop (this also creates a copy by default):

new = old.drop('B', axis=1)
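
As a quick check of the claims above, here is a minimal sketch that compares the selection and drop routes on the frame from the question and confirms that writing to the copy does not touch the original. The variable names (by_selection, by_drop, and so on) are made up for illustration.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

by_selection = old[['A', 'C', 'D']].copy()  # explicit column list plus an explicit copy
by_drop = old.drop('B', axis=1)             # name only the column to leave out

# Both routes keep the same columns and values.
same_result = by_selection.equals(by_drop)

# Writing to the copy leaves the original frame untouched.
by_selection.loc[0, 'A'] = 999
original_value = old.loc[0, 'A']
```

Which form reads better mostly depends on whether the list of columns to keep or the list of columns to discard is shorter.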

Answer 1

The easiest way is:

new = old[['A','C','D']]
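
The double brackets matter here. A small sketch, using the frame from the question and illustrative variable names, of how a single label differs from a list of labels:

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

one_column = old['A']             # a single label returns a Series
sub_frame = old[['A', 'C', 'D']]  # a list of labels returns a DataFrame

kinds = (type(one_column).__name__, type(sub_frame).__name__)
columns_kept = list(sub_frame.columns)
```

If the result will be modified later, appending .copy() makes the intent explicit.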


Answer 2

Another simple way seems to be:

new = pd.DataFrame([old.A, old.B, old.C]).transpose()

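where old.column_name gives you a Series. Make a list of the column Series you want to retain (for the question above, that would be old.A, old.C and old.D) and pass it to the DataFrame constructor. Each Series becomes a row of the result, so we need a transpose to turn the rows back into columns.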

In [14]: pd.DataFrame([old.A, old.B, old.C]).transpose()
Out[14]: 
   A   B    C
0  4  10  100
1  5  20   50
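
One caveat worth checking before relying on this: the sketch below compares the constructor-plus-transpose route with plain selection. The second half uses a made-up frame named mixed (not from the question) to show what appears to happen when the columns have different types: the rebuilt columns seem to fall back to a generic object dtype, whereas plain selection keeps each column's own dtype.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

via_transpose = pd.DataFrame([old.A, old.B, old.C]).transpose()
via_selection = old[['A', 'B', 'C']]

# With all-integer columns, both routes hold the same values and labels.
same_values = bool((via_transpose.to_numpy() == via_selection.to_numpy()).all())
same_labels = list(via_transpose.columns) == list(via_selection.columns)

# Hypothetical frame with one numeric and one string column.
mixed = pd.DataFrame({'n': [1, 2], 's': ['x', 'y']})
rebuilt = pd.DataFrame([mixed['n'], mixed['s']]).transpose()

# After the round trip, 'n' is no longer an integer column.
n_integer_after_rebuild = pd.api.types.is_integer_dtype(rebuilt['n'])
n_integer_after_selection = pd.api.types.is_integer_dtype(mixed[['n', 's']]['n'])
```

For frames with mixed column types, selecting with a list of labels is therefore the safer choice.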

Answer 3

Generic functional form:

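def select_columns(data_frame, column_names):
    # Select every row and only the requested columns, by label;
    # .copy() makes the result independent of data_frame.
    new_frame = data_frame.loc[:, column_names].copy()
    return new_frame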

Applied to your problem above:

selected_columns = ['A', 'C', 'D']
new = select_columns(old, selected_columns)
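
If the column list comes from user input or configuration, a variant that reports unknown names up front can be handy. This is only a sketch; select_existing_columns is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def select_existing_columns(data_frame, column_names):
    """Hypothetical helper: copy only the requested columns, failing loudly on unknown names."""
    missing = [name for name in column_names if name not in data_frame.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # .loc selects by label; .copy() detaches the result from data_frame.
    return data_frame.loc[:, column_names].copy()


old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})
new = select_existing_columns(old, ['A', 'C', 'D'])
```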

Answer 4

If you want a new DataFrame, then:

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = old[['A', 'C', 'D']]

Answer 5

As far as I can tell, you don't need to specify the axis when using filter on a DataFrame, because it filters on the columns by default.

new = old.filter(['A','C','D'])

returns the same DataFrame as

new = old.filter(['A','C','D'], axis=1)
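
A short sketch to confirm this on the frame from the question. It also illustrates one behavioral difference to keep in mind: filter skips labels that do not exist, while bracket selection raises a KeyError. The label 'Z' is a deliberately nonexistent column chosen for illustration.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

with_default_axis = old.filter(['A', 'C', 'D'])
with_explicit_axis = old.filter(['A', 'C', 'D'], axis=1)
same_frame = with_default_axis.equals(with_explicit_axis)

# filter keeps only the labels that exist; 'Z' is silently ignored.
lenient_columns = list(old.filter(['A', 'Z']).columns)

# Bracket selection is strict about unknown labels.
try:
    old[['A', 'Z']]
    strict_selection_raised = False
except KeyError:
    strict_selection_raised = True
```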

Answer 6

Selecting columns by integer position:

# positional (0-based) column indices 1, 6, 7; the frame needs at least 8 columns
new = old.iloc[: , [1, 6, 7]].copy()
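
The frame in the question has only four columns, so positions 6 and 7 do not exist there. A minimal sketch of the same idea adapted to that frame, looking up the positions from the column names with Index.get_indexer (the variable name positions is illustrative):

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Translate column labels into 0-based positions.
positions = old.columns.get_indexer(['A', 'C', 'D'])

# Select all rows and the chosen positions, then detach from the original.
new = old.iloc[:, positions].copy()
```

Positional selection is mainly useful when column names are unknown or duplicated; otherwise selecting by label is less fragile if the column order changes.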