Question: Extracting specific selected columns to a new DataFrame as a copy
I have a pandas DataFrame with 4 columns and I want to create a new DataFrame that has only three of them. This question is similar to "Extracting specific columns from a data frame", but for pandas rather than R. The following code does not work, raises an error, and is certainly not the pandasnic way to do it.
import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = pd.DataFrame(zip(old.A, old.C, old.D)) # raises TypeError: data argument can't be an iterator
What is the pandasnic way to do it?
Answer 0
There is a way of doing this, and it actually looks similar to R:
new = old[['A', 'C', 'D']].copy()
Here you are just selecting the columns you want from the original data frame and assigning the result to a new variable. If you want to modify the new dataframe at all, you'll probably want to use .copy() to avoid a SettingWithCopyWarning.
An alternative method is to use filter, which will create a copy by default:
new = old.filter(['A', 'C', 'D'], axis=1)
Finally, depending on the number of columns in your original dataframe, it might be more succinct to express this using drop (this will also create a copy by default):
new = old.drop('B', axis=1)
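As a minimal sketch, assuming the `old` frame from the question and the columns it asks for (A, C, D), the three approaches can be compared side by side. The variable names `by_selection`, `by_filter`, and `by_drop` are illustrative only.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Three ways to keep columns A, C and D
by_selection = old[['A', 'C', 'D']].copy()
by_filter = old.filter(['A', 'C', 'D'], axis=1)
by_drop = old.drop('B', axis=1)

# All three hold the same data
same_result = by_selection.equals(by_filter) and by_selection.equals(by_drop)

# Because of .copy(), editing the new frame leaves the original untouched
by_selection.loc[0, 'A'] = 999
original_value = old.loc[0, 'A']
```

Here `same_result` should be true, and `original_value` should still be the value stored in `old` before the assignment.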
Answer 1
The easiest way is:
new = old[['A','C','D']]
Answer 2
Another simpler way seems to be:
new = pd.DataFrame([old.A, old.B, old.C]).transpose()
where old.column_name will give you a Series.
Make a list of all the column Series you want to retain and pass it to the DataFrame constructor. We need to do a transpose to adjust the shape.
In [14]: pd.DataFrame([old.A, old.B, old.C]).transpose()
Out[14]:
   A   B    C
0  4  10  100
1  5  20   50
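One caveat with building a frame from a list of Series and transposing it is that each Series first becomes a row, so column dtypes may not survive when the columns have different types. A hedged alternative, not part of the answer above, is `pd.concat` along the column axis, which keeps each Series as a column. The `mixed` frame below is a hypothetical example made up for illustration.

```python
import pandas as pd

# Hypothetical frame with columns of different types
mixed = pd.DataFrame({'A': [4, 5], 'B': [1.5, 2.5], 'C': ['x', 'y']})

# Join the selected Series side by side; no transpose is needed
new = pd.concat([mixed.A, mixed.C], axis=1)

# Each column keeps the dtype it had in the source frame
dtypes_preserved = bool((new.dtypes == mixed[['A', 'C']].dtypes).all())
```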
Answer 3
Generic functional form:
def select_columns(data_frame, column_names):
    # Select all rows and only the listed column labels
    new_frame = data_frame.loc[:, column_names]
    return new_frame
Specific to your problem above:
selected_columns = ['A', 'C', 'D']
new = select_columns(old, selected_columns)
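A short usage sketch, assuming the `old` frame from the question: it calls the helper and also shows that, in recent pandas versions, `.loc` with a list containing a label that does not exist raises a `KeyError`. The label `'Z'` and the flag `missing_raises` are made up for illustration.

```python
import pandas as pd

def select_columns(data_frame, column_names):
    # Select all rows and only the listed column labels
    return data_frame.loc[:, column_names]

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})
new = select_columns(old, ['A', 'C', 'D'])

# 'Z' is not a column of old, so the lookup is expected to fail
try:
    select_columns(old, ['A', 'Z'])
    missing_raises = False
except KeyError:
    missing_raises = True
```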
Answer 4
If you want to have a new data frame then:
import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = old[['A', 'C', 'D']]
Answer 5
As far as I can tell, you don't necessarily need to specify the axis when using the filter function, because for a DataFrame it defaults to the column axis.
new = old.filter(['A', 'C', 'D'])
returns the same dataframe as
new = old.filter(['A', 'C', 'D'], axis=1)
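A small check of that claim, assuming the `old` frame from the question. It also illustrates one behavior worth knowing: `filter` keeps only the requested labels that exist and does not raise on the others. The label `'Z'` is made up for illustration.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

with_axis = old.filter(['A', 'C', 'D'], axis=1)
without_axis = old.filter(['A', 'C', 'D'])

# Both calls select the same columns
axis_is_optional = with_axis.equals(without_axis)

# 'Z' is not a column, so only 'A' is kept and no error is raised
partial = old.filter(['A', 'Z'])
```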
Answer 6
Select columns by integer position:
# selected column positions: 0, 2, 3 (columns A, C, D in the question's frame)
new = old.iloc[:, [0, 2, 3]].copy()
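If the positions are not known in advance, one possible way to derive them from labels is `Index.get_indexer`. This is a sketch assuming the `old` frame from the question; the name `positions` is illustrative.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Translate column labels into integer positions
positions = old.columns.get_indexer(['A', 'C', 'D'])

# Select by position and copy so the result is independent of old
new = old.iloc[:, positions].copy()
```

Note that `get_indexer` returns -1 for a label it cannot find, so it may be worth checking the result before passing it to `iloc`.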