Python Pandas: Keep a selected column as a DataFrame instead of a Series

Question

When selecting a single column from a pandas DataFrame (say df.iloc[:, 0], df['A'], or df.A), the result is automatically converted to a Series instead of a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument, so I would prefer to work with a single-column DataFrame rather than a Series, so that the functions can assume that, say, df.columns is accessible. Right now I have to explicitly convert the Series into a DataFrame with something like pd.DataFrame(df.iloc[:, 0]). This doesn't seem like the cleanest approach. Is there a more elegant way to index a DataFrame directly so that the result is a single-column DataFrame instead of a Series?

Answer 1

As @Jeff mentions, there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and to raise errors early if you're trying something ambiguous):

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

In [12]: df[['A']]

In [13]: df[[0]]  # positional fallback in older pandas; raises KeyError in current versions

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]:  # they all return the same thing:
   A
0  1
1  3

The latter two choices remove ambiguity in the case of integer column names (precisely why loc/iloc were created). For example:

In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

In [18]: df[[0]]  # ambiguous: older pandas (shown) picks by position; current versions select the label 0
Out[18]:
   A
0  1
1  3
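
A minimal sketch of the same distinction, assuming a recent pandas release. It reuses the frame from the example above; the names by_label and by_position are only illustrative. With a list indexer, loc always selects by label and iloc always selects by position, so neither depends on how plain [] resolves an integer:

```python
import pandas as pd

# Same frame as above: one string label and one integer label.
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

by_label = df.loc[:, [0]]      # the column whose label is 0
by_position = df.iloc[:, [0]]  # the first column, whatever its label is

# Both results stay DataFrames because the column indexer is a list.
print(type(by_label).__name__, list(by_label.columns))
print(type(by_position).__name__, list(by_position.columns))
```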

Answer 2

As Andy Hayden recommends, using .iloc/.loc to index out a single-column DataFrame is the way to go. Another point to note is how to express the index positions: pass the labels/positions as a list to get a DataFrame back; passing a bare label or position returns a pandas.core.series.Series.

Input:

    A_1 = train_data.loc[:,'Fraudster']
    print('A_1 is of type', type(A_1))
    A_2 = train_data.loc[:, ['Fraudster']]
    print('A_2 is of type', type(A_2))
    A_3 = train_data.iloc[:,12]
    print('A_3 is of type', type(A_3))
    A_4 = train_data.iloc[:,[12]]
    print('A_4 is of type', type(A_4))

Output:

    A_1 is of type <class 'pandas.core.series.Series'>
    A_2 is of type <class 'pandas.core.frame.DataFrame'>
    A_3 is of type <class 'pandas.core.series.Series'>
    A_4 is of type <class 'pandas.core.frame.DataFrame'>
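
The same behaviour can be reproduced without the original dataset. The sketch below uses a small made-up stand-in for train_data (its values and the Amount column are hypothetical, as is the helper column_names) to show why the list form matters for a function that expects df.columns to exist:

```python
import pandas as pd

# Hypothetical stand-in for train_data; only the 'Fraudster' name comes from the answer.
train_data = pd.DataFrame({'Amount': [10.0, 25.5, 3.2], 'Fraudster': [0, 1, 0]})

def column_names(df):
    """Example of a function that relies on df.columns being available."""
    return list(df.columns)

as_series = train_data.loc[:, 'Fraudster']   # bare label -> Series
as_frame = train_data.loc[:, ['Fraudster']]  # list of labels -> DataFrame

print(as_series.ndim, as_frame.ndim)  # 1-dimensional vs 2-dimensional
print(as_frame.shape)
print(column_names(as_frame))
```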

Answer 3

You can use df.iloc[:, 0:1]; because the column indexer is a slice, the result is a DataFrame rather than a Series.
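
A short sketch of the slice approach, using the two-column frame from the earlier answer as an assumed example. The label-based variant with loc is included for comparison; note that label slices include both endpoints, while positional slices exclude the end:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

first = df.iloc[:, 0:1]    # positional slice: the end position is exclusive
same = df.loc[:, 'A':'A']  # label slice: both endpoints are inclusive

print(type(first).__name__, first.shape)
print(first.equals(same))
```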


Answer 4

These three approaches have been mentioned:

pd.DataFrame(df.loc[:, 'A'])  # Approach of the original post
df.loc[:, ['A']]              # Approach 2 (note: use iloc for positional indexing)
df[['A']]                     # Approach 3

pd.Series.to_frame() is another approach.

Because it is a method, it can be used in situations where the second and third approaches above do not apply. In particular, it is useful when you apply some method to a column of your DataFrame and want the output as a DataFrame instead of a Series. For instance, in a Jupyter Notebook a Series is rendered as plain text, while a DataFrame is rendered as a formatted table.

# Basic use case: 
df['A'].to_frame()

# Use case 2 (this will give you pretty output in a Jupyter Notebook): 
df['A'].describe().to_frame()

# Use case 3: 
df['A'].str.strip().to_frame()

# Use case 4: 
def some_function(num): 
    ...

df['A'].apply(some_function).to_frame()
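
As a runnable illustration of the use cases above, here is a minimal sketch on a made-up frame (the sample strings and the column name A_len are assumptions). It also uses the optional name argument of to_frame, which sets the resulting column label when the Series name is not what you want:

```python
import pandas as pd

# Hypothetical data with stray whitespace around the values.
df = pd.DataFrame({'A': [' x', 'y ', ' z ']})

# Chained string method, then back to a one-column DataFrame named 'A'.
stripped = df['A'].str.strip().to_frame()

# to_frame(name=...) overrides the column label taken from the Series name.
renamed = df['A'].str.len().to_frame(name='A_len')

print(list(stripped.columns), stripped['A'].tolist())
print(list(renamed.columns), renamed['A_len'].tolist())
```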
