Combining two Series into a DataFrame in pandas

Question: Combining two Series into a DataFrame in pandas

I have two Series, s1 and s2, with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep the shared index as a third column?


Answer 0

I think concat is a nice way to do this. If the Series have name attributes, those names are used as the column labels (otherwise the columns are simply numbered):

In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')

In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')

In [3]: pd.concat([s1, s2], axis=1)
Out[3]:
   s1  s2
A   1   3
B   2   4

In [4]: pd.concat([s1, s2], axis=1).reset_index()
Out[4]:
  index  s1  s2
0     A   1   3
1     B   2   4

Note: This extends to more than 2 Series.
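
As a minimal sketch of the note above, here is concat applied to three unnamed Series. The data and the column label "key" are made up for illustration; keys= supplies the column names that the missing name attributes would otherwise provide, and rename_axis gives the index a label before reset_index turns it into a column.

```python
import pandas as pd

# Hypothetical data: three unnamed Series sharing a non-consecutive index.
idx = [1, 2, 4]
s1 = pd.Series([10, 20, 30], index=idx)
s2 = pd.Series([0.1, 0.2, 0.3], index=idx)
s3 = pd.Series(["x", "y", "z"], index=idx)

# keys= provides the column labels because the Series have no name attribute.
df = pd.concat([s1, s2, s3], axis=1, keys=["s1", "s2", "s3"])

# Label the index first so reset_index() does not fall back to "index".
df = df.rename_axis("key").reset_index()
print(df)
```

Naming the index up front is optional; without it the new column is simply called index, as in Out[4] above.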


Answer 1

Why don’t you just use .to_frame if both have the same indexes?

>= v0.23

a.to_frame().join(b)

< v0.23

a.to_frame().join(b.to_frame())
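
A self-contained sketch of the join approach, with made-up Series a, b and c. It assumes a recent pandas version, where join accepts a named Series directly. The Series being joined must have a name, since that name becomes the column label; an unnamed Series can be given one through to_frame(name=...).

```python
import pandas as pd

# Hypothetical Series on a shared index; the names become column labels.
a = pd.Series([1, 2], index=["A", "B"], name="a")
b = pd.Series([3, 4], index=["A", "B"], name="b")

df = a.to_frame().join(b)

# An unnamed Series needs a name before it can be joined.
c = pd.Series([5, 6], index=["A", "B"])
df = df.join(c.to_frame(name="c"))
print(df)
```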

Answer 2

Pandas will automatically align the Series passed in and build a joint index from them. The two indexes happen to be the same here. reset_index moves the index into a column.

In [1]: from numpy.random import randn; from pandas import DataFrame, Series

In [2]: s1 = Series(randn(5),index=[1,2,4,5,6])

In [4]: s2 = Series(randn(5),index=[1,2,4,5,6])

In [8]: DataFrame(dict(s1 = s1, s2 = s2)).reset_index()
Out[8]: 
   index        s1        s2
0      1 -0.176143  0.128635
1      2 -1.286470  0.908497
2      4 -0.995881  0.528050
3      5  0.402241  0.458870
4      6  0.380457  0.072251
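
To illustrate the alignment described above, here is a small sketch with two made-up Series whose indexes only partly overlap. The DataFrame constructor builds the union of the labels and fills the gaps with NaN; passing join="inner" to concat is one way to keep only the shared labels instead.

```python
import pandas as pd

# Hypothetical Series whose indexes overlap only on labels 2 and 4.
s1 = pd.Series([1.0, 2.0, 3.0], index=[1, 2, 4])
s2 = pd.Series([10.0, 20.0, 30.0], index=[2, 4, 6])

# The constructor aligns on labels: the result covers every label seen,
# with NaN wherever one of the Series has no value.
df = pd.DataFrame({"s1": s1, "s2": s2})
print(df.reset_index())

# Keep only the labels present in both Series.
inner = pd.concat([s1, s2], axis=1, keys=["s1", "s2"], join="inner")
print(inner.reset_index())
```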

Answer 3

Example code:

a = pd.Series([1,2,3,4], index=[7,2,8,9])
b = pd.Series([5,6,7,8], index=[7,2,8,9])
data = pd.DataFrame({'a': a,'b':b, 'idx_col':a.index})

Pandas allows you to create a DataFrame from a dict with Series as the values and the column names as the keys. When it finds a Series as a value, it uses the Series index as part of the DataFrame index. This data alignment is one of the main perks of Pandas. Consequently, unless you have other needs, the extra idx_col column is redundant: in the above example, data['idx_col'] has the same data as data.index.
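
The redundancy can be checked directly. The sketch below repeats the example, compares idx_col with the index, and shows an alternative (not from the answer itself) that produces the same column through rename_axis and reset_index.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9])
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])

data = pd.DataFrame({"a": a, "b": b, "idx_col": a.index})

# idx_col repeats what the index already holds.
same = (data["idx_col"].to_numpy() == data.index.to_numpy()).all()
print(same)

# Alternative: build the frame from the Series only, then move the
# index into a column named idx_col.
alt = pd.DataFrame({"a": a, "b": b}).rename_axis("idx_col").reset_index()
print(alt)
```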


Answer 4

If I may answer this.

The key to converting Series to a DataFrame is to understand that:

1. At a conceptual level, every column in a DataFrame is a Series.

2. And every column name is a key that maps to a Series.

If you keep these two concepts in mind, you can think of many ways to convert Series to a DataFrame. One easy solution looks like this:

Create two series here

import pandas as pd

series_1 = pd.Series(list(range(10)))

series_2 = pd.Series(list(range(20,30)))

Create an empty DataFrame with just the desired column names

df = pd.DataFrame(columns = ['Column_name#1', 'Column_name#2'])

Put the Series values into the DataFrame using the mapping concept

df['Column_name#1'] = series_1

df['Column_name#2'] = series_2

Check results now

df.head(5)
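
The same mapping idea can be written in one step by passing the name-to-Series dict straight to the constructor. This is a sketch of an equivalent shortcut rather than the answer's own code; it also checks that a column read back from the frame is a Series.

```python
import pandas as pd

series_1 = pd.Series(range(10))
series_2 = pd.Series(range(20, 30))

# Each key is a column name and each value is the Series behind it.
df = pd.DataFrame({"Column_name#1": series_1, "Column_name#2": series_2})

# Reading a column back yields a Series again.
col = df["Column_name#1"]
print(type(col).__name__)
print(df.head(5))
```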

Answer 5

Not sure I fully understand your question, but is this what you want to do?

pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)

(index=s1.index is not even necessary here)
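
A short sketch, with made-up data, of what the index argument does when the values are Series: passing s1.index changes nothing, while passing a different set of labels reindexes the result to those labels.

```python
import pandas as pd

# Hypothetical Series on a shared non-consecutive index.
s1 = pd.Series([1, 2, 3], index=[1, 2, 4])
s2 = pd.Series([4, 5, 6], index=[1, 2, 4])

# With identical indexes, passing index=s1.index is redundant.
implicit = pd.DataFrame(data=dict(s1=s1, s2=s2))
explicit = pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
print(implicit.equals(explicit))

# A different index selects (reindexes to) just those labels.
subset = pd.DataFrame(data=dict(s1=s1, s2=s2), index=[2, 4])
print(subset)
```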


Answer 6

A simplification of the solution based on join():

df = a.to_frame().join(b)

Answer 7

I used pandas to convert my numpy array or Series to a DataFrame, then added the additional column by key as 'prediction'. If you need the DataFrame converted back to a list, use values.tolist()

output=pd.DataFrame(X_test)
output['prediction']=y_pred

rows=output.values.tolist()  # named rows to avoid shadowing the built-in list
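
A self-contained version of the snippet above, as a sketch: the contents of X_test and y_pred and the column names f0 and f1 are invented for illustration. Because y_pred is a plain NumPy array here, the assignment is positional, so the first prediction goes to the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs standing in for X_test and y_pred.
X_test = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
y_pred = np.array([0, 1, 1])

output = pd.DataFrame(X_test, columns=["f0", "f1"])
output["prediction"] = y_pred  # positional: y_pred is an array, not a Series

# Convert back to a list of rows; mixed int/float columns become floats.
rows = output.to_numpy().tolist()
print(rows)
```

If y_pred were a Series with a different index, the assignment would align on labels instead of position, so converting it with to_numpy() first is one way to keep the positional behaviour.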