Question: Combining two Series into a DataFrame in pandas
I have two Series, s1 and s2, with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep one of the indices as a third column?
Answer 0
I think pd.concat is a nice way to do this. If the Series have name attributes, it uses them as the column names (otherwise it simply numbers the columns):
In [1]: import pandas as pd
In [2]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')
In [3]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')
In [4]: pd.concat([s1, s2], axis=1)
Out[4]:
   s1  s2
A   1   3
B   2   4
In [5]: pd.concat([s1, s2], axis=1).reset_index()
Out[5]:
  index  s1  s2
0     A   1   3
1     B   2   4
Note: This extends to more than 2 Series.
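When the Series have no names, the keys argument of pd.concat can supply the column labels, and rename_axis can label the index before reset_index turns it into a column. A minimal sketch, assuming unnamed Series; the labels 'left', 'right' and 'key' are made up for illustration:

```python
import pandas as pd

# Two unnamed Series sharing the same non-consecutive index.
s1 = pd.Series([1, 2], index=[10, 30])
s2 = pd.Series([3, 4], index=[10, 30])

# keys= provides the column labels ('left' and 'right' are illustrative names).
df = pd.concat([s1, s2], axis=1, keys=["left", "right"])

# Name the index 'key' so reset_index produces a column with that label
# instead of the default 'index'.
df = df.rename_axis("key").reset_index()
print(df)
```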
Answer 1
Why don’t you just use .to_frame if both have the same indexes?
>= v0.23
a.to_frame().join(b)
< v0.23
a.to_frame().join(b.to_frame())
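One thing to watch with this approach: as far as I know, join expects the Series being joined to carry a name, because that name becomes the column label. A small sketch, assuming both Series start out unnamed (the names 'a' and 'b' are chosen here only for illustration):

```python
import pandas as pd

a = pd.Series([1, 2], index=[3, 7])
b = pd.Series([5, 6], index=[3, 7])

# to_frame(name=...) labels the first column; rename() gives the second
# Series a name so that join can use it as the column label.
df = a.to_frame(name="a").join(b.rename("b"))
print(df)
```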
Answer 2
Pandas will automatically align the Series you pass in and create the joint index. The indices happen to be the same here. reset_index moves the index to a column.
In [1]: from numpy.random import randn
In [2]: from pandas import DataFrame, Series
In [3]: s1 = Series(randn(5), index=[1, 2, 4, 5, 6])
In [4]: s2 = Series(randn(5), index=[1, 2, 4, 5, 6])
In [5]: DataFrame(dict(s1=s1, s2=s2)).reset_index()
Out[5]:
   index        s1        s2
0      1 -0.176143  0.128635
1      2 -1.286470  0.908497
2      4 -0.995881  0.528050
3      5  0.402241  0.458870
4      6  0.380457  0.072251
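To see the alignment at work, here is a small sketch where the two indices only partly overlap (the values are made up). Labels missing from one Series show up as NaN in that column:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=[1, 2, 4])
s2 = pd.Series([10.0, 20.0, 30.0], index=[2, 4, 6])

# The constructor builds the union of both indices and aligns each Series to it.
df = pd.DataFrame({"s1": s1, "s2": s2}).reset_index()
print(df)
```

Label 1 exists only in s1 and label 6 only in s2, so those rows each contain one NaN.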
Answer 3
Example code:
import pandas as pd
a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9])
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])
data = pd.DataFrame({'a': a, 'b': b, 'idx_col': a.index})
Pandas allows you to create a DataFrame from a dict with Series as the values and the column names as the keys. When it finds a Series as a value, it uses the Series index as part of the DataFrame index. This data alignment is one of the main perks of Pandas. Consequently, unless you have other needs, the freshly created DataFrame contains redundant data: in the above example, data['idx_col'] has the same data as data.index.
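An alternative sketch, if you prefer not to rely on the positions of a.index: build the DataFrame from the two Series first, then copy the labels from the frame's own index. The column is then guaranteed to line up with the rows, whatever order the constructor settled on.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9])
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])

data = pd.DataFrame({"a": a, "b": b})

# Copy the row labels into an ordinary column.
data["idx_col"] = data.index
print(data)
```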
Answer 4
If I may answer this.
The key to converting Series to a DataFrame is to understand that:
1. At a conceptual level, every column in a DataFrame is a Series.
2. Every column name is a key that maps to a Series.
If you keep these two concepts in mind, you can think of many ways to convert Series to a DataFrame. One easy solution looks like this:
import pandas as pd
# Create two Series
series_1 = pd.Series(list(range(10)))
series_2 = pd.Series(list(range(20, 30)))
# Create an empty DataFrame with just the desired column names
df = pd.DataFrame(columns=['Column_name#1', 'Column_name#2'])
# Put the Series values into the DataFrame by assigning each one to its column key
df['Column_name#1'] = series_1
df['Column_name#2'] = series_2
# Check the results
df.head(5)
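One subtlety of the key-to-Series mapping is that assignment aligns on index labels rather than on position. A small sketch with made-up column names ('full' and 'partial') to illustrate:

```python
import pandas as pd

# Start from a frame that already has row labels 0, 1, 2.
df = pd.DataFrame(index=[0, 1, 2])

# This Series covers every label, so every row gets a value.
df["full"] = pd.Series([10, 20, 30], index=[0, 1, 2])

# This Series has no entry for label 0, so that row becomes NaN.
df["partial"] = pd.Series([1.5, 2.5], index=[1, 2])
print(df)
```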
Answer 5
Not sure I fully understand your question, but is this what you want to do?
pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
(index=s1.index is not even necessary here)
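Where the index argument does make a difference is when you want only some labels, or a different order: the Series are then reindexed to the labels you pass. A minimal sketch with made-up data:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
s2 = pd.Series([4, 5, 6], index=["A", "B", "C"])

# Only rows 'C' and 'A' are kept, in that order.
subset = pd.DataFrame(dict(s1=s1, s2=s2), index=["C", "A"])
print(subset)
```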
Answer 6
A simplification of the solution based on join():
df = a.to_frame().join(b)
Answer 7
I used pandas to convert my numpy array or Series to a DataFrame, then added the additional column by key as 'prediction'. If you need the DataFrame converted back to a list, use values.tolist().
output = pd.DataFrame(X_test)
output['prediction'] = y_pred
rows = output.values.tolist()  # named rows to avoid shadowing the built-in list
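For a self-contained version of the same idea, here is a sketch with stand-in data. The contents of X_test and y_pred and the column names 'f0' and 'f1' are invented for illustration and are not from the original post:

```python
import numpy as np
import pandas as pd

# Stand-in data: a 3x2 feature array and one prediction per row (made-up values).
X_test = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
y_pred = np.array([0, 1, 1])

output = pd.DataFrame(X_test, columns=["f0", "f1"])
output["prediction"] = y_pred

# Convert back to a plain list of rows.
rows = output.values.tolist()
print(rows)
```

Because the frame mixes float and integer columns, values returns a single float array, so the predictions come back as floats in the list.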