Question: What is the difference between a pandas Series and a single-column DataFrame?
Why does pandas make a distinction between a Series and a single-column DataFrame?
In other words: what is the reason for the existence of the Series class?
I'm mainly using time series with a datetime index; maybe that helps to set the context.
Answer 0
Quoting the pandas docs:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
So the Series is the data structure for a single column of a DataFrame, not only conceptually but in practice: a DataFrame behaves as a collection of Series, and selecting a single column from it returns a Series.
Analogously: we need both lists and matrices, because matrices are built from lists. Single-row matrices, while equivalent to lists in functionality, still cannot exist without the list(s) they are composed of.
They both have extremely similar APIs, but you'll find that DataFrame methods always cater to the possibility that you have more than one column. And, of course, you can always add another Series (or equivalent object) to a DataFrame as a new column, while combining one Series with another Series as columns involves creating a DataFrame.
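As a rough illustration of the relationship described above, here is a minimal sketch using a datetime index. The names temperature, humidity, and weather, and the values, are made up for this example and do not come from the original post.

```python
import pandas as pd

# Hypothetical time series sharing one datetime index (illustrative values only).
idx = pd.date_range("2024-01-01", periods=3, freq="D")
temperature = pd.Series([1.5, 2.0, 0.5], index=idx, name="temperature")
humidity = pd.Series([80, 75, 90], index=idx, name="humidity")

# Placing two Series side by side as columns yields a DataFrame.
weather = pd.concat([temperature, humidity], axis=1)

# Selecting one column by label gives back a Series (one-dimensional).
column = weather["temperature"]

# Selecting with a list of labels keeps a DataFrame (two-dimensional),
# even when the list holds a single column.
single = weather[["temperature"]]

print(type(weather).__name__, weather.shape)  # DataFrame (3, 2)
print(type(column).__name__, column.shape)    # Series (3,)
print(type(single).__name__, single.shape)    # DataFrame (3, 1)
```

The shapes make the distinction visible: the Series has one axis, while the single-column DataFrame still carries a column axis.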
Answer 1
From the pandas docs: http://pandas.pydata.org/pandas-docs/stable/dsintro.html
A Series is a one-dimensional labeled array capable of holding any data type.
To create a pandas Series:
import pandas as pd
ds = pd.Series(data, index=index)
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
import pandas as pd
df = pd.DataFrame(data, index=index)
In both cases above, index is a list of labels.
For example, I have a CSV file with the following data:
,country,population,area,capital
BR,Brazil,10210,12015,Brasilia
RU,Russia,1025,457,Moscow
IN,India,10458,457787,New Delhi
To read the above data as a Series and as a DataFrame:
import pandas as pd
file_data = pd.read_csv("file_path", index_col=0)
# Build a Series from the country column; index=['BR', 'RU', 'IN'] selects the same labels.
d = pd.Series(file_data.country, index=file_data.index)
Output:
>>> d
BR    Brazil
RU    Russia
IN     India
# Build a one-column DataFrame from the area column.
df = pd.DataFrame(file_data.area, index=file_data.index)
Output:
>>> df
      area
BR   12015
RU     457
IN  457787
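The "file_path" above is a placeholder, so the following sketch reads the same rows from an in-memory string instead. Using io.StringIO in place of a file on disk is an assumption made only to keep the example self-contained, and the spellings of the population header and the Brazilian capital are normalized here.

```python
import io

import pandas as pd

# The sample rows from the answer, held in a string instead of a file on disk.
csv_text = """,country,population,area,capital
BR,Brazil,10210,12015,Brasilia
RU,Russia,1025,457,Moscow
IN,India,10458,457787,New Delhi
"""

# index_col=0 uses the first (unnamed) column as the row labels.
file_data = pd.read_csv(io.StringIO(csv_text), index_col=0)

country = file_data["country"]    # one label -> Series
area_frame = file_data[["area"]]  # list of labels -> one-column DataFrame

print(country.ndim, area_frame.ndim)  # 1 2
print(list(file_data.index))          # ['BR', 'RU', 'IN']
```

Selecting columns directly from the parsed frame gives the same two objects as the pd.Series(...) and pd.DataFrame(...) calls in the answer, without passing the index again.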
Answer 2
A Series is a one-dimensional object that can hold any data type, such as integers, floats, and strings, e.g.
import pandas as pd
x = pd.Series(['A', 'B', 'C'])
0    A
1    B
2    C
The first column of the printed Series is the index (0, 1, 2); the second column is the actual data (A, B, C).
A DataFrame is a two-dimensional object that can be built from Series, lists, or dictionaries:
import numpy as np
# rd is not defined in the original; it is assumed here to stand for numpy.random.randn.
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
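To make the index/data split concrete, here is a small sketch that pulls the two parts of a Series apart. The custom labels in the second Series are invented for illustration.

```python
import pandas as pd

x = pd.Series(["A", "B", "C"])

# The index (row labels) and the data are separate parts of the Series.
print(list(x.index))  # [0, 1, 2]
print(x.tolist())     # ['A', 'B', 'C']

# The default 0, 1, 2 index can be replaced by custom labels (hypothetical here).
y = pd.Series(["A", "B", "C"], index=["first", "second", "third"])
print(y["second"])    # B
```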
Answer 3
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
s = pd.Series(data, index=index)
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
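It may help to look at what the dict-of-Series construction produces. In this sketch the two Series have different indexes, and the resulting DataFrame uses the union of the labels, filling the missing entry with NaN.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)

# Rows are the union of both indexes: a, b, c, d.
print(df.shape)                     # (4, 2)

# Series "one" has no label "d", so that cell is filled with NaN.
print(pd.isna(df.loc["d", "one"]))  # True

# Each column comes back out as a Series aligned to the shared index.
print(df["two"].tolist())           # [1.0, 2.0, 3.0, 4.0]
```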
Answer 4
Import the cars data:
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
Here is how the cars.csv file looks.
Print out the drives_right column as a Series:
print(cars.loc[:,"drives_right"])
US      True
AUS    False
JAP    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
The single-bracket version gives a pandas Series; the double-bracket version gives a pandas DataFrame.
Print out the drives_right column as a DataFrame:
print(cars.loc[:,["drives_right"]])
     drives_right
US           True
AUS         False
JAP         False
IN          False
RU           True
MOR          True
EG           True
Combining a Series with another Series as columns creates a DataFrame.
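Since cars.csv itself is not shown, the sketch below builds a hypothetical stand-in containing only the drives_right column from the output above. It then shows one way to convert between the two forms, using Series.to_frame and DataFrame.squeeze.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv with only the column printed above.
cars = pd.DataFrame(
    {"drives_right": [True, False, False, False, True, True, True]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

as_series = cars.loc[:, "drives_right"]   # single label -> Series
as_frame = cars.loc[:, ["drives_right"]]  # list of labels -> DataFrame

# Converting between the two forms.
frame_again = as_series.to_frame()               # Series -> one-column DataFrame
series_again = as_frame.squeeze(axis="columns")  # one-column DataFrame -> Series

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
print(frame_again.equals(as_frame))                       # True
print(series_again.equals(as_series))                     # True
```

The round trip shows that the two objects hold the same data and labels; they differ only in dimensionality and in the methods that dimensionality implies.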