Question: Convert a Python dictionary to a DataFrame
I have a Python dictionary like the following:
{u'2012-06-08': 388,
u'2012-06-09': 388,
u'2012-06-10': 388,
u'2012-06-11': 389,
u'2012-06-12': 389,
u'2012-06-13': 389,
u'2012-06-14': 389,
u'2012-06-15': 389,
u'2012-06-16': 389,
u'2012-06-17': 389,
u'2012-06-18': 390,
u'2012-06-19': 390,
u'2012-06-20': 390,
u'2012-06-21': 390,
u'2012-06-22': 390,
u'2012-06-23': 390,
u'2012-06-24': 390,
u'2012-06-25': 391,
u'2012-06-26': 391,
u'2012-06-27': 391,
u'2012-06-28': 391,
u'2012-06-29': 391,
u'2012-06-30': 391,
u'2012-07-01': 391,
u'2012-07-02': 392,
u'2012-07-03': 392,
u'2012-07-04': 392,
u'2012-07-05': 392,
u'2012-07-06': 392}
The keys are Unicode dates and the values are integers. I would like to convert this into a pandas dataframe by having the dates and their corresponding values as two separate columns. Example: col1: Dates col2: DateValue (the dates are still Unicode and datevalues are still integers)
Date DateValue
0 2012-07-01 391
1 2012-07-02 392
2 2012-07-03 392
. 2012-07-04 392
. ... ...
. ... ...
Any help in this direction would be much appreciated. I am unable to find resources on the pandas docs to help me with this.
I know one solution might be to convert each key-value pair in this dict, into a dict so the entire structure becomes a dict of dicts, and then we can add each row individually to the dataframe. But I want to know if there is an easier way and a more direct way to do this.
So far I have tried converting the dict into a series object but this doesn’t seem to maintain the relationship between the columns:
s = Series(my_dict,index=my_dict.keys())
Answer 0
The error here arises because the DataFrame constructor is being called with scalar values (it expects the values to be a list/dict/..., i.e. to have multiple columns):
pd.DataFrame(d)
ValueError: If using all scalar values, you must must pass an index
You could take the items from the dictionary (i.e. the key-value pairs):
In [11]: pd.DataFrame(d.items()) # or list(d.items()) in python 3
Out[11]:
0 1
0 2012-07-02 392
1 2012-07-06 392
2 2012-06-29 391
3 2012-06-28 391
...
In [12]: pd.DataFrame(d.items(), columns=['Date', 'DateValue'])
Out[12]:
Date DateValue
0 2012-07-02 392
1 2012-07-06 392
2 2012-06-29 391
But I think it makes more sense to use the Series constructor:
In [21]: s = pd.Series(d, name='DateValue')
Out[21]:
2012-06-08 388
2012-06-09 388
2012-06-10 388
In [22]: s.index.name = 'Date'
In [23]: s.reset_index()
Out[23]:
Date DateValue
0 2012-06-08 388
1 2012-06-09 388
2 2012-06-10 388
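On Python 3, the two approaches above can be sketched as follows. This is only a minimal sketch: the three-entry dictionary `d` is an assumed stand-in for the data in the question, and `rename_axis` is used here instead of assigning `s.index.name` directly.

```python
import pandas as pd

# Small stand-in for the dictionary in the question (assumed sample data).
d = {'2012-06-08': 388, '2012-06-09': 388, '2012-06-11': 389}

# Approach 1: build the frame from the (key, value) pairs.
# list() turns the dict view into a plain list of tuples.
df_items = pd.DataFrame(list(d.items()), columns=['Date', 'DateValue'])

# Approach 2: build a named Series, name its index,
# then move the index into an ordinary column.
s = pd.Series(d, name='DateValue')
df_series = s.rename_axis('Date').reset_index()

print(df_items)
print(df_series)
```

Both frames should end up with a `Date` column holding the keys and a `DateValue` column holding the integers, in the dictionary's insertion order.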
Answer 1
When converting a dictionary into a pandas dataframe where you want the keys to be the columns of said dataframe and the values to be the row values, you can simply put brackets around the dictionary like this:
>>> dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
>>> pd.DataFrame([dict_])
key 1 key 2 key 3
0 value 1 value 2 value 3
It’s saved me some headaches so I hope it helps someone out there!
EDIT: In the pandas docs one option for the data parameter in the DataFrame constructor is a list of dictionaries. Here we're passing a list with one dictionary in it.
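The same list-of-dictionaries form extends to several rows. A minimal sketch, with made-up keys and values chosen purely for illustration:

```python
import pandas as pd

# Each dictionary becomes one row; the keys become the column names.
# The keys and values here are hypothetical sample data.
rows = [
    {'key 1': 'a', 'key 2': 'b'},
    {'key 1': 'c', 'key 2': 'd'},
]
df = pd.DataFrame(rows)
print(df)
```

Note that this layout (keys as columns) is different from what the question asks for, where the keys should end up as the values of one column.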
Answer 2
As explained in another answer, using pandas.DataFrame() directly here will not act as you think.
What you can do is use pandas.DataFrame.from_dict with orient='index':
In[7]: pandas.DataFrame.from_dict({u'2012-06-08': 388,
u'2012-06-09': 388,
u'2012-06-10': 388,
u'2012-06-11': 389,
u'2012-06-12': 389,
.....
u'2012-07-05': 392,
u'2012-07-06': 392}, orient='index', columns=['foo'])
Out[7]:
foo
2012-06-08 388
2012-06-09 388
2012-06-10 388
2012-06-11 389
2012-06-12 389
........
2012-07-05 392
2012-07-06 392
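To get from this index-oriented frame to the two-column layout asked for in the question, the index can be named and then reset. A minimal sketch, assuming a short stand-in dictionary `d` and the column names from the question:

```python
import pandas as pd

# Short stand-in for the dictionary in the question (assumed sample data).
d = {'2012-06-08': 388, '2012-06-09': 388, '2012-06-11': 389}

# Keys become the index, values become a single named column.
by_index = pd.DataFrame.from_dict(d, orient='index', columns=['DateValue'])

# Name the index 'Date' and turn it into a regular column.
df = by_index.rename_axis('Date').reset_index()
print(df)
```

The `columns` argument together with `orient='index'` needs a reasonably recent pandas; on older versions the column can be renamed after construction instead.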
Answer 3
Pass the items of the dictionary to the DataFrame constructor, and give the column names. After that, parse the Date column to get Timestamp values.
Note the difference between Python 2.x and 3.x.
In Python 2.x:
df = pd.DataFrame(data.items(), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])
In Python 3.x (an additional list() call is required):
df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])
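One practical benefit of parsing the column is that the rows can then be ordered chronologically rather than in whatever order the dictionary yields them. A minimal sketch, assuming a small unordered stand-in dictionary `data`:

```python
import pandas as pd

# Deliberately unordered stand-in data (assumed for illustration).
data = {'2012-06-09': 388, '2012-06-08': 388, '2012-06-11': 389}

df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])

# Parse the strings into Timestamp values.
df['Date'] = pd.to_datetime(df['Date'])

# Sort by the parsed dates and renumber the rows from 0.
df = df.sort_values('Date').reset_index(drop=True)
print(df.dtypes)
print(df)
```

If the dates must stay as strings, as the question requests, skip the `pd.to_datetime` step; ISO-formatted date strings such as these also sort correctly as plain text.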
Answer 4
Answer 5
Pandas has a built-in function for converting a dict to a DataFrame:
pd.DataFrame.from_dict(dictionaryObject, orient='index')
For your data you can convert it like below:
import pandas as pd
your_dict={u'2012-06-08': 388,
u'2012-06-09': 388,
u'2012-06-10': 388,
u'2012-06-11': 389,
u'2012-06-12': 389,
u'2012-06-13': 389,
u'2012-06-14': 389,
u'2012-06-15': 389,
u'2012-06-16': 389,
u'2012-06-17': 389,
u'2012-06-18': 390,
u'2012-06-19': 390,
u'2012-06-20': 390,
u'2012-06-21': 390,
u'2012-06-22': 390,
u'2012-06-23': 390,
u'2012-06-24': 390,
u'2012-06-25': 391,
u'2012-06-26': 391,
u'2012-06-27': 391,
u'2012-06-28': 391,
u'2012-06-29': 391,
u'2012-06-30': 391,
u'2012-07-01': 391,
u'2012-07-02': 392,
u'2012-07-03': 392,
u'2012-07-04': 392,
u'2012-07-05': 392,
u'2012-07-06': 392}
your_df_from_dict=pd.DataFrame.from_dict(your_dict,orient='index')
print(your_df_from_dict)
Answer 6
pd.DataFrame({'date': list(dict_dates.keys()), 'date_value': list(dict_dates.values())})
Answer 7
You can also just pass the keys and values of the dictionary to the new dataframe, like so:
import pandas as pd
myDict = {<the_dict_from_your_example>}
df = pd.DataFrame()
# list() keeps this safe on Python 3, where keys() and values() return views
df['Date'] = list(myDict.keys())
df['DateValue'] = list(myDict.values())
Answer 8
In my case I wanted keys and values of a dict to be columns and values of DataFrame. So the only thing that worked for me was:
import numpy as np
import pandas as pd
data = {'adjust_power': 'y', 'af_policy_r_submix_prio_adjust': '[null]', 'af_rf_info': '[null]', 'bat_ac': '3500', 'bat_capacity': '75'}
columns = list(data.keys())
values = list(data.values())
arr_len = len(values)
# reshape the values into a single row with one column per key
pd.DataFrame(np.array(values, dtype=object).reshape(1, arr_len), columns=columns)
Answer 9
This is what worked for me, since I wanted to have a separate index column:
df = pd.DataFrame.from_dict(some_dict, orient="index").reset_index()
df.columns = ['A', 'B']
Answer 10
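This function accepts a dict as its argument and returns a dataframe with the keys of the dict as the index and the values as a column.
def dict_to_df(d):
    # list() keeps this working on Python 3, where items() returns a view
    df = pd.DataFrame(list(d.items()))
    df.set_index(0, inplace=True)
    return df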
Answer 11
This is how it worked for me:
df = pd.DataFrame([list(d.keys()), list(d.values())]).T
df.columns = ['keys', 'values']  # call them whatever you like
I hope this helps
Answer 12
d = {'Date': list(yourDict.keys()),'Date_Values': list(yourDict.values())}
df = pandas.DataFrame(data=d)
If you don't wrap yourDict.keys() in list(), you will end up with all of your keys and values being placed in every row of every column, like this:
Date \
0 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
1 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
2 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
3 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
4 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
But by adding list(), the result looks like this:
Date Date_Values
0 2012-06-08 388
1 2012-06-09 388
2 2012-06-10 388
3 2012-06-11 389
4 2012-06-12 389
...
Answer 13
I have run into this several times. As an example, I have a function get_max_path() that returns the following sample dictionary:
{2: 0.3097502930247044,
3: 0.4413177909384636,
4: 0.5197224051562838,
5: 0.5717654946470984,
6: 0.6063959031223476,
7: 0.6365209824708223,
8: 0.655918861281035,
9: 0.680844386645206}
To convert this to a dataframe, I ran the following:
df = pd.DataFrame.from_dict(get_max_path(2), orient = 'index').reset_index()
This returns a simple two-column dataframe with a separate index:
index 0
0 2 0.309750
1 3 0.441318
Just rename the columns using df.rename(columns={'index': 'Column1', 0: 'Column2'}, inplace=True)
Answer 14
I think that you can make some changes to your data format when you create the dictionary, so that you can easily convert it to a DataFrame:
input:
a={'Dates':['2012-06-08','2012-06-10'],'Date_value':[388,389]}
output:
{'Date_value': [388, 389], 'Dates': ['2012-06-08', '2012-06-10']}
input:
aframe=DataFrame(a)
output: will be your DataFrame
You just need to do some text editing in something like Sublime or maybe Excel.