python pandas dataframe to dictionary

Question: python pandas dataframe to dictionary

I have a two-column dataframe and intend to convert it to a Python dictionary: the first column will be the key and the second will be the value. Thank you in advance.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

Answer 0

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

If you have only one column, select it first so that the column name does not become an extra level in the dict (in this case you are actually calling Series.to_dict()):

df.set_index('id')['value'].to_dict()
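A minimal sketch that rebuilds the question's dataframe (column names id and value taken from the question) and compares the two to_dict() calls above:

```python
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict(): the column name becomes the outer key
nested = df.set_index("id").to_dict()
print(nested)  # {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# Series.to_dict(): selecting the single column first gives a flat dict
flat = df.set_index("id")["value"].to_dict()
print(flat)  # {0: 10.2, 1: 5.7, 2: 7.4}
```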

Answer 1

mydict = dict(zip(df.id, df.value))

Answer 2

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
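The same grouping can also be written without an explicit comprehension by collecting each group into a list with groupby(...).apply(list). This is a sketch using the ptest frame from the example above:

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Collect the values of each id group into a list, then turn the resulting
# Series (index = id, values = lists) into a plain dict
grouped = ptest.groupby("id")["value"].apply(list).to_dict()
print(grouped)  # {'a': [1, 2], 'b': [3]}
```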

Answer 3

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.

For example:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}
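If duplicated keys should be treated as an error rather than silently overwritten or collected, one option is to check the key column with Series.duplicated() before converting. This is only a sketch; the helper name to_dict_strict is hypothetical:

```python
import pandas as pd


def to_dict_strict(df, key_col, value_col):
    """Hypothetical helper: map key_col -> value_col, refusing duplicated keys."""
    # keep=False marks every row whose key occurs more than once
    dupes = df.loc[df[key_col].duplicated(keep=False), key_col]
    if not dupes.empty:
        raise ValueError(f"duplicated keys: {sorted(set(dupes))}")
    return df.set_index(key_col)[value_col].to_dict()


ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

try:
    to_dict_strict(ptest, "id", "value")
except ValueError as err:
    print(err)  # duplicated keys: ['a']

# After explicitly choosing which row to keep, the conversion succeeds
print(to_dict_strict(ptest.drop_duplicates("id"), "id", "value"))  # {'a': 1, 'b': 3}
```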

Answer 4

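Simplest solution (note that orient='records' returns a list of dicts rather than a single dict, and that with duplicated ids pandas warns that the columns are not unique and keeps only one value per id):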

df.set_index('id').T.to_dict('records')

Example:

df = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple value columns, such as val1, val2, val3, etc., and you want them as lists, use the code below:

df.set_index('id').T.to_dict('list')
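A sketch of what these calls return, using a frame with unique ids and two hypothetical value columns val1 and val2. orient='records' yields one dict per row of the transposed frame (that is, per value column), while orient='list' yields one list per id:

```python
import pandas as pd

# Hypothetical data: unique ids, two value columns
df = pd.DataFrame({"id": ["a", "b"], "val1": [1, 2], "val2": [10, 20]})
t = df.set_index("id").T  # rows: val1, val2; columns: a, b

records = t.to_dict("records")
print(records)  # [{'a': 1, 'b': 2}, {'a': 10, 'b': 20}]

lists = t.to_dict("list")
print(lists)  # {'a': [1, 10], 'b': [2, 20]}
```

With a single value column, the id-to-value mapping is therefore records[0].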

Answer 5

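In some versions, the code below might not work:

mydict = dict(zip(df.id, df.value))

so make it explicit:

id_ = df.id.values
value = df.value.values
mydict = dict(zip(id_, value))

Note that I used id_ because id is the name of a Python built-in function; it is not a reserved word, but using it as a variable name would shadow the built-in.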
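Another explicit variant, offered as a sketch: Series.tolist() converts each column to a plain Python list first, so the keys and values are native int/float objects rather than NumPy scalars (the column names follow the question):

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# tolist() returns native Python scalars, unlike .values / .to_numpy()
ids = df["id"].tolist()
values = df["value"].tolist()
mydict = dict(zip(ids, values))
print(mydict)  # {0: 10.2, 1: 5.7, 2: 7.4}
print(type(next(iter(mydict))))  # <class 'int'>
```

Using df["id"] instead of df.id also avoids attribute access, which only works when the column name is a valid identifier that does not clash with an existing DataFrame attribute.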


Answer 6

You can use a dict comprehension:

my_dict = {row[0]: row[1] for row in df.values}
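One caveat, shown as a sketch with the question's data: df.values first converts the whole frame to a single common dtype, so with an integer id column and a float value column the keys come back as floats (0.0, 1.0, ...). Iterating with itertuples(index=False) builds each row column by column, so each column keeps its own type:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# df.values upcasts to a common dtype (float64 here), so the ids become floats
print(df.values.dtype)  # float64
via_values = {row[0]: row[1] for row in df.values}

# itertuples(index=False) yields one tuple per row without the index,
# taking each field from its own column, so the ids stay integers
via_tuples = {row[0]: row[1] for row in df.itertuples(index=False)}
print(via_tuples)  # {0: 10.2, 1: 5.7, 2: 7.4}
```

Lookups with integer keys still work on via_values because 0.0 == 0 in Python, but the keys print as floats.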

Answer 7

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'a': [1, 2], 'b': [3]}

Answer 8

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values,df.value.values):
    mydict[k].append(v)
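A self-contained version of the snippet above, using the ptest data from the earlier answers. The final dict() call is optional and only turns the defaultdict back into a plain dict, so that looking up a missing key raises KeyError again instead of silently creating an empty list:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

mydict = defaultdict(list)
for k, v in zip(df["id"], df["value"]):
    mydict[k].append(v)  # duplicated ids accumulate in one list

result = dict(mydict)
print(result)  # {'a': [1, 2], 'b': [3]}
```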

Answer 9

I found this question while trying to make a dictionary out of three columns of a pandas dataframe. In my case the dataframe has columns A, B and C (let’s say A and B are the geographical coordinates of longitude and latitude and C the country region/state/etc, which is more or less the case).

I wanted a dictionary mapping each (A, B) pair (the dictionary key) to the value of C in the same row (the dictionary value). Each (A, B) pair is guaranteed to be unique because of earlier filtering, although different pairs can share the same value of C in this context. So I did:

mydict = dict(zip(zip(df['A'],df['B']), df['C']))

Using pandas to_dict() also works:

mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']

(Neither A nor B was used as the index before running the line that creates the dictionary.)

Both approaches are fast (less than one second on a dataframe with 85k rows, 5-year-old fast dual-core laptop).

The reasons I’m posting this:

  1. for those who need this kind of solution
  2. if someone knows a faster executing solution (e.g., for millions of rows), I’d appreciate a reply.
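A small sketch of both approaches on made-up coordinates (the numbers and region names below are invented for illustration). Both produce a dict keyed by (A, B) tuples; selecting column C before calling to_dict() gives the same result as taking ['C'] afterwards:

```python
import pandas as pd

# Invented sample data: A/B stand in for longitude/latitude, C for a region
df = pd.DataFrame({
    "A": [10.0, 10.0, 11.5],
    "B": [45.0, 46.0, 45.0],
    "C": ["north", "north", "south"],
})

# Zip the two key columns into tuples, then pair them with C
via_zip = dict(zip(zip(df["A"], df["B"]), df["C"]))

# A two-level index makes Series.to_dict() produce tuple keys
via_to_dict = df.set_index(["A", "B"])["C"].to_dict()

print(via_zip)  # {(10.0, 45.0): 'north', (10.0, 46.0): 'north', (11.5, 45.0): 'south'}
```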

Answer 10

def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution, a basic loop.


Answer 11

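This is my solution (note that to_dict('records') on the transposed frame returns a list of dicts, one per value column, so with a single value column the mapping is element [0]):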

import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)