Question: Pandas DataFrame to list of dictionaries
I have the following DataFrame:
customer item1 item2 item3
1 apple milk tomato
2 water orange potato
3 juice mango chips
which I want to convert to a list of dictionaries, one per row:
rows = [{'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
{'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
{'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
Answer 0
Edit
As John Galt mentions in his answer, you should probably use df.to_dict('records') instead. It's faster than transposing manually.
In [20]: timeit df.T.to_dict().values()
1000 loops, best of 3: 395 µs per loop
In [21]: timeit df.to_dict('records')
10000 loops, best of 3: 53 µs per loop
Original answer
Use df.T.to_dict().values(), like below:
In [1]: df
Out[1]:
customer item1 item2 item3
0 1 apple milk tomato
1 2 water orange potato
2 3 juice mango chips
In [2]: df.T.to_dict().values()
Out[2]:
[{'customer': 1.0, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
{'customer': 2.0, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
{'customer': 3.0, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
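For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the DataFrame from the question is built by hand (the constructor call is my own, not from the original post) and a Python 3 environment, where `.values()` returns a view that needs to be wrapped in `list()`. The timings are only printed, since the numbers depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Recommended approach: one dict per row, keyed by column name.
records = df.to_dict("records")

# Original approach: transpose, convert to a dict keyed by the index,
# then keep only the inner dicts. list() is needed on Python 3 because
# dict.values() returns a view rather than a list.
transposed = list(df.T.to_dict().values())

# Rough timing of both approaches; results vary by machine and version.
t_records = timeit.timeit(lambda: df.to_dict("records"), number=100)
t_transposed = timeit.timeit(lambda: list(df.T.to_dict().values()), number=100)
print(f"to_dict('records'): {t_records:.4f}s, transpose: {t_transposed:.4f}s")
```

Both variables hold the same three row dictionaries, so the choice between them comes down to speed and readability.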
Answer 1
Use df.to_dict('records'), which gives the output without having to transpose externally.
In [2]: df.to_dict('records')
Out[2]:
[{'customer': 1L, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
{'customer': 2L, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
{'customer': 3L, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
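As a small illustration of what the result looks like in practice, the sketch below passes the orientation by keyword and then feeds the list of dictionaries back into `pd.DataFrame`. The DataFrame construction and the variable names `rows` and `rebuilt` are assumptions for the example, not part of the answer above.

```python
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# orient="records" yields one dict per row; the index is not included.
rows = df.to_dict(orient="records")

# A list of dicts can be passed straight back to the DataFrame constructor.
rebuilt = pd.DataFrame(rows)

for row in rows:
    print(row)
```

Since the index is dropped, the rebuilt frame gets a fresh default index, which happens to match here because the original index was already 0, 1, 2.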
Answer 2
As an extension to John Galt's answer:
For the following DataFrame,
customer item1 item2 item3
0 1 apple milk tomato
1 2 water orange potato
2 3 juice mango chips
If you want the output to include the index values, you can do something like:
df.to_dict('index')
This outputs a dictionary of dictionaries where the keys of the parent dictionary are the index values. In this particular case:
{0: {'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
1: {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
2: {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}}
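If a list is still wanted rather than a dictionary keyed by index, one option (my own suggestion, not from the answer above) is to move the index into a regular column with `reset_index()` first and then use the 'records' orientation. A rough sketch, assuming the same hand-built DataFrame:

```python
import pandas as pd

# Assumed reconstruction of the DataFrame shown above.
df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Dict of dicts: outer keys are the index values, inner keys are columns.
by_index = df.to_dict("index")

# List of dicts that carries the index along: reset_index() turns an
# unnamed index into a column called "index" before the conversion.
with_index = df.reset_index().to_dict("records")

print(by_index[0])
print(with_index[0])
```

With a named index, the new column takes that name instead of "index".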
Answer 3
If you are only interested in selecting one column, this will work:
df[["item1"]].to_dict("records")
The code below will NOT work and raises a "TypeError: unsupported type" error. I believe this is because df["item1"] is a Series rather than a DataFrame, and Series.to_dict does not take an orient argument the way DataFrame.to_dict does.
df["item1"].to_dict("records")
I needed to select only one column and convert it to a list of dicts with the column name as the key. I was stuck on this for a bit, so I figured I'd share.
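Here is a short sketch of the difference, assuming the DataFrame from the question is built by hand. The `to_frame()` variant and the list comprehension are alternatives I am adding for anyone who already has a Series; they are not from the answer above. The exact error message may differ between pandas versions, so the example only checks that a TypeError is raised.

```python
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Double brackets keep a one-column DataFrame, so orient="records" works.
from_frame = df[["item1"]].to_dict("records")

# Single brackets give a Series; Series.to_dict has no orient parameter,
# so passing "records" is rejected with a TypeError.
error = None
try:
    df["item1"].to_dict("records")
except TypeError as exc:
    error = exc

# Two ways to get the same list when starting from a Series.
from_series = df["item1"].to_frame().to_dict("records")
by_hand = [{"item1": value} for value in df["item1"]]

print(from_frame)
print(type(error).__name__)
```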