Question: Pretty-printing a pandas DataFrame
How can I print a pandas dataframe as a nice text-based table, like the following?
+------------+---------+-------------+
| column_one | col_two |   column_3  |
+------------+---------+-------------+
|          0 |  0.0001 | ABCD        |
|          1 |  1e-005 | ABCD        |
|          2 |  1e-006 | long string |
|          3 |  1e-007 | ABCD        |
+------------+---------+-------------+
Answer 0
I've just found a great tool for that need: it is called tabulate.
It prints tabular data and works with DataFrame.
from tabulate import tabulate
import pandas as pd
df = pd.DataFrame({'col_two': [0.0001, 1e-005, 1e-006, 1e-007],
                   'column_3': ['ABCD', 'ABCD', 'long string', 'ABCD']})
print(tabulate(df, headers='keys', tablefmt='psql'))
+----+-----------+-------------+
|    |   col_two | column_3    |
|----+-----------+-------------|
|  0 |    0.0001 | ABCD        |
|  1 |    1e-05  | ABCD        |
|  2 |    1e-06  | long string |
|  3 |    1e-07  | ABCD        |
+----+-----------+-------------+
Note:
To suppress row indices for all types of data, pass showindex="never" or showindex=False.
Answer 1
Answer 2
pandas >= 1.0
If you want a built-in function to dump your data into GitHub-style markdown, you now have one. Take a look at to_markdown:
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |
GitHub renders this output as a formatted table.
Note that you will still need to have the tabulate package installed.
Answer 3
If you are in a Jupyter notebook, you can run the following code to display the dataframe interactively as a well-formatted table.
This answer builds on the to_html('temp.html') answer above, but instead of creating a file it displays the well-formatted table directly in the notebook:
from IPython.display import display, HTML
display(HTML(df.to_html()))
Credit for this code goes to the example at: Show DataFrame as table in iPython Notebook
Answer 4
You can use prettytable to render the table as text. The trick is to convert the data_frame to an in-memory CSV file and have prettytable read it. Here's the code:
from io import StringIO  # Python 3; in Python 2 this was `from StringIO import StringIO`
import prettytable
output = StringIO()
data_frame.to_csv(output)
output.seek(0)  # rewind so prettytable reads from the start
pt = prettytable.from_csv(output)
print(pt)
Answer 5
I used Ofer's answer for a while and found it great in most cases. Unfortunately, due to inconsistencies between pandas's to_csv and prettytable's from_csv, I had to use prettytable in a different way.
One failure case is a dataframe containing commas:
pd.DataFrame({'A': [1, 2], 'B': ['a,', 'b']})
Prettytable raises an error of the form:
Error: Could not determine delimiter
The following function handles this case:
from prettytable import PrettyTable
def format_for_print(df):
    # The empty first header labels the index column.
    table = PrettyTable([''] + list(df.columns))
    for row in df.itertuples():
        table.add_row(list(row))
    return str(table)
If you don't care about the index, use:
def format_for_print2(df):
    table = PrettyTable(list(df.columns))
    for row in df.itertuples():
        table.add_row(list(row[1:]))  # drop the leading index value
    return str(table)
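The error text is the same message Python's csv.Sniffer raises when it cannot guess a dialect, so the failure most likely comes from delimiter sniffing rather than from the CSV that pandas writes (this is an assumption about prettytable's internals, not something verified here). As a small sketch using only pandas and the standard library, the following shows that pandas quotes the field containing the comma, and that the csv module reads it back correctly once the delimiter is given explicitly:

```python
import csv
import io

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['a,', 'b']})

# pandas quotes any field that contains the delimiter.
csv_text = df.to_csv()
print(csv_text)

# With the delimiter stated explicitly, no sniffing is needed and the
# quoted field survives the round trip.
rows = list(csv.reader(io.StringIO(csv_text), delimiter=','))
print(rows)
```

Building the table row by row, as format_for_print does, sidesteps the CSV round trip entirely.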
Answer 6
Following up on Mark's answer: if you're not using Jupyter for some reason (e.g. you want to do some quick testing on the console), you can use the DataFrame.to_string method, which has been available since at least pandas 0.12 (2013).
import pandas as pd
matrix = [(1, 23, 45), (789, 1, 23), (45, 678, 90)]
df = pd.DataFrame(matrix, columns=list('abc'))
print(df.to_string())
# outputs:
#      a    b   c
# 0    1   23  45
# 1  789    1  23
# 2   45  678  90
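One reason to reach for to_string rather than a plain print(df) is that, with default arguments, it renders every row instead of the abbreviated view the default display options produce for long frames. A short sketch, where the frame `big` and its contents are made up for illustration; index=False is an existing to_string parameter that drops the row labels:

```python
import pandas as pd

# Hypothetical frame that is longer than the default display limit.
big = pd.DataFrame({"n": range(100), "square": [i * i for i in range(100)]})

# All 100 rows plus the header line are rendered.
full = big.to_string()
print(len(full.splitlines()))

# The same method can omit the index column.
no_index = big.head(3).to_string(index=False)
print(no_index)
```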
Answer 7
Maybe you're looking for something like this:
import pandas as pd
def tableize(df):
    if not isinstance(df, pd.DataFrame):
        return
    df_columns = df.columns.tolist()
    # Length of the longest string in a list.
    max_len_in_lst = lambda lst: len(sorted(lst, reverse=True, key=len)[0])
    # Pad a string to width sz, centered (headers) or right-aligned (data).
    align_center = lambda st, sz: "{0}{1}{0}".format(" "*(1+(sz-len(st))//2), st)[:sz] if len(st) < sz else st
    align_right = lambda st, sz: "{0}{1} ".format(" "*(sz-len(st)-1), st) if len(st) < sz else st
    max_col_len = max_len_in_lst(df_columns)
    max_val_len_for_col = dict([(col, max_len_in_lst(df.iloc[:, idx].astype('str'))) for idx, col in enumerate(df_columns)])
    # Column width: widest value or header, plus one space of padding on each side.
    col_sizes = dict([(col, 2 + max(max_val_len_for_col.get(col, 0), max_col_len)) for col in df_columns])
    build_hline = lambda row: '+'.join(['-' * col_sizes[col] for col in row]).join(['+', '+'])
    build_data = lambda row, align: "|".join([align(str(val), col_sizes[df_columns[idx]]) for idx, val in enumerate(row)]).join(['|', '|'])
    hline = build_hline(df_columns)
    out = [hline, build_data(df_columns, align_center), hline]
    for _, row in df.iterrows():
        out.append(build_data(row.tolist(), align_right))
    out.append(hline)
    return "\n".join(out)
df = pd.DataFrame([[1, 2, 3], [11111, 22, 333]], columns=['a', 'b', 'c'])
print(tableize(df))
Output:
+-------+----+-----+
|    a  |  b |   c |
+-------+----+-----+
|     1 |  2 |   3 |
| 11111 | 22 | 333 |
+-------+----+-----+
Answer 8
I wanted a paper printout of a dataframe, but I also wanted to add some results and comments on the same page.
I worked through the answers above and could not get what I wanted. I ended up using file.write(df1.to_csv()) and file.write(",,,blah,,,,,,blah") statements to get my extras on the page.
When I opened the CSV file, it went straight to a spreadsheet, which printed everything in the right place and format.
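A rough sketch of that approach follows. The helper name write_report, the sample data, and the note text are all hypothetical; an in-memory buffer stands in for the file so the result can be inspected, and a real file opened with open(path, "w") would work the same way.

```python
import io

import pandas as pd

# Stand-in data; the original answer's df1 is not shown.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

def write_report(handle, frame, notes):
    """Write the frame as CSV, then append extra comma-separated note rows."""
    handle.write(frame.to_csv())
    for note in notes:
        handle.write(note + "\n")

buffer = io.StringIO()
# Leading commas push the notes into later spreadsheet columns.
write_report(buffer, df1, [",,,result,,,comment"])
report = buffer.getvalue()
print(report)
```

Because the notes are written as raw text, any note that itself contains a comma will spill into the next cell, which is the effect the answer relies on to position its extras.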