Print very long string completely in pandas dataframe

Question: Print very long string completely in pandas dataframe

I am struggling with something that seems very simple. I have a pandas DataFrame containing a very long string.

import pandas as pd

df = pd.DataFrame({'one' : ['one', 'two', 
      'This is very long string very long string very long string veryvery long string']})

When I try to print it, I do not see the full string, only part of it.

I tried the following options:

  • using print(df.iloc[2])
  • using to_html
  • using to_string
  • One of the Stack Overflow answers suggested increasing the column width with the pandas display option, but that did not work either.
  • I also did not understand how set_printoptions would help me.

Any ideas appreciated. It looks very simple, but I am not able to get it!


Answer 0

You can use options.display.max_colwidth to specify you want to see more in the default representation:

In [2]: df
Out[2]:
                                                 one
0                                                one
1                                                two
2  This is very long string very long string very...

In [3]: pd.options.display.max_colwidth
Out[3]: 50

In [4]: pd.options.display.max_colwidth = 100

In [5]: df
Out[5]:
                                                                               one
0                                                                              one
1                                                                              two
2  This is very long string very long string very long string veryvery long string

And indeed, if you just want to inspect a single value, accessing it as a scalar (rather than as a row, as df.iloc[2] does) also shows the full string:

In [7]: df.iloc[2,0]    # or df.loc[2,'one']
Out[7]: 'This is very long string very long string very long string veryvery long string'

Answer 1

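Use pd.set_option('display.max_colwidth', -1) for automatic linebreaks and multi-line cells. Note that the -1 value is deprecated since pandas 1.0; on current versions pass None instead to mean "no limit".

This is a great resource on how to use Jupyter's display with pandas to the fullest.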
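
As a minimal sketch of the no-limit setting, assuming pandas 1.0 or later (where None means unlimited width). The frame is the one from the question, and the variable names are only for illustration:

```python
import pandas as pd

LONG = 'This is very long string very long string very long string veryvery long string'
df = pd.DataFrame({'one': ['one', 'two', LONG]})

# With a 50-character limit the long cell is cut off with "...".
with pd.option_context('display.max_colwidth', 50):
    truncated = repr(df)

# None lifts the limit, but only inside the with-block.
with pd.option_context('display.max_colwidth', None):
    full = repr(df)

print(full)
```

Using option_context here rather than set_option keeps the change local, so the rest of the session still prints with the usual width.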


Answer 2

Another pretty simple approach is to call the list function on a slice of the column:

list(df['one'][2:])
# output:
['This is very long string very long string very long string veryvery long string']

Needless to say, converting a whole column to a list is not a good idea, but for a single row, why not?


Answer 3

Another easier way to print the whole string is to call values on the dataframe.

df = pd.DataFrame({'one' : ['one', 'two', 
      'This is very long string very long string very long string veryvery long string']})

print(df.values)

The Output will be

[['one']
 ['two']
 ['This is very long string very long string very long string veryvery long string']]

Answer 4

Is this what you meant to do?

In [7]: x =  pd.DataFrame({'one' : ['one', 'two', 'This is very long string very long string very long string veryvery long string']})

In [8]: x
Out[8]: 
                                                 one
0                                                one
1                                                two
2  This is very long string very long string very...

In [9]: x['one'][2]
Out[9]: 'This is very long string very long string very long string veryvery long string'

Answer 5

The way I often deal with the situation you describe is to use the .to_csv() method and write to stdout:

import sys

df.to_csv(sys.stdout)

Update: it should now be possible to pass None instead of sys.stdout, in which case to_csv returns the CSV text as a string that you can print yourself.

This should dump the whole dataframe, including the entirety of any strings. You can use the to_csv parameters to configure column separators, whether the index is printed, etc. It will be less pretty than rendering it properly though.

I posted this originally in answer to the somewhat-related question at Output data from all columns in a dataframe in pandas
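
A small sketch of the to_csv route, reusing the frame from the question. The tab separator and index=False are just illustrative choices; calling to_csv without a path returns the CSV text as a string:

```python
import pandas as pd

LONG = 'This is very long string very long string very long string veryvery long string'
df = pd.DataFrame({'one': ['one', 'two', LONG]})

# No path or buffer is given, so to_csv returns the text instead of writing a file.
csv_text = df.to_csv(sep='\t', index=False)
print(csv_text)
```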


Answer 6

Just add the following line to your code before printing:

 pd.options.display.max_colwidth = 90  # set a value that suits your needs

You can set other display options in the same way:

  • Change max_columns to display more columns:

    import pandas as pd
    pd.options.display.max_columns = 10

    (this allows 10 columns to be displayed; change it as you need)

  • Similarly, change max_rows to display more rows:

    pd.options.display.max_rows = 999

    (this allows up to 999 rows to be printed at a time)

This should work fine.

Please refer to the documentation for more pandas options and settings.
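
If you would rather not change these settings globally, the same options can be set temporarily. A minimal sketch using pd.option_context and pd.get_option; the values are arbitrary:

```python
import pandas as pd

before = pd.get_option('display.max_rows')

# Several options can be passed as alternating name/value pairs.
with pd.option_context('display.max_colwidth', 90,
                       'display.max_columns', 10,
                       'display.max_rows', 999):
    inside = pd.get_option('display.max_rows')

# Leaving the block restores whatever was set before.
after = pd.get_option('display.max_rows')
print(before, inside, after)
```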


Answer 7

I have created a small utility function that works well for me:

def display_text_max_col_width(df, width):
    # Apply the wider column limit only inside this block.
    with pd.option_context('display.max_colwidth', width):
        print(df)

display_text_max_col_width(train_df["Description"], 800)

I can change the width as required without setting any option permanently.


Answer 8

If you're using a Jupyter notebook, you can also display the pandas DataFrame as an HTML table, which shows the full strings.

from IPython.display import display, HTML
display(HTML(df.to_html()))

Output

    one
0   one
1   two
2   This is very long string very long string very long string veryvery long string

How to print the full NumPy array, without truncation?

Question: How to print the full NumPy array, without truncation?

When I print a numpy array, I get a truncated representation, but I want the full array.

Is there any way to do this?

Examples:

>>> numpy.arange(10000)
array([   0,    1,    2, ..., 9997, 9998, 9999])

>>> numpy.arange(10000).reshape(250,40)
array([[   0,    1,    2, ...,   37,   38,   39],
       [  40,   41,   42, ...,   77,   78,   79],
       [  80,   81,   82, ...,  117,  118,  119],
       ..., 
       [9880, 9881, 9882, ..., 9917, 9918, 9919],
       [9920, 9921, 9922, ..., 9957, 9958, 9959],
       [9960, 9961, 9962, ..., 9997, 9998, 9999]])

Answer 0

Use numpy.set_printoptions:

import sys
import numpy
numpy.set_printoptions(threshold=sys.maxsize)

Answer 1

import numpy as np
np.set_printoptions(threshold=np.inf)

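I suggest using np.inf instead of np.nan, which others have suggested. Both worked at the time of writing, but setting the threshold to “infinity” makes it obvious to everybody reading your code what you mean, whereas a threshold of “not a number” seems a little vague to me. Newer NumPy releases (1.16 and later, as far as I know) reject np.nan here with a ValueError, which is one more reason to prefer np.inf.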


Answer 2

The previous answers are the correct ones, but as a weaker alternative you can transform into a list:

>>> numpy.arange(100).reshape(25,4).tolist()

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21,
22, 23], [24, 25, 26, 27], [28, 29, 30, 31], [32, 33, 34, 35], [36, 37, 38, 39], [40, 41,
42, 43], [44, 45, 46, 47], [48, 49, 50, 51], [52, 53, 54, 55], [56, 57, 58, 59], [60, 61,
62, 63], [64, 65, 66, 67], [68, 69, 70, 71], [72, 73, 74, 75], [76, 77, 78, 79], [80, 81,
82, 83], [84, 85, 86, 87], [88, 89, 90, 91], [92, 93, 94, 95], [96, 97, 98, 99]]

Answer 3

NumPy 1.15 or newer

If you use NumPy 1.15 (released 2018-07-23) or newer, you can use the printoptions context manager:

with numpy.printoptions(threshold=numpy.inf):
    print(arr)

(of course, replace numpy by np if that’s how you imported numpy)

The use of a context manager (the with-block) ensures that after the context manager is finished, the print options will revert to whatever they were before the block started. It ensures the setting is temporary, and only applied to code within the block.

See numpy.printoptions documentation for details on the context manager and what other arguments it supports.
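
To see the temporary effect concretely, here is a short sketch. The array size of 2000 is arbitrary, chosen to exceed the default threshold of 1000, and sys.maxsize is used in place of numpy.inf (either should work):

```python
import sys
import numpy as np

arr = np.arange(2000)

with np.printoptions(threshold=sys.maxsize):
    full = str(arr)      # every element is rendered

summary = str(arr)       # outside the block: back to the summarized form

print(len(full), len(summary))
```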


Answer 4

This sounds like you’re using numpy.

If that’s the case, you can add:

import numpy as np
np.set_printoptions(threshold=np.nan)  # NumPy < 1.16 only; later versions reject NaN, so use sys.maxsize or np.inf there

That will disable the corner printing. For more information, see this NumPy Tutorial.


Answer 5

Here is a one-off way to do this, which is useful if you don’t want to change your default settings:

from pprint import pprint
import numpy

def fullprint(*args, **kwargs):
    # Remember the current options, print without truncation, then restore them.
    opt = numpy.get_printoptions()
    numpy.set_printoptions(threshold=numpy.inf)
    try:
        pprint(*args, **kwargs)
    finally:
        numpy.set_printoptions(**opt)

Answer 6

Using a context manager, as Paul Price suggested:

import numpy as np


class fullprint:
    'context manager for printing full numpy arrays'

    def __init__(self, **kwargs):
        kwargs.setdefault('threshold', np.inf)
        self.opt = kwargs

    def __enter__(self):
        self._opt = np.get_printoptions()
        np.set_printoptions(**self.opt)

    def __exit__(self, type, value, traceback):
        np.set_printoptions(**self._opt)


if __name__ == '__main__': 
    a = np.arange(1001)

    with fullprint():
        print(a)

    print(a)

    with fullprint(threshold=None, edgeitems=10):
        print(a)

Answer 7

numpy.savetxt

import sys
import numpy

numpy.savetxt(sys.stdout, numpy.arange(10000))

or if you need a string:

import io

# Python 3 form; on Python 2 this was StringIO.StringIO() and a print statement.
sio = io.StringIO()
numpy.savetxt(sio, numpy.arange(10000))
s = sio.getvalue()
print(s)

The default output format is:

0.000000000000000000e+00
1.000000000000000000e+00
2.000000000000000000e+00
3.000000000000000000e+00
...

and it can be configured with further arguments.

Note in particular that this also omits the square brackets and allows for a lot of customization, as mentioned at: How to print a Numpy array without brackets?

Originally tested on Python 2.7.12, numpy 1.11.1.
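
As one example of those further arguments, here is a sketch assuming Python 3 and a NumPy version recent enough (1.14 or later, to my knowledge) for savetxt to write into a text buffer. Passing fmt='%d' swaps the exponential notation for plain integers:

```python
import io
import numpy as np

buf = io.StringIO()
np.savetxt(buf, np.arange(5), fmt='%d')   # one value per line, no brackets
text = buf.getvalue()
print(text)
```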


Answer 8

This is a slight modification of neok's answer (it removes the option to pass additional arguments to set_printoptions).

It shows how you can use contextlib.contextmanager to easily create such a contextmanager with fewer lines of code:

import numpy as np
from contextlib import contextmanager

@contextmanager
def show_complete_array():
    oldoptions = np.get_printoptions()
    np.set_printoptions(threshold=np.inf)
    try:
        yield
    finally:
        np.set_printoptions(**oldoptions)

In your code it can be used like this:

a = np.arange(1001)

print(a)      # shows the truncated array

with show_complete_array():
    print(a)  # shows the complete array

print(a)      # shows the truncated array (again)

Answer 9

Complementary to this answer on the maximum number of columns (fixed with numpy.set_printoptions(threshold=numpy.nan)), there is also a limit on the number of characters displayed per line. In some environments, such as when calling python from bash (rather than in an interactive session), this can be fixed by setting the linewidth parameter as follows:

import numpy as np
np.set_printoptions(linewidth=2000)    # default = 75
Mat = np.arange(20000,20150).reshape(2,75)    # 150 elements (75 columns)
print(Mat)

In this case, it is your window that limits the number of characters before the line wraps.

For those using Sublime Text and wanting to see results within the output window, add the build option "word_wrap": false to the sublime-build file [source].
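
The effect of linewidth can be checked without a terminal by counting the lines NumPy produces. A minimal sketch, assuming NumPy 1.15 or later for the printoptions context manager; the array and the widths are arbitrary:

```python
import numpy as np

arr = np.arange(100)

with np.printoptions(linewidth=75):
    narrow = str(arr)    # wrapped across several lines

with np.printoptions(linewidth=1000):
    wide = str(arr)      # fits on a single line

print(len(narrow.splitlines()), len(wide.splitlines()))
```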


Answer 10

Since NumPy version 1.16, use sys.maxsize as the threshold; for more details see GitHub ticket 12251.

from sys import maxsize
from numpy import set_printoptions

set_printoptions(threshold=maxsize)

Answer 11

To turn it off and return to the normal mode, restore the default threshold:

np.set_printoptions(threshold=1000)  # 1000 is NumPy's default threshold

Answer 12

Suppose you have a numpy array

arr = numpy.arange(10000).reshape(250,40)

If you want to print the full array in a one-off way (without toggling np.set_printoptions), but want something simpler (less code) than the context manager, just do

for row in arr:
    print(row)

Answer 13

A slight modification: (since you are going to print a huge list)

import numpy as np
np.set_printoptions(threshold=np.inf, linewidth=200)

x = np.arange(1000)
print(x)

This will increase the number of characters per line (the default linewidth is 75). Use whatever linewidth suits your coding environment. Putting more characters on each line saves you from having to scroll through a huge number of output lines.


Answer 14

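You can use the array2string function (see the docs).

import sys
import numpy

a = numpy.arange(10000).reshape(250,40)
# sys.maxsize rather than numpy.nan: recent NumPy versions reject NaN here.
print(numpy.array2string(a, threshold=sys.maxsize, max_line_width=sys.maxsize))
# [Big output]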
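
Because array2string returns a string and takes its options as arguments, nothing global needs to be reset afterwards. A small sketch on a smaller array; the separator argument is an extra, illustrative choice that makes the output look like nested Python lists:

```python
import sys
import numpy as np

a = np.arange(2000).reshape(50, 40)

text = np.array2string(a, threshold=sys.maxsize,
                       max_line_width=sys.maxsize, separator=', ')
print(text.count('\n') + 1)   # number of output lines, one per row
```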

Answer 15

You won’t always want all items printed, especially for large arrays.

A simple way to show more items:

In [349]: ar
Out[349]: array([1, 1, 1, ..., 0, 0, 0])

In [350]: ar[:100]
Out[350]:
array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1,
       1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

By default this works as long as the sliced array has no more than 1000 elements.
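
The cut-off can be checked directly. In this sketch the threshold is set explicitly to 1000 (NumPy's default) so the result does not depend on earlier settings in the session:

```python
import numpy as np

with np.printoptions(threshold=1000):
    at_limit = str(np.arange(1000))    # size equals the threshold: printed in full
    over_limit = str(np.arange(1001))  # one element more: summarized with "..."

print('...' in at_limit, '...' in over_limit)
```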


Answer 16

If you have pandas available,

    a = numpy.arange(10000).reshape(250,40)
    print(pandas.DataFrame(a).to_string(header=False, index=False))

avoids the side effect of having to reset numpy.set_printoptions(threshold=sys.maxsize) afterwards, and the output has no numpy.array wrapper or brackets. I find this convenient for dumping a wide array into a log file.


Answer 17

If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners. To disable this behaviour and force NumPy to print the entire array, you can change the printing options using set_printoptions.

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)

or

>>> np.set_printoptions(edgeitems=3,infstr='inf',
... linewidth=75, nanstr='nan', precision=8,
... suppress=False, threshold=1000, formatter=None)

You can also refer to the numpy documentation for the “or” part for more help.