Pandas read_csv from url

Question: Pandas read_csv from url

I am using Python 3.4 with IPython and have the following code. I’m unable to read a csv-file from the given URL:

import pandas as pd
import requests

url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)

I get the following error:

“Expected file path name or file-like object, got <class 'bytes'> type”

How can I fix this?


Answer 0

Update

From pandas 0.19.2 you can now just pass the url directly.


As the error suggests, pandas.read_csv expects a file path or a file-like object as its first argument, not a bytes object.

If you want to read the csv from a string, wrap it in io.StringIO (Python 3.x) or StringIO.StringIO (Python 2.x).

Also, the URL https://github.com/cs109/2014_data/blob/master/countries.csv returns an HTML page, not the raw csv. Use the URL behind the Raw link on the GitHub page to get the raw csv: https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv

Example:

import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
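
The blob-to-raw URL change described above can be automated. The following is only a minimal sketch, assuming the usual github.com/<user>/<repo>/blob/<branch>/<path> layout; to_raw_github_url is a hypothetical helper name, not part of pandas or requests.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Map a GitHub "blob" page URL to its raw.githubusercontent.com form.

    Hypothetical helper: URLs that do not look like
    https://github.com/<user>/<repo>/blob/<branch>/<path> are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return url
    # Drop the "blob" segment and switch to the raw-content host.
    user, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([user, repo] + rest)
    return urlunsplit(
        (parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, parts.fragment)
    )
```

The returned string can then be passed to requests.get or, on recent pandas versions, straight to pd.read_csv.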

Answer 1

As of pandas 0.19.2, you can pass the url directly:

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

Answer 2

As I commented, you need to wrap the data in a StringIO object. With requests, .content returns bytes, so you need to decode first: c = pd.read_csv(io.StringIO(s.decode("utf-8"))). If you use .text instead, you can pass the string as is: s = requests.get(url).text followed by c = pd.read_csv(io.StringIO(s)).

A simpler approach is to pass the correct url of the raw data directly to read_csv. You don't have to pass a file-like object; read_csv accepts a url, so you don't need requests at all:

c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")

print(c)

Output:

                              Country         Region
0                             Algeria         AFRICA
1                              Angola         AFRICA
2                               Benin         AFRICA
3                            Botswana         AFRICA
4                             Burkina         AFRICA
5                             Burundi         AFRICA
6                            Cameroon         AFRICA
..................................

From the docs:

filepath_or_buffer :

string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
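
The bytes-versus-text distinction in this answer can be tried without a network connection. This is a rough sketch: the payload variable stands in for requests.get(url).content, and the standard library csv module is swapped in for pandas so the snippet has no third-party dependency. pd.read_csv(io.StringIO(text)) would follow the same decode-then-wrap pattern. rows_from_bytes is a made-up name.

```python
import csv
import io

# Stand-in for requests.get(url).content: raw bytes as received over HTTP.
payload = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"


def rows_from_bytes(data, encoding="utf-8"):
    """Decode bytes to text, wrap the text in a file-like object, parse as CSV."""
    text = data.decode(encoding)   # bytes -> str, roughly what .text provides
    buffer = io.StringIO(text)     # str -> file-like object
    return list(csv.reader(buffer))


rows = rows_from_bytes(payload)
print(rows[0])   # header row
print(rows[1:])  # data rows
```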


Answer 3

The problem you're having is that the content you get in the variable s is not a csv but an HTML file. To get the raw csv, change the url to:

https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv

Your second problem is that read_csv expects a file name or a file-like object; we can solve this by using StringIO from the io module. The third problem is that requests.get(url).content delivers a byte string; we can solve this by using requests.get(url).text instead.

The end result is this code:

from io import StringIO

import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text

c=pd.read_csv(StringIO(s))

output:

>>> c.head()
    Country  Region
0   Algeria  AFRICA
1    Angola  AFRICA
2     Benin  AFRICA
3  Botswana  AFRICA
4   Burkina  AFRICA

Answer 4

import pandas as pd

url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
c = pd.read_csv(url, sep = "\t")

Answer 5

To import data from a URL in pandas, just use the simple code below; it actually works better.

import pandas as pd
train = pd.read_table("https://urlandfile.com/dataset.csv")
train.head()

If you are having issues with the raw data, put an r prefix before the URL string (making it a raw string literal):

import pandas as pd
train = pd.read_table(r"https://urlandfile.com/dataset.csv")
train.head()

Can I redirect the stdout in python into some sort of string buffer?

Question: Can I redirect the stdout in python into some sort of string buffer?

I’m using python’s ftplib to write a small FTP client, but some of the functions in the package don’t return string output, but print to stdout. I want to redirect stdout to an object which I’ll be able to read the output from.

I know stdout can be redirected into any regular file with:

stdout = open("file", "a")

But I prefer a method that doesn't use the local drive.

I’m looking for something like the BufferedReader in Java that can be used to wrap a buffer into a stream.


Answer 0

from io import StringIO  # Python 2: from cStringIO import StringIO
import sys

old_stdout = sys.stdout
sys.stdout = mystdout = StringIO()

# blah blah lots of code ...

sys.stdout = old_stdout

# examine mystdout.getvalue()

Answer 1

Python 3.4 added the contextlib.redirect_stdout() function:

import io
from contextlib import redirect_stdout

with io.StringIO() as buf, redirect_stdout(buf):
    print('redirected')
    output = buf.getvalue()

Here's a code example that shows how to implement it on older Python versions.
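
For interpreters without contextlib.redirect_stdout, an equivalent can be sketched with contextlib.contextmanager. This is only an illustration, not the standard library implementation; redirect_stdout_compat is a made-up name.

```python
import sys
from contextlib import contextmanager
from io import StringIO


@contextmanager
def redirect_stdout_compat(target):
    """Temporarily point sys.stdout at target; restore it even on error."""
    old_stdout = sys.stdout
    sys.stdout = target
    try:
        yield target
    finally:
        sys.stdout = old_stdout


buf = StringIO()
with redirect_stdout_compat(buf):
    print("redirected")
output = buf.getvalue()
```

On Python 2, io.StringIO only accepts unicode text, so cStringIO.StringIO or io.BytesIO is probably the more convenient target there.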


Answer 2

Just to add to Ned’s answer above: you can use this to redirect output to any object that implements a write(str) method.

This can be used to good effect to “catch” stdout output in a GUI application.

Here’s a silly example in PyQt:

import sys
from PyQt4 import QtGui

class OutputWindow(QtGui.QPlainTextEdit):
    def write(self, txt):
        self.appendPlainText(str(txt))

app = QtGui.QApplication(sys.argv)
out = OutputWindow()
sys.stdout=out
out.show()
print("hello world !")  # the text goes to the OutputWindow widget, not the console
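
The same idea works without a GUI toolkit. Below is a minimal sketch of an object that only implements write() and flush() and collects whatever is printed; ListWriter is a hypothetical name used for illustration.

```python
import sys


class ListWriter:
    """Collects everything written to it; enough of a file for print()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        # Nothing to flush; present because some callers expect the method.
        pass


writer = ListWriter()
old_stdout = sys.stdout
sys.stdout = writer
try:
    print("hello world !")
finally:
    sys.stdout = old_stdout

captured = "".join(writer.chunks)
```

Note that print() may call write() more than once per call (text and line ending separately), so the chunks are joined before use.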

Answer 3

Starting with Python 2.6 you can use anything implementing the TextIOBase API from the io module as a replacement. This solution also enables you to use sys.stdout.buffer.write() in Python 3 to write (already) encoded byte strings to stdout (see stdout in Python 3). Using StringIO wouldn’t work then, because neither sys.stdout.encoding nor sys.stdout.buffer would be available.

A solution using TextIOWrapper:

import sys
from io import TextIOWrapper, BytesIO

# setup the environment
old_stdout = sys.stdout
sys.stdout = TextIOWrapper(BytesIO(), sys.stdout.encoding)

# do something that writes to stdout or stdout.buffer

# get output
sys.stdout.seek(0)      # jump to the start
out = sys.stdout.read() # read output

# restore stdout
sys.stdout.close()
sys.stdout = old_stdout

This solution works for Python 2 >= 2.6 and Python 3.

Please note that our new sys.stdout.write() only accepts unicode strings and sys.stdout.buffer.write() only accepts byte strings. This might not be the case for old code, but is often the case for code that is built to run on Python 2 and 3 without changes, which again often makes use of sys.stdout.buffer.

You can build a slight variation that accepts unicode and byte strings for write():

class StdoutBuffer(TextIOWrapper):
    def write(self, string):
        try:
            return super(StdoutBuffer, self).write(string)
        except TypeError:
            # redirect encoded byte strings directly to buffer
            return super(StdoutBuffer, self).buffer.write(string)

You don't have to set the encoding of the buffer to sys.stdout.encoding, but this helps when using this method for testing/comparing script output.
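
To illustrate mixing text writes and sys.stdout.buffer writes in Python 3, here is a small sketch wrapped in a function. It assumes Python 3.3+ for the write_through argument, which is an addition to the approach above: it pushes text straight into the underlying BytesIO so that text and bytes keep the order in which they were written. capture_stdout and demo are made-up names.

```python
import sys
from io import BytesIO, TextIOWrapper


def capture_stdout(func, encoding="utf-8"):
    """Run func() and return what it wrote to sys.stdout or sys.stdout.buffer."""
    old_stdout = sys.stdout
    fake_stdout = TextIOWrapper(BytesIO(), encoding=encoding, write_through=True)
    sys.stdout = fake_stdout
    try:
        func()
        fake_stdout.seek(0)        # jump to the start
        return fake_stdout.read()  # decode everything that was written
    finally:
        sys.stdout = old_stdout
        fake_stdout.close()


def demo():
    print("text")                        # str, goes through the wrapper
    sys.stdout.buffer.write(b"bytes\n")  # bytes, goes to the BytesIO directly


result = capture_stdout(demo)
```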


Answer 4

This method restores sys.stdout even if there’s an exception. It also gets any output before the exception.

import io
import sys

real_stdout = sys.stdout
fake_stdout = io.BytesIO()   # or perhaps io.StringIO()
try:
    sys.stdout = fake_stdout
    # do what you have to do to create some output
finally:
    sys.stdout = real_stdout
    output_string = fake_stdout.getvalue()
    fake_stdout.close()
    # do what you want with the output_string

Tested in Python 2.7.10 using io.BytesIO()

Tested in Python 3.6.4 using io.StringIO()


Bob, I added this in case anything from the modified / extended experiment below is of interest; otherwise feel free to delete it.

For information: a few remarks from extended experiments while looking for a viable way to “grab” output that numexpr.print_versions() sends directly to <stdout> (needed to keep a GUI clean and to collect the details into a debugging report).

# THIS WORKS AS HELL: as Bob Stein proposed years ago:
#  py2 SURPRISEDaBIT:
#
import io
import sys
#
real_stdout = sys.stdout                        #           PUSH <stdout> ( store to REAL_ )
fake_stdout = io.BytesIO()                      #           .DEF FAKE_
try:                                            # FUSED .TRY:
    sys.stdout.flush()                          #           .flush() before
    sys.stdout = fake_stdout                    #           .SET <stdout> to use FAKE_
    # ----------------------------------------- #           +    do what you gotta do to create some output
    print 123456789                             #           + 
    import  numexpr                             #           + 
    QuantFX.numexpr.__version__                 #           + [3] via fake_stdout re-assignment, as was bufferred + "late" deferred .get_value()-read into print, to finally reach -> real_stdout
    QuantFX.numexpr.print_versions()            #           + [4] via fake_stdout re-assignment, as was bufferred + "late" deferred .get_value()-read into print, to finally reach -> real_stdout
    _ = os.system( 'echo os.system() redir-ed' )#           + [1] via real_stdout                                 + "late" deferred .get_value()-read into print, to finally reach -> real_stdout, if not ( _ = )-caught from RET-d "byteswritten" / avoided from being injected int fake_stdout
    _ = os.write(  sys.stderr.fileno(),         #           + [2] via      stderr                                 + "late" deferred .get_value()-read into print, to finally reach -> real_stdout, if not ( _ = )-caught from RET-d "byteswritten" / avoided from being injected int fake_stdout
                       b'os.write()  redir-ed' )#  *OTHERWISE, if via fake_stdout, EXC <_io.BytesIO object at 0x02C0BB10> Traceback (most recent call last):
    # ----------------------------------------- #           ?                              io.UnsupportedOperation: fileno
    #'''                                                    ? YET:        <_io.BytesIO object at 0x02C0BB10> has a .fileno() method listed
    #>>> 'fileno' in dir( sys.stdout )       -> True        ? HAS IT ADVERTISED,
    #>>> pass;            sys.stdout.fileno  -> <built-in method fileno of _io.BytesIO object at 0x02C0BB10>
    #>>> pass;            sys.stdout.fileno()-> Traceback (most recent call last):
    #                                             File "<stdin>", line 1, in <module>
    #                                           io.UnsupportedOperation: fileno
    #                                                       ? BUT REFUSES TO USE IT
    #'''
finally:                                        # == FINALLY:
    sys.stdout.flush()                          #           .flush() before ret'd back REAL_
    sys.stdout = real_stdout                    #           .SET <stdout> to use POP'd REAL_
    sys.stdout.flush()                          #           .flush() after  ret'd back REAL_
    out_string = fake_stdout.getvalue()         #           .GET string           from FAKE_
    fake_stdout.close()                         #                <FD>.close()
    # +++++++++++++++++++++++++++++++++++++     # do what you want with the out_string
    #
    print "\n{0:}\n{1:}{0:}".format( 60 * "/\\",# "LATE" deferred print the out_string at the very end reached -> real_stdout
                                     out_string #                   
                                     )
'''
PASS'd:::::
...
os.system() redir-ed
os.write()  redir-ed
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
123456789
'2.5'
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   2.5
NumPy version:     1.10.4
Python version:    2.7.13 |Anaconda 4.0.0 (32-bit)| (default, May 11 2017, 14:07:41) [MSC v.1500 32 bit (Intel)]
AMD/Intel CPU?     True
VML available?     True
VML/MKL version:   Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for 32-bit applications
Number of threads used by default: 4 (out of 4 detected cores)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
>>>

EXC'd :::::
...
os.system() redir-ed
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
123456789
'2.5'
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   2.5
NumPy version:     1.10.4
Python version:    2.7.13 |Anaconda 4.0.0 (32-bit)| (default, May 11 2017, 14:07:41) [MSC v.1500 32 bit (Intel)]
AMD/Intel CPU?     True
VML available?     True
VML/MKL version:   Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for 32-bit applications
Number of threads used by default: 4 (out of 4 detected cores)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
io.UnsupportedOperation: fileno
'''

Answer 5

A context manager for python3:

import sys
from io import StringIO


class RedirectedStdout:
    def __init__(self):
        self._stdout = None
        self._string_io = None

    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, type, value, traceback):
        sys.stdout = self._stdout

    def __str__(self):
        return self._string_io.getvalue()

Use it like this:

>>> with RedirectedStdout() as out:
...     print('asdf')
...     s = str(out)
...     print('bsdf')
...
>>> print(repr(s), repr(str(out)))
'asdf\n' 'asdf\nbsdf\n'

Answer 6

In Python 3, the StringIO and cStringIO modules are gone; use io.StringIO instead. So you should do it like the first answer:

import sys
from io import StringIO

old_stdout = sys.stdout
old_stderr = sys.stderr
my_stdout = sys.stdout = StringIO()
my_stderr = sys.stderr = StringIO()

# blah blah lots of code ...

sys.stdout = old_stdout
sys.stderr = old_stderr

# To see the redirected output, make sure the standard streams are restored first.
print(my_stdout.getvalue())
print(my_stderr.getvalue())

my_stdout.close()
my_stderr.close()

Answer 7

Use pipe() and write to the appropriate file descriptor.

https://docs.python.org/library/os.html#file-descriptor-operations
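
One way the pipe() suggestion might look in practice is sketched below. It redirects file descriptor 1 itself, so output from os.write() or child processes is captured too. capture_fd_stdout is a hypothetical helper, and the sketch assumes the output fits in the operating system's pipe buffer (often around 64 KiB); larger output would need a reader thread.

```python
import os
import sys


def capture_fd_stdout(func):
    """Run func() with file descriptor 1 pointed at a pipe; return the bytes written."""
    sys.stdout.flush()                # do not mix earlier buffered output in
    saved_fd = os.dup(1)              # remember the real stdout descriptor
    read_fd, write_fd = os.pipe()
    try:
        os.dup2(write_fd, 1)          # fd 1 now refers to the pipe's write end
        os.close(write_fd)
        func()
        sys.stdout.flush()            # push Python-level buffered text into the pipe
    finally:
        os.dup2(saved_fd, 1)          # restore stdout; this closes the last write end
        os.close(saved_fd)
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()          # returns at EOF because no writer is left


captured = capture_fd_stdout(lambda: os.write(1, b"hello\n"))
```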


Answer 8

Here's another take on this. contextlib.redirect_stdout with io.StringIO() as documented is great, but it's still a bit verbose for everyday use. Here's how to make it a one-liner by subclassing contextlib.redirect_stdout:

import sys
import io
from contextlib import redirect_stdout

class capture(redirect_stdout):

    def __init__(self):
        self.f = io.StringIO()
        self._new_target = self.f
        self._old_targets = []  # verbatim from parent class

    def __enter__(self):
        self._old_targets.append(getattr(sys, self._stream))  # verbatim from parent class
        setattr(sys, self._stream, self._new_target)  # verbatim from parent class
        return self  # instead of self._new_target in the parent class

    def __repr__(self):
        return self.f.getvalue()  

Since __enter__ returns self, the context manager object is still available after the with block exits. Moreover, thanks to the __repr__ method, the string representation of the context manager object is in fact the captured stdout. So now you have:

with capture() as message:
    print('Hello World!')
print(str(message)=='Hello World!\n')  # prints True

Generate a random date between two other dates

Question: Generate a random date between two other dates

How would I generate a random date that has to be between two other given dates?

The function’s signature should be something like this:

random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", 0.34)
                   ^                       ^          ^

            date generated has  date generated has  a random number
            to be after this    to be before this

and would return a date such as: 2/4/2008 7:20 PM


Answer 0

Convert both strings to timestamps (in your chosen resolution, e.g. milliseconds, seconds, hours, days, whatever), subtract the earlier from the later, multiply your random number (assuming it is distributed in the range [0, 1]) by that difference, and add the result to the earlier one. Convert the timestamp back to a date string and you have a random time in that range.

Python example (output is almost in the format you specified, other than the zero padding; blame the American time format conventions):

import random
import time

def str_time_prop(start, end, format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formatted in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies the proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(format, time.localtime(ptime))


def random_date(start, end, prop):
    return str_time_prop(start, end, '%m/%d/%Y %I:%M %p', prop)

print(random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))
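
The same proportion arithmetic can also be written with datetime objects, which avoids going through local-time timestamps. This is a sketch under the assumption of Python 3, where a timedelta can be multiplied by a float; datetime_at_proportion is a hypothetical name, and it returns a datetime rather than a formatted string.

```python
import random
from datetime import datetime

FORMAT = "%m/%d/%Y %I:%M %p"


def datetime_at_proportion(start, end, prop, fmt=FORMAT):
    """Return the datetime that lies prop (0..1) of the way from start to end."""
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    # start + prop * (end - start), the same formula as with timestamps
    return start_dt + prop * (end_dt - start_dt)


midpoint = datetime_at_proportion("1/1/2008 12:00 AM", "1/3/2008 12:00 AM", 0.5)
print(midpoint)
print(datetime_at_proportion("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))
```

With prop = 0 the result is the start itself and with prop = 1 it is the end, which makes the behaviour at the interval boundaries easy to check.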

Answer 1

from random import randrange
from datetime import timedelta

def random_date(start, end):
    """
    This function will return a random datetime between two datetime 
    objects.
    """
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

The precision is seconds. You can increase precision up to microseconds, or decrease to, say, half-hours, if you want. For that just change the last line’s calculation.

example run:

from datetime import datetime

d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')

print(random_date(d1, d2))

output:

2008-12-04 01:50:17
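
The remark about changing the precision can be sketched as follows. This is only an illustration and not part of the original answer: the name random_date_with_step and its step parameter are made up here, and the sketch relies on Python 3 being able to divide one timedelta by another.

```python
from datetime import datetime, timedelta
from random import randrange


def random_date_with_step(start, end, step=timedelta(seconds=1)):
    """Return start plus a random whole number of steps, always before end."""
    # timedelta // timedelta gives the number of whole steps in the interval
    steps = (end - start) // step
    # randrange(steps) yields 0 .. steps - 1, so the result never reaches end
    return start + randrange(steps) * step


d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')

half_hour = random_date_with_step(d1, d2, timedelta(minutes=30))
micro = random_date_with_step(d1, d2, timedelta(microseconds=1))
```

With a 30-minute step every result lands on a half-hour boundary counted from start; with a 1-microsecond step the full resolution of datetime is used.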

Answer 2

A tiny version.

import datetime
import random


def random_date(start, end):
    """Generate a random datetime between `start` and `end`"""
    return start + datetime.timedelta(
        # Get a random amount of seconds between `start` and `end`
        seconds=random.randint(0, int((end - start).total_seconds())),
    )

Note that both start and end arguments should be datetime objects. If you’ve got strings instead, it’s fairly easy to convert. The other answers point to some ways to do so.


Answer 3

Updated answer

It’s even simpler using Faker.

Installation

pip install faker

Usage:

from faker import Faker
fake = Faker()

fake.date_between(start_date='today', end_date='+30y')
# datetime.date(2025, 3, 12)

fake.date_time_between(start_date='-30y', end_date='now')
# datetime.datetime(2007, 2, 28, 11, 28, 16)

# Or, if you need more specific date boundaries, provide the start
# and end dates explicitly.
import datetime
start_date = datetime.date(year=2015, month=1, day=1)
fake.date_between(start_date=start_date, end_date='+30y')

Old answer

It’s very simple using radar

Installation

pip install radar

Usage

import datetime

import radar 

# Generate random datetime (parsing dates from str values)
radar.random_datetime(start='2000-05-24', stop='2013-05-24T23:59:59')

# Generate random datetime from datetime.datetime values
radar.random_datetime(
    start = datetime.datetime(year=2000, month=5, day=24),
    stop = datetime.datetime(year=2013, month=5, day=24)
)

# Just render some random datetime. If no range is given, start defaults to 
# 1970-01-01 and stop defaults to datetime.datetime.now()
radar.random_datetime()

Answer 4

This is a different approach that sort of works (the day is capped at 28 so that every month yields a valid date):

from random import randint
import datetime

date=datetime.date(randint(2005,2025), randint(1,12),randint(1,28))

A better approach:

# Replace YYYY, MM, DD with the year, month and day of your start date.
startdate=datetime.date(YYYY,MM,DD)
date=startdate+datetime.timedelta(randint(1,365))
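
A runnable version of the second approach might look like this. The start date below is an assumption chosen purely for the example; the answer itself leaves it as a placeholder.

```python
import datetime
from random import randint

# Assumed start date, chosen only for this example
startdate = datetime.date(2015, 1, 1)

# timedelta's first positional argument is days, so this adds 1 to 365 days
date = startdate + datetime.timedelta(randint(1, 365))
```

Because the offset is expressed in days, every result is a valid calendar date and no special handling of month lengths is needed.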

Answer 5

In Python 3, timedelta supports multiplication by floats, so you can now do:

import random
random_date = start + (end - start) * random.random()

This assumes that start and end are of type datetime.datetime. For example, to generate a random datetime within the next day:

import random
from datetime import datetime, timedelta

start = datetime.now()
end = start + timedelta(days=1)
random_date = start + (end - start) * random.random()

Answer 6

To chip in, here is a pandas-based solution I use:

import pandas as pd
import numpy as np

def random_date(start, end, position=None):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    delta = (end - start).total_seconds()
    if position is None:
        offset = np.random.uniform(0., delta)
    else:
        offset = position * delta
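    # Timedelta accepts fractional seconds, whereas recent pandas
    # versions require an integer for pd.offsets.Second
    offset = pd.Timedelta(seconds=offset)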
    t = start + offset
    return t

I like it because pd.Timestamp accepts many different inputs and formats. Consider the following few examples:

Your signature.

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM", position=0.34)
Timestamp('2008-05-04 21:06:48', tz=None)

Random position.

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM")
Timestamp('2008-10-21 05:30:10', tz=None)

Different format.

>>> random_date('2008-01-01 13:30', '2009-01-01 4:50')
Timestamp('2008-11-18 17:20:19', tz=None)

Passing pandas/datetime objects directly.

>>> random_date(pd.Timestamp.now(), pd.Timestamp.now() + pd.offsets.Hour(3))
Timestamp('2014-03-06 14:51:16.035965', tz=None)

Answer 7

Here is an answer to the literal meaning of the title rather than the body of this question:

import time
import datetime
import random

def date_to_timestamp(d):
  return int(time.mktime(d.timetuple()))

def randomDate(start, end):
  """Get a random date between two dates"""

  stime = date_to_timestamp(start)
  etime = date_to_timestamp(end)

  ptime = stime + random.random() * (etime - stime)

  return datetime.date.fromtimestamp(ptime)

This code is based loosely on the accepted answer.


Answer 8

You can use Mixer:

pip install mixer

and then:

from mixer import generators as gen
print(gen.get_datetime(min_datetime=(1900, 1, 1, 0, 0, 0), max_datetime=(2020, 12, 31, 23, 59, 59)))

Answer 9

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Create random datetime object."""

from datetime import datetime
import random


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


if __name__ == '__main__':
    import doctest
    doctest.testmod()

Answer 10

Convert your dates into timestamps and call random.randint with the timestamps, then convert the randomly generated timestamp back into a date:

from datetime import datetime
import random

def random_date(first_date, second_date):
    first_timestamp = int(first_date.timestamp())
    second_timestamp = int(second_date.timestamp())
    random_timestamp = random.randint(first_timestamp, second_timestamp)
    return datetime.fromtimestamp(random_timestamp)

Then you can use it like this

from datetime import datetime

d1 = datetime.strptime("1/1/2018 1:30 PM", "%m/%d/%Y %I:%M %p")
d2 = datetime.strptime("1/1/2019 4:50 AM", "%m/%d/%Y %I:%M %p")

random_date(d1, d2)

random_date(d2, d1)  # ValueError because the first date comes after the second date

If you care about timezones, you should just use date_time_between_dates from the Faker library (which a different answer already suggests); that is where I took this code from.


Answer 11

  1. Convert your input dates to numbers (int, float, whatever is best for your usage)
  2. Choose a number between your two date numbers.
  3. Convert this number back to a date.

Many algorithms for converting date to and from numbers are already available in many operating systems.
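
The three steps above can be sketched in Python with whole days as the unit. This is one possible reading of the recipe, not the answerer's own code; random_day is a name invented for the example, and date.toordinal() is just one convenient date-to-number conversion.

```python
import random
from datetime import date


def random_day(start, end):
    """Pick a random calendar day between start and end (both included)."""
    # Step 1: dates -> numbers (day 1 is 0001-01-01)
    lo, hi = start.toordinal(), end.toordinal()
    # Step 2: choose a number between the two
    chosen = random.randint(lo, hi)
    # Step 3: number -> date
    return date.fromordinal(chosen)


picked = random_day(date(2008, 1, 1), date(2009, 1, 1))
```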


Answer 12

What do you need the random number for? Usually (depending on the language) you can get the number of seconds/milliseconds since the Epoch from a date. So for a random date between startDate and endDate you could do:

  1. compute the time in ms between startDate and endDate (endDate.toMilliseconds() - startDate.toMilliseconds())
  2. generate a random number between 0 and the number you obtained in 1
  3. generate a new Date with time offset = startDate.toMilliseconds() + number obtained in 2
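
In Python, the same three steps might look like the sketch below. The names random_datetime_ms and ONE_MS are invented for the example, and datetime arithmetic stands in for the toMilliseconds() calls of the pseudocode.

```python
import random
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)


def random_datetime_ms(start_date, end_date):
    # 1. time between the two dates, in whole milliseconds
    span_ms = (end_date - start_date) // ONE_MS
    # 2. random number between 0 and that span (both included)
    offset_ms = random.randint(0, span_ms)
    # 3. new datetime at start + offset
    return start_date + offset_ms * ONE_MS


result = random_datetime_ms(datetime(2008, 1, 1, 13, 30), datetime(2009, 1, 1, 4, 50))
```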

Answer 13

The easiest way of doing this is to convert both dates to timestamps, then set these as the minimum and maximum bounds on a random number generator.

A quick PHP example would be:

// Find a randomDate between $start_date and $end_date
function randomDate($start_date, $end_date)
{
    // Convert to timestamps
    $min = strtotime($start_date);
    $max = strtotime($end_date);

    // Generate random number using above bounds
    $val = rand($min, $max);

    // Convert back to desired date format
    return date('Y-m-d H:i:s', $val);
}

This function makes use of strtotime() to convert a datetime description into a Unix timestamp, and date() to make a valid date out of the random timestamp which has been generated.


Answer 14

Just to add another one:

import datetime
import random

# randrange(n) excludes n, so 24 and 60 are needed to cover every hour, minute and second
datestring = datetime.datetime.strftime(datetime.datetime(
    random.randint(2000, 2015),
    random.randint(1, 12),
    random.randint(1, 28),
    random.randrange(24),
    random.randrange(60),
    random.randrange(60),
    random.randrange(1000000)), '%Y-%m-%d %H:%M:%S')

The day handling needs some consideration: capping the day at 28 keeps you on the safe side, since every month has at least 28 days.
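
If days 29 to 31 should also be possible, one option is to ask the calendar module how long the chosen month is. This is a sketch under that assumption, not part of the original answer; random_datetime_in_years is a made-up name. Note that picking year, month and day independently is not perfectly uniform over all dates, because shorter months get the same weight as longer ones.

```python
import calendar
import random
from datetime import datetime


def random_datetime_in_years(first_year, last_year):
    """Random datetime whose year lies in first_year .. last_year."""
    year = random.randint(first_year, last_year)
    month = random.randint(1, 12)
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    day = random.randint(1, days_in_month)
    return datetime(year, month, day,
                    random.randrange(24),
                    random.randrange(60),
                    random.randrange(60))


sample = random_datetime_in_years(2000, 2015).strftime('%Y-%m-%d %H:%M:%S')
```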


Answer 15

Here’s a solution modified from emyller’s approach which returns an array of random dates at any resolution

import numpy as np

def random_dates(start, end, size=1, resolution='s'):
    """
    Returns an array of random dates in the interval [start, end). Valid 
    resolution arguments are numpy date/time units, as documented at: 
        https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html
    """
    start, end = np.datetime64(start), np.datetime64(end)
    delta = (end-start).astype('timedelta64[{}]'.format(resolution))
    delta_mat = np.random.randint(0, delta.astype('int'), size)
    return start + delta_mat.astype('timedelta64[{}]'.format(resolution))

Part of what’s nice about this approach is that np.datetime64 is really good at coercing things to dates, so you can specify your start/end dates as strings, datetimes, pandas timestamps… pretty much anything will work.


Answer 16

Conceptually it’s quite simple. Depending on which language you’re using, you will be able to convert those dates into some reference 32- or 64-bit integer, typically representing seconds since the epoch (1 January 1970), otherwise known as “Unix time”, or milliseconds since some other arbitrary date. Simply generate a random 32- or 64-bit integer between those two values. This should be a one-liner in any language.

On some platforms you can represent a time as a double (in one implementation, the date is the integer part and the time is the fractional part). The same principle applies, except that you’re dealing with single- or double-precision floating-point numbers (“floats” or “doubles” in C, Java and other languages). Take the difference between the two times, multiply it by a random number r (0 <= r <= 1), add the result to the start time, and you’re done.
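
As an illustration of the floating-point variant in Python: the sketch below uses Unix time as a float, and random_utc_datetime is a name invented for the example. It assumes timezone-aware inputs so that timestamp() does not depend on the local timezone.

```python
import random
from datetime import datetime, timezone


def random_utc_datetime(start, end):
    """Random datetime between two timezone-aware datetimes, via float Unix time."""
    a, b = start.timestamp(), end.timestamp()
    # Take the difference, multiply by r in [0, 1), add to the start time
    t = a + random.random() * (b - a)
    return datetime.fromtimestamp(t, tz=timezone.utc)


begin = datetime(2008, 1, 1, 13, 30, tzinfo=timezone.utc)
finish = datetime(2009, 1, 1, 4, 50, tzinfo=timezone.utc)
moment = random_utc_datetime(begin, finish)
```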


Answer 17

In Python:

>>> from dateutil.rrule import rrule, DAILY
>>> import datetime, random
>>> random.choice(
...     list(
...         rrule(DAILY,
...               dtstart=datetime.date(2009, 8, 21),
...               until=datetime.date(2010, 10, 12))
...     )
... )
datetime.datetime(2010, 2, 1, 0, 0)

(This needs the python-dateutil library: pip install python-dateutil)


Answer 18

Use ApacheCommonUtils to generate a random long within a given range, and then create a Date out of that long.

Example:

import java.util.Date;

import org.apache.commons.math.random.RandomData;
import org.apache.commons.math.random.RandomDataImpl;

public Date nextDate(Date min, Date max) {
    RandomData randomData = new RandomDataImpl();
    return new Date(randomData.nextLong(min.getTime(), max.getTime()));
}


Answer 19

I made this for another project using random and time. I used a general format string from time; see the documentation of strftime() for its first argument. The second part uses the random.randrange function, which returns an integer between its arguments (the upper bound is excluded). Change the ranges to match the strings you would like. The tuple passed as the second argument must contain valid values.

import time
import random


def get_random_date():
    # Tuple fields: year, month, day, hour, minute, second, weekday, day of year, DST flag.
    # randrange excludes its upper bound, so randrange(1, 13) covers months 1..12.
    return time.strftime("%Y-%m-%d %H:%M:%S", (random.randrange(2000, 2016), random.randrange(1, 13),
        random.randrange(1, 29), random.randrange(0, 24), random.randrange(0, 60), random.randrange(0, 60),
        random.randrange(0, 7), random.randrange(1, 367), 1))

Answer 20

Pandas + numpy solution

import pandas as pd
import numpy as np

def RandomTimestamp(start, end):
    dts = (end - start).total_seconds()
    return start + pd.Timedelta(np.random.uniform(0, dts), 's')

dts is the difference between timestamps in seconds (float). It is then used to create a pandas timedelta between 0 and dts, that is added to the start timestamp.


Answer 21

Based on the answer by mouviciel, here is a vectorized solution using numpy. Convert the start and end dates to ints, generate an array of random numbers between them, and convert the whole array back to dates.

import time
import datetime
import numpy as np

n_rows = 10

start_time = "01/12/2011"
end_time = "05/08/2017"

# mktime returns a float; randint needs integer bounds
date2int = lambda s: int(time.mktime(datetime.datetime.strptime(s,"%d/%m/%Y").timetuple()))
# apply_along_axis passes each row as a 1-element array, so unpack it first
int2date = lambda row: datetime.datetime.fromtimestamp(int(row[0])).strftime('%Y-%m-%d %H:%M:%S')

start_time = date2int(start_time)
end_time = date2int(end_time)

random_ints = np.random.randint(low=start_time, high=end_time, size=(n_rows,1))
random_dates = np.apply_along_axis(int2date, 1, random_ints).reshape(n_rows,1)

print(random_dates)

Answer 22

This is a modified version of the method by @(Tom Alsberg). I changed it so that the dates keep their fractional seconds.

import random
import time
import datetime

def random_date(start_time_string, end_time_string, format_string, random_number):
    """
    Get a time at a proportion of a range of two formatted times.
    start_time_string and end_time_string should be strings specifying
    times formatted in the given format_string (strftime-style), giving
    an interval [start, end]. random_number specifies the proportion of
    the interval to be taken after start. The returned time will be in
    the specified format.
    """
    dt_start = datetime.datetime.strptime(start_time_string, format_string)
    dt_end = datetime.datetime.strptime(end_time_string, format_string)

    start_time = time.mktime(dt_start.timetuple()) + dt_start.microsecond / 1000000.0
    end_time = time.mktime(dt_end.timetuple()) + dt_end.microsecond / 1000000.0

    random_time = start_time + random_number * (end_time - start_time)

    return datetime.datetime.fromtimestamp(random_time).strftime(format_string)

Example:

print(random_date("2000/01/01 00:00:00.000000", "2049/12/31 23:59:59.999999", '%Y/%m/%d %H:%M:%S.%f', random.random()))

Output: 2028/07/08 12:34:49.977963


Answer 23

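import time
from random import randrange

# %H parses a 24-hour clock; int() is needed because randrange requires integer bounds
start_timestamp = int(time.mktime(time.strptime('Jun 1 2010  01:33:00', '%b %d %Y %H:%M:%S')))
end_timestamp = int(time.mktime(time.strptime('Jun 1 2017  12:33:00', '%b %d %Y %H:%M:%S')))
time.strftime('%b %d %Y %H:%M:%S',time.localtime(randrange(start_timestamp,end_timestamp)))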



Answer 24

    # needed to create data for 1000 fictitious employees for testing code 
    # code relating to randomly assigning forenames, surnames, and genders
    # has been removed as not germane to the question asked above but FYI
    # genders were randomly assigned, forenames/surnames were web scraped,
    # there is no accounting for leap years, and the data stored in mySQL

    import random 
    from datetime import datetime
    from datetime import timedelta

    for employee in range(1000):
        # assign a random date of birth (employees are aged between sixteen and sixty five)
        # + 1 keeps the hire-date range below from being empty for someone exactly sixteen
        dlt = random.randint(365*16 + 1, 365*65)
        dob = datetime.today() - timedelta(days=dlt)
        # assign a random date of hire sometime between sixteenth birthday and yesterday
        doh = datetime.today() - timedelta(days=random.randint(1, dlt-365*16))
        print("born {} hired {}".format(dob.strftime("%d-%m-%y"), doh.strftime("%d-%m-%y")))

Answer 25

An alternative way to create random dates between two dates, using np.random.randint(), pd.Timestamp().value and pd.to_datetime() in a list comprehension:

# Import libraries
import numpy as np
import pandas as pd

# Initialize
start = '2020-01-01' # Specify start date
end = '2020-03-10' # Specify end date
n = 10 # Specify number of dates needed

# Get random dates (Timestamp.value is in nanoseconds since the epoch, hence the 64-bit dtype)
x = np.random.randint(pd.Timestamp(start).value, pd.Timestamp(end).value, n, dtype=np.int64)
random_dates = [pd.to_datetime((i/10**9)/(60*60)/24, unit='D').strftime('%Y-%m-%d')  for i in x]

print(random_dates)

Output

['2020-01-06',
 '2020-03-08',
 '2020-01-23',
 '2020-02-03',
 '2020-01-30',
 '2020-01-05',
 '2020-02-16',
 '2020-03-08',
 '2020-02-09',
 '2020-01-04']

What's the difference between %s and %d in Python string formatting?

Question: What's the difference between %s and %d in Python string formatting?

I don’t understand what %s and %d do and how they work.


Answer 0

They are used for formatting strings. %s acts as a placeholder for a string, while %d acts as a placeholder for a number. Their associated values are passed in via a tuple using the % operator.

name = 'marcog'
number = 42
print '%s %d' % (name, number)

will print marcog 42. Note that name is a string (%s) and number is an integer (%d for decimal).

See https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting for details.

In Python 3 the example would be:

print('%s %d' % (name, number))

Answer 1

From the Python 3 docs:

%d is for a decimal integer.

%s is for a generic string or object; an object is converted to a string.

Consider the following code:

name ='giacomo'
number = 4.3
print('%s %s %d %f %g' % (name, number, number, number, number))

The output will be:

giacomo 4.3 4 4.300000 4.3

As you can see, %d truncates to an integer, %s keeps the value's own string form, %f prints it as a float, and %g is a general-purpose number format.

Obviously,

print('%d' % (name))

will raise an exception; you cannot format a string as a number.
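
A small sketch of that failure and one way around it is shown below. It is only an illustration: the variable names error and converted are invented here, and the exact wording of the error message differs between Python versions.

```python
name = 'giacomo'

try:
    '%d' % name  # a str is not a number, so this raises TypeError
except TypeError as exc:
    error = str(exc)

# Converting explicitly first works when the string holds digits
converted = '%d' % int('42')
```

In other words, %d does not parse strings for you; the conversion has to happen before formatting.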


Answer 2

%s is used as a placeholder for string values you want to inject into a formatted string.

%d is used as a placeholder for numeric or decimal values.

For example (in Python 3):

print('%s is %d years old' % ('Joe', 42))

Would output

Joe is 42 years old

Answer 3

These are placeholders:

For example: 'Hi %s I have %d donuts' %('Alice', 42)

This line of code will substitute %s with Alice (str) and %d with 42.

Output: 'Hi Alice I have 42 donuts'

This could be achieved with a “+” most of the time. To gain a deeper understanding to your question, you may want to check {} / .format() as well. Here is one example: Python string formatting: % vs. .format

also see here a google python tutorial video @ 40′, it has some explanations https://www.youtube.com/watch?v=tKTZoB2Vjuk


回答 4

%d%s字符串格式化“命令”用于格式字符串。的%d是数字,%s是用于字符串。

例如:

print("%s" % "hi")

print("%d" % 34.6)

传递多个参数:

print("%s %s %s%d" % ("hi", "there", "user", 123456)) 将返回 hi there user123456

The %d and %s string formatting “commands” are used to format strings. The %d is for numbers, and %s is for strings.

For an example:

print("%s" % "hi")

and

print("%d" % 34.6)

To pass multiple arguments:

print("%s %s %s%d" % ("hi", "there", "user", 123456)) will print hi there user123456


回答 5

这些都是有根据的答案,但没有一个完全可以理解%s和之间的区别的核心%d

%s告诉格式化程序str()在参数上调用该函数,并且由于我们按照定义强制使用字符串,%s因此实际上只是在执行str(arg)

%d另一方面,在调用int()之前先调用参数str(),例如str(int(arg)),这将导致int强制以及str强制。

例如,我可以将十六进制值转换为十进制,

>>> '%d' % 0x15
'21'

或截断一个浮点数。

>>> '%d' % 34.5
'34'

但是,如果参数不是数字,则该操作将引发异常。

>>> '%d' % 'thirteen'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not str

因此,如果意图仅仅是调用str(arg),那么%s就足够了,但是如果您需要额外的格式设置(例如格式化浮点小数位)或其他强制性格式,则需要其他格式符号。

使用这种f-string表示法,当您不使用格式化程序时,默认值为str

>>> a = 1
>>> f'{a}'
'1'
>>> f'{a:d}'
'1'
>>> a = '1'
>>> f'{a:d}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'd' for object of type 'str'

情况同样如此string.format; 默认值为str

>>> a = 1
>>> '{}'.format(a)
'1'
>>> '{!s}'.format(a)
'1'
>>> '{:d}'.format(a)
'1'

These are all informative answers, but none are quite getting at the core of what the difference is between %s and %d.

%s tells the formatter to call the str() function on the argument and since we are coercing to a string by definition, %s is essentially just performing str(arg).

%d, on the other hand, calls int() on the argument before calling str(), like str(int(arg)). This causes int coercion as well as str coercion.

For example, I can convert a hex value to decimal,

>>> '%d' % 0x15
'21'

or truncate a float.

>>> '%d' % 34.5
'34'

But the operation will raise an exception if the argument isn’t a number.

>>> '%d' % 'thirteen'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not str

So if the intent is just to call str(arg), then %s is sufficient, but if you need extra formatting (like formatting float decimal places) or other coercion, then the other format symbols are needed.

With the f-string notation, when you leave the formatter out, the default is str.

>>> a = 1
>>> f'{a}'
'1'
>>> f'{a:d}'
'1'
>>> a = '1'
>>> f'{a:d}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'd' for object of type 'str'

The same is true with string.format; the default is str.

>>> a = 1
>>> '{}'.format(a)
'1'
>>> '{!s}'.format(a)
'1'
>>> '{:d}'.format(a)
'1'
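
The coercion behaviour described above can be checked side by side. This is a small sketch rather than a full account of how the formatter works internally; the variable names are illustrative only:

```python
value = 34.5

as_str = '%s' % value      # same text as str(value)
as_int = '%d' % value      # fractional part is dropped before formatting
as_float = '%.2f' % value  # fixed number of decimal places

# %d refuses arguments that are not numbers.
try:
    '%d' % 'thirteen'
    raised = False
except TypeError:
    raised = True

print(as_str, as_int, as_float, raised)
```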

回答 6

%d并且%s是占位符,它们用作可替换变量。例如,如果您创建2个变量

variable_one = "Stackoverflow"
variable_two = 45

您可以使用这些变量的元组将这些变量分配给字符串中的句子。

variable_3 = "I was searching for an answer in %s and found more than %d answers to my question"

请注意,它%s适用于字符串,%d适用于数字或十进制变量。

如果您打印variable_3它看起来像这样

print(variable_3 % (variable_one, variable_two))

I was searching for an answer in Stackoverflow and found more than 45 answers to my question

%d and %s are placeholders, they work as a replaceable variable. For example, if you create 2 variables

variable_one = "Stackoverflow"
variable_two = 45

you can assign those variables to a sentence in a string using a tuple of the variables.

variable_3 = "I was searching for an answer in %s and found more than %d answers to my question"

Note that %s works for strings and %d works for numerical or decimal variables.

if you print variable_3 it would look like this

print(variable_3 % (variable_one, variable_two))

I was searching for an answer in Stackoverflow and found more than 45 answers to my question


回答 7

它们是格式说明符。当您想要将Python表达式的值包含在字符串中且采用特定格式时,可以使用它们。

请参阅“ 深入Python ”以获取相对详细的介绍。

They are format specifiers. They are used when you want to include the value of your Python expressions into strings, with a specific format enforced.

See Dive into Python for a relatively detailed introduction.


回答 8

根据最新标准,这是应该执行的操作。

print("My name is {!s} and my number is{:d}".format("Agnel Vishal",100))

请检查python3.6文档示例程序

As per latest standards, this is how it should be done.

print("My name is {!s} and my number is{:d}".format("Agnel Vishal",100))

Do check python3.6 docs and sample program


回答 9

如果您想避免使用%s或%d,那么..

name = 'marcog'
number = 42
print ('my name is',name,'and my age is:', number)

输出:

my name is marcog and my age is: 42

In case you would like to avoid %s or %d then..

name = 'marcog'
number = 42
print ('my name is',name,'and my age is:', number)

Output:

my name is marcog and my age is: 42

回答 10

%s用于保留字符串的空间%d用于保留数字的空间

name = "Moses";
age = 23
print("My name is %s am CEO at MoTech Computers " %name)
print("Current am %d years old" %age)
print("So Am %s and am %d years old" %(name,age))

程序输出

该视频深入介绍了该技巧https://www.youtube.com/watch?v=4zN5YsuiqMA

%s holds space for a string; %d holds space for a number.

name = "Moses";
age = 23
print("My name is %s am CEO at MoTech Computers " %name)
print("Current am %d years old" %age)
print("So Am %s and am %d years old" %(name,age))

Program output

this video goes deep about that tip https://www.youtube.com/watch?v=4zN5YsuiqMA


回答 11

说到哪个…
python3.6附带的f-strings内容使格式化变得更加容易!
现在,如果您的python版本大于3.6,则可以使用以下可用方法设置字符串格式:

name = "python"

print ("i code with %s" %name)          # with help of older method
print ("i code with {0}".format(name))  # with help of format
print (f"i code with {name}")           # with help of f-strings

speaking of which …
Python 3.6 comes with f-strings, which make formatting much easier!
If your Python version is 3.6 or later, you can format your strings with any of these methods:

name = "python"

print ("i code with %s" %name)          # with help of older method
print ("i code with {0}".format(name))  # with help of format
print (f"i code with {name}")           # with help of f-strings
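
As a quick check that the three styles are interchangeable for simple cases, here is a sketch assuming Python 3.6 or later (the variable names are illustrative):

```python
name = "python"

percent_style = "i code with %s" % name
format_style = "i code with {0}".format(name)
fstring_style = f"i code with {name}"

# Format specifications carry over between the styles as well.
pi = 3.14159
percent_pi = "%.2f" % pi
fstring_pi = f"{pi:.2f}"

print(percent_style, format_style, fstring_style)
print(percent_pi, fstring_pi)
```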

模拟与魔术模拟

问题:模拟与魔术模拟

我的理解是MagicMockMock的超集,它自动执行“魔术方法”,从而无缝地提供对列表,迭代等的支持。那么,为什么存在普通Mock的原因是什么?难道这不是MagicMock的简化版本,实际上可以忽略吗?Mock类是否知道MagicMock中没有的任何技巧?

My understanding is that MagicMock is a superset of Mock that automatically does “magic methods” thus seamlessly providing support for lists, iterations and so on… Then what is the reason for plain Mock existing? Isn’t that just a stripped down version of MagicMock that can be practically ignored? Does Mock class know any tricks that are not available in MagicMock?


回答 0

模拟存在的原因是什么?

Mock的作者Michael Foord 在Pycon 2011(31:00)上回答了一个非常类似的问题

问:为什么MagicMock做了一件单独的事情,而不仅仅是将功能折叠到默认的模拟对象中?

答:一个合理的答案是MagicMock的工作方式是通过创建新的Mocks并对其进行设置来预先配置所有这些协议方法,因此,如果每个新的模拟都创建了一堆新的模拟并将它们设置为协议方法,则所有这些协议方法创建了一堆更多的模拟并将它们设置在其协议方法上,您可以无限递归…

如果您想将模拟作为容器对象访问是错误的,那又不想怎么办?如果每个模拟都自动获得了每种协议方法,那么这样做就变得更加困难。而且,MagicMock为您执行了一些预配置,设置了可能不合适的返回值,所以我认为最好有这种便利,它具有所有预配置的功能并可供您使用,但是您也可以进行普通的模拟对象,然后配置您想要存在的魔术方法…

简单的答案是:如果您要这样做,那就在任何地方使用MagicMock。

What is the reason for plain Mock existing?

Mock’s author, Michael Foord, addressed a very similar question at Pycon 2011 (31:00):

Q: Why was MagicMock made a separate thing rather than just folding the ability into the default mock object?

A: One reasonable answer is that the way MagicMock works is that it preconfigures all these protocol methods by creating new Mocks and setting them, so if every new mock created a bunch of new mocks and set those as protocol methods and then all of those protocol methods created a bunch more mocks and set them on their protocol methods, you’ve got infinite recursion…

What if you want accessing your mock as a container object to be an error — you don’t want that to work? If every mock has automatically got every protocol method, then it becomes much more difficult to do that. And also, MagicMock does some of this preconfiguring for you, setting return values that might not be appropriate, so I thought it would be better to have this convenience one that has everything preconfigured and available for you, but you can also take a ordinary mock object and just configure the magic methods you want to exist…

The simple answer is: just use MagicMock everywhere if that’s the behavior you want.


回答 1

使用Mock,您可以模拟魔术方法,但必须定义它们。MagicMock具有“大多数魔术方法的默认实现”。

如果您不需要测试任何魔术方法,那么Mock就足够了,并且不会在测试中带来很多无关紧要的东西。如果您需要测试许多魔术方法,MagicMock将为您节省一些时间。

With Mock you can mock magic methods, but you have to define them yourself. MagicMock has “default implementations of most of the magic methods.”

If you don’t need to test any magic methods, Mock is adequate and doesn’t bring a lot of extraneous things into your tests. If you need to test a lot of magic methods MagicMock will save you some time.
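
To illustrate "you have to define them", here is a minimal sketch that configures a single magic method on a plain Mock. The names plain and configured are illustrative, and the return value 3 is arbitrary:

```python
from unittest.mock import Mock

# A plain Mock has no __len__, so len() fails.
plain = Mock()
try:
    len(plain)
    supports_len = True
except TypeError:
    supports_len = False

# Opt in to exactly the magic method the test needs.
configured = Mock()
configured.__len__ = Mock(return_value=3)
length = len(configured)

print(supports_len, length)
```

Only the protocol you set up exists on the mock, so accidental use of any other protocol still fails loudly.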


回答 2

首先MagicMock是的子类Mock

class MagicMock(MagicMixin, Mock)

结果,MagicMock提供了Mock提供的一切以及更多功能。与其认为Mock是MagicMock的精简版,不如认为MagicMock是Mock的扩展版。这应该解决您有关Mock为什么存在以及Mock在MagicMock之上提供了什么的问题。

其次,MagicMock提供了许多魔术方法的默认实现,而Mock没有。有关提供的魔术方法的更多信息,请参见此处

提供的魔术方法的一些示例:

>>> int(Mock())
TypeError: int() argument must be a string or a number, not 'Mock'
>>> int(MagicMock())
1
>>> len(Mock())
TypeError: object of type 'Mock' has no len()
>>> len(MagicMock())
0

这些可能不那么直观(至少对我而言不直观):

>>> with MagicMock():
...     print 'hello world'
...
hello world
>>> MagicMock()[1]
<MagicMock name='mock.__getitem__()' id='4385349968'>

您可以“看到”添加到MagicMock的方法,因为这些方法是首次调用的:

>>> magic1 = MagicMock()
>>> dir(magic1)
['assert_any_call', 'assert_called_once_with', ...]
>>> int(magic1)
1
>>> dir(magic1)
['__int__', 'assert_any_call', 'assert_called_once_with', ...]
>>> len(magic1)
0
>>> dir(magic1)
['__int__', '__len__', 'assert_any_call', 'assert_called_once_with', ...]

那么,为什么不一直使用MagicMock呢?

回到您的问题是:您可以使用默认的魔术方法实现吗?例如,mocked_object[1]可以不出错吗?您是否可以接受由于魔术方法实现而导致的任何意外后果?

如果对这些问题的回答为“是”,请继续使用MagicMock。否则,坚持模拟。

To begin with, MagicMock is a subclass of Mock.

class MagicMock(MagicMixin, Mock)

As a result, MagicMock provides everything that Mock provides and more. Rather than thinking of Mock as being a stripped down version of MagicMock, think of MagicMock as an extended version of Mock. This should address your questions about why Mock exists and what does Mock provide on top of MagicMock.

Secondly, MagicMock provides default implementations of many/most magic methods, whereas Mock doesn’t. See here for more information on the magic methods provided.

Some examples of provided magic methods:

>>> int(Mock())
TypeError: int() argument must be a string or a number, not 'Mock'
>>> int(MagicMock())
1
>>> len(Mock())
TypeError: object of type 'Mock' has no len()
>>> len(MagicMock())
0

And these which may not be as intuitive (at least not intuitive to me):

>>> with MagicMock():
...     print 'hello world'
...
hello world
>>> MagicMock()[1]
<MagicMock name='mock.__getitem__()' id='4385349968'>

You can “see” the methods added to MagicMock as those methods are invoked for the first time:

>>> magic1 = MagicMock()
>>> dir(magic1)
['assert_any_call', 'assert_called_once_with', ...]
>>> int(magic1)
1
>>> dir(magic1)
['__int__', 'assert_any_call', 'assert_called_once_with', ...]
>>> len(magic1)
0
>>> dir(magic1)
['__int__', '__len__', 'assert_any_call', 'assert_called_once_with', ...]

So, why not use MagicMock all the time?

The question back to you is: Are you okay with the default magic method implementations? For example, is it okay for mocked_object[1] to not error? Are you okay with any unintended consequences due to the magic method implementations being already there?

If the answer to these questions is a yes, then go ahead and use MagicMock. Otherwise, stick to Mock.
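
The interpreter sessions above come from an older Python. A rough Python 3 equivalent, written as a script with illustrative variable names, might look like this:

```python
from unittest.mock import Mock, MagicMock

magic = MagicMock()
as_int = int(magic)    # preconfigured __int__
length = len(magic)    # preconfigured __len__
item = magic[1]        # __getitem__ hands back another MagicMock

# __enter__ and __exit__ are preconfigured too, so this does not raise.
with magic:
    print('hello world')

# A plain Mock supports none of these protocols unless configured.
try:
    Mock()[1]
    plain_subscriptable = True
except TypeError:
    plain_subscriptable = False

print(as_int, length, plain_subscriptable)
```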


回答 3

这就是python的官方文档 所说的:

在大多数这些示例中,Mock和MagicMock类是可互换的。由于MagicMock是功能更强大的类,因此默认情况下会使用一个明智的类。

This is what python’s official documentation says:

In most of these examples the Mock and MagicMock classes are interchangeable. As the MagicMock is the more capable class it makes a sensible one to use by default.


回答 4

我发现了另一种特殊情况,其中简单 Mock可能比MagicMock:更有用

In [1]: from unittest.mock import Mock, MagicMock, ANY
In [2]: mock = Mock()
In [3]: magic = MagicMock()
In [4]: mock.foo == ANY
Out[4]: True
In [5]: magic.foo == ANY
Out[5]: False

ANY与之比较可能很有用,例如,比较两个字典之间的几乎每个键,其中使用模拟来计算某些值。

如果您使用的是Mock


self.assertDictEqual(my_dict, {
  'hello': 'world',
  'another': ANY
})

AssertionError如果您使用过,它将引发一个MagicMock

I’ve found another particular case where a simple Mock may turn out to be more useful than MagicMock:

In [1]: from unittest.mock import Mock, MagicMock, ANY
In [2]: mock = Mock()
In [3]: magic = MagicMock()
In [4]: mock.foo == ANY
Out[4]: True
In [5]: magic.foo == ANY
Out[5]: False

Comparing against ANY can be useful, for example, comparing almost every key between two dictionaries where some value is calculated using a mock.

This will be valid if you’re using Mock:


self.assertDictEqual(my_dict, {
  'hello': 'world',
  'another': ANY
})

while it will raise an AssertionError if you’ve used MagicMock


使用pip为特定的python版本安装模块

问题:使用pip为特定的python版本安装模块

在Ubuntu 10.04上,默认情况下安装了Python 2.6,然后我安装了Python 2.7。如何使用pip installPython 2.7安装软件包。

例如:

pip install beautifulsoup4

默认情况下会为Python 2.6安装BeautifulSoup

当我做:

import bs4

在Python 2.6中可以使用,但在Python 2.7中可以显示:

No module named bs4

On Ubuntu 10.04, Python 2.6 is installed by default, and I have since installed Python 2.7. How can I use pip install to install packages for Python 2.7?

For example:

pip install beautifulsoup4

by default installs BeautifulSoup for Python 2.6

When I do:

import bs4

in Python 2.6 it works, but in Python 2.7 it says:

No module named bs4

回答 0

pip对要安装新软件包的Python实例使用已安装的版本。

在许多发行版中,可能会有单独的python2.6-pippython2.7-pip程序包,并使用诸如pip-2.6和的二进制名称来调用pip-2.7。如果未将pip打包到所需目标的发行版中,则可能需要寻找setuptools或easyinstall软件包,或使用virtualenv(在生成的环境中始终包含pip)。

如果您在发行版中找不到任何内容,请在pip的网站上提供安装说明

Use a version of pip installed against the Python instance you want to install new packages to.

In many distributions, there may be separate python2.6-pip and python2.7-pip packages, invoked with binary names such as pip-2.6 and pip-2.7. If pip is not packaged in your distribution for the desired target, you might look for a setuptools or easyinstall package, or use virtualenv (which will always include pip in a generated environment).

pip’s website includes installation instructions, if you can’t find anything within your distribution.


回答 1

另外,由于pip其本身是用python编写的,因此您可以使用要为其安装软件包的python版本进行调用:

python2.7 -m pip install foo

Alternatively, since pip itself is written in python, you can just call it with the python version you want to install the package for:

python2.7 -m pip install foo
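
The same pattern can be driven from a script. The sketch below is only an illustration: pip_command and install are hypothetical helper names, not part of pip, and sys.executable stands for whichever interpreter is running the script.

```python
import subprocess
import sys


def pip_command(interpreter, *args):
    """Build the argument list for running pip under a given interpreter."""
    return [interpreter, "-m", "pip", *args]


def install(package, interpreter=sys.executable):
    """Install `package` for `interpreter`; raises CalledProcessError on failure."""
    subprocess.run(pip_command(interpreter, "install", package), check=True)


# Show the command that would be run for Python 2.7 without executing it.
print(pip_command("python2.7", "install", "foo"))
```

Using the interpreter's own -m pip guarantees that the package lands in that interpreter's site-packages, whatever a bare pip on the PATH happens to point to.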

回答 2

您可以使用相应的python为特定的python版本执行 pip模块:

Python 2.6:

python2.6 -m pip install beautifulsoup4

Python 2.7

python2.7 -m pip install beautifulsoup4

You can execute pip module for a specific python version using the corresponding python:

Python 2.6:

python2.6 -m pip install beautifulsoup4

Python 2.7

python2.7 -m pip install beautifulsoup4

回答 3

您可以使用此语法

python_version -m pip install your_package

例如。如果您正在运行python3.5,则将其命名为“ python3”,并想安装numpy软件包

python3 -m pip install numpy

You can use this syntax

python_version -m pip install your_package

For example, if you are running Python 3.5 under the name “python3” and want to install the numpy package:

python3 -m pip install numpy

回答 4

在Windows中,您可以通过提及python版本来执行pip模块(您需要确保启动器在您的路径上)

py -2 -m pip install pyfora

In Windows, you can execute the pip module by mentioning the python version ( You need to ensure that the launcher is on your path )

py -2 -m pip install pyfora

回答 5

另外,如果您想使用特定版本的python安装软件包的特定版本,可以采用这种方法

sudo python2.7 -m pip install pyudev=0.16

如果“ =”无效,请使用==

x@ubuntuserv:~$ sudo python2.7 -m pip install pyudev=0.16

无效的要求:’pyudev = 0.16’=不是有效的运算符。你是说==吗?

x@ubuntuserv:~$ sudo python2.7 -m pip install pyudev==0.16

工作良好

Alternatively, if you want to install a specific version of a package for a specific version of Python, pin the version with ==:

sudo python2.7 -m pip install pyudev==0.16

A single “=” is not a valid version operator, and pip rejects it:

x@ubuntuserv:~$ sudo python2.7 -m pip install pyudev=0.16

Invalid requirement: ‘pyudev=0.16’ = is not a valid operator. Did you mean == ?

x@ubuntuserv:~$ sudo python2.7 -m pip install pyudev==0.16

works fine


回答 6

Python 2

sudo pip2 install johnbonjovi  

Python 3

sudo pip3 install johnbonjovi

Python 2

sudo pip2 install johnbonjovi  

Python 3

sudo pip3 install johnbonjovi

回答 7

如果您同时安装了2.7和3.x版本的python,则只需将python 3.x版本的python exe文件重命名为类似的名称-将“ python.exe”重命名为“ python3.exe”。现在,您可以将pip分别用于两个版本。如果您通常键入“ pip install”,则默认情况下将考虑2.7版本。如果要在3.x版本上安装它,则需要将命令调用为“ python3 -m pip install”。

If you have both 2.7 and 3.x versions of Python installed, rename the Python 3.x executable from “python.exe” to something like “python3.exe”. You can then use pip for each version separately. A plain “pip install ” will use the 2.7 version by default; to install for the 3.x version, run “python3 -m pip install ”.


回答 8

对于Python 3

sudo apt-get install python3-pip
sudo pip3 install beautifulsoup4

对于Python 2

sudo apt-get install python2-pip
sudo pip2 install beautifulsoup4

在Debian / Ubuntu上, pip是在安装适用于Python 2的软件包时使用的命令,pip3而是在安装适用于Python 3的软件包时使用的命令。

For Python 3

sudo apt-get install python3-pip
sudo pip3 install beautifulsoup4

For Python 2

sudo apt-get install python2-pip
sudo pip2 install beautifulsoup4

On Debian/Ubuntu, pip is the command to use when installing packages for Python 2, while pip3 is the command to use when installing packages for Python 3.


回答 9

对于python2使用:

py -2 -m pip install beautifulsoup4

for python2 use:

py -2 -m pip install beautifulsoup4

回答 10

与其他任何python脚本一样,您可以指定运行它的python安装。您可以将其放在外壳配置文件中以保存别名。该$1指的是你传递给脚本的第一个参数。

# PYTHON3 PIP INSTALL V2
alias pip_install3="python3 -m $(which pip) install $1"

As with any other Python script, you can choose which Python installation runs pip. You can put this alias in your shell profile. Anything typed after the alias is appended to the command, so no $1 is needed (aliases do not expand positional parameters).

# PYTHON3 PIP INSTALL V2
alias pip_install3="python3 -m pip install"

回答 11

我在Windows上通过Chocolatey安装了Python 2.7 ,并在pip2.7.exe中找到了C:\tools\python2\Scripts

使用此可执行文件而不是pip命令为我安装了正确的模块(requests对于Python 2.7)。

I had Python 2.7 installed via chocolatey on Windows and found pip2.7.exe in C:\tools\python2\Scripts.

Using this executable instead of the pip command installed the correct module for me (requests for Python 2.7).


回答 12

我在另一个名为Twisted的软件包中也遇到了类似的问题。我想为Python 2.7安装它,但仅为Python 2.6(系统的默认版本)安装了它。

进行简单的更改对我有用。

在将Python 2.7的路径添加到$PATH变量时,请像这样将其追加到前面:PATH=/usr/local/bin:$PATH,以便系统使用该版本。

如果您遇到更多问题,可以关注这篇对我有帮助的博客文章-https: //github.com/h2oai/h2o-2/wiki/installing-python-2.7-on-centos-6.3.-follow-this-sequence仅用于centos机器

I faced a similar problem with another package called Twisted. I wanted to install it for Python 2.7, but it only got installed for Python 2.6 (system’s default version).

Making a simple change worked for me.

When adding Python 2.7’s path to your $PATH variable, append it to the front like this: PATH=/usr/local/bin:$PATH, so that the system uses that version.

If you face more problems, you can follow this blog post which helped me – https://github.com/h2oai/h2o-2/wiki/installing-python-2.7-on-centos-6.3.-follow-this-sequence-exactly-for-centos-machine-only


熊猫:设置编号。最大行数

问题:熊猫:设置编号。最大行数

我在查看以下内容时遇到问题DataFrame

n = 100
foo = DataFrame(index=range(n))
foo['floats'] = np.random.randn(n)
foo

问题是它不会在ipython笔记本中默认情况下不打印所有行,但是我必须切片才能查看结果行。甚至以下选项也不会更改输出:

pd.set_option('display.max_rows', 500)

有谁知道如何显示整个数组?

I have a problem viewing the following DataFrame:

n = 100
foo = DataFrame(index=range(n))
foo['floats'] = np.random.randn(n)
foo

The problem is that it does not print all rows per default in ipython notebook, but I have to slice to view the resulting rows. Even the following option does not change the output:

pd.set_option('display.max_rows', 500)

Does anyone know how to display the whole array?


回答 0

设置display.max_rows

pd.set_option('display.max_rows', 500)

对于较早版本的熊猫(<= 0.11.0),您需要同时更改display.heightdisplay.max_rows

pd.set_option('display.height', 500)
pd.set_option('display.max_rows', 500)

另请参阅pd.describe_option('display')

您只能一次临时设置一个选项,如下所示:

from IPython.display import display
with pd.option_context('display.max_rows', 100, 'display.max_columns', 10):
    display(df) #need display to show the dataframe when using with in jupyter
    #some pandas stuff

您还可以将选项重置为默认值,如下所示:

pd.reset_option('display.max_rows')

然后将它们全部重置:

pd.reset_option('all')

Set display.max_rows:

pd.set_option('display.max_rows', 500)

For older versions of pandas (<=0.11.0) you need to change both display.height and display.max_rows.

pd.set_option('display.height', 500)
pd.set_option('display.max_rows', 500)

See also pd.describe_option('display').

You can set an option only temporarily for this one time like this:

from IPython.display import display
with pd.option_context('display.max_rows', 100, 'display.max_columns', 10):
    display(df) #need display to show the dataframe when using with in jupyter
    #some pandas stuff

You can also reset an option back to its default value like this:

pd.reset_option('display.max_rows')

And reset all of them back:

pd.reset_option('all')


回答 1

就个人而言,我喜欢直接使用赋值语句设置选项,因为iPython使得通过制表符补全很容易找到。我很难记住确切的选项名称是什么,因此此方法对我有用。

例如,我要记住的是,它始于 pd.options

pd.options.<TAB>

大多数选项在 display

pd.options.display.<TAB>

从这里,我通常输出如下所示的当前值:

pd.options.display.max_rows
60

然后,将其设置为我想要的样子:

pd.options.display.max_rows = 100

另外,您应该注意用于选项的上下文管理器,它可以在代码块内临时设置选项。将选项名称作为字符串传递,后跟所需的值。您可以在同一行中传递任意数量的选项:

with pd.option_context('display.max_rows', 100, 'display.max_columns', 10):
    some pandas stuff

您还可以将选项重置为默认值,如下所示:

pd.reset_option('display.max_rows')

然后将它们全部重置:

pd.reset_option('all')

通过设置选项仍然非常好pd.set_option。我只是发现直接使用属性更容易,并且对get_option和的需求也更少set_option

Personally, I like setting the options directly with an assignment statement as it is easy to find via tab completion thanks to iPython. I find it hard to remember what the exact option names are, so this method works for me.

For instance, all I have to remember is that it begins with pd.options

pd.options.<TAB>

Most of the options are available under display

pd.options.display.<TAB>

From here, I usually output what the current value is like this:

pd.options.display.max_rows
60

I then set it to what I want it to be:

pd.options.display.max_rows = 100

Also, you should be aware of the context manager for options, which temporarily sets the options inside of a block of code. Pass in the option name as a string followed by the value you want it to be. You may pass in any number of options in the same line:

with pd.option_context('display.max_rows', 100, 'display.max_columns', 10):
    pass  # some pandas stuff

You can also reset an option back to its default value like this:

pd.reset_option('display.max_rows')

And reset all of them back:

pd.reset_option('all')

It is still perfectly good to set options via pd.set_option. I just find using the attributes directly is easier and there is less need for get_option and set_option.
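
To see that the attribute style and the function style read the same setting, and that option_context restores the previous value on exit, here is a small sketch (the variable names are illustrative, and 500 is an arbitrary value):

```python
import pandas as pd

before = pd.get_option('display.max_rows')

with pd.option_context('display.max_rows', 500):
    # Attribute access and get_option read the same underlying option.
    inside = pd.options.display.max_rows

after = pd.get_option('display.max_rows')
print(before, inside, after)
```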


回答 2

此注释此答案中已经指出了一点,但是我将尝试对该问题给出更直接的答案:

from IPython.display import display
import numpy as np
import pandas as pd

n = 100
foo = pd.DataFrame(index=range(n))
foo['floats'] = np.random.randn(n)

with pd.option_context("display.max_rows", foo.shape[0]):
    display(foo)

从pandas 0.13.1(pandas 0.13.1发行说明)开始,pandas.option_context可用。根据

[it]允许您执行带有一组选项的代码块,当您退出with块时,这些选项会还原为先前的设置。

It was already pointed out in this comment and in this answer, but I’ll try to give a more direct answer to the question:

from IPython.display import display
import numpy as np
import pandas as pd

n = 100
foo = pd.DataFrame(index=range(n))
foo['floats'] = np.random.randn(n)

with pd.option_context("display.max_rows", foo.shape[0]):
    display(foo)

pandas.option_context is available since pandas 0.13.1 (pandas 0.13.1 release notes). According to this,

[it] allow[s] you to execute a codeblock with a set of options that revert to prior settings when you exit the with block.


回答 3

正如@hanleyhansen在评论中指出的那样,从0.18.1版本开始,该display.height选项已被弃用,并说“使用display.max_rows代替”。因此,您只需要像这样配置它:

pd.set_option('display.max_rows', 500)

请参阅发行说明-pandas 0.18.1文档

现在已弃用的display.height,display.width仅是一个格式选项,无法控制摘要的触发,类似于<0.11.0。

As @hanleyhansen noted in a comment, as of version 0.18.1, the display.height option is deprecated, and says “use display.max_rows instead”. So you just have to configure it like this:

pd.set_option('display.max_rows', 500)

See the Release Notes — pandas 0.18.1 documentation:

Deprecated display.height, display.width is now only a formatting option does not control triggering of summary, similar to < 0.11.0.


回答 4

pd.set_option('display.max_rows', 500)
df

不工作的Jupyter!
而是使用:

pd.set_option('display.max_rows', 500)
df.head(500)
pd.set_option('display.max_rows', 500)
df

Does not work in Jupyter!
Instead use:

pd.set_option('display.max_rows', 500)
df.head(500)

回答 5

正如这个答案类似的问题,不存在需要破解的设置。编写起来要简单得多:

print(foo.to_string())

As in this answer to a similar question, there is no need to hack settings. It is much simpler to write:

print(foo.to_string())
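
A quick way to confirm that to_string() renders every row, using the frame from the question (text and lines are illustrative names):

```python
import numpy as np
import pandas as pd

n = 100
foo = pd.DataFrame(index=range(n))
foo['floats'] = np.random.randn(n)

# to_string() is not limited by display.max_rows unless max_rows is passed.
text = foo.to_string()
lines = text.splitlines()
print(len(lines))
```

The rendered text has one header line followed by one line per row, so nothing is elided.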

为什么使用“评估”是一种不好的做法?

问题:为什么使用“评估”是一种不好的做法?

我正在使用以下类轻松存储我的歌曲的数据。

class Song:
    """The class to store the details of each song"""
    attsToStore=('Name', 'Artist', 'Album', 'Genre', 'Location')
    def __init__(self):
        for att in self.attsToStore:
            exec 'self.%s=None'%(att.lower()) in locals()
    def setDetail(self, key, val):
        if key in self.attsToStore:
            exec 'self.%s=val'%(key.lower()) in locals()

我觉得这比写一个代码if/else块更具扩展性。但是,这eval似乎被认为是不良做法,使用不安全。如果是这样,有人可以向我解释原因并向我展示定义上述类的更好方法吗?

I am using the following class to easily store data of my songs.

class Song:
    """The class to store the details of each song"""
    attsToStore=('Name', 'Artist', 'Album', 'Genre', 'Location')
    def __init__(self):
        for att in self.attsToStore:
            exec 'self.%s=None'%(att.lower()) in locals()
    def setDetail(self, key, val):
        if key in self.attsToStore:
            exec 'self.%s=val'%(key.lower()) in locals()

I feel that this is just much more extensible than writing out an if/else block. However, eval seems to be considered a bad practice and unsafe to use. If so, can anyone explain to me why and show me a better way of defining the above class?


回答 0

是的,使用eval是一种不好的做法。仅出于以下几个原因:

  1. 几乎总有一种更好的方法
  2. 非常危险和不安全
  3. 使调试困难

您可以使用setattr代替:

class Song:
    """The class to store the details of each song"""
    attsToStore=('Name', 'Artist', 'Album', 'Genre', 'Location')
    def __init__(self):
        for att in self.attsToStore:
            setattr(self, att.lower(), None)
    def setDetail(self, key, val):
        if key in self.attsToStore:
            setattr(self, key.lower(), val)

编辑:

在某些情况下,您必须使用eval或exec。但是它们很少见。当然,在您的情况下使用eval是一个不好的做法。我要强调不好的做法,因为eval和exec经常在错误的地方使用。

编辑2:

似乎有些不同意,在OP案件中,评估是“非常危险和不安全的”。对于这种特定情况,这可能是正确的,但一般而言并非如此。问题是一般性的,我列出的理由也适用于一般性情况。

编辑3: 重新排序的点1和4

Yes, using eval is a bad practice. Just to name a few reasons:

  1. There is almost always a better way to do it
  2. Very dangerous and insecure
  3. Makes debugging difficult
  4. Slow

In your case you can use setattr instead:

class Song:
    """The class to store the details of each song"""
    attsToStore=('Name', 'Artist', 'Album', 'Genre', 'Location')
    def __init__(self):
        for att in self.attsToStore:
            setattr(self, att.lower(), None)
    def setDetail(self, key, val):
        if key in self.attsToStore:
            setattr(self, key.lower(), val)

EDIT:

There are some cases where you have to use eval or exec. But they are rare. Using eval in your case is a bad practice for sure. I’m emphasizing on bad practice because eval and exec are frequently used in the wrong place.

EDIT 2:

It looks like some disagree that eval is ‘very dangerous and insecure’ in the OP case. That might be true for this specific case but not in general. The question was general and the reasons I listed are true for the general case as well.

EDIT 3: Reordered point 1 and 4
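
When the goal is only to read a value such as a list or a number from text, the standard library's ast.literal_eval is a commonly used safer substitute. It is a different technique from the setattr fix above, shown here as a sketch; parse_literal is a hypothetical helper name:

```python
import ast


def parse_literal(text):
    """Return the Python literal in `text`, or None if it is anything else.

    Note that the literal None is indistinguishable from a failure here.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


print(parse_literal("[1, 2, 3]"))
# Function calls are not literals, so this is rejected instead of executed.
print(parse_literal("__import__('os').listdir('.')"))
```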


回答 1

使用eval是很弱的,不是一个明显的习惯。

  1. 它违反了“软件基本原理”。您的来源不是可执行文件的总和。除了您的资料来源外,还eval必须清楚地了解到的参数。因此,它是万不得已的工具。

  2. 通常,这是经过漫长设计的标志。动态构建动态源代码的理由很少。委托和其他OO设计技术几乎可以完成任何事情。

  3. 这会导致相对缓慢的小代码即时编译。通过使用更好的设计模式可以避免开销。

作为注脚,在精神错乱的社会主义者的手中,这可能效果不佳。但是,当遇到精神错乱的用户或管理员时,最好不要首先让他们理解Python。在真正的邪恶之手,Python可以承担责任。eval完全不会增加风险。

Using eval is weak, not a clearly bad practice.

  1. It violates the “Fundamental Principle of Software”. Your source is not the sum total of what’s executable. In addition to your source, there are the arguments to eval, which must be clearly understood. For this reason, it’s the tool of last resort.

  2. It’s usually a sign of thoughtless design. There’s rarely a good reason for dynamic source code, built on-the-fly. Almost anything can be done with delegation and other OO design techniques.

  3. It leads to relatively slow on-the-fly compilation of small pieces of code. An overhead which can be avoided by using better design patterns.

As a footnote, in the hands of deranged sociopaths, it may not work out well. However, when confronted with deranged sociopathic users or administrators, it’s best to not give them interpreted Python in the first place. In the hands of the truly evil, Python can be a liability; eval doesn’t increase the risk at all.


回答 2

在这种情况下,可以。代替

exec 'self.Foo=val'

您应该使用内置函数setattr

setattr(self, 'Foo', val)

In this case, yes. Instead of

exec 'self.Foo=val'

you should use the builtin function setattr:

setattr(self, 'Foo', val)

回答 3

是的:

使用Python破解:

>>> eval(input())
"__import__('os').listdir('.')"
...........
...........   #dir listing
...........

下面的代码将列出在Windows计算机上运行的所有任务。

>>> eval(input())
"__import__('subprocess').Popen(['tasklist'],stdout=__import__('subprocess').PIPE).communicate()[0]"

在Linux中:

>>> eval(input())
"__import__('subprocess').Popen(['ps', 'aux'],stdout=__import__('subprocess').PIPE).communicate()[0]"

Yes, it is:

Hack using Python:

>>> eval(input())
"__import__('os').listdir('.')"
...........
...........   #dir listing
...........

The below code will list all tasks running on a Windows machine.

>>> eval(input())
"__import__('subprocess').Popen(['tasklist'],stdout=__import__('subprocess').PIPE).communicate()[0]"

In Linux:

>>> eval(input())
"__import__('subprocess').Popen(['ps', 'aux'],stdout=__import__('subprocess').PIPE).communicate()[0]"

回答 4

值得注意的是,对于有问题的特定问题,可以使用eval以下几种替代方法:

如上所述,最简单的方法是使用setattr

def __init__(self):
    for name in self.attsToStore:
        setattr(self, name, None)

一种不太明显的方法是__dict__直接更新对象的对象。如果您要做的只是将属性初始化为None,那么这比上面的方法要简单。但是考虑一下:

def __init__(self, **kwargs):
    for name in self.attsToStore:
       self.__dict__[name] = kwargs.get(name, None)

这使您可以将关键字参数传递给构造函数,例如:

s = Song(name='History', artist='The Verve')

它还允许您locals()更加明确地使用它,例如:

s = Song(**locals())

…并且,如果您确实要分配None名称的属性,请在中找到locals()

s = Song(**dict([(k, None) for k in locals().keys()]))

为对象提供属性列表默认值的另一种方法是定义类的__getattr__方法:

def __getattr__(self, name):
    if name in self.attsToStore:
        return None
    raise AttributeError(name)

如果无法以常规方式找到named属性,则调用此方法。这种方法比简单地在构造函数中设置属性或更新的方式要简单一些__dict__,但是它的优点是除非存在该属性,否则不实际创建该属性,这样可以大大减少类的内存使用量。

所有这些的要点:通常有很多原因可以避免:避免eval执行无法控制的代码的安全性问题,无法调试的代码的实际问题等。但是,更重要的原因是通常,您不需要使用它。Python向程序员公开了很多内部机制,因此您几乎不需要编写编写代码的代码。

It’s worth noting that for the specific problem in question, there are several alternatives to using eval:

The simplest, as noted, is using setattr:

def __init__(self):
    for name in self.attsToStore:
        setattr(self, name, None)

A less obvious approach is to update the object’s __dict__ directly. If all you want to do is initialize the attributes to None, this is less straightforward than the above. But consider this:

def __init__(self, **kwargs):
    for name in self.attsToStore:
        self.__dict__[name] = kwargs.get(name, None)

This allows you to pass keyword arguments to the constructor, e.g.:

s = Song(name='History', artist='The Verve')

It also allows you to make your use of locals() more explicit, e.g.:

s = Song(**locals())

…and, if you really want to assign None to the attributes whose names are found in locals():

s = Song(**dict([(k, None) for k in locals().keys()]))
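Putting the __dict__ approach together, a minimal self-contained sketch might look like the following. The attribute names in attsToStore are assumptions made for illustration (the original class definition is not shown); only Song and the name/artist keywords come from the example above. dict.fromkeys is used here as a shorter way to build a dict whose values are all None.

```python
class Song:
    # Hypothetical attribute list; the original class body is not shown.
    attsToStore = ("name", "artist", "album")

    def __init__(self, **kwargs):
        # Every listed attribute is set; keywords that are missing default to None.
        for name in self.attsToStore:
            self.__dict__[name] = kwargs.get(name, None)


s = Song(name="History", artist="The Verve")
print(s.name, s.artist, s.album)

# dict.fromkeys maps every key to None without building a list of tuples first.
defaults = dict.fromkeys(Song.attsToStore)
blank = Song(**defaults)
print(blank.__dict__)
```

Note that in this sketch any keyword that is not listed in attsToStore is silently ignored, which may or may not be what you want.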

Another approach to providing an object with default values for a list of attributes is to define the class’s __getattr__ method:

def __getattr__(self, name):
    if name in self.attsToStore:
        return None
    # AttributeError (not NameError) keeps hasattr() and getattr() defaults working
    raise AttributeError(name)

This method is called only when the named attribute isn’t found in the normal way. This approach is somewhat less straightforward than simply setting the attributes in the constructor or updating __dict__, but it has the merit of not actually creating the attribute until a value is assigned to it, which can substantially reduce the class’s memory usage.
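As a rough illustration of the __getattr__ approach, the sketch below shows that a default of None is reported without anything being stored on the instance. The attribute names are made up for the example, and the class raises AttributeError for unknown names, which is what hasattr() and getattr() with a default expect.

```python
class Song:
    # Hypothetical attribute list, used only for this illustration.
    attsToStore = ("name", "artist", "album")

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        if name in self.attsToStore:
            return None
        raise AttributeError(name)


s = Song()
s.name = "History"          # stored normally, so __getattr__ is bypassed for it

print(s.name)               # History
print(s.artist)             # None
print(s.__dict__)           # {'name': 'History'}
print(hasattr(s, "genre"))  # False
```

Only the attribute that was actually assigned ends up in the instance’s __dict__; the defaults cost no per-instance storage.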

The point of all this: There are lots of reasons, in general, to avoid eval – the security problem of executing code that you don’t control, the practical problem of code you can’t debug, etc. But an even more important reason is that, generally, you don’t need to use it. Python exposes so many of its internal mechanisms to the programmer that you rarely need to write code that writes code.


Answer 5

Other users pointed out how your code can be changed so that it does not depend on eval; I’ll offer a legitimate use case for eval, one that is found even in CPython: testing.

Here’s one example I found in test_unary.py, a test of whether (+|-|~)b'a' raises a TypeError:

def test_bad_types(self):
    for op in '+', '-', '~':
        self.assertRaises(TypeError, eval, op + "b'a'")
        self.assertRaises(TypeError, eval, op + "'a'")

The usage is clearly not bad practice here; you define the input and merely observe behavior. eval is handy for testing.
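To try this outside the CPython test suite, the same check can be wrapped in a small standalone test case. This is only a sketch: the class name and the use of subTest are my additions, not part of test_unary.py.

```python
import unittest


class UnaryOpTests(unittest.TestCase):
    def test_bad_types(self):
        for op in "+", "-", "~":
            # subTest reports each operator separately if one of them fails.
            with self.subTest(op=op):
                self.assertRaises(TypeError, eval, op + "b'a'")
                self.assertRaises(TypeError, eval, op + "'a'")


# Run the test case programmatically instead of via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UnaryOpTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

The strings passed to eval are fixed by the test author, so none of the usual concerns about untrusted input apply.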

Take a look at a search for eval in the CPython git repository; eval is used heavily in its tests.


Answer 6

When eval() is used to process user-provided input, you let the user drop into a REPL by supplying something like this:

"__import__('code').InteractiveConsole(locals=globals()).interact()"

You may get away with it, but normally you don’t want vectors for arbitrary code execution in your applications.
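If the input is only ever meant to be a plain Python literal (a number, string, list, dict and so on), one commonly used alternative is ast.literal_eval from the standard library, which parses literals and rejects function calls. The sketch below assumes that case; the helper name parse_user_value and the choice to return None for rejected input are illustrative, not a recommendation for every application.

```python
import ast


def parse_user_value(text):
    """Parse a Python literal typed by a user without executing any code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Hypothetical policy: anything that is not a literal counts as invalid.
        return None


print(parse_user_value("[1, 2, 3]"))                      # [1, 2, 3]
print(parse_user_value("__import__('os').listdir('.')"))  # None
```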


Answer 7

In addition to @Nadia Alramli’s answer: since I am new to Python and was eager to check how using eval affects timings, I tried a small program. These were my observations:

# Difference between print(eval(x)) and print(x) for 100000 numeric strings = 1.134597 s

from datetime import datetime

def strOfNos():
    s = []
    for x in range(100000):
        s.append(str(x))
    return s

print(datetime.now())
for x in strOfNos():
    print(x)  # swap in print(eval(x)) to time the eval variant
print(datetime.now())

# when using print(eval(x))
# 2018-10-29 12:36:08.206022
# 2018-10-29 12:36:10.407911
# diff = 2.201889 s

# when using print(x) only
# 2018-10-29 12:37:50.022753
# 2018-10-29 12:37:51.090045
# diff = 1.067292 s
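Timing with datetime.now() around a loop that prints also measures the cost of printing. A sketch using the standard timeit module, comparing eval against int on the same strings, might isolate the conversion cost better. The function names and the input size are arbitrary choices for illustration, and the measured numbers will differ from machine to machine.

```python
import timeit

numbers = [str(x) for x in range(1000)]


def with_eval():
    # Parses and evaluates each string as a Python expression.
    return [eval(s) for s in numbers]


def with_int():
    # Converts each string directly, with no expression parsing.
    return [int(s) for s in numbers]


eval_time = timeit.timeit(with_eval, number=20)
int_time = timeit.timeit(with_int, number=20)
print("eval: %.4f s, int: %.4f s" % (eval_time, int_time))
```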

Converting a dict to an OrderedDict

Question: Converting a dict to an OrderedDict

I am having some trouble using the collections.OrderedDict class. I am using Python 2.7 on Raspbian, the Debian distro for Raspberry Pi. I am trying to print two dictionaries in order for comparison (side-by-side) for a text-adventure. The order is essential to compare accurately. No matter what I try the dictionaries print in their usual unordered way.

Here’s what I get when I do it on my RPi:

import collections

ship = {"NAME": "Albatross",
         "HP":50,
         "BLASTERS":13,
         "THRUSTERS":18,
         "PRICE":250}

ship = collections.OrderedDict(ship)

print ship
# OrderedDict([('PRICE', 250), ('HP', 50), ('NAME', 'Albatross'), ('BLASTERS', 13), ('THRUSTERS', 18)])

Obviously there is something not right because it is printing the function call and putting the keys and value groups into a nested list…

This is what I got by running something similar on my PC:

import collections

Joe = {"Age": 28, "Race": "Latino", "Job": "Nurse"}
Bob = {"Age": 25, "Race": "White", "Job": "Mechanic", "Random": "stuff"}

#Just for clarity:
Joe = collections.OrderedDict(Joe)
Bob = collections.OrderedDict(Bob)

print Joe
# OrderedDict([('Age', 28), ('Race', 'Latino'), ('Job', 'Nurse')])
print Bob
# OrderedDict([('Age', 25), ('Race', 'White'), ('Job', 'Mechanic'), ('Random', 'stuff')])

This time, it is in order, but it shouldn’t be printing the other things though right? (The putting it into list and showing function call.)

Where am I making my error? It shouldn’t be anything to do with the pi version of Python because it is just the Linux version.


Answer 0

You are creating a dictionary first, then passing that dictionary to an OrderedDict. For Python versions < 3.6 (*), by the time you do that, the ordering is no longer going to be correct. dict is inherently not ordered.

Pass in a sequence of tuples instead:

ship = [("NAME", "Albatross"),
        ("HP", 50),
        ("BLASTERS", 13),
        ("THRUSTERS", 18),
        ("PRICE", 250)]
ship = collections.OrderedDict(ship)

What you see when you print the OrderedDict is its representation, and it is entirely correct. OrderedDict([('PRICE', 250), ('HP', 50), ('NAME', 'Albatross'), ('BLASTERS', 13), ('THRUSTERS', 18)]) just shows you, in a reproducible representation, what the contents of the OrderedDict are.
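If the constructor-style representation is not what you want on screen, you can iterate over the items and format them yourself. The following is a small sketch under that assumption; the column width of 10 is an arbitrary choice.

```python
from collections import OrderedDict

ship = OrderedDict([
    ("NAME", "Albatross"),
    ("HP", 50),
    ("BLASTERS", 13),
    ("THRUSTERS", 18),
    ("PRICE", 250),
])

# Iterating an OrderedDict yields items in insertion order, so the rows
# come out in the same order as the tuples above.
lines = ["%-10s %s" % (key, value) for key, value in ship.items()]
print("\n".join(lines))
```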


(*): In the CPython 3.6 implementation, the dict type was updated to use a more memory-efficient internal structure that has the happy side effect of preserving insertion order, and by extension the code shown in the question works without issues. As of Python 3.7, the Python language specification has been updated to require that all Python implementations follow this behaviour. See this other answer of mine for details and for why you may still want to use an OrderedDict() in certain cases.


Answer 1

If you can’t edit the part of the code where your dict is defined, you can still reorder it at any point, in any way you want, like this:

from collections import OrderedDict

order_of_keys = ["key1", "key2", "key3", "key4", "key5"]
list_of_tuples = [(key, your_dict[key]) for key in order_of_keys]
your_dict = OrderedDict(list_of_tuples)
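The same idea can be wrapped in a small helper. This is a sketch only: the function name reorder and the decision to raise KeyError for unknown keys are my assumptions, and keys that are not listed in order_of_keys are dropped from the result.

```python
from collections import OrderedDict


def reorder(mapping, order_of_keys):
    """Return an OrderedDict holding the listed keys of mapping, in that order."""
    missing = [key for key in order_of_keys if key not in mapping]
    if missing:
        raise KeyError("keys not in mapping: %r" % (missing,))
    return OrderedDict((key, mapping[key]) for key in order_of_keys)


ship = {"PRICE": 250, "HP": 50, "NAME": "Albatross"}
ordered = reorder(ship, ["NAME", "HP", "PRICE"])
print(list(ordered.items()))
```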

Answer 2

Most of the time we reach for OrderedDict when we need a custom order rather than a generic one such as ascending.

Here is the proposed solution:

import collections
ship = {"NAME": "Albatross",
         "HP":50,
         "BLASTERS":13,
         "THRUSTERS":18,
         "PRICE":250}

ship = collections.OrderedDict(ship)

print ship


new_dict = collections.OrderedDict()
new_dict["NAME"]=ship["NAME"]
new_dict["HP"]=ship["HP"]
new_dict["BLASTERS"]=ship["BLASTERS"]
new_dict["THRUSTERS"]=ship["THRUSTERS"]
new_dict["PRICE"]=ship["PRICE"]


print new_dict

This outputs:

OrderedDict([('PRICE', 250), ('HP', 50), ('NAME', 'Albatross'), ('BLASTERS', 13), ('THRUSTERS', 18)])
OrderedDict([('NAME', 'Albatross'), ('HP', 50), ('BLASTERS', 13), ('THRUSTERS', 18), ('PRICE', 250)])

Note: A sorted OrderedDict keeps its order when entries are deleted. But when new keys are added, they are appended to the end, so the sort order is not maintained (see the official documentation).
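The behaviour described in the note can be seen in a short example. The keys and values here are made up for illustration.

```python
from collections import OrderedDict

prices = {"THRUSTERS": 18, "BLASTERS": 13, "HP": 50}
by_key = OrderedDict(sorted(prices.items()))
print(list(by_key))     # ['BLASTERS', 'HP', 'THRUSTERS']

del by_key["HP"]        # deleting keeps the remaining order
by_key["ARMOR"] = 7     # a new key goes to the end, not into sorted position
print(list(by_key))     # ['BLASTERS', 'THRUSTERS', 'ARMOR']

# Re-sort explicitly if the sorted order matters after inserting.
resorted = OrderedDict(sorted(by_key.items()))
print(list(resorted))   # ['ARMOR', 'BLASTERS', 'THRUSTERS']
```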


Answer 3

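Use dict.items(); it can be as simple as the following. Note that this keeps whatever order the dict already iterates in, so it only helps on Python 3.6+ where dicts preserve insertion order: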

ship = collections.OrderedDict(ship.items())

tqdm in Jupyter Notebook prints new progress bars repeatedly

Question: tqdm in Jupyter Notebook prints new progress bars repeatedly

I am using tqdm to print progress in a script I’m running in a Jupyter notebook. I am printing all messages to the console via tqdm.write(). However, this still gives me a skewed output like so:

That is, each time a new line has to be printed, a new progress bar is printed on the next line. This does not happen when I run the script via terminal. How can I solve this?


Answer 0

Try using tqdm.notebook.tqdm instead of tqdm, as outlined here.

This could be as simple as changing your import to:

from tqdm.notebook import tqdm

Good luck!

EDIT: After testing, it seems that tqdm actually works fine in ‘text mode’ in a Jupyter notebook. It’s hard to tell because you haven’t provided a minimal example, but it looks like your problem is caused by a print statement in each iteration. The print statement is outputting a number (~0.89) between status bar updates, which is messing up the output. Try removing the print statement.


Answer 1

This is an alternative answer for the case where tqdm_notebook doesn’t work for you.

Given the following example:

from time import sleep
from tqdm import tqdm

values = range(3)
with tqdm(total=len(values)) as pbar:
    for i in values:
        pbar.write('processed: %d' % (1 + i))
        pbar.update(1)
        sleep(1)

The output would look something like this (progress would show up red):

  0%|          | 0/3 [00:00<?, ?it/s]
processed: 1
 67%|██████▋   | 2/3 [00:01<00:00,  1.99it/s]
processed: 2
100%|██████████| 3/3 [00:02<00:00,  1.53it/s]
processed: 3

The problem is that output to stdout and stderr is processed asynchronously, and each stream is handled separately when it comes to new lines.

Say Jupyter receives the first line on stderr and then the “processed” output on stdout. Once it then receives output on stderr to update the progress, it can’t go back and update the first line, because it can only update the last line. Instead it has to write a new line.
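The effect can be mimicked with a toy model. To be clear, this is not how Jupyter is actually implemented; it is a deliberately simplified sketch that assumes a display which may overwrite only its last line, and only when that line came from the same stream. The function name render and the event format are invented for the illustration.

```python
def render(events):
    """Toy display: a '\\r' update may overwrite only the last line, same stream."""
    lines = []  # each entry is [stream, text]
    for stream, text in events:
        is_update = text.startswith("\r")
        body = text.lstrip("\r").rstrip("\n")
        if is_update and lines and lines[-1][0] == stream:
            lines[-1][1] = body           # overwrite the last line in place
        else:
            lines.append([stream, body])  # otherwise a new line is started
    return [text for _, text in lines]


# Progress updates on stderr interleaved with messages on stdout.
mixed = [
    ("stderr", "\r  0%| 0/3"),
    ("stdout", "processed: 1\n"),
    ("stderr", "\r 33%| 1/3"),
    ("stdout", "processed: 2\n"),
    ("stderr", "\r 67%| 2/3"),
]
# The same progress updates with nothing in between.
only_bar = [event for event in mixed if event[0] == "stderr"]

print(len(render(mixed)))     # 5: every bar update lands on a new line
print(len(render(only_bar)))  # 1: the bar is updated in place
```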

Workaround 1, writing to stdout

One workaround would be to output both to stdout instead:

import sys
from time import sleep
from tqdm import tqdm

values = range(3)
with tqdm(total=len(values), file=sys.stdout) as pbar:
    for i in values:
        pbar.write('processed: %d' % (1 + i))
        pbar.update(1)
        sleep(1)

The output will change to (no more red):

processed: 1   | 0/3 [00:00<?, ?it/s]
processed: 2   | 0/3 [00:00<?, ?it/s]
processed: 3   | 2/3 [00:01<00:00,  1.99it/s]
100%|██████████| 3/3 [00:02<00:00,  1.53it/s]

Here we can see that Jupyter doesn’t seem to clear to the end of the line. We could add another workaround for that by padding the message with spaces, such as:

import sys
from time import sleep
from tqdm import tqdm

values = range(3)
with tqdm(total=len(values), file=sys.stdout) as pbar:
    for i in values:
        pbar.write('processed: %d%s' % (1 + i, ' ' * 50))
        pbar.update(1)
        sleep(1)

Which gives us:

processed: 1                                                  
processed: 2                                                  
processed: 3                                                  
100%|██████████| 3/3 [00:02<00:00,  1.53it/s]

Workaround 2, set description instead

In general, it might be more straightforward not to have two outputs but to update the description instead, e.g.:

import sys
from time import sleep
from tqdm import tqdm

values = range(3)
with tqdm(total=len(values), file=sys.stdout) as pbar:
    for i in values:
        pbar.set_description('processed: %d' % (1 + i))
        pbar.update(1)
        sleep(1)

With the output (description updated while it’s processing):

processed: 3: 100%|██████████| 3/3 [00:02<00:00,  1.53it/s]

Conclusion

You can mostly get it to work fine with plain tqdm. But if tqdm_notebook works for you, just use that (though then you probably wouldn’t have read this far).


Answer 2

Most of the answers are outdated now. It is better to import tqdm correctly:

from tqdm import tqdm_notebook as tqdm


Answer 3

If the other tips here don’t work and – just like me – you’re using the pandas integration through progress_apply, you can let tqdm handle it:

from tqdm.autonotebook import tqdm
tqdm.pandas()

df.progress_apply(row_function, axis=1)

The main point here lies in the tqdm.autonotebook module. As stated in their instructions for use in IPython Notebooks, this makes tqdm choose between the progress bar formats used in Jupyter notebooks and Jupyter consoles. For a reason I have not yet investigated further, the format chosen by tqdm.autonotebook works smoothly with pandas, specifically with progress_apply, while none of the others did.


Answer 4

To complete oscarbranson’s answer: it’s possible to automatically pick console or notebook versions of progress bar depending on where it’s being run from:

from tqdm.autonotebook import tqdm

More info can be found here
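For scripts that should also run where tqdm is not installed, the automatic import can be combined with a fallback. The fallback function below is my own assumption and simply iterates without showing a bar; it is not part of tqdm.

```python
try:
    # Picks the notebook or console progress bar depending on the environment.
    from tqdm.autonotebook import tqdm
except ImportError:
    # Hypothetical fallback: no progress bar, just hand back the iterable.
    def tqdm(iterable, **kwargs):
        return iterable

total = 0
for value in tqdm(range(5), desc="summing"):
    total += value
print(total)  # 10
```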


Answer 5

None of the above worked for me. I find that running the following sorts out the issue after an error (it just clears all the instances of progress bars in the background):

from tqdm import tqdm

# blah blah your code errored

tqdm._instances.clear()

Answer 6

Use tqdm_notebook

from tqdm import tqdm_notebook as tqdm

x = [1, 2, 3, 4, 5]

for i in tqdm(range(len(x))):
    print(x[i])

Answer 7

This is for everyone on Windows who couldn’t solve the duplicated bars issue with any of the solutions mentioned here. I had to install the colorama package, as stated in tqdm’s known issues, which fixed it.

pip install colorama

Try it with this example:

from tqdm import tqdm
from time import sleep

for _ in tqdm(range(5), "All", ncols = 80, position = 0):
    for _ in tqdm(range(100), "Sub", ncols = 80, position = 1, leave = False):
        sleep(0.01)

Which will produce something like:

All:  60%|████████████████████████                | 3/5 [00:03<00:02,  1.02s/it]
Sub:  50%|██████████████████▌                  | 50/100 [00:00<00:00, 97.88it/s]
