Category archive: Q&A

How do I fix PyDev "Undefined variable from import" errors?

August 17, 2021 Python实用宝典

Question: How do I fix PyDev "Undefined variable from import" errors?

I’ve got a Python project using PyDev in Eclipse, and PyDev keeps generating false errors for my code. I have a module settings that defines a settings object. I import that in module b and assign an attribute with:

from settings import settings
settings.main = object()

In some of my code–but not all of it, statements like:

from settings import settings
print settings.main

… generate “Undefined variable from import: main” messages in the Eclipse code error pane, even though the code runs without a problem. How can I correct these?

Answer 0

For code in your project, the only way is adding a declaration saying that you expected that — possibly protected by an if False so that it doesn’t execute (the static code-analysis only sees what you see, not runtime info — if you opened that module yourself, you’d have no indication that main was expected).

To overcome this there are some choices:

If it is some external module, it’s possible to add it to the forced builtins so that PyDev spawns a shell for it to obtain runtime information (see http://pydev.org/manual_101_interpreter.html for details) — i.e.: mostly, PyDev will import the module in a shell and do a dir(module) and dir on the classes found in the module to present completions and make code analysis.
You can use Ctrl+1 (Cmd+1 for Mac) in a line with an error and PyDev will present you an option to add a comment to ignore that error.
It’s possible to create a stub module and add it to the predefined completions (http://pydev.org/manual_101_interpreter.html also has details on that).
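
A minimal sketch of the declaration approach described in this answer, assuming settings is a plain attribute container. The _Settings class and the single-file layout are hypothetical and not taken from the question. The if False block never runs; it only gives a static analyzer reading the module something to see. Whether a given PyDev version picks the attribute up from this exact form is an assumption worth checking in your own setup.

```python
# settings.py (hypothetical layout, for illustration only)
class _Settings(object):
    """Plain attribute container; attributes are attached at runtime."""


settings = _Settings()

if False:
    # Never executed. It exists only so that a static analyzer reading this
    # file can see that `main` is an expected attribute of `settings`.
    settings.main = None

# Elsewhere (module b in the question) the real value is assigned at runtime:
settings.main = object()
```

At runtime the behaviour is unchanged: the attribute still comes from the assignment in module b.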

Answer 1

I’m using opencv which relies on binaries etc so I have scripts where every other line has this silly error. Python is a dynamic language so such occasions shouldn’t be considered errors.

I removed these errors altogether by going to:

Window -> Preferences -> PyDev -> Editor -> Code Analysis -> Undefined -> Undefined Variable From Import -> Ignore

And that’s that.

It may also be, Window -> Preferences -> PyDev -> Editor -> Code Analysis -> Imports -> Import not found -> Ignore

Answer 2

The post marked as answer gives a workaround, not a solution.

This solution works for me:

Go to Window - Preferences - PyDev - Interpreters - Python Interpreter
Go to the Forced builtins tab
Click on New...
Type the name of the module (multiprocessing in my case) and click OK

Not only will the error messages disappear, the module members will also be recognized.

Answer 3

I was having a similar problem with an Eclipse/PyDev project. In this project the root directory of the python code was a sub-directory of the project.

--> MyProject
 + --> src         Root of python code
   + --> module1     A module 
   + --> module2     Another module
 + --> docs
 + --> test

When the project was debugged or run everything was fine as the working directory was set to the correct place. However the PyDev code analysis was failing to find any imports from module1 or module2.

The solution was to edit the project properties -> PyDev - PYTHONPATH section, remove /MyProject from the source folders tab, and add /MyProject/src to it instead.
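
The same effect can be reproduced at runtime with sys.path, which is a rough analogue of what the source folder setting tells the analyzer. This is only a sketch: it builds a throwaway copy of the layout above in a temporary directory, and the can_import helper is a hypothetical name introduced for the example. It does not touch PyDev itself.

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout from the answer: <project>/src/module1.py
project = tempfile.mkdtemp(prefix="MyProject_")
src = os.path.join(project, "src")
os.makedirs(src)
with open(os.path.join(src, "module1.py"), "w") as fh:
    fh.write("VALUE = 42\n")


def can_import(name):
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
    finally:
        sys.modules.pop(name, None)  # keep the two attempts independent


sys.path.insert(0, project)   # project root as the only extra entry
found_from_root = can_import("module1")

sys.path[0] = src             # the src folder instead
found_from_src = can_import("module1")

sys.path.pop(0)               # restore sys.path
```

With the project root on the path, module1 is not a top-level module and the import fails; with src on the path it succeeds.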

Answer 4

This worked for me:

step 1) Removing the interpreter, auto configuring it again

step 2) Window - Preferences - PyDev - Interpreters - Python Interpreter: go to the Forced builtins tab, click on New…, type the name of the module (curses in my case) and click OK

step 3) Right click in the project explorer on whichever module is giving errors. Go to PyDev->Code analysis.

Answer 5

I had the same problem. I am using Python and Eclipse on Windows. The code was running just fine, but Eclipse showed errors everywhere. After I changed the name of the folder ‘Lib’ to ‘lib’ (C:\Python27\lib), the problem was solved. It seems that if the capitalization doesn’t match the one in the configuration file, this will sometimes cause problems (though apparently not always, because the error checking was fine for a long time before the problems suddenly appeared for no obvious reason).

Answer 6

An approximation of what I was doing:

import module.submodule

class MyClass:
    constant = submodule.constant

To which pylint said: E: 4,15: Undefined variable 'submodule' (undefined-variable)

I resolved this by changing my import like:

from module.submodule import CONSTANT

class MyClass:
    constant = CONSTANT

Note: I also renamed my imported variable to have an uppercase name to reflect its constant nature.
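
The pylint message follows from how the import statement binds names: import module.submodule binds only the top-level name module, so the bare name submodule is never defined. A small sketch, using the standard library's os.path as a stand-in for the answer's placeholder module.submodule:

```python
import os.path  # binds the name `os` only; `path` is NOT bound by this statement

try:
    path.join("a", "b")  # bare submodule name, as in the failing example
    bare_name_works = True
except NameError:
    bare_name_works = False

# Option 1: keep the import and use the fully qualified name.
qualified = os.path.join("a", "b")

# Option 2: import the needed name directly, as the answer does.
from os.path import join

direct = join("a", "b")
```

Either option removes the undefined name; the answer chose the second.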

Answer 7

It is possible you just need to re-configure your python path within Eclipse. See my answer to a similar question.

Answer 8

In Preferences -> PyDev -> PyLint, under the arguments to pass to PyLint, add this line:

--generated-members=objects

you will need to do this for each generated . I found this by googling, but I lost the reference.

Answer 9

Right click in the project explorer on whichever module is giving errors. Go to PyDev->Remove Error Markers.

Answer 10

My answer doesn’t contribute anything new, just a concrete example I encountered.

import gtk.gdk

w = gtk.gdk.get_default_root_window()

PyDev showed the error message “Undefined variable from import: get_default_root_window()”

In the python shell you can see that this is a ‘built-in’ module as mentioned in an answer above:

>>> import gtk.gdk
>>> gtk.gdk
<module 'gtk.gdk' (built-in)>

Now under Window->Preferences->PyDev->Interpreters->Python Interpreter, I selected the tab ‘Forced Builtins’ and added ‘gtk.gdk’ to the list.

Now the error message didn’t show anymore.

Answer 11

I find that these 2 steps work for me all the time:

Confirm (else add) the parent folder of the module to the PYTHONPATH.
Add FULL name of the module to forced builtins.

The things to note here:

Some popular modules install with some parent and child pair having the same name. In these cases you also have to add that parent to PYTHONPATH, in addition to its grandparent folder, which you already confirmed/added for everything else.
Use (for example) “google.appengine.api.memcache” when adding to forced builtins, NOT “memcache” only, where “google” in this example, is an immediate child of a folder defined in PYTHONPATH.

Answer 12

If you’re sure that your script runs and that it is a false alarm, go to Preferences > PyDev > Editor > Code Analysis and demote the errors to warnings.

http://www.pydev.org/manual_adv_code_analysis.html

Python - When to use file vs open

August 17, 2021 Python实用宝典

Question: Python - When to use file vs open

What’s the difference between file and open in Python? When should I use which one? (Say I’m in 2.5)

Answer 0

You should always use open().

As the documentation states:

When opening a file, it’s preferable to use open() instead of invoking this constructor directly. file is more suited to type testing (for example, writing “isinstance(f, file)”).

Also, file() has been removed since Python 3.0.

Answer 1

Two reasons: The python philosophy of “There ought to be one way to do it” and file is going away.

file is the actual type (using e.g. file('myfile.txt') is calling its constructor). open is a factory function that will return a file object.

In python 3.0 file is going to move from being a built-in to being implemented by multiple classes in the io library (somewhat similar to Java with buffered readers, etc.)
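
In Python 3 this has happened: there is no built-in file type, and open() returns different io classes depending on the mode. A short sketch (Python 3 only; the temporary file is just scaffolding for the example):

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w") as f:                 # text mode
        text_type = type(f)
    with open(path, "rb") as f:                # buffered binary mode
        binary_type = type(f)
    with open(path, "rb", buffering=0) as f:   # unbuffered binary mode
        raw_type = type(f)
finally:
    os.remove(path)

# Type testing now goes through the io base classes instead of `file`.
is_file_like = issubclass(text_type, io.IOBase)
```

Here text_type is io.TextIOWrapper, binary_type is io.BufferedReader and raw_type is io.FileIO, which is why open() is described as a factory function.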

Answer 2

file() is a type, like an int or a list. open() is a function for opening files, and will return a file object.

This is an example of when you should use open:

f = open(filename, 'r')
for line in f:
    process(line)
f.close()

This is an example of when you should use file:

import sys

class LoggingFile(file):  # Python 2 only: the file type no longer exists in Python 3
    def write(self, data):
        sys.stderr.write("Wrote %d bytes\n" % len(data))
        super(LoggingFile, self).write(data)

As you can see, there’s a good reason for both to exist, and a clear use-case for both.

Answer 3

Functionally, the two are the same; open will call file anyway, so currently the difference is a matter of style. The Python docs recommend using open.

When opening a file, it’s preferable to use open() instead of invoking the file constructor directly.

The reason is that in future versions they are not guaranteed to be the same (open will become a factory function, which returns objects of different types depending on the path it’s opening).

Answer 4

Only ever use open() for opening files. file() is actually being removed in 3.0, and it’s deprecated at the moment. They’ve had a sort of strange relationship, but file() is going now, so there’s no need to worry anymore.

The following is from the Python 2.6 docs. [bracket stuff] added by me.

When opening a file, it’s preferable to use open() instead of invoking this [file()] constructor directly. file is more suited to type testing (for example, writing isinstance(f, file)).

Answer 5

According to Mr Van Rossum, although open() is currently an alias for file() you should use open() because this might change in the future.

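How to create an object for a Django model with a many to many field?

August 17, 2021 Python实用宝典

Question: How to create an object for a Django model with a many to many field?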

My model:

class Sample(models.Model):
    users = models.ManyToManyField(User)

I want to save both user1 and user2 in that model:

user1 = User.objects.get(pk=1)
user2 = User.objects.get(pk=2)
sample_object = Sample(users=user1, users=user2)
sample_object.save()

I know that’s wrong, but I’m sure you get what I want to do. How would you do it ?

Answer 0

You cannot create m2m relations from unsaved objects. If you have the pks, try this:

sample_object = Sample()
sample_object.save()
sample_object.users.add(1,2)

Update: After reading saverio’s answer, I decided to investigate the issue a bit more in depth. Here are my findings.

This was my original suggestion. It works, but isn’t optimal. (Note: I’m using Bars and a Foo instead of Users and a Sample, but you get the idea).

bar1 = Bar.objects.get(pk=1)
bar2 = Bar.objects.get(pk=2)
foo = Foo()
foo.save()
foo.bars.add(bar1)
foo.bars.add(bar2)

It generates a whopping total of 7 queries:

SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 1
SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 2
INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

I’m sure we can do better. You can pass multiple objects to the add() method:

bar1 = Bar.objects.get(pk=1)
bar2 = Bar.objects.get(pk=2)
foo = Foo()
foo.save()
foo.bars.add(bar1, bar2)

As we can see, passing multiple objects saves one SELECT:

SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 1
SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 2
INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

I wasn’t aware that you can also assign a list of objects:

bar1 = Bar.objects.get(pk=1)
bar2 = Bar.objects.get(pk=2)
foo = Foo()
foo.save()
foo.bars = [bar1, bar2]

Unfortunately, that creates one additional SELECT:

SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 1
SELECT "app_bar"."id", "app_bar"."name" FROM "app_bar" WHERE "app_bar"."id" = 2
INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."id", "app_foo_bars"."foo_id", "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE "app_foo_bars"."foo_id" = 1
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

Let’s try to assign a list of pks, as saverio suggested:

foo = Foo()
foo.save()
foo.bars = [1,2]

As we don’t fetch the two Bars, we save two SELECT statements, resulting in a total of 5:

INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."id", "app_foo_bars"."foo_id", "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE "app_foo_bars"."foo_id" = 1
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)

And the winner is:

foo = Foo()
foo.save()
foo.bars.add(1,2)

Passing pks to add() gives us a total of 4 queries:

INSERT INTO "app_foo" ("name") VALUES ()
SELECT "app_foo_bars"."bar_id" FROM "app_foo_bars" WHERE ("app_foo_bars"."foo_id" = 1  AND "app_foo_bars"."bar_id" IN (1, 2))
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 1)
INSERT INTO "app_foo_bars" ("foo_id", "bar_id") VALUES (1, 2)
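
The logged SQL also shows why the parent object has to be saved first: every row in the join table needs the parent's primary key. The sketch below replays the final four-query variant with the standard library's sqlite3 module instead of Django. The table and column names are copied from the logged queries, but the CREATE TABLE statements are an assumption made for illustration and are not the DDL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_bar (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_foo (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_foo_bars (
    id INTEGER PRIMARY KEY,
    foo_id INTEGER NOT NULL REFERENCES app_foo(id),
    bar_id INTEGER NOT NULL REFERENCES app_bar(id)
);
""")
conn.executemany("INSERT INTO app_bar (id, name) VALUES (?, ?)",
                 [(1, "first"), (2, "second")])

# Equivalent of foo.save(): only after this INSERT does the parent have a pk.
cur = conn.execute("INSERT INTO app_foo (name) VALUES ('')")
foo_id = cur.lastrowid

# Equivalent of foo.bars.add(1, 2): one join-table row per related pk.
conn.executemany("INSERT INTO app_foo_bars (foo_id, bar_id) VALUES (?, ?)",
                 [(foo_id, 1), (foo_id, 2)])

linked = [row[0] for row in conn.execute(
    "SELECT bar_id FROM app_foo_bars WHERE foo_id = ? ORDER BY bar_id",
    (foo_id,))]
```

Without the first INSERT there is no foo_id to put in the join rows, which is the constraint the answer opens with.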

Answer 1

For future visitors, you can create an object and all of its m2m objects in 2 queries using the new bulk_create in django 1.4. Note that this is only usable if you don’t require any pre or post-processing on the data with save() methods or signals. What you insert is exactly what will be in the DB

You can do this without specifying a “through” model on the field. For completeness, the example below creates a blank Users model to mimic what the original poster was asking.

from django.db import models

class Users(models.Model):
    pass

class Sample(models.Model):
    users = models.ManyToManyField(Users)

Now, in a shell or other code, create 2 users, create a sample object, and bulk add the users to that sample object.

Users().save()
Users().save()

# Access the through model directly
ThroughModel = Sample.users.through

users = Users.objects.filter(pk__in=[1,2])

sample_object = Sample()
sample_object.save()

ThroughModel.objects.bulk_create([
    ThroughModel(users_id=users[0].pk, sample_id=sample_object.pk),
    ThroughModel(users_id=users[1].pk, sample_id=sample_object.pk)
])

Answer 2

Django 1.9
A quick example:

sample_object = Sample()
sample_object.save()

list_of_users = DestinationRate.objects.all()
sample_object.users.set(list_of_users)

Answer 3

RelatedObjectManagers are different “attributes” than fields in a Model. The simplest way to achieve what you are looking for is

sample_object = Sample.objects.create()
sample_object.users = [1, 2]

That’s the same as assigning a User list, without the additional queries and the model building.

If the number of queries is what bothers you (instead of simplicity), then the optimal solution requires three queries:

sample_object = Sample.objects.create()
sample_id = sample_object.id
sample_object.users.through.objects.create(user_id=1, sample_id=sample_id)
sample_object.users.through.objects.create(user_id=2, sample_id=sample_id)

This will work because we already know that the ‘users’ list is empty, so we can create mindlessly.

Answer 4

You could replace the set of related objects in this way (new in Django 1.9):

new_list = [user1, user2, user3]
sample_object.related_set.set(new_list)

Answer 5

If someone wants to apply David Marbles’ answer to a self-referencing ManyToMany field: the id fields of the through model are called “to_’model_name’_id” and “from_’model_name’_id”.

If that doesn’t work you can check the django connection.

What does the "-U" option stand for in pip install -U

August 17, 2021 Python实用宝典

Question: What does the "-U" option stand for in pip install -U

Despite a ton of Googling, I can’t find any docs for pip’s command line options/arguments. What does pip install -U mean? Does anyone have a link to a list of pip’s options and arguments?

Answer 0

Type pip install -h to list help:

-U, --upgrade    Upgrade all packages to the newest available version

So, if you already have a package installed, it will upgrade the package for you. Without the -U switch it’ll tell you the package is already installed and exit.

Each pip subcommand has its own help listing. pip -h shows you overall help, and pip [subcommand] -h gives you help for that sub command, such as install.

You can also find the full reference documentation online; the General Options section covers switches available for every pip subcommand, while each subcommand has a separate Options section to cover subcommand-specific switches; see the pip install options section, for example.
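
The help lookup can also be scripted. The sketch below runs pip install -h through the current interpreter and keeps only the lines that mention --upgrade. The function name pip_install_help is made up for this example, and the exact help wording differs between pip versions, so no particular output is assumed.

```python
import subprocess
import sys


def pip_install_help():
    """Return the text of `pip install -h`, or None if pip is unavailable."""
    try:
        proc = subprocess.run(
            [sys.executable, "-m", "pip", "install", "-h"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return proc.stdout


help_text = pip_install_help()
upgrade_lines = [line.strip()
                 for line in (help_text or "").splitlines()
                 if "--upgrade" in line]
```

Running pip as python -m pip ties the lookup to the interpreter that is executing the script, which avoids reading the help of a different pip on the PATH.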

ImportError: No Module Named bs4 (BeautifulSoup)

August 17, 2021 Python实用宝典

Question: ImportError: No Module Named bs4 (BeautifulSoup)

I’m working in Python and using Flask. When I run my main Python file on my computer, it works perfectly, but when I activate venv and run the Flask Python file in the terminal, it says that my main Python file has “No Module Named bs4.” Any comments or advice is greatly appreciated.

Answer 0

Activate the virtualenv, and then install BeautifulSoup4:

$ pip install BeautifulSoup4

When you installed bs4 with easy_install, you installed it system-wide. So your system python can import it, but not your virtualenv python. If you do not need bs4 to be installed in your system python path, uninstall it and keep it in your virtualenv.

For more information about virtualenvs, read this
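
To see which interpreter is actually running and whether it can find bs4, a quick check along these lines may help. It is a sketch: describe_environment is a hypothetical helper, and the virtualenv test relies on sys.base_prefix, which is set by venv and recent virtualenv releases (older virtualenv versions used sys.real_prefix instead, which this sketch does not check).

```python
import importlib.util
import sys


def describe_environment(module_name="bs4"):
    """Report the running interpreter and whether it can locate a module."""
    in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "in_virtualenv": in_virtualenv,
        "module_found": spec is not None,
    }


info = describe_environment()
```

Running it once with the system Python and once inside the activated virtualenv shows whether the package was installed for one interpreter but not the other.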

Answer 1

For python2.x:

sudo pip install BeautifulSoup4

For python3:

sudo apt-get install python3-bs4

Answer 2

Just tagging onto Balthazar’s answer. Running

pip install BeautifulSoup4

did not work for me. Instead use

pip install beautifulsoup4

Answer 3

pip3 install BeautifulSoup4

Try this. It works for me. The reason is well explained here..

Answer 4

If you are using Anaconda for package management, the following should do:

conda install -c anaconda beautifulsoup4

Answer 5

If you use PyCharm, go to preferences - project interpreter - install bs4.

If you try to install BeautifulSoup instead, it will still report that there is no module named bs4.

Answer 6

I will advise you to uninstall the bs4 library by using this command:

pip uninstall bs4

and then install it using this command:

sudo apt-get install python3-bs4

I was facing the same problem on Ubuntu Linux when I used the following command to install the bs4 library:

pip install bs4

Answer 7

Try this:

sudo python3 -m pip install bs4

Answer 8

pip install --user BeautifulSoup4

Answer 9

pip3.7 install bs4

Try this. It works with python 3.7

Answer 10

I did what @rayid-ali said, except I’m on a Windows 10 machine so I left out the sudo. That is, I did the following:

python3 -m pip install bs4

and it worked like a pycharm. Worked like a charm anyway.

Answer 11

The easiest is using easy_install.

easy_install bs4

It will work if pip fails.

Answer 12

A lot of tutorials/references were written for Python 2 and tell you to use pip install somename. If you’re using Python 3 you want to change that to pip3 install somename.

Answer 13

You might want to try installing bs4 with

pip install --ignore-installed BeautifulSoup4

if the methods above didn’t work for you.

Answer 14

Try reinstalling the module, or try installing Beautiful Soup with the command below:

pip install --ignore-installed BeautifulSoup4

Answer 15

Addendum to the original query: modules.py

help('modules')

$python modules.py

It shows that the bs4 module is already installed.

_codecs_kr          blinker             json                six
_codecs_tw          brotli              kaitaistruct        smtpd
_collections        bs4                 keyword             smtplib
_collections_abc    builtins            ldap3               sndhdr
_compat_pickle      bz2                 lib2to3             socket

The proper solution is:

pip install --upgrade bs4

That should solve the problem.

The same error may also appear for other modules, in which case you need to run the pip command in the same way for each of those modules.

How to uninstall Anaconda completely from macOS

August 17, 2021 Python实用宝典

Question: How to uninstall Anaconda completely from macOS

How can I completely uninstall Anaconda from macOS Sierra and revert back to the original Python? I have tried using conda-clean -yes but that doesn’t work. I also removed the stuff in ~/.bash_profile but it still uses the Anaconda python and I can still run the conda command.

Answer 0

To remove the configs:

conda install anaconda-clean
anaconda-clean --yes

Once the configs are removed you can delete the anaconda install folder, which is usually under your home dir:

rm -rf ~/anaconda3

Also, the anaconda-clean --yes command creates a backup in your home directory of the format ~/.anaconda_backup/<timestamp>. Make sure to delete that one also.

EDIT (v5.2.0): If you want to clean everything, you will also have to delete the last two lines added to your .bash_profile. They look like this:

# added by Anaconda3 5.2.0 installer
export PATH="/Users/ody/anaconda3/bin:$PATH"

回答 1

要卸载Anaconda，请打开终端窗口：

删除整个anaconda安装目录：

rm -rf ~/anaconda

编辑~/.bash_profile 并从您的PATH环境变量中删除anaconda目录。

注意：您可能需要编辑.bashrc和/或.profile文件而不是.bash_profile

删除以下隐藏的文件和目录，这些文件和目录可能是在主目录中创建的：
- .condarc
- .conda
- .continuum

用：

rm -rf ~/.condarc ~/.conda ~/.continuum

To uninstall Anaconda open a terminal window:

Remove the entire anaconda installation directory:

rm -rf ~/anaconda

Edit ~/.bash_profile and remove the anaconda directory from your PATH environment variable.

Note: You may need to edit .bashrc and/or .profile files instead of .bash_profile

Remove the following hidden files and directories, which may have been created in the home directory:
- .condarc
- .conda
- .continuum

Use:

rm -rf ~/.condarc ~/.conda ~/.continuum
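
To check whether the steps above left anything behind, a small Python sketch like the following may help. It is an illustration only: the helper names anaconda_path_entries and leftover_conda_files are hypothetical, and the list of hidden names is simply the one given in this answer.

```python
import os
from pathlib import Path

# Hidden files and directories named in the answer above.
HIDDEN_NAMES = (".condarc", ".conda", ".continuum")


def anaconda_path_entries(path_value):
    """Return the PATH entries that still mention Anaconda or conda."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [entry for entry in entries if "conda" in entry.lower()]


def leftover_conda_files(home):
    """Return the hidden conda files or directories that still exist under `home`."""
    return [name for name in HIDDEN_NAMES if (Path(home) / name).exists()]


if __name__ == "__main__":
    # Both lists should be empty once the uninstall is complete.
    print(anaconda_path_entries(os.environ.get("PATH", "")))
    print(leftover_conda_files(Path.home()))
```

If the first list is not empty in a new terminal window, some shell startup file (.bash_profile, .bashrc, or .profile) is probably still adding the directory to PATH.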

回答 2

就我而言（Mac High Sierra），它安装在〜/ opt / anaconda3上。

https://docs.anaconda.com/anaconda/install/uninstall/

In my case (Mac High Sierra) it was installed at ~/opt/anaconda3.

https://docs.anaconda.com/anaconda/install/uninstall/

回答 3

打开终端，并输入以下命令，删除整个Anaconda目录，该目录的名称将为“ anaconda2”或“ anaconda3”，例如：rm -rf〜/ anaconda3。然后使用命令“ conda uninstall” https://conda.io/docs/commands/conda-uninstall.html删除conda 。

Open the terminal and remove your entire Anaconda directory, which will have a name such as “anaconda2” or “anaconda3”, by entering the following command: rm -rf ~/anaconda3. Then remove conda with command “conda uninstall” https://conda.io/docs/commands/conda-uninstall.html.

回答 4

这是anaconda在删除Anaconda之后有一个条目破坏了我的python安装的地方。希望这对其他人有帮助。

如果您使用的是纱，我在〜/“用户名”的.yarn.rc文件中找到了此条目

python“ / Users / someone / anaconda3 / bin / python3”

删除此行固定了彻底删除所需的最后一个位置。我不确定如何添加该条目，但它有帮助

This is one more place where Anaconda had an entry that was breaking my Python install after I removed Anaconda. Hoping this helps someone else.

If you are using yarn, I found this entry in my .yarn.rc file in ~/"username":

python "/Users/someone/anaconda3/bin/python3"

Removing this line fixed the last place needed for complete removal. I am not sure how that entry was added, but removing it helped.

回答 5

在执行了辣木和jkysam的非常有用的建议而没有立即获得成功后，需要简单地重新启动Mac才能使系统识别出更改。希望这对某人有帮助！

After performing the very helpful suggestions from both spicyramen & jkysam without immediate success, a simple restart of my Mac was needed to make the system recognize the changes. Hope this helps someone!

回答 6

这对我有用：

conda remove --all --prefix /Users/username/anaconda/bin/python

然后从.bash_profile中的$ PATH中删除

This has worked for me:

conda remove --all --prefix /Users/username/anaconda/bin/python

Then also remove the Anaconda directory from $PATH in .bash_profile.

回答 7

在我的〜/ .bash_profile文件中添加export PATH="/Users/<username>/anaconda/bin:$PATH"（或export PATH="/Users/<username>/anaconda3/bin:$PATH"如果您有anaconda 3），可以为我解决此问题。

Adding export PATH="/Users/<username>/anaconda/bin:$PATH" (or export PATH="/Users/<username>/anaconda3/bin:$PATH" if you have anaconda 3) to my ~/.bash_profile file, fixed this issue for me.

回答 8

官方说明似乎在这里：https : //docs.anaconda.com/anaconda/install/uninstall/

但是，如果您喜欢我，由于某种原因而无法使用，并且由于某种原因您的conda却安装在其他地方，并告诉您这样做：

rm -rf ~/opt

我不知道为什么将它保存在那里，但这就是我的目的。

这对我修复conda安装很有用（如果这是您像我这样首先卸载它的原因）：https : //stackoverflow.com/a/60902863/1601580最后为我修复了它。不知道为什么conda首先表现得很怪异，或者为什么错误地首先把东西安装了……

The official instructions seem to be here: https://docs.anaconda.com/anaconda/install/uninstall/

But if, like me, you found that this didn't work, and your conda was installed somewhere else without telling you, do this:

rm -rf ~/opt

I have no idea why it was saved there but that’s what did it for me.

This was useful to me in fixing my conda installation (if, like me, that is why you are uninstalling it in the first place): https://stackoverflow.com/a/60902863/1601580. I am not sure why conda was acting weird or installing things in the wrong place, though.

知识问答

How to remove a specific element from an array using Python

August 17, 2021 Python实用宝典

Question: How to remove a specific element from an array using Python

我想写一些东西从数组中删除一个特定的元素。我知道我必须for遍历数组以查找与内容匹配的元素。

假设我有一组电子邮件，并且想摆脱与某些电子邮件字符串匹配的元素。

我实际上想使用for循环结构，因为我还需要对其他数组使用相同的索引。

这是我的代码：

for index, item in emails:
    if emails[index] == 'something@something.com':
         emails.pop(index)
         otherarray.pop(index)

I want to write something that removes a specific element from an array. I know that I have to for loop through the array to find the element that matches the content.

Let’s say that I have an array of emails and I want to get rid of the element that matches some email string.

I’d actually like to use the for loop structure because I need to use the same index for other arrays as well.

Here is the code that I have:

for index, item in emails:
    if emails[index] == 'something@something.com':
         emails.pop(index)
         otherarray.pop(index)

回答 0

您不需要迭代数组。只是：

>>> x = ['ala@ala.com', 'bala@bala.com']
>>> x
['ala@ala.com', 'bala@bala.com']
>>> x.remove('ala@ala.com')
>>> x
['bala@bala.com']

这将删除与字符串匹配的第一次出现。

编辑：编辑后，您仍然不需要迭代。做就是了：

index = initial_list.index(item1)
del initial_list[index]
del other_list[index]

You don’t need to iterate the array. Just:

>>> x = ['ala@ala.com', 'bala@bala.com']
>>> x
['ala@ala.com', 'bala@bala.com']
>>> x.remove('ala@ala.com')
>>> x
['bala@bala.com']

This will remove the first occurrence that matches the string.

EDIT: After your edit, you still don’t need to iterate over. Just do:

index = initial_list.index(item1)
del initial_list[index]
del other_list[index]
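
The index-based removal above raises ValueError when the item is not in the list. A minimal sketch that guards against this, assuming two parallel lists as in the question (the helper name remove_first_match is made up for illustration):

```python
def remove_first_match(emails, other, target):
    """Remove the first occurrence of `target` from `emails` and the item
    at the same index from `other`. Return True if something was removed."""
    try:
        index = emails.index(target)
    except ValueError:
        # list.index raises ValueError when the value is not present.
        return False
    del emails[index]
    del other[index]
    return True


emails = ['abc@def.com', 'something@something.com', 'ghi@jkl.com']
otherarray = ['some', 'other', 'details']
remove_first_match(emails, otherarray, 'something@something.com')
print(emails)      # ['abc@def.com', 'ghi@jkl.com']
print(otherarray)  # ['some', 'details']
```

Like list.remove, this only handles the first match; calling it in a loop until it returns False would remove every match.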

回答 1

使用filter()and lambda将提供一种简洁的方法来删除不需要的值：

newEmails = list(filter(lambda x : x != 'something@something.com', emails))

这不会修改电子邮件。它创建新列表newEmails，其中仅包含匿名函数为其返回True的元素。

Using filter() and lambda would provide a neat and terse method of removing unwanted values:

newEmails = list(filter(lambda x : x != 'something@something.com', emails))

This does not modify emails. It creates the new list newEmails containing only elements for which the anonymous function returned True.

回答 2

如果需要在for循环中使用索引，则for循环不正确：

for index, item in enumerate(emails):
    # whatever (but you can't remove element while iterating)

对于您而言，Bogdan解决方案是可以的，但是您选择的数据结构不是很好。必须用来自一个的数据与来自另一个的数据以相同的索引来维护这两个列表是笨拙的。

最好使用连音（电子邮件，其他数据）列表，或者以电子邮件为键的字典。

Your for loop is not right, if you need the index in the for loop use:

for index, item in enumerate(emails):
    # whatever (but you can't remove element while iterating)

In your case, Bogdan's solution is fine, but your choice of data structure is not ideal. Keeping two lists in sync, with each item in one related to the item at the same index in the other, is clumsy.

A list of (email, otherdata) tuples may be better, or a dict with the email as the key.
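
The two alternative data structures can be sketched as follows. The sample values are placeholders, and the dict variant assumes each email appears only once.

```python
# Option 1: a list of (email, otherdata) tuples keeps related values together.
records = [
    ('abc@def.com', 'some'),
    ('something@something.com', 'other'),
    ('ghi@jkl.com', 'details'),
]
records = [(email, data) for email, data in records
           if email != 'something@something.com']
print(records)  # [('abc@def.com', 'some'), ('ghi@jkl.com', 'details')]

# Option 2: a dict keyed by email; assumes each email appears only once.
by_email = {
    'abc@def.com': 'some',
    'something@something.com': 'other',
    'ghi@jkl.com': 'details',
}
by_email.pop('something@something.com', None)  # no error if the key is absent
print(by_email)  # {'abc@def.com': 'some', 'ghi@jkl.com': 'details'}
```

In both cases there is no second list to keep in sync, so an email and its data cannot drift apart.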

回答 3

做到这一点的理智方法是使用zip()和List Comprehension / Generator表达式：

filtered = (
    (email, other)
        for email, other in zip(emails, other_list)
            # keep only the pairs whose email does NOT match
            if email != 'something@something.com')

new_emails, new_other_list = zip(*filtered)

另外，如果您未使用array.array()或numpy.array()，那么很可能您正在使用[]或list()，这会为您提供列表，而不是数组。不一样的东西。

The sane way to do this is to use zip() and a List Comprehension / Generator Expression:

filtered = (
    (email, other)
        for email, other in zip(emails, other_list)
            # keep only the pairs whose email does NOT match
            if email != 'something@something.com')

new_emails, new_other_list = zip(*filtered)

Also, if you're not using array.array() or numpy.array(), then you are most likely using [] or list(), which give you lists, not arrays. They are not the same thing.

回答 4

有一个替代解决方案，该问题还处理重复的匹配项。

我们先从相同长度的2所列出：emails，otherarray。目的是从两个列表中的每个索引i中删除项目emails[i] == 'something@something.com'。

这可以使用列表理解，然后通过以下方式实现zip：

emails = ['abc@def.com', 'something@something.com', 'ghi@jkl.com']
otherarray = ['some', 'other', 'details']

res = [(i, j) for i, j in zip(emails, otherarray) if i != 'something@something.com']
# Transpose the kept pairs back into two lists (assumes at least one pair remains).
emails, otherarray = map(list, zip(*res))

print(emails)      # ['abc@def.com', 'ghi@jkl.com']
print(otherarray)  # ['some', 'details']

There is an alternative solution to this problem which also deals with duplicate matches.

We start with 2 lists of equal length: emails, otherarray. The objective is to remove items from both lists for each index i where emails[i] == 'something@something.com'.

This can be achieved using a list comprehension and then splitting via zip:

emails = ['abc@def.com', 'something@something.com', 'ghi@jkl.com']
otherarray = ['some', 'other', 'details']

res = [(i, j) for i, j in zip(emails, otherarray) if i != 'something@something.com']
# Transpose the kept pairs back into two lists (assumes at least one pair remains).
emails, otherarray = map(list, zip(*res))

print(emails)      # ['abc@def.com', 'ghi@jkl.com']
print(otherarray)  # ['some', 'details']

知识问答

Pandas read_csv from URL

August 17, 2021 Python实用宝典

Question: Pandas read_csv from URL

我将Python 3.4与IPython结合使用，并具有以下代码。我无法从给定的URL读取csv文件：

import pandas as pd
import requests

url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)

我有以下错误

“预期的文件路径名或类文件对象，得到类型”

我怎样才能解决这个问题？

I am using Python 3.4 with IPython and have the following code. I’m unable to read a csv-file from the given URL:

import pandas as pd
import requests

url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)

I have the following error

“Expected file path name or file-like object, got type”

How can I fix this?

回答 0

更新资料

0.19.2现在，您可以从熊猫直接传递URL。

正如错误所暗示的，pandas.read_csv需要一个类似文件的对象作为第一个参数。

如果要从字符串读取csv，可以使用io.StringIO（Python 3.x）或StringIO.StringIO（Python 2.x）。

另外，对于URL- https://github.com/cs109/2014_data/blob/master/countries.csv-您正在获得html响应，而不是原始的csv，您应该使用Rawgithub页面中的链接给出的url 获取原始的csv响应-https: //raw.githubusercontent.com/cs109/2014_data/master/countries.csv

范例-

import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))

Update

From pandas 0.19.2 you can now just pass the url directly.

Just as the error suggests, pandas.read_csv needs a file-like object as the first argument.

If you want to read the CSV from a string, you can use io.StringIO (Python 3.x) or StringIO.StringIO (Python 2.x).

Also, the URL https://github.com/cs109/2014_data/blob/master/countries.csv returns an HTML response, not raw CSV. To get the raw CSV, use the URL given by the Raw link on the GitHub page: https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv

Example –

import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
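
The change from the blob URL to the raw URL follows a regular pattern, which can be sketched with the standard library alone. The helper name to_raw_github_url is hypothetical, and the sketch assumes the https://github.com/<user>/<repo>/blob/<ref>/<path> layout seen in the question; anything else is returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Convert a github.com 'blob' page URL to its raw.githubusercontent.com form.

    URLs that do not look like https://github.com/<user>/<repo>/blob/<ref>/<path>
    are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip('/').split('/')
    if parts.netloc != 'github.com' or len(segments) < 5 or segments[2] != 'blob':
        return url
    # Drop the 'blob' segment and keep user, repo, ref, and file path.
    user, repo, _, *rest = segments
    raw_path = '/' + '/'.join([user, repo] + rest)
    return urlunsplit((parts.scheme, 'raw.githubusercontent.com', raw_path, '', ''))


print(to_raw_github_url('https://github.com/cs109/2014_data/blob/master/countries.csv'))
# https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
```

The resulting URL is the one to pass to requests.get or directly to pd.read_csv.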

回答 1

在最新版本的pandas（0.19.2）中，您可以直接传递网址

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

In the latest version of pandas (0.19.2) you can directly pass the url

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

回答 2

正如我评论的那样，您需要使用StringIO对象并进行解码，即c=pd.read_csv(io.StringIO(s.decode("utf-8")))如果使用请求，则需要进行解码，因为如果您使用.text ，则content会返回字节，您只需要像s = requests.get(url).textc = 那样传递s即可pd.read_csv(StringIO(s))。

一种更简单的方法是将原始数据的正确url 直接传递给read_csv，您不必传递像object这样的文件，您可以传递url从而根本不需要请求：

c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")

print(c)

输出：

                              Country         Region
0                             Algeria         AFRICA
1                              Angola         AFRICA
2                               Benin         AFRICA
3                            Botswana         AFRICA
4                             Burkina         AFRICA
5                             Burundi         AFRICA
6                            Cameroon         AFRICA
..................................

从文档：

filepath_or_buffer：

字符串或文件句柄/ StringIO字符串可以是URL。有效的URL方案包括http，ftp，s3和file。对于文件URL，需要一个主机。例如，本地文件可以是文件：//localhost/path/to/table.csv

As I commented, you need to use a StringIO object and decode, i.e. c = pd.read_csv(io.StringIO(s.decode("utf-8"))). When using requests you need to decode because .content returns bytes. If you use .text instead, you can pass s as is: s = requests.get(url).text, then c = pd.read_csv(StringIO(s)).

A simpler approach is to pass the correct URL of the raw data directly to read_csv. You don't have to pass a file-like object; you can pass a URL, so you don't need requests at all:

c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")

print(c)

Output:

                              Country         Region
0                             Algeria         AFRICA
1                              Angola         AFRICA
2                               Benin         AFRICA
3                            Botswana         AFRICA
4                             Burkina         AFRICA
5                             Burundi         AFRICA
6                            Cameroon         AFRICA
..................................

From the docs:

filepath_or_buffer :

string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv

回答 3

您遇到的问题是，进入变量s的输出不是csv，而是html文件。为了获得原始的csv，您必须将url修改为：

‘ https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv ‘

您的第二个问题是read_csv需要一个文件名，我们可以通过使用io模块中的StringIO来解决此问题。第三个问题是request.get（url）.content提供了字节流，我们可以改用request.get（url）.text解决。

最终结果是此代码：

from io import StringIO

import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text

c=pd.read_csv(StringIO(s))

输出：

>>> c.head()
    Country  Region
0   Algeria  AFRICA
1    Angola  AFRICA
2     Benin  AFRICA
3  Botswana  AFRICA
4   Burkina  AFRICA

The problem you're having is that the output you get into the variable s is not a CSV but an HTML file. To get the raw CSV, you have to change the URL to:

'https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'

Your second problem is that read_csv expects a file name or a file-like object, not the CSV text itself; we can solve this by wrapping the text in StringIO from the io module. The third problem is that requests.get(url).content delivers a byte stream; we can solve this by using requests.get(url).text instead.

End result is this code:

from io import StringIO

import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text

c=pd.read_csv(StringIO(s))

output:

>>> c.head()
    Country  Region
0   Algeria  AFRICA
1    Angola  AFRICA
2     Benin  AFRICA
3  Botswana  AFRICA
4   Burkina  AFRICA
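
The bytes-versus-text distinction can be tried without any network access. The sketch below is an assumption-laden stand-in: a hard-coded bytes value plays the role of requests.get(url).content, its rows are copied from the output shown above, and the standard-library csv module replaces pandas so that the example has no third-party dependencies. pd.read_csv accepts the same kind of file-like object.

```python
import csv
import io

# Stand-in for requests.get(url).content: raw bytes, as a server would send them.
content = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\nBenin,AFRICA\n"

# Bytes must be decoded to str before io.StringIO can wrap them.
text = content.decode('utf-8')   # roughly what the .text attribute provides
buffer = io.StringIO(text)       # a file-like object over the string

rows = list(csv.reader(buffer))
print(rows[0])  # ['Country', 'Region']
print(rows[1])  # ['Algeria', 'AFRICA']
```

Passing the undecoded bytes to io.StringIO raises a TypeError, which is the same class of problem as the error in the question.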

回答 4

url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
c = pd.read_csv(url, sep = "\t")


回答 5

要通过熊猫中的URL导入数据，只需应用下面的简单代码即可，实际上效果更好。

import pandas as pd
train = pd.read_table("https://urlandfile.com/dataset.csv")
train.head()

如果您对原始数据有疑问，则只需在网址前添加“ r”

import pandas as pd
train = pd.read_table(r"https://urlandfile.com/dataset.csv")
train.head()

To import data from a URL in pandas, the simple code below works well:

import pandas as pd
train = pd.read_table("https://urlandfile.com/dataset.csv", sep=",")  # read_table defaults to tab-separated input
train.head()

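If the string contains backslashes (for example, a local Windows path), put r before it to make it a raw string literal. The prefix only changes how Python reads the literal, so it has no effect on a URL like this one: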

import pandas as pd
train = pd.read_table(r"https://urlandfile.com/dataset.csv", sep=",")  # read_table defaults to tab-separated input
train.head()

知识问答

Generate a random date between two other dates

August 17, 2021 Python实用宝典

Question: Generate a random date between two other dates

如何生成必须在其他两个给定日期之间的随机日期？

该函数的签名应如下所示：

random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", 0.34)
                   ^                       ^          ^

            date generated has  date generated has  a random number
            to be after this    to be before this

并返回一个日期，例如： 2/4/2008 7:20 PM

How would I generate a random date that has to be between two other given dates?

The function’s signature should be something like this:

random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", 0.34)
                   ^                       ^          ^

            date generated has  date generated has  a random number
            to be after this    to be before this

and would return a date such as: 2/4/2008 7:20 PM

回答 0

将两个字符串都转换为时间戳（以您选择的分辨率为单位，例如毫秒，秒，小时，天等），从后一个减去前一个，将您的随机数（假设分布在中range [0, 1]）乘以该差，然后再次加较早的一个。将时间戳转换回日期字符串，并且您在该范围内有一个随机时间。

Python示例（输出几乎是您指定的格式，而不是0填充-归咎于美国时间格式约定）：

import random
import time

def str_time_prop(start, end, format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formated in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies how a proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(format, time.localtime(ptime))


def random_date(start, end, prop):
    return str_time_prop(start, end, '%m/%d/%Y %I:%M %p', prop)

print(random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))

Convert both strings to timestamps (in your chosen resolution, e.g. milliseconds, seconds, hours, days, whatever), subtract the earlier from the later, multiply your random number (assuming it is distributed in the range [0, 1]) with that difference, and add again to the earlier one. Convert the timestamp back to date string and you have a random time in that range.

Python example (output is almost in the format you specified, other than 0 padding – blame the American time format conventions):

import random
import time

def str_time_prop(start, end, time_format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formatted in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies the proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    # Convert both strings to seconds since the epoch (local time).
    stime = time.mktime(time.strptime(start, time_format))
    etime = time.mktime(time.strptime(end, time_format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(time_format, time.localtime(ptime))


def random_date(start, end, prop):
    return str_time_prop(start, end, '%m/%d/%Y %I:%M %p', prop)

print(random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))

回答 1

from random import randrange
from datetime import timedelta

def random_date(start, end):
    """
    This function will return a random datetime between two datetime 
    objects.
    """
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

精度是秒。如果需要，您可以将精度提高到微秒，或降低到半小时。为此，只需更改最后一行的计算即可。

示例运行：

from datetime import datetime

d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')

print(random_date(d1, d2))

输出：

2008-12-04 01:50:17

from random import randrange
from datetime import timedelta

def random_date(start, end):
    """
    This function will return a random datetime between two datetime 
    objects.
    """
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

The precision is seconds. You can increase precision up to microseconds, or decrease to, say, half-hours, if you want. For that just change the last line’s calculation.

example run:

from datetime import datetime

d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')

print(random_date(d1, d2))

output:

2008-12-04 01:50:17
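
The remark about increasing the precision can be sketched as follows. This is one possible variant rather than the answer's own code: the name random_datetime_us is made up, and the interval is counted in microseconds instead of seconds.

```python
from datetime import datetime, timedelta
from random import randrange


def random_datetime_us(start, end):
    """Return a random datetime in [start, end) with microsecond precision."""
    delta = end - start
    # Whole interval expressed in microseconds, using integer arithmetic only.
    total_us = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return start + timedelta(microseconds=randrange(total_us))


d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')
print(random_datetime_us(d1, d2))
```

As with the seconds-based version, randrange raises ValueError when start and end are equal, because the range is then empty.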

回答 2

一个小版本。

import datetime
import random


def random_date(start, end):
    """Generate a random datetime between `start` and `end`"""
    return start + datetime.timedelta(
        # Get a random amount of seconds between `start` and `end`
        seconds=random.randint(0, int((end - start).total_seconds())),
    )

请注意，start和end参数都应该是datetime对象。如果您有字符串，则很容易转换。其他答案指出了这样做的一些方法。

A tiny version.

import datetime
import random


def random_date(start, end):
    """Generate a random datetime between `start` and `end`"""
    return start + datetime.timedelta(
        # Get a random amount of seconds between `start` and `end`
        seconds=random.randint(0, int((end - start).total_seconds())),
    )

Note that both start and end arguments should be datetime objects. If you’ve got strings instead, it’s fairly easy to convert. The other answers point to some ways to do so.
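
For string inputs in the format used in the question, a thin wrapper along these lines may be enough. It is a sketch under assumptions: the function name and the FORMAT constant are illustrative, and prop is taken to be a fraction between 0 and 1 as in the question's signature.

```python
from datetime import datetime

FORMAT = '%m/%d/%Y %I:%M %p'  # matches strings such as "1/1/2008 1:30 PM"


def random_date_from_strings(start, end, prop, fmt=FORMAT):
    """Return the time at fraction `prop` (0..1) between two formatted strings."""
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    # timedelta supports multiplication by a float in Python 3.
    result = start_dt + (end_dt - start_dt) * prop
    return result.strftime(fmt)


print(random_date_from_strings("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", 0.34))
```

Passing random.random() as prop gives a random result. Note that strftime zero-pads the month, day, and hour, so the output looks like 01/01/2008 rather than 1/1/2008.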

回答 3

更新的答案

使用Faker甚至更简单。

安装

pip install faker

用法：

from faker import Faker
fake = Faker()

fake.date_between(start_date='today', end_date='+30y')
# datetime.date(2025, 3, 12)

fake.date_time_between(start_date='-30y', end_date='now')
# datetime.datetime(2007, 2, 28, 11, 28, 16)

# Or if you need a more specific date boundaries, provide the start 
# and end dates explicitly.
import datetime
start_date = datetime.date(year=2015, month=1, day=1)
fake.date_between(start_date=start_date, end_date='+30y')

旧答案

使用雷达非常简单

安装

pip install radar

用法

import datetime

import radar 

# Generate random datetime (parsing dates from str values)
radar.random_datetime(start='2000-05-24', stop='2013-05-24T23:59:59')

# Generate random datetime from datetime.datetime values
radar.random_datetime(
    start = datetime.datetime(year=2000, month=5, day=24),
    stop = datetime.datetime(year=2013, month=5, day=24)
)

# Just render some random datetime. If no range is given, start defaults to 
# 1970-01-01 and stop defaults to datetime.datetime.now()
radar.random_datetime()

Updated answer

It's even simpler using Faker.

Installation

pip install faker

Usage:

from faker import Faker
fake = Faker()

fake.date_between(start_date='today', end_date='+30y')
# datetime.date(2025, 3, 12)

fake.date_time_between(start_date='-30y', end_date='now')
# datetime.datetime(2007, 2, 28, 11, 28, 16)

# Or, if you need more specific date boundaries, provide the start
# and end dates explicitly.
import datetime
start_date = datetime.date(year=2015, month=1, day=1)
fake.date_between(start_date=start_date, end_date='+30y')

Old answer

It’s very simple using radar

Installation

pip install radar

Usage

import datetime

import radar 

# Generate random datetime (parsing dates from str values)
radar.random_datetime(start='2000-05-24', stop='2013-05-24T23:59:59')

# Generate random datetime from datetime.datetime values
radar.random_datetime(
    start = datetime.datetime(year=2000, month=5, day=24),
    stop = datetime.datetime(year=2013, month=5, day=24)
)

# Just render some random datetime. If no range is given, start defaults to 
# 1970-01-01 and stop defaults to datetime.datetime.now()
radar.random_datetime()

回答 4

这是另一种方法-这种工作。

from random import randint
import datetime

date=datetime.date(randint(2005,2025), randint(1,12),randint(1,28))

更好的方法

startdate=datetime.date(YYYY,MM,DD)
date=startdate+datetime.timedelta(randint(1,365))

This is a different approach that sort of works:

from random import randint
import datetime

date=datetime.date(randint(2005,2025), randint(1,12),randint(1,28))

A better approach:

# Replace YYYY, MM, DD with the year, month, and day of the start date.
startdate = datetime.date(YYYY, MM, DD)
date = startdate + datetime.timedelta(days=randint(1, 365))

回答 5

由于Python 3 timedelta支持浮点数乘法，因此现在您可以执行以下操作：

import random
random_date = start + (end - start) * random.random()

鉴于start和end是类型的datetime.datetime。例如，要在第二天生成一个随机的日期时间：

import random
from datetime import datetime, timedelta

start = datetime.now()
end = start + timedelta(days=1)
random_date = start + (end - start) * random.random()

Since Python 3.2, timedelta supports multiplication by floats, so you can now do:

import random
random_date = start + (end - start) * random.random()

given that start and end are of the type datetime.datetime. For example, to generate a random datetime within the next day:

import random
from datetime import datetime, timedelta

start = datetime.now()
end = start + timedelta(days=1)
random_date = start + (end - start) * random.random()

回答 6

要使用基于熊猫的解决方案，我使用：

import pandas as pd
import numpy as np

def random_date(start, end, position=None):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    delta = (end - start).total_seconds()
    if position is None:
        offset = np.random.uniform(0., delta)
    else:
        offset = position * delta
    offset = pd.offsets.Second(offset)
    t = start + offset
    return t

我喜欢它，因为很好 pd.Timestamp出色功能使我可以抛出不同的内容和格式。考虑以下几个示例…

你的签名。

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM", position=0.34)
Timestamp('2008-05-04 21:06:48', tz=None)

随机位置。

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM")
Timestamp('2008-10-21 05:30:10', tz=None)

不同的格式。

>>> random_date('2008-01-01 13:30', '2009-01-01 4:50')
Timestamp('2008-11-18 17:20:19', tz=None)

直接传递熊猫/日期时间对象。

>>> random_date(pd.datetime.now(), pd.datetime.now() + pd.offsets.Hour(3))
Timestamp('2014-03-06 14:51:16.035965', tz=None)

To chip in a pandas-based solution I use:

import pandas as pd
import numpy as np

def random_date(start, end, position=None):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    delta = (end - start).total_seconds()
    if position is None:
        offset = np.random.uniform(0., delta)
    else:
        offset = position * delta
    offset = pd.Timedelta(seconds=offset)  # Timedelta accepts fractional seconds
    t = start + offset
    return t

I like it, because of the nice pd.Timestamp features that allow me to throw different stuff and formats at it. Consider the following few examples…

Your signature.

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM", position=0.34)
Timestamp('2008-05-04 21:06:48', tz=None)

Random position.

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM")
Timestamp('2008-10-21 05:30:10', tz=None)

Different format.

>>> random_date('2008-01-01 13:30', '2009-01-01 4:50')
Timestamp('2008-11-18 17:20:19', tz=None)

Passing pandas/datetime objects directly.

>>> random_date(pd.Timestamp.now(), pd.Timestamp.now() + pd.offsets.Hour(3))
Timestamp('2014-03-06 14:51:16.035965', tz=None)

回答 7

这是标题标题的字面意思的答案，而不是问题的正文：

import time
import datetime
import random

def date_to_timestamp(d) :
  return int(time.mktime(d.timetuple()))

def randomDate(start, end):
  """Get a random date between two dates"""

  stime = date_to_timestamp(start)
  etime = date_to_timestamp(end)

  ptime = stime + random.random() * (etime - stime)

  return datetime.date.fromtimestamp(ptime)

这段代码大致基于公认的答案。

Here is an answer to the literal meaning of the title rather than the body of this question:

import time
import datetime
import random

def date_to_timestamp(d) :
  return int(time.mktime(d.timetuple()))

def randomDate(start, end):
  """Get a random date between two dates"""

  stime = date_to_timestamp(start)
  etime = date_to_timestamp(end)

  ptime = stime + random.random() * (etime - stime)

  return datetime.date.fromtimestamp(ptime)

This code is based loosely on the accepted answer.

回答 8

您可以使用Mixer，

pip install mixer

和，

from mixer import generators as gen
print gen.get_datetime(min_datetime=(1900, 1, 1, 0, 0, 0), max_datetime=(2020, 12, 31, 23, 59, 59))

You can use Mixer:

pip install mixer

and,

from mixer import generators as gen
print(gen.get_datetime(min_datetime=(1900, 1, 1, 0, 0, 0), max_datetime=(2020, 12, 31, 23, 59, 59)))

回答 9

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Create random datetime object."""

from datetime import datetime
import random


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


if __name__ == '__main__':
    import doctest
    doctest.testmod()


回答 10

将您的日期转换为时间戳并random.randint使用时间戳进行调用，然后将随机生成的时间戳转换回日期：

from datetime import datetime
import random

def random_date(first_date, second_date):
    first_timestamp = int(first_date.timestamp())
    second_timestamp = int(second_date.timestamp())
    random_timestamp = random.randint(first_timestamp, second_timestamp)
    return datetime.fromtimestamp(random_timestamp)

那你可以这样用

from datetime import datetime

d1 = datetime.strptime("1/1/2018 1:30 PM", "%m/%d/%Y %I:%M %p")
d2 = datetime.strptime("1/1/2019 4:50 AM", "%m/%d/%Y %I:%M %p")

random_date(d1, d2)

random_date(d2, d1)  # ValueError because the first date comes after the second date

如果您关心时区，则应该date_time_between_dates从Faker库中使用它，因为我已经从中窃取了此代码，因为已经给出了另一个答案。

Convert your dates into timestamps and call random.randint with the timestamps, then convert the randomly generated timestamp back into a date:

from datetime import datetime
import random

def random_date(first_date, second_date):
    first_timestamp = int(first_date.timestamp())
    second_timestamp = int(second_date.timestamp())
    random_timestamp = random.randint(first_timestamp, second_timestamp)
    return datetime.fromtimestamp(random_timestamp)

Then you can use it like this

from datetime import datetime

d1 = datetime.strptime("1/1/2018 1:30 PM", "%m/%d/%Y %I:%M %p")
d2 = datetime.strptime("1/1/2019 4:50 AM", "%m/%d/%Y %I:%M %p")

random_date(d1, d2)

random_date(d2, d1)  # ValueError because the first date comes after the second date

If you care about timezones you should just use date_time_between_dates from the Faker library, where I stole this code from, as a different answer already suggests.

Answer 11

1. Convert your input dates to numbers (int, float, whatever is best for your usage).
2. Choose a number between your two date numbers.
3. Convert this number back to a date.

Many algorithms for converting dates to and from numbers are already available in many operating systems.
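As a rough illustration of these three steps in Python, the sketch below uses the proleptic Gregorian ordinal that datetime.date already provides as the "number". The function name random_day and the rng parameter are assumptions made for this example, and it works at whole-day resolution only.

```python
import random
from datetime import date


def random_day(start, end, rng=random):
    """Pick a random calendar day in [start, end] via date ordinals."""
    # Step 1: convert the dates to integers.
    lo, hi = start.toordinal(), end.toordinal()
    # Step 2: choose an integer between them (both ends included).
    chosen = rng.randint(lo, hi)
    # Step 3: convert the integer back to a date.
    return date.fromordinal(chosen)


print(random_day(date(2009, 8, 21), date(2010, 10, 12)))
```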

Answer 12

What do you need the random number for? Usually (depending on the language) you can get the number of seconds or milliseconds since the epoch from a date. So for a random date between startDate and endDate you could do the following:

1. Compute the time in ms between startDate and endDate (endDate.toMilliseconds() - startDate.toMilliseconds()).
2. Generate a random number between 0 and the number you obtained in step 1.
3. Generate a new Date with time offset = startDate.toMilliseconds() + the number obtained in step 2.
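The answer is language-neutral; one possible rendering of its steps in Python is sketched below. Python's datetime has no toMilliseconds() method, so the sketch assumes the span can be measured by dividing the timedelta by one millisecond. The name random_datetime_ms is hypothetical.

```python
import random
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)


def random_datetime_ms(start, end, rng=random):
    """Return start plus a random whole number of milliseconds, at most end."""
    # Step 1: length of the interval in whole milliseconds.
    span_ms = (end - start) // ONE_MS
    # Step 2: random integer between 0 and that length (inclusive).
    offset_ms = rng.randint(0, span_ms)
    # Step 3: add the offset back onto the start.
    return start + offset_ms * ONE_MS


print(random_datetime_ms(datetime(2020, 1, 1), datetime(2020, 1, 2)))
```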

Answer 13

The easiest way of doing this is to convert both dates to timestamps, then use these as the minimum and maximum bounds for a random number generator.

A quick PHP example would be:

// Find a randomDate between $start_date and $end_date
function randomDate($start_date, $end_date)
{
    // Convert to timestamps
    $min = strtotime($start_date);
    $max = strtotime($end_date);

    // Generate random number using above bounds
    $val = rand($min, $max);

    // Convert back to desired date format
    return date('Y-m-d H:i:s', $val);
}

This function makes use of strtotime() to convert a datetime description into a Unix timestamp, and date() to make a valid date out of the random timestamp which has been generated.

Answer 14

Just to add another one:

import datetime
import random

datestring = datetime.datetime(
    random.randint(2000, 2015),   # year (both bounds included)
    random.randint(1, 12),        # month
    random.randint(1, 28),        # day; 28 exists in every month
    random.randrange(24),         # hour 0-23
    random.randrange(60),         # minute 0-59
    random.randrange(60),         # second 0-59
    random.randrange(1000000),    # microsecond
).strftime('%Y-%m-%d %H:%M:%S')

Day handling needs some care. Capping the day at 28 keeps you on the safe side, because every month has at least 28 days.
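If days 29 to 31 should also be reachable, one option is to look up the length of the chosen month first. The sketch below does this with calendar.monthrange from the standard library; the function name random_datetime_in_years is made up for this example.

```python
import calendar
import random
from datetime import datetime


def random_datetime_in_years(first_year, last_year, rng=random):
    """Build a random datetime field by field, with a valid day for the month."""
    year = rng.randint(first_year, last_year)
    month = rng.randint(1, 12)
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    day = rng.randint(1, days_in_month)
    return datetime(year, month, day,
                    rng.randrange(24), rng.randrange(60), rng.randrange(60),
                    rng.randrange(1000000))


print(random_datetime_in_years(2000, 2015).strftime('%Y-%m-%d %H:%M:%S'))
```

Note that this picks the month uniformly before the day, so an individual day in a short month is slightly more likely than a day in a long month. If a uniform distribution over time matters, the timestamp-based answers are a better fit.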

Answer 15

Here’s a solution modified from emyller’s approach which returns an array of random dates at any resolution

import numpy as np

def random_dates(start, end, size=1, resolution='s'):
    """
    Returns an array of random dates in the interval [start, end). Valid
    resolution arguments are numpy date/time units, as documented at:
        https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html
    """
    start, end = np.datetime64(start), np.datetime64(end)
    delta = (end-start).astype('timedelta64[{}]'.format(resolution))
    delta_mat = np.random.randint(0, delta.astype('int'), size)
    return start + delta_mat.astype('timedelta64[{}]'.format(resolution))

Part of what’s nice about this approach is that np.datetime64 is really good at coercing things to dates, so you can specify your start/end dates as strings, datetimes, pandas timestamps… pretty much anything will work.

Answer 16

Conceptually it’s quite simple. Depending on which language you’re using you will be able to convert those dates into some reference 32 or 64 bit integer, typically representing seconds since epoch (1 January 1970) otherwise known as “Unix time” or milliseconds since some other arbitrary date. Simply generate a random 32 or 64 bit integer between those two values. This should be a one liner in any language.

On some platforms you can generate a time as a double (in one implementation the date is the integer part and the time is the fractional part). The same principle applies, except that you’re dealing with single or double precision floating point numbers (“floats” or “doubles” in C, Java and other languages). Take the difference, multiply it by a random number (0 <= r <= 1), add the result to the start time, and you’re done.
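In Python the floating-point variant can be written without leaving datetime arithmetic, because a timedelta can be multiplied by a float. The following is only a sketch of that idea; random_between is a hypothetical name, and random.random() yields values in [0, 1), so the end point is reached at most through rounding to the nearest microsecond.

```python
import random
from datetime import datetime


def random_between(start, end, rng=random):
    """Return start + r * (end - start) for a random fraction r in [0, 1)."""
    span = end - start               # difference as a timedelta
    return start + span * rng.random()


print(random_between(datetime(2009, 8, 21), datetime(2010, 10, 12)))
```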

Answer 17

In Python:

>>> from dateutil.rrule import rrule, DAILY
>>> import datetime, random
>>> random.choice(
                 list(
                     rrule(DAILY, 
                           dtstart=datetime.date(2009,8,21), 
                           until=datetime.date(2010,10,12))
                     )
                 )
datetime.datetime(2010, 2, 1, 0, 0)

(requires the python-dateutil library: pip install python-dateutil)

Answer 18

Use Apache Commons (the org.apache.commons.math package) to generate a random long within a given range, and then create a Date out of that long.

Example:

import java.util.Date;
import org.apache.commons.math.random.RandomData;
import org.apache.commons.math.random.RandomDataImpl;

public Date nextDate(Date min, Date max) {
    RandomData randomData = new RandomDataImpl();
    // nextLong() draws a long between the two epoch-millisecond bounds
    return new Date(randomData.nextLong(min.getTime(), max.getTime()));
}

Answer 19

I made this for another project using random and time. The first argument to strftime() is a general format string (see the time module documentation for the available directives). The second argument is a time tuple whose fields are filled in with random.randrange, which returns an integer from the half-open range given by its arguments. Change the ranges to match the dates you would like. Every field of the tuple must be within the range that strftime() accepts.

import time
import random


def get_random_date():
    # Tuple fields: year, month, day, hour, minute, second,
    # weekday (0-6), day of year (1-366), DST flag
    return time.strftime("%Y-%m-%d %H:%M:%S", (
        random.randrange(2000, 2016), random.randrange(1, 13),
        random.randrange(1, 29), random.randrange(0, 24),
        random.randrange(0, 60), random.randrange(0, 60),
        random.randrange(0, 7), random.randrange(1, 367), 1))

Answer 20

Pandas + numpy solution

import pandas as pd
import numpy as np

def RandomTimestamp(start, end):
    dts = (end - start).total_seconds()
    return start + pd.Timedelta(np.random.uniform(0, dts), 's')

dts is the difference between timestamps in seconds (float). It is then used to create a pandas timedelta between 0 and dts, that is added to the start timestamp.

Answer 21

Based on the answer by mouviciel, here is a vectorized solution using numpy. Convert the start and end dates to ints, generate an array of random numbers between them, and convert the whole array back to dates.

import time
import datetime
import numpy as np

n_rows = 10

start_time = "01/12/2011"
end_time = "05/08/2017"

# mktime() returns a float; randint() needs integer bounds
date2int = lambda s: int(time.mktime(datetime.datetime.strptime(s,"%d/%m/%Y").timetuple()))
# apply_along_axis passes each row as a 1-element array, hence s[0]
int2date = lambda s: datetime.datetime.fromtimestamp(int(s[0])).strftime('%Y-%m-%d %H:%M:%S')

start_time = date2int(start_time)
end_time = date2int(end_time)

random_ints = np.random.randint(low=start_time, high=end_time, size=(n_rows,1))
random_dates = np.apply_along_axis(int2date, 1, random_ints).reshape(n_rows,1)

print(random_dates)

Answer 22

This is a modified version of Tom Alsberg’s method. I modified it to return dates with sub-second precision (the %f directive, i.e. microseconds).

import random
import time
import datetime

def random_date(start_time_string, end_time_string, format_string, random_number):
    """
    Get a time at a proportion of a range of two formatted times.
    start_time_string and end_time_string should be strings specifying
    times formatted in the given format (strftime-style), giving an
    interval [start, end]. random_number specifies the proportion of the
    interval to be taken after start (a float in [0, 1]). The returned
    time will be in the specified format.
    """
    dt_start = datetime.datetime.strptime(start_time_string, format_string)
    dt_end = datetime.datetime.strptime(end_time_string, format_string)

    start_time = time.mktime(dt_start.timetuple()) + dt_start.microsecond / 1000000.0
    end_time = time.mktime(dt_end.timetuple()) + dt_end.microsecond / 1000000.0

    random_time = start_time + random_number * (end_time - start_time)

    return datetime.datetime.fromtimestamp(random_time).strftime(format_string)

Example:

print(random_date("2000/01/01 00:00:00.000000", "2049/12/31 23:59:59.999999", '%Y/%m/%d %H:%M:%S.%f', random.random()))

Sample output: 2028/07/08 12:34:49.977963

Answer 23

import time
from random import randrange
# %H (24-hour clock) is used because the strings carry no AM/PM marker
start_timestamp = int(time.mktime(time.strptime('Jun 1 2010  01:33:00', '%b %d %Y %H:%M:%S')))
end_timestamp = int(time.mktime(time.strptime('Jun 1 2017  12:33:00', '%b %d %Y %H:%M:%S')))
time.strftime('%b %d %Y %H:%M:%S', time.localtime(randrange(start_timestamp, end_timestamp)))

Answer 24

# Needed to create data for 1000 fictitious employees for testing code.
# Code relating to randomly assigning forenames, surnames, and genders
# has been removed as not germane to the question asked above, but FYI:
# genders were randomly assigned, forenames/surnames were web scraped,
# there is no accounting for leap years, and the data is stored in MySQL.

import random
from datetime import datetime
from datetime import timedelta

for employee in range(1000):
    # assign a random date of birth (employees are aged between sixteen and sixty-five)
    # (starts one day past the sixteenth birthday so the hire range below is never empty)
    dlt = random.randint(365*16 + 1, 365*65)
    dob = datetime.today() - timedelta(days=dlt)
    # assign a random date of hire sometime between sixteenth birthday and yesterday
    doh = datetime.today() - timedelta(days=random.randint(1, dlt-365*16))
    print("born {} hired {}".format(dob.strftime("%d-%m-%y"), doh.strftime("%d-%m-%y")))

Answer 25

An alternative way to create random dates between two dates, using np.random.randint(), pd.Timestamp().value, and pd.to_datetime() in a list comprehension:

# Import libraries
import numpy as np
import pandas as pd

# Initialize
start = '2020-01-01' # Specify start date
end = '2020-03-10' # Specify end date
n = 10 # Specify number of dates needed

# Get random dates
# .value is nanoseconds since the epoch, so request 64-bit integers explicitly
x = np.random.randint(pd.Timestamp(start).value, pd.Timestamp(end).value, n, dtype=np.int64)
random_dates = [pd.to_datetime(i, unit='ns').strftime('%Y-%m-%d') for i in x]

print(random_dates)

Output (one sample run)

['2020-01-06',
 '2020-03-08',
 '2020-01-23',
 '2020-02-03',
 '2020-01-30',
 '2020-01-05',
 '2020-02-16',
 '2020-03-08',
 '2020-02-09',
 '2020-01-04']

Can I redirect stdout in Python into some sort of string buffer?

August 17, 2021, Python实用宝典

Question: Can I redirect stdout in Python into some sort of string buffer?

I’m using python’s ftplib to write a small FTP client, but some of the functions in the package don’t return string output, but print to stdout. I want to redirect stdout to an object which I’ll be able to read the output from.

I know stdout can be redirected into any regular file with:

stdout = open("file", "a")

But I prefer a method that doesn’t use the local drive.

I’m looking for something like the BufferedReader in Java that can be used to wrap a buffer into a stream.

Answer 0

from cStringIO import StringIO # Python3 use: from io import StringIO
import sys

old_stdout = sys.stdout
sys.stdout = mystdout = StringIO()

# blah blah lots of code ...

sys.stdout = old_stdout

# examine mystdout.getvalue()

Answer 1

Python 3.4 added the contextlib.redirect_stdout() function:

import io
from contextlib import redirect_stdout

with io.StringIO() as buf, redirect_stdout(buf):
    print('redirected')
    output = buf.getvalue()

The original answer linked to a code example showing how to implement it on older Python versions.
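The example referred to above is not reproduced on this page. As a rough stand-in, the sketch below shows one way such a context manager could be written with contextlib.contextmanager. The name redirect_stdout_compat is made up for this illustration and this is not the standard library's implementation. It is written to run on Python 3; on Python 2 the buffer would need to be a StringIO.StringIO rather than io.StringIO.

```python
import sys
from contextlib import contextmanager
from io import StringIO


@contextmanager
def redirect_stdout_compat(target):
    """Temporarily point sys.stdout at target, restoring it afterwards."""
    old_stdout = sys.stdout
    sys.stdout = target
    try:
        yield target
    finally:
        # Runs even if the body raises, so stdout is always restored.
        sys.stdout = old_stdout


buf = StringIO()
with redirect_stdout_compat(buf):
    print('redirected')
output = buf.getvalue()
```

Unlike the Python 3.4 example, the buffer is created outside the with statement here, so its contents can still be read after the block ends.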

Answer 2

Just to add to Ned’s answer above: you can use this to redirect output to any object that implements a write(str) method.

This can be used to good effect to “catch” stdout output in a GUI application.

Here’s a silly example in PyQt:

import sys
from PyQt4 import QtGui

class OutputWindow(QtGui.QPlainTextEdit):
    def write(self, txt):
        self.appendPlainText(str(txt))

app = QtGui.QApplication(sys.argv)
out = OutputWindow()
sys.stdout = out
out.show()
print("hello world !")
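The same idea can be tried without a GUI toolkit. The sketch below is a minimal illustration for Python 3: ListWriter and capture are hypothetical names, and the only thing print() relies on here is the write() method (flush() is added because some callers expect it).

```python
import sys


class ListWriter:
    """Minimal file-like object that collects everything written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush


def capture(func):
    """Run func() with sys.stdout replaced, and return what it printed."""
    writer = ListWriter()
    old_stdout = sys.stdout
    sys.stdout = writer
    try:
        func()
    finally:
        sys.stdout = old_stdout
    return "".join(writer.chunks)


captured = capture(lambda: print("hello world !"))
```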

Answer 3

Starting with Python 2.6 you can use anything implementing the TextIOBase API from the io module as a replacement. This solution also enables you to use sys.stdout.buffer.write() in Python 3 to write (already) encoded byte strings to stdout (see stdout in Python 3). Using StringIO wouldn’t work then, because neither sys.stdout.encoding nor sys.stdout.buffer would be available.

A solution using TextIOWrapper:

import sys
from io import TextIOWrapper, BytesIO

# setup the environment
old_stdout = sys.stdout
sys.stdout = TextIOWrapper(BytesIO(), sys.stdout.encoding)

# do something that writes to stdout or stdout.buffer

# get output
sys.stdout.seek(0)      # jump to the start
out = sys.stdout.read() # read output

# restore stdout
sys.stdout.close()
sys.stdout = old_stdout

This solution works for Python 2 >= 2.6 and Python 3.

Please note that our new sys.stdout.write() only accepts unicode strings and sys.stdout.buffer.write() only accepts byte strings. This might not be the case for old code, but is often the case for code that is built to run on Python 2 and 3 without changes, which again often makes use of sys.stdout.buffer.

You can build a slight variation that accepts unicode and byte strings for write():

class StdoutBuffer(TextIOWrapper):
    def write(self, string):
        try:
            return super(StdoutBuffer, self).write(string)
        except TypeError:
            # redirect encoded byte strings directly to buffer
            return super(StdoutBuffer, self).buffer.write(string)

You don’t have to set the encoding of the buffer to sys.stdout.encoding, but this helps when using this method for testing/comparing script output.
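To see text and byte writes land in the same buffer, here is a small Python 3 sketch built on the same TextIOWrapper(BytesIO()) idea. The helper name capture_text_and_bytes is an assumption for this example, the encoding is fixed to UTF-8 rather than taken from sys.stdout.encoding, and write_through=True is used so that text writes reach the byte buffer immediately and keep their order relative to direct buffer writes.

```python
import io
import sys


def capture_text_and_bytes(func):
    """Run func() and return everything it wrote to stdout or stdout.buffer."""
    old_stdout = sys.stdout
    wrapper = io.TextIOWrapper(io.BytesIO(), encoding='utf-8',
                               newline='\n', write_through=True)
    sys.stdout = wrapper
    try:
        func()
        wrapper.flush()
        data = wrapper.buffer.getvalue()
    finally:
        sys.stdout = old_stdout
        wrapper.close()
    return data.decode('utf-8')


def demo():
    print("text line")                         # goes through the text layer
    sys.stdout.buffer.write(b"bytes line\n")   # goes straight to the BytesIO


result = capture_text_and_bytes(demo)
```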

Answer 4

This method restores sys.stdout even if there’s an exception. It also gets any output before the exception.

import io
import sys

real_stdout = sys.stdout
fake_stdout = io.BytesIO()   # or perhaps io.StringIO()
try:
    sys.stdout = fake_stdout
    # do what you have to do to create some output
finally:
    sys.stdout = real_stdout
    output_string = fake_stdout.getvalue()
    fake_stdout.close()
    # do what you want with the output_string

Tested in Python 2.7.10 using io.BytesIO()

Tested in Python 3.6.4 using io.StringIO()

Bob, I added the following in case anything from the modified and extended experiment below is of interest; otherwise feel free to delete it.

For information: a few remarks from extended experimentation while looking for a viable way to “grab” the output that numexpr.print_versions() sends directly to stdout (needed to keep a GUI clean and to collect details into a debugging report).

# THIS WORKS AS HELL: as Bob Stein proposed years ago:
#  py2 SURPRISEDaBIT:
#
import io
import sys
#
real_stdout = sys.stdout                        #           PUSH <stdout> ( store to REAL_ )
fake_stdout = io.BytesIO()                      #           .DEF FAKE_
try:                                            # FUSED .TRY:
    sys.stdout.flush()                          #           .flush() before
    sys.stdout = fake_stdout                    #           .SET <stdout> to use FAKE_
    # ----------------------------------------- #           +    do what you gotta do to create some output
    print 123456789                             #           + 
    import  numexpr                             #           + 
    QuantFX.numexpr.__version__                 #           + [3] via fake_stdout re-assignment, as was bufferred + "late" deferred .get_value()-read into print, to finally reach -> real_stdout
    QuantFX.numexpr.print_versions()            #           + [4] via fake_stdout re-assignment, as was bufferred + "late" deferred .get_value()-read into print, to finally reach -> real_stdout
    _ = os.system( 'echo os.system() redir-ed' )#           + [1] goes straight to the real stdout (fd 1), bypassing fake_stdout; the return code is bound to _ so the interactive shell does not echo it into fake_stdout
    _ = os.write(  sys.stderr.fileno(),         #           + [2] goes via stderr, which still has a real file descriptor; the returned byte count is bound to _ for the same reason
                       b'os.write()  redir-ed' )#  *OTHERWISE, asking fake_stdout <_io.BytesIO object at 0x02C0BB10> for its .fileno() raises:
    # ----------------------------------------- #           io.UnsupportedOperation: fileno
    #'''                                                    NOTE: <_io.BytesIO object at 0x02C0BB10> does list a .fileno() method
    #>>> 'fileno' in dir( sys.stdout )       -> True        it is advertised,
    #>>> pass;            sys.stdout.fileno  -> <built-in method fileno of _io.BytesIO object at 0x02C0BB10>
    #>>> pass;            sys.stdout.fileno()-> Traceback (most recent call last):
    #                                             File "<stdin>", line 1, in <module>
    #                                           io.UnsupportedOperation: fileno
    #                                                       but calling it is refused
    #'''
finally:                                        # == FINALLY:
    sys.stdout.flush()                          #           flush the fake stdout before switching back
    sys.stdout = real_stdout                    #           restore the saved real stdout
    sys.stdout.flush()                          #           flush again once the real stdout is back
    out_string = fake_stdout.getvalue()         #           read the captured text from the fake stdout
    fake_stdout.close()                         #           release the in-memory buffer
    # +++++++++++++++++++++++++++++++++++++     # do what you want with the out_string
    #
    print "\n{0:}\n{1:}{0:}".format( 60 * "/\\",# deferred print: the captured out_string reaches the real stdout only here, at the very end
                                     out_string #
                                     )
'''
PASS'd:::::
...
os.system() redir-ed
os.write()  redir-ed
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
123456789
'2.5'
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   2.5
NumPy version:     1.10.4
Python version:    2.7.13 |Anaconda 4.0.0 (32-bit)| (default, May 11 2017, 14:07:41) [MSC v.1500 32 bit (Intel)]
AMD/Intel CPU?     True
VML available?     True
VML/MKL version:   Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for 32-bit applications
Number of threads used by default: 4 (out of 4 detected cores)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
>>>

EXC'd :::::
...
os.system() redir-ed
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
123456789
'2.5'
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   2.5
NumPy version:     1.10.4
Python version:    2.7.13 |Anaconda 4.0.0 (32-bit)| (default, May 11 2017, 14:07:41) [MSC v.1500 32 bit (Intel)]
AMD/Intel CPU?     True
VML available?     True
VML/MKL version:   Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for 32-bit applications
Number of threads used by default: 4 (out of 4 detected cores)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
io.UnsupportedOperation: fileno
'''
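
The fileno refusal shown in the traces above is not specific to BytesIO or to Python 2.7. Here is a minimal Python 3 sketch of the same behaviour with io.StringIO; the variable names are illustrative and not taken from the original answer:

```python
import io

fake_stdout = io.StringIO()

# The in-memory stream exposes a fileno attribute...
has_fileno_attr = hasattr(fake_stdout, "fileno")

# ...but calling it fails, because no OS-level descriptor backs the buffer.
try:
    fake_stdout.fileno()
    fileno_works = True
except io.UnsupportedOperation:
    fileno_works = False

print(has_fileno_attr, fileno_works)
```

This is why anything that writes to a file descriptor rather than to sys.stdout (os.write(), os.system(), child processes) cannot be captured by swapping sys.stdout alone.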

Answer 5

A context manager for Python 3:

import sys
from io import StringIO


class RedirectedStdout:
    def __init__(self):
        self._stdout = None
        self._string_io = None

    def __enter__(self):
        # Remember the real stdout and swap in an in-memory buffer.
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # Restore the real stdout; the buffer stays readable afterwards.
        sys.stdout = self._stdout

    def __str__(self):
        return self._string_io.getvalue()

Use it like this:

>>> with RedirectedStdout() as out:
...     print('asdf')
...     s = str(out)
...     print('bsdf')
...
>>> print(repr(s), repr(str(out)))
'asdf\n' 'asdf\nbsdf\n'
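
One property worth checking is that __exit__ runs even when the with block raises, so sys.stdout is restored and the text captured before the error is still available. This is a rough sketch of that check; the class is repeated in shortened form so that the snippet is self-contained, and the ValueError is only a stand-in for a real failure:

```python
import sys
from io import StringIO


class RedirectedStdout:
    """Shortened copy of the class above, repeated for self-containment."""

    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # Runs even when the block raises; returning None lets the
        # exception propagate to the caller.
        sys.stdout = self._stdout

    def __str__(self):
        return self._string_io.getvalue()


original_stdout = sys.stdout
try:
    with RedirectedStdout() as out:
        print("before the error")
        raise ValueError("boom")  # stand-in for a real failure
except ValueError:
    pass

restored = sys.stdout is original_stdout  # the real stdout is back
captured = str(out)                       # text printed before the error
```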

Answer 6

In Python 3, the StringIO and cStringIO modules are gone; use io.StringIO instead. So do it like the first answer:

import sys
from io import StringIO

old_stdout = sys.stdout
old_stderr = sys.stderr
my_stdout = sys.stdout = StringIO()
my_stderr = sys.stderr = StringIO()

# blah blah lots of code ...

sys.stdout = old_stdout
sys.stderr = old_stderr

# To see the redirected output, make sure the standard streams have been restored first.
print(my_stdout.getvalue())
print(my_stderr.getvalue())

my_stdout.close()
my_stderr.close()
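
If the code between the swap and the restore raises, the snippet above leaves sys.stdout and sys.stderr pointing at the buffers. One possible way around that is to put the restore in a finally clause. The helper name run_captured and the demo function noisy_add below are hypothetical and used only for illustration:

```python
import sys
from io import StringIO


def run_captured(func, *args, **kwargs):
    """Hypothetical helper: call func, return (result, stdout_text, stderr_text)."""
    old_stdout, old_stderr = sys.stdout, sys.stderr
    my_stdout = sys.stdout = StringIO()
    my_stderr = sys.stderr = StringIO()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore the real streams, even if func raises.
        sys.stdout = old_stdout
        sys.stderr = old_stderr
    return result, my_stdout.getvalue(), my_stderr.getvalue()


def noisy_add(a, b):
    """Demo function that writes to both streams."""
    print("adding")
    print("warning: demo", file=sys.stderr)
    return a + b


result, out_text, err_text = run_captured(noisy_add, 2, 3)
```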

Answer 7

Use os.pipe() and write to the appropriate file descriptor.

https://docs.python.org/library/os.html#file-descriptor-operations
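
The answer gives no code, so the following is only one possible reading of it: temporarily point file descriptor 1 at the write end of a pipe with os.dup2(), then read the other end. The helper name capture_fd1 is hypothetical. The sketch also assumes the output is small enough to fit in the pipe buffer; larger output would block unless the pipe is drained from another thread.

```python
import os
import sys


def capture_fd1(func):
    """Hypothetical helper: run func, return the bytes written to descriptor 1."""
    sys.stdout.flush()            # push out anything Python has buffered so far
    saved_fd = os.dup(1)          # keep a copy of the real stdout descriptor
    read_fd, write_fd = os.pipe()
    os.dup2(write_fd, 1)          # descriptor 1 now points at the pipe's write end
    os.close(write_fd)
    try:
        func()
        sys.stdout.flush()        # flush Python-level output into the pipe too
    finally:
        os.dup2(saved_fd, 1)      # restore stdout; this closes the last write end
        os.close(saved_fd)
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()      # returns at EOF, as no write end is left open


data = capture_fd1(lambda: os.write(1, b"written to fd 1\n"))
```

Unlike replacing sys.stdout, this works at the descriptor level, so it also sees output from os.write().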

Answer 8

Here's another take on this. contextlib.redirect_stdout with io.StringIO(), as documented, is great, but it's still a bit verbose for everyday use. Here's how to make it a one-liner by subclassing contextlib.redirect_stdout:

import sys
import io
from contextlib import redirect_stdout

class capture(redirect_stdout):

    def __init__(self):
        self.f = io.StringIO()
        self._new_target = self.f  # private attribute of the parent class
        self._old_targets = []  # verbatim from parent class

    def __enter__(self):
        self._old_targets.append(getattr(sys, self._stream))  # verbatim from parent class
        setattr(sys, self._stream, self._new_target)  # verbatim from parent class
        return self  # instead of self._new_target in the parent class

    def __repr__(self):
        return self.f.getvalue()

Since __enter__ returns self, the context manager object is still available after the with block exits. Moreover, thanks to the __repr__ method, the string representation of the context manager object is the captured stdout text. So now you have:

with capture() as message:
    print('Hello World!')
print(str(message)=='Hello World!\n')  # prints True
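
For comparison, this is the documented form that the answer calls verbose. It needs no subclass and touches none of the parent class's private attributes; the variable names buffer and message are illustrative:

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print('Hello World!')

# The buffer stays readable after the with block exits.
message = buffer.getvalue()
print(message == 'Hello World!\n')
```

The subclass saves the separate buffer variable at the cost of relying on implementation details of contextlib, which could change between Python versions.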