How do I specify that a method's return type is the same as the class itself?

Question: How do I specify that a method's return type is the same as the class itself?

I have the following code in Python 3:

class Position:

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __add__(self, other: Position) -> Position:
        return Position(self.x + other.x, self.y + other.y)

But my editor (PyCharm) says that the reference Position cannot be resolved (in the __add__ method). How should I specify that I expect the return type to be of type Position?

Edit: I think this is actually a PyCharm issue. It actually uses the information in its warnings and code completion.

But correct me if I’m wrong and I need to use some other syntax.


Answer 0

TL;DR: As of today (2019), in Python 3.7+ you can turn this feature on with a future statement (from __future__ import annotations); for Python 3.6 or below, use a string. PEP 563 originally scheduled this behavior to become the default in Python 4.0.

I guess you got this exception:

NameError: name 'Position' is not defined

This is because Position must be defined before you can use it in an annotation, unless evaluation of annotations is postponed (see below).

Python 3.7+: from __future__ import annotations

Python 3.7 introduces PEP 563: postponed evaluation of annotations. A module that uses the future statement from __future__ import annotations will store annotations as strings automatically:

from __future__ import annotations

class Position:
    def __add__(self, other: Position) -> Position:
        ...

At the time of writing, this is scheduled to become the default in Python 4.0. Since Python is still a dynamically typed language and no type checking is done at runtime, typing annotations should have no performance impact, right? Wrong! Before Python 3.7, the typing module was one of the slowest Python modules in core, so code that imports typing can see up to a 7x performance improvement after upgrading to 3.7.

Python <3.7: use a string

According to PEP 484, you should use a string instead of the class itself:

class Position:
    ...
    def __add__(self, other: 'Position') -> 'Position':
       ...

If you use the Django framework, this may be familiar, as Django models also use strings for forward references (foreign key definitions where the foreign model is self or is not declared yet). This should work with PyCharm and other tools.
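As a minimal sketch of how string annotations behave at runtime (the variable names raw and resolved are illustrative, not part of the original answer): the strings are stored as-is, and typing.get_type_hints() can resolve them to the real class once the module has been loaded.

```python
from typing import get_type_hints


class Position:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    # The class name is quoted because the name Position is not bound yet
    # while the class body is still being executed.
    def __add__(self, other: 'Position') -> 'Position':
        return Position(self.x + other.x, self.y + other.y)


# The raw annotations are kept as plain strings.
raw = Position.__add__.__annotations__

# get_type_hints() evaluates those strings and returns the actual class.
resolved = get_type_hints(Position.__add__)
```

Static type checkers do this resolution on their own; get_type_hints() only matters for tools that inspect annotations at runtime.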

Sources

The relevant parts of PEP 484 and PEP 563, to spare you the trip:

Forward references

When a type hint contains names that have not been defined yet, that definition may be expressed as a string literal, to be resolved later.

A situation where this occurs commonly is the definition of a container class, where the class being defined occurs in the signature of some of the methods. For example, the following code (the start of a simple binary tree implementation) does not work:

class Tree:
    def __init__(self, left: Tree, right: Tree):
        self.left = left
        self.right = right

To address this, we write:

class Tree:
    def __init__(self, left: 'Tree', right: 'Tree'):
        self.left = left
        self.right = right

The string literal should contain a valid Python expression (i.e., compile(lit, '', 'eval') should be a valid code object) and it should evaluate without errors once the module has been fully loaded. The local and global namespace in which it is evaluated should be the same namespaces in which default arguments to the same function would be evaluated.

and PEP 563:

In Python 4.0, function and variable annotations will no longer be evaluated at definition time. Instead, a string form will be preserved in the respective __annotations__ dictionary. Static type checkers will see no difference in behavior, whereas tools using annotations at runtime will have to perform postponed evaluation.

The functionality described above can be enabled starting from Python 3.7 using the following special import:

from __future__ import annotations

Things that you may be tempted to do instead

A. Define a dummy Position

Before the class definition, place a dummy definition:

class Position(object):
    pass


class Position(object):
    ...

This will get rid of the NameError and may even look OK:

>>> Position.__add__.__annotations__
{'other': __main__.Position, 'return': __main__.Position}

But is it?

>>> for k, v in Position.__add__.__annotations__.items():
...     print(k, 'is Position:', v is Position)
return is Position: False
other is Position: False

B. Monkey-patch in order to add the annotations:

You may want to try some Python metaprogramming magic and write a decorator to monkey-patch the class definition in order to add annotations:

class Position:
    ...
    def __add__(self, other):
        return self.__class__(self.x + other.x, self.y + other.y)

The decorator should be responsible for the equivalent of this:

Position.__add__.__annotations__['return'] = Position
Position.__add__.__annotations__['other'] = Position

At least it seems right:

>>> for k, v in Position.__add__.__annotations__.items():
...     print(k, 'is Position:', v is Position)
return is Position: True
other is Position: True

Probably too much trouble.
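For completeness, here is a rough sketch of what such a decorator might look like. The name self_annotated is hypothetical, and the sketch assumes the class defines an unannotated __add__; it is meant to illustrate the idea, not to recommend it over string annotations.

```python
def self_annotated(cls):
    """Hypothetical decorator: annotate __add__ with the finished class."""
    # By the time the decorator runs, cls is fully defined, so it can be
    # stored in the annotations directly instead of as a string.
    annotations = cls.__add__.__annotations__
    annotations['other'] = cls
    annotations['return'] = cls
    return cls


@self_annotated
class Position:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return self.__class__(self.x + other.x, self.y + other.y)
```

Note that static type checkers generally will not see annotations added this way, which is one more reason the string or future-import approaches are usually preferable.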

Conclusion

If you are using Python 3.6 or below, use a string literal containing the class name; in 3.7+, use from __future__ import annotations and it will just work.


Answer 1

Specifying the type as a string is fine, but it always grates on me a bit that we are basically circumventing the parser. So you had better not misspell any of these literal strings:

def __add__(self, other: 'Position') -> 'Position':
    return Position(self.x + other.x, self.y + other.y)

A slight variation is to use a bound TypeVar; at least then you have to write the string only once, when declaring the TypeVar:

from typing import TypeVar

T = TypeVar('T', bound='Position')

class Position:

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __add__(self, other: T) -> T:
        return Position(self.x + other.x, self.y + other.y)
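A possible refinement of the bound-TypeVar idea, offered as a sketch rather than as the answer's own method: annotate self with the TypeVar and build the result with type(self), so that a subclass gets its own type back. The names TPosition and LabeledPosition are made up for illustration, and the sketch assumes subclasses keep the same constructor signature.

```python
from typing import TypeVar

# The bound is still given as a string, but only once.
TPosition = TypeVar('TPosition', bound='Position')


class Position:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    # Binding the TypeVar to self makes the return type follow the
    # runtime class of the left-hand operand.
    def __add__(self: TPosition, other: 'Position') -> TPosition:
        return type(self)(self.x + other.x, self.y + other.y)


class LabeledPosition(Position):
    """Hypothetical subclass with the same constructor signature."""


result = LabeledPosition(1, 2) + Position(3, 4)
```

Here result is a LabeledPosition, and a type checker can infer that too, because the return annotation is tied to the type of self.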

Answer 2

The name ‘Position’ is not available at the time the class body itself is executed. I don’t know how you are using the type declarations, but Python’s PEP 484 (which is what most tools should follow if they use these typing hints) says that you can simply put the name as a string at this point:

def __add__(self, other: 'Position') -> 'Position':
    return Position(self.x + other.x, self.y + other.y)

Check https://www.python.org/dev/peps/pep-0484/#forward-references; tools conforming to that will know to unwrap the class name from there and make use of it. (It is always important to keep in mind that the Python language itself does nothing with these annotations. They are usually meant for static code analysis, or one could have a library/framework for type checking at run time, but you have to set that up explicitly.)

Update: Also, check PEP 563. As of Python 3.7, it is possible to write from __future__ import annotations to defer the evaluation of annotations, so forward references to classes should work straightforwardly.


Answer 3

When a string-based type hint is acceptable, the __qualname__ item can also be used. It holds the name of the class, and it is available in the body of the class definition.

class MyClass:
    @classmethod
    def make_new(cls) -> __qualname__:
        return cls()

By doing this, renaming the class does not imply modifying the type hints. But I personally would not expect smart code editors to handle this form well.


Difference between exit() and sys.exit() in Python

Question: Difference between exit() and sys.exit() in Python

In Python, there are two similarly-named functions, exit() and sys.exit(). What’s the difference and when should I use one over the other?


Answer 0

exit is a helper for the interactive shell – sys.exit is intended for use in programs.

The site module (which is imported automatically during startup, except if the -S command-line option is given) adds several constants to the built-in namespace (e.g. exit). They are useful for the interactive interpreter shell and should not be used in programs.


Technically, they do mostly the same: raising SystemExit. sys.exit does so in sysmodule.c:

static PyObject *
sys_exit(PyObject *self, PyObject *args)
{
    PyObject *exit_code = 0;
    if (!PyArg_UnpackTuple(args, "exit", 0, 1, &exit_code))
        return NULL;
    /* Raise SystemExit so callers may catch it or clean up. */
    PyErr_SetObject(PyExc_SystemExit, exit_code);
    return NULL;
}

While exit is defined in site.py and _sitebuiltins.py, respectively.

class Quitter(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'Use %s() or %s to exit' % (self.name, eof)
    def __call__(self, code=None):
        # Shells like IDLE catch the SystemExit, but listen when their
        # stdin wrapper is closed.
        try:
            sys.stdin.close()
        except:
            pass
        raise SystemExit(code)
__builtin__.quit = Quitter('quit')
__builtin__.exit = Quitter('exit')

Note that there is a third exit option, namely os._exit, which exits without calling cleanup handlers, flushing stdio buffers, etc. (and which should normally only be used in the child process after a fork()).
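A small sketch of the point that both routes end in the same exception: sys.exit() simply raises SystemExit, which ordinary code can catch. The helper names exit_code_of and raise_directly are illustrative only.

```python
import sys


def exit_code_of(func):
    """Call func and return the exit code carried by SystemExit, if any."""
    try:
        func()
    except SystemExit as exc:
        # SystemExit stores the argument passed to sys.exit() in .code.
        return exc.code
    return None


def raise_directly():
    # This is what sys.exit(3) boils down to.
    raise SystemExit(3)


code_from_sys_exit = exit_code_of(lambda: sys.exit(3))
code_from_raise = exit_code_of(raise_directly)
```

Because SystemExit is an exception, try/finally blocks and context managers still run their cleanup code on the way out, which is not the case with os._exit.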


Answer 1

If I use exit() in code and run it in the shell, it shows a message asking whether I want to kill the program or not. It’s really disturbing.

But sys.exit() is better in this case. It closes the program and doesn’t create any dialogue box.


How do I get the query string in Flask?

Question: How do I get the query string in Flask?

It’s not obvious from the Flask documentation how to get the query string. I am new, looked at the docs, and could not find it!

So

@app.route('/')
@app.route('/data')
def data():
    query_string=??????
    return render_template("data.html")

Answer 0

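from flask import Flask, request

app = Flask(__name__)

@app.route('/data')
def data():
    # here we want to get the value of user (i.e. ?user=some-value)
    user = request.args.get('user')
    # a view must return a response; echo the value back (None if absent)
    return f'user={user}'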

Answer 1

The full URL is available as request.url, and the query string is available as request.query_string.

Here’s an example:

from flask import request

@app.route('/adhoc_test/')
def adhoc_test():

    return request.query_string

To access an individual known param passed in the query string, you can use request.args.get('param'). This is the “right” way to do it, as far as I know.

ETA: Before you go further, you should ask yourself why you want the query string. I’ve never had to pull in the raw string – Flask has mechanisms for accessing it in an abstracted way. You should use those unless you have a compelling reason not to.
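To make the difference between the raw query string and the parsed parameters concrete, here is a standard-library-only sketch. It does not use Flask and is not how Flask implements request.args; the sample bytes are invented to stand in for what request.query_string might hold.

```python
from urllib.parse import parse_qs

# Stand-in for request.query_string: the raw bytes after the "?".
raw_query = b'user=alice&tag=a&tag=b'

# Parsing turns the raw string into a mapping of names to value lists,
# which is roughly the kind of view request.args gives you.
parsed = parse_qs(raw_query.decode('utf-8'))

# Analogous to request.args.get('user'): first value, or None if absent.
user = parsed.get('user', [None])[0]
```

In a Flask view you would not parse the string yourself; request.args.get('user') already gives you the same value.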


Answer 2

Werkzeug/Flask has already parsed everything for you. No need to do the same work again with urlparse:

from flask import request

@app.route('/')
@app.route('/data')
def data():
    query_string = request.query_string  # There it is
    return render_template("data.html")

The full documentation for the request and response objects is in Werkzeug: http://werkzeug.pocoo.org/docs/wrappers/


Answer 3

We can do this by using the request object (request.query_string for the raw string, request.args for parsed values).

Example:

Let’s consider view.py

from flask import Response
from my_script import get_url_params

@app.route('/web_url/', methods=('get', 'post'))
def get_url_params_index():
    return Response(get_url_params())

You can also make it more modular by using Flask Blueprints: http://flask.pocoo.org/docs/0.10/blueprints/

Let’s consider the first name being passed as part of the query string /web_url/?first_name=john

# here is my_script.py

# import required flask packages
from flask import request

def get_url_params():
    # you might further need to format the URL params through escape.
    firstName = request.args.get('first_name')
    return firstName

As you can see, this is just a small example. You can fetch multiple values, format them, and use them or pass them on to the template file.


Answer 4

I came here looking for the query string, not how to get values from the query string.

request.query_string returns the URL parameters as a raw byte string (Ref 1).

Example of using request.query_string:

from flask import Flask, request

app = Flask(__name__)

@app.route('/data', methods=['GET'])
def get_query_string():
    return request.query_string

if __name__ == '__main__':
    app.run(debug=True)

Output:

(screenshot: query parameters in Flask route)

References:

  1. Official API documentation on query_string

Answer 5

Try it like this for the query string:

from flask import Flask, request

app = Flask(__name__)

@app.route('/parameters', methods=['GET'])
def query_strings():

    args1 = request.args['args1']
    args2 = request.args['args2']
    args3 = request.args['args3']

    return '''<h1>The Query String are...{}:{}:{}</h1>'''.format(args1, args2, args3)


if __name__ == '__main__':

    app.run(debug=True)


Answer 6

Every form of the query string retrievable from the Flask request object, as described in O’Reilly’s Flask Web Development:

From O’Reilly’s Flask Web Development, and as stated by Manan Gouhari earlier, first you need to import request:

from flask import request

request is an object exposed by Flask as a context variable named (you guessed it) request. As its name suggests, it contains all the information that the client included in the HTTP request. This object has many attributes and methods that you can retrieve and call, respectively.

You have quite a few request attributes which contain the query string from which to choose. Here I will list every attribute that contains in any way the query string, as well as a description from the O’Reilly book of that attribute.

First there is args which is “a dictionary with all the arguments passed in the query string of the URL.” So if you want the query string parsed into a dictionary, you’d do something like this:

from flask import request

@app.route('/')
def index():
    queryStringDict = request.args
    return str(queryStringDict)

(As others have pointed out, you can also use .get('<arg_name>') to get a specific value from the dictionary)

Then there is the form attribute, which does not contain the query string but is part of another attribute that does, which I will list momentarily. First, though, form is “A dictionary with all the form fields submitted with the request.” I mention it because there is another dictionary attribute available on the Flask request object called values. values is “A dictionary that combines the values in form and args.” Retrieving that would look something like this:

from flask import request

@app.route('/')
def index():
    formFieldsAndQueryStringDict = request.values
    return str(formFieldsAndQueryStringDict)

(Again, use .get('<arg_name>') to get a specific item out of the dictionary)

Another option is query_string, which is “The query string portion of the URL, as a raw binary value.” An example of that:

from flask import request

@app.route('/')
def index():
    queryStringRaw = request.query_string
    return queryStringRaw

Then, as an added bonus, there is full_path, which is “The path and query string portions of the URL.” For example:

from flask import request

@app.route('/')
def index():
    pathWithQueryString = request.full_path
    return pathWithQueryString

And finally, url, “The complete URL requested by the client” (which includes the query string):

from flask import request

@app.route('/')
def index():
    fullUrl = request.url
    return fullUrl

Happy hacking :)


Answer 7

This can be done using request.args.get(). For example, if your query string has a field date, it can be accessed using

date = request.args.get('date')

Don’t forget to add “request” to the list of imports from flask, i.e.

from flask import request

Answer 8

If the request is GET and we passed some query parameters, then:

from flask import render_template, request

@app.route('/')
@app.route('/data')
def data():
    if request.method == 'GET':
        # Get the parameters by key
        arg1 = request.args.get('arg1')
        arg2 = request.args.get('arg2')
        # Generate the query string
        query_string = "?arg1={0}&arg2={1}".format(arg1, arg2)
        return render_template("data.html", query_string=query_string)
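One caveat with assembling the query string by hand through format() is that values containing spaces or characters such as & are not escaped. As a hedged alternative, the standard library's urllib.parse.urlencode can build it; the helper name build_query_string and the sample values are made up for this sketch.

```python
from urllib.parse import urlencode


def build_query_string(params):
    """Return a '?'-prefixed query string with values percent-encoded."""
    # urlencode escapes reserved characters and encodes spaces as '+'.
    return '?' + urlencode(params)


query_string = build_query_string({'arg1': 'a b', 'arg2': 'x&y'})
```

Inside the view above, the dictionary would be built from the values read out of request.args.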

bash: pip: command not found

Question: bash: pip: command not found

I downloaded pip and ran python setup.py install and everything worked just fine. The very next step in the tutorial is to run pip install <lib you want> but before it even tries to find anything online I get an error “bash: pip: command not found”.

This is on Mac OS X, which I’m new to, so I’m assuming there’s some kind of path setting that was not set correctly when I ran setup.py. How can I investigate further? What do I need to check to get a better idea of the exact cause of the problem?

EDIT: I also tried installing Python 2.7 for Mac in the hope that the friendly install process would do any housekeeping, like editing PATH and whatever else needs to happen for everything to work according to the tutorials, but this didn’t work. After installing, running ‘python’ still ran Python 2.6, and PATH was not updated.


Answer 0

Why not just do sudo easy_install pip or, if this is for Python 2.6, sudo easy_install-2.6 pip?

This installs pip using the default Python package installer system and saves you the hassle of manual set-up at the same time.

This will then allow you to run the pip command for Python package installation, as it will be installed with the system Python. Once you have pip, I also recommend using the virtualenv package and pattern. :)


Answer 1

Use setuptools to install pip:

sudo easy_install pip

(I know the above part of my answer is redundant with klobucar’s, but I can’t add comments yet.) So here’s a solution to sudo: easy_install: command not found on Debian/Ubuntu:

sudo apt-get install python-setuptools

Also, for python3, use easy_install3 and python3-setuptools.


Answer 2

First of all: try pip3 instead of pip. Example:

pip3 --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

pip3 should be installed automatically together with Python 3.x. The documentation hasn’t been updated, so simply replace pip with pip3 in the instructions, for example when installing Flask.

Now, if this doesn’t work, you might have to install pip separately.


Answer 3

Update: A more reliable modern way to access the right pip install for the right python install is to use the syntax python -m pip.

Original Answer

pip would install itself into the bin directory of your Python installation. It should also create a symlink in some more common location, like /usr/local/bin/pip.

You can either edit your ~/.profile and update your PATH to include /Library/Frameworks/Python.framework/Versions/2.6/bin, or you could create a symlink to it in a place that you know is in your path.

If you do: echo $PATH, you should see the paths currently being searched. If /usr/local/bin is in your PATH, you can do:

ln -s /Library/Frameworks/Python.framework/Versions/2.6/bin/pip /usr/local/bin

I would opt for adding the python bin to your $PATH variable.
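To investigate which interpreter and which pip (if any) the shell would find, a small diagnostic along these lines may help. It is only a sketch using the standard library; the function name describe_pip_lookup is made up for illustration.

```python
import os
import shutil
import sys


def describe_pip_lookup():
    """Collect the facts that decide whether 'pip' resolves on this machine."""
    return {
        # The interpreter running this script; 'python -m pip' would use it.
        'python': sys.executable,
        # Full path of the first 'pip' found on PATH, or None if there is none.
        'pip_on_path': shutil.which('pip'),
        # The directories searched, in order.
        'path_entries': os.environ.get('PATH', '').split(os.pathsep),
    }


info = describe_pip_lookup()
for key, value in info.items():
    print(f'{key}: {value}')
```

If pip_on_path is None while the interpreter's own bin directory is missing from path_entries, that points to the PATH problem described above.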


Answer 4

Install the latest version of Python as given here.

It has many download links, like numpy and scipy.

Then go to the terminal and enter the following command:

sudo easy_install pip

For installing Python packages, check this:

Requirements for Installing Packages

This section describes the steps to follow before installing other Python packages.

Install pip, setuptools, and wheel

If you have Python 2 >=2.7.9 or Python 3 >=3.4 installed from python.org, you will already have pip and setuptools, but will need to upgrade to the latest version:

On Linux or OS X:

pip install -U pip setuptools

On Windows:

python -m pip install -U pip setuptools

If you’re using a Python install on Linux that’s managed by the system package manager (e.g. “yum”, “apt-get”, etc.), and you want to use the system package manager to install or upgrade pip, then see Installing pip/setuptools/wheel with Linux Package Managers.

Otherwise:

Securely download get-pip.py.

Run python get-pip.py. This will install or upgrade pip. Additionally, it will install setuptools and wheel if they’re not installed already.


Answer 5

I have to admit to being absolutely new to python, which I only need for one thing: awscli. I encountered this problem having downloaded python 3.x.x – pip: command not found

Whilst following the instructions for downloading the AWS cli I changed

pip install awscli

to

pip3 install awscli

which ran the correct version.

I’ve made an alias on my machine to run python3 whilst typing python, which would normally run the system version 2.7. I’m not sure this is a good idea now. I think I’ll just type in the commands as they intended them to be
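One way to sidestep the pip versus pip3 ambiguity is to run pip as a module of a specific interpreter. The sketch below only builds the command line; the helper name pip_command is hypothetical, and awscli is used because it is the package mentioned above.

```python
import sys


def pip_command(*args):
    """Build a pip command line bound to the running interpreter."""
    # "python -m pip" uses the pip installed for this interpreter,
    # whatever the names "pip" and "pip3" happen to point at on PATH.
    return [sys.executable, "-m", "pip", *args]


print(" ".join(pip_command("install", "awscli")))
```

The resulting list can be passed to subprocess.run if you actually want to perform the installation from a script.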


回答 6

请参阅“ 如何安装Pip”一文,以了解更多信息。

截至2019年,

提供下载get-pip.py https://pip.pypa.io使用下面的命令:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

使用以下命令运行get-pip.py:
sudo python get-pip.py

完成安装后,运行此命令以检查是否安装了pip。
pip --version

安装pip后,删除get-pip.py文件。
rm get-pip.py

点子网站

Check out the How to Install Pip article for more information.

As of 2019,

Download get-pip.py provided by https://pip.pypa.io using the following command:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Run get-pip.py using the following command:
sudo python get-pip.py

After you are done installing, run this command to check whether pip is installed.
pip --version

Remove get-pip.py file after installing pip.
rm get-pip.py

Pip website


回答 7

使用apt-get安装会在整个系统范围内安装pip,而不仅仅是为您的用户安装本地系统。尝试使用此命令使pip在您的系统上运行…

$ sudo apt-get install python-pip python-dev build-essential

然后pip将被安装而没有任何问题,您将可以使用“ sudo pip …”。

Installing using apt-get installs a system wide pip, not just a local one for your user. Try this command to get pip running on your system …

$ sudo apt-get install python-pip python-dev build-essential

Then pip will be installed without any issues and you will be able to use “sudo pip…”.


回答 8

不推荐使用大多数安装PIP的方法。这是最新的(2019)解决方案。请下载get-pip脚本

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

运行脚本

sudo python get-pip.py

Most of the methods to install PIP are deprecated. Here is the latest (2019) solution. Please download get-pip script

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Run the script

sudo python get-pip.py

回答 9

我花了很长时间浏览本页上的所有答案,但在s-walsh对OP问题的评论中找到了一个对我有用的

答案是使用pip3:

$ pip3 install <name-of-install>

I spent ages going through all the answers on this page but found the one that worked for me in the comments of the OP question by s-walsh

The answer is to use pip3:

$ pip3 install <name-of-install>

回答 10

解决:

  1. 将此行添加到〜/ .bash_profile

    export PATH="/usr/local/bin:$PATH"

  2. 在终端窗口中,运行

    source ~/.bash_profile

To solve:

  1. Add this line to ~/.bash_profile

    export PATH="/usr/local/bin:$PATH"

  2. In a terminal window, run

    source ~/.bash_profile


回答 11

它可能是root权限。我尝试退出root登录,使用

sudo su -l root
pip <command>

这对我行得通

It might be a root permission issue. I tried exiting the root login and using:

sudo su -l root
pip <command>

that works for me


回答 12

安装Homebrew,打开Terminal或您喜欢的OSX终端仿真器并运行

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

将Homebrew目录插入PATH环境变量的顶部。您可以通过在〜/ .profile文件底部添加以下行来完成此操作

export PATH=/usr/local/bin:/usr/local/sbin:$PATH

现在,我们可以安装Python 2.7:

$ brew install python

获取点子存储库:

$ git clone https://github.com/pypa/pip

安装点:

$ sudo easy_install pip

Install Homebrew: open Terminal or your favorite OSX terminal emulator and run

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Insert the Homebrew directory at the top of your PATH environment variable. You can do this by adding the following line at the bottom of your ~/.profile file:

export PATH=/usr/local/bin:/usr/local/sbin:$PATH

Now, we can install Python 2.7:

$ brew install python

Get pip repository:

$ git clone https://github.com/pypa/pip

Install pip:

$ sudo easy_install pip

回答 13

如果您正在运行Python 3.5,请运行以下终端命令:

sudo pip3 install -U nltk

终端中的任何其他pip命令都将类似:

pip3 install --upgrade pip
sudo pip3 install -U numpy

If you are running Python 3.5, run the following terminal command:

sudo pip3 install -U nltk

Any other pip commands in terminal would be similar:

pip3 install --upgrade pip
sudo pip3 install -U numpy

回答 14

python默认情况下安装它,但如果未安装,则可以使用以下cmd手动安装(仅适用于linux)

对于python3:

sudo apt install python3-pip 

对于python2

sudo apt install python-pip 

希望它的帮助。

pip is normally installed together with Python by default, but if it is not, you can install it manually with the following commands (Linux only).

For Python 3:

sudo apt install python3-pip

For Python 2:

sudo apt install python-pip

Hope it helps.


回答 15

避免sudo

python <(curl https://bootstrap.pypa.io/get-pip.py) --user
echo 'export "PATH=$HOME/Library/Python/2.7/bin:$PATH"' >> ~/.bash_profile

从:

http://www.pip-command-not-found.com

Avoiding sudo:

python <(curl https://bootstrap.pypa.io/get-pip.py) --user
echo 'export "PATH=$HOME/Library/Python/2.7/bin:$PATH"' >> ~/.bash_profile

From:

http://www.pip-command-not-found.com


回答 16

CentOS 7用户可以使用:

yum install python-pip

virtualenv如果您使用的是点子,也建议使用。可以用相同的方式添加它:

yum install python-virtualenv

CentOS 7 users can just use:

yum install python-pip

Also recommend using virtualenv if you’re using pip. It can be added in the same way:

yum install python-virtualenv

回答 17

假设您有互联网,请参阅: https //pip.pypa.io/en/stable/installing/

基本上运行:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

python get-pip.py

Assuming you have internet access, see: https://pip.pypa.io/en/stable/installing/

Basically, run:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

and

python get-pip.py

回答 18

(上下文:我的操作系统是使用AWS的Amazon linux。它看起来与RedHat类似,但看起来有所减少。)

退出外壳,然后打开一个新的外壳。pip命令现在可以使用。

这就是解决此位置问题的方法。

您可能还想知道:然后需要像下面的示例(例如jupyter)那样编写用于安装软件的pip命令,以便在我的系统上正常工作:

pip install jupyter --user

具体来说,请注意缺少sudo以及--user的存在

如果pip文档对所有这些都说了话,那将是非常不错的,但是我猜这将需要输入更多的字符。

(Context: My OS is Amazon linux using AWS. It seems similar to RedHat but it’s stripped down a bit, it seems.)

Exit the shell, then open a new shell. The pip command now works.

That’s what solved the problem at this location.

You might want to know as well: The pip commands to install software then needed to be written like this example (jupyter for example) to work correctly on my system:

pip install jupyter --user

Specifically, note the lack of sudo, and the presence of --user.

Would be real nice if pip docs had said anything about all this, but that would take typing in more characters I guess.
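Packages installed with --user put their command-line scripts in a per-user directory, which only helps if that directory is on PATH. The following is a rough sketch for checking this; the helper names are hypothetical, and the "bin" subdirectory is an assumption that holds on typical Linux and macOS installs but not on Windows.

```python
import os
import site


def user_scripts_dir():
    """Guess where `pip install --user` puts command-line scripts."""
    # site.getuserbase() is the per-user install prefix, e.g. ~/.local on
    # Linux. Assumption: scripts live in its "bin" subdirectory (POSIX).
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory):
    """Return True if `directory` is one of the PATH entries."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return directory in entries


scripts = user_scripts_dir()
print(scripts, "is on PATH" if is_on_path(scripts) else "is NOT on PATH")
```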


回答 19

不知道为什么以前没有提到过,但是唯一对我有用的(在我的NVIDIA Xavier上)是:

sudo apt-get install python3-pip

(或sudo apt-get install python-pip对于python 2)

Not sure why this wasn't mentioned before, but the only thing that worked for me (on my NVIDIA Xavier) was:

sudo apt-get install python3-pip

(or sudo apt-get install python-pip for python 2)


回答 20

通过升级python 3解决了这个问题 brew upgrade python:现在我可以这样做:

pip3 install  <package>  

==> python
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have 

I solved this by upgrading Python 3 with brew upgrade python. Now I can just do:

pip3 install  <package>  

==> python
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have 

回答 21

问题似乎是您的python版本和要安装的库yoıu版本不匹配。例如:如果Django是Django3,而您的python版本是2.7,则可能会收到此错误。

“安装运行后,’python’仍运行Python 2.6,并且PATH未更新。”

1-安装最新版本的Python 2-手动将PATH更改为python38并进行比较。3-尝试重新安装。

我解决了此问题,方法是使用最新版本的Python手动替换PATH。对于Windows:; C:\ python38 \ Scripts

The problem seems to be that your Python version and the version of the library you want to install do not match. For example, if you are installing Django 3 and your Python version is 2.7, you may get this error.

“After installing is running ‘python’ still ran Python 2.6 and PATH was not updated.”

1. Install the latest version of Python.
2. Change your PATH manually to point to python38 and compare them.
3. Try to reinstall.

I solved this problem by manually replacing the PATH entry with the one for the latest version of Python. On Windows: ;C:\python38\Scripts


回答 22

我为克服这个问题所做的是sudo apt install python-pip

原来我的虚拟机尚未安装pip。可以想象其他人也可能有这种情况。

What I did to overcome this was sudo apt install python-pip.

It turned out my virtual machine did not have pip installed yet. It’s conceivable that other people could have this scenario too.


回答 23

python-pip在更新pip编辑后使用过时版本的pip(9.0),当前发布的pip版本为(18.0),请/usr/bin/pip替换此导入:

from pip import main

from pip._internal import main

这适用于pip 18.0问题是pip更改main功能名称重复为/usr/bin/pip3/usr/bin/pip2

还认为/usr/local/lib/[your_python_version]/dist-packages/pip/__main__.py它应该与/usr/bin/pip

The python-pip package uses an obsolete version of pip (9.0); the current pip version at the time of this post is 18.0. After updating pip, edit /usr/bin/pip and replace this import:

from pip import main

to

from pip._internal import main

This works for pip 18.0. The problem is that pip moved the main function. Repeat for /usr/bin/pip3 and /usr/bin/pip2.

Also look at /usr/local/lib/[your_python_version]/dist-packages/pip/__main__.py. It should be the same as /usr/bin/pip.


回答 24

请执行以下操作:

sudo apt update
sudo apt install python3-pip
source ~/.bashrc

这肯定会安装pip及其所有依赖项。PS这是用于python 3,如果要使用python 2,请从第二个命令中将python3替换为python

sudo apt install python-pip

Do following:

sudo apt update
sudo apt install python3-pip
source ~/.bashrc

This will install pip with all its dependencies. P.S. This is for Python 3; if you want Python 2, replace python3 with python in the second command:

sudo apt install python-pip

回答 25

要解决Mac中的“ bash:pip:找不到命令 ”问题

在Mac 1上发现两个版本是2.7,另一个是3.7

  • 当我说sudo easy_install pip时,pip在2.7下安装

  • 当我说sudo easy_install-3.7 pip时,pip已安装在3.7下

但是,每当我需要进行pip install时,我都想在python3.7下安装该软件包,因此我在.bash_profile中设置了一个别名(alias pip = pip3)。

所以现在,每当我进行pip install时,都会在python3.7下安装

To overcome the issue “bash: pip: command not found” in Mac

I found two Python versions on my Mac: one is 2.7 and the other is 3.7.

  • when I say sudo easy_install pip, pip got installed under 2.7

  • when I say sudo easy_install-3.7 pip, pip got installed under 3.7

But whenever I ran pip install, I wanted the package installed under Python 3.7, so I set an alias (alias pip=pip3) in .bash_profile.

So now, whenever I run pip install, the package gets installed under Python 3.7.


回答 26

更新的安装命令为pip3

sudo apt-get install python3-pip

The updated command for installing pip3 is :

sudo apt-get install python3-pip

如何避免Python / Pandas在保存的csv中创建索引?

问题:如何避免Python / Pandas在保存的csv中创建索引?

对文件进行一些编辑后,我试图将csv保存到文件夹。

每次我使用pd.to_csv('C:/Path of file.csv')csv文件时,都有单独的索引列。我想避免将索引打印到csv。

我试过了:

pd.read_csv('C:/Path to file to edit.csv', index_col = False)

并保存文件…

pd.to_csv('C:/Path to save edited file.csv', index_col = False)

但是,我仍然得到不需要的索引列。保存文件时如何避免这种情况?

I am trying to save a csv to a folder after making some edits to the file.

Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.

I tried:

pd.read_csv('C:/Path to file to edit.csv', index_col = False)

And to save the file…

pd.to_csv('C:/Path to save edited file.csv', index_col = False)

However, I still got the unwanted index column. How can I avoid this when I save my files?


回答 0

使用index=False

df.to_csv('your.csv', index=False)

Use index=False.

df.to_csv('your.csv', index=False)

回答 1

有两种方法可以处理我们不希望将索引存储在csv文件中的情况。

  1. 正如其他人所述,将 数据框保存到csv文件时可以使用index = False

    df.to_csv('file_name.csv',index=False)

  2. 或者,您可以使用索引保存数据框,在读取时只需删除未命名的包含先前索引的0列即可!简单!

    df.to_csv('file_name.csv')
    df_new = pd.read_csv('file_name.csv').drop(['Unnamed: 0'], axis=1)

There are two ways to handle the situation where we do not want the index to be stored in csv file.

  1. As others have stated you can use index=False while saving your
    dataframe to csv file.

    df.to_csv('file_name.csv',index=False)

  2. Or you can save your dataframe as it is, with an index, and when reading it back just drop the column Unnamed: 0, which contains your previous index. Simple!

    df.to_csv('file_name.csv')
    df_new = pd.read_csv('file_name.csv').drop(['Unnamed: 0'], axis=1)
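The difference between the two approaches can be checked without touching the file system. The sketch below uses a small made-up DataFrame and an in-memory buffer; calling to_csv without a path returns the CSV text as a string.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default: the index is written as an extra, unnamed first column.
with_index = df.to_csv()
# index=False: only the data columns are written.
without_index = df.to_csv(index=False)

# Reading the first form back yields a column named "Unnamed: 0".
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))

# Dropping that column recovers the original columns.
cleaned = reloaded.drop(columns=["Unnamed: 0"])
print(list(cleaned.columns))
```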


回答 2

如果不需要索引,请使用以下命令读取文件:

import pandas as pd
df = pd.read_csv('file.csv', index_col=0)

使用保存

df.to_csv('file.csv', index=False)

If you want no index, read file using:

import pandas as pd
df = pd.read_csv('file.csv', index_col=0)

save it using

df.to_csv('file.csv', index=False)

回答 3

正如其他人所说,如果您不想首先保存索引列,则可以使用 df.to_csv('processed.csv', index=False)

但是,由于您通常使用的数据本身具有某种索引,因此我们假设使用“时间戳”列,因此我将保留索引并使用该索引加载数据。

因此,要保存索引数据,请首先设置其索引,然后保存DataFrame:

df = df.set_index('timestamp')
df.to_csv('processed.csv')

之后,您可以读取带有索引的数据:

pd.read_csv('processed.csv', index_col='timestamp')

或读取数据,然后设置索引:

df = pd.read_csv('filename.csv')
df = df.set_index('column_name')

As others have stated, if you don’t want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)

However, since the data you will usually use, have some sort of index themselves, let’s say a ‘timestamp’ column, I would keep the index and load the data using it.

So, to save the indexed data, first set their index and then save the DataFrame:

df = df.set_index('timestamp')
df.to_csv('processed.csv')

Afterwards, you can either read the data with the index:

pd.read_csv('processed.csv', index_col='timestamp')

or read the data, and then set the index:

df = pd.read_csv('filename.csv')
df = df.set_index('column_name')
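As a quick check of the round trip described here, the following sketch saves an indexed frame to an in-memory buffer and reads it back. The column names and values are invented for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"timestamp": [10, 20], "value": [1.5, 2.5]})

# set_index returns a new DataFrame, so the result must be assigned.
indexed = df.set_index("timestamp")

# Because the index has a name, it is saved as a named column.
csv_text = indexed.to_csv()

# index_col restores the same index when reading the data back.
restored = pd.read_csv(io.StringIO(csv_text), index_col="timestamp")
print(restored)
```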

回答 4

如果要将此列保留为索引,则可以采用另一种解决方案。

pd.read_csv('filename.csv', index_col='Unnamed: 0')

Another solution if you want to keep this column as index.

pd.read_csv('filename.csv', index_col='Unnamed: 0')

回答 5

如果您想要一个好的格式,那么下一条语句是最好的:

dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)

在这种情况下,您将获得一个带有’,’的csv文件,该文件在各列和utf-8格式之间分开。另外,数字索引不会出现。

If you want a good format the next statement is the best:

dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)

In this case you get a csv file with ‘,’ as the separator between columns and utf-8 encoding. In addition, the numerical index won’t appear.


实际上,Python 3.3中新的“ yield from”语法的主要用途是什么?

问题:实际上,Python 3.3中新的“ yield from”语法的主要用途是什么?

我很难缠住PEP 380

  1. 在什么情况下“产生于”有用?
  2. 什么是经典用例?
  3. 为什么与微线程相比?

[更新]

现在,我了解了造成困难的原因。我曾经使用过生成器,但从未真正使用过协程(由PEP-342引入)。尽管有一些相似之处,但生成器和协程基本上是两个不同的概念。了解协程(不仅是生成器)是了解新语法的关键。

恕我直言,协程是最晦涩的Python功能,大多数书籍使它看起来毫无用处且无趣。

感谢您做出的出色回答,特别感谢agf及其与David Beazley演讲相关的评论。大卫·罗克。

I’m having a hard time wrapping my brain around PEP 380.

  1. What are the situations where “yield from” is useful?
  2. What is the classic use case?
  3. Why is it compared to micro-threads?

[ update ]

Now I understand the cause of my difficulties. I’ve used generators, but never really used coroutines (introduced by PEP-342). Despite some similarities, generators and coroutines are basically two different concepts. Understanding coroutines (not only generators) is the key to understanding the new syntax.

IMHO coroutines are the most obscure Python feature, most books make it look useless and uninteresting.

Thanks for the great answers, but special thanks to agf and his comment linking to David Beazley presentations. David rocks.


回答 0

让我们先解决一件事。该解释yield from g就等于for v in g: yield v 甚至没有开始做正义什么yield from是一回事。因为,让我们面对现实,如果所有的事情yield from都是扩大for循环,那么它就不必添加yield from语言,也不能阻止在Python 2.x中实现一堆新功能。

什么yield from所做的就是建立主叫方和副生成器之间的透明双向连接

  • 从某种意义上说,该连接是“透明的”,它也将正确地传播所有内容,而不仅仅是所生成的元素(例如,传播异常)。

  • 该连接是在意义上是“双向”的数据可以同时寄给一个生成器。

如果我们在谈论TCP,yield from g可能意味着“现在暂时断开客户端的套接字,然后将其重新连接到该其他服务器套接字”。

顺便说一句,如果您不确定向生成器发送数据意味着什么,则需要删除所有内容并首先了解协程,它们非常有用(将它们与子例程进行对比),但是不幸的是在Python中鲜为人知。戴夫·比兹利(Dave Beazley)的《协程》好奇类是一个很好的开始。阅读幻灯片24-33以获得快速入门。

使用以下命令从生成器读取数据

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

reader()我们可以手动完成,而不必手动进行迭代yield from

def reader_wrapper(g):
    yield from g

那行得通,我们消除了一行代码。意图可能会更清晰(或不太清楚)。但是生活没有改变。

使用第1部分中的收益将数据发送到生成器(协程)

现在,让我们做一些更有趣的事情。让我们创建一个名为coroutine的协程writer,它接受发送给它的数据并写入套接字,fd等。

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

现在的问题是,包装器函数应如何处理将数据发送到编写器,以便将任何发送到包装器的数据透明地发送到writer()

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>>  0
>>  1
>>  2
>>  3

包装器需要(显然)接受发送给它的数据,并且还应处理StopIterationfor循环用尽时的情况。显然只是做for x in coro: yield x不会做。这是一个有效的版本。

def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)  # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            pass

或者,我们可以这样做。

def writer_wrapper(coro):
    yield from coro

这样可以节省6行代码,使其更具可读性,并且可以正常工作。魔法!

从第2部分-异常处理将数据发送到生成器收益

让我们使其更加复杂。如果我们的作者需要处理异常怎么办?假设writer句柄a 遇到一个SpamException,它将打印***

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

如果我们不改变writer_wrapper怎么办?它行得通吗?我们试试吧

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>>  0
>>  1
>>  2
***
>>  4

# Actual Result
>>  0
>>  1
>>  2
Traceback (most recent call last):
  ... redacted ...
  File ... in writer_wrapper
    x = (yield)
__main__.SpamException

嗯,它不起作用,因为x = (yield)只是引发了异常,一切都崩溃了。让它正常工作,但手动处理异常并将其发送或将其抛出到子生成器(writer)中

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:   # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass

这可行。

# Result
>>  0
>>  1
>>  2
***
>>  4

但是,这也是!

def writer_wrapper(coro):
    yield from coro

yield from透明地处理发送值或抛出的值到副生成器。

但是,这仍然不能涵盖所有极端情况。如果外部生成器关闭,会发生什么?如果子生成器返回一个值(是的,在Python 3.3+中,生成器可以返回值),该如何处理?yield from透明地处理所有的极端案例是让人印象深刻yield from只是神奇地工作并处理了所有这些情况。

我个人认为这yield from是一个糟糕的关键字选择,因为它不会使双向性变得显而易见。提出了其他关键字(例如delegate但被拒绝了,因为向该语言添加新关键字比合并现有关键字要困难得多。

总之,最好将其yield from视为transparent two way channel调用方和子生成方之间的。

参考文献:

  1. PEP 380-委派给子生成器的语法(尤因)[v3.3,2009-02-13]
  2. PEP 342-通过增强型生成器进行协同程序(GvR,Eby)[v2.5,2005-05-10]

Let’s get one thing out of the way first. The explanation that yield from g is equivalent to for v in g: yield v does not even begin to do justice to what yield from is all about. Because, let’s face it, if all yield from does is expand the for loop, then it does not warrant adding yield from to the language and precluding a whole bunch of new features from being implemented in Python 2.x.

What yield from does is it establishes a transparent bidirectional connection between the caller and the sub-generator:

  • The connection is “transparent” in the sense that it will propagate everything correctly too, not just the elements being generated (e.g. exceptions are propagated).

  • The connection is “bidirectional” in the sense that data can be both sent from and to a generator.

(If we were talking about TCP, yield from g might mean “now temporarily disconnect my client’s socket and reconnect it to this other server socket”.)

BTW, if you are not sure what sending data to a generator even means, you need to drop everything and read about coroutines first—they’re very useful (contrast them with subroutines), but unfortunately lesser-known in Python. Dave Beazley’s Curious Course on Coroutines is an excellent start. Read slides 24-33 for a quick primer.

Reading data from a generator using yield from

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

Instead of manually iterating over reader(), we can just yield from it.

def reader_wrapper(g):
    yield from g

That works, and we eliminated one line of code. And probably the intent is a little bit clearer (or not). But nothing life changing.

Sending data to a generator (coroutine) using yield from – Part 1

Now let’s do something more interesting. Let’s create a coroutine called writer that accepts data sent to it and writes to a socket, fd, etc.

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

Now the question is, how should the wrapper function handle sending data to the writer, so that any data that is sent to the wrapper is transparently sent to the writer()?

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>>  0
>>  1
>>  2
>>  3

The wrapper needs to accept the data that is sent to it (obviously) and should also handle the StopIteration when the for loop is exhausted. Evidently just doing for x in coro: yield x won’t do. Here is a version that works.

def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)  # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            pass

Or, we could do this.

def writer_wrapper(coro):
    yield from coro

That saves 6 lines of code, makes it much more readable, and it just works. Magic!

Sending data to a generator using yield from – Part 2 – Exception handling

Let’s make it more complicated. What if our writer needs to handle exceptions? Let’s say the writer handles a SpamException and it prints *** if it encounters one.

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

What if we don’t change writer_wrapper? Does it work? Let’s try

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>>  0
>>  1
>>  2
***
>>  4

# Actual Result
>>  0
>>  1
>>  2
Traceback (most recent call last):
  ... redacted ...
  File ... in writer_wrapper
    x = (yield)
__main__.SpamException

Um, it’s not working because x = (yield) just raises the exception and everything comes to a crashing halt. Let’s make it work by manually handling exceptions and sending them or throwing them into the sub-generator (writer).

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:   # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass

This works.

# Result
>>  0
>>  1
>>  2
***
>>  4

But so does this!

def writer_wrapper(coro):
    yield from coro

The yield from transparently handles sending the values or throwing values into the sub-generator.

This still does not cover all the corner cases though. What happens if the outer generator is closed? What about the case when the sub-generator returns a value (yes, in Python 3.3+, generators can return values), how should the return value be propagated? That yield from transparently handles all the corner cases is really impressive. yield from just magically works and handles all those cases.

I personally feel yield from is a poor keyword choice because it does not make the two-way nature apparent. Other keywords were proposed (like delegate) but were rejected because adding a new keyword to the language is much more difficult than combining existing ones.

In summary, it’s best to think of yield from as a transparent two way channel between the caller and the sub-generator.
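The return-value corner case mentioned above can be seen in a small sketch. The generator names averager and delegator are made up for illustration, and using None as the "finish" signal is just a convention chosen for this example.

```python
def averager():
    """Sub-generator: collect sent numbers, return their mean at the end."""
    total = 0
    count = 0
    while True:
        value = yield
        if value is None:  # None is used here as the "finish" signal
            break
        total += value
        count += 1
    return total / count if count else None


def delegator(results):
    """Outer generator: delegate to averager and store its return value."""
    # The value of the `yield from` expression is averager's return value.
    mean = yield from averager()
    results.append(mean)


results = []
gen = delegator(results)
next(gen)        # prime: run to the first `yield` inside averager
gen.send(10)     # forwarded straight to averager
gen.send(20)
try:
    gen.send(None)   # averager returns, then delegator finishes too
except StopIteration:
    pass
print(results)
```

Without yield from, the wrapper would have to catch StopIteration itself and read the return value from the exception's value attribute.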

References:

  1. PEP 380 – Syntax for delegating to a sub-generator (Ewing) [v3.3, 2009-02-13]
  2. PEP 342 – Coroutines via Enhanced Generators (GvR, Eby) [v2.5, 2005-05-10]

回答 1

在什么情况下“产生于”是有用的?

您遇到这样的循环的每种情况:

for x in subgenerator:
  yield x

作为PEP介绍,这是一个相当幼稚企图在使用子发生器,它缺少几个方面,特别是妥善处理.throw()/ .send()/ .close()通过引进机制PEP 342。要正确执行此操作,需要相当复杂的代码。

什么是经典用例?

考虑您要从递归数据结构中提取信息。假设我们要获取树中的所有叶节点:

def traverse_tree(node):
  if not node.children:
    yield node
  for child in node.children:
    yield from traverse_tree(child)

更重要的是,直到之前yield from,还没有简单的重构生成器代码的方法。假设您有一个(无意义的)生成器,如下所示:

def get_list_values(lst):
  for item in lst:
    yield int(item)
  for item in lst:
    yield str(item)
  for item in lst:
    yield float(item)

现在,您决定将这些循环分解为单独的生成器。不带yield from,这是很丑陋的,直到您是否真的想这样做三思。使用yield from,实际上看起来很不错:

def get_list_values(lst):
  for sub in [get_list_values_as_int, 
              get_list_values_as_str, 
              get_list_values_as_float]:
    yield from sub(lst)

为什么与微线程相比?

我认为PEP中的这一部分谈论的是,每个生成器确实都有其自己的隔离执行上下文。以及使用yield和来在生成者迭代器和调用者之间切换执行的事实__next__()分别,这类似于线程,其中操作系统会不时切换执行线程以及执行上下文(堆栈,寄存器, …)。

其效果也相当:生成器迭代器和调用者都同时在其执行状态中进行,它们的执行是交错的。例如,如果生成器进行某种计算,并且调用方打印出结果,则结果可用时,您将立即看到它们。这是一种并发形式。

这种类比不是特定于的yield from-而是Python中生成器的一般属性。

What are the situations where “yield from” is useful?

Every situation where you have a loop like this:

for x in subgenerator:
  yield x

As the PEP describes, this is a rather naive attempt at using the subgenerator, it’s missing several aspects, especially the proper handling of the .throw()/.send()/.close() mechanisms introduced by PEP 342. To do this properly, rather complicated code is necessary.
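To make the missing .send() handling concrete, here is a minimal sketch comparing the naive loop with yield from. The generator names are invented for this example.

```python
def echo():
    """Sub-generator that yields back whatever was last sent to it."""
    received = None
    while True:
        received = yield received


def loop_wrapper(gen):
    # Naive delegation: values sent to the wrapper stop at this `yield`
    # and never reach `gen`.
    for value in gen:
        yield value


def delegating_wrapper(gen):
    # yield from forwards send(), throw() and close() to `gen`.
    yield from gen


naive = loop_wrapper(echo())
next(naive)                          # prime the wrapper
naive_reply = naive.send("hello")    # the sent value is lost

proper = delegating_wrapper(echo())
next(proper)                         # prime the wrapper
proper_reply = proper.send("hello")  # the sent value reaches echo

print(naive_reply, proper_reply)
```

With the for loop, the sub-generator is only ever advanced with next(), so it sees None instead of the sent value.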

What is the classic use case?

Consider that you want to extract information from a recursive data structure. Let’s say we want to get all leaf nodes in a tree:

def traverse_tree(node):
  if not node.children:
    yield node
  for child in node.children:
    yield from traverse_tree(child)

Even more important is the fact that before yield from, there was no simple method of refactoring generator code. Suppose you have a (senseless) generator like this:

def get_list_values(lst):
  for item in lst:
    yield int(item)
  for item in lst:
    yield str(item)
  for item in lst:
    yield float(item)

Now you decide to factor out these loops into separate generators. Without yield from, this is ugly, up to the point where you will think twice whether you actually want to do it. With yield from, it’s actually nice to look at:

def get_list_values(lst):
  for sub in [get_list_values_as_int, 
              get_list_values_as_str, 
              get_list_values_as_float]:
    yield from sub(lst)

Why is it compared to micro-threads?

I think what this section in the PEP is talking about is that every generator does have its own isolated execution context. Together with the fact that execution is switched between the generator-iterator and the caller using yield and __next__(), respectively, this is similar to threads, where the operating system switches the executing thread from time to time, along with the execution context (stack, registers, …).

The effect of this is also comparable: Both the generator-iterator and the caller progress in their execution state at the same time, their executions are interleaved. For example, if the generator does some kind of computation and the caller prints out the results, you’ll see the results as soon as they’re available. This is a form of concurrency.

That analogy isn’t anything specific to yield from, though – it’s rather a general property of generators in Python.
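The interleaving described above can be illustrated with a toy round-robin scheduler. This is only a sketch of the micro-thread analogy, not something taken from the PEP; the names task and run_round_robin are hypothetical.

```python
from collections import deque


def task(name, steps, log):
    """A 'micro-thread': do one step of work, then give up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # switch point: hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)           # run the task up to its next yield
        except StopIteration:
            continue                # task finished, drop it
        queue.append(current)       # otherwise schedule it again


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```

Each generator keeps its own local state between resumptions, which is the "isolated execution context" the analogy refers to; unlike real threads, the switch points are explicit.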


回答 2

无论您从生成器内部调用生成器的哪个位置,都需要一个“泵”来重新yield设置值: for v in inner_generator: yield v。正如PEP所指出的那样,大多数人都忽略了这一点的微妙复杂性。throw()PEP中提供了一个示例,例如非本地流控制。yield from inner_generator无论您for之前编写了显式循环的地方,都将使用新语法。但是,它不仅是语法糖,它还处理了for循环忽略的所有极端情况。成为“丑闻”会鼓励人们使用它,从而获得正确的行为。

讨论线程中的此消息讨论了以下复杂性:

有了PEP 342引入的其他生成器功能,情况已不再如此:如Greg的PEP中所述,简单的迭代不正确地支持send()和throw()。当分解它们时,支持send()和throw()所需的体操实际上并不那么复杂,但是它们也不是简单的。

除了观察到生成器是一种平行论之外,我无法与微线程进行比较。您可以将挂起的生成器视为通过以下方式发送值的线程:yield到使用者线程的线程。实际的实现可能并非如此(Python开发人员显然对实际的实现非常感兴趣),但这与用户无关。

新的yield from语法不会在线程方面为语言增加任何其他功能,而只是使正确使用现有功能更加容易。或更准确地说,它使专家编写的复杂内部生成器的新手消费者可以更轻松地通过该生成器,而不会破坏其任何复杂功能。

Wherever you invoke a generator from within a generator you need a “pump” to re-yield the values: for v in inner_generator: yield v. As the PEP points out there are subtle complexities to this which most people ignore. Non-local flow-control like throw() is one example given in the PEP. The new syntax yield from inner_generator is used wherever you would have written the explicit for loop before. It’s not merely syntactic sugar, though: It handles all of the corner cases that are ignored by the for loop. Being “sugary” encourages people to use it and thus get the right behaviors.

This message in the discussion thread talks about these complexities:

With the additional generator features introduced by PEP 342, that is no longer the case: as described in Greg’s PEP, simple iteration doesn’t support send() and throw() correctly. The gymnastics needed to support send() and throw() actually aren’t that complex when you break them down, but they aren’t trivial either.

I can’t speak to a comparison with micro-threads, other than to observe that generators are a type of parallelism. You can consider the suspended generator to be a thread which sends values via yield to a consumer thread. The actual implementation may be nothing like this (and the actual implementation is obviously of great interest to the Python developers) but this does not concern the users.

The new yield from syntax does not add any additional capability to the language in terms of threading, it just makes it easier to use existing features correctly. Or more precisely it makes it easier for a novice consumer of a complex inner generator written by an expert to pass through that generator without breaking any of its complex features.


回答 3

一个简短的示例将帮助您理解的一个yield from用例:从另一个生成器获取价值

def flatten(sequence):
    """flatten a multi level list or something
    >>> list(flatten([1, [2], 3]))
    [1, 2, 3]
    >>> list(flatten([1, [2], [3, [4]]]))
    [1, 2, 3, 4]
    """
    for element in sequence:
        if hasattr(element, '__iter__'):
            yield from flatten(element)
        else:
            yield element

print(list(flatten([1, [2], [3, [4]]])))

A short example will help you understand one of yield from’s use cases: getting values from another generator.

def flatten(sequence):
    """flatten a multi level list or something
    >>> list(flatten([1, [2], 3]))
    [1, 2, 3]
    >>> list(flatten([1, [2], [3, [4]]]))
    [1, 2, 3, 4]
    """
    for element in sequence:
        if hasattr(element, '__iter__'):
            yield from flatten(element)
        else:
            yield element

print(list(flatten([1, [2], [3, [4]]])))

回答 4

yield from 基本上以有效的方式链接迭代器:

# chain from itertools:
def chain(*iters):
    for it in iters:
        for item in it:
            yield item

# with the new keyword
def chain(*iters):
    for it in iters:
        yield from it

如您所见,它删除了一个纯Python循环。这几乎就是它的全部工作,但是链接迭代器是Python中很常见的模式。

线程基本上是一种功能,使您可以在完全随机的点跳出函数,然后跳回另一个函数的状态。线程管理器经常执行此操作,因此该程序似乎可以同时运行所有这些功能。问题是这些点是随机的,因此您需要使用锁定来防止主管在有问题的点停止该功能。

在这种意义上,生成器与线程非常相似:它们允许您指定特定点(无论何时, yield),您可以在其中跳入和跳出。当以这种方式使用时,生成器称为协程。

阅读有关Python中协程的出色教程,以了解更多详细信息

yield from basically chains iterators in an efficient way:

# chain from itertools:
def chain(*iters):
    for it in iters:
        for item in it:
            yield item

# with the new keyword
def chain(*iters):
    for it in iters:
        yield from it

As you can see it removes one pure Python loop. That’s pretty much all it does, but chaining iterators is a pretty common pattern in Python.
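
A quick way to convince yourself that the short version behaves like the standard library one is to compare the two. This is only a sketch: std_chain is an alias chosen here to avoid a name clash, and parts is made-up sample data.

```python
from itertools import chain as std_chain


def chain(*iters):
    # Delegate to each iterable in turn.
    for it in iters:
        yield from it


parts = ([1, 2], "ab", range(3))
assert list(chain(*parts)) == list(std_chain(*parts))
print(list(chain(*parts)))  # [1, 2, 'a', 'b', 0, 1, 2]
```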

Threads are basically a feature that allows you to jump out of functions at completely random points and jump back into the state of another function. The thread supervisor does this very often, so the program appears to run all these functions at the same time. The problem is that the points are random, so you need to use locking to prevent the supervisor from stopping the function at a problematic point.

Generators are pretty similar to threads in this sense: They allow you to specify specific points (whenever they yield) where you can jump in and out. When used this way, generators are called coroutines.

Read this excellent tutorial about coroutines in Python for more details.


回答 5

在应用的使用为异步IO协程yield from也有类似的行为作为await协程功能。两者都用于中止协程的执行。

对于Asyncio,如果不需要支持较旧的Python版本(即> 3.5),则建议使用async def/ await作为定义协程的语法。因此yield from,协程中不再需要。

但通常在asyncio之外,如先前答案中所述,yield from <sub-generator>在迭代子生成器方面还有其他用途。

When used for asynchronous IO coroutines, yield from behaves much like await in a coroutine function. Both are used to suspend the execution of a coroutine.

For asyncio, if there’s no need to support older Python versions (i.e. earlier than 3.5), async def/await is the recommended syntax for defining a coroutine. Thus yield from is no longer needed in a coroutine.

But in general, outside of asyncio, yield from <sub-generator> is still useful for iterating over a sub-generator, as mentioned in the earlier answers.
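
For reference, a minimal sketch of the async def/await style might look like this. The coroutine names fetch_value and main are hypothetical, and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio


async def fetch_value(delay, value):
    # await suspends this coroutine until the sleep completes,
    # much like yield from did in generator-based coroutines.
    await asyncio.sleep(delay)
    return value


async def main():
    # gather() returns the results in the order the awaitables were passed.
    return await asyncio.gather(fetch_value(0.01, "a"), fetch_value(0.0, "b"))


result = asyncio.run(main())
print(result)  # ['a', 'b']
```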


回答 6

该代码定义了一个函数,该函数fixed_sum_digits返回一个生成器,该生成器枚举所有六个数字的数字,以使数字的总和为20。

def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:  
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i,deepness - 1,myString + str(i),Total)

def fixed_sum_digits(digits, Tot):
    return iter_fun(0,digits,"",Tot) 

试着不用来写yield from。如果您找到有效的方法,请告诉我。

我认为对于这种情况:访问树yield from使代码更简单,更清晰。

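This code defines a function fixed_sum_digits that returns a generator enumerating all digit strings of a given length whose digits sum to a given total, for example fixed_sum_digits(6, 20) for all six-digit strings (leading zeros included) whose digits sum to 20.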

def iter_fun(sum, deepness, myString, Total):
    if deepness == 0:
        if sum == Total:
            yield myString
    else:  
        for i in range(min(10, Total - sum + 1)):
            yield from iter_fun(sum + i,deepness - 1,myString + str(i),Total)

def fixed_sum_digits(digits, Tot):
    return iter_fun(0,digits,"",Tot) 

Try to write it without yield from. If you find an effective way to do it let me know.

I think that for cases like this one: visiting trees, yield from makes the code simpler and cleaner.
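
For comparison, here is a sketch of the same recursion with and without yield from. The parameter names are renamed for readability and iter_fun_plain is a made-up name; for plain iteration the explicit loop should be equivalent, just noisier.

```python
def iter_fun(current_sum, depth, prefix, total):
    if depth == 0:
        if current_sum == total:
            yield prefix
    else:
        for digit in range(min(10, total - current_sum + 1)):
            yield from iter_fun(current_sum + digit, depth - 1,
                                prefix + str(digit), total)


def iter_fun_plain(current_sum, depth, prefix, total):
    # Same traversal, but every level re-yields its children by hand.
    if depth == 0:
        if current_sum == total:
            yield prefix
    else:
        for digit in range(min(10, total - current_sum + 1)):
            for item in iter_fun_plain(current_sum + digit, depth - 1,
                                       prefix + str(digit), total):
                yield item


print(list(iter_fun(0, 2, "", 3)))  # ['03', '12', '21', '30']
assert list(iter_fun(0, 2, "", 3)) == list(iter_fun_plain(0, 2, "", 3))
```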


回答 7

简而言之,为迭代器函数yield from提供尾递归

Simply put, yield from provides tail recursion for iterator functions.


获取Python中当前脚本的名称

问题:获取Python中当前脚本的名称

我正在尝试获取当前正在运行的Python脚本的名称。

我有一个名为的脚本foo.py,我想做这样的事情以获得脚本名称:

print Scriptname

I’m trying to get the name of the Python script that is currently running.

I have a script called foo.py and I’d like to do something like this in order to get the script name:

print Scriptname

回答 0

您可以使用__file__获取当前文件的名称。在主模块中使用时,这是最初调用的脚本的名称。

如果要省略目录部分(可能存在),可以使用os.path.basename(__file__)

You can use __file__ to get the name of the current file. When used in the main module, this is the name of the script that was originally invoked.

If you want to omit the directory part (which might be present), you can use os.path.basename(__file__).


回答 1

import sys
print(sys.argv[0])

这将打印foo.pypython foo.pydir/foo.pypython dir/foo.py等,这是第一个参数python。(请注意,在py2exe之后将会是foo.exe。)

import sys
print(sys.argv[0])

This will print foo.py for python foo.py, dir/foo.py for python dir/foo.py, etc. It’s the first argument to python. (Note that after py2exe it would be foo.exe.)


回答 2

为了完整起见,我认为值得总结各种可能的结果,并为每种结果的确切行为提供参考:

  • __file__是当前正在执行的文件,如官方文档中所述

    __file__是从中加载模块的文件的路径名(如果它是从文件加载的)。所述__file__属性可以是缺少某些类型的模块,如Ç静态链接到解释器模块; 对于从共享库动态加载的扩展模块,它是共享库文件的路径名。

    从Python3.4起,每发行18416__file__始终是一个绝对路径,除非当前正在执行的文件是已经被直接执行(不通过与解释脚本-m使用相对路径命令行选项)。

  • __main__.__file__(需要import __main__)仅访问主模块的上述__file__属性,例如,从命令行调用的脚本的属性。

  • sys.argv[0](需要import sys)是从命令行调用的脚本名称,并且可能是绝对路径,如官方文档中所述

    argv[0]是脚本名称(是否为完整路径名取决于操作系统)。如果命令是使用-c解释器的命令行选项执行的,argv[0]则将其设置为字符串'-c'。如果没有脚本名称传递给Python解释器,argv[0]则为空字符串。

    正如提到的另一个回答这个问题Python的是被通过的工具,如转换成独立的可执行程序的脚本py2exePyInstaller可能不会显示预期的结果使用这种方法的时候(也就是sys.argv[0]将持有的可执行文件的名称,而不是名称该可执行文件中主要Python文件的名称)。

  • 如果上述选项似乎都不起作用,可能是由于不规则的导入操作造成的,那么检查模块可能会证明是有用的。特别是,在调用inspect.getfile(...)inspect.currentframe()可以工作,尽管后者将返回None没有实现运行时的Python堆栈帧。


处理符号链接

如果当前脚本是符号链接,则以上所有内容都将返回符号链接的路径,而不是真实文件的路径,因此os.path.realpath(...)应调用它们以提取后者。


提取实际文件名的进一步操作

os.path.basename(...)可以在上述任何方法上调用以便提取实际的文件名,os.path.splitext(...)也可以在实际的文件名上调用以便截断其后缀,如中所示os.path.splitext(os.path.basename(...))

Python的3.4起,每PEP 428中,PurePath的的pathlib模块可以用作以及任何上述的。具体来说,pathlib.PurePath(...).name提取实际文件名并pathlib.PurePath(...).stem提取不带后缀的实际文件名。

For completeness’ sake, I thought it would be worthwhile summarizing the various possible outcomes and supplying references for the exact behaviour of each:

  • __file__ is the currently executing file, as detailed in the official documentation:

    __file__ is the pathname of the file from which the module was loaded, if it was loaded from a file. The __file__ attribute may be missing for certain types of modules, such as C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.

    From Python 3.4 onwards, per issue 18416, __file__ is always an absolute path, unless the currently executing file is a script that has been executed directly (not via the interpreter with the -m command line option) using a relative path.

  • __main__.__file__ (requires importing __main__) simply accesses the aforementioned __file__ attribute of the main module, e.g. of the script that was invoked from the command line.

  • sys.argv[0] (requires importing sys) is the script name that was invoked from the command line, and might be an absolute path, as detailed in the official documentation:

    argv[0] is the script name (it is operating system dependent whether this is a full pathname or not). If the command was executed using the -c command line option to the interpreter, argv[0] is set to the string '-c'. If no script name was passed to the Python interpreter, argv[0] is the empty string.

    As mentioned in another answer to this question, Python scripts that were converted into stand-alone executable programs via tools such as py2exe or PyInstaller might not display the desired result when using this approach (i.e. sys.argv[0] would hold the name of the executable rather than the name of the main Python file within that executable).

  • If none of the aforementioned options seem to work, probably due to an irregular import operation, the inspect module might prove useful. In particular, invoking inspect.getfile(...) on inspect.currentframe() could work, although the latter would return None when running in an implementation without Python stack frame support.


Handling symbolic links

If the current script is a symbolic link, then all of the above would return the path of the symbolic link rather than the path of the real file and os.path.realpath(...) should be invoked in order to extract the latter.


Further manipulations that extract the actual file name

os.path.basename(...) may be invoked on any of the above in order to extract the actual file name and os.path.splitext(...) may be invoked on the actual file name in order to truncate its suffix, as in os.path.splitext(os.path.basename(...)).

From Python 3.4 onwards, per PEP 428, the PurePath class of the pathlib module may be used as well on any of the above. Specifically, pathlib.PurePath(...).name extracts the actual file name and pathlib.PurePath(...).stem extracts the actual file name without its suffix.
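
The manipulations above can be sketched on a fixed path. The value of script_path is hypothetical (in a real script you would pass __file__ or sys.argv[0]), and PurePosixPath is used instead of PurePath only so the example behaves the same on every platform.

```python
import os.path
from pathlib import PurePosixPath

# Hypothetical path; in a real script use __file__ or sys.argv[0] instead.
script_path = "some/dir/foo.py"

file_name = os.path.basename(script_path)   # strip the directory part
stem, suffix = os.path.splitext(file_name)  # split off the extension

pure = PurePosixPath(script_path)
print(file_name, stem, suffix)  # foo.py foo .py
print(pure.name, pure.stem)     # foo.py foo
```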


回答 3

注意 __file__将提供此代码所在的文件,该文件可以导入,并且与要解释的主文件不同。要获取主文件,可以使用特殊的__main__模块:

import __main__ as main
print(main.__file__)

注意 __main__.__file__在Python 2.7中有效,但在3.2中无效,因此请使用上述import-as语法使其具有可移植性。

Note that __file__ will give the file where this code resides, which can be imported and different from the main file being interpreted. To get the main file, the special __main__ module can be used:

import __main__ as main
print(main.__file__)

Note that __main__.__file__ works in Python 2.7 but not in 3.2, so use the import-as syntax as above to make it portable.
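
A slightly more defensive variant, as a sketch: the main module has no __file__ attribute in an interactive session, so getattr with a default avoids an AttributeError. The variable name main_file is made up for this example.

```python
import __main__

# __file__ is missing when there is no main script (e.g. an interactive session).
main_file = getattr(__main__, "__file__", None)

if main_file is None:
    print("no main script file available")
else:
    print(main_file)
```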


回答 4

上述答案是好的。但是我发现使用上面的结果这种方法更有效。
这导致实际的脚本文件名不是路径。

import sys    
import os    
file_name =  os.path.basename(sys.argv[0])

The answers above are good, but I found this approach, which builds on them, more convenient.
It gives the actual script file name rather than a path.

import sys    
import os    
file_name =  os.path.basename(sys.argv[0])

回答 5

对于现代Python版本(3.4+),Path(__file__).name应该更加惯用。另外,Path(__file__).stem为您提供不带.py扩展名的脚本名称。

For modern Python versions (3.4+), Path(__file__).name should be more idiomatic. Also, Path(__file__).stem gives you the script name without the .py extension.


回答 6

尝试这个:

print(__file__)

Try this:

print(__file__)

回答 7

注意:如果您使用的是Python 3+,则应改用print()函数

假设文件名为foo.py,则以下代码段

import sys
print sys.argv[0][:-3]

要么

import sys
print sys.argv[0][::-1][3:][::-1]

至于具有更多字符的其他扩展名,例如文件名 foo.pypy

import sys
print sys.argv[0].split('.')[0]

如果要从绝对路径中提取

import sys
print sys.argv[0].split('/')[-1].split('.')[0]

将输出 foo

Note: If you are using Python 3+, then you should use the print() function instead

Assuming that the filename is foo.py, the below snippet

import sys
print sys.argv[0][:-3]

or

import sys
print sys.argv[0][::-1][3:][::-1]

As for other extensions with more characters, for example the filename foo.pypy

import sys
print sys.argv[0].split('.')[0]

If you want to extract from an absolute path

import sys
print sys.argv[0].split('/')[-1].split('.')[0]

will output foo


回答 8

sys中的第一个参数将是当前文件名,因此它将起作用

   import sys
   print(sys.argv[0])  # will print the file name

The first element of sys.argv is the current script name, so this will work:

   import sys
   print(sys.argv[0])  # will print the file name

回答 9

如果您执行的是异常导入(例如,这是一个选项文件),请尝试:

import inspect
print (inspect.getfile(inspect.currentframe()))

请注意,这将返回文件的绝对路径。

If you’re doing an unusual import (e.g., it’s an options file), try:

import inspect
print (inspect.getfile(inspect.currentframe()))

Note that this will return the absolute path to the file.


回答 10

我们可以尝试使用此命令来获取当前脚本名称(不带扩展名)。

import os

script_name = os.path.splitext(os.path.basename(__file__))[0]

We can try this to get the current script name without its extension.

import os

script_name = os.path.splitext(os.path.basename(__file__))[0]

回答 11

由于OP要求提供当前脚本文件的名称,所以我希望

import os
import sys

os.path.split(sys.argv[0])[1]

Since the OP asked for the name of the current script file I would prefer

import os
import sys

os.path.split(sys.argv[0])[1]

回答 12

我快速的肮脏解决方案:

__file__.split('/')[-1:][0]

My fast dirty solution:

__file__.split('/')[-1:][0]

回答 13

os.path.abspath(__file__)将为您提供一条绝对路径(也relpath()可用)。

sys.argv[-1] 会给你一个相对的路径。

os.path.abspath(__file__) will give you an absolute path (relpath() available as well).

sys.argv[0] will give you the path as it was typed on the command line, which is often a relative path.


回答 14

所有这些答案都很不错,但是有一些问题,您乍一看可能看不到。

让我们定义我们想要的-我们想要执行的脚本的名称,而不是当前模块的名称-因此,__file__只有在已执行的脚本中使用了它,而不是在导入的模块中使用它时,它才起作用。 sys.argv也是可疑的-如果您的程序被pytest调用了怎么办?还是pydocRunner?还是被uwsgi调用?

-还有第三种获取脚本名称的方法,我在答案中没有看到-您可以检查堆栈。

另一个问题是,您(或某些其他程序)可以篡改sys.argv并且__main__.__file__-它可能存在,但可能不存在。它可能有效或无效。至少您可以检查脚本(所需结果)是否存在!

我在github上的库bitranox / lib_programname确实做到了:

  • 检查是否__main__存在
  • 检查是否__main__.__file__存在
  • 确实给 __main__.__file__有效结果(该脚本是否存在?)
  • 如果不是,请检查sys.argv:
  • sys.argv中是否有pytest,docrunner等?->如果是,请忽略
  • 我们可以在这里得到有效的结果吗?
  • 如果不是:检查堆栈并从那里获取结果
  • 如果堆栈也未给出有效结果,则抛出异常。

通过这种方式,我的解决方案正在到目前为止有setup.py testuwsgipytestpycharm pytestpycharm docrunner (doctest)dreampieeclipse

Dough Hellman也有一篇关于该问题的不错的博客文章,“用Python确定进程的名称”。

All of these answers are great, but they have some problems you might not see at first glance.

Let’s define what we want: the name of the script that was executed, not the name of the current module. So __file__ will only work if it is used in the executed script, not in an imported module. sys.argv is also questionable: what if your program was called by pytest? Or by the pydoc runner? Or by uwsgi?

There is also a third method of getting the script name that I haven’t seen in the answers: you can inspect the stack.

Another problem is that you (or some other program) can tamper with sys.argv and __main__.__file__. It might be present or not; it might be valid or not. At least you can check whether the script (the desired result) exists!

My library bitranox/lib_programname on GitHub does exactly that:

  • check whether __main__ is present
  • check whether __main__.__file__ is present
  • does __main__.__file__ give a valid result (does that script exist?)
  • if not: check sys.argv:
  • is there pytest, docrunner, etc. in sys.argv? If so, ignore it
  • can we get a valid result here?
  • if not: inspect the stack and try to get the result from there
  • if the stack does not give a valid result either, raise an exception.

This way, my solution works so far with setup.py test, uwsgi, pytest, pycharm pytest, pycharm docrunner (doctest), dreampie, and eclipse.

There is also a nice blog article about this problem by Doug Hellmann, “Determining the Name of a Process from Python”.
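
To make the fallback order concrete, here is a simplified sketch of that kind of check chain. It is not the library's actual code: guess_script_path is a made-up name, and the step that filters out test runners such as pytest is left out entirely.

```python
import inspect
import os
import sys


def guess_script_path():
    """Best-effort guess at the executed script; illustration only."""
    # 1. Does __main__ exist, and does its __file__ point at a real file?
    main_module = sys.modules.get("__main__")
    candidate = getattr(main_module, "__file__", None)
    if candidate and os.path.isfile(candidate):
        return os.path.abspath(candidate)

    # 2. Otherwise fall back to sys.argv[0], if it names a real file.
    argv = getattr(sys, "argv", [])
    if argv and argv[0] and os.path.isfile(argv[0]):
        return os.path.abspath(argv[0])

    # 3. Otherwise walk the call stack from the outermost frame inwards.
    for frame_info in reversed(inspect.stack()):
        if os.path.isfile(frame_info.filename):
            return os.path.abspath(frame_info.filename)

    raise RuntimeError("could not determine the script path")
```

Each step only accepts a candidate that exists on disk, which is the validity check the answer argues for.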


回答 15

从Python 3.5开始,您可以简单地执行以下操作:

from pathlib import Path
Path(__file__).stem

在此处查看更多信息:https : //docs.python.org/3.5/library/pathlib.html#pathlib.PurePath.stem

例如,我的用户目录下有一个文件,test.py里面是这个文件:

from pathlib import Path

print(Path(__file__).stem)
print(__file__)

运行此输出:

$ python3.6 test.py
test
test.py

As of Python 3.5 you can simply do:

from pathlib import Path
Path(__file__).stem

See more here: https://docs.python.org/3.5/library/pathlib.html#pathlib.PurePath.stem

For example, I have a file under my user directory named test.py with this inside:

from pathlib import Path

print(Path(__file__).stem)
print(__file__)

running this outputs:

$ python3.6 test.py
test
test.py

如何获取Python函数的源代码?

问题:如何获取Python函数的源代码?

假设我有如下定义的Python函数:

def foo(arg1,arg2):
    #do something with args
    a = arg1 + arg2
    return a

我可以使用获取函数的名称foo.func_name。如上所述,我如何以编程方式获取其源代码?

Suppose I have a Python function as defined below:

def foo(arg1,arg2):
    #do something with args
    a = arg1 + arg2
    return a

I can get the name of the function using foo.func_name. How can I programmatically get its source code, as I typed above?


回答 0

如果该功能来自文件系统上可用的源文件,那么inspect.getsource(foo)可能会有帮助:

如果foo定义为:

def foo(arg1,arg2):         
    #do something with args 
    a = arg1 + arg2         
    return a  

然后:

import inspect
lines = inspect.getsource(foo)
print(lines)

返回值:

def foo(arg1,arg2):         
    #do something with args 
    a = arg1 + arg2         
    return a                

但是我相信,如果函数是从字符串,流中编译的,或者是从编译文件中导入的,那么您将无法检索其源代码。

If the function is from a source file available on the filesystem, then inspect.getsource(foo) might be of help:

If foo is defined as:

def foo(arg1,arg2):         
    #do something with args 
    a = arg1 + arg2         
    return a  

Then:

import inspect
lines = inspect.getsource(foo)
print(lines)

Returns:

def foo(arg1,arg2):         
    #do something with args 
    a = arg1 + arg2         
    return a                

But I believe that if the function is compiled from a string, stream or imported from a compiled file, then you cannot retrieve its source code.
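
If you need to cope with both cases, one option is a small wrapper like the sketch below. The names source_or_none and built_from_string are made up here; the wrapper relies on inspect.getsource raising OSError when the source cannot be found and TypeError for built-in objects.

```python
import inspect


def source_or_none(obj):
    """Return the source text of obj, or None when it is not retrievable."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        return None


# A function compiled from a string has no source file behind it.
namespace = {}
exec("def built_from_string(a, b):\n    return a + b\n", namespace)
dynamic_func = namespace["built_from_string"]

print(dynamic_func(1, 2))            # 3
print(source_or_none(dynamic_func))  # usually None, since there is no file to read
print(source_or_none(len))           # None, built-ins raise TypeError
```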


回答 1

检查模块具有用于从Python对象中检索的源代码的方法。貌似它仅在源位于文件中时才起作用。如果有的话,我想您就不需要从对象中获取源代码。

The inspect module has methods for retrieving source code from python objects. Seemingly it only works if the source is located in a file though. If you had that I guess you wouldn’t need to get the source from the object.


回答 2

dis 如果源代码不可用,您是您的朋友吗:

>>> import dis
>>> def foo(arg1,arg2):
...     #do something with args
...     a = arg1 + arg2
...     return a
...
>>> dis.dis(foo)
  3           0 LOAD_FAST                0 (arg1)
              3 LOAD_FAST                1 (arg2)
              6 BINARY_ADD
              7 STORE_FAST               2 (a)

  4          10 LOAD_FAST                2 (a)
             13 RETURN_VALUE

dis is your friend if the source code is not available:

>>> import dis
>>> def foo(arg1,arg2):
...     #do something with args
...     a = arg1 + arg2
...     return a
...
>>> dis.dis(foo)
  3           0 LOAD_FAST                0 (arg1)
              3 LOAD_FAST                1 (arg2)
              6 BINARY_ADD
              7 STORE_FAST               2 (a)

  4          10 LOAD_FAST                2 (a)
             13 RETURN_VALUE

回答 3

如果使用的是IPython,则需要输入“ foo ??”

In [19]: foo??
Signature: foo(arg1, arg2)
Source:
def foo(arg1,arg2):
    #do something with args
    a = arg1 + arg2
    return a

File:      ~/Desktop/<ipython-input-18-3174e3126506>
Type:      function

If you are using IPython, then you need to type “foo??”

In [19]: foo??
Signature: foo(arg1, arg2)
Source:
def foo(arg1,arg2):
    #do something with args
    a = arg1 + arg2
    return a

File:      ~/Desktop/<ipython-input-18-3174e3126506>
Type:      function

回答 4

虽然我通常会认为这inspect是一个很好的答案,但我不同意您无法获得解释器中定义的对象的源代码。如果使用dill.source.getsourcefrom dill,即使它们是交互式定义的,也可以获取函数和lambda的来源。它也可以从咖喱中定义的绑定或未绑定类方法和函数中获取代码……但是,如果没有封闭对象的代码,您可能无法编译该代码。

>>> from dill.source import getsource
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> squared = lambda x:x**2
>>> 
>>> print getsource(add)
def add(x,y):
  return x+y

>>> print getsource(squared)
squared = lambda x:x**2

>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x*x+x
... 
>>> f = Foo()
>>> 
>>> print getsource(f.bar)
def bar(self, x):
    return x*x+x

>>> 

While I’d generally agree that inspect is a good answer, I’d disagree that you can’t get the source code of objects defined in the interpreter. If you use dill.source.getsource from dill, you can get the source of functions and lambdas, even if they are defined interactively. It can also get the code for bound or unbound class methods and functions defined in curries… however, you might not be able to compile that code without the enclosing object’s code.

>>> from dill.source import getsource
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> squared = lambda x:x**2
>>> 
>>> print getsource(add)
def add(x,y):
  return x+y

>>> print getsource(squared)
squared = lambda x:x**2

>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x*x+x
... 
>>> f = Foo()
>>> 
>>> print getsource(f.bar)
def bar(self, x):
    return x*x+x

>>> 

回答 5

扩展runeh的答案:

>>> def foo(a):
...    x = 2
...    return x + a

>>> import inspect

>>> inspect.getsource(foo)
u'def foo(a):\n    x = 2\n    return x + a\n'

>>> print inspect.getsource(foo)
def foo(a):
   x = 2
   return x + a

编辑:正如@ 0sh所指出的,此示例使用ipython但不是plain可以工作python。但是,从源文件导入代码时,两者都应该很好。

To expand on runeh’s answer:

>>> def foo(a):
...    x = 2
...    return x + a

>>> import inspect

>>> inspect.getsource(foo)
u'def foo(a):\n    x = 2\n    return x + a\n'

>>> print inspect.getsource(foo)
def foo(a):
   x = 2
   return x + a

EDIT: As pointed out by @0sh this example works using ipython but not plain python. It should be fine in both, however, when importing code from source files.


回答 6

您可以使用inspect模块来获取完整的源代码。你必须使用getsource()方法为从inspect模块。例如:

import inspect

def get_my_code():
    x = "abcd"
    return x

print(inspect.getsource(get_my_code))

您可以在下面的链接中查看更多选项。 检索您的python代码

You can use the inspect module to get the full source code: call its getsource() function. For example:

import inspect

def get_my_code():
    x = "abcd"
    return x

print(inspect.getsource(get_my_code))

You can check out more options at the link below: retrieve your python code


回答 7

由于此帖子被标记为与其他帖子重复,因此我在这里针对“ lambda”案例回答,尽管OP与lambda无关。

因此,对于未在自己的行中定义的lambda函数:除了marko.ristin的答案,您可能希望使用mini-lambda此答案中建议的使用SymPy

  • mini-lambda 更轻巧,支持任何类型的操作,但仅适用于单个变量
  • SymPy较重,但配备了数学/微积分运算。特别是它可以简化您的表达。它还在同一表达式中支持多个变量。

您可以使用以下方法进行操作mini-lambda

from mini_lambda import x, is_mini_lambda_expr
import inspect

def get_source_code_str(f):
    if is_mini_lambda_expr(f):
        return f.to_string()
    else:
        return inspect.getsource(f)

# test it

def foo(arg1, arg2):
    # do something with args
    a = arg1 + arg2
    return a

print(get_source_code_str(foo))
print(get_source_code_str(x ** 2))

它正确产生

def foo(arg1, arg2):
    # do something with args
    a = arg1 + arg2
    return a

x ** 2

有关详细信息,请参见mini-lambda 文档。我是作者;)

Since this post is marked as the duplicate of this other post, I answer here for the “lambda” case, although the OP is not about lambdas.

So, for lambda functions that are not defined in their own lines: in addition to marko.ristin‘s answer, you may wish to use mini-lambda or use SymPy as suggested in this answer.

  • mini-lambda is lighter and supports any kind of operation, but works only for a single variable
  • SymPy is heavier but much more equipped with mathematical/calculus operations. In particular it can simplify your expressions. It also supports several variables in the same expression.

Here is how you can do it using mini-lambda:

from mini_lambda import x, is_mini_lambda_expr
import inspect

def get_source_code_str(f):
    if is_mini_lambda_expr(f):
        return f.to_string()
    else:
        return inspect.getsource(f)

# test it

def foo(arg1, arg2):
    # do something with args
    a = arg1 + arg2
    return a

print(get_source_code_str(foo))
print(get_source_code_str(x ** 2))

It correctly yields

def foo(arg1, arg2):
    # do something with args
    a = arg1 + arg2
    return a

x ** 2

See mini-lambda documentation for details. I’m the author by the way ;)


回答 8

请注意,只有在单独的行上给出lambda时,可接受的答案才有效。如果将其作为参数传递给函数,并希望将lambda的代码作为对象进行检索,则问题将变得有些棘手,因为这inspect将为您提供整行内容。

例如,考虑一个文件test.py

import inspect

def main():
    x, f = 3, lambda a: a + 1
    print(inspect.getsource(f))

if __name__ == "__main__":
    main()

执行它会给你(注意缩进!):

    x, f = 3, lambda a: a + 1

我认为,要检索lambda的源代码,最好的办法是重新解析整个源文件(使用f.__code__.co_filename),并通过行号及其上下文匹配lambda AST节点。

我们必须在按合同设计的库icontract中做到这一点,因为我们必须解析作为装饰器参数传入的lambda函数。在此处粘贴太多代码,因此请看一下此函数的实现

Please mind that the accepted answers work only if the lambda is given on a separate line. If you pass it in as an argument to a function and would like to retrieve the code of the lambda as object, the problem gets a bit tricky since inspect will give you the whole line.

For example, consider a file test.py:

import inspect

def main():
    x, f = 3, lambda a: a + 1
    print(inspect.getsource(f))

if __name__ == "__main__":
    main()

Executing it gives you (mind the indentation!):

    x, f = 3, lambda a: a + 1

To retrieve the source code of the lambda, your best bet, in my opinion, is to re-parse the whole source file (by using f.__code__.co_filename) and match the lambda AST node by the line number and its context.

We had to do precisely that in our design-by-contract library icontract since we had to parse the lambda functions we pass in as arguments to decorators. It is too much code to paste here, so have a look at the implementation of this function.
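
As a rough idea of the re-parsing approach (this is not icontract's implementation), the sketch below parses a source string and pulls out just the lambda. It assumes Python 3.8+ for ast.get_source_segment, and it works on an inline string; in a real case you would read the file named by f.__code__.co_filename and match the node against f.__code__.co_firstlineno.

```python
import ast

# Stand-in for the line inspect.getsource() returned in the example above.
source = "x, f = 3, lambda a: a + 1\n"

tree = ast.parse(source)
lambda_nodes = [node for node in ast.walk(tree) if isinstance(node, ast.Lambda)]

# get_source_segment() slices the text using the node's start and end positions.
segments = [ast.get_source_segment(source, node) for node in lambda_nodes]
print(segments)  # ['lambda a: a + 1']
```

If several lambdas share a line, you still need extra context (such as argument names) to pick the right node.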


回答 9

如果您要严格定义函数,并且定义相对简短,那么没有依赖性的解决方案是在字符串中定义函数并将表达式的eval()分配给函数。

例如

funcstring = 'lambda x: x> 5'
func = eval(funcstring)

然后可以选择将原始代码附加到该函数:

func.source = funcstring

If you’re strictly defining the function yourself and it’s a relatively short definition, a solution without dependencies would be to define the function in a string and assign the eval() of the expression to your function.

E.g.

funcstring = 'lambda x: x> 5'
func = eval(funcstring)

then optionally to attach the original code to the function:

func.source = funcstring

回答 10

总结一下:

import inspect
print( "".join(inspect.getsourcelines(foo)[0]))

To summarize:

import inspect
print( "".join(inspect.getsourcelines(foo)[0]))

回答 11

相信变量名称不会存储在pyc / pyd / pyo文件中,因此,如果没有源文件,则无法检索确切的代码行。

I believe that variable names aren’t stored in pyc/pyd/pyo files, so you can not retrieve the exact code lines if you don’t have source files.


导入语句是否应该始终位于模块的顶部?

问题:导入语句是否应该始终位于模块的顶部?

PEP 08指出:

导入总是放在文件的顶部,紧随任何模块注释和文档字符串之后,以及模块全局变量和常量之前。

但是,如果仅在极少数情况下使用我要导入的类/方法/函数,那么在需要时进行导入肯定会更有效吗?

这不是吗?

class SomeClass(object):

    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()

比这更有效?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self):
        self.datetime = datetime.now()

PEP 8 states:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?

Isn’t this:

class SomeClass(object):

    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()

more efficient than this?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self):
        self.datetime = datetime.now()

回答 0

模块导入非常快,但不是即时的。这意味着:

  • 将导入放在模块顶部很好,因为这是微不足道的成本,只需要支付一次即可。
  • 将导入放在函数中会导致对该函数的调用花费更长时间。

因此,如果您关心效率,则将进口放在首位。仅在您的分析显示有帮助的情况下,才将它们移入函数中(您进行了概要分析以查看最能改善性能的地方,对吗?)


我见过执行延迟导入的最佳原因是:

  • 可选的库支持。如果您的代码具有使用不同库的多个路径,则在未安装可选库的情况下不要中断。
  • __init__.py插件的中,可能已导入但并未实际使用。例如Bazaar插件,它使用bzrlib的延迟加载框架。

Module importing is quite fast, but not instant. This means that:

  • Putting the imports at the top of the module is fine, because it’s a trivial cost that’s only paid once.
  • Putting the imports within a function will cause calls to that function to take longer.

So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)


The best reasons I’ve seen to perform lazy imports are:

  • Optional library support. If your code has multiple paths that use different libraries, don’t break if an optional library is not installed.
  • In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib‘s lazy-loading framework.
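
The optional-library case usually looks something like the sketch below. It assumes ujson as the optional dependency purely for illustration, with the standard json module as the fallback; json_impl and dumps are names chosen here.

```python
try:
    # Optional, faster third-party encoder; may not be installed.
    import ujson as json_impl
except ImportError:
    # Fall back to the standard library so the module still imports.
    import json as json_impl


def dumps(obj):
    """Serialize obj with whichever JSON implementation is available."""
    return json_impl.dumps(obj)


print(json_impl.loads(dumps({"a": 1})))  # {'a': 1}
```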

回答 1

将import语句放在函数内部可以防止循环依赖。例如,如果您有两个模块X.py和Y.py,并且它们都需要互相导入,那么当您导入其中一个模块导致无限循环时,这将导致循环依赖。如果将import语句移动到一个模块中,则它将在调用该函数之前不会尝试导入另一个模块,并且该模块将已经被导入,因此不会出现无限循环。在此处阅读更多内容-effbot.org/zone/import-confusion.htm

Putting the import statement inside of a function can prevent circular dependencies. For example, if you have two modules, X.py and Y.py, and they both need to import each other, importing one of them creates a circular dependency: one module ends up running against a partially initialized copy of the other, which typically fails with an ImportError or AttributeError. If you move the import statement into a function in one of the modules, it won’t try to import the other module until the function is called, by which time that module will already be imported, so the cycle is harmless. Read more here: effbot.org/zone/import-confusion.htm
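
Here is a small self-contained sketch of that arrangement. The module names circ_demo_x and circ_demo_y are made up, and the two files are written to a temporary directory only so the example can run on its own.

```python
import importlib
import os
import sys
import tempfile

# circ_demo_x imports circ_demo_y at the top, as usual.
X_SOURCE = (
    "import circ_demo_y\n"
    "\n"
    "def fx():\n"
    "    return 'x'\n"
)
# circ_demo_y needs circ_demo_x too, but defers the import into the function.
Y_SOURCE = (
    "def fy():\n"
    "    # Deferred import: runs only when fy() is called.\n"
    "    from circ_demo_x import fx\n"
    "    return fx() + 'y'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    for name, source in (("circ_demo_x", X_SOURCE), ("circ_demo_y", Y_SOURCE)):
        with open(os.path.join(tmp, name + ".py"), "w") as handle:
            handle.write(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import circ_demo_x
        import circ_demo_y
        result = circ_demo_y.fy()
    finally:
        sys.path.remove(tmp)

print(result)  # xy
```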


回答 2

我采用了将所有导入放入使用它们的函数中而不是放在模块顶部的做法。

我得到的好处是能够更可靠地进行重构。当我将一个功能从一个模块移动到另一个模块时,我知道该功能将继续使用其完整的测试遗留功能。如果我在模块的顶部放置了导入文件,那么当我移动一个函数时,我发现我花了很多时间来使新模块的导入文件完整而最少。重构IDE可能与此无关。

如其他地方提到的那样,存在速度损失。我已经在我的应用程序中对此进行了测量,发现对于我的目的而言它并不重要。

能够预先查看所有模块依赖性而无需借助搜索(例如grep),也很不错。但是,我关心模块依赖性的原因通常是因为我正在安装,重构或移动包含多个文件的整个系统,而不仅仅是一个模块。在这种情况下,无论如何,我将执行全局搜索以确保我具有系统级依赖项。因此,我还没有发现全局导入可以帮助我在实践中理解系统。

我通常将检查的内容sys放入if __name__=='__main__'检查中,然后将参数(如sys.argv[1:])传递给main()函数。这使我可以mainsys尚未导入的上下文中使用。

I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.

The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module’s imports complete and minimal. A refactoring IDE might make this irrelevant.

There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.

It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I’m installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I’m going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.

I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.


回答 3

在大多数情况下,这样做对于保持清晰性和明智性很有用,但并非总是如此。以下是几个可能会在其他地方导入模块的情况的示例。

首先,您可以拥有一个带有以下形式的单元测试的模块:

if __name__ == '__main__':
    import foo
    aa = foo.xyz()         # initiate something for the test

其次,您可能需要在运行时有条件地导入一些不同的模块。

if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]

在其他情况下,您可能会将导入放置在代码的其他部分中。

Most of the time this would be useful for clarity and sensible to do but it’s not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.

Firstly, you could have a module with a unit test of the form:

if __name__ == '__main__':
    import foo
    aa = foo.xyz()         # initiate something for the test

Secondly, you might have a requirement to conditionally import some different module at runtime.

if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]

There are probably other situations where you might place imports in other parts in the code.


回答 4

当函数被调用为零或一次时,第一种变体的确比第二种变体更有效。但是,在第二次及其后的调用中,“导入每个调用”方法实际上效率较低。请参阅此链接以获取延迟加载技术,该技术通过执行“延迟导入”结合了两种方法的优点。

但是,除了效率之外,还有其他原因导致您可能会偏爱一个。一种方法是使阅读该模块相关代码的人更加清楚。它们还具有非常不同的故障特征-如果没有“ datetime”模块,第一个将在加载时失败,而第二个在调用该方法之前不会失败。

补充说明:在IronPython中,导入可能比CPython中昂贵得多,因为代码基本上是在导入时进行编译的。

The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the “import every call” approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a “lazy import”.

But there are reasons other than efficiency why you might prefer one over the other. One approach makes the module’s dependencies much clearer to someone reading the code. They also have very different failure characteristics: the second will fail at load time if there’s no “datetime” module, while the first won’t fail until the method is called.

Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it’s being imported.
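
If you want to measure the per-call cost yourself, a sketch along these lines should do. The function names are hypothetical and no timings are quoted here, since they depend on the machine and interpreter.

```python
import sys
import timeit
from datetime import datetime


def top_level_import():
    # Uses the name bound once at module import time.
    return datetime.now()


def import_every_call():
    # The import statement is re-executed on each call; after the first time
    # it is a lookup in sys.modules rather than a full module load.
    from datetime import datetime as local_datetime
    return local_datetime.now()


print("datetime" in sys.modules)  # True
print("top-level:", timeit.timeit(top_level_import, number=100_000))
print("per-call :", timeit.timeit(import_every_call, number=100_000))
```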


回答 5

Curt提出了一个很好的观点:第二个版本更清晰,它将在加载时而不是以后失败,并且出乎意料地失败。

通常,我不必担心模块的加载效率,因为它的速度(a)非常快,而(b)大多仅在启动时发生。

如果必须在意外的时刻加载重量级模块,则可以通过该__import__函数动态加载它们,并确保捕获ImportError异常并以合理的方式处理它们,这可能更有意义。

Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.

Normally I don’t worry about the efficiency of loading modules, since it’s (a) pretty fast, and (b) mostly only happens at startup.

If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
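
As a rough sketch of that last suggestion: the example below uses importlib.import_module, which the Python documentation recommends over calling __import__ directly. The helper name load_optional and the policy of returning None on failure are assumptions made for illustration.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Handle the failure however suits the program;
        # returning None is just one assumed policy.
        return None


math_module = load_optional("math")
missing = load_optional("no_such_module_for_this_example")
```

The caller can then check for None and degrade gracefully instead of crashing at an unexpected moment.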


Answer 6

I wouldn’t worry about the efficiency of loading the module up front too much. The memory taken up by the module won’t be very big (assuming it’s modular enough) and the startup cost will be negligible.

In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.

One good reason to import a module elsewhere in the code is if it’s used in a debugging statement.

For example:

do_something_with_x(x)

I could debug this with:

from pprint import pprint
pprint(x)
do_something_with_x(x)

Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don’t have any choice.



Answer 7

It’s a tradeoff, that only the programmer can decide to make.

Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import ‘only when called’ also means doing it ‘every time when called’, so each call after the first one is still incurring the additional overhead of doing the import.

Case 2 saves some execution time and latency by importing datetime beforehand, so that not_often_called() returns more quickly when it is called and no call incurs the overhead of an import.

Besides efficiency, it’s easier to see module dependencies up front if the import statements are … up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.

Personally, I generally follow the PEP, except for things like unit tests that I don’t want always loaded because I know they aren’t going to be used outside of test code.


Answer 8

Here’s an example where all the imports are at the very top (this is the only time I’ve needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.

import os
# ...
try:
    kill = os.kill  # will raise AttributeError on Windows
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function

(On review: what John Millikin said.)


Answer 9

This is like many other optimizations – you sacrifice some readability for speed. As John mentioned, if you’ve done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It’d probably be good to put a note up with all the other imports:

from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below

Answer 10

Module initialization only occurs once – on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency of a slew of other standard library modules. The import statement then costs very little, since the module initialization has already happened. All it does at this point is bind the existing module object to the local scope.

Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
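
A small sketch of that behaviour, assuming a helper named import_twice that exists only for this illustration: a repeated import returns the module object already stored in sys.modules instead of initializing the module again.

```python
import importlib
import sys


def import_twice(name):
    """Import a module twice and report whether both results are one object."""
    first = importlib.import_module(name)
    second = importlib.import_module(name)  # served from the sys.modules cache
    return first is second, sys.modules[name] is first


same_object, cached = import_twice("datetime")
```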


Answer 11

Just to complete Moe’s answer and the original question:

When we have to deal with circular dependencies we can do some “tricks”. Assume we’re working with modules a.py and b.py that contain x() and y(), respectively. Then:

  1. We can move one of the from imports to the bottom of the module.
  2. We can move one of the from imports inside the function or method that actually requires the import (this isn’t always possible, as you may use it from several places).
  3. We can change one of the two from imports into a plain import, such as: import a

So, to conclude: if you aren’t dealing with circular dependencies and resorting to some kind of trick to avoid them, then it’s better to put all your imports at the top, for the reasons already explained in other answers to this question. And please, when using these “tricks”, include a comment; it’s always welcome! :)
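
The second trick can be sketched as follows. This is only an illustration under stated assumptions: the module names circ_demo_a and circ_demo_b are hypothetical stand-ins for a.py and b.py, and they are written to a temporary directory so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module sources, named to avoid clashing with real modules.
MODULE_A = """\
import circ_demo_b          # a top-level import is fine in one direction

def x():
    return "x"

def call_y():
    return circ_demo_b.y()
"""

MODULE_B = """\
def y():
    return "y"

def call_x():
    from circ_demo_a import x   # deferred import breaks the cycle
    return x()
"""


def demo_circular_import():
    """Write both modules to a temporary directory, import them, call across."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "circ_demo_a.py").write_text(MODULE_A)
        Path(tmp, "circ_demo_b.py").write_text(MODULE_B)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            a = importlib.import_module("circ_demo_a")
            b = importlib.import_module("circ_demo_b")
            return b.call_x(), a.call_y()
        finally:
            sys.path.remove(tmp)
```

If circ_demo_b instead ran its from import at module level, importing circ_demo_a first would fail, because x would not yet be defined when circ_demo_b asked for it.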


Answer 12

In addition to the excellent answers already given, it’s worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.

This issue often comes up in Apache Spark’s Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It’s best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.


Answer 13

I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.

If you import at the top, you take the load hit no matter what. That’s pretty small, but commonly in the milliseconds, not nanoseconds.

If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn’t happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).

Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don’t know why).

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

The code:

from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print("    %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print("    %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
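
The script above targets Python 2 (xrange, byte-string µ). For readers on Python 3, a comparable measurement might look like the sketch below; the helper name time_calls and the use of time.perf_counter are choices made for this illustration, not part of the original test, and the numbers will differ by machine and Python version.

```python
from time import perf_counter


def foo():
    # Same imports as the original test; only the first call really loads them.
    import collections
    import re
    import string
    import math
    import subprocess


def time_calls(func, repeats=5):
    """Return the duration of each call to func, in microseconds."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        durations.append((perf_counter() - start) * 1e6)
    return durations


for i, micros in enumerate(time_calls(foo)):
    print(f"{i:2d} foo: {micros:12.4f} µs")
```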

Answer 14

I do not aspire to provide a complete answer, because others have already done this very well. I just want to mention one use case where I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During startup, the application walks through all the modules in that location and imports them, then looks inside each module and registers any mounting points for plugins that it finds (in my case, a subclass of a certain base class with a unique ID). The number of plugins is large (dozens now, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third-party libraries at the top of my plugin modules carried a noticeable penalty during application startup. Some third-party libraries are especially heavy to import (e.g. importing plotly even tries to connect to the internet and download something, which was adding about one second to startup). By optimizing the imports in the plugins (performing them only in the functions where they are used) I managed to shrink the startup from 10 seconds to about 2 seconds. That is a big difference for my users.

So my answer is no, do not always put the imports at the top of your modules.


Answer 15

It’s interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.


Answer 16

There can be a performance gain from binding imported names to local variables inside a function. This depends on how the imported thing is used inside the function. If you are looping many times and accessing a module-level global object, copying it into a local name can help.

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

def callme():
  x=X
  y=Y
  z=Z
  ladd=add
  for i in range(100000000):
    ladd(i)
    x+y+z

callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

Timing both scripts on Linux shows a small gain:

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py 
    0:17.80 real,   17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py 
    0:14.23 real,   14.22 user, 0.01 sys

real is wall clock. user is time in program. sys is time for system calls.

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
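
The same comparison can be tried in a single file with timeit. This is only a sketch: the function names use_globals, use_locals and compare are made up for the illustration, the loop is much shorter than in the scripts above, and the size of any gain depends on the interpreter version and machine.

```python
import timeit

X, Y, Z = 10, 11, 12


def use_globals(n):
    """Look up X, Y and Z as module globals on every iteration."""
    total = 0
    for _ in range(n):
        total += X + Y + Z
    return total


def use_locals(n):
    """Copy the globals into local names once, then loop."""
    x, y, z = X, Y, Z
    total = 0
    for _ in range(n):
        total += x + y + z
    return total


def compare(n=100_000, repeat=3):
    """Return the best time in seconds for each variant."""
    global_time = min(timeit.repeat(lambda: use_globals(n), number=1, repeat=repeat))
    local_time = min(timeit.repeat(lambda: use_locals(n), number=1, repeat=repeat))
    return global_time, local_time


global_time, local_time = compare()
print(f"globals: {global_time:.6f} s, locals: {local_time:.6f} s")
```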


Answer 17

Readability

In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:

listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET

xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])

If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.

Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.

Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.


Answer 18

I would like to mention a use case of mine, very similar to those mentioned by @John Millikin and @V.K.:

Optional Imports

I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. On some occasions I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn’t set up or is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.

This way, I could do “restart-and-run-all” without having to wait for the import, or having to resume the rest of the cells when it fails.


Answer 19

This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.

I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren’t at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody’s take on this.

I have a long C++ background and now use Python/Cython. My take on this is: why not put the imports in the function, unless profiling shows it to be a bottleneck? It’s much like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on, and change the odd file here and there when I’m passing through and have the time.


Import multiple csv files into pandas and concatenate into one DataFrame

Question: Import multiple csv files into pandas and concatenate into one DataFrame

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:

import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)

I guess I need some help within the for loop???


Answer 0

If all your csv files have the same columns, you can try the code below. I have added header=0 so that the first row of each csv is used as the column names.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

Answer 1

An alternative to darindaCoder’s answer:

import glob
import os
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'                     # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one

Answer 2

import glob, os
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))

Answer 3

The Dask library can read a dataframe from multiple files:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')

(Source: http://dask.pydata.org/en/latest/examples/dataframe-csv.html)

The Dask dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute() to convert the dataframe into a Pandas dataframe.


Answer 4

Almost all of the answers here are either unnecessarily complex (glob pattern matching) or rely on additional 3rd party libraries. You can do this in 2 lines using everything Pandas and python (all versions) already have built in.

For a few files – 1 liner:

df = pd.concat(map(pd.read_csv, ['data/d1.csv', 'data/d2.csv','data/d3.csv']))

For many files:

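from os import listdir
from os.path import join

# listdir() returns bare file names, so join them with the directory
filepaths = [join("./data", f) for f in listdir("./data") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, filepaths))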

The pandas line that sets df relies on three things:

  1. Python’s map(function, iterable) passes each element of the iterable (our list of file paths) to the function (pd.read_csv()).
  2. pandas’ read_csv() function reads in each CSV file as normal.
  3. pandas’ concat() brings all of these together under one df variable.

Answer 5

Edit: I googled my way into https://stackoverflow.com/a/21232849/186078. However, of late I have found it faster to do the manipulation in numpy and assign the result to a dataframe once, rather than manipulating the dataframe itself iteratively, and that seems to work in this solution too.

I do sincerely want anyone hitting this page to consider this approach, but I didn’t want to attach this huge piece of code as a comment and make it less readable.

You can leverage numpy to really speed up the dataframe concatenation.

import os
import glob
import pandas as pd
import numpy as np

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path,"*.csv"))


np_array_list = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    np_array_list.append(df.values)  # df.as_matrix() was removed in pandas 1.0; .values gives the same NumPy array

comb_np_array = np.vstack(np_array_list)
big_frame = pd.DataFrame(comb_np_array)

big_frame.columns = df.columns  # reuse the names from the last file read, or assign your own list of column names

Timing stats:

total files :192
avg lines per file :8492
--approach 1 without numpy -- 8.248656988143921 seconds ---
total records old :1630571
--approach 2 with numpy -- 2.289292573928833 seconds ---

Answer 6

If you want to search recursively (Python 3.5 or above), you can do the following:

from glob import iglob
import pandas as pd

path = r'C:\user\your\path\**\*.csv'

all_rec = iglob(path, recursive=True)     
dataframes = (pd.read_csv(f) for f in all_rec)
big_dataframe = pd.concat(dataframes, ignore_index=True)

Note that the three last lines can be expressed in one single line:

df = pd.concat((pd.read_csv(f) for f in iglob(path, recursive=True)), ignore_index=True)

You can find the documentation of ** in the documentation of the glob module. Also, I used iglob instead of glob, as it returns an iterator instead of a list.



EDIT: Multiplatform recursive function:

You can wrap the above into a multiplatform function (Linux, Windows, Mac), so you can do:

df = read_df_rec(r'C:\user\your\path', '*.csv')

Here is the function:

from glob import iglob
from os.path import join
import pandas as pd

def read_df_rec(path, fn_regex=r'*.csv'):
    return pd.concat((pd.read_csv(f) for f in iglob(
        join(path, '**', fn_regex), recursive=True)), ignore_index=True)

Answer 7

Easy and Fast

Import two or more CSVs without having to make a list of names.

import glob
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))

Answer 8

A one-liner using map; if you’d like to specify additional args, you could do:

import pandas as pd
import glob
import functools

df = pd.concat(map(functools.partial(pd.read_csv, sep='|', compression=None), 
                    glob.glob("data/*.csv")))

Note: map by itself does not let you supply additional args.
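
To see the partial-plus-map pattern in isolation, here is a sketch that runs without pandas. The function parse_csv is a made-up stand-in for pd.read_csv, built on the standard csv module, and the sample texts are invented.

```python
import csv
import functools
import io


def parse_csv(text, sep=","):
    """Parse CSV text into a list of rows (a stand-in for pd.read_csv)."""
    return list(csv.reader(io.StringIO(text), delimiter=sep))


texts = ["a|b\n1|2\n", "a|b\n3|4\n"]

# partial() fixes sep='|', leaving a one-argument callable that map() can use.
parse_pipe = functools.partial(parse_csv, sep="|")
tables = list(map(parse_pipe, texts))
```

A lambda or a generator expression would work equally well; partial simply keeps the call in the form that map expects.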


Answer 9

If the multiple csv files are zipped, you may use zipfile to read all and concatenate as below:

import zipfile
import numpy as np
import pandas as pd

ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')

train=[]

for f in range(0,len(ziptrain.namelist())):
    if (f == 0):
        train = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
    else:
        my_df = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
        train = (pd.DataFrame(np.concatenate((train,my_df),axis=0), 
                          columns=list(my_df.columns.values)))

Answer 10

Another one-liner, using a list comprehension, which allows passing arguments to read_csv.

df = pd.concat([pd.read_csv(f'dir/{f}') for f in os.listdir('dir') if f.endswith('.csv')])

Answer 11

Based on @Sid’s good answer.

Before concatenating, you can load csv files into an intermediate dictionary which gives access to each data set based on the file name (in the form dict_of_df['filename.csv']). Such a dictionary can help you identify issues with heterogeneous data formats, when column names are not aligned for example.

Import modules and locate file paths:

import os
import glob
import pandas
from collections import OrderedDict
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

Note: OrderedDict is not necessary, but it’ll keep the order of files which might be useful for analysis.

Load csv files into a dictionary. Then concatenate:

dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)

Keys are file names f and values are the data frame content of csv files. Instead of using f as a dictionary key, you can also use os.path.basename(f) or other os.path methods to reduce the size of the key in the dictionary to only the smaller part that is relevant.
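
What the dictionary keys turn into after concatenation can be seen in a small sketch. It assumes pandas is installed, and it uses in-memory text with invented file names in place of real files.

```python
import io

import pandas as pd

# In-memory stand-ins for two CSV files; the names are hypothetical.
sources = {
    "first.csv": "a,b\n1,2\n3,4\n",
    "second.csv": "a,b\n5,6\n",
}

dict_of_df = {name: pd.read_csv(io.StringIO(text)) for name, text in sources.items()}

# The dictionary keys become the outer level of the resulting index.
combined = pd.concat(dict_of_df, names=["source", "row"])

# Turn the file name into an ordinary column for filtering or grouping.
flat = combined.reset_index(level="source")
```

Every row then carries the name of the file it came from, which helps when tracking down a file whose columns do not line up with the others.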


Answer 12

Alternative using the pathlib library (often preferred over os.path).

This method avoids calling pandas concat()/append() repeatedly in a loop.

From the pandas documentation:
It is worth noting that concat() (and therefore append()) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.

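import pandas as pd
from pathlib import Path

# Named data_dir rather than dir to avoid shadowing the built-in dir()
data_dir = Path("../relevant_directory")

# Read the files lazily, then concatenate them in a single call
frames = (pd.read_csv(f) for f in data_dir.glob("*.csv"))
df = pd.concat(frames)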
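
A minimal variation on the same idea, assuming you also want a deterministic file order, a fresh row index, and a record of which file each row came from. The temporary directory, the file names jan.csv and feb.csv, and the source column are illustrative assumptions, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)  # stand-in for the real data directory
    (data_dir / "jan.csv").write_text("id,value\n1,10\n2,20\n")
    (data_dir / "feb.csv").write_text("id,value\n3,30\n")

    # sorted() fixes the file order; glob itself does not guarantee one.
    frames = [
        pd.read_csv(f).assign(source=f.name)  # tag rows with their file name
        for f in sorted(data_dir.glob("*.csv"))
    ]

# One concat call over the whole list; ignore_index renumbers the rows from 0.
df = pd.concat(frames, ignore_index=True)
```

Building the list first and concatenating once keeps the number of full data copies at one, which is the point the pandas documentation quoted above is making.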

Answer 13

This is how you can do it in Colab with files stored on Google Drive:

import pandas as pd
import glob

path = r'/content/drive/My Drive/data/actual/comments_only' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True, sort=True)
frame.to_csv('/content/drive/onefile.csv')

Answer 14

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
file_path_list = glob.glob(path + "/*.csv")

file_iter = iter(file_path_list)

list_df_csv = []
# Read the first file separately (raises StopIteration if no file matches)
list_df_csv.append(pd.read_csv(next(file_iter)))

# Read the remaining files; header=0 (the default) takes column names from the first row
for file in file_iter:
    list_df_csv.append(pd.read_csv(file, header=0))
df = pd.concat(list_df_csv, ignore_index=True)