分类目录归档:知识问答

椭圆对象有什么作用?

问题:椭圆对象有什么作用?

在闲置地浏览命名空间时,我注意到一个看起来很奇怪的对象Ellipsis,它似乎并没有做任何特别的事情,但它是一个全局可用的内置对象。

经过搜索,我发现Numpy和Scipy在切片语法的某些晦涩变体中使用了它……但是几乎没有其他东西。

是否将此对象添加到专门支持Numpy + Scipy的语言中?Ellipsis是否有任何一般意义或用途?

D:\workspace\numpy>python
Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> Ellipsis
Ellipsis

While idly surfing the namespace I noticed an odd looking object called Ellipsis, it does not seem to be or do anything special, but it’s a globally available builtin.

After a search I found that it is used in some obscure variant of the slicing syntax by Numpy and Scipy… but almost nothing else.

Was this object added to the language specifically to support Numpy + Scipy? Does Ellipsis have any generic meaning or use at all?

D:\workspace\numpy>python
Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> Ellipsis
Ellipsis

回答 0

这是最近另一个问题。我将从那里详细说明我的答案

省略号是可以以切片表示法显示的对象。例如:

myList[1:2, ..., 0]

它的解释完全取决于实现该__getitem__函数并在其中看到Ellipsis对象的任何事物,但是其主要(和预期的)用途是在numpy第三方库中,该库添加了多维数组类型。由于存在多个维度,因此切片变得比仅起始索引和终止索引更为复杂;能够在多个维度上切片也很有用。例如,给定一个4×4数组,左上方区域将由slice定义[:2,:2]

>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

>>> a[:2,:2]  # top left
array([[1, 2],
       [5, 6]])

进一步扩展,此处省略号用于表示未指定的其余数组尺寸的占位符。可以认为它表示所[:]放置间隙中所有维度的完整切片,因此对于3d数组,与4d a[...,0]相同,同样是(但中间有许多冒号组成了完整数数组中的尺寸)。a[:,:,0]a[:,:,:,0]a[0,...,0]a[0,:,:,0]

有趣的是,在python3中,省略号文字(...)在slice语法之外可用,因此您实际上可以编写:

>>> ...
Ellipsis

除了各种数字类型,不,我认为它没有被使用。据我所知,它纯粹是为numpy使用而添加的,除了提供对象和相应的语法外,没有其他核心支持。那里的对象不需要这个,但是对切片的字面量“ …”支持确实需要。

This came up in another question recently. I’ll elaborate on my answer from there:

Ellipsis is an object that can appear in slice notation. For example:

myList[1:2, ..., 0]

Its interpretation is purely up to whatever implements the __getitem__ function and sees Ellipsis objects there, but its main (and intended) use is in the numpy third-party library, which adds a multidimensional array type. Since there are more than one dimensions, slicing becomes more complex than just a start and stop index; it is useful to be able to slice in multiple dimensions as well. E.g., given a 4×4 array, the top left area would be defined by the slice [:2,:2]:

>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

>>> a[:2,:2]  # top left
array([[1, 2],
       [5, 6]])

Extending this further, Ellipsis is used here to indicate a placeholder for the rest of the array dimensions not specified. Think of it as indicating the full slice [:] for all the dimensions in the gap it is placed, so for a 3d array, a[...,0] is the same as a[:,:,0] and for 4d, a[:,:,:,0], similarly, a[0,...,0] is a[0,:,:,0] (with however many colons in the middle make up the full number of dimensions in the array).

Interestingly, in python3, the Ellipsis literal (...) is usable outside the slice syntax, so you can actually write:

>>> ...
Ellipsis

Other than the various numeric types, no, I don’t think it’s used. As far as I’m aware, it was added purely for numpy use and has no core support other than providing the object and corresponding syntax. The object being there didn’t require this, but the literal “…” support for slices did.


回答 1

在Python 3中,您可以将Ellipsis文字...用作代码的“ nop”占位符:

def will_do_something():
    ...

不是魔术。可以使用任何表达式代替...,例如:

def will_do_something():
    1

(不能使用“批准”一词,但我可以说Guido 并未完全拒绝这种使用。)

In Python 3, you can use the Ellipsis literal ... as a “nop” placeholder for code:

def will_do_something():
    ...

This is not magic; any expression can be used instead of ..., e.g.:

def will_do_something():
    1

(Can’t use the word “sanctioned”, but I can say that this use was not outrightly rejected by Guido.)


回答 2

从Python 3.5和PEP484开始,使用键入模块时,文字省略号用于表示静态类型检查器的某些类型。

范例1:

任意长度的均值元组可以使用一种类型和省略号表示,例如 Tuple[int, ...]

范例2:

通过用文字省略号(三个点)代替参数列表,可以在不指定调用签名的情况下声明可调用对象的返回类型:

def partial(func: Callable[..., str], *args) -> Callable[..., str]:
    # Body

As of Python 3.5 and PEP484, the literal ellipsis is used to denote certain types to a static type checker when using the typing module.

Example 1:

Arbitrary-length homogeneous tuples can be expressed using one type and ellipsis, for example Tuple[int, ...]

Example 2:

It is possible to declare the return type of a callable without specifying the call signature by substituting a literal ellipsis (three dots) for the list of arguments:

def partial(func: Callable[..., str], *args) -> Callable[..., str]:
    # Body

回答 3

在指定预期的doctest输出时,也可以使用省略号:

class MyClass(object):
    """Example of a doctest Ellipsis

    >>> thing = MyClass()
    >>> # Match <class '__main__.MyClass'> and <class '%(module).MyClass'>
    >>> type(thing)           # doctest:+ELLIPSIS
    <class '....MyClass'>
    """
    pass

You can also use the Ellipsis when specifying expected doctest output:

class MyClass(object):
    """Example of a doctest Ellipsis

    >>> thing = MyClass()
    >>> # Match <class '__main__.MyClass'> and <class '%(module).MyClass'>
    >>> type(thing)           # doctest:+ELLIPSIS
    <class '....MyClass'>
    """
    pass

回答 4

Python文档中

切片通常使用此对象(请参阅切片)。它不支持任何特殊操作。恰好有一个省略号对象,名为Ellipsis(内置名称)。type(Ellipsis)()产生Ellipsis单例。

它写为Ellipsis...

From the Python documentation:

This object is commonly used by slicing (see Slicings). It supports no special operations. There is exactly one ellipsis object, named Ellipsis (a built-in name). type(Ellipsis)() produces the Ellipsis singleton.

It is written as Ellipsis or ....


回答 5

总结其他人所说的内容,从Python 3开始,Ellipsis本质上是与相似的另一个单例常量None,但没有特定的预期用途。现有用途包括:

  • 以切片语法表示剩余维度中的完整切片
  • 在类型提示中仅指示类型(Callable[..., int]Tuple[str, ...])的一部分
  • 在类型存根文件中以指示存在默认值而不指定它

可能的用途包括:

  • 作为None有效选项的地方的默认值
  • 作为尚未实现的功能的内容

Summing up what others have said, as of Python 3, Ellipsis is essentially another singleton constant similar to None, but without a particular intended use. Existing uses include:

  • In slice syntax to represent the full slice in remaining dimensions
  • In type hinting to indicate only part of a type(Callable[..., int] or Tuple[str, ...])
  • In type stub files to indicate there is a default value without specifying it

Possible uses could include:

  • As a default value for places where None is a valid option
  • As the content for a function you haven’t implemented yet

回答 6

__getitem__...自定义类中的最小示例

在自定义类中...传递魔术语法时[],将__getitem__()接收一个Ellipsis类对象。

然后,该类可以使用此Singleton对象执行任何所需的操作。

例:

class C(object):
    def __getitem__(self, k):
        return k

# Single argument is passed directly.
assert C()[0] == 0

# Multiple indices generate a tuple.
assert C()[0, 1] == (0, 1)

# Slice notation generates a slice object.
assert C()[1:2:3] == slice(1, 2, 3)

# Ellipsis notation generates the Ellipsis class object.
# Ellipsis is a singleton, so we can compare with `is`.
assert C()[...] is Ellipsis

# Everything mixed up.
assert C()[1, 2:3:4, ..., 6] == (1, slice(2,3,4), Ellipsis, 6)

Python内置list类选择给它提供范围的语义,当然,对它的任何合理使用也应如此。

就个人而言,我只是在我的API中远离它,而是创建一个单独的,更明确的方法。

已在Python 3.5.2和2.7.12中测试。

__getitem__ minimal ... example in a custom class

When the magic syntax ... gets passed to [] in a custom class, __getitem__() receives a Ellipsis class object.

The class can then do whatever it wants with this Singleton object.

Example:

class C(object):
    def __getitem__(self, k):
        return k

# Single argument is passed directly.
assert C()[0] == 0

# Multiple indices generate a tuple.
assert C()[0, 1] == (0, 1)

# Slice notation generates a slice object.
assert C()[1:2:3] == slice(1, 2, 3)

# Ellipsis notation generates the Ellipsis class object.
# Ellipsis is a singleton, so we can compare with `is`.
assert C()[...] is Ellipsis

# Everything mixed up.
assert C()[1, 2:3:4, ..., 6] == (1, slice(2,3,4), Ellipsis, 6)

The Python built-in list class chooses to give it the semantic of a range, and any sane usage of it should too of course.

Personally, I’d just stay away from it in my APIs, and create a separate, more explicit method instead.

Tested in Python 3.5.2 and 2.7.12.


回答 7

您可以在numpy这样的自定义切片情况下自己使用Ellipsis,但在任何内置类中都没有用。

我不知道是否专门为numpy添加了它,但我当然没有看到它在其他地方使用。

另请参阅:如何在Python中使用省略号切片语法?

You can use Ellipsis yourself, in custom slicing situations like numpy has done, but it has no usage in any builtin class.

I don’t know if it was added specifically for use in numpy, but I certainly haven’t seen it used elsewhere.

See also: How do you use the ellipsis slicing syntax in Python?


回答 8

正如@noɥʇʎԀʎzɐɹƆ和@phoenix提到的-您确实可以在存根文件中使用它。例如

class Foo:
    bar: Any = ...
    def __init__(self, name: str=...) -> None: ...

有关如何使用此省略号的更多信息和示例,请参见https://www.python.org/dev/peps/pep-0484/#stub-files

As mentioned by @noɥʇʎԀʎzɐɹƆ and @phoenix – You can indeed use it in stub files. e.g.

class Foo:
    bar: Any = ...
    def __init__(self, name: str=...) -> None: ...

More information and examples of how to use this ellipsis can be discovered here https://www.python.org/dev/peps/pep-0484/#stub-files


回答 9

它的预期用途不仅应用于这些第三方模块。在Python文档中没有适当提及(或者也许我找不到),但是CPython实际上至少在一个地方使用省略号...

它用于表示Python中的无限数据结构。我在玩列表时遇到了这个符号。

有关更多信息,请参见此问题

Its intended use shouldn’t be only for these 3rd party modules. It isn’t mentioned properly in the Python documentation (or maybe I just couldn’t find that) but the ellipsis ... is actually used in CPython in at least one place.

It is used for representing infinite data structures in Python. I came upon this notation while playing around with lists.

See this question for more info.


回答 10

这是等效的。

l=[..., 1,2,3]
l=[Ellipsis, 1,2,3]

...是在内部定义的常量built-in constants

省略

与省略号“ …”相同。特殊值,通常与用户定义的容器数据类型的扩展切片语法结合使用。

This is equivalent.

l=[..., 1,2,3]
l=[Ellipsis, 1,2,3]

... is a constant defined inside built-in constants.

Ellipsis

The same as the ellipsis literal “…”. Special value used mostly in conjunction with extended slicing syntax for user-defined container data types.


如何在Flask中提供静态文件

问题:如何在Flask中提供静态文件

所以这很尴尬。我有一个集成在一起的应用程序,Flask现在它只提供一个静态HTML页面,其中包含指向CSS和JS的链接。而且我找不到文档中Flask描述返回静态文件的位置。是的,我可以使用,render_template但是我知道数据没有模板化。我还以为send_file或者url_for是正确的事情,但我不能让这些工作。同时,我正在打开文件,阅读内容,并装配Response具有适当mimetype的:

import os.path

from flask import Flask, Response


app = Flask(__name__)
app.config.from_object(__name__)


def root_dir():  # pragma: no cover
    return os.path.abspath(os.path.dirname(__file__))


def get_file(filename):  # pragma: no cover
    try:
        src = os.path.join(root_dir(), filename)
        # Figure out how flask returns static files
        # Tried:
        # - render_template
        # - send_file
        # This should not be so non-obvious
        return open(src).read()
    except IOError as exc:
        return str(exc)


@app.route('/', methods=['GET'])
def metrics():  # pragma: no cover
    content = get_file('jenkins_analytics.html')
    return Response(content, mimetype="text/html")


@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def get_resource(path):  # pragma: no cover
    mimetypes = {
        ".css": "text/css",
        ".html": "text/html",
        ".js": "application/javascript",
    }
    complete_path = os.path.join(root_dir(), path)
    ext = os.path.splitext(path)[1]
    mimetype = mimetypes.get(ext, "text/html")
    content = get_file(complete_path)
    return Response(content, mimetype=mimetype)


if __name__ == '__main__':  # pragma: no cover
    app.run(port=80)

有人要为此提供代码示例或网址吗?我知道这将变得简单。

So this is embarrassing. I’ve got an application that I threw together in Flask and for now it is just serving up a single static HTML page with some links to CSS and JS. And I can’t find where in the documentation Flask describes returning static files. Yes, I could use render_template but I know the data is not templatized. I’d have thought send_file or url_for was the right thing, but I could not get those to work. In the meantime, I am opening the files, reading content, and rigging up a Response with appropriate mimetype:

import os.path

from flask import Flask, Response


app = Flask(__name__)
app.config.from_object(__name__)


def root_dir():  # pragma: no cover
    return os.path.abspath(os.path.dirname(__file__))


def get_file(filename):  # pragma: no cover
    try:
        src = os.path.join(root_dir(), filename)
        # Figure out how flask returns static files
        # Tried:
        # - render_template
        # - send_file
        # This should not be so non-obvious
        return open(src).read()
    except IOError as exc:
        return str(exc)


@app.route('/', methods=['GET'])
def metrics():  # pragma: no cover
    content = get_file('jenkins_analytics.html')
    return Response(content, mimetype="text/html")


@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def get_resource(path):  # pragma: no cover
    mimetypes = {
        ".css": "text/css",
        ".html": "text/html",
        ".js": "application/javascript",
    }
    complete_path = os.path.join(root_dir(), path)
    ext = os.path.splitext(path)[1]
    mimetype = mimetypes.get(ext, "text/html")
    content = get_file(complete_path)
    return Response(content, mimetype=mimetype)


if __name__ == '__main__':  # pragma: no cover
    app.run(port=80)

Someone want to give a code sample or url for this? I know this is going to be dead simple.


回答 0

首选方法是使用nginx或其他Web服务器提供静态文件。他们将比Flask更有效率。

但是,您可以使用send_from_directory从目录发送文件,这在某些情况下非常方便:

from flask import Flask, request, send_from_directory

# set the project root directory as the static folder, you can set others.
app = Flask(__name__, static_url_path='')

@app.route('/js/<path:path>')
def send_js(path):
    return send_from_directory('js', path)

if __name__ == "__main__":
    app.run()

千万不能使用send_filesend_static_file与用户提供的路径。

send_static_file 例:

from flask import Flask, request
# set the project root directory as the static folder, you can set others.
app = Flask(__name__, static_url_path='')

@app.route('/')
def root():
    return app.send_static_file('index.html')

The preferred method is to use nginx or another web server to serve static files; they’ll be able to do it more efficiently than Flask.

However, you can use send_from_directory to send files from a directory, which can be pretty convenient in some situations:

from flask import Flask, request, send_from_directory

# set the project root directory as the static folder, you can set others.
app = Flask(__name__, static_url_path='')

@app.route('/js/<path:path>')
def send_js(path):
    return send_from_directory('js', path)

if __name__ == "__main__":
    app.run()

Do not use send_file or send_static_file with a user-supplied path.

send_static_file example:

from flask import Flask, request
# set the project root directory as the static folder, you can set others.
app = Flask(__name__, static_url_path='')

@app.route('/')
def root():
    return app.send_static_file('index.html')

回答 1

如果只想移动静态文件的位置,则最简单的方法是在构造函数中声明路径。在下面的示例中,我将模板和静态文件移动到了名为的子文件夹中web

app = Flask(__name__,
            static_url_path='', 
            static_folder='web/static',
            template_folder='web/templates')
  • static_url_path=''从URL中删除任何前面的路径(即默认值/static)。
  • static_folder='web/static'将在文件夹中找到的所有文件 web/static作为静态文件提供。
  • template_folder='web/templates' 同样,这会更改模板文件夹。

使用此方法,以下URL将返回CSS文件:

<link rel="stylesheet" type="text/css" href="/css/bootstrap.min.css">

最后,这是文件夹结构的快照,flask_server.pyFlask实例在哪里:

嵌套的静态烧瓶文件夹

If you just want to move the location of your static files, then the simplest method is to declare the paths in the constructor. In the example below, I have moved my templates and static files into a sub-folder called web.

app = Flask(__name__,
            static_url_path='', 
            static_folder='web/static',
            template_folder='web/templates')
  • static_url_path='' removes any preceding path from the URL (i.e. the default /static).
  • static_folder='web/static' to serve any files found in the folder web/static as static files.
  • template_folder='web/templates' similarly, this changes the templates folder.

Using this method, the following URL will return a CSS file:

<link rel="stylesheet" type="text/css" href="/css/bootstrap.min.css">

And finally, here’s a snap of the folder structure, where flask_server.py is the Flask instance:

Nested Static Flask Folders


回答 2

您也可以(这是我的最爱)将文件夹设置为静态路径,以便每个人都可以访问其中的文件。

app = Flask(__name__, static_url_path='/static')

通过该设置,您可以使用标准HTML:

<link rel="stylesheet" type="text/css" href="/static/style.css">

You can also, and this is my favorite, set a folder as static path so that the files inside are reachable for everyone.

app = Flask(__name__, static_url_path='/static')

With that set you can use the standard HTML:

<link rel="stylesheet" type="text/css" href="/static/style.css">

回答 3

我确定您会在这里找到所需的资源:http : //flask.pocoo.org/docs/quickstart/#static-files

基本上,您只需要在软件包根目录中添加一个“静态”文件夹,然后就可以url_for('static', filename='foo.bar')通过http://example.com/static/foo.bar使用或直接链接到您的文件。

编辑:如评论中所建议,您可以直接使用'/static/foo.bar'URL路径, url_for()开销(在性能方面)非常低,并且使用它意味着您以后可以轻松自定义行为(更改文件夹,更改URL路径,将您的静态文件移至S3等)。

I’m sure you’ll find what you need there: http://flask.pocoo.org/docs/quickstart/#static-files

Basically you just need a “static” folder at the root of your package, and then you can use url_for('static', filename='foo.bar') or directly link to your files with http://example.com/static/foo.bar.

EDIT: As suggested in the comments you could directly use the '/static/foo.bar' URL path BUT url_for() overhead (performance wise) is quite low, and using it means that you’ll be able to easily customise the behaviour afterwards (change the folder, change the URL path, move your static files to S3, etc).


回答 4

您可以使用此功能:

send_static_file(filename)
内部使用的功能,用于将静态文件从静态文件夹发送到浏览器。

app = Flask(__name__)
@app.route('/<path:path>')
def static_file(path):
    return app.send_static_file(path)

You can use this function :

send_static_file(filename)
Function used internally to send static files from the static folder to the browser.

app = Flask(__name__)
@app.route('/<path:path>')
def static_file(path):
    return app.send_static_file(path)

回答 5

我使用的(并且运行良好)是“模板”目录和“静态”目录。我将所有.html文件/ Flask模板放在模板目录中,而static包含CSS / JS。据我所知,无论您使用Flask的模板语法的程度如何,render_template都能很好地适用于通用html文件。以下是我的views.py文件中的示例调用。

@app.route('/projects')
def projects():
    return render_template("projects.html", title = 'Projects')

只要确实要在单独的静态目录中引用某些静态文件时确保使用url_for()即可。无论如何,您可能最终都会在HTML的CSS / JS文件链接中执行此操作。例如…

<script src="{{ url_for('static', filename='styles/dist/js/bootstrap.js') }}"></script>

这是“规范”非正式Flask教程的链接-此处有许多很棒的技巧可帮助您快速入门。

http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world

What I use (and it’s been working great) is a “templates” directory and a “static” directory. I place all my .html files/Flask templates inside the templates directory, and static contains CSS/JS. render_template works fine for generic html files to my knowledge, regardless of the extent at which you used Flask’s templating syntax. Below is a sample call in my views.py file.

@app.route('/projects')
def projects():
    return render_template("projects.html", title = 'Projects')

Just make sure you use url_for() when you do want to reference some static file in the separate static directory. You’ll probably end up doing this anyways in your CSS/JS file links in html. For instance…

<script src="{{ url_for('static', filename='styles/dist/js/bootstrap.js') }}"></script>

Here’s a link to the “canonical” informal Flask tutorial – lots of great tips in here to help you hit the ground running.

http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world


回答 6

基于其他答案的最简单的工作示例如下:

from flask import Flask, request
app = Flask(__name__, static_url_path='')

@app.route('/index/')
def root():
    return app.send_static_file('index.html')

if __name__ == '__main__':
  app.run(debug=True)

使用名为index.html的HTML

<!DOCTYPE html>
<html>
<head>
    <title>Hello World!</title>
</head>
<body>
    <div>
         <p>
            This is a test.
         </p>
    </div>
</body>
</html>

重要提示:而且index.html的是一个文件夹,名为在静态,这意味着<projectpath>.py文件,<projectpath>\statichtml文件。

如果希望服务器在网络上可见,请使用 app.run(debug=True, host='0.0.0.0')

编辑:如果需要显示文件夹中的所有文件,请使用此

@app.route('/<path:path>')
def static_file(path):
    return app.send_static_file(path)

从本质BlackMamba上讲,这是答案,所以给他们投票。

A simplest working example based on the other answers is the following:

from flask import Flask, request
app = Flask(__name__, static_url_path='')

@app.route('/index/')
def root():
    return app.send_static_file('index.html')

if __name__ == '__main__':
  app.run(debug=True)

With the HTML called index.html:

<!DOCTYPE html>
<html>
<head>
    <title>Hello World!</title>
</head>
<body>
    <div>
         <p>
            This is a test.
         </p>
    </div>
</body>
</html>

IMPORTANT: And index.html is in a folder called static, meaning <projectpath> has the .py file, and <projectpath>\static has the html file.

If you want the server to be visible on the network, use app.run(debug=True, host='0.0.0.0')

EDIT: For showing all files in the folder if requested, use this

@app.route('/<path:path>')
def static_file(path):
    return app.send_static_file(path)

Which is essentially BlackMamba‘s answer, so give them an upvote.


回答 7

对于创建下一个文件夹树的角度+样板流:

backend/
|
|------ui/
|      |------------------build/          <--'static' folder, constructed by Grunt
|      |--<proj           |----vendors/   <-- angular.js and others here
|      |--     folders>   |----src/       <-- your js
|                         |----index.html <-- your SPA entrypoint 
|------<proj
|------     folders>
|
|------view.py  <-- Flask app here

我使用以下解决方案:

...
root = os.path.join(os.path.dirname(os.path.abspath(__file__)), "ui", "build")

@app.route('/<path:path>', methods=['GET'])
def static_proxy(path):
    return send_from_directory(root, path)


@app.route('/', methods=['GET'])
def redirect_to_index():
    return send_from_directory(root, 'index.html')
...

它有助于将“静态”文件夹重新定义为自定义。

For angular+boilerplate flow which creates next folders tree:

backend/
|
|------ui/
|      |------------------build/          <--'static' folder, constructed by Grunt
|      |--<proj           |----vendors/   <-- angular.js and others here
|      |--     folders>   |----src/       <-- your js
|                         |----index.html <-- your SPA entrypoint 
|------<proj
|------     folders>
|
|------view.py  <-- Flask app here

I use following solution:

...
root = os.path.join(os.path.dirname(os.path.abspath(__file__)), "ui", "build")

@app.route('/<path:path>', methods=['GET'])
def static_proxy(path):
    return send_from_directory(root, path)


@app.route('/', methods=['GET'])
def redirect_to_index():
    return send_from_directory(root, 'index.html')
...

It helps to redefine ‘static’ folder to custom.


回答 8

因此,我开始工作(基于@ user1671599答案),并希望与大家分享。

(我希望我做对了,因为这是我的第一个Python应用程序)

我做到了-

项目结构:

在此处输入图片说明

server.py:

from server.AppStarter import AppStarter
import os

static_folder_root = os.path.join(os.path.dirname(os.path.abspath(__file__)), "client")

app = AppStarter()
app.register_routes_to_resources(static_folder_root)
app.run(__name__)

AppStarter.py:

from flask import Flask, send_from_directory
from flask_restful import Api, Resource
from server.ApiResources.TodoList import TodoList
from server.ApiResources.Todo import Todo


class AppStarter(Resource):
    def __init__(self):
        self._static_files_root_folder_path = ''  # Default is current folder
        self._app = Flask(__name__)  # , static_folder='client', static_url_path='')
        self._api = Api(self._app)

    def _register_static_server(self, static_files_root_folder_path):
        self._static_files_root_folder_path = static_files_root_folder_path
        self._app.add_url_rule('/<path:file_relative_path_to_root>', 'serve_page', self._serve_page, methods=['GET'])
        self._app.add_url_rule('/', 'index', self._goto_index, methods=['GET'])

    def register_routes_to_resources(self, static_files_root_folder_path):

        self._register_static_server(static_files_root_folder_path)
        self._api.add_resource(TodoList, '/todos')
        self._api.add_resource(Todo, '/todos/<todo_id>')

    def _goto_index(self):
        return self._serve_page("index.html")

    def _serve_page(self, file_relative_path_to_root):
        return send_from_directory(self._static_files_root_folder_path, file_relative_path_to_root)

    def run(self, module_name):
        if module_name == '__main__':
            self._app.run(debug=True)

So I got things working (based on @user1671599 answer) and wanted to share it with you guys.

(I hope I’m doing it right since it’s my first app in Python)

I did this –

Project structure:

enter image description here

server.py:

from server.AppStarter import AppStarter
import os

static_folder_root = os.path.join(os.path.dirname(os.path.abspath(__file__)), "client")

app = AppStarter()
app.register_routes_to_resources(static_folder_root)
app.run(__name__)

AppStarter.py:

from flask import Flask, send_from_directory
from flask_restful import Api, Resource
from server.ApiResources.TodoList import TodoList
from server.ApiResources.Todo import Todo


class AppStarter(Resource):
    def __init__(self):
        self._static_files_root_folder_path = ''  # Default is current folder
        self._app = Flask(__name__)  # , static_folder='client', static_url_path='')
        self._api = Api(self._app)

    def _register_static_server(self, static_files_root_folder_path):
        self._static_files_root_folder_path = static_files_root_folder_path
        self._app.add_url_rule('/<path:file_relative_path_to_root>', 'serve_page', self._serve_page, methods=['GET'])
        self._app.add_url_rule('/', 'index', self._goto_index, methods=['GET'])

    def register_routes_to_resources(self, static_files_root_folder_path):

        self._register_static_server(static_files_root_folder_path)
        self._api.add_resource(TodoList, '/todos')
        self._api.add_resource(Todo, '/todos/<todo_id>')

    def _goto_index(self):
        return self._serve_page("index.html")

    def _serve_page(self, file_relative_path_to_root):
        return send_from_directory(self._static_files_root_folder_path, file_relative_path_to_root)

    def run(self, module_name):
        if module_name == '__main__':
            self._app.run(debug=True)

回答 9

一种简单的方法。 干杯!

演示

from flask import Flask, render_template
app = Flask(__name__)

@app.route("/")
def index():
   return render_template("index.html")

if __name__ == '__main__':
   app.run(debug = True)

现在创建名为模板的文件夹名称。将您的index.html文件添加到模板文件夹中

index.html

<!DOCTYPE html>
<html>
<head>
    <title>Python Web Application</title>
</head>
<body>
    <div>
         <p>
            Welcomes You!!
         </p>
    </div>
</body>
</html>

项目结构

-demo.py
-templates/index.html

One of the simple way to do. Cheers!

demo.py

from flask import Flask, render_template
app = Flask(__name__)

@app.route("/")
def index():
   return render_template("index.html")

if __name__ == '__main__':
   app.run(debug = True)

Now create folder name called templates. Add your index.html file inside of templates folder

index.html

<!DOCTYPE html>
<html>
<head>
    <title>Python Web Application</title>
</head>
<body>
    <div>
         <p>
            Welcomes You!!
         </p>
    </div>
</body>
</html>

Project Structure

-demo.py
-templates/index.html

回答 10

想共享…这个例子。

from flask import Flask
app = Flask(__name__)

@app.route('/loading/')
def hello_world():
    data = open('sample.html').read()    
    return data

if __name__ == '__main__':
    app.run(host='0.0.0.0')

这样效果更好,更简单。

Thought of sharing…. this example.

from flask import Flask
app = Flask(__name__)

@app.route('/loading/')
def hello_world():
    data = open('sample.html').read()    
    return data

if __name__ == '__main__':
    app.run(host='0.0.0.0')

This works better and simple.


回答 11

使用redirecturl_for

from flask import redirect, url_for

@app.route('/', methods=['GET'])
def metrics():
    return redirect(url_for('static', filename='jenkins_analytics.html'))

这将处理您html中引用的所有文件(css和js …)。

Use redirect and url_for

from flask import redirect, url_for

@app.route('/', methods=['GET'])
def metrics():
    return redirect(url_for('static', filename='jenkins_analytics.html'))

This servers all files (css & js…) referenced in your html.


回答 12

最简单的方法是在主项目文件夹中创建一个静态文件夹。包含.css文件的静态文件夹。

主文件夹

/Main Folder
/Main Folder/templates/foo.html
/Main Folder/static/foo.css
/Main Folder/application.py(flask script)

包含静态和模板文件夹以及flask脚本的主文件夹的图像

烧瓶

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def login():
    return render_template("login.html")

html(布局)

<!DOCTYPE html>
<html>
    <head>
        <title>Project(1)</title>
        <link rel="stylesheet" href="/static/styles.css">
     </head>
    <body>
        <header>
            <div class="container">
                <nav>
                    <a class="title" href="">Kamook</a>
                    <a class="text" href="">Sign Up</a>
                    <a class="text" href="">Log In</a>
                </nav>
            </div>
        </header>  
        {% block body %}
        {% endblock %}
    </body>
</html>

html

{% extends "layout.html" %}

{% block body %}
    <div class="col">
        <input type="text" name="username" placeholder="Username" required>
        <input type="password" name="password" placeholder="Password" required>
        <input type="submit" value="Login">
    </div>
{% endblock %}

The simplest way is create a static folder inside the main project folder. Static folder containing .css files.

main folder

/Main Folder
/Main Folder/templates/foo.html
/Main Folder/static/foo.css
/Main Folder/application.py(flask script)

Image of main folder containing static and templates folders and flask script

flask

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def login():
    return render_template("login.html")

html (layout)

<!DOCTYPE html>
<html>
    <head>
        <title>Project(1)</title>
        <link rel="stylesheet" href="/static/styles.css">
     </head>
    <body>
        <header>
            <div class="container">
                <nav>
                    <a class="title" href="">Kamook</a>
                    <a class="text" href="">Sign Up</a>
                    <a class="text" href="">Log In</a>
                </nav>
            </div>
        </header>  
        {% block body %}
        {% endblock %}
    </body>
</html>

html

{% extends "layout.html" %}

{% block body %}
    <div class="col">
        <input type="text" name="username" placeholder="Username" required>
        <input type="password" name="password" placeholder="Password" required>
        <input type="submit" value="Login">
    </div>
{% endblock %}

回答 13

app = Flask(__name__, static_folder="your path to static")

如果您的根目录中有模板,则如果包含app_Flask(name)的文件也位于同一位置,则放置app = Flask(name)将起作用,如果该文件位于其他位置,则必须指定模板位置才​​能启用烧瓶指向该位置

app = Flask(__name__, static_folder="your path to static")

If you have templates in your root directory, placing the app=Flask(name) will work if the file that contains this also is in the same location, if this file is in another location, you will have to specify the template location to enable Flask to point to the location


回答 14

所有的答案都不错,但是对我来说,行之有效的只是使用send_fileFlask 的简单功能。当host:port / ApiName将在浏览器中显示文件的输出时,当您只需要发送一个HTML文件作为响应时,此方法很好用


@app.route('/ApiName')
def ApiFunc():
    try:
        return send_file('some-other-directory-than-root/your-file.extension')
    except Exception as e:
        logging.info(e.args[0])```

All the answers are good but what worked well for me is just using the simple function send_file from Flask. This works well when you just need to send an html file as response when host:port/ApiName will show the output of the file in browser


@app.route('/ApiName')
def ApiFunc():
    try:
        return send_file('some-other-directory-than-root/your-file.extension')
    except Exception as e:
        logging.info(e.args[0])```


回答 15

   默认情况下,flask使用“模板”文件夹包含所有模板文件(任何纯文本文件,但通常是.html某种模板语言,例如jinja2),并使用“静态”文件夹包含所有静态文件(即.js .css和您的图片)。
   在您的中routes,您可以使用render_template()呈现模板文件(如上所述,默认情况下将其放置在templates文件夹中)作为您请求的响应。并且在模板文件(通常是类似.html的文件)中,您可能会使用某些.js和/或`.css’文件,所以我想您的问题是如何将这些静态文件链接到当前模板文件。

   By default, flask use a “templates” folder to contain all your template files(any plain-text file, but usually .html or some kind of template language such as jinja2 ) & a “static” folder to contain all your static files(i.e. .js .css and your images).
   In your routes, u can use render_template() to render a template file (as I say above, by default it is placed in the templates folder) as the response for your request. And in the template file (it’s usually a .html-like file), u may use some .js and/or `.css’ files, so I guess your question is how u link these static files to the current template file.


回答 16

如果您只是尝试打开文件,则可以使用 app.open_resource()。所以读取文件看起来像

with app.open_resource('/static/path/yourfile'):
      #code to read the file and do something

If you are just trying to open a file, you could use app.open_resource(). So reading a file would look something like

with app.open_resource('/static/path/yourfile'):
      #code to read the file and do something

如何使用print()打印类的实例?

问题:如何使用print()打印类的实例?

我正在学习Python中的绳索。当我尝试Foobar使用该print()函数打印类的对象时,得到如下输出:

<__main__.Foobar instance at 0x7ff2a18c>

有没有办法设置及其对象打印行为(或字符串表示形式)?例如,当我调用类对象时,我想以某种格式打印其数据成员。如何在Python中实现?print()

如果您熟悉C ++类,则可以通过为类ostream添加friend ostream& operator << (ostream&, const Foobar&)方法来实现上述目的。

I am learning the ropes in Python. When I try to print an object of class Foobar using the print() function, I get an output like this:

<__main__.Foobar instance at 0x7ff2a18c>

Is there a way I can set the printing behaviour (or the string representation) of a class and its objects? For instance, when I call print() on a class object, I would like to print its data members in a certain format. How to achieve this in Python?

If you are familiar with C++ classes, the above can be achieved for the standard ostream by adding a friend ostream& operator << (ostream&, const Foobar&) method for the class.


回答 0

>>> class Test:
...     def __repr__(self):
...         return "Test()"
...     def __str__(self):
...         return "member of Test"
... 
>>> t = Test()
>>> t
Test()
>>> print(t)
member of Test

__str__方法是在打印时发生的事情,该__repr__方法是在使用repr()功能时(或在交互式提示下查看它时)发生的事情。如果这不是最Python的方法,我深表歉意,因为我也在学习-但这确实可行。

如果未提供任何__str__方法,Python将__repr__改为打印结果。如果定义__str__但没有__repr__,Python将使用你所看到的上面的__repr__,但仍使用__str__打印。

>>> class Test:
...     def __repr__(self):
...         return "Test()"
...     def __str__(self):
...         return "member of Test"
... 
>>> t = Test()
>>> t
Test()
>>> print(t)
member of Test

The __str__ method is what happens when you print it, and the __repr__ method is what happens when you use the repr() function (or when you look at it with the interactive prompt). If this isn’t the most Pythonic method, I apologize, because I’m still learning too – but it works.

If no __str__ method is given, Python will print the result of __repr__ instead. If you define __str__ but not __repr__, Python will use what you see above as the __repr__, but still use __str__ for printing.


回答 1

正如Chris Lutz所提到的,这是由__repr__您的类中的方法定义的。

从以下文档中repr()

对于许多类型,此函数会尝试返回一个字符串,该字符串将在传递给时产生一个具有相同值的对象eval(),否则表示形式是一个用尖括号括起来的字符串,其中包含对象类型的名称以及其他信息通常包括对象的名称和地址。类可以通过定义__repr__()方法来控制此函数为其实例返回的内容。

给定以下类Test:

class Test:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return "<Test a:%s b:%s>" % (self.a, self.b)

    def __str__(self):
        return "From str method of Test: a is %s, b is %s" % (self.a, self.b)

..it在Python Shell中的行为如下:

>>> t = Test(123, 456)
>>> t
<Test a:123 b:456>
>>> print repr(t)
<Test a:123 b:456>
>>> print(t)
From str method of Test: a is 123, b is 456
>>> print(str(t))
From str method of Test: a is 123, b is 456

如果__str__未定义任何方法,则print(t)(或print(str(t)))将使用结果__repr__代替

如果__repr__未定义任何方法,则使用默认值,该默认值与..

def __repr__(self):
    return "<%s instance at %s>" % (self.__class__.__name__, id(self))

As Chris Lutz mentioned, this is defined by the __repr__ method in your class.

From the documentation of repr():

For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.

Given the following class Test:

class Test:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return "<Test a:%s b:%s>" % (self.a, self.b)

    def __str__(self):
        return "From str method of Test: a is %s, b is %s" % (self.a, self.b)

..it will act the following way in the Python shell:

>>> t = Test(123, 456)
>>> t
<Test a:123 b:456>
>>> print repr(t)
<Test a:123 b:456>
>>> print(t)
From str method of Test: a is 123, b is 456
>>> print(str(t))
From str method of Test: a is 123, b is 456

If no __str__ method is defined, print(t) (or print(str(t))) will use the result of __repr__ instead

If no __repr__ method is defined then the default is used, which is pretty much equivalent to..

def __repr__(self):
    return "<%s instance at %s>" % (self.__class__.__name__, id(self))

回答 2

可以按以下方式完成可应用于任何类而无需特定格式的通用方法:

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number

    def __str__(self):
        return str(self.__class__) + ": " + str(self.__dict__)

然后,

elem = Element('my_name', 'some_symbol', 3)
print(elem)

产生

__main__.Element: {'symbol': 'some_symbol', 'name': 'my_name', 'number': 3}

A generic way that can be applied to any class without specific formatting could be done as follows:

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number

    def __str__(self):
        return str(self.__class__) + ": " + str(self.__dict__)

And then,

elem = Element('my_name', 'some_symbol', 3)
print(elem)

produces

__main__.Element: {'symbol': 'some_symbol', 'name': 'my_name', 'number': 3}

回答 3

如果遇到类似@Keith的情况,可以尝试:

print a.__dict__

它违背了我认为好的样式,但是如果您只是尝试调试,那么它应该可以做您想要的。

If you’re in a situation like @Keith you could try:

print a.__dict__

It goes against what I would consider good style but if you’re just trying to debug then it should do what you want.


回答 4

只是为了在@dbr的答案中加上我的两分钱,下面是他引用的官方文档中如何实现这句话的一个示例:

“ […返回一个字符串,当传递给eval()时,该字符串将产生具有相同值的对象,[…]”

给定此类定义:

class Test(object):
    def __init__(self, a, b):
        self._a = a
        self._b = b

    def __str__(self):
        return "An instance of class Test with state: a=%s b=%s" % (self._a, self._b)

    def __repr__(self):
        return 'Test("%s","%s")' % (self._a, self._b)

现在,很容易序列化Test类的实例:

x = Test('hello', 'world')
print 'Human readable: ', str(x)
print 'Object representation: ', repr(x)
print

y = eval(repr(x))
print 'Human readable: ', str(y)
print 'Object representation: ', repr(y)
print

因此,运行最后一段代码,我们将获得:

Human readable:  An instance of class Test with state: a=hello b=world
Object representation:  Test("hello","world")

Human readable:  An instance of class Test with state: a=hello b=world
Object representation:  Test("hello","world")

但是,正如我在最近的评论中所说:更多信息就在这里

Just to add my two cents to @dbr’s answer, following is an example of how to implement this sentence from the official documentation he’s cited:

“[…] to return a string that would yield an object with the same value when passed to eval(), […]”

Given this class definition:

class Test(object):
    def __init__(self, a, b):
        self._a = a
        self._b = b

    def __str__(self):
        return "An instance of class Test with state: a=%s b=%s" % (self._a, self._b)

    def __repr__(self):
        return 'Test("%s","%s")' % (self._a, self._b)

Now, is easy to serialize instance of Test class:

x = Test('hello', 'world')
print 'Human readable: ', str(x)
print 'Object representation: ', repr(x)
print

y = eval(repr(x))
print 'Human readable: ', str(y)
print 'Object representation: ', repr(y)
print

So, running last piece of code, we’ll get:

Human readable:  An instance of class Test with state: a=hello b=world
Object representation:  Test("hello","world")

Human readable:  An instance of class Test with state: a=hello b=world
Object representation:  Test("hello","world")

But, as I said in my last comment: more info is just here!


回答 5

您需要使用__repr__。这是一个类似的标准功能__init__。例如:

class Foobar():
    """This will create Foobar type object."""

    def __init__(self):
        print "Foobar object is created."

    def __repr__(self):
        return "Type what do you want to see here."

a = Foobar()

print a

You need to use __repr__. This is a standard function like __init__. For example:

class Foobar():
    """This will create Foobar type object."""

    def __init__(self):
        print "Foobar object is created."

    def __repr__(self):
        return "Type what do you want to see here."

a = Foobar()

print a

回答 6

@ user394430的响应的更漂亮的版本

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number

    def __str__(self):
        return  str(self.__class__) + '\n'+ '\n'.join(('{} = {}'.format(item, self.__dict__[item]) for item in self.__dict__))

elem = Element('my_name', 'some_symbol', 3)
print(elem)

产生视觉上漂亮的名称和值列表。

<class '__main__.Element'>
name = my_name
symbol = some_symbol
number = 3

更好的版本(感谢Ruud)对项目进行排序:

def __str__(self):
    return  str(self.__class__) + '\n' + '\n'.join((str(item) + ' = ' + str(self.__dict__[item]) for item in sorted(self.__dict__)))

A prettier version of response by @user394430

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number

    def __str__(self):
        return  str(self.__class__) + '\n'+ '\n'.join(('{} = {}'.format(item, self.__dict__[item]) for item in self.__dict__))

elem = Element('my_name', 'some_symbol', 3)
print(elem)

Produces visually nice list of the names and values.

<class '__main__.Element'>
name = my_name
symbol = some_symbol
number = 3

An even fancier version (thanks Ruud) sorts the items:

def __str__(self):
    return  str(self.__class__) + '\n' + '\n'.join((str(item) + ' = ' + str(self.__dict__[item]) for item in sorted(self.__dict__)))

回答 7

对于Python 3:

如果特定格式不重要(例如,用于调试),则仅继承下面的Printable类。无需为每个对象编写代码。

灵感来自这个答案

class Printable:
    def __repr__(self):
        from pprint import pformat
        return "<" + type(self).__name__ + "> " + pformat(vars(self), indent=4, width=1)

# Example Usage
class MyClass(Printable):
    pass

my_obj = MyClass()
my_obj.msg = "Hello"
my_obj.number = "46"
print(my_obj)

For Python 3:

If the specific format isn’t important (e.g. for debugging) just inherit from the Printable class below. No need to write code for every object.

Inspired by this answer

class Printable:
    def __repr__(self):
        from pprint import pformat
        return "<" + type(self).__name__ + "> " + pformat(vars(self), indent=4, width=1)

# Example Usage
class MyClass(Printable):
    pass

my_obj = MyClass()
my_obj.msg = "Hello"
my_obj.number = "46"
print(my_obj)

回答 8

这个线程中已经有很多答案,但是没有一个对我有特别的帮助,我必须自己解决这个问题,因此我希望这个答案能提供更多信息。

您只需要确保在类结束时有括号即可,例如:

print(class())

这是我正在从事的项目中的代码示例:

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
    def __str__(self):
        return "{}: {}\nAtomic Number: {}\n".format(self.name, self.symbol, self.number

class Hydrogen(Element):
    def __init__(self):
        super().__init__(name = "Hydrogen", symbol = "H", number = "1")

要打印我的Hydrogen类,我使用了以下内容:

print(Hydrogen())

请注意,如果没有氢末的括号,这将无法正常工作。它们是必需的。

希望这会有所帮助,如果您还有其他问题,请告诉我。

There are already a lot of answers in this thread but none of them particularly helped me, I had to work it out myself, so I hope this one is a little more informative.

You just have to make sure you have parentheses at the end of your class, e.g:

print(class())

Here’s an example of code from a project I was working on:

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
    def __str__(self):
        return "{}: {}\nAtomic Number: {}\n".format(self.name, self.symbol, self.number

class Hydrogen(Element):
    def __init__(self):
        super().__init__(name = "Hydrogen", symbol = "H", number = "1")

To print my Hydrogen class, I used the following:

print(Hydrogen())

Please note, this will not work without the parentheses at the end of Hydrogen. They are necessary.

Hope this helps, let me know if you have anymore questions.


Python的生成器和迭代器之间的区别

问题:Python的生成器和迭代器之间的区别

迭代器和生成器有什么区别?有关何时使用每种情况的一些示例会有所帮助。

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.


回答 0

iterator是一个更笼统的概念:其类具有next方法(__next__在Python 3中)和具有__iter__方法的任何对象return self

每个生成器都是一个迭代器,但反之亦然。生成器是通过调用具有一个或多个yield表达式(yield在Python 2.5及更早版本中为语句)的函数而构建的,并且该函数是满足上一段对的定义的对象iterator

当您需要一个具有某种复杂状态维护行为的类,或者想要公开除next(和__iter____init__)之外的其他方法时,您可能想使用自定义迭代器,而不是生成器。通常,一个生成器(有时,对于足够简单的需求,一个生成器表达式)就足够了,并且它更容易编写代码,因为状态维护(在合理范围内)基本上是由挂起和恢复帧“为您完成的”。

例如,一个生成器,例如:

def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)

或等效的生成器表达式(genexp)

generator = (i*i for i in range(a, b))

将需要更多代码来构建为自定义迭代器:

class Squares(object):
    def __init__(self, start, stop):
       self.start = start
       self.stop = stop
    def __iter__(self): return self
    def next(self): # __next__ in Python 3
       if self.start >= self.stop:
           raise StopIteration
       current = self.start * self.start
       self.start += 1
       return current

iterator = Squares(a, b)

但是,当然,有了类,Squares您可以轻松地提供其他方法,即

    def current(self):
       return self.start

如果您在应用程序中实际需要这种额外功能。

iterator is a more general concept: any object whose class has a next method (__next__ in Python 3) and an __iter__ method that does return self.

Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph’s definition of an iterator.

You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides next (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it’s simpler to code because state maintenance (within reasonable limits) is basically “done for you” by the frame getting suspended and resumed.

For example, a generator such as:

def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)

or the equivalent generator expression (genexp)

generator = (i*i for i in range(a, b))

would take more code to build as a custom iterator:

class Squares(object):
    def __init__(self, start, stop):
       self.start = start
       self.stop = stop
    def __iter__(self): return self
    def next(self): # __next__ in Python 3
       if self.start >= self.stop:
           raise StopIteration
       current = self.start * self.start
       self.start += 1
       return current

iterator = Squares(a, b)

But, of course, with class Squares you could easily offer extra methods, i.e.

    def current(self):
       return self.start

if you have any actual need for such extra functionality in your application.


回答 1

迭代器和生成器有什么区别?有关何时使用每种情况的一些示例会有所帮助。

总结:迭代器是具有__iter__和方法的对象__next__next在Python 2中)。生成器提供了一种简单的内置方法来创建Iterator的实例。

包含yield的函数仍然是一个函数,在调用该函数时,它会返回生成器对象的实例:

def a_function():
    "when called, returns generator object"
    yield

生成器表达式还返回生成器:

a_generator = (i for i in range(0))

有关更深入的说明和示例,请继续阅读。

生成器迭代器

具体来说,生成器是迭代器的子类型。

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

我们可以通过几种方式创建生成器。一种非常普遍且简单的方法是使用函数。

具体来说,其中包含yield的函数是一个函数,在调用该函数时会返回生成器:

>>> def a_function():
        "just a function definition with yield in it"
        yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function()  # when called
>>> type(a_generator)           # returns a generator
<class 'generator'>

同样,生成器是迭代器:

>>> isinstance(a_generator, collections.Iterator)
True

迭代器可迭代的

迭代器是可迭代的

>>> issubclass(collections.Iterator, collections.Iterable)
True

这需要一个__iter__返回Iterator 的方法:

>>> collections.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__

内置元组,列表,字典,集合,冻结集合,字符串,字节字符串,字节数组,范围和内存视图是可迭代对象的一些示例:

>>> all(isinstance(element, collections.Iterable) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

迭代器需要一个next__next__方法

在Python 2中:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next

在Python 3中:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__

我们可以使用以下函数从内置对象(或自定义对象)中获取迭代器iter

>>> all(isinstance(iter(element), collections.Iterator) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

__iter__当您尝试将对象与for循环一起使用时,将调用该方法。然后__next__,在迭代器对象上调用该方法以获取循环中的每个项目。StopIteration耗尽后,迭代器会上升,并且此时无法重用。

从文档中

在“内置类型” 文档的“迭代器类型”部分的“生成器类型”部分中:

Python的生成器提供了一种实现迭代器协议的便捷方法。如果容器对象的__iter__()方法作为生成器实现的,它会自动返回一个迭代器对象(在技术上,一个生成器对象)供给__iter__()next()[ __next__()在Python 3]的方法。有关生成器的更多信息,可以在yield表达式的文档中找到。

(已添加重点。)

因此,我们从中了解到生成器是(便捷的)迭代器类型。

示例迭代器对象

您可以通过创建或扩展自己的对象来创建实现Iterator协议的对象。

class Yes(collections.Iterator):

    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next # Python 3 compatibility

但是,使用Generator来执行此操作会更容易:

def yes(stop):
    for _ in range(stop):
        yield 'yes'

也许更简单一些,生成器表达式(类似于列表推导):

yes_expr = ('yes' for _ in range(stop))

它们都可以以相同的方式使用:

>>> stop = 4             
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop), 
                             ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...     
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes

结论

当需要将Python对象扩展为可以迭代的对象时,可以直接使用Iterator协议。

但是,在大多数情况下,最适合yield用于定义返回生成器迭代器或考虑生成器表达式的函数。

最后,请注意,生成器提供了更多的协同程序功能。我将yield在有关“ yield”关键字的作用?”的回答中深入解释Generators和该语句。

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.

In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.

A function with yield in it is still a function, that, when called, returns an instance of a generator object:

def a_function():
    "when called, returns generator object"
    yield

A generator expression also returns a generator:

a_generator = (i for i in range(0))

For a more in-depth exposition and examples, keep reading.

A Generator is an Iterator

Specifically, generator is a subtype of iterator.

>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True

We can create a generator several ways. A very common and simple way to do so is with a function.

Specifically, a function with yield in it is a function, that, when called, returns a generator:

>>> def a_function():
        "just a function definition with yield in it"
        yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function()  # when called
>>> type(a_generator)           # returns a generator
<class 'generator'>

And a generator, again, is an Iterator:

>>> isinstance(a_generator, collections.Iterator)
True

An Iterator is an Iterable

An Iterator is an Iterable,

>>> issubclass(collections.Iterator, collections.Iterable)
True

which requires an __iter__ method that returns an Iterator:

>>> collections.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__

Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:

>>> all(isinstance(element, collections.Iterable) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

Iterators require a next or __next__ method

In Python 2:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next

And in Python 3:

>>> collections.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__

We can get the iterators from the built-in objects (or custom objects) with the iter function:

>>> all(isinstance(iter(element), collections.Iterator) for element in (
        (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True

The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.

From the documentation

From the Generator Types section of the Iterator Types section of the Built-in Types documentation:

Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.

(Emphasis added.)

So from this we learn that Generators are a (convenient) type of Iterator.

Example Iterator Objects

You might create object that implements the Iterator protocol by creating or extending your own object.

class Yes(collections.Iterator):

    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next # Python 3 compatibility

But it’s easier to simply use a Generator to do this:

def yes(stop):
    for _ in range(stop):
        yield 'yes'

Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):

yes_expr = ('yes' for _ in range(stop))

They can all be used in the same way:

>>> stop = 4             
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop), 
                             ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...     
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes

Conclusion

You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.

However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.

Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to “What does the “yield” keyword do?”.


回答 2

迭代器:

迭代器是使用next()方法获取序列的下一个值的对象。

生成器:

生成器是一种使用yield方法生成或产生值序列的函数。

生成器函数(如以下示例中的ex:函数)返回的生成next()器对象(如ex f中的示例)的每个方法调用都将按foo()顺序生成下一个值。

调用生成器函数时,它甚至不开始执行函数就返回生成器对象。当next()方法被称为首次,函数开始执行,直到它到达它返回产生值yield语句。收益跟踪(即记住上一次执行)。第二个next()调用从先前的值继续。

下面的示例演示了yield和生成器对象上的next方法的调用之间的相互作用。

>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0            # Control is in for loop
0
>>> f.next()
after yield 0             
before yield 1            # Continue for loop
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Iterators:

Iterator are objects which uses next() method to get next value of sequence.

Generators:

A generator is a function that produces or yields a sequence of values using yield method.

Every next() method call on generator object(for ex: f as in below example) returned by generator function(for ex: foo() function in below example), generates next value in sequence.

When a generator function is called, it returns an generator object without even beginning execution of the function. When next() method is called for the first time, the function starts executing until it reaches yield statement which returns the yielded value. The yield keeps track of i.e. remembers last execution. And second next() call continues from previous value.

The following example demonstrates the interplay between yield and call to next method on generator object.

>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0            # Control is in for loop
0
>>> f.next()
after yield 0             
before yield 1            # Continue for loop
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

回答 3

添加答案是因为现有答案中没有一个专门解决官方文献中的混乱。

生成器函数是使用yield代替定义的普通函数return。调用时,生成器函数将返回一个生成器对象,它是一种迭代器-它具有一个next()方法。当您调用时next(),将返回生成器函数产生的下一个值。

函数或对象都可以称为“生成器”,具体取决于您阅读的是哪个Python源文档。在Python的词汇说发生器功能,而Python的维基意味着生成器对象。在Python的教程非常设法暗示用三句话的空间用法:

生成器是用于创建迭代器的简单而强大的工具。它们的编写方式与常规函数类似,但是只要要返回数据就使用yield语句。每次在其上调用next()时,生成器都会从上次中断的地方继续(它会记住所有数据值以及最后执行的语句)。

前两个句子用生成器函数标识生成器,而第三句话用生成器对象标识它们。

尽管存在所有这些困惑,但您仍然可以找到Python语言参考来获得清晰明确的词:

yield表达式仅在定义生成器函数时使用,并且只能在函数定义的主体中使用。在函数定义中使用yield表达式足以使该定义创建一个生成器函数,而不是普通函数。

调用生成器函数时,它将返回称为生成器的迭代器。然后,该生成器控制生成器功能的执行。

因此,在正式和精确的用法中,“生成器”不合格表示生成器对象,而不是生成器功能。

上面的参考是针对Python 2的,但是Python 3语言参考却说了同样的话。但是,Python 3词汇表指出

generator …通常是指生成器函数,但在某些情况下可能是指生成器迭代器。在预期含义不明确的情况下,使用完整术语可以避免歧义。

Adding an answer because none of the existing answers specifically address the confusion in the official literature.

Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator – it has a next() method. When you call next(), the next value yielded by the generator function is returned.

Either the function or the object may be called the “generator” depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:

Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).

The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.

Despite all this confusion, one can seek out the Python language reference for the clear and final word:

The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.

When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.

So, in formal and precise usage, “generator” unqualified means generator object, not generator function.

The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that

generator … Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.


回答 4

每个人都有一个非常好的示例冗长的答案,我对此表示感谢。我只是想为在概念上仍不太清楚的人提供简短的答案:

如果创建自己的迭代器,则涉及到一点点-您必须创建一个类并至少实现iter和next方法。但是,如果您不想经历这种麻烦并想快速创建迭代器,该怎么办。幸运的是,Python提供了一种定义迭代器的捷径。您需要做的就是定义一个至少调用一次yield的函数,现在当您调用该函数时,它将返回“ something ”,其作用类似于迭代器(您可以调用next方法并在for循环中使用它)。这个东西在Python中有一个名字叫做Generator

希望可以澄清一下。

Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:

If you create your own iterator, it is a little bit involved – you have to create a class and at least implement the iter and the next methods. But what if you don’t want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return “something” which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator

Hope that clarifies a bit.


回答 5

先前的答案未添加此功能:生成器具有close方法,而典型的迭代器则没有。该close方法StopIteration在生成器中触发一个异常,该异常可能会finally在该迭代器的子句中捕获,从而有机会运行一些清理操作。这种抽象使它比简单的迭代器在大型迭代器中最有用。一个人可以关闭一个生成器,就像一个人可以关闭一个文件一样,而不必担心底层内容。

也就是说,我对第一个问题的个人回答是:iteratable __iter__仅具有一个方法,典型的迭代器__next__仅具有一个方法,生成器具有an __iter__和a __next__以及一个extra close

对于第二个问题,我个人的回答是:在公共界面中,我倾向于偏爱生成器,因为它更具弹性:该close方法具有更大的可组合性yield from。在本地,我可以使用迭代器,但前提是它是一个平面且简单的结构(迭代器不容易编写),并且有理由相信该序列很短,尤其是在序列结束之前可以将其停止的情况。我倾向于将迭代器视为低级原语,而不是文字。

对于控制流而言,生成器是一个与承诺一样重要的概念:两者都是抽象的且可组合的。

Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator, which may be caught in a finally clause in that iterator, to get a chance to run some clean‑up. This abstraction makes it most usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.

That said, my personal answer to the first question would be: iteratable has an __iter__ method only, typical iterators have a __next__ method only, generators has both an __iter__ and a __next__ and an additional close.

For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.

For control flow matters, generators are an as much important concept as promises: both are abstract and composable.


回答 6

生成器功能,生成器对象,生成器:

一个生成器的功能就像Python中的常规功能,但它包含一个或多个yield语句。生成器函数是一个很好的工具,它可以尽可能轻松地创建 Iterator对象。通过generator函数返回的Iterator对象也称为Generator对象Generator

在此示例中,我创建了一个Generator函数,该函数返回Generator对象<generator object fib at 0x01342480>。就像其他迭代器一样,Generator对象可以在for循环中使用,也可以与内置函数一起使用,该 函数next()从generator返回下一个值。

def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b
print(fib(10))             #<generator object fib at 0x01342480>

for i in fib(10):
    print(i)               # 0 1 1 2 3 5 8 13 21 34


print(next(myfib))         #0
print(next(myfib))         #1
print(next(myfib))         #1
print(next(myfib))         #2

因此,生成器函数是创建Iterator对象的最简单方法。

迭代器

每个生成器对象都是一个迭代器,但反之则不是。如果自定义迭代器对象的类实现__iter____next__方法(也称为迭代器协议),则可以创建该对象 。

但是,使用生成器函数来创建迭代器要容易得多,因为它们可以简化迭代器的创建,但是自定义迭代器为您提供了更大的自由度,并且您还可以根据需要实现其他方法,如下例所示。

class Fib:
    def __init__(self,max):
        self.current=0
        self.next=1
        self.max=max
        self.count=0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count>self.max:
            raise StopIteration
        else:
            self.current,self.next=self.next,(self.current+self.next)
            self.count+=1
            return self.next-self.current

    def __str__(self):
        return "Generator object"

itobj=Fib(4)
print(itobj)               #Generator object

for i in Fib(4):  
    print(i)               #0 1 1 2

print(next(itobj))         #0
print(next(itobj))         #1
print(next(itobj))         #1

Generator Function, Generator Object, Generator:

A Generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions is a great tool to create Iterator objects as easy as possible. The Iterator object returend by generator function is also called Generator object or Generator.

In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.

def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b
print(fib(10))             #<generator object fib at 0x01342480>

for i in fib(10):
    print(i)               # 0 1 1 2 3 5 8 13 21 34


print(next(myfib))         #0
print(next(myfib))         #1
print(next(myfib))         #1
print(next(myfib))         #2

So a generator function is the easiest way to create an Iterator object.

Iterator:

Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).

However, it is much easier to use generators function to create iterators because they simplify their creation, but a custom Iterator gives you more freedom and you can also implement other methods according to your requirements as shown in the below example.

class Fib:
    def __init__(self,max):
        self.current=0
        self.next=1
        self.max=max
        self.count=0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count>self.max:
            raise StopIteration
        else:
            self.current,self.next=self.next,(self.current+self.next)
            self.count+=1
            return self.next-self.current

    def __str__(self):
        return "Generator object"

itobj=Fib(4)
print(itobj)               #Generator object

for i in Fib(4):  
    print(i)               #0 1 1 2

print(next(itobj))         #0
print(next(itobj))         #1
print(next(itobj))         #1

回答 7

强烈推荐Ned Batchelder的示例 用于迭代器和生成器

没有生成器的方法会做一些偶数运算

def evens(stream):
   them = []
   for n in stream:
      if n % 2 == 0:
         them.append(n)
   return them

而使用生成器

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
  • 我们不需要任何清单return声明
  • 对于大/无限长的流有效…它只是走并产生价值

evens照常调用方法(生成器)

num = [...]
for n in evens(num):
   do_smth(n)
  • 生成器也用于打破双回路

迭代器

一整页的书是可迭代的,书签是 迭代器

这个书签除了移动外别无其他 next

litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration  (Exception) as we got end of the iterator

要使用Generator …我们需要一个函数

要使用迭代器……我们需要nextiter

如前所述:

Generator函数返回迭代器对象

迭代器的全部优点:

一次将一个元素存储在内存中

Examples from Ned Batchelder highly recommended for iterators and generators

A method without generators that do something to even numbers

def evens(stream):
   them = []
   for n in stream:
      if n % 2 == 0:
         them.append(n)
   return them

while by using a generator

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
  • We don’t need any list nor a return statement
  • Efficient for large/ infinite length stream … it just walks and yield the value

Calling the evens method (generator) is as usual

num = [...]
for n in evens(num):
   do_smth(n)
  • Generator also used to Break double loop

Iterator

A book full of pages is an iterable, A bookmark is an iterator

and this bookmark has nothing to do except to move next

litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration  (Exception) as we got end of the iterator

To use Generator … we need a function

To use Iterator … we need next and iter

As been said:

A Generator function returns an iterator object

The Whole benefit of Iterator:

Store one element a time in memory


回答 8

您可以将两种方法比较相同的数据:

def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

此外,如果检查内存占用量,生成器将占用更少的内存,因为它不需要同时将所有值存储在内存中。

You can compare both approaches for the same data:

def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

Besides, if you check the memory footprint, the generator takes much less memory as it doesn’t need to store all the values in memory at the same time.


回答 9

我用一种非常简单的方式专门为Python新手编写了代码,尽管Python深入人心地做了很多事情。

让我们从最基本的开始:

考虑一个清单,

l = [1,2,3]

让我们编写一个等效的函数:

def f():
    return [1,2,3]

o / p为print(l): [1,2,3]&o / p为print(f()) : [1,2,3]

让列表l变得可迭代:在python中,列表始终是可迭代的,这意味着您可以随时使用迭代器。

让我们在列表上应用迭代器:

iter_l = iter(l) # iterator applied explicitly

让我们迭代一个函数,即编写一个等效的生成器函数。 在python中,一旦您引入了关键字yield;它成为生成器函数,并且迭代器将被隐式应用。

注意:每个生成器始终可以应用隐式迭代器进行迭代,此处隐式迭代器是关键, 因此生成器函数将是:

def f():
  yield 1 
  yield 2
  yield 3

iter_f = f() # which is iter(f) as iterator is already applied implicitly

因此,如果您观察到,一旦创建函数fa generator,它已经是iter(f)

现在,

l是列表,应用迭代器方法“ iter”后,它变为iter(l)

f已经是iter(f),在应用迭代器方法“ iter”之后,它变为iter(iter(f)),再次是iter(f)

您将int强制转换为已经为int的int(x)并保留为int(x)。

例如:

print(type(iter(iter(l))))

<class 'list_iterator'>

永远不要忘记这是Python,而不是C或C ++

因此,以上解释得出的结论是:

列出l〜= iter(l)

生成器函数f == iter(f)

I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.

Let’s start with the very basic:

Consider a list,

l = [1,2,3]

Let’s write an equivalent function:

def f():
    return [1,2,3]

o/p of print(l): [1,2,3] & o/p of print(f()) : [1,2,3]

Let’s make list l iterable: In python list is always iterable that means you can apply iterator whenever you want.

Let’s apply iterator on list:

iter_l = iter(l) # iterator applied explicitly

Let’s make a function iterable, i.e. write an equivalent generator function. In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.

Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux So the generator function will be:

def f():
  yield 1 
  yield 2
  yield 3

iter_f = f() # which is iter(f) as iterator is already applied implicitly

So if you have observed, as soon as you made function f a generator, it is already iter(f)

Now,

l is the list, after applying iterator method “iter” it becomes, iter(l)

f is already iter(f), after applying iterator method “iter” it becomes, iter(iter(f)), which is again iter(f)

It’s kinda you are casting int to int(x) which is already int and it will remain int(x).

For example o/p of :

print(type(iter(iter(l))))

is

<class 'list_iterator'>

Never forget this is Python and not C or C++

Hence the conclusion from above explanation is:

list l ~= iter(l)

generator function f == iter(f)


将嵌套的Python字典转换为对象?

问题:将嵌套的Python字典转换为对象?

我正在寻找一种优雅的方式来获取数据,该数据使用具有一些嵌套的字典和列表的字典的属性访问(即javascript样式的对象语法)。

例如:

>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}

应该以这种方式访问​​:

>>> x = dict2obj(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
bar

我认为,没有递归是不可能的,但是获得字典对象样式的一种好方法是什么?

I’m searching for an elegant way to get data using attribute access on a dict with some nested dicts and lists (i.e. javascript-style object syntax).

For example:

>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}

Should be accessible in this way:

>>> x = dict2obj(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
bar

I think, this is not possible without recursion, but what would be a nice way to get an object style for dicts?


回答 0

更新:在Python 2.6及更高版本中,请考虑namedtuple数据结构是否满足您的需求:

>>> from collections import namedtuple
>>> MyStruct = namedtuple('MyStruct', 'a b d')
>>> s = MyStruct(a=1, b={'c': 2}, d=['hi'])
>>> s
MyStruct(a=1, b={'c': 2}, d=['hi'])
>>> s.a
1
>>> s.b
{'c': 2}
>>> s.c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyStruct' object has no attribute 'c'
>>> s.d
['hi']

备选方案(原始答案内容)为:

class Struct:
    def __init__(self, **entries):
        self.__dict__.update(entries)

然后,您可以使用:

>>> args = {'a': 1, 'b': 2}
>>> s = Struct(**args)
>>> s
<__main__.Struct instance at 0x01D6A738>
>>> s.a
1
>>> s.b
2

Update: In Python 2.6 and onwards, consider whether the namedtuple data structure suits your needs:

>>> from collections import namedtuple
>>> MyStruct = namedtuple('MyStruct', 'a b d')
>>> s = MyStruct(a=1, b={'c': 2}, d=['hi'])
>>> s
MyStruct(a=1, b={'c': 2}, d=['hi'])
>>> s.a
1
>>> s.b
{'c': 2}
>>> s.c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyStruct' object has no attribute 'c'
>>> s.d
['hi']

The alternative (original answer contents) is:

class Struct:
    def __init__(self, **entries):
        self.__dict__.update(entries)

Then, you can use:

>>> args = {'a': 1, 'b': 2}
>>> s = Struct(**args)
>>> s
<__main__.Struct instance at 0x01D6A738>
>>> s.a
1
>>> s.b
2

回答 1

class obj(object):
    def __init__(self, d):
        for a, b in d.items():
            if isinstance(b, (list, tuple)):
               setattr(self, a, [obj(x) if isinstance(x, dict) else x for x in b])
            else:
               setattr(self, a, obj(b) if isinstance(b, dict) else b)

>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
>>> x = obj(d)
>>> x.b.c
2
>>> x.d[1].foo
'bar'
class obj(object):
    def __init__(self, d):
        for a, b in d.items():
            if isinstance(b, (list, tuple)):
               setattr(self, a, [obj(x) if isinstance(x, dict) else x for x in b])
            else:
               setattr(self, a, obj(b) if isinstance(b, dict) else b)

>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
>>> x = obj(d)
>>> x.b.c
2
>>> x.d[1].foo
'bar'

回答 2

令人惊讶的是,没有人提到邦奇。该库专门用于提供对dict对象的属性样式访问,并且完全符合OP的要求。演示:

>>> from bunch import bunchify
>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
>>> x = bunchify(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
'bar'

可以从https://github.com/Infinidat/munch获得Python 3库- 版权归codyzu所有

Surprisingly no one has mentioned Bunch. This library is exclusively meant to provide attribute style access to dict objects and does exactly what the OP wants. A demonstration:

>>> from bunch import bunchify
>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
>>> x = bunchify(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
'bar'

A Python 3 library is available at https://github.com/Infinidat/munch – Credit goes to codyzu


回答 3

x = type('new_dict', (object,), d)

然后向其中添加递归就可以了。

编辑这是我将如何实现:

>>> d
{'a': 1, 'b': {'c': 2}, 'd': ['hi', {'foo': 'bar'}]}
>>> def obj_dic(d):
    top = type('new', (object,), d)
    seqs = tuple, list, set, frozenset
    for i, j in d.items():
        if isinstance(j, dict):
            setattr(top, i, obj_dic(j))
        elif isinstance(j, seqs):
            setattr(top, i, 
                type(j)(obj_dic(sj) if isinstance(sj, dict) else sj for sj in j))
        else:
            setattr(top, i, j)
    return top

>>> x = obj_dic(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
'bar'
x = type('new_dict', (object,), d)

then add recursion to this and you’re done.

edit this is how I’d implement it:

>>> d
{'a': 1, 'b': {'c': 2}, 'd': ['hi', {'foo': 'bar'}]}
>>> def obj_dic(d):
    top = type('new', (object,), d)
    seqs = tuple, list, set, frozenset
    for i, j in d.items():
        if isinstance(j, dict):
            setattr(top, i, obj_dic(j))
        elif isinstance(j, seqs):
            setattr(top, i, 
                type(j)(obj_dic(sj) if isinstance(sj, dict) else sj for sj in j))
        else:
            setattr(top, i, j)
    return top

>>> x = obj_dic(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
'bar'

回答 4

有一个名为的收集助手namedtuple,可以为您完成此操作:

from collections import namedtuple

d_named = namedtuple('Struct', d.keys())(*d.values())

In [7]: d_named
Out[7]: Struct(a=1, b={'c': 2}, d=['hi', {'foo': 'bar'}])

In [8]: d_named.a
Out[8]: 1

There’s a collection helper called namedtuple, that can do this for you:

from collections import namedtuple

d_named = namedtuple('Struct', d.keys())(*d.values())

In [7]: d_named
Out[7]: Struct(a=1, b={'c': 2}, d=['hi', {'foo': 'bar'}])

In [8]: d_named.a
Out[8]: 1

回答 5

class Struct(object):
    """Comment removed"""
    def __init__(self, data):
        for name, value in data.iteritems():
            setattr(self, name, self._wrap(value))

    def _wrap(self, value):
        if isinstance(value, (tuple, list, set, frozenset)): 
            return type(value)([self._wrap(v) for v in value])
        else:
            return Struct(value) if isinstance(value, dict) else value

可以与任何深度的任何序列/字典/值结构一起使用。

class Struct(object):
    """Comment removed"""
    def __init__(self, data):
        for name, value in data.iteritems():
            setattr(self, name, self._wrap(value))

    def _wrap(self, value):
        if isinstance(value, (tuple, list, set, frozenset)): 
            return type(value)([self._wrap(v) for v in value])
        else:
            return Struct(value) if isinstance(value, dict) else value

Can be used with any sequence/dict/value structure of any depth.


回答 6

以我认为是前面示例的最佳方面,这是我想到的:

class Struct:
  '''The recursive class for building and representing objects with.'''
  def __init__(self, obj):
    for k, v in obj.iteritems():
      if isinstance(v, dict):
        setattr(self, k, Struct(v))
      else:
        setattr(self, k, v)
  def __getitem__(self, val):
    return self.__dict__[val]
  def __repr__(self):
    return '{%s}' % str(', '.join('%s : %s' % (k, repr(v)) for
      (k, v) in self.__dict__.iteritems()))

Taking what I feel are the best aspects of the previous examples, here’s what I came up with:

class Struct:
  '''The recursive class for building and representing objects with.'''
  def __init__(self, obj):
    for k, v in obj.iteritems():
      if isinstance(v, dict):
        setattr(self, k, Struct(v))
      else:
        setattr(self, k, v)
  def __getitem__(self, val):
    return self.__dict__[val]
  def __repr__(self):
    return '{%s}' % str(', '.join('%s : %s' % (k, repr(v)) for
      (k, v) in self.__dict__.iteritems()))

回答 7

如果您的字典来自json.loads(),则可以一行将其变成一个对象(而不是字典):

import json
from collections import namedtuple

json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))

另请参阅如何将JSON数据转换为Python对象

If your dict is coming from json.loads(), you can turn it into an object instead (rather than a dict) in one line:

import json
from collections import namedtuple

json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))

See also How to convert JSON data into a Python object.


回答 8

如果要将字典键作为对象(或作为困难键的字典)访问,请递归地进行操作,并且还能够更新原始字典,则可以执行以下操作:

class Dictate(object):
    """Object view of a dict, updating the passed in dict when values are set
    or deleted. "Dictate" the contents of a dict...: """

    def __init__(self, d):
        # since __setattr__ is overridden, self.__dict = d doesn't work
        object.__setattr__(self, '_Dictate__dict', d)

    # Dictionary-like access / updates
    def __getitem__(self, name):
        value = self.__dict[name]
        if isinstance(value, dict):  # recursively view sub-dicts as objects
            value = Dictate(value)
        return value

    def __setitem__(self, name, value):
        self.__dict[name] = value
    def __delitem__(self, name):
        del self.__dict[name]

    # Object-like access / updates
    def __getattr__(self, name):
        return self[name]

    def __setattr__(self, name, value):
        self[name] = value
    def __delattr__(self, name):
        del self[name]

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.__dict)
    def __str__(self):
        return str(self.__dict)

用法示例:

d = {'a': 'b', 1: 2}
dd = Dictate(d)
assert dd.a == 'b'  # Access like an object
assert dd[1] == 2  # Access like a dict
# Updates affect d
dd.c = 'd'
assert d['c'] == 'd'
del dd.a
del dd[1]
# Inner dicts are mapped
dd.e = {}
dd.e.f = 'g'
assert dd['e'].f == 'g'
assert d == {'c': 'd', 'e': {'f': 'g'}}

If you want to access dict keys as an object (or as a dict for difficult keys), do it recursively, and also be able to update the original dict, you could do:

class Dictate(object):
    """Object view of a dict, updating the passed in dict when values are set
    or deleted. "Dictate" the contents of a dict...: """

    def __init__(self, d):
        # since __setattr__ is overridden, self.__dict = d doesn't work
        object.__setattr__(self, '_Dictate__dict', d)

    # Dictionary-like access / updates
    def __getitem__(self, name):
        value = self.__dict[name]
        if isinstance(value, dict):  # recursively view sub-dicts as objects
            value = Dictate(value)
        return value

    def __setitem__(self, name, value):
        self.__dict[name] = value
    def __delitem__(self, name):
        del self.__dict[name]

    # Object-like access / updates
    def __getattr__(self, name):
        return self[name]

    def __setattr__(self, name, value):
        self[name] = value
    def __delattr__(self, name):
        del self[name]

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.__dict)
    def __str__(self):
        return str(self.__dict)

Example usage:

d = {'a': 'b', 1: 2}
dd = Dictate(d)
assert dd.a == 'b'  # Access like an object
assert dd[1] == 2  # Access like a dict
# Updates affect d
dd.c = 'd'
assert d['c'] == 'd'
del dd.a
del dd[1]
# Inner dicts are mapped
dd.e = {}
dd.e.f = 'g'
assert dd['e'].f == 'g'
assert d == {'c': 'd', 'e': {'f': 'g'}}

回答 9

>>> def dict2obj(d):
        if isinstance(d, list):
            d = [dict2obj(x) for x in d]
        if not isinstance(d, dict):
            return d
        class C(object):
            pass
        o = C()
        for k in d:
            o.__dict__[k] = dict2obj(d[k])
        return o


>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
>>> x = dict2obj(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
'bar'
>>> def dict2obj(d):
        if isinstance(d, list):
            d = [dict2obj(x) for x in d]
        if not isinstance(d, dict):
            return d
        class C(object):
            pass
        o = C()
        for k in d:
            o.__dict__[k] = dict2obj(d[k])
        return o


>>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
>>> x = dict2obj(d)
>>> x.a
1
>>> x.b.c
2
>>> x.d[1].foo
'bar'

回答 10

我最终都尝试了AttrDictBunch库,发现它们对于我的使用而言太慢了。经过一个朋友和我的研究,我们发现编写这些库的主要方法导致该库通过嵌套对象积极地递归并在整个字典对象中进行复制。考虑到这一点,我们进行了两个关键更改。1)我们使属性延迟加载2)我们创建轻量级代理对象的副本,而不是创建字典对象的副本。这是最终的实现。使用此代码的性能提升令人难以置信。当使用AttrDict或Bunch时,仅这两个库分别消耗了我的请求时间的1/2和1/3(什么!?)。这段代码将时间减少到几乎没有(在0.5ms范围内)。当然这取决于您的需求,

class DictProxy(object):
    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        return wrap(self.obj[key])

    def __getattr__(self, key):
        try:
            return wrap(getattr(self.obj, key))
        except AttributeError:
            try:
                return self[key]
            except KeyError:
                raise AttributeError(key)

    # you probably also want to proxy important list properties along like
    # items(), iteritems() and __len__

class ListProxy(object):
    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        return wrap(self.obj[key])

    # you probably also want to proxy important list properties along like
    # __iter__ and __len__

def wrap(value):
    if isinstance(value, dict):
        return DictProxy(value)
    if isinstance(value, (tuple, list)):
        return ListProxy(value)
    return value

通过https://stackoverflow.com/users/704327/michael-merickel查看此处的原始实现。

还要注意的另一件事是,此实现非常简单,并没有实现您可能需要的所有方法。您需要根据需要在DictProxy或ListProxy对象上编写这些内容。

I ended up trying BOTH the AttrDict and the Bunch libraries and found them to be way too slow for my uses. After a friend and I looked into it, we found that the main method for writing these libraries results in the library aggressively recursing through a nested object and making copies of the dictionary object throughout. With this in mind, we made two key changes. 1) We made attributes lazy-loaded 2) instead of creating copies of a dictionary object, we create copies of a light-weight proxy object. This is the final implementation. The performance increase of using this code is incredible. When using AttrDict or Bunch, these two libraries alone consumed 1/2 and 1/3 respectively of my request time(what!?). This code reduced that time to almost nothing(somewhere in the range of 0.5ms). This of course depends on your needs, but if you are using this functionality quite a bit in your code, definitely go with something simple like this.

class DictProxy(object):
    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        return wrap(self.obj[key])

    def __getattr__(self, key):
        try:
            return wrap(getattr(self.obj, key))
        except AttributeError:
            try:
                return self[key]
            except KeyError:
                raise AttributeError(key)

    # you probably also want to proxy important list properties along like
    # items(), iteritems() and __len__

class ListProxy(object):
    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        return wrap(self.obj[key])

    # you probably also want to proxy important list properties along like
    # __iter__ and __len__

def wrap(value):
    if isinstance(value, dict):
        return DictProxy(value)
    if isinstance(value, (tuple, list)):
        return ListProxy(value)
    return value

See the original implementation here by https://stackoverflow.com/users/704327/michael-merickel.

The other thing to note, is that this implementation is pretty simple and doesn’t implement all of the methods you might need. You’ll need to write those as required on the DictProxy or ListProxy objects.


回答 11

x.__dict__.update(d) 应该做的很好。

x.__dict__.update(d) should do fine.


回答 12

您可以通过自定义对象挂钩来利用标准库的json模块

import json

class obj(object):
    def __init__(self, dict_):
        self.__dict__.update(dict_)

def dict2obj(d):
    return json.loads(json.dumps(d), object_hook=obj)

用法示例:

>>> d = {'a': 1, 'b': {'c': 2}, 'd': ['hi', {'foo': 'bar'}]}
>>> o = dict2obj(d)
>>> o.a
1
>>> o.b.c
2
>>> o.d[0]
u'hi'
>>> o.d[1].foo
u'bar'

而且它不是严格的只读方式,例如namedtuple,您可以更改值,而不是结构:

>>> o.b.c = 3
>>> o.b.c
3

You can leverage the json module of the standard library with a custom object hook:

import json

class obj(object):
    def __init__(self, dict_):
        self.__dict__.update(dict_)

def dict2obj(d):
    return json.loads(json.dumps(d), object_hook=obj)

Example usage:

>>> d = {'a': 1, 'b': {'c': 2}, 'd': ['hi', {'foo': 'bar'}]}
>>> o = dict2obj(d)
>>> o.a
1
>>> o.b.c
2
>>> o.d[0]
u'hi'
>>> o.d[1].foo
u'bar'

And it is not strictly read-only as it is with namedtuple, i.e. you can change values – not structure:

>>> o.b.c = 3
>>> o.b.c
3

回答 13

这应该使您开始:

class dict2obj(object):
    def __init__(self, d):
        self.__dict__['d'] = d

    def __getattr__(self, key):
        value = self.__dict__['d'][key]
        if type(value) == type({}):
            return dict2obj(value)

        return value

d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}

x = dict2obj(d)
print x.a
print x.b.c
print x.d[1].foo

它不适用于列表。您必须将列表包装在UserList中,并重载__getitem__以包装字典。

This should get your started:

class dict2obj(object):
    def __init__(self, d):
        self.__dict__['d'] = d

    def __getattr__(self, key):
        value = self.__dict__['d'][key]
        if type(value) == type({}):
            return dict2obj(value)

        return value

d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}

x = dict2obj(d)
print x.a
print x.b.c
print x.d[1].foo

It doesn’t work for lists, yet. You’ll have to wrap the lists in a UserList and overload __getitem__ to wrap dicts.


回答 14

我知道这里已经有很多答案了,我参加聚会很晚,但是这种方法将递归地将“字典”转换成类似对象的结构…在3.xx中有效

def dictToObject(d):
    for k,v in d.items():
        if isinstance(v, dict):
            d[k] = dictToObject(v)
    return namedtuple('object', d.keys())(*d.values())

# Dictionary created from JSON file
d = {
    'primaryKey': 'id', 
    'metadata': 
        {
            'rows': 0, 
            'lastID': 0
        }, 
    'columns': 
        {
            'col2': {
                'dataType': 'string', 
                'name': 'addressLine1'
            }, 
            'col1': {
                'datatype': 'string', 
                'name': 'postcode'
            }, 
            'col3': {
                'dataType': 'string', 
                'name': 'addressLine2'
            }, 
            'col0': {
                'datatype': 'integer', 
                'name': 'id'
            }, 
            'col4': {
                'dataType': 'string', 
                'name': 'contactNumber'
            }
        }, 
        'secondaryKeys': {}
}

d1 = dictToObject(d)
d1.columns.col1 # == object(datatype='string', name='postcode')
d1.metadata.rows # == 0

I know there’s already a lot of answers here already and I’m late to the party but this method will recursively and ‘in place’ convert a dictionary to an object-like structure… Works in 3.x.x

def dictToObject(d):
    for k,v in d.items():
        if isinstance(v, dict):
            d[k] = dictToObject(v)
    return namedtuple('object', d.keys())(*d.values())

# Dictionary created from JSON file
d = {
    'primaryKey': 'id', 
    'metadata': 
        {
            'rows': 0, 
            'lastID': 0
        }, 
    'columns': 
        {
            'col2': {
                'dataType': 'string', 
                'name': 'addressLine1'
            }, 
            'col1': {
                'datatype': 'string', 
                'name': 'postcode'
            }, 
            'col3': {
                'dataType': 'string', 
                'name': 'addressLine2'
            }, 
            'col0': {
                'datatype': 'integer', 
                'name': 'id'
            }, 
            'col4': {
                'dataType': 'string', 
                'name': 'contactNumber'
            }
        }, 
        'secondaryKeys': {}
}

d1 = dictToObject(d)
d1.columns.col1 # == object(datatype='string', name='postcode')
d1.metadata.rows # == 0

回答 15

from mock import Mock
d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
my_data = Mock(**d)

# We got
# my_data.a == 1
from mock import Mock
d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
my_data = Mock(**d)

# We got
# my_data.a == 1

回答 16

让我解释一下我的解决方案几乎使用前一段时间。但是首先,以下代码说明了我没有这样做的原因:

d = {'from': 1}
x = dict2obj(d)

print x.from

给出此错误:

  File "test.py", line 20
    print x.from == 1
                ^
SyntaxError: invalid syntax

由于“ from”是Python关键字,因此某些字典关键字是您不允许的。


现在,我的解决方案允许直接使用字典项的名称来访问字典项。但它也允许您使用“字典语义”。这是带有示例用法的代码:

class dict2obj(dict):
    def __init__(self, dict_):
        super(dict2obj, self).__init__(dict_)
        for key in self:
            item = self[key]
            if isinstance(item, list):
                for idx, it in enumerate(item):
                    if isinstance(it, dict):
                        item[idx] = dict2obj(it)
            elif isinstance(item, dict):
                self[key] = dict2obj(item)

    def __getattr__(self, key):
        return self[key]

d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}

x = dict2obj(d)

assert x.a == x['a'] == 1
assert x.b.c == x['b']['c'] == 2
assert x.d[1].foo == x['d'][1]['foo'] == "bar"

Let me explain a solution I almost used some time ago. But first, the reason I did not is illustrated by the fact that the following code:

d = {'from': 1}
x = dict2obj(d)

print x.from

gives this error:

  File "test.py", line 20
    print x.from == 1
                ^
SyntaxError: invalid syntax

Because “from” is a Python keyword there are certain dictionary keys you cannot allow.


Now my solution allows access to the dictionary items by using their names directly. But it also allows you to use “dictionary semantics”. Here is the code with example usage:

class dict2obj(dict):
    def __init__(self, dict_):
        super(dict2obj, self).__init__(dict_)
        for key in self:
            item = self[key]
            if isinstance(item, list):
                for idx, it in enumerate(item):
                    if isinstance(it, dict):
                        item[idx] = dict2obj(it)
            elif isinstance(item, dict):
                self[key] = dict2obj(item)

    def __getattr__(self, key):
        return self[key]

d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}

x = dict2obj(d)

assert x.a == x['a'] == 1
assert x.b.c == x['b']['c'] == 2
assert x.d[1].foo == x['d'][1]['foo'] == "bar"

回答 17

过去的问答,但我还有话要说。似乎没有人谈论递归字典。这是我的代码:

#!/usr/bin/env python

class Object( dict ):
    def __init__( self, data = None ):
        super( Object, self ).__init__()
        if data:
            self.__update( data, {} )

    def __update( self, data, did ):
        dataid = id(data)
        did[ dataid ] = self

        for k in data:
            dkid = id(data[k])
            if did.has_key(dkid):
                self[k] = did[dkid]
            elif isinstance( data[k], Object ):
                self[k] = data[k]
            elif isinstance( data[k], dict ):
                obj = Object()
                obj.__update( data[k], did )
                self[k] = obj
                obj = None
            else:
                self[k] = data[k]

    def __getattr__( self, key ):
        return self.get( key, None )

    def __setattr__( self, key, value ):
        if isinstance(value,dict):
            self[key] = Object( value )
        else:
            self[key] = value

    def update( self, *args ):
        for obj in args:
            for k in obj:
                if isinstance(obj[k],dict):
                    self[k] = Object( obj[k] )
                else:
                    self[k] = obj[k]
        return self

    def merge( self, *args ):
        for obj in args:
            for k in obj:
                if self.has_key(k):
                    if isinstance(self[k],list) and isinstance(obj[k],list):
                        self[k] += obj[k]
                    elif isinstance(self[k],list):
                        self[k].append( obj[k] )
                    elif isinstance(obj[k],list):
                        self[k] = [self[k]] + obj[k]
                    elif isinstance(self[k],Object) and isinstance(obj[k],Object):
                        self[k].merge( obj[k] )
                    elif isinstance(self[k],Object) and isinstance(obj[k],dict):
                        self[k].merge( obj[k] )
                    else:
                        self[k] = [ self[k], obj[k] ]
                else:
                    if isinstance(obj[k],dict):
                        self[k] = Object( obj[k] )
                    else:
                        self[k] = obj[k]
        return self

def test01():
    class UObject( Object ):
        pass
    obj = Object({1:2})
    d = {}
    d.update({
        "a": 1,
        "b": {
            "c": 2,
            "d": [ 3, 4, 5 ],
            "e": [ [6,7], (8,9) ],
            "self": d,
        },
        1: 10,
        "1": 11,
        "obj": obj,
    })
    x = UObject(d)


    assert x.a == x["a"] == 1
    assert x.b.c == x["b"]["c"] == 2
    assert x.b.d[0] == 3
    assert x.b.d[1] == 4
    assert x.b.e[0][0] == 6
    assert x.b.e[1][0] == 8
    assert x[1] == 10
    assert x["1"] == 11
    assert x[1] != x["1"]
    assert id(x) == id(x.b.self.b.self) == id(x.b.self)
    assert x.b.self.a == x.b.self.b.self.a == 1

    x.x = 12
    assert x.x == x["x"] == 12
    x.y = {"a":13,"b":[14,15]}
    assert x.y.a == 13
    assert x.y.b[0] == 14

def test02():
    x = Object({
        "a": {
            "b": 1,
            "c": [ 2, 3 ]
        },
        1: 6,
        2: [ 8, 9 ],
        3: 11,
    })
    y = Object({
        "a": {
            "b": 4,
            "c": [ 5 ]
        },
        1: 7,
        2: 10,
        3: [ 12 , 13 ],
    })
    z = {
        3: 14,
        2: 15,
        "a": {
            "b": 16,
            "c": 17,
        }
    }
    x.merge( y, z )
    assert 2 in x.a.c
    assert 3 in x.a.c
    assert 5 in x.a.c
    assert 1 in x.a.b
    assert 4 in x.a.b
    assert 8 in x[2]
    assert 9 in x[2]
    assert 10 in x[2]
    assert 11 in x[3]
    assert 12 in x[3]
    assert 13 in x[3]
    assert 14 in x[3]
    assert 15 in x[2]
    assert 16 in x.a.b
    assert 17 in x.a.c

if __name__ == '__main__':
    test01()
    test02()

Old Q&A, but I get something more to talk. Seems no one talk about recursive dict. This is my code:

#!/usr/bin/env python

class Object( dict ):
    def __init__( self, data = None ):
        super( Object, self ).__init__()
        if data:
            self.__update( data, {} )

    def __update( self, data, did ):
        dataid = id(data)
        did[ dataid ] = self

        for k in data:
            dkid = id(data[k])
            if did.has_key(dkid):
                self[k] = did[dkid]
            elif isinstance( data[k], Object ):
                self[k] = data[k]
            elif isinstance( data[k], dict ):
                obj = Object()
                obj.__update( data[k], did )
                self[k] = obj
                obj = None
            else:
                self[k] = data[k]

    def __getattr__( self, key ):
        return self.get( key, None )

    def __setattr__( self, key, value ):
        if isinstance(value,dict):
            self[key] = Object( value )
        else:
            self[key] = value

    def update( self, *args ):
        for obj in args:
            for k in obj:
                if isinstance(obj[k],dict):
                    self[k] = Object( obj[k] )
                else:
                    self[k] = obj[k]
        return self

    def merge( self, *args ):
        for obj in args:
            for k in obj:
                if self.has_key(k):
                    if isinstance(self[k],list) and isinstance(obj[k],list):
                        self[k] += obj[k]
                    elif isinstance(self[k],list):
                        self[k].append( obj[k] )
                    elif isinstance(obj[k],list):
                        self[k] = [self[k]] + obj[k]
                    elif isinstance(self[k],Object) and isinstance(obj[k],Object):
                        self[k].merge( obj[k] )
                    elif isinstance(self[k],Object) and isinstance(obj[k],dict):
                        self[k].merge( obj[k] )
                    else:
                        self[k] = [ self[k], obj[k] ]
                else:
                    if isinstance(obj[k],dict):
                        self[k] = Object( obj[k] )
                    else:
                        self[k] = obj[k]
        return self

def test01():
    class UObject( Object ):
        pass
    obj = Object({1:2})
    d = {}
    d.update({
        "a": 1,
        "b": {
            "c": 2,
            "d": [ 3, 4, 5 ],
            "e": [ [6,7], (8,9) ],
            "self": d,
        },
        1: 10,
        "1": 11,
        "obj": obj,
    })
    x = UObject(d)


    assert x.a == x["a"] == 1
    assert x.b.c == x["b"]["c"] == 2
    assert x.b.d[0] == 3
    assert x.b.d[1] == 4
    assert x.b.e[0][0] == 6
    assert x.b.e[1][0] == 8
    assert x[1] == 10
    assert x["1"] == 11
    assert x[1] != x["1"]
    assert id(x) == id(x.b.self.b.self) == id(x.b.self)
    assert x.b.self.a == x.b.self.b.self.a == 1

    x.x = 12
    assert x.x == x["x"] == 12
    x.y = {"a":13,"b":[14,15]}
    assert x.y.a == 13
    assert x.y.b[0] == 14

def test02():
    x = Object({
        "a": {
            "b": 1,
            "c": [ 2, 3 ]
        },
        1: 6,
        2: [ 8, 9 ],
        3: 11,
    })
    y = Object({
        "a": {
            "b": 4,
            "c": [ 5 ]
        },
        1: 7,
        2: 10,
        3: [ 12 , 13 ],
    })
    z = {
        3: 14,
        2: 15,
        "a": {
            "b": 16,
            "c": 17,
        }
    }
    x.merge( y, z )
    assert 2 in x.a.c
    assert 3 in x.a.c
    assert 5 in x.a.c
    assert 1 in x.a.b
    assert 4 in x.a.b
    assert 8 in x[2]
    assert 9 in x[2]
    assert 10 in x[2]
    assert 11 in x[3]
    assert 12 in x[3]
    assert 13 in x[3]
    assert 14 in x[3]
    assert 15 in x[2]
    assert 16 in x.a.b
    assert 17 in x.a.c

if __name__ == '__main__':
    test01()
    test02()

回答 18

想要上传我的这个小范例版本。

class Struct(dict):
  def __init__(self,data):
    for key, value in data.items():
      if isinstance(value, dict):
        setattr(self, key, Struct(value))
      else:   
        setattr(self, key, type(value).__init__(value))

      dict.__init__(self,data)

它保留导入到类中的类型的属性。我唯一关心的是从解析的字典中覆盖方法。但是否则看起来很稳固!

Wanted to upload my version of this little paradigm.

class Struct(dict):
  def __init__(self,data):
    for key, value in data.items():
      if isinstance(value, dict):
        setattr(self, key, Struct(value))
      else:   
        setattr(self, key, type(value).__init__(value))

      dict.__init__(self,data)

It preserves the attributes for the type that’s imported into the class. My only concern would be overwriting methods from within the dictionary your parsing. But otherwise seems solid!


回答 19

这也很好

class DObj(object):
    pass

dobj = Dobj()
dobj.__dict__ = {'a': 'aaa', 'b': 'bbb'}

print dobj.a
>>> aaa
print dobj.b
>>> bbb

This also works well too

class DObj(object):
    pass

dobj = Dobj()
dobj.__dict__ = {'a': 'aaa', 'b': 'bbb'}

print dobj.a
>>> aaa
print dobj.b
>>> bbb

回答 20

这是实现SilentGhost原始建议的另一种方法:

def dict2obj(d):
  if isinstance(d, dict):
    n = {}
    for item in d:
      if isinstance(d[item], dict):
        n[item] = dict2obj(d[item])
      elif isinstance(d[item], (list, tuple)):
        n[item] = [dict2obj(elem) for elem in d[item]]
      else:
        n[item] = d[item]
    return type('obj_from_dict', (object,), n)
  else:
    return d

Here is another way to implement SilentGhost’s original suggestion:

def dict2obj(d):
  if isinstance(d, dict):
    n = {}
    for item in d:
      if isinstance(d[item], dict):
        n[item] = dict2obj(d[item])
      elif isinstance(d[item], (list, tuple)):
        n[item] = [dict2obj(elem) for elem in d[item]]
      else:
        n[item] = d[item]
    return type('obj_from_dict', (object,), n)
  else:
    return d

回答 21

我偶然发现需要递归将字典列表转换为对象列表的情况,因此根据罗伯托的代码段,这里为我做了什么工作:

def dict2obj(d):
    if isinstance(d, dict):
        n = {}
        for item in d:
            if isinstance(d[item], dict):
                n[item] = dict2obj(d[item])
            elif isinstance(d[item], (list, tuple)):
                n[item] = [dict2obj(elem) for elem in d[item]]
            else:
                n[item] = d[item]
        return type('obj_from_dict', (object,), n)
    elif isinstance(d, (list, tuple,)):
        l = []
        for item in d:
            l.append(dict2obj(item))
        return l
    else:
        return d

注意,出于明显的原因,任何元组都将转换为其等效列表。

希望这对某人有帮助,就像您为我所做的所有回答一样。

I stumbled upon the case I needed to recursively convert a list of dicts to list of objects, so based on Roberto’s snippet here what did the work for me:

def dict2obj(d):
    if isinstance(d, dict):
        n = {}
        for item in d:
            if isinstance(d[item], dict):
                n[item] = dict2obj(d[item])
            elif isinstance(d[item], (list, tuple)):
                n[item] = [dict2obj(elem) for elem in d[item]]
            else:
                n[item] = d[item]
        return type('obj_from_dict', (object,), n)
    elif isinstance(d, (list, tuple,)):
        l = []
        for item in d:
            l.append(dict2obj(item))
        return l
    else:
        return d

Note that any tuple will be converted to its list equivalent, for obvious reasons.

Hope this helps someone as much as all your answers did for me, guys.


回答 22

仅将您dict的分配给__dict__空对象该怎么办?

class Object:
    """If your dict is "flat", this is a simple way to create an object from a dict

    >>> obj = Object()
    >>> obj.__dict__ = d
    >>> d.a
    1
    """
    pass

当然,这在您嵌套的dict示例中将失败,除非您递归遍历该dict:

# For a nested dict, you need to recursively update __dict__
def dict2obj(d):
    """Convert a dict to an object

    >>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
    >>> obj = dict2obj(d)
    >>> obj.b.c
    2
    >>> obj.d
    ["hi", {'foo': "bar"}]
    """
    try:
        d = dict(d)
    except (TypeError, ValueError):
        return d
    obj = Object()
    for k, v in d.iteritems():
        obj.__dict__[k] = dict2obj(v)
    return obj

您的示例list元素可能应该是Mapping,是(键,值)对的列表,如下所示:

>>> d = {'a': 1, 'b': {'c': 2}, 'd': [("hi", {'foo': "bar"})]}
>>> obj = dict2obj(d)
>>> obj.d.hi.foo
"bar"

What about just assigning your dict to the __dict__ of an empty object?

class Object:
    """If your dict is "flat", this is a simple way to create an object from a dict

    >>> obj = Object()
    >>> obj.__dict__ = d
    >>> d.a
    1
    """
    pass

Of course this fails on your nested dict example unless you walk the dict recursively:

# For a nested dict, you need to recursively update __dict__
def dict2obj(d):
    """Convert a dict to an object

    >>> d = {'a': 1, 'b': {'c': 2}, 'd': ["hi", {'foo': "bar"}]}
    >>> obj = dict2obj(d)
    >>> obj.b.c
    2
    >>> obj.d
    ["hi", {'foo': "bar"}]
    """
    try:
        d = dict(d)
    except (TypeError, ValueError):
        return d
    obj = Object()
    for k, v in d.iteritems():
        obj.__dict__[k] = dict2obj(v)
    return obj

And your example list element was probably meant to be a Mapping, a list of (key, value) pairs like this:

>>> d = {'a': 1, 'b': {'c': 2}, 'd': [("hi", {'foo': "bar"})]}
>>> obj = dict2obj(d)
>>> obj.d.hi.foo
"bar"

回答 23

这是另一个实现:

class DictObj(object):
    def __init__(self, d):
        self.__dict__ = d

def dict_to_obj(d):
    if isinstance(d, (list, tuple)): return map(dict_to_obj, d)
    elif not isinstance(d, dict): return d
    return DictObj(dict((k, dict_to_obj(v)) for (k,v) in d.iteritems()))

[编辑]关于还处理列表中的命令,而不仅仅是其他命令的遗漏之处。添加了修复程序。

Here’s another implementation:

class DictObj(object):
    def __init__(self, d):
        self.__dict__ = d

def dict_to_obj(d):
    if isinstance(d, (list, tuple)): return map(dict_to_obj, d)
    elif not isinstance(d, dict): return d
    return DictObj(dict((k, dict_to_obj(v)) for (k,v) in d.iteritems()))

[Edit] Missed bit about also handling dicts within lists, not just other dicts. Added fix.


回答 24

class Struct(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value

    def copy(self):
        return Struct(dict.copy(self))

用法:

points = Struct(x=1, y=2)
# Changing
points['x'] = 2
points.y = 1
# Accessing
points['x'], points.x, points.get('x') # 2 2 2
points['y'], points.y, points.get('y') # 1 1 1
# Accessing inexistent keys/attrs 
points['z'] # KeyError: z
points.z # AttributeError: z
# Copying
points_copy = points.copy()
points.x = 2
points_copy.x # 1
class Struct(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value

    def copy(self):
        return Struct(dict.copy(self))

Usage:

points = Struct(x=1, y=2)
# Changing
points['x'] = 2
points.y = 1
# Accessing
points['x'], points.x, points.get('x') # 2 2 2
points['y'], points.y, points.get('y') # 1 1 1
# Accessing inexistent keys/attrs 
points['z'] # KeyError: z
points.z # AttributeError: z
# Copying
points_copy = points.copy()
points.x = 2
points_copy.x # 1

回答 25

这个怎么样:

from functools import partial
d2o=partial(type, "d2o", ())

然后可以这样使用:

>>> o=d2o({"a" : 5, "b" : 3})
>>> print o.a
5
>>> print o.b
3

How about this:

from functools import partial
d2o=partial(type, "d2o", ())

This can then be used like this:

>>> o=d2o({"a" : 5, "b" : 3})
>>> print o.a
5
>>> print o.b
3

回答 26

我认为一个字典由数字,字符串和字典组成,大多数时候就足够了。因此,我忽略了元组,列表和其他类型未出现在字典最终维度中的情况。

考虑到继承,再结合递归,可以方便地解决打印问题,并提供两种查询数据的方式,一种编辑数据的方式。

请参阅下面的示例,该字典描述了有关学生的一些信息:

group=["class1","class2","class3","class4",]
rank=["rank1","rank2","rank3","rank4","rank5",]
data=["name","sex","height","weight","score"]

#build a dict based on the lists above
student_dic=dict([(g,dict([(r,dict([(d,'') for d in data])) for r in rank ]))for g in group])

#this is the solution
class dic2class(dict):
    def __init__(self, dic):
        for key,val in dic.items():
            self.__dict__[key]=self[key]=dic2class(val) if isinstance(val,dict) else val


student_class=dic2class(student_dic)

#one way to edit:
student_class.class1.rank1['sex']='male'
student_class.class1.rank1['name']='Nan Xiang'

#two ways to query:
print student_class.class1.rank1
print student_class.class1['rank1']
print '-'*50
for rank in student_class.class1:
    print getattr(student_class.class1,rank)

结果:

{'score': '', 'sex': 'male', 'name': 'Nan Xiang', 'weight': '', 'height': ''}
{'score': '', 'sex': 'male', 'name': 'Nan Xiang', 'weight': '', 'height': ''}
--------------------------------------------------
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}
{'score': '', 'sex': 'male', 'name': 'Nan Xiang', 'weight': '', 'height': ''}
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}

I think a dict consists of number, string and dict is enough most time. So I ignore the situation that tuples, lists and other types not appearing in the final dimension of a dict.

Considering inheritance, combined with recursion, it solves the print problem conveniently and also provides two ways to query a data,one way to edit a data.

See the example below, a dict that describes some information about students:

group=["class1","class2","class3","class4",]
rank=["rank1","rank2","rank3","rank4","rank5",]
data=["name","sex","height","weight","score"]

#build a dict based on the lists above
student_dic=dict([(g,dict([(r,dict([(d,'') for d in data])) for r in rank ]))for g in group])

#this is the solution
class dic2class(dict):
    def __init__(self, dic):
        for key,val in dic.items():
            self.__dict__[key]=self[key]=dic2class(val) if isinstance(val,dict) else val


student_class=dic2class(student_dic)

#one way to edit:
student_class.class1.rank1['sex']='male'
student_class.class1.rank1['name']='Nan Xiang'

#two ways to query:
print student_class.class1.rank1
print student_class.class1['rank1']
print '-'*50
for rank in student_class.class1:
    print getattr(student_class.class1,rank)

Results:

{'score': '', 'sex': 'male', 'name': 'Nan Xiang', 'weight': '', 'height': ''}
{'score': '', 'sex': 'male', 'name': 'Nan Xiang', 'weight': '', 'height': ''}
--------------------------------------------------
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}
{'score': '', 'sex': 'male', 'name': 'Nan Xiang', 'weight': '', 'height': ''}
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}
{'score': '', 'sex': '', 'name': '', 'weight': '', 'height': ''}

回答 27

通常,您想将dict层次结构镜像到您的对象中,而不是通常位于最低级别的列表或元组。所以这就是我这样做的方式:

class defDictToObject(object):

    def __init__(self, myDict):
        for key, value in myDict.items():
            if type(value) == dict:
                setattr(self, key, defDictToObject(value))
            else:
                setattr(self, key, value)

因此,我们这样做:

myDict = { 'a': 1,
           'b': { 
              'b1': {'x': 1,
                    'y': 2} },
           'c': ['hi', 'bar'] 
         }

并获得:

x.b.b1.x 1个

x.c [‘hi’,’bar’]

Typically you want to mirror dict hierarchy into your object but not list or tuples which are typically at lowest level. So this is how I did this:

class defDictToObject(object):

    def __init__(self, myDict):
        for key, value in myDict.items():
            if type(value) == dict:
                setattr(self, key, defDictToObject(value))
            else:
                setattr(self, key, value)

So we do:

myDict = { 'a': 1,
           'b': { 
              'b1': {'x': 1,
                    'y': 2} },
           'c': ['hi', 'bar'] 
         }

and get:

x.b.b1.x 1

x.c [‘hi’, ‘bar’]


回答 28

我的字典是这样的格式:

addr_bk = {
    'person': [
        {'name': 'Andrew', 'id': 123, 'email': 'andrew@mailserver.com',
         'phone': [{'type': 2, 'number': '633311122'},
                   {'type': 0, 'number': '97788665'}]
        },
        {'name': 'Tom', 'id': 456,
         'phone': [{'type': 0, 'number': '91122334'}]}, 
        {'name': 'Jack', 'id': 7788, 'email': 'jack@gmail.com'}
    ]
}

可以看出,我有嵌套的字典字典列表。这是因为addr_bk是从使用lwpb.codec转换为python dict的协议缓冲区数据中解码的。有可选字段(例如,电子邮件=>,其中的密钥可能不可用)和重复字段(例如,电话=>转换为词典列表)。

我尝试了以上所有建议的解决方案。有些不能很好地处理嵌套字典。其他人无法轻松打印对象详细信息。

只有Dawie Strauss的解决方案dict2obj(dict)最有效。

当找不到密钥时,我对它进行了一些处理:

# Work the best, with nested dictionaries & lists! :)
# Able to print out all items.
class dict2obj_new(dict):
    def __init__(self, dict_):
        super(dict2obj_new, self).__init__(dict_)
        for key in self:
            item = self[key]
            if isinstance(item, list):
                for idx, it in enumerate(item):
                    if isinstance(it, dict):
                        item[idx] = dict2obj_new(it)
            elif isinstance(item, dict):
                self[key] = dict2obj_new(item)

    def __getattr__(self, key):
        # Enhanced to handle key not found.
        if self.has_key(key):
            return self[key]
        else:
            return None

然后,我用以下方法进行了测试:

# Testing...
ab = dict2obj_new(addr_bk)

for person in ab.person:
  print "Person ID:", person.id
  print "  Name:", person.name
  # Check if optional field is available before printing.
  if person.email:
    print "  E-mail address:", person.email

  # Check if optional field is available before printing.
  if person.phone:
    for phone_number in person.phone:
      if phone_number.type == codec.enums.PhoneType.MOBILE:
        print "  Mobile phone #:",
      elif phone_number.type == codec.enums.PhoneType.HOME:
        print "  Home phone #:",
      else:
        print "  Work phone #:",
      print phone_number.number

My dictionary is of this format:

addr_bk = {
    'person': [
        {'name': 'Andrew', 'id': 123, 'email': 'andrew@mailserver.com',
         'phone': [{'type': 2, 'number': '633311122'},
                   {'type': 0, 'number': '97788665'}]
        },
        {'name': 'Tom', 'id': 456,
         'phone': [{'type': 0, 'number': '91122334'}]}, 
        {'name': 'Jack', 'id': 7788, 'email': 'jack@gmail.com'}
    ]
}

As can be seen, I have nested dictionaries and list of dicts. This is because the addr_bk was decoded from protocol buffer data that converted to a python dict using lwpb.codec. There are optional field (e.g. email => where key may be unavailable) and repeated field (e.g. phone => converted to list of dict).

I tried all the above proposed solutions. Some doesn’t handle the nested dictionaries well. Others cannot print the object details easily.

Only the solution, dict2obj(dict) by Dawie Strauss, works best.

I have enhanced it a little to handle when the key cannot be found:

# Work the best, with nested dictionaries & lists! :)
# Able to print out all items.
class dict2obj_new(dict):
    def __init__(self, dict_):
        super(dict2obj_new, self).__init__(dict_)
        for key in self:
            item = self[key]
            if isinstance(item, list):
                for idx, it in enumerate(item):
                    if isinstance(it, dict):
                        item[idx] = dict2obj_new(it)
            elif isinstance(item, dict):
                self[key] = dict2obj_new(item)

    def __getattr__(self, key):
        # Enhanced to handle key not found.
        if self.has_key(key):
            return self[key]
        else:
            return None

Then, I tested it with:

# Testing...
ab = dict2obj_new(addr_bk)

for person in ab.person:
  print "Person ID:", person.id
  print "  Name:", person.name
  # Check if optional field is available before printing.
  if person.email:
    print "  E-mail address:", person.email

  # Check if optional field is available before printing.
  if person.phone:
    for phone_number in person.phone:
      if phone_number.type == codec.enums.PhoneType.MOBILE:
        print "  Mobile phone #:",
      elif phone_number.type == codec.enums.PhoneType.HOME:
        print "  Home phone #:",
      else:
        print "  Work phone #:",
      print phone_number.number

回答 29

建立我对“ python:如何动态地向类添加属性? ”的答案:

class data(object):
    def __init__(self,*args,**argd):
        self.__dict__.update(dict(*args,**argd))

def makedata(d):
    d2 = {}
    for n in d:
        d2[n] = trydata(d[n])
    return data(d2)

def trydata(o):
    if isinstance(o,dict):
        return makedata(o)
    elif isinstance(o,list):
        return [trydata(i) for i in o]
    else:
        return o

您调用makedata要转换的字典,或者trydata取决于您期望输入的内容,它会吐出一个数据对象。

笔记:

  • trydata如果需要更多功能,可以添加Elif 。
  • 显然,如果您想要x.a = {}或类似的方法将无法使用。
  • 如果需要只读版本,请使用原始答案中的类数据。

Building off my answer to “python: How to add property to a class dynamically?“:

class data(object):
    def __init__(self,*args,**argd):
        self.__dict__.update(dict(*args,**argd))

def makedata(d):
    d2 = {}
    for n in d:
        d2[n] = trydata(d[n])
    return data(d2)

def trydata(o):
    if isinstance(o,dict):
        return makedata(o)
    elif isinstance(o,list):
        return [trydata(i) for i in o]
    else:
        return o

You call makedata on the dictionary you want converted, or maybe trydata depending on what you expect as input, and it spits out a data object.

Notes:

  • You can add elifs to trydata if you need more functionality.
  • Obviously this won’t work if you want x.a = {} or similar.
  • If you want a readonly version, use the class data from the original answer.

如何读取大文件-逐行读取?

问题:如何读取大文件-逐行读取?

我想遍历整个文件的每一行。一种方法是读取整个文件,将其保存到列表中,然后遍历感兴趣的行。此方法占用大量内存,因此我正在寻找替代方法。

到目前为止,我的代码:

for each_line in fileinput.input(input_file):
    do_something(each_line)

    for each_line_again in fileinput.input(input_file):
        do_something(each_line_again)

执行此代码将显示错误消息:device active

有什么建议么?

目的是计算成对的字符串相似度,这意味着对于文件中的每一行,我想计算每隔一行的Levenshtein距离。

I want to iterate over each line of an entire file. One way to do this is by reading the entire file, saving it to a list, then going over the line of interest. This method uses a lot of memory, so I am looking for an alternative.

My code so far:

for each_line in fileinput.input(input_file):
    do_something(each_line)

    for each_line_again in fileinput.input(input_file):
        do_something(each_line_again)

Executing this code gives an error message: device active.

Any suggestions?

The purpose is to calculate pair-wise string similarity, meaning for each line in file, I want to calculate the Levenshtein distance with every other line.


回答 0

正确的,完全Python式的读取文件的方法如下:

with open(...) as f:
    for line in f:
        # Do something with 'line'

with语句处理文件的打开和关闭,包括内部块是否引发异常。该for line in f会将文件对象f视为可迭代,它会自动使用缓冲I / O和内存管理,这样你就不必对大文件的担心。

应该有一种-最好只有一种-显而易见的方法。

The correct, fully Pythonic way to read a file is the following:

with open(...) as f:
    for line in f:
        # Do something with 'line'

The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered I/O and memory management so you don’t have to worry about large files.

There should be one — and preferably only one — obvious way to do it.


回答 1

两种有效的内存排序方式(第一个最好)-

  1. 用于 with -python 2.5及更高版本支持
  2. 使用的yield,如果你真的想有过多少读控制

1.使用 with

with是读取大型文件的一种不错且高效的pythonic方法。优点-1)文件对象从with执行块退出后自动关闭。2)with块内的异常处理。3)内存for循环f逐行遍历文件对象。在内部,它确实可以缓冲IO(以优化昂贵的IO操作)和内存管理。

with open("x.txt") as f:
    for line in f:
        do something with data

2.使用 yield

有时,人们可能希望对每个迭代中要读取的内容进行更细粒度的控制。在这种情况下,请使用iteryield。请注意,使用此方法时,明确需要在最后关闭文件。

def readInChunks(fileObj, chunkSize=2048):
    """
    Lazy function to read a file piece by piece.
    Default chunk size: 2kB.
    """
    while True:
        data = fileObj.read(chunkSize)
        if not data:
            break
        yield data

f = open('bigFile')
for chuck in readInChunks(f):
    do_something(chunk)
f.close()

陷阱并为完整性起见 -以下方法对于读取大文件而言不尽人意,但请阅读以获取全面的了解。

在Python中,最常见的从文件中读取行的方法是执行以下操作:

for line in open('myfile','r').readlines():
    do_something(line)

但是,完成此操作后,readlines()功能(功能相同read())将整个文件加载到内存中,然后对其进行迭代。对于大文件,一种更好的方法(首先提到的两种方法是最好的)是使用fileinput模块,如下所示:

import fileinput

for line in fileinput.input(['myfile']):
    do_something(line)

fileinput.input()调用顺序读取行,但在读取后甚至不会简单地将它们保留在内存中,因为file在python中是可迭代的。

参考文献

  1. Python with语句

Two memory efficient ways in ranked order (first is best) –

  1. use of with – supported from python 2.5 and above
  2. use of yield if you really want to have control over how much to read

1. use of with

with is the nice and efficient pythonic way to read large files. advantages – 1) file object is automatically closed after exiting from with execution block. 2) exception handling inside the with block. 3) memory for loop iterates through the f file object line by line. internally it does buffered IO (to optimized on costly IO operations) and memory management.

with open("x.txt") as f:
    for line in f:
        do something with data

2. use of yield

Sometimes one might want more fine-grained control over how much to read in each iteration. In that case use iter & yield. Note with this method one explicitly needs close the file at the end.

def readInChunks(fileObj, chunkSize=2048):
    """
    Lazy function to read a file piece by piece.
    Default chunk size: 2kB.
    """
    while True:
        data = fileObj.read(chunkSize)
        if not data:
            break
        yield data

f = open('bigFile')
for chuck in readInChunks(f):
    do_something(chunk)
f.close()

Pitfalls and for the sake of completeness – below methods are not as good or not as elegant for reading large files but please read to get rounded understanding.

In Python, the most common way to read lines from a file is to do the following:

for line in open('myfile','r').readlines():
    do_something(line)

When this is done, however, the readlines() function (same applies for read() function) loads the entire file into memory, then iterates over it. A slightly better approach (the first mentioned two methods are the best) for large files is to use the fileinput module, as follows:

import fileinput

for line in fileinput.input(['myfile']):
    do_something(line)

the fileinput.input() call reads lines sequentially, but doesn’t keep them in memory after they’ve been read or even simply so this, since file in python is iterable.

References

  1. Python with statement

回答 2

要删除换行符:

with open(file_path, 'rU') as f:
    for line_terminated in f:
        line = line_terminated.rstrip('\n')
        ...

随着通用换行符支持所有文本文件行会显得与终止'\n',无论在文件中的终止,'\r''\n',或者'\r\n'

编辑-要指定通用换行符支持:

  • Unix上的Python 2– open(file_path, mode='rU')必需[感谢@Dave ]
  • Windows上的Python 2- open(file_path, mode='rU')可选
  • Python 3– open(file_path, newline=None)可选

newline参数仅在Python 3中受支持,默认为None。在所有情况下,该mode参数默认为'r'。该U是在Python 3.在Python 2 Windows上不再支持一些其他的机制似乎转换\r\n\n

文件:

要保留本地行终止符:

with open(file_path, 'rb') as f:
    with line_native_terminated in f:
        ...

二进制模式仍然可以将文件解析为 in。每行将具有文件中包含的任何终止符。

感谢@katrielalex回答,Python的open()文档和iPython实验。

To strip newlines:

with open(file_path, 'rU') as f:
    for line_terminated in f:
        line = line_terminated.rstrip('\n')
        ...

With universal newline support all text file lines will seem to be terminated with '\n', whatever the terminators in the file, '\r', '\n', or '\r\n'.

EDIT – To specify universal newline support:

  • Python 2 on Unix – open(file_path, mode='rU') – required [thanks @Dave]
  • Python 2 on Windows – open(file_path, mode='rU') – optional
  • Python 3 – open(file_path, newline=None) – optional

The newline parameter is only supported in Python 3 and defaults to None. The mode parameter defaults to 'r' in all cases. The U is deprecated in Python 3. In Python 2 on Windows some other mechanism appears to translate \r\n to \n.

Docs:

To preserve native line terminators:

with open(file_path, 'rb') as f:
    with line_native_terminated in f:
        ...

Binary mode can still parse the file into lines with in. Each line will have whatever terminators it has in the file.

Thanks to @katrielalex‘s answer, Python’s open() doc, and iPython experiments.


回答 3

这是在python中读取文件的一种可能方式:

f = open(input_file)
for line in f:
    do_stuff(line)
f.close()

它不会分配完整列表。遍历所有行。

this is a possible way of reading a file in python:

f = open(input_file)
for line in f:
    do_stuff(line)
f.close()

it does not allocate a full list. It iterates over the lines.


回答 4

关于我来自哪里的一些背景信息。代码片段在最后。

如果可以,我更喜欢使用H2O之类的开源工具来进行超高性能的并行CSV文件读取,但是此工具在功能集中受到限制。我最终写了很多代码来创建数据科学管道,然后将其馈送到H2O集群以进行有监督的学习。

我通过从多处理库的池对象和映射函数中添加了很多并行性,从UCI仓库读取8GB HIGGS数据集的文件,甚至从数据中读取40GB CSV文件的速度也大大提高了。例如,使用最近邻居搜索进行聚类以及DBSCAN和Markov聚类算法都需要一些并行编程技巧,以绕过一些严重挑战性的内存和挂钟时间问题。

我通常喜欢先使用gnu工具将文件按行分成几部分,然后对所有文件进行glob-filemask,以在python程序中并行查找和读取它们。我通常使用1000多个部分文件。做这些技巧可以极大地提高处理速度和内存限制。

pandas dataframe.read_csv是单线程的,因此您可以执行以下技巧,通过运行map()并行执行来使pandas更快。您可以使用htop查看带有普通旧式顺序熊猫dataframe.read_csv的情况,仅一个内核上的100%cpu就是pd.read_csv中的实际瓶颈,而不是磁盘。

我应该补充一点,我在快速视频卡总线上使用SSD,而不是在SATA6总线上使用旋转高清硬盘,外加16个CPU内核。

另外,我发现在另一种应用程序中非常有用的另一种技术是并行CSV文件读取一个巨型文件中的所有文件,从而以不同的偏移量开始每个工作程序到文件中,而不是将一个大文件预分割为许多零件文件。在每个并行工作程序中使用python的文件seek()和tell()来读取条带中的大文本文件,这些文件位于大文件中不同的字节偏移起始字节和结束字节位置,并且同时进行。您可以对字节执行正则表达式findall,并返回换行计数。这是部分款项。最后,在工作人员完成后map函数返回时,将部分和求和以得到全局和。

以下是使用并行字节偏移技巧的一些示例基准测试:

我使用2个文件:HIGGS.csv是8 GB。它来自UCI机器学习存储库。all_bin .csv是40.4 GB,来自我当前的项目。我使用2个程序:Linux附带的GNU wc程序,以及我开发的纯python fastread.py程序。

HP-Z820:/mnt/fastssd/fast_file_reader$ ls -l /mnt/fastssd/nzv/HIGGS.csv
-rw-rw-r-- 1 8035497980 Jan 24 16:00 /mnt/fastssd/nzv/HIGGS.csv

HP-Z820:/mnt/fastssd$ ls -l all_bin.csv
-rw-rw-r-- 1 40412077758 Feb  2 09:00 all_bin.csv

ga@ga-HP-Z820:/mnt/fastssd$ time python fastread.py --fileName="all_bin.csv" --numProcesses=32 --balanceFactor=2
2367496

real    0m8.920s
user    1m30.056s
sys 2m38.744s

In [1]: 40412077758. / 8.92
Out[1]: 4530501990.807175

那是大约4.5 GB / s或45 Gb / s的文件拖曳速度。我的朋友,那不是没有旋转的硬盘。那实际上是三星Pro 950 SSD。

以下是由纯C编译程序gnu wc进行行计数的同一文件的速度基准。

很酷的是,在这种情况下,您可以看到我的纯python程序与gnu wc编译的C程序的速度基本匹配。Python是可解释的,但C是已编译的,因此这是一个非常有趣的壮举,我想您会同意的。当然,wc确实需要更改为并行程序,然后才能真正击败我的python程序。但是就目前而言,gnu wc只是一个顺序程序。您可以尽力而为,而python今天可以并行完成。Cython编译可能会帮助我(另一些时间)。此外,还没有探索内存映射文件。

HP-Z820:/mnt/fastssd$ time wc -l all_bin.csv
2367496 all_bin.csv

real    0m8.807s
user    0m1.168s
sys 0m7.636s


HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=16 --balanceFactor=2
11000000

real    0m2.257s
user    0m12.088s
sys 0m20.512s

HP-Z820:/mnt/fastssd/fast_file_reader$ time wc -l HIGGS.csv
11000000 HIGGS.csv

real    0m1.820s
user    0m0.364s
sys 0m1.456s

结论:与C程序相比,纯python程序的速度不错。但是,至少在行计数方面,仅在C程序上使用纯python程序是不够的。通常,该技术可用于其他文件处理,因此此python代码仍然不错。

问题:仅一次编译正则表达式并将其传递给所有工作人员是否会提高速度?答:Regex预编译在此应用程序中无济于事。我想原因是所有工人的过程序列化和创建的开销占主导。

还有一件事。并行读取CSV文件是否有帮助?磁盘是瓶颈,还是CPU?他们说,许多关于stackoverflow的最受好评的答案都包含着通用的开发智慧,即您只需要一个线程即可读取文件,并且可以做到最好。他们确定吗?

让我们找出:

HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=16 --balanceFactor=2
11000000

real    0m2.256s
user    0m10.696s
sys 0m19.952s

HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=1 --balanceFactor=1
11000000

real    0m17.380s
user    0m11.124s
sys 0m6.272s

哦,是的,是的。并行文件读取效果很好。好吧,你去!

附言 如果您想知道某些情况,那么在使用单个工作进程时,如果balanceFactor为2,该怎么办?好吧,这太可怕了:

HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=1 --balanceFactor=2
11000000

real    1m37.077s
user    0m12.432s
sys 1m24.700s

fastread.py python程序的关键部分:

fileBytes = stat(fileName).st_size  # Read quickly from OS how many bytes are in a text file
startByte, endByte = PartitionDataToWorkers(workers=numProcesses, items=fileBytes, balanceFactor=balanceFactor)
p = Pool(numProcesses)
partialSum = p.starmap(ReadFileSegment, zip(startByte, endByte, repeat(fileName))) # startByte is already a list. fileName is made into a same-length list of duplicates values.
globalSum = sum(partialSum)
print(globalSum)


def ReadFileSegment(startByte, endByte, fileName, searchChar='\n'):  # counts number of searchChar appearing in the byte range
    with open(fileName, 'r') as f:
        f.seek(startByte-1)  # seek is initially at byte 0 and then moves forward the specified amount, so seek(5) points at the 6th byte.
        bytes = f.read(endByte - startByte + 1)
        cnt = len(re.findall(searchChar, bytes)) # findall with implicit compiling runs just as fast here as re.compile once + re.finditer many times.
    return cnt

PartitionDataToWorkers的def只是普通的顺序代码。我省去了它,以防其他人想对并行编程的方式有所了解。我免费提供了更难的部分:经过测试和运行的并行代码,以帮助您学习。

感谢:Arno和Cliff的开源H2O项目以及H2O员工的出色软件和指导视频,它们为我提供了如上所述的纯Python高性能并行字节偏移读取器的灵感。H2O使用Java进行并行文件读取,可被python和R程序调用,并且在读取大型CSV文件方面比在地球上任何东西都快,而且速度惊人。

Some context up front as to where I am coming from. Code snippets are at the end.

When I can, I prefer to use an open source tool like H2O to do super high performance parallel CSV file reads, but this tool is limited in feature set. I end up writing a lot of code to create data science pipelines before feeding to H2O cluster for the supervised learning proper.

I have been reading files like 8GB HIGGS dataset from UCI repo and even 40GB CSV files for data science purposes significantly faster by adding lots of parallelism with the multiprocessing library’s pool object and map function. For example clustering with nearest neighbor searches and also DBSCAN and Markov clustering algorithms requires some parallel programming finesse to bypass some seriously challenging memory and wall clock time problems.

I usually like to break the file row-wise into parts using gnu tools first and then glob-filemask them all to find and read them in parallel in the python program. I use something like 1000+ partial files commonly. Doing these tricks helps immensely with processing speed and memory limits.

The pandas dataframe.read_csv is single threaded so you can do these tricks to make pandas quite faster by running a map() for parallel execution. You can use htop to see that with plain old sequential pandas dataframe.read_csv, 100% cpu on just one core is the actual bottleneck in pd.read_csv, not the disk at all.

I should add I’m using an SSD on fast video card bus, not a spinning HD on SATA6 bus, plus 16 CPU cores.

Also, another technique that I discovered works great in some applications is parallel CSV file reads all within one giant file, starting each worker at different offset into the file, rather than pre-splitting one big file into many part files. Use python’s file seek() and tell() in each parallel worker to read the big text file in strips, at different byte offset start-byte and end-byte locations in the big file, all at the same time concurrently. You can do a regex findall on the bytes, and return the count of linefeeds. This is a partial sum. Finally sum up the partial sums to get the global sum when the map function returns after the workers finished.

Following is some example benchmarks using the parallel byte offset trick:

I use 2 files: HIGGS.csv is 8 GB. It is from the UCI machine learning repository. all_bin .csv is 40.4 GB and is from my current project. I use 2 programs: GNU wc program which comes with Linux, and the pure python fastread.py program which I developed.

HP-Z820:/mnt/fastssd/fast_file_reader$ ls -l /mnt/fastssd/nzv/HIGGS.csv
-rw-rw-r-- 1 8035497980 Jan 24 16:00 /mnt/fastssd/nzv/HIGGS.csv

HP-Z820:/mnt/fastssd$ ls -l all_bin.csv
-rw-rw-r-- 1 40412077758 Feb  2 09:00 all_bin.csv

ga@ga-HP-Z820:/mnt/fastssd$ time python fastread.py --fileName="all_bin.csv" --numProcesses=32 --balanceFactor=2
2367496

real    0m8.920s
user    1m30.056s
sys 2m38.744s

In [1]: 40412077758. / 8.92
Out[1]: 4530501990.807175

That’s some 4.5 GB/s, or 45 Gb/s, file slurping speed. That ain’t no spinning hard disk, my friend. That’s actually a Samsung Pro 950 SSD.

Below is the speed benchmark for the same file being line-counted by gnu wc, a pure C compiled program.

What is cool is you can see my pure python program essentially matched the speed of the gnu wc compiled C program in this case. Python is interpreted but C is compiled, so this is a pretty interesting feat of speed, I think you would agree. Of course, wc really needs to be changed to a parallel program, and then it would really beat the socks off my python program. But as it stands today, gnu wc is just a sequential program. You do what you can, and python can do parallel today. Cython compiling might be able to help me (for some other time). Also memory mapped files was not explored yet.

HP-Z820:/mnt/fastssd$ time wc -l all_bin.csv
2367496 all_bin.csv

real    0m8.807s
user    0m1.168s
sys 0m7.636s


HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=16 --balanceFactor=2
11000000

real    0m2.257s
user    0m12.088s
sys 0m20.512s

HP-Z820:/mnt/fastssd/fast_file_reader$ time wc -l HIGGS.csv
11000000 HIGGS.csv

real    0m1.820s
user    0m0.364s
sys 0m1.456s

Conclusion: The speed is good for a pure python program compared to a C program. However, it’s not good enough to use the pure python program over the C program, at least for linecounting purpose. Generally the technique can be used for other file processing, so this python code is still good.

Question: Does compiling the regex just one time and passing it to all workers will improve speed? Answer: Regex pre-compiling does NOT help in this application. I suppose the reason is that the overhead of process serialization and creation for all the workers is dominating.

One more thing. Does parallel CSV file reading even help? Is the disk the bottleneck, or is it the CPU? Many so-called top-rated answers on stackoverflow contain the common dev wisdom that you only need one thread to read a file, best you can do, they say. Are they sure, though?

Let’s find out:

HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=16 --balanceFactor=2
11000000

real    0m2.256s
user    0m10.696s
sys 0m19.952s

HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=1 --balanceFactor=1
11000000

real    0m17.380s
user    0m11.124s
sys 0m6.272s

Oh yes, yes it does. Parallel file reading works quite well. Well there you go!

Ps. In case some of you wanted to know, what if the balanceFactor was 2 when using a single worker process? Well, it’s horrible:

HP-Z820:/mnt/fastssd/fast_file_reader$ time python fastread.py --fileName="HIGGS.csv" --numProcesses=1 --balanceFactor=2
11000000

real    1m37.077s
user    0m12.432s
sys 1m24.700s

Key parts of the fastread.py python program:

fileBytes = stat(fileName).st_size  # Read quickly from OS how many bytes are in a text file
startByte, endByte = PartitionDataToWorkers(workers=numProcesses, items=fileBytes, balanceFactor=balanceFactor)
p = Pool(numProcesses)
partialSum = p.starmap(ReadFileSegment, zip(startByte, endByte, repeat(fileName))) # startByte is already a list. fileName is made into a same-length list of duplicates values.
globalSum = sum(partialSum)
print(globalSum)


def ReadFileSegment(startByte, endByte, fileName, searchChar='\n'):  # counts number of searchChar appearing in the byte range
    with open(fileName, 'r') as f:
        f.seek(startByte-1)  # seek is initially at byte 0 and then moves forward the specified amount, so seek(5) points at the 6th byte.
        bytes = f.read(endByte - startByte + 1)
        cnt = len(re.findall(searchChar, bytes)) # findall with implicit compiling runs just as fast here as re.compile once + re.finditer many times.
    return cnt

The def for PartitionDataToWorkers is just ordinary sequential code. I left it out in case someone else wants to get some practice on what parallel programming is like. I gave away for free the harder parts: the tested and working parallel code, for your learning benefit.

Thanks to: The open-source H2O project, by Arno and Cliff and the H2O staff for their great software and instructional videos, which have provided me the inspiration for this pure python high performance parallel byte offset reader as shown above. H2O does parallel file reading using java, is callable by python and R programs, and is crazy fast, faster than anything on the planet at reading big CSV files.


回答 5

Katrielalex提供了打开和读取一个文件的方法。

但是,算法执行的方式将为文件的每一行读取整个文件。这意味着,如果N是文件中的行数,则读取文件和计算Levenshtein距离的总次数将为N * N。由于您担心文件的大小并且不想将其保存在内存中,因此我担心生成的二次运行时间。您的算法属于O(n ^ 2)类算法,通常可以通过专业化加以改进。

我怀疑您已经在这里知道了内存与运行时间之间的折衷,但是也许您想研究是否存在一种有效的方法来并行计算多个Levenshtein距离。如果是这样,在这里分享您的解决方案将很有趣。

您的文件有几行,算法必须在哪种计算机上运行(内存和cpu功能),允许的运行时间是多少?

代码如下所示:

with f_outer as open(input_file, 'r'):
    for line_outer in f_outer:
        with f_inner as open(input_file, 'r'):
            for line_inner in f_inner:
                compute_distance(line_outer, line_inner)

但是问题是如何存储距离(矩阵?),并且可以受益于准备例如outer_line进行处理,或缓存一些中间结果以供重用的优势。

Katrielalex provided the way to open & read one file.

However the way your algorithm goes it reads the whole file for each line of the file. That means the overall amount of reading a file – and computing the Levenshtein distance – will be done N*N if N is the amount of lines in the file. Since you’re concerned about file size and don’t want to keep it in memory, I am concerned about the resulting quadratic runtime. Your algorithm is in the O(n^2) class of algorithms which often can be improved with specialization.

I suspect that you already know the tradeoff of memory versus runtime here, but maybe you would want to investigate if there’s an efficient way to compute multiple Levenshtein distances in parallel. If so it would be interesting to share your solution here.

How many lines do your files have, and on what kind of machine (mem & cpu power) does your algorithm have to run, and what’s the tolerated runtime?

Code would look like:

with f_outer as open(input_file, 'r'):
    for line_outer in f_outer:
        with f_inner as open(input_file, 'r'):
            for line_inner in f_inner:
                compute_distance(line_outer, line_inner)

But the questions are how do you store the distances (matrix?) and can you gain an advantage of preparing e.g. the outer_line for processing, or caching some intermediate results for reuse.


回答 6

#Using a text file for the example
with open("yourFile.txt","r") as f:
    text = f.readlines()
for line in text:
    print line
  • 打开文件以供阅读(r)
  • 阅读整个文件并将每一行保存到列表中(文本)中
  • 遍历列表,打印每行。

例如,如果要检查长度大于10的特定行,请使用已有的内容。

for line in text:
    if len(line) > 10:
        print line
#Using a text file for the example
with open("yourFile.txt","r") as f:
    text = f.readlines()
for line in text:
    print line
  • Open your file for reading (r)
  • Read the whole file and save each line into a list (text)
  • Loop through the list printing each line.

If you want, for example, to check a specific line for a length greater than 10, work with what you already have available.

for line in text:
    if len(line) > 10:
        print line

回答 7

从python文档中获得fileinput .input():

这会遍历中列出的所有文件的行sys.argv[1:],默认为sys.stdin列表为空

此外,函数的定义是:

fileinput.FileInput([files[, inplace[, backup[, mode[, openhook]]]]])

阅读两行之间的内容,这告诉我files可以是一个列表,这样您就可以得到以下内容:

for each_line in fileinput.input([input_file, input_file]):
  do_something(each_line)

看到这里更多信息

From the python documentation for fileinput.input():

This iterates over the lines of all files listed in sys.argv[1:], defaulting to sys.stdin if the list is empty

further, the definition of the function is:

fileinput.FileInput([files[, inplace[, backup[, mode[, openhook]]]]])

reading between the lines, this tells me that files can be a list so you could have something like:

for each_line in fileinput.input([input_file, input_file]):
  do_something(each_line)

See here for more information


回答 8

我强烈建议您不要使用默认文件加载,因为它的速度非常慢。您应该研究numpy函数和IOpro函数(例如numpy.loadtxt())。

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

https://store.continuum.io/cshop/iopro/

然后,您可以将成对操作分成多个块:

import numpy as np
import math

lines_total = n    
similarity = np.zeros(n,n)
lines_per_chunk = m
n_chunks = math.ceil(float(n)/m)
for i in xrange(n_chunks):
    for j in xrange(n_chunks):
        chunk_i = (function of your choice to read lines i*lines_per_chunk to (i+1)*lines_per_chunk)
        chunk_j = (function of your choice to read lines j*lines_per_chunk to (j+1)*lines_per_chunk)
        similarity[i*lines_per_chunk:(i+1)*lines_per_chunk,
                   j*lines_per_chunk:(j+1)*lines_per_chunk] = fast_operation(chunk_i, chunk_j) 

与逐个元素地加载数据相比,将数据分块加载然后进行矩阵操作几乎总是快得多!

I would strongly recommend not using the default file loading as it is horrendously slow. You should look into the numpy functions and the IOpro functions (e.g. numpy.loadtxt()).

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

https://store.continuum.io/cshop/iopro/

Then you can break your pairwise operation into chunks:

import numpy as np
import math

lines_total = n    
similarity = np.zeros(n,n)
lines_per_chunk = m
n_chunks = math.ceil(float(n)/m)
for i in xrange(n_chunks):
    for j in xrange(n_chunks):
        chunk_i = (function of your choice to read lines i*lines_per_chunk to (i+1)*lines_per_chunk)
        chunk_j = (function of your choice to read lines j*lines_per_chunk to (j+1)*lines_per_chunk)
        similarity[i*lines_per_chunk:(i+1)*lines_per_chunk,
                   j*lines_per_chunk:(j+1)*lines_per_chunk] = fast_operation(chunk_i, chunk_j) 

It’s almost always much faster to load data in chunks and then do matrix operations on it than to do it element by element!!


回答 9

是否需要经常从最近读取的位置读取大文件?

我创建了一个脚本,用于每天多次切割Apache access.log文件。因此,我需要在上次执行期间解析的最后一行上设置位置光标。为此,我曾经file.seek()file.seek()方法允许将光标存储在文件中。

我的代码:

ENCODING = "utf8"
CURRENT_FILE_DIR = os.path.dirname(os.path.abspath(__file__))

# This file is used to store the last cursor position
cursor_position = os.path.join(CURRENT_FILE_DIR, "access_cursor_position.log")

# Log file with new lines
log_file_to_cut = os.path.join(CURRENT_FILE_DIR, "access.log")
cut_file = os.path.join(CURRENT_FILE_DIR, "cut_access", "cut.log")

# Set in from_line 
from_position = 0
try:
    with open(cursor_position, "r", encoding=ENCODING) as f:
        from_position = int(f.read())
except Exception as e:
    pass

# We read log_file_to_cut to put new lines in cut_file
with open(log_file_to_cut, "r", encoding=ENCODING) as f:
    with open(cut_file, "w", encoding=ENCODING) as fw:
        # We set cursor to the last position used (during last run of script)
        f.seek(from_position)
        for line in f:
            fw.write("%s" % (line))

    # We save the last position of cursor for next usage
    with open(cursor_position, "w", encoding=ENCODING) as fw:
        fw.write(str(f.tell()))

Need to frequently read a large file from last position reading ?

I have created a script used to cut an Apache access.log file several times a day. So I needed to set a position cursor on last line parsed during last execution. To this end, I used file.seek() and file.seek() methods which allows the storage of the cursor in file.

My code :

ENCODING = "utf8"
CURRENT_FILE_DIR = os.path.dirname(os.path.abspath(__file__))

# This file is used to store the last cursor position
cursor_position = os.path.join(CURRENT_FILE_DIR, "access_cursor_position.log")

# Log file with new lines
log_file_to_cut = os.path.join(CURRENT_FILE_DIR, "access.log")
cut_file = os.path.join(CURRENT_FILE_DIR, "cut_access", "cut.log")

# Set in from_line 
from_position = 0
try:
    with open(cursor_position, "r", encoding=ENCODING) as f:
        from_position = int(f.read())
except Exception as e:
    pass

# We read log_file_to_cut to put new lines in cut_file
with open(log_file_to_cut, "r", encoding=ENCODING) as f:
    with open(cut_file, "w", encoding=ENCODING) as fw:
        # We set cursor to the last position used (during last run of script)
        f.seek(from_position)
        for line in f:
            fw.write("%s" % (line))

    # We save the last position of cursor for next usage
    with open(cursor_position, "w", encoding=ENCODING) as fw:
        fw.write(str(f.tell()))

回答 10

逐行读取大文件的最佳方法是使用python 枚举功能

with open(file_name, "rU") as read_file:
    for i, row in enumerate(read_file, 1):
        #do something
        #i in line of that line
        #row containts all data of that line

Best way to read large file, line by line is to use python enumerate function

with open(file_name, "rU") as read_file:
    for i, row in enumerate(read_file, 1):
        #do something
        #i in line of that line
        #row containts all data of that line

如何使我的Python程序休眠50毫秒?

问题:如何使我的Python程序休眠50毫秒?

如何使我的Python程序休眠50毫秒?

How do I get my Python program to sleep for 50 milliseconds?


回答 0

from time import sleep
sleep(0.05)

参考

from time import sleep
sleep(0.05)

Reference


回答 1

需要注意的是,如果你依靠睡眠服用正好 50毫秒,你不会得到那个。就是这样。

Note that if you rely on sleep taking exactly 50 ms, you won’t get that. It will just be about it.


回答 2

import time
time.sleep(50 / 1000)
import time
time.sleep(50 / 1000)

回答 3

也可以使用pyautogui作为

import pyautogui
pyautogui._autoPause(0.05,False)

如果first不为None,则它将暂停第一个arg秒,在此示例中:0.05秒

如果first为None,第二arg为True,则它将休眠以设置的全局暂停设置

pyautogui.PAUSE = int

如果您想知道原因,请参见源代码:

def _autoPause(pause, _pause):
    """If `pause` is not `None`, then sleep for `pause` seconds.
    If `_pause` is `True`, then sleep for `PAUSE` seconds (the global pause setting).

    This function is called at the end of all of PyAutoGUI's mouse and keyboard functions. Normally, `_pause`
    is set to `True` to add a short sleep so that the user can engage the failsafe. By default, this sleep
    is as long as `PAUSE` settings. However, this can be override by setting `pause`, in which case the sleep
    is as long as `pause` seconds.
    """
    if pause is not None:
        time.sleep(pause)
    elif _pause:
        assert isinstance(PAUSE, int) or isinstance(PAUSE, float)
        time.sleep(PAUSE)

can also using pyautogui as

import pyautogui
pyautogui._autoPause(0.05,False)

if first is not None, then it will pause for first arg second, in this example:0.05 sec

if first is None, and second arg is True, then it will sleep for global pause setting which is set with

pyautogui.PAUSE = int

if you are wondering the reason, see the source code:

def _autoPause(pause, _pause):
    """If `pause` is not `None`, then sleep for `pause` seconds.
    If `_pause` is `True`, then sleep for `PAUSE` seconds (the global pause setting).

    This function is called at the end of all of PyAutoGUI's mouse and keyboard functions. Normally, `_pause`
    is set to `True` to add a short sleep so that the user can engage the failsafe. By default, this sleep
    is as long as `PAUSE` settings. However, this can be override by setting `pause`, in which case the sleep
    is as long as `pause` seconds.
    """
    if pause is not None:
        time.sleep(pause)
    elif _pause:
        assert isinstance(PAUSE, int) or isinstance(PAUSE, float)
        time.sleep(PAUSE)

回答 4

您也可以使用Timer()功能来实现。

码:

from threading import Timer

def hello():
  print("Hello")

t = Timer(0.05, hello)
t.start()  # After 0.05 seconds, "Hello" will be printed

You can also do it by using Timer() function.

Code:

from threading import Timer

def hello():
  print("Hello")

t = Timer(0.05, hello)
t.start()  # After 0.05 seconds, "Hello" will be printed

如何使用python找出CPU数量

问题:如何使用python找出CPU数量

我想知道使用Python的本地计算机上的CPU数量。当使用最佳缩放的仅用户空间程序调用时,结果应user/real为输出time(1)

I want to know the number of CPUs on the local machine using Python. The result should be user/real as output by time(1) when called with an optimally scaling userspace-only program.


回答 0

如果您的Python版本> = 2.6,则可以简单地使用

import multiprocessing

multiprocessing.cpu_count()

http://docs.python.org/library/multiprocessing.html#multiprocessing.cpu_count

If you have python with a version >= 2.6 you can simply use

import multiprocessing

multiprocessing.cpu_count()

http://docs.python.org/library/multiprocessing.html#multiprocessing.cpu_count


回答 1

如果您对当前进程可用的处理器数量感兴趣,则必须首先检查cpuset。否则(或者如果未使用cpuset)multiprocessing.cpu_count()是在Python 2.6及更高版本中使用的方法。以下方法可回溯到旧版Python中的几个替代方法:

import os
import re
import subprocess


def available_cpu_count():
    """ Number of available virtual or physical CPUs on this system, i.e.
    user/real as output by time(1) when called with an optimally scaling
    userspace-only program"""

    # cpuset
    # cpuset may restrict the number of *available* processors
    try:
        m = re.search(r'(?m)^Cpus_allowed:\s*(.*)$',
                      open('/proc/self/status').read())
        if m:
            res = bin(int(m.group(1).replace(',', ''), 16)).count('1')
            if res > 0:
                return res
    except IOError:
        pass

    # Python 2.6+
    try:
        import multiprocessing
        return multiprocessing.cpu_count()
    except (ImportError, NotImplementedError):
        pass

    # https://github.com/giampaolo/psutil
    try:
        import psutil
        return psutil.cpu_count()   # psutil.NUM_CPUS on old versions
    except (ImportError, AttributeError):
        pass

    # POSIX
    try:
        res = int(os.sysconf('SC_NPROCESSORS_ONLN'))

        if res > 0:
            return res
    except (AttributeError, ValueError):
        pass

    # Windows
    try:
        res = int(os.environ['NUMBER_OF_PROCESSORS'])

        if res > 0:
            return res
    except (KeyError, ValueError):
        pass

    # jython
    try:
        from java.lang import Runtime
        runtime = Runtime.getRuntime()
        res = runtime.availableProcessors()
        if res > 0:
            return res
    except ImportError:
        pass

    # BSD
    try:
        sysctl = subprocess.Popen(['sysctl', '-n', 'hw.ncpu'],
                                  stdout=subprocess.PIPE)
        scStdout = sysctl.communicate()[0]
        res = int(scStdout)

        if res > 0:
            return res
    except (OSError, ValueError):
        pass

    # Linux
    try:
        res = open('/proc/cpuinfo').read().count('processor\t:')

        if res > 0:
            return res
    except IOError:
        pass

    # Solaris
    try:
        pseudoDevices = os.listdir('/devices/pseudo/')
        res = 0
        for pd in pseudoDevices:
            if re.match(r'^cpuid@[0-9]+$', pd):
                res += 1

        if res > 0:
            return res
    except OSError:
        pass

    # Other UNIXes (heuristic)
    try:
        try:
            dmesg = open('/var/run/dmesg.boot').read()
        except IOError:
            dmesgProcess = subprocess.Popen(['dmesg'], stdout=subprocess.PIPE)
            dmesg = dmesgProcess.communicate()[0]

        res = 0
        while '\ncpu' + str(res) + ':' in dmesg:
            res += 1

        if res > 0:
            return res
    except OSError:
        pass

    raise Exception('Can not determine number of CPUs on this system')

If you’re interested into the number of processors available to your current process, you have to check cpuset first. Otherwise (or if cpuset is not in use), multiprocessing.cpu_count() is the way to go in Python 2.6 and newer. The following method falls back to a couple of alternative methods in older versions of Python:

import os
import re
import subprocess


def available_cpu_count():
    """ Number of available virtual or physical CPUs on this system, i.e.
    user/real as output by time(1) when called with an optimally scaling
    userspace-only program"""

    # cpuset
    # cpuset may restrict the number of *available* processors
    try:
        m = re.search(r'(?m)^Cpus_allowed:\s*(.*)$',
                      open('/proc/self/status').read())
        if m:
            res = bin(int(m.group(1).replace(',', ''), 16)).count('1')
            if res > 0:
                return res
    except IOError:
        pass

    # Python 2.6+
    try:
        import multiprocessing
        return multiprocessing.cpu_count()
    except (ImportError, NotImplementedError):
        pass

    # https://github.com/giampaolo/psutil
    try:
        import psutil
        return psutil.cpu_count()   # psutil.NUM_CPUS on old versions
    except (ImportError, AttributeError):
        pass

    # POSIX
    try:
        res = int(os.sysconf('SC_NPROCESSORS_ONLN'))

        if res > 0:
            return res
    except (AttributeError, ValueError):
        pass

    # Windows
    try:
        res = int(os.environ['NUMBER_OF_PROCESSORS'])

        if res > 0:
            return res
    except (KeyError, ValueError):
        pass

    # jython
    try:
        from java.lang import Runtime
        runtime = Runtime.getRuntime()
        res = runtime.availableProcessors()
        if res > 0:
            return res
    except ImportError:
        pass

    # BSD
    try:
        sysctl = subprocess.Popen(['sysctl', '-n', 'hw.ncpu'],
                                  stdout=subprocess.PIPE)
        scStdout = sysctl.communicate()[0]
        res = int(scStdout)

        if res > 0:
            return res
    except (OSError, ValueError):
        pass

    # Linux
    try:
        res = open('/proc/cpuinfo').read().count('processor\t:')

        if res > 0:
            return res
    except IOError:
        pass

    # Solaris
    try:
        pseudoDevices = os.listdir('/devices/pseudo/')
        res = 0
        for pd in pseudoDevices:
            if re.match(r'^cpuid@[0-9]+$', pd):
                res += 1

        if res > 0:
            return res
    except OSError:
        pass

    # Other UNIXes (heuristic)
    try:
        try:
            dmesg = open('/var/run/dmesg.boot').read()
        except IOError:
            dmesgProcess = subprocess.Popen(['dmesg'], stdout=subprocess.PIPE)
            dmesg = dmesgProcess.communicate()[0]

        res = 0
        while '\ncpu' + str(res) + ':' in dmesg:
            res += 1

        if res > 0:
            return res
    except OSError:
        pass

    raise Exception('Can not determine number of CPUs on this system')

回答 2

另一种选择是使用该psutil库,它在以下情况下总是有用的:

>>> import psutil
>>> psutil.cpu_count()
2

这应该可以在psutil(Unix和Windows)支持的任何平台上使用。

请注意,在某些情况下multiprocessing.cpu_count可能会产生一种NotImplementedError同时psutil就能获得CPU的数量。这仅仅是因为psutil首先尝试使用与以前使用的相同的技术multiprocessing,如果失败,它还会使用其他技术。

Another option is to use the psutil library, which always turn out useful in these situations:

>>> import psutil
>>> psutil.cpu_count()
2

This should work on any platform supported by psutil(Unix and Windows).

Note that in some occasions multiprocessing.cpu_count may raise a NotImplementedError while psutil will be able to obtain the number of CPUs. This is simply because psutil first tries to use the same techniques used by multiprocessing and, if those fail, it also uses other techniques.


回答 3

在Python 3.4及更高版本中:os.cpu_count()

multiprocessing.cpu_count()就此功能实现了,但是NotImplementedError如果os.cpu_count()返回则提高None(“无法确定CPU的数量”)。

In Python 3.4+: os.cpu_count().

multiprocessing.cpu_count() is implemented in terms of this function but raises NotImplementedError if os.cpu_count() returns None (“can’t determine number of CPUs”).


回答 4

len(os.sched_getaffinity(0)) 通常就是你想要的

https://docs.python.org/3/library/os.html#os.sched_getaffinity

os.sched_getaffinity(0)(在Python 3中添加)(考虑sched_setaffinityLinux系统调用)返回可用的CPU集合,这限制了进程及其子进程可以在哪些CPU上运行。

0表示获取当前过程的值。该函数返回set()允许的CPU,因此需要len()

multiprocessing.cpu_count() 另一方面,仅返回物理CPU的总数。

这种差异尤为重要,因为某些集群管理系统(例如Platform LSF)将作业CPU的使用限制为sched_getaffinity

因此,如果使用multiprocessing.cpu_count(),则脚本可能会尝试使用比可用内核更多的内核,这可能导致过载和超时。

通过限制与taskset实用程序的关联性,我们可以具体看到差异。

例如,如果我在16核系统中将Python限制为仅1核(0核):

taskset -c 0 ./main.py

使用测试脚本:

main.py

#!/usr/bin/env python3

import multiprocessing
import os

print(multiprocessing.cpu_count())
print(len(os.sched_getaffinity(0)))

那么输出是:

16
1

nproc 但是默认情况下确实遵守关联性,并且:

taskset -c 0 nproc

输出:

1

man nproc使其非常明确:

打印可用的处理单元数

nproc具有--all要获取物理CPU计数的较不常见情况的标志:

taskset -c 0 nproc --all

该方法的唯一缺点是,它似乎仅适用于UNIX。我以为Windows必须具有类似的相似性API SetProcessAffinityMask,所以我想知道为什么还没有移植它。但是我对Windows一无所知。

已在Ubuntu 16.04,Python 3.5.2中进行了测试。

len(os.sched_getaffinity(0)) is what you usually want

https://docs.python.org/3/library/os.html#os.sched_getaffinity

os.sched_getaffinity(0) (added in Python 3) returns the set of CPUs available considering the sched_setaffinity Linux system call, which limits which CPUs a process and its children can run on.

0 means to get the value for the current process. The function returns a set() of allowed CPUs, thus the need for len().

multiprocessing.cpu_count() on the other hand just returns the total number of physical CPUs.

The difference is especially important because certain cluster management systems such as Platform LSF limit job CPU usage with sched_getaffinity.

Therefore, if you use multiprocessing.cpu_count(), your script might try to use way more cores than it has available, which may lead to overload and timeouts.

We can see the difference concretely by restricting the affinity with the taskset utility.

For example, if I restrict Python to just 1 core (core 0) in my 16 core system:

taskset -c 0 ./main.py

with the test script:

main.py

#!/usr/bin/env python3

import multiprocessing
import os

print(multiprocessing.cpu_count())
print(len(os.sched_getaffinity(0)))

then the output is:

16
1

nproc however does respect the affinity by default and:

taskset -c 0 nproc

outputs:

1

and man nproc makes that quite explicit:

print the number of processing units available

nproc has the --all flag for the less common case that you want to get the physical CPU count:

taskset -c 0 nproc --all

The only downside of this method is that this appears to be UNIX only. I supposed Windows must have a similar affinity API, possibly SetProcessAffinityMask, so I wonder why it hasn’t been ported. But I know nothing about Windows.

Tested in Ubuntu 16.04, Python 3.5.2.


回答 5

平台无关:

psutil.cpu_count(逻辑=假)

https://github.com/giampaolo/psutil/blob/master/INSTALL.rst

platform independent:

psutil.cpu_count(logical=False)

https://github.com/giampaolo/psutil/blob/master/INSTALL.rst


回答 6

这些给你超线程的CPU数量

  1. multiprocessing.cpu_count()
  2. os.cpu_count()

这些为您提供虚拟机的CPU数量

  1. psutil.cpu_count()
  2. numexpr.detect_number_of_cores()

仅在您在VM上工作时才重要。

These give you the hyperthreaded CPU count

  1. multiprocessing.cpu_count()
  2. os.cpu_count()

These give you the virtual machine CPU count

  1. psutil.cpu_count()
  2. numexpr.detect_number_of_cores()

Only matters if you works on VMs.


回答 7

multiprocessing.cpu_count()将返回逻辑CPU的数量,因此,如果您具有带超线程功能的四核CPU,它将返回8。如果您想要物理CPU的数量,请使用python绑定来hwloc:

#!/usr/bin/env python
import hwloc
topology = hwloc.Topology()
topology.load()
print topology.get_nbobjs_by_type(hwloc.OBJ_CORE)

hwloc设计为可跨操作系统和体系结构移植。

multiprocessing.cpu_count() will return the number of logical CPUs, so if you have a quad-core CPU with hyperthreading, it will return 8. If you want the number of physical CPUs, use the python bindings to hwloc:

#!/usr/bin/env python
import hwloc
topology = hwloc.Topology()
topology.load()
print topology.get_nbobjs_by_type(hwloc.OBJ_CORE)

hwloc is designed to be portable across OSes and architectures.


回答 8

无法弄清楚如何添加到代码或回复消息,但是这里提供了对jython的支持,您可以在放弃之前先加以支持:

# jython
try:
    from java.lang import Runtime
    runtime = Runtime.getRuntime()
    res = runtime.availableProcessors()
    if res > 0:
        return res
except ImportError:
    pass

Can’t figure out how to add to the code or reply to the message but here’s support for jython that you can tack in before you give up:

# jython
try:
    from java.lang import Runtime
    runtime = Runtime.getRuntime()
    res = runtime.availableProcessors()
    if res > 0:
        return res
except ImportError:
    pass

回答 9

这对于使用不同操作系统/系统但想要获得世界最佳状态的我们来说可能是有用的:

import os
workers = os.cpu_count()
if 'sched_getaffinity' in dir(os):
    workers = len(os.sched_getaffinity(0))

This may work for those of us who use different os/systems, but want to get the best of all worlds:

import os
workers = os.cpu_count()
if 'sched_getaffinity' in dir(os):
    workers = len(os.sched_getaffinity(0))

回答 10

您也可以为此目的使用“ joblib”。

import joblib
print joblib.cpu_count()

此方法将为您提供系统中的cpus数。但是需要安装joblib。关于joblib的更多信息可以在这里找到 https://pythonhosted.org/joblib/parallel.html

或者,您可以使用python的numexpr软件包。它具有许多简单的功能,有助于获取有关系统cpu的信息。

import numexpr as ne
print ne.detect_number_of_cores()

You can also use “joblib” for this purpose.

import joblib
print joblib.cpu_count()

This method will give you the number of cpus in the system. joblib needs to be installed though. More information on joblib can be found here https://pythonhosted.org/joblib/parallel.html

Alternatively you can use numexpr package of python. It has lot of simple functions helpful for getting information about the system cpu.

import numexpr as ne
print ne.detect_number_of_cores()

回答 11

如果您没有Python 2.6,请选择另一个选项:

import commands
n = commands.getoutput("grep -c processor /proc/cpuinfo")

Another option if you don’t have Python 2.6:

import commands
n = commands.getoutput("grep -c processor /proc/cpuinfo")

如何在Python中捕获SIGINT?

问题:如何在Python中捕获SIGINT?

我正在研究启动多个进程和数据库连接的python脚本。我不时地想用Ctrl+ C信号杀死脚本,我想进行一些清理。

在Perl中,我可以这样做:

$SIG{'INT'} = 'exit_gracefully';

sub exit_gracefully {
    print "Caught ^C \n";
    exit (0);
}

如何在Python中做类似的事情?

I’m working on a python script that starts several processes and database connections. Every now and then I want to kill the script with a Ctrl+C signal, and I’d like to do some cleanup.

In Perl I’d do this:

$SIG{'INT'} = 'exit_gracefully';

sub exit_gracefully {
    print "Caught ^C \n";
    exit (0);
}

How do I do the analogue of this in Python?


回答 0

用以下方式注册您的处理程序signal.signal

#!/usr/bin/env python
import signal
import sys

def signal_handler(sig, frame):
    print('You pressed Ctrl+C!')
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
print('Press Ctrl+C')
signal.pause()

代码改编自此处

有关更多文档,请signal参见此处。  

Register your handler with signal.signal like this:

#!/usr/bin/env python
import signal
import sys

def signal_handler(sig, frame):
    print('You pressed Ctrl+C!')
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
print('Press Ctrl+C')
signal.pause()

Code adapted from here.

More documentation on signal can be found here.  


回答 1

您可以像对待其他异常一样将其视为异常(KeyboardInterrupt)。制作一个新文件,并从您的外壳运行以下内容,以了解我的意思:

import time, sys

x = 1
while True:
    try:
        print x
        time.sleep(.3)
        x += 1
    except KeyboardInterrupt:
        print "Bye"
        sys.exit()

You can treat it like an exception (KeyboardInterrupt), like any other. Make a new file and run it from your shell with the following contents to see what I mean:

import time, sys

x = 1
while True:
    try:
        print x
        time.sleep(.3)
        x += 1
    except KeyboardInterrupt:
        print "Bye"
        sys.exit()

回答 2

并作为上下文管理器:

import signal

class GracefulInterruptHandler(object):

    def __init__(self, sig=signal.SIGINT):
        self.sig = sig

    def __enter__(self):

        self.interrupted = False
        self.released = False

        self.original_handler = signal.getsignal(self.sig)

        def handler(signum, frame):
            self.release()
            self.interrupted = True

        signal.signal(self.sig, handler)

        return self

    def __exit__(self, type, value, tb):
        self.release()

    def release(self):

        if self.released:
            return False

        signal.signal(self.sig, self.original_handler)

        self.released = True

        return True

使用方法:

with GracefulInterruptHandler() as h:
    for i in xrange(1000):
        print "..."
        time.sleep(1)
        if h.interrupted:
            print "interrupted!"
            time.sleep(2)
            break

嵌套处理程序:

with GracefulInterruptHandler() as h1:
    while True:
        print "(1)..."
        time.sleep(1)
        with GracefulInterruptHandler() as h2:
            while True:
                print "\t(2)..."
                time.sleep(1)
                if h2.interrupted:
                    print "\t(2) interrupted!"
                    time.sleep(2)
                    break
        if h1.interrupted:
            print "(1) interrupted!"
            time.sleep(2)
            break

从这里:https : //gist.github.com/2907502

And as a context manager:

import signal

class GracefulInterruptHandler(object):

    def __init__(self, sig=signal.SIGINT):
        self.sig = sig

    def __enter__(self):

        self.interrupted = False
        self.released = False

        self.original_handler = signal.getsignal(self.sig)

        def handler(signum, frame):
            self.release()
            self.interrupted = True

        signal.signal(self.sig, handler)

        return self

    def __exit__(self, type, value, tb):
        self.release()

    def release(self):

        if self.released:
            return False

        signal.signal(self.sig, self.original_handler)

        self.released = True

        return True

To use:

with GracefulInterruptHandler() as h:
    for i in xrange(1000):
        print "..."
        time.sleep(1)
        if h.interrupted:
            print "interrupted!"
            time.sleep(2)
            break

Nested handlers:

with GracefulInterruptHandler() as h1:
    while True:
        print "(1)..."
        time.sleep(1)
        with GracefulInterruptHandler() as h2:
            while True:
                print "\t(2)..."
                time.sleep(1)
                if h2.interrupted:
                    print "\t(2) interrupted!"
                    time.sleep(2)
                    break
        if h1.interrupted:
            print "(1) interrupted!"
            time.sleep(2)
            break

From here: https://gist.github.com/2907502


回答 3

您可以通过捕获异常来处理CTRL+ 。您可以在异常处理程序中实现任何清理代码。CKeyboardInterrupt

You can handle CTRL+C by catching the KeyboardInterrupt exception. You can implement any clean-up code in the exception handler.


回答 4

从Python的文档中

import signal
import time

def handler(signum, frame):
    print 'Here you go'

signal.signal(signal.SIGINT, handler)

time.sleep(10) # Press Ctrl+c here

From Python’s documentation:

import signal
import time

def handler(signum, frame):
    print 'Here you go'

signal.signal(signal.SIGINT, handler)

time.sleep(10) # Press Ctrl+c here

回答 5

另一个片段

简称main为主要功能,并exit_gracefullyCTRL+ c处理器

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        pass
    finally:
        exit_gracefully()

Yet Another Snippet

Referred main as the main function and exit_gracefully as the CTRL + c handler

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        pass
    finally:
        exit_gracefully()

回答 6

我改编了@udi的代码以支持多种信号(没什么花哨的):

class GracefulInterruptHandler(object):
    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.signals = signals
        self.original_handlers = {}

    def __enter__(self):
        self.interrupted = False
        self.released = False

        for sig in self.signals:
            self.original_handlers[sig] = signal.getsignal(sig)
            signal.signal(sig, self.handler)

        return self

    def handler(self, signum, frame):
        self.release()
        self.interrupted = True

    def __exit__(self, type, value, tb):
        self.release()

    def release(self):
        if self.released:
            return False

        for sig in self.signals:
            signal.signal(sig, self.original_handlers[sig])

        self.released = True
        return True

此代码支持键盘中断调用(SIGINT)和SIGTERMkill <process>

I adapted the code from @udi to support multiple signals (nothing fancy) :

class GracefulInterruptHandler(object):
    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.signals = signals
        self.original_handlers = {}

    def __enter__(self):
        self.interrupted = False
        self.released = False

        for sig in self.signals:
            self.original_handlers[sig] = signal.getsignal(sig)
            signal.signal(sig, self.handler)

        return self

    def handler(self, signum, frame):
        self.release()
        self.interrupted = True

    def __exit__(self, type, value, tb):
        self.release()

    def release(self):
        if self.released:
            return False

        for sig in self.signals:
            signal.signal(sig, self.original_handlers[sig])

        self.released = True
        return True

This code support the keyboard interrupt call (SIGINT) and the SIGTERM (kill <process>)


回答 7

Matt J的回答相反,我使用一个简单的对象。这使我有可能将处理程序解析为所有需要停止securlery的线程。

class SIGINT_handler():
    def __init__(self):
        self.SIGINT = False

    def signal_handler(self, signal, frame):
        print('You pressed Ctrl+C!')
        self.SIGINT = True


handler = SIGINT_handler()
signal.signal(signal.SIGINT, handler.signal_handler)

别处

while True:
    # task
    if handler.SIGINT:
        break

In contrast to Matt J his answer, I use a simple object. This gives me the possibily to parse this handler to all the threads that needs to be stopped securlery.

class SIGINT_handler():
    def __init__(self):
        self.SIGINT = False

    def signal_handler(self, signal, frame):
        print('You pressed Ctrl+C!')
        self.SIGINT = True


handler = SIGINT_handler()
signal.signal(signal.SIGINT, handler.signal_handler)

Elsewhere

while True:
    # task
    if handler.SIGINT:
        break

回答 8

您可以使用Python的内置信号模块中的函数在python中设置信号处理程序。具体来说,该signal.signal(signalnum, handler)功能用于注册handler信号功能signalnum

You can use the functions in Python’s built-in signal module to set up signal handlers in python. Specifically the signal.signal(signalnum, handler) function is used to register the handler function for signal signalnum.


回答 9

感谢现有的答案,但补充 signal.getsignal()

import signal

# store default handler of signal.SIGINT
default_handler = signal.getsignal(signal.SIGINT)
catch_count = 0

def handler(signum, frame):
    global default_handler, catch_count
    catch_count += 1
    print ('wait:', catch_count)
    if catch_count > 3:
        # recover handler for signal.SIGINT
        signal.signal(signal.SIGINT, default_handler)
        print('expecting KeyboardInterrupt')

signal.signal(signal.SIGINT, handler)
print('Press Ctrl+c here')

while True:
    pass

thanks for existing answers, but added signal.getsignal()

import signal

# store default handler of signal.SIGINT
default_handler = signal.getsignal(signal.SIGINT)
catch_count = 0

def handler(signum, frame):
    global default_handler, catch_count
    catch_count += 1
    print ('wait:', catch_count)
    if catch_count > 3:
        # recover handler for signal.SIGINT
        signal.signal(signal.SIGINT, default_handler)
        print('expecting KeyboardInterrupt')

signal.signal(signal.SIGINT, handler)
print('Press Ctrl+c here')

while True:
    pass

回答 10

如果您想确保清理过程完成,则可以使用SIG_IGN来添加Matt J的答案,以便进一步忽略它们,以防止清理中断。SIGINT

import signal
import sys

def signal_handler(signum, frame):
    signal.signal(signum, signal.SIG_IGN) # ignore additional signals
    cleanup() # give your process a chance to clean up
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler) # register the signal with the signal handler first
do_stuff()

If you want to ensure that your cleanup process finishes I would add on to Matt J’s answer by using a SIG_IGN so that further SIGINT are ignored which will prevent your cleanup from being interrupted.

import signal
import sys

def signal_handler(signum, frame):
    signal.signal(signum, signal.SIG_IGN) # ignore additional signals
    cleanup() # give your process a chance to clean up
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler) # register the signal with the signal handler first
do_stuff()

回答 11

就个人而言,我无法使用try / except KeyboardInterrupt,因为我使用的是阻塞的标准套接字(IPC)模式。因此,对SIGINT进行了提示,但仅在套接字上接收到数据之后才出现。

设置信号处理程序的行为相同。

另一方面,这仅适用于实际终端。其他启动环境可能不接受Ctrl+ C或预先处理信号。

此外,Python中还有“异常”和“ BaseExceptions”,它们在解释器需要干净退出本身的意义上有所不同,因此某些异常的优先级高于其他异常(异常是从BaseException派生的)

Personally, I couldn’t use try/except KeyboardInterrupt because I was using standard socket (IPC) mode which is blocking. So the SIGINT was cueued, but came only after receiving data on the socket.

Setting a signal handler behaves the same.

On the other hand, this only works for an actual terminal. Other starting environments might not accept Ctrl+C, or pre-handle the signal.

Also, there are “Exceptions” and “BaseExceptions” in Python, which differ in the sense that interpreter needs to exit cleanly itself, so some exceptions have a higher priority than others (Exceptions is derived from BaseException)


用Python编写单元测试:如何开始?[关闭]

问题:用Python编写单元测试:如何开始?[关闭]

我用Python完成了第一个适当的项目,现在的任务是为它编写测试。

由于这是我第一次做项目,所以这是我第一次为此编写测试。

问题是,我该如何开始?我真的一点儿都不知道。谁能指出我一些可以用来开始编写测试的文档/教程/链接/书(尤其是单元测试)

关于该主题的任何建议都将受到欢迎。

I completed my first proper project in Python and now my task is to write tests for it.

Since this is the first time I did a project, this is the first time I would be writing tests for it.

The question is, how do I start? I have absolutely no idea. Can anyone point me to some documentation/ tutorial/ link/ book that I can use to start with writing tests (and I guess unit testing in particular)

Any advice will be welcomed on this topic.


回答 0

如果您是第一次使用单元测试,那么最简单的学习方法通​​常是最好的。在此基础上,我建议使用py.test而不是默认unittest模块

考虑以下两个示例,它们具有相同的作用:

示例1(单元测试):

import unittest

class LearningCase(unittest.TestCase):
    def test_starting_out(self):
        self.assertEqual(1, 1)

def main():
    unittest.main()

if __name__ == "__main__":
    main()

示例2(pytest):

def test_starting_out():
    assert 1 == 1

假设两个文件都被命名test_unittesting.py,我们如何运行测试?

示例1(单元测试):

cd /path/to/dir/
python test_unittesting.py

示例2(pytest):

cd /path/to/dir/
py.test

If you’re brand new to using unittests, the simplest approach to learn is often the best. On that basis along I recommend using py.test rather than the default unittest module.

Consider these two examples, which do the same thing:

Example 1 (unittest):

import unittest

class LearningCase(unittest.TestCase):
    def test_starting_out(self):
        self.assertEqual(1, 1)

def main():
    unittest.main()

if __name__ == "__main__":
    main()

Example 2 (pytest):

def test_starting_out():
    assert 1 == 1

Assuming that both files are named test_unittesting.py, how do we run the tests?

Example 1 (unittest):

cd /path/to/dir/
python test_unittesting.py

Example 2 (pytest):

cd /path/to/dir/
py.test

回答 1

免费的Python书籍Dive Into Python中关于单元测试的,您可能会觉得很有用。

如果您遵循现代实践,则可能应该在编写项目时编写测试,而不要等到项目接近完成时再进行测试。

现在有点晚,但是现在您知道下一次了。:)

The free Python book Dive Into Python has a chapter on unit testing that you might find useful.

If you follow modern practices you should probably write the tests while you are writing your project, and not wait until your project is nearly finished.

Bit late now, but now you know for next time. :)


回答 2

在我看来,有三个很棒的python测试框架值得一试。
单元测试 -模块配备了所有的Python发行标准
鼻子 -可以运行单元测试的测试,而且具有更少的样板。
pytest-还运行单元测试,样板更少,报告更好,还有很多很棒的额外功能

为了对所有这些进行良好的比较,请阅读http://pythontesting.net/start-here的每个介绍。
也有关于灯具的扩展文章,还有更多。

There are, in my opinion, three great python testing frameworks that are good to check out.
unittest – module comes standard with all python distributions
nose – can run unittest tests, and has less boilerplate.
pytest – also runs unittest tests, has less boilerplate, better reporting, lots of cool extra features

To get a good comparison of all of these, read through the introductions to each at http://pythontesting.net/start-here.
There’s also extended articles on fixtures, and more there.


回答 3

的文档 单元测试将是一个良好的开端。

另外,现在有点晚了,但是将来请考虑在项目本身之前或期间编写单元测试。这样,您可以使用它们进行测试,并且(理论上)可以将它们用作回归测试,以验证您的代码更改没有破坏任何现有代码。这将为您带来编写测试用例的全部好处:)

The docs for unittest would be a good place to start.

Also, it is a bit late now, but in the future please consider writing unit tests before or during the project itself. That way you can use them to test as you go along, and (in theory) you can use them as regression tests, to verify that your code changes have not broken any existing code. This would give you the full benefit of writing test cases :)


回答 4

unittest随标准库一起提供,但是我建议您进行鼻子测试

鼻子扩展了单元测试,使测试更加容易。

我也建议你pylint

分析Python源代码以寻找错误和质量低劣的迹象。

unittest comes with the standard library, but I would recomend you nosetests.

nose extends unittest to make testing easier.

I would also recomend you pylint

analyzes Python source code looking for bugs and signs of poor quality.


回答 5

正如其他人已经回答的那样,编写单元测试已经晚了,但还不算太晚。问题是您的代码是否可测试。确实,要对现有代码进行测试并不容易,甚至有一本关于此的书:有效地使用遗留代码(请参见要点前期PDF)。

现在编写或不编写单元测试就是您的要求。您只需要意识到这可能是一项繁琐的任务。您可以解决此问题,以学习单元测试或考虑首先编写验收(端对端)测试,然后在更改代码或向项目添加新功能时开始编写单元测试。

As others already replied, it’s late to write unit tests, but not too late. The question is whether your code is testable or not. Indeed, it’s not easy to put existing code under test, there is even a book about this: Working Effectively with Legacy Code (see key points or precursor PDF).

Now writing the unit tests or not is your call. You just need to be aware that it could be a tedious task. You might tackle this to learn unit-testing or consider writing acceptance (end-to-end) tests first, and start writing unit tests when you’ll change the code or add new feature to the project.


回答 6

鼻子测试是在python中进行单元测试的出色解决方案。它同时支持基于单元测试的测试用例和doctest,并且只需简单的配置文件就可以开始使用它。

nosetests is brilliant solution for unit-testing in python. It supports both unittest based testcases and doctests, and gets you started with it with just simple config file.