Tag archive: Python

CSV file written with Python has blank lines between each row

Question: CSV file written with Python has blank lines between each row

import csv

with open('thefile.csv', 'rb') as f:
  data = list(csv.reader(f))
  import collections
  counter = collections.defaultdict(int)

  for row in data:
        counter[row[10]] += 1


with open('/pythonwork/thefile_subset11.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    for row in data:
        if counter[row[10]] >= 504:
           writer.writerow(row)

This code reads thefile.csv, makes changes, and writes results to thefile_subset1.

However, when I open the resulting csv in Microsoft Excel, there is an extra blank line after each record!

Is there a way to make it not put an extra blank line?


Answer 0

In Python 2, open outfile with mode 'wb' instead of 'w'. The csv.writer writes \r\n into the file directly. If you don’t open the file in binary mode, it will write \r\r\n because on Windows text mode will translate each \n into \r\n.

In Python 3 the required syntax changed (see documentation links below), so open outfile with the additional parameter newline='' (empty string) instead.

Examples:

# Python 2
with open('/pythonwork/thefile_subset11.csv', 'wb') as outfile:
    writer = csv.writer(outfile)

# Python 3
with open('/pythonwork/thefile_subset11.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
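
As a quick check of the Python 3 form, the sketch below writes two rows to a temporary file and reads the raw bytes back. The path, the helper names and the sample rows are made up for illustration. Each row should end with a single \r\n on any platform.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline='' stops the text layer from translating the '\n' that
    # csv.writer emits as part of its default '\r\n' row terminator.
    with open(path, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(rows)


def raw_bytes(path):
    # Read the file back in binary mode so no newline handling is applied.
    with open(path, 'rb') as f:
        return f.read()


# Hypothetical sample data written to a throwaway file.
sample_path = os.path.join(tempfile.mkdtemp(), 'subset.csv')
write_rows(sample_path, [['a', 'b'], ['c', 'd']])
written = raw_bytes(sample_path)
print(written)
```

If the printed bytes contain \r\r\n instead, the file was most likely opened without newline=''.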

Documentation Links


Answer 1

Opening the file in binary mode “wb” will not work in Python 3+. Or rather, you’d have to convert your data to binary before writing it. That’s just a hassle.

Instead, you should keep it in text mode, but override the newline as empty. Like so:

with open('/pythonwork/thefile_subset11.csv', 'w', newline='') as outfile:
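
To illustrate why 'wb' fails on Python 3, here is a minimal sketch that uses a throwaway temporary file (the file name is made up). The csv module hands str objects to the file, and a binary file only accepts bytes.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'binary.csv')  # throwaway file

try:
    with open(path, 'wb') as outfile:
        csv.writer(outfile).writerow(['a', 'b'])
    binary_mode_failed = False
except TypeError as exc:
    # Python 3: csv.writer produces str, but a 'wb' file wants bytes.
    binary_mode_failed = True
    print('binary mode failed:', exc)
```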

Answer 2

The simple answer is that csv files should always be opened in binary mode whether for input or output, as otherwise on Windows there are problems with the line ending. Specifically on output the csv module will write \r\n (the standard CSV row terminator) and then (in text mode) the runtime will replace the \n by \r\n (the Windows standard line terminator) giving a result of \r\r\n.

Fiddling with the lineterminator is NOT the solution.
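
The \r\r\n effect can be reproduced on any platform with a small Python 3 simulation. This is only a sketch: io.TextIOWrapper with newline='\r\n' stands in for Windows text mode, and the helper name csv_bytes is made up. On Python 3 the equivalent of the binary-mode advice is newline='', which is what the second call uses.

```python
import csv
import io


def csv_bytes(newline):
    """Write one row through a text layer using the given newline mode."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding='ascii', newline=newline)
    csv.writer(text).writerow(['a', 'b'])
    text.flush()            # push the encoded text down into `raw`
    data = raw.getvalue()
    text.detach()           # keep `raw` open when `text` is discarded
    return data


translated = csv_bytes('\r\n')  # mimics Windows text mode: '\n' -> '\r\n'
untouched = csv_bytes('')       # no translation at all

print(translated)
print(untouched)
```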


Answer 3

Note: It seems this is not the preferred solution because of how the extra line was being added on a Windows system. As stated in the python document:

If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.

Windows is one such platform where that makes a difference. While changing the line terminator as I described below may have fixed the problem, the problem could be avoided altogether by opening the file in binary mode. One might say this solution is more “elegant”. “Fiddling” with the line terminator would likely have resulted in code that is not portable between systems, whereas opening a file in binary mode on a Unix system has no effect, i.e. it results in cross-system compatible code.

From Python Docs:

On Windows, ‘b’ appended to the mode opens the file in binary mode, so there are also modes like ‘rb’, ‘wb’, and ‘r+b’. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a ‘b’ to the mode, so you can use it platform-independently for all binary files.

Original:

As part of the optional parameters for csv.writer, if you are getting extra blank lines you may have to change the lineterminator (info here). The example below is adapted from the Python csv docs. Change it from ‘\n’ to whatever it should be. As this is just a stab in the dark at the problem, this may or may not work, but it’s my best guess.

>>> import csv
>>> spamWriter = csv.writer(open('eggs.csv', 'w'), lineterminator='\n')
>>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
>>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
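
To see what lineterminator actually changes, the following Python 3 sketch renders the same rows into an in-memory buffer with and without the option. The render helper and the sample rows are made up for illustration. io.StringIO is used because it stores what is written without newline translation by default.

```python
import csv
import io


def render(rows, **writer_kwargs):
    """Return the exact text csv.writer produces for `rows`."""
    buf = io.StringIO()  # default StringIO does not translate newlines
    csv.writer(buf, **writer_kwargs).writerows(rows)
    return buf.getvalue()


rows = [['Spam', 'Lovely Spam'], ['Baked Beans']]
default_text = render(rows)                     # rows end with '\r\n'
unix_text = render(rows, lineterminator='\n')   # rows end with '\n'

print(repr(default_text))
print(repr(unix_text))
```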

Answer 4

I’m writing this answer with respect to Python 3, as I initially ran into the same problem.

I was supposed to get data from an Arduino using PySerial and write it to a .csv file. Each reading in my case ended with '\r\n', so a newline was always separating each line.

In my case, the newline='' option didn’t work, because it raised an error like:

with open('op.csv', 'a',newline=' ') as csv_file:

ValueError: illegal newline value: ''

So it seemed that omitting the newline is not accepted here.

After seeing one of the other answers here, I specified the line terminator in the writer object, like so:

writer = csv.writer(csv_file, delimiter=' ',lineterminator='\r')

and that worked for me for skipping the extra newlines.
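
Note that the quoted call passes newline=' ' (a single space), while the other answers recommend newline='' (an empty string). The sketch below, which assumes Python 3 and uses a throwaway file plus a made-up helper name, checks which of the two values open() accepts. It suggests the space may have been the cause of the ValueError.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'op.csv')  # throwaway file


def try_open(newline):
    """Return True if open() accepts this newline value, else False."""
    try:
        with open(path, 'a', newline=newline):
            return True
    except ValueError:
        return False


space_ok = try_open(' ')  # a single space, as in the quoted call
empty_ok = try_open('')   # the empty string recommended elsewhere

print('space accepted:', space_ok)
print('empty string accepted:', empty_ok)
```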


Answer 5

with open(destPath+'\\'+csvXML, 'a+') as csvFile:
    writer = csv.writer(csvFile, delimiter=';', lineterminator='\r')
    writer.writerows(xmlList)

Setting lineterminator='\r' moves to the next row without leaving an empty row between the two.


Answer 6

Borrowing from this answer, it seems like the cleanest solution is to use io.TextIOWrapper. I managed to solve this problem for myself as follows:

from io import TextIOWrapper

...

with open(filename, 'wb') as csvfile, TextIOWrapper(csvfile, encoding='utf-8', newline='') as wrapper:
    csvwriter = csv.writer(wrapper)
    for data_row in data:
        csvwriter.writerow(data_row)

The above answer is not compatible with Python 2. To have compatibility, I suppose one would simply need to wrap all the writing logic in an if block:

if sys.version_info < (3,):
    # Python 2 way of handling CSVs
else:
    # The above logic
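
One possible way to fill in that if block is a small helper that picks the open mode by interpreter version. This is only a sketch: the helper name open_csv_for_writing is made up, and it uses the plain newline='' form rather than TextIOWrapper.

```python
import csv
import os
import sys
import tempfile


def open_csv_for_writing(path):
    """Open `path` so csv.writer's CRLF terminators reach the file unchanged."""
    if sys.version_info < (3,):
        return open(path, 'wb')          # Python 2: binary mode
    return open(path, 'w', newline='')   # Python 3: text mode, no translation


# Usage with a throwaway file and made-up data.
path = os.path.join(tempfile.mkdtemp(), 'out.csv')
with open_csv_for_writing(path) as csvfile:
    csv.writer(csvfile).writerow(['x', 'y'])

with open(path, 'rb') as f:
    compat_bytes = f.read()
print(compat_bytes)
```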

Answer 7

Use the method defined below to write data to the CSV file.

open('outputFile.csv', 'a',newline='')

Just add an additional newline='' parameter inside the open call:

def writePhoneSpecsToCSV():
    rowData=["field1", "field2"]
    with open('outputFile.csv', 'a',newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(rowData)

This will write CSV rows without creating additional rows!


Answer 8

When using Python 3, the empty lines can be avoided by using the codecs module. As stated in the documentation, files are opened in binary mode, so no change of the newline kwarg is necessary. I was running into the same issue recently and that worked for me:

with codecs.open( csv_file,  mode='w', encoding='utf-8') as out_csv:
     csv_out_file = csv.DictWriter(out_csv)
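
The snippet above is abbreviated: csv.DictWriter also needs the column names. A fuller sketch might look like the following, where the field names, the sample row and the temporary path are all made up for illustration.

```python
import codecs
import csv
import os
import tempfile

csv_file = os.path.join(tempfile.mkdtemp(), 'out.csv')  # throwaway file

with codecs.open(csv_file, mode='w', encoding='utf-8') as out_csv:
    # DictWriter requires fieldnames; these column names are hypothetical.
    writer = csv.DictWriter(out_csv, fieldnames=['name', 'qty'])
    writer.writeheader()
    writer.writerow({'name': 'spam', 'qty': 2})

# Read the raw bytes back to inspect the row terminators.
with open(csv_file, 'rb') as f:
    codecs_bytes = f.read()
print(codecs_bytes)
```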

Python and pip, list all versions of a package that's available?

Question: Python and pip, list all versions of a package that's available?

Given the name of a Python package that can be installed with pip, is there any way to find out a list of all the possible versions of it that pip could install? Right now it’s trial and error.

I’m trying to install a version for a third party library, but the newest version is too new, there were backwards incompatible changes made. So I’d like to somehow have a list of all the versions that pip knows about, so that I can test them.


Answer 0

(update: As of March 2020, many people have reported that yolk, installed via pip install yolk3k, only returns latest version. Chris’s answer seems to have the most upvotes and worked for me)

The script at pastebin does work. However it’s not very convenient if you’re working with multiple environments/hosts because you will have to copy/create it every time.

A better all-around solution would be to use yolk3k, which is available to install with pip. E.g. to see what versions of Django are available:

$ pip install yolk3k
$ yolk -V django
Django 1.3
Django 1.2.5
Django 1.2.4
Django 1.2.3
Django 1.2.2
Django 1.2.1
Django 1.2
Django 1.1.4
Django 1.1.3
Django 1.1.2
Django 1.0.4

yolk3k is a fork of the original yolk, which ceased development in 2012. Though yolk is no longer maintained (as indicated in comments below), yolk3k appears to be maintained and supports Python 3.

Note: I am not involved in the development of yolk3k. If something doesn’t seem to work as it should, leaving a comment here should not make much difference. Use the yolk3k issue tracker instead and consider submitting a fix, if possible.


Answer 1

For pip >= 9.0 use

$ pip install pylibmc==
Collecting pylibmc==
  Could not find a version that satisfies the requirement pylibmc== (from 
  versions: 0.2, 0.3, 0.4, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5, 0.6.1, 0.6, 
  0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7, 0.8.1, 0.8.2, 0.8, 0.9.1, 0.9.2, 0.9, 
  1.0-alpha, 1.0-beta, 1.0, 1.1.1, 1.1, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0)
No matching distribution found for pylibmc==

– all the available versions will be printed without actually downloading or installing any additional packages.

For pip < 9.0 use

pip install pylibmc==blork

where blork can be any string that is not a valid version number.
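
If the list is needed inside a script, the "(from versions: ...)" part of that message can be picked apart with a regular expression. The sketch below runs on a shortened copy of the message quoted above. The function name parse_versions is made up, and the message format is not a stable interface, so treat this as an illustration only.

```python
import re


def parse_versions(pip_output):
    """Extract the version list from pip's "(from versions: ...)" message."""
    match = re.search(r'\(from\s+versions:\s*(.*?)\)', pip_output, re.DOTALL)
    if match is None:
        return []
    # Versions are comma separated and may be wrapped across lines.
    return [v.strip() for v in match.group(1).split(',') if v.strip()]


# Shortened copy of the message quoted above.
sample = (
    "Could not find a version that satisfies the requirement pylibmc== (from \n"
    "  versions: 0.2, 0.3, 0.4, 1.0-alpha, 1.0-beta, 1.0, \n"
    "  1.2.3, 1.3.0)\n"
    "No matching distribution found for pylibmc=="
)
parsed = parse_versions(sample)
print(parsed)
```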


Answer 2

Update:
As of Sep 2017 this method no longer works: --no-install was removed in pip 7

Using pip install -v, you can see all the versions that are available:

root@node7:~# pip install web.py -v
Downloading/unpacking web.py
  Using version 0.37 (newest of versions: 0.37, 0.36, 0.35, 0.34, 0.33, 0.33, 0.32, 0.31, 0.22, 0.2)
  Downloading web.py-0.37.tar.gz (90Kb): 90Kb downloaded
  Running setup.py egg_info for package web.py
    running egg_info
    creating pip-egg-info/web.py.egg-info

To avoid installing any package, use one of the following solutions:

root@node7:~# pip install --no-deps --no-install flask -v                                                                                                      
Downloading/unpacking flask
  Using version 0.10.1 (newest of versions: 0.10.1, 0.10, 0.9, 0.8.1, 0.8, 0.7.2, 0.7.1, 0.7, 0.6.1, 0.6, 0.5.2, 0.5.1, 0.5, 0.4, 0.3.1, 0.3, 0.2, 0.1)
  Downloading Flask-0.10.1.tar.gz (544Kb): 544Kb downloaded

or

root@node7:~# cd $(mktemp -d)
root@node7:/tmp/tmp.c6H99cWD0g# pip install flask -d . -v
Downloading/unpacking flask
  Using version 0.10.1 (newest of versions: 0.10.1, 0.10, 0.9, 0.8.1, 0.8, 0.7.2, 0.7.1, 0.7, 0.6.1, 0.6, 0.5.2, 0.5.1, 0.5, 0.4, 0.3.1, 0.3, 0.2, 0.1)
  Downloading Flask-0.10.1.tar.gz (544Kb): 4.1Kb downloaded

Tested with pip 1.0

root@node7:~# pip --version
pip 1.0 from /usr/lib/python2.7/dist-packages (python 2.7)

Answer 3

You don’t need a third-party package to get this information. PyPI provides simple JSON feeds for all packages under

https://pypi.python.org/pypi/{PKG_NAME}/json

Here’s some Python code using only the standard library which gets all versions.

import json
import urllib2
from distutils.version import StrictVersion

def versions(package_name):
    url = "https://pypi.python.org/pypi/%s/json" % (package_name,)
    data = json.load(urllib2.urlopen(urllib2.Request(url)))
    versions = data["releases"].keys()
    versions.sort(key=StrictVersion)
    return versions

print "\n".join(versions("scikit-image"))

That code prints (as of Feb 23rd, 2015):

0.7.2
0.8.0
0.8.1
0.8.2
0.9.0
0.9.1
0.9.2
0.9.3
0.10.0
0.10.1
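
The code above targets Python 2 (urllib2 and the print statement), and distutils was removed from the standard library in Python 3.12. A rough Python 3 sketch of the same idea follows. The function names are made up, the URL uses the pypi.org host that appears in another answer in this thread, and the sort key assumes purely numeric dotted versions such as 0.10.1, so pre-release tags like 1.0rc1 would need a real version parser.

```python
import json
from urllib import request


def sorted_versions(payload):
    """Return the release keys of a PyPI JSON payload in ascending order."""
    def key(version):
        # Assumption: purely numeric dotted versions, e.g. "0.10.1".
        return tuple(int(part) for part in version.split('.'))
    return sorted(payload['releases'], key=key)


def fetch_versions(package_name):
    """Download the JSON feed for `package_name` and list its versions."""
    url = 'https://pypi.org/pypi/%s/json' % package_name
    with request.urlopen(url) as response:
        return sorted_versions(json.load(response))


# Offline demonstration with a hand-made payload instead of a network call.
fake_payload = {'releases': {'0.10.0': [], '0.9.3': [], '0.7.2': []}}
demo = sorted_versions(fake_payload)
print(demo)
```

Calling fetch_versions("scikit-image") would perform the actual request. The numeric key matters because a plain string sort would place 0.10.0 before 0.7.2.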

Answer 4

I came up with a dead-simple bash script. Thanks to jq’s author.

#!/bin/bash
set -e

PACKAGE_JSON_URL="https://pypi.org/pypi/${1}/json"

curl -s "$PACKAGE_JSON_URL" | jq  -r '.releases | keys | .[]' | sort -V

Update: Add sorting by version number.


Answer 5

You could use the yolk3k package instead of yolk. yolk3k is a fork of the original yolk and it supports both Python 2 and 3.

https://github.com/myint/yolk

pip install yolk3k

Answer 6

After looking at pip’s code for a while, it looks like the code responsible for locating packages can be found in the PackageFinder class in pip.index. Its method find_requirement looks up the versions of an InstallRequirement, but unfortunately only returns the most recent version.

The code below is almost a 1:1 copy of the original function, with the return in line 114 changed to return all versions.

The script expects one package name as first and only argument and returns all versions.

http://pastebin.com/axzdUQhZ

I can’t guarantee its correctness, as I’m not familiar with pip’s code. But hopefully this helps.

Sample output

python test.py pip
Versions of pip
0.8.2
0.8.1
0.8
0.7.2
0.7.1
0.7
0.6.3
0.6.2
0.6.1
0.6
0.5.1
0.5
0.4
0.3.1
0.3
0.2.1
0.2 dev

The code:

import posixpath
import pkg_resources
import sys
from pip.download import url_to_path
from pip.exceptions import DistributionNotFound
from pip.index import PackageFinder, Link
from pip.log import logger
from pip.req import InstallRequirement
from pip.util import Inf


class MyPackageFinder(PackageFinder):

    def find_requirement(self, req, upgrade):
        url_name = req.url_name
        # Only check main index if index URL is given:
        main_index_url = None
        if self.index_urls:
            # Check that we have the url_name correctly spelled:
            main_index_url = Link(posixpath.join(self.index_urls[0], url_name))
            # This will also cache the page, so it's okay that we get it again later:
            page = self._get_page(main_index_url, req)
            if page is None:
                url_name = self._find_url_name(Link(self.index_urls[0]), url_name, req) or req.url_name

        # Combine index URLs with mirror URLs here to allow
        # adding more index URLs from requirements files
        all_index_urls = self.index_urls + self.mirror_urls

        def mkurl_pypi_url(url):
            loc = posixpath.join(url, url_name)
            # For maximum compatibility with easy_install, ensure the path
            # ends in a trailing slash.  Although this isn't in the spec
            # (and PyPI can handle it without the slash) some other index
            # implementations might break if they relied on easy_install's behavior.
            if not loc.endswith('/'):
                loc = loc + '/'
            return loc
        if url_name is not None:
            locations = [
                mkurl_pypi_url(url)
                for url in all_index_urls] + self.find_links
        else:
            locations = list(self.find_links)
        locations.extend(self.dependency_links)
        for version in req.absolute_versions:
            if url_name is not None and main_index_url is not None:
                locations = [
                    posixpath.join(main_index_url.url, version)] + locations

        file_locations, url_locations = self._sort_locations(locations)

        locations = [Link(url) for url in url_locations]
        logger.debug('URLs to search for versions for %s:' % req)
        for location in locations:
            logger.debug('* %s' % location)
        found_versions = []
        found_versions.extend(
            self._package_versions(
                [Link(url, '-f') for url in self.find_links], req.name.lower()))
        page_versions = []
        for page in self._get_pages(locations, req):
            logger.debug('Analyzing links from page %s' % page.url)
            logger.indent += 2
            try:
                page_versions.extend(self._package_versions(page.links, req.name.lower()))
            finally:
                logger.indent -= 2
        dependency_versions = list(self._package_versions(
            [Link(url) for url in self.dependency_links], req.name.lower()))
        if dependency_versions:
            logger.info('dependency_links found: %s' % ', '.join([link.url for parsed, link, version in dependency_versions]))
        file_versions = list(self._package_versions(
                [Link(url) for url in file_locations], req.name.lower()))
        if not found_versions and not page_versions and not dependency_versions and not file_versions:
            logger.fatal('Could not find any downloads that satisfy the requirement %s' % req)
            raise DistributionNotFound('No distributions at all found for %s' % req)
        if req.satisfied_by is not None:
            found_versions.append((req.satisfied_by.parsed_version, Inf, req.satisfied_by.version))
        if file_versions:
            file_versions.sort(reverse=True)
            logger.info('Local files found: %s' % ', '.join([url_to_path(link.url) for parsed, link, version in file_versions]))
            found_versions = file_versions + found_versions
        all_versions = found_versions + page_versions + dependency_versions
        applicable_versions = []
        for (parsed_version, link, version) in all_versions:
            if version not in req.req:
                logger.info("Ignoring link %s, version %s doesn't match %s"
                            % (link, version, ','.join([''.join(s) for s in req.req.specs])))
                continue
            applicable_versions.append((link, version))
        applicable_versions = sorted(applicable_versions, key=lambda v: pkg_resources.parse_version(v[1]), reverse=True)
        existing_applicable = bool([link for link, version in applicable_versions if link is Inf])
        if not upgrade and existing_applicable:
            if applicable_versions[0][1] is Inf:
                logger.info('Existing installed version (%s) is most up-to-date and satisfies requirement'
                            % req.satisfied_by.version)
            else:
                logger.info('Existing installed version (%s) satisfies requirement (most up-to-date version is %s)'
                            % (req.satisfied_by.version, applicable_versions[0][1]))
            return None
        if not applicable_versions:
            logger.fatal('Could not find a version that satisfies the requirement %s (from versions: %s)'
                         % (req, ', '.join([version for parsed_version, link, version in found_versions])))
            raise DistributionNotFound('No distributions matching the version for %s' % req)
        if applicable_versions[0][0] is Inf:
            # We have an existing version, and its the best version
            logger.info('Installed version (%s) is most up-to-date (past versions: %s)'
                        % (req.satisfied_by.version, ', '.join([version for link, version in applicable_versions[1:]]) or 'none'))
            return None
        if len(applicable_versions) > 1:
            logger.info('Using version %s (newest of versions: %s)' %
                        (applicable_versions[0][1], ', '.join([version for link, version in applicable_versions])))
        return applicable_versions


if __name__ == '__main__':
    req = InstallRequirement.from_line(sys.argv[1], None)
    finder = MyPackageFinder([], ['http://pypi.python.org/simple/'])
    versions = finder.find_requirement(req, False)
    print 'Versions of %s' % sys.argv[1]
    for v in versions:
        print v[1]

Answer 7

You can use this small Python 3 script to fetch the list of available versions for a package from PyPI through its JSON API and print them from the highest version to the lowest. Apart from the standard library it needs only pkg_resources, which ships with setuptools. Unlike some other Python solutions posted here, it doesn’t break on loose versions like django’s 2.2rc1 or uwsgi’s 2.0.17.1:

#!/usr/bin/env python3

import json
import sys
from urllib import request    
from pkg_resources import parse_version    

def versions(pkg_name):
    url = f'https://pypi.python.org/pypi/{pkg_name}/json'
    releases = json.loads(request.urlopen(url).read())['releases']
    return sorted(releases, key=parse_version, reverse=True)    

if __name__ == '__main__':
    print(*versions(sys.argv[1]), sep='\n')

Save the script and run it with the package name as an argument, e.g.:

python versions.py django
3.0a1
2.2.5
2.2.4
2.2.3
2.2.2
2.2.1
2.2
2.2rc1
...
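
The listing above is ordered by version number, not by release date. If a truly chronological listing is wanted, something like the sketch below might do. It assumes every file entry under `releases` carries an `upload_time` string in the `YYYY-MM-DDTHH:MM:SS` form that the PyPI JSON API currently returns, and it works on an already-decoded payload; the `sample` dict and the function name are made up for illustration.

```python
from datetime import datetime


def releases_by_upload_time(payload):
    """Return version strings ordered newest-first by earliest file upload.

    `payload` is the decoded JSON from https://pypi.org/pypi/<name>/json.
    Releases without any uploaded file are skipped.
    """
    dated = []
    for version, files in payload["releases"].items():
        if not files:
            continue  # version exists in the index but has no files
        first_upload = min(
            datetime.strptime(f["upload_time"], "%Y-%m-%dT%H:%M:%S")
            for f in files
        )
        dated.append((first_upload, version))
    return [version for _, version in sorted(dated, reverse=True)]


# Hypothetical, hand-written payload used only to exercise the function.
sample = {
    "releases": {
        "2.2": [{"upload_time": "2019-04-01T12:47:35"}],
        "2.2rc1": [{"upload_time": "2019-03-18T08:57:34"}],
        "3.0a1": [{"upload_time": "2019-09-10T09:19:33"}],
        "0.0.0": [],
    }
}

print(*releases_by_upload_time(sample), sep="\n")
```

This needs no version parser at all, at the price of trusting the upload timestamps.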

Answer 8

  • https://pypi.python.org/pypi/Django/ works for packages whose maintainers choose to show all releases.
  • https://pypi.python.org/simple/pip/ should do the trick anyhow (it lists all links).


Answer 9

This works for me on OSX:

pip install docker-compose== 2>&1 \
| grep -oE '(\(.*\))' \
| awk -F:\  '{print$NF}' \
| sed -E 's/( |\))//g' \
| tr ',' '\n'

It returns the list one per line:

1.1.0rc1
1.1.0rc2
1.1.0
1.2.0rc1
1.2.0rc2
1.2.0rc3
1.2.0rc4
1.2.0
1.3.0rc1
1.3.0rc2
1.3.0rc3
1.3.0
1.3.1
1.3.2
1.3.3
1.4.0rc1
1.4.0rc2
1.4.0rc3
1.4.0
1.4.1
1.4.2
1.5.0rc1
1.5.0rc2
1.5.0rc3
1.5.0
1.5.1
1.5.2
1.6.0rc1
1.6.0
1.6.1
1.6.2
1.7.0rc1
1.7.0rc2
1.7.0
1.7.1
1.8.0rc1
1.8.0rc2
1.8.0
1.8.1
1.9.0rc1
1.9.0rc2
1.9.0rc3
1.9.0rc4
1.9.0
1.10.0rc1
1.10.0rc2
1.10.0

Or to get the latest version available:

pip install docker-compose== 2>&1 \
| grep -oE '(\(.*\))' \
| awk -F:\  '{print$NF}' \
| sed -E 's/( |\))//g' \
| tr ',' '\n' \
| gsort -r -V \
| head -1
1.10.0rc2

Keep in mind gsort has to be installed (on OSX) to parse the versions. You can install it with brew install coreutils
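
The same extraction can be sketched in Python for anyone who would rather avoid the grep/awk/sed differences between platforms. This is only an illustration: it assumes the captured text contains pip’s "(from versions: ...)" fragment, which is an error message and not a stable interface, and the function name and sample message are made up.

```python
import re


def parse_pip_versions(stderr_text):
    """Extract the version list from pip's '(from versions: ...)' message."""
    match = re.search(r"\(from versions: ([^)]*)\)", stderr_text)
    if match is None:
        return []
    listed = match.group(1).strip()
    if listed == "none":
        # pip may print the word "none" when no versions are available
        return []
    return [v.strip() for v in listed.split(",") if v.strip()]


# Example message, shortened by hand for illustration.
message = (
    "Could not find a version that satisfies the requirement "
    "docker-compose== (from versions: 1.1.0rc1, 1.1.0rc2, 1.1.0)"
)
print("\n".join(parse_pip_versions(message)))
```

In practice the text would come from running the pip command above and capturing its stderr.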


Answer 10

My project luddite has this feature.

Example usage:

>>> import luddite
>>> luddite.get_versions_pypi("python-dateutil")
('0.1', '0.3', '0.4', '0.5', '1.0', '1.1', '1.2', '1.4', '1.4.1', '1.5', '2.0', '2.1', '2.2', '2.3', '2.4.0', '2.4.1', '2.4.2', '2.5.0', '2.5.1', '2.5.2', '2.5.3', '2.6.0', '2.6.1', '2.7.0', '2.7.1', '2.7.2', '2.7.3', '2.7.4', '2.7.5', '2.8.0')

It lists all available versions of a package by querying the JSON API of https://pypi.org/.


Answer 11

I didn’t have any luck with yolk, yolk3k or pip install -v, so I ended up using this (adapted to Python 3 from eric chiang’s answer):

import json
import requests
from distutils.version import StrictVersion

def versions(package_name):
    url = "https://pypi.python.org/pypi/{}/json".format(package_name)
    data = requests.get(url).json()
    return sorted(list(data["releases"].keys()), key=StrictVersion, reverse=True)

>>> print("\n".join(versions("gunicorn")))
19.1.1
19.1.0
19.0.0
18.0
17.5
0.17.4
0.17.3
...

Answer 12

An alternative solution is to use the Warehouse APIs:

https://warehouse.readthedocs.io/api-reference/json/#release

For instance for Flask:

import requests
r = requests.get("https://pypi.org/pypi/Flask/json")
print(r.json()['releases'].keys())

will print:

dict_keys(['0.1', '0.10', '0.10.1', '0.11', '0.11.1', '0.12', '0.12.1', '0.12.2', '0.12.3', '0.12.4', '0.2', '0.3', '0.3.1', '0.4', '0.5', '0.5.1', '0.5.2', '0.6', '0.6.1', '0.7', '0.7.1', '0.7.2', '0.8', '0.8.1', '0.9', '1.0', '1.0.1', '1.0.2'])

Answer 13

A simple bash script that relies only on python itself (which I assume is installed, given the context of the question) and on either curl or wget. It assumes the setuptools package is installed for sorting versions (it almost always is). It doesn’t rely on external dependencies such as:

  • jq, which may not be present;
  • grep and awk, which may behave differently on Linux and macOS.

curl --silent --location https://pypi.org/pypi/requests/json | python -c "import sys, json, pkg_resources; releases = json.load(sys.stdin)['releases']; print(' '.join(sorted(releases, key=pkg_resources.parse_version)))"

A slightly longer version, broken into commented steps.

Put the package name into a variable:

PACKAGE=requests

Get versions (using curl):

VERSIONS=$(curl --silent --location https://pypi.org/pypi/$PACKAGE/json | python -c "import sys, json, pkg_resources; releases = json.load(sys.stdin)['releases']; print(' '.join(sorted(releases, key=pkg_resources.parse_version)))")

Get versions (using wget):

VERSIONS=$(wget -qO- https://pypi.org/pypi/$PACKAGE/json | python -c "import sys, json, pkg_resources; releases = json.load(sys.stdin)['releases']; print(' '.join(sorted(releases, key=pkg_resources.parse_version)))")

Print sorted versions:

echo $VERSIONS

Answer 14

My take is a combination of a couple of posted answers, with some modifications to make them easier to use from within a running python environment.

The idea is to provide an entirely new command (modeled after the install command) that gives you an instance of the package finder to use. The upside is that it works with any index that pip supports and reads your local pip configuration files, so you get the same results as you would with a normal pip install.

I’ve tried to make it compatible with both pip v9.x and 10.x, but have only tested it on 9.x.

https://gist.github.com/kaos/68511bd013fcdebe766c981f50b473d4

#!/usr/bin/env python
# When you want an easy way to get at all (or the latest) versions of a certain python package from a PyPI index.

import sys
import logging

try:
    from pip._internal import cmdoptions, main
    from pip._internal.commands import commands_dict
    from pip._internal.basecommand import RequirementCommand
except ImportError:
    from pip import cmdoptions, main
    from pip.commands import commands_dict
    from pip.basecommand import RequirementCommand

from pip._vendor.packaging.version import parse as parse_version

logger = logging.getLogger('pip')

class ListPkgVersionsCommand(RequirementCommand):
    """
    List all available versions for a given package from:

    - PyPI (and other indexes) using requirement specifiers.
    - VCS project urls.
    - Local project directories.
    - Local or remote source archives.

    """
    name = "list-pkg-versions"
    usage = """
      %prog [options] <requirement specifier> [package-index-options] ...
      %prog [options] [-e] <vcs project url> ...
      %prog [options] [-e] <local project path> ...
      %prog [options] <archive url/path> ..."""

    summary = 'List package versions.'

    def __init__(self, *args, **kw):
        super(ListPkgVersionsCommand, self).__init__(*args, **kw)

        cmd_opts = self.cmd_opts

        cmd_opts.add_option(cmdoptions.install_options())
        cmd_opts.add_option(cmdoptions.global_options())
        cmd_opts.add_option(cmdoptions.use_wheel())
        cmd_opts.add_option(cmdoptions.no_use_wheel())
        cmd_opts.add_option(cmdoptions.no_binary())
        cmd_opts.add_option(cmdoptions.only_binary())
        cmd_opts.add_option(cmdoptions.pre())
        cmd_opts.add_option(cmdoptions.require_hashes())

        index_opts = cmdoptions.make_option_group(
            cmdoptions.index_group,
            self.parser,
        )

        self.parser.insert_option_group(0, index_opts)
        self.parser.insert_option_group(0, cmd_opts)

    def run(self, options, args):
        cmdoptions.resolve_wheel_no_use_binary(options)
        cmdoptions.check_install_build_global(options)

        with self._build_session(options) as session:
            finder = self._build_package_finder(options, session)

            # do what you please with the finder object here... ;)
            for pkg in args:
                logger.info(
                    '%s: %s', pkg,
                    ', '.join(
                        sorted(
                            set(str(c.version) for c in finder.find_all_candidates(pkg)),
                            key=parse_version,
                        )
                    )
                )


commands_dict[ListPkgVersionsCommand.name] = ListPkgVersionsCommand

if __name__ == '__main__':
    sys.exit(main())

Example output

./list-pkg-versions.py list-pkg-versions pika django
pika: 0.5, 0.5.1, 0.5.2, 0.9.1a0, 0.9.2a0, 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.9.7, 0.9.8, 0.9.9, 0.9.10, 0.9.11, 0.9.12, 0.9.13, 0.9.14, 0.10.0b1, 0.10.0b2, 0.10.0, 0.11.0b1, 0.11.0, 0.11.1, 0.11.2, 0.12.0b2
django: 1.1.3, 1.1.4, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7, 1.3, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.4, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.7, 1.4.8, 1.4.9, 1.4.10, 1.4.11, 1.4.12, 1.4.13, 1.4.14, 1.4.15, 1.4.16, 1.4.17, 1.4.18, 1.4.19, 1.4.20, 1.4.21, 1.4.22, 1.5, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.5.10, 1.5.11, 1.5.12, 1.6, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.6.6, 1.6.7, 1.6.8, 1.6.9, 1.6.10, 1.6.11, 1.7, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.7.8, 1.7.9, 1.7.10, 1.7.11, 1.8a1, 1.8b1, 1.8b2, 1.8rc1, 1.8, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5, 1.8.6, 1.8.7, 1.8.8, 1.8.9, 1.8.10, 1.8.11, 1.8.12, 1.8.13, 1.8.14, 1.8.15, 1.8.16, 1.8.17, 1.8.18, 1.8.19, 1.9a1, 1.9b1, 1.9rc1, 1.9rc2, 1.9, 1.9.1, 1.9.2, 1.9.3, 1.9.4, 1.9.5, 1.9.6, 1.9.7, 1.9.8, 1.9.9, 1.9.10, 1.9.11, 1.9.12, 1.9.13, 1.10a1, 1.10b1, 1.10rc1, 1.10, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6, 1.10.7, 1.10.8, 1.11a1, 1.11b1, 1.11rc1, 1.11, 1.11.1, 1.11.2, 1.11.3, 1.11.4, 1.11.5, 1.11.6, 1.11.7, 1.11.8, 1.11.9, 1.11.10, 1.11.11, 1.11.12, 2.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4

C-like structures in Python

Question: C-like structures in Python

Is there a way to conveniently define a C-like structure in Python? I’m tired of writing stuff like:

class MyStruct():
    def __init__(self, field1, field2, field3):
        self.field1 = field1
        self.field2 = field2
        self.field3 = field3

Answer 0

Use a named tuple, which was added to the collections module in the standard library in Python 2.6. It’s also possible to use Raymond Hettinger’s named tuple recipe if you need to support Python 2.4.

It’s nice for your basic example, but also covers a bunch of edge cases you might run into later as well. Your fragment above would be written as:

from collections import namedtuple
MyStruct = namedtuple("MyStruct", "field1 field2 field3")

The newly created type can be used like this:

m = MyStruct("foo", "bar", "baz")

You can also use named arguments:

m = MyStruct(field1="foo", field2="bar", field3="baz")
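
A few of the edge cases mentioned above can be seen in a short session. This is just a sketch built on the same MyStruct definition; the values are arbitrary.

```python
from collections import namedtuple

MyStruct = namedtuple("MyStruct", "field1 field2 field3")

m = MyStruct("foo", "bar", "baz")

# Fields are readable by name or by position, and the object is a real tuple.
print(m.field1, m[0], m == ("foo", "bar", "baz"))

# Instances are immutable: assigning to a field raises AttributeError.
try:
    m.field1 = "changed"
except AttributeError:
    print("cannot assign to a namedtuple field")

# _replace returns a modified copy; _asdict converts to a mapping.
m2 = m._replace(field2="BAR")
print(m2)
print(m._asdict()["field3"])
```

If you need to change fields in place, a named tuple is the wrong tool; the copy returned by _replace is the closest it gets.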

Answer 1

Update: Data Classes

With the introduction of Data Classes in Python 3.7 we get very close.

The following example is similar to the NamedTuple example below, but the resulting object is mutable and it allows for default values.

from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0


p = Point(1.5, 2.5)

print(p)  # Point(x=1.5, y=2.5, z=0.0)

This plays nicely with the new typing module in case you want to use more specific type annotations.

I’ve been waiting desperately for this! If you ask me, Data Classes and the new NamedTuple declaration, combined with the typing module are a godsend!

Improved NamedTuple declaration

Since Python 3.6 this has become quite simple and beautiful (IMHO), as long as you can live with immutability.

A new way of declaring NamedTuples was introduced, which allows for type annotations as well:

from typing import NamedTuple


class User(NamedTuple):
    name: str


class MyStruct(NamedTuple):
    foo: str
    bar: int
    baz: list
    qux: User


my_item = MyStruct('foo', 0, ['baz'], User('peter'))

print(my_item) # MyStruct(foo='foo', bar=0, baz=['baz'], qux=User(name='peter'))
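
Two data class details that tend to come up for struct-like use are mutable defaults and optional immutability. The following is a minimal sketch; the class names Sample and FrozenPoint are made up for illustration.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, field
from typing import List


@dataclass
class Sample:
    name: str
    # default_factory gives every instance its own list
    values: List[float] = field(default_factory=list)


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float


a, b = Sample("a"), Sample("b")
a.values.append(1.0)
print(a, b)
print(asdict(a))

p = FrozenPoint(1.5, 2.5)
try:
    p.x = 0.0
except FrozenInstanceError:
    print("frozen dataclass instances reject assignment")
```

With frozen=True you get roughly the immutability of a NamedTuple while keeping the dataclass syntax.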

Answer 2

You can use a tuple for a lot of things where you would use a struct in C (something like x,y coordinates or RGB colors for example).

For everything else you can use a dictionary, or a utility class like this one:

>>> class Bunch:
...     def __init__(self, **kwds):
...         self.__dict__.update(kwds)
...
>>> mystruct = Bunch(field1=value1, field2=value2)

I think the “definitive” discussion is here, in the published version of the Python Cookbook.
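
On Python 3.3 and later the standard library has a ready-made class that behaves much like Bunch, types.SimpleNamespace. A minimal sketch with arbitrary field names and values:

```python
from types import SimpleNamespace

mystruct = SimpleNamespace(field1="foo", field2=42)
mystruct.field3 = [1, 2, 3]  # new attributes can be added freely

print(mystruct)
print(vars(mystruct)["field2"])  # the attributes are also reachable as a dict
```

Unlike the bare Bunch class it also comes with a readable repr and equality comparison.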


Answer 3

Perhaps you are looking for Structs without constructors:

class Sample:
  name = ''
  average = 0.0
  values = None # list cannot be initialized here!


s1 = Sample()
s1.name = "sample 1"
s1.values = []
s1.values.append(1)
s1.values.append(2)
s1.values.append(3)

s2 = Sample()
s2.name = "sample 2"
s2.values = []
s2.values.append(4)

for v in s1.values:   # prints 1,2,3 --> OK.
  print(v)
print("***")
for v in s2.values:   # prints 4 --> OK.
  print(v)
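
The "list cannot be initialized here!" comment refers to the fact that a list written in the class body is one object shared by every instance. A minimal sketch of the difference, with made-up class names:

```python
class Shared:
    values = []  # one list object, stored on the class


class PerInstance:
    def __init__(self):
        self.values = []  # a new list for every instance


a, b = Shared(), Shared()
a.values.append(1)
print(b.values)            # the append is visible through b as well
print(a.values is b.values)

c, d = PerInstance(), PerInstance()
c.values.append(1)
print(d.values)            # d keeps its own, still empty, list
```

Immutable defaults such as '' or 0.0 are safe at class level because assigning to them on an instance simply creates an instance attribute.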

Answer 4

How about a dictionary?

Something like this:

myStruct = {'field1': 'some val', 'field2': 'some val'}

Then you can use this to manipulate values:

print(myStruct['field1'])
myStruct['field2'] = 'some other values'

And the values don’t have to be strings. They can be pretty much any other object.


Answer 5

dF: that’s pretty cool… I didn’t know that I could access the fields in a class using dict.

Mark: the situations that I wish I had this are precisely when I want a tuple but nothing as “heavy” as a dictionary.

You can access the fields of a class using a dictionary because the fields of a class, its methods and all its properties are stored internally using dicts (at least in CPython).

…Which leads us to your second comment. Believing that Python dicts are “heavy” is an extremely non-pythonistic concept. And reading such comments kills my Python Zen. That’s not good.

You see, when you declare a class you are actually creating a pretty complex wrapper around a dictionary – so, if anything, you are adding more overhead than by using a simple dictionary. An overhead which, by the way, is meaningless in any case. If you are working on performance critical applications, use C or something.
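
The claim that instance fields live in a dict is easy to check. This is a minimal sketch that assumes CPython and a class that does not define __slots__; the Point class is only an illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Instance attributes are stored in an ordinary dict.
print(vars(p))

# Writing to that dict has the same effect as setting an attribute.
p.__dict__["z"] = 3
print(p.z)

# Methods and other class-level names live in the class's own mapping.
print("__init__" in vars(Point))
```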


Answer 6

You can subclass the C structure that is available in the standard library. The ctypes module provides a Structure class. The example from the docs:

>>> from ctypes import *
>>> class POINT(Structure):
...     _fields_ = [("x", c_int),
...                 ("y", c_int)]
...
>>> point = POINT(10, 20)
>>> print point.x, point.y
10 20
>>> point = POINT(y=5)
>>> print point.x, point.y
0 5
>>> POINT(1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: too many initializers
>>>
>>> class RECT(Structure):
...     _fields_ = [("upperleft", POINT),
...                 ("lowerright", POINT)]
...
>>> rc = RECT(point)
>>> print rc.upperleft.x, rc.upperleft.y
0 5
>>> print rc.lowerright.x, rc.lowerright.y
0 0
>>>
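
The session above uses Python 2 print statements. A Python 3 sketch of the same two structures follows, with explicit imports in place of the star import and two extra checks on the memory layout that I added for illustration.

```python
from ctypes import Structure, c_int, sizeof


class POINT(Structure):
    _fields_ = [("x", c_int), ("y", c_int)]


class RECT(Structure):
    _fields_ = [("upperleft", POINT), ("lowerright", POINT)]


point = POINT(y=5)  # fields that are not given default to zero
rc = RECT(point)    # lowerright is zero-initialised
print(point.x, point.y)
print(rc.upperleft.y, rc.lowerright.y)

# The layout follows the C struct, so size and raw bytes are available.
print(sizeof(POINT) == 2 * sizeof(c_int))
print(len(bytes(rc)) == sizeof(RECT))
```

This is mostly worth the trouble when the data really has to be exchanged with C code; for plain record keeping the other answers are lighter.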

Answer 7

I would also like to add a solution that uses slots:

class Point:
    __slots__ = ["x", "y"]
    def __init__(self, x, y):
        self.x = x
        self.y = y

Definitely check the documentation for slots, but a quick explanation is that it is python’s way of saying: “If you lock these attributes, and only these attributes, into the class, committing to not add any new attributes once the class is instantiated (normally you can add new attributes to a class instance, see the example below), then I will do away with the large memory allocation that allows for adding new attributes to an instance and use just what I need for these slotted attributes”.

Example of adding attributes to class instance (thus not using slots):

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(3,5)
p1.z = 8
print(p1.z)

Output: 8

Example of trying to add attributes to class instance where slots was used:

class Point:
    __slots__ = ["x", "y"]
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(3,5)
p1.z = 8

Output: AttributeError: 'Point' object has no attribute 'z'

This can effectively work as a struct and uses less memory than a regular class (as a struct would, although I have not researched exactly how much). Slots are recommended if you will be creating a large number of instances and do not need to add attributes. A point object is a good example, since one is likely to instantiate many points to describe a dataset.
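
The saving comes from slotted instances not carrying a per-instance __dict__. A minimal sketch that puts the two variants side by side; the class names Plain and Slotted are made up for illustration.

```python
class Plain:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Slotted:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain, slotted = Plain(3, 5), Slotted(3, 5)

print(hasattr(plain, "__dict__"))    # regular instances carry a dict
print(hasattr(slotted, "__dict__"))  # slotted instances do not

try:
    slotted.z = 8
except AttributeError as exc:
    print("AttributeError:", exc)
```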


Answer 8

You can also pass the init parameters to the instance variables by position:

# Abstract struct class       
class Struct:
    def __init__ (self, *argv, **argd):
        if len(argd):
            # Update by dictionary
            self.__dict__.update (argd)
        else:
            # Update by position
            attrs = filter (lambda x: x[0:2] != "__", dir(self))
            for n in range(len(argv)):
                setattr(self, attrs[n], argv[n])

# Specific class
class Point3dStruct (Struct):
    x = 0
    y = 0
    z = 0

pt1 = Point3dStruct()
pt1.x = 10

print pt1.x
print "-"*10

pt2 = Point3dStruct(5, 6)

print pt2.x, pt2.y
print "-"*10

pt3 = Point3dStruct (x=1, y=2, z=3)
print pt3.x, pt3.y, pt3.z
print "-"*10
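
Two caveats about the code above: dir() returns names in alphabetical order, so the positional order only matches the declaration order because x, y, z happen to sort that way, and on Python 3 filter() returns an iterator, so attrs[n] fails. One possible Python 3 variant keeps the field order explicit. This is a sketch, and the `_fields` attribute is a convention invented here, not something the original code uses.

```python
class Struct:
    """Base class: subclasses list their fields, in order, in `_fields`."""

    _fields = ()  # hypothetical convention, introduced for this sketch

    def __init__(self, *args, **kwargs):
        if len(args) > len(self._fields):
            raise TypeError("too many positional arguments")
        # Positional values are matched to `_fields` in declaration order.
        for name, value in zip(self._fields, args):
            setattr(self, name, value)
        for name, value in kwargs.items():
            if name not in self._fields:
                raise TypeError("unknown field: %s" % name)
            setattr(self, name, value)


class Point3dStruct(Struct):
    _fields = ("x", "y", "z")
    x = 0
    y = 0
    z = 0


pt2 = Point3dStruct(5, 6)
print(pt2.x, pt2.y, pt2.z)

pt3 = Point3dStruct(1, z=3)
print(pt3.x, pt3.y, pt3.z)
```

Unlike the original, this version also lets positional and keyword arguments be mixed in one call.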

Answer 9

Whenever I need an “instant data object that also behaves like a dictionary” (I don’t think of C structs!), I think of this cute hack:

class Map(dict):
    def __init__(self, **kwargs):
        super(Map, self).__init__(**kwargs)
        self.__dict__ = self

Now you can just say:

struct = Map(field1='foo', field2='bar', field3=42)

assert struct.field2 == 'bar'
assert struct['field3'] == 42

Perfectly handy for those times when you need a “data bag that’s NOT a class”, and for when namedtuples are incomprehensible…
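
A Python 3 sketch of the same hack, showing that attribute access and item access really share one storage; the extra field names are arbitrary.

```python
class Map(dict):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.__dict__ = self  # attributes and items share the same storage


struct = Map(field1="foo", field2="bar", field3=42)
struct.field4 = "new"   # also appears as a key
struct["field5"] = 5    # also appears as an attribute

print(struct["field4"], struct.field5)
print(sorted(struct))
```

One thing to watch, as far as I can tell: a key named like a dict method (for example items) will shadow that method on attribute access, so this is best kept for keys you control.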


Answer 10

You can get a C-style struct in python in the following way.

class cstruct:
    var_i = 0
    var_f = 0.0
    var_str = ""

If you just want to use a single cstruct object:

obj = cstruct()
obj.var_i = 50
obj.var_f = 50.00
obj.var_str = "fifty"
print("cstruct: obj i=%d f=%f s=%s" % (obj.var_i, obj.var_f, obj.var_str))

If you want to create an array of cstruct objects:

obj_array = [cstruct() for i in range(10)]
obj_array[0].var_i = 10
obj_array[0].var_f = 10.00
obj_array[0].var_str = "ten"

# go ahead and fill the rest of the struct instances in the array

# print all the values
for obj in obj_array:
    print("cstruct: obj_array i=%d f=%f s=%s" % (obj.var_i, obj.var_f, obj.var_str))

Note: replace the name ‘cstruct’ with your own struct name, and replace var_i, var_f and var_str with your structure’s member variables.


回答 11

这里的一些答案非常详尽。我找到的最简单的选项是(来自:http://norvig.com/python-iaq.html):

class Struct:
    "A structure that can have any fields defined."
    def __init__(self, **entries): self.__dict__.update(entries)

初始化:

>>> options = Struct(answer=42, linelen=80, font='courier')
>>> options.answer
42

增加更多:

>>> options.cat = "dog"
>>> options.cat
'dog'

编辑:对不起,没有看到这个例子。

Some of the answers here are massively elaborate. The simplest option I’ve found is (from: http://norvig.com/python-iaq.html):

class Struct:
    "A structure that can have any fields defined."
    def __init__(self, **entries): self.__dict__.update(entries)

Initialising:

>>> options = Struct(answer=42, linelen=80, font='courier')
>>> options.answer
42

adding more:

>>> options.cat = "dog"
>>> options.cat
'dog'

Edit: sorry, I didn’t see that this example already appears further down.


回答 12

这可能有点晚了,但是我使用Python Meta-Classes(也是下面的装饰器版本)提出了一个解决方案。

什么时候 __init__在运行时被调用时,它抓住每个参数和它们的值,并将它们分配为实例变量上您的课。这样,您可以制作类似结构的类,而不必手动分配每个值。

我的示例没有错误检查,因此更容易理解。

class MyStruct(type):
    def __call__(cls, *args, **kwargs):
        names = cls.__init__.func_code.co_varnames[1:]

        self = type.__call__(cls, *args, **kwargs)

        for name, value in zip(names, args):
            setattr(self , name, value)

        for name, value in kwargs.iteritems():
            setattr(self , name, value)
        return self 

它在起作用。

>>> class MyClass(object):
    __metaclass__ = MyStruct
    def __init__(self, a, b, c):
        pass


>>> my_instance = MyClass(1, 2, 3)
>>> my_instance.a
1
>>> 

将其发布在reddit上/ u / matchu发布了一个更干净的装饰器版本。除非您要扩展元类版本,否则我建议您使用它。

>>> from functools import wraps
>>> def init_all_args(fn):
    @wraps(fn)
    def wrapped_init(self, *args, **kwargs):
        names = fn.func_code.co_varnames[1:]

        for name, value in zip(names, args):
            setattr(self, name, value)

        for name, value in kwargs.iteritems():
            setattr(self, name, value)

        # Run the original __init__ body once the attributes are set.
        return fn(self, *args, **kwargs)

    return wrapped_init

>>> class Test(object):
    @init_all_args
    def __init__(self, a, b):
        pass


>>> a = Test(1, 2)
>>> a.a
1
>>> 

This might be a bit late but I made a solution using Python Meta-Classes (decorator version below too).

When __init__ is called during run time, it grabs each of the arguments and their value and assigns them as instance variables to your class. This way you can make a struct-like class without having to assign every value manually.

My example has no error checking so it is easier to follow.

class MyStruct(type):
    def __call__(cls, *args, **kwargs):
        names = cls.__init__.func_code.co_varnames[1:]

        self = type.__call__(cls, *args, **kwargs)

        for name, value in zip(names, args):
            setattr(self , name, value)

        for name, value in kwargs.iteritems():
            setattr(self , name, value)
        return self 

Here it is in action.

>>> class MyClass(object):
    __metaclass__ = MyStruct
    def __init__(self, a, b, c):
        pass


>>> my_instance = MyClass(1, 2, 3)
>>> my_instance.a
1
>>> 

I posted it on reddit and /u/matchu posted a decorator version which is cleaner. I’d encourage you to use it unless you want to expand the metaclass version.

>>> from functools import wraps
>>> def init_all_args(fn):
    @wraps(fn)
    def wrapped_init(self, *args, **kwargs):
        names = fn.func_code.co_varnames[1:]

        for name, value in zip(names, args):
            setattr(self, name, value)

        for name, value in kwargs.iteritems():
            setattr(self, name, value)

        # Run the original __init__ body once the attributes are set.
        return fn(self, *args, **kwargs)

    return wrapped_init

>>> class Test(object):
    @init_all_args
    def __init__(self, a, b):
        pass


>>> a = Test(1, 2)
>>> a.a
1
>>> 

回答 13

我编写了一个装饰器,可以将其用于任何方法,以便将传入的所有参数或任何默认值分配给该实例。

def argumentsToAttributes(method):
    argumentNames = method.func_code.co_varnames[1:]

    # Generate a dictionary of default values:
    defaultsDict = {}
    defaults = method.func_defaults if method.func_defaults else ()
    for i, default in enumerate(defaults, start = len(argumentNames) - len(defaults)):
        defaultsDict[argumentNames[i]] = default

    def newMethod(self, *args, **kwargs):
        # Use the positional arguments.
        for name, value in zip(argumentNames, args):
            setattr(self, name, value)

        # Add the key word arguments. If anything is missing, use the default.
        for name in argumentNames[len(args):]:
            setattr(self, name, kwargs.get(name, defaultsDict[name]))

        # Run whatever else the method needs to do.
        method(self, *args, **kwargs)

    return newMethod

快速演示。请注意,我使用位置参数a,使用默认值b和命名参数c。然后self,我打印所有3个引用,以显示在输入方法之前已正确分配了它们。

class A(object):
    @argumentsToAttributes
    def __init__(self, a, b = 'Invisible', c = 'Hello'):
        print(self.a)
        print(self.b)
        print(self.c)

A('Why', c = 'Nothing')

请注意,我的装饰器应使用任何方法,而不仅仅是__init__

I wrote a decorator which you can use on any method to make it so that all of the arguments passed in, or any defaults, are assigned to the instance.

def argumentsToAttributes(method):
    argumentNames = method.func_code.co_varnames[1:]

    # Generate a dictionary of default values:
    defaultsDict = {}
    defaults = method.func_defaults if method.func_defaults else ()
    for i, default in enumerate(defaults, start = len(argumentNames) - len(defaults)):
        defaultsDict[argumentNames[i]] = default

    def newMethod(self, *args, **kwargs):
        # Use the positional arguments.
        for name, value in zip(argumentNames, args):
            setattr(self, name, value)

        # Add the key word arguments. If anything is missing, use the default.
        for name in argumentNames[len(args):]:
            setattr(self, name, kwargs.get(name, defaultsDict[name]))

        # Run whatever else the method needs to do.
        method(self, *args, **kwargs)

    return newMethod

A quick demonstration. Note that I use a positional argument a, use the default value for b, and a named argument c. I then print all 3 referencing self, to show that they’ve been properly assigned before the method is entered.

class A(object):
    @argumentsToAttributes
    def __init__(self, a, b = 'Invisible', c = 'Hello'):
        print(self.a)
        print(self.b)
        print(self.c)

A('Why', c = 'Nothing')

Note that my decorator should work with any method, not just __init__.
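
The decorator above relies on func_code and func_defaults, which exist only in Python 2. A rough Python 3 sketch of the same idea follows, assuming inspect.signature is an acceptable substitute; the name arguments_to_attributes is hypothetical and not part of the original answer.

```python
import functools
import inspect


def arguments_to_attributes(method):
    """Copy each argument (or its default) onto the instance, then call method."""
    signature = inspect.signature(method)

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        bound = signature.bind(self, *args, **kwargs)
        bound.apply_defaults()  # fill in parameters the caller left out
        # The first bound entry is the instance itself, so skip it.
        for name, value in list(bound.arguments.items())[1:]:
            setattr(self, name, value)
        return method(self, *args, **kwargs)

    return wrapper


class A:
    @arguments_to_attributes
    def __init__(self, a, b='Invisible', c='Hello'):
        # The attributes are already set when the body runs.
        self.summary = (self.a, self.b, self.c)


instance = A('Why', c='Nothing')
print(instance.summary)  # ('Why', 'Invisible', 'Nothing')
```

Binding through the signature also means that a call with a missing or unexpected argument raises TypeError before any attribute is set.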


回答 14

我在这里看不到这个答案,因此我想添加一下,因为我现在倾向于使用Python并发现了它。所述的Python教程(Python 2中在这种情况下)给出以下简单而有效的实施例:

class Employee:
    pass

john = Employee()  # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000

即,创建一个空的类对象,然后实例化该字段,并动态添加字段。

这样做的好处是非常简单。缺点是它不是特别自我记录(预期的成员未在“定义”类中的任何位置列出),并且未设置的字段在访问时会引起问题。这两个问题可以通过以下方法解决:

class Employee:
    def __init__ (self):
        self.name = None # or whatever
        self.dept = None
        self.salary = None

现在,您至少可以一眼看出程序将期望哪些字段。

两者都容易出现错别字,john.slarly = 1000会成功。仍然可以。

I don’t see this answer here, so I figure I’ll add it since I’m learning Python right now and just discovered it. The Python tutorial (Python 2 in this case) gives the following simple and effective example:

class Employee:
    pass

john = Employee()  # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000

That is, an empty class object is created, then instantiated, and the fields are added dynamically.

The upside to this is that it’s really simple. The downside is it isn’t particularly self-documenting (the intended members aren’t listed anywhere in the class “definition”), and unset fields can cause problems when accessed. Those two problems can be solved by:

class Employee:
    def __init__ (self):
        self.name = None # or whatever
        self.dept = None
        self.salary = None

Now at a glance you can at least see what fields the program will be expecting.

Both versions are prone to typos: john.slarly = 1000 will succeed silently. Still, it works.
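
One possible way to catch such typos, offered here as a sketch rather than something the tutorial suggests, is to declare __slots__ so that only the listed attribute names can be assigned. The keyword defaults in __init__ are an assumption added for convenience.

```python
class Employee:
    # Only these attribute names may be set on instances.
    __slots__ = ('name', 'dept', 'salary')

    def __init__(self, name=None, dept=None, salary=None):
        self.name = name
        self.dept = dept
        self.salary = salary


john = Employee(name='John Doe', dept='computer lab')
john.salary = 1000

try:
    john.slarly = 1000  # misspelled attribute name
except AttributeError as error:
    typo_error = error
    print('rejected:', error)
```

Note that a subclass which does not declare __slots__ itself gets a __dict__ again, so the protection applies only to classes that opt in.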


回答 15

这是一个使用类(从未实例化)保存数据的解决方案。我喜欢这种方式,几乎不需要打字,也不需要任何其他软件包

class myStruct:
    field1 = "one"
    field2 = "2"

您以后可以根据需要添加更多字段:

myStruct.field3 = 3

要获取值,请照常访问这些字段:

>>> myStruct.field1
'one'

Here is a solution which uses a class (never instantiated) to hold data. I like that this way involves very little typing and does not require any additional packages etc.

class myStruct:
    field1 = "one"
    field2 = "2"

You can add more fields later, as needed:

myStruct.field3 = 3

To get the values, the fields are accessed as usual:

>>> myStruct.field1
'one'

回答 16

我个人也喜欢这个变体。它扩展了@dF的答案

class struct:
    def __init__(self, *sequential, **named):
        fields = dict(zip(sequential, [None]*len(sequential)), **named)
        self.__dict__.update(fields)
    def __repr__(self):
        return str(self.__dict__)

它支持两种初始化模式(可以混合使用):

# Struct with field1, field2, field3 that are initialized to None.
mystruct1 = struct("field1", "field2", "field3") 
# Struct with field1, field2, field3 that are initialized according to arguments.
mystruct2 = struct(field1=1, field2=2, field3=3)

此外,它的打印效果更好:

print(mystruct2)
# Prints: {'field3': 3, 'field1': 1, 'field2': 2}

Personally, I like this variant too. It extends @dF’s answer.

class struct:
    def __init__(self, *sequential, **named):
        fields = dict(zip(sequential, [None]*len(sequential)), **named)
        self.__dict__.update(fields)
    def __repr__(self):
        return str(self.__dict__)

It supports two modes of initialization (that can be blended):

# Struct with field1, field2, field3 that are initialized to None.
mystruct1 = struct("field1", "field2", "field3") 
# Struct with field1, field2, field3 that are initialized according to arguments.
mystruct2 = struct(field1=1, field2=2, field3=3)

Also, it prints nicer:

print(mystruct2)
# Prints: {'field3': 3, 'field1': 1, 'field2': 2}

回答 17

以下对结构的解决方案的灵感来自namedtuple实现和一些先前的答案。但是,与namedtuple不同,它的值是可变的,但是就像名称/属性中不可变的c样式结构一样,而普通的类或dict则不是。

_class_template = """\
class {typename}:
    def __init__(self, *args, **kwargs):
        fields = {field_names!r}

        for x in fields:
            setattr(self, x, None)

        for name, value in zip(fields, args):
            setattr(self, name, value)

        for name, value in kwargs.items():
            setattr(self, name, value)

    def __repr__(self):
        return str(vars(self))

    def __setattr__(self, name, value):
        if name not in {field_names!r}:
            raise KeyError("invalid name: %s" % name)
        object.__setattr__(self, name, value)
"""

def struct(typename, field_names):

    class_definition = _class_template.format(
        typename = typename,
        field_names = field_names)

    namespace = dict(__name__='struct_%s' % typename)
    exec(class_definition, namespace)
    result = namespace[typename]
    result._source = class_definition

    return result

用法:

Person = struct('Person', ['firstname','lastname'])
generic = Person()
michael = Person('Michael')
jones = Person(lastname = 'Jones')


In [168]: michael.middlename = 'ben'
Traceback (most recent call last):

  File "<ipython-input-168-b31c393c0d67>", line 1, in <module>
michael.middlename = 'ben'

  File "<string>", line 19, in __setattr__

KeyError: 'invalid name: middlename'

The following solution to a struct is inspired by the namedtuple implementation and some of the previous answers. However, unlike a namedtuple it is mutable in its values, but like a C-style struct it is immutable in its names/attributes, which a normal class or dict isn’t.

_class_template = """\
class {typename}:
    def __init__(self, *args, **kwargs):
        fields = {field_names!r}

        for x in fields:
            setattr(self, x, None)

        for name, value in zip(fields, args):
            setattr(self, name, value)

        for name, value in kwargs.items():
            setattr(self, name, value)

    def __repr__(self):
        return str(vars(self))

    def __setattr__(self, name, value):
        if name not in {field_names!r}:
            raise KeyError("invalid name: %s" % name)
        object.__setattr__(self, name, value)
"""

def struct(typename, field_names):

    class_definition = _class_template.format(
        typename = typename,
        field_names = field_names)

    namespace = dict(__name__='struct_%s' % typename)
    exec(class_definition, namespace)
    result = namespace[typename]
    result._source = class_definition

    return result

Usage:

Person = struct('Person', ['firstname','lastname'])
generic = Person()
michael = Person('Michael')
jones = Person(lastname = 'Jones')


In [168]: michael.middlename = 'ben'
Traceback (most recent call last):

  File "<ipython-input-168-b31c393c0d67>", line 1, in <module>
michael.middlename = 'ben'

  File "<string>", line 19, in __setattr__

KeyError: 'invalid name: middlename'

回答 18

有一个专门用于此目的的python包。见cstruct2py

cstruct2py是一个纯Python库,用于从C代码生成python类,并使用它们来打包和解压缩数据。该库可以解析C头(结构,联合,枚举和数组声明),并在python中进行仿真。生成的pythonic类可以解析和打包数据。

例如:

typedef struct {
  int x;
  int y;
} Point;

after generating pythonic class...
p = Point(x=0x1234, y=0x5678)
p.packed == "\x34\x12\x00\x00\x78\x56\x00\x00"

如何使用

首先,我们需要生成pythonic结构:

import cstruct2py
parser = cstruct2py.c2py.Parser()
parser.parse_file('examples/example.h')

现在我们可以从C代码导入所有名称:

parser.update_globals(globals())

我们也可以直接这样做:

A = parser.parse_string('struct A { int x; int y;};')

从C代码使用类型和定义

a = A()
a.x = 45
print a
buf = a.packed
b = A(buf)
print b
c = A('aaaa11112222', 2)
print c
print repr(c)

输出将是:

{'x':0x2d, 'y':0x0}
{'x':0x2d, 'y':0x0}
{'x':0x31316161, 'y':0x32323131}
A('aa111122', x=0x31316161, y=0x32323131)

克隆

对于克隆cstruct2py运行:

git clone https://github.com/st0ky/cstruct2py.git --recursive

There is a Python package exactly for this purpose: see cstruct2py.

cstruct2py is a pure Python library for generating Python classes from C code and using them to pack and unpack data. The library can parse C headers (struct, union, enum, and array declarations) and emulate them in Python. The generated Pythonic classes can parse and pack the data.

For example:

typedef struct {
  int x;
  int y;
} Point;

after generating pythonic class...
p = Point(x=0x1234, y=0x5678)
p.packed == "\x34\x12\x00\x00\x78\x56\x00\x00"

How to use

First we need to generate the pythonic structs:

import cstruct2py
parser = cstruct2py.c2py.Parser()
parser.parse_file('examples/example.h')

Now we can import all names from the C code:

parser.update_globals(globals())

We can also do that directly:

A = parser.parse_string('struct A { int x; int y;};')

Using types and defines from the C code

a = A()
a.x = 45
print a
buf = a.packed
b = A(buf)
print b
c = A('aaaa11112222', 2)
print c
print repr(c)

The output will be:

{'x':0x2d, 'y':0x0}
{'x':0x2d, 'y':0x0}
{'x':0x31316161, 'y':0x32323131}
A('aa111122', x=0x31316161, y=0x32323131)

Clone

To clone cstruct2py, run:

git clone https://github.com/st0ky/cstruct2py.git --recursive
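
For a layout as simple as the Point example, the standard library struct module can produce the same bytes without a third-party package. This is a different technique from cstruct2py: it does not parse C headers, and the format string below is an assumption that the struct is two little-endian 4-byte ints with no padding.

```python
import struct

# "<" selects little-endian with standard sizes; "i" is a 4-byte signed int.
POINT_FORMAT = '<ii'

packed = struct.pack(POINT_FORMAT, 0x1234, 0x5678)
print(packed)

# Unpacking returns a plain tuple in field order.
x, y = struct.unpack(POINT_FORMAT, packed)
print(hex(x), hex(y))
```

The format string has to be kept in sync with the C declaration by hand, which is the step cstruct2py automates.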

回答 19

我认为Python结构字典适合此要求。

d = {}
# field1, field2 and field3 stand for the values you want to store
d['field1'] = field1
d['field2'] = field2
d['field3'] = field3

I think a Python dictionary is suitable for this requirement.

d = {}
# field1, field2 and field3 stand for the values you want to store
d['field1'] = field1
d['field2'] = field2
d['field3'] = field3

回答 20

https://stackoverflow.com/a/32448434/159695在Python3中不起作用。

https://stackoverflow.com/a/35993/159695可在Python3中使用。

我将其扩展为添加默认值。

class myStruct:
    def __init__(self, **kwds):
        self.x=0
        self.__dict__.update(kwds) # Must be last to accept assigned member variable.
    def __repr__(self):
        args = ['%s=%s' % (k, repr(v)) for (k,v) in vars(self).items()]
        return '%s(%s)' % ( self.__class__.__qualname__, ', '.join(args) )

a=myStruct()
b=myStruct(x=3,y='test')
c=myStruct(x='str')

>>> a
myStruct(x=0)
>>> b
myStruct(x=3, y='test')
>>> c
myStruct(x='str')

https://stackoverflow.com/a/32448434/159695 does not work in Python3.

https://stackoverflow.com/a/35993/159695 works in Python3.

I extended it to add default values.

class myStruct:
    def __init__(self, **kwds):
        self.x=0
        self.__dict__.update(kwds) # Must be last to accept assigned member variable.
    def __repr__(self):
        args = ['%s=%s' % (k, repr(v)) for (k,v) in vars(self).items()]
        return '%s(%s)' % ( self.__class__.__qualname__, ', '.join(args) )

a=myStruct()
b=myStruct(x=3,y='test')
c=myStruct(x='str')

>>> a
myStruct(x=0)
>>> b
myStruct(x=3, y='test')
>>> c
myStruct(x='str')

回答 21

如果@dataclass没有3.7,并且需要可变性,则以下代码可能对您有用。它具有很好的自我说明性和IDE友好性(自动完成),可防止重复编写两次,易于扩展,并且测试所有实例变量是否已完全初始化非常简单:

class Params():
    def __init__(self):
        self.var1 : int = None
        self.var2 : str = None

    def are_all_defined(self):
        for key, value in self.__dict__.items():
            assert (value is not None), "instance variable {} is still None".format(key)
        return True


params = Params()
params.var1 = 2
params.var2 = 'hello'
assert params.are_all_defined()

If you don’t have Python 3.7+ for @dataclass and need mutability, the following code might work for you. It’s quite self-documenting and IDE-friendly (auto-complete), avoids writing things twice, is easily extendable, and makes it very simple to test that all instance variables are completely initialized:

class Params():
    def __init__(self):
        self.var1 : int = None
        self.var2 : str = None

    def are_all_defined(self):
        for key, value in self.__dict__.items():
            assert (value is not None), "instance variable {} is still None".format(key)
        return True


params = Params()
params.var1 = 2
params.var2 = 'hello'
assert params.are_all_defined()
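
For readers who do have Python 3.7 or later, a roughly equivalent dataclass might look like the sketch below. The Optional annotations and the boolean-returning are_all_defined are assumptions of this sketch, not part of the answer above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Params:
    var1: Optional[int] = None
    var2: Optional[str] = None

    def are_all_defined(self):
        """Return True only when no field is still None."""
        return all(getattr(self, f.name) is not None for f in fields(self))


params = Params()
params.var1 = 2
print(params.are_all_defined())  # False, var2 is still None
params.var2 = 'hello'
print(params.are_all_defined())  # True
```

Dataclass instances are mutable by default, and the decorator also generates __init__, __repr__ and __eq__ from the field declarations.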

回答 22

这是一个快速而肮脏的把戏:

>>> ms = Warning()
>>> ms.foo = 123
>>> ms.bar = 'akafrit'

如何运作?它只是重复使用内置类Warning(从派生Exception),并使用它,因为它是您自己定义的类。

优点是您不需要首先导入或定义任何内容,“警告”是一个简短的名称,并且还可以清楚地表明您正在做一些肮脏的事情,除了您的小型脚本之外,其他任何地方都不应使用。

顺便说一句,我试图找到一些更简单的东西,ms = object()但是没有(最后一个例子不起作用)。如果您有一个,我很感兴趣。

Here is a quick and dirty trick:

>>> ms = Warning()
>>> ms.foo = 123
>>> ms.bar = 'akafrit'

How does it work? It just reuses the built-in class Warning (derived from Exception) and uses it as if it were your own class.

The good points are that you do not need to import or define anything first, that “Warning” is a short name, and that it also makes clear you are doing something dirty which should not be used anywhere other than a small script of yours.

By the way, I tried to find something even simpler, like ms = object(), but could not (that last example does not work). If you have one, I am interested.
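
The reason ms = object() fails is that plain object instances have no __dict__ in which to store new attributes. As one possible answer to the request above, and assuming Python 3.3 or later and that a single import is acceptable, types.SimpleNamespace gives an empty attribute bag:

```python
from types import SimpleNamespace

try:
    plain = object()
    plain.foo = 123  # object instances have no __dict__ to store this in
except AttributeError as error:
    failure = str(error)

# SimpleNamespace accepts arbitrary attributes, like the Warning trick does.
ms = SimpleNamespace()
ms.foo = 123
ms.bar = 'akafrit'
print(ms)
```

Unlike Warning, this does not pretend to be an exception type, at the cost of the import.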


回答 23

我发现做到这一点的最佳方法是使用自定义词典类,如本文中所述:https://stackoverflow.com/a/14620633/8484485

如果需要iPython自动补全支持,只需定义__dir__()方法,如下所示:

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self
    def __dir__(self):
        return self.keys()

然后,您可以像这样定义伪结构:(此嵌套)

my_struct=AttrDict ({
    'com1':AttrDict ({
        'inst':[0x05],
        'numbytes':2,
        'canpayload':False,
        'payload':None
    })
})

然后,您可以像这样访问my_struct内的值:

print(my_struct.com1.inst)

=>[5]

The best way I found to do this was to use a custom dictionary class as explained in this post: https://stackoverflow.com/a/14620633/8484485

If IPython autocompletion support is needed, simply define the __dir__() method like this:

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self
    def __dir__(self):
        return self.keys()

You then define your pseudo struct like so: (this one is nested)

my_struct=AttrDict ({
    'com1':AttrDict ({
        'inst':[0x05],
        'numbytes':2,
        'canpayload':False,
        'payload':None
    })
})

You can then access the values inside my_struct like this:

print(my_struct.com1.inst)

=>[5]


回答 24

NamedTuple很舒服。但没有人共享性能和存储空间。

from typing import NamedTuple
import guppy  # pip install guppy
import timeit


class User:
    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserSlot:
    __slots__ = ('name', 'uid')

    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserTuple(NamedTuple):
    # __slots__ = ()  # AttributeError: Cannot overwrite NamedTuple attribute __slots__
    name: str
    uid: int


def get_fn(obj, attr_name: str):
    def get():
        getattr(obj, attr_name)
    return get
if 'memory test':
    obj = [User('Carson', 1) for _ in range(1000000)]      # Cumulative: 189138883
    obj_slot = [UserSlot('Carson', 1) for _ in range(1000000)]          # 77718299  <-- winner
    obj_namedtuple = [UserTuple('Carson', 1) for _ in range(1000000)]   # 85718297
    print(guppy.hpy().heap())  # Run this function individually. 
    """
    Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000    24 112000000 34 112000000  34 dict of __main__.User
     1 1000000    24 64000000  19 176000000  53 __main__.UserTuple
     2 1000000    24 56000000  17 232000000  70 __main__.User
     3 1000000    24 56000000  17 288000000  87 __main__.UserSlot
     ...
    """

if 'performance test':
    obj = User('Carson', 1)
    obj_slot = UserSlot('Carson', 1)
    obj_tuple = UserTuple('Carson', 1)

    time_normal = min(timeit.repeat(get_fn(obj, 'name'), repeat=20))
    print(time_normal)  # 0.12550550000000005

    time_slot = min(timeit.repeat(get_fn(obj_slot, 'name'), repeat=20))
    print(time_slot)  # 0.1368690000000008

    time_tuple = min(timeit.repeat(get_fn(obj_tuple, 'name'), repeat=20))
    print(time_tuple)  # 0.16006120000000124

    print(time_tuple/time_slot)  # 1.1694481584580898  # The slot is almost 17% faster than NamedTuple on Windows. (Python 3.7.7)

如果您__dict__不使用,请在__slots__(更高的性能和存储空间)和NamedTuple(便于阅读和使用)之间进行选择

您可以查看这个链接(的用法 ),以获得更多的__slots__信息。

NamedTuple is convenient, but no one here has compared performance and memory usage.

from typing import NamedTuple
import guppy  # pip install guppy
import timeit


class User:
    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserSlot:
    __slots__ = ('name', 'uid')

    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserTuple(NamedTuple):
    # __slots__ = ()  # AttributeError: Cannot overwrite NamedTuple attribute __slots__
    name: str
    uid: int


def get_fn(obj, attr_name: str):
    def get():
        getattr(obj, attr_name)
    return get
if 'memory test':
    obj = [User('Carson', 1) for _ in range(1000000)]      # Cumulative: 189138883
    obj_slot = [UserSlot('Carson', 1) for _ in range(1000000)]          # 77718299  <-- winner
    obj_namedtuple = [UserTuple('Carson', 1) for _ in range(1000000)]   # 85718297
    print(guppy.hpy().heap())  # Run this function individually. 
    """
    Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000    24 112000000 34 112000000  34 dict of __main__.User
     1 1000000    24 64000000  19 176000000  53 __main__.UserTuple
     2 1000000    24 56000000  17 232000000  70 __main__.User
     3 1000000    24 56000000  17 288000000  87 __main__.UserSlot
     ...
    """

if 'performance test':
    obj = User('Carson', 1)
    obj_slot = UserSlot('Carson', 1)
    obj_tuple = UserTuple('Carson', 1)

    time_normal = min(timeit.repeat(get_fn(obj, 'name'), repeat=20))
    print(time_normal)  # 0.12550550000000005

    time_slot = min(timeit.repeat(get_fn(obj_slot, 'name'), repeat=20))
    print(time_slot)  # 0.1368690000000008

    time_tuple = min(timeit.repeat(get_fn(obj_tuple, 'name'), repeat=20))
    print(time_tuple)  # 0.16006120000000124

    print(time_tuple/time_slot)  # 1.1694481584580898  # The slot is almost 17% faster than NamedTuple on Windows. (Python 3.7.7)

If you do not need __dict__, choose between __slots__ (better performance and memory usage) and NamedTuple (clearer to read and use).

You can review this link (Usage of slots) to get more information about __slots__.
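
Besides memory and speed, the two options also behave differently, which may matter for the choice. The sketch below reuses the UserSlot and UserTuple definitions from the benchmark; the variable names are illustrative only.

```python
from typing import NamedTuple


class UserSlot:
    __slots__ = ('name', 'uid')

    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserTuple(NamedTuple):
    name: str
    uid: int


slot_user = UserSlot('Carson', 1)
tuple_user = UserTuple('Carson', 1)

# A slotted instance has no per-instance __dict__, which is where the memory saving comes from.
has_dict = hasattr(slot_user, '__dict__')

# Slotted instances stay mutable...
slot_user.uid = 2

# ...whereas NamedTuple fields are read-only.
try:
    tuple_user.uid = 2
except AttributeError:
    tuple_is_immutable = True

# NamedTuple instances still unpack and compare like ordinary tuples.
name, uid = tuple_user
```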


从pandas DataFrame中按部分字符串选择

问题:从pandas DataFrame中按部分字符串选择

我有一个DataFrame4列,其中2个包含字符串值。我想知道是否有一种方法可以根据针对特定列的部分字符串匹配来选择行?

换句话说,一个函数或lambda函数将执行以下操作

re.search(pattern, cell_in_question) 

返回一个布尔值。我熟悉的语法,df[df['A'] == "hello world"]但似乎找不到用部分字符串匹配说的方法'hello'

有人可以指出正确的方向吗?

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column?

In other words, a function or lambda function that would do something like

re.search(pattern, cell_in_question) 

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can’t seem to find a way to do the same with a partial string match say 'hello'.

Would someone be able to point me in the right direction?


回答 0

基于github问题#620,看来您很快将能够执行以下操作:

df[df['A'].str.contains("hello")]

更新:pandas 0.8.1及更高版本中提供了矢量化字符串方法(即Series.str)

Based on github issue #620, it looks like you’ll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.


回答 1

我尝试了上面提出的解决方案:

df[df["A"].str.contains("Hello|Britain")]

并得到一个错误:

ValueError: cannot mask with array containing NA / NaN values

您可以将NA值转换为False,如下所示:

df[df["A"].str.contains("Hello|Britain", na=False)]

I tried the proposed solution above:

df[df["A"].str.contains("Hello|Britain")]

and got an error:

ValueError: cannot mask with array containing NA / NaN values

you can transform NA values into False, like this:

df[df["A"].str.contains("Hello|Britain", na=False)]
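
A small self-contained sketch of the fix, using made-up sample data. Whether the version without na=False actually raises depends on the pandas version and the column dtype, so only the na=False form is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column A.
df = pd.DataFrame({'A': ['Hello world', np.nan, 'Great Britain', 'France']})

# na=False makes missing values count as "no match", so the mask is purely boolean.
mask = df['A'].str.contains('Hello|Britain', na=False)
print(df[mask])
```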

回答 2

如何从pandas DataFrame中按部分字符串选择?

这篇文章是为想要

  • 在字符串列中搜索子字符串(最简单的情况)
  • 搜索多个子字符串(类似于isin
  • 匹配文本中的整个单词(例如,“蓝色”应匹配“天空是蓝色”,而不是“ bluejay”)
  • 匹配多个完整词
  • 了解“ ValueError:无法使用包含NA / NaN值的向量进行索引”背后的原因

…并想进一步了解应优先采用哪种方法。

(PS:我在类似主题上看到了很多问题,我认为最好把它留在这里。)


基本子串搜索

# setup
df1 = pd.DataFrame({'col': ['foo', 'foobar', 'bar', 'baz']})
df1

      col
0     foo
1  foobar
2     bar
3     baz

str.contains可用于执行子字符串搜索或基于正则表达式的搜索。搜索默认为基于正则表达式,除非您明确禁用它。

这是一个基于正则表达式的搜索示例,

# find rows in `df1` which contain "foo" followed by something
df1[df1['col'].str.contains(r'foo(?!$)')]

      col
1  foobar

有时,不需要进行正则表达式搜索,因此请指定regex=False为禁用它。

#select all rows containing "foo"
df1[df1['col'].str.contains('foo', regex=False)]
# same as df1[df1['col'].str.contains('foo')] but faster.

      col
0     foo
1  foobar

在性能方面,正则表达式搜索比子字符串搜索慢:

df2 = pd.concat([df1] * 1000, ignore_index=True)

%timeit df2[df2['col'].str.contains('foo')]
%timeit df2[df2['col'].str.contains('foo', regex=False)]

6.31 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.8 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

如果不需要,请避免使用基于正则表达式的搜索。

Addressing ValueErrors
有时,执行字符串搜索和对结果的过滤会导致

ValueError: cannot index with vector containing NA / NaN values

这通常是由于对象列中存在混合数据或NaN,

s = pd.Series(['foo', 'foobar', np.nan, 'bar', 'baz', 123])
s.str.contains('foo|bar')

0     True
1     True
2      NaN
3     True
4    False
5      NaN
dtype: object


s[s.str.contains('foo|bar')]
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)

非字符串的任何内容都不能应用字符串方法,因此结果自然是NaN。在这种情况下,请指定na=False忽略非字符串数据,

s.str.contains('foo|bar', na=False)

0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

多个子串搜索

通过使用正则表达式OR管道进行正则表达式搜索,最容易实现这一点。

# Slightly modified example.
df4 = pd.DataFrame({'col': ['foo abc', 'foobar xyz', 'bar32', 'baz 45']})
df4

          col
0     foo abc
1  foobar xyz
2       bar32
3      baz 45

df4[df4['col'].str.contains(r'foo|baz')]

          col
0     foo abc
1  foobar xyz
3      baz 45

您还可以创建一个术语列表,然后将其加入:

terms = ['foo', 'baz']
df4[df4['col'].str.contains('|'.join(terms))]

          col
0     foo abc
1  foobar xyz
3      baz 45

有时,明智的做法是将您的术语转义,以防它们包含可被解释为正则表达式元字符的字符。如果您的条款包含以下任何字符…

. ^ $ * + ? { } [ ] \ | ( )

然后,你就需要使用re.escape逃避它们:

import re
df4[df4['col'].str.contains('|'.join(map(re.escape, terms)))]

          col
0     foo abc
1  foobar xyz
3      baz 45

re.escape 具有转义特殊字符的效果,因此可以按字面意义对待它们。

re.escape(r'.foo^')
# '\\.foo\\^'

匹配全词

默认情况下,子字符串搜索将搜索指定的子字符串/模式,而不管其是否为完整单词。为了仅匹配完整的单词,我们将需要在此处使用正则表达式-特别是,我们的模式将需要指定单词边界(\b)。

例如,

df3 = pd.DataFrame({'col': ['the sky is blue', 'bluejay by the window']})
df3

                     col
0        the sky is blue
1  bluejay by the window

现在考虑

df3[df3['col'].str.contains('blue')]

                     col
0        the sky is blue
1  bluejay by the window

vs.

df3[df3['col'].str.contains(r'\bblue\b')]

               col
0  the sky is blue

多个全字搜索

与上述类似,不同之处\b在于我们在连接的模式中添加了字边界()。

p = r'\b(?:{})\b'.format('|'.join(map(re.escape, terms)))
df4[df4['col'].str.contains(p)]

       col
0  foo abc
3   baz 45

p这个样子的,

p
# '\\b(?:foo|baz)\\b'

一个很好的选择:使用列表推导

因为你能!而且你应该!它们通常比字符串方法快一点,因为字符串方法难以向量化并且通常具有循环实现。

代替,

df1[df1['col'].str.contains('foo', regex=False)]

Use the in operator inside a list comprehension,

df1[['foo' in x for x in df1['col']]]

      col
0     foo
1  foobar

代替,

regex_pattern = r'foo(?!$)'
df1[df1['col'].str.contains(regex_pattern)]

Use re.compile (to cache your regex) + Pattern.search inside a list comprehension,

p = re.compile(regex_pattern, flags=re.IGNORECASE)
df1[[bool(p.search(x)) for x in df1['col']]]

      col
1  foobar

如果“ col”具有NaN,则代替

df1[df1['col'].str.contains(regex_pattern, na=False)]

采用,

def try_search(p, x):
    try:
        return bool(p.search(x))
    except TypeError:
        return False

p = re.compile(regex_pattern)
df1[[try_search(p, x) for x in df1['col']]]

      col
1  foobar

More options for partial string matching: np.char.find, np.vectorize, DataFrame.query

In addition to str.contains and list comprehensions, you can also use the following alternatives.

np.char.find
仅支持子字符串搜索(读取:无正则表达式)。

df4[np.char.find(df4['col'].values.astype(str), 'foo') > -1]

          col
0     foo abc
1  foobar xyz

np.vectorize
This is a wrapper around a loop, but with less overhead than most pandas str methods.

f = np.vectorize(lambda haystack, needle: needle in haystack)
f(df1['col'], 'foo')
# array([ True,  True, False, False])

df1[f(df1['col'], 'foo')]

      col
0     foo
1  foobar

正则表达式解决方案可能:

regex_pattern = r'foo(?!$)'
p = re.compile(regex_pattern)
f = np.vectorize(lambda x: pd.notna(x) and bool(p.search(x)))
df1[f(df1['col'])]

      col
1  foobar

DataFrame.query
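Supports string methods through the python engine. This offers no visible performance benefit, but is nonetheless useful to know if you need to generate your queries dynamically.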

df1.query('col.str.contains("foo")', engine='python')

      col
0     foo
1  foobar

For more information on the query and eval family of methods, see Dynamic Expression Evaluation in pandas using pd.eval().


推荐用法

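  1. (First) str.contains, for its simplicity and ease of handling NaNs and mixed data
  2. List comprehensions, for their performance (especially if your data is purely strings)
  3. np.vectorize
  4. (Last) df.query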

How do I select by partial string from a pandas DataFrame?

This post is meant for readers who want to

  • search for a substring in a string column (the simplest case)
  • search for multiple substrings (similar to isin)
  • match a whole word from text (e.g., “blue” should match “the sky is blue” but not “bluejay”)
  • match multiple whole words
  • understand the reason behind “ValueError: cannot index with vector containing NA / NaN values”

…and would like to know more about what methods should be preferred over others.

(P.S.: I’ve seen a lot of questions on similar topics, I thought it would be good to leave this here.)


Basic Substring Search

# setup
df1 = pd.DataFrame({'col': ['foo', 'foobar', 'bar', 'baz']})
df1

      col
0     foo
1  foobar
2     bar
3     baz

str.contains can be used to perform either substring searches or regex based search. The search defaults to regex-based unless you explicitly disable it.

Here is an example of regex-based search,

# find rows in `df1` which contain "foo" followed by something
df1[df1['col'].str.contains(r'foo(?!$)')]

      col
1  foobar

Sometimes regex search is not required, so specify regex=False to disable it.

#select all rows containing "foo"
df1[df1['col'].str.contains('foo', regex=False)]
# same as df1[df1['col'].str.contains('foo')] but faster.

      col
0     foo
1  foobar

Performance wise, regex search is slower than substring search:

df2 = pd.concat([df1] * 1000, ignore_index=True)

%timeit df2[df2['col'].str.contains('foo')]
%timeit df2[df2['col'].str.contains('foo', regex=False)]

6.31 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.8 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Avoid using regex-based search if you don’t need it.

Addressing ValueErrors
Sometimes, performing a substring search and filtering on the result will result in

ValueError: cannot index with vector containing NA / NaN values

This is usually because of mixed data or NaNs in your object column,

s = pd.Series(['foo', 'foobar', np.nan, 'bar', 'baz', 123])
s.str.contains('foo|bar')

0     True
1     True
2      NaN
3     True
4    False
5      NaN
dtype: object


s[s.str.contains('foo|bar')]
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)

Anything that is not a string cannot have string methods applied on it, so the result is NaN (naturally). In this case, specify na=False to ignore non-string data,

s.str.contains('foo|bar', na=False)

0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

Multiple Substring Search

This is most easily achieved through a regex search using the regex OR pipe.

# Slightly modified example.
df4 = pd.DataFrame({'col': ['foo abc', 'foobar xyz', 'bar32', 'baz 45']})
df4

          col
0     foo abc
1  foobar xyz
2       bar32
3      baz 45

df4[df4['col'].str.contains(r'foo|baz')]

          col
0     foo abc
1  foobar xyz
3      baz 45

You can also create a list of terms, then join them:

terms = ['foo', 'baz']
df4[df4['col'].str.contains('|'.join(terms))]

          col
0     foo abc
1  foobar xyz
3      baz 45

Sometimes, it is wise to escape your terms in case they have characters that can be interpreted as regex metacharacters. If your terms contain any of the following characters…

. ^ $ * + ? { } [ ] \ | ( )

Then, you’ll need to use re.escape to escape them:

import re
df4[df4['col'].str.contains('|'.join(map(re.escape, terms)))]

          col
0     foo abc
1  foobar xyz
3      baz 45

re.escape has the effect of escaping the special characters so they’re treated literally.

re.escape(r'.foo^')
# '\\.foo\\^'

Matching Entire Word(s)

By default, the substring search looks for the specified substring/pattern regardless of whether it is a full word or not. To match only full words, we need to use regular expressions; in particular, the pattern needs to specify word boundaries (\b).

For example,

df3 = pd.DataFrame({'col': ['the sky is blue', 'bluejay by the window']})
df3

                     col
0        the sky is blue
1  bluejay by the window

Now consider,

df3[df3['col'].str.contains('blue')]

                     col
0        the sky is blue
1  bluejay by the window

versus

df3[df3['col'].str.contains(r'\bblue\b')]

               col
0  the sky is blue

Multiple Whole Word Search

Similar to the above, except we add a word boundary (\b) to the joined pattern.

p = r'\b(?:{})\b'.format('|'.join(map(re.escape, terms)))
df4[df4['col'].str.contains(p)]

       col
0  foo abc
3   baz 45

Where p looks like this,

p
# '\\b(?:foo|baz)\\b'

A Great Alternative: Use List Comprehensions!

Because you can! And you should! They are usually a little bit faster than string methods, because string methods are hard to vectorise and usually have loopy implementations.

Instead of,

df1[df1['col'].str.contains('foo', regex=False)]

Use the in operator inside a list comp,

df1[['foo' in x for x in df1['col']]]

      col
0     foo
1  foobar

Instead of,

regex_pattern = r'foo(?!$)'
df1[df1['col'].str.contains(regex_pattern)]

Use re.compile (to cache your regex) + Pattern.search inside a list comp,

p = re.compile(regex_pattern, flags=re.IGNORECASE)
df1[[bool(p.search(x)) for x in df1['col']]]

      col
1  foobar

If “col” has NaNs, then instead of

df1[df1['col'].str.contains(regex_pattern, na=False)]

Use,

def try_search(p, x):
    try:
        return bool(p.search(x))
    except TypeError:
        return False

p = re.compile(regex_pattern)
df1[[try_search(p, x) for x in df1['col']]]

      col
1  foobar

More Options for Partial String Matching: np.char.find, np.vectorize, DataFrame.query.

In addition to str.contains and list comprehensions, you can also use the following alternatives.

np.char.find
Supports substring searches (read: no regex) only.

df4[np.char.find(df4['col'].values.astype(str), 'foo') > -1]

          col
0     foo abc
1  foobar xyz

np.vectorize
This is a wrapper around a loop, but with less overhead than most pandas str methods.

f = np.vectorize(lambda haystack, needle: needle in haystack)
f(df1['col'], 'foo')
# array([ True,  True, False, False])

df1[f(df1['col'], 'foo')]

      col
0     foo
1  foobar

Regex solutions are possible as well:

regex_pattern = r'foo(?!$)'
p = re.compile(regex_pattern)
f = np.vectorize(lambda x: pd.notna(x) and bool(p.search(x)))
df1[f(df1['col'])]

      col
1  foobar

DataFrame.query
Supports string methods through the python engine. This offers no visible performance benefits, but is nonetheless useful to know if you need to dynamically generate your queries.

df1.query('col.str.contains("foo")', engine='python')

      col
0     foo
1  foobar

More information on the query and eval family of methods can be found at Dynamic Expression Evaluation in pandas using pd.eval().


Recommended Usage Precedence

  1. (First) str.contains, for its simplicity and ease of handling NaNs and mixed data
  2. List comprehensions, for their performance (especially if your data is purely strings)
  3. np.vectorize
  4. (Last) df.query

回答 3

如果有人想知道如何执行相关问题:“按部分字符串选择列”

采用:

df.filter(like='hello')  # select columns which contain the word hello

要通过部分字符串匹配选择行,请传递axis=0到过滤器:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)  

If anyone wonders how to perform a related problem: “Select column by partial string”

Use:

df.filter(like='hello')  # select columns which contain the word hello

And to select rows by partial string matching, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)  
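
To see both calls side by side, here is a small sketch. The frame, its column names and its index labels are made up for illustration; only the filter calls come from the answer.

```python
import pandas as pd

# Hypothetical frame: names and labels are invented for this example.
df = pd.DataFrame(
    {"hello_count": [1, 2], "other": [3, 4], "say_hello": [5, 6]},
    index=["hello world", "goodbye"],
)

# Default axis for a DataFrame is the columns.
cols = df.filter(like="hello")
# axis=0 matches against the index labels instead.
rows = df.filter(like="hello", axis=0)

print(list(cols.columns))
print(list(rows.index))
```

Note that filter only looks at labels (column names or index labels), never at cell values.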

回答 4

快速说明:如果要基于索引中包含的部分字符串进行选择,请尝试以下操作:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]

Quick note: if you want to do selection based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]
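
A possible variation, assuming the index holds strings: the index has its own .str accessor, so the helper column may not be needed. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical data with a string index.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=["Hello there", "Great Britain", "France"],
)

# Index.str.contains returns a boolean array that can be used as a row mask.
mask = df.index.str.contains("Hello|Britain")
result = df[mask]
print(result)
```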

回答 5

说您有以下内容DataFrame

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

您始终可以in在lambda表达式中使用运算符来创建过滤器。

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

这里的技巧是使用中的axis=1选项apply将元素逐行(而不是逐列)传递给lambda函数。

Say you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in the apply to pass elements to the lambda function row by row, as opposed to column by column.


回答 6

这就是我为部分字符串匹配所做的最终结果。如果有人有更有效的方法,请告诉我。

import re
from pandas import DataFrame, concat

def stringSearchColumn_DataFrame(df, colName, regex):
    newdf = DataFrame()
    # Series.iteritems() was removed in pandas 2.0; items() replaces it.
    for idx, record in df[colName].items():
        if re.search(regex, record):
            newdf = concat([df[df[colName] == record], newdf], ignore_index=True)

    return newdf

Here’s what I ended up doing for partial string matches. If anyone has a more efficient way of doing this please let me know.

import re
from pandas import DataFrame, concat

def stringSearchColumn_DataFrame(df, colName, regex):
    newdf = DataFrame()
    # Series.iteritems() was removed in pandas 2.0; items() replaces it.
    for idx, record in df[colName].items():
        if re.search(regex, record):
            newdf = concat([df[df[colName] == record], newdf], ignore_index=True)

    return newdf
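
On the efficiency question, one likely candidate is a single boolean mask built with str.contains, which the other answers here rely on. The sketch below is not a drop-in replacement: it keeps rows in their original order, does not repeat rows for duplicated values, and treats missing values as non-matches (na=False). The function name and sample data are hypothetical.

```python
import pandas as pd

def string_search_column(df, col_name, regex):
    """Return the rows of df whose col_name value matches regex."""
    # One vectorised mask instead of a Python loop with repeated concat calls.
    mask = df[col_name].str.contains(regex, regex=True, na=False)
    return df[mask]

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({"name": ["apple pie", "banana", None, "pineapple"]})
result = string_search_column(df, "name", r"apple")
print(result)
```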

回答 7

对于包含特殊字符的字符串,使用contains效果不佳。找到工作了。

df[df['A'].str.find("hello") != -1]

Using contains didn’t work well for my string with special characters. Find worked though.

df[df['A'].str.find("hello") != -1]

回答 8

在此之前,有一些答案可以完成所要求的功能,无论如何,我想以最普遍的方式展示:

df.filter(regex=".*STRING_YOU_LOOK_FOR.*")

这样,无论编写哪种方式,您都可以获取要查找的列。

(显然,您必须为每种情况编写正确的regex表达式)

There are answers before this one that accomplish the requested feature, but I would like to show the most general way:

df.filter(regex=".*STRING_YOU_LOOK_FOR.*")

This way lets you get the column you are looking for, however its name is written.

(Obviously, you have to write the proper regex for each case.)


回答 9

也许您想在Pandas数据框的所有列中搜索一些文本,而不仅仅是在它们的子集中。在这种情况下,以下代码将有所帮助。

df[df.apply(lambda row: row.astype(str).str.contains('String To Find').any(), axis=1)]

警告。此方法相对较慢,但很方便。

Maybe you want to search for some text in all columns of the Pandas dataframe, and not just in the subset of them. In this case, the following code will help.

df[df.apply(lambda row: row.astype(str).str.contains('String To Find').any(), axis=1)]

Warning. This method is relatively slow, albeit convenient.


回答 10

如果您需要在pandas dataframe列中进行不区分大小写的搜索,请执行以下操作:

df[df['A'].str.contains("hello", case=False)]

Should you need to do a case insensitive search for a string in a pandas dataframe column:

df[df['A'].str.contains("hello", case=False)]

如何检查对象是列表还是元组(而不是字符串)?

问题:如何检查对象是列表还是元组(而不是字符串)?

这就是我通常做,以确定输入是一个list/ tuple-但不是str。因为很多时候我偶然发现了一个错误,即一个函数str错误地传递了一个对象,而目标函数确实for x in lst假定这lst实际上是一个listor tuple

assert isinstance(lst, (list, tuple))

我的问题是:是否有更好的方法来实现这一目标?

This is what I normally do in order to ascertain that the input is a list/tuple – but not a str. Because many times I stumbled upon bugs where a function passes a str object by mistake, and the target function does for x in lst assuming that lst is actually a list or tuple.

assert isinstance(lst, (list, tuple))

My question is: is there a better way of achieving this?


回答 0

仅在python 2中(不是python 3):

assert not isinstance(lst, basestring)

实际上就是您想要的,否则您会错过很多像列表一样的东西,但它们不是listor的子类tuple

In python 2 only (not python 3):

assert not isinstance(lst, basestring)

Is actually what you want, otherwise you’ll miss out on a lot of things which act like lists, but aren’t subclasses of list or tuple.
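
For Python 3, where basestring no longer exists, a rough equivalent might look like the sketch below. Rejecting bytes as well as str is an assumption on my part, and the function name is made up.

```python
def iterate_items(lst):
    # Python 3 has no basestring; str and bytes are rejected explicitly instead.
    assert not isinstance(lst, (str, bytes)), "expected a list-like object, got a string"
    return [x for x in lst]

print(iterate_items((1, 2)))

try:
    iterate_items("abc")
    raised = False
except AssertionError:
    raised = True
print(raised)
```

Keep in mind that assert statements are skipped when Python runs with -O, so an explicit raise may be safer for input validation.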


回答 1

请记住,在Python中,我们要使用“鸭子类型”。因此,任何类似列表的行为都可以视为列表。因此,不要检查列表的类型,只看它是否像列表一样。

但是字符串也像列表一样,通常这不是我们想要的。有时甚至是一个问题!因此,显式检查字符串,然后使用鸭子类型。

这是我写的一个有趣的函数。这是它的特殊版本,repr()可以在尖括号('<‘,’>’)中打印任何序列。

def srepr(arg):
    if isinstance(arg, basestring): # Python 3: isinstance(arg, str)
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError: # catch when for loop fails
        return repr(arg) # not a sequence so just return repr

总体而言,这是干净优雅的。但是那张isinstance()支票在那里做什么?这是一种hack。但这是必不可少的。

该函数以递归方式调用类似于列表的任何对象。如果我们不专门处理字符串,则将其视为列表,并一次拆分一个字符。但是,然后递归调用将尝试将每个字符视为一个列表-它将起作用!即使是一个字符的字符串也可以作为列表!该函数将继续递归调用自身,直到堆栈溢出为止。

像这样的函数,依赖于每个递归调用来分解要完成的工作,必须使用特殊情况的字符串-因为您不能将字符串分解为一个字符以下的字符串,甚至不能分解为一个以下的字符串-字符字符串的作用类似于列表。

注意:try/ except是表达我们意图的最干净的方法。但是,如果这段代码在某种程度上对时间很紧迫,我们可能要用某种测试来替换它,看看是否arg是一个序列。除了测试类型,我们可能应该测试行为。如果它有一个.strip()方法,它是一个字符串,所以不要认为它是一个序列。否则,如果它是可索引的或可迭代的,则它是一个序列:

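def is_sequence(arg):
    # Parentheses matter: without them, `or` would accept any object with
    # __iter__, including Python 3 strings.
    return (not hasattr(arg, "strip") and
            (hasattr(arg, "__getitem__") or
             hasattr(arg, "__iter__")))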

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

编辑:我最初写上面检查,__getslice__()但我注意到在collections模块文档中,有趣的方法是__getitem__(); 这很有意义,这就是您索引对象的方式。这似乎比根本,__getslice__()因此我更改了上面的内容。

Remember that in Python we want to use “duck typing”. So, anything that acts like a list can be treated as a list. So, don’t check for the type of a list, just see if it acts like a list.

But strings act like a list too, and often that is not what we want. There are times when it is even a problem! So, check explicitly for a string, but then use duck typing.

Here is a function I wrote for fun. It is a special version of repr() that prints any sequence in angle brackets (‘<‘, ‘>’).

def srepr(arg):
    if isinstance(arg, basestring): # Python 3: isinstance(arg, str)
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError: # catch when for loop fails
        return repr(arg) # not a sequence so just return repr

This is clean and elegant, overall. But what’s that isinstance() check doing there? That’s kind of a hack. But it is essential.

This function calls itself recursively on anything that acts like a list. If we didn’t handle the string specially, then it would be treated like a list, and split up one character at a time. But then the recursive call would try to treat each character as a list — and it would work! Even a one-character string works as a list! The function would keep on calling itself recursively until stack overflow.

Functions like this one, that depend on each recursive call breaking down the work to be done, have to special-case strings–because you can’t break down a string below the level of a one-character string, and even a one-character string acts like a list.
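
The runaway recursion is easy to reproduce. The sketch below is a Python 3 copy of the function with the string check removed; the name srepr_no_check is made up for the demonstration.

```python
def srepr_no_check(arg):
    # Same idea as srepr, but WITHOUT the string special case (demo only).
    try:
        return '<' + ", ".join(srepr_no_check(x) for x in arg) + '>'
    except TypeError:
        return repr(arg)

# A one-character string iterates to itself, so it never gets "smaller".
print(list("a") == ["a"])

try:
    srepr_no_check("a")
    hit_limit = False
except RecursionError:
    hit_limit = True
print(hit_limit)
```

Non-string input such as srepr_no_check([1, 2]) still works, which is why this kind of bug tends to show up only once a string sneaks in.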

Note: the try/except is the cleanest way to express our intentions. But if this code were somehow time-critical, we might want to replace it with some sort of test to see if arg is a sequence. Rather than testing the type, we should probably test behaviors. If it has a .strip() method, it’s a string, so don’t consider it a sequence; otherwise, if it is indexable or iterable, it’s a sequence:

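def is_sequence(arg):
    # Parentheses matter: without them, `or` would accept any object with
    # __iter__, including Python 3 strings.
    return (not hasattr(arg, "strip") and
            (hasattr(arg, "__getitem__") or
             hasattr(arg, "__iter__")))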

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

EDIT: I originally wrote the above with a check for __getslice__() but I noticed that in the collections module documentation, the interesting method is __getitem__(); this makes sense, that’s how you index an object. That seems more fundamental than __getslice__() so I changed the above.


回答 2

H = "Hello"

if type(H) is list or type(H) is tuple:
    pass  # Do something.
else:
    pass  # Do something else.

回答 3

对于Python 3:

import collections.abc

if isinstance(obj, collections.abc.Sequence) and not isinstance(obj, str):
    print("obj is a sequence (list, tuple, etc) but not a string")

在版本3.3中进行了更改:将集合抽象基类移至collections.abc模块。为了向后兼容,它们在此模块中也将继续可见,直到3.8版将停止工作为止。

对于Python 2:

import collections

if isinstance(obj, collections.Sequence) and not isinstance(obj, basestring):
    print "obj is a sequence (list, tuple, etc) but not a string or unicode"

For Python 3:

import collections.abc

if isinstance(obj, collections.abc.Sequence) and not isinstance(obj, str):
    print("obj is a sequence (list, tuple, etc) but not a string")

Changed in version 3.3: Moved Collections Abstract Base Classes to the collections.abc module. For backwards compatibility, they remained visible in the collections module as well until Python 3.10, when the aliases were removed.

For Python 2:

import collections

if isinstance(obj, collections.Sequence) and not isinstance(obj, basestring):
    print "obj is a sequence (list, tuple, etc) but not a string or unicode"
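
Wrapped in a helper, the Python 3 check could look like the sketch below. The function name is made up, and excluding bytes and bytearray along with str is my own assumption about what "not a string" should mean.

```python
from collections.abc import Sequence

def is_nonstring_sequence(obj):
    # Sequence covers list, tuple, range, and so on; string-like types are excluded.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))

for candidate in ([1, 2], (1, 2), range(3), "ab", {1, 2}, {"a": 1}):
    print(type(candidate).__name__, is_nonstring_sequence(candidate))
```

Sets, dicts and generators are not Sequences, so this check is stricter than "anything iterable".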

回答 4

具有PHP风格的Python:

def is_array(var):
    return isinstance(var, (list, tuple))

Python with PHP flavor:

def is_array(var):
    return isinstance(var, (list, tuple))

回答 5

一般来说,在对象上进行迭代的函数不仅可以处理错误,还可以处理字符串,元组和列表。您当然可以使用isinstance或鸭式输入来检查参数,但是为什么要这么做呢?

这听起来像是个反问,但事实并非如此。答案为“为什么我应该检查参数的类型?” 可能会建议解决实际问题,而不是感知到的问题。将字符串传递给函数时,为什么会出错?另外:如果将字符串传递给此函数是一个错误,是否将其他非列表/元组可迭代传递给它也是一个错误吗?为什么或者为什么不?

我认为这个问题的最常见答案可能是 f("abc")期望该函数的行为就像编写的一样f(["abc"])。在某些情况下,保护开发人员免受自身侵害比支持对字符串中的字符进行迭代的用例更有意义。但是我首先会考虑很长时间。

Generally speaking, the fact that a function which iterates over an object works on strings as well as tuples and lists is more feature than bug. You certainly can use isinstance or duck typing to check an argument, but why should you?

That sounds like a rhetorical question, but it isn’t. The answer to “why should I check the argument’s type?” is probably going to suggest a solution to the real problem, not the perceived problem. Why is it a bug when a string is passed to the function? Also: if it’s a bug when a string is passed to this function, is it also a bug if some other non-list/tuple iterable is passed to it? Why, or why not?

I think that the most common answer to the question is likely to be that developers who write f("abc") are expecting the function to behave as though they’d written f(["abc"]). There are probably circumstances where it makes more sense to protect developers from themselves than it does to support the use case of iterating across the characters in a string. But I’d think long and hard about it first.
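
If the f("abc") case really is the one you want to guard against, one option is to normalise the argument instead of rejecting it. This is only a sketch of that idea; the helper name ensure_list is hypothetical.

```python
def ensure_list(value):
    # Treat a lone string as a single item rather than a sequence of characters.
    if isinstance(value, str):
        return [value]
    return list(value)

print(ensure_list("abc"))
print(ensure_list(("a", "b")))
```

Whether silently wrapping the string is better than raising an error depends on the caller, which is exactly the question raised above.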


回答 6

尝试此操作以提高可读性和最佳做法:

Python2

import types
if isinstance(lst, types.ListType) or isinstance(lst, types.TupleType):
    # Do something

Python3

import typing
if isinstance(lst, typing.List) or isinstance(lst, typing.Tuple):
    # Do something

希望能帮助到你。

Try this for readability and best practices:

Python2

import types
if isinstance(lst, types.ListType) or isinstance(lst, types.TupleType):
    # Do something

Python3

import typing
if isinstance(lst, typing.List) or isinstance(lst, typing.Tuple):
    # Do something

Hope it helps.


回答 7

str对象没有__iter__属性

>>> hasattr('', '__iter__')
False 

所以你可以检查一下

assert hasattr(x, '__iter__')

这也AssertionError将为其他任何不可迭代的对象带来好处。

编辑: 正如蒂姆在评论中提到的那样,这仅适用于python 2.x,而不是3.x

The str object doesn’t have an __iter__ attribute

>>> hasattr('', '__iter__')
False 

so you can do a check

assert hasattr(x, '__iter__')

and this will also raise a nice AssertionError for any other non-iterable object too.

Edit: As Tim mentions in the comments, this will only work in python 2.x, not 3.x


回答 8

这并不是要直接回答OP,而是要分享一些相关想法。

我对上面的@steveha回答非常感兴趣,这似乎举了一个鸭子输入似乎中断的示例。换个角度说,他的例子表明鸭子的分类很难遵循,但是并不能说明str值得任何特殊处理。

毕竟,非str类型(例如,维护一些复杂的递归结构的用户定义类型)可能导致@steveha srepr函数引起无限递归。尽管这确实不太可能,但我们不能忽略这种可能性。因此,与其特殊外壳strsrepr,我们应该明确,我们想要什么srepr在无限递归产生时的事情情况。

似乎一种合理的方法是srepr暂时中断当前递归list(arg) == [arg]。这,其实,彻底解决这个问题str,没有任何isinstance

但是,真正复杂的递归结构可能会导致无限循环,list(arg) == [arg]永远不会发生。因此,尽管上面的检查很有用,但还不够。我们需要对递归深度进行严格限制。

我的观点是,如果您打算处理任意参数类型,则str通过鸭子类型进行处理要比处理(理论上)遇到的更通用类型容易得多。因此,如果您需要排除str实例,则应该要求该参数是您明确指定的几种类型之一的实例。

This is not intended to directly answer the OP, but I wanted to share some related ideas.

I was very interested in @steveha answer above, which seemed to give an example where duck typing seems to break. On second thought, however, his example suggests that duck typing is hard to conform to, but it does not suggest that str deserves any special handling.

After all, a non-str type (e.g., a user-defined type that maintains some complicated recursive structures) may cause @steveha srepr function to cause an infinite recursion. While this is admittedly rather unlikely, we can’t ignore this possibility. Therefore, rather than special-casing str in srepr, we should clarify what we want srepr to do when an infinite recursion results.

It may seem that one reasonable approach is to simply break the recursion in srepr the moment list(arg) == [arg]. This would, in fact, completely solve the problem with str, without any isinstance.

However, a really complicated recursive structure may cause an infinite loop where list(arg) == [arg] never happens. Therefore, while the above check is useful, it’s not sufficient. We need something like a hard limit on the recursion depth.

My point is that if you plan to handle arbitrary argument types, handling str via duck typing is far, far easier than handling the more general types you may (theoretically) encounter. So if you feel the need to exclude str instances, you should instead demand that the argument is an instance of one of the few types that you explicitly specify.
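
Putting the two ideas together (stop when list(arg) == [arg], plus a hard depth limit) might look like the sketch below. The depth and max_depth parameters and the default of 20 are my own assumptions, written for Python 3.

```python
def srepr(arg, depth=0, max_depth=20):
    # Hard limit on recursion depth, whatever the argument type is.
    if depth >= max_depth:
        return repr(arg)
    try:
        items = list(arg)
    except TypeError:
        return repr(arg)  # not iterable at all
    # Stop when iterating no longer breaks the object down (e.g. 'a').
    if items == [arg]:
        return repr(arg)
    return '<' + ", ".join(srepr(x, depth + 1, max_depth) for x in items) + '>'

print(srepr([1, 'ab', (2, 3)]))
```

Note the consequence: without an isinstance check, multi-character strings are still split into characters; only the infinite recursion is avoided.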


回答 9

在tensorflow中找到了一个名为is_sequence的函数

def is_sequence(seq):
  """Returns a true if its input is a collections.Sequence (except strings).
  Args:
    seq: an input sequence.
  Returns:
    True if the sequence is a not a string and is a collections.Sequence.
  """
  return (isinstance(seq, collections.Sequence)
and not isinstance(seq, six.string_types))

而且我已经证实它可以满足您的需求。

I found a function named is_sequence in tensorflow.

def is_sequence(seq):
  """Returns a true if its input is a collections.Sequence (except strings).
  Args:
    seq: an input sequence.
  Returns:
    True if the sequence is a not a string and is a collections.Sequence.
  """
  return (isinstance(seq, collections.Sequence)
          and not isinstance(seq, six.string_types))

And I have verified that it meets your needs.


回答 10

我在测试用例中执行此操作。

def assertIsIterable(self, item):
    #add types here you don't want to mistake as iterables
    if isinstance(item, basestring): 
        raise AssertionError("type %s is not iterable" % type(item))

    #Fake an iteration.
    try:
        for x in item:
            break;
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

未经生成器测试,我认为如果通过生成器,您将处于下一个“收益”状态,这可能会使下游情况恶化。但是再说一次,这是一个“单元测试”

I do this in my testcases.

def assertIsIterable(self, item):
    #add types here you don't want to mistake as iterables
    if isinstance(item, basestring): 
        raise AssertionError("type %s is not iterable" % type(item))

    #Fake an iteration.
    try:
        for x in item:
            break;
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

Untested on generators, I think you are left at the next ‘yield’ if passed in a generator, which may screw things up downstream. But then again, this is a ‘unittest’
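
On the generator worry: one way around it may be to call iter() instead of faking a loop, because asking for an iterator does not pull any items. The sketch below is a Python 3 variant written as a plain function; checking (str, bytes) in place of basestring is an assumption.

```python
def assert_is_iterable(item):
    # Add types here that you don't want to mistake for iterables.
    if isinstance(item, (str, bytes)):
        raise AssertionError("type %s is not accepted as an iterable" % type(item))
    try:
        iter(item)  # requests an iterator without consuming any items
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

gen = (n for n in range(3))
assert_is_iterable(gen)
first = next(gen)  # the generator has not been advanced by the check
print(first)
```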


回答 11

以“鸭子打字”的方式

try:
    lst = lst + []
except TypeError:
    #it's not a list

要么

try:
    lst = lst + ()
except TypeError:
    #it's not a tuple

分别。这避免了isinstance/ hasattr内省的东西。

您也可以反之亦然:

try:
    lst = lst + ''
except TypeError:
    #it's not (base)string

所有变体实际上都不会更改变量的内容,而是暗示了重新分配。我不确定这在某些情况下是否不受欢迎。

有趣的是,在任何情况下,如果是列表(不是元组),则在“就地”赋值时都不会引发+=no 。这就是为什么以这种方式完成分配的原因。也许有人可以阐明原因。TypeErrorlst

In “duck typing” manner, how about

try:
    lst = lst + []
except TypeError:
    pass  # it's not a list

or

try:
    lst = lst + ()
except TypeError:
    pass  # it's not a tuple

respectively. This avoids the isinstance / hasattr introspection stuff.

You could also check vice versa:

try:
    lst = lst + ''
except TypeError:
    pass  # it's not (base)string

All variants do not actually change the content of the variable, but imply a reassignment. I’m unsure whether this might be undesirable under some circumstances.

Interestingly, with the “in place” assignment += no TypeError would be raised in any case if lst is a list (not a tuple). That’s why the assignment is done this way. Maybe someone can shed light on why that is.
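
As far as I know, the reason is that for lists += is handled by list.__iadd__, which behaves like list.extend and accepts any iterable, while + requires the other operand to be a list. A short demonstration:

```python
lst = [1]

# Plain + between a list and a tuple is rejected.
try:
    lst + (2, 3)
    plus_raised = False
except TypeError:
    plus_raised = True

# In-place += extends the list from any iterable, so no TypeError here.
lst += (2, 3)
lst += "ab"

print(plus_raised)
print(lst)
```

So += would let tuples and even strings through, which is why it cannot be used as a list check.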


回答 12

最简单的方法…使用anyisinstance

>>> console_routers = 'x'
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
False
>>>
>>> console_routers = ('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
>>> console_routers = list('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True

The simplest way: use any and isinstance

>>> console_routers = 'x'
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
False
>>>
>>> console_routers = ('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
>>> console_routers = list('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True

回答 13

鸭式打字的另一种形式,可以帮助区分类似字符串的对象和其他类似序列的对象。

类字符串对象的字符串表示形式是字符串本身,因此您可以检查是否从str构造函数中返回了相等的对象:

# If a string was passed, convert it to a single-element sequence
if var == str(var):
    my_list = [var]

# All other iterables
else: 
    my_list = list(var)

这应该适用于与str所有可迭代对象兼容的所有对象。

Another version of duck-typing to help distinguish string-like objects from other sequence-like objects.

The string representation of string-like objects is the string itself, so you can check if you get an equal object back from the str constructor:

# If a string was passed, convert it to a single-element sequence
if var == str(var):
    my_list = [var]

# All other iterables
else: 
    my_list = list(var)

This should work for all objects compatible with str and for all kinds of iterable objects.


回答 14

Python 3具有以下功能:

from typing import List

def isit(value):
    return isinstance(value, List)

isit([1, 2, 3])  # True
isit("test")  # False
isit({"Hello": "Mars"})  # False
isit((1, 2))  # False

因此,要同时检查列表和元组,将是:

from typing import List, Tuple

def isit(value):
    return isinstance(value, List) or isinstance(value, Tuple)

Python 3 has this:

from typing import List

def isit(value):
    return isinstance(value, List)

isit([1, 2, 3])  # True
isit("test")  # False
isit({"Hello": "Mars"})  # False
isit((1, 2))  # False

So to check for both Lists and Tuples, it would be:

from typing import List, Tuple

def isit(value):
    return isinstance(value, List) or isinstance(value, Tuple)

回答 15

assert (type(lst) == list) | (type(lst) == tuple), "Not a valid lst type, cannot be string"

回答 16

做这个

if type(lst) in (list, tuple):
    # Do stuff

Just do this

if type(lst) in (list, tuple):
    # Do stuff

回答 17

在python> 3.6中

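import collections.abc
isinstance(set(),collections.abc.Container)
True
isinstance([],collections.abc.Container)
True
isinstance({},collections.abc.Container)
True
isinstance((),collections.abc.Container)
True
isinstance(str,collections.abc.Container)    # tests the class object str, not a string
False
isinstance('abc',collections.abc.Container)  # an actual string IS a Container
True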

in python >3.6

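import collections.abc
isinstance(set(),collections.abc.Container)
True
isinstance([],collections.abc.Container)
True
isinstance({},collections.abc.Container)
True
isinstance((),collections.abc.Container)
True
isinstance(str,collections.abc.Container)    # tests the class object str, not a string
False
isinstance('abc',collections.abc.Container)  # an actual string IS a Container
True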

回答 18

我倾向于这样做(如果真的必须这样做的话):

for i in some_var:
   if type(i) == type(list()):
       #do something with a list
   elif type(i) == type(tuple()):
       #do something with a tuple
   elif type(i) == type(str()):
       #here's your string

I tend to do this (if I really, really had to):

for i in some_var:
   if type(i) == type(list()):
       pass  # do something with a list
   elif type(i) == type(tuple()):
       pass  # do something with a tuple
   elif type(i) == type(str()):
       pass  # here's your string

迭代器,可迭代和迭代到底是什么?

问题:迭代器,可迭代和迭代到底是什么?

Python中“可迭代”,“迭代器”和“迭代”的最基本定义是什么?

我已经阅读了多个定义,但是我无法确定确切的含义,因为它仍然不会陷入。

有人可以在外行方面为我提供3个定义的帮助吗?

What is the most basic definition of “iterable”, “iterator” and “iteration” in Python?

I have read multiple definitions but I am unable to identify the exact meaning as it still won’t sink in.

Can someone please help me with the 3 definitions in layman terms?


回答 0

迭代是一个总称,表示一件一件一件一件一件接一件的物品。每当您使用循环(显式或隐式)遍历一组项目时,即迭代。

在Python中,iterableiterator具有特定的含义。

一个迭代是具有对象__iter__返回一个方法迭代,或者其限定__getitem__,可以采取顺序索引从零启动方法(并发出IndexError时,索引不再有效)。因此,可迭代对象是可以从中获取迭代器的对象。

一个迭代器是具有一个对象next(Python的2)或__next__(Python 3的)方法。

每当在Python中使用for循环或map或列表理解等时,next都会自动调用该方法以从迭代器获取每个项,从而进行迭代过程。

一个开始学习的好地方是本教程迭代器部分和标准类型页面迭代器类型部分。了解基础知识之后,请尝试“功能编程HOWTO”的“ 迭代器”部分

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.

In Python, iterable and iterator have specific meanings.

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.

A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.
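
To make the description above concrete, here is a rough Python 3 sketch of what a for loop does behind the scenes, together with an object that is iterable only through __getitem__. The names manual_for and Squares are made up for illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would: iter() once, then next() repeatedly."""
    collected = []
    iterator = iter(iterable)      # get an iterator from the iterable
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # the iterator signals that it is exhausted
        collected.append(item)
    return collected

class Squares:
    # No __iter__ here: iter() falls back to calling __getitem__ with 0, 1, 2, ...
    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)
        return index * index

print(manual_for("cat"))
print(manual_for(Squares()))
```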


回答 1

这是我在教授Python类时使用的解释:

一个ITERABLE是:

  • 任何可以循环播放的内容(例如,您可以循环播放字符串或文件)或
  • 任何可能出现在for循环右侧的内容: for x in iterable: ...
  • 您可以呼叫的任何内容iter()都会传回ITERATOR: iter(obj)
  • 一个定义的对象,该对象__iter__返回一个新鲜的ITERATOR,或者它可能具有__getitem__适合于索引查找的方法。

ITERATOR是一个对象:

  • 状态会记住迭代过程中的位置,
  • 使用以下__next__方法:
    • 返回迭代中的下一个值
    • 更新状态以指向下一个值
    • 通过提高发出信号 StopIteration
  • 并且这是可自我迭代的(意味着它具有__iter__返回的方法self)。

笔记:

  • __next__Python 3中的方法是Python 2中的拼写next,并且
  • 内置函数next()在传递给它的对象上调用该方法。

例如:

>>> s = 'cat'      # s is an ITERABLE
                   # s is a str object that is immutable
                   # s has no state
                   # s has a __getitem__() method 

>>> t = iter(s)    # t is an ITERATOR
                   # t has state (it starts by pointing at the "c"
                   # t has a next() method and an __iter__() method

>>> next(t)        # the next() function returns the next value and advances the state
'c'
>>> next(t)        # the next() function returns the next value and advances
'a'
>>> next(t)        # the next() function returns the next value and advances
't'
>>> next(t)        # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration

>>> iter(t) is t   # the iterator is self-iterable

Here’s the explanation I use in teaching Python classes:

An ITERABLE is:

  • anything that can be looped over (e.g. you can loop over a string or file) or
  • anything that can appear on the right-side of a for-loop: for x in iterable: ... or
  • anything you can call with iter() that will return an ITERATOR: iter(obj) or
  • an object that defines __iter__ that returns a fresh ITERATOR, or it may have a __getitem__ method suitable for indexed lookup.

An ITERATOR is an object:

  • with state that remembers where it is during iteration,
  • with a __next__ method that:
    • returns the next value in the iteration
    • updates the state to point at the next value
    • signals when it is done by raising StopIteration
  • and that is self-iterable (meaning that it has an __iter__ method that returns self).

Notes:

  • The __next__ method in Python 3 is spelt next in Python 2, and
  • The builtin function next() calls that method on the object passed to it.
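
As a rough sketch of the two definitions in code (Python 3 spelling; the class names Countdown and CountdownIterator are invented for this example), an iterable hands out a fresh iterator each time, while the iterator carries the state:

```python
class Countdown:
    """An ITERABLE: __iter__ returns a fresh ITERATOR on every call."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

class CountdownIterator:
    """An ITERATOR: it has state, a __next__ method, and is self-iterable."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that iteration is done
        value = self.current
        self.current -= 1        # update the state to point at the next value
        return value

c = Countdown(3)
first_pass = list(c)
second_pass = list(c)     # the iterable can be looped over again
it = iter(c)
consumed = list(it)
leftover = list(it)       # the iterator is exhausted after one pass
print(first_pass, second_pass, consumed, leftover)
```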

For example:

>>> s = 'cat'      # s is an ITERABLE
                   # s is a str object that is immutable
                   # s has no state
                   # s has a __getitem__() method 

>>> t = iter(s)    # t is an ITERATOR
                   # t has state (it starts by pointing at the "c")
                   # t has a __next__() method (next() in Python 2) and an __iter__() method

>>> next(t)        # the next() function returns the next value and advances the state
'c'
>>> next(t)        # the next() function returns the next value and advances
'a'
>>> next(t)        # the next() function returns the next value and advances
't'
>>> next(t)        # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration

>>> iter(t) is t   # the iterator is self-iterable
True

回答 2

The answers above are great, but like most of what I’ve seen, they don’t stress the distinction enough for people like me.

Also, people tend to get “too Pythonic” by leading with definitions like “X is an object that has a __foo__() method”. Such definitions are correct (they are based on the duck-typing philosophy), but the focus on methods tends to get in the way when you are trying to understand the concept in its simplicity.

So here is my version.


In natural language,

  • iteration is the process of taking one element at a time in a row of elements.

In Python,

  • an iterable is an object that is, well, iterable, which, simply put, means that it can be used in iteration, e.g. with a for loop. How? By using an iterator. I’ll explain below.

  • … while an iterator is an object that defines how to actually do the iteration, specifically what the next element is. That’s why it must have a next() method (spelled __next__() in Python 3).

Iterators are themselves also iterable, with the distinction that their __iter__() method returns the same object (self), regardless of whether or not its items have been consumed by previous calls to next().


So what does the Python interpreter think when it sees a for x in obj: statement?

Look, a for loop. Looks like a job for an iterator… Let’s get one. … There’s this obj guy, so let’s ask him.

“Mr. obj, do you have your iterator?” (… calls iter(obj), which calls obj.__iter__(), which happily hands out a shiny new iterator _i.)

OK, that was easy… Let’s start iterating then. (x = _i.next() … x = _i.next() …)

Since Mr. obj passed this test (by having a certain method that returns a valid iterator), we reward him with an adjective: you can now call him “iterable Mr. obj”.

However, in simple cases, you don’t normally benefit from having iterator and iterable separately. So you define only one object, which is also its own iterator. (Python does not really care that _i handed out by obj wasn’t all that shiny, but just the obj itself.)

This is why in most examples I’ve seen (and what had been confusing me over and over), you can see:

class IterableExample(object):

    def __iter__(self):
        return self

    def next(self):
        pass

instead of

class Iterator(object):
    def next(self):
        pass

class Iterable(object):
    def __iter__(self):
        return Iterator()

There are cases, though, when you can benefit from having iterator separated from the iterable, such as when you want to have one row of items, but more “cursors”. For example when you want to work with “current” and “forthcoming” elements, you can have separate iterators for both. Or multiple threads pulling from a huge list: each can have its own iterator to traverse over all items. See @Raymond’s and @glglgl’s answers above.

Imagine what you could do:

class SmartIterableExample(object):

    def create_iterator(self):
        # An amazingly powerful yet simple way to create arbitrary
        # iterator, utilizing object state (or not, if you are fan
        # of functional), magic and nuclear waste--no kittens hurt.
        pass    # don't forget to add the next() method

    def __iter__(self):
        return self.create_iterator()
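
As one concrete reading of the skeletons above, here is a small sketch in Python 3 spelling (__next__ instead of next). The class names Countdown and CountdownIterator are invented for this illustration; the point is only that the iterable holds the data while each iterator is an independent cursor.

```python
class CountdownIterator:
    """The 'cursor': remembers where one particular traversal is."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self                # iterators hand out themselves

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


class Countdown:
    """The iterable: holds the data and hands out a fresh cursor each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


numbers = Countdown(3)
first = iter(numbers)
second = iter(numbers)             # independent of `first`
print(next(first), next(first))    # 3 2
print(next(second))                # 3
print(list(numbers))               # [3, 2, 1]
```

Because Countdown.__iter__ builds a new CountdownIterator on every call, looping over numbers twice gives the full sequence both times.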

Notes:

  • I’ll repeat again: an iterator is not automatically iterable. An object that only has next(), like the Iterator class above, cannot be used as a “source” in a for loop unless it also defines __iter__(). What a for loop primarily needs is __iter__() (that returns something with next()).

  • Of course, for is not the only iteration construct, so the above applies to some other constructs as well (while…).

  • An iterator’s next() can raise StopIteration to stop iteration. It does not have to, though: it can iterate forever or stop by other means.

  • In the above “thought process”, _i does not really exist. I’ve made up that name.

  • There’s a small change in Python 3.x: the next() method (not the built-in) must now be named __next__(). Yes, it should have been like that all along.

  • You can also think of it like this: the iterable has the data, the iterator pulls the next item.

Disclaimer: I’m not a developer of any Python interpreter, so I don’t really know what the interpreter “thinks”. The musings above are solely a demonstration of how I understand the topic from other explanations, experiments and the real-life experience of a Python newbie.


Answer 3

An iterable is an object which has an __iter__() method. It can possibly be iterated over several times, as is the case for lists and tuples.

An iterator is the object which iterates. It is returned by an __iter__() method, returns itself via its own __iter__() method and has a next() method (__next__() in 3.x).

Iteration is the process of calling this next() (or __next__(), respectively) until it raises StopIteration.

Example:

>>> a = [1, 2, 3] # iterable
>>> b1 = iter(a) # iterator 1
>>> b2 = iter(a) # iterator 2, independent of b1
>>> next(b1)
1
>>> next(b1)
2
>>> next(b2) # start over, as it is the first call to b2
1
>>> next(b1)
3
>>> next(b1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> b1 = iter(a) # new one, start over
>>> next(b1)
1

Answer 4

Here’s my cheat sheet:

 sequence
  +
  |
  v
   def __getitem__(self, index: int):
  +    ...
  |    raise IndexError
  |
  |
  |              def __iter__(self):
  |             +     ...
  |             |     return <iterator>
  |             |
  |             |
  +--> or <-----+        def __next__(self):
       +        |       +    ...
       |        |       |    raise StopIteration
       v        |       |
    iterable    |       |
           +    |       |
           |    |       v
           |    +----> and +-------> iterator
           |                               ^
           v                               |
   iter(<iterable>) +----------------------+
                                           |
   def generator():                        |
  +    yield 1                             |
  |                 generator_expression +-+
  |                                        |
  +-> generator() +-> generator_iterator +-+

Quiz: Do you see how…

  1. every iterator is an iterable?
  2. a container object’s __iter__() method can be implemented as a generator?
  3. an iterable that has a __next__ method is not necessarily an iterator?

Answers:

  1. Every iterator must have an __iter__ method. Having __iter__ is enough to be an iterable. Therefore every iterator is an iterable.
  2. When __iter__ is called it should return an iterator (return <iterator> in the diagram above). Calling a generator returns a generator iterator which is a type of iterator.

    class Iterable1:
        def __iter__(self):
            # a method (which is a function defined inside a class body)
            # calling iter() converts iterable (tuple) to iterator
            return iter((1,2,3))
    
    class Iterable2:
        def __iter__(self):
            # a generator
            for i in (1, 2, 3):
                yield i
    
    class Iterable3:
        def __iter__(self):
            # with PEP 380 syntax
            yield from (1, 2, 3)
    
    # passes
    assert list(Iterable1()) == list(Iterable2()) == list(Iterable3()) == [1, 2, 3]
    
  3. Here is an example:

    class MyIterable:
    
        def __init__(self):
            self.n = 0
    
        def __getitem__(self, index: int):
            return (1, 2, 3)[index]
    
        def __next__(self):
            n = self.n = self.n + 1
            if n > 3:
                raise StopIteration
            return n
    
    # if you can iter it without raising a TypeError, then it's an iterable.
    iter(MyIterable())
    
    # but obviously `MyIterable()` is not an iterator since it does not have
    # an `__iter__` method.
    from collections.abc import Iterator
    assert isinstance(MyIterable(), Iterator)  # AssertionError
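
The left-hand branch of the diagram (a sequence that only defines __getitem__) can be tried out directly. This is a minimal sketch; the class name Squares is hypothetical and chosen just for the demonstration.

```python
from collections.abc import Iterable


class Squares:
    """Has no __iter__; relies on the older __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)   # tells iter() the sequence has ended
        return index * index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
print(list(Squares()))                  # [0, 1, 4, 9]

# The ABC check only looks for __iter__, so it does not detect this case.
print(isinstance(Squares(), Iterable))  # False
```

So calling iter() on an object is a more reliable test for "is it iterable?" than isinstance(obj, Iterable).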
    

Answer 5

I don’t know if it helps anybody, but I always like to visualize concepts in my head to understand them better. So, as I have a little son, I visualize the iterable/iterator concept with bricks and white paper.

Suppose we are in a dark room, and on the floor we have bricks for my son. The bricks differ in size and color, but that does not matter now. Suppose we have 5 bricks like that. Those 5 bricks can be described as an object, let’s say a bricks kit. We can do many things with this bricks kit: take one, then take a second and then a third, change the places of bricks, put the first brick on top of the second. We can do many sorts of things with them. Therefore this bricks kit is an iterable object or sequence, as we can go through each brick and do something with it. We can only do it like my little son does: we can play with one brick at a time. So again, I imagine this bricks kit to be an iterable.

Now remember that we are in a dark room. Or almost dark. The thing is that we can’t clearly see those bricks, what color they are, what shape, etc. So even if we want to do something with them (that is, iterate through them), we don’t really know what and how, because it is too dark.

What we can do is put a piece of white fluorescent paper next to the first brick (an element of the bricks kit) so that we can see where the first brick-element is. Each time we take a brick from the kit, we move the white piece of paper to the next brick so that we can see that one in the dark room. This white piece of paper is nothing more than an iterator. It is an object as well, but an object with which we can work and play with the elements of our iterable object, the bricks kit.

That, by the way, explains my early mistake when I tried the following in IDLE and got a TypeError:

 >>> X = [1,2,3,4,5]
 >>> next(X)
 Traceback (most recent call last):
    File "<pyshell#19>", line 1, in <module>
      next(X)
 TypeError: 'list' object is not an iterator

List X here was our bricks kit but NOT a white piece of paper. I needed to find an iterator first:

>>> X = [1,2,3,4,5]
>>> bricks_kit = [1,2,3,4,5]
>>> white_piece_of_paper = iter(bricks_kit)
>>> next(white_piece_of_paper)
1
>>> next(white_piece_of_paper)
2
>>>

I don’t know if it helps, but it helped me. If someone could confirm or correct this visualization of the concept, I would be grateful. It would help me to learn more.


Answer 6

Iterable: something that can be iterated over, such as sequences like lists, strings, etc. It has either a __getitem__ method or an __iter__ method. If we call the iter() function on such an object, we get an iterator.

Iterator: once we have the iterator object from the iter() function, we call its __next__() method (in Python 3) or simply next() (in Python 2) to get the elements one by one. The class of such an object, or an instance of it, is called an iterator.

From the docs:

The use of iterators pervades and unifies Python. Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate. You can call the __next__() method using the next() built-in function; this example shows how it all works:

>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> next(it)
'a'
>>> next(it)
'b'
>>> next(it)
'c'
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    next(it)
StopIteration

Example of a class:

class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]


>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
...     print(char)
...
m
a
p
s

Answer 7

I don’t think that you can get it much simpler than the documentation, however I’ll try:

  • An iterable is something that can be iterated over. In practice it usually means a sequence, i.e. something that has a beginning and an end and some way to go through all the items in it.
  • You can think of an iterator as a helper pseudo-method (or pseudo-attribute) that gives (or holds) the next (or first) item in the iterable. (In practice it is just an object that defines the method next(), or __next__() in Python 3.)

  • Iteration is probably best explained by the Merriam-Webster definition of the word:

b : the repetition of a sequence of computer instructions a specified number of times or until a condition is met — compare recursion


Answer 8

iterable = [1, 2]
iterator = iter(iterable)
print(iterator.__next__())
print(iterator.__next__())

So:

  1. An iterable is an object that can be looped over, e.g. a list, string, tuple, etc.

  2. Calling the iter function on our iterable object returns an iterator object.

  3. This iterator object has a method named __next__ (in Python 3, or just next in Python 2) through which you can access each element of the iterable.

So the output of the above code will be:

1
2


Answer 9

Iterables have an __iter__ method that instantiates a new iterator every time.

Iterators implement a __next__ method that returns individual items, and an __iter__ method that returns self.

Therefore, iterators are also iterable, but iterables are not iterators.

Luciano Ramalho, Fluent Python.
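
These three statements can be checked against the abstract base classes in collections.abc. A quick sketch using a plain list (the variable names are arbitrary):

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]        # an iterable
cursor = iter(numbers)     # an iterator over it

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))  # True False
print(isinstance(cursor, Iterable), isinstance(cursor, Iterator))    # True True

print(iter(numbers) is iter(numbers))  # False: a new iterator every time
print(iter(cursor) is cursor)          # True: an iterator returns itself

print(list(cursor), list(cursor))      # [1, 2, 3] []  (one pass only)
```

The last line shows the practical consequence: an iterator is used up after one pass, while the list can be looped over again and again.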


Answer 10

Before dealing with iterables and iterators, it helps to understand the concept they both rest on: the sequence.

Sequence: a sequence is an ordered collection of data.

Iterable: an iterable is an object, such as a sequence, that supports the __iter__ method.

iter function: iter takes a sequence as input and creates an object known as an iterator.

Iterator: an iterator is an object whose next method is called to traverse the sequence. Each call to next returns the element currently being traversed.

example:

x=[1,2,3,4]

x is a sequence, i.e. a collection of data.

y=iter(x)

Calling iter(x) returns an iterator only if the object x has an __iter__ method (or, failing that, a suitable __getitem__); otherwise it raises a TypeError. If it does return an iterator, y refers to an iterator over the elements of x (y is not itself a list):

y → iterator over [1,2,3,4]

As y is an iterator, it supports the next() method.

Each call to the next method returns one element of the list, one at a time.

After the last element of the sequence has been returned, calling next again raises a StopIteration error.

example:

>>> y.next()
1
>>> y.next()
2
>>> y.next()
3
>>> y.next()
4
>>> y.next()
StopIteration

Answer 11

In Python everything is an object. When an object is said to be iterable, it means that you can step through (i.e. iterate over) the object as a collection.

Arrays (lists, in Python), for example, are iterable. You can step through them with a for loop, going from index 0 to index n, n being the length of the array object minus 1.

Dictionaries (pairs of key/value, also called associative arrays) are also iterable. You can step through their keys.

Obviously, objects which are not collections are not iterable. A bool object, for example, has only one value, True or False. It is not iterable (it wouldn’t make sense for it to be an iterable object).
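
A short sketch of both cases, with made-up data: iterating a dictionary steps through its keys, and asking a bool for an iterator fails with a TypeError.

```python
prices = {'apple': 3, 'pear': 5}
for key in prices:             # iterating a dict steps through its keys
    print(key, prices[key])

try:
    iter(True)                 # a bool is not a collection
except TypeError as error:
    print('not iterable:', error)
```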

Read more. http://www.lepus.org.uk/ref/companion/Iterator.xml


Removing pip's cache?

Question: Removing pip's cache?

I need to install psycopg2 v2.4.1 specifically. I accidentally did:

 pip install psycopg2

Instead of:

 pip install psycopg2==2.4.1

That installs 2.4.4 instead of the earlier version.

Now even after I pip uninstall psycopg2 and attempt to reinstall with the correct version, it appears that pip is re-using the cache it downloaded the first time.

How can I force pip to clear out its download cache and use the specific version I’m including in the command?


Answer 0

If using pip 6.0 or newer, try adding the --no-cache-dir option.

If using pip older than pip 6.0, upgrade it with pip install -U pip.
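
If you drive pip from a script, the same option can be passed along. The sketch below only builds the command line and prints it; the helper name pinned_install_command is hypothetical, and the package and version are the ones from the question.

```python
import sys


def pinned_install_command(package, version):
    """Build (but do not run) a pip command that bypasses the cache."""
    return [
        sys.executable, '-m', 'pip', 'install',
        '--no-cache-dir',               # skip pip's cache for this run (pip 6.0+)
        f'{package}=={version}',        # pin the exact version
    ]


command = pinned_install_command('psycopg2', '2.4.1')
print(' '.join(command))
# To execute it: subprocess.run(command, check=True)
```

Using sys.executable -m pip makes sure the package goes into the same interpreter that runs the script.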


Answer 1

Clear the cache directory where appropriate for your system

Linux and Unix

~/.cache/pip  # and it respects the XDG_CACHE_HOME directory.

OS X

~/Library/Caches/pip

Windows

%LocalAppData%\pip\Cache
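
For scripting, the three locations above can be folded into one small helper. This is only a sketch based on the paths listed in this answer; default_pip_cache_dir is not part of pip, and on pip 20.1 or newer the command pip cache dir reports the directory pip is really using.

```python
import os
import sys
from pathlib import Path


def default_pip_cache_dir(platform=sys.platform, env=os.environ, home=None):
    """Guess pip's cache directory from the locations listed above.

    An illustrative helper, not part of pip.
    """
    home = Path(home) if home is not None else Path.home()
    if platform.startswith('win'):
        return Path(env['LOCALAPPDATA']) / 'pip' / 'Cache'
    if platform == 'darwin':
        return home / 'Library' / 'Caches' / 'pip'
    # Linux and other Unix: honour XDG_CACHE_HOME, else fall back to ~/.cache
    base = env.get('XDG_CACHE_HOME') or home / '.cache'
    return Path(base) / 'pip'


print(default_pip_cache_dir())
```

The platform, env and home parameters exist only so the function can be tried out for another system without actually being on it.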

Answer 2

From documentation at https://pip.pypa.io/en/latest/reference/pip_install.html#caching:

Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.


Answer 3

pip can install a package ignoring the cache, like this

pip --no-cache-dir install scipy

Answer 4

On Ubuntu, I had to delete /tmp/pip-build-root.


Answer 5

(pip maintainer here!)

Since pip 6.0 (back in 2014!), pip install, pip download and pip wheel commands can be told to avoid using the cache with the --no-cache-dir option. (eg: pip install --no-cache-dir <package>)

Since pip 10.0 (back in 2018!), a pip config command was added, which can be used to configure pip to always ignore the cache — pip config set global.cache-dir false configures pip to not use the cache “globally” (i.e. in all commands).

Since pip 20.1, pip has a pip cache command to manage the contents of pip’s cache.

  • pip cache purge removes all the wheel files in the cache.
  • pip cache remove matplotlib selectively removes files related to matplotlib from the cache.

In summary, pip provides a lot of ways to tweak how it uses the cache:

  • pip install --no-cache-dir <package>: install a package without using the cache, for just this run.
  • pip config set global.cache-dir false: configure pip to not use the cache “globally” (in all commands)
  • pip cache remove matplotlib: removes all wheel files related to matplotlib from pip’s cache.
  • pip cache purge: to clear all files from pip’s cache.

The specific issue of “installing the wrong version due to caching” mentioned in the question was fixed in pip 1.4 (back in 2013!):

Fix a number of issues related to cleaning up and not reusing build directories. (#413, #709, #634, #602, #939, #865, #948)


Answer 6

If you like to set the --no-cache-dir option by default, you can put this into pip.conf:

[global]
no-cache-dir = false

The location of pip.conf depends on your OS. See the documentation for more info.


Answer 7

I just had a similar problem and found that the only way to get pip to upgrade the package was to delete the $PWD/build (%CD%\build on Windows) directory that might have been left over from a previously unfinished install or a previous version of pip (it now deletes the build directories after a successful install).


Answer 8

On archlinux pip cache is located at ~/.cache/pip, I could solve my issue by removing the http folder inside it.


Answer 9

On my mac I had to remove the cache directory ~/Library/Caches/pip/


Answer 10

Since pip 20.1b1, which was released on 21 April 2020 and “added pip cache command for inspecting/managing pip’s wheel cache”, it is possible to issue this command:

pip cache purge

The reference guide is here:
https://pip.pypa.io/en/stable/reference/pip_cache/
The corresponding pull request is here.


Answer 11

On Windows 7, I had to delete %HOMEPATH%/pip.


Answer 12

If using virtualenv, look for the build directory under your environment’s root.


Answer 13

I had to delete %TEMP%\pip-build on Windows 7.


Answer 14

On Mac OS (Mavericks), I had to delete /tmp/pip-build/


Answer 15

A better way to do it is to delete the cache and rebuild it. That way, if you install the package again in another virtualenv, pip will use the cache instead of building it every time you install it.

For example, when you install it, pip will say that it uses a cached wheel:

Processing <some_prefix>/Library/Caches/pip/wheels/d0/c4/e4/e49fd07bca8dda00dd6b4bbc606aa05a25aacb00d45747a47a/horovod-0.19.3-cp37-cp37m-macosx_10_9_x86_64.wh

Just delete that one and restart your install.


Answer 16

(…) it appears that pip is re-using the cache (…)

I’m pretty sure that’s not what’s happening. Pip used to (wrongly) reuse the build directory, not the cache. This was fixed in version 1.4 of pip, which was released on 2013-07-23.


How to install lxml on Ubuntu

Question: How to install lxml on Ubuntu

我在Ubuntu 11上使用easy_install安装lxml遇到困难。

当我输入时,$ easy_install lxml我得到:

Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.3
Downloading http://lxml.de/files/lxml-2.3.tgz
Processing lxml-2.3.tgz
Running lxml-2.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7UdQOZ/lxml-2.3/egg-dist-tmp-GacQGy
Building lxml version 2.3.
Building without Cython.
ERROR: /bin/sh: xslt-config: not found

** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt 
In file included from src/lxml/lxml.etree.c:227:0:
src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.

看来libxslt还是libxml2没有安装。我已经尝试按照http://www.techsww.com/tutorials/libraries/libxslt/installation/installing_libxslt_on_ubuntu_linux.phphttp://www.techsww.com/tutorials/libraries/libxml/installation/installing_installing_libxml_on_ubuntu_linux上的说明进行操作。 PHP没有成功。

如果我尝试wget ftp://xmlsoft.org/libxml2/libxml2-sources-2.6.27.tar.gz我会得到

<successful connection info>
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /libxml2 ... done.
==> SIZE libxml2-sources-2.6.27.tar.gz ... done.
==> PASV ... done.    ==> RETR libxml2-sources-2.6.27.tar.gz ... 
No such file `libxml2-sources-2.6.27.tar.gz'.

如果我先尝试另一种,那我会做的,./configure --prefix=/usr/local/libxslt --with-libxml-prefix=/usr/local/libxml2最终会失败,并显示以下内容:

checking for libxml libraries >= 2.6.27... configure: error: Could not find libxml2 anywhere, check ftp://xmlsoft.org/.

我试过libxml2的2.6.27和2.6.29两个版本,没什么区别。

不遗余力,我已经成功完成了sudo apt-get install libxml2-dev,但这并没有改变。

I’m having difficulty installing lxml with easy_install on Ubuntu 11.

When I type $ easy_install lxml I get:

Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.3
Downloading http://lxml.de/files/lxml-2.3.tgz
Processing lxml-2.3.tgz
Running lxml-2.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7UdQOZ/lxml-2.3/egg-dist-tmp-GacQGy
Building lxml version 2.3.
Building without Cython.
ERROR: /bin/sh: xslt-config: not found

** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt 
In file included from src/lxml/lxml.etree.c:227:0:
src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.

It seems that libxslt or libxml2 is not installed. I’ve tried following the instructions at http://www.techsww.com/tutorials/libraries/libxslt/installation/installing_libxslt_on_ubuntu_linux.php and http://www.techsww.com/tutorials/libraries/libxml/installation/installing_libxml_on_ubuntu_linux.php with no success.

If I try wget ftp://xmlsoft.org/libxml2/libxml2-sources-2.6.27.tar.gz I get

<successful connection info>
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /libxml2 ... done.
==> SIZE libxml2-sources-2.6.27.tar.gz ... done.
==> PASV ... done.    ==> RETR libxml2-sources-2.6.27.tar.gz ... 
No such file `libxml2-sources-2.6.27.tar.gz'.

If I try the other first, I’ll get to ./configure --prefix=/usr/local/libxslt --with-libxml-prefix=/usr/local/libxml2 and that will fail eventually with:

checking for libxml libraries >= 2.6.27... configure: error: Could not find libxml2 anywhere, check ftp://xmlsoft.org/.

I’ve tried both versions 2.6.27 and 2.6.29 of libxml2 with no difference.

Leaving no stone unturned, I have successfully done sudo apt-get install libxml2-dev, but this changes nothing.


回答 0

由于您使用的是Ubuntu,因此不必理会这些源代码包。只需使用apt-get安装这些开发包。

apt-get install libxml2-dev libxslt1-dev python-dev

但是,如果您对可能是旧版本的lxml感到满意,则可以尝试

apt-get install python-lxml

并完成它。:)

Since you’re on Ubuntu, don’t bother with those source packages. Just install those development packages using apt-get.

apt-get install libxml2-dev libxslt1-dev python-dev

If you’re happy with a possibly older version of lxml altogether though, you could try

apt-get install python-lxml

and be done with it. :)
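
Before rerunning easy_install or pip, it can help to confirm that the build prerequisites are actually visible. The following is a minimal sketch and not part of the original answer: it assumes that, on the Ubuntu releases discussed here, the development packages put the xml2-config and xslt-config helper scripts on PATH (the error log in the question complains about xslt-config). The function name is made up for illustration.

```python
import shutil


def find_lxml_build_tools():
    """Return a mapping of helper script name -> full path (None if missing)."""
    tools = ("xml2-config", "xslt-config")  # assumed to ship with the -dev packages
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_lxml_build_tools().items():
        print(f"{name}: {path or 'not found (is the -dev package installed?)'}")
```

If either script is reported as not found, the lxml build will most likely fail in the same way as in the question.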


回答 1

在lxml编译之前,我还必须安装lib32z1-dev(Ubuntu 13.04 x64)。

sudo apt-get install lib32z1-dev

或所有必需的包装在一起:

sudo apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev

I also had to install lib32z1-dev before lxml would compile (Ubuntu 13.04 x64).

sudo apt-get install lib32z1-dev

Or all the required packages together:

sudo apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev

回答 2

正如@Pepijn在ubuntu 13.04 x64上评论@Druska的答案一样,无需使用lib32z1-dev,zlib1g-dev就足够了:

sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev

As @Pepijn commented on @Druska’s answer, on Ubuntu 13.04 x64 there is no need for lib32z1-dev; zlib1g-dev is enough:

sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev

回答 3

我使用Ubuntu 14.04在Vagrant中使用pip安装了lxml,并且遇到了同样的问题。即使安装了所有要求,我也一次又一次遇到相同的错误。事实证明,默认情况下,我的VM的内存很少。有了1024 MB,一切正常。

将此添加到您的VagrantFile中,lxml应该正确编译/安装:

config.vm.provider "virtualbox" do |vb|
  vb.memory = 1024
end

感谢sixhobbit的提示(请参阅:无法在Ubuntu 12.04上安装lxml)。

I installed lxml with pip in Vagrant, using Ubuntu 14.04, and had the same problem. Even though all requirements were installed, I got the same error again and again. It turned out that my VM had too little memory by default. With 1024 MB everything works fine.

Add this to your Vagrantfile and lxml should compile and install properly:

config.vm.provider "virtualbox" do |vb|
  vb.memory = 1024
end

Thanks to sixhobbit for the hint (see: can’t installing lxml on Ubuntu 12.04).


回答 4

对于Ubuntu 14.04

sudo apt-get install python-lxml

为我工作。

For Ubuntu 14.04

sudo apt-get install python-lxml

worked for me.


回答 5

步骤1

使用此命令安装最新的python更新。

sudo apt-get install python-dev

第2步

添加第一个依赖项libxml2版本2.7.0或更高版本

sudo apt-get install libxml2-dev

第三步

添加第二个依赖库libxslt版本1.1.23或更高版本

sudo apt-get install libxslt1-dev

第四步

首先安装pip软件包管理工具。并运行此命令。

pip install lxml


Step 1

Install the Python development headers (the python-dev package) using this command.

sudo apt-get install python-dev

Step 2

Add first dependency libxml2 version 2.7.0 or later

sudo apt-get install libxml2-dev

Step 3

Add second dependency libxslt version 1.1.23 or later

sudo apt-get install libxslt1-dev

Step 4

Install the pip package management tool first, then run this command.

pip install lxml



回答 6

安装AKX提到的软件包后,我仍然遇到相同的问题。解决了

apt-get install python-dev

After installing the packages mentioned by AKX I still had the same problem. Solved it with

apt-get install python-dev

回答 7

对于Ubuntu 12.04.3 LTS(精确的穿山甲),我必须这样做:

apt-get install libxml2-dev libxslt1-dev

(注意libxslt1-dev中的“ 1”)

然后我刚刚用pip / easy_install安装了lxml。

For Ubuntu 12.04.3 LTS (Precise Pangolin) I had to do:

apt-get install libxml2-dev libxslt1-dev

(Note the “1” in libxslt1-dev)

Then I just installed lxml with pip/easy_install.


回答 8

从Ubuntu 18.04(Bionic Beaver)开始,建议使用apt而不是apt-get,因为它具有更好的结构形式。

sudo apt install libxml2-dev libxslt1-dev python-dev

如果您对可能是旧版本的设备感到满意lxml,则可以尝试

sudo apt install python-lxml

From Ubuntu 18.04 (Bionic Beaver) onward, it is advisable to use apt instead of apt-get, since its command-line interface is better structured.

sudo apt install libxml2-dev libxslt1-dev python-dev

If you’re happy with a possibly older version of lxml altogether though, you could try

sudo apt install python-lxml

回答 9


由于@Simplans(https://stackoverflow.com/a/37759871/417747)的指针和主页,这里的许多答案都比较旧了。

什么对我有用(Ubuntu仿生):

sudo apt-get install python3-lxml  

(+ sudo apt-get install libxml2-dev libxslt1-dev我已经安装了它,但是不确定那是否仍然是必需的)

Many answers here are rather old. Thanks to the pointer from @Simplans (https://stackoverflow.com/a/37759871/417747) and the home page, here is what worked for me (Ubuntu bionic):

sudo apt-get install python3-lxml  

(I had also run sudo apt-get install libxml2-dev libxslt1-dev beforehand, but I am not sure whether that is still required.)


回答 10

首先安装Ubuntu的python-lxml软件包及其依赖项:

sudo apt-get install python-lxml

然后使用pip升级到适用于Python的lxml的最新版本:

pip install lxml

First install Ubuntu’s python-lxml package and its dependencies:

sudo apt-get install python-lxml

Then use pip to upgrade to the latest version of lxml for Python:

pip install lxml

为什么Python中没有++和--运算符?

问题:为什么Python中没有++和--运算符?

为什么在Python中没有++and --运算符?

Why are there no ++ and -- operators in Python?


回答 0

不是因为它没有道理;而是因为它没有意义。最好将“ x ++”定义为“ x + = 1,求出x的先前绑定”。

如果您想知道最初的原因,则必须要么浏览旧的Python邮件列表,要么询问那里的某个人(例如Guido),但是在事实成立之后就很容易找到理由了:

与其他语言一样,不需要简单的增量和减量。您不会for(int i = 0; i < 10; ++i)经常用Python 编写东西。相反,你做类似的事情for i in range(0, 10)

由于几乎不需要它,因此没有太多理由为其提供特殊的语法。当您确实需要增加时,+=通常就可以了。

这不是是否有意义,还是可以做到的决定。这是一个好处是否值得添加到该语言的核心语法中的问题。请记住,这是四个运算符-postinc,postdec,preinc,predec,并且每个运算符都需要具有自己的类重载;他们都需要指定和测试;它将在语言中添加操作码(暗示更大,因此更慢的VM引擎);每个支持逻辑增量的类都需要实现它们(在+=和之上-=)。

+=和都是多余的-=,因此将成为净亏损。

It’s not because it doesn’t make sense; it makes perfect sense to define “x++” as “x += 1, evaluating to the previous binding of x”.

If you want to know the original reason, you’ll have to either wade through old Python mailing lists or ask somebody who was there (eg. Guido), but it’s easy enough to justify after the fact:

Simple increment and decrement aren’t needed as much as in other languages. You don’t write things like for(int i = 0; i < 10; ++i) in Python very often; instead you do things like for i in range(0, 10).

Since it’s not needed nearly as often, there’s much less reason to give it its own special syntax; when you do need to increment, += is usually just fine.

It’s not a decision of whether it makes sense, or whether it can be done; it does, and it can. It’s a question of whether the benefit is worth adding to the core syntax of the language. Remember, this is four operators (postinc, postdec, preinc, predec), and each of these would need to have its own class overloads; they all need to be specified and tested; it would add opcodes to the language (implying a larger, and therefore slower, VM engine); every class that supports a logical increment would need to implement them (on top of += and -=).

This is all redundant with += and -=, so it would become a net loss.
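
To make the point about loops concrete, here is a small illustrative sketch (the variable names are invented for the example): the C-style counting loop becomes a range() loop, and the rare manual counter is handled with +=.

```python
# The C loop `for (int i = 0; i < 10; ++i)` written the Python way.
squares = []
for i in range(0, 10):      # range() produces 0..9; no explicit increment needed
    squares.append(i * i)

# When a manual counter really is needed, += does the job.
even_count = 0
for value in squares:
    if value % 2 == 0:
        even_count += 1

print(squares)
print(even_count)
```

Neither loop needs an increment operator that is also an expression, which is the situation ++ was designed for.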


回答 1

我写的这个原始答案是关于计算机民俗的一个神话:被丹尼斯·里奇(Dennis Ritchie)认为是“历史上不可能的”,正如ACM通讯编辑在2012年7月致函doi:10.1145 / 2209249.2209251


C增/减运算符是在C编译器不是很聪明的时候发明的,作者希望能够指定使用机器语言运算符的直接意图,从而节省了编译器的几个周期,可能会做一个

load memory
load 1
add
store memory

代替

inc memory 

PDP-11甚至支持分别对应于*++p和的“自动递增”和“延迟自动递增”指令*p++。如果非常好奇,请参阅手册第5.3节。

由于编译器足够聪明,可以处理C语法中内置的高级优化技巧,因此它们现在只是语法上的便利。

Python没有技巧来向汇编器传达意图,因为它不使用汇编器。

This original answer I wrote is a myth from the folklore of computing: debunked by Dennis Ritchie as “historically impossible” as noted in the letters to the editors of Communications of the ACM July 2012 doi:10.1145/2209249.2209251


The C increment/decrement operators were invented at a time when the C compiler wasn’t very smart and the authors wanted to be able to specify the direct intent that a machine language operator should be used which saved a handful of cycles for a compiler which might do a

load memory
load 1
add
store memory

instead of

inc memory 

and the PDP-11 even supported “autoincrement” and “autoincrement deferred” instructions corresponding to *++p and *p++, respectively. See section 5.3 of the manual if horribly curious.

As compilers are smart enough to handle the high-level optimization tricks built into the syntax of C, they are just a syntactic convenience now.

Python doesn’t have tricks to convey intentions to the assembler because it doesn’t use one.


回答 2

我一直以为这与python禅的这一行有关:

应该有一种(最好只有一种)明显的方式来做到这一点。

x ++和x + = 1做完全相同的事情,因此没有理由同时使用两者。

I always assumed it had to do with this line of the zen of python:

There should be one — and preferably only one — obvious way to do it.

x++ and x+=1 do the exact same thing, so there is no reason to have both.


回答 3

当然,我们可以说“圭多岛就是这样决定的”,但我认为问题实际上是做出该决定的原因。我认为有以下几个原因:

  • 它将陈述和表达式混合在一起,这不是一个好习惯。参见http://norvig.com/python-iaq.html
  • 它通常会鼓励人们编写可读性较低的代码
  • 如前所述,语言实现的额外复杂性在Python中是不必要的

Of course, we could say “Guido just decided that way”, but I think the question is really about the reasons for that decision. I think there are several reasons:

  • It mixes together statements and expressions, which is not good practice. See http://norvig.com/python-iaq.html
  • It generally encourages people to write less readable code
  • Extra complexity in the language implementation, which is unnecessary in Python, as already mentioned

回答 4

因为在Python中,整数是不可变的(int的+ =实际上返回了一个不同的对象)。

同样,使用++ /-时,您需要担心增量前后的递增和递减,并且只需要再写一次按键即可x+=1。换句话说,它避免了潜在的混乱,但付出的代价却很小。

Because, in Python, integers are immutable (int’s += actually returns a different object).

Also, with ++/-- you need to worry about pre- versus post-increment/decrement, and it takes only one more keystroke to write x+=1. In other words, it avoids potential confusion at the expense of very little gain.
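
The immutability point can be checked directly. This is a short sketch with arbitrary example values: += on an int rebinds the name to a new object, while += on a list (a mutable type) changes the existing object in place.

```python
a = 1000
b = a            # both names refer to the same int object
a += 1           # builds a new int and rebinds a; b is untouched
print(a, b)      # a and b now differ
print(a is b)    # they are no longer the same object

xs = [1, 2]
ys = xs          # both names refer to the same list object
xs += [3]        # list += extends the list in place, so ys sees the change
print(ys)
print(xs is ys)  # still the same object
```

So the "increment" of a number is always a rebinding, which is exactly what the spelling x += 1 (short for x = x + 1) makes visible.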


回答 5

明晰!

Python有很多关于清晰度--a的知识,除非他/她学习了具有这种构造的语言,否则任何程序员都不可能正确地猜测的含义。

Python还有很多关于避免引发错误的构造的知识,并且++已知运算符是缺陷的丰富来源。这两个原因足以在Python中没有这些运算符。

Python使用缩进来标记块而不是诸如某种形式的开始/结束方括号或强制结束标记之类的句法手段的决定很大程度上基于相同的考虑。

作为说明,请看一下有关在2005年向Python中引入条件运算符(在C:中cond ? resultif : resultelse)的讨论。至少请阅读该讨论第一条消息决策消息(之前有相同主题的多个先驱)。

琐事: 其中经常提到的PEP是“ Python扩展建议” PEP 308。LC表示列表理解,GE表示生成器表达式(不要担心,如果它们使您感到困惑,它们不是Python的少数复杂地方)。

Clarity!

Python is a lot about clarity and no programmer is likely to correctly guess the meaning of --a unless s/he’s learned a language having that construct.

Python is also a lot about avoiding constructs that invite mistakes and the ++ operators are known to be rich sources of defects. These two reasons are enough not to have those operators in Python.

The decision that Python uses indentation to mark blocks rather than syntactical means such as some form of begin/end bracketing or mandatory end marking is based largely on the same considerations.

For illustration, have a look at the discussion around introducing a conditional operator (in C: cond ? resultif : resultelse) into Python in 2005. Read at least the first message and the decision message of that discussion (which had several precursors on the same topic previously).

Trivia: The PEP frequently mentioned therein is the “Python Enhancement Proposal” PEP 308. LC means list comprehension, GE means generator expression (and don’t worry if those confuse you; they are not among the few complicated spots of Python).


回答 6

我对python为什么没有++运算符的理解如下:当您在python中编写此代码时,a=b=c=1您将获得三个变量(标签),它们指向同一对象(值为1)。您可以使用id函数进行验证,该函数将返回对象内存地址:

In [19]: id(a)
Out[19]: 34019256

In [20]: id(b)
Out[20]: 34019256

In [21]: id(c)
Out[21]: 34019256

所有三个变量(标签)都指向同一对象。现在递增变量之一,看看它如何影响内存地址:

In [22]: a = a + 1

In [23]: id(a)
Out[23]: 34019232

In [24]: id(b)
Out[24]: 34019256

In [25]: id(c)
Out[25]: 34019256

您可以看到该变量a现在指向另一个对象,为bc。因为您已经使用a = a + 1它,所以它很明显。换句话说,您将另一个对象完全分配给label a。想象一下,您可以编写a++它,这表明您没有分配给变量a新对象,而是增加了旧对象的数量。所有这些东西都是恕我直言,以尽量减少混乱。为了更好地理解,请参见python变量如何工作:

在Python中,为什么函数可以修改调用者认为的某些参数,而不能修改其他参数?

Python是按值调用还是按引用调用?都不行

Python是按值传递还是按引用传递?

Python是按引用传递还是按值传递?

Python:如何通过引用传递变量?

了解Python变量和内存管理

在python中模拟按值传递行为

Python函数通过引用调用

像Pythonista一样的代码:惯用的Python

My understanding of why Python does not have a ++ operator is the following: when you write a=b=c=1 in Python, you get three variables (labels) pointing at the same object (whose value is 1). You can verify this with the id function, which returns an object’s identity (in CPython, its memory address):

In [19]: id(a)
Out[19]: 34019256

In [20]: id(b)
Out[20]: 34019256

In [21]: id(c)
Out[21]: 34019256

All three variables (labels) point to the same object. Now increment one of the variables and see how it affects the memory addresses:

In [22]: a = a + 1

In [23]: id(a)
Out[23]: 34019232

In [24]: id(b)
Out[24]: 34019256

In [25]: id(c)
Out[25]: 34019256

You can see that variable a now points to a different object than variables b and c. Because you’ve used a = a + 1, this is explicitly clear: you assign a completely different object to the label a. If you could write a++, it would suggest that you did not assign a new object to variable a but rather incremented the old one. All of this is, IMHO, meant to minimize confusion. For a better understanding of how Python variables work, see:

In Python, why can a function modify some arguments as perceived by the caller, but not others?

Is Python call-by-value or call-by-reference? Neither.

Does Python pass by value, or by reference?

Is Python pass-by-reference or pass-by-value?

Python: How do I pass a variable by reference?

Understanding Python variables and Memory Management

Emulating pass-by-value behaviour in python

Python functions call by reference

Code Like a Pythonista: Idiomatic Python


回答 7

它就是这样设计的。递增和递减运算符只是的快捷方式x = x + 1。Python通常采用了一种设计策略,该策略减少了执行操作的替代方法的数量。 增量分配是Python中最接近递增/递减运算符的东西,直到Python 2.0才添加。

It was just designed that way. Increment and decrement operators are just shortcuts for x = x + 1. Python has typically adopted a design strategy which reduces the number of alternative means of performing an operation. Augmented assignment is the closest thing to increment/decrement operators in Python, and they weren’t even added until Python 2.0.


回答 8

我是python的新手,但我怀疑原因是由于该语言中的可变对象和不可变对象之间的强调。现在,我知道x ++可以很容易地解释为x = x + 1,但是它看起来就像您就地递增一个不可变的对象一样。

只是我的猜测/感觉/预感。

I’m very new to python but I suspect the reason is because of the emphasis between mutable and immutable objects within the language. Now, I know that x++ can easily be interpreted as x = x + 1, but it LOOKS like you’re incrementing in-place an object which could be immutable.

Just my guess/feeling/hunch.


回答 9

首先,Python仅受C间接影响。它在很大程度上受到影响ABC,这显然不具备这些运营商,所以它不应该有任何巨大的惊喜不会找到它们在Python两种。

其次,正如其他人所说,递增和递减由支持+=-=了。

第三,对++and --运算符集的完全支持通常包括同时支持它们的前缀和后缀版本。在C和C ++中,这可能导致各种“可爱”的构造(在我看来)与Python所包含的简单性和直截了当的精神背道而驰。

例如,尽管C语句while(*t++ = *s++);对于有经验的程序员而言似乎简单而优雅,但对于学习它的人来说,却绝非简单。混合使用前缀和后缀增量和减量,甚至许多专业人士也必须停下来思考一下。

First, Python is only indirectly influenced by C; it is heavily influenced by ABC, which apparently does not have these operators, so it should not be any great surprise not to find them in Python either.

Secondly, as others have said, increment and decrement are supported by += and -= already.

Third, full support for a ++ and -- operator set usually includes supporting both the prefix and postfix versions of them. In C and C++, this can lead to all kinds of “lovely” constructs that seem (to me) to be against the spirit of simplicity and straight-forwardness that Python embraces.

For example, while the C statement while(*t++ = *s++); may seem simple and elegant to an experienced programmer, to someone learning it, it is anything but simple. Throw in a mixture of prefix and postfix increments and decrements, and even many pros will have to stop and think a bit.


回答 10

我认为这源于Python的信条,即“明确胜于隐含”。

I believe it stems from the Python creed that “explicit is better than implicit”.


回答 11

这可能是因为@GlennMaynard正在将问题与其他语言进行比较,但是在Python中,您是以python方式进行操作的。这不是一个“为什么”的问题。在那里,您可以使用达到相同的效果x+=。在《 Python的禅宗》中,给出了:“只有一种解决问题的方法。” 多种选择在艺术上(表达自由)很棒,但在工程上却很​​糟糕。

This may be because @GlennMaynard is looking at the matter as in comparison with other languages, but in Python, you do things the python way. It’s not a ‘why’ question. It’s there and you can do things to the same effect with x+=. In The Zen of Python, it is given: “there should only be one way to solve a problem.” Multiple choices are great in art (freedom of expression) but lousy in engineering.


回答 12

++运算符的类别是具有副作用的表达式。这是Python中通常找不到的东西。

出于同样的原因,赋值不是Python中的表达式,因此防止了通用 if (a = f(...)) { /* using a here */ }用法。

最后,我怀疑操作符与Python的参考语义不是很一致。请记住,Python没有具有C / C ++已知语义的变量(或指针)。

The ++ class of operators are expressions with side effects. This is something generally not found in Python.

For the same reason an assignment is not an expression in Python, thus preventing the common if (a = f(...)) { /* using a here */ } idiom.

Lastly, I suspect that these operators are not very consistent with Python’s reference semantics. Remember, Python does not have variables (or pointers) with the semantics known from C/C++.


回答 13

也许更好的问题是问为什么这些运算符存在于C中。K&R调用增量和减量运算符为“异常”(第2.8页第2.8节)。导言称它们“更简洁,通常更高效”。我怀疑这些操作总是在指针操作中出现的事实也影响了它们的引入。在Python中,可能已经决定尝试优化增量没有任何意义(事实上,我只是在C中进行了测试,而且似乎gcc生成的程序集在两种情况下都使用addl而不是incl),并且没有指针算法;因此,它本来只是另一种实现方式,而我们知道Python不愿这样做。

Maybe a better question would be to ask why these operators exist in C. K&R calls the increment and decrement operators ‘unusual’ (Section 2.8, page 46). The Introduction calls them ‘more concise and often more efficient’. I suspect that the fact that these operations always come up in pointer manipulation also played a part in their introduction. In Python it was probably decided that it made no sense to try to optimise increments (in fact I just did a test in C, and it seems that the gcc-generated assembly uses addl instead of incl in both cases), and there is no pointer arithmetic; so it would have been just One More Way to Do It, and we know Python loathes that.


回答 14

据我了解,所以您不会认为内存中的值已更改。在c中,当执行x ++时,内存中x的值会更改。但是在python中,所有数字都是不可变的,因此x指向的地址仍然具有x而不是x + 1。当您编写x ++时,您可能会认为x发生了改变,实际上是x引用更改为x + 1存储在内存中的位置,或者如果doe不存在,则重新创建该位置。

As I understand it, the reason is so that you won’t think the value in memory has changed. In C, when you do x++, the value of x in memory changes. But in Python all numbers are immutable, so the address that x pointed to still holds x, not x+1. If you could write x++, you would think that x itself changed; what really happens is that the reference x is rebound to a location in memory where x+1 is stored, and that location is created if it does not exist.


回答 15

要在该页面上完成已经很好的答案:

假设我们决定这样做,在前缀(++i)处打乱一元+和-运算符。

今天,以++--做任何前缀都没有,因为它使一元加号运算符两次(不执行任何操作)或一元减号两次(两次:取消自身)成为可能

>>> i=12
>>> ++i
12
>>> --i
12

这样可能会破坏这种逻辑。

To complete already good answers on that page:

Suppose we decided to add them: the prefix form (++i) would break the existing unary + and - operators.

Today, prefixing with ++ or -- does nothing, because it applies the unary plus operator twice (which does nothing) or the unary minus operator twice (which cancels itself out):

>>> i=12
>>> ++i
12
>>> --i
12

So that would potentially break that logic.
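
The parse can be inspected with the standard ast module. This is a small sketch added for illustration; it only shows how the current grammar reads ++i, namely as two nested unary plus operations applied to the name i.

```python
import ast

tree = ast.parse("++i", mode="eval")
outer = tree.body        # node for the first '+'
inner = outer.operand    # node for the second '+'

# Both levels are unary operations using unary plus.
print(type(outer).__name__, type(outer.op).__name__)
print(type(inner).__name__, type(inner.op).__name__)
# The innermost operand is the plain name i.
print(type(inner.operand).__name__, inner.operand.id)
```

Because this is already valid syntax with a defined meaning, introducing a prefix ++ operator would change the behaviour of existing code.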


回答 16

其他答案描述了为什么迭代器不需要它,但是有时在分配以增加内联变量时它很有用,您可以使用元组和多重分配来达到相同的效果:

b = ++a 变成:

a,b = (a+1,)*2

b = a++成为:

a,b = a+1, a

Python的3.8引入了分配:=操作,使我们能够实现foo(++a)

foo(a:=a+1)

foo(a++) 虽然仍然难以捉摸。

Other answers have described why it’s not needed for iterators, but sometimes it is useful when assigning to increase a variable in-line, you can achieve the same effect using tuples and multiple assignment:

b = ++a becomes:

a,b = (a+1,)*2

and b = a++ becomes:

a,b = a+1, a

Python 3.8 introduces the assignment operator :=, allowing us to achieve foo(++a) with

foo(a:=a+1)

foo(a++) is still elusive though.
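
For plain numbers there is a workaround, sketched below under the assumption that Python 3.8 or later is available: increment with := and then subtract the step again so that the call receives the old value. This is not from the original answer, the function foo is a stand-in, and the trick only works for types where subtracting 1 undoes adding 1.

```python
def foo(value):
    """Stand-in for any function; it simply returns its argument."""
    return value


a = 5
pre = foo(a := a + 1)           # like foo(++a): a becomes 6, foo receives 6
post = foo((a := a + 1) - 1)    # like foo(a++): a becomes 7, foo receives 6

print(pre, post, a)
```

It is arguably less readable than writing the call and the increment on two separate lines, which is probably the more idiomatic choice.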


回答 17

我认为这涉及对象的可变性和不可变性的概念。2,3,4,5在python中是不可变的。请参考下图。2具有固定的ID,直到此python进程为止。

常量和变量的ID

x ++本质上意味着像C的就地增量。在C中,x ++执行就地增量。因此,x = 3,并且x ++会将内存中的3增加到4,这与python中内存中仍然存在3的情况不同。

因此,在python中,您无需在内存中重新创建值。这可能会导致性能优化。

这是基于预感的答案。

I think this relates to the concepts of mutability and immutability of objects. 2, 3, 4, 5 are immutable in Python. Refer to the image below. 2 has a fixed id for the lifetime of this Python process.

ID of constants and variables

x++ would essentially mean an in-place increment like C. In C, x++ performs in-place increments. So, x=3, and x++ would increment 3 in the memory to 4, unlike python where 3 would still exist in memory.

Thus in python, you don’t need to recreate a value in memory. This may lead to performance optimizations.

This is a hunch based answer.


回答 18

我知道这是一个旧线程,但是没有涵盖++ i的最常见用例,即在没有提供索引的情况下手动索引集。这就是为什么python提供enumerate()的原因

示例:在任何给定的语言中,当您使用诸如foreach之类的结构来遍历一个集合时,出于示例的考虑,我们甚至会说它是无序的集合,并且您需要一个唯一的索引来区分所有内容,例如

i = 0
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for key, val in stuff.items() :
  uniquestuff[key] = '{0}{1}'.format(val, i)
  i += 1

在这种情况下,python提供了一个枚举方法,例如

for i, (key, val) in enumerate(stuff.items()) :

I know this is an old thread, but the most common use case for ++i is not covered, that being manually indexing sets when there are no provided indices. This situation is why python provides enumerate()

Example : In any given language, when you use a construct like foreach to iterate over a set – for the sake of the example we’ll even say it’s an unordered set and you need a unique index for everything to tell them apart, say

i = 0
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for key, val in stuff.items() :
  uniquestuff[key] = '{0}{1}'.format(val, i)
  i += 1

In cases like this, python provides an enumerate method, e.g.

for i, (key, val) in enumerate(stuff.items()):
  uniquestuff[key] = '{0}{1}'.format(val, i)

将Flask开发服务器配置为在网络上可见

问题:将Flask开发服务器配置为在网络上可见

我不确定这是否是Flask专用的,但是当我在开发模式(http://localhost:5000)下运行应用程序时,无法从网络上的其他计算机(使用http://[dev-host-ip]:5000)访问它。例如,在开发模式下使用Rails时,它可以正常工作。我找不到有关Flask开发服务器配置的任何文档。任何想法应该配置为启用此功能吗?

I’m not sure if this is Flask specific, but when I run an app in dev mode (http://localhost:5000), I cannot access it from other machines on the network (with http://[dev-host-ip]:5000). With Rails in dev mode, for example, it works fine. I couldn’t find any docs regarding the Flask dev server configuration. Any idea what should be configured to enable this?


回答 0

尽管这是可行的,但您不应在生产中使用Flask dev服务器。Flask开发服务器的设计并非特别安全,稳定或高效。有关正确的解决方案,请参阅有关部署的文档。


将参数添加到中app.run()。默认情况下,它在本地主机上运行,​​将其更改app.run(host= '0.0.0.0')为在您的计算机IP地址上运行。

快速入门页上的“外部可见的服务器”下的Flask网站上记录

外部可见服务器

如果运行服务器,您会注意到该服务器仅可用于您自己的计算机,而不能用于网络中的任何其他服务器。这是默认设置,因为在调试模式下,应用程序的用户可以在计算机上执行任意Python代码。如果禁用了调试或信任网络上的用户,则可以使服务器公开可用。

只需将run()方法的调用更改为如下所示:

app.run(host='0.0.0.0')

这告诉您的操作系统侦听公共IP。

While this is possible, you should not use the Flask dev server in production. The Flask dev server is not designed to be particularly secure, stable, or efficient. See the docs on deploying for correct solutions.


Add a host parameter to your app.run() call. By default the server listens only on localhost; change the call to app.run(host='0.0.0.0') to make it reachable via your machine’s IP address.

Documented on the Flask site under “Externally Visible Server” on the Quickstart page:

Externally Visible Server

If you run the server you will notice that the server is only available from your own computer, not from any other in the network. This is the default because in debugging mode a user of the application can execute arbitrary Python code on your computer. If you have debug disabled or trust the users on your network, you can make the server publicly available.

Just change the call of the run() method to look like this:

app.run(host='0.0.0.0')

This tells your operating system to listen on a public IP.
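
What the host value changes can be seen without Flask at all. The sketch below is an illustration using only the standard socket module (the helper name is invented): binding to 127.0.0.1 restricts the listener to the loopback interface, while 0.0.0.0 is the IPv4 wildcard address that accepts connections on every interface of the machine.

```python
import socket


def bind_address(host):
    """Bind a TCP socket to host on an OS-chosen free port; return the bound IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))      # port 0 lets the OS pick any free port
        sock.listen(1)
        return sock.getsockname()[0]


print(bind_address("127.0.0.1"))  # loopback only: reachable from this machine
print(bind_address("0.0.0.0"))    # wildcard: reachable on all IPv4 interfaces
```

A firewall can still block outside connections even when the server is bound to the wildcard address.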


回答 1

如果使用flask可执行文件启动服务器,则可以使用flask run --host=0.0.0.0更改默认值,从127.0.0.1并将其打开到非本地连接。其他答案描述的config和app.run方法可能是更好的做法,但这也很方便。

外部可见服务器如果运行服务器,您将注意到只能从您自己的计算机访问该服务器,而不能从网络中的任何其他服务器访问该服务器。这是默认设置,因为在调试模式下,应用程序的用户可以在计算机上执行任意Python代码。

如果禁用了调试器或信任网络上的用户,则只需在命令行中添加--host=0.0.0.0,即可使服务器公开可用:

flask run --host=0.0.0.0

这告诉您的操作系统侦听所有公用IP。

参考:http//flask.pocoo.org/docs/0.11/quickstart/

If you use the flask executable to start your server, you can use flask run --host=0.0.0.0 to change the default from 127.0.0.1 and open it up to non local connections. The config and app.run methods that the other answers describe are probably better practice but this can be handy as well.

Externally Visible Server

If you run the server you will notice that the server is only accessible from your own computer, not from any other in the network. This is the default because in debugging mode a user of the application can execute arbitrary Python code on your computer.

If you have the debugger disabled or trust the users on your network, you can make the server publicly available simply by adding --host=0.0.0.0 to the command line:

flask run --host=0.0.0.0

This tells your operating system to listen on all public IPs.

Reference: http://flask.pocoo.org/docs/0.11/quickstart/


回答 2

如果0.0.0.0方法不起作用,请尝试此操作

无聊的东西

我亲自进行了很多努力,以使我的应用可以通过本地服务器访问其他设备(笔记本电脑和手机)。我尝试了0.0.0.0方法,但是没有运气。然后,我尝试更改端口,但没有成功。因此,在尝试了一堆不同的组合之后,我找到了这个组合,它解决了我在本地服务器上部署应用程序的问题。

脚步

  1. 获取计算机的本地IPv4地址。这可以通过ipconfig在Windows以及ifconfiglinux和Mac上键入来完成。

IPv4(Windows)

请注意:以上步骤将在提供该应用程序的计算机上执行,而不是在您正在访问该应用程序的计算机上执行。另请注意,如果断开连接并重新连接到网络,IPv4地址可能会更改。

  2. 现在,只需使用获取的IPv4地址运行flask应用程序即可。

    flask run -h 192.168.X.X

    例如,就我而言(参见图片),我将其运行为:

    flask run -h 192.168.1.100

运行烧瓶应用程序

在我的移动设备上

我手机的屏幕截图

可选的东西

如果您正在Windows上执行此过程,并使用Power Shell作为CLI,但仍然无法访问该网站,请在运行该应用程序的Shell中尝试使用CTRL + C命令。Power Shell有时会冻结,因此需要一点点恢复。这样做甚至可能终止服务器,但有时可以解决问题。

而已。如果您觉得有帮助,请竖起大拇指。😉

一些其他可选的东西

我创建了一个简短的Powershell脚本,可以在需要时为您提供IP地址:

$env:getIp = ipconfig
if ($env:getIp -match '(IPv4[\sa-zA-Z.]+:\s[0-9.]+)') {
    if ($matches[1] -match '([^a-z\s][\d]+[.\d]+)'){
        $ipv4 = $matches[1]
    }
}
echo $ipv4

将其保存到扩展名为.ps1的文件中(对于PowerShell),然后在启动您的应用程序之前对其运行。您可以将其保存在项目文件夹中,并以以下方式运行:

.\getIP.ps1; flask run -h $ipv4

注意:我将上面的shell代码保存在getIP.ps1中。

酷👌

Try this if the 0.0.0.0 method doesn’t work

Boring Stuff

I personally battled a lot to make my app accessible to other devices (laptops and mobile phones) through a local server. I tried the 0.0.0.0 method, but had no luck. Then I tried changing the port, but that didn’t work either. So, after trying a bunch of different combinations, I arrived at this one, and it solved my problem of deploying my app on a local server.

Steps

  1. Get the local IPv4 address of your computer. This can be done by typing ipconfig on Windows and ifconfig on linux and Mac.

IPv4 (Windows)

Please note: The above step is to be performed on the machine you are serving the app on, not on the machine from which you are accessing it. Also note that the IPv4 address might change if you disconnect and reconnect to the network.

  2. Now, simply run the flask app with the acquired IPv4 address.

    flask run -h 192.168.X.X

    E.g. In my case (see the image), I ran it as:

    flask run -h 192.168.1.100

running the flask app

On my mobile device

screenshot from my mobile phone

Optional Stuff

If you are performing this procedure on Windows with PowerShell as the CLI and you still aren’t able to access the website, try pressing CTRL + C in the shell that’s running the app. PowerShell sometimes freezes up and needs a nudge to revive. Doing this might even terminate the server, but it sometimes does the trick.

That’s it. Give a thumbs up if you found this helpful.😉

Some more optional stuff

I have created a short Powershell script that will get you your IP address whenever you need one:

$env:getIp = ipconfig
if ($env:getIp -match '(IPv4[\sa-zA-Z.]+:\s[0-9.]+)') {
    if ($matches[1] -match '([^a-z\s][\d]+[.\d]+)'){
        $ipv4 = $matches[1]
    }
}
echo $ipv4

Save it to a file with the .ps1 extension (for PowerShell), and run it before starting your app. You can save it in your project folder and run it as:

.\getIP.ps1; flask run -h $ipv4

Note: I saved the above shell code in getIP.ps1.

Cool.👌
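
A platform-independent alternative to parsing ipconfig output is to ask the operating system which local address it would use for outgoing traffic. The following is only a sketch: the function name is invented, the probe address 192.0.2.1 is a reserved documentation address chosen as an assumption (no packets are sent to it), and on machines with several network interfaces the result may not be the address you want.

```python
import socket


def local_ipv4(probe_host="192.0.2.1", probe_port=80):
    """Best-effort guess at this machine's LAN IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # connect() on a UDP socket sends nothing; it only makes the OS
            # choose the outgoing interface for that destination.
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no usable route; fall back to loopback


if __name__ == "__main__":
    print(local_ipv4())
```

The printed address can then be passed to flask run -h in the same way as the address found with ipconfig or ifconfig.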


回答 3

如果您的cool应用程序的配置是从外部文件加载的,如以下示例所示,请不要忘记使用HOST =“ 0.0.0.0”更新相应的配置文件

cool.app.run(
    host=cool.app.config.get("HOST", "localhost"),
    port=cool.app.config.get("PORT", 9000)
)            

If your cool app has its configuration loaded from an external file, as in the following example, then don’t forget to update the corresponding config file with HOST="0.0.0.0"

cool.app.run(
    host=cool.app.config.get("HOST", "localhost"),
    port=cool.app.config.get("PORT", 9000)
)            

回答 4

在您的项目中添加以下几行

if __name__ == '__main__':
    app.debug = True
    app.run(host = '0.0.0.0',port=5005)

Add below lines to your project

if __name__ == '__main__':
    app.debug = True
    app.run(host = '0.0.0.0',port=5005)

回答 5

检查服务器上是否打开了特定端口以服务于客户端?

在Ubuntu或Linux发行版中

sudo ufw enable
sudo ufw allow 5000/tcp  # allow the server to handle requests on port 5000

配置应用程序以处理远程请求

app.run(host='0.0.0.0' , port=5000)


python3 app.py & #run application in background

Check whether the required port is open on the server so that it can serve clients.

On Ubuntu or another Linux distribution that uses ufw:

sudo ufw enable
sudo ufw allow 5000/tcp  # allow the server to handle requests on port 5000

Configure the application to handle remote requests

app.run(host='0.0.0.0' , port=5000)


python3 app.py & #run application in background

回答 6

转到CMD(命令提示符)上的项目路径,然后执行以下命令:

set FLASK_APP=ABC.py

SET FLASK_ENV=development

flask run -h [yourIP] -p 8080

您将在CMD上获得以下o / p:-

  • 正在投放Flask应用“ expirement.py”(延迟加载)

现在,您可以使用http:// [yourIP]:8080 / url 在另一台计算机上访问flask应用程序

Go to your project path in CMD (Command Prompt) and execute the following commands:

set FLASK_APP=ABC.py

SET FLASK_ENV=development

flask run -h [yourIP] -p 8080

You will get the following output in CMD:

  • Serving Flask app “expirement.py” (lazy loading)
    • Environment: development
    • Debug mode: on
    • Restarting with stat
    • Debugger is active!
    • Debugger PIN: 199-519-700
    • Running on http://[yourIP]:8080/ (Press CTRL+C to quit)

Now you can access your flask app on another machine using http://[yourIP]:8080/ url


回答 7

如果您在访问使用PyCharm部署的Flask服务器时遇到问题,请考虑以下因素:

PyCharm不会直接运行您的主.py文件,因此if __name__ == '__main__':不会执行其中的任何代码,并且任何更改(例如app.run(host='0.0.0.0', port=5000))都不会生效。

相反,您应该使用“运行配置”配置Flask服务器,尤其是将其放置--host 0.0.0.0 --port 5000在“ 其他选项”字段中。

运行Flask服务器PyCharm的配置

有关在PyCharm中配置Flask服务器的更多信息

If you're having trouble accessing a Flask server that you run from PyCharm, take the following into account:

PyCharm doesn't run your main .py file directly, so any code under if __name__ == '__main__': won't be executed, and any changes there (like app.run(host='0.0.0.0', port=5000)) won't take effect.

Instead, configure the Flask server through Run Configurations, in particular by putting --host 0.0.0.0 --port 5000 into the Additional options field.

Run configurations of Flask server in PyCharm

More about configuring Flask server in PyCharm
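
The reason the __main__ block is skipped can be shown with a small experiment. When a file is imported (which is what the flask command does with your app module), __name__ is the module name rather than '__main__'. The sketch below writes a throwaway module to a temporary directory and loads it both ways; demo_app, MODULE_SOURCE and the helper names are made up for this illustration.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A throwaway module whose __main__ block just flips a flag.
MODULE_SOURCE = """
ran_main_block = False
if __name__ == '__main__':
    ran_main_block = True  # stands in for app.run(host='0.0.0.0', port=5000)
"""


def main_block_ran(path, as_script):
    """Load the file as a script or as an import and report whether the block ran."""
    if as_script:
        # Equivalent to `python demo_app.py`: __name__ is '__main__'.
        namespace = runpy.run_path(str(path), run_name="__main__")
        return namespace["ran_main_block"]
    # Equivalent to `import demo_app`: __name__ is 'demo_app'.
    spec = importlib.util.spec_from_file_location("demo_app", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.ran_main_block


def demo():
    """Return (ran when executed as a script, ran when imported)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_app.py"
        path.write_text(MODULE_SOURCE)
        return main_block_ran(path, as_script=True), main_block_ran(path, as_script=False)
```

Calling demo() shows that the block runs only when the file is executed as a script. That is why the host and port have to be passed through the run configuration instead.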


Answer 8

I had the same problem. I use PyCharm as my editor, and when I created the project, PyCharm created a Flask server. What I did was create a Python server instead, as follows:

Config Python Server PyCharm

Basically, I created a new server configuration, but of type Python rather than Flask.

I hope it helps.


Answer 9

This answer is not specific to Flask; it should apply to any service that cannot be reached from another host.

  1. Use netstat -ano | grep <port> to see whether the listening address is 0.0.0.0 or ::. If it is 127.0.0.1, the service accepts only local requests.
  2. Use tcpdump to see whether any packets are missing. If the capture shows an obvious imbalance, check the iptables rules.

Today I ran my Flask app as usual, but noticed that other servers could not connect to it. I ran netstat -ano | grep <port>, and the local address was :: or 0.0.0.0 (I tried both, and I know that 127.0.0.1 only allows connections from the local host). Then I ran telnet host port, and the output was something like connect to .... This was very odd. So I decided to check with tcpdump -i any port <port> -w w.pcap, and the capture looked like this:

tcpdump result showing only SYN packets from the remote host

Then, in the OUTPUT section of iptables --list, I saw several rules:

iptables list result

These rules blocked outgoing TCP packets that are essential to the handshake. After I deleted them, the problem was gone.
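
The first check above, reading the local address that netstat reports, comes down to telling apart three kinds of address. As a small illustration, the standard library's ipaddress module can classify them; describe_listen_address is a name made up for this sketch.

```python
import ipaddress


def describe_listen_address(address):
    """Classify a listening address the way step 1 above reads netstat output."""
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:
        # 0.0.0.0 (IPv4) or :: (IPv6): the socket accepts connections on every interface.
        return "all interfaces"
    if ip.is_loopback:
        # 127.0.0.1 or ::1: only processes on the same machine can connect.
        return "loopback only"
    # Any other address: reachable only through that one interface.
    return "single interface"
```

If the address is "all interfaces" and remote clients still cannot connect, the cause is more likely in packet filtering, which is where the tcpdump and iptables steps come in.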


Answer 10

I followed the answer above and modified it a bit:

  1. Get your IPv4 address by running ipconfig in the command prompt.
  2. Open the file that contains your Flask code.
  3. In the main function, write app.run(host='your ipv4 address').

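
If you would rather look up the address from Python than from ipconfig, one common trick is to "connect" a UDP socket and read back the local address the OS chose. This is only a sketch: local_ipv4 is a made-up helper, and the probe address 192.0.2.1 is a reserved documentation address used here as an assumption, since a UDP connect sends no packets.

```python
import socket


def local_ipv4(probe_host="192.0.2.1", probe_port=80):
    """Return the IPv4 address of the interface used for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket only selects a route; nothing is transmitted.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()
```

The result could then be passed as app.run(host=local_ipv4()). On a machine with several network interfaces this may not pick the one you expect, so checking it against ipconfig is still worthwhile.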


Answer 11

Go to the project path and run:

set FLASK_APP=ABC.py
SET FLASK_ENV=development
flask run -h [yourIP] -p 8080

You will get the following output in CMD:

 * Serving Flask app "expirement.py" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 199-519-700
 * Running on http://[yourIP]:8080/ (Press CTRL+C to quit)


Answer 12

You can also set the host (to expose the app on a network-facing IP address) and the port through environment variables.

$ export FLASK_APP=app.py
$ export FLASK_ENV=development
$ export FLASK_RUN_PORT=8000
$ export FLASK_RUN_HOST=0.0.0.0

$ flask run
 * Serving Flask app "app.py" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 329-665-000

See "How to get all available Command Options to set environment variables?"
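
The flask run command reads these variables itself. If you start the app with python app.py instead, a script could read the same variables explicitly. The sketch below assumes that setup; run_settings and its defaults are made up for this illustration and are not part of Flask.

```python
import os


def run_settings(environ=None, default_host="127.0.0.1", default_port=5000):
    """Read host and port from FLASK_RUN_HOST / FLASK_RUN_PORT, with fallbacks."""
    env = os.environ if environ is None else environ
    host = env.get("FLASK_RUN_HOST", default_host)
    # Environment values are strings, so the port has to be converted.
    port = int(env.get("FLASK_RUN_PORT", default_port))
    return host, port
```

In app.py this might be used as host, port = run_settings() followed by app.run(host=host, port=port), so the same exports work with either way of starting the server.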


Answer 13

Create a .flaskenv file in the project root directory.

The parameters in this file are typically:

FLASK_APP=app.py
FLASK_ENV=development
FLASK_RUN_HOST=[dev-host-ip]
FLASK_RUN_PORT=5000

If you have a virtual environment, activate it and run pip install python-dotenv.

When python-dotenv is installed, the flask command loads the .flaskenv file automatically, so the settings in it apply in every terminal session without being exported again.

Then you can run flask run.
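
To make the file format concrete, here is a deliberately simplified stand-in for what python-dotenv does with a .flaskenv file. It is not python-dotenv's actual implementation and handles only plain KEY=VALUE lines, comments and optional quotes; parse_env_lines is a made-up name.

```python
def parse_env_lines(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a KEY=VALUE line; ignore it
        # Drop surrounding whitespace and one layer of optional quotes.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings
```

Given the four lines shown above, this yields a dict with FLASK_APP, FLASK_ENV, FLASK_RUN_HOST and FLASK_RUN_PORT as keys. The values are the same ones you would otherwise export by hand.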