Category archive: Q&A

How to query with GROUP BY in Django?

July 27, 2021 Python实用宝典

Question: How to query with GROUP BY in Django?

I query a model:

Members.objects.all()

And it returns:

Eric, Salesman, X-Shop
Freddie, Manager, X2-Shop
Teddy, Salesman, X2-Shop
Sean, Manager, X2-Shop

What I want is to know the best Django way to fire a group_by query to my database, like:

Members.objects.all().group_by('designation')

Which doesn’t work, of course. I know we can do some tricks on django/db/models/query.py, but I am just curious to know how to do it without patching.

Answer 0

If you mean to do aggregation you can use the aggregation features of the ORM:

from django.db.models import Count
Members.objects.values('designation').annotate(dcount=Count('designation'))

This results in a query similar to

SELECT designation, COUNT(designation) AS dcount
FROM members GROUP BY designation

and the output would be of the form

[{'designation': 'Salesman', 'dcount': 2}, 
 {'designation': 'Manager', 'dcount': 2}]
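To make the grouping concrete without a database, the same counts can be reproduced in plain Python. This is only a sketch over the sample rows from the question; the `rows` list and the `group_count` helper are hypothetical names for illustration and are not part of Django.

```python
from collections import Counter

# Sample rows from the question: (name, designation, shop)
rows = [
    ("Eric", "Salesman", "X-Shop"),
    ("Freddie", "Manager", "X2-Shop"),
    ("Teddy", "Salesman", "X2-Shop"),
    ("Sean", "Manager", "X2-Shop"),
]


def group_count(records, index):
    """Count records per distinct value in column `index` (GROUP BY + COUNT)."""
    counts = Counter(record[index] for record in records)
    return [{"designation": key, "dcount": n} for key, n in counts.items()]


result = group_count(rows, 1)
print(result)
# [{'designation': 'Salesman', 'dcount': 2}, {'designation': 'Manager', 'dcount': 2}]
```

With the ORM the counting happens in the database, which is usually preferable to pulling every row into Python.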

Answer 1

An easy solution, though not the proper way, is to use raw SQL:

results = Members.objects.raw('SELECT * FROM myapp_members GROUP BY designation')

Another solution is to use the group_by property:

from django.db.models.query import QuerySet

query = Members.objects.all().query
query.group_by = ['designation']
results = QuerySet(query=query, model=Members)

You can now iterate over the results variable to retrieve your results. Note that group_by is not documented and may change in future versions of Django.

And… why do you want to use group_by? If you don’t use aggregation, you can use order_by to achieve a similar result.

Answer 2

You can also use the regroup template tag to group by attributes. From the docs:

cities = [
    {'name': 'Mumbai', 'population': '19,000,000', 'country': 'India'},
    {'name': 'Calcutta', 'population': '15,000,000', 'country': 'India'},
    {'name': 'New York', 'population': '20,000,000', 'country': 'USA'},
    {'name': 'Chicago', 'population': '7,000,000', 'country': 'USA'},
    {'name': 'Tokyo', 'population': '33,000,000', 'country': 'Japan'},
]

...

{% regroup cities by country as country_list %}

<ul>
    {% for country in country_list %}
        <li>{{ country.grouper }}
            <ul>
            {% for city in country.list %}
                <li>{{ city.name }}: {{ city.population }}</li>
            {% endfor %}
            </ul>
        </li>
    {% endfor %}
</ul>

Looks like this:

India
- Mumbai: 19,000,000
- Calcutta: 15,000,000
USA
- New York: 20,000,000
- Chicago: 7,000,000
Japan
- Tokyo: 33,000,000

It also works on QuerySets I believe.

source: https://docs.djangoproject.com/en/2.1/ref/templates/builtins/#regroup

edit: note the regroup tag does not work as you would expect it to if your list of dictionaries is not key-sorted. It works iteratively. So sort your list (or query set) by the key of the grouper before passing it to the regroup tag.
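The sorting caveat can be seen outside of templates with `itertools.groupby`, which also groups only consecutive items. This is a sketch of the analogous behaviour in plain Python; the shortened `cities` list and the `regroup` helper are hypothetical and are not the template tag itself.

```python
from itertools import groupby
from operator import itemgetter

cities = [
    {"name": "Mumbai", "country": "India"},
    {"name": "New York", "country": "USA"},
    {"name": "Calcutta", "country": "India"},
]


def regroup(items, key):
    """Group consecutive items sharing `key`, the way an iterative grouper does."""
    return [(grouper, [item["name"] for item in group])
            for grouper, group in groupby(items, key=itemgetter(key))]


unsorted_groups = regroup(cities, "country")
sorted_groups = regroup(sorted(cities, key=itemgetter("country")), "country")
print(unsorted_groups)
# [('India', ['Mumbai']), ('USA', ['New York']), ('India', ['Calcutta'])]
print(sorted_groups)
# [('India', ['Mumbai', 'Calcutta']), ('USA', ['New York'])]
```

Without sorting, India appears twice because its two cities are not adjacent in the input.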

Answer 3

You need to do custom SQL as exemplified in this snippet:

Custom SQL via subquery

Or in a custom manager as shown in the online Django docs:

Adding extra Manager methods

Answer 4

Django does not support free-form GROUP BY queries. I learned this the hard way. The ORM is not designed to support what you want to do without custom SQL. You are limited to:

Raw SQL (i.e. MyModel.objects.raw()).
cr.execute statements (and hand-made parsing of the result).
.annotate() (the grouping is performed on the child model for .annotate(), in examples like aggregating lines_count=Count('lines')).

On a queryset qs you can set qs.query.group_by = ['field1', 'field2', ...], but this is risky if you don’t know which query you are editing, and there is no guarantee that it will work without breaking the internals of the QuerySet object. Besides, it is an internal (undocumented) API, and accessing it directly risks the code no longer being compatible with future Django versions.

Answer 5

There is a module that allows you to group Django models and still work with a QuerySet in the result: https://github.com/kako-nawao/django-group-by

For example:

from django_group_by import GroupByMixin

class BookQuerySet(QuerySet, GroupByMixin):
    pass

class Book(Model):
    title = TextField(...)
    author = ForeignKey(User, ...)
    shop = ForeignKey(Shop, ...)
    price = DecimalField(...)

class GroupedBookListView(PaginationMixin, ListView):
    template_name = 'book/books.html'
    model = Book
    paginate_by = 100

    def get_queryset(self):
        return Book.objects.group_by('title', 'author').annotate(
            shop_count=Count('shop'), price_avg=Avg('price')).order_by(
            'title', 'author').distinct()

    def get_context_data(self, **kwargs):
        return super().get_context_data(total_count=self.get_queryset().count(), **kwargs)

‘book/books.html’

<ul>
{% for book in object_list %}
    <li>
        <h2>{{ book.title }}</h2>
        <p>{{ book.author.last_name }}, {{ book.author.first_name }}</p>
        <p>{{ book.shop_count }}</p>
        <p>{{ book.price_avg }}</p>
    </li>
{% endfor %}
</ul>

The difference to the annotate/aggregate basic Django queries is the use of the attributes of a related field, e.g. book.author.last_name.

If you need the PKs of the instances that have been grouped together, add the following annotation:

.annotate(pks=ArrayAgg('id'))

NOTE: ArrayAgg is a Postgres specific function, available from Django 1.9 onwards: https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/aggregates/#arrayagg

Answer 6

The documentation says that you can use values() to group the queryset.

class Travel(models.Model):
    interest = models.ForeignKey(Interest)
    user = models.ForeignKey(User)
    time = models.DateTimeField(auto_now_add=True)

# Find the travel and group by the interest:

>>> Travel.objects.values('interest').annotate(Count('user'))
<QuerySet [{'interest': 5, 'user__count': 2}, {'interest': 6, 'user__count': 1}]>
# Interest id=5 has been visited 2 times,
# and interest id=6 has been visited only once.

>>> Travel.objects.values('interest').annotate(Count('user', distinct=True)) 
<QuerySet [{'interest': 5, 'user__count': 1}, {'interest': 6, 'user__count': 1}]>
# Interest id=5 has been visited by only one person (but this person
# visited it 2 times).

You can find all the books and group them by name using this code:

Book.objects.values('name').annotate(Count('id')).order_by() # ensure you add the order_by()

You can find a cheat sheet here.

Answer 7

If I’m not mistaken, you can use whatever-query-set.group_by = ['field']

Answer 8

from django.db.models import Sum
Members.objects.annotate(total=Sum('designation'))

First you need to import Sum, then ...

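Download a file from the web using Python 3

July 27, 2021 Python实用宝典

Question: Download a file from the web using Python 3

I am creating a program that will download a .jar (Java) file from a web server by reading the URL specified in the .jad file of the same game/application. I’m using Python 3.2.1.

I’ve managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to the JAR file), but as you may imagine, the extracted value is a type() string.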

Answer 0

If you want to obtain the contents of a web page into a variable, just read the response of urllib.request.urlopen:

import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary

The easiest way to download and save a file is to use the urllib.request.urlretrieve function:

import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)

import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)

But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).

So the most correct way to do this would be to use the urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using shutil.copyfileobj.

import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. But this works well only for small files.

import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)

It is possible to extract .gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.

import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
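The chunked-copy approach can be tried without any network access, because `shutil.copyfileobj` only needs file-like objects. The sketch below uses an in-memory `io.BytesIO` as a stand-in for the HTTP response; `save_stream` and `fake_response` are hypothetical names used for illustration.

```python
import io
import shutil


def save_stream(source, destination, chunk_size=64 * 1024):
    """Copy a binary file-like object to another in fixed-size chunks."""
    shutil.copyfileobj(source, destination, chunk_size)


# Stand-in for an HTTP response body; urlopen() returns a comparable
# file-like object that can be passed as `source` in the same way.
fake_response = io.BytesIO(b"x" * 200_000)
target = io.BytesIO()
save_stream(fake_response, target)
print(len(target.getvalue()))  # 200000
```

Because the data is copied chunk by chunk, memory use stays bounded by `chunk_size` even for large downloads.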

Answer 1

I use the requests package whenever I want something related to HTTP requests, because its API is very easy to start with:

First, install requests:

$ pip install requests

Then the code:

from requests import get  # to make GET request


def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)

Answer 2

I hope I understood the question right, which is: how to download a file from a server when the URL is stored in a string type?

I download files and save them locally using the code below:

import requests

url = 'https://www.python.org/static/img/python-logo.png'
# Raw string so the backslashes in the Windows path are not treated as escapes
fileName = r'D:\Python\dwnldPythonLogo.png'
req = requests.get(url)
# Write the response body to disk in chunks of up to 100000 bytes
with open(fileName, 'wb') as file:
    for chunk in req.iter_content(100000):
        file.write(chunk)

Answer 3

Here we can use urllib’s Legacy interface in Python3:

The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.

Example (two lines of code):

import urllib.request

url = 'https://www.python.org/static/img/python-logo.png'
urllib.request.urlretrieve(url, "logo.png")

Answer 4

You can use the wget package, which is named after the popular shell download tool: https://pypi.python.org/pypi/wget. This may be the simplest method, since you do not need to open the destination file yourself. Here is an example.

import wget
url = 'https://i1.wp.com/python3.codes/wp-content/uploads/2015/06/Python3-powered.png?fit=650%2C350'  
wget.download(url, '/Users/scott/Downloads/cat4.jpg')

Answer 5

Yes, requests is definitely a great package to use for anything related to HTTP requests. But we also need to be careful with the encoding of the incoming data. Below is an example that illustrates the difference:


from requests import get

# case when the response is byte array
url = 'some_image_url'

response = get(url)
with open('output', 'wb') as file:
    file.write(response.content)


# case when the response is text
# requests may fall back to iso-8859-1 for text responses that declare no
# charset; in that case we have to override the response encoding
url = 'some_page_url'

response = get(url)
# override encoding by real educated guess as provided by chardet
response.encoding = response.apparent_encoding

with open('output', 'w', encoding='utf-8') as file:
    # response.text is the decoded str; response.content would be bytes
    file.write(response.text)

Answer 6

Motivation

Sometimes we want to fetch a picture without downloading it to a real file, i.e., download the data and keep it in memory.

For example, suppose I use machine learning to train a model that can recognize an image containing a number (a bar code).

When I crawl websites that have such images, I can use the model to recognize them, and I don’t want to save those pictures on my disk drive.

In that case, you can try the method below to keep the downloaded data in memory.

Points

import requests
from io import BytesIO
response = requests.get(url)
with BytesIO() as io_obj:
    for chunk in response.iter_content(chunk_size=4096):
        io_obj.write(chunk)

Basically, this is like @Ranvijay Kumar’s answer.

An Example

import requests
from typing import NewType, TypeVar
from io import StringIO, BytesIO
import matplotlib.pyplot as plt
import imageio

URL = NewType('URL', str)
T_IO = TypeVar('T_IO', StringIO, BytesIO)


def download_and_keep_on_memory(url: URL, headers=None, timeout=None, **option) -> T_IO:
    chunk_size = option.get('chunk_size', 4096)  # default 4KB
    max_size = 1024 ** 2 * option.get('max_size', -1)  # MB, default will ignore.
    response = requests.get(url, headers=headers, timeout=timeout)
    if response.status_code != 200:
        raise requests.ConnectionError(f'{response.status_code}')

    instance_io = StringIO if isinstance(next(response.iter_content(chunk_size=1)), str) else BytesIO
    io_obj = instance_io()
    cur_size = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        cur_size += chunk_size
        if 0 < max_size < cur_size:
            break
        io_obj.write(chunk)
    io_obj.seek(0)
    """ save it to real file.
    with open('temp.png', mode='wb') as out_f:
        out_f.write(io_obj.read())
    """
    return io_obj


def main():
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'statics.591.com.tw',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
    }
    io_img = download_and_keep_on_memory(URL('http://statics.591.com.tw/tools/showPhone.php?info_data=rLsGZe4U%2FbphHOimi2PT%2FhxTPqI&type=rLEFMu4XrrpgEw'),
                                         headers,  # You may need this. Otherwise, some websites will send the 404 error to you.
                                         max_size=4)  # max loading < 4MB
    with io_img:
        plt.rc('axes.spines', top=False, bottom=False, left=False, right=False)
        plt.rc(('xtick', 'ytick'), color=(1, 1, 1, 0))  # same of plt.axis('off')
        plt.imshow(imageio.imread(io_img, as_gray=False, pilmode="RGB"))
        plt.show()


if __name__ == '__main__':
    main()

Answer 7

from urllib import request

def get(url):
    with request.urlopen(url) as r:
        return r.read()


def download(url, file=None):
    if not file:
        file = url.split('/')[-1]
    with open(file, 'wb') as f:
        f.write(get(url))

What is the best way to compare floats for almost-equality in Python?

July 27, 2021 Python实用宝典

Question: What is the best way to compare floats for almost-equality in Python?

It’s well known that comparing floats for equality is a little fiddly due to rounding and precision issues.

For example: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

What is the recommended way to deal with this in Python?

Surely there is a standard library function for this somewhere?

Answer 0

Python 3.5 adds the math.isclose and cmath.isclose functions as described in PEP 485.

If you’re using an earlier version of Python, the equivalent function is given in the documentation.

def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

rel_tol is a relative tolerance, it is multiplied by the greater of the magnitudes of the two arguments; as the values get larger, so does the allowed difference between them while still considering them equal.

abs_tol is an absolute tolerance that is applied as-is in all cases. If the difference is less than either of those tolerances, the values are considered equal.
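A short sketch of how the two tolerances behave in practice, using `math.isclose` from the standard library. The variable names are only for illustration.

```python
import math

total = 0.1 + 0.2

exact_equal = total == 0.3                  # exact comparison fails
close_default = math.isclose(total, 0.3)    # default rel_tol=1e-09 accepts it

# Near zero a relative tolerance alone is never satisfied,
# so an explicit abs_tol is needed.
near_zero_default = math.isclose(1e-12, 0.0)
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(exact_equal, close_default, near_zero_default, near_zero_abs)
# False True False True
```

The default `abs_tol` is 0.0, which is why comparisons against zero need it set explicitly.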

Answer 1

Is something as simple as the following not good enough?

return abs(f1 - f2) <= allowed_error
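Whether this is good enough depends on the magnitude of the values. The sketch below wraps the expression in a hypothetical `within_abs` helper, with an assumed threshold of 1e-9, to show two cases where a fixed absolute error gives a surprising answer.

```python
def within_abs(f1, f2, allowed_error=1e-9):
    """Absolute-difference test with a fixed threshold."""
    return abs(f1 - f2) <= allowed_error


# Two neighbouring doubles near 1e16 are 2.0 apart, so the test rejects them.
large_pair = within_abs(1e16, 1e16 + 2.0)
# 2e-12 is twice 1e-12, yet the difference is far below the threshold.
tiny_pair = within_abs(1e-12, 2e-12)
print(large_pair, tiny_pair)  # False True
```

A fixed threshold therefore only works when the expected magnitude of the inputs is known in advance.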

Answer 2

I would agree that Gareth’s answer is probably most appropriate as a lightweight function/solution.

But I thought it would be helpful to note that if you are using NumPy or are considering it, there is a packaged function for this.

numpy.isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

A little disclaimer though: installing NumPy can be a non-trivial experience depending on your platform.

Answer 3

Use Python’s decimal module, which provides the Decimal class.

From the comments:

It is worth noting that if you’re doing math-heavy work and you don’t absolutely need the precision from decimal, this can really bog things down. Floats are way, way faster to deal with, but imprecise. Decimals are extremely precise but slow.
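A minimal sketch of the difference, assuming the values are built from strings so that no binary rounding happens before the `Decimal` is created:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly.
float_sum_matches = (0.1 + 0.2) == 0.3

# Decimals constructed from strings hold these values exactly.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2")) == Decimal("0.3")

print(float_sum_matches, decimal_sum_matches)  # False True
```

Note that `Decimal(0.1)`, built from a float rather than a string, would carry the float's rounding error with it.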

Answer 4

I’m not aware of anything in the Python standard library (or elsewhere) that implements Dawson’s AlmostEqual2sComplement function. If that’s the sort of behaviour you want, you’ll have to implement it yourself. (In which case, rather than using Dawson’s clever bitwise hacks, you’d probably do better to use more conventional tests of the form if abs(a-b) <= eps1*(abs(a)+abs(b)) + eps2 or similar. To get Dawson-like behaviour you might say something like if abs(a-b) <= eps*max(EPS,abs(a),abs(b)) for some small fixed EPS; this isn’t exactly the same as Dawson, but it’s similar in spirit.)
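The second formula can be written out as a small function. This is a sketch only: `almost_equal` is a hypothetical name, `floor` plays the role of the fixed EPS in the text, and both default values are assumptions chosen for illustration.

```python
def almost_equal(a, b, eps=1e-9, floor=1e-300):
    """Relative comparison scaled by the larger magnitude, with a tiny floor.

    `floor` keeps the right-hand side positive when both inputs are zero.
    """
    return abs(a - b) <= eps * max(floor, abs(a), abs(b))


close_values = almost_equal(1.0, 1.0 + 1e-12)
distinct_values = almost_equal(1.0, 1.001)
both_zero = almost_equal(0.0, 0.0)
print(close_values, distinct_values, both_zero)  # True False True
```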

Answer 5

The common wisdom that floating-point numbers cannot be compared for equality is inaccurate. Floating-point numbers are no different from integers: If you evaluate “a == b”, you will get true if they are identical numbers and false otherwise (with the understanding that two NaNs are of course not identical numbers).

The actual problem is this: If I have done some calculations and am not sure the two numbers I have to compare are exactly correct, then what? This problem is the same for floating-point as it is for integers. If you evaluate the integer expression “7/3*3”, it will not compare equal to “7*3/3”.
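
The integer example can be checked directly. This sketch assumes Python 3, where truncating integer division is written // (the “/” in the text is the C-style integer division).

```python
# Integer division truncates, so the order of operations changes the result.
left = 7 // 3 * 3    # (7 // 3) is 2, then 2 * 3 is 6
right = 7 * 3 // 3   # (7 * 3) is 21, then 21 // 3 is 7

print(left, right, left == right)  # 6 7 False
```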

So suppose we asked “How do I compare integers for equality?” in such a situation. There is no single answer; what you should do depends on the specific situation, notably what sort of errors you have and what you want to achieve.

Here are some possible choices.

If you want to get a “true” result if the mathematically exact numbers would be equal, then you might try to use the properties of the calculations you perform to prove that you get the same errors in the two numbers. If that is feasible, and you compare two numbers that result from expressions that would give equal numbers if computed exactly, then you will get “true” from the comparison. Another approach is that you might analyze the properties of the calculations and prove that the error never exceeds a certain amount, perhaps an absolute amount or an amount relative to one of the inputs or one of the outputs. In that case, you can ask whether the two calculated numbers differ by at most that amount, and return “true” if they are within the interval. If you cannot prove an error bound, you might guess and hope for the best. One way of guessing is to evaluate many random samples and see what sort of distribution you get in the results.

Of course, since we only set the requirement that you get “true” if the mathematically exact results are equal, we left open the possibility that you get “true” even if they are unequal. (In fact, we can satisfy the requirement by always returning “true”. This makes the calculation simple but is generally undesirable, so I will discuss improving the situation below.)

If you want to get a “false” result if the mathematically exact numbers would be unequal, you need to prove that your evaluation of the numbers yields different numbers if the mathematically exact numbers would be unequal. This may be impossible for practical purposes in many common situations. So let us consider an alternative.

A useful requirement might be that we get a “false” result if the mathematically exact numbers differ by more than a certain amount. For example, perhaps we are going to calculate where a ball thrown in a computer game traveled, and we want to know whether it struck a bat. In this case, we certainly want to get “true” if the ball strikes the bat, and we want to get “false” if the ball is far from the bat, and we can accept an incorrect “true” answer if the ball in a mathematically exact simulation missed the bat but is within a millimeter of hitting the bat. In that case, we need to prove (or guess/estimate) that our calculation of the ball’s position and the bat’s position have a combined error of at most one millimeter (for all positions of interest). This would allow us to always return “false” if the ball and bat are more than a millimeter apart, to return “true” if they touch, and to return “true” if they are close enough to be acceptable.

So, how you decide what to return when comparing floating-point numbers depends very much on your specific situation.

As to how you go about proving error bounds for calculations, that can be a complicated subject. Any floating-point implementation using the IEEE 754 standard in round-to-nearest mode returns the floating-point number nearest to the exact result for any basic operation (notably multiplication, division, addition, subtraction, square root). (In case of tie, round so the low bit is even.) (Be particularly careful about square root and division; your language implementation might use methods that do not conform to IEEE 754 for those.) Because of this requirement, we know the error in a single result is at most 1/2 of the value of the least significant bit. (If it were more, the rounding would have gone to a different number that is within 1/2 the value.)
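
The half-unit bound for a single operation can be checked numerically. This is only a sketch for addition, assuming Python 3.9 or later (for math.ulp) and a platform whose floats are IEEE 754 doubles in round-to-nearest mode; the helper name rounding_error and the sample inputs are made up for illustration.

```python
import math
from fractions import Fraction


def rounding_error(a, b):
    """Return (error, half_ulp) for the float sum a + b, as exact Fractions."""
    computed = a + b
    exact = Fraction(a) + Fraction(b)        # exact sum of the two stored values
    error = abs(Fraction(computed) - exact)  # how far the rounded result is off
    half_ulp = Fraction(math.ulp(computed)) / 2
    return error, half_ulp


for a, b in [(0.1, 0.2), (1.0, 1e-17), (123456.789, 0.000123)]:
    error, half_ulp = rounding_error(a, b)
    print(a, b, error <= half_ulp)
```

Fraction converts each float to the exact rational it stores, so the comparison itself introduces no further rounding.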

Going on from there gets substantially more complicated; the next step is performing an operation where one of the inputs already has some error. For simple expressions, these errors can be followed through the calculations to reach a bound on the final error. In practice, this is only done in a few situations, such as working on a high-quality mathematics library. And, of course, you need precise control over exactly which operations are performed. High-level languages often give the compiler a lot of slack, so you might not know in which order operations are performed.

There is much more that could be (and is) written about this topic, but I have to stop there. In summary, the answer is: There is no library routine for this comparison because there is no single solution that fits most needs that is worth putting into a library routine. (If comparing with a relative or absolute error interval suffices for you, you can do it simply without a library routine.)

Answer 6

If you want to use it in a testing/TDD context, I’d say this is a standard way:

from nose.tools import assert_almost_equals

assert_almost_equals(x, y, places=7) #default is 7
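
If nose is not available, the standard library's unittest offers assertAlmostEqual with the same places argument. A minimal sketch follows; the test class, its method names and the sample values are illustrative only.

```python
import unittest


class AlmostEqualExample(unittest.TestCase):
    def test_sum_is_close(self):
        # Passes because the difference rounds to 0 at 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3, places=7)

    def test_explicit_delta(self):
        # delta gives an absolute tolerance instead of a number of places.
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)


# Run the two tests programmatically and report whether they all passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```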

Answer 7

math.isclose() was added in Python 3.5 for this purpose (source code). Here is a port of it to Python 2. Unlike Mark Ransom’s one-liner, it handles “inf” and “-inf” properly.

import math

def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    '''
    Python 2 implementation of Python 3.5 math.isclose()
    https://hg.python.org/cpython/file/tip/Modules/mathmodule.c#l1993
    '''
    # sanity check on the inputs
    if rel_tol < 0 or abs_tol < 0:
        raise ValueError("tolerances must be non-negative")

    # short circuit exact equality -- needed to catch two infinities of
    # the same sign. And perhaps speeds things up a bit sometimes.
    if a == b:
        return True

    # This catches the case of two infinities of opposite sign, or
    # one infinity and one finite number. Two infinities of opposite
    # sign would otherwise have an infinite relative tolerance.
    # Two infinities of the same sign are caught by the equality check
    # above.
    if math.isinf(a) or math.isinf(b):
        return False

    # now do the regular computation
    # this is essentially the "weak" test from the Boost library
    diff = math.fabs(b - a)
    result = (((diff <= math.fabs(rel_tol * b)) or
               (diff <= math.fabs(rel_tol * a))) or
              (diff <= abs_tol))
    return result
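
On Python 3.5 or later the built-in math.isclose can be used directly. The short check below exercises the infinity cases that the comments in the port describe; the sample values are arbitrary.

```python
import math

inf = float("inf")

print(math.isclose(inf, inf))          # True: same-sign infinities are equal
print(math.isclose(inf, -inf))         # False: opposite-sign infinities
print(math.isclose(inf, 1e308))        # False: infinity vs. a finite number
print(math.isclose(1.0, 1.0 + 1e-10))  # True: within the default rel_tol=1e-09
```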

Answer 8

I found the following comparison helpful:

str(f1) == str(f2)
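
One caveat worth checking before relying on this: the result depends on the Python version. In Python 2, str() of a float kept only 12 significant digits, so the comparison was approximate. In Python 3, str() keeps enough digits to round-trip the value, so the comparison behaves like exact equality. A small sketch, assuming Python 3:

```python
a = 0.1 + 0.2
b = 0.3

# Python 3: str() keeps enough digits to round-trip, so this is an exact test.
print(str(a), str(b), str(a) == str(b))   # 0.30000000000000004 0.3 False

# Formatting to 12 significant digits mimics what Python 2's str() did.
print(format(a, ".12g") == format(b, ".12g"))  # True
```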

Answer 9

For some of the cases where you can affect the source number representation, you can represent them as fractions instead of floats, using integer numerator and denominator. That way you can have exact comparisons.

See the Fraction class in the fractions module for details.
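
A short illustration of the idea, using only the standard library; the particular values are just an example.

```python
from fractions import Fraction

# Floats: 0.1 and 0.2 are not stored exactly, so the sum misses 0.3.
print(0.1 + 0.2 == 0.3)                                      # False

# Fractions built from integers (or decimal strings) are exact.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(Fraction("0.1") + Fraction("0.2") == Fraction("0.3"))  # True
```

Note that Fraction(0.1), built from a float, would capture the float's stored value rather than one tenth, so the exactness only helps when the inputs start out as integers or strings.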

Answer 10

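I liked @Sesquipedal’s suggestion, but with a modification (the special case where both values are 0 would otherwise return False). In my case I was on Python 2.7 and just used a simple function:

def is_close(f1, f2, tol):  # name and signature are illustrative
    if f1 == 0 and f2 == 0:
        # without this branch, 0 < tol*0 is False for two zeros
        return True
    else:
        return abs(f1-f2) < tol*max(abs(f1),abs(f2))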

Answer 11

Useful for the case where you want to make sure two numbers are the same ‘up to precision’, with no need to specify a tolerance:

Find the minimum precision (number of decimal places) of the two numbers
Round both of them to that minimum precision and compare

def isclose(a, b):
    astr = str(a)
    aprec = len(astr.split('.')[1]) if '.' in astr else 0
    bstr = str(b)
    bprec = len(bstr.split('.')[1]) if '.' in bstr else 0
    prec = min(aprec, bprec)
    return round(a, prec) == round(b, prec)

As written, this only works for numbers without an ‘e’ in their string representation (meaning 0.9999999999995e-4 < number <= 0.9999999999995e11).

Example:

>>> isclose(10.0,10.049)
True
>>> isclose(10.0,10.05)
False

Answer 12

To compare up to a given number of decimal places without atol/rtol:

def almost_equal(a, b, decimal=6):
    return '{0:.{1}f}'.format(a, decimal) == '{0:.{1}f}'.format(b, decimal)

print(almost_equal(0.0, 0.0001, decimal=4)) # False: '0.0000' vs '0.0001'
print(almost_equal(0.0, 0.0001, decimal=3)) # True: '0.000' vs '0.000'

Answer 13

This may be a bit of an ugly hack, but it works pretty well when you don’t need more than the default float precision (about 11 decimals).

The round_to function uses the format method of the built-in str class to round the float to a string with the required number of decimals. It builds that format call as source text and runs it with the built-in eval function; note that the result is the formatted string, not a float.

The is_close function just applies a simple conditional to the two rounded strings.

def round_to(float_num, prec):
    return eval("'{:." + str(int(prec)) + "f}'.format(" + str(float_num) + ")")

def is_close(float_a, float_b, prec):
    if round_to(float_a, prec) == round_to(float_b, prec):
        return True
    return False

>>> a = 10.0
>>> b = 10.0001
>>> print(is_close(a, b, prec=3))
True
>>> print(is_close(a, b, prec=4))
False

Update:

As suggested by @stepehjfox, a cleaner way to build the round_to function that avoids “eval” is to use nested formatting:

def round_to(float_num, prec):
    return '{:.{precision}f}'.format(float_num, precision=prec)

Following the same idea, the code can be even simpler using the great new f-strings (Python 3.6+):

def round_to(float_num, prec):
    return f'{float_num:.{prec}f}'

So, we could even wrap it up all in one simple and clean ‘is_close’ function:

def is_close(a, b, prec):
    return f'{a:.{prec}f}' == f'{b:.{prec}f}'
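
One thing to keep in mind with any compare-after-rounding approach: two values that are very close can still land on opposite sides of a rounding boundary. The sketch below repeats the is_close function from above so that it runs on its own (Python 3.6+); the sample values and the 0.005 tolerance are arbitrary.

```python
def is_close(a, b, prec):
    return f'{a:.{prec}f}' == f'{b:.{prec}f}'


# The two values differ by only 0.0002 but fall on opposite sides of the
# rounding boundary at 0.125, so their 2-decimal strings differ.
print(is_close(0.1249, 0.1251, 2))      # False ('0.12' vs '0.13')

# An explicit absolute tolerance treats the same pair as close.
print(abs(0.1249 - 0.1251) <= 0.005)    # True
```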

Answer 14

In terms of absolute error, you can just check

if abs(a - b) <= error:
    print("Almost equal")

Some information on why floats behave strangely in Python: https://youtu.be/v4HhvoNLILk?t=1129

You can also use math.isclose for relative errors.
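
A brief sketch of how the two tolerances of math.isclose (Python 3.5+) behave; the numbers are arbitrary examples.

```python
import math

# Relative tolerance scales with the magnitude of the inputs.
print(math.isclose(1000.0, 1000.0001, rel_tol=1e-6))   # True
print(math.isclose(1000.0, 1001.0, rel_tol=1e-6))      # False

# Near zero a purely relative test is very strict, because the allowed
# difference shrinks with the inputs; abs_tol adds an absolute floor.
print(math.isclose(1e-12, 0.0))                  # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))    # True
```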

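Q&A

How to pretty-print a numpy.array without scientific notation and with given precision?

July 27, 2021 Python实用宝典

Question: How to pretty-print a numpy.array without scientific notation and with given precision?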

I’m curious whether there is any way to print formatted numpy.arrays, e.g., in a way similar to this:

x = 1.23456
print '%.3f' % x

If I want to print the numpy.array of floats, it prints several decimals, often in ‘scientific’ format, which is rather hard to read even for low-dimensional arrays. However, numpy.array apparently has to be printed as a string, i.e., with %s. Is there a solution for this?

Answer 0

You can use set_printoptions to set the precision of the output:

import numpy as np
x=np.random.random(10)
print(x)
# [ 0.07837821  0.48002108  0.41274116  0.82993414  0.77610352  0.1023732
#   0.51303098  0.4617183   0.33487207  0.71162095]

np.set_printoptions(precision=3)
print(x)
# [ 0.078  0.48   0.413  0.83   0.776  0.102  0.513  0.462  0.335  0.712]

And suppress suppresses the use of scientific notation for small numbers:

y=np.array([1.5e-10,1.5,1500])
print(y)
# [  1.500e-10   1.500e+00   1.500e+03]
np.set_printoptions(suppress=True)
print(y)
# [    0.      1.5  1500. ]

See the docs for set_printoptions for other options.

To apply print options locally, using NumPy 1.15.0 or later, you could use the numpy.printoptions context manager. For example, inside the with-suite precision=3 and suppress=True are set:

x = np.random.random(10)
with np.printoptions(precision=3, suppress=True):
    print(x)
    # [ 0.073  0.461  0.689  0.754  0.624  0.901  0.049  0.582  0.557  0.348]

But outside the with-suite the print options are back to default settings:

print(x)    
# [ 0.07334334  0.46132615  0.68935231  0.75379645  0.62424021  0.90115836
#   0.04879837  0.58207504  0.55694118  0.34768638]

If you are using an earlier version of NumPy, you can create the context manager yourself. For example,

import numpy as np
import contextlib

@contextlib.contextmanager
def printoptions(*args, **kwargs):
    original = np.get_printoptions()
    np.set_printoptions(*args, **kwargs)
    try:
        yield
    finally: 
        np.set_printoptions(**original)

x = np.random.random(10)
with printoptions(precision=3, suppress=True):
    print(x)
    # [ 0.073  0.461  0.689  0.754  0.624  0.901  0.049  0.582  0.557  0.348]

To prevent zeros from being stripped from the end of floats:

np.set_printoptions now has a formatter parameter which allows you to specify a format function for each type.

np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
print(x)

which prints

[ 0.078  0.480  0.413  0.830  0.776  0.102  0.513  0.462  0.335  0.712]

instead of

[ 0.078  0.48   0.413  0.83   0.776  0.102  0.513  0.462  0.335  0.712]
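
The formatter option can also be combined with the context manager, so the fixed-width output applies only locally. A minimal sketch, assuming NumPy 1.15 or later; the array values are made up for illustration.

```python
import numpy as np

x = np.array([0.078378, 0.48, 0.83])

# Apply the fixed-width formatter only inside the with-block
# (np.printoptions needs NumPy 1.15 or later).
with np.printoptions(formatter={'float': '{: 0.3f}'.format}):
    inside = str(x)

outside = str(x)
print(inside)   # trailing zeros kept, e.g. 0.480 and 0.830
print(outside)  # default formatting again
```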

Answer 1

You can get a subset of the np.set_printoptions functionality from the np.array_str command, which applies only to a single print statement.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.array_str.html

For example:

In [27]: x = np.array([[1.1, 0.9, 1e-6]]*3)

In [28]: print x
[[  1.10000000e+00   9.00000000e-01   1.00000000e-06]
 [  1.10000000e+00   9.00000000e-01   1.00000000e-06]
 [  1.10000000e+00   9.00000000e-01   1.00000000e-06]]

In [29]: print np.array_str(x, precision=2)
[[  1.10e+00   9.00e-01   1.00e-06]
 [  1.10e+00   9.00e-01   1.00e-06]
 [  1.10e+00   9.00e-01   1.00e-06]]

In [30]: print np.array_str(x, precision=2, suppress_small=True)
[[ 1.1  0.9  0. ]
 [ 1.1  0.9  0. ]
 [ 1.1  0.9  0. ]]

Answer 2

Unutbu gave a really complete answer (they got a +1 from me too), but here is a lo-tech alternative:

>>> x=np.random.randn(5)
>>> x
array([ 0.25276524,  2.28334499, -1.88221637,  0.69949927,  1.0285625 ])
>>> ['{:.2f}'.format(i) for i in x]
['0.25', '2.28', '-1.88', '0.70', '1.03']

As a function (using the format() syntax for formatting):

def ndprint(a, format_string='{0:.2f}'):
    # the value v is passed as {0} and its index i as {1}
    print([format_string.format(v, i) for i, v in enumerate(a)])

Usage:

>>> ndprint(x)
['0.25', '2.28', '-1.88', '0.70', '1.03']

>>> ndprint(x, '{:10.4e}')
['2.5277e-01', '2.2833e+00', '-1.8822e+00', '6.9950e-01', '1.0286e+00']

>>> ndprint(x, '{:.8g}')
['0.25276524', '2.283345', '-1.8822164', '0.69949927', '1.0285625']

The index of the array is accessible in the format string:

>>> ndprint(x, 'Element[{1:d}]={0:.2f}')
['Element[0]=0.25', 'Element[1]=2.28', 'Element[2]=-1.88', 'Element[3]=0.70', 'Element[4]=1.03']

Answer 3

FYI, NumPy 1.15 and later include a context manager for setting print options locally. This means that the following works the same as the corresponding example in the accepted answer (by unutbu and Neil G) without having to write your own context manager. E.g., using their example:

x = np.random.random(10)
with np.printoptions(precision=3, suppress=True):
    print(x)
    # [ 0.073  0.461  0.689  0.754  0.624  0.901  0.049  0.582  0.557  0.348]

Answer 4

The gem that makes it all too easy to obtain the result as a string (in today’s numpy versions) is hidden in denis’s answer: np.array2string

>>> import numpy as np
>>> x=np.random.random(10)
>>> np.array2string(x, formatter={'float_kind':'{0:.3f}'.format})
'[0.599 0.847 0.513 0.155 0.844 0.753 0.920 0.797 0.427 0.420]'

Answer 5

Years later, another one is below. But for everyday use I just set:

np.set_printoptions( threshold=20, edgeitems=10, linewidth=140,
    formatter = dict( float = lambda x: "%.3g" % x ))  # float arrays %.3g

''' printf( "... %.3g ... %.1f  ...", arg, arg ... ) for numpy arrays too

Example:
    printf( """ x: %.3g   A: %.1f   s: %s   B: %s """,
                   x,        A,        "str",  B )

If `x` and `A` are numbers, this is like `"format" % (x, A, "str", B)` in python.
If they're numpy arrays, each element is printed in its own format:
    `x`: e.g. [ 1.23 1.23e-6 ... ]  3 digits
    `A`: [ [ 1 digit after the decimal point ... ] ... ]
with the current `np.set_printoptions()`. For example, with
    np.set_printoptions( threshold=100, edgeitems=3, suppress=True )
only the edges of big `x` and `A` are printed.
`B` is printed as `str(B)`, for any `B` -- a number, a list, a numpy object ...

`printf()` tries to handle too few or too many arguments sensibly,
but this is iffy and subject to change.

How it works:
numpy has a function `np.array2string( A, "%.3g" )` (simplifying a bit).
`printf()` splits the format string, and for format / arg pairs
    format: % d e f g
    arg: try `np.asanyarray()`
-->  %s  np.array2string( arg, format )
Other formats and non-ndarray args are left alone, formatted as usual.

Notes:

`printf( ... end= file= )` are passed on to the python `print()` function.

Only formats `% [optional width . precision] d e f g` are implemented,
not `%(varname)format` .

%d truncates floats, e.g. 0.9 and -0.9 to 0; %.0f rounds, 0.9 to 1 .
%g is the same as %.6g, 6 digits.
%% is a single "%" character.

The function `sprintf()` returns a long string. For example,
    title = sprintf( "%s  m %g  n %g  X %.3g",
                    __file__, m, n, X )
    print( title )
    ...
    pl.title( title )

Module globals:
_fmt = "%.3g"  # default for extra args
_squeeze = np.squeeze  # (n,1) (1,n) -> (n,) print in 1 line not n

See also:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html
http://docs.python.org/2.7/library/stdtypes.html#string-formatting

'''
# http://stackoverflow.com/questions/2891790/pretty-printing-of-numpy-array


#...............................................................................
from __future__ import division, print_function
import re
import numpy as np

__version__ = "2014-02-03 feb denis"

_splitformat = re.compile( r'''(
    %
    (?<! %% )  # not %%
    -? [ \d . ]*  # optional width.precision
    \w
    )''', re.X )
    # ... %3.0f  ... %g  ... %-10s ...
    # -> ['...' '%3.0f' '...' '%g' '...' '%-10s' '...']
    # odd len, first or last may be ""

_fmt = "%.3g"  # default for extra args
_squeeze = np.squeeze  # (n,1) (1,n) -> (n,) print in 1 line not n

#...............................................................................
def printf( format, *args, **kwargs ):
    print( sprintf( format, *args ), **kwargs )  # end= file=

printf.__doc__ = __doc__


def sprintf( format, *args ):
    """ sprintf( "text %.3g text %4.1f ... %s ... ", numpy arrays or ... )
        %[defg] array -> np.array2string( formatter= )
    """
    args = list(args)
    if not isinstance( format, basestring ):
        args = [format] + args
        format = ""

    tf = _splitformat.split( format )  # [ text %e text %f ... ]
    nfmt = len(tf) // 2
    nargs = len(args)
    if nargs < nfmt:
        args += (nfmt - nargs) * ["?arg?"]
    elif nargs > nfmt:
        tf += (nargs - nfmt) * [_fmt, " "]  # default _fmt

    for j, arg in enumerate( args ):
        fmt = tf[ 2*j + 1 ]
        if arg is None \
        or isinstance( arg, basestring ) \
        or (hasattr( arg, "__iter__" ) and len(arg) == 0):
            tf[ 2*j + 1 ] = "%s"  # %f -> %s, not error
            continue
        args[j], isarray = _tonumpyarray(arg)
        if isarray  and fmt[-1] in "defgEFG":
            tf[ 2*j + 1 ] = "%s"
            fmtfunc = (lambda x: fmt % x)
            formatter = dict( float_kind=fmtfunc, int=fmtfunc )
            args[j] = np.array2string( args[j], formatter=formatter )
    try:
        return "".join(tf) % tuple(args)
    except TypeError:  # shouldn't happen
        print( "error: tf %s  types %s" % (tf, map( type, args )))
        raise


def _tonumpyarray( a ):
    """ a, isarray = _tonumpyarray( a )
        ->  scalar, False
            np.asanyarray(a), float or int
            a, False
    """
    a = getattr( a, "value", a )  # cvxpy
    if np.isscalar(a):
        return a, False
    if hasattr( a, "__iter__" )  and len(a) == 0:
        return a, False
    try:
        # map .value ?
        a = np.asanyarray( a )
    except ValueError:
        return a, False
    if hasattr( a, "dtype" )  and a.dtype.kind in "fi":  # complex ?
        if callable( _squeeze ):
            a = _squeeze( a )  # np.squeeze
        return a, True
    else:
        return a, False


#...............................................................................
if __name__ == "__main__":
    import sys

    n = 5
    seed = 0
        # run this.py n= ...  in sh or ipython
    for arg in sys.argv[1:]:
        exec( arg )
    np.set_printoptions( 1, threshold=4, edgeitems=2, linewidth=80, suppress=True )
    np.random.seed(seed)

    A = np.random.exponential( size=(n,n) ) ** 10
    x = A[0]

    printf( "x: %.3g  \nA: %.1f  \ns: %s  \nB: %s ",
                x,         A,         "str",   A )
    printf( "x %%d: %d", x )
    printf( "x %%.0f: %.0f", x )
    printf( "x %%.1e: %.1e", x )
    printf( "x %%g: %g", x )
    printf( "x %%s uses np printoptions: %s", x )

    printf( "x with default _fmt: ", x )
    printf( "no args" )
    printf( "too few args: %g %g", x )
    printf( x )
    printf( x, x )
    printf( None )
    printf( "[]:", [] )
    printf( "[3]:", [3] )
    printf( np.array( [] ))
    printf( [[]] )  # squeeze

Answer 6

And here is what I use, and it’s pretty uncomplicated:

print(np.vectorize("%.2f".__mod__)(sparse))

Answer 7

I was surprised not to see the around method mentioned; it means no messing with print options.

import numpy as np

x = np.random.random([5,5])
print(np.around(x,decimals=3))

Output:
[[0.475 0.239 0.183 0.991 0.171]
 [0.231 0.188 0.235 0.335 0.049]
 [0.87  0.212 0.219 0.9   0.3  ]
 [0.628 0.791 0.409 0.5   0.319]
 [0.614 0.84  0.812 0.4   0.307]]

Answer 8

I often want different columns to have different formats. Here is how I print a simple 2D array using some variety in the formatting by converting (slices of) my NumPy array to a tuple:

import numpy as np
dat = np.random.random((10,11))*100  # Array of random values between 0 and 100
print(dat)                           # Lines get truncated and are hard to read
for i in range(10):
    print((4*"%6.2f"+7*"%9.4f") % tuple(dat[i,:]))

Answer 9

numpy.char.mod may also be useful, depending on the details of your application, e.g. numpy.char.mod('Value=%4.2f', numpy.arange(5, 10, 0.1)) will return a string array with elements “Value=5.00”, “Value=5.10”, etc. (as a somewhat contrived example).

Answer 10

NumPy arrays have a round(decimals) method that returns a new NumPy array with the elements rounded accordingly.

import numpy as np

x = np.random.random([5,5])
print(x.round(3))

Answer 11

I find that the usual float format {:9.5f} works properly — suppressing small-value e-notations — when displaying a list or an array using a loop. But that format sometimes fails to suppress its e-notation when a formatter has several items in a single print statement. For example:

import numpy as np
np.set_printoptions(suppress=True)
a3 = 4E-3
a4 = 4E-4
a5 = 4E-5
a6 = 4E-6
a7 = 4E-7
a8 = 4E-8
#--first, display separate numbers-----------
print('Case 3:  a3, a4, a5:             {:9.5f}{:9.5f}{:9.5f}'.format(a3,a4,a5))
print('Case 4:  a3, a4, a5, a6:         {:9.5f}{:9.5f}{:9.5f}{:9.5}'.format(a3,a4,a5,a6))
print('Case 5:  a3, a4, a5, a6, a7:     {:9.5f}{:9.5f}{:9.5f}{:9.5}{:9.5f}'.format(a3,a4,a5,a6,a7))
print('Case 6:  a3, a4, a5, a6, a7, a8: {:9.5f}{:9.5f}{:9.5f}{:9.5f}{:9.5}{:9.5f}'.format(a3,a4,a5,a6,a7,a8))
#---second, display a list using a loop----------
myList = [a3,a4,a5,a6,a7,a8]
print('List 6:  a3, a4, a5, a6, a7, a8: ', end='')
for x in myList: 
    print('{:9.5f}'.format(x), end='')
print()
#---third, display a numpy array using a loop------------
myArray = np.array(myList)
print('Array 6: a3, a4, a5, a6, a7, a8: ', end='')
for x in myArray:
    print('{:9.5f}'.format(x), end='')
print()

My results show the bug in cases 4, 5, and 6:

Case 3:  a3, a4, a5:               0.00400  0.00040  0.00004
Case 4:  a3, a4, a5, a6:           0.00400  0.00040  0.00004    4e-06
Case 5:  a3, a4, a5, a6, a7:       0.00400  0.00040  0.00004    4e-06  0.00000
Case 6:  a3, a4, a5, a6, a7, a8:   0.00400  0.00040  0.00004  0.00000    4e-07  0.00000
List 6:  a3, a4, a5, a6, a7, a8:   0.00400  0.00040  0.00004  0.00000  0.00000  0.00000
Array 6: a3, a4, a5, a6, a7, a8:   0.00400  0.00040  0.00004  0.00000  0.00000  0.00000

I have no explanation for this, and therefore I always use a loop for floating output of multiple values.
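
A likely explanation, judging only from the code as posted: the fields that come out in e-notation are written as {:9.5}, with no trailing f. Without a type code, Python formats a float in a general style with 5 significant digits, and that style switches to exponent notation for small values. The loop versions always use {:9.5f}, so they never hit this. A minimal sketch of the difference:

```python
# Compare the same value under the two format specs that appear in the answer.
value = 4e-6

fixed = "{:9.5f}".format(value)    # fixed-point: 5 digits after the decimal point
general = "{:9.5}".format(value)   # no type code: general format, 5 significant digits

print(repr(fixed))
print(repr(general))
```

If that reading is right, adding the missing f to those fields should make the single-statement cases print the same way as the loops.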

Answer 12

I use

def np_print(array, fmt="10.5f"):
    # Build one "{:10.5f}" field per element, then print the formatted line.
    print((array.size * ("{:" + fmt + "}")).format(*array))

It’s not difficult to modify it for multi-dimensional arrays.

Answer 13

Yet another option is to use the decimal module:

import numpy as np
from decimal import *

arr = np.array([  56.83,  385.3 ,    6.65,  126.63,   85.76,  192.72,  112.81, 10.55])
arr2 = [str(Decimal(i).quantize(Decimal('.01'))) for i in arr]

# ['56.83', '385.30', '6.65', '126.63', '85.76', '192.72', '112.81', '10.55']

Case-insensitive regular expression without re.compile?

July 27, 2021 Python实用宝典

Question: Case-insensitive regular expression without re.compile?

In Python, I can compile a regular expression to be case-insensitive using re.compile:

>>> s = 'TeSt'
>>> casesensitive = re.compile('test')
>>> ignorecase = re.compile('test', re.IGNORECASE)
>>> 
>>> print casesensitive.match(s)
None
>>> print ignorecase.match(s)
<_sre.SRE_Match object at 0x02F0B608>

Is there a way to do the same, but without using re.compile? I can’t find anything like Perl’s i suffix (e.g. m/test/i) in the documentation.

Answer 0

Pass re.IGNORECASE to the flags param of search, match, or sub:

re.search('test', 'TeSt', re.IGNORECASE)
re.match('test', 'TeSt', re.IGNORECASE)
re.sub('test', 'xxxx', 'Testing', flags=re.IGNORECASE)
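
A small runnable sketch of these calls, using a made-up sample string. One detail worth knowing: in re.sub the fourth positional parameter is count, so the flag is safest passed by keyword as flags=.

```python
import re

text = "Testing TEST test"  # sample input, chosen for illustration

# search and match accept the flags as the third positional argument.
found = re.search("test", text, re.IGNORECASE)

# For sub, pass flags by keyword; the fourth positional argument is count.
replaced = re.sub("test", "xxxx", text, flags=re.IGNORECASE)

print(found.group())  # the first match, in its original casing
print(replaced)
```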

Answer 1

You can also perform case insensitive searches using search/match without the IGNORECASE flag (tested in Python 2.7.3):

re.search(r'(?i)test', 'TeSt').group()    ## returns 'TeSt'
re.match(r'(?i)test', 'TeSt').group()     ## returns 'TeSt'

Answer 2

The case-insensitive marker, (?i) can be incorporated directly into the regex pattern:

>>> import re
>>> s = 'This is one Test, another TEST, and another test.'
>>> re.findall('(?i)test', s)
['Test', 'TEST', 'test']
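
As a complement, the inline flag can be limited to part of a pattern with the scoped form (?i:...), available since Python 3.6. A global (?i) should sit at the very start of the pattern; from Python 3.11 on, placing it elsewhere raises an error. A sketch with an invented sample string:

```python
import re

s = "Test-ID test-id TEST-ID"  # sample input for illustration

# Global inline flag: the whole pattern ignores case.
everything = re.findall(r"(?i)test-id", s)

# Scoped inline flag: only "test" ignores case; "-ID" must match exactly.
partial = re.findall(r"(?i:test)-ID", s)

print(everything)
print(partial)
```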

Answer 3

You can also make the pattern case-insensitive when compiling it:

pattern = re.compile('FIle:/+(.*)', re.IGNORECASE)

Answer 4

In imports

import re

In run time processing:

RE_TEST = r'test'
if re.match(RE_TEST, 'TeSt', re.IGNORECASE):

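It should be mentioned that skipping re.compile adds some overhead. Every time the module-level match function above is called, the pattern has to be looked up again; the re module keeps a cache of recently compiled patterns, so it is not recompiled from scratch on each call, but a precompiled pattern avoids the lookup altogether. Recompiling on every call is also poor practice in other programming languages. The below is the better practice when a pattern is reused.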

In app initialization:

self.RE_TEST = re.compile('test', re.IGNORECASE)

In run time processing:

if self.RE_TEST.match('TeSt'):
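
The two fragments above can be put together as follows. This is only a sketch: the class name Matcher and the method is_test are invented for illustration.

```python
import re


class Matcher:
    """Holds a pattern that is compiled once and then reused."""

    def __init__(self):
        # Compile once, at initialization time.
        self.RE_TEST = re.compile("test", re.IGNORECASE)

    def is_test(self, text):
        # Reuse the compiled pattern on every call.
        return self.RE_TEST.match(text) is not None


matcher = Matcher()
print(matcher.is_test("TeSt"))
print(matcher.is_test("other"))
```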

Answer 5

# 're.IGNORECASE' gives case-insensitive results; its short form is re.I
# 're.match' matches only at the start of the string
# 're.search' returns the first match found anywhere in the string
# 're.compile' creates a regex object that can be reused for multiple matches

 >>> s = r'TeSt'   
 >>> print (re.match(s, r'test123', re.I))
 <_sre.SRE_Match object; span=(0, 4), match='test'>
 # OR
 >>> pattern = re.compile(s, re.I)
 >>> print(pattern.match(r'test123'))
 <_sre.SRE_Match object; span=(0, 4), match='test'>

Answer 6

To perform case-insensitive operations, supply re.IGNORECASE

>>> import re
>>> test = 'UPPER TEXT, lower text, Mixed Text'
>>> re.findall('text', test, flags=re.IGNORECASE)
['TEXT', 'text', 'Text']

and if we want to replace text matching the case…

>>> def matchcase(word):
        def replace(m):
            text = m.group()
            if text.isupper():
                return word.upper()
            elif text.islower():
                return word.lower()
            elif text[0].isupper():
                return word.capitalize()
            else:
                return word
        return replace

>>> re.sub('text', matchcase('word'), test, flags=re.IGNORECASE)
'UPPER WORD, lower word, Mixed Word'

Answer 7

If you would like to replace text while keeping the original casing of each match, that is possible.

For example: highlight the string “test asdasd TEST asd tEst asdasd”.

sentence = "test asdasd TEST asd tEst asdasd"
result = re.sub(
  '(test)', 
  r'<b>\1</b>',  # \1 here indicates first matching group.
  sentence, 
  flags=re.IGNORECASE)

Result: <b>test</b> asdasd <b>TEST</b> asd <b>tEst</b> asdasd

Answer 8

For a case-insensitive regular expression (regex), there are two ways to add it in your code:

flags=re.IGNORECASE

Regx3GList = re.search(r"(WCDMA:)((\d*)(,?))*", txt, re.IGNORECASE)

The case-insensitive marker (?i)

Regx3GList = re.search(r"(?i)(WCDMA:)((\d*)(,?))*", txt)

Install pip packages to the $HOME folder

July 27, 2021 Python实用宝典

Question: Install pip packages to the $HOME folder

Is it possible, when installing with pip, to install the Python packages inside my $HOME folder? (For example, I want to install mercurial using pip, but inside $HOME instead of /usr/local.)

I’m on a Mac and just thought about this possibility: instead of “polluting” my /usr/local, I would use my $HOME instead.

PEP 370 is exactly about this. Is just creating a ~/.local and doing a pip install package enough to have these packages installed only in my $HOME folder?

Answer 0

While you can use a virtualenv, you don’t need to. The trick is passing the PEP370 --user argument to the setup.py script. With the latest version of pip, one way to do it is:

pip install --user mercurial

This should result in the hg script being installed in $HOME/.local/bin/hg and the rest of the hg package in $HOME/.local/lib/pythonx.y/site-packages/.

Note, that the above is true for Python 2.6. There has been a bit of controversy among the Python core developers about what is the appropriate directory location on Mac OS X for PEP370-style user installations. In Python 2.7 and 3.2, the location on Mac OS X was changed from $HOME/.local to $HOME/Library/Python. This might change in a future release. But, for now, on 2.7 (and 3.2, if hg were supported on Python 3), the above locations will be $HOME/Library/Python/x.y/bin/hg and $HOME/Library/Python/x.y/lib/python/site-packages.
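
Because the per-user location differs between platforms and Python versions, it can be easier to ask the interpreter itself. The standard-library site module reports where PEP 370 user installs go for the running interpreter; a short sketch:

```python
import site

# Base directory for per-user installs (for example ~/.local on many Linux systems).
user_base = site.getuserbase()

# The site-packages directory under that base, used by `pip install --user`.
user_site = site.getusersitepackages()

print("user base:", user_base)
print("user site-packages:", user_site)
# False or None here means the user site directory is disabled for this interpreter.
print("enabled:", site.ENABLE_USER_SITE)
```

The same information is printed by running python -m site.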

Answer 1

I would use virtualenv at your HOME directory.

$ sudo easy_install -U virtualenv
$ cd ~
$ virtualenv .
$ bin/pip ...

You could then also alter ~/.(login|profile|bash_profile), whichever is right for your shell to add ~/bin to your PATH and then that pip|python|easy_install would be the one used by default.

Answer 2

You can specify the -t option (--target) to specify the destination directory. See pip install --help for detailed information. This is the command you need:

pip install -t path_to_your_home package-name

for example, for installing say mxnet, in my $HOME directory, I type:

pip install -t /home/foivos/ mxnet
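
One caveat, offered as a hedge rather than part of the answer: a directory used with -t is not necessarily on Python's import path, so packages installed there may not be importable until the directory is added (for example through the PYTHONPATH environment variable or sys.path). A minimal sketch, where the pylibs directory name is purely hypothetical:

```python
import os
import sys

# Hypothetical directory that was passed to `pip install -t`.
target_dir = os.path.join(os.path.expanduser("~"), "pylibs")

# Make packages in that directory importable for this process.
if target_dir not in sys.path:
    sys.path.insert(0, target_dir)

print(target_dir in sys.path)
```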

Why is python setup.py saying invalid command 'bdist_wheel' on Travis CI?

July 27, 2021 Python实用宝典

Question: Why is python setup.py saying invalid command 'bdist_wheel' on Travis CI?

My Python package has a setup.py which builds fine locally on Ubuntu Trusty and on a fresh Vagrant Ubuntu Trusty VM when I provision it like this:

sudo apt-get install python python-dev --force-yes --assume-yes --fix-broken
curl --silent --show-error --retry 5 https://bootstrap.pypa.io/get-pip.py | sudo python2.7
sudo -H pip install setuptools wheel virtualenv --upgrade

But when I do the same on a Travis CI Trusty Beta VM:

- sudo apt-get install python python-dev --force-yes --assume-yes --fix-broken
- curl --silent --show-error --retry 5 https://bootstrap.pypa.io/get-pip.py | sudo python2.7
- sudo -H pip install setuptools wheel virtualenv --upgrade

I get:

python2.7 setup.py bdist_wheel
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help
error: invalid command 'bdist_wheel'

The question “Why can I not create a wheel in python?” is related, but note that I am installing wheel and upgrading setuptools.

Answer 0

Had to install the wheel package. Everything was up to date but still giving the error.

pip install wheel

then

python setup.py bdist_wheel

Worked without issues.

Answer 1

pip install wheel

worked for me, but you can also add this

setup(
    ...
    setup_requires=['wheel']
)

to setup.py and save yourself a pip install command

Answer 2

Jan 2020

2 hours wasted.

On a AWS Ubuntu 18.04 new machine, below installations are required:

sudo apt-get install gcc libpq-dev -y
sudo apt-get install python-dev  python-pip -y
sudo apt-get install python3-dev python3-pip python3-venv python3-wheel -y
pip3 install wheel

The last line in particular is a must.
The three lines before it might also be required as prerequisites.

Hope that helps.

Answer 3

This problem is due to:

an old version of pip (6.1.1) being installed for Python 2.7
multiple copies of Python 2.7 installed on the Trusty Beta image
a different location for Python 2.7 being used for sudo

It’s all a bit complicated and better explained here https://github.com/travis-ci/travis-ci/issues/4989.

My solution was to install with user travis instead of sudo:

- pip2.7 install --upgrade --user travis pip setuptools wheel virtualenv
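
Since the root cause here was several interpreters seeing different packages, a quick diagnostic is to ask the interpreter that runs setup.py what it can import. The sketch below needs Python 3, and the helper name check_build_tools is invented for illustration:

```python
import importlib.util
import sys


def check_build_tools(names=("setuptools", "wheel")):
    """Report which of the named packages this interpreter can import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Shows which executable is running, then which build packages it can see.
print("interpreter:", sys.executable)
print(check_build_tools())
```

Running it with the same executable that runs the build shows whether wheel is visible to that particular interpreter.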

Answer 4

If you already have all the required modules installed, you probably need to import the setuptools module in your setup.py file. So just add the following line at the top of your setup.py file.

import setuptools
from distutils.core import setup
# other imports and setups

This is also mentioned in wheel’s documentation. https://wheel.readthedocs.io/en/stable/#usage

Answer 5

This error is weird, as the many proposed answers offer mixed solutions. I tried them and added them one by one. It was only when I added pip install --upgrade pip that the error finally went away for me. I had no time to isolate which step was responsible, so this is just FYI.

Answer 6

In my case, the versions of wheel/pip/setuptools created by venv were too old. This works:

venv/bin/pip  install --upgrade pip wheel setuptools

Answer 7

I already had wheel installed so I tried to uninstall and reinstall, and it fixed the issue:

pip uninstall wheel
pip install wheel

Weird…

Answer 8

My fix was apt install python3-dev

Answer 9

In your setup.py, if you have:

from distutils.core import setup

Then, change it to

from setuptools import setup

Then re-create your virtualenv and re-run the command, and it should work.

Answer 10

Try modifying the setup.py file by importing setup from setuptools instead of distutils.core

Answer 11

I did apt-get install python3-dev in my Ubuntu and added setup_requires=["wheel"] in setup.py

Answer 12

It helped me to follow instructions in here:

https://packaging.python.org/guides/installing-using-linux-tools/

Debian/Ubuntu

Python 2:

sudo apt install python-pip

Python 3:

sudo apt install python3-venv python3-pip

Answer 13

On Ubuntu 18.04 this problem can be resolved by installing the python3-wheel package.

Usually this is installed as a dependency on any Python package. But especially when building container images you often work with --no-install-recommends and therefore it is often missing and has to be installed manually first.

Answer 14

Not related to Travis CI but I ran into similar problem trying to install jupyter on my Mac OSX 10.8.5, and none of the other answers was of help. The problem was caused by building the “wheel” for the package called pyzmq, with error messages filling hundreds of pages.

The solution I found was to directly install an older version of that package:

python -m pip install pyzmq==17 --user

After that, the installation of jupyter succeeded without errors.

Answer 15

If you’re using a setup.cfg file, add this before the install_requires part:

setup_requires =
    wheel

Example of setup.cfg project :

# setup.py
from setuptools import setup

setup()

# setup.cfg
[metadata]
name = name
version = 0.0.1
description = desc
long_description = file: README.md
long_description_content_type = text/markdown
url = url
author = author
classifiers =
    Programming Language :: Python
    Programming Language :: Python :: 3

[options]
include_package_data = true
packages = find:
setup_requires =
    wheel
install_requires =
    packages
    packages
    packages

How do I get a list of methods in a Python class?

July 27, 2021 Python实用宝典

Question: How do I get a list of methods in a Python class?

I want to iterate through the methods in a class, or handle class or instance objects differently based on the methods present. How do I get a list of class methods?

Answer 0

An example (listing the methods of the optparse.OptionParser class):

>>> from optparse import OptionParser
>>> import inspect
#python2
>>> inspect.getmembers(OptionParser, predicate=inspect.ismethod)
[('__init__', <unbound method OptionParser.__init__>),
...
 ('add_option', <unbound method OptionParser.add_option>),
 ('add_option_group', <unbound method OptionParser.add_option_group>),
 ('add_options', <unbound method OptionParser.add_options>),
 ('check_values', <unbound method OptionParser.check_values>),
 ('destroy', <unbound method OptionParser.destroy>),
 ('disable_interspersed_args',
  <unbound method OptionParser.disable_interspersed_args>),
 ('enable_interspersed_args',
  <unbound method OptionParser.enable_interspersed_args>),
 ('error', <unbound method OptionParser.error>),
 ('exit', <unbound method OptionParser.exit>),
 ('expand_prog_name', <unbound method OptionParser.expand_prog_name>),
 ...
 ]
# python3
>>> inspect.getmembers(OptionParser, predicate=inspect.isfunction)
...

Notice that getmembers returns a list of 2-tuples. The first item is the name of the member, the second item is the value.

You can also pass an instance to getmembers:

>>> parser = OptionParser()
>>> inspect.getmembers(parser, predicate=inspect.ismethod)
...
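
A self-contained Python 3 sketch of the same idea, using a small made-up class (Greeter) instead of OptionParser: functions are found on the class with inspect.isfunction, and bound methods on an instance with inspect.ismethod.

```python
import inspect


class Greeter:  # hypothetical class, for illustration only
    def hello(self):
        return "hello"

    def bye(self):
        return "bye"


# On the class itself, Python 3 methods are plain functions.
class_level = [name for name, _ in inspect.getmembers(Greeter, predicate=inspect.isfunction)]

# On an instance, the same attributes show up as bound methods.
instance_level = [name for name, _ in inspect.getmembers(Greeter(), predicate=inspect.ismethod)]

print(class_level)
print(instance_level)
```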

Answer 1

There is the dir(theobject) function, which lists all the fields and methods of your object (as a list of names), and the inspect module (as codeape wrote), which lists the fields and methods together with their docs (in """).

Because everything (even fields) might be called in Python, I’m not sure there is a built-in function to list only methods. You might want to try if the object you get through dir is callable or not.

Answer 2

Python 3.x answer without external libraries

method_list = [func for func in dir(Foo) if callable(getattr(Foo, func))]

dunder-excluded result:

method_list = [func for func in dir(Foo) if callable(getattr(Foo, func)) and not func.startswith("__")]
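
To make the two comprehensions concrete, here is a runnable sketch with an invented class Foo. Note that only double-underscore names are filtered out, so a single-underscore helper still appears.

```python
class Foo:  # hypothetical class, for illustration only
    x = 1  # plain data attribute, not callable

    def bar(self):
        pass

    def baz(self):
        pass

    def _private(self):
        pass


# Every callable attribute, including inherited dunder methods.
all_callables = [name for name in dir(Foo) if callable(getattr(Foo, name))]

# The same list with dunder names filtered out.
public = [
    name
    for name in dir(Foo)
    if callable(getattr(Foo, name)) and not name.startswith("__")
]

print(public)
```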

Answer 3

Say you want to know all the methods associated with the list class. Just type the following:

 print (dir(list))

The above gives you all attributes of the list class, including its methods.

Answer 4

Try the __dict__ attribute.

Answer 5

You can also import FunctionType from types and test against it using the class's __dict__:

from types import FunctionType

class Foo:
    def bar(self): pass
    def baz(self): pass

def methods(cls):
    return [x for x, y in cls.__dict__.items() if type(y) == FunctionType]

methods(Foo)  # ['bar', 'baz']

Answer 6

Note that you need to consider whether you want methods from base classes which are inherited (but not overridden) included in the result. The dir() and inspect.getmembers() operations do include base class methods, but use of the __dict__ attribute does not.
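
The difference can be seen in a short sketch with two invented classes, Base and Child:

```python
class Base:  # hypothetical classes, for illustration only
    def inherited(self):
        pass


class Child(Base):
    def own(self):
        pass


# dir() walks the inheritance chain, so the inherited method is included.
via_dir = [name for name in dir(Child) if not name.startswith("__")]

# vars(Child) is Child.__dict__: only names defined on Child itself.
via_dict = [name for name in vars(Child) if not name.startswith("__")]

print(via_dir)
print(via_dict)
```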

回答 7

这也适用：

mymodule.py

def foo(x):
    return 'foo'
def bar():
    return 'bar'

在另一个文件中

import inspect
import mymodule
method_list = [ func[0] for func in inspect.getmembers(mymodule, predicate=inspect.isroutine) if callable(getattr(mymodule, func[0])) ]

输出：

['bar', 'foo']

从python文档中：

inspect.isroutine（对象）

Return true if the object is a user-defined or built-in function or method.

This also works:

mymodule.py

def foo(x):
    return 'foo'
def bar():
    return 'bar'

In another file

import inspect
import mymodule
method_list = [ func[0] for func in inspect.getmembers(mymodule, predicate=inspect.isroutine) if callable(getattr(mymodule, func[0])) ]

output:

['bar', 'foo']

From the python docs:

inspect.isroutine(object)

Return true if the object is a user-defined or built-in function or method.

回答 8

def find_defining_class(obj, meth_name):
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty

所以

print(find_defining_class(car, 'speedometer'))

Think Python页面210

def find_defining_class(obj, meth_name):
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty

print(find_defining_class(car, 'speedometer'))

Think Python page 210
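
A runnable Python 3 sketch of the lookup above. The Vehicle and Car classes are hypothetical stand-ins (the snippet above does not define car), and the explicit return None for a missing name is an addition.

```python
def find_defining_class(obj, meth_name):
    """Walk the MRO and return the first class whose own __dict__ has meth_name."""
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty
    return None  # name not found anywhere in the hierarchy


class Vehicle:
    def speedometer(self):
        return 0


class Car(Vehicle):
    def honk(self):
        return "beep"


car = Car()
print(find_defining_class(car, "speedometer"))  # defined on the base class
print(find_defining_class(car, "honk"))         # defined on Car itself
```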

回答 9

有这种方法：

[getattr(obj, m) for m in dir(obj) if not m.startswith('__')]

当处理一个类实例时，最好返回一个带有方法引用的列表，而不仅仅是名称¹。如果那是您的目标，以及

不使用 import
__init__从列表中排除私有方法（例如）

它可能有用。简而言之，对于像

class Ghost:
    def boo(self, who):
        return f'Who you gonna call? {who}'

我们可以检查实例检索

>>> g = Ghost()
>>> methods = [getattr(g, m) for m in dir(g) if not m.startswith('__')]
>>> print(methods)
[<bound method Ghost.boo of <__main__.Ghost object at ...>>]

因此，您可以立即调用它：

>>> for method in methods:
...     print(method('GHOSTBUSTERS'))
...
Who you gonna call? GHOSTBUSTERS

¹一个用例：

我用它来进行单元测试。有一堂课，其中所有方法都执行同一过程的变体-这导致了冗长的测试，每种方法之间只有一个小小的调整。DRY是一个遥不可及的梦想。

认为我应该对所有方法进行一次测试，因此我进行了上述迭代。

尽管我意识到我反而应该将代码本身重构为符合DRY规范……这在将来仍然可能会成为一个挑剔的灵魂。

There’s this approach:

[getattr(obj, m) for m in dir(obj) if not m.startswith('__')]

When dealing with a class instance, it may be better to return a list of method references instead of just names¹. This approach may be of use if that is your goal and you also want to:

Use no import
Exclude dunder methods (e.g. __init__) from the list

In a nutshell, for a class like

class Ghost:
    def boo(self, who):
        return f'Who you gonna call? {who}'

We could check instance retrieval with

>>> g = Ghost()
>>> methods = [getattr(g, m) for m in dir(g) if not m.startswith('__')]
>>> print(methods)
[<bound method Ghost.boo of <__main__.Ghost object at ...>>]

So you can call it right away:

>>> for method in methods:
...     print(method('GHOSTBUSTERS'))
...
Who you gonna call? GHOSTBUSTERS

¹ A use case:

I used this for unit testing. Had a class where all methods performed variations of the same process – which led to lengthy tests, each only a tweak away from the others. DRY was a far away dream.

Thought I should have a single test for all methods, so I made the above iteration.

Although I realized I should instead refactor the code itself to be DRY-compliant anyway… this may still serve a random nitpicky soul in the future.

回答 10

我将其保留在那里，因为评分最高的答案尚不清楚。

这是一个简单的测试，不是基于Enum的常规类。

# -*- coding: utf-8 -*-
import sys, inspect
from enum import Enum

class my_enum(Enum):
    """Enum base class my_enum"""
    M_ONE = -1
    ZERO = 0
    ONE = 1
    TWO = 2
    THREE = 3

    def is_natural(self):
            return (self.value > 0)
    def is_negative(self):
            return (self.value < 0)

def is_clean_name(name):
    return not name.startswith('_') and not name.endswith('_')
def clean_names(lst):
    return [ n for n in lst if is_clean_name(n) ]
def get_items(cls,lst):
    try:
            res = [ getattr(cls,n) for n in lst ]
    except Exception as e:
            res = (Exception, type(e), e)
            pass
    return res


print( sys.version )

dir_res = clean_names( dir(my_enum) )
inspect_res = clean_names( [ x[0] for x in inspect.getmembers(my_enum) ] )
dict_res = clean_names( my_enum.__dict__.keys() )

print( '## names ##' )
print( dir_res )
print( inspect_res )
print( dict_res )

print( '## items ##' )
print( get_items(my_enum,dir_res) )
print( get_items(my_enum,inspect_res) )
print( get_items(my_enum,dict_res) )

这就是输出结果。

3.7.7 (default, Mar 10 2020, 13:18:53) 
[GCC 9.2.1 20200306]
## names ##
['M_ONE', 'ONE', 'THREE', 'TWO', 'ZERO']
['M_ONE', 'ONE', 'THREE', 'TWO', 'ZERO', 'name', 'value']
['is_natural', 'is_negative', 'M_ONE', 'ZERO', 'ONE', 'TWO', 'THREE']
## items ##
[<my_enum.M_ONE: -1>, <my_enum.ONE: 1>, <my_enum.THREE: 3>, <my_enum.TWO: 2>, <my_enum.ZERO: 0>]
(<class 'Exception'>, <class 'AttributeError'>, AttributeError('name'))
[<function my_enum.is_natural at 0xb78a1fa4>, <function my_enum.is_negative at 0xb78ae854>, <my_enum.M_ONE: -1>, <my_enum.ZERO: 0>, <my_enum.ONE: 1>, <my_enum.TWO: 2>, <my_enum.THREE: 3>]

所以我们有：

dir 提供不完整的数据
inspect.getmembers 提供不完整的数据，并提供无法使用的内部密钥 getattr()
__dict__.keys()提供完整可靠的结果

为什么选票如此错误？而我错了？还有哪些人的答案这么低呢？

I’m leaving this here because the top-rated answers are not clear.

This is a simple test with an unusual class based on Enum.

# -*- coding: utf-8 -*-
import sys, inspect
from enum import Enum

class my_enum(Enum):
    """Enum base class my_enum"""
    M_ONE = -1
    ZERO = 0
    ONE = 1
    TWO = 2
    THREE = 3

    def is_natural(self):
            return (self.value > 0)
    def is_negative(self):
            return (self.value < 0)

def is_clean_name(name):
    return not name.startswith('_') and not name.endswith('_')
def clean_names(lst):
    return [ n for n in lst if is_clean_name(n) ]
def get_items(cls,lst):
    try:
            res = [ getattr(cls,n) for n in lst ]
    except Exception as e:
            res = (Exception, type(e), e)
            pass
    return res


print( sys.version )

dir_res = clean_names( dir(my_enum) )
inspect_res = clean_names( [ x[0] for x in inspect.getmembers(my_enum) ] )
dict_res = clean_names( my_enum.__dict__.keys() )

print( '## names ##' )
print( dir_res )
print( inspect_res )
print( dict_res )

print( '## items ##' )
print( get_items(my_enum,dir_res) )
print( get_items(my_enum,inspect_res) )
print( get_items(my_enum,dict_res) )

And these are the output results.

3.7.7 (default, Mar 10 2020, 13:18:53) 
[GCC 9.2.1 20200306]
## names ##
['M_ONE', 'ONE', 'THREE', 'TWO', 'ZERO']
['M_ONE', 'ONE', 'THREE', 'TWO', 'ZERO', 'name', 'value']
['is_natural', 'is_negative', 'M_ONE', 'ZERO', 'ONE', 'TWO', 'THREE']
## items ##
[<my_enum.M_ONE: -1>, <my_enum.ONE: 1>, <my_enum.THREE: 3>, <my_enum.TWO: 2>, <my_enum.ZERO: 0>]
(<class 'Exception'>, <class 'AttributeError'>, AttributeError('name'))
[<function my_enum.is_natural at 0xb78a1fa4>, <function my_enum.is_negative at 0xb78ae854>, <my_enum.M_ONE: -1>, <my_enum.ZERO: 0>, <my_enum.ONE: 1>, <my_enum.TWO: 2>, <my_enum.THREE: 3>]

So what we have:

dir provides incomplete data
inspect.getmembers provides incomplete data and also returns internal keys that are not accessible with getattr()
__dict__.keys() provides a complete and reliable result

Why are the votes so misleading? Where am I wrong? And where are the other people wrong whose answers have so few votes?

回答 11

methods = [(func, getattr(o, func)) for func in dir(o) if callable(getattr(o, func))]

给出与

methods = inspect.getmembers(o, predicate=inspect.ismethod)

做。

methods = [(func, getattr(o, func)) for func in dir(o) if callable(getattr(o, func))]

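gives a similar list to

methods = inspect.getmembers(o, predicate=inspect.ismethod)

but not an identical one in general: callable() also accepts classes, plain functions and built-in methods reachable from the object, while inspect.ismethod only accepts bound methods written in Python.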
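
A quick way to compare the two expressions is to run both on the same instance. The class Foo below is a hypothetical example; on it, the callable() version returns more names than the inspect.ismethod version, because dunder slots such as __init__ are callable without being bound Python methods.

```python
import inspect


class Foo:
    def bar(self):
        return "bar"


o = Foo()

# Names whose attribute is callable at all (includes __class__, __init__, ...).
via_callable = {name for name in dir(o) if callable(getattr(o, name))}

# Names whose attribute is a bound method implemented in Python.
via_ismethod = {name for name, _ in inspect.getmembers(o, predicate=inspect.ismethod)}

print(sorted(via_ismethod))
print(sorted(via_callable - via_ismethod))
```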

回答 12

如果您的方法是“常规”方法，而不是statimethod，classmethod等等。
我想出了一点技巧-

for k, v in your_class.__dict__.items():
    if "function" in str(v):
        print(k)

通过相应地改变if条件中的“功能”，可以将其扩展到其他类型的方法。
在python 2.7上测试。

If your method is a “regular” method and not a staticmethod, classmethod, etc.,
there is a little hack I came up with:

for k, v in your_class.__dict__.items():
    if "function" in str(v):
        print(k)

This can be extended to other types of methods by changing “function” in the if condition correspondingly.
Tested on Python 2.7.
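
As an alternative to matching on the text of str(v), one could test the type of each entry directly. This is a different technique from the hack above, shown only as a sketch for Python 3; the class Demo and the helper kind() are hypothetical names.

```python
import inspect


class Demo:
    def regular(self):
        pass

    @staticmethod
    def static():
        pass

    @classmethod
    def klass(cls):
        pass

    @property
    def prop(self):
        return 1


def kind(value):
    """Classify a raw class __dict__ entry, or return None if it is none of these."""
    if isinstance(value, staticmethod):
        return "staticmethod"
    if isinstance(value, classmethod):
        return "classmethod"
    if isinstance(value, property):
        return "property"
    if inspect.isfunction(value):
        return "function"
    return None


# vars(Demo) is Demo.__dict__, so decorators are still visible as wrapper objects.
kinds = {name: kind(value) for name, value in vars(Demo).items() if kind(value)}
print(kinds)
```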

回答 13

您可以使用以下代码列出python类中的所有方法

dir(className)

这将返回该类中所有方法名称的列表

You can list all the attributes of a Python class, methods included, by using the following code:

dir(className)

This returns a list of all attribute names in the class; filter it with callable() if you only want the methods.

回答 14

我知道这是一篇旧文章，但是只是编写了此函数，如果有人偶然发现答案，它将留在这里：

def classMethods(the_class,class_only=False,instance_only=False,exclude_internal=True):

    def acceptMethod(tup):
        #internal function that analyzes the tuples returned by getmembers tup[1] is the 
        #actual member object
        is_method = inspect.ismethod(tup[1])
        if is_method:
            bound_to = tup[1].im_self
            internal = tup[1].im_func.func_name[:2] == '__' and tup[1].im_func.func_name[-2:] == '__'
            if internal and exclude_internal:
                include = False
            else:
                include = (bound_to == the_class and not instance_only) or (bound_to == None and not class_only)
        else:
            include = False
        return include
    #uses filter to return results according to internal function and arguments
    return filter(acceptMethod,inspect.getmembers(the_class))

I know this is an old post, but I just wrote this function and will leave it here in case someone stumbles across it looking for an answer:

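import inspect

# Note: im_self and im_func are Python 2 attributes; this function does not run unchanged on Python 3.
def classMethods(the_class,class_only=False,instance_only=False,exclude_internal=True):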

    def acceptMethod(tup):
        #internal function that analyzes the tuples returned by getmembers tup[1] is the 
        #actual member object
        is_method = inspect.ismethod(tup[1])
        if is_method:
            bound_to = tup[1].im_self
            internal = tup[1].im_func.func_name[:2] == '__' and tup[1].im_func.func_name[-2:] == '__'
            if internal and exclude_internal:
                include = False
            else:
                include = (bound_to == the_class and not instance_only) or (bound_to == None and not class_only)
        else:
            include = False
        return include
    #uses filter to return results according to internal function and arguments
    return filter(acceptMethod,inspect.getmembers(the_class))

回答 15

这只是一个观察。“编码”似乎是字符串对象的一种方法

str_1 = 'a'
str_1.encode('utf-8')
>>> b'a'

但是，如果检查str1中的方法，则返回一个空列表

inspect.getmembers(str_1, predicate=inspect.ismethod)
>>> []

因此，也许我错了，但是问题似乎并不简单。

This is just an observation. “encode” seems to be a method of string objects:

>>> str_1 = 'a'
>>> str_1.encode('utf-8')
b'a'

However, if str_1 is inspected for methods, an empty list is returned:

>>> import inspect
>>> inspect.getmembers(str_1, predicate=inspect.ismethod)
[]

So, maybe I am wrong, but the issue does not seem to be simple.
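
One likely explanation is that inspect.ismethod only matches bound methods implemented in Python, while the methods of str are built-ins. The sketch below checks that assumption with inspect.isroutine and inspect.isbuiltin, both of which are part of the standard inspect module.

```python
import inspect

text = "a"

# Bound methods written in Python: a str has none.
bound_python_methods = inspect.getmembers(text, predicate=inspect.ismethod)

# isroutine also matches built-in functions and methods.
routines = [name for name, _ in inspect.getmembers(text, predicate=inspect.isroutine)]

print(bound_python_methods)
print("encode" in routines)
print(inspect.isbuiltin(text.encode))
```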

回答 16

class CPerson:
    def __init__(self, age):
        self._age = age

    def run(self):
        pass

    @property
    def age(self): return self._age

    @staticmethod
    def my_static_method(): print("Life is short, you need Python")

    @classmethod
    def say(cls, msg): return msg


test_class = CPerson
# print(dir(test_class))  # list all the fields and methods of your object
print([(name, t) for name, t in test_class.__dict__.items() if type(t).__name__ == 'function' and not name.startswith('__')])
print([(name, t) for name, t in test_class.__dict__.items() if type(t).__name__ != 'function' and not name.startswith('__')])

输出

[('run', <function CPerson.run at 0x0000000002AD3268>)]
[('age', <property object at 0x0000000002368688>), ('my_static_method', <staticmethod object at 0x0000000002ACBD68>), ('say', <classmethod object at 0x0000000002ACF0B8>)]

class CPerson:
    def __init__(self, age):
        self._age = age

    def run(self):
        pass

    @property
    def age(self): return self._age

    @staticmethod
    def my_static_method(): print("Life is short, you need Python")

    @classmethod
    def say(cls, msg): return msg


test_class = CPerson
# print(dir(test_class))  # list all the fields and methods of your object
print([(name, t) for name, t in test_class.__dict__.items() if type(t).__name__ == 'function' and not name.startswith('__')])
print([(name, t) for name, t in test_class.__dict__.items() if type(t).__name__ != 'function' and not name.startswith('__')])

output

[('run', <function CPerson.run at 0x0000000002AD3268>)]
[('age', <property object at 0x0000000002368688>), ('my_static_method', <staticmethod object at 0x0000000002ACBD68>), ('say', <classmethod object at 0x0000000002ACF0B8>)]

回答 17

如果您只想列出python类的方法

import numpy as np
print(np.random.__all__)

If you only want to list the public names a module exports (this relies on the module defining __all__ and does not apply to classes)

import numpy as np
print(np.random.__all__)

Q&A

How do I set the default Python version to 3.x on OS X?

July 27, 2021 Python实用宝典

Question: How do I set the default Python version to 3.x on OS X?

我正在运行Mountain Lion，而基本的默认Python版本是2.7。我下载了Python 3.3，并希望将其设置为默认值。

目前：

$ python
    version 2.7.5
$ python3.3
    version 3.3

如何设置它，以便每次运行$ python时都能打开3.3？

I’m running Mountain Lion and the basic default Python version is 2.7. I downloaded Python 3.3 and want to set it as default.

Currently:

$ python
    version 2.7.5
$ python3.3
    version 3.3

How do I set it so that every time I run $ python it opens 3.3?

回答 0

在系统范围内更改默认python可执行文件的版本可能会破坏某些依赖python2的应用程序。

但是，您可以在大多数外壳程序中为命令加上别名，因为macOS中的默认外壳程序（10.14及以下版本中的bash； 10.15中的zsh）具有相似的语法。您可以在您的中放置别名python =’python3′ ~/.profile，然后~/.profile在您~/.bash_profile和/或您的源代码中~/.zsh_profile输入以下内容：

[ -e ~/.profile ] && . ~/.profile

这样，您的别名将可在所有shell中使用。

这样，python命令现在将被调用python3。如果您想偶尔调用“原始” python（指python2），则可以使用command python，这将使别名保持不变，并且适用于所有shell。

如果您更频繁地启动解释器（我愿意），则总是可以创建更多别名来添加，即：

alias 2='python2'
alias 3='python3'

提示：对于脚本，而不是使用shebang之类的方法：

#!/usr/bin/env python

采用：

#!/usr/bin/env python3

这样，系统将使用python3运行python 可执行文件。

Changing the default python executable’s version system-wide could break some applications that depend on python2.

However, you can alias the command in most shells, since the default shells in macOS (bash in 10.14 and below; zsh in 10.15) share a similar syntax. You could put alias python='python3' in your ~/.profile, and then source ~/.profile from your ~/.bash_profile and/or your ~/.zprofile with a line like:

[ -e ~/.profile ] && . ~/.profile

This way, your alias will work across shells.

With this, the python command now invokes python3. If you want to invoke the “original” python (that refers to python2) on occasion, you can use command python, which bypasses the alias while leaving it untouched, and works in all shells.

If you launch interpreters more often (I do), you can always create more aliases as well, e.g.:

alias 2='python2'
alias 3='python3'

Tip: For scripts, instead of using a shebang like:

#!/usr/bin/env python

use:

#!/usr/bin/env python3

This way, the system will use python3 for running python executables.
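
To confirm which interpreter a script actually runs under, a small self-check can help. This is only a sketch: the function name describe_interpreter is made up, and str.format is used instead of an f-string so that the version guard can still run on Python 2.

```python
#!/usr/bin/env python3
"""Report which interpreter is running this script."""
import sys


def describe_interpreter():
    """Return the interpreter path and its major.minor version as text."""
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return "{} (Python {})".format(sys.executable, version)


# Fail early with a readable message if the shebang resolved to Python 2.
if sys.version_info < (3,):
    raise SystemExit("This script needs Python 3.")

print(describe_interpreter())
```

Note that shell aliases do not affect shebang lines; env looks the name up on PATH.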

回答 1

您可以通过符号链接来解决。

unlink /usr/local/bin/python
ln -s /usr/local/bin/python3.3 /usr/local/bin/python

You can solve it by symbolic link.

unlink /usr/local/bin/python
ln -s /usr/local/bin/python3.3 /usr/local/bin/python

回答 2

打开〜/ .bash_profile文件。

vi ~/.bash_profile

然后按如下所示放置别名：

alias python='python3'

现在保存文件，然后运行〜/ .bash_profile文件。

source ~/.bash_profile

恭喜！！！现在，您可以通过键入python使用python3 。

python --version

的Python 3.7.3

Open ~/.bash_profile file.

vi ~/.bash_profile

Then put the alias as follows:

alias python='python3'

Now save the file and then run the ~/.bash_profile file.

source ~/.bash_profile

Congratulations! Now you can use python3 by typing python.

python --version

Python 3.7.3

回答 3

转到终端类型：

alias python=python3.x

这会将默认python设置为python3.x

Open a terminal and type:

alias python=python3.x

This sets up python3.x as the default python for the current shell session.

回答 4

以下为我工作

cd /usr/local/bin
mv python python.old
ln -s python3 python

The following worked for me

cd /usr/local/bin
mv python python.old
ln -s python3 python

回答 5

我在这个游戏上有点晚了，但是我认为我应该发布更新的答案，因为我自己才遇到这个问题。请注意，这仅适用于基于Mac的设置（我没有在Windows或任何版本的Linux上尝试过）。

最简单的方法是通过Brew安装Python 。如果您未安装brew，则需要先执行该操作。安装完成后，在终端上执行以下操作：

brew install python

这将安装Python3。安装后，运行以下命令：

ls -l /usr/local/bin/python*

您将看到brew创建的所有安装到其Python安装的链接。它看起来像这样：

lrwxr-xr-x  1 username  admin  36 Oct  1 13:35 /usr/local/bin/python3@ -> ../Cellar/python/3.7.4_1/bin/python3
lrwxr-xr-x  1 username  admin  43 Oct  1 13:35 /usr/local/bin/python3-config@ -> ../Cellar/python/3.7.4_1/bin/python3-config
lrwxr-xr-x  1 username  admin  38 Oct  1 13:35 /usr/local/bin/python3.7@ -> ../Cellar/python/3.7.4_1/bin/python3.7
lrwxr-xr-x  1 username  admin  45 Oct  1 13:35 /usr/local/bin/python3.7-config@ -> ../Cellar/python/3.7.4_1/bin/python3.7-config
lrwxr-xr-x  1 username  admin  39 Oct  1 13:35 /usr/local/bin/python3.7m@ -> ../Cellar/python/3.7.4_1/bin/python3.7m
lrwxr-xr-x  1 username  admin  46 Oct  1 13:35 /usr/local/bin/python3.7m-config@ -> ../Cellar/python/3.7.4_1/bin/python3.7m-config

此示例的第一行显示了python3符号链接。要将其设置为默认python符号链接，请运行以下命令：

ln -s -f /usr/local/bin/python3 /usr/local/bin/python

设置后，您可以执行以下操作：

which python

它应该显示：

/usr/local/bin/python

您将必须重新加载当前的终端shell，才能在该shell中使用新的符号链接，但是，所有新打开的shell会话将（应该）自动使用它。要对此进行测试，请打开一个新的终端外壳并运行以下命令：

python --version

I’m a little late to the game on this one, but I thought I should post an updated answer since I just encountered this issue for myself. Please note that this will only apply to a Mac-based setup (I haven’t tried it with Windows or any flavor of Linux).

The simplest way to get this working is to install Python via Homebrew (brew). If you don’t have brew installed, you will need to do that first. Once it is installed, run the following in the terminal:

brew install python

This will install Python 3. After it’s installed, run this:

ls -l /usr/local/bin/python*

You will see all of the links created by brew to its Python install. It will look something like this:

lrwxr-xr-x  1 username  admin  36 Oct  1 13:35 /usr/local/bin/python3@ -> ../Cellar/python/3.7.4_1/bin/python3
lrwxr-xr-x  1 username  admin  43 Oct  1 13:35 /usr/local/bin/python3-config@ -> ../Cellar/python/3.7.4_1/bin/python3-config
lrwxr-xr-x  1 username  admin  38 Oct  1 13:35 /usr/local/bin/python3.7@ -> ../Cellar/python/3.7.4_1/bin/python3.7
lrwxr-xr-x  1 username  admin  45 Oct  1 13:35 /usr/local/bin/python3.7-config@ -> ../Cellar/python/3.7.4_1/bin/python3.7-config
lrwxr-xr-x  1 username  admin  39 Oct  1 13:35 /usr/local/bin/python3.7m@ -> ../Cellar/python/3.7.4_1/bin/python3.7m
lrwxr-xr-x  1 username  admin  46 Oct  1 13:35 /usr/local/bin/python3.7m-config@ -> ../Cellar/python/3.7.4_1/bin/python3.7m-config

The first row in this example shows the python3 symlink. To set it as the default python symlink run the following:

ln -s -f /usr/local/bin/python3 /usr/local/bin/python

Once set, you can do:

which python

and it should show:

/usr/local/bin/python

You will have to reload your current terminal shell for it to use the new symlink in that shell, however, all newly opened shell sessions will (should) automatically use it. To test this, open a new terminal shell and run the following:

python --version
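
The same check can be done from Python itself, which also shows where a symlink finally points. A minimal sketch, assuming the commands of interest are python and python3; the helper name resolve_command is hypothetical. It only inspects PATH, so shell aliases are not visible to it.

```python
import os
import shutil


def resolve_command(name):
    """Return (path found on PATH, fully resolved target), or None if not found."""
    found = shutil.which(name)
    if found is None:
        return None
    # realpath follows symlinks such as /usr/local/bin/python -> python3.
    return found, os.path.realpath(found)


for command in ("python", "python3"):
    print(command, "->", resolve_command(command))
```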

回答 6

转到“应用程序”，进入“ Python”文件夹，应该有一个名为“ Update Shell Profile.command”或类似名称的bash脚本。运行该脚本，它应该这样做。

更新：看来您不应该更新它：如何更改默认python版本？

Go to ‘Applications’, enter ‘Python’ folder, there should be a bash script called ‘Update Shell Profile.command’ or similar. Run that script and it should do it.

Update: It looks like you should not update it: how to change default python version?

回答 7

这对我有用。我添加了别名并重新启动了终端：

alias python=/usr/local/bin/python3

This worked for me. I added alias and restarted my terminal:

alias python=/usr/local/bin/python3

回答 8

我相信大多数登陆这里的人都在使用ZSH thorugh iterm或其他工具，这为您带来了答案。

您必须改为添加/修改命令~/.zshrc。

I believe most people who land here are using zsh through iTerm or whatever, and that brings you to this answer.

You have to add/modify your commands in ~/.zshrc instead.

回答 9

我不确定在OS X上是否可用，但是在Linux上我会使用该module命令。看这里。

正确设置modulefile，然后将以下内容添加到rc文件中（例如〜/ .bashrc）：

module load python3.3

这样一来，登录时即可根据需要切换路径，而不会影响任何系统默认值。

I’m not sure if this is available on OS X, but on linux I would make use of the module command. See here.

Set up the modulefile correctly, then add something like this to your rc file (e.g. ~/.bashrc):

module load python3.3

This will make it so that your paths get switched around as required when you log in without impacting any system defaults.

回答 10

我认为安装python时会将导出路径语句放入〜/ .bash_profile文件中。因此，如果您不再打算使用Python 2，则可以从那里删除该语句。如上所述的别名也是一种很好的方法。

这是从〜/ .bash_profile中删除引用的方法-vim ./.bash_profile-删除引用（也类似：export PATH =“ / Users / bla / anaconda：$ PATH”）-保存并退出-源./ .bash_profile保存更改

I think when you install python it puts export path statements into your ~/.bash_profile file. So if you do not intend to use Python 2 anymore you can just remove that statement from there. Alias as stated above is also a great way to do it.

Here is how to remove the reference from ~/.bash_profile:

vim ~/.bash_profile
remove the reference (something like: export PATH="/Users/bla/anaconda:$PATH")
save and exit
source ~/.bash_profile to apply the changes

回答 11

$ sudo ln -s -f $(which python3) $(which python)

完成。

$ sudo ln -s -f $(which python3) $(which python)

done.

回答 12

在Mac上将Python 3设置为默认的正确和错误的方法

在本文中，作者讨论了设置默认python的三种方法：

什么不该做。
我们可以做（但也不应该）。
我们应该做什么！

所有这些方式都有效。您决定哪个更好。

The RIGHT and WRONG way to set Python 3 as default on a Mac

In this article the author discusses three ways of setting the default python:

What NOT to do.
What we COULD do (but also shouldn’t).
What we SHOULD do!

All of these ways work. You decide which is better.

回答 13

如果virtualenvwrapper使用which virtualenvwrapper.sh，则可以使用进行定位，然后使用vim或任何其他编辑器将其打开，然后更改以下内容

# Locate the global Python where virtualenvwrapper is installed.
if [ "${VIRTUALENVWRAPPER_PYTHON:-}" = "" ]
then
    VIRTUALENVWRAPPER_PYTHON="$(command \which python)"
fi

将行更改VIRTUALENVWRAPPER_PYTHON="$(command \which python)"为VIRTUALENVWRAPPER_PYTHON="$(command \which python3)"。

If you are using a virtualenvwrapper, you can just locate it using which virtualenvwrapper.sh, then open it using vim or any other editor then change the following

# Locate the global Python where virtualenvwrapper is installed.
if [ "${VIRTUALENVWRAPPER_PYTHON:-}" = "" ]
then
    VIRTUALENVWRAPPER_PYTHON="$(command \which python)"
fi

Change the line VIRTUALENVWRAPPER_PYTHON="$(command \which python)" to VIRTUALENVWRAPPER_PYTHON="$(command \which python3)".

回答 14

对我来说，解决方案是使用PyCharm并将默认的python版本设置为我需要使用的版本。

安装PyCharm并转到文件==>新项目的首选项，然后为项目选择所需的解释器，在这种情况下为python 3.3

For me the solution was using PyCharm and setting the default Python version to the one that I need to work with.

Install PyCharm and go to File ==> Preferences for New Projects, then choose the interpreter you want for your projects, in this case Python 3.3.

回答 15

如果您使用macports，则不需要使用别名或环境变量，只需使用macports已经提供的方法，此问答将对此进行说明：

作法：Macports选择python

TL; DR：

sudo port select --set python python27

If you use MacPorts, you do not need to play with aliases or environment variables; just use the method MacPorts already offers, explained by this Q&A:

How to: Macports select python

TL;DR:

sudo port select --set python python27

回答 16

如果您使用的是Macports，则有一种更简单的方法：

跑：

port install python37

安装后，设置默认值：

sudo port select --set python python37

sudo port select --set python3 python37

重新启动您的cmd窗口，完成。

If you are using MacPorts, there is an easier way:

Run:

port install python37

After installing, set the default:

sudo port select --set python python37

sudo port select --set python3 python37

Restart your terminal window, and you are finished.

回答 17

好吧…有点老了。但是仍然值得一个好的答案。

优点之一是您不想触摸Mac上的默认Python。

通过Homebrew或其他方式安装所需的任何Python版本，然后在virtualenv中使用它。Virtualenv通常被认为是胡扯，但还是比在系统范围内更改python版本（macOS可能会保护自己免受此类操作）或用户范围，bash范围……好得多。只需忘记默认的Python。使用venv这样的游乐场是您的操作系统最感谢的事情。

例如，这种情况是，许多现代Linux发行版都摆脱了现成安装的Python2的安装，仅在系统中保留了Python3。但是每次您尝试使用python2作为依赖项安装旧版本时…希望您理解我的意思。一个好的开发者不在乎。好的开发人员可以使用他们想要的python版本创建干净的游乐场。

Well… It’s kinda old. But still deserves a good answer.

And the good one is You Don’t Wanna Touch The Default Python On Mac.

Install any Python version you need via Homebrew or whatever and use it in virtualenv. Virtualenv is often considered to be something crap-like, but it’s still way, wayyyy better than changing python version system-wide (macOS is likely to protect itself from such actions) or user-wide, bash-wide… whatever. Just forget about the default Python. Using playgrounds like venv is what your OS will be most, very most grateful for.

The case is, for example, that many modern Linux distributions no longer ship Python 2 out of the box, leaving only Python 3 in the system. But every time you try to install something old with python2 as a dependency… hope you understand what I mean. A good developer doesn’t care. Good developers create clean playgrounds with the Python version they desire.
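
As an illustration of such a playground, the sketch below uses the standard-library venv module rather than the third-party virtualenv tool the answer mentions; the two serve the same purpose here. The directory name playground and the helper make_playground are made up, and the environment is created in a temporary directory so nothing on the system is touched.

```python
import pathlib
import tempfile
import venv


def make_playground(parent):
    """Create an isolated environment under parent and return its path."""
    env_dir = pathlib.Path(parent) / "playground"
    # with_pip=False keeps the sketch fast and avoids needing ensurepip.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_playground(tmp)
    # Every venv gets a pyvenv.cfg recording the interpreter it was built from.
    created = (env_dir / "pyvenv.cfg").exists()
    print("created:", created)
```

The environment uses whichever interpreter runs the script, so running it with python3 gives a Python 3 playground without changing what the bare python command means.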

回答 18

Mac用户只需要在终端上运行以下代码

brew switch python 3.x.x

3.xx应该是新的python版本。

这将更新所有系统链接。

Mac users just need to run the following code on terminal

brew switch python 3.x.x

3.x.x should be the new python version.

This will update all the system links.

回答 19

建议将python别名为python3会导致设置python版本的虚拟环境出现问题（例如：pyenv）。使用pyenv，您可以像下面这样全局设置版本：

pyenv global 3.8.2

然后在任何特定项目中，您都可以创建一个.python-version文件，其中包含python版本：

pyenv local 2.7.1

我认为这是在系统上管理多个版本的python的最佳方法。

Suggestions to alias python to python3 will cause problems with virtual environments that set the version of python (eg: pyenv). With pyenv, you can set the version globally like so:

pyenv global 3.8.2

and then in any specific project, you can create a .python-version file which has the python version inside of it:

pyenv local 2.7.1

This is the best way to manage multiple versions of python on a system in my opinion.

Q&A

pandas: filter rows of a DataFrame with operator chaining

July 27, 2021 Python实用宝典

Question: pandas: filter rows of a DataFrame with operator chaining

在大部分操作pandas可以与运营商链接（来完成groupby，aggregate，apply，等），但我发现过滤行唯一方法是通过正常的托架索引

df_filtered = df[df['column'] == value]

这没有吸引力，因为它要求我先分配df一个变量，然后才能根据其值进行过滤。还有以下内容吗？

df_filtered = df.mask(lambda x: x['column'] == value)

Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc), but the only way I’ve found to filter rows is via normal bracket indexing

df_filtered = df[df['column'] == value]

This is unappealing as it requires I assign df to a variable before being able to filter on its values. Is there something more like the following?

df_filtered = df.mask(lambda x: x['column'] == value)

回答 0

我不确定您想要什么，您的最后一行代码也无济于事，但是无论如何：

通过“链接”布尔索引中的条件来完成“链式”过滤。

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [99]: df[(df.A == 1) & (df.D == 6)]
Out[99]:
   A  B  C  D
d  1  3  9  6

如果要链接方法，可以添加自己的mask方法并使用该方法。

In [90]: def mask(df, key, value):
   ....:     return df[df[key] == value]
   ....:

In [92]: pandas.DataFrame.mask = mask

In [93]: df = pandas.DataFrame(np.random.randint(0, 10, (4,4)), index=list('abcd'), columns=list('ABCD'))

In [95]: df.loc['d','A'] = df.loc['a', 'A']

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [97]: df.mask('A', 1)
Out[97]:
   A  B  C  D
a  1  4  9  1
d  1  3  9  6

In [98]: df.mask('A', 1).mask('D', 6)
Out[98]:
   A  B  C  D
d  1  3  9  6

I’m not entirely sure what you want, and your last line of code does not help either, but anyway:

“Chained” filtering is done by “chaining” the criteria in the boolean index.

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [99]: df[(df.A == 1) & (df.D == 6)]
Out[99]:
   A  B  C  D
d  1  3  9  6

If you want to chain methods, you can add your own mask method and use that one.

In [90]: def mask(df, key, value):
   ....:     return df[df[key] == value]
   ....:

In [92]: pandas.DataFrame.mask = mask

In [93]: df = pandas.DataFrame(np.random.randint(0, 10, (4,4)), index=list('abcd'), columns=list('ABCD'))

In [95]: df.loc['d','A'] = df.loc['a', 'A']

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [97]: df.mask('A', 1)
Out[97]:
   A  B  C  D
a  1  4  9  1
d  1  3  9  6

In [98]: df.mask('A', 1).mask('D', 6)
Out[98]:
   A  B  C  D
d  1  3  9  6

回答 1

可以使用Pandas 查询链接过滤器：

df = pd.DataFrame(np.random.randn(30, 3), columns=['a','b','c'])
df_filtered = df.query('a > 0').query('0 < b < 2')

过滤器也可以组合在一个查询中：

df_filtered = df.query('a > 0 and 0 < b < 2')

Filters can be chained using a Pandas query:

df = pd.DataFrame(np.random.randn(30, 3), columns=['a','b','c'])
df_filtered = df.query('a > 0').query('0 < b < 2')

Filters can also be combined in a single query:

df_filtered = df.query('a > 0 and 0 < b < 2')

回答 2

@lodagro的答案很好。我可以通过将mask函数概括为：

def mask(df, f):
  return df[f(df)]

然后，您可以执行以下操作：

df.mask(lambda x: x[0] < 0).mask(lambda x: x[1] > 0)

The answer from @lodagro is great. I would extend it by generalizing the mask function as:

def mask(df, f):
  return df[f(df)]

Then you can do stuff like:

df.mask(lambda x: x[0] < 0).mask(lambda x: x[1] > 0)

回答 3

从0.18.1版开始，该.loc方法接受可调用的选择。与lambda函数一起，您可以创建非常灵活的可链接过滤器：

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.loc[lambda df: df.A == 80]  # equivalent to df[df.A == 80] but chainable

df.sort_values('A').loc[lambda df: df.A > 80].loc[lambda df: df.B > df.A]

如果您所做的只是过滤，也可以省略.loc。

Since version 0.18.1 the .loc method accepts a callable for selection. Together with lambda functions you can create very flexible chainable filters:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.loc[lambda df: df.A == 80]  # equivalent to df[df.A == 80] but chainable

df.sort_values('A').loc[lambda df: df.A > 80].loc[lambda df: df.B > df.A]

If all you’re doing is filtering, you can also omit the .loc.
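
A small, self-contained sketch of the callable-based chaining described above. The frame contents are made-up sample data chosen so the result is easy to check by eye; they are not taken from the answer.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"A": [1, 4, 5, 1], "B": [4, 5, 5, 3], "C": [9, 0, 1, 9], "D": [1, 2, 0, 6]},
    index=list("abcd"),
)

# Each callable receives the frame produced by the previous step in the chain.
chained = (
    df.sort_values("B")
      .loc[lambda d: d["A"] == 1]
      .loc[lambda d: d["D"] == 6]
)

# Plain bracket indexing also accepts a callable.
bracketed = df[lambda d: d["A"] == 1]

print(chained)
print(bracketed)
```

Since the lambda is evaluated against the intermediate frame, no temporary variable is needed between steps.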

回答 4

我提供了其他示例。这是与https://stackoverflow.com/a/28159296/相同的答案

我将添加其他修改，以使该帖子更有用。

pandas.DataFrame.query
query正是出于这个目的。考虑数据框df

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 5)),
    columns=list('ABCDE')
)

df

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
2  0  2  0  4  9
3  7  3  2  4  3
4  3  6  7  7  4
5  5  3  7  5  9
6  8  7  6  4  7
7  6  2  6  6  5
8  2  8  7  5  8
9  4  7  6  1  5

让我们使用query过滤所有行D > B

df.query('D > B')

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
2  0  2  0  4  9
3  7  3  2  4  3
4  3  6  7  7  4
5  5  3  7  5  9
7  6  2  6  6  5

我们连锁

df.query('D > B').query('C > B')
# equivalent to
# df.query('D > B and C > B')
# but defeats the purpose of demonstrating chaining

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
4  3  6  7  7  4
5  5  3  7  5  9
7  6  2  6  6  5

I offer this for additional examples. This is the same answer as https://stackoverflow.com/a/28159296/

I’ll add other edits to make this post more useful.

pandas.DataFrame.query
query was made for exactly this purpose. Consider the dataframe df

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 5)),
    columns=list('ABCDE')
)

df

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
2  0  2  0  4  9
3  7  3  2  4  3
4  3  6  7  7  4
5  5  3  7  5  9
6  8  7  6  4  7
7  6  2  6  6  5
8  2  8  7  5  8
9  4  7  6  1  5

Let’s use query to filter all rows where D > B

df.query('D > B')

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
2  0  2  0  4  9
3  7  3  2  4  3
4  3  6  7  7  4
5  5  3  7  5  9
7  6  2  6  6  5

Which we chain

df.query('D > B').query('C > B')
# equivalent to
# df.query('D > B and C > B')
# but defeats the purpose of demonstrating chaining

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
4  3  6  7  7  4
5  5  3  7  5  9
7  6  2  6  6  5

回答 5

I had the same question except that I wanted to combine the criteria into an OR condition. The format given by Wouter Overmeire combines the criteria into an AND condition such that both must be satisfied:

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [99]: df[(df.A == 1) & (df.D == 6)]
Out[99]:
   A  B  C  D
d  1  3  9  6

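But I found that if you wrap each condition in parentheses and join the criteria with a pipe (|), they are combined as an OR condition, satisfied whenever either of them is true. The == True comparison shown here is harmless but redundant; the parentheses are what matter, because | binds more tightly than ==: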

df[((df.A==1) == True) | ((df.D==6) == True)]
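
For comparison, a minimal sketch using the same four-row frame as above, showing that the plain parenthesized form selects the same rows as the == True form. The variable names either and wrapped are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 4, 5, 1], 'B': [4, 5, 5, 3], 'C': [9, 0, 1, 9], 'D': [1, 2, 0, 6]},
    index=list('abcd'),
)

# Parenthesize each comparison: | binds more tightly than ==
either = df[(df.A == 1) | (df.D == 6)]

# The == True wrapping selects exactly the same rows
wrapped = df[((df.A == 1) == True) | ((df.D == 6) == True)]

print(either)
```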

Answer 6

pandas provides two alternatives to Wouter Overmeire’s answer which do not require any overriding. One is .loc[.] with a callable, as in

df_filtered = df.loc[lambda x: x['column'] == value]

the other is .pipe(), as in

df_filtered = df.pipe(lambda x: x[x['column'] == value])
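
To show why the callable forms matter in a chain, here is a rough sketch on a made-up frame (the column names and values are assumptions). The callable receives the intermediate frame, so it can filter on state that only exists inside the chain. Note that the function given to .pipe() has to return the filtered frame itself, not just the boolean mask.

```python
import pandas as pd

df = pd.DataFrame({'column': [1, 2, 3, 4], 'other': [10, 20, 30, 40]})

filtered = (
    df
    .assign(ratio=lambda d: d['other'] / d['column'])  # new column, exists only in the chain
    .loc[lambda d: d['column'] > 1]                     # callable receives the intermediate frame
    .pipe(lambda d: d[d['other'] < 40])                 # pipe returns whatever the function returns
)
print(filtered)
```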

Answer 7

My answer is similar to the others. If you do not want to create a new function you can use what pandas has defined for you already. Use the pipe method.

df.pipe(lambda d: d[d['column'] == value])

Answer 8

If you would like to apply all of the common boolean masks as well as a general-purpose mask, you can put the functions shown further below in a file and then attach them all as follows:

pd.DataFrame = apply_masks()

Usage:

A = pd.DataFrame(np.random.randn(4, 4), columns=["A", "B", "C", "D"])
A.le_mask("A", 0.7).ge_mask("B", 0.2)  # ...may be repeated as necessary

It’s a little hacky, but it can make things cleaner if you’re continuously slicing and changing datasets according to filters. There’s also a general-purpose filter, adapted from Daniel Velkov’s answer above, in the gen_mask function; you can use it with lambda functions or any other callable.

File to be saved (I use masks.py):

import pandas as pd

def eq_mask(df, key, value):
    return df[df[key] == value]

def ge_mask(df, key, value):
    return df[df[key] >= value]

def gt_mask(df, key, value):
    return df[df[key] > value]

def le_mask(df, key, value):
    return df[df[key] <= value]

def lt_mask(df, key, value):
    return df[df[key] < value]

def ne_mask(df, key, value):
    return df[df[key] != value]

def gen_mask(df, f):
    return df[f(df)]

def apply_masks():
    # Monkey-patch: attach each mask function to pd.DataFrame as a method
    pd.DataFrame.eq_mask = eq_mask
    pd.DataFrame.ge_mask = ge_mask
    pd.DataFrame.gt_mask = gt_mask
    pd.DataFrame.le_mask = le_mask
    pd.DataFrame.lt_mask = lt_mask
    pd.DataFrame.ne_mask = ne_mask
    pd.DataFrame.gen_mask = gen_mask

    return pd.DataFrame

if __name__ == '__main__':
    pass
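
If patching pd.DataFrame feels too invasive, the same mask functions can probably be chained through .pipe() instead, since pipe passes the frame as the first argument. This is only a sketch of that alternative, not part of the masks.py approach; the sample values are invented so the result is easy to check.

```python
import pandas as pd

def le_mask(df, key, value):
    # Keep rows where df[key] <= value
    return df[df[key] <= value]

def ge_mask(df, key, value):
    # Keep rows where df[key] >= value
    return df[df[key] >= value]

A = pd.DataFrame({'A': [0.1, 0.9, 0.5, 0.3], 'B': [0.0, 0.5, 0.4, 0.1]})

# Same chain as A.le_mask("A", 0.7).ge_mask("B", 0.2), without touching pd.DataFrame
result = A.pipe(le_mask, 'A', 0.7).pipe(ge_mask, 'B', 0.2)
print(result)
```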

Answer 9

This solution is more hackish in terms of implementation, but I find it much cleaner in terms of usage, and it is certainly more general than the others proposed.

https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py

You don’t need to download the entire repo: saving the file and doing

from where import where as W

should suffice. Then you use it like this:

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False], 
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On entire - or subset of a - DataFrame:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])

A slightly less stupid usage example:

data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]

By the way: even in the case in which you are just using boolean cols,

df.loc[W['cond1']].loc[W['cond2']]

can be much more efficient than

df.loc[W['cond1'] & W['cond2']]

because it evaluates cond2 only where cond1 is True.
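
The sequential-versus-combined point can be sketched without the where helper, assuming plain pandas .loc callables stand in for W. The column names cond1 and cond2 follow the text; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    'cond1': [True, False, True, True],
    'cond2': [True, True, False, True],
    'x': [1, 2, 3, 4],
})

# Sequential: the second callable only sees rows that already passed cond1
sequential = df.loc[lambda d: d['cond1']].loc[lambda d: d['cond2']]

# Combined: both masks are computed over the full frame, then ANDed
combined = df.loc[lambda d: d['cond1'] & d['cond2']]

print(sequential)
```

Both forms select the same rows; the difference is only in how much data the second condition is evaluated on.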

DISCLAIMER: I first gave this answer elsewhere because I hadn’t seen this.

Answer 10

I just want to add a demonstration of using loc to filter not only by rows but also by columns, and to point out some merits of chained operations.

The code below can filter the rows by value.

df_filtered = df.loc[df['column'] == value]

By modifying it a bit you can filter the columns as well.

df_filtered = df.loc[df['column'] == value, ['year', 'column']]

So why do we want a chained method? The answer is that it is simple to read if you have many operations. For example,

# 'year' must be among the selected columns for groupby('year') to work
res = df\
    .loc[df['station'] == 'USA', ['year', 'TEMP', 'RF']]\
    .groupby('year')\
    .agg(np.nanmean)
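
Here is a runnable sketch of the same kind of chain on invented data. The column names station, year, TEMP and RF come from the example above; the values are assumptions. It uses parentheses instead of backslashes and groupby(...).mean(), which skips NaN by default, in place of np.nanmean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'station': ['USA', 'USA', 'USA', 'CAN'],
    'year': [2000, 2000, 2001, 2000],
    'TEMP': [10.0, np.nan, 14.0, 5.0],
    'RF': [1.0, 3.0, 2.0, 9.0],
})

res = (
    df
    .loc[df['station'] == 'USA', ['year', 'TEMP', 'RF']]  # filter rows and pick columns
    .groupby('year')                                      # 'year' was kept above, so this works
    .mean()                                               # NaN values are skipped by default
)
print(res)
```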

Answer 11

This is unappealing as it requires I assign df to a variable before being able to filter on its values.

df[df["column_name"] != 5].groupby("other_column_name")

seems to work: you can nest the [] operator as well. Maybe they added it since you asked the question.

Answer 12

If you set your columns to search as indexes, then you can use DataFrame.xs() to take a cross section. This is not as versatile as the query answers, but it might be useful in some situations.

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(
    np.random.randint(3, size=(10, 5)),
    columns=list('ABCDE')
)

df
# Out[55]: 
#    A  B  C  D  E
# 0  0  2  2  2  2
# 1  1  1  2  0  2
# 2  0  2  0  0  2
# 3  0  2  2  0  1
# 4  0  1  1  2  0
# 5  0  0  0  1  2
# 6  1  0  1  1  1
# 7  0  0  2  0  2
# 8  2  2  2  2  2
# 9  1  2  0  2  1

df.set_index(['A', 'D']).xs((0, 2)).reset_index()  # pass the key as a tuple; recent pandas rejects a list here
# Out[57]: 
#    A  D  B  C  E
# 0  0  2  2  2  2
# 1  0  2  1  1  0
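
For comparison, the same selection (A equal to 0 and D equal to 2) can be sketched with query or a boolean mask, with no index manipulation. The frame below is a small invented one, not the random frame above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 0, 0],
    'B': [2, 1, 2, 1],
    'D': [2, 0, 0, 2],
})

# Both forms keep the original index and column order
via_query = df.query('A == 0 and D == 2')
via_mask = df[(df['A'] == 0) & (df['D'] == 2)]

print(via_query)
```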

Answer 13

You can also leverage the numpy library for logical operations. It's pretty fast.

df[np.logical_and(df['A'] == 1, df['B'] == 6)]
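
np.logical_and itself takes two inputs, so for three or more conditions one option is np.logical_and.reduce over a list of masks. This is a sketch on invented data; the third condition on column C is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1], 'B': [6, 6, 6, 5], 'C': [0, 3, 3, 3]})

# Any number of boolean masks of equal length
conditions = [df['A'] == 1, df['B'] == 6, df['C'] > 0]

# AND all masks together element-wise; the result is a boolean NumPy array
mask = np.logical_and.reduce(conditions)
result = df[mask]
print(result)
```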