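Category Archives: Solutions

Artificial Intelligence, Solutions, Quantitative Investing

Six months in the making: my first app for overseas markets is out!

December 15, 2024 · Python实用宝典

Every so often, a few key economic data releases move the markets: the Fed's interest rate decision, nonfarm payrolls, CPI, GDP, PMI... The swings in these numbers often ripple through markets around the world.

Most economic-data apps on the market share one problem: they don't show clearly how each data release affected financial markets around the world.

That's why I built Market Moments, a simple but practical economic data tool.

It works like a pocket economic calendar that helps you with:
1. Tracking the impact of important economic data on different markets.

2. A trading calendar: which economic events were released on which day, which are coming up, and which days are holidays, all at a glance in the app.

3. Search: look up past economic events and how the markets reacted to them.

4. Favorites and custom settings: tap the little star in the top-right corner to favorite an event so you don't have to look it up again. On the Settings page you can also choose red-for-up/green-for-down colors, Chinese or English, dark mode, and more.

5. Sharing: finally, you can share an event with your friends using the share button.

No flashy features, just the basics: knowing what to watch, and when to watch it.

If you need a tool like this too, search for "Market Moments" in the App Store.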

Solutions

Terms of Service

Last Updated: 2024-01-28

Please read these Terms of Service (“Terms,” “Terms of Service”) carefully before using the services we operate (the “Service”).

Acceptance of Terms:

By accessing or using the Service, you agree to be bound by these Terms. If you disagree with any part of the terms, then you may not access the Service.

Use of the Service:

User Accounts: You may be required to create an account to use certain features of the Service. You are responsible for maintaining the confidentiality of your account and password.
Content: The Service may allow you to post, link, store, share, and otherwise make available certain information, text, graphics, videos, or other material (“Content”). You are responsible for the Content you post.

Intellectual Property:

The Service and its original content, features, and functionality are owned by us and are protected by international copyright, trademark, patent, trade secret, and other intellectual property or proprietary rights laws.

Termination:

We may terminate or suspend your account immediately, without prior notice or liability, for any reason whatsoever, including without limitation if you breach the Terms.

Changes to the Terms:

We reserve the right to modify or replace these Terms at any time. If a revision is material, we will try to provide at least 30 days’ notice prior to any new terms taking effect.

Disclaimer:

The Service is provided on an “as-is” and “as-available” basis. We do not guarantee that the Service will be uninterrupted, timely, secure, or error-free.

Limitation of Liability:

In no event shall we be liable for any indirect, incidental, special, consequential, or punitive damages.

Contact Us:

If you have any questions about these Terms, please contact us at admin@pythondict.com.

Solutions

Data Deletion Policy


Last Updated: 2024-01-28

This Data Deletion Policy outlines how we handle the deletion of user data upon request.

Requesting Data Deletion:

If you would like to request the deletion of your personal information from our records, please contact us at admin@pythondict.com. Please provide sufficient information to allow us to verify your identity and locate your data.

Verification Process:

For security reasons, we may need to verify your identity before processing a data deletion request. We may request additional information from you to confirm your identity.

Processing Time:

Upon receiving a valid data deletion request, we will make reasonable efforts to delete your data from our records in a timely manner. Please note that some data may be retained for a certain period as required by law or for legitimate business purposes.

Data Categories:

The data that can be deleted upon request may include:

Personal information provided during account registration.
Usage data, such as device information and interactions with the service.
Other data collected for specific purposes, as outlined in our Privacy Policy.

Exceptions:

Certain data may be exempt from deletion if its retention is necessary for legal compliance, protection against fraudulent activity, or other legitimate business purposes.

Confirmation of Deletion:

Once your data has been deleted, we will confirm the deletion through the contact information provided in your request.

Contact Us:

If you have any questions or concerns about our Data Deletion Policy, please contact us at admin@pythondict.com.

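github, Python for Office Work, Developer Tools, Solutions

Asciinema: a terminal session recorder, and a boon for machine learning developers

March 13, 2022 · Python实用宝典

When doing machine learning / deep learning development, we often produce huge amounts of log output in the terminal.

If these logs aren't saved, they vanish in an instant, and when we later want to look back at the log of a particular training run, we find to our dismay that it's gone.

With Asciinema, we can not only recover and export those terminal logs, but also "replay" and "share" them. Pretty awesome.

Asciinema is a tool written in Python. Install and use it as follows.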

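1. Preparation

Before you start, make sure Python and pip are installed on your computer. If they aren't, see the article "Super Detailed Python Installation Guide".

(Optional 1) If you use Python for data analysis, you can install Anaconda directly (see "Anaconda: A Great Helper for Python Data Analysis and Mining"); it ships with Python and pip.

(Optional 2) We also recommend the VSCode editor for small Python projects (see "VSCode: The Best Partner for Python Programming, a Detailed Guide").

On Windows, open Cmd (Start > Run > CMD); on macOS, open Terminal (Command+Space, then type Terminal). Then install the dependency: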

pip install asciinema

2. Using Asciinema

Run the following command in a terminal to record your first terminal session:

asciinema rec first.cast

It then prints something like this:

(gs3_9) zjr@sgd-linux-1:~/cnn_test$ asciinema rec first.cast
  
asciinema: recording asciicast to first.cast
asciinema: press <ctrl-d> or type "exit" when you're done

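This means the recording is saved to first.cast in the current directory. To stop recording, press Ctrl+D (or type exit).

When you're done recording, replay it at double speed: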

asciinema play -s 2 first.cast

Or at normal speed, but with idle time capped at 2 seconds:

asciinema play -i 2 first.cast

You can also pass -i 2 to asciinema rec when you start recording, so that the limit is saved permanently in the recording:

asciinema rec first.cast -i 2

Limiting idle time makes recordings much more pleasant to watch. Give it a try.

To watch and share the recording on the web, upload it:

asciinema upload first.cast

This uploads the recording to asciinema.org and prints a secret link that you can use to watch it in a web browser.

You can also record and upload in one step by omitting the filename:

asciinema rec

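When recording finishes, you'll be asked to confirm the upload. Nothing is sent anywhere without your consent.

3. Playing recordings

There are four ways to play a recording. The most common is replaying a local file in the terminal: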

asciinema play /path/to/asciicast.cast

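The following keyboard shortcuts are available:

Space: pause/resume
. (period): step forward one frame at a time (while paused)
Ctrl+C: quit

The second way is to play from a URL: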

asciinema play https://asciinema.org/a/22124.cast
asciinema play http://example.com/demo.cast

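This requires the recording to be reachable online, for example after uploading it to asciinema.org.

The third way is to point it at an HTML page of your own (the page's HTML must contain <link rel="alternate" type="application/x-asciicast" href="/my/ascii.cast">):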

asciinema play http://your_html_path/post.html

The fourth way is to play from standard input:

cat /path/to/asciicast.cast | asciinema play -
ssh user@host cat asciicast.cast | asciinema play -

Available options:

-i, --idle-time-limit=<sec>: cap terminal inactivity during playback at <sec> seconds
-s, --speed=<factor>: playback speed factor (2 means twice as fast)
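To make -i and -s concrete, here is a minimal sketch of what they do to a recording's timing. It assumes the asciicast v2 file format written by the pip-installed asciinema 2.x (a JSON header line, then one [time, event_type, data] array per line, with times in seconds since the start). load_cast and retime are hypothetical helpers, not part of asciinema's API, and the real player may differ in details.

```python
import json


def load_cast(text):
    """Parse asciicast v2 text: a JSON header line, then one JSON event per line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = json.loads(lines[0])
    events = [json.loads(ln) for ln in lines[1:]]
    return header, events


def retime(events, idle_time_limit=None, speed=1.0):
    """Return new events with pauses capped at idle_time_limit, then scaled by speed."""
    out, prev_src, t = [], 0.0, 0.0
    for time_, kind, data in events:
        gap = time_ - prev_src          # pause before this event in the original
        if idle_time_limit is not None:
            gap = min(gap, idle_time_limit)  # what -i / --idle-time-limit does
        t += gap
        prev_src = time_
        out.append([t / speed, kind, data])  # what -s / --speed does
    return out


sample = "\n".join([
    '{"version": 2, "width": 80, "height": 24}',
    '[0.5, "o", "hello "]',
    '[10.5, "o", "world\\r\\n"]',
    '[11.0, "o", "$ "]',
])
header, events = load_cast(sample)
print([e[0] for e in retime(events, idle_time_limit=2, speed=2)])  # [0.25, 1.25, 1.5]
```

Capping the gaps, rather than dropping events, keeps every frame of output; only the dead time between frames shrinks.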

4. Exporting recordings

Exporting a terminal recording to a text file is simple:

asciinema cat existing.cast > terminal_output.txt

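All of the recorded terminal output ends up in terminal_output.txt. Note that asciinema cat writes the raw output, including terminal escape sequences such as colors.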
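If you want plain, greppable text instead, a small sketch like the one below concatenates the output events (roughly what asciinema cat prints) and then strips common ANSI color and cursor sequences. It again assumes the asciicast v2 format; cast_to_text is a hypothetical helper, and the regular expression only covers CSI sequences, not every control sequence a terminal program can emit.

```python
import json
import re

# Simplified pattern for ANSI CSI sequences such as "\x1b[32m" (colors, cursor moves)
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def cast_to_text(text, strip_ansi=True):
    """Join all output ("o") events of an asciicast v2 recording into one string."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    events = [json.loads(ln) for ln in lines[1:]]  # lines[0] is the header
    output = "".join(data for _, kind, data in events if kind == "o")
    return ANSI_CSI.sub("", output) if strip_ansi else output


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as f:
        print(cast_to_text(f.read()), end="")
```

Usage: python cast_to_text.py first.cast > terminal_output.txt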

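That's all for this article. If you enjoyed today's Python tutorial, please keep following Python实用宝典.

If you have any questions, reply "加群" (join group) in the backend of the WeChat official account, answer the verification question, and you'll be added to the mutual-help group where you can ask.

Original content takes effort. If you found this helpful, please tap "Like" and "Wow" below to support my writing. Thanks!

Python实用宝典 ( pythondict.com )
More than just a handbook
Follow the WeChat official account: Python实用宝典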

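Developer Tools, Solutions, Design Patterns

Box: add dot-notation access to your dictionaries

October 16, 2021 · Python实用宝典

Normally, to get a value out of a dictionary, we use square brackets, for example: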

test_dict = {"test": {"imdb stars": 6.7, "length": 104}}

print(test_dict["test"]["imdb stars"])
# 6.7

With the Box module, we can extend dictionaries so their elements can be accessed with dot notation:

from box import Box

movie_box = Box({ "Robin Hood: Men in Tights": { "imdb stars": 6.7, "length": 104 } })

movie_box.Robin_Hood_Men_in_Tights.imdb_stars

# 6.7

As you can see, by default the conversion turns the spaces in keys into underscores (and drops characters such as the colon).
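To show the idea behind dot access and key conversion, here is a minimal sketch using a plain dict subclass. It is NOT how Box is implemented: DotDict and to_attr are hypothetical names, and the conversion rule (runs of non-word characters become "_") is an assumption that happens to reproduce the example above.

```python
import re


def to_attr(key):
    """Hypothetical key conversion: runs of non-word characters become "_"."""
    return re.sub(r"\W+", "_", key).strip("_")


class DotDict(dict):
    """Minimal illustration of dot access on a dict (not Box's implementation)."""

    def __getattr__(self, attr):
        # Python calls __getattr__ only after normal attribute lookup fails.
        for key, value in self.items():
            if key == attr or (isinstance(key, str) and to_attr(key) == attr):
                # Wrap nested dicts so chained dot access keeps working.
                return DotDict(value) if isinstance(value, dict) else value
        raise AttributeError(attr)


movies = DotDict({"Robin Hood: Men in Tights": {"imdb stars": 6.7, "length": 104}})
print(movies.Robin_Hood_Men_in_Tights.imdb_stars)  # 6.7
```

Box itself does much more (caching converted keys, converting lists, supporting assignment), which is why the real library is worth installing.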

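The rest of this article walks through how to use the Box module.

1. Preparation

Before you start, make sure Python and pip are installed on your computer. If they aren't, see the article "Super Detailed Python Installation Guide".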


On Windows, open Cmd (Start > Run > CMD); on macOS, open Terminal (Command+Space, then type Terminal). Then install the dependency:

pip install --upgrade python-box[all]

2. Basic usage

We can pass a dict to Box, as at the start of the article, to create a Box object, or create one directly from keyword arguments:

from box import Box

my_box = Box(funny_movie='Hudson Hawk', best_movie='Kung Fu Panda')
my_box.funny_movie
# 'Hudson Hawk'

Keep in mind that any dict or list you add to a Box object is converted automatically: dicts become Box objects and lists become BoxList objects:

my_box = Box({"team": {"red": {"leader": "Sarge", "members": []}}})
print(my_box.team.red.leader)
# Sarge

my_box.team.blue = {"leader": "Church", "members": []} 
print(repr(my_box.team.blue))
# <Box: {'leader': 'Church', 'members': []}>

Accessing Box objects inside a list is just as easy:

my_box.team.red.members = [
    {"name": "Grif", "rank": "Minor Junior Private Negative First Class"},
    {"name": "Dick Simmons", "rank": "Captain"}
]

print(my_box.team.red.members[0].name)
# Grif

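Limitations

Note that dicts (and Box itself) have built-in methods such as clear, copy, fromkeys, get, items, keys, pop, popitem, setdefault, to_dict, update, merge_update, and values. If a key has the same name as one of these methods, you cannot access it with dot notation.

You can still access such keys the traditional way, with square brackets, for example: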

my_box['keys']
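The reason is Python's attribute lookup order: real attributes (such as the dict method keys) are found before any fallback to the stored keys. A toy dict subclass, hypothetical and for illustration only, shows the effect:

```python
class AttrDict(dict):
    """Toy dict whose missing attributes fall back to keys (not Box)."""

    def __getattr__(self, name):
        # Reached only when no real attribute (such as a dict method) has this name.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


d = AttrDict(keys="my value", color="red")
print(d.color)    # red: there is no real attribute "color", so the key is used
print(d.keys)     # the bound dict.keys method, not "my value"
print(d["keys"])  # my value: bracket access always reaches the key
```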

Merging

To merge two Box objects recursively, use the merge_update method:

from box import Box

box_1 = Box(val={'important_key': 1}) 
box_2 = Box(val={'less_important_key': 2})

box_1.merge_update(box_2)

print(box_1)
# {'val': {'important_key': 1, 'less_important_key': 2}}

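You can also use the traditional update method, but note that it replaces the nested dict under val instead of merging it: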

from box import Box

box_1 = Box(val={'important_key': 1}) 
box_2 = Box(val={'less_important_key': 2})

box_1.update(box_2)

print(box_1)
# {'val': {'less_important_key': 2}}
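The difference between the two outputs is a recursive merge versus a top-level replace. Here is a plain-dict sketch of that idea; deep_merge is a hypothetical helper written for this illustration, not Box's implementation.

```python
import copy


def deep_merge(target, source):
    """Recursively merge `source` into `target` in place and return `target`."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)      # both sides are dicts: merge them
        else:
            target[key] = copy.deepcopy(value)  # otherwise the source value wins
    return target


a = {"val": {"important_key": 1}}
b = {"val": {"less_important_key": 2}}

shallow = {**a, **b}                      # like update: "val" is replaced
merged = deep_merge(copy.deepcopy(a), b)  # like merge_update: "val" is merged
print(shallow)  # {'val': {'less_important_key': 2}}
print(merged)   # {'val': {'important_key': 1, 'less_important_key': 2}}
```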

Converting back to a plain list/dict

If you need to turn a Box object back into a plain dict, the .to_dict() method does it:

from box import Box

box_1 = Box(val={'important_key': 1}) 

print(box_1)
# {'val': {'important_key': 1}}
print(type(box_1))
# <class 'box.box.Box'>
print(type(box_1.to_dict()))
# <class 'dict'>

If you need to turn a BoxList back into a plain list, use the .to_list() method:

from box import BoxList

my_boxlist = BoxList({'item': x} for x in range(10))
#  <BoxList: [<Box: {'item': 0}>, <Box: {'item': 1}>, ...

my_boxlist[5].item
# 5

print(type(my_boxlist.to_list()))
# <class 'list'>

3. Import and export

A very handy feature of Box objects is that they can easily be exported to JSON/YAML/CSV/msgpack files:

from box import BoxList

my_boxlist = BoxList({'item': x} for x in range(10))
#  <BoxList: [<Box: {'item': 0}>, <Box: {'item': 1}>, ...

my_boxlist.to_json(filename="test.json")
# creates test.json in the current directory

Box can also import from JSON/YAML/CSV/msgpack files:

from box import Box

new_box = Box.from_json(filename="films.json")

The methods for each format are:

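Converter method	Description
to_dict	Recursively converts all Box (and BoxList) objects back into dicts (and lists)
to_json	Saves the Box object as a JSON string, or writes it to a file via the `filename` parameter
to_yaml	Saves the Box object as a YAML string, or writes it to a file via the `filename` parameter
to_msgpack	Saves the Box object as msgpack bytes, or writes it to a file via the `filename` parameter
to_toml*	Saves the Box object as a TOML string, or writes it to a file via the `filename` parameter
to_csv**	Saves the BoxList object as a CSV string, or writes it to a file via the `filename` parameter
from_json	Classmethod: creates a Box object from a JSON file or string (all Box parameters can be passed)
from_yaml	Classmethod: creates a Box object from a YAML file or string (all Box parameters can be passed)
from_msgpack	Classmethod: creates a Box object from a msgpack file or bytes (all Box parameters can be passed)
from_toml*	Classmethod: creates a Box object from a TOML file or string (all Box parameters can be passed)
from_csv**	Classmethod: creates a BoxList object from a CSV file or string (all BoxList parameters can be passed)

* Not available on BoxList, only on Box. ** Not available on Box, only on BoxList.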

There are many more features; see the official Box wiki:

https://github.com/cdgriffith/Box/wiki


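Python for Office Work, Tools, Performance Optimization, Solutions

isort: a detailed tutorial on the handy tool that automatically tidies your imports

July 22, 2021 · Python实用宝典

isort is a Python utility/library that sorts your imports alphabetically and automatically separates them into sections. It can be used in several ways, including from the command line and from Python code.

It requires Python 3.6+ to run, but it can also format Python 2 code.

Before isort, your imports might look like this: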

from my_lib import Object
import os
from my_lib import Object3
from my_lib import Object2
import sys
from third_party import lib15, lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8, lib9, lib10, lib11, lib12, lib13, lib14
import sys
from __future__ import absolute_import
from third_party import lib3
print("Hey")
print("yo")

After running isort, they look like this:

from __future__ import absolute_import

import os
import sys

from third_party import (lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8,
                         lib9, lib10, lib11, lib12, lib13, lib14, lib15)

from my_lib import Object, Object2, Object3

print("Hey")
print("yo")
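The result above follows a simple idea: split imports into sections (future, standard library, third party, first party) and sort alphabetically within each one. The toy sketch below shows only that idea, on top-level module names. It is not isort's algorithm: KNOWN_FIRST_PARTY is a hypothetical stand-in for isort's own project detection, and sys.stdlib_module_names requires Python 3.10+ (older versions fall back to a tiny hard-coded set here).

```python
import sys

# Section order used in the output above: future, stdlib, third party, first party.
FUTURE, STDLIB, THIRD_PARTY, FIRST_PARTY = range(4)

# Hypothetical first-party packages; isort works these out from your project.
KNOWN_FIRST_PARTY = {"my_lib"}
STDLIB_NAMES = getattr(sys, "stdlib_module_names", {"os", "sys", "re", "json"})


def section_of(module):
    top = module.split(".")[0]
    if top == "__future__":
        return FUTURE
    if top in KNOWN_FIRST_PARTY:
        return FIRST_PARTY
    if top in STDLIB_NAMES:
        return STDLIB
    return THIRD_PARTY


def group_imports(modules):
    """Deduplicate, bucket by section, and sort each bucket alphabetically."""
    sections = {}
    for module in set(modules):
        sections.setdefault(section_of(module), []).append(module)
    return [sorted(sections[key]) for key in sorted(sections)]


print(group_imports(["my_lib", "os", "sys", "third_party", "__future__", "sys"]))
# [['__future__'], ['os', 'sys'], ['third_party'], ['my_lib']]
```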

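The mess instantly turns into tidy, well-ordered code. Below we cover how to install and use the tool, plus some more advanced usage.

1. Preparation

Before you start, make sure Python and pip are installed on your computer. If they aren't, see the article "Super Detailed Python Installation Guide".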


On Windows, open Cmd (Start > Run > CMD); on macOS, open Terminal (Command+Space, then type Terminal). Then install the dependency:

pip install isort

To install isort with requirements.txt support, use:

pip install isort[requirements_deprecated_finder]

2. Sorting your imports with isort

There are two ways to use isort: run it from the command line on .py files, or import isort and call it from Python.

From the command line

To run isort on specific files, run:

isort mypythonfile.py mypythonfile2.py
# or
python -m isort mypythonfile.py mypythonfile2.py

To run isort recursively on the current directory:

isort .
# or
python -m isort .

To see the proposed changes without applying them:

isort mypythonfile.py --diff

If you want to run isort automatically on a project, but only apply changes that don't introduce syntax errors:

isort --atomic .

(Note: this is disabled by default, because it prevents isort from running against code written for a different version of Python.)

From within Python:

import isort
isort.file("pythonfile.py")

Or:

import isort
sorted_code = isort.code("import b\nimport a\n")

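3. Balanced formatting

Since isort 3.1.0, there is support for balanced multi-line imports. With this option enabled, isort dynamically changes the line length used for the import to the one that produces the most balanced grid, while staying below the configured maximum line length.

With balanced wrapping enabled: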

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

Without balanced wrapping:

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

To enable this, set balanced_wrapping=True in your configuration, or pass the -e option on the command line.
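To see why the balanced layout comes out the way it does, here is a toy approximation: wrap the names greedily, then keep narrowing the allowed width for as long as the number of lines does not grow, so the lines end up as even as possible. wrap and balanced_wrap are hypothetical helpers; isort's real formatter supports many more wrap modes and edge cases.

```python
def wrap(prefix, names, width):
    """Greedy grid wrap: put as many names on each line as fit within `width`."""
    tokens = [name + "," for name in names[:-1]] + [names[-1] + ")"]
    indent = " " * (len(prefix) + 1)  # continuation lines align after the "("
    lines = [prefix + "(" + tokens[0]]
    for token in tokens[1:]:
        if len(lines[-1]) + 1 + len(token) <= width:
            lines[-1] += " " + token
        else:
            lines.append(indent + token)
    return lines


def balanced_wrap(prefix, names, max_width=79, min_width=10):
    """Narrow the width step by step while the line count stays the same."""
    best = wrap(prefix, names, max_width)
    width = max_width
    while width > min_width:
        width -= 1
        candidate = wrap(prefix, names, width)
        if len(candidate) != len(best):
            break  # one step too narrow: an extra line would appear
        best = candidate
    return best


names = ["absolute_import", "division", "print_function", "unicode_literals"]
for line in balanced_wrap("from __future__ import ", names):
    print(line)
```

With a 79-character limit, this prints the same two balanced lines as the "balanced" example above, while wrap(..., 79) alone reproduces the unbalanced one.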

4. Skipping imports

To make isort ignore a single import, add a comment containing isort:skip at the end of the import line:

import module  # isort:skip

Or:

from xyz import (abc,  # isort:skip
                 yo,
                 hey)

To make isort skip an entire file, add isort:skip_file to the module's opening docstring:

""" 
my_module.py
Best module ever

isort:skip_file
"""

import b
import a

This tool is quite handy, especially for tidying the imports of messy projects that have piled up over the years. Give it a try if you need it.


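Performance Optimization, Solutions

Source code deep dive: how Celery Beat works

July 11, 2021 · Python实用宝典

Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages. It is a task queue focused on real-time processing that also supports task scheduling.

To explain Celery Beat's periodic scheduling mechanism and how it is implemented, we'll start by building a simple periodic task with Django, then take apart the Celery Beat source code step by step.

For background on using Celery, see these articles:

1. Django Celery async and scheduled tasks: a hands-on tutorial
2. Downloading stock data quickly and asynchronously with Python Celery

1. A simple periodic Celery task

Add the following task to celery_app/tasks.py: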

from celery import shared_task

@shared_task
def pythondict_task():
    print("pythondict_task")

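Add the following configuration to celery_django/celery.py (the project package is celery_django, as the -A celery_django option below shows):

from datetime import timedelta

from celery import Celery
from celery_django import settings

# Assumed: the project's Celery app (skip this line if your file already creates `app`)
app = Celery('celery_django')

app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

CELERYBEAT_SCHEDULE = {
    'pythondict_task': {
        'task': 'celery_app.tasks.pythondict_task',
        'schedule': timedelta(seconds=3),  # run every 3 seconds
    },
}

app.conf.update(CELERYBEAT_SCHEDULE=CELERYBEAT_SCHEDULE)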

That's all the configuration. Now start Celery Beat first:

celery beat -A celery_django -S django

Then open a second terminal and start the consumer (a worker):

celery -A celery_django worker

The worker's terminal will then print output like this:

    [2021-07-11 16:34:11,546: WARNING/PoolWorker-3] pythondict_task
    [2021-07-11 16:34:11,550: WARNING/PoolWorker-4] pythondict_task
    [2021-07-11 16:34:11,551: WARNING/PoolWorker-2] pythondict_task
    [2021-07-11 16:34:11,560: WARNING/PoolWorker-1] pythondict_task

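The output appears as expected, which means the task is running on schedule.

2. Source code analysis

To understand how Celery Beat schedules periodic tasks, we need to dig into the Celery source code.

What actually happens when you run the Celery Beat start command?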

celery beat -A celery_django -S django

When you run this command, the CeleryCommand class in celery/bin/celery.py receives it, looks up the class registered for beat, and executes the following code:

# Python 实用宝典
# https://pythondict.com

from celery.bin.beat import beat

class CeleryCommand(Command):
    commands = {
        # ...
        'beat': beat,
        # ...
    }
    # ...
    def execute(self, command, argv=None):
        try:
            cls = self.commands[command]
        except KeyError:
            cls, argv = self.commands['help'], ['help']
        cls = self.commands.get(command) or self.commands['help']
        try:
            return cls(
                app=self.app, on_error=self.on_error,
                no_color=self.no_color, quiet=self.quiet,
                on_usage_error=partial(self.on_usage_error, command=command),
            ).run_from_argv(self.prog_name, argv[1:], command=argv[0])
        except self.UsageError as exc:
            self.on_usage_error(exc)
            return exc.status
        except self.Error as exc:
            self.on_error(exc)
            return exc.status

Here cls is the beat class. Looking at the beat class in bin/beat.py, it only overrides the run and add_arguments methods.

So the run_from_argv executed here is the one beat inherits from Command:

# Python 实用宝典
# https://pythondict.com

def run_from_argv(self, prog_name, argv=None, command=None):
    return self.handle_argv(prog_name, sys.argv if argv is None else argv, command)

This method calls Command's handle_argv, which processes the arguments and then calls self(*args, **options), i.e. the __call__ method:

    # Python 实用宝典
    # https://pythondict.com
    
    def handle_argv(self, prog_name, argv, command=None):
        """Parse command-line arguments from ``argv`` and dispatch
        to :meth:`run`.

        :param prog_name: The program name (``argv[0]``).
        :param argv: Command arguments.

        Exits with an error message if :attr:`supports_args` is disabled
        and ``argv`` contains positional arguments.

        """
        options, args = self.prepare_args(
            *self.parse_options(prog_name, argv, command))
        return self(*args, **options)

Command's __call__ method:

    # Python 实用宝典
    # https://pythondict.com
    
    def __call__(self, *args, **kwargs):
        random.seed()  # maybe we were forked.
        self.verify_args(args)
        try:
            ret = self.run(*args, **kwargs)
            return ret if ret is not None else EX_OK
        except self.UsageError as exc:
            self.on_usage_error(exc)
            return exc.status
        except self.Error as exc:
            self.on_error(exc)
            return exc.status

This method calls run, and the run being called is the one overridden in the beat class. Let's look at it:
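Stripped of option parsing and error handling, the chain from run_from_argv to the overridden run is the template method pattern. The toy classes below mirror the method names from the excerpts, but they are a simplified sketch, not Celery's code: the argument parsing (only --key=value options) is a hypothetical simplification.

```python
import sys

EX_OK = 0  # same meaning as os.EX_OK on POSIX


class Command:
    """Toy base class: run_from_argv -> handle_argv -> __call__ -> run."""

    def run_from_argv(self, prog_name, argv=None, command=None):
        return self.handle_argv(prog_name, sys.argv if argv is None else argv, command)

    def handle_argv(self, prog_name, argv, command=None):
        # Hypothetical parsing: "--key=value" becomes a keyword option, the rest are args.
        options = dict(a[2:].split("=", 1) for a in argv if a.startswith("--") and "=" in a)
        args = [a for a in argv if not a.startswith("--")]
        return self(*args, **options)

    def __call__(self, *args, **kwargs):
        ret = self.run(*args, **kwargs)  # dispatches to the subclass's run()
        return ret if ret is not None else EX_OK

    def run(self, *args, **kwargs):
        raise NotImplementedError


class beat(Command):  # lowercase name mirrors celery.bin.beat.beat
    """Only run() is overridden; everything else is inherited."""

    def run(self, detach=False, **kwargs):
        return f"beat started (detach={detach}, options={kwargs})"


print(beat().run_from_argv("celery", ["--loglevel=info"], command="beat"))
```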

# Python 实用宝典
# https://pythondict.com
    
class beat(Command):
    """Start the beat periodic task scheduler.

    Examples::

        celery beat -l info
        celery beat -s /var/run/celery/beat-schedule --detach
        celery beat -S djcelery.schedulers.DatabaseScheduler

    """
    doc = __doc__
    enable_config_from_cmdline = True
    supports_args = False

    def run(self, detach=False, logfile=None, pidfile=None, uid=None,
            gid=None, umask=None, working_directory=None, **kwargs):
        # Whether to run in the background (detached)
        if not detach:
            maybe_drop_privileges(uid=uid, gid=gid)
        workdir = working_directory
        kwargs.pop('app', None)
        # Build a partial function with logfile/pidfile pre-filled
        beat = partial(self.app.Beat,
                       logfile=logfile, pidfile=pidfile, **kwargs)

        if detach:
            with detached(logfile, pidfile, uid, gid, umask, workdir):
                return beat().run() # run in the background
        else:
            return beat().run() # run in the foreground

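This uses partial functions: a partial function is a new function created from a base function with some arguments pre-filled. For details, see Liao Xuefeng's introduction:
https://www.liaoxuefeng.com/wiki/1016959663602400/1017454145929440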
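A quick standalone illustration of functools.partial in the same shape as beat.run(); make_beat is a hypothetical stand-in for app.Beat that simply returns the settings it receives.

```python
from functools import partial


def make_beat(logfile=None, pidfile=None, **kwargs):
    """Hypothetical stand-in for app.Beat(...): returns the settings it received."""
    return {"logfile": logfile, "pidfile": pidfile, **kwargs}


# Pre-fill logfile and pidfile, as beat.run() does with partial(self.app.Beat, ...)
beat = partial(make_beat, logfile="beat.log", pidfile="beat.pid")

print(beat())                # {'logfile': 'beat.log', 'pidfile': 'beat.pid'}
print(beat(max_interval=5))  # extra keyword arguments are added on top
```

Keyword arguments passed at call time can also override the pre-filled ones, e.g. beat(logfile="other.log").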

So run() creates a partial of app.Beat; calling it creates the Beat instance, and .run() starts the beat process. First, let's look at Beat:

    # Python 实用宝典
    # https://pythondict.com
    @cached_property
    def Beat(self, **kwargs):
        # Imports the celery.apps.beat:Beat class
        return self.subclass_with_self('celery.apps.beat:Beat')

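So app.Beat gives us (an app-bound subclass of) the Beat class from celery.apps.beat; beat() instantiates it, and then the instance's run method is called: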

    # Python 实用宝典
    # https://pythondict.com
    def run(self):
        print(str(self.colored.cyan(
            'celery beat v{0} is starting.'.format(VERSION_BANNER))))
        # Initialize the loader
        self.init_loader()
        # Set the process title
        self.set_process_title()
        # Start task scheduling
        self.start_scheduler()

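init_loader imports the default modules, which brings in the periodic task definitions; that isn't the focus of this article. Let's focus on how start_scheduler starts task scheduling: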

    # Python 实用宝典
    # https://pythondict.com
    def start_scheduler(self):
        c = self.colored
        if self.pidfile: # pid file configured?
            platforms.create_pidlock(self.pidfile)  # create the pid file
        # Initialize the Service
        beat = self.Service(app=self.app,
                            max_interval=self.max_interval,
                            scheduler_cls=self.scheduler_cls,
                            schedule_filename=self.schedule)
        
        # Print the startup banner
        print(str(c.blue('__    ', c.magenta('-'),
                  c.blue('    ... __   '), c.magenta('-'),
                  c.blue('        _\n'),
                  c.reset(self.startup_info(beat)))))
        # Set up logging
        self.setup_logging()
        if self.socket_timeout:
            logger.debug('Setting default socket timeout to %r',
                         self.socket_timeout)
            # Set the default socket timeout
            socket.setdefaulttimeout(self.socket_timeout)
        try:
            # Install the sync signal handler
            self.install_sync_handler(beat)
            # Start beat
            beat.start()
        except Exception as exc:
            logger.critical('beat raised exception %s: %r',
                            exc.__class__, exc,
                            exc_info=True)

Now let's see how beat starts:

    # Python 实用宝典
    # https://pythondict.com
    def start(self, embedded_process=False, drift=-0.010):
        info('beat: Starting...')
        # Log the maximum interval
        debug('beat: Ticking with max interval->%s',
              humanize_seconds(self.scheduler.max_interval))
        
        # Notify receivers connected to the beat_init signal
        signals.beat_init.send(sender=self)
        if embedded_process:
            signals.beat_embedded_init.send(sender=self)
            platforms.set_process_title('celery beat')

        try:
            while not self._is_shutdown.is_set():
                # scheduler.tick() returns how long to wait until the next check
                interval = self.scheduler.tick()
                interval = interval + drift if interval else interval
                # If there is time left to wait
                if interval and interval > 0:
                    debug('beat: Waking up %s.',
                          humanize_seconds(interval, prefix='in '))
                    # Sleep
                    time.sleep(interval)
                    if self.scheduler.should_sync():
                        self.scheduler._do_sync()
        except (KeyboardInterrupt, SystemExit):
            self._is_shutdown.set()
        finally:
            self.sync()

The key part is the self.scheduler.tick() method:

    # Python 实用宝典
    # https://pythondict.com
    def tick(self):
        """Run a tick, that is one iteration of the scheduler.

        Executes all due tasks.

        """
        remaining_times = []
        try:
            # Iterate over every periodic task entry
            for entry in values(self.schedule):
                # Seconds until this entry should be checked again
                next_time_to_run = self.maybe_due(entry, self.publisher)
                if next_time_to_run:
                    remaining_times.append(next_time_to_run)
        except RuntimeError:
            pass

        return min(remaining_times + [self.max_interval])

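Here self.schedule provides all the periodic tasks (for the default scheduler, they are stored in the celerybeat-schedule file written with shelve). tick() iterates over every task and calls self.maybe_due on each one: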

    # Python 实用宝典
    # https://pythondict.com
    def maybe_due(self, entry, publisher=None):
        # Is it time to run?
        is_due, next_time_to_run = entry.is_due()

        if is_due:
            # Log that the task is being sent
            info('Scheduler: Sending due task %s (%s)', entry.name, entry.task)
            try:
                # Send the task to the broker for a worker to execute
                result = self.apply_async(entry, publisher=publisher)
            except Exception as exc:
                error('Message Error: %s\n%s',
                      exc, traceback.format_stack(), exc_info=True)
            else:
                debug('%s sent. id->%s', entry.task, result.id)
        return next_time_to_run

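Here the scheduler checks whether the entry is due. If it is, apply_async sends the task to the broker so that a worker can execute it. Either way, maybe_due returns the number of seconds until the entry's next run; tick() takes the smallest of these values (capped at max_interval), and the Beat loop sleeps for that long, which keeps the process from wasting resources while it waits.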
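The whole loop (tick, send whatever is due, sleep until the nearest next run) fits in a few lines. The sketch below is a simplified, stdlib-only model of that logic, not Celery code: Entry, ToyScheduler and run are hypothetical names, entries use a fixed interval in seconds, drift is ignored, and "sending" a task just records its name where Beat would call apply_async. The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical schedule entry that runs every `interval` seconds."""
    name: str
    interval: float
    last_run: float = float("-inf")  # never run yet, so it is due immediately

    def is_due(self, now):
        """Return (is_due, seconds_until_next_check), like ScheduleEntry.is_due()."""
        remaining = self.last_run + self.interval - now
        if remaining <= 0:
            return True, self.interval
        return False, remaining


@dataclass
class ToyScheduler:
    entries: list
    max_interval: float = 300.0
    sent: list = field(default_factory=list)

    def maybe_due(self, entry, now):
        is_due, next_time_to_run = entry.is_due(now)
        if is_due:
            self.sent.append(entry.name)  # real Beat calls apply_async() here
            entry.last_run = now
        return next_time_to_run

    def tick(self, now):
        remaining = [self.maybe_due(entry, now) for entry in self.entries]
        return min(remaining + [self.max_interval])


def run(scheduler, ticks, clock=time.monotonic, sleep=time.sleep):
    """Beat-style loop: tick, then sleep for the interval tick() returned."""
    for _ in range(ticks):
        sleep(scheduler.tick(clock()))
```

Usage with a fake clock: starting at t=100 with a 3-second task "a" and a 5-second task "b", six ticks send a, b, a, b, a, a, b and advance the clock to 112.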

That covers how Celery Beat detects and dispatches periodic tasks. Any questions? Feel free to leave a comment below.


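Python Basics, Q&A, Solutions

What does the Python yield keyword do? A detailed answer

June 22, 2021 · Python实用宝典

What does the yield keyword do? To understand what yield does, you need to understand what generators are. And to understand generators, you first need to understand iterables.

1. What is an iterable?

When you create a list, you can read its items one by one. Reading its items one by one is called iteration: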

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

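Everything you can use "for... in..." on is an iterable: lists, strings, and so on.

These iterables are handy because you can read them as often as you like, but they store all their values in memory, which is not always what you want when you have a lot of values.

2. What is a generator?

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory; they generate the values on the fly: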

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

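A generator expression uses () instead of []. But you can only iterate over a generator once: a second for i in mygenerator produces nothing, because the generator computes 0 (0*0), forgets it, computes 1 (1*1), and finishes after computing 4 (2*2), one value at a time.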
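A two-line check makes the "only once" rule visible: the first pass consumes every value, and a second pass over the same generator finds nothing left.

```python
mygenerator = (x * x for x in range(3))
first_pass = list(mygenerator)   # consumes every value
second_pass = list(mygenerator)  # the generator is already exhausted
print(first_pass, second_pass)   # [0, 1, 4] []
```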

3. The key part: what is yield, and what does it do?

yield is a keyword used like return, except that the function returns a generator.

>>> def create_generator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

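This example is not useful in itself, but yield is handy when you know your function will return a huge set of values that you only need to read once.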

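To master yield, you must understand that when you call the function, the code you wrote in the function body does not run. The function only returns the generator object. Your code then runs, resuming from where it left off, each time for asks the generator for a value.

Now the hard part:

The first time for calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then returns the first value of the loop. Each subsequent call runs another iteration of the loop you wrote in the function and returns the next value. This continues until the generator is considered empty, which happens when the function body finishes without hitting another yield.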
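Recording each step in a list makes this visible: calling the function runs nothing, each next() runs up to the following yield, and the code after the last yield runs only when the generator is exhausted.

```python
log = []


def create_generator():
    log.append("start")                 # runs on the first next(), not at call time
    for i in range(3):
        log.append(f"before yield {i}")
        yield i * i
        log.append(f"after yield {i}")  # execution resumes here on the next next()
    log.append("end")                   # reached right before StopIteration


gen = create_generator()
print(log)        # []  (calling the function ran none of its body)
print(next(gen))  # 0
print(log)        # ['start', 'before yield 0']
print(next(gen))  # 1
print(list(gen))  # [4]  (the remaining values)
print(log[-1])    # end
```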

4. An example of controlling generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(next(corner_street_atm))
$100
>>> print(next(corner_street_atm))
$100
>>> print([next(corner_street_atm) for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(next(wall_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: the example above uses Python 3. In Python 2, call corner_street_atm.next() instead of next(corner_street_atm), and write print cash.

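This can be useful for various things, such as controlling access to a resource.

5. itertools, your best friend

The itertools module contains special functions for manipulating iterables. Ever wanted to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map/Zip without creating another list?

Then just import itertools.

An example? Let's look at the possible finishing orders for a four-horse race: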

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
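The other wishes from the list above each take one line with standard itertools functions (tee to duplicate, chain to concatenate, groupby to group), and in Python 3 map and zip are already lazy:

```python
import itertools

squares = (x * x for x in range(4))
copy_a, copy_b = itertools.tee(squares)  # two independent iterators; stop using `squares`
copies = (list(copy_a), list(copy_b))
print(copies)                            # ([0, 1, 4, 9], [0, 1, 4, 9])

joined = list(itertools.chain(range(2), "ab"))  # concatenate iterables lazily
print(joined)                                   # [0, 1, 'a', 'b']

pairs = [("fruit", "apple"), ("fruit", "pear"), ("veg", "kale")]  # already sorted by key
grouped = {key: [name for _, name in group]
           for key, group in itertools.groupby(pairs, key=lambda p: p[0])}
print(grouped)                           # {'fruit': ['apple', 'pear'], 'veg': ['kale']}

lazy_pairs = zip(itertools.count(1), "xyz")  # no list is built; values come on demand
print(next(lazy_pairs))                      # (1, 'x')
```

Note that groupby only groups consecutive items, so sort the data by the same key first if it isn't already ordered.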

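6. Understanding the inner mechanics of iteration

Iteration is a process involving iterables (which implement the __iter__() method) and iterators (which implement the __next__() method).

Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.

This article has more on how for loops work.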
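A small hand-written pair of classes shows the two roles: Countdown is an iterable whose __iter__ returns a fresh iterator, and CountdownIterator implements __next__, raising StopIteration when it runs out. The while loop at the end is roughly what a for loop does behind the scenes. The class names are just for this illustration.

```python
class Countdown:
    """Iterable: __iter__ returns a new iterator each time, so it can be reused."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: __next__ produces values until StopIteration; __iter__ returns self."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


# Roughly what `for n in Countdown(3): print(n)` does under the hood:
it = iter(Countdown(3))
while True:
    try:
        n = next(it)
    except StopIteration:
        break
    print(n)  # 3, then 2, then 1
```

A generator function gives you the same behavior with far less code, which is exactly why yield is so convenient.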


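Performance Optimization, Solutions, Design Patterns

Automated monitoring of abnormal Python function execution times: a hands-on tutorial

June 1, 2021 · Python实用宝典

Source: 文呓

This article covers visual performance analysis of Python code, optimizing the logic, and dynamically computing safe thresholds with different models, so that the execution time of each function and of the whole program is monitored and alerted on automatically.

When analyzing and optimizing Python performance, you can use cProfile to generate a profile data file, read the detailed time distribution with pstats, and generate a function call graph with the gprof2dot script for visual analysis, which makes performance analysis more efficient.

Then, guided by the time distribution, start with the functions that take the biggest share, analyze their implementation, and optimize step by step. At the same time, save the average per-function times from pstats as sample data for the automated anomaly monitoring later.

Automated timing monitoring has to adjust its safe thresholds dynamically with an algorithm rather than rely on thresholds fixed by hand; only then can anomaly monitoring sustain itself and recalibrate iteratively.

I. Collecting function timing data and generating visual reports

1. Saving profile data (cProfile)

First, save the profile data. cProfile and profile provide deterministic profiling of Python programs; a profile is a set of statistics describing how often and for how long the various parts of the program executed. Call enable when the program starts to begin collecting data, and at the end call dump_stats to stop collecting and save the data to a file at the given path.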

import cProfile
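# Turn on data collection when the program starts
pr = cProfile.Profile()
pr.enable()

# ... the code being profiled runs here ...

# When the program finishes, dump the profile data to a file; profliePath is the absolute path of the output file
pr.dump_stats(profliePath)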
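For a complete round trip you can run as-is, here is a small sketch that profiles a toy workload, dumps the stats to a temporary file, and reads the top entries back with pstats. slow_sum, profile_to_file and top_functions are hypothetical names for this illustration; the (cc, nc, tt, ct, callers) tuple layout is the one pstats.Stats stores per function.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n):
    """Toy workload to profile."""
    return sum(i * i for i in range(n))


def profile_to_file(path):
    pr = cProfile.Profile()
    pr.enable()          # start collecting
    slow_sum(100_000)
    pr.disable()         # stop collecting
    pr.dump_stats(path)  # binary stats file readable by pstats and gprof2dot


def top_functions(path, limit=5):
    stats = pstats.Stats(path)
    stats.sort_stats("tottime")  # sort_stats() also fills stats.fcn_list
    rows = []
    for func in stats.fcn_list[:limit]:
        cc, nc, tt, ct, _callers = stats.stats[func]
        rows.append((func[2], nc, tt, ct))  # func is (filename, lineno, name)
    return rows


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.stats")
    profile_to_file(path)
    for name, ncalls, tottime, cumtime in top_functions(path):
        print(f"{name:40} {ncalls:8} {tottime:.4f} {cumtime:.4f}")
```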

2. Reading the detailed profile data

Once the data is saved to a file, you can read it back with pstats; the pstats module formats profile statistics into reports.

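import pstats
# Load the profile data
pS = pstats.Stats(profliePath)
# Sort by each function's own time (tottime, excluding sub-calls)
pS.sort_stats('tottime')
# Print information for every function
pS.print_stats()

Example print_stats() output: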
79837 function calls (75565 primitive calls) in 37.311 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 2050    30.167    0.015   30.167    0.015  {time.sleep}
   16     6.579    0.411    6.579    0.411  {select.select}
    1     0.142    0.142    0.142    0.142  {method 'do_handshake' of '_ssl._SSLSocket' objects}
  434     0.074    0.000    0.074    0.000  {method 'read' of '_ssl._SSLSocket' objects}
    1     0.062    0.062    0.062    0.062  {method 'connect' of '_socket.socket' objects}
   37     0.046    0.001    0.046    0.001  {posix.read}
   14     0.024    0.002    0.024    0.002  {posix.fork}

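Output fields:

ncalls: number of calls (a single number means there was no recursion; with two numbers separated by a slash, the first is the total number of calls and the second is the number of primitive, i.e. non-recursive, calls)
tottime: total time spent in the function itself, excluding time spent in calls to sub-functions
percall: average time per call, tottime/ncalls
cumtime: cumulative time spent in the function, including calls to sub-functions
percall: cumtime divided by the number of primitive calls
filename:lineno(function): the file name, line number, and name of the function (or the built-in function)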

To get at the individual fields shown by print_stats(), you can do the following:

# func: (filename, lineno, function name), shown as filename:lineno(function)
# cc: primitive (non-recursive) call count
# nc: total call count, i.e. ncalls
# tt: tottime
# ct: cumtime
# callers: dict mapping each caller func to (nc, cc, tt, ct) for calls made from that caller
for func in pS.fcn_list:  # fcn_list is filled in by sort_stats()
    cc, nc, tt, ct, callers = pS.stats[func]
    print(cc, nc, tt, ct, func, callers)
    for caller, (caller_nc, caller_cc, caller_tt, caller_ct) in callers.items():
        print(caller, caller_nc, caller_cc, caller_tt, caller_ct)

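II. Generating the function call graph (gprof2dot)

The gprof2dot script converts the output of profilers such as gprof or callgrind (and, as used below, Python's pstats files) into a program call graph described in the DOT language, which Graphviz then renders as an image. This makes the whole program's call stack easy to see, including each function's class and line number, its share of the total time, how many times it recursed, and how many times it was called.

First download the gprof2dot.py script from GitHub and put it in the same directory as the script you are running. You also need Graphviz, which can be installed with brew; if the installation fails, follow the error messages to install whatever tools are missing:

brew install graphviz

The call graph can be generated with logic like the following; adapt it to your needs.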

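import os
import subprocess
import sys

# Path of the gprof2dot.py script sitting next to this file
gprof2dotPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'gprof2dot.py')
# Output path for the call graph: a PNG next to the stats file, with the same base name
dumpProfPath = os.path.splitext(profliePath)[0] + '.png'
dumpCmd = '"%s" "%s" -f pstats "%s" | dot -Tpng -o "%s"' % (
    sys.executable, gprof2dotPath, profliePath, dumpProfPath)
# shell=True is needed for the pipe; check=True raises if the command fails
subprocess.run(dumpCmd, shell=True, check=True)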

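III. Performance analysis and optimization in practice

Once the call graph is generated, the calling relationships between functions are easy to see. Each box shows the function's class and line number, its percentage of the total time, and its call count; if the function is recursive, a looping arrow on the edge of the box shows the number of recursive calls.

Find the parts of the graph that take the most time, analyze how those functions are implemented, and pin down what makes them slow. Optimization strategies:

Remove unnecessary logic: delete redundant code.
Optimize recursive functions: log the arguments of each recursive call; if many of them repeat, add a cache to avoid redundant recursion.
Consolidate repeated calls: if a function calls the same sub-function several times to fetch parameters, check whether those calls can be merged to avoid redundant calls.
Use the context to decide whether the test program really needs to be initialized, and skip resetting the test environment when it doesn't.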

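IV. Automated monitoring of abnormal execution times

Setting the safe timing threshold to the historical average plus a fixed tolerance percentage has serious problems: function execution time is easily affected by the device and runtime environment, a manually fixed tolerance is hard to maintain, and the data can't feed back into the threshold. You can't hand-tune the parameters every time a false alarm occurs.

Monitoring here covers two dimensions: the average execution time of each function, and the total execution time of the whole program. Early on, store this historical timing data in a database to provide sample data for the automated anomaly monitoring.

Adjusting thresholds automatically requires a standard modeling algorithm; here we only implement automated monitoring of anomalies in the single dimension of execution time.

By their underlying principle, unsupervised anomaly detection models generally fall into these categories:

Statistical and probabilistic models: make assumptions about the data distribution and find the "anomalies" defined under those assumptions;
Linear models: use linear methods to find a suitable low-dimensional subspace in which anomalies stand apart from normal points;
Distance-based: assume anomalies lie far from normal points and separate them by comparing the distances between data points;
Density-based: when the data are unevenly distributed and absolute distances can't capture how far apart points really are, use local density to express how anomalous a point is;
Clustering-based: cluster the data; points that belong to no cluster, lie far from the nearest cluster, or sit in sparse clusters are treated as anomalies;
Tree-based: build tree models by partitioning the subspace to find anomalies.

Execution times are fluctuating one-dimensional data, so here we simply use the statistical/probabilistic approach: use the saved historical data to decide whether the data follow a normal distribution.

If they do, compute the safe threshold as μ + 3σ (the mean plus three standard deviations);

If they don't, compute the safe threshold with Tukey's box plot rule, Q3 + 1.5 × IQR.

In practice, as more samples accumulate, a function's timing data that fit a normal distribution early on may stop fitting one.

The Python code below checks whether the data are normally distributed; dataList is the array of timings. The data are treated as normal when the p-value is greater than 0.05 (that is, the Kolmogorov-Smirnov test does not reject normality at the 5% level):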

from scipy import stats
import numpy
percallMean = numpy.mean(dataList)  # mean
# percallVar = numpy.var(dataList)  # variance
percallStd = numpy.std(dataList)  # standard deviation (population, ddof=0)
kstestResult = stats.kstest(dataList, 'norm', (percallMean, percallStd))
# p-value > 0.05: the KS test does not reject normality, so treat the data as normal
if kstestResult.pvalue > 0.05:
    pass

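1. Normally distributed data

In statistics, if a distribution is approximately normal, about 68% of the values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three. So any data point more than 3 standard deviations away is very likely an anomaly or outlier. The upper safe threshold for normal data is therefore: percallMean + 3 * percallStd

2. Tukey box plot

Anomaly detection based on the normal distribution's 3σ rule or on Z-scores assumes the data are normally distributed, but real data often are not strictly normal, so these methods are of limited use on non-normal data. Tukey's box plot is a common way to describe the distribution of raw data and can also be used to identify outliers. Because it relies on the actual data rather than on a distribution assumption, it has its own advantages.

The box plot gives a criterion for identifying outliers: values below Q1 − 1.5 IQR or above Q3 + 1.5 IQR. Although this criterion is somewhat arbitrary, it comes from empirical judgment, and experience shows it works well for flagging data that need special attention.

The box plot threshold can be computed in Python as follows: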

import numpy
boxplotQ1 = numpy.percentile(dataList, 25)  # first quartile (Q1)
boxplotQ3 = numpy.percentile(dataList, 75)  # third quartile (Q3)
boxplotIQR = boxplotQ3 - boxplotQ1          # interquartile range
upperLimit = boxplotQ3 + 1.5 * boxplotIQR   # upper safe threshold

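In the program, after a program or test case finishes, first check whether the historical data follow a normal distribution. At least 20 historical samples are needed for a reasonably accurate result; with fewer than 20, just keep collecting data and skip anomaly detection. Then compute the safe threshold with either the normal-distribution model or the box plot model, and check whether each function's average time, or the test case's total time, exceeds it; if it does, raise an alert.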
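Putting these rules together, here is a stdlib-only sketch of the decision logic, written as an alternative to the numpy/scipy snippets above rather than a copy of the author's code. Assumptions: statistics.pstdev plays the role of numpy.std (population standard deviation), statistics.quantiles(..., method="inclusive") matches numpy.percentile's default linear interpolation, and the normality check is a hand-rolled Kolmogorov-Smirnov statistic compared with the asymptotic 5% critical value 1.358/√n. Because the mean and standard deviation are estimated from the same data, that check is only approximate (Lilliefors' correction would be stricter). The function names are hypothetical.

```python
import math
import statistics

MIN_SAMPLES = 20  # below this, keep collecting data and skip detection


def looks_normal(data, critical_coeff=1.358):
    """Approximate KS check of `data` against a normal fitted by mean/pstdev."""
    n = len(data)
    mu, sigma = statistics.mean(data), statistics.pstdev(data)
    if sigma == 0:
        return False
    dist = statistics.NormalDist(mu, sigma)
    d = max(max(i / n - dist.cdf(x), dist.cdf(x) - (i - 1) / n)
            for i, x in enumerate(sorted(data), start=1))
    return d <= critical_coeff / math.sqrt(n)


def upper_limit(data, normal):
    """mu + 3*sigma for normal data, otherwise Tukey's Q3 + 1.5*IQR."""
    if normal:
        return statistics.mean(data) + 3 * statistics.pstdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def is_anomalous(history, current):
    """None while history is too short, else whether `current` exceeds the threshold."""
    if len(history) < MIN_SAMPLES:
        return None
    return current > upper_limit(history, looks_normal(history))
```

Usage: is_anomalous(historical_percall_times, latest_percall_time) returns True when an alert should be raised.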

(Figure: comparison of the threshold ranges produced by the Gaussian model and the box plot)

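Below is code that aggregates and parses the stats file data, then plots the timing curve and its threshold, or the normal curve and its threshold, depending on the model. Replace the statFolder argument with the folder that contains your own stats files.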

# coding=utf-8
import os
import pstats
import traceback

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: write PNG files without a display
import matplotlib.pyplot as plt
import numpy
from scipy import stats


def dataSummary(array, fileName, fcn, percall):
    """Aggregate the per-call time of each function across all stats files."""
    (funcPath, line, func) = fcn
    for item in array:
        if item["func"] == func and item["funcPath"] == funcPath and item["line"] == line:
            item["cost"].append({
                "percall": percall,
                "fileName": fileName
            })
            return
    # First time this function is seen: add a new entry
    array.append({
        "func": func,
        "funcPath": funcPath,
        "line": line,
        "cost": [{
            "percall": percall,
            "fileName": fileName
        }]
    })


def gaussian(x, mu, delta):
    """Probability density of the normal distribution N(mu, delta^2) at x."""
    exp = numpy.exp(-numpy.power(x - mu, 2) / (2 * numpy.power(delta, 2)))
    c = 1 / (delta * numpy.sqrt(2 * numpy.pi))
    return c * exp


def readStatsFile(statFolder, filterData):
    """Read and aggregate the data from every .stats file under statFolder."""
    for path, dir_list, file_list in os.walk(statFolder):
        for fileName in file_list:
            if fileName.endswith(".stats"):
                fileAbsolutePath = os.path.join(path, fileName)
                pS = pstats.Stats(fileAbsolutePath)
                # Sort by cumulative time, largest first
                pS.sort_stats('cumtime')
                # pS.print_stats()
                # Only look at the top 100 entries (slicing also copes with files that have fewer)
                for fcn in pS.fcn_list[:100]:
                    # cc: primitive (non-recursive) call count
                    # nc: total call count; pstats prints "nc/cc" in its ncalls column
                    #     when the two differ, i.e. when the function recursed
                    # tt: tottime, time spent in the function itself, excluding sub-calls
                    # ct: cumtime, total time including sub-calls
                    cc, nc, tt, ct, callers = pS.stats[fcn]
                    if cc == 0:
                        continue
                    # Same definition as the cumtime "percall" column printed by pstats
                    percall = ct / cc
                    # Only keep functions whose single call takes at least 1 ms
                    if percall >= 0.001:
                        dataSummary(filterData, fileName, fcn, percall)


def drawGaussian(func, line, percallMean, threshold, percallList, dumpFolder):
    """Plot normal curves centred on the mean, plus the safety threshold."""
    # Create the figure first so that the title lands on this figure
    plt.figure(figsize=(10, 8))
    plt.title(func)
    # Subtract the mean to center the curves; sort so the line is drawn left to right
    gaussX = [item - percallMean for item in sorted(percallList)]
    for delta in [0.2, 0.5, 1]:
        gaussY = [gaussian(x, 0, delta) for x in gaussX]
        plt.plot(gaussX, gaussY, label='sigma={}'.format(delta))
    # Draw the threshold line, as tall as the peak of the sigma=0.2 curve
    plt.plot([threshold - percallMean, threshold - percallMean], [0, gaussian(0, 0, 0.2)],
             color='red', linestyle="-", label="Threshold:%.5f" % threshold)
    plt.xlabel("Time(s)", fontsize=12)
    plt.legend()
    plt.tight_layout()
    # Different classes may contain functions with the same name; add the line number to avoid overwriting
    imagePath = os.path.join(dumpFolder, "cost_%s_%s.png" % (func, line))
    plt.savefig(imagePath)
    plt.close()  # free the figure; otherwise figures pile up across iterations


def drawCurve(func, line, percallList, dumpFolder):
    """Plot the timing curve and the Tukey box plot threshold (Q3 + 1.5 * IQR)."""
    boxplotQ1 = numpy.percentile(percallList, 25)
    boxplotQ3 = numpy.percentile(percallList, 75)
    boxplotIQR = boxplotQ3 - boxplotQ1
    upperLimit = boxplotQ3 + 1.5 * boxplotIQR
    # Not normally distributed: plot the raw fluctuation curve
    timeArray = list(range(len(percallList)))
    plt.figure(figsize=(10, 8))
    plt.title(func)
    # Draw the threshold line
    plt.plot([0, len(percallList)], [upperLimit, upperLimit], color='red', linestyle="-",
             label="Threshold:%.5f" % upperLimit)
    plt.plot(timeArray, percallList, label="%s_%s" % (func, line))
    plt.ylabel("Time(s)", fontsize=12)
    plt.legend()
    plt.tight_layout()
    imagePath = os.path.join(dumpFolder, "cost_%s_%s.png" % (func, line))
    plt.savefig(imagePath)
    plt.close()


if __name__ == "__main__":
    try:
        statFolder = "/Users/chenwenguan/Downloads/2aab7e17-a1b6-1253/"
        chartFolder = os.path.join(statFolder, "chart")
        os.makedirs(chartFolder, exist_ok=True)
        filterData = []
        readStatsFile(statFolder, filterData)
        for dataItem in filterData:
            # In Python 3, map() returns an iterator without len(); build a list instead
            percallList = [x["percall"] for x in dataItem["cost"]]
            func = dataItem["func"]
            line = dataItem["line"]
            # Only plot when there are more than 20 samples
            if len(percallList) > 20:
                percallMean = numpy.mean(percallList)  # mean
                # percallVar = numpy.var(percallList)  # variance
                percallStd = numpy.std(percallList)  # standard deviation
                # Treat the data as normally distributed when the p-value is above 0.05
                kstestResult = stats.kstest(percallList, 'norm', (percallMean, percallStd))
                print("percallStd:%s, pvalue:%s" % (percallStd, kstestResult.pvalue))
                if kstestResult.pvalue > 0.05:
                    # Normal: draw the distribution curve with a mean + 3σ threshold
                    threshold = percallMean + 3 * percallStd
                    drawGaussian(func, line, percallMean, threshold, percallList, chartFolder)
                else:
                    drawCurve(func, line, percallList, chartFolder)
    except Exception:
        print('exception:' + traceback.format_exc())

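The curves plotted with the two timing models look like this:

Example: Gaussian curve of function execution time with threshold

Example: function execution-time curve with Tukey box plot threshold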

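That's the end of this article. If you liked today's Python tutorial, please keep following Python实用宝典.

If you have any questions, reply with the keyword "加群" ("join group") in the official account's chat, answer the verification question, and join the mutual-help group to ask.

Original content takes effort. If you'd like to support me in writing more, please tap "Like" and "Wow" below. Thank you!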


Python实用宝典 ( pythondict.com )
More than just a handbook
Follow our WeChat official account: Python实用宝典

Python Basics Tutorials, Performance Optimization, Solutions

30 Handy Python Programming Tips and Tricks

May 29, 2021 Python实用宝典

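Author | Erik-Jan van Baaren

Translator | 弯月, Editor | 屠敏

Published by | CSDN (ID: CSDNnews)

The translation follows:

This article presents 30 Python best practices, tips, and tricks. I hope they help all you hard-working programmers, and I wish you smooth sailing at work!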

1. Python Version

A reminder: as of January 1, 2020, Python 2 is no longer officially supported. Many examples in this article only run on Python 3. If you are still using Python 2.7, upgrade now.

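2. Python Programming Tip – Check for a Minimum Python Version

You can check the Python version in your code to make sure your users are not running the script on an incompatible version. Here is how:

import sys

if not sys.version_info > (2, 7):
    # berate your user for running a 10 year
    # old python version
    sys.exit("Python 2.7 or later is required")
elif not sys.version_info >= (3, 5):
    # Kindly tell your user (s)he needs to upgrade
    # because you're using 3.5 features
    sys.exit("Please upgrade to Python 3.5 or later")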

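3. Python Programming Tip – IPython

IPython is essentially an enhanced shell. Autocompletion alone makes it worth a try, but it does much more than that. It has many commands I love, for example:

%cd: change the current working directory
%edit: open an editor, and execute the code you typed once the editor is closed
%env: show the current environment variables
%pip install [pkgs]: install packages without leaving the interactive shell
%time and %timeit: measure the execution time of Python code

For the full list of commands, see here (https://ipython.readthedocs.io/en/stable/interactive/magics.html).

Another very handy feature is referring to the output of a previous command. In and Out are actual objects: you can use the output of the third command as Out[3].

Install IPython with: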

pip3 install ipython

4. Python Programming Tip – List Comprehensions

List comprehensions spare you the tedium of filling a list with a loop. The basic syntax is:

[ expression for item in list if conditional ]

A basic example: fill a list with a sequence of numbers:

mylist = [i for i in range(10)]
print(mylist)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Since you can use expressions, you can also do some arithmetic:

squares = [x**2 for x in range(10)]
print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

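You can even call an external function:

def some_function(a):
    return (a + 5) / 2

my_formula = [some_function(i) for i in range(10)]
print(my_formula)
# [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]

Finally, you can use 'if' to filter the list. In the following example, we keep only the numbers divisible by 2: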

filtered = [i for i in range(20) if i%2==0]
print(filtered)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

5. Python Programming Tip – Check an Object's Memory Usage

You can use sys.getsizeof() to check how much memory an object uses:

import sys

mylist = range(0, 10000)
print(sys.getsizeof(mylist))
# 48

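Wait, why does this huge list take up only 48 bytes?

Because range returns a range object that merely behaves like a list: it computes its numbers on demand instead of storing them. In terms of memory use, a range is far more efficient than an actual list of numbers.

Try creating a list of the same numbers with a list comprehension instead:

import sys

myreallist = [x for x in range(0, 10000)]
print(sys.getsizeof(myreallist))
# 87632 (the exact size depends on the Python version and platform)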
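A range object stores only its start, stop, and step, yet it still supports the usual sequence operations. Note also that sys.getsizeof is shallow: for a list it counts the list's array of references, not the int objects they point to. A small sketch:

```python
r = range(0, 10000)

# A range computes its elements on demand, yet still behaves like a sequence
print(len(r))     # 10000
print(r[9999])    # 9999
print(5000 in r)  # True (for ints this is checked arithmetically, not by scanning)
print(r[::2])     # range(0, 10000, 2): slicing a range gives another range
```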

6. Python Programming Tip – Return Multiple Values

Functions in Python can return more than one variable without needing a dictionary, a list, or a class. Like this:

def get_user(id):
    # fetch user from database
    # ….
    return name, birthdate

name, birthdate = get_user(4)

That's fine when there are only a few return values. But if you return more than three values, you should put them in a (data) class.

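7. Python Programming Tip – Use Data Classes

Python has provided data classes since version 3.7. Compared with regular classes or other alternatives (such as returning multiple values or a dictionary), data classes have several clear advantages:

Data classes require less code
You can compare data classes, because they provide an __eq__ method
You can easily print a data class when debugging, because it also provides a __repr__ method
Data classes require type hints, which reduces the chance of bugs

Here is an example of a data class:

from dataclasses import dataclass

@dataclass
class Card:
    rank: str
    suit: str

card = Card("Q", "hearts")

print(card == card)
# True

print(card.rank)
# Q

print(card)
# Card(rank='Q', suit='hearts')

For a detailed guide, see here (https://realpython.com/python-data-classes/).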

8. Python Programming Tip – Swap Variables

This neat little trick can save you several lines of code:

a = 1
b = 2
a, b = b, a
print (a)
# 2
print (b)
# 1

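9. Python Programming Tip – Merge Dictionaries (Python 3.5+)

Since Python 3.5, merging dictionaries is simpler:

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
merged = {**dict1, **dict2}
print(merged)
# {'a': 1, 'b': 3, 'c': 4}

If a key appears in both dictionaries, the value from the second dictionary overwrites the value from the first.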
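To make the precedence rule concrete: in {**dict1, **dict2}, later mappings win. On Python 3.9+ the | operator (PEP 584) does the same thing. A minimal sketch:

```python
import sys

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

# Later mappings win: 'b' takes its value from dict2
merged = {**dict1, **dict2}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Python 3.9+ also offers the | operator with the same precedence rule
if sys.version_info >= (3, 9):
    print(dict1 | dict2)  # {'a': 1, 'b': 3, 'c': 4}
```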

10. Python Programming Tip – Capitalize the First Letter of Each Word

This one is a cute little trick:

mystring = "10 awesome python tricks"
print(mystring.title())
# 10 Awesome Python Tricks

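11. Python Programming Tip – Split a String into a List

You can split a string into a list of strings. In the following example, we split on spaces to get the individual words:

mystring = "The quick brown fox"
mylist = mystring.split(' ')
print(mylist)
# ['The', 'quick', 'brown', 'fox']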

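12. Python Programming Tip – Build a String from a List of Strings

The reverse of the previous trick: build a string from a list of strings, with a space between the words:

mylist = ['The', 'quick', 'brown', 'fox']
mystring = " ".join(mylist)
print(mystring)
# The quick brown fox

You might ask why it isn't mylist.join(" "). Good question!

The underlying reason is that str.join() can join not just lists but any iterable of strings. Making it a method of str means the feature is implemented once, instead of separately on every iterable type.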
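A quick illustration of that point: join() accepts any iterable of strings, such as a tuple or a generator, but it will not convert other types for you.

```python
# join() accepts any iterable of strings: a tuple, a generator, ...
dashed = "-".join(("a", "b", "c"))
numbers = ", ".join(str(n) for n in range(3))
print(dashed)   # a-b-c
print(numbers)  # 0, 1, 2

# Non-string items are not converted automatically
try:
    " ".join([1, 2, 3])
except TypeError as exc:
    print("TypeError:", exc)
```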

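13. Python Programming Tip – Emoji

Some people love emoji; others can't stand them. For the record: emoji can come in really handy when analyzing social media data.

First, install the emoji module: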

pip3 install emoji

Once it is installed, you can use it like this:

import emoji
result = emoji.emojize('Python is :thumbs_up:')
print(result)
# Python is 👍

# You can also reverse this:
result = emoji.demojize('Python is 👍')
print(result)
# Python is :thumbs_up:

For more emoji examples and documentation, see here (https://pypi.org/project/emoji/).

14. Python Programming Tip – List Slicing

The basic syntax of list slicing is:

a[start:stop:step]

start, stop, and step are all optional. If you omit them, these defaults are used:

start: 0
stop: the end of the sequence
step: 1

For example:

# We can easily create a new list from
# the first two elements of a list:
first_two = [1, 2, 3, 4, 5][0:2]
print(first_two)
# [1, 2]

# And if we use a step value of 2,
# we can skip over every second number
# like this:
steps = [1, 2, 3, 4, 5][0:5:2]
print(steps)
# [1, 3, 5]

# This works on strings too. In Python,
# you can treat a string like a list of
# letters:
mystring = "abcdefdn nimt"[::2]
print(mystring)
# aced it
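A few more slicing cases, as a sketch: omitted parts take the defaults listed above, negative indices count from the end, and a[start:stop:step] is equivalent to indexing with a slice object. With a negative step the defaults flip (start from the end and run toward the beginning), which is what the next tip relies on.

```python
letters = ["a", "b", "c", "d", "e"]

# Omitted parts fall back to the defaults: start=0, stop=len(letters), step=1
print(letters[:2])   # ['a', 'b']
print(letters[2:])   # ['c', 'd', 'e']

# Negative indices count from the end
print(letters[-2:])  # ['d', 'e']

# a[start:stop:step] is shorthand for indexing with a slice object
s = slice(0, 5, 2)
print(letters[s])    # ['a', 'c', 'e']
```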

15. Python Programming Tip – Reverse Strings and Lists

You can use the slicing approach above to reverse a string or a list. Just set step to -1:

revstring = "abcdefg"[::-1]
print(revstring)
# gfedcba

revarray = [1, 2, 3, 4, 5][::-1]
print(revarray)
# [5, 4, 3, 2, 1]

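16. Python Programming Tip – Display Kittens

I finally found a good excuse to show cats in my article, ha! Of course, you can use this to display any image. First, install Pillow, a fork of the Python Imaging Library: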

pip3 install Pillow

Next, download the following image to a file named kittens.jpg:

Then you can display the image with the following Python code:

from PIL import Image

im = Image.open("kittens.jpg")
im.show()
print(im.format, im.size, im.mode)
# JPEG (1920, 1357) RGB

Pillow can do much more than display images: it can analyze, resize, filter, enhance, and morph them, and more. For the full documentation, see here (https://pillow.readthedocs.io/en/stable/).

17. Python Programming Tip – map()

Python has a built-in function called map(), with the following syntax:

map(function, something_iterable)

So you give it a function to execute and something to execute it on; any iterable will do. In the following example, I pass a list:

def upper(s):
    return s.upper()

mylist = list(map(upper, ['sentence', 'fragment']))
print(mylist)
# ['SENTENCE', 'FRAGMENT']

# Convert a string representation of
# a number into a list of ints.
list_of_ints = list(map(int, "1234567"))
print(list_of_ints)
# [1, 2, 3, 4, 5, 6, 7]

Take a close look at your own code and see whether map() could replace a loop somewhere.
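For instance, a loop that builds a list can usually be rewritten with map(). Note that in Python 3, map() returns a lazy iterator, which is why the examples wrap it in list(). A minimal comparison:

```python
words = ["sentence", "fragment"]

# Explicit loop
upper_loop = []
for w in words:
    upper_loop.append(w.upper())

# Same result with map(); the method str.upper can be passed directly
upper_map = list(map(str.upper, words))

print(upper_loop == upper_map)  # True
```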

18. Python Programming Tip – Get the Unique Elements of a List or String

By creating a set with set(), you get the unique elements of a list or list-like object:

mylist = [1, 1, 2, 3, 4, 5, 5, 5, 6, 6]
print (set(mylist))
# {1, 2, 3, 4, 5, 6}

# And since a string can be treated like a
# list of letters, you can also get the
# unique letters from a string this way:
print(set("aaabbbcccdddeeefff"))
# {'a', 'b', 'c', 'd', 'e', 'f'} (the order of a set may vary)
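One caveat: a set does not preserve the original order. If order matters, a common idiom is dict.fromkeys(), which relies on dicts keeping insertion order (guaranteed since Python 3.7):

```python
values = [3, 1, 3, 2, 1]

# A set removes duplicates but makes no promise about order
print(set(values))

# dict keys keep insertion order, so this removes duplicates
# while preserving the order of first appearance
unique_in_order = list(dict.fromkeys(values))
print(unique_in_order)  # [3, 1, 2]
```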

19. Python Programming Tip – Find the Most Frequent Value

You can find the most frequent value like this:

test = [1, 2, 3, 4, 2, 2, 3, 1, 4, 4, 4]
print(max(set(test), key = test.count))
# 4

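Can you understand the code above? Try to figure it out before reading on.

Still stuck? Here's how it works:

max() returns the largest value in a list. The key parameter takes a one-argument function that customizes how values are compared, in this case test.count. The function is applied to each item of the iterable.
test.count is a built-in list method. It takes one argument and counts how many times that argument occurs. So test.count(1) returns 2, and test.count(4) returns 4.
set(test) returns all the unique values in test, i.e. {1, 2, 3, 4}.

So this one line does the following: it first gets all the unique values of test, {1, 2, 3, 4}; then max calls test.count on each of them and returns the value with the highest count.

This one-liner is not my own invention.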
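Unrolled into an explicit loop, the one-liner does roughly the following (the function name `most_frequent` is illustrative):

```python
def most_frequent(items):
    """Return the value that occurs most often, like max(set(items), key=items.count)."""
    best, best_count = None, -1
    for value in set(items):        # each distinct value once
        count = items.count(value)  # how many times it occurs in the list
        if count > best_count:
            best, best_count = value, count
    return best


test = [1, 2, 3, 4, 2, 2, 3, 1, 4, 4, 4]
print(most_frequent(test))  # 4
```

Because list.count scans the whole list for each distinct value, this approach is quadratic in the worst case; for large lists, Counter (tip 25 below) does the counting in a single pass.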

20. Python Programming Tip – Create a Progress Bar

You could write your own progress bar, which sounds like fun. But it's simpler to use the progress package:

pip3 install progress

Then you can easily create a progress bar:

from progress.bar import Bar

bar = Bar('Processing', max=20)
for i in range(20):
    # Do some work
    bar.next()
bar.finish()

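21. Python Programming Tip – Use _ (the Underscore) in the Interactive Shell

The underscore holds the result of the last expression. For example, in IPython you can do this:

In [1]: 3 * 3
Out[1]: 9

In [2]: _ + 3
Out[2]: 12

This works in the standard Python shell too. In the IPython shell, you can also get the value of expression In[n] through Out[n]; in the example above, Out[1] returns the number 9.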

22. Python Programming Tip – Quickly Start a Web Server

You can quickly start a web server that serves the contents of the current directory:

python3 -m http.server

Consider this when you want to share a file with a colleague or test a simple HTML site.

23. Python Programming Tip – Multi-line Strings

You can wrap multi-line strings in triple quotes, but that isn't ideal: everything between the triple quotes becomes part of the string, including your code's indentation, as shown below.

I prefer another approach, which joins multi-line strings together while keeping the code tidy. The only drawback is that you have to put in the newlines explicitly.

s1 = """Multi line strings can be put
between triple quotes. It's not ideal
when formatting your code though"""

print(s1)
# Multi line strings can be put
# between triple quotes. It's not ideal
# when formatting your code though

s2 = ("You can also concatenate multiple\n" +
      "strings this way, but you'll have to\n"
      "explicitly put in the newlines")

print(s2)
# You can also concatenate multiple
# strings this way, but you'll have to
# explicitly put in the newlines
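Another option, not covered in the original tip, is textwrap.dedent from the standard library: you keep the triple-quoted string indented along with your code and strip the common leading whitespace afterwards. A minimal sketch (the function `describe` is just a wrapper for the example):

```python
import textwrap


def describe():
    # The string is indented along with the code; dedent() then removes
    # the leading whitespace shared by every line
    return textwrap.dedent("""\
        Multi line strings can be put
        between triple quotes and still
        be indented with the code""")


print(describe())
```

The backslash right after the opening quotes stops the string from starting with a newline.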

24. Python Programming Tip – The Ternary Operator for Conditional Assignment

This makes your code more concise while keeping it readable:

[on_true] if [expression] else [on_false]

For example:

y = 2
x = "Success!" if (y == 2) else "Failed!"

25. Python Programming Tip – Count Occurrences of Elements

You can use Counter from the collections module to get the number of occurrences of every unique element in a list. Counter returns a Counter object, which is a subclass of dict:

from collections import Counter

mylist = [1, 1, 2, 3, 4, 5, 5, 5, 6, 6]
c = Counter(mylist)
print(c)
# Counter({5: 3, 1: 2, 6: 2, 2: 1, 3: 1, 4: 1})

# And it works on strings too:
print(Counter("aaaaabbbbbccccc"))
# Counter({'a': 5, 'b': 5, 'c': 5})
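If you only need the most frequent elements, Counter.most_common() returns them sorted by count. And since a Counter is a dict subclass, missing keys simply count as zero:

```python
from collections import Counter

c = Counter([1, 1, 2, 3, 4, 5, 5, 5, 6, 6])

# most_common(n) returns the n most frequent (element, count) pairs
print(c.most_common(2))  # [(5, 3), (1, 2)]

# Looking up a key that never occurred returns 0 instead of raising KeyError
print(c[5], c[42])  # 3 0
```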

26. Python Programming Tip – Chain Comparison Operators

You can chain multiple comparison operators together in Python, which makes code more readable and concise:

x = 10

# Instead of:
if x > 5 and x < 15:
    print("Yes")
# Yes

# You can also write:
if 5 < x < 15:
    print("Yes")
# Yes
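Chaining is not just shorthand: a < b < c means a < b and b < c, but b is evaluated only once (and c is not evaluated at all if a < b is false). A small sketch that counts calls to a hypothetical helper `middle`:

```python
calls = 0


def middle():
    """Hypothetical helper: returns 10 and records how often it was called."""
    global calls
    calls += 1
    return 10


result = 5 < middle() < 15  # like 5 < m and m < 15, with m computed once
print(result)  # True
print(calls)   # 1
```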

27. Python Programming Tip – Add Color

You can set the terminal's display colors with Colorama:

from colorama import Fore, Back, Style

print(Fore.RED + 'some red text')
print(Back.GREEN + 'and with a green background')
print(Style.DIM + 'and in dim text')
print(Style.RESET_ALL)
print('back to normal now')

28. Python Programming Tip – Working with Dates

The python-dateutil module complements the standard datetime module with very powerful extensions. Install it with:

pip3 install python-dateutil

You can do a lot of neat things with this library. Here is just one example: fuzzy parsing of dates in log files:

from dateutil.parser import parse

log_line = 'INFO 2020-01-01T00:00:01 Happy new year, human.'
timestamp = parse(log_line, fuzzy=True)
print(timestamp)
# 2020-01-01 00:00:01

Just remember: when regular Python datetime functionality can't solve your problem, consider python-dateutil!

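29. Python Programming Tip – Integer Division

In Python 2, the division operator (/) performs integer division by default, unless one of the operands is a float. So you get:

# Python 2
5 / 2    # 2
5 / 2.0  # 2.5

In Python 3, the division operator (/) performs true (float) division, and integer (floor) division uses the // operator. So you write:

# Python 3
5 / 2    # 2.5
5 // 2   # 2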
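One detail worth knowing: // is floor division, so it rounds toward negative infinity rather than toward zero, which matters for negative numbers. divmod() returns the floor quotient and the remainder together:

```python
# // rounds toward negative infinity, not toward zero
print(7 // 2)    # 3
print(-7 // 2)   # -4
print(7.0 // 2)  # 3.0 (float operands give a float result)

# divmod() returns the floor quotient and the remainder in one call
print(divmod(7, 2))   # (3, 1)
print(divmod(-7, 2))  # (-4, 1)
```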

For the motivation behind this change, see PEP-0238 (https://www.python.org/dev/peps/pep-0238/).

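30. Python Programming Tip – Detect Character Encodings with chardet

You can use the chardet module to detect a file's character encoding. It is very useful when analyzing large amounts of arbitrary text. Install it with: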

pip install chardet

Once it is installed, you can use the command-line tool chardetect like this:

chardetect somefile.txt
somefile.txt: ascii with confidence 1.0

You can also use the library from your own code. The full documentation is here:

https://chardet.readthedocs.io/en/latest/usage.html

Some of these 30 examples are old chestnuts, but they really are classics, worth memorizing, practicing, and bookmarking!

