Category archive: Q&A

Converting a Python dict to kwargs?

Question: Converting a Python dict to kwargs?

I want to build a query for sunburnt (a Solr interface) using class inheritance, adding key-value pairs together as I go. The sunburnt interface takes keyword arguments. How can I transform a dict ({'type':'Event'}) into keyword arguments (type='Event')?


Answer 0

Use the double-star (aka double-splat?) operator:

func(**{'type':'Event'})

is equivalent to

func(type='Event')
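
As a minimal sketch of that equivalence, key-value pairs collected in several places can be merged into one dict and unpacked in a single call. The function search and the dict names below are hypothetical stand-ins for illustration; they are not part of sunburnt.

```python
def search(**filters):
    """Hypothetical stand-in for a query method that accepts keyword arguments."""
    return filters

base_filters = {'type': 'Event'}
extra_filters = {'status': 'open'}

# Merge the collected pairs, then unpack them as keyword arguments.
merged = dict(base_filters)
merged.update(extra_filters)

result = search(**merged)
print(result)
```

Every key in the unpacked dict must be a string, because each one becomes a parameter name.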

Answer 1

The ** operator is helpful here.

The ** operator unpacks the dict's items, so **{'type':'Event'} is treated as type='Event'.

func(**{'type':'Event'}) is the same as func(type='Event'), i.e. the dict items are converted to keyword arguments.

FYI

The * operator unpacks the elements of a list, which are then treated as positional arguments.

func(*['one', 'two']) is the same as func('one', 'two')
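
Both operators can be combined in one call. The sketch below assumes a hypothetical function describe with two positional parameters and one keyword parameter, and compares the unpacked call with the explicit one.

```python
def describe(first, second, type=None):
    """Hypothetical function taking two positional arguments and one keyword argument."""
    return (first, second, type)

args = ['one', 'two']
kwargs = {'type': 'Event'}

# * spreads the list over positional parameters, ** spreads the dict over keywords.
unpacked = describe(*args, **kwargs)
explicit = describe('one', 'two', type='Event')
print(unpacked == explicit)
```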


Answer 2

Here is a complete example showing how to use the ** operator to pass values from a dictionary as keyword arguments.

>>> def f(x=2):
...     print(x)
... 
>>> new_x = {'x': 4}
>>> f()        #    default value x=2
2
>>> f(x=3)     #   explicit value x=3
3
>>> f(**new_x) # dictionary value x=4 
4

Where does pip install its packages?

Question: Where does pip install its packages?

I activated a virtualenv which has pip installed. I did

pip3 install Django==1.8

and Django successfully downloaded. Now, I want to open up the Django folder. Where is the folder located? Normally it would be in “downloads” but I’m not sure where it would be if I installed it using pip in a virtualenv.


Answer 0

pip when used with virtualenv will generally install packages in the path <virtualenv_name>/lib/<python_ver>/site-packages.

For example, I created a test virtualenv named venv_test with Python 2.7, and the django folder is in venv_test/lib/python2.7/site-packages/django.
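
The same path can also be queried from inside the interpreter. This is a minimal sketch using the standard sysconfig module; run it with the virtualenv's own Python so that it reports that environment's directories.

```python
import sys
import sysconfig

# Directory where pure-Python packages are installed for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]

# In a virtualenv, sys.prefix points at the environment's root directory.
print("environment root:", sys.prefix)
print("site-packages:   ", site_packages)
```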


Answer 1

By popular demand, an option provided via posted answer:

pip show <package name> will provide the location for Windows and macOS, and I’m guessing any system. :)

For example:

> pip show cvxopt
Name: cvxopt
Version: 1.2.0
...
Location: /usr/local/lib/python2.7/site-packages

Answer 2

pip list -v can be used to list packages’ install locations, introduced in https://pip.pypa.io/en/stable/news/#b1-2018-03-31

Show install locations when list command ran with “-v” option. (#979)

>pip list -v
Package                  Version   Location                                                             Installer
------------------------ --------- -------------------------------------------------------------------- ---------
alabaster                0.7.12    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
apipkg                   1.5       c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
argcomplete              1.10.3    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
astroid                  2.3.3     c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
...

Update: This feature was introduced in pip 10.0.0b1. On Ubuntu 18.04, pip or pip3 installed with sudo apt install python-pip or sudo apt install python3-pip is version 9.0.1, which doesn't have this feature. Check https://github.com/pypa/pip/issues/5599 for suitable ways of upgrading pip or pip3.


Answer 3

By default, on Linux, pip installs packages to /usr/local/lib/python2.7/dist-packages.

Using virtualenv or --user during install will change this default location. If you use pip show, make sure you are running it as the right user, or else pip may not see the packages you are referring to.


Answer 4

In a Python interpreter or script, you can do

import site
site.getsitepackages() # list of global package locations

and

site.getusersitepackages() #string for user-specific package location

to get the locations where third-party packages (those not in the core Python distribution) are installed.

On my Brew-installed Python on macOS, the former outputs

['/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages'],

which canonicalizes to the same path output by pip show, as mentioned in a previous answer:

$ readlink -f /usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
/usr/local/lib/python3.7/site-packages

Reference: https://docs.python.org/3/library/site.html#site.getsitepackages
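
To locate one specific importable module rather than the whole site-packages directory, the standard importlib machinery can be asked where it would load the module from. This is a sketch only: the helper name module_location is hypothetical, and json is used simply because it ships with Python.

```python
import importlib.util

def module_location(name):
    """Return the file path a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

print(module_location("json"))
```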


How to format a floating number to fixed width in Python

Question: How to format a floating number to fixed width in Python

How do I format a floating number to a fixed width with the following requirements:

  1. Leading zero if n < 1
  2. Add trailing decimal zero(s) to fill up fixed width
  3. Truncate decimal digits past fixed width
  4. Align all decimal points

For example:

formatter = '{:06}'  # placeholder: something like this
numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print(formatter.format(number))

The output would be like

  23.2300
   0.1233
   1.0000
   4.2230
9887.2000

Answer 0

for x in numbers:
    print("{:10.4f}".format(x))

prints

   23.2300
    0.1233
    1.0000
    4.2230
 9887.2000

The format specifier inside the curly braces follows the Python format string syntax. Specifically, in this case, it consists of the following parts:

  • The empty string before the colon means "take the next provided argument to format()", in this case x, the only argument.
  • The 10.4f part after the colon is the format specification.
  • The f denotes fixed-point notation.
  • The 10 is the total width of the field being printed, left-padded with spaces.
  • The 4 is the number of digits after the decimal point.
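
The parts listed above can also be supplied from variables by nesting replacement fields inside the format specification. In this short sketch the names width and precision are illustrative.

```python
numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

width = 10      # total field width, including the decimal point
precision = 4   # digits after the decimal point

# The nested {w} and {p} fields are filled in first, giving the spec "10.4f".
rows = ["{:{w}.{p}f}".format(x, w=width, p=precision) for x in numbers]
for row in rows:
    print(row)
```

Every row comes out exactly width characters wide, which is what keeps the decimal points aligned.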

Answer 1

It has been a few years since this was answered, but as of Python 3.6 (PEP 498) you can use the new f-strings:

numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print(f'{number:9.4f}')

Prints:

  23.2300
   0.1233
   1.0000
   4.2230
9887.2000

Answer 2

In Python 3 the following works:

>>> v=10.4
>>> print('% 6.2f' % v)
 10.40
>>> print('% 12.1f' % v)
        10.4
>>> print('%012.1f' % v)
0000000010.4

Answer 3

See Python 3.x format string syntax:

IDLE 3.5.1   
numbers = ['23.23', '.1233', '1', '4.223', '9887.2']

for x in numbers:  
    print('{0: >#016.4f}'.format(float(x)))

     23.2300
      0.1233
      1.0000
      4.2230
   9887.2000

Answer 4

You can also left-pad with zeros. For example, if you want number to be 9 characters long, left-padded with zeros, use:

print('{:09.3f}'.format(number))

Thus, if number = 4.656, the output is: 00004.656

For your example the output will look like this:

numbers  = [23.2300, 0.1233, 1.0000, 4.2230, 9887.2000]
for x in numbers: 
    print('{:010.4f}'.format(x))

prints:

00023.2300
00000.1233
00001.0000
00004.2230
09887.2000

One example where this may be useful is when you want filenames to list properly in alphabetical order. I noticed that on some Linux systems, the numbers are ordered as: 1, 10, 11, ... 2, 20, 21, ...

Thus, if you want to enforce the necessary numeric order in filenames, you need to left-pad with the appropriate number of zeros.
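
A small sketch of the filename case, assuming hypothetical names of the form fileN.txt: sorting the plain names as strings interleaves the numbers, while zero-padded names sort in numeric order.

```python
indices = [1, 2, 10, 11, 20, 21]

plain = sorted("file{}.txt".format(i) for i in indices)
padded = sorted("file{:03d}.txt".format(i) for i in indices)

print(plain)   # string order interleaves 1, 10, 11, 2, 20, 21
print(padded)  # zero padding keeps numeric order
```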


Answer 5

In Python 3.

GPA = 2.5
print(" %6.1f " % GPA)

6.1f means that one digit is shown after the decimal point. To print two digits after the point, use %6.2f; likewise, %6.3f prints three digits after the point.


Python 3 ImportError: No module named 'ConfigParser'

Question: Python 3 ImportError: No module named 'ConfigParser'

I am trying to pip install the MySQL-python package, but I get an ImportError.

Jans-MacBook-Pro:~ jan$ /Library/Frameworks/Python.framework/Versions/3.3/bin/pip-3.3 install MySQL-python
Downloading/unpacking MySQL-python
  Running setup.py egg_info for package MySQL-python
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>
        from setup_posix import get_config
      File "./setup_posix.py", line 2, in <module>
        from ConfigParser import SafeConfigParser
    ImportError: No module named 'ConfigParser'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>

    from setup_posix import get_config

  File "./setup_posix.py", line 2, in <module>

    from ConfigParser import SafeConfigParser

ImportError: No module named 'ConfigParser'

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python
Storing complete log in /Users/jan/.pip/pip.log
Jans-MacBook-Pro:~ jan$ 

Any ideas?


Answer 0

In Python 3, ConfigParser has been renamed to configparser for PEP 8 compliance. It looks like the package you are installing does not support Python 3.
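
In code you control, the rename can be absorbed with a fallback import. This minimal sketch does not fix a third-party package such as MySQL-python, whose own setup script performs the old import.

```python
try:
    import configparser                    # Python 3 module name
except ImportError:
    import ConfigParser as configparser    # Python 2 module name

# The class inside the module is spelled ConfigParser under both names.
parser = configparser.ConfigParser()
print(type(parser).__name__)
```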


Answer 1

You can instead use the mysqlclient package as a drop-in replacement for MySQL-python. It is a fork of MySQL-python with added support for Python 3.

I had luck with simply

pip install mysqlclient

in my python3.4 virtualenv after

sudo apt-get install python3-dev libmysqlclient-dev

which is obviously specific to ubuntu/debian, but I just wanted to share my success :)


Answer 2

Here is code that should work in both Python 2.x and 3.x.

Obviously you will need the six module, but it's almost impossible to write modules that work in both versions without six.

try:
    import configparser
except ImportError:
    from six.moves import configparser

Answer 3

pip install configparser
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py

Then try to install MySQL-python again. That worked for me.


Answer 4

MySQL-python is not supported on Python 3; you can use mysqlclient instead.

If you are on Fedora/CentOS/Red Hat, install the following packages:

  1. yum install python3-devel
  2. pip install mysqlclient

Answer 5

If you are using CentOS, then you need to use

  1. yum install python34-devel.x86_64
  2. yum groupinstall -y 'development tools'
  3. pip3 install mysql-connector
  4. pip install mysqlclient

Answer 6

Python 2/3 compatibility for configparser can be solved simply with the six library:

from six.moves import configparser

Answer 7

Do pip3 install PyMySQL and then pip3 install mysqlclient. That worked for me.


Answer 8

I was having the same problem. It turned out that I needed to install the python3 devel package on my CentOS machine. First, search for the package that is compatible with your system.

yum search python3 | grep devel

Then, install the package as:

yum install -y python3-devel.x86_64

Then, install mysqlclient from pip

pip install mysqlclient

Answer 9

I got further with Valeres' answer:

pip install configparser
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py

Then try to install the MYSQL-python again. That worked for me.

I would suggest linking the file instead of copying it. That way it is safe to update. I linked the file into the /usr/lib/python3/ directory.


Answer 10

Try this solution which worked fine for me.

Basically, reinstall or upgrade to the latest version of mysql from brew, and then install mysqlclient or MySQL-Python with the global pip3 instead of the virtualenv pip3.

Then enter the virtualenv and install mysqlclient or MySQL-Python there; this time it succeeds.


Answer 11

How about checking the version of Python you are using first?

import six
if six.PY2:
    import ConfigParser as configparser
else:
    import configparser

Answer 12

I run Kali Linux (rolling) and came across this problem when I tried running cupp.py in the terminal after updating to Python 3.6.0. After some research and trial and error, I found that changing ConfigParser to configparser worked for me, but then I came across another issue:

config = configparser.configparser()
AttributeError: module 'configparser' has no attribute 'configparser'

After a bit more research I realised that in Python 3 the module ConfigParser was renamed to configparser, but the class inside it is still named ConfigParser, so the call must be configparser.ConfigParser().
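
A minimal Python 3 sketch of the corrected spelling, with a lowercase module and a CamelCase class. The [database] section and its values are made up for illustration.

```python
import configparser  # lowercase module name in Python 3

config = configparser.ConfigParser()  # the class keeps its CamelCase name
config.read_string("""
[database]
host = localhost
port = 3306
""")

host = config["database"]["host"]
port = config.getint("database", "port")
print(host, port)
```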


Answer 13

I was getting the same error on Mac OS 10, Python 3.7.6 and Django 2.2.7. I want to use this opportunity to share what worked for me after trying out numerous solutions.

Steps

  1. Install Connector/Python 8.0.20 for Mac OS from link

  2. Copy the current dependencies into a requirements.txt file, then deactivate and delete the current virtual env:

    create the file if it does not already exist: touch requirements.txt

    copy the dependencies to the file: python -m pip freeze > requirements.txt

    deactivate and delete the current virtual env: deactivate && rm -rf <virtual-env-name>

  3. Create another virtual env and activate it: python -m venv <virtual-env-name> && source <virtual-env-name>/bin/activate

  4. Install the previous dependencies: python -m pip install -r requirements.txt


Answer 14

Check what /usr/bin/python points to.

If it points to python3 or higher, change it to python2.7.

This should solve the issue.

I was getting install errors for all Python packages. Abe Karplus's solution and discussion gave me the hint as to what the problem could be. Then I recalled that I had manually changed /usr/bin/python from python2.7 to /usr/bin/python3.5, which was actually causing the issue. Once I reverted that change, it was solved.


Answer 15

This worked for me

cp /usr/local/lib/python3.5/configparser.py /usr/local/lib/python3.5/ConfigParser.py

Python: defaultdict of defaultdict?

Question: Python: defaultdict of defaultdict?

Is there a way to have a defaultdict(defaultdict(int)) in order to make the following code work?

for x in stuff:
    d[x.a][x.b] += x.c_int

d needs to be built ad-hoc, depending on x.a and x.b elements.

I could use:

for x in stuff:
    d[x.a,x.b] += x.c_int

but then I wouldn’t be able to use:

d.keys()
d[x.a].keys()

Answer 0

Yes, like this:

defaultdict(lambda: defaultdict(int))

The argument of a defaultdict (in this case lambda: defaultdict(int)) is called when you try to access a key that doesn't exist. Its return value is set as the new value of that key, which means that in our case the value of d[Key_doesnt_exist] will be defaultdict(int).

If you try to access a key from this last defaultdict, i.e. d[Key_doesnt_exist][Key_doesnt_exist], it will return 0, which is the return value of the argument of the last defaultdict, i.e. int().
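
Applied to the loop from the question, this looks roughly as follows. The Item record type and the sample data are hypothetical stand-ins for the elements of stuff.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type standing in for the elements of `stuff`.
Item = namedtuple("Item", ["a", "b", "c_int"])
stuff = [Item("x", "p", 1), Item("x", "q", 2), Item("y", "p", 3), Item("x", "p", 4)]

d = defaultdict(lambda: defaultdict(int))
for x in stuff:
    d[x.a][x.b] += x.c_int   # both levels are created on first access

print(sorted(d.keys()))
print(sorted(d["x"].keys()))
print(d["x"]["p"])
```

Both d.keys() and d[x.a].keys() remain available, which was the requirement that ruled out tuple keys in the question.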


Answer 1

The parameter to the defaultdict constructor is the function that will be called to build new elements. So let's use a lambda!

>>> from collections import defaultdict
>>> d = defaultdict(lambda: defaultdict(int))
>>> print(d[0])
defaultdict(<class 'int'>, {})
>>> print(d[0]["x"])
0

Since Python 2.7, there’s an even better solution using Counter:

>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})

Some bonus features

>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]

For more information see PyMOTW – Collections – Container data types and Python Documentation – collections


Answer 2

I find it slightly more elegant to use partial:

import functools
from collections import defaultdict

dd_int = functools.partial(defaultdict, int)
defaultdict(dd_int)

Of course, this is the same as a lambda.


Answer 3

For reference, it’s possible to implement a generic nested defaultdict factory method through:

from collections import defaultdict
from functools import partial
from itertools import repeat


def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()

The depth defines the number of nested dictionaries created before the type defined in default_factory is used. For example:

my_dict = nested_defaultdict(list, 3)
my_dict['a']['b']['c'].append('e')

Answer 4

Previous answers have addressed how to make a two-level or n-level defaultdict. In some cases you want an infinitely nested one:

def ddict():
    return defaultdict(ddict)

Usage:

>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})
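
Because defaultdict is a dict subclass, such a tree can be handed directly to code that expects plain dicts. As a small sketch, it is serialized here with the standard json module; the key names are made up.

```python
import json
from collections import defaultdict

def ddict():
    return defaultdict(ddict)

tree = ddict()
tree["settings"]["display"]["width"] = 80
tree["settings"]["theme"] = "dark"

# defaultdict is a dict subclass, so json can serialize it directly.
text = json.dumps(tree, sort_keys=True)
print(text)
```

Keep in mind that merely reading a missing key creates it, so membership tests with in are the safer way to probe such a tree.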

Answer 5

Others have correctly answered your question of how to get the following to work:

for x in stuff:
    d[x.a][x.b] += x.c_int

An alternative would be to use tuples for keys:

d = defaultdict(int)
for x in stuff:
    d[x.a,x.b] += x.c_int
    # ^^^^^^^ tuple key

The nice thing about this approach is that it is simple and can be easily expanded. If you need a mapping three levels deep, just use a three item tuple for the key.
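
The trade-off raised in the question is that d[x.a].keys() no longer exists with tuple keys. As a sketch under made-up sample data, the second-level keys can still be recovered by filtering the tuples.

```python
from collections import defaultdict

# Hypothetical (a, b, c_int) records standing in for the elements of `stuff`.
records = [("x", "p", 1), ("x", "q", 2), ("y", "p", 3)]

d = defaultdict(int)
for a, b, c_int in records:
    d[a, b] += c_int

# Second-level keys for one first-level key, recovered by filtering the tuples.
keys_under_x = sorted(b for (a, b) in d if a == "x")
print(keys_under_x)
```

This filter scans every key, so it suits small mappings or occasional lookups better than hot loops.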


Adding a Method to an Existing Object Instance

Question: Adding a Method to an Existing Object Instance

I’ve read that it is possible to add a method to an existing object (i.e., not in the class definition) in Python.

I understand that it’s not always good to do so. But how might one do this?


Answer 0

In Python, there is a difference between functions and bound methods.

>>> def foo():
...     print "foo"
...
>>> class A:
...     def bar( self ):
...         print "bar"
...
>>> a = A()
>>> foo
<function foo at 0x00A98D70>
>>> a.bar
<bound method A.bar of <__main__.A instance at 0x00A9BC88>>
>>>

Bound methods have been “bound” (how descriptive) to an instance, and that instance will be passed as the first argument whenever the method is called.

Callables that are attributes of a class (as opposed to an instance) are still unbound, though, so you can modify the class definition whenever you want:

>>> def fooFighters( self ):
...     print "fooFighters"
...
>>> A.fooFighters = fooFighters
>>> a2 = A()
>>> a2.fooFighters
<bound method A.fooFighters of <__main__.A instance at 0x00A9BEB8>>
>>> a2.fooFighters()
fooFighters

Previously defined instances are updated as well (as long as they haven’t overridden the attribute themselves):

>>> a.fooFighters()
fooFighters

The problem comes when you want to attach a method to a single instance:

>>> def barFighters( self ):
...     print "barFighters"
...
>>> a.barFighters = barFighters
>>> a.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: barFighters() takes exactly 1 argument (0 given)

The function is not automatically bound when it’s attached directly to an instance:

>>> a.barFighters
<function barFighters at 0x00A98EF0>

To bind it, we can use the MethodType function in the types module:

>>> import types
>>> a.barFighters = types.MethodType( barFighters, a )
>>> a.barFighters
<bound method ?.barFighters of <__main__.A instance at 0x00A9BC88>>
>>> a.barFighters()
barFighters

This time other instances of the class have not been affected:

>>> a2.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: A instance has no attribute 'barFighters'
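
The transcripts above come from Python 2. A minimal Python 3 sketch of the same per-instance binding follows; the functions return strings instead of printing them, purely so the result is easy to check.

```python
import types

class A:
    def bar(self):
        return "bar"

def bar_fighters(self):
    return "barFighters"

a = A()
a2 = A()

# Bind the plain function to one instance only.
a.bar_fighters = types.MethodType(bar_fighters, a)

print(a.bar_fighters())
print(hasattr(a2, "bar_fighters"))
```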

More information can be found by reading about descriptors and metaclass programming.


Answer 1

The new module has been deprecated since Python 2.6 and was removed in 3.0; use types instead.

See http://docs.python.org/library/new.html

In the example below I've deliberately omitted a return value from the patch_me() function. Returning a value might suggest that the patch produces a new object, which is not true: it modifies the one passed in. This may encourage a more disciplined use of monkey-patching.

import types

class A(object):  # also seems to work for old-style (Python 2) classes
    pass

def patch_me(target):
    def method(target, x):
        print("x=", x)
        print("called from", target)
    # Bind the function to the target, modifying it in place (nothing is returned)
    target.method = types.MethodType(method, target)
    # add more if needed

a = A()
print(a)
#out: <__main__.A object at 0x2b73ac88bfd0>
patch_me(a)    # patch instance
a.method(5)
#out: x= 5
#out: called from <__main__.A object at 0x2b73ac88bfd0>
patch_me(A)
A.method(6)        # can patch class too
#out: x= 6
#out: called from <class '__main__.A'>

Answer 2

Preface – a note on compatibility: other answers may only work in Python 2 – this answer should work perfectly well in Python 2 and 3. If writing Python 3 only, you might leave out explicitly inheriting from object, but otherwise the code should remain the same.

Adding a Method to an Existing Object Instance

I’ve read that it is possible to add a method to an existing object (e.g. not in the class definition) in Python.

I understand that it’s not always a good decision to do so. But, how might one do this?

Yes, it is possible – But not recommended

I don’t recommend this. This is a bad idea. Don’t do it.

Here are a few reasons:

  • You’ll add a bound object to every instance you do this to. If you do this a lot, you’ll probably waste a lot of memory. Bound methods are typically only created for the short duration of their call, and they then cease to exist when automatically garbage collected. If you do this manually, you’ll have a name binding referencing the bound method – which will prevent its garbage collection on usage.
  • Object instances of a given type generally share the same methods. If you add methods elsewhere, some instances will have those methods and others will not. Programmers will not expect this, and you risk violating the rule of least surprise.
  • Since there are other really good reasons not to do this, you’ll additionally give yourself a poor reputation if you do it.

Thus, I suggest that you not do this unless you have a really good reason. It is far better to define the correct method in the class definition or less preferably to monkey-patch the class directly, like this:

Foo.sample_method = sample_method

Since it’s instructive, however, I’m going to show you some ways of doing this.

How it can be done

Here’s some setup code. We need a class definition. It could be imported, but it really doesn’t matter.

class Foo(object):
    '''An empty class to demonstrate adding a method to an instance'''

Create an instance:

foo = Foo()

Create a method to add to it:

def sample_method(self, bar, baz):
    print(bar + baz)

Method nought (0) – use the descriptor method, __get__

Dotted lookups on functions call the __get__ method of the function with the instance, binding the object to the method and thus creating a “bound method.”

foo.sample_method = sample_method.__get__(foo)

and now:

>>> foo.sample_method(1,2)
3
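As a small sketch of what `__get__` returns, assuming Python 3: the bound method object keeps a reference to both the instance (`__self__`) and the original function (`__func__`). Here `sample_method` returns the sum instead of printing it so the result can be inspected; that change, and the names `bound`, `same_instance`, `same_function` and `total`, are illustrative only.

```python
class Foo:
    """Empty class for the demonstration."""


def sample_method(self, bar, baz):
    return bar + baz


foo = Foo()

# Functions are non-data descriptors: __get__(instance, owner)
# returns a method object bound to `instance`.
bound = sample_method.__get__(foo, Foo)

# The bound method remembers the instance and the underlying function.
same_instance = bound.__self__ is foo
same_function = bound.__func__ is sample_method

# Storing it on the instance makes it callable like a regular method.
foo.sample_method = bound
total = foo.sample_method(1, 2)
```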

Method one – types.MethodType

First, import types, from which we’ll get the method constructor:

import types

Now we add the method to the instance. To do this, we require the MethodType constructor from the types module (which we imported above).

In Python 3, types.MethodType takes (function, instance); Python 2 also accepts an optional third argument, the class. The two-argument form works in both:

foo.sample_method = types.MethodType(sample_method, foo)

and usage:

>>> foo.sample_method(1,2)
3

Method two: lexical binding

First, we create a wrapper function that binds the method to the instance:

def bind(instance, method):
    def binding_scope_fn(*args, **kwargs): 
        return method(instance, *args, **kwargs)
    return binding_scope_fn

usage:

>>> foo.sample_method = bind(foo, sample_method)    
>>> foo.sample_method(1,2)
3

Method three: functools.partial

A partial function applies the first argument(s) to a function (and optionally keyword arguments), and can later be called with the remaining arguments (and overriding keyword arguments). Thus:

>>> from functools import partial
>>> foo.sample_method = partial(sample_method, foo)
>>> foo.sample_method(1,2)
3    

This makes sense when you consider that bound methods are partial functions of the instance.
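To make the comparison concrete, here is a hedged sketch (assuming Python 3) that builds both a `functools.partial` object and a real bound method from the same function. Both end up calling `sample_method(foo, 1, 2)`; they differ only in where the instance is stored. `sample_method` returns its result here purely so it can be checked, and the variable names are illustrative.

```python
import types
from functools import partial


class Foo:
    """Empty class for the demonstration."""


def sample_method(self, bar, baz):
    return bar + baz


foo = Foo()

via_partial = partial(sample_method, foo)          # instance pre-filled as first argument
via_method = types.MethodType(sample_method, foo)  # true bound method

partial_result = via_partial(1, 2)
method_result = via_method(1, 2)

# A partial keeps the pre-filled arguments in .args and the callable in .func;
# a bound method keeps the instance in .__self__ and the function in .__func__.
partial_args = via_partial.args
partial_func = via_partial.func
```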

Unbound function as an object attribute – why this doesn’t work:

If we try to add the sample_method in the same way as we might add it to the class, it is unbound from the instance, and doesn’t take the implicit self as the first argument.

>>> foo.sample_method = sample_method
>>> foo.sample_method(1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sample_method() takes exactly 3 arguments (2 given)

We can make the unbound function work by explicitly passing the instance (or anything, since this method doesn’t actually use the self argument variable), but it would not be consistent with the expected signature of other instances (if we’re monkey-patching this instance):

>>> foo.sample_method(foo, 1, 2)
3

Conclusion

You now know several ways you could do this, but in all seriousness – don’t do this.


Answer 3

I think that the above answers missed the key point.

Let’s have a class with a method:

class A(object):
    def m(self):
        pass

Now, let’s play with it in ipython:

In [2]: A.m
Out[2]: <unbound method A.m>

Ok, so m() somehow becomes an unbound method of A. But is it really like that?

In [5]: A.__dict__['m']
Out[5]: <function m at 0xa66b8b4>

It turns out that m() is just a function, a reference to which is added to the class dictionary of A; there's no magic. Then why does A.m give us an unbound method? Because the dot is not translated into a simple dictionary lookup. It is effectively a call to A.__class__.__getattribute__(A, 'm'):

In [11]: class MetaA(type):
   ....:     def __getattribute__(self, attr_name):
   ....:         print str(self), '-', attr_name

In [12]: class A(object):
   ....:     __metaclass__ = MetaA

In [23]: A.m
<class '__main__.A'> - m
<class '__main__.A'> - m

Now, I'm not sure off the top of my head why the last line is printed twice, but it's still clear what's going on there.

Now, what the default __getattribute__ does is that it checks if the attribute is a so-called descriptor or not, i.e. if it implements a special __get__ method. If it implements that method, then what is returned is the result of calling that __get__ method. Going back to the first version of our A class, this is what we have:

In [28]: A.__dict__['m'].__get__(None, A)
Out[28]: <unbound method A.m>

And because Python functions implement the descriptor protocol, if they are called on behalf of an object, they bind themselves to that object in their __get__ method.
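The IPython output in this answer is from Python 2. As a sketch under the assumption of Python 3, where the separate "unbound method" type no longer exists, the same descriptor mechanics can be observed like this (the variable names are illustrative):

```python
class A:
    def m(self):
        return "m called"


a = A()

raw = A.__dict__["m"]        # the plain function stored in the class dict
via_class = A.m              # Python 3: lookup on the class returns the function itself
via_instance = a.m           # lookup on an instance returns a bound method
manual = raw.__get__(a, A)   # what attribute lookup does behind the scenes

# Bound methods compare equal when they wrap the same function and instance.
lookups_match = via_instance == manual
```

In Python 3, `A.m` is therefore the very same object as `A.__dict__['m']`, while `a.m` is the result of calling the function's `__get__` with the instance.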

OK, so how do you add a method to an existing object? Assuming you don't mind patching the class, it's as simple as:

B.m = m

Then B.m “becomes” an unbound method, thanks to the descriptor magic.

And if you want to add a method just to a single object, then you have to emulate the machinery yourself, by using types.MethodType:

b.m = types.MethodType(m, b)

By the way:

In [2]: A.m
Out[2]: <unbound method A.m>

In [59]: type(A.m)
Out[59]: <type 'instancemethod'>

In [60]: type(b.m)
Out[60]: <type 'instancemethod'>

In [61]: types.MethodType
Out[61]: <type 'instancemethod'>

Answer 4

In Python, monkey patching generally works by overwriting a class's or function's signature with your own. Below is an example from the Zope Wiki:

from SomeOtherProduct.SomeModule import SomeClass
def speak(self):
   return "ook ook eee eee eee!"
SomeClass.speak = speak

That code will overwrite/create a method called speak on the class. Jeff Atwood's recent post on monkey patching shows an example in C# 3.0, which is the language I currently use for work.


Answer 5

You can use lambda to bind a method to an instance:

def run(self):
    print(self._instanceString)

class A(object):
    def __init__(self):
        self._instanceString = "This is instance string"

a = A()
a.run = lambda: run(a)
a.run()

Output:

This is instance string
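One caveat worth sketching, assuming Python 3: a lambda such as `lambda: run(a)` looks up the name `a` each time it is called, so rebinding that name later changes which object the "method" acts on. Capturing the instance as a default argument avoids this. The `label` attribute and the names `first`, `obj`, `run_fixed`, `late` and `fixed` are hypothetical and not part of the original answer.

```python
class A:
    def __init__(self, label):
        self.label = label


def run(self):
    return self.label


first = A("first")
obj = first

# The lambda body refers to the *name* `obj`, resolved at call time.
first.run = lambda: run(obj)

# A default argument is evaluated once, when the lambda is created.
first.run_fixed = lambda self=first: run(self)

obj = A("second")          # rebind the name used inside the first lambda

late = first.run()         # now operates on the "second" object
fixed = first.run_fixed()  # still operates on the original instance
```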

Answer 6

There are at least two ways to attach a method to an instance without types.MethodType:

>>> class A:
...  def m(self):
...   print 'im m, invoked with: ', self

>>> a = A()
>>> a.m()
im m, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.m
<bound method A.m of <__main__.A instance at 0x973ec6c>>
>>> 
>>> def foo(firstargument):
...  print 'im foo, invoked with: ', firstargument

>>> foo
<function foo at 0x978548c>

1:

>>> a.foo = foo.__get__(a, A) # or foo.__get__(a, type(a))
>>> a.foo()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo
<bound method A.foo of <__main__.A instance at 0x973ec6c>>

2:

>>> instancemethod = type(A.m)
>>> instancemethod
<type 'instancemethod'>
>>> a.foo2 = instancemethod(foo, a, type(a))
>>> a.foo2()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo2
<bound method instance.foo of <__main__.A instance at 0x973ec6c>>

Useful links:
Data model – invoking descriptors
Descriptor HowTo Guide – invoking descriptors


Answer 7

What you’re looking for is setattr I believe. Use this to set an attribute on an object.

>>> def printme(s): print repr(s)
>>> class A: pass
>>> setattr(A,'printme',printme)
>>> a = A()
>>> a.printme() # s becomes the implicit 'self' variable
<__main__.A instance at 0xABCDEFG>

Answer 8

Since this question asked for non-Python versions, here’s JavaScript:

a.methodname = function () { console.log("Yay, a new method!") }

Answer 9

Consolidating Jason Pratt’s and the community wiki answers, with a look at the results of different methods of binding:

Note in particular that binding the function to the class (as addmethod does below) works, but self then refers to the class rather than the instance, so attribute lookups use the wrong scope.

#!/usr/bin/python -u
import types
import inspect

## dynamically adding methods to a unique instance of a class


# get a list of a class's method type attributes
def listattr(c):
    for m in [(n, v) for n, v in inspect.getmembers(c, inspect.ismethod) if isinstance(v,types.MethodType)]:
        print m[0], m[1]

# externally bind a function as a method of an instance of a class
def ADDMETHOD(c, method, name):
    c.__dict__[name] = types.MethodType(method, c)

class C():
    r = 10 # class attribute variable to test bound scope

    def __init__(self):
        pass

    #internally bind a function as a method of self's class -- note that this one has issues!
    def addmethod(self, method, name):
        self.__dict__[name] = types.MethodType( method, self.__class__ )

    # predefined function to compare with
    def f0(self, x):
        print 'f0\tx = %d\tr = %d' % ( x, self.r)

a = C() # created before modified instance
b = C() # modified instance


def f1(self, x): # bind internally
    print 'f1\tx = %d\tr = %d' % ( x, self.r )
def f2( self, x): # add to class instance's .__dict__ as method type
    print 'f2\tx = %d\tr = %d' % ( x, self.r )
def f3( self, x): # assign to class as method type
    print 'f3\tx = %d\tr = %d' % ( x, self.r )
def f4( self, x): # add to class instance's .__dict__ using a general function
    print 'f4\tx = %d\tr = %d' % ( x, self.r )


b.addmethod(f1, 'f1')
b.__dict__['f2'] = types.MethodType( f2, b)
b.f3 = types.MethodType( f3, b)
ADDMETHOD(b, f4, 'f4')


b.f0(0) # OUT: f0   x = 0   r = 10
b.f1(1) # OUT: f1   x = 1   r = 10
b.f2(2) # OUT: f2   x = 2   r = 10
b.f3(3) # OUT: f3   x = 3   r = 10
b.f4(4) # OUT: f4   x = 4   r = 10


k = 2
print 'changing b.r from {0} to {1}'.format(b.r, k)
b.r = k
print 'new b.r = {0}'.format(b.r)

b.f0(0) # OUT: f0   x = 0   r = 2
b.f1(1) # OUT: f1   x = 1   r = 10  !!!!!!!!!
b.f2(2) # OUT: f2   x = 2   r = 2
b.f3(3) # OUT: f3   x = 3   r = 2
b.f4(4) # OUT: f4   x = 4   r = 2

c = C() # created after modifying instance

# let's have a look at each instance's method type attributes
print '\nattributes of a:'
listattr(a)
# OUT:
# attributes of a:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FD88>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FD88>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FD88>>

print '\nattributes of b:'
listattr(b)
# OUT:
# attributes of b:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FE08>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FE08>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FE08>>
# f1 <bound method ?.f1 of <class __main__.C at 0x000000000237AB28>>
# f2 <bound method ?.f2 of <__main__.C instance at 0x000000000230FE08>>
# f3 <bound method ?.f3 of <__main__.C instance at 0x000000000230FE08>>
# f4 <bound method ?.f4 of <__main__.C instance at 0x000000000230FE08>>

print '\nattributes of c:'
listattr(c)
# OUT:
# attributes of c:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002313108>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002313108>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002313108>>

Personally, I prefer the external ADDMETHOD function route, as it allows me to dynamically assign new method names within an iterator as well.

def y(self, x):
    pass
d = C()
for i in range(1,5):
    ADDMETHOD(d, y, 'f%d' % i)
print '\nattributes of d:'
listattr(d)
# OUT:
# attributes of d:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002303508>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002303508>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002303508>>
# f1 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f2 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f3 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f4 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
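The `r = 10` result for f1 above comes from binding the function to the class rather than to the instance. A reduced sketch of just that effect, assuming Python 3 and using the hypothetical names `show_r`, `on_instance` and `on_class`:

```python
import types


class C:
    r = 10  # class attribute


def show_r(self):
    return self.r


b = C()

# Bound to the instance: `self` is b.
b.on_instance = types.MethodType(show_r, b)

# Bound to the class: `self` is C itself, not b.
b.on_class = types.MethodType(show_r, C)

b.r = 2  # instance attribute now shadows the class attribute

from_instance = b.on_instance()  # reads b.r
from_class = b.on_class()        # reads C.r, ignoring the instance value
```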

Answer 10

This is actually an add-on to Jason Pratt's answer.

Although Jason's answer works, it only works if you want to add a function to a class. It did not work for me when I tried to reload an already existing method from the .py source code file.

It took me ages to find a workaround, but the trick seems simple:

  1. Import the code from the source code file.
  2. Force a reload.
  3. Use types.FunctionType(…) to convert the imported, bound method to a function. You can also pass on the current global variables, as the reloaded method would otherwise be in a different namespace.
  4. Continue as suggested by Jason Pratt, using types.MethodType(…).

Example:

# this class resides inside ReloadCodeDemo.py
class A:
    def bar( self ):
        print "bar1"

    def reloadCode(self, methodName):
        ''' use this function to reload any function of class A'''
        import types
        import ReloadCodeDemo as ReloadMod # import the code as module
        reload (ReloadMod) # force a reload of the module
        myM = getattr(ReloadMod.A,methodName) #get reloaded Method
        myTempFunc = types.FunctionType(# convert the method to a simple function
                                myM.im_func.func_code, #the methods code
                                globals(), # globals to use
                                argdefs=myM.im_func.func_defaults # default values for variables if any
                                ) 
        myNewM = types.MethodType(myTempFunc,self,self.__class__) #convert the function to a method
        setattr(self,methodName,myNewM) # add the method to the instance

if __name__ == '__main__':
    a = A()
    a.bar()
    # now change your code and save the file
    a.reloadCode('bar') # reloads the file
    a.bar() # now executes the reloaded code

Answer 11

If it can be of any help, I recently released a Python library named Gorilla to make the process of monkey patching more convenient.

Using a function needle() to patch a module named guineapig goes as follows:

import gorilla
import guineapig
@gorilla.patch(guineapig)
def needle():
    print("awesome")

But it also takes care of more interesting use cases as shown in the FAQ from the documentation.

The code is available on GitHub.


Answer 12

This question was opened years ago, but hey, there’s an easy way to simulate the binding of a function to a class instance using decorators:

def binder (function, instance):
  copy_of_function = type (function) (function.func_code, {})
  copy_of_function.__bind_to__ = instance
  def bound_function (*args, **kwargs):
    return copy_of_function (copy_of_function.__bind_to__, *args, **kwargs)
  return bound_function


class SupaClass (object):
  def __init__ (self):
    self.supaAttribute = 42


def new_method (self):
  print self.supaAttribute


supaInstance = SupaClass ()
supaInstance.supMethod = binder (new_method, supaInstance)

otherInstance = SupaClass ()
otherInstance.supaAttribute = 72
otherInstance.supMethod = binder (new_method, otherInstance)

otherInstance.supMethod ()
supaInstance.supMethod ()

Here, when you pass the function and the instance to the binder function, it creates a new function with the same code object as the first one. The given instance of the class is then stored in an attribute of the newly created function. binder returns a (third) function that automatically calls the copied function, passing the instance as the first parameter.

In conclusion, you get a function that simulates being bound to the class instance, while the original function is left unchanged.


Answer 13

What Jason Pratt posted is correct.

>>> class Test(object):
...   def a(self):
...     pass
... 
>>> def b(self):
...   pass
... 
>>> Test.b = b
>>> type(b)
<type 'function'>
>>> type(Test.a)
<type 'instancemethod'>
>>> type(Test.b)
<type 'instancemethod'>

As you can see, Python doesn’t consider b() any different than a(). In Python all methods are just variables that happen to be functions.


Answer 14

I find it strange that nobody mentioned that all of the methods listed above create a reference cycle between the added method and the instance, so the object persists until the garbage collector runs. There is an old trick that adds a descriptor by extending the class of the object:

def addmethod(obj, name, func):
    klass = obj.__class__
    subclass = type(klass.__name__, (klass,), {})
    setattr(subclass, name, func)
    obj.__class__ = subclass
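A short usage sketch of this subclassing trick, assuming Python 3 and an ordinary class without `__slots__`. `Point`, `double`, `p` and `q` are hypothetical names introduced for the illustration; `addmethod` is repeated so the snippet runs on its own.

```python
def addmethod(obj, name, func):
    klass = obj.__class__
    # Create a one-off subclass with the same name and attach the function to it.
    subclass = type(klass.__name__, (klass,), {})
    setattr(subclass, name, func)
    # Re-point this single object at the new subclass.
    obj.__class__ = subclass


class Point:
    def __init__(self, x):
        self.x = x


def double(self):
    return self.x * 2


p = Point(3)
q = Point(4)
addmethod(p, "double", double)

doubled = p.double()                  # found on p's private subclass
still_point = isinstance(p, Point)    # the subclass still derives from Point
q_has_method = hasattr(q, "double")   # other instances are untouched
stored_on_instance = "double" in vars(p)
```

Because the function lives on the per-object subclass and not in the instance dictionary, the instance holds no reference to a bound method of itself.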

Answer 15

from types import MethodType

def method(self):
   print('hi!')


# targetObj is assumed to be an existing instance
setattr( targetObj, method.__name__, MethodType(method, targetObj) )

With this, the method can use self.


Best way to strip punctuation from a string

Question: Best way to strip punctuation from a string

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?


Answer 0

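From an efficiency perspective, in Python 2 you're not going to beat:

s.translate(None, string.punctuation)

For Python 3, use the following instead:

s.translate(str.maketrans('', '', string.punctuation))

This performs raw string operations in C with a lookup table; little will beat that short of writing your own C code.

If speed isn't a worry, another option is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

This is faster than calling s.replace for each character, but it won't perform as well as non-pure-Python approaches such as regexes or string.translate, as the timings below show. For this type of problem, doing the work at as low a level as possible pays off.

Timing code: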

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

得到以下结果:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

From an efficiency perspective, you’re not going to beat

s.translate(None, string.punctuation)

For Python 3, use the following code:

s.translate(str.maketrans('', '', string.punctuation))

It’s performing raw string operations in C with a lookup table – there’s not much that will beat that but writing your own C code.
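
As a small runnable check of the Python 3 form, the sketch below builds the translation table once and applies it to the sample string from the question. The variable names are just for the example.

```python
import string

s = "string. With. Punctuation?"

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans('', '', string.punctuation)
out = s.translate(table)
print(out)
```

Building the table once and reusing it keeps the per-string work down to the single translate call.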

If speed isn’t a worry, another option though is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

This is faster than s.replace with each char, but won’t perform as well as non-pure python approaches such as regexes or string.translate, as you can see from the below timings. For this type of problem, doing it at as low a level as possible pays off.

Timing code:

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

Answer 1

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
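
One detail worth keeping in mind with this pattern: \w also matches the underscore, so underscores survive. The sketch below shows that on a made-up sample string, together with one possible variant that removes underscores as well.

```python
import re

s = "snake_case, kebab-case & more!"

# Everything that is neither a word character nor whitespace is dropped.
out = re.sub(r'[^\w\s]', '', s)

# Variant: treat the underscore as punctuation too.
out_no_underscore = re.sub(r'[^\w\s]|_', '', s)

print(repr(out))
print(repr(out_no_underscore))
```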

Answer 2

For convenience, here is a summary of how to strip punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for detailed explanations.


Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation

Answer 3

myString.translate(None, string.punctuation)

Answer 4

I usually use something like this:

>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
...     s= s.replace(c,"")
...
>>> s
'string With Punctuation'

Answer 5

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

You can generalize and strip other types of characters as well:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$ which may or may not be “punctuation” depending on one’s point of view.


Answer 6

Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

Answer 7

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

To remove (some?) punctuation then, use:

import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian’s answer (Python 3 version):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))
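
To see the larger table in use, here is a minimal Python 3 sketch. The sample text is invented for the example, and the range is extended by one so that the last code point is covered too. Building the table scans every code point, so it is best done once.

```python
import sys
import unicodedata

# Map every code point whose Unicode category starts with 'P' to None.
remove_punct_map = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith('P'))

text = "«Hello» — world…"
cleaned = text.translate(remove_punct_map)
print(repr(cleaned))
```

The two spaces around the removed dash are left in place; whitespace is not touched by this table.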

Answer 8

string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

import regex
s = u"string. With. Some・Really Weird、Non?ASCII。 「(Punctuation)」?"
remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()

Personally, I believe this is the best way to remove punctuation from a string in Python because:

  • It removes all Unicode punctuation
  • It’s easily modifiable, e.g. you can drop the \p{S} from the pattern if you want to remove punctuation but keep symbols like $.
  • You can get really specific about what you want to keep and what you want to remove, for example \p{Pd} will only remove dashes.
  • This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.


Answer 9

I haven’t seen this answer yet. Just use a regex; it removes every character that is not a word character (\w), a digit (\d), or a whitespace character (\s):

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(ur'[^\w\d\s]+', '', s)

Answer 10

Here’s a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

Answer 11

This might not be the best solution, but this is how I did it.

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])

Answer 12

Here is a function I wrote. It’s not very efficient, but it is simple and you can add or remove any punctuation that you desire:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    # One pass over the whole list per punctuation mark is enough.
    for punc in puncList:
        wordList = [word.replace(punc, '') for word in wordList]
    return wordList

Answer 13

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)

Answer 14

Just as an update, I rewrote the @Brian example in Python 3 and changed it to move the regex compile step inside the function. My thought here was to time every single step needed to make the function work. Perhaps you are using distributed computing, can’t share a regex object between your workers, and need a re.compile step at each worker. Also, I was curious to time two different implementations of maketrans for Python 3:

table = str.maketrans({key: None for key in string.punctuation})

vs

table = str.maketrans('', '', string.punctuation)

I also added another set-based method, which uses the intersection function to reduce the number of iterations.

This is the complete code:

import re, string, timeit

s = "string. With. Punctuation"


def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)


def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())


def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)


def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)


def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return(s.translate(table))


def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s


print("sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2      :",timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :",timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

These are my results:

sets      : 3.1830138750374317
sets2      : 2.189873124472797
regex     : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace   : 4.579746678471565

Answer 15

>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s+', s)
['string', 'With', 'Punctuation']

Answer 16

Here’s a solution without regex.

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then
  • Replaces the punctuations with spaces
  • Replace multiple spaces in between words with a single space
  • Remove the trailing spaces, if any with strip()
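
The code above is written for Python 2 (string.maketrans and the print statement). A Python 3 sketch of the same three steps, using str.maketrans instead, could look like this:

```python
import string

input_text = "!where??and!!or$$then:)"

# Map each punctuation character to a space (both arguments have equal length).
punctuation_replacer = str.maketrans(string.punctuation,
                                     ' ' * len(string.punctuation))

# split() with no argument collapses runs of whitespace and drops the ends.
cleaned = ' '.join(input_text.translate(punctuation_replacer).split())
print(cleaned)
```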

Answer 17

A one-liner might be helpful in not very strict cases:

''.join([c for c in s if c.isalnum() or c.isspace()])

Answer 18

#FIRST METHOD
#Storing all punctuations in a variable    
punctuation='!?,.:;"\')(_-'
newstring='' #Creating empty string
word=raw_input("Enter string: ")
for i in word:
    if i not in punctuation:
        newstring += i
print "The string without punctuation is",newstring

#SECOND METHOD
word=raw_input("Enter string: ")
punctuation='!?,.:;"\')(_-'
newstring=word.translate(None,punctuation)
print "The string without punctuation is",newstring


#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage

Answer 19

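# Assumes a text file named 'one.txt' exists in the working directory.
with open('one.txt', 'r') as myFile:
    str1 = myFile.read()

print(str1)

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Replace each punctuation mark with a space.
for i in punctuation:
    str1 = str1.replace(i, " ")

print(str1)

# Split into words; split() with no argument also drops empty strings.
myList = str1.split()
for i in myList:
    print(i)
    print("____________")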

Answer 20

Why does none of you use this?

 ''.join(filter(str.isalnum, s)) 

Too slow?
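
One thing to be aware of with this approach is that str.isalnum is false for whitespace, so spaces are removed along with the punctuation. The sketch below compares it with a variant that keeps whitespace; the variable names are only for the example.

```python
s = "string. With. Punctuation?"

# Keeps letters and digits only, so the spaces disappear as well.
only_alnum = ''.join(filter(str.isalnum, s))

# Keeps letters, digits and whitespace.
keep_spaces = ''.join(c for c in s if c.isalnum() or c.isspace())

print(only_alnum)
print(keep_spaces)
```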


Answer 21

This takes Unicode into account. The code was checked in Python 3.

from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))

Answer 22

Remove stop words from the text file using Python

print('====THIS IS HOW TO REMOVE STOP WORDS====')

# Assumes a text file named 'one.txt' exists in the working directory.
with open('one.txt', 'r') as myFile:
    str1 = myFile.read()

    stop_words = "not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"

    myList = str1.split(" ")

    # Print every word that is not a stop word.
    for i in myList:
        if i not in stop_words:
            print("____________")
            print(i)

Answer 23

I like to use a function like this:

import string

def scrub(abc):
    # Strip leading and trailing punctuation; interior punctuation is kept.
    while abc and abc[-1] in string.punctuation:
        abc = abc[:-1]
    while abc and abc[0] in string.punctuation:
        abc = abc[1:]
    return abc
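
For trimming punctuation only at the two ends, the built-in str.strip accepts a set of characters to remove, so a shorter sketch of the same idea is possible. The sample string is made up for the example.

```python
import string

word = "...hello, world!!"

# strip() removes any of the given characters from both ends only.
trimmed = word.strip(string.punctuation)
print(trimmed)
```

The comma in the middle is kept, and an empty string is handled without any special case.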

How are iloc, ix and loc different?

Question: How are iloc, ix and loc different?

Can someone explain how these three methods of slicing are different?
I’ve seen the docs, and I’ve seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in uses is clearer?


Answer 0

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.


First, here’s a recap of the three methods:

  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

It’s important to note some subtleties that can make ix slightly tricky to use:

  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.


To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We’ll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns us the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns us the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn’t in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can’t find a 6 in the index. Because our index is of integer type ix doesn’t fall back to behaving like iloc.
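
Since ix was removed in pandas 1.0, the loc and iloc parts of the examples above are the ones that can still be run today. The following sketch, assuming NumPy and a recent pandas are installed, repeats them on the same Series and records whether slicing by the missing label 6 raises a KeyError.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]   # first three rows, whatever their labels are
by_label = s.loc[:3]       # everything up to and including label 3

# Label 6 is not in this unsorted index, so a label slice cannot resolve it.
try:
    s.loc[:6]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(list(by_position.index), len(by_label), missing_label_raises)
```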

If, however, our index was of mixed type, given an integer ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you’re only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results – try not to use ix.


Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including ‘c’ and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly – we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning “get the position of the label in this index”. Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row ‘c’ as well.
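
Another way to mix the two styles, sketched here as an alternative rather than as the method recommended by the answer, is to stay with loc and turn the column positions into labels through df.columns. It assumes NumPy and pandas are installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan,
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Positions converted to labels: rows via get_loc, then a positional slice.
by_iloc = df.iloc[:df.index.get_loc('c') + 1, :4]

# Labels throughout: the first four column labels are looked up by position.
by_loc = df.loc[:'c', df.columns[:4]]

print(by_loc.shape, list(by_loc.columns))
```

Both selections cover rows 'a' to 'c' and the first four columns; which one reads better is mostly a matter of taste.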

There are further examples in pandas’ documentation here.


Answer 1

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let’s set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the second two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

and so on. Now, it’s probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc work in almost the same way. This is why your three examples look equivalent, although a loc slice includes its endpoint, so df.loc[:5] returns six rows where df.iloc[:5] returns five. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.
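
A quick way to check how the two behave on a default integer index is the sketch below. It assumes pandas is installed; the column name value is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'value': range(10)})   # default index 0..9

label_slice = df.loc[:5]      # labels 0 through 5, endpoint included
position_slice = df.iloc[:5]  # positions 0 through 4, endpoint excluded

print(len(label_slice), len(position_slice))
```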

Also, you can do column retrieval just by using the data frame’s __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix position and named indexing, that is, indexing using names on rows and positions on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in:

df.ix[:2, 'time']    # the first two rows of the 'time' column

I think it’s also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

 b = [True, False, True]
 df.loc[b] 

Will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'

Answer 2

我认为,可接受的答案令人困惑,因为它使用仅缺少值的DataFrame。我也不喜欢术语基于位置.iloc,相反,喜欢整数位置,因为它是更描述性,正是.iloc代表。关键字是.ilocINTEGER-需要INTEGERS。

请参阅我关于子集选择的非常详细的博客系列,以了解更多信息


.ix已弃用且含糊不清,切勿使用

由于.ix已弃用,因此我们仅关注.loc和之间的差异.iloc

在讨论差异之前,重要的是要了解DataFrames具有标签,这些标签可帮助标识每个列和每个索引。让我们看一个示例DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

所有粗体字均为标签。标签,agecolorfoodheightscorestate被用于。其他标签,JaneNickAaronPenelopeDeanChristinaCornelia被用于索引


在DataFrame中选择特定行的主要方法是使用.loc.iloc索引器。这些索引器中的每一个也可以用于同时选择列,但是现在只关注行更容易。同样,每个索引器都使用紧跟其名称的一组括号进行选择。

.loc仅通过标签选择数据

我们将首先讨论.loc仅通过索引或列标签选择数据的索引器。在示例DataFrame中,我们提供了有意义的名称作为索引值。许多DataFrame都没有任何有意义的名称,而是默认为0到n-1之间的整数,其中n是DataFrame的长度。

您可以使用三种不同的输入 .loc

  • 一串
  • 字符串列表
  • 使用字符串作为起始值和终止值的切片符号

用带字符串的.loc选择单行

要选择一行数据,请将索引标签放在后面的括号内.loc

df.loc['Penelope']

这将数据行作为系列返回

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

使用.loc与字符串列表选择多行

df.loc[['Cornelia', 'Jane', 'Dean']]

这将返回一个DataFrame,其中的数据行按列表中指定的顺序进行:

使用带有切片符号的.loc选择多行

切片符号由开始,停止和步进值定义。按标签切片时,大熊猫在返回值中包含停止值。以下是从亚伦到迪恩(含)的片段。它的步长未明确定义,但默认为1。

df.loc['Aaron':'Dean']

可以采用与Python列表相同的方式获取复杂的切片。

.iloc仅按整数位置选择数据

现在转到.iloc。DataFrame中数据的每一行和每一列都有一个定义它的整数位置。这是在输出中直观显示的标签的补充。整数位置只是从0开始从顶部/左侧开始的行/列数。

您可以使用三种不同的输入 .iloc

  • 一个整数
  • 整数列表
  • 使用整数作为起始值和终止值的切片符号

用带整数的.iloc选择单行

df.iloc[4]

这将返回第5行(整数位置4)为系列

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

用.iloc选择带有整数列表的多行

df.iloc[[2, -2]]

这将返回第三行和倒数第二行的DataFrame:

使用带切片符号的.iloc选择多行

df.iloc[:5:3]


使用.loc和.iloc同时选择行和列

两者的一项出色功能.loc/.iloc是它们可以同时选择行和列。在上面的示例中,所有列都是从每个选择中返回的。我们可以选择输入类型与行相同的列。我们只需要用逗号分隔行和列选择即可。

例如,我们可以选择Jane行和Dean行,它们的高度,得分和状态如下:

df.loc[['Jane', 'Dean'], 'height':]

这对行使用标签列表,对列使用切片符号

我们自然可以.iloc只使用整数来执行类似的操作。

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

带标签和整数位置的同时选择

.ix用来与标签和整数位置同时进行选择,这很有用,但有时会造成混淆和模棱两可,值得庆幸的是,它已被弃用。如果您需要混合使用标签和整数位置进行选择,则必须同时选择标签或整数位置。

例如,如果我们要选择行Nick以及第Cornelia2列和第4列,则可以.loc通过以下方式将整数转换为标签来使用:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

或者,可以使用get_locindex方法将索引标签转换为整数。

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

布尔选择

.loc索引器还可以进行布尔选择。例如,如果我们有兴趣查找年龄在30岁以上的所有行,并仅返回foodscore列,则可以执行以下操作:

df.loc[df['age'] > 30, ['food', 'score']] 

您可以使用复制它,.iloc但是不能将其传递为布尔系列。您必须将boolean Series转换为numpy数组,如下所示:

df.iloc[(df['age'] > 30).values, [2, 4]] 

选择所有行

可以.loc/.iloc仅用于列选择。您可以使用如下冒号来选择所有行:

df.loc[:, 'color':'score':2]


索引运算符[]可以选择行和列,但不能同时选择。

大多数人都熟悉DataFrame索引运算符的主要目的,即选择列。字符串选择单个列作为系列,而字符串列表选择多个列作为DataFrame。

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

使用列表选择多个列

df[['food', 'score']]

人们所不熟悉的是,当使用切片符号时,选择是通过行标签或整数位置进行的。这非常令人困惑,我几乎从未使用过,但是确实可以使用。

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

.loc/.iloc选择行的显式性是高度首选的。单独的索引运算符无法同时选择行和列。

df[3:5, 'color']
TypeError: unhashable type: 'slice'

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead, prefer integer location as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER – .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more


.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let’s take a look at a sample DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

All the words in bold are the labels. The labels, age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina, Cornelia are used for the index.


The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead, default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list:

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the result. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined and defaults to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
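
To make the inclusive stop concrete, here is a minimal sketch on a cut-down version of the sample DataFrame (only the age column is kept, which is a simplification for illustration). It contrasts a label slice with a positional slice and also shows a step:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

by_label = df.loc["Aaron":"Dean"]      # label slice: the stop label 'Dean' is included
by_position = df.iloc[2:4]             # positional slice: the stop position 4 is excluded
every_other = df.loc["Jane":"Dean":2]  # a step works with label slices too

print(list(by_label.index))     # ['Aaron', 'Penelope', 'Dean']
print(list(by_position.index))  # ['Aaron', 'Penelope']
print(list(every_other.index))  # ['Jane', 'Aaron', 'Dean']
```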

.iloc selects data only by integer location

Let’s now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second to last rows:

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]


Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select the rows Jane and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer locations. This was useful but at times confusing and ambiguous, and thankfully it has been deprecated. If you need to make a selection with a mix of labels and integer locations, you will have to convert both selections to either labels or integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]


The indexing operator, [], can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'

如何获得每月的最后一天?

问题:如何获得每月的最后一天?

是否可以使用Python的标准库轻松确定(即调用一个函数)给定月份的最后一天?

如果标准库不支持该功能,dateutil包是否支持此功能?

Is there a way using Python’s standard library to easily determine (i.e. one function call) the last day of a given month?

If the standard library doesn’t support that, does the dateutil package support this?


回答 0

查看该calendar模块文档时,我没有注意到这一点,但是称为的方法monthrange提供了以下信息:

monthrange(year,month)
    返回指定年份和月份的月份的第一天的工作日以及月份中的天数。

>>> import calendar
>>> calendar.monthrange(2002,1)
(1, 31)
>>> calendar.monthrange(2008,2)
(4, 29)
>>> calendar.monthrange(2100,2)
(0, 28)

所以:

calendar.monthrange(year, month)[1]

似乎是最简单的方法。

明确一点,也monthrange支持supports年:

>>> from calendar import monthrange
>>> monthrange(2012, 2)
(2, 29)

我以前的答案仍然有效,但显然不是最佳选择。

I didn’t notice this earlier when I was looking at the documentation for the calendar module, but a function called monthrange provides this information:

monthrange(year, month)
    Returns weekday of first day of the month and number of days in month, for the specified year and month.

>>> import calendar
>>> calendar.monthrange(2002,1)
(1, 31)
>>> calendar.monthrange(2008,2)
(4, 29)
>>> calendar.monthrange(2100,2)
(0, 28)

so:

calendar.monthrange(year, month)[1]

seems like the simplest way to go.

Just to be clear, monthrange supports leap years as well:

>>> from calendar import monthrange
>>> monthrange(2012, 2)
(2, 29)

My previous answer still works, but is clearly suboptimal.


回答 1

如果您不想导入calendar模块,那么一个简单的两步函数也可以是:

import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # this will never fail
    return next_month - datetime.timedelta(days=next_month.day)

输出:

>>> for month in range(1, 13):
...     print last_day_of_month(datetime.date(2012, month, 1))
...
2012-01-31
2012-02-29
2012-03-31
2012-04-30
2012-05-31
2012-06-30
2012-07-31
2012-08-31
2012-09-30
2012-10-31
2012-11-30
2012-12-31

If you don’t want to import the calendar module, a simple two-step function can also be:

import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # this will never fail
    return next_month - datetime.timedelta(days=next_month.day)

Outputs:

>>> for month in range(1, 13):
...     print last_day_of_month(datetime.date(2012, month, 1))
...
2012-01-31
2012-02-29
2012-03-31
2012-04-30
2012-05-31
2012-06-30
2012-07-31
2012-08-31
2012-09-30
2012-10-31
2012-11-30
2012-12-31
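
The "this will never fail" comment rests on two facts: every month has a day 28, and adding 4 days to the 28th always lands on day 1 to 4 of the following month. Subtracting that day number then steps back to the last day of the original month. As a rough sanity check, the sketch below compares the function against calendar.monthrange over an arbitrarily chosen range of years (the helper name check_years and the year range are illustrative assumptions, not part of the answer):

```python
import calendar
import datetime

def last_day_of_month(any_day):
    # Day 28 exists in every month; adding 4 days always lands in the next month.
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)
    # Stepping back by the day-of-month returns to the last day of the original month.
    return next_month - datetime.timedelta(days=next_month.day)

def check_years(first_year, last_year):
    """Return True if last_day_of_month agrees with calendar.monthrange for every month."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            expected = calendar.monthrange(year, month)[1]
            if last_day_of_month(datetime.date(year, month, 1)).day != expected:
                return False
    return True

print(check_years(1900, 2100))  # True
```

One edge worth knowing about: for December of year 9999 the intermediate date would exceed datetime.date.max, so the addition raises OverflowError there.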

回答 2

编辑:请参阅@Blair Conrad的答案以获得更清洁的解决方案


>>> import datetime
>>> datetime.date(2000, 2, 1) - datetime.timedelta(days=1)
datetime.date(2000, 1, 31)

EDIT: See @Blair Conrad’s answer for a cleaner solution


>>> import datetime
>>> datetime.date(2000, 2, 1) - datetime.timedelta(days=1)
datetime.date(2000, 1, 31)

回答 3

dateutil.relativedelta(使用pip软件包python-datetutil)实际上很容易。day=31始终会返回该月的最后一天。

例:

from datetime import datetime
from dateutil.relativedelta import relativedelta

date_in_feb = datetime(2013, 2, 21)
print(date_in_feb + relativedelta(day=31))  # End-of-month
# 2013-02-28 00:00:00

This is actually pretty easy with dateutil.relativedelta (package python-dateutil for pip). day=31 will always return the last day of the month.

Example:

from datetime import datetime
from dateutil.relativedelta import relativedelta

date_in_feb = datetime(2013, 2, 21)
print(date_in_feb + relativedelta(day=31))  # End-of-month
# 2013-02-28 00:00:00

回答 4

编辑:看到我的其他答案。它的实现比该方法更好,我留在这里是为了防止有人感兴趣地看一看如何“滚动自己的”计算器。

@John Millikin提供了一个很好的答案,增加了计算下个月第一天的复杂性。

以下内容并不是特别优雅,但是要弄清楚任何给定日期所在的月份的最后一天,您可以尝试:

def last_day_of_month(date):
    if date.month == 12:
        return date.replace(day=31)
    return date.replace(month=date.month+1, day=1) - datetime.timedelta(days=1)

>>> last_day_of_month(datetime.date(2002, 1, 17))
datetime.date(2002, 1, 31)
>>> last_day_of_month(datetime.date(2002, 12, 9))
datetime.date(2002, 12, 31)
>>> last_day_of_month(datetime.date(2008, 2, 14))
datetime.date(2008, 2, 29)

EDIT: see my other answer. It has a better implementation than this one, which I leave here just in case someone’s interested in seeing how one might “roll your own” calculator.

@John Millikin gives a good answer, with the added complication of calculating the first day of the next month.

The following isn’t particularly elegant, but to figure out the last day of the month that any given date lives in, you could try:

import datetime

def last_day_of_month(date):
    if date.month == 12:
        return date.replace(day=31)
    return date.replace(month=date.month+1, day=1) - datetime.timedelta(days=1)

>>> last_day_of_month(datetime.date(2002, 1, 17))
datetime.date(2002, 1, 31)
>>> last_day_of_month(datetime.date(2002, 12, 9))
datetime.date(2002, 12, 31)
>>> last_day_of_month(datetime.date(2008, 2, 14))
datetime.date(2008, 2, 29)

回答 5

使用dateutil.relativedelta您会得到这样的月份的最后日期:

from dateutil.relativedelta import relativedelta
last_date_of_month = datetime(mydate.year, mydate.month, 1) + relativedelta(months=1, days=-1)

这个想法是获取月份的第一天,并relativedelta习惯于提前一个月再返回一天,这样您就可以获得想要的月份的最后一天。

Using dateutil.relativedelta you would get last date of month like this:

from datetime import datetime
from dateutil.relativedelta import relativedelta
last_date_of_month = datetime(mydate.year, mydate.month, 1) + relativedelta(months=1, days=-1)

The idea is to get the first day of the month and use relativedelta to go 1 month ahead and 1 day back so you would get the last day of the month you wanted.


回答 6

另一个解决方案是做这样的事情:

from datetime import datetime

def last_day_of_month(year, month):
    """ Work out the last day of the month """
    last_days = [31, 30, 29, 28, 27]
    for i in last_days:
        try:
            end = datetime(year, month, i)
        except ValueError:
            continue
        else:
            return end.date()
    return None

并使用如下功能:

>>> 
>>> last_day_of_month(2008, 2)
datetime.date(2008, 2, 29)
>>> last_day_of_month(2009, 2)
datetime.date(2009, 2, 28)
>>> last_day_of_month(2008, 11)
datetime.date(2008, 11, 30)
>>> last_day_of_month(2008, 12)
datetime.date(2008, 12, 31)

Another solution would be to do something like this:

from datetime import datetime

def last_day_of_month(year, month):
    """ Work out the last day of the month """
    last_days = [31, 30, 29, 28, 27]
    for i in last_days:
        try:
            end = datetime(year, month, i)
        except ValueError:
            continue
        else:
            return end.date()
    return None

And use the function like this:

>>> 
>>> last_day_of_month(2008, 2)
datetime.date(2008, 2, 29)
>>> last_day_of_month(2009, 2)
datetime.date(2009, 2, 28)
>>> last_day_of_month(2008, 11)
datetime.date(2008, 11, 30)
>>> last_day_of_month(2008, 12)
datetime.date(2008, 12, 31)

回答 7

from datetime import timedelta
(any_day.replace(day=1) + timedelta(days=32)).replace(day=1) - timedelta(days=1)

回答 8

>>> import datetime
>>> import calendar
>>> date  = datetime.datetime.now()

>>> print date
2015-03-06 01:25:14.939574

>>> print date.replace(day = 1)
2015-03-01 01:25:14.939574

>>> print date.replace(day = calendar.monthrange(date.year, date.month)[1])
2015-03-31 01:25:14.939574

回答 9

如果您愿意使用外部库,请访问http://crsmithdev.com/arrow/

然后,您可以使用以下命令获取月份的最后一天:

import arrow
arrow.utcnow().ceil('month').date()

这将返回一个日期对象,您可以随后对其进行操作。

If you are willing to use an external library, check out http://crsmithdev.com/arrow/

You can then get the last day of the month with:

import arrow
arrow.utcnow().ceil('month').date()

This returns a date object which you can then manipulate further.


回答 10

要获取该月的最后日期,我们需要执行以下操作:

from datetime import date, timedelta
import calendar
last_day = date.today().replace(day=calendar.monthrange(date.today().year, date.today().month)[1])

现在解释一下我们在这里做什么,我们将其分为两部分:

首先是获取当月的天数,我们使用Blair Conrad已经提到他的解决方案的monthrange

calendar.monthrange(date.today().year, date.today().month)[1]

二是让我们的帮助下做的最后日期本身代替

>>> date.today()
datetime.date(2017, 1, 3)
>>> date.today().replace(day=31)
datetime.date(2017, 1, 31)

当我们按顶部所述将它们组合在一起时,便得到了动态解决方案。

To get the last date of the month we do something like this:

from datetime import date, timedelta
import calendar
last_day = date.today().replace(day=calendar.monthrange(date.today().year, date.today().month)[1])

To explain what we are doing here, we will break it into two parts:

The first part gets the number of days in the current month, for which we use monthrange, which Blair Conrad has already mentioned in his solution:

calendar.monthrange(date.today().year, date.today().month)[1]

The second part gets the last date itself, which we do with the help of replace, e.g.:

>>> date.today()
datetime.date(2017, 1, 3)
>>> date.today().replace(day=31)
datetime.date(2017, 1, 31)

When we combine them as shown at the top, we get a dynamic solution.


回答 11

在Python 3.7中,有未记录的calendar.monthlen(year, month)函数

>>> calendar.monthlen(2002, 1)
31
>>> calendar.monthlen(2008, 2)
29
>>> calendar.monthlen(2100, 2)
28

它等效于记录的calendar.monthrange(year, month)[1]呼叫

In Python 3.7 there is the undocumented calendar.monthlen(year, month) function:

>>> calendar.monthlen(2002, 1)
31
>>> calendar.monthlen(2008, 2)
29
>>> calendar.monthlen(2100, 2)
28

It is equivalent to the documented calendar.monthrange(year, month)[1] call.
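
Because monthlen is undocumented, it may not be available in every Python version. A defensive sketch (the wrapper name days_in_month is an illustrative assumption) uses the helper only if the running interpreter exposes it and otherwise falls back to the documented call:

```python
import calendar

def days_in_month(year, month):
    # Use the undocumented helper only if this interpreter actually provides it.
    helper = getattr(calendar, "monthlen", None)
    if helper is not None:
        return helper(year, month)
    # Documented equivalent: second item of the (weekday, number_of_days) tuple.
    return calendar.monthrange(year, month)[1]

print(days_in_month(2008, 2))  # 29
```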


回答 12

import datetime

now = datetime.datetime.now()
start_month = datetime.datetime(now.year, now.month, 1)
date_on_next_month = start_month + datetime.timedelta(35)
start_next_month = datetime.datetime(date_on_next_month.year, date_on_next_month.month, 1)
last_day_month = start_next_month - datetime.timedelta(1)

回答 13

使用熊猫!

def isMonthEnd(date):
    return date + pd.offsets.MonthEnd(0) == date

isMonthEnd(datetime(1999, 12, 31))
True
isMonthEnd(pd.Timestamp('1999-12-31'))
True
isMonthEnd(pd.Timestamp(1965, 1, 10))
False

Use pandas!

import pandas as pd
from datetime import datetime

def isMonthEnd(date):
    return date + pd.offsets.MonthEnd(0) == date

isMonthEnd(datetime(1999, 12, 31))
True
isMonthEnd(pd.Timestamp('1999-12-31'))
True
isMonthEnd(pd.Timestamp(1965, 1, 10))
False

回答 14

对我来说,这是最简单的方法:

selected_date = date(some_year, some_month, some_day)

if selected_date.month == 12: # December
     last_day_selected_month = date(selected_date.year, selected_date.month, 31)
else:
     last_day_selected_month = date(selected_date.year, selected_date.month + 1, 1) - timedelta(days=1)

For me it’s the simplest way:

from datetime import date, timedelta

selected_date = date(some_year, some_month, some_day)

if selected_date.month == 12: # December
     last_day_selected_month = date(selected_date.year, selected_date.month, 31)
else:
     last_day_selected_month = date(selected_date.year, selected_date.month + 1, 1) - timedelta(days=1)

回答 15

最简单的方法(无需导入日历)是获取下个月的第一天,然后从中减去一天。

import datetime as dt
from dateutil.relativedelta import relativedelta

thisDate = dt.datetime(2017, 11, 17)

# Step to the next month first so that the year also rolls over in December.
nextMonth = thisDate + relativedelta(months=1)
last_day_of_the_month = dt.datetime(nextMonth.year, nextMonth.month, 1) - dt.timedelta(days=1)
print(last_day_of_the_month)

输出:

2017-11-30 00:00:00

PS:与该import calendar方法相比,此代码运行速度更快;见下文:

import datetime as dt
import calendar
from dateutil.relativedelta import relativedelta

someDates = [dt.datetime.today() - dt.timedelta(days=x) for x in range(0, 10000)]

start1 = dt.datetime.now()
for thisDate in someDates:
    lastDay = dt.datetime(thisDate.year, (thisDate + relativedelta(months=1)).month, 1) - dt.timedelta(days=1)

print ('Time Spent= ', dt.datetime.now() - start1)


start2 = dt.datetime.now()
for thisDate in someDates:
    lastDay = dt.datetime(thisDate.year, 
                          thisDate.month, 
                          calendar.monthrange(thisDate.year, thisDate.month)[1])

print ('Time Spent= ', dt.datetime.now() - start2)

输出:

Time Spent=  0:00:00.097814
Time Spent=  0:00:00.109791

此代码假定您需要该月最后一天的日期(即,不仅是DD部分,而且是整个YYYYMMDD日期)

The easiest way (without having to import calendar) is to get the first day of the next month and then subtract a day from it.

import datetime as dt
from dateutil.relativedelta import relativedelta

thisDate = dt.datetime(2017, 11, 17)

# Step to the next month first so that the year also rolls over in December.
nextMonth = thisDate + relativedelta(months=1)
last_day_of_the_month = dt.datetime(nextMonth.year, nextMonth.month, 1) - dt.timedelta(days=1)
print(last_day_of_the_month)

Output:

2017-11-30 00:00:00

PS: This code runs slightly faster than the import calendar approach; see below:

import datetime as dt
import calendar
from dateutil.relativedelta import relativedelta

someDates = [dt.datetime.today() - dt.timedelta(days=x) for x in range(0, 10000)]

start1 = dt.datetime.now()
for thisDate in someDates:
    lastDay = dt.datetime(thisDate.year, (thisDate + relativedelta(months=1)).month, 1) - dt.timedelta(days=1)

print ('Time Spent= ', dt.datetime.now() - start1)


start2 = dt.datetime.now()
for thisDate in someDates:
    lastDay = dt.datetime(thisDate.year, 
                          thisDate.month, 
                          calendar.monthrange(thisDate.year, thisDate.month)[1])

print ('Time Spent= ', dt.datetime.now() - start2)

OUTPUT:

Time Spent=  0:00:00.097814
Time Spent=  0:00:00.109791

This code assumes that you want the date of the last day of the month (i.e., not just the DD part, but the entire YYYYMMDD date)


回答 16

这是另一个答案。无需额外的程序包。

datetime.date(year + int(month/12), month%12+1, 1)-datetime.timedelta(days=1)

获取下个月的第一天并从中减去一天。

Here is another answer. No extra packages required.

datetime.date(year + int(month/12), month%12+1, 1)-datetime.timedelta(days=1)

Get the first day of the next month and subtract a day from it.
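
As a sketch of how the arithmetic works: month % 12 + 1 maps months 1 to 11 onto the following month and wraps 12 back to 1, while the integer division month // 12 is 1 only for December, which is what bumps the year. Wrapped in a function (the name last_day is illustrative, not from the answer):

```python
import datetime

def last_day(year, month):
    # month // 12 is 1 only in December, so the year rolls over exactly then.
    # month % 12 + 1 is the following month (12 wraps around to 1).
    first_of_next = datetime.date(year + month // 12, month % 12 + 1, 1)
    return first_of_next - datetime.timedelta(days=1)

print(last_day(2017, 12))  # 2017-12-31
print(last_day(2016, 2))   # 2016-02-29
```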


回答 17

您可以自己计算结束日期。简单的逻辑是从下个月的start_date减去一天。:)

因此,编写一个自定义方法,

import datetime

def end_date_of_a_month(date):


    start_date_of_this_month = date.replace(day=1)

    month = start_date_of_this_month.month
    year = start_date_of_this_month.year
    if month == 12:
        month = 1
        year += 1
    else:
        month += 1
    next_month_start_date = start_date_of_this_month.replace(month=month, year=year)

    this_month_end_date = next_month_start_date - datetime.timedelta(days=1)
    return this_month_end_date

打电话

end_date_of_a_month(datetime.datetime.now().date())

它将返回本月的结束日期。将任何日期传递给此功能。返回该月的结束日期。

You can calculate the end date yourself. The simple logic is to subtract a day from the start date of the next month. :)

So write a custom method,

import datetime

def end_date_of_a_month(date):


    start_date_of_this_month = date.replace(day=1)

    month = start_date_of_this_month.month
    year = start_date_of_this_month.year
    if month == 12:
        month = 1
        year += 1
    else:
        month += 1
    next_month_start_date = start_date_of_this_month.replace(month=month, year=year)

    this_month_end_date = next_month_start_date - datetime.timedelta(days=1)
    return this_month_end_date

Calling,

end_date_of_a_month(datetime.datetime.now().date())

It will return the end date of the current month. Pass any date to this function and it returns the end date of that date's month.


回答 18

您可以使用relativedelta https://dateutil.readthedocs.io/en/stable/relativedelta.html month_end = <your datetime value within the month> + relativedelta(day=31) 来给您最后的一天。

You can use relativedelta (https://dateutil.readthedocs.io/en/stable/relativedelta.html):

month_end = <your datetime value within the month> + relativedelta(day=31)

That will give you the last day of the month.


回答 19

仅使用标准日期时间库,这对我来说是最简单的解决方案:

import datetime

def get_month_end(dt):
    first_of_month = datetime.datetime(dt.year, dt.month, 1)
    next_month_date = first_of_month + datetime.timedelta(days=32)
    new_dt = datetime.datetime(next_month_date.year, next_month_date.month, 1)
    return new_dt - datetime.timedelta(days=1)

This is the simplest solution for me using just the standard datetime library:

import datetime

def get_month_end(dt):
    first_of_month = datetime.datetime(dt.year, dt.month, 1)
    next_month_date = first_of_month + datetime.timedelta(days=32)
    new_dt = datetime.datetime(next_month_date.year, next_month_date.month, 1)
    return new_dt - datetime.timedelta(days=1)

回答 20

这没有解决主要问题,但是使用一个月中的最后一个工作日的一个不错的技巧是使用calendar.monthcalendar,该方法返回一个日期矩阵,将星期一作为第一列,将星期日作为最后一个列。

# Some random date.
some_date = datetime.date(2012, 5, 23)

# Get last weekday
last_weekday = np.asarray(calendar.monthcalendar(some_date.year, some_date.month))[:,0:-2].ravel().max()

print last_weekday
31

整个过程[0:-2]就是刮掉周末专栏文章,然后将它们扔掉。月份以外的日期用0表示,因此最大值实际上会忽略它们。

使用的numpy.ravel是不是绝对必要的,但我不喜欢依靠单纯的惯例numpy.ndarray.max将压平的数组,如果没有被告知哪个轴计算过。

This does not address the main question, but one nice trick to get the last weekday in a month is to use calendar.monthcalendar, which returns a matrix of dates, organized with Monday as the first column through Sunday as the last.

import calendar
import datetime

import numpy as np

# Some random date.
some_date = datetime.date(2012, 5, 23)

# Get last weekday
last_weekday = np.asarray(calendar.monthcalendar(some_date.year, some_date.month))[:,0:-2].ravel().max()

print(last_weekday)
31

The whole [0:-2] thing is to shave off the weekend columns and throw them out. Dates that fall outside of the month are indicated by 0, so the max effectively ignores them.

The use of numpy.ravel is not strictly necessary, but I hate relying on the mere convention that numpy.ndarray.max will flatten the array if not told which axis to calculate over.
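
If you would rather not depend on numpy, the same idea can be sketched with the standard library alone. This assumes the calendar module's default of Monday as the first weekday has not been changed with calendar.setfirstweekday; the function name is illustrative:

```python
import calendar
import datetime

def last_weekday_of_month(year, month):
    """Return the date of the last Monday-to-Friday day in the given month."""
    weeks = calendar.monthcalendar(year, month)  # one row per week, Monday first
    # Keep only the Monday..Friday columns; days outside the month are 0,
    # so taking the maximum ignores them.
    last_day = max(day for week in weeks for day in week[:5])
    return datetime.date(year, month, last_day)

print(last_weekday_of_month(2012, 5))  # 2012-05-31
print(last_weekday_of_month(2012, 6))  # 2012-06-29
```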


回答 21

import calendar
from time import gmtime, strftime
calendar.monthrange(int(strftime("%Y", gmtime())), int(strftime("%m", gmtime())))[1]

输出:

31



这将打印当前月份的最后一天。在此示例中,日期是2016年5月15日。因此您的输出可能会有所不同,但是输出将是当月的几天。如果您想通过运行每日Cron作业来检查每月的最后一天,那就太好了。

所以:

import calendar
from time import gmtime, strftime
lastDay = calendar.monthrange(int(strftime("%Y", gmtime())), int(strftime("%m", gmtime())))[1]
today = int(strftime("%d", gmtime()))  # convert to int so it compares correctly with lastDay
lastDay == today

输出:

False

除非是每月的最后一天。

import calendar
from time import gmtime, strftime
calendar.monthrange(int(strftime("%Y", gmtime())), int(strftime("%m", gmtime())))[1]

Output:

31



This gives the number of days in the current month, which is also the day number of its last day. When this example was run it was 15 May 2016, so the output was 31; your output will be the number of days in whatever the current month is. This is handy if you want to check for the last day of the month from a daily cron job.

So:

import calendar
from time import gmtime, strftime
lastDay = calendar.monthrange(int(strftime("%Y", gmtime())), int(strftime("%m", gmtime())))[1]
today = int(strftime("%d", gmtime()))  # convert to int so it compares correctly with lastDay
lastDay == today

Output:

False

Unless it IS the last day of the month.
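
For the cron use case, a different technique from the one above (swapped in here as a sketch, not this answer's method) avoids string formatting altogether: a date is the last day of its month exactly when the following day falls in a different month. The function name is illustrative:

```python
import datetime

def is_last_day_of_month(day):
    # The next calendar day belongs to another month only at month end.
    return (day + datetime.timedelta(days=1)).month != day.month

print(is_last_day_of_month(datetime.date(2016, 5, 15)))  # False
print(is_last_day_of_month(datetime.date(2016, 5, 31)))  # True
```

In a daily job you would pass datetime.date.today() and exit early when the result is False.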


回答 22

我喜欢这样

import datetime
import calendar

date=datetime.datetime.now()
month_end_date=datetime.datetime(date.year,date.month,1) + datetime.timedelta(days=calendar.monthrange(date.year,date.month)[1] - 1)

I prefer this way

import datetime
import calendar

date=datetime.datetime.now()
month_end_date=datetime.datetime(date.year,date.month,1) + datetime.timedelta(days=calendar.monthrange(date.year,date.month)[1] - 1)

回答 23

如果要创建自己的小函数,这是一个很好的起点:

def eomday(year, month):
    """returns the number of days in a given month"""
    days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    d = days_per_month[month - 1]
    if month == 2 and (year % 4 == 0 and year % 100 != 0 or year % 400 == 0):
        d = 29
    return d

为此,您必须了解the年的规则:

  • 每四年
  • 每100年除外
  • 但是每400年一次

If you want to make your own small function, this is a good starting point:

def eomday(year, month):
    """returns the number of days in a given month"""
    days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    d = days_per_month[month - 1]
    if month == 2 and (year % 4 == 0 and year % 100 != 0 or year % 400 == 0):
        d = 29
    return d

For this you have to know the rules for leap years:

  • every fourth year is a leap year
  • except every 100th year
  • but every 400th year is a leap year again
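
The three rules can be cross-checked against the standard library's calendar.isleap. A small sketch (the function name is_leap is illustrative):

```python
import calendar

def is_leap(year):
    # Every fourth year, except century years, unless divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for year in (1900, 2000, 2008, 2019, 2100):
    # The hand-written rule and the standard library should always agree.
    print(year, is_leap(year), calendar.isleap(year))
```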

回答 24

如果输入日期范围,则可以使用以下方法:

def last_day_of_month(any_days):
    res = []
    for any_day in any_days:
        nday = any_day.days_in_month -any_day.day
        res.append(any_day + timedelta(days=nday))
    return res

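If you pass in a pandas date range (for example, the result of pd.date_range, whose Timestamp elements have a days_in_month attribute), you can use this:

from datetime import timedelta

def last_day_of_month(any_days):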
    res = []
    for any_day in any_days:
        nday = any_day.days_in_month -any_day.day
        res.append(any_day + timedelta(days=nday))
    return res

回答 25

“ get_last_day_of_month(dt)”下面的代码中,将为您提供此日期,日期格式为“ YYYY-MM-DD”。

import datetime

def DateTime( d ):
    return datetime.datetime.strptime( d, '%Y-%m-%d').date()

def RelativeDate( start, num_days ):
    d = DateTime( start )
    return str( d + datetime.timedelta( days = num_days ) )

def get_first_day_of_month( dt ):
    return dt[:-2] + '01'

def get_last_day_of_month( dt ):
    fd = get_first_day_of_month( dt )
    fd_next_month = get_first_day_of_month( RelativeDate( fd, 31 ) )
    return RelativeDate( fd_next_month, -1 )

In the code below ‘get_last_day_of_month(dt)’ will give you this, with date in string format like ‘YYYY-MM-DD’.

import datetime

def DateTime( d ):
    return datetime.datetime.strptime( d, '%Y-%m-%d').date()

def RelativeDate( start, num_days ):
    d = DateTime( start )
    return str( d + datetime.timedelta( days = num_days ) )

def get_first_day_of_month( dt ):
    return dt[:-2] + '01'

def get_last_day_of_month( dt ):
    fd = get_first_day_of_month( dt )
    fd_next_month = get_first_day_of_month( RelativeDate( fd, 31 ) )
    return RelativeDate( fd_next_month, -1 )

回答 26

最简单的方法是使用datetime和一些日期数学,例如从下个月的第一天减去一天:

import datetime

def last_day_of_month(d: datetime.date) -> datetime.date:
    return (
        datetime.date(d.year + d.month//12, d.month % 12 + 1, 1) -
        datetime.timedelta(days=1)
    )

或者,您可以calendar.monthrange()用来获取一个月中的天数(考虑leap年)并相应地更新日期:

import calendar, datetime

def last_day_of_month(d: datetime.date) -> datetime.date:
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

快速基准测试表明,第一个版本的速度明显更快:

In [14]: today = datetime.date.today()

In [15]: %timeit last_day_of_month_dt(today)
918 ns ± 3.54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: %timeit last_day_of_month_calendar(today)
1.4 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The simplest way is to use datetime and some date math, e.g. subtract a day from the first day of the next month:

import datetime

def last_day_of_month_dt(d: datetime.date) -> datetime.date:
    return (
        datetime.date(d.year + d.month//12, d.month % 12 + 1, 1) -
        datetime.timedelta(days=1)
    )

Alternatively, you could use calendar.monthrange() to get the number of days in a month (taking leap years into account) and update the date accordingly:

import calendar, datetime

def last_day_of_month_calendar(d: datetime.date) -> datetime.date:
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

A quick benchmark shows that the first version is noticeably faster:

In [14]: today = datetime.date.today()

In [15]: %timeit last_day_of_month_dt(today)
918 ns ± 3.54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: %timeit last_day_of_month_calendar(today)
1.4 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
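The benchmark above uses IPython's %timeit. A rough equivalent with the standard timeit module might look like the sketch below; the function names follow the ones used in the benchmark, the call count of 100,000 is arbitrary, and the absolute numbers will differ by machine and Python version.

```python
import calendar
import datetime
import timeit

def last_day_of_month_dt(d: datetime.date) -> datetime.date:
    # First day of the next month, minus one day
    return datetime.date(d.year + d.month // 12, d.month % 12 + 1, 1) - datetime.timedelta(days=1)

def last_day_of_month_calendar(d: datetime.date) -> datetime.date:
    # Replace the day with the number of days in the month
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

today = datetime.date.today()
# Both approaches should agree before their speed is compared
assert last_day_of_month_dt(today) == last_day_of_month_calendar(today)

calls = 100_000
for fn in (last_day_of_month_dt, last_day_of_month_calendar):
    seconds = timeit.timeit(lambda: fn(today), number=calls)
    print(f"{fn.__name__}: {seconds / calls * 1e9:.0f} ns per call")
```

The lambda wrapper adds a small constant overhead to both measurements, so only the relative difference is meaningful.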

回答 27

这是一个很长的版本(易于理解),但是照顾了leap年。

干杯,JK

def last_day_month(year, month):
    leap_year_flag = 0
    end_dates = {
        1: 31,
        2: 28,
        3: 31,
        4: 30,
        5: 31,
        6: 30,
        7: 31,
        8: 31,
        9: 30,
        10: 31,
        11: 30,
        12: 31
    }

    # Checking for regular leap year    
    if year % 4 == 0:
        leap_year_flag = 1
    else:
        leap_year_flag = 0

    # Checking for century leap year    
    if year % 100 == 0:
        if year % 400 == 0:
            leap_year_flag = 1
        else:
            leap_year_flag = 0
    else:
        pass

    # return end date of the year-month
    if leap_year_flag == 1 and month == 2:
        return 29
    elif leap_year_flag == 1 and month != 2:
        return end_dates[month]
    else:
        return end_dates[month]

Here is a long (but easy to understand) version that takes care of leap years.

cheers, JK

def last_day_month(year, month):
    leap_year_flag = 0
    end_dates = {
        1: 31,
        2: 28,
        3: 31,
        4: 30,
        5: 31,
        6: 30,
        7: 31,
        8: 31,
        9: 30,
        10: 31,
        11: 30,
        12: 31
    }

    # Checking for regular leap year    
    if year % 4 == 0:
        leap_year_flag = 1
    else:
        leap_year_flag = 0

    # Checking for century leap year    
    if year % 100 == 0:
        if year % 400 == 0:
            leap_year_flag = 1
        else:
            leap_year_flag = 0
    else:
        pass

    # return end date of the year-month
    if leap_year_flag == 1 and month == 2:
        return 29
    elif leap_year_flag == 1 and month != 2:
        return end_dates[month]
    else:
        return end_dates[month]

回答 28

这是一个基于解决方案的python lambdas:

next_month = lambda y, m, d: (y, m + 1, 1) if m + 1 < 13 else ( y+1 , 1, 1)
month_end  = lambda dte: date( *next_month( *dte.timetuple()[:3] ) ) - timedelta(days=1)

next_month拉姆达发现到明年下个月的第一天的元组表示,和卷。该month_end拉姆达转换日期(dte)的元组,应用next_month和创造新的日期。然后,“月底”就是下个月的第一天减去timedelta(days=1)

Here is a solution based on Python lambdas:

from datetime import date, timedelta

next_month = lambda y, m, d: (y, m + 1, 1) if m + 1 < 13 else (y + 1, 1, 1)
month_end = lambda dte: date(*next_month(*dte.timetuple()[:3])) - timedelta(days=1)

The next_month lambda finds the tuple representation of the first day of the next month, and rolls over to the next year. The month_end lambda transforms a date (dte) to a tuple, applies next_month and creates a new date. Then the “month’s end” is just the next month’s first day minus timedelta(days=1).
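For readers who prefer named functions, the same logic can be sketched with def statements and a short usage example. This is only a restatement of the lambdas above, not a different technique.

```python
from datetime import date, timedelta

def next_month(y, m, d):
    """Return (year, month, 1) for the first day of the following month."""
    return (y, m + 1, 1) if m < 12 else (y + 1, 1, 1)

def month_end(dte):
    """Return the last day of the month that contains dte."""
    # timetuple()[:3] yields (year, month, day)
    return date(*next_month(*dte.timetuple()[:3])) - timedelta(days=1)

print(month_end(date(2016, 12, 15)))  # 2016-12-31
print(month_end(date(2016, 2, 10)))   # 2016-02-29
```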


How do I protect Python code?

Question: How do I protect Python code?

我正在用Python开发一款软件,该软件将分发给我的雇主的客户。我的雇主希望通过限时许可文件来限制软件的使用。

如果我们分发.py文件或什至.pyc文件,将很容易(反编译和)删除检查许可证文件的代码。

另一个方面是,我的雇主不希望我们的客户阅读该代码,因为担心该代码可能被盗或至少是“新颖的主意”。

有解决这个问题的好方法吗?最好使用现成的解决方案。

该软件将在Linux系统上运行(因此,我认为py2exe不会成功)。

I am developing a piece of software in Python that will be distributed to my employer’s customers. My employer wants to limit the usage of the software with a time restricted license file.

If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.

Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the “novel ideas”.

Is there a good way to handle this problem? Preferably with an off-the-shelf solution.

The software will run on Linux systems (so I don’t think py2exe will do the trick).


回答 0

Python是字节码编译的解释语言,很难锁定。即使您使用py2exe之类的exe打包程序,该可执行文件的布局也是众所周知的,并且Python字节码也很容易理解。

通常在这种情况下,您必须进行权衡。保护代码真的有多重要?那里是否有真正的秘密(例如,对银行转账进行对称加密的密钥),或者您只是偏执?选择一种语言,使您能够最快地开发出最好的产品,并要对您的新颖创意的价值抱有现实的态度。

如果您确定确实需要安全地执行许可证检查,则将其编写为一个小的C扩展,以便可以对许可证检查代码进行额外的难度(但并非不可能!)以进行反向工程,并将大部分代码保留在Python中。

Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use an exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.

Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.

If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.


回答 1

“有没有解决这个问题的好方法?” 不可以。没有任何东西可以防止逆向工程。DVD机器上的固件甚至都经过了反向工程,并且暴露了AACS加密密钥。尽管DMCA将该行为定为刑事犯罪,但这仍然存在。

由于没有任何一种技术方法可以阻止您的客户阅读您的代码,因此您必须采用普通的商业方法。

  1. 许可证。合同。条款和条件。即使人们可以阅读代码,这仍然有效。请注意,某些基于Python的组件可能要求您先付费,然后再使用这些组件销售软件。另外,某些开源许可证禁止您隐藏该组件的来源或来源。

  2. 提供重大价值。如果您的产品非常好-以难以拒绝的价格出售-则没有动力浪费时间和金钱进行任何逆向工程。逆向工程很昂贵。使您的产品便宜一些。

  3. 提供升级和增强功能,使任何逆向工程成为一个坏主意。当下一个版本中断其逆向工程时,没有任何意义。这可能荒唐至极,但是您应该提供新功能,这些新功能使下一个版本比逆向工程更有价值。

  4. 以极具吸引力的价格提供定制服务,以至于他们宁愿您付钱给您构建并支持增强功能。

  5. 使用过期的许可证密钥。这是残酷的,会给您带来不好的声誉,但是肯定会使您的软件停止工作。

  6. 作为网络服务提供。SaaS不涉及向客户的下载。

“Is there a good way to handle this problem?” No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS encryption key exposed. And that’s in spite of the DMCA making that a criminal offense.

Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.

  1. Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.

  2. Offer significant value. If your stuff is so good — at a price that is hard to refuse — there’s no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.

  3. Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there’s no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.

  4. Offer customization at rates so attractive that they’d rather pay you to build and support the enhancements.

  5. Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.

  6. Offer it as a web service. SaaS involves no downloads to customers.


回答 2

Python不是您需要的工具

您必须使用正确的工具来完成正确的事情,并且Python并非旨在被混淆。恰恰相反;一切都是开放的,或者很容易在Python中显示或修改,因为这是该语言的理念。

如果您想要看不见的东西,请寻找其他工具。这不是一件坏事,重要的是要存在几种不同的工具以用于不同的用途。

混淆真的很难

即使已编译的程序也可以进行逆向工程,所以不要以为您可以完全保护任何代码。您可以分析混淆的PHP,破坏Flash加密密钥等。每次都会破解较新版本的Windows。

有法律要求是一个好方法

您不能阻止某人滥用您的代码,但是您可以轻松地发现某人是否在使用它。因此,这只是一个偶然的法律问题。

代码保护被高估

如今,商业模式倾向于销售服务而不是产品。您不能复制,盗版或盗用服务。也许是时候考虑顺其自然了…

Python is not the tool you need

You must use the right tool to do the right thing, and Python was not designed to be obfuscated. Quite the contrary: everything is open or easy to reveal or modify in Python because that’s the language’s philosophy.

If you want something you can’t see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.

Obfuscation is really hard

Even compiled programs can be reverse-engineered, so don’t think that you can fully protect any code. You can analyze obfuscated PHP, break the Flash encryption key, etc. Newer versions of Windows are cracked every time.

Having a legal requirement is a good way to go

You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it’s just a casual legal issue.

Code protection is overrated

Nowadays, business models tend to go for selling services instead of products. You cannot copy, pirate, or steal a service. Maybe it’s time to consider going with the flow…


回答 3

编译python并分发二进制文件!

明智的主意:

使用CythonNuitkaShed Skin或类似于将python编译为C代码的东西,然后将您的应用分发为python二进制库(pyd)。

这样,我认为就没有剩下Python(字节)代码了,而且您已经做了任何人(即您的雇主)可以从常规代码中期望的合理数量的模糊处理。(.NET或Java不如这种情况安全,因为该字节码不会被混淆,并且可以相对容易地反编译为合理的源代码。)

Cython与CPython的兼容性越来越强,因此我认为它应该可以工作。(我实际上正在考虑将其用于我们的产品。。我们已经在构建一些第三方库作为pyd / dll,因此,将我们自己的python代码作为二进制文件交付对我们来说并不是一个太大的步骤。)

有关如何执行此操作的教程,请参阅此博客文章(不是我本人)。(thx @hithwen)

疯狂的主意:

您可能可以让Cython为每个模块分别存储C文件,然后将它们全部串联起来并使用大量的内联代码进行构建。这样,您的Python模块是非常单一的,并且很难用通用工具来实现。

超越疯狂:

如果您可以静态链接到python运行时和所有库(dll),则可以构建一个可执行文件。这样,肯定很难拦截对python和您使用的任何框架库的调用。但是,如果您使用LGPL代码,则无法完成此操作。

Compile python and distribute binaries!

Sensible idea:

Use Cython, Nuitka, Shed Skin or something similar to compile python to C code, then distribute your app as python binary libraries (pyd) instead.

That way, no Python (byte) code is left and you’ve done any reasonable amount of obfuscation anyone (i.e. your employer) could expect from regular code, I think. (.NET or Java are less safe than this case, as that bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)

Cython is getting more and more compatible with CPython, so I think it should work. (I’m actually considering this for our product. We’re already building some third-party libs as pyd/dlls, so shipping our own Python code as binaries is not an overly big step for us.)

See This Blog Post (not by me) for a tutorial on how to do it. (thx @hithwen)

Crazy idea:

You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.

Beyond crazy:

You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it’d sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you’re using LGPL code though.


回答 4

我了解您希望客户使用python的功能,但不希望公开源代码。

这是我的建议:

(a)将关键代码段编写为C或C ++库,然后使用SIPSwig将C / C ++ API公开给Python命名空间。

(b)使用cython代替Python

(c)在(a)和(b)中,都应该可以使用Python接口将库作为许可的二进制文件分发。

I understand that you want your customers to use the power of Python but do not want to expose the source code.

Here are my suggestions:

(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or swig to expose the C/C++ APIs to Python namespace.

(b) Use cython instead of Python

(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.


回答 5

您的雇主是否知道他可以“窃取”他人从您的代码中得到的任何想法?我的意思是,如果他们可以阅读您的作品,那么您也可以阅读。也许看着您如何从这种情况中受益会比担心会损失多少更好地获得投资回报。

[编辑]回答尼克的评论:

一无所有,一无所有。客户拥有自己想要的东西(并且自从进行更改以来就为此付费)。由于他没有发布更改,因此好像其他所有人都没有发生过。

现在,如果客户出售软件,则他们必须更改版权声明(这是非法的,因此您可以提起诉讼,将胜诉->简单案例)。

如果他们不更改版权声明,那么第二级客户将注意到该软件来自您原来的产品,并想知道这是怎么回事。他们很可能会与您联系,因此您将了解有关转售作品的信息。

同样,我们有两种情况:原始客户仅售出了几份。那意味着他们无论如何也赚不了多少钱,那为什么还要打扰呢。或者他们批量销售。这意味着您有更多的机会了解他们的工作并为此做些事情。

但是最后,大多数公司都试图遵守法律(一旦声誉受损,开展业务就会困难得多)。因此,他们不会窃取您的工作,而是会与您一起进行改进。因此,如果您包含源代码(具有可以防止您简单转售的许可证),则它们很可能会简单地推回所做的更改,因为这样可以确保更改在下一版本中进行,而不必维护。这是双赢的:您获得更改,并且即使您不愿意将其真正包含在正式版本中,他们也可以根据自己的需要进行更改,即使他们确实需要它。

Is your employer aware that he can “steal” back any ideas that other people get from your code? I mean, if they can read your work, so can you theirs. Maybe looking at how you can benefit from the situation would yield a better return on your investment than fearing how much you could lose.

[EDIT] Answer to Nick’s comment:

Nothing gained and nothing lost. The customer has what he wants (and paid for it since he did the change himself). Since he doesn’t release the change, it’s as if it didn’t happen for everyone else.

Now if the customer sells the software, they have to change the copyright notice (which is illegal, so you can sue and will win -> simple case).

If they don’t change the copyright notice, the 2nd level customers will notice that the software originally comes from you and wonder what is going on. Chances are that they will contact you and so you will learn about the reselling of your work.

Again we have two cases: The original customer sold only a few copies. That means they didn’t make much money anyway, so why bother. Or they sold in volume. That means better chances for you to learn about what they do and do something about it.

But in the end, most companies try to comply with the law (once their reputation is ruined, it’s much harder to do business). So they will not steal your work but work with you to improve it. So if you include the source (with a license that protects you from simple reselling), chances are that they will simply push back changes they made, since that will make sure the change is in the next version and they don’t have to maintain it. That’s win-win: you get changes, and they can make the change themselves if they really, desperately need it, even if you’re unwilling to include it in the official release.


回答 6

你看过催眠药吗?它会缩小,混淆和压缩Python代码。对于偶然的逆向工程,示例代码看起来很讨厌。

$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ=ImportError
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱=print
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡=False
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨=object
try:
 import demiurgic
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
 import mystificate
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺬ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡
class ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨):
 def __init__(self,*args,**kwargs):
  pass
 def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ클(self,dactyl):
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲=mystificate.dark_voodoo(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐)
  return ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲
 def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯(self,whatever):
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Forming...")
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚("epicaricacy","perseverate")
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ.ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)

Have you had a look at pyminifier? It minifies, obfuscates, and compresses Python code. The example code looks pretty nasty for casual reverse engineering.

$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ=ImportError
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱=print
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡=False
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨=object
try:
 import demiurgic
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
 import mystificate
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺬ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡
class ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨):
 def __init__(self,*args,**kwargs):
  pass
 def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ클(self,dactyl):
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲=mystificate.dark_voodoo(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐)
  return ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲
 def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯(self,whatever):
  ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Forming...")
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚("epicaricacy","perseverate")
 ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ.ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)

回答 7

不要依靠混淆。正如您已经正确得出的结论,它提供的保护非常有限。更新:这是指向论文链接,该论文在Dropbox中反向工程了经过混淆的python代码。这种方法-操作码重映射是一个很好的障碍,但显然可以克服。

相反,正如许多海报提到的那样做到:

  • 不值得进行反向工程的时间(您的软件是如此出色,值得付出)
  • 让他们签署合同,并在可行时进行许可证审核。

另外,就像踢屁股的Python IDE WingIDE一样:放弃代码。没错,请放弃代码,让人们回来进行升级和支持。

Do not rely on obfuscation. As you have correctly concluded, it offers very limited protection. UPDATE: Here is a link to a paper which reverse engineered obfuscated Python code in Dropbox. The approach, opcode remapping, is a good barrier, but clearly it can be defeated.

Instead, as many posters have mentioned:

  • Make it not worth the reverse engineering time (your software is so good, it makes sense to pay)
  • Make them sign a contract and do a license audit if feasible.

Alternatively, as the kick-ass Python IDE WingIDE does: Give away the code. That’s right, give the code away and have people come back for upgrades and support.


回答 8

使用Cython。它将您的模块编译为高性能的C文件,然后可以将其编译为本机二进制库。与.pyc字节码相比,这基本上是不可逆的!

我写了一篇有关如何为Python项目设置Cython的详细文章,请查看:

用Cython保护Python源

Use Cython. It will compile your modules to high-performance C files, which can then be compiled to native binary libraries. This is basically irreversible, compared to .pyc bytecode!

I’ve written a detailed article on how to set up Cython for a Python project, check it out:

Protecting Python Sources With Cython


回答 9

运送.pyc文件存在问题-它们与使用其创建的python版本不兼容,与任何其他python版本都不兼容,这意味着您必须知道要在其上运行该产品的系统上正在运行哪个python版本。这是一个非常有限的因素。

Shipping .pyc files has its problems – they are not compatible with any other python version than the python version they were created with, which means you must know which python version is running on the systems the product will run on. That’s a very limiting factor.
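The version dependence can be seen in the first bytes of a .pyc file, which hold a magic number tied to the interpreter's bytecode format. The sketch below compiles a throwaway file and compares its header with importlib.util.MAGIC_NUMBER for the running interpreter; the file name hello.py is just an illustration.

```python
import importlib.util
import os
import py_compile
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")
    # Compile to an explicit path instead of __pycache__
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
    with open(pyc, "rb") as f:
        header = f.read(4)  # the first four bytes are the magic number

matches = header == importlib.util.MAGIC_NUMBER
print(sys.version_info[:2], header, matches)
```

An interpreter whose magic number differs from the one in the file will refuse to load that .pyc, which is why you would need one build per targeted Python version.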


回答 10

在某些情况下,有可能将软件(全部或至少关键部分)移入组织托管的Web服务中。

这样,可以在您自己的服务器机房中安全地执行许可证检查。

In some circumstances, it may be possible to move all, or at least a key part, of the software into a web service that your organization hosts.

That way, the license checks can be performed in the safety of your own server room.


回答 11

尽管没有完美的解决方案,但可以执行以下操作:

  1. 将一些关键的启动代码移到本机库中。
  2. 在本机库中强制执行许可证检查。

如果要删除对本机代码的调用,则该程序无论如何都不会启动。如果未删除,则将强制执行许可证。

尽管这不是跨平台或纯Python解决方案,但它可以工作。

Though there’s no perfect solution, the following can be done:

  1. Move some critical piece of startup code into a native library.
  2. Enforce the license check in the native library.

If the call to the native code were to be removed, the program wouldn’t start anyway. If it’s not removed then the license will be enforced.

Though this is not a cross-platform or a pure-Python solution, it will work.


回答 12

我认为还有另一种方法可以保护您的Python代码;混淆方法的一部分。我相信曾经有一款类似Mount and Blade的游戏,或者是某些东西进行了更改并重新编译了自己的python解释器(我认为它是开源的原始解释器),只是将OP代码表中的OP代码更改为与标准python OP不同代码。

因此python源代码未修改,但* .pyc文件的文件扩展名不同,并且操作码与公共python.exe解释器不匹配。如果您检查了游戏数据文件,则所有数据均为Python源格式。

各种各样的恶作剧都可以通过这种方式与未成熟的黑客打成一片。阻止一堆没有经验的黑客很容易。这是您不可能击败的专业黑客。但是我想象大多数公司不会让专业黑客长期待命(可能是因为事情被黑客入侵了)。但是到处都是不成熟的黑客(以好奇的IT员工的身份阅读)。

例如,您可以在经过修改的解释器中,允许其检查源中的某些注释或文档字符串。对于此类代码行,您可能具有特殊的OP代码。例如:

OP 234用于源代码行“#我写的版权”,或者将该行编译为等效于“如果为False:”的操作代码,如果缺少“#版权所有”。出于某些晦涩的原因,基本上禁用了整个代码块。

重新编译经过修改的解释器可能可行的一个用例是,您没有编写该应用程序,但该应用程序很大,但是却得到了保护它的报酬,例如当您是金融应用程序的专用服务器管理员时。

我发现让源代码或操作码开放供人们注意有点矛盾,但是使用SSL进行网络流量。SSL也不是100%安全的。但这是用来阻止MOST的眼睛阅读它的。采取一点预防措施是明智的。

另外,如果足够多的人认为Python源代码和操作码太明显,那么最终有人可能至少会为其开发一个简单的保护工具。因此,越来越多的人问“如何保护Python应用程序”只会促进这种发展。

I think there is one more method to protect your Python code; it is part of the obfuscation approach. I believe there was a game like Mount and Blade or something that changed and recompiled their own Python interpreter (the original interpreter, which I believe is open source) and just changed the OP codes in the OP code table to be different than the standard Python OP codes.

So the Python source is unmodified, but the file extensions of the *.pyc files are different and the op codes don’t match the public python.exe interpreter. If you checked the game’s data files, all the data was in Python source format.
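To make the opcode-table idea concrete, the sketch below prints part of the standard table from the dis module and builds a toy remapped table by rotating the numbers. The rotation is purely illustrative; it is an assumption about what a remapping could look like, not how any particular game or product did it.

```python
import dis

# The standard table maps opcode names to numbers
standard = dis.opmap
names = sorted(standard)
numbers = [standard[name] for name in names]

# Toy remapping: every name gets the number of its alphabetical neighbour
remapped = dict(zip(names, numbers[1:] + numbers[:1]))

for name in names[:5]:
    print(f"{name}: standard={standard[name]} remapped={remapped[name]}")
```

A .pyc compiled against the remapped table would be nonsense to a stock interpreter or a stock decompiler, but anyone who recovers the table from the modified interpreter can undo the mapping.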

All sorts of nasty tricks can be done to mess with immature hackers this way. Stopping a bunch of inexperienced hackers is easy. It’s the professional hackers that you will not likely beat. But most companies don’t keep pro hackers on staff long I imagine (likely because things get hacked). But immature hackers are all over the place (read as curious IT staff).

You could for example, in a modified interpreter, allow it to check for certain comments or doc strings in your source. You could have special OP codes for such lines of code. For example:

OP 234 is for source line “# Copyright I wrote this” or compile that line into op codes that are equivalent to “if False:” if “# Copyright” is missing. Basically disabling a whole block of code for what appears to be some obscure reason.

One use case where recompiling a modified interpreter may be feasible is where you didn’t write the app, the app is big, but you are paid to protect it, such as when you’re a dedicated server admin for a financial app.

I find it a little contradictory to leave the source or opcodes open for eyeballs, but use SSL for network traffic. SSL is not 100% safe either. But it’s used to stop MOST eyes from reading it. A wee bit of precaution is sensible.

Also, if enough people deem that Python source and opcodes are too visible, it’s likely someone will eventually develop at least a simple protection tool for it. So the more people asking “how to protect Python app” only promotes that development.


回答 13

保护代码的唯一可靠方法是在您控制的服务器上运行该代码,并为客户端提供与该服务器连接的客户端。

The only reliable way to protect code is to run it on a server you control and provide your clients with a client which interfaces with that server.


回答 14

我很惊讶没有在任何答案中看到pyconcrete。也许是因为它比问题新?

它可能正是您所需要的。

它不会混淆代码,而是在加载时对其进行加密和解密。

pypi页面

保护python脚本工作流程

  • your_script.py import pyconcrete
  • pyconcrete将挂钩导入模块
  • 当脚本导入时 MODULE,pyconcrete导入钩子将尝试先查找MODULE.pye然后MODULE.pye通过解密_pyconcrete.pyd并执行解密的数据(如.pyc内容)
  • 加密和解密密钥记录_pyconcrete.pyd (例如DLL或SO),密钥将隐藏在二进制代码中,无法在十六进制视图中直接看到

I was surprised not to see pyconcrete in any answer. Maybe because it’s newer than the question?

It could be exactly what you need(ed).

Instead of obfuscating the code, it encrypts it and decrypts at load time.

From pypi page:

Protect python script work flow

  • your_script.py import pyconcrete
  • pyconcrete will hook import module
  • when your script do import MODULE, pyconcrete import hook will try to find MODULE.pye first and then decrypt MODULE.pye via _pyconcrete.pyd and execute decrypted data (as .pyc content)
  • encrypt & decrypt secret key record in _pyconcrete.pyd (like DLL or SO) the secret key would be hide in binary code, can’t see it directly in HEX view
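The general shape of such an import hook can be sketched with the standard importlib machinery. This is not pyconcrete's implementation: the XOR "cipher" is a toy stand-in for real encryption, and the in-memory ENCRYPTED_MODULES dictionary and the module name secret_mod are hypothetical stand-ins for encrypted files on disk.

```python
import importlib.abc
import importlib.util
import sys

KEY = 0x5A  # toy single-byte XOR key; NOT real encryption

def toy_encrypt(source: str) -> bytes:
    return bytes(b ^ KEY for b in source.encode("utf-8"))

def toy_decrypt(blob: bytes) -> str:
    return bytes(b ^ KEY for b in blob).decode("utf-8")

# Hypothetical store standing in for encrypted module files
ENCRYPTED_MODULES = {
    "secret_mod": toy_encrypt("def answer():\n    return 42\n"),
}

class DecryptingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds modules in ENCRYPTED_MODULES and decrypts them on import."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in ENCRYPTED_MODULES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal import system handle everything else

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = toy_decrypt(ENCRYPTED_MODULES[module.__name__])
        code = compile(source, f"<encrypted {module.__name__}>", "exec")
        exec(code, module.__dict__)

# Register the hook ahead of the standard finders
sys.meta_path.insert(0, DecryptingFinder())

import secret_mod
print(secret_mod.answer())
```

Because the decryption routine and key live in the same process, anyone who can run the program can also hook exec_module and dump the decrypted source, which is why real tools hide this step in a native library.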

回答 15

根据客户的身份,将简单的保护机制与明智的许可协议相结合将是远远的。超过任何复杂的许可/加密/模糊系统更有效。

最好的解决方案是将代码作为服务出售,例如通过托管服务或提供支持-尽管这并不总是可行的。

将代码作为.pyc文件发送将防止您的保护被一些人破坏#秒钟,但是它几乎不是有效的反盗版保护(好像有这种技术),并且最终,它应该不会实现将与公司达成体面的许可协议。

专注于使您的代码尽可能地好用-使满意的客户比防止理论上的盗版给您的公司带来更多的收益。

Depending on who the client is, a simple protection mechanism combined with a sensible license agreement will be far more effective than any complex licensing/encryption/obfuscation system.

The best solution would be selling the code as a service, say by hosting the service, or offering support – although that isn’t always practical.

Shipping the code as .pyc files will prevent your protection from being foiled by a few #s (that is, by someone commenting out the check), but it’s hardly effective anti-piracy protection (as if there were such a technology), and at the end of the day, it shouldn’t achieve anything that a decent license agreement with the company will.

Concentrate on making your code as nice to use as possible. Having happy customers will make your company far more money than preventing some theoretical piracy.


回答 16

使代码更难于窃取的另一种尝试是使用jython,然后使用java obfuscator

当jythonc将python代码转换为java,然后将java编译为字节码时,这应该可以很好地工作。因此,如果您对类进​​行了混淆处理,那么在反编译之后将很难理解其内容,更不用说恢复实际的代码了。

jython的唯一问题是您不能使用用c编写的python模块。

Another attempt to make your code harder to steal is to use jython and then use java obfuscator.

This should work pretty well, as jythonc translates Python code to Java and then Java is compiled to bytecode. So once you obfuscate the classes, it will be really hard to understand what is going on after decompilation, not to mention recovering the actual code.

The only problem with jython is that you can’t use python modules written in c.


回答 17

通过对重要文件进行散列和签名并使用公钥方法对其进行检查,使用标准的加密方案对代码签名怎么办?

这样,您可以为每个客户颁发带有公钥的许可证文件。

另外,您可以使用像这样的python混淆器(只需在Google上对其进行搜索)。

What about signing your code with standard cryptographic schemes, by hashing and signing important files and checking them with public key methods?

In this way you can issue a license file with a public key for each customer.

Additionally, you can use a Python obfuscator like this one (just googled it).
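A minimal sketch of a signed, time-limited license check follows. Important assumption: the standard library has no public-key signatures, so this uses HMAC-SHA256 with a shared secret as a stand-in. A real deployment would verify with a public key so that the signing key never ships to the customer. All names here (SECRET, issue_license, license_is_valid) are hypothetical.

```python
import datetime
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # hypothetical; stands in for a private signing key

def issue_license(customer: str, expires: datetime.date) -> dict:
    """Create a license record whose payload is covered by a signature."""
    payload = json.dumps({"customer": customer, "expires": expires.isoformat()}, sort_keys=True)
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def license_is_valid(lic: dict, today: datetime.date) -> bool:
    """Check the signature first, then the expiry date."""
    expected = hmac.new(SECRET, lic["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, lic["signature"]):
        return False  # payload was modified after signing
    expires = datetime.date.fromisoformat(json.loads(lic["payload"])["expires"])
    return today <= expires

lic = issue_license("ACME", datetime.date(2030, 1, 1))
print(license_is_valid(lic, datetime.date(2029, 6, 1)))  # True
print(license_is_valid(lic, datetime.date(2031, 1, 1)))  # False
```

Signing only makes the license file tamper-evident. As other answers point out, it does not stop someone from removing the call to the check itself.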


回答 18

您应该看看getdropbox.com上的家伙如何为他们的客户端软件(包括Linux)做到这一点。破解起来非常棘手,并且需要一些创造性的拆卸才能通过保护机制。

You should take a look at how the guys at getdropbox.com do it for their client software, including Linux. It’s quite tricky to crack and requires some quite creative disassembly to get past the protection mechanisms.


回答 19

使用Python最好的办法就是使事物变得晦涩难懂。

  • 删除所有文档字符串
  • 仅分发.pyc编译文件。
  • 冻结它
  • 在类/模块中隐藏常量,以免help(config)不能显示所有内容

您可能可以通过加密一部分并将其动态解密并将其传递给eval()来添加一些其他模糊性。但是,无论您做什么,都可以打破它。

所有这些都不会阻止坚定的攻击者拆卸字节码或使用帮助,目录等在您的api中进行挖掘。

The best you can do with Python is to obscure things.

  • Strip out all docstrings
  • Distribute only the .pyc compiled files.
  • freeze it
  • Obscure your constants inside a class/module so that help(config) doesn’t show everything

You may be able to add some additional obscurity by encrypting part of it and decrypting it on the fly and passing it to eval(). But no matter what you do someone can break it.

None of this will stop a determined attacker from disassembling the bytecode or digging through your api with help, dir, etc.
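A short sketch of why these measures only obscure things: compiling with optimization level 2 does strip docstrings, but string constants remain readable in the code object and the dis module prints the logic. The function check and the key string are made up for illustration.

```python
import dis

source = '''
def check(key):
    """Compare the key against the expected value."""
    return key == "s3cr3t-license"
'''

namespace = {}
# optimize=2 behaves like running python -OO: asserts and docstrings are removed
exec(compile(source, "<demo>", "exec", optimize=2), namespace)
check = namespace["check"]

print(check.__doc__)             # the docstring has been stripped
print(check.__code__.co_consts)  # the string constant is still present
dis.dis(check)                   # readable disassembly of the comparison
```

Anyone with the .pyc file can do the same inspection, so secrets embedded in the code should be treated as visible.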


回答 20

具有时间限制的许可证并在本地安装的程序中进行检查的想法将不起作用。即使进行了完美的混淆,也可以删除许可证检查。但是,如果您在远程系统上检查许可证并在封闭的远程系统上运行程序的重要部分,则可以保护您的IP。

为了防止竞争者将源代码用作自己的源代码或编写受启发的同一代码版本,一种保护方法是在程序逻辑中添加签名(某些秘密能够证明代码已从您那里被盗)并混淆了python源代码,因此很难阅读和利用。

良好的混淆功能为您的代码增加了基本上相同的保护,与将其编译为可执行文件(和剥离二进制文件)的保护相同。弄清楚混淆后的复杂代码的工作原理可能比实际编写自己的实现还要困难。

这无助于防止程序被黑客入侵。即使混淆了代码,许可证内容也会被破解,程序可能会被修改为具有稍微不同的行为(以将代码编译为二进制无助于保护本机程序的相同方式)。

除了符号混淆外,取消代码重构也是个好主意,如果例如调用图指向许多不同的地方,即使实际上这些不同的地方最终做同样的事情,这也会使一切变得更加混乱。

混淆代码内部的逻辑签名(例如,您可以创建由程序逻辑使用但也用作签名的值表),可以用来确定代码是否源自您。如果有人决定使用混淆的代码模块作为自己产品的一部分(即使在对其进行混淆以使其看起来有所不同之后),您也可以证明,该代码已被您的秘密签名窃取。

The idea of a time-restricted license that is checked inside a locally installed program will not work. Even with perfect obfuscation, the license check can be removed. However, if you check the license on a remote system and run a significant part of the program on your closed remote system, you will be able to protect your IP.

To prevent competitors from using the source code as their own, or from writing their own version inspired by it, one way to protect yourself is to add signatures to your program logic (secrets that let you prove the code was stolen from you) and to obfuscate the Python source code so that it is hard to read and reuse.

Good obfuscation gives your code basically the same protection as compiling it to an executable (and stripping the binary) does. Figuring out how complex obfuscated code works might be even harder than writing your own implementation.

This will not help prevent your program from being hacked. Even with obfuscated code, the license check will be cracked and the program may be modified to behave slightly differently (in the same way that compiling code to a binary does not protect native programs).

In addition to symbol obfuscation, it might be a good idea to un-refactor the code, which makes everything even more confusing if, for example, the call graph points to many different places even though those places eventually do the same thing.

A logical signature inside the obfuscated code (for example, a table of values that is used by the program logic but also serves as a signature) can be used to show that the code originated from you. If someone decides to use your obfuscated code module as part of their own product (even after re-obfuscating it to make it look different), you can use your secret signature to show that the code was stolen.


Answer 21

I have looked at software protection in general for my own projects, and the general philosophy is that complete protection is impossible. The only thing you can hope to achieve is a level of protection that would cost your customer more to bypass than it would to purchase another license.

With that said, I just checked Google for Python obfuscation and did not turn up much. In a .NET solution, obfuscation would be a first approach to your problem on a Windows platform, but I am not sure whether anyone has solutions on Linux that work with Mono.

The next thing would be to write your code in a compiled language or, if you really want to go all the way, in assembler. A stripped executable is a lot harder to decompile than code in an interpreted language.

It all comes down to tradeoffs. On one end you have the ease of software development in Python, in which it is also very hard to hide secrets. On the other end you have software written in assembler, which is much harder to write but in which it is much easier to hide secrets.

Your boss has to choose a point somewhere along that continuum that supports his requirements, and then he has to give you the tools and time to build what he wants. However, my bet is that he will object to the real development costs when weighed against the potential monetary losses.


Answer 22

Long story short:

  1. Encrypt your source code
  2. Write your own python module loader to decrypt your code when importing
  3. Implement the module loader in C/C++
  4. You can add more features to the module loader, for example anti-debugger, license control, hardware fingerprint binding, etc.

For more detail, see this answer.

If you are interested in the topic, this project will help you – pyprotect.
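Steps 1 and 2 above can be sketched in pure Python with an import hook. This is only an illustration under loud assumptions: the XOR transform is a toy stand-in for real encryption, the module name secret_mod and the in-memory store are hypothetical, and a pure-Python loader like this is trivially readable itself (which is why step 3 suggests implementing it in C/C++).

```python
import importlib.abc
import importlib.util
import sys

KEY = 0x5A  # toy key; a real scheme would use proper encryption


def xor_bytes(data, key=KEY):
    """Toy reversible transform standing in for real encryption."""
    return bytes(b ^ key for b in data)


class EncryptedLoader(importlib.abc.Loader):
    """Decrypts the stored source and executes it in the new module."""

    def __init__(self, blob):
        self.blob = blob

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        source = xor_bytes(self.blob).decode("utf-8")
        exec(compile(source, module.__name__, "exec"), module.__dict__)


class EncryptedFinder(importlib.abc.MetaPathFinder):
    """Serves modules from a dict of name -> encrypted source bytes."""

    def __init__(self, store):
        self.store = store

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.store:
            loader = EncryptedLoader(self.store[fullname])
            return importlib.util.spec_from_loader(fullname, loader)
        return None  # let the normal import machinery handle it


# "Encrypt" a tiny hypothetical module and register the finder.
store = {"secret_mod": xor_bytes(b"def answer():\n    return 42\n")}
sys.meta_path.insert(0, EncryptedFinder(store))

import secret_mod  # resolved by EncryptedFinder, decrypted on import

print(secret_mod.answer())
```

The same hook is where the extras from step 4 (license checks, anti-debugging) would be attached, since every protected import passes through exec_module.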


Answer 23

It is possible to keep the py2exe byte-code in an encrypted resource for a C launcher that loads and executes it in memory. Some ideas here and here.

Some have also thought of a self-modifying program to make reverse engineering expensive.

You can also find tutorials on preventing debuggers, making the disassembler fail, setting false debugger breakpoints, and protecting your code with checksums. Search for [“crypted code” execute “in memory”] for more links.

But as others already said, if your code is worth it, reverse engineers will succeed in the end.
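As a rough illustration of the checksum idea mentioned above, the sketch below hashes a function's bytecode and constants and notices when the function has been patched. The function names are hypothetical, and this is an assumed, simplified form of the technique rather than anything taken from the linked tutorials. A determined attacker can patch the check as well.

```python
import hashlib


def check_license():
    return "unlicensed"


def code_digest(func):
    """Hash the function's bytecode together with its constants."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# In practice this value would be recorded at build time and stored elsewhere.
EXPECTED = code_digest(check_license)


def patched():
    return "licensed"


# Simulate an attacker swapping in different code at runtime.
check_license.__code__ = patched.__code__

tampered = code_digest(check_license) != EXPECTED
print(tampered)
```

The constants are hashed along with co_code because two functions that differ only in a returned literal can share identical bytecode.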


Answer 24

If we focus on software licensing, I would recommend taking a look at another Stack Overflow answer I wrote here for some inspiration on how a license key verification system can be constructed.

There is an open-source library on GitHub that can help you with the license verification part.

You can install it with pip install licensing and then add the following code:

from licensing.methods import Key, Helpers

pubKey = "<RSAKeyValue><Modulus>sGbvxwdlDbqFXOMlVUnAF5ew0t0WpPW7rFpI5jHQOFkht/326dvh7t74RYeMpjy357NljouhpTLA3a6idnn4j6c3jmPWBkjZndGsPL4Bqm+fwE48nKpGPjkj4q/yzT4tHXBTyvaBjA8bVoCTnu+LiC4XEaLZRThGzIn5KQXKCigg6tQRy0GXE13XYFVz/x1mjFbT9/7dS8p85n8BuwlY5JvuBIQkKhuCNFfrUxBWyu87CFnXWjIupCD2VO/GbxaCvzrRjLZjAngLCMtZbYBALksqGPgTUN7ZM24XbPWyLtKPaXF2i4XRR9u6eTj5BfnLbKAU5PIVfjIS+vNYYogteQ==</Modulus><Exponent>AQAB</Exponent></RSAKeyValue>"

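# As used below, res[0] is the license key (None on failure) and res[1] is the error message
res = Key.activate(token="WyIyNTU1IiwiRjdZZTB4RmtuTVcrQlNqcSszbmFMMHB3aWFJTlBsWW1Mbm9raVFyRyJd",
                   rsa_pub_key=pubKey,
                   product_id=3349, key="ICVLD-VVSZR-ZTICT-YKGXL", machine_code=Helpers.GetMachineCode())

# Fail if no license came back or if it is bound to a different machine
if res[0] is None or not Helpers.IsOnRightMachine(res[0]):
    print("An error occurred: {0}".format(res[1]))
else:
    print("Success")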

You can read more about how the RSA public key and the other values are configured here.


Answer 25

Use the same approach that protects C/C++ binaries: obfuscate each function body in the executable or library binary, and insert a “jump” instruction at the beginning of each function entry that jumps to a special function which restores the obfuscated code. Byte-code is the binary code of a Python script, so:

  • First, compile the Python script to code objects
  • Then iterate over the code objects and obfuscate the co_code of each one as follows
    0   JUMP_ABSOLUTE            n = 3 + len(bytecode)

    3
    ...
    ... the obfuscated bytecode goes here
    ...

    n   LOAD_GLOBAL              ? (__pyarmor__)
    n+3 CALL_FUNCTION            0
    n+6 POP_TOP
    n+7 JUMP_ABSOLUTE            0
  • Save the obfuscated code objects as a .pyc or .pyo file

These obfuscated files (.pyc or .pyo) can be used by the normal Python interpreter. When one of these code objects is called for the first time:

  • The first op is JUMP_ABSOLUTE, which jumps to offset n

  • At offset n, the instructions call a PyCFunction. This function restores the obfuscated bytecode between offset 3 and n and puts the original byte-code at offset 0. The obfuscated code can be obtained with the following code

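        char *obfuscated_bytecode;
        Py_ssize_t len;
        /* The currently executing frame is the one running the obfuscated code object. */
        PyFrameObject* frame = PyEval_GetFrame();
        PyCodeObject *f_code = frame->f_code;
        PyObject *co_code = f_code->co_code;
        /* Get a pointer to the raw bytecode bytes and their length. */
        PyBytes_AsStringAndSize(co_code, &obfuscated_bytecode, &len);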
    
  • After this function returns, the last instruction jumps to offset 0, and the real byte-code is now executed.

There is a tool, Pyarmor, that obfuscates Python scripts in this way.
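The first two steps of this answer (compile a script to code objects, then visit each one) can be sketched with the standard library alone. The helper name iter_code_objects and the sample source are made up for illustration. The sketch only walks the code objects and reads co_code, it does not attempt the obfuscation itself, and it is not Pyarmor's actual implementation.

```python
import types


def iter_code_objects(code):
    """Yield a code object and, recursively, every code object nested in it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


source = (
    "def outer():\n"
    "    def inner():\n"
    "        return 1\n"
    "    return inner\n"
)

# Step 1: compile the script to a top-level code object.
top = compile(source, "<example>", "exec")

# Step 2: visit every code object; co_code holds the raw bytecode bytes
# that an obfuscator would rewrite.
names = [c.co_name for c in iter_code_objects(top)]
sizes = {c.co_name: len(c.co_code) for c in iter_code_objects(top)}
print(names)
```

Nested functions show up as constants of the enclosing code object, which is why the walk recurses through co_consts.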


Answer 26

Using cx_Freeze (a py2exe equivalent for Linux) will do the job.

http://cx-freeze.sourceforge.net/

It is available in the Ubuntu repositories.


Answer 27

There is a comprehensive answer on concealing Python source code, which can be found here.

The possible techniques discussed are:
– using compiled bytecode (python -m compileall)
– executable creators (or installers like PyInstaller)
– software as a service (the best solution for concealing your code, in my opinion)
– Python source code obfuscators
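For the first technique in the list, the compileall module can also be driven from Python rather than from the command line. The sketch below is a minimal example under assumptions: the file name hello.py and the temporary directory are placeholders, and legacy=True is chosen only so the .pyc lands next to the source instead of in __pycache__. Compiled bytecode can still be decompiled, so this hides the source only from casual inspection.

```python
import compileall
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A placeholder module to compile.
    src = pathlib.Path(tmp) / "hello.py"
    src.write_text("def greet():\n    return 'hi'\n")

    # Compile every .py file under tmp; quiet=1 prints errors only.
    ok = compileall.compile_dir(tmp, quiet=1, legacy=True)

    # With legacy=True the bytecode is written as hello.pyc beside hello.py.
    pyc_exists = (pathlib.Path(tmp) / "hello.pyc").exists()

print(bool(ok), pyc_exists)
```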