Tag archive: Python

Why can't Python find shared objects that are in directories in sys.path?

Question: Why can't Python find shared objects that are in directories in sys.path?

I’m trying to import pycurl:

$ python -c "import pycurl"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: libcurl.so.4: cannot open shared object file: No such file or directory

Now, libcurl.so.4 is in /usr/local/lib. As you can see, this is in sys.path:

$ python -c "import sys; print(sys.path)"
['', '/usr/local/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg', 
'/usr/local/lib/python25.zip', '/usr/local/lib/python2.5', 
'/usr/local/lib/python2.5/plat-linux2', '/usr/local/lib/python2.5/lib-tk', 
'/usr/local/lib/python2.5/lib-dynload', 
'/usr/local/lib/python2.5/sitepackages', '/usr/local/lib', 
'/usr/local/lib/python2.5/site-packages']

Any help will be greatly appreciated.


Answer 0

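sys.path is only searched for Python modules. Dynamically linked libraries are located by the system's dynamic loader, which looks in the directories listed in LD_LIBRARY_PATH as well as the system's configured library directories. Check if your LD_LIBRARY_PATH includes /usr/local/lib, and if it doesn’t, add it and try again.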

Some more information (source):

In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated set of directories where libraries should be searched for first, before the standard set of directories; this is useful when debugging a new library or using a nonstandard library for special purposes. The environment variable LD_PRELOAD lists shared libraries with functions that override the standard set, just as /etc/ld.so.preload does. These are implemented by the loader /lib/ld-linux.so. I should note that, while LD_LIBRARY_PATH works on many Unix-like systems, it doesn’t work on all; for example, this functionality is available on HP-UX but as the environment variable SHLIB_PATH, and on AIX this functionality is through the variable LIBPATH (with the same syntax, a colon-separated list).

Update: to set LD_LIBRARY_PATH, use one of the following, ideally in your ~/.bashrc or equivalent file:

export LD_LIBRARY_PATH=/usr/local/lib

or

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Use the first form if it’s empty (equivalent to the empty string, or not present at all), and the second form if it isn’t. Note the use of export.
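
The "check if your LD_LIBRARY_PATH includes /usr/local/lib" step can also be done from Python. This is only a minimal sketch: the helper names library_path_dirs and has_library_dir are made up for illustration, and empty entries in the variable are simply dropped, which is a simplification.

```python
import os


def library_path_dirs(value):
    """Split a colon-separated LD_LIBRARY_PATH value into directories.

    Empty entries are dropped (a simplification); an unset or empty
    variable gives an empty list.
    """
    return [d for d in (value or "").split(":") if d]


def has_library_dir(directory, value=None):
    """Return True if `directory` is listed in `value`.

    When `value` is None, the current process's LD_LIBRARY_PATH is used.
    """
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(d) == wanted for d in library_path_dirs(value))


# The result depends on the environment the script is started from.
print(has_library_dir("/usr/local/lib"))
```

Because the variable is read from the process environment, it has to be exported in the shell that starts Python for this check (and the import) to see it.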


Answer 1

Ensure your libcurl.so module is in the system library path, which is distinct and separate from the python library path.

A “quick fix” is to add this path to the LD_LIBRARY_PATH variable. However, setting that system wide (or even account wide) is a BAD IDEA, as it is possible to set it in such a way that some programs will find a library they shouldn’t, or even worse, open up security holes.

If your “locally installed libraries” are installed in, for example, /usr/local/lib, add this directory to /etc/ld.so.conf (it’s a text file) and run “ldconfig”.

The command will run a caching utility, but will also create all the necessary “symbolic links” required for the loader system to function. It is surprising that the “make install” for libcurl did not do this already, but it’s possible it could not if /usr/local/lib is not in /etc/ld.so.conf already.

PS: it’s possible that your /etc/ld.so.conf contains nothing but “include ld.so.conf.d/*.conf”. You can still add a directory path after it, or just create a new file inside the directory it’s being included from. Don’t forget to run “ldconfig” afterwards.

Be careful. Getting this wrong can screw up your system.

Additionally: make sure your python module is compiled against THAT version of libcurl. If you just copied some files over from another system, this won’t always work. If in doubt, compile your modules on the system you intend to run them on.
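
After running ldconfig, one way to check from Python whether the library can be located by name is ctypes.util.find_library from the standard library. This is a rough sketch, not a definitive test: on Linux find_library relies on external tools such as ldconfig, its result depends on the system, and the helper name describe_library is made up for illustration.

```python
from ctypes.util import find_library


def describe_library(name):
    """Report whether a shared library can be located by name.

    `name` is given without the "lib" prefix or the ".so" suffix,
    e.g. "curl" for libcurl.
    """
    found = find_library(name)
    if found is None:
        return "lib%s: not found" % name
    return "lib%s: found as %s" % (name, found)


# Output depends on what is installed and registered on this machine.
print(describe_library("curl"))
```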


Answer 2

You can also set LD_RUN_PATH to /usr/local/lib in your user environment when you compile pycurl in the first place. This will embed /usr/local/lib in the RPATH attribute of the C extension module .so so that it automatically knows where to find the library at run time without having to have LD_LIBRARY_PATH set at run time.


Answer 3

Had the exact same issue. I installed curl 7.19 to /opt/curl/ to make sure that I would not affect current curl on our production servers. Once I linked libcurl.so.4 to /usr/lib:

sudo ln -s /opt/curl/lib/libcurl.so /usr/lib/libcurl.so.4

I still got the same error! Durf.

But running ldconfig created the link for me, and that worked. No need to set LD_RUN_PATH or LD_LIBRARY_PATH at all. I just needed to run ldconfig.


Answer 4

As a supplement to the above answers: I just bumped into a similar problem while working entirely off the default installed Python.

When I call the example of the shared object library I’m looking for with LD_LIBRARY_PATH, I get something like this:

$ LD_LIBRARY_PATH=/path/to/mysodir:$LD_LIBRARY_PATH python example-so-user.py
python: can't open file 'example-so-user.py': [Errno 2] No such file or directory

Notably, it doesn’t even complain about the import – it complains about the source file!

But if I force loading of the object using LD_PRELOAD:

$ LD_PRELOAD=/path/to/mysodir/mypyobj.so python example-so-user.py
python: error while loading shared libraries: libtiff.so.5: cannot open shared object file: No such file or directory

… I immediately get a more meaningful error message – about a missing dependency!

Just thought I’d jot this down here – cheers!


Answer 5

I use python setup.py build_ext -R/usr/local/lib -I/usr/local/include/libcalg-1.0, and the compiled .so file ends up under the build folder. You can type python setup.py --help build_ext to see the explanations of -R and -I.


Answer 6

What works for me here is using a version manager such as pyenv, which I strongly recommend for keeping your project environments and package versions well managed and separate from those of the operating system.

I had this same error after an OS update, but it was easily fixed with pyenv install 3.7-dev (the version I use).


Python Dictionary to URL Parameters

Question: Python Dictionary to URL Parameters

I am trying to convert a Python dictionary to a string for use as URL parameters. I am sure that there is a better, more Pythonic way of doing this. What is it?

x = ""
for key, val in {'a':'A', 'b':'B'}.items():
    x += "%s=%s&" %(key,val)
x = x[:-1]

Answer 0

In Python 2, use urllib.urlencode(). It takes a dictionary of key-value pairs, and converts it into a form suitable for a URL (e.g., key1=val1&key2=val2).

In Python 3, use urllib.parse.urlencode() instead.

If you want to make a URL with repeated parameters, such as p=1&p=2&p=3, you have two options. Pass a sequence of pairs:

>>> import urllib
>>> a = (('p',1),('p',2), ('p', 3))
>>> urllib.urlencode(a)
'p=1&p=2&p=3'

or pass a dictionary whose value is a list, together with doseq=True:

>>> urllib.urlencode({'p': [1, 2, 3]}, doseq=True)
'p=1&p=2&p=3'
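
For Python 3, the same calls can be sketched with urllib.parse.urlencode. The sample dictionaries are illustrative only; the second call shows that values are percent-encoded, which the hand-written loop in the question does not do.

```python
from urllib.parse import urlencode

params = {'a': 'A', 'b': 'B'}
print(urlencode(params))                          # a=A&b=B

# Values are escaped, so spaces and '&' inside a value are safe.
print(urlencode({'q': 'a b&c'}))                  # q=a+b%26c

# Repeated parameters: a sequence of pairs, or a list value with doseq=True.
print(urlencode([('p', 1), ('p', 2), ('p', 3)]))  # p=1&p=2&p=3
print(urlencode({'p': [1, 2, 3]}, doseq=True))    # p=1&p=2&p=3
```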

Answer 1

Use the 3rd party Python url manipulation library furl:

import furl

f = furl.furl('')
f.args = {'a':'A', 'b':'B'}
print(f.url) # prints ... '?a=A&b=B'

If you want repetitive parameters, you can do the following:

f = furl.furl('')
f.args = [('a', 'A'), ('b', 'B'),('b', 'B2')]
print(f.url) # prints ... '?a=A&b=B&b=B2'

Answer 2

This seems a bit more Pythonic to me, and doesn’t use any other modules:

x = '&'.join(["{}={}".format(k, v) for k, v in {'a':'A', 'b':'B'}.items()])

scipy.misc module has no attribute imread?

Question: scipy.misc module has no attribute imread?

I am trying to read an image with scipy. However it does not accept the scipy.misc.imread part. What could be the cause of this?

>>> import scipy
>>> scipy.misc
<module 'scipy.misc' from 'C:\Python27\lib\site-packages\scipy\misc\__init__.pyc'>
>>> scipy.misc.imread('test.tif')
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    scipy.misc.imread('test.tif')
AttributeError: 'module' object has no attribute 'imread'

Answer 0

You need to install Pillow (formerly PIL). From the docs on scipy.misc:

Note that Pillow is not a dependency of SciPy but the image manipulation functions indicated in the list below are not available without it:

imread

After installing Pillow, I was able to access imread as follows:

In [1]: import scipy.misc

In [2]: scipy.misc.imread
Out[2]: <function scipy.misc.pilutil.imread>

Answer 1

imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use imageio.imread instead.

import imageio
im = imageio.imread('astronaut.png')
print(im.shape)  # im is a NumPy array; for this image the shape is (512, 512, 3)
imageio.imwrite('astronaut-gray.jpg', im[:, :, 0])

Answer 2

imread was deprecated in SciPy 1.0.0 and is gone from version 1.2.0 onwards! So to solve this issue I had to install version 1.1.0.

pip install scipy==1.1.0

Answer 3

For Python 3, it is best to use imread in matplotlib.pyplot:

from matplotlib.pyplot import imread

Answer 4

In case anyone encounters the same issue: uninstall scipy and install scipy==1.1.0.

$ pip uninstall scipy

$ pip install scipy==1.1.0

Answer 5

You need the Python Imaging Library (PIL) but alas! the PIL project seems to have been abandoned. In particular, it hasn’t been ported to Python 3. So if you want PIL functionality in Python 3, you’ll do well to use Pillow, which is the semi-official fork of PIL and appears to be actively developed. Actually, if you need a modern PIL implementation at all I’d recommend Pillow. It’s as simple as pip install pillow. As it uses the same namespace as PIL it’s essentially a drop-in replacement.

How “semi-official” is this fork? you may ask. The About page of the Pillow docs says this:

As more time passes since the last PIL release, the likelihood of a new PIL release decreases. However, we’ve yet to hear an official “PIL is dead” announcement. So if you still want to support PIL, please report issues here first, then open corresponding Pillow tickets here.

Please provide a link to the first ticket so we can track the issue(s) upstream.

However, the most recent PIL release on the official PIL site is dated November 15, 2009. I think we can safely proclaim Pillow as the successor of PIL after (as of this writing) nearly eight years of no new releases. So even if you don’t need Python 3 support, I suggest you eschew the ancient PIL 1.1.6 distribution available in PyPI and just install fresh, up-to-date, compatible Pillow.


Answer 6

Install the Pillow library with the following command:

pip install pillow

Note that the selected answer is outdated. See the SciPy docs:

Note that Pillow (https://python-pillow.org/) is not a dependency of SciPy, but the image manipulation functions indicated in the list below are not available without it.


Answer 7

As already answered, misc.imread is deprecated in SciPy 1.0.0 and will be removed in 1.2.0. imageio is one option; it returns an object of type:

<class 'imageio.core.util.Image'>

But instead of imageio, you can use cv2:

import cv2
im = cv2.imread('astronaut.png')

im will be of type <class 'numpy.ndarray'>, and numpy arrays are faster to compute with.


Answer 8

imread uses the PIL library; if the library is installed, use: “from scipy.ndimage import imread”

Source: http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.ndimage.imread.html


Answer 9

python -m pip install pillow

This worked for me.


Answer 10

You need a Python imaging library. The original PIL alone is no longer enough, so you’d better install Pillow. This works well.


Answer 11

Running the following in a Jupyter Notebook, I had a similar error message:

from skimage import data
photo_data = misc.imread('C:/Users/ers.jpg')
type(photo_data)

‘error’ msg:

D:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\ipykernel_launcher.py:3: DeprecationWarning: imread is deprecated! imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use imageio.imread instead. This is separate from the ipykernel package so we can avoid doing imports until

I solved it using the following:

import matplotlib.pyplot
photo_data = matplotlib.pyplot.imread('C:/Users/ers.jpg')
type(photo_data)

Answer 12

I have all the packages required for the image extraction on jupyter notebook, but even then it shows me the same error.

Error on Jupyter Notebook

Reading the above comments, I have installed the required packages. Please do tell if I have missed some packages.

pip3 freeze | grep -i -E "pillow|scipy|scikit-image"
Pillow==5.4.1
scikit-image==0.14.2
scipy==1.2.1
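
The listing above shows scipy 1.2.1. According to the deprecation notice quoted in the earlier answers, imread was due to be removed in 1.2.0, so having Pillow installed would not bring scipy.misc.imread back on that version. A small sketch of that version check, where the helper name misc_imread_removed is hypothetical:

```python
def misc_imread_removed(scipy_version):
    """Return True if `scipy_version` (e.g. "1.2.1") is 1.2.0 or later.

    Those are the releases from which scipy.misc.imread is gone,
    going by the deprecation notice quoted in the answers above.
    """
    major, minor = (int(part) for part in scipy_version.split(".")[:2])
    return (major, minor) >= (1, 2)


print(misc_imread_removed("1.2.1"))  # True
print(misc_imread_removed("1.1.0"))  # False
```

With an installed SciPy, the string to pass in would be scipy.__version__.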

Answer 13

The solution that worked for me in Python 3.6 is the following:

py -m pip install Pillow


BeautifulSoup Grab Visible Webpage Text

Question: BeautifulSoup Grab Visible Webpage Text

Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. And I mainly want to just get the body text (article) and maybe even a few tab names here and there. I have tried the suggestion in this SO question that returns lots of <script> tags and html comments which I don’t want. I can’t figure out the arguments I need for the function findAll() in order to just get the visible texts on a webpage.

So, how should I find all visible text excluding scripts, comments, css etc.?


Answer 0

Try this:

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request


def tag_visible(element):
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    return True


def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    texts = soup.findAll(text=True)
    visible_texts = filter(tag_visible, texts)  
    return u" ".join(t.strip() for t in visible_texts)

html = urllib.request.urlopen('http://www.nytimes.com/2009/12/21/us/21storm.html').read()
print(text_from_html(html))
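
For comparison, the same filtering idea can be sketched with only the standard library's html.parser, without BeautifulSoup. This is a swapped-in technique, not what the answer above uses; the class and function names are made up for illustration, and it assumes reasonably well-formed HTML (an unclosed head or title would hide everything after it).

```python
from html.parser import HTMLParser

# Elements whose text content is not rendered as page text.
INVISIBLE = {'style', 'script', 'head', 'title'}


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside an INVISIBLE element."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # > 0 while inside an INVISIBLE element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in INVISIBLE:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in INVISIBLE and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # Comments are routed to handle_comment(), so they never arrive here.
        text = data.strip()
        if text and self.hidden_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)


sample = ('<html><head><title>T</title><style>p {color: red}</style></head>'
          '<body><!-- note --><p>Hello</p><script>var x = 1;</script>'
          '<p>world</p></body></html>')
print(visible_text(sample))  # Hello world
```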

Answer 1

The approved answer from @jbochi does not work for me. The str() function call raises an exception because it cannot encode the non-ascii characters in the BeautifulSoup element. Here is a more succinct way to filter the example web page to visible text.

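from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (the bs4 package)

html = open('21storm.html').read()
soup = BeautifulSoup(html, 'html.parser')
# remove the elements whose text is never rendered
for s in soup(['style', 'script', '[document]', 'head', 'title']):
    s.extract()
visible_text = soup.getText()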

Answer 2

import urllib.request
from bs4 import BeautifulSoup

url = "https://www.yahoo.com"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# kill all script and style elements
for script in soup(["script", "style"]):
    script.extract()    # rip it out

# get text
text = soup.get_text()

# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)

Answer 3

I completely respect using Beautiful Soup to get rendered content, but it may not be the ideal package for acquiring the rendered content on a page.

I had a similar problem getting the rendered content, that is, the content visible in a typical browser. In particular, I had many perhaps atypical cases to work with, such as the simple example below. In this case the non-displayable tag is nested in a style tag, and is not visible in the many browsers that I have checked. Other variations exist, such as defining a class that sets display to none and then using this class for the div.

<html>
  <title>  Title here</title>

  <body>

    lots of text here <p> <br>
    <h1> even headings </h1>

    <style type="text/css"> 
        <div > this will not be visible </div> 
    </style>


  </body>

</html>

One solution posted above is:

html = Utilities.ReadFile('simple.html')
soup = BeautifulSoup.BeautifulSoup(html)
texts = soup.findAll(text=True)
visible_texts = filter(visible, texts)
print(visible_texts)


[u'\n', u'\n', u'\n\n        lots of text here ', u' ', u'\n', u' even headings ', u'\n', u' this will not be visible ', u'\n', u'\n']

This solution certainly has applications in many cases and generally does the job quite well, but for the html posted above it retains the text that is not rendered. After searching SO, a couple of solutions came up: “BeautifulSoup get_text does not strip all tags and JavaScript” and “Rendered HTML to plain text using Python”.

I tried both these solutions: html2text and nltk.clean_html and was surprised by the timing results so thought they warranted an answer for posterity. Of course, the speeds highly depend on the contents of the data…

One answer here from @Helge was about using nltk of all things.

import nltk

%timeit nltk.clean_html(html)
was returning 153 us per loop

It worked really well to return a string with rendered html. This nltk module was faster than even html2text, though perhaps html2text is more robust.

betterHTML = html.decode(errors='ignore')
%timeit html2text.html2text(betterHTML)
3.09 ms per loop

Answer 4

If you care about performance, here’s another more efficient way:

import re

INVISIBLE_ELEMS = ('style', 'script', 'head', 'title')
RE_SPACES = re.compile(r'\s{3,}')

def visible_texts(soup):
    """ get visible text from a document """
    text = ' '.join([
        s for s in soup.strings
        if s.parent.name not in INVISIBLE_ELEMS
    ])
    # collapse multiple spaces to two spaces.
    return RE_SPACES.sub('  ', text)

soup.strings is a generator that yields NavigableString objects, so you can check the parent’s tag name directly, without going through multiple loops.


Answer 5

The title is inside an <nyt_headline> tag, which is nested inside an <h1> tag and a <div> tag with id “article”.

soup.findAll('nyt_headline', limit=1)

Should work.

The article body is inside an <nyt_text> tag, which is nested inside a <div> tag with id “articleBody”. Inside the <nyt_text> element, the text itself is contained within <p> tags. Images are not within those <p> tags. It’s difficult for me to experiment with the syntax, but I expect a working scrape to look something like this.

text = soup.findAll('nyt_text', limit=1)[0]
text.findAll('p')

Answer 6

While I would completely suggest using beautiful-soup in general, if anyone is looking to display the visible parts of malformed html (e.g. where you have just a segment or line of a web-page) for whatever reason, the following will remove the tags themselves, i.e. everything from a < to the next >:

import re   ## only use with malformed html - this is not efficient
def display_visible_html_using_re(text):
    return re.sub(r"<.*?>", "", text)
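
A quick check of what this kind of pattern does and does not remove. This is a sketch only, and the names TAG_RE and strip_tags are made up for illustration. The tags disappear, but the text between them stays, including the body of a script element.

```python
import re

# Non-greedy match of anything that looks like a tag on a single line.
TAG_RE = re.compile(r'<.*?>')


def strip_tags(text):
    """Delete tag-like substrings; the text between tags is kept."""
    return TAG_RE.sub('', text)


print(strip_tags('<p>Hello <b>world</b></p>'))             # Hello world
print(strip_tags('<p>Hi</p><script>var x = 1;</script>'))  # Hivar x = 1;
```

So on its own this approach does not hide script or style content; that text would have to be filtered out separately.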

Answer 7

With BeautifulSoup, the easiest way, with the least code, to get just the strings without empty lines and crap:

tag = "<Parent_Tag_that_contains_the_data>"  # placeholder for the HTML of the parent tag that contains the data
soup = BeautifulSoup(tag, 'html.parser')

for i in soup.stripped_strings:
    print(repr(i))

Answer 8

The simplest way to handle this case is by using getattr(). You can adapt this example to your needs:

from bs4 import BeautifulSoup

source_html = """
<span class="ratingsDisplay">
    <a class="ratingNumber" href="https://www.youtube.com/watch?v=oHg5SJYRHA0" target="_blank" rel="noopener">
        <span class="ratingsContent">3.7</span>
    </a>
</span>
"""

soup = BeautifulSoup(source_html, "lxml")
my_ratings = getattr(soup.find('span', {"class": "ratingsContent"}), "text", None)
print(my_ratings)

This will find the text, "3.7", within the tag object <span class="ratingsContent">3.7</span> when it exists, and default to None when it does not.

getattr(object, name[, default])

Return the value of the named attribute of object. name must be a string. If the string is the name of one of the object’s attributes, the result is the value of that attribute. For example, getattr(x, 'foobar') is equivalent to x.foobar. If the named attribute does not exist, default is returned if provided, otherwise, AttributeError is raised.
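
The default-value behaviour can be seen without any HTML at all. In this sketch the Tag class is a made-up stand-in for a parsed element with a .text attribute, and None plays the role of a search that found nothing.

```python
class Tag:
    """Stand-in for a parsed element that has a .text attribute."""

    def __init__(self, text):
        self.text = text


def text_or_none(node):
    # `node` may be None, as when a search finds no matching element;
    # None has no .text attribute, so the default is returned instead.
    return getattr(node, "text", None)


print(text_or_none(Tag("3.7")))  # 3.7
print(text_or_none(None))        # None
```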


回答 9

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request
import re
import ssl

def tag_visible(element):
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    if re.match(r"[\n]+",str(element)): return False
    return True
def text_from_html(url):
    body = urllib.request.urlopen(url,context=ssl._create_unverified_context()).read()
    soup = BeautifulSoup(body ,"lxml")
    texts = soup.findAll(text=True)
    visible_texts = filter(tag_visible, texts)  
    text = u",".join(t.strip() for t in visible_texts)
    text = text.lstrip().rstrip()
    text = text.split(',')
    clean_text = ''
    for sen in text:
        if sen:
            sen = sen.rstrip().lstrip()
            clean_text += sen+','
    return clean_text
url = 'http://www.nytimes.com/2009/12/21/us/21storm.html'
print(text_from_html(url))

Python-write()与writelines()和串联字符串

问题:Python-write()与writelines()和串联字符串

所以我正在学习Python。我正在上课,遇到一个问题,我不得不将很多压缩target.write()成一个write(),同时"\n"在每个用户输入变量(的对象write())之间都有一个。

我想出了:

nl = "\n"
lines = line1, nl, line2, nl, line3, nl
textdoc.writelines(lines)

如果我尝试这样做:

textdoc.write(lines)

我得到一个错误。但是如果我输入:

textdoc.write(line1 + "\n" + line2 + ....)

然后工作正常。为什么我不能在其中使用字符串作为换行符,write()但可以在其中使用呢writelines()

Python 2.7当我搜索google时,发现的大部分资源都超出了我的想象力,我仍然是一个外行。

So I'm learning Python. I am going through the lessons and ran into a problem where I had to condense a great many target.write() calls into a single write(), while having a "\n" between each user input variable (the object of write()).

I came up with:

nl = "\n"
lines = line1, nl, line2, nl, line3, nl
textdoc.writelines(lines)

If I try to do:

textdoc.write(lines)

I get an error. But if I type:

textdoc.write(line1 + "\n" + line2 + ....)

Then it works fine. Why am I unable to use a string for a newline in write() but I can use it in writelines()?

Python 2.7. When I searched Google, most resources I found were way over my head; I'm still a layperson.


回答 0

  • writelines 期待字符串的迭代
  • write 需要一个字符串。

line1 + "\n" + line2将这些字符串合并到一个字符串中,然后再传递给write

请注意,如果您有很多行,则可能要使用"\n".join(list_of_lines)

  • writelines expects an iterable of strings
  • write expects a single string.

line1 + "\n" + line2 merges those strings together into a single string before passing it to write.

Note that if you have many lines, you may want to use "\n".join(list_of_lines).
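
The difference between the two methods can be checked with an in-memory file. This is a sketch assuming Python 3 (the question used 2.7); io.StringIO stands in for textdoc, and the line1/line2/line3 values are made up.

```python
import io

line1, line2, line3 = "first", "second", "third"

# write() takes exactly one string.
buf_write = io.StringIO()
buf_write.write(line1 + "\n" + line2 + "\n" + line3 + "\n")

# writelines() takes an iterable of strings and adds nothing between them.
buf_writelines = io.StringIO()
buf_writelines.writelines((line1, "\n", line2, "\n", line3, "\n"))

# join() builds the single string for write() from a list of lines.
buf_join = io.StringIO()
buf_join.write("\n".join([line1, line2, line3]) + "\n")

# Passing a tuple to write() is the error from the question.
try:
    io.StringIO().write((line1, "\n", line2))
    write_error = None
except TypeError as exc:
    write_error = exc

print(buf_write.getvalue() == buf_writelines.getvalue() == buf_join.getvalue())
print(type(write_error).__name__)
```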


回答 1

为什么我不能在write()中将字符串用于换行符,但可以在writelines()中使用它?

想法如下:如果要编写单个字符串,可以使用write()。如果您有一系列字符串,则可以使用编写所有字符串writelines()

write(arg)需要一个字符串作为参数并将其写入文件。如果您提供字符串列表,它将引发异常(顺便说一下,向我们显示错误!)。

writelines(arg)期望将iterable作为参数(在最一般的意义上,可迭代对象可以是元组,列表,字符串或迭代器)。迭代器中包含的每个项目均应为字符串。您提供的是一个字符串元组,因此一切正常。

字符串的性质对两个函数都无关紧要,即,无论您提供什么字符串,它们都只会写入文件。有趣的是,writelines()它本身并不添加换行符,因此方法名称实际上可能会造成很大的混乱。实际上,它的行为类似于一个称为的虚构方法write_all_of_these_strings(sequence)

接下来是Python中的一种惯用方式,将字符串列表写入文件,同时将每个字符串保留在自己的行中:

lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))

这将为您关闭文件。该构造'\n'.join(lines)将列表中的字符串连接(连接),lines并使用字符“ \ n”作为粘合。比使用+运算符更有效。

从相同的lines序列开始,以相同的输出结束,但使用writelines()

lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)

这利用了生成器表达式并动态创建了以换行符结尾的字符串。writelines()遍历此字符串序列并写入每个项目。

编辑:您应该注意的另一点:

write()并且readlines()writelines()引入之前就存在。writelines()后来作为的对应版本引入readlines(),以便人们可以轻松地编写通过readlines()以下方式读取的文件内容:

outfile.writelines(infile.readlines())

确实,这就是为什么使用writelines如此混乱的名称的主要原因。而且,今天,我们真的不再想要使用此方法。readlines()writelines()开始写入数据之前,将整个文件读取到计算机的内存中。首先,这可能会浪费时间。为什么不阅读其他部分就开始写部分数据呢?但是,最重要的是,这种方法可能会占用大量内存。在极端情况下,如果输入文件大于计算机的内存,则此方法甚至不起作用。解决此问题的方法是仅使用迭代器。一个工作示例:

with open('inputfile') as infile:
    with open('outputfile', 'w') as outfile:
        for line in infile:
            outfile.write(line)

这将逐行读取输入文件。读取一行后,该行即被写入输出文件。从概念上讲,内存中始终只有一行(相比之下,在采用读取行/写入行方法的情况下,整个文件内容都在内存中)。

Why am I unable to use a string for a newline in write() but I can use it in writelines()?

The idea is the following: if you want to write a single string you can do this with write(). If you have a sequence of strings you can write them all using writelines().

write(arg) expects a string as argument and writes it to the file. If you provide a list of strings, it will raise an exception (by the way, show errors to us!).

writelines(arg) expects an iterable as argument (an iterable object can be a tuple, a list, a string, or an iterator in the most general sense). Each item contained in the iterable is expected to be a string. A tuple of strings is what you provided, so things worked.

The nature of the string(s) does not matter to either function, i.e. they just write to the file whatever you provide them. The interesting part is that writelines() does not add newline characters on its own, so the method name can actually be quite confusing. It actually behaves like an imaginary method called write_all_of_these_strings(sequence).

What follows is an idiomatic way in Python to write a list of strings to a file while keeping each string in its own line:

lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))

This takes care of closing the file for you. The construct '\n'.join(lines) concatenates (connects) the strings in the list lines and uses the character '\n' as glue. It is more efficient than using the + operator.

Starting from the same lines sequence, ending up with the same output, but using writelines():

lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)

This makes use of a generator expression and dynamically creates newline-terminated strings. writelines() iterates over this sequence of strings and writes every item.

Edit: Another point you should be aware of:

write() and readlines() existed before writelines() was introduced. writelines() was introduced later as a counterpart of readlines(), so that one could easily write the file content that was just read via readlines():

outfile.writelines(infile.readlines())

Really, this is the main reason why writelines has such a confusing name. Also, today, we do not really want to use this method anymore. readlines() reads the entire file to the memory of your machine before writelines() starts to write the data. First of all, this may waste time. Why not start writing parts of data while reading other parts? But, most importantly, this approach can be very memory consuming. In an extreme scenario, where the input file is larger than the memory of your machine, this approach won’t even work. The solution to this problem is to use iterators only. A working example:

with open('inputfile') as infile:
    with open('outputfile', 'w') as outfile:
        for line in infile:
            outfile.write(line)

This reads the input file line by line. As soon as one line is read, this line is written to the output file. Conceptually, there is only ever a single line in memory (compared to the entire file content being in memory in case of the readlines/writelines approach).
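
A self-contained version of the line-by-line copy, as a sketch only: the file names live in a throwaway temporary directory and the three sample lines are made up. Note that the output file has to be opened in write mode ("w"), since open() defaults to read-only.

```python
import os
import tempfile

# Hypothetical file names inside a throwaway directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "inputfile")
dst = os.path.join(workdir, "outputfile")

# Create a small input file to copy.
with open(src, "w") as f:
    f.writelines("%s\n" % l for l in ["line1", "line2", "line3"])

# Copy it one line at a time; readlines() is never called.
with open(src) as infile, open(dst, "w") as outfile:
    for line in infile:
        outfile.write(line)

with open(dst) as f:
    copied = f.read()

print(repr(copied))
```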


回答 2

如果您只想保存和加载列表,请尝试Pickle

泡菜保存:

with open("yourFile","wb")as file:
 pickle.dump(YourList,file)

和加载:

with open("yourFile","rb")as file:
 YourList=pickle.load(file)

If you just want to save and load a list, try pickle.

Saving with pickle:

import pickle

with open("yourFile", "wb") as file:
    pickle.dump(YourList, file)

and loading:

with open("yourFile", "rb") as file:
    YourList = pickle.load(file)
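
A round trip through pickle can be checked as follows. This is a sketch with a made-up list and a file path in a temporary directory. Keep in mind that the resulting file is binary rather than human-readable text, and that pickle.load should only be used on data you trust.

```python
import os
import pickle
import tempfile

your_list = ["line1", "line2", "line3"]

# Hypothetical file path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "yourFile")

with open(path, "wb") as f:  # pickle writes bytes, hence "wb"
    pickle.dump(your_list, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == your_list)
```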

回答 3

实际上,我认为问题在于您的变量“行”不好。您将行定义为元组,但是我相信write()需要一个字符串。您所要做的就是将逗号变成加号(+)。

nl = "\n"
lines = line1+nl+line2+nl+line3+nl
textdoc.write(lines)

应该管用。

Actually, I think the problem is that your variable “lines” is bad. You defined lines as a tuple, but I believe that write() requires a string. All you have to change is your commas into pluses (+).

nl = "\n"
lines = line1+nl+line2+nl+line3+nl
textdoc.write(lines)

should work.


回答 4

习德(Zed Shaw)的书中的练习16?您可以使用转义符,如下所示:

paragraph1 = "%s \n %s \n %s \n" % (line1, line2, line3)
target.write(paragraph1)
target.close()

Exercise 16 from Zed Shaw’s book? You can use escape characters as follows:

paragraph1 = "%s \n %s \n %s \n" % (line1, line2, line3)
target.write(paragraph1)
target.close()

为什么在Python类中使用__init__?

问题:为什么在Python类中使用__init__?

我在理解类的初始化时遇到了麻烦。

它们的意义何在?我们如何知道其中包含什么?用类编写与创建函数相比是否需要不同的思维方式(我认为我可以只创建函数,然后将它们包装在类中,以便我可以重用它们。这行得通吗?)

这是一个例子:

class crawler:
  # Initialize the crawler with the name of database
  def __init__(self,dbname):
    self.con=sqlite.connect(dbname)

  def __del__(self):
    self.con.close()

  def dbcommit(self):
    self.con.commit()

或另一个代码示例:

class bicluster:
  def __init__(self,vec,left=None,right=None,distance=0.0,id=None):
    self.left=left
    self.right=right
    self.vec=vec
    self.id=id
    self.distance=distance

__init__尝试阅读别人的代码时遇到了很多类,但是我不理解创建它们的逻辑。

I am having trouble understanding the Initialization of classes.

What’s the point of them and how do we know what to include in them? Does writing in classes require a different type of thinking versus creating functions (I figured I could just create functions and then just wrap them in a class so I can re-use them. Will that work?)

Here’s an example:

class crawler:
  # Initialize the crawler with the name of database
  def __init__(self,dbname):
    self.con=sqlite.connect(dbname)

  def __del__(self):
    self.con.close()

  def dbcommit(self):
    self.con.commit()

Or another code sample:

class bicluster:
  def __init__(self,vec,left=None,right=None,distance=0.0,id=None):
    self.left=left
    self.right=right
    self.vec=vec
    self.id=id
    self.distance=distance

There are so many classes with __init__ I come across when trying to read other people’s code, but I don’t understand the logic in creating them.


回答 0

根据您所写的内容,您缺少一个关键的理解:类和对象之间的区别。__init__不初始化类,而是初始化类或对象的实例。每只狗都有颜色,但同级狗却没有。每只狗的脚长为四脚或更少,但没有一类。类是对象的概念。当您看到Fido和Spot时,就会认识到它们的相似之处,即狗狗般的风格。那是类。

当你说

class Dog:
    def __init__(self, legs, colour):
        self.legs = legs
        self.colour = colour

fido = Dog(4, "brown")
spot = Dog(3, "mostly yellow")

您的意思是,Fido是一只有四只腿的棕色狗,而Spot有点of弱,而且大多是黄色的。该__init__函数称为构造函数或初始化器,当您创建类的新实例时会自动调用该函数。在该函数内,将新创建的对象分配给parameter self。表示法self.legslegs变量中称为对象的属性self。属性有点像变量,但是它们描述对象的状态或对象可用的特定操作(功能)。

但是,请注意,您并没有设置colour狗狗身份-这是一个抽象概念。有一些对类有意义的属性。例如,population_size就是这样的一种-计算Fido没什么意义,因为Fido总是一个。数狗确实很有意义。让我们说世界上有2亿只狗。这是Dog类的属性。菲多(Fido)与2亿数字无关,与Spot也无关。与“ colourlegs以上”的“实例属性”相对,它称为“类属性” 。

现在,要少一些东西,更多地与编程有关。正如我在下面写的,添加东西的类是不明智的-它是什么类?Python中的类由行为相似的不同数据集合组成。狗类包括Fido和Spot和其他与它们相似的动物199999999998,它们都在路灯柱上撒尿。添加事物的类由什么组成?它们固有的哪些数据不同?他们分享什么行动?

但是,数字…这些是更有趣的主题。说,整数。有很多,比狗还多。我知道Python已经有整数,但是让我们再次玩哑巴并“实现”它们(通过作弊并使用Python的整数)。

因此,整数是一类。他们有一些数据(值)和一些行为(“将我添加到另一个数字”)。让我们展示一下:

class MyInteger:
    def __init__(self, newvalue):
        # imagine self as an index card.
        # under the heading of "value", we will write
        # the contents of the variable newvalue.
        self.value = newvalue
    def add(self, other):
        # when an integer wants to add itself to another integer,
        # we'll take their values and add them together,
        # then make a new integer with the result value.
        return MyInteger(self.value + other.value)

three = MyInteger(3)
# three now contains an object of class MyInteger
# three.value is now 3
five = MyInteger(5)
# five now contains an object of class MyInteger
# five.value is now 5
eight = three.add(five)
# here, we invoked the three's behaviour of adding another integer
# now, eight.value is three.value + five.value = 3 + 5 = 8
print eight.value
# ==> 8

这有点脆弱(我们假设other将是MyInteger),但是现在我们将忽略它。在实际代码中,我们不会;我们将对其进行测试以确保,甚至可以强制它(“您不是整数吗?天哪,您有10纳秒成为一个!9 … 8 ….”)

我们甚至可以定义分数。分数也知道如何添加自己。

class MyFraction:
    def __init__(self, newnumerator, newdenominator):
        self.numerator = newnumerator
        self.denominator = newdenominator
        # because every fraction is described by these two things
    def add(self, other):
        newdenominator = self.denominator * other.denominator
        newnumerator = self.numerator * other.denominator + self.denominator * other.numerator
        return MyFraction(newnumerator, newdenominator)

有比整数更多的分数(不是真的,但是计算机不知道)。让我们做两个:

half = MyFraction(1, 2)
third = MyFraction(1, 3)
five_sixths = half.add(third)
print five_sixths.numerator
# ==> 5
print five_sixths.denominator
# ==> 6

您实际上并没有在这里声明任何内容。属性就像一种新的变量。普通变量只有一个值。我们说你写colour = "grey"。你不能有一个名为另一个变量colour"fuchsia"-并非在代码相同的地方。

数组在一定程度上解决了这个问题。如果您说colour = ["grey", "fuchsia"],您已经将两种颜色堆叠到变量中,但是您通过它们的位置(在这种情况下为0或1)来区分它们。

属性是绑定到对象的变量。像数组一样,我们可以在不同的dogs上有很多colour变量。因此,是一个变量,但是是另一个变量。第一个绑定到变量内的对象; 第二,。现在,当您调用或时,将始终有一个不可见的参数,该参数将分配给参数列表前面悬空的多余参数。它通常称为,它将在点之前获取对象的值。因此,在狗的(构造函数)内部,新狗将成为什么样的狗。中的,将绑定到变量中的对象。从而,fido.colourspot.colourfidospotDog(4, "brown")three.add(five)self__init__selfMyIntegeraddselfthreethree.value将在外部与addself.value内部相同add

如果我说the_mangy_one = fido,我将开始引用fido另一个名称的对象。从现在开始,fido.colour变量与完全相同the_mangy_one.colour

所以,里面的东西__init__。您可以将它们视为在Dog的出生证明中注明的内容。colour本身是一个随机变量,可以包含任何内容。fido.colourself.colour类似于“狗的身份证”上的表格字段;并且__init__是店员填充它的第一次。

更清晰吗?

编辑:扩展下面的评论:

您的意思是列出对象,不是吗?

首先,fido实际上不是对象。它是一个变量,这是目前包含一个对象,只是当你说喜欢x = 5x是目前包含数字五个变量。如果您以后改变主意,则可以fido = Cat(4, "pleasing")(只要创建了class Catfido就可以执行操作,此后将“包含” cat对象。如果这样做fido = x,它将包含数字5,而不是动物对象。

除非您专门编写代码来跟踪它们,否则类本身并不知道其实例。例如:

class Cat:
    census = [] #define census array

    def __init__(self, legs, colour):
        self.colour = colour
        self.legs = legs
        Cat.census.append(self)

census是class的类级属性Cat

fluffy = Cat(4, "white")
spark = Cat(4, "fiery")
Cat.census
# ==> [<__main__.Cat instance at 0x108982cb0>, <__main__.Cat instance at 0x108982e18>]
# or something like that

请注意,您不会得到[fluffy, sparky]。这些只是变量名。如果您希望猫本身具有名称,则必须为名称创建一个单独的属性,然后重写该__str__方法以返回该名称。此方法的用途(即,像add或那样,是类绑定函数__init__)的目的是描述如何将对象转换为字符串,就像将其打印出来一样。

By what you wrote, you are missing a critical piece of understanding: the difference between a class and an object. __init__ doesn’t initialize a class, it initializes an instance of a class or an object. Each dog has colour, but dogs as a class don’t. Each dog has four or fewer feet, but the class of dogs doesn’t. The class is a concept of an object. When you see Fido and Spot, you recognise their similarity, their doghood. That’s the class.

When you say

class Dog:
    def __init__(self, legs, colour):
        self.legs = legs
        self.colour = colour

fido = Dog(4, "brown")
spot = Dog(3, "mostly yellow")

You’re saying, Fido is a brown dog with 4 legs while Spot is a bit of a cripple and is mostly yellow. The __init__ function is called a constructor, or initializer, and is automatically called when you create a new instance of a class. Within that function, the newly created object is assigned to the parameter self. The notation self.legs is an attribute called legs of the object in the variable self. Attributes are kind of like variables, but they describe the state of an object, or particular actions (functions) available to the object.

However, notice that you don’t set colour for the doghood itself – it’s an abstract concept. There are attributes that make sense on classes. For instance, population_size is one such – it doesn’t make sense to count the Fido because Fido is always one. It does make sense to count dogs. Let us say there’re 200 million dogs in the world. It’s the property of the Dog class. Fido has nothing to do with the number 200 million, nor does Spot. It’s called a “class attribute”, as opposed to “instance attributes” that are colour or legs above.
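
The class attribute versus instance attribute distinction can be sketched like this. Keeping the count up to date inside __init__ is an assumption made for illustration (the text only names population_size), and the print calls assume Python 3.

```python
class Dog:
    population_size = 0  # class attribute: belongs to Dog itself

    def __init__(self, legs, colour):
        self.legs = legs          # instance attributes: one set per dog
        self.colour = colour
        Dog.population_size += 1  # update the shared count on the class


fido = Dog(4, "brown")
spot = Dog(3, "mostly yellow")

print(Dog.population_size)
print(fido.colour, spot.colour)
```

Here the count is stored on Dog, not on fido or spot, although it can still be read through an instance.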

Now, to something less canine and more programming-related. As I write below, a class to add things is not sensible – what is it a class of? Classes in Python are made up of collections of different data that behave similarly. The class of dogs consists of Fido and Spot and 199999999998 other animals similar to them, all of them peeing on lampposts. What does the class for adding things consist of? By what data inherent to them do they differ? And what actions do they share?

However, numbers… those are more interesting subjects. Say, Integers. There’s a lot of them, a lot more than dogs. I know that Python already has integers, but let’s play dumb and “implement” them again (by cheating and using Python’s integers).

So, Integers are a class. They have some data (value), and some behaviours (“add me to this other number”). Let’s show this:

class MyInteger:
    def __init__(self, newvalue):
        # imagine self as an index card.
        # under the heading of "value", we will write
        # the contents of the variable newvalue.
        self.value = newvalue
    def add(self, other):
        # when an integer wants to add itself to another integer,
        # we'll take their values and add them together,
        # then make a new integer with the result value.
        return MyInteger(self.value + other.value)

three = MyInteger(3)
# three now contains an object of class MyInteger
# three.value is now 3
five = MyInteger(5)
# five now contains an object of class MyInteger
# five.value is now 5
eight = three.add(five)
# here, we invoked the three's behaviour of adding another integer
# now, eight.value is three.value + five.value = 3 + 5 = 8
print eight.value
# ==> 8

This is a bit fragile (we're assuming other will be a MyInteger), but we'll ignore that for now. In real code, we wouldn't; we'd test it to make sure, and maybe even coerce it ("you're not an integer? by golly, you have 10 nanoseconds to become one! 9... 8....")
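
One way the "test it, and maybe coerce it" idea could look, as a sketch only. Accepting plain Python ints and rejecting everything else with a TypeError is an assumption, not something the answer specifies.

```python
class MyInteger:
    def __init__(self, newvalue):
        self.value = newvalue

    def add(self, other):
        # Coerce plain ints into MyInteger; reject everything else.
        if isinstance(other, int):
            other = MyInteger(other)
        if not isinstance(other, MyInteger):
            raise TypeError("cannot add %r to a MyInteger" % (other,))
        return MyInteger(self.value + other.value)


three = MyInteger(3)
eight = three.add(MyInteger(5))
also_eight = three.add(5)  # the plain int is coerced first
```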

We could even define fractions. Fractions also know how to add themselves.

class MyFraction:
    def __init__(self, newnumerator, newdenominator):
        self.numerator = newnumerator
        self.denominator = newdenominator
        # because every fraction is described by these two things
    def add(self, other):
        newdenominator = self.denominator * other.denominator
        newnumerator = self.numerator * other.denominator + self.denominator * other.numerator
        return MyFraction(newnumerator, newdenominator)

There’s even more fractions than integers (not really, but computers don’t know that). Let’s make two:

half = MyFraction(1, 2)
third = MyFraction(1, 3)
five_sixths = half.add(third)
print five_sixths.numerator
# ==> 5
print five_sixths.denominator
# ==> 6

You’re not actually declaring anything here. Attributes are like a new kind of variable. Normal variables only have one value. Let us say you write colour = "grey". You can’t have another variable named colour that is "fuchsia" – not in the same place in the code.

Arrays solve that to a degree. If you say colour = ["grey", "fuchsia"], you have stacked two colours into the variable, but you distinguish them by their position (0, or 1, in this case).

Attributes are variables that are bound to an object. Like with arrays, we can have plenty of colour variables, on different dogs. So, fido.colour is one variable, but spot.colour is another. The first one is bound to the object within the variable fido; the second, spot. Now, when you call Dog(4, "brown"), or three.add(five), there will always be an invisible parameter, which will be assigned to the dangling extra one at the front of the parameter list. It is conventionally called self, and will get the value of the object in front of the dot. Thus, within the Dog's __init__ (constructor), self will be whatever the new Dog will turn out to be; within MyInteger's add, self will be bound to the object in the variable three. Thus, three.value will be the same variable outside the add, as self.value within the add.

If I say the_mangy_one = fido, I will start referring to the object known as fido with yet another name. From now on, fido.colour is exactly the same variable as the_mangy_one.colour.
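
The aliasing described here can be checked directly. A small sketch, reusing the Dog class from earlier in this answer:

```python
class Dog:
    def __init__(self, legs, colour):
        self.legs = legs
        self.colour = colour


fido = Dog(4, "brown")
the_mangy_one = fido            # a second name for the same object
the_mangy_one.colour = "grey"   # change it through one name...

same_object = the_mangy_one is fido
colour_seen_via_fido = fido.colour  # ...and the other name sees the change
```

No copy is made by the assignment; both names refer to one Dog.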

So, the things inside the __init__. You can think of them as noting things into the Dog’s birth certificate. colour by itself is a random variable, could contain anything. fido.colour or self.colour is like a form field on the Dog’s identity sheet; and __init__ is the clerk filling it out for the first time.

Any clearer?

EDIT: Expanding on the comment below:

You mean a list of objects, don’t you?

First of all, fido is actually not an object. It is a variable, which is currently containing an object, just like when you say x = 5, x is a variable currently containing the number five. If you later change your mind, you can do fido = Cat(4, "pleasing") (as long as you’ve created a class Cat), and fido would from then on “contain” a cat object. If you do fido = x, it will then contain the number five, and not an animal object at all.

A class by itself doesn’t know its instances unless you specifically write code to keep track of them. For instance:

class Cat:
    census = [] #define census array

    def __init__(self, legs, colour):
        self.colour = colour
        self.legs = legs
        Cat.census.append(self)

Here, census is a class-level attribute of Cat class.

fluffy = Cat(4, "white")
spark = Cat(4, "fiery")
Cat.census
# ==> [<__main__.Cat instance at 0x108982cb0>, <__main__.Cat instance at 0x108982e18>]
# or something like that

Note that you won't get [fluffy, spark]. Those are just variable names. If you want the cats themselves to have names, you have to make a separate attribute for the name, and then override the __str__ method to return this name. The purpose of this method (i.e. a class-bound function, just like add or __init__) is to describe how to convert the object to a string, like when you print it out.
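
A sketch of the named-cat idea, assuming Python 3 and a hypothetical extra name parameter. One detail worth knowing: displaying a list uses each element's __repr__ rather than __str__, so this sketch defines both.

```python
class Cat:
    census = []  # class-level list of every Cat created

    def __init__(self, name, legs, colour):
        self.name = name
        self.legs = legs
        self.colour = colour
        Cat.census.append(self)

    def __str__(self):
        # used by print(cat) and str(cat)
        return self.name

    def __repr__(self):
        # used when a list of cats is displayed
        return "Cat(%r)" % self.name


fluffy = Cat("Fluffy", 4, "white")
sparky = Cat("Sparky", 4, "fiery")
print(fluffy)
print(Cat.census)
```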


回答 1

为我的阿玛丹详尽解释贡献我的5美分。

其中类是抽象的“类型”描述。对象是它们的实现:呼吸的活物。在面向对象的世界中,有一些基本思想几乎可以称为一切的本质。他们是:

  1. 封装(在此不做详细介绍)
  2. 遗产
  3. 多态性

对象具有一个或多个特征(=属性)和行为(=方法)。行为主要取决于特性。类定义了行为应该以一般方式完成的事情,但是只要没有将类实现(实例化)为对象,它仍然是可能性的抽象概念。让我借助“继承”和“多态性”进行说明。

    class Human:
        gender
        nationality
        favorite_drink
        core_characteristic
        favorite_beverage
        name
        age

        def love    
        def drink
        def laugh
        def do_your_special_thing                

    class Americans(Humans)
        def drink(beverage):
            if beverage != favorite_drink: print "You call that a drink?"
            else: print "Great!" 

    class French(Humans)
        def drink(beverage, cheese):
            if beverage == favourite_drink and cheese == None: print "No cheese?" 
            elif beverage != favourite_drink and cheese == None: print "Révolution!"

    class Brazilian(Humans)
        def do_your_special_thing
            win_every_football_world_cup()

    class Germans(Humans)
        def drink(beverage):
            if favorite_drink != beverage: print "I need more beer"
            else: print "Lecker!" 

    class HighSchoolStudent(Americans):
        def __init__(self, name, age):
             self.name = name
             self.age = age

jeff = HighSchoolStudent(name, age):
hans = Germans()
ronaldo = Brazilian()
amelie = French()

for friends in [jeff, hans, ronaldo]:
    friends.laugh()
    friends.drink("cola")
    friends.do_your_special_thing()

print amelie.love(jeff)
>>> True
print ronaldo.love(hans)
>>> False

一些特征定义了人类。但是每个国籍都有所不同。因此,“民族类型”是具有附加功能的人类。“美国人”是“人类”的一种,从人类类型(基类)继承一些抽象的特征和行为:即继承。因此,所有人都可以笑喝,因此所有儿童班也可以!继承(2)。

但是,由于它们都是同一类型(类型/基类:人类),因此有时可以交换它们:请参见末尾的for循环。但是它们会暴露出个人特征,那就是多态性(3)。

因此,每个人都有自己喜欢的饮料,但是每个国籍的人都倾向于一种特殊的饮料。如果您从“人类”类型中继承国籍,则可以覆盖上面我用“ drink()方法” 证明的继承行为。但这仍处于类级别,因此,这仍然是一个概括。

hans = German(favorite_drink = "Cola")

实例化类German,我从一开始就“更改”了默认特征。(但是,如果您打电话给hans.drink(’Milk’),他仍然会打印“我需要更多啤酒”-一个明显的错误……或者,如果我将成为更大公司的雇员,那就是我所说的功能。 ;-)!)

通常__init__在实例化时通过构造函数(在python:中)定义类型(例如,德语(hans))的特征。这是定义类以成为对象的地方。您可以通过将个性化特征填充为一个抽象对象(类),将呼吸生活说成一个抽象概念(类)。

但是因为每个对象都是类的实例,所以它们共享所有一些基本的特征类型和某些行为。这是面向对象概念的主要优点。

为了保护每个对象的特征,可以将它们封装在一起-意味着您尝试将行为和特征耦合在一起,并使其难以从对象外部进行操纵。那就是封装(1)

To contribute my 5 cents to the thorough explanation from Amadan.

Classes are an abstract description of a type. Objects are their realizations: the living, breathing thing. In the object-oriented world there are a few principal ideas you can almost call the essence of everything. They are:

  1. encapsulation (won’t elaborate on this)
  2. inheritance
  3. polymorphism

Objects have one, or more characteristics (= Attributes) and behaviors (= Methods). The behavior mostly depends on the characteristics. Classes define what the behavior should accomplish in a general way, but as long as the class is not realized (instantiated) as an object it remains an abstract concept of a possibility. Let me illustrate with the help of “inheritance” and “polymorphism”.

    class Human:
        gender
        nationality
        favorite_drink
        core_characteristic
        favorite_beverage
        name
        age

        def love    
        def drink
        def laugh
        def do_your_special_thing                

    class Americans(Humans)
        def drink(beverage):
            if beverage != favorite_drink: print "You call that a drink?"
            else: print "Great!" 

    class French(Humans)
        def drink(beverage, cheese):
            if beverage == favourite_drink and cheese == None: print "No cheese?" 
            elif beverage != favourite_drink and cheese == None: print "Révolution!"

    class Brazilian(Humans)
        def do_your_special_thing
            win_every_football_world_cup()

    class Germans(Humans)
        def drink(beverage):
            if favorite_drink != beverage: print "I need more beer"
            else: print "Lecker!" 

    class HighSchoolStudent(Americans):
        def __init__(self, name, age):
             self.name = name
             self.age = age

jeff = HighSchoolStudent(name, age):
hans = Germans()
ronaldo = Brazilian()
amelie = French()

for friends in [jeff, hans, ronaldo]:
    friends.laugh()
    friends.drink("cola")
    friends.do_your_special_thing()

print amelie.love(jeff)
>>> True
print ronaldo.love(hans)
>>> False

Some characteristics define human beings. But every nationality differs somewhat. So "national-types" are kinda Humans with extras. "Americans" are a type of "Humans" and inherit some abstract characteristics and behavior from the human type (base class): that's inheritance. So all Humans can laugh and drink, therefore all child classes can too! Inheritance (2).

But because they are all of the same kind (type/base class: Humans) you can sometimes exchange them: see the for-loop at the end. But each will expose its individual characteristics, and that's polymorphism (3).

So each human has a favorite_drink, but every nationality tends towards a special kind of drink. If you subclass a nationality from the type Humans, you can override the inherited behavior, as I have demonstrated above with the drink() method. But that's still at the class level, and because of this it's still a generalization.

hans = German(favorite_drink = "Cola")

instantiates the class German, and I "changed" a default characteristic at the beginning. (But if you call hans.drink('Milk') he would still print "I need more beer" – an obvious bug ... or maybe that's what I would call a feature if I were an employee of a bigger company. ;-)! )

The characteristics of a type, e.g. Germans (hans), are usually defined through the constructor (in Python: __init__) at the moment of instantiation. This is the point where a class becomes an object. You could say you breathe life into an abstract concept (the class) by filling it with individual characteristics, so that it becomes an object.

But because every object is an instance of a class, they all share some basic characteristic types and some behavior. This is a major advantage of the object-oriented concept.

To protect the characteristics of each object you encapsulate them, which means you try to couple behavior and characteristics and make them hard to manipulate from outside the object. That's encapsulation (1).
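
The outline above is pseudocode. A runnable sketch of the same inheritance and polymorphism idea might look like the following. The singular class names, the default drinks and the returned strings are assumptions made for illustration, and only drink() and laugh() are modelled.

```python
class Human:
    favorite_drink = "water"  # class-level default (an assumed value)

    def __init__(self, name, favorite_drink=None):
        self.name = name
        if favorite_drink is not None:
            # an instance attribute that shadows the class-level default
            self.favorite_drink = favorite_drink

    def laugh(self):
        return "%s laughs" % self.name

    def drink(self, beverage):
        return "%s drinks %s" % (self.name, beverage)


class American(Human):
    favorite_drink = "cola"

    def drink(self, beverage):
        if beverage != self.favorite_drink:
            return "You call that a drink?"
        return "Great!"


class German(Human):
    favorite_drink = "beer"

    def drink(self, beverage):
        if beverage != self.favorite_drink:
            return "I need more beer"
        return "Lecker!"


jeff = American("Jeff")
hans = German("Hans", favorite_drink="cola")

# Same call, different behaviour depending on the class: polymorphism.
answers = [friend.drink("cola") for friend in (jeff, hans)]
```

laugh() is inherited unchanged by both subclasses, while drink() is overridden in each. hans carries his own favorite_drink set at instantiation, which takes precedence over the class-level default of German.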


回答 2

它只是初始化实例的变量。

例如crawler,使用特定的数据库名称创建一个实例(来自上述示例)。

It is just to initialize the instance’s variables.

E.g. create a crawler instance with a specific database name (from your example above).


回答 3

__init__如果要正确初始化实例的可变属性,似乎需要在Python中使用。

请参见以下示例:

>>> class EvilTest(object):
...     attr = []
... 
>>> evil_test1 = EvilTest()
>>> evil_test2 = EvilTest()
>>> evil_test1.attr.append('strange')
>>> 
>>> print "This is evil:", evil_test1.attr, evil_test2.attr
This is evil: ['strange'] ['strange']
>>> 
>>> 
>>> class GoodTest(object):
...     def __init__(self):
...         self.attr = []
... 
>>> good_test1 = GoodTest()
>>> good_test2 = GoodTest()
>>> good_test1.attr.append('strange')
>>> 
>>> print "This is good:", good_test1.attr, good_test2.attr
This is good: ['strange'] []

在Java中,这是完全不同的,在Java中,每个属性都使用新值自动初始化:

import java.util.ArrayList;
import java.lang.String;

class SimpleTest
{
    public ArrayList<String> attr = new ArrayList<String>();
}

class Main
{
    public static void main(String [] args)
    {
        SimpleTest t1 = new SimpleTest();
        SimpleTest t2 = new SimpleTest();

        t1.attr.add("strange");

        System.out.println(t1.attr + " " + t2.attr);
    }
}

产生我们直觉上期望的输出:

[strange] []

但是,如果声明attrstatic,它将像Python一样运行:

[strange] [strange]

It seems like you need to use __init__ in Python if you want to correctly initialize mutable attributes of your instances.

See the following example:

>>> class EvilTest(object):
...     attr = []
... 
>>> evil_test1 = EvilTest()
>>> evil_test2 = EvilTest()
>>> evil_test1.attr.append('strange')
>>> 
>>> print "This is evil:", evil_test1.attr, evil_test2.attr
This is evil: ['strange'] ['strange']
>>> 
>>> 
>>> class GoodTest(object):
...     def __init__(self):
...         self.attr = []
... 
>>> good_test1 = GoodTest()
>>> good_test2 = GoodTest()
>>> good_test1.attr.append('strange')
>>> 
>>> print "This is good:", good_test1.attr, good_test2.attr
This is good: ['strange'] []

This is quite different in Java where each attribute is automatically initialized with a new value:

import java.util.ArrayList;
import java.lang.String;

class SimpleTest
{
    public ArrayList<String> attr = new ArrayList<String>();
}

class Main
{
    public static void main(String [] args)
    {
        SimpleTest t1 = new SimpleTest();
        SimpleTest t2 = new SimpleTest();

        t1.attr.add("strange");

        System.out.println(t1.attr + " " + t2.attr);
    }
}

produces an output we intuitively expect:

[strange] []

But if you declare attr as static, it will act like Python:

[strange] [strange]

回答 4

以您的汽车示例为例:当您获得汽车时,您只是没有得到随机的汽车,我的意思是,您选择颜色,品牌,座位数等。并且有些东西在没有您选择的情况下也是“初始化”的为此,例如车轮数或注册号。

class Car:
    def __init__(self, color, brand, number_of_seats):
        self.color = color
        self.brand = brand
        self.number_of_seats = number_of_seats
        self.number_of_wheels = 4
        self.registration_number = GenerateRegistrationNumber()

因此,在该__init__方法中,您可以定义要创建的实例的属性。因此,如果我们想要一辆蓝色雷诺汽车,供2人使用,我们将进行初始化或Car诸如此类的实例:

my_car = Car('blue', 'Renault', 2)

这样,我们正在创建Car类的实例。该__init__是我们处理的特定属性(如一个colorbrand)及其产生的其他属性,如registration_number

Following on from your car example: when you get a car, you don't just get a random car; you choose the color, the brand, the number of seats, etc. Some things are also "initialized" without you choosing them, like the number of wheels or the registration number.

class Car:
    def __init__(self, color, brand, number_of_seats):
        self.color = color
        self.brand = brand
        self.number_of_seats = number_of_seats
        self.number_of_wheels = 4
        self.registration_number = GenerateRegistrationNumber()

So, in the __init__ method you define the attributes of the instance you're creating. If we want a blue Renault car for 2 people, we would initialize an instance of Car like this:

my_car = Car('blue', 'Renault', 2)

This way, we are creating an instance of the Car class. The __init__ method is the one that handles our specific attributes (like color or brand) and generates the other attributes, like registration_number.
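
The Car class above calls GenerateRegistrationNumber(), which is not defined in the answer. A runnable sketch, assuming a hypothetical generator that simply hands out sequential numbers:

```python
import itertools

# Hypothetical stand-in for GenerateRegistrationNumber():
# hands out REG-0001, REG-0002, ...
_registration_counter = itertools.count(1)


def generate_registration_number():
    return "REG-%04d" % next(_registration_counter)


class Car:
    def __init__(self, color, brand, number_of_seats):
        # chosen by the caller
        self.color = color
        self.brand = brand
        self.number_of_seats = number_of_seats
        # filled in without the caller choosing
        self.number_of_wheels = 4
        self.registration_number = generate_registration_number()


my_car = Car('blue', 'Renault', 2)
other_car = Car('red', 'Fiat', 4)
```

Each instance gets its own registration number, although the caller never passes one in.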


回答 5

类是具有特定于该对象的属性(状态,特征)和方法(功能,功能)的对象(例如,鸭子的白色和飞行能力)。

当创建一个类的实例时,可以给它一些初始的个性(状态或字符,例如她的名字和新生儿衣服的颜色)。您可以使用__init__

基本上__init__在您调用时自动设置实例特征instance = MyClass(some_individual_traits)

Classes are objects with attributes (state, characteristic) and methods (functions, capacities) that are specific for that object (like the white color and fly powers, respectively, for a duck).

When you create an instance of a class, you can give it some initial personality (state or character like the name and the color of her dress for a newborn). You do this with __init__.

Basically __init__ sets the instance characteristics automatically when you call instance = MyClass(some_individual_traits).


回答 6

__init__函数正在设置类中的所有成员变量。因此,一旦创建了bicluster,您就可以访问该成员并获取值:

mycluster = bicluster(...actual values go here...)
mycluster.left # returns the value passed in as 'left'

查看Python文档,了解一些信息。您将希望读一本有关面向对象概念的书,以继续学习。

The __init__ function is setting up all the member variables in the class. So once your bicluster is created you can access the member and get a value back:

mycluster = bicluster(...actual values go here...)
mycluster.left # returns the value passed in as 'left'

Check out the Python Docs for some info. You’ll want to pick up a book on OO concepts to continue learning.


回答 7

class Dog(object):

    # Class Object Attribute
    species = 'mammal'

    def __init__(self,breed,name):
        self.breed = breed
        self.name = name

在上面的示例中,我们将物种用作全局物种,因为它始终是相同的(可以说一类常数)。当您调用__init__method 时,里面的所有变量__init__都会被初始化(例如:breed,name)。

class Dog(object):
    a = '12'

    def __init__(self,breed,name,a):
        self.breed = breed
        self.name = name
        self.a= a

如果您通过像下面这样调用下面来打印上面的示例

Dog.a
'12'

d = Dog('Lab','Sam','10')
d.a
'10'
Dog.a
'12'

这意味着它将仅在对象创建期间初始化。因此,您要声明为常量的任何内容都将其设置为全局的,并且任何更改使用的内容 __init__

class Dog(object):

    # Class Object Attribute
    species = 'mammal'

    def __init__(self,breed,name):
        self.breed = breed
        self.name = name

In the above example we use species as a class attribute, since it is always the same for every instance (a kind of constant, you could say). When you create an object, __init__ is called and all the variables assigned inside __init__ are initialized (e.g. breed, name).

class Dog(object):
    a = '12'

    def __init__(self,breed,name,a):
        self.breed = breed
        self.name = name
        self.a= a

If you try the above example like this:

Dog.a
'12'

d = Dog('Lab','Sam','10')
d.a
'10'
Dog.a
'12'

That means the instance attribute is only set during object creation, and it shadows the class attribute without changing it. So anything you want to keep constant, declare as a class attribute, and anything that changes per instance, set in __init__.


Python中的指针?

问题:Python中的指针?

我知道Python没有指针，但是有没有办法让下面的代码得到2，而不是

>>> a = 1
>>> b = a # modify this line somehow so that b "points to" a
>>> a = 2
>>> b
1


这是一个例子:我想要form.data['field']form.field.value始终具有相同的值。并非完全必要,但我认为这会很好。


例如,在PHP中,我可以这样做:

<?php

class Form {
    public $data = [];
    public $fields;

    function __construct($fields) {
        $this->fields = $fields;
        foreach($this->fields as &$field) {
            $this->data[$field['id']] = &$field['value'];
        }
    }
}

$f = new Form([
    [
        'id' => 'fname',
        'value' => 'George'
    ],
    [
        'id' => 'lname',
        'value' => 'Lucas'
    ]
]);

echo $f->data['fname'], $f->fields[0]['value']; # George George
$f->data['fname'] = 'Ralph';
echo $f->data['fname'], $f->fields[0]['value']; # Ralph Ralph

输出:

GeorgeGeorgeRalphRalph

ideone


或在C ++中这样(我认为这是正确的,但我的C ++生锈了):

#include <iostream>
using namespace std;

int main() {
    int* a = new int;  // allocate an int so that a points to valid memory
    int* b = a;
    *a = 1;
    cout << *a << endl << *b << endl; // 1 1
    delete a;

    return 0;
}

I know Python doesn’t have pointers, but is there a way to have this yield 2 instead

>>> a = 1
>>> b = a # modify this line somehow so that b "points to" a
>>> a = 2
>>> b
1

?


Here’s an example: I want form.data['field'] and form.field.value to always have the same value. It’s not completely necessary, but I think it would be nice.


In PHP, for example, I can do this:

<?php

class Form {
    public $data = [];
    public $fields;

    function __construct($fields) {
        $this->fields = $fields;
        foreach($this->fields as &$field) {
            $this->data[$field['id']] = &$field['value'];
        }
    }
}

$f = new Form([
    [
        'id' => 'fname',
        'value' => 'George'
    ],
    [
        'id' => 'lname',
        'value' => 'Lucas'
    ]
]);

echo $f->data['fname'], $f->fields[0]['value']; # George George
$f->data['fname'] = 'Ralph';
echo $f->data['fname'], $f->fields[0]['value']; # Ralph Ralph

Output:

GeorgeGeorgeRalphRalph

ideone


Or like this in C++ (I think this is right, but my C++ is rusty):

#include <iostream>
using namespace std;

int main() {
    int* a = new int;  // allocate an int so that a points to valid memory
    int* b = a;
    *a = 1;
    cout << *a << endl << *b << endl; // 1 1
    delete a;

    return 0;
}

回答 0

我希望form.data['field']form.field.value始终拥有相同的价值

这是可行的,因为它涉及修饰的名称和索引-即,与完全不同的结构以及您所要询问的内容,对于您的请求来说是完全不可能的。为什么要索要不可能的东西而这又与您实际想要的(可能的)东西完全不同? ab

也许您不知道裸名和修饰名有多么大的不同。当您引用裸名时a,您将确切地知道该对象a在该范围内最后绑定到该对象(如果未在该范围内绑定则为exceptions),这是Python如此深入和基本的方面,它可以不可能被颠覆。当您引用修饰的名称时x.y,您正在要求一个对象(该对象所x引用的)请提供“ y属性”-响应该请求,该对象可以执行完全任意的计算(并且索引非常相似:它还允许作为响应执行任意计算)。

现在,您的“实际需求”示例很神秘,因为在每种情况下都涉及两个级别的索引编制或属性获取,因此可以通过多种方式引入您渴望的精妙之处。form.field例如,假设还具有其他哪些属性value?如果没有进一步的.value计算,可能性将包括:

class Form(object):
   ...
   def __getattr__(self, name):
       return self.data[name]

class Form(object):
   ...
   @property
   def data(self):
       return self.__dict__

的存在.value表明采摘第一种形式,加上一种-的无用的包装:

class KouWrap(object):
   def __init__(self, value):
       self.value = value

class Form(object):
   ...
   def __getattr__(self, name):
       return KouWrap(self.data[name])

如果还应该将这样的分配form.field.value = 23设置为中的条目form.data,则包装器的确必须变得更加复杂,并且不是所有的没用的:

class MciWrap(object):
   def __init__(self, data, k):
       self._data = data
       self._k = k
   @property
   def value(self):
       return self._data[self._k]
   @value.setter
   def value(self, v):
       self._data[self._k] = v

class Form(object):
   ...
   def __getattr__(self, name):
       return MciWrap(self.data, name)

后面的示例在Python中与您似乎想要的“指针”意义大致相近-但至关重要的是要了解这样的微妙只能与索引和/或修饰的名称一起使用决不能像您最初要求的那样使用裸名!

I want form.data['field'] and form.field.value to always have the same value

This is feasible, because it involves decorated names and indexing, i.e., completely different constructs from the barenames a and b that you’re asking about, and for which your request is utterly impossible. Why ask for something impossible and totally different from the (possible) thing you actually want?!

Maybe you don’t realize how drastically different barenames and decorated names are. When you refer to a barename a, you’re getting exactly the object a was last bound to in this scope (or an exception if it wasn’t bound in this scope) — this is such a deep and fundamental aspect of Python that it can’t possibly be subverted. When you refer to a decorated name x.y, you’re asking an object (the object x refers to) to please supply “the y attribute” — and in response to that request, the object can perform totally arbitrary computations (and indexing is quite similar: it also allows arbitrary computations to be performed in response).

Now, your “actual desiderata” example is mysterious because in each case two levels of indexing or attribute-getting are involved, so the subtlety you crave could be introduced in many ways. What other attributes is form.field supposed to have, for example, besides value? Without that further .value computation, possibilities would include:

class Form(object):
   ...
   def __getattr__(self, name):
       return self.data[name]

and

class Form(object):
   ...
   @property
   def data(self):
       return self.__dict__

The presence of .value suggests picking the first form, plus a kind-of-useless wrapper:

class KouWrap(object):
   def __init__(self, value):
       self.value = value

class Form(object):
   ...
   def __getattr__(self, name):
       return KouWrap(self.data[name])

If assignments such as form.field.value = 23 are also supposed to set the entry in form.data, then the wrapper must become more complex indeed, and not all that useless:

class MciWrap(object):
   def __init__(self, data, k):
       self._data = data
       self._k = k
   @property
   def value(self):
       return self._data[self._k]
   @value.setter
   def value(self, v):
       self._data[self._k] = v

class Form(object):
   ...
   def __getattr__(self, name):
       return MciWrap(self.data, name)

The latter example is roughly as close as it gets, in Python, to the sense of “a pointer” that you seem to want, but it’s crucial to understand that such subtleties can only ever work with indexing and/or decorated names, never with barenames as you originally asked!
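The Form classes above are elided with "...", so here is one way the last variant might be filled in to run end to end. It is only a sketch: the constructor taking keyword arguments and the name ValueProxy (playing the role of MciWrap) are assumptions, not part of the answer.

```python
class ValueProxy(object):
    """Reads and writes one entry of a dict through a .value attribute."""

    def __init__(self, data, key):
        self._data = data
        self._key = key

    @property
    def value(self):
        return self._data[self._key]

    @value.setter
    def value(self, v):
        self._data[self._key] = v


class Form(object):
    def __init__(self, **fields):
        # data is a normal instance attribute, so looking it up
        # never reaches __getattr__.
        self.data = dict(fields)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails. The check for
        # 'data' avoids infinite recursion if data was never set.
        if name == 'data' or name not in self.data:
            raise AttributeError(name)
        return ValueProxy(self.data, name)


form = Form(field=1)
form.field.value = 23          # write through the attribute path
print(form.data['field'])      # read through the indexing path
form.data['field'] = 42        # write through the indexing path
print(form.field.value)        # read through the attribute path
```

Both paths end up reading and writing the same dict entry, which is what keeps form.data['field'] and form.field.value in step.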


回答 1

您无法更改那条线。你可以做:

a = [1]
b = a
a[0] = 2
b[0]

这将创建一个列表,将引用分配给a,然后再将b分配给a,使用a引用将第一个元素设置为2,然后使用b引用变量进行访问。

There’s no way you can do that changing only that line. You can do:

a = [1]
b = a
a[0] = 2
b[0]

That creates a list, assigns the reference to a, then b also, uses the a reference to set the first element to 2, then accesses using the b reference variable.


回答 2

这不是一个错误,这是一个功能 :-)

当您在Python中查看’=’运算符时,不要以赋值的方式思考。您不分配东西,而是绑定它们。=是绑定运算符。

因此,在代码中,您给值1命名:然后,为“ a”中的值命名:b。然后,将值2绑定到名称’a’。绑定到b的值在此操作中不会更改。

来自类似C的语言,这可能会造成混淆,但是一旦您习惯了它,就会发现它可以帮助您更清晰地阅读和推理代码:除非您明确地更改它。而且,如果您执行“导入此操作”,您会发现Python的Zen指出显式要好于隐式。

还要注意的是,诸如Haskell之类的功能语言也使用此范例,就健壮性而言具有极大的价值。

It’s not a bug, it’s a feature :-)

When you look at the ‘=’ operator in Python, don’t think in terms of assignment. You don’t assign things, you bind them. = is a binding operator.

So in your code, you are giving the value 1 a name: a. Then, you are giving the value in ‘a’ a name: b. Then you are binding the value 2 to the name ‘a’. The value bound to b doesn’t change in this operation.

Coming from C-like languages, this can be confusing, but once you become accustomed to it, you find that it helps you to read and reason about your code more clearly: the value which has the name ‘b’ will not change unless you explicitly change it. And if you do an ‘import this’, you’ll find that the Zen of Python states that Explicit is better than implicit.

Note as well that functional languages such as Haskell also use this paradigm, with great value in terms of robustness.
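A short sketch of the difference between rebinding a name and mutating the object a name is bound to (the variable names are arbitrary):

```python
a = 1
b = a        # b is bound to the same object as a
a = 2        # rebinding a does not touch b
print(b)

x = [1]
y = x        # y is bound to the same list as x
x[0] = 2     # mutating the list is visible through both names
print(y)
print(x is y)
```

In the first half only the name a is rebound, so b keeps its value. In the second half no name is rebound at all; the single list object is changed, and both names still refer to it.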


回答 3

是! 有一种方法可以将变量用作python中的指针!

我很遗憾地说,许多答案部分不正确。原则上,每个equal(=)分配都共享内存地址(检查id(obj)函数),但实际上并非如此。有一些变量的equal(“ =”)行为在上学期可以用作存储空间的副本,主要在简单对象(例如“ int”对象)中起作用,而其他变量在其中不起作用(例如“ list”,“ dict”对象) 。

这是指针分配的示例

dict1 = {'first':'hello', 'second':'world'}
dict2 = dict1 # pointer assignation mechanism
dict2['first'] = 'bye'
dict1
>>> {'first':'bye', 'second':'world'}

这是副本分配的示例

a = 1
b = a # copy of memory mechanism. up to here id(a) == id(b)
b = 2 # new address generation. therefore without pointer behaviour
a
>>> 1

指针分配是一种非常有用的工具,它可以在某些情况下使用别名来执行别名,而不会浪费额外的内存,

class cls_X():
   ...
   def method_1(self):
      pd1 = self.obj_clsY.dict_vars_for_clsX['meth1'] # pointer dict 1: aliasing
      pd1['var4'] = self.method2(pd1['var1'], pd1['var2'], pd1['var3'])
   #enddef method_1
   ...
#endclass cls_X

但是为了防止代码错误,必须注意这种用法。

总而言之,默认情况下,某些变量为准名称(简单对象,如int,float,str,…),而某些变量在它们之间分配时为指针(例如dict1 = dict2)。如何识别它们?只需与他们一起尝试此实验。在具有可变资源管理器面板的IDE中,指针机制对象的定义中通常显示为内存地址(“ @axbbbbbb …”)。

我建议对该主题进行调查。肯定有很多人对此主题有更多的了解。(请参见“ ctypes”模块)。希望对您有所帮助。享受物品的良好使用!问候,何塞·克雷斯波

Yes! There is a way to use a variable as a pointer in Python!

I am sorry to say that many of the answers are partially wrong. Every assignment (=) makes the new name share the same object, and hence the same memory address (check with the id(obj) function). In practice the effect differs by type: with immutable objects (e.g. int), a later assignment rebinds the name to a new object, so it looks as if the value had been copied, while mutable objects (e.g. list, dict) can be changed in place and the change is visible through every name.

Here is an example of pointer assignation

dict1 = {'first':'hello', 'second':'world'}
dict2 = dict1 # pointer assignation mechanism
dict2['first'] = 'bye'
dict1
>>> {'first':'bye', 'second':'world'}

Here is an example of copy assignation

a = 1
b = a # no copy is made yet: up to here id(a) == id(b)
b = 2 # rebinds b to a new object, so a is unaffected
a
>>> 1

Pointer-style assignment is a useful tool for aliasing without wasting extra memory, and in certain situations it makes code more convenient:

class cls_X():
   ...
   def method_1(self):
      pd1 = self.obj_clsY.dict_vars_for_clsX['meth1'] # pointer dict 1: aliasing
      pd1['var4'] = self.method2(pd1['var1'], pd1['var2'], pd1['var3'])
   #enddef method_1
   ...
#endclass cls_X

but one has to be aware of this use in order to prevent mistakes.

To conclude, names bound to immutable objects (int, float, str, …) behave like plain values, while names bound to mutable objects behave like pointers when assigned to one another (e.g. dict1 = dict2). How do you recognize them? Just try the experiment above with them. In IDEs with a variable explorer panel, a memory address (“@axbbbbbb…”) usually appears in the description of such objects.

I suggest investigating the topic further. There are surely many people who know much more about it (see the “ctypes” module). I hope this is helpful. Enjoy the good use of the objects! Regards, José Crespo


回答 4

>> id(1)
1923344848  # identity of the location in memory where 1 is stored
>> id(1)
1923344848  # always the same
>> a = 1
>> b = a  # or equivalently b = 1, because 1 is immutable
>> id(a)
1923344848
>> id(b)  # equal to id(a)
1923344848

如您所见ab只是引用同一不变对象(int)的两个不同名称1。如果稍后编写a = 2,则将名称重新分配a另一个对象(int)2,但b继续引用1

>> id(2)
1923344880
>> a = 2
>> id(a)
1923344880  # equal to id(2)
>> b
1           # b hasn't changed
>> id(b)
1923344848  # equal to id(1)

如果您有一个可变对象,例如列表,将会发生什么[1]

>> id([1])
328817608
>> id([1])
328664968  # different from the previous id, because each time a new list is created
>> a = [1]
>> id(a)
328817800
>> id(a)
328817800 # now same as before
>> b = a
>> id(b)
328817800  # same as id(a)

同样,我们[1]通过两个不同的名称a和引用同一对象(列表)b。然而,现在虽然仍然是相同的对象,我们可以变异这个名单,和ab都将继续引用它

>> a[0] = 2
>> a
[2]
>> b
[2]
>> id(a)
328817800  # same as before
>> id(b)
328817800  # same as before

>> id(1)
1923344848  # identity of the location in memory where 1 is stored
>> id(1)
1923344848  # always the same
>> a = 1
>> b = a  # or equivalently b = 1, because 1 is immutable
>> id(a)
1923344848
>> id(b)  # equal to id(a)
1923344848

As you can see, a and b are just two different names that refer to the same immutable object (int) 1. If you later write a = 2, you rebind the name a to a different object (int) 2, but b continues to refer to 1:

>> id(2)
1923344880
>> a = 2
>> id(a)
1923344880  # equal to id(2)
>> b
1           # b hasn't changed
>> id(b)
1923344848  # equal to id(1)

What would happen if you had a mutable object instead, such as a list [1]?

>> id([1])
328817608
>> id([1])
328664968  # different from the previous id, because each time a new list is created
>> a = [1]
>> id(a)
328817800
>> id(a)
328817800 # now same as before
>> b = a
>> id(b)
328817800  # same as id(a)

Again, we are referring to the same object (list) [1] by two different names, a and b. However, now we can mutate this list while it remains the same object, and both a and b will continue to refer to it:

>> a[0] = 2
>> a
[2]
>> b
[2]
>> id(a)
328817800  # same as before
>> id(b)
328817800  # same as before

回答 5

从一个角度来看,一切都是Python中的指针。您的示例的工作原理与C ++代码非常相似。

int* a = new int(1);
int* b = a;
a = new int(2);
cout << *b << endl;   // prints 1

(更接近的等效项将使用某种类型的shared_ptr<Object>代替int*。)

这是一个示例:我希望form.data [‘field’]和form.field.value始终具有相同的值。并非完全必要,但我认为这会很好。

您可以通过__getitem__form.data类中重载来实现。

From one point of view, everything is a pointer in Python. Your example works a lot like the C++ code.

int* a = new int(1);
int* b = a;
a = new int(2);
cout << *b << endl;   // prints 1

(A closer equivalent would use some type of shared_ptr<Object> instead of int*.)

Here’s an example: I want form.data[‘field’] and form.field.value to always have the same value. It’s not completely necessary, but I think it would be nice.

You can do this by overloading __getitem__ in form.data‘s class.
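As a rough illustration of that suggestion, the sketch below gives form.data its own class whose __getitem__ and __setitem__ forward to the matching field. The class names Field and FieldData and the keyword-argument constructor are assumptions made for the example, not anything from the question.

```python
class Field(object):
    def __init__(self, value):
        self.value = value


class FieldData(object):
    """Dict-like view: data['name'] reads and writes form.<name>.value."""

    def __init__(self, form):
        self._form = form

    def __getitem__(self, name):
        return getattr(self._form, name).value

    def __setitem__(self, name, value):
        getattr(self._form, name).value = value


class Form(object):
    def __init__(self, **fields):
        # One Field object per keyword argument, stored as an attribute.
        for name, value in fields.items():
            setattr(self, name, Field(value))
        self.data = FieldData(self)


form = Form(field='George')
form.data['field'] = 'Ralph'
print(form.field.value)
form.field.value = 'Lucas'
print(form.data['field'])
```

Because the value is stored in only one place (the Field object), the two access paths cannot drift apart.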


回答 6

这是python指针(与c / c ++不同)

>>> a = lambda : print('Hello')
>>> a
<function <lambda> at 0x0000018D192B9DC0>
>>> id(a) == int(0x0000018D192B9DC0)
True
>>> from ctypes import cast, py_object
>>> cast(id(a), py_object).value == cast(int(0x0000018D192B9DC0), py_object).value
True
>>> cast(id(a), py_object).value
<function <lambda> at 0x0000018D192B9DC0>
>>> cast(id(a), py_object).value()
Hello

This is a Python pointer (different from C/C++); it relies on CPython, where id() returns the object's memory address:

>>> a = lambda : print('Hello')
>>> a
<function <lambda> at 0x0000018D192B9DC0>
>>> id(a) == int(0x0000018D192B9DC0)
True
>>> from ctypes import cast, py_object
>>> cast(id(a), py_object).value == cast(int(0x0000018D192B9DC0), py_object).value
True
>>> cast(id(a), py_object).value
<function <lambda> at 0x0000018D192B9DC0>
>>> cast(id(a), py_object).value()
Hello

回答 7

我写了以下简单的类作为有效地在python中模拟指针的方法:

class Parameter:
    """Syntactic sugar for getter/setter pair
    Usage:

    p = Parameter(getter, setter)

    Set parameter value:
    p(value)
    p.val = value
    p.set(value)

    Retrieve parameter value:
    p()
    p.val
    p.get()
    """
    def __init__(self, getter, setter):
        """Create parameter

        Required positional parameters:
        getter: called with no arguments, retrieves the parameter value.
        setter: called with value, sets the parameter.
        """
        self._get = getter
        self._set = setter

    def __call__(self, val=None):
        if val is not None:
            self._set(val)
        return self._get()

    def get(self):
        return self._get()

    def set(self, val):
        self._set(val)

    @property
    def val(self):
        return self._get()

    @val.setter
    def val(self, val):
        self._set(val)

这是一个使用示例(来自jupyter笔记本页面):

l1 = list(range(10))
def l1_5_getter(lst=l1, number=5):
    return lst[number]

def l1_5_setter(val, lst=l1, number=5):
    lst[number] = val

[
    l1_5_getter(),
    l1_5_setter(12),
    l1,
    l1_5_getter()
]

Out = [5, None, [0, 1, 2, 3, 4, 12, 6, 7, 8, 9], 12]

p = Parameter(l1_5_getter, l1_5_setter)

print([
    p(),
    p.get(),
    p.val,
    p(13),
    p(),
    p.set(14),
    p.get()
])
p.val = 15
print(p.val, l1)

[12, 12, 12, 13, 13, None, 14]
15 [0, 1, 2, 3, 4, 15, 6, 7, 8, 9]

当然,对对象的字典项或属性进行这项工作也很容易。甚至可以使用globals()来完成OP所要求的操作:

def setter(val, dict=globals(), key='a'):
    dict[key] = val

def getter(dict=globals(), key='a'):
    return dict[key]

pa = Parameter(getter, setter)
pa(2)
print(a)
pa(3)
print(a)

这将打印2,然后打印3。

以这种方式来处理全局命名空间显然是一个可怕的想法,但它表明可以(如果不建议这样做)执行OP所要求的操作。

这个例子当然是毫无意义的。但是我发现该类在我为其开发的应用程序中很有用:一种数学模型,其行为由众多用户可设置的,各种类型的数学参数(由于它们取决于命令行参数而未知)控制在编译时)。并且一旦将访问权限封装在Parameter对象中,就可以用统一的方式操纵所有此类对象。

尽管它看起来不太像C或C ++指针,但这正在解决一个问题,如果我用C ++编写的话,我将使用指针解决该问题。

I wrote the following simple class as, effectively, a way to emulate a pointer in python:

class Parameter:
    """Syntactic sugar for getter/setter pair
    Usage:

    p = Parameter(getter, setter)

    Set parameter value:
    p(value)
    p.val = value
    p.set(value)

    Retrieve parameter value:
    p()
    p.val
    p.get()
    """
    def __init__(self, getter, setter):
        """Create parameter

        Required positional parameters:
        getter: called with no arguments, retrieves the parameter value.
        setter: called with value, sets the parameter.
        """
        self._get = getter
        self._set = setter

    def __call__(self, val=None):
        if val is not None:
            self._set(val)
        return self._get()

    def get(self):
        return self._get()

    def set(self, val):
        self._set(val)

    @property
    def val(self):
        return self._get()

    @val.setter
    def val(self, val):
        self._set(val)

Here’s an example of use (from a jupyter notebook page):

l1 = list(range(10))
def l1_5_getter(lst=l1, number=5):
    return lst[number]

def l1_5_setter(val, lst=l1, number=5):
    lst[number] = val

[
    l1_5_getter(),
    l1_5_setter(12),
    l1,
    l1_5_getter()
]

Out = [5, None, [0, 1, 2, 3, 4, 12, 6, 7, 8, 9], 12]

p = Parameter(l1_5_getter, l1_5_setter)

print([
    p(),
    p.get(),
    p.val,
    p(13),
    p(),
    p.set(14),
    p.get()
])
p.val = 15
print(p.val, l1)

[12, 12, 12, 13, 13, None, 14]
15 [0, 1, 2, 3, 4, 15, 6, 7, 8, 9]

Of course, it is also easy to make this work for dict items or attributes of an object. There is even a way to do what the OP asked for, using globals():

def setter(val, dict=globals(), key='a'):
    dict[key] = val

def getter(dict=globals(), key='a'):
    return dict[key]

pa = Parameter(getter, setter)
pa(2)
print(a)
pa(3)
print(a)

This will print out 2, followed by 3.

Messing with the global namespace in this way is obviously a terrible idea, but it shows that it is possible (if inadvisable) to do what the OP asked for.

The example is, of course, fairly pointless. But I have found this class to be useful in the application for which I developed it: a mathematical model whose behavior is governed by numerous user-settable mathematical parameters, of diverse types (which, because they depend on command line arguments, are not known at compile time). And once access to something has been encapsulated in a Parameter object, all such objects can be manipulated in a uniform way.

Although it doesn’t look much like a C or C++ pointer, this is solving a problem that I would have solved with pointers if I were writing in C++.
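The remark that this also works for dict items or object attributes can be sketched with two small factory functions. The names item_accessors and attr_accessors are hypothetical; each returns a getter/setter pair of the shape that Parameter(getter, setter) expects.

```python
def item_accessors(container, key):
    """Return a (getter, setter) pair bound to container[key]."""

    def getter():
        return container[key]

    def setter(val):
        container[key] = val

    return getter, setter


def attr_accessors(obj, name):
    """Return a (getter, setter) pair bound to the attribute obj.<name>."""

    def getter():
        return getattr(obj, name)

    def setter(val):
        setattr(obj, name, val)

    return getter, setter


settings = {'rate': 0.5}
get_rate, set_rate = item_accessors(settings, 'rate')
set_rate(0.75)
print(get_rate(), settings)
```

With the Parameter class from this answer in scope, Parameter(*item_accessors(settings, 'rate')) would then wrap the dict entry in the same uniform interface.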


回答 8

以下代码完全模拟了C语言中指针的行为:

from collections import deque # more efficient than list for appending things
pointer_storage = deque()
pointer_address = 0

class new:    
    def __init__(self):
        global pointer_storage    
        global pointer_address

        self.address = pointer_address
        self.val = None        
        pointer_storage.append(self)
        pointer_address += 1


def get_pointer(address):
    return pointer_storage[address]

def get_address(p):
    return p.address

null = new() # create a null pointer, whose address is 0    

以下是使用示例:

p = new()
p.val = 'hello'
q = new()
q.val = p
r = new()
r.val = 33

p = get_pointer(3)
print(p.val, flush = True)
p.val = 43
print(get_pointer(3).val, flush = True)

但是现在是时候提供更专业的代码了,包括删除指针的选项,我刚刚在我的个人库中找到了它:

# C pointer emulation:

from collections import deque # more efficient than list for appending things
from sortedcontainers import SortedList #perform add and discard in log(n) times


class new:      
    # C pointer emulation:
    # use as : p = new()
    #          p.val             
    #          p.val = something
    #          p.address
    #          get_address(p) 
    #          del_pointer(p) 
    #          null (a null pointer)

    __pointer_storage__ = SortedList(key = lambda p: p.address)
    __to_delete_pointers__ = deque()
    __pointer_address__ = 0 

    def __init__(self):      

        self.val = None 

        if new.__to_delete_pointers__:
            p = new.__to_delete_pointers__.pop()
            self.address = p.address
            new.__pointer_storage__.discard(p) # performed in log(n) time thanks to sortedcontainers
            new.__pointer_storage__.add(self)  # idem

        else:
            self.address = new.__pointer_address__
            new.__pointer_storage__.add(self)
            new.__pointer_address__ += 1


def get_pointer(address):
    return new.__pointer_storage__[address]


def get_address(p):
    return p.address


def del_pointer(p):
    new.__to_delete_pointers__.append(p)

null = new() # create a null pointer, whose address is 0

The following code emulates exactly the behavior of pointers in C:

from collections import deque # more efficient than list for appending things
pointer_storage = deque()
pointer_address = 0

class new:    
    def __init__(self):
        global pointer_storage    
        global pointer_address

        self.address = pointer_address
        self.val = None        
        pointer_storage.append(self)
        pointer_address += 1


def get_pointer(address):
    return pointer_storage[address]

def get_address(p):
    return p.address

null = new() # create a null pointer, whose address is 0    

Here are examples of use:

p = new()
p.val = 'hello'
q = new()
q.val = p
r = new()
r.val = 33

p = get_pointer(3)
print(p.val, flush = True)
p.val = 43
print(get_pointer(3).val, flush = True)

But it’s now time to give more professional code, including the option of deleting pointers, which I’ve just found in my personal library:

# C pointer emulation:

from collections import deque # more efficient than list for appending things
from sortedcontainers import SortedList #perform add and discard in log(n) times


class new:      
    # C pointer emulation:
    # use as : p = new()
    #          p.val             
    #          p.val = something
    #          p.address
    #          get_address(p) 
    #          del_pointer(p) 
    #          null (a null pointer)

    __pointer_storage__ = SortedList(key = lambda p: p.address)
    __to_delete_pointers__ = deque()
    __pointer_address__ = 0 

    def __init__(self):      

        self.val = None 

        if new.__to_delete_pointers__:
            p = new.__to_delete_pointers__.pop()
            self.address = p.address
            new.__pointer_storage__.discard(p) # performed in log(n) time thanks to sortedcontainers
            new.__pointer_storage__.add(self)  # idem

        else:
            self.address = new.__pointer_address__
            new.__pointer_storage__.add(self)
            new.__pointer_address__ += 1


def get_pointer(address):
    return new.__pointer_storage__[address]


def get_address(p):
    return p.address


def del_pointer(p):
    new.__to_delete_pointers__.append(p)

null = new() # create a null pointer, whose address is 0

ImportError:没有名为MySQLdb的模块

问题:ImportError:没有名为MySQLdb的模块

我指的是以下教程来为我的Web应用程序创建登录页面。 http://code.tutsplus.com/tutorials/intro-to-flask-signing-in-and-out–net-29982

我的数据库有问题。我正在

ImportError: No module named MySQLdb

当我执行

http://127.0.0.1:5000/testdb

我已经尝试了所有可能的方法来安装python mysql,这是本教程中提到的一种,easy_install,sudo apt-get install。

我已经在虚拟环境中安装了mysql。我的目录结构与本教程中说明的目录结构相同。该模块已成功安装在我的系统中,但仍然出现此错误。

请帮忙。是什么原因造成的。

I am following this tutorial to make a login page for my web application: http://code.tutsplus.com/tutorials/intro-to-flask-signing-in-and-out–net-29982

I am having an issue with the database. I am getting an

ImportError: No module named MySQLdb

when I execute

http://127.0.0.1:5000/testdb

I have tried all possible ways to install Python MySQL support: the one mentioned in the tutorial, easy_install, and sudo apt-get install.

I have installed mysql in my virtual env. My directory structure is just the same as what’s explained in the tutorial. The module is successfully installed on my system, and I am still getting this error.

Please help. What could be causing this?


回答 0

如果您在编译二进制扩展名时遇到问题,或者在无法扩展的平台上,则可以尝试使用纯python PyMySQL绑定。

只需pip install pymysql切换您的SQLAlchemy URI即可,如下所示:

SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://.....'

您还可以尝试其他一些驱动程序

If you’re having issues compiling the binary extension, or are on a platform where you can’t, you can try using the pure Python PyMySQL bindings.

Simply pip install pymysql and switch your SQLAlchemy URI to start like this:

SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://.....'

There are some other drivers you could also try.
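Before switching the URI, it can help to check which driver the current interpreter can actually import. The following is a small sketch for Python 3 using only the standard library; the function name available_mysql_prefix is made up for illustration, and the two URI prefixes are the ones mentioned in the answers on this page.

```python
import importlib.util

# Import name -> SQLAlchemy URI prefix, both taken from the answers here.
CANDIDATE_DRIVERS = [
    ("MySQLdb", "mysql+mysqldb"),
    ("pymysql", "mysql+pymysql"),
]


def available_mysql_prefix():
    """Return the URI prefix of the first importable driver, or None."""
    for module_name, prefix in CANDIDATE_DRIVERS:
        if importlib.util.find_spec(module_name) is not None:
            return prefix
    return None


prefix = available_mysql_prefix()
if prefix is None:
    print("No MySQL driver found in this interpreter")
else:
    print("Use a URI starting with", prefix + "://")
```

Run it with the same interpreter (and the same virtual environment) that runs the web application, since a driver installed system-wide may not be visible there.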


回答 1

或尝试以下方法:

apt-get install python-mysqldb

Or try this:

apt-get install python-mysqldb

回答 2

你可以尝试

pip install mysqlclient

You may try

pip install mysqlclient

回答 3

我的问题是:

return __import__('MySQLdb')
ImportError: No module named MySQLdb

和我的决议:

pip install MySQL-python
yum install mysql-devel.x86_64

在开始的时候,我刚刚安装了MySQL-python,但是问题仍然存在。因此,我认为如果发生此问题,您还应该考虑mysql-devel。希望这可以帮助。

My issue was:

return __import__('MySQLdb')
ImportError: No module named MySQLdb

and my resolution:

pip install MySQL-python
yum install mysql-devel.x86_64

At the very beginning, I just installed MySQL-python, but the issue still existed. So I think that if this issue happens, you should also take mysql-devel into consideration. Hope this helps.


回答 4

在研究SQLAlchemy时遇到了这个问题。SQLAlchemy用于MySQL的默认方言是mysql+mysqldb

engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')

No module named MySQLdb执行上述命令时出现“ ”错误。要修复它,我安装了mysql-python模块,此问题已解决。

sudo pip install mysql-python

I got this issue when I was working on SQLAlchemy. The default dialect used by SQLAlchemy for MySQL is mysql+mysqldb.

engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')

I got the “No module named MySQLdb” error when the above command was executed. To fix it I installed the mysql-python module and the issue was fixed.

sudo pip install mysql-python

回答 5

根据我的经验,它也取决于Python版本。

如果您使用的是Python 3,则@DazWorrall答案对我来说效果很好。

但是,如果您使用的是Python 2,则应该

sudo pip install mysql-python

这将安装“ MySQLdb”模块,而无需更改SQLAlchemy URI。

In my experience, it depends on the Python version as well.

If you are using Python 3, @DazWorrall’s answer worked fine for me.

However, if you are using Python 2, you should

sudo pip install mysql-python

which would install ‘MySQLdb’ module without having to change the SQLAlchemy URI.


回答 6

所以我花了大约5个小时试图弄清楚在尝试运行时如何处理此问题

./manage.py makemigrations

使用Ubuntu Server LTS 16.1,完整的LAMP堆栈,Apache2 MySql 5.7 PHP 7 Python 3和Django 1.10.2,我确实很难找到一个好的答案。实际上,我仍然不满意,但是对我有用的唯一解决方案是……

sudo apt-get install build-essential python-dev libapache2-mod-wsgi-py3 libmysqlclient-dev

其次(从虚拟环境内部)

pip install mysqlclient

我真的不喜欢在尝试设置新的Web服务器时必须使用dev install,但是不幸的是,这种配置是我唯一可以采用的舒适方式。

So I spent about 5 hours trying to figure out how to deal with this issue when trying to run

./manage.py makemigrations

With Ubuntu Server LTS 16.1, a full LAMP stack, Apache2, MySql 5.7, PHP 7, Python 3 and Django 1.10.2, I really struggled to find a good answer to this. In fact, I am still not satisfied, but the ONLY solution that worked for me is this…

sudo apt-get install build-essential python-dev libapache2-mod-wsgi-py3 libmysqlclient-dev

followed by (from inside the virtual environment)

pip install mysqlclient

I really dislike having to use dev installs when I am trying to set up a new web server, but unfortunately this configuration was the only mostly comfortable path I could take.


回答 7

尽管@Edward van Kuik答案是正确的,但并未考虑virtualenv v1.7及更高版本的问题

特别是在Ubuntu 上python-mysqldb通过via apt进行安装时,将其放在下/usr/lib/pythonX.Y/dist-packages,但virtualenv的默认情况下不包含此路径。sys.path

因此,要解决此问题,您应该通过运行类似以下内容的系统包来创建您的virtualenv:

virtualenv --system-site-packages .venv

While @Edward van Kuik’s answer is correct, it doesn’t take into account an issue with virtualenv v1.7 and above.

In particular, installing python-mysqldb via apt on Ubuntu puts it under /usr/lib/pythonX.Y/dist-packages, but this path isn’t included by default in the virtualenv’s sys.path.

So to resolve this, you should create your virtualenv with system packages by running something like:

virtualenv --system-site-packages .venv
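To see whether this applies to your setup, you can inspect the interpreter from inside the environment. The sketch below uses only the standard library; the helper names in_virtualenv and dist_packages_on_path are made up for illustration.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtual environment."""
    # Python 3 venv/virtualenv keep the original prefix in sys.base_prefix;
    # old virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


def dist_packages_on_path():
    """Return the sys.path entries that end in dist-packages."""
    return [p for p in sys.path if p.rstrip("/").endswith("dist-packages")]


print("virtualenv:", in_virtualenv())
print("dist-packages entries:", dist_packages_on_path())
```

If the first line reports a virtualenv and the second list is empty, packages installed via apt are not visible to that environment.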


回答 8

有太多与权限有关的错误,什么也没有。您可能想试试这个:

xcode-select --install

I got so many errors related to permissions and whatnot. You may want to try this:

xcode-select --install

回答 9

yum install MySQL-python.x86_64

为我工作。

yum install MySQL-python.x86_64

worked for me.


回答 10

在ubuntu 20中,您可以尝试以下操作:

sudo apt-get install libmysqlclient-dev
sudo apt-get install gcc
pip install mysqlclient

On Ubuntu 20, you can try this:

sudo apt-get install libmysqlclient-dev
sudo apt-get install gcc
pip install mysqlclient

Best way to preserve numpy arrays on disk

Question: best way to preserve numpy arrays on disk

I am looking for a fast way to preserve large numpy arrays. I want to save them to disk in a binary format, then read them back into memory relatively quickly. cPickle is not fast enough, unfortunately.

I found numpy.savez and numpy.load. But the weird thing is, numpy.load loads an npy file into a “memory-map”. That means regular manipulation of the arrays is really slow. For example, something like this would be really slow:

#!/usr/bin/python
import time
from tempfile import TemporaryFile

import numpy as np

n = 10000000

a = np.arange(n)
b = np.arange(n) * 10
c = np.arange(n) * -0.5

file = TemporaryFile()
np.savez(file, a=a, b=b, c=c)

file.seek(0)
t = time.time()
z = np.load(file)
print("loading time = ", time.time() - t)

t = time.time()
aa = z['a']
bb = z['b']
cc = z['c']
print("assigning time = ", time.time() - t)

More precisely, the first line will be really fast, but the remaining lines that assign the arrays to obj are ridiculously slow:

loading time =  0.000220775604248
assigning time =  2.72940087318

Is there any better way of preserving numpy arrays? Ideally, I want to be able to store multiple arrays in one file.


Answer 0

I’m a big fan of hdf5 for storing large numpy arrays. There are two options for dealing with hdf5 in python:

http://www.pytables.org/

http://www.h5py.org/

Both are designed to work with numpy arrays efficiently.


Answer 1

I’ve compared performance (space and time) for a number of ways to store numpy arrays. Few of them support multiple arrays per file, but perhaps it’s useful anyway.

Npy and binary files are both really fast and small for dense data. If the data is sparse or very structured, you might want to use npz with compression, which’ll save a lot of space but cost some load time.

If portability is an issue, binary is better than npy. If human readability is important, then you’ll have to sacrifice a lot of performance, but it can be achieved fairly well using csv (which is also very portable of course).

More details and the code are available at the github repo.
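
To make the npy-versus-raw-binary trade-off concrete, here is a minimal sketch (not taken from the linked repo). It assumes that “binary” means a raw dump written with `ndarray.tofile`; the file names and the example array are made up for illustration. A raw dump stores no metadata, so the reader has to supply the dtype and shape, while an npy file carries both in its header.

```python
import os
import tempfile

import numpy as np

# Hypothetical example array; any dense array would do.
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "arr.npy")
raw_path = os.path.join(tmpdir, "arr.bin")

# npy: dtype and shape are stored in the file header.
np.save(npy_path, arr)
from_npy = np.load(npy_path)

# Raw binary: only the bytes are written, so dtype and shape
# must be known to whoever reads the file back.
arr.tofile(raw_path)
from_raw = np.fromfile(raw_path, dtype=np.float64).reshape(3, 4)
```

The raw file can be read by any tool that knows the layout, which is presumably the portability argument; the price is that the layout has to be communicated separately.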


Answer 2

There is now an HDF5-based clone of pickle called hickle!

https://github.com/telegraphic/hickle

import hickle as hkl 

data = { 'name' : 'test', 'data_arr' : [1, 2, 3, 4] }

# Dump data to file
hkl.dump( data, 'new_data_file.hkl' )

# Load data from file
data2 = hkl.load( 'new_data_file.hkl' )

print( data == data2 )

EDIT:

It is also possible to “pickle” directly into a compressed archive:

import pickle, gzip, lzma, bz2

# Each with-block closes (and flushes) the archive after writing.
with gzip.open( 'data.pkl.gz', 'wb' ) as f:
    pickle.dump( data, f )
with lzma.open( 'data.pkl.lzma', 'wb' ) as f:
    pickle.dump( data, f )
with bz2.open( 'data.pkl.bz2', 'wb' ) as f:
    pickle.dump( data, f )
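
Reading such an archive back works the same way in reverse. Here is a minimal round-trip sketch using only the standard library; the file location and the `data` dictionary are placeholders, and the `with` blocks make sure the files are closed before they are read again.

```python
import gzip
import os
import pickle
import tempfile

# Placeholder data; any picklable object works.
data = {'name': 'test', 'data_arr': [1, 2, 3, 4]}

path = os.path.join(tempfile.mkdtemp(), 'data.pkl.gz')

# Write the pickle through a gzip stream.
with gzip.open(path, 'wb') as f:
    pickle.dump(data, f)

# Read it back through the same kind of stream.
with gzip.open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == data)
```

`lzma.open` and `bz2.open` can be swapped in for `gzip.open` in the same way.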


Appendix

import numpy as np
import matplotlib.pyplot as plt
import pickle, os, time
import gzip, lzma, bz2, h5py

compressions = [ 'pickle', 'h5py', 'gzip', 'lzma', 'bz2' ]
labels = [ 'pickle', 'h5py', 'pickle+gzip', 'pickle+lzma', 'pickle+bz2' ]
size = 1000

data = {}

# Random data
data['random'] = np.random.random((size, size))

# Not that random data
data['semi-random'] = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        data['semi-random'][i,j] = np.sum(data['random'][i,:]) + np.sum(data['random'][:,j])

# Not random data
data['not-random'] = np.arange( size*size, dtype=np.float64 ).reshape( (size, size) )

sizes = {}

for key in data:

    sizes[key] = {}

    for compression in compressions:

        if compression == 'pickle':
            time_start = time.time()
            pickle.dump( data[key], open( 'data.pkl', 'wb' ) )
            time_tot = time.time() - time_start
            sizes[key]['pickle'] = ( os.path.getsize( 'data.pkl' ) * 10**(-6), time_tot )
            os.remove( 'data.pkl' )

        elif compression == 'h5py':
            time_start = time.time()
            with h5py.File( 'data.pkl.{}'.format(compression), 'w' ) as h5f:
                h5f.create_dataset('data', data=data[key])
            time_tot = time.time() - time_start
            sizes[key][compression] = ( os.path.getsize( 'data.pkl.{}'.format(compression) ) * 10**(-6), time_tot)
            os.remove( 'data.pkl.{}'.format(compression) )

        else:
            time_start = time.time()
            pickle.dump( data[key], eval(compression).open( 'data.pkl.{}'.format(compression), 'wb' ) )
            time_tot = time.time() - time_start
            sizes[key][ labels[ compressions.index(compression) ] ] = ( os.path.getsize( 'data.pkl.{}'.format(compression) ) * 10**(-6), time_tot )
            os.remove( 'data.pkl.{}'.format(compression) )


f, ax_size = plt.subplots()
ax_time = ax_size.twinx()

x_ticks = labels
x = np.arange( len(x_ticks) )

y_size = {}
y_time = {}
for key in data:
    y_size[key] = [ sizes[key][ x_ticks[i] ][0] for i in x ]
    y_time[key] = [ sizes[key][ x_ticks[i] ][1] for i in x ]

width = .2
viridis = plt.cm.viridis

p1 = ax_size.bar( x-width, y_size['random']       , width, color = viridis(0)  )
p2 = ax_size.bar( x      , y_size['semi-random']  , width, color = viridis(.45))
p3 = ax_size.bar( x+width, y_size['not-random']   , width, color = viridis(.9) )

p4 = ax_time.bar( x-width, y_time['random']  , .02, color = 'red')
ax_time.bar( x      , y_time['semi-random']  , .02, color = 'red')
ax_time.bar( x+width, y_time['not-random']   , .02, color = 'red')

ax_size.legend( (p1, p2, p3, p4), ('random', 'semi-random', 'not-random', 'saving time'), loc='upper center',bbox_to_anchor=(.5, -.1), ncol=4 )
ax_size.set_xticks( x )
ax_size.set_xticklabels( x_ticks )

f.suptitle( 'Pickle Compression Comparison' )
ax_size.set_ylabel( 'Size [MB]' )
ax_time.set_ylabel( 'Time [s]' )

f.savefig( 'sizes.pdf', bbox_inches='tight' )

Answer 3

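savez() saves data in a zip file, so it may take some time to zip and unzip the file. You can use the save() and load() functions instead:

# Write the three arrays one after another into a single file.
with open("tmp.bin", "wb") as f:
    np.save(f, a)
    np.save(f, b)
    np.save(f, c)

# Read them back in the same order they were written.
with open("tmp.bin", "rb") as f:
    aa = np.load(f)
    bb = np.load(f)
    cc = np.load(f)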

To save multiple arrays in one file, you just need to open the file first, and then save or load the arrays in sequence.


Answer 4

Another possibility to store numpy arrays efficiently is Bloscpack:

#!/usr/bin/python
import numpy as np
import bloscpack as bp
import time

n = 10000000

a = np.arange(n)
b = np.arange(n) * 10
c = np.arange(n) * -0.5
tsizeMB = sum(i.size*i.itemsize for i in (a,b,c)) / 2**20.

blosc_args = bp.DEFAULT_BLOSC_ARGS
blosc_args['clevel'] = 6
t = time.time()
bp.pack_ndarray_file(a, 'a.blp', blosc_args=blosc_args)
bp.pack_ndarray_file(b, 'b.blp', blosc_args=blosc_args)
bp.pack_ndarray_file(c, 'c.blp', blosc_args=blosc_args)
t1 = time.time() - t
print("store time = %.2f (%.2f MB/s)" % (t1, tsizeMB / t1))

t = time.time()
a1 = bp.unpack_ndarray_file('a.blp')
b1 = bp.unpack_ndarray_file('b.blp')
c1 = bp.unpack_ndarray_file('c.blp')
t1 = time.time() - t
print("loading time = %.2f (%.2f MB/s)" % (t1, tsizeMB / t1))

and the output for my laptop (a relatively old MacBook Air with a Core2 processor):

$ python store-blpk.py
store time = 0.19 (1216.45 MB/s)
loading time = 0.25 (898.08 MB/s)

that means that it can store really fast, i.e. the bottleneck is typically the disk. However, as the compression ratios are pretty good here, the effective speed is multiplied by the compression ratios. Here are the sizes for these 76 MB arrays:

$ ll -h *.blp
-rw-r--r--  1 faltet  staff   921K Mar  6 13:50 a.blp
-rw-r--r--  1 faltet  staff   2.2M Mar  6 13:50 b.blp
-rw-r--r--  1 faltet  staff   1.4M Mar  6 13:50 c.blp

Please note that the use of the Blosc compressor is fundamental for achieving this. The same script but using ‘clevel’ = 0 (i.e. disabling compression):

$ python bench/store-blpk.py
store time = 3.36 (68.04 MB/s)
loading time = 2.61 (87.80 MB/s)

is clearly bottlenecked by the disk performance.


Answer 5

The lookup is slow because, when mmap is used, invoking the load method does not load the array contents into memory. Data is loaded lazily, when a particular piece of it is needed, and in your case that happens during the lookup. A second lookup won’t be as slow.

This is a nice feature of mmap: when you have a big array, you do not have to load the whole thing into memory.

To solve your problem you can use joblib. With joblib.dump you can dump any object you want, even two or more numpy arrays. See the example:

import joblib
import numpy as np

firstArray = np.arange(100)
secondArray = np.arange(50)
# Put the two arrays in a dictionary and save them to one file
my_dict = {'first' : firstArray, 'second' : secondArray}
joblib.dump(my_dict, 'file_name.dat')
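
The lazy loading described above can be observed directly with an `.npz` archive. This is a small sketch, assuming a NumPy version in which the object returned by `np.load` for an `.npz` file supports the `with` statement and the `files` attribute; the array names and sizes are made up. Each array is only read from disk when it is looked up by name, so copying the arrays into a dictionary pays that cost once.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "arrays.npz")

# Store two hypothetical arrays in one archive.
np.savez(path, first=np.arange(100), second=np.arange(50))

# np.load only opens the archive; each array is read when
# it is looked up by name.
with np.load(path) as archive:
    arrays = {name: archive[name] for name in archive.files}

# From here on the arrays live in memory as ordinary ndarrays.
print(arrays["first"].sum(), arrays["second"].sum())
```

After the `with` block the archive is closed, and later operations on `arrays` no longer touch the file.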

Python: changing a value in a tuple

Question: Python: changing a value in a tuple

I’m new to python so this question might be a little basic. I have a tuple called values which contains the following:

('275', '54000', '0.0', '5000.0', '0.0')

I want to change the first value (i.e., 275) in this tuple but I understand that tuples are immutable so values[0] = 200 will not work. How can I achieve this?


Answer 0

First you need to ask, why you want to do this?

But it’s possible via:

t = ('275', '54000', '0.0', '5000.0', '0.0')
lst = list(t)
lst[0] = '300'
t = tuple(lst)

But if you’re going to need to change things, you probably are better off keeping it as a list


Answer 1

Depending on your problem slicing can be a really neat solution:

>>> b = (1, 2, 3, 4, 5)
>>> b[:2] + (8,9) + b[3:]
(1, 2, 8, 9, 4, 5)
>>> b[:2] + (8,) + b[3:]
(1, 2, 8, 4, 5)

This allows you to add multiple elements or to replace a few elements (especially if they are “neighbours”). In the above case, casting to a list is probably more appropriate and readable (even though the slicing notation is much shorter).
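
The slicing idea can be wrapped in a small helper. This is only a sketch; `replace_slice` is a made-up name, not a built-in, and it assumes `start` and `stop` are ordinary non-negative slice bounds.

```python
def replace_slice(tup, start, stop, new_items):
    """Return a new tuple with tup[start:stop] replaced by new_items."""
    return tup[:start] + tuple(new_items) + tup[stop:]


b = (1, 2, 3, 4, 5)

# Replace the single element at index 2 with two elements.
print(replace_slice(b, 2, 3, (8, 9)))

# Replace the element at index 2 with one element.
print(replace_slice(b, 2, 3, (8,)))
```

The original tuple `b` is left untouched; each call builds and returns a new tuple.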


Answer 2

Well, as Trufa has already shown, there are basically two ways of replacing a tuple’s element at a given index. Either convert the tuple to a list, replace the element and convert back, or construct a new tuple by concatenation.

In [1]: def replace_at_index1(tup, ix, val):
   ...:     lst = list(tup)
   ...:     lst[ix] = val
   ...:     return tuple(lst)
   ...:

In [2]: def replace_at_index2(tup, ix, val):
   ...:     return tup[:ix] + (val,) + tup[ix+1:]
   ...:

So, which method is better, that is, faster?

It turns out that for short tuples (on Python 3.3), concatenation is actually faster!

In [3]: d = tuple(range(10))

In [4]: %timeit replace_at_index1(d, 5, 99)
1000000 loops, best of 3: 872 ns per loop

In [5]: %timeit replace_at_index2(d, 5, 99)
1000000 loops, best of 3: 642 ns per loop

Yet if we look at longer tuples, list conversion is the way to go:

In [6]: k = tuple(range(1000))

In [7]: %timeit replace_at_index1(k, 500, 99)
100000 loops, best of 3: 9.08 µs per loop

In [8]: %timeit replace_at_index2(k, 500, 99)
100000 loops, best of 3: 10.1 µs per loop

For very long tuples, list conversion is substantially better!

In [9]: m = tuple(range(1000000))

In [10]: %timeit replace_at_index1(m, 500000, 99)
10 loops, best of 3: 26.6 ms per loop

In [11]: %timeit replace_at_index2(m, 500000, 99)
10 loops, best of 3: 35.9 ms per loop

Also, performance of the concatenation method depends on the index at which we replace the element. For the list method, the index is irrelevant.

In [12]: %timeit replace_at_index1(m, 900000, 99)
10 loops, best of 3: 26.6 ms per loop

In [13]: %timeit replace_at_index2(m, 900000, 99)
10 loops, best of 3: 49.2 ms per loop

So: If your tuple is short, slice and concatenate. If it’s long, do the list conversion!
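
The timings above were taken on Python 3.3 with IPython’s `%timeit`, so the crossover point may differ on other versions and machines. To re-measure outside IPython, here is a sketch using the standard `timeit` module; the `compare` helper and its `repeats` default are assumptions made for illustration.

```python
import timeit


def replace_at_index1(tup, ix, val):
    # Convert to a list, assign, convert back.
    lst = list(tup)
    lst[ix] = val
    return tuple(lst)


def replace_at_index2(tup, ix, val):
    # Build a new tuple from two slices and the new value.
    return tup[:ix] + (val,) + tup[ix + 1:]


def compare(length, repeats=1000):
    """Return (list_time, concat_time) in seconds for `repeats` calls."""
    tup = tuple(range(length))
    ix = length // 2
    t_list = timeit.timeit(lambda: replace_at_index1(tup, ix, 99), number=repeats)
    t_concat = timeit.timeit(lambda: replace_at_index2(tup, ix, 99), number=repeats)
    return t_list, t_concat


for n in (10, 1000):
    print(n, compare(n))
```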


Answer 3

It is possible with a one liner:

values = ('275', '54000', '0.0', '5000.0', '0.0')
values = ('300', *values[1:])
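
The same unpacking trick extends to any position. A minimal sketch, assuming Python 3.5 or later and a non-negative `index`; `replace_at` is a hypothetical helper name.

```python
def replace_at(values, index, new_value):
    """Return a copy of `values` with the item at `index` replaced."""
    return (*values[:index], new_value, *values[index + 1:])


values = ('275', '54000', '0.0', '5000.0', '0.0')
print(replace_at(values, 0, '300'))
print(replace_at(values, 3, '300'))
```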

Answer 4

Not that this is superior, but if anyone is curious it can be done on one line with:

# `values` is the tuple from the question; naming it `tuple` would shadow the built-in.
values = tuple(200 if i == 0 else v for i, v in enumerate(values))

Answer 5

I believe this technically answers the question, but don’t do this at home. At the moment, all answers involve creating a new tuple, but you can use ctypes to modify a tuple in-memory. Relying on various implementation details of CPython on a 64-bit system, one way to do this is as follows:

import ctypes

def modify_tuple(t, idx, new_value):
    # `id` happens to give the memory address in CPython; you may
    # want to use `ctypes.addressof` instead.
    element_ptr = (ctypes.c_longlong).from_address(id(t) + (3 + idx)*8)
    element_ptr.value = id(new_value)
    # Manually increment the reference count to `new_value` to pretend that
    # this is not a terrible idea.
    ref_count = (ctypes.c_longlong).from_address(id(new_value))
    ref_count.value += 1

t = (10, 20, 30)
modify_tuple(t, 1, 50)   # t is now (10, 50, 30)
modify_tuple(t, -1, 50)  # Will probably crash your Python runtime

Answer 6

As Hunter McMillen wrote in the comments, tuples are immutable, so you need to create a new tuple in order to achieve this. For instance:

>>> tpl = ('275', '54000', '0.0', '5000.0', '0.0')
>>> change_value = 200
>>> tpl = (change_value,) + tpl[1:]
>>> tpl
(200, '54000', '0.0', '5000.0', '0.0')

Answer 7

EDIT: This doesn’t work on tuples with duplicate entries yet!!

Based on Pooya’s idea:

If you are planning on doing this often (which you shouldn’t, since tuples are immutable for a reason), you should do something like this:

def modTupByIndex(tup, index, ins):
    return tuple(tup[0:index]) + (ins,) + tuple(tup[index+1:])

print(modTupByIndex((1,2,3),2,"a"))

Or based on Jon’s idea:

def modTupByIndex(tup, index, ins):
    lst = list(tup)
    lst[index] = ins
    return tuple(lst)

print(modTupByIndex((1,2,3),1,"a"))

Answer 8

First, ask yourself why you want to mutate your tuple. There is a reason why strings and tuples are immutable in Python; if you want to mutate your tuple, then it should probably be a list instead.

Second, if you still wish to mutate your tuple, you can convert it to a list, convert it back, and reassign the new tuple to the same variable. This is great if you are only going to mutate your tuple once. Otherwise, I personally think it is counterintuitive, because it essentially creates a new tuple, and every time you wish to mutate the tuple you have to perform the conversion. Also, someone reading the code may wonder why you did not just create a list. But it is nice because it doesn’t require any library.

I suggest using mutabletuple(typename, field_names, default=MtNoDefault) from mutabletuple 0.2. I personally think this way is more intuitive and readable. The person reading the code would know that the writer intends to mutate this tuple in the future. The downside compared to the list conversion method above is that it requires you to import an additional py file.

from mutabletuple import mutabletuple

myTuple = mutabletuple('myTuple', 'v w x y z')
p = myTuple('275', '54000', '0.0', '5000.0', '0.0')
print(p.v) #print 275
p.v = '200' #mutate myTuple
print(p.v) #print 200

TL;DR: Don’t try to mutate a tuple. If you do and it is a one-time operation, convert the tuple to a list, mutate it, turn the list into a new tuple, and reassign it to the variable holding the old tuple. If you want a tuple, somehow want to avoid a list, and want to mutate more than once, then create a mutabletuple.


Answer 9

Based on Jon’s idea and dear Trufa:

def modifyTuple(tup, oldval, newval):
    lst=list(tup)
    for i in range(tup.count(oldval)):
        index = lst.index(oldval)
        lst[index]=newval

    return tuple(lst)

print(modifyTuple((1, 1, 3), 1, "a"))

It changes all occurrences of the old value.
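
Replacing every occurrence of a value can also be done in a single pass over the tuple. This is a sketch of that alternative, not the approach used above; `replace_value` is a made-up name.

```python
def replace_value(tup, oldval, newval):
    """Return a new tuple with every item equal to oldval replaced by newval."""
    return tuple(newval if item == oldval else item for item in tup)


print(replace_value((1, 1, 3), 1, "a"))
```

This avoids the repeated `index` searches, each of which scans the list from the start.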


Answer 10

You can’t. If you want to change it, you need to use a list instead of a tuple.

Note that you could instead make a new tuple that has the new value as its first element.


Answer 11

I’ve found the best way to edit tuples is to recreate the tuple using the previous version as the base.

Here’s an example I used for making a lighter version of a colour (I had it open already at the time):

colour = tuple([c+50 for c in colour])

What it does, is it goes through the tuple ‘colour’ and reads each item, does something to it, and finally adds it to the new tuple.

So what you’d want would be something like:

values = ('275', '54000', '0.0', '5000.0', '0.0')

values = tuple('200' if i == 0 else v for i, v in enumerate(values))

The concept is what you need: iterate through the tuple, alter each item as needed, and build the new tuple from the results.


Answer 12

I’m late to the game, but I think the simplest, most resource-friendly and fastest way (depending on the situation) is to overwrite the tuple itself, since this removes the need for creating a list and extra variables and is achieved in one line.

new = 24
t = (1, 2, 3)
t = (t[0],t[1],new)

>>> (1, 2, 24)

But: This is only handy for rather small tuples and also limits you to a fixed tuple value, nevertheless, this is the case for tuples most of the time anyway.

So in this particular case it would look like this:

new = '200'
t = ('275', '54000', '0.0', '5000.0', '0.0')
t = (new, t[1], t[2], t[3], t[4])

>>> ('200', '54000', '0.0', '5000.0', '0.0')

Answer 13

tl;dr: the “workaround” is creating a new tuple object, not actually modifying the original.

While this is a very old question, someone told me about this Python mutating-tuples madness. I was very much surprised and intrigued, and after some googling I landed here (and on other similar samples online).

I ran some tests to prove my theory.

Note that == checks value equality while is checks referential equality (is obj a the same instance as obj b).

a = ("apple", "canana", "cherry")
b = tuple(["apple", "canana", "cherry"])
c = a

print("a: " + str(a))
print("b: " + str(b))
print("c: " + str(c))
print("a == b :: %s" % (a==b))
print("b == c :: %s" % (b==c))
print("a == c :: %s" % (a==c))
print("a is b :: %s" % (a is b))
print("b is c :: %s" % (b is c))
print("a is c :: %s" % (a is c))

d = list(a)
d[1] = "kiwi"
a = tuple(d)

print("a: " + str(a))
print("b: " + str(b))
print("c: " + str(c))
print("a == b :: %s" % (a==b))
print("b == c :: %s" % (b==c))
print("a == c :: %s" % (a==c))
print("a is b :: %s" % (a is b))
print("b is c :: %s" % (b is c))
print("a is c :: %s" % (a is c))

Yields:

a: ('apple', 'canana', 'cherry')
b: ('apple', 'canana', 'cherry')
c: ('apple', 'canana', 'cherry')
a == b :: True
b == c :: True
a == c :: True
a is b :: False
b is c :: False
a is c :: True
a: ('apple', 'kiwi', 'cherry')
b: ('apple', 'canana', 'cherry')
c: ('apple', 'canana', 'cherry')
a == b :: False
b == c :: True
a == c :: False
a is b :: False
b is c :: False
a is c :: False

Answer 14

You can’t modify items in tuple, but you can modify properties of mutable objects in tuples (for example if those objects are lists or actual class objects)

For example

my_list = [1,2]
tuple_of_lists = (my_list,'hello')
print(tuple_of_lists) # ([1, 2], 'hello')
my_list[0] = 0
print(tuple_of_lists) # ([0, 2], 'hello')

Answer 15

I did this:

list = [1,2,3,4,5]
tuple = (list)

and to change, just do

list[0]=6

and you can change a tuple :D

Here it is, copied exactly from IDLE:

>>> list=[1,2,3,4,5,6,7,8,9]

>>> tuple=(list)

>>> print(tuple)

[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list[0]=6

>>> print(tuple)

[6, 2, 3, 4, 5, 6, 7, 8, 9]
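
One thing worth checking in the session above: parentheses around a single name do not build a tuple in Python; a trailing comma (or a call to `tuple()`) does. Here is a short check, with made-up variable names that avoid shadowing the built-in `list` and `tuple`.

```python
items = [1, 2, 3, 4, 5]

not_a_tuple = (items)      # just another name for the same list
one_tuple = (items,)       # a 1-element tuple holding the list
real_tuple = tuple(items)  # a 5-element tuple copied from the list

print(type(not_a_tuple).__name__)  # list
print(type(one_tuple).__name__)    # tuple

items[0] = 6
print(real_tuple[0])  # still 1: the tuple does not follow the list
```

So the printed output in the session is a list, as the square brackets in it show.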

Answer 16

You can change the value of tuple using copy by reference

>>> tuple1=[20,30,40]

>>> tuple2=tuple1

>>> tuple2
    [20, 30, 40]

>>> tuple2[1]=10

>>> print(tuple2)
    [20, 10, 40]

>>> print(tuple1)
    [20, 10, 40]
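
Note that square brackets create a list, so the session above is aliasing a list. Here is a minimal sketch of how the same steps behave with an actual tuple (the names are illustrative): both names still refer to one object, but item assignment raises `TypeError`.

```python
tuple1 = (20, 30, 40)
tuple2 = tuple1            # both names refer to the same tuple object

try:
    tuple2[1] = 10         # tuples do not support item assignment
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

print(tuple1 is tuple2)
print(tuple1)
```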