Tag Archives: python-3.x

Deleting __pycache__ folders and .pyc files from a Python 3 project

Question: Deleting __pycache__ folders and .pyc files from a Python 3 project

What is the BEST way to clear out all the __pycache__ folders and .pyc/.pyo files from a Python 3 project? I have seen multiple users suggest the pyclean script bundled with Debian, but it does not remove the folders. I want a simple way to clean up the project before pushing the files to my DVCS.


Answer 0

You can do it manually with the following command:

find . | grep -E "(__pycache__|\.pyc|\.pyo$)" | xargs rm -rf

This will remove all *.pyc files and __pycache__ directories recursively in the current directory.


Answer 1

I found the answer myself when I mistyped pyclean as pycclean:

    No command 'pycclean' found, did you mean:
     Command 'py3clean' from package 'python3-minimal' (main)
     Command 'pyclean' from package 'python-minimal' (main)
    pycclean: command not found

Running py3clean . cleaned it up very nicely.


Answer 2

macOS & Linux

BSD’s find implementation on macOS is different from GNU find – this function is compatible with both BSD and GNU find. It starts with a globbing implementation, using -name and -o (for “or”). Put this function in your .bashrc file:

pyclean () {
    find . -type f -name '*.py[co]' -delete -o -type d -name __pycache__ -delete
}

Then cd to the directory you want to recursively clean, and type pyclean.

GNU find-only

This is a GNU find–only (i.e. Linux) solution, but I feel it’s a little nicer with the regex:

pyclean () {
    find . -regex '^.*\(__pycache__\|\.py[co]\)$' -delete
}

Any platform, using Python 3

On Windows, you probably don’t even have find. You do, however, probably have Python 3, which starting in 3.4 has the convenient pathlib module:

python3 -Bc "import pathlib; [p.unlink() for p in pathlib.Path('.').rglob('*.py[co]')]"
python3 -Bc "import pathlib; [p.rmdir() for p in pathlib.Path('.').rglob('__pycache__')]"

The -B flag tells Python not to write .pyc files. (See also the PYTHONDONTWRITEBYTECODE environment variable.)

The above abuses list comprehensions for looping, but when using python -c, style is rather a secondary concern. Alternatively we could abuse (for example) __import__:

python3 -Bc "for p in __import__('pathlib').Path('.').rglob('*.py[co]'): p.unlink()"
python3 -Bc "for p in __import__('pathlib').Path('.').rglob('__pycache__'): p.rmdir()"

Critique of an answer

The top answer used to say:

find . | grep -E "(__pycache__|\.pyc|\.pyo$)" | xargs rm -rf

This would seem to be less efficient because it uses three processes. find takes a regular expression, so we don’t need a separate invocation of grep. Similarly, it has -delete, so we don’t need a separate invocation of rm —and contrary to a comment here, it will delete non-empty directories so long as they get emptied by virtue of the regular expression match.

From the xargs man page:

find /tmp -depth -name core -type f -delete

Find files named core in or below the directory /tmp and delete them, but more efficiently than in the previous example (because we avoid the need to use fork(2) and exec(2) to launch rm and we don’t need the extra xargs process).


Answer 3

Since this is a Python 3 project, you only need to delete __pycache__ directories — all .pyc/.pyo files are inside them.

find . -type d -name __pycache__ -exec rm -r {} \+

or its simpler form,

find . -type d -name __pycache__ -delete

which didn’t work for me for some reason (files were deleted but directories weren’t; the likely cause is that find’s -delete refuses to remove a directory that is not empty), so I’m including both for the sake of completeness.


Alternatively, if you’re doing this in a directory that’s under revision control, you can tell the RCS to ignore __pycache__ folders recursively. Then, at the required moment, just clean up all the ignored files. This will likely be more convenient because there’ll probably be more to clean up than just __pycache__.


Answer 4

This is my alias that works with both Python 2 and Python 3, removing all .pyc/.pyo files as well as __pycache__ directories recursively (the \( ... \) grouping is needed so that -exec applies to both -name tests):

alias pyclean='find . \( -name "*.py[co]" -o -name __pycache__ \) -exec rm -rf {} +'

Answer 5

If you need a permanent solution for keeping Python cache files out of your project directories:

Starting with Python 3.8 you can use the environment variable PYTHONPYCACHEPREFIX to define a cache directory for Python.

From the Python docs:

If this is set, Python will write .pyc files in a mirror directory tree at this path, instead of in __pycache__ directories within the source tree. This is equivalent to specifying the -X pycache_prefix=PATH option.

Example

If you add the following line to your ~/.profile in Linux:

export PYTHONPYCACHEPREFIX="$HOME/.cache/cpython/"

Python won’t create the annoying __pycache__ directories in your project directory; instead it will put all of them under ~/.cache/cpython/.
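
To confirm the prefix is active, you can inspect sys.pycache_prefix, which was added in Python 3.8 alongside this feature (the printed path is just an example):

import sys
print(sys.pycache_prefix)  # e.g. /home/user/.cache/cpython, or None if unset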


Answer 6

The command I’ve used:

find . -type d -name "__pycache__" -exec rm -r {} +

Explanation:

  1. First, find all __pycache__ folders in the current directory.

  2. Execute rm -r {} + to delete each folder found in the step above ({} is a placeholder for each found path, and + terminates the command).

Edit 1:

I’m using Linux; to reuse the command, I added the line below to my ~/.bashrc file:

alias rm-pycache='find . -type d -name  "__pycache__" -exec rm -r {} +'

Edit 2: If you’re using VS Code, you don’t need to remove __pycache__ manually. You can add the snippet below to your settings.json file. After that, VS Code will hide all __pycache__ folders for you:

"files.exclude": {
     "**/__pycache__": true
}

Hope it helps!


Answer 7

From the project directory type the following:

Deleting all .pyc files

find . -path "*/*.pyc" -delete

Deleting all .pyo files:

find . -path "*/*.pyo" -delete

Finally, to delete all ‘__pycache__’, type:

find . -path "*/__pycache__" -type d -exec rm -r {} ';'

If you encounter a permission denied error, add sudo at the beginning of each of the above commands.


Answer 8

Using PyCharm

To remove Python compiled files

  1. In the Project Tool Window, right-click a project or directory, where Python compiled files should be deleted from.

  2. On the context menu, choose Clean Python compiled files.

The .pyc files residing in the selected directory are silently deleted.


Answer 9

Thanks a lot for the other answers; based on them, this is what I used for my Debian package’s prerm file:

#!/bin/sh
set -e

deb_package='package-name'
python_package='package_name'

if which py3clean >/dev/null 2>&1; then
    py3clean -p $deb_package
else
    dpkg -L $deb_package | grep ${python_package}$ | while read file
    do
        find ${file} -type d -name __pycache__ -exec rm -r {} \+
    done
fi

Answer 10

Why not just use rm -rf __pycache__? Run git add -A afterwards to remove them from your repository and add __pycache__/ to your .gitignore file.


Answer 11

Please just go to your terminal then type:

$ rm -r __pycache__

and it will be removed.


Warning about too many open figures

Question: Warning about too many open figures

In a script where I create many figures with fig, ax = plt.subplots(...), I get the warning RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory.

However, I don’t understand why I get this warning, because after saving the figure with fig.savefig(...), I delete it with fig.clear(); del fig. At no point in my code do I have more than one figure open at a time. Still, I get the warning about too many open figures. What does that mean / how can I avoid getting the warning?


Answer 0

Use .clf or .cla on your figure object instead of creating a new figure. From @DavidZwicker:

Assuming you have imported pyplot as

import matplotlib.pyplot as plt

plt.cla() clears an axis, i.e. the currently active axis in the current figure. It leaves the other axes untouched.

plt.clf() clears the entire current figure with all its axes, but leaves the window opened, such that it may be reused for other plots.

plt.close() closes a window, which will be the current window, if not specified otherwise. plt.close('all') will close all open figures.

The reason that del fig does not work is that the pyplot state-machine keeps a reference to the figure around (as it must if it is going to know what the ‘current figure’ is). This means that even if you delete your ref to the figure, there is at least one live ref, hence it will never be garbage collected.

Since I’m polling the collective wisdom here for this answer, @JoeKington mentions in the comments that plt.close(fig) will remove a specific figure instance from the pylab state machine (plt._pylab_helpers.Gcf) and allow it to be garbage collected.
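
Putting that together, a minimal sketch of the pattern this answer recommends — create, save, and explicitly close each figure so the state machine never accumulates more than one:

import matplotlib.pyplot as plt

for i in range(25):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, i])
    fig.savefig('plot_{}.png'.format(i))
    plt.close(fig)  # removes the figure from the pyplot state machine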


Answer 1

Here’s a bit more detail to expand on Hooked’s answer. When I first read that answer, I missed the instruction to call clf() instead of creating a new figure. clf() on its own doesn’t help if you then go and create another figure.

Here’s a trivial example that causes the warning:

from matplotlib import pyplot as plt, patches
import os


def main():
    path = 'figures'
    os.makedirs(path, exist_ok=True)  # ensure the output directory exists
    for i in range(21):
        _fig, ax = plt.subplots()
        x = range(3*i)
        y = [n*n for n in x]
        ax.add_patch(patches.Rectangle(xy=(i, 1), width=i, height=10))
        plt.step(x, y, linewidth=2, where='mid')
        figname = 'fig_{}.png'.format(i)
        dest = os.path.join(path, figname)
        plt.savefig(dest)  # write image to file
        plt.clf()
    print('Done.')

main()

To avoid the warning, I have to pull the call to subplots() outside the loop. In order to keep seeing the rectangles, I need to switch clf() to cla(). That clears the axis without removing the axis itself.

from matplotlib import pyplot as plt, patches
import os


def main():
    path = 'figures'
    os.makedirs(path, exist_ok=True)  # ensure the output directory exists
    _fig, ax = plt.subplots()
    for i in range(21):
        x = range(3*i)
        y = [n*n for n in x]
        ax.add_patch(patches.Rectangle(xy=(i, 1), width=i, height=10))
        plt.step(x, y, linewidth=2, where='mid')
        figname = 'fig_{}.png'.format(i)
        dest = os.path.join(path, figname)
        plt.savefig(dest)  # write image to file
        plt.cla()
    print('Done.')

main()

If you’re generating plots in batches, you might have to use both cla() and close(). I ran into a problem where a batch could have more than 20 plots without complaining, but it would complain after 20 batches. I fixed that by using cla() after each plot, and close() after each batch.

from matplotlib import pyplot as plt, patches
import os


def main():
    for i in range(21):
        print('Batch {}'.format(i))
        make_plots('figures')
    print('Done.')


def make_plots(path):
    os.makedirs(path, exist_ok=True)  # ensure the output directory exists
    fig, ax = plt.subplots()
    for i in range(21):
        x = range(3 * i)
        y = [n * n for n in x]
        ax.add_patch(patches.Rectangle(xy=(i, 1), width=i, height=10))
        plt.step(x, y, linewidth=2, where='mid')
        figname = 'fig_{}.png'.format(i)
        dest = os.path.join(path, figname)
        plt.savefig(dest)  # write image to file
        plt.cla()
    plt.close(fig)


main()

I measured the performance to see if it was worth reusing the figure within a batch, and this little sample program slowed from 41s to 49s (20% slower) when I just called close() after every plot.


Answer 2

If you intend to knowingly keep many plots in memory, but don’t want to be warned about it, you can update your options prior to generating figures.

import matplotlib.pyplot as plt
plt.rcParams.update({'figure.max_open_warning': 0})

This will prevent the warning from being emitted without changing anything about the way memory is managed.


Answer 3

The following snippet solved the issue for me:


import matplotlib.pyplot as plt


class FigureWrapper(object):
    '''Frees underlying figure when it goes out of scope. 
    '''

    def __init__(self, figure):
        self._figure = figure

    def __del__(self):
        plt.close(self._figure)
        print("Figure removed")


# .....
    f, ax = plt.subplots(1, figsize=(20, 20))
    _wrapped_figure = FigureWrapper(f)

    ax.plot(...
    plt.savefig(...
# .....

When _wrapped_figure goes out of scope, the runtime calls our __del__() method, which runs plt.close() on the underlying figure. This happens even if an exception fires after the _wrapped_figure constructor.


Answer 4

This is also useful if you only want to temporarily suppress the warning:

import matplotlib.pyplot as plt
       
with plt.rc_context(rc={'figure.max_open_warning': 0}):
    lots_of_plots()

Why is x**4.0 faster than x**4 in Python 3?

Question: Why is x**4.0 faster than x**4 in Python 3?

Why is x**4.0 faster than x**4? I am using CPython 3.5.2.

$ python -m timeit "for x in range(100):" " x**4.0"
  10000 loops, best of 3: 24.2 usec per loop

$ python -m timeit "for x in range(100):" " x**4"
  10000 loops, best of 3: 30.6 usec per loop

I tried changing the power I raise to in order to see how it acts; for example, if I raise x to the power of 10 or 16, it jumps from 30 to 35, but if I raise by 10.0 as a float, it just moves around 24.1~4.

I guess it has something to do with float conversion and powers of 2 maybe, but I don’t really know.

I noticed that in both cases powers of 2 are faster, I guess since those calculations are more native/easy for the interpreter/computer. But still, with floats it’s almost not moving. 2.0 => 24.1~4 & 128.0 => 24.1~4 but 2 => 29 & 128 => 62


TigerhawkT3 pointed out that it doesn’t happen outside of the loop. I checked and the situation only occurs (from what I’ve seen) when the base is getting raised. Any idea about that?
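
For anyone who wants to reproduce the comparison from inside a script rather than the shell, the timeit module gives equivalent numbers (timings are machine-dependent):

import timeit

print(timeit.timeit('for x in range(100): x**4.0', number=10000))
print(timeit.timeit('for x in range(100): x**4', number=10000))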

Answer 0

Why is x**4.0 faster than x**4 in Python 3*?

Python 3 int objects are full-fledged objects designed to support an arbitrary size; due to that fact, they are handled as such on the C level (see how all variables are declared as PyLongObject * type in long_pow). This also makes their exponentiation a lot trickier and more tedious, since you need to play around with the ob_digit array it uses to represent its value. (Source for the brave. — See: Understanding memory allocation for large integers in Python for more on PyLongObjects.)

Python float objects, on the contrary, can be transformed to a C double type (by using PyFloat_AsDouble) and operations can be performed using those native types. This is great because, after checking for relevant edge-cases, it allows Python to use the platforms’ pow (C’s pow, that is) to handle the actual exponentiation:

/* Now iv and iw are finite, iw is nonzero, and iv is
 * positive and not equal to 1.0.  We finally allow
 * the platform pow to step in and do the rest.
 */
errno = 0;
PyFPE_START_PROTECT("pow", return NULL)
ix = pow(iv, iw); 

where iv and iw are our original PyFloatObjects as C doubles.

For what it’s worth: Python 2.7.13 for me is a factor of 2–3 faster, and shows the inverse behaviour.

The previous fact also explains the discrepancy between Python 2 and 3 so, I thought I’d address this comment too because it is interesting.

In Python 2, you’re using the old int object that differs from the int object in Python 3 (all int objects in 3.x are of PyLongObject type). In Python 2, there’s a distinction that depends on the value of the object (or, if you use the suffix L/l):

# Python 2
type(30)  # <type 'int'>
type(30L) # <type 'long'>

The <type 'int'> you see here does the same thing floats do, it gets safely converted into a C long when exponentiation is performed on it (The int_pow also hints the compiler to put ’em in a register if it can do so, so that could make a difference):

static PyObject *
int_pow(PyIntObject *v, PyIntObject *w, PyIntObject *z)
{
    register long iv, iw, iz=0, ix, temp, prev;
/* Snipped for brevity */    

this allows for a good speed gain.

To see how sluggish <type 'long'>s are in comparison to <type 'int'>s, if you wrapped the x name in a long call in Python 2 (essentially forcing it to use long_pow as in Python 3), the speed gain disappears:

# <type 'int'>
(python2) ➜ python -m timeit "for x in range(1000):" " x**2"       
10000 loops, best of 3: 116 usec per loop
# <type 'long'> 
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)**2"
100 loops, best of 3: 2.12 msec per loop

Take note that, though the one snippet transforms the int to long while the other does not (as pointed out by @pydsinger), this cast is not the contributing force behind the slowdown. The implementation of long_pow is. (Time the statements solely with long(x) to see).

[…] it doesn’t happen outside of the loop. […] Any idea about that?

This is CPython’s peephole optimizer folding the constants for you. You get the same exact timings either case since there’s no actual computation to find the result of the exponentiation, only loading of values:

dis.dis(compile('4 ** 4', '', 'exec'))
  1           0 LOAD_CONST               2 (256)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE

Identical byte-code is generated for '4 ** 4.' with the only difference being that the LOAD_CONST loads the float 256.0 instead of the int 256:

dis.dis(compile('4 ** 4.', '', 'exec'))
  1           0 LOAD_CONST               3 (256.0)
              2 POP_TOP
              4 LOAD_CONST               2 (None)
              6 RETURN_VALUE

So the times are identical.


*All of the above apply solely for CPython, the reference implementation of Python. Other implementations might perform differently.


Answer 1

If we look at the bytecode, we can see that the expressions are purely identical. The only difference is the type of the constant that will be an argument of BINARY_POWER. So it’s most certainly due to the int being converted to a floating-point number down the line.

>>> def func(n):
...    return n**4
... 
>>> def func1(n):
...    return n**4.0
... 
>>> from dis import dis
>>> dis(func)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis(func1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4.0)
              6 BINARY_POWER
              7 RETURN_VALUE

Update: let’s take a look at Objects/abstract.c in the CPython source code:

PyObject *
PyNumber_Power(PyObject *v, PyObject *w, PyObject *z)
{
    return ternary_op(v, w, z, NB_SLOT(nb_power), "** or pow()");
}

PyNumber_Power calls ternary_op, which is too long to paste here, so here’s the link.

It calls the nb_power slot of x, passing y as an argument.

Finally, in float_pow() at line 686 of Objects/floatobject.c we see that arguments are converted to a C double right before the actual operation:

static PyObject *
float_pow(PyObject *v, PyObject *w, PyObject *z)
{
    double iv, iw, ix;
    int negate_result = 0;

    if ((PyObject *)z != Py_None) {
        PyErr_SetString(PyExc_TypeError, "pow() 3rd argument not "
            "allowed unless all arguments are integers");
        return NULL;
    }

    CONVERT_TO_DOUBLE(v, iv);
    CONVERT_TO_DOUBLE(w, iw);
    ...

Answer 2

Because one is exact, and the other is an approximation.

>>> 334453647687345435634784453567231654765 ** 4.0
1.2512490121794596e+154
>>> 334453647687345435634784453567231654765 ** 4
125124901217945966595797084130108863452053981325370920366144
719991392270482919860036990488994139314813986665699000071678
41534843695972182197917378267300625

Pickle incompatibility of numpy arrays between Python 2 and 3

Question: Pickle incompatibility of numpy arrays between Python 2 and 3

I am trying to load the MNIST dataset linked here in Python 3.2 using this program:

import pickle
import gzip
import numpy


with gzip.open('mnist.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)

Unfortunately, it gives me the error:

Traceback (most recent call last):
   File "mnist.py", line 7, in <module>
     train_set, valid_set, test_set = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

I then tried to decode the pickled file in Python 2.7, and re-encode it. So, I ran this program in Python 2.7:

import pickle
import gzip
import numpy


with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f)

    # Printing out the three objects reveals that they are
    # all pairs containing numpy arrays.

    with gzip.open('mnistx.pkl.gz', 'wb') as g:
        pickle.dump(
            (train_set, valid_set, test_set),
            g,
            protocol=2)  # I also tried protocol 0.

It ran without error, so I reran this program in Python 3.2:

import pickle
import gzip
import numpy

# note the filename change
with gzip.open('mnistx.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)

However, it gave me the same error as before. How do I get this to work?


This is a better approach for loading the MNIST dataset.


Answer 0

This seems like some sort of incompatibility. It’s trying to load a “binstring” object, which is assumed to be ASCII, while in this case it is binary data. Whether this is a bug in the Python 3 unpickler, or a “misuse” of the pickler by numpy, I don’t know.

Here is something of a workaround, but I don’t know how meaningful the data is at this point:

import pickle
import gzip
import numpy

with open('mnist.pkl', 'rb') as f:
    u = pickle._Unpickler(f)
    u.encoding = 'latin1'
    p = u.load()
    print(p)

Unpickling it in Python 2 and then repickling it is only going to create the same problem again, so you need to save it in another format.


Answer 1

If you are getting this error in Python 3, then it could be an incompatibility issue between Python 2 and Python 3; for me, the solution was to load with latin1 encoding:

pickle.load(file, encoding='latin1')

Answer 2

It appears to be an incompatibility issue between Python 2 and Python 3. I tried loading the MNIST dataset with

    train_set, valid_set, test_set = pickle.load(file, encoding='iso-8859-1')

and it worked for Python 3.5.2


Answer 3

It looks like there are some compatibility issues in pickle between 2.x and 3.x due to the move to unicode. Your file appears to have been pickled with Python 2.x, and decoding it in 3.x could be troublesome.

I’d suggest unpickling it with python 2.x and saving to a format that plays more nicely across the two versions you’re using.


Answer 4

I just stumbled upon this snippet. Hope this helps to clarify the compatibility issue.

import gzip
import pickle
import sys

with gzip.open('mnist.pkl.gz', 'rb') as f:
    if sys.version_info.major > 2:
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    else:
        train_set, valid_set, test_set = pickle.load(f)

Answer 5

Try:

l = list(pickle.load(f, encoding='bytes')) #if you are loading image data or 
l = list(pickle.load(f, encoding='latin1')) #if you are loading text data

From the documentation of pickle.load method:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2.

If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3.

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.


Answer 6

There is hickle, which is faster than pickle and easier. I tried to save and read my data with a pickle dump, but while reading there were a lot of problems; I wasted an hour and still didn’t find a solution, though I was working on my own data to create a chatbot.

vec_x and vec_y are numpy arrays:

import hickle as hkl

data = [vec_x, vec_y]
hkl.dump(data, 'new_data_file.hkl')

Then you just read it and perform the operations:

data2 = hkl.load( 'new_data_file.hkl' )

Reloading a module gives NameError: name 'reload' is not defined

Question: Reloading a module gives NameError: name 'reload' is not defined

I’m trying to reload a module I have already imported in Python 3. I know that you only need to import once and executing the import command again won’t do anything.

Executing reload(foo) is giving this error:

Traceback (most recent call last):
    File "(stdin)", line 1, in (module)
    ...
NameError: name 'reload' is not defined

What does the error mean?


Answer 0

reload is a builtin in Python 2, but not in Python 3, so the error you’re seeing is expected.

If you truly must reload a module in Python 3, you should use either importlib.reload(module) (Python 3.4+) or imp.reload(module) (Python 3.0–3.3; imp is deprecated in favour of importlib).


Answer 1

对于> = Python3.4:

import importlib
importlib.reload(module)

对于<= Python3.3:

import imp
imp.reload(module)

对于Python2.x:

使用内置reload()功能。

reload(module)

For >= Python3.4:

import importlib
importlib.reload(module)

For <= Python3.3:

import imp
imp.reload(module)

For Python2.x:

Use the built-in reload() function.

reload(module)
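
For example (mymodule is a hypothetical module on your path):

import importlib
import mymodule

# ... edit mymodule.py ...
mymodule = importlib.reload(mymodule)  # re-executes the module and returns it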

Answer 2

import imp
imp.reload(script4)

Answer 3

To expand on the previously written answers, if you want a single solution which will work across Python versions 2 and 3, you can use the following:

try:
    reload  # Python 2.7
except NameError:
    try:
        from importlib import reload  # Python 3.4+
    except ImportError:
        from imp import reload  # Python 3.0 - 3.3

Answer 4

I recommend using the following snippet as it works in all python versions (requires six):

from six.moves import reload_module
reload_module(module)

Answer 5

For python2 and python3 compatibility, you can use:

# Python 2 and 3
from imp import reload
reload(mymodule)

Answer 6

If you don’t want to use external libs, then one solution is to recreate the reload method from python 2 for python 3 as below. Use this in the top of the module (assumes python 3.4+).

import sys
if sys.version_info.major >= 3:
    def reload(MODULE):
        import importlib
        return importlib.reload(MODULE)  # Python 2's reload also returned the module

BTW, reload is very much required if you use Python files as config files and want to avoid restarts of the application.


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Question: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools

An error occurred when compiling “process.py” on the above site.

python tools/process.py --input_dir data --operation resize --output_dir data2/resize
data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):

File "tools/process.py", line 235, in <module>
  main()
File "tools/process.py", line 167, in main
  src = load(src_path)
File "tools/process.py", line 113, in load
  contents = open(path).read()
      File"/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
  (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode     byte 0xff in position 0: invalid start byte

What is the cause of the error? Python’s version is 3.5.2.


Answer 0

Python tries to convert a byte-array (a bytes which it assumes to be a utf-8-encoded string) to a unicode string (str). This process of course is a decoding according to utf-8 rules. When it tries this, it encounters a byte sequence which is not allowed in utf-8-encoded strings (namely this 0xff at position 0).

Since you did not provide any code we could look at, we only could guess on the rest.

From the stack trace we can assume that the triggering action was the reading from a file (contents = open(path).read()). I propose to recode this in a fashion like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier of open() states that the file shall be treated as binary, so contents will remain bytes. No decoding attempt will happen this way.


Answer 1

Use this solution it will strip out (ignore) the characters and return the string without them. Only use this if your need is to strip them not convert them.

with open(path, encoding="utf8", errors='ignore') as f:

With errors='ignore' you’ll just lose some characters. But if you don’t care about them, because they are extra characters originating from the bad formatting and programming of clients connecting to a socket server (as in my case), then it’s an easy, direct solution. (reference)


Answer 2

Had an issue similar to this, Ended up using UTF-16 to decode. my code is below.

with open(path_to_file,'rb') as f:
    contents = f.read()
contents = contents.rstrip(b"\n").decode("utf-16")
contents = contents.split("\r\n")

This reads the file contents in as raw bytes, decodes them as UTF-16, and then splits the result into lines.


Answer 3

Use encoding format ISO-8859-1 to solve the issue.
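
For instance (a minimal sketch; the filename is hypothetical): ISO-8859-1 maps every possible byte value to a character, so it can never raise a decode error — though the result is only meaningful if the file really is in that encoding:

path = 'somefile.txt'  # hypothetical: whatever file failed to decode
with open(path, encoding='ISO-8859-1') as f:
    contents = f.read()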


Answer 4

I came across this thread when suffering the same error; after doing some research I can confirm this is an error that happens when you try to decode a UTF-16 file with UTF-8.

With UTF-16 the first character (2 bytes in UTF-16) is a Byte Order Mark (BOM), which is used as a decoding hint and doesn’t appear as a character in the decoded string. This means the first byte will be either FE or FF, and the second, the other.

Heavily edited after I found out the real answer
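
A small sketch of using the BOM to pick the right codec before reading (the filename is hypothetical):

path = 'somefile.txt'
with open(path, 'rb') as f:
    head = f.read(2)

# UTF-16 files begin with FF FE (little-endian) or FE FF (big-endian)
encoding = 'utf-16' if head in (b'\xff\xfe', b'\xfe\xff') else 'utf-8'
with open(path, encoding=encoding) as f:
    text = f.read()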


Answer 5

Use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')

b64decode() already returns raw bytes; tacking .decode('utf-8') onto it fails whenever the decoded payload is binary data rather than UTF-8 text.

Answer 6

If you are on a Mac, check for a hidden file, .DS_Store. After removing that file, my program worked.


Answer 7

Check the path of the file to be read. My code kept giving me errors until I changed the path name to the present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Answer 8

If you are receiving data from a serial port, make sure you are using the right baud rate (and the other configs): decoding with utf-8 while the serial config is wrong will generate this same error:

UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xff in position 0: invalid start byte

To check your serial port config on Linux, use: stty -F /dev/ttyUSBX -a


Answer 9

It simply means that one chose the wrong encoding to read the file.

On Mac, use file -I file.txt to find the correct encoding. On Linux, use file -i file.txt.


Answer 10

I had the same issue when processing a file generated on Linux. It turned out to be related to files containing question marks.


Answer 11

I had a similar problem.

Solved it by:

import io

with io.open(filename, 'r', encoding='utf-8') as fn:
  lines = fn.readlines()

However, I had another problem. Some html files (in my case) were not utf-8, so I received a similar error. When I excluded those html files, everything worked smoothly.

So, besides fixing the code, also check the files you are reading from; maybe there is an incompatibility there indeed.


Answer 12

If possible, open the file in a text editor and try to change the encoding to UTF-8. Otherwise do it programmatically at the OS level.


Answer 13

I had a similar problem. I tried to run an example in tensorflow/models/object_detection and met the same message. Try changing Python 3 to Python 2.


What is sys.maxint in Python 3?

Question: What is sys.maxint in Python 3?

I’ve been trying to find out how to represent a maximum integer, and I’ve read to use "sys.maxint". However, in Python 3 when I call it I get:

AttributeError: module 'object' has no attribute 'maxint'

Answer 0

The sys.maxint constant was removed, since there is no longer a limit to the value of integers. However, sys.maxsize can be used as an integer larger than any practical list or string index. It conforms to the implementation’s “natural” integer size and is typically the same as sys.maxint in previous releases on the same platform (assuming the same build options).

http://docs.python.org/3.1/whatsnew/3.0.html#integers
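
For example, on a typical 64-bit build:

>>> import sys
>>> sys.maxsize
9223372036854775807
>>> sys.maxsize + 1  # no overflow; Python 3 ints are arbitrary precision
9223372036854775808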


Answer 1

As pointed out by others, Python 3’s int does not have a maximum size, but if you just need something that’s guaranteed to be higher than any other int value, then you can use the float value for Infinity, which you can get with float("inf").


Answer 2

Python 3 ints do not have a maximum.

If your purpose is to determine the maximum size of an int in C when compiled the same way Python was, you can use the struct module to find out:

>>> import struct
>>> platform_c_maxint = 2 ** (struct.Struct('i').size * 8 - 1) - 1

If you are curious about the internal implementation details of Python 3 int objects, Look at sys.int_info for bits per digit and digit size details. No normal program should care about these.
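
For reference, on a typical 64-bit CPython build this looks like the following (newer versions add more fields):

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)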


Answer 3

If you are looking for a number that is bigger than all others:

Method 1:

float('inf')

Method 2:

import sys
max = sys.maxsize

If you are looking for a number that is smaller than all others:

Method 1:

float('-inf')

Method 2:

import sys
min = -sys.maxsize - 1

Method 1 works in both Python2 and Python3. Method 2 works in Python3. I have not tried Method 2 in Python2.


Answer 4

Python 3.0 doesn’t have sys.maxint any more, since Python 3’s ints are of arbitrary length. Instead of sys.maxint it has sys.maxsize: the maximum size of a positively sized size_t, aka Py_ssize_t.


Answer 5

An alternative (available since Python 3.5) is

import math

... math.inf ...

PyLint message: logging-format-interpolation

Question: PyLint message: logging-format-interpolation

For the following code:

logger.debug('message: {}'.format('test'))

pylint produces the following warning:

logging-format-interpolation (W1202):

Use % formatting in logging functions and pass the % parameters as arguments. Used when a logging statement has a call form of “logging.<logging method>(format_string.format(format_args…))”. Such calls should use % formatting instead, but leave interpolation to the logging function by passing the parameters as arguments.

I know I can turn off this warning, but I’d like to understand it. I assumed using format() is the preferred way to print out statements in Python 3. Why is this not true for logger statements?


Answer 0

It is not true for logger statements, because they rely on the former “%”-style format string to provide lazy interpolation of the string, using the extra arguments given to the logger call. For instance, instead of doing:

logger.error('oops caused by %s' % exc)

you should do

logger.error('oops caused by %s', exc)

so the string will only be interpolated if the message is actually emitted.

You can’t benefit from this functionality when using .format().


Per the Optimization section of the logging docs:

Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event.
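
A small sketch of what deferred formatting buys you; the Expensive class is a stand-in for any argument that is costly to format:

import logging

logging.basicConfig(level=logging.INFO)  # DEBUG messages will be discarded
logger = logging.getLogger(__name__)

class Expensive:
    def __str__(self):
        print('formatting happened')
        return 'expensive value'

logger.debug('lazy: %s', Expensive())          # discarded; __str__ never runs
logger.debug('eager: {}'.format(Expensive()))  # also discarded, but __str__ runs anyway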


Answer 1

Maybe these differences between pylint versions can help you.

The following description is not the answer to your question, but it can help people.

If you want to use f-strings (literal string interpolation) for logging, then you can disable the warning from the .pylintrc file with disable=logging-fstring-interpolation; see the related issue and comment.

Also you can disable logging-format-interpolation.


For pylint 2.4:

There are 3 options for logging style in the .pylintrc file: old, new, fstr

fstr option added in 2.4 and removed in 2.5

Description from .pylintrc file (v2.4):

[LOGGING]

# Format style used to check logging format string. `old` means using %
# formatting, `new` is for `{}` formatting,and `fstr` is for f-strings.
logging-format-style=old

for old (logging-format-style=old):

foo = "bar"
self.logger.info("foo: %s", foo)

for new (logging-format-style=new):

foo = "bar"
self.logger.info("foo: {}", foo)
# OR
self.logger.info("foo: {foo}", foo=foo)

Note: you cannot use .format() even if you select the new option.

pylint still gives the same warning for this code:

self.logger.info("foo: {}".format(foo))  # W1202
# OR
self.logger.info("foo: {foo}".format(foo=foo))  # W1202

for fstr (logging-format-style=fstr):

foo = "bar"
self.logger.info(f"foo: {foo}")

Personally, I prefer fstr option because of PEP-0498.


Answer 2

In my experience a more compelling reason than optimization (for most use cases) for the lazy interpolation is that it plays nicely with log aggregators like Sentry.

Consider a ‘user logged in’ log message. If you interpolate the user into the format string, you have as many distinct log messages as there are users. If you use lazy interpolation like this, the log aggregator can more reasonably interpret this as the same log message with a bunch of different instances.
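
As an illustration (the logger setup and usernames are hypothetical):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

for user in ('alice', 'bob'):
    logger.info('user %s logged in', user)         # one template, many args: easy to group
    logger.info('user {} logged in'.format(user))  # a distinct message string per user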


Python dict: how to create a key or append an element to a key?

Question: Python dict: how to create a key or append an element to a key?

I have an empty dictionary named dict_x. It is to have keys whose values are lists.

From a separate iteration, I obtain a key (e.g. key_123) and an item (a tuple) to place in the list that is dict_x’s value for key_123.

If this key already exists, I want to append this item. If this key does not exist, I want to create it with an empty list and then append to it, or just create it with a tuple in it.

In the future, when this key comes up again, since it exists, I want the value to be appended again.

My code consists of this:

Get key and value.

See if the key does NOT exist in dict_x.

And if not, create it: dict_x[key] = []

Afterwards: dict_x[key].append(value)

Is this the way to do it? Shall I try to use try/except blocks?


Answer 0


Use dict.setdefault():

dic.setdefault(key,[]).append(value)

help(dict.setdefault):

    setdefault(...)
        D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
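
For instance, with names along the lines of the question (the key and the tuple values are invented for the demo):

dict_x = {}
dict_x.setdefault('key_123', []).append(('a', 1))  # key missing: stores [] first
dict_x.setdefault('key_123', []).append(('b', 2))  # key present: appends to it
print(dict_x)  # {'key_123': [('a', 1), ('b', 2)]}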

Answer 1


Here are the various ways to do this so you can compare how they look and choose what you like. I’ve ordered them in the way I think is most “pythonic”, and commented on the pros and cons that might not be obvious at first glance:

Using collections.defaultdict:

import collections
dict_x = collections.defaultdict(list)

...

dict_x[key].append(value)

Pros: Probably best performance. Cons: Not available in Python 2.4.x.

Using dict().setdefault():

dict_x = {}

...

dict_x.setdefault(key, []).append(value)

Cons: Inefficient creation of unused list()s.

Using try ... except:

dict_x = {}

...

try:
    values = dict_x[key]
except KeyError:
    values = dict_x[key] = []
values.append(value)

Or:

try:
    dict_x[key].append(value)
except KeyError:
    dict_x[key] = [value]

Answer 2


You can use a defaultdict for this.

from collections import defaultdict
d = defaultdict(list)
d['key'].append('mykey')

This is slightly more efficient than setdefault since you don’t end up creating new lists that you don’t end up using. Every call to setdefault is going to create a new list, even if the item already exists in the dictionary.
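
A small sketch of what that means in practice (the key name is invented):

d = {}
d.setdefault('k', []).append(1)  # 'k' is missing, so this [] gets stored
d.setdefault('k', []).append(2)  # 'k' exists; this [] is built, then thrown away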


Answer 3


You can use defaultdict from collections.

An example from doc:

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

# d now maps 'yellow' -> [1, 3], 'blue' -> [2, 4], 'red' -> [1]

Answer 4

# get returns the existing list (or [] if 'key' is missing); + builds a new list
dictionary['key'] = dictionary.get('key', []) + list_to_append

How to remove specific substrings from a set of strings in Python?

Question: How to remove specific substrings from a set of strings in Python?

I have a set of strings set1, and all the strings in set1 contain one of two specific substrings which I don’t need and want to remove.
Sample Input: set1={'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}
So basically I want the .good and .bad substrings removed from all the strings.
What I tried:

for x in set1:
    x.replace('.good','')
    x.replace('.bad','')

But this doesn’t seem to work at all. There is absolutely no change in the output and it is the same as the input. I tried using for x in list(set1) instead of the original one but that doesn’t change anything.


Answer 0


Strings are immutable. string.replace (python 2.x) or str.replace (python 3.x) creates a new string. This is stated in the documentation:

Return a copy of string s with all occurrences of substring old replaced by new. …

This means you have to re-allocate the set or re-populate it (re-allocating is easier with set comprehension):

new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
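
For example, with the sample input from the question (note that 'Pear.bad' and 'Pear.good' both collapse to 'Pear', so the new set has five elements, not six):

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
print(new_set)  # {'Apple', 'Orange', 'Pear', 'Banana', 'Potato'} (set order varies)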

Answer 1


>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'

.replace doesn’t change the string; it returns a copy of the string with the replacement made. You can’t change the string directly because strings are immutable.

You need to take the return values from x.replace and put them in a new set.


Answer 2


All you need is a bit of black magic!

>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']

Answer 3


You could do this:

import re

set1 = {'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}

for x in set1:
    x = re.sub(r'\.good$', '', x)  # strip a trailing .good
    x = re.sub(r'\.bad$', '', x)   # strip a trailing .bad
    print(x)

Answer 4


I did a test (though not with your example): with a set, the data does not come back ordered or complete, since duplicates are collapsed:

>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}

I verified that this works with a list:

>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']

or

>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
...     newind.append(x.replace('p',''))
>>> newind
['5', '1', '8', '4', '2', '8']

Answer 5


When there are multiple substrings to remove, one simple and effective option is to use re.sub with a compiled pattern that involves joining all the substrings-to-remove using the regex OR (|) pipe.

import re

to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']

p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
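
A quick check on why the re.escape call matters (the 'Apple_good' input is invented to show the pitfall): without escaping, the . in .good is a regex wildcard that matches any character.

import re

to_remove = ['.good', '.bad']
unescaped = re.compile('|'.join(to_remove))                # '.' acts as a wildcard
escaped = re.compile('|'.join(map(re.escape, to_remove)))  # '.' matched literally

print(unescaped.sub('', 'Apple_good'))  # 'Apple' -- '_good' was wrongly stripped
print(escaped.sub('', 'Apple_good'))    # 'Apple_good' -- left intact, as intended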

Answer 6


If it's a list

I was doing something similar for a list of strings: if you want to remove all lines that contain a certain substring, you can do this

import re

def RemoveInList(sub, LinSplitUnOr):
    # indices of the lines that match the pattern
    indices = [i for i, x in enumerate(LinSplitUnOr) if re.search(sub, x)]
    # keep only the lines whose index was not flagged
    A = [i for j, i in enumerate(LinSplitUnOr) if j not in indices]
    return A

where sub is a pattern that you do not wish to have in the list of lines LinSplitUnOr

For example:

A=['Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad']
sub = 'good'
A=RemoveInList(sub,A)

Then A will be ['Pear.bad', 'Banana.bad', 'Potato.bad']


Answer 7


If you want to delete something from a list, you can use this approach (the sub method is case sensitive):

import re

new_list = []
old_list = ["ABCDEFG", "HKLMNOP", "QRSTUV"]

for data in old_list:
    new_list.append(re.sub("AB|M|TV", " ", data))

print(new_list)  # output: [' CDEFG', 'HKL NOP', 'QRSTUV']