Standard way to embed version into a Python package?

Is there a standard way to associate a version string with a Python package in such a way that I could do the following?

import foo
print(foo.version)

I would imagine there’s some way to retrieve that data without any extra hardcoding, since the major/minor version strings are already specified in setup.py. An alternative solution that I found was to have import __version__ in my foo/__init__.py and then have __version__.py generated by setup.py.


Answer 0

Not directly an answer to your question, but you should consider naming it __version__, not version.

This is a quasi-standard: many modules in the standard library use __version__, and so do lots of third-party modules.

Usually, __version__ is a string, but sometimes it’s also a float or tuple.

Edit: as mentioned by S.Lott (Thank you!), PEP 8 says it explicitly:

Module Level Dunder Names

Module level “dunders” (i.e. names with two leading and two trailing underscores) such as __all__, __author__, __version__, etc. should be placed after the module docstring but before any import statements except from __future__ imports.

You should also make sure that the version number conforms to the format described in PEP 440 (PEP 386 is a previous version of this standard).
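
To make the PEP 440 point concrete, here is a small sketch that checks whether a string is a canonical public version. The regular expression is adapted from the PEP's appendix on parsing version strings; it does not cover local version labels (the "+something" suffix) or the looser spellings that PEP 440 normalizes. For full parsing and comparison, the third-party packaging library is the usual choice.

```python
import re

# Canonical public version form, adapted from the regex in PEP 440's appendix.
_CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?"                 # optional epoch, e.g. "1!"
    r"(0|[1-9][0-9]*)"                  # first release component
    r"(\.(0|[1-9][0-9]*))*"             # further release components
    r"((a|b|rc)(0|[1-9][0-9]*))?"       # optional pre-release
    r"(\.post(0|[1-9][0-9]*))?"         # optional post-release
    r"(\.dev(0|[1-9][0-9]*))?$"         # optional development release
)


def is_canonical(version):
    """Return True if version is in canonical PEP 440 public form."""
    return _CANONICAL.match(version) is not None


print(is_canonical("1.2.3"))     # True
print(is_canonical("1.0rc1"))    # True
print(is_canonical("v1.0"))      # False
print(is_canonical("1.0-beta"))  # False
```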


Answer 1

I use a single _version.py file as the “one canonical place” to store version information:

  1. It provides a __version__ attribute.

  2. It provides the standard metadata version. Therefore it will be detected by pkg_resources or other tools that parse the package metadata (EGG-INFO and/or PKG-INFO, PEP 0345).

  3. It doesn’t import your package (or anything else) when building your package, which can cause problems in some situations. (See the comments below about what problems this can cause.)

  4. There is only one place that the version number is written down, so there is only one place to change it when the version number changes, and there is less chance of inconsistent versions.

Here is how it works: the “one canonical place” to store the version number is a .py file named “_version.py”, which lives in your Python package, for example in myniftyapp/_version.py. This file is a Python module, but your setup.py doesn’t import it! (That would defeat feature 3.) Instead your setup.py knows that the contents of this file are very simple, something like:

__version__ = "3.6.5"

And so your setup.py opens the file and parses it, with code like:

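import re

VERSIONFILE = "myniftyapp/_version.py"

# Read the file as text instead of importing it (importing would defeat feature 3).
with open(VERSIONFILE, "rt") as fp:
    verstrline = fp.read()

# Match a line such as: __version__ = "3.6.5"
VSRE = r"^__version__ = ['\"]([^'\"]*)['\"]"
mo = re.search(VSRE, verstrline, re.M)
if mo:
    verstr = mo.group(1)
else:
    raise RuntimeError("Unable to find version string in %s." % (VERSIONFILE,))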

Then your setup.py passes that string as the value of the “version” argument to setup(), thus satisfying feature 2.

To satisfy feature 1, you can have your package (at run-time, not at setup time!) import the _version file from myniftyapp/__init__.py like this:

from ._version import __version__

Here is an example of this technique that I’ve been using for years.

The code in that example is a bit more complicated, but the simplified example that I wrote into this comment should be a complete implementation.

Here is example code of importing the version.

If you see anything wrong with this approach, please let me know.


Answer 2

Rewritten 2017-05

After 13+ years of writing Python code and managing various packages, I came to the conclusion that DIY is maybe not the best approach.

I started using the pbr package for dealing with versioning in my packages. If you are using git as your SCM, this will fit into your workflow like magic, saving you weeks of work (you will be surprised at how complex the issue can be).

As of today, pbr is the 11th most used Python package, and reaching this level didn’t involve any dirty tricks. It came from doing one thing: fixing a common packaging problem in a very simple way.

pbr can take on more of the package maintenance burden and is not limited to versioning, but it does not force you to adopt all of its features.

So to give you an idea of what adopting pbr in a single commit looks like, have a look at switching packaging to pbr.

You will probably have observed that the version is not stored in the repository at all. PBR detects it from Git branches and tags.

No need to worry about what happens when you do not have a git repository because pbr does “compile” and cache the version when you package or install the applications, so there is no runtime dependency on git.

Old solution

Here is the best solution I’ve seen so far and it also explains why:

Inside yourpackage/version.py:

# Store the version here so:
# 1) we don't load dependencies by storing it in __init__.py
# 2) we can import it in setup.py for the same reason
# 3) we can import it into your module
__version__ = '0.12'

Inside yourpackage/__init__.py:

from .version import __version__

Inside setup.py:

exec(open('yourpackage/version.py').read())
setup(
    ...
    version=__version__,
    ...

If you know another approach that seems to be better let me know.
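
One variation on the exec() line above, sketched under the assumption that version.py contains nothing but the assignment: the standard library's runpy.run_path executes the file and returns its globals as a dictionary, so setup.py does not have to rely on exec() injecting __version__ into its own namespace. The helper name read_version and the temporary file are illustrative only and stand in for yourpackage/version.py.

```python
import os
import runpy
import tempfile


def read_version(path):
    """Execute a version file in isolation and return its __version__."""
    namespace = runpy.run_path(path)
    return namespace["__version__"]


# Demonstration with a throwaway file standing in for yourpackage/version.py.
with tempfile.TemporaryDirectory() as tmp:
    version_file = os.path.join(tmp, "version.py")
    with open(version_file, "w") as fp:
        fp.write("__version__ = '0.12'\n")
    demo_version = read_version(version_file)

print(demo_version)
```

In a real setup.py the call would look like version=read_version('yourpackage/version.py').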


Answer 3

Per the deferred PEP 396 (Module Version Numbers), there is a proposed way to do this. It describes, with rationale, an (admittedly optional) standard for modules to follow. Here’s a snippet:

3) When a module (or package) includes a version number, the version SHOULD be available in the __version__ attribute.

4) For modules which live inside a namespace package, the module SHOULD include the __version__ attribute. The namespace package itself SHOULD NOT include its own __version__ attribute.

5) The __version__ attribute’s value SHOULD be a string.


Answer 4

Though this is probably far too late, there is a slightly simpler alternative to the previous answer:

__version_info__ = ('1', '2', '3')
__version__ = '.'.join(__version_info__)

(And it would be fairly simple to convert auto-incrementing portions of version numbers to a string using str().)

Of course, from what I’ve seen, people tend to use something like the previously-mentioned version when using __version_info__, and as such store it as a tuple of ints; however, I don’t quite see the point in doing so, as I doubt there are situations where you would perform mathematical operations such as addition and subtraction on portions of version numbers for any purpose besides curiosity or auto-incrementation (and even then, int() and str() can be used fairly easily). (On the other hand, there is the possibility of someone else’s code expecting a numerical tuple rather than a string tuple and thus failing.)

This is, of course, my own view, and I would gladly welcome others’ input on using a numerical tuple.


As shezi reminded me, (lexical) comparisons of number strings do not necessarily have the same result as direct numerical comparisons; leading zeroes would be required to provide for that. So in the end, storing __version_info__ (or whatever it would be called) as a tuple of integer values would allow for more efficient version comparisons.
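
The difference between the two kinds of comparison is easy to see in a short example. The version values and the helper to_int_tuple are made up for illustration.

```python
# String components compare character by character, so '10' sorts before '9'.
str_old = ("1", "9", "0")
str_new = ("1", "10", "0")
print(str_new > str_old)   # False

# Integer components compare numerically, which is what a version check needs.
int_old = (1, 9, 0)
int_new = (1, 10, 0)
print(int_new > int_old)   # True


def to_int_tuple(version_info):
    """Convert a tuple of numeric strings into a tuple of ints."""
    return tuple(int(part) for part in version_info)


print(to_int_tuple(str_new) > to_int_tuple(str_old))   # True
```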


Answer 5

Many of the solutions here ignore git version tags, which still means you have to track the version in multiple places (bad). I approached this with the following goals:

  • Derive all python version references from a tag in the git repo
  • Automate git tag/push and setup.py upload steps with a single command that takes no inputs.

How it works:

  1. From a make release command, the last tagged version in the git repo is found and incremented. The tag is pushed back to origin.

  2. The Makefile stores the version in src/_version.py where it will be read by setup.py and also included in the release. Do not check _version.py into source control!

  3. The setup.py command reads the new version string from package.__version__.

Details:

Makefile

# remove optional 'v' and trailing hash "v1.0-N-HASH" -> "v1.0-N"
git_describe_ver = $(shell git describe --tags | sed -E -e 's/^v//' -e 's/(.*)-.*/\1/')
git_tag_ver      = $(shell git describe --abbrev=0)
next_patch_ver = $(shell python versionbump.py --patch $(call git_tag_ver))
next_minor_ver = $(shell python versionbump.py --minor $(call git_tag_ver))
next_major_ver = $(shell python versionbump.py --major $(call git_tag_ver))

.PHONY: ${MODULE}/_version.py
${MODULE}/_version.py:
    echo '__version__ = "$(call git_describe_ver)"' > $@

.PHONY: release
release: test lint mypy
    git tag -a $(call next_patch_ver)
    $(MAKE) ${MODULE}/_version.py
    python setup.py check sdist upload # (legacy "upload" method)
    # twine upload dist/*  (preferred method)
    git push origin master --tags

The release target always increments the third version digit, but you can use next_minor_ver or next_major_ver to increment the other digits. The commands rely on the versionbump.py script, which is checked into the root of the repo.

versionbump.py

"""An auto-increment tool for version strings."""

import sys
import unittest

import click
from click.testing import CliRunner  # type: ignore

__version__ = '0.1'

MIN_DIGITS = 2
MAX_DIGITS = 3


@click.command()
@click.argument('version')
@click.option('--major', 'bump_idx', flag_value=0, help='Increment major number.')
@click.option('--minor', 'bump_idx', flag_value=1, help='Increment minor number.')
@click.option('--patch', 'bump_idx', flag_value=2, default=True, help='Increment patch number.')
def cli(version: str, bump_idx: int) -> None:
    """Bumps a MAJOR.MINOR.PATCH version string at the specified index location or 'patch' digit. An
    optional 'v' prefix is allowed and will be included in the output if found."""
    prefix = version[0] if version[0].isalpha() else ''
    digits = version.lower().lstrip('v').split('.')

    if len(digits) > MAX_DIGITS:
        click.secho('ERROR: Too many digits', fg='red', err=True)
        sys.exit(1)

    digits = (digits + ['0'] * MAX_DIGITS)[:MAX_DIGITS]  # Extend total digits to max.
    digits[bump_idx] = str(int(digits[bump_idx]) + 1)  # Increment the desired digit.

    # Zero rightmost digits after bump position.
    for i in range(bump_idx + 1, MAX_DIGITS):
        digits[i] = '0'
    digits = digits[:max(MIN_DIGITS, bump_idx + 1)]  # Trim rightmost digits.
    click.echo(prefix + '.'.join(digits), nl=False)


if __name__ == '__main__':
    cli()  # pylint: disable=no-value-for-parameter

This does the heavy lifting of processing and incrementing the version number from git.
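
To see the bump rules without installing click, the same logic can be sketched as a plain function. This is an illustrative rewrite, not part of the script above; the function name bump and the ValueError are assumptions, while the padding, zeroing and trimming steps mirror the code shown.

```python
MIN_DIGITS = 2
MAX_DIGITS = 3


def bump(version, bump_idx=2):
    """Increment the component at bump_idx (0=major, 1=minor, 2=patch)."""
    prefix = version[0] if version[0].isalpha() else ""
    digits = version.lower().lstrip("v").split(".")
    if len(digits) > MAX_DIGITS:
        raise ValueError("Too many digits")

    digits = (digits + ["0"] * MAX_DIGITS)[:MAX_DIGITS]  # pad to three components
    digits[bump_idx] = str(int(digits[bump_idx]) + 1)    # increment the chosen one
    for i in range(bump_idx + 1, MAX_DIGITS):            # reset lower components
        digits[i] = "0"
    digits = digits[:max(MIN_DIGITS, bump_idx + 1)]      # drop trailing components
    return prefix + ".".join(digits)


print(bump("v1.2.3"))     # v1.2.4
print(bump("1.2.3", 1))   # 1.3
print(bump("1.2.3", 0))   # 2.0
```

Note that a minor or major bump drops the patch component entirely, so 1.2.3 becomes 1.3 rather than 1.3.0.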

__init__.py

The my_module/_version.py file is imported into my_module/__init__.py. Put any static install config here that you want distributed with your module.

from ._version import __version__
__author__ = ''
__email__ = ''

setup.py

The last step is to read the version info from the my_module module.

from setuptools import setup, find_packages

MODULE = "my_module"  # the package directory that contains _version.py
pkg_vars = {}

with open("{}/_version.py".format(MODULE)) as fp:
    exec(fp.read(), pkg_vars)

setup(
    version=pkg_vars['__version__'],
    ...
    ...
)

Of course, for all of this to work you’ll have to have at least one version tag in your repo to start.

git tag -a v0.0.1

Answer 6

I use a JSON file in the package dir. This fits Zooko’s requirements.

Inside pkg_dir/pkg_info.json:

{"version": "0.1.0"}

Inside setup.py:

from setuptools import setup  # distutils was removed from the standard library in Python 3.12
import json

with open('pkg_dir/pkg_info.json') as fp:
    _info = json.load(fp)

setup(
    version=_info['version'],
    ...
    )

Inside pkg_dir/__init__.py:

import json
from os.path import dirname, join

with open(join(dirname(__file__), 'pkg_info.json')) as fp:
    _info = json.load(fp)

__version__ = _info['version']

I also put other information in pkg_info.json, like author. I like to use JSON because I can automate management of metadata.


Answer 7

Also worth noting: just as __version__ is a semi-standard in Python, so is __version_info__, which is a tuple. In simple cases you can just do something like:

__version__ = '1.2.3'
__version_info__ = tuple(int(num) for num in __version__.split('.'))

…and you can get the __version__ string from a file, or whatever.
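
The one-liner above only works when every dot-separated part is a plain number; a version such as '1.2.3rc1' makes int() raise ValueError. A hedged sketch of a more forgiving variant keeps just the leading numeric components. The function name version_info_from is made up for illustration.

```python
import re


def version_info_from(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no leading numeric version in %r" % (version,))
    return tuple(int(num) for num in match.group(0).split("."))


print(version_info_from("1.2.3"))      # (1, 2, 3)
print(version_info_from("1.2.3rc1"))   # (1, 2, 3)
```

Dropping the suffix means a release candidate and the final release compare as equal, so this is only suitable for coarse checks.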


Answer 8

There doesn’t seem to be a standard way to embed a version string in a Python package. Most packages I’ve seen use some variant of your solution, i.e. either:

  1. Embed the version in setup.py and have setup.py generate a module (e.g. version.py) containing only version info, that’s imported by your package, or

  2. The reverse: put the version info in your package itself, and import that to set the version in setup.py


Answer 9

arrow handles it in an interesting way.

Now (since 2e5031b)

In arrow/__init__.py:

__version__ = 'x.y.z'

In setup.py:

from arrow import __version__

setup(
    name='arrow',
    version=__version__,
    # [...]
)

Before

In arrow/__init__.py:

__version__ = 'x.y.z'
VERSION = __version__

In setup.py:

import re

# read() and fpath() are helper functions that are not shown in this excerpt.
def grep(attrname):
    pattern = r"{0}\W*=\W*'([^']+)'".format(attrname)
    strval, = re.findall(pattern, file_text)
    return strval

file_text = read(fpath('arrow/__init__.py'))

setup(
    name='arrow',
    version=grep('__version__'),
    # [...]
)

Answer 10

I also saw another style:

>>> django.VERSION
(1, 1, 0, 'final', 0)

Answer 11

Using setuptools and pbr

There is no standard way to manage versions, but the standard way to manage your packages is setuptools.

The best solution I’ve found overall for managing versions is to use setuptools with the pbr extension. This is now my standard way of managing versions.

Setting up your project for full packaging may be overkill for simple projects, but if you need to manage versions, you are probably at the right level to just set everything up. Doing so also makes your package releasable on PyPI so everyone can download and use it with pip.

PBR moves most metadata out of the setup.py tools and into a setup.cfg file that is then used as a source for most metadata, which can include version. This allows the metadata to be packaged into an executable using something like pyinstaller if needed (if so, you will probably need this info), and separates the metadata from the other package management/setup scripts. You can directly update the version string in setup.cfg manually, and it will be pulled into the *.egg-info folder when building your package releases. Your scripts can then access the version from the metadata using various methods (these processes are outlined in sections below).

When using Git for VCS/SCM, this setup is even better, as it will pull in a lot of the metadata from Git so that your repo can be your primary source of truth for some of the metadata, including version, authors, changelogs, etc. For version specifically, it will create a version string for the current commit based on git tags in the repo.

Because PBR pulls the version, authors, changelog and other info directly from your git repo, some of the metadata in setup.cfg can be left out and is generated automatically whenever a distribution is created for your package (using setup.py).



Get the current version in real-time

setuptools will pull the latest info in real-time using setup.py:

python setup.py --version

This will pull the latest version either from the setup.cfg file, or from the git repo, based on the latest commit that was made and tags that exist in the repo. This command doesn’t update the version in a distribution though.



Updating the version metadata

When you create a distribution with setup.py (for example, python setup.py sdist), all the current info will be extracted and stored in the distribution. This essentially runs the setup.py --version command and then stores that version info in the package.egg-info folder, in a set of files that store distribution metadata.

Note on process to update version meta-data:

If you are not using pbr to pull version data from git, then just update your setup.cfg directly with new version info (easy enough, but make sure this is a standard part of your release process).

If you are using git, and you don’t need to create a source or binary distribution (using python setup.py sdist or one of the python setup.py bdist_xxx commands) the simplest way to update the git repo info into your <mypackage>.egg-info metadata folder is to just run the python setup.py install command. This will run all the PBR functions related to pulling metadata from the git repo and update your local .egg-info folder, install script executables for any entry-points you have defined, and other functions you can see from the output when you run this command.

Note that the .egg-info folder is generally excluded from the git repo itself by standard Python .gitignore files (such as those from Gitignore.IO), as it can be generated from your source. If it is excluded, make sure you have a standard “release process” to get the metadata updated locally before release, and any package you upload to PyPI or otherwise distribute must include this data to have the correct version. If you want the Git repo to contain this info, you can exclude specific files from being ignored (e.g. add !*.egg-info/PKG-INFO to .gitignore).



Accessing the version from a script

You can access the metadata from the current build within Python scripts in the package itself. For the version, for example, I have found several ways to do this so far:

## This one is new in the standard library as of Python 3.8.0 and should become the standard
from importlib.metadata import version

v0 = version("mypackage")
print('v0 {}'.format(v0))

## I don't like this one because the version method is hidden
import pkg_resources  # part of setuptools

v1 = pkg_resources.require("mypackage")[0].version
print('v1 {}'.format(v1))

# Probably best for pre v3.8.0 - the output without .version is just a longer string with
# both the package name, a space, and the version string
import pkg_resources  # part of setuptools

v2 = pkg_resources.get_distribution('mypackage').version
print('v2 {}'.format(v2))

## This one seems to be slower, and with pyinstaller makes the exe a lot bigger
from pbr.version import VersionInfo

v3 = VersionInfo('mypackage').release_string()
print('v3 {}'.format(v3))

You can put one of these directly in your __init__.py for the package to extract the version info as follows, similar to some other answers:

__all__ = (
    '__version__',
    'my_package_name'
)

import pkg_resources  # part of setuptools

__version__ = pkg_resources.get_distribution("mypackage").version

Answer 12

After several hours of trying to find the simplest reliable solution, here are the parts:

Create a version.py file inside your package folder (“/mypackage”):

# Store the version here so:
# 1) we don't load dependencies by storing it in __init__.py
# 2) we can import it in setup.py for the same reason
# 3) we can import it into your module
__version__ = '1.2.7'

In setup.py:

exec(open('mypackage/version.py').read())
setup(
    name='mypackage',
    version=__version__,

In the package folder’s __init__.py:

from .version import __version__

The exec() function runs the script outside of any imports, since setup.py is run before the module can be imported. You still only need to manage the version number in one file in one place, but unfortunately it is not in setup.py. (That’s the downside, but having no import bugs is the upside.)


Answer 13

Lots of work toward uniform versioning and in support of conventions has been completed since this question was first asked. Palatable options are now detailed in the Python Packaging User Guide. Also noteworthy is that version number schemes are relatively strict in Python per PEP 440, and so keeping things sane is critical if your package will be released to the Cheese Shop.

Here’s a shortened breakdown of versioning options:

  1. Read the file in setup.py (setuptools) and get the version.
  2. Use an external build tool (to update both __init__.py as well as source control), e.g. bump2version, changes or zest.releaser.
  3. Set the value to a __version__ global variable in a specific module.
  4. Place the value in a simple VERSION text file for both setup.py and code to read.
  5. Set the value via a setup.py release, and use importlib.metadata to pick it up at runtime. (Warning, there are pre-3.8 and post-3.8 versions.)
  6. Set the value to __version__ in sample/__init__.py and import sample in setup.py.
  7. Use setuptools_scm to extract versioning from source control so that it’s the canonical reference, not code.

Note that (7) might be the most modern approach (build metadata is independent of code and is published by automation). Also note that if setup.py is used for the package release, a simple python3 setup.py --version will report the version directly.


回答 14

值得一说的是,如果您使用的是NumPy distutils,numpy.distutils.misc_util.Configuration则可以使用make_svn_version_py()一种将修订版号嵌入package.__svn_version__变量中的方法version

For what it’s worth, if you’re using NumPy distutils, numpy.distutils.misc_util.Configuration has a make_svn_version_py() method that embeds the revision number inside package.__svn_version__ in the variable version .


回答 15

  1. 使用一个version.py文件只用__version__ = <VERSION>该文件在参数。在setup.py文件中导入__version__参数,然后将其值放在setup.py文件中,如下所示: version=__version__
  2. 另一种方法是仅使用setup.py带有version=<CURRENT_VERSION>-的文件CURRENT_VERSION是硬编码的。

由于我们不想每次创建新标签(准备发布新的软件包版本)时都手动更改文件中的版本,因此可以使用以下内容。

我强烈建议使用bumpversion程序包。多年来,我一直在使用它来改进版本。

首先添加version=<VERSION>setup.py文件(如果尚未添加)。

每次更改版本时,都应使用如下简短脚本:

bumpversion (patch|minor|major) - choose only one option
git push
git push --tags

然后为每个仓库添加一个文件.bumpversion.cfg

[bumpversion]
current_version = <CURRENT_TAG>
commit = True
tag = True
tag_name = {new_version}
[bumpversion:file:<RELATIVE_PATH_TO_SETUP_FILE>]

注意:

  • 您可以像其他帖子中所建议的那样__version__version.py文件下使用参数,并像这样更新bumpversion文件: [bumpversion:file:<RELATIVE_PATH_TO_VERSION_FILE>]
  • 必须 git commitgit reset您的回购中的所有内容,否则您将收到肮脏的回购错误。
  • 确保您的虚拟环境中包含了bumpversion程序包,如果没有,它将无法正常工作。

  1. Use a version.py file that contains only a __version__ = <VERSION> assignment. In setup.py, import __version__ and pass its value on like this: version=__version__
  2. Another way is to use just a setup.py file with version=<CURRENT_VERSION>, where CURRENT_VERSION is hardcoded.

Since we don’t want to manually change the version in the file every time we create a new tag (ready to release a new package version), we can use the following.

I highly recommend the bumpversion package. I’ve been using it for years to bump versions.

Start by adding version=<VERSION> to your setup.py file if you don’t have it already.

You should use a short script like this every time you bump a version:

bumpversion (patch|minor|major) - choose only one option
git push
git push --tags

Then add one file per repo called: .bumpversion.cfg:

[bumpversion]
current_version = <CURRENT_TAG>
commit = True
tag = True
tag_name = {new_version}
[bumpversion:file:<RELATIVE_PATH_TO_SETUP_FILE>]

Note:

  • You can use __version__ parameter under version.py file like it was suggested in other posts and update the bumpversion file like this: [bumpversion:file:<RELATIVE_PATH_TO_VERSION_FILE>]
  • You must git commit or git reset everything in your repo, otherwise you’ll get a dirty repo error.
  • Make sure that your virtual environment includes the package of bumpversion, without it it will not work.
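If the version lives in a version.py file, setup.py can also read it without importing the package. The following is only a sketch: parse_version is a hypothetical helper, and it assumes the file contains a plain string assignment such as __version__ = "1.2.3".

```python
import re


def parse_version(source):
    """Extract the value of a `__version__ = "..."` assignment from source text."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# In a real setup.py the text would come from reading the version file;
# a literal string stands in for the file contents here.
example_source = '"""Version module."""\n__version__ = "1.2.3"\n'
print(parse_version(example_source))
```

Reading the file as text avoids import errors when the package's dependencies are not installed yet at build time.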

回答 16

如果使用CVS(或RCS)并需要快速解决方案,则可以使用:

__version__ = "$Revision: 1.1 $"[11:-2]
__version_info__ = tuple([int(s) for s in __version__.split(".")])

(当然,修订号将由CVS代替。)

这为您提供了易于打印的版本和版本信息,可用于检查要导入的模块至少具有预期的版本:

import my_module
assert my_module.__version_info__ >= (1, 1)

If you use CVS (or RCS) and want a quick solution, you can use:

__version__ = "$Revision: 1.1 $"[11:-2]
__version_info__ = tuple([int(s) for s in __version__.split(".")])

(Of course, the revision number will be substituted for you by CVS.)

This gives you a print-friendly version and a version info that you can use to check that the module you are importing has at least the expected version:

import my_module
assert my_module.__version_info__ >= (1, 1)
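The reason for keeping a tuple next to the string is that tuples of integers compare numerically, while version strings compare character by character. A small sketch (the helper name version_info is an assumption, and it only handles purely numeric dotted versions):

```python
def version_info(version):
    """Turn a dotted numeric version string such as '1.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# String comparison gets this wrong, tuple comparison gets it right.
print("1.10" > "1.9")
print(version_info("1.10") > version_info("1.9"))
```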

如何在string.replace中输入正则表达式?

问题:如何在string.replace中输入正则表达式?

我需要一些帮助来声明正则表达式。我的输入如下:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

所需的输出是:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

我已经试过了:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader: 
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')

        print line

我也尝试过此方法(但似乎我使用了错误的regex语法):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

我不想replace从1到99 进行硬编码。。。

I need some help on declaring a regex. My inputs are like the following:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

I’ve tried this:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader: 
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')

        print line

I’ve also tried this (but it seems like I’m using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I don’t want to hard-code the replace from 1 to 99...


回答 0

这个经过测试的代码段应该做到这一点:

import re
line = re.sub(r"</?\[\d+>", "", line)

编辑:这是解释其工作方式的注释版本:

line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

正则表达式是 有趣!但我强烈建议您花一两个小时来学习基础知识。对于初学者,您需要学习哪些特殊字符:需要转义的“元字符”(即,前面加反斜杠-字符类的内外规则都不同。)在以下位置有一个出色的在线教程:www .regular-expressions.info。您在那里度过的时间将使自己获得很多回报。祝您满意!

This tested snippet should do it:

import re
line = re.sub(r"</?\[\d+>", "", line)

Edit: Here’s a commented version explaining how it works:

line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: “metacharacters” which need to be escaped (i.e. with a backslash placed in front – and the rules are different inside and outside character classes.) There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
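For reference, here is the same pattern applied to a shortened version of the sample text from the question, as a self-contained sketch (the variable names are made up for the example):

```python
import re

# Matches tags such as <[1> and </[99>: '<', an optional '/', '[', digits, '>'.
TAG = re.compile(r"</?\[\d+>")

sample = "this is a paragraph with<[1> in between</[1> and then ... the<[99> number</[99>."
cleaned = TAG.sub("", sample)
print(cleaned)
```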


回答 1

str.replace()进行固定替换。使用re.sub()代替。

str.replace() does fixed replacements. Use re.sub() instead.


回答 2

我会这样(正则表达式在注释中说明):

import re

# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")

# <\/{0,}\[\d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»

subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>"""

result = pattern.sub("", subject)

print(result)

如果您想了解有关正则表达式的更多信息,建议阅读 Jan Goyvaerts和Steven Levithan撰写的《表达式食谱》

I would go like this (regex explained in comments):

import re

# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")

# <\/{0,}\[\d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»

subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>"""

result = pattern.sub("", subject)

print(result)

If you want to learn more about regex, I recommend reading Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.


回答 3

最简单的方法

import re

txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'

out = re.sub("(<[^>]+>)", '', txt)
print out

The easiest way

import re

txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'

out = re.sub("(<[^>]+>)", '', txt)
print out

回答 4

字符串对象的replace方法不接受正则表达式,而仅接受固定的字符串(请参见文档:http : //docs.python.org/2/library/stdtypes.html#str.replace)。

您必须使用re模块:

import re
newline= re.sub("<\/?\[[0-9]+>", "", line)

The replace method of string objects does not accept regular expressions, only fixed strings (see documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use the re module:

import re
newline = re.sub(r"</?\[[0-9]+>", "", line)

回答 5

不必使用正则表达式(用于您的示例字符串)

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'

>>> for w in s.split(">"):
...   if "<" in w:
...      print w.split("<")[0]
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags

don’t have to use regular expression (for your sample string)

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'

>>> for w in s.split(">"):
...   if "<" in w:
...      print w.split("<")[0]
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags

回答 6

import glob
import os
import re
import sys

# Matches an opening or closing tag such as <[1> or </[99>.
pattern = re.compile(r"</?\[\d+>")

for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    with open(infile) as reader:
        for line in reader:
            # Pattern.sub takes the replacement first, then the input string.
            retline = pattern.sub("", line)
            sys.stdout.write(retline)

将不同类型的项目列表作为字符串加入Python

问题:将不同类型的项目列表作为字符串加入Python

我需要加入项目清单。列表中的许多项目都是从函数返回的整数值;即

myList.append(munfunc()) 

如何将返回的结果转换为字符串,以便将其与列表连接?

我是否需要对每个整数值执行以下操作:

myList.append(str(myfunc()))

有没有更Python化的方法来解决铸造问题?

I need to join a list of items. Many of the items in the list are integer values returned from a function; i.e.,

myList.append(munfunc()) 

How should I convert the returned result to a string in order to join it with the list?

Do I need to do the following for every integer value:

myList.append(str(myfunc()))

Is there a more Pythonic way to solve casting problems?


回答 0

调用str(...)是将内容转换为字符串的Python方式。

您可能需要考虑为什么要使用字符串列表。您可以改为将其保存为整数列表,并且仅在需要显示整数时才将其转换为字符串。例如,如果您有一个整数列表,则可以在for循环中将它们一一转换,然后将它们与结合,

print(','.join(str(x) for x in list_of_ints))

Calling str(...) is the Pythonic way to convert something to a string.

You might want to consider why you want a list of strings. You could instead keep it as a list of integers and only convert the integers to strings when you need to display them. For example, if you have a list of integers then you can convert them one by one in a for-loop and join them with ,:

print(','.join(str(x) for x in list_of_ints))

回答 1

将整数传递给str并没有错。您可能不这样做的一个原因是myList实际上应该是整数列表,例如,将列表中的值相加是合理的。在这种情况下,在将int附加到myList之前,请勿将其传递给str。如果最终在追加之前没有转换为字符串,则可以通过执行以下操作来构造一个大字符串

', '.join(map(str, myList))

There’s nothing wrong with passing integers to str. One reason you might not do this is that myList is really supposed to be a list of integers e.g. it would be reasonable to sum the values in the list. In that case, do not pass your ints to str before appending them to myList. If you end up not converting to strings before appending, you can construct one big string by doing something like

', '.join(map(str, myList))
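A short sketch of that idea with a list of mixed types (the list contents are invented for illustration): the items keep their original types, and str is applied only at the moment of joining.

```python
# The list keeps its original, mixed-type items.
items = [1, "two", 3.0, None]

# Convert to strings only when building the output.
joined = ", ".join(map(str, items))
print(joined)

# The numeric items are still usable as numbers afterwards.
total = sum(x for x in items if isinstance(x, (int, float)))
print(total)
```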

回答 2

可以使用python中的map函数。它有两个参数。第一个参数是必须用于列表中每个元素的函数。第二个论点是可迭代的

a = [1, 2, 3]   
map(str, a)  
['1', '2', '3']

将列表转换为字符串后,可以使用简单的连接功能将列表组合为单个字符串

a = map(str, a)    
''.join(a)      
'123'

Python’s map function can be used. It takes two arguments: the first is the function to apply to each element, and the second is the iterable.

a = [1, 2, 3]
list(map(str, a))  # list() is needed on Python 3, where map returns an iterator
# ['1', '2', '3']

After converting the items to strings, you can use join to combine them into a single string:

a = map(str, a)
''.join(a)
# '123'

回答 3

a=[1,2,3]
b=[str(x) for x in a]
print b

上面的方法是将列表转换为字符串的最简单,最通用的方法。另一个简短的方法是-

a=[1,2,3]
b=map(str,a)
print b

a = [1, 2, 3]
b = [str(x) for x in a]
print(b)

The method above is the simplest and most general way to convert each item of a list to a string. Another short way is:

a = [1, 2, 3]
b = list(map(str, a))  # list() makes this work on Python 3 as well
print(b)

回答 4

有三种方法可以做到这一点。

假设您有一个整数列表

my_list = [100,200,300]
  1. "-".join(str(n) for n in my_list)
  2. "-".join([str(n) for n in my_list])
  3. "-".join(map(str, my_list))

但是,如python网站https://docs.python.org/2/library/timeit.html上的timeit示例中所述,使用地图的速度更快。所以我建议您使用"-".join(map(str, my_list))

There are three ways of doing this.

Let’s say you have a list of integers:

my_list = [100,200,300]

  1. "-".join(str(n) for n in my_list)
  2. "-".join([str(n) for n in my_list])
  3. "-".join(map(str, my_list))

However, according to the timeit example on the Python website at https://docs.python.org/2/library/timeit.html, using map is faster. So I would recommend using "-".join(map(str, my_list)).
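Rather than relying on a general claim, the three variants can be timed directly. This is only a sketch: the dictionary layout and iteration count are arbitrary choices, and the numbers will differ between machines and Python versions.

```python
import timeit

my_list = [100, 200, 300]

# The three variants from the list above, wrapped so timeit can call them.
candidates = {
    "generator": lambda: "-".join(str(n) for n in my_list),
    "list comprehension": lambda: "-".join([str(n) for n in my_list]),
    "map": lambda: "-".join(map(str, my_list)),
}

# All three must produce the same string.
results = {name: func() for name, func in candidates.items()}

# Time each variant and print the fastest first.
timings = {
    name: timeit.timeit(func, number=100000) for name, func in candidates.items()
}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.3f}s")
```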


回答 5

您的问题很清楚。也许您正在寻找扩展,以便将另一个列表的所有元素添加到现有列表中:

>>> x = [1,2]
>>> x.extend([3,4,5])
>>> x
[1, 2, 3, 4, 5]

如果要将整数转换为字符串,请使用str()或字符串插值,可能与列表推导结合使用,即

>>> x = ['1', '2']
>>> x.extend([str(i) for i in range(3, 6)])
>>> x
['1', '2', '3', '4', '5']

所有这些都被认为是pythonic的(好吧,生成器表达式更像是pythonic的,但让我们保持话题简单)

Your problem is rather clear. Perhaps you’re looking for extend, to add all elements of another list to an existing list:

>>> x = [1,2]
>>> x.extend([3,4,5])
>>> x
[1, 2, 3, 4, 5]

If you want to convert integers to strings, use str() or string interpolation, possibly combined with a list comprehension, i.e.

>>> x = ['1', '2']
>>> x.extend([str(i) for i in range(3, 6)])
>>> x
['1', '2', '3', '4', '5']

All of this is considered pythonic (ok, a generator expression is even more pythonic but let’s stay simple and on topic)


回答 6

例如:

lst_points = [[313, 262, 470, 482], [551, 254, 697, 449]]

lst_s_points = [" ".join(map(str, lst)) for lst in lst_points]
print lst_s_points
# ['313 262 470 482', '551 254 697 449']

对于我来说,我想str在每个str列表之前添加一个:

# here o means class, other four points means coordinate
print ['0 ' + " ".join(map(str, lst)) for lst in lst_points]
# ['0 313 262 470 482', '0 551 254 697 449']

或单个列表:

lst = [313, 262, 470, 482]
lst_str = [str(i) for i in lst]
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

lst_str = map(str, lst)
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

For example:

lst_points = [[313, 262, 470, 482], [551, 254, 697, 449]]

lst_s_points = [" ".join(map(str, lst)) for lst in lst_points]
print lst_s_points
# ['313 262 470 482', '551 254 697 449']

In my case, I wanted to add a str before each joined string:

# here 0 means the class; the other four values are coordinates
print ['0 ' + " ".join(map(str, lst)) for lst in lst_points]
# ['0 313 262 470 482', '0 551 254 697 449']

Or single list:

lst = [313, 262, 470, 482]
lst_str = [str(i) for i in lst]
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

lst_str = map(str, lst)
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

回答 7

也许您不需要数字作为字符串,只需执行以下操作:

functaulu = [munfunc(arg) for arg in range(loppu)]

稍后,如果您需要将其作为字符串,则可以使用字符串或格式字符串来实现:

print "Vastaus5 = %s" % functaulu[5]

Maybe you do not need numbers as strings, just do:

functaulu = [munfunc(arg) for arg in range(loppu)]

Later, if you need one as a string, you can convert it with str or with a format string:

print "Vastaus5 = %s" % functaulu[5]


回答 8

没人会喜欢repr吗?
python 3.7.2:

>>> int_list = [1, 2, 3, 4, 5]
>>> print(repr(int_list))
[1, 2, 3, 4, 5]
>>> 

请注意,这是一个明确的表示。一个示例显示:

#Print repr(object) backwards
>>> print(repr(int_list)[::-1])
]5 ,4 ,3 ,2 ,1[
>>> 

pydocs-repr上的更多信息

How come no-one seems to like repr?
python 3.7.2:

>>> int_list = [1, 2, 3, 4, 5]
>>> print(repr(int_list))
[1, 2, 3, 4, 5]
>>> 

Take care though, it’s an explicit representation. An example shows:

#Print repr(object) backwards
>>> print(repr(int_list)[::-1])
]5 ,4 ,3 ,2 ,1[
>>> 

more info at pydocs-repr


如何将多行字符串分成多行?

问题:如何将多行字符串分成多行?

我有一个多行字符串文字,我想在每一行上执行一个操作,如下所示:

inputString = """Line 1
Line 2
Line 3"""

我想做以下事情:

for line in inputString:
    doStuff()

I have a multi-line string literal that I want to do an operation on each line, like so:

inputString = """Line 1
Line 2
Line 3"""

I want to do something like the following:

for line in inputString:
    doStuff()

回答 0

inputString.splitlines()

将为您提供每个项目的列表,该splitlines()方法旨在将每一行拆分为一个列表元素。

inputString.splitlines()

This will give you a list with one item per line; the splitlines() method is designed to split each line into a list element.


回答 1

就像其他人说的:

inputString.split('\n')  # --> ['Line 1', 'Line 2', 'Line 3']

与上面的相同,但是不建议使用字符串模块的功能,应避免使用:

import string
string.split(inputString, '\n')  # --> ['Line 1', 'Line 2', 'Line 3']

另外,如果您希望每行都包含中断顺序(CR,LF,CRLF),请将该splitlines方法与True参数一起使用:

inputString.splitlines(True)  # --> ['Line 1\n', 'Line 2\n', 'Line 3']

Like the others said:

inputString.split('\n')  # --> ['Line 1', 'Line 2', 'Line 3']

This is identical to the above, but the string module’s functions are deprecated (and were removed in Python 3) and should be avoided:

import string
string.split(inputString, '\n')  # --> ['Line 1', 'Line 2', 'Line 3']

Alternatively, if you want each line to include the break sequence (CR,LF,CRLF), use the splitlines method with a True argument:

inputString.splitlines(True)  # --> ['Line 1\n', 'Line 2\n', 'Line 3']

回答 2

使用str.splitlines()

splitlines()不同于,可以正确处理换行符split("\n")

它也具有@efotinis提到的优点,当使用True参数调用时,可以在拆分结果中选择性地包括换行符。


为什么不应该使用的详细说明split("\n")

\n在Python中,代表Unix换行符(ASCII十进制代码10),独立于运行它的平台。但是,换行表示形式取决于平台。在Windows上,\n是两个字符CRLF(ASCII十进制码13和10,\r\n称为AKA 和),而在任何现代Unix(包括OS X)上,它都是单个字符LF

print,例如,即使您有一个行尾与平台不匹配的字符串也可以正常工作:

>>> print " a \n b \r\n c "
 a 
 b 
 c

但是,在“ \ n”上进行显式拆分将产生与平台有关的行为:

>>> " a \n b \r\n c ".split("\n")
[' a ', ' b \r', ' c ']

即使你使用了os.linesep,它只会根据你的平台上的换行分隔符分开,并会失败,如果你在处理文本创建在其他平台上,或用裸\n

>>> " a \n b \r\n c ".split(os.linesep)
[' a \n b ', ' c ']

splitlines 解决了所有这些问题:

>>> " a \n b \r\n c ".splitlines()
[' a ', ' b ', ' c ']

以文本模式读取文件可以部分缓解换行符表示问题,因为它将Python \n转换为平台的换行符表示形式。但是,文本模式仅在Windows上存在。在Unix系统上,所有文件都以二进制模式打开,因此split('\n')在带有Windows文件的UNIX系统中使用将导致不良行为。同样,使用与其他来源(例如来自套接字)的换行符可能不同的字符串来处理字符串也很常见。

Use str.splitlines().

splitlines() handles newlines properly, unlike split("\n").

It also has the advantage mentioned by @efotinis of optionally including the newline character in the split result when called with a True argument.


Why you shouldn’t use split("\n"):

\n, in Python, represents a Unix line-break (ASCII decimal code 10), independently of the platform where you run it. However, the linebreak representation is platform-dependent. On Windows, a line break is two characters, CR and LF (ASCII decimal codes 13 and 10, AKA \r and \n), while on any modern Unix (including OS X), it’s the single character LF.

print, for example, works correctly even if you have a string with line endings that don’t match your platform:

>>> print " a \n b \r\n c "
 a 
 b 
 c

However, explicitly splitting on “\n”, will yield platform-dependent behaviour:

>>> " a \n b \r\n c ".split("\n")
[' a ', ' b \r', ' c ']

Even if you use os.linesep, it will only split according to the newline separator on your platform, and will fail if you’re processing text created in other platforms, or with a bare \n:

>>> " a \n b \r\n c ".split(os.linesep)
[' a \n b ', ' c ']

splitlines solves all these problems:

>>> " a \n b \r\n c ".splitlines()
[' a ', ' b ', ' c ']

Reading files in text mode partially mitigates the newline representation problem, as it converts the platform’s newline representation into Python’s \n. However, in Python 2 text mode only makes a difference on Windows. On Unix systems, all files are effectively opened in binary mode, so using split('\n') in a UNIX system with a Windows file will lead to undesired behavior. (Python 3’s open() translates \r\n and \r to \n in text mode on every platform by default.) Also, it’s not unusual to process strings with potentially different newlines from other sources, such as from a socket.
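The difference can be checked side by side with the same mixed-ending string used above; the variable names are made up for this sketch.

```python
mixed = " a \n b \r\n c "

# Splitting on "\n" leaves the stray "\r" from the Windows-style ending.
by_split = mixed.split("\n")

# splitlines() recognises "\n", "\r\n" and "\r" alike.
by_splitlines = mixed.splitlines()

# Passing True keeps the original line endings on each item.
with_endings = mixed.splitlines(True)

print(by_split)
print(by_splitlines)
print(with_endings)
```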


回答 3

在这种特殊情况下可能会过大,但另一个选择涉及使用StringIO创建文件状对象

for line in StringIO.StringIO(inputString):
    doStuff()

Might be overkill in this particular case but another option involves using StringIO to create a file-like object

for line in StringIO.StringIO(inputString):
    doStuff()
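The StringIO module shown here is the Python 2 name. On Python 3 the equivalent class lives in the io module; a minimal sketch, with input_string standing in for the question’s inputString:

```python
import io

input_string = """Line 1
Line 2
Line 3"""

# Iterating over a StringIO object yields one line at a time, like a file.
# Each line keeps its trailing newline, so it is stripped here.
lines = [line.rstrip("\n") for line in io.StringIO(input_string)]
print(lines)
```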

回答 4

原始帖子要求提供代码,该代码将打印一些行(如果在某些情况下是正确的),则打印下一行。我的实现是这样的:

text = """1 sfasdf
asdfasdf
2 sfasdf
asdfgadfg
1 asfasdf
sdfasdgf
"""

text = text.splitlines()
rows_to_print = set()  # an empty {} literal would be a dict, not a set

for line in range(len(text)):
    if text[line][0] == '1':
        rows_to_print = rows_to_print | {line, line + 1}

rows_to_print = sorted(list(rows_to_print))

for i in rows_to_print:
    print(text[i])

The original post asked for code that prints some rows (those for which some condition holds) plus the following row. My implementation would be this:

text = """1 sfasdf
asdfasdf
2 sfasdf
asdfgadfg
1 asfasdf
sdfasdgf
"""

text = text.splitlines()
rows_to_print = set()  # an empty {} literal would be a dict, not a set

for line in range(len(text)):
    if text[line][0] == '1':
        rows_to_print = rows_to_print | {line, line + 1}

rows_to_print = sorted(list(rows_to_print))

for i in rows_to_print:
    print(text[i])

回答 5

我希望注释的代码文本格式正确,因为我认为@ 1_CR的答案需要更多的修改,并且我想扩大他的答案。无论如何,他使我领会了以下技巧:如果可用,它将使用cStringIO(但请注意:cStringIO和StringIO 不相同,因为您不能将cStringIO子类化。。。它是内置的。但是对于基本操作,语法将是相同的,因此您可以这样做):

try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

for line in StringIO.StringIO(variable_with_multiline_string):
    pass
print line.strip()

I wish comments had proper code text formatting, because I think @1_CR ‘s answer needs more bumps, and I would like to augment his answer. Anyway, He led me to the following technique; it will use cStringIO if available (BUT NOTE: cStringIO and StringIO are not the same, because you cannot subclass cStringIO… it is a built-in… but for basic operations the syntax will be identical, so you can do this):

try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

for line in StringIO.StringIO(variable_with_multiline_string):
    pass
print line.strip()

在Python中更改字符串中的一个字符

问题:在Python中更改字符串中的一个字符

Python中替换字符串中字符的最简单方法是什么?

例如:

text = "abcdefg";
text[1] = "Z";
           ^

What is the easiest way in Python to replace a character in a string?

For example:

text = "abcdefg";
text[1] = "Z";
           ^

回答 0

不要修改字符串。

与他们一起工作;仅在需要时才将它们转换为字符串。

>>> s = list("Hello zorld")
>>> s
['H', 'e', 'l', 'l', 'o', ' ', 'z', 'o', 'r', 'l', 'd']
>>> s[6] = 'W'
>>> s
['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']
>>> "".join(s)
'Hello World'

Python字符串是不可变的(即无法修改)。有很多的原因。除非您别无选择,否则请使用列表,然后将它们变成字符串。

Don’t modify strings.

Work with them as lists; turn them into strings only when needed.

>>> s = list("Hello zorld")
>>> s
['H', 'e', 'l', 'l', 'o', ' ', 'z', 'o', 'r', 'l', 'd']
>>> s[6] = 'W'
>>> s
['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']
>>> "".join(s)
'Hello World'

Python strings are immutable (i.e. they can’t be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings.


回答 1

最快的方法?

有三种方法。对于速度寻求者,我建议使用“方法2”

方法1

由这个答案给出

text = 'abcdefg'
new = list(text)
new[6] = 'W'
''.join(new)

与“方法2”相比,这相当慢

timeit.timeit("text = 'abcdefg'; s = list(text); s[6] = 'W'; ''.join(s)", number=1000000)
1.0411581993103027

方法2(快速方法)

由这个答案给出

text = 'abcdefg'
text = text[:1] + 'Z' + text[2:]

哪个更快:

timeit.timeit("text = 'abcdefg'; text = text[:1] + 'Z' + text[2:]", number=1000000)
0.34651994705200195

方法3:

字节数组:

timeit.timeit("text = 'abcdefg'; s = bytearray(text); s[1] = 'Z'; str(s)", number=1000000)
1.0387420654296875

Fastest method?

There are three ways. For the speed seekers I recommend ‘Method 2’

Method 1

Given by this answer

text = 'abcdefg'
new = list(text)
new[6] = 'W'
''.join(new)

Which is pretty slow compared to ‘Method 2’

timeit.timeit("text = 'abcdefg'; s = list(text); s[6] = 'W'; ''.join(s)", number=1000000)
1.0411581993103027

Method 2 (FAST METHOD)

Given by this answer

text = 'abcdefg'
text = text[:1] + 'Z' + text[2:]

Which is much faster:

timeit.timeit("text = 'abcdefg'; text = text[:1] + 'Z' + text[2:]", number=1000000)
0.34651994705200195

Method 3:

Byte array:

timeit.timeit("text = 'abcdefg'; s = bytearray(text); s[1] = 'Z'; str(s)", number=1000000)
1.0387420654296875

回答 2

new = text[:1] + 'Z' + text[2:]

回答 3

Python字符串是不可变的,您可以通过复制来更改它们。
做您想做的最简单的方法可能是:

text = "Z" + text[1:]

text[1:]返回字符串text从位置1到结束,位置从0开始计数,从而“1”是第二个字符。

编辑:您可以对字符串的任何部分使用相同的字符串切片技术

text = text[:1] + "Z" + text[2:]

或者,如果该字母仅出现一次,则可以使用下面建议的搜索和替换技术

Python strings are immutable, you change them by making a copy.
The easiest way to do what you want is probably:

text = "Z" + text[1:]

The text[1:] returns the string in text from position 1 to the end, positions count from 0 so ‘1’ is the second character.

edit: You can use the same string slicing technique for any part of the string

text = text[:1] + "Z" + text[2:]

Or if the letter only appears once you can use the search and replace technique suggested below
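The slicing technique generalises to any position. A small sketch, where replace_at is a hypothetical helper name and the bounds check is an added assumption about the desired behaviour:

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced by new_char."""
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    # Everything before index, the replacement, everything after index.
    return text[:index] + new_char + text[index + 1:]


text = "abcdefg"
print(replace_at(text, 1, "Z"))
print(text)  # the original string is unchanged
```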


回答 4

从python 2.6和python 3开始,您可以使用可变的字节数组(可以与字符串不同,可以逐个元素地更改):

s = "abcdefg"
b_s = bytearray(s)
b_s[1] = "Z"
s = str(b_s)
print s
aZcdefg

编辑:更改为s

edit2:正如两位炼金术士在评论中所述,此代码不适用于unicode。

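Starting with Python 2.6 and Python 3 you can use bytearrays, which are mutable (unlike strings, they can be changed element-wise):

s = "abcdefg"
b_s = bytearray(s, "ascii")  # Python 3 requires an explicit encoding
b_s[1] = ord("Z")            # bytearray items are integers
s = b_s.decode("ascii")
print(s)
# aZcdefg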

edit: Changed str to s

edit2: As Two-Bit Alchemist mentioned in the comments, this code does not work with unicode.


回答 5

就像其他人所说的那样,通常Python字符串应该是不可变的。

但是,如果您使用的是CPython,即python.org的实现,则可以使用ctypes修改内存中的字符串结构。

这是我使用该技术清除字符串的示例。

在python中将数据标记为敏感

为了完整起见,我提到了这一点,这应该是您的最后选择,因为它有点黑。

Like other people have said, generally Python strings are supposed to be immutable.

However, if you are using CPython, the implementation at python.org, it is possible to use ctypes to modify the string structure in memory.

Here is an example where I use the technique to clear a string.

Mark data as sensitive in python

I mention this for the sake of completeness, and this should be your last resort as it is hackish.


回答 6

此代码不是我的。我不记得我在哪里填写网站表格。有趣的是,您可以使用此字符用一个或多个字符替换一个或多个字符。尽管此回复很晚,但像我这样的新手(随时)可能会觉得有用。

更改文字功能。

mytext = 'Hello Zorld'
mytext = mytext.replace('Z', 'W')
print mytext,

This code is not mine. I can’t recall which site I took it from. Interestingly, you can use this to replace one or more characters with one or more characters. Though this reply is very late, novices like me might find it useful at any time.

Change Text function.

mytext = 'Hello Zorld'
mytext = mytext.replace('Z', 'W')
print mytext,

回答 7

实际上,使用字符串,您可以执行以下操作:

oldStr = 'Hello World!'    
newStr = ''

for i in oldStr:  
    if 'a' <= i <= 'z':  # inclusive bounds, so 'a' and 'z' are converted too
        newStr += chr(ord(i)-32)     
    else:      
        newStr += i
print(newStr)

'HELLO WORLD!'

基本上,我是将“ +”字符串一起添加到新字符串中:)。

Actually, with strings, you can do something like this:

oldStr = 'Hello World!'    
newStr = ''

for i in oldStr:  
    if 'a' <= i <= 'z':  # inclusive bounds, so 'a' and 'z' are converted too
        newStr += chr(ord(i)-32)     
    else:      
        newStr += i
print(newStr)

'HELLO WORLD!'

Basically, I’m “adding”+”strings” together into a new string :).


回答 8

如果您的世界是100%ascii/utf-8(很多用例都放在该框中):

b = bytearray(s, 'utf-8')
# process - e.g., lowercasing: 
#    b[0] = b[i+1] - 32
s = str(b, 'utf-8')

python 3.7.3

if your world is 100% ascii/utf-8(a lot of use cases fit in that box):

b = bytearray(s, 'utf-8')
# process - e.g., uppercasing an ASCII lowercase letter at index i:
#    b[i] = b[i] - 32
s = str(b, 'utf-8')

python 3.7.3


回答 9

我想添加另一种更改字符串中字符的方式。

>>> text = '~~~~~~~~~~~'
>>> text = text[:1] + (text[1:].replace(text[0], '+', 1))
'~+~~~~~~~~~'

与将字符串转换为list并替换ith值然后再次加入相比,速度有多快?

清单方式

>>> timeit.timeit("text = '~~~~~~~~~~~'; s = list(text); s[1] = '+'; ''.join(s)", number=1000000)
0.8268570480013295

我的解决方案

>>> timeit.timeit("text = '~~~~~~~~~~~'; text=text[:1] + (text[1:].replace(text[0], '+', 1))", number=1000000)
0.588400217000526

I would like to add another way of changing a character in a string.

>>> text = '~~~~~~~~~~~'
>>> text = text[:1] + (text[1:].replace(text[0], '+', 1))
'~+~~~~~~~~~'

How much faster is it compared to turning the string into a list, replacing the ith value and then joining again?

List approach

>>> timeit.timeit("text = '~~~~~~~~~~~'; s = list(text); s[1] = '+'; ''.join(s)", number=1000000)
0.8268570480013295

My solution

>>> timeit.timeit("text = '~~~~~~~~~~~'; text=text[:1] + (text[1:].replace(text[0], '+', 1))", number=1000000)
0.588400217000526

如何在Python中从字符串末尾删除子字符串?

问题:如何在Python中从字符串末尾删除子字符串?

我有以下代码:

url = 'abcdc.com'
print(url.strip('.com'))

我期望: abcdc

我有: abcd

现在我做

url.rsplit('.com', 1)

有没有更好的办法?

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

Now I do

url.rsplit('.com', 1)

Is there a better way?


回答 0

strip并不意味着“删除此子字符串”。x.strip(y)视为y一组字符,并从的末尾剥离该组中的所有字符x

相反,您可以使用endswith和切片:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

或使用正则表达式

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)

strip doesn’t mean “remove this substring”. x.strip(y) treats y as a set of characters and strips any characters in that set from the ends of x.

Instead, you could use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or using regular expressions:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
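Python 3.9 added str.removesuffix for exactly this case. The sketch below uses it when available and otherwise falls back to the endswith-and-slice approach; the helper name remove_suffix is an assumption for the example.

```python
def remove_suffix(text, suffix):
    """Remove suffix from the end of text if present, else return text unchanged."""
    if hasattr(text, "removesuffix"):  # available on Python 3.9+
        return text.removesuffix(suffix)
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


url = "abcdc.com"
print(remove_suffix(url, ".com"))
print(url.strip(".com"))  # strip treats '.com' as a set of characters
```

The check for an empty suffix in the fallback matters because text[:-0] would return an empty string.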

回答 1

如果您确定字符串仅出现在末尾,则最简单的方法是使用“替换”:

url = 'abcdc.com'
print(url.replace('.com',''))

If you are sure that the string only appears at the end, then the simplest way would be to use ‘replace’:

url = 'abcdc.com'
print(url.replace('.com',''))

回答 2

def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]

Answer 3

Since it seems like nobody has pointed this out yet:

url = "www.example.com"
new_url = url[:url.rfind(".")]

This should be more efficient than the methods using split() as no new list object is created, and this solution works for strings with several dots.
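One thing to watch with this approach: rfind returns -1 when the string contains no dot, and slicing with [:-1] then silently drops the last character. A minimal sketch with a guard follows; the function name drop_after_last_dot is invented for this example.

```python
def drop_after_last_dot(url):
    idx = url.rfind(".")
    # rfind returns -1 when no dot exists; url[:-1] would chop the last character.
    return url if idx == -1 else url[:idx]

print(drop_after_last_dot("www.example.com"))
print(drop_after_last_dot("localhost"))
```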


Answer 4

It depends on what you know about your URL and exactly what you're trying to do. If you know that it will always end in '.com' (or '.net' or '.org') then

 url=url[:-4]

is the quickest solution. If it's a more general URL, then you're probably better off looking into the urlparse library that comes with Python (urllib.parse in Python 3).

If, on the other hand, you simply want to remove everything after the final '.' in a string, then

url.rsplit('.',1)[0]

will work. Or if you just want everything up to the first '.', then try

url.split('.',1)[0]

Answer 5

If you know it’s an extension, then

url = 'abcdc.com'
...
url.rsplit('.', 1)[0]  # split at '.', starting from the right, maximum 1 split

This works equally well with abcdc.com or www.abcdc.com or abcdc.[anything] and is more extensible.


Answer 6

In one line:

text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]

Answer 7

How about url[:-4]?


Answer 8

For urls (as it seems to be a part of the topic by the given example), one can do something like this:

import os
url = 'http://www.stackoverflow.com'
name,ext = os.path.splitext(url)
print (name, ext)

#Or:
ext = '.'+url.split('.')[-1]
name = url[:-len(ext)]
print (name, ext)

Both will output: ('http://www.stackoverflow', '.com')

This can also be combined with str.endswith(suffix) if you need to just split “.com”, or anything specific.


Answer 9

url.rsplit('.com', 1)

is not quite right.

What you actually would need to write is

url.rsplit('.com', 1)[0]

, and it looks pretty succinct IMHO.

However, my personal preference is this option because it uses only one parameter:

url.rpartition('.com')[0]
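A hedged note on the two variants: they behave differently when the separator is missing. rsplit returns the whole string as its first element, while rpartition returns an empty head. The sketch below shows this and one possible guard; before_last is a name invented here.

```python
def before_last(url, sep):
    head, found, _tail = url.rpartition(sep)
    # `found` is empty when sep is absent; head is then '' and must not be returned.
    return head if found else url

with_suffix = 'abcdc.com'.rpartition('.com')[0]
missing_rpartition = 'abcdc.org'.rpartition('.com')[0]
missing_rsplit = 'abcdc.org'.rsplit('.com', 1)[0]
print(repr(with_suffix), repr(missing_rpartition), repr(missing_rsplit))
```

Both calls also cut at the last occurrence of '.com' even when it is not at the very end of the string, so an endswith check is still the safer route for a true suffix.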

Answer 10

Starting in Python 3.9, you can use removesuffix instead:

'abcdc.com'.removesuffix('.com')
# 'abcdc'

Answer 11

If you need to strip a suffix from the end of a string when it is present and otherwise do nothing, these are my best solutions. You will probably want to use one of the first two implementations; I have included the third for completeness.

For a constant suffix:

def remove_suffix(v, s):
    # Guard against an empty suffix: v[:-0] would return an empty string.
    return v[:-len(s)] if s and v.endswith(s) else v
remove_suffix("abc.com", ".com") == 'abc'
remove_suffix("abc", ".com") == 'abc'

For a regex:

import re

def remove_suffix_compile(suffix_pattern):
    r = re.compile(f"(.*?)({suffix_pattern})?$")
    return lambda v: r.match(v)[1]
remove_domain = remove_suffix_compile(r"\.[a-zA-Z0-9]{3,}")
remove_domain("abc.com") == "abc"
remove_domain("sub.abc.net") == "sub.abc"
remove_domain("abc.") == "abc."
remove_domain("abc") == "abc"

For a collection of constant suffixes the asymptotically fastest way for a large number of calls:

def remove_suffix_preprocess(*suffixes):
    suffixes = set(suffixes)
    try:
        suffixes.remove('')
    except KeyError:
        pass

    def helper(suffixes, pos):
        if len(suffixes) == 1:
            suf = suffixes[0]
            l = -len(suf)
            ls = slice(0, l)
            return lambda v: v[ls] if v.endswith(suf) else v
        si = iter(suffixes)
        ml = len(next(si))
        exact = False
        for suf in si:
            l = len(suf)
            if -l == pos:
                exact = True
            else:
                ml = min(len(suf), ml)
        ml = -ml
        suffix_dict = {}
        for suf in suffixes:
            sub = suf[ml:pos]
            if sub in suffix_dict:
                suffix_dict[sub].append(suf)
            else:
                suffix_dict[sub] = [suf]
        if exact:
            del suffix_dict['']
            for key in suffix_dict:
                suffix_dict[key] = helper([s[:pos] for s in suffix_dict[key]], None)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v[:pos])
        else:
            for key in suffix_dict:
                suffix_dict[key] = helper(suffix_dict[key], ml)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v)
    return helper(tuple(suffixes), None)
domain_remove = remove_suffix_preprocess(".com", ".net", ".edu", ".uk", '.tv', '.co.uk', '.org.uk')

The final one is probably significantly faster in PyPy than in CPython. The regex variant is likely faster than this in virtually all cases that do not involve huge dictionaries of potential suffixes that cannot easily be represented as a regex, at least in CPython.

In PyPy the regex variant is almost certainly slower for a large number of calls or long strings, even if the re module uses a DFA-compiling regex engine, as the vast majority of the overhead of the lambdas will be optimized out by the JIT.

In CPython, however, the fact that you're running C code for the regex comparison almost certainly outweighs the algorithmic advantages of the suffix collection version in almost all cases.


Answer 12

If you mean to only strip the extension:

'.'.join('abcdc.com'.split('.')[:-1])
# 'abcdc'

It works with any extension, with potential other dots existing in filename as well. It simply splits the string as a list on dots and joins it without the last element.


Answer 13

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    return re.sub(suffix + '$', '', url)

I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:

def rm_dotcom(url='abcdc.com'):
    return url[:-4] if url.endswith('.com') else url

However, if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using a regular expression is code reusability. What if you next want to remove '.me', which only has three characters?

The same code would do the trick:

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'

Answer 14

In my case I needed to raise an exception so I did:

class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]

Answer 15

Here is the simplest code I have.

url=url.split(".")[0]

Answer 16

Assuming you want to remove the domain, no matter what it is (.com, .net, etc.), I recommend finding the last . and removing everything from that point on.

url = 'abcdc.com'
dot_index = url.rfind('.')
url = url[:dot_index]

Here I’m using rfind to solve the problem of urls like abcdc.com.net which should be reduced to the name abcdc.com.

If you’re also concerned about www.s, you should explicitly check for them:

if url.startswith("www."):
   url = url.replace("www.","", 1)

The 1 in replace is for strange edge cases like www.net.www.com.

If your URL gets any wilder than that, look at the regex answers people have responded with.


Answer 17

I used the built-in rstrip function to do it, as follows:

string = "test.com"
suffix = ".com"
newstring = string.rstrip(suffix)
print(newstring)
test
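A word of caution, offered as a sketch rather than a correction of the author's setup: rstrip treats its argument as a set of characters, so it happens to work for "test.com" but not for the string in the question. The variable names below are illustrative only.

```python
string = "abcdc.com"
suffix = ".com"

char_stripped = string.rstrip(suffix)  # removes trailing '.', 'c', 'o', 'm' characters
suffix_removed = string[:-len(suffix)] if string.endswith(suffix) else string
print(char_stripped)   # abcd
print(suffix_removed)  # abcdc
```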

Answer 18

You can use split:

'abccomputer.com'.split('.com',1)[0]
# 'abccomputer'

Answer 19

This is a perfect use for regular expressions:

>>> import re
>>> re.match(r"(.*)\.com", "hello.com").group(1)
'hello'
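As written, the pattern is not anchored to the end of the string, and re.match returns None (so .group raises AttributeError) when ".com" is absent. A possible anchored variant, assuming you only want to drop a trailing ".com", might look like this; strip_dot_com is a hypothetical name.

```python
import re

def strip_dot_com(text):
    # \Z anchors at the very end of the string, so ".com" elsewhere is left alone.
    return re.sub(r"\.com\Z", "", text)

print(strip_dot_com("hello.com"))
print(strip_dot_com("hello.com.au"))
print(strip_dot_com("hello.org"))
```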

Answer 20

Python >= 3.9:

'abcdc.com'.removesuffix('.com')

Python < 3.9:

def remove_suffix(text, suffix):
    # An empty suffix must be skipped: text[:-0] would return ''.
    if suffix and text.endswith(suffix):
        text = text[:-len(suffix)]
    return text

remove_suffix('abcdc.com', '.com')

Split a string every nth character?

Question: Split a string every nth character?

Is it possible to split a string every nth character?

For example, suppose I have a string containing the following:

'1234567890'

How can I get it to look like this:

['12','34','56','78','90']

Answer 0

>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']

Answer 1

Just to be complete, you can do this with a regex:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For odd number of chars you can do this:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, to simplify the regex for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

And you can use re.finditer if the string is long to generate chunk by chunk.
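A minimal sketch of the finditer variant, assuming you want a lazy generator of chunks. The name iter_chunks is invented here, and re.DOTALL is added so that newlines are chunked too, since '.' skips them by default.

```python
import re

def iter_chunks(text, n):
    # Build ".{1,n}" once; DOTALL lets '.' match newline characters as well.
    pattern = re.compile(".{1,%d}" % n, re.DOTALL)
    for m in pattern.finditer(text):
        yield m.group(0)

print(list(iter_chunks('123456789', 2)))
```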


Answer 2

There is already an inbuilt function in python for this.

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
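As the docstring hints, wrap is designed for paragraphs of text: it breaks lines at whitespace and discards that whitespace, so it only matches plain fixed-width slicing when the input has no spaces. A small comparison, with the sample string chosen purely for illustration:

```python
from textwrap import wrap

s = "ab cd ef"
n = 4

wrapped = wrap(s, n)                                # breaks at whitespace and drops it
sliced = [s[i:i + n] for i in range(0, len(s), n)]  # plain fixed-width slices
print(wrapped)
print(sliced)
```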

Answer 3

Another common way of grouping elements into n-length groups:

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))  # map is lazy on Python 3, so wrap it in list()
['12', '34', '56', '78', '90']

This method comes straight from the docs for zip().
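One caveat worth sketching: zip stops at the shortest input, so with this idiom a trailing chunk shorter than n is silently dropped. The variable names below are illustrative.

```python
s = '123456789'  # nine characters, so one is left over

pairs = list(map(''.join, zip(*[iter(s)] * 2)))
leftover = s[len(pairs) * 2:]  # whatever zip did not consume into a full pair
print(pairs, repr(leftover))
```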


Answer 4

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]

print(list(split_by_n('1234567890', 2)))

Answer 5

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]

Answer 6

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']

Answer 7

You could use the grouper() recipe from itertools:

Python 2.x:

from itertools import izip_longest    

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

These functions are memory-efficient and work with any iterables.
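For the string case in the question, grouper yields tuples of characters, so they still need to be joined. A possible usage sketch for Python 3, assuming an empty-string fill value so that the last chunk is simply shorter:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# fillvalue='' pads the final tuple with empty strings, which vanish on join.
chunks = [''.join(group) for group in grouper('1234567890', 3, fillvalue='')]
print(chunks)
```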


Answer 8

Try the following code:

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

s = '1234567890'
print(list(split_every(2, s)))  # each piece is a list of characters, e.g. ['1', '2']

Answer 9

>>> from functools import reduce
>>> from operator import add
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]  # built-in zip replaces itertools.izip on Python 3
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]  # the trailing '0' is dropped
['123', '456', '789']

Answer 10

Try this:

s='1234567890'
print([s[idx:idx+2] for idx,val in enumerate(s) if idx%2 == 0])

Output:

['12', '34', '56', '78', '90']

Answer 11

As always, for those who love one-liners:

n = 2  
line = "this is a line split into n characters"  
line = [line[i * n:i * n+n] for i,blah in enumerate(line[::n])]

Answer 12

A simple recursive solution for short strings:

def split(s, n):
    if len(s) < n:
        return []
    else:
        return [s[:n]] + split(s[n:], n)

print(split('1234567890', 2))

Or in such a form:

def split(s, n):
    if len(s) < n:
        return []
    elif len(s) == n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)

, which illustrates the typical divide and conquer pattern in recursive approach more explicitly (though practically it is not necessary to do it this way)
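Note that both versions return [] as soon as fewer than n characters remain, so a short final chunk is discarded. If you would rather keep it, one possible variant (the name split_keep_tail is made up here) is:

```python
def split_keep_tail(s, n):
    # Hypothetical variant: stop only on an empty string, so a short tail is kept.
    if not s:
        return []
    return [s[:n]] + split_keep_tail(s[n:], n)

print(split_keep_tail('123456789', 2))
```

Like the original, this recurses once per chunk, so it is only suitable for short strings.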


Answer 13

I was stuck in the same scenario.

This worked for me:

x = "1234567890"
n = 2
chunks = []  # avoid naming this `list`, which would shadow the built-in
for i in range(0, len(x), n):
    chunks.append(x[i:i+n])
print(chunks)

Output

['12', '34', '56', '78', '90']

Answer 14

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:

import more_itertools as mit

s = "1234567890"

# Recent more_itertools releases take grouper(iterable, n); older ones used grouper(n, iterable).
["".join(c) for c in mit.grouper(s, 2)]

["".join(c) for c in mit.chunked(s, 2)]

["".join(c) for c in mit.windowed(s, 2, step=2)]

["".join(c) for c in  mit.split_after(s, lambda x: int(x) % 2 == 0)]

Each of these options produces the following output:

['12', '34', '56', '78', '90']

Documentation for discussed options: grouper, chunked, windowed, split_after


Answer 15

This can be achieved by a simple for loop.

a = '1234567890a'
result = []

for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output looks like ['12', '34', '56', '78', '90', 'a']


Check if multiple strings exist in another string

Question: Check if multiple strings exist in another string

How can I check if any of the strings in an array exists in another string?

Like:

a = ['a', 'b', 'c']
str = "a123"
if a in str:
  print "some of the strings found in str"
else:
  print "no strings found in str"

That code doesn’t work, it’s just to show what I want to achieve.


Answer 0

You can use any:

a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

if any(x in a_string for x in matches):
    print("at least one match found")

Similarly, to check whether all the strings from the list are found, use all instead of any.


Answer 1

any() is by far the best approach if all you want is True or False, but if you want to know specifically which string/strings match, you can use a couple things.

If you want the first match (with False as a default):

match = next((x for x in a if x in str), False)

If you want to get all matches (including duplicates):

matches = [x for x in a if x in str]

If you want to get all non-duplicate matches (disregarding order):

matches = {x for x in a if x in str}

If you want to get all non-duplicate matches in the right order:

matches = []
for x in a:
    if x in str and x not in matches:
        matches.append(x)

Answer 2

You should be careful if the strings in a or str get longer. The straightforward solutions take O(S*(A^2)), where S is the length of str and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).


Answer 3

Just to add some diversity with regex:

import re

if any(re.findall(r'a|b|c', str, re.IGNORECASE)):
    print('possible matches thanks to regex')
else:
    print('no matches')

or, if your list is too long: any(re.findall(r'|'.join(a), str, re.IGNORECASE))
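Joining the list with '|' treats every entry as a regular expression, so entries containing characters such as '.' or '+' can match more than intended. A hedged sketch that escapes each entry first; contains_any is a name introduced only for this example.

```python
import re

def contains_any(text, needles):
    if not needles:
        return False  # an empty pattern would otherwise match everything
    # re.escape keeps characters such as '.' or '+' literal inside the alternation.
    pattern = '|'.join(re.escape(n) for n in needles)
    return re.search(pattern, text, re.IGNORECASE) is not None

print(contains_any("a123", ['a', 'b', 'c']))
print(contains_any("abc", ['a.c']))
```

re.search stops at the first hit, which is enough for a yes/no answer and avoids building the full list that re.findall returns.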


Answer 4

You need to iterate on the elements of a.

a = ['a', 'b', 'c']
str = "a123"
found_a_string = False
for item in a:
    if item in str:
        found_a_string = True
        break  # one match is enough

if found_a_string:
    print("found a match")
else:
    print("no match found")

Answer 5

jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.

Here is one way to use it in Python:

  1. Download aho_corasick.py from here

  2. Put it in the same directory as your main Python file and name it aho_corasick.py

  3. Try the algorithm with the following code:

    from aho_corasick import aho_corasick #(string, keywords)
    
    print(aho_corasick(string, ["keyword1", "keyword2"]))
    

Note that the search is case-sensitive


Answer 6

a = ['a', 'b', 'c']
str =  "a123"

a_match = [True for match in a if match in str]

if True in a_match:
  print("some of the strings found in str")
else:
  print("no strings found in str")

Answer 7

It depends on the context. If you want to check for a single literal (any single character such as a, e, w, etc.), the in operator is enough:

original_word = "hackerearcth"
if 'h' in original_word:
    print("YES")

If you want to check whether any of the characters in original_word appears in your input, use any:

if any(your_required in yourinput for your_required in original_word):
    print("yes")

If you want every character of original_word to appear in your input, use all:

original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

Answer 8

Just some more info on how to get all the list elements that are present in the string:

a = ['a', 'b', 'c']
str = "a123" 
list(filter(lambda x:  x in str, a))

Answer 9

A surprisingly fast approach is to use set:

a = ['a', 'b', 'c']
str = "a123"
if set(a) & set(str):
    print("some of the strings found in str")
else:
    print("no strings found in str")

This works if a does not contain any multiple-character values (in which case use any as listed above). If so, it’s simpler to specify a as a string: a = 'abc'.


Answer 10

flog = open('test.txt', 'r')
flogLines = flog.readlines()
strlist = ['SUCCESS', 'Done','SUCCESSFUL']
res = False
for line in flogLines:
     for fstr in strlist:
         if line.find(fstr) != -1:
            print('found') 
            res = True


if res:
    print('res true')
else: 
    print('res false')



Answer 11

I would use this kind of function for speed:

def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

Answer 12

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']


# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field));

# still fine, multiple if statements
if ('firstName' not in data or 
    'lastName' not in data or
    'age' not in data):
    print("Error, missing a req field");

# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if (len(missing_fields)>0):
    print("Error, missing fields {0}".format(", ".join(missing_fields)));

How to find all occurrences of a substring?

Question: How to find all occurrences of a substring?

Python has string.find() and string.rfind() to get the index of a substring in a string.

I’m wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print string.find('test') # 0
print string.rfind('test') # 15

#this is the goal
print string.find_all('test') # [0,5,10,15]

Answer 0

There is no simple built-in string function that does what you’re looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you’re only iterating through the results once.
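
The patterns above embed the search text directly in the regular expression, which only works while it contains no regex metacharacters. As a minimal sketch (the helper name find_all_regex and its overlapping flag are illustrative assumptions, not part of the re module), the two variants can be wrapped with re.escape like this:

```python
import re


def find_all_regex(text, sub, overlapping=False):
    """Return the start index of every occurrence of sub in text."""
    # re.escape makes characters such as '.' or '*' match literally.
    pattern = re.escape(sub)
    if overlapping:
        # A lookahead consumes no characters, so overlapping matches are kept.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]


print(find_all_regex("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_regex("ttt", "tt", overlapping=True))  # [0, 1]
print(find_all_regex("a.b.c", "."))                   # [1, 3]
```

Without re.escape, the last call would treat '.' as "any character" and report every index.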


Answer 1

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
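
One edge case worth guarding against: with an empty substring, len(sub) is 0, so the start position never advances and the generator never finishes. A hedged variant of the same idea (the name iter_find_all and the overlapping parameter are assumptions made for this sketch) could look like this:

```python
def iter_find_all(text, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in text."""
    if not sub:
        # With an empty needle the non-overlapping step would be 0
        # and the loop would never advance.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = text.find(sub)
    while start != -1:
        yield start
        start = text.find(sub, start + step)


print(list(iter_find_all("spam spam spam spam", "spam")))     # [0, 5, 10, 15]
print(list(iter_find_all("ababa", "aba")))                    # [0]
print(list(iter_find_all("ababa", "aba", overlapping=True)))  # [0, 2]
```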


Answer 2

Here’s a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Answer 3

Again, old thread, but here’s my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer 4

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

However, it does not report overlapping matches. For example, 'aba' occurs in 'ababa' at indexes 0 and 2, but only the first occurrence is found:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
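
If overlapping spans are needed, one possible workaround (a sketch, not part of the original answer) is to put the pattern inside a lookahead with a capturing group, so the regex engine advances one character at a time while group 1 still records the full match:

```python
import re

text = "ababa"
# The lookahead itself consumes nothing; group 1 captures the actual substring.
spans = [(m.start(1), m.end(1)) for m in re.finditer("(?=(aba))", text)]
print(spans)  # [(0, 3), (2, 5)]
```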

Answer 5

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.


Answer 6

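If you only need to count the occurrences of a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also, to count the occurrences of a longer substring: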

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.


Answer 7

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.


Answer 8

This does the trick for me, using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the substring 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

Answer 9

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # str.index raises ValueError when there are no more matches
        marker = numberString.index(testString, marker)
    except ValueError:
        break
    print(marker)
    marker += 1

Answer 10

You can try :

>>> string = "test test test test"
>>> for index,value in enumerate(string):
    if string[index:index+(len("test"))] == "test":
        print(index)

0
5
10
15

Answer 11

The solutions provided by others are all based on find() or other built-in methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and add this function as a method:

class newstr(str):
    def find_all(self, substring):
        """
        Function: Returning all the indexes of substring in this string
        Arguments: The search string
        Return: Returning a list
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method:

newstr('Do you find this answer helpful? then upvote this!').find_all('this')


Answer 12

This function does not look at every position inside the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

Answer 13

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on a large list of search words.


Answer 14

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

Answer 15

This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    # start index of every (possibly overlapping) match; re.escape keeps
    # special characters in b from being treated as regex syntax
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))

Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

Answer 16

By slicing, we build every possible substring, append them all to a list, and then use the list’s count() method to find how many times the target occurs:

s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))

Answer 17

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

Answer 18

A compact way to do this for a single-character search would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the single character to search for
# c represents the string to search in

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]

Answer 19

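You can simply use the following, which returns the number of non-overlapping occurrences rather than their indexes:

string.count('test')

https://www.programiz.com/python-programming/methods/string/count

Cheers!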


What is the preferred way to concatenate a string in Python?

Question: What is the preferred way to concatenate a string in Python?

Since Python’s string can’t be changed, I was wondering how to concatenate a string more efficiently?

I can write it like this:

s += stringfromelsewhere

or like this:

s = []
s.append(somestring)

later

s = ''.join(s)

While writing this question, I found a good article talking about the topic.

http://www.skymind.com/~ocrow/python_string/

But it’s about Python 2.x, so the question is: did something change in Python 3?


Answer 0

The best way of appending a string to a string variable is to use + or +=. This is because it’s readable and fast. They are also equally fast; which one you choose is a matter of taste, although the latter is the most common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875

However, those who recommend having lists and appending to them and then joining those lists, do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true, in some cases. Here, for example, is one million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, turns out that even when the resulting string is a million characters long, appending was still faster.

Now let’s try with appending a thousand character long string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string, therefore, ends up being about 100MB long. That was pretty slow; appending to a list was much faster. But that timing doesn’t include the final a.join(). So how long would that take?

a.join(a):
0.43739795684814453

Oops. Turns out even in this case, append/join is slower.

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
a.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren’t; why would you have a string that’s 100MB in memory?).

But the real clincher is Python 2.3. Where I won’t even show you the timings, because it’s so slow that it hasn’t finished yet. These tests suddenly take minutes. Except for the append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn’t anymore (or at least Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. :-)

(Update: Turns out when I did the testing more carefully that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don’t use a technique that’s supposedly “faster” unless you first measure it.

Therefore the “best” version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it’s actually clearer. Especially when whatever you should concatenate together should be separated by spaces or commas or newlines.
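
Since the advice above is to measure before choosing, here is a minimal sketch of how such a comparison could be set up with timeit. The helper names concat_plus and concat_join and the sample sizes are assumptions chosen for illustration; they are not the code behind the timings quoted in this answer, and the numbers will vary with the interpreter and machine.

```python
import timeit


def concat_plus(pieces):
    """Build the result by repeated in-place concatenation."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def concat_join(pieces):
    """Build the result by appending to a list and joining once."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


pieces = ["x" * 10] * 10_000  # 10,000 short strings

for func in (concat_plus, concat_join):
    seconds = timeit.timeit(lambda: func(pieces), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Both helpers produce the same string, so any difference in the printed timings comes from the concatenation strategy alone.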


Answer 1

If you are concatenating a lot of values, then neither. Appending to a list is expensive. You can use StringIO for that, especially if you are building the string up over a lot of operations.

from cStringIO import StringIO
# python3:  from io import StringIO

buf = StringIO()

buf.write('foo')
buf.write('foo')
buf.write('foo')

buf.getvalue()
# 'foofoofoo'

If you already have a complete list returned to you from some other operation, then just use ''.join(aList).

From the python FAQ: What is the most efficient way to concatenate many strings together?

str and bytes objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.

To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:

chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)

(another reasonably efficient idiom is to use io.StringIO)

To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):

result = bytearray()
for b in my_bytes_objects:
    result += b
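
In Python 3 the bytearray idiom quoted above applies to bytes-like objects only; adding a str to a bytearray raises TypeError. A minimal sketch with made-up sample data (the names chunks and data are illustrative):

```python
chunks = [b"foo", b"bar", b"baz"]  # hypothetical input data

result = bytearray()
for chunk in chunks:
    result += chunk  # extends the bytearray in place

data = bytes(result)  # immutable copy, if one is needed
print(data)  # b'foobarbaz'
```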

Edit: I was silly and had the results pasted backwards, making it look like appending to a list was faster than cStringIO. I have also added tests for bytearray/str concat, as well as a second round of tests using a larger list with larger strings. (python 2.7.3)

ipython test example for large lists of strings

try:
    from cStringIO import StringIO
except ImportError:
    from io import StringIO

source = ['foo']*1000

%%timeit buf = StringIO()
for i in source:
    buf.write(i)
final = buf.getvalue()
# 1000 loops, best of 3: 1.27 ms per loop

%%timeit out = []
for i in source:
    out.append(i)
final = ''.join(out)
# 1000 loops, best of 3: 9.89 ms per loop

%%timeit out = bytearray()
for i in source:
    out += i
# 10000 loops, best of 3: 98.5 µs per loop

%%timeit out = ""
for i in source:
    out += i
# 10000 loops, best of 3: 161 µs per loop

## Repeat the tests with a larger list, containing
## strings that are bigger than the small string caching 
## done by the Python
source = ['foo']*1000

# cStringIO
# 10 loops, best of 3: 19.2 ms per loop

# list append and join
# 100 loops, best of 3: 144 ms per loop

# bytearray() +=
# 100 loops, best of 3: 3.8 ms per loop

# str() +=
# 100 loops, best of 3: 5.11 ms per loop

Answer 2

In Python >= 3.6, the new f-string is an efficient way to concatenate a string.

>>> name = 'some_name'
>>> number = 123
>>>
>>> f'Name is {name} and the number is {number}.'
'Name is some_name and the number is 123.'

Answer 3

The recommended method is still to use append and join.


Answer 4

If the strings you are concatenating are literals, use string literal concatenation:

re.compile(
        "[A-Za-z_]"       # letter or underscore
        "[A-Za-z0-9_]*"   # letter, digit or underscore
    )

This is useful if you want to comment on part of a string (as above) or if you want to use raw strings or triple quotes for part of a literal but not all.

Since this happens at the syntax layer it uses zero concatenation operators.
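
As a small illustration of mixing literal kinds (the variable names and the trailing \Z anchor are additions made for this sketch), adjacent literals are merged into one string before the code runs, whether they are raw or ordinary:

```python
import re

# Adjacent literals are merged by the compiler, so no + operator is executed.
greeting = "Hello, " "world"
print(greeting)  # Hello, world

# A raw literal and an ordinary literal can be mixed in one expression.
identifier = re.compile(
    r"[A-Za-z_]"      # raw string: first character
    "[A-Za-z0-9_]*"   # ordinary string: remaining characters
    r"\Z"             # raw string: end of input
)
print(bool(identifier.match("foo_1")))  # True
print(bool(identifier.match("1foo")))   # False
```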


Answer 5

You can write this function:

def str_join(*args):
    return ''.join(map(str, args))

Then you can simply call it wherever you want:

str_join('Pine')  # Returns : Pine
str_join('Pine', 'apple')  # Returns : Pineapple
str_join('Pine', 'apple', 3)  # Returns : Pineapple3

Answer 6

While somewhat dated, Code Like a Pythonista: Idiomatic Python recommends join() over + in this section. As does PythonSpeedPerformanceTips in its section on string concatenation, with the following disclaimer:

The accuracy of this section is disputed with respect to later versions of Python. In CPython 2.5, string concatenation is fairly fast, although this may not apply likewise to other Python implementations. See ConcatenationTestCode for a discussion.


Answer 7

Using in place string concatenation by ‘+’ is THE WORST method of concatenation in terms of stability and cross implementation as it does not support all values. PEP8 standard discourages this and encourages the use of format(), join() and append() for long term use.

As quoted from the linked “Programming Recommendations” section:

For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.


Answer 8

As @jdi mentions, the Python documentation suggests using str.join or io.StringIO for string concatenation, and says that a developer should expect quadratic time from += in a loop, even though there’s been an optimisation since Python 2.4. As this answer says:

If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it’s an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

I will show an example of real-world code that naively relied on this += optimisation, even though it didn’t apply. The code below converts an iterable of short strings into bigger chunks to be used in a bulk API.

def test_concat_chunk(seq, split_by):
    result = ['']
    for item in seq:
        if len(result[-1]) + len(item) > split_by: 
            result.append('')
        result[-1] += item
    return result

This code can literally run for hours because of quadratic time complexity. Below are alternatives with suggested data structures:

import io

def test_stringio_chunk(seq, split_by):
    def chunk():
        buf = io.StringIO()
        size = 0
        for item in seq:
            if size + len(item) <= split_by:
                size += buf.write(item)
            else:
                yield buf.getvalue()
                buf = io.StringIO()
                size = buf.write(item)
        if size:
            yield buf.getvalue()

    return list(chunk())

def test_join_chunk(seq, split_by):
    def chunk():
        buf = []
        size = 0
        for item in seq:
            if size + len(item) <= split_by:
                buf.append(item)
                size += len(item)
            else:
                yield ''.join(buf)                
                buf.clear()
                buf.append(item)
                size = len(item)
        if size:
            yield ''.join(buf)

    return list(chunk())
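To make the chunking behaviour concrete, here is a compact, self-contained variant of the join-based approach run on a tiny input. The name chunk_by_size is hypothetical and the logic is only a sketch of the same idea, not the author's exact code:

```python
def chunk_by_size(seq, split_by):
    """Group short strings into chunks, starting a new chunk
    when adding the next item would exceed split_by characters."""
    buf, size = [], 0
    for item in seq:
        if size + len(item) <= split_by:
            buf.append(item)
            size += len(item)
        else:
            # Flush the current chunk and start a new one with this item.
            yield ''.join(buf)
            buf = [item]
            size = len(item)
    if size:
        yield ''.join(buf)

print(list(chunk_by_size(['aa', 'bb', 'cc', 'd'], 4)))  # ['aabb', 'ccd']
```

Each piece is appended to a list in amortised constant time, and every chunk is joined exactly once when it is flushed.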

And a micro-benchmark:

import timeit
import random
import string
import matplotlib.pyplot as plt

line = ''.join(random.choices(
    string.ascii_uppercase + string.digits, k=512)) + '\n'
x = []
y_concat = []
y_stringio = []
y_join = []
n = 5
for i in range(1, 11):
    x.append(i)
    seq = [line] * (20 * 2 ** 20 // len(line))
    chunk_size = i * 2 ** 20
    y_concat.append(
        timeit.timeit(lambda: test_concat_chunk(seq, chunk_size), number=n) / n)
    y_stringio.append(
        timeit.timeit(lambda: test_stringio_chunk(seq, chunk_size), number=n) / n)
    y_join.append(
        timeit.timeit(lambda: test_join_chunk(seq, chunk_size), number=n) / n)
plt.plot(x, y_concat)
plt.plot(x, y_stringio)
plt.plot(x, y_join)
plt.legend(['concat', 'stringio', 'join'], loc='upper left')
plt.show()

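[Figure: micro-benchmark plot of average run time against chunk size (1 to 10 MiB) for the concat, stringio, and join variants]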


Answer 9

You can do this in different ways.

str1 = "Hello"
str2 = "World"
str_list = ['Hello', 'World']
str_dict = {'str1': 'Hello', 'str2': 'World'}

# Concatenating With the + Operator
print(str1 + ' ' + str2)  # Hello World

# String Formatting with the % Operator
print("%s %s" % (str1, str2))  # Hello World

# String formatting with {} placeholders and str.format()
print("{} {}".format(str1, str2))  # Hello World
print("{0} {1}".format(str1, str2))  # Hello World
print("{str1} {str2}".format(str1=str_dict['str1'], str2=str_dict['str2']))  # Hello World
print("{str1} {str2}".format(**str_dict))  # Hello World

# Going From a List to a String in Python With .join()
print(' '.join(str_list))  # Hello World

# Python f-strings (3.6 onwards)
print(f"{str1} {str2}")  # Hello World

I created this little summary from the following articles.


Answer 10

My use case was slightly different. I had to construct a query in which more than 20 fields were dynamic. I followed this approach, using the format method:

query = "insert into {0}({1},{2},{3}) values({4}, {5}, {6})"
# str.format() returns a new string, so keep the result
query = query.format('users', 'name', 'age', 'dna', 'suzan', 1010, 'nda')

This was comparatively simpler for me than using + or other approaches.
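When the number of fields really is dynamic, the column list can itself be built with join. The sketch below is one possible variation, not the answer's own code: build_insert is a hypothetical helper, the in-memory sqlite3 database is only there to make the example runnable, and the values are passed through the driver's ? placeholders instead of being formatted into the string.

```python
import sqlite3

def build_insert(table, columns):
    # Identifiers cannot be bound as parameters, so they are formatted in;
    # this sketch assumes they come from trusted code, not user input.
    column_list = ', '.join(columns)
    placeholders = ', '.join('?' for _ in columns)
    return 'insert into {0}({1}) values({2})'.format(
        table, column_list, placeholders)

conn = sqlite3.connect(':memory:')
conn.execute('create table users(name, age, dna)')

query = build_insert('users', ['name', 'age', 'dna'])
conn.execute(query, ('suzan', 1010, 'nda'))

print(query)
print(conn.execute('select * from users').fetchall())
```

The same helper works unchanged for 3 or 20 columns, because both the column list and the placeholder list are derived from the columns argument.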


Answer 11

You can also use this (it is more efficient). (/software/304445/why-is-s-better-than-for-concatenation)

s += "%s" %(stringfromelsewhere)