标签归档:string

如何在Python的同一行上打印变量和字符串?

问题:如何在Python的同一行上打印变量和字符串?

我正在使用python算出如果一个孩子每7秒出生一次,那么5年内将有多少个孩子出生。问题出在我的最后一行。当我在文本的任何一侧打印文本时,如何使它工作?

这是我的代码:

currentPop = 312032486
oneYear = 365
hours = 24
minutes = 60
seconds = 60

# seconds in a single day
secondsInDay = hours * minutes * seconds

# seconds in a year
secondsInYear = secondsInDay * oneYear

fiveYears = secondsInYear * 5

#Seconds in 5 years
print fiveYears

# fiveYears in seconds, divided by 7 seconds
births = fiveYears // 7

print "If there was a birth every 7 seconds, there would be: " births "births"

I am using python to work out how many children would be born in 5 years if a child was born every 7 seconds. The problem is on my last line. How do I get a variable to work when I’m printing text either side of it?

Here is my code:

currentPop = 312032486
oneYear = 365
hours = 24
minutes = 60
seconds = 60

# seconds in a single day
secondsInDay = hours * minutes * seconds

# seconds in a year
secondsInYear = secondsInDay * oneYear

fiveYears = secondsInYear * 5

#Seconds in 5 years
print fiveYears

# fiveYears in seconds, divided by 7 seconds
births = fiveYears // 7

print "If there was a birth every 7 seconds, there would be: " births "births"

回答 0

使用,分隔字符串和变量,同时打印:

print "If there was a birth every 7 seconds, there would be: ",births,"births"

, 在print语句中将项目分隔一个空格:

>>> print "foo","bar","spam"
foo bar spam

或更好地使用字符串格式

print "If there was a birth every 7 seconds, there would be: {} births".format(births)

字符串格式化功能强大得多,它还允许您执行其他一些操作,例如:填充,填充,对齐,宽度,设置精度等

>>> print "{:d} {:03d} {:>20f}".format(1,2,1.1)
1 002             1.100000
  ^^^
  0's padded to 2

演示:

>>> births = 4
>>> print "If there was a birth every 7 seconds, there would be: ",births,"births"
If there was a birth every 7 seconds, there would be:  4 births

#formatting
>>> print "If there was a birth every 7 seconds, there would be: {} births".format(births)
If there was a birth every 7 seconds, there would be: 4 births

Use , to separate strings and variables while printing:

print("If there was a birth every 7 seconds, there would be: ", births, "births")

, in print function separates the items by a single space:

>>> print("foo", "bar", "spam")
foo bar spam

or better use string formatting:

print("If there was a birth every 7 seconds, there would be: {} births".format(births))

String formatting is much more powerful and allows you to do some other things as well, like padding, fill, alignment, width, set precision, etc.

>>> print("{:d} {:03d} {:>20f}".format(1, 2, 1.1))
1 002             1.100000
  ^^^
  0's padded to 2

Demo:

>>> births = 4
>>> print("If there was a birth every 7 seconds, there would be: ", births, "births")
If there was a birth every 7 seconds, there would be:  4 births

# formatting
>>> print("If there was a birth every 7 seconds, there would be: {} births".format(births))
If there was a birth every 7 seconds, there would be: 4 births

回答 1

还有两个

第一个

 >>>births = str(5)
 >>>print "there are " + births + " births."
 there are 5 births.

添加字符串时,它们会串联在一起。

第二个

同样format,字符串的(Python 2.6和更高版本)方法可能是标准方法:

>>> births = str(5)
>>>
>>> print "there are {} births.".format(births)
there are 5 births.

format方法也可以与列表一起使用

>>> format_list = ['five','three']
>>> print "there are {} births and {} deaths".format(*format_list) #unpack the list
there are five births and three deaths

或字典

>>> format_dictionary = {'births': 'five', 'deaths': 'three'}
>>> print "there are {births} births, and {deaths} deaths".format(**format_dictionary) #yup, unpack the dictionary
there are five births, and three deaths

Two more

The First one

>>> births = str(5)
>>> print("there are " + births + " births.")
there are 5 births.

When adding strings, they concatenate.

The Second One

Also the format (Python 2.6 and newer) method of strings is probably the standard way:

>>> births = str(5)
>>>
>>> print("there are {} births.".format(births))
there are 5 births.

This format method can be used with lists as well

>>> format_list = ['five', 'three']
>>> # * unpacks the list:
>>> print("there are {} births and {} deaths".format(*format_list))  
there are five births and three deaths

or dictionaries

>>> format_dictionary = {'births': 'five', 'deaths': 'three'}
>>> # ** unpacks the dictionary
>>> print("there are {births} births, and {deaths} deaths".format(**format_dictionary))
there are five births, and three deaths

回答 2

Python是一种非常通用的语言。您可以通过不同的方法打印变量。我列出了以下4种方法。您可以根据需要使用它们。

例:

a=1
b='ball'

方法1:

print('I have %d %s' %(a,b))

方法2:

print('I have',a,b)

方法3:

print('I have {} {}'.format(a,b))

方法4:

print('I have ' + str(a) +' ' +b)

方法5:

  print( f'I have {a} {b}')

输出为:

I have 1 ball

Python is a very versatile language. You may print variables by different methods. I have listed below five methods. You may use them according to your convenience.

Example:

a = 1
b = 'ball'

Method 1:

print('I have %d %s' % (a, b))

Method 2:

print('I have', a, b)

Method 3:

print('I have {} {}'.format(a, b))

Method 4:

print('I have ' + str(a) + ' ' + b)

Method 5:

print(f'I have {a} {b}')

The output would be:

I have 1 ball

回答 3

如果要使用python 3,它非常简单:

print("If there was a birth every 7 second, there would be %d births." % (births))

If you want to work with python 3, it’s very simple:

print("If there was a birth every 7 second, there would be %d births." % (births))

回答 4

从python 3.6开始,您可以使用文字字符串插值。

births = 5.25487
>>> print(f'If there was a birth every 7 seconds, there would be: {births:.2f} births')
If there was a birth every 7 seconds, there would be: 5.25 births

As of python 3.6 you can use Literal String Interpolation.

births = 5.25487
>>> print(f'If there was a birth every 7 seconds, there would be: {births:.2f} births')
If there was a birth every 7 seconds, there would be: 5.25 births

回答 5

您可以使用f-string.format()方法

使用f弦

print(f'If there was a birth every 7 seconds, there would be: {births} births')

使用.format()

print("If there was a birth every 7 seconds, there would be: {births} births".format(births=births))

You can either use the f-string or .format() methods

Using f-string

print(f'If there was a birth every 7 seconds, there would be: {births} births')

Using .format()

print("If there was a birth every 7 seconds, there would be: {births} births".format(births=births))

回答 6

您可以使用格式字符串:

print "There are %d births" % (births,)

或在这种简单情况下:

print "There are ", births, "births"

You can either use a formatstring:

print "There are %d births" % (births,)

or in this simple case:

print "There are ", births, "births"

回答 7

如果您使用的是python 3.6或最新版本,则f-string是最佳和简便的选择

print(f"{your_varaible_name}")

If you are using python 3.6 or latest, f-string is the best and easy one

print(f"{your_varaible_name}")

回答 8

您将首先创建一个变量:例如:D =1。然后执行此操作,但是将字符串替换为所需的任何内容:

D = 1
print("Here is a number!:",D)

You would first make a variable: for example: D = 1. Then Do This but replace the string with whatever you want:

D = 1
print("Here is a number!:",D)

回答 9

在当前的python版本上,您必须使用括号,如下所示:

print ("If there was a birth every 7 seconds", X)

On a current python version you have to use parenthesis, like so :

print ("If there was a birth every 7 seconds", X)

回答 10

使用字符串格式

print("If there was a birth every 7 seconds, there would be: {} births".format(births))
 # Will replace "{}" with births

如果您在进行玩具项目,请使用:

print('If there was a birth every 7 seconds, there would be:' births'births) 

要么

print('If there was a birth every 7 seconds, there would be: %d births' %(births))
# Will replace %d with births

use String formatting

print("If there was a birth every 7 seconds, there would be: {} births".format(births))
 # Will replace "{}" with births

if you doing a toy project use:

print('If there was a birth every 7 seconds, there would be:' births'births) 

or

print('If there was a birth every 7 seconds, there would be: %d births' %(births))
# Will replace %d with births

回答 11

您可以使用字符串格式来执行此操作:

print "If there was a birth every 7 seconds, there would be: %d births" % births

或者您可以提供print多个参数,它将自动用空格分隔它们:

print "If there was a birth every 7 seconds, there would be:", births, "births"

You can use string formatting to do this:

print "If there was a birth every 7 seconds, there would be: %d births" % births

or you can give print multiple arguments, and it will automatically separate them by a space:

print "If there was a birth every 7 seconds, there would be:", births, "births"

回答 12

我将您的脚本复制并粘贴到.py文件中。我使用Python 2.7.10原样运行它,并收到了相同的语法错误。我还在Python 3.5中尝试了该脚本,并收到以下输出:

File "print_strings_on_same_line.py", line 16
print fiveYears
              ^
SyntaxError: Missing parentheses in call to 'print'

然后,我修改了最后一行,其中打印了出生人数,如下所示:

currentPop = 312032486
oneYear = 365
hours = 24
minutes = 60
seconds = 60

# seconds in a single day
secondsInDay = hours * minutes * seconds

# seconds in a year
secondsInYear = secondsInDay * oneYear

fiveYears = secondsInYear * 5

#Seconds in 5 years
print fiveYears

# fiveYears in seconds, divided by 7 seconds
births = fiveYears // 7

print "If there was a birth every 7 seconds, there would be: " + str(births) + " births"

输出为(Python 2.7.10):

157680000
If there was a birth every 7 seconds, there would be: 22525714 births

我希望这有帮助。

I copied and pasted your script into a .py file. I ran it as-is with Python 2.7.10 and received the same syntax error. I also tried the script in Python 3.5 and received the following output:

File "print_strings_on_same_line.py", line 16
print fiveYears
              ^
SyntaxError: Missing parentheses in call to 'print'

Then, I modified the last line where it prints the number of births as follows:

currentPop = 312032486
oneYear = 365
hours = 24
minutes = 60
seconds = 60

# seconds in a single day
secondsInDay = hours * minutes * seconds

# seconds in a year
secondsInYear = secondsInDay * oneYear

fiveYears = secondsInYear * 5

#Seconds in 5 years
print fiveYears

# fiveYears in seconds, divided by 7 seconds
births = fiveYears // 7

print "If there was a birth every 7 seconds, there would be: " + str(births) + " births"

The output was (Python 2.7.10):

157680000
If there was a birth every 7 seconds, there would be: 22525714 births

I hope this helps.


回答 13

只需在之间使用,(逗号)。

请参阅以下代码以获得更好的理解:

# Weight converter pounds to kg

weight_lbs = input("Enter your weight in pounds: ")

weight_kg = 0.45 * int(weight_lbs)

print("You are ", weight_kg, " kg")

Just use , (comma) in between.

See this code for better understanding:

# Weight converter pounds to kg

weight_lbs = input("Enter your weight in pounds: ")

weight_kg = 0.45 * int(weight_lbs)

print("You are ", weight_kg, " kg")

回答 14

稍有不同:使用Python 3并在同一行中打印几个变量:

print("~~Create new DB:",argv[5],"; with user:",argv[3],"; and Password:",argv[4]," ~~")

Slightly different: Using Python 3 and print several variables in the same line:

print("~~Create new DB:",argv[5],"; with user:",argv[3],"; and Password:",argv[4]," ~~")

回答 15

PYTHON 3

最好使用格式选项

user_name=input("Enter your name : )

points = 10

print ("Hello, {} your point is {} : ".format(user_name,points)

或将输入声明为字符串并使用

user_name=str(input("Enter your name : ))

points = 10

print("Hello, "+user_name+" your point is " +str(points))

PYTHON 3

Better to use the format option

user_name=input("Enter your name : )

points = 10

print ("Hello, {} your point is {} : ".format(user_name,points)

or declare the input as string and use

user_name=str(input("Enter your name : ))

points = 10

print("Hello, "+user_name+" your point is " +str(points))

回答 16

如果在字符串和变量之间使用逗号,如下所示:

print "If there was a birth every 7 seconds, there would be: ", births, "births"

If you use a comma inbetween the strings and the variable, like this:

print "If there was a birth every 7 seconds, there would be: ", births, "births"

Python-检查Word是否在字符串中

问题:Python-检查Word是否在字符串中

我正在使用Python v2,并且正在尝试找出是否可以判断字符串中是否包含单词。

我发现了一些有关识别单词是否在字符串中的信息-使用.find,但是有一种方法可以执行IF语句。我想要以下内容:

if string.find(word):
    print 'success'

谢谢你的帮助。

I’m working with Python v2, and I’m trying to find out if you can tell if a word is in a string.

I have found some information about identifying if the word is in the string – using .find, but is there a way to do an IF statement. I would like to have something like the following:

if string.find(word):
    print 'success'

Thanks for any help.


回答 0

出什么问题了:

if word in mystring: 
   print 'success'

What is wrong with:

if word in mystring: 
   print 'success'

在Python中使用多个参数进行字符串格式化(例如’%s…%s’)

问题:在Python中使用多个参数进行字符串格式化(例如’%s…%s’)

我有一个看起来像的字符串,'%s in %s'并且我想知道如何分隔参数,以便它们是两个不同的%s。我来自Java的想法是这样的:

'%s in %s' % unicode(self.author),  unicode(self.publication)

但这不起作用,因此它在Python中的外观如何?

I have a string that looks like '%s in %s' and I want to know how to seperate the arguments so that they are two different %s. My mind coming from Java came up with this:

'%s in %s' % unicode(self.author),  unicode(self.publication)

But this doesn’t work so how does it look in Python?


回答 0

马克·西达德(Mark Cidade)的答案是正确的-您需要提供一个元组。

但是从Python 2.6起,您可以使用format代替%

'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

%不再鼓励使用for格式化字符串。

这种字符串格式设置方法是Python 3.0中的新标准,应优先于新代码中“字符串格式设置操作”中描述的%格式设置。

Mark Cidade’s answer is right – you need to supply a tuple.

However from Python 2.6 onwards you can use format instead of %:

'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))

Usage of % for formatting strings is no longer encouraged.

This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.


回答 1

如果使用多个参数,则必须将其放在一个元组中(请注意额外的括号):

'%s in %s' % (unicode(self.author),  unicode(self.publication))

正如EOL所指出的那样,该unicode()函数通常假定默认为ascii编码,因此,如果您使用非ASCII字符,则显式传递编码会更安全:

'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication('utf-8')))

从Python 3.0开始,最好改用以下str.format()语法:

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))

If you’re using more than one argument it has to be in a tuple (note the extra parentheses):

'%s in %s' % (unicode(self.author),  unicode(self.publication))

As EOL points out, the unicode() function usually assumes ascii encoding as a default, so if you have non-ASCII characters, it’s safer to explicitly pass the encoding:

'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication('utf-8')))

And as of Python 3.0, it’s preferred to use the str.format() syntax instead:

'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))

回答 2

在元组/映射对象上有多个参数 format

以下是文档摘录:

给定的format % values中的%转换规范format将替换为的零个或多个元素values。效果类似于使用sprintf() C语言中的用法。

如果format需要单个参数,则值可以是单个非元组对象。否则,值必须是一个具有由formatstring 指定的项目数的元组或者是一个映射对象(例如,字典)。

参考资料


开启str.format而不是%

%操作员的新替代方法是使用str.format。以下是文档摘录:

str.format(*args, **kwargs)

执行字符串格式化操作。调用此方法的字符串可以包含文字文本或用大括号分隔的替换字段{}。每个替换字段都包含位置参数的数字索引或关键字参数的名称。返回字符串的副本,其中每个替换字段都用相应参数的字符串值替换。

此方法是Python 3.0中的新标准,应优先于%formatting

参考资料


例子

以下是一些用法示例:

>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond

也可以看看

On a tuple/mapping object for multiple argument format

The following is excerpt from the documentation:

Given format % values, % conversion specifications in format are replaced with zero or more elements of values. The effect is similar to the using sprintf() in the C language.

If format requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary).

References


On str.format instead of %

A newer alternative to % operator is to use str.format. Here’s an excerpt from the documentation:

str.format(*args, **kwargs)

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

This method is the new standard in Python 3.0, and should be preferred to % formatting.

References


Examples

Here are some usage examples:

>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond

See also


回答 3

您必须将值放在括号中:

'%s in %s' % (unicode(self.author),  unicode(self.publication))

在这里,第一个%sunicode(self.author)将被放置。第二%sunicode(self.publication)将使用。

注意:你应该有利于string formatting%符号。更多信息在这里

You must just put the values into parentheses:

'%s in %s' % (unicode(self.author),  unicode(self.publication))

Here, for the first %s the unicode(self.author) will be placed. And for the second %s, the unicode(self.publication) will be used.

Note: You should favor string formatting over the % Notation. More info here


回答 4

到目前为止,发布的一些答案存在一个严重的问题:unicode()从默认编码(通常为ASCII)解码;实际上,unicode()试图通过将给定的字节转换为字符来“感知”。因此,以下代码(基本上是前面的答案所建议的)在我的计算机上失败:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

给出:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

失败的原因是author不只包含ASCII字节(即[0; 127]中的值),并且unicode()默认情况下(在许多计算机上)从ASCII解码。

一个可靠的解决方案是显式提供您的字段中使用的编码。以UTF-8为例:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(或不使用initial u,这取决于您要使用Unicode结果还是字节字符串)。

在这一点上,可能要考虑让authorand publication字段为Unicode字符串,而不是在格式化期间对其进行解码。

There is a significant problem with some of the answers posted so far: unicode() decodes from the default encoding, which is often ASCII; in fact, unicode() tries to make “sense” of the bytes it is given by converting them into characters. Thus, the following code, which is essentially what is recommended by previous answers, fails on my machine:

# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))

gives:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The failure comes from the fact that author does not contain only ASCII bytes (i.e. with values in [0; 127]), and unicode() decodes from ASCII by default (on many machines).

A robust solution is to explicitly give the encoding used in your fields; taking UTF-8 as an example:

u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))

(or without the initial u, depending on whether you want a Unicode result or a byte string).

At this point, one might want to consider having the author and publication fields be Unicode strings, instead of decoding them during formatting.


回答 5

对于python2,您也可以执行此操作

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

如果您有很多可替代的论点(特别是在进行国际化的情况下),这将很方便

Python2.6及更高版本支持 .format()

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)

For python2 you can also do this

'%(author)s in %(publication)s'%{'author':unicode(self.author),
                                  'publication':unicode(self.publication)}

which is handy if you have a lot of arguments to substitute (particularly if you are doing internationalisation)

Python2.6 onwards supports .format()

'{author} in {publication}'.format(author=self.author,
                                   publication=self.publication)

回答 6

您还可以通过以下方式干净,简单地使用它(但是错误!因为您应该format像Mark Byers所说的那样使用):

print 'This is my %s formatted with %d arguments' % ('string', 2)

You could also use it clean and simple (but wrong! because you should use format like Mark Byers said) by doing:

print 'This is my %s formatted with %d arguments' % ('string', 2)

回答 7

为了完整起见,在PEP-498中引入了Python 3.6 f-string 。这些字符串可以

使用最小语法将表达式嵌入字符串文字中。

这意味着对于您的示例,您还可以使用:

f'{self.author} in {self.publication}'

For completeness, in python 3.6 f-string are introduced in PEP-498. These strings make it possible to

embed expressions inside string literals, using a minimal syntax.

That would mean that for your example you could also use:

f'{self.author} in {self.publication}'

不区分大小写的替换

问题:不区分大小写的替换

在Python中执行不区分大小写的字符串替换的最简单方法是什么?

What’s the easiest way to do a case-insensitive string replacement in Python?


回答 0

string类型不支持此功能。您最好使用带有re.IGNORECASE选项的正则表达式子方法

>>> import re
>>> insensitive_hippo = re.compile(re.escape('hippo'), re.IGNORECASE)
>>> insensitive_hippo.sub('giraffe', 'I want a hIPpo for my birthday')
'I want a giraffe for my birthday'

The string type doesn’t support this. You’re probably best off using the regular expression sub method with the re.IGNORECASE option.

>>> import re
>>> insensitive_hippo = re.compile(re.escape('hippo'), re.IGNORECASE)
>>> insensitive_hippo.sub('giraffe', 'I want a hIPpo for my birthday')
'I want a giraffe for my birthday'

回答 1

import re
pattern = re.compile("hello", re.IGNORECASE)
pattern.sub("bye", "hello HeLLo HELLO")
# 'bye bye bye'
import re
pattern = re.compile("hello", re.IGNORECASE)
pattern.sub("bye", "hello HeLLo HELLO")
# 'bye bye bye'

回答 2

在一行中:

import re
re.sub("(?i)hello","bye", "hello HeLLo HELLO") #'bye bye bye'
re.sub("(?i)he\.llo","bye", "he.llo He.LLo HE.LLO") #'bye bye bye'

或者,使用可选的“标志”参数:

import re
re.sub("hello", "bye", "hello HeLLo HELLO", flags=re.I) #'bye bye bye'
re.sub("he\.llo", "bye", "he.llo He.LLo HE.LLO", flags=re.I) #'bye bye bye'

In a single line:

import re
re.sub("(?i)hello","bye", "hello HeLLo HELLO") #'bye bye bye'
re.sub("(?i)he\.llo","bye", "he.llo He.LLo HE.LLO") #'bye bye bye'

Or, use the optional “flags” argument:

import re
re.sub("hello", "bye", "hello HeLLo HELLO", flags=re.I) #'bye bye bye'
re.sub("he\.llo", "bye", "he.llo He.LLo HE.LLO", flags=re.I) #'bye bye bye'

回答 3

继续bFloch的回答,此功能将不改变任何一种,而是将所有旧出现的内容更改为新内容-以不区分大小写的方式。

def ireplace(old, new, text):
    idx = 0
    while idx < len(text):
        index_l = text.lower().find(old.lower(), idx)
        if index_l == -1:
            return text
        text = text[:index_l] + new + text[index_l + len(old):]
        idx = index_l + len(new) 
    return text

Continuing on bFloch’s answer, this function will change not one, but all occurrences of old with new – in a case insensitive fashion.

def ireplace(old, new, text):
    idx = 0
    while idx < len(text):
        index_l = text.lower().find(old.lower(), idx)
        if index_l == -1:
            return text
        text = text[:index_l] + new + text[index_l + len(old):]
        idx = index_l + len(new) 
    return text

回答 4

就像布莱尔·康拉德(Blair Conrad)所说的那样,string.replace不支持这一点。

使用regex re.sub,但请记住先转义替换字符串。请注意,在2.6中没有for的flags-option re.sub,因此您必须使用Embedded修饰符'(?i)'(或RE对象,请参阅Blair Conrad的答案)。另外,另一个陷阱是,如果给出了字符串,sub将在替换文本中处理反斜杠转义。为了避免这种情况,可以传入lambda。

这是一个函数:

import re
def ireplace(old, repl, text):
    return re.sub('(?i)'+re.escape(old), lambda m: repl, text)

>>> ireplace('hippo?', 'giraffe!?', 'You want a hiPPO?')
'You want a giraffe!?'
>>> ireplace(r'[binfolder]', r'C:\Temp\bin', r'[BinFolder]\test.exe')
'C:\\Temp\\bin\\test.exe'

Like Blair Conrad says string.replace doesn’t support this.

Use the regex re.sub, but remember to escape the replacement string first. Note that there’s no flags-option in 2.6 for re.sub, so you’ll have to use the embedded modifier '(?i)' (or a RE-object, see Blair Conrad’s answer). Also, another pitfall is that sub will process backslash escapes in the replacement text, if a string is given. To avoid this one can instead pass in a lambda.

Here’s a function:

import re
def ireplace(old, repl, text):
    return re.sub('(?i)'+re.escape(old), lambda m: repl, text)

>>> ireplace('hippo?', 'giraffe!?', 'You want a hiPPO?')
'You want a giraffe!?'
>>> ireplace(r'[binfolder]', r'C:\Temp\bin', r'[BinFolder]\test.exe')
'C:\\Temp\\bin\\test.exe'

回答 5

此函数同时使用str.replace()re.findall()函数。它将以不区分大小写的方式替换patternin中所有出现的情况。stringrepl

def replace_all(pattern, repl, string) -> str:
   occurences = re.findall(pattern, string, re.IGNORECASE)
   for occurence in occurences:
       string = string.replace(occurence, repl)
       return string

This function uses both the str.replace() and re.findall() functions. It will replace all occurences of pattern in string with repl in a case-insensitive way.

def replace_all(pattern, repl, string) -> str:
   occurences = re.findall(pattern, string, re.IGNORECASE)
   for occurence in occurences:
       string = string.replace(occurence, repl)
       return string

回答 6

这不需要RegularExp

def ireplace(old, new, text):
    """ 
    Replace case insensitive
    Raises ValueError if string not found
    """
    index_l = text.lower().index(old.lower())
    return text[:index_l] + new + text[index_l + len(old):] 

This doesn’t require RegularExp

def ireplace(old, new, text):
    """ 
    Replace case insensitive
    Raises ValueError if string not found
    """
    index_l = text.lower().index(old.lower())
    return text[:index_l] + new + text[index_l + len(old):] 

回答 7

关于语法细节和选项的有趣观察:

在Win32上的Python 3.7.2(tags / v3.7.2:9a3ffc0492,2018年12月23日,23:09:28)[MSC v.1916 64位(AMD64)]

import re
old = "TREEROOT treeroot TREerOot"
re.sub(r'(?i)treeroot', 'grassroot', old)

‘草根草根草根’

re.sub(r'treeroot', 'grassroot', old)

‘TREEROOT草根TREerOot’

re.sub(r'treeroot', 'grassroot', old, flags=re.I)

‘草根草根草根’

re.sub(r'treeroot', 'grassroot', old, re.I)

‘TREEROOT草根TREerOot’

因此,match表达式中的(?i)前缀或添加“ flags = re.I”作为第四个参数将导致不区分大小写的匹配。但是,仅使用“ re.I”作为第四个参数不会导致不区分大小写的匹配。

为了比较,

re.findall(r'treeroot', old, re.I)

[‘TREEROOT’,’treeroot’,’TREerOot’]

re.findall(r'treeroot', old)

[‘treeroot’]

An interesting observation about syntax details and options:

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32

import re
old = "TREEROOT treeroot TREerOot"
re.sub(r'(?i)treeroot', 'grassroot', old)

‘grassroot grassroot grassroot’

re.sub(r'treeroot', 'grassroot', old)

‘TREEROOT grassroot TREerOot’

re.sub(r'treeroot', 'grassroot', old, flags=re.I)

‘grassroot grassroot grassroot’

re.sub(r'treeroot', 'grassroot', old, re.I)

‘TREEROOT grassroot TREerOot’

So the (?i) prefix in the match expression or adding “flags=re.I” as a fourth argument will result in a case-insensitive match. BUT, using just “re.I” as the fourth argument does not result in case-insensitive match.

For comparison,

re.findall(r'treeroot', old, re.I)

[‘TREEROOT’, ‘treeroot’, ‘TREerOot’]

re.findall(r'treeroot', old)

[‘treeroot’]


回答 8

我正在将\ t转换为转义序列(向下滚动),因此我注意到re.sub将反斜杠的转义字符转换为转义序列。

为了防止这种情况,我写了以下内容:

替换不区分大小写。

import re
    def ireplace(findtxt, replacetxt, data):
        return replacetxt.join(  re.compile(findtxt, flags=re.I).split(data)  )

另外,如果您希望将其替换为转义字符,例如此处的其他答案,这些特殊含义是将bashslash字符转换为转义序列,则只需对您的查找和解码,或替换字符串即可。在Python 3中,可能必须执行类似.decode(“ unicode_escape”)#python3的操作

findtxt = findtxt.decode('string_escape') # python2
replacetxt = replacetxt.decode('string_escape') # python2
data = ireplace(findtxt, replacetxt, data)

在Python 2.7.8中测试

希望有帮助。

I was having \t being converted to the escape sequences (scroll a bit down), so I noted that re.sub converts backslashed escaped characters to escape sequences.

To prevent that I wrote the following:

Replace case insensitive.

import re
    def ireplace(findtxt, replacetxt, data):
        return replacetxt.join(  re.compile(findtxt, flags=re.I).split(data)  )

Also, if you want it to replace with the escape characters, like the other answers here that are getting the special meaning bashslash characters converted to escape sequences, just decode your find and, or replace string. In Python 3, might have to do something like .decode(“unicode_escape”) # python3

findtxt = findtxt.decode('string_escape') # python2
replacetxt = replacetxt.decode('string_escape') # python2
data = ireplace(findtxt, replacetxt, data)

Tested in Python 2.7.8

Hope that helps.


回答 9

之前从未发布过答案,并且该线程确实很旧,但是我想出了另一种解决方案,并认为我可以得到您的回应,我在Python编程中经验不足,因此,如果它有明显的缺点,请指出来,因为它的良好学习是: )

i='I want a hIPpo for my birthday'
key='hippo'
swp='giraffe'

o=(i.lower().split(key))
c=0
p=0
for w in o:
    o[c]=i[p:p+len(w)]
    p=p+len(key+w)
    c+=1
print(swp.join(o))

never posted an answer before and this thread is really old but i came up with another sollution and figured i could get your respons, Im not seasoned in Python programming so if there are appearant drawbacks to it, please point them out since its good learning :)

i='I want a hIPpo for my birthday'
key='hippo'
swp='giraffe'

o=(i.lower().split(key))
c=0
p=0
for w in o:
    o[c]=i[p:p+len(w)]
    p=p+len(key+w)
    c+=1
print(swp.join(o))

如何检查变量是否为具有python 2和3兼容性的字符串

问题:如何检查变量是否为具有python 2和3兼容性的字符串

我知道我可以isinstance(x, str)在python-3.x中使用:但我还需要检查python-2.x中是否还有字符串。会isinstance(x, str)按预期在python-2.x的?还是我需要检查版本并使用isinstance(x, basestr)

具体来说,在python-2.x中:

>>>isinstance(u"test", str)
False

和python-3.x没有 u"foo"

I’m aware that I can use: isinstance(x, str) in python-3.x but I need to check if something is a string in python-2.x as well. Will isinstance(x, str) work as expected in python-2.x? Or will I need to check the version and use isinstance(x, basestr)?

Specifically, in python-2.x:

>>>isinstance(u"test", str)
False

and python-3.x does not have u"foo"


回答 0

如果要编写兼容2.x和3.x的代码,则可能要使用六个

from six import string_types
isinstance(s, string_types)

If you’re writing 2.x-and-3.x-compatible code, you’ll probably want to use six:

from six import string_types
isinstance(s, string_types)

回答 1

我发现不依赖六个软件包的最简洁的方法是:

try:
  basestring
except NameError:
  basestring = str

然后,假设您以最通用的方式检查了Python 2中的字符串,

isinstance(s, basestring)

现在也将适用于Python 3+。

The most terse approach I’ve found without relying on packages like six, is:

try:
  basestring
except NameError:
  basestring = str

then, assuming you’ve been checking for strings in Python 2 in the most generic manner,

isinstance(s, basestring)

will now also work for Python 3+.


回答 2

那在所有情况下都可行吗?

isinstance(x, ("".__class__, u"".__class__))

What about this, works in all cases?

isinstance(x, ("".__class__, u"".__class__))

回答 3

这是@Lev Levitsky的答案,重写了一下。

try:
    isinstance("", basestring)
    def isstr(s):
        return isinstance(s, basestring)
except NameError:
    def isstr(s):
        return isinstance(s, str)

try/ except测试只进行一次,然后定义总是工作,并尽可能快的功能。

编辑:实际上,我们甚至不需要调用isinstance();我们只需要评估一下basestring,看看是否得到NameError

try:
    basestring  # attempt to evaluate basestring
    def isstr(s):
        return isinstance(s, basestring)
except NameError:
    def isstr(s):
        return isinstance(s, str)

我认为,调用会更容易isinstance()

This is @Lev Levitsky’s answer, re-written a bit.

try:
    isinstance("", basestring)
    def isstr(s):
        return isinstance(s, basestring)
except NameError:
    def isstr(s):
        return isinstance(s, str)

The try/except test is done once, and then defines a function that always works and is as fast as possible.

EDIT: Actually, we don’t even need to call isinstance(); we just need to evaluate basestring and see if we get a NameError:

try:
    basestring  # attempt to evaluate basestring
    def isstr(s):
        return isinstance(s, basestring)
except NameError:
    def isstr(s):
        return isinstance(s, str)

I think it is easier to follow with the call to isinstance(), though.


回答 4

future添加了(兼容 Python 2)兼容名称,因此您可以继续编写Python 3。您可以简单地执行以下操作:

from builtins import str
isinstance(x, str) 

要安装它,只需执行pip install future

作为一个警告,它仅支持python>=2.6>=3.3但它是更现代的比six,这是唯一的建议,如果使用python 2.5

The future library adds (to Python 2) compatible names, so you can continue writing Python 3. You can simple do the following:

from builtins import str
isinstance(x, str) 

To install it, just execute pip install future.

As a caveat, it only support python>=2.6,>=3.3, but it is more modern than six, which is only recommended if using python 2.5


回答 5

也许使用类似的解决方法

def isstr(s):
    try:
        return isinstance(s, basestring)
    except NameError:
        return isinstance(s, str)

Maybe use a workaround like

def isstr(s):
    try:
        return isinstance(s, basestring)
    except NameError:
        return isinstance(s, str)

回答 6

您可以通过调用来获取对象的类object.__class__,以便检查object是否为默认字符串类型:

    isinstance(object,"".__class__)

而且您可以将以下内容放在代码的顶部,以便用引号括起来的字符串在python 2中的unicode中:

    from __future__ import unicode_literals

You can get the class of an object by calling object.__class__, so in order to check if object is the default string type:

    isinstance(object,"".__class__)

And You can place the following in the top of Your code so that strings enclosed by quotes are in unicode in python 2:

    from __future__ import unicode_literals

回答 7

您可以在代码的开头尝试此操作:

from __future__ import print_function
import sys
if sys.version[0] == "2":
    py3 = False
else:
    py3 = True
if py3: 
    basstring = str
else:
    basstring = basestring

然后在代码中:

anystring = "test"
# anystring = 1
if isinstance(anystring, basstring):
    print("This is a string")
else:
    print("No string")

You can try this at the beginning of your code:

from __future__ import print_function
import sys
if sys.version[0] == "2":
    py3 = False
else:
    py3 = True
if py3: 
    basstring = str
else:
    basstring = basestring

and later in the code:

anystring = "test"
# anystring = 1
if isinstance(anystring, basstring):
    print("This is a string")
else:
    print("No string")

回答 8

小心!在Python 2,str并且bytes基本上是相同的。如果您试图区分两者,则可能会导致错误。

>>> size = 5    
>>> byte_arr = bytes(size)
>>> isinstance(byte_arr, bytes)
True
>>> isinstance(byte_arr, str)
True

Be careful! In python 2, str and bytes are essentially the same. This can cause a bug if you are trying to distinguish between the two.

>>> size = 5    
>>> byte_arr = bytes(size)
>>> isinstance(byte_arr, bytes)
True
>>> isinstance(byte_arr, str)
True

回答 9

类型(字符串)== str

如果它是一个字符串,则返回true;否则,则返回false

type(string) == str

returns true if its a string, and false if not


使用Python从字符串中删除数字以外的字符?

问题:使用Python从字符串中删除数字以外的字符?

如何从字符串中删除除数字以外的所有字符?

How can I remove all characters except numbers from string?


回答 0

在Python 2. *中,到目前为止最快的方法是.translate

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>> 

string.maketrans生成一个转换表(长度为256的字符串),在这种情况下,该转换表与''.join(chr(x) for x in range(256))(更快地制作;-)相同。.translate应用转换表(这里无关紧要,因为all本质上是指身份),并删除第二个参数(关键部分)中存在的字符。

.translate在Unicode字符串(和Python 3中的字符串)上的工作方式大不相同-我确实希望问题能说明感兴趣的是哪个Python的主要发行版!)-并不是那么简单,也不是那么快,尽管仍然非常有用。

回到2. *,性能差异令人印象深刻……:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

将事情加速7到8倍几乎不是花生,因此该translate方法非常值得了解和使用。另一种流行的非RE方法…

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

比RE慢50%,因此该.translate方法将其击败了一个数量级。

在Python 3或Unicode中,您需要传递.translate一个映射(以普通字符而不是直接字符作为键),该映射返回None要删除的内容。这是删除“除以下所有内容外”几个字符的一种便捷方式:

import string

class Del:
  def __init__(self, keep=string.digits):
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)

也发出'1233344554552'。但是,将其放在xx.py中,我们可以…:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

…表明性能优势对于这种“删除”任务消失了,而变成了性能下降。

In Python 2.*, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>> 

string.maketrans makes a translation table (a string of length 256) which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make;-). .translate applies the translation table (which here is irrelevant since all essentially means identity) AND deletes characters present in the second argument — the key part.

.translate works very differently on Unicode strings (and strings in Python 3 — I do wish questions specified which major-release of Python is of interest!) — not quite this simple, not quite this fast, though still quite usable.

Back to 2.*, the performance difference is impressive…:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach…:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.

In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here’s a convenient way to express this for deletion of “everything but” a few characters:

import string

class Del:
  def __init__(self, keep=string.digits):
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)

also emits '1233344554552'. However, putting this in xx.py we have…:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

…which shows the performance advantage disappears, for this kind of “deletion” tasks, and becomes a performance decrease.


回答 1

使用re.sub,如下所示:

>>> import re
>>> re.sub('\D', '', 'aas30dsa20')
'3020'

\D 匹配任何非数字字符,因此,上面的代码实质上是将每个非数字字符替换为空字符串。

或者您可以使用filter,就像这样(在Python 2中):

>>> filter(str.isdigit, 'aas30dsa20')
'3020'

由于在Python 3中,filter返回的是迭代器而不是list,因此您可以使用以下代码:

>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'

Use re.sub, like so:

>>> import re
>>> re.sub('\D', '', 'aas30dsa20')
'3020'

\D matches any non-digit character so, the code above, is essentially replacing every non-digit character for the empty string.

Or you can use filter, like so (in Python 2):

>>> filter(str.isdigit, 'aas30dsa20')
'3020'

Since in Python 3, filter returns an iterator instead of a list, you can use the following instead:

>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'

回答 2

s=''.join(i for i in s if i.isdigit())

另一个生成器变体。

s=''.join(i for i in s if i.isdigit())

Another generator variant.


回答 3

您可以使用过滤器:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

在python3.0上,您必须加入这个(有点丑陋的:()

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))

You can use filter:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

On python3.0 you have to join this (kinda ugly :( )

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))

回答 4

按照拜耳的回答:

''.join(i for i in s if i.isdigit())

along the lines of bayer’s answer:

''.join(i for i in s if i.isdigit())

回答 5

您可以使用Regex轻松完成此操作

>>> import re
>>> re.sub("\D","","£70,000")
70000

You can easily do it using Regex

>>> import re
>>> re.sub("\D","","£70,000")
70000

回答 6

x.translate(None, string.digits)

将从字符串中删除所有数字。要删除字母并保留数字,请执行以下操作:

x.translate(None, string.letters)
x.translate(None, string.digits)

will delete all digits from string. To delete letters and keep the digits, do this:

x.translate(None, string.letters)

回答 7

这位操作员在评论中提到他想保留小数位。可以通过re.sub方法(按照第二个方法和恕我直言的最佳答案)来完成,方法是明确列出要保留的字符,例如

>>> re.sub("[^0123456789\.]","","poo123.4and5fish")
'123.45'

The op mentions in the comments that he wants to keep the decimal place. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep e.g.

>>> re.sub("[^0123456789\.]","","poo123.4and5fish")
'123.45'

回答 8

Python 3的快速版本:

# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)

这是与regex的性能比较:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

对我来说,它比正则表达式快3倍多。它也比class Del上面更快,因为defaultdict它使用C语言而不是(慢)Python进行所有查找。这是我在同一系统上的版本,以进行比较。

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop

A fast version for Python 3:

# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)

Here’s a performance comparison vs. regex:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

So it’s a little bit more than 3 times faster than regex, for me. It’s also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here’s that version on my same system, for comparison.

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop

回答 9

使用生成器表达式:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")

Use a generator expression:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")

回答 10

丑陋但可行:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

Ugly but works:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

回答 11

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000次循环,每循环3:2.48微秒最佳

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000次循环,最好为3:每个循环2.02微秒

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000次循环,每循环3:2.37最佳

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000次循环,每循环3:1.97最佳

我已经观察到联接比sub快。

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.48 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 2.02 usec per loop

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.37 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 1.97 usec per loop

I had observed that join is faster than sub.


回答 12

您可以阅读每个字符。如果是数字,则将其包括在答案中。该str.isdigit() 方法是一种知道字符是否为数字的方法。

your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'

You can read each character. If it is digit, then include it in the answer. The str.isdigit() method is a way to know if a character is digit.

your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'

回答 13

不是一行代码,但非常简单:

buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    if not char.isdigit():
        buffer += char

print( buffer )

Not a one liner but very simple:

buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    if not char.isdigit():
        buffer += char

print( buffer )

回答 14

我用这个 'letters'应该包含您要删除的所有字母:

Output = Input.translate({ord(i): None for i in 'letters'}))

例:

Input = "I would like 20 dollars for that suit" Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxzy'})) print(Output)

输出: 20

I used this. 'letters' should contain all the letters that you want to get rid of:

Output = Input.translate({ord(i): None for i in 'letters'}))

Example:

Input = "I would like 20 dollars for that suit" Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxzy'})) print(Output)

Output: 20


查找名称包含特定字符串的列

问题:查找名称包含特定字符串的列

我有一个带有列名称的数据框,我想找到一个包含特定字符串但与之不完全匹配的数据框。我在寻找'spike'列名喜欢'spike-2''hey spike''spiked-in'(该'spike'部分总是连续)。

我希望列名以字符串或变量的形式返回,因此我以后可以使用df['name']df[name]照常访问列。我试图找到方法,但没有成功。有小费吗?

I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I’m searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always continuous).

I want the column name to be returned as a string or a variable, so I access the column later with df['name'] or df[name] as normal. I’ve tried to find ways to do this, to no avail. Any tips?


回答 0

只需遍历DataFrame.columns,这是一个示例,在此示例中,您将获得匹配的列名称列表:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)

输出:

['hey spke', 'no', 'spike-2', 'spiked-in']
['spike-2', 'spiked-in']

说明:

  1. df.columns 返回列名列表
  2. [col for col in df.columns if 'spike' in col]df.columns使用变量遍历列表col并将其添加到结果列表(如果col包含)'spike'。此语法是列表理解

如果只希望结果数据集的列匹配,则可以执行以下操作:

df2 = df.filter(regex='spike')
print(df2)

输出:

   spike-2  spiked-in
0        1          7
1        2          8
2        3          9

Just iterate over DataFrame.columns, now this is an example in which you will end up with a list of column names that match:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)

Output:

['hey spke', 'no', 'spike-2', 'spiked-in']
['spike-2', 'spiked-in']

Explanation:

  1. df.columns returns a list of column names
  2. [col for col in df.columns if 'spike' in col] iterates over the list df.columns with the variable col and adds it to the resulting list if col contains 'spike'. This syntax is list comprehension.

If you only want the resulting data set with the columns that match you can do this:

df2 = df.filter(regex='spike')
print(df2)

Output:

   spike-2  spiked-in
0        1          7
1        2          8
2        3          9

回答 1

此答案使用DataFrame.filter方法执行此操作而无需列表理解:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)

print(df.filter(like='spike').columns)

将仅输出“ spike-2”。您还可以使用正则表达式,如某些人在上面的评论中建议的那样:

print(df.filter(regex='spike|spke').columns)

将输出两列:[‘spike-2’,’hey spke’]

This answer uses the DataFrame.filter method to do this without list comprehension:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)

print(df.filter(like='spike').columns)

Will output just ‘spike-2’. You can also use regex, as some people suggested in comments above:

print(df.filter(regex='spike|spke').columns)

Will output both columns: [‘spike-2’, ‘hey spke’]


回答 2

您也可以使用 df.columns[df.columns.str.contains(pat = 'spike')]

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

colNames = df.columns[df.columns.str.contains(pat = 'spike')] 

print(colNames)

这将输出列名称: 'spike-2', 'spiked-in'

有关pandas.Series.str.contains的更多信息。

You can also use df.columns[df.columns.str.contains(pat = 'spike')]

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

colNames = df.columns[df.columns.str.contains(pat = 'spike')] 

print(colNames)

This will output the column names: 'spike-2', 'spiked-in'

More about pandas.Series.str.contains.


回答 3

# select columns containing 'spike'
df.filter(like='spike', axis=1)

您还可以按名称选择正则表达式。请参阅:pandas.DataFrame.filter

# select columns containing 'spike'
df.filter(like='spike', axis=1)

You can also select by name, regular expression. Refer to: pandas.DataFrame.filter


回答 4

df.loc[:,df.columns.str.contains("spike")]
df.loc[:,df.columns.str.contains("spike")]

回答 5

您还可以使用以下代码:

spike_cols =[x for x in df.columns[df.columns.str.contains('spike')]]

You also can use this code:

spike_cols =[x for x in df.columns[df.columns.str.contains('spike')]]

回答 6

根据“开始”,“包含”和“结束”获取名称和子集:

# from: /programming/21285380/find-column-whose-name-contains-a-specific-string
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
# from: https://cmdlinetips.com/2019/04/how-to-select-columns-using-prefix-suffix-of-column-names-in-pandas/
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html




import pandas as pd



data = {'spike_starts': [1,2,3], 'ends_spike_starts': [4,5,6], 'ends_spike': [7,8,9], 'not': [10,11,12]}
df = pd.DataFrame(data)



print("\n")
print("----------------------------------------")
colNames_contains = df.columns[df.columns.str.contains(pat = 'spike')].tolist() 
print("Contains")
print(colNames_contains)



print("\n")
print("----------------------------------------")
colNames_starts = df.columns[df.columns.str.contains(pat = '^spike')].tolist() 
print("Starts")
print(colNames_starts)



print("\n")
print("----------------------------------------")
colNames_ends = df.columns[df.columns.str.contains(pat = 'spike$')].tolist() 
print("Ends")
print(colNames_ends)



print("\n")
print("----------------------------------------")
df_subset_start = df.filter(regex='^spike',axis=1)
print("Starts")
print(df_subset_start)



print("\n")
print("----------------------------------------")
df_subset_contains = df.filter(regex='spike',axis=1)
print("Contains")
print(df_subset_contains)



print("\n")
print("----------------------------------------")
df_subset_ends = df.filter(regex='spike$',axis=1)
print("Ends")
print(df_subset_ends)

Getting name and subsetting based on Start, Contains, and Ends:

# from: https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
# from: https://cmdlinetips.com/2019/04/how-to-select-columns-using-prefix-suffix-of-column-names-in-pandas/
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html




import pandas as pd



data = {'spike_starts': [1,2,3], 'ends_spike_starts': [4,5,6], 'ends_spike': [7,8,9], 'not': [10,11,12]}
df = pd.DataFrame(data)



print("\n")
print("----------------------------------------")
colNames_contains = df.columns[df.columns.str.contains(pat = 'spike')].tolist() 
print("Contains")
print(colNames_contains)



print("\n")
print("----------------------------------------")
colNames_starts = df.columns[df.columns.str.contains(pat = '^spike')].tolist() 
print("Starts")
print(colNames_starts)



print("\n")
print("----------------------------------------")
colNames_ends = df.columns[df.columns.str.contains(pat = 'spike$')].tolist() 
print("Ends")
print(colNames_ends)



print("\n")
print("----------------------------------------")
df_subset_start = df.filter(regex='^spike',axis=1)
print("Starts")
print(df_subset_start)



print("\n")
print("----------------------------------------")
df_subset_contains = df.filter(regex='spike',axis=1)
print("Contains")
print(df_subset_contains)



print("\n")
print("----------------------------------------")
df_subset_ends = df.filter(regex='spike$',axis=1)
print("Ends")
print(df_subset_ends)

Python TypeError:格式字符串的参数不足

问题:Python TypeError:格式字符串的参数不足

这是输出。我相信这些是utf-8字符串…其中一些可以是NoneType,但是在类似这样的字符串之前会立即失败…

instr = "'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % softname, procversion, int(percent), exe, description, company, procurl

TypeError:格式字符串的参数不足

虽然是7比7?

Here’s the output. These are utf-8 strings I believe… some of these can be NoneType but it fails immediately, before ones like that…

instr = "'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % softname, procversion, int(percent), exe, description, company, procurl

TypeError: not enough arguments for format string

Its 7 for 7 though?


回答 0

请注意,%格式化字符串的语法已过时。如果您的Python版本支持它,则应编写:

instr = "'{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}'".format(softname, procversion, int(percent), exe, description, company, procurl)

这也可以修复您碰巧遇到的错误。

Note that the % syntax for formatting strings is becoming outdated. If your version of Python supports it, you should write:

instr = "'{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}'".format(softname, procversion, int(percent), exe, description, company, procurl)

This also fixes the error that you happened to have.


回答 1

您需要将格式参数放入元组(添加括号):

instr = "'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % (softname, procversion, int(percent), exe, description, company, procurl)

您当前拥有的等同于以下内容:

intstr = ("'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % softname), procversion, int(percent), exe, description, company, procurl

例:

>>> "%s %s" % 'hello', 'world'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> "%s %s" % ('hello', 'world')
'hello world'

You need to put the format arguments into a tuple (add parentheses):

instr = "'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % (softname, procversion, int(percent), exe, description, company, procurl)

What you currently have is equivalent to the following:

intstr = ("'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % softname), procversion, int(percent), exe, description, company, procurl

Example:

>>> "%s %s" % 'hello', 'world'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> "%s %s" % ('hello', 'world')
'hello world'

回答 2

%在格式字符串中用作百分比字符时,出现了相同的错误。解决的办法是加倍%%

I got the same error when using % as a percent character in my format string. The solution to this is to double up the %%.


python中的n克,四克,五克,六克?

问题:python中的n克,四克,五克,六克?

我正在寻找一种将文本拆分为n-gram的方法。通常我会做类似的事情:

import nltk
from nltk import bigrams
string = "I really like python, it's pretty awesome."
string_bigrams = bigrams(string)
print string_bigrams

我知道nltk仅提供二元组和三元组,但是有没有办法将我的文本分为四克,五克甚至一百克?

谢谢!

I’m looking for a way to split a text into n-grams. Normally I would do something like:

import nltk
from nltk import bigrams
string = "I really like python, it's pretty awesome."
string_bigrams = bigrams(string)
print string_bigrams

I am aware that nltk only offers bigrams and trigrams, but is there a way to split my text in four-grams, five-grams or even hundred-grams?

Thanks!


回答 0

其他用户提供的基于本地Python的出色答案。但是这就是nltk方法(以防万一,OP会因为重新发明nltk库中已经存在的内容而受到惩罚)。

有一个NGRAM模块,人们很少使用nltk。这不是因为很难读取ngram,而是基于ngram训练模型,其中n> 3将导致大量数据稀疏。

from nltk import ngrams

sentence = 'this is a foo bar sentences and i want to ngramize it'

n = 6
sixgrams = ngrams(sentence.split(), n)

for grams in sixgrams:
  print grams

Great native python based answers given by other users. But here’s the nltk approach (just in case, the OP gets penalized for reinventing what’s already existing in the nltk library).

There is an ngram module that people seldom use in nltk. It’s not because it’s hard to read ngrams, but training a model base on ngrams where n > 3 will result in much data sparsity.

from nltk import ngrams

sentence = 'this is a foo bar sentences and i want to ngramize it'

n = 6
sixgrams = ngrams(sentence.split(), n)

for grams in sixgrams:
  print grams

回答 1

我很惊讶这还没有出现:

In [34]: sentence = "I really like python, it's pretty awesome.".split()

In [35]: N = 4

In [36]: grams = [sentence[i:i+N] for i in xrange(len(sentence)-N+1)]

In [37]: for gram in grams: print gram
['I', 'really', 'like', 'python,']
['really', 'like', 'python,', "it's"]
['like', 'python,', "it's", 'pretty']
['python,', "it's", 'pretty', 'awesome.']

I’m surprised that this hasn’t shown up yet:

In [34]: sentence = "I really like python, it's pretty awesome.".split()

In [35]: N = 4

In [36]: grams = [sentence[i:i+N] for i in xrange(len(sentence)-N+1)]

In [37]: for gram in grams: print gram
['I', 'really', 'like', 'python,']
['really', 'like', 'python,', "it's"]
['like', 'python,', "it's", 'pretty']
['python,', "it's", 'pretty', 'awesome.']

回答 2

仅使用nltk工具

from nltk.tokenize import word_tokenize
from nltk.util import ngrams

def get_ngrams(text, n ):
    n_grams = ngrams(word_tokenize(text), n)
    return [ ' '.join(grams) for grams in n_grams]

输出示例

get_ngrams('This is the simplest text i could think of', 3 )

['This is the', 'is the simplest', 'the simplest text', 'simplest text i', 'text i could', 'i could think', 'could think of']

为了使ngram保持数组格式,只需删除 ' '.join

Using only nltk tools

from nltk.tokenize import word_tokenize
from nltk.util import ngrams

def get_ngrams(text, n ):
    n_grams = ngrams(word_tokenize(text), n)
    return [ ' '.join(grams) for grams in n_grams]

Example output

get_ngrams('This is the simplest text i could think of', 3 )

['This is the', 'is the simplest', 'the simplest text', 'simplest text i', 'text i could', 'i could think', 'could think of']

In order to keep the ngrams in array format just remove ' '.join


回答 3

这是做n-gram的另一种简单方法

>>> from nltk.util import ngrams
>>> text = "I am aware that nltk only offers bigrams and trigrams, but is there a way to split my text in four-grams, five-grams or even hundred-grams"
>>> tokenize = nltk.word_tokenize(text)
>>> tokenize
['I', 'am', 'aware', 'that', 'nltk', 'only', 'offers', 'bigrams', 'and', 'trigrams', ',', 'but', 'is', 'there', 'a', 'way', 'to', 'split', 'my', 'text', 'in', 'four-grams', ',', 'five-grams', 'or', 'even', 'hundred-grams']
>>> bigrams = ngrams(tokenize,2)
>>> bigrams
[('I', 'am'), ('am', 'aware'), ('aware', 'that'), ('that', 'nltk'), ('nltk', 'only'), ('only', 'offers'), ('offers', 'bigrams'), ('bigrams', 'and'), ('and', 'trigrams'), ('trigrams', ','), (',', 'but'), ('but', 'is'), ('is', 'there'), ('there', 'a'), ('a', 'way'), ('way', 'to'), ('to', 'split'), ('split', 'my'), ('my', 'text'), ('text', 'in'), ('in', 'four-grams'), ('four-grams', ','), (',', 'five-grams'), ('five-grams', 'or'), ('or', 'even'), ('even', 'hundred-grams')]
>>> trigrams=ngrams(tokenize,3)
>>> trigrams
[('I', 'am', 'aware'), ('am', 'aware', 'that'), ('aware', 'that', 'nltk'), ('that', 'nltk', 'only'), ('nltk', 'only', 'offers'), ('only', 'offers', 'bigrams'), ('offers', 'bigrams', 'and'), ('bigrams', 'and', 'trigrams'), ('and', 'trigrams', ','), ('trigrams', ',', 'but'), (',', 'but', 'is'), ('but', 'is', 'there'), ('is', 'there', 'a'), ('there', 'a', 'way'), ('a', 'way', 'to'), ('way', 'to', 'split'), ('to', 'split', 'my'), ('split', 'my', 'text'), ('my', 'text', 'in'), ('text', 'in', 'four-grams'), ('in', 'four-grams', ','), ('four-grams', ',', 'five-grams'), (',', 'five-grams', 'or'), ('five-grams', 'or', 'even'), ('or', 'even', 'hundred-grams')]
>>> fourgrams=ngrams(tokenize,4)
>>> fourgrams
[('I', 'am', 'aware', 'that'), ('am', 'aware', 'that', 'nltk'), ('aware', 'that', 'nltk', 'only'), ('that', 'nltk', 'only', 'offers'), ('nltk', 'only', 'offers', 'bigrams'), ('only', 'offers', 'bigrams', 'and'), ('offers', 'bigrams', 'and', 'trigrams'), ('bigrams', 'and', 'trigrams', ','), ('and', 'trigrams', ',', 'but'), ('trigrams', ',', 'but', 'is'), (',', 'but', 'is', 'there'), ('but', 'is', 'there', 'a'), ('is', 'there', 'a', 'way'), ('there', 'a', 'way', 'to'), ('a', 'way', 'to', 'split'), ('way', 'to', 'split', 'my'), ('to', 'split', 'my', 'text'), ('split', 'my', 'text', 'in'), ('my', 'text', 'in', 'four-grams'), ('text', 'in', 'four-grams', ','), ('in', 'four-grams', ',', 'five-grams'), ('four-grams', ',', 'five-grams', 'or'), (',', 'five-grams', 'or', 'even'), ('five-grams', 'or', 'even', 'hundred-grams')]

here is another simple way for do n-grams

>>> from nltk.util import ngrams
>>> text = "I am aware that nltk only offers bigrams and trigrams, but is there a way to split my text in four-grams, five-grams or even hundred-grams"
>>> tokenize = nltk.word_tokenize(text)
>>> tokenize
['I', 'am', 'aware', 'that', 'nltk', 'only', 'offers', 'bigrams', 'and', 'trigrams', ',', 'but', 'is', 'there', 'a', 'way', 'to', 'split', 'my', 'text', 'in', 'four-grams', ',', 'five-grams', 'or', 'even', 'hundred-grams']
>>> bigrams = ngrams(tokenize,2)
>>> bigrams
[('I', 'am'), ('am', 'aware'), ('aware', 'that'), ('that', 'nltk'), ('nltk', 'only'), ('only', 'offers'), ('offers', 'bigrams'), ('bigrams', 'and'), ('and', 'trigrams'), ('trigrams', ','), (',', 'but'), ('but', 'is'), ('is', 'there'), ('there', 'a'), ('a', 'way'), ('way', 'to'), ('to', 'split'), ('split', 'my'), ('my', 'text'), ('text', 'in'), ('in', 'four-grams'), ('four-grams', ','), (',', 'five-grams'), ('five-grams', 'or'), ('or', 'even'), ('even', 'hundred-grams')]
>>> trigrams=ngrams(tokenize,3)
>>> trigrams
[('I', 'am', 'aware'), ('am', 'aware', 'that'), ('aware', 'that', 'nltk'), ('that', 'nltk', 'only'), ('nltk', 'only', 'offers'), ('only', 'offers', 'bigrams'), ('offers', 'bigrams', 'and'), ('bigrams', 'and', 'trigrams'), ('and', 'trigrams', ','), ('trigrams', ',', 'but'), (',', 'but', 'is'), ('but', 'is', 'there'), ('is', 'there', 'a'), ('there', 'a', 'way'), ('a', 'way', 'to'), ('way', 'to', 'split'), ('to', 'split', 'my'), ('split', 'my', 'text'), ('my', 'text', 'in'), ('text', 'in', 'four-grams'), ('in', 'four-grams', ','), ('four-grams', ',', 'five-grams'), (',', 'five-grams', 'or'), ('five-grams', 'or', 'even'), ('or', 'even', 'hundred-grams')]
>>> fourgrams=ngrams(tokenize,4)
>>> fourgrams
[('I', 'am', 'aware', 'that'), ('am', 'aware', 'that', 'nltk'), ('aware', 'that', 'nltk', 'only'), ('that', 'nltk', 'only', 'offers'), ('nltk', 'only', 'offers', 'bigrams'), ('only', 'offers', 'bigrams', 'and'), ('offers', 'bigrams', 'and', 'trigrams'), ('bigrams', 'and', 'trigrams', ','), ('and', 'trigrams', ',', 'but'), ('trigrams', ',', 'but', 'is'), (',', 'but', 'is', 'there'), ('but', 'is', 'there', 'a'), ('is', 'there', 'a', 'way'), ('there', 'a', 'way', 'to'), ('a', 'way', 'to', 'split'), ('way', 'to', 'split', 'my'), ('to', 'split', 'my', 'text'), ('split', 'my', 'text', 'in'), ('my', 'text', 'in', 'four-grams'), ('text', 'in', 'four-grams', ','), ('in', 'four-grams', ',', 'five-grams'), ('four-grams', ',', 'five-grams', 'or'), (',', 'five-grams', 'or', 'even'), ('five-grams', 'or', 'even', 'hundred-grams')]

回答 4

对于需要二元组或三元组的情况,人们已经很好地回答了,但是在这种情况下,如果需要句子的每一个整组,您可以使用nltk.util.everygrams

>>> from nltk.util import everygrams

>>> message = "who let the dogs out"

>>> msg_split = message.split()

>>> list(everygrams(msg_split))
[('who',), ('let',), ('the',), ('dogs',), ('out',), ('who', 'let'), ('let', 'the'), ('the', 'dogs'), ('dogs', 'out'), ('who', 'let', 'the'), ('let', 'the', 'dogs'), ('the', 'dogs', 'out'), ('who', 'let', 'the', 'dogs'), ('let', 'the', 'dogs', 'out'), ('who', 'let', 'the', 'dogs', 'out')]

如果您有一个限制,如三字母组的最大长度应为3,则可以使用max_len参数来指定它。

>>> list(everygrams(msg_split, max_len=2))
[('who',), ('let',), ('the',), ('dogs',), ('out',), ('who', 'let'), ('let', 'the'), ('the', 'dogs'), ('dogs', 'out')]

您可以修改max_len参数以实现任意克,即4克,5克,6甚至100克。

可以修改前面提到的解决方案以实现上面提到的解决方案,但是此解决方案比这要简单得多。

欲了解更多信息,请点击这里

而且,当您只需要一个特定的语法,例如bigram或trigram等时,可以使用MAHassan的答案中提到的nltk.util.ngrams。

People have already answered pretty nicely for the scenario where you need bigrams or trigrams but if you need everygram for the sentence in that case you can use nltk.util.everygrams

>>> from nltk.util import everygrams

>>> message = "who let the dogs out"

>>> msg_split = message.split()

>>> list(everygrams(msg_split))
[('who',), ('let',), ('the',), ('dogs',), ('out',), ('who', 'let'), ('let', 'the'), ('the', 'dogs'), ('dogs', 'out'), ('who', 'let', 'the'), ('let', 'the', 'dogs'), ('the', 'dogs', 'out'), ('who', 'let', 'the', 'dogs'), ('let', 'the', 'dogs', 'out'), ('who', 'let', 'the', 'dogs', 'out')]

Incase you have a limit like in case of trigrams where the max length should be 3 then you can use max_len param to specify it.

>>> list(everygrams(msg_split, max_len=2))
[('who',), ('let',), ('the',), ('dogs',), ('out',), ('who', 'let'), ('let', 'the'), ('the', 'dogs'), ('dogs', 'out')]

You can just modify the max_len param to achieve whatever gram i.e four gram, five gram, six or even hundred gram.

The previous mentioned solutions can be modified to implement the above mentioned solution but this solution is much straight forward than that.

For further reading click here

And when you just need a specific gram like bigram or trigram etc you can use the nltk.util.ngrams as mentioned in M.A.Hassan’s answer.


回答 5

您可以使用以下命令轻松启动自己的功能itertools

from itertools import izip, islice, tee
s = 'spam and eggs'
N = 3
trigrams = izip(*(islice(seq, index, None) for index, seq in enumerate(tee(s, N))))
list(trigrams)
# [('s', 'p', 'a'), ('p', 'a', 'm'), ('a', 'm', ' '),
# ('m', ' ', 'a'), (' ', 'a', 'n'), ('a', 'n', 'd'),
# ('n', 'd', ' '), ('d', ' ', 'e'), (' ', 'e', 'g'),
# ('e', 'g', 'g'), ('g', 'g', 's')]

You can easily whip up your own function to do this using itertools:

from itertools import izip, islice, tee
s = 'spam and eggs'
N = 3
trigrams = izip(*(islice(seq, index, None) for index, seq in enumerate(tee(s, N))))
list(trigrams)
# [('s', 'p', 'a'), ('p', 'a', 'm'), ('a', 'm', ' '),
# ('m', ' ', 'a'), (' ', 'a', 'n'), ('a', 'n', 'd'),
# ('n', 'd', ' '), ('d', ' ', 'e'), (' ', 'e', 'g'),
# ('e', 'g', 'g'), ('g', 'g', 's')]

回答 6

使用python的buildin构建双字母组的一种更优雅的方法zip()。只需通过将原始字符串转换为列表split(),然后正常传递一次列表,然后一次偏移一个元素即可。

string = "I really like python, it's pretty awesome."

def find_bigrams(s):
    input_list = s.split(" ")
    return zip(input_list, input_list[1:])

def find_ngrams(s, n):
  input_list = s.split(" ")
  return zip(*[input_list[i:] for i in range(n)])

find_bigrams(string)

[('I', 'really'), ('really', 'like'), ('like', 'python,'), ('python,', "it's"), ("it's", 'pretty'), ('pretty', 'awesome.')]

A more elegant approach to build bigrams with python’s builtin zip(). Simply convert the original string into a list by split(), then pass the list once normally and once offset by one element.

string = "I really like python, it's pretty awesome."

def find_bigrams(s):
    input_list = s.split(" ")
    return zip(input_list, input_list[1:])

def find_ngrams(s, n):
  input_list = s.split(" ")
  return zip(*[input_list[i:] for i in range(n)])

find_bigrams(string)

[('I', 'really'), ('really', 'like'), ('like', 'python,'), ('python,', "it's"), ("it's", 'pretty'), ('pretty', 'awesome.')]

回答 7

我从未处理过nltk,但是在一些小班项目中做了N-grams。如果要查找字符串中所有N-gram出现的频率,可以采用这种方法。D会给你N个单词的直方图。

D = dict()
string = 'whatever string...'
strparts = string.split()
for i in range(len(strparts)-N): # N-grams
    try:
        D[tuple(strparts[i:i+N])] += 1
    except:
        D[tuple(strparts[i:i+N])] = 1

I have never dealt with nltk but did N-grams as part of some small class project. If you want to find the frequency of all N-grams occurring in the string, here is a way to do that. D would give you the histogram of your N-words.

D = dict()
string = 'whatever string...'
strparts = string.split()
for i in range(len(strparts)-N): # N-grams
    try:
        D[tuple(strparts[i:i+N])] += 1
    except:
        D[tuple(strparts[i:i+N])] = 1

回答 8

对于four_grams,它已经在NLTK中,这是一段代码,可以帮助您实现这一目标:

 from nltk.collocations import *
 import nltk
 #You should tokenize your text
 text = "I do not like green eggs and ham, I do not like them Sam I am!"
 tokens = nltk.wordpunct_tokenize(text)
 fourgrams=nltk.collocations.QuadgramCollocationFinder.from_words(tokens)
 for fourgram, freq in fourgrams.ngram_fd.items():  
       print fourgram, freq

希望对您有所帮助。

For four_grams it is already in NLTK, here is a piece of code that can help you toward this:

 from nltk.collocations import *
 import nltk
 #You should tokenize your text
 text = "I do not like green eggs and ham, I do not like them Sam I am!"
 tokens = nltk.wordpunct_tokenize(text)
 fourgrams=nltk.collocations.QuadgramCollocationFinder.from_words(tokens)
 for fourgram, freq in fourgrams.ngram_fd.items():  
       print fourgram, freq

I hope it helps.


回答 9

您可以使用sklearn.feature_extraction.text.CountVectorizer

import sklearn.feature_extraction.text # FYI http://scikit-learn.org/stable/install.html
ngram_size = 4
string = ["I really like python, it's pretty awesome."]
vect = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(ngram_size,ngram_size))
vect.fit(string)
print('{1}-grams: {0}'.format(vect.get_feature_names(), ngram_size))

输出:

4-grams: [u'like python it pretty', u'python it pretty awesome', u'really like python it']

您可以设置为ngram_size任何正整数。也就是说,您可以将文本拆分为四克,五克甚至一百克。

You can use sklearn.feature_extraction.text.CountVectorizer:

import sklearn.feature_extraction.text # FYI http://scikit-learn.org/stable/install.html
ngram_size = 4
string = ["I really like python, it's pretty awesome."]
vect = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(ngram_size,ngram_size))
vect.fit(string)
print('{1}-grams: {0}'.format(vect.get_feature_names(), ngram_size))

outputs:

4-grams: [u'like python it pretty', u'python it pretty awesome', u'really like python it']

You can set to ngram_size to any positive integer. I.e. you can split a text in four-grams, five-grams or even hundred-grams.


回答 10

如果效率是一个问题,您必须构建多个不同的n-gram(如您所说的最多一百个),但是您想使用纯python,我会这样做:

from itertools import chain

def n_grams(seq, n=1):
    """Returns an itirator over the n-grams given a listTokens"""
    shiftToken = lambda i: (el for j,el in enumerate(seq) if j>=i)
    shiftedTokens = (shiftToken(i) for i in range(n))
    tupleNGrams = zip(*shiftedTokens)
    return tupleNGrams # if join in generator : (" ".join(i) for i in tupleNGrams)

def range_ngrams(listTokens, ngramRange=(1,2)):
    """Returns an itirator over all n-grams for n in range(ngramRange) given a listTokens."""
    return chain(*(n_grams(listTokens, i) for i in range(*ngramRange)))

用法:

>>> input_list = input_list = 'test the ngrams generator'.split()
>>> list(range_ngrams(input_list, ngramRange=(1,3)))
[('test',), ('the',), ('ngrams',), ('generator',), ('test', 'the'), ('the', 'ngrams'), ('ngrams', 'generator'), ('test', 'the', 'ngrams'), ('the', 'ngrams', 'generator')]

〜与NLTK相同的速度:

import nltk
%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
nltk.ngrams(input_list,n=5)
# 7.02 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
n_grams(input_list,n=5)
# 7.01 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
nltk.ngrams(input_list,n=1)
nltk.ngrams(input_list,n=2)
nltk.ngrams(input_list,n=3)
nltk.ngrams(input_list,n=4)
nltk.ngrams(input_list,n=5)
# 7.32 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
range_ngrams(input_list, ngramRange=(1,6))
# 7.13 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

重新发布我以前的答案

If efficiency is an issue and you have to build multiple different n-grams (up to a hundred as you say), but you want to use pure python I would do:

from itertools import chain

def n_grams(seq, n=1):
    """Returns an itirator over the n-grams given a listTokens"""
    shiftToken = lambda i: (el for j,el in enumerate(seq) if j>=i)
    shiftedTokens = (shiftToken(i) for i in range(n))
    tupleNGrams = zip(*shiftedTokens)
    return tupleNGrams # if join in generator : (" ".join(i) for i in tupleNGrams)

def range_ngrams(listTokens, ngramRange=(1,2)):
    """Returns an itirator over all n-grams for n in range(ngramRange) given a listTokens."""
    return chain(*(n_grams(listTokens, i) for i in range(*ngramRange)))

Usage :

>>> input_list = input_list = 'test the ngrams generator'.split()
>>> list(range_ngrams(input_list, ngramRange=(1,3)))
[('test',), ('the',), ('ngrams',), ('generator',), ('test', 'the'), ('the', 'ngrams'), ('ngrams', 'generator'), ('test', 'the', 'ngrams'), ('the', 'ngrams', 'generator')]

~Same speed as NLTK:

import nltk
%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
nltk.ngrams(input_list,n=5)
# 7.02 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
n_grams(input_list,n=5)
# 7.01 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
nltk.ngrams(input_list,n=1)
nltk.ngrams(input_list,n=2)
nltk.ngrams(input_list,n=3)
nltk.ngrams(input_list,n=4)
nltk.ngrams(input_list,n=5)
# 7.32 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
range_ngrams(input_list, ngramRange=(1,6))
# 7.13 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Repost from my previous answer.


回答 11

Nltk很棒,但有时对于某些项目来说是一项开销:

import re
def tokenize(text, ngrams=1):
    text = re.sub(r'[\b\(\)\\\"\'\/\[\]\s+\,\.:\?;]', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    tokens = text.split()
    return [tuple(tokens[i:i+ngrams]) for i in xrange(len(tokens)-ngrams+1)]

使用示例:

>> text = "This is an example text"
>> tokenize(text, 2)
[('This', 'is'), ('is', 'an'), ('an', 'example'), ('example', 'text')]
>> tokenize(text, 3)
[('This', 'is', 'an'), ('is', 'an', 'example'), ('an', 'example', 'text')]

Nltk is great, but sometimes is a overhead for some projects:

import re
def tokenize(text, ngrams=1):
    text = re.sub(r'[\b\(\)\\\"\'\/\[\]\s+\,\.:\?;]', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    tokens = text.split()
    return [tuple(tokens[i:i+ngrams]) for i in xrange(len(tokens)-ngrams+1)]

Example use:

>> text = "This is an example text"
>> tokenize(text, 2)
[('This', 'is'), ('is', 'an'), ('an', 'example'), ('example', 'text')]
>> tokenize(text, 3)
[('This', 'is', 'an'), ('is', 'an', 'example'), ('an', 'example', 'text')]

回答 12

您可以使用以下代码获得所有4-6克的代码,而无需以下其他软件包:

from itertools import chain

def get_m_2_ngrams(input_list, min, max):
    for s in chain(*[get_ngrams(input_list, k) for k in range(min, max+1)]):
        yield ' '.join(s)

def get_ngrams(input_list, n):
    return zip(*[input_list[i:] for i in range(n)])

if __name__ == '__main__':
    input_list = ['I', 'am', 'aware', 'that', 'nltk', 'only', 'offers', 'bigrams', 'and', 'trigrams', ',', 'but', 'is', 'there', 'a', 'way', 'to', 'split', 'my', 'text', 'in', 'four-grams', ',', 'five-grams', 'or', 'even', 'hundred-grams']
    for s in get_m_2_ngrams(input_list, 4, 6):
        print(s)

输出如下:

I am aware that
am aware that nltk
aware that nltk only
that nltk only offers
nltk only offers bigrams
only offers bigrams and
offers bigrams and trigrams
bigrams and trigrams ,
and trigrams , but
trigrams , but is
, but is there
but is there a
is there a way
there a way to
a way to split
way to split my
to split my text
split my text in
my text in four-grams
text in four-grams ,
in four-grams , five-grams
four-grams , five-grams or
, five-grams or even
five-grams or even hundred-grams
I am aware that nltk
am aware that nltk only
aware that nltk only offers
that nltk only offers bigrams
nltk only offers bigrams and
only offers bigrams and trigrams
offers bigrams and trigrams ,
bigrams and trigrams , but
and trigrams , but is
trigrams , but is there
, but is there a
but is there a way
is there a way to
there a way to split
a way to split my
way to split my text
to split my text in
split my text in four-grams
my text in four-grams ,
text in four-grams , five-grams
in four-grams , five-grams or
four-grams , five-grams or even
, five-grams or even hundred-grams
I am aware that nltk only
am aware that nltk only offers
aware that nltk only offers bigrams
that nltk only offers bigrams and
nltk only offers bigrams and trigrams
only offers bigrams and trigrams ,
offers bigrams and trigrams , but
bigrams and trigrams , but is
and trigrams , but is there
trigrams , but is there a
, but is there a way
but is there a way to
is there a way to split
there a way to split my
a way to split my text
way to split my text in
to split my text in four-grams
split my text in four-grams ,
my text in four-grams , five-grams
text in four-grams , five-grams or
in four-grams , five-grams or even
four-grams , five-grams or even hundred-grams

您可以在此博客上找到更多详细信息

You can get all 4-6gram using the code without other package below:

from itertools import chain

def get_m_2_ngrams(input_list, min, max):
    for s in chain(*[get_ngrams(input_list, k) for k in range(min, max+1)]):
        yield ' '.join(s)

def get_ngrams(input_list, n):
    return zip(*[input_list[i:] for i in range(n)])

if __name__ == '__main__':
    input_list = ['I', 'am', 'aware', 'that', 'nltk', 'only', 'offers', 'bigrams', 'and', 'trigrams', ',', 'but', 'is', 'there', 'a', 'way', 'to', 'split', 'my', 'text', 'in', 'four-grams', ',', 'five-grams', 'or', 'even', 'hundred-grams']
    for s in get_m_2_ngrams(input_list, 4, 6):
        print(s)

the output is below:

I am aware that
am aware that nltk
aware that nltk only
that nltk only offers
nltk only offers bigrams
only offers bigrams and
offers bigrams and trigrams
bigrams and trigrams ,
and trigrams , but
trigrams , but is
, but is there
but is there a
is there a way
there a way to
a way to split
way to split my
to split my text
split my text in
my text in four-grams
text in four-grams ,
in four-grams , five-grams
four-grams , five-grams or
, five-grams or even
five-grams or even hundred-grams
I am aware that nltk
am aware that nltk only
aware that nltk only offers
that nltk only offers bigrams
nltk only offers bigrams and
only offers bigrams and trigrams
offers bigrams and trigrams ,
bigrams and trigrams , but
and trigrams , but is
trigrams , but is there
, but is there a
but is there a way
is there a way to
there a way to split
a way to split my
way to split my text
to split my text in
split my text in four-grams
my text in four-grams ,
text in four-grams , five-grams
in four-grams , five-grams or
four-grams , five-grams or even
, five-grams or even hundred-grams
I am aware that nltk only
am aware that nltk only offers
aware that nltk only offers bigrams
that nltk only offers bigrams and
nltk only offers bigrams and trigrams
only offers bigrams and trigrams ,
offers bigrams and trigrams , but
bigrams and trigrams , but is
and trigrams , but is there
trigrams , but is there a
, but is there a way
but is there a way to
is there a way to split
there a way to split my
a way to split my text
way to split my text in
to split my text in four-grams
split my text in four-grams ,
my text in four-grams , five-grams
text in four-grams , five-grams or
in four-grams , five-grams or even
four-grams , five-grams or even hundred-grams

you can find more detail on this blog


回答 13

大约七年后,这是一个更优雅的答案collections.deque

def ngrams(words, n):
    d = collections.deque(maxlen=n)
    d.extend(words[:n])
    words = words[n:]
    for window, word in zip(itertools.cycle((d,)), words):
        print(' '.join(window))
        d.append(word)

words = ['I', 'am', 'become', 'death,', 'the', 'destroyer', 'of', 'worlds']

输出:

In [15]: ngrams(words, 3)                                                                                                                                                                                                                     
I am become
am become death,
become death, the
death, the destroyer
the destroyer of

In [16]: ngrams(words, 4)                                                                                                                                                                                                                     
I am become death,
am become death, the
become death, the destroyer
death, the destroyer of

In [17]: ngrams(words, 1)                                                                                                                                                                                                                     
I
am
become
death,
the
destroyer
of

In [18]: ngrams(words, 2)                                                                                                                                                                                                                     
I am
am become
become death,
death, the
the destroyer
destroyer of

After about seven years, here’s a more elegant answer using collections.deque:

def ngrams(words, n):
    d = collections.deque(maxlen=n)
    d.extend(words[:n])
    words = words[n:]
    for window, word in zip(itertools.cycle((d,)), words):
        print(' '.join(window))
        d.append(word)

words = ['I', 'am', 'become', 'death,', 'the', 'destroyer', 'of', 'worlds']

Output:

In [15]: ngrams(words, 3)                                                                                                                                                                                                                     
I am become
am become death,
become death, the
death, the destroyer
the destroyer of

In [16]: ngrams(words, 4)                                                                                                                                                                                                                     
I am become death,
am become death, the
become death, the destroyer
death, the destroyer of

In [17]: ngrams(words, 1)                                                                                                                                                                                                                     
I
am
become
death,
the
destroyer
of

In [18]: ngrams(words, 2)                                                                                                                                                                                                                     
I am
am become
become death,
death, the
the destroyer
destroyer of

回答 14

如果您想为具有恒定内存使用量的大字符串提供纯迭代器解决方案:

from typing import Iterable  
import itertools

def ngrams_iter(input: str, ngram_size: int, token_regex=r"[^\s]+") -> Iterable[str]:
    input_iters = [ 
        map(lambda m: m.group(0), re.finditer(token_regex, input)) 
        for n in range(ngram_size) 
    ]
    # Skip first words
    for n in range(1, ngram_size): list(map(next, input_iters[n:]))  

    output_iter = itertools.starmap( 
        lambda *args: " ".join(args),  
        zip(*input_iters) 
    ) 
    return output_iter

测试:

input = "If you want a pure iterator solution for large strings with constant memory usage"
list(ngrams_iter(input, 5))

输出:

['If you want a pure',
 'you want a pure iterator',
 'want a pure iterator solution',
 'a pure iterator solution for',
 'pure iterator solution for large',
 'iterator solution for large strings',
 'solution for large strings with',
 'for large strings with constant',
 'large strings with constant memory',
 'strings with constant memory usage']

If you want a pure iterator solution for large strings with constant memory usage:

from typing import Iterable  
import itertools

def ngrams_iter(input: str, ngram_size: int, token_regex=r"[^\s]+") -> Iterable[str]:
    input_iters = [ 
        map(lambda m: m.group(0), re.finditer(token_regex, input)) 
        for n in range(ngram_size) 
    ]
    # Skip first words
    for n in range(1, ngram_size): list(map(next, input_iters[n:]))  

    output_iter = itertools.starmap( 
        lambda *args: " ".join(args),  
        zip(*input_iters) 
    ) 
    return output_iter

Test:

input = "If you want a pure iterator solution for large strings with constant memory usage"
list(ngrams_iter(input, 5))

Output:

['If you want a pure',
 'you want a pure iterator',
 'want a pure iterator solution',
 'a pure iterator solution for',
 'pure iterator solution for large',
 'iterator solution for large strings',
 'solution for large strings with',
 'for large strings with constant',
 'large strings with constant memory',
 'strings with constant memory usage']

除非分配输出,为什么调用Python字符串方法不做任何事情?

问题:除非分配输出,为什么调用Python字符串方法不做任何事情?

我尝试做一个简单的字符串替换,但是我不知道为什么它似乎不起作用:

X = "hello world"
X.replace("hello", "goodbye")

我想将单词更改hellogoodbye,因此应将字符串更改"hello world""goodbye world"。但是X仍然存在"hello world"。为什么我的代码不起作用?

I try to do a simple string replacement, but I don’t know why it doesn’t seem to work:

X = "hello world"
X.replace("hello", "goodbye")

I want to change the word hello to goodbye, thus it should change the string "hello world" to "goodbye world". But X just remains "hello world". Why is my code not working?


回答 0

这是因为字符串在Python中是不可变的

这意味着将X.replace("hello","goodbye")返回的副本,X其中包含已替换的副本。因此,您需要替换此行:

X.replace("hello", "goodbye")

用这一行:

X = X.replace("hello", "goodbye")

更广泛地说,这是所有Python字符串的方法是“就地”修改字符串的内容真实,例如replacestriptranslatelower/ upperjoin,…

如果要使用它而不要将其丢弃,则必须将其输出分配给某些东西。

X  = X.strip(' \t')
X2 = X.translate(...)
Y  = X.lower()
Z  = X.upper()
A  = X.join(':')
B  = X.capitalize()
C  = X.casefold()

等等。

This is because strings are immutable in Python.

Which means that X.replace("hello","goodbye") returns a copy of X with replacements made. Because of that you need replace this line:

X.replace("hello", "goodbye")

with this line:

X = X.replace("hello", "goodbye")

More broadly, this is true for all Python string methods that change a string’s content “in-place”, e.g. replace,strip,translate,lower/upper,join,…

You must assign their output to something if you want to use it and not throw it away, e.g.

X  = X.strip(' \t')
X2 = X.translate(...)
Y  = X.lower()
Z  = X.upper()
A  = X.join(':')
B  = X.capitalize()
C  = X.casefold()

and so on.


回答 1

所有的字符串功能lowerupperstrip而没有修改原始返回的字符串。如果您尝试修改一个字符串(可能会想到)well it is an iterable,它将失败。

x = 'hello'
x[0] = 'i' #'str' object does not support item assignment

关于字符串不可更改的重要性的文章很好读:为什么Python字符串不可更改?使用它们的最佳实践

All string functions as lower, upper, strip are returning a string without modifying the original. If you try to modify a string, as you might think well it is an iterable, it will fail.

x = 'hello'
x[0] = 'i' #'str' object does not support item assignment

There is a good reading about the importance of strings being immutable: Why are Python strings immutable? Best practices for using them