# How can I print variable and string on same line in Python?

## Question: How can I print variable and string on same line in Python?

I am using python to work out how many children would be born in 5 years if a child was born every 7 seconds. The problem is on my last line. How do I get a variable to work when I’m printing text either side of it?

Here is my code:

``````currentPop = 312032486
oneYear = 365
hours = 24
minutes = 60
seconds = 60

# seconds in a single day
secondsInDay = hours * minutes * seconds

# seconds in a year
secondsInYear = secondsInDay * oneYear

fiveYears = secondsInYear * 5

#Seconds in 5 years
print fiveYears

# fiveYears in seconds, divided by 7 seconds
births = fiveYears // 7

print "If there was a birth every 7 seconds, there would be: " births "births"
``````

## Answer 0

Use `,` to separate strings and variables while printing:

``````print("If there was a birth every 7 seconds, there would be: ", births, "births")
``````

`,` in print function separates the items by a single space:

``````>>> print("foo", "bar", "spam")
foo bar spam
``````

or better use string formatting:

``````print("If there was a birth every 7 seconds, there would be: {} births".format(births))
``````

String formatting is much more powerful and allows you to do some other things as well, like padding, fill, alignment, width, set precision, etc.

``````>>> print("{:d} {:03d} {:>20f}".format(1, 2, 1.1))
1 002             1.100000
^^^
``````

Demo:

``````>>> births = 4
>>> print("If there was a birth every 7 seconds, there would be: ", births, "births")
If there was a birth every 7 seconds, there would be:  4 births

# formatting
>>> print("If there was a birth every 7 seconds, there would be: {} births".format(births))
If there was a birth every 7 seconds, there would be: 4 births
``````
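For reference, here is a minimal Python 3 sketch of the asker's whole calculation, with the final line built in three of the ways covered in the answers to this question. The names `five_years`, `prefix`, `message_format`, `message_percent` and `message_fstring` are illustrative and not from the question, and the f-string form assumes Python 3.6 or newer.

```python
# Python 3 sketch of the question's calculation (names are illustrative).
seconds_in_day = 24 * 60 * 60
seconds_in_year = seconds_in_day * 365
five_years = seconds_in_year * 5

# One birth every 7 seconds, using integer division.
births = five_years // 7

prefix = "If there was a birth every 7 seconds, there would be: "

# Three equivalent ways to build the same line of text.
message_format = (prefix + "{} births").format(births)
message_percent = (prefix + "%d births") % births
message_fstring = f"{prefix}{births} births"

print(message_format)
```

All three strings are identical, so the choice is mostly a matter of Python version and taste.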

## Answer 1

Two more

The First one

``````>>> births = str(5)
>>> print("there are " + births + " births.")
there are 5 births.
``````

The Second One

Also the `format` (Python 2.6 and newer) method of strings is probably the standard way:

``````>>> births = str(5)
>>>
>>> print("there are {} births.".format(births))
there are 5 births.
``````

This `format` method can be used with lists as well

``````>>> format_list = ['five', 'three']
>>> # * unpacks the list:
>>> print("there are {} births and {} deaths".format(*format_list))
there are five births and three deaths
``````

or dictionaries

``````>>> format_dictionary = {'births': 'five', 'deaths': 'three'}
>>> # ** unpacks the dictionary
>>> print("there are {births} births, and {deaths} deaths".format(**format_dictionary))
there are five births, and three deaths
``````

## Answer 2

Python is a very versatile language. You may print variables by different methods. I have listed below five methods. You may use them according to your convenience.

Example:

``````a = 1
b = 'ball'
``````

Method 1:

``````print('I have %d %s' % (a, b))
``````

Method 2:

``````print('I have', a, b)
``````

Method 3:

``````print('I have {} {}'.format(a, b))
``````

Method 4:

``````print('I have ' + str(a) + ' ' + b)
``````

Method 5:

``````print(f'I have {a} {b}')
``````

The output would be:

``````I have 1 ball
``````

## Answer 3

If you want to work with python 3, it’s very simple:

``````print("If there was a birth every 7 second, there would be %d births." % (births))
``````

## Answer 4

As of python 3.6 you can use Literal String Interpolation.

``````births = 5.25487
>>> print(f'If there was a birth every 7 seconds, there would be: {births:.2f} births')
If there was a birth every 7 seconds, there would be: 5.25 births
``````

## Answer 5

You can either use the f-string or .format() methods

Using f-string

``````print(f'If there was a birth every 7 seconds, there would be: {births} births')
``````

Using .format()

``````print("If there was a birth every 7 seconds, there would be: {births} births".format(births=births))
``````

## Answer 6

You can either use a formatstring:

``````print "There are %d births" % (births,)
``````

or in this simple case:

``````print "There are ", births, "births"
``````

## Answer 7

If you are using Python 3.6 or later, an f-string is the simplest option:

``````print(f"{your_variable_name}")
``````

## Answer 8

First create a variable, for example `D = 1`. Then do the following, replacing the string with whatever you want:

``````D = 1
print("Here is a number!:",D)
``````

## Answer 9

In current Python versions (Python 3), `print` is a function, so you have to use parentheses, like so:

``````print ("If there was a birth every 7 seconds", X)
``````

## Answer 10

``````print("If there was a birth every 7 seconds, there would be: {} births".format(births))
# Will replace "{}" with births
``````

If you are doing a toy project, use:

``````print('If there was a birth every 7 seconds, there would be:', births, 'births')
``````

or

``````print('If there was a birth every 7 seconds, there would be: %d births' % births)
# Will replace %d with births
``````

## Answer 11

You can use string formatting to do this:

``````print "If there was a birth every 7 seconds, there would be: %d births" % births
``````

or you can give `print` multiple arguments, and it will automatically separate them by a space:

``````print "If there was a birth every 7 seconds, there would be:", births, "births"
``````

## Answer 12

I copied and pasted your script into a .py file. I ran it as-is with Python 2.7.10 and received the same syntax error. I also tried the script in Python 3.5 and received the following output:

``````File "print_strings_on_same_line.py", line 16
print fiveYears
^
SyntaxError: Missing parentheses in call to 'print'
``````

Then, I modified the last line where it prints the number of births as follows:

``````currentPop = 312032486
oneYear = 365
hours = 24
minutes = 60
seconds = 60

# seconds in a single day
secondsInDay = hours * minutes * seconds

# seconds in a year
secondsInYear = secondsInDay * oneYear

fiveYears = secondsInYear * 5

#Seconds in 5 years
print fiveYears

# fiveYears in seconds, divided by 7 seconds
births = fiveYears // 7

print "If there was a birth every 7 seconds, there would be: " + str(births) + " births"
``````

The output was (Python 2.7.10):

``````157680000
If there was a birth every 7 seconds, there would be: 22525714 births
``````

I hope this helps.

## Answer 13

Just use , (comma) in between.

See this code for better understanding:

``````# Weight converter pounds to kg

weight_lbs = input("Enter your weight in pounds: ")

weight_kg = 0.45 * int(weight_lbs)

print("You are ", weight_kg, " kg")
``````
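One detail of the comma form: `print` inserts its separator (a single space by default) between arguments, so string literals that already end or start with a space produce a double space. Here is a small sketch, assuming Python 3. The helper `render` is hypothetical and only captures what `print` would write so the spacing can be inspected, and the value of `weight_kg` is made up.

```python
import io

def render(*args, **kwargs):
    """Return the text print() would write for the given arguments."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()

weight_kg = 45.0

# The literals carry their own spaces, and print adds one more on each side.
doubled = render("You are ", weight_kg, " kg")

# Dropping the spaces from the literals lets the default separator do the work.
single = render("You are", weight_kg, "kg")

# Alternatively, keep the literals and switch the separator off.
no_sep = render("You are ", weight_kg, " kg", sep="")
```

Either of the last two forms gives single spacing; which one reads better depends on how many pieces are being joined.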

## Answer 14

Slightly different: Using Python 3 and print several variables in the same line:

``````print("~~Create new DB:",argv[5],"; with user:",argv[3],"; and Password:",argv[4]," ~~")
``````

## Answer 15

Python 3

Better to use the format option:

``````user_name = input("Enter your name : ")

points = 10

print("Hello, {} your point is {} : ".format(user_name, points))
``````

or convert the values to strings and concatenate them:

``````user_name = str(input("Enter your name : "))

points = 10

print("Hello, " + user_name + " your point is " + str(points))
``````

## Answer 16

Use a comma in between the strings and the variable, like this:

``````print "If there was a birth every 7 seconds, there would be: ", births, "births"
``````

# Python - Check If Word Is In A String

## Question: Python - Check If Word Is In A String

I’m working with Python v2, and I’m trying to find out if you can tell if a word is in a string.

I have found some information about identifying if the word is in the string – using .find, but is there a way to do an IF statement. I would like to have something like the following:

``````if string.find(word):
    print 'success'
``````

Thanks for any help.

## Answer 0

What is wrong with:

``````if word in mystring:
    print 'success'
``````
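The reason `if string.find(word):` misbehaves is that `str.find` returns an index, not a boolean: `-1` (truthy) when the word is absent and `0` (falsy) when it sits at the very start. Here is a short sketch in Python 3 syntax. The sample strings are made up for illustration.

```python
text = "success is near"

# find() returns the position of the first match, or -1 if there is none.
at_start = text.find("success")   # index 0, which is falsy
missing = text.find("failure")    # -1, which is truthy

# The membership test gives a real boolean and reads naturally.
has_success = "success" in text
has_failure = "failure" in text

# Note that "in" matches substrings, not whole words.
substring_hit = "ear" in text
```

If whole-word matching matters, splitting the string or using a regular expression with word boundaries would be needed instead of a plain `in` test.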

# Using multiple arguments for string formatting in Python (e.g., '%s ... %s')

## Question: Using multiple arguments for string formatting in Python (e.g., '%s ... %s')

I have a string that looks like `'%s in %s'` and I want to know how to separate the arguments so that they are two different %s. My mind, coming from Java, came up with this:

``````'%s in %s' % unicode(self.author),  unicode(self.publication)
``````

But this doesn’t work so how does it look in Python?

## Answer 0

Mark Cidade’s answer is right – you need to supply a tuple.

However from Python 2.6 onwards you can use `format` instead of `%`:

``````'{0} in {1}'.format(unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))
``````

Usage of `%` for formatting strings is no longer encouraged.

This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.

## Answer 1

If you’re using more than one argument it has to be in a tuple (note the extra parentheses):

``````'%s in %s' % (unicode(self.author),  unicode(self.publication))
``````
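Why the original attempt fails: `%` binds more tightly than the comma, so `'%s in %s' % a, b` is parsed as the tuple `('%s in %s' % a, b)` and the format string only receives one value. Here is a sketch in Python 3 syntax, with plain strings standing in for `self.author` and `self.publication` (the sample values are assumptions).

```python
author = "Eric"
publication = "Nature"

# Without parentheses the format string gets a single argument.
try:
    result = "%s in %s" % author, publication
except TypeError as exc:
    error_message = str(exc)

# With an explicit tuple both placeholders are filled.
formatted = "%s in %s" % (author, publication)
```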

As EOL points out, the `unicode()` function usually assumes ascii encoding as a default, so if you have non-ASCII characters, it’s safer to explicitly pass the encoding:

``````'%s in %s' % (unicode(self.author,'utf-8'),  unicode(self.publication,'utf-8'))
``````

And as of Python 3.0, it’s preferred to use the `str.format()` syntax instead:

``````'{0} in {1}'.format(unicode(self.author,'utf-8'),unicode(self.publication,'utf-8'))
``````

## Answer 2

### On a tuple/mapping object for multiple argument `format`

The following is excerpt from the documentation:

Given `format % values`, `%` conversion specifications in `format` are replaced with zero or more elements of `values`. The effect is similar to the using `sprintf()` in the C language.

If `format` requires a single argument, values may be a single non-tuple object. Otherwise, values must be a tuple with exactly the number of items specified by the `format` string, or a single mapping object (for example, a dictionary).

### On `str.format` instead of `%`

A newer alternative to `%` operator is to use `str.format`. Here’s an excerpt from the documentation:

`str.format(*args, **kwargs)`

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces `{}`. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

This method is the new standard in Python 3.0, and should be preferred to `%` formatting.

### Examples

Here are some usage examples:

``````>>> '%s for %s' % ("tit", "tat")
tit for tat

>>> '{} and {}'.format("chicken", "waffles")
chicken and waffles

>>> '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
Bond, James Bond

>>> '{last}, {first} {last}'.format(first="James", last="Bond")
Bond, James Bond
``````

## Answer 3

You must just put the values into parentheses:

``````'%s in %s' % (unicode(self.author),  unicode(self.publication))
``````

Here, for the first `%s` the `unicode(self.author)` will be placed. And for the second `%s`, the `unicode(self.publication)` will be used.

Note: You should favor `string formatting` over the `%` Notation. More info here

## Answer 4

There is a significant problem with some of the answers posted so far: `unicode()` decodes from the default encoding, which is often ASCII; in fact, `unicode()` tries to make “sense” of the bytes it is given by converting them into characters. Thus, the following code, which is essentially what is recommended by previous answers, fails on my machine:

``````# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))
``````

gives:

``````Traceback (most recent call last):
File "test.py", line 3, in <module>
print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
``````

The failure comes from the fact that `author` does not contain only ASCII bytes (i.e. with values in [0; 127]), and `unicode()` decodes from ASCII by default (on many machines).

A robust solution is to explicitly give the encoding used in your fields; taking UTF-8 as an example:

``````u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))
``````

(or without the initial `u`, depending on whether you want a Unicode result or a byte string).

At this point, one might want to consider having the `author` and `publication` fields be Unicode strings, instead of decoding them during formatting.

## Answer 5

For python2 you can also do this

``````'%(author)s in %(publication)s'%{'author':unicode(self.author),
'publication':unicode(self.publication)}
``````

which is handy if you have a lot of arguments to substitute (particularly if you are doing internationalisation)

Python2.6 onwards supports `.format()`

``````'{author} in {publication}'.format(author=self.author,
publication=self.publication)
``````

## Answer 6

You could also use it clean and simple (but wrong! because you should use `format` like Mark Byers said) by doing:

``````print 'This is my %s formatted with %d arguments' % ('string', 2)
``````

## Answer 7

For completeness: Python 3.6 introduced f-strings in PEP 498. These strings make it possible to embed expressions inside string literals, using a minimal syntax.

That would mean that for your example you could also use:

``````f'{self.author} in {self.publication}'
``````

# Case insensitive replace

## Question: Case insensitive replace

What’s the easiest way to do a case-insensitive string replacement in Python?

## Answer 0

The `string` type doesn’t support this. You’re probably best off using the regular expression sub method with the re.IGNORECASE option.

``````>>> import re
>>> insensitive_hippo = re.compile(re.escape('hippo'), re.IGNORECASE)
>>> insensitive_hippo.sub('giraffe', 'I want a hIPpo for my birthday')
'I want a giraffe for my birthday'
``````

## Answer 1

``````import re
pattern = re.compile("hello", re.IGNORECASE)
pattern.sub("bye", "hello HeLLo HELLO")
# 'bye bye bye'
``````

## Answer 2

In a single line:

``````import re
re.sub("(?i)hello","bye", "hello HeLLo HELLO") #'bye bye bye'
re.sub(r"(?i)he\.llo","bye", "he.llo He.LLo HE.LLO") #'bye bye bye'
``````

Or, use the optional “flags” argument:

``````import re
re.sub("hello", "bye", "hello HeLLo HELLO", flags=re.I) #'bye bye bye'
re.sub(r"he\.llo", "bye", "he.llo He.LLo HE.LLO", flags=re.I) #'bye bye bye'
``````

## Answer 3

Continuing on bFloch’s answer, this function will change not one, but all occurrences of old with new – in a case insensitive fashion.

``````def ireplace(old, new, text):
    idx = 0
    while idx < len(text):
        index_l = text.lower().find(old.lower(), idx)
        if index_l == -1:
            return text
        text = text[:index_l] + new + text[index_l + len(old):]
        idx = index_l + len(new)
    return text
``````

## Answer 4

Like Blair Conrad says string.replace doesn’t support this.

Use the regex `re.sub`, but remember to escape the replacement string first. Note that there’s no flags-option in 2.6 for `re.sub`, so you’ll have to use the embedded modifier `'(?i)'` (or a RE-object, see Blair Conrad’s answer). Also, another pitfall is that sub will process backslash escapes in the replacement text, if a string is given. To avoid this one can instead pass in a lambda.

Here’s a function:

``````import re
def ireplace(old, repl, text):
    return re.sub('(?i)'+re.escape(old), lambda m: repl, text)

>>> ireplace('hippo?', 'giraffe!?', 'You want a hiPPO?')
'You want a giraffe!?'
>>> ireplace(r'[binfolder]', r'C:\Temp\bin', r'[BinFolder]\test.exe')
'C:\\Temp\\bin\\test.exe'
``````

## Answer 5

This function uses both the `str.replace()` and `re.findall()` functions. It will replace all occurrences of `pattern` in `string` with `repl` in a case-insensitive way.

``````import re

def replace_all(pattern, repl, string) -> str:
    occurrences = re.findall(pattern, string, re.IGNORECASE)
    for occurrence in occurrences:
        string = string.replace(occurrence, repl)
    return string
``````

## Answer 6

This doesn't require regular expressions (note that it replaces only the first occurrence):

``````def ireplace(old, new, text):
    """
    Replace case insensitive
    """
    index_l = text.lower().index(old.lower())
    return text[:index_l] + new + text[index_l + len(old):]
``````

## Answer 7

An interesting observation about syntax details and options:

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32

``````import re
old = "TREEROOT treeroot TREerOot"
re.sub(r'(?i)treeroot', 'grassroot', old)
``````

‘grassroot grassroot grassroot’

``````re.sub(r'treeroot', 'grassroot', old)
``````

‘TREEROOT grassroot TREerOot’

``````re.sub(r'treeroot', 'grassroot', old, flags=re.I)
``````

‘grassroot grassroot grassroot’

``````re.sub(r'treeroot', 'grassroot', old, re.I)
``````

‘TREEROOT grassroot TREerOot’

So the (?i) prefix in the match expression or adding “flags=re.I” as a fourth argument will result in a case-insensitive match. BUT, using just “re.I” as the fourth argument does not result in case-insensitive match.

For comparison,

``````re.findall(r'treeroot', old, re.I)
``````

[‘TREEROOT’, ‘treeroot’, ‘TREerOot’]

``````re.findall(r'treeroot', old)
``````

[‘treeroot’]
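The difference comes from the signature `re.sub(pattern, repl, string, count=0, flags=0)`: the fourth positional argument is `count`, not `flags`. Since `re.I` has the integer value 2, passing it positionally just limits the number of replacements to two while matching stays case-sensitive. `re.findall(pattern, string, flags=0)` has no `count` parameter, which is why the positional flag works there. A sketch making that explicit (the variable names `flag_value`, `as_count` and `as_flags` are illustrative):

```python
import re

old = "TREEROOT treeroot TREerOot"

# re.I is an integer flag, so it is a valid (if unintended) value for count.
flag_value = int(re.I)

# Equivalent to re.sub(r'treeroot', 'grassroot', old, re.I):
# at most two replacements, still case-sensitive.
as_count = re.sub(r"treeroot", "grassroot", old, count=flag_value)

# Passing the flag by keyword enables case-insensitive matching.
as_flags = re.sub(r"treeroot", "grassroot", old, flags=re.I)
```

Passing `flags` by keyword avoids the mix-up entirely.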

## Answer 8

I was seeing `\t` converted into a tab character in the replacement text, and noticed that `re.sub` processes backslash escapes in the replacement string.

To prevent that, I wrote the following case-insensitive replace:

``````import re
def ireplace(findtxt, replacetxt, data):
    return replacetxt.join(  re.compile(findtxt, flags=re.I).split(data)  )
``````

Also, if you do want escape sequences to be interpreted, as in the other answers here where backslash sequences with a special meaning are converted, just decode your find and/or replace string. In Python 3, you might have to do something like `.decode("unicode_escape")`.

``````findtxt = findtxt.decode('string_escape') # python2
replacetxt = replacetxt.decode('string_escape') # python2
data = ireplace(findtxt, replacetxt, data)
``````

Tested in Python 2.7.8

Hope that helps.

## Answer 9

I have never posted an answer before and this thread is really old, but I came up with another solution and figured I could get your response. I'm not seasoned in Python programming, so if there are apparent drawbacks to it, please point them out, since it's good learning :)

``````i='I want a hIPpo for my birthday'
key='hippo'
swp='giraffe'

o=(i.lower().split(key))
c=0
p=0
for w in o:
    o[c]=i[p:p+len(w)]
    p=p+len(key+w)
    c+=1
print(swp.join(o))
``````

# How to check if variable is string with python 2 and 3 compatibility

## Question: How to check if variable is string with python 2 and 3 compatibility

I’m aware that I can use: `isinstance(x, str)` in python-3.x but I need to check if something is a string in python-2.x as well. Will `isinstance(x, str)` work as expected in python-2.x? Or will I need to check the version and use `isinstance(x, basestring)`?

Specifically, in python-2.x:

``````>>>isinstance(u"test", str)
False
``````

and python-3.x does not have `u"foo"`

## Answer 0

If you’re writing 2.x-and-3.x-compatible code, you’ll probably want to use six:

``````from six import string_types
isinstance(s, string_types)
``````
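If adding a dependency is not an option, the same idea can be sketched by hand. This is a stand-in written for illustration rather than the `six` source: it defines a tuple named `string_types` that holds `basestring` on Python 2 and `str` on Python 3, plus a hypothetical helper `is_string`.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: text is always str.
    string_types = (str,)
else:
    # Python 2: basestring is the common base of str and unicode.
    string_types = (basestring,)  # noqa: F821 (only evaluated on Python 2)

def is_string(value):
    """Return True if value is a text string on either major version."""
    return isinstance(value, string_types)
```

The version check runs once at import time, so `is_string` itself is a plain `isinstance` call.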

## Answer 1

The most terse approach I’ve found without relying on packages like six, is:

``````try:
    basestring
except NameError:
    basestring = str
``````

then, assuming you’ve been checking for strings in Python 2 in the most generic manner,

``````isinstance(s, basestring)
``````

will now also work for Python 3+.

## Answer 2

``````isinstance(x, ("".__class__, u"".__class__))
``````

## Answer 3

This is @Lev Levitsky’s answer, re-written a bit.

``````try:
    isinstance("", basestring)
    def isstr(s):
        return isinstance(s, basestring)
except NameError:
    def isstr(s):
        return isinstance(s, str)
``````

The `try`/`except` test is done once, and then defines a function that always works and is as fast as possible.

EDIT: Actually, we don’t even need to call `isinstance()`; we just need to evaluate `basestring` and see if we get a `NameError`:

``````try:
    basestring  # attempt to evaluate basestring
    def isstr(s):
        return isinstance(s, basestring)
except NameError:
    def isstr(s):
        return isinstance(s, str)
``````

I think it is easier to follow with the call to `isinstance()`, though.

## Answer 4

The `future` library adds (to Python 2) compatible names, so you can continue writing Python 3. You can simply do the following:

``````from builtins import str
isinstance(x, str)
``````

To install it, just execute `pip install future`.

As a caveat, it only supports `python>=2.6`,`>=3.3`, but it is more modern than `six`, which is only recommended if using `python 2.5`.

## Answer 5

Maybe use a workaround like

``````def isstr(s):
    try:
        return isinstance(s, basestring)
    except NameError:
        return isinstance(s, str)
``````

## Answer 6

You can get the class of an object by calling `object.__class__`, so in order to check if object is the default string type:

``````    isinstance(object,"".__class__)
``````

You can also place the following at the top of your code so that strings enclosed in quotes are Unicode in Python 2:

``````    from __future__ import unicode_literals
``````

## Answer 7

You can try this at the beginning of your code:

``````from __future__ import print_function
import sys
if sys.version[0] == "2":
    py3 = False
else:
    py3 = True
if py3:
    basstring = str
else:
    basstring = basestring
``````

and later in the code:

``````anystring = "test"
# anystring = 1
if isinstance(anystring, basstring):
    print("This is a string")
else:
    print("No string")
``````

## Answer 8

Be careful! In python 2, `str` and `bytes` are essentially the same. This can cause a bug if you are trying to distinguish between the two.

``````>>> size = 5
>>> byte_arr = bytes(size)
>>> isinstance(byte_arr, bytes)
True
>>> isinstance(byte_arr, str)
True
``````

## Answer 9

``type(string) == str``

returns `True` if it's a string, and `False` if not.

# Remove characters except digits from string using Python?

## Question: Remove characters except digits from string using Python?

How can I remove all characters except numbers from string?

## 回答 0

In Python 2.*, by far the fastest approach is the `.translate` method:

``````>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>>
``````

`string.maketrans` makes a translation table (a string of length 256) which in this case is the same as `''.join(chr(x) for x in range(256))` (just faster to make;-). `.translate` applies the translation table (which here is irrelevant since `all` essentially means identity) AND deletes characters present in the second argument — the key part.

`.translate` works very differently on Unicode strings (and strings in Python 3 — I do wish questions specified which major-release of Python is of interest!) — not quite this simple, not quite this fast, though still quite usable.

Back to 2.*, the performance difference is impressive…:

``````$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop
``````

Speeding things up by 7-8 times is hardly peanuts, so the `translate` method is well worth knowing and using. The other popular non-RE approach…:

``````$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop
``````

is 50% slower than RE, so the `.translate` approach beats it by over an order of magnitude.

In Python 3, or for Unicode, you need to pass `.translate` a mapping (with ordinals, not characters directly, as keys) that returns `None` for what you want to delete. Here’s a convenient way to express this for deletion of “everything but” a few characters:

``````import string

class Del:
    def __init__(self, keep=string.digits):
        self.comp = dict((ord(c),c) for c in keep)
    def __getitem__(self, k):
        return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)
``````

also emits `'1233344554552'`. However, putting this in xx.py we have…:

``````$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop
``````

…which shows that, for this kind of “deletion” task, the performance advantage disappears and becomes a performance decrease.

## Answer 1

Use `re.sub`, like so:

``````>>> import re
>>> re.sub(r'\D', '', 'aas30dsa20')
'3020'
``````

`\D` matches any non-digit character, so the code above essentially replaces every non-digit character with the empty string.

Or you can use `filter`, like so (in Python 2):

``````>>> filter(str.isdigit, 'aas30dsa20')
'3020'
``````

Since `filter` returns an iterator in Python 3 (rather than a string, as in the Python 2 example above), you can use the following instead:

``````>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'
``````
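
One caveat is worth a small sketch, assuming Python 3 where `str` is Unicode: `str.isdigit` (and `\d` in a `str` pattern) accepts more than the ASCII digits `0` to `9`. If only ASCII digits should survive, an explicit character class is one way to say so. The sample string below is made up for illustration.

```python
import re

# ASCII digits, an Arabic-Indic digit three (U+0663) and a superscript two (U+00B2)
s = "a1\u0663b2\u00b2"

# str.isdigit() is True for every character Unicode treats as a digit.
unicode_digits = "".join(filter(str.isdigit, s))

# An explicit character class keeps only the ASCII digits 0-9.
ascii_digits = re.sub(r"[^0-9]", "", s)

print(unicode_digits)
print(ascii_digits)
```

For plain ASCII input both approaches give the same result, so this only matters when the text may contain other scripts or superscripts.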

## Answer 2

``````s=''.join(i for i in s if i.isdigit())
``````

Another generator variant.

## Answer 3

You can use filter:

``````filter(lambda x: x.isdigit(), "dasdasd2313dsa")
``````

In Python 3 you have to join the result (kinda ugly :( )

``````''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))
``````

## Answer 4

Along the lines of bayer’s answer:

``````''.join(i for i in s if i.isdigit())
``````

## Answer 5

You can easily do it using Regex

``````>>> import re
>>> re.sub(r"\D","","£70,000")
'70000'
``````

## Answer 6

``````x.translate(None, string.digits)
``````

will delete all digits from the string (this two-argument form of `translate` works on Python 2 `str` objects only). To delete letters and keep the digits, do this:

``````x.translate(None, string.letters)
``````

## Answer 7

The OP mentions in the comments that he wants to keep the decimal point. This can be done with the `re.sub` method (as per the second and IMHO best answer) by explicitly listing the characters to keep, e.g.

``````>>> re.sub(r"[^0123456789.]","","poo123.4and5fish")
'123.45'
``````

## Answer 8

A fast version for Python 3:

``````# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)
``````

Here’s a performance comparison vs. regex:

``````$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop
``````

So it’s a little bit more than 3 times faster than regex, for me. It’s also faster than `class Del` above, because `defaultdict` does all its lookups in C, rather than (slow) Python. Here’s that version on my same system, for comparison.

``````$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
``````

## Answer 9

Use a generator expression:

``````>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")
``````

## Answer 10

Ugly but works:

``````>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>
``````

## Answer 11

``````$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
``````

100000 loops, best of 3: 2.48 usec per loop

``````$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
``````

100000 loops, best of 3: 2.02 usec per loop

``````$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
``````

100000 loops, best of 3: 2.37 usec per loop

``````$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
``````

100000 loops, best of 3: 1.97 usec per loop

I observed that `join` over `re.findall` is faster than `re.sub`. Note that the `findall` pattern used here, `[a-z]+`, collects the letters; `[0-9]+` would collect the digits instead.

## Answer 12

You can read each character and, if it is a digit, include it in the answer. The `str.isdigit()` method tells you whether a character is a digit.

``````your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'
``````

## Answer 13

Not a one-liner but very simple:

``````buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    if char.isdigit():  # keep only the digits
        buffer += char

print( buffer )
``````

## Answer 14

I used this. `'letters'` should contain all the characters that you want to get rid of:

`Output = Input.translate({ord(i): None for i in 'letters'})`

Example:

``````import string

Input = "I would like 20 dollars for that suit"
# remove every ASCII letter and the spaces between the words
Output = Input.translate({ord(i): None for i in string.ascii_letters + ' '})
print(Output)
``````

Output: `20`

# Find column whose name contains a specific string

## Question: Find column whose name contains a specific string

I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I’m searching for `'spike'` in column names like `'spike-2'`, `'hey spike'`, `'spiked-in'` (the `'spike'` part is always continuous).

I want the column name to be returned as a string or a variable, so I access the column later with `df['name']` or `df[name]` as normal. I’ve tried to find ways to do this, to no avail. Any tips?

## Answer 0

Just iterate over `DataFrame.columns`. In this example you end up with a list of the column names that match:

``````import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)
``````

Output:

``````['hey spke', 'no', 'spike-2', 'spiked-in']
['spike-2', 'spiked-in']
``````

Explanation:

1. `df.columns` returns the column labels (a pandas `Index`, which you can iterate over like a list)
2. `[col for col in df.columns if 'spike' in col]` iterates over the list `df.columns` with the variable `col` and adds it to the resulting list if `col` contains `'spike'`. This syntax is list comprehension.

If you only want the resulting data set with the columns that match you can do this:

``````df2 = df.filter(regex='spike')
print(df2)
``````

Output:

``````   spike-2  spiked-in
0        1          7
1        2          8
2        3          9
``````
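
Since the question asks for the name itself as a string, here is a plain-Python sketch of picking out the first match. The helper name `first_matching` is hypothetical, a plain list stands in for `df.columns`, and all column names are assumed to be strings.

```python
def first_matching(names, needle):
    """Return the first name that contains needle, or None if there is none."""
    return next((name for name in names if needle in name), None)


# A plain list stands in for df.columns here.
columns = ['spike-2', 'hey spke', 'spiked-in', 'no']
name = first_matching(columns, 'spike')
print(name)
```

The returned string can then be used as `df[name]`. Checking for `None` first avoids a confusing lookup error when nothing matches.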

## Answer 1

This answer uses the DataFrame.filter method to do this without list comprehension:

``````import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)

print(df.filter(like='spike').columns)
``````

Will output just ‘spike-2’. You can also use regex, as some people suggested in comments above:

``````print(df.filter(regex='spike|spke').columns)
``````

Will output both columns: [‘spike-2’, ‘hey spke’]

## Answer 2

You can also use `df.columns[df.columns.str.contains(pat = 'spike')]`

``````data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

colNames = df.columns[df.columns.str.contains(pat = 'spike')]

print(colNames)
``````

This will output the column names: `'spike-2', 'spiked-in'`

## Answer 3

``````# select columns containing 'spike'
df.filter(like='spike', axis=1)
``````

You can also select by exact name (`items=`) or by regular expression (`regex=`). Refer to the `pandas.DataFrame.filter` documentation.

## Answer 4

``````df.loc[:,df.columns.str.contains("spike")]
``````

## Answer 5

You can also use this code:

``````spike_cols =[x for x in df.columns[df.columns.str.contains('spike')]]
``````

## Answer 6

Getting name and subsetting based on Start, Contains, and Ends:

``````# from: https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
# from: https://cmdlinetips.com/2019/04/how-to-select-columns-using-prefix-suffix-of-column-names-in-pandas/
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html

import pandas as pd

data = {'spike_starts': [1,2,3], 'ends_spike_starts': [4,5,6], 'ends_spike': [7,8,9], 'not': [10,11,12]}
df = pd.DataFrame(data)

print("\n")
print("----------------------------------------")
colNames_contains = df.columns[df.columns.str.contains(pat = 'spike')].tolist()
print("Contains")
print(colNames_contains)

print("\n")
print("----------------------------------------")
colNames_starts = df.columns[df.columns.str.contains(pat = '^spike')].tolist()
print("Starts")
print(colNames_starts)

print("\n")
print("----------------------------------------")
colNames_ends = df.columns[df.columns.str.contains(pat = 'spike$')].tolist()
print("Ends")
print(colNames_ends)

print("\n")
print("----------------------------------------")
df_subset_start = df.filter(regex='^spike',axis=1)
print("Starts")
print(df_subset_start)

print("\n")
print("----------------------------------------")
df_subset_contains = df.filter(regex='spike',axis=1)
print("Contains")
print(df_subset_contains)

print("\n")
print("----------------------------------------")
df_subset_ends = df.filter(regex='spike$',axis=1)
print("Ends")
print(df_subset_ends)
``````

# Python TypeError: not enough arguments for format string

## Question: Python TypeError: not enough arguments for format string

Here’s the output. These are utf-8 strings I believe… some of these can be NoneType but it fails immediately, before ones like that…

``````instr = "'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % softname, procversion, int(percent), exe, description, company, procurl
``````

TypeError: not enough arguments for format string

It’s 7 for 7 though?

## Answer 0

Note that the `%` syntax for formatting strings is becoming outdated. If your version of Python supports it, you should write:

``````instr = "'{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}'".format(softname, procversion, int(percent), exe, description, company, procurl)
``````

This also fixes the error that you happened to have.

## Answer 1

You need to put the format arguments into a tuple (add parentheses):

``````instr = "'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % (softname, procversion, int(percent), exe, description, company, procurl)
``````

What you currently have is equivalent to the following:

``````instr = ("'%s', '%s', '%d', '%s', '%s', '%s', '%s'" % softname), procversion, int(percent), exe, description, company, procurl
``````

Example:

``````>>> "%s %s" % 'hello', 'world'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> "%s %s" % ('hello', 'world')
'hello world'
``````
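
The grouping described above can be checked directly. This is a small sketch assuming Python 3; the variable names are illustrative only.

```python
# Without parentheses, % is applied first and the comma then builds a tuple.
pair = "%s" % "hello", "world"

# With two placeholders, the same grouping leaves % with a single argument.
try:
    "%s %s" % "hello", "world"
    error = None
except TypeError as exc:
    error = str(exc)

# Parenthesising the arguments passes them to % as one tuple.
fixed = "%s %s" % ("hello", "world")

print(pair)
print(error)
print(fixed)
```

The first line shows why no syntax error is raised: the expression is still valid Python, it just means something different from what was intended.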

## Answer 2

I got the same error when using `%` as a percent character in my format string. The solution to this is to double up the `%%`.
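
A short sketch of the doubled percent sign, assuming Python 3; the `percent` value and the message text are made up for illustration.

```python
percent = 50

# "%%" produces a literal percent sign in the result.
ok = "%d%% done" % percent

# A single "%" is parsed as the start of another conversion,
# so formatting fails instead of printing a percent sign.
try:
    "%d% done" % percent
    raised = False
except (TypeError, ValueError):
    raised = True

print(ok)
print(raised)
```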

# n-grams in python, four, five, six grams?

## Question: n-grams in python, four, five, six grams?

I’m looking for a way to split a text into n-grams. Normally I would do something like:

``````import nltk
from nltk import bigrams
string = "I really like python, it's pretty awesome."
string_bigrams = bigrams(string)
print string_bigrams
``````

I am aware that nltk only offers bigrams and trigrams, but is there a way to split my text in four-grams, five-grams or even hundred-grams?

Thanks!

## Answer 0

Great native python based answers given by other users. But here’s the `nltk` approach (just in case the OP gets penalized for reinventing what already exists in the `nltk` library).

There is an ngram module in `nltk` that people seldom use. It’s not because it’s hard to read ngrams, but because training a model based on ngrams where n > 3 results in a lot of data sparsity.

``````from nltk import ngrams

sentence = 'this is a foo bar sentences and i want to ngramize it'

n = 6
sixgrams = ngrams(sentence.split(), n)

for grams in sixgrams:
    print(grams)
``````

## Answer 1

I’m surprised that this hasn’t shown up yet:

``````In [34]: sentence = "I really like python, it's pretty awesome.".split()

In [35]: N = 4

In [36]: grams = [sentence[i:i+N] for i in xrange(len(sentence)-N+1)]

In [37]: for gram in grams: print gram
['I', 'really', 'like', 'python,']
['really', 'like', 'python,', "it's"]
['like', 'python,', "it's", 'pretty']
['python,', "it's", 'pretty', 'awesome.']
``````
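
The same sliding-window idea as a reusable function, sketched for Python 3 (where `xrange` and the `print` statement no longer exist). The function name `word_ngrams` is hypothetical.

```python
def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I really like python, it's pretty awesome.".split()
fourgrams = word_ngrams(sentence, 4)
for gram in fourgrams:
    print(gram)
```

If `n` is larger than the number of tokens, the range is empty and the function simply returns an empty list.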

## Answer 2

Using only nltk tools

``````from nltk.tokenize import word_tokenize
from nltk.util import ngrams

def get_ngrams(text, n ):
    n_grams = ngrams(word_tokenize(text), n)
    return [ ' '.join(grams) for grams in n_grams]
``````

Example output

``````get_ngrams('This is the simplest text i could think of', 3 )

['This is the', 'is the simplest', 'the simplest text', 'simplest text i', 'text i could', 'i could think', 'could think of']
``````

In order to keep the ngrams in array format just remove `' '.join`

## Answer 3

Here is another simple way to do n-grams:

``````>>> import nltk
>>> from nltk.util import ngrams
>>> text = "I am aware that nltk only offers bigrams and trigrams, but is there a way to split my text in four-grams, five-grams or even hundred-grams"
>>> tokenize = nltk.word_tokenize(text)
>>> tokenize
['I', 'am', 'aware', 'that', 'nltk', 'only', 'offers', 'bigrams', 'and', 'trigrams', ',', 'but', 'is', 'there', 'a', 'way', 'to', 'split', 'my', 'text', 'in', 'four-grams', ',', 'five-grams', 'or', 'even', 'hundred-grams']
>>> bigrams = ngrams(tokenize,2)
>>> bigrams
[('I', 'am'), ('am', 'aware'), ('aware', 'that'), ('that', 'nltk'), ('nltk', 'only'), ('only', 'offers'), ('offers', 'bigrams'), ('bigrams', 'and'), ('and', 'trigrams'), ('trigrams', ','), (',', 'but'), ('but', 'is'), ('is', 'there'), ('there', 'a'), ('a', 'way'), ('way', 'to'), ('to', 'split'), ('split', 'my'), ('my', 'text'), ('text', 'in'), ('in', 'four-grams'), ('four-grams', ','), (',', 'five-grams'), ('five-grams', 'or'), ('or', 'even'), ('even', 'hundred-grams')]
>>> trigrams=ngrams(tokenize,3)
>>> trigrams
[('I', 'am', 'aware'), ('am', 'aware', 'that'), ('aware', 'that', 'nltk'), ('that', 'nltk', 'only'), ('nltk', 'only', 'offers'), ('only', 'offers', 'bigrams'), ('offers', 'bigrams', 'and'), ('bigrams', 'and', 'trigrams'), ('and', 'trigrams', ','), ('trigrams', ',', 'but'), (',', 'but', 'is'), ('but', 'is', 'there'), ('is', 'there', 'a'), ('there', 'a', 'way'), ('a', 'way', 'to'), ('way', 'to', 'split'), ('to', 'split', 'my'), ('split', 'my', 'text'), ('my', 'text', 'in'), ('text', 'in', 'four-grams'), ('in', 'four-grams', ','), ('four-grams', ',', 'five-grams'), (',', 'five-grams', 'or'), ('five-grams', 'or', 'even'), ('or', 'even', 'hundred-grams')]
>>> fourgrams=ngrams(tokenize,4)
>>> fourgrams
[('I', 'am', 'aware', 'that'), ('am', 'aware', 'that', 'nltk'), ('aware', 'that', 'nltk', 'only'), ('that', 'nltk', 'only', 'offers'), ('nltk', 'only', 'offers', 'bigrams'), ('only', 'offers', 'bigrams', 'and'), ('offers', 'bigrams', 'and', 'trigrams'), ('bigrams', 'and', 'trigrams', ','), ('and', 'trigrams', ',', 'but'), ('trigrams', ',', 'but', 'is'), (',', 'but', 'is', 'there'), ('but', 'is', 'there', 'a'), ('is', 'there', 'a', 'way'), ('there', 'a', 'way', 'to'), ('a', 'way', 'to', 'split'), ('way', 'to', 'split', 'my'), ('to', 'split', 'my', 'text'), ('split', 'my', 'text', 'in'), ('my', 'text', 'in', 'four-grams'), ('text', 'in', 'four-grams', ','), ('in', 'four-grams', ',', 'five-grams'), ('four-grams', ',', 'five-grams', 'or'), (',', 'five-grams', 'or', 'even'), ('five-grams', 'or', 'even', 'hundred-grams')]
``````

## Answer 4

People have already answered pretty nicely for the scenario where you need bigrams or trigrams, but if you need every n-gram of the sentence, you can use `nltk.util.everygrams`:

``````>>> from nltk.util import everygrams

>>> message = "who let the dogs out"

>>> msg_split = message.split()

>>> list(everygrams(msg_split))
[('who',), ('let',), ('the',), ('dogs',), ('out',), ('who', 'let'), ('let', 'the'), ('the', 'dogs'), ('dogs', 'out'), ('who', 'let', 'the'), ('let', 'the', 'dogs'), ('the', 'dogs', 'out'), ('who', 'let', 'the', 'dogs'), ('let', 'the', 'dogs', 'out'), ('who', 'let', 'the', 'dogs', 'out')]
``````

In case you have a limit, as with trigrams where the max length should be 3, you can use the `max_len` param to specify it.

``````>>> list(everygrams(msg_split, max_len=2))
[('who',), ('let',), ('the',), ('dogs',), ('out',), ('who', 'let'), ('let', 'the'), ('the', 'dogs'), ('dogs', 'out')]
``````

You can just modify the `max_len` param to get n-grams up to whatever length you need, i.e. four-grams, five-grams, six-grams or even hundred-grams.

The previously mentioned solutions can be modified to do the same, but this solution is much more straightforward.

And when you just need a specific gram like bigram or trigram etc., you can use `nltk.util.ngrams` as mentioned in M.A.Hassan’s answer.

## Answer 5

You can easily whip up your own function to do this using `itertools`:

``````from itertools import izip, islice, tee
s = 'spam and eggs'
N = 3
trigrams = izip(*(islice(seq, index, None) for index, seq in enumerate(tee(s, N))))
list(trigrams)
# [('s', 'p', 'a'), ('p', 'a', 'm'), ('a', 'm', ' '),
# ('m', ' ', 'a'), (' ', 'a', 'n'), ('a', 'n', 'd'),
# ('n', 'd', ' '), ('d', ' ', 'e'), (' ', 'e', 'g'),
# ('e', 'g', 'g'), ('g', 'g', 's')]
``````
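
`itertools.izip` exists only in Python 2. A sketch of the same technique for Python 3, where the built-in `zip` is already lazy, might look like this; the function name `iter_ngrams` is hypothetical.

```python
from itertools import islice, tee


def iter_ngrams(iterable, n):
    """Lazily yield n-grams by zipping n staggered copies of the input."""
    copies = tee(iterable, n)
    # The copy at position i skips its first i items, so zip lines up
    # each item with the n - 1 items that follow it.
    return zip(*(islice(it, i, None) for i, it in enumerate(copies)))


trigrams = list(iter_ngrams('spam and eggs', 3))
print(trigrams[:3])
```

Because it works on any iterable, the same function yields word n-grams when given `text.split()` instead of a string.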

## Answer 6

A more elegant approach to build bigrams with python’s builtin `zip()`. Simply convert the original string into a list by `split()`, then pass the list once normally and once offset by one element.

``````
string = "I really like python, it's pretty awesome."

def find_bigrams(s):
    input_list = s.split(" ")
    # Pair each word with its successor; list() is needed on Python 3,
    # where zip() returns a lazy iterator
    return list(zip(input_list, input_list[1:]))

def find_ngrams(s, n):
    input_list = s.split(" ")
    # Zip n copies of the list, the i-th one offset by i elements
    return list(zip(*[input_list[i:] for i in range(n)]))

find_bigrams(string)

[('I', 'really'), ('really', 'like'), ('like', 'python,'), ('python,', "it's"), ("it's", 'pretty'), ('pretty', 'awesome.')]
``````
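
The offset trick generalizes beyond bigrams. As a small self-contained illustration (a sketch that restates `find_ngrams` so it runs on its own under Python 3), here it is producing trigrams:

```python
def find_ngrams(s, n):
    """Return the word n-grams of s as a list of tuples."""
    words = s.split()
    # zip() stops at the shortest slice, so len(words) - n + 1 tuples come out
    return list(zip(*[words[i:] for i in range(n)]))

trigrams = find_ngrams("I really like python", 3)
print(trigrams)
# [('I', 'really', 'like'), ('really', 'like', 'python')]
```

If the text has fewer than `n` words, the shortest slice is empty and the result is an empty list.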

## Answer 7

I have never dealt with NLTK, but I implemented N-grams as part of a small class project. If you want to find the frequency of all N-grams occurring in a string, here is a way to do that. `D` gives you a histogram of the N-word sequences.

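``````
N = 2  # size of the N-grams to count
D = dict()
string = 'whatever string...'
strparts = string.split()
# +1 so that the last N-gram is included
for i in range(len(strparts) - N + 1):
    gram = tuple(strparts[i:i+N])
    try:
        D[gram] += 1
    except KeyError:  # first occurrence of this N-gram
        D[gram] = 1
``````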
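
The same histogram can be built with `collections.Counter` from the standard library, which removes the need for the `try`/`except`. This is a sketch; the function name `ngram_counts` and the sample sentence are made up for illustration.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count how often each word n-gram occurs in text."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

counts = ngram_counts("the cat and the cat and the dog", 2)
print(counts[("the", "cat")])  # 2
print(counts[("the", "dog")])  # 1
```

`Counter.most_common(k)` then returns the `k` most frequent n-grams.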

## Answer 8

Four-grams are already supported in NLTK. Here is a piece of code that can help you with this:

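``````
import nltk
from nltk.collocations import QuadgramCollocationFinder

text = "I do not like green eggs and ham, I do not like them Sam I am!"
# Tokenize the text first
tokens = nltk.wordpunct_tokenize(text)
# Build the finder that the loop below iterates over
fourgrams = QuadgramCollocationFinder.from_words(tokens)
for fourgram, freq in fourgrams.ngram_fd.items():
    print(fourgram, freq)
``````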

I hope it helps.

## Answer 9

You can use `sklearn.feature_extraction.text.CountVectorizer`:

``````
import sklearn.feature_extraction.text # FYI http://scikit-learn.org/stable/install.html

ngram_size = 4
string = ["I really like python, it's pretty awesome."]
vect = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(ngram_size,ngram_size))
vect.fit(string)
# On scikit-learn 1.2 and later, call get_feature_names_out() instead
print('{1}-grams: {0}'.format(vect.get_feature_names(), ngram_size))
``````

outputs:

``````4-grams: [u'like python it pretty', u'python it pretty awesome', u'really like python it']
``````

You can set `ngram_size` to any positive integer, i.e. you can split a text into four-grams, five-grams or even hundred-grams.

## Answer 10

If efficiency is an issue and you have to build n-grams of several different sizes (up to a hundred, as you say), but you want to use pure Python, I would do:

``````
from itertools import chain

def n_grams(seq, n=1):
    """Returns an iterator over the n-grams given a listTokens"""
    shiftToken = lambda i: (el for j,el in enumerate(seq) if j>=i)
    shiftedTokens = (shiftToken(i) for i in range(n))
    tupleNGrams = zip(*shiftedTokens)
    return tupleNGrams # if join in generator : (" ".join(i) for i in tupleNGrams)

def range_ngrams(listTokens, ngramRange=(1,2)):
    """Returns an iterator over all n-grams for n in range(ngramRange) given a listTokens."""
    return chain(*(n_grams(listTokens, i) for i in range(*ngramRange)))
``````

Usage:

``````
>>> input_list = 'test the ngrams generator'.split()
>>> # the upper bound is exclusive, as with range(), so (1,4) yields 1- to 3-grams
>>> list(range_ngrams(input_list, ngramRange=(1,4)))
[('test',), ('the',), ('ngrams',), ('generator',), ('test', 'the'), ('the', 'ngrams'), ('ngrams', 'generator'), ('test', 'the', 'ngrams'), ('the', 'ngrams', 'generator')]
``````

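Roughly the same speed as NLTK (note that these timings only cover creating the lazy iterators, which are never consumed):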

``````import nltk
%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
nltk.ngrams(input_list,n=5)
# 7.02 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
n_grams(input_list,n=5)
# 7.01 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
nltk.ngrams(input_list,n=1)
nltk.ngrams(input_list,n=2)
nltk.ngrams(input_list,n=3)
nltk.ngrams(input_list,n=4)
nltk.ngrams(input_list,n=5)
# 7.32 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input_list = 'test the ngrams interator vs nltk '*10**6
range_ngrams(input_list, ngramRange=(1,6))
# 7.13 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
``````
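
The benchmark above builds the iterators without consuming them. To time the n-gram generation itself, the iterator has to be exhausted. The sketch below shows one way to do that with the standard `timeit` module; the `consume` helper and the shortened token list are assumptions made for illustration, and no timings are claimed here.

```python
import timeit
from collections import deque

def n_grams(seq, n=1):
    """Lazily yield n-grams of seq as tuples (same idea as the function above)."""
    def shifted(i):
        # Elements of seq from position i onwards
        return (el for j, el in enumerate(seq) if j >= i)
    return zip(*(shifted(i) for i in range(n)))

def consume(iterator):
    """Exhaust an iterator without storing its items."""
    deque(iterator, maxlen=0)

tokens = "test the ngrams iterator vs nltk".split() * 200  # 1200 tokens

# Creating the iterator does no n-gram work; consuming it does.
create_time = timeit.timeit(lambda: n_grams(tokens, 5), number=50)
consume_time = timeit.timeit(lambda: consume(n_grams(tokens, 5)), number=50)
print(f"create only: {create_time:.4f} s, create and consume: {consume_time:.4f} s")
```

The same `consume` wrapper can be put around any other n-gram iterator to compare implementations on equal terms.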

## Answer 11

NLTK is great, but sometimes it adds unnecessary overhead to a project:

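``````
import re

def tokenize(text, ngrams=1):
    # Replace punctuation and whitespace characters with spaces
    text = re.sub(r'[\b\(\)\\\"\'\/\[\]\s+\,\.:\?;]', ' ', text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    tokens = text.split()
    # range() here; the Python 2 original used xrange()
    return [tuple(tokens[i:i+ngrams]) for i in range(len(tokens)-ngrams+1)]
``````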

Example use:

``````
>>> text = "This is an example text"
>>> tokenize(text, 2)
[('This', 'is'), ('is', 'an'), ('an', 'example'), ('example', 'text')]
>>> tokenize(text, 3)
[('This', 'is', 'an'), ('is', 'an', 'example'), ('an', 'example', 'text')]
``````

## Answer 12

You can get all 4- to 6-grams with the code below, without any other packages:

``````
from itertools import chain

def get_m_2_ngrams(input_list, min_n, max_n):
    # Chain the n-grams for every n from min_n to max_n (inclusive)
    for s in chain(*[get_ngrams(input_list, k) for k in range(min_n, max_n+1)]):
        yield ' '.join(s)

def get_ngrams(input_list, n):
    return zip(*[input_list[i:] for i in range(n)])

if __name__ == '__main__':
    input_list = ['I', 'am', 'aware', 'that', 'nltk', 'only', 'offers', 'bigrams', 'and', 'trigrams', ',', 'but', 'is', 'there', 'a', 'way', 'to', 'split', 'my', 'text', 'in', 'four-grams', ',', 'five-grams', 'or', 'even', 'hundred-grams']
    for s in get_m_2_ngrams(input_list, 4, 6):
        print(s)
``````

The output is:

``````I am aware that
am aware that nltk
aware that nltk only
that nltk only offers
nltk only offers bigrams
only offers bigrams and
offers bigrams and trigrams
bigrams and trigrams ,
and trigrams , but
trigrams , but is
, but is there
but is there a
is there a way
there a way to
a way to split
way to split my
to split my text
split my text in
my text in four-grams
text in four-grams ,
in four-grams , five-grams
four-grams , five-grams or
, five-grams or even
five-grams or even hundred-grams
I am aware that nltk
am aware that nltk only
aware that nltk only offers
that nltk only offers bigrams
nltk only offers bigrams and
only offers bigrams and trigrams
offers bigrams and trigrams ,
bigrams and trigrams , but
and trigrams , but is
trigrams , but is there
, but is there a
but is there a way
is there a way to
there a way to split
a way to split my
way to split my text
to split my text in
split my text in four-grams
my text in four-grams ,
text in four-grams , five-grams
in four-grams , five-grams or
four-grams , five-grams or even
, five-grams or even hundred-grams
I am aware that nltk only
am aware that nltk only offers
aware that nltk only offers bigrams
that nltk only offers bigrams and
nltk only offers bigrams and trigrams
only offers bigrams and trigrams ,
offers bigrams and trigrams , but
bigrams and trigrams , but is
and trigrams , but is there
trigrams , but is there a
, but is there a way
but is there a way to
is there a way to split
there a way to split my
a way to split my text
way to split my text in
to split my text in four-grams
split my text in four-grams ,
my text in four-grams , five-grams
text in four-grams , five-grams or
in four-grams , five-grams or even
four-grams , five-grams or even hundred-grams
``````

You can find more detail on this blog.

## Answer 13

After about seven years, here’s a more elegant answer using `collections.deque`:

``````
import collections

def ngrams(words, n):
    # Sliding window that holds at most n words
    d = collections.deque(words[:n], maxlen=n)
    if len(d) == n:
        print(' '.join(d))
    for word in words[n:]:
        # Appending to a full deque drops its oldest word
        d.append(word)
        print(' '.join(d))

words = ['I', 'am', 'become', 'death,', 'the', 'destroyer', 'of', 'worlds']
``````

Output:

``````
In [15]: ngrams(words, 3)
I am become
am become death,
become death, the
death, the destroyer
the destroyer of
destroyer of worlds

In [16]: ngrams(words, 4)
I am become death,
am become death, the
become death, the destroyer
death, the destroyer of
the destroyer of worlds

In [17]: ngrams(words, 1)
I
am
become
death,
the
destroyer
of
worlds

In [18]: ngrams(words, 2)
I am
am become
become death,
death, the
the destroyer
destroyer of
of worlds
``````
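
If the n-grams are needed as values rather than printed, the same deque window can be wrapped in a generator. This is a sketch of that variant; the name `iter_ngrams` is hypothetical.

```python
from collections import deque

def iter_ngrams(words, n):
    """Yield each n-gram of words as a tuple, using a sliding deque window."""
    window = deque(maxlen=n)
    for word in words:
        # Once the window is full, appending drops the oldest word
        window.append(word)
        if len(window) == n:
            yield tuple(window)

words = ['I', 'am', 'become', 'death,', 'the', 'destroyer', 'of', 'worlds']
trigrams = list(iter_ngrams(words, 3))
print(trigrams[-1])
# ('destroyer', 'of', 'worlds')
```

Because it only ever iterates forward, this version also accepts arbitrary iterables such as file objects or other generators.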

## Answer 14

If you want a pure iterator solution for large strings with constant memory usage:

``````
import itertools
import re
from typing import Iterable

def ngrams_iter(input: str, ngram_size: int, token_regex=r"[^\s]+") -> Iterable[str]:
    # One lazy token iterator per position in the n-gram
    input_iters = [
        map(lambda m: m.group(0), re.finditer(token_regex, input))
        for n in range(ngram_size)
    ]
    # Skip first words: the k-th iterator ends up advanced by k tokens
    for n in range(1, ngram_size): list(map(next, input_iters[n:]))

    output_iter = itertools.starmap(
        lambda *args: " ".join(args),
        zip(*input_iters)
    )
    return output_iter
``````

Test:

``````input = "If you want a pure iterator solution for large strings with constant memory usage"
list(ngrams_iter(input, 5))
``````

Output:

``````['If you want a pure',
'you want a pure iterator',
'want a pure iterator solution',
'a pure iterator solution for',
'pure iterator solution for large',
'iterator solution for large strings',
'solution for large strings with',
'for large strings with constant',
'large strings with constant memory',
'strings with constant memory usage']
``````

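# Why doesn’t calling a Python string method do anything unless you assign its output?

## Question: Why doesn’t calling a Python string method do anything unless you assign its output?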

I am trying to do a simple string replacement, but I don’t know why it doesn’t seem to work:

``````X = "hello world"
X.replace("hello", "goodbye")
``````

I want to change the word `hello` to `goodbye`, so the string `"hello world"` should become `"goodbye world"`. But `X` just remains `"hello world"`. Why is my code not working?

## Answer 0

This is because strings are immutable in Python.

This means that `X.replace("hello","goodbye")` returns a copy of `X` with the replacements made. Because of that, you need to replace this line:

``````X.replace("hello", "goodbye")
``````

with this line:

``````X = X.replace("hello", "goodbye")
``````

More broadly, this is true for all Python string methods that appear to change a string’s content “in-place”, e.g. `replace`, `strip`, `translate`, `lower`/`upper`, `join`, …

You must assign their output to something if you want to use it rather than throw it away, e.g.

``````X  = X.strip(' \t')
X2 = X.translate(...)
Y  = X.lower()
Z  = X.upper()
A  = X.join(':')
B  = X.capitalize()
C  = X.casefold()
``````

and so on.
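
A short demonstration of the difference, using only built-in string behaviour (the variable names are arbitrary):

```python
x = "hello world"

# Calling the method alone builds a new string and discards it immediately
x.replace("hello", "goodbye")
print(x)  # hello world

# Binding the result to a name keeps it
y = x.replace("hello", "goodbye")
print(y)  # goodbye world
print(x)  # hello world (the original object is untouched)
```

Writing `x = x.replace(...)` simply rebinds the name `x` to the new string; the old string object itself is never modified.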

## Answer 1

String methods such as `lower`, `upper`, and `strip` all return a new string without modifying the original. If you try to modify a string in place, perhaps reasoning that it is an iterable after all, it will fail.

``````x = 'hello'
x[0] = 'i' #'str' object does not support item assignment
``````
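
Since item assignment is not available, the usual workaround is to build a new string. Two common ways are sketched below, with arbitrary example values:

```python
x = "hello"

# Slicing and concatenation build a new string with the first character replaced
y = "j" + x[1:]
print(y)  # jello

# For several edits, a list of characters is mutable and can be joined back
chars = list(x)
chars[0] = "c"
chars[-1] = "a"
z = "".join(chars)
print(z)  # cella

print(x)  # hello (unchanged throughout)
```

Both approaches leave `x` as it was and produce separate string objects.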

There is a good read on why it matters that strings are immutable: “Why are Python strings immutable? Best practices for using them”.