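Tag archive: strip

Removing trailing newlines from the elements of a string list

Question: Removing trailing newlines from the elements of a string list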

I have to take a large list of words in the form:

['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

and then using the strip function, turn it into:

['this', 'is', 'a', 'list', 'of', 'words']

I thought that what I had written would work, but I keep getting an error saying:

"'list' object has no attribute 'strip'"

Here is the code that I tried:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

Answer 0

>>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> list(map(str.strip, my_list))  # list() is needed on Python 3, where map returns an iterator
['this', 'is', 'a', 'list', 'of', 'words']

Answer 1

Use a list comprehension: [x.strip() for x in lst]


Answer 2

You can use list comprehensions:

strip_list = [item.strip() for item in lines]

Or the map function (on Python 3, wrap it in list(), since map returns an iterator):

# with a lambda
strip_list = list(map(lambda it: it.strip(), lines))

# without a lambda
strip_list = list(map(str.strip, lines))

Answer 3

This can be done using list comprehensions, as defined in PEP 202:

[w.strip() for w in ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]

Answer 4

All the other answers, mainly about list comprehensions, are great. But just to explain your error:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

a is a member of your list, not an index. What you could write is this:

[...]
for a in lines:
    strip_list.append(a.strip())
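
A minimal sketch of the difference between iterating over elements and iterating over indices. The sample list is the one from the question; enumerate is used here only for illustration and is not part of the original answer.

```python
lines = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

# Iterating over a list yields its elements, not their positions.
by_element = []
for line in lines:
    by_element.append(line.strip())

# If the position is really needed, enumerate() provides index and element together.
by_index = []
for i, line in enumerate(lines):
    by_index.append(lines[i].strip())

print(by_element)
print(by_element == by_index)
```

Both loops build the same list; the first form is usually preferred because the index is never needed.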

Another important comment: you can create a list pre-filled with default values this way:

strip_list = [0] * 20

But this is not so useful here, as .append adds items to the end of your list. In your case, there is no point in creating a list with default values, as you'll build it item by item when appending the stripped strings.

So your code should be like:

strip_list = []
for a in lines:
    strip_list.append(a.strip())

But, for sure, the best option is this one, as it does exactly the same thing:

stripped = [line.strip() for line in lines]

In case you have something more complicated than just a .strip, put it in a function and do the same. That's the most readable way to work with lists.


Answer 5

If you need to remove just trailing whitespace, you could use str.rstrip(), which should be slightly more efficient than str.strip():

>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']
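
Note that rstrip() with no argument removes every kind of trailing whitespace, not only the newline. If trailing spaces or tabs need to be kept, passing the characters to remove is one option. A small sketch with made-up sample data:

```python
lst = ['this  \n', 'is\t\n', 'a\n']

# rstrip() with no argument strips all trailing whitespace.
all_trailing = [x.rstrip() for x in lst]

# rstrip('\n') strips only trailing newline characters.
newline_only = [x.rstrip('\n') for x in lst]

print(all_trailing)
print(newline_only)
```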

Answer 6

my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])

Output:

['this', 'is', 'a', 'list', 'of', 'words']

How can I strip the first and last double quotes?

Question: How can I strip the first and last double quotes?

I want to strip double quotes from:

string = '"" " " ""\\1" " "" ""'

to obtain:

string = '" " " ""\\1" " "" "'

I tried to use rstrip, lstrip and strip('[^\"]|[\"$]') but it did not work.

How can I do this?


Answer 0

If the quotes you want to strip are always going to be “first and last” as you said, then you could simply use:

string = string[1:-1]


Answer 1

If you can’t assume that all the strings you process have double quotes you can use something like this:

if string.startswith('"') and string.endswith('"'):
    string = string[1:-1]

Edit:

I'm sure that you just used string as the variable name for the example here and that your real code uses a more descriptive name, but I feel obliged to warn you that there is a module named string in the standard library. It's not loaded automatically, but if you ever use import string, make sure your variable doesn't shadow it.


Answer 2

To remove the first and last characters, and in each case do the removal only if the character in question is a double quote:

import re

s = re.sub(r'^"|"$', '', s)

Note that the RE pattern is different than the one you had given, and the operation is sub (“substitute”) with an empty replacement string (strip is a string method but does something pretty different from your requirements, as other answers have indicated).
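
As a rough illustration, the substitution can be applied to the string from the question. The helper name strip_outer_double_quotes is made up for this sketch.

```python
import re

def strip_outer_double_quotes(s):
    # Hypothetical helper: removes one leading and one trailing double quote, if present.
    return re.sub(r'^"|"$', '', s)

sample = '"" " " ""\\1" " "" ""'
result = strip_outer_double_quotes(sample)
print(result)
```

Unlike plain slicing, this leaves strings without surrounding quotes unchanged, and it also removes a quote that appears on one side only.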


Answer 3

IMPORTANT: I’m extending the question/answer to strip either single or double quotes. And I interpret the question to mean that BOTH quotes must be present, and matching, to perform the strip. Otherwise, the string is returned unchanged.

To “dequote” a string representation, that might have either single or double quotes around it (this is an extension of @tgray’s answer):

def dequote(s):
    """
    If a string has single or double quotes around it, remove them.
    Make sure the pair of quotes match.
    If a matching pair of quotes is not found, return the string unchanged.
    """
    if (s[0] == s[-1]) and s.startswith(("'", '"')):
        return s[1:-1]
    return s

Explanation:

startswith can take a tuple, to match any of several alternatives. The reason for the DOUBLED parentheses (( and )) is so that we pass ONE parameter ("'", '"') to startswith(), to specify the permitted prefixes, rather than TWO parameters "'" and '"', which would be interpreted as a prefix and an (invalid) start position.

s[-1] is the last character in the string.

Testing:

print( dequote("\"he\"l'lo\"") )
print( dequote("'he\"l'lo'") )
print( dequote("he\"l'lo") )
print( dequote("'he\"l'lo\"") )

=>

he"l'lo
he"l'lo
he"l'lo
'he"l'lo"

(For me, regular expressions are non-obvious to read, so I didn't try to extend @Alex's answer.)


Answer 4

If string is always as you show:

string[1:-1]

Answer 5

Almost done. Quoting from http://docs.python.org/library/stdtypes.html?highlight=strip#str.strip

The chars argument is a string specifying the set of characters to be removed.

[…]

The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:

So the argument is not a regexp.

>>> string = '"" " " ""\\1" " "" ""'
>>> string.strip('"')
' " " ""\\1" " "" '
>>> 

Note that this is not exactly what you requested, because it eats multiple quotes from both ends of the string!
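
A short sketch of what "set of characters" means in practice. The first line is the example from the Python documentation for str.strip; the conditional slice at the end is one possible way to remove at most one quote per end and is an assumption of this sketch, not part of the answer.

```python
# The argument to strip() is a set of characters, not a substring or a pattern.
domain = 'www.example.com'.strip('cmowz.')

# Repeating a character in the argument changes nothing.
quoted = '""Hello World""'
same_a = quoted.strip('"')
same_b = quoted.strip('""')

# To remove at most one quote from each end, slice conditionally instead.
s = '"" " " ""\\1" " "" ""'
one_each = s[1:-1] if len(s) >= 2 and s[0] == s[-1] == '"' else s

print(domain, same_a, same_b, one_each, sep='\n')
```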


Answer 6

If you are sure there is a " at the beginning and at the end, which you want to remove, just do:

string = string[1:len(string)-1]

or

string = string[1:-1]

Answer 7

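Remove a given character from the start and end of a string. Note that strip treats its argument as a set of characters, so strip('""') behaves the same as strip('"'):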

s = '""Hello World""'
s.strip('""')

> 'Hello World'

Answer 8

I have some code that needs to strip single or double quotes, and I can’t simply ast.literal_eval it.

if len(arg) > 1 and arg[0] in ('"\'') and arg[-1] == arg[0]:
    arg = arg[1:-1]

This is similar to ToolmakerSteve’s answer, but it allows 0 length strings, and doesn’t turn the single character " into an empty string.
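
Wrapped in a function, the check above might look like the following sketch. The name strip_matching_quotes is hypothetical; the condition is the one from this answer.

```python
def strip_matching_quotes(arg):
    """Remove one pair of matching single or double quotes, if present."""
    # len(arg) > 1 keeps the empty string and a lone quote character unchanged.
    if len(arg) > 1 and arg[0] in '"\'' and arg[-1] == arg[0]:
        return arg[1:-1]
    return arg

for sample in ['', '"', '"a"', "'a'", "'a\"", '""']:
    print(repr(sample), '->', repr(strip_matching_quotes(sample)))
```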


Answer 9

In your example you could use strip, but you have to include the space in the characters to strip:

string = '"" " " ""\\1" " "" ""'
string.strip('" ')  # output '\\1'

Note that the quotes and the doubled backslash in the output are just Python's standard string representation.

The value of your variable is '\\1', that is, a backslash followed by 1.


Answer 10

The function below strips surrounding whitespace and returns the string without its quotes. If there are no quotes, it returns the same string (stripped).

import re

def removeQuote(s):
    # s is used instead of str to avoid shadowing the built-in type
    s = s.strip()
    if re.search("^['\"].*['\"]$", s):
        s = s[1:-1]
        print("Removed Quotes", s)
    else:
        print("Same String", s)
    return s

Answer 11

Starting in Python 3.9, you can use removeprefix and removesuffix:

'"" " " ""\\1" " "" ""'.removeprefix('"').removesuffix('"')
# '" " " ""\\1" " "" "'
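
On interpreters older than Python 3.9 the same behavior can be approximated with startswith and endswith. The helper names remove_prefix and remove_suffix below are hypothetical stand-ins, not standard library functions.

```python
def remove_prefix(text, prefix):
    # Hypothetical fallback for str.removeprefix on Python < 3.9.
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text

def remove_suffix(text, suffix):
    # Hypothetical fallback for str.removesuffix on Python < 3.9.
    # The empty-suffix check matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

s = '"" " " ""\\1" " "" ""'
result = remove_suffix(remove_prefix(s, '"'), '"')
print(result)
```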

Answer 12

Find the positions of the first and the last " in your string:

>>> s = '"" " " ""\\1" " "" ""'
>>> l = s.find('"')
>>> r = s.rfind('"')

>>> s[l+1:r]
'" " " ""\\1" " "" "'

How to remove all spaces from a string

Question: How to remove all spaces from a string

How do I strip all the spaces in a python string? For example, I want a string like strip my spaces to be turned into stripmyspaces, but I cannot seem to accomplish that with strip():

>>> 'strip my spaces'.strip()
'strip my spaces'

Answer 0

Taking advantage of str.split’s behavior with no sep parameter:

>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'

If you just want to remove spaces instead of all whitespace:

>>> s.replace(" ", "")
'\tfoo\nbar'

Premature optimization

Even though efficiency isn’t the primary goal—writing clear code is—here are some initial timings:

$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop

Note the regex is cached, so it’s not as slow as you’d imagine. Compiling it beforehand helps some, but would only matter in practice if you call this many times:

$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop

Even though re.sub is 11.3x slower, remember your bottlenecks are assuredly elsewhere. Most programs would not notice the difference between any of these 3 choices.


Answer 1

>>> import re
>>> re.sub(r'\s+', '', 'strip my spaces')
'stripmyspaces'

For Python 3:

>>> import re
>>> re.sub(r'\s+', '', 'strip my \n\t\r ASCII and \u00A0 \u2003 Unicode spaces')
'stripmyASCIIandUnicodespaces'
>>> # Or, depending on the situation:
>>> re.sub(r'(\s|\u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', \
... '\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n')
'stripallkindsofwhitespace'

…handles any whitespace characters that you’re not thinking of – and believe us, there are plenty.

\s on its own always covers the ASCII whitespace:

  • (regular) space
  • tab
  • new line (\n)
  • carriage return (\r)
  • form feed
  • vertical tab

Additionally:

  • for Python 2 with re.UNICODE enabled,
  • for Python 3 without any extra actions,

\s also covers the Unicode whitespace characters, for example:

  • non-breaking space,
  • em space,
  • ideographic space,

…etc. See the full list here, under “Unicode characters with White_Space property”.

However \s DOES NOT cover characters not classified as whitespace, which are de facto whitespace, such as among others:

  • zero-width joiner,
  • Mongolian vowel separator,
  • zero-width non-breaking space (a.k.a. byte order mark),

…etc. See the full list here, under “Related Unicode characters without White_Space property”.

So these 6 characters are covered by the list in the second regex, \u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF.
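
A small check of the distinction described above, assuming Python 3 and a made-up sample string containing a no-break space (U+00A0) and a zero-width space (U+200B):

```python
import re

text = 'a\u00a0b\u200bc'  # no-break space, then zero-width space

# In Python 3, \s matches the no-break space but not the zero-width space.
only_s = re.sub(r'\s+', '', text)

# Adding the extra code points from the second regex removes both.
extended = re.sub(r'(\s|\u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', text)

print(ascii(only_s))
print(ascii(extended))
```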



Answer 2

Alternatively, in Python 2 (after import string):

"strip my spaces".translate(None, string.whitespace)

And here is the Python 3 version:

"strip my spaces".translate(str.maketrans('', '', string.whitespace))

Answer 3

The simplest is to use replace:

"foo bar\t".replace(" ", "").replace("\t", "")

Alternatively, use a regular expression:

import re
re.sub(r"\s", "", "foo bar\t")

Answer 4

Remove leading spaces in Python

string1="    This is Test String to strip leading space"
print(string1)
print(string1.lstrip())

Remove trailing spaces in Python

string2="This is Test String to strip trailing space     "
print(string2)
print(string2.rstrip())

Remove whitespace from the beginning and end of the string in Python

string3="    This is Test String to strip leading and trailing space      "
print(string3)
print(string3.strip())

Remove all the spaces in Python

string4="   This is Test String to test all the spaces        "
print(string4)
print(string4.replace(" ", ""))

Answer 5

Try a regex with re.sub. You can search for all whitespace and replace with an empty string.

\s in your pattern will match whitespace characters – and not just a space (tabs, newlines, etc). You can read more about it in the manual.


Answer 6

import re
re.sub(' ', '', 'strip my spaces')

Answer 7

As mentioned by Roger Pate, the following code worked for me:

s = " \t foo \n bar "
"".join(s.split())
'foobar'

I am using Jupyter Notebook to run the following code:

i=0
ProductList=[]
while i < len(new_list): 
   temp=''                            # new_list[i]=temp=' Plain   Utthapam  '
   #temp=new_list[i].strip()          #if we want o/p as: 'Plain Utthapam'
   temp="".join(new_list[i].split())  #o/p: 'PlainUtthapam' 
   temp=temp.upper()                  #o/p:'PLAINUTTHAPAM' 
   ProductList.append(temp)
   i=i+2

Answer 8

The standard techniques to filter a list apply, although they are not as efficient as the split/join or translate methods.

We need a set of whitespaces:

>>> import string
>>> ws = set(string.whitespace)

The filter builtin:

>>> "".join(filter(lambda c: c not in ws, "strip my spaces"))
'stripmyspaces'

A list comprehension (yes, use the brackets: see benchmark below):

>>> import string
>>> "".join([c for c in "strip my spaces" if c not in ws])
'stripmyspaces'

A fold:

>>> import functools
>>> "".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces"))
'stripmyspaces'

Benchmark:

>>> from timeit import timeit
>>> timeit('"".join("strip my spaces".split())')
0.17734256500003198
>>> timeit('"strip my spaces".translate(ws_dict)', 'import string; ws_dict = {ord(ws):None for ws in string.whitespace}')
0.457635745999994
>>> timeit('re.sub(r"\s+", "", "strip my spaces")', 'import re')
1.017787621000025

>>> SETUP = 'import string, operator, functools, itertools; ws = set(string.whitespace)'
>>> timeit('"".join([c for c in "strip my spaces" if c not in ws])', SETUP)
0.6484303600000203
>>> timeit('"".join(c for c in "strip my spaces" if c not in ws)', SETUP)
0.950212219999969
>>> timeit('"".join(filter(lambda c: c not in ws, "strip my spaces"))', SETUP)
1.3164566040000523
>>> timeit('"".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces"))', SETUP)
1.6947649049999995

Answer 9

TL/DR

This solution was tested using Python 3.6

To strip all spaces from a string in Python3 you can use the following function:

def remove_spaces(in_string: str):
    return in_string.translate(str.maketrans({' ': ''}))

To remove any whitespace characters (‘ \t\n\r\x0b\x0c’) you can use the following function:

import string
def remove_whitespace(in_string: str):
    return in_string.translate(str.maketrans(dict.fromkeys(string.whitespace)))

Explanation

Python's str.translate is a built-in method of str; it takes a table and returns a copy of the string with each character mapped through the passed translation table. Full documentation for str.translate

To create the translation table, str.maketrans is used. This is a static method of str. Here we use it with only one parameter, in this case a dictionary whose keys are the characters to be replaced and whose values are their replacements. It returns a translation table for use with str.translate. Full documentation for str.maketrans

The string module in Python contains some common string operations and constants. string.whitespace is a constant containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab. Full documentation for string

In the second function dict.fromkeys is used to create a dictionary where the keys are the characters in the string returned by string.whitespace each with value None. Full documentation for dict.fromkeys
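
A quick sketch of what the translation table looks like and how the dictionary form compares with the three-argument form of str.maketrans. The name remove_whitespace_alt is introduced here only for comparison.

```python
import string

def remove_whitespace(in_string: str) -> str:
    # Dictionary form: every whitespace character maps to None, meaning "delete".
    return in_string.translate(str.maketrans(dict.fromkeys(string.whitespace)))

def remove_whitespace_alt(in_string: str) -> str:
    # Three-argument form: the third argument lists the characters to delete.
    return in_string.translate(str.maketrans('', '', string.whitespace))

# maketrans converts single-character keys to their code points.
table = str.maketrans({' ': None})
print(table)

print(remove_whitespace(' \t foo \n bar '))
print(remove_whitespace_alt(' \t foo \n bar '))
```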


Answer 10

If optimal performance is not a requirement and you just want something dead simple, you can define a basic function to test each character using the string class’s built in “isspace” method:

def remove_space(input_string):
    no_white_space = ''
    for c in input_string:
        if not c.isspace():
            no_white_space += c
    return no_white_space

Building the no_white_space string this way will not have ideal performance, but the solution is easy to understand.

>>> remove_space('strip my spaces')
'stripmyspaces'

If you don’t want to define a function, you can convert this into something vaguely similar with list comprehension. Borrowing from the top answer’s join solution:

>>> "".join([c for c in "strip my spaces" if not c.isspace()])
'stripmyspaces'

Split by comma and strip whitespace in Python

Question: Split by comma and strip whitespace in Python

I have some python code that splits on comma, but doesn’t strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(',')
>>> print mylist
['blah', ' lots  ', '  of ', '  spaces', ' here ']

I would rather end up with whitespace removed like this:

['blah', 'lots', 'of', 'spaces', 'here']

I am aware that I could loop through the list and strip() each item but, as this is Python, I’m guessing there’s a quicker, easier and more elegant way of doing it.


Answer 0

Use list comprehension — simpler, and just as easy to read as a for loop.

my_string = "blah, lots  ,  of ,  spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]

See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.
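
If the input may contain empty fields (for example two commas in a row), the comprehension can also filter them out. This goes slightly beyond the question; the helper name split_and_strip is hypothetical.

```python
def split_and_strip(text, sep=','):
    # Hypothetical helper: split on sep, strip each part, drop parts that end up empty.
    stripped = (part.strip() for part in text.split(sep))
    return [part for part in stripped if part]

result = split_and_strip("blah, lots  ,  of ,  spaces, here ")
with_gaps = split_and_strip("a, ,b,,c ")

print(result)
print(with_gaps)
```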


Answer 1

Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.

>>> import re
>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']

This works even if ^\s+ doesn’t match:

>>> string = "foo,   bar  "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
>>>

Here’s why you need ^\s+:

>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile(r"\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['  blah', 'lots', 'of', 'spaces', 'here']

See the leading spaces in blah?

Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.


Answer 2

I came to add:

map(str.strip, string.split(','))

but saw it had already been mentioned by Jason Orendorff in a comment.

Reading Glenn Maynard’s comment in the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

So a quick (possibly flawed?) test on my box applying the three methods in a loop revealed:

[word.strip() for word in string.split(',')]
$ time ./list_comprehension.py 
real    0m22.876s

map(lambda s: s.strip(), string.split(','))
$ time ./map_with_lambda.py 
real    0m25.736s

map(str.strip, string.split(','))
$ time ./map_with_str.strip.py 
real    0m19.428s

making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.

Edit:

Python 2.6.5 on Ubuntu 10.04


Answer 3

Just remove the white space from the string before you split it.

mylist = my_string.replace(' ','').split(',')
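
One caveat worth keeping in mind: replace(' ', '') also removes spaces inside the items. A small comparison with made-up sample data:

```python
my_string = "New York , Los Angeles,  San Francisco "

# Removing every space before splitting also glues multi-word items together.
replaced = my_string.replace(' ', '').split(',')

# Stripping after splitting keeps the inner spaces.
stripped = [item.strip() for item in my_string.split(',')]

print(replaced)
print(stripped)
```

When the items never contain inner spaces, as in the question's example, both approaches give the same result.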

Answer 4

I know this has already been answered, but if you end up doing this a lot, regular expressions may be a better way to go:

>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']

The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub


Answer 5

import re
result = [x for x in re.split(',| ', your_string) if x != '']

This works fine for me.


Answer 6

re (as in regular expressions) allows splitting on multiple characters at once:

$ string = "blah, lots  ,  of ,  spaces, here "
$ re.split(', ',string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']

This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can use re.split's ability to split on regex patterns to get a "split-on-this-or-that" effect.

$ re.split('[, ]',string)
['blah',
 '',
 'lots',
 '',
 '',
 '',
 '',
 'of',
 '',
 '',
 '',
 'spaces',
 '',
 'here',
 '']

Unfortunately, that’s ugly, but a filter will do the trick:

$ list(filter(None, re.split('[, ]',string)))  # list() is needed on Python 3, where filter returns an iterator
['blah', 'lots', 'of', 'spaces', 'here']

Voila!


Answer 7

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s:s.strip(), string.split(','))


Answer 8

s = 'bla, buu, jii'

sp = s.split(',')
for st in sp:
    # strip() removes the space left over after each comma
    print(st.strip())

Answer 9

import re
# Inside [...] the characters | and + are literal, so they are left out of the class here.
mylist = re.split(r'\s*[,\s]\s*', string.strip())

Simply put: split on a comma or a whitespace character, with or without surrounding whitespace.

Please try!


Answer 10

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping.
Or for the whole thing at once:

map(lambda s:s.strip(), string.split(','))

That’s basically everything you need.


How do I trim whitespace?

Question: How do I trim whitespace?

Is there a Python function that will trim whitespace (spaces and tabs) from a string?

Example: \t example string\t should become example string


Answer 0

Whitespace on both sides:

s = "  \t a string example\t  "
s = s.strip()

Whitespace on the right side:

s = s.rstrip()

Whitespace on the left side:

s = s.lstrip()

As thedz points out, you can provide an argument to strip arbitrary characters to any of these functions like this:

s = s.strip(' \t\n\r')

This will strip any space, \t, \n, or \r characters from the left-hand side, right-hand side, or both sides of the string.
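
As a small illustration of the character argument (a sketch with made-up sample strings, not part of the original answer): the argument is treated as a set of characters, not as a prefix or suffix, and every leading or trailing character found in that set is removed.

```python
s = "\t  example string \r\n"

both = s.strip(" \t\n\r")    # remove from both ends
left = s.lstrip(" \t\n\r")   # remove from the left end only
right = s.rstrip(" \t\n\r")  # remove from the right end only

# The argument is a set of characters: any mix of '-' and '=' at the ends goes.
dashed = "--==title==--".strip("-=")

print(repr(both), repr(left), repr(right), repr(dashed))
```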

The examples above only remove characters from the left-hand and right-hand sides of strings. If you also want to remove characters from the middle of a string, try re.sub:

import re
print(re.sub(r'\s+', '', s))

That should print out:

astringexample

Answer 1

Python’s trim method is called strip:

str.strip() #trim
str.lstrip() #ltrim
str.rstrip() #rtrim

Answer 2

For leading and trailing whitespace:

s = '   foo    \t   '
print(s.strip())  # prints "foo"

To also remove whitespace inside the string, a regular expression works:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print(pat.sub('', s))  # prints "foobar"

Answer 3

You can also use a very simple, basic function, str.replace(), for spaces and tabs. Each call removes only the character you pass in, so tabs need their own replace("\t", ""):

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "\tabcde\tfgh\tijkl"

>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace("\t", ""))
abcdefghijkl

Simple and easy.


Answer 4

# How to trim a multi-line string or a file

s = """ line one
\tline two\t
line three """

# Line 1 starts with a space, line 2 starts and ends with a tab, line 3 ends with a space.

s1 = s.splitlines()
print(s1)
# [' line one', '\tline two\t', 'line three ']

print([i.strip() for i in s1])
# ['line one', 'line two', 'line three']

# More details:

# We could also have used a for loop from the beginning
# (process() stands for whatever you do with each line):
for line in s.splitlines():
    line = line.strip()
    process(line)

# We could also be reading a file line by line, e.g. my_file = open(filename),
# or with open(filename) as my_file:
for line in my_file:
    line = line.strip()
    process(line)

# Side note: splitlines() removed the newline characters; we can keep them by passing True,
# although strip() will then remove them anyway.
s2 = s.splitlines(True)
print(s2)
# [' line one\n', '\tline two\t\n', 'line three ']

Answer 5

No one has posted these regex solutions yet.

Matching:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print(m.group(1))
None

Searching (you have to handle the “only spaces” input case differently):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

If you use re.sub, you may remove inner whitespace, which could be undesirable.
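
The matching approach above can be wrapped in a helper that also covers the “only spaces” case. This is a sketch under assumptions: the name regex_trim is hypothetical, re.DOTALL is added so that strings containing newlines are handled, and fullmatch (Python 3.4+) is used instead of match.

```python
import re

# Same pattern as above; DOTALL lets '.' match newlines inside the text.
_TRIM = re.compile(r'\s*(.*\S)?\s*', re.DOTALL)

def regex_trim(text):
    """Return text without leading/trailing whitespace, keeping inner whitespace."""
    m = _TRIM.fullmatch(text)
    # group(1) is None when the string is empty or whitespace only.
    return m.group(1) or ''

print(repr(regex_trim('  \tbl ah  \t ')))
print(repr(regex_trim('  \t  ')))
```

For plain trimming, str.strip() does the same job more simply; the regex form is mainly useful when it is already part of a larger pattern.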


Answer 6

Whitespace includes spaces, tabs, and CR/LF. So an elegant one-liner string method we can use is translate (the two-argument form below is Python 2 only):

' hello apple'.translate(None, ' \n\t\r')

Or, if you want to be thorough:

import string
' hello  apple'.translate(None, string.whitespace)
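
In Python 3, str.translate takes a single table argument, so the calls above need a table built with str.maketrans. A minimal sketch of the equivalent (the variable names are illustrative):

```python
import string

# str.maketrans('', '', chars) builds a table that deletes every character in chars.
delete_basic = str.maketrans('', '', ' \n\t\r')
delete_all_whitespace = str.maketrans('', '', string.whitespace)

print(' hello apple'.translate(delete_basic))
print(' hello  apple\n'.translate(delete_all_whitespace))
```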

Answer 7

re.sub(' +', ' ', my_str.replace('\n', ' ')).strip()

This will remove all the unwanted spaces and newline characters. Hope this helps.

import re
my_str = '   a     b \n c   '
formatted_str = re.sub(' +', ' ', my_str.replace('\n', ' ')).strip()

This will result in:

'   a     b \n c   ' will be changed to 'a b c'


Answer 8

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

output:

please_remove_all_whitespaces


Adding Le Droid’s comment to the answer. To separate with a space:

    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

output:

please remove all extra whitespaces


Answer 9

If using Python 3: in your print call, pass sep="" as the last argument. That stops print from inserting a space between its arguments.

EXAMPLE:

txt="potatoes"
print("I love ",txt,"",sep="")

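This will print: I love potatoes

Instead of: I love  potatoes (with a doubled space after "love" and a trailing space)

Note that sep only controls what print inserts between its arguments. It does not remove a \t that is already inside the string, and sep="\t" would insert tabs rather than remove them; to get rid of the \t, use strip().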
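
A small sketch of what sep actually changes, assuming Python 3. The output is captured in an io.StringIO buffer only so the two lines can be compared; in normal use you would print directly.

```python
import io

txt = "potatoes"

buf = io.StringIO()
print("I love ", txt, "", file=buf)          # default sep=" " adds a space between arguments
print("I love ", txt, "", sep="", file=buf)  # sep="" adds nothing between arguments
default_line, no_sep_line = buf.getvalue().splitlines()

print(repr(default_line))
print(repr(no_sep_line))

# Whitespace inside a string is untouched by sep; strip() handles that.
trimmed = "\t example string\t".strip()
```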


Answer 10

Having looked at quite a few solutions here with various degrees of understanding, I wondered what to do if the string was comma separated…

the problem

While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage, leaving the good stuff. While trimming out all the punctuation and chaff, I didn’t want to lose the whitespace between compound tokens, as I didn’t want to rebuild it later.

regex and patterns: [\s_]+?\W+

The pattern looks for one or more whitespace or underscore (‘_’) characters, matched lazily (as few characters as possible) with [\s_]+?, that come before one or more non-word characters, matched with \W+ (for ASCII text, equivalent to [^a-zA-Z0-9_]). In practice, this finds swaths of whitespace such as tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r). A null character (\0) is not whitespace, but \W matches it when it follows whitespace.
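
To make the behaviour concrete, here is a small sketch on a short made-up string (the sample text is an assumption, much shorter than the real data shown further down). A single space between two words is left alone, because it is not followed by a non-word character.

```python
import re

trim_ws_punctn = re.compile(r'[\s_]+?\W+')

sample = "hello-, -world, :, ?, founders, series a, pitch_at_palace"
cleaned = trim_ws_punctn.sub(' ', sample)

# " -" and " :, ?, " each collapse to one space; "series a" and the
# underscores in "pitch_at_palace" are untouched.
print(cleaned)
```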

I see the advantage to this as two-fold:

  1. it doesn’t remove whitespace between the complete words/tokens that you might want to keep together;

  2. Python’s built-in string method strip() doesn’t work inside the string, only on the left and right ends, and by default it strips whitespace (see the example below: several newlines are in the text, and text.strip(' \n\t\r') does not remove them all, while the regex pattern does).

This goes beyond the OP’s question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (somehow the escape characters ended up in some of the text). Moreover, in list-like strings, we don’t want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word character, like ‘-,’ or ‘-, ,,,’.

NB: I’m not talking about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.

Full disclosure: I’ve only been manipulating text for about a month, and regex only the last two weeks, so I’m sure there are some nuances I’m missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40 odd columns), as a final step after a pass for removal of extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don’t want to add whitespace where there was none before.

An example:

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("What about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip() only touches the two ends of the string
print()
print(f"Here is the text, after stripping with 'strip':\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

This outputs:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stripping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

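So strip() only removes whitespace from the two ends of the string, and here it removes nothing, because the text begins and ends with a quote character. In the OP’s case, strip() is fine, but if things get any more complex, a regex with a similar pattern may be of some value in more general settings.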



Answer 11

Try translate:

>>> import string
>>> print('\t\r\n  hello \r\n world \t\r\n')

  hello 
 world  
>>> tr = str.maketrans(string.whitespace, ' '*len(string.whitespace))
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr)
'     hello    world    '
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'

Answer 12

If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:

some_string = "    Hello,    world!\n    "
new_string = some_string.strip()
# new_string is now "Hello,    world!"

This works a lot like Qt’s QString::trimmed() method, in that it removes leading and trailing whitespace, while leaving internal whitespace alone.

But if you’d like something like Qt’s QString::simplified() method which not only removes leading and trailing whitespace, but also “squishes” all consecutive internal whitespace to one space character, you can use a combination of .split() and " ".join, like this:

some_string = "\t    Hello,  \n\t  world!\n    "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"

In this last example, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed off.
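
If both behaviours are needed often, they can be wrapped in two small helpers. This is only a sketch; the function names trimmed and simplified are borrowed from the Qt methods mentioned above and are not part of Python.

```python
def trimmed(text):
    """Remove leading and trailing whitespace; leave inner whitespace alone."""
    return text.strip()


def simplified(text):
    """Trim both ends and squeeze each inner whitespace run to a single space."""
    # split() with no argument splits on runs of any whitespace and drops empties.
    return " ".join(text.split())


print(repr(trimmed("    Hello,    world!\n    ")))
print(repr(simplified("\t    Hello,  \n\t  world!\n    ")))
```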


Answer 13

I generally use the following method:

>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [r"\n", r"\r", r"\t"]
>>> import re
>>> for i in charList:
...     myStr = re.sub(i, r"", myStr)
...
>>> myStr
'Hi Stack Over  flow!'

Note: This only removes “\n”, “\r” and “\t”. It does not remove extra spaces.


Answer 14

For removing whitespace from the middle of the string (note that this snippet is Perl, not Python):

$p = "ATGCGAC ACGATCGACC";
$p =~ s/\s//g;
print $p;

output:

ATGCGACACGATCGACC
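
Since the question is about Python, here is a rough Python equivalent of the Perl substitution above, offered as a sketch rather than as part of the original answer:

```python
import re

p = "ATGCGAC ACGATCGACC"
# Same effect as Perl's s/\s//g: delete every whitespace character.
p = re.sub(r"\s", "", p)
print(p)
```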

Answer 15

This will remove all whitespace and newlines from both the beginning and end of a string:

>>> import re
>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub(r"^\s+|\s+$", "", s)
'some \n text'