Question: How can I remove a trailing newline?
What is the Python equivalent of Perl’s chomp function, which removes the last character of a string if it is a newline?
Answer 0
Try the method rstrip() (see the docs for Python 2 and Python 3):
>>> 'test string\n'.rstrip()
'test string'
Python’s rstrip() method strips all kinds of trailing whitespace by default, not just one newline as Perl does with chomp.
>>> 'test string \n \r\n\n\r \n\n'.rstrip()
'test string'
To strip only newlines:
>>> 'test string \n \r\n\n\r \n\n'.rstrip('\n')
'test string \n \r\n\n\r '
There are also the methods lstrip() and strip():
>>> s = " \n\r\n \n abc def \n\r\n \n "
>>> s.strip()
'abc def'
>>> s.lstrip()
'abc def \n\r\n \n '
>>> s.rstrip()
' \n\r\n \n abc def'
Answer 1
I would say the “pythonic” way to get lines without trailing newline characters is splitlines().
>>> text = "line 1\nline 2\r\nline 3\nline 4"
>>> text.splitlines()
['line 1', 'line 2', 'line 3', 'line 4']
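One detail that may be worth knowing: unlike split('\n'), splitlines() does not leave an empty last element when the text ends with a newline, and it can keep the line endings if asked. A small sketch on Python 3 (the sample text is made up for illustration):

```python
text = "line 1\nline 2\r\nline 3\n"  # made-up sample that ends with a newline

without_ends = text.splitlines()
with_ends = text.splitlines(keepends=True)
naive_split = text.split("\n")

print(without_ends)  # ['line 1', 'line 2', 'line 3']
print(with_ends)     # ['line 1\n', 'line 2\r\n', 'line 3\n']
print(naive_split)   # ['line 1', 'line 2\r', 'line 3', '']
```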
Answer 2
The canonical way to strip end-of-line (EOL) characters is to use the string rstrip() method removing any trailing \r or \n. Here are examples for Mac, Windows, and Unix EOL characters.
>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'
Using '\r\n' as the parameter to rstrip means that it will strip out any trailing combination of '\r' or '\n'. That’s why it works in all three cases above.
This nuance matters in rare cases. For example, I once had to process a text file which contained an HL7 message. The HL7 standard requires a trailing '\r' as its EOL character. The Windows machine on which I was using this message had appended its own '\r\n' EOL character. Therefore, the end of each line looked like '\r\r\n'. Using rstrip('\r\n') would have taken off the entire '\r\r\n', which is not what I wanted. In that case, I simply sliced off the last two characters instead.
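For a case like the HL7 one, where exactly one known terminator should go and everything before it should stay, a conditional slice is one option. This is only a sketch: strip_one_suffix is a hypothetical helper name and the sample string is made up.

```python
def strip_one_suffix(line, suffix="\r\n"):
    """Remove suffix from the end of line at most once (hypothetical helper)."""
    if suffix and line.endswith(suffix):
        return line[:-len(suffix)]
    return line

sample = "segment\r\r\n"  # made-up line: a CR terminator plus an appended CRLF

print(repr(strip_one_suffix(sample)))  # 'segment\r'
print(repr(sample.rstrip("\r\n")))     # 'segment'
```

On Python 3.9 or later, str.removesuffix should do the same job as the helper.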
Note that unlike Perl’s chomp function, this will strip all specified characters at the end of the string, not just one:
>>> "Hello\n\n\n".rstrip("\n")
'Hello'
Answer 3
Note that rstrip doesn’t act exactly like Perl’s chomp() because it doesn’t modify the string. That is, in Perl:
$x="a\n";
chomp $x
results in $x being "a", but in Python:
x="a\n"
x.rstrip()
will mean that the value of x is still "a\n". Even x=x.rstrip() doesn’t always give the same result, as it strips all whitespace from the end of the string, not just one newline at most.
Answer 4
I might use something like this:
import os
s = s.rstrip(os.linesep)
I think the problem with rstrip("\n") is that you’ll probably want to make sure the line separator is portable (some antiquated systems are rumored to use "\r\n"). The other gotcha is that rstrip will strip out repeated whitespace. Hopefully os.linesep will contain the right characters. The above works for me.
Answer 5
You may use line = line.rstrip('\n'). This will strip all newlines from the end of the string, not just one.
Answer 6
s = s.rstrip()
will remove all trailing whitespace, including every newline, at the end of the string s. The assignment is needed because rstrip returns a new string instead of modifying the original string.
Answer 7
This would replicate Perl’s chomp exactly (minus its behavior on arrays) for the “\n” line terminator:
def chomp(x):
    if x.endswith("\r\n"): return x[:-2]
    if x.endswith("\n") or x.endswith("\r"): return x[:-1]
    return x
(Note: it does not modify the string in place; it does not strip extra trailing whitespace; it takes \r\n into account.)
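A few quick checks of this function, restated here so the snippet runs on its own (the sample strings are made up):

```python
def chomp(x):
    # Same logic as the function above: drop at most one line terminator.
    if x.endswith("\r\n"):
        return x[:-2]
    if x.endswith("\n") or x.endswith("\r"):
        return x[:-1]
    return x

for sample in ["abc\n", "abc\r\n", "abc\r", "abc\n\n", "abc  ", "abc"]:
    print(repr(sample), "->", repr(chomp(sample)))
```

Only one terminator is removed per call, and trailing spaces are left alone.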
Answer 8
>>> "line 1\nline 2\r\n...".replace('\n', '').replace('\r', '')
'line 1line 2...'
or you could always get geekier with regexps :)
have fun!
Answer 9
You can use strip:
line = line.strip()
Demo:
>>> "\n\n hello world \n\n".strip()
'hello world'
Answer 10
rstrip doesn’t do the same thing as chomp, on so many levels. Read http://perldoc.perl.org/functions/chomp.html and see that chomp is very complex indeed.
However, my main point is that chomp removes at most 1 line ending, whereas rstrip will remove as many as it can.
Here you can see rstrip removing all the newlines:
>>> import os, re
>>> 'foo\n\n'.rstrip(os.linesep)
'foo'
A much closer approximation of typical Perl chomp usage can be accomplished with re.sub, like this:
>>> re.sub(os.linesep + r'\Z','','foo\n\n')
'foo\n'
Answer 11
Careful with "foo".rstrip(os.linesep): that will only chomp the newline characters for the platform where your Python is being executed. Imagine you’re chomping the lines of a Windows file under Linux, for instance:
$ python
Python 2.7.1 (r271:86832, Mar 18 2011, 09:09:48)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.platform
'linux2'
>>> "foo\r\n".rstrip(os.linesep)
'foo\r'
>>>
Use "foo".rstrip("\r\n") instead, as Mike says above.
Answer 12
An example in Python’s documentation simply uses line.strip().
Perl’s chomp function removes one linebreak sequence from the end of a string only if it’s actually there.
Here is how I plan to do that in Python, if process is conceptually the function that I need in order to do something useful to each line from this file:
import os
sep_pos = -len(os.linesep)
with open("file.txt") as f:
    for line in f:
        if line[sep_pos:] == os.linesep:
            line = line[:sep_pos]
        process(line)
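One thing to watch for: as far as I know, in text mode Python 3 hands back lines ending in '\n' regardless of how the file stores them, so a comparison against os.linesep may never match on a platform where os.linesep is '\r\n'. A sketch that checks for '\n' instead (read_chomped and the temporary demo file are assumptions for illustration, not part of the code above):

```python
import os
import tempfile

def read_chomped(path):
    """Yield each line of a text file with one trailing newline removed, if present."""
    with open(path) as f:  # text mode: CRLF and CR are read back as LF
        for line in f:
            yield line[:-1] if line.endswith("\n") else line

# Demo with a temporary file that uses Windows-style line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\nlast line without newline")
    demo_path = tmp.name

lines = list(read_chomped(demo_path))
os.remove(demo_path)
print(lines)  # ['first', 'second', 'last line without newline']
```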
Answer 13
I don’t program in Python, but I came across an FAQ at python.org advocating S.rstrip("\r\n") for Python 2.2 or later.
Answer 14
import re
r_unwanted = re.compile("[\n\t\r]")
r_unwanted.sub("", your_text)
Answer 15
I find it convenient to be able to get the chomped lines via an iterator, parallel to the way you can get the un-chomped lines from a file object. You can do so with the following code:
import operator
def chomped_lines(it):
    return map(operator.methodcaller('rstrip', '\r\n'), it)
Sample usage:
with open("file.txt") as infile:
    for line in chomped_lines(infile):
        process(line)
Answer 16
Workaround for a special case:
If the newline character is the last character (as is the case with most file input), then for any element in the collection you can slice as follows:
foobar = foobar[:-1]
to slice out your newline character.
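Note that an unconditional slice also eats a real character when there is no newline, which often happens on the last line of a file. A small sketch with made-up data, comparing it to a guarded slice:

```python
lines = ["first\n", "second\n", "last"]  # made-up data; the final line has no newline

unconditional = [line[:-1] for line in lines]
guarded = [line[:-1] if line.endswith("\n") else line for line in lines]

print(unconditional)  # ['first', 'second', 'las']
print(guarded)        # ['first', 'second', 'last']
```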
Answer 17
If your question is to clean up all the line breaks in a multi-line str object (oldstr), you can split it into a list according to the delimiter '\n' and then join this list into a new str (newstr).
newstr = "".join(oldstr.split('\n'))
Answer 18
It looks like there is not a perfect analog for Perl’s chomp. In particular, rstrip cannot handle multi-character newline delimiters like \r\n. However, splitlines does, as pointed out here.
Following my answer on a different question, you can combine join and splitlines to remove/replace all newlines from a string s:
''.join(s.splitlines())
The following removes exactly one trailing newline (as chomp would, I believe). Passing True as the keepends argument to splitlines retains the delimiters. Then, splitlines is called again to remove the delimiters on just the last “line”:
def chomp(s):
    if len(s):
        lines = s.splitlines(True)
        last = lines.pop()
        return ''.join(lines + last.splitlines())
    else:
        return ''
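A caveat that may matter with the splitlines-based approaches: for str objects in Python 3, splitlines treats more characters as line boundaries than just \n and \r (form feed and the Unicode line separator, for example). A small illustration with a made-up string:

```python
text = "page one\x0cpage two\nline\u2028separator"  # made-up sample

by_splitlines = text.splitlines()
by_split = text.split("\n")

print(by_splitlines)  # ['page one', 'page two', 'line', 'separator']
print(by_split)       # ['page one\x0cpage two', 'line\u2028separator']
```

If only \n and \r\n should count as line endings, an explicit check for those may be the safer choice.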
Answer 19
I’m bubbling up my regular-expression-based answer from one I posted earlier in the comments of another answer. I think using re is a clearer, more explicit solution to this problem than str.rstrip.
>>> import re
If you want to remove one or more trailing newline chars:
>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'
If you want to remove newline chars everywhere (not just trailing):
>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'
If you want to remove only 1-2 trailing newline chars (i.e., \r, \n, \r\n, \n\r, \r\r, \n\n):
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r\n')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n')
'\nx'
I have a feeling that what most people really want here is to remove just one occurrence of a trailing newline sequence, either \r\n or \n, and nothing more.
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n\n', count=1)
'\nx\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n\r\n', count=1)
'\nx\r\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n', count=1)
'\nx'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n', count=1)
'\nx'
(The ?: is to create a non-capturing group.)
(By the way, this is not what '...'.rstrip('\n').rstrip('\r') does, which may not be clear to others stumbling upon this thread. str.rstrip strips as many of the trailing characters as possible, so a string like foo\n\n\n would result in a false positive of foo, whereas you may have wanted to preserve the other newlines after stripping a single trailing one.)
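One subtlety that may explain why count=1 is needed above: $ also matches just before a newline at the very end of the string, so a pattern ending in $ can match twice. Anchoring with \Z, which matches only at the true end of the string, is a possible alternative. A sketch (chomp_re is a hypothetical name):

```python
import re

def chomp_re(s):
    """Remove at most one trailing CRLF or LF (hypothetical helper)."""
    return re.sub(r'(?:\r\n|\n)\Z', '', s)

print(repr(re.sub(r'\n$', '', 'x\n\n')))   # 'x'   ($ matched twice)
print(repr(re.sub(r'\n\Z', '', 'x\n\n')))  # 'x\n' (only the final newline)
print(repr(chomp_re('\nx\r\n\r\n')))       # '\nx\r\n'
```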
Answer 20
>>> ' spacious '.rstrip()
' spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'
Answer 21
Just use:
line = line.rstrip("\n")
or
line = line.strip("\n")
You don’t need any of this complicated stuff
Answer 22
import re
import string
s = '''Hello World \t\n\r\tHi There'''
# use the translate method to delete every whitespace character
s.translate({ord(c): None for c in string.whitespace})
# 'HelloWorldHiThere'
With regex:
s = ''' Hello World
\t\n\r\tHi '''
print(re.sub(r"\s+", "", s), sep='') # \s matches all whitespace
# HelloWorldHi
Replace \n, \t, \r:
s.replace('\n', '').replace('\t','').replace('\r','')
# ' Hello World Hi '
With a compiled regex:
s = '''Hello World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
# 'Hello World Hi There'
With join:
s = '''Hello World \t\n\r\tHi There'''
' '.join(s.split())
# 'Hello World Hi There'
Answer 23
There are three types of line endings that we normally encounter: \n, \r and \r\n. A rather simple regular expression in re.sub, namely r"\r?\n?$", is able to catch them all.
(And we gotta catch ’em all, am I right?)
import re
re.sub(r"\r?\n?$", "", the_text, count=1)
With the count argument, we limit the number of occurrences replaced to one, mimicking chomp to some extent. Example:
import re
text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"
a = re.sub(r"\r?\n?$", "", text_1, count=1)
b = re.sub(r"\r?\n?$", "", text_2, count=1)
c = re.sub(r"\r?\n?$", "", text_3, count=1)
… where a == b == c is True.
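To see what the three calls actually return, the results can be printed. This sketch wraps the same pattern in a helper (chomp_once is a hypothetical name) and passes count as a keyword argument:

```python
import re

def chomp_once(text):
    # count=1 limits the substitution to the first match the pattern finds.
    return re.sub(r"\r?\n?$", "", text, count=1)

samples = ["hellothere\n\n\n", "hellothere\n\n\r", "hellothere\n\n\r\n"]
results = [chomp_once(sample) for sample in samples]

for result in results:
    print(repr(result))  # 'hellothere\n\n' each time
```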
Answer 24
If you are concerned about speed (say you have a very long list of strings) and you know the nature of the newline character, string slicing is actually faster than rstrip. A little test to illustrate this:
import time
loops = 50000000
def method1(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):  # xrange is Python 2; use range on Python 3
        out_string = test_string[:-1]
    t1 = time.time()
    print('Method 1: ' + str(t1 - t0))
def method2(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):  # xrange is Python 2; use range on Python 3
        out_string = test_string.rstrip()
    t1 = time.time()
    print('Method 2: ' + str(t1 - t0))
method1()
method2()
Output:
Method 1: 3.92700004578
Method 2: 6.73000001907
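On Python 3, a comparison along these lines could also be run with timeit, which handles the looping and the timer choice. This is a sketch with a smaller, arbitrary repeat count; absolute numbers will vary by machine and Python version, and the two statements are not equivalent in what they strip.

```python
import timeit

test_string = "num\n"
number = 1000000  # arbitrary repeat count, smaller than in the test above

slice_time = timeit.timeit("s[:-1]", globals={"s": test_string}, number=number)
rstrip_time = timeit.timeit("s.rstrip()", globals={"s": test_string}, number=number)

print("slice:  %.3f s" % slice_time)
print("rstrip: %.3f s" % rstrip_time)
```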
Answer 25
This will work for both Windows and Linux (re.sub is a bit expensive, if you are looking for a regex-only solution):
import re
if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)
Answer 26
First split the lines, then join them with any separator you like:
x = ' '.join(x.splitlines())
should work like a charm.
Answer 27
A catch-all (rstrip takes a set of characters, not a pattern, so no '|' is needed):
line = line.rstrip('\r\n')
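Keep in mind that the argument to rstrip is a set of characters to remove, not a pattern, so any '|' in it is treated as one more character to strip. A quick illustration with a made-up line:

```python
line = "a|b|\r\n"  # made-up sample ending in a pipe and a Windows newline

with_pipe = line.rstrip('\r|\n')
without_pipe = line.rstrip('\r\n')

print(repr(with_pipe))     # 'a|b'
print(repr(without_pipe))  # 'a|b|'
```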