
What is the Python equivalent of Perl’s chomp function, which removes the last character of a string if it is a newline?

Answer 0

Try the method rstrip() (see the Python 2 and Python 3 documentation):

>>> 'test string\n'.rstrip()
'test string'

Python’s rstrip() method strips all kinds of trailing whitespace by default, not just one newline as Perl does with chomp.

>>> 'test string \n \r\n\n\r \n\n'.rstrip()
'test string'

To strip only newlines:

>>> 'test string \n \r\n\n\r \n\n'.rstrip('\n')
'test string \n \r\n\n\r '

There are also the methods lstrip() and strip():

>>> s = "   \n\r\n  \n  abc   def \n\r\n  \n  "
>>> s.strip()
'abc   def'
>>> s.lstrip()
'abc   def \n\r\n  \n  '
>>> s.rstrip()
'   \n\r\n  \n  abc   def'

Answer 1

And I would say the “pythonic” way to get lines without trailing newline characters is splitlines().

>>> text = "line 1\nline 2\r\nline 3\nline 4"
>>> text.splitlines()
['line 1', 'line 2', 'line 3', 'line 4']
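
As a rough illustration of how splitlines() treats mixed and repeated line endings, here is a minimal sketch. The sample string is made up for the example; keepends is the method's documented optional argument.

```python
# Hypothetical sample mixing Unix, Windows, and old-Mac line endings.
sample = "one\ntwo\r\nthree\rfour\n\nfive\n"

# Default: line endings are dropped; an empty line stays as ''.
lines = sample.splitlines()

# keepends=True keeps each line's original terminator.
lines_with_ends = sample.splitlines(keepends=True)

print(lines)
print(lines_with_ends)
```

Note that a final newline does not produce a trailing empty string, which is one way splitlines() differs from split('\n').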

Answer 2

The canonical way to strip end-of-line (EOL) characters is to use the string rstrip() method, removing any trailing \r or \n. Here are examples for Mac, Windows, and Unix EOL characters.

>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'

Using ‘\r\n’ as the parameter to rstrip means that it will strip out any trailing combination of ‘\r’ or ‘\n’. That’s why it works in all three cases above.

This nuance matters in rare cases. For example, I once had to process a text file which contained an HL7 message. The HL7 standard requires a trailing ‘\r’ as its EOL character. The Windows machine on which I was using this message had appended its own ‘\r\n’ EOL character. Therefore, the end of each line looked like ‘\r\r\n’. Using rstrip(‘\r\n’) would have taken off the entire ‘\r\r\n’ which is not what I wanted. In that case, I simply sliced off the last two characters instead.
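
A minimal sketch of the HL7 situation described above, assuming the goal is to drop only the '\r\n' added by Windows. The helper name strip_one_crlf and the sample segment are hypothetical.

```python
def strip_one_crlf(line):
    # Remove a single trailing CR LF pair if present; leave everything else alone.
    if line.endswith("\r\n"):
        return line[:-2]
    return line

# Hypothetical HL7-style segment: the message's own '\r' followed by a Windows '\r\n'.
segment = "MSH|^~\\&|demo\r\r\n"

print(repr(segment.rstrip("\r\n")))   # the HL7 '\r' is lost as well
print(repr(strip_one_crlf(segment)))  # the HL7 '\r' survives
```

Checking with endswith before slicing avoids cutting two characters off a line that happens not to end in '\r\n'.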

Note that unlike Perl’s chomp function, this will strip all specified characters at the end of the string, not just one:

>>> "Hello\n\n\n".rstrip("\n")
'Hello'

Answer 3



Note that rstrip doesn’t act exactly like Perl’s chomp() because it doesn’t modify the string. That is, in Perl:


$x = "a\n";
chomp $x

results in $x being "a".

But in Python:

x = "a\n"
x.rstrip()

will mean that the value of x is still "a\n". Even x=x.rstrip() doesn’t always give the same result, as it strips all whitespace from the end of the string, not just one newline at most.

Answer 4


I might use something like this:

import os
s = s.rstrip(os.linesep)

I think the problem with rstrip("\n") is that you’ll probably want to make sure the line separator is portable (some antiquated systems are rumored to use "\r\n"). The other gotcha is that rstrip will strip out repeated trailing line-separator characters, not just one. Hopefully os.linesep will contain the right characters. The above works for me.

Answer 5

You may use line = line.rstrip('\n'). This will strip all newlines from the end of the string, not just one.

Answer 6

s = s.rstrip()

will remove all trailing whitespace, including newlines, from the end of the string s. The assignment is needed because rstrip returns a new string instead of modifying the original string.

Answer 7

This would replicate Perl’s chomp exactly (minus its behavior on arrays) for the “\n” line terminator:

def chomp(x):
    if x.endswith("\r\n"): return x[:-2]
    if x.endswith("\n") or x.endswith("\r"): return x[:-1]
    return x

(Note: it does not modify the string ‘in place’; it does not strip extra trailing whitespace; it takes \r\n into account.)
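
To make the behavior concrete, here is a small sketch that repeats the function above and runs it over a few made-up inputs. At most one line ending is removed each time, and other trailing whitespace is left alone.

```python
def chomp(x):
    # Same logic as the function above: drop at most one line ending.
    if x.endswith("\r\n"):
        return x[:-2]
    if x.endswith("\n") or x.endswith("\r"):
        return x[:-1]
    return x

# Hypothetical inputs covering the common cases.
for raw in ["abc\n", "abc\r\n", "abc\r", "abc\n\n", "abc  \n", "abc"]:
    print(repr(raw), "->", repr(chomp(raw)))
```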

Answer 8

>>> "line 1\nline 2\r\n...".replace('\n', '').replace('\r', '')
'line 1line 2...'

Or you could always get geekier with regexps :)

Have fun!

Answer 9


You can use strip:

line = line.strip()


>>> "\n\n hello world \n\n".strip()
'hello world'

Answer 10




rstrip doesn’t do the same thing as chomp, on so many levels. Read http://perldoc.perl.org/functions/chomp.html and see that chomp is very complex indeed.

However, my main point is that chomp removes at most 1 line ending, whereas rstrip will remove as many as it can.

Here you can see rstrip removing all the newlines:

>>> import os, re
>>> 'foo\n\n'.rstrip(os.linesep)
'foo'

A much closer approximation of typical Perl chomp usage can be accomplished with re.sub, like this (output shown for a platform where os.linesep is '\n'):

>>> re.sub(os.linesep + r'\Z', '', 'foo\n\n')
'foo\n'
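
One detail worth spelling out is why \Z is used rather than $. In Python's re module, $ also matches just before a newline at the very end of the string, so a pattern ending in $ can match twice. The following is only a sketch with made-up inputs, and the last pattern is one possible way to accept any of the three common endings.

```python
import re

text = "foo\n\n"

# '$' also matches just before a final newline, so without a count
# this pattern matches twice and both newlines go.
with_dollar = re.sub(r"\n$", "", text)

# '\Z' matches only at the very end of the string: one newline goes.
with_z = re.sub(r"\n\Z", "", text)

# Accepting '\r\n', '\r', or '\n', still removing at most one ending.
any_ending = re.sub(r"(?:\r\n|\r|\n)\Z", "", "foo\r\n\r\n")

print(repr(with_dollar), repr(with_z), repr(any_ending))
```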

Answer 11


Careful with "foo".rstrip(os.linesep): that will only chomp the newline characters for the platform where your Python is being executed. Imagine you’re chomping the lines of a Windows file under Linux, for instance:

$ python
Python 2.7.1 (r271:86832, Mar 18 2011, 09:09:48) 
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.platform
'linux2'
>>> "foo\r\n".rstrip(os.linesep)
'foo\r'

Use "foo".rstrip("\r\n") instead, as Mike says above.

Answer 12



An example in Python’s documentation simply uses line.strip().

Perl’s chomp function removes one linebreak sequence from the end of a string only if it’s actually there.

Here is how I plan to do that in Python, if process is conceptually the function that I need in order to do something useful to each line from this file:

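import os
sep_pos = -len(os.linesep)
with open("file.txt") as f:
    for line in f:
        # Drop one trailing line separator, and only if it is actually there.
        if line[sep_pos:] == os.linesep:
            line = line[:sep_pos]
        process(line)  # process is the per-line function described above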
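
One caveat with comparing against os.linesep: in Python 3, a file opened in text mode with the default newline=None has its line endings translated to '\n' on reading, so on Windows the lines would end in '\n' rather than '\r\n'. A minimal sketch of that behavior follows; the temporary file and its contents are made up for the example.

```python
import os
import tempfile

# Write Windows-style line endings as raw bytes so nothing is translated on the way in.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\n")
    path = tmp.name

try:
    # Default text mode (newline=None) turns '\r\n' and '\r' into '\n' when reading.
    with open(path) as f:
        translated = list(f)
    # newline='' turns that translation off and keeps the original endings.
    with open(path, newline="") as f:
        untranslated = list(f)
finally:
    os.remove(path)

print(translated)
print(untranslated)
```

With the default mode, checking for a trailing '\n' is therefore usually enough, whatever the platform.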

Answer 13

I don’t program in Python, but I came across an FAQ at python.org advocating S.rstrip("\r\n") for Python 2.2 or later.

Answer 14

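import re

# Matches every newline, tab, and carriage return, not only trailing ones.
r_unwanted = re.compile("[\n\t\r]")
# sub returns a new string; your_text is the string to clean.
r_unwanted.sub("", your_text)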

Answer 15


I find it convenient to be able to get the chomped lines via an iterator, parallel to the way you can get the un-chomped lines from a file object. You can do so with the following code:

import operator

def chomped_lines(it):
    return map(operator.methodcaller('rstrip', '\r\n'), it)

Sample usage:

with open("file.txt") as infile:
    for line in chomped_lines(infile):
        print(line)  # placeholder body; replace with your own processing
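
The same idea can also be written as a generator, which some readers may find easier to follow than map with methodcaller. This is only a sketch: io.StringIO stands in for an open file, and the sample data is made up.

```python
import io

def chomped_lines(lines):
    # Generator variant: strips trailing '\r' and '\n' characters from each line.
    for line in lines:
        yield line.rstrip("\r\n")

# io.StringIO stands in for an open file here (hypothetical sample data).
fake_file = io.StringIO("alpha\nbeta\ngamma")
result = list(chomped_lines(fake_file))
print(result)
```

Like the map version, this is lazy, so it does not read the whole file into memory before the loop starts.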

Answer 16



Workaround for a special case:

If the newline character is the last character (as is the case with most file inputs), then for any element in the collection you can slice as follows:

foobar= foobar[:-1]

to slice out your newline character.

Answer 17

If your goal is to clean up all the line breaks in a multi-line str object (oldstr), you can split it into a list on the delimiter ‘\n’ and then join this list into a new str (newstr).

newstr = "".join(oldstr.split('\n'))

Answer 18




It looks like there is no perfect analog for Perl’s chomp. In particular, rstrip treats its argument as a set of characters, so it cannot match a multi-character newline delimiter like \r\n as a unit. However, splitlines does. Following my answer on a different question, you can combine join and splitlines to remove/replace all newlines from a string s:

''.join(s.splitlines())

The following removes exactly one trailing newline (as chomp would, I believe). Passing True as the keepends argument to splitlines retains the delimiters. Then, splitlines is called again to remove the delimiters on just the last “line”:

def chomp(s):
    if len(s):
        lines = s.splitlines(True)
        last = lines.pop()
        return ''.join(lines + last.splitlines())
    else:
        return ''

Answer 19


I’m bubbling up my regular expression based answer from one I posted earlier in the comments of another answer. I think using re is a clearer more explicit solution to this problem than str.rstrip.

>>> import re

If you want to remove one or more trailing newline chars:

>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'

If you want to remove newline chars everywhere (not just trailing):

>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'

If you want to remove only 1-2 trailing newline chars (i.e., \r, \n, \r\n, \n\r, \r\r, \n\n):

>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r\n')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n')
'\nx'

I have a feeling that what most people really want here is to remove just one occurrence of a trailing newline sequence, either \r\n or \n, and nothing more.

>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n\n', count=1)
'\nx\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n\r\n', count=1)
'\nx\r\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n', count=1)
'\nx'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n', count=1)
'\nx'

(The ?: is to create a non-capturing group.)

(By the way, this is not what '...'.rstrip('\n').rstrip('\r') does, which may not be clear to others stumbling upon this thread. str.rstrip strips as many of the trailing characters as possible, so a string like foo\n\n\n would be reduced to foo, whereas you may have wanted to preserve the other newlines after stripping a single trailing one.)

Answer 20

>>> '   spacious   '.rstrip()
'   spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'

Answer 21


Just use:

line = line.rstrip("\n")

or

line = line.strip("\n")

You don’t need any of this complicated stuff.

Answer 22

s = '''Hello  World \t\n\r\tHi There'''
# import the string module
import string
# use the translate method to delete every whitespace character
s.translate({ord(c): None for c in string.whitespace})

With regex

import re
s = '''  Hello  World 
\t\n\r\tHi '''
print(re.sub(r"\s+", "", s), sep='')  # \s matches all whitespace

Replace \n, \t, \r

s.replace('\n', '').replace('\t','').replace('\r','')
>'  Hello  World Hi '

With regex

s = '''Hello  World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
>'Hello  World Hi There'

With join

s = '''Hello  World \t\n\r\tHi There'''
' '.join(s.split())
>'Hello World Hi There'

Answer 23



There are three types of line endings that we normally encounter: \n, \r and \r\n. A rather simple regular expression in re.sub, namely r"\r?\n?$", is able to catch them all.

(And we gotta catch ’em all, am I right?)

import re

re.sub(r"\r?\n?$", "", the_text, 1)

With the last argument, we limit the number of occurrences replaced to one, mimicking chomp to some extent. Example:

import re

text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"

a = re.sub(r"\r?\n?$", "", text_1, 1)
b = re.sub(r"\r?\n?$", "", text_2, 1)
c = re.sub(r"\r?\n?$", "", text_3, 1)

… where a == b == c is True.

Answer 24


If you are concerned about speed (say you have a looong list of strings) and you know the nature of the newline char, string slicing is actually faster than rstrip. A little test to illustrate this:

import time

loops = 50000000

def method1(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in range(loops):  # range replaces Python 2's xrange
        out_string = test_string[:-1]
    t1 = time.time()
    print('Method 1: ' + str(t1 - t0))

def method2(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in range(loops):  # range replaces Python 2's xrange
        out_string = test_string.rstrip()
    t1 = time.time()
    print('Method 2: ' + str(t1 - t0))

method1()
method2()

Output:

Method 1: 3.92700004578
Method 2: 6.73000001907
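
For a quicker comparison, the same measurement can be sketched with the standard timeit module. The loop count below is an arbitrary small value chosen so the sketch finishes quickly, and the numbers it prints will vary by machine and Python version.

```python
import timeit

test_string = "num\n"
number = 200_000  # hypothetical loop count, kept small so the sketch runs quickly

# Time slicing off the last character versus calling rstrip().
slice_time = timeit.timeit(lambda: test_string[:-1], number=number)
rstrip_time = timeit.timeit(lambda: test_string.rstrip(), number=number)

print(f"slice:  {slice_time:.4f} s")
print(f"rstrip: {rstrip_time:.4f} s")
```

Keep in mind that the two operations are only interchangeable when the string is known to end in exactly one newline character.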

Answer 25

This will work for both Windows and Linux line endings (re.sub is a bit expensive, but it fits if you are looking for a regex-only solution):

import re 
if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)

Answer 26


First split lines then join them by any separator you like:

x = ' '.join(x.splitlines())

should work like a charm.

Answer 27


A catch-all:

# rstrip takes a set of characters, not a regex, so no '|' is needed
line = line.rstrip('\r\n')
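
Since rstrip treats its argument as a set of characters rather than a pattern, any extra character in that argument is stripped too. A small sketch with a made-up line ending in '|' shows the difference:

```python
line = "a|b|\r\n"

# The argument is a set of characters, so '|' is stripped along with '\r' and '\n'.
with_pipe = line.rstrip("\r|\n")

# Listing only the line-ending characters leaves the trailing '|' in place.
without_pipe = line.rstrip("\r\n")

print(repr(with_pipe), repr(without_pipe))
```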
