Question: Truncate a long string in Python

How does one truncate a string to 75 characters in Python?

This is how it is done in JavaScript:

var data="saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
var info = (data.length > 75) ? data.substring(0, 75) + '..' : data;

Answer 0

info = (data[:75] + '..') if len(data) > 75 else data

Answer 1

Even shorter:

info = data[:75] + (data[75:] and '..')

Answer 2

Even more concise:

data = data[:75]

If it is less than 75 characters there will be no change.
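As a quick check of the claim above, slicing past the end of a string does not raise an error; it simply returns the characters that exist. The sketch below uses made-up sample strings. It also shows that plain slicing leaves no marker that anything was cut.

```python
short = "hello"
long_text = "x" * 100

# Slicing beyond the end is safe: the short string comes back unchanged.
short_cut = short[:75]
long_cut = long_text[:75]

print(short_cut)      # same content as `short`
print(len(long_cut))  # 75
```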


Answer 3

If you are using Python 3.4+, you can use textwrap.shorten from the standard library:

Collapse and truncate the given text to fit in the given width.

First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width:

>>> textwrap.shorten("Hello  world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello  world!", width=11)
'Hello [...]'
>>> textwrap.shorten("Hello world", width=10, placeholder="...")
'Hello...'
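Note that textwrap.shorten cuts on word boundaries, so it does not behave like a slice. The sketch below (the sample strings are my own) compares the two on a sentence and on a string without spaces, similar to the one in the question. As far as I can tell from the textwrap source, the space-free case keeps little more than the placeholder, so it is worth checking the result if your data may lack whitespace.

```python
import textwrap

sentence = "The quick brown fox jumps over the lazy dog " * 4
no_spaces = "sa" + "d" * 200  # assumed stand-in for the question's data

by_words = textwrap.shorten(sentence, width=75, placeholder="..")
by_slice = sentence[:75] + ".."

# shorten() keeps the result within `width`, placeholder included,
# whereas the slice-based version is 77 characters long.
print(len(by_words), len(by_slice))

# With no whitespace there is no word boundary to cut at.
no_space_result = textwrap.shorten(no_spaces, width=75, placeholder="..")
print(repr(no_space_result))
```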

Answer 4

For a Django solution (which has not been mentioned in the question):

from django.utils.text import Truncator
value = Truncator(value).chars(75)

Have a look at Truncator’s source code to appreciate the problem: https://github.com/django/django/blob/master/django/utils/text.py#L66

Concerning truncation with Django: Django HTML truncation


Answer 5

You could use this one-liner:

data = (data[:75] + '..') if len(data) > 75 else data

Answer 6

With regex:

re.sub(r'^(.{75}).*$', r'\g<1>...', data)

Long strings are truncated:

>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{75}).*$', r'\g<1>...', data)
'111111111122222222223333333333444444444455555555556666666666777777777788888...'

Shorter strings never get truncated:

>>> data="11111111112222222222333333"
>>> re.sub(r'^(.{75}).*$', r'\g<1>...', data)
'11111111112222222222333333'

This way, you can also “cut” the middle part of the string, which is nicer in some cases:

re.sub(r'^(.{5}).*(.{5})$', r'\g<1>...\g<2>', data)

>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{5}).*(.{5})$', r'\g<1>...\g<2>', data)
'11111...88888'
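One edge case worth checking: because `.*` also matches the empty string, the first pattern above appends '...' to a string that is exactly 75 characters long, even though nothing was cut. A possible variant, sketched here with a hypothetical helper named truncate_re, requires at least one extra character (`.+`) and passes re.DOTALL so that embedded newlines do not stop the match.

```python
import re

def truncate_re(data, limit=75, suffix="..."):
    """Hypothetical helper: cut `data` to `limit` chars, adding `suffix` only if something was removed."""
    pattern = r'^(.{%d}).+$' % limit
    # A function replacement avoids backslash handling in `suffix`.
    return re.sub(pattern, lambda m: m.group(1) + suffix, data, flags=re.DOTALL)

exactly_75 = "1" * 75
print(len(re.sub(r'^(.{75}).*$', r'\g<1>...', exactly_75)))  # 78: suffix added anyway
print(len(truncate_re(exactly_75)))                          # 75: left unchanged
print(truncate_re("1" * 80))
```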

Answer 7

This method doesn’t use any if:

data[:75] + bool(data[75:]) * '..'


Answer 8

limit = 75
info = data[:limit] + '..' * (len(data) > limit)

Answer 9

Yet another solution. With True and False you get a little feedback about the test at the end.

data = {True: data[:75] + '..', False: data}[len(data) > 75]

Answer 10

This just in:

n = 8
s = '123'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '12345678'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '123456789'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '123456789012345'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])

123
12345678
12345...
12345...
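Unlike most of the other answers, the snippet above counts the '...' toward the limit, so the result is never longer than n. Wrapped in a function (the name truncate_within and its parameters are illustrative, not from the answer), the same idea might look like this:

```python
def truncate_within(s, n, suffix="..."):
    """Cut `s` so the result, suffix included, is at most `n` characters (assumes n >= 0)."""
    if len(s) <= n:
        return s
    if n <= len(suffix):
        # No room for any text plus the suffix: fall back to a plain cut.
        return s[:n]
    return s[:n - len(suffix)] + suffix

for s in ("123", "12345678", "123456789", "123456789012345"):
    print(truncate_within(s, 8))
```

The early return for short limits is a design choice of this sketch; the original one-liner does not handle that case.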

Answer 11

>>> info = lambda data: len(data)>10 and data[:10]+'...' or data
>>> info('sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdf')
'sdfsdfsdfs...'
>>> info('sdfsdf')
'sdfsdf'

Answer 12

You can’t actually “truncate” a Python string like you can do a dynamically allocated C string. Strings in Python are immutable. What you can do is slice a string as described in other answers, yielding a new string containing only the characters defined by the slice offsets and step. In some (non-practical) cases this can be a little annoying, such as when you choose Python as your interview language and the interviewer asks you to remove duplicate characters from a string in-place. Doh.
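A short illustration of the point: slicing builds a new string and leaves the original alone, and trying to modify a string in place raises TypeError. The variable names are just for the example.

```python
data = "abcdefghij"
short = data[:5]          # a new string object

print(short)              # abcde
print(data)               # abcdefghij, the original is untouched

try:
    data[0] = "X"         # strings do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

# For in-place style edits, one option is to work on a list and join afterwards.
chars = list(data)
del chars[5:]
rebuilt = "".join(chars)
print(rebuilt)            # abcde
```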


Answer 13

info = data[:min(len(data), 75)]

Answer 14

There’s no need for a regular expression but you do want to use string formatting rather than the string concatenation in the accepted answer.

This is probably the most canonical, Pythonic way to truncate the string data at 75 characters.

>>> data = "11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> info = "{}..".format(data[:75]) if len(data) > 75 else data
>>> info
'111111111122222222223333333333444444444455555555556666666666777777777788888..'

Answer 15

Here's a function I made as part of a new String class. It can append a suffix when the truncated result is long enough to hold it, and enforcing the absolute maximum length (suffix included) is optional.

I was in the middle of changing a few things around, so it still carries some redundant logic (the later checks on _truncate, for instance) that the early return at the top makes unnecessary.

Still, it is a good function for truncating data:

##
## Truncate characters of a string after the _max_len'th char, if necessary... If _max_len is less than 0, don't truncate anything... Note: If you attach a suffix, and you enable absolute max length then the suffix length is subtracted from max length... Note: If the suffix length is longer than the output then no suffix is used...
##
## Usage: Where _text = 'Testing', _width = 4
##      _data = String.Truncate( _text, _width )                        == Test
##      _data = String.Truncate( _text, _width, '..', True )            == Te..
##
## Equivalent Alternates: Where _text = 'Testing', _width = 4
##      _data = String.SubStr( _text, 0, _width )                       == Test
##      _data = _text[  : _width ]                                      == Test
##      _data = ( _text )[  : _width ]                                  == Test
##
def Truncate( _text, _max_len = -1, _suffix = False, _absolute_max_len = True ):
    ## Length of the string we are considering for truncation
    _len            = len( _text )

    ## Whether or not we have to truncate ( a negative _max_len disables truncation )
    _truncate       = _max_len >= 0 and _len > _max_len

    ## Note: If we don't need to truncate, there's no point in proceeding...
    if ( not _truncate ):
        return _text

    ## The suffix in string form
    _suffix_str     = ( '',  str( _suffix ) )[ _truncate and _suffix != False ]

    ## The suffix length
    _len_suffix     = len( _suffix_str )

    ## Whether or not we add the suffix
    _add_suffix     = ( False, True )[ _truncate and _suffix != False and _max_len > _len_suffix ]

    ## Suffix Offset
    _suffix_offset = _max_len - _len_suffix
    _suffix_offset  = ( _max_len, _suffix_offset )[ _add_suffix and _absolute_max_len != False and _suffix_offset > 0 ]

    ## The truncate point.... If not necessary, then length of string.. If necessary then the max length with or without subtracting the suffix length... Note: It may be easier ( less logic cost ) to simply add the suffix to the calculated point, then truncate - if point is negative then the suffix will be destroyed anyway.
    ## If we don't need to truncate, then the length is the length of the string.. If we do need to truncate, then the length depends on whether we add the suffix and offset the length of the suffix or not...
    _len_truncate   = ( _len, _max_len )[ _truncate ]
    _len_truncate   = ( _len_truncate, _max_len )[ _len_truncate <= _max_len ]

    ## If we add the suffix, add it... Suffix won't be added if the suffix is the same length as the text being output...
    if ( _add_suffix ):
        _text = _text[ 0 : _suffix_offset ] + _suffix_str + _text[ _suffix_offset: ]

    ## Return the text after truncating...
    return _text[ : _len_truncate ]
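
For comparison, the behaviour described in the header comments can be expressed more compactly. The sketch below is a reading of those comments, not a drop-in replacement: the name truncate_with_suffix is hypothetical, a negative limit is taken to mean no truncation, and when absolute_max_len is false the suffix is assumed to be appended after max_len characters.

```python
def truncate_with_suffix(text, max_len=-1, suffix="", absolute_max_len=True):
    """Hypothetical compact variant of the Truncate function above."""
    # Assumption: a negative limit disables truncation entirely.
    if max_len < 0 or len(text) <= max_len:
        return text
    # No suffix requested, or no room for it: plain cut.
    if not suffix or len(suffix) >= max_len:
        return text[:max_len]
    if absolute_max_len:
        # The suffix counts toward the limit.
        return text[:max_len - len(suffix)] + suffix
    # Assumption: otherwise the suffix is added on top of max_len characters.
    return text[:max_len] + suffix

print(truncate_with_suffix("Testing", 4))              # Test
print(truncate_with_suffix("Testing", 4, "..", True))  # Te..
```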

Answer 16

info = data[:75] + ('..' if len(data) > 75 else '')
