标签归档:string

如何引发ValueError?

问题:如何引发ValueError?

我有这段代码可以找到字符串中特定字符的最大索引,但是ValueError当字符串中未出现指定字符时,我希望它提高a值。

所以像这样:

contains('bababa', 'k')

会导致:

ValueError: could not find k in bababa

我怎样才能做到这一点?

这是我的函数的当前代码:

def contains(string,char):
  list = []

  for i in range(0,len(string)):
      if string[i] == char:
           list = list + [i]

  return list[-1]

I have this code which finds the largest index of a specific character in a string, however I would like it to raise a ValueError when the specified character does not occur in a string.

So something like this:

contains('bababa', 'k')

would result in a:

ValueError: could not find k in bababa

How can I do this?

Here’s the current code for my function:

def contains(string,char):
  list = []

  for i in range(0,len(string)):
      if string[i] == char:
           list = list + [i]

  return list[-1]

回答 0

raise ValueError('could not find %c in %s' % (ch,str))

raise ValueError('could not find %c in %s' % (ch,str))


回答 1

这是您的代码的修订版本,该版本仍然有效,并且还说明了如何提高ValueError所需的方式。顺便说一句,我认为find_last()find_last_index()或类似的名称会对此功能更具描述性。更可能造成混乱的是,Python已经有一个名为的容器对象方法__contains__(),该方法在成员资格测试方面做了一些不同。

def contains(char_string, char):
    largest_index = -1
    for i, ch in enumerate(char_string):
        if ch == char:
            largest_index = i
    if largest_index > -1:  # any found?
        return largest_index  # return index of last one
    else:
        raise ValueError('could not find {!r} in {!r}'.format(char, char_string))

print(contains('mississippi', 's'))  # -> 6
print(contains('bababa', 'k'))  # ->
Traceback (most recent call last):
  File "how-to-raise-a-valueerror.py", line 15, in <module>
    print(contains('bababa', 'k'))
  File "how-to-raise-a-valueerror.py", line 12, in contains
    raise ValueError('could not find {} in {}'.format(char, char_string))
ValueError: could not find 'k' in 'bababa'

更新-实质上更简单的方法

哇!这是一个更简洁的版本-本质上是单行代码-可能也更快,因为它会先反向搜索(通过[::-1])字符串,然后再对它进行第一个匹配字符的正向搜索,并且使用快速内置的字符串index()方法。关于您的实际问题,使用附带的一点好处index()是,ValueError当找不到字符子字符串时,它已经引发了,因此不需要其他任何操作。

这是一个快速的单元测试:

def contains(char_string, char):
    #  Ending - 1 adjusts returned index to account for searching in reverse.
    return len(char_string) - char_string[::-1].index(char) - 1

print(contains('mississippi', 's'))  # -> 6
print(contains('bababa', 'k'))  # ->
Traceback (most recent call last):
  File "better-way-to-raise-a-valueerror.py", line 9, in <module>
    print(contains('bababa', 'k'))
  File "better-way-to-raise-a-valueerror", line 6, in contains
    return len(char_string) - char_string[::-1].index(char) - 1
ValueError: substring not found

Here’s a revised version of your code which still works plus it illustrates how to raise a ValueError the way you want. By-the-way, I think find_last(), find_last_index(), or something simlar would be a more descriptive name for this function. Adding to the possible confusion is the fact that Python already has a container object method named __contains__() that does something a little different, membership-testing-wise.

def contains(char_string, char):
    largest_index = -1
    for i, ch in enumerate(char_string):
        if ch == char:
            largest_index = i
    if largest_index > -1:  # any found?
        return largest_index  # return index of last one
    else:
        raise ValueError('could not find {!r} in {!r}'.format(char, char_string))

print(contains('mississippi', 's'))  # -> 6
print(contains('bababa', 'k'))  # ->
Traceback (most recent call last):
  File "how-to-raise-a-valueerror.py", line 15, in <module>
    print(contains('bababa', 'k'))
  File "how-to-raise-a-valueerror.py", line 12, in contains
    raise ValueError('could not find {} in {}'.format(char, char_string))
ValueError: could not find 'k' in 'bababa'

Update — A substantially simpler way

Wow! Here’s a much more concise version—essentially a one-liner—that is also likely faster because it reverses (via [::-1]) the string before doing a forward search through it for the first matching character and it does so using the fast built-in string index() method. With respect to your actual question, a nice little bonus convenience that comes with using index() is that it already raises a ValueError when the character substring isn’t found, so nothing additional is required to make that happen.

Here it is along with a quick unit test:

def contains(char_string, char):
    #  Ending - 1 adjusts returned index to account for searching in reverse.
    return len(char_string) - char_string[::-1].index(char) - 1

print(contains('mississippi', 's'))  # -> 6
print(contains('bababa', 'k'))  # ->
Traceback (most recent call last):
  File "better-way-to-raise-a-valueerror.py", line 9, in <module>
    print(contains('bababa', 'k'))
  File "better-way-to-raise-a-valueerror", line 6, in contains
    return len(char_string) - char_string[::-1].index(char) - 1
ValueError: substring not found

回答 2

>>> def contains(string, char):
...     for i in xrange(len(string) - 1, -1, -1):
...         if string[i] == char:
...             return i
...     raise ValueError("could not find %r in %r" % (char, string))
...
>>> contains('bababa', 'k')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in contains
ValueError: could not find 'k' in 'bababa'
>>> contains('bababa', 'a')
5
>>> contains('bababa', 'b')
4
>>> contains('xbababa', 'x')
0
>>>
>>> def contains(string, char):
...     for i in xrange(len(string) - 1, -1, -1):
...         if string[i] == char:
...             return i
...     raise ValueError("could not find %r in %r" % (char, string))
...
>>> contains('bababa', 'k')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in contains
ValueError: could not find 'k' in 'bababa'
>>> contains('bababa', 'a')
5
>>> contains('bababa', 'b')
4
>>> contains('xbababa', 'x')
0
>>>

回答 3

>>> response='bababa'
...  if "K" in response.text:
...     raise ValueError("Not found")
>>> response='bababa'
...  if "K" in response.text:
...     raise ValueError("Not found")

Python 3中的字符串格式

问题:Python 3中的字符串格式

我在Python 2中执行此操作:

"(%d goals, $%d)" % (self.goals, self.penalties)

这是什么Python 3版本?

我尝试在线搜索示例,但一直在获取Python 2版本。

I do this in Python 2:

"(%d goals, $%d)" % (self.goals, self.penalties)

What is the Python 3 version of this?

I tried searching for examples online but I kept getting Python 2 versions.


回答 0

这是有关“新”格式语法的文档。一个例子是:

"({:d} goals, ${:d})".format(self.goals, self.penalties)

如果goalspenalties均为整数(即​​默认格式为ok),则可以将其缩短为:

"({} goals, ${})".format(self.goals, self.penalties)

而且由于参数是的字段self,所以还有一种使用单个参数两次的方法(如@Burhan Khalid在评论中指出的那样):

"({0.goals} goals, ${0.penalties})".format(self)

说明:

  • {} 表示仅下一个位置参数,采用默认格式;
  • {0}表示带有index的参数0,默认格式;
  • {:d} 是下一个位置参数,采用十进制整数格式;
  • {0:d}是带有index的参数0,十进制整数格式。

选择参数时,您可以执行许多其他操作(使用命名参数而不是位置参数,访问字段等),还有许多格式选项(填充数字,使用千位分隔符,显示或不显示符号等)。其他一些例子:

"({goals} goals, ${penalties})".format(goals=2, penalties=4)
"({goals} goals, ${penalties})".format(**self.__dict__)

"first goal: {0.goal_list[0]}".format(self)
"second goal: {.goal_list[1]}".format(self)

"conversion rate: {:.2f}".format(self.goals / self.shots) # '0.20'
"conversion rate: {:.2%}".format(self.goals / self.shots) # '20.45%'
"conversion rate: {:.0%}".format(self.goals / self.shots) # '20%'

"self: {!s}".format(self) # 'Player: Bob'
"self: {!r}".format(self) # '<__main__.Player instance at 0x00BF7260>'

"games: {:>3}".format(player1.games)  # 'games: 123'
"games: {:>3}".format(player2.games)  # 'games:   4'
"games: {:0>3}".format(player2.games) # 'games: 004'

注意:正如其他人指出的那样,新格式并不能取代前一种格式,这两种格式都可以在Python 3和更高版本的Python 2中使用。有人可能会说这是优先考虑的问题,但是恕我直言,较新的版本比旧的版本更具表现力,并且在编写新代码时应使用(当然,除非它针对较旧的环境)。

Here are the docs about the “new” format syntax. An example would be:

"({:d} goals, ${:d})".format(self.goals, self.penalties)

If both goals and penalties are integers (i.e. their default format is ok), it could be shortened to:

"({} goals, ${})".format(self.goals, self.penalties)

And since the parameters are fields of self, there’s also a way of doing it using a single argument twice (as @Burhan Khalid noted in the comments):

"({0.goals} goals, ${0.penalties})".format(self)

Explaining:

  • {} means just the next positional argument, with default format;
  • {0} means the argument with index 0, with default format;
  • {:d} is the next positional argument, with decimal integer format;
  • {0:d} is the argument with index 0, with decimal integer format.

There are many others things you can do when selecting an argument (using named arguments instead of positional ones, accessing fields, etc) and many format options as well (padding the number, using thousands separators, showing sign or not, etc). Some other examples:

"({goals} goals, ${penalties})".format(goals=2, penalties=4)
"({goals} goals, ${penalties})".format(**self.__dict__)

"first goal: {0.goal_list[0]}".format(self)
"second goal: {.goal_list[1]}".format(self)

"conversion rate: {:.2f}".format(self.goals / self.shots) # '0.20'
"conversion rate: {:.2%}".format(self.goals / self.shots) # '20.45%'
"conversion rate: {:.0%}".format(self.goals / self.shots) # '20%'

"self: {!s}".format(self) # 'Player: Bob'
"self: {!r}".format(self) # '<__main__.Player instance at 0x00BF7260>'

"games: {:>3}".format(player1.games)  # 'games: 123'
"games: {:>3}".format(player2.games)  # 'games:   4'
"games: {:0>3}".format(player2.games) # 'games: 004'

Note: As others pointed out, the new format does not supersede the former, both are available both in Python 3 and the newer versions of Python 2 as well. Some may say it’s a matter of preference, but IMHO the newer is much more expressive than the older, and should be used whenever writing new code (unless it’s targeting older environments, of course).


回答 1

Python 3.6现在支持PEP 498的简写文字字符串插值。对于您的用例,新语法很简单:

f"({self.goals} goals, ${self.penalties})"

这与以前的.format标准,但让一个容易做的事情一样

>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal('12.34567')
>>> f'result: {value:{width}.{precision}}'
'result:      12.35'

Python 3.6 now supports shorthand literal string interpolation with PEP 498. For your use case, the new syntax is simply:

f"({self.goals} goals, ${self.penalties})"

This is similar to the previous .format standard, but lets one easily do things like:

>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal('12.34567')
>>> f'result: {value:{width}.{precision}}'
'result:      12.35'

回答 2

该行在Python 3中保持原样。

>>> sys.version
'3.2 (r32:88445, Oct 20 2012, 14:09:29) \n[GCC 4.5.2]'
>>> "(%d goals, $%d)" % (self.goals, self.penalties)
'(1 goals, $2)'

That line works as-is in Python 3.

>>> sys.version
'3.2 (r32:88445, Oct 20 2012, 14:09:29) \n[GCC 4.5.2]'
>>> "(%d goals, $%d)" % (self.goals, self.penalties)
'(1 goals, $2)'

回答 3

我喜欢这种方法

my_hash = {}
my_hash["goals"] = 3 #to show number
my_hash["penalties"] = "5" #to show string
print("I scored %(goals)d goals and took %(penalties)s penalties" % my_hash)

请分别注意括号中的d和s。

输出将是:

I scored 3 goals and took 5 penalties

I like this approach

my_hash = {}
my_hash["goals"] = 3 #to show number
my_hash["penalties"] = "5" #to show string
print("I scored %(goals)d goals and took %(penalties)s penalties" % my_hash)

Note the appended d and s to the brackets respectively.

output will be:

I scored 3 goals and took 5 penalties

Python,在目录字符串中添加尾部斜线,独立于操作系统

问题:Python,在目录字符串中添加尾部斜线,独立于操作系统

如果尾部斜杠尚不存在,如何在目录字符串中添加尾部斜杠(/对于* nix,\对于win32)?谢谢!

How can I add a trailing slash (/ for *nix, \ for win32) to a directory string, if the tailing slash is not already there? Thanks!


回答 0

os.path.join(path, '') 如果末尾不存在斜线,则会添加该斜线。

你可以做os.path.join(path, '', '')os.path.join(path_with_a_trailing_slash, '')你仍然只会得到一个尾部的斜杠。

os.path.join(path, '') will add the trailing slash if it’s not already there.

You can do os.path.join(path, '', '') or os.path.join(path_with_a_trailing_slash, '') and you will still only get one trailing slash.


回答 1

由于要连接目录和文件名,请使用

os.path.join(directory, filename)

如果要摆脱.\..\..\blah\路径,请使用

os.path.join(os.path.normpath(directory), filename)

Since you want to connect a directory and a filename, use

os.path.join(directory, filename)

If you want to get rid of .\..\..\blah\ paths, use

os.path.join(os.path.normpath(directory), filename)

回答 2

您可以通过以下方式手动进行操作:

path = ...

import os
if not path.endswith(os.path.sep):
    path += os.path.sep

但是,通常使用起来更清洁os.path.join

You can do it manually by:

path = ...

import os
if not path.endswith(os.path.sep):
    path += os.path.sep

However, it is usually much cleaner to use os.path.join.


回答 3

您可以使用如下形式:

os.path.normcase(path)
    Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes.

否则,您可以在页面上寻找其他内容

You could use something like this:

os.path.normcase(path)
    Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes.

Else you could look for something else on this page


在python中创建漂亮的列输出

问题:在python中创建漂亮的列输出

我试图在python中创建一个不错的列列表,以与我创建的命令行管理工具一起使用。

基本上,我想要一个像这样的列表:

[['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

变成:

a            b            c
aaaaaaaaaa   b            c
a            bbbbbbbbbb   c

使用普通标签不会解决问题,因为我不知道每一行中最长的数据。

这与Linux中的’column -t’相同。

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c"
a b c
aaaaaaaaaa b c
a bbbbbbbbbb c

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c" | column -t
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c

我到处寻找各种python库来执行此操作,但找不到任何有用的方法。

I am trying to create a nice column list in python for use with commandline admin tools which I create.

Basicly, I want a list like:

[['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

To turn into:

a            b            c
aaaaaaaaaa   b            c
a            bbbbbbbbbb   c

Using plain tabs wont do the trick here because I don’t know the longest data in each row.

This is the same behavior as ‘column -t’ in Linux..

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c"
a b c
aaaaaaaaaa b c
a bbbbbbbbbb c

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c" | column -t
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c

I have looked around for various python libraries to do this but can’t find anything useful.


回答 0

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

col_width = max(len(word) for row in data for word in row) + 2  # padding
for row in data:
    print "".join(word.ljust(col_width) for word in row)

a            b            c            
aaaaaaaaaa   b            c            
a            bbbbbbbbbb   c   

这样做是计算最长的数据条目以确定列宽,然后.ljust()在打印出每一列时用于添加必要的填充。

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

col_width = max(len(word) for row in data for word in row) + 2  # padding
for row in data:
    print "".join(word.ljust(col_width) for word in row)

a            b            c            
aaaaaaaaaa   b            c            
a            bbbbbbbbbb   c   

What this does is calculate the longest data entry to determine the column width, then use .ljust() to add the necessary padding when printing out each column.


回答 1

从Python 2.6+开始,您可以通过以下方式使用格式字符串,以将列设置为至少20个字符,并将文本向右对齐。

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]
for row in table_data:
    print("{: >20} {: >20} {: >20}".format(*row))

输出:

               a                    b                    c
      aaaaaaaaaa                    b                    c
               a           bbbbbbbbbb                    c

Since Python 2.6+, you can use a format string in the following way to set the columns to a minimum of 20 characters and align text to right.

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]
for row in table_data:
    print("{: >20} {: >20} {: >20}".format(*row))

Output:

               a                    b                    c
      aaaaaaaaaa                    b                    c
               a           bbbbbbbbbb                    c

回答 2

我来到这里有相同的要求,但@lvc和@Preet的答案似乎与column -t该列中产生的内容具有不同的宽度更加一致:

>>> rows =  [   ['a',           'b',            'c',    'd']
...         ,   ['aaaaaaaaaa',  'b',            'c',    'd']
...         ,   ['a',           'bbbbbbbbbb',   'c',    'd']
...         ]
...

>>> widths = [max(map(len, col)) for col in zip(*rows)]
>>> for row in rows:
...     print "  ".join((val.ljust(width) for val, width in zip(row, widths)))
...
a           b           c  d
aaaaaaaaaa  b           c  d
a           bbbbbbbbbb  c  d

I came here with the same requirements but @lvc and @Preet’s answers seems more inline with what column -t produces in that columns have different widths:

>>> rows =  [   ['a',           'b',            'c',    'd']
...         ,   ['aaaaaaaaaa',  'b',            'c',    'd']
...         ,   ['a',           'bbbbbbbbbb',   'c',    'd']
...         ]
...

>>> widths = [max(map(len, col)) for col in zip(*rows)]
>>> for row in rows:
...     print "  ".join((val.ljust(width) for val, width in zip(row, widths)))
...
a           b           c  d
aaaaaaaaaa  b           c  d
a           bbbbbbbbbb  c  d

回答 3

这对聚会来说有点晚了,是我写的一个无耻的插件,但是您也可以查看Columnar软件包。

它接受一个输入列表和一个标题列表,并输出一个表格式的字符串。此代码段创建了一个docker-esque表:

from columnar import columnar

headers = ['name', 'id', 'host', 'notes']

data = [
    ['busybox', 'c3c37d5d-38d2-409f-8d02-600fd9d51239', 'linuxnode-1-292735', 'Test server.'],
    ['alpine-python', '6bb77855-0fda-45a9-b553-e19e1a795f1e', 'linuxnode-2-249253', 'The one that runs python.'],
    ['redis', 'afb648ba-ac97-4fb2-8953-9a5b5f39663e', 'linuxnode-3-3416918', 'For queues and stuff.'],
    ['app-server', 'b866cd0f-bf80-40c7-84e3-c40891ec68f9', 'linuxnode-4-295918', 'A popular destination.'],
    ['nginx', '76fea0f0-aa53-4911-b7e4-fae28c2e469b', 'linuxnode-5-292735', 'Traffic Cop'],
]

table = columnar(data, headers, no_borders=True)
print(table)

表格显示无边框样式

或者,您可以得到一些带有颜色和边框的爱好者。 表显示春季经典

要了解有关列大小调整算法的更多信息并查看API的其余部分,可以查看上面的链接或查看Columnar GitHub Repo

This is a little late to the party, and a shameless plug for a package I wrote, but you can also check out the Columnar package.

It takes a list of lists of input and a list of headers and outputs a table-formatted string. This snippet creates a docker-esque table:

from columnar import columnar

headers = ['name', 'id', 'host', 'notes']

data = [
    ['busybox', 'c3c37d5d-38d2-409f-8d02-600fd9d51239', 'linuxnode-1-292735', 'Test server.'],
    ['alpine-python', '6bb77855-0fda-45a9-b553-e19e1a795f1e', 'linuxnode-2-249253', 'The one that runs python.'],
    ['redis', 'afb648ba-ac97-4fb2-8953-9a5b5f39663e', 'linuxnode-3-3416918', 'For queues and stuff.'],
    ['app-server', 'b866cd0f-bf80-40c7-84e3-c40891ec68f9', 'linuxnode-4-295918', 'A popular destination.'],
    ['nginx', '76fea0f0-aa53-4911-b7e4-fae28c2e469b', 'linuxnode-5-292735', 'Traffic Cop'],
]

table = columnar(data, headers, no_borders=True)
print(table)

Table Displaying No-border Style

Or you can get a little fancier with colors and borders. Table Displaying Spring Classics

To read more about the column-sizing algorithm and see the rest of the API you can check out the link above or see the Columnar GitHub Repo


回答 4

您必须使用2个通行证来执行此操作:

  1. 获取每列的最大宽度。
  2. 使用格式化使用我们最宽的知识从第一遍列str.ljust()str.rjust()

You have to do this with 2 passes:

  1. get the maximum width of each column.
  2. formatting the columns using our knowledge of max width from the first pass using str.ljust() and str.rjust()

回答 5

像这样转换列是zip的工作:

>>> a = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
>>> list(zip(*a))
[('a', 'aaaaaaaaaa', 'a'), ('b', 'b', 'bbbbbbbbbb'), ('c', 'c', 'c')]

要查找每列所需的长度,可以使用max

>>> trans_a = zip(*a)
>>> [max(len(c) for c in b) for b in trans_a]
[10, 10, 1]

您可以在适当的填充下使用它来构造要传递给的字符串print

>>> col_lenghts = [max(len(c) for c in b) for b in trans_a]
>>> padding = ' ' # You might want more
>>> padding.join(s.ljust(l) for s,l in zip(a[0], col_lenghts))
'a          b          c'

Transposing the columns like that is a job for zip:

>>> a = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
>>> list(zip(*a))
[('a', 'aaaaaaaaaa', 'a'), ('b', 'b', 'bbbbbbbbbb'), ('c', 'c', 'c')]

To find the required length of each column, you can use max:

>>> trans_a = zip(*a)
>>> [max(len(c) for c in b) for b in trans_a]
[10, 10, 1]

Which you can use, with suitable padding, to construct strings to pass to print:

>>> col_lenghts = [max(len(c) for c in b) for b in trans_a]
>>> padding = ' ' # You might want more
>>> padding.join(s.ljust(l) for s,l in zip(a[0], col_lenghts))
'a          b          c'

回答 6

要获得更高级的表

---------------------------------------------------
| First Name | Last Name        | Age | Position  |
---------------------------------------------------
| John       | Smith            | 24  | Software  |
|            |                  |     | Engineer  |
---------------------------------------------------
| Mary       | Brohowski        | 23  | Sales     |
|            |                  |     | Manager   |
---------------------------------------------------
| Aristidis  | Papageorgopoulos | 28  | Senior    |
|            |                  |     | Reseacher |
---------------------------------------------------

您可以使用以下Python配方

'''
From http://code.activestate.com/recipes/267662-table-indentation/
PSF License
'''
import cStringIO,operator

def indent(rows, hasHeader=False, headerChar='-', delim=' | ', justify='left',
           separateRows=False, prefix='', postfix='', wrapfunc=lambda x:x):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column. 
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def rowWrapper(row):
        newRows = [wrapfunc(item).split('\n') for item in row]
        return [[substr or '' for substr in item] for item in map(None,*newRows)]
    # break each logical row into one or more physical ones
    logicalRows = [rowWrapper(row) for row in rows]
    # columns of physical rows
    columns = map(None,*reduce(operator.add,logicalRows))
    # get the maximum of each column by the string length of its items
    maxWidths = [max([len(str(item)) for item in column]) for column in columns]
    rowSeparator = headerChar * (len(prefix) + len(postfix) + sum(maxWidths) + \
                                 len(delim)*(len(maxWidths)-1))
    # select the appropriate justify method
    justify = {'center':str.center, 'right':str.rjust, 'left':str.ljust}[justify.lower()]
    output=cStringIO.StringIO()
    if separateRows: print >> output, rowSeparator
    for physicalRows in logicalRows:
        for row in physicalRows:
            print >> output, \
                prefix \
                + delim.join([justify(str(item),width) for (item,width) in zip(row,maxWidths)]) \
                + postfix
        if separateRows or hasHeader: print >> output, rowSeparator; hasHeader=False
    return output.getvalue()

# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return reduce(lambda line, word, width=width: '%s%s%s' %
                  (line,
                   ' \n'[(len(line[line.rfind('\n')+1:])
                         + len(word.split('\n',1)[0]
                              ) >= width)],
                   word),
                  text.split(' ')
                 )

import re
def wrap_onspace_strict(text, width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    wordRegex = re.compile(r'\S{'+str(width)+r',}')
    return wrap_onspace(wordRegex.sub(lambda m: wrap_always(m.group(),width),text),width)

import math
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return '\n'.join([ text[width*i:width*(i+1)] \
                       for i in xrange(int(math.ceil(1.*len(text)/width))) ])

if __name__ == '__main__':
    labels = ('First Name', 'Last Name', 'Age', 'Position')
    data = \
    '''John,Smith,24,Software Engineer
       Mary,Brohowski,23,Sales Manager
       Aristidis,Papageorgopoulos,28,Senior Reseacher'''
    rows = [row.strip().split(',')  for row in data.splitlines()]

    print 'Without wrapping function\n'
    print indent([labels]+rows, hasHeader=True)
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always,wrap_onspace,wrap_onspace_strict):
        print 'Wrapping function: %s(x,width=%d)\n' % (wrapper.__name__,width)
        print indent([labels]+rows, hasHeader=True, separateRows=True,
                     prefix='| ', postfix=' |',
                     wrapfunc=lambda x: wrapper(x,width))

    # output:
    #
    #Without wrapping function
    #
    #First Name | Last Name        | Age | Position         
    #-------------------------------------------------------
    #John       | Smith            | 24  | Software Engineer
    #Mary       | Brohowski        | 23  | Sales Manager    
    #Aristidis  | Papageorgopoulos | 28  | Senior Reseacher 
    #
    #Wrapping function: wrap_always(x,width=10)
    #
    #----------------------------------------------
    #| First Name | Last Name  | Age | Position   |
    #----------------------------------------------
    #| John       | Smith      | 24  | Software E |
    #|            |            |     | ngineer    |
    #----------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales Mana |
    #|            |            |     | ger        |
    #----------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior Res |
    #|            | poulos     |     | eacher     |
    #----------------------------------------------
    #
    #Wrapping function: wrap_onspace(x,width=10)
    #
    #---------------------------------------------------
    #| First Name | Last Name        | Age | Position  |
    #---------------------------------------------------
    #| John       | Smith            | 24  | Software  |
    #|            |                  |     | Engineer  |
    #---------------------------------------------------
    #| Mary       | Brohowski        | 23  | Sales     |
    #|            |                  |     | Manager   |
    #---------------------------------------------------
    #| Aristidis  | Papageorgopoulos | 28  | Senior    |
    #|            |                  |     | Reseacher |
    #---------------------------------------------------
    #
    #Wrapping function: wrap_onspace_strict(x,width=10)
    #
    #---------------------------------------------
    #| First Name | Last Name  | Age | Position  |
    #---------------------------------------------
    #| John       | Smith      | 24  | Software  |
    #|            |            |     | Engineer  |
    #---------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales     |
    #|            |            |     | Manager   |
    #---------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior    |
    #|            | poulos     |     | Reseacher |
    #---------------------------------------------

Python的配方页面包含了它一些改进。

To get fancier tables like

---------------------------------------------------
| First Name | Last Name        | Age | Position  |
---------------------------------------------------
| John       | Smith            | 24  | Software  |
|            |                  |     | Engineer  |
---------------------------------------------------
| Mary       | Brohowski        | 23  | Sales     |
|            |                  |     | Manager   |
---------------------------------------------------
| Aristidis  | Papageorgopoulos | 28  | Senior    |
|            |                  |     | Reseacher |
---------------------------------------------------

you can use this Python recipe:

'''
From http://code.activestate.com/recipes/267662-table-indentation/
PSF License
'''
import cStringIO,operator

def indent(rows, hasHeader=False, headerChar='-', delim=' | ', justify='left',
           separateRows=False, prefix='', postfix='', wrapfunc=lambda x:x):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column. 
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def rowWrapper(row):
        newRows = [wrapfunc(item).split('\n') for item in row]
        return [[substr or '' for substr in item] for item in map(None,*newRows)]
    # break each logical row into one or more physical ones
    logicalRows = [rowWrapper(row) for row in rows]
    # columns of physical rows
    columns = map(None,*reduce(operator.add,logicalRows))
    # get the maximum of each column by the string length of its items
    maxWidths = [max([len(str(item)) for item in column]) for column in columns]
    rowSeparator = headerChar * (len(prefix) + len(postfix) + sum(maxWidths) + \
                                 len(delim)*(len(maxWidths)-1))
    # select the appropriate justify method
    justify = {'center':str.center, 'right':str.rjust, 'left':str.ljust}[justify.lower()]
    output=cStringIO.StringIO()
    if separateRows: print >> output, rowSeparator
    for physicalRows in logicalRows:
        for row in physicalRows:
            print >> output, \
                prefix \
                + delim.join([justify(str(item),width) for (item,width) in zip(row,maxWidths)]) \
                + postfix
        if separateRows or hasHeader: print >> output, rowSeparator; hasHeader=False
    return output.getvalue()

# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return reduce(lambda line, word, width=width: '%s%s%s' %
                  (line,
                   ' \n'[(len(line[line.rfind('\n')+1:])
                         + len(word.split('\n',1)[0]
                              ) >= width)],
                   word),
                  text.split(' ')
                 )

import re
def wrap_onspace_strict(text, width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    wordRegex = re.compile(r'\S{'+str(width)+r',}')
    return wrap_onspace(wordRegex.sub(lambda m: wrap_always(m.group(),width),text),width)

import math
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return '\n'.join([ text[width*i:width*(i+1)] \
                       for i in xrange(int(math.ceil(1.*len(text)/width))) ])

if __name__ == '__main__':
    labels = ('First Name', 'Last Name', 'Age', 'Position')
    data = \
    '''John,Smith,24,Software Engineer
       Mary,Brohowski,23,Sales Manager
       Aristidis,Papageorgopoulos,28,Senior Reseacher'''
    rows = [row.strip().split(',')  for row in data.splitlines()]

    print 'Without wrapping function\n'
    print indent([labels]+rows, hasHeader=True)
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always,wrap_onspace,wrap_onspace_strict):
        print 'Wrapping function: %s(x,width=%d)\n' % (wrapper.__name__,width)
        print indent([labels]+rows, hasHeader=True, separateRows=True,
                     prefix='| ', postfix=' |',
                     wrapfunc=lambda x: wrapper(x,width))

    # output:
    #
    #Without wrapping function
    #
    #First Name | Last Name        | Age | Position         
    #-------------------------------------------------------
    #John       | Smith            | 24  | Software Engineer
    #Mary       | Brohowski        | 23  | Sales Manager    
    #Aristidis  | Papageorgopoulos | 28  | Senior Reseacher 
    #
    #Wrapping function: wrap_always(x,width=10)
    #
    #----------------------------------------------
    #| First Name | Last Name  | Age | Position   |
    #----------------------------------------------
    #| John       | Smith      | 24  | Software E |
    #|            |            |     | ngineer    |
    #----------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales Mana |
    #|            |            |     | ger        |
    #----------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior Res |
    #|            | poulos     |     | eacher     |
    #----------------------------------------------
    #
    #Wrapping function: wrap_onspace(x,width=10)
    #
    #---------------------------------------------------
    #| First Name | Last Name        | Age | Position  |
    #---------------------------------------------------
    #| John       | Smith            | 24  | Software  |
    #|            |                  |     | Engineer  |
    #---------------------------------------------------
    #| Mary       | Brohowski        | 23  | Sales     |
    #|            |                  |     | Manager   |
    #---------------------------------------------------
    #| Aristidis  | Papageorgopoulos | 28  | Senior    |
    #|            |                  |     | Reseacher |
    #---------------------------------------------------
    #
    #Wrapping function: wrap_onspace_strict(x,width=10)
    #
    #---------------------------------------------
    #| First Name | Last Name  | Age | Position  |
    #---------------------------------------------
    #| John       | Smith      | 24  | Software  |
    #|            |            |     | Engineer  |
    #---------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales     |
    #|            |            |     | Manager   |
    #---------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior    |
    #|            | poulos     |     | Reseacher |
    #---------------------------------------------

The Python recipe page contains a few improvements on it.


回答 7

pandas 创建数据框的基于解决方案:

import pandas as pd
l = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
df = pd.DataFrame(l)

print(df)
            0           1  2
0           a           b  c
1  aaaaaaaaaa           b  c
2           a  bbbbbbbbbb  c

要删除索引和标头值以创建所需的输出,可以使用to_string方法:

result = df.to_string(index=False, header=False)

print(result)
          a           b  c
 aaaaaaaaaa           b  c
          a  bbbbbbbbbb  c

pandas based solution with creating dataframe:

import pandas as pd
l = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
df = pd.DataFrame(l)

print(df)
            0           1  2
0           a           b  c
1  aaaaaaaaaa           b  c
2           a  bbbbbbbbbb  c

To remove index and header values to create output what you want you could use to_string method:

result = df.to_string(index=False, header=False)

print(result)
          a           b  c
 aaaaaaaaaa           b  c
          a  bbbbbbbbbb  c

回答 8

Scolp是一个新的库,可让您在自动调整列宽的同时轻松打印流式列数据。

(免责声明:我是作者)

Scolp is a new library that lets you pretty print streaming columnar data easily while auto-adjusting column width.

(Disclaimer: I am the author)


回答 9

这将基于其他答案中使用的最大度量设置独立的,最适合的列宽。

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
padding = 2
col_widths = [max(len(w) for w in [r[cn] for r in data]) + padding for cn in range(len(data[0]))]
format_string = "{{:{}}}{{:{}}}{{:{}}}".format(*col_widths)
for row in data:
    print(format_string.format(*row))

This sets independent, best-fit column widths based on the max-metric used in other answers.

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
padding = 2
col_widths = [max(len(w) for w in [r[cn] for r in data]) + padding for cn in range(len(data[0]))]
format_string = "{{:{}}}{{:{}}}{{:{}}}".format(*col_widths)
for row in data:
    print(format_string.format(*row))

回答 10

对于懒惰的人

使用Python 3. *Pandas / Geopandas的代码;通用的简单类内方法(对于“普通”脚本,只需删除self):

函数着色:

    def colorize(self,s,color):
        s = color+str(s)+"\033[0m"
        return s

标头:

print('{0:<23} {1:>24} {2:>26} {3:>26} {4:>11} {5:>11}'.format('Road name','Classification','Function','Form of road','Length','Distance') )

然后是来自Pandas / Geopandas数据框的数据:

            for index, row in clipped.iterrows():
                rdName      = self.colorize(row['name1'],"\033[32m")
                rdClass     = self.colorize(row['roadClassification'],"\033[93m")
                rdFunction  = self.colorize(row['roadFunction'],"\033[33m")
                rdForm      = self.colorize(row['formOfWay'],"\033[94m")
                rdLength    = self.colorize(row['length'],"\033[97m")
                rdDistance  = self.colorize(row['distance'],"\033[96m")
                print('{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}'.format(rdName,rdClass,rdFunction,rdForm,rdLength,rdDistance) )

含义{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}

0, 1, 2, 3, 4, 5 ->列,在这种情况下总共有6个

30, 35, 20->列的宽度(请注意,您必须添加长度\033[96m-对于Python来说,这也是一个字符串),只需进行实验即可:)

>, <->对齐:右,左(也=用于填充零)

如果要区分例如最大值,则必须切换到特殊的Pandas样式函数,但是假设这足以在终端窗口上显示数据。

结果:

在此处输入图片说明

For lazy people

that are using Python 3.* and Pandas/Geopandas; universal simple in-class approach (for ‘normal’ script just remove self):

Function colorize:

    def colorize(self,s,color):
        s = color+str(s)+"\033[0m"
        return s

Header:

print('{0:<23} {1:>24} {2:>26} {3:>26} {4:>11} {5:>11}'.format('Road name','Classification','Function','Form of road','Length','Distance') )

and then data from Pandas/Geopandas dataframe:

            for index, row in clipped.iterrows():
                rdName      = self.colorize(row['name1'],"\033[32m")
                rdClass     = self.colorize(row['roadClassification'],"\033[93m")
                rdFunction  = self.colorize(row['roadFunction'],"\033[33m")
                rdForm      = self.colorize(row['formOfWay'],"\033[94m")
                rdLength    = self.colorize(row['length'],"\033[97m")
                rdDistance  = self.colorize(row['distance'],"\033[96m")
                print('{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}'.format(rdName,rdClass,rdFunction,rdForm,rdLength,rdDistance) )

Meaning of {0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}:

0, 1, 2, 3, 4, 5 -> columns, there are 6 in total in this case

30, 35, 20 -> width of column (note that you’ll have to add length of \033[96m – this for Python is a string as well), just experiment :)

>, < -> justify: right, left (there is = for filling with zeros as well)

If you want to distinct e.g. max value, you’ll have to switch to special Pandas style function, but suppose that’s far enough to present data on terminal window.

Result:

enter image description here


回答 11

与先前的答案略有不同(我没有足够的代表对此发表评论)。格式库允许您指定元素的宽度和对齐方式,但不能指定元素的开始位置,即可以说“宽20列”,而不能说“从20列开始”。导致此问题:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print("first row: {: >20} {: >20} {: >20}".format(*table_data[0]))
print("second row: {: >20} {: >20} {: >20}".format(*table_data[1]))
print("third row: {: >20} {: >20} {: >20}".format(*table_data[2]))

输出量

first row:                    a                    b                    c
second row:           aaaaaaaaaa                    b                    c
third row:                    a           bbbbbbbbbb                    c

当然,答案是格式化文字字符串,它与格式有点奇怪地结合在一起:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print(f"{'first row:': <20} {table_data[0][0]: >20} {table_data[0][1]: >20} {table_data[0][2]: >20}")
print("{: <20} {: >20} {: >20} {: >20}".format(*['second row:', *table_data[1]]))
print("{: <20} {: >20} {: >20} {: >20}".format(*['third row:', *table_data[1]]))

输出量

first row:                              a                    b                    c
second row:                    aaaaaaaaaa                    b                    c
third row:                     aaaaaaaaaa                    b                    c

A slight variation on a previous answer (I don’t have enough rep to comment on it). The format library lets you specify the width and alignment of an element but not where it starts, ie, you can say “be 20 columns wide” but not “start in column 20”. Which leads to this issue:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print("first row: {: >20} {: >20} {: >20}".format(*table_data[0]))
print("second row: {: >20} {: >20} {: >20}".format(*table_data[1]))
print("third row: {: >20} {: >20} {: >20}".format(*table_data[2]))

Output

first row:                    a                    b                    c
second row:           aaaaaaaaaa                    b                    c
third row:                    a           bbbbbbbbbb                    c

The answer of course is to format the literal strings as well, which combines slightly weirdly with the format:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print(f"{'first row:': <20} {table_data[0][0]: >20} {table_data[0][1]: >20} {table_data[0][2]: >20}")
print("{: <20} {: >20} {: >20} {: >20}".format(*['second row:', *table_data[1]]))
print("{: <20} {: >20} {: >20} {: >20}".format(*['third row:', *table_data[1]]))

Output

first row:                              a                    b                    c
second row:                    aaaaaaaaaa                    b                    c
third row:                     aaaaaaaaaa                    b                    c

回答 12

我发现这个答案超级有用且优雅,最初是从这里开始的

matrix = [["A", "B"], ["C", "D"]]

print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in matrix]))

输出量

A   B
C   D

I found this answer super-helpful and elegant, originally from here:

matrix = [["A", "B"], ["C", "D"]]

print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in matrix]))

Output

A   B
C   D

回答 13

这是肖恩·钦(Shawn Chin)答案的一种变体。宽度是固定的,不是每列都固定。第一行下方和各列之间还有一个边框。(icontract库用于执行合同。)

@icontract.pre(
    lambda table: not table or all(len(row) == len(table[0]) for row in table))
@icontract.post(lambda table, result: result == "" if not table else True)
@icontract.post(lambda result: not result.endswith("\n"))
def format_table(table: List[List[str]]) -> str:
    """
    Format the table as equal-spaced columns.

    :param table: rows of cells
    :return: table as string
    """
    cols = len(table[0])

    col_widths = [max(len(row[i]) for row in table) for i in range(cols)]

    lines = []  # type: List[str]
    for i, row in enumerate(table):
        parts = []  # type: List[str]

        for cell, width in zip(row, col_widths):
            parts.append(cell.ljust(width))

        line = " | ".join(parts)
        lines.append(line)

        if i == 0:
            border = []  # type: List[str]

            for width in col_widths:
                border.append("-" * width)

            lines.append("-+-".join(border))

    result = "\n".join(lines)

    return result

这是一个例子:

>>> table = [['column 0', 'another column 1'], ['00', '01'], ['10', '11']]
>>> result = packagery._format_table(table=table)
>>> print(result)
column 0 | another column 1
---------+-----------------
00       | 01              
10       | 11              

Here is a variation of the Shawn Chin’s answer. The width is fixed per column, not over all columns. There is also a border below the first row and between the columns. (icontract library is used to enforce the contracts.)

@icontract.pre(
    lambda table: not table or all(len(row) == len(table[0]) for row in table))
@icontract.post(lambda table, result: result == "" if not table else True)
@icontract.post(lambda result: not result.endswith("\n"))
def format_table(table: List[List[str]]) -> str:
    """
    Format the table as equal-spaced columns.

    :param table: rows of cells
    :return: table as string
    """
    cols = len(table[0])

    col_widths = [max(len(row[i]) for row in table) for i in range(cols)]

    lines = []  # type: List[str]
    for i, row in enumerate(table):
        parts = []  # type: List[str]

        for cell, width in zip(row, col_widths):
            parts.append(cell.ljust(width))

        line = " | ".join(parts)
        lines.append(line)

        if i == 0:
            border = []  # type: List[str]

            for width in col_widths:
                border.append("-" * width)

            lines.append("-+-".join(border))

    result = "\n".join(lines)

    return result

Here is an example:

>>> table = [['column 0', 'another column 1'], ['00', '01'], ['10', '11']]
>>> result = packagery._format_table(table=table)
>>> print(result)
column 0 | another column 1
---------+-----------------
00       | 01              
10       | 11              

回答 14

更新了@Franck Dernoncourt花式食谱以符合python 3和PEP8

import io
import math
import operator
import re
import functools

from itertools import zip_longest


def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column.
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""

    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]

    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    if separate_rows:
        print(output, row_separator)
    for physicalRows in logical_rows:
        for row in physicalRows:
            print( output, prefix + delim.join(
                [justify(str(item), width) for (item, width) in zip(row, max_widths)]
            ) + postfix)
        if separate_rows or has_header:
            print(output, row_separator)
            has_header = False
    return output.getvalue()


# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )


def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )


def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )


if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
           Mary,Brohowski,23,Sales Manager
           Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]

    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))

    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))

        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )

    # output:
    #
    # Without wrapping function
    #
    # First Name | Last Name        | Age | Position
    # -------------------------------------------------------
    # John       | Smith            | 24  | Software Engineer
    # Mary       | Brohowski        | 23  | Sales Manager
    # Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
    #
    # Wrapping function: wrap_always(x,width=10)
    #
    # ----------------------------------------------
    # | First Name | Last Name  | Age | Position   |
    # ----------------------------------------------
    # | John       | Smith      | 24  | Software E |
    # |            |            |     | ngineer    |
    # ----------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales Mana |
    # |            |            |     | ger        |
    # ----------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior Res |
    # |            | poulos     |     | eacher     |
    # ----------------------------------------------
    #
    # Wrapping function: wrap_onspace(x,width=10)
    #
    # ---------------------------------------------------
    # | First Name | Last Name        | Age | Position  |
    # ---------------------------------------------------
    # | John       | Smith            | 24  | Software  |
    # |            |                  |     | Engineer  |
    # ---------------------------------------------------
    # | Mary       | Brohowski        | 23  | Sales     |
    # |            |                  |     | Manager   |
    # ---------------------------------------------------
    # | Aristidis  | Papageorgopoulos | 28  | Senior    |
    # |            |                  |     | Reseacher |
    # ---------------------------------------------------
    #
    # Wrapping function: wrap_onspace_strict(x,width=10)
    #
    # ---------------------------------------------
    # | First Name | Last Name  | Age | Position  |
    # ---------------------------------------------
    # | John       | Smith      | 24  | Software  |
    # |            |            |     | Engineer  |
    # ---------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales     |
    # |            |            |     | Manager   |
    # ---------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior    |
    # |            | poulos     |     | Reseacher |
    # ---------------------------------------------

updated @Franck Dernoncourt fancy recipe to be python 3 and PEP8 compliant

import io
import math
import operator
import re
import functools

from itertools import zip_longest


def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column.
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""

    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]

    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    if separate_rows:
        print(output, row_separator)
    for physicalRows in logical_rows:
        for row in physicalRows:
            print( output, prefix + delim.join(
                [justify(str(item), width) for (item, width) in zip(row, max_widths)]
            ) + postfix)
        if separate_rows or has_header:
            print(output, row_separator)
            has_header = False
    return output.getvalue()


# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )


def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )


def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )


if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
           Mary,Brohowski,23,Sales Manager
           Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]

    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))

    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))

        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )

    # output:
    #
    # Without wrapping function
    #
    # First Name | Last Name        | Age | Position
    # -------------------------------------------------------
    # John       | Smith            | 24  | Software Engineer
    # Mary       | Brohowski        | 23  | Sales Manager
    # Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
    #
    # Wrapping function: wrap_always(x,width=10)
    #
    # ----------------------------------------------
    # | First Name | Last Name  | Age | Position   |
    # ----------------------------------------------
    # | John       | Smith      | 24  | Software E |
    # |            |            |     | ngineer    |
    # ----------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales Mana |
    # |            |            |     | ger        |
    # ----------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior Res |
    # |            | poulos     |     | eacher     |
    # ----------------------------------------------
    #
    # Wrapping function: wrap_onspace(x,width=10)
    #
    # ---------------------------------------------------
    # | First Name | Last Name        | Age | Position  |
    # ---------------------------------------------------
    # | John       | Smith            | 24  | Software  |
    # |            |                  |     | Engineer  |
    # ---------------------------------------------------
    # | Mary       | Brohowski        | 23  | Sales     |
    # |            |                  |     | Manager   |
    # ---------------------------------------------------
    # | Aristidis  | Papageorgopoulos | 28  | Senior    |
    # |            |                  |     | Reseacher |
    # ---------------------------------------------------
    #
    # Wrapping function: wrap_onspace_strict(x,width=10)
    #
    # ---------------------------------------------
    # | First Name | Last Name  | Age | Position  |
    # ---------------------------------------------
    # | John       | Smith      | 24  | Software  |
    # |            |            |     | Engineer  |
    # ---------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales     |
    # |            |            |     | Manager   |
    # ---------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior    |
    # |            | poulos     |     | Reseacher |
    # ---------------------------------------------

回答 15

我知道这个问题很旧,但是我不了解Antak的答案,也不想使用库,所以我推出了自己的解决方案。

解决方案假定记录是2D数组,记录的长度都相同,并且字段都是字符串。

def stringifyRecords(records):
    column_widths = [0] * len(records[0])
    for record in records:
        for i, field in enumerate(record):
            width = len(field)
            if width > column_widths[i]: column_widths[i] = width

    s = ""
    for record in records:
        for column_width, field in zip(column_widths, record):
            s += field.ljust(column_width+1)
        s += "\n"

    return s

I realize this question is old but I didn’t understand Antak’s answer and didn’t want to use a library so I rolled my own solution.

Solution assumes records is a 2D array, records are all the same length, and that fields are all strings.

def stringifyRecords(records):
    column_widths = [0] * len(records[0])
    for record in records:
        for i, field in enumerate(record):
            width = len(field)
            if width > column_widths[i]: column_widths[i] = width

    s = ""
    for record in records:
        for column_width, field in zip(column_widths, record):
            s += field.ljust(column_width+1)
        s += "\n"

    return s

Python中是否有`string.split()`的生成器版本?

问题:Python中是否有`string.split()`的生成器版本?

string.split()返回列表实例。是否有返回生成器的版本?是否有任何理由禁止使用生成器版本?

string.split() returns a list instance. Is there a version that returns a generator instead? Are there any reasons against having a generator version?


回答 0

re.finditer使用相当少的内存开销的可能性很大。

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

演示:

>>> list( split_iter("A programmer's RegEx test.") )
['A', "programmer's", 'RegEx', 'test']

编辑:我刚刚确认,假设我的测试方法正确,这将在python 3.2.1中占用不变的内存。我创建了一个非常大的字符串(大约1GB),然后使用for循环遍历了可迭代对象(没有列表理解,这会产生额外的内存)。这不会导致内存的显着增长(也就是说,如果内存增长,则远远少于1GB字符串)。

It is highly probable that re.finditer uses fairly minimal memory overhead.

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

Demo:

>>> list( split_iter("A programmer's RegEx test.") )
['A', "programmer's", 'RegEx', 'test']

edit: I have just confirmed that this takes constant memory in python 3.2.1, assuming my testing methodology was correct. I created a string of very large size (1GB or so), then iterated through the iterable with a for loop (NOT a list comprehension, which would have generated extra memory). This did not result in a noticeable growth of memory (that is, if there was a growth in memory, it was far far less than the 1GB string).


回答 1

我可以想到的最有效的方法是使用方法的offset参数编写一个str.find()。这样可以避免大量的内存使用,并在不需要时依靠正则表达式的开销。

[编辑2016-8-2:已对此进行更新,可以选择支持正则表达式分隔符]

def isplit(source, sep=None, regex=False):
    """
    generator version of str.split()

    :param source:
        source string (unicode or bytes)

    :param sep:
        separator to split on.

    :param regex:
        if True, will treat sep as regular expression.

    :returns:
        generator yielding elements of string.
    """
    if sep is None:
        # mimic default python behavior
        source = source.strip()
        sep = "\\s+"
        if isinstance(source, bytes):
            sep = sep.encode("ascii")
        regex = True
    if regex:
        # version using re.finditer()
        if not hasattr(sep, "finditer"):
            sep = re.compile(sep)
        start = 0
        for m in sep.finditer(source):
            idx = m.start()
            assert idx >= start
            yield source[start:idx]
            start = m.end()
        yield source[start:]
    else:
        # version using str.find(), less overhead than re.finditer()
        sepsize = len(sep)
        start = 0
        while True:
            idx = source.find(sep, start)
            if idx == -1:
                yield source[start:]
                return
            yield source[start:idx]
            start = idx + sepsize

可以根据需要使用…

>>> print list(isplit("abcb","b"))
['a','c','']

每次执行find()或切片时,在字符串中都需要花费一点成本,但这应该是最小的,因为字符串被表示为内存中的连续数组。

The most efficient way I can think of it to write one using the offset parameter of the str.find() method. This avoids lots of memory use, and relying on the overhead of a regexp when it’s not needed.

[edit 2016-8-2: updated this to optionally support regex separators]

def isplit(source, sep=None, regex=False):
    """
    generator version of str.split()

    :param source:
        source string (unicode or bytes)

    :param sep:
        separator to split on.

    :param regex:
        if True, will treat sep as regular expression.

    :returns:
        generator yielding elements of string.
    """
    if sep is None:
        # mimic default python behavior
        source = source.strip()
        sep = "\\s+"
        if isinstance(source, bytes):
            sep = sep.encode("ascii")
        regex = True
    if regex:
        # version using re.finditer()
        if not hasattr(sep, "finditer"):
            sep = re.compile(sep)
        start = 0
        for m in sep.finditer(source):
            idx = m.start()
            assert idx >= start
            yield source[start:idx]
            start = m.end()
        yield source[start:]
    else:
        # version using str.find(), less overhead than re.finditer()
        sepsize = len(sep)
        start = 0
        while True:
            idx = source.find(sep, start)
            if idx == -1:
                yield source[start:]
                return
            yield source[start:idx]
            start = idx + sepsize

This can be used like you want…

>>> print list(isplit("abcb","b"))
['a','c','']

While there is a little bit of cost seeking within the string each time find() or slicing is performed, this should be minimal since strings are represented as continguous arrays in memory.


回答 2

这是split()通过实现的生成器版本,re.search()不存在分配太多子字符串的问题。

import re

def itersplit(s, sep=None):
    exp = re.compile(r'\s+' if sep is None else re.escape(sep))
    pos = 0
    while True:
        m = exp.search(s, pos)
        if not m:
            if pos < len(s) or sep is not None:
                yield s[pos:]
            break
        if pos < m.start() or sep is not None:
            yield s[pos:m.start()]
        pos = m.end()


sample1 = "Good evening, world!"
sample2 = " Good evening, world! "
sample3 = "brackets][all][][over][here"
sample4 = "][brackets][all][][over][here]["

assert list(itersplit(sample1)) == sample1.split()
assert list(itersplit(sample2)) == sample2.split()
assert list(itersplit(sample3, '][')) == sample3.split('][')
assert list(itersplit(sample4, '][')) == sample4.split('][')

编辑:如果没有给出分隔符,则纠正了周围空白的处理。

This is generator version of split() implemented via re.search() that does not have the problem of allocating too many substrings.

import re

def itersplit(s, sep=None):
    exp = re.compile(r'\s+' if sep is None else re.escape(sep))
    pos = 0
    while True:
        m = exp.search(s, pos)
        if not m:
            if pos < len(s) or sep is not None:
                yield s[pos:]
            break
        if pos < m.start() or sep is not None:
            yield s[pos:m.start()]
        pos = m.end()


sample1 = "Good evening, world!"
sample2 = " Good evening, world! "
sample3 = "brackets][all][][over][here"
sample4 = "][brackets][all][][over][here]["

assert list(itersplit(sample1)) == sample1.split()
assert list(itersplit(sample2)) == sample2.split()
assert list(itersplit(sample3, '][')) == sample3.split('][')
assert list(itersplit(sample4, '][')) == sample4.split('][')

EDIT: Corrected handling of surrounding whitespace if no separator chars are given.


回答 3

对提出的各种方法进行了性能测试(我在这里不再重复)。一些结果:

  • str.split (默认= 0.3461570239996945
  • 手动搜索(按字符)(Dave Webb的答案之一)= 0.8260340550004912
  • re.finditer (ninjagecko的答案)= 0.698872097000276
  • str.find (Eli Collins的答案之一)= 0.7230395330007013
  • itertools.takewhile (伊格纳西奥·巴斯克斯(Ignacio Vazquez-Abrams)的答案)= 2.023023967998597
  • str.split(..., maxsplit=1) 递归= N / A†

† 鉴于s的速度,递归答案(string.split带有maxsplit = 1)未能在合理的时间内完成string.split,但它们可能在较短的字符串上可以更好地工作,但是后来我看不到内存不成问题的短字符串的用例。

使用timeit以下测试:

the_text = "100 " * 9999 + "100"

def test_function( method ):
    def fn( ):
        total = 0

        for x in method( the_text ):
            total += int( x )

        return total

    return fn

这就提出了另一个问题,即为什么string.split尽管使用了内存却速度如此之快。

Did some performance testing on the various methods proposed (I won’t repeat them here). Some results:

  • str.split (default = 0.3461570239996945
  • manual search (by character) (one of Dave Webb’s answer’s) = 0.8260340550004912
  • re.finditer (ninjagecko’s answer) = 0.698872097000276
  • str.find (one of Eli Collins’s answers) = 0.7230395330007013
  • itertools.takewhile (Ignacio Vazquez-Abrams’s answer) = 2.023023967998597
  • str.split(..., maxsplit=1) recursion = N/A†

†The recursion answers (string.split with maxsplit = 1) fail to complete in a reasonable time, given string.splits speed they may work better on shorter strings, but then I can’t see the use-case for short strings where memory isn’t an issue anyway.

Tested using timeit on:

the_text = "100 " * 9999 + "100"

def test_function( method ):
    def fn( ):
        total = 0

        for x in method( the_text ):
            total += int( x )

        return total

    return fn

This raises another question as to why string.split is so much faster despite its memory usage.


回答 4

这是我的实现,比这里的其他答案要快得多,更完整。它具有针对不同情况的4个单独的子功能。

我将只复制main str_split函数的文档字符串:


str_split(s, *delims, empty=None)

s用其余的参数分割字符串,可能省略空白部分(empty关键字参数负责)。这是一个生成器功能。

如果仅提供一个定界符,则字符串将被它简单分割。 empty然后True默认情况下。

str_split('[]aaa[][]bb[c', '[]')
    -> '', 'aaa', '', 'bb[c'
str_split('[]aaa[][]bb[c', '[]', empty=False)
    -> 'aaa', 'bb[c'

如果提供了多个定界符,则默认情况下,该字符串将按这些定界符的最长可能序列进行拆分,或者,如果empty将其设置为 True,则还包括定界符之间的空字符串。注意,在这种情况下,分隔符只能是单个字符。

str_split('aaa, bb : c;', ' ', ',', ':', ';')
    -> 'aaa', 'bb', 'c'
str_split('aaa, bb : c;', *' ,:;', empty=True)
    -> 'aaa', '', 'bb', '', '', 'c', ''

如果未提供定界符,string.whitespace则使用,因此效果与相同str.split(),不同之处在于此函数是一个生成器。

str_split('aaa\\t  bb c \\n')
    -> 'aaa', 'bb', 'c'

import string

def _str_split_chars(s, delims):
    "Split the string `s` by characters contained in `delims`, including the \
    empty parts between two consecutive delimiters"
    start = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[start:i]
            start = i+1
    yield s[start:]

def _str_split_chars_ne(s, delims):
    "Split the string `s` by longest possible sequences of characters \
    contained in `delims`"
    start = 0
    in_s = False
    for i, c in enumerate(s):
        if c in delims:
            if in_s:
                yield s[start:i]
                in_s = False
        else:
            if not in_s:
                in_s = True
                start = i
    if in_s:
        yield s[start:]


def _str_split_word(s, delim):
    "Split the string `s` by the string `delim`"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    yield s[start:]

def _str_split_word_ne(s, delim):
    "Split the string `s` by the string `delim`, not including empty parts \
    between two consecutive delimiters"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            if start!=i:
                yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    if start<len(s):
        yield s[start:]


def str_split(s, *delims, empty=None):
    """\
Split the string `s` by the rest of the arguments, possibly omitting
empty parts (`empty` keyword argument is responsible for that).
This is a generator function.

When only one delimiter is supplied, the string is simply split by it.
`empty` is then `True` by default.
    str_split('[]aaa[][]bb[c', '[]')
        -> '', 'aaa', '', 'bb[c'
    str_split('[]aaa[][]bb[c', '[]', empty=False)
        -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest
possible sequences of those delimiters by default, or, if `empty` is set to
`True`, empty strings between the delimiters are also included. Note that
the delimiters in this case may only be single characters.
    str_split('aaa, bb : c;', ' ', ',', ':', ';')
        -> 'aaa', 'bb', 'c'
    str_split('aaa, bb : c;', *' ,:;', empty=True)
        -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, `string.whitespace` is used, so the effect
is the same as `str.split()`, except this function is a generator.
    str_split('aaa\\t  bb c \\n')
        -> 'aaa', 'bb', 'c'
"""
    if len(delims)==1:
        f = _str_split_word if empty is None or empty else _str_split_word_ne
        return f(s, delims[0])
    if len(delims)==0:
        delims = string.whitespace
    delims = set(delims) if len(delims)>=4 else ''.join(delims)
    if any(len(d)>1 for d in delims):
        raise ValueError("Only 1-character multiple delimiters are supported")
    f = _str_split_chars if empty else _str_split_chars_ne
    return f(s, delims)

该函数可在Python 3中使用,并且可以通过简单但很难看的修复程序使其在2和3版本中均可使用。该函数的第一行应更改为:

def str_split(s, *delims, **kwargs):
    """...docstring..."""
    empty = kwargs.get('empty')

Here is my implementation, which is much, much faster and more complete than the other answers here. It has 4 separate subfunctions for different cases.

I’ll just copy the docstring of the main str_split function:


str_split(s, *delims, empty=None)

Split the string s by the rest of the arguments, possibly omitting empty parts (empty keyword argument is responsible for that). This is a generator function.

When only one delimiter is supplied, the string is simply split by it. empty is then True by default.

str_split('[]aaa[][]bb[c', '[]')
    -> '', 'aaa', '', 'bb[c'
str_split('[]aaa[][]bb[c', '[]', empty=False)
    -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest possible sequences of those delimiters by default, or, if empty is set to True, empty strings between the delimiters are also included. Note that the delimiters in this case may only be single characters.

str_split('aaa, bb : c;', ' ', ',', ':', ';')
    -> 'aaa', 'bb', 'c'
str_split('aaa, bb : c;', *' ,:;', empty=True)
    -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, string.whitespace is used, so the effect is the same as str.split(), except this function is a generator.

str_split('aaa\\t  bb c \\n')
    -> 'aaa', 'bb', 'c'

import string

def _str_split_chars(s, delims):
    "Split the string `s` by characters contained in `delims`, including the \
    empty parts between two consecutive delimiters"
    start = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[start:i]
            start = i+1
    yield s[start:]

def _str_split_chars_ne(s, delims):
    "Split the string `s` by longest possible sequences of characters \
    contained in `delims`"
    start = 0
    in_s = False
    for i, c in enumerate(s):
        if c in delims:
            if in_s:
                yield s[start:i]
                in_s = False
        else:
            if not in_s:
                in_s = True
                start = i
    if in_s:
        yield s[start:]


def _str_split_word(s, delim):
    "Split the string `s` by the string `delim`"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    yield s[start:]

def _str_split_word_ne(s, delim):
    "Split the string `s` by the string `delim`, not including empty parts \
    between two consecutive delimiters"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            if start!=i:
                yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    if start<len(s):
        yield s[start:]


def str_split(s, *delims, empty=None):
    """\
Split the string `s` by the rest of the arguments, possibly omitting
empty parts (`empty` keyword argument is responsible for that).
This is a generator function.

When only one delimiter is supplied, the string is simply split by it.
`empty` is then `True` by default.
    str_split('[]aaa[][]bb[c', '[]')
        -> '', 'aaa', '', 'bb[c'
    str_split('[]aaa[][]bb[c', '[]', empty=False)
        -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest
possible sequences of those delimiters by default, or, if `empty` is set to
`True`, empty strings between the delimiters are also included. Note that
the delimiters in this case may only be single characters.
    str_split('aaa, bb : c;', ' ', ',', ':', ';')
        -> 'aaa', 'bb', 'c'
    str_split('aaa, bb : c;', *' ,:;', empty=True)
        -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, `string.whitespace` is used, so the effect
is the same as `str.split()`, except this function is a generator.
    str_split('aaa\\t  bb c \\n')
        -> 'aaa', 'bb', 'c'
"""
    if len(delims)==1:
        f = _str_split_word if empty is None or empty else _str_split_word_ne
        return f(s, delims[0])
    if len(delims)==0:
        delims = string.whitespace
    delims = set(delims) if len(delims)>=4 else ''.join(delims)
    if any(len(d)>1 for d in delims):
        raise ValueError("Only 1-character multiple delimiters are supported")
    f = _str_split_chars if empty else _str_split_chars_ne
    return f(s, delims)

This function works in Python 3, and an easy, though quite ugly, fix can be applied to make it work in both 2 and 3 versions. The first lines of the function should be changed to:

def str_split(s, *delims, **kwargs):
    """...docstring..."""
    empty = kwargs.get('empty')

回答 5

否,但是使用编写一个应该足够容易itertools.takewhile()

编辑:

非常简单,半断的实现:

import itertools
import string

def isplitwords(s):
  i = iter(s)
  while True:
    r = []
    for c in itertools.takewhile(lambda x: not x in string.whitespace, i):
      r.append(c)
    else:
      if r:
        yield ''.join(r)
        continue
      else:
        raise StopIteration()

No, but it should be easy enough to write one using itertools.takewhile().

EDIT:

Very simple, half-broken implementation:

import itertools
import string

def isplitwords(s):
  i = iter(s)
  while True:
    r = []
    for c in itertools.takewhile(lambda x: not x in string.whitespace, i):
      r.append(c)
    else:
      if r:
        yield ''.join(r)
        continue
      else:
        raise StopIteration()

回答 6

我认为的生成器版本没有任何明显的好处split()。生成器对象将必须包含整个字符串以进行迭代,因此您不必通过生成器来节省任何内存。

如果您想编写一个,那将很容易:

import string

def gsplit(s,sep=string.whitespace):
    word = []

    for c in s:
        if c in sep:
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(c)

    if word:
        yield "".join(word)

I don’t see any obvious benefit to a generator version of split(). The generator object is going to have to contain the whole string to iterate over so you’re not going to save any memory by having a generator.

If you wanted to write one it would be fairly easy though:

import string

def gsplit(s,sep=string.whitespace):
    word = []

    for c in s:
        if c in sep:
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(c)

    if word:
        yield "".join(word)

回答 7

我写了一个@ninjagecko答案的版本,其行为更类似于string.split(即默认情况下用空格定界,您可以指定定界符)。

def isplit(string, delimiter = None):
    """Like string.split but returns an iterator (lazy)

    Multiple character delimters are not handled.
    """

    if delimiter is None:
        # Whitespace delimited by default
        delim = r"\s"

    elif len(delimiter) != 1:
        raise ValueError("Can only handle single character delimiters",
                        delimiter)

    else:
        # Escape, incase it's "\", "*" etc.
        delim = re.escape(delimiter)

    return (x.group(0) for x in re.finditer(r"[^{}]+".format(delim), string))

这是我使用的测试(在python 3和python 2中):

# Wrapper to make it a list
def helper(*args,  **kwargs):
    return list(isplit(*args, **kwargs))

# Normal delimiters
assert helper("1,2,3", ",") == ["1", "2", "3"]
assert helper("1;2;3,", ";") == ["1", "2", "3,"]
assert helper("1;2 ;3,  ", ";") == ["1", "2 ", "3,  "]

# Whitespace
assert helper("1 2 3") == ["1", "2", "3"]
assert helper("1\t2\t3") == ["1", "2", "3"]
assert helper("1\t2 \t3") == ["1", "2", "3"]
assert helper("1\n2\n3") == ["1", "2", "3"]

# Surrounding whitespace dropped
assert helper(" 1 2  3  ") == ["1", "2", "3"]

# Regex special characters
assert helper(r"1\2\3", "\\") == ["1", "2", "3"]
assert helper(r"1*2*3", "*") == ["1", "2", "3"]

# No multi-char delimiters allowed
try:
    helper(r"1,.2,.3", ",.")
    assert False
except ValueError:
    pass

python的regex模块说 unicode空格做了“正确的事”,但我实际上尚未对其进行测试。

也可作为要点

I wrote a version of @ninjagecko’s answer that behaves more like string.split (i.e. whitespace delimited by default and you can specify a delimiter).

def isplit(string, delimiter = None):
    """Like string.split but returns an iterator (lazy)

    Multiple character delimters are not handled.
    """

    if delimiter is None:
        # Whitespace delimited by default
        delim = r"\s"

    elif len(delimiter) != 1:
        raise ValueError("Can only handle single character delimiters",
                        delimiter)

    else:
        # Escape, incase it's "\", "*" etc.
        delim = re.escape(delimiter)

    return (x.group(0) for x in re.finditer(r"[^{}]+".format(delim), string))

Here are the tests I used (in both python 3 and python 2):

# Wrapper to make it a list
def helper(*args,  **kwargs):
    return list(isplit(*args, **kwargs))

# Normal delimiters
assert helper("1,2,3", ",") == ["1", "2", "3"]
assert helper("1;2;3,", ";") == ["1", "2", "3,"]
assert helper("1;2 ;3,  ", ";") == ["1", "2 ", "3,  "]

# Whitespace
assert helper("1 2 3") == ["1", "2", "3"]
assert helper("1\t2\t3") == ["1", "2", "3"]
assert helper("1\t2 \t3") == ["1", "2", "3"]
assert helper("1\n2\n3") == ["1", "2", "3"]

# Surrounding whitespace dropped
assert helper(" 1 2  3  ") == ["1", "2", "3"]

# Regex special characters
assert helper(r"1\2\3", "\\") == ["1", "2", "3"]
assert helper(r"1*2*3", "*") == ["1", "2", "3"]

# No multi-char delimiters allowed
try:
    helper(r"1,.2,.3", ",.")
    assert False
except ValueError:
    pass

python’s regex module says that it does “the right thing” for unicode whitespace, but I haven’t actually tested it.

Also available as a gist.


回答 8

如果您还希望能够读取迭代器(以及返回一个迭代器),请尝试以下操作:

import itertools as it

def iter_split(string, sep=None):
    sep = sep or ' '
    groups = it.groupby(string, lambda s: s != sep)
    return (''.join(g) for k, g in groups if k)

用法

>>> list(iter_split(iter("Good evening, world!")))
['Good', 'evening,', 'world!']

If you would also like to be able to read an iterator (as well as return one) try this:

import itertools as it

def iter_split(string, sep=None):
    sep = sep or ' '
    groups = it.groupby(string, lambda s: s != sep)
    return (''.join(g) for k, g in groups if k)

Usage

>>> list(iter_split(iter("Good evening, world!")))
['Good', 'evening,', 'world!']

回答 9

more_itertools.split_atstr.split为迭代器提供一个模拟。

>>> import more_itertools as mit


>>> list(mit.split_at("abcdcba", lambda x: x == "b"))
[['a'], ['c', 'd', 'c'], ['a']]

>>> "abcdcba".split("b")
['a', 'cdc', 'a']

more_itertools 是第三方软件包。

more_itertools.split_at offers an analog to str.split for iterators.

>>> import more_itertools as mit


>>> list(mit.split_at("abcdcba", lambda x: x == "b"))
[['a'], ['c', 'd', 'c'], ['a']]

>>> "abcdcba".split("b")
['a', 'cdc', 'a']

more_itertools is a third-party package.


回答 10

我想展示如何使用find_iter解决方案为给定的定界符返回生成器,然后使用itertools中的成对配方来构建先前的下一个迭代,该迭代将获得与原始split方法相同的实际单词。


from more_itertools import pairwise
import re

string = "dasdha hasud hasuid hsuia dhsuai dhasiu dhaui d"
delimiter = " "
# split according to the given delimiter including segments beginning at the beginning and ending at the end
for prev, curr in pairwise(re.finditer("^|[{0}]+|$".format(delimiter), string)):
    print(string[prev.end(): curr.start()])

注意:

  1. 我使用prev&curr而不是prev&next,因为在python中覆盖next是一个非常糟糕的主意
  2. 这很有效

I wanted to show how to use the find_iter solution to return a generator for given delimiters and then use the pairwise recipe from itertools to build a previous next iteration which will get the actual words as in the original split method.


from more_itertools import pairwise
import re

string = "dasdha hasud hasuid hsuia dhsuai dhasiu dhaui d"
delimiter = " "
# split according to the given delimiter including segments beginning at the beginning and ending at the end
for prev, curr in pairwise(re.finditer("^|[{0}]+|$".format(delimiter), string)):
    print(string[prev.end(): curr.start()])

note:

  1. I use prev & curr instead of prev & next because overriding next in python is a very bad idea
  2. This is quite efficient

回答 11

最笨的方法,不使用正则表达式/ itertools:

def isplit(text, split='\n'):
    while text != '':
        end = text.find(split)

        if end == -1:
            yield text
            text = ''
        else:
            yield text[:end]
            text = text[end + 1:]

Dumbest method, without regex / itertools:

def isplit(text, split='\n'):
    while text != '':
        end = text.find(split)

        if end == -1:
            yield text
            text = ''
        else:
            yield text[:end]
            text = text[end + 1:]

回答 12

def split_generator(f,s):
    """
    f is a string, s is the substring we split on.
    This produces a generator rather than a possibly
    memory intensive list. 
    """
    i=0
    j=0
    while j<len(f):
        if i>=len(f):
            yield f[j:]
            j=i
        elif f[i] != s:
            i=i+1
        else:
            yield [f[j:i]]
            j=i+1
            i=i+1
def split_generator(f,s):
    """
    f is a string, s is the substring we split on.
    This produces a generator rather than a possibly
    memory intensive list. 
    """
    i=0
    j=0
    while j<len(f):
        if i>=len(f):
            yield f[j:]
            j=i
        elif f[i] != s:
            i=i+1
        else:
            yield [f[j:i]]
            j=i+1
            i=i+1

回答 13

这是一个简单的回应

def gen_str(some_string, sep):
    j=0
    guard = len(some_string)-1
    for i,s in enumerate(some_string):
        if s == sep:
           yield some_string[j:i]
           j=i+1
        elif i!=guard:
           continue
        else:
           yield some_string[j:]

here is a simple response

def gen_str(some_string, sep):
    j=0
    guard = len(some_string)-1
    for i,s in enumerate(some_string):
        if s == sep:
           yield some_string[j:i]
           j=i+1
        elif i!=guard:
           continue
        else:
           yield some_string[j:]

在Python中处理字符串中的转义序列

问题:在Python中处理字符串中的转义序列

有时,当我从文件或用户那里得到输入时,我会得到一个带有转义序列的字符串。我想以与Python处理字符串文字中的转义序列相同的方式来处理转义序列

例如,假设myString定义为:

>>> myString = "spam\\neggs"
>>> print(myString)
spam\neggs

我想要一个process执行此操作的函数(我称之为):

>>> print(process(myString))
spam
eggs

该函数可以处理Python中的所有转义序列(在上面的链接的表格中列出),这一点很重要。

Python是否具有执行此操作的功能?

Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.

For example, let’s say myString is defined as:

>>> myString = "spam\\neggs"
>>> print(myString)
spam\neggs

I want a function (I’ll call it process) that does this:

>>> print(process(myString))
spam
eggs

It’s important that the function can process all of the escape sequences in Python (listed in a table in the link above).

Does Python have a function to do this?


回答 0

正确的做法是使用“字符串转义”代码对字符串进行解码。

>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3 
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs

不要使用AST或eval。使用字符串编解码器更加安全。

The correct thing to do is use the ‘string-escape’ code to decode the string.

>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3 
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs

Don’t use the AST or eval. Using the string codecs is much safer.


回答 1

unicode_escape 总的来说不起作用

事实证明,string_escapeor unicode_escape解决方案通常无法正常工作-尤其是在存在实际Unicode的情况下,它不能正常工作。

如果您可以确定每个非ASCII字符都会被转义(并且请记住,前128个字符以外的任何字符都是非ASCII),unicode_escape将为您做正确的事。但是,如果您的字符串中已经有任何文字上的非ASCII字符,则会出错。

unicode_escape从根本上来说是设计用来将字节转换为Unicode文本。但是在许多地方(例如Python源代码),源数据已经是Unicode文本。

唯一可以正常工作的方法是首先将文本编码为字节。UTF-8是所有文本的明智编码,因此应该可以使用,对吧?

以下示例是Python 3中的示例,因此字符串文字更清晰,但在Python 2和3上,存在相同的问题,但表现形式略有不同。

>>> s = 'naïve \\t test'
>>> print(s.encode('utf-8').decode('unicode_escape'))
naïve   test

好吧,那是错误的。

建议使用编解码器将文本解码为文本的新方法是codecs.decode直接调用。有帮助吗?

>>> import codecs
>>> print(codecs.decode(s, 'unicode_escape'))
naïve   test

一点也不。(此外,以上是Python 2上的UnicodeError。)

unicode_escape编解码器,尽管它的名字,原来假设所有非ASCII字节拉丁-1(ISO-8859-1)编码。因此,您必须这样做:

>>> print(s.encode('latin-1').decode('unicode_escape'))
naïve    test

但这太可怕了。这将您限制为256个Latin-1字符,就好像根本没有发明Unicode一样!

>>> print('Ernő \\t Rubik'.encode('latin-1').decode('unicode_escape'))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151'
in position 3: ordinal not in range(256)

添加正则表达式以解决问题

(令人惊讶的是,我们现在没有两个问题。)

我们需要做的只是将unicode_escape解码器应用于我们确定为ASCII文本的内容。特别是,我们可以确保仅将其应用于有效的Python转义序列,这些序列必须保证为ASCII文本。

计划是,我们将使用正则表达式查找转义序列,并使用函数作为参数以re.sub将其替换为未转义的值。

import re
import codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

然后:

>>> print(decode_escapes('Ernő \\t Rubik'))
Ernő     Rubik

unicode_escape doesn’t work in general

It turns out that the string_escape or unicode_escape solution does not work in general — particularly, it doesn’t work in the presence of actual Unicode.

If you can be sure that every non-ASCII character will be escaped (and remember, anything beyond the first 128 characters is non-ASCII), unicode_escape will do the right thing for you. But if there are any literal non-ASCII characters already in your string, things will go wrong.

unicode_escape is fundamentally designed to convert bytes into Unicode text. But in many places — for example, Python source code — the source data is already Unicode text.

The only way this can work correctly is if you encode the text into bytes first. UTF-8 is the sensible encoding for all text, so that should work, right?

The following examples are in Python 3, so that the string literals are cleaner, but the same problem exists with slightly different manifestations on both Python 2 and 3.

>>> s = 'naïve \\t test'
>>> print(s.encode('utf-8').decode('unicode_escape'))
naïve   test

Well, that’s wrong.

The new recommended way to use codecs that decode text into text is to call codecs.decode directly. Does that help?

>>> import codecs
>>> print(codecs.decode(s, 'unicode_escape'))
naïve   test

Not at all. (Also, the above is a UnicodeError on Python 2.)

The unicode_escape codec, despite its name, turns out to assume that all non-ASCII bytes are in the Latin-1 (ISO-8859-1) encoding. So you would have to do it like this:

>>> print(s.encode('latin-1').decode('unicode_escape'))
naïve    test

But that’s terrible. This limits you to the 256 Latin-1 characters, as if Unicode had never been invented at all!

>>> print('Ernő \\t Rubik'.encode('latin-1').decode('unicode_escape'))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151'
in position 3: ordinal not in range(256)

Adding a regular expression to solve the problem

(Surprisingly, we do not now have two problems.)

What we need to do is only apply the unicode_escape decoder to things that we are certain to be ASCII text. In particular, we can make sure only to apply it to valid Python escape sequences, which are guaranteed to be ASCII text.

The plan is, we’ll find escape sequences using a regular expression, and use a function as the argument to re.sub to replace them with their unescaped value.

import re
import codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

And with that:

>>> print(decode_escapes('Ernő \\t Rubik'))
Ernő     Rubik

回答 2

python 3的实际正确答案:

>>> import codecs
>>> myString = "spam\\neggs"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
spam
eggs
>>> myString = "naïve \\t test"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
naïve    test

有关的详细信息codecs.escape_decode

  • codecs.escape_decode 是一个逐字节解码器
  • codecs.escape_decode解码ascii转义序列,例如:b"\\n"-> b"\n"b"\\xce"-> b"\xce"
  • codecs.escape_decode 不需要或不需要了解字节对象的编码,但是转义字节的编码应与对象其余部分的编码匹配。

背景:

The actually correct and convenient answer for python 3:

>>> import codecs
>>> myString = "spam\\neggs"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
spam
eggs
>>> myString = "naïve \\t test"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
naïve    test

Details regarding codecs.escape_decode:

  • codecs.escape_decode is a bytes-to-bytes decoder
  • codecs.escape_decode decodes ascii escape sequences, such as: b"\\n" -> b"\n", b"\\xce" -> b"\xce".
  • codecs.escape_decode does not care or need to know about the byte object’s encoding, but the encoding of the escaped bytes should match the encoding of the rest of the object.

Background:

  • @rspeer is correct: unicode_escape is the incorrect solution for python3. This is because unicode_escape decodes escaped bytes, then decodes bytes to unicode string, but receives no information regarding which codec to use for the second operation.
  • @Jerub is correct: avoid the AST or eval.
  • I first discovered codecs.escape_decode from this answer to “how do I .decode(‘string-escape’) in Python3?”. As that answer states, that function is currently not documented for python 3.

回答 3

ast.literal_eval函数将关闭,但是它将期望该字符串先被正确引用。

当然反斜杠Python的解释依赖于字符串的方式引用(""VS r""VS u"",三引号等),所以你可能想包装在合适的报价的用户输入和传递给literal_eval。将其包装在引号中还可以防止literal_eval返回数字,元组,字典等。

如果用户键入您打算在字符串周围使用的引号引起来,事情可能仍然会变得棘手。

The ast.literal_eval function comes close, but it will expect the string to be properly quoted first.

Of course Python’s interpretation of backslash escapes depends on how the string is quoted ("" vs r"" vs u"", triple quotes, etc) so you may want to wrap the user input in suitable quotes and pass to literal_eval. Wrapping it in quotes will also prevent literal_eval from returning a number, tuple, dictionary, etc.

Things still might get tricky if the user types unquoted quotes of the type you intend to wrap around the string.


回答 4

这是一个不好的方法,但是当我尝试解释在字符串参数中传递的转义八进制时,它对我有用。

input_string = eval('b"' + sys.argv[1] + '"')

值得一提的是,eval和ast.literal_eval之间存在区别(eval更加不安全)。请参阅使用python的eval()与ast.literal_eval()吗?

This is a bad way of doing it, but it worked for me when trying to interpret escaped octals passed in a string argument.

input_string = eval('b"' + sys.argv[1] + '"')

It’s worth mentioning that there is a difference between eval and ast.literal_eval (eval being way more unsafe). See Using python’s eval() vs. ast.literal_eval()?


回答 5

下面的代码应该适用于\ n,要求将其显示在字符串上。

import string

our_str = 'The String is \\n, \\n and \\n!'
new_str = string.replace(our_str, '/\\n', '/\n', 1)
print(new_str)

Below code should work for \n is required to be displayed on the string.

import string

our_str = 'The String is \\n, \\n and \\n!'
new_str = string.replace(our_str, '/\\n', '/\n', 1)
print(new_str)

在Python 3中禁止/打印不带b’前缀的字节

问题:在Python 3中禁止/打印不带b’前缀的字节

只需发布此内容,以便稍后查找,因为它总是让我感到困惑:

$ python3.2
Python 3.2 (r32:88445, Oct 20 2012, 14:09:50) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import curses
>>> print(curses.version)
b'2.2'
>>> print(str(curses.version))
b'2.2'
>>> print(curses.version.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>> print(str(curses.version).encode('utf-8'))
b"b'2.2'"

问题:如何bytes在Python 3中打印不带b'前缀的二进制()字符串?

Just posting this so I can search for it later, as it always seems to stump me:

$ python3.2
Python 3.2 (r32:88445, Oct 20 2012, 14:09:50) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import curses
>>> print(curses.version)
b'2.2'
>>> print(str(curses.version))
b'2.2'
>>> print(curses.version.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>> print(str(curses.version).encode('utf-8'))
b"b'2.2'"

As question: how to print a binary (bytes) string in Python 3, without the b' prefix?


回答 0

用途decode

print(curses.version.decode())
# 2.2

Use decode:

print(curses.version.decode())
# 2.2

回答 1

如果字节已经使用适当的字符编码;您可以直接打印它们:

sys.stdout.buffer.write(data)

要么

nwritten = os.write(sys.stdout.fileno(), data)  # NOTE: it may write less than len(data) bytes

If the bytes use an appropriate character encoding already; you could print them directly:

sys.stdout.buffer.write(data)

or

nwritten = os.write(sys.stdout.fileno(), data)  # NOTE: it may write less than len(data) bytes

回答 2

如果我们看一下的源代码bytes.__repr__,看起来就像b''是将烘焙到方法中一样。

最明显的解决方法是b''从结果中手动切片repr()

>>> x = b'\x01\x02\x03\x04'

>>> print(repr(x))
b'\x01\x02\x03\x04'

>>> print(repr(x)[2:-1])
\x01\x02\x03\x04

If we take a look at the source for bytes.__repr__, it looks as if the b'' is baked into the method.

The most obvious workaround is to manually slice off the b'' from the resulting repr():

>>> x = b'\x01\x02\x03\x04'

>>> print(repr(x))
b'\x01\x02\x03\x04'

>>> print(repr(x)[2:-1])
\x01\x02\x03\x04

回答 3

如果数据采用UTF-8兼容格式,则可以将字节转换为字符串。

>>> import curses
>>> print(str(curses.version, "utf-8"))
2.2

如果数据尚不兼容UTF-8,则可以选择先转换为十六进制。例如,当数据是实际的原始字节时。

from binascii import hexlify
from codecs import encode  # alternative
>>> print(hexlify(b"\x13\x37"))
b'1337'
>>> print(str(hexlify(b"\x13\x37"), "utf-8"))
1337
>>>> print(str(encode(b"\x13\x37", "hex"), "utf-8"))
1337

If the data is in an UTF-8 compatible format, you can convert the bytes to a string.

>>> import curses
>>> print(str(curses.version, "utf-8"))
2.2

Optionally convert to hex first, if the data is not already UTF-8 compatible. E.g. when the data are actual raw bytes.

from binascii import hexlify
from codecs import encode  # alternative
>>> print(hexlify(b"\x13\x37"))
b'1337'
>>> print(str(hexlify(b"\x13\x37"), "utf-8"))
1337
>>>> print(str(encode(b"\x13\x37", "hex"), "utf-8"))
1337

将Python字符串格式化与列表一起使用

问题:将Python字符串格式化与列表一起使用

s在Python 2.6.5中构造了一个字符串,该字符串将具有不同数量的%s令牌,这些令牌与list中的条目数匹配x。我需要写出格式化的字符串。以下内容不起作用,但表示我要执行的操作。在此示例中,有三个%s标记,并且列表具有三个条目。

s = '%s BLAH %s FOO %s BAR'
x = ['1', '2', '3']
print s % (x)

我希望输出字符串为:

1 BLAH 2 FOO 3 BAR

I construct a string s in Python 2.6.5 which will have a varying number of %s tokens, which match the number of entries in list x. I need to write out a formatted string. The following doesn’t work, but indicates what I’m trying to do. In this example, there are three %s tokens and the list has three entries.

s = '%s BLAH %s FOO %s BAR'
x = ['1', '2', '3']
print s % (x)

I’d like the output string to be:

1 BLAH 2 FOO 3 BAR


回答 0

print s % tuple(x)

代替

print s % (x)
print s % tuple(x)

instead of

print s % (x)

回答 1

您应该看一下python 的format方法。然后,您可以像这样定义格式字符串:

>>> s = '{0} BLAH BLAH {1} BLAH {2} BLAH BLIH BLEH'
>>> x = ['1', '2', '3']
>>> print s.format(*x)
'1 BLAH BLAH 2 BLAH 3 BLAH BLIH BLEH'

You should take a look to the format method of python. You could then define your formatting string like this :

>>> s = '{0} BLAH BLAH {1} BLAH {2} BLAH BLIH BLEH'
>>> x = ['1', '2', '3']
>>> print s.format(*x)
'1 BLAH BLAH 2 BLAH 3 BLAH BLIH BLEH'

回答 2

在此资源页面之后,如果x的长度变化,我们可以使用:

', '.join(['%.2f']*len(x))

为列表中的每个元素创建一个占位符x。这是示例:

x = [1/3.0, 1/6.0, 0.678]
s = ("elements in the list are ["+', '.join(['%.2f']*len(x))+"]") % tuple(x)
print s
>>> elements in the list are [0.33, 0.17, 0.68]

Following this resource page, if the length of x is varying, we can use:

', '.join(['%.2f']*len(x))

to create a place holder for each element from the list x. Here is the example:

x = [1/3.0, 1/6.0, 0.678]
s = ("elements in the list are ["+', '.join(['%.2f']*len(x))+"]") % tuple(x)
print s
>>> elements in the list are [0.33, 0.17, 0.68]

回答 3

这是一行代码。一个临时的答案,使用带有print()的format来迭代列表。

怎么样(Python 3.x):

sample_list = ['cat', 'dog', 'bunny', 'pig']
print("Your list of animals are: {}, {}, {} and {}".format(*sample_list))

在此处阅读有关使用format()的文档。

Here is a one liner. A little improvised answer using format with print() to iterate a list.

How about this (python 3.x):

sample_list = ['cat', 'dog', 'bunny', 'pig']
print("Your list of animals are: {}, {}, {} and {}".format(*sample_list))

Read the docs here on using format().


回答 4

由于我刚刚学到了这个很酷的东西(从格式字符串中索引到列表中),所以我添加了这个老问题。

s = '{x[0]} BLAH {x[1]} FOO {x[2]} BAR'
x = ['1', '2', '3']
print (s.format (x=x))

输出:

1 BLAH 2 FOO 3 BAR

但是,我仍然没有弄清楚如何进行切片(在格式字符串'"{x[2:4]}".format...中),并且很想弄清楚是否有人有想法,但是我怀疑您根本无法做到这一点。

Since I just learned about this cool thing(indexing into lists from within a format string) I’m adding to this old question.

s = '{x[0]} BLAH {x[1]} FOO {x[2]} BAR'
x = ['1', '2', '3']
print (s.format (x=x))

Output:

1 BLAH 2 FOO 3 BAR

However, I still haven’t figured out how to do slicing(inside of the format string '"{x[2:4]}".format...,) and would love to figure it out if anyone has an idea, however I suspect that you simply cannot do that.


回答 5

这是一个有趣的问题!处理可变长度列表的另一种方法是构建一个充分利用该.format方法和列表拆包的功能。在下面的示例中,我不使用任何特殊的格式,但是可以轻松地对其进行更改以满足您的需求。

list_1 = [1,2,3,4,5,6]
list_2 = [1,2,3,4,5,6,7,8]

# Create a function that can apply formatting to lists of any length:
def ListToFormattedString(alist):
    # Create a format spec for each item in the input `alist`.
    # E.g., each item will be right-adjusted, field width=3.
    format_list = ['{:>3}' for item in alist] 

    # Now join the format specs into a single string:
    # E.g., '{:>3}, {:>3}, {:>3}' if the input list has 3 items.
    s = ','.join(format_list)

    # Now unpack the input list `alist` into the format string. Done!
    return s.format(*alist)

# Example output:
>>>ListToFormattedString(list_1)
'  1,  2,  3,  4,  5,  6'
>>>ListToFormattedString(list_2)
'  1,  2,  3,  4,  5,  6,  7,  8'

This was a fun question! Another way to handle this for variable length lists is to build a function that takes full advantage of the .format method and list unpacking. In the following example I don’t use any fancy formatting, but that can easily be changed to suit your needs.

list_1 = [1,2,3,4,5,6]
list_2 = [1,2,3,4,5,6,7,8]

# Create a function that can apply formatting to lists of any length:
def ListToFormattedString(alist):
    # Create a format spec for each item in the input `alist`.
    # E.g., each item will be right-adjusted, field width=3.
    format_list = ['{:>3}' for item in alist] 

    # Now join the format specs into a single string:
    # E.g., '{:>3}, {:>3}, {:>3}' if the input list has 3 items.
    s = ','.join(format_list)

    # Now unpack the input list `alist` into the format string. Done!
    return s.format(*alist)

# Example output:
>>>ListToFormattedString(list_1)
'  1,  2,  3,  4,  5,  6'
>>>ListToFormattedString(list_2)
'  1,  2,  3,  4,  5,  6,  7,  8'

回答 6

与@neobot的答案相同,但更加现代和简洁。

>>> l = range(5)
>>> " & ".join(["{}"]*len(l)).format(*l)
'0 & 1 & 2 & 3 & 4'

The same as @neobot’s answer but a little more modern and succinct.

>>> l = range(5)
>>> " & ".join(["{}"]*len(l)).format(*l)
'0 & 1 & 2 & 3 & 4'

回答 7

x = ['1', '2', '3']
s = f"{x[0]} BLAH {x[1]} FOO {x[2]} BAR"
print(s)

输出是

1 BLAH 2 FOO 3 BAR
x = ['1', '2', '3']
s = f"{x[0]} BLAH {x[1]} FOO {x[2]} BAR"
print(s)

The output is

1 BLAH 2 FOO 3 BAR

格式化字符串时多次插入相同的值

问题:格式化字符串时多次插入相同的值

我有这种形式的字符串

s='arbit'
string='%s hello world %s hello world %s' %(s,s,s)

字符串中的所有%s都具有相同的值(即s)。有没有更好的书写方式?(而不是列出三遍)

I have a string of this form

s='arbit'
string='%s hello world %s hello world %s' %(s,s,s)

All the %s in string have the same value (i.e. s). Is there a better way of writing this? (Rather than listing out s three times)


回答 0

您可以使用Python 2.6和Python 3.x中提供的高级字符串格式

incoming = 'arbit'
result = '{0} hello world {0} hello world {0}'.format(incoming)

You can use advanced string formatting, available in Python 2.6 and Python 3.x:

incoming = 'arbit'
result = '{0} hello world {0} hello world {0}'.format(incoming)

回答 1

incoming = 'arbit'
result = '%(s)s hello world %(s)s hello world %(s)s' % {'s': incoming}

您可能需要阅读以下内容以了解:String Formatting Operations

incoming = 'arbit'
result = '%(s)s hello world %(s)s hello world %(s)s' % {'s': incoming}

You may like to have a read of this to get an understanding: String Formatting Operations.


回答 2

您可以使用格式的字典类型:

s='arbit'
string='%(key)s hello world %(key)s hello world %(key)s' % {'key': s,}

You can use the dictionary type of formatting:

s='arbit'
string='%(key)s hello world %(key)s hello world %(key)s' % {'key': s,}

回答 3

取决于您的意思更好。如果您的目标是消除冗余,则此方法有效。

s='foo'
string='%s bar baz %s bar baz %s bar baz' % (3*(s,))

Depends on what you mean by better. This works if your goal is removal of redundancy.

s='foo'
string='%s bar baz %s bar baz %s bar baz' % (3*(s,))

回答 4

>>> s1 ='arbit'
>>> s2 = 'hello world '.join( [s]*3 )
>>> print s2
arbit hello world arbit hello world arbit
>>> s1 ='arbit'
>>> s2 = 'hello world '.join( [s]*3 )
>>> print s2
arbit hello world arbit hello world arbit

回答 5

弦线

如果您正在使用Python 3.6+,则可以使用新的,f-strings它代表格式化的字符串,可以通过f在字符串的开头添加字符以将其标识为f字符串来使用它

price = 123
name = "Jerry"
print(f"{name}!!, {price} is much, isn't {price} a lot? {name}!")
>Jerry!!, 123 is much, isn't 123 a lot? Jerry!

使用f字符串的主要好处是它们更具可读性,可以更快,并且具有更好的性能:

每个人的熊猫资源:Python数据分析,作者Daniel Y. Chen

基准测试

毫无疑问,新f-strings方法更具可读性,因为您不必重新映射字符串,但是如前所述,它是否更快?

price = 123
name = "Jerry"

def new():
    x = f"{name}!!, {price} is much, isn't {price} a lot? {name}!"


def old():
    x = "{1}!!, {0} is much, isn't {0} a lot? {1}!".format(price, name)

import timeit
print(timeit.timeit('new()', setup='from __main__ import new', number=10**7))
print(timeit.timeit('old()', setup='from __main__ import old', number=10**7))
> 3.8741058271543776  #new
> 5.861819514350163   #old

运行1000万次测试似乎新f-strings的映射速度实际上更快。

Fstrings

If you are using Python 3.6+ you can make use of the new so called f-strings which stands for formatted strings and it can be used by adding the character f at the beginning of a string to identify this as an f-string.

price = 123
name = "Jerry"
print(f"{name}!!, {price} is much, isn't {price} a lot? {name}!")
>Jerry!!, 123 is much, isn't 123 a lot? Jerry!

The main benefits of using f-strings is that they are more readable, can be faster, and offer better performance:

Source Pandas for Everyone: Python Data Analysis, By Daniel Y. Chen

Benchmarks

No doubt that the new f-strings are more readable, as you don’t have to remap the strings, but is it faster though as stated in the aformentioned quote?

price = 123
name = "Jerry"

def new():
    x = f"{name}!!, {price} is much, isn't {price} a lot? {name}!"


def old():
    x = "{1}!!, {0} is much, isn't {0} a lot? {1}!".format(price, name)

import timeit
print(timeit.timeit('new()', setup='from __main__ import new', number=10**7))
print(timeit.timeit('old()', setup='from __main__ import old', number=10**7))
> 3.8741058271543776  #new
> 5.861819514350163   #old

Running 10 Million test’s it seems that the new f-strings are actually faster in mapping.


Python将元组转换为字符串

问题:Python将元组转换为字符串

我有一个这样的字符元组:

('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e')

我如何将其转换为字符串,使其类似于:

'abcdgxre'

I have a tuple of characters like such:

('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e')

How do I convert it to a string so that it is like:

'abcdgxre'

回答 0

用途str.join

>>> tup = ('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e')
>>> ''.join(tup)
'abcdgxre'
>>>
>>> help(str.join)
Help on method_descriptor:

join(...)
    S.join(iterable) -> str

    Return a string which is the concatenation of the strings in the
    iterable.  The separator between elements is S.

>>>

Use str.join:

>>> tup = ('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e')
>>> ''.join(tup)
'abcdgxre'
>>>
>>> help(str.join)
Help on method_descriptor:

join(...)
    S.join(iterable) -> str

    Return a string which is the concatenation of the strings in the
    iterable.  The separator between elements is S.

>>>

回答 1

这是使用联接的一种简单方法。

''.join(('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e'))

here is an easy way to use join.

''.join(('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e'))

回答 2

这有效:

''.join(('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e'))

它将生成:

'abcdgxre'

您还可以使用定界符(例如逗号)来生成:

'a,b,c,d,g,x,r,e'

通过使用:

','.join(('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e'))

This works:

''.join(('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e'))

It will produce:

'abcdgxre'

You can also use a delimiter like a comma to produce:

'a,b,c,d,g,x,r,e'

By using:

','.join(('a', 'b', 'c', 'd', 'g', 'x', 'r', 'e'))

回答 3

最简单的方法是像这样使用join:

>>> myTuple = ['h','e','l','l','o']
>>> ''.join(myTuple)
'hello'

之所以有效,是因为您的定界符实际上什么都没有,甚至没有空格:”。

Easiest way would be to use join like this:

>>> myTuple = ['h','e','l','l','o']
>>> ''.join(myTuple)
'hello'

This works because your delimiter is essentially nothing, not even a blank space: ”.