Python 实用宝典

Question 1

I have this code which finds the largest index of a specific character in a string, however I would like it to raise a ValueError when the specified character does not occur in a string.

So something like this:

contains('bababa', 'k')

would result in a:

→ ValueError: could not find k in bababa

How can I do this?

Here’s the current code for my function:

def contains(string,char):
  list = []

  for i in range(0,len(string)):
      if string[i] == char:
           list = list + [i]

  return list[-1]

Question 2

raise ValueError('could not find %c in %s' % (ch,str))

Question 3

Here’s a revised version of your code which still works plus it illustrates how to raise a ValueError the way you want. By-the-way, I think find_last(), find_last_index(), or something simlar would be a more descriptive name for this function. Adding to the possible confusion is the fact that Python already has a container object method named __contains__() that does something a little different, membership-testing-wise.

def contains(char_string, char):
    largest_index = -1
    for i, ch in enumerate(char_string):
        if ch == char:
            largest_index = i
    if largest_index > -1:  # any found?
        return largest_index  # return index of last one
    else:
        raise ValueError('could not find {!r} in {!r}'.format(char, char_string))

print(contains('mississippi', 's'))  # -> 6
print(contains('bababa', 'k'))  # ->

Traceback (most recent call last):
  File "how-to-raise-a-valueerror.py", line 15, in <module>
    print(contains('bababa', 'k'))
  File "how-to-raise-a-valueerror.py", line 12, in contains
    raise ValueError('could not find {} in {}'.format(char, char_string))
ValueError: could not find 'k' in 'bababa'

Update — A substantially simpler way

Wow! Here’s a much more concise version—essentially a one-liner—that is also likely faster because it reverses (via [::-1]) the string before doing a forward search through it for the first matching character and it does so using the fast built-in string index() method. With respect to your actual question, a nice little bonus convenience that comes with using index() is that it already raises a ValueError when the character substring isn’t found, so nothing additional is required to make that happen.

Here it is along with a quick unit test:

def contains(char_string, char):
    #  Ending - 1 adjusts returned index to account for searching in reverse.
    return len(char_string) - char_string[::-1].index(char) - 1

print(contains('mississippi', 's'))  # -> 6
print(contains('bababa', 'k'))  # ->

Traceback (most recent call last):
  File "better-way-to-raise-a-valueerror.py", line 9, in <module>
    print(contains('bababa', 'k'))
  File "better-way-to-raise-a-valueerror", line 6, in contains
    return len(char_string) - char_string[::-1].index(char) - 1
ValueError: substring not found

Question 4

>>> def contains(string, char):
...     for i in xrange(len(string) - 1, -1, -1):
...         if string[i] == char:
...             return i
...     raise ValueError("could not find %r in %r" % (char, string))
...
>>> contains('bababa', 'k')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in contains
ValueError: could not find 'k' in 'bababa'
>>> contains('bababa', 'a')
5
>>> contains('bababa', 'b')
4
>>> contains('xbababa', 'x')
0
>>>

Question 5

>>> response='bababa'
...  if "K" in response.text:
...     raise ValueError("Not found")

Question 6

I do this in Python 2:

"(%d goals, $%d)" % (self.goals, self.penalties)

What is the Python 3 version of this?

I tried searching for examples online but I kept getting Python 2 versions.

Question 7

Here are the docs about the “new” format syntax. An example would be:

"({:d} goals, ${:d})".format(self.goals, self.penalties)

If both goals and penalties are integers (i.e. their default format is ok), it could be shortened to:

"({} goals, ${})".format(self.goals, self.penalties)

And since the parameters are fields of self, there’s also a way of doing it using a single argument twice (as @Burhan Khalid noted in the comments):

"({0.goals} goals, ${0.penalties})".format(self)

Explaining:

{} means just the next positional argument, with default format;
{0} means the argument with index 0, with default format;
{:d} is the next positional argument, with decimal integer format;
{0:d} is the argument with index 0, with decimal integer format.

There are many others things you can do when selecting an argument (using named arguments instead of positional ones, accessing fields, etc) and many format options as well (padding the number, using thousands separators, showing sign or not, etc). Some other examples:

"({goals} goals, ${penalties})".format(goals=2, penalties=4)
"({goals} goals, ${penalties})".format(**self.__dict__)

"first goal: {0.goal_list[0]}".format(self)
"second goal: {.goal_list[1]}".format(self)

"conversion rate: {:.2f}".format(self.goals / self.shots) # '0.20'
"conversion rate: {:.2%}".format(self.goals / self.shots) # '20.45%'
"conversion rate: {:.0%}".format(self.goals / self.shots) # '20%'

"self: {!s}".format(self) # 'Player: Bob'
"self: {!r}".format(self) # '<__main__.Player instance at 0x00BF7260>'

"games: {:>3}".format(player1.games)  # 'games: 123'
"games: {:>3}".format(player2.games)  # 'games:   4'
"games: {:0>3}".format(player2.games) # 'games: 004'

Note: As others pointed out, the new format does not supersede the former, both are available both in Python 3 and the newer versions of Python 2 as well. Some may say it’s a matter of preference, but IMHO the newer is much more expressive than the older, and should be used whenever writing new code (unless it’s targeting older environments, of course).

Question 8

Python 3.6 now supports shorthand literal string interpolation with PEP 498. For your use case, the new syntax is simply:

f"({self.goals} goals, ${self.penalties})"

This is similar to the previous .format standard, but lets one easily do things like:

>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal('12.34567')
>>> f'result: {value:{width}.{precision}}'
'result:      12.35'

Question 9

That line works as-is in Python 3.

>>> sys.version
'3.2 (r32:88445, Oct 20 2012, 14:09:29) \n[GCC 4.5.2]'
>>> "(%d goals, $%d)" % (self.goals, self.penalties)
'(1 goals, $2)'

Question 10

I like this approach

my_hash = {}
my_hash["goals"] = 3 #to show number
my_hash["penalties"] = "5" #to show string
print("I scored %(goals)d goals and took %(penalties)s penalties" % my_hash)

Note the appended d and s to the brackets respectively.

output will be:

I scored 3 goals and took 5 penalties

Question 11

How can I add a trailing slash (/ for *nix, \ for win32) to a directory string, if the tailing slash is not already there? Thanks!

Question 12

os.path.join(path, '') will add the trailing slash if it’s not already there.

You can do os.path.join(path, '', '') or os.path.join(path_with_a_trailing_slash, '') and you will still only get one trailing slash.

Question 13

Since you want to connect a directory and a filename, use

os.path.join(directory, filename)

If you want to get rid of .\..\..\blah\ paths, use

os.path.join(os.path.normpath(directory), filename)

Question 14

You can do it manually by:

path = ...

import os
if not path.endswith(os.path.sep):
    path += os.path.sep

However, it is usually much cleaner to use os.path.join.

Question 15

You could use something like this:

os.path.normcase(path)
    Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes.

Else you could look for something else on this page

Question 16

I am trying to create a nice column list in python for use with commandline admin tools which I create.

Basicly, I want a list like:

[['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

To turn into:

a            b            c
aaaaaaaaaa   b            c
a            bbbbbbbbbb   c

Using plain tabs wont do the trick here because I don’t know the longest data in each row.

This is the same behavior as ‘column -t’ in Linux..

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c"
a b c
aaaaaaaaaa b c
a bbbbbbbbbb c

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c" | column -t
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c

I have looked around for various python libraries to do this but can’t find anything useful.

Question 17

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

col_width = max(len(word) for row in data for word in row) + 2  # padding
for row in data:
    print "".join(word.ljust(col_width) for word in row)

a            b            c            
aaaaaaaaaa   b            c            
a            bbbbbbbbbb   c

What this does is calculate the longest data entry to determine the column width, then use .ljust() to add the necessary padding when printing out each column.

Question 18

Since Python 2.6+, you can use a format string in the following way to set the columns to a minimum of 20 characters and align text to right.

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]
for row in table_data:
    print("{: >20} {: >20} {: >20}".format(*row))

Output:

               a                    b                    c
      aaaaaaaaaa                    b                    c
               a           bbbbbbbbbb                    c

Question 19

I came here with the same requirements but @lvc and @Preet’s answers seems more inline with what column -t produces in that columns have different widths:

>>> rows =  [   ['a',           'b',            'c',    'd']
...         ,   ['aaaaaaaaaa',  'b',            'c',    'd']
...         ,   ['a',           'bbbbbbbbbb',   'c',    'd']
...         ]
...

>>> widths = [max(map(len, col)) for col in zip(*rows)]
>>> for row in rows:
...     print "  ".join((val.ljust(width) for val, width in zip(row, widths)))
...
a           b           c  d
aaaaaaaaaa  b           c  d
a           bbbbbbbbbb  c  d

Question 20

This is a little late to the party, and a shameless plug for a package I wrote, but you can also check out the Columnar package.

It takes a list of lists of input and a list of headers and outputs a table-formatted string. This snippet creates a docker-esque table:

from columnar import columnar

headers = ['name', 'id', 'host', 'notes']

data = [
    ['busybox', 'c3c37d5d-38d2-409f-8d02-600fd9d51239', 'linuxnode-1-292735', 'Test server.'],
    ['alpine-python', '6bb77855-0fda-45a9-b553-e19e1a795f1e', 'linuxnode-2-249253', 'The one that runs python.'],
    ['redis', 'afb648ba-ac97-4fb2-8953-9a5b5f39663e', 'linuxnode-3-3416918', 'For queues and stuff.'],
    ['app-server', 'b866cd0f-bf80-40c7-84e3-c40891ec68f9', 'linuxnode-4-295918', 'A popular destination.'],
    ['nginx', '76fea0f0-aa53-4911-b7e4-fae28c2e469b', 'linuxnode-5-292735', 'Traffic Cop'],
]

table = columnar(data, headers, no_borders=True)
print(table)

Table Displaying No-border Style

Or you can get a little fancier with colors and borders. Table Displaying Spring Classics

To read more about the column-sizing algorithm and see the rest of the API you can check out the link above or see the Columnar GitHub Repo

Question 21

You have to do this with 2 passes:

get the maximum width of each column.
formatting the columns using our knowledge of max width from the first pass using str.ljust() and str.rjust()

Question 22

Transposing the columns like that is a job for zip:

>>> a = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
>>> list(zip(*a))
[('a', 'aaaaaaaaaa', 'a'), ('b', 'b', 'bbbbbbbbbb'), ('c', 'c', 'c')]

To find the required length of each column, you can use max:

>>> trans_a = zip(*a)
>>> [max(len(c) for c in b) for b in trans_a]
[10, 10, 1]

Which you can use, with suitable padding, to construct strings to pass to print:

>>> col_lenghts = [max(len(c) for c in b) for b in trans_a]
>>> padding = ' ' # You might want more
>>> padding.join(s.ljust(l) for s,l in zip(a[0], col_lenghts))
'a          b          c'

Question 23

To get fancier tables like

---------------------------------------------------
| First Name | Last Name        | Age | Position  |
---------------------------------------------------
| John       | Smith            | 24  | Software  |
|            |                  |     | Engineer  |
---------------------------------------------------
| Mary       | Brohowski        | 23  | Sales     |
|            |                  |     | Manager   |
---------------------------------------------------
| Aristidis  | Papageorgopoulos | 28  | Senior    |
|            |                  |     | Reseacher |
---------------------------------------------------

you can use this Python recipe:

'''
From http://code.activestate.com/recipes/267662-table-indentation/
PSF License
'''
import cStringIO,operator

def indent(rows, hasHeader=False, headerChar='-', delim=' | ', justify='left',
           separateRows=False, prefix='', postfix='', wrapfunc=lambda x:x):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column. 
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def rowWrapper(row):
        newRows = [wrapfunc(item).split('\n') for item in row]
        return [[substr or '' for substr in item] for item in map(None,*newRows)]
    # break each logical row into one or more physical ones
    logicalRows = [rowWrapper(row) for row in rows]
    # columns of physical rows
    columns = map(None,*reduce(operator.add,logicalRows))
    # get the maximum of each column by the string length of its items
    maxWidths = [max([len(str(item)) for item in column]) for column in columns]
    rowSeparator = headerChar * (len(prefix) + len(postfix) + sum(maxWidths) + \
                                 len(delim)*(len(maxWidths)-1))
    # select the appropriate justify method
    justify = {'center':str.center, 'right':str.rjust, 'left':str.ljust}[justify.lower()]
    output=cStringIO.StringIO()
    if separateRows: print >> output, rowSeparator
    for physicalRows in logicalRows:
        for row in physicalRows:
            print >> output, \
                prefix \
                + delim.join([justify(str(item),width) for (item,width) in zip(row,maxWidths)]) \
                + postfix
        if separateRows or hasHeader: print >> output, rowSeparator; hasHeader=False
    return output.getvalue()

# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return reduce(lambda line, word, width=width: '%s%s%s' %
                  (line,
                   ' \n'[(len(line[line.rfind('\n')+1:])
                         + len(word.split('\n',1)[0]
                              ) >= width)],
                   word),
                  text.split(' ')
                 )

import re
def wrap_onspace_strict(text, width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    wordRegex = re.compile(r'\S{'+str(width)+r',}')
    return wrap_onspace(wordRegex.sub(lambda m: wrap_always(m.group(),width),text),width)

import math
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return '\n'.join([ text[width*i:width*(i+1)] \
                       for i in xrange(int(math.ceil(1.*len(text)/width))) ])

if __name__ == '__main__':
    labels = ('First Name', 'Last Name', 'Age', 'Position')
    data = \
    '''John,Smith,24,Software Engineer
       Mary,Brohowski,23,Sales Manager
       Aristidis,Papageorgopoulos,28,Senior Reseacher'''
    rows = [row.strip().split(',')  for row in data.splitlines()]

    print 'Without wrapping function\n'
    print indent([labels]+rows, hasHeader=True)
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always,wrap_onspace,wrap_onspace_strict):
        print 'Wrapping function: %s(x,width=%d)\n' % (wrapper.__name__,width)
        print indent([labels]+rows, hasHeader=True, separateRows=True,
                     prefix='| ', postfix=' |',
                     wrapfunc=lambda x: wrapper(x,width))

    # output:
    #
    #Without wrapping function
    #
    #First Name | Last Name        | Age | Position         
    #-------------------------------------------------------
    #John       | Smith            | 24  | Software Engineer
    #Mary       | Brohowski        | 23  | Sales Manager    
    #Aristidis  | Papageorgopoulos | 28  | Senior Reseacher 
    #
    #Wrapping function: wrap_always(x,width=10)
    #
    #----------------------------------------------
    #| First Name | Last Name  | Age | Position   |
    #----------------------------------------------
    #| John       | Smith      | 24  | Software E |
    #|            |            |     | ngineer    |
    #----------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales Mana |
    #|            |            |     | ger        |
    #----------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior Res |
    #|            | poulos     |     | eacher     |
    #----------------------------------------------
    #
    #Wrapping function: wrap_onspace(x,width=10)
    #
    #---------------------------------------------------
    #| First Name | Last Name        | Age | Position  |
    #---------------------------------------------------
    #| John       | Smith            | 24  | Software  |
    #|            |                  |     | Engineer  |
    #---------------------------------------------------
    #| Mary       | Brohowski        | 23  | Sales     |
    #|            |                  |     | Manager   |
    #---------------------------------------------------
    #| Aristidis  | Papageorgopoulos | 28  | Senior    |
    #|            |                  |     | Reseacher |
    #---------------------------------------------------
    #
    #Wrapping function: wrap_onspace_strict(x,width=10)
    #
    #---------------------------------------------
    #| First Name | Last Name  | Age | Position  |
    #---------------------------------------------
    #| John       | Smith      | 24  | Software  |
    #|            |            |     | Engineer  |
    #---------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales     |
    #|            |            |     | Manager   |
    #---------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior    |
    #|            | poulos     |     | Reseacher |
    #---------------------------------------------

The Python recipe page contains a few improvements on it.

Question 24

pandas based solution with creating dataframe:

import pandas as pd
l = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
df = pd.DataFrame(l)

print(df)
            0           1  2
0           a           b  c
1  aaaaaaaaaa           b  c
2           a  bbbbbbbbbb  c

To remove index and header values to create output what you want you could use to_string method:

result = df.to_string(index=False, header=False)

print(result)
          a           b  c
 aaaaaaaaaa           b  c
          a  bbbbbbbbbb  c

Question 25

Scolp is a new library that lets you pretty print streaming columnar data easily while auto-adjusting column width.

(Disclaimer: I am the author)

Question 26

This sets independent, best-fit column widths based on the max-metric used in other answers.

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
padding = 2
col_widths = [max(len(w) for w in [r[cn] for r in data]) + padding for cn in range(len(data[0]))]
format_string = "{{:{}}}{{:{}}}{{:{}}}".format(*col_widths)
for row in data:
    print(format_string.format(*row))

Question 27

For lazy people

that are using Python 3.* and Pandas/Geopandas; universal simple in-class approach (for ‘normal’ script just remove self):

Function colorize:

    def colorize(self,s,color):
        s = color+str(s)+"\033[0m"
        return s

Header:

print('{0:<23} {1:>24} {2:>26} {3:>26} {4:>11} {5:>11}'.format('Road name','Classification','Function','Form of road','Length','Distance') )

and then data from Pandas/Geopandas dataframe:

            for index, row in clipped.iterrows():
                rdName      = self.colorize(row['name1'],"\033[32m")
                rdClass     = self.colorize(row['roadClassification'],"\033[93m")
                rdFunction  = self.colorize(row['roadFunction'],"\033[33m")
                rdForm      = self.colorize(row['formOfWay'],"\033[94m")
                rdLength    = self.colorize(row['length'],"\033[97m")
                rdDistance  = self.colorize(row['distance'],"\033[96m")
                print('{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}'.format(rdName,rdClass,rdFunction,rdForm,rdLength,rdDistance) )

Meaning of {0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}:

0, 1, 2, 3, 4, 5 -> columns, there are 6 in total in this case

30, 35, 20 -> width of column (note that you’ll have to add length of \033[96m – this for Python is a string as well), just experiment :)

>, < -> justify: right, left (there is = for filling with zeros as well)

If you want to distinct e.g. max value, you’ll have to switch to special Pandas style function, but suppose that’s far enough to present data on terminal window.

Result:

Question 28

A slight variation on a previous answer (I don’t have enough rep to comment on it). The format library lets you specify the width and alignment of an element but not where it starts, ie, you can say “be 20 columns wide” but not “start in column 20”. Which leads to this issue:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print("first row: {: >20} {: >20} {: >20}".format(*table_data[0]))
print("second row: {: >20} {: >20} {: >20}".format(*table_data[1]))
print("third row: {: >20} {: >20} {: >20}".format(*table_data[2]))

Output

first row:                    a                    b                    c
second row:           aaaaaaaaaa                    b                    c
third row:                    a           bbbbbbbbbb                    c

The answer of course is to format the literal strings as well, which combines slightly weirdly with the format:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print(f"{'first row:': <20} {table_data[0][0]: >20} {table_data[0][1]: >20} {table_data[0][2]: >20}")
print("{: <20} {: >20} {: >20} {: >20}".format(*['second row:', *table_data[1]]))
print("{: <20} {: >20} {: >20} {: >20}".format(*['third row:', *table_data[1]]))

Output

first row:                              a                    b                    c
second row:                    aaaaaaaaaa                    b                    c
third row:                     aaaaaaaaaa                    b                    c

Question 29

I found this answer super-helpful and elegant, originally from here:

matrix = [["A", "B"], ["C", "D"]]

print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in matrix]))

Output

A   B
C   D

Question 30

Here is a variation of the Shawn Chin’s answer. The width is fixed per column, not over all columns. There is also a border below the first row and between the columns. (icontract library is used to enforce the contracts.)

@icontract.pre(
    lambda table: not table or all(len(row) == len(table[0]) for row in table))
@icontract.post(lambda table, result: result == "" if not table else True)
@icontract.post(lambda result: not result.endswith("\n"))
def format_table(table: List[List[str]]) -> str:
    """
    Format the table as equal-spaced columns.

    :param table: rows of cells
    :return: table as string
    """
    cols = len(table[0])

    col_widths = [max(len(row[i]) for row in table) for i in range(cols)]

    lines = []  # type: List[str]
    for i, row in enumerate(table):
        parts = []  # type: List[str]

        for cell, width in zip(row, col_widths):
            parts.append(cell.ljust(width))

        line = " | ".join(parts)
        lines.append(line)

        if i == 0:
            border = []  # type: List[str]

            for width in col_widths:
                border.append("-" * width)

            lines.append("-+-".join(border))

    result = "\n".join(lines)

    return result

Here is an example:

>>> table = [['column 0', 'another column 1'], ['00', '01'], ['10', '11']]
>>> result = packagery._format_table(table=table)
>>> print(result)
column 0 | another column 1
---------+-----------------
00       | 01              
10       | 11

Question 31

updated @Franck Dernoncourt fancy recipe to be python 3 and PEP8 compliant

import io
import math
import operator
import re
import functools

from itertools import zip_longest


def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column.
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""

    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]

    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    if separate_rows:
        print(output, row_separator)
    for physicalRows in logical_rows:
        for row in physicalRows:
            print( output, prefix + delim.join(
                [justify(str(item), width) for (item, width) in zip(row, max_widths)]
            ) + postfix)
        if separate_rows or has_header:
            print(output, row_separator)
            has_header = False
    return output.getvalue()


# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )


def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )


def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )


if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
           Mary,Brohowski,23,Sales Manager
           Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]

    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))

    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))

        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )

    # output:
    #
    # Without wrapping function
    #
    # First Name | Last Name        | Age | Position
    # -------------------------------------------------------
    # John       | Smith            | 24  | Software Engineer
    # Mary       | Brohowski        | 23  | Sales Manager
    # Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
    #
    # Wrapping function: wrap_always(x,width=10)
    #
    # ----------------------------------------------
    # | First Name | Last Name  | Age | Position   |
    # ----------------------------------------------
    # | John       | Smith      | 24  | Software E |
    # |            |            |     | ngineer    |
    # ----------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales Mana |
    # |            |            |     | ger        |
    # ----------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior Res |
    # |            | poulos     |     | eacher     |
    # ----------------------------------------------
    #
    # Wrapping function: wrap_onspace(x,width=10)
    #
    # ---------------------------------------------------
    # | First Name | Last Name        | Age | Position  |
    # ---------------------------------------------------
    # | John       | Smith            | 24  | Software  |
    # |            |                  |     | Engineer  |
    # ---------------------------------------------------
    # | Mary       | Brohowski        | 23  | Sales     |
    # |            |                  |     | Manager   |
    # ---------------------------------------------------
    # | Aristidis  | Papageorgopoulos | 28  | Senior    |
    # |            |                  |     | Reseacher |
    # ---------------------------------------------------
    #
    # Wrapping function: wrap_onspace_strict(x,width=10)
    #
    # ---------------------------------------------
    # | First Name | Last Name  | Age | Position  |
    # ---------------------------------------------
    # | John       | Smith      | 24  | Software  |
    # |            |            |     | Engineer  |
    # ---------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales     |
    # |            |            |     | Manager   |
    # ---------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior    |
    # |            | poulos     |     | Reseacher |
    # ---------------------------------------------

Question 32

I realize this question is old but I didn’t understand Antak’s answer and didn’t want to use a library so I rolled my own solution.

Solution assumes records is a 2D array, records are all the same length, and that fields are all strings.

def stringifyRecords(records):
    column_widths = [0] * len(records[0])
    for record in records:
        for i, field in enumerate(record):
            width = len(field)
            if width > column_widths[i]: column_widths[i] = width

    s = ""
    for record in records:
        for column_width, field in zip(column_widths, record):
            s += field.ljust(column_width+1)
        s += "\n"

    return s

Question 33

string.split() returns a list instance. Is there a version that returns a generator instead? Are there any reasons against having a generator version?

Question 34

It is highly probable that re.finditer uses fairly minimal memory overhead.

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

Demo:

>>> list( split_iter("A programmer's RegEx test.") )
['A', "programmer's", 'RegEx', 'test']

edit: I have just confirmed that this takes constant memory in python 3.2.1, assuming my testing methodology was correct. I created a string of very large size (1GB or so), then iterated through the iterable with a for loop (NOT a list comprehension, which would have generated extra memory). This did not result in a noticeable growth of memory (that is, if there was a growth in memory, it was far far less than the 1GB string).

Question 35

The most efficient way I can think of it to write one using the offset parameter of the str.find() method. This avoids lots of memory use, and relying on the overhead of a regexp when it’s not needed.

[edit 2016-8-2: updated this to optionally support regex separators]

def isplit(source, sep=None, regex=False):
    """
    generator version of str.split()

    :param source:
        source string (unicode or bytes)

    :param sep:
        separator to split on.

    :param regex:
        if True, will treat sep as regular expression.

    :returns:
        generator yielding elements of string.
    """
    if sep is None:
        # mimic default python behavior
        source = source.strip()
        sep = "\\s+"
        if isinstance(source, bytes):
            sep = sep.encode("ascii")
        regex = True
    if regex:
        # version using re.finditer()
        if not hasattr(sep, "finditer"):
            sep = re.compile(sep)
        start = 0
        for m in sep.finditer(source):
            idx = m.start()
            assert idx >= start
            yield source[start:idx]
            start = m.end()
        yield source[start:]
    else:
        # version using str.find(), less overhead than re.finditer()
        sepsize = len(sep)
        start = 0
        while True:
            idx = source.find(sep, start)
            if idx == -1:
                yield source[start:]
                return
            yield source[start:idx]
            start = idx + sepsize

This can be used like you want…

>>> print list(isplit("abcb","b"))
['a','c','']

While there is a little bit of cost seeking within the string each time find() or slicing is performed, this should be minimal since strings are represented as continguous arrays in memory.

Question 36

This is generator version of split() implemented via re.search() that does not have the problem of allocating too many substrings.

import re

def itersplit(s, sep=None):
    exp = re.compile(r'\s+' if sep is None else re.escape(sep))
    pos = 0
    while True:
        m = exp.search(s, pos)
        if not m:
            if pos < len(s) or sep is not None:
                yield s[pos:]
            break
        if pos < m.start() or sep is not None:
            yield s[pos:m.start()]
        pos = m.end()


sample1 = "Good evening, world!"
sample2 = " Good evening, world! "
sample3 = "brackets][all][][over][here"
sample4 = "][brackets][all][][over][here]["

assert list(itersplit(sample1)) == sample1.split()
assert list(itersplit(sample2)) == sample2.split()
assert list(itersplit(sample3, '][')) == sample3.split('][')
assert list(itersplit(sample4, '][')) == sample4.split('][')

EDIT: Corrected handling of surrounding whitespace if no separator chars are given.

Question 37

Did some performance testing on the various methods proposed (I won’t repeat them here). Some results:

str.split (default = 0.3461570239996945
manual search (by character) (one of Dave Webb’s answer’s) = 0.8260340550004912
re.finditer (ninjagecko’s answer) = 0.698872097000276
str.find (one of Eli Collins’s answers) = 0.7230395330007013
itertools.takewhile (Ignacio Vazquez-Abrams’s answer) = 2.023023967998597
str.split(..., maxsplit=1) recursion = N/A†

†The recursion answers (string.split with maxsplit = 1) fail to complete in a reasonable time, given string.splits speed they may work better on shorter strings, but then I can’t see the use-case for short strings where memory isn’t an issue anyway.

Tested using timeit on:

the_text = "100 " * 9999 + "100"

def test_function( method ):
    def fn( ):
        total = 0

        for x in method( the_text ):
            total += int( x )

        return total

    return fn

This raises another question as to why string.split is so much faster despite its memory usage.

Question 38

Here is my implementation, which is much, much faster and more complete than the other answers here. It has 4 separate subfunctions for different cases.

I’ll just copy the docstring of the main str_split function:

str_split(s, *delims, empty=None)

Split the string s by the rest of the arguments, possibly omitting empty parts (empty keyword argument is responsible for that). This is a generator function.

When only one delimiter is supplied, the string is simply split by it. empty is then True by default.

str_split('[]aaa[][]bb[c', '[]')
    -> '', 'aaa', '', 'bb[c'
str_split('[]aaa[][]bb[c', '[]', empty=False)
    -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest possible sequences of those delimiters by default, or, if empty is set to True, empty strings between the delimiters are also included. Note that the delimiters in this case may only be single characters.

str_split('aaa, bb : c;', ' ', ',', ':', ';')
    -> 'aaa', 'bb', 'c'
str_split('aaa, bb : c;', *' ,:;', empty=True)
    -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, string.whitespace is used, so the effect is the same as str.split(), except this function is a generator.

str_split('aaa\\t  bb c \\n')
    -> 'aaa', 'bb', 'c'

import string

def _str_split_chars(s, delims):
    "Split the string `s` by characters contained in `delims`, including the \
    empty parts between two consecutive delimiters"
    start = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[start:i]
            start = i+1
    yield s[start:]

def _str_split_chars_ne(s, delims):
    "Split the string `s` by longest possible sequences of characters \
    contained in `delims`"
    start = 0
    in_s = False
    for i, c in enumerate(s):
        if c in delims:
            if in_s:
                yield s[start:i]
                in_s = False
        else:
            if not in_s:
                in_s = True
                start = i
    if in_s:
        yield s[start:]


def _str_split_word(s, delim):
    "Split the string `s` by the string `delim`"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    yield s[start:]

def _str_split_word_ne(s, delim):
    "Split the string `s` by the string `delim`, not including empty parts \
    between two consecutive delimiters"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            if start!=i:
                yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    if start<len(s):
        yield s[start:]


def str_split(s, *delims, empty=None):
    """\
Split the string `s` by the rest of the arguments, possibly omitting
empty parts (`empty` keyword argument is responsible for that).
This is a generator function.

When only one delimiter is supplied, the string is simply split by it.
`empty` is then `True` by default.
    str_split('[]aaa[][]bb[c', '[]')
        -> '', 'aaa', '', 'bb[c'
    str_split('[]aaa[][]bb[c', '[]', empty=False)
        -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest
possible sequences of those delimiters by default, or, if `empty` is set to
`True`, empty strings between the delimiters are also included. Note that
the delimiters in this case may only be single characters.
    str_split('aaa, bb : c;', ' ', ',', ':', ';')
        -> 'aaa', 'bb', 'c'
    str_split('aaa, bb : c;', *' ,:;', empty=True)
        -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, `string.whitespace` is used, so the effect
is the same as `str.split()`, except this function is a generator.
    str_split('aaa\\t  bb c \\n')
        -> 'aaa', 'bb', 'c'
"""
    if len(delims)==1:
        f = _str_split_word if empty is None or empty else _str_split_word_ne
        return f(s, delims[0])
    if len(delims)==0:
        delims = string.whitespace
    delims = set(delims) if len(delims)>=4 else ''.join(delims)
    if any(len(d)>1 for d in delims):
        raise ValueError("Only 1-character multiple delimiters are supported")
    f = _str_split_chars if empty else _str_split_chars_ne
    return f(s, delims)

This function works in Python 3, and an easy, though quite ugly, fix can be applied to make it work in both 2 and 3 versions. The first lines of the function should be changed to:

def str_split(s, *delims, **kwargs):
    """...docstring..."""
    empty = kwargs.get('empty')

Question 39

No, but it should be easy enough to write one using itertools.takewhile().

EDIT:

Very simple, half-broken implementation:

import itertools
import string

def isplitwords(s):
  i = iter(s)
  while True:
    r = []
    for c in itertools.takewhile(lambda x: not x in string.whitespace, i):
      r.append(c)
    else:
      if r:
        yield ''.join(r)
        continue
      else:
        raise StopIteration()

Question 40

I don’t see any obvious benefit to a generator version of split(). The generator object is going to have to contain the whole string to iterate over so you’re not going to save any memory by having a generator.

If you wanted to write one it would be fairly easy though:

import string

def gsplit(s,sep=string.whitespace):
    word = []

    for c in s:
        if c in sep:
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(c)

    if word:
        yield "".join(word)

Question 41

I wrote a version of @ninjagecko’s answer that behaves more like string.split (i.e. whitespace delimited by default and you can specify a delimiter).

def isplit(string, delimiter = None):
    """Like string.split but returns an iterator (lazy)

    Multiple character delimters are not handled.
    """

    if delimiter is None:
        # Whitespace delimited by default
        delim = r"\s"

    elif len(delimiter) != 1:
        raise ValueError("Can only handle single character delimiters",
                        delimiter)

    else:
        # Escape, incase it's "\", "*" etc.
        delim = re.escape(delimiter)

    return (x.group(0) for x in re.finditer(r"[^{}]+".format(delim), string))

Here are the tests I used (in both python 3 and python 2):

# Wrapper to make it a list
def helper(*args,  **kwargs):
    return list(isplit(*args, **kwargs))

# Normal delimiters
assert helper("1,2,3", ",") == ["1", "2", "3"]
assert helper("1;2;3,", ";") == ["1", "2", "3,"]
assert helper("1;2 ;3,  ", ";") == ["1", "2 ", "3,  "]

# Whitespace
assert helper("1 2 3") == ["1", "2", "3"]
assert helper("1\t2\t3") == ["1", "2", "3"]
assert helper("1\t2 \t3") == ["1", "2", "3"]
assert helper("1\n2\n3") == ["1", "2", "3"]

# Surrounding whitespace dropped
assert helper(" 1 2  3  ") == ["1", "2", "3"]

# Regex special characters
assert helper(r"1\2\3", "\\") == ["1", "2", "3"]
assert helper(r"1*2*3", "*") == ["1", "2", "3"]

# No multi-char delimiters allowed
try:
    helper(r"1,.2,.3", ",.")
    assert False
except ValueError:
    pass

python’s regex module says that it does “the right thing” for unicode whitespace, but I haven’t actually tested it.

Also available as a gist.

Question 42

If you would also like to be able to read an iterator (as well as return one) try this:

import itertools as it

def iter_split(string, sep=None):
    sep = sep or ' '
    groups = it.groupby(string, lambda s: s != sep)
    return (''.join(g) for k, g in groups if k)

Usage

>>> list(iter_split(iter("Good evening, world!")))
['Good', 'evening,', 'world!']

Question 43

more_itertools.split_at offers an analog to str.split for iterators.

>>> import more_itertools as mit


>>> list(mit.split_at("abcdcba", lambda x: x == "b"))
[['a'], ['c', 'd', 'c'], ['a']]

>>> "abcdcba".split("b")
['a', 'cdc', 'a']

more_itertools is a third-party package.

Question 44

I wanted to show how to use the find_iter solution to return a generator for given delimiters and then use the pairwise recipe from itertools to build a previous next iteration which will get the actual words as in the original split method.

from more_itertools import pairwise
import re

string = "dasdha hasud hasuid hsuia dhsuai dhasiu dhaui d"
delimiter = " "
# split according to the given delimiter including segments beginning at the beginning and ending at the end
for prev, curr in pairwise(re.finditer("^|[{0}]+|$".format(delimiter), string)):
    print(string[prev.end(): curr.start()])

note:

I use prev & curr instead of prev & next because overriding next in python is a very bad idea
This is quite efficient

Question 45

Dumbest method, without regex / itertools:

def isplit(text, split='\n'):
    while text != '':
        end = text.find(split)

        if end == -1:
            yield text
            text = ''
        else:
            yield text[:end]
            text = text[end + 1:]

Question 46

def split_generator(f,s):
    """
    f is a string, s is the substring we split on.
    This produces a generator rather than a possibly
    memory intensive list. 
    """
    i=0
    j=0
    while j<len(f):
        if i>=len(f):
            yield f[j:]
            j=i
        elif f[i] != s:
            i=i+1
        else:
            yield [f[j:i]]
            j=i+1
            i=i+1

Question 47

here is a simple response

def gen_str(some_string, sep):
    j=0
    guard = len(some_string)-1
    for i,s in enumerate(some_string):
        if s == sep:
           yield some_string[j:i]
           j=i+1
        elif i!=guard:
           continue
        else:
           yield some_string[j:]

Question 48

Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.

For example, let’s say myString is defined as:

>>> myString = "spam\\neggs"
>>> print(myString)
spam\neggs

I want a function (I’ll call it process) that does this:

>>> print(process(myString))
spam
eggs

It’s important that the function can process all of the escape sequences in Python (listed in a table in the link above).

Does Python have a function to do this?

Question 49

The correct thing to do is use the ‘string-escape’ code to decode the string.

>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3 
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs

Don’t use the AST or eval. Using the string codecs is much safer.

Question 50

`unicode_escape` doesn’t work in general

It turns out that the string_escape or unicode_escape solution does not work in general — particularly, it doesn’t work in the presence of actual Unicode.

If you can be sure that every non-ASCII character will be escaped (and remember, anything beyond the first 128 characters is non-ASCII), unicode_escape will do the right thing for you. But if there are any literal non-ASCII characters already in your string, things will go wrong.

unicode_escape is fundamentally designed to convert bytes into Unicode text. But in many places — for example, Python source code — the source data is already Unicode text.

The only way this can work correctly is if you encode the text into bytes first. UTF-8 is the sensible encoding for all text, so that should work, right?

The following examples are in Python 3, so that the string literals are cleaner, but the same problem exists with slightly different manifestations on both Python 2 and 3.

>>> s = 'naïve \\t test'
>>> print(s.encode('utf-8').decode('unicode_escape'))
naÃ¯ve   test

Well, that’s wrong.

The new recommended way to use codecs that decode text into text is to call codecs.decode directly. Does that help?

>>> import codecs
>>> print(codecs.decode(s, 'unicode_escape'))
naÃ¯ve   test

Not at all. (Also, the above is a UnicodeError on Python 2.)

The unicode_escape codec, despite its name, turns out to assume that all non-ASCII bytes are in the Latin-1 (ISO-8859-1) encoding. So you would have to do it like this:

>>> print(s.encode('latin-1').decode('unicode_escape'))
naïve    test

But that’s terrible. This limits you to the 256 Latin-1 characters, as if Unicode had never been invented at all!

>>> print('Ernő \\t Rubik'.encode('latin-1').decode('unicode_escape'))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151'
in position 3: ordinal not in range(256)

Adding a regular expression to solve the problem

(Surprisingly, we do not now have two problems.)

What we need to do is only apply the unicode_escape decoder to things that we are certain to be ASCII text. In particular, we can make sure only to apply it to valid Python escape sequences, which are guaranteed to be ASCII text.

The plan is, we’ll find escape sequences using a regular expression, and use a function as the argument to re.sub to replace them with their unescaped value.

import re
import codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

And with that:

>>> print(decode_escapes('Ernő \\t Rubik'))
Ernő     Rubik

Question 51

The actually correct and convenient answer for python 3:

>>> import codecs
>>> myString = "spam\\neggs"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
spam
eggs
>>> myString = "naïve \\t test"
>>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
naïve    test

Details regarding codecs.escape_decode:

codecs.escape_decode is a bytes-to-bytes decoder
codecs.escape_decode decodes ascii escape sequences, such as: b"\\n" -> b"\n", b"\\xce" -> b"\xce".
codecs.escape_decode does not care or need to know about the byte object’s encoding, but the encoding of the escaped bytes should match the encoding of the rest of the object.

Background:

@rspeer is correct: unicode_escape is the incorrect solution for python3. This is because unicode_escape decodes escaped bytes, then decodes bytes to unicode string, but receives no information regarding which codec to use for the second operation.
@Jerub is correct: avoid the AST or eval.
I first discovered codecs.escape_decode from this answer to “how do I .decode(‘string-escape’) in Python3?”. As that answer states, that function is currently not documented for python 3.

Question 52

The ast.literal_eval function comes close, but it will expect the string to be properly quoted first.

Of course Python’s interpretation of backslash escapes depends on how the string is quoted ("" vs r"" vs u"", triple quotes, etc) so you may want to wrap the user input in suitable quotes and pass to literal_eval. Wrapping it in quotes will also prevent literal_eval from returning a number, tuple, dictionary, etc.

Things still might get tricky if the user types unquoted quotes of the type you intend to wrap around the string.

Question 53

This is a bad way of doing it, but it worked for me when trying to interpret escaped octals passed in a string argument.

input_string = eval('b"' + sys.argv[1] + '"')

It’s worth mentioning that there is a difference between eval and ast.literal_eval (eval being way more unsafe). See Using python’s eval() vs. ast.literal_eval()?

Question 54

Below code should work for \n is required to be displayed on the string.

import string

our_str = 'The String is \\n, \\n and \\n!'
new_str = string.replace(our_str, '/\\n', '/\n', 1)
print(new_str)

Question 55

Just posting this so I can search for it later, as it always seems to stump me:

$ python3.2
Python 3.2 (r32:88445, Oct 20 2012, 14:09:50) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import curses
>>> print(curses.version)
b'2.2'
>>> print(str(curses.version))
b'2.2'
>>> print(curses.version.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>> print(str(curses.version).encode('utf-8'))
b"b'2.2'"

As question: how to print a binary (bytes) string in Python 3, without the b' prefix?

Question 56

Use decode:

print(curses.version.decode())
# 2.2

Question 57

If the bytes use an appropriate character encoding already; you could print them directly:

sys.stdout.buffer.write(data)

or

nwritten = os.write(sys.stdout.fileno(), data)  # NOTE: it may write less than len(data) bytes

Question 58

If we take a look at the source for bytes.__repr__, it looks as if the b'' is baked into the method.

The most obvious workaround is to manually slice off the b'' from the resulting repr():

>>> x = b'\x01\x02\x03\x04'

>>> print(repr(x))
b'\x01\x02\x03\x04'

>>> print(repr(x)[2:-1])
\x01\x02\x03\x04

Question 59

If the data is in an UTF-8 compatible format, you can convert the bytes to a string.

>>> import curses
>>> print(str(curses.version, "utf-8"))
2.2

Optionally convert to hex first, if the data is not already UTF-8 compatible. E.g. when the data are actual raw bytes.

from binascii import hexlify
from codecs import encode  # alternative
>>> print(hexlify(b"\x13\x37"))
b'1337'
>>> print(str(hexlify(b"\x13\x37"), "utf-8"))
1337
>>>> print(str(encode(b"\x13\x37", "hex"), "utf-8"))
1337

Question 60

I construct a string s in Python 2.6.5 which will have a varying number of %s tokens, which match the number of entries in list x. I need to write out a formatted string. The following doesn’t work, but indicates what I’m trying to do. In this example, there are three %s tokens and the list has three entries.

s = '%s BLAH %s FOO %s BAR'
x = ['1', '2', '3']
print s % (x)

I’d like the output string to be:

1 BLAH 2 FOO 3 BAR

Question 61

print s % tuple(x)

instead of

print s % (x)

Question 62

You should take a look to the format method of python. You could then define your formatting string like this :

>>> s = '{0} BLAH BLAH {1} BLAH {2} BLAH BLIH BLEH'
>>> x = ['1', '2', '3']
>>> print s.format(*x)
'1 BLAH BLAH 2 BLAH 3 BLAH BLIH BLEH'

Question 63

Following this resource page, if the length of x is varying, we can use:

', '.join(['%.2f']*len(x))

to create a place holder for each element from the list x. Here is the example:

x = [1/3.0, 1/6.0, 0.678]
s = ("elements in the list are ["+', '.join(['%.2f']*len(x))+"]") % tuple(x)
print s
>>> elements in the list are [0.33, 0.17, 0.68]

Question 64

Here is a one liner. A little improvised answer using format with print() to iterate a list.

How about this (python 3.x):

sample_list = ['cat', 'dog', 'bunny', 'pig']
print("Your list of animals are: {}, {}, {} and {}".format(*sample_list))

Read the docs here on using format().

问题：如何引发ValueError？

回答 0

回答 1

回答 2

回答 3

问题：Python 3中的字符串格式

回答 0

回答 1

回答 2

回答 3

问题：Python，在目录字符串中添加尾部斜线，独立于操作系统

回答 0

回答 1

回答 2

回答 3

问题：在python中创建漂亮的列输出

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

对于懒惰的人

结果：

For lazy people

Result:

回答 11

回答 12

回答 13

回答 14

回答 15

问题：Python中是否有`string.split（）`的生成器版本？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

问题：在Python中处理字符串中的转义序列

回答 0

回答 1

unicode_escape 总的来说不起作用

添加正则表达式以解决问题

unicode_escape doesn’t work in general

Adding a regular expression to solve the problem

回答 2

回答 3

回答 4

回答 5

问题：在Python 3中禁止/打印不带b’前缀的字节

回答 0

回答 1

回答 2

回答 3

问题：将Python字符串格式化与列表一起使用

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：格式化字符串时多次插入相同的值

回答 0

回答 1

回答 2

`unicode_escape` 总的来说不起作用

`unicode_escape` doesn’t work in general