Python 实用宝典

Question 1

I have a list of strings for which I would like to perform a natural alphabetical sort.

For instance, the following list is naturally sorted (what I want):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

And here’s the “sorted” version of the above list (what I get using sorted()):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

I’m looking for a sort function which behaves like the first one.
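Why does sorted() produce that order? A minimal sketch: Python compares strings code point by code point, so every uppercase 'E' (69) sorts before every lowercase 'e' (101), and '1' sorts before '9' no matter how many digits follow it.

```python
# Plain string comparison works character by character on code points
data = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

print(ord('E'), ord('e'))  # 69 101: every 'Elm...' sorts before every 'elm...'
print('elm10' < 'elm9')    # True: '1' < '9' at the fourth character

plain = sorted(data)
print(plain)
# ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
```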

Question 2

There is a third party library for this on PyPI called natsort (full disclosure, I am the package’s author). For your case, you can do either of the following:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

You should note that natsort uses a general algorithm so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the natsort documentation’s How It Works page, in particular the Special Cases Everywhere! section.

If you need a sorting key instead of a sorting function, use either of the approaches below.

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Question 3

Try this:

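import re

def natural_sort(l):
    # Convert digit runs to int so they compare numerically; lowercase the rest
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    # Split on digit runs; the capturing group keeps the digits in the result
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

print(natural_sort(['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']))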

Output:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Code adapted from here: Sorting for Humans : Natural Sort Order.
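To see why this works, it helps to look at the key that alphanum_key builds. A small sketch (the standalone name natural_key is ours, not from the answer): re.split with a capturing group keeps the digit runs in the result, and those are then converted to int, so 9 and 11 compare as numbers.

```python
import re

def natural_key(text):
    # Capturing group: the digit runs are kept in the split result
    parts = re.split('([0-9]+)', text)
    return [int(p) if p.isdigit() else p.lower() for p in parts]

print(re.split('([0-9]+)', 'Elm11'))               # ['Elm', '11', '']
print(natural_key('Elm11'))                        # ['elm', 11, '']
print(natural_key('elm9') < natural_key('Elm11'))  # True: 9 < 11 as integers
```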

Question 4

Here’s a much more Pythonic version of Mark Byers’ answer:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

Now this function can be used as a key in any function that accepts one, like list.sort, sorted, max, etc.

As a lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split(r'(\d+)', s)]
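As a quick illustration of using the key with other built-ins: plain max() picks 'file9' by code point comparison, while the natural key picks 'File10'. The file names are made up for the example.

```python
import re

_nsre = re.compile(r'(\d+)')

def natural_sort_key(s):
    # Same key as above, with the compiled pattern kept at module level
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

files = ['file2', 'File10', 'file9']
print(max(files))                        # file9: plain code point comparison
print(max(files, key=natural_sort_key))  # File10
print(min(files, key=natural_sort_key))  # file2
```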

Question 5

I wrote a function based on http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html that still lets you pass in your own ‘key’ parameter. I need this to perform a natural sort of lists that contain more complex objects (not just strings).

import re

def natural_sort(items, key=lambda s: s):
    """
    Sort the list in place into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    items.sort(key=sort_key)

For example:

my_list = [{'name': 'b'}, {'name': '10'}, {'name': 'a'}, {'name': '1'}, {'name': '9'}]
natural_sort(my_list, key=lambda x: x['name'])
print(my_list)
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]

Question 6

data = ['elm13', 'elm9', 'elm0', 'elm1', 'Elm11', 'Elm2', 'elm10']

Let’s analyse the data. The numeric part of every element has at most 2 digits, and all elements share the 3-letter literal prefix 'elm' (ignoring case).

So the maximal length of an element is 5. To be safe, we can pad to a larger width (for example, 8).

Bearing that in mind, we’ve got a one-line solution:

data.sort(key=lambda x: '{0:0>8}'.format(x).lower())

without regular expressions or external libraries! Note that it relies on every element sharing the same non-numeric prefix.

print(data)

>>> ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'elm13']

Explanation:

for elm in data:
    print('{0:0>8}'.format(elm).lower())

>>>
0000elm0
0000elm1
0000elm2
0000elm9
000elm10
000elm11
000elm13
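One caveat worth checking: the padding trick only lines the digits up when every element has the same non-numeric prefix. A small sketch with made-up data whose prefixes differ in length, compared against a regex-based natural key:

```python
import re

def pad_key(x):
    # The one-liner's key: left-pad to 8 characters with zeros, then lowercase
    return '{0:0>8}'.format(x).lower()

def natural_key(x):
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r'(\d+)', x)]

data = ['b1', 'aa1']
print(sorted(data, key=pad_key))      # ['b1', 'aa1']: '000000b1' < '00000aa1'
print(sorted(data, key=natural_key))  # ['aa1', 'b1']
```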

Question 7

Given:

data=['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

Similar to SergO’s solution, a 1-liner without external libraries would be:

data.sort(key=lambda x: int(x[3:]))

or

sorted_data = sorted(data, key=lambda x: int(x[3:]))

Explanation:

This solution uses the key parameter of sort to define the function used for sorting. Because we know that every data entry starts with ‘elm’ or ‘Elm’, the key converts the portion of the string after the first three characters to an integer (i.e. int(x[3:])). If the numerical part of the data is in a different location, this part of the function would have to change.

Cheers
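If the prefix length varies, one way to avoid hard-coding x[3:] is to strip the trailing digits off first. A sketch, assuming the number is always at the end of the string; the helper name trailing_number_key is hypothetical:

```python
import string

def trailing_number_key(s):
    # Everything before the trailing digits, e.g. 'Elm' for 'Elm11'
    prefix = s.rstrip(string.digits)
    digits = s[len(prefix):]
    # Compare prefixes case-insensitively, then the number; no digits sorts first
    return (prefix.lower(), int(digits) if digits else -1)

data = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9', 'item3']
print(sorted(data, key=trailing_number_key))
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'item3']
```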

Question 8

And now for something a touch more elegant (Pythonic).

There are many implementations out there, and while some have come close, none quite captured the elegance modern Python affords.

Tested using Python 3.5.1
Included an additional list to demonstrate that it works when the numbers are mid-string
I haven’t benchmarked it, but I assume that for a sizable list it is more efficient to compile the regex beforehand
- I’m sure someone will correct me if this is an erroneous assumption (the re module already caches recently used patterns, so the gain may be small)

Quick version

from re import compile, split
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

Full code

#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)
# ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)
# ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']

Caution when also using

from os.path import split
- both imports bind the name split, so you will need to differentiate them (e.g. from os.path import split as path_split)
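A sketch of another way to keep the two split functions apart: import the modules rather than the names, so the regex split and os.path.split never collide. The paths are made up for the example.

```python
import os.path
import re

DIGITS = re.compile(r'(\d+)')

def natural_key(name):
    # DIGITS.split is the regex split; os.path.split is never shadowed
    return [int(s) if s.isdigit() else s.lower() for s in DIGITS.split(name)]

paths = ['/data/elm10.txt', '/data/Elm2.txt', '/data/elm1.txt']
# Sort by file name only: os.path.split returns (head, tail)
result = sorted(paths, key=lambda p: natural_key(os.path.split(p)[1]))
print(result)  # ['/data/elm1.txt', '/data/Elm2.txt', '/data/elm10.txt']
```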

Inspiration from

Python Documentation: Sorting HOW TO
Sorting for Humans : Natural Sort Order
Human Sorting
Contributors/Commentators to this and referenced posts

Question 9

Value Of This Post

My goal is to offer a non-regex solution that can be applied generally.
I’ll create three functions:

find_first_digit, which I borrowed from @AnuragUniyal. It finds the position of the first digit (or, with non=True, the first non-digit) in a string.
split_digits, a generator that splits a string into digit and non-digit chunks, yielding the digit chunks as integers.
natural_key, which just wraps split_digits in a tuple. This is what we use as a key for sorted, max, and min.

Functions

def find_first_digit(s, non=False):
    for i, x in enumerate(s):
        if x.isdigit() ^ non:
            return i
    return -1

def split_digits(s, case=False):
    non = True
    while s:
        i = find_first_digit(s, non)
        if i == 0:
            non = not non
        elif i == -1:
            yield int(s) if s.isdigit() else s if case else s.lower()
            s = ''
        else:
            x, s = s[:i], s[i:]
            yield int(x) if x.isdigit() else x if case else x.lower()

def natural_key(s, *args, **kwargs):
    return tuple(split_digits(s, *args, **kwargs))

It is general in the sense that a string can contain multiple digit chunks:

# Note that the key has lower case letters
natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')

('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

Or leave as case sensitive:

natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)

('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

We can see that it sorts the OP’s list in the appropriate order

sorted(
    ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
    key=natural_key
)

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

But it can handle more complicated lists as well:

sorted(
    ['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
    key=natural_key
)

['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']

My regex equivalent would be

import re

def int_maybe(x):
    return int(x) if str(x).isdigit() else x

def split_digits_re(s, case=False):
    parts = re.findall(r'\d+|\D+', s)
    if not case:
        return map(int_maybe, (x.lower() for x in parts))
    else:
        return map(int_maybe, parts)

def natural_key_re(s, *args, **kwargs):
    return tuple(split_digits_re(s, *args, **kwargs))
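One thing to watch on Python 3: with both versions, natural_key('1a') is (1, 'a') while natural_key('a1') is ('a', 1), so comparing them compares an int with a str and raises TypeError. A sketch of one possible fix, assuming it is acceptable to pad keys that start with a number with an empty string (the name natural_key_py3 is ours, and it swaps in itertools.groupby for the helper functions):

```python
from itertools import groupby

def natural_key_py3(s, case=False):
    parts = []
    # str.isdecimal (rather than isdigit) skips characters like '²' that int() rejects
    for is_num, chars in groupby(s, key=str.isdecimal):
        chunk = ''.join(chars)
        if is_num:
            parts.append(int(chunk))
        else:
            parts.append(chunk if case else chunk.lower())
    # Keys that start with a number get a leading '', so strings and ints
    # always sit at the same positions and are never compared with each other
    if parts and isinstance(parts[0], int):
        parts.insert(0, '')
    return tuple(parts)

print(sorted(['a1', '1a', 'A10', 'a2'], key=natural_key_py3))
# ['1a', 'a1', 'a2', 'A10']
```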

Question 10

One option is to turn the string into a tuple and replace digits using their expanded form: http://wiki.answers.com/Q/What_does_expanded_form_mean

That way, a90 would become (“a”, 90, 0) and a1 would become (“a”, 1).

Below is some sample code (which isn’t very efficient because of the way it removes leading 0s from numbers):

alist=["something1",
    "something12",
    "something17",
    "something2",
    "something25and_then_33",
    "something25and_then_34",
    "something29",
    "beta1.1",
    "beta2.3.0",
    "beta2.33.1",
    "a001",
    "a2",
    "z002",
    "z1"]

def key(k):
    nums = set("0123456789")
    chars = set(k) - nums
    # Strip leading zeros from numbers, e.g. "a001" -> "a1"
    for i in range(len(k)):
        for c in chars:
            k = k.replace(c + "0", c)
    l = list(k)
    base = 10
    j = 0
    # Walk right to left, replacing each digit with its place value
    for i in range(len(l) - 1, -1, -1):
        try:
            l[i] = int(l[i]) * base ** j
            j += 1
        except ValueError:
            j = 0
    l = tuple(l)
    print(l)
    return l

print(sorted(alist, key=key))

output:

('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']

Question 11

Based on the answers here, I wrote a natural_sorted function that behaves like the built-in function sorted:

# Copyright (C) 2018, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

The source code is also available in my GitHub snippets repository: https://github.com/bdrung/snippets/blob/master/natural_sorted.py

Question 12

The above answers are good for the specific example that was shown, but miss several useful cases for the more general question of natural sort. I just got bit by one of those cases, so I created a more thorough solution:

import re

def natural_sort_key(string_or_number):
    """
    by Scott S. Lawton <scott@ProductArchitect.com> 2014-12-11; public domain and/or CC0 license

    handles cases where simple 'int' approach fails, e.g.
        ['0.501', '0.55'] floating point with different number of significant digits
        [0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
        ['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
    """

    def try_float(astring):
        try:
            return float(astring)
        except ValueError:
            return astring

    if isinstance(string_or_number, str):  # basestring in Python 2
        string_or_number = string_or_number.lower()

        if len(re.findall(r'[.]\d', string_or_number)) <= 1:
            # assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
            # '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
            return [try_float(s) for s in re.split(r'([\d.]+)', string_or_number)]
        else:
            # assume distinct fields, e.g. IP address, phone number with '.', etc.
            # caveat: might want to first split by whitespace
            # TBD: for unicode, replace isdigit with isdecimal
            return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_or_number)]
    else:
        # consider: add code to recurse for lists/tuples and perhaps other iterables
        return string_or_number
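To illustrate the first case in the docstring: with an integer-only split, '0.55' sorts before '0.501' because 55 < 501, while a float-aware split compares 0.501 < 0.55. A stripped-down sketch of just that comparison (not the full function above):

```python
import re

def int_key(s):
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', s)]

def float_key(s):
    parts = []
    for t in re.split(r'([\d.]+)', s):
        try:
            parts.append(float(t))
        except ValueError:  # non-numeric chunks, including ''
            parts.append(t)
    return parts

values = ['0.55', '0.501']
print(sorted(values, key=int_key))    # ['0.55', '0.501']
print(sorted(values, key=float_key))  # ['0.501', '0.55']
```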

Test code and several links (on and off of StackOverflow) are here: http://productarchitect.com/code/better-natural-sort.py

Feedback welcome. This isn’t meant to be a definitive solution, just a step forward.

Question 13

Most likely functools.cmp_to_key() is closely tied to the underlying implementation of Python’s sort. Besides, the cmp parameter is legacy (Python 3’s sort() and sorted() only accept key). The modern way is to transform the input items into objects that support the desired rich comparison operations.

Under CPython 2.x, objects of disparate types can be ordered even if the respective rich comparison operators haven’t been implemented. Under CPython 3.x, objects of different types must explicitly support the comparison. See How does Python compare string and int? which links to the official documentation. Most of the answers depend on this implicit ordering. Switching to Python 3.x will require a new type to implement and unify comparisons between numbers and strings.

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True

Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()

There are three different approaches. The first uses nested classes to take advantage of Python’s sequence comparison algorithm. The second unrolls this nesting into a single class. The third forgoes subclassing str to focus on performance. All are timed: the second is about twice as fast as the first, and the third is more than five times as fast. Subclassing str isn’t required, and was probably a bad idea in the first place, but it does come with certain conveniences.

The sort characters are duplicated to force ordering by case, and case-swapped to force lowercase letters to sort first; this is the typical definition of “natural sort”. I couldn’t decide on the type of grouping; some might prefer the following per-word variant, which also brings significant performance benefits:

d = lambda s: s.lower()+s.swapcase()
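A quick way to see what the duplicate-and-swapcase trick does: a lowercase 'a' becomes 'aA' and an uppercase 'A' becomes 'aa'; since 'A' < 'a', the lowercase form sorts first while letters still group case-insensitively. A sketch comparing the per-character and per-word variants on made-up words:

```python
def per_char(s):
    # Each character becomes its lowercase form followed by its swapped case
    return "".join(c.lower() + c.swapcase() for c in s)

def per_word(s):
    # The whole word lowercased, followed by the whole word case-swapped
    return s.lower() + s.swapcase()

words = ['Banana', 'apple', 'Apple', 'banana']
print(per_char('a'), per_char('A'))  # aA aa
print(sorted(words, key=per_char))   # ['apple', 'Apple', 'banana', 'Banana']
print(sorted(words, key=per_word))   # ['apple', 'Apple', 'banana', 'Banana']
```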

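Where utilized, the comparison operators are reset to those of object. Because these classes subclass str, they would otherwise inherit str’s lexicographic operators, and functools.total_ordering only fills in operators that are still object’s defaults, so it would leave str’s versions in place.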
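To see why the inherited str operators need resetting, here is a minimal sketch with two hypothetical length-ordered str subclasses: without the reset, > still uses str’s lexicographic comparison; with it, total_ordering derives > from __lt__.

```python
import functools

@functools.total_ordering
class LenStrNaive(str):
    # Only __lt__ is defined; __le__, __gt__, __ge__ are inherited from str
    def __lt__(self, other):
        return len(self) < len(other)

@functools.total_ordering
class LenStrFixed(str):
    def __lt__(self, other):
        return len(self) < len(other)
    # Reset the inherited operators so total_ordering derives them from __lt__
    __le__ = object.__le__
    __gt__ = object.__gt__
    __ge__ = object.__ge__

naive = LenStrNaive('b') > LenStrNaive('aa')
fixed = LenStrFixed('b') > LenStrFixed('aa')
print(naive)  # True: str's lexicographic __gt__ ('b' > 'aa')
print(fixed)  # False: derived from __lt__, so lengths are compared
```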

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__

import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__

import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__

>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

Natural sorting is both pretty complicated and vaguely defined as a problem. Don’t forget to run unicodedata.normalize(...) beforehand, and consider using str.casefold() rather than str.lower(). There are probably subtle encoding issues I haven’t considered, so I tentatively recommend the natsort library. I took a quick glance at the GitHub repository; the code maintenance has been stellar.
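A small sketch of the preparation step suggested above, assuming NFC normalization plus casefold() is what you want before building a sort key (the helper name prepare is ours):

```python
import unicodedata

def prepare(s):
    # Compose characters (e + combining acute -> é), then fold case
    return unicodedata.normalize('NFC', s).casefold()

decomposed = 'Cafe\u0301'  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = 'caf\u00e9'     # precomposed 'é'

print(decomposed == composed)                    # False
print(prepare(decomposed) == prepare(composed))  # True
print('Straße'.lower(), 'Straße'.casefold())     # straße strasse
```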

All the algorithms I’ve seen depend on tricks such as duplicating and lowering characters, and swapping case. While this doubles the running time, an alternative would require a total natural ordering on the input character set. I don’t think this is part of the Unicode specification, and since there are many more Unicode digits than [0-9], creating such an ordering would be equally daunting. If you want locale-aware comparisons, prepare your strings with locale.strxfrm per Python’s Sorting HOW TO.

Question 14

Let me submit my own take on this need:

from typing import Tuple, Union, Optional, Generator


StrOrInt = Union[str, int]


# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
    last_was_digit: Optional[bool] = None
    cluster: str = ""
    for c in s:
        if last_was_digit is None:
            last_was_digit = c.isdigit()
            cluster += c
            continue
        if c.isdigit() != last_was_digit:
            if last_was_digit:
                yield int(cluster)
            else:
                yield cluster
            last_was_digit = c.isdigit()
            cluster = ""
        cluster += c
    if last_was_digit:
        yield int(cluster)
    else:
        yield cluster
    return


def grouper(s: str) -> Tuple[StrOrInt, ...]:
    return tuple(griter(s))

Now, given a list like this:

filelist = [
    'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
    'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]

We can simply use the key= kwarg to do a natural sort:

>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8', 
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']

The drawback, as written, is of course that the function is case-sensitive: uppercase letters sort before lowercase letters.

I’ll leave the implementation of a case-insensitive grouper to the reader :-)
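For the case-insensitive grouper, here is one possible sketch. It is not the answer’s code: it swaps in itertools.groupby for the manual loop and folds the string clusters with casefold(); the name grouper_ci is hypothetical.

```python
from itertools import groupby
from typing import Tuple, Union

StrOrInt = Union[str, int]

def grouper_ci(s: str) -> Tuple[StrOrInt, ...]:
    parts = []
    # Consecutive digits form one cluster; everything else forms another
    for is_digit, chars in groupby(s, key=str.isdigit):
        cluster = "".join(chars)
        parts.append(int(cluster) if is_digit else cluster.casefold())
    return tuple(parts)

files = ['file10', 'File2', 'file1', 'FILE3a']
print(sorted(files, key=grouper_ci))  # ['file1', 'File2', 'FILE3a', 'file10']
```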

Question 15

I suggest you simply use the key keyword argument of sorted to achieve your desired list.
For example:

to_order = ['e2', 'E1', 'e5', 'E4', 'e3']
ordered = sorted(to_order, key=lambda x: x.lower())
# ordered == ['E1', 'e2', 'e3', 'E4', 'e5']
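Note that lower() alone only gives the natural order while every number has a single digit. A quick check with made-up values:

```python
to_order = ['e2', 'E1', 'e10', 'E9']
# Lowercasing fixes case, but digits are still compared as characters
print(sorted(to_order, key=str.lower))  # ['E1', 'e10', 'e2', 'E9']
```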

Question 16

Following @Mark Byers’ answer, here is an adaptation that accepts a key parameter and is more PEP 8-compliant.

import re

def natsorted(seq, key=None):
    def convert(text):
        return int(text) if text.isdigit() else text

    def alphanum(obj):
        if key is not None:
            return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
        return [convert(c) for c in re.split(r'([0-9]+)', obj)]

    return sorted(seq, key=alphanum)

I also made a Gist

Question 17

An improvement on Claudiu’s improvement on Mark Byers’ answer ;-)

import re

def natural_sort_key(s, _re=re.compile(r'(\d+)')):
    return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

...
my_naturally_sorted_list = sorted(my_list, key=natural_sort_key)

BTW, maybe not everyone remembers that default argument values are evaluated once, at def time, so the regex is compiled only once.
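Two details make this version work, sketched below with a hypothetical make_pattern helper that records how often it runs: re.split with one capturing group puts the captured digit runs at odd indices (hence i & 1), and the default value is built only once, when def executes.

```python
import re

calls = []

def make_pattern():
    # Hypothetical helper: records each time the default is built
    calls.append(1)
    return re.compile(r'(\d+)')

def natural_sort_key(s, _re=make_pattern()):
    return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

print(re.split(r'(\d+)', 'a1b22'))  # ['a', '1', 'b', '22', '']: digits at odd indices
print(sorted(['x10', 'X9', 'x1'], key=natural_sort_key))  # ['x1', 'X9', 'x10']
print(len(calls))  # 1: the pattern was compiled once, at def time
```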

Question 18

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):  # bubble sort method
    length = len(bad_list) - 1
    is_sorted = False

    while not is_sorted:
        is_sorted = True
        for i in range(length):
            if bad_list[i] > bad_list[i + 1]:
                is_sorted = False
                bad_list[i], bad_list[i + 1] = bad_list[i + 1], bad_list[i]  # sort the integer list
                a[i], a[i + 1] = a[i + 1], a[i]  # sort the main list based on the integer list index value

for a_string in a:  # extract the number in the string character by character
    for letter in a_string:
        if letter.isdigit():
            b += letter
    c.append(b)
    b = ''

print('Before sorting....')
print(a)
c = list(map(int, c))  # convert the string list into a number list
print(c)
bubble(c)

print('After sorting....')
print(c)
print(a)

Acknowledgments:

Bubble Sort Homework

How to read a string one letter at a time in python
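For comparison, the same result can be had without a hand-written bubble sort by passing the extracted number as a key to sorted(). A sketch that, like the code above, assumes every string contains at least one digit (the helper name digits_as_int is ours):

```python
a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']

def digits_as_int(s):
    # Concatenate every digit in the string, as the loop above does
    return int(''.join(ch for ch in s if ch.isdigit()))

print(sorted(a, key=digits_as_int))
# ['H1', 'H2', 'H3', 'H5', 'H6', 'H8', 'H10', 'H11', 'H50', 'H99', 'H100']
```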

Question 19

>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Question 20

The string.replace() is deprecated in Python 3.x. What is the new way of doing this?

Question 21

As in 2.x, use str.replace().

Example:

>>> 'Hello world'.replace('world', 'Guido')
'Hello Guido'

Question 22

replace() is a method of <class 'str'> in python3:

>>> 'hello, world'.replace(',', ':')
'hello: world'

Question 23

The replace() method in Python 3 is used simply like this:

a = "This is the island of istanbul"
# The third argument, 3, is the maximum number of replacements to make
print(a.replace("is", "was", 3))

# Output: Thwas was the wasland of istanbul

# The last 'is' (in "istanbul") is not replaced by "was" because the maximum of 3 has already been reached

Question 24

You can use str.replace() as a chain of str.replace() calls. Suppose you have a string like 'Testing PRI/Sec (#434242332;PP:432:133423846,335)' and you want to replace all the '#', ':', ';' and '/' signs with '-'. You can do it either this way (the normal way):

>>> s = 'Testing PRI/Sec (#434242332;PP:432:133423846,335)'
>>> s = s.replace('#', '-')
>>> s = s.replace(':', '-')
>>> s = s.replace(';', '-')
>>> s = s.replace('/', '-')
>>> s
'Testing PRI-Sec (-434242332-PP-432-133423846,335)'

or this way (a chain of str.replace() calls):

>>> s = 'Testing PRI/Sec (#434242332;PP:432:133423846,335)'.replace('#', '-').replace(':', '-').replace(';', '-').replace('/', '-')
>>> s
'Testing PRI-Sec (-434242332-PP-432-133423846,335)'
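When every replacement maps a single character to another, str.translate with a table from str.maketrans does the same job in one pass. A sketch (the variable name s avoids shadowing the built-in str):

```python
s = 'Testing PRI/Sec (#434242332;PP:432:133423846,335)'
# maketrans(x, y): the i-th character of x maps to the i-th character of y
table = str.maketrans('#:;/', '----')
result = s.translate(table)
print(result)  # Testing PRI-Sec (-434242332-PP-432-133423846,335)
```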

Question 25

Try this:

mystring = "This Is A String"
print(mystring.replace("String","Text"))

Question 26

FYI, when appending some characters to an arbitrary, position-fixed word inside a string (e.g. turning an adjective into an adverb by adding the suffix -ly), you can look the word up with split() and pass it to replace(), which keeps the suffix at the end of the line for readability:

>>> s = "The dog is large small"
>>> ss = s.replace(s.split()[3], s.split()[3] + 'ly')
>>> ss
'The dog is largely small'
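Keep in mind that replace() substitutes every occurrence of the substring, not just the word at that position. A sketch of editing only the fourth word, assuming the words are separated by single spaces (split() followed by ' '.join() collapses any other whitespace):

```python
s = "large dogs like large small"
# replace() changes both occurrences of 'large'
print(s.replace(s.split()[3], s.split()[3] + 'ly'))  # largely dogs like largely small

words = s.split()
words[3] += 'ly'  # only the word at index 3 changes
print(' '.join(words))  # large dogs like largely small
```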

Question 27

ss = s.replace(s.split()[1], +s.split()[1] + 'gy')
# The unary plus after the comma raises TypeError (bad operand type for unary +: 'str'). It should be:
ss = s.replace(s.split()[1], s.split()[1] + 'gy')

Question 28

>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564

This also works for tuples with multiple elements; both versions seem to grow linearly:

>>> timeit.timeit("'x' in ('x', 'y')")
0.04866674801541748
>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")
0.06565782838087131
>>> timeit.timeit("'x' in ('y', 'x')")
0.08975995576448526
>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")
0.12992391047427532

Based on this, I think I should totally start using in everywhere instead of ==!

Question 29

As I mentioned to David Wolever, there’s more to this than meets the eye; both methods dispatch to an identity check (is). You can prove this by doing

from timeit import Timer

min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

The first can only be so fast because it checks by identity.

To find out why one would take longer than the other, let’s trace through execution.

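They both start in ceval.c, at COMPARE_OP, since that is the bytecode involved in the CPython version traced here (CPython 3.9 and later compile in to a separate CONTAINS_OP instruction instead):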

TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}

This pops the values from the stack (technically it only pops one)

PyObject *right = POP();
PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

This is where the paths split. The PyCmp_IN branch does

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken, and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be called.

This does

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

…Wait, wasn’t that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

That code path was short so it likely just comes down to the speed of these two. Let’s compare.

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}

The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall pair is not used in the previous path, but these are relatively quick macros that’ll short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

This does some quick checks to call v->ob_type->tp_richcompare which is

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

which does

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

Namely, this shortcuts on left == right… but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)

All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

POP()                           # Stack stuff
TOP()                           #
                                #
case PyCmp_IN:                  # Dispatch on operation
                                #
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
                                #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
                                #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

vs

POP()                           # Stack stuff
TOP()                           #
                                #
default:                        # Dispatch on operation
                                #
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
                                #
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
                                #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right))        #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
                                #
res != Py_NotImplemented        #
                                #
Py_LeaveRecursiveCall()         # Recursive check
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top path is shorter: it makes fewer function calls, has only one switch statement, and is just a bit thinner overall.

TL;DR:

Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.

Question 30

There are three factors at play here which, combined, produce this surprising behavior.

First: the in operator takes a shortcut and checks identity (x is y) before it checks equality (x == y):

>>> n = float('nan')
>>> n in (n, )
True
>>> n == n
False
>>> n is n
True

Second: because of Python’s string interning, both "x"s in "x" in ("x", ) will be identical:

>>> "x" is "x"
True

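(Big warning: this is implementation-specific behavior! is should never be used to compare strings, because it will sometimes give surprising answers; for example, "x" * 100 is "x" * 100 gave False on the interpreter used for this answer, although newer CPython versions may fold both sides into a single constant and give True.)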
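If you do want identical string objects on purpose, sys.intern() returns one canonical object for equal strings. A minimal sketch; the strings are built at runtime so the compiler cannot merge them, and whether a is b holds for the un-interned strings is deliberately not relied on, because that is exactly the implementation detail warned about above.

```python
import sys

# Built at runtime, so these are not compile-time constants.
a = ''.join(['x'] * 100)
b = ''.join(['x'] * 100)
print(a == b)  # True: == compares contents, which is always reliable

# sys.intern() returns the same object for equal strings.
ia, ib = sys.intern(a), sys.intern(b)
print(ia is ib)  # True
print(ia == a)   # True: interning never changes the value
```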

Third: as detailed in Veedrac’s fantastic answer, tuple.__contains__ (x in (y, ) is roughly equivalent to (y, ).__contains__(x)) gets to the point of performing the identity check faster than str.__eq__ (again, x == y is roughly equivalent to x.__eq__(y)) does.

You can see evidence for this because x in (y, ) is significantly slower than the logically equivalent x == y:

In [18]: %timeit 'x' in ('x', )
10000000 loops, best of 3: 65.2 ns per loop

In [19]: %timeit 'x' == 'x'    
10000000 loops, best of 3: 68 ns per loop

In [20]: %timeit 'x' in ('y', ) 
10000000 loops, best of 3: 73.4 ns per loop

In [21]: %timeit 'x' == 'y'    
10000000 loops, best of 3: 56.2 ns per loop

The x in (y, ) case is slower because, after the is comparison fails, the in operator falls back to normal equality checking (i.e., using ==), so the comparison takes about the same amount of time as ==, rendering the entire operation slower because of the overhead of creating the tuple, walking its members, etc.

Note also that a in (b, ) is only faster when a is b:

In [48]: a = 1             

In [49]: b = 2

In [50]: %timeit a is a or a == a
10000000 loops, best of 3: 95.1 ns per loop

In [51]: %timeit a in (a, )      
10000000 loops, best of 3: 140 ns per loop

In [52]: %timeit a is b or a == b
10000000 loops, best of 3: 177 ns per loop

In [53]: %timeit a in (b, )      
10000000 loops, best of 3: 169 ns per loop

(Why is a in (b, ) faster than a is b or a == b? My guess would be fewer virtual machine instructions: a in (b, ) is only about 3 instructions, whereas a is b or a == b takes quite a few more.)
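The instruction-count guess can be checked with the dis module. A sketch: `count_instructions` is a hypothetical helper, and the exact opcodes and counts vary between CPython versions (3.9+ uses CONTAINS_OP and IS_OP, 3.11+ adds a RESUME instruction), but the in form should stay the shorter of the two.

```python
import dis


def count_instructions(source):
    """Count the bytecode instructions compiled for an expression."""
    code = compile(source, '<expr>', 'eval')  # compiled only, never executed
    return sum(1 for _ in dis.get_instructions(code))


n_in = count_instructions('a in (b, )')
n_or = count_instructions('a is b or a == b')
print(n_in, n_or)  # exact numbers depend on the Python version

# Show the actual instructions for the in form.
dis.dis(compile('a in (b, )', '<expr>', 'eval'))
```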

Veedrac’s answer (https://stackoverflow.com/a/28889838/71522) goes into much more detail on what specifically happens during each of == and in, and is well worth the read.

Question 31

Recently I started using Python 3, and its lack of xrange hurts.

Simple example:

1) Python2:

from time import time as t
def count():
  st = t()
  [x for x in xrange(10000000) if x%4 == 0]
  et = t()
  print et-st
count()

2) Python3:

from time import time as t

def xrange(x):
    return iter(range(x))

def count():
    st = t()
    [x for x in xrange(10000000) if x%4 == 0]
    et = t()
    print(et - st)

count()

The results are, respectively:

1) 1.53888392448
2) 3.215819835662842

Why is that? I mean, why has xrange been removed? It’s such a great tool to learn with, for beginners like myself, as we all were at some point. Why remove it? Can somebody point me to the proper PEP? I can’t find it.

Cheers.

Question 32

Some performance measurements, using timeit instead of trying to do it manually with time.

First, Apple 2.7.2 64-bit:

In [37]: %timeit collections.deque((x for x in xrange(10000000) if x%4 == 0), maxlen=0)
1 loops, best of 3: 1.05 s per loop

Now, python.org 3.3.0 64-bit:

In [83]: %timeit collections.deque((x for x in range(10000000) if x%4 == 0), maxlen=0)
1 loops, best of 3: 1.32 s per loop

In [84]: %timeit collections.deque((x for x in xrange(10000000) if x%4 == 0), maxlen=0)
1 loops, best of 3: 1.31 s per loop

In [85]: %timeit collections.deque((x for x in iter(range(10000000)) if x%4 == 0), maxlen=0) 
1 loops, best of 3: 1.33 s per loop

Apparently, 3.x range really is a bit slower than 2.x xrange. And the OP’s xrange function has nothing to do with it. (Not surprising, as a one-time call to the __iter__ slot isn’t likely to be visible among 10000000 calls to whatever happens in the loop, but someone brought it up as a possibility.)
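A script version of the deque measurements above, as a sketch: it assumes Python 3.6+ (for the f-string), uses a smaller N so it finishes quickly, and `exhaust` is a hypothetical helper name. Absolute times depend on the machine and interpreter version, so only compare numbers from the same run.

```python
import collections
import timeit

N = 1_000_000  # smaller than the 10,000,000 above, to keep the run short


def exhaust(iterable):
    """Consume an iterable at C speed, discarding every item (no list is built)."""
    collections.deque(iterable, maxlen=0)


candidates = {
    'range': lambda: exhaust(x for x in range(N) if x % 4 == 0),
    'iter(range)': lambda: exhaust(x for x in iter(range(N)) if x % 4 == 0),
}

# timeit.repeat accepts a callable; take the best of 3 single runs.
results = {name: min(timeit.repeat(stmt, number=1, repeat=3))
           for name, stmt in candidates.items()}
for name, seconds in results.items():
    print(f'{name:12} {seconds:.3f} s')
```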

But it’s only 30% slower. How did the OP get 2x as slow? Well, if I repeat the same tests with 32-bit Python, I get 1.58 vs. 3.12. So my guess is that this is yet another of those cases where 3.x has been optimized for 64-bit performance in ways that hurt 32-bit.

But does it really matter? Check this out, with 3.3.0 64-bit again:

In [86]: %timeit [x for x in range(10000000) if x%4 == 0]
1 loops, best of 3: 3.65 s per loop

So building the list takes more than twice as long as the entire iteration.

And as for “consumes much more resources than Python 2.6+”, from my tests, it looks like a 3.x range is exactly the same size as a 2.x xrange, and even if it were 10x as big, building the unnecessary list is still about 10000000x more of a problem than anything the range iteration could possibly do.

And what about an explicit for loop instead of the C loop inside deque?

In [87]: def consume(x):
   ....:     for i in x:
   ....:         pass
In [88]: %timeit consume(x for x in range(10000000) if x%4 == 0)
1 loops, best of 3: 1.85 s per loop

So almost as much time is wasted in the for statement as in the actual work of iterating the range.

If you’re worried about optimizing the iteration of a range object, you’re probably looking in the wrong place.

Meanwhile, you keep asking why xrange was removed, no matter how many times people tell you the same thing, but I’ll repeat it again: It was not removed: it was renamed to range, and the 2.x range is what was removed.

Here’s some proof that the 3.3 range object is a direct descendant of the 2.x xrange object (and not of the 2.x range function): the source to 3.3 range and 2.7 xrange. You can even see the change history (linked to, I believe, the change that replaced the last instance of the string “xrange” anywhere in the file).

So, why is it slower?

Well, for one, they’ve added a lot of new features. For another, they’ve made all kinds of changes all over the place (especially inside iteration) that have minor side effects. And there has been a lot of work to dramatically optimize various important cases, even if it sometimes slightly pessimizes less important cases. Add this all up, and I’m not surprised that iterating a range as fast as possible is now a bit slower. It’s one of those less-important cases that nobody would ever care enough to focus on. No one is likely to ever have a real-life use case where this performance difference is the hotspot in their code.

Question 33

Python3’s range is Python2’s xrange. There’s no need to wrap an iter around it. To get an actual list in Python3, you need to use list(range(...)).

If you want something that works with Python2 and Python3, try this

try:
    xrange
except NameError:
    xrange = range

Question 34

Python 3’s range type works just like Python 2’s xrange. I’m not sure why you’re seeing a slowdown, since the iterator returned by your xrange function is exactly what you’d get if you iterated over range directly.

I’m not able to reproduce the slowdown on my system. Here’s how I tested:

Python 2, with xrange:

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import timeit
>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=100)
18.631936646865853

Python 3, with range, is a tiny bit faster:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import timeit
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)
17.31399508687869

I recently learned that Python 3’s range type has some other neat features, such as support for slicing: range(10,100,2)[5:25:5] is range(20, 60, 10)!
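A few of those range features in action. A sketch assuming Python 3.3+ (slicing a range and comparing ranges with == are not available on Python 2’s xrange); the constant-time membership test for ints is a CPython optimization.

```python
r = range(10, 100, 2)

print(r[5:25:5])                       # range(20, 60, 10): slicing returns a new range
print(r[5:25:5] == range(20, 60, 10))  # True: ranges compare as sequences
print(len(r), r[3])                    # 45 16: length and indexing without a list
print(46 in r, 47 in r)                # True False: membership computed arithmetically
print(list(range(5)))                  # [0, 1, 2, 3, 4]: materialize only when needed
```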

Question 35

One way to fix up your python2 code is:

import sys

if sys.version_info >= (3, 0):
    def xrange(*args, **kwargs):
        return iter(range(*args, **kwargs))

Question 36

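In Python 2, xrange returns a lazy xrange object (an iterable sequence that computes its values on demand, not a generator), while range is a function that builds a full list. In Python 3, xrange was not really dropped: its behavior lives on under the name range, and it is the list-building range that was removed.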

Question 37

comp:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2

>>> import timeit
>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=100)
5.656799077987671
>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=100)
5.579368829727173
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)
21.54827117919922
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)
22.014557123184204

With timeit’s number=1 parameter:

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=1)
0.2245171070098877
>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=1)
0.10750913619995117

comp:~$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux

>>> import timeit
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)
9.113872020003328
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)
9.07014398300089

With number=1, 2, 3 and 4, it runs quickly and the time grows linearly:

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=1)
0.09329321900440846
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=2)
0.18501482300052885
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=3)
0.2703447980020428
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=4)
0.36209142999723554

So it seems that if we measure a single run, like timeit.timeit("[x for x in range(1000000) if x%4]", number=1) (as we actually use it in real code), Python 3 is quick enough, but over repeated loops Python 2’s xrange() wins in speed against Python 3’s range().

Question 38

If I have a class…

class MyClass:

    def method(arg):
        print(arg)

…which I use to create an object…

my_object = MyClass()

…on which I call method("foo") like so…

>>> my_object.method("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: method() takes exactly 1 positional argument (2 given)

…why does Python tell me I gave it two arguments, when I only gave one?

Question 39

In Python, this:

my_object.method("foo")

…is syntactic sugar, which the interpreter translates behind the scenes into:

MyClass.method(my_object, "foo")

…which, as you can see, does indeed have two arguments – it’s just that the first one is implicit, from the point of view of the caller.
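You can check this translation yourself. A sketch; `Greeter` and `NoSelf` are hypothetical classes made up for the illustration, and the exact TypeError wording differs between Python versions.

```python
class Greeter:
    """Hypothetical class used only to illustrate method binding."""

    def method(self, arg):
        return (self, arg)


obj = Greeter()

# Calling through the instance equals calling through the class
# with the instance passed explicitly as the first argument.
print(obj.method("foo") == Greeter.method(obj, "foo"))  # True

# The bound method remembers the instance it was looked up on.
print(obj.method.__self__ is obj)  # True


class NoSelf:
    def method(arg):  # forgot self: arg would receive the instance
        return arg


error_message = None
try:
    NoSelf().method("foo")  # instance + "foo" = two arguments for one slot
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # wording varies by version
```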

This is because most methods do some work with the object they’re called on, so there needs to be some way for that object to be referred to inside the method. By convention, this first argument is called self inside the method definition:

class MyNewClass:

    def method(self, arg):
        print(self)
        print(arg)

If you call method("foo") on an instance of MyNewClass, it works as expected:

>>> my_new_object = MyNewClass()
>>> my_new_object.method("foo")
<__main__.MyNewClass object at 0x29045d0>
foo

Occasionally (but not often), you really don’t care about the object that your method is bound to, and in that circumstance, you can decorate the method with the builtin staticmethod() function to say so:

class MyOtherClass:

    @staticmethod
    def method(arg):
        print(arg)

…in which case you don’t need to add a self argument to the method definition, and it still works:

>>> my_other_object = MyOtherClass()
>>> my_other_object.method("foo")
foo

Question 40

Something else to consider when this type of error is encountered:

I was running into this error message and found this post helpful. Turns out in my case I had overridden an __init__() where there was object inheritance.

The inherited example is rather long, so I’ll skip to a simpler example that doesn’t use inheritance:

class MyBadInitClass:
    def ___init__(self, name):
        self.name = name

    def name_foo(self, arg):
        print(self)
        print(arg)
        print("My name is", self.name)


class MyNewClass:
    def new_foo(self, arg):
        print(self)
        print(arg)


my_new_object = MyNewClass()
my_new_object.new_foo("NewFoo")
my_bad_init_object = MyBadInitClass(name="Test Name")
my_bad_init_object.name_foo("name foo")

Result is:

<__main__.MyNewClass object at 0x033C48D0>
NewFoo
Traceback (most recent call last):
  File "C:/Users/Orange/PycharmProjects/Chapter9/bad_init_example.py", line 41, in <module>
    my_bad_init_object = MyBadInitClass(name="Test Name")
TypeError: object() takes no parameters

PyCharm didn’t catch this typo. Nor did Notepad++ (other editors/IDEs might).

Granted, this is a “takes no parameters” TypeError rather than “got two” when expecting one, but in terms of object initialization in Python the cause is much the same.

Addressing the topic: a custom initializer is only used if it is actually named __init__. A misspelled name such as ___init__ is just an ordinary method, so Python falls back to the initializer inherited from object, which accepts no arguments, and the error is thrown.

In the case of the typo, the fix is simple: just correct the name of the custom initializer:

def __init__(self, name):
    self.name = name
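To catch this kind of typo, you can check whether a class defines __init__ itself. A minimal sketch; `defines_own_init` and `MyGoodInitClass` are hypothetical names, and vars(cls) only lists attributes defined directly on the class, not inherited ones.

```python
def defines_own_init(cls):
    """Return True if `cls` itself (not a base class) defines __init__."""
    return '__init__' in vars(cls)


class MyBadInitClass:
    def ___init__(self, name):  # three leading underscores: an ordinary method
        self.name = name


class MyGoodInitClass:
    def __init__(self, name):
        self.name = name


print(defines_own_init(MyBadInitClass))   # False: the misspelled name doesn't count
print(defines_own_init(MyGoodInitClass))  # True
print(MyGoodInitClass(name="Test Name").name)  # Test Name
```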

Question 41

In simple words.

In Python you should add self as the first parameter of every regular (instance) method defined in a class:

class MyClass:
  def method(self, arg):
    print(arg)

Then you can use your method according to your intuition:

>>> my_object = MyClass()
>>> my_object.method("foo")
foo

This should solve your problem :)

For a better understanding, you can also read the answers to this question: What is the purpose of self?

Question 42

As a newcomer to Python, I had this issue when I used Python’s ** feature the wrong way. I was trying to call this definition from somewhere:

def create_properties_frame(self, parent, **kwargs):

using a call without a double star was causing the problem:

self.create_properties_frame(frame, kw_gsp)

TypeError: create_properties_frame() takes 2 positional arguments but 3 were given

The solution is to add ** to the argument:

self.create_properties_frame(frame, **kw_gsp)
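A self-contained illustration of the difference. The function below is a hypothetical stand-in for the method above (without self), and the contents of kw_gsp are made up.

```python
def create_properties_frame(parent, **kwargs):
    """Hypothetical stand-in: accepts one positional arg plus any keywords."""
    return parent, kwargs


kw_gsp = {'width': 200, 'height': 100}  # hypothetical options

# With **, each dict item becomes a keyword argument.
good = create_properties_frame('frame', **kw_gsp)
print(good)  # ('frame', {'width': 200, 'height': 100})

# Without **, the dict is passed as an extra positional argument,
# and the signature has no positional slot for it.
error_message = None
try:
    create_properties_frame('frame', kw_gsp)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```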

Question 43

It occurs when the number of arguments you pass doesn’t match the parameters that __init__() (or any other method) declares.

For example:

class Dog:
    def __init__(self):
        print("IN INIT METHOD")

    def __unicode__(self):  # only used by Python 2
        return u"IN UNICODE METHOD"

    def __str__(self):  # must return a string, not print it
        return "IN STR METHOD"

obj=Dog("JIMMY",1,2,3,"WOOF")

When you run the above program, it gives an error like this:

TypeError: __init__() takes 1 positional argument but 6 were given

How can we get rid of this?

Just declare the parameters you pass, so that __init__() accepts them:

class Dog:
    def __init__(self, dogname, dob_d, dob_m, dob_y, dogSpeakText):
        self.name_of_dog = dogname
        self.date_of_birth = dob_d
        self.month_of_birth = dob_m
        self.year_of_birth = dob_y
        self.sound_it_make = dogSpeakText

    def __unicode__(self):  # only used by Python 2
        return u"IN UNICODE METHOD"

    def __str__(self):  # must return a string, not print it
        return "IN STR METHOD"


obj = Dog("JIMMY", 1, 2, 3, "WOOF")
print(id(obj))

Question 44

You should actually create a class:

class accum:
    def __init__(self):
        self.acc = 0

    def accumulator(self, var2add, end):
        if not end:
            self.acc += var2add
        return self.acc  # inside the method; at class level this is a SyntaxError

Question 45

In my case, I forgot to add the ().

I was calling the method like this:

obj = className.myMethod

But it should be like this:

obj = className.myMethod()

Question 46

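Add a cls parameter to a @classmethod to resolve this problem:

class MyClass:  # hypothetical enclosing class
    @classmethod
    def test(cls):  # cls receives the class, much as self receives the instance
        return ''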

Question 47

Why are x and y strings instead of ints in the below code?

(Note: in Python 2.x use raw_input(). In Python 3.x use input(). raw_input() was renamed to input() in Python 3.x)

play = True

while play:

    x = input("Enter a number: ")
    y = input("Enter a number: ")

    print(x + y)
    print(x - y)
    print(x * y)
    print(x / y)
    print(x % y)

    if input("Play again? ") == "no":
        play = False

Question 48

TLDR

Python 3 doesn’t evaluate the data received with input function, but Python 2’s input function does (read the next section to understand the implication).
Python 2’s equivalent of Python 3’s input is the raw_input function.

Python 2.x

There were two functions to get user input, called input and raw_input. The difference between them is that raw_input doesn’t evaluate the data and returns it as is, in string form, while input evaluates whatever you entered and returns the result of that evaluation. For example,

>>> import sys
>>> sys.version
'2.7.6 (default, Mar 22 2014, 22:59:56) \n[GCC 4.8.2]'
>>> data = input("Enter a number: ")
Enter a number: 5 + 17
>>> data, type(data)
(22, <type 'int'>)

The data 5 + 17 is evaluated and the result is 22. When it evaluates the expression 5 + 17, it detects that you are adding two numbers and so the result will also be of the same int type. So, the type conversion is done for free and 22 is returned as the result of input and stored in data variable. You can think of input as the raw_input composed with an eval call.

>>> data = eval(raw_input("Enter a number: "))
Enter a number: 5 + 17
>>> data, type(data)
(22, <type 'int'>)

Note: you should be careful when you are using input in Python 2.x. I explained why one should be careful when using it, in this answer.

But raw_input doesn’t evaluate the input; it returns it as is, as a string.

>>> import sys
>>> sys.version
'2.7.6 (default, Mar 22 2014, 22:59:56) \n[GCC 4.8.2]'
>>> data = raw_input("Enter a number: ")
Enter a number: 5 + 17
>>> data, type(data)
('5 + 17', <type 'str'>)

Python 3.x

Python 3.x’s input behaves like Python 2.x’s raw_input, and raw_input is not available in Python 3.x.

>>> import sys
>>> sys.version
'3.4.0 (default, Apr 11 2014, 13:05:11) \n[GCC 4.8.2]'
>>> data = input("Enter a number: ")
Enter a number: 5 + 17
>>> data, type(data)
('5 + 17', <class 'str'>)

Solution

To answer your question, since Python 3.x doesn’t evaluate and convert the data type, you have to explicitly convert to ints, with int, like this

x = int(input("Enter a number: "))
y = int(input("Enter a number: "))

You can accept numbers in any base from 2 to 36 and convert them directly to base-10 with the int function, like this

>>> data = int(input("Enter a number: "), 8)
Enter a number: 777
>>> data
511
>>> data = int(input("Enter a number: "), 16)
Enter a number: FFFF
>>> data
65535
>>> data = int(input("Enter a number: "), 2)
Enter a number: 10101010101
>>> data
1365

The second argument tells int which base the entered number is written in, and int converts it accordingly. If the entered data is not valid in that base, it raises a ValueError.

>>> data = int(input("Enter a number: "), 2)
Enter a number: 1234
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: invalid literal for int() with base 2: '1234'
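Since a wrong entry raises ValueError, an interactive program usually catches it and asks again. A minimal sketch; `read_int` and its `input_func` parameter are hypothetical names, and the parameter only exists so the loop can be exercised without typing.

```python
def read_int(prompt, base=10, input_func=input):
    """Keep asking until the user enters a valid integer in `base`."""
    while True:
        text = input_func(prompt)
        try:
            return int(text, base)
        except ValueError:
            print(f"{text!r} is not a valid base-{base} number, try again.")


# Simulated session: "1234" is rejected in base 2, "1011" is accepted.
answers = iter(["1234", "1011"])
value = read_int("Enter a number: ", base=2, input_func=lambda prompt: next(answers))
print(value)  # 11
```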

For values that can have a fractional component, the type would be float rather than int:

x = float(input("Enter a number:"))

Apart from that, your program can be changed a little bit, like this

while True:
    ...
    ...
    if input("Play again? ") == "no":
        break

You can get rid of the play variable by using break and while True.

Question 49

In Python 3.x, raw_input was renamed to input and the Python 2.x input was removed.

This means that, just like raw_input, input in Python 3.x always returns a string object.

To fix the problem, you need to explicitly make those inputs into integers by putting them in int:

x = int(input("Enter a number: "))
y = int(input("Enter a number: "))

Question 50

For multiple integers on a single line (Python 2), map might be better.

arr = map(int, raw_input().split())

If the number is already known, (like 2 integers), you can use

num1, num2 = map(int, raw_input().split())
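In Python 3, raw_input is called input, and map returns a lazy iterator rather than a list, so wrap it in list() if you need to index or reuse the result. A sketch; `parse_ints` is a hypothetical helper, and fixed strings stand in for input() so the example runs unattended.

```python
def parse_ints(line):
    """Split a line on whitespace and convert every field to int."""
    return list(map(int, line.split()))


arr = parse_ints("3 14 15")  # in real code: parse_ints(input())
print(arr)  # [3, 14, 15]

# Tuple unpacking consumes the map iterator directly, no list() needed.
num1, num2 = map(int, "12 34".split())
print(num1 + num2)  # 46
```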

Question 51

input() (Python 3) and raw_input() (Python 2) always return strings. Convert the result to integer explicitly with int().

x = int(input("Enter a number: "))
y = int(input("Enter a number: "))

Question 52

Many problems require reading several integers from a single line. The best way is to read the whole line of numbers and then split it into integers. Here is a Python 3 version:

a = []
p = input()
p = p.split()      
for i in p:
    a.append(int(i))

A list comprehension can also be used. First split the line:

p = input().split()  # or .split(sep), with whatever the separator is

And to convert all the inputs from string to int we do the following:

x = [int(i) for i in p]
print(*x)

which prints the list elements on one line, separated by spaces.

Question 53

Convert to integers:

my_number = int(input("enter the number"))

Similarly for floating point numbers:

my_decimalnumber = float(input("enter the number"))

Question 54

n = int(input())          # number of test cases
for i in range(n):
    size = int(input())   # length of this test case's array
    arr1 = list(map(int, input().split()))

The for loop runs n times. The second input is the length of the array. The last statement reads space-separated values and maps them to a list of integers. You can also return the array at the end of the for loop.

Question 55

I encountered a problem of taking integer input while solving a problem on CodeChef, where two integers – separated by space – should be read from one line.

While int(input()) is sufficient for a single integer, I did not find a direct way to input two integers. I tried this:

num = input()
num1 = 0
num2 = 0

for i in range(len(num)):
    if num[i] == ' ':
        break

num1 = int(num[:i])
num2 = int(num[i+1:])

Now I use num1 and num2 as integers. Hope this helps.

Question 56

def dbz():
    try:
        r = raw_input("Enter number:")
        if r.isdigit():
            i = int(raw_input("Enter divisor:"))
            d = int(r)/i
            print "O/p is -:",d
        else:
            print "Not a number"
    except Exception ,e:
        print "Program halted incorrect data entered",type(e)
dbz()

Or 

num = input("Enter Number:")  # Python 2 only: input evaluates what is typed (any expression, not just numbers), so avoid it for untrusted input

Question 57

While in your example, int(input(...)) does the trick in any case, python-future‘s builtins.input is worth consideration since that makes sure your code works for both Python 2 and 3 and disables Python2’s default behaviour of input trying to be “clever” about the input data type (builtins.input basically just behaves like raw_input).

Question 58

This is just a snippet of my code:

print("Total score for %s is %s  ", name, score)

But I want it to print out:

“Total score for (name) is (score)”

where name is a variable in a list and score is an integer. This is Python 3.3 if that helps at all.

Question 59

There are many ways to do this. To fix your current code using %-formatting, you need to pass in a tuple:

Pass it as a tuple:

print("Total score for %s is %s" % (name, score))

A tuple with a single element looks like ('this',).

Here are some other common ways of doing it:

Pass it as a dictionary:

print("Total score for %(n)s is %(s)s" % {'n': name, 's': score})

There’s also new-style string formatting, which might be a little easier to read:

Use new-style string formatting:

print("Total score for {} is {}".format(name, score))

Use new-style string formatting with numbers (useful for reordering or printing the same one multiple times):
```
print("Total score for {0} is {1}".format(name, score))
```

Use new-style string formatting with explicit names:

print("Total score for {n} is {s}".format(n=name, s=score))

Concatenate strings:

print("Total score for " + str(name) + " is " + str(score))

The clearest two, in my opinion:

Just pass the values as parameters:
```
print("Total score for", name, "is", score)
```
If you don’t want spaces to be inserted automatically by print in the above example, change the sep parameter:
```
print("Total score for ", name, " is ", score, sep='')
```
If you’re using Python 2, you won’t be able to use the last two because print isn’t a function in Python 2. You can, however, import this behavior from __future__:
```
from __future__ import print_function
```

Use f-string formatting (Python 3.6+):

print(f'Total score for {name} is {score}')
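f-strings accept arbitrary expressions and the same format specifications as `str.format()`, so rounding, alignment and `repr()` can happen inline. A short sketch with made-up values (`max_score` is hypothetical):

```python
name, score, max_score = "Alice", 42, 50

summary = f"Total score for {name} is {score}"
percent = f"{name} scored {score / max_score:.1%}"  # percentage with 1 decimal
aligned = f"{name:<8}|{score:>5}"                   # left/right alignment in fixed widths
debug = f"{name!r} has {score}"                     # !r applies repr() to the value

print(percent)  # Alice scored 84.0%
print(aligned)  # Alice   |   42
```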

Question 60

There are many ways to print that.

Let’s have a look with another example.

a = 10
b = 20
c = a + b

# Pass several arguments; print() puts a space between them
print("sum of", a, "and", b, "is", c)

# Concatenate, converting each number with str()
print("sum of " + str(a) + " and " + str(b) + " is " + str(c))

# %-formatting with a tuple of values
print("sum of %s and %s is %s" % (a, b, c))

# New-style string formatting
print("sum of {} and {} is {}".format(a, b, c))

# In case you want to use repr()
print("sum of " + repr(a) + " and " + repr(b) + " is " + repr(c))

EDIT:

# f-string formatting, new in Python 3.6:
print(f'sum of {a} and {b} is {c}')
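With integers such as `a`, `b` and `c`, `str()` and `repr()` produce the same text, so the repr() line prints the same result as the others. The difference shows up with strings: `repr()` adds quotes and shows escapes, which is handy for debugging. Each formatting style has a repr() conversion (`%r`, `{!r}`, `{x!r}`); `word` below is just an example value:

```python
word = "line\tbreak"

print("str:  " + str(word))   # the tab is printed as a real tab character
print("repr: " + repr(word))  # quotes plus the \t escape are shown

# Equivalent repr() conversions in the other formatting styles
percent_style = "value is %r" % (word,)
format_style = "value is {!r}".format(word)
fstring_style = f"value is {word!r}"
```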

Question 61

Use .format():

print("Total score for {0} is {1}".format(name, score))

Or:

# Recommended, more readable code
print("Total score for {n} is {s}".format(n=name, s=score))

Or:

print("Total score for " + name + " is " + str(score))

Or:

print("Total score for %s is %d" % (name, score))

Question 62

In Python 3.6+, f-strings are much cleaner.

In earlier versions:

print("Total score for %s is %s." % (name, score))

In Python 3.6:

print(f'Total score for {name} is {score}.')

will do.

It is more readable and, in CPython, usually faster than %-formatting or str.format().
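The efficiency claim is easy to check on your own machine with the standard-library `timeit` module. This is a rough sketch; absolute numbers depend on the Python version and hardware, so no timings are shown here:

```python
import timeit

name, score = "Alice", 42
namespace = {"name": name, "score": score}

styles = {
    "%-formatting": '"Total score for %s is %s." % (name, score)',
    "str.format": '"Total score for {} is {}.".format(name, score)',
    "f-string": 'f"Total score for {name} is {score}."',
}

# Sanity check: all three styles must build the same text.
outputs = {
    "Total score for %s is %s." % (name, score),
    "Total score for {} is {}.".format(name, score),
    f"Total score for {name} is {score}.",
}
assert len(outputs) == 1

for label, stmt in styles.items():
    seconds = timeit.timeit(stmt, globals=namespace, number=100_000)
    print(f"{label:<14}{seconds:.4f} s")
```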

Question 63

Keeping it simple, I personally like string concatenation:

print("Total score for " + name + " is " + score)

It works with both Python 2.7 and 3.x.

NOTE: If score is an int, you must convert it to str; otherwise + raises a TypeError:

print("Total score for " + name + " is " + str(score))
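Without the `str()` call, Python refuses to join a `str` and an `int`. A small sketch of both cases, wrapped in a hypothetical `total_line` helper:

```python
def total_line(name, score):
    # str() is needed because + only joins str with str
    return "Total score for " + name + " is " + str(score)


name, score = "Alice", 42

try:
    broken = "Total score for " + name + " is " + score  # int on the right-hand side
except TypeError as exc:
    broken = None
    print("TypeError:", exc)

print(total_line(name, score))  # Total score for Alice is 42
```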

Question 64

Just try:

print("Total score for", name, "is", score)

Question: Is there a built-in function for natural sorting of strings?

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Answer 6

Answer 7

Value Of This Post

Functions

Answer 8

Answer 9

Answer 10

Answer 11

Answer 12

Answer 13

Answer 14

Answer 15

Answer 16

Answer 17

Question: How do I use string.replace() in Python 3.x?

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Answer 6

Question: Why is 'x' in ('x',) faster than 'x' == 'x'?

Answer 0

TL;DR:

Answer 1

Question: Why is there no xrange function in Python 3?

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Question: TypeError: method() takes 1 positional argument but 2 were given

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Answer 6

Answer 7

Question: How do I read input as numbers?

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Answer 6

Answer 7

Answer 8

Answer 9

Question: How do I use filter, map, and reduce in Python 3?

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Question: Printing multiple arguments in Python

Answer 0

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

Answer 6

Answer 7

Answer 8