Tag Archives: recursion

Nested defaultdict of defaultdict

Question: Nested defaultdict of defaultdict

Is there a way to make a defaultdict also be the default for the defaultdict? (i.e. infinite-level recursive defaultdict?)

I want to be able to do:

x = defaultdict(...stuff...)
x[0][1][0]
{}

So, I can do x = defaultdict(defaultdict), but that’s only a second level:

x[0]
{}
x[0][0]
KeyError: 0

There are recipes that can do this. But can it be done using just the normal defaultdict arguments?

Note this is asking how to do an infinite-level recursive defaultdict, so it’s distinct from Python: defaultdict of defaultdict?, which was about how to do a two-level defaultdict.

I’ll probably just end up using the bunch pattern, but when I realized I didn’t know how to do this, it got me interested.


Answer 0

For an arbitrary number of levels:

def rec_dd():
    return defaultdict(rec_dd)

>>> x = rec_dd()
>>> x['a']['b']['c']['d']
defaultdict(<function rec_dd at 0x7f0dcef81500>, {})
>>> print json.dumps(x)
{"a": {"b": {"c": {"d": {}}}}}

Of course you could also do this with a lambda, but I find lambdas to be less readable. In any case it would look like this:

rec_dd = lambda: defaultdict(rec_dd)

Answer 1

The other answers here tell you how to create a defaultdict which contains “infinitely many” defaultdicts, but they fail to address what I think may have been your initial need, which was to simply have a two-level defaultdict.

You may have been looking for:

defaultdict(lambda: defaultdict(dict))

The reasons why you might prefer this construct are:

  • It is more explicit than the recursive solution, and therefore likely more understandable to the reader.
  • This enables the “leaf” of the defaultdict to be something other than a dictionary, e.g. defaultdict(lambda: defaultdict(list)) or defaultdict(lambda: defaultdict(set)); see the sketch below.
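
For instance, a minimal sketch of the list-leaf variant (the keys here are made up for illustration):

from collections import defaultdict

d = defaultdict(lambda: defaultdict(list))
d['fruits']['red'].append('apple')   # both levels are created on first access
d['fruits']['red'].append('cherry')

print(dict(d['fruits']))  # {'red': ['apple', 'cherry']}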

Answer 2

There is a nifty trick for doing that:

tree = lambda: defaultdict(tree)

Then you can create your x with x = tree().
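
A quick usage sketch (json.dumps is used only to visualize the nesting; the keys are arbitrary):

import json
from collections import defaultdict

tree = lambda: defaultdict(tree)
x = tree()
x['a']['b']['c'] = 7   # intermediate levels spring into existence
print(json.dumps(x))   # {"a": {"b": {"c": 7}}}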


Answer 3

Similar to BrenBarn’s solution, but doesn’t contain the name of the variable tree twice, so it works even after changes to the variable dictionary:

tree = (lambda f: f(f))(lambda a: (lambda: defaultdict(a(a))))

Then you can create each new x with x = tree().


For the def version, we can use function closure scope to protect the data structure from the flaw where existing instances stop working if the tree name is rebound. It looks like this:

from collections import defaultdict

def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
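
To see the flaw this guards against, here is a small demonstration (illustrative only) of what goes wrong with the plain one-liner when the name is rebound:

from collections import defaultdict

tree = lambda: defaultdict(tree)
t = tree()
tree = "something else"   # rebind the name the lambda looks up at call time
t['a']                    # TypeError: first argument must be callable or None

The def version keeps working because the_tree is resolved in the enclosing scope, which nothing outside the factory can rebind.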

Answer 4

I would also propose a more OOP-styled implementation, which supports infinite nesting as well as a properly formatted repr.

class NestedDefaultDict(defaultdict):
    def __init__(self, *args, **kwargs):
        super(NestedDefaultDict, self).__init__(NestedDefaultDict, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))

Usage:

my_dict = NestedDefaultDict()
my_dict['a']['b'] = 1
my_dict['a']['c']['d'] = 2
my_dict['b']

print(my_dict)  # {'a': {'b': 1, 'c': {'d': 2}}, 'b': {}}

Answer 5

Here is a recursive function to convert a recursive defaultdict to a normal dict:

def defdict_to_dict(defdict, finaldict):
    # pass in an empty dict for finaldict
    for k, v in defdict.items():
        if isinstance(v, defaultdict):
            # new level created and that is the new value
            finaldict[k] = defdict_to_dict(v, {})
        else:
            finaldict[k] = v
    return finaldict

defdict_to_dict(my_rec_default_dict, {})
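
A usage sketch, building on the rec_dd recursive defaultdict from answer 0 (the data is made up):

from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

my_rec_default_dict = rec_dd()
my_rec_default_dict['a']['b'] = 1

plain = defdict_to_dict(my_rec_default_dict, {})
print(plain)        # {'a': {'b': 1}}
print(type(plain))  # <class 'dict'>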

Answer 6

I based this off Andrew’s answer here. If you are looking to load data from JSON or an existing dict into the nested defaultdict, see this example:

def nested_defaultdict(existing=None, **kwargs):
    if existing is None:
        existing = {}
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_defaultdict(val) for key, val in existing.items()}
    return defaultdict(nested_defaultdict, existing, **kwargs)

https://gist.github.com/nucklehead/2d29628bb49115f3c30e78c071207775
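
A quick sketch of what this enables (the data is made up):

from collections import defaultdict

existing = {'a': {'b': 1}}
d = nested_defaultdict(existing)

print(d['a']['b'])    # 1 -- loaded from the existing dict
d['x']['y']['z'] = 2  # new branches still auto-create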


Recursive subfolder search and return files in a list (Python)

Question: Recursive subfolder search and return files in a list (Python)

I am working on a script to recursively go through subfolders in a main folder and build a list of a certain file type. I am having an issue with the script. It’s currently set up as follows:

for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,subFolder,item))

The problem is that the subFolder variable is pulling in a list of subfolders rather than the folder that the ITEM file is located in. I was thinking of running a for loop for the subfolders beforehand and joining the first part of the path, but I figured I’d double-check to see if anyone had any suggestions first. Thanks for your help!


Answer 0

You should be using the dirpath, which you call root. The dirnames are supplied so you can prune them if there are folders that you don’t wish os.walk to recurse into (see the sketch after the code below).

import os
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames if os.path.splitext(f)[1] == '.txt']
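
As a sketch of the pruning mentioned above: os.walk yields dirnames before descending, so mutating that list in place keeps the walk from entering the removed folders (the '.git' name is just an example):

import os

result = []
for dp, dn, filenames in os.walk(PATH):
    dn[:] = [d for d in dn if d != '.git']   # prune in place; os.walk skips these
    result.extend(os.path.join(dp, f) for f in filenames
                  if os.path.splitext(f)[1] == '.txt')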

Edit:

After the latest downvote, it occurred to me that glob is a better tool for selecting by extension.

import os
from glob import glob
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

Also a generator version

from itertools import chain
result = (chain.from_iterable(glob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))

Edit2 for Python 3.4+

from pathlib import Path
result = list(Path(".").rglob("*.[tT][xX][tT]"))

Answer 1

Changed in Python 3.5: Support for recursive globs using “**”.

glob.glob() got a new recursive parameter.

If you want to get every .txt file under my_path (recursively including subdirs):

import glob

files = glob.glob(my_path + '/**/*.txt', recursive=True)

# my_path/     the dir
# **/       every file and dir under my_path
# *.txt     every file that ends with '.txt'

If you need an iterator you can use iglob as an alternative:

for file in glob.iglob(my_path + '/**/*.txt', recursive=True):
    # ...

Answer 2

I will translate John La Rooy’s list comprehension to nested for’s, just in case anyone else has trouble understanding it.

result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

Should be equivalent to:

import glob

result = []

for x in os.walk(PATH):
    for y in glob.glob(os.path.join(x[0], '*.txt')):
        result.append(y)

Here’s the documentation for list comprehension and the functions os.walk and glob.glob.


Answer 3

This seems to be the fastest solution I could come up with, and is faster than os.walk and a lot faster than any glob solution.

  • It will also give you a list of all nested subfolders at basically no cost.
  • You can search for several different extensions.
  • You can also choose to return either full paths or just the names for the files by changing f.path to f.name (do not change it for subfolders!).

Args: dir: str, ext: list.
Function returns two lists: subfolders, files.

See below for a detailed speed analysis.

def run_fast_scandir(dir, ext):    # dir: str, ext: list
    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files


subfolders, files = run_fast_scandir(folder, [".jpg"])


Speed analysis

for various methods to get all files with a specific file extension inside all subfolders and the main folder.

tl;dr:
fast_scandir clearly wins and is twice as fast as all other solutions, except os.walk.
os.walk takes second place, slightly slower.
– using glob will greatly slow down the process.
– None of the results use natural sorting. This means results will be sorted like this: 1, 10, 2. To get natural sorting (1, 2, 10), please have a look at https://stackoverflow.com/a/48030307/2441026


Results:

fast_scandir    took  499 ms. Found files: 16596. Found subfolders: 439
os.walk         took  589 ms. Found files: 16596
find_files      took  919 ms. Found files: 16596
glob.iglob      took  998 ms. Found files: 16596
glob.glob       took 1002 ms. Found files: 16596
pathlib.rglob   took 1041 ms. Found files: 16596
os.walk-glob    took 1043 ms. Found files: 16596

Tests were done with W7x64, Python 3.8.1, 20 runs. 16596 files in 439 (partially nested) subfolders.
find_files is from https://stackoverflow.com/a/45646357/2441026 and lets you search for several extensions.
fast_scandir was written by myself and will also return a list of subfolders. You can give it a list of extensions to search for (I tested a one-entry list against a simple if ... == ".jpg" and there was no significant difference).


# -*- coding: utf-8 -*-
# Python 3


import time
import os
from glob import glob, iglob
from pathlib import Path


directory = r"<folder>"
RUNS = 20


def run_os_walk():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [os.path.join(dp, f) for dp, dn, filenames in os.walk(directory) for f in filenames if
                  os.path.splitext(f)[1].lower() == '.jpg']
    print(f"os.walk\t\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_os_walk_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [y for x in os.walk(directory) for y in glob(os.path.join(x[0], '*.jpg'))]
    print(f"os.walk-glob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = glob(os.path.join(directory, '**', '*.jpg'), recursive=True)
    print(f"glob.glob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_iglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(iglob(os.path.join(directory, '**', '*.jpg'), recursive=True))
    print(f"glob.iglob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_pathlib_rglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(Path(directory).rglob("*.jpg"))
    print(f"pathlib.rglob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def find_files(files, dirs=[], extensions=[]):
    # https://stackoverflow.com/a/45646357/2441026

    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            if os.path.splitext(d)[1].lower() in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return


def run_fast_scandir(dir, ext):    # dir: str, ext: list
    # https://stackoverflow.com/a/59803793/2441026

    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files



if __name__ == '__main__':
    run_os_walk()
    run_os_walk_glob()
    run_glob()
    run_iglob()
    run_pathlib_rglob()


    a = time.time_ns()
    for i in range(RUNS):
        files = []
        find_files(files, dirs=[directory], extensions=[".jpg"])
    print(f"find_files\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}")


    a = time.time_ns()
    for i in range(RUNS):
        subf, files = run_fast_scandir(directory, [".jpg"])
    print(f"fast_scandir\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}. Found subfolders: {len(subf)}")

Answer 4

The new pathlib library simplifies this to one line:

from pathlib import Path
result = list(Path(PATH).glob('**/*.txt'))

You can also use the generator version:

from pathlib import Path
for file in Path(PATH).glob('**/*.txt'):
    pass

This returns Path objects, which you can use for pretty much anything, or get the file name as a string via file.name.
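
For example, each result is a Path object with useful attributes (assuming the same PATH as above):

from pathlib import Path

for p in Path(PATH).glob('**/*.txt'):
    print(p.name)    # file name as a string
    print(p.parent)  # containing directory, also a Path
    print(str(p))    # the whole path as a plain string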


Answer 5

It’s not the most Pythonic answer, but I’ll put it here for fun because it’s a neat lesson in recursion:

import os

def find_files( files, dirs=[], extensions=[]):
    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            # os.listdir failed, so d is a file rather than a directory
            if os.path.splitext(d)[1] in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return

On my machine I have two folders, root and root2

mender@multivax ]ls -R root root2
root:
temp1 temp2

root/temp1:
temp1.1 temp1.2

root/temp1/temp1.1:
f1.mid

root/temp1/temp1.2:
f.mi  f.mid

root/temp2:
tmp.mid

root2:
dummie.txt temp3

root2/temp3:
song.mid

Let’s say I want to find all .txt and all .mid files in either of these directories; then I can just do:

files = []
find_files( files, dirs=['root','root2'], extensions=['.mid','.txt'] )
print(files)

#['root2/dummie.txt',
# 'root/temp2/tmp.mid',
# 'root2/temp3/song.mid',
# 'root/temp1/temp1.1/f1.mid',
# 'root/temp1/temp1.2/f.mid']

Answer 6

Recursive globbing is new in Python 3.5, so it won’t work on Python 2.7. Here is an example that uses raw strings, so you can provide the path as-is on either Windows or Linux:

import glob

mypath=r"C:\Users\dj\Desktop\nba"

files = glob.glob(mypath + r'\**\*.py', recursive=True)
# print(files) # as list
for f in files:
    print(f) # nice looking single line per file

Note: it will list all matching files, no matter how deeply they are nested.


Answer 7

You can do it this way to return a list of absolute file paths.

def list_files_recursive(path):
    """
    Receives a directory path as a parameter.
    :return: a list of the files found, with their paths
    """

    import os

    files = []

    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            files.append(os.path.join(r, file))

    return files


if __name__ == '__main__':

    result = list_files_recursive('/tmp')
    print(result)


Answer 8

If you don’t mind installing an additional lightweight library, you can do this:

pip install plazy

Usage:

import plazy

txt_filter = lambda x : True if x.endswith('.txt') else False
files = plazy.list_files(root='data', filter_func=txt_filter, is_include_root=True)

The result should look something like this:

['data/a.txt', 'data/b.txt', 'data/sub_dir/c.txt']

It works on both Python 2.7 and Python 3.

Github: https://github.com/kyzas/plazy#list-files

Disclaimer: I’m the author of plazy.


Answer 9

This function will recursively put only files into a list.

import os


def ls_files(dir):
    files = list()
    for item in os.listdir(dir):
        abspath = os.path.join(dir, item)
        try:
            if os.path.isdir(abspath):
                files = files + ls_files(abspath)
            else:
                files.append(abspath)
        except FileNotFoundError as err:
            print('invalid directory\n', 'Error: ', err)
    return files

Answer 10

Your original solution was very nearly correct, but the variable “root” is dynamically updated as os.walk recursively descends into directories. os.walk() is a recursive generator; each (root, subFolder, files) tuple is for a specific root, the way you have it set up.

i.e.

root = 'C:\\'
subFolder = ['Users', 'ProgramFiles', 'ProgramFiles (x86)', 'Windows', ...]
files = ['foo1.txt', 'foo2.txt', 'foo3.txt', ...]

root = 'C:\\Users\\'
subFolder = ['UserAccount1', 'UserAccount2', ...]
files = ['bar1.txt', 'bar2.txt', 'bar3.txt', ...]

...

I made a slight tweak to your code to print a full list.

import os
for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,item))
            print(fileNamePath)

Hope this helps!


How to implement __getattribute__ without an infinite recursion error?

Question: How to implement __getattribute__ without an infinite recursion error?

I want to override access to one variable in a class, but return all others normally. How do I accomplish this with __getattribute__?

I tried the following (which should also illustrate what I’m trying to do) but I get a recursion error:

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:
            return self.__dict__[name]

>>> print D().test
0.0
>>> print D().test2
...
RuntimeError: maximum recursion depth exceeded in cmp

Answer 0

You get a recursion error because your attempt to access the self.__dict__ attribute inside __getattribute__ invokes your __getattribute__ again. If you use object’s __getattribute__ instead, it works:

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:
            return object.__getattribute__(self, name)

This works because object (in this example) is the base class. By calling the base version of __getattribute__ you avoid the recursive hell you were in before.

IPython output with the code in foo.py:

In [1]: from foo import *

In [2]: d = D()

In [3]: d.test
Out[3]: 0.0

In [4]: d.test2
Out[4]: 21

Update:

There’s something in the section titled More attribute access for new-style classes in the current documentation, where they recommend doing exactly this to avoid the infinite recursion.


Answer 1

Actually, I believe you want to use the __getattr__ special method instead.

Quote from the Python docs:

__getattr__( self, name)

Called when an attribute lookup has not found the attribute in the usual places (i.e. it is not an instance attribute nor is it found in the class tree for self). name is the attribute name. This method should return the (computed) attribute value or raise an AttributeError exception.
Note that if the attribute is found through the normal mechanism, __getattr__() is not called. (This is an intentional asymmetry between __getattr__() and __setattr__().) This is done both for efficiency reasons and because otherwise __setattr__() would have no way to access other attributes of the instance. Note that at least for instance variables, you can fake total control by not inserting any values in the instance attribute dictionary (but instead inserting them in another object). See the __getattribute__() method below for a way to actually get total control in new-style classes.

Note: for this to work, the instance should not have a test attribute, so the line self.test=20 should be removed.
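
A minimal sketch of that approach (note that test is intentionally not set in __init__, per the note above):

class D(object):
    def __init__(self):
        self.test2 = 21

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name == 'test':
            return 0.
        raise AttributeError(name)

>>> d = D()
>>> d.test
0.0
>>> d.test2
21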


Answer 2

Python language reference:

In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).

Meaning:

def __getattribute__(self,name):
    ...
        return self.__dict__[name]

You’re calling for an attribute called __dict__. Because it’s an attribute, __getattribute__ gets called in search for __dict__ which calls __getattribute__ which calls … yada yada yada

return  object.__getattribute__(self, name)

Using the base class’s __getattribute__ helps find the real attribute.


Answer 3

Are you sure you want to use __getattribute__? What are you actually trying to achieve?

The easiest way to do what you ask is:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21

    test = 0

or:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21

    @property
    def test(self):
        return 0

Edit: Note that an instance of D would have different values of test in each case. In the first case d.test would be 20, in the second it would be 0. I’ll leave it to you to work out why.

Edit2: Greg pointed out that example 2 will fail because the property is read only and the __init__ method tried to set it to 20. A more complete example for that would be:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21

    _test = 0

    def get_test(self):
        return self._test

    def set_test(self, value):
        self._test = value

    test = property(get_test, set_test)

Obviously, as a class this is almost entirely useless, but it gives you an idea to move on from.


Answer 4

Here is a more reliable version:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21
    def __getattribute__(self, name):
        if name == 'test':
            return 0.
        else:
            return super(D, self).__getattribute__(name)

It calls the __getattribute__ method of the parent class, eventually falling back to the object.__getattribute__ method if no other ancestor overrides it.
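
In Python 3, the zero-argument form of super() does the same thing a bit more concisely:

class D:
    def __init__(self):
        self.test = 20
        self.test2 = 21
    def __getattribute__(self, name):
        if name == 'test':
            return 0.
        return super().__getattribute__(name)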


Answer 5

How is the __getattribute__ method used?

It is called before the normal dotted lookup. If it raises AttributeError, then we call __getattr__.

Use of this method is rather rare. There are only two definitions in the standard library:

$ grep -Erl  "def __getattribute__\(self" cpython/Lib | grep -v "/test/"
cpython/Lib/_threading_local.py
cpython/Lib/importlib/util.py

Best Practice

The proper way to programmatically control access to a single attribute is with property. Class D should be written as follows (with the setter and deleter optional, to replicate the apparent intended behavior):

class D(object):
    def __init__(self):
        self.test2=21

    @property
    def test(self):
        return 0.

    @test.setter
    def test(self, value):
        '''dummy function to avoid AttributeError on setting property'''

    @test.deleter
    def test(self):
        '''dummy function to avoid AttributeError on deleting property'''

And usage:

>>> o = D()
>>> o.test
0.0
>>> o.test = 'foo'
>>> o.test
0.0
>>> del o.test
>>> o.test
0.0

A property is a data descriptor, thus it is the first thing looked for in the normal dotted lookup algorithm.

Options for __getattribute__

You have several options if you absolutely need to implement lookup for every attribute via __getattribute__:

  • raise AttributeError, causing __getattr__ to be called (if implemented)
  • return something from it by
    • using super to call the parent (probably object’s) implementation
    • calling __getattr__
    • implementing your own dotted lookup algorithm somehow

For example:

class NoisyAttributes(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self, name):
        print('getting: ' + name)
        try:
            return super(NoisyAttributes, self).__getattribute__(name)
        except AttributeError:
            print('oh no, AttributeError caught and reraising')
            raise
    def __getattr__(self, name):
        """Called if __getattribute__ raises AttributeError"""
        return 'close but no ' + name    


>>> n = NoisyAttributes()
>>> nfoo = n.foo
getting: foo
oh no, AttributeError caught and reraising
>>> nfoo
'close but no foo'
>>> n.test
getting: test
20

What you originally wanted.

And this example shows how you might do what you originally wanted:

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:
            return super(D, self).__getattribute__(name)

And will behave like this:

>>> o = D()
>>> o.test = 'foo'
>>> o.test
0.0
>>> del o.test
>>> o.test
0.0
>>> del o.test

Traceback (most recent call last):
  File "<pyshell#216>", line 1, in <module>
    del o.test
AttributeError: test

Code review

Your code with comments. You have a dotted lookup on self in __getattribute__. This is why you get a recursion error. You could check whether name is "__dict__" and use super as a workaround, but that doesn’t cover __slots__. I’ll leave that as an exercise for the reader.

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:      #   v--- Dotted lookup on self in __getattribute__
            return self.__dict__[name]

>>> print D().test
0.0
>>> print D().test2
...
RuntimeError: maximum recursion depth exceeded in cmp

Python: using a recursive algorithm as a generator

Question: Python: using a recursive algorithm as a generator

Recently I wrote a function to generate certain sequences with nontrivial constraints. The problem came with a natural recursive solution. Now it happens that, even for relatively small input, the sequences are several thousands, thus I would prefer to use my algorithm as a generator instead of using it to fill a list with all the sequences.

Here is an example. Suppose we want to compute all the permutations of a string with a recursive function. The following naive algorithm takes an extra argument ‘storage’ and appends a permutation to it whenever it finds one:

def getPermutations(string, storage, prefix=""):
   if len(string) == 1:
      storage.append(prefix + string)   # <-----
   else:
      for i in range(len(string)):
         getPermutations(string[:i]+string[i+1:], storage, prefix+string[i])

storage = []
getPermutations("abcd", storage)
for permutation in storage: print permutation

(Please don’t care about inefficiency, this is only an example.)

Now I want to turn my function into a generator, i.e. to yield a permutation instead of appending it to the storage list:

def getPermutations(string, prefix=""):
   if len(string) == 1:
      yield prefix + string             # <-----
   else:
      for i in range(len(string)):
         getPermutations(string[:i]+string[i+1:], prefix+string[i])

for permutation in getPermutations("abcd"):
   print permutation

This code does not work (the function behaves like an empty generator).

Am I missing something? Is there a way to turn the above recursive algorithm into a generator without replacing it with an iterative one?


Answer 0

def getPermutations(string, prefix=""):
    if len(string) == 1:
        yield prefix + string
    else:
        for i in xrange(len(string)):
            for perm in getPermutations(string[:i] + string[i+1:], prefix+string[i]):
                yield perm

Or without an accumulator:

def getPermutations(string):
    if len(string) == 1:
        yield string
    else:
        for i in xrange(len(string)):
            for perm in getPermutations(string[:i] + string[i+1:]):
                yield string[i] + perm
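
As an aside (not part of the original answer): if you only need the permutations rather than the recursion exercise, the standard library already yields them lazily:

from itertools import permutations

for perm in permutations("abcd"):
    print("".join(perm))   # permutations() yields tuples of characters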

Answer 1

This avoids the len(string)-deep recursion, and is in general a nice way to handle generators-inside-generators:

from types import GeneratorType

def flatten(*stack):
    stack = list(stack)
    while stack:
        try: x = stack[0].next()
        except StopIteration:
            stack.pop(0)
            continue
        if isinstance(x, GeneratorType): stack.insert(0, x)
        else: yield x

def _getPermutations(string, prefix=""):
    if len(string) == 1: yield prefix + string
    else: yield (_getPermutations(string[:i]+string[i+1:], prefix+string[i])
            for i in range(len(string)))

def getPermutations(string): return flatten(_getPermutations(string))

for permutation in getPermutations("abcd"): print permutation

flatten allows us to continue progress in another generator by simply yielding it, instead of iterating through it and yielding each item manually.


Python 3.3 will add yield from to the syntax, which allows for natural delegation to a sub-generator:

def getPermutations(string, prefix=""):
    if len(string) == 1:
        yield prefix + string
    else:
        for i in range(len(string)):
            yield from getPermutations(string[:i]+string[i+1:], prefix+string[i])

Answer 2

The interior call to getPermutations — it’s a generator, too.

def getPermutations(string, prefix=""):
   if len(string) == 1:
      yield prefix + string            
   else:
      for i in range(len(string)):
         getPermutations(string[:i]+string[i+1:], prefix+string[i])  # <-----

You need to iterate through that with a for-loop (see @MizardX posting, which edged me out by seconds!)


Does Python optimize tail recursion?

Question: Does Python optimize tail recursion?

I have the following piece of code which fails with the following error:

RuntimeError: maximum recursion depth exceeded

I attempted to rewrite this to allow for tail recursion optimization (TCO). I believe that this code should have been successful if a TCO had taken place.

def trisum(n, csum):
    if n == 0:
        return csum
    else:
        return trisum(n - 1, csum + n)

print(trisum(1000, 0))

Should I conclude that Python does not do any type of TCO, or do I just need to define it differently?


Answer 0

No, and it never will since Guido van Rossum prefers to be able to have proper tracebacks:

Tail Recursion Elimination (2009-04-22)

Final Words on Tail Calls (2009-04-27)

You can manually eliminate the recursion with a transformation like this:

>>> def trisum(n, csum):
...     while True:                     # Change recursion to a while loop
...         if n == 0:
...             return csum
...         n, csum = n - 1, csum + n   # Update parameters instead of tail recursion

>>> trisum(1000,0)
500500
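
As an aside (this is not tail-call optimization): CPython’s recursion limit is just a guard, and sys.setrecursionlimit can raise it for the recursive trisum from the question, at the cost of risking a real C-stack overflow for large inputs:

import sys

sys.setrecursionlimit(5000)   # the default is usually 1000
print(trisum(1000, 0))        # now completes: 500500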

Answer 1

I published a module performing tail-call optimization (handling both tail-recursion and continuation-passing style): https://github.com/baruchel/tco

Optimizing tail-recursion in Python

It has often been claimed that tail-recursion doesn’t suit the Pythonic way of coding and that one shouldn’t care about how to embed it in a loop. I don’t want to argue with this point of view; sometimes however I like trying or implementing new ideas as tail-recursive functions rather than with loops for various reasons (focusing on the idea rather than on the process, having twenty short functions on my screen in the same time rather than only three “Pythonic” functions, working in an interactive session rather than editing my code, etc.).

Optimizing tail-recursion in Python is in fact quite easy. While it is said to be impossible or very tricky, I think it can be achieved with elegant, short and general solutions; I even think that most of these solutions don’t use Python features otherwise than they should. Clean lambda expressions working along with very standard loops lead to quick, efficient and fully usable tools for implementing tail-recursion optimization.

As a personal convenience, I wrote a small module implementing such an optimization by two different ways. I would like to discuss here about my two main functions.

The clean way: modifying the Y combinator

The Y combinator is well known; it allows one to use lambda functions in a recursive manner, but by itself it doesn’t allow embedding recursive calls in a loop. Lambda calculus alone can’t do such a thing. A slight change in the Y combinator, however, can protect the recursive call from actually being evaluated. Evaluation can thus be delayed.

Here is the famous expression for the Y combinator:

lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))

With a very slight change, I could get:

lambda f: (lambda x: x(x))(lambda y: f(lambda *args: lambda: y(y)(*args)))

Instead of calling itself, the function f now returns a function performing the very same call, but since it returns it, the evaluation can be done later from outside.

My code is:

def bet(func):
    b = (lambda f: (lambda x: x(x))(lambda y:
          f(lambda *args: lambda: y(y)(*args))))(func)
    def wrapper(*args):
        out = b(*args)
        while callable(out):
            out = out()
        return out
    return wrapper

The function can be used in the following way; here are two examples with tail-recursive versions of factorial and Fibonacci:

>>> from recursion import *
>>> fac = bet( lambda f: lambda n, a: a if not n else f(n-1,a*n) )
>>> fac(5,1)
120
>>> fibo = bet( lambda f: lambda n,p,q: p if not n else f(n-1,q,p+q) )
>>> fibo(10,0,1)
55

Obviously recursion depth isn’t an issue any longer:

>>> bet( lambda f: lambda n: 42 if not n else f(n-1) )(50000)
42

This is of course the single real purpose of the function.

Only one thing can’t be done with this optimization: it can’t be used with a tail-recursive function evaluating to another function (this comes from the fact that callable returned objects are all handled as further recursive calls with no distinction). Since I usually don’t need such a feature, I am very happy with the code above. However, in order to provide a more general module, I thought a little more in order to find some workaround for this issue (see next section).

Concerning the speed of this process (which isn’t the real issue, however): it happens to be quite good; tail-recursive functions are even evaluated much more quickly than with the following code using simpler expressions:

def bet1(func):
    def wrapper(*args):
        out = func(lambda *x: lambda: x)(*args)
        while callable(out):
            out = func(lambda *x: lambda: x)(*out())
        return out
    return wrapper

I think that evaluating one expression, even a complicated one, is much quicker than evaluating several simple expressions, which is what happens in this second version. I didn’t keep this new function in my module, and I see no circumstances where it could be used rather than the “official” one.
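
If you want to check this claim yourself, a rough timing sketch (assuming bet and bet1 as defined above; timeit is from the standard library) could look like this:

import timeit

fac_bet = bet(lambda f: lambda n, a: a if not n else f(n - 1, a * n))
fac_bet1 = bet1(lambda f: lambda n, a: a if not n else f(n - 1, a * n))

# Same tail-recursive factorial, run through each trampoline.
print(timeit.timeit(lambda: fac_bet(100, 1), number=1000))
print(timeit.timeit(lambda: fac_bet1(100, 1), number=1000))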

Continuation passing style with exceptions

Here is a more general function; it is able to handle all tail-recursive functions, including those returning other functions. Recursive calls are distinguished from other return values by the use of exceptions. This solution is slower than the previous one; quicker code could probably be written by using some special values as “flags” to be detected in the main loop, but I don’t like the idea of using special values or internal keywords. There is an amusing interpretation of using exceptions: if Python doesn’t like tail-recursive calls, an exception should be raised when a tail-recursive call does occur, and the Pythonic way will be to catch the exception in order to find some clean solution, which is actually what happens here…

class _RecursiveCall(Exception):
    def __init__(self, *args):
        self.args = args

def _recursiveCallback(*args):
    # Raising instead of calling lets the main loop intercept the
    # "recursive call" together with its arguments.
    raise _RecursiveCall(*args)

def bet0(func):
    def wrapper(*args):
        while True:
            try:
                return func(_recursiveCallback)(*args)
            except _RecursiveCall as e:
                # A tail call occurred: loop again with the new arguments.
                args = e.args
    return wrapper

Now all functions can be used. In the following example, f(n) evaluates to the identity function for any positive value of n:

>>> f = bet0( lambda f: lambda n: (lambda x: x) if not n else f(n-1) )
>>> f(5)(42)
42

Of course, it could be argued that exceptions are not intended to be used for intentionally redirecting the interpreter (as a kind of goto statement, or probably rather a kind of continuation-passing style), which I have to admit. But, again, I find the idea of using try with a single line being a return statement amusing: we try to return something (normal behaviour) but we can’t do it because of a recursive call occurring (the exception).

Initial answer (2013-08-29).

I wrote a very small plugin for handling tail recursion. You may find it, with my explanations, here: https://groups.google.com/forum/?hl=fr#!topic/comp.lang.python/dIsnJ2BoBKs

It can embed a lambda function written in a tail-recursion style in another function which will evaluate it as a loop.

The most interesting feature of this small function, in my humble opinion, is that it doesn’t rely on some dirty programming hack but on mere lambda calculus: the behaviour of the function is changed into another one when it is inserted in another lambda function, which looks very much like the Y combinator.


Answer 2


The word of Guido is at http://neopythonic.blogspot.co.uk/2009/04/tail-recursion-elimination.html

I recently posted an entry in my Python History blog on the origins of Python’s functional features. A side remark about not supporting tail recursion elimination (TRE) immediately sparked several comments about what a pity it is that Python doesn’t do this, including links to recent blog entries by others trying to “prove” that TRE can be added to Python easily. So let me defend my position (which is that I don’t want TRE in the language). If you want a short answer, it’s simply unpythonic. Here’s the long answer:


Answer 3


CPython does not and will probably never support tail call optimization based on Guido van Rossum’s statements on the subject.

I’ve heard arguments that it makes debugging more difficult because of how it modifies the stack trace.


Answer 4


Try the experimental macropy TCO implementation for size.


Answer 5


Besides optimizing tail recursion, you can also set the recursion limit manually:

import sys
sys.setrecursionlimit(5500000)
print("recursion limit:%d " % (sys.getrecursionlimit()))

What is the maximum recursion depth in Python, and how can it be increased?

Question: What is the maximum recursion depth in Python, and how can it be increased?


I have this tail recursive function here:

def recursive_function(n, sum):
    if n < 1:
        return sum
    else:
        return recursive_function(n-1, sum+n)

c = 998
print(recursive_function(c, 0))

It works up to n=997, then it just breaks and spits out a RecursionError: maximum recursion depth exceeded in comparison. Is this just a stack overflow? Is there a way to get around it?


Answer 0


It is a guard against a stack overflow, yes. Python (or rather, the CPython implementation) doesn’t optimize tail recursion, and unbridled recursion causes stack overflows. You can check the recursion limit with sys.getrecursionlimit and change the recursion limit with sys.setrecursionlimit, but doing so is dangerous — the standard limit is a little conservative, but Python stackframes can be quite big.

Python isn’t a functional language and tail recursion is not a particularly efficient technique. Rewriting the algorithm iteratively, if possible, is generally a better idea.


Answer 1


Looks like you just need to set a higher recursion depth:

import sys
sys.setrecursionlimit(1500)

Answer 2


It’s to avoid a stack overflow. The Python interpreter limits the depths of recursion to help you avoid infinite recursions, resulting in stack overflows. Try increasing the recursion limit (sys.setrecursionlimit) or re-writing your code without recursion.

From the Python documentation:

sys.getrecursionlimit()

Return the current value of the recursion limit, the maximum depth of the Python interpreter stack. This limit prevents infinite recursion from causing an overflow of the C stack and crashing Python. It can be set by setrecursionlimit().


Answer 3


If you often need to change the recursion limit (e.g. while solving programming puzzles) you can define a simple context manager like this:

import sys

class recursionlimit:
    def __init__(self, limit):
        self.limit = limit
        self.old_limit = sys.getrecursionlimit()

    def __enter__(self):
        sys.setrecursionlimit(self.limit)

    def __exit__(self, type, value, tb):
        sys.setrecursionlimit(self.old_limit)

Then to call a function with a custom limit you can do:

with recursionlimit(1500):
    print(fib(1000, 0))

On exit from the body of the with statement, the recursion limit is restored to its previous value.
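
If you prefer, the same helper can be written with contextlib; this is a sketch equivalent to the class above:

import sys
from contextlib import contextmanager

@contextmanager
def recursionlimit(limit):
    # Temporarily raise the recursion limit, restoring the old value
    # on exit even if the body raises.
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)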


Answer 4


Use a language that guarantees tail-call optimisation. Or use iteration. Alternatively, get cute with decorators.
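
One hypothetical way of “getting cute with decorators” is a small trampoline: the decorated function describes its tail call as data instead of performing it, and the wrapper replays the calls in a loop (much like the thunks in the answers above):

def trampoline(func):
    def wrapper(*args):
        result = func(*args)
        # Keep re-invoking func as long as it returns a
        # (wrapper, args...) tuple describing the next tail call.
        while isinstance(result, tuple) and result and result[0] is wrapper:
            result = func(*result[1:])
        return result
    return wrapper

@trampoline
def countdown(n):
    if n == 0:
        return "done"
    return (countdown, n - 1)  # tail call, expressed as data

print(countdown(100000))  # "done", with no RecursionError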


Answer 5


resource.setrlimit must also be used to increase the stack size and prevent segfault

The Linux kernel limits the stack of processes.

Python stores local variables on the stack of the interpreter, and so recursion takes up stack space of the interpreter.

If the Python interpreter tries to go over the stack limit, the Linux kernel terminates it with a segmentation fault.

The stack limit size is controlled with the getrlimit and setrlimit system calls.

Python offers access to those system calls through the resource module.

import resource
import sys

print resource.getrlimit(resource.RLIMIT_STACK)
print sys.getrecursionlimit()
print

# Will segfault without this line.
resource.setrlimit(resource.RLIMIT_STACK, [0x10000000, resource.RLIM_INFINITY])
sys.setrecursionlimit(0x100000)

def f(i):
    print i
    sys.stdout.flush()
    f(i + 1)
f(0)

Of course, if you keep increasing ulimit, your RAM will run out, which will either slow your computer to a halt due to swap madness, or kill Python via the OOM Killer.

From bash, you can see and set the stack limit (in kb) with:

ulimit -s
ulimit -s 10000

The default value for me is 8 MB.

Tested on Ubuntu 16.10, Python 2.7.12.
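
For reference, here is a sketch of the same snippet in Python 3 syntax (like the original, it is Linux-only):

import resource
import sys

print(resource.getrlimit(resource.RLIMIT_STACK))
print(sys.getrecursionlimit())

# Will segfault without this line.
resource.setrlimit(resource.RLIMIT_STACK, (0x10000000, resource.RLIM_INFINITY))
sys.setrecursionlimit(0x100000)

def f(i):
    print(i)
    sys.stdout.flush()
    f(i + 1)

f(0)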


Answer 6


I realize this is an old question but for those reading, I would recommend against using recursion for problems such as this – lists are much faster and avoid recursion entirely. I would implement this as:

def fibonacci(n):
    f = [0,1,1]
    for i in xrange(3,n):
        f.append(f[i-1] + f[i-2])
    return 'The %.0fth fibonacci number is: %.0f' % (n,f[-1])

(Use n+1 in xrange if you start counting your fibonacci sequence from 0 instead of 1.)


Answer 7


Of course Fibonacci numbers can be computed in O(n) by applying the Binet formula:

from math import floor, sqrt

def fib(n):                                                     
    return int(floor(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5))+0.5))

As the commenters note, it’s not O(1) but O(n), because of 2**n. Another difference is that you only get one value, whereas with recursion you get all values of Fibonacci(n) up to that value. Note also that the formula relies on floating-point arithmetic, so its results become inaccurate once n grows beyond roughly 70.


Answer 8


I had a similar issue with the error “Max recursion depth exceeded”. I discovered the error was being triggered by a corrupt file in the directory I was looping over with os.walk. If you have trouble solving this issue and you are working with file paths, be sure to narrow it down, as it might be a corrupt file.


Answer 9


If you want to get only few Fibonacci numbers, you can use matrix method.

from numpy import matrix

def fib(n):
    return (matrix('0 1; 1 1', dtype='object') ** n).item(1)

It’s fast as numpy uses fast exponentiation algorithm. You get answer in O(log n). And it’s better than Binet’s formula because it uses only integers. But if you want all Fibonacci numbers up to n, then it’s better to do it by memorisation.


Answer 10


Use generators?

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fibs = fib()  # bind the infinite generator to a variable so that each
              # next() call keeps advancing the same generator

f = [fibs.next() for x in xrange(1001)]

for num in f:
        print num

The fib() function above was adapted from: http://intermediatepythonista.com/python-generators


Answer 11


As @alex suggested, you could use a generator function to do this sequentially instead of recursively.

Here’s the equivalent of the code in your question:

def fib(n):
    def fibseq(n):
        """ Iteratively return the first n Fibonacci numbers, starting from 0. """
        a, b = 0, 1
        for _ in xrange(n):
            yield a
            a, b = b, a + b

    return sum(v for v in fibseq(n))

print format(fib(100000), ',d')  # -> no recursion depth error

Answer 12


Many people recommend increasing the recursion limit as a good solution; however, it is not, because there will always be a limit. Instead, use an iterative solution.

def fib(n):
    a,b = 1,1
    for i in range(n-1):
        a,b = b,a+b
    return a
print fib(5)

Answer 13


I wanted to give you an example of using memoization to compute Fibonacci, as this will allow you to compute significantly larger numbers using recursion:

cache = {}
def fib_dp(n):
    if n in cache:
        return cache[n]
    if n == 0: return 0
    elif n == 1: return 1
    else:
        value = fib_dp(n-1) + fib_dp(n-2)
    cache[n] = value
    return value

print(fib_dp(998))

This is still recursive, but it uses a simple hash table that allows previously calculated Fibonacci numbers to be reused instead of recomputed.


Answer 14

import sys
sys.setrecursionlimit(1500)

def fib(n, sum):
    if n < 1:
        return sum
    else:
        return fib(n-1, sum+n)

c = 998
print(fib(c, 0))

Answer 15


We can do that using the @lru_cache decorator and the setrecursionlimit() method:

import sys
from functools import lru_cache

sys.setrecursionlimit(15000)


@lru_cache(128)
def fib(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1

    return fib(n - 2) + fib(n - 1)


print(fib(14000))

Output

3002468761178461090995494179715025648692747937490792943468375429502230242942284835863402333575216217865811638730389352239181342307756720414619391217798542575996541081060501905302157019002614964717310808809478675602711440361241500732699145834377856326394037071666274321657305320804055307021019793251762830816701587386994888032362232198219843549865275880699612359275125243457132496772854886508703396643365042454333009802006384286859581649296390803003232654898464561589234445139863242606285711591746222880807391057211912655818499798720987302540712067959840802106849776547522247429904618357394771725653253559346195282601285019169360207355179223814857106405285007997547692546378757062999581657867188420995770650565521377874333085963123444258953052751461206977615079511435862879678439081175536265576977106865074099512897235100538241196445815568291377846656352979228098911566675956525644182645608178603837172227838896725425605719942300037650526231486881066037397866942013838296769284745527778439272995067231492069369130289154753132313883294398593507873555667211005422003204156154859031529462152953119957597195735953686798871131148255050140450845034240095305094449911578598539658855704158240221809528010179414493499583473568873253067921639513996596738275817909624857593693291980841303291145613566466575233283651420134915764961372875933822262953420444548349180436583183291944875599477240814774580187144637965487250578134990402443365677985388481961492444981994523034245619781853365476552719460960795929666883665704293897310201276011658074359194189359660792496027472226428571547971602259808697441435358578480589837766911684200275636889192254762678512597000452676191374475932796663842865744658264924913771676415404179920096074751516422872997665425047457428327276230059296132722787915300105002019006293320082955378715908263653377755031155794063450515731009402407584683132870206376994025920790298591144213659942668622062191441346200098342943955169522532574271644954360217472458521489671859465232568419404182043966092211744372699797375966048010775453444600153524772238401414789562651410289808994960533132759532092895779406940925252906166612153699850759933762897947175972147868784008320247586210378556711332739463277940255289047962323306946068381887446046387745247925675240182981190836264964640612069909458682443392729946084099312047752966806439331403663934969942958022237945205992581178803606156982034385347182766573351768749665172549908638337611953199808161937885366709285043276595726484068138091188914698151703122773726725261370542355162118164302728812259192476428938730724109825922331973256105091200551566581350508061922762910078528219869913214146575557249199263634241165352226570749618907050553115468306669184485910269806225894530809823102279231750061652042560772530576713148647858705369649642907780603247428680176236527220826640665659902650188140474762163503557640566711903907798932853656216227739411210513756695569391593763704981001125

Source

functools lru_cache
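
As an aside, one alternative to such a high recursion limit (a sketch for the fib defined above) is to warm the cache bottom-up in steps smaller than the limit, so that each call only recurses a short distance before reaching already-cached values:

# Hypothetical cache-warming loop for the fib defined above.
for i in range(0, 14001, 500):
    fib(i)
print(fib(14000))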


Answer 16


We could also use a variation of the dynamic-programming bottom-up approach:

def fib_bottom_up(n):
    bottom_up = [None] * (n+1)
    bottom_up[0] = 1
    bottom_up[1] = 1

    for i in range(2, n+1):
        bottom_up[i] = bottom_up[i-1] + bottom_up[i-2]

    return bottom_up[n]

print(fib_bottom_up(20000))