Tag Archives: recursion

Nested defaultdict of defaultdict

Question: Nested defaultdict of defaultdict

Is there a way to make a defaultdict also be the default for the defaultdict? (i.e. infinite-level recursive defaultdict?)

I want to be able to do:

x = defaultdict(...stuff...)
x[0][1][0]
{}

So, I can do x = defaultdict(defaultdict), but that’s only a second level:

x[0]
{}
x[0][0]
KeyError: 0

There are recipes that can do this. But can it be done using just the normal defaultdict arguments?

Note this is asking how to do an infinite-level recursive defaultdict, so it’s distinct from Python: defaultdict of defaultdict?, which was about how to do a two-level defaultdict.

I’ll probably just end up using the bunch pattern, but when I realized I didn’t know how to do this, it got me interested.


Answer 0

For an arbitrary number of levels:

def rec_dd():
    return defaultdict(rec_dd)

>>> x = rec_dd()
>>> x['a']['b']['c']['d']
defaultdict(<function rec_dd at 0x7f0dcef81500>, {})
>>> print json.dumps(x)
{"a": {"b": {"c": {"d": {}}}}}

Of course you could also do this with a lambda, but I find lambdas to be less readable. In any case it would look like this:

rec_dd = lambda: defaultdict(rec_dd)

Answer 1

The other answers here tell you how to create a defaultdict which contains “infinitely many” defaultdicts, but they fail to address what I think may have been your initial need, which was to simply have a two-level defaultdict.

You may have been looking for:

defaultdict(lambda: defaultdict(dict))

The reasons why you might prefer this construct are:

  • It is more explicit than the recursive solution, and therefore likely more understandable to the reader.
  • This enables the “leaf” of the defaultdict to be something other than a dictionary, e.g. defaultdict(lambda: defaultdict(list)) or defaultdict(lambda: defaultdict(set)); see the sketch below.
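
For instance, a minimal sketch of the list-leaf variant (the keys here are made up for illustration):

from collections import defaultdict

d = defaultdict(lambda: defaultdict(list))
d['fruits']['red'].append('apple')   # both levels are created on first access
d['fruits']['red'].append('cherry')

print(dict(d['fruits']))  # {'red': ['apple', 'cherry']}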

Answer 2

There is a nifty trick for doing that:

tree = lambda: defaultdict(tree)

Then you can create your x with x = tree().
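
A quick usage sketch (json.dumps is used only to visualize the nesting; the keys are arbitrary):

import json
from collections import defaultdict

tree = lambda: defaultdict(tree)
x = tree()
x['a']['b']['c'] = 7   # intermediate levels spring into existence
print(json.dumps(x))   # {"a": {"b": {"c": 7}}}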


Answer 3

Similar to BrenBarn’s solution, but doesn’t contain the name of the variable tree twice, so it works even after changes to the variable dictionary:

tree = (lambda f: f(f))(lambda a: (lambda: defaultdict(a(a))))

Then you can create each new x with x = tree().


For the def version, we can use function closure scope to protect the data structure from the flaw where existing instances stop working if the tree name is rebound. It looks like this:

from collections import defaultdict

def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
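
To see the flaw this guards against, here is a small demonstration (illustrative only) of what goes wrong with the plain one-liner when the name is rebound:

from collections import defaultdict

tree = lambda: defaultdict(tree)
t = tree()
tree = "something else"   # rebind the name the lambda looks up at call time
t['a']                    # TypeError: first argument must be callable or None

The def version keeps working because the_tree is resolved in the enclosing scope, which nothing outside the factory can rebind.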

Answer 4

I would also propose a more OOP-styled implementation, which supports infinite nesting as well as a properly formatted repr.

class NestedDefaultDict(defaultdict):
    def __init__(self, *args, **kwargs):
        super(NestedDefaultDict, self).__init__(NestedDefaultDict, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))

Usage:

my_dict = NestedDefaultDict()
my_dict['a']['b'] = 1
my_dict['a']['c']['d'] = 2
my_dict['b']

print(my_dict)  # {'a': {'b': 1, 'c': {'d': 2}}, 'b': {}}

Answer 5

Here is a recursive function to convert a recursive defaultdict to a normal dict:

def defdict_to_dict(defdict, finaldict):
    # pass in an empty dict for finaldict
    for k, v in defdict.items():
        if isinstance(v, defaultdict):
            # new level created and that is the new value
            finaldict[k] = defdict_to_dict(v, {})
        else:
            finaldict[k] = v
    return finaldict

defdict_to_dict(my_rec_default_dict, {})
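
A usage sketch, building on the rec_dd recursive defaultdict from answer 0 (the data is made up):

from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

my_rec_default_dict = rec_dd()
my_rec_default_dict['a']['b'] = 1

plain = defdict_to_dict(my_rec_default_dict, {})
print(plain)        # {'a': {'b': 1}}
print(type(plain))  # <class 'dict'>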

Answer 6

I based this off Andrew’s answer here. If you are looking to load data from JSON or an existing dict into the nested defaultdict, see this example:

def nested_defaultdict(existing=None, **kwargs):
    if existing is None:
        existing = {}
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_defaultdict(val) for key, val in existing.items()}
    return defaultdict(nested_defaultdict, existing, **kwargs)

https://gist.github.com/nucklehead/2d29628bb49115f3c30e78c071207775
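
A quick sketch of what this enables (the data is made up):

from collections import defaultdict

existing = {'a': {'b': 1}}
d = nested_defaultdict(existing)

print(d['a']['b'])    # 1 -- loaded from the existing dict
d['x']['y']['z'] = 2  # new branches still auto-create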


Recursive subfolder search and return files in a list (Python)

Question: Recursive subfolder search and return files in a list (Python)

I am working on a script to recursively go through subfolders in a main folder and build a list of a certain file type. I am having an issue with the script. It’s currently set up as follows:

for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,subFolder,item))

The problem is that the subFolder variable is pulling in a list of subfolders rather than the folder that the ITEM file is located in. I was thinking of running a for loop for the subfolders beforehand and joining the first part of the path, but I figured I’d double-check to see if anyone had any suggestions first. Thanks for your help!


Answer 0

You should be using the dirpath, which you call root. The dirnames are supplied so you can prune them if there are folders that you don’t wish os.walk to recurse into (see the sketch after the code below).

import os
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames if os.path.splitext(f)[1] == '.txt']
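
As a sketch of the pruning mentioned above: os.walk yields dirnames before descending, so mutating that list in place keeps the walk from entering the removed folders (the '.git' name is just an example):

import os

result = []
for dp, dn, filenames in os.walk(PATH):
    dn[:] = [d for d in dn if d != '.git']   # prune in place; os.walk skips these
    result.extend(os.path.join(dp, f) for f in filenames
                  if os.path.splitext(f)[1] == '.txt')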

Edit:

After the latest downvote, it occurred to me that glob is a better tool for selecting by extension.

import os
from glob import glob
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

Also a generator version

from itertools import chain
result = (chain.from_iterable(glob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))

Edit2 for Python 3.4+

from pathlib import Path
result = list(Path(".").rglob("*.[tT][xX][tT]"))

Answer 1

Changed in Python 3.5: Support for recursive globs using “**”.

glob.glob() got a new recursive parameter.

If you want to get every .txt file under my_path (recursively including subdirs):

import glob

files = glob.glob(my_path + '/**/*.txt', recursive=True)

# my_path/     the dir
# **/       every file and dir under my_path
# *.txt     every file that ends with '.txt'

If you need an iterator you can use iglob as an alternative:

for file in glob.iglob(my_path + '/**/*.txt', recursive=True):
    # ...

Answer 2

I will translate John La Rooy’s list comprehension to nested for’s, just in case anyone else has trouble understanding it.

result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]

Should be equivalent to:

import glob

result = []

for x in os.walk(PATH):
    for y in glob.glob(os.path.join(x[0], '*.txt')):
        result.append(y)

Here’s the documentation for list comprehension and the functions os.walk and glob.glob.


Answer 3

This seems to be the fastest solution I could come up with, and is faster than os.walk and a lot faster than any glob solution.

  • It will also give you a list of all nested subfolders at basically no cost.
  • You can search for several different extensions.
  • You can also choose to return either full paths or just the names for the files by changing f.path to f.name (do not change it for subfolders!).

Args: dir: str, ext: list.
Function returns two lists: subfolders, files.

See below for a detailed speed analysis.

def run_fast_scandir(dir, ext):    # dir: str, ext: list
    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files


subfolders, files = run_fast_scandir(folder, [".jpg"])


Speed analysis

for various methods to get all files with a specific file extension inside all subfolders and the main folder.

tl;dr:
fast_scandir clearly wins and is twice as fast as all other solutions, except os.walk.
os.walk takes second place, slightly slower.
– using glob will greatly slow down the process.
– None of the results use natural sorting. This means results will be sorted like this: 1, 10, 2. To get natural sorting (1, 2, 10), please have a look at https://stackoverflow.com/a/48030307/2441026


Results:

fast_scandir    took  499 ms. Found files: 16596. Found subfolders: 439
os.walk         took  589 ms. Found files: 16596
find_files      took  919 ms. Found files: 16596
glob.iglob      took  998 ms. Found files: 16596
glob.glob       took 1002 ms. Found files: 16596
pathlib.rglob   took 1041 ms. Found files: 16596
os.walk-glob    took 1043 ms. Found files: 16596

Tests were done with W7x64, Python 3.8.1, 20 runs. 16596 files in 439 (partially nested) subfolders.
find_files is from https://stackoverflow.com/a/45646357/2441026 and lets you search for several extensions.
fast_scandir was written by myself and will also return a list of subfolders. You can give it a list of extensions to search for (I tested a one-entry list against a simple if ... == ".jpg" and there was no significant difference).


# -*- coding: utf-8 -*-
# Python 3


import time
import os
from glob import glob, iglob
from pathlib import Path


directory = r"<folder>"
RUNS = 20


def run_os_walk():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [os.path.join(dp, f) for dp, dn, filenames in os.walk(directory) for f in filenames if
                  os.path.splitext(f)[1].lower() == '.jpg']
    print(f"os.walk\t\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_os_walk_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = [y for x in os.walk(directory) for y in glob(os.path.join(x[0], '*.jpg'))]
    print(f"os.walk-glob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_glob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = glob(os.path.join(directory, '**', '*.jpg'), recursive=True)
    print(f"glob.glob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_iglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(iglob(os.path.join(directory, '**', '*.jpg'), recursive=True))
    print(f"glob.iglob\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def run_pathlib_rglob():
    a = time.time_ns()
    for i in range(RUNS):
        fu = list(Path(directory).rglob("*.jpg"))
    print(f"pathlib.rglob\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(fu)}")


def find_files(files, dirs=[], extensions=[]):
    # https://stackoverflow.com/a/45646357/2441026

    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            if os.path.splitext(d)[1].lower() in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return


def run_fast_scandir(dir, ext):    # dir: str, ext: list
    # https://stackoverflow.com/a/59803793/2441026

    subfolders, files = [], []

    for f in os.scandir(dir):
        if f.is_dir():
            subfolders.append(f.path)
        if f.is_file():
            if os.path.splitext(f.name)[1].lower() in ext:
                files.append(f.path)


    for dir in list(subfolders):
        sf, f = run_fast_scandir(dir, ext)
        subfolders.extend(sf)
        files.extend(f)
    return subfolders, files



if __name__ == '__main__':
    run_os_walk()
    run_os_walk_glob()
    run_glob()
    run_iglob()
    run_pathlib_rglob()


    a = time.time_ns()
    for i in range(RUNS):
        files = []
        find_files(files, dirs=[directory], extensions=[".jpg"])
    print(f"find_files\t\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}")


    a = time.time_ns()
    for i in range(RUNS):
        subf, files = run_fast_scandir(directory, [".jpg"])
    print(f"fast_scandir\ttook {(time.time_ns() - a) / 1000 / 1000 / RUNS:.0f} ms. Found files: {len(files)}. Found subfolders: {len(subf)}")

Answer 4

The new pathlib library simplifies this to one line:

from pathlib import Path
result = list(Path(PATH).glob('**/*.txt'))

You can also use the generator version:

from pathlib import Path
for file in Path(PATH).glob('**/*.txt'):
    pass

This returns Path objects, which you can use for pretty much anything, or get the file name as a string via file.name.
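
For example, each result is a Path object with useful attributes (assuming the same PATH as above):

from pathlib import Path

for p in Path(PATH).glob('**/*.txt'):
    print(p.name)    # file name as a string
    print(p.parent)  # containing directory, also a Path
    print(str(p))    # the whole path as a plain string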


Answer 5

It’s not the most Pythonic answer, but I’ll put it here for fun because it’s a neat lesson in recursion:

import os

def find_files( files, dirs=[], extensions=[]):
    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [ os.path.join(d, f) for f in os.listdir(d) ]
        except OSError:
            # os.listdir failed, so d is a file rather than a directory
            if os.path.splitext(d)[1] in extensions:
                files.append(d)

    if new_dirs:
        find_files(files, new_dirs, extensions )
    else:
        return

On my machine I have two folders, root and root2

mender@multivax ]ls -R root root2
root:
temp1 temp2

root/temp1:
temp1.1 temp1.2

root/temp1/temp1.1:
f1.mid

root/temp1/temp1.2:
f.mi  f.mid

root/temp2:
tmp.mid

root2:
dummie.txt temp3

root2/temp3:
song.mid

Let’s say I want to find all .txt and all .mid files in either of these directories; then I can just do:

files = []
find_files( files, dirs=['root','root2'], extensions=['.mid','.txt'] )
print(files)

#['root2/dummie.txt',
# 'root/temp2/tmp.mid',
# 'root2/temp3/song.mid',
# 'root/temp1/temp1.1/f1.mid',
# 'root/temp1/temp1.2/f.mid']

Answer 6

Recursive globbing is new in Python 3.5, so it won’t work on Python 2.7. Here is an example that uses raw strings, so you can provide the path as-is on either Windows or Linux:

import glob

mypath=r"C:\Users\dj\Desktop\nba"

files = glob.glob(mypath + r'\**\*.py', recursive=True)
# print(files) # as list
for f in files:
    print(f) # nice looking single line per file

Note: it will list all matching files, no matter how deeply they are nested.


Answer 7

You can do it this way to return a list of absolute file paths.

def list_files_recursive(path):
    """
    Receives a directory path as a parameter.
    :return: a list of the files found, with their paths
    """

    import os

    files = []

    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            files.append(os.path.join(r, file))

    return files


if __name__ == '__main__':

    result = list_files_recursive('/tmp')
    print(result)


Answer 8

If you don’t mind installing an additional lightweight library, you can do this:

pip install plazy

Usage:

import plazy

txt_filter = lambda x : True if x.endswith('.txt') else False
files = plazy.list_files(root='data', filter_func=txt_filter, is_include_root=True)

The result should look something like this:

['data/a.txt', 'data/b.txt', 'data/sub_dir/c.txt']

It works on both Python 2.7 and Python 3.

Github: https://github.com/kyzas/plazy#list-files

Disclaimer: I’m the author of plazy.


Answer 9

This function will recursively put only files into a list.

import os


def ls_files(dir):
    files = list()
    for item in os.listdir(dir):
        abspath = os.path.join(dir, item)
        try:
            if os.path.isdir(abspath):
                files = files + ls_files(abspath)
            else:
                files.append(abspath)
        except FileNotFoundError as err:
            print('invalid directory\n', 'Error: ', err)
    return files

Answer 10

Your original solution was very nearly correct, but the variable “root” is dynamically updated as os.walk recursively descends into directories. os.walk() is a recursive generator; each (root, subFolder, files) tuple is for a specific root, the way you have it set up.

i.e.

root = 'C:\\'
subFolder = ['Users', 'ProgramFiles', 'ProgramFiles (x86)', 'Windows', ...]
files = ['foo1.txt', 'foo2.txt', 'foo3.txt', ...]

root = 'C:\\Users\\'
subFolder = ['UserAccount1', 'UserAccount2', ...]
files = ['bar1.txt', 'bar2.txt', 'bar3.txt', ...]

...

I made a slight tweak to your code to print a full list.

import os
for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt") :
            fileNamePath = str(os.path.join(root,item))
            print(fileNamePath)

Hope this helps!


How to implement __getattribute__ without an infinite recursion error?

Question: How to implement __getattribute__ without an infinite recursion error?

I want to override access to one variable in a class, but return all others normally. How do I accomplish this with __getattribute__?

I tried the following (which should also illustrate what I’m trying to do) but I get a recursion error:

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:
            return self.__dict__[name]

>>> print D().test
0.0
>>> print D().test2
...
RuntimeError: maximum recursion depth exceeded in cmp

Answer 0

You get a recursion error because your attempt to access the self.__dict__ attribute inside __getattribute__ invokes your __getattribute__ again. If you use object’s __getattribute__ instead, it works:

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:
            return object.__getattribute__(self, name)

This works because object (in this example) is the base class. By calling the base version of __getattribute__ you avoid the recursive hell you were in before.

IPython output with the code in foo.py:

In [1]: from foo import *

In [2]: d = D()

In [3]: d.test
Out[3]: 0.0

In [4]: d.test2
Out[4]: 21

Update:

There’s something in the section titled More attribute access for new-style classes in the current documentation, where they recommend doing exactly this to avoid the infinite recursion.


Answer 1

Actually, I believe you want to use the __getattr__ special method instead.

Quote from the Python docs:

__getattr__( self, name)

Called when an attribute lookup has not found the attribute in the usual places (i.e. it is not an instance attribute nor is it found in the class tree for self). name is the attribute name. This method should return the (computed) attribute value or raise an AttributeError exception.
Note that if the attribute is found through the normal mechanism, __getattr__() is not called. (This is an intentional asymmetry between __getattr__() and __setattr__().) This is done both for efficiency reasons and because otherwise __setattr__() would have no way to access other attributes of the instance. Note that at least for instance variables, you can fake total control by not inserting any values in the instance attribute dictionary (but instead inserting them in another object). See the __getattribute__() method below for a way to actually get total control in new-style classes.

Note: for this to work, the instance should not have a test attribute, so the line self.test=20 should be removed.
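
A minimal sketch of that approach (note that test is intentionally not set in __init__, per the note above):

class D(object):
    def __init__(self):
        self.test2 = 21

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name == 'test':
            return 0.
        raise AttributeError(name)

>>> d = D()
>>> d.test
0.0
>>> d.test2
21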


Answer 2

Python language reference:

In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).

Meaning:

def __getattribute__(self,name):
    ...
        return self.__dict__[name]

You’re calling for an attribute called __dict__. Because it’s an attribute, __getattribute__ gets called in search for __dict__ which calls __getattribute__ which calls … yada yada yada

return  object.__getattribute__(self, name)

Using the base class’s __getattribute__ helps find the real attribute.


Answer 3

Are you sure you want to use __getattribute__? What are you actually trying to achieve?

The easiest way to do what you ask is:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21

    test = 0

or:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21

    @property
    def test(self):
        return 0

Edit: Note that an instance of D would have different values of test in each case. In the first case d.test would be 20, in the second it would be 0. I’ll leave it to you to work out why.

Edit2: Greg pointed out that example 2 will fail because the property is read only and the __init__ method tried to set it to 20. A more complete example for that would be:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21

    _test = 0

    def get_test(self):
        return self._test

    def set_test(self, value):
        self._test = value

    test = property(get_test, set_test)

Obviously, as a class this is almost entirely useless, but it gives you an idea to move on from.


Answer 4

Here is a more reliable version:

class D(object):
    def __init__(self):
        self.test = 20
        self.test2 = 21
    def __getattribute__(self, name):
        if name == 'test':
            return 0.
        else:
            return super(D, self).__getattribute__(name)

It calls the __getattribute__ method of the parent class, eventually falling back to the object.__getattribute__ method if no other ancestor overrides it.
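
In Python 3, the zero-argument form of super() does the same thing a bit more concisely:

class D:
    def __init__(self):
        self.test = 20
        self.test2 = 21
    def __getattribute__(self, name):
        if name == 'test':
            return 0.
        return super().__getattribute__(name)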


Answer 5

How is the __getattribute__ method used?

It is called before the normal dotted lookup. If it raises AttributeError, then we call __getattr__.

Use of this method is rather rare. There are only two definitions in the standard library:

$ grep -Erl  "def __getattribute__\(self" cpython/Lib | grep -v "/test/"
cpython/Lib/_threading_local.py
cpython/Lib/importlib/util.py

Best Practice

The proper way to programmatically control access to a single attribute is with property. Class D should be written as follows (with the setter and deleter optional, to replicate the apparent intended behavior):

class D(object):
    def __init__(self):
        self.test2=21

    @property
    def test(self):
        return 0.

    @test.setter
    def test(self, value):
        '''dummy function to avoid AttributeError on setting property'''

    @test.deleter
    def test(self):
        '''dummy function to avoid AttributeError on deleting property'''

And usage:

>>> o = D()
>>> o.test
0.0
>>> o.test = 'foo'
>>> o.test
0.0
>>> del o.test
>>> o.test
0.0

A property is a data descriptor, thus it is the first thing looked for in the normal dotted lookup algorithm.

Options for __getattribute__

You have several options if you absolutely need to implement lookup for every attribute via __getattribute__:

  • raise AttributeError, causing __getattr__ to be called (if implemented)
  • return something from it by
    • using super to call the parent (probably object’s) implementation
    • calling __getattr__
    • implementing your own dotted lookup algorithm somehow

For example:

class NoisyAttributes(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self, name):
        print('getting: ' + name)
        try:
            return super(NoisyAttributes, self).__getattribute__(name)
        except AttributeError:
            print('oh no, AttributeError caught and reraising')
            raise
    def __getattr__(self, name):
        """Called if __getattribute__ raises AttributeError"""
        return 'close but no ' + name    


>>> n = NoisyAttributes()
>>> nfoo = n.foo
getting: foo
oh no, AttributeError caught and reraising
>>> nfoo
'close but no foo'
>>> n.test
getting: test
20

What you originally wanted.

And this example shows how you might do what you originally wanted:

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:
            return super(D, self).__getattribute__(name)

And will behave like this:

>>> o = D()
>>> o.test = 'foo'
>>> o.test
0.0
>>> del o.test
>>> o.test
0.0
>>> del o.test

Traceback (most recent call last):
  File "<pyshell#216>", line 1, in <module>
    del o.test
AttributeError: test

Code review

Your code with comments. You have a dotted lookup on self in __getattribute__. This is why you get a recursion error. You could check whether name is "__dict__" and use super as a workaround, but that doesn’t cover __slots__. I’ll leave that as an exercise for the reader.

class D(object):
    def __init__(self):
        self.test=20
        self.test2=21
    def __getattribute__(self,name):
        if name=='test':
            return 0.
        else:      #   v--- Dotted lookup on self in __getattribute__
            return self.__dict__[name]

>>> print D().test
0.0
>>> print D().test2
...
RuntimeError: maximum recursion depth exceeded in cmp

Python: using a recursive algorithm as a generator

Question: Python: using a recursive algorithm as a generator

Recently I wrote a function to generate certain sequences with nontrivial constraints. The problem came with a natural recursive solution. Now it happens that, even for relatively small input, the sequences are several thousands, thus I would prefer to use my algorithm as a generator instead of using it to fill a list with all the sequences.

Here is an example. Suppose we want to compute all the permutations of a string with a recursive function. The following naive algorithm takes an extra argument ‘storage’ and appends a permutation to it whenever it finds one:

def getPermutations(string, storage, prefix=""):
   if len(string) == 1:
      storage.append(prefix + string)   # <-----
   else:
      for i in range(len(string)):
         getPermutations(string[:i]+string[i+1:], storage, prefix+string[i])

storage = []
getPermutations("abcd", storage)
for permutation in storage: print permutation

(Please don’t care about inefficiency, this is only an example.)

Now I want to turn my function into a generator, i.e. to yield a permutation instead of appending it to the storage list:

def getPermutations(string, prefix=""):
   if len(string) == 1:
      yield prefix + string             # <-----
   else:
      for i in range(len(string)):
         getPermutations(string[:i]+string[i+1:], prefix+string[i])

for permutation in getPermutations("abcd"):
   print permutation

This code does not work (the function behaves like an empty generator).

Am I missing something? Is there a way to turn the above recursive algorithm into a generator without replacing it with an iterative one?


Answer 0

def getPermutations(string, prefix=""):
    if len(string) == 1:
        yield prefix + string
    else:
        for i in xrange(len(string)):
            for perm in getPermutations(string[:i] + string[i+1:], prefix+string[i]):
                yield perm

Or without an accumulator:

def getPermutations(string):
    if len(string) == 1:
        yield string
    else:
        for i in xrange(len(string)):
            for perm in getPermutations(string[:i] + string[i+1:]):
                yield string[i] + perm
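
As an aside (not part of the original answer): if you only need the permutations rather than the recursion exercise, the standard library already yields them lazily:

from itertools import permutations

for perm in permutations("abcd"):
    print("".join(perm))   # permutations() yields tuples of characters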

Answer 1

This avoids the len(string)-deep recursion, and is in general a nice way to handle generators-inside-generators:

from types import GeneratorType

def flatten(*stack):
    stack = list(stack)
    while stack:
        try: x = stack[0].next()
        except StopIteration:
            stack.pop(0)
            continue
        if isinstance(x, GeneratorType): stack.insert(0, x)
        else: yield x

def _getPermutations(string, prefix=""):
    if len(string) == 1: yield prefix + string
    else: yield (_getPermutations(string[:i]+string[i+1:], prefix+string[i])
            for i in range(len(string)))

def getPermutations(string): return flatten(_getPermutations(string))

for permutation in getPermutations("abcd"): print permutation

flatten allows us to continue progress in another generator by simply yielding it, instead of iterating through it and yielding each item manually.


Python 3.3 will add yield from to the syntax, which allows for natural delegation to a sub-generator:

def getPermutations(string, prefix=""):
    if len(string) == 1:
        yield prefix + string
    else:
        for i in range(len(string)):
            yield from getPermutations(string[:i]+string[i+1:], prefix+string[i])

Answer 2

The interior call to getPermutations — it’s a generator, too.

def getPermutations(string, prefix=""):
   if len(string) == 1:
      yield prefix + string            
   else:
      for i in range(len(string)):
         getPermutations(string[:i]+string[i+1:], prefix+string[i])  # <-----

You need to iterate through that with a for-loop (see @MizardX posting, which edged me out by seconds!)


Does Python optimize tail recursion?

Question: Does Python optimize tail recursion?

I have the following piece of code which fails with the following error:

RuntimeError: maximum recursion depth exceeded

I attempted to rewrite this to allow for tail recursion optimization (TCO). I believe that this code should have been successful if a TCO had taken place.

def trisum(n, csum):
    if n == 0:
        return csum
    else:
        return trisum(n - 1, csum + n)

print(trisum(1000, 0))

Should I conclude that Python does not do any type of TCO, or do I just need to define it differently?


Answer 0

No, and it never will since Guido van Rossum prefers to be able to have proper tracebacks:

Tail Recursion Elimination (2009-04-22)

Final Words on Tail Calls (2009-04-27)

You can manually eliminate the recursion with a transformation like this:

>>> def trisum(n, csum):
...     while True:                     # Change recursion to a while loop
...         if n == 0:
...             return csum
...         n, csum = n - 1, csum + n   # Update parameters instead of tail recursion

>>> trisum(1000,0)
500500
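
As an aside (this is not tail-call optimization): CPython’s recursion limit is just a guard, and sys.setrecursionlimit can raise it for the recursive trisum from the question, at the cost of risking a real C-stack overflow for large inputs:

import sys

sys.setrecursionlimit(5000)   # the default is usually 1000
print(trisum(1000, 0))        # now completes: 500500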

Answer 1

I published a module performing tail-call optimization (handling both tail-recursion and continuation-passing style): https://github.com/baruchel/tco

Optimizing tail-recursion in Python

It has often been claimed that tail-recursion doesn’t suit the Pythonic way of coding and that one shouldn’t care about how to embed it in a loop. I don’t want to argue with this point of view; sometimes however I like trying or implementing new ideas as tail-recursive functions rather than with loops for various reasons (focusing on the idea rather than on the process, having twenty short functions on my screen in the same time rather than only three “Pythonic” functions, working in an interactive session rather than editing my code, etc.).

Optimizing tail-recursion in Python is in fact quite easy. While it is said to be impossible or very tricky, I think it can be achieved with elegant, short and general solutions; I even think that most of these solutions don’t use Python features otherwise than they should. Clean lambda expressions working along with very standard loops lead to quick, efficient and fully usable tools for implementing tail-recursion optimization.

As a personal convenience, I wrote a small module implementing such an optimization by two different ways. I would like to discuss here about my two main functions.

The clean way: modifying the Y combinator

The Y combinator is well known; it allows one to use lambda functions in a recursive manner, but by itself it doesn’t allow embedding recursive calls in a loop. Lambda calculus alone can’t do such a thing. A slight change in the Y combinator, however, can protect the recursive call from actually being evaluated. Evaluation can thus be delayed.

Here is the famous expression for the Y combinator:

lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))

With a very slight change, I could get:

lambda f: (lambda x: x(x))(lambda y: f(lambda *args: lambda: y(y)(*args)))

Instead of calling itself, the function f now returns a function performing the very same call, but since it returns it, the evaluation can be done later from outside.

My code is:

def bet(func):
    b = (lambda f: (lambda x: x(x))(lambda y:
          f(lambda *args: lambda: y(y)(*args))))(func)
    def wrapper(*args):
        out = b(*args)
        while callable(out):
            out = out()
        return out
    return wrapper

The function can be used in the following way; here are two examples with tail-recursive versions of factorial and Fibonacci:

>>> from recursion import *
>>> fac = bet( lambda f: lambda n, a: a if not n else f(n-1,a*n) )
>>> fac(5,1)
120
>>> fibo = bet( lambda f: lambda n,p,q: p if not n else f(n-1,q,p+q) )
>>> fibo(10,0,1)
55

Obviously recursion depth isn’t an issue any longer:

>>> bet( lambda f: lambda n: 42 if not n else f(n-1) )(50000)
42

This is of course the single real purpose of the function.

Only one thing can’t be done with this optimization: it can’t be used with a tail-recursive function evaluating to another function (this comes from the fact that callable returned objects are all handled as further recursive calls with no distinction). Since I usually don’t need such a feature, I am very happy with the code above. However, in order to provide a more general module, I thought a little more in order to find some workaround for this issue (see next section).

Concerning the speed of this process (which isn’t the real issue, however): it happens to be quite good; tail-recursive functions are even evaluated much more quickly than with the following code using simpler expressions:

def bet1(func):
    def wrapper(*args):
        out = func(lambda *x: lambda: x)(*args)
        while callable(out):
            out = func(lambda *x: lambda: x)(*out())
        return out
    return wrapper

I think that evaluating one expression, even a complicated one, is much quicker than evaluating several simple expressions, which is what happens in this second version. I didn’t keep this new function in my module, and I see no circumstances where it could be used rather than the “official” one.
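
If you want to check this claim yourself, a rough timing sketch (assuming bet and bet1 as defined above; timeit is from the standard library) could look like this:

import timeit

fac_bet = bet(lambda f: lambda n, a: a if not n else f(n - 1, a * n))
fac_bet1 = bet1(lambda f: lambda n, a: a if not n else f(n - 1, a * n))

# Same tail-recursive factorial, run through each trampoline.
print(timeit.timeit(lambda: fac_bet(100, 1), number=1000))
print(timeit.timeit(lambda: fac_bet1(100, 1), number=1000))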

Continuation passing style with exceptions

Here is a more general function; it is able to handle all tail-recursive functions, including those returning other functions. Recursive calls are distinguished from other return values by the use of exceptions. This solution is slower than the previous one; quicker code could probably be written by using some special values as “flags” to be detected in the main loop, but I don’t like the idea of using special values or internal keywords. There is an amusing interpretation of using exceptions: if Python doesn’t like tail-recursive calls, an exception should be raised when a tail-recursive call does occur, and the Pythonic way will be to catch the exception in order to find some clean solution, which is actually what happens here…

class _RecursiveCall(Exception):
    def __init__(self, *args):
        self.args = args

def _recursiveCallback(*args):
    # Raising instead of calling lets the main loop intercept the
    # "recursive call" together with its arguments.
    raise _RecursiveCall(*args)

def bet0(func):
    def wrapper(*args):
        while True:
            try:
                return func(_recursiveCallback)(*args)
            except _RecursiveCall as e:
                # A tail call occurred: loop again with the new arguments.
                args = e.args
    return wrapper

Now all functions can be used. In the following example, f(n) evaluates to the identity function for any positive value of n:

>>> f = bet0( lambda f: lambda n: (lambda x: x) if not n else f(n-1) )
>>> f(5)(42)
42

Of course, it could be argued that exceptions are not intended to be used for intentionally redirecting the interpreter (as a kind of goto statement, or probably rather a kind of continuation-passing style), which I have to admit. But, again, I find the idea of using try with a single line being a return statement amusing: we try to return something (normal behaviour) but we can’t do it because of a recursive call occurring (the exception).

Initial answer (2013-08-29).

I wrote a very small plugin for handling tail recursion. You may find it, with my explanations, here: https://groups.google.com/forum/?hl=fr#!topic/comp.lang.python/dIsnJ2BoBKs

It can embed a lambda function written in a tail-recursion style in another function which will evaluate it as a loop.

The most interesting feature of this small function, in my humble opinion, is that it doesn’t rely on some dirty programming hack but on mere lambda calculus: the behaviour of the function is changed into another one when it is inserted in another lambda function, which looks very much like the Y combinator.


Answer 2


The word of Guido is at http://neopythonic.blogspot.co.uk/2009/04/tail-recursion-elimination.html

I recently posted an entry in my Python History blog on the origins of Python’s functional features. A side remark about not supporting tail recursion elimination (TRE) immediately sparked several comments about what a pity it is that Python doesn’t do this, including links to recent blog entries by others trying to “prove” that TRE can be added to Python easily. So let me defend my position (which is that I don’t want TRE in the language). If you want a short answer, it’s simply unpythonic. Here’s the long answer:


Answer 3


CPython does not and will probably never support tail call optimization based on Guido van Rossum’s statements on the subject.

I’ve heard arguments that it makes debugging more difficult because of how it modifies the stack trace.


Answer 4


Try the experimental macropy TCO implementation for size.


Answer 5


Besides optimizing tail recursion, you can also set the recursion limit manually:

import sys
sys.setrecursionlimit(5500000)
print("recursion limit:%d " % (sys.getrecursionlimit()))

What is the maximum recursion depth in Python, and how can it be increased?

Question: What is the maximum recursion depth in Python, and how can it be increased?


I have this tail recursive function here:

def recursive_function(n, sum):
    if n < 1:
        return sum
    else:
        return recursive_function(n-1, sum+n)

c = 998
print(recursive_function(c, 0))

It works up to n=997, then it just breaks and spits out a RecursionError: maximum recursion depth exceeded in comparison. Is this just a stack overflow? Is there a way to get around it?


Answer 0


It is a guard against a stack overflow, yes. Python (or rather, the CPython implementation) doesn’t optimize tail recursion, and unbridled recursion causes stack overflows. You can check the recursion limit with sys.getrecursionlimit and change the recursion limit with sys.setrecursionlimit, but doing so is dangerous — the standard limit is a little conservative, but Python stackframes can be quite big.

Python isn’t a functional language and tail recursion is not a particularly efficient technique. Rewriting the algorithm iteratively, if possible, is generally a better idea.


Answer 1


Looks like you just need to set a higher recursion depth:

import sys
sys.setrecursionlimit(1500)

Answer 2


It’s to avoid a stack overflow. The Python interpreter limits the depths of recursion to help you avoid infinite recursions, resulting in stack overflows. Try increasing the recursion limit (sys.setrecursionlimit) or re-writing your code without recursion.

From the Python documentation:

sys.getrecursionlimit()

Return the current value of the recursion limit, the maximum depth of the Python interpreter stack. This limit prevents infinite recursion from causing an overflow of the C stack and crashing Python. It can be set by setrecursionlimit().


Answer 3


If you often need to change the recursion limit (e.g. while solving programming puzzles) you can define a simple context manager like this:

import sys

class recursionlimit:
    def __init__(self, limit):
        self.limit = limit
        self.old_limit = sys.getrecursionlimit()

    def __enter__(self):
        sys.setrecursionlimit(self.limit)

    def __exit__(self, type, value, tb):
        sys.setrecursionlimit(self.old_limit)

Then to call a function with a custom limit you can do:

with recursionlimit(1500):
    print(fib(1000, 0))

On exit from the body of the with statement, the recursion limit is restored to its previous value.
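
If you prefer, the same helper can be written with contextlib; this is a sketch equivalent to the class above:

import sys
from contextlib import contextmanager

@contextmanager
def recursionlimit(limit):
    # Temporarily raise the recursion limit, restoring the old value
    # on exit even if the body raises.
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)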


Answer 4


Use a language that guarantees tail-call optimisation. Or use iteration. Alternatively, get cute with decorators.
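
One hypothetical way of “getting cute with decorators” is a small trampoline: the decorated function describes its tail call as data instead of performing it, and the wrapper replays the calls in a loop (much like the thunks in the answers above):

def trampoline(func):
    def wrapper(*args):
        result = func(*args)
        # Keep re-invoking func as long as it returns a
        # (wrapper, args...) tuple describing the next tail call.
        while isinstance(result, tuple) and result and result[0] is wrapper:
            result = func(*result[1:])
        return result
    return wrapper

@trampoline
def countdown(n):
    if n == 0:
        return "done"
    return (countdown, n - 1)  # tail call, expressed as data

print(countdown(100000))  # "done", with no RecursionError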


Answer 5


resource.setrlimit must also be used to increase the stack size and prevent segfault

The Linux kernel limits the stack of processes.

Python stores local variables on the stack of the interpreter, and so recursion takes up stack space of the interpreter.

If the Python interpreter tries to go over the stack limit, the Linux kernel terminates it with a segmentation fault.

The stack limit size is controlled with the getrlimit and setrlimit system calls.

Python offers access to those system calls through the resource module.

import resource
import sys

print resource.getrlimit(resource.RLIMIT_STACK)
print sys.getrecursionlimit()
print

# Will segfault without this line.
resource.setrlimit(resource.RLIMIT_STACK, [0x10000000, resource.RLIM_INFINITY])
sys.setrecursionlimit(0x100000)

def f(i):
    print i
    sys.stdout.flush()
    f(i + 1)
f(0)

Of course, if you keep increasing ulimit, your RAM will run out, which will either slow your computer to a halt due to swap madness, or kill Python via the OOM Killer.

From bash, you can see and set the stack limit (in kb) with:

ulimit -s
ulimit -s 10000

The default value for me is 8 MB.

Tested on Ubuntu 16.10, Python 2.7.12.
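
For reference, here is a sketch of the same snippet in Python 3 syntax (like the original, it is Linux-only):

import resource
import sys

print(resource.getrlimit(resource.RLIMIT_STACK))
print(sys.getrecursionlimit())

# Will segfault without this line.
resource.setrlimit(resource.RLIMIT_STACK, (0x10000000, resource.RLIM_INFINITY))
sys.setrecursionlimit(0x100000)

def f(i):
    print(i)
    sys.stdout.flush()
    f(i + 1)

f(0)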


Answer 6


I realize this is an old question but for those reading, I would recommend against using recursion for problems such as this – lists are much faster and avoid recursion entirely. I would implement this as:

def fibonacci(n):
    f = [0,1,1]
    for i in xrange(3,n):
        f.append(f[i-1] + f[i-2])
    return 'The %.0fth fibonacci number is: %.0f' % (n,f[-1])

(Use n+1 in xrange if you start counting your fibonacci sequence from 0 instead of 1.)


Answer 7


Of course Fibonacci numbers can be computed in O(n) by applying the Binet formula:

from math import floor, sqrt

def fib(n):                                                     
    return int(floor(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5))+0.5))

As the commenters note, it’s not O(1) but O(n), because of 2**n. Another difference is that you only get one value, whereas with recursion you get all values of Fibonacci(n) up to that value. Note also that the formula relies on floating-point arithmetic, so its results become inaccurate once n grows beyond roughly 70.


Answer 8


I had a similar issue with the error “Max recursion depth exceeded”. I discovered the error was being triggered by a corrupt file in the directory I was looping over with os.walk. If you have trouble solving this issue and you are working with file paths, be sure to narrow it down, as it might be a corrupt file.


Answer 9


If you want to get only few Fibonacci numbers, you can use matrix method.

from numpy import matrix

def fib(n):
    return (matrix('0 1; 1 1', dtype='object') ** n).item(1)

It’s fast as numpy uses fast exponentiation algorithm. You get answer in O(log n). And it’s better than Binet’s formula because it uses only integers. But if you want all Fibonacci numbers up to n, then it’s better to do it by memorisation.


Answer 10


Use generators?

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fibs = fib()  # bind the infinite generator to a variable so that each
              # next() call keeps advancing the same generator

f = [fibs.next() for x in xrange(1001)]

for num in f:
        print num

The fib() function above was adapted from: http://intermediatepythonista.com/python-generators


Answer 11


As @alex suggested, you could use a generator function to do this sequentially instead of recursively.

Here’s the equivalent of the code in your question:

def fib(n):
    def fibseq(n):
        """ Iteratively return the first n Fibonacci numbers, starting from 0. """
        a, b = 0, 1
        for _ in xrange(n):
            yield a
            a, b = b, a + b

    return sum(v for v in fibseq(n))

print format(fib(100000), ',d')  # -> no recursion depth error

Answer 12


Many people recommend increasing the recursion limit as a good solution; however, it is not, because there will always be a limit. Instead, use an iterative solution.

def fib(n):
    a,b = 1,1
    for i in range(n-1):
        a,b = b,a+b
    return a
print fib(5)

Answer 13


I wanted to give you an example of using memoization to compute Fibonacci, as this will allow you to compute significantly larger numbers using recursion:

cache = {}
def fib_dp(n):
    if n in cache:
        return cache[n]
    if n == 0: return 0
    elif n == 1: return 1
    else:
        value = fib_dp(n-1) + fib_dp(n-2)
    cache[n] = value
    return value

print(fib_dp(998))

This is still recursive, but it uses a simple hash table that allows previously calculated Fibonacci numbers to be reused instead of recomputed.


Answer 14

import sys
sys.setrecursionlimit(1500)

def fib(n, sum):
    if n < 1:
        return sum
    else:
        return fib(n-1, sum+n)

c = 998
print(fib(c, 0))

Answer 15


We can do that using the @lru_cache decorator and the setrecursionlimit() method:

import sys
from functools import lru_cache

sys.setrecursionlimit(15000)


@lru_cache(128)
def fib(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1

    return fib(n - 2) + fib(n - 1)


print(fib(14000))

Output

3002468761178461090995494179715025648692747937490792943468375429502230242942284835863402333575216217865811638730389352239181342307756720414619391217798542575996541081060501905302157019002614964717310808809478675602711440361241500732699145834377856326394037071666274321657305320804055307021019793251762830816701587386994888032362232198219843549865275880699612359275125243457132496772854886508703396643365042454333009802006384286859581649296390803003232654898464561589234445139863242606285711591746222880807391057211912655818499798720987302540712067959840802106849776547522247429904618357394771725653253559346195282601285019169360207355179223814857106405285007997547692546378757062999581657867188420995770650565521377874333085963123444258953052751461206977615079511435862879678439081175536265576977106865074099512897235100538241196445815568291377846656352979228098911566675956525644182645608178603837172227838896725425605719942300037650526231486881066037397866942013838296769284745527778439272995067231492069369130289154753132313883294398593507873555667211005422003204156154859031529462152953119957597195735953686798871131148255050140450845034240095305094449911578598539658855704158240221809528010179414493499583473568873253067921639513996596738275817909624857593693291980841303291145613566466575233283651420134915764961372875933822262953420444548349180436583183291944875599477240814774580187144637965487250578134990402443365677985388481961492444981994523034245619781853365476552719460960795929666883665704293897310201276011658074359194189359660792496027472226428571547971602259808697441435358578480589837766911684200275636889192254762678512597000452676191374475932796663842865744658264924913771676415404179920096074751516422872997665425047457428327276230059296132722787915300105002019006293320082955378715908263653377755031155794063450515731009402407584683132870206376994025920790298591144213659942668622062191441346200098342943955169522532574271644954360217472458521489671859465232568419404182043966092211744372699797375966048010775453444600153524772238401414789562651410289808994960533132759532092895779406940925252906166612153699850759933762897947175972147868784008320247586210378556711332739463277940255289047962323306946068381887446046387745247925675240182981190836264964640612069909458682443392729946084099312047752966806439331403663934969942958022237945205992581178803606156982034385347182766573351768749665172549908638337611953199808161937885366709285043276595726484068138091188914698151703122773726725261370542355162118164302728812259192476428938730724109825922331973256105091200551566581350508061922762910078528219869913214146575557249199263634241165352226570749618907050553115468306669184485910269806225894530809823102279231750061652042560772530576713148647858705369649642907780603247428680176236527220826640665659902650188140474762163503557640566711903907798932853656216227739411210513756695569391593763704981001125

Source

functools lru_cache
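
As an aside, one alternative to such a high recursion limit (a sketch for the fib defined above) is to warm the cache bottom-up in steps smaller than the limit, so that each call only recurses a short distance before reaching already-cached values:

# Hypothetical cache-warming loop for the fib defined above.
for i in range(0, 14001, 500):
    fib(i)
print(fib(14000))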


Answer 16


We could also use a variation of the dynamic-programming bottom-up approach:

def fib_bottom_up(n):
    bottom_up = [None] * (n+1)
    bottom_up[0] = 1
    bottom_up[1] = 1

    for i in range(2, n+1):
        bottom_up[i] = bottom_up[i-1] + bottom_up[i-2]

    return bottom_up[n]

print(fib_bottom_up(20000))