Tag Archives: caching

Is there a Python caching library?

Question: Is there a Python caching library?

I’m looking for a Python caching library but can’t find anything so far. I need a simple dict-like interface where I can set keys and their expiration and get them back cached. Sort of something like:

cache.get(myfunction, duration=300)

which will give me the item from the cache if it exists or call the function and store it if it doesn’t or has expired. Does anyone know something like this?
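For reference, the interface described above can be sketched in a few lines with a plain dict and timestamps. Everything here (the `SimpleCache` name, keying entries by function name) is illustrative, not taken from any particular library:

```python
import time

class SimpleCache:
    """Tiny get-or-compute cache with per-entry expiration (illustrative only)."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, fn, duration=300):
        key = fn.__name__
        value, expires_at = self._store.get(key, (None, 0.0))
        if time.time() >= expires_at:
            # missing or expired: recompute and store
            value = fn()
            self._store[key] = (value, time.time() + duration)
        return value

cache = SimpleCache()

def myfunction():
    return 42

print(cache.get(myfunction, duration=300))  # computes and stores 42
print(cache.get(myfunction, duration=300))  # served from the cache
```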


Answer 0


Answer 1
From Python 3.2 you can use the decorator @lru_cache from the functools library. It’s a Least Recently Used cache, so there is no expiration time for the items in it, but as a fast hack it’s very useful.

from functools import lru_cache

@lru_cache(maxsize=256)
def f(x):
  return x*x

for x in range(20):
  print(f(x))
for x in range(20):
  print(f(x))

Answer 2

You might also take a look at the Memoize decorator. You could probably get it to do what you want without too much modification.


Answer 3

Joblib https://joblib.readthedocs.io supports caching functions in the Memoize pattern. Mostly, the idea is to cache computationally expensive functions.

>>> from joblib import Memory
>>> mem = Memory(cachedir='/tmp/joblib')
>>> import numpy as np
>>> square = mem.cache(np.square)
>>> 
>>> a = np.vander(np.arange(3)).astype(np.float)
>>> b = square(a)                                   
________________________________________________________________________________
[Memory] Calling square...
square(array([[ 0.,  0.,  1.],
       [ 1.,  1.,  1.],
       [ 4.,  2.,  1.]]))
___________________________________________________________square - 0...s, 0.0min

>>> c = square(a)

You can also do fancy things like using the @memory.cache decorator on functions. The documentation is here: https://joblib.readthedocs.io/en/latest/generated/joblib.Memory.html


Answer 4

No one has mentioned shelve yet. https://docs.python.org/2/library/shelve.html

It isn’t memcached, but looks much simpler and might fit your need.
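As a sketch of what that could look like, shelve gives a persistent, dict-like store backed by a file (the path and key names below are arbitrary; there is no built-in expiration):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cache")

# write a value into the shelf; it is pickled to disk
with shelve.open(path) as db:
    db["answer"] = 42

# values survive reopening the shelf (and the process)
with shelve.open(path) as db:
    print(db["answer"])  # 42
```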


Answer 5

I think the python memcached API is the prevalent tool, but I haven’t used it myself and am not sure whether it supports the features you need.


Answer 6

import time

class CachedItem(object):
    def __init__(self, key, value, duration=60):
        self.key = key
        self.value = value
        self.duration = duration
        self.timeStamp = time.time()

    def __repr__(self):
        return '<CachedItem {%s:%s} expires at: %s>' % (self.key, self.value, self.timeStamp + self.duration)

class CachedDict(dict):

    def get(self, key, fn, duration):
        if key not in self \
                or self[key].timeStamp + self[key].duration < time.time():
            print('adding new value')
            o = fn(key)
            self[key] = CachedItem(key, o, duration)
        else:
            print('loading from cache')

        return self[key].value


if __name__ == '__main__':

    fn = lambda key: 'value of %s is None' % key

    ci = CachedItem('a', 12)
    print(ci)
    cd = CachedDict()
    print(cd.get('a', fn, 5))
    time.sleep(2)
    print(cd.get('a', fn, 6))
    print(cd.get('b', fn, 6))
    time.sleep(2)
    print(cd.get('a', fn, 7))
    print(cd.get('b', fn, 7))

Answer 7

You can use my simple solution to the problem. It is really straightforward, nothing fancy:

class MemCache(dict):
    def __init__(self, fn):
        dict.__init__(self)
        self.__fn = fn

    def __getitem__(self, item):
        if item not in self:
            dict.__setitem__(self, item, self.__fn(item))
        return dict.__getitem__(self, item)

mc = MemCache(lambda x: x*x)

for x in range(10):
    print(mc[x])

for x in range(10):
    print(mc[x])

It indeed lacks expiration functionality, but you can easily extend it by specifying a particular rule in the MemCache constructor.

Hopefully the code is self-explanatory, but just in case: the cache is passed a translation function as one of its constructor parameters, which is then used to generate the cached output for a given input.

Hope it helps


Answer 8

Try redis; it is one of the cleanest and easiest solutions for applications to share data in an atomic way, or if you have some kind of web server platform. It’s very easy to set up; you will need a python redis client: http://pypi.python.org/pypi/redis


Answer 9

Look at gocept.cache on PyPI; it manages timeouts.


Answer 10

This project aims to provide “Caching for humans” (seems like it’s fairly unknown though)

Some info from the project page:

Installation

pip install cache

Usage:

import pylibmc
from cache import Cache

backend = pylibmc.Client(["127.0.0.1"])

cache = Cache(backend)

@cache("mykey")
def some_expensive_method():
    sleep(10)
    return 42

# writes 42 to the cache
some_expensive_method()

# reads 42 from the cache
some_expensive_method()

# re-calculates and writes 42 to the cache
some_expensive_method.refresh()

# get the cached value or throw an error
# (unless default= was passed to @cache(...))
some_expensive_method.cached()

Answer 11

Look at bda.cache http://pypi.python.org/pypi/bda.cache – uses ZCA and is tested with zope and bfg.


Answer 12

keyring is the best python caching library. You can use

keyring.set_password("service","jsonkey",json_res)

json_res= keyring.get_password("service","jsonkey")

json_res= keyring.core.delete_password("service","jsonkey")

Is there a decorator to simply cache function return values?

Question: Is there a decorator to simply cache function return values?

Consider the following:

@property
def name(self):

    if not hasattr(self, '_name'):

        # expensive calculation
        self._name = 1 + 1

    return self._name

I’m new, but I think the caching could be factored out into a decorator. Only I didn’t find one like it ;)

PS the real calculation doesn’t depend on mutable values


Answer 0

Starting from Python 3.2 there is a built-in decorator:

@functools.lru_cache(maxsize=100, typed=False)

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.

Example of an LRU cache for computing Fibonacci numbers:

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

>>> print([fib(n) for n in range(16)])
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

>>> print(fib.cache_info())
CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)

If you are stuck with Python 2.x, here’s a list of other compatible memoization libraries:


Answer 1

It sounds like you’re not asking for a general-purpose memoization decorator (i.e., you’re not interested in the general case where you want to cache return values for different argument values). That is, you’d like to have this:

x = obj.name  # expensive
y = obj.name  # cheap

while a general-purpose memoization decorator would give you this:

x = obj.name()  # expensive
y = obj.name()  # cheap

I submit that the method-call syntax is better style, because it suggests the possibility of expensive computation while the property syntax suggests a quick lookup.

[Update: The class-based memoization decorator I had linked to and quoted here previously doesn’t work for methods. I’ve replaced it with a decorator function.] If you’re willing to use a general-purpose memoization decorator, here’s a simple one:

def memoize(function):
  memo = {}
  def wrapper(*args):
    if args in memo:
      return memo[args]
    else:
      rv = function(*args)
      memo[args] = rv
      return rv
  return wrapper

Example usage:

@memoize
def fibonacci(n):
  if n < 2: return n
  return fibonacci(n - 1) + fibonacci(n - 2)

Another memoization decorator with a limit on the cache size can be found here.
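One caveat: the wrapper above keys only on positional arguments, so calls with keyword arguments or unhashable arguments (e.g. lists) raise TypeError. A sketched extension, not from the original answer, that folds kwargs into the key and falls back to an uncached call when the arguments aren't hashable:

```python
import functools

def memoize(function):
    memo = {}

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        try:
            return memo[key]
        except TypeError:
            # unhashable arguments: just call through without caching
            return function(*args, **kwargs)
        except KeyError:
            rv = function(*args, **kwargs)
            memo[key] = rv
            return rv
    return wrapper

@memoize
def add(a, b=0):
    return a + b

print(add(1, b=2))       # 3, computed then cached
print(add([1, 2], [3]))  # [1, 2, 3], uncached (lists are unhashable)
```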


Answer 2
class memorize(dict):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self[args]

    def __missing__(self, key):
        result = self[key] = self.func(*key)
        return result

Sample uses:

>>> @memorize
... def foo(a, b):
...     return a * b
>>> foo(2, 4)
8
>>> foo
{(2, 4): 8}
>>> foo('hi', 3)
'hihihi'
>>> foo
{(2, 4): 8, ('hi', 3): 'hihihi'}

Answer 3

Python 3.8 functools.cached_property decorator

https://docs.python.org/dev/library/functools.html#functools.cached_property

cached_property from Werkzeug was mentioned at: https://stackoverflow.com/a/5295190/895245 but a supposedly derived version will be merged into 3.8, which is awesome.

This decorator can be seen as caching @property, or as a cleaner @functools.lru_cache for when you don’t have any arguments.

The docs say:

@functools.cached_property(func)

Transform a method of a class into a property whose value is computed once and then cached as a normal attribute for the life of the instance. Similar to property(), with the addition of caching. Useful for expensive computed properties of instances that are otherwise effectively immutable.

Example:

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)

New in version 3.8.

Note: This decorator requires that the __dict__ attribute on each instance be a mutable mapping. This means it will not work with some types, such as metaclasses (since the __dict__ attributes on type instances are read-only proxies for the class namespace), and those that specify __slots__ without including __dict__ as one of the defined slots (as such classes don’t provide a __dict__ attribute at all).
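On Python 3.8+, the DataSet example runs as-is once the imports are added. This self-contained version also shows that the cached value lands in the instance's __dict__, which is why the mutable-mapping requirement above matters:

```python
import statistics
from functools import cached_property

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = tuple(sequence_of_numbers)

    @cached_property
    def stdev(self):
        # computed on first access, then stored on the instance
        return statistics.stdev(self._data)

ds = DataSet([2, 4, 4, 4, 5, 5, 7, 9])
print(ds.stdev)                # computed once
print(ds.stdev)                # served from ds.__dict__
print("stdev" in ds.__dict__)  # True: the value is cached on the instance
```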


Answer 4

Werkzeug has a cached_property decorator (docs, source)


Answer 5

I coded this simple decorator class to cache function responses. I find it VERY useful for my projects:

from datetime import datetime, timedelta

class cached(object):
    def __init__(self, *args, **kwargs):
        self.cached_function_responses = {}
        # the key must match the keyword used at decoration time
        self.default_max_age = kwargs.get("default_max_age", timedelta(seconds=0))

    def __call__(self, func):
        def inner(*args, **kwargs):
            max_age = kwargs.pop('max_age', self.default_max_age)
            entry = self.cached_function_responses.get(func)
            if not max_age or entry is None or (datetime.now() - entry['fetch_time'] > max_age):
                res = func(*args, **kwargs)
                entry = {'data': res, 'fetch_time': datetime.now()}
                self.cached_function_responses[func] = entry
            return entry['data']
        return inner

The usage is straightforward:

import time
from datetime import datetime, timedelta

@cached()
def myfunc(a):
    print("in func")
    return (a, datetime.now())

@cached(default_max_age=timedelta(seconds=6))
def cacheable_test(a):
    print("in cacheable test: ")
    return (a, datetime.now())


print(cacheable_test(1, max_age=timedelta(seconds=5)))
print(cacheable_test(2, max_age=timedelta(seconds=5)))
time.sleep(7)
print(cacheable_test(3, max_age=timedelta(seconds=5)))

Answer 6

DISCLAIMER: I’m the author of kids.cache.

You should check kids.cache, it provides a @cache decorator that works on python 2 and python 3. No dependencies, ~100 lines of code. It’s very straightforward to use, for instance, with your code in mind, you could use it like this:

pip install kids.cache

Then

from kids.cache import cache
...
class MyClass(object):
    ...
    @cache            # <-- That's all you need to do
    @property
    def name(self):
        return 1 + 1  # supposedly expensive calculation

Or you could put the @cache decorator after the @property (same result).

Using cache on a property is called lazy evaluation, kids.cache can do much more (it works on function with any arguments, properties, any type of methods, and even classes…). For advanced users, kids.cache supports cachetools which provides fancy cache stores to python 2 and python 3 (LRU, LFU, TTL, RR cache).

IMPORTANT NOTE: the default cache store of kids.cache is a standard dict, which is not recommended for long-running programs with ever-different queries, as it would lead to an ever-growing cache store. For this usage you can plug in other cache stores, using for instance @cache(use=cachetools.LRUCache(maxsize=2)) to decorate your function/property/class/method…


Answer 7

Ah, just needed to find the right name for this: “Lazy property evaluation“.

I do this a lot too; maybe I’ll use that recipe in my code sometime.
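That recipe is typically a small decorator that computes the value on first access and stores it on the instance, so later accesses are plain attribute lookups. A sketch (the lazy_property name and attribute-naming scheme here are illustrative; functools.cached_property in 3.8+ does essentially this):

```python
def lazy_property(fn):
    attr = '_lazy_' + fn.__name__

    @property
    def wrapper(self):
        if not hasattr(self, attr):
            # compute once and stash the result on the instance
            setattr(self, attr, fn(self))
        return getattr(self, attr)
    return wrapper

class Person:
    calls = 0  # counts how often the "expensive" body actually runs

    @lazy_property
    def name(self):
        Person.calls += 1  # stand-in for an expensive calculation
        return 1 + 1

p = Person()
print(p.name)  # 2 (computed)
print(p.name)  # 2 (cached; the body did not run again)
```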


Answer 8

There is fastcache, which is “C implementation of Python 3 functools.lru_cache. Provides speedup of 10-30x over standard library.”

Same as chosen answer, just different import:

from fastcache import lru_cache
@lru_cache(maxsize=128, typed=False)
def f(a, b):
    pass

Also, it comes installed in Anaconda.


Answer 9

There is yet another example of a memoize decorator at Python Wiki:

http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize

That example is a bit smart, because it won’t cache the results if the parameters are mutable. (check that code, it’s very simple and interesting!)


Answer 10

If you are using the Django framework, it can cache a view or an API response using @cache_page(time), and there are other options as well.

Example:

@cache_page(60 * 15, cache="special_cache")
def my_view(request):
    ...

More details can be found here.


Answer 11

Along with the Memoize Example I found the following python packages:

  • cachepy; it allows setting a TTL and/or the number of calls for cached functions; also, one can use an encrypted file-based cache…
  • percache

Answer 12

I implemented something like this, using pickle for persistence and sha1 for short almost-certainly-unique IDs. Basically the cache hashed the code of the function together with its arguments to get a sha1, then looked for a file with that sha1 in its name. If it existed, it opened it and returned the result; if not, it called the function and saved the result (optionally only saving if it took a certain amount of time to compute).

That said, I’d swear I found an existing module that did this and find myself here trying to find that module… The closest I can find is this, which looks about right: http://chase-seibert.github.io/blog/2011/11/23/pythondjango-disk-based-caching-decorator.html

The only problem I see with that is it wouldn’t work well for large inputs since it hashes str(arg), which isn’t unique for giant arrays.

It would be nice if there were a unique_hash() protocol that had a class return a secure hash of its contents. I basically manually implemented that for the types I cared about.


Answer 13

Try joblib http://pythonhosted.org/joblib/memory.html

from joblib import Memory
memory = Memory(cachedir=cachedir, verbose=0)

@memory.cache
def f(x):
    print('Running f(%s)' % x)
    return x

Answer 14

If you are using Django and want to cache views, see Nikhil Kumar’s answer.


But if you want to cache ANY function results, you can use django-cache-utils.

It reuses Django caches and provides easy to use cached decorator:

from cache_utils.decorators import cached

@cached(60)
def foo(x, y=0):
    print('foo is called')
    return x + y

Answer 15

@lru_cache does not handle default argument values well.

My mem decorator:

import inspect


def get_default_args(f):
    signature = inspect.signature(f)
    return {
        k: v.default
        for k, v in signature.parameters.items()
        if v.default is not inspect.Parameter.empty
    }


def full_kwargs(f, kwargs):
    res = dict(get_default_args(f))
    res.update(kwargs)
    return res


def mem(func):
    cache = dict()

    def wrapper(*args, **kwargs):
        kwargs = full_kwargs(func, kwargs)
        key = list(args)
        key.extend(kwargs.values())
        key = hash(tuple(key))
        if key in cache:
            return cache[key]
        else:
            res = func(*args, **kwargs)
            cache[key] = res
            return res
    return wrapper

and code for testing:

from time import sleep


@mem
def count(a, *x, z=10):
    sleep(2)
    x = list(x)
    x.append(z)
    x.append(a)
    return sum(x)


def main():
    print(count(1,2,3,4,5))
    print(count(1,2,3,4,5))
    print(count(1,2,3,4,5, z=6))
    print(count(1,2,3,4,5, z=6))
    print(count(1))
    print(count(1, z=10))


if __name__ == '__main__':
    main()

Result: sleep runs only 3 times.

But with @lru_cache it would be 4 times, because these calls:

print(count(1))
print(count(1, z=10))

would be calculated twice (lru_cache keys on the exact call arguments, so it does not treat an omitted default and an explicitly passed default as the same call).


What is __pycache__?

Question: What is __pycache__?

From what I understand, a cache is an encrypted file of similar files.

What do we do with the __pycache__ folder? Is it what we give to people instead of our source code? Is it just my input data? This folder keeps getting created; what is it for?


Answer 0

When you run a program in python, the interpreter compiles it to bytecode first (this is an oversimplification) and stores it in the __pycache__ folder. If you look in there you will find a bunch of files sharing the names of the .py files in your project’s folder, only their extensions will be either .pyc or .pyo. These are bytecode-compiled and optimized bytecode-compiled versions of your program’s files, respectively.

As a programmer, you can largely just ignore it… All it does is make your program start a little faster. When your scripts change, they will be recompiled, and if you delete the files or the whole folder and run your program again, they will reappear (unless you specifically suppress that behavior)

If you are using cpython (which is the most common, as it’s the reference implementation) and you don’t want that folder, then you can suppress it by starting the interpreter with the -B flag, for example

python -B foo.py

Another option, as noted by tcaswell, is to set the environment variable PYTHONDONTWRITEBYTECODE to any value (according to python’s man page, any “non-empty string”).
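The same switch is also available at runtime as sys.dont_write_bytecode, which a script can set for itself before importing other modules:

```python
import sys

# Equivalent to the -B flag / PYTHONDONTWRITEBYTECODE for this process:
# subsequent imports will not write .pyc files.
sys.dont_write_bytecode = True
print(sys.dont_write_bytecode)  # True
```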


Answer 1

__pycache__ is a folder containing Python 3 bytecode compiled and ready to be executed.

I don’t recommend routinely deleting these files or suppressing creation during development as it may hurt performance. Just have a recursive command ready (see below) to clean up when needed as bytecode can become stale in edge cases (see comments).

Python programmers usually ignore bytecode. Indeed __pycache__ and *.pyc are common lines to see in .gitignore files. Bytecode is not meant for distribution and can be disassembled using dis module.


If you are using OS X you can easily hide all of these folders in your project by running following command from the root folder of your project.

find . -name '__pycache__' -exec chflags hidden {} \;

Replace __pycache__ with *.pyc for Python 2.

This sets a flag on all those directories (.pyc files) telling Finder/Textmate 2 to exclude them from listings. Importantly the bytecode is there, it’s just hidden.

Rerun the command if you create new modules and wish to hide new bytecode or if you delete the hidden bytecode files.


On Windows the equivalent command might be (not tested, batch script welcome):

dir * /s/b | findstr __pycache__ | attrib +h +s +r

Which is same as going through the project hiding folders using right-click > hide…


Running unit tests is one scenario (more in comments) where deleting the *.pyc files and __pycache__ folders is indeed useful. I use the following lines in my ~/.bash_profile and just run cl to clean up when needed.

alias cpy='find . -name "__pycache__" -delete'
alias cpc='find . -name "*.pyc"       -delete'
...
alias cl='cpy && cpc && ...'

回答 2

__pycache__使用以下行时将创建一个文件夹:

import file_name

或尝试从您创建的另一个文件中获取信息时。这使得第二次运行程序、加载另一个文件时速度更快一些。

A __pycache__ folder is created when you use the line:

import file_name

or try to get information from another file you have created. This makes things a little faster the second time you run your program, since the other file can be loaded from its cached bytecode.
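To see this for a single file without setting up an import, the standard-library py_compile module performs the same compilation step explicitly (demo_mod.py below is a throwaway name used only for illustration):

```python
import pathlib
import py_compile
import tempfile

# Compile one source file by hand and show where the bytecode lands.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "demo_mod.py")
    src.write_text("X = 42\n")
    cache_file = py_compile.compile(str(src))
    print(cache_file)  # a path inside a __pycache__ subdirectory
```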


回答 3

更新了3.7+文档中的答案:

为了加快模块的加载速度,Python会将每个模块的编译版本缓存在__pycache__目录中,文件名为module.version.pyc,其中version编码了编译文件的格式;它通常包含Python的版本号。例如,在CPython 3.3中,spam.py的编译版本将被缓存为__pycache__/spam.cpython-33.pyc。这种命名约定允许来自不同发行版和不同版本的Python的编译模块共存。

来源:https://docs.python.org/3/tutorial/modules.html#compiled-python-files

也就是说,该目录由Python生成,其存在是为了让您的程序运行得更快。它不应提交到源代码管理,而应与本地源代码和平共存。


__pycache__是一个目录,其中包含由python自动生成的字节码缓存文件,即已编译的python文件(.pyc文件)。您可能想知道,为什么Python这种“解释型”语言竟然会有编译文件。这个SO问题对此做了解答(绝对值得阅读那个答案)。

python文档更深入地介绍了它的确切工作方式及其存在的原因:

  • 它是在python 3.2中添加的,因为原有的将.pyc文件保存在同一目录中的机制会引起各种问题,例如使用不同版本的Python解释器运行程序时。有关完整的功能规范,请参阅PEP 3147。

Updated answer from 3.7+ docs:

To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of Python to coexist.

Source: https://docs.python.org/3/tutorial/modules.html#compiled-python-files

That is, this directory is generated by Python and exists to make your programs run faster. It shouldn’t be committed to source control, and should coexist in peace with your local source code.


__pycache__ is a directory that contains bytecode cache files that are automatically generated by python, namely compiled python, or .pyc, files. You might be wondering why Python, an “interpreted” language, has any compiled files at all. This SO question addresses that (and it’s definitely worth reading this answer).

The python docs go into more depth about exactly how it works and why it exists:

  • It was added in python 3.2 because the existing system of maintaining .pyc files in the same directory caused various problems, such as when a program was run with Python interpreters of different versions. For the full feature spec, see PEP 3147.
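The naming scheme from PEP 3147 is exposed directly in the standard library: importlib.util.cache_from_source maps a source path to its cache path (the exact tag in the output depends on your interpreter, so no fixed result is shown):

```python
import importlib.util

# Map a source filename to the .pyc path Python would use for it,
# e.g. __pycache__/spam.cpython-<version>.pyc on CPython.
print(importlib.util.cache_from_source("spam.py"))
```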

回答 4

来自官方python教程模块

为了加快模块的加载速度,Python会将每个模块的编译版本缓存在__pycache__目录中,文件名为module.version.pyc,其中version编码了编译文件的格式;它通常包含Python的版本号。例如,在CPython 3.6中,spam.py的编译版本将被缓存为__pycache__/spam.cpython-36.pyc。

来自Python doc 编程常见问题解答

首次导入模块时(或当源文件在当前编译文件创建之后被修改过时),会在包含该.py文件的目录下的__pycache__子目录中创建一个包含已编译代码的.pyc文件。该.pyc文件的文件名以与.py文件相同的名称开头,以.pyc结尾,其中间部分取决于创建它的特定python二进制文件。

from the official python tutorial Modules

To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.6 the compiled version of spam.py would be cached as __pycache__/spam.cpython-36.pyc.

from Python doc Programming FAQs

When a module is imported for the first time (or when the source file has changed since the current compiled file was created) a .pyc file containing the compiled code should be created in a __pycache__ subdirectory of the directory containing the .py file. The .pyc file will have a filename that starts with the same name as the .py file, and ends with .pyc, with a middle component that depends on the particular python binary that created it.


回答 5

执行python脚本时,字节码会在内存中生成并保留,直到程序结束。如果导入了某个模块,为了提高复用速度,Python会创建一个缓存.pyc文件(pyc即“Python”“Compiled”的缩写),将被导入模块的字节码存储在其中。其想法是在模块被重新导入时避免重新编译(一次编译、多次运行的策略),从而加快python模块的加载。

文件名与模块名相同。文件名中第一个点之后的部分表示创建该缓存的Python实现(可能是CPython),后跟其版本号。

Execution of a python script would cause the byte code to be generated in memory and kept until the program is shutdown. In case a module is imported, for faster reusability, Python would create a cache .pyc (PYC is ‘Python’ ‘Compiled’) file where the byte code of the module being imported is cached. Idea is to speed up loading of python modules by avoiding re-compilation ( compile once, run multiple times policy ) when they are re-imported.

The name of the file is the same as the module name. The part after the first dot indicates the Python implementation that created the cache (e.g. CPython), followed by its version number.
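That implementation-plus-version marker is available at runtime as the interpreter's "cache tag" (the printed value varies with your interpreter, so it is only illustrated, not asserted here):

```python
import sys

# The tag that forms the middle component of every .pyc filename,
# e.g. 'cpython-311' on CPython 3.11.
print(sys.implementation.cache_tag)
```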


回答 6

python解释器会编译*.py脚本文件,并将编译结果保存到__pycache__目录中。

当再次执行该项目时,如果解释器识别出*.py脚本没有被修改过,它将跳过编译步骤,直接运行先前生成并存储在__pycache__文件夹中的*.pyc文件。

当项目比较复杂时,这可以缩短项目运行前的准备时间。如果程序很小,则可以使用-B选项(python -B abc.py)跳过这一机制。

The python interpreter compiles the *.py script file and saves the results of the compilation to the __pycache__ directory.

When the project is executed again, if the interpreter identifies that the *.py script has not been modified, it skips the compile step and runs the previously generated *.pyc file stored in the __pycache__ folder.

When the project is complex, this shortens the preparation time before the project runs. If the program is small, you can skip this by running python -B abc.py (the -B option).
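To see which cached file the interpreter associates with a module (or would, if it hasn't been written yet), check the module's __cached__ attribute; json below is just a convenient stdlib example:

```python
import json

# Path of the bytecode cache backing the json module on this machine.
print(json.__cached__)
```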


回答 7

在Python 2.x中,解释器编译代码时会生成.pyc文件。

在Python 3.x中,解释器编译代码时会生成__pycache__目录。

alok@alok:~$ ls
module.py  module.pyc  __pycache__  test.py
alok@alok:~$

Python Version 2.x will have .pyc when interpreter compiles the code.

Python Version 3.x will have __pycache__ when interpreter compiles the code.

alok@alok:~$ ls
module.py  module.pyc  __pycache__  test.py
alok@alok:~$

回答 8

在3.2及更高版本中,Python将编译后的.pyc字节码文件保存在名为__pycache__的子目录中,该子目录位于源文件所在的目录下,文件名中标识了创建它们的Python版本(例如script.cpython-33.pyc)。

In 3.2 and later, Python saves .pyc compiled byte code files in a sub-directory named __pycache__ located in the directory where your source files reside, with filenames that identify the Python version that created them (e.g. script.cpython-33.pyc).


回答 9

当您导入模块时,

import file_name

Python将已编译的字节码存储在__pycache__目录中,以便将来的导入可以直接使用它,而不必再次解析和编译源代码。

仅运行脚本时不会这样做;只有在导入文件时才会。

(以前的版本将缓存的字节码存储为.pyc文件,散落在与.py文件相同的目录中,但从Python 3开始,它们被移到了一个子目录中,以使目录更整洁。)

PYTHONDONTWRITEBYTECODE —> 如果将其设置为非空字符串,Python在导入源模块时将不会尝试写入.pyc文件。这等效于指定-B选项。

When you import a module,

import file_name

Python stores the compiled bytecode in __pycache__ directory so that future imports can use it directly, rather than having to parse and compile the source again.

It does not do that for merely running a script, only when a file is imported.

(Previous versions used to store the cached bytecode as .pyc files that littered up the same directory as the .py files, but starting in Python 3 they were moved to a subdirectory to make things tidier.)

PYTHONDONTWRITEBYTECODE —> If this is set to a non-empty string, Python won’t try to write .pyc files on the import of source modules. This is equivalent to specifying the -B option.
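A small self-contained check of this behavior (a sketch; the module name mod.py is arbitrary): import a throwaway module in a subprocess with the variable set, and confirm that no __pycache__ directory appears:

```python
import os
import pathlib
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "mod.py").write_text("X = 1\n")
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    # The child interpreter imports mod but is told not to write bytecode.
    subprocess.run([sys.executable, "-c", "import mod"],
                   cwd=tmp, env=env, check=True)
    print(pathlib.Path(tmp, "__pycache__").exists())  # False
```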