Python:绑定未绑定方法?

问题:Python:绑定未绑定方法?

在Python中,有没有办法绑定未绑定的方法而不调用它?

我正在编写一个wxPython程序,对于某个类,我决定将所有按钮的数据分组为一个类级别的元组列表是很不错的,如下所示:

class MyWidget(wx.Window):
    buttons = [("OK", OnOK),
               ("Cancel", OnCancel)]

    # ...

    def Setup(self):
        for text, handler in MyWidget.buttons:

            # This following line is the problem line.
            b = wx.Button(parent, label=text).Bind(wx.EVT_BUTTON, handler)

问题是,因为所有的值handler都是未绑定方法,所以我的程序爆炸得很厉害,我哭了。

我正在网上寻找解决方案,该方案应该是一个相对简单,可解决的问题。不幸的是我找不到任何东西。现在,我正在functools.partial解决此问题,但是没有人知道是否存在一种干净,健康,Pythonic的方式将未绑定的方法绑定到实例并继续传递它而不调用它吗?

In Python, is there a way to bind an unbound method without calling it?

I am writing a wxPython program, and for a certain class I decided it’d be nice to group the data of all of my buttons together as a class-level list of tuples, like so:

class MyWidget(wx.Window):
    buttons = [("OK", OnOK),
               ("Cancel", OnCancel)]

    # ...

    def Setup(self):
        for text, handler in MyWidget.buttons:

            # This following line is the problem line.
            b = wx.Button(parent, label=text).Bind(wx.EVT_BUTTON, handler)

The problem is, since all of the values of handler are unbound methods, my program explodes in a spectacular blaze and I weep.

I was looking around online for a solution to what seems like should be a relatively straightforward, solvable problem. Unfortunately I couldn’t find anything. Right now, I’m using functools.partial to work around this, but does anyone know if there’s a clean-feeling, healthy, Pythonic way to bind an unbound method to an instance and continue passing it around without calling it?


回答 0

所有函数也是描述符,因此您可以通过调用它们的__get__方法来绑定它们:

bound_handler = handler.__get__(self, MyWidget)

这是R. Hettinger 关于描述符的出色指南


作为一个独立的例子,请参考Keith的 评论

def bind(instance, func, as_name=None):
    """
    Bind the function *func* to *instance*, with either provided name *as_name*
    or the existing name of *func*. The provided *func* should accept the 
    instance as the first argument, i.e. "self".
    """
    if as_name is None:
        as_name = func.__name__
    bound_method = func.__get__(instance, instance.__class__)
    setattr(instance, as_name, bound_method)
    return bound_method

class Thing:
    def __init__(self, val):
        self.val = val

something = Thing(21)

def double(self):
    return 2 * self.val

bind(something, double)
something.double()  # returns 42

All functions are also descriptors, so you can bind them by calling their __get__ method:

bound_handler = handler.__get__(self, MyWidget)

Here’s R. Hettinger’s excellent guide to descriptors.


As a self-contained example pulled from Keith’s comment:

def bind(instance, func, as_name=None):
    """
    Bind the function *func* to *instance*, with either provided name *as_name*
    or the existing name of *func*. The provided *func* should accept the 
    instance as the first argument, i.e. "self".
    """
    if as_name is None:
        as_name = func.__name__
    bound_method = func.__get__(instance, instance.__class__)
    setattr(instance, as_name, bound_method)
    return bound_method

class Thing:
    def __init__(self, val):
        self.val = val

something = Thing(21)

def double(self):
    return 2 * self.val

bind(something, double)
something.double()  # returns 42

回答 1

可以使用types.MethodType干净地完成此操作。例:

import types

def f(self): print self

class C(object): pass

meth = types.MethodType(f, C(), C) # Bind f to an instance of C
print meth # prints <bound method C.f of <__main__.C object at 0x01255E90>>

This can be done cleanly with types.MethodType. Example:

import types

def f(self): print self

class C(object): pass

meth = types.MethodType(f, C(), C) # Bind f to an instance of C
print meth # prints <bound method C.f of <__main__.C object at 0x01255E90>>

回答 2

创建一个带有self的闭包在技术上不会绑定该函数,但这是解决相同(或非常相似)潜在问题的另一种方法。这是一个简单的例子:

self.method = (lambda self: lambda args: self.do(args))(self)

Creating a closure with self in it will not technically bind the function, but it is an alternative way of solving the same (or very similar) underlying problem. Here’s a trivial example:

self.method = (lambda self: lambda args: self.do(args))(self)

回答 3

这将绑定selfhandler

bound_handler = lambda *args, **kwargs: handler(self, *args, **kwargs)

这是通过将self第一个参数传递给函数来实现的。object.function()只是语法糖function(object)

This will bind self to handler:

bound_handler = lambda *args, **kwargs: handler(self, *args, **kwargs)

This works by passing self as the first argument to the function. object.function() is just syntactic sugar for function(object).


回答 4

晚了,但是我来这里有一个类似的问题:我有一个类方法和一个实例,并且想要将该实例应用于该方法。

冒着过于简化OP问题的风险,我最终做了一些不太神秘的事情,这可能会对到这里来的其他人有用(注意:我正在使用Python 3 – YMMV工作)。

考虑这个简单的类:

class Foo(object):

    def __init__(self, value):
        self._value = value

    def value(self):
        return self._value

    def set_value(self, value):
        self._value = value

这是您可以使用的方法:

>>> meth = Foo.set_value   # the method
>>> a = Foo(12)            # a is an instance with value 12
>>> meth(a, 33)            # apply instance and method
>>> a.value()              # voila - the method was called
33

Late to the party, but I came here with a similar question: I have a class method and an instance, and want to apply the instance to the method.

At the risk of oversimplifying the OP’s question, I ended up doing something less mysterious that may be useful to others who arrive here (caveat: I’m working in Python 3 — YMMV).

Consider this simple class:

class Foo(object):

    def __init__(self, value):
        self._value = value

    def value(self):
        return self._value

    def set_value(self, value):
        self._value = value

Here’s what you can do with it:

>>> meth = Foo.set_value   # the method
>>> a = Foo(12)            # a is an instance with value 12
>>> meth(a, 33)            # apply instance and method
>>> a.value()              # voila - the method was called
33

无法使用Ctrl-C终止Python脚本

问题:无法使用Ctrl-C终止Python脚本

我正在使用以下脚本测试Python线程:

import threading

class FirstThread (threading.Thread):
    def run (self):
        while True:
            print 'first'

class SecondThread (threading.Thread):
    def run (self):
        while True:
            print 'second'

FirstThread().start()
SecondThread().start()

它在Kubuntu 11.10上的Python 2.7中运行。Ctrl+ C不会杀死它。我还尝试为系统信号添加处理程序,但这没有帮助:

import signal 
import sys
def signal_handler(signal, frame):
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)

为了终止进程,我使用Ctrl+ 将程序发送到后台后通过PID将其终止Z,这不会被忽略。为什么Ctrl+ C被如此持久地忽略?我该如何解决?

I am testing Python threading with the following script:

import threading

class FirstThread (threading.Thread):
    def run (self):
        while True:
            print 'first'

class SecondThread (threading.Thread):
    def run (self):
        while True:
            print 'second'

FirstThread().start()
SecondThread().start()

This is running in Python 2.7 on Kubuntu 11.10. Ctrl+C will not kill it. I also tried adding a handler for system signals, but that did not help:

import signal 
import sys
def signal_handler(signal, frame):
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)

To kill the process I am killing it by PID after sending the program to the background with Ctrl+Z, which isn’t being ignored. Why is Ctrl+C being ignored so persistently? How can I resolve this?


回答 0

Ctrl+ C终止主线程,但是因为您的线程不在守护程序模式下,所以它们继续运行,并且使进程保持活动状态。我们可以使它们成为守护进程:

f = FirstThread()
f.daemon = True
f.start()
s = SecondThread()
s.daemon = True
s.start()

但是,还有另一个问题-一旦主线程启动了线程,就别无他法了。因此它退出了,线程立即被销毁。因此,让主线程保持活动状态:

import time
while True:
    time.sleep(1)

现在它将保留打印“ first”和“ second”,直到您按Ctrl+ 为止C

编辑:正如评论者所指出的,守护进程线程可能没有机会清理临时文件之类的东西。如果需要,请抓住KeyboardInterrupt主线程并协调其清理和关闭。但是在很多情况下,让守护线程突然死掉可能已经足够了。

Ctrl+C terminates the main thread, but because your threads aren’t in daemon mode, they keep running, and that keeps the process alive. We can make them daemons:

f = FirstThread()
f.daemon = True
f.start()
s = SecondThread()
s.daemon = True
s.start()

But then there’s another problem – once the main thread has started your threads, there’s nothing else for it to do. So it exits, and the threads are destroyed instantly. So let’s keep the main thread alive:

import time
while True:
    time.sleep(1)

Now it will keep print ‘first’ and ‘second’ until you hit Ctrl+C.

Edit: as commenters have pointed out, the daemon threads may not get a chance to clean up things like temporary files. If you need that, then catch the KeyboardInterrupt on the main thread and have it co-ordinate cleanup and shutdown. But in many cases, letting daemon threads die suddenly is probably good enough.


回答 1

KeyboardInterrupt和信号仅在进程(即主线程)中可见…看看Ctrl-c即KeyboardInterrupt杀死python中的线程

KeyboardInterrupt and signals are only seen by the process (ie the main thread)… Have a look at Ctrl-c i.e. KeyboardInterrupt to kill threads in python


回答 2

我认为最好在您希望线程死亡时在线程上调用join()。我对您的代码采取了一些自由以结束循环(您也可以在其中添加所需的任何清理需求)。每次通过时都要检查变量die是否为真,当它为True时,程序将退出。

import threading
import time

class MyThread (threading.Thread):
    die = False
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run (self):
        while not self.die:
            time.sleep(1)
            print (self.name)

    def join(self):
        self.die = True
        super().join()

if __name__ == '__main__':
    f = MyThread('first')
    f.start()
    s = MyThread('second')
    s.start()
    try:
        while True:
            time.sleep(2)
    except KeyboardInterrupt:
        f.join()
        s.join()

I think it’s best to call join() on your threads when you expect them to die. I’ve taken some liberty with your code to make the loops end (you can add whatever cleanup needs are required to there as well). The variable die is checked for truth on each pass and when it’s True then the program exits.

import threading
import time

class MyThread (threading.Thread):
    die = False
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run (self):
        while not self.die:
            time.sleep(1)
            print (self.name)

    def join(self):
        self.die = True
        super().join()

if __name__ == '__main__':
    f = MyThread('first')
    f.start()
    s = MyThread('second')
    s.start()
    try:
        while True:
            time.sleep(2)
    except KeyboardInterrupt:
        f.join()
        s.join()

回答 3

@Thomas K的答案的改进版本:

  • is_any_thread_alive()根据此要旨定义助手功能,该功能可以main()自动终止。

示例代码:

import threading

def job1():
    ...

def job2():
    ...

def is_any_thread_alive(threads):
    return True in [t.is_alive() for t in threads]

if __name__ == "__main__":
    ...
    t1 = threading.Thread(target=job1,daemon=True)
    t2 = threading.Thread(target=job2,daemon=True)
    t1.start()
    t2.start()

    while is_any_thread_alive([t1,t2]):
        time.sleep(0)

An improved version of @Thomas K’s answer:

  • Defining an assistant function is_any_thread_alive() according to this gist, which can terminates the main() automatically.

Example codes:

import threading

def job1():
    ...

def job2():
    ...

def is_any_thread_alive(threads):
    return True in [t.is_alive() for t in threads]

if __name__ == "__main__":
    ...
    t1 = threading.Thread(target=job1,daemon=True)
    t2 = threading.Thread(target=job2,daemon=True)
    t1.start()
    t2.start()

    while is_any_thread_alive([t1,t2]):
        time.sleep(0)

如何在PyCharm中选择Python版本?

问题:如何在PyCharm中选择Python版本?

我拥有PyCharm 1.5.4,并使用“打开目录”选项在IDE中打开文件夹的内容。

我选择了Python 3.2版(它显示在“外部库”节点下)。

如何选择其他版本的Python(已经安装在计算机上),以便PyCharm改用该版本?

I have PyCharm 1.5.4 and have used the “Open Directory” option to open the contents of a folder in the IDE.

I have Python version 3.2 selected (it shows up under the “External Libraries” node).

How can I select another version of Python (that I already have installed on my machine) so that PyCharm uses that version instead?


回答 0

文件->设置

首选项->项目解释器-> Python解释器

如果未列出,则添加它。

File -> Settings

Preferences->Project Interpreter->Python Interpreters

If it’s not listed add it.


回答 1

我认为您是说您已经安装了python2和python3,并在Pycharm>设置>项目解释器下添加了对每个版本的引用

我想您要问的是如何在Python 2上运行某些项目,在Python 3上运行某些项目。

如果是这样,您可以在“运行”>“编辑配置”下查看

I think you are saying that you have python2 and python3 installed and have added a reference to each version under Pycharm > Settings > Project Interpreter

What I think you are asking is how do you have some projects run with Python 2 and some projects running with Python 3.

If so, you can look under Run > Edit Configurations


回答 2

PyCharm 2019.1+

状态栏中有一项名为“ 解释器”的新功能(向下滚动一点)。这使得在python解释器之间切换和查看所使用的版本更加容易。

启用状态栏

如果看不到状态栏,则可以通过运行“查找操作”命令(在Mac上为Ctrl+ Shift+ A+ + A)轻松激活它。然后输入状态栏并选择查看:状态栏以查看它。

PyCharm 2019.1+

There is a new feature called Interpreter in status bar (scroll down a little bit). This makes switching between python interpreters and seeing which version you’re using easier.

Enable status bar

In case you cannot see the status bar, you can easily activate it by running the Find Action command (Ctrl+Shift+A or + +A on mac). Then type status bar and choose View: Status Bar to see it.


回答 3

在集成了PyCharm的Intellij Ultimate中也可能发生这种情况。问题如上所述,您选择了错误的解释器。

要针对任何给定项目修复此问题的确切方法是转到“ 项目设置项目”,然后调整Project SDK。如果您没有添加Python 3,则可以通过导航到python3二进制文件来添加New Project SDK。这将修复上面列出的错误。“项目设置”的快捷方式是蓝色棋盘式图标。

您还可以将Python 3添加为Python项目的默认解释器。在OSX上,它位于文件中其他设置默认项目结构。您可以在此处设置Project SDK,该SDK现在将应用于每个新项目。它在其他平台上可以有所不同,但仍然相似。

This can also happen in Intellij Ultimate, which has PyCharm integrated. The issue is as diagnosed above, you have the wrong interpreter selected.

The exact method to fix this for any given project is to go to Project SettingsProject and adjust the Project SDK. You can add a New Project SDK if you don’t have Python 3 added by navigating to the python3 binary. This will fix the errors listed above. A shortcut to Project Settings is the blue checkerboard-type icon.

You can also add Python 3 as the default interpreter for Python projects. On OSX this is in File..Other SettingsDefault Project Structure. There you can set the Project SDK which will now apply on each new project. It can be different on other platforms, but still similar.


回答 4

去:

Files -> Settings -> Project -> *"Your Project Name"* -> Project Interpreter

在这里,您可以查看已为python2安装了哪些外部库以及为python3安装了哪些外部库。

根据您的要求选择所需的python版本。

Go to:

Files -> Settings -> Project -> *"Your Project Name"* -> Project Interpreter

There you can see which external libraries you have installed for python2 and which for python3.

Select the required python version according to your requirements.


回答 5

快速回答:

  • File -> Setting
  • project部分的左侧->Project interpreter
  • 选择所需 Project interpreter
  • Apply + OK

[ 注意 ]:

在Pycharm 2018和2017上测试。


Quick Answer:

  • File –> Setting
  • In left side in project section –> Project interpreter
  • Select desired Project interpreter
  • Apply + OK

[NOTE]:

Tested on Pycharm 2018 and 2017.



元类有哪些(具体的)用例?

问题:元类有哪些(具体的)用例?

我有一个喜欢使用元类的朋友,并定期提供它们作为解决方案。

我认为您几乎不需要使用元类。为什么?因为我认为如果您正在对一个类进行类似的操作,那么您可能应该对一个对象进行此操作。并进行了少量的重新设计/重构。

能够使用元类已经使很多地方的许多人将类用作某种二流对象,这对我来说似乎是灾难性的。用元编程代替编程吗?不幸的是,添加了类装饰器使它变得更加可以接受。

所以,我很想知道您在Python中对元类的有效(具体)用例。还是要启发一下为什么有时候变异类比变异对象更好。

我将开始:

有时在使用第三方库时,能够以某种方式对类进行更改很有用。

(这是我能想到的唯一情况,并不具体)

I have a friend who likes to use metaclasses, and regularly offers them as a solution.

I am of the mind that you almost never need to use metaclasses. Why? because I figure if you are doing something like that to a class, you should probably be doing it to an object. And a small redesign/refactor is in order.

Being able to use metaclasses has caused a lot of people in a lot of places to use classes as some kind of second rate object, which just seems disastrous to me. Is programming to be replaced by meta-programming? The addition of class decorators has unfortunately made it even more acceptable.

So please, I am desperate to know your valid (concrete) use-cases for metaclasses in Python. Or to be enlightened as to why mutating classes is better than mutating objects, sometimes.

I will start:

Sometimes when using a third-party library it is useful to be able to mutate the class in a certain way.

(This is the only case I can think of, and it’s not concrete)


回答 0

我有一个处理非交互式绘图的类,作为Matplotlib的前端。但是,有时需要进行交互式绘图。仅使用几个函数,我发现我能够增加图形数量,手动调用绘制等,但是我需要在每次绘制调用之前和之后执行这些操作。因此,要创建交互式绘图包装器和屏幕外绘图包装器,我发现通过元类包装适当的方法来执行此操作比执行以下操作更有效:

class PlottingInteractive:
    add_slice = wrap_pylab_newplot(add_slice)

该方法不能跟上API的更改等等,但是__init__在重新设置类属性之前对类属性进行迭代的方法效率更高,并且可以保持最新状态:

class _Interactify(type):
    def __init__(cls, name, bases, d):
        super(_Interactify, cls).__init__(name, bases, d)
        for base in bases:
            for attrname in dir(base):
                if attrname in d: continue # If overridden, don't reset
                attr = getattr(cls, attrname)
                if type(attr) == types.MethodType:
                    if attrname.startswith("add_"):
                        setattr(cls, attrname, wrap_pylab_newplot(attr))
                    elif attrname.startswith("set_"):
                        setattr(cls, attrname, wrap_pylab_show(attr))

当然,也许有更好的方法可以做到这一点,但是我发现这是有效的。当然,这也可以在__new__或中完成__init__,但这是我发现最直接的解决方案。

I have a class that handles non-interactive plotting, as a frontend to Matplotlib. However, on occasion one wants to do interactive plotting. With only a couple functions I found that I was able to increment the figure count, call draw manually, etc, but I needed to do these before and after every plotting call. So to create both an interactive plotting wrapper and an offscreen plotting wrapper, I found it was more efficient to do this via metaclasses, wrapping the appropriate methods, than to do something like:

class PlottingInteractive:
    add_slice = wrap_pylab_newplot(add_slice)

This method doesn’t keep up with API changes and so on, but one that iterates over the class attributes in __init__ before re-setting the class attributes is more efficient and keeps things up to date:

class _Interactify(type):
    def __init__(cls, name, bases, d):
        super(_Interactify, cls).__init__(name, bases, d)
        for base in bases:
            for attrname in dir(base):
                if attrname in d: continue # If overridden, don't reset
                attr = getattr(cls, attrname)
                if type(attr) == types.MethodType:
                    if attrname.startswith("add_"):
                        setattr(cls, attrname, wrap_pylab_newplot(attr))
                    elif attrname.startswith("set_"):
                        setattr(cls, attrname, wrap_pylab_show(attr))

Of course, there might be better ways to do this, but I’ve found this to be effective. Of course, this could also be done in __new__ or __init__, but this was the solution I found the most straightforward.


回答 1

最近有人问我同样的问题,并提出了几个答案。我希望可以重新启动该线程,因为我想详细说明所提到的一些用例,并添加一些新用例。

我见过的大多数元类都执行以下两项操作之一:

  1. 注册(将类添加到数据结构中):

    models = {}
    
    class ModelMetaclass(type):
        def __new__(meta, name, bases, attrs):
            models[name] = cls = type.__new__(meta, name, bases, attrs)
            return cls
    
    class Model(object):
        __metaclass__ = ModelMetaclass

    每当您子类化时Model,您的Class都会在models字典中注册:

    >>> class A(Model):
    ...     pass
    ...
    >>> class B(A):
    ...     pass
    ...
    >>> models
    {'A': <__main__.A class at 0x...>,
     'B': <__main__.B class at 0x...>}

    这也可以使用类装饰器来完成:

    models = {}
    
    def model(cls):
        models[cls.__name__] = cls
        return cls
    
    @model
    class A(object):
        pass

    或具有显式注册功能:

    models = {}
    
    def register_model(cls):
        models[cls.__name__] = cls
    
    class A(object):
        pass
    
    register_model(A)

    实际上,这几乎是相同的:您不利地提到了类装饰器,但是实际上,对于类的函数调用而言,它只不过是语法糖,所以这没有什么魔术。

    无论如何,在这种情况下,元类的优点是继承,因为它们适用于任何子类,而其他解决方案仅适用于显式修饰或注册的子类。

    >>> class B(A):
    ...     pass
    ...
    >>> models
    {'A': <__main__.A class at 0x...> # No B :(
  2. 重构(修改类属性或添加新属性):

    class ModelMetaclass(type):
        def __new__(meta, name, bases, attrs):
            fields = {}
            for key, value in attrs.items():
                if isinstance(value, Field):
                    value.name = '%s.%s' % (name, key)
                    fields[key] = value
            for base in bases:
                if hasattr(base, '_fields'):
                    fields.update(base._fields)
            attrs['_fields'] = fields
            return type.__new__(meta, name, bases, attrs)
    
    class Model(object):
        __metaclass__ = ModelMetaclass

    每当您子类化Model并定义一些Field属性时,它们就会被注入其名称(例如,用于提供更多有用的错误消息),并分组到_fields字典中(以方便迭代,而不必查看所有类属性及其所有基类的’属性每次):

    >>> class A(Model):
    ...     foo = Integer()
    ...
    >>> class B(A):
    ...     bar = String()
    ...
    >>> B._fields
    {'foo': Integer('A.foo'), 'bar': String('B.bar')}

    同样,可以使用类装饰器完成此操作(不继承):

    def model(cls):
        fields = {}
        for key, value in vars(cls).items():
            if isinstance(value, Field):
                value.name = '%s.%s' % (cls.__name__, key)
                fields[key] = value
        for base in cls.__bases__:
            if hasattr(base, '_fields'):
                fields.update(base._fields)
        cls._fields = fields
        return cls
    
    @model
    class A(object):
        foo = Integer()
    
    class B(A):
        bar = String()
    
    # B.bar has no name :(
    # B._fields is {'foo': Integer('A.foo')} :(

    或明确地:

    class A(object):
        foo = Integer('A.foo')
        _fields = {'foo': foo} # Don't forget all the base classes' fields, too!

    尽管与您主张的可读性和可维护性的非元编程相反,但这更加麻烦,冗余且容易出错:

    class B(A):
        bar = String()
    
    # vs.
    
    class B(A):
        bar = String('bar')
        _fields = {'B.bar': bar, 'A.foo': A.foo}

考虑了最常见和具体的用例之后,您绝对必须使用元类的唯一情况是您想修改类名称或基类列表,因为一旦定义,这些参数就被烘焙到类中,并且没有装饰器或功能可以取消它们。

class Metaclass(type):
    def __new__(meta, name, bases, attrs):
        return type.__new__(meta, 'foo', (int,), attrs)

class Baseclass(object):
    __metaclass__ = Metaclass

class A(Baseclass):
    pass

class B(A):
    pass

print A.__name__ # foo
print B.__name__ # foo
print issubclass(B, A)   # False
print issubclass(B, int) # True

每当定义具有相似名称或不完整继承树的类时,在发出警告的框架中这可能都是有用的,但是除了实际更改这些值之外,我没有想到其他原因。也许大卫·比兹利可以。

无论如何,在Python 3中,元类也具有__prepare__方法,该方法使您可以将类主体评估为除之外的映射dict,从而支持有序属性,重载属性和其他邪恶的东西:

import collections

class Metaclass(type):

    @classmethod
    def __prepare__(meta, name, bases, **kwds):
        return collections.OrderedDict()

    def __new__(meta, name, bases, attrs, **kwds):
        print(list(attrs))
        # Do more stuff...

class A(metaclass=Metaclass):
    x = 1
    y = 2

# prints ['x', 'y'] rather than ['y', 'x']

 

class ListDict(dict):
    def __setitem__(self, key, value):
        self.setdefault(key, []).append(value)

class Metaclass(type):

    @classmethod
    def __prepare__(meta, name, bases, **kwds):
        return ListDict()

    def __new__(meta, name, bases, attrs, **kwds):
        print(attrs['foo'])
        # Do more stuff...

class A(metaclass=Metaclass):

    def foo(self):
        pass

    def foo(self, x):
        pass

# prints [<function foo at 0x...>, <function foo at 0x...>] rather than <function foo at 0x...>

您可能会争辩说,可以使用创建计数器来实现有序属性,并且可以使用默认参数来模拟重载:

import itertools

class Attribute(object):
    _counter = itertools.count()
    def __init__(self):
        self._count = Attribute._counter.next()

class A(object):
    x = Attribute()
    y = Attribute()

A._order = sorted([(k, v) for k, v in vars(A).items() if isinstance(v, Attribute)],
                  key = lambda (k, v): v._count)

 

class A(object):

    def _foo0(self):
        pass

    def _foo1(self, x):
        pass

    def foo(self, x=None):
        if x is None:
            return self._foo0()
        else:
            return self._foo1(x)

除了难看之外,它的灵活性也更低:如果您想要有序的文字属性(如整数和字符串)怎么办?如果None有效值是x什么呢?

这是解决第一个问题的创造性方法:

import sys

class Builder(object):
    def __call__(self, cls):
        cls._order = self.frame.f_code.co_names
        return cls

def ordered():
    builder = Builder()
    def trace(frame, event, arg):
        builder.frame = frame
        sys.settrace(None)
    sys.settrace(trace)
    return builder

@ordered()
class A(object):
    x = 1
    y = 'foo'

print A._order # ['x', 'y']

这是解决第二种问题的一种创造性方法:

_undefined = object()

class A(object):

    def _foo0(self):
        pass

    def _foo1(self, x):
        pass

    def foo(self, x=_undefined):
        if x is _undefined:
            return self._foo0()
        else:
            return self._foo1(x)

但是,这远比简单的元类(尤其是第一个真正使您的大脑融化的元类)更加巫毒教。我的观点是,您将元类视为陌生且违反直觉的事物,但也可以将它们视为编程语言发展的下一步:您只需要调整心态即可。毕竟,您可能可以在C中完成所有操作,包括使用函数指针定义结构并将其作为函数的第一个参数传递。初次接触C ++的人可能会说:“这是什么魔术?为什么编译器会隐式传递this方法,而不是常规和静态函数?最好是对参数进行露骨和冗长。”但是,一旦获得面向对象的编程,它就会变得功能强大得多;嗯,我想……准面向方面的编程。了解元类,它们实际上非常简单,那么为什么不方便使用它们呢?

最后,元类是rad,编程应该很有趣。始终使用标准的编程构造和设计模式既无聊又无济于事,并且阻碍了您的想象力。坚持一下!这是一个元数据类,仅供您使用。

class MetaMetaclass(type):
    def __new__(meta, name, bases, attrs):
        def __new__(meta, name, bases, attrs):
            cls = type.__new__(meta, name, bases, attrs)
            cls._label = 'Made in %s' % meta.__name__
            return cls 
        attrs['__new__'] = __new__
        return type.__new__(meta, name, bases, attrs)

class China(type):
    __metaclass__ = MetaMetaclass

class Taiwan(type):
    __metaclass__ = MetaMetaclass

class A(object):
    __metaclass__ = China

class B(object):
    __metaclass__ = Taiwan

print A._label # Made in China
print B._label # Made in Taiwan

编辑

这是一个很老的问题,但仍在投票中,因此我想添加一个指向更全面答案的链接。如果您想了解有关元类及其使用的更多信息,我刚刚在这里发表了一篇有关元类的文章。

I was asked the same question recently, and came up with several answers. I hope it’s OK to revive this thread, as I wanted to elaborate on a few of the use cases mentioned, and add a few new ones.

Most metaclasses I’ve seen do one of two things:

  1. Registration (adding a class to a data structure):

    models = {}
    
    class ModelMetaclass(type):
        def __new__(meta, name, bases, attrs):
            models[name] = cls = type.__new__(meta, name, bases, attrs)
            return cls
    
    class Model(object):
        __metaclass__ = ModelMetaclass
    

    Whenever you subclass Model, your class is registered in the models dictionary:

    >>> class A(Model):
    ...     pass
    ...
    >>> class B(A):
    ...     pass
    ...
    >>> models
    {'A': <__main__.A class at 0x...>,
     'B': <__main__.B class at 0x...>}
    

    This can also be done with class decorators:

    models = {}
    
    def model(cls):
        models[cls.__name__] = cls
        return cls
    
    @model
    class A(object):
        pass
    

    Or with an explicit registration function:

    models = {}
    
    def register_model(cls):
        models[cls.__name__] = cls
    
    class A(object):
        pass
    
    register_model(A)
    

    Actually, this is pretty much the same: you mention class decorators unfavorably, but it’s really nothing more than syntactic sugar for a function invocation on a class, so there’s no magic about it.

    Anyway, the advantage of metaclasses in this case is inheritance, as they work for any subclasses, whereas the other solutions only work for subclasses explicitly decorated or registered.

    >>> class B(A):
    ...     pass
    ...
    >>> models
    {'A': <__main__.A class at 0x...> # No B :(
    
  2. Refactoring (modifying class attributes or adding new ones):

    class ModelMetaclass(type):
        def __new__(meta, name, bases, attrs):
            fields = {}
            for key, value in attrs.items():
                if isinstance(value, Field):
                    value.name = '%s.%s' % (name, key)
                    fields[key] = value
            for base in bases:
                if hasattr(base, '_fields'):
                    fields.update(base._fields)
            attrs['_fields'] = fields
            return type.__new__(meta, name, bases, attrs)
    
    class Model(object):
        __metaclass__ = ModelMetaclass
    

    Whenever you subclass Model and define some Field attributes, they are injected with their names (for more informative error messages, for example), and grouped into a _fields dictionary (for easy iteration, without having to look through all the class attributes and all its base classes’ attributes every time):

    >>> class A(Model):
    ...     foo = Integer()
    ...
    >>> class B(A):
    ...     bar = String()
    ...
    >>> B._fields
    {'foo': Integer('A.foo'), 'bar': String('B.bar')}
    

    Again, this can be done (without inheritance) with a class decorator:

    def model(cls):
        fields = {}
        for key, value in vars(cls).items():
            if isinstance(value, Field):
                value.name = '%s.%s' % (cls.__name__, key)
                fields[key] = value
        for base in cls.__bases__:
            if hasattr(base, '_fields'):
                fields.update(base._fields)
        cls._fields = fields
        return cls
    
    @model
    class A(object):
        foo = Integer()
    
    class B(A):
        bar = String()
    
    # B.bar has no name :(
    # B._fields is {'foo': Integer('A.foo')} :(
    

    Or explicitly:

    class A(object):
        foo = Integer('A.foo')
        _fields = {'foo': foo} # Don't forget all the base classes' fields, too!
    

    Although, on the contrary to your advocacy for readable and maintainable non-meta programming, this is much more cumbersome, redundant and error prone:

    class B(A):
        bar = String()
    
    # vs.
    
    class B(A):
        bar = String('bar')
        _fields = {'B.bar': bar, 'A.foo': A.foo}
    

Having considered the most common and concrete use cases, the only cases where you absolutely HAVE to use metaclasses are when you want to modify the class name or list of base classes, because once defined, these parameters are baked into the class, and no decorator or function can unbake them.

class Metaclass(type):
    def __new__(meta, name, bases, attrs):
        return type.__new__(meta, 'foo', (int,), attrs)

class Baseclass(object):
    __metaclass__ = Metaclass

class A(Baseclass):
    pass

class B(A):
    pass

print A.__name__ # foo
print B.__name__ # foo
print issubclass(B, A)   # False
print issubclass(B, int) # True

This may be useful in frameworks for issuing warnings whenever classes with similar names or incomplete inheritance trees are defined, but I can’t think of a reason beside trolling to actually change these values. Maybe David Beazley can.

Anyway, in Python 3, metaclasses also have the __prepare__ method, which lets you evaluate the class body into a mapping other than a dict, thus supporting ordered attributes, overloaded attributes, and other wicked cool stuff:

import collections

class Metaclass(type):

    @classmethod
    def __prepare__(meta, name, bases, **kwds):
        return collections.OrderedDict()

    def __new__(meta, name, bases, attrs, **kwds):
        print(list(attrs))
        # Do more stuff...

class A(metaclass=Metaclass):
    x = 1
    y = 2

# prints ['x', 'y'] rather than ['y', 'x']

 

class ListDict(dict):
    def __setitem__(self, key, value):
        self.setdefault(key, []).append(value)

class Metaclass(type):

    @classmethod
    def __prepare__(meta, name, bases, **kwds):
        return ListDict()

    def __new__(meta, name, bases, attrs, **kwds):
        print(attrs['foo'])
        # Do more stuff...

class A(metaclass=Metaclass):

    def foo(self):
        pass

    def foo(self, x):
        pass

# prints [<function foo at 0x...>, <function foo at 0x...>] rather than <function foo at 0x...>

You might argue ordered attributes can be achieved with creation counters, and overloading can be simulated with default arguments:

import itertools

class Attribute(object):
    _counter = itertools.count()
    def __init__(self):
        self._count = Attribute._counter.next()

class A(object):
    x = Attribute()
    y = Attribute()

A._order = sorted([(k, v) for k, v in vars(A).items() if isinstance(v, Attribute)],
                  key = lambda (k, v): v._count)

 

class A(object):

    def _foo0(self):
        pass

    def _foo1(self, x):
        pass

    def foo(self, x=None):
        if x is None:
            return self._foo0()
        else:
            return self._foo1(x)

Besides being much more ugly, it’s also less flexible: what if you want ordered literal attributes, like integers and strings? What if None is a valid value for x?

Here’s a creative way to solve the first problem:

import sys

class Builder(object):
    def __call__(self, cls):
        cls._order = self.frame.f_code.co_names
        return cls

def ordered():
    builder = Builder()
    def trace(frame, event, arg):
        builder.frame = frame
        sys.settrace(None)
    sys.settrace(trace)
    return builder

@ordered()
class A(object):
    x = 1
    y = 'foo'

print A._order # ['x', 'y']

And here’s a creative way to solve the second one:

_undefined = object()

class A(object):

    def _foo0(self):
        pass

    def _foo1(self, x):
        pass

    def foo(self, x=_undefined):
        if x is _undefined:
            return self._foo0()
        else:
            return self._foo1(x)

But this is much, MUCH voodoo-er than a simple metaclass (especially the first one, which really melts your brain). My point is, you look at metaclasses as unfamiliar and counter-intuitive, but you can also look at them as the next step of evolution in programming languages: you just have to adjust your mindset. After all, you could probably do everything in C, including defining a struct with function pointers and passing it as the first argument to its functions. A person seeing C++ for the first time might say, “what is this magic? Why is the compiler implicitly passing this to methods, but not to regular and static functions? It’s better to be explicit and verbose about your arguments”. But then, object-oriented programming is much more powerful once you get it; and so is this, uh… quasi-aspect-oriented programming, I guess. And once you understand metaclasses, they’re actually very simple, so why not use them when convenient?

And finally, metaclasses are rad, and programming should be fun. Using standard programming constructs and design patterns all the time is boring and uninspiring, and hinders your imagination. Live a little! Here’s a metametaclass, just for you.

class MetaMetaclass(type):
    def __new__(meta, name, bases, attrs):
        def __new__(meta, name, bases, attrs):
            cls = type.__new__(meta, name, bases, attrs)
            cls._label = 'Made in %s' % meta.__name__
            return cls 
        attrs['__new__'] = __new__
        return type.__new__(meta, name, bases, attrs)

class China(type):
    __metaclass__ = MetaMetaclass

class Taiwan(type):
    __metaclass__ = MetaMetaclass

class A(object):
    __metaclass__ = China

class B(object):
    __metaclass__ = Taiwan

print A._label # Made in China
print B._label # Made in Taiwan

Edit

This is a pretty old question, but it’s still getting upvotes, so I thought I’d add a link to a more comprehensive answer. If you’d like to read more about metaclasses and their uses, I’ve just published an article about it here.


回答 2

元类的目的不是用元类/类代替类/对象的区别,而是以某种方式改变类定义(及其实例)的行为。有效地是,以对默认域更有用的方式更改类语句的行为。我使用它们的目的是:

  • 跟踪子类,通常用于注册处理程序。当使用插件样式设置时,这很方便,您希望通过子类化并设置一些类属性来为特定事件注册处理程序。例如。假设您为各种音乐格式编写了一个处理程序,其中每个类针对其类型实现适当的方法(播放/获取标签等)。为新类型添加处理程序将变为:

    class Mp3File(MusicFile):
        extensions = ['.mp3']  # Register this type as a handler for mp3 files
        ...
        # Implementation of mp3 methods go here

    然后,元类维护{'.mp3' : MP3File, ... }etc 的字典,并在通过工厂函数请求处理程序时构造适当类型的对象。

  • 改变行为。您可能想对某些属性附加特殊含义,从而导致它们出现时行为发生变化。例如,你可能想寻找一个名为方法_get_foo_set_foo和透明地转换为性能。作为一个真实的例子,这是我写的食谱,旨在提供更多类似C的结构定义。元类用于将声明的项转换为结构格式的字符串,处理继承等,并生成能够处理的类。

    对于其他实际示例,请查看各种ORM,例如sqlalchemy ORM或sqlobject。同样,目的是解释具有特定含义的定义(此处为SQL列定义)。

The purpose of metaclasses isn’t to replace the class/object distinction with metaclass/class – it’s to change the behaviour of class definitions (and thus their instances) in some way. Effectively it’s to alter the behaviour of the class statement in ways that may be more useful for your particular domain than the default. The things I have used them for are:

  • Tracking subclasses, usually to register handlers. This is handy when using a plugin style setup, where you wish to register a handler for a particular thing simply by subclassing and setting up a few class attributes. eg. suppose you write a handler for various music formats, where each class implements appropriate methods (play / get tags etc) for its type. Adding a handler for a new type becomes:

    class Mp3File(MusicFile):
        extensions = ['.mp3']  # Register this type as a handler for mp3 files
        ...
        # Implementation of mp3 methods go here
    

    The metaclass then maintains a dictionary of {'.mp3' : MP3File, ... } etc, and constructs an object of the appropriate type when you request a handler through a factory function.

  • Changing behaviour. You may want to attach a special meaning to certain attributes, resulting in altered behaviour when they are present. For example, you may want to look for methods with the name _get_foo and _set_foo and transparently convert them to properties. As a real-world example, here’s a recipe I wrote to give more C-like struct definitions. The metaclass is used to convert the declared items into a struct format string, handling inheritance etc, and produce a class capable of dealing with it.

    For other real-world examples, take a look at various ORMs, like sqlalchemy’s ORM or sqlobject. Again, the purpose is to interpret defintions (here SQL column definitions) with a particular meaning.


回答 3

让我们从蒂姆·彼得的经典名言开始:

元类是比99%的用户应该担心的更深的魔力。如果您想知道是否需要它们,则不需要(实际上需要它们的人肯定会知道他们需要它们,并且不需要解释原因)。蒂姆·彼得斯(clp post 2002-12-22)

话虽如此,我(定期地)遇到了元类的真实用法。我想到的是在Django中,所有模型都继承自models.Model。反过来,models.Model确实做了一些严肃的魔术,将您的数据库模型与Django的ORM优势包装在一起。这种魔术通过元类发生。它创建各种形式的异常类,管理器类等。

有关故事的开头,请参见django / db / models / base.py,类ModelBase()。

Let’s start with Tim Peter’s classic quote:

Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why). Tim Peters (c.l.p post 2002-12-22)

Having said that, I have (periodically) run across true uses of metaclasses. The one that comes to mind is in Django where all of your models inherit from models.Model. models.Model, in turn, does some serious magic to wrap your DB models with Django’s ORM goodness. That magic happens by way of metaclasses. It creates all manner of exception classes, manager classes, etc. etc.

See django/db/models/base.py, class ModelBase() for the beginning of the story.


回答 4

元类可以很方便地在Python中构造领域特定语言。具体的例子是Django,它是SQLObject的数据库模式声明式语法。

Ian Bicking的“ 保守的元类”中的一个基本示例:

我使用的元类主要是为了支持一种声明性的编程风格。例如,考虑一个验证模式:

class Registration(schema.Schema):
    first_name = validators.String(notEmpty=True)
    last_name = validators.String(notEmpty=True)
    mi = validators.MaxLength(1)
    class Numbers(foreach.ForEach):
        class Number(schema.Schema):
            type = validators.OneOf(['home', 'work'])
            phone_number = validators.PhoneNumber()

其他一些技术:用Python构建DSL的成分(pdf)。

编辑(由Ali编写):我希望使用一个使用集合和实例进行此操作的示例。重要的事实是实例,它们可以为您提供更多功能,并消除使用元类的原因。进一步值得一提的是,您的示例使用了类和实例的混合体,这肯定表明您不能只对元类进行所有操作。并创建了一种真正不统一的方式。

number_validator = [
    v.OneOf('type', ['home', 'work']),
    v.PhoneNumber('phone_number'),
]

validators = [
    v.String('first_name', notEmpty=True),
    v.String('last_name', notEmpty=True),
    v.MaxLength('mi', 1),
    v.ForEach([number_validator,])
]

这不是完美的,但是魔术几乎为零,不需要元类,并且一致性更高。

Metaclasses can be handy for construction of Domain Specific Languages in Python. Concrete examples are Django, SQLObject ‘s declarative syntax of database schemata.

A basic example from A Conservative Metaclass by Ian Bicking:

The metaclasses I’ve used have been primarily to support a sort of declarative style of programming. For instance, consider a validation schema:

class Registration(schema.Schema):
    first_name = validators.String(notEmpty=True)
    last_name = validators.String(notEmpty=True)
    mi = validators.MaxLength(1)
    class Numbers(foreach.ForEach):
        class Number(schema.Schema):
            type = validators.OneOf(['home', 'work'])
            phone_number = validators.PhoneNumber()

Some other techniques: Ingredients for Building a DSL in Python (pdf).

Edit (by Ali): An example of doing this using collections and instances is what I would prefer. The important fact is the instances, which give you more power, and eliminate reason to use metaclasses. Further worth noting that your example uses a mixture of classes and instances, which is surely an indication that you can’t just do it all with metaclasses. And creates a truly non-uniform way of doing it.

number_validator = [
    v.OneOf('type', ['home', 'work']),
    v.PhoneNumber('phone_number'),
]

validators = [
    v.String('first_name', notEmpty=True),
    v.String('last_name', notEmpty=True),
    v.MaxLength('mi', 1),
    v.ForEach([number_validator,])
]

It’s not perfect, but already there is almost zero magic, no need for metaclasses, and improved uniformity.


回答 5

元类使用的合理模式是在定义类时执行一次操作,而不是在实例化相同类时重复执行操作。

当多个类共享相同的特殊行为时,重复__metaclass__=X显然比重复专用代码和/或引入即席共享超类更好。

但是,即使只有一个特殊类且没有可预见的扩展,对于元类而言, __new__与在类定义主体中__init__混合专用代码和normal defclassstatement 相比,初始化类变量或其他全局数据是一种更干净的方法。

A reasonable pattern of metaclass use is doing something once when a class is defined rather than repeatedly whenever the same class is instantiated.

When multiple classes share the same special behaviour, repeating __metaclass__=X is obviously better than repeating the special purpose code and/or introducing ad-hoc shared superclasses.

But even with only one special class and no foreseeable extension, __new__ and __init__ of a metaclass are a cleaner way to initialize class variables or other global data than intermixing special-purpose code and normal def and class statements in the class definition body.


回答 6

我唯一在Python中使用元类的时间是在为Flickr API编写包装器时。

我的目标是抓取flickr的api网站并动态生成完整的类层次结构,以允许使用Python对象进行API访问:

# Both the photo type and the flickr.photos.search API method 
# are generated at "run-time"
for photo in flickr.photos.search(text=balloons):
    print photo.description

因此,在该示例中,因为我从网站生成了整个Python Flickr API,所以我真的不知道运行时的类定义。能够动态生成类型非常有用。

The only time I used metaclasses in Python was when writing a wrapper for the Flickr API.

My goal was to scrape flickr’s api site and dynamically generate a complete class hierarchy to allow API access using Python objects:

# Both the photo type and the flickr.photos.search API method 
# are generated at "run-time"
for photo in flickr.photos.search(text=balloons):
    print photo.description

So in that example, because I generated the entire Python Flickr API from the website, I really don’t know the class definitions at runtime. Being able to dynamically generate types was very useful.


回答 7

就在昨天,我在想同样的事情,完全同意。我认为,尝试使声明式更具声明性而导致的代码复杂性通常会使代码库更难以维护,更难阅读且Python的使用率也更低。它通常也需要大量的copy.copy()ing(以保持继承并从类复制到实例),这意味着您必须在许多地方查看发生了什么(始终从元类向上查看),这与蟒纹也。我一直在浏览formencode和sqlalchemy代码,以查看这种声明式样式是否值得,而显然不值得。这种样式应留给描述符(例如属性和方法)和不可变数据。Ruby对这样的声明式样式提供了更好的支持,我很高兴核心python语言没有走这条路。

我可以看到它们在调试中的用途,可以将一个元类添加到您所有的基类中以获得更丰富的信息。我还看到它们仅在(非常)大型项目中使用,以摆脱一些样板代码(但有一点不清楚)。例如,sqlalchemy 确实在其他地方使用了它们,以基于所有子类的类定义中的属性值向所有子类添加特定的自定义方法,例如玩具示例

class test(baseclass_with_metaclass):
    method_maker_value = "hello"

可能有一个元类,该元类在该类中生成了一个基于“ hello”的特殊属性的方法(例如,将“ hello”添加到字符串末尾的方法)。确保不必在您创建的每个子类中都编写方法,而对于您所要定义的只是method_maker_value,这对维护性有好处。

但是,对此的需求如此罕见,并且仅减少了一点打字,除非您有足够大的代码库,否则它根本不值得考虑。

I was thinking the same thing just yesterday and completely agree. The complications in the code caused by attempts to make it more declarative generally make the codebase harder to maintain, harder to read and less pythonic in my opinion. It also normally requires a lot of copy.copy()ing (to maintain inheritance and to copy from class to instance) and means you have to look in many places to see whats going on (always looking from metaclass up) which goes against the python grain also. I have been picking through formencode and sqlalchemy code to see if such a declarative style was worth it and its clearly not. Such style should be left to descriptors (such as property and methods) and immutable data. Ruby has better support for such declarative styles and I am glad the core python language is not going down that route.

I can see their use for debugging, add a metaclass to all your base classes to get richer info. I also see their use only in (very) large projects to get rid of some boilerplate code (but at the loss of clarity). sqlalchemy for example does use them elsewhere, to add a particular custom method to all subclasses based on an attribute value in their class definition e.g a toy example

class test(baseclass_with_metaclass):
    method_maker_value = "hello"

could have a metaclass that generated a method in that class with special properties based on “hello” (say a method that added “hello” to the end of a string). It could be good for maintainability to make sure you did not have to write a method in every subclass you make instead all you have to define is method_maker_value.

The need for this is so rare though and only cuts down on a bit of typing that its not really worth considering unless you have a large enough codebase.


回答 8

您永远不必绝对使用元类,因为您始终可以使用要修改的类的继承或聚合来构造一个可以满足您需要的类。

就是说,在Smalltalk和Ruby中,能够修改现有的类非常方便,但是Python不喜欢直接这样做。

关于Python元类化的一篇很好的DeveloperWorks文章可能会有所帮助。在维基百科的文章也还不错。

You never absolutely need to use a metaclass, since you can always construct a class that does what you want using inheritance or aggregation of the class you want to modify.

That said, it can be very handy in Smalltalk and Ruby to be able to modify an existing class, but Python doesn’t like to do that directly.

There’s an excellent DeveloperWorks article on metaclassing in Python that might help. The Wikipedia article is also pretty good.


回答 9

当多个线程尝试与它们进行交互时,某些GUI库会遇到麻烦。tkinter就是这样一个例子;尽管可以通过事件和队列显式处理问题,但以完全忽略该问题的方式使用该库要简单得多。看得出-元类的魔力。

在某些情况下,能够无缝地动态重写整个库,使其在多线程应用程序中按预期正常工作,这可能会非常有用。该safetkinter模块这是否与所提供的元类的帮助threadbox模块-事件和队列没有必要的。

它的一个整洁的方面threadbox是它不在乎它克隆什么类。它提供了一个示例,说明如果需要,元类可以如何触及所有基类。元类附带的另一个好处是它们也可以在继承类上运行。编写自己的程序-为什么不呢?

Some GUI libraries have trouble when multiple threads try to interact with them. tkinter is one such example; and while one can explicitly handle the problem with events and queues, it can be far simpler to use the library in a manner that ignores the problem altogether. Behold — the magic of metaclasses.

Being able to dynamically rewrite an entire library seamlessly so that it works properly as expected in a multithreaded application can be extremely helpful in some circumstances. The safetkinter module does that with the help of a metaclass provided by the threadbox module — events and queues not needed.

One neat aspect of threadbox is that it does not care what class it clones. It provides an example of how all base classes can be touched by a metaclass if needed. A further benefit that comes with metaclasses is that they run on inheriting classes as well. Programs that write themselves — why not?


回答 10

元类的唯一合法用例是防止其他管闲的开发人员接触您的代码。当一个管闲的开发人员掌握了元类并开始与您的类一起玩耍时,请投入另一个或两个级别以将其排除在外。如果这样不起作用,请开始使用type.__new__递归元类或者使用某种方案使用递归元类。

(用舌头写在脸颊上,但我已经看到这种混淆处理了。Django是一个很好的例子)

The only legitimate use-case of a metaclass is to keep other nosy developers from touching your code. Once a nosy developer masters metaclasses and starts poking around with yours, throw in another level or two to keep them out. If that doesn’t work, start using type.__new__ or perhaps some scheme using a recursive metaclass.

(written tongue in cheek, but I’ve seen this kind of obfuscation done. Django is a perfect example)


回答 11

元类不能代替编程!它们只是一个技巧,可以使某些任务自动化或变得更加优雅。一个很好的例子是Pygments语法高亮库。它具有一个称为的类,该类RegexLexer使用户可以将一组词法化规则定义为类上的正则表达式。元类用于将定义变成有用的解析器。

他们就像盐一样。过多使用很容易。

Metaclasses aren’t replacing programming! They’re just a trick which can automate or make more elegant some tasks. A good example of this is Pygments syntax highlighting library. It has a class called RegexLexer which lets the user define a set of lexing rules as regular expressions on a class. A metaclass is used to turn the definitions into a useful parser.

They’re like salt; it’s easy to use too much.


回答 12

我使用元类的方式是为类提供一些属性。举个例子:

class NameClass(type):
    def __init__(cls, *args, **kwargs):
       type.__init__(cls, *args, **kwargs)
       cls.name = cls.__name__

将在每个将元类设置为指向NameClass的类上放置name属性。

The way I used metaclasses was to provide some attributes to classes. Take for example:

class NameClass(type):
    def __init__(cls, *args, **kwargs):
       type.__init__(cls, *args, **kwargs)
       cls.name = cls.__name__

will put the name attribute on every class that will have the metaclass set to point to NameClass.


回答 13

这是次要用途,但是…我发现元类有用的一件事是每当创建子类时都调用一个函数。我将其编码为一个寻找__initsubclass__属性的元类:每当创建一个子类时,所有使用该方法定义该方法的父类都将被调用__initsubclass__(cls, subcls)。这允许创建一个父类,该父类随后在某个全局注册表中注册所有子类,在定义子类时对它们进行不变检查,执行后期绑定操作等…所有这些都无需手动调用函数创建自定义元类,履行这些单独的职责。

提醒您,我逐渐意识到这种行为的隐含魔力在某种程度上是不可取的,因为如果从上下文中查看类定义是意外的……因此,我不再使用该解决方案来处理其他严重问题__super为每个类和实例初始化一个属性。

This is a minor use, but… one thing I’ve found metaclasses useful for is to invoke a function whenever a subclass is created. I codified this into a metaclass which looks for an __initsubclass__ attribute: whenever a subclass is created, all parent classes which define that method are invoked with __initsubclass__(cls, subcls). This allows creation of a parent class which then registers all subclasses with some global registry, runs invariant checks on subclasses whenever they are defined, perform late-binding operations, etc… all without have to manually call functions or to create custom metaclasses that perform each of these separate duties.

Mind you, I’ve slowly come to realize the implicit magicalness of this behavior is somewhat undesirable, since it’s unexpected if looking at a class definition out of context… and so I’ve moved away from using that solution for anything serious besides initializing a __super attribute for each class and instance.


回答 14

最近,我不得不使用元类来帮助围绕填充有来自美国的人口普查数据的数据库表来声明性定义SQLAlchemy模型 http://census.ire.org/data/bulkdata.html的

IRE提供数据库外壳了人口普查数据表的,这些根据p012015,p012016,p012017等人口普查局的命名约定创建整数列。

我想要a)能够使用model_instance.p012017语法访问这些列,b)对我正在做的事情相当明确,c)不必在模型上明确定义数十个字段,因此我将SQLAlchemy的子类化为DeclarativeMeta一个迭代范围,列并自动创建与列对应的模型字段:

from sqlalchemy.ext.declarative.api import DeclarativeMeta

class CensusTableMeta(DeclarativeMeta):
    def __init__(cls, classname, bases, dict_):
        table = 'p012'
        for i in range(1, 49):
            fname = "%s%03d" % (table, i)
            dict_[fname] = Column(Integer)
            setattr(cls, fname, dict_[fname])

        super(CensusTableMeta, cls).__init__(classname, bases, dict_)

然后,我可以将此元类用于模型定义,并访问模型上自动枚举的字段:

CensusTableBase = declarative_base(metaclass=CensusTableMeta)

class P12Tract(CensusTableBase):
    __tablename__ = 'ire_p12'

    geoid = Column(String(12), primary_key=True)

    @property
    def male_under_5(self):
        return self.p012003

    ...

I recently had to use a metaclass to help declaratively define an SQLAlchemy model around a database table populated with U.S. Census data from http://census.ire.org/data/bulkdata.html

IRE provides database shells for the census data tables, which create integer columns following a naming convention from the Census Bureau of p012015, p012016, p012017, etc.

I wanted to a) be able to access these columns using a model_instance.p012017 syntax, b) be fairly explicit about what I was doing and c) not have to explicitly define dozens of fields on the model, so I subclassed SQLAlchemy’s DeclarativeMeta to iterate through a range of the columns and automatically create model fields corresponding to the columns:

from sqlalchemy.ext.declarative.api import DeclarativeMeta

class CensusTableMeta(DeclarativeMeta):
    def __init__(cls, classname, bases, dict_):
        table = 'p012'
        for i in range(1, 49):
            fname = "%s%03d" % (table, i)
            dict_[fname] = Column(Integer)
            setattr(cls, fname, dict_[fname])

        super(CensusTableMeta, cls).__init__(classname, bases, dict_)

I could then use this metaclass for my model definition and access the automatically enumerated fields on the model:

CensusTableBase = declarative_base(metaclass=CensusTableMeta)

class P12Tract(CensusTableBase):
    __tablename__ = 'ire_p12'

    geoid = Column(String(12), primary_key=True)

    @property
    def male_under_5(self):
        return self.p012003

    ...

回答 15

似乎是一个合法的使用说明在这里 -重写Python的文档字符串与元类。

There seems to be a legitimate use described here – Rewriting Python Docstrings with a Metaclass.


回答 16

我不得不将它们一次用于二进制解析器,以使其更易于使用。您可以定义消息类,其中包含电线上存在的字段的属性。需要按照声明的顺序来订购它们,以从中构造最终的电汇格式。如果使用有序命名空间dict,则可以使用元类来实现。实际上,在元类的示例中:

https://docs.python.org/3/reference/datamodel.html#metaclass-example

但总的来说:如果您确实确实需要增加元类的复杂性,请仔细评估。

I had to use them once for a binary parser to make it easier to use. You define a message class with attributes of the fields present on the wire. They needed to be ordered in the way they were declared to construct the final wire format from it. You can do that with metaclasses, if you use an ordered namespace dict. In fact, its in the examples for Metaclasses:

https://docs.python.org/3/reference/datamodel.html#metaclass-example

But in general: Very carefully evaluate, if you really really need the added complexity of metaclasses.


回答 17

@Dan Gittik的回答很酷

最后的示例可以澄清很多事情,我将其更改为python 3并给出了一些解释:

class MetaMetaclass(type):
    def __new__(meta, name, bases, attrs):
        def __new__(meta, name, bases, attrs):
            cls = type.__new__(meta, name, bases, attrs)
            cls._label = 'Made in %s' % meta.__name__
            return cls

        attrs['__new__'] = __new__
        return type.__new__(meta, name, bases, attrs)

#China is metaclass and it's __new__ method would be changed by MetaMetaclass(metaclass)
class China(MetaMetaclass, metaclass=MetaMetaclass):
    __metaclass__ = MetaMetaclass

#Taiwan is metaclass and it's __new__ method would be changed by MetaMetaclass(metaclass)
class Taiwan(MetaMetaclass, metaclass=MetaMetaclass):
    __metaclass__ = MetaMetaclass

#A is a normal class and it's __new__ method would be changed by China(metaclass)
class A(metaclass=China):
    __metaclass__ = China

#B is a normal class and it's __new__ method would be changed by Taiwan(metaclass)
class B(metaclass=Taiwan):
    __metaclass__ = Taiwan


print(A._label)  # Made in China
print(B._label)  # Made in Taiwan
  • 一切都是对象,所以类是对象
  • 类对象是由元类创建的
  • 从类型继承的所有类都是元类
  • 元类可以控制类的创建
  • 元类也可以控制元类的创建(因此它可以永远循环)
  • 这是元编程…您可以在运行时控制类型系统
  • 再次,一切都是对象,这是一个统一的系统,类型创建类型,类型创建实例

the answer from @Dan Gittik is cool

the examples at the end could clarify many things,I changed it to python 3 and give some explanation:

class MetaMetaclass(type):
    def __new__(meta, name, bases, attrs):
        def __new__(meta, name, bases, attrs):
            cls = type.__new__(meta, name, bases, attrs)
            cls._label = 'Made in %s' % meta.__name__
            return cls

        attrs['__new__'] = __new__
        return type.__new__(meta, name, bases, attrs)

#China is metaclass and it's __new__ method would be changed by MetaMetaclass(metaclass)
class China(MetaMetaclass, metaclass=MetaMetaclass):
    __metaclass__ = MetaMetaclass

#Taiwan is metaclass and it's __new__ method would be changed by MetaMetaclass(metaclass)
class Taiwan(MetaMetaclass, metaclass=MetaMetaclass):
    __metaclass__ = MetaMetaclass

#A is a normal class and it's __new__ method would be changed by China(metaclass)
class A(metaclass=China):
    __metaclass__ = China

#B is a normal class and it's __new__ method would be changed by Taiwan(metaclass)
class B(metaclass=Taiwan):
    __metaclass__ = Taiwan


print(A._label)  # Made in China
print(B._label)  # Made in Taiwan

  • everything is object,so class is object
  • class object is created by metaclass
  • all class inheritted from type is metaclass
  • metaclass could control class creating
  • metaclass could control metaclass creating too(so it could loop for ever)
  • this’s metaprograming…you could control the type system at running time
  • again,everything is object,this’s a uniform system,type create type,and type create instance

回答 18

另一个用例是,当您希望能够修改类级别的属性并确保它仅影响手头的对象时。实际上,这意味着“合并”元类和类实例化的阶段,从而使您仅处理它们自己(唯一)种类的类实例。

当(出于对可读性多态性的考虑)我们要动态定义 property s时,该返回值(可能)是基于(经常更改的)实例级属性的计算结果的,而这只能在类级别完成,因此我还必须这样做在元类实例化之后和类实例化之前。

Another use case is when you want to be able to modify class-level attributes and be sure that it only affects the object at hand. In practice, this implies “merging” the phases of metaclasses and classes instantiations, thus leading you to deal only with class instances of their own (unique) kind.

I also had to do that when (for concerns of readibility and polymorphism) we wanted to dynamically define propertys which returned values (may) result from calculations based on (often changing) instance-level attributes, which can only be done at the class level, i.e. after the metaclass instantiation and before the class instantiation.


与Python生成器模式等效的C ++

问题:与Python生成器模式等效的C ++

我有一些需要在C ++中模仿的示例Python代码。我不需要任何特定的解决方案(例如基于协同例程的收益解决方案,尽管它们也是可接受的答案),我只需要以某种方式重现语义。

Python

这是一个基本的序列生成器,显然太大了,无法存储实例化版本。

def pair_sequence():
    for i in range(2**32):
        for j in range(2**32):
            yield (i, j)

目标是维护上述序列的两个实例,并以半锁步的方式在块上进行迭代。在下面的示例中,first_pass使用对的序列来初始化缓冲区,然后second_pass重新生成相同的精确序列并再次处理缓冲区。

def run():
    seq1 = pair_sequence()
    seq2 = pair_sequence()

    buffer = [0] * 1000
    first_pass(seq1, buffer)
    second_pass(seq2, buffer)
    ... repeat ...

C ++

对于C ++解决方案,我唯一能找到的就是模仿yieldC ++协程,但是我还没有找到有关如何执行此操作的良好参考。我也对解决此问题的替代(非常规)解决方案感兴趣。我没有足够的内存预算来保留两次通过之间的序列副本。

I’ve got some example Python code that I need to mimic in C++. I do not require any specific solution (such as co-routine based yield solutions, although they would be acceptable answers as well), I simply need to reproduce the semantics in some manner.

Python

This is a basic sequence generator, clearly too large to store a materialized version.

def pair_sequence():
    for i in range(2**32):
        for j in range(2**32):
            yield (i, j)

The goal is to maintain two instances of the sequence above, and iterate over them in semi-lockstep, but in chunks. In the example below the first_pass uses the sequence of pairs to initialize the buffer, and the second_pass regenerates the same exact sequence and processes the buffer again.

def run():
    seq1 = pair_sequence()
    seq2 = pair_sequence()

    buffer = [0] * 1000
    first_pass(seq1, buffer)
    second_pass(seq2, buffer)
    ... repeat ...

C++

The only thing I can find for a solution in C++ is to mimic yield with C++ coroutines, but I haven’t found any good reference on how to do this. I’m also interested in alternative (non general) solutions for this problem. I do not have enough memory budget to keep a copy of the sequence between passes.


回答 0

生成器在C ++中存在,只是另外一个名称:Input Iterators。例如,从读取std::cin类似于使用的生成器char

您只需要了解生成器的功能:

  • 有大量数据:局部变量定义状态
  • 有一个初始化方法
  • 有一个“下一个”方法
  • 有一种信号终止的方法

在您的琐碎示例中,这很容易。从概念上讲:

struct State { unsigned i, j; };

State make();

void next(State&);

bool isDone(State const&);

当然,我们将其包装为适当的类:

class PairSequence:
    // (implicit aliases)
    public std::iterator<
        std::input_iterator_tag,
        std::pair<unsigned, unsigned>
    >
{
  // C++03
  typedef void (PairSequence::*BoolLike)();
  void non_comparable();
public:
  // C++11 (explicit aliases)
  using iterator_category = std::input_iterator_tag;
  using value_type = std::pair<unsigned, unsigned>;
  using reference = value_type const&;
  using pointer = value_type const*;
  using difference_type = ptrdiff_t;

  // C++03 (explicit aliases)
  typedef std::input_iterator_tag iterator_category;
  typedef std::pair<unsigned, unsigned> value_type;
  typedef value_type const& reference;
  typedef value_type const* pointer;
  typedef ptrdiff_t difference_type;

  PairSequence(): done(false) {}

  // C++11
  explicit operator bool() const { return !done; }

  // C++03
  // Safe Bool idiom
  operator BoolLike() const {
    return done ? 0 : &PairSequence::non_comparable;
  }

  reference operator*() const { return ij; }
  pointer operator->() const { return &ij; }

  PairSequence& operator++() {
    static unsigned const Max = std::numeric_limts<unsigned>::max();

    assert(!done);

    if (ij.second != Max) { ++ij.second; return *this; }
    if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }

    done = true;
    return *this;
  }

  PairSequence operator++(int) {
    PairSequence const tmp(*this);
    ++*this;
    return tmp;
  }

private:
  bool done;
  value_type ij;
};

所以,嗯…可能是C ++有点冗长:)

Generators exist in C++, just under another name: Input Iterators. For example, reading from std::cin is similar to having a generator of char.

You simply need to understand what a generator does:

  • there is a blob of data: the local variables define a state
  • there is an init method
  • there is a “next” method
  • there is a way to signal termination

In your trivial example, it’s easy enough. Conceptually:

struct State { unsigned i, j; };

State make();

void next(State&);

bool isDone(State const&);

Of course, we wrap this as a proper class:

class PairSequence:
    // (implicit aliases)
    public std::iterator<
        std::input_iterator_tag,
        std::pair<unsigned, unsigned>
    >
{
  // C++03
  typedef void (PairSequence::*BoolLike)();
  void non_comparable();
public:
  // C++11 (explicit aliases)
  using iterator_category = std::input_iterator_tag;
  using value_type = std::pair<unsigned, unsigned>;
  using reference = value_type const&;
  using pointer = value_type const*;
  using difference_type = ptrdiff_t;

  // C++03 (explicit aliases)
  typedef std::input_iterator_tag iterator_category;
  typedef std::pair<unsigned, unsigned> value_type;
  typedef value_type const& reference;
  typedef value_type const* pointer;
  typedef ptrdiff_t difference_type;

  PairSequence(): done(false) {}

  // C++11
  explicit operator bool() const { return !done; }

  // C++03
  // Safe Bool idiom
  operator BoolLike() const {
    return done ? 0 : &PairSequence::non_comparable;
  }

  reference operator*() const { return ij; }
  pointer operator->() const { return &ij; }

  PairSequence& operator++() {
    static unsigned const Max = std::numeric_limts<unsigned>::max();

    assert(!done);

    if (ij.second != Max) { ++ij.second; return *this; }
    if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }

    done = true;
    return *this;
  }

  PairSequence operator++(int) {
    PairSequence const tmp(*this);
    ++*this;
    return tmp;
  }

private:
  bool done;
  value_type ij;
};

So hum yeah… might be that C++ is a tad more verbose :)


回答 1

在C ++中有迭代器,但是实现迭代器并不容易:必须查阅迭代器概念并仔细设计新的迭代器类以实现它们。值得庆幸的是,Boost有一个iterator_facade模板,该模板应该有助于实现迭代器和兼容迭代器的生成器。

有时可以使用无堆栈协程来实现迭代器

PS另请参见 本文,其中同时提到了switchChristopher M. Kohlhoff 的黑客行为和Oliver Kowalke的Boost.Coroutine。Oliver Kowalke的工作 Giovanni P. Deretta 对Boost.Coroutine 的后续

PS我想你也可以用lambdas编写一种生成器:

std::function<int()> generator = []{
  int i = 0;
  return [=]() mutable {
    return i < 10 ? i++ : -1;
  };
}();
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

或使用函子:

struct generator_t {
  int i = 0;
  int operator() () {
    return i < 10 ? i++ : -1;
  }
} generator;
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

PS这是用Mordor协同程序实现的生成器:

#include <iostream>
using std::cout; using std::endl;
#include <mordor/coroutine.h>
using Mordor::Coroutine; using Mordor::Fiber;

void testMordor() {
  Coroutine<int> coro ([](Coroutine<int>& self) {
    int i = 0; while (i < 9) self.yield (i++);
  });
  for (int i = coro.call(); coro.state() != Fiber::TERM; i = coro.call()) cout << i << endl;
}

In C++ there are iterators, but implementing an iterator isn’t straightforward: one has to consult the iterator concepts and carefully design the new iterator class to implement them. Thankfully, Boost has an iterator_facade template which should help implementing the iterators and iterator-compatible generators.

Sometimes a stackless coroutine can be used to implement an iterator.

P.S. See also this article which mentions both a switch hack by Christopher M. Kohlhoff and Boost.Coroutine by Oliver Kowalke. Oliver Kowalke’s work is a followup on Boost.Coroutine by Giovanni P. Deretta.

P.S. I think you can also write a kind of generator with lambdas:

std::function<int()> generator = []{
  int i = 0;
  return [=]() mutable {
    return i < 10 ? i++ : -1;
  };
}();
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

Or with a functor:

struct generator_t {
  int i = 0;
  int operator() () {
    return i < 10 ? i++ : -1;
  }
} generator;
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

P.S. Here’s a generator implemented with the Mordor coroutines:

#include <iostream>
using std::cout; using std::endl;
#include <mordor/coroutine.h>
using Mordor::Coroutine; using Mordor::Fiber;

void testMordor() {
  Coroutine<int> coro ([](Coroutine<int>& self) {
    int i = 0; while (i < 9) self.yield (i++);
  });
  for (int i = coro.call(); coro.state() != Fiber::TERM; i = coro.call()) cout << i << endl;
}

回答 2

由于Boost.Coroutine2现在很好地支持了它(我找到它是因为我想解决完全相同的yield问题),所以我发布了符合您最初意图的C ++代码:

#include <stdint.h>
#include <iostream>
#include <memory>
#include <boost/coroutine2/all.hpp>

typedef boost::coroutines2::coroutine<std::pair<uint16_t, uint16_t>> coro_t;

void pair_sequence(coro_t::push_type& yield)
{
    uint16_t i = 0;
    uint16_t j = 0;
    for (;;) {
        for (;;) {
            yield(std::make_pair(i, j));
            if (++j == 0)
                break;
        }
        if (++i == 0)
            break;
    }
}

int main()
{
    coro_t::pull_type seq(boost::coroutines2::fixedsize_stack(),
                          pair_sequence);
    for (auto pair : seq) {
        print_pair(pair);
    }
    //while (seq) {
    //    print_pair(seq.get());
    //    seq();
    //}
}

在此示例中,pair_sequence不接受其他参数。如果需要,std::bind或者在将lambda push_type传递给coro_t::pull_type构造函数时,应使用lambda生成仅包含(of )个参数的函数对象。

Since Boost.Coroutine2 now supports it very well (I found it because I wanted to solve exactly the same yield problem), I am posting the C++ code that matches your original intention:

#include <stdint.h>
#include <iostream>
#include <memory>
#include <boost/coroutine2/all.hpp>

typedef boost::coroutines2::coroutine<std::pair<uint16_t, uint16_t>> coro_t;

void pair_sequence(coro_t::push_type& yield)
{
    uint16_t i = 0;
    uint16_t j = 0;
    for (;;) {
        for (;;) {
            yield(std::make_pair(i, j));
            if (++j == 0)
                break;
        }
        if (++i == 0)
            break;
    }
}

int main()
{
    coro_t::pull_type seq(boost::coroutines2::fixedsize_stack(),
                          pair_sequence);
    for (auto pair : seq) {
        print_pair(pair);
    }
    //while (seq) {
    //    print_pair(seq.get());
    //    seq();
    //}
}

In this example, pair_sequence does not take additional arguments. If it needs to, std::bind or a lambda should be used to generate a function object that takes only one argument (of push_type), when it is passed to the coro_t::pull_type constructor.


回答 3

所有涉及编写自己的迭代器的答案都是完全错误的。这样的答案完全错过了Python生成器的意义(该语言最大的独特功能之一)。生成器最重要的是执行从中断处开始执行。迭代器不会发生这种情况。取而代之的是,您必须手动存储状态信息,以便在重新调用operator ++或operator * 时,在下一个函数调用的最开始便有正确的信息。这就是为什么编写自己的C ++迭代器是一个巨大的痛苦。相反,生成器优雅,并且易于读写。

我认为本机C ++中没有适合Python生成器的良好模拟,至少目前还没有(有传言称yield将在C ++ 17中实现)。您可以借助第三方(例如,永伟的Boost建议)或自己动手做一些类似的事情。

我会说本机C ++中最接近的东西是线程。线程可以维护一组暂停的局部变量,并且可以在中断处继续执行,这与生成器非常相似,但是您需要使用一些附加的基础架构来支持生成器对象与其调用者之间的通信。例如

// Infrastructure

template <typename Element>
class Channel { ... };

// Application

using IntPair = std::pair<int, int>;

void yield_pairs(int end_i, int end_j, Channel<IntPair>* out) {
  for (int i = 0; i < end_i; ++i) {
    for (int j = 0; j < end_j; ++j) {
      out->send(IntPair{i, j});  // "yield"
    }
  }
  out->close();
}

void MyApp() {
  Channel<IntPair> pairs;
  std::thread generator(yield_pairs, 32, 32, &pairs);
  for (IntPair pair : pairs) {
    UsePair(pair);
  }
  generator.join();
}

但是,此解决方案有几个缺点:

  1. 线程是“昂贵的”。大多数人会认为这是对线程的“过度”使用,尤其是当生成器如此简单时。
  2. 您需要记住一些清理操作。这些可以是自动化的,但是您将需要更多的基础架构,而这些基础架构又可能被视为“过于奢侈”。无论如何,您需要进行的清理工作是:
    1. out-> close()
    2. generator.join()
  3. 这不允许您停止生成器。您可以进行一些修改以添加该功能,但是这会使代码混乱。它永远不会像Python的yield语句那么干净。
  4. 除2之外,每次您要“实例化”生成器对象时,还需要其他样板位:
    1. 通道*输出参数
    2. 主变量:对,生成器

All answers that involve writing your own iterator are completely wrong. Such answers entirely miss the point of Python generators (one of the language’s greatest and unique features). The most important thing about generators is that execution picks up where it left off. This does not happen to iterators. Instead, you must manually store state information such that when operator++ or operator* is called anew, the right information is in place at the very beginning of the next function call. This is why writing your own C++ iterator is a gigantic pain; whereas, generators are elegant, and easy to read+write.

I don’t think there is a good analog for Python generators in native C++, at least not yet (there is a rummor that yield will land in C++17). You can get something similarish by resorting to third-party (e.g. Yongwei’s Boost suggestion), or rolling your own.

I would say the closest thing in native C++ is threads. A thread can maintain a suspended set of local variables, and can continue execution where it left off, very much like generators, but you need to roll a little bit of additional infrastructure to support communication between the generator object and its caller. E.g.

// Infrastructure

template <typename Element>
class Channel { ... };

// Application

using IntPair = std::pair<int, int>;

void yield_pairs(int end_i, int end_j, Channel<IntPair>* out) {
  for (int i = 0; i < end_i; ++i) {
    for (int j = 0; j < end_j; ++j) {
      out->send(IntPair{i, j});  // "yield"
    }
  }
  out->close();
}

void MyApp() {
  Channel<IntPair> pairs;
  std::thread generator(yield_pairs, 32, 32, &pairs);
  for (IntPair pair : pairs) {
    UsePair(pair);
  }
  generator.join();
}

This solution has several downsides though:

  1. Threads are “expensive”. Most people would consider this to be an “extravagant” use of threads, especially when your generator is so simple.
  2. There are a couple of clean up actions that you need to remember. These could be automated, but you’d need even more infrastructure, which again, is likely to be seen as “too extravagant”. Anyway, the clean ups that you need are:
    1. out->close()
    2. generator.join()
  3. This does not allow you to stop generator. You could make some modifications to add that ability, but it adds clutter to the code. It would never be as clean as Python’s yield statement.
  4. In addition to 2, there are other bits of boilerplate that are needed each time you want to “instantiate” a generator object:
    1. Channel* out parameter
    2. Additional variables in main: pairs, generator

回答 4

您可能应该在Visual Studio 2015的std :: experimental中检查生成器,例如:https : //blogs.msdn.microsoft.com/vcblog/2014/11/12/resumable-functions-in-c/

我认为这正是您想要的。总体生成器应在C ++ 17中可用,因为这只是实验性的Microsoft VC功能。

You should probably check generators in std::experimental in Visual Studio 2015 e.g: https://blogs.msdn.microsoft.com/vcblog/2014/11/12/resumable-functions-in-c/

I think it’s exactly what you are looking for. Overall generators should be available in C++17 as this is only experimental Microsoft VC feature.


回答 5

如果只需要为相对较少的特定生成器执行此操作,则可以将每个实现生成为一个类,其中成员数据等效于Python生成器函数的局部变量。然后,您将具有一个next函数,该函数返回生成器将产生的下一个内容,并以此更新内部状态。

我相信,这基本上与Python生成器的实现方式相似。主要的区别是它们可以记住生成器函数的字节码中的偏移量,作为“内部状态”的一部分,这意味着可以将生成器写为包含yield的循环。您将不得不从上一个计算下一个值。在您的情况下pair_sequence,这是微不足道的。它可能不适用于复杂的生成器。

您还需要一些指示终止的方法。如果返回的是“类似指针的”,并且NULL不应为有效的yieldable值,则可以将NULL指针用作终止指示符。否则,您需要带外信号。

If you only need to do this for a relatively small number of specific generators, you can implement each as a class, where the member data is equivalent to the local variables of the Python generator function. Then you have a next function that returns the next thing the generator would yield, updating the internal state as it does so.

This is basically similar to how Python generators are implemented, I believe. The major difference being they can remember an offset into the bytecode for the generator function as part of the “internal state”, which means the generators can be written as loops containing yields. You would have to instead calculate the next value from the previous. In the case of your pair_sequence, that’s pretty trivial. It may not be for complex generators.

You also need some way of indicating termination. If what you’re returning is “pointer-like”, and NULL should not be a valid yieldable value you could use a NULL pointer as a termination indicator. Otherwise you need an out-of-band signal.


回答 6

这样的事情非常相似:

struct pair_sequence
{
    typedef pair<unsigned int, unsigned int> result_type;
    static const unsigned int limit = numeric_limits<unsigned int>::max()

    pair_sequence() : i(0), j(0) {}

    result_type operator()()
    {
        result_type r(i, j);
        if(j < limit) j++;
        else if(i < limit)
        {
          j = 0;
          i++;
        }
        else throw out_of_range("end of iteration");
    }

    private:
        unsigned int i;
        unsigned int j;
}

使用operator()只是要对该生成器执行的操作的一个问题,例如,您还可以将其构建为流,并确保它适合istream_iterator。

Something like this is very similar:

struct pair_sequence
{
    typedef pair<unsigned int, unsigned int> result_type;
    static const unsigned int limit = numeric_limits<unsigned int>::max()

    pair_sequence() : i(0), j(0) {}

    result_type operator()()
    {
        result_type r(i, j);
        if(j < limit) j++;
        else if(i < limit)
        {
          j = 0;
          i++;
        }
        else throw out_of_range("end of iteration");
    }

    private:
        unsigned int i;
        unsigned int j;
}

Using the operator() is only a question of what you want to do with this generator, you could also build it as a stream and make sure it adapts to an istream_iterator, for example.


回答 7

使用range-v3

#include <iostream>
#include <tuple>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

auto generator = [x = view::iota(0) | view::take(3)] {
    return view::cartesian_product(x, x);
};

int main () {
    for (auto x : generator()) {
        cout << get<0>(x) << ", " << get<1>(x) << endl;
    }

    return 0;
}

Using range-v3:

#include <iostream>
#include <tuple>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

auto generator = [x = view::iota(0) | view::take(3)] {
    return view::cartesian_product(x, x);
};

int main () {
    for (auto x : generator()) {
        cout << get<0>(x) << ", " << get<1>(x) << endl;
    }

    return 0;
}

回答 8

这样的东西:

使用示例:

using ull = unsigned long long;

auto main() -> int {
    for (ull val : range_t<ull>(100)) {
        std::cout << val << std::endl;
    }

    return 0;
}

将打印从0到99的数字

Something like this:

Example use:

using ull = unsigned long long;

auto main() -> int {
    for (ull val : range_t<ull>(100)) {
        std::cout << val << std::endl;
    }

    return 0;
}

Will print the numbers from 0 to 99


回答 9

好吧,今天我也在寻找在C ++ 11下实现轻松收集的实现。实际上,我很失望,因为我发现的一切都与python生成器或C#yield操作符等东西相距太远……或者过于复杂。

目的是使收集仅在需要时才发出其项目。

我希望它像这样:

auto emitter = on_range<int>(a, b).yield(
    [](int i) {
         /* do something with i */
         return i * 2;
    });

我发现这个职位,恕我直言,最好的回答是大约boost.coroutine2,通过永伟吴。由于它是最接近作者想要的东西。

值得学习加强日常护理。.而且我也许会在周末去做。但是到目前为止,我正在使用非常小的实现。希望它对其他人有帮助。

下面是使用示例,然后实现。

范例.cpp

#include <iostream>
#include "Generator.h"
int main() {
    typedef std::pair<int, int> res_t;

    auto emitter = Generator<res_t, int>::on_range(0, 3)
        .yield([](int i) {
            return std::make_pair(i, i * i);
        });

    for (auto kv : emitter) {
        std::cout << kv.first << "^2 = " << kv.second << std::endl;
    }

    return 0;
}

Generator.h

template<typename ResTy, typename IndexTy>
struct yield_function{
    typedef std::function<ResTy(IndexTy)> type;
};

template<typename ResTy, typename IndexTy>
class YieldConstIterator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef YieldConstIterator<ResTy, IndexTy> mytype_t;
    typedef ResTy value_type;

    YieldConstIterator(index_t index, yield_function_t yieldFunction) :
            mIndex(index),
            mYieldFunction(yieldFunction) {}

    mytype_t &operator++() {
        ++mIndex;
        return *this;
    }

    const value_type operator*() const {
        return mYieldFunction(mIndex);
    }

    bool operator!=(const mytype_t &r) const {
        return mIndex != r.mIndex;
    }

protected:

    index_t mIndex;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class YieldIterator : public YieldConstIterator<ResTy, IndexTy> {
public:

    typedef YieldConstIterator<ResTy, IndexTy> parent_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef ResTy value_type;

    YieldIterator(index_t index, yield_function_t yieldFunction) :
            parent_t(index, yieldFunction) {}

    value_type operator*() {
        return parent_t::mYieldFunction(parent_t::mIndex);
    }
};

template<typename IndexTy>
struct Range {
public:
    typedef IndexTy index_t;
    typedef Range<IndexTy> mytype_t;

    index_t begin;
    index_t end;
};

template<typename ResTy, typename IndexTy>
class GeneratorCollection {
public:

    typedef Range<IndexTy> range_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef YieldIterator<ResTy, IndexTy> iterator;
    typedef YieldConstIterator<ResTy, IndexTy> const_iterator;

    GeneratorCollection(range_t range, const yield_function_t &yieldF) :
            mRange(range),
            mYieldFunction(yieldF) {}

    iterator begin() {
        return iterator(mRange.begin, mYieldFunction);
    }

    iterator end() {
        return iterator(mRange.end, mYieldFunction);
    }

    const_iterator begin() const {
        return const_iterator(mRange.begin, mYieldFunction);
    }

    const_iterator end() const {
        return const_iterator(mRange.end, mYieldFunction);
    }

private:
    range_t mRange;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class Generator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef Generator<ResTy, IndexTy> mytype_t;
    typedef Range<IndexTy> parent_t;
    typedef GeneratorCollection<ResTy, IndexTy> finalized_emitter_t;
    typedef  Range<IndexTy> range_t;

protected:
    Generator(range_t range) : mRange(range) {}
public:
    static mytype_t on_range(index_t begin, index_t end) {
        return mytype_t({ begin, end });
    }

    finalized_emitter_t yield(yield_function_t f) {
        return finalized_emitter_t(mRange, f);
    }
protected:

    range_t mRange;
};      

Well, today I also was looking for easy collection implementation under C++11. Actually I was disappointed, because everything I found is too far from things like python generators, or C# yield operator… or too complicated.

The purpose is to make collection which will emit its items only when it is required.

I wanted it to be like this:

auto emitter = on_range<int>(a, b).yield(
    [](int i) {
         /* do something with i */
         return i * 2;
    });

I found this post, IMHO best answer was about boost.coroutine2, by Yongwei Wu. Since it is the nearest to what author wanted.

It is worth learning boost couroutines.. And I’ll perhaps do on weekends. But so far I’m using my very small implementation. Hope it helps to someone else.

Below is example of use, and then implementation.

Example.cpp

#include <iostream>
#include "Generator.h"
int main() {
    typedef std::pair<int, int> res_t;

    auto emitter = Generator<res_t, int>::on_range(0, 3)
        .yield([](int i) {
            return std::make_pair(i, i * i);
        });

    for (auto kv : emitter) {
        std::cout << kv.first << "^2 = " << kv.second << std::endl;
    }

    return 0;
}

Generator.h

template<typename ResTy, typename IndexTy>
struct yield_function{
    typedef std::function<ResTy(IndexTy)> type;
};

template<typename ResTy, typename IndexTy>
class YieldConstIterator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef YieldConstIterator<ResTy, IndexTy> mytype_t;
    typedef ResTy value_type;

    YieldConstIterator(index_t index, yield_function_t yieldFunction) :
            mIndex(index),
            mYieldFunction(yieldFunction) {}

    mytype_t &operator++() {
        ++mIndex;
        return *this;
    }

    const value_type operator*() const {
        return mYieldFunction(mIndex);
    }

    bool operator!=(const mytype_t &r) const {
        return mIndex != r.mIndex;
    }

protected:

    index_t mIndex;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class YieldIterator : public YieldConstIterator<ResTy, IndexTy> {
public:

    typedef YieldConstIterator<ResTy, IndexTy> parent_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef ResTy value_type;

    YieldIterator(index_t index, yield_function_t yieldFunction) :
            parent_t(index, yieldFunction) {}

    value_type operator*() {
        return parent_t::mYieldFunction(parent_t::mIndex);
    }
};

template<typename IndexTy>
struct Range {
public:
    typedef IndexTy index_t;
    typedef Range<IndexTy> mytype_t;

    index_t begin;
    index_t end;
};

template<typename ResTy, typename IndexTy>
class GeneratorCollection {
public:

    typedef Range<IndexTy> range_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef YieldIterator<ResTy, IndexTy> iterator;
    typedef YieldConstIterator<ResTy, IndexTy> const_iterator;

    GeneratorCollection(range_t range, const yield_function_t &yieldF) :
            mRange(range),
            mYieldFunction(yieldF) {}

    iterator begin() {
        return iterator(mRange.begin, mYieldFunction);
    }

    iterator end() {
        return iterator(mRange.end, mYieldFunction);
    }

    const_iterator begin() const {
        return const_iterator(mRange.begin, mYieldFunction);
    }

    const_iterator end() const {
        return const_iterator(mRange.end, mYieldFunction);
    }

private:
    range_t mRange;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class Generator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef Generator<ResTy, IndexTy> mytype_t;
    typedef Range<IndexTy> parent_t;
    typedef GeneratorCollection<ResTy, IndexTy> finalized_emitter_t;
    typedef  Range<IndexTy> range_t;

protected:
    Generator(range_t range) : mRange(range) {}
public:
    static mytype_t on_range(index_t begin, index_t end) {
        return mytype_t({ begin, end });
    }

    finalized_emitter_t yield(yield_function_t f) {
        return finalized_emitter_t(mRange, f);
    }
protected:

    range_t mRange;
};      

回答 10

这个答案适用于C语言(因此我认为也适用于C ++语言)

#include <stdio.h>

const uint64_t MAX = 1ll<<32;

typedef struct {
    uint64_t i, j;
} Pair;

Pair* generate_pairs()
{
    static uint64_t i = 0;
    static uint64_t j = 0;
    
    Pair p = {i,j};
    if(j++ < MAX)
    {
        return &p;
    }
        else if(++i < MAX)
    {
        p.i++;
        p.j = 0;
        j = 0;
        return &p;
    }
    else
    {
        return NULL;
    }
}

int main()
{
    while(1)
    {
        Pair *p = generate_pairs();
        if(p != NULL)
        {
            //printf("%d,%d\n",p->i,p->j);
        }
        else
        {
            //printf("end");
            break;
        }
    }
    return 0;
}

这是一种模拟生成器的简单,非面向对象的方法。这按我的预期工作。

This answer works in C (and hence i think works in c++ too)

#include <stdio.h>

const uint64_t MAX = 1ll<<32;

typedef struct {
    uint64_t i, j;
} Pair;

Pair* generate_pairs()
{
    static uint64_t i = 0;
    static uint64_t j = 0;
    
    Pair p = {i,j};
    if(j++ < MAX)
    {
        return &p;
    }
        else if(++i < MAX)
    {
        p.i++;
        p.j = 0;
        j = 0;
        return &p;
    }
    else
    {
        return NULL;
    }
}

int main()
{
    while(1)
    {
        Pair *p = generate_pairs();
        if(p != NULL)
        {
            //printf("%d,%d\n",p->i,p->j);
        }
        else
        {
            //printf("end");
            break;
        }
    }
    return 0;
}

This is simple, non object-oriented way to mimic a generator. This worked as expected for me.


回答 11

正如函数模拟堆栈的概念一样,生成器模拟队列的概念。剩下的就是语义。

附带说明,您始终可以通过使用操作堆栈而不是数据来模拟带有堆栈的队列。实际上,这意味着您可以通过返回一对来实现类似队列的行为,该对的第二个值要么具有要调用的下一个函数,要么表明我们没有值。但是,这比收益与收益的关系更为普遍。它允许模拟任何值的队列,而不是生成器期望的同类值,但是无需保留完整的内部队列。

更具体地说,由于C ++对队列没有自然的抽象,因此您需要使用在内部实现队列的构造。因此,使用迭代器给出示例的答案是该概念的不错实现。

这实际上意味着,如果您只想快速地进行操作,然后使用队列值,就可以使用生成器产生的值,那么您可以使用准系统队列功能来实现某些功能。

Just as a function simulates the concept of a stack, generators simulate the concept of a queue. The rest is semantics.

As a side note, you can always simulate a queue with a stack by using a stack of operations instead of data. What that practically means is that you can implement a queue-like behavior by returning a pair, the second value of which either has the next function to be called or indicates that we are out of values. But this is more general than what yield vs return does. It allows to simulate a queue of any values rather than homogeneous values that you expect from a generator, but without keeping a full internal queue.

More specifically, since C++ does not have a natural abstraction for a queue, you need to use constructs which implement a queue internally. So the answer which gave the example with iterators is a decent implementation of the concept.

What this practically means is that you can implement something with bare-bones queue functionality if you just want something quick and then consume queue’s values just as you would consume values yielded from a generator.


Python 2.x中的nonlocal关键字

问题:Python 2.x中的nonlocal关键字

我正在尝试在python 2.6中实现闭包,我需要访问一个非局部变量,但似乎此关键字在python 2.x中不可用。在这些版本的python中,如何在闭包中访问非局部变量?

I’m trying to implement a closure in Python 2.6 and I need to access a nonlocal variable but it seems like this keyword is not available in python 2.x. How should one access nonlocal variables in closures in these versions of python?


回答 0

内部函数可以读取 2.x中的非局部变量,而无需重新绑定它们。这很烦人,但是您可以解决它。只需创建一个字典,然后将数据作为元素存储在其中即可。禁止内部函数对非局部变量引用的对象进行突变

要使用Wikipedia中的示例:

def outer():
    d = {'y' : 0}
    def inner():
        d['y'] += 1
        return d['y']
    return inner

f = outer()
print(f(), f(), f()) #prints 1 2 3

Inner functions can read nonlocal variables in 2.x, just not rebind them. This is annoying, but you can work around it. Just create a dictionary, and store your data as elements therein. Inner functions are not prohibited from mutating the objects that nonlocal variables refer to.

To use the example from Wikipedia:

def outer():
    d = {'y' : 0}
    def inner():
        d['y'] += 1
        return d['y']
    return inner

f = outer()
print(f(), f(), f()) #prints 1 2 3

回答 1

以下解决方案的灵感来自Elias Zamaria答案,但与该答案相反,它确实可以正确处理外部函数的多次调用。“变量” inner.y位于的当前调用中outer。因为它是一个变量,所以它不是变量,而是对象属性(对象inner本身就是函数本身)。这非常丑陋(请注意,只能在inner定义函数后才能创建属性),但似乎有效。

def outer():
    def inner():
        inner.y += 1
        return inner.y
    inner.y = 0
    return inner

f = outer()
g = outer()
print(f(), f(), g(), f(), g()) #prints (1, 2, 1, 3, 2)

The following solution is inspired by the answer by Elias Zamaria, but contrary to that answer does handle multiple calls of the outer function correctly. The “variable” inner.y is local to the current call of outer. Only it isn’t a variable, since that is forbidden, but an object attribute (the object being the function inner itself). This is very ugly (note that the attribute can only be created after the inner function is defined) but seems effective.

def outer():
    def inner():
        inner.y += 1
        return inner.y
    inner.y = 0
    return inner

f = outer()
g = outer()
print(f(), f(), g(), f(), g()) #prints (1, 2, 1, 3, 2)

回答 2

与字典相比,非本地类的混乱程度更低。修改@ChrisB的示例

def outer():
    class context:
        y = 0
    def inner():
        context.y += 1
        return context.y
    return inner

然后

f = outer()
assert f() == 1
assert f() == 2
assert f() == 3
assert f() == 4

每个external()调用都会创建一个称为上下文的新的独特类(不仅仅是一个新实例)。因此,它避免了@Nathaniel提防共享上下文。

g = outer()
assert g() == 1
assert g() == 2

assert f() == 5

Rather than a dictionary, there’s less clutter to a nonlocal class. Modifying @ChrisB’s example:

def outer():
    class context:
        y = 0
    def inner():
        context.y += 1
        return context.y
    return inner

Then

f = outer()
assert f() == 1
assert f() == 2
assert f() == 3
assert f() == 4

Each outer() call creates a new and distinct class called context (not merely a new instance). So it avoids @Nathaniel’s beware about shared context.

g = outer()
assert g() == 1
assert g() == 2

assert f() == 5

回答 3

我认为这里的关键是“访问”的含义。读取闭包范围之外的变量应该没有问题,例如,

x = 3
def outer():
    def inner():
        print x
    inner()
outer()

应该可以按预期工作(打印3)。但是,覆盖x的值不起作用,例如,

x = 3
def outer():
    def inner():
        x = 5
    inner()
outer()
print x

仍会打印3。根据我对PEP-3104的理解,这就是nonlocal关键字的含义。如PEP中所述,您可以使用一个类来完成同一件事(有点凌乱):

class Namespace(object): pass
ns = Namespace()
ns.x = 3
def outer():
    def inner():
        ns.x = 5
    inner()
outer()
print ns.x

I think the key here is what you mean by “access”. There should be no issue with reading a variable outside of the closure scope, e.g.,

x = 3
def outer():
    def inner():
        print x
    inner()
outer()

should work as expected (printing 3). However, overriding the value of x does not work, e.g.,

x = 3
def outer():
    def inner():
        x = 5
    inner()
outer()
print x

will still print 3. From my understanding of PEP-3104 this is what the nonlocal keyword is meant to cover. As mentioned in the PEP, you can use a class to accomplish the same thing (kind of messy):

class Namespace(object): pass
ns = Namespace()
ns.x = 3
def outer():
    def inner():
        ns.x = 5
    inner()
outer()
print ns.x

回答 4

如果由于任何原因,此处的任何答案都不理想,则可以使用另一种方法在Python 2中实现非局部变量:

def outer():
    outer.y = 0
    def inner():
        outer.y += 1
        return outer.y
    return inner

f = outer()
print(f(), f(), f()) #prints 1 2 3

在变量的赋值语句中使用函数名称是多余的,但对我来说,比将变量放入字典中看起来更简单和简洁。就像克里斯·B(Chris B.)的回答一样,一个电话会记住另一个电话的价值。

There is another way to implement nonlocal variables in Python 2, in case any of the answers here are undesirable for whatever reason:

def outer():
    outer.y = 0
    def inner():
        outer.y += 1
        return outer.y
    return inner

f = outer()
print(f(), f(), f()) #prints 1 2 3

It is redundant to use the name of the function in the assignment statement of the variable, but it looks simpler and cleaner to me than putting the variable in a dictionary. The value is remembered from one call to another, just like in Chris B.’s answer.


回答 5

以下是Alois Mahdal在评论另一个答案时提出的建议:

class Nonlocal(object):
    """ Helper to implement nonlocal names in Python 2.x """
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def outer():
    nl = Nonlocal(y=0)
    def inner():
        nl.y += 1
        return nl.y
    return inner

f = outer()
print(f(), f(), f()) # -> (1 2 3)

更新资料

在最近回顾了这一点之后,我对它的装饰风格感到震惊-当我想到将其实现为装饰器将使它更加通用和有用时(尽管这样做无疑会在某种程度上降低其可读性)。

# Implemented as a decorator.

class Nonlocal(object):
    """ Decorator class to help implement nonlocal names in Python 2.x """
    def __init__(self, **kwargs):
        self._vars = kwargs

    def __call__(self, func):
        for k, v in self._vars.items():
            setattr(func, k, v)
        return func


@Nonlocal(y=0)
def outer():
    def inner():
        outer.y += 1
        return outer.y
    return inner


f = outer()
print(f(), f(), f()) # -> (1 2 3)

请注意,这两个版本均可在Python 2和3中使用。

Here’s something inspired by a suggestion Alois Mahdal made in a comment regarding another answer:

class Nonlocal(object):
    """ Helper to implement nonlocal names in Python 2.x """
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def outer():
    nl = Nonlocal(y=0)
    def inner():
        nl.y += 1
        return nl.y
    return inner

f = outer()
print(f(), f(), f()) # -> (1 2 3)

Update

After looking back at this recently, I was struck by how decorator-like it was—when it dawned on me that implementing it as one would make it more generic & useful (although doing so arguably degrades its readability to some degree).

# Implemented as a decorator.

class Nonlocal(object):
    """ Decorator class to help implement nonlocal names in Python 2.x """
    def __init__(self, **kwargs):
        self._vars = kwargs

    def __call__(self, func):
        for k, v in self._vars.items():
            setattr(func, k, v)
        return func


@Nonlocal(y=0)
def outer():
    def inner():
        outer.y += 1
        return outer.y
    return inner


f = outer()
print(f(), f(), f()) # -> (1 2 3)

Note that both versions work in both Python 2 and 3.


回答 6

python的作用域规则中有一个缺陷-赋值使变量在其立即包含的函数作用域内是局部的。对于全局变量,您可以使用global关键字解决。

解决方案是引入一个在两个作用域之间共享的对象,该对象包含可变变量,但本身通过未分配的变量引用。

def outer(v):
    def inner(container = [v]):
        container[0] += 1
        return container[0]
    return inner

另一种选择是一些示波器黑客:

def outer(v):
    def inner(varname = 'v', scope = locals()):
        scope[varname] += 1
        return scope[varname]
    return inner

您可能可以找出一些技巧来将参数的名称获取到outer,然后将其作为varname传递,但是在不依赖名称的情况下,outer您需要使用Y组合器。

There is a wart in python’s scoping rules – assignment makes a variable local to its immediately enclosing function scope. For a global variable, you would solve this with the global keyword.

The solution is to introduce an object which is shared between the two scopes, which contains mutable variables, but is itself referenced through a variable which is not assigned.

def outer(v):
    def inner(container = [v]):
        container[0] += 1
        return container[0]
    return inner

An alternative is some scopes hackery:

def outer(v):
    def inner(varname = 'v', scope = locals()):
        scope[varname] += 1
        return scope[varname]
    return inner

You might be able to figure out some trickery to get the name of the parameter to outer, and then pass it as varname, but without relying on the name outer you would like need to use a Y combinator.


回答 7

另一种方法(尽管太冗长了):

import ctypes

def outer():
    y = 0
    def inner():
        ctypes.pythonapi.PyCell_Set(id(inner.func_closure[0]), id(y + 1))
        return y
    return inner

x = outer()
x()
>> 1
x()
>> 2
y = outer()
y()
>> 1
x()
>> 3

Another way to do it (although it’s too verbose):

import ctypes

def outer():
    y = 0
    def inner():
        ctypes.pythonapi.PyCell_Set(id(inner.func_closure[0]), id(y + 1))
        return y
    return inner

x = outer()
x()
>> 1
x()
>> 2
y = outer()
y()
>> 1
x()
>> 3

回答 8

将Martineau的优雅解决方案扩展到一个实用且不太优雅的用例中,我得到:

class nonlocals(object):
""" Helper to implement nonlocal names in Python 2.x.
Usage example:
def outer():
     nl = nonlocals( n=0, m=1 )
     def inner():
         nl.n += 1
     inner() # will increment nl.n

or...
    sums = nonlocals( { k:v for k,v in locals().iteritems() if k.startswith('tot_') } )
"""
def __init__(self, **kwargs):
    self.__dict__.update(kwargs)

def __init__(self, a_dict):
    self.__dict__.update(a_dict)

Extending Martineau elegant solution above to a practical and somewhat less elegant use case I get:

class nonlocals(object):
""" Helper to implement nonlocal names in Python 2.x.
Usage example:
def outer():
     nl = nonlocals( n=0, m=1 )
     def inner():
         nl.n += 1
     inner() # will increment nl.n

or...
    sums = nonlocals( { k:v for k,v in locals().iteritems() if k.startswith('tot_') } )
"""
def __init__(self, **kwargs):
    self.__dict__.update(kwargs)

def __init__(self, a_dict):
    self.__dict__.update(a_dict)

回答 9

使用全局变量

def outer():
    global y # import1
    y = 0
    def inner():
        global y # import2 - requires import1
        y += 1
        return y
    return inner

f = outer()
print(f(), f(), f()) #prints 1 2 3

我个人不喜欢全局变量。但是,我的建议基于https://stackoverflow.com/a/19877437/1083704回答

def report():
        class Rank: 
            def __init__(self):
                report.ranks += 1
        rank = Rank()
report.ranks = 0
report()

在用户需要声明全局变量的地方ranks,每次需要调用report。我的改进消除了从用户初始化函数变量的需要。

Use a global variable

def outer():
    global y # import1
    y = 0
    def inner():
        global y # import2 - requires import1
        y += 1
        return y
    return inner

f = outer()
print(f(), f(), f()) #prints 1 2 3

Personally, I do not like the global variables. But, my proposal is based on https://stackoverflow.com/a/19877437/1083704 answer

def report():
        class Rank: 
            def __init__(self):
                report.ranks += 1
        rank = Rank()
report.ranks = 0
report()

where user needs to declare a global variable ranks, every time you need to call the report. My improvement eliminates the need to initialize the function variables from the user.


查找字符串中子字符串的第n次出现

问题:查找字符串中子字符串的第n次出现

这似乎应该是微不足道的,但是我是Python的新手,并且希望以最Python的方式进行操作。

我想找到对应于字符串中第n个子字符串的索引。

一定有什么我想做的事情是

mystring.find("substring", 2nd)

如何在Python中实现?

This seems like it should be pretty trivial, but I am new at Python and want to do it the most Pythonic way.

I want to find the index corresponding to the n’th occurrence of a substring within a string.

There’s got to be something equivalent to what I WANT to do which is

mystring.find("substring", 2nd)

How can you achieve this in Python?


回答 0

我认为,Mark的迭代方法将是通常的方法。

这是字符串拆分的替代方法,通常可用于查找相关过程:

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

这是一种快速(有点脏,因为您必须选择一些无法与针头相匹配的谷壳)的单缸套:

'foo bar bar bar'.replace('bar', 'XXX', 1).find('bar')

Mark’s iterative approach would be the usual way, I think.

Here’s an alternative with string-splitting, which can often be useful for finding-related processes:

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

And here’s a quick (and somewhat dirty, in that you have to choose some chaff that can’t match the needle) one-liner:

'foo bar bar bar'.replace('bar', 'XXX', 1).find('bar')

回答 1

这是简单的迭代解决方案的更多Pythonic版本:

def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start

例:

>>> find_nth("foofoofoofoo", "foofoo", 2)
6

如果要查找的第n个重叠出现needle,可以用1代替,增加len(needle),如下所示:

def find_nth_overlapping(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+1)
        n -= 1
    return start

例:

>>> find_nth_overlapping("foofoofoofoo", "foofoo", 2)
3

这比Mark的版本更容易阅读,并且不需要拆分版本或导入正则表达式模块的额外内存。与各种方法不同,它还遵守python Zen中的一些规则re

  1. 简单胜于复杂。
  2. 扁平比嵌套更好。
  3. 可读性很重要。

Here’s a more Pythonic version of the straightforward iterative solution:

def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start

Example:

>>> find_nth("foofoofoofoo", "foofoo", 2)
6

If you want to find the nth overlapping occurrence of needle, you can increment by 1 instead of len(needle), like this:

def find_nth_overlapping(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+1)
        n -= 1
    return start

Example:

>>> find_nth_overlapping("foofoofoofoo", "foofoo", 2)
3

This is easier to read than Mark’s version, and it doesn’t require the extra memory of the splitting version or importing regular expression module. It also adheres to a few of the rules in the Zen of python, unlike the various re approaches:

  1. Simple is better than complex.
  2. Flat is better than nested.
  3. Readability counts.

回答 2

这将在字符串中找到子字符串的第二次出现。

def find_2nd(string, substring):
   return string.find(substring, string.find(substring) + 1)

编辑:我对性能没有考虑太多,但是快速递归可以帮助找到第n个出现的情况:

def find_nth(string, substring, n):
   if (n == 1):
       return string.find(substring)
   else:
       return string.find(substring, find_nth(string, substring, n - 1) + 1)

This will find the second occurrence of substring in string.

def find_2nd(string, substring):
   return string.find(substring, string.find(substring) + 1)

Edit: I haven’t thought much about the performance, but a quick recursion can help with finding the nth occurrence:

def find_nth(string, substring, n):
   if (n == 1):
       return string.find(substring)
   else:
       return string.find(substring, find_nth(string, substring, n - 1) + 1)

回答 3

了解正则表达式并不总是最好的解决方案,我可能在这里使用一个:

>>> import re
>>> s = "ababdfegtduab"
>>> [m.start() for m in re.finditer(r"ab",s)]
[0, 2, 11]
>>> [m.start() for m in re.finditer(r"ab",s)][2] #index 2 is third occurrence 
11

Understanding that regex is not always the best solution, I’d probably use one here:

>>> import re
>>> s = "ababdfegtduab"
>>> [m.start() for m in re.finditer(r"ab",s)]
[0, 2, 11]
>>> [m.start() for m in re.finditer(r"ab",s)][2] #index 2 is third occurrence 
11

回答 4

我提供了一些基准测试结果,以比较到目前为止介绍的最著名的方法,即@bobince findnth()(基于str.split())与@tgamblin find_nth()(或基于@Mark Byers)(基于str.find())。我还将与C扩展名(_find_nth.so)进行比较,以了解我们可以走多快。这里是find_nth.py

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

def find_nth(s, x, n=0, overlap=False):
    l = 1 if overlap else len(x)
    i = -l
    for c in xrange(n + 1):
        i = s.find(x, i + l)
        if i < 0:
            break
    return i

当然,如果字符串很大,性能最重要,因此假设我们要在1.3 GB的文件“ bigfile”中找到第1000001个换行符(’\ n’)。为了节省内存,我们希望处理mmap.mmap文件的对象表示形式:

In [1]: import _find_nth, find_nth, mmap

In [2]: f = open('bigfile', 'r')

In [3]: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

findnth()由于mmap.mmap对象不支持,因此已经存在第一个问题split()。因此,我们实际上必须将整个文件复制到内存中:

In [4]: %time s = mm[:]
CPU times: user 813 ms, sys: 3.25 s, total: 4.06 s
Wall time: 17.7 s

哎哟! 幸运的是s,我的Macbook Air仍可容纳4 GB内存,因此让我们进行基准测试findnth()

In [5]: %timeit find_nth.findnth(s, '\n', 1000000)
1 loops, best of 3: 29.9 s per loop

显然表现糟糕。让我们看看基于的方法是如何str.find()做到的:

In [6]: %timeit find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 774 ms per loop

好多了!显然,findnth()问题在于它被迫在期间复制字符串split(),这已经是我们第二次在after之后复制1.3 GB的数据了s = mm[:]。这里有第二个优点find_nth():我们可以mm直接使用它,因此文件的副本是必需的:

In [7]: %timeit find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 1.21 s per loop

mmvs. 上似乎有一些小的性能损失s,但这表明find_nth()与1.2 s findnth的总和(47 s)相比,可以在1.2 s内获得答案。

我发现没有任何str.find()一种方法比基于方法的性能明显差于str.split()基于方法的情况,因此,在这一点上,我认为应该接受@tgamblin或@Mark Byers的答案,而不是@bobince的答案。

在我的测试中,上述版本find_nth()是我能想到的最快的纯Python解决方案(非常类似于@Mark Byers的版本)。让我们看看使用C扩展模块可以做的更好。这里是_find_nthmodule.c

#include <Python.h>
#include <string.h>

off_t _find_nth(const char *buf, size_t l, char c, int n) {
    off_t i;
    for (i = 0; i < l; ++i) {
        if (buf[i] == c && n-- == 0) {
            return i;
        }
    }
    return -1;
}

off_t _find_nth2(const char *buf, size_t l, char c, int n) {
    const char *b = buf - 1;
    do {
        b = memchr(b + 1, c, l);
        if (!b) return -1;
    } while (n--);
    return b - buf;
}

/* mmap_object is private in mmapmodule.c - replicate beginning here */
typedef struct {
    PyObject_HEAD
    char *data;
    size_t size;
} mmap_object;

typedef struct {
    const char *s;
    size_t l;
    char c;
    int n;
} params;

int parse_args(PyObject *args, params *P) {
    PyObject *obj;
    const char *x;

    if (!PyArg_ParseTuple(args, "Osi", &obj, &x, &P->n)) {
        return 1;
    }
    PyTypeObject *type = Py_TYPE(obj);

    if (type == &PyString_Type) {
        P->s = PyString_AS_STRING(obj);
        P->l = PyString_GET_SIZE(obj);
    } else if (!strcmp(type->tp_name, "mmap.mmap")) {
        mmap_object *m_obj = (mmap_object*) obj;
        P->s = m_obj->data;
        P->l = m_obj->size;
    } else {
        PyErr_SetString(PyExc_TypeError, "Cannot obtain char * from argument 0");
        return 1;
    }
    P->c = x[0];
    return 0;
}

static PyObject* py_find_nth(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyObject* py_find_nth2(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth2(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyMethodDef methods[] = {
    {"find_nth", py_find_nth, METH_VARARGS, ""},
    {"find_nth2", py_find_nth2, METH_VARARGS, ""},
    {0}
};

PyMODINIT_FUNC init_find_nth(void) {
    Py_InitModule("_find_nth", methods);
}

这是setup.py文件:

from distutils.core import setup, Extension
module = Extension('_find_nth', sources=['_find_nthmodule.c'])
setup(ext_modules=[module])

像往常一样安装python setup.py install。C代码在这里发挥了优势,因为它仅限于查找单个字符,但是让我们看一下它有多快:

In [8]: %timeit _find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 218 ms per loop

In [9]: %timeit _find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 216 ms per loop

In [10]: %timeit _find_nth.find_nth2(mm, '\n', 1000000)
1 loops, best of 3: 307 ms per loop

In [11]: %timeit _find_nth.find_nth2(s, '\n', 1000000)
1 loops, best of 3: 304 ms per loop

显然还快很多。有趣的是,内存中情况和映射情况之间的C级别没有差异。有趣的是_find_nth2(),它基于string.hmemchr()库函数,相对于以下简单的实现方式有所失落_find_nth():额外的“优化” memchr()显然是后退式的…

总而言之,findnth()(基于str.split())中的实现确实是一个坏主意,因为(a)由于需要进行复制,因此它对于较大的字符串表现出极大的性能,(b)根本不适用于mmap.mmap对象。在find_nth()(基于str.find())中的实现在所有情况下都应优先考虑(因此是该问题的公认答案)。

还有很大的改进空间,因为C扩展比纯Python代码快将近4倍,这表明可能存在专用Python库函数的情况。

I’m offering some benchmarking results comparing the most prominent approaches presented so far, namely @bobince’s findnth() (based on str.split()) vs. @tgamblin’s or @Mark Byers’ find_nth() (based on str.find()). I will also compare with a C extension (_find_nth.so) to see how fast we can go. Here is find_nth.py:

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

def find_nth(s, x, n=0, overlap=False):
    l = 1 if overlap else len(x)
    i = -l
    for c in xrange(n + 1):
        i = s.find(x, i + l)
        if i < 0:
            break
    return i

Of course, performance matters most if the string is large, so suppose we want to find the 1000001st newline (‘\n’) in a 1.3 GB file called ‘bigfile’. To save memory, we would like to work on an mmap.mmap object representation of the file:

In [1]: import _find_nth, find_nth, mmap

In [2]: f = open('bigfile', 'r')

In [3]: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

There is already the first problem with findnth(), since mmap.mmap objects don’t support split(). So we actually have to copy the whole file into memory:

In [4]: %time s = mm[:]
CPU times: user 813 ms, sys: 3.25 s, total: 4.06 s
Wall time: 17.7 s

Ouch! Fortunately s still fits in the 4 GB of memory of my Macbook Air, so let’s benchmark findnth():

In [5]: %timeit find_nth.findnth(s, '\n', 1000000)
1 loops, best of 3: 29.9 s per loop

Clearly a terrible performance. Let’s see how the approach based on str.find() does:

In [6]: %timeit find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 774 ms per loop

Much better! Clearly, findnth()‘s problem is that it is forced to copy the string during split(), which is already the second time we copied the 1.3 GB of data around after s = mm[:]. Here comes in the second advantage of find_nth(): We can use it on mm directly, such that zero copies of the file are required:

In [7]: %timeit find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 1.21 s per loop

There appears to be a small performance penalty operating on mm vs. s, but this illustrates that find_nth() can get us an answer in 1.2 s compared to findnth‘s total of 47 s.

I found no cases where the str.find() based approach was significantly worse than the str.split() based approach, so at this point, I would argue that @tgamblin’s or @Mark Byers’ answer should be accepted instead of @bobince’s.

In my testing, the version of find_nth() above was the fastest pure Python solution I could come up with (very similar to @Mark Byers’ version). Let’s see how much better we can do with a C extension module. Here is _find_nthmodule.c:

#include <Python.h>
#include <string.h>

off_t _find_nth(const char *buf, size_t l, char c, int n) {
    off_t i;
    for (i = 0; i < l; ++i) {
        if (buf[i] == c && n-- == 0) {
            return i;
        }
    }
    return -1;
}

off_t _find_nth2(const char *buf, size_t l, char c, int n) {
    const char *b = buf - 1;
    do {
        b = memchr(b + 1, c, l);
        if (!b) return -1;
    } while (n--);
    return b - buf;
}

/* mmap_object is private in mmapmodule.c - replicate beginning here */
typedef struct {
    PyObject_HEAD
    char *data;
    size_t size;
} mmap_object;

typedef struct {
    const char *s;
    size_t l;
    char c;
    int n;
} params;

int parse_args(PyObject *args, params *P) {
    PyObject *obj;
    const char *x;

    if (!PyArg_ParseTuple(args, "Osi", &obj, &x, &P->n)) {
        return 1;
    }
    PyTypeObject *type = Py_TYPE(obj);

    if (type == &PyString_Type) {
        P->s = PyString_AS_STRING(obj);
        P->l = PyString_GET_SIZE(obj);
    } else if (!strcmp(type->tp_name, "mmap.mmap")) {
        mmap_object *m_obj = (mmap_object*) obj;
        P->s = m_obj->data;
        P->l = m_obj->size;
    } else {
        PyErr_SetString(PyExc_TypeError, "Cannot obtain char * from argument 0");
        return 1;
    }
    P->c = x[0];
    return 0;
}

static PyObject* py_find_nth(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyObject* py_find_nth2(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth2(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyMethodDef methods[] = {
    {"find_nth", py_find_nth, METH_VARARGS, ""},
    {"find_nth2", py_find_nth2, METH_VARARGS, ""},
    {0}
};

PyMODINIT_FUNC init_find_nth(void) {
    Py_InitModule("_find_nth", methods);
}

Here is the setup.py file:

from distutils.core import setup, Extension
module = Extension('_find_nth', sources=['_find_nthmodule.c'])
setup(ext_modules=[module])

Install as usual with python setup.py install. The C code plays at an advantage here since it is limited to finding single characters, but let’s see how fast this is:

In [8]: %timeit _find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 218 ms per loop

In [9]: %timeit _find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 216 ms per loop

In [10]: %timeit _find_nth.find_nth2(mm, '\n', 1000000)
1 loops, best of 3: 307 ms per loop

In [11]: %timeit _find_nth.find_nth2(s, '\n', 1000000)
1 loops, best of 3: 304 ms per loop

Clearly quite a bit faster still. Interestingly, there is no difference on the C level between the in-memory and mmapped cases. It is also interesting to see that _find_nth2(), which is based on string.h‘s memchr() library function, loses out against the straightforward implementation in _find_nth(): The additional “optimizations” in memchr() are apparently backfiring…

In conclusion, the implementation in findnth() (based on str.split()) is really a bad idea, since (a) it performs terribly for larger strings due to the required copying, and (b) it doesn’t work on mmap.mmap objects at all. The implementation in find_nth() (based on str.find()) should be preferred in all circumstances (and therefore be the accepted answer to this question).

There is still quite a bit of room for improvement, since the C extension ran almost a factor of 4 faster than the pure Python code, indicating that there might be a case for a dedicated Python library function.


回答 5

最简单的方法?

text = "This is a test from a test ok" 

firstTest = text.find('test')

print text.find('test', firstTest + 1)

Simplest way?

text = "This is a test from a test ok" 

firstTest = text.find('test')

print text.find('test', firstTest + 1)

回答 6

我可能会使用带有索引参数的find函数来做这样的事情:

def find_nth(s, x, n):
    i = -1
    for _ in range(n):
        i = s.find(x, i + len(x))
        if i == -1:
            break
    return i

print find_nth('bananabanana', 'an', 3)

我猜这不是特别的Pythonic,但是很简单。您可以使用递归来代替:

def find_nth(s, x, n, i = 0):
    i = s.find(x, i)
    if n == 1 or i == -1:
        return i 
    else:
        return find_nth(s, x, n - 1, i + len(x))

print find_nth('bananabanana', 'an', 3)

这是解决该问题的一种实用方法,但是我不知道这是否使其更具有Python风格。

I’d probably do something like this, using the find function that takes an index parameter:

def find_nth(s, x, n):
    i = -1
    for _ in range(n):
        i = s.find(x, i + len(x))
        if i == -1:
            break
    return i

print find_nth('bananabanana', 'an', 3)

It’s not particularly Pythonic I guess, but it’s simple. You could do it using recursion instead:

def find_nth(s, x, n, i = 0):
    i = s.find(x, i)
    if n == 1 or i == -1:
        return i 
    else:
        return find_nth(s, x, n - 1, i + len(x))

print find_nth('bananabanana', 'an', 3)

It’s a functional way to solve it, but I don’t know if that makes it more Pythonic.


回答 7

这将为您提供与匹配的起始索引数组yourstring

import re
indices = [s.start() for s in re.finditer(':', yourstring)]

那么您的第n个条目将是:

n = 2
nth_entry = indices[n-1]

当然,您必须小心索引范围。您可以获得这样的实例数yourstring

num_instances = len(indices)

This will give you an array of the starting indices for matches to yourstring:

import re
indices = [s.start() for s in re.finditer(':', yourstring)]

Then your nth entry would be:

n = 2
nth_entry = indices[n-1]

Of course you have to be careful with the index bounds. You can get the number of instances of yourstring like this:

num_instances = len(indices)

回答 8

这是使用re.finditer的另一种方法。
所不同的是,这只会尽可能地调查大海捞针

from re import finditer
from itertools import dropwhile
needle='an'
haystack='bananabanana'
n=2
next(dropwhile(lambda x: x[0]<n, enumerate(re.finditer(needle,haystack))))[1].start() 

Here is another approach using re.finditer.
The difference is that this only looks into the haystack as far as necessary

from re import finditer
from itertools import dropwhile
needle='an'
haystack='bananabanana'
n=2
next(dropwhile(lambda x: x[0]<n, enumerate(re.finditer(needle,haystack))))[1].start() 

回答 9

这是搜索a 或a 时应该工作的另一个re+ itertools版本。我会自由地承认这可能是过度设计的,但是出于某种原因,它使我感到很开心。strRegexpObject

import itertools
import re

def find_nth(haystack, needle, n = 1):
    """
    Find the starting index of the nth occurrence of ``needle`` in \
    ``haystack``.

    If ``needle`` is a ``str``, this will perform an exact substring
    match; if it is a ``RegexpObject``, this will perform a regex
    search.

    If ``needle`` doesn't appear in ``haystack``, return ``-1``. If
    ``needle`` doesn't appear in ``haystack`` ``n`` times,
    return ``-1``.

    Arguments
    ---------
    * ``needle`` the substring (or a ``RegexpObject``) to find
    * ``haystack`` is a ``str``
    * an ``int`` indicating which occurrence to find; defaults to ``1``

    >>> find_nth("foo", "o", 1)
    1
    >>> find_nth("foo", "o", 2)
    2
    >>> find_nth("foo", "o", 3)
    -1
    >>> find_nth("foo", "b")
    -1
    >>> import re
    >>> either_o = re.compile("[oO]")
    >>> find_nth("foo", either_o, 1)
    1
    >>> find_nth("FOO", either_o, 1)
    1
    """
    if (hasattr(needle, 'finditer')):
        matches = needle.finditer(haystack)
    else:
        matches = re.finditer(re.escape(needle), haystack)
    start_here = itertools.dropwhile(lambda x: x[0] < n, enumerate(matches, 1))
    try:
        return next(start_here)[1].start()
    except StopIteration:
        return -1

Here’s another re + itertools version that should work when searching for either a str or a RegexpObject. I will freely admit that this is likely over-engineered, but for some reason it entertained me.

import itertools
import re

def find_nth(haystack, needle, n = 1):
    """
    Find the starting index of the nth occurrence of ``needle`` in \
    ``haystack``.

    If ``needle`` is a ``str``, this will perform an exact substring
    match; if it is a ``RegexpObject``, this will perform a regex
    search.

    If ``needle`` doesn't appear in ``haystack``, return ``-1``. If
    ``needle`` doesn't appear in ``haystack`` ``n`` times,
    return ``-1``.

    Arguments
    ---------
    * ``needle`` the substring (or a ``RegexpObject``) to find
    * ``haystack`` is a ``str``
    * an ``int`` indicating which occurrence to find; defaults to ``1``

    >>> find_nth("foo", "o", 1)
    1
    >>> find_nth("foo", "o", 2)
    2
    >>> find_nth("foo", "o", 3)
    -1
    >>> find_nth("foo", "b")
    -1
    >>> import re
    >>> either_o = re.compile("[oO]")
    >>> find_nth("foo", either_o, 1)
    1
    >>> find_nth("FOO", either_o, 1)
    1
    """
    if (hasattr(needle, 'finditer')):
        matches = needle.finditer(haystack)
    else:
        matches = re.finditer(re.escape(needle), haystack)
    start_here = itertools.dropwhile(lambda x: x[0] < n, enumerate(matches, 1))
    try:
        return next(start_here)[1].start()
    except StopIteration:
        return -1

回答 10

基于modle13的答案,但没有re模块依赖性。

def iter_find(haystack, needle):
    return [i for i in range(0, len(haystack)) if haystack[i:].startswith(needle)]

我有点希望这是一个内置的字符串方法。

>>> iter_find("http://stackoverflow.com/questions/1883980/", '/')
[5, 6, 24, 34, 42]

Building on modle13‘s answer, but without the re module dependency.

def iter_find(haystack, needle):
    return [i for i in range(0, len(haystack)) if haystack[i:].startswith(needle)]

I kinda wish this was a builtin string method.

>>> iter_find("http://stackoverflow.com/questions/1883980/", '/')
[5, 6, 24, 34, 42]

回答 11

>>> s="abcdefabcdefababcdef"
>>> j=0
>>> for n,i in enumerate(s):
...   if s[n:n+2] =="ab":
...     print n,i
...     j=j+1
...     if j==2: print "2nd occurence at index position: ",n
...
0 a
6 a
2nd occurence at index position:  6
12 a
14 a
>>> s="abcdefabcdefababcdef"
>>> j=0
>>> for n,i in enumerate(s):
...   if s[n:n+2] =="ab":
...     print n,i
...     j=j+1
...     if j==2: print "2nd occurence at index position: ",n
...
0 a
6 a
2nd occurence at index position:  6
12 a
14 a

回答 12

提供另一个使用“ split和”的“棘手”解决方案join

在您的示例中,我们可以使用

len("substring".join([s for s in ori.split("substring")[:2]]))

Providing another “tricky” solution, which use split and join.

In your example, we can use

len("substring".join([s for s in ori.split("substring")[:2]]))

回答 13

# return -1 if nth substr (0-indexed) d.n.e, else return index
def find_nth(s, substr, n):
    i = 0
    while n >= 0:
        n -= 1
        i = s.find(substr, i + 1)
    return i
# return -1 if nth substr (0-indexed) d.n.e, else return index
def find_nth(s, substr, n):
    i = 0
    while n >= 0:
        n -= 1
        i = s.find(substr, i + 1)
    return i

回答 14

不使用循环和递归的解决方案。

在编译方法中使用所需的模式,然后在变量‘n’中输入所需的出现位置,最后一条语句将在给定的字符串中打印该模式的第n个出现位置的起始索引。在这里,finditer的结果(即迭代器)将转换为list并直接访问第n个索引。

import re
n=2
sampleString="this is history"
pattern=re.compile("is")
matches=pattern.finditer(sampleString)
print(list(matches)[n].span()[0])

Solution without using loops and recursion.

Use the required pattern in compile method and enter the desired occurrence in variable ‘n’ and the last statement will print the starting index of the nth occurrence of the pattern in the given string. Here the result of finditer i.e. iterator is being converted to list and directly accessing the nth index.

import re
n=2
sampleString="this is history"
pattern=re.compile("is")
matches=pattern.finditer(sampleString)
print(list(matches)[n].span()[0])

回答 15

替换一根衬管很棒,但只能工作,因为XX和bar具有相同的长度

一个好的和一般的定义是:

def findN(s,sub,N,replaceString="XXX"):
    return s.replace(sub,replaceString,N-1).find(sub) - (len(replaceString)-len(sub))*(N-1)

The replace one liner is great but only works because XX and bar have the same lentgh

A good and general def would be:

def findN(s,sub,N,replaceString="XXX"):
    return s.replace(sub,replaceString,N-1).find(sub) - (len(replaceString)-len(sub))*(N-1)

回答 16

这是您真正想要的答案:

def Find(String,ToFind,Occurence = 1):
index = 0 
count = 0
while index <= len(String):
    try:
        if String[index:index + len(ToFind)] == ToFind:
            count += 1
        if count == Occurence:
               return index
               break
        index += 1
    except IndexError:
        return False
        break
return False

This is the answer you really want:

def Find(String,ToFind,Occurence = 1):
index = 0 
count = 0
while index <= len(String):
    try:
        if String[index:index + len(ToFind)] == ToFind:
            count += 1
        if count == Occurence:
               return index
               break
        index += 1
    except IndexError:
        return False
        break
return False

回答 17

这是我找到ninth出现b在字符串中的解决方案a

from functools import reduce


def findNth(a, b, n):
    return reduce(lambda x, y: -1 if y > x + 1 else a.find(b, x + 1), range(n), -1)

它是纯Python并且是迭代的。对于0或n太大,它将返回-1。它是单线的,可以直接使用。这是一个例子:

>>> reduce(lambda x, y: -1 if y > x + 1 else 'bibarbobaobaotang'.find('b', x + 1), range(4), -1)
7

Here is my solution for finding nth occurrance of b in string a:

from functools import reduce


def findNth(a, b, n):
    return reduce(lambda x, y: -1 if y > x + 1 else a.find(b, x + 1), range(n), -1)

It is pure Python and iterative. For 0 or n that is too large, it returns -1. It is one-liner and can be used directly. Here is an example:

>>> reduce(lambda x, y: -1 if y > x + 1 else 'bibarbobaobaotang'.find('b', x + 1), range(4), -1)
7

回答 18

对于搜索字符的第n个出现(即长度为1的子字符串)的特殊情况,以下功能通过构建给定字符出现的所有位置的列表来起作用:

def find_char_nth(string, char, n):
    """Find the n'th occurence of a character within a string."""
    return [i for i, c in enumerate(string) if c == char][n-1]

如果少于n给定字符的出现次数,它将给出IndexError: list index out of range

这是从@Z​​v_oDD的答案派生而来的,对于单个字符而言,它得到了简化。

For the special case where you search for the n’th occurence of a character (i.e. substring of length 1), the following function works by building a list of all positions of occurences of the given character:

def find_char_nth(string, char, n):
    """Find the n'th occurence of a character within a string."""
    return [i for i, c in enumerate(string) if c == char][n-1]

If there are fewer than n occurences of the given character, it will give IndexError: list index out of range.

This is derived from @Zv_oDD’s answer and simplified for the case of a single character.


回答 19

Def:

def get_first_N_words(mytext, mylen = 3):
    mylist = list(mytext.split())
    if len(mylist)>=mylen: return ' '.join(mylist[:mylen])

使用方法:

get_first_N_words('  One Two Three Four ' , 3)

输出:

'One Two Three'

Def:

def get_first_N_words(mytext, mylen = 3):
    mylist = list(mytext.split())
    if len(mylist)>=mylen: return ' '.join(mylist[:mylen])

To use:

get_first_N_words('  One Two Three Four ' , 3)

Output:

'One Two Three'

回答 20

怎么样:

c = os.getcwd().split('\\')
print '\\'.join(c[0:-2])

How about:

c = os.getcwd().split('\\')
print '\\'.join(c[0:-2])

快速计数正整数中的非零位的方法

问题:快速计数正整数中的非零位的方法

我需要一种快速的方法来计算python中整数的位数。我当前的解决方案是

bin(n).count("1")

但我想知道是否有更快的方法?

PS :(我将一个大型2D二进制数组表示为一个数字列表并进行按位运算,这将时间从几小时缩短为几分钟。现在,我想摆脱那些多余的分钟。

编辑:1.它必须在python 2.7或2.6中

对小数字进行优化并不重要,因为这并不是一个明显的瓶颈,但是我确实在某些地方有1万+位的数字

例如,这是一个2000位的情况:

12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L

I need a fast way to count the number of bits in an integer in python. My current solution is

bin(n).count("1")

but I am wondering if there is any faster way of doing this?

PS: (i am representing a big 2D binary array as a single list of numbers and doing bitwise operations, and that brings the time down from hours to minutes. and now I would like to get rid of those extra minutes.

Edit: 1. it has to be in python 2.7 or 2.6

and optimizing for small numbers does not matter that much since that would not be a clear bottleneck, but I do have numbers with 10 000 + bits at some places

for example this is a 2000 bit case:

12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L

回答 0

对于任意长度的整数,这bin(n).count("1")是我在纯Python中可以找到的最快的速度。

我尝试修改Óscar和Adam的解决方案以分别处理64位和32位块中的整数。两者都比至少慢了十倍bin(n).count("1")(32位版本花了大约一半的时间)。

另一方面,gmpy popcount()大约花费了时间的1/20 bin(n).count("1")。因此,如果可以安装gmpy,请使用它。

为了回答注释中的问题,对于字节,我将使用查找表。您可以在运行时生成它:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

或者只是按字面意思定义:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

然后counts[x]得到0≤x≤255处的1位数目x

For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar’s and Adam’s solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I’d use a lookup table. You can generate it at runtime:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

Or just define it literally:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

Then it’s counts[x] to get the number of 1 bits in x where 0 ≤ x ≤ 255.


回答 1

您可以调整以下算法:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

这适用于64位正数,但是它很容易扩展,并且运算数量随参数的对数增长(即与参数的位大小成线性关系)。

为了了解其工作原理,可以想象将整个64位字符串分成64个1位存储桶。每个存储区的值等于存储区中设置的位数(如果未设置,则为0,如果设置为1,则为1)。第一次转换会产生类似状态,但有32个存储桶,每个存储桶2位长。这可以通过适当地移动存储桶并添加其值来实现(一次加法处理所有存储桶,因为在存储桶之间不会发生进位-n位数字始终足够长以对数字n进行编码)。进一步的转换导致状态的桶数呈指数级下降,而大小呈指数增长,直到我们得到一个64位长的桶。这给出了原始参数中设置的位数。

You can adapt the following algorithm:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

This works for 64-bit positive numbers, but it’s easily extendable and the number of operations growth with the logarithm of the argument (i.e. linearly with the bit-size of the argument).

In order to understand how this works imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket’s value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets each 2-bit long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets since no carry can occur across buckets – n-bit number is always long enough to encode number n). Further transformations lead to states with exponentially decreasing number of buckets of exponentially growing size until we arrive at one 64-bit long bucket. This gives the number of bits set in the original argument.


回答 2

这里有一个Python实现人口数的算法,如在此解释

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

它适用于0 <= i < 0x100000000

Here’s a Python implementation of the population count algorithm, as explained in this post:

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

It will work for 0 <= i < 0x100000000.


回答 3

根据这篇文章,这似乎是汉明重量最快的实现之一(如果您不介意使用大约64KB的内存)。

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

在Python 2.x上,您应该替换rangexrange

编辑

如果您需要更好的性能(并且数字是大整数),请查看GMP库。它包含用于许多不同体系结构的手写程序集实现。

gmpy 是包装GMP库的C编码Python扩展模块。

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024

According to this post, this seems to be one the fastest implementation of the Hamming weight (if you don’t mind using about 64KB of memory).

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

On Python 2.x you should replace range with xrange.

Edit

If you need better performance (and your numbers are big integers), have a look at the GMP library. It contains hand-written assembly implementations for many different architectures.

gmpy is A C-coded Python extension module that wraps the GMP library.

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024

回答 4

我真的很喜欢这种方法。它简单而快速,但由于python具有无限整数,因此位长不受限制。

实际上,它比看上去要狡猾得多,因为它避免了浪费时间扫描零。例如,与1111中一样,将需要花费相同的时间来计算1000000000000000000000010100000001中的设置位。

def get_bit_count(value):
   n = 0
   while value:
      n += 1
      value &= value-1
   return n

I really like this method. Its simple and pretty fast but also not limited in the bit length since python has infinite integers.

It’s actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

def get_bit_count(value):
   n = 0
   while value:
      n += 1
      value &= value-1
   return n

回答 5

您可以使用该算法获取整数的二进制字符串[1],而不是将字符串连接起来,而是计算一个数字:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation

You can use the algorithm to get the binary string [1] of an integer but instead of concatenating the string, counting the number of ones:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation


回答 6

你说Numpy太慢了。您是否使用它来存储单个位?为什么不扩展使用int作为位数组,而是使用Numpy来存储这些位的想法呢?

将n位存储为ceil(n/32.)32位int 数组。然后,您可以使用int的方式(很好,非常相似)使用numpy数组,包括使用它们为另一个数组建立索引。

该算法基本上是并行计算每个单元中设置的位数,并且它们求和每个单元的位数。

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

尽管我很惊讶,但没有人建议您编写C模块。

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and them sum up the bitcount of each cell.

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

Though I’m surprised no one suggested you write a C module.


回答 7

#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)
#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)


回答 8

事实证明,您的起始表示形式是一个整数列表,该整数列表为1或0。只需在该表示形式中对其进行计数即可。


整数中的位数在python中是恒定的。

但是,如果要计算设置的位数,最快的方法是创建符合以下伪代码的列表: [numberofsetbits(n) for n in range(MAXINT)]

生成列表后,这将为您提供恒定时间的查找。有关此的良好实现,请参见@PaoloMoretti的答案。当然,您不必将所有内容都保留在内存中-您可以使用某种持久键值存储,甚至MySql。(另一种选择是实现您自己的基于磁盘的简单存储)。

It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.


The number of bits in an integer is constant in python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: [numberofsetbits(n) for n in range(MAXINT)]

This will provide you a constant time lookup after you have generated the list. See @PaoloMoretti’s answer for a good implementation of this. Of course, you don’t have to keep this all in memory – you could use some sort of persistent key-value store, or even MySql. (Another option would be to implement your own simple disk-based storage).


如何使用Python3读写INI文件?

问题:如何使用Python3读写INI文件?

我需要读取,写入和创建一个INI使用Python3文件。

文件文件

default_path = "/path/name/"
default_file = "file.txt"

Python档案:

#    Read file and and create if it not exists
config = iniFile( 'FILE.INI' )

#    Get "default_path"
config.default_path

#    Print (string)/path/name
print config.default_path

#    Create or Update
config.append( 'default_path', 'var/shared/' )
config.append( 'default_message', 'Hey! help me!!' )

更新的 FILE.INI

default_path    = "var/shared/"
default_file    = "file.txt"
default_message = "Hey! help me!!"

I need to read, write and create an INI file with Python3.

FILE.INI

default_path = "/path/name/"
default_file = "file.txt"

Python File:

#    Read file and and create if it not exists
config = iniFile( 'FILE.INI' )

#    Get "default_path"
config.default_path

#    Print (string)/path/name
print config.default_path

#    Create or Update
config.append( 'default_path', 'var/shared/' )
config.append( 'default_message', 'Hey! help me!!' )

UPDATED FILE.INI

default_path    = "var/shared/"
default_file    = "file.txt"
default_message = "Hey! help me!!"

回答 0

可以从以下开始:

import configparser

config = configparser.ConfigParser()
config.read('FILE.INI')
print(config['DEFAULT']['path'])     # -> "/path/name/"
config['DEFAULT']['path'] = '/var/shared/'    # update
config['DEFAULT']['default_message'] = 'Hey! help me!!'   # create

with open('FILE.INI', 'w') as configfile:    # save
    config.write(configfile)

您可以在官方configparser文档中找到更多信息

This can be something to start with:

import configparser

config = configparser.ConfigParser()
config.read('FILE.INI')
print(config['DEFAULT']['path'])     # -> "/path/name/"
config['DEFAULT']['path'] = '/var/shared/'    # update
config['DEFAULT']['default_message'] = 'Hey! help me!!'   # create

with open('FILE.INI', 'w') as configfile:    # save
    config.write(configfile)

You can find more at the official configparser documentation.


回答 1

这是一个完整的读取,更新和写入示例。

输入文件test.ini

[section_a]
string_val = hello
bool_val = false
int_val = 11
pi_val = 3.14

工作代码。

try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser  # ver. < 3.0

# instantiate
config = ConfigParser()

# parse existing file
config.read('test.ini')

# read values from a section
string_val = config.get('section_a', 'string_val')
bool_val = config.getboolean('section_a', 'bool_val')
int_val = config.getint('section_a', 'int_val')
float_val = config.getfloat('section_a', 'pi_val')

# update existing value
config.set('section_a', 'string_val', 'world')

# add a new section and some values
config.add_section('section_b')
config.set('section_b', 'meal_val', 'spam')
config.set('section_b', 'not_found_val', '404')

# save to a file
with open('test_update.ini', 'w') as configfile:
    config.write(configfile)

输出文件test_update.ini

[section_a]
string_val = world
bool_val = false
int_val = 11
pi_val = 3.14

[section_b]
meal_val = spam
not_found_val = 404

原始输入文件保持不变。

Here’s a complete read, update and write example.

Input file, test.ini

[section_a]
string_val = hello
bool_val = false
int_val = 11
pi_val = 3.14

Working code.

try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser  # ver. < 3.0

# instantiate
config = ConfigParser()

# parse existing file
config.read('test.ini')

# read values from a section
string_val = config.get('section_a', 'string_val')
bool_val = config.getboolean('section_a', 'bool_val')
int_val = config.getint('section_a', 'int_val')
float_val = config.getfloat('section_a', 'pi_val')

# update existing value
config.set('section_a', 'string_val', 'world')

# add a new section and some values
config.add_section('section_b')
config.set('section_b', 'meal_val', 'spam')
config.set('section_b', 'not_found_val', '404')

# save to a file
with open('test_update.ini', 'w') as configfile:
    config.write(configfile)

Output file, test_update.ini

[section_a]
string_val = world
bool_val = false
int_val = 11
pi_val = 3.14

[section_b]
meal_val = spam
not_found_val = 404

The original input file remains untouched.


回答 2

http://docs.python.org/library/configparser.html

在这种情况下,Python的标准库可能会有所帮助。

http://docs.python.org/library/configparser.html

Python’s standard library might be helpful in this case.


回答 3

该标准ConfigParser通常需要通过进行访问config['section_name']['key'],这很无聊。稍加修改即可实现属性访问:

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

AttrDict是派生自该类的类,该类dict允许同时通过字典键和属性访问:a.x is a['x']

我们可以在以下类中使用此类ConfigParser

config = configparser.ConfigParser(dict_type=AttrDict)
config.read('application.ini')

现在我们得到application.ini

[general]
key = value

>>> config._sections.general.key
'value'

The standard ConfigParser normally requires access via config['section_name']['key'], which is no fun. A little modification can deliver attribute access:

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

AttrDict is a class derived from dict which allows access via both dictionary keys and attribute access: that means a.x is a['x']

We can use this class in ConfigParser:

config = configparser.ConfigParser(dict_type=AttrDict)
config.read('application.ini')

and now we get application.ini with:

[general]
key = value

as

>>> config._sections.general.key
'value'

回答 4

ConfigObj是ConfigParser的很好替代品,它提供了更多的灵活性:

  • 嵌套的节(子节),任意级别
  • 列出值
  • 多行值
  • 字符串插值(替代)
  • 与功能强大的验证系统集成,包括自动类型检查/转换重复部分并允许使用默认值
  • 写出配置文件时,ConfigObj保留所有注释以及成员和节的顺序
  • 使用配置文件的许多有用方法和选项(例如“ reload”方法)
  • 完全支持Unicode

它有一些缺点:

  • 您不能设置定界符,它必须是=…(拉取请求
  • 您不能有空值,但是可以,但是它们看起来很喜欢:fuabr =而不是fubar看起来怪异和错误的值。

ConfigObj is a good alternative to ConfigParser which offers a lot more flexibility:

  • Nested sections (subsections), to any level
  • List values
  • Multiple line values
  • String interpolation (substitution)
  • Integrated with a powerful validation system including automatic type checking/conversion repeated sections and allowing default values
  • When writing out config files, ConfigObj preserves all comments and the order of members and sections
  • Many useful methods and options for working with configuration files (like the ‘reload’ method)
  • Full Unicode support

It has some draw backs:

  • You cannot set the delimiter, it has to be =… (pull request)
  • You cannot have empty values, well you can but they look liked: fuabr = instead of just fubar which looks weird and wrong.

回答 5

我的backup_settings.ini文件中的内容

[Settings]
year = 2020

用于阅读的python代码

import configparser
config = configparser.ConfigParser()
config.read('backup_settings.ini') #path of your .ini file
year = config.get("Settings","year") 
print(year)

用于编写或更新

from pathlib import Path
import configparser
myfile = Path('backup_settings.ini')  #Path of your .ini file
config.read(myfile)
config.set('Settings', 'year','2050') #Updating existing entry 
config.set('Settings', 'day','sunday') #Writing new entry
config.write(myfile.open("w"))

输出

[Settings]
year = 2050
day = sunday

contents in my backup_settings.ini file

[Settings]
year = 2020

python code for reading

import configparser
config = configparser.ConfigParser()
config.read('backup_settings.ini') #path of your .ini file
year = config.get("Settings","year") 
print(year)

for writing or updating

from pathlib import Path
import configparser
myfile = Path('backup_settings.ini')  #Path of your .ini file
config.read(myfile)
config.set('Settings', 'year','2050') #Updating existing entry 
config.set('Settings', 'day','sunday') #Writing new entry
config.write(myfile.open("w"))

output

[Settings]
year = 2050
day = sunday

忽略带有str.contains的NaN

问题:忽略带有str.contains的NaN

我想查找包含字符串的行,如下所示:

DF[DF.col.str.contains("foo")]

但是,这失败了,因为某些元素是NaN:

ValueError:无法使用包含NA / NaN值的向量建立索引

所以我诉诸于混乱

DF[DF.col.notnull()][DF.col.dropna().str.contains("foo")]

有没有更好的办法?

I want to find rows that contain a string, like so:

DF[DF.col.str.contains("foo")]

However, this fails because some elements are NaN:

ValueError: cannot index with vector containing NA / NaN values

So I resort to the obfuscated

DF[DF.col.notnull()][DF.col.dropna().str.contains("foo")]

Is there a better way?


回答 0

有一个标志:

In [11]: df = pd.DataFrame([["foo1"], ["foo2"], ["bar"], [np.nan]], columns=['a'])

In [12]: df.a.str.contains("foo")
Out[12]:
0     True
1     True
2    False
3      NaN
Name: a, dtype: object

In [13]: df.a.str.contains("foo", na=False)
Out[13]:
0     True
1     True
2    False
3    False
Name: a, dtype: bool

参见str.replace文档:

na:默认的NaN,填充缺失值的值。


因此,您可以执行以下操作:

In [21]: df.loc[df.a.str.contains("foo", na=False)]
Out[21]:
      a
0  foo1
1  foo2

There’s a flag for that:

In [11]: df = pd.DataFrame([["foo1"], ["foo2"], ["bar"], [np.nan]], columns=['a'])

In [12]: df.a.str.contains("foo")
Out[12]:
0     True
1     True
2    False
3      NaN
Name: a, dtype: object

In [13]: df.a.str.contains("foo", na=False)
Out[13]:
0     True
1     True
2    False
3    False
Name: a, dtype: bool

See the str.replace docs:

na : default NaN, fill value for missing values.


So you can do the following:

In [21]: df.loc[df.a.str.contains("foo", na=False)]
Out[21]:
      a
0  foo1
1  foo2

回答 1

除了上述答案,我想说的是对于没有单个单词名称的列,您可以使用:

df[df['Product ID'].str.contains("foo") == True]

希望这可以帮助。

In addition to the above answers, I would say for columns having no single word name, you may use:-

df[df['Product ID'].str.contains("foo") == True]

Hope this helps.


回答 2

我不是100%的理由(实际上是来这里寻找答案的),但是这也行得通,并且不需要替换所有的nan值。

import pandas as pd
import numpy as np

df = pd.DataFrame([["foo1"], ["foo2"], ["bar"], [np.nan]], columns=['a'])

newdf = df.loc[df['a'].str.contains('foo') == True]

可以使用或不使用.loc

我不知道为什么会这样,据我了解,当您使用方括号索引时,pandas会将方括号内的内容评估为TrueFalse。我不知道为什么在方括号“ extra boolean”中添加该短语完全没有效果。

I’m not 100% on why (actually came here to search for the answer), but this also works, and doesn’t require replacing all nan values.

import pandas as pd
import numpy as np

df = pd.DataFrame([["foo1"], ["foo2"], ["bar"], [np.nan]], columns=['a'])

newdf = df.loc[df['a'].str.contains('foo') == True]

Works with or without .loc.

I have no idea why this works, as I understand it when you’re indexing with brackets pandas evaluates whatever’s inside the bracket as either True or False. I can’t tell why making the phrase inside the brackets ‘extra boolean’ has any effect at all.


回答 3

您也可以设置:

DF[DF.col.str.contains(pat = '(foo)', regex = True) ]

You can also patern :

DF[DF.col.str.contains(pat = '(foo)', regex = True) ]

回答 4

import folium
import pandas

data= pandas.read_csv("maps.txt")

lat = list(data["latitude"])
lon = list(data["longitude"])

map= folium.Map(location=[31.5204, 74.3587], zoom_start=6, tiles="Mapbox Bright")

fg = folium.FeatureGroup(name="My Map")

for lt, ln in zip(lat, lon):
c1 = fg.add_child(folium.Marker(location=[lt, ln], popup="Hi i am a Country",icon=folium.Icon(color='green')))

child = fg.add_child(folium.Marker(location=[31.5204, 74.5387], popup="Welcome to Lahore", icon= folium.Icon(color='green')))

map.add_child(fg)

map.save("Lahore.html")


Traceback (most recent call last):
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python36-32\check2.py", line 14, in <module>
    c1 = fg.add_child(folium.Marker(location=[lt, ln], popup="Hi i am a Country",icon=folium.Icon(color='green')))
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\folium\map.py", line 647, in __init__
    self.location = _validate_coordinates(location)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\folium\utilities.py", line 48, in _validate_coordinates
    'got:\n{!r}'.format(coordinates))
ValueError: Location values cannot contain NaNs, got:
[nan, nan]
import folium
import pandas

data= pandas.read_csv("maps.txt")

lat = list(data["latitude"])
lon = list(data["longitude"])

map= folium.Map(location=[31.5204, 74.3587], zoom_start=6, tiles="Mapbox Bright")

fg = folium.FeatureGroup(name="My Map")

for lt, ln in zip(lat, lon):
c1 = fg.add_child(folium.Marker(location=[lt, ln], popup="Hi i am a Country",icon=folium.Icon(color='green')))

child = fg.add_child(folium.Marker(location=[31.5204, 74.5387], popup="Welcome to Lahore", icon= folium.Icon(color='green')))

map.add_child(fg)

map.save("Lahore.html")


Traceback (most recent call last):
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python36-32\check2.py", line 14, in <module>
    c1 = fg.add_child(folium.Marker(location=[lt, ln], popup="Hi i am a Country",icon=folium.Icon(color='green')))
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\folium\map.py", line 647, in __init__
    self.location = _validate_coordinates(location)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\folium\utilities.py", line 48, in _validate_coordinates
    'got:\n{!r}'.format(coordinates))
ValueError: Location values cannot contain NaNs, got:
[nan, nan]

有趣好用的Python教程

退出移动版
微信支付
请使用 微信 扫码支付