C-like structures in Python

Question: C-like structures in Python

Is there a way to conveniently define a C-like structure in Python? I’m tired of writing stuff like:

class MyStruct():
    def __init__(self, field1, field2, field3):
        self.field1 = field1
        self.field2 = field2
        self.field3 = field3

Answer 0

Use a named tuple, which was added to the collections module in the standard library in Python 2.6. It’s also possible to use Raymond Hettinger’s named tuple recipe if you need to support Python 2.4.

It’s nice for your basic example, but also covers a bunch of edge cases you might run into later. Your fragment above would be written as:

from collections import namedtuple
MyStruct = namedtuple("MyStruct", "field1 field2 field3")

The newly created type can be used like this:

m = MyStruct("foo", "bar", "baz")

You can also use named arguments:

m = MyStruct(field1="foo", field2="bar", field3="baz")
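
As a small sketch of what the generated type offers beyond construction (the field values are only illustrative): fields can be read by name or by index, and since instances are immutable like any tuple, _replace returns a modified copy instead of changing the original.

```python
from collections import namedtuple

MyStruct = namedtuple("MyStruct", "field1 field2 field3")
m = MyStruct("foo", "bar", "baz")

# Fields can be read by name or by position.
first_by_name = m.field1
first_by_index = m[0]

# Instances are immutable; _replace builds a new instance.
m2 = m._replace(field2="changed")

# _asdict converts the record to a dictionary.
as_dict = m._asdict()
```

The original m is left untouched by _replace, which is the main behavioural difference from a C struct.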

Answer 1

Update: Data Classes

With the introduction of Data Classes in Python 3.7 we get very close.

The following example is similar to the NamedTuple example below, but the resulting object is mutable and it allows for default values.

from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0


p = Point(1.5, 2.5)

print(p)  # Point(x=1.5, y=2.5, z=0.0)

This plays nicely with the new typing module in case you want to use more specific type annotations.

I’ve been waiting desperately for this! If you ask me, Data Classes and the new NamedTuple declaration, combined with the typing module, are a godsend!
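
A short sketch of the mutability and typing points made above, using made-up class names (Polyline, FrozenPoint). It assumes Python 3.7 or later. A mutable default such as a list has to go through field(default_factory=...), and frozen=True is available if you want immutability after all.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List


@dataclass
class Polyline:
    name: str
    # Mutable defaults need default_factory so instances do not share a list.
    points: List[float] = field(default_factory=list)


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float


line = Polyline("a")
line.points.append(1.0)   # fields are mutable by default
other = Polyline("b")     # gets its own empty list

fp = FrozenPoint(1.0, 2.0)
try:
    fp.x = 5.0            # frozen=True rejects assignment
    frozen_error = False
except FrozenInstanceError:
    frozen_error = True
```

Data classes also generate __eq__ by default, so two instances with equal fields compare equal.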

Improved NamedTuple declaration

Since Python 3.6 this has become quite simple and beautiful (IMHO), as long as you can live with immutability.

A new way of declaring NamedTuples was introduced, which allows for type annotations as well:

from typing import NamedTuple


class User(NamedTuple):
    name: str


class MyStruct(NamedTuple):
    foo: str
    bar: int
    baz: list
    qux: User


my_item = MyStruct('foo', 0, ['baz'], User('peter'))

print(my_item) # MyStruct(foo='foo', bar=0, baz=['baz'], qux=User(name='peter'))
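
A small sketch of the immutability trade-off, using a made-up Pixel type. It assumes Python 3.6.1 or later, where fields declared in this class syntax may also carry default values.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    x: int
    y: int
    color: str = "black"   # fields with defaults must come last


p = Pixel(1, 2)
x, y, color = p            # still an ordinary tuple underneath

try:
    p.x = 10               # assignment is rejected
    assignment_failed = False
except AttributeError:
    assignment_failed = True
```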

Answer 2

You can use a tuple for a lot of things where you would use a struct in C (something like x,y coordinates or RGB colors for example).

For everything else you can use a dictionary, or a utility class like this one:

>>> class Bunch:
...     def __init__(self, **kwds):
...         self.__dict__.update(kwds)
...
>>> mystruct = Bunch(field1=value1, field2=value2)

I think the “definitive” discussion is here, in the published version of the Python Cookbook.


Answer 3

Perhaps you are looking for Structs without constructors:

class Sample:
  name = ''
  average = 0.0
  values = None  # do not create the list here: a class-level list would be shared by all instances


s1 = Sample()
s1.name = "sample 1"
s1.values = []
s1.values.append(1)
s1.values.append(2)
s1.values.append(3)

s2 = Sample()
s2.name = "sample 2"
s2.values = []
s2.values.append(4)

for v in s1.values:   # prints 1,2,3 --> OK.
  print(v)
print("***")
for v in s2.values:   # prints 4 --> OK.
  print(v)
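
The reason values is left as None on the class and assigned per instance can be shown with a minimal sketch (the class names Shared and PerInstance are made up for the comparison). A list created in the class body is a single object that every instance sees.

```python
class Shared:
    values = []            # one list object, stored on the class


class PerInstance:
    def __init__(self):
        self.values = []   # a fresh list for every instance


a, b = Shared(), Shared()
a.values.append(1)         # mutates the class-level list
shared_seen_by_b = list(b.values)

c, d = PerInstance(), PerInstance()
c.values.append(1)
separate_seen_by_d = list(d.values)
```

Immutable class-level defaults such as '' or 0.0 do not have this problem, because assigning to them on an instance simply creates an instance attribute.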

Answer 4

How about a dictionary?

Something like this:

myStruct = {'field1': 'some val', 'field2': 'some val'}

Then you can use this to manipulate values:

print(myStruct['field1'])
myStruct['field2'] = 'some other values'

And the values don’t have to be strings. They can be pretty much any other object.


Answer 5

dF: that’s pretty cool… I didn’t know that I could access the fields in a class using dict.

Mark: the situations that I wish I had this are precisely when I want a tuple but nothing as “heavy” as a dictionary.

You can access the fields of a class using a dictionary because the fields of a class, its methods and all its properties are stored internally using dicts (at least in CPython).

…Which leads us to your second comment. Believing that Python dicts are “heavy” is an extremely non-pythonistic concept. And reading such comments kills my Python Zen. That’s not good.

You see, when you declare a class you are actually creating a pretty complex wrapper around a dictionary – so, if anything, you are adding more overhead than by using a simple dictionary. An overhead which, by the way, is meaningless in any case. If you are working on performance critical applications, use C or something.
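
To illustrate the point about attributes living in a dictionary, here is a minimal sketch (the Record class is made up for this example). It relies on CPython's default behaviour for classes that do not define __slots__.

```python
class Record:
    def __init__(self, field1, field2):
        self.field1 = field1
        self.field2 = field2


r = Record("a", 1)

# vars(r) returns the instance's attribute dictionary (r.__dict__).
attrs = vars(r)
snapshot = dict(attrs)

# Writing through the dictionary is visible as a normal attribute.
attrs["field3"] = 3.0
```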


Answer 6

You can subclass the C structure that is available in the standard library. The ctypes module provides a Structure class. The example from the docs:

>>> from ctypes import *
>>> class POINT(Structure):
...     _fields_ = [("x", c_int),
...                 ("y", c_int)]
...
>>> point = POINT(10, 20)
>>> print point.x, point.y
10 20
>>> point = POINT(y=5)
>>> print point.x, point.y
0 5
>>> POINT(1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: too many initializers
>>>
>>> class RECT(Structure):
...     _fields_ = [("upperleft", POINT),
...                 ("lowerright", POINT)]
...
>>> rc = RECT(point)
>>> print rc.upperleft.x, rc.upperleft.y
0 5
>>> print rc.lowerright.x, rc.lowerright.y
0 0
>>>
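
The example from the docs uses the Python 2 print statement. A minimal Python 3 sketch of the same two structures follows; the extra lines at the end are an addition showing that the fixed C layout makes the size and raw bytes available.

```python
from ctypes import Structure, c_int, sizeof


class POINT(Structure):
    _fields_ = [("x", c_int), ("y", c_int)]


class RECT(Structure):
    _fields_ = [("upperleft", POINT), ("lowerright", POINT)]


point = POINT(y=5)              # unspecified fields are zero-initialised
rc = RECT(point)                # lowerright is left zero-initialised

print(point.x, point.y)
print(rc.upperleft.x, rc.upperleft.y)

# The object has a fixed C layout, so its size and raw bytes are available.
point_size = sizeof(POINT)
raw = bytes(point)
```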

Answer 7

I would also like to add a solution that uses slots:

class Point:
    __slots__ = ["x", "y"]
    def __init__(self, x, y):
        self.x = x
        self.y = y

Definitely check the documentation for slots, but a quick explanation is that slots are Python’s way of saying: “If you can lock these attributes, and only these attributes, into the class, so that you commit to not adding any new attributes once the class is instantiated (yes, you can normally add new attributes to a class instance, see the example below), then I will do away with the large memory allocation that allows for adding new attributes to an instance and use just what I need for these slotted attributes.”

Example of adding attributes to class instance (thus not using slots):

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(3,5)
p1.z = 8
print(p1.z)

Output: 8

Example of trying to add attributes to class instance where slots was used:

class Point:
    __slots__ = ["x", "y"]
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(3,5)
p1.z = 8

Output: AttributeError: 'Point' object has no attribute 'z'

This can effectively work as a struct, and instances use less memory than those of a regular class (like a struct would, although I have not researched exactly how much). Slots are recommended if you will be creating a large number of instances and do not need to add attributes. A point object is a good example, since you are likely to instantiate many points to describe a dataset.
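
A minimal sketch of where the saving comes from, with hypothetical class names: a regular instance carries its own attribute dictionary, while a slotted instance does not have one at all. It does not measure the actual memory difference.

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(3, 5)
slotted = SlottedPoint(3, 5)

# A regular instance carries a per-instance dictionary; a slotted one does not.
plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")
```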


Answer 8

You can also pass the init parameters to the instance variables by position:

# Abstract struct class       
class Struct:
    def __init__ (self, *argv, **argd):
        if len(argd):
            # Update by dictionary
            self.__dict__.update (argd)
        else:
            # Update by position
            attrs = [name for name in dir(self) if not name.startswith("__")]
            for n in range(len(argv)):
                setattr(self, attrs[n], argv[n])

# Specific class
class Point3dStruct (Struct):
    x = 0
    y = 0
    z = 0

pt1 = Point3dStruct()
pt1.x = 10

print(pt1.x)
print("-"*10)

pt2 = Point3dStruct(5, 6)

print(pt2.x, pt2.y)
print("-"*10)

pt3 = Point3dStruct (x=1, y=2, z=3)
print(pt3.x, pt3.y, pt3.z)
print("-"*10)
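
One caveat with the positional branch: dir() returns names in alphabetical order, so positions map to x, y, z here only because those names happen to sort in declaration order. A hedged sketch of a variant that makes the order explicit follows; the _fields attribute is an assumption introduced for this example, not part of the answer's code.

```python
class Struct:
    _fields = ()               # subclasses list field names in positional order

    def __init__(self, *args, **kwargs):
        if len(args) > len(self._fields):
            raise TypeError("too many positional arguments")
        # Positional values are matched to names in declaration order.
        for name, value in zip(self._fields, args):
            setattr(self, name, value)
        for name, value in kwargs.items():
            if name not in self._fields:
                raise TypeError("unknown field: %s" % name)
            setattr(self, name, value)


class Point3dStruct(Struct):
    _fields = ("x", "y", "z")
    x = 0                      # class attributes act as defaults
    y = 0
    z = 0


pt2 = Point3dStruct(5, 6)
pt3 = Point3dStruct(1, z=3)
```

Unlike the original, this version also lets positional and keyword arguments be mixed in one call.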

Answer 9

Whenever I need an “instant data object that also behaves like a dictionary” (I don’t think of C structs!), I think of this cute hack:

class Map(dict):
    def __init__(self, **kwargs):
        super(Map, self).__init__(**kwargs)
        self.__dict__ = self

Now you can just say:

struct = Map(field1='foo', field2='bar', field3=42)

self.assertEqual('bar', struct.field2)
self.assertEqual(42, struct['field3'])

Perfectly handy for those times when you need a “data bag that’s NOT a class”, and for when namedtuples are incomprehensible…


Answer 10

You can access a C-style struct in Python in the following way.

class cstruct:
    var_i = 0
    var_f = 0.0
    var_str = ""

If you just want to use an object of cstruct:

obj = cstruct()
obj.var_i = 50
obj.var_f = 50.00
obj.var_str = "fifty"
print("cstruct: obj i=%d f=%f s=%s" % (obj.var_i, obj.var_f, obj.var_str))

If you want to create an array of cstruct objects:

obj_array = [cstruct() for i in range(10)]
obj_array[0].var_i = 10
obj_array[0].var_f = 10.00
obj_array[0].var_str = "ten"

# go ahead and fill in the rest of the struct instances in the array

# print all the values
for i in range(10):
    print("cstruct: obj_array i=%d f=%f s=%s" % (obj_array[i].var_i, obj_array[i].var_f, obj_array[i].var_str))

Note: use your own struct name instead of ‘cstruct’, and define your structure’s own member variables instead of var_i, var_f, var_str.


Answer 11

Some of the answers here are massively elaborate. The simplest option I’ve found is (from: http://norvig.com/python-iaq.html):

class Struct:
    "A structure that can have any fields defined."
    def __init__(self, **entries): self.__dict__.update(entries)

Initialising:

>>> options = Struct(answer=42, linelen=80, font='courier')
>>> options.answer
42

adding more:

>>> options.cat = "dog"
>>> options.cat
'dog'

Edit: Sorry, I didn’t see that this example already appears further down.


Answer 12

This might be a bit late but I made a solution using Python Meta-Classes (decorator version below too).

When __init__ is called during run time, it grabs each of the arguments and their value and assigns them as instance variables to your class. This way you can make a struct-like class without having to assign every value manually.

My example has no error checking so it is easier to follow.

class MyStruct(type):
    def __call__(cls, *args, **kwargs):
        names = cls.__init__.func_code.co_varnames[1:]

        self = type.__call__(cls, *args, **kwargs)

        for name, value in zip(names, args):
            setattr(self , name, value)

        for name, value in kwargs.iteritems():
            setattr(self , name, value)
        return self 

Here it is in action.

>>> class MyClass(object):
    __metaclass__ = MyStruct
    def __init__(self, a, b, c):
        pass


>>> my_instance = MyClass(1, 2, 3)
>>> my_instance.a
1
>>> 

I posted it on reddit and /u/matchu posted a decorator version which is cleaner. I’d encourage you to use it unless you want to expand the metaclass version.

>>> from functools import wraps
>>> def init_all_args(fn):
    @wraps(fn)
    def wrapped_init(self, *args, **kwargs):
        names = fn.func_code.co_varnames[1:]

        for name, value in zip(names, args):
            setattr(self, name, value)

        for name, value in kwargs.iteritems():
            setattr(self, name, value)

    return wrapped_init

>>> class Test(object):
    @init_all_args
    def __init__(self, a, b):
        pass


>>> a = Test(1, 2)
>>> a.a
1
>>> 
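
Both versions above rely on Python 2 names (func_code, iteritems, __metaclass__); in Python 3 these are __code__, items() and the metaclass= keyword. As a hedged sketch rather than the author's code, the decorator idea can be written for Python 3 with inspect.signature. That also picks up default values and avoids co_varnames, which lists local variables as well as parameters.

```python
import functools
import inspect


def init_all_args(fn):
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapped_init(self, *args, **kwargs):
        # Bind the call to the signature so defaults are filled in too.
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()
        for name, value in list(bound.arguments.items())[1:]:  # skip self
            setattr(self, name, value)
        return fn(self, *args, **kwargs)

    return wrapped_init


class Test:
    @init_all_args
    def __init__(self, a, b, c="default"):
        pass


t = Test(1, b=2)
```

Binding against the signature has the side benefit that a call with missing or unexpected arguments raises TypeError before any attribute is set.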

Answer 13

I wrote a decorator which you can use on any method to make it so that all of the arguments passed in, or any defaults, are assigned to the instance.

def argumentsToAttributes(method):
    argumentNames = method.func_code.co_varnames[1:]

    # Generate a dictionary of default values:
    defaultsDict = {}
    defaults = method.func_defaults if method.func_defaults else ()
    for i, default in enumerate(defaults, start = len(argumentNames) - len(defaults)):
        defaultsDict[argumentNames[i]] = default

    def newMethod(self, *args, **kwargs):
        # Use the positional arguments.
        for name, value in zip(argumentNames, args):
            setattr(self, name, value)

        # Add the key word arguments. If anything is missing, use the default.
        for name in argumentNames[len(args):]:
            setattr(self, name, kwargs.get(name, defaultsDict[name]))

        # Run whatever else the method needs to do.
        method(self, *args, **kwargs)

    return newMethod

A quick demonstration. Note that I use a positional argument a, use the default value for b, and a named argument c. I then print all 3 referencing self, to show that they’ve been properly assigned before the method is entered.

class A(object):
    @argumentsToAttributes
    def __init__(self, a, b = 'Invisible', c = 'Hello'):
        print(self.a)
        print(self.b)
        print(self.c)

A('Why', c = 'Nothing')

Note that my decorator should work with any method, not just __init__.


Answer 14

I don’t see this answer here, so I figure I’ll add it since I’m learning Python right now and just discovered it. The Python tutorial (Python 2 in this case) gives the following simple and effective example:

class Employee:
    pass

john = Employee()  # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000

That is, an empty class object is created, then instantiated, and the fields are added dynamically.

The upside to this is that it’s really simple. The downside is that it isn’t particularly self-documenting (the intended members aren’t listed anywhere in the class “definition”), and unset fields can cause problems when accessed. Those two problems can be solved by:

class Employee:
    def __init__ (self):
        self.name = None # or whatever
        self.dept = None
        self.salary = None

Now at a glance you can at least see what fields the program will be expecting.

Both are prone to typos: john.slarly = 1000 will succeed. Still, it works.
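
If the typo problem matters, one possible guard (an addition to this answer, borrowing the __slots__ idea from another answer) is to declare the allowed field names, so that assigning to a misspelled name raises an error instead of silently creating a new attribute. The keyword defaults in __init__ are an assumption made for this sketch.

```python
class Employee:
    __slots__ = ("name", "dept", "salary")

    def __init__(self, name=None, dept=None, salary=None):
        self.name = name
        self.dept = dept
        self.salary = salary


john = Employee("John Doe", "computer lab")

try:
    john.slarly = 1000      # misspelled field name
    typo_accepted = True
except AttributeError:
    typo_accepted = False
```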


Answer 15

Here is a solution which uses a class (never instantiated) to hold data. I like that this way involves very little typing and does not require any additional packages etc.

class myStruct:
    field1 = "one"
    field2 = "2"

You can add more fields later, as needed:

myStruct.field3 = 3

To get the values, the fields are accessed as usual:

>>> myStruct.field1
'one'

Answer 16

Personally, I like this variant too. It extends @dF’s answer.

class struct:
    def __init__(self, *sequential, **named):
        fields = dict(zip(sequential, [None]*len(sequential)), **named)
        self.__dict__.update(fields)
    def __repr__(self):
        return str(self.__dict__)

It supports two modes of initialization (that can be blended):

# Struct with field1, field2, field3 that are initialized to None.
mystruct1 = struct("field1", "field2", "field3") 
# Struct with field1, field2, field3 that are initialized according to arguments.
mystruct2 = struct(field1=1, field2=2, field3=3)

Also, it prints nicer:

print(mystruct2)
# Prints: {'field3': 3, 'field1': 1, 'field2': 2}

Answer 17

The following solution for a struct is inspired by the namedtuple implementation and some of the previous answers. However, unlike a namedtuple, it is mutable in its values, but like a C-style struct it is immutable in its names/attributes, which a normal class or dict isn’t.

_class_template = """\
class {typename}:
    def __init__(self, *args, **kwargs):
        fields = {field_names!r}

        for x in fields:
            setattr(self, x, None)

        for name, value in zip(fields, args):
            setattr(self, name, value)

        for name, value in kwargs.items():
            setattr(self, name, value)

    def __repr__(self):
        return str(vars(self))

    def __setattr__(self, name, value):
        if name not in {field_names!r}:
            raise KeyError("invalid name: %s" % name)
        object.__setattr__(self, name, value)
"""

def struct(typename, field_names):

    class_definition = _class_template.format(
        typename = typename,
        field_names = field_names)

    namespace = dict(__name__='struct_%s' % typename)
    exec(class_definition, namespace)
    result = namespace[typename]
    result._source = class_definition

    return result

Usage:

Person = struct('Person', ['firstname','lastname'])
generic = Person()
michael = Person('Michael')
jones = Person(lastname = 'Jones')


In [168]: michael.middlename = 'ben'
Traceback (most recent call last):

  File "<ipython-input-168-b31c393c0d67>", line 1, in <module>
michael.middlename = 'ben'

  File "<string>", line 19, in __setattr__

KeyError: 'invalid name: middlename'

Answer 18

There is a Python package exactly for this purpose: see cstruct2py.

cstruct2py is a pure Python library for generating Python classes from C code and using them to pack and unpack data. The library can parse C headers (struct, union, enum, and array declarations) and emulate them in Python. The generated Pythonic classes can parse and pack the data.

For example:

typedef struct {
  int x;
  int y;
} Point;

# after generating the Pythonic class...
p = Point(x=0x1234, y=0x5678)
p.packed == "\x34\x12\x00\x00\x78\x56\x00\x00"

How to use

First we need to generate the pythonic structs:

import cstruct2py
parser = cstruct2py.c2py.Parser()
parser.parse_file('examples/example.h')

Now we can import all names from the C code:

parser.update_globals(globals())

We can also do that directly:

A = parser.parse_string('struct A { int x; int y;};')

Using types and defines from the C code

a = A()
a.x = 45
print a
buf = a.packed
b = A(buf)
print b
c = A('aaaa11112222', 2)
print c
print repr(c)

The output will be:

{'x':0x2d, 'y':0x0}
{'x':0x2d, 'y':0x0}
{'x':0x31316161, 'y':0x32323131}
A('aa111122', x=0x31316161, y=0x32323131)

Clone

To clone cstruct2py, run:

git clone https://github.com/st0ky/cstruct2py.git --recursive
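
For comparison, the standard library's struct module can produce the same byte layout without parsing C headers. This is a sketch of a different technique, not of cstruct2py's API. It assumes a little-endian layout with 4-byte ints and no padding, which matches the packed bytes shown in the Point example above.

```python
import struct
from collections import namedtuple

Point = namedtuple("Point", "x y")

# "<ii" means little-endian, two 4-byte signed ints, no padding.
POINT_FORMAT = struct.Struct("<ii")

packed = POINT_FORMAT.pack(0x1234, 0x5678)
unpacked = Point._make(POINT_FORMAT.unpack(packed))
```

The format string has to be kept in sync with the C declaration by hand, which is the work a header parser saves you.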

Answer 19

I think Python’s dictionary structure is suitable for this requirement.

d = dict()
d['field1'] = field1
d['field2'] = field2
d['field3'] = field3

Answer 20

https://stackoverflow.com/a/32448434/159695 does not work in Python3.

https://stackoverflow.com/a/35993/159695 works in Python3.

I extended it to add default values.

class myStruct:
    def __init__(self, **kwds):
        self.x=0
        self.__dict__.update(kwds) # Must be last to accept assigned member variable.
    def __repr__(self):
        args = ['%s=%s' % (k, repr(v)) for (k,v) in vars(self).items()]
        return '%s(%s)' % ( self.__class__.__qualname__, ', '.join(args) )

a=myStruct()
b=myStruct(x=3,y='test')
c=myStruct(x='str')

>>> a
myStruct(x=0)
>>> b
myStruct(x=3, y='test')
>>> c
myStruct(x='str')

Answer 21

If you don’t have Python 3.7 for @dataclass and need mutability, the following code might work for you. It’s quite self-documenting and IDE-friendly (auto-complete), avoids writing things twice, is easily extendable, and makes it very simple to test that all instance variables are completely initialized:

class Params():
    def __init__(self):
        self.var1 : int = None
        self.var2 : str = None

    def are_all_defined(self):
        for key, value in self.__dict__.items():
            assert (value is not None), "instance variable {} is still None".format(key)
        return True


params = Params()
params.var1 = 2
params.var2 = 'hello'
assert params.are_all_defined()

Answer 22

Here is a quick and dirty trick:

>>> ms = Warning()
>>> ms.foo = 123
>>> ms.bar = 'akafrit'

How does it work? It simply reuses the builtin class Warning (derived from Exception) as if it were a class you had defined yourself.

The good points are that you do not need to import or define anything first, that "Warning" is a short name, and that it makes clear you are doing something dirty that should not be used anywhere other than a small script of your own.

By the way, I tried to find something even simpler, like ms = object(), but could not: attributes cannot be assigned on a plain object() instance. If you have one, I am interested.
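
One candidate for the "something even simpler" asked about here is types.SimpleNamespace from the standard library (Python 3.3 and later). The sketch below is not part of the original answer; it only shows that a SimpleNamespace accepts arbitrary attributes while a bare object() does not.

```python
from types import SimpleNamespace

# SimpleNamespace instances accept arbitrary attributes, like the Warning trick.
ms = SimpleNamespace(foo=123)
ms.bar = 'akafrit'

# A bare object() has no per-instance __dict__, so assignment fails.
try:
    object().foo = 123
    error = None
except AttributeError as exc:
    error = exc

print(ms.foo, ms.bar, type(error).__name__)
```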


Answer 23

The best way I found to do this was to use a custom dictionary class as explained in this post: https://stackoverflow.com/a/14620633/8484485

If IPython autocompletion support is needed, simply define the __dir__() method like this:

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self
    def __dir__(self):
        return self.keys()

You can then define your pseudo-struct like so (this one is nested):

my_struct = AttrDict({
    'com1': AttrDict({
        'inst': [0x05],
        'numbytes': 2,
        'canpayload': False,
        'payload': None
    })
})

You can then access the values inside my_struct like this:

print(my_struct.com1.inst)

=>[5]


Answer 24

NamedTuple is convenient, but no one has compared its performance and memory usage yet.

from typing import NamedTuple
import guppy  # pip install guppy3 (the Python 3 port; the module is still named guppy)
import timeit


class User:
    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserSlot:
    __slots__ = ('name', 'uid')

    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserTuple(NamedTuple):
    # __slots__ = ()  # AttributeError: Cannot overwrite NamedTuple attribute __slots__
    name: str
    uid: int


def get_fn(obj, attr_name: str):
    def get():
        getattr(obj, attr_name)
    return get


if 'memory test':
    obj = [User('Carson', 1) for _ in range(1000000)]      # Cumulative: 189138883
    obj_slot = [UserSlot('Carson', 1) for _ in range(1000000)]          # 77718299  <-- winner
    obj_namedtuple = [UserTuple('Carson', 1) for _ in range(1000000)]   # 85718297
    print(guppy.hpy().heap())  # Run this function individually. 
    """
    Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000    24 112000000 34 112000000  34 dict of __main__.User
     1 1000000    24 64000000  19 176000000  53 __main__.UserTuple
     2 1000000    24 56000000  17 232000000  70 __main__.User
     3 1000000    24 56000000  17 288000000  87 __main__.UserSlot
     ...
    """

if 'performance test':
    obj = User('Carson', 1)
    obj_slot = UserSlot('Carson', 1)
    obj_tuple = UserTuple('Carson', 1)

    time_normal = min(timeit.repeat(get_fn(obj, 'name'), repeat=20))
    print(time_normal)  # 0.12550550000000005

    time_slot = min(timeit.repeat(get_fn(obj_slot, 'name'), repeat=20))
    print(time_slot)  # 0.1368690000000008

    time_tuple = min(timeit.repeat(get_fn(obj_tuple, 'name'), repeat=20))
    print(time_tuple)  # 0.16006120000000124

    print(time_tuple/time_slot)  # 1.1694481584580898  # The slot is almost 17% faster than NamedTuple on Windows. (Python 3.7.7)

If you do not need __dict__, choose between __slots__ (better performance and lower memory usage) and NamedTuple (clearer to read and use).

You can review the linked question (Usage of __slots__) for more information on __slots__.
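
The trade-off behind __slots__ can be seen without any profiling tools. The following minimal sketch reuses the User and UserSlot class shapes from the benchmark above (the email attribute is made up for illustration) and shows that a slotted instance has no per-instance __dict__ and rejects attributes that are not listed in __slots__.

```python
class User:
    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


class UserSlot:
    __slots__ = ('name', 'uid')

    def __init__(self, name: str, uid: int):
        self.name = name
        self.uid = uid


plain = User('Carson', 1)
slotted = UserSlot('Carson', 1)

has_dict_plain = hasattr(plain, '__dict__')      # regular instances carry a dict
has_dict_slotted = hasattr(slotted, '__dict__')  # slotted instances do not

try:
    slotted.email = 'carson@example.com'  # hypothetical attribute, not in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(has_dict_plain, has_dict_slotted, rejected)
```

The missing per-instance dict is where the memory saving in the heap table comes from; the price is that new attributes can no longer be added on the fly.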


Select by partial string from a pandas DataFrame

Question: Select by partial string from a pandas DataFrame

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column?

In other words, a function or lambda function that would do something like

re.search(pattern, cell_in_question) 

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can’t seem to find a way to do the same with a partial string match say 'hello'.

Would someone be able to point me in the right direction?


Answer 0

Based on github issue #620, it looks like you’ll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.


Answer 1

I tried the proposed solution above:

df[df["A"].str.contains("Hello|Britain")]

and got an error:

ValueError: cannot mask with array containing NA / NaN values

You can convert NA values to False like this:

df[df["A"].str.contains("Hello|Britain", na=False)]

Answer 2

How do I select by partial string from a pandas DataFrame?

This post is meant for readers who want to

  • search for a substring in a string column (the simplest case)
  • search for multiple substrings (similar to isin)
  • match a whole word from text (e.g., “blue” should match “the sky is blue” but not “bluejay”)
  • match multiple whole words
  • Understand the reason behind “ValueError: cannot index with vector containing NA / NaN values”

…and would like to know more about what methods should be preferred over others.

(P.S.: I’ve seen a lot of questions on similar topics, I thought it would be good to leave this here.)


Basic Substring Search

# setup
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['foo', 'foobar', 'bar', 'baz']})
df1

      col
0     foo
1  foobar
2     bar
3     baz

str.contains can be used to perform either substring searches or regex based search. The search defaults to regex-based unless you explicitly disable it.

Here is an example of regex-based search,

# find rows in `df1` which contain "foo" followed by something
df1[df1['col'].str.contains(r'foo(?!$)')]

      col
1  foobar

Sometimes regex search is not required, so specify regex=False to disable it.

#select all rows containing "foo"
df1[df1['col'].str.contains('foo', regex=False)]
# same as df1[df1['col'].str.contains('foo')] but faster.

      col
0     foo
1  foobar

Performance wise, regex search is slower than substring search:

df2 = pd.concat([df1] * 1000, ignore_index=True)

%timeit df2[df2['col'].str.contains('foo')]
%timeit df2[df2['col'].str.contains('foo', regex=False)]

6.31 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.8 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Avoid using regex-based search if you don’t need it.

Addressing ValueErrors
Sometimes, performing a substring search and filtering on the result will result in

ValueError: cannot index with vector containing NA / NaN values

This is usually because of mixed data or NaNs in your object column,

s = pd.Series(['foo', 'foobar', np.nan, 'bar', 'baz', 123])
s.str.contains('foo|bar')

0     True
1     True
2      NaN
3     True
4    False
5      NaN
dtype: object


s[s.str.contains('foo|bar')]
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)

Anything that is not a string cannot have string methods applied on it, so the result is NaN (naturally). In this case, specify na=False to ignore non-string data,

s.str.contains('foo|bar', na=False)

0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

Multiple Substring Search

This is most easily achieved through a regex search using the regex OR pipe.

# Slightly modified example.
df4 = pd.DataFrame({'col': ['foo abc', 'foobar xyz', 'bar32', 'baz 45']})
df4

          col
0     foo abc
1  foobar xyz
2       bar32
3      baz 45

df4[df4['col'].str.contains(r'foo|baz')]

          col
0     foo abc
1  foobar xyz
3      baz 45

You can also create a list of terms, then join them:

terms = ['foo', 'baz']
df4[df4['col'].str.contains('|'.join(terms))]

          col
0     foo abc
1  foobar xyz
3      baz 45

Sometimes, it is wise to escape your terms in case they have characters that can be interpreted as regex metacharacters. If your terms contain any of the following characters…

. ^ $ * + ? { } [ ] \ | ( )

Then, you’ll need to use re.escape to escape them:

import re
df4[df4['col'].str.contains('|'.join(map(re.escape, terms)))]

          col
0     foo abc
1  foobar xyz
3      baz 45

re.escape has the effect of escaping the special characters so they’re treated literally.

re.escape(r'.foo^')
# '\\.foo\\^'
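
To make the effect of escaping concrete, here is a small standard-library sketch. The example terms 'a.b' and 'c+' are made up for illustration; the point is that the unescaped pattern matches text the literal terms do not contain.

```python
import re

# Hypothetical search terms that contain regex metacharacters.
terms = ['a.b', 'c+']

unescaped = '|'.join(terms)                # 'a.b|c+'
escaped = '|'.join(map(re.escape, terms))  # each term is matched literally

# Without escaping, '.' matches any character, so 'axb' is a (false) hit.
hit_unescaped = re.search(unescaped, 'axb') is not None
# With escaping, only the literal text 'a.b' or 'c+' matches.
hit_escaped = re.search(escaped, 'axb') is not None
hit_literal = re.search(escaped, 'see a.b here') is not None

print(hit_unescaped, hit_escaped, hit_literal)
```

The same joined pattern can be passed to str.contains exactly as in the example above.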

Matching Entire Word(s)

By default, the substring search searches for the specified substring/pattern regardless of whether it is full word or not. To only match full words, we will need to make use of regular expressions here—in particular, our pattern will need to specify word boundaries (\b).

For example,

df3 = pd.DataFrame({'col': ['the sky is blue', 'bluejay by the window']})
df3

                     col
0        the sky is blue
1  bluejay by the window

Now consider,

df3[df3['col'].str.contains('blue')]

                     col
0        the sky is blue
1  bluejay by the window

versus

df3[df3['col'].str.contains(r'\bblue\b')]

               col
0  the sky is blue

Multiple Whole Word Search

Similar to the above, except we add a word boundary (\b) to the joined pattern.

p = r'\b(?:{})\b'.format('|'.join(map(re.escape, terms)))
df4[df4['col'].str.contains(p)]

       col
0  foo abc
3   baz 45

Where p looks like this,

p
# '\\b(?:foo|baz)\\b'
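
The non-capturing group (?:...) in p matters because | has the lowest precedence in a regex. The standard-library sketch below (using the same terms list as above) compares the grouped pattern with a hypothetical ungrouped variant, which parses as "\bfoo" OR "baz\b" and therefore matches "foobar".

```python
import re

terms = ['foo', 'baz']
joined = '|'.join(map(re.escape, terms))

grouped = re.compile(r'\b(?:{})\b'.format(joined))  # \b(?:foo|baz)\b
ungrouped = re.compile(r'\b{}\b'.format(joined))    # \bfoo|baz\b

# 'foobar' starts with 'foo' but is not the whole word 'foo'.
grouped_hit = grouped.search('foobar xyz') is not None
ungrouped_hit = ungrouped.search('foobar xyz') is not None

print(grouped_hit, ungrouped_hit)
```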

A Great Alternative: Use List Comprehensions!

Because you can! And you should! They are usually a little bit faster than string methods, because string methods are hard to vectorise and usually have loopy implementations.

Instead of,

df1[df1['col'].str.contains('foo', regex=False)]

Use the in operator inside a list comprehension,

df1[['foo' in x for x in df1['col']]]

      col
0     foo
1  foobar

Instead of,

regex_pattern = r'foo(?!$)'
df1[df1['col'].str.contains(regex_pattern)]

Use re.compile (to cache your regex) + Pattern.search inside a list comprehension,

p = re.compile(regex_pattern)
df1[[bool(p.search(x)) for x in df1['col']]]

      col
1  foobar

If “col” has NaNs, then instead of

df1[df1['col'].str.contains(regex_pattern, na=False)]

Use,

def try_search(p, x):
    try:
        return bool(p.search(x))
    except TypeError:
        return False

p = re.compile(regex_pattern)
df1[[try_search(p, x) for x in df1['col']]]

      col
1  foobar

More Options for Partial String Matching: np.char.find, np.vectorize, DataFrame.query.

In addition to str.contains and list comprehensions, you can also use the following alternatives.

np.char.find
Supports substring searches (read: no regex) only.

df4[np.char.find(df4['col'].values.astype(str), 'foo') > -1]

          col
0     foo abc
1  foobar xyz

np.vectorize
This is a wrapper around a loop, but with less overhead than most pandas str methods.

f = np.vectorize(lambda haystack, needle: needle in haystack)
f(df1['col'], 'foo')
# array([ True,  True, False, False])

df1[f(df1['col'], 'foo')]

      col
0     foo
1  foobar

Regex solutions are also possible:

regex_pattern = r'foo(?!$)'
p = re.compile(regex_pattern)
f = np.vectorize(lambda x: pd.notna(x) and bool(p.search(x)))
df1[f(df1['col'])]

      col
1  foobar

DataFrame.query
Supports string methods through the python engine. This offers no visible performance benefits, but is nonetheless useful to know if you need to dynamically generate your queries.

df1.query('col.str.contains("foo")', engine='python')

      col
0     foo
1  foobar

More information on the query and eval family of methods can be found in "Dynamic Expression Evaluation in pandas using pd.eval()".


Recommended Usage Precedence

  1. (First) str.contains, for its simplicity and ease of handling NaNs and mixed data
  2. List comprehensions, for their performance (especially if your data is purely strings)
  3. np.vectorize
  4. (Last) df.query

Answer 3

If anyone wonders how to solve a related problem, "Select column by partial string":

Use:

df.filter(like='hello')  # select columns whose name contains the word hello

And to select rows by partial string matching, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)  

Answer 4

Quick note: if you want to do selection based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]
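
A possible variation, not from the original answer: a string index exposes the same .str accessor, so the helper column can be skipped. The DataFrame below is made-up sample data for illustration only.

```python
import pandas as pd

# Hypothetical sample data with string index labels.
df = pd.DataFrame(
    {'value': [1, 2, 3]},
    index=['Hello world', 'Great Britain', 'other'],
)

# Index.str.contains returns a boolean array aligned with the rows.
mask = df.index.str.contains('Hello|Britain')
selected = df[mask]

print(selected)
```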

Answer 5

Say you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in the apply to pass elements to the lambda function row by row, as opposed to column by column.


Answer 6

Here's what I ended up doing for partial string matches. If anyone has a more efficient way of doing this, please let me know.

import re

from pandas import DataFrame, concat


def stringSearchColumn_DataFrame(df, colName, regex):
    newdf = DataFrame()
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for idx, record in df[colName].items():
        if re.search(regex, record):
            # prepend every row whose value equals the matching record
            newdf = concat([df[df[colName] == record], newdf], ignore_index=True)

    return newdf

Answer 7

Using contains didn’t work well for my string with special characters. Find worked though.

df[df['A'].str.find("hello") != -1]

Answer 8

Earlier answers already accomplish what was asked, but I would like to show the most general way:

df.filter(regex=".*STRING_YOU_LOOK_FOR.*")

This way you get the column you are looking for, however its name is written.

(Obviously, you have to write the proper regex for each case.)


Answer 9

Maybe you want to search for some text in all columns of the Pandas dataframe, and not just in the subset of them. In this case, the following code will help.

df[df.apply(lambda row: row.astype(str).str.contains('String To Find').any(), axis=1)]

Warning. This method is relatively slow, albeit convenient.


Answer 10

Should you need to do a case insensitive search for a string in a pandas dataframe column:

df[df['A'].str.contains("hello", case=False)]

How to check if an object is a list or tuple (but not a string)?

Question: How to check if an object is a list or tuple (but not a string)?

This is what I normally do in order to ascertain that the input is a list/tuple – but not a str. Because many times I stumbled upon bugs where a function passes a str object by mistake, and the target function does for x in lst assuming that lst is actually a list or tuple.

assert isinstance(lst, (list, tuple))

My question is: is there a better way of achieving this?


Answer 0

In python 2 only (not python 3):

assert not isinstance(lst, basestring)

Is actually what you want, otherwise you’ll miss out on a lot of things which act like lists, but aren’t subclasses of list or tuple.


Answer 1

Remember that in Python we want to use “duck typing”. So, anything that acts like a list can be treated as a list. So, don’t check for the type of a list, just see if it acts like a list.

But strings act like a list too, and often that is not what we want. There are times when it is even a problem! So, check explicitly for a string, but then use duck typing.

Here is a function I wrote for fun. It is a special version of repr() that prints any sequence in angle brackets ('<', '>').

def srepr(arg):
    if isinstance(arg, basestring): # Python 3: isinstance(arg, str)
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError: # catch when for loop fails
        return repr(arg) # not a sequence so just return repr

This is clean and elegant, overall. But what’s that isinstance() check doing there? That’s kind of a hack. But it is essential.

This function calls itself recursively on anything that acts like a list. If we didn’t handle the string specially, then it would be treated like a list, and split up one character at a time. But then the recursive call would try to treat each character as a list — and it would work! Even a one-character string works as a list! The function would keep on calling itself recursively until stack overflow.

Functions like this one, that depend on each recursive call breaking down the work to be done, have to special-case strings–because you can’t break down a string below the level of a one-character string, and even a one-character string acts like a list.

Note: the try/except is the cleanest way to express our intentions. But if this code were somehow time-critical, we might want to replace it with some sort of test to see if arg is a sequence. Rather than testing the type, we should probably test behaviors. If it has a .strip() method, it’s a string, so don’t consider it a sequence; otherwise, if it is indexable or iterable, it’s a sequence:

def is_sequence(arg):
    # not a string, and either indexable or iterable
    return (not hasattr(arg, "strip") and
            (hasattr(arg, "__getitem__") or
             hasattr(arg, "__iter__")))

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

EDIT: I originally wrote the above with a check for __getslice__() but I noticed that in the collections module documentation, the interesting method is __getitem__(); this makes sense, that’s how you index an object. That seems more fundamental than __getslice__() so I changed the above.


Answer 2

H = "Hello"

if type(H) is list or type(H) is tuple:
    pass  # Do something.
else:
    pass  # Do something else.

Answer 3

For Python 3:

import collections.abc

if isinstance(obj, collections.abc.Sequence) and not isinstance(obj, str):
    print("obj is a sequence (list, tuple, etc) but not a string")

Changed in version 3.3: Moved Collections Abstract Base Classes to the collections.abc module. For backwards compatibility, they remained visible in the collections module as well, but those aliases were removed in Python 3.10.

For Python 2:

import collections

if isinstance(obj, collections.Sequence) and not isinstance(obj, basestring):
    print "obj is a sequence (list, tuple, etc) but not a string or unicode"
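
As a rough illustration of the Python 3 check, the sketch below wraps it in a helper. The name is_list_like is hypothetical, and excluding bytes and bytearray as well as str is an extra assumption beyond the answer, since both are also registered as Sequence.

```python
from collections.abc import Sequence


def is_list_like(obj):
    """Hypothetical helper: True for sequences that are not string-like."""
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))


results = {
    'list': is_list_like([1, 2]),
    'tuple': is_list_like((1,)),
    'range': is_list_like(range(3)),
    'str': is_list_like('abc'),
    'bytes': is_list_like(b'abc'),
    'set': is_list_like({1, 2}),
    'generator': is_list_like(x for x in [1]),
}
print(results)
```

Note that sets and generators are iterable but are not sequences, so this check rejects them; whether that is desirable depends on what the calling function needs.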

Answer 4

Python with PHP flavor:

def is_array(var):
    return isinstance(var, (list, tuple))

Answer 5

Generally speaking, the fact that a function which iterates over an object works on strings as well as tuples and lists is more feature than bug. You certainly can use isinstance or duck typing to check an argument, but why should you?

That sounds like a rhetorical question, but it isn’t. The answer to “why should I check the argument’s type?” is probably going to suggest a solution to the real problem, not the perceived problem. Why is it a bug when a string is passed to the function? Also: if it’s a bug when a string is passed to this function, is it also a bug if some other non-list/tuple iterable is passed to it? Why, or why not?

I think that the most common answer to the question is likely to be that developers who write f("abc") are expecting the function to behave as though they’d written f(["abc"]). There are probably circumstances where it makes more sense to protect developers from themselves than it does to support the use case of iterating across the characters in a string. But I’d think long and hard about it first.
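
If the real problem is that callers of f("abc") expect f(["abc"]), one option is to normalize the argument rather than reject it. This is only a sketch under that assumption; as_list and the body of f are hypothetical names and behaviour chosen for illustration.

```python
def as_list(value):
    """Hypothetical helper: wrap a lone string, otherwise materialize the iterable."""
    if isinstance(value, str):
        return [value]
    return list(value)


def f(items):
    # Stand-in for a function that iterates over its argument.
    return [item.upper() for item in as_list(items)]


single = f("abc")
several = f(["abc", "de"])
print(single, several)
```

This trades the ability to iterate over the characters of a string for protection against the common mistake, which is exactly the trade-off that deserves the long, hard thought.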


Answer 6

Try this for readability and best practices:

Python2

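import types
if isinstance(lst, types.ListType) or isinstance(lst, types.TupleType):
    pass  # Do something

Python3

import typing
# Note: isinstance(lst, (list, tuple)) is equivalent and avoids the deprecated typing aliases
if isinstance(lst, typing.List) or isinstance(lst, typing.Tuple):
    pass  # Do something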

Hope it helps.


Answer 7

In Python 2, the str object doesn't have an __iter__ attribute

>>> hasattr('', '__iter__')
False 

so you can do a check

assert hasattr(x, '__iter__')

and this will also raise a nice AssertionError for any other non-iterable object.

Edit: As Tim mentions in the comments, this only works in Python 2.x, not 3.x, where str does define __iter__.
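
For Python 3, a comparable check has to exclude strings explicitly. A sketch, assuming that str and bytes are the types to reject; is_nonstring_iterable is a hypothetical helper name:

```python
from collections.abc import Iterable


def is_nonstring_iterable(x):
    """True for iterables other than str and bytes."""
    return isinstance(x, Iterable) and not isinstance(x, (str, bytes))


print(hasattr("", "__iter__"))          # True on Python 3
print(is_nonstring_iterable("abc"))     # False
print(is_nonstring_iterable([1, 2]))    # True
print(is_nonstring_iterable(42))        # False
```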


Answer 8

This is not intended to directly answer the OP, but I wanted to share some related ideas.

I was very interested in @steveha’s answer above, which seemed to give an example where duck typing breaks. On second thought, however, his example suggests that duck typing is hard to conform to, but it does not suggest that str deserves any special handling.

After all, a non-str type (e.g., a user-defined type that maintains some complicated recursive structure) may also cause @steveha’s srepr function to recurse infinitely. While this is admittedly rather unlikely, we can’t ignore the possibility. Therefore, rather than special-casing str in srepr, we should clarify what we want srepr to do when infinite recursion results.

It may seem that one reasonable approach is to simply break the recursion in srepr the moment list(arg) == [arg]. This would, in fact, completely solve the problem with str, without any isinstance.

However, a really complicated recursive structure may cause an infinite loop where list(arg) == [arg] never happens. Therefore, while the above check is useful, it’s not sufficient. We need something like a hard limit on the recursion depth.

My point is that if you plan to handle arbitrary argument types, handling str via duck typing is far, far easier than handling the more general types you may (theoretically) encounter. So if you feel the need to exclude str instances, you should instead demand that the argument is an instance of one of the few types that you explicitly specify.


Answer 9

I found a function named is_sequence in TensorFlow that does this.

import collections
import six

def is_sequence(seq):
  """Returns True if its input is a collections.Sequence (except strings).
  Args:
    seq: an input sequence.
  Returns:
    True if the sequence is not a string and is a collections.Sequence.
  """
  # Note: on Python 3.10+ the alias collections.Sequence no longer exists;
  # the class lives at collections.abc.Sequence.
  return (isinstance(seq, collections.Sequence)
          and not isinstance(seq, six.string_types))

And I have verified that it meets your needs.
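
On current Python 3 the same idea works without six. A sketch under the assumption that only str needs to be excluded (which is what six.string_types contains on Python 3); this is an adaptation, not the TensorFlow source:

```python
from collections.abc import Sequence


def is_sequence(seq):
    """Return True if seq is a Sequence other than a string."""
    return isinstance(seq, Sequence) and not isinstance(seq, str)


print(is_sequence([1, 2]))      # True
print(is_sequence((1, 2)))      # True
print(is_sequence("abc"))       # False
print(is_sequence({1, 2}))      # False: a set is not a Sequence
```

Note that this also accepts other sequences such as range objects, which may or may not be what you want.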


Answer 10

I do this in my test cases.

def assertIsIterable(self, item):
    # Add types here that you don't want to mistake for iterables.
    if isinstance(item, basestring):  # Python 2; use str on Python 3
        raise AssertionError("type %s is not iterable" % type(item))

    # Fake an iteration.
    try:
        for x in item:
            break
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

Untested on generators. I think that if you pass in a generator, it is left suspended at its next ‘yield’, which may screw things up downstream. But then again, this is a unit test.
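
The generator concern is easy to check. A small sketch (the numbers generator is made up for illustration) showing that a loop-and-break probe consumes the first item:

```python
def numbers():
    yield 1
    yield 2
    yield 3


gen = numbers()

# Probe the generator the same way the check does: start a loop, then break.
for _ in gen:
    break

# The first item was consumed by the probe and is gone for later code.
remaining = list(gen)
print(remaining)    # [2, 3]
```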


Answer 11

In “duck typing” manner, how about

try:
    lst = lst + []
except TypeError:
    pass  # it's not a list

or

try:
    lst = lst + ()
except TypeError:
    pass  # it's not a tuple

respectively. This avoids the isinstance / hasattr introspection stuff.

You could also check the other way around:

try:
    lst = lst + ''
except TypeError:
    pass  # it's not a (base)string

All variants do not actually change the content of the variable, but imply a reassignment. I’m unsure whether this might be undesirable under some circumstances.

Interestingly, with the “in place” assignment += no TypeError would be raised in any case if lst is a list (not a tuple). That’s why the assignment is done this way. Maybe someone can shed light on why that is.
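
On that last question: for a list, += calls list.__iadd__, which behaves like list.extend and accepts any iterable, while + calls list.__add__, which only accepts another list. A short demonstration:

```python
lst = [1, 2]

# list + tuple is rejected: list.__add__ only accepts another list.
try:
    lst + (3,)
except TypeError as exc:
    print("TypeError:", exc)

# list += tuple works: list.__iadd__ extends the list in place from any iterable.
lst += (3,)
print(lst)      # [1, 2, 3]

# Even a string is accepted, one character at a time.
lst += "ab"
print(lst)      # [1, 2, 3, 'a', 'b']
```

So with += a list would silently swallow a tuple or a string, and the check would tell you nothing.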


Answer 12

The simplest way: use any and isinstance.

>>> console_routers = 'x'
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
False
>>>
>>> console_routers = ('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
>>> console_routers = list('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
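
As a side note, isinstance itself accepts a tuple of types, so the any([...]) wrapper is not strictly needed. A sketch reusing the variable name from the session above:

```python
console_routers = ("x",)

# isinstance returns True if the object is an instance of any type in the tuple.
print(isinstance(console_routers, (list, tuple)))   # True
print(isinstance(["x"], (list, tuple)))             # True
print(isinstance("x", (list, tuple)))               # False
```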

Answer 13

Another version of duck-typing to help distinguish string-like objects from other sequence-like objects.

The string representation of string-like objects is the string itself, so you can check if you get an equal object back from the str constructor:

# If a string was passed, convert it to a single-element sequence
if var == str(var):
    my_list = [var]

# All other iterables
else: 
    my_list = list(var)

This should work for all objects compatible with str and for all kinds of iterable objects.
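
Wrapped in a function, the check can be tried on a few inputs. A sketch; to_list is a hypothetical name, and note that on Python 3 a bytes object is not treated as string-like by this test:

```python
def to_list(var):
    # A string-like object compares equal to its own str() conversion.
    if var == str(var):
        return [var]
    # Everything else is assumed to be an iterable.
    return list(var)


print(to_list("abc"))          # ['abc']
print(to_list(("a", "b")))     # ['a', 'b']
print(to_list([1, 2]))         # [1, 2]
```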


Answer 14

Python 3 has this:

from typing import List

def isit(value):
    return isinstance(value, List)

isit([1, 2, 3])  # True
isit("test")  # False
isit({"Hello": "Mars"})  # False
isit((1, 2))  # False

So to check for both Lists and Tuples, it would be:

from typing import List, Tuple

def isit(value):
    return isinstance(value, List) or isinstance(value, Tuple)

Answer 15

assert (type(lst) == list) | (type(lst) == tuple), "Not a valid lst type, cannot be string"

Answer 16

Just do this

if type(lst) in (list, tuple):
    pass  # Do stuff
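
One thing to keep in mind with this form: comparing type(lst) directly only matches the exact types, so subclasses such as named tuples are rejected, whereas isinstance accepts them. A small sketch (the Point type is made up for illustration):

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(1, 2)

print(type(p) in (list, tuple))         # False: exact type match only
print(isinstance(p, (list, tuple)))     # True: subclasses are accepted
```

Which behaviour is right depends on whether subclasses should count as lists and tuples in your code.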

Answer 17

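In Python > 3.6:

import collections.abc
isinstance(set(),collections.abc.Container)
True
isinstance([],collections.abc.Container)
True
isinstance({},collections.abc.Container)
True
isinstance((),collections.abc.Container)
True
isinstance(str,collections.abc.Container)   # str here is the class itself, not a string
False
isinstance('',collections.abc.Container)    # an actual string is a Container too
True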

Answer 18

I tend to do this (if I really, really had to):

for i in some_var:
    if type(i) == list:
        pass  # do something with a list
    elif type(i) == tuple:
        pass  # do something with a tuple
    elif type(i) == str:
        pass  # here's your string

What exactly are iterator, iterable, and iteration?

Question: What exactly are iterator, iterable, and iteration?

What is the most basic definition of “iterable”, “iterator” and “iteration” in Python?

I have read multiple definitions but I am unable to identify the exact meaning as it still won’t sink in.

Can someone please help me with the 3 definitions in layman’s terms?


Answer 0

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.

In Python, iterable and iterator have specific meanings.

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.

A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.
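
To make the description above concrete, here is a rough sketch (Python 3) of what a for loop does with iter() and next(), followed by a class that is iterable through __getitem__ alone. The Squares class is made up for illustration:

```python
items = ["a", "b", "c"]

# Roughly what `for item in items: print(item)` does behind the scenes.
iterator = iter(items)          # ask the iterable for an iterator
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                   # the iterator is exhausted
    print(item)


class Squares:
    """Iterable through __getitem__ alone (no __iter__ defined)."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * index


print(list(Squares()))          # [0, 1, 4]
```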


Answer 1

Here’s the explanation I use in teaching Python classes:

An ITERABLE is:

  • anything that can be looped over (e.g. you can loop over a string or file) or
  • anything that can appear on the right-side of a for-loop: for x in iterable: ... or
  • anything you can call with iter() that will return an ITERATOR: iter(obj) or
  • an object that defines __iter__ that returns a fresh ITERATOR, or it may have a __getitem__ method suitable for indexed lookup.

An ITERATOR is an object:

  • with state that remembers where it is during iteration,
  • with a __next__ method that:
    • returns the next value in the iteration
    • updates the state to point at the next value
    • signals when it is done by raising StopIteration
  • and that is self-iterable (meaning that it has an __iter__ method that returns self).

Notes:

  • The __next__ method in Python 3 is spelt next in Python 2, and
  • The builtin function next() calls that method on the object passed to it.

For example:

>>> s = 'cat'      # s is an ITERABLE
                   # s is a str object that is immutable
                   # s has no state
                   # s has a __getitem__() method 

>>> t = iter(s)    # t is an ITERATOR
                   # t has state (it starts by pointing at the "c")
                   # t has a __next__() method and an __iter__() method

>>> next(t)        # the next() function returns the next value and advances the state
'c'
>>> next(t)        # the next() function returns the next value and advances
'a'
>>> next(t)        # the next() function returns the next value and advances
't'
>>> next(t)        # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration

>>> iter(t) is t   # the iterator is self-iterable
True
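
The same protocol can be implemented by hand. A minimal sketch of a user-defined ITERATOR in Python 3; the Countdown class is made up for illustration:

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start        # state: where we are in the iteration

    def __iter__(self):
        return self                 # an iterator is self-iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # signal that iteration is complete
        value = self.current
        self.current -= 1           # advance the state
        return value


c = Countdown(3)
print(iter(c) is c)     # True
print(list(c))          # [3, 2, 1]
print(list(c))          # [] (the iterator is exhausted)
```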

Answer 2

The above answers are great, but like most of what I’ve seen, they don’t stress the distinction enough for people like me.

Also, people tend to get “too Pythonic” by leading with definitions like “X is an object that has a __foo__() method”. Such definitions are correct (they are based on the duck-typing philosophy), but the focus on methods tends to get in the way when trying to understand the concept in its simplicity.

So I add my version.


In natural language,

  • iteration is the process of taking one element at a time in a row of elements.

In Python,

  • an iterable is an object that is, well, iterable, which, simply put, means that it can be used in iteration, e.g. with a for loop. How? By using an iterator. I’ll explain below.

  • … while an iterator is an object that defines how to actually do the iteration, specifically what the next element is. That’s why it must have a next() method.

Iterators are themselves also iterable, with the distinction that their __iter__() method returns the same object (self), regardless of whether or not its items have been consumed by previous calls to next().


So what does Python interpreter think when it sees for x in obj: statement?

Look, a for loop. Looks like a job for an iterator… Let’s get one. … There’s this obj guy, so let’s ask him.

“Mr. obj, do you have your iterator?” (… calls iter(obj), which calls obj.__iter__(), which happily hands out a shiny new iterator _i.)

OK, that was easy… Let’s start iterating then. (x = _i.next() … x = _i.next() …)

Since Mr. obj succeeded in this test (by having a certain method that returns a valid iterator), we reward him with an adjective: you can now call him “iterable Mr. obj”.

However, in simple cases, you don’t normally benefit from having iterator and iterable separately. So you define only one object, which is also its own iterator. (Python does not really care that _i handed out by obj wasn’t all that shiny, but just the obj itself.)

This is why in most examples I’ve seen (and what had been confusing me over and over), you can see:

class IterableExample(object):

    def __iter__(self):
        return self

    def next(self):
        pass

instead of

class Iterator(object):
    def next(self):
        pass

class Iterable(object):
    def __iter__(self):
        return Iterator()

There are cases, though, when you can benefit from having iterator separated from the iterable, such as when you want to have one row of items, but more “cursors”. For example when you want to work with “current” and “forthcoming” elements, you can have separate iterators for both. Or multiple threads pulling from a huge list: each can have its own iterator to traverse over all items. See @Raymond’s and @glglgl’s answers above.

Imagine what you could do:

class SmartIterableExample(object):

    def create_iterator(self):
        # An amazingly powerful yet simple way to create arbitrary
        # iterator, utilizing object state (or not, if you are fan
        # of functional), magic and nuclear waste--no kittens hurt.
        pass    # don't forget to add the next() method

    def __iter__(self):
        return self.create_iterator()

Notes:

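  • I’ll repeat again: an object that has only next() is not iterable, so it cannot be used as a “source” in a for loop. What the for loop primarily needs is __iter__() (that returns something with next()). This is why real iterators also define __iter__() returning self, as described above.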

  • Of course, for is not the only iteration loop, so above applies to some other constructs as well (while…).

  • Iterator’s next() can throw StopIteration to stop iteration. Does not have to, though, it can iterate forever or use other means.

  • In the above “thought process”, _i does not really exist. I’ve made up that name.

  • There’s a small change in Python 3.x: next() method (not the built-in) now must be called __next__(). Yes, it should have been like that all along.

  • You can also think of it like this: iterable has the data, iterator pulls the next item

Disclaimer: I’m not a developer of any Python interpreter, so I don’t really know what the interpreter “thinks”. The musings above are solely demonstration of how I understand the topic from other explanations, experiments and real-life experience of a Python newbie.
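
The “one row of items, several cursors” case can be sketched like this in Python 3 (so the method is called __next__). The Playlist and PlaylistCursor names are made up for illustration:

```python
class Playlist:
    """Iterable: owns the data and hands out fresh iterators."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __iter__(self):
        return PlaylistCursor(self.songs)


class PlaylistCursor:
    """Iterator: remembers one position in the playlist."""

    def __init__(self, songs):
        self.songs = songs
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.songs):
            raise StopIteration
        song = self.songs[self.position]
        self.position += 1
        return song


playlist = Playlist(["intro", "verse", "outro"])
current = iter(playlist)
upcoming = iter(playlist)
next(upcoming)                          # move the second cursor one step ahead

for now, after in zip(current, upcoming):
    print(now, "->", after)
# intro -> verse
# verse -> outro
```

Because each call to iter(playlist) returns a new cursor, the two positions advance independently over the same data.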


Answer 3

An iterable is an object which has an __iter__() method. It can possibly be iterated over several times, as is the case for lists and tuples.

An iterator is the object which iterates. It is returned by an __iter__() method, returns itself via its own __iter__() method and has a next() method (__next__() in 3.x).

Iteration is the process of calling this next() (or __next__()) method until it raises StopIteration.

Example:

>>> a = [1, 2, 3] # iterable
>>> b1 = iter(a) # iterator 1
>>> b2 = iter(a) # iterator 2, independent of b1
>>> next(b1)
1
>>> next(b1)
2
>>> next(b2) # start over, as it is the first call to b2
1
>>> next(b1)
3
>>> next(b1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> b1 = iter(a) # new one, start over
>>> next(b1)
1

Answer 4

Here’s my cheat sheet:

 sequence
  +
  |
  v
   def __getitem__(self, index: int):
  +    ...
  |    raise IndexError
  |
  |
  |              def __iter__(self):
  |             +     ...
  |             |     return <iterator>
  |             |
  |             |
  +--> or <-----+        def __next__(self):
       +        |       +    ...
       |        |       |    raise StopIteration
       v        |       |
    iterable    |       |
           +    |       |
           |    |       v
           |    +----> and +-------> iterator
           |                               ^
           v                               |
   iter(<iterable>) +----------------------+
                                           |
   def generator():                        |
  +    yield 1                             |
  |                 generator_expression +-+
  |                                        |
  +-> generator() +-> generator_iterator +-+

Quiz: Do you see how…

  1. every iterator is an iterable?
  2. a container object’s __iter__() method can be implemented as a generator?
  3. an iterable that has a __next__ method is not necessarily an iterator?

Answers:

  1. Every iterator must have an __iter__ method. Having __iter__ is enough to be an iterable. Therefore every iterator is an iterable.
  2. When __iter__ is called it should return an iterator (return <iterator> in the diagram above). Calling a generator returns a generator iterator which is a type of iterator.

    class Iterable1:
        def __iter__(self):
            # a method (which is a function defined inside a class body)
            # calling iter() converts iterable (tuple) to iterator
            return iter((1,2,3))
    
    class Iterable2:
        def __iter__(self):
            # a generator
            for i in (1, 2, 3):
                yield i
    
    class Iterable3:
        def __iter__(self):
            # with PEP 380 syntax
            yield from (1, 2, 3)
    
    # passes
    assert list(Iterable1()) == list(Iterable2()) == list(Iterable3()) == [1, 2, 3]
    
  3. Here is an example:

    class MyIterable:
    
        def __init__(self):
            self.n = 0
    
        def __getitem__(self, index: int):
            return (1, 2, 3)[index]
    
        def __next__(self):
            n = self.n = self.n + 1
            if n > 3:
                raise StopIteration
            return n
    
    # if you can iter it without raising a TypeError, then it's an iterable.
    iter(MyIterable())
    
    # but obviously `MyIterable()` is not an iterator since it does not have
    # an `__iter__` method.
    from collections.abc import Iterator
    assert isinstance(MyIterable(), Iterator)  # AssertionError
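
A related detail from the left branch of the diagram: an object that is iterable only through __getitem__ works with iter(), but isinstance(obj, Iterable) does not detect it, because that check only looks for __iter__. A short sketch; the OldStyle class is made up for illustration:

```python
from collections.abc import Iterable, Iterator


class OldStyle:
    """Iterable only through the sequence-style __getitem__ protocol."""

    def __getitem__(self, index):
        return (1, 2, 3)[index]     # raises IndexError past the end


obj = OldStyle()
print(list(obj))                        # [1, 2, 3]
print(isinstance(obj, Iterable))        # False: the ABC only looks for __iter__
print(isinstance(iter(obj), Iterator))  # True
```

So calling iter(obj) is the more reliable way to find out whether something is iterable.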
    

Answer 5

I don’t know if it helps anybody, but I always like to visualize concepts in my head to better understand them. Since I have a little son, I visualize the iterable/iterator concept with bricks and white paper.

Suppose we are in a dark room, and on the floor we have my son’s bricks. The bricks differ in size and color, but that does not matter now. Suppose we have 5 such bricks. Those 5 bricks can be described as an object, let’s say a bricks kit. We can do many things with this bricks kit: take one brick, then a second, then a third; swap bricks around; put the first brick on top of the second. Therefore this bricks kit is an iterable object, or sequence, as we can go through each brick and do something with it. We can only do it like my little son does: we can play with one brick at a time. So again, I imagine this bricks kit to be an iterable.

Now remember that we are in a dark room. Or almost dark. The thing is that we can’t clearly see those bricks, what color they are, what shape, etc. So even if we want to do something with them (that is, iterate through them), we don’t really know what and how, because it is too dark.

What we can do is put a piece of white fluorescent paper next to the first brick in the kit, so that we can see where the first brick element is. Each time we take a brick from the kit, we move the white piece of paper to the next brick, so that we can see that one in the dark room. This white piece of paper is nothing more than an iterator. It is an object as well, but an object that lets us work and play with the elements of our iterable object, the bricks kit.

That, by the way, explains my early mistake when I tried the following in IDLE and got a TypeError:

 >>> X = [1,2,3,4,5]
 >>> next(X)
 Traceback (most recent call last):
    File "<pyshell#19>", line 1, in <module>
      next(X)
 TypeError: 'list' object is not an iterator

List X here was our bricks kit but NOT a white piece of paper. I needed to find an iterator first:

>>> X = [1,2,3,4,5]
>>> bricks_kit = [1,2,3,4,5]
>>> white_piece_of_paper = iter(bricks_kit)
>>> next(white_piece_of_paper)
1
>>> next(white_piece_of_paper)
2
>>>

Don’t know if it helps, but it helped me. If someone could confirm/correct visualization of the concept, I would be grateful. It would help me to learn more.


Answer 6

Iterable: an object that can be iterated over, such as lists, strings and other sequences. It has either a __getitem__ method or an __iter__ method. If we call the iter() function on such an object, we get an iterator.

Iterator: the object we get back from the iter() function. We call its __next__() method (in Python 3) or simply next() (in Python 2) to get the elements one by one. An instance of such a class is called an iterator.

From the docs:

The use of iterators pervades and unifies Python. Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate. You can call the __next__() method using the next() built-in function; this example shows how it all works:

>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> next(it)
'a'
>>> next(it)
'b'
>>> next(it)
'c'
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    next(it)
StopIteration

Example of a class:

class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]


>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
...     print(char)
...
m
a
p
s

回答 7

我认为您不会比文档简单得多,但是我会尝试:

  • 可迭代的东西,可以被重复过。在实践中,它通常表示一个序列,例如具有开始和结束的某种事物,以及某种贯穿其中所有项目的方式。
  • 您可以将Iterator视为辅助伪方法(或伪属性),该伪方法可提供(或保留)iterable中的下一个(或第一个)项。(实际上,它只是一个定义方法的对象next()

  • Merriam-Webster 对该词的定义可能最好地解释了迭代

b:将计算机指令序列重复指定的次数或直到满足条件为止-比较递归

I don’t think that you can get it much simpler than the documentation, however I’ll try:

  • An iterable is something that can be iterated over. In practice it usually means a sequence, i.e. something that has a beginning and an end and some way to go through all the items in it.
  • You can think of an iterator as a helper pseudo-method (or pseudo-attribute) that gives (or holds) the next (or first) item in the iterable. (In practice it is just an object that defines the method next(), spelled __next__() in Python 3.)

  • Iteration is probably best explained by the Merriam-Webster definition of the word :

b : the repetition of a sequence of computer instructions a specified number of times or until a condition is met — compare recursion
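
To make the three terms concrete, here is a minimal sketch that does by hand roughly what a for loop does: it asks the iterable for an iterator, then repeats next() until StopIteration is raised. The variable names are illustrative and not taken from the answer.

```python
# Iterable: anything iter() accepts, such as a list.
letters = ["a", "b", "c"]

# Iterator: the helper object that remembers the current position.
it = iter(letters)

# Iteration: repeating a step until a condition is met.
collected = []
while True:
    try:
        item = next(it)      # calls it.__next__() in Python 3
    except StopIteration:    # raised when no items are left
        break
    collected.append(item)

print(collected)  # ['a', 'b', 'c']
```

The loop ends because the iterator signals exhaustion itself; the list never has to expose its length.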


回答 8

iterable = [1, 2] 

iterator = iter(iterable)

print(iterator.__next__())   

print(iterator.__next__())   

所以,

  1. iterable是可以循环对象。例如list,string,tuple等。

  2. iteriterable对象上使用该函数将返回迭代器对象。

  3. 现在,此迭代器对象具有名为__next__(在Python 3中,或仅next在Python 2中)的方法,您可以通过该方法访问iterable的每个元素。

因此,以上代码的输出将是:

1

2

iterable = [1, 2] 

iterator = iter(iterable)

print(iterator.__next__())   

print(iterator.__next__())   

So:

  1. An iterable is an object that can be looped over, e.g. a list, string, or tuple.

  2. Calling the iter function on our iterable object returns an iterator object.

  3. This iterator object has a method named __next__ (in Python 3, or just next in Python 2) through which you can access each element of the iterable.

So the output of the above code will be:

1

2


回答 9

__iter__迭代对象具有每次都实例化新迭代器的方法。

迭代器实现一个__next__返回单个项目的__iter__方法和一个返回的方法self

因此,迭代器也是可迭代的,但是可迭代器不是迭代器。

Luciano Ramalho,流利的Python。

Iterables have a __iter__ method that instantiates a new iterator every time.

Iterators implement a __next__ method that returns individual items, and a __iter__ method that returns self .

Therefore, iterators are also iterable, but iterables are not iterators.

Luciano Ramalho, Fluent Python.
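
The distinction in the quote can be checked directly. The following is a small sketch using a plain list; the variable names are assumptions chosen for illustration.

```python
numbers = [10, 20, 30]              # a list is an iterable

first = iter(numbers)
second = iter(numbers)
independent = first is not second   # each iter() call gives a fresh iterator

same_object = iter(first) is first  # an iterator's __iter__ returns self

has_next_on_list = hasattr(numbers, "__next__")  # lists are not iterators
has_next_on_iter = hasattr(first, "__next__")

print(independent, same_object, has_next_on_list, has_next_on_iter)
# True True False True
```

Because the two iterators are independent, advancing one of them does not move the other.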


回答 10

在处理迭代器和迭代器之前,决定迭代器和迭代器的主要因素是顺序

序列:序列是数据的集合

可迭代:可迭代是支持__iter__方法的序列类型对象。

Iter方法:Iter方法将序列作为输入并创建一个称为迭代器的对象

迭代器:迭代器是调用next方法并遍历整个序列的对象。在调用下一个方法时,它返回当前遍历的对象。

例:

x=[1,2,3,4]

x是一个由数据收集组成的序列

y=iter(x)

调用iter(x)时,仅当x对象具有iter方法时才返回迭代器,否则会引发异常。如果返回迭代器,则按如下方式分配y:

y=[1,2,3,4]

由于y是迭代器,因此它支持next()方法

调用next方法时,它会一步一步返回列表的各个元素。

返回序列的最后一个元素后,如果再次调用下一个方法,则会引发StopIteration错误

例:

>>> y.next()
1
>>> y.next()
2
>>> y.next()
3
>>> y.next()
4
>>> y.next()
StopIteration

Before dealing with iterables and iterators, note that the concept underlying both is the sequence.

Sequence: a sequence is an ordered collection of data.

Iterable: an iterable is a sequence-type object that supports the __iter__ method.

iter function: the iter function takes a sequence as input and creates an object known as an iterator.

Iterator: an iterator is an object whose next method is called to traverse the sequence. Each call to next returns the element currently being traversed.

example:

x=[1,2,3,4]

x is a sequence consisting of a collection of data.

y=iter(x)

Calling iter(x) returns an iterator only if the x object has an __iter__ method (or, as a fallback, __getitem__); otherwise it raises a TypeError. If it returns an iterator, y is an iterator over the elements of x, not a list itself:

y yields 1, 2, 3, 4

As y is an iterator, it supports the next() method.

Calling the next method returns the individual elements of the list one by one.

After the last element of the sequence has been returned, calling next again raises a StopIteration error.

example:

>>> next(y)   # y.next() in Python 2
1
>>> next(y)
2
>>> next(y)
3
>>> next(y)
4
>>> next(y)
Traceback (most recent call last):
  ...
StopIteration
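
The same walk-through can be written as a short script. This is a sketch that uses the built-in next(), which works in both Python 2.6+ and Python 3; the names values, exhausted and fallback are illustrative assumptions.

```python
x = [1, 2, 3, 4]
y = iter(x)                 # y is a list iterator, not a copy of the list

values = [next(y) for _ in range(4)]   # consume all four elements

try:
    next(y)                 # the iterator is now exhausted
    exhausted = False
except StopIteration:
    exhausted = True

# next() also accepts a default that is returned instead of raising.
fallback = next(y, "done")

print(values, exhausted, fallback)  # [1, 2, 3, 4] True done
```

Note that x itself is unchanged; only the iterator y has been used up.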

回答 11

在Python中,一切都是对象。如果说一个对象是可迭代的,则意味着您可以将对象作为一个集合逐步进行(即迭代)。

例如,数组是可迭代的。您可以使用for循环遍历它们,并从索引0到索引n,n是数组对象的长度减去1。

字典(键/值对,也称为关联数组)也是可迭代的。您可以逐步浏览他们的键。

显然,不是集合的对象是不可迭代的。例如,布尔对象只有一个值为True或False。它不是可迭代的(它是一个可迭代的对象是没有意义的)。

阅读更多。http://www.lepus.org.uk/ref/companion/Iterator.xml

In Python everything is an object. When an object is said to be iterable, it means that you can step through (i.e. iterate) the object as a collection.

Arrays for example are iterable. You can step through them with a for loop, and go from index 0 to index n, n being the length of the array object minus 1.

Dictionaries (pairs of key/value, also called associative arrays) are also iterable. You can step through their keys.

Obviously, objects that are not collections are not iterable. A bool object, for example, holds only a single value, True or False. It is not iterable (it would not make sense for it to be iterable).

Read more. http://www.lepus.org.uk/ref/companion/Iterator.xml
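
As a rough illustration of the points above, the sketch below iterates over a dictionary and tests a few objects for iterability. The helper is_iterable is a hypothetical name introduced here, not a built-in.

```python
ages = {"ann": 31, "bob": 27}

keys = [name for name in ages]   # iterating a dict yields its keys
pairs = sorted(ages.items())     # use .items() for key/value pairs

def is_iterable(obj):
    """Return True if iter() accepts obj (helper name is illustrative)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(sorted(keys))         # ['ann', 'bob']
print(is_iterable(ages))    # True
print(is_iterable(True))    # False
print(is_iterable("text"))  # True
```

Asking iter() and catching TypeError is a simple way to test this, since it covers objects that define either __iter__ or __getitem__.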


删除pip的缓存?

问题:删除pip的缓存?

我需要专门安装psycopg2 v2.4.1。我不小心做了:

 pip install psycopg2

代替:

 pip install psycopg2==2.4.1

它将安装2.4.4,而不是早期版本。

现在,即使在我pip卸载psycopg2并尝试使用正确的版本重新安装后,pip似乎仍在重新使用它第一次下载的缓存。

如何强制pip清除其下载缓存并使用命令中包含的特定版本?

I need to install psycopg2 v2.4.1 specifically. I accidentally did:

 pip install psycopg2

Instead of:

 pip install psycopg2==2.4.1

That installs 2.4.4 instead of the earlier version.

Now even after I pip uninstall psycopg2 and attempt to reinstall with the correct version, it appears that pip is re-using the cache it downloaded the first time.

How can I force pip to clear out its download cache and use the specific version I’m including in the command?


回答 0

如果使用的是pip 6.0或更高版本,请尝试添加该--no-cache-dir选项

如果使用的是pip 6.0之前的版本,请使用进行升级pip install -U pip

If using pip 6.0 or newer, try adding the --no-cache-dir option.

If using pip older than pip 6.0, upgrade it with pip install -U pip.


回答 1

在适合您的系统的地方清除缓存目录

Linux和Unix

~/.cache/pip  # and it respects the XDG_CACHE_HOME directory.

OS X

~/Library/Caches/pip

视窗

%LocalAppData%\pip\Cache

Clear the cache directory where appropriate for your system

Linux and Unix

~/.cache/pip  # and it respects the XDG_CACHE_HOME directory.

OS X

~/Library/Caches/pip

Windows

%LocalAppData%\pip\Cache
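
The locations listed above can be folded into a small helper. This is only a sketch under the assumption that pip uses its default locations; default_pip_cache_dir is a hypothetical function, not part of pip, and a configured cache-dir would override it. On pip 20.1 or newer, running pip cache dir reports the location pip actually uses.

```python
import os
import sys

def default_pip_cache_dir(platform, env, home):
    """Guess pip's default cache directory from the locations listed above.

    Illustrative helper, not part of pip. `platform` is a sys.platform-style
    string, `env` a mapping of environment variables, `home` the home directory.
    """
    if platform.startswith("win"):
        return env["LOCALAPPDATA"] + "\\pip\\Cache"
    if platform == "darwin":
        return home + "/Library/Caches/pip"
    # Linux and other Unix: honour XDG_CACHE_HOME when it is set.
    base = env.get("XDG_CACHE_HOME") or home + "/.cache"
    return base + "/pip"

print(default_pip_cache_dir(sys.platform, os.environ, os.path.expanduser("~")))
```

Passing the platform and environment in as arguments keeps the function easy to check for all three systems from one machine.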

回答 2

https://pip.pypa.io/zh-CN/latest/reference/pip_install.html#caching的文档中:

从v6.0开始,pip提供了默认情况下的缓存,其功能类似于网络浏览器。默认情况下,当缓存处于打开状态并且被设计为默认时,您可以禁用缓存并始终通过使用该--no-cache-dir 选项来访问PyPI 。

From documentation at https://pip.pypa.io/en/latest/reference/pip_install.html#caching:

Starting with v6.0, pip provides an on-by-default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.


回答 3

pip可以安装一个忽略缓存的软件包,像这样

pip --no-cache-dir install scipy

pip can install a package ignoring the cache, like this

pip --no-cache-dir install scipy

回答 4

在Ubuntu上,我必须删除/tmp/pip-build-root

On Ubuntu, I had to delete /tmp/pip-build-root.


回答 5

(我是pip的维护者!)

自pip 6.0(早在2014年!)起,pip install、pip download和pip wheel命令可以通过--no-cache-dir选项避免使用缓存。(例如:pip install --no-cache-dir <package>)

自pip 10.0(早在2018年!)以来,pip config添加了一个命令,该命令可用于将pip配置为始终忽略缓存- pip config set global.cache-dir false将pip配置为不“全局”使用缓存(即,在所有命令中)。

从pip 20.1开始,pip具有pip cache管理pip缓存内容的命令。

  • pip cache purge 删除缓存中的所有wheel文件。
  • pip cache remove matplotlib 有选择地从缓存中删除与matplotlib相关的文件。

总而言之,pip提供了许多调整缓存使用方式的方法:

  • pip install --no-cache-dir <package>:仅为此运行而无需使用缓存安装软件包。
  • pip config set global.cache-dir false:将pip配置为不“全局”使用缓存(在所有命令中)
  • pip cache remove matplotlib:从pip的缓存中删除所有与matplotlib相关的wheel文件。
  • pip cache purge:清除pip缓存中的所有文件。

问题中提到的“由于缓存而安装了错误版本”的特定问题已在pip 1.4中修复(早在2013年!):

修复了许多与清理和不重用构建目录有关的问题。(#413,#709,#634,#602,#939,#865,#948)

(pip maintainer here!)

Since pip 6.0 (back in 2014!), pip install, pip download and pip wheel commands can be told to avoid using the cache with the --no-cache-dir option. (eg: pip install --no-cache-dir <package>)

Since pip 10.0 (back in 2018!), a pip config command was added, which can be used to configure pip to always ignore the cache — pip config set global.cache-dir false configures pip to not use the cache “globally” (i.e. in all commands).

Since pip 20.1, pip has a pip cache command to manage the contents of pip’s cache.

  • pip cache purge removes all the wheel files in the cache.
  • pip cache remove matplotlib selectively removes files related to matplotlib from the cache.

In summary, pip provides a lot of ways to tweak how it uses the cache:

  • pip install --no-cache-dir <package>: install a package without using the cache, for just this run.
  • pip config set global.cache-dir false: configure pip to not use the cache “globally” (in all commands)
  • pip cache remove matplotlib: removes all wheel files related to matplotlib from pip’s cache.
  • pip cache purge: to clear all files from pip’s cache.

The specific issue mentioned in the question, “installing the wrong version due to caching”, was fixed in pip 1.4 (back in 2013!):

Fix a number of issues related to cleaning up and not reusing build directories. (#413, #709, #634, #602, #939, #865, #948)
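
When pip is driven from a script, the options above can be assembled as argument lists. The sketch below only builds the commands and does not run them; pip_install_args is a hypothetical helper name, and the requirement string is the one from the question.

```python
import sys

def pip_install_args(requirement, use_cache=False):
    """Build an argv list for a pip install (helper name is illustrative)."""
    args = [sys.executable, "-m", "pip", "install"]
    if not use_cache:
        args.append("--no-cache-dir")   # available since pip 6.0
    args.append(requirement)
    return args

reinstall = pip_install_args("psycopg2==2.4.1")
purge = [sys.executable, "-m", "pip", "cache", "purge"]   # needs pip 20.1+

print(" ".join(reinstall[1:]))  # -m pip install --no-cache-dir psycopg2==2.4.1

# To actually run them (this changes the environment, so it is left commented):
# import subprocess
# subprocess.run(reinstall, check=True)
```

Invoking pip as sys.executable -m pip is one way to make sure the command targets the interpreter that is running the script.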


回答 6

如果您想--no-cache-dir默认设置选项,可以将其放入pip.conf

[global]
no-cache-dir = false

的位置pip.conf取决于您的操作系统。请参阅文档以获取更多信息。

If you like to set the --no-cache-dir option by default, you can put this into pip.conf:

[global]
no-cache-dir = false

The location of pip.conf depends on your OS. See the documentation for more info.


回答 7

我只是遇到了类似的问题,发现获取pip升级软件包的唯一方法是删除以前未完成的安装或先前版本的pip可能遗留下的$PWD/build%CD%\build在Windows上)目录(它现在删除了成功安装后生成目录)。

I just had a similar problem and found that the only way to get pip to upgrade the package was to delete the $PWD/build (%CD%\build on Windows) directory that might have been left over from a previously unfinished install or a previous version of pip (it now deletes the build directories after a successful install).


回答 8

在archlinux中,pip缓存位于〜/ .cache / pip上,我可以通过删除其中的http文件夹来解决问题。

On archlinux pip cache is located at ~/.cache/pip, I could solve my issue by removing the http folder inside it.


回答 9

在我的Mac上,我必须删除缓存目录 ~/Library/Caches/pip/

On my mac I had to remove the cache directory ~/Library/Caches/pip/


回答 10

2020年4月21日发布pip 20.1b1以来,它“添加pip cache了检查/管理pip的转盘缓存的命令”,因此可以发出以下命令:

pip cache purge

参考指南在这里:
https://pip.pypa.io/en/stable/reference/pip_cache/
相应的拉取请求在这里

Since pip 20.1b1, which was released on 21 April 2020 and “added pip cache command for inspecting/managing pip’s wheel cache”, it is possible to issue this command:

pip cache purge

The reference guide is here:
https://pip.pypa.io/en/stable/reference/pip_cache/
The corresponding pull request is here.


回答 11

在Windows 7上,我必须删除%HOMEPATH%/pip

On Windows 7, I had to delete %HOMEPATH%/pip.


回答 12

如果使用virtualenv,请build在您的环境根目录下查找目录。

If using virtualenv, look for the build directory under your environment’s root.


回答 13

我必须在Windows 7上删除%TEMP%\pip-build

I had to delete %TEMP%\pip-build on Windows 7.


回答 14

在Mac OS(小牛)上,我不得不删除 /tmp/pip-build/

On Mac OS (Mavericks), I had to delete /tmp/pip-build/


回答 15

更好的方法是删除缓存并重建它。这样,如果您再次为其他virtualenv安装它,它将使用缓存而不是每次安装时都进行构建。

例如,当您安装它时,它将说它使用了缓存的滚轮,

Processing <some_prefix>/Library/Caches/pip/wheels/d0/c4/e4/e49fd07bca8dda00dd6b4bbc606aa05a25aacb00d45747a47a/horovod-0.19.3-cp37-cp37m-macosx_10_9_x86_64.wh

只需删除该文件,然后重新开始安装即可。

A better way to do it is to delete the cache and rebuild it. In this way, if you install it again for other virtualenv, it will use the cache instead of building every time when you install it.

For example, when you install it, it will say it uses cached wheel,

Processing <some_prefix>/Library/Caches/pip/wheels/d0/c4/e4/e49fd07bca8dda00dd6b4bbc606aa05a25aacb00d45747a47a/horovod-0.19.3-cp37-cp37m-macosx_10_9_x86_64.wh

Just delete that one and restart your install.


回答 16

(…)似乎pip正在重新使用缓存(…)

我很确定这不是正在发生的事情。pip过去(错误地)重用的是构建目录,而不是缓存。此问题已在2013年7月23日发布的pip 1.4版本中修复。

(…) it appears that pip is re-using the cache (…)

I’m pretty sure that’s not what’s happening. pip used to (wrongly) reuse the build directory, not the cache. This was fixed in version 1.4 of pip, which was released on 2013-07-23.


如何在Ubuntu上安装LXML

问题:如何在Ubuntu上安装LXML

我在Ubuntu 11上使用easy_install安装lxml遇到困难。

当我输入时,$ easy_install lxml我得到:

Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.3
Downloading http://lxml.de/files/lxml-2.3.tgz
Processing lxml-2.3.tgz
Running lxml-2.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7UdQOZ/lxml-2.3/egg-dist-tmp-GacQGy
Building lxml version 2.3.
Building without Cython.
ERROR: /bin/sh: xslt-config: not found

** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt 
In file included from src/lxml/lxml.etree.c:227:0:
src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.

看来libxslt或libxml2没有安装。我已经尝试按照 http://www.techsww.com/tutorials/libraries/libxslt/installation/installing_libxslt_on_ubuntu_linux.php 和 http://www.techsww.com/tutorials/libraries/libxml/installation/installing_libxml_on_ubuntu_linux.php 上的说明进行操作,但没有成功。

如果我尝试wget ftp://xmlsoft.org/libxml2/libxml2-sources-2.6.27.tar.gz我会得到

<successful connection info>
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /libxml2 ... done.
==> SIZE libxml2-sources-2.6.27.tar.gz ... done.
==> PASV ... done.    ==> RETR libxml2-sources-2.6.27.tar.gz ... 
No such file `libxml2-sources-2.6.27.tar.gz'.

如果我先尝试另一种,那我会做的,./configure --prefix=/usr/local/libxslt --with-libxml-prefix=/usr/local/libxml2最终会失败,并显示以下内容:

checking for libxml libraries >= 2.6.27... configure: error: Could not find libxml2 anywhere, check ftp://xmlsoft.org/.

我试过libxml2的2.6.27和2.6.29两个版本,没有区别。

不遗余力,我已经成功完成了sudo apt-get install libxml2-dev,但这并没有改变。

I’m having difficulty installing lxml with easy_install on Ubuntu 11.

When I type $ easy_install lxml I get:

Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.3
Downloading http://lxml.de/files/lxml-2.3.tgz
Processing lxml-2.3.tgz
Running lxml-2.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7UdQOZ/lxml-2.3/egg-dist-tmp-GacQGy
Building lxml version 2.3.
Building without Cython.
ERROR: /bin/sh: xslt-config: not found

** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt 
In file included from src/lxml/lxml.etree.c:227:0:
src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.

It seems that libxslt or libxml2 is not installed. I’ve tried following the instructions at http://www.techsww.com/tutorials/libraries/libxslt/installation/installing_libxslt_on_ubuntu_linux.php and http://www.techsww.com/tutorials/libraries/libxml/installation/installing_libxml_on_ubuntu_linux.php with no success.

If I try wget ftp://xmlsoft.org/libxml2/libxml2-sources-2.6.27.tar.gz I get

<successful connection info>
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /libxml2 ... done.
==> SIZE libxml2-sources-2.6.27.tar.gz ... done.
==> PASV ... done.    ==> RETR libxml2-sources-2.6.27.tar.gz ... 
No such file `libxml2-sources-2.6.27.tar.gz'.

If I try the other first, I’ll get to ./configure --prefix=/usr/local/libxslt --with-libxml-prefix=/usr/local/libxml2 and that will fail eventually with:

checking for libxml libraries >= 2.6.27... configure: error: Could not find libxml2 anywhere, check ftp://xmlsoft.org/.

I’ve tried both versions 2.6.27 and 2.6.29 of libxml2 with no difference.

Leaving no stone unturned, I have successfully done sudo apt-get install libxml2-dev, but this changes nothing.


回答 0

由于您使用的是Ubuntu,因此不必理会这些源代码包。只需使用apt-get安装这些开发包。

apt-get install libxml2-dev libxslt1-dev python-dev

但是,如果您对可能是旧版本的lxml感到满意,则可以尝试

apt-get install python-lxml

并完成它。:)

Since you’re on Ubuntu, don’t bother with those source packages. Just install those development packages using apt-get.

apt-get install libxml2-dev libxslt1-dev python-dev

If you’re happy with a possibly older version of lxml altogether though, you could try

apt-get install python-lxml

and be done with it. :)
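
After installing by either route, a quick check from Python can confirm that lxml imports and show which libxml2 and libxslt it is using. This is a sketch; lxml_versions is a hypothetical helper name, while the version attributes are ones lxml.etree exposes.

```python
import importlib

def lxml_versions():
    """Return lxml and library versions, or None if lxml is not importable."""
    try:
        etree = importlib.import_module("lxml.etree")
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,        # version tuple of lxml itself
        "libxml2": etree.LIBXML_VERSION,   # version of libxml2 in use
        "libxslt": etree.LIBXSLT_VERSION,  # version of libxslt in use
    }

print(lxml_versions())
```

A result of None means the import failed, which usually points back to the missing development packages or a build that did not complete.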


回答 1

在lxml编译之前,我还必须安装lib32z1-dev(Ubuntu 13.04 x64)。

sudo apt-get install lib32z1-dev

或所有必需的包装在一起:

sudo apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev

I also had to install lib32z1-dev before lxml would compile (Ubuntu 13.04 x64).

sudo apt-get install lib32z1-dev

Or all the required packages together:

sudo apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev

回答 2

正如@Pepijn在ubuntu 13.04 x64上评论@Druska的答案一样,无需使用lib32z1-dev,zlib1g-dev就足够了:

sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev

As @Pepijn commented on @Druska’s answer, on Ubuntu 13.04 x64 there is no need to use lib32z1-dev; zlib1g-dev is enough:

sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev

回答 3

我使用Ubuntu 14.04在Vagrant中使用pip安装了lxml,并且遇到了同样的问题。即使安装了所有要求,我也一次又一次遇到相同的错误。事实证明,默认情况下,我的VM的内存很少。有了1024 MB,一切正常。

将此添加到您的VagrantFile中,lxml应该正确编译/安装:

config.vm.provider "virtualbox" do |vb|
  vb.memory = 1024
end

感谢sixhobbit的提示(请参阅:无法在Ubuntu 12.04上安装lxml)。

I installed lxml with pip in Vagrant, using Ubuntu 14.04, and had the same problem. Even though all requirements were installed, I got the same error again and again. It turned out that my VM had too little memory by default. With 1024 MB everything works fine.

Add this to your Vagrantfile and lxml should compile and install properly:

config.vm.provider "virtualbox" do |vb|
  vb.memory = 1024
end

Thanks to sixhobbit for the hint (see: can’t installing lxml on Ubuntu 12.04).


回答 4

对于Ubuntu 14.04

sudo apt-get install python-lxml

为我工作。

For Ubuntu 14.04

sudo apt-get install python-lxml

worked for me.


回答 5

步骤1

使用此命令安装最新的python更新。

sudo apt-get install python-dev

第2步

添加第一个依赖项libxml2版本2.7.0或更高版本

sudo apt-get install libxml2-dev

第三步

添加第二个依赖库libxslt版本1.1.23或更高版本

sudo apt-get install libxslt1-dev

第四步

首先安装pip软件包管理工具。并运行此命令。

pip install lxml


Step 1

Install the Python development headers using this command.

sudo apt-get install python-dev

Step 2

Add first dependency libxml2 version 2.7.0 or later

sudo apt-get install libxml2-dev

Step 3

Add second dependency libxslt version 1.1.23 or later

sudo apt-get install libxslt1-dev

Step 4

Install the pip package management tool first, then run this command.

pip install lxml



回答 6

安装AKX提到的软件包后,我仍然遇到相同的问题。解决了

apt-get install python-dev

After installing the packages mentioned by AKX I still had the same problem. Solved it with

apt-get install python-dev

回答 7

对于Ubuntu 12.04.3 LTS(精确的穿山甲),我必须这样做:

apt-get install libxml2-dev libxslt1-dev

(注意libxslt1-dev中的“ 1”)

然后我刚刚用pip / easy_install安装了lxml。

For Ubuntu 12.04.3 LTS (Precise Pangolin) I had to do:

apt-get install libxml2-dev libxslt1-dev

(Note the “1” in libxslt1-dev)

Then I just installed lxml with pip/easy_install.


回答 8

从Ubuntu 18.04(Bionic Beaver)开始,建议使用apt而不是apt-get,因为它具有更好的结构形式。

sudo apt install libxml2-dev libxslt1-dev python-dev

如果您对可能是旧版本的lxml感到满意,则可以尝试

sudo apt install python-lxml

From Ubuntu 18.04 (Bionic Beaver) it is advisable to use apt instead of apt-get, since it has a much better-structured interface.

sudo apt install libxml2-dev libxslt1-dev python-dev

If you’re happy with a possibly older version of lxml altogether though, you could try

sudo apt install python-lxml

回答 9


这里的许多答案都比较旧了,感谢@Simplans(https://stackoverflow.com/a/37759871/417747)的指点以及主页。

什么对我有用(Ubuntu仿生):

sudo apt-get install python3-lxml  

(+ sudo apt-get install libxml2-dev libxslt1-dev我已经安装了它,但是不确定那是否仍然是必需的)

Many answers here are rather old. Thanks to the pointer from @Simplans (https://stackoverflow.com/a/37759871/417747) and the home page.

What worked for me (Ubuntu bionic):

sudo apt-get install python3-lxml  

(+ sudo apt-get install libxml2-dev libxslt1-dev, which I had installed before it, though I am not sure it is still required)


回答 10

首先安装Ubuntu的python-lxml软件包及其依赖项:

sudo apt-get install python-lxml

然后使用pip升级到适用于Python的lxml的最新版本:

pip install lxml

First install Ubuntu’s python-lxml package and its dependencies:

sudo apt-get install python-lxml

Then use pip to upgrade to the latest version of lxml for Python:

pip install lxml

为什么Python中没有++和--运算符?

问题:为什么Python中没有++和--运算符?

为什么在Python中没有++and --运算符?

Why are there no ++ and -- operators in Python?


回答 0

不是因为它没有意义;将“x++”定义为“x += 1,并求值为x先前的绑定”是完全合理的。

如果您想知道最初的原因,则必须要么浏览旧的Python邮件列表,要么询问那里的某个人(例如Guido),但是在事实成立之后就很容易找到理由了:

简单的递增和递减不像在其他语言中那样常用。您不会经常在Python中编写类似for(int i = 0; i < 10; ++i)的东西;相反,您会写类似for i in range(0, 10)的代码。

由于几乎不需要它,因此没有太多理由为其提供特殊的语法。当您确实需要增加时,+=通常就可以了。

这不是是否有意义,还是可以做到的决定。这是一个好处是否值得添加到该语言的核心语法中的问题。请记住,这是四个运算符-postinc,postdec,preinc,predec,并且每个运算符都需要具有自己的类重载;他们都需要指定和测试;它将在语言中添加操作码(暗示更大,因此更慢的VM引擎);每个支持逻辑增量的类都需要实现它们(在+=和之上-=)。

这一切与+=和-=相比都是多余的,因此会成为净损失。

It’s not because it doesn’t make sense; it makes perfect sense to define “x++” as “x += 1, evaluating to the previous binding of x”.

If you want to know the original reason, you’ll have to either wade through old Python mailing lists or ask somebody who was there (eg. Guido), but it’s easy enough to justify after the fact:

Simple increment and decrement aren’t needed as much as in other languages. You don’t write things like for(int i = 0; i < 10; ++i) in Python very often; instead you do things like for i in range(0, 10).

Since it’s not needed nearly as often, there’s much less reason to give it its own special syntax; when you do need to increment, += is usually just fine.

It’s not a decision of whether it makes sense, or whether it can be done–it does, and it can. It’s a question of whether the benefit is worth adding to the core syntax of the language. Remember, this is four operators–postinc, postdec, preinc, predec, and each of these would need to have its own class overloads; they all need to be specified, and tested; it would add opcodes to the language (implying a larger, and therefore slower, VM engine); every class that supports a logical increment would need to implement them (on top of += and -=).

This is all redundant with += and -=, so it would become a net loss.
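
A short sketch of the idioms this answer refers to: counting loops via range(), counters via enumerate(), and += where an explicit increment is still wanted. The variable names are illustrative.

```python
# C-style counting loop, written the Python way.
squares = []
for i in range(10):            # i takes the values 0..9
    squares.append(i * i)

# When the index and the item are both needed, enumerate() supplies the counter.
labelled = [(index, word) for index, word in enumerate(["a", "b", "c"])]

# An explicit counter still works; += takes the place of ++.
count = 0
for word in ["a", "b", "c"]:
    count += 1

print(squares[-1], labelled[-1], count)  # 81 (2, 'c') 3
```

In the first two loops no counter variable is ever incremented by hand, which is the answer's point about ++ being rarely needed.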


回答 1

我写的这个原始答案是关于计算机民俗的一个神话:被丹尼斯·里奇(Dennis Ritchie)认为是“历史上不可能的”,正如ACM通讯编辑在2012年7月致函doi:10.1145 / 2209249.2209251


C增/减运算符是在C编译器不是很聪明的时候发明的,作者希望能够指定使用机器语言运算符的直接意图,从而节省了编译器的几个周期,可能会做一个

load memory
load 1
add
store memory

代替

inc memory 

PDP-11甚至支持分别对应于*++p和的“自动递增”和“延迟自动递增”指令*p++。如果非常好奇,请参阅手册第5.3节。

由于编译器足够聪明,可以处理C语法中内置的高级优化技巧,因此它们现在只是语法上的便利。

Python没有技巧来向汇编器传达意图,因为它不使用汇编器。

This original answer I wrote is a myth from the folklore of computing: debunked by Dennis Ritchie as “historically impossible” as noted in the letters to the editors of Communications of the ACM July 2012 doi:10.1145/2209249.2209251


The C increment/decrement operators were invented at a time when the C compiler wasn’t very smart and the authors wanted to be able to specify the direct intent that a machine language operator should be used which saved a handful of cycles for a compiler which might do a

load memory
load 1
add
store memory

instead of

inc memory 

and the PDP-11 even supported “autoincrement” and “autoincrement deferred” instructions corresponding to *++p and *p++, respectively. See section 5.3 of the manual if horribly curious.

As compilers are smart enough to handle the high-level optimization tricks built into the syntax of C, they are just a syntactic convenience now.

Python doesn’t have tricks to convey intentions to the assembler because it doesn’t use one.


回答 2

我一直以为这与python禅的这一行有关:

应该有一种(最好只有一种)明显的方式来做到这一点。

x ++和x + = 1做完全相同的事情,因此没有理由同时使用两者。

I always assumed it had to do with this line of the zen of python:

There should be one — and preferably only one — obvious way to do it.

x++ and x+=1 do the exact same thing, so there is no reason to have both.


回答 3

当然,我们可以说“圭多岛就是这样决定的”,但我认为问题实际上是做出该决定的原因。我认为有以下几个原因:

  • 它将陈述和表达式混合在一起,这不是一个好习惯。参见http://norvig.com/python-iaq.html
  • 它通常会鼓励人们编写可读性较低的代码
  • 如前所述,语言实现的额外复杂性在Python中是不必要的

Of course, we could say “Guido just decided that way”, but I think the question is really about the reasons for that decision. I think there are several reasons:

  • It mixes together statements and expressions, which is not good practice. See http://norvig.com/python-iaq.html
  • It generally encourages people to write less readable code
  • Extra complexity in the language implementation, which is unnecessary in Python, as already mentioned

回答 4

因为在Python中,整数是不可变的(int的+ =实际上返回了一个不同的对象)。

同样,使用++ /-时,您需要担心增量前后的递增和递减,并且只需要再写一次按键即可x+=1。换句话说,它避免了潜在的混乱,但付出的代价却很小。

Because, in Python, integers are immutable (int’s += actually returns a different object).

Also, with ++/-- you need to worry about pre- versus post-increment/decrement, and it takes only one more keystroke to write x+=1. In other words, it avoids potential confusion at the expense of very little gain.
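
The immutability point can be seen by comparing += on an int with += on a list. This is a minimal sketch with illustrative names.

```python
n = 1000
alias = n
n += 1                      # builds a new int and rebinds the name n
print(alias, n)             # 1000 1001

items = [1, 2]
same = items
items += [3]                # list.__iadd__ extends the existing list in place
print(same)                 # [1, 2, 3]
print(same is items)        # True
```

For the int, the other name still sees the old value; for the list, both names see the change because the object itself was modified.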


回答 5

明晰!

Python有很多关于清晰度--a的知识,除非他/她学习了具有这种构造的语言,否则任何程序员都不可能正确地猜测的含义。

Python还有很多关于避免引发错误的构造的知识,并且++已知运算符是缺陷的丰富来源。这两个原因足以在Python中没有这些运算符。

Python使用缩进来标记块而不是诸如某种形式的开始/结束方括号或强制结束标记之类的句法手段的决定很大程度上基于相同的考虑。

作为说明,请看一下有关在2005年向Python中引入条件运算符(在C:中cond ? resultif : resultelse)的讨论。至少请阅读该讨论第一条消息决策消息(之前有相同主题的多个先驱)。

琐事: 其中经常提到的PEP是PEP 308,即一项“Python增强提案”(Python Enhancement Proposal)。LC表示列表推导式,GE表示生成器表达式(如果它们使您感到困惑,不要担心,它们不属于Python中少数几个复杂的地方)。

Clarity!

Python is a lot about clarity and no programmer is likely to correctly guess the meaning of --a unless s/he’s learned a language having that construct.

Python is also a lot about avoiding constructs that invite mistakes and the ++ operators are known to be rich sources of defects. These two reasons are enough not to have those operators in Python.

The decision that Python uses indentation to mark blocks rather than syntactical means such as some form of begin/end bracketing or mandatory end marking is based largely on the same considerations.

For illustration, have a look at the discussion around introducing a conditional operator (in C: cond ? resultif : resultelse) into Python in 2005. Read at least the first message and the decision message of that discussion (which had several precursors on the same topic previously).

Trivia: The PEP frequently mentioned therein is PEP 308, a “Python Enhancement Proposal”. LC means list comprehension and GE means generator expression (and don’t worry if those confuse you; they are none of the few complicated spots of Python).
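
As an aside on guessing the meaning of --a: in Python the prefix forms are accepted by the parser but are just two unary operators in a row, while the postfix forms are syntax errors. The sketch below demonstrates this; is_valid_python is a hypothetical helper name.

```python
a = 5

double_negation = --a       # parsed as -(-a); a is unchanged
double_plus = ++a           # parsed as +(+a); a is unchanged
print(double_negation, double_plus, a)   # 5 5 5

def is_valid_python(source):
    """Return True if the source compiles (helper name is illustrative)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

print(is_valid_python("a++"))      # False: the postfix form is a syntax error
print(is_valid_python("a += 1"))   # True
```

So a programmer coming from C who writes ++a gets no error and no increment, which is one more argument for spelling it a += 1.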


回答 6

我对python为什么没有++运算符的理解如下:当您在python中编写此代码时,a=b=c=1您将获得三个变量(标签),它们指向同一对象(值为1)。您可以使用id函数进行验证,该函数将返回对象内存地址:

In [19]: id(a)
Out[19]: 34019256

In [20]: id(b)
Out[20]: 34019256

In [21]: id(c)
Out[21]: 34019256

所有三个变量(标签)都指向同一对象。现在递增变量之一,看看它如何影响内存地址:

In [22]: a = a + 1

In [23]: id(a)
Out[23]: 34019232

In [24]: id(b)
Out[24]: 34019256

In [25]: id(c)
Out[25]: 34019256

您可以看到该变量a现在指向另一个对象,为bc。因为您已经使用a = a + 1它,所以它很明显。换句话说,您将另一个对象完全分配给label a。想象一下,您可以编写a++它,这表明您没有分配给变量a新对象,而是增加了旧对象的数量。所有这些东西都是恕我直言,以尽量减少混乱。为了更好地理解,请参见python变量如何工作:

在Python中,为什么函数可以修改调用者认为的某些参数,而不能修改其他参数?

Python是按值调用还是按引用调用?都不行

Python是按值传递还是按引用传递?

Python是按引用传递还是按值传递?

Python:如何通过引用传递变量?

了解Python变量和内存管理

在python中模拟按值传递行为

Python函数通过引用调用

像Pythonista一样的代码:惯用的Python

My understanding of why Python does not have a ++ operator is the following: when you write a=b=c=1 in Python, you get three variables (labels) pointing at the same object (whose value is 1). You can verify this with the id function, which returns the object’s memory address (in CPython):

In [19]: id(a)
Out[19]: 34019256

In [20]: id(b)
Out[20]: 34019256

In [21]: id(c)
Out[21]: 34019256

All three variables (labels) point to the same object. Now increment one of the variables and see how it affects the memory addresses:

In [22]: a = a + 1

In [23]: id(a)
Out[23]: 34019232

In [24]: id(b)
Out[24]: 34019256

In [25]: id(c)
Out[25]: 34019256

You can see that variable a now points to a different object than variables b and c. Because you used a = a + 1, this is explicit: you assign a completely different object to the label a. If you could write a++, it would suggest that you did not assign a new object to variable a but rather incremented the old one. All of this is, IMHO, meant to minimize confusion. For a better understanding, see how Python variables work:

In Python, why can a function modify some arguments as perceived by the caller, but not others?

Is Python call-by-value or call-by-reference? Neither.

Does Python pass by value, or by reference?

Is Python pass-by-reference or pass-by-value?

Python: How do I pass a variable by reference?

Understanding Python variables and Memory Management

Emulating pass-by-value behaviour in python

Python functions call by reference

Code Like a Pythonista: Idiomatic Python


回答 7

它就是这样设计的。递增和递减运算符只是的快捷方式x = x + 1。Python通常采用了一种设计策略,该策略减少了执行操作的替代方法的数量。 增量分配是Python中最接近递增/递减运算符的东西,直到Python 2.0才添加。

It was just designed that way. Increment and decrement operators are just shortcuts for x = x + 1 and x = x - 1. Python has typically adopted a design strategy which reduces the number of alternative means of performing an operation. Augmented assignment is the closest thing to increment/decrement operators in Python, and it wasn’t even added until Python 2.0.
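
Augmented assignment can also be customized per class through __iadd__, which is the hook a mutable type would use in place of a ++ overload. The class below is a toy example with a hypothetical name, not something from the answer.

```python
class Tally:
    """Toy counter (hypothetical class) that supports += via __iadd__."""

    def __init__(self, value=0):
        self.value = value

    def __iadd__(self, amount):
        self.value += amount    # mutate in place
        return self             # += rebinds the name to whatever is returned

t = Tally()
same = t
t += 1
t += 2
print(t.value, same is t)   # 3 True
```

Returning self keeps the same object bound to the name; a class that omits __iadd__ falls back to __add__, and the name is rebound to the new result.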


回答 8

我是python的新手,但我怀疑原因是由于该语言中的可变对象和不可变对象之间的强调。现在,我知道x ++可以很容易地解释为x = x + 1,但是它看起来就像您就地递增一个不可变的对象一样。

只是我的猜测/感觉/预感。

I’m very new to python but I suspect the reason is because of the emphasis between mutable and immutable objects within the language. Now, I know that x++ can easily be interpreted as x = x + 1, but it LOOKS like you’re incrementing in-place an object which could be immutable.

Just my guess/feeling/hunch.


回答 9

首先,Python仅受C间接影响。它在很大程度上受到影响ABC,这显然不具备这些运营商,所以它不应该有任何巨大的惊喜不会找到它们在Python两种。

其次,正如其他人所说,递增和递减由支持+=-=了。

第三,对++and --运算符集的完全支持通常包括同时支持它们的前缀和后缀版本。在C和C ++中,这可能导致各种“可爱”的构造(在我看来)与Python所包含的简单性和直截了当的精神背道而驰。

例如,尽管C语句while(*t++ = *s++);对于有经验的程序员而言似乎简单而优雅,但对于学习它的人来说,却绝非简单。混合使用前缀和后缀增量和减量,甚至许多专业人士也必须停下来思考一下。

First, Python is only indirectly influenced by C; it is heavily influenced by ABC, which apparently does not have these operators, so it should not be any great surprise not to find them in Python either.

Secondly, as others have said, increment and decrement are supported by += and -= already.

Third, full support for a ++ and -- operator set usually includes supporting both the prefix and postfix versions of them. In C and C++, this can lead to all kinds of “lovely” constructs that seem (to me) to be against the spirit of simplicity and straight-forwardness that Python embraces.

For example, while the C statement while(*t++ = *s++); may seem simple and elegant to an experienced programmer, to someone learning it, it is anything but simple. Throw in a mixture of prefix and postfix increments and decrements, and even many pros will have to stop and think a bit.


回答 10

我认为这源于Python的信条,即“明确胜于隐含”。

I believe it stems from the Python creed that “explicit is better than implicit”.


回答 11

这可能是因为@GlennMaynard正在将问题与其他语言进行比较,但是在Python中,您是以python方式进行操作的。这不是一个“为什么”的问题。在那里,您可以使用达到相同的效果x+=。在《 Python的禅宗》中,给出了:“只有一种解决问题的方法。” 多种选择在艺术上(表达自由)很棒,但在工程上却很​​糟糕。

This may be because @GlennMaynard is looking at the matter in comparison with other languages, but in Python, you do things the Python way. It’s not a ‘why’ question. It’s there, and you can achieve the same effect with x += 1. The Zen of Python says: “There should be one-- and preferably only one --obvious way to do it.” Multiple choices are great in art (freedom of expression) but lousy in engineering.


回答 12

++运算符的类别是具有副作用的表达式。这是Python中通常找不到的东西。

出于同样的原因,赋值不是Python中的表达式,因此防止了通用 if (a = f(...)) { /* using a here */ }用法。

最后,我怀疑操作符与Python的参考语义不是很一致。请记住,Python没有具有C / C ++已知语义的变量(或指针)。

The ++ class of operators are expressions with side effects. This is something generally not found in Python.

For the same reason an assignment is not an expression in Python, thus preventing the common if (a = f(...)) { /* using a here */ } idiom.

Lastly, I suspect that these operators are not very consistent with Python’s reference semantics. Remember, Python does not have variables (or pointers) with the semantics known from C/C++.


回答 13

也许更好的问题是问为什么这些运算符存在于C中。K&R调用增量和减量运算符为“异常”(第2.8页第2.8节)。导言称它们“更简洁,通常更高效”。我怀疑这些操作总是在指针操作中出现的事实也影响了它们的引入。在Python中,可能已经决定尝试优化增量没有任何意义(事实上,我只是在C中进行了测试,而且似乎gcc生成的程序集在两种情况下都使用addl而不是incl),并且没有指针算法;因此,它本来只是另一种实现方式,而我们知道Python不愿这样做。

Maybe a better question would be to ask why these operators exist in C. K&R calls the increment and decrement operators ‘unusual’ (Section 2.8, page 46). The Introduction calls them ‘more concise and often more efficient’. I suspect that the fact that these operations always come up in pointer manipulation also played a part in their introduction. In Python, it was probably decided that it made no sense to try to optimise increments (in fact I just did a test in C, and it seems that the gcc-generated assembly uses addl instead of incl in both cases), and there is no pointer arithmetic; so it would have been just One More Way to Do It, and we know Python loathes that.


回答 14

据我了解,所以您不会认为内存中的值已更改。在c中,当执行x ++时,内存中x的值会更改。但是在python中,所有数字都是不可变的,因此x指向的地址仍然具有x而不是x + 1。当您编写x ++时,您可能会认为x发生了改变,实际上是x引用更改为x + 1存储在内存中的位置,或者如果doe不存在,则重新创建该位置。

As I understand it, it is so that you won’t think the value in memory is changed. In C, when you do x++, the value of x in memory changes. But in Python all numbers are immutable, so the object that x referred to still holds x, not x+1. When you write x++ you would think that x itself changes; what really happens is that the name x is rebound to an object holding x+1, which is created if it does not already exist.


回答 15

要在该页面上完成已经很好的答案:

假设我们决定这样做,在前缀(++i)处打乱一元+和-运算符。

今天,以++--做任何前缀都没有,因为它使一元加号运算符两次(不执行任何操作)或一元减号两次(两次:取消自身)成为可能

>>> i=12
>>> ++i
12
>>> --i
12

这样可能会破坏这种逻辑。

To add to the already good answers on this page:

Suppose we decided to add this: a prefix form (++i) would break the unary + and - operators.

Today, prefixing with ++ or -- does nothing, because it applies the unary plus operator twice (a no-op) or the unary minus operator twice (which cancels itself out).

>>> i=12
>>> ++i
12
>>> --i
12

So that would potentially break that logic.
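
To see how the parser reads ++i today, here is a small sketch using the standard ast module. The variable names are only for illustration; the point is that the expression is two nested unary plus operations, not an increment.

```python
import ast

# Parse "++i" as an expression and inspect the resulting syntax tree.
outer = ast.parse("++i", mode="eval").body

# The outer node is a unary plus whose operand is another unary plus.
is_double_unary_plus = (
    isinstance(outer, ast.UnaryOp)
    and isinstance(outer.op, ast.UAdd)
    and isinstance(outer.operand, ast.UnaryOp)
    and isinstance(outer.operand.op, ast.UAdd)
)

i = 12
same_after_plus = (++i == 12)   # +(+i) leaves the value unchanged
same_after_minus = (--i == 12)  # -(-i) cancels itself out
```

In both cases i itself is never rebound, which is why existing code relying on this would change meaning if ++ became an operator.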


回答 16

其他答案描述了为什么迭代器不需要它,但是有时在分配以增加内联变量时它很有用,您可以使用元组和多重分配来达到相同的效果:

b = ++a 变成:

a,b = (a+1,)*2

b = a++成为:

a,b = a+1, a

Python的3.8引入了分配:=操作,使我们能够实现foo(++a)

foo(a:=a+1)

foo(a++) 虽然仍然难以捉摸。

Other answers have described why it’s not needed for iterators, but sometimes it is useful to increment a variable in-line while assigning. You can achieve the same effect using tuples and multiple assignment:

b = ++a becomes:

a,b = (a+1,)*2

and b = a++ becomes:

a,b = a+1, a

Python 3.8 introduces the assignment expression operator :=, allowing us to achieve foo(++a) with

foo(a:=a+1)

foo(a++) is still elusive though.
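
For plain numbers, one possible workaround for the post-increment case is to increment with := and then subtract one from the value being passed. This is only a sketch (Python 3.8+); foo here is a hypothetical function that returns its argument, and the trick does not generalise to objects without subtraction.

```python
def foo(value):
    """Hypothetical consumer that simply returns what it was given."""
    return value


a = 5
seen_pre = foo(a := a + 1)         # like foo(++a): a is incremented, foo sees the new value
seen_post = foo((a := a + 1) - 1)  # like foo(a++): a is incremented, foo sees the old value
```

It works, but it is arguably less readable than a separate a += 1 statement.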


回答 17

我认为这涉及对象的可变性和不可变性的概念。2,3,4,5在python中是不可变的。请参考下图。2具有固定的ID,直到此python进程为止。

常量和变量的ID

x ++本质上意味着像C的就地增量。在C中,x ++执行就地增量。因此,x = 3,并且x ++会将内存中的3增加到4,这与python中内存中仍然存在3的情况不同。

因此,在python中,您无需在内存中重新创建值。这可能会导致性能优化。

这是基于预感的答案。

I think this relates to the concepts of mutability and immutability of objects. 2, 3, 4, 5 are immutable in Python. Refer to the image below. 2 has a fixed id for the lifetime of this Python process.

ID of constants and variables

x++ would essentially mean an in-place increment like C. In C, x++ performs in-place increments. So, x=3, and x++ would increment 3 in the memory to 4, unlike python where 3 would still exist in memory.

Thus in python, you don’t need to recreate a value in memory. This may lead to performance optimizations.

This is a hunch based answer.
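
A minimal sketch of the rebinding behaviour described above: after x += 1 the name x refers to a different int object, while the original object is untouched. The name old is just an extra reference kept for the comparison.

```python
x = 1000
old = x   # keep a second reference to the original int object
x += 1    # rebinds x to a different int object; the original is not modified

rebinding_happened = x is not old
```

So old still holds the original value and only the name x has moved.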


回答 18

我知道这是一个旧线程,但是没有涵盖++ i的最常见用例,即在没有提供索引的情况下手动索引集。这就是为什么python提供enumerate()的原因

示例:在任何给定的语言中,当您使用诸如foreach之类的结构来遍历一个集合时,出于示例的考虑,我们甚至会说它是无序的集合,并且您需要一个唯一的索引来区分所有内容,例如

i = 0
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for key, val in stuff.items() :
  uniquestuff[key] = '{0}{1}'.format(val, i)
  i += 1

在这种情况下,python提供了一个枚举方法,例如

for i, (key, val) in enumerate(stuff.items()) :

I know this is an old thread, but the most common use case for ++i is not covered, that being manually indexing sets when there are no provided indices. This situation is why python provides enumerate()

Example : In any given language, when you use a construct like foreach to iterate over a set – for the sake of the example we’ll even say it’s an unordered set and you need a unique index for everything to tell them apart, say

i = 0
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for key, val in stuff.items() :
  uniquestuff[key] = '{0}{1}'.format(val, i)
  i += 1

In cases like this, python provides an enumerate method, e.g.

for i, (key, val) in enumerate(stuff.items()) :
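
Filled in with the same body as the manual version above, the loop might look like this (a sketch reusing the names from the example):

```python
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}

# enumerate() supplies the running index, so no manual counter is needed.
for i, (key, val) in enumerate(stuff.items()):
    uniquestuff[key] = '{0}{1}'.format(val, i)
```

On Python 3.7+ dicts keep insertion order, so the indices follow the order in which the keys were written.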

Configure Flask dev server to be visible across the network

Question: Configure Flask dev server to be visible across the network

我不确定这是否是Flask专用的,但是当我在开发模式(http://localhost:5000)下运行应用程序时,无法从网络上的其他计算机(使用http://[dev-host-ip]:5000)访问它。例如,在开发模式下使用Rails时,它可以正常工作。我找不到有关Flask开发服务器配置的任何文档。任何想法应该配置为启用此功能吗?

I’m not sure if this is Flask specific, but when I run an app in dev mode (http://localhost:5000), I cannot access it from other machines on the network (with http://[dev-host-ip]:5000). With Rails in dev mode, for example, it works fine. I couldn’t find any docs regarding the Flask dev server configuration. Any idea what should be configured to enable this?


回答 0

尽管这是可行的,但您不应在生产中使用Flask dev服务器。Flask开发服务器的设计并非特别安全,稳定或高效。有关正确的解决方案,请参阅有关部署的文档。


将参数添加到中app.run()。默认情况下,它在本地主机上运行,​​将其更改app.run(host= '0.0.0.0')为在您的计算机IP地址上运行。

快速入门页上的“外部可见的服务器”下的Flask网站上记录

外部可见服务器

如果运行服务器,您会注意到该服务器仅可用于您自己的计算机,而不能用于网络中的任何其他服务器。这是默认设置,因为在调试模式下,应用程序的用户可以在计算机上执行任意Python代码。如果禁用了调试或信任网络上的用户,则可以使服务器公开可用。

只需将run()方法的调用更改为如下所示:

app.run(host='0.0.0.0')

这告诉您的操作系统侦听公共IP。

While this is possible, you should not use the Flask dev server in production. The Flask dev server is not designed to be particularly secure, stable, or efficient. See the docs on deploying for correct solutions.


Add a parameter to your app.run(). By default it runs on localhost; change it to app.run(host='0.0.0.0') to listen on your machine’s IP address.

Documented on the Flask site under “Externally Visible Server” on the Quickstart page:

Externally Visible Server

If you run the server you will notice that the server is only available from your own computer, not from any other in the network. This is the default because in debugging mode a user of the application can execute arbitrary Python code on your computer. If you have debug disabled or trust the users on your network, you can make the server publicly available.

Just change the call of the run() method to look like this:

app.run(host='0.0.0.0')

This tells your operating system to listen on a public IP.
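
To illustrate what the host value means independently of Flask, here is a rough sketch using only the standard socket module. The helper bound_address is a made-up name; it binds a throwaway TCP socket and reports the address the OS actually bound.

```python
import socket


def bound_address(host):
    """Bind a TCP socket to host on an OS-chosen free port and return the bound IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[0]


loopback_only = bound_address("127.0.0.1")  # reachable only from this machine
all_interfaces = bound_address("0.0.0.0")   # reachable through any local IPv4 address
```

A server bound to 127.0.0.1 never sees traffic from other machines, which matches the default behaviour described in the question.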


回答 1

如果使用flask可执行文件启动服务器,则可以使用flask run --host=0.0.0.0更改默认值,从127.0.0.1并将其打开到非本地连接。其他答案描述的config和app.run方法可能是更好的做法,但这也很方便。

外部可见服务器如果运行服务器,您将注意到只能从您自己的计算机访问该服务器,而不能从网络中的任何其他服务器访问该服务器。这是默认设置,因为在调试模式下,应用程序的用户可以在计算机上执行任意Python代码。

如果禁用了调试器或信任网络上的用户,则只需在命令行中添加–host = 0.0.0.0,即可使服务器公开可用:

flask run –host = 0.0.0.0这告诉您的操作系统侦听所有公用IP。

参考:http//flask.pocoo.org/docs/0.11/quickstart/

If you use the flask executable to start your server, you can use flask run --host=0.0.0.0 to change the default from 127.0.0.1 and open it up to non local connections. The config and app.run methods that the other answers describe are probably better practice but this can be handy as well.

Externally Visible Server

If you run the server you will notice that the server is only accessible from your own computer, not from any other in the network. This is the default because in debugging mode a user of the application can execute arbitrary Python code on your computer.

If you have the debugger disabled or trust the users on your network, you can make the server publicly available simply by adding --host=0.0.0.0 to the command line:

flask run --host=0.0.0.0

This tells your operating system to listen on all public IPs.

Reference: http://flask.pocoo.org/docs/0.11/quickstart/


回答 2

如果0.0.0.0方法不起作用,请尝试此操作

无聊的东西

我亲自进行了很多努力,以使我的应用可以通过本地服务器访问其他设备(笔记本电脑和手机)。我尝试了0.0.0.0方法,但是没有运气。然后,我尝试更改端口,但没有成功。因此,在尝试了一堆不同的组合之后,我找到了这个组合,它解决了我在本地服务器上部署应用程序的问题。

脚步

  1. 获取计算机的本地IPv4地址。这可以通过ipconfig在Windows以及ifconfiglinux和Mac上键入来完成。

IPv4(Windows)

请注意:以上步骤将在提供该应用程序的计算机上执行,而不是在您正在访问该应用程序的计算机上执行。另请注意,如果断开连接并重新连接到网络,IPv4地址可能会更改。

  1. 现在,只需使用获取的IPv4地址运行flask应用程序即可。

    flask run -h 192.168.X.X

    例如,就我而言(参见图片),我将其运行为:

    flask run -h 192.168.1.100

运行烧瓶应用程序

在我的移动设备上

我手机的屏幕截图

可选的东西

如果您正在Windows上执行此过程,并使用Power Shell作为CLI,但仍然无法访问该网站,请在运行该应用程序的Shell中尝试使用CTRL + C命令。Power Shell有时会冻结,因此需要一点点恢复。这样做甚至可能终止服务器,但有时可以解决问题。

而已。如果您觉得有帮助,请竖起大拇指。😉

一些其他可选的东西

我创建了一个简短的Powershell脚本,可以在需要时为您提供IP地址:

$env:getIp = ipconfig
if ($env:getIp -match '(IPv4[\sa-zA-Z.]+:\s[0-9.]+)') {
    if ($matches[1] -match '([^a-z\s][\d]+[.\d]+)'){
        $ipv4 = $matches[1]
    }
}
echo $ipv4

将其保存到扩展名为.ps1的文件中(对于PowerShell),然后在启动您的应用程序之前对其运行。您可以将其保存在项目文件夹中,并以以下方式运行:

.\getIP.ps1; flask run -h $ipv4

注意:我将上面的shell代码保存在getIP.ps1中。

酷👌

Try this if the 0.0.0.0 method doesn’t work

Boring Stuff

I personally battled a lot to make my app accessible to other devices (laptops and mobile phones) through a local server. I tried the 0.0.0.0 method, but had no luck. Then I tried changing the port, but that didn’t work either. So, after trying a bunch of different combinations, I arrived at this one, and it solved my problem of serving my app on a local server.

Steps

  1. Get the local IPv4 address of your computer. This can be done by typing ipconfig on Windows and ifconfig on linux and Mac.

IPv4 (Windows)

Please note: The above step is to be performed on the machine you are serving the app on, not on the machine from which you are accessing it. Also note that the IPv4 address might change if you disconnect and reconnect to the network.

  2. Now, simply run the flask app with the acquired IPv4 address.

    flask run -h 192.168.X.X

    E.g. In my case (see the image), I ran it as:

    flask run -h 192.168.1.100

running the flask app

On my mobile device

screenshot from my mobile phone

Optional Stuff

If you are performing this procedure on Windows, using PowerShell as the CLI, and you still aren’t able to access the website, try pressing CTRL + C in the shell that’s running the app. PowerShell sometimes freezes up and needs a nudge to revive. Doing this might even terminate the server, but it sometimes does the trick.

That’s it. Give a thumbs up if you found this helpful.😉

Some more optional stuff

I have created a short Powershell script that will get you your IP address whenever you need one:

$env:getIp = ipconfig
if ($env:getIp -match '(IPv4[\sa-zA-Z.]+:\s[0-9.]+)') {
    if ($matches[1] -match '([^a-z\s][\d]+[.\d]+)'){
        $ipv4 = $matches[1]
    }
}
echo $ipv4

Save it to a file with a .ps1 extension (for PowerShell), and run it before starting your app. You can save it in your project folder and run it as:

.\getIP.ps1; flask run -h $ipv4

Note: I saved the above shell code in getIP.ps1.

Cool.👌


回答 3

如果您的cool应用程序的配置是从外部文件加载的,如以下示例所示,请不要忘记使用HOST =“ 0.0.0.0”更新相应的配置文件

cool.app.run(
    host=cool.app.config.get("HOST", "localhost"),
    port=cool.app.config.get("PORT", 9000)
)            

If your cool app has its configuration loaded from an external file, like in the following example, then don’t forget to update the corresponding config file with HOST="0.0.0.0"

cool.app.run(
    host=cool.app.config.get("HOST", "localhost"),
    port=cool.app.config.get("PORT", 9000)
)            

回答 4

在您的项目中添加以下几行

if __name__ == '__main__':
    app.debug = True
    app.run(host = '0.0.0.0',port=5005)

Add below lines to your project

if __name__ == '__main__':
    app.debug = True
    app.run(host = '0.0.0.0',port=5005)

回答 5

检查服务器上是否打开了特定端口以服务于客户端?

在Ubuntu或Linux发行版中

sudo ufw enable
sudo ufw allow 5000/tcp  # allow the server to handle requests on port 5000

配置应用程序以处理远程请求

app.run(host='0.0.0.0' , port=5000)


python3 app.py & #run application in background

Check whether the port is open on the server so that it can serve clients.

On Ubuntu or another Linux distro that uses ufw:

sudo ufw enable
sudo ufw allow 5000/tcp  # allow the server to handle requests on port 5000

Configure the application to handle remote requests

app.run(host='0.0.0.0' , port=5000)


python3 app.py & #run application in background

回答 6

转到CMD(命令提示符)上的项目路径,然后执行以下命令:

set FLASK_APP=ABC.py

set FLASK_ENV=development

flask run -h [yourIP] -p 8080

您将在CMD上获得以下o / p:-

  • 正在投放Flask应用“ expirement.py”(延迟加载)

现在,您可以使用http:// [yourIP]:8080 / url 在另一台计算机上访问flask应用程序

Go to your project path in CMD (Command Prompt) and execute the following commands:

set FLASK_APP=ABC.py

set FLASK_ENV=development

flask run -h [yourIP] -p 8080

You will get the following output on CMD:

  • Serving Flask app “expirement.py” (lazy loading)
    • Environment: development
    • Debug mode: on
    • Restarting with stat
    • Debugger is active!
    • Debugger PIN: 199-519-700
    • Running on http://[yourIP]:8080/ (Press CTRL+C to quit)

Now you can access your flask app on another machine using http://[yourIP]:8080/ url


回答 7

如果您在访问使用PyCharm部署的Flask服务器时遇到问题,请考虑以下因素:

PyCharm不会直接运行您的主.py文件,因此if __name__ == '__main__':不会执行其中的任何代码,并且任何更改(例如app.run(host='0.0.0.0', port=5000))都不会生效。

相反,您应该使用“运行配置”配置Flask服务器,尤其是将其放置--host 0.0.0.0 --port 5000在“ 其他选项”字段中。

运行Flask服务器PyCharm的配置

有关在PyCharm中配置Flask服务器的更多信息

If you’re having troubles accessing your Flask server, deployed using PyCharm, take the following into account:

PyCharm doesn’t run your main .py file directly, so any code in if __name__ == '__main__': won’t be executed, and any changes (like app.run(host='0.0.0.0', port=5000)) won’t take effect.

Instead, you should configure the Flask server using Run Configurations, in particular, placing --host 0.0.0.0 --port 5000 into Additional options field.

Run configurations of Flask server in PyCharm

More about configuring Flask server in PyCharm


回答 8

我遇到了同样的问题,我使用PyCharm作为编辑器,并且在创建项目时,PyCharm创建了Flask Server。我所做的就是通过以下方式用Python创建服务器;

配置Python服务器PyCharm 基本上我所做的是创建一个新服务器,但如果不是python,则使用flask

希望对您有帮助

I had the same problem, I use PyCharm as an editor and when I created the project, PyCharm created a Flask Server. What I did was create a server with Python in the following way;

Config Python Server PyCharm

Basically, what I did was create a new server configuration, but of type Python rather than Flask.

I hope it helps you


回答 9

这个答案不仅与flask有关,而且应适用于所有无法连接来自另一个主机的服务

  1. 用于netstat -ano | grep <port>查看地址是0.0.0.0还是::。如果是127.0.0.1,则仅适用于本地请求。
  2. 使用tcpdump查看是否有任何数据包丢失。如果显示明显的不平衡,请通过iptables检查路由规则。

今天,我像往常一样运行烧瓶应用程序,但是我发现它无法从其他服务器连接。然后运行netstat -ano | grep <port>,本地地址为::or 0.0.0.0(我都尝试过,并且我知道127.0.0.1仅允许来自本地主机的连接)。然后我用了telnet host port,结果就像connect to ...。这很奇怪。然后我想我最好跟查一下tcpdump -i any port <port> -w w.pcap。我注意到这一切都是这样的:

tcpdump结果显示它只有来自远程主机的SYN数据包

然后通过检查iptables --listOUTPUT部分,我可以看到一些规则:

iptables列表结果

这些规则禁止握手时输出tcp重要数据包。通过删除它们,问题消失了。

This answer is not specific to Flask; it should apply to any “cannot connect to a service from another host” issue.

  1. Use netstat -ano | grep <port> to see whether the address is 0.0.0.0 or ::. If it is 127.0.0.1, the service only accepts local requests.
  2. Use tcpdump to see whether any packets are missing. If it shows an obvious imbalance, check the rules with iptables.

Today I ran my Flask app as usual, but noticed that other servers could not connect to it. I ran netstat -ano | grep <port>, and the local address was :: or 0.0.0.0 (I tried both, and I know 127.0.0.1 only allows connections from the local host). Then I used telnet host port, and the result was like connect to .... This was very odd. So I decided to check it with tcpdump -i any port <port> -w w.pcap, and noticed it all looked like this:

tcpdump result showing only SYN packets from the remote host

Then, by checking the OUTPUT section of iptables --list, I could see several rules:

iptables list result

These rules blocked outgoing TCP packets that are vital to the handshake. Deleting them made the problem go away.
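
As a complement to netstat and telnet, a quick reachability check can also be scripted. This is only a sketch with a made-up helper name, can_connect, using the standard socket module; run it from the client machine against the server’s address and port.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A False result does not say where the connection was blocked, so tools like tcpdump are still needed to locate the cause.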


回答 10

对我来说,我遵循上面的答案并对其进行了一些修改:

  1. 只需在命令提示符下使用ipconfig来获取您的ipv4地址
  2. 转到存在烧瓶代码的文件
  3. 在主函数中编写app.run(host =’您的ipv4地址’)

例如:

在此处输入图片说明

I followed the answer above and modified it a bit:

  1. Get your IPv4 address by running ipconfig in a command prompt
  2. Open the file that contains your Flask code
  3. In the main block, write app.run(host='your ipv4 address')



回答 11

转到项目路径set FLASK_APP = ABC.py SET FLASK_ENV = development

flask run -h [yourIP] -p 8080,您将在CMD上遵循o / p:-*正在服务的Flask应用程序“ expirement.py”(延迟加载)*环境:开发*调试模式:on *以stat重新启动*调试器处于活动状态!*调试器PIN:199-519-700 *在http:// [yourIP]:8080 /上运行(按CTRL + C退出)

Go to your project path and run:

set FLASK_APP=ABC.py
set FLASK_ENV=development
flask run -h [yourIP] -p 8080

You will get the following output on CMD:

 * Serving Flask app "expirement.py" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 199-519-700
 * Running on http://[yourIP]:8080/ (Press CTRL+C to quit)


回答 12

您还可以通过环境变量设置主机(将其暴露在面向IP地址的网络上)和端口。

$ export FLASK_APP=app.py
$ export FLASK_ENV=development
$ export FLASK_RUN_PORT=8000
$ export FLASK_RUN_HOST=0.0.0.0

$ flask run
 * Serving Flask app "app.py" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Running on https://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 329-665-000

请参见如何获取所有可用的命令选项来设置环境变量?

You can also set the host (to expose it on a network facing IP address) and port via environment variables.

$ export FLASK_APP=app.py
$ export FLASK_ENV=development
$ export FLASK_RUN_PORT=8000
$ export FLASK_RUN_HOST=0.0.0.0

$ flask run
 * Serving Flask app "app.py" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Running on https://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 329-665-000

See How to get all available Command Options to set environment variables?


回答 13

.flaskenv在项目根目录中创建文件。

该文件中的参数通常为:

FLASK_APP=app.py
FLASK_ENV=development
FLASK_RUN_HOST=[dev-host-ip]
FLASK_RUN_PORT=5000

如果您有虚拟环境,请激活它并执行pip install python-dotenv

该软件包将使用该.flaskenv文件,并且其中的声明将在终端会话之间自动导入。

那你可以做 flask run

Create file .flaskenv in the project root directory.

The parameters in this file are typically:

FLASK_APP=app.py
FLASK_ENV=development
FLASK_RUN_HOST=[dev-host-ip]
FLASK_RUN_PORT=5000

If you have a virtual environment, activate it and do a pip install python-dotenv .

This package is going to use the .flaskenv file, and declarations inside it will be automatically imported across terminal sessions.

Then you can do flask run


How can I pass a list as a command-line argument with argparse?

Question: How can I pass a list as a command-line argument with argparse?

我正在尝试将列表作为参数传递给命令行程序。是否有将argparse列表作为选项传递的选项?

parser.add_argument('-l', '--list',
                      type=list, action='store',
                      dest='list',
                      help='<Required> Set flag',
                      required=True)

脚本如下所示

python test.py -l "265340 268738 270774 270817"

I am trying to pass a list as an argument to a command line program. Is there an argparse option to pass a list as option?

parser.add_argument('-l', '--list',
                      type=list, action='store',
                      dest='list',
                      help='<Required> Set flag',
                      required=True)

Script is called like below

python test.py -l "265340 268738 270774 270817"

回答 0

TL; DR

使用nargs选项或选项的'append'设置action(取决于您希望用户界面的行为方式)。

纳尔

parser.add_argument('-l','--list', nargs='+', help='<Required> Set flag', required=True)
# Use like:
# python arg.py -l 1234 2345 3456 4567

nargs='+'接受1个或多个参数,nargs='*'接受零个或多个。

附加

parser.add_argument('-l','--list', action='append', help='<Required> Set flag', required=True)
# Use like:
# python arg.py -l 1234 -l 2345 -l 3456 -l 4567

append您提供多个选项来构建列表。

不要使用type=list-可能没有可能要与一起使用的type=list情况argparse。曾经


让我们更详细地了解人们可能尝试执行此操作的一些不同方式以及最终结果。

import argparse

parser = argparse.ArgumentParser()

# By default it will fail with multiple arguments.
parser.add_argument('--default')

# Telling the type to be a list will also fail for multiple arguments,
# but give incorrect results for a single argument.
parser.add_argument('--list-type', type=list)

# This will allow you to provide multiple arguments, but you will get
# a list of lists which is not desired.
parser.add_argument('--list-type-nargs', type=list, nargs='+')

# This is the correct way to handle accepting multiple arguments.
# '+' == 1 or more.
# '*' == 0 or more.
# '?' == 0 or 1.
# An int is an explicit number of arguments to accept.
parser.add_argument('--nargs', nargs='+')

# To make the input integers
parser.add_argument('--nargs-int-type', nargs='+', type=int)

# An alternate way to accept multiple inputs, but you must
# provide the flag once per input. Of course, you can use
# type=int here if you want.
parser.add_argument('--append-action', action='append')

# To show the results of the given option to screen.
for _, value in parser.parse_args()._get_kwargs():
    if value is not None:
        print(value)

这是您可以期望的输出:

$ python arg.py --default 1234 2345 3456 4567
...
arg.py: error: unrecognized arguments: 2345 3456 4567

$ python arg.py --list-type 1234 2345 3456 4567
...
arg.py: error: unrecognized arguments: 2345 3456 4567

$ # Quotes won't help here... 
$ python arg.py --list-type "1234 2345 3456 4567"
['1', '2', '3', '4', ' ', '2', '3', '4', '5', ' ', '3', '4', '5', '6', ' ', '4', '5', '6', '7']

$ python arg.py --list-type-nargs 1234 2345 3456 4567
[['1', '2', '3', '4'], ['2', '3', '4', '5'], ['3', '4', '5', '6'], ['4', '5', '6', '7']]

$ python arg.py --nargs 1234 2345 3456 4567
['1234', '2345', '3456', '4567']

$ python arg.py --nargs-int-type 1234 2345 3456 4567
[1234, 2345, 3456, 4567]

$ # Negative numbers are handled perfectly fine out of the box.
$ python arg.py --nargs-int-type -1234 2345 -3456 4567
[-1234, 2345, -3456, 4567]

$ python arg.py --append-action 1234 --append-action 2345 --append-action 3456 --append-action 4567
['1234', '2345', '3456', '4567']

小贴士

  • 使用nargsaction='append'
    • nargs从用户的角度来看,它可能更直接,但是如果存在位置参数,则可能是不直观的,因为argparse无法分辨什么应该是位置参数以及什么属于nargs;如果您有位置参数,那么action='append'最终可能是一个更好的选择。
    • 如果以上是唯一真正的nargs给予'*''+''?'。如果您提供一个整数(例如4),则将选项与nargs和位置参数混合使用将不会有问题,因为argparse它将确切知道期望该选项有多少个值。
  • 不要在命令行1上使用引号
  • 不要使用type=list,因为它会返回列表列表
    • 发生这种情况的原因是,在后台argparse使用的值type来强制您选择的每个给定给定参数type,而不是所有参数的总和。
    • 您可以使用type=int(或其他任何方式)获取一个整数列表(或其他任何方式)

1:我的意思不是一般。.我的意思不是用引号将列表传递给argparse您。

TL;DR

Use the nargs option or the 'append' setting of the action option (depending on how you want the user interface to behave).

nargs

parser.add_argument('-l','--list', nargs='+', help='<Required> Set flag', required=True)
# Use like:
# python arg.py -l 1234 2345 3456 4567

nargs='+' takes 1 or more arguments, nargs='*' takes zero or more.

append

parser.add_argument('-l','--list', action='append', help='<Required> Set flag', required=True)
# Use like:
# python arg.py -l 1234 -l 2345 -l 3456 -l 4567

With append you provide the option multiple times to build up the list.

Don’t use type=list!!! – There is probably no situation where you would want to use type=list with argparse. Ever.


Let’s take a look in more detail at some of the different ways one might try to do this, and the end result.

import argparse

parser = argparse.ArgumentParser()

# By default it will fail with multiple arguments.
parser.add_argument('--default')

# Telling the type to be a list will also fail for multiple arguments,
# but give incorrect results for a single argument.
parser.add_argument('--list-type', type=list)

# This will allow you to provide multiple arguments, but you will get
# a list of lists which is not desired.
parser.add_argument('--list-type-nargs', type=list, nargs='+')

# This is the correct way to handle accepting multiple arguments.
# '+' == 1 or more.
# '*' == 0 or more.
# '?' == 0 or 1.
# An int is an explicit number of arguments to accept.
parser.add_argument('--nargs', nargs='+')

# To make the input integers
parser.add_argument('--nargs-int-type', nargs='+', type=int)

# An alternate way to accept multiple inputs, but you must
# provide the flag once per input. Of course, you can use
# type=int here if you want.
parser.add_argument('--append-action', action='append')

# To show the results of the given option to screen.
for _, value in parser.parse_args()._get_kwargs():
    if value is not None:
        print(value)

Here is the output you can expect:

$ python arg.py --default 1234 2345 3456 4567
...
arg.py: error: unrecognized arguments: 2345 3456 4567

$ python arg.py --list-type 1234 2345 3456 4567
...
arg.py: error: unrecognized arguments: 2345 3456 4567

$ # Quotes won't help here... 
$ python arg.py --list-type "1234 2345 3456 4567"
['1', '2', '3', '4', ' ', '2', '3', '4', '5', ' ', '3', '4', '5', '6', ' ', '4', '5', '6', '7']

$ python arg.py --list-type-nargs 1234 2345 3456 4567
[['1', '2', '3', '4'], ['2', '3', '4', '5'], ['3', '4', '5', '6'], ['4', '5', '6', '7']]

$ python arg.py --nargs 1234 2345 3456 4567
['1234', '2345', '3456', '4567']

$ python arg.py --nargs-int-type 1234 2345 3456 4567
[1234, 2345, 3456, 4567]

$ # Negative numbers are handled perfectly fine out of the box.
$ python arg.py --nargs-int-type -1234 2345 -3456 4567
[-1234, 2345, -3456, 4567]

$ python arg.py --append-action 1234 --append-action 2345 --append-action 3456 --append-action 4567
['1234', '2345', '3456', '4567']

Takeaways:

  • Use nargs or action='append'
    • nargs can be more straightforward from a user perspective, but it can be unintuitive if there are positional arguments because argparse can’t tell what should be a positional argument and what belongs to the nargs; if you have positional arguments then action='append' may end up being a better choice.
    • The above is only true if nargs is given '*', '+', or '?'. If you provide an integer number (such as 4) then there will be no problem mixing options with nargs and positional arguments because argparse will know exactly how many values to expect for the option.
  • Don’t use quotes on the command line [1]
  • Don’t use type=list, as it will return a list of lists
    • This happens because under the hood argparse uses the value of type to coerce each individual argument to your chosen type, not the aggregate of all arguments.
    • You can use type=int (or whatever) to get a list of ints (or whatever)

1: I don’t mean in general.. I mean using quotes to pass a list to argparse is not what you want.
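
The point about nargs='+' and positional arguments can be sketched as follows. The option --nums and the positional name are hypothetical; the second call fails because the greedy option consumes every remaining value, leaving nothing for the positional.

```python
import argparse

parser = argparse.ArgumentParser(prog='arg.py')
parser.add_argument('--nums', nargs='+')
parser.add_argument('name')  # hypothetical positional argument

# Positional first: unambiguous.
ok = parser.parse_args(['bob', '--nums', '1', '2'])

# Option first: --nums swallows 'bob' too, so the required positional is missing
# and argparse exits with a usage error.
try:
    parser.parse_args(['--nums', '1', '2', 'bob'])
    greedy_failed = False
except SystemExit:
    greedy_failed = True
```

With action='append' (or a fixed integer nargs) the parser knows exactly where the option’s values end, so the ordering problem goes away.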


回答 1

我更喜欢传递一个定界字符串,稍后在脚本中对其进行解析。原因是:该列表可以是任何类型intstr,有时nargs如果有多个可选参数和位置参数,有时会遇到问题。

parser = ArgumentParser()
parser.add_argument('-l', '--list', help='delimited list input', type=str)
args = parser.parse_args()
my_list = [int(item) for item in args.list.split(',')]

然后,

python test.py -l "265340,268738,270774,270817" [other arguments]

要么,

python test.py -l 265340,268738,270774,270817 [other arguments]

会很好的工作。分隔符也可以是空格,尽管会像问题中的示例一样在参数值周围加引号。

I prefer passing a delimited string which I parse later in the script. The reasons are: the list can hold any type (int or str), and with nargs I sometimes run into problems when there are multiple optional arguments and positional arguments.

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('-l', '--list', help='delimited list input', type=str)
args = parser.parse_args()
# Split the comma-delimited string and convert each item to int.
my_list = [int(item) for item in args.list.split(',')]

Then,

python test.py -l "265340,268738,270774,270817" [other arguments]

or,

python test.py -l 265340,268738,270774,270817 [other arguments]

will work fine. The delimiter can be a space, too, though that would require quotes around the argument value, as in the example in the question.
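
A variation on the same idea, as a sketch: since type accepts any callable that takes a string, the splitting can be done inside argparse itself. The helper name int_list is made up for this example.

```python
import argparse


def int_list(text):
    """Convert a comma-separated string such as '1,2,3' into [1, 2, 3]."""
    return [int(item) for item in text.split(',')]


parser = argparse.ArgumentParser()
parser.add_argument('-l', '--list', type=int_list,
                    help='comma-delimited list of ints')

# Passing the argument list explicitly keeps the example self-contained.
args = parser.parse_args(['-l', '265340,268738,270774,270817'])
```

This keeps the conversion next to the argument definition, and a non-numeric item is reported by argparse as an invalid value instead of surfacing later in the script.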


回答 2

除之外nargschoices如果您事先知道列表,则可能要使用:

>>> parser = argparse.ArgumentParser(prog='game.py')
>>> parser.add_argument('move', choices=['rock', 'paper', 'scissors'])
>>> parser.parse_args(['rock'])
Namespace(move='rock')
>>> parser.parse_args(['fire'])
usage: game.py [-h] {rock,paper,scissors}
game.py: error: argument move: invalid choice: 'fire' (choose from 'rock',
'paper', 'scissors')

Additionally to nargs, you might want to use choices if you know the list in advance:

>>> parser = argparse.ArgumentParser(prog='game.py')
>>> parser.add_argument('move', choices=['rock', 'paper', 'scissors'])
>>> parser.parse_args(['rock'])
Namespace(move='rock')
>>> parser.parse_args(['fire'])
usage: game.py [-h] {rock,paper,scissors}
game.py: error: argument move: invalid choice: 'fire' (choose from 'rock',
'paper', 'scissors')

回答 3

在argparse的add_argument方法中使用nargs参数

我使用nargs =’ ‘作为add_argument参数。如果我没有传递任何明确的参数,我专门在选项中使用nargs =’ ‘来选择默认值

包括一个代码片段作为示例:

示例:temp_args1.py

请注意:以下示例代码是用python3编写的。通过更改打印语句的格式,可以在python2中运行

#!/usr/local/bin/python3.6

from argparse import ArgumentParser

description = 'testing for passing multiple arguments and to get list of args'
parser = ArgumentParser(description=description)
parser.add_argument('-i', '--item', action='store', dest='alist',
                    type=str, nargs='*', default=['item1', 'item2', 'item3'],
                    help="Examples: -i item1 item2, -i item3")
opts = parser.parse_args()

print("List of items: {}".format(opts.alist))

注意:我正在收集存储在列表中的多个字符串参数-opts.alist如果要获取整数列表,请将parser.add_argument上的type参数更改为int

执行结果:

python3.6 temp_agrs1.py -i item5 item6 item7
List of items: ['item5', 'item6', 'item7']

python3.6 temp_agrs1.py -i item10
List of items: ['item10']

python3.6 temp_agrs1.py
List of items: ['item1', 'item2', 'item3']

Using nargs parameter in argparse’s add_argument method

I use nargs='*' as an add_argument parameter. I specifically chose nargs='*' so that the option falls back to its defaults when I don’t pass any explicit arguments.

Including a code snippet as example:

Example: temp_args1.py

Please Note: The below sample code is written in python3. By changing the print statement format, can run in python2

#!/usr/local/bin/python3.6

from argparse import ArgumentParser

description = 'testing for passing multiple arguments and to get list of args'
parser = ArgumentParser(description=description)
parser.add_argument('-i', '--item', action='store', dest='alist',
                    type=str, nargs='*', default=['item1', 'item2', 'item3'],
                    help="Examples: -i item1 item2, -i item3")
opts = parser.parse_args()

print("List of items: {}".format(opts.alist))

Note: I am collecting multiple string arguments that get stored in the list opts.alist. If you want a list of integers, change the type parameter on parser.add_argument to int.

Execution Result:

python3.6 temp_args1.py -i item5 item6 item7
List of items: ['item5', 'item6', 'item7']

python3.6 temp_args1.py -i item10
List of items: ['item10']

python3.6 temp_args1.py
List of items: ['item1', 'item2', 'item3']
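
One case the runs above do not show is passing -i with no values at all. In this sketch (same option definition, trimmed down), omitting the option yields the default, while a bare -i yields an empty list:

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('-i', '--item', dest='alist', nargs='*',
                    default=['item1', 'item2', 'item3'])

omitted = parser.parse_args([]).alist                        # option not given at all
bare = parser.parse_args(['-i']).alist                       # option given with zero values
given = parser.parse_args(['-i', 'item5', 'item6']).alist    # option given with values
```

If an empty list should be rejected, nargs='+' is probably the better fit.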

回答 4

如果打算使单个开关具有多个参数,请使用nargs='+'。如果您的示例“ -l”实际上是整数:

import argparse

a = argparse.ArgumentParser()
a.add_argument(
    '-l', '--list',  # either of these switches
    nargs='+',       # one or more parameters to this switch
    type=int,        # /parameters/ are ints
    dest='list',     # store in 'list'.
    default=[],      # since we're not specifying required.
)

print(a.parse_args("-l 123 234 345 456".split(' ')))
print(a.parse_args("-l 123 -l=234 -l345 --list 456".split(' ')))

产生

Namespace(list=[123, 234, 345, 456])
Namespace(list=[456])  # Attention!

如果您多次指定相同的参数,则默认操作('store')将替换现有数据。

替代方法是使用append操作:

a = argparse.ArgumentParser()
a.add_argument(
    '-l', '--list',  # either of these switches
    type=int,        # /parameters/ are ints
    dest='list',     # store in 'list'.
    default=[],      # since we're not specifying required.
    action='append', # add to the list instead of replacing it
)

print(a.parse_args("-l 123 -l=234 -l345 --list 456".split(' ')))

哪个产生

Namespace(list=[123, 234, 345, 456])

或者,您可以编写一个自定义处理程序/操作来解析逗号分隔的值,以便您可以

-l 123,234,345 -l 456

If you intend to make a single switch take multiple parameters, use nargs='+'. If your example '-l' actually takes integers:

import argparse

a = argparse.ArgumentParser()
a.add_argument(
    '-l', '--list',  # either of these switches
    nargs='+',       # one or more parameters to this switch
    type=int,        # the parameters are ints
    dest='list',     # store in 'list'
    default=[],      # since we're not specifying required
)

print(a.parse_args("-l 123 234 345 456".split(' ')))
print(a.parse_args("-l 123 -l=234 -l345 --list 456".split(' ')))

Produces

Namespace(list=[123, 234, 345, 456])
Namespace(list=[456])  # Attention!

If you specify the same argument multiple times, the default action ('store') replaces the existing data.

The alternative is to use the append action:

a = argparse.ArgumentParser()
a.add_argument(
    '-l', '--list',  # either of these switches
    type=int,        # the parameters are ints
    dest='list',     # store in 'list'
    default=[],      # since we're not specifying required
    action='append', # add to the list instead of replacing it
)

print(a.parse_args("-l 123 -l=234 -l345 --list 456".split(' ')))

Which produces

Namespace(list=[123, 234, 345, 456])

Or you can write a custom handler/action to parse comma-separated values so that you could do

-l 123,234,345 -l 456
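
The custom handler mentioned above could be sketched roughly as follows. This is only one possible shape: the class name CommaSeparatedInts is hypothetical, and the sketch assumes the values are integers, as in the earlier examples.

```python
import argparse

class CommaSeparatedInts(argparse.Action):
    """Hypothetical action: split each value on commas and accumulate ints."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Copy first so the shared default list is never mutated in place
        collected = list(getattr(namespace, self.dest, None) or [])
        try:
            collected.extend(int(part) for part in values.split(',') if part.strip())
        except ValueError:
            parser.error("%s expects comma-separated integers, got %r"
                         % (option_string, values))
        setattr(namespace, self.dest, collected)

a = argparse.ArgumentParser()
a.add_argument('-l', '--list', dest='list', default=[], action=CommaSeparatedInts)

result = a.parse_args("-l 123,234,345 -l 456".split(' '))
print(result)  # Namespace(list=[123, 234, 345, 456])
```

Because the action appends to whatever is already stored, repeating the switch accumulates values instead of replacing them.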

Answer 5

In add_argument(), type is just a callable object that receives a string and returns the option value.

import argparse
import ast

def arg_as_list(s):
    v = ast.literal_eval(s)
    if type(v) is not list:
        raise argparse.ArgumentTypeError("Argument \"%s\" is not a list" % (s))
    return v


parser = argparse.ArgumentParser()
parser.add_argument("--list", type=arg_as_list, default=[],
                    help="List of values")

This will allow:

$ ./tool --list "[1,2,3,4]"
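
One caveat worth sketching: ast.literal_eval raises SyntaxError for malformed input such as "[1,2", and argparse only converts ArgumentTypeError, TypeError and ValueError from a type callable into a usage error. A slightly stricter variant (the name arg_as_list_strict is made up for this example) might look like this:

```python
import argparse
import ast

def arg_as_list_strict(s):
    """Hypothetical stricter variant of arg_as_list."""
    try:
        v = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        # Re-raise as ArgumentTypeError so argparse reports a normal usage error
        raise argparse.ArgumentTypeError("Argument %r is not a valid Python literal" % s)
    if not isinstance(v, list):
        raise argparse.ArgumentTypeError("Argument %r is not a list" % s)
    return v

parser = argparse.ArgumentParser()
parser.add_argument("--list", type=arg_as_list_strict, default=[],
                    help="List of values")

args = parser.parse_args(["--list", "[1, 2, 3, 4]"])
print(args.list)  # [1, 2, 3, 4]
```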

Answer 6

If you have a nested list where the inner lists have different types and lengths and you would like to preserve the type, e.g.,

[[1, 2], ["foo", "bar"], [3.14, "baz", 20]]

then you can use the solution proposed by @sam-mason to this question, shown below:

from argparse import ArgumentParser
import json

parser = ArgumentParser()
parser.add_argument('-l', type=json.loads)
parser.parse_args(['-l', '[[1,2],["foo","bar"],[3.14,"baz",20]]'])

which gives:

Namespace(l=[[1, 2], ['foo', 'bar'], [3.14, 'baz', 20]])

Answer 7

I want to handle passing multiple lists, integer values and strings.

Helpful link => How to pass a Bash variable to Python?

def main(args):
    my_args = []
    for arg in args:
        if arg.startswith("[") and arg.endswith("]"):
            arg = arg.replace("[", "").replace("]", "")
            my_args.append(arg.split(","))
        else:
            my_args.append(arg)

    print(my_args)


if __name__ == "__main__":
    import sys
    main(sys.argv[1:])

Order is not important. If you want to pass a list, enclose it between "[" and "]" and separate the items with commas.

Then,

python test.py my_string 3 "[1,2]" "[3,4,5]"

Output => ['my_string', '3', ['1', '2'], ['3', '4', '5']]. The my_args variable contains the arguments in order (note that every value is still a string).


Answer 8

I think the most elegant solution is to pass a lambda function to “type”, as mentioned by Chepner. In addition to this, if you do not know beforehand what the delimiter of your list will be, you can also pass multiple delimiters to re.split:

# python3 test.py -l "abc xyz, 123"

import re
import argparse

parser = argparse.ArgumentParser(description='Process a list.')
parser.add_argument('-l', '--list',
                    type=lambda s: re.split(' |, ', s),
                    required=True,
                    help='comma or space delimited list of characters')

args = parser.parse_args()
print(args.list)


# Output: ['abc', 'xyz', '123']

Flatten an irregular list of lists

Question: Flatten an irregular list of lists

Yes, I know this subject has been covered before (here, here, here, here), but as far as I know, all solutions, except for one, fail on a list like this:

L = [[[1, 2, 3], [4, 5]], 6]

Where the desired output is

[1, 2, 3, 4, 5, 6]

Or perhaps even better, an iterator. The only solution I saw that works for an arbitrary nesting is found in this question:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

flatten(L)

Is this the best model? Did I overlook something? Any problems?


Answer 0

Using generator functions can make your example a little easier to read and probably boost the performance.

Python 2

import collections

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

I used the Iterable ABC added in 2.6.

Python 3

In Python 3, basestring is no more, but you can use a tuple of str and bytes to get the same effect there. The Iterable ABC must also be imported from collections.abc, since the collections.Iterable alias was removed in Python 3.10.

The yield from expression delegates to a subgenerator, yielding its items one at a time. This syntax was added in Python 3.3.

from collections.abc import Iterable

def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el
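
To make the behaviour concrete, here is a small self-contained sketch of the Python 3 generator applied to the list from the question and to a made-up mixed sample (the mixed data is purely illustrative):

```python
from collections.abc import Iterable

def flatten(items):
    """Yield the leaves of an arbitrarily nested iterable, keeping str/bytes whole."""
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

L = [[[1, 2, 3], [4, 5]], 6]
mixed = ["ab", ("cd", [b"ef", 7]), []]   # made-up sample data

flat_numbers = list(flatten(L))
flat_mixed = list(flatten(mixed))
print(flat_numbers)  # [1, 2, 3, 4, 5, 6]
print(flat_mixed)    # ['ab', 'cd', b'ef', 7]
```

Strings and bytes are yielded whole rather than split into characters, and empty sublists contribute nothing.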

Answer 1

My solution:

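from collections.abc import Iterable


def flatten(x):
    if isinstance(x, Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

A little more concise, but pretty much the same. Note that it does not special-case strings, so a str element recurses until a RecursionError is raised.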


Answer 2

Generator using recursion and duck typing (updated for Python 3):

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

Answer 3

Generator version of @unutbu’s non-recursive solution, as requested by @Andrew in a comment:

from collections.abc import Sequence

def genflat(l, ltypes=Sequence):
    # Note: str is a Sequence too; pass ltypes=list for data containing strings
    l = list(l)
    i = 0
    while i < len(l):
        # Splice nested sequences in place; an empty one simply disappears
        while i < len(l) and isinstance(l[i], ltypes):
            l[i:i + 1] = l[i]
        if i < len(l):
            yield l[i]
        i += 1

Slightly simplified version of this generator:

def genflat(l, ltypes=Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)

Answer 4

Here is my functional version of recursive flatten which handles both tuples and lists, and lets you throw in any mix of positional arguments. Returns a generator which produces the entire sequence in order, arg by arg:

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

Usage:

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print(list(flatten(l1, -2, -1, l2)))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Answer 5

This version of flatten avoids Python's recursion limit (and thus works with arbitrarily deep, nested iterables). It is a generator which can handle strings and arbitrary iterables (even infinite ones).

import itertools as IT
from collections.abc import Iterable

def flatten(iterable, ltypes=Iterable):
    remainder = iter(iterable)
    while True:
        try:
            first = next(remainder)
        except StopIteration:
            return  # PEP 479: a leaked StopIteration becomes RuntimeError on Python 3.7+
        if isinstance(first, ltypes) and not isinstance(first, (str, bytes)):
            remainder = IT.chain(first, remainder)
        else:
            yield first

Here are some examples demonstrating its use:

print(list(IT.islice(flatten(IT.repeat(1)),10)))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(IT.islice(flatten(IT.chain(IT.repeat(2,3),
                                       {10,20,30},
                                       'foo bar'.split(),
                                       IT.repeat(1),)),10)))
# [2, 2, 2, 10, 20, 30, 'foo', 'bar', 1, 1]

print(list(flatten([[1,2,[3,4]]])))
# [1, 2, 3, 4]

seq = ([[chr(i),chr(i-32)] for i in range(ord('a'), ord('z')+1)] + list(range(0,9)))
print(list(flatten(seq)))
# ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H',
# 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P',
# 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X',
# 'y', 'Y', 'z', 'Z', 0, 1, 2, 3, 4, 5, 6, 7, 8]

Although flatten can handle infinite generators, it can not handle infinite nesting:

def infinitely_nested():
    while True:
        yield IT.chain(infinitely_nested(), IT.repeat(1))

print(list(IT.islice(flatten(infinitely_nested()), 10)))
# hangs

Answer 6

Here’s another answer that is even more interesting…

import re

def Flatten(TheList):
    a = str(TheList)
    b,crap = re.subn(r'[\[,\]]', ' ', a)
    c = b.split()
    d = [int(x) for x in c]

    return(d)

Basically, it converts the nested list to a string, uses a regex to strip out the nesting syntax, and then converts the result back to a (flattened) list. Because of the int(x) conversion, it only works when every leaf is an integer.


Answer 7

def flatten(xs):
    res = []
    def loop(ys):
        for i in ys:
            if isinstance(i, list):
                loop(i)
            else:
                res.append(i)
    loop(xs)
    return res

Answer 8

You could use deepflatten from the 3rd party package iteration_utilities:

>>> from iteration_utilities import deepflatten
>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(deepflatten(L))
[1, 2, 3, 4, 5, 6]

>>> list(deepflatten(L, types=list))  # only flatten "inner" lists
[1, 2, 3, 4, 5, 6]

It's an iterator, so you need to iterate over it (for example by wrapping it with list or using it in a loop). Internally it uses an iterative approach instead of a recursive one, and it is written as a C extension, so it can be faster than pure Python approaches:

>>> %timeit list(deepflatten(L))
12.6 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit list(deepflatten(L, types=list))
8.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit list(flatten(L))   # Cristian - Python 3.x approach from https://stackoverflow.com/a/2158532/5393381
86.4 µs ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(flatten(L))   # Josh Lee - https://stackoverflow.com/a/2158522/5393381
107 µs ± 2.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(genflat(L, list))  # Alex Martelli - https://stackoverflow.com/a/2159079/5393381
23.1 µs ± 710 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I’m the author of the iteration_utilities library.


Answer 9

It was fun trying to create a function that could flatten an irregular list in Python, but of course that is what Python is for (to make programming fun). The following generator works fairly well, with some caveats:

def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable

It will flatten datatypes that you might want left alone (like bytearray, bytes, and str objects). Also, the code relies on the fact that requesting an iterator from a non-iterable raises a TypeError.

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable


>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>>

Edit:

I disagree with the previous implementation. The problem is that you should not be able to flatten something that is not an iterable. It is confusing and gives the wrong impression of the argument.

>>> list(flatten(123))
[123]
>>>

The following generator is almost the same as the first but does not have the problem of trying to flatten a non-iterable object. It fails as one would expect when an inappropriate argument is given to it.

def flatten(iterable):
    for item in iterable:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

Testing the generator with the list that was provided works fine. However, the new code will raise a TypeError when a non-iterable object is given to it. Examples of the new behavior are shown below.

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>> list(flatten(123))
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    list(flatten(123))
  File "<pyshell#27>", line 2, in flatten
    for item in iterable:
TypeError: 'int' object is not iterable
>>>
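
One caveat of the duck-typed generator is that a one-character string iterates to itself, so str elements would recurse until a RecursionError. A possible guard, shown here only as a sketch (the choice of str, bytes and bytearray as atomic types is an assumption of this example, not part of the answer above):

```python
def flatten(iterable):
    """Duck-typed flatten that treats str, bytes and bytearray as single items."""
    for item in iterable:
        if isinstance(item, (str, bytes, bytearray)):
            yield item          # keep text-like values whole
            continue
        try:
            yield from flatten(item)
        except TypeError:       # item is not iterable, so it is a leaf
            yield item

sample = [["ab", [1]], b"x", 2]  # made-up sample data
flat = list(flatten(sample))
print(flat)  # ['ab', 1, b'x', 2]
```

A non-iterable argument at the top level still fails with TypeError, as in the second generator above.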

Answer 10

Although an elegant and very pythonic answer has been selected I would present my solution just for the review:

def flat(l):
    ret = []
    for i in l:
        if isinstance(i, list) or isinstance(i, tuple):
            ret.extend(flat(i))
        else:
            ret.append(i)
    return ret

Please tell how good or bad this code is?


Answer 11

I prefer simple answers. No generators. No recursion or recursion limits. Just iteration:

def flatten(TheList):
    listIsNested = True

    while listIsNested:                 #outer loop
        keepChecking = False
        Temp = []

        for element in TheList:         #inner loop
            if isinstance(element,list):
                Temp.extend(element)
                keepChecking = True
            else:
                Temp.append(element)

        listIsNested = keepChecking     #determine if outer loop exits
        TheList = Temp[:]

    return TheList

This works with two loops: an inner for loop and an outer while loop.

The inner for loop iterates through the list. If it finds a list element, it (1) uses list.extend() to flatten that element by one level of nesting and (2) switches keepChecking to True. keepChecking is used to control the outer while loop: if it gets set to True, the outer loop triggers the inner loop for another pass.

Those passes keep happening until no more nested lists are found. When a pass finally occurs where none are found, keepChecking never gets tripped to true, which means listIsNested stays false and the outer while loop exits.

The flattened list is then returned.

Test-run

flatten([1,2,3,4,[100,200,300,[1000,2000,3000]]])

[1, 2, 3, 4, 100, 200, 300, 1000, 2000, 3000]


Answer 12

Here’s a simple function that flattens lists of arbitrary depth. No recursion, to avoid stack overflow.

from copy import deepcopy

def flatten_list(nested_list):
    """Flatten an arbitrarily nested list, without recursion (to avoid
    stack overflows). Yields the items one by one; the original list is
    unchanged.

    >>> list(flatten_list([1, 2, 3, [4], [], [[[[[[[[[5]]]]]]]]]]))
    [1, 2, 3, 4, 5]
    >>> list(flatten_list([[1, 2], 3]))
    [1, 2, 3]

    """
    nested_list = deepcopy(nested_list)

    while nested_list:
        sublist = nested_list.pop(0)

        if isinstance(sublist, list):
            nested_list = sublist + nested_list
        else:
            yield sublist
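
A possible variation on the same idea, sketched here with collections.deque: popping from the left of a deque is O(1), whereas list.pop(0) and sublist + nested_list both copy the remaining items. The name flatten_list_deque is made up for this example, and the approach is an alternative, not the author's code.

```python
from collections import deque

def flatten_list_deque(nested_list):
    """Yield the leaves of a nested list without recursion or deepcopy."""
    pending = deque(nested_list)   # shallow copy of the top level only
    while pending:
        head = pending.popleft()
        if isinstance(head, list):
            # extendleft reverses its argument, so reverse first to keep order
            pending.extendleft(reversed(head))
        else:
            yield head

data = [1, 2, 3, [4], [], [[[[[[[[[5]]]]]]]]]]
flat = list(flatten_list_deque(data))
print(flat)  # [1, 2, 3, 4, 5]
```

The sublists are only read, never modified, so the input is left unchanged without a deep copy.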

Answer 13

I'm surprised no one has thought of this. Damn recursion: I don't get the recursive answers that the advanced people here made. Anyway, here is my attempt at this. The caveat is that it's very specific to the OP's use case.

import re

L = [[[1, 2, 3], [4, 5]], 6]
flattened_list = re.sub(r"[\[\]]", "", str(L)).replace(" ", "").split(",")
new_list = list(map(int, flattened_list))
print(new_list)

output:

[1, 2, 3, 4, 5, 6]

Answer 14

I didn't go through all the already available answers here, but here is a one-liner I came up with, borrowing from Lisp's first-and-rest way of processing lists:

def flatten(l): return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]

Here is one simple and one not-so-simple case:

>>> flatten([1,[2,3],4])
[1, 2, 3, 4]

>>> flatten([1, [2, 3], 4, [5, [6, {'name': 'some_name', 'age':30}, 7]], [8, 9, [10, [11, [12, [13, {'some', 'set'}, 14, [15, 'some_string'], 16], 17, 18], 19], 20], 21, 22, [23, 24], 25], 26, 27, 28, 29, 30])
[1, 2, 3, 4, 5, 6, {'age': 30, 'name': 'some_name'}, 7, 8, 9, 10, 11, 12, 13, set(['set', 'some']), 14, 15, 'some_string', 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> 

Answer 15

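When trying to answer such a question you really need to give the limitations of the code you propose as a solution. If it were only about performance I wouldn't mind too much, but most of the code proposed as solutions (including the accepted answer) fails to flatten any list that has a depth greater than 1000.

When I say most of the code, I mean all code that uses any form of recursion (or calls a standard library function that is recursive). All of it fails because for every recursive call made, the (call) stack grows by one unit, and the (default) Python call stack has a size of 1000.

If you're not too familiar with the call stack, then maybe the following will help (otherwise you can just scroll to the Implementation).

Call stack size and recursive programming (dungeon analogy)

Finding the treasure and exit

Imagine you enter a huge dungeon with numbered rooms, looking for a treasure. You don't know the place but you have some indications on how to find the treasure. Each indication is a riddle (difficulty varies, but you can't predict how hard they will be). You decide to think a little bit about a strategy to save time, and you make two observations:

  1. It's hard (long) to find the treasure, as you'll have to solve (potentially hard) riddles to get there.
  2. Once the treasure is found, returning to the entrance may be easy: you just have to use the same path in the other direction (though this needs a bit of memory to recall your path).

When entering the dungeon, you notice a small notebook here. You decide to use it to write down every room you exit after solving a riddle (when entering a new room); this way you'll be able to return to the entrance. That's a genius idea, and you won't even spend a cent implementing your strategy.

You enter the dungeon, solving the first 1001 riddles with great success, but here comes something you hadn't planned: you have no space left in the notebook you borrowed. You decide to abandon your quest, as you prefer not having the treasure to being lost forever inside the dungeon (that looks smart indeed).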

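Executing a recursive program

Basically, it's the exact same thing as finding the treasure. The dungeon is the computer's memory; your goal now is not to find a treasure but to compute some function (find f(x) for a given x). The indications are simply subroutines that will help you solve f(x). Your strategy is the same as the call stack strategy: the notebook is the stack, and the rooms are the functions' return addresses: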

x = ["over here", "am", "I"]
y = sorted(x) # You're about to enter a room named `sorted`, note down the current room address here so you can return back: 0x4004f4 (that room address looks weird)
# Seems like you went back from your quest using the return address 0x4004f4
# Let's see what you've collected 
print(' '.join(y))

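The problem you encountered in the dungeon is the same here: the call stack has a finite size (here 1000), and therefore, if you enter too many functions without returning, you'll fill the call stack and get an error that looks like "Dear adventurer, I'm very sorry but your notebook is full": RecursionError: maximum recursion depth exceeded. Note that you don't need recursion to fill the call stack, but it's very unlikely that a non-recursive program calls 1000 functions without ever returning. It's also important to understand that once you return from a function, the call stack is freed from the address used (hence the name "stack": return addresses are pushed in before entering a function and pulled out when returning). In the special case of a simple recursion (a function f that calls itself once, over and over) you will enter f over and over until the computation is finished (until the treasure is found) and then return from f until you are back at the place where you called f in the first place. The call stack is never freed from anything until the end, when it is freed from all return addresses one after the other.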
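
The "full notebook" situation can be reproduced with a short experiment. This is only a sketch: flatten_recursive and build_nested are hypothetical helpers written for this illustration, and the nesting depth is chosen relative to whatever sys.getrecursionlimit() reports.

```python
import sys

def flatten_recursive(x):
    """Plain recursive flatten: one call-stack frame per nesting level."""
    result = []
    for el in x:
        if isinstance(el, list):
            result.extend(flatten_recursive(el))
        else:
            result.append(el)
    return result

def build_nested(depth):
    """Build [depth-1, [depth-2, [... [0]]]] iteratively."""
    nested = [0]
    for d in range(1, depth):
        nested = [d, nested]
    return nested

limit = sys.getrecursionlimit()        # typically 1000 by default
too_deep = build_nested(limit + 100)   # deeper than the notebook has pages

try:
    flatten_recursive(too_deep)
    overflowed = False
except RecursionError:
    overflowed = True

print(limit, overflowed)
```

Building the list is done with a loop, so only the recursive flattening runs into the limit.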

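How to avoid this issue?

That's actually pretty simple: "don't use recursion if you don't know how deep it can go". That's not always true, as in some cases tail call recursion can be optimized (TCO). But in Python this is not the case, and even a "well written" recursive function will not optimize stack use. There is an interesting post from Guido about this question: Tail Recursion Elimination.

There is a technique that you can use to make any recursive function iterative; we could call it "bring your own notebook". For example, in our particular case we are simply exploring a list, and entering a room is equivalent to entering a sublist. The question you should ask yourself is: how can I get back from a list to its parent list? The answer is not that complex; repeat the following until the stack is empty:

  1. push the current list address and index onto a stack when entering a new sublist (note that a list address + index is also an address, so we are using exactly the same technique as the call stack);
  2. every time an item is found, yield it (or add it to a list);
  3. once a list is fully explored, go back to the parent list using the stack return address (and index).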
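
The three steps above can be sketched literally, with a stack of (list, index) pairs playing the role of the notebook. The function name flatten_with_stack is made up for this illustration; it is a minimal sketch that only treats list as nestable.

```python
def flatten_with_stack(nested):
    """Yield leaves of a nested list using an explicit stack of (list, index) pairs."""
    stack = [(nested, 0)]                   # the "notebook": where to resume in each list
    while stack:
        current, index = stack.pop()
        if index == len(current):           # step 3: list fully explored, back to the parent
            continue
        stack.append((current, index + 1))  # remember where to resume in this list
        item = current[index]
        if isinstance(item, list):          # step 1: entering a sublist
            stack.append((item, 0))
        else:                               # step 2: a plain item was found
            yield item

L = [0, [1, 2], 3, 4]
flat = list(flatten_with_stack(L))
print(flat)  # [0, 1, 2, 3, 4]
```

The stack never holds more entries than the nesting depth plus one, and it lives on the heap, so the recursion limit does not apply.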

Also note that this is equivalent to a DFS in a tree where some nodes are sublists, A = [1, 2], and some are simple items: 0, 1, 2, 3, 4 (for L = [0, [1,2], 3, 4]). The tree looks like this:

                    L
                    |
           -------------------
           |     |     |     |
           0   --A--   3     4
               |   |
               1   2

The DFS traversal pre-order is: L, 0, A, 1, 2, 3, 4. Remember, in order to implement an iterative DFS you also "need" a stack. The implementation I proposed before results in the following states (for the stack and the flat_list):

init.:  stack=[(L, 0)]
**0**:  stack=[(L, 0)],         flat_list=[0]
**A**:  stack=[(L, 1), (A, 0)], flat_list=[0]
**1**:  stack=[(L, 1), (A, 0)], flat_list=[0, 1]
**2**:  stack=[(L, 1), (A, 1)], flat_list=[0, 1, 2]
**3**:  stack=[(L, 2)],         flat_list=[0, 1, 2, 3]
**4**:  stack=[(L, 3)],         flat_list=[0, 1, 2, 3, 4]
return: stack=[],               flat_list=[0, 1, 2, 3, 4]

In this example, the stack's maximum size is 2, because the input list (and therefore the tree) has depth 2.

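Implementation

For the implementation, in Python you can simplify things a little by using iterators instead of simple lists. References to the (sub)iterators are used to store the sublists' return addresses (instead of having both the list address and the index). This is not a big difference, but I feel this is more readable (and also a bit faster):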

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:   # post-order
            cursor_stack.pop()
            continue
        if is_list_like(item):  # pre-order
            cursor_stack.append(iter(item))
        elif item is not None:
            yield item          # in-order

def is_list_like(item):
    return isinstance(item, list)

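Also, notice that in is_list_like I have isinstance(item, list), which could be changed to handle more input types; here I just wanted the simplest version, where (iterable) is just a list. But you could also do this: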

def is_list_like(item):
    try:
        iter(item)
        return not isinstance(item, str)  # strings are not lists (hmm...) 
    except TypeError:
        return False

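This considers strings as "simple items", and therefore flatten_iter([["test", "a"], "b"]) will return ["test", "a", "b"] and not ["t", "e", "s", "t", "a", "b"]. Note that in that case iter(item) is called twice on each item; let's pretend it's an exercise for the reader to make this cleaner.

Testing and remarks on other implementations

In the end, remember that you can't print an infinitely nested list L using print(L), because internally it uses recursive calls to __repr__ (RecursionError: maximum recursion depth exceeded while getting the repr of an object). For the same reason, solutions to flatten involving str will fail with the same error message.

If you need to test your solution, you can use this function to generate a simple nested list: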

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_0 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

给出:build_deep_list(5)>>> [4, [3, [2, [1, [0]]]]]

When answering a question like this, you really need to state the limitations of the code you propose as a solution. If it were only about performance I wouldn’t mind too much, but most of the code proposed as a solution (including the accepted answer) fails to flatten any list nested deeper than about 1000 levels.

When I say most of the code, I mean every solution that uses any form of recursion (or calls a standard library function that is recursive). All of these fail because each recursive call grows the (call) stack by one frame, and by default Python limits the call stack to a depth of 1000.
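To make the limit concrete, here is a minimal sketch. The helper flatten_recursive is a hypothetical naive implementation written only for this demonstration (it is not any particular answer's code), and the depth is chosen relative to whatever sys.getrecursionlimit() reports on your interpreter:

```python
import sys


def flatten_recursive(nested):
    """Naive recursive flatten, used here only to demonstrate the limit."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten_recursive(item))
        else:
            flat.append(item)
    return flat


# Build a list nested deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() + 100
deep = [0]
for level in range(1, depth):
    deep = [level, deep]

try:
    flatten_recursive(deep)
    failed = False
except RecursionError:
    failed = True

print(sys.getrecursionlimit(), failed)
```

Building the deep list is done with a loop, so only the recursive flatten hits the limit.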

If you’re not too familiar with the call stack, then maybe the following will help (otherwise you can just scroll to the Implementation).

Call stack size and recursive programming (dungeon analogy)

Finding the treasure and exit

Imagine you enter a huge dungeon with numbered rooms, looking for a treasure. You don’t know the place, but you have some indications on how to find the treasure. Each indication is a riddle (difficulty varies, and you can’t predict how hard they will be). You decide to think a little about a strategy to save time, and you make two observations:

  1. It’s hard (long) to find the treasure as you’ll have to solve (potentially hard) riddles to get there.
  2. Once the treasure is found, returning to the entrance may be easy: you just have to follow the same path in the other direction (though this needs a bit of memory to recall your path).

When entering the dungeon, you notice a small notebook. You decide to use it to write down every room you exit after solving a riddle (when entering a new room), so that you’ll be able to find your way back to the entrance. That’s a genius idea: you won’t even spend a cent implementing your strategy.

You enter the dungeon and solve the first 1001 riddles with great success, but then something happens that you hadn’t planned for: you have no space left in the notebook you borrowed. You decide to abandon your quest, as you would rather not have the treasure than be lost forever inside the dungeon (that looks smart indeed).

Executing a recursive program

Basically, it’s exactly the same thing as finding the treasure. The dungeon is the computer’s memory, and your goal now is not to find a treasure but to compute some function (find f(x) for a given x). The indications are simply sub-routines that will help you solve f(x). Your strategy is the same as the call stack strategy: the notebook is the stack, and the rooms are the functions’ return addresses:

x = ["over here", "am", "I"]
y = sorted(x) # You're about to enter a room named `sorted`, note down the current room address here so you can return back: 0x4004f4 (that room address looks weird)
# Seems like you went back from your quest using the return address 0x4004f4
# Let's see what you've collected 
print(' '.join(y))

The problem you encountered in the dungeon is the same here: the call stack has a finite size (here 1000), so if you enter too many functions without returning, you fill the call stack and get an error that looks like “Dear adventurer, I’m very sorry but your notebook is full”: RecursionError: maximum recursion depth exceeded. Note that you don’t need recursion to fill the call stack, but it’s very unlikely that a non-recursive program calls 1000 functions without ever returning. It’s also important to understand that once you return from a function, the call stack is freed from the address used (hence the name “stack”: return addresses are pushed before entering a function and popped when returning). In the special case of a simple recursion (a function f that calls itself once, over and over) you will enter f over and over until the computation is finished (until the treasure is found), then return from f until you get back to the place where you first called f. The call stack is never freed from anything until the end, when it is freed from all return addresses one after the other.

How to avoid this issue?

That’s actually pretty simple: “don’t use recursion if you don’t know how deep it can go”. That’s not always true, since in some cases tail-call recursion can be optimized (TCO). But Python does not do this, and even a “well written” recursive function will not optimize stack use. There is an interesting post from Guido about this question: Tail Recursion Elimination.

There is a technique that you can use to make any recursive function iterative; we could call it “bring your own notebook”. For example, in our particular case we are simply exploring a list, and entering a room is equivalent to entering a sublist. The question you should ask yourself is: how can I get back from a list to its parent list? The answer is not that complex. Repeat the following until the stack is empty:

  1. push the current list address and index in a stack when entering a new sublist (note that a list address+index is also an address, therefore we just use the exact same technique used by the call stack);
  2. every time an item is found, yield it (or add it to a list);
  3. once a list is fully explored, go back to the parent list using the stack return address (and index).
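The three steps above can be sketched with an explicit stack of (list, index) pairs. This is only an illustration under my own bookkeeping (the name flatten_with_stack is made up, and the stack contents at each step may differ slightly from a hand-written trace):

```python
def flatten_with_stack(nested):
    """Flatten nested lists without recursion, using our own 'notebook'."""
    flat_list = []
    stack = [(nested, 0)]  # (current list, index of the next item to read)
    while stack:
        current, index = stack.pop()
        if index == len(current):
            # Step 3: this list is fully explored, resume the parent.
            continue
        # Remember where to resume in the current list.
        stack.append((current, index + 1))
        item = current[index]
        if isinstance(item, list):
            # Step 1: enter the sublist, starting at its first item.
            stack.append((item, 0))
        else:
            # Step 2: a simple item was found.
            flat_list.append(item)
    return flat_list


print(flatten_with_stack([0, [1, 2], 3, 4]))  # [0, 1, 2, 3, 4]
```

The stack never holds more entries than the nesting depth of the input, so memory use grows with depth rather than with the number of items.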

Also note that this is equivalent to a DFS in a tree where some nodes are sublists A = [1, 2] and some are simple items: 0, 1, 2, 3, 4 (for L = [0, [1,2], 3, 4]). The tree looks like this:

                    L
                    |
           -------------------
           |     |     |     |
           0   --A--   3     4
               |   |
               1   2

The DFS pre-order traversal is: L, 0, A, 1, 2, 3, 4. Remember, in order to implement an iterative DFS you also “need” a stack. The implementation I proposed before results in the following states (for the stack and the flat_list):

init.:  stack=[(L, 0)]
**0**:  stack=[(L, 0)],         flat_list=[0]
**A**:  stack=[(L, 1), (A, 0)], flat_list=[0]
**1**:  stack=[(L, 1), (A, 0)], flat_list=[0, 1]
**2**:  stack=[(L, 1), (A, 1)], flat_list=[0, 1, 2]
**3**:  stack=[(L, 2)],         flat_list=[0, 1, 2, 3]
**4**:  stack=[(L, 3)],         flat_list=[0, 1, 2, 3, 4]
return: stack=[],               flat_list=[0, 1, 2, 3, 4]

In this example, the maximum stack size is 2, because the input list (and therefore the tree) has depth 2.

Implementation

For the implementation, in Python you can simplify a little by using iterators instead of simple lists. References to the (sub)iterators are used to store the sublists’ return addresses (instead of keeping both the list address and the index). This is not a big difference, but I feel it is more readable (and also a bit faster):

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:   # post-order
            cursor_stack.pop()
            continue
        if is_list_like(item):  # pre-order
            cursor_stack.append(iter(item))
        elif item is not None:
            yield item          # in-order

def is_list_like(item):
    return isinstance(item, list)

Also, notice that is_list_like uses isinstance(item, list), which could be changed to handle more input types. Here I just wanted the simplest version, where an (iterable) is just a list. But you could also do this:

def is_list_like(item):
    try:
        iter(item)
        return not isinstance(item, str)  # strings are not lists (hmm...) 
    except TypeError:
        return False

This considers strings as “simple items”, so flatten([["test", "a"], "b"]) will return ["test", "a", "b"] and not ["t", "e", "s", "t", "a", "b"]. Note that in this case iter(item) is called twice on each list-like item; let’s pretend it’s an exercise for the reader to make this cleaner.
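One possible way to do that exercise, as a sketch only: replace the yes/no test with a helper that returns the iterator itself, so iter is called once per item. The name iter_or_none is hypothetical, and treating bytes like str is my own assumption rather than something stated above:

```python
def iter_or_none(item):
    """Return an iterator over item, or None if item is a 'simple item'."""
    if isinstance(item, (str, bytes)):  # assumption: bytes are leaves too
        return None
    try:
        return iter(item)
    except TypeError:
        return None


def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()
            continue
        sub_iterator = iter_or_none(item)  # iter() runs once per item
        if sub_iterator is not None:
            cursor_stack.append(sub_iterator)
        elif item is not None:
            yield item


def flatten(iterable):
    return list(items_from(iterable))


print(flatten([["test", "a"], "b"]))  # ['test', 'a', 'b']
```

As in the original, None items are skipped; drop the `elif` condition if you want to keep them.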

Testing and remarks on other implementations

Finally, remember that you can’t print a very deeply nested list L using print(L), because internally it uses recursive calls to __repr__ (RecursionError: maximum recursion depth exceeded while getting the repr of an object). For the same reason, solutions to flatten involving str will fail with the same error message.

If you need to test your solution, you can use this function to generate a simple nested list:

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_1 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

Which gives: build_deep_list(5) >>> [4, [3, [2, [1, [0]]]]].
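As a sketch of how such a test might look, the snippet below repeats build_deep_list and pairs it with a compact iterator-stack flatten (a condensed variant written for this example, not the exact listing above). The depth of 5000 is an arbitrary value chosen to be well past the default recursion limit:

```python
def build_deep_list(depth):
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list


def flatten(iterable):
    """Condensed iterative flatten using a stack of iterators."""
    flat, cursor_stack = [], [iter(iterable)]
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()
            continue
        if isinstance(item, list):
            cursor_stack.append(iter(item))
        else:
            flat.append(item)
    return flat


depth = 5000  # arbitrary, deeper than the default limit of 1000
result = flatten(build_deep_list(depth))
print(result[:3], result[-3:], len(result))
```

Note that the check compares the flattened result against a range instead of printing the nested input, since printing the input would itself hit the recursion limit.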


Answer 16

Here’s the compiler.ast.flatten implementation in 2.7.5:

def flatten(seq):
    l = []
    for elt in seq:
        t = type(elt)
        if t is tuple or t is list:
            for elt2 in flatten(elt):
                l.append(elt2)
        else:
            l.append(elt)
    return l

There are better, faster methods (if you’ve read this far, you have already seen them).

Also note:

Deprecated since version 2.6: The compiler package has been removed in Python 3.
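Since the compiler package is gone in Python 3, the only option there is to carry the function yourself. A minimal sketch that keeps the same behaviour (only exact list and tuple types are expanded; subclasses and strings are left alone) might look like this:

```python
def flatten(seq):
    """Expand nested lists and tuples; everything else is kept as-is."""
    result = []
    for elt in seq:
        # Exact type check, as in the original: subclasses are not expanded.
        if type(elt) in (list, tuple):
            result.extend(flatten(elt))
        else:
            result.append(elt)
    return result


print(flatten([1, (2, [3, 4]), "ab"]))  # [1, 2, 3, 4, 'ab']
```

This is still recursive, so it has the same depth limit as the original.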


Answer 17

Totally hacky, but I think it would work (depending on your data types):

import ast, re
flat_list = ast.literal_eval("[%s]" % re.sub(r"[\[\]]", "", str(the_list)))

Answer 18

Just use the funcy library: pip install funcy

import funcy


funcy.flatten([[[[1, 1], 1], 2], 3]) # returns generator
funcy.lflatten([[[[1, 1], 1], 2], 3]) # returns list

Answer 19

Here is another py2 approach. I’m not sure if it’s the fastest, the most elegant, or the safest …

from collections import Iterable
from itertools import imap, repeat, chain


def flat(seqs, ignore=(int, long, float, basestring)):
    return repeat(seqs, 1) if any(imap(isinstance, repeat(seqs), ignore)) or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

It can ignore any specific (or derived) type you would like. It returns an iterator, so you can convert it to a specific container such as a list or tuple, or simply consume it in order to reduce the memory footprint. For better or worse, it can also handle initial non-iterable objects such as int …

Note that most of the heavy lifting is done in C, since as far as I know that’s how itertools is implemented. So while it is recursive, AFAIK it isn’t bounded by the Python recursion depth, since the function calls are happening in C. This doesn’t mean you aren’t bounded by memory, though, especially on OS X, where the stack size has a hard limit as of today (OS X Mavericks) …

There is a slightly faster but less portable approach. Only use it if you can assume that the base elements of the input can be explicitly determined; otherwise you’ll get infinite recursion, and OS X, with its limited stack size, will throw a segmentation fault fairly quickly …

def flat(seqs, ignore={int, long, float, str, unicode}):
    return repeat(seqs, 1) if type(seqs) in ignore or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

Here we are using a set to check the type, so it takes O(1) vs. O(number of types) to check whether or not an element should be ignored. Of course, any value whose type is derived from one of the stated ignored types will not be ignored, which is why it lists both str and unicode. Use it with caution …

tests:

import random

def test_flat(test_size=2000):
    def increase_depth(value, depth=1):
        for func in xrange(depth):
            value = repeat(value, 1)
        return value

    def random_sub_chaining(nested_values):
        for values in nested_values:
            yield chain((values,), chain.from_iterable(imap(next, repeat(nested_values, random.randint(1, 10)))))

    expected_values = zip(xrange(test_size), imap(str, xrange(test_size)))
    nested_values = random_sub_chaining((increase_depth(value, depth) for depth, value in enumerate(expected_values)))
    assert not any(imap(cmp, chain.from_iterable(expected_values), flat(chain(((),), nested_values, ((),)))))

>>> test_flat()
>>> list(flat([[[1, 2, 3], [4, 5]], 6]))
[1, 2, 3, 4, 5, 6]
>>>  

$ uname -a
Darwin Samys-MacBook-Pro.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
$ python --version
Python 2.7.5
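For readers on Python 3, the first flat above could be ported roughly as follows. This is a sketch under assumptions: imap becomes the built-in map, long and basestring no longer exist, and the default ignore tuple (int, float, str, bytes) is my own choice of replacements:

```python
from collections.abc import Iterable
from itertools import chain, repeat


def flat(seqs, ignore=(int, float, str, bytes)):
    """Lazily flatten seqs, treating instances of ignore as leaves."""
    if isinstance(seqs, ignore) or not isinstance(seqs, Iterable):
        return repeat(seqs, 1)  # wrap a leaf as a one-item iterator
    return chain.from_iterable(map(flat, seqs))


print(list(flat([[[1, 2, 3], [4, 5]], 6])))  # [1, 2, 3, 4, 5, 6]
```

Like the original, it returns an iterator, so wrap it in list() when you need a concrete container.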

Answer 20

Without using any library:

def flat(l):
    def _flat(l, r):    
        if type(l) is not list:
            r.append(l)
        else:
            for i in l:
                r = r + flat(i)
        return r
    return _flat(l, [])



# example
test = [[1], [[2]], [3], [['a','b','c'] , [['z','x','y']], ['d','f','g']], 4]    
print(flat(test))  # prints [1, 2, 3, 'a', 'b', 'c', 'z', 'x', 'y', 'd', 'f', 'g', 4]

Answer 21

Using itertools.chain:

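import itertools
from collections.abc import Iterable  # plain `collections` on Python 2

def list_flatten(lst):
    flat_lst = []
    for item in itertools.chain(lst):
        # Treat strings as atoms; iterating them would recurse forever.
        if isinstance(item, Iterable) and not isinstance(item, str):
            item = list_flatten(item)
            flat_lst.extend(item)
        else:
            flat_lst.append(item)
    return flat_lst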

Or without chaining:

def flatten(q, final):
    if not q:
        return
    if isinstance(q, list):
        if not isinstance(q[0], list):
            final.append(q[0])
        else:
            flatten(q[0], final)
        flatten(q[1:], final)
    else:
        final.append(q)
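The second version does not return anything; it appends into the list passed as final. A short usage sketch (the variable name result is just for illustration, and the function is repeated so the snippet runs on its own):

```python
def flatten(q, final):
    if not q:
        return
    if isinstance(q, list):
        if not isinstance(q[0], list):
            final.append(q[0])
        else:
            flatten(q[0], final)
        flatten(q[1:], final)
    else:
        final.append(q)


result = []  # the accumulator that flatten fills in place
flatten([1, [2, [3, 4]], [5]], result)
print(result)  # [1, 2, 3, 4, 5]
```

Because each step recurses on q[1:], the recursion depth grows with the length of the list as well as its nesting, so long flat lists can also hit the recursion limit.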

Answer 22

I used recursion to handle nested lists of any depth:

def combine_nlist(nlist,init=0,combiner=lambda x,y: x+y):
    '''
    apply function: combiner to a nested list element by element(treated as flatten list)
    '''
    current_value=init
    for each_item in nlist:
        if isinstance(each_item,list):
            current_value =combine_nlist(each_item,current_value,combiner)
        else:
            current_value = combiner(current_value,each_item)
    return current_value

Once combine_nlist is defined, it is easy to use it for flattening. Or you can combine the two into one function. I like my solution because it can be applied to any nested list.

def flatten_nlist(nlist):
    return combine_nlist(nlist,[],lambda x,y:x+[y])

result

In [379]: flatten_nlist([1,2,3,[4,5],[6],[[[7],8],9],10])
Out[379]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
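Since combine_nlist is really a fold over the leaves of the nested list, flattening is only one use of it. The sketch below repeats the function and shows a few other combiners; the variable names total, largest and count are illustrative only:

```python
def combine_nlist(nlist, init=0, combiner=lambda x, y: x + y):
    """Fold combiner over every leaf of a nested list, left to right."""
    current_value = init
    for each_item in nlist:
        if isinstance(each_item, list):
            current_value = combine_nlist(each_item, current_value, combiner)
        else:
            current_value = combiner(current_value, each_item)
    return current_value


nested = [1, 2, 3, [4, 5], [6], [[[7], 8], 9], 10]

total = combine_nlist(nested)                             # sum of leaves
largest = combine_nlist(nested, nested[0], max)           # largest leaf
count = combine_nlist(nested, 0, lambda acc, _: acc + 1)  # number of leaves

print(total, largest, count)  # 55 10 10
```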

Answer 23

The easiest way is to use the morph library (pip install morph).

The code is:

import morph

nested_list = [[[1, 2, 3], [4, 5]], 6]  # avoid shadowing the built-in `list`
flattened_list = morph.flatten(nested_list)  # returns [1, 2, 3, 4, 5, 6]

Answer 24

I am aware that there are already many awesome answers, but I wanted to add one that takes a functional programming approach to the question. In this answer I make use of double recursion:

def flatten_list(seq):
    if not seq:
        return []
    elif isinstance(seq[0],list):
        return (flatten_list(seq[0])+flatten_list(seq[1:]))
    else:
        return [seq[0]]+flatten_list(seq[1:])

print(flatten_list([1,2,[3,[4],5],[6,7]]))

output:

[1, 2, 3, 4, 5, 6, 7]

Answer 25

I’m not sure if this is necessarily quicker or more effective, but this is what I do:

def flatten(lst):
    return eval('[' + str(lst).replace('[', '').replace(']', '') + ']')

L = [[[1, 2, 3], [4, 5]], 6]
print(flatten(L))

The flatten function here turns the list into a string, takes out all of the square brackets, attaches square brackets back onto the ends, and turns it back into a list.

However, if you know your list contains strings that themselves include square brackets, like [[1, 2], "[3, 4] and [5]"], you would have to do something else.


Answer 26

This is a simple implementation of flatten for Python 2:

flatten=lambda l: reduce(lambda x,y:x+y,map(flatten,l),[]) if isinstance(l,list) else [l]

test=[[1,2,3,[3,4,5],[6,7,[8,9,[10,[11,[12,13,14]]]]]],]
print flatten(test)

#output [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
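On Python 3 the same idea needs two small changes: reduce has to be imported from functools, and print is a function. A sketch of that port, written as a def rather than a self-referencing lambda purely as a readability choice of mine:

```python
from functools import reduce


def flatten(l):
    """Concatenate the flattened children of l; wrap a non-list in a list."""
    if not isinstance(l, list):
        return [l]
    return reduce(lambda x, y: x + y, map(flatten, l), [])


test = [[1, 2, 3, [3, 4, 5], [6, 7, [8, 9, [10, [11, [12, 13, 14]]]]]], ]
print(flatten(test))
# [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```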

Answer 27

This will flatten a list or dictionary (or a list of lists, a dictionary of dictionaries, etc.). It assumes that the values are strings, and it creates a single string that joins each item with the separator argument. If you want, you can use the separator afterward to split the result into a list. It uses recursion if the next value is a list or a dictionary. Use the key argument to say whether you want the keys or the values (set key to False) from dictionary objects.

def flatten_obj(n_obj, key=True, my_sep=''):
    my_string = ''
    if type(n_obj) == list:
        for val in n_obj:
            my_sep_setter = my_sep if my_string != '' else ''
            if type(val) == list or type(val) == dict:
                my_string += my_sep_setter + flatten_obj(val, key, my_sep)
            else:
                my_string += my_sep_setter + val
    elif type(n_obj) == dict:
        for k, v in n_obj.items():
            my_sep_setter = my_sep if my_string != '' else ''
            d_val = k if key else v
            if type(v) == list or type(v) == dict:
                my_string += my_sep_setter + flatten_obj(v, key, my_sep)
            else:
                my_string += my_sep_setter + d_val
    elif type(n_obj) == str:
        my_sep_setter = my_sep if my_string != '' else ''
        my_string += my_sep_setter + n_obj
        return my_string
    return my_string

print(flatten_obj(['just', 'a', ['test', 'to', 'try'], 'right', 'now', ['or', 'later', 'today'],
                [{'dictionary_test': 'test'}, {'dictionary_test_two': 'later_today'}, 'my power is 9000']], my_sep=', '))

yields:

just, a, test, to, try, right, now, or, later, today, dictionary_test, dictionary_test_two, my power is 9000

Answer 28

If you like recursion, this might be a solution of interest to you:

def f(E):
    if E==[]: 
        return []
    elif type(E) != list: 
        return [E]
    else:
        a = f(E[0])
        b = f(E[1:])
        a.extend(b)
        return a

I actually adapted this from some practice Scheme code that I had written a while back.

Enjoy!


Answer 29

I’m new to Python and come from a Lisp background. This is what I came up with (check out the var names for lulz):

def flatten(lst):
    if lst:
        car,*cdr=lst
        if isinstance(car,(list,tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]
    return []  # empty input: return a list so `+` above never sees None

Seems to work. Test:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

returns:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]