Using @property versus getters and setters

Question: Using @property versus getters and setters

Here is a pure Python-specific design question:

class MyClass(object):
    ...
    def get_my_attr(self):
        ...

    def set_my_attr(self, value):
        ...

and

class MyClass(object):
    ...        
    @property
    def my_attr(self):
        ...

    @my_attr.setter
    def my_attr(self, value):
        ...

Python lets us do it either way. If you were designing a Python program, which approach would you use, and why?


Answer 0

Prefer properties. It’s what they’re there for.

The reason is that all attributes are public in Python. Starting names with an underscore or two is just a warning that the given attribute is an implementation detail that may not stay the same in future versions of the code. It doesn’t prevent you from actually getting or setting that attribute. Therefore, standard attribute access is the normal, Pythonic way of, well, accessing attributes.

The advantage of properties is that they are syntactically identical to attribute access, so you can change from one to another without any changes to client code. You could even have one version of a class that uses properties (say, for code-by-contract or debugging) and one that doesn’t for production, without changing the code that uses it. At the same time, you don’t have to write getters and setters for everything just in case you might need to better control access later.
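
As a minimal sketch of that migration (the class names `PointV1` and `PointV2` and the non-negative rule are made up for illustration), the same client code runs unchanged against a plain attribute and against a property:

```python
class PointV1:
    """First version: x is a plain public attribute."""

    def __init__(self, x):
        self.x = x


class PointV2:
    """Later version: x became a property that validates on assignment."""

    def __init__(self, x):
        self.x = x  # goes through the setter below

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value


def client_code(point):
    # Identical syntax for both versions: plain attribute access.
    point.x = point.x + 1
    return point.x


print(client_code(PointV1(1)), client_code(PointV2(1)))
```

Only the class definition changed between the two versions; `client_code` did not.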


Answer 1

In Python you don’t use getters or setters or properties just for the fun of it. You first just use attributes and then later, only if needed, eventually migrate to a property without having to change the code using your classes.

There is indeed a lot of code with extension .py that uses getters and setters and inheritance and pointless classes everywhere where e.g. a simple tuple would do, but it’s code from people writing in C++ or Java using Python.

That’s not Python code.


Answer 2

Using properties lets you begin with normal attribute accesses and then back them up with getters and setters afterwards as necessary.


Answer 3

The short answer is: properties win hands down. Always.

There is sometimes a need for getters and setters, but even then, I would “hide” them from the outside world. There are plenty of ways to do this in Python (getattr, setattr, __getattribute__, etc.), but a very concise and clean one is:

def set_email(self, value):
    if '@' not in value:
        raise Exception("This doesn't look like an email address.")
    self._email = value

def get_email(self):
    return self._email

email = property(get_email, set_email)

Here’s a brief article that introduces the topic of getters and setters in Python.
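
The fragment above is meant to sit inside a class body. A self-contained sketch of how it might look in context follows; the class name `Account` is hypothetical, and `ValueError` is used here instead of the bare `Exception` from the snippet:

```python
class Account:
    def __init__(self, email):
        self.email = email  # routed through set_email

    def get_email(self):
        return self._email

    def set_email(self, value):
        if '@' not in value:
            raise ValueError("This doesn't look like an email address.")
        self._email = value

    # Callers use account.email; the accessor methods stay an implementation detail.
    email = property(get_email, set_email)


account = Account("alice@example.com")
account.email = "bob@example.com"
print(account.email)
```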


Answer 4

[TL;DR? You can skip to the end for a code example.]

I actually prefer to use a different idiom, which is a little involved for using as a one off, but is nice if you have a more complex use case.

A bit of background first.

Properties are useful in that they allow us to handle both setting and getting values in a programmatic way but still allow attributes to be accessed as attributes. We can turn ‘gets’ into ‘computations’ (essentially) and we can turn ‘sets’ into ‘events’. So let’s say we have the following class, which I’ve coded with Java-like getters and setters.

class Example(object):
    def __init__(self, x=None, y=None):
        self.x = x
        self.y = y

    def getX(self):
        return self.x or self.defaultX()

    def getY(self):
        return self.y or self.defaultY()

    def setX(self, x):
        self.x = x

    def setY(self, y):
        self.y = y

    def defaultX(self):
        return someDefaultComputationForX()

    def defaultY(self):
        return someDefaultComputationForY()

You may be wondering why I didn’t call defaultX and defaultY in the object’s __init__ method. The reason is that for our case I want to assume that the someDefaultComputation methods return values that vary over time, say a timestamp, and whenever x (or y) is not set (where, for the purpose of this example, “not set” means “set to None”) I want the value of x‘s (or y‘s) default computation.

So this is lame for a number of reasons described above. I’ll rewrite it using properties:

class Example(object):
    def __init__(self, x=None, y=None):
        self._x = x
        self._y = y

    @property
    def x(self):
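        # Read the backing field; self.x here would call this getter again forever.
        return self._x or self.defaultX()

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        # Same pattern: use the backing field _y, not the property itself.
        return self._y or self.defaultY()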

    @y.setter
    def y(self, value):
        self._y = value

    # default{XY} as before.

What have we gained? We’ve gained the ability to refer to these attributes as attributes even though, behind the scenes, we end up running methods.

Of course the real power of properties is that we generally want these methods to do something in addition to just getting and setting values (otherwise there is no point in using properties). I did this in my getter example. We are basically running a function body to pick up a default whenever the value isn’t set. This is a very common pattern.

But what are we losing, and what can’t we do?

The main annoyance, in my view, is that if you define a getter (as we do here) you also have to define a setter.[1] That’s extra noise that clutters the code.

Another annoyance is that we still have to initialize the x and y values in __init__. (Well, of course we could add them using setattr() but that is more extra code.)

Third, unlike in the Java-like example, getters cannot accept other parameters. Now I can hear you saying already, well, if it’s taking parameters it’s not a getter! In an official sense, that is true. But in a practical sense there is no reason we shouldn’t be able to parameterize a named attribute, like x, and set its value for some specific parameters.

It’d be nice if we could do something like:

e.x[a,b,c] = 10
e.x[d,e,f] = 20

for example. The closest we can get is to override the assignment to imply some special semantics:

e.x = [a,b,c,10]
e.x = [d,e,f,20]

and of course ensure that our setter knows how to extract the first three values as a key to a dictionary and set its value to a number or something.

But even if we did that we still couldn’t support it with properties because there is no way to get the value because we can’t pass parameters at all to the getter. So we’ve had to return everything, introducing an asymmetry.

The Java-style getter/setter does let us handle this, but we’re back to needing getter/setters.
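
For what it is worth, the subscript syntax itself can be approximated with an ordinary read-only property whose getter returns a dict keyed by tuples. This is a sketch of a different technique from the one developed below, and the class name `Grid` and the attribute `_x_values` are made up for illustration:

```python
class Grid:
    def __init__(self):
        self._x_values = {}

    @property
    def x(self):
        # Return the mutable mapping itself; subscripting happens on the dict.
        return self._x_values


e = Grid()
e.x[1, 2, 3] = 10
e.x[4, 5, 6] = 20
print(e.x[1, 2, 3], e.x[4, 5, 6])
```

This covers parameterized get and set symmetrically, but it does not provide the computed defaults that the rest of this answer is after.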

In my mind what we really want is something that captures the following requirements:

  • Users define just one method for a given attribute and can indicate there whether the attribute is read-only or read-write. Properties fail this test if the attribute is writable.

  • There is no need for the user to define an extra variable underlying the function, so we don’t need the __init__ or setattr in the code. The variable just exists by the fact we’ve created this new-style attribute.

  • Any default code for the attribute executes in the method body itself.

  • We can set the attribute as an attribute and reference it as an attribute.

  • We can parameterize the attribute.

In terms of code, we want a way to write:

def x(self, *args):
    return defaultX()

and be able to then do:

print e.x     -> The default at time T0
e.x = 1
print e.x     -> 1
e.x = None
print e.x     -> The default at time T1

and so forth.

We also want a way to do this for the special case of a parameterizable attribute, but still allow the default assign case to work. You’ll see how I tackled this below.

Now to the point (yay! the point!). The solution I came up with for this is as follows.

We create a new object to replace the notion of a property. The object is intended to store the value of a variable set to it, but also maintains a handle on code that knows how to calculate a default. Its job is to store the set value or to run the method if that value is not set.

Let’s call it an UberProperty.

class UberProperty(object):

    def __init__(self, method):
        self.method = method
        self.value = None
        self.isSet = False

    def setValue(self, value):
        self.value = value
        self.isSet = True

    def clearValue(self):
        self.value = None
        self.isSet = False

I assume method here is a class method, value is the value of the UberProperty, and I have added isSet because None may be a real value and this allows us a clean way to declare there really is “no value”. Another way is a sentinel of some sort.
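
The sentinel alternative mentioned above could look roughly like this. It is only a sketch: the names `_UNSET` and `SentinelProperty` are hypothetical, and `isSet` becomes a method rather than a flag:

```python
_UNSET = object()  # unique sentinel: no legitimate value can be identical to it


class SentinelProperty:
    """Variant of the UberProperty above that drops the isSet flag."""

    def __init__(self, method):
        self.method = method
        self.value = _UNSET

    def setValue(self, value):
        self.value = value

    def clearValue(self):
        self.value = _UNSET

    def isSet(self):
        # None counts as a real value; only the sentinel means "no value".
        return self.value is not _UNSET


p = SentinelProperty(method=None)
p.setValue(None)
was_set = p.isSet()
p.clearValue()
print(was_set, p.isSet())
```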

This basically gives us an object that can do what we want, but how do we actually put it on our class? Well, properties use decorators; why can’t we? Let’s see how it might look (from here on I’m going to stick to using just a single ‘attribute’, x).

class Example(object):

    @uberProperty
    def x(self):
        return defaultX()

This doesn’t actually work yet, of course. We have to implement uberProperty and make sure it handles both gets and sets.

Let’s start with gets.

My first attempt was to simply create a new UberProperty object and return it:

def uberProperty(f):
    return UberProperty(f)

I quickly discovered, of course, that this doesn’t work: Python never binds the callable to the object and I need the object in order to call the function. Even creating the decorator in the class doesn’t work, as although now we have the class, we still don’t have an object to work with.

So we’re going to need to be able to do more here. We do know that a method need only be represented the one time, so let’s go ahead and keep our decorator, but modify UberProperty to only store the method reference:

class UberProperty(object):

    def __init__(self, method):
        self.method = method

It is also not callable, so at the moment nothing is working.

How do we complete the picture? Well, what do we end up with when we create the example class using our new decorator:

class Example(object):

    @uberProperty
    def x(self):
        return defaultX()

print Example.x     <__main__.UberProperty object at 0x10e1fb8d0>
print Example().x   <__main__.UberProperty object at 0x10e1fb8d0>

in both cases we get back the UberProperty which of course is not a callable, so this isn’t of much use.

What we need is some way to dynamically bind the UberProperty instance created by the decorator after the class has been created to an object of the class before that object has been returned to that user for use. Um, yeah, that’s an __init__ call, dude.

Let’s write up what we want our final result to be first. We’re binding an UberProperty to an instance, so an obvious thing to return would be a BoundUberProperty. This is where we’ll actually maintain state for the x attribute.

class BoundUberProperty(object):
    def __init__(self, obj, uberProperty):
        self.obj = obj
        self.uberProperty = uberProperty
        self.isSet = False

    def setValue(self, value):
        self.value = value
        self.isSet = True

    def getValue(self):
        return self.value if self.isSet else self.uberProperty.method(self.obj)

    def clearValue(self):
        del self.value
        self.isSet = False

Now we have the representation; how do we get these onto an object? There are a few approaches, but the easiest one to explain just uses the __init__ method to do that mapping. By the time __init__ is called our decorators have run, so we just need to look through the object’s __dict__ and update any attributes where the value of the attribute is of type UberProperty.

Now, uber-properties are cool and we’ll probably want to use them a lot, so it makes sense to just create a base class that does this for all subclasses. I think you know what the base class is going to be called.

class UberObject(object):
    def __init__(self):
        for k in dir(self):
            v = getattr(self, k)
            if isinstance(v, UberProperty):
                v = BoundUberProperty(self, v)
                setattr(self, k, v)

We add this, change our example to inherit from UberObject, and …

e = Example()
print e.x               -> <__main__.BoundUberProperty object at 0x104604c90>

After modifying x to be:

@uberProperty
def x(self):
    return datetime.datetime.now()

We can run a simple test:

print e.x.getValue()
print e.x.getValue()
e.x.setValue(datetime.date(2013, 5, 31))
print e.x.getValue()
e.x.clearValue()
print e.x.getValue()

And we get the output we wanted:

2013-05-31 00:05:13.985813
2013-05-31 00:05:13.986290
2013-05-31
2013-05-31 00:05:13.986310

(Gee, I’m working late.)

Note that I have used getValue, setValue, and clearValue here. This is because I haven’t yet linked in the means to have these automatically returned.

But I think this is a good place to stop for now, because I’m getting tired. You can also see that the core functionality we wanted is in place; the rest is window dressing. Important usability window dressing, but that can wait until I have a chance to update the post.

I’ll finish up the example in the next posting by addressing these things:

  • We need to make sure UberObject’s __init__ is always called by subclasses.

    • So we either force it to be called somewhere or we prevent it from being implemented.
    • We’ll see how to do this with a metaclass.
  • We need to make sure we handle the common case where someone ‘aliases’ a function to something else, such as:

      class Example(object):
          @uberProperty
          def x(self):
              ...
    
          y = x
    
  • We need e.x to return e.x.getValue() by default.

    • What we’ll actually see is this is one area where the model fails.
    • It turns out we’ll always need to use a function call to get the value.
    • But we can make it look like a regular function call and avoid having to use e.x.getValue(). (Doing this one is obvious, if you haven’t already figured it out.)
  • We need to support setting e.x directly, as in e.x = <newvalue>. We can do this in the parent class too, but we’ll need to update our __init__ code to handle it.

  • Finally, we’ll add parameterized attributes. It should be pretty obvious how we’ll do this, too.

Here’s the code as it exists up to now:

import datetime

class UberObject(object):
    def uberSetter(self, value):
        print 'setting'

    def uberGetter(self):
        return self

    def __init__(self):
        for k in dir(self):
            v = getattr(self, k)
            if isinstance(v, UberProperty):
                v = BoundUberProperty(self, v)
                setattr(self, k, v)


class UberProperty(object):
    def __init__(self, method):
        self.method = method

class BoundUberProperty(object):
    def __init__(self, obj, uberProperty):
        self.obj = obj
        self.uberProperty = uberProperty
        self.isSet = False

    def setValue(self, value):
        self.value = value
        self.isSet = True

    def getValue(self):
        return self.value if self.isSet else self.uberProperty.method(self.obj)

    def clearValue(self):
        del self.value
        self.isSet = False

# Module-level decorator; it must not be indented into BoundUberProperty.
def uberProperty(f):
    return UberProperty(f)

class Example(UberObject):

    @uberProperty
    def x(self):
        return datetime.datetime.now()

[1] I may be behind on whether this is still the case.
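
The follow-up post is not part of this page. As a rough sketch of how plain e.x and e.x = value can be supported, Python's descriptor protocol (`__get__`, `__set__`, `__set_name__`) is one option. This is not necessarily what the author had in mind, and the names `DefaultingProperty` and `Clock` are hypothetical. Assigning None clears the value, matching the behaviour wished for earlier in this answer:

```python
import itertools


class DefaultingProperty:
    """Descriptor: returns the stored value, or calls `method` for a default."""

    def __init__(self, method):
        self.method = method

    def __set_name__(self, owner, name):
        # Per-instance storage slot, e.g. "_x" for an attribute named "x".
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        value = getattr(obj, self.slot, None)
        return self.method(obj) if value is None else value

    def __set__(self, obj, value):
        setattr(obj, self.slot, value)


class Clock:
    _ticks = itertools.count()  # stand-in for a time-varying default

    @DefaultingProperty
    def x(self):
        return next(self._ticks)


e = Clock()
first = e.x      # default computed at access time
e.x = 100
stored = e.x     # the explicitly set value
e.x = None       # "unset" again
later = e.x      # a fresh default
print(first, stored, later)
```

Because the descriptor lives on the class and keeps its state on the instance, no base class or `__init__` scan is needed in this variant.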


Answer 5

I think both have their place. One issue with using @property is that it is hard to extend the behaviour of getters or setters in subclasses using standard class mechanisms. The problem is that the actual getter/setter functions are hidden in the property.

You can actually get hold of the functions, e.g. with

class C(object):
    _p = 1
    @property
    def p(self):
        return self._p
    @p.setter
    def p(self, val):
        self._p = val

you can access the getter and setter functions as C.p.fget and C.p.fset, but you can’t easily use the normal method inheritance (e.g. super) facilities to extend them. After some digging into the intricacies of super, you can indeed use super in this way:

# Using super():
class D(C):
    # Cannot use super(D,D) here to define the property
    # since D is not yet defined in this scope.
    @property
    def p(self):
        return super(D,D).p.fget(self)

    @p.setter
    def p(self, val):
        print 'Implement extra functionality here for D'
        super(D,D).p.fset(self, val)

# Using a direct reference to C
class E(C):
    p = C.p

    @p.setter
    def p(self, val):
        print 'Implement extra functionality here for E'
        C.p.fset(self, val)

Using super() is, however, quite clunky, since the property has to be redefined, and you have to use the slightly counter-intuitive super(cls,cls) mechanism to get an unbound copy of p.
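
On Python 3 the same idea can be written a little more compactly. In the sketch below, C is re-declared so the block is self-contained, and `log` is a hypothetical list used only to make the extra behaviour observable. The getter can use zero-argument super(), but the setter still has to reach for fset explicitly, because super objects do not support attribute assignment:

```python
class C:
    _p = 1

    @property
    def p(self):
        return self._p

    @p.setter
    def p(self, val):
        self._p = val


log = []  # hypothetical: records calls so the extra behaviour is visible


class D(C):
    @property
    def p(self):
        # Reading through super() triggers the parent's getter.
        return super().p

    @p.setter
    def p(self, val):
        log.append(val)  # extra functionality for D
        # Class-level lookup returns the property object itself.
        super(D, D).p.fset(self, val)


d = D()
d.p = 5
print(d.p, log)
```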


Answer 6

Using properties is to me more intuitive and fits better into most code.

Comparing

o.x = 5
ox = o.x

vs.

o.setX(5)
ox = o.getX()

To me it is quite obvious which is easier to read. Properties also make private variables much easier to work with.


Answer 7

I would prefer to use neither in most cases. The problem with properties is that they make the class less transparent. Especially, this is an issue if you were to raise an exception from a setter. For example, if you have an Account.email property:

class Account(object):
    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        if '@' not in value:
            raise ValueError('Invalid email address.')
        self._email = value

then the user of the class does not expect that assigning a value to the property could cause an exception:

a = Account()
a.email = 'badaddress'
--> ValueError: Invalid email address.

As a result, the exception may go unhandled, and either propagate too high in the call chain to be handled properly, or result in a very unhelpful traceback being presented to the program user (which is sadly too common in the world of Python and Java).

I would also avoid using getters and setters:

  • because defining them for all properties in advance is very time consuming,
  • makes the amount of code unnecessarily longer, which makes understanding and maintaining the code more difficult,
  • if you were to define them for properties only as needed, the interface of the class would change, hurting all users of the class

Instead of properties and getters/setters I prefer doing the complex logic in well defined places such as in a validation method:

class Account(object):
    ...
    def validate(self):
        if '@' not in self.email:
            raise ValueError('Invalid email address.')

or a similar Account.save method.

Note that I am not trying to say that there are no cases when properties are useful, only that you may be better off if you can make your classes simple and transparent enough that you don’t need them.
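
A minimal sketch of that style, assuming a hypothetical save method that calls validate before persisting (the persistence itself is left out):

```python
class Account:
    def __init__(self, email):
        self.email = email  # plain attribute: assignment never raises

    def validate(self):
        if '@' not in self.email:
            raise ValueError('Invalid email address.')

    def save(self):
        # Validation happens at one explicit, well-known point.
        self.validate()
        return True  # placeholder for real persistence


a = Account('badaddress')   # no exception here
try:
    a.save()
except ValueError as exc:
    print('rejected:', exc)
```

The trade-off is that an invalid value can sit on the object until something calls validate or save.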


Answer 8

I feel like properties are about letting you get the overhead of writing getters and setters only when you actually need them.

Java programming culture strongly advises never giving direct access to properties and instead going through getters and setters, and only those that are actually needed. It’s a bit verbose to always write these obvious pieces of code, and notice that 70% of the time they are never replaced by any non-trivial logic.

In Python, people actually care about that kind of overhead, so you can embrace the following practice:

  • Do not use getters and setters at first, if they are not needed.
  • Use @property to implement them without changing the syntax of the rest of your code.

Answer 9

I am surprised that nobody has mentioned that properties are bound methods of a descriptor class. Adam Donohue and NeilenMarais get at exactly this idea in their posts, namely that getters and setters are functions and can be used to:

  • validate
  • alter data
  • duck type (coerce type to another type)

This presents a smart way to hide implementation details and code cruft like regular expression, type casts, try .. except blocks, assertions or computed values.
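
As an illustration of those three uses in a single property (the class `Reading` and its `celsius` attribute are hypothetical):

```python
class Reading:
    def __init__(self, celsius):
        self.celsius = celsius

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        value = float(value)             # coerce: accepts "21.5", 21, 21.5
        if value < -273.15:              # validate
            raise ValueError("below absolute zero")
        self._celsius = round(value, 1)  # alter data before storing


r = Reading("21.46")
print(r.celsius)
```

Callers only ever write r.celsius = something; the coercion, check and rounding stay hidden behind the attribute syntax.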

In general, doing CRUD on an object may often be fairly mundane, but consider the example of data that will be persisted to a relational database. ORMs can hide the implementation details of particular SQL vernaculars in the methods bound to fget, fset, fdel defined in a property class that will manage the awful if .. elif .. else ladders that are so ugly in OO code, exposing the simple and elegant self.variable = something and obviating the details for the developer using the ORM.

If one thinks of properties only as some dreary vestige of a Bondage and Discipline language (i.e. Java) they are missing the point of descriptors.


Answer 10

In complex projects I prefer using read-only properties (or getters) with an explicit setter function:

class MyClass(object):
    ...
    @property
    def my_attr(self):
        ...

    def set_my_attr(self, value):
        ...

In long-lived projects, debugging and refactoring take more time than writing the code itself. Using @property.setter has several downsides that make debugging even harder:

1) Python allows creating new attributes on an existing object. This makes the following typo very hard to track down:

my_object.my_atttr = 4.

If your object is a complicated algorithm, you will spend quite some time trying to find out why it doesn’t converge (notice the extra ‘t’ in the line above).

2) A setter may evolve into a complicated and slow method (e.g. one that hits a database). It would be quite hard for another developer to figure out why the following function is very slow. They might spend a lot of time profiling the do_something() method, while my_object.my_attr = 4. is actually the cause of the slowdown:

def slow_function(my_object):
    my_object.my_attr = 4.
    my_object.do_something()
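
The first downside can be reproduced in a few lines. This is only a sketch: the Solver class and its tolerance attribute are made-up names, not part of the answer.

```python
class Solver:
    """Hypothetical class with a validating property and an explicit setter."""

    def __init__(self):
        self._tolerance = 1e-6

    @property
    def tolerance(self):
        return self._tolerance

    @tolerance.setter
    def tolerance(self, value):
        if value <= 0:
            raise ValueError("tolerance must be positive")
        self._tolerance = value

    def set_tolerance(self, value):
        self.tolerance = value


solver = Solver()
solver.tolerence = 1e-3      # typo: silently creates a new attribute
print(solver.tolerance)      # 1e-06, the real setting is unchanged

try:
    solver.set_tolerence(1e-3)   # the same typo on a method call fails loudly
except AttributeError as exc:
    print("caught:", exc)
```

The misspelled assignment succeeds without complaint, while the misspelled method call raises AttributeError immediately.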

回答 11

无论@property与传统的getter和setter方法各有优点。这取决于您的用例。

优点 @property

  • 您无需在更改数据访问的实现时更改接口。当您的项目较小时,您可能希望使用直接属性访问来访问类成员。例如,假设您有一个foo类型为object的对象Foo,该对象具有一个member num。然后,您只需使用即可获得此成员num = foo.num。随着项目的发展,您可能会觉得需要对简单的属性访问进行一些检查或调试。然后,您可以@property 使用。数据访问接口保持不变,因此无需修改客户端代码。

    引用自PEP-8

    对于简单的公共数据属性,最好仅公开属性名称,而不使用复杂的访问器/更改器方法。请记住,如果您发现简单的数据属性需要增强功能行为,则Python为将来的增强提供了简便的方法。在这种情况下,使用属性将功能实现隐藏在简单的数据属性访问语法之后。

  • 使用@property在Python中的数据访问被认为是Python的

    • 它可以增强您作为Python(不是Java)程序员的自我认同。

    • 如果您的面试官认为Java风格的getter和setter是反模式的,那么它可以帮助您进行工作面试。

传统getter和setter的优点

  • 与简单的属性访问相比,传统的getter和setter允许更复杂的数据访问。例如,当您设置一个类成员时,有时您需要一个标志来指示您希望在哪里强制执行此操作,即使某些情况看起来并不完美。虽然如何扩展直接成员访问权限(如)并不明显foo.num = num,但您可以通过附加force参数轻松扩展传统的setter :

    class Foo:
        def set_num(self, num, force=False):
            ...
  • 传统的getter和setter 明确表明,类成员访问是通过方法进行的。这表示:

    • 结果所得到的结果可能与该类中确切存储的结果不同。

    • 即使访问看起来像简单的属性访问,其性能也可能相差很大。

    除非您的Class用户希望@property在每个属性访问语句后都隐藏起来,否则将其明确表示可以最大程度地减少您的Class用户的意外情况。

  • @NeilenMarais本文所提到的,在子类中扩展传统的getter和setters比扩展属性更容易。

  • 长期以来,传统的吸气剂和吸气剂已以多种语言广泛使用。如果您的团队中有来自不同背景的人员,那么他们看起来比熟悉@property。另外,随着项目的发展,如果您可能需要从Python迁移到另一种不具备的语言,则@property使用传统的getter和setter可以使迁移过程更加顺畅。

注意事项

  • @property即使您使用传统的getter和setter 都不将类成员设为私有,即使您在其名称前使用双下划线也是如此:

    class Foo:
        def __init__(self):
            self.__num = 0
    
        @property
        def num(self):
            return self.__num
    
        @num.setter
        def num(self, num):
            self.__num = num
    
        def get_num(self):
            return self.__num
    
        def set_num(self, num):
            self.__num = num
    
    foo = Foo()
    print(foo.num)          # output: 0
    print(foo.get_num())    # output: 0
    print(foo._Foo__num)    # output: 0

Both @property and traditional getters and setters have their advantages. It depends on your use case.

Advantages of @property

  • You don’t have to change the interface while changing the implementation of data access. When your project is small, you probably want to use direct attribute access to access a class member. For example, let’s say you have an object foo of type Foo, which has a member num. Then you can simply get this member with num = foo.num. As your project grows, you may feel like there needs to be some checks or debugs on the simple attribute access. Then you can do that with a @property within the class. The data access interface remains the same so that there is no need to modify client code.

    Cited from PEP-8:

    For simple public data attributes, it is best to expose just the attribute name, without complicated accessor/mutator methods. Keep in mind that Python provides an easy path to future enhancement, should you find that a simple data attribute needs to grow functional behavior. In that case, use properties to hide functional implementation behind simple data attribute access syntax.

  • Using @property for data access in Python is regarded as Pythonic:

    • It can strengthen your self-identification as a Python (not Java) programmer.

    • It can help your job interview if your interviewer thinks Java-style getters and setters are anti-patterns.

Advantages of traditional getters and setters

  • Traditional getters and setters allow for more complicated data access than simple attribute access. For example, when you are setting a class member, sometimes you need a flag indicating whether you would like to force this operation even if something doesn’t look perfect. While it is not obvious how to augment a direct member access like foo.num = num, you can easily augment your traditional setter with an additional force parameter:

    class Foo:
        def set_num(self, num, force=False):
            ...
    
  • Traditional getters and setters make it explicit that a class member access is through a method. This means:

    • What you get as the result may not be the same as what is exactly stored within that class.

    • Even if the access looks like a simple attribute access, the performance can vary greatly from that.

    Unless your class users expect a @property hiding behind every attribute access statement, making such things explicit can help minimize their surprises.

  • As mentioned by @NeilenMarais and in this post, extending traditional getters and setters in subclasses is easier than extending properties.

  • Traditional getters and setters have been widely used for a long time in different languages. If you have people from different backgrounds in your team, they look more familiar than @property. Also, as your project grows, if you may need to migrate from Python to another language that doesn’t have @property, using traditional getters and setters would make the migration smoother.
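
The force parameter mentioned in the first advantage could look roughly like this. MAX_NUM and the range check are assumptions added so the sketch has something to override; the answer itself leaves the setter body open.

```python
class Foo:
    """Hypothetical class illustrating a setter with an extra flag."""

    MAX_NUM = 100  # assumed limit, for illustration only

    def __init__(self):
        self._num = 0

    def get_num(self):
        return self._num

    def set_num(self, num, force=False):
        # Reject out-of-range values unless the caller explicitly forces them.
        if num > self.MAX_NUM and not force:
            raise ValueError("num exceeds MAX_NUM; pass force=True to override")
        self._num = num


foo = Foo()
foo.set_num(500, force=True)
print(foo.get_num())  # 500
```

An assignment such as foo.num = 500 has no place for the extra argument, which is the point the answer makes.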

Caveats

  • Neither @property nor traditional getters and setters makes the class member private, even if you use double underscore before its name:

    class Foo:
        def __init__(self):
            self.__num = 0
    
        @property
        def num(self):
            return self.__num
    
        @num.setter
        def num(self, num):
            self.__num = num
    
        def get_num(self):
            return self.__num
    
        def set_num(self, num):
            self.__num = num
    
    foo = Foo()
    print(foo.num)          # output: 0
    print(foo.get_num())    # output: 0
    print(foo._Foo__num)    # output: 0
    

回答 12

这是“有效的Python:编写更好的Python的90种特定方法”的摘录(很棒的书。我强烈推荐它)。

要记住的事情

✦ 使用简单的公共属性定义新的类接口,并避免定义setter和getter方法。

✦ 必要时,使用@property定义在对象上访问属性时的特殊行为。

✦ 在您的@property方法中遵循最小惊喜规则,并避免出现奇怪的副作用。

✦确保@property方法是快速的;对于缓慢或复杂的工作(尤其是涉及I / O或引起副作用的工作),请改用常规方法。

@property的一种高级但通用的用法是将曾经简单的数字属性转换为即时计算。这非常有用,因为它使您可以将类的所有现有用法迁移到新行为,而无需重写任何调用站点(如果您无法控制调用代码,这尤其重要)。@property还提供了一个重要的权宜之计,用于随着时间的推移改进接口。

我特别喜欢@property,因为它可以让您随着时间的推移逐步向更好的数据模型发展。
@property是一个工具,可帮助您解决在实际代码中遇到的问题。不要过度使用它。当您发现自己反复扩展@property方法时,可能是时候重构您的类,而不是进一步讨论代码的不良设计了。

✦使用@property为现有实例属性赋予新功能。

✦ 通过使用@property,逐步朝着更好的数据模型发展。

✦ 当您过多地使用@property时,请考虑重构一个类和所有调用站点。

Here is an excerpt from “Effective Python: 90 Specific Ways to Write Better Python” (Amazing book. I highly recommend it).

Things to Remember

✦ Define new class interfaces using simple public attributes and avoid defining setter and getter methods.

✦ Use @property to define special behavior when attributes are accessed on your objects, if necessary.

✦ Follow the rule of least surprise and avoid odd side effects in your @property methods.

✦ Ensure that @property methods are fast; for slow or complex work—especially involving I/O or causing side effects—use normal methods instead.

One advanced but common use of @property is transitioning what was once a simple numerical attribute into an on-the-fly calculation. This is extremely helpful because it lets you migrate all existing usage of a class to have new behaviors without requiring any of the call sites to be rewritten (which is especially important if there’s calling code that you don’t control). @property also provides an important stopgap for improving interfaces over time.

I especially like @property because it lets you make incremental progress toward a better data model over time.
@property is a tool to help you address problems you’ll come across in real-world code. Don’t overuse it. When you find yourself repeatedly extending @property methods, it’s probably time to refactor your class instead of further paving over your code’s poor design.

✦ Use @property to give existing instance attributes new functionality.

✦ Make incremental progress toward better data models by using @property.

✦ Consider refactoring a class and all call sites when you find yourself using @property too heavily.
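
As a small illustration of turning a once-stored attribute into an on-the-fly calculation, consider the sketch below. The Rectangle class is hypothetical and is not taken from the book.

```python
class Rectangle:
    """Hypothetical class; `area` used to be a plain stored attribute."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Computed on every access, so it cannot go stale.
        return self.width * self.height


r = Rectangle(3, 4)
print(r.area)   # 12
r.width = 5
print(r.area)   # 20
```

Existing call sites that read r.area keep working unchanged, which is the migration benefit the excerpt describes.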


如何在Python中打印异常?

问题:如何在Python中打印异常?

try:
    something here
except:
    print('the whatever error occurred.')

如何在except:块中打印错误/异常?

try:
    something here
except:
    print('the whatever error occurred.')

How can I print the error/exception in my except: block?


回答 0

对于Python 2.6和更高版本以及Python 3.x:

except Exception as e: print(e)

对于Python 2.5及更早版本,请使用:

except Exception,e: print str(e)

For Python 2.6 and later and Python 3.x:

except Exception as e: print(e)

For Python 2.5 and earlier, use:

except Exception,e: print str(e)

回答 1

traceback模块提供了格式化和打印异常及其回溯的方法,例如,它将像默认处理程序那样打印异常:

import traceback

try:
    1/0
except Exception:
    traceback.print_exc()

输出:

Traceback (most recent call last):
  File "C:\scripts\divide_by_zero.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero

The traceback module provides methods for formatting and printing exceptions and their tracebacks, e.g. this would print the exception like the default handler does:

import traceback

try:
    1/0
except Exception:
    traceback.print_exc()

Output:

Traceback (most recent call last):
  File "C:\scripts\divide_by_zero.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero

回答 2

Python 2.6或更高版本中,它更干净一些:

except Exception as e: print(e)

在旧版本中,它仍然很可读:

except Exception, e: print e

In Python 2.6 or greater it’s a bit cleaner:

except Exception as e: print(e)

In older versions it’s still quite readable:

except Exception, e: print e

回答 3

如果您想传递错误字符串,这是错误和异常(Python 2.6)中的示例

>>> try:
...    raise Exception('spam', 'eggs')
... except Exception as inst:
...    print type(inst)     # the exception instance
...    print inst.args      # arguments stored in .args
...    print inst           # __str__ allows args to be printed directly
...    x, y = inst          # __getitem__ allows args to be unpacked directly
...    print 'x =', x
...    print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs

In case you want to pass error strings, here is an example from Errors and Exceptions (Python 2.6)

>>> try:
...    raise Exception('spam', 'eggs')
... except Exception as inst:
...    print type(inst)     # the exception instance
...    print inst.args      # arguments stored in .args
...    print inst           # __str__ allows args to be printed directly
...    x, y = inst          # __getitem__ allows args to be unpacked directly
...    print 'x =', x
...    print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs

回答 4

(我打算将其作为对@jldupont答案的评论,但我没有足够的声誉。)

我在其他地方也看到过类似@jldupont的答案的答案。FWIW,我认为必须注意以下几点:

except Exception as e:
    print(e)

sys.stdout默认将错误输出打印到。通常,更合适的错误处理方法是:

except Exception as e:
    print(e, file=sys.stderr)

(请注意,您必须import sys执行此操作。)这样,将错误打印到STDERR而不是STDOUT,从而可以进行正确的输出解析/重定向/等。我知道问题完全是关于“打印错误”的,但是在此处指出最佳实践而不是忽略可能导致最终学习不到的标准代码的细节似乎很重要。

我没有traceback在Cat Plus Plus的答案中使用该模块,也许这是最好的方法,但是我想我应该把它扔在那里。

(I was going to leave this as a comment on @jldupont’s answer, but I don’t have enough reputation.)

I’ve seen answers like @jldupont’s answer in other places as well. FWIW, I think it’s important to note that this:

except Exception as e:
    print(e)

will print the error output to sys.stdout by default. A more appropriate approach to error handling in general would be:

except Exception as e:
    print(e, file=sys.stderr)

(Note that you have to import sys for this to work.) This way, the error is printed to STDERR instead of STDOUT, which allows for the proper output parsing/redirection/etc. I understand that the question was strictly about ‘printing an error’, but it seems important to point out the best practice here rather than leave out this detail that could lead to non-standard code for anyone who doesn’t eventually learn better.

I haven’t used the traceback module as in Cat Plus Plus’s answer, and maybe that’s the best way, but I thought I’d throw this out there.
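
Put together as a complete script, the stderr approach might look like the sketch below. The safe_divide helper is a hypothetical name used only to give the except block some context.

```python
import sys


def safe_divide(a, b):
    """Hypothetical helper: returns a / b, or None after reporting the error."""
    try:
        return a / b
    except Exception as e:
        # Diagnostics go to stderr so stdout stays clean for real output.
        print(f"{type(e).__name__}: {e}", file=sys.stderr)
        return None


result = safe_divide(1, 0)
```

Running the script with stdout redirected to a file (python script.py > out.txt) still shows the error message on the terminal.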


回答 5

Python 3: logging

除了使用基本print()功能,logging还可以使用更灵活的模块来记录异常。该logging模块提供了许多额外的功能,例如,将消息记录到给定的日志文件中,使用时间戳记录消息以及有关记录发生位置的其他信息。(有关更多信息,请查看官方文档。)

可以使用模块级功能记录异常,logging.exception()如下所示:

import logging

try:
    1/0
except BaseException:
    logging.exception("An exception was thrown!")

输出:

ERROR:root:An exception was thrown!
Traceback (most recent call last):
  File ".../Desktop/test.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero 

笔记:

  • 该功能logging.exception()只能从异常处理程序中调用

  • logging模块不应在日志记录处理程序中使用,以免出现RecursionError(感谢@PrakharPandey)


备用日志级别

也可以使用关键字参数将异常记录到另一个日志级别,exc_info=True如下所示:

logging.debug("An exception was thrown!", exc_info=True)
logging.info("An exception was thrown!", exc_info=True)
logging.warning("An exception was thrown!", exc_info=True)

Python 3: logging

Instead of using the basic print() function, the more flexible logging module can be used to log the exception. The logging module offers a lot of extra functionality, e.g. logging messages into a given log file, logging messages with timestamps and additional information about where the logging happened. (For more information check out the official documentation.)

Logging an exception can be done with the module-level function logging.exception() like so:

import logging

try:
    1/0
except BaseException:
    logging.exception("An exception was thrown!")

Output:

ERROR:root:An exception was thrown!
Traceback (most recent call last):
  File ".../Desktop/test.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero 

Notes:

  • the function logging.exception() should only be called from an exception handler

  • the logging module should not be used inside a logging handler to avoid a RecursionError (thanks @PrakharPandey)


Alternative log-levels

It’s also possible to log the exception with another log-level by using the keyword argument exc_info=True like so:

logging.debug("An exception was thrown!", exc_info=True)
logging.info("An exception was thrown!", exc_info=True)
logging.warning("An exception was thrown!", exc_info=True)
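
The exc_info=True variant works the same way on a named logger. In the sketch below the logger name "example" and the in-memory stream are assumptions made so the output can be inspected without touching the root logger.

```python
import io
import logging

# Send log records to an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("example")   # logger name is arbitrary
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False                # keep the root logger out of it

try:
    1 / 0
except ZeroDivisionError:
    logger.warning("An exception was thrown!", exc_info=True)

log_text = stream.getvalue()
print(log_text)
```

The captured text starts with the WARNING line and is followed by the usual traceback.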

回答 6

如果您要这样做,可以使用assert语句来完成一次线性错误提升。这将帮助您编写可静态修复的代码并及早检查错误。

assert type(A) is type(""), "requires a string"

If that is what you want, you can raise an error in one line with an assert statement. This will help you write statically fixable code and check errors early.

assert type(A) is type(""), "requires a string"

回答 7

在捕获异常时,几乎可以控制要显示/记录的追溯信息。

编码

with open("not_existing_file.txt", 'r') as text:
    pass

将产生以下回溯:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

打印/记录完整的追溯

正如其他人已经提到的那样,您可以使用traceback模块捕获整个traceback:

import traceback
try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    traceback.print_exc()

这将产生以下输出:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

您可以通过使用日志记录来实现相同目的:

import logging

logger = logging.getLogger(__name__)

try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    logger.error(exception, exc_info=True)

输出:

__main__: 2020-05-27 12:10:47-ERROR- [Errno 2] No such file or directory: 'not_existing_file.txt'
Traceback (most recent call last):
  File "exception_checks.py", line 27, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

仅打印/记录错误名称/消息

您可能对整个追溯不感兴趣,而仅对最重要的信息(例如,异常名称和异常消息)感兴趣,请使用:

try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    print("Exception: {}".format(type(exception).__name__))
    print("Exception message: {}".format(exception))

输出:

Exception: FileNotFoundError
Exception message: [Errno 2] No such file or directory: 'not_existing_file.txt'

You have a lot of control over which information from the traceback is displayed or logged when catching exceptions.

The code

with open("not_existing_file.txt", 'r') as text:
    pass

would produce the following traceback:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

Print/Log the full traceback

As others already mentioned, you can catch the whole traceback by using the traceback module:

import traceback
try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    traceback.print_exc()

This will produce the following output:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

You can achieve the same by using logging:

import logging

logger = logging.getLogger(__name__)

try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    logger.error(exception, exc_info=True)

Output:

__main__: 2020-05-27 12:10:47-ERROR- [Errno 2] No such file or directory: 'not_existing_file.txt'
Traceback (most recent call last):
  File "exception_checks.py", line 27, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

Print/log error name/message only

You might not be interested in the whole traceback, but only in the most important information, such as the exception name and the exception message. In that case, use:

try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    print("Exception: {}".format(type(exception).__name__))
    print("Exception message: {}".format(exception))

Output:

Exception: FileNotFoundError
Exception message: [Errno 2] No such file or directory: 'not_existing_file.txt'
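
If you want the traceback as a string, for example to store it or to trim it yourself, traceback.format_exc() returns the text instead of printing it. A minimal sketch, using a ValueError only as a convenient way to trigger an exception:

```python
import traceback

try:
    int("not a number")
except ValueError:
    # format_exc() returns the same text that print_exc() would print.
    tb_text = traceback.format_exc()

# The last line of a traceback is the "ExceptionName: message" summary.
last_line = tb_text.strip().splitlines()[-1]
print(last_line)
```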

如何在Python中获取绝对文件路径

问题:如何在Python中获取绝对文件路径

给定路径,例如"mydir/myfile.txt",我如何找到相对于Python中当前工作目录的文件的绝对路径?例如在Windows上,我可能最终得到:

"C:/example/cwd/mydir/myfile.txt"

Given a path such as "mydir/myfile.txt", how do I find the file’s absolute path relative to the current working directory in Python? E.g. on Windows, I might end up with:

"C:/example/cwd/mydir/myfile.txt"

回答 0

>>> import os
>>> os.path.abspath("mydir/myfile.txt")
'C:/example/cwd/mydir/myfile.txt'

如果已经是绝对路径,也可以使用:

>>> import os
>>> os.path.abspath("C:/example/cwd/mydir/myfile.txt")
'C:/example/cwd/mydir/myfile.txt'
>>> import os
>>> os.path.abspath("mydir/myfile.txt")
'C:/example/cwd/mydir/myfile.txt'

Also works if it is already an absolute path:

>>> import os
>>> os.path.abspath("C:/example/cwd/mydir/myfile.txt")
'C:/example/cwd/mydir/myfile.txt'
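
To see what abspath does, it helps to compare it with joining the path onto the current working directory by hand. The Python documentation describes the two as equivalent on most platforms; the sketch below assumes such a platform.

```python
import os

relative = os.path.join("mydir", "myfile.txt")

absolute = os.path.abspath(relative)
# Manual equivalent: prepend the current working directory, then normalize.
manual = os.path.normpath(os.path.join(os.getcwd(), relative))

print(absolute)
print(absolute == manual)
print(os.path.isabs(absolute))
```

Note that abspath does not check whether the file exists; it only manipulates the path string.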

回答 1

您可以使用新的Python 3.4库pathlib。(您也可以使用来为Python 2.6或2.7获取它pip install pathlib。)作者写道:“该库的目的是提供一个简单的类层次结构来处理文件系统路径以及用户对其进行的常见操作。”

在Windows中获取绝对路径:

>>> from pathlib import Path
>>> p = Path("pythonw.exe").resolve()
>>> p
WindowsPath('C:/Python27/pythonw.exe')
>>> str(p)
'C:\\Python27\\pythonw.exe'

或在UNIX上:

>>> from pathlib import Path
>>> p = Path("python3.4").resolve()
>>> p
PosixPath('/opt/python3/bin/python3.4')
>>> str(p)
'/opt/python3/bin/python3.4'

文档在这里:https : //docs.python.org/3/library/pathlib.html

You could use the new Python 3.4 library pathlib. (You can also get it for Python 2.6 or 2.7 using pip install pathlib.) The authors wrote: “The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.”

To get an absolute path in Windows:

>>> from pathlib import Path
>>> p = Path("pythonw.exe").resolve()
>>> p
WindowsPath('C:/Python27/pythonw.exe')
>>> str(p)
'C:\\Python27\\pythonw.exe'

Or on UNIX:

>>> from pathlib import Path
>>> p = Path("python3.4").resolve()
>>> p
PosixPath('/opt/python3/bin/python3.4')
>>> str(p)
'/opt/python3/bin/python3.4'

Docs are here: https://docs.python.org/3/library/pathlib.html


回答 2

更好的是,安装模块(位于上PyPI),它将所有os.path功能和其他相关功能包装到对象上的方法中,无论使用什么字符串,都可以使用该方法:

>>> from path import path
>>> path('mydir/myfile.txt').abspath()
'C:\\example\\cwd\\mydir\\myfile.txt'
>>>

Better still, install the module (found on PyPI). It wraps all the os.path functions and other related functions into methods on an object that can be used wherever strings are used:

>>> from path import path
>>> path('mydir/myfile.txt').abspath()
'C:\\example\\cwd\\mydir\\myfile.txt'
>>>

回答 3

今天,您还可以使用unipath基于以下内容的软件包path.pyhttp : //sluggo.scrapping.cc/python/unipath/

>>> from unipath import Path
>>> absolute_path = Path('mydir/myfile.txt').absolute()
Path('C:\\example\\cwd\\mydir\\myfile.txt')
>>> str(absolute_path)
C:\\example\\cwd\\mydir\\myfile.txt
>>>

我建议使用此软件包,因为它为常见的os.path实用程序提供了一个干净的接口

Today you can also use the unipath package which was based on path.py: http://sluggo.scrapping.cc/python/unipath/

>>> from unipath import Path
>>> absolute_path = Path('mydir/myfile.txt').absolute()
Path('C:\\example\\cwd\\mydir\\myfile.txt')
>>> str(absolute_path)
C:\\example\\cwd\\mydir\\myfile.txt
>>>

I would recommend using this package as it offers a clean interface to common os.path utilities.


回答 4

Python 3.4+的更新pathlib实际上回答了这个问题:

from pathlib import Path

relative = Path("mydir/myfile.txt")
absolute = relative.absolute()  # absolute is a Path object

如果只需要一个临时字符串,请记住,您可以将Path对象与中的所有相关功能一起使用os.path,当然包括abspath

from os.path import abspath

absolute = abspath(relative)  # absolute is a str object

Update for Python 3.4+ pathlib that actually answers the question:

from pathlib import Path

relative = Path("mydir/myfile.txt")
absolute = relative.absolute()  # absolute is a Path object

If you only need a temporary string, keep in mind that you can use Path objects with all the relevant functions in os.path, including of course abspath:

from os.path import abspath

absolute = abspath(relative)  # absolute is a str object
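
One detail worth knowing is that Path.absolute() and Path.resolve() are not interchangeable. The sketch below assumes Python 3.6 or later, where resolve() does not require the file to exist; the path itself is made up.

```python
from pathlib import Path

relative = Path("mydir") / ".." / "myfile.txt"

print(relative.absolute())   # keeps the ".." component
print(relative.resolve())    # collapses ".." and follows symlinks
```

Use absolute() when you only want to anchor the path at the current working directory, and resolve() when you want a normalized path.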

回答 5

import os
os.path.abspath(os.path.expanduser(os.path.expandvars(PathNameString)))

请注意expanduser(在Unix上),如果给定的文件(或目录)名称和位置表达式可能包含前导~/(代字号指向用户的主目录),并且expandvars可以处理任何其他环境变量(如$HOME),则这是必需的。

import os
os.path.abspath(os.path.expanduser(os.path.expandvars(PathNameString)))

Note that expanduser is necessary (on Unix) in case the given expression for the file (or directory) name and location may contain a leading ~/(the tilde refers to the user’s home directory), and expandvars takes care of any other environment variables (like $HOME).
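
Wrapped in a function, the one-liner above might be used as follows. The full_path helper and the DATA_DIR variable are hypothetical names introduced for this sketch.

```python
import os


def full_path(path_name):
    """Hypothetical helper: expand variables and ~, then make absolute."""
    expanded = os.path.expandvars(path_name)
    expanded = os.path.expanduser(expanded)
    return os.path.abspath(expanded)


os.environ["DATA_DIR"] = "data"            # assumed variable, for the demo
print(full_path("$DATA_DIR/input.txt"))     # <cwd>/data/input.txt
print(full_path("~/notes.txt"))             # a path under the home directory
```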


回答 6

始终获取当前脚本的文件名权,即使它是从另一个脚本中调用。使用时特别有用subprocess

import sys,os

filename = sys.argv[0]

从那里,您可以使用以下命令获取脚本的完整路径:

>>> os.path.abspath(filename)
'/foo/bar/script.py'

通过/..在目录的层次结构中添加您想要向上跳转的次数,它还使导航文件夹更加容易。

要获取cwd:

>>> os.path.abspath(filename+"/..")
'/foo/bar'

对于父路径:

>>> os.path.abspath(filename+"/../..")
'/foo'

通过"/.."与其他文件名结合使用,您可以访问系统中的任何文件。

This always gets the right filename of the current script, even when it is called from within another script. It is especially useful when using subprocess.

import sys,os

filename = sys.argv[0]

From there, you can get the script’s full path with:

>>> os.path.abspath(filename)
'/foo/bar/script.py'

It also makes it easier to navigate folders: just append /.. as many times as you want to go ‘up’ in the directory hierarchy.

To get the script’s directory (which is not necessarily the current working directory):

>>> os.path.abspath(filename+"/..")
'/foo/bar'

For the parent path:

>>> os.path.abspath(filename+"/../..")
'/foo'

By combining "/.." with other filenames, you can access any file in the system.
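
Appending "/.." and normalizing gives the same result as os.path.dirname, which some readers may find clearer. In this sketch the script_dir helper and the sample filename are hypothetical.

```python
import os
import sys


def script_dir(filename):
    """Hypothetical helper: directory that contains the given script."""
    return os.path.dirname(os.path.abspath(filename))


sample = os.path.join("tools", "script.py")   # made-up relative path
via_dotdot = os.path.abspath(sample + "/..")
via_dirname = script_dir(sample)
print(via_dotdot == via_dirname)

# In a real script you would pass sys.argv[0], as the answer does.
print(script_dir(sys.argv[0]))
```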


回答 7

模块os提供了一种找到Abs路径的方法。

但是,Linux中的大多数路径都以~(波浪号)开头,因此效果不理想。

因此您可以使用srblib它。

>>> import os
>>> os.path.abspath('~/hello/world')
'/home/srb/Desktop/~/hello/world'
>>> from srblib import abs_path
>>> abs_path('~/hello/world')
'/home/srb/hello/world'

使用安装 python3 -m pip install srblib

https://pypi.org/project/srblib/

The os module provides a way to find the absolute path.

However, many paths on Linux are written with a leading ~ (tilde), which os.path.abspath does not expand, so the result is not what you want.

You can use srblib for that.

>>> import os
>>> os.path.abspath('~/hello/world')
'/home/srb/Desktop/~/hello/world'
>>> from srblib import abs_path
>>> abs_path('~/hello/world')
'/home/srb/hello/world'

install it using python3 -m pip install srblib

https://pypi.org/project/srblib/


回答 8

我更喜欢使用glob

以下是列出当前文件夹中所有文件类型的方法:

import glob
for x in glob.glob('*'):
    print(x)

以下是列出当前文件夹中所有(例如).txt文件的方法:

import glob
for x in glob.glob('*.txt'):
    print(x)

以下是列出所选目录中所有文件类型的方法:

import glob
for x in glob.glob('C:/example/hi/hello/*'):
    print(x)

希望这对你有帮助

I prefer to use glob

Here is how to list all files in your current folder:

import glob
for x in glob.glob('*'):
    print(x)

here is how to list all (for example) .txt files in your current folder:

import glob
for x in glob.glob('*.txt'):
    print(x)

Here is how to list all files in a chosen directory:

import glob
for x in glob.glob('C:/example/hi/hello/*'):
    print(x)

hope this helped you


回答 9

如果您使用的是Mac

import os
upload_folder = os.path.abspath("static/img/users")

这将为您提供完整的路径:

print(upload_folder)

将显示以下路径:

/Users/myUsername/PycharmProjects/OBS/static/img/users

If you are on a Mac:

import os
upload_folder = os.path.abspath("static/img/users")

this will give you a full path:

print(upload_folder)

will show the following path:

/Users/myUsername/PycharmProjects/OBS/static/img/users

回答 10

如果有人使用python和linux并寻找文件的完整路径:

>>> import os
>>> path = os.popen("readlink -f file").read().strip()
>>> print(path)
/abs/path/to/file

In case someone is using python and linux and looking for full path to file:

>>> import os
>>> path = os.popen("readlink -f file").read().strip()
>>> print(path)
/abs/path/to/file

如何在Python中定义二维数组

问题:如何在Python中定义二维数组

我想定义一个没有初始化长度的二维数组,如下所示:

Matrix = [][]

但这不起作用…

我已经尝试过下面的代码,但是它也是错误的:

Matrix = [5][5]

错误:

Traceback ...

IndexError: list index out of range

我怎么了

I want to define a two-dimensional array without an initialized length like this:

Matrix = [][]

but it does not work…

I’ve tried the code below, but it is wrong too:

Matrix = [5][5]

Error:

Traceback ...

IndexError: list index out of range

What is my mistake?


回答 0

从技术上讲,您正在尝试索引未初始化的数组。您必须先使用列表初始化外部列表,然后再添加项目。Python将其称为“列表理解”。

# Creates a list containing 5 lists, each of 8 items, all set to 0
w, h = 8, 5
Matrix = [[0 for x in range(w)] for y in range(h)] 

您现在可以将项目添加到列表中:

Matrix[0][0] = 1
# Matrix[6][0] = 3  # IndexError: there are only 5 rows (valid row indices are 0-4)
Matrix[0][6] = 3 # valid

请注意,矩阵是“ y”地址主地址,换句话说,“ y索引”位于“ x索引”之前。

print(Matrix[0][0]) # prints 1
x, y = 0, 6
print(Matrix[x][y]) # prints 3; be careful with indexing!

尽管您可以根据需要命名它们,但是如果您对内部列表和外部列表都使用“ x”,并且希望使用非平方矩阵,那么我会以这种方式来避免索引可能引起的混淆。

You’re technically trying to index an uninitialized array. You have to first initialize the outer list with inner lists before adding items. A concise way to do this in Python is a “list comprehension”.

# Creates a list containing 5 lists, each of 8 items, all set to 0
w, h = 8, 5
Matrix = [[0 for x in range(w)] for y in range(h)] 

You can now add items to the list:

Matrix[0][0] = 1
# Matrix[6][0] = 3  # IndexError: there are only 5 rows (valid row indices are 0-4)
Matrix[0][6] = 3 # valid

Note that the matrix is row-major (“y” address major): in other words, the “y index” comes before the “x index”.

print(Matrix[0][0]) # prints 1
x, y = 0, 6
print(Matrix[x][y]) # prints 3; be careful with indexing!

Although you can name them as you wish, I look at it this way to avoid some confusion that could arise with the indexing, if you use “x” for both the inner and outer lists, and want a non-square Matrix.
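
The same construction can be wrapped in a small function so that the row/column order is explicit. The make_matrix name and its parameters are hypothetical, chosen for this sketch.

```python
def make_matrix(rows, cols, fill=0):
    """Hypothetical helper: rows x cols list of lists, each row a separate list."""
    return [[fill for _ in range(cols)] for _ in range(rows)]


m = make_matrix(rows=5, cols=8)
m[0][6] = 3                 # row 0, column 6
print(len(m), len(m[0]))    # 5 8
print(m[0])                 # [0, 0, 0, 0, 0, 0, 3, 0]
```

Naming the arguments rows and cols avoids the x/y confusion described above.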


回答 1

如果您确实需要矩阵,最好使用numpy。在numpy大多数情况下,矩阵运算使用具有二维的数组类型。有很多方法可以创建一个新数组。最有用的zeros函数之一是函数,它采用shape参数并返回给定形状的数组,其值初始化为零:

>>> import numpy
>>> numpy.zeros((5, 5))
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

这是创建二维数组和矩阵的其他一些方法(为了紧凑起见,删除了输出):

numpy.arange(25).reshape((5, 5))         # create a 1-d range and reshape
numpy.array(range(25)).reshape((5, 5))   # pass a Python range and reshape
numpy.array([5] * 25).reshape((5, 5))    # pass a Python list and reshape
numpy.empty((5, 5))                      # allocate, but don't initialize
numpy.ones((5, 5))                       # initialize with ones

numpy也提供了一种matrix类型,但是不再建议将其用于任何用途,以后可能会删除numpy它。

If you really want a matrix, you might be better off using numpy. Matrix operations in numpy most often use an array type with two dimensions. There are many ways to create a new array; one of the most useful is the zeros function, which takes a shape parameter and returns an array of the given shape, with the values initialized to zero:

>>> import numpy
>>> numpy.zeros((5, 5))
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

Here are some other ways to create 2-d arrays and matrices (with output removed for compactness):

numpy.arange(25).reshape((5, 5))         # create a 1-d range and reshape
numpy.array(range(25)).reshape((5, 5))   # pass a Python range and reshape
numpy.array([5] * 25).reshape((5, 5))    # pass a Python list and reshape
numpy.empty((5, 5))                      # allocate, but don't initialize
numpy.ones((5, 5))                       # initialize with ones

numpy provides a matrix type as well, but it is no longer recommended for any use, and may be removed from numpy in the future.


回答 2

这是用于初始化列表列表的简短表示法:

matrix = [[0]*5 for i in range(5)]

不幸的是,将其缩短为类似的方法5*[5*[0]]实际上是行不通的,因为最终您会得到同一列表的5个副本,因此,当您修改其中一个副本时,它们都会更改,例如:

>>> matrix = 5*[5*[0]]
>>> matrix
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
>>> matrix[4][4] = 2
>>> matrix
[[0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2]]

Here is a shorter notation for initializing a list of lists:

matrix = [[0]*5 for i in range(5)]

Unfortunately shortening this to something like 5*[5*[0]] doesn’t really work because you end up with 5 copies of the same list, so when you modify one of them they all change, for example:

>>> matrix = 5*[5*[0]]
>>> matrix
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
>>> matrix[4][4] = 2
>>> matrix
[[0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 0, 0, 0, 2]]
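
The difference between the two notations can be checked directly with the is operator. A small sketch, using a 3x3 size only to keep the output short:

```python
shared = 3 * [3 * [0]]                  # one inner list, referenced three times
separate = [[0] * 3 for _ in range(3)]  # three distinct inner lists

print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False

shared[0][0] = 9
separate[0][0] = 9
print(shared)    # [[9, 0, 0], [9, 0, 0], [9, 0, 0]]
print(separate)  # [[9, 0, 0], [0, 0, 0], [0, 0, 0]]
```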

回答 3

如果要创建一个空矩阵,则正确的语法是

matrix = [[]]

如果您要生成大小为5的矩阵,并用0填充,

matrix = [[0 for _ in range(5)] for _ in range(5)]

If you want to create an empty matrix, the correct syntax is

matrix = [[]]

And if you want to generate a matrix of size 5 filled with 0,

matrix = [[0 for _ in range(5)] for _ in range(5)]

回答 4

如果只需要一个二维容器来容纳某些元素,则可以方便地使用字典:

Matrix = {}

然后,您可以执行以下操作:

Matrix[1,2] = 15
print(Matrix[1,2])

这是有效的,因为它1,2是一个元组,并且您将其用作索引字典的键。结果类似于哑的稀疏矩阵。

如osa和Josap Valls所指出的,您也可以使用,Matrix = collections.defaultdict(lambda:0)以便丢失的元素具有默认值0

Vatsal进一步指出,该方法对于大型矩阵可能不是很有效,并且仅应在代码的非关键性能部分中使用。

If all you want is a two dimensional container to hold some elements, you could conveniently use a dictionary instead:

Matrix = {}

Then you can do:

Matrix[1,2] = 15
print Matrix[1,2]

This works because 1,2 is a tuple, and you’re using it as a key to index the dictionary. The result is similar to a dumb sparse matrix.

As indicated by osa and Josap Valls, you can also use Matrix = collections.defaultdict(lambda:0) so that the missing elements have a default value of 0.

Vatsal further points out that this method is probably not very efficient for large matrices and should only be used in parts of the code that are not performance-critical.
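
A short sketch of the defaultdict variant mentioned above. It uses `defaultdict(int)`, which has the same effect as `lambda: 0`; the lowercase name `matrix` is illustrative.

```python
from collections import defaultdict

# Keys are (row, col) tuples; missing cells read as 0.
matrix = defaultdict(int)   # int() returns 0, same effect as lambda: 0
matrix[1, 2] = 15

print(matrix[1, 2])   # 15
print(matrix[0, 0])   # 0 (note: reading a missing key also stores it)
print(len(matrix))    # 2

# .get() reads a default without inserting the key.
print(matrix.get((9, 9), 0))   # 0
print((9, 9) in matrix)        # False
```

Because indexing a missing key inserts it, `.get()` is the safer choice if the container should stay sparse.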


回答 5

在Python中,您将创建一个列表列表。您不必提前声明尺寸,但是可以声明。例如:

matrix = []
matrix.append([])
matrix.append([])
matrix[0].append(2)
matrix[1].append(3)

现在,matrix [0] [0] == 2和matrix [1] [0] ==3。您还可以使用列表理解语法。此示例两次使用它来构建“二维列表”:

from itertools import count, takewhile
matrix = [[i for i in takewhile(lambda j: j < (k+1) * 10, count(k*10))] for k in range(10)]

In Python you will be creating a list of lists. You do not have to declare the dimensions ahead of time, but you can. For example:

matrix = []
matrix.append([])
matrix.append([])
matrix[0].append(2)
matrix[1].append(3)

Now matrix[0][0] == 2 and matrix[1][0] == 3. You can also use the list comprehension syntax. This example uses it twice over to build a “two-dimensional list”:

from itertools import count, takewhile
matrix = [[i for i in takewhile(lambda j: j < (k+1) * 10, count(k*10))] for k in range(10)]
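
The takewhile/count comprehension above is fairly dense. As a sketch, the snippet below rebuilds it and compares it with a plainer spelling; the name `simple` is an illustrative addition, not part of the answer.

```python
from itertools import count, takewhile

# The comprehension from the answer: row k holds k*10 .. k*10 + 9.
matrix = [[i for i in takewhile(lambda j: j < (k + 1) * 10, count(k * 10))]
          for k in range(10)]

# A plainer spelling of the same 10 x 10 table (illustrative alternative).
simple = [[k * 10 + i for i in range(10)] for k in range(10)]

print(matrix == simple)   # True
print(matrix[3][:4])      # [30, 31, 32, 33]
```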

回答 6

公认的答案是正确且正确的,但是花了我一段时间才了解到我也可以使用它来创建一个完全空的数组。

l =  [[] for _ in range(3)]

结果是

[[], [], []]

The accepted answer is good and correct, but it took me a while to understand that I could also use it to create a completely empty array.

l =  [[] for _ in range(3)]

results in

[[], [], []]

回答 7

您应该列出列表,最好的方法是使用嵌套的理解:

>>> matrix = [[0 for i in range(5)] for j in range(5)]
>>> pprint.pprint(matrix)
[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

在您的[5][5]示例中,您正在创建一个内部带有整数“ 5”的列表,并尝试访问其第五项,这自然会引发IndexError,因为没有第五项:

>>> l = [5]
>>> l[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

You should make a list of lists, and the best way is to use nested comprehensions:

>>> matrix = [[0 for i in range(5)] for j in range(5)]
>>> pprint.pprint(matrix)
[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

In your [5][5] example, you are creating a list containing the single integer “5” and then trying to access the item at index 5. That naturally raises an IndexError, because the list has only one item (at index 0):

>>> l = [5]
>>> l[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

回答 8

rows = int(input())
cols = int(input())

matrix = []
for i in range(rows):
  row = []
  for j in range(cols):
    row.append(0)
  matrix.append(row)

print(matrix)

为什么这么长的代码,Python您也会问?

很久以前,当我不熟悉Python时,我看到了编写2D矩阵的单行答案,并告诉自己我不再打算在Python中再次使用2D矩阵。(这些行很吓人,它没有给我有关Python所做的任何信息。还要注意,我不知道这些速记法。)

无论如何,这是一个来自C,CPP和Java背景的初学者的代码

给Python爱好者和专家的说明:请不要因为我编写了详细的代码而投了反对票。

rows = int(input())
cols = int(input())

matrix = []
for i in range(rows):
  row = []
  for j in range(cols):
    row.append(0)
  matrix.append(row)

print(matrix)

Why such long code, and in Python of all languages, you ask?

Long ago, when I was not comfortable with Python, I saw the single-line answers for writing a 2D matrix and told myself I was never going to use a 2D matrix in Python again. (Those single lines were pretty scary, and they gave me no information about what Python was doing. Also note that I was not aware of these shorthands.)

Anyway, here’s the code for a beginner who’s coming from a C, C++ or Java background.

Note to Python lovers and experts: please do not downvote just because I wrote verbose code.


回答 9

重写以便于阅读:

# 2D array/ matrix

# 5 rows, 5 cols
rows_count = 5
cols_count = 5

# create
#     creation looks reverse
#     create an array of "cols_count" cols, for each of the "rows_count" rows
#        all elements are initialized to 0
two_d_array = [[0 for j in range(cols_count)] for i in range(rows_count)]

# index is from 0 to 4
#     for both rows & cols
#     since 5 rows, 5 cols

# use
two_d_array[0][0] = 1
print two_d_array[0][0]  # prints 1   # 1st row, 1st col (top-left element of matrix)

two_d_array[1][0] = 2
print two_d_array[1][0]  # prints 2   # 2nd row, 1st col

two_d_array[1][4] = 3
print two_d_array[1][4]  # prints 3   # 2nd row, last col

two_d_array[4][4] = 4
print two_d_array[4][4]  # prints 4   # last row, last col (right, bottom element of matrix)

A rewrite for easy reading:

# 2D array/ matrix

# 5 rows, 5 cols
rows_count = 5
cols_count = 5

# create
#     creation looks reverse
#     create an array of "cols_count" cols, for each of the "rows_count" rows
#        all elements are initialized to 0
two_d_array = [[0 for j in range(cols_count)] for i in range(rows_count)]

# index is from 0 to 4
#     for both rows & cols
#     since 5 rows, 5 cols

# use
two_d_array[0][0] = 1
print two_d_array[0][0]  # prints 1   # 1st row, 1st col (top-left element of matrix)

two_d_array[1][0] = 2
print two_d_array[1][0]  # prints 2   # 2nd row, 1st col

two_d_array[1][4] = 3
print two_d_array[1][4]  # prints 3   # 2nd row, last col

two_d_array[4][4] = 4
print two_d_array[4][4]  # prints 4   # last row, last col (right, bottom element of matrix)

回答 10

采用:

matrix = [[0]*5 for i in range(5)]

第一维的* 5起作用是因为在此级别上,数据是不可变的。

Use:

matrix = [[0]*5 for i in range(5)]

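The *5 on the inner list works because the elements at this level (integers) are immutable: assigning to matrix[i][j] rebinds that slot instead of mutating a shared object. The outer dimension still needs the comprehension so that each row is a separate list.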


回答 11

声明零(一)矩阵:

numpy.zeros((x, y))

例如

>>> numpy.zeros((3, 5))
    array([[ 0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.]])

或numpy.ones((x,y))例如

>>> np.ones((3, 5))
array([[ 1.,  1.,  1.,  1.,  1.],
   [ 1.,  1.,  1.,  1.,  1.],
   [ 1.,  1.,  1.,  1.,  1.]])

甚至三个尺寸都是可能的。(http://www.astro.ufl.edu/~warner/prog/python.html参见->多维数组)

To declare a matrix of zeros (ones):

numpy.zeros((x, y))

e.g.

>>> numpy.zeros((3, 5))
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

or numpy.ones((x, y)) e.g.

>>> numpy.ones((3, 5))
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

Even three dimensions are possible (see the “Multi-dimensional arrays” section of http://www.astro.ufl.edu/~warner/prog/python.html).


回答 12

这就是我通常在python中创建2D数组的方式。

col = 3
row = 4
array = [[0] * col for _ in range(row)]

与在列表理解中使用两个for循环相比,我发现此语法易于记住。

This is how I usually create 2D arrays in python.

col = 3
row = 4
array = [[0] * col for _ in range(row)]

I find this syntax easy to remember compared to using two for loops in a list comprehension.


回答 13

我正在使用我的第一个Python脚本,我对方矩阵示例有些困惑,因此希望以下示例可以帮助您节省一些时间:

 # Creates a 2 x 5 matrix
 Matrix = [[0 for y in xrange(5)] for x in xrange(2)]

以便

Matrix[1][4] = 2 # Valid
Matrix[4][1] = 3 # IndexError: list index out of range

I’m on my first Python script, and I was a little confused by the square matrix example so I hope the below example will help you save some time:

 # Creates a 2 x 5 matrix
 Matrix = [[0 for y in xrange(5)] for x in xrange(2)]

so that

Matrix[1][4] = 2 # Valid
Matrix[4][1] = 3 # IndexError: list index out of range
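
The row/column order can be confirmed with a small sketch. It assumes Python 3 (`range` instead of `xrange`), and the names `rows` and `cols` are illustrative.

```python
rows, cols = 2, 5   # illustrative names

# Outer comprehension iterates rows, inner one builds each row's columns.
matrix = [[0 for _ in range(cols)] for _ in range(rows)]

print(len(matrix), len(matrix[0]))   # 2 5

matrix[1][4] = 2                     # last row, last column: valid
try:
    matrix[4][1] = 3                 # row index 4 does not exist
except IndexError as exc:
    print("IndexError:", exc)        # IndexError: list index out of range
```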

回答 14

使用NumPy,您可以像这样初始化空矩阵:

import numpy as np
mm = np.matrix([])

然后像这样追加数据:

mm = np.append(mm, [[1,2]], axis=1)

Using NumPy you can initialize an empty matrix like this:

import numpy as np
mm = np.matrix([])

And later append data like this:

mm = np.append(mm, [[1,2]], axis=1)

回答 15

我读了这样的逗号分隔文件:

data=[]
for l in infile:
    l = l.rstrip('\n').split(',')  # split the line itself; drop the trailing newline
    data.append(l)

然后,列表“数据”是带有索引数据的列表的列表[行] [列]

I read in comma separated files like this:

data=[]
for l in infile:
    l = l.rstrip('\n').split(',')  # split the line itself; drop the trailing newline
    data.append(l)

The list “data” is then a list of lists with index data[row][col]
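
As an alternative sketch, the standard library's csv module produces the same list-of-lists shape and also copes with quoted fields. The `io.StringIO` object below stands in for an open file and is an assumption made only to keep the example self-contained.

```python
import csv
import io

# io.StringIO stands in for an open file here (an assumption for the sketch).
infile = io.StringIO("1,2,3\n4,5,6\n")

# csv.reader strips the line terminator and handles quoted fields.
data = [row for row in csv.reader(infile)]

print(data)          # [['1', '2', '3'], ['4', '5', '6']]
print(data[1][2])    # 6
```

Note that the fields are still strings; convert them with int() or float() if numbers are needed.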


回答 16

如果您希望能够将其视为2D数组,而不是被迫以列表列表的方式思考(我认为这自然得多),则可以执行以下操作:

import numpy
Nx=3; Ny=4
my2Dlist= numpy.zeros((Nx,Ny)).tolist()

结果是一个列表(不是NumPy数组),您可以用数字,字符串或其他内容覆盖各个位置。

If you want to be able to think of it as a 2D array rather than being forced to think in terms of a list of lists (much more natural in my opinion), you can do the following:

import numpy
Nx=3; Ny=4
my2Dlist= numpy.zeros((Nx,Ny)).tolist()

The result is a list (not a NumPy array), and you can overwrite the individual positions with numbers, strings, whatever.


回答 17

这就是字典的用途!

matrix = {}

您可以通过两种方式定义

matrix[0,0] = value

要么

matrix = { (0,0)  : value }

结果:

   [ value,  value,  value,  value,  value],
   [ value,  value,  value,  value,  value],
   ...

That’s what a dictionary is made for!

matrix = {}

You can define keys and values in two ways:

matrix[0,0] = value

or

matrix = { (0,0)  : value }

Result:

   [ value,  value,  value,  value,  value],
   [ value,  value,  value,  value,  value],
   ...

回答 18

采用:

import copy

def ndlist(*args, init=0):
    dp = init
    for x in reversed(args):
        dp = [copy.deepcopy(dp) for _ in range(x)]
    return dp

l = ndlist(1,2,3,4) # 4 dimensional list initialized with 0's
l[0][1][2][3] = 1

我确实认为NumPy是要走的路。如果您不想使用NumPy,则以上是一种通用方法。

Use:

import copy

def ndlist(*args, init=0):
    dp = init
    for x in reversed(args):
        dp = [copy.deepcopy(dp) for _ in range(x)]
    return dp

l = ndlist(1,2,3,4) # 4 dimensional list initialized with 0's
l[0][1][2][3] = 1

I do think NumPy is the way to go. The above is a generic one if you don’t want to use NumPy.


回答 19

通过使用列表:

matrix_in_python  = [['Roy',80,75,85,90,95],['John',75,80,75,85,100],['Dave',80,80,80,90,95]]

通过使用dict:您还可以将此信息存储在哈希表中,以进行快速搜索,例如

matrix = { '1':[0,0] , '2':[0,1],'3':[0,2],'4' : [1,0],'5':[1,1],'6':[1,2],'7':[2,0],'8':[2,1],'9':[2,2]};

matrix [‘1’]将为您提供O(1)时间的结果

* nb:您需要处理哈希表中的冲突

Using a list:

matrix_in_python  = [['Roy',80,75,85,90,95],['John',75,80,75,85,100],['Dave',80,80,80,90,95]]

Using a dict: you can also store this information in a hash table for fast lookup, for example:

matrix = { '1':[0,0] , '2':[0,1],'3':[0,2],'4' : [1,0],'5':[1,1],'6':[1,2],'7':[2,0],'8':[2,1],'9':[2,2]}

matrix['1'] will give you the result in O(1) time on average.

*nb: Python's dict resolves hash collisions internally, so you do not need to handle them yourself.


回答 20

如果在开始之前没有尺寸信息,请创建两个一维列表。

list 1: To store rows
list 2: Actual two-dimensional matrix

将整个行存储在第一个列表中。完成后,将列表1附加到列表2:

from random import randint

coordinates=[]
temp=[]
points=int(raw_input("Enter No Of Coordinates >"))
for i in range(0,points):
    randomx=randint(0,1000)
    randomy=randint(0,1000)
    temp=[]
    temp.append(randomx)
    temp.append(randomy)
    coordinates.append(temp)

print coordinates

输出:

Enter No Of Coordinates >4
[[522, 96], [378, 276], [349, 741], [238, 439]]

If you don’t have size information before you start, create two one-dimensional lists.

list 1: To store rows
list 2: Actual two-dimensional matrix

Store the entire row in the 1st list. Once done, append list 1 into list 2:

from random import randint

coordinates=[]
temp=[]
points=int(raw_input("Enter No Of Coordinates >"))
for i in range(0,points):
    randomx=randint(0,1000)
    randomy=randint(0,1000)
    temp=[]
    temp.append(randomx)
    temp.append(randomy)
    coordinates.append(temp)

print coordinates

Output:

Enter No Of Coordinates >4
[[522, 96], [378, 276], [349, 741], [238, 439]]

回答 21

# Creates a list containing 5 references to the same inner list of zeros
Matrix = [[0]*5]*5

请注意此简短表达,请参见@FJ答案中的完整解释

# Creates a list containing 5 references to the same inner list of zeros
Matrix = [[0]*5]*5

Be careful with this short expression: all five rows are the same list object, so changing one row changes them all. See the full explanation in @F.J’s answer.


回答 22

l=[[0]*(L) for _ in range(W)]

将比:

l = [[0 for x in range(L)] for y in range(W)]

l=[[0]*(L) for _ in range(W)]

Will be faster than:

l = [[0 for x in range(L)] for y in range(W)] 
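
The speed claim can be measured with the standard timeit module. The sketch below is illustrative: the sizes and function names are assumptions, and the timings depend on the machine and interpreter, so no output is shown.

```python
import timeit

L, W = 100, 100   # illustrative sizes, not from the answer

def with_multiply():
    return [[0] * L for _ in range(W)]

def with_nested_comprehension():
    return [[0 for x in range(L)] for y in range(W)]

# Both build the same matrix; only the construction cost differs.
assert with_multiply() == with_nested_comprehension()

# Each call returns the total seconds for 200 runs.
print(timeit.timeit(with_multiply, number=200))
print(timeit.timeit(with_nested_comprehension, number=200))
```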

回答 23

您可以通过将两个或多个方括号或第三个方括号([]用逗号分隔)嵌套在一起来创建一个空的二维列表,如下所示:

Matrix = [[], []]

现在假设您要在Matrix[0][0]其后附加1,然后键入:

Matrix[0].append(1)

现在,键入Matrix并按Enter。输出将是:

[[1], []]

You can create an empty two-dimensional list by nesting two or more pairs of square brackets ([], separated by commas) inside an outer pair of square brackets, as shown below:

Matrix = [[], []]

Now suppose you want to append 1 to the first row, so that Matrix[0][0] is 1. Then you type:

Matrix[0].append(1)

Now, type Matrix and hit Enter. The output will be:

[[1], []]

回答 24

尝试这个:

rows = int(input('Enter rows\n'))
my_list = []
for i in range(rows):
    my_list.append(list(map(int, input().split())))

Try this:

rows = int(input('Enter rows\n'))
my_list = []
for i in range(rows):
    my_list.append(list(map(int, input().split())))

回答 25

如果您需要带有预定义数字的矩阵,则可以使用以下代码:

def matrix(rows, cols, start=0):
    return [[c + start + r * cols for c in range(cols)] for r in range(rows)]


assert matrix(2, 3, 1) == [[1, 2, 3], [4, 5, 6]]

If you need a matrix with predefined numbers, you can use the following code:

def matrix(rows, cols, start=0):
    return [[c + start + r * cols for c in range(cols)] for r in range(rows)]


assert matrix(2, 3, 1) == [[1, 2, 3], [4, 5, 6]]

回答 26

这是在python中创建矩阵的代码片段:

# get the input rows and cols
rows = int(input("rows : "))
cols = int(input("Cols : "))

# initialize the list
l=[[0]*cols for i in range(rows)]

# fill it with sample values (i + j), not actually random
for i in range(0,rows):
    for j in range(0,cols):
        l[i][j] = i+j

# print the list
for i in range(0,rows):
    print()
    for j in range(0,cols):
        print(l[i][j],end=" ")

如果我错过了什么,请提出建议。

Here is the code snippet for creating a matrix in python:

# get the input rows and cols
rows = int(input("rows : "))
cols = int(input("Cols : "))

# initialize the list
l=[[0]*cols for i in range(rows)]

# fill it with sample values (i + j), not actually random
for i in range(0,rows):
    for j in range(0,cols):
        l[i][j] = i+j

# print the list
for i in range(0,rows):
    print()
    for j in range(0,cols):
        print(l[i][j],end=" ")

Please suggest if I have missed something.


如何强制除法为浮点数?除数一直舍入到0?

问题:如何强制除法为浮点数?除数一直舍入到0?

我有两个整数值ab,但是我需要它们在浮点数中的比率。我知道a < b并且想要计算a / b,所以如果我使用整数除法,我将总是得到0,余数为a

c在下文中,如何在Python中强制成为浮点数?

c = a / b

I have two integer values a and b, but I need their ratio in floating point. I know that a < b and I want to calculate a / b, so if I use integer division I’ll always get 0 with a remainder of a.

How can I force c to be a floating point number in Python in the following?

c = a / b

回答 0

在Python 2中,两个整数的除法产生一个整数。在Python 3中,它产生一个浮点数。我们可以通过从中导入来获得新的行为__future__

>>> from __future__ import division
>>> a = 4
>>> b = 6
>>> c = a / b
>>> c
0.66666666666666663

In Python 2, division of two ints produces an int. In Python 3, it produces a float. We can get the new behaviour by importing from __future__.

>>> from __future__ import division
>>> a = 4
>>> b = 6
>>> c = a / b
>>> c
0.66666666666666663

回答 1

您可以通过执行此操作来浮动c = a / float(b)。如果分子或分母是浮点数,则结果也将是。


一个警告:如评论员所指出的,如果b它不是整数或浮点数(或表示一个的字符串),则此方法将无效。如果您正在处理其他类型(例如复数),则需要检查这些类型或使用其他方法。

You can cast to float by doing c = a / float(b). If the numerator or denominator is a float, then the result will be also.


A caveat: as commenters have pointed out, this won’t work if b might be something other than an integer or floating-point number (or a string representing one). If you might be dealing with other types (such as complex numbers) you’ll need to either check for those or use a different method.


回答 2

如何在Python中强制除法为浮点数?

我有两个整数值a和b,但是我需要它们在浮点数中的比率。我知道a <b并且我想计算a / b,所以如果我使用整数除法,我总是得到0并得到a的余数。

下面如何在Python中强制c为浮点数?

c = a / b

这里真正要问的是:

“我如何强制进行真正的除法以a / b返回分数?”

升级到Python 3

在Python 3中,要进行真正的除法,只需执行a / b

>>> 1/2
0.5

地板除法(整数的经典除法行为)现在为a // b

>>> 1//2
0
>>> 1//2.0
0.0

但是,您可能无法使用Python 2,或者编写的代码必须能同时在2和3中使用。

如果使用Python 2

在Python 2中,它不是那么简单。处理经典Python 2分区的某些方法比其他方法更好,更可靠。

对Python 2的建议

您可以在任何给定的模块中获得Python 3划分行为,并在顶部输入以下内容:

from __future__ import division

然后将Python 3样式划分应用于整个模块。它也可以在任何给定的点在python shell中工作。在Python 2中:

>>> from __future__ import division
>>> 1/2
0.5
>>> 1//2
0
>>> 1//2.0
0.0

这确实是最好的解决方案,因为它可以确保模块中的代码与Python 3向前兼容。

Python 2的其他选项

如果您不想将其应用于整个模块,则只能使用一些解决方法。最受欢迎的是将其中一个操作数强制为浮点数。一种可靠的解决方案是a / (b * 1.0)。在新的Python Shell中:

>>> 1/(2 * 1.0)
0.5

truediv来自operator模块operator.truediv(a, b)的功能也很强大,但这可能会更慢,因为它是一个函数调用:

>>> from operator import truediv
>>> truediv(1, 2)
0.5

不建议用于Python 2

常见的是a / float(b)。如果b为复数,则将引发TypeError。由于定义了带有复数的除法,所以对我来说,在为除数传递一个复数时,除法不会失败。

>>> 1 / float(2)
0.5
>>> 1 / float(2j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to float

对我而言,故意使您的代码更脆弱没有太大意义。

您也可以使用-Qnew标记运行Python ,但这不利于执行具有新Python 3行为的所有模块,并且您的某些模块可能需要经典的划分,因此除测试外,我不建议这样做。但要演示:

$ python -Qnew -c 'print 1/2'
0.5
$ python -Qnew -c 'print 1/2j'
-0.5j

How can I force division to be floating point in Python?

I have two integer values a and b, but I need their ratio in floating point. I know that a < b and I want to calculate a/b, so if I use integer division I’ll always get 0 with a remainder of a.

How can I force c to be a floating point number in Python in the following?

c = a / b

What is really being asked here is:

“How do I force true division such that a / b will return a fraction?”

Upgrade to Python 3

In Python 3, to get true division, you simply do a / b.

>>> 1/2
0.5

Floor division, the classic division behavior for integers, is now a // b:

>>> 1//2
0
>>> 1//2.0
0.0

However, you may be stuck using Python 2, or you may be writing code that must work in both 2 and 3.

If Using Python 2

In Python 2, it’s not so simple. Some ways of dealing with classic Python 2 division are better and more robust than others.

Recommendation for Python 2

You can get Python 3 division behavior in any given module with the following import at the top:

from __future__ import division

which then applies Python 3 style division to the entire module. It also works in a python shell at any given point. In Python 2:

>>> from __future__ import division
>>> 1/2
0.5
>>> 1//2
0
>>> 1//2.0
0.0

This is really the best solution as it ensures the code in your module is more forward compatible with Python 3.

Other Options for Python 2

If you don’t want to apply this to the entire module, you’re limited to a few workarounds. The most popular is to coerce one of the operands to a float. One robust solution is a / (b * 1.0). In a fresh Python shell:

>>> 1/(2 * 1.0)
0.5

Also robust is truediv from the operator module, operator.truediv(a, b), but this is likely slower because it’s a function call:

>>> from operator import truediv
>>> truediv(1, 2)
0.5

Not Recommended for Python 2

Commonly seen is a / float(b). This will raise a TypeError if b is a complex number. Since division with complex numbers is defined, it makes sense to me to not have division fail when passed a complex number for the divisor.

>>> 1 / float(2)
0.5
>>> 1 / float(2j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to float

It doesn’t make much sense to me to purposefully make your code more brittle.

You can also run Python with the -Qnew flag, but this has the downside of executing all modules with the new Python 3 behavior, and some of your modules may expect classic division, so I don’t recommend this except for testing. But to demonstrate:

$ python -Qnew -c 'print 1/2'
0.5
$ python -Qnew -c 'print 1/2j'
-0.5j
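
For comparison, here is a minimal sketch of the robust and brittle options side by side. It assumes Python 3, where / is already true division, and it omits printed output for the complex results.

```python
from operator import truediv

print(truediv(1, 2))       # 0.5
print(truediv(1, 2j))      # complex divisor: works, no error
print(1 / (2j * 1.0))      # multiplying by 1.0 also tolerates complex

try:
    1 / float(2j)          # float() rejects complex numbers
except TypeError as exc:
    print("TypeError:", exc)
```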

回答 3

c = a / (b * 1.0)

回答 4

在Python 3.x中,单斜杠(/)始终表示真实(非截断)除法。(//运算符用于截断除法。)在Python 2.x(2.2及更高版本)中,您可以通过将

from __future__ import division

在模块顶部。

In Python 3.x, the single slash (/) always means true (non-truncating) division. (The // operator is used for floor division, which rounds toward negative infinity.) In Python 2.x (2.2 and above), you can get this same behavior by putting a

from __future__ import division

at the top of your module.
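
A small sketch of how the two operators differ, assuming Python 3 (on Python 2 the same results need the __future__ import from this answer at the top of the module). The negative case shows where flooring and truncation disagree.

```python
print(7 / 2)        # 3.5
print(7 // 2)       # 3
print(-7 // 2)      # -4  (floors toward negative infinity)
print(int(-7 / 2))  # -3  (int() truncates toward zero)
```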


回答 5

仅以浮点格式进行除法的任何参数也会产生浮点输出。

例:

>>> 4.0/3
1.3333333333333333

要么,

>>> 4 / 3.0
1.3333333333333333

要么,

>>> 4 / float(3)
1.3333333333333333

要么,

>>> float(4) / 3
1.3333333333333333

Making either operand of the division a float also makes the result a float.

Example:

>>> 4.0/3
1.3333333333333333

or,

>>> 4 / 3.0
1.3333333333333333

or,

>>> 4 / float(3)
1.3333333333333333

or,

>>> float(4) / 3
1.3333333333333333

回答 6

加一个点(.)表示浮点数

>>> 4/3.
1.3333333333333333

Add a dot (.) to indicate floating point numbers

>>> 4/3.
1.3333333333333333

回答 7

这也可以

>>> u=1./5
>>> print u
0.2

This will also work

>>> u=1./5
>>> print u
0.2

回答 8

如果要默认使用“ true”(浮点)除法,则有一个命令行标志:

python -Q new foo.py

有一些缺点(来自PEP):

有人认为,更改默认值的命令行选项是有害的。如果使用不当,肯定会很危险:例如,不可能将需要-Qnew的第三方库软件包与需要-Qold的第三方库软件包结合使用。

您可以通过查看python手册页了解有关更改/警告除法行为的其他标志值的更多信息。

有关除法更改的详细信息,请阅读:PEP 238-更改除法运算符

If you want to use “true” (floating point) division by default, there is a command line flag:

python -Q new foo.py

There are some drawbacks (from the PEP):

It has been argued that a command line option to change the default is evil. It can certainly be dangerous in the wrong hands: for example, it would be impossible to combine a 3rd party library package that requires -Qnew with another one that requires -Qold.

You can learn more about the other flag values that change, or warn about, the behavior of division by looking at the python man page.

For full details on division changes read: PEP 238 — Changing the Division Operator


回答 9

from operator import truediv

c = truediv(a, b)

回答 10

from operator import truediv

c = truediv(a, b)

其中a是除数,b是除数。当两个整数相除后的商是浮点数时,此函数非常方便。

from operator import truediv

c = truediv(a, b)

where a is the dividend and b is the divisor. This function is handy when the quotient of two integers should be a float.


pip和conda有什么区别?

问题:pip和conda有什么区别?

我知道pip是python软件包的软件包管理器。但是,我看到IPython网站conda上的安装用于安装IPython。

我可以pip用来安装IPython吗?conda我已经拥有了为什么还要用作另一个python软件包管理器pip

pip和之间有什么区别conda

I know pip is a package manager for python packages. However, I saw the installation on IPython’s website use conda to install IPython.

Can I use pip to install IPython? Why should I use conda as another python package manager when I already have pip?

What is the difference between pip and conda?


回答 0

引用来自Conda博客

参与python世界已经很长时间了,我们都知道pip,easy_install和virtualenv,但是这些工具不能满足我们所有的特定要求。主要问题是它们专注于Python,而忽略了非Python库依赖项,例如HDF5,MKL,LLVM等,它们的源代码中没有setup.py,也没有将文件安装到Python的站点中-packages目录。

因此,Conda是一种包装工具和安装程序,旨在做更多的事情pip。处理Python包之外的库依赖关系以及Python包本身。Conda也像创建虚拟环境一样virtualenv

因此,也许应该将Conda与Buildout进行比较,后者是另一个可以让您处理Python和非Python安装任务的工具。

由于Conda引入了新的包装格式,因此您不能pip与Conda互换使用。 pip无法安装Conda软件包格式。您可以使用并排的两个工具侧(通过安装pipconda install pip),但他们不具备互操作性无论是。

自编写此答案以来,Anaconda 理解Conda和Pip发布了新页面,该页面也与此相呼应:

这凸显了conda和pip之间的关键区别。Pip安装Python软件包,而conda安装软件包,其中可能包含以任何语言编写的软件。例如,在使用pip之前,必须通过系统软件包管理器或下载并运行安装程序来安装Python解释器。另一方面,Conda可以直接安装Python软件包以及Python解释器。

并进一步

有时需要一个软件包,该软件包不是conda软件包,但在PyPI上可用,可以与pip一起安装。在这些情况下,尝试同时使用conda和pip是有意义的。

Quoting from the Conda blog:

Having been involved in the python world for so long, we are all aware of pip, easy_install, and virtualenv, but these tools did not meet all of our specific requirements. The main problem is that they are focused around Python, neglecting non-Python library dependencies, such as HDF5, MKL, LLVM, etc., which do not have a setup.py in their source code and also do not install files into Python’s site-packages directory.

So Conda is a packaging tool and installer that aims to do more than what pip does; handle library dependencies outside of the Python packages as well as the Python packages themselves. Conda also creates a virtual environment, like virtualenv does.

As such, Conda should be compared to Buildout perhaps, another tool that lets you handle both Python and non-Python installation tasks.

Because Conda introduces a new packaging format, you cannot use pip and Conda interchangeably; pip cannot install the Conda package format. You can use the two tools side by side (by installing pip with conda install pip) but they do not interoperate either.

Since writing this answer, Anaconda has published a new page on Understanding Conda and Pip, which echoes this as well:

This highlights a key difference between conda and pip. Pip installs Python packages whereas conda installs packages which may contain software written in any language. For example, before using pip, a Python interpreter must be installed via a system package manager or by downloading and running an installer. Conda on the other hand can install Python packages as well as the Python interpreter directly.

and further on

Occasionally a package is needed which is not available as a conda package but is available on PyPI and can be installed with pip. In these cases, it makes sense to try to use both conda and pip.


回答 1

这是一个简短的摘要:

点子

  • 仅Python软件包。
  • 从源代码编译所有内容。编辑:pip现在会安装二进制车轮(如果可用)。
  • 受核心Python社区的祝福(即Python 3.4+包含自动引导pip的代码)。

康达

  • 不可知的Python。现有软件包的主要焦点是用于Python,的确Conda本身是用Python编写的,但是您也可以拥有用于C库,R软件包或其他任何东西的Conda软件包。
  • 安装二进制文件。有一个名为的工具conda build可以从源代码构建软件包,但conda install它本身可以从已构建的Conda软件包安装东西。
  • 外部。Conda是Anaconda的软件包管理器,它是Continuum Analytics提供的Python发行版,但也可以在Anaconda之外使用。您可以通过pip安装将其与现有的Python安装配合使用(尽管除非您有充分的理由使用现有的安装,否则不建议这样做)。

在两种情况下:

  • 用Python编写
  • 开源(Conda是BSD,pip是MIT)

实际上,Conda的前两个要点是使许多包装优于点子的原因。由于pip是从源代码安装的,因此如果您无法编译源代码,则可能会很麻烦地安装东西(在Windows上尤其如此,但在Linux上,如果软件包中包含一些困难的C或FORTRAN库,甚至可能也是这样。依赖项)。Conda从二进制安装,这意味着某人(例如Continuum)已经完成了编译软件包的艰苦工作,因此安装很容易。

如果您对构建自己的软件包感兴趣,也有一些区别。例如,pip是建立在setuptools之上的,而Conda使用自己的格式,这种格式具有一些优点(例如,静态的,Python不可知的)。

Here is a short rundown:

pip

  • Python packages only.
  • Compiles everything from source. EDIT: pip now installs binary wheels, if they are available.
  • Blessed by the core Python community (i.e., Python 3.4+ includes code that automatically bootstraps pip).

conda

  • Python agnostic. The main focus of existing packages is on Python, and indeed Conda itself is written in Python, but you can also have Conda packages for C libraries, or R packages, or really anything.
  • Installs binaries. There is a tool called conda build that builds packages from source, but conda install itself installs things from already built Conda packages.
  • External. Conda is the package manager of Anaconda, the Python distribution provided by Continuum Analytics, but it can be used outside of Anaconda too. You can use it with an existing Python installation by pip installing it (though this is not recommended unless you have a good reason to use an existing installation).

In both cases:

  • Written in Python
  • Open source (Conda is BSD and pip is MIT)

The first two bullet points of Conda are really what make it advantageous over pip for many packages. Since pip installs from source, it can be painful to install things with it if you are unable to compile the source code (this is especially true on Windows, but it can even be true on Linux if the packages have some difficult C or FORTRAN library dependencies). Conda installs from binary, meaning that someone (e.g., Continuum) has already done the hard work of compiling the package, and so the installation is easy.

There are also some differences if you are interested in building your own packages. For instance, pip is built on top of setuptools, whereas Conda uses its own format, which has some advantages (like being static, and again, Python agnostic).


回答 2

其他答案对这些细节给出了合理的描述,但我想强调一些高级要点。

pip是一个软件包管理器,可简化python软件包的安装,升级和卸载。它还适用于虚拟python环境。

conda是任何软件(安装,升级和卸载)的软件包管理器。它还适用于虚拟系统环境。

conda设计的目标之一是促进用户所需的整个软件堆栈的软件包管理,其中一个或多个python版本可能只是其中的一小部分。这包括低级库(例如线性代数),编译器(例如Windows上的mingw),编辑器,版本控制工具(例如Hg和Git)或其他需要分发和管理的内容

对于版本管理,pip允许您在多个python环境之间切换和管理。

Conda允许您在多个通用环境之间进行切换和管理,在多个通用环境中,其他多个版本的版本号可能会有所不同,例如C库,编译器,测试套件或数据库引擎等。

Conda不是以Windows为中心的,但是在Windows上,当需要安装和管理需要编译的复杂科学软件包时,它是目前可用的高级解决方案。

当我想到尝试通过Windows上的pip编译许多这些软件包或pip install在需要编译时调试失败的会话时浪费了多少时间时,我想哭。

最后,Continuum Analytics还托管(免费)binstar.org(现在称为anaconda.org),以允许常规软件包开发人员创建自己的自定义(内置!)软件堆栈,包用户可以conda install从中使用它们。

The other answers give a fair description of the details, but I want to highlight some high-level points.

pip is a package manager that facilitates installation, upgrade, and uninstallation of python packages. It also works with virtual python environments.

conda is a package manager for any software (installation, upgrade and uninstallation). It also works with virtual system environments.

One of the goals with the design of conda is to facilitate package management for the entire software stack required by users, of which one or more python versions may only be a small part. This includes low-level libraries, such as linear algebra, compilers, such as mingw on Windows, editors, version control tools like Hg and Git, or whatever else requires distribution and management.

For version management, pip allows you to switch between and manage multiple python environments.

Conda allows you to switch between and manage multiple general purpose environments across which multiple other things can vary in version number, like C-libraries, or compilers, or test-suites, or database engines and so on.

Conda is not Windows-centric, but on Windows it is by far the superior solution currently available when complex scientific packages requiring compilation are required to be installed and managed.

I want to weep when I think of how much time I have lost trying to compile many of these packages via pip on Windows, or debug failed pip install sessions when compilation was required.

As a final point, Continuum Analytics also hosts (free) binstar.org (now called anaconda.org) to allow regular package developers to create their own custom (built!) software stacks that their package-users will be able to conda install from.


回答 3

不要再让您感到困惑了,但是您也可以在conda环境中使用pip,这可以验证上面的一般管理员和python特定管理员的评论。

conda install -n testenv pip
source activate testenv
pip <pip command>

您还可以将pip添加到任何环境的默认程序包中,因此每次都会显示pip,因此您不必遵循上述代码段。

Not to confuse you further, but you can also use pip within your conda environment, which validates the general vs. python specific managers comments above.

conda install -n testenv pip
source activate testenv
pip <pip command>

you can also add pip to default packages of any environment so it is present each time so you don’t have to follow the above snippet.


回答 4

引用康达在Continuum网站上发表的关于数据科学的文章:

康达vs点

Python程序员可能很熟悉pip从PyPI下载软件包并管理他们的要求。尽管conda和pip都是程序包管理器,但它们却大不相同:

  • Pip是特定于Python软件包的,而conda是与语言无关的,这意味着我们可以使用conda管理任何语言的软件包。
  • Conda本机创建与语言无关的环境,而pip依靠virtualenv仅管理Python环境尽管建议始终使用conda软件包,但conda也包含pip,因此您不必在这两者之间进行选择。例如,要安装没有conda软件包但可通过pip获得的python软件包,请运行,例如:
conda install pip
pip install gensim

Quoting from the Conda for Data Science article on Continuum’s website:

Conda vs pip

Python programmers are probably familiar with pip to download packages from PyPI and manage their requirements. Although both conda and pip are package managers, they are very different:

  • Pip is specific for Python packages and conda is language-agnostic, which means we can use conda to manage packages from any language
  • Pip compiles from source and conda installs binaries, removing the burden of compilation
  • Conda creates language-agnostic environments natively whereas pip relies on virtualenv to manage only Python environments

Though it is recommended to always use conda packages, conda also includes pip, so you don’t have to choose between the two. For example, to install a python package that does not have a conda package, but is available through pip, just run, for example:

conda install pip
pip install gensim

回答 5

引用《Conda:神话与误解》(全面描述):

误解3:Conda和Pip是直接竞争对手

现实:Conda和pip服务于不同的目的,仅直接竞争一小部分任务:即在隔离的环境中安装Python软件包。

皮普,代表P IP nstalls P ackages,是Python的官方认可的包管理器,并且是最常用的在其上安装Python包索引(PyPI中)发布的数据包。pip和PyPI均受Python Packaging Authority(PyPA)管辖和支持。

简而言之,pip是Python软件包的通用管理器。conda是与语言无关的跨平台环境管理器。对于用户而言,最明显的区别可能是:pip在任何环境中安装python软件包;conda在conda环境中安装任何软件包。如果您要做的只是在隔离的环境中安装Python软件包,则conda和pip + virtualenv通常是可互换的,从而在依赖项处理和软件包可用性方面取得了一些差异。隔离环境是指conda-env或virtualenv,您可以在其中安装软件包而无需修改系统Python安装。

即使抛开神话#2,如果我们只关注Python软件包的安装,conda和pip也可以为不同的受众和不同的目的服务。例如,如果要管理现有系统Python安装中的Python软件包,conda不能为您提供帮助:根据设计,它只能在conda环境中安装软件包。例如,如果您想使用许多依赖于外部依赖关系的Python包(NumPy,SciPy和Matplotlib是常见的示例),而以有意义的方式跟踪这些依赖关系时,pip并不能帮助您:通过设计,它仅管理Python软件包。

Conda和pip不是竞争对手,而是针对不同用户群和使用方式的工具。

Quoting from Conda: Myths and Misconceptions (a comprehensive description):

Myth #3: Conda and pip are direct competitors

Reality: Conda and pip serve different purposes, and only directly compete in a small subset of tasks: namely installing Python packages in isolated environments.

Pip, which stands for Pip Installs Packages, is Python’s officially-sanctioned package manager, and is most commonly used to install packages published on the Python Package Index (PyPI). Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).

In short, pip is a general-purpose manager for Python packages; conda is a language-agnostic cross-platform environment manager. For the user, the most salient distinction is probably this: pip installs python packages within any environment; conda installs any package within conda environments. If all you are doing is installing Python packages within an isolated environment, conda and pip+virtualenv are mostly interchangeable, modulo some difference in dependency handling and package availability. By isolated environment I mean a conda-env or virtualenv, in which you can install packages without modifying your system Python installation.

Even setting aside Myth #2, if we focus on just installation of Python packages, conda and pip serve different audiences and different purposes. If you want to, say, manage Python packages within an existing system Python installation, conda can’t help you: by design, it can only install packages within conda environments. If you want to, say, work with the many Python packages which rely on external dependencies (NumPy, SciPy, and Matplotlib are common examples), while tracking those dependencies in a meaningful way, pip can’t help you: by design, it manages Python packages and only Python packages.

Conda and pip are not competitors, but rather tools focused on different groups of users and patterns of use.


回答 6

对于WINDOWS用户

最近,“标准”包装工具的状况正在改善:

  • 截至9月,在pypi本身上,有48%的车轮包装。2015年11月11日(高于2015年5月的38%和2014年9月的24%),

  • 现在,最新的python 2.7.9支持开箱即用的wheel格式,

“标准” +“调整”包装工具的状况也在改善:

  • 您可以在http://www.lfd.uci.edu/~gohlke/pythonlibs上找到几乎所有关于转轮格式的科学软件包,

  • mingwpy项目可能有一天为Windows用户带来一个“编译”包,允许在需要时从源代码安装所有内容。

“康达”包装对于所服务的市场而言仍然更好,并强调了“标准” 应该改进的地方。

(同样,在标准车轮系统和conda系统中,或者在扩展方面,依赖规范的多方面努力不是很Python,如果所有这些打包的“核心”技术都可以通过某种PEP收敛,那就太好了)

For WINDOWS users

The situation of the “standard” packaging tools has been improving recently:

  • on PyPI itself, 48% of packages are now available as wheels as of Sept. 11th, 2015 (up from 38% in May 2015 and 24% in Sept. 2014),

  • the wheel format is supported out of the box as of Python 2.7.9.

The situation of the “standard” + “tweaks” packaging tools is improving as well:

  • you can find nearly all scientific packages in wheel format at http://www.lfd.uci.edu/~gohlke/pythonlibs,

  • the mingwpy project may one day bring a “compilation” package to Windows users, allowing everything to be installed from source when needed.

“Conda” packaging remains better for the market it serves, and highlights areas where the “standard” should improve.

(Also, duplicating the dependency specification effort across the standard wheel system, the conda system, and buildout is not very Pythonic; it would be nice if all these packaging “core” techniques could converge via some sort of PEP.)


回答 7

pip 是包裹经理。

conda 既是包管理器又是环境管理器。

详情:


参考文献

pip is a package manager.

conda is both a package manager and an environment manager.

Detail:


References


回答 8

我可以使用pip安装iPython吗?

当然,两者(第一种方法在页面上)

pip install ipython

和(第三种方法,第二种是conda

您可以从GitHub或PyPI手动下载IPython。要安装这些版本之一,请解压缩它并使用终端从顶级源目录运行以下命令:

pip install .

官方推荐的安装方法

当我已经有了pip时,为什么还要使用conda作为另一个python软件包管理器?

这里所说:

如果您需要一个特定的软件包,也许仅用于一个项目,或者需要与其他人共享该项目,那么conda似乎更合适。

康达(YMMV)超过点

  • 使用非Python工具的项目
  • 与同事分享
  • 在版本之间切换
  • 在具有不同库版本的项目之间切换

pip和conda有什么区别?

其他所有人对此都有广泛的回答。

Can I use pip to install iPython?

Sure, both (first approach on page)

pip install ipython

and (third approach, second is conda)

You can manually download IPython from GitHub or PyPI. To install one of these versions, unpack it and run the following from the top-level source directory using the Terminal:

pip install .

are officially recommended ways to install.

Why should I use conda as another python package manager when I already have pip?

As said here:

If you need a specific package, maybe only for one project, or if you need to share the project with someone else, conda seems more appropriate.

Conda surpasses pip in (YMMV)

  • projects that use non-python tools
  • sharing with colleagues
  • switching between versions
  • switching between projects with different library versions

What is the difference between pip and conda?

That is extensively answered by everyone else.


回答 9

pip 仅适用于Python

conda仅适用于Anaconda +其他科学软件包,例如R依赖等。并非每个人都需要Python附带的Anaconda。Anaconda主要适合那些进行机器学习/深度学习等的人。Casual Python开发人员不会在他的笔记本电脑上运行Anaconda。

pip is for Python only

conda is only for Anaconda + other scientific packages like R dependencies etc. NOT everyone needs Anaconda that already comes with Python. Anaconda is mostly for those who do Machine learning/deep learning etc. Casual Python dev won’t run Anaconda on his laptop.


回答 10

我可能已经发现了另一小的区别。我在python环境下/usr而不是在/home任何环境下。为了安装它,我将不得不使用sudo install pip。对我来说,不想要的副作用sudo install pip是比被广泛报道的其他地方略有不同:这样做之后,我还得跑pythonsudo以进口任何的sudo-installed包。我放弃了这一点,最终发现我可以sudo conda将软件包安装到一个环境中/usr,然后在该环境下可以正常导入而不需要sudo获得许可python。我什sudo conda至习惯于修复损坏的东西,pip而不是使用sudo pip uninstall pipor sudo pip --upgrade install pip

I may have found one further, minor difference. I have my Python environments under /usr rather than /home or wherever. In order to install to them, I would have to use sudo pip install. For me, the undesired side effect of sudo pip install was slightly different from what is widely reported elsewhere: after doing so, I had to run python with sudo in order to import any of the sudo-installed packages. I gave up on that and eventually found I could use sudo conda to install packages to an environment under /usr, which then imported normally without needing sudo permission for python. I even used sudo conda to fix a broken pip rather than using sudo pip uninstall pip or sudo pip install --upgrade pip.


Python中的单引号与双引号[关闭]

问题:Python中的单引号与双引号[关闭]

根据文档,它们几乎可以互换。是否有出于某种风格的原因要在一个之上使用另一个?

According to the documentation, they’re pretty much interchangeable. Is there a stylistic reason to use one over the other?


回答 0

我喜欢在用于插值或自然语言消息的字符串周围使用双引号,对于像符号一样小的字符串使用单引号,但是如果字符串包含引号或我忘记了,则会违反规则。我对文档字符串使用三重双引号,对正则表达式使用原始字符串文字,即使不需要它们也是如此。

例如:

import re

LIGHT_MESSAGES = {
    'English': "There are %(number_of_lights)s lights.",
    'Pirate':  "Arr! Thar be %(number_of_lights)s lights."
}

def lights_message(language, number_of_lights):
    """Return a language-appropriate string reporting the light count."""
    return LIGHT_MESSAGES[language] % locals()

def is_pirate(message):
    """Return True if the given message sounds piratical."""
    return re.search(r"(?i)(arr|avast|yohoho)!", message) is not None

I like to use double quotes around strings that are used for interpolation or that are natural language messages, and single quotes for small symbol-like strings, but will break the rules if the strings contain quotes, or if I forget. I use triple double quotes for docstrings and raw string literals for regular expressions even if they aren’t needed.

For example:

import re

LIGHT_MESSAGES = {
    'English': "There are %(number_of_lights)s lights.",
    'Pirate':  "Arr! Thar be %(number_of_lights)s lights."
}

def lights_message(language, number_of_lights):
    """Return a language-appropriate string reporting the light count."""
    return LIGHT_MESSAGES[language] % locals()

def is_pirate(message):
    """Return True if the given message sounds piratical."""
    return re.search(r"(?i)(arr|avast|yohoho)!", message) is not None

回答 1

https://docs.python.org/2.0/ref/strings.html引用官方文档:

用简单的英语:字符串文字可以用匹配的单引号(’)或双引号(“)括起来。

因此没有区别。取而代之的是,人们会告诉您选择与上下文匹配并且一致的样式。我会同意-补充一点,试图为此类事情提出“惯例”是没有意义的,因为这样只会使任何新来者感到困惑。

Quoting the official docs at https://docs.python.org/2.0/ref/strings.html:

In plain English: String literals can be enclosed in matching single quotes (') or double quotes (").

So there is no difference. Instead, people will tell you to choose whichever style that matches the context, and to be consistent. And I would agree – adding that it is pointless to try to come up with “conventions” for this sort of thing because you’ll only end up confusing any newcomers.
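As a quick check of the claim that the two quote styles are equivalent, here is a minimal sketch. It assumes Python 3 syntax, and the variable names are purely illustrative.

```python
# Both literals denote the same string value; only the source spelling differs.
single = 'spam'
double = "spam"

print(single == double)              # True
print(type(single) is type(double))  # True
```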


回答 2

我以前喜欢',尤其是对'''docstrings''',因为我觉得"""this creates some fluff"""。另外,'无需Shift我的瑞士德语键盘上的键即可键入。

从那以后,我改变为使用三引号"""docstrings""",以符合PEP 257

I used to prefer ', especially for '''docstrings''', as I find """this creates some fluff""". Also, ' can be typed without the Shift key on my Swiss German keyboard.

I have since changed to using triple quotes for """docstrings""", to conform to PEP 257.


回答 3

我与Will在一起:

  • 文字双引号
  • 行为类似于标识符的单引号
  • 正则表达式的双引号原始字符串文字
  • 文档字符串三重双引号

我会坚持下去,即使这意味着很多逃避。

从引号引起来的单引号标识符中,我获得了最大的价值。其余的做法只是为了给那些单引号标识符留出一定的空间。

I’m with Will:

  • Double quotes for text
  • Single quotes for anything that behaves like an identifier
  • Double quoted raw string literals for regexps
  • Tripled double quotes for docstrings

I’ll stick with that even if it means a lot of escaping.

I get the most value out of single quoted identifiers standing out because of the quotes. The rest of the practices are there just to give those single quoted identifiers some standing room.


回答 4

如果您的字符串包含一个,则应使用另一个。例如"You're able to do this",或'He said "Hi!"'。除此之外,您应该尽可能地保持一致(在模块内,包内,项目内,组织内)。

如果您的代码将由使用C / C ++的人员阅读(或者如果您在这些语言和Python之间切换),则将其''用于单字符字符串和""较长的字符串可能有助于简化转换。(同样地,对于遵循其他不可互换的其他语言)。

我在野外看到的Python代码倾向于优先"',但只是略微偏偏。从我所看到的情况来看,一个exceptions是,"""these"""它比普遍得多'''these'''

If the string you have contains one, then you should use the other. For example, "You're able to do this", or 'He said "Hi!"'. Other than that, you should simply be as consistent as you can (within a module, within a package, within a project, within an organisation).

If your code is going to be read by people who work with C/C++ (or if you switch between those languages and Python), then using '' for single-character strings, and "" for longer strings might help ease the transition. (Likewise for following other languages where they are not interchangeable).

The Python code I’ve seen in the wild tends to favour " over ', but only slightly. The one exception is that """these""" are much more common than '''these''', from what I have seen.
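To illustrate the "if the string contains one quote, use the other" rule, the sketch below shows that Python's own repr() follows the same idea. It assumes Python 3, and the variable names are illustrative only.

```python
# repr() defaults to single quotes and switches to double quotes
# when that avoids escaping an apostrophe.
with_apostrophe = "You're able to do this"
with_double = 'He said "Hi!"'
with_both = 'It\'s "fine"'

print(repr(with_apostrophe))  # "You're able to do this"
print(repr(with_double))      # 'He said "Hi!"'
print(repr(with_both))        # 'It\'s "fine"'
```

When a string contains both kinds of quote, escaping cannot be avoided in a one-line literal, which is where triple-quoted strings can help.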


回答 5

用三引号引起来的注释是该问题的一个有趣的子主题。PEP 257指定文档字符串的三引号。我使用Google代码搜索进行了快速检查,发现Python中的三重双引号大约是三重单引号的 10倍-在Google索引的代码中出现了130万对131K。因此,在多行情况下,如果使用三重双引号,您的代码可能会变得更加熟悉。

Triple quoted comments are an interesting subtopic of this question. PEP 257 specifies triple quotes for doc strings. I did a quick check using Google Code Search and found that triple double quotes in Python are about 10x as popular as triple single quotes — 1.3M vs 131K occurrences in the code Google indexes. So in the multi line case your code is probably going to be more familiar to people if it uses triple double quotes.
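For reference, a docstring in the triple-double-quote style that PEP 257 describes might look like the following sketch. The function greet is hypothetical and exists only to carry the docstring; Python 3 is assumed.

```python
def greet(name):
    """Return a greeting for ``name``.

    The triple double quotes let the docstring span several lines
    and contain 'single' quotes without escaping.
    """
    return "Hello, " + name


print(greet.__doc__.splitlines()[0])  # Return a greeting for ``name``.
```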


回答 6

"If you're going to use apostrophes, 
       ^

you'll definitely want to use double quotes".
   ^

由于这个简单的原因,我总是在外面使用双引号。总是

说到绒毛,如果您将不得不使用转义字符来表示撇号,那么用’简化字符串文字有什么好处?它会冒犯编码员阅读小说吗?我无法想象高中英语课对你有多痛苦!

"If you're going to use apostrophes, 
       ^

you'll definitely want to use double quotes".
   ^

For that simple reason, I always use double quotes on the outside. Always.

Speaking of fluff, what good is streamlining your string literals with ' if you’re going to have to use escape characters to represent apostrophes? Does it offend coders to read novels? I can’t imagine how painful high school English class was for you!


回答 7

Python使用如下引号:

mystringliteral1="this is a string with 'quotes'"
mystringliteral2='this is a string with "quotes"'
mystringliteral3="""this is a string with "quotes" and more 'quotes'"""
mystringliteral4='''this is a string with 'quotes' and more "quotes"'''
mystringliteral5='this is a string with \"quotes\"'
mystringliteral6='this is a string with \042quotes\042'
mystringliteral7='this is a string with \047quotes\047'

print mystringliteral1
print mystringliteral2
print mystringliteral3
print mystringliteral4
print mystringliteral5
print mystringliteral6
print mystringliteral7

给出以下输出:

this is a string with 'quotes'
this is a string with "quotes"
this is a string with "quotes" and more 'quotes'
this is a string with 'quotes' and more "quotes"
this is a string with "quotes"
this is a string with "quotes"
this is a string with 'quotes'

Python uses quotes something like this:

mystringliteral1="this is a string with 'quotes'"
mystringliteral2='this is a string with "quotes"'
mystringliteral3="""this is a string with "quotes" and more 'quotes'"""
mystringliteral4='''this is a string with 'quotes' and more "quotes"'''
mystringliteral5='this is a string with \"quotes\"'
mystringliteral6='this is a string with \042quotes\042'
mystringliteral7='this is a string with \047quotes\047'

print mystringliteral1
print mystringliteral2
print mystringliteral3
print mystringliteral4
print mystringliteral5
print mystringliteral6
print mystringliteral7

Which gives the following output:

this is a string with 'quotes'
this is a string with "quotes"
this is a string with "quotes" and more 'quotes'
this is a string with 'quotes' and more "quotes"
this is a string with "quotes"
this is a string with "quotes"
this is a string with 'quotes'

回答 8

我通常使用双引号,但出于某种原因而不是使用双引号-可能只是出于Java的习惯。

我猜您也更希望内联文字字符串中使用撇号,而不是双引号。

I use double quotes in general, but not for any specific reason – Probably just out of habit from Java.

I guess you’re also more likely to want apostrophes in an inline literal string than you are to want double quotes.


回答 9

我个人坚持一个或另一个。没关系 提供您自己的意思来引用任何一种,只是在您进行协作时使其他人感到困惑。

Personally I stick with one or the other. It doesn’t matter. And providing your own meaning to either quote is just to confuse other people when you collaborate.


回答 10

风格上的偏爱可能比什么都重要。我只是检查了PEP 8,却没有提到单引号和双引号。

我更喜欢单引号,因为它只有一个击键而不是两个击键。也就是说,我不必混入Shift键即可制作单引号。

It’s probably a stylistic preference more than anything. I just checked PEP 8 and didn’t see any mention of single versus double quotes.

I prefer single quotes because it’s only one keystroke instead of two. That is, I don’t have to mash the Shift key to make a single quote.


回答 11

在Perl中,当您有不需要插入变量或\ n,\ t,\ r等转义字符的字符串时,您想使用单引号。

PHP与Perl的区别是相同的:单引号中的内容将不会被解释(甚至不会转换\ n),而双引号中可能包含变量以显示其值。

恐怕Python没有。从技术上看,在Python中没有$令牌(或类似符号)可将名称/文本与变量分开。毕竟,这两个功能使Python更具可读性,减少了混乱。单引号和双引号可以在Python中互换使用。

In Perl you want to use single quotes when you have a string which doesn’t need to interpolate variables or escaped characters like \n, \t, \r, etc.

PHP makes the same distinction as Perl: content in single quotes will not be interpreted (not even \n will be converted), as opposed to double quotes which can contain variables to have their value printed out.

Python does not, I’m afraid. Technically seen, there is no $ token (or the like) to separate a name/text from a variable in Python. Both features make Python more readable, less confusing, after all. Single and double quotes can be used interchangeably in Python.
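A small sketch of that point, assuming Python 3: both quote styles process escape sequences identically, and neither interpolates variables on its own. The variable name is illustrative.

```python
name = "world"

# Escape sequences are processed the same way in both quote styles.
print('a\tb' == "a\tb")   # True
print(len('a\nb'))        # 3

# Neither quote style interpolates variables by itself;
# formatting is always an explicit operation.
print('Hello, $name')            # Hello, $name
print("Hello, %s" % name)        # Hello, world
print('Hello, {}'.format(name))  # Hello, world
```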


回答 12

我选择使用双引号,因为它们更易于查看。

I chose to use double quotes because they are easier to see.


回答 13

我只是用当时我喜欢的任何东西;能够一时之间在两者之间切换很方便!

当然,引用报价字符时,毕竟在两者之间切换可能不是那么古怪……

I just use whatever strikes my fancy at the time; it’s convenient to be able to switch between the two at a whim!

Of course, when quoting quote characters, switching between the two might not be so whimsical after all…


回答 14

您的团队的品味或项目的编码准则。

例如,如果您处于多语言环境中,则可能希望鼓励对其他语言使用的字符串使用相同类型的引号。另外,我个人最喜欢“

Your team’s taste or your project’s coding guidelines.

If you are in a multilanguage environment, you might wish to encourage the use of the same type of quotes for strings that the other language uses, for instance. Otherwise, I personally like the look of ' best.


回答 15

据我所知没有。尽管如果看一些代码,“”通常用于文本字符串(我猜想’在文本内部比’更为常见),并且”出现在哈希键之类的东西中。

None as far as I know. Although if you look at some code, " " is commonly used for strings of text (I guess ' is more common inside text than "), and ' ' appears in hash keys and things like that.


回答 16

我的目标是尽量减少像素和惊喜。我通常更喜欢'最小化像素,但是"如果字符串中带有单引号,则我还是希望最小化像素。但是,对于文档字符串,我更喜欢"""'''因为后者是非标准的,不常见的,因此令人惊讶。如果现在我"按照上述逻辑使用了一堆字符串,但又可以避免使用a字符串,那么我'仍然可以"在其中使用它来保持一致性,以最大程度地减少意外。

也许可以通过以下方式来考虑像素最小化原理。您希望英文字符看起来像A B C还是AA BB CC?后一种选择浪费了50%的非空像素。

I aim to minimize both pixels and surprise. I typically prefer ' in order to minimize pixels, but " instead if the string has an apostrophe, again to minimize pixels. For a docstring, however, I prefer """ over ''' because the latter is non-standard, uncommon, and therefore surprising. If now I have a bunch of strings where I used " per the above logic, but also one that can get away with a ', I may still use " in it to preserve consistency, only to minimize surprise.

Perhaps it helps to think of the pixel minimization philosophy in the following way. Would you rather that English characters looked like A B C or AA BB CC? The latter choice wastes 50% of the non-empty pixels.


回答 17

我使用双引号,是因为除Bash之外,大多数语言(C ++,Java,VB等)都已经使用多年了,因为我也在普通文本中使用双引号,并且因为我在使用(修改过的)非英语键盘,这两个字符都需要使用Shift键。

I use double quotes because I have been doing so for years in most languages (C++, Java, VB…) except Bash, because I also use double quotes in normal text and because I’m using a (modified) non-English keyboard where both characters require the shift key.


回答 18

' = "

/= \=\\

例如:

f = open('c:\word.txt', 'r')
f = open("c:\word.txt", "r")
f = open("c:/word.txt", "r")
f = open("c:\\\word.txt", "r")

结果是一样的

= >>不,他们不一样。单个反斜杠将转义字符。在该示例中,您只是碰巧了运气,因为\k\w不是有效的转义字符,例如\tor \n\\or\"

如果要使用单个反斜杠(并且将反斜杠解释为反斜杠),则需要使用“原始”字符串。您可以通过r在字符串前面加上“ ”来实现

im_raw = r'c:\temp.txt'
non_raw = 'c:\\temp.txt'
another_way = 'c:/temp.txt'

就Windows中的路径而言,正斜杠的解释方式相同。显然,字符串本身是不同的。但是,我不能保证在外部设备上会以这种方式处理它们。

' = "

/ = \ = \\

example :

f = open('c:\word.txt', 'r')
f = open("c:\word.txt", "r")
f = open("c:/word.txt", "r")
f = open("c:\\\word.txt", "r")

Results are the same

=>> no, they’re not the same. A single backslash will escape characters. You just happen to luck out in that example because \k and \w aren’t valid escapes like \t or \n or \\ or \"

If you want to use single backslashes (and have them interpreted as such), then you need to use a “raw” string. You can do this by putting an 'r' in front of the string:

im_raw = r'c:\temp.txt'
non_raw = 'c:\\temp.txt'
another_way = 'c:/temp.txt'

As far as paths in Windows are concerned, forward slashes are interpreted the same way. Clearly the string itself is different though. I wouldn’t guarantee that they’re handled this way on an external device though.
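The difference between the raw and non-raw spellings can be checked directly. The sketch below assumes Python 3; the path c:\temp.txt is just the example string from above, and no file is opened.

```python
raw = r'c:\temp.txt'        # raw literal: the backslash is kept as-is
escaped = 'c:\\temp.txt'    # doubled backslash: same resulting string
accidental = 'c:\temp.txt'  # \t is parsed as a TAB character

print(raw == escaped)       # True
print(len(raw))             # 11
print(len(accidental))      # 10
print('\t' in accidental)   # True
```

The accidental variant is one character shorter because the two characters \t collapsed into a single tab, which is exactly the kind of silent change the raw string avoids.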


Python 2.X中的range和xrange函数之间有什么区别?

问题:Python 2.X中的range和xrange函数之间有什么区别?

显然,xrange更快,但是我不知道为什么它更快(到目前为止,除了轶事之外,还没有证据表明它更快)或除此之外还有什么不同

for i in range(0, 20):
for i in xrange(0, 20):

Apparently xrange is faster but I have no idea why it’s faster (and no proof besides the anecdotal so far that it is faster) or what besides that is different about

for i in range(0, 20):
for i in xrange(0, 20):

回答 0

在Python 2.x中:

  • range创建一个列表,所以如果您这样做range(1, 10000000),则会在内存中创建一个包含9999999元素的列表。

  • xrange 是一个延迟计算的序列对象。

在Python 3中,range它等效于python xrange,并且必须使用来获取列表list(range(...))

In Python 2.x:

  • range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.

  • xrange is a sequence object that evaluates lazily.

In Python 3, range does the equivalent of Python 2’s xrange, and to get the list, you have to use list(range(...)).
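As a rough illustration of the Python 3 behaviour described here (variable names are illustrative, and the snippet assumes Python 3 since xrange no longer exists there):

```python
lazy = range(1, 10000000)      # no list of numbers is built here

print(len(lazy))               # 9999999
print(lazy[0], lazy[-1])       # 1 9999999
print(isinstance(lazy, list))  # False

# An actual list is built only when explicitly requested.
small = list(range(1, 6))
print(small)                   # [1, 2, 3, 4, 5]
```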


回答 1

range会创建一个列表,因此,如果执行range(1, 10000000)此操作,则会在内存中创建一个包含9999999元素的列表。

xrange 是一个生成器,所以它是一个序列对象,是一个懒惰求值的对象。

的确如此,但是在Python 3中,.range()将由Python 2实现.xrange()。如果需要实际生成列表,则需要执行以下操作:

list(range(1,100))

range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.

xrange is a sequence object that evaluates lazily (it is often called a generator, although strictly speaking it is not one).

This is true, but in Python 3, range() is implemented like the Python 2 xrange(). If you need to actually generate the list, you will need to do:

list(range(1,100))

回答 2

记住,使用该timeit模块来测试较小的代码片段更快!

$ python -m timeit 'for i in range(1000000):' ' pass'
10 loops, best of 3: 90.5 msec per loop
$ python -m timeit 'for i in xrange(1000000):' ' pass'
10 loops, best of 3: 51.1 msec per loop

就个人而言,.range()除非我处理的列表非常庞大,否则我总是使用-从时间上可以看出,对于一百万个条目的列表,额外的开销只有0.04秒。正如Corey所指出的那样,在Python 3.0中,它.xrange()将会消失,并且.range()无论如何都会为您提供良好的迭代器行为。

Remember, use the timeit module to test which small snippet of code is faster!

$ python -m timeit 'for i in range(1000000):' ' pass'
10 loops, best of 3: 90.5 msec per loop
$ python -m timeit 'for i in xrange(1000000):' ' pass'
10 loops, best of 3: 51.1 msec per loop

Personally, I always use range(), unless I am dealing with really huge lists. As you can see, time-wise, for a list of a million entries, the extra overhead is only 0.04 seconds. And as Corey points out, in Python 3.0 xrange() will go away and range() will give you nice iterator behavior anyway.
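The same kind of measurement can also be made from inside a script. This is only a sketch, assuming Python 3, where the closest analogue of the old range/xrange comparison is a lazy range versus an explicitly built list; the loop size and repetition count are arbitrary choices, and the timings will vary by machine.

```python
import timeit

# Time iterating over a lazy range versus a fully built list (Python 3).
lazy_time = timeit.timeit("for i in range(100000): pass", number=20)
list_time = timeit.timeit("for i in list(range(100000)): pass", number=20)

print("range:       %.4f s" % lazy_time)
print("list(range): %.4f s" % list_time)
```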


回答 3

xrange仅存储范围参数并按需生成数字。但是,Python的C实现当前将其args限制为C long:

xrange(2**32-1, 2**32+1)  # When long is 32 bits, OverflowError: Python int too large to convert to C long
range(2**32-1, 2**32+1)   # OK --> [4294967295L, 4294967296L]

请注意,在Python 3.0中仅存在,range并且其行为类似于2.x,xrange但对最小和最大端点没有限制。

xrange only stores the range params and generates the numbers on demand. However the C implementation of Python currently restricts its args to C longs:

xrange(2**32-1, 2**32+1)  # When long is 32 bits, OverflowError: Python int too large to convert to C long
range(2**32-1, 2**32+1)   # OK --> [4294967295L, 4294967296L]

Note that in Python 3.0 there is only range and it behaves like the 2.x xrange but without the limitations on minimum and maximum end points.
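A short sketch of that last point, assuming Python 3: endpoints well beyond the size of a C long are accepted. The variable names are illustrative.

```python
start = 2**64                  # far beyond a 64-bit C long
big = range(start, start + 3)

print(list(big))       # [18446744073709551616, 18446744073709551617, 18446744073709551618]
print(big[1] - start)  # 1
```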


回答 4

xrange返回一个迭代器,一次只在内存中保留一个数字。range将整个数字列表保留在内存中。

xrange returns a lazy xrange object that produces numbers on demand, so only one number is kept in memory at a time. range keeps the entire list of numbers in memory.


回答 5

一定要花一些时间在图书馆参考上。您越熟悉它,就可以更快地找到此类问题的答案。关于内置对象和类型的前几章特别重要。

xrange类型的优点在于,无论xrange对象代表的范围大小如何,它始终将占用相同的内存量。没有一致的性能优势。

查找有关Python构造的快速信息的另一种方法是docstring和help-function:

print xrange.__doc__ # def doc(x): print x.__doc__ is super useful
help(xrange)

Do spend some time with the Library Reference. The more familiar you are with it, the faster you can find answers to questions like this. Especially important are the first few chapters about builtin objects and types.

The advantage of the xrange type is that an xrange object will always take the same amount of memory, no matter the size of the range it represents. There are no consistent performance advantages.

Another way to find quick information about a Python construct is the docstring and the help-function:

print xrange.__doc__ # def doc(x): print x.__doc__ is super useful
help(xrange)

回答 6

我很震惊,没有人读doc

此函数非常类似于range(),但是返回一个xrange对象而不是一个列表。这是一种不透明的序列类型,其产生的值与对应的列表相同,而实际上并没有同时存储它们。xrange()over 的优点range()是最小的(因为xrange()在要求输入值时仍必须创建值),除非在内存不足的计算机上使用了非常大的范围或从未使用过范围的所有元素时(例如当循环被使用时)。通常以break)终止。

I am shocked nobody read the docs:

This function is very similar to range(), but returns an xrange object instead of a list. This is an opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously. The advantage of xrange() over range() is minimal (since xrange() still has to create the values when asked for them) except when a very large range is used on a memory-starved machine or when all of the range’s elements are never used (such as when the loop is usually terminated with break).


回答 7

range创建一个列表,因此如果执行range(1,10000000),它将在内存中创建一个包含10000000个元素的列表。xrange是一个生成器,因此它懒惰地求值。

这为您带来两个优点:

  1. 您可以迭代更长的列表而无需获取MemoryError
  2. 当它懒散地解析每个数字时,如果您尽早停止迭代,您将不会浪费时间创建整个列表。

range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements. xrange is a lazily evaluated sequence object, so it produces its values on demand.

This brings you two advantages:

  1. You can iterate longer lists without getting a MemoryError.
  2. As it resolves each number lazily, if you stop iteration early, you won’t waste time creating the whole list.

回答 8

在这个简单的示例中,您将发现xrangeover 的优点range

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 4.49153590202 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 7.04547905922 seconds

上面的示例在的情况下并没有任何实质性的改善xrange

现在range,与相比,以下情况的确非常慢xrange

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 0.000764846801758 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer() 

print "time taken: ", (t2-t1)  # 2.78506207466 seconds

使用range,它已经创建了一个从0到100000000(耗时)的列表,但是它xrange是一个生成器,并且仅根据需要(即,如果继续迭代)生成数字。

在Python-3中,该range功能的实现与xrangePython-2中的相同,而xrange在Python-3中已取消了该功能。

快乐编码!

You will find the advantage of xrange over range in this simple example:

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 4.49153590202 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 7.04547905922 seconds

The above example doesn’t reflect anything substantially better in case of xrange.

Now look at the following case where range is really really slow, compared to xrange.

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 0.000764846801758 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer() 

print "time taken: ", (t2-t1)  # 2.78506207466 seconds

With range, the whole list from 1 to 99999999 is created up front (which is time consuming), but xrange is lazy and only generates numbers as they are needed, that is, while the iteration continues.

In Python 3, range is implemented the same way as xrange in Python 2, and xrange itself has been removed.

Happy coding!


回答 9

这是出于优化的原因。

range()将从头到尾创建一个值列表(在您的示例中为0 .. 20)。在很大范围内,这将成为昂贵的操作。

另一方面,xrange()更优化了。它只会在需要时(通过xrange序列对象)计算下一个值,并且不会像range()那样创建所有值的列表。

It is for optimization reasons.

range() will create a list of values from start to end (0 .. 20 in your example). This will become an expensive operation on very large ranges.

xrange() on the other hand is much more optimised. it will only compute the next value when needed (via an xrange sequence object) and does not create a list of all values like range() does.


回答 10

range(x,y)如果使用for循环,则返回x和y之间的每个数字的列表,这样range比较慢。实际上,range具有较大的Index范围。range(x.y)将打印出x和y之间所有数字的列表

xrange(x,y)返回,xrange(x,y)但如果使用for循环,则xrange速度更快。xrange索引范围较小。xrange不仅会打印出来xrange(x,y),还会保留其中的所有数字。

[In] range(1,10)
[Out] [1, 2, 3, 4, 5, 6, 7, 8, 9]
[In] xrange(1,10)
[Out] xrange(1,10)

如果使用for循环,那么它将起作用

[In] for i in range(1,10):
        print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9
[In] for i in xrange(1,10):
         print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9

尽管使用循环时并没有什么不同,但仅打印循环时也有差异!

range(x, y) returns a list of every number from x up to (but not including) y. Because the whole list is built first, range is slower in a for loop, although range accepts a wider range of index values. Evaluating range(x, y) at the prompt prints the list of all those numbers.

xrange(x, y) returns an xrange object, displayed as xrange(x, y), and is faster in a for loop. xrange accepts a narrower range of index values. Evaluating it at the prompt prints only xrange(x, y), but iterating over it still yields all the numbers in it.

[In] range(1,10)
[Out] [1, 2, 3, 4, 5, 6, 7, 8, 9]
[In] xrange(1,10)
[Out] xrange(1,10)

If you use a for loop, then it would work

[In] for i in range(1,10):
        print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9
[In] for i in xrange(1,10):
         print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9

There isn’t much difference when using loops, though there is a difference when just printing it!
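The same print-versus-loop distinction can be reproduced in Python 3, where range behaves like the old xrange. This is a sketch under that assumption:

```python
r = range(1, 10)

# Printing the object shows its description, not its contents.
print(r)                  # range(1, 10)

# Iterating (or converting to a list) yields the actual numbers.
print(list(r))            # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sum(1 for _ in r))  # 9
```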


回答 11

range(): range(1,10)返回一个1到10个数字的列表,并将整个列表保存在内存中。

xrange():类似于range(),但不返回列表,而是返回一个对象,该对象根据需要生成范围内的数字。对于循环,这比range()快一点,并且内存效率更高。xrange()对象就像一个迭代器,并根据需要生成数字。(延迟评估)

In [1]: range(1,10)

Out[1]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]: xrange(10)

Out[2]: xrange(10)

In [3]: print xrange.__doc__

xrange([start,] stop[, step]) -> xrange object

range(): range(1, 10) returns a list of the numbers from 1 to 9 and holds the whole list in memory.

xrange(): Like range(), but instead of returning a list, it returns an object that generates the numbers in the range on demand. For looping, this is slightly faster than range() and more memory efficient. An xrange() object behaves like an iterator and generates the numbers on demand (lazy evaluation).

In [1]: range(1,10)

Out[1]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]: xrange(10)

Out[2]: xrange(10)

In [3]: print xrange.__doc__

xrange([start,] stop[, step]) -> xrange object

回答 12

其他一些答案提到Python 3淘汰了2.x range,并将2.x重命名xrangerange。但是,除非您使用3.0或3.1(应该没有人使用),否则它实际上是一种不同的类型。

正如3.1文档所说:

范围对象几乎没有行为:它们仅支持索引,迭代和len功能。

但是,在3.2+版本中,它range是一个完整序列,它支持扩展的slice,以及所有collections.abc.Sequence与语义相同的方法list*

并且,至少在CPython和PyPy(当前仅有的两个3.2+实现)中,它还具有indexand count方法和in运算符的恒定时间实现(只要您仅将其传递为整数)。这意味着123456 in r在3.2+版本中写作是合理的,而在2.7或3.1版本中这将是一个可怕的想法。


*事实上,issubclass(xrange, collections.Sequence)回报率True在2.6-2.7 3.0-3.1和是一个错误是固定在3.2,而不是向后移植。

Some of the other answers mention that Python 3 eliminated 2.x’s range and renamed 2.x’s xrange to range. However, unless you’re using 3.0 or 3.1 (which nobody should be), it’s actually a somewhat different type.

As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

However, in 3.2+, range is a full sequence—it supports extended slices, and all of the methods of collections.abc.Sequence with the same semantics as a list.*

And, at least in CPython and PyPy (the only two 3.2+ implementations that currently exist), it also has constant-time implementations of the index and count methods and the in operator (as long as you only pass it integers). This means writing 123456 in r is reasonable in 3.2+, while in 2.7 or 3.1 it would be a horrible idea.


* The fact that issubclass(xrange, collections.Sequence) returns True in 2.6-2.7 and 3.0-3.1 is a bug that was fixed in 3.2 and not backported.
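The sequence behaviour described for 3.2+ can be sketched as follows. The snippet assumes a current Python 3; the name r and the particular numbers are illustrative, and the constant-time behaviour of the membership test is a property of the implementation, as noted above.

```python
from collections.abc import Sequence

r = range(0, 1000000, 2)

print(isinstance(r, Sequence))  # True
print(123456 in r)              # True
print(123457 in r)              # False
print(r.index(123456))          # 61728
print(r.count(5))               # 0
print(list(r[10:20:3]))         # [20, 26, 32, 38]
```

Note that slicing a range gives back another range rather than a list, so the result stays lazy until it is converted.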


回答 13

在python 2.x中

range(x)返回一个列表,该列表在内存中创建有x个元素。

>>> a = range(5)
>>> a
[0, 1, 2, 3, 4]

xrange(x)返回一个xrange对象,该对象是生成器obj,可按需生成数字。它们是在for循环(惰性评估)期间计算的。

对于循环,这比range()快一点,并且内存效率更高。

>>> b = xrange(5)
>>> b
xrange(5)

In python 2.x

range(x) returns a list, that is created in memory with x elements.

>>> a = range(5)
>>> a
[0, 1, 2, 3, 4]

xrange(x) returns an xrange object, which generates the numbers on demand, much like a generator. They are computed during the for loop (lazy evaluation).

For looping, this is slightly faster than range() and more memory efficient.

>>> b = xrange(5)
>>> b
xrange(5)

回答 14

When testing range against xrange in a loop (I know I should use timeit, but this was swiftly hacked up from memory using a simple list comprehension example) I found the following:

import time

for x in range(1, 10):

    t = time.time()
    [v*10 for v in range(1, 10000)]
    print "range:  %.4f" % ((time.time()-t)*100)

    t = time.time()
    [v*10 for v in xrange(1, 10000)]
    print "xrange: %.4f" % ((time.time()-t)*100)

which gives:

$python range_tests.py
range:  0.4273
xrange: 0.3733
range:  0.3881
xrange: 0.3507
range:  0.3712
xrange: 0.3565
range:  0.4031
xrange: 0.3558
range:  0.3714
xrange: 0.3520
range:  0.3834
xrange: 0.3546
range:  0.3717
xrange: 0.3511
range:  0.3745
xrange: 0.3523
range:  0.3858
xrange: 0.3997 <- garbage collection?

Or, using xrange in the for loop:

range:  0.4172
xrange: 0.3701
range:  0.3840
xrange: 0.3547
range:  0.3830
xrange: 0.3862 <- garbage collection?
range:  0.4019
xrange: 0.3532
range:  0.3738
xrange: 0.3726
range:  0.3762
xrange: 0.3533
range:  0.3710
xrange: 0.3509
range:  0.3738
xrange: 0.3512
range:  0.3703
xrange: 0.3509

Is my snippet testing properly? Any comments on the slower instance of xrange? Or a better example :-)
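
For what it is worth, here is roughly how the same comparison could look with timeit. This is a sketch for Python 3, so list(range(...)) stands in for the 2.x range and plain range stands in for xrange; the function names and the repeat counts are my own choices.

```python
import timeit

def with_list():
    # Materialize the list first, as Python 2 range() did.
    return [v * 10 for v in list(range(1, 10000))]

def with_lazy_range():
    # Iterate the lazy object directly, as Python 2 xrange() allowed.
    return [v * 10 for v in range(1, 10000)]

# repeat() returns one total per run; the minimum is the least noisy figure.
list_best = min(timeit.repeat(with_list, number=100, repeat=3))
lazy_best = min(timeit.repeat(with_lazy_range, number=100, repeat=3))

print("list : %.4f s per 100 calls" % list_best)
print("range: %.4f s per 100 calls" % lazy_best)
```

Taking the minimum over several repeats filters out one-off slow runs like the ones marked "garbage collection?" above.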


Answer 15

xrange() and range() in Python work similarly as far as the user is concerned; the difference lies in how memory is allocated by each function.

When we use range(), memory is allocated for all the values it generates, so it is not recommended when a large number of values is to be generated.

xrange(), on the other hand, generates only one value at a time, which makes it best suited to a for loop that steps through all the required values.


Answer 16

range generates the entire list and returns it. xrange does not — it generates the numbers in the list on demand.


Answer 17

xrange uses an iterator (generates values on the fly), range returns a list.


Answer 18

What?
range returns a static list at runtime.
xrange returns an object (which acts like a generator, although it’s certainly not one) from which values are generated as and when required.

When to use which?

  • Use xrange if you want to generate a list for a gigantic range, say 1 billion, especially when you have a “memory sensitive system” like a cell phone.
  • Use range if you want to iterate over the list several times.

PS: Python 3.x’s range function == Python 2.x’s xrange function.
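
To show what "acts like a generator, although it's certainly not one" means in practice, here is a small Python 3 sketch (range here is the 2.x xrange; the variable names are just for illustration).

```python
# Python 3: range here plays the role of Python 2's xrange.
r = range(3)
first_pass = list(r)
second_pass = list(r)      # a range can be iterated again from the start

g = (i for i in range(3))  # a real generator, for contrast
gen_first = list(g)
gen_second = list(g)       # the generator is already exhausted

# Unlike a generator, a range also supports len() and indexing.
length = len(r)
last = r[-1]
print(first_pass, second_pass, gen_first, gen_second, length, last)
```

So iterating several times is not by itself a reason to build the list; it only pays off when recomputing the values on each pass costs more than storing them.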


Answer 19

Everyone has explained it very well, but I wanted to see it for myself. I use Python 3. So I opened the Resource Monitor (in Windows!) and first executed the following code:

a=0
for i in range(1,100000):
    a=a+i

and then checked the change in ‘In Use’ memory. It was insignificant. Then, I ran the following code:

for i in list(range(1,100000)):
    a=a+i

And it took a big chunk of the memory for use, instantly. And, I was convinced. You can try it for yourself.

If you are using Python 2.x, replace ‘range()’ with ‘xrange()’ in the first snippet and ‘list(range())’ with ‘range()’ in the second.
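
If you would rather not watch a system monitor, the same experiment can be sketched with the standard tracemalloc module on Python 3. The helper peak_bytes and the two function names are my own; the loops are the ones shown above.

```python
import tracemalloc

def peak_bytes(func):
    """Return the peak memory traced while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def loop_over_range():
    a = 0
    for i in range(1, 100000):
        a = a + i
    return a

def loop_over_list():
    a = 0
    for i in list(range(1, 100000)):
        a = a + i
    return a

range_peak = peak_bytes(loop_over_range)
list_peak = peak_bytes(loop_over_list)
print("range:", range_peak, "bytes; list:", list_peak, "bytes")
```

Both loops compute the same sum; only the second one has to hold all the integers in memory at once.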


Answer 20

From the help docs.

Python 2.7.12

>>> print range.__doc__
range(stop) -> list of integers
range(start, stop[, step]) -> list of integers

Return a list containing an arithmetic progression of integers.
range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
When step is given, it specifies the increment (or decrement).
For example, range(4) returns [0, 1, 2, 3].  The end point is omitted!
These are exactly the valid indices for a list of 4 elements.

>>> print xrange.__doc__
xrange(stop) -> xrange object
xrange(start, stop[, step]) -> xrange object

Like range(), but instead of returning a list, returns an object that
generates the numbers in the range on demand.  For looping, this is 
slightly faster than range() and more memory efficient.

Python 3.5.2

>>> print(range.__doc__)
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).

>>> print(xrange.__doc__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'xrange' is not defined

Difference is apparent. In Python 2.x, range returns a list, xrange returns an xrange object which is iterable.

In Python 3.x, range becomes xrange of Python 2.x, and xrange is removed.


Answer 21

When the requirement is to scan or print the items from 0 to N, range and xrange work as follows.

range() creates a new list in memory holding all the items from 0 to N (N+1 in total) and then prints them. xrange() creates an iterator-like instance that steps through the items and keeps only the current item in memory, so it uses the same amount of memory throughout.

If the required element happens to be near the beginning of the list, xrange() saves a good deal of time and memory.


Answer 22

range returns a list, while xrange returns an xrange object that takes the same amount of memory irrespective of the size of the range: with xrange only one element is generated and available per iteration, whereas with range all the elements are generated at once and held in memory.


Answer 23

The difference decreases for smaller arguments to range(..) / xrange(..):

$ python -m timeit "for i in xrange(10111):" " for k in range(100):" "  pass"
10 loops, best of 3: 59.4 msec per loop

$ python -m timeit "for i in xrange(10111):" " for k in xrange(100):" "  pass"
10 loops, best of 3: 46.9 msec per loop

In this case xrange(100) is only about 20% more efficient.


Answer 24

range: range populates everything at once, which means every number in the range occupies memory.

xrange: xrange is something like a generator. It comes into the picture when you want a range of numbers but do not want them stored, for example when you use it in a for loop, so it is memory efficient.


Answer 25

Additionally, list(xrange(...)) is equivalent to range(...).

So building the list is the slow part.

Also, xrange never fully materializes the sequence.

That is why it is not a list but an xrange object.


Answer 26

range() in Python 2.x

This function is essentially the old range() function that was available in Python 2.x and returns an instance of a list object that contains the elements in the specified range.

However, this implementation is too inefficient when it comes to initialising a list with a range of numbers. For example, for i in range(1000000) would be a very expensive command to execute, in terms of both memory and time, as it requires storing the whole list in memory.


range() in Python 3.x and xrange() in Python 2.x

Python 3.x introduced a newer implementation of range() (while the newer implementation was already available in Python 2.x through the xrange() function).

The range() exploits a strategy known as lazy evaluation. Instead of creating a huge list of elements in range, the newer implementation introduces the class range, a lightweight object that represents the required elements in the given range, without storing them explicitly in memory (this might sound like generators but the concept of lazy evaluation is different).


As an example, consider the following:

# Python 2.x
>>> a = range(10)
>>> type(a)
<type 'list'>
>>> b = xrange(10)
>>> type(b)
<type 'xrange'>

and

# Python 3.x
>>> a = range(10)
>>> type(a)
<class 'range'>
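
To give an idea of what "a lightweight object that represents the required elements without storing them" can look like, here is a toy class. It is only a sketch of the idea under my own assumptions (the name LazyRange and its methods are hypothetical) and not how CPython actually implements range.

```python
class LazyRange:
    """Toy stand-in for range/xrange: stores three ints, computes items on demand."""

    def __init__(self, start, stop, step=1):
        if step == 0:
            raise ValueError("step must not be zero")
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        # Number of steps that fit between start and stop (never negative).
        if self.step > 0:
            span = self.stop - self.start
        else:
            span = self.start - self.stop
        return max(0, -(-span // abs(self.step)))  # ceiling division

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indexes only")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("LazyRange index out of range")
        return self.start + index * self.step

    def __contains__(self, value):
        # Arithmetic membership test: no iteration needed for ints.
        if not isinstance(value, int):
            return False
        offset = value - self.start
        return offset % self.step == 0 and 0 <= offset // self.step < len(self)

    def __iter__(self):
        for index in range(len(self)):
            yield self[index]


lazy = LazyRange(0, 10**9, 7)   # three ints stored, however long the range is
print(len(lazy), lazy[5], 700 in lazy, 701 in lazy)
```

Every answer the object gives is computed from start, stop and step, which is why its memory footprint does not depend on how many elements it represents.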

Answer 27

See this post to find difference between range and xrange:

To quote:

range returns exactly what you think: a list of consecutive integers, of a defined length beginning with 0. xrange, however, returns an “xrange object”, which acts a great deal like an iterator


Relative imports for the billionth time

Question: Relative imports for the billionth time

I’ve been here:

and plenty of URLs that I did not copy, some on SO, some on other sites, back when I thought I’d have the solution quickly.

The forever-recurring question is this: With Windows 7, 32-bit Python 2.7.3, how do I solve this “Attempted relative import in non-package” message? I built an exact replica of the package on pep-0328:

package/
    __init__.py
    subpackage1/
        __init__.py
        moduleX.py
        moduleY.py
    subpackage2/
        __init__.py
        moduleZ.py
    moduleA.py

The imports were done from the console.

I did make functions named spam and eggs in their appropriate modules. Naturally, it didn’t work. The answer is apparently in the 4th URL I listed, but it’s all alumni to me. There was this response on one of the URLs I visited:

Relative imports use a module’s __name__ attribute to determine that module’s position in the package hierarchy. If the module’s name does not contain any package information (e.g. it is set to ‘__main__’) then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

The above response looks promising, but it’s all hieroglyphs to me. So my question, how do I make Python not return to me “Attempted relative import in non-package”? has an answer that involves -m, supposedly.

Can somebody please tell me why Python gives that error message, what it means by “non-package”, why and how do you define a ‘package’, and the precise answer put in terms easy enough for a kindergartener to understand.


Answer 0

Script vs. Module

Here’s an explanation. The short version is that there is a big difference between directly running a Python file, and importing that file from somewhere else. Just knowing what directory a file is in does not determine what package Python thinks it is in. That depends, additionally, on how you load the file into Python (by running or by importing).

There are two ways to load a Python file: as the top-level script, or as a module. A file is loaded as the top-level script if you execute it directly, for instance by typing python myfile.py on the command line. It is loaded as a module if you do python -m myfile, or if it is loaded when an import statement is encountered inside some other file. There can only be one top-level script at a time; the top-level script is the Python file you ran to start things off.

Naming

When a file is loaded, it is given a name (which is stored in its __name__ attribute). If it was loaded as the top-level script, its name is __main__. If it was loaded as a module, its name is the filename, preceded by the names of any packages/subpackages of which it is a part, separated by dots.

So for instance in your example:

package/
    __init__.py
    subpackage1/
        __init__.py
        moduleX.py
    moduleA.py

if you imported moduleX (note: imported, not directly executed), its name would be package.subpackage1.moduleX. If you imported moduleA, its name would be package.moduleA. However, if you directly run moduleX from the command line, its name will instead be __main__, and if you directly run moduleA from the command line, its name will be __main__. When a module is run as the top-level script, it loses its normal name and its name is instead __main__.

Accessing a module NOT through its containing package

There is an additional wrinkle: the module’s name depends on whether it was imported “directly” from the directory it is in, or imported via a package. This only makes a difference if you run Python in a directory, and try to import a file in that same directory (or a subdirectory of it). For instance, if you start the Python interpreter in the directory package/subpackage1 and then do import moduleX, the name of moduleX will just be moduleX, and not package.subpackage1.moduleX. This is because Python adds the current directory to its search path on startup; if it finds the to-be-imported module in the current directory, it will not know that that directory is part of a package, and the package information will not become part of the module’s name.

A special case is if you run the interpreter interactively (e.g., just type python and start entering Python code on the fly). In this case the name of that interactive session is __main__.

Now here is the crucial thing for your error message: if a module’s name has no dots, it is not considered to be part of a package. It doesn’t matter where the file actually is on disk. All that matters is what its name is, and its name depends on how you loaded it.
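
Here is a sketch that makes the three naming cases visible. It builds a throwaway copy of the layout in a temporary directory and starts fresh interpreters with subprocess; the helper name run is my own, and it assumes Python 3 with the default behaviour of adding the current directory to sys.path.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Recreate package/subpackage1/moduleX.py; the module just prints its name.
    sub = os.path.join(root, "package", "subpackage1")
    os.makedirs(sub)
    for folder in (os.path.dirname(sub), sub):
        open(os.path.join(folder, "__init__.py"), "w").close()
    with open(os.path.join(sub, "moduleX.py"), "w") as f:
        f.write("print(__name__)\n")

    def run(args, cwd):
        """Start a fresh interpreter in cwd and return what it printed."""
        result = subprocess.run(
            [sys.executable] + args, cwd=cwd,
            stdout=subprocess.PIPE, universal_newlines=True,
        )
        return result.stdout.strip()

    # 1. Imported through its package, from the directory above the package.
    via_package = run(["-c", "import package.subpackage1.moduleX"], root)
    # 2. Executed directly as the top-level script.
    as_script = run([os.path.join("package", "subpackage1", "moduleX.py")], root)
    # 3. Imported from inside its own directory, bypassing the package.
    from_inside = run(["-c", "import moduleX"], sub)

print(via_package)
print(as_script)
print(from_inside)
```

Same file on disk each time, three different names, and only the first one contains the package information that relative imports need.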

Now look at the quote you included in your question:

Relative imports use a module’s __name__ attribute to determine that module’s position in the package hierarchy. If the module’s name does not contain any package information (e.g. it is set to ‘__main__’) then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

Relative imports…

Relative imports use the module’s name to determine where it is in a package. When you use a relative import like from .. import foo, the dots indicate to step up some number of levels in the package hierarchy. For instance, if your current module’s name is package.subpackage1.moduleX, then ..moduleA would mean package.moduleA. For a from .. import to work, the module’s name must have at least as many dots as there are in the import statement.
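
In Python 3 you can watch this dot-counting happen with importlib.util.resolve_name, which turns a relative name into an absolute one given the package of the importing module. The package string below is the one from the example; the variable names are mine.

```python
import importlib.util

# __package__ of package.subpackage1.moduleX is "package.subpackage1".
current_package = "package.subpackage1"

sibling = importlib.util.resolve_name(".moduleY", current_package)
one_level_up = importlib.util.resolve_name("..moduleA", current_package)
cousin = importlib.util.resolve_name("..subpackage2.moduleZ", current_package)

# Too many leading dots for the package depth: resolution fails.
# (ImportError on Python 3.9+, ValueError on earlier 3.x versions.)
try:
    importlib.util.resolve_name("...moduleA", current_package)
    too_far_error = None
except (ImportError, ValueError) as exc:
    too_far_error = exc

print(sibling, one_level_up, cousin, repr(too_far_error))
```

Nothing here touches the file system: the result depends only on the dotted name of the current package, which is the whole point of this section.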

… are only relative in a package

However, if your module’s name is __main__, it is not considered to be in a package. Its name has no dots, and therefore you cannot use from .. import statements inside it. If you try to do so, you will get the “relative-import in non-package” error.

Scripts can’t import relative

What you probably did is you tried to run moduleX or the like from the command line. When you did this, its name was set to __main__, which means that relative imports within it will fail, because its name does not reveal that it is in a package. Note that this will also happen if you run Python from the same directory where a module is, and then try to import that module, because, as described above, Python will find the module in the current directory “too early” without realizing it is part of a package.

Also remember that when you run the interactive interpreter, the “name” of that interactive session is always __main__. Thus you cannot do relative imports directly from an interactive session. Relative imports are only for use within module files.

Two solutions:

  1. If you really do want to run moduleX directly, but you still want it to be considered part of a package, you can do python -m package.subpackage1.moduleX. The -m tells Python to load it as a module, not as the top-level script.

  2. Or perhaps you don’t actually want to run moduleX, you just want to run some other script, say myfile.py, that uses functions inside moduleX. If that is the case, put myfile.py somewhere else (not inside the package directory) and run it. If inside myfile.py you do things like from package.moduleA import spam, it will work fine.
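
Both solutions, and the failure they fix, can be reproduced with a throwaway package. This is a sketch assuming Python 3 (where the error text differs slightly from the 2.7 message in the question); the file contents and the helper run are my own minimal stand-ins.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "package", "subpackage1")
    os.makedirs(sub)
    for folder in (os.path.dirname(sub), sub):
        open(os.path.join(folder, "__init__.py"), "w").close()
    with open(os.path.join(root, "package", "moduleA.py"), "w") as f:
        f.write("def spam():\n    return 'spam'\n")
    # moduleX uses a relative import, so it only works as part of the package.
    with open(os.path.join(sub, "moduleX.py"), "w") as f:
        f.write("from ..moduleA import spam\nprint(spam())\n")
    # Solution 2: a script that lives outside the package directory.
    with open(os.path.join(root, "myfile.py"), "w") as f:
        f.write("from package.moduleA import spam\nprint(spam())\n")

    def run(*args):
        return subprocess.run(
            [sys.executable] + list(args), cwd=root,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True,
        )

    direct = run(os.path.join("package", "subpackage1", "moduleX.py"))
    as_module = run("-m", "package.subpackage1.moduleX")   # solution 1
    outside_script = run("myfile.py")                      # solution 2

print("direct run exit code:", direct.returncode)
print("python -m output    :", as_module.stdout.strip())
print("myfile.py output    :", outside_script.stdout.strip())
```

The direct run fails with a relative-import error, while the other two succeed because the module is loaded under its full package name.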

Notes

  • For either of these solutions, the package directory (package in your example) must be accessible from the Python module search path (sys.path). If it is not, you will not be able to use anything in the package reliably at all.

  • Since Python 2.6, the module’s “name” for package-resolution purposes is determined not just by its __name__ attribute but also by the __package__ attribute. That’s why I’m avoiding using the explicit symbol __name__ to refer to the module’s “name”. Since Python 2.6 a module’s “name” is effectively __package__ + '.' + __name__, or just __name__ if __package__ is None.


Answer 1

This is really a problem within Python. The source of the confusion is that people mistakenly take a relative import to be relative to a file path, which it is not.

For example when you write in faa.py:

from .. import foo

This has a meaning only if faa.py was identified and loaded by Python, during execution, as part of a package. In that case, the module’s name for faa.py would be, for example, some_packagename.faa. If the file was loaded just because it is in the current directory when Python is run, then its name would not refer to any package and the relative import would fail.

A simple solution for referring to modules in the current directory is to use this:

if __package__ is None or __package__ == '':
    # uses current directory visibility
    import foo
else:
    # uses current package visibility
    from . import foo

Answer 2

Here’s a general recipe, modified to fit as an example, that I am using right now for dealing with Python libraries written as packages, that contain interdependent files, where I want to be able to test parts of them piecemeal. Let’s call this lib.foo and say that it needs access to lib.fileA for functions f1 and f2, and lib.fileB for class Class3.

I have included a few print calls to help illustrate how this works. In practice you would want to remove them (and maybe also the from __future__ import print_function line).

This particular example is too simple to show when we really need to insert an entry into sys.path. (See Lars’ answer for a case where we do need it: when we have two or more levels of package directories, we use os.path.dirname(os.path.dirname(__file__)). It doesn’t really hurt here either.) It’s also safe enough to do this without the if _i not in sys.path test. However, if each imported file inserts the same path (for instance, if both fileA and fileB want to import utilities from the package), this clutters up sys.path with the same path many times, so it’s nice to have the if _i not in sys.path in the boilerplate.

from __future__ import print_function # only when showing how this works

if __package__:
    print('Package named {!r}; __name__ is {!r}'.format(__package__, __name__))
    from .fileA import f1, f2
    from .fileB import Class3
else:
    print('Not a package; __name__ is {!r}'.format(__name__))
    # these next steps should be used only with care and if needed
    # (remove the sys.path manipulation for simple cases!)
    import os, sys
    _i = os.path.dirname(os.path.abspath(__file__))
    if _i not in sys.path:
        print('inserting {!r} into sys.path'.format(_i))
        sys.path.insert(0, _i)
    else:
        print('{!r} is already in sys.path'.format(_i))
    del _i # clean up global name space

    from fileA import f1, f2
    from fileB import Class3

... all the code as usual ...

if __name__ == '__main__':
    import doctest, sys
    ret = doctest.testmod()
    sys.exit(0 if ret.failed == 0 else 1)

The idea here is this (and note that these all function the same across python2.7 and python 3.x):

  1. If run as import lib or from lib import foo as a regular package import from ordinary code, __package__ is lib and __name__ is lib.foo. We take the first code path, importing from .fileA, etc.

  2. If run as python lib/foo.py, __package__ will be None and __name__ will be __main__.

    We take the second code path. The lib directory will already be in sys.path so there is no need to add it. We import from fileA, etc.

  3. If run within the lib directory as python foo.py, the behavior is the same as for case 2.

  4. If run within the lib directory as python -m foo, the behavior is similar to cases 2 and 3. However, the path to the lib directory is not in sys.path, so we add it before importing. The same applies if we run Python and then import foo.

    (Since . is in sys.path, we don’t really need to add the absolute version of the path here. This is where a deeper package nesting structure, where we want to do from ..otherlib.fileC import ..., makes a difference. If you’re not doing this, you can omit all the sys.path manipulation entirely.)

Notes

There is still a quirk. If you run this whole thing from outside:

$ python2 -m lib.foo

or:

$ python3 -m lib.foo

the behavior depends on the contents of lib/__init__.py. If that exists and is empty, all is well:

Package named 'lib'; __name__ is '__main__'

But if lib/__init__.py itself imports routine so that it can export routine.name directly as lib.name, you get:

$ python2 -m lib.foo
Package named 'lib'; __name__ is 'lib.foo'
Package named 'lib'; __name__ is '__main__'

That is, the module gets imported twice, once via the package and then again as __main__ so that it runs your main code. Python 3.6 and later warn about this:

$ python3 -m lib.foo
Package named 'lib'; __name__ is 'lib.foo'
[...]/runpy.py:125: RuntimeWarning: 'lib.foo' found in sys.modules
after import of package 'lib', but prior to execution of 'lib.foo';
this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Package named 'lib'; __name__ is '__main__'

The warning is new, but the warned-about behavior is not. It is part of what some call the double import trap. (For additional details see issue 27487.) Nick Coghlan says:

This next trap exists in all current versions of Python, including 3.3, and can be summed up in the following general guideline: “Never add a package directory, or any directory inside a package, directly to the Python path”.

Note that while we violate that rule here, we do it only when the file being loaded is not being loaded as part of a package, and our modification is specifically designed to allow us to access other files in that package. (And, as I noted, we probably shouldn’t do this at all for single level packages.) If we wanted to be extra-clean, we might rewrite this as, e.g.:

    import os, sys
    _i = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    if _i not in sys.path:
        sys.path.insert(0, _i)
    else:
        _i = None

    from sub.fileA import f1, f2
    from sub.fileB import Class3

    if _i:
        sys.path.remove(_i)
    del _i

That is, we modify sys.path long enough to achieve our imports, then put it back the way it was (deleting one copy of _i if and only if we added one copy of _i).
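The same add-then-restore idea can be packaged as a context manager. This is only a sketch: temporary_sys_path is a hypothetical helper name that does not appear in the answer, and it assumes the imports inside the with block are the only ones that need the extra directory.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_sys_path(directory):
    """Put `directory` at the front of sys.path while the block runs."""
    # Remember whether we added an entry, so we only remove our own copy.
    added = directory not in sys.path
    if added:
        sys.path.insert(0, directory)
    try:
        yield
    finally:
        # Mirror the `_i` logic: undo the change only if we made it.
        if added and directory in sys.path:
            sys.path.remove(directory)
```

With this helper, the imports would sit inside a block such as: with temporary_sys_path(parent_dir): from sub.fileA import f1, f2. Modules imported inside the block stay in sys.modules after sys.path has been restored.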


Answer 3

So after carping about this along with many others, I came across a note posted by Dorian B in this article. It solved the specific problem I was having: I develop modules and classes for use with a web service, but I also want to be able to test them as I’m coding, using the debugger facilities in PyCharm. To run tests in a self-contained class, I would include the following at the end of my class file:

if __name__ == '__main__':
    # run test code here...
    pass

But if I wanted to import other classes or modules in the same folder, I would then have to change all my import statements from relative notation to local references (i.e. remove the dot (.)). After reading Dorian’s suggestion, I tried his ‘one-liner’ and it worked! I can now test in PyCharm and leave my test code in place when I use the class in another class under test, or when I use it in my web service!

# import any site-lib modules first, then...
import sys
parent_module = sys.modules['.'.join(__name__.split('.')[:-1]) or '__main__']
if __name__ == '__main__' or parent_module.__name__ == '__main__':
    from codex import Codex # these are in same folder as module under test!
    from dblogger import DbLogger
else:
    from .codex import Codex
    from .dblogger import DbLogger

The if statement checks to see if we’re running this module as main or if it’s being used in another module that’s being tested as main. Perhaps this is obvious, but I offer this note here in case anyone else frustrated by the relative import issues above can make use of it.
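To make the check easier to follow, here is the same test pulled out into a function. This is a sketch only: should_use_plain_imports and its modules parameter are hypothetical names introduced for illustration, and the logic simply restates the two conditions of the if statement above.

```python
import sys


def should_use_plain_imports(module_name, modules=None):
    """Return True when the snippet above would take the non-relative branch."""
    if module_name == '__main__':
        # The file itself is being run as the script.
        return True
    modules = sys.modules if modules is None else modules
    # Strip the last dotted component to get the parent package name;
    # a top-level module has no parent, so fall back to '__main__'.
    parent_name = '.'.join(module_name.split('.')[:-1]) or '__main__'
    return modules[parent_name].__name__ == '__main__'
```

A top-level module name such as codex has no dotted parent, so the lookup falls back to __main__ and the plain imports are chosen; a name such as pkg.codex resolves to the package pkg and the relative imports are chosen.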


Answer 4

Here is one solution that I would not recommend, but might be useful in some situations where modules were simply not generated:

import os
import sys
parent_dir_name = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
sys.path.append(os.path.join(parent_dir_name, "your_dir"))
import your_script
your_script.a_function()

Answer 5

I had a similar problem where I didn’t want to change the Python module search path and needed to load a module relatively from a script (in spite of “scripts can’t import relative with all” as BrenBarn explained nicely above).

So I used the following hack. Unfortunately, it relies on the imp module, which has been deprecated since Python 3.4 in favour of importlib. (Is this possible with importlib, too? I don’t know.) Still, the hack works for now.

Example for accessing members of moduleX in subpackage1 from a script residing in the subpackage2 folder:

#!/usr/bin/env python3

import inspect
import imp
import os

def get_script_dir(follow_symlinks=True):
    """
    Return directory of code defining this very function.
    Should work from a module as well as from a script.
    """
    script_path = inspect.getabsfile(get_script_dir)
    if follow_symlinks:
        script_path = os.path.realpath(script_path)
    return os.path.dirname(script_path)

# loading the module (hack, relying on deprecated imp-module)
PARENT_PATH = os.path.dirname(get_script_dir())
(x_file, x_path, x_desc) = imp.find_module('moduleX', [PARENT_PATH+'/'+'subpackage1'])
module_x = imp.load_module('subpackage1.moduleX', x_file, x_path, x_desc)

# importing a function and a value
function = module_x.my_function
VALUE = module_x.MY_CONST
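On the question of whether importlib can do the same: it can load a module from an explicit file path. The following is a sketch rather than a drop-in replacement for the hack above; load_module_from_path is a hypothetical helper name, and only the importlib.util calls are standard library API. It matters on newer interpreters because imp was removed in Python 3.12.

```python
import importlib.util
import sys


def load_module_from_path(qualified_name, file_path):
    """Load the file at `file_path` as a module named `qualified_name`."""
    spec = importlib.util.spec_from_file_location(qualified_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(
            "cannot load {!r} from {!r}".format(qualified_name, file_path))
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found while it runs.
    sys.modules[qualified_name] = module
    spec.loader.exec_module(module)
    return module
```

With the layout from the answer, the call would be load_module_from_path('subpackage1.moduleX', os.path.join(PARENT_PATH, 'subpackage1', 'moduleX.py')). Note that this does not import the parent package subpackage1, so relative imports inside moduleX may still fail.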

A cleaner approach seems to be to modify the sys.path used for loading modules as mentioned by Federico.

#!/usr/bin/env python3

if __name__ == '__main__' and __package__ is None:
    import sys
    from os import path
    # __file__ should be defined in this case
    PARENT_DIR = path.dirname(path.dirname(path.abspath(__file__)))
    sys.path.append(PARENT_DIR)
from subpackage1.moduleX import *

Answer 6

__name__ changes depending on whether the code in question is run in the global namespace or as part of an imported module.

If the code is not running in the global space, __name__ will be the name of the module. If it is running in the global namespace (for example, if you type it into a console, or run the module as a script using python.exe yourscriptnamehere.py), then __name__ becomes "__main__".

You’ll see a lot of Python code where if __name__ == '__main__' is used to test whether the code is being run from the global namespace; that allows you to have a module that doubles as a script.

Did you try to do these imports from the console?
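The difference can be observed without leaving Python. The sketch below uses the standard library runpy module to execute the same file under two different names; the file content and the SEEN_NAME variable are made up for the demonstration, and runpy only imitates what running a script or importing a module does to __name__.

```python
import os
import runpy
import tempfile

# Hypothetical one-line module that records the name it was executed under.
source = "SEEN_NAME = __name__\n"
path = os.path.join(tempfile.mkdtemp(), "yourscriptnamehere.py")
with open(path, "w") as handle:
    handle.write(source)

# Executed under the name a directly run script gets.
as_script = runpy.run_path(path, run_name="__main__")

# Executed under an ordinary module name, as happens on import.
as_module = runpy.run_path(path, run_name="yourscriptnamehere")

print(as_script["SEEN_NAME"])
print(as_module["SEEN_NAME"])
```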


Answer 7

@BrenBarn’s answer says it all, but if you’re like me it might take a while to understand. Here’s my case and how @BrenBarn’s answer applies to it; perhaps it will help you.

The case

package/
    __init__.py
    subpackage1/
        __init__.py
        moduleX.py
    moduleA.py

Using our familiar example, add to it that moduleX.py has a relative import of ..moduleA. I tried writing a test script in the subpackage1 directory that imported moduleX, but then got the dreaded error described by the OP.

Solution

Move the test script to the same level as package and import package.subpackage1.moduleX.

Explanation

As explained, relative imports are made relative to the current name. When my test script imports moduleX from the same directory, the module name inside moduleX is moduleX. When it encounters a relative import, the interpreter can’t back up the package hierarchy because it’s already at the top.

When I import moduleX from above, then name inside moduleX is package.subpackage1.moduleX and the relative import can be found
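Both situations can be reproduced in a throwaway directory. This is a sketch assuming Python 3: build_demo_package, VALUE and NAME are hypothetical names chosen for the demonstration, while the layout follows the tree shown above.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create the package/ tree under `root` and return the subpackage1 path."""
    sub = os.path.join(root, "package", "subpackage1")
    os.makedirs(sub)
    files = {
        os.path.join(root, "package", "__init__.py"): "",
        os.path.join(root, "package", "moduleA.py"): "VALUE = 42\n",
        os.path.join(sub, "__init__.py"): "",
        os.path.join(sub, "moduleX.py"):
            "from ..moduleA import VALUE\nNAME = __name__\n",
    }
    for file_path, text in files.items():
        with open(file_path, "w") as handle:
            handle.write(text)
    return sub


root = tempfile.mkdtemp()
subpackage_dir = build_demo_package(root)

# Case 1: the importer sits next to moduleX, so moduleX is a top-level module
# and its relative import has no parent package to climb into.
sys.path.insert(0, subpackage_dir)
try:
    importlib.import_module("moduleX")
    flat_import_error = None
except ImportError as exc:
    flat_import_error = exc
finally:
    sys.path.remove(subpackage_dir)

# Case 2: the importer sits at the same level as package/.
sys.path.insert(0, root)
try:
    module_x = importlib.import_module("package.subpackage1.moduleX")
finally:
    sys.path.remove(root)

print(repr(flat_import_error))
print(module_x.NAME)
```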


Answer 8

Relative imports use a module’s __name__ attribute to determine that module’s position in the package hierarchy. If the module’s name does not contain any package information (e.g. it is set to ‘__main__’) then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

I wrote a small Python package and published it on PyPI; it might help viewers of this question. The package acts as a workaround if you want to run Python files that import from upper-level packages within a package or project, without being directly in the importing file’s directory. https://pypi.org/project/import-anywhere/


Answer 9

To stop Python from raising “Attempted relative import in non-package”, consider this layout:

package/
    __init__.py
    subpackage1/
        __init__.py
        moduleX.py
        moduleY.py
    subpackage2/
        __init__.py
        moduleZ.py
    moduleA.py

This error occurs only if you apply a relative import in the parent file. For example, if you put print(__name__) in moduleA.py and run it, it already prints __main__, so this file is already the top-level module and cannot reach any parent package above it. Relative imports belong in the files of subpackage1 and subpackage2, where you can use “..” to refer to the parent directory or module. But if the parent is already the top-level package, Python cannot go above that parent directory (package). Files in which you apply relative imports to the parent can only work with absolute imports. If you use absolute imports in the parent package, no error occurs: even if your file is in a subpackage, Python knows what is at the top level of the package because of the Python path (PYTHONPATH), which defines the top level of the project.


Writing a pandas DataFrame to a CSV file

Question: Writing a pandas DataFrame to a CSV file

I have a dataframe in pandas which I would like to write to a CSV file. I am doing this using:

df.to_csv('out.csv')

And getting the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)

Is there any way to get around this easily (i.e. I have unicode characters in my data frame)? And is there a way to write to a tab delimited file instead of a CSV using e.g. a ‘to-tab’ method (that I don’t think exists)?


Answer 0

To delimit by a tab you can use the sep argument of to_csv:

df.to_csv(file_name, sep='\t')

To use a specific encoding (e.g. ‘utf-8’) use the encoding argument:

df.to_csv(file_name, sep='\t', encoding='utf-8')
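A small round trip shows both arguments together. This is a sketch under assumptions: the column names, the Greek sample values and the temporary out.tsv path are invented for the example; sep and encoding are the to_csv and read_csv parameters named above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame containing non-ASCII text, like the one in the question.
df = pd.DataFrame({
    "Color": ["red", "blue"],
    "Symbol": ["α", "β"],
    "Number": [22, 10],
})

out_path = os.path.join(tempfile.mkdtemp(), "out.tsv")

# Tab-separated, UTF-8 encoded, without the index column.
df.to_csv(out_path, sep="\t", index=False, encoding="utf-8")

# Reading back with the same separator and encoding recovers the text.
round_trip = pd.read_csv(out_path, sep="\t", encoding="utf-8")
print(round_trip)
```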

Answer 1

When you are storing a DataFrame object into a csv file using the to_csv method, you probably won’t need to store the index of each row of the DataFrame object.

You can avoid that by passing False to the index parameter.

Somewhat like:

df.to_csv(file_name, encoding='utf-8', index=False)

So if your DataFrame object is something like:

  Color  Number
0   red     22
1  blue     10

The csv file will store:

Color,Number
red,22
blue,10

instead of (the case when the default value True was passed)

,Color,Number
0,red,22
1,blue,10

Answer 2

To write a pandas DataFrame to a CSV file, you will need DataFrame.to_csv. This function offers many arguments with reasonable defaults that you will more often than not need to override to suit your specific use case. For example, you might want to use a different separator, change the datetime format, or drop the index when writing. to_csv has arguments you can pass to address these requirements.

Here’s a table listing some common scenarios of writing to CSV files and the corresponding arguments you can use for them.

[Table image not recovered: common CSV-writing scenarios and the corresponding to_csv arguments]

Footnotes

  1. The default separator is assumed to be a comma (','). Don’t change this unless you know you need to.
  2. By default, the index of df is written as the first column. If your DataFrame does not have an index (IOW, the df.index is the default RangeIndex), then you will want to set index=False when writing. To explain this in a different way, if your data DOES have an index, you can (and should) use index=True or just leave it out completely (as the default is True).
  3. It would be wise to set this parameter if you are writing string data so that other applications know how to read your data. This will also avoid any potential UnicodeEncodeErrors you might encounter while saving.
  4. Compression is recommended if you are writing large DataFrames (>100K rows) to disk as it will result in much smaller output files. OTOH, it will mean the write time will increase (and consequently, the read time since the file will need to be decompressed).
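The footnotes can be combined in a single call. This is a sketch with a made-up two-row frame and a temporary out.csv.gz path; index, encoding and compression are to_csv parameters, and read_csv infers gzip from the .gz suffix by default.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

out_path = os.path.join(tempfile.mkdtemp(), "out.csv.gz")

# index=False drops the default RangeIndex (footnote 2), encoding is set
# explicitly (footnote 3), and compression="gzip" shrinks the file (footnote 4).
df.to_csv(out_path, index=False, encoding="utf-8", compression="gzip")

# The compression is inferred from the file extension when reading back.
restored = pd.read_csv(out_path, encoding="utf-8")
print(restored)
```

For a frame this small the gzip file is not actually smaller; the saving only shows up on large outputs, which is the case the footnote has in mind.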

Answer 3

If you are having issues encoding to ‘utf-8’, something else you can try is to go cell by cell, as follows.

Python 2

(Where “df” is your DataFrame object.)

for column in df.columns:
    for idx in df[column].index:
        x = df.get_value(idx,column)
        try:
            x = unicode(x.encode('utf-8','ignore'),errors ='ignore') if type(x) == unicode else unicode(str(x),errors='ignore')
            df.set_value(idx,column,x)
        except Exception:
            print 'encoding error: {0} {1}'.format(idx,column)
            df.set_value(idx,column,'')
            continue

Then try:

df.to_csv(file_name)

You can check the encoding of the columns by:

for column in df.columns:
    print '{0} {1}'.format(str(type(df[column][0])),str(column))

Warning: errors=’ignore’ will just omit the character e.g.

IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'

Python 3

for column in df.columns:
    for idx in df[column].index:
        x = df.get_value(idx,column)
        try:
            x = x if type(x) == str else str(x).encode('utf-8','ignore').decode('utf-8','ignore')
            df.set_value(idx,column,x)
        except Exception:
            print('encoding error: {0} {1}'.format(idx,column))
            df.set_value(idx,column,'')
            continue
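DataFrame.get_value and DataFrame.set_value were deprecated in pandas 0.21 and removed in pandas 1.0, so the loops above need an older pandas. A rough equivalent for current versions can be sketched with Series.map. This is not the author's code: to_utf8_safe is a hypothetical helper, and unlike the Python 3 loop above it also cleans values that are already str.

```python
import pandas as pd


def to_utf8_safe(value):
    """Return value as text, dropping characters UTF-8 cannot encode."""
    text = value if isinstance(value, str) else str(value)
    return text.encode("utf-8", "ignore").decode("utf-8")


# Hypothetical frame: "\udc80" is a lone surrogate, which UTF-8 cannot encode.
df = pd.DataFrame(
    {"name": ["ok", "bad\udc80"], "count": [1, 2]},
    dtype=object,
)

# Apply the cleaner to every cell, one column at a time.
cleaned = df.apply(lambda column: column.map(to_utf8_safe))
print(cleaned["name"].tolist())
```

As with the original loops, every cell ends up as a string, including the numeric ones.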

Answer 4

Sometimes you face these problems even if you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and the same encoding when writing it. This might solve your problem.


Answer 5

Example of exporting to a file with a full path on Windows, writing the column headers:

df.to_csv(r'C:\Users\John\Desktop\export_dataframe.csv', index=False, header=True)

Example of storing the file in a folder next to where you run your script (the path is relative to the current working directory), with utf-8 encoding and tab as separator:

df.to_csv(r'./export/dftocsv.csv', sep='\t', encoding='utf-8', header=True)

Answer 6

This may not be the answer for this case, but I had the same error message with .to_csv, so I tried .toCSV('name.csv') and the error message was different (“'SparseDataFrame' object has no attribute 'toCSV'”). The problem was solved by converting the DataFrame to a dense DataFrame:

df.to_dense().to_csv("submission.csv", index = False, sep=',', encoding='utf-8')