标签归档:python-2.x

如何强制除法为浮点数?除数一直舍入到0?

问题:如何强制除法为浮点数?除数一直舍入到0?

我有两个整数值ab,但是我需要它们在浮点数中的比率。我知道a < b并且想要计算a / b,所以如果我使用整数除法,我将总是得到0,余数为a

c在下文中,如何在Python中强制成为浮点数?

c = a / b

I have two integer values a and b, but I need their ratio in floating point. I know that a < b and I want to calculate a / b, so if I use integer division I’ll always get 0 with a remainder of a.

How can I force c to be a floating point number in Python in the following?

c = a / b

回答 0

在Python 2中,两个整数的除法产生一个整数。在Python 3中,它产生一个浮点数。我们可以通过从中导入来获得新的行为__future__

>>> from __future__ import division
>>> a = 4
>>> b = 6
>>> c = a / b
>>> c
0.66666666666666663

In Python 2, division of two ints produces an int. In Python 3, it produces a float. We can get the new behaviour by importing from __future__.

>>> from __future__ import division
>>> a = 4
>>> b = 6
>>> c = a / b
>>> c
0.66666666666666663

回答 1

您可以通过执行此操作来浮动c = a / float(b)。如果分子或分母是浮点数,则结果也将是。


一个警告:如评论员所指出的,如果b它不是整数或浮点数(或表示一个的字符串),则此方法将无效。如果您正在处理其他类型(例如复数),则需要检查这些类型或使用其他方法。

You can cast to float by doing c = a / float(b). If the numerator or denominator is a float, then the result will be also.


A caveat: as commenters have pointed out, this won’t work if b might be something other than an integer or floating-point number (or a string representing one). If you might be dealing with other types (such as complex numbers) you’ll need to either check for those or use a different method.


回答 2

如何在Python中强制除法为浮点数?

我有两个整数值a和b,但是我需要它们在浮点数中的比率。我知道a <b并且我想计算a / b,所以如果我使用整数除法,我总是得到0并得到a的余数。

下面如何在Python中强制c为浮点数?

c = a / b

这里真正要问的是:

“我如何强制进行真正的除法以a / b返回分数?”

升级到Python 3

在Python 3中,要进行真正的除法,只需执行a / b

>>> 1/2
0.5

地板除法(整数的经典除法行为)现在为a // b

>>> 1//2
0
>>> 1//2.0
0.0

但是,您可能无法使用Python 2,或者编写的代码必须能同时在2和3中使用。

如果使用Python 2

在Python 2中,它不是那么简单。处理经典Python 2分区的某些方法比其他方法更好,更可靠。

对Python 2的建议

您可以在任何给定的模块中获得Python 3划分行为,并在顶部输入以下内容:

from __future__ import division

然后将Python 3样式划分应用于整个模块。它也可以在任何给定的点在python shell中工作。在Python 2中:

>>> from __future__ import division
>>> 1/2
0.5
>>> 1//2
0
>>> 1//2.0
0.0

这确实是最好的解决方案,因为它可以确保模块中的代码与Python 3向前兼容。

Python 2的其他选项

如果您不想将其应用于整个模块,则只能使用一些解决方法。最受欢迎的是将其中一个操作数强制为浮点数。一种可靠的解决方案是a / (b * 1.0)。在新的Python Shell中:

>>> 1/(2 * 1.0)
0.5

truediv来自operator模块operator.truediv(a, b)的功能也很强大,但这可能会更慢,因为它是一个函数调用:

>>> from operator import truediv
>>> truediv(1, 2)
0.5

不建议用于Python 2

常见的是a / float(b)。如果b为复数,则将引发TypeError。由于定义了带有复数的除法,所以对我来说,在为除数传递一个复数时,除法不会失败。

>>> 1 / float(2)
0.5
>>> 1 / float(2j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to float

对我而言,故意使您的代码更脆弱没有太大意义。

您也可以使用-Qnew标记运行Python ,但这不利于执行具有新Python 3行为的所有模块,并且您的某些模块可能需要经典的划分,因此除测试外,我不建议这样做。但要演示:

$ python -Qnew -c 'print 1/2'
0.5
$ python -Qnew -c 'print 1/2j'
-0.5j

How can I force division to be floating point in Python?

I have two integer values a and b, but I need their ratio in floating point. I know that a < b and I want to calculate a/b, so if I use integer division I’ll always get 0 with a remainder of a.

How can I force c to be a floating point number in Python in the following?

c = a / b

What is really being asked here is:

“How do I force true division such that a / b will return a fraction?”

Upgrade to Python 3

In Python 3, to get true division, you simply do a / b.

>>> 1/2
0.5

Floor division, the classic division behavior for integers, is now a // b:

>>> 1//2
0
>>> 1//2.0
0.0

However, you may be stuck using Python 2, or you may be writing code that must work in both 2 and 3.

If Using Python 2

In Python 2, it’s not so simple. Some ways of dealing with classic Python 2 division are better and more robust than others.

Recommendation for Python 2

You can get Python 3 division behavior in any given module with the following import at the top:

from __future__ import division

which then applies Python 3 style division to the entire module. It also works in a python shell at any given point. In Python 2:

>>> from __future__ import division
>>> 1/2
0.5
>>> 1//2
0
>>> 1//2.0
0.0

This is really the best solution as it ensures the code in your module is more forward compatible with Python 3.

Other Options for Python 2

If you don’t want to apply this to the entire module, you’re limited to a few workarounds. The most popular is to coerce one of the operands to a float. One robust solution is a / (b * 1.0). In a fresh Python shell:

>>> 1/(2 * 1.0)
0.5

Also robust is truediv from the operator module operator.truediv(a, b), but this is likely slower because it’s a function call:

>>> from operator import truediv
>>> truediv(1, 2)
0.5

Not Recommended for Python 2

Commonly seen is a / float(b). This will raise a TypeError if b is a complex number. Since division with complex numbers is defined, it makes sense to me to not have division fail when passed a complex number for the divisor.

>>> 1 / float(2)
0.5
>>> 1 / float(2j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to float

It doesn’t make much sense to me to purposefully make your code more brittle.

You can also run Python with the -Qnew flag, but this has the downside of executing all modules with the new Python 3 behavior, and some of your modules may expect classic division, so I don’t recommend this except for testing. But to demonstrate:

$ python -Qnew -c 'print 1/2'
0.5
$ python -Qnew -c 'print 1/2j'
-0.5j

回答 3

c = a / (b * 1.0)
c = a / (b * 1.0)

回答 4

在Python 3.x中,单斜杠(/)始终表示真实(非截断)除法。(//运算符用于截断除法。)在Python 2.x(2.2及更高版本)中,您可以通过将

from __future__ import division

在模块顶部。

In Python 3.x, the single slash (/) always means true (non-truncating) division. (The // operator is used for truncating division.) In Python 2.x (2.2 and above), you can get this same behavior by putting a

from __future__ import division

at the top of your module.


回答 5

仅以浮点格式进行除法的任何参数也会产生浮点输出。

例:

>>> 4.0/3
1.3333333333333333

要么,

>>> 4 / 3.0
1.3333333333333333

要么,

>>> 4 / float(3)
1.3333333333333333

要么,

>>> float(4) / 3
1.3333333333333333

Just making any of the parameters for division in floating-point format also produces the output in floating-point.

Example:

>>> 4.0/3
1.3333333333333333

or,

>>> 4 / 3.0
1.3333333333333333

or,

>>> 4 / float(3)
1.3333333333333333

or,

>>> float(4) / 3
1.3333333333333333

回答 6

加一个点(.)表示浮点数

>>> 4/3.
1.3333333333333333

Add a dot (.) to indicate floating point numbers

>>> 4/3.
1.3333333333333333

回答 7

这也可以

>>> u=1./5
>>> print u
0.2

This will also work

>>> u=1./5
>>> print u
0.2

回答 8

如果要默认使用“ true”(浮点)除法,则有一个命令行标志:

python -Q new foo.py

有一些缺点(来自PEP):

有人认为,更改默认值的命令行选项是有害的。如果使用不当,肯定会很危险:例如,不可能将需要-Qnew的第三方库软件包与需要-Qold的第三方库软件包结合使用。

您可以通过查看python手册页了解有关更改/警告除法行为的其他标志值的更多信息。

有关除法更改的详细信息,请阅读:PEP 238-更改除法运算符

If you want to use “true” (floating point) division by default, there is a command line flag:

python -Q new foo.py

There are some drawbacks (from the PEP):

It has been argued that a command line option to change the default is evil. It can certainly be dangerous in the wrong hands: for example, it would be impossible to combine a 3rd party library package that requires -Qnew with another one that requires -Qold.

You can learn more about the other flags values that change / warn-about the behavior of division by looking at the python man page.

For full details on division changes read: PEP 238 — Changing the Division Operator


回答 9

from operator import truediv

c = truediv(a, b)
from operator import truediv

c = truediv(a, b)

回答 10

from operator import truediv

c = truediv(a, b)

其中a是除数,b是除数。当两个整数相除后的商是浮点数时,此函数非常方便。

from operator import truediv

c = truediv(a, b)

where a is dividend and b is the divisor. This function is handy when quotient after division of two integers is a float.


Python 2.X中的range和xrange函数之间有什么区别?

问题:Python 2.X中的range和xrange函数之间有什么区别?

显然,xrange更快,但是我不知道为什么它更快(到目前为止,除了轶事之外,还没有证据表明它更快)或除此之外还有什么不同

for i in range(0, 20):
for i in xrange(0, 20):

Apparently xrange is faster but I have no idea why it’s faster (and no proof besides the anecdotal so far that it is faster) or what besides that is different about

for i in range(0, 20):
for i in xrange(0, 20):

回答 0

在Python 2.x中:

  • range创建一个列表,所以如果您这样做range(1, 10000000),则会在内存中创建一个包含9999999元素的列表。

  • xrange 是一个延迟计算的序列对象。

在Python 3中,range它等效于python xrange,并且必须使用来获取列表list(range(...))

In Python 2.x:

  • range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.

  • xrange is a sequence object that evaluates lazily.

In Python 3, range does the equivalent of python’s xrange, and to get the list, you have to use list(range(...)).


回答 1

range会创建一个列表,因此,如果执行range(1, 10000000)此操作,则会在内存中创建一个包含9999999元素的列表。

xrange 是一个生成器,所以它是一个序列对象,是一个懒惰求值的对象。

的确如此,但是在Python 3中,.range()将由Python 2实现.xrange()。如果需要实际生成列表,则需要执行以下操作:

list(range(1,100))

range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.

xrange is a generator, so it is a sequence object is a that evaluates lazily.

This is true, but in Python 3, .range() will be implemented by the Python 2 .xrange(). If you need to actually generate the list, you will need to do:

list(range(1,100))

回答 2

记住,使用该timeit模块来测试较小的代码片段更快!

$ python -m timeit 'for i in range(1000000):' ' pass'
10 loops, best of 3: 90.5 msec per loop
$ python -m timeit 'for i in xrange(1000000):' ' pass'
10 loops, best of 3: 51.1 msec per loop

就个人而言,.range()除非我处理的列表非常庞大,否则我总是使用-从时间上可以看出,对于一百万个条目的列表,额外的开销只有0.04秒。正如Corey所指出的那样,在Python 3.0中,它.xrange()将会消失,并且.range()无论如何都会为您提供良好的迭代器行为。

Remember, use the timeit module to test which of small snippets of code is faster!

$ python -m timeit 'for i in range(1000000):' ' pass'
10 loops, best of 3: 90.5 msec per loop
$ python -m timeit 'for i in xrange(1000000):' ' pass'
10 loops, best of 3: 51.1 msec per loop

Personally, I always use .range(), unless I were dealing with really huge lists — as you can see, time-wise, for a list of a million entries, the extra overhead is only 0.04 seconds. And as Corey points out, in Python 3.0 .xrange() will go away and .range() will give you nice iterator behavior anyway.


回答 3

xrange仅存储范围参数并按需生成数字。但是,Python的C实现当前将其args限制为C long:

xrange(2**32-1, 2**32+1)  # When long is 32 bits, OverflowError: Python int too large to convert to C long
range(2**32-1, 2**32+1)   # OK --> [4294967295L, 4294967296L]

请注意,在Python 3.0中仅存在,range并且其行为类似于2.x,xrange但对最小和最大端点没有限制。

xrange only stores the range params and generates the numbers on demand. However the C implementation of Python currently restricts its args to C longs:

xrange(2**32-1, 2**32+1)  # When long is 32 bits, OverflowError: Python int too large to convert to C long
range(2**32-1, 2**32+1)   # OK --> [4294967295L, 4294967296L]

Note that in Python 3.0 there is only range and it behaves like the 2.x xrange but without the limitations on minimum and maximum end points.


回答 4

xrange返回一个迭代器,一次只在内存中保留一个数字。range将整个数字列表保留在内存中。

xrange returns an iterator and only keeps one number in memory at a time. range keeps the entire list of numbers in memory.


回答 5

一定要花一些时间在图书馆参考上。您越熟悉它,就可以更快地找到此类问题的答案。关于内置对象和类型的前几章特别重要。

xrange类型的优点在于,无论xrange对象代表的范围大小如何,它始终将占用相同的内存量。没有一致的性能优势。

查找有关Python构造的快速信息的另一种方法是docstring和help-function:

print xrange.__doc__ # def doc(x): print x.__doc__ is super useful
help(xrange)

Do spend some time with the Library Reference. The more familiar you are with it, the faster you can find answers to questions like this. Especially important are the first few chapters about builtin objects and types.

The advantage of the xrange type is that an xrange object will always take the same amount of memory, no matter the size of the range it represents. There are no consistent performance advantages.

Another way to find quick information about a Python construct is the docstring and the help-function:

print xrange.__doc__ # def doc(x): print x.__doc__ is super useful
help(xrange)

回答 6

我很震惊,没有人读doc

此函数非常类似于range(),但是返回一个xrange对象而不是一个列表。这是一种不透明的序列类型,其产生的值与对应的列表相同,而实际上并没有同时存储它们。xrange()over 的优点range()是最小的(因为xrange()在要求输入值时仍必须创建值),除非在内存不足的计算机上使用了非常大的范围或从未使用过范围的所有元素时(例如当循环被使用时)。通常以break)终止。

I am shocked nobody read doc:

This function is very similar to range(), but returns an xrange object instead of a list. This is an opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously. The advantage of xrange() over range() is minimal (since xrange() still has to create the values when asked for them) except when a very large range is used on a memory-starved machine or when all of the range’s elements are never used (such as when the loop is usually terminated with break).


回答 7

range创建一个列表,因此如果执行range(1,10000000),它将在内存中创建一个包含10000000个元素的列表。xrange是一个生成器,因此它懒惰地求值。

这为您带来两个优点:

  1. 您可以迭代更长的列表而无需获取MemoryError
  2. 当它懒散地解析每个数字时,如果您尽早停止迭代,您将不会浪费时间创建整个列表。

range creates a list, so if you do range(1, 10000000) it creates a list in memory with 10000000 elements. xrange is a generator, so it evaluates lazily.

This brings you two advantages:

  1. You can iterate longer lists without getting a MemoryError.
  2. As it resolves each number lazily, if you stop iteration early, you won’t waste time creating the whole list.

回答 8

在这个简单的示例中,您将发现xrangeover 的优点range

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 4.49153590202 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 7.04547905922 seconds

上面的示例在的情况下并没有任何实质性的改善xrange

现在range,与相比,以下情况的确非常慢xrange

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 0.000764846801758 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer() 

print "time taken: ", (t2-t1)  # 2.78506207466 seconds

使用range,它已经创建了一个从0到100000000(耗时)的列表,但是它xrange是一个生成器,并且仅根据需要(即,如果继续迭代)生成数字。

在Python-3中,该range功能的实现与xrangePython-2中的相同,而xrange在Python-3中已取消了该功能。

快乐编码!

You will find the advantage of xrange over range in this simple example:

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 4.49153590202 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    pass
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 7.04547905922 seconds

The above example doesn’t reflect anything substantially better in case of xrange.

Now look at the following case where range is really really slow, compared to xrange.

import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer()

print "time taken: ", (t2-t1)  # 0.000764846801758 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer() 

print "time taken: ", (t2-t1)  # 2.78506207466 seconds

With range, it already creates a list from 0 to 100000000(time consuming), but xrange is a generator and it only generates numbers based on the need, that is, if the iteration continues.

In Python-3, the implementation of the range functionality is same as that of xrange in Python-2, while they have done away with xrange in Python-3

Happy Coding!!


回答 9

这是出于优化的原因。

range()将从头到尾创建一个值列表(在您的示例中为0 .. 20)。在很大范围内,这将成为昂贵的操作。

另一方面,xrange()更优化了。它只会在需要时(通过xrange序列对象)计算下一个值,并且不会像range()那样创建所有值的列表。

It is for optimization reasons.

range() will create a list of values from start to end (0 .. 20 in your example). This will become an expensive operation on very large ranges.

xrange() on the other hand is much more optimised. it will only compute the next value when needed (via an xrange sequence object) and does not create a list of all values like range() does.


回答 10

range(x,y)如果使用for循环,则返回x和y之间的每个数字的列表,这样range比较慢。实际上,range具有较大的Index范围。range(x.y)将打印出x和y之间所有数字的列表

xrange(x,y)返回,xrange(x,y)但如果使用for循环,则xrange速度更快。xrange索引范围较小。xrange不仅会打印出来xrange(x,y),还会保留其中的所有数字。

[In] range(1,10)
[Out] [1, 2, 3, 4, 5, 6, 7, 8, 9]
[In] xrange(1,10)
[Out] xrange(1,10)

如果使用for循环,那么它将起作用

[In] for i in range(1,10):
        print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9
[In] for i in xrange(1,10):
         print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9

尽管使用循环时并没有什么不同,但仅打印循环时也有差异!

range(x,y) returns a list of each number in between x and y if you use a for loop, then range is slower. In fact, range has a bigger Index range. range(x.y) will print out a list of all the numbers in between x and y

xrange(x,y) returns xrange(x,y) but if you used a for loop, then xrange is faster. xrange has a smaller Index range. xrange will not only print out xrange(x,y) but it will still keep all the numbers that are in it.

[In] range(1,10)
[Out] [1, 2, 3, 4, 5, 6, 7, 8, 9]
[In] xrange(1,10)
[Out] xrange(1,10)

If you use a for loop, then it would work

[In] for i in range(1,10):
        print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9
[In] for i in xrange(1,10):
         print i
[Out] 1
      2
      3
      4
      5
      6
      7
      8
      9

There isn’t much difference when using loops, though there is a difference when just printing it!


回答 11

range(): range(1,10)返回一个1到10个数字的列表,并将整个列表保存在内存中。

xrange():类似于range(),但不返回列表,而是返回一个对象,该对象根据需要生成范围内的数字。对于循环,这比range()快一点,并且内存效率更高。xrange()对象就像一个迭代器,并根据需要生成数字。(延迟评估)

In [1]: range(1,10)

Out[1]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]: xrange(10)

Out[2]: xrange(10)

In [3]: print xrange.__doc__

xrange([start,] stop[, step]) -> xrange object

range(): range(1, 10) returns a list from 1 to 10 numbers & hold whole list in memory.

xrange(): Like range(), but instead of returning a list, returns an object that generates the numbers in the range on demand. For looping, this is lightly faster than range() and more memory efficient. xrange() object like an iterator and generates the numbers on demand.(Lazy Evaluation)

In [1]: range(1,10)

Out[1]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]: xrange(10)

Out[2]: xrange(10)

In [3]: print xrange.__doc__

xrange([start,] stop[, step]) -> xrange object

回答 12

其他一些答案提到Python 3淘汰了2.x range,并将2.x重命名xrangerange。但是,除非您使用3.0或3.1(应该没有人使用),否则它实际上是一种不同的类型。

正如3.1文档所说:

范围对象几乎没有行为:它们仅支持索引,迭代和len功能。

但是,在3.2+版本中,它range是一个完整序列,它支持扩展的slice,以及所有collections.abc.Sequence与语义相同的方法list*

并且,至少在CPython和PyPy(当前仅有的两个3.2+实现)中,它还具有indexand count方法和in运算符的恒定时间实现(只要您仅将其传递为整数)。这意味着123456 in r在3.2+版本中写作是合理的,而在2.7或3.1版本中这将是一个可怕的想法。


*事实上,issubclass(xrange, collections.Sequence)回报率True在2.6-2.7 3.0-3.1和是一个错误是固定在3.2,而不是向后移植。

Some of the other answers mention that Python 3 eliminated 2.x’s range and renamed 2.x’s xrange to range. However, unless you’re using 3.0 or 3.1 (which nobody should be), it’s actually a somewhat different type.

As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

However, in 3.2+, range is a full sequence—it supports extended slices, and all of the methods of collections.abc.Sequence with the same semantics as a list.*

And, at least in CPython and PyPy (the only two 3.2+ implementations that currently exist), it also has constant-time implementations of the index and count methods and the in operator (as long as you only pass it integers). This means writing 123456 in r is reasonable in 3.2+, while in 2.7 or 3.1 it would be a horrible idea.


* The fact that issubclass(xrange, collections.Sequence) returns True in 2.6-2.7 and 3.0-3.1 is a bug that was fixed in 3.2 and not backported.


回答 13

在python 2.x中

range(x)返回一个列表,该列表在内存中创建有x个元素。

>>> a = range(5)
>>> a
[0, 1, 2, 3, 4]

xrange(x)返回一个xrange对象,该对象是生成器obj,可按需生成数字。它们是在for循环(惰性评估)期间计算的。

对于循环,这比range()快一点,并且内存效率更高。

>>> b = xrange(5)
>>> b
xrange(5)

In python 2.x

range(x) returns a list, that is created in memory with x elements.

>>> a = range(5)
>>> a
[0, 1, 2, 3, 4]

xrange(x) returns an xrange object which is a generator obj which generates the numbers on demand. they are computed during for-loop(Lazy Evaluation).

For looping, this is slightly faster than range() and more memory efficient.

>>> b = xrange(5)
>>> b
xrange(5)

回答 14

在循环中针对xrange测试范围时(我知道我应该使用timeit,但是使用简单的列表理解示例从内存中迅速破解了它),我发现了以下内容:

import time

for x in range(1, 10):

    t = time.time()
    [v*10 for v in range(1, 10000)]
    print "range:  %.4f" % ((time.time()-t)*100)

    t = time.time()
    [v*10 for v in xrange(1, 10000)]
    print "xrange: %.4f" % ((time.time()-t)*100)

这使:

$python range_tests.py
range:  0.4273
xrange: 0.3733
range:  0.3881
xrange: 0.3507
range:  0.3712
xrange: 0.3565
range:  0.4031
xrange: 0.3558
range:  0.3714
xrange: 0.3520
range:  0.3834
xrange: 0.3546
range:  0.3717
xrange: 0.3511
range:  0.3745
xrange: 0.3523
range:  0.3858
xrange: 0.3997 <- garbage collection?

或者,在for循环中使用xrange:

range:  0.4172
xrange: 0.3701
range:  0.3840
xrange: 0.3547
range:  0.3830
xrange: 0.3862 <- garbage collection?
range:  0.4019
xrange: 0.3532
range:  0.3738
xrange: 0.3726
range:  0.3762
xrange: 0.3533
range:  0.3710
xrange: 0.3509
range:  0.3738
xrange: 0.3512
range:  0.3703
xrange: 0.3509

我的代码段测试是否正确?对较慢的xrange实例有何评论?或更好的例子:-)

When testing range against xrange in a loop (I know I should use timeit, but this was swiftly hacked up from memory using a simple list comprehension example) I found the following:

import time

for x in range(1, 10):

    t = time.time()
    [v*10 for v in range(1, 10000)]
    print "range:  %.4f" % ((time.time()-t)*100)

    t = time.time()
    [v*10 for v in xrange(1, 10000)]
    print "xrange: %.4f" % ((time.time()-t)*100)

which gives:

$python range_tests.py
range:  0.4273
xrange: 0.3733
range:  0.3881
xrange: 0.3507
range:  0.3712
xrange: 0.3565
range:  0.4031
xrange: 0.3558
range:  0.3714
xrange: 0.3520
range:  0.3834
xrange: 0.3546
range:  0.3717
xrange: 0.3511
range:  0.3745
xrange: 0.3523
range:  0.3858
xrange: 0.3997 <- garbage collection?

Or, using xrange in the for loop:

range:  0.4172
xrange: 0.3701
range:  0.3840
xrange: 0.3547
range:  0.3830
xrange: 0.3862 <- garbage collection?
range:  0.4019
xrange: 0.3532
range:  0.3738
xrange: 0.3726
range:  0.3762
xrange: 0.3533
range:  0.3710
xrange: 0.3509
range:  0.3738
xrange: 0.3512
range:  0.3703
xrange: 0.3509

Is my snippet testing properly? Any comments on the slower instance of xrange? Or a better example :-)


回答 15

python中的xrange()和range()与用户的工作原理相似,但是区别在于,当我们讨论使用这两个函数分配内存的方式时。

当我们使用range()时,我们为它生成的所有变量分配内存,因此不建议使用更大的no。生成的变量。

另一方面,xrange()一次仅生成一个特定值,并且只能与for循环一起使用以打印所需的所有值。

xrange() and range() in python works similarly as for the user , but the difference comes when we are talking about how the memory is allocated in using both the function.

When we are using range() we allocate memory for all the variables it is generating, so it is not recommended to use with larger no. of variables to be generated.

xrange() on the other hand generate only a particular value at a time and can only be used with the for loop to print all the values required.


回答 16

range生成整个列表并返回。xrange不会-它会按需生成列表中的数字。

range generates the entire list and returns it. xrange does not — it generates the numbers in the list on demand.


回答 17

xrange使用迭代器(动态生成值),range返回一个列表。

xrange uses an iterator (generates values on the fly), range returns a list.


回答 18

什么?
range在运行时返回静态列表。
xrange返回一个object(在某种情况下,它的作用类似于生成器,尽管肯定不是一个),并在需要时从中生成值。

什么时候使用?

  • xrange如果要生成一个巨大范围(例如10亿)的列表,则可以使用该选项,尤其是当您拥有像手机这样的“内存敏感系统”时。
  • 使用range,如果你想在列表几次迭代。

PS:Python 3.x的range功能== Python 2.x的xrange功能。

What?
range returns a static list at runtime.
xrange returns an object (which acts like a generator, although it’s certainly not one) from which values are generated as and when required.

When to use which?

  • Use xrange if you want to generate a list for a gigantic range, say 1 billion, especially when you have a “memory sensitive system” like a cell phone.
  • Use range if you want to iterate over the list several times.

PS: Python 3.x’s range function == Python 2.x’s xrange function.


回答 19

每个人都对此做了很大的解释。但我希望自己看到它。我使用python3。因此,我打开了资源监视器(在Windows中!),首先,首先执行以下命令:

a=0
for i in range(1,100000):
    a=a+i

然后检查“使用中”内存中的更改。这无关紧要。然后,我运行了以下代码:

for i in list(range(1,100000)):
    a=a+i

立即消耗了很大一部分内存。而且,我被说服了。您可以自己尝试。

如果您使用的是Python 2X,则在第一个代码中将“ range()”替换为“ xrange()”,将“ list(range())”替换为“ range()”。

Everyone has explained it greatly. But I wanted it to see it for myself. I use python3. So, I opened the resource monitor (in Windows!), and first, executed the following command first:

a=0
for i in range(1,100000):
    a=a+i

and then checked the change in ‘In Use’ memory. It was insignificant. Then, I ran the following code:

for i in list(range(1,100000)):
    a=a+i

And it took a big chunk of the memory for use, instantly. And, I was convinced. You can try it for yourself.

If you are using Python 2X, then replace ‘range()’ with ‘xrange()’ in the first code and ‘list(range())’ with ‘range()’.


回答 20

从帮助文档。

Python 2.7.12

>>> print range.__doc__
range(stop) -> list of integers
range(start, stop[, step]) -> list of integers

Return a list containing an arithmetic progression of integers.
range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
When step is given, it specifies the increment (or decrement).
For example, range(4) returns [0, 1, 2, 3].  The end point is omitted!
These are exactly the valid indices for a list of 4 elements.

>>> print xrange.__doc__
xrange(stop) -> xrange object
xrange(start, stop[, step]) -> xrange object

Like range(), but instead of returning a list, returns an object that
generates the numbers in the range on demand.  For looping, this is 
slightly faster than range() and more memory efficient.

Python 3.5.2

>>> print(range.__doc__)
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).

>>> print(xrange.__doc__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'xrange' is not defined

差异显而易见。在Python 2.x中,range返回一个列表,xrange返回一个可迭代的xrange对象。

在Python 3.x中,range成为xrangePython 2.x并被xrange删除。

From the help docs.

Python 2.7.12

>>> print range.__doc__
range(stop) -> list of integers
range(start, stop[, step]) -> list of integers

Return a list containing an arithmetic progression of integers.
range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
When step is given, it specifies the increment (or decrement).
For example, range(4) returns [0, 1, 2, 3].  The end point is omitted!
These are exactly the valid indices for a list of 4 elements.

>>> print xrange.__doc__
xrange(stop) -> xrange object
xrange(start, stop[, step]) -> xrange object

Like range(), but instead of returning a list, returns an object that
generates the numbers in the range on demand.  For looping, this is 
slightly faster than range() and more memory efficient.

Python 3.5.2

>>> print(range.__doc__)
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).

>>> print(xrange.__doc__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'xrange' is not defined

Difference is apparent. In Python 2.x, range returns a list, xrange returns an xrange object which is iterable.

In Python 3.x, range becomes xrange of Python 2.x, and xrange is removed.


回答 21

根据扫描/打印0-N个项目的要求,range和xrange的工作方式如下。

range()-在内存中创建一个新列表,并将整个0到N个项目(总共N + 1个)打印出来。xrange()-创建一个迭代器实例,该实例扫描项目并仅将当前遇到的项目保留在内存中,因此始终使用相同数量的内存。

如果所需元素只是在列表的开头,那么它可以节省大量时间和内存。

On a requirement for scanning/printing of 0-N items , range and xrange works as follows.

range() – creates a new list in the memory and takes the whole 0 to N items(totally N+1) and prints them. xrange() – creates a iterator instance that scans through the items and keeps only the current encountered item into the memory , hence utilising same amount of memory all the time.

In case the required element is somewhat at the beginning of the list only then it saves a good amount of time and memory.


回答 22

Range返回一个列表,xrange返回一个xrange对象,无论范围大小如何,该对象都占用相同的内存,因为在这种情况下,每次迭代仅生成一个元素并且可用,而在使用range的情况下,所有元素一次生成,并且在内存中可用。

Range returns a list while xrange returns an xrange object which takes the same memory irrespective of the range size,as in this case,only one element is generated and available per iteration whereas in case of using range, all the elements are generated at once and are available in the memory.


回答 23

对于较小的参数差减小range(..)/ xrange(..)

$ python -m timeit "for i in xrange(10111):" " for k in range(100):" "  pass"
10 loops, best of 3: 59.4 msec per loop

$ python -m timeit "for i in xrange(10111):" " for k in xrange(100):" "  pass"
10 loops, best of 3: 46.9 msec per loop

在这种情况下,xrange(100)效率仅提高约20%。

The difference decreases for smaller arguments to range(..) / xrange(..):

$ python -m timeit "for i in xrange(10111):" " for k in range(100):" "  pass"
10 loops, best of 3: 59.4 msec per loop

$ python -m timeit "for i in xrange(10111):" " for k in xrange(100):" "  pass"
10 loops, best of 3: 46.9 msec per loop

In this case xrange(100) is only about 20% more efficient.


回答 24

range:-range会立即填充所有内容,这意味着范围的每个数字都会占用内存。

xrange:-xrange类似于生成器,当您想要数字的范围但不希望将它们存储时(例如当您要在for loop.so中使用时),它将出现在图片中,这样可以提高内存效率。

range :-range will populate everything at once.which means every number of the range will occupy the memory.

xrange :-xrange is something like generator ,it will comes into picture when you want the range of numbers but you dont want them to be stored,like when you want to use in for loop.so memory efficient.


回答 25

此外,如果这样做list(xrange(...))等同于range(...)

所以 list很慢。

xrange真的没有完全完成序列

这就是为什么它不是列表,而是xrange对象

Additionally, if do list(xrange(...)) will be equivalent to range(...).

So list is slow.

Also xrange really doesn’t fully finish the sequence

So that’s why its not a list, it’s a xrange object


回答 26

range() 在Python中 2.x

此函数本质range()上是Python中可用的旧函数,它2.x返回list包含指定范围内的元素的对象的实例。

但是,这种实现在初始化带有一系列数字的列表时效率太低。例如,for i in range(1000000)就内存和时间使用而言,要执行的命令非常昂贵,因为它需要将此列表存储到内存中。


range()在Python 3.xxrange()Python中2.x

Python 3.x引入了更新的实现range()(而更新的实现已经可以2.x通过Python 通过xrange()功能)。

range()漏洞利用一种称为惰性评估的策略较新的实现未在范围内创建大量元素,而是引入了class range,这是一个轻量级的对象,代表给定范围内的所需元素,而无需将其显式存储在内存中(这听起来像是生成器,但是惰性求值的概念是不同)。


例如,请考虑以下内容:

# Python 2.x
>>> a = range(10)
>>> type(a)
<type 'list'>
>>> b = xrange(10)
>>> type(b)
<type 'xrange'>

# Python 3.x
>>> a = range(10)
>>> type(a)
<class 'range'>

range() in Python 2.x

This function is essentially the old range() function that was available in Python 2.x and returns an instance of a list object that contains the elements in the specified range.

However, this implementation is too inefficient when it comes to initialise a list with a range of numbers. For example, for i in range(1000000) would be a very expensive command to execute, both in terms of memory and time usage as it requires the storage of this list into the memory.


range() in Python 3.x and xrange() in Python 2.x

Python 3.x introduced a newer implementation of range() (while the newer implementation was already available in Python 2.x through the xrange() function).

The range() exploits a strategy known as lazy evaluation. Instead of creating a huge list of elements in range, the newer implementation introduces the class range, a lightweight object that represents the required elements in the given range, without storing them explicitly in memory (this might sound like generators but the concept of lazy evaluation is different).


As an example, consider the following:

# Python 2.x
>>> a = range(10)
>>> type(a)
<type 'list'>
>>> b = xrange(10)
>>> type(b)
<type 'xrange'>

and

# Python 3.x
>>> a = range(10)
>>> type(a)
<class 'range'>

回答 27

看到这个帖子以查找range和xrange之间的区别:

报价:

range返回确切的结果:一系列连续的整数,定义的长度以0开头xrange。但是,将返回“ xrange object”,其作用类似于迭代器

See this post to find difference between range and xrange:

To quote:

range returns exactly what you think: a list of consecutive integers, of a defined length beginning with 0. xrange, however, returns an “xrange object”, which acts a great deal like an iterator


Python2中的dict.items()和dict.iteritems()有什么区别?

问题:Python2中的dict.items()和dict.iteritems()有什么区别?

dict.items()和之间有适用的区别dict.iteritems()吗?

Python文档

dict.items():返回字典的(键,值)对列表的副本

dict.iteritems():在字典的(键,值)对上返回迭代器

如果我运行下面的代码,每个似乎都返回对同一对象的引用。我缺少任何细微的差异吗?

#!/usr/bin/python

d={1:'one',2:'two',3:'three'}
print 'd.items():'
for k,v in d.items():
   if d[k] is v: print '\tthey are the same object' 
   else: print '\tthey are different'

print 'd.iteritems():'   
for k,v in d.iteritems():
   if d[k] is v: print '\tthey are the same object' 
   else: print '\tthey are different'   

输出:

d.items():
    they are the same object
    they are the same object
    they are the same object
d.iteritems():
    they are the same object
    they are the same object
    they are the same object

Are there any applicable differences between dict.items() and dict.iteritems()?

From the Python docs:

dict.items(): Return a copy of the dictionary’s list of (key, value) pairs.

dict.iteritems(): Return an iterator over the dictionary’s (key, value) pairs.

If I run the code below, each seems to return a reference to the same object. Are there any subtle differences that I am missing?

#!/usr/bin/python

d={1:'one',2:'two',3:'three'}
print 'd.items():'
for k,v in d.items():
   if d[k] is v: print '\tthey are the same object' 
   else: print '\tthey are different'

print 'd.iteritems():'   
for k,v in d.iteritems():
   if d[k] is v: print '\tthey are the same object' 
   else: print '\tthey are different'   

Output:

d.items():
    they are the same object
    they are the same object
    they are the same object
d.iteritems():
    they are the same object
    they are the same object
    they are the same object

回答 0

这是演变的一部分。

最初,Python items()构建了一个真正的元组列表,并将其返回。这可能会占用大量额外的内存。

然后,一般将生成器引入该语言,然后将该方法重新实现为名为的迭代器-生成器方法iteritems()。保留原始版本是为了向后兼容。

Python 3的更改之一是 items()现在返回迭代器,并且列表从未完全构建。该iteritems()方法也消失了,因为items()在Python 3中的工作方式与viewitems()在Python 2.7中一样。

It’s part of an evolution.

Originally, Python items() built a real list of tuples and returned that. That could potentially take a lot of extra memory.

Then, generators were introduced to the language in general, and that method was reimplemented as an iterator-generator method named iteritems(). The original remains for backwards compatibility.

One of Python 3’s changes is that items() now return iterators, and a list is never fully built. The iteritems() method is also gone, since items() in Python 3 works like viewitems() in Python 2.7.


回答 1

dict.items()返回2元组([(key, value), (key, value), ...])的列表,而是dict.iteritems()生成2元组的生成器。前者最初占用更多空间和时间,但是访问每个元素的速度很快,而前者最初占用较少的空间和时间,但是在生成每个元素时要花费更多的时间。

dict.items() returns a list of 2-tuples ([(key, value), (key, value), ...]), whereas dict.iteritems() is a generator that yields 2-tuples. The former takes more space and time initially, but accessing each element is fast, whereas the second takes less space and time initially, but a bit more time in generating each element.


回答 2

在Py2.x中

该命令dict.items()dict.keys()dict.values()返回一个副本字典的的列表(k, v)对,键和值。如果复制的列表很大,则可能会占用大量内存。

该命令dict.iteritems()dict.iterkeys()dict.itervalues()返回一个迭代器在字典的(k, v)对,键和值。

该命令dict.viewitems()dict.viewkeys()dict.viewvalues()返回视图对象,它可以体现字典的变化。(即,如果您在字典中del添加了项或(k,v)在字典中添加了对,则视图对象可以同时自动更改。)

$ python2.7

>>> d = {'one':1, 'two':2}
>>> type(d.items())
<type 'list'>
>>> type(d.keys())
<type 'list'>
>>> 
>>> 
>>> type(d.iteritems())
<type 'dictionary-itemiterator'>
>>> type(d.iterkeys())
<type 'dictionary-keyiterator'>
>>> 
>>> 
>>> type(d.viewitems())
<type 'dict_items'>
>>> type(d.viewkeys())
<type 'dict_keys'>

在Py3.x中

在Py3.x,事情比较干净,因为只有dict.items()dict.keys()dict.values()可用,这回该视图对象,就像dict.viewitems()在Py2.x一样。

就像@lvc指出的那样,view对象iterator并不相同,因此,如果要在Py3.x中返回迭代器,可以使用iter(dictview)

$ python3.3

>>> d = {'one':'1', 'two':'2'}
>>> type(d.items())
<class 'dict_items'>
>>>
>>> type(d.keys())
<class 'dict_keys'>
>>>
>>>
>>> ii = iter(d.items())
>>> type(ii)
<class 'dict_itemiterator'>
>>>
>>> ik = iter(d.keys())
>>> type(ik)
<class 'dict_keyiterator'>

In Py2.x

The commands dict.items(), dict.keys() and dict.values() return a copy of the dictionary’s list of (k, v) pair, keys and values. This could take a lot of memory if the copied list is very large.

The commands dict.iteritems(), dict.iterkeys() and dict.itervalues() return an iterator over the dictionary’s (k, v) pair, keys and values.

The commands dict.viewitems(), dict.viewkeys() and dict.viewvalues() return the view objects, which can reflect the dictionary’s changes. (I.e. if you del an item or add a (k,v) pair in the dictionary, the view object can automatically change at the same time.)

$ python2.7

>>> d = {'one':1, 'two':2}
>>> type(d.items())
<type 'list'>
>>> type(d.keys())
<type 'list'>
>>> 
>>> 
>>> type(d.iteritems())
<type 'dictionary-itemiterator'>
>>> type(d.iterkeys())
<type 'dictionary-keyiterator'>
>>> 
>>> 
>>> type(d.viewitems())
<type 'dict_items'>
>>> type(d.viewkeys())
<type 'dict_keys'>

While in Py3.x

In Py3.x, things are more clean, since there are only dict.items(), dict.keys() and dict.values() available, which return the view objects just as dict.viewitems() in Py2.x did.

But

Just as @lvc noted, view object isn’t the same as iterator, so if you want to return an iterator in Py3.x, you could use iter(dictview) :

$ python3.3

>>> d = {'one':'1', 'two':'2'}
>>> type(d.items())
<class 'dict_items'>
>>>
>>> type(d.keys())
<class 'dict_keys'>
>>>
>>>
>>> ii = iter(d.items())
>>> type(ii)
<class 'dict_itemiterator'>
>>>
>>> ik = iter(d.keys())
>>> type(ik)
<class 'dict_keyiterator'>

回答 3

您问:“ dict.items()和dict.iteritems()之间是否有适用的区别”

这可能会有所帮助(对于Python 2.x):

>>> d={1:'one',2:'two',3:'three'}
>>> type(d.items())
<type 'list'>
>>> type(d.iteritems())
<type 'dictionary-itemiterator'>

您将看到d.items()返回键,值对的元组列表,并d.iteritems()返回一个字典迭代器。

清单d.items()是可切片的:

>>> l1=d.items()[0]
>>> l1
(1, 'one')   # an unordered value!

但是没有__iter__方法:

>>> next(d.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator

作为迭代器,d.iteritems()不可切片:

>>> i1=d.iteritems()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictionary-itemiterator' object is not subscriptable

但是确实有__iter__

>>> next(d.iteritems())
(1, 'one')               # an unordered value!

因此,物品本身是相同的-运送物品的容器是不同的。一个是列表,另一个是迭代器(取决于Python版本…)

因此,dict.items()和dict.iteritems()之间的适用差异与列表和迭代器之间的适用差异相同。

You asked: ‘Are there any applicable differences between dict.items() and dict.iteritems()’

This may help (for Python 2.x):

>>> d={1:'one',2:'two',3:'three'}
>>> type(d.items())
<type 'list'>
>>> type(d.iteritems())
<type 'dictionary-itemiterator'>

You can see that d.items() returns a list of tuples of the key, value pairs and d.iteritems() returns a dictionary-itemiterator.

As a list, d.items() is slice-able:

>>> l1=d.items()[0]
>>> l1
(1, 'one')   # an unordered value!

But would not have an __iter__ method:

>>> next(d.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator

As an iterator, d.iteritems() is not slice-able:

>>> i1=d.iteritems()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictionary-itemiterator' object is not subscriptable

But does have __iter__:

>>> next(d.iteritems())
(1, 'one')               # an unordered value!

So the items themselves are same — the container delivering the items are different. One is a list, the other an iterator (depending on the Python version…)

So the applicable differences between dict.items() and dict.iteritems() are the same as the applicable differences between a list and an iterator.


回答 4

dict.items()返回元组列表,并dict.iteritems()在字典中返回元组的迭代器对象为(key,value)。元组相同,但容器不同。

dict.items()基本上将所有字典复制到列表中。尝试使用下面的代码的执行时间比较dict.items()dict.iteritems()。您将看到差异。

import timeit

d = {i:i*2 for i in xrange(10000000)}  
start = timeit.default_timer() #more memory intensive
for key,value in d.items():
    tmp = key + value #do something like print
t1 = timeit.default_timer() - start

start = timeit.default_timer()
for key,value in d.iteritems(): #less memory intensive
    tmp = key + value
t2 = timeit.default_timer() - start

在我的机器上输出:

Time with d.items(): 9.04773592949
Time with d.iteritems(): 2.17707300186

这清楚地表明这dictionary.iteritems()是非常有效的。

dict.items() return list of tuples, and dict.iteritems() return iterator object of tuple in dictionary as (key,value). The tuples are the same, but container is different.

dict.items() basically copies all dictionary into list. Try using following code to compare the execution times of the dict.items() and dict.iteritems(). You will see the difference.

import timeit

d = {i:i*2 for i in xrange(10000000)}  
start = timeit.default_timer() #more memory intensive
for key,value in d.items():
    tmp = key + value #do something like print
t1 = timeit.default_timer() - start

start = timeit.default_timer()
for key,value in d.iteritems(): #less memory intensive
    tmp = key + value
t2 = timeit.default_timer() - start

Output in my machine:

Time with d.items(): 9.04773592949
Time with d.iteritems(): 2.17707300186

This clearly shows that dictionary.iteritems() is much more efficient.


回答 5

如果你有

dict = {key1:value1, key2:value2, key3:value3,...}

Python 2中dict.items()复制每个元组并返回字典中的元组列表,即[(key1,value1), (key2,value2), ...]。这意味着整个字典将被复制到包含元组的新列表中

dict = {i: i * 2 for i in xrange(10000000)}  
# Slow and memory hungry.
for key, value in dict.items():
    print(key,":",value)

dict.iteritems()返回字典项迭代器。返回的项的值也相同,即(key1,value1), (key2,value2), ...,但这不是列表。这只是字典项迭代器对象。这意味着更少的内存使用量(减少了50%)。

  • 列出为可变快照: d.items() -> list(d.items())
  • 迭代器对象: d.iteritems() -> iter(d.items())

元组是相同的。您比较了每个中的元组,因此您得到相同的元组。

dict = {i: i * 2 for i in xrange(10000000)}  
# More memory efficient.
for key, value in dict.iteritems():
    print(key,":",value)

Python 3中dict.items()返回迭代器对象。dict.iteritems()已删除,因此不再有问题。

If you have

dict = {key1:value1, key2:value2, key3:value3,...}

In Python 2, dict.items() copies each tuples and returns the list of tuples in dictionary i.e. [(key1,value1), (key2,value2), ...]. Implications are that the whole dictionary is copied to new list containing tuples

dict = {i: i * 2 for i in xrange(10000000)}  
# Slow and memory hungry.
for key, value in dict.items():
    print(key,":",value)

dict.iteritems() returns the dictionary item iterator. The value of the item returned is also the same i.e. (key1,value1), (key2,value2), ..., but this is not a list. This is only dictionary item iterator object. That means less memory usage (50% less).

  • Lists as mutable snapshots: d.items() -> list(d.items())
  • Iterator objects: d.iteritems() -> iter(d.items())

The tuples are the same. You compared tuples in each so you get same.

dict = {i: i * 2 for i in xrange(10000000)}  
# More memory efficient.
for key, value in dict.iteritems():
    print(key,":",value)

In Python 3, dict.items() returns iterator object. dict.iteritems() is removed so there is no more issue.


回答 6

dict.iteritems在Python3.x中已经不存在了,因此用于iter(dict.items())获得相同的输出和内存分配

dict.iteritems is gone in Python3.x So use iter(dict.items()) to get the same output and memory alocation


回答 7

如果您想要一种方法来迭代同时适用于Python 2和3的字典的项对,请尝试如下操作:

DICT_ITER_ITEMS = (lambda d: d.iteritems()) if hasattr(dict, 'iteritems') else (lambda d: iter(d.items()))

像这样使用它:

for key, value in DICT_ITER_ITEMS(myDict):
    # Do something with 'key' and/or 'value'.

If you want a way to iterate the item pairs of a dictionary that works with both Python 2 and 3, try something like this:

DICT_ITER_ITEMS = (lambda d: d.iteritems()) if hasattr(dict, 'iteritems') else (lambda d: iter(d.items()))

Use it like this:

for key, value in DICT_ITER_ITEMS(myDict):
    # Do something with 'key' and/or 'value'.

回答 8

dict.iteritems():给您一个迭代器。您可以在循环外的其他模式中使用迭代器。

student = {"name": "Daniel", "student_id": 2222}

for key,value in student.items():
    print(key,value)

('student_id', 2222)
('name', 'Daniel')

for key,value in student.iteritems():
    print(key,value)

('student_id', 2222)
('name', 'Daniel')

studentIterator = student.iteritems()

print(studentIterator.next())
('student_id', 2222)

print(studentIterator.next())
('name', 'Daniel')

dict.iteritems(): gives you an iterator. You may use the iterator in other patterns outside of the loop.

student = {"name": "Daniel", "student_id": 2222}

for key,value in student.items():
    print(key,value)

('student_id', 2222)
('name', 'Daniel')

for key,value in student.iteritems():
    print(key,value)

('student_id', 2222)
('name', 'Daniel')

studentIterator = student.iteritems()

print(studentIterator.next())
('student_id', 2222)

print(studentIterator.next())
('name', 'Daniel')

回答 9

python 2中的dict.iteritems()与python 3中的dict.items()等效。

dict.iteritems() in python 2 is equivalent to dict.items() in python 3.


Python中的__future__是什么,以及如何/何时使用它以及如何工作

问题:Python中的__future__是什么,以及如何/何时使用它以及如何工作

__future__经常出现在Python模块中。__future__即使阅读了python的__future__文档,我也不明白它的用途以及使用时间/方式。

有人可以举例说明吗?

关于__future__我收到的基本用法的一些答案似乎是正确的。

但是,我需要了解有关__future__工作原理的另一件事:

对我来说,最令人困惑的概念是当前的python版本如何包含未来版本的功能,以及如何使用当前版本的Python成功地编译使用未来版本的功能的程序。

我猜想当前版本包含了将来的潜在功能。但是,这些功能仅可通过使用获得,__future__因为它们不是当前标准。让我知道我是否正确。

__future__ frequently appears in Python modules. I do not understand what __future__ is for and how/when to use it even after reading the Python’s __future__ doc.

Can anyone explain with examples?

A few answers regarding the basic usage of __future__ I’ve received seemed correct.

However, I need to understand one more thing regarding how __future__ works:

The most confusing concept for me is how a current python release includes features for future releases, and how a program using a feature from a future release can be be compiled successfully in the current version of Python.

I am guessing that the current release is packaged with potential features for the future. However, the features are available only by using __future__ because they are not the current standard. Let me know if I am right.


回答 0

通过__future__包含模块,您可以慢慢习惯不兼容的更改或引入新关键字的更改。

例如,对于使用上下文管理器,您必须from __future__ import with_statement在2.5中进行操作,因为with关键字是new,不再应该用作变量名。为了with在Python 2.5或更早版本中用作Python关键字,您将需要使用上面的import。

另一个例子是

from __future__ import division
print 8/7  # prints 1.1428571428571428
print 8//7 # prints 1

没有这些__future__东西,两个print语句都将打印出来1

内部差异在于没有导入时,/映射到__div__()方法,而使用导入__truediv__()。(无论如何,请//调用__floordiv__()。)

Apropos printprint在3.x中成为函数,失去其特殊属性作为关键字。反之亦然。

>>> print

>>> from __future__ import print_function
>>> print
<built-in function print>
>>>

With __future__ module’s inclusion, you can slowly be accustomed to incompatible changes or to such ones introducing new keywords.

E.g., for using context managers, you had to do from __future__ import with_statement in 2.5, as the with keyword was new and shouldn’t be used as variable names any longer. In order to use with as a Python keyword in Python 2.5 or older, you will need to use the import from above.

Another example is

from __future__ import division
print 8/7  # prints 1.1428571428571428
print 8//7 # prints 1

Without the __future__ stuff, both print statements would print 1.

The internal difference is that without that import, / is mapped to the __div__() method, while with it, __truediv__() is used. (In any case, // calls __floordiv__().)

Apropos print: print becomes a function in 3.x, losing its special property as a keyword. So it is the other way round.

>>> print

>>> from __future__ import print_function
>>> print
<built-in function print>
>>>

回答 1

当你做

from __future__ import whatever

您实际上不是在使用import语句,而是在将来的语句。您正在阅读错误的文档,因为您实际上并未在导入该模块。

以后的语句很特殊-它们更改了Python模块的解析方式,这就是为什么它们必须位于文件顶部的原因。它们为文件中的单词或符号赋予了新的(或不同的)含义。从文档:

将来的语句是对编译器的指令,即应使用将来指定的Python版本中可用的语法或语义来编译特定模块。将来的声明旨在简化向Python的未来版本的移植,从而对语言进行不兼容的更改。它允许在该功能成为标准版本之前,按模块使用新功能。

如果您确实要导入__future__模块,请执行

import __future__

然后照常访问它。

When you do

from __future__ import whatever

You’re not actually using an import statement, but a future statement. You’re reading the wrong docs, as you’re not actually importing that module.

Future statements are special — they change how your Python module is parsed, which is why they must be at the top of the file. They give new — or different — meaning to words or symbols in your file. From the docs:

A future statement is a directive to the compiler that a particular module should be compiled using syntax or semantics that will be available in a specified future release of Python. The future statement is intended to ease migration to future versions of Python that introduce incompatible changes to the language. It allows use of the new features on a per-module basis before the release in which the feature becomes standard.

If you actually want to import the __future__ module, just do

import __future__

and then access it as usual.


回答 2

__future__ 是一个伪模块,程序员可以使用它来启用与当前解释器不兼容的新语言功能。例如,该表达式11/4当前的计算结果为2。如果执行该模块的模块通过执行以下命令启用了真除法:

from __future__ import division

该表达式的11/4计算结果为2.75。通过导入__future__模块并评估其变量,您可以看到何时将新功能首次添加到语言中以及何时将其成为默认功能:

  >>> import __future__
  >>> __future__.division
  _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)

__future__ is a pseudo-module which programmers can use to enable new language features which are not compatible with the current interpreter. For example, the expression 11/4 currently evaluates to 2. If the module in which it is executed had enabled true division by executing:

from __future__ import division

the expression 11/4 would evaluate to 2.75. By importing the __future__ module and evaluating its variables, you can see when a new feature was first added to the language and when it will become the default:

  >>> import __future__
  >>> __future__.division
  _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)

回答 3

它可以用于使用某些功能,这些功能将在具有较旧版本的Python的同时以较新的版本显示。

例如

>>> from __future__ import print_function

将允许您将其print用作功能:

>>> print('# of entries', len(dictionary), file=sys.stderr)

It can be used to use features which will appear in newer versions while having an older release of Python.

For example

>>> from __future__ import print_function

will allow you to use print as a function:

>>> print('# of entries', len(dictionary), file=sys.stderr)

回答 4

已经有一些不错的答案,但是都没有一个完整的清单 __future__语句当前支持。

简而言之,__future__语句强制Python解释器使用该语言的更新功能。


当前支持的功能如下:

nested_scopes

在Python 2.1之前,以下代码将引发NameError

def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...

from __future__ import nested_scopes指令将允许启用此功能。

generators

引入了以下生成器函数,以在连续的函数调用之间保存状态:

def fib():
    a, b = 0, 1
    while 1:
       yield b
       a, b = b, a+b

division

在Python 2.x版本中使用经典除法。这意味着某些除法语句返回合理的除法近似值(“真除法”),而另一些则返回下限(“地板除法”)。从Python 3.0开始,真正的除法由指定x/y,而场除由指定x//y

from __future__ import division指令强制使用Python 3.0样式划分。

absolute_import

允许用括号括起多个import语句。例如:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

代替:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

要么:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END

with_statement

with在Python中将该语句作为关键字添加,以消除对try/finally语句的需要。在执行文件I / O时,通常的用法是:

with open('workfile', 'r') as f:
     read_data = f.read()

print_function

强制使用Python 3括号样式print()函数调用代替print MESSAGEstyle语句。

unicode_literals

介绍bytes对象的文字语法。意味着诸如之类的陈述bytes('Hello world', 'ascii')可以简单地表达为b'Hello world'

generator_stop

StopIteration生成器函数内部使用的异常的使用替换为RuntimeError异常。

上面没有提到的另一种用法是该__future__语句还需要使用Python 2.1+解释器,因为使用较旧的版本将引发运行时异常。


参考文献

There are some great answers already, but none of them address a complete list of what the __future__ statement currently supports.

Put simply, the __future__ statement forces Python interpreters to use newer features of the language.


The features that it currently supports are the following:

nested_scopes

Prior to Python 2.1, the following code would raise a NameError:

def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...

The from __future__ import nested_scopes directive will allow for this feature to be enabled.

generators

Introduced generator functions such as the one below to save state between successive function calls:

def fib():
    a, b = 0, 1
    while 1:
       yield b
       a, b = b, a+b

division

Classic division is used in Python 2.x versions. Meaning that some division statements return a reasonable approximation of division (“true division”) and others return the floor (“floor division”). Starting in Python 3.0, true division is specified by x/y, whereas floor division is specified by x//y.

The from __future__ import division directive forces the use of Python 3.0 style division.

absolute_import

Allows for parenthesis to enclose multiple import statements. For example:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

Instead of:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

Or:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END

with_statement

Adds the statement with as a keyword in Python to eliminate the need for try/finally statements. Common uses of this are when doing file I/O such as:

with open('workfile', 'r') as f:
     read_data = f.read()

print_function:

Forces the use of Python 3 parenthesis-style print() function call instead of the print MESSAGE style statement.

unicode_literals

Introduces the literal syntax for the bytes object. Meaning that statements such as bytes('Hello world', 'ascii') can be simply expressed as b'Hello world'.

generator_stop

Replaces the use of the StopIteration exception used inside generator functions with the RuntimeError exception.

One other use not mentioned above is that the __future__ statement also requires the use of Python 2.1+ interpreters since using an older version will throw a runtime exception.


References


回答 5

还是说“既然是python v2.7,请在python 3中添加它后,再使用另一个已添加到python v2.7中的’print’函数,因此我的’print’将不再是语句(例如,打印“ message”),但具有功能(例如,print(“ message”,选项)。这样,当我的代码在python 3中运行时,“ print”不会中断。”

from __future__ import print_function

print_function是包含“ print”的新实现的模块,具体取决于python v3中的行为。

这有更多解释:http : //python3porting.com/noconv.html

Or is it like saying “Since this is python v2.7, use that different ‘print’ function that has also been added to python v2.7, after it was added in python 3. So my ‘print’ will no longer be statements (eg print “message” ) but functions (eg, print(“message”, options). That way when my code is run in python 3, ‘print’ will not break.”

In

from __future__ import print_function

print_function is the module containing the new implementation of ‘print’ as per how it is behaving in python v3.

This has more explanation: http://python3porting.com/noconv.html


回答 6

我发现非常有用的用途之一是print_functionfrom__future__模块。

在Python 2.7中,我希望将来自不同打印语句的字符打印在同一行上而没有空格。

可以在最后使用逗号(“,”)来完成此操作,但是它还会附加一个额外的空间。上面的语句用作:

from __future__ import print_function
...
print (v_num,end="")
...

这将v_num在一行中没有空格的情况下打印每次迭代的值。

One of the uses which I found to be very useful is the print_function from __future__ module.

In Python 2.7, I wanted chars from different print statements to be printed on same line without spaces.

It can be done using a comma(“,”) at the end, but it also appends an extra space. The above statement when used as :

from __future__ import print_function
...
print (v_num,end="")
...

This will print the value of v_num from each iteration in a single line without spaces.


回答 7

从Python 3.0开始,print不再只是一个语句,而是一个函数。并包含在PEP 3105中。

我也认为Python 3.0包仍然具有这些特殊功能。让我们通过Python中的传统“金字塔程序”查看其可用性:

from __future__ import print_function

class Star(object):
    def __init__(self,count):
        self.count = count

    def start(self):
        for i in range(1,self.count):
            for j in range (i): 
                print('*', end='') # PEP 3105: print As a Function 
            print()

a = Star(5)
a.start()

Output:
*
**
***
****

如果我们使用普通的打印功能,将无法获得相同的输出,因为print()带有额外的换行符。因此,每次执行内部for循环时,它将在下一行上打印*。

After Python 3.0 onward, print is no longer just a statement, its a function instead. and is included in PEP 3105.

Also I think the Python 3.0 package has still these special functionality. Lets see its usability through a traditional “Pyramid program” in Python:

from __future__ import print_function

class Star(object):
    def __init__(self,count):
        self.count = count

    def start(self):
        for i in range(1,self.count):
            for j in range (i): 
                print('*', end='') # PEP 3105: print As a Function 
            print()

a = Star(5)
a.start()

Output:
*
**
***
****

If we use normal print function, we won’t be able to achieve the same output, since print() comes with a extra newline. So every time the inner for loop execute, it will print * onto the next line.


字符串标志“ u”和“ r”到底是做什么的,什么是原始字符串文字?

问题:字符串标志“ u”和“ r”到底是做什么的,什么是原始字符串文字?

当问这个问题时,我意识到我对原始字符串不了解很多。对于自称是Django培训师的人来说,这很糟糕。

我知道什么是编码,而且我知道u''自从得到Unicode以来,它独自做什么。

  • 但是究竟是r''什么呢?它产生什么样的字符串?

  • 最重要的是,该怎么ur''办?

  • 最后,有什么可靠的方法可以从Unicode字符串返回到简单的原始字符串?

  • 嗯,顺便说一句,如果您的系统和文本编辑器字符集设置为UTF-8,u''实际上有什么作用吗?

While asking this question, I realized I didn’t know much about raw strings. For somebody claiming to be a Django trainer, this sucks.

I know what an encoding is, and I know what u'' alone does since I get what is Unicode.

  • But what does r'' do exactly? What kind of string does it result in?

  • And above all, what the heck does ur'' do?

  • Finally, is there any reliable way to go back from a Unicode string to a simple raw string?

  • Ah, and by the way, if your system and your text editor charset are set to UTF-8, does u'' actually do anything?


回答 0

实际上并没有任何“原始字符串 ”。这里有原始的字符串文字,它们恰好是'r'在引号前用a标记的字符串文字。

“原始字符串文字”与字符串文字的语法略有不同,其中\反斜杠“”代表“只是反斜杠”(除非在引号之前否则会终止文字)- “转义序列”代表换行符,制表符,退格键,换页等。在普通的字符串文字中,每个反斜杠必须加倍,以避免被当作转义序列的开始。

之所以存在此语法变体,主要是因为正则表达式模式的语法带有反斜杠(但不会在末尾加重),所以语法比较繁琐(因此,上面的“ except”子句无关紧要),并且在避免将每个模式加倍时看起来会更好一些- – 就这样。表达本机Windows文件路径(用反斜杠代替其他平台上的常规斜杠)也引起了人们的欢迎,但这很少需要(因为正常斜杠在Windows上也可以正常工作)并且不完美(由于“ except”子句)以上)。

r'...'是一个字节串(在Python 2 *),ur'...'是Unicode字符串(再次,在Python 2 *),以及任何其他3种引用的也产生完全相同的类型字符串(因此,例如r'...'r'''...'''r"..."r"""..."""都是字节字符串,依此类推)。

不确定“ 返回 ”是什么意思-本质上没有前后方向,因为没有原始字符串类型,它只是一种表示完全正常的字符串对象,字节或Unicode的替代语法。

是的,在Python 2 *,u'...' 当然总是从刚不同'...'-前者是一个unicode字符串,后者是一个字节的字符串。文字表达的编码方式可能是完全正交的问题。

例如,考虑一下(Python 2.6):

>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

Unicode对象当然会占用更多的存储空间(很短的字符串,很明显,;-差别很小)。

There’s not really any “raw string“; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.

A “raw string literal” is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning “just a backslash” (except when it comes right before a quote that would otherwise terminate the literal) — no “escape sequences” to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.

This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the “except” clause above doesn’t matter) and it looks a bit better when you avoid doubling up each of them — that’s all. It also gained some popularity to express native Windows file paths (with backslashes instead of regular slashes like on other platforms), but that’s very rarely needed (since normal slashes mostly work fine on Windows too) and imperfect (due to the “except” clause above).

r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting also produces exactly the same types of strings (so for example r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).

Not sure what you mean by “going back” – there is no intrinsically back and forward directions, because there’s no raw string type, it’s just an alternative syntax to express perfectly normal string objects, byte or unicode as they may be.

And yes, in Python 2.*, u'...' is of course always distinct from just '...' — the former is a unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.

E.g., consider (Python 2.6):

>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

The Unicode object of course takes more memory space (very small difference for a very short string, obviously ;-).


回答 1

python中有两种类型的字符串:传统str类型和较新unicode类型。如果您键入的字符串文字u前面不带,则将得到str存储8位字符的旧类型,而带u前面的将得到unicode可存储任何Unicode字符的较新类型。

r完全不改变类型,它只是改变了字符串是如何解释。如果没有,则将r反斜杠视为转义字符。使用时r,反斜杠被视为文字。无论哪种方式,类型都是相同的。

ur 当然是Unicode字符串,其中反斜杠是文字反斜杠,而不是转义码的一部分。

您可以尝试使用str()函数将Unicode字符串转换为旧字符串,但是如果旧字符串中无法表示任何Unicode字符,则会出现异常。如果愿意,可以先用问号替换它们,但是当然这会导致这些字符不可读。str如果要正确处理Unicode字符,建议不要使用该类型。

There are two types of string in python: the traditional str type and the newer unicode type. If you type a string literal without the u in front you get the old str type which stores 8-bit characters, and with the u in front you get the newer unicode type that can store any Unicode character.

The r doesn’t change the type at all, it just changes how the string literal is interpreted. Without the r, backslashes are treated as escape characters. With the r, backslashes are treated as literal. Either way, the type is the same.

ur is of course a Unicode string where backslashes are literal backslashes, not part of escape codes.

You can try to convert a Unicode string to an old string using the str() function, but if there are any unicode characters that cannot be represented in the old string, you will get an exception. You could replace them with question marks first if you wish, but of course this would cause those characters to be unreadable. It is not recommended to use the str type if you want to correctly handle unicode characters.


回答 2

“原始字符串”表示将其存储为原样。例如,'\'只是一个反斜杠,而不是逃避

‘raw string’ means it is stored as it appears. For example, '\' is just a backslash instead of an escaping.


回答 3

“ u”前缀表示该值具有类型unicode而不是str

带有“ r”前缀的原始字符串文字将转义其中的所有转义序列,因此len(r"\n")也是如此。2。由于它们转义了转义序列,因此您不能以单个反斜杠结束字符串文字:这不是有效的转义序列(例如r"\")。

“原始”不是该类型的一部分,它只是表示值的一种方式。例如,"\\n"r"\n"是相同的值,就像320x200b100000是相同的。

您可以使用unicode原始字符串文字:

>>> u = ur"\n"
>>> print type(u), len(u)
<type 'unicode'> 2

源文件编码仅决定如何解释源文件,否则不会影响表达式或类型。但是,建议避免使用非ASCII编码会改变含义的代码:

使用ASCII的文件(对于Python 3.0,则为UTF-8)应该没有编码cookie。只有在注释或文档字符串需要提及需要使用Latin-1的作者姓名时,才应使用Latin-1(或UTF-8)。否则,使用\ x,\ u或\ U转义是在字符串文字中包含非ASCII数据的首选方法。

A “u” prefix denotes the value has type unicode rather than str.

Raw string literals, with an “r” prefix, escape any escape sequences within them, so len(r"\n") is 2. Because they escape escape sequences, you cannot end a string literal with a single backslash: that’s not a valid escape sequence (e.g. r"\").

“Raw” is not part of the type, it’s merely one way to represent the value. For example, "\\n" and r"\n" are identical values, just like 32, 0x20, and 0b100000 are identical.

You can have unicode raw string literals:

>>> u = ur"\n"
>>> print type(u), len(u)
<type 'unicode'> 2

The source file encoding just determines how to interpret the source file, it doesn’t affect expressions or types otherwise. However, it’s recommended to avoid code where an encoding other than ASCII would change the meaning:

Files using ASCII (or UTF-8, for Python 3.0) should not have a coding cookie. Latin-1 (or UTF-8) should only be used when a comment or docstring needs to mention an author name that requires Latin-1; otherwise, using \x, \u or \U escapes is the preferred way to include non-ASCII data in string literals.


回答 4

让我简单地解释一下:在python 2中,您可以将字符串存储为2种不同的类型。

第一个是ASCII,它是python中的str类型,它使用1个字节的内存。(256个字符,将主要存储英文字母和简单符号)

第二种类型是UNICODE,它是python中的unicode类型。Unicode存储所有类型的语言。

默认情况下,python会更喜欢str类型,但是如果您想将字符串存储为unicode类型,则可以将u放在像u’text’这样的文本前面,也可以通过调用unicode(’text’)来实现

所以ü只是打电话投的功能一小段路海峡Unicode的。而已!

现在r部分,您将其放在文本前面以告诉计算机该文本是原始文本,反斜杠不应是转义字符。r’\ n’不会创建换行符。只是包含2个字符的纯文本。

如果要将str转换为unicode并将原始文本也放入其中,请使用ur,因为ru会引发错误。

现在,重要的部分:

您不能使用r来存储一个反斜杠,这是唯一的exceptions。因此,此代码将产生错误:r’\’

要存储反斜杠(仅一个),您需要使用“ \\”

如果要存储1个以上的字符,则仍可以使用r,r’\\’一样,将产生2个反斜杠,如您所愿。

我不知道r无法与一个反斜杠存储一起使用的原因,但至今尚未有人描述。我希望这是一个错误。

Let me explain it simply: In python 2, you can store string in 2 different types.

The first one is ASCII which is str type in python, it uses 1 byte of memory. (256 characters, will store mostly English alphabets and simple symbols)

The 2nd type is UNICODE which is unicode type in python. Unicode stores all types of languages.

By default, python will prefer str type but if you want to store string in unicode type you can put u in front of the text like u’text’ or you can do this by calling unicode(‘text’)

So u is just a short way to call a function to cast str to unicode. That’s it!

Now the r part, you put it in front of the text to tell the computer that the text is raw text, backslash should not be an escaping character. r’\n’ will not create a new line character. It’s just plain text containing 2 characters.

If you want to convert str to unicode and also put raw text in there, use ur because ru will raise an error.

NOW, the important part:

You cannot store one backslash by using r, it’s the only exception. So this code will produce error: r’\’

To store a backslash (only one) you need to use ‘\\’

If you want to store more than 1 characters you can still use r like r’\\’ will produce 2 backslashes as you expected.

I don’t know the reason why r doesn’t work with one backslash storage but the reason isn’t described by anyone yet. I hope that it is a bug.


回答 5

也许这很明显,也许不是,但是您可以通过调用x = chr(92)来使字符串“ \”

x=chr(92)
print type(x), len(x) # <type 'str'> 1
y='\\'
print type(y), len(y) # <type 'str'> 1
x==y   # True
x is y # False

Maybe this is obvious, maybe not, but you can make the string ‘\’ by calling x=chr(92)

x=chr(92)
print type(x), len(x) # <type 'str'> 1
y='\\'
print type(y), len(y) # <type 'str'> 1
x==y   # True
x is y # False

回答 6

Unicode字符串文字

Unicode字符串文字(以前缀的字符串文字u)在Python 3中不再使用。它们仍然有效,但仅出于与Python 2 兼容的目的

原始字符串文字

如果要创建仅由易于键入的字符(例如英文字母或数字)组成的字符串文字,只需键入以下内容即可:'hello world'。但是,如果您还想包含一些其他奇特的字符,则必须使用一些解决方法。解决方法之一是转义序列。这样,例如,您只需\n在字符串文字中添加两个易于键入的字符,即可在字符串中表示新行。因此,当您打印'hello\nworld'字符串时,单词将被打印在单独的行上。非常方便!

另一方面,在某些情况下,您想创建一个包含转义序列的字符串文字,但又不希望它们由Python解释。您希望它们变得生硬。看下面的例子:

'New updates are ready in c:\windows\updates\new'
'In this lesson we will learn what the \n escape sequence does.'

在这种情况下,您可以在字符串文字前加上如下r字符:r'hello\nworld'并且Python不会解释任何转义序列。字符串将完全按照您创建的样子打印。

原始字符串文字不是完全“原始”吗?

许多人期望原始字符串文字是原始的,因为“ Python会忽略引号之间的任何内容”。那是不对的。Python仍然可以识别所有转义序列,只是不解释它们-而是使它们保持不变。这意味着原始字符串文字仍然必须是有效的字符串文字

根据字符串文字的词汇定义

string     ::=  "'" stringitem* "'"
stringitem ::=  stringchar | escapeseq
stringchar ::=  <any source character except "\" or newline or the quote>
escapeseq  ::=  "\" <any source character>

显然,包含裸引号:'hello'world'或以反斜杠:结尾的字符串文字(无论是否原始)'hello world\'都是无效的。

Unicode string literals

Unicode string literals (string literals prefixed by u) are no longer used in Python 3. They are still valid but just for compatibility purposes with Python 2.

Raw string literals

If you want to create a string literal consisting of only easily typable characters like english letters or numbers, you can simply type them: 'hello world'. But if you want to include also some more exotic characters, you’ll have to use some workaround. One of the workarounds are Escape sequences. This way you can for example represent a new line in your string simply by adding two easily typable characters \n to your string literal. So when you print the 'hello\nworld' string, the words will be printed on separate lines. That’s very handy!

On the other hand, there are some situations when you want to create a string literal that contains escape sequences but you don’t want them to be interpreted by Python. You want them to be raw. Look at these examples:

'New updates are ready in c:\windows\updates\new'
'In this lesson we will learn what the \n escape sequence does.'

In such situations you can just prefix the string literal with the r character like this: r'hello\nworld' and no escape sequences will be interpreted by Python. The string will be printed exactly as you created it.

Raw string literals are not completely “raw”?

Many people expect the raw string literals to be raw in a sense that “anything placed between the quotes is ignored by Python”. That is not true. Python still recognizes all the escape sequences, it just does not interpret them – it leaves them unchanged instead. It means that raw string literals still have to be valid string literals.

From the lexical definition of a string literal:

string     ::=  "'" stringitem* "'"
stringitem ::=  stringchar | escapeseq
stringchar ::=  <any source character except "\" or newline or the quote>
escapeseq  ::=  "\" <any source character>

It is clear that string literals (raw or not) containing a bare quote character: 'hello'world' or ending with a backslash: 'hello world\' are not valid.


UnicodeEncodeError:’ascii’编解码器无法在位置20编码字符u’\ xa0’:序数不在范围内(128)

问题:UnicodeEncodeError:’ascii’编解码器无法在位置20编码字符u’\ xa0’:序数不在范围内(128)

我在处理从不同网页(在不同站点上)获取的文本中的unicode字符时遇到问题。我正在使用BeautifulSoup。

问题是错误并非总是可重现的。它有时可以在某些页面上使用,有时它会通过抛出来发声UnicodeEncodeError。我已经尝试了几乎所有我能想到的东西,但是没有找到任何能正常工作而不抛出某种与Unicode相关的错误的东西。

导致问题的代码部分之一如下所示:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

这是运行上述代码段时在某些字符串上生成的堆栈跟踪:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

我怀疑这是因为某些页面(或更具体地说,来自某些站点的页面)可能已编码,而其他页面可能未编码。所有站点都位于英国,并提供供英国消费的数据-因此,与英语以外的其他任何形式的内部化或文字处理都没有问题。

是否有人对如何解决此问题有任何想法,以便我可以始终如一地解决此问题?

I’m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

Here is a stack trace produced on SOME strings when the snippet above is run:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption – so there are no issues relating to internalization or dealing with text written in anything other than English.

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?


回答 0

您需要阅读Python Unicode HOWTO。这个错误是第一个例子

基本上,停止使用str从Unicode转换为编码的文本/字节。

相反,请正确使用.encode()编码字符串:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

或完全以unicode工作。

You need to read the Python Unicode HOWTO. This error is the very first example.

Basically, stop using str to convert from unicode to encoded text / bytes.

Instead, properly use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

or work entirely in unicode.


回答 1

这是经典的python unicode痛点!考虑以下:

a = u'bats\u00E0'
print a
 => batsà

到目前为止一切都很好,但是如果我们调用str(a),让我们看看会发生什么:

str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

噢,蘸,那对任何人都不会有好处!要解决该错误,请使用.encode明确编码字节,并告诉python使用哪种编解码器:

a.encode('utf-8')
 => 'bats\xc3\xa0'
print a.encode('utf-8')
 => batsà

Voil \ u00E0!

问题是,当您调用str()时,python使用默认的字符编码来尝试对给定的字节进行编码,在您的情况下,有时表示为unicode字符。要解决此问题,您必须告诉python如何使用.encode(’whatever_unicode’)处理您给它的字符串。大多数时候,使用utf-8应该会很好。

有关此主题的出色论述,请参见Ned Batchelder在PyCon上的演讲:http : //nedbatchelder.com/text/unipain.html

This is a classic python unicode pain point! Consider the following:

a = u'bats\u00E0'
print a
 => batsà

All good so far, but if we call str(a), let’s see what happens:

str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

Oh dip, that’s not gonna do anyone any good! To fix the error, encode the bytes explicitly with .encode and tell python what codec to use:

a.encode('utf-8')
 => 'bats\xc3\xa0'
print a.encode('utf-8')
 => batsà

Voil\u00E0!

The issue is that when you call str(), python uses the default character encoding to try and encode the bytes you gave it, which in your case are sometimes representations of unicode characters. To fix the problem, you have to tell python how to deal with the string you give it by using .encode(‘whatever_unicode’). Most of the time, you should be fine using utf-8.

For an excellent exposition on this topic, see Ned Batchelder’s PyCon talk here: http://nedbatchelder.com/text/unipain.html


回答 2

我发现可以通过优雅的方法删除符号并继续按以下方式将字符串保留为字符串:

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')

重要的是要注意,使用ignore选项是危险的,因为它会悄悄地从使用它的代码中删除所有对unicode(和国际化)的支持,如下所示(转换unicode):

>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'

I found elegant work around for me to remove symbols and continue to keep string as string in follows:

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')

It’s important to notice that using the ignore option is dangerous because it silently drops any unicode(and internationalization) support from the code that uses it, as seen here (convert unicode):

>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'

回答 3

好吧,我尝试了一切,但并没有帮助,在谷歌搜索之后,我发现了以下内容并有所帮助。使用python 2.7。

# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')

well i tried everything but it did not help, after googling around i figured the following and it helped. python 2.7 is in use.

# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')

回答 4

导致甚至打印失败的一个细微问题是环境变量设置错误,例如。此处LC_ALL设置为“ C”。在Debian中,他们不鼓励设置它:Locale上的Debian Wiki

$ echo $LANG
en_US.utf8
$ echo $LC_ALL 
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà

A subtle problem causing even print to fail is having your environment variables set wrong, eg. here LC_ALL set to “C”. In Debian they discourage setting it: Debian wiki on Locale

$ echo $LANG
en_US.utf8
$ echo $LC_ALL 
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà

回答 5

对我来说,有效的是:

BeautifulSoup(html_text,from_encoding="utf-8")

希望这对某人有帮助。

For me, what worked was:

BeautifulSoup(html_text,from_encoding="utf-8")

Hope this helps someone.


回答 6

实际上,我发现在大多数情况下,仅去除那些字符会更加简单:

s = mystring.decode('ascii', 'ignore')

I’ve actually found that in most of my cases, just stripping out those characters is much simpler:

s = mystring.decode('ascii', 'ignore')

回答 7

问题是您正在尝试打印unicode字符,但是您的终端不支持它。

您可以尝试安装language-pack-en软件包来解决此问题:

sudo apt-get install language-pack-en

它为所有支持的软件包(包括Python)提供英语翻译数据更新。如有必要,请安装其他语言包(取决于您尝试打​​印的字符)。

在某些Linux发行版中,需要确保正确设置了默认的英语语言环境(因此unicode字符可以由shell / terminal处理)。有时,与手动配置相比,它更容易安装。

然后,在编写代码时,请确保在代码中使用正确的编码。

例如:

open(foo, encoding='utf-8')

如果仍然有问题,请仔细检查系统配置,例如:

  • 您的语言环境文件(/etc/default/locale),应包含例如

    LANG="en_US.UTF-8"
    LC_ALL="en_US.UTF-8"

    要么:

    LC_ALL=C.UTF-8
    LANG=C.UTF-8
  • LANG/ LC_CTYPEin shell的值。

  • 通过以下方法检查您的shell支持的语言环境:

    locale -a | grep "UTF-8"

演示新VM中的问题和解决方案。

  1. 初始化和配置VM(例如使用vagrant):

    vagrant init ubuntu/trusty64; vagrant up; vagrant ssh

    请参阅:可用的Ubuntu盒

  2. 打印unicode字符(例如商标符号):

    $ python -c 'print(u"\u2122");'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
  3. 现在安装language-pack-en

    $ sudo apt-get -y install language-pack-en
    The following extra packages will be installed:
      language-pack-en-base
    Generating locales...
      en_GB.UTF-8... /usr/sbin/locale-gen: done
    Generation complete.
  4. 现在应该解决问题:

    $ python -c 'print(u"\u2122");'
    
  5. 否则,请尝试以下命令:

    $ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'
    

The problem is that you’re trying to print a unicode character, but your terminal doesn’t support it.

You can try installing language-pack-en package to fix that:

sudo apt-get install language-pack-en

which provides English translation data updates for all supported packages (including Python). Install different language package if necessary (depending which characters you’re trying to print).

On some Linux distributions it’s required in order to make sure that the default English locales are set-up properly (so unicode characters can be handled by shell/terminal). Sometimes it’s easier to install it, than configuring it manually.

Then when writing the code, make sure you use the right encoding in your code.

For example:

open(foo, encoding='utf-8')

If you’ve still a problem, double check your system configuration, such as:

  • Your locale file (/etc/default/locale), which should have e.g.

    LANG="en_US.UTF-8"
    LC_ALL="en_US.UTF-8"
    

    or:

    LC_ALL=C.UTF-8
    LANG=C.UTF-8
    
  • Value of LANG/LC_CTYPE in shell.

  • Check which locale your shell supports by:

    locale -a | grep "UTF-8"
    

Demonstrating the problem and solution in fresh VM.

  1. Initialize and provision the VM (e.g. using vagrant):

    vagrant init ubuntu/trusty64; vagrant up; vagrant ssh
    

    See: available Ubuntu boxes..

  2. Printing unicode characters (such as trade mark sign like ):

    $ python -c 'print(u"\u2122");'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
    
  3. Now installing language-pack-en:

    $ sudo apt-get -y install language-pack-en
    The following extra packages will be installed:
      language-pack-en-base
    Generating locales...
      en_GB.UTF-8... /usr/sbin/locale-gen: done
    Generation complete.
    
  4. Now problem should be solved:

    $ python -c 'print(u"\u2122");'
    ™
    
  5. Otherwise, try the following command:

    $ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'
    ™
    

回答 8

在外壳中:

  1. 通过以下命令查找支持的UTF-8语言环境:

    locale -a | grep "UTF-8"
  2. 在运行脚本之前将其导出,例如:

    export LC_ALL=$(locale -a | grep UTF-8)

    或手动像:

    export LC_ALL=C.UTF-8
  3. 通过打印特殊字符进行测试,例如

    python -c 'print(u"\u2122");'

以上在Ubuntu中测试过。

In shell:

  1. Find supported UTF-8 locale by the following command:

    locale -a | grep "UTF-8"
    
  2. Export it, before running the script, e.g.:

    export LC_ALL=$(locale -a | grep UTF-8)
    

    or manually like:

    export LC_ALL=C.UTF-8
    
  3. Test it by printing special character, e.g. :

    python -c 'print(u"\u2122");'
    

Above tested in Ubuntu.


回答 9

在脚本开头的下面添加一行(或作为第二行):

# -*- coding: utf-8 -*-

那就是python源代码编码的定义。PEP 263中的更多信息。

Add line below at the beginning of your script ( or as second line):

# -*- coding: utf-8 -*-

That’s definition of python source code encoding. More info in PEP 263.


回答 10

这是其他一些所谓的“警惕”答案的重新表述。在某些情况下,尽管有人在这里提出抗议,但简单地扔掉麻烦的字符/字符串是一个很好的解决方案。

def safeStr(obj):
    try: return str(obj)
    except UnicodeEncodeError:
        return obj.encode('ascii', 'ignore').decode('ascii')
    except: return ""

测试它:

if __name__ == '__main__': 
    print safeStr( 1 ) 
    print safeStr( "test" ) 
    print u'98\xb0'
    print safeStr( u'98\xb0' )

结果:

1
test
98°
98

建议:您可能想将此函数命名为toAscii?这是优先事项。

这是为Python 2编写的。 对于Python 3,我相信您将要使用bytes(obj,"ascii")而不是str(obj)。我尚未对此进行测试,但是我会在某个时候修改答案。

Here’s a rehashing of some other so-called “cop out” answers. There are situations in which simply throwing away the troublesome characters/strings is a good solution, despite the protests voiced here.

def safeStr(obj):
    try: return str(obj)
    except UnicodeEncodeError:
        return obj.encode('ascii', 'ignore').decode('ascii')
    except: return ""

Testing it:

if __name__ == '__main__': 
    print safeStr( 1 ) 
    print safeStr( "test" ) 
    print u'98\xb0'
    print safeStr( u'98\xb0' )

Results:

1
test
98°
98

Suggestion: you might want to name this function to toAscii instead? That’s a matter of preference.

This was written for Python 2. For Python 3, I believe you’ll want to use bytes(obj,"ascii") rather than str(obj). I didn’t test this yet, but I will at some point and revise the answer.


回答 11

我总是将代码放在python文件的前两行中:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

I always put the code below in the first two lines of the python files:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

回答 12

这里可以找到简单的帮助程序功能。

def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')

Simple helper functions found here.

def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')

回答 13

只需添加到变量encode(’utf-8’)

agent_contact.encode('utf-8')

Just add to a variable encode(‘utf-8’)

agent_contact.encode('utf-8')

回答 14

请打开终端并执行以下命令:

export LC_ALL="en_US.UTF-8"

Please open terminal and fire the below command:

export LC_ALL="en_US.UTF-8"

回答 15

我只使用了以下内容:

import unicodedata
message = unicodedata.normalize("NFKD", message)

查看有关它的文档说明:

unicodedata.normalize(form,unistr)返回Unicode字符串unistr的普通形式form。格式的有效值为“ NFC”,“ NFKC”,“ NFD”和“ NFKD”。

Unicode标准基于规范对等和兼容性对等的定义,定义了Unicode字符串的各种规范化形式。在Unicode中,可以用各种方式表示几个字符。例如,字符U + 00C7(带有CEDILLA的拉丁文大写字母C)也可以表示为序列U + 0043(拉丁文的大写字母C)U + 0327(合并CEDILLA)。

对于每个字符,有两种规范形式:规范形式C和规范形式D。规范形式D(NFD)也称为规范分解,将每个字符转换为其分解形式。范式C(NFC)首先应用规范分解,然后再次组成预组合字符。

除了这两种形式,还有基于兼容性对等的两种其他常规形式。在Unicode中,支持某些字符,这些字符通常会与其他字符统一。例如,U + 2160(罗马数字ONE)与U + 0049(拉丁大写字母I)实际上是同一回事。但是,Unicode支持它与现有字符集(例如gb2312)兼容。

普通形式的KD(NFKD)将应用兼容性分解,即用所有等效字符替换它们的等效字符。范式KC(NFKC)首先应用兼容性分解,然后进行规范组合。

即使将两个unicode字符串归一化并在人类读者看来是相同的,但是如果一个字符串包含组合字符而另一个字符串没有组合,则它们可能不相等。

为我解决。简单容易。

I just used the following:

import unicodedata
message = unicodedata.normalize("NFKD", message)

Check what documentation says about it:

unicodedata.normalize(form, unistr) Return the normal form form for the Unicode string unistr. Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.

The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.

In addition to these two forms, there are two additional normal forms based on compatibility equivalence. In Unicode, certain characters are supported which normally would be unified with other characters. For example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I). However, it is supported in Unicode for compatibility with existing character sets (e.g. gb2312).

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents. The normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition.

Even if two unicode strings are normalized and look the same to a human reader, if one has combining characters and the other doesn’t, they may not compare equal.

Solves it for me. Simple and easy.


回答 16

下面的解决方案为我工作,刚刚添加

u“字符串”

(将字符串表示为unicode)在我的字符串之前。

result_html = result.to_html(col_space=1, index=False, justify={'right'})

text = u"""
<html>
<body>
<p>
Hello all, <br>
<br>
Here's weekly summary report.  Let me know if you have any questions. <br>
<br>
Data Summary <br>
<br>
<br>
{0}
</p>
<p>Thanks,</p>
<p>Data Team</p>
</body></html>
""".format(result_html)

Below solution worked for me, Just added

u “String”

(representing the string as unicode) before my string.

result_html = result.to_html(col_space=1, index=False, justify={'right'})

text = u"""
<html>
<body>
<p>
Hello all, <br>
<br>
Here's weekly summary report.  Let me know if you have any questions. <br>
<br>
Data Summary <br>
<br>
<br>
{0}
</p>
<p>Thanks,</p>
<p>Data Team</p>
</body></html>
""".format(result_html)

回答 17

this这至少在Python 3中有效…

Python 3

有时错误在于环境变量中,因此

import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
... 
print(myText.encode('utf-8', errors='ignore'))

在编码中忽略错误的地方。

Alas this works in Python 3 at least…

Python 3

Sometimes the error is in the enviroment variables and enconding so

import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
... 
print(myText.encode('utf-8', errors='ignore'))

where errors are ignored in encoding.


回答 18

我只是遇到了这个问题,而Google带领我来到这里,因此,为了在这里添加一般的解决方案,这对我有用:

# 'value' contains the problematic data
unic = u''
unic += value
value = unic

阅读内德的演讲后,我有了这个主意。

不过,我并没有声称完全理解为什么这样做。因此,如果任何人都可以编辑此答案或发表评论以进行解释,我将不胜感激。

I just had this problem, and Google led me here, so just to add to the general solutions here, this is what worked for me:

# 'value' contains the problematic data
unic = u''
unic += value
value = unic

I had this idea after reading Ned’s presentation.

I don’t claim to fully understand why this works, though. So if anyone can edit this answer or put in a comment to explain, I’ll appreciate it.


回答 19

manage.py migrate在带有本地化夹具的Django中运行时,我们遇到了此错误。

我们的资料包含# -*- coding: utf-8 -*-声明,MySQL已为utf8正确配置,而Ubuntu在中具有适当的语言包和值/etc/default/locale

问题只是因为Django容器(我们使用docker)缺少 LANG env var。

在重新运行迁移之前设置LANGen_US.UTF-8并重新启动容器可以解决此问题。

We struck this error when running manage.py migrate in Django with localized fixtures.

Our source contained the # -*- coding: utf-8 -*- declaration, MySQL was correctly configured for utf8 and Ubuntu had the appropriate language pack and values in /etc/default/locale.

The issue was simply that the Django container (we use docker) was missing the LANG env var.

Setting LANG to en_US.UTF-8 and restarting the container before re-running migrations fixed the problem.


回答 20

这里的许多答案(例如,@ agf和@Andbdrew)已经解决了OP问题的最直接方面。

但是,我认为有一个微妙但重要的方面已被很大程度上忽略,这对于像我这样在尝试理解Python编码时最终落到这里的每个人都非常重要:Python 2 vs Python 3字符表示的管理截然不同。我觉得很多困惑与人们在不了解版本的情况下阅读Python编码有关。

我建议有兴趣了解OP问题根本原因的人首先阅读Spolsky对字符表示法和Unicode 介绍,然后转向Python 2和Python 3中的Unicode Batchelder

Many answers here (@agf and @Andbdrew for example) have already addressed the most immediate aspects of the OP question.

However, I think there is one subtle but important aspect that has been largely ignored and that matters dearly for everyone who like me ended up here while trying to make sense of encodings in Python: Python 2 vs Python 3 management of character representation is wildly different. I feel like a big chunk of confusion out there has to do with people reading about encodings in Python without being version aware.

I suggest anyone interested in understanding the root cause of OP problem to begin by reading Spolsky’s introduction to character representations and Unicode and then move to Batchelder on Unicode in Python 2 and Python 3.


回答 21

尽量避免将变量转换为str(variable)。有时,这可能会导致问题。

避免的简单提示:

try: 
    data=str(data)
except:
    data = data #Don't convert to String

上面的示例还将解决Encode错误。

Try to avoid conversion of variable to str(variable). Sometimes, It may cause the issue.

Simple tip to avoid :

try: 
    data=str(data)
except:
    data = data #Don't convert to String

The above example will solve Encode error also.


回答 22

如果您有类似的packet_data = "This is data"操作,请在初始化后立即在下一行执行此操作packet_data

unic = u''
packet_data = unic

If you have something like packet_data = "This is data" then do this on the next line, right after initializing packet_data:

unic = u''
packet_data = unic

回答 23

python 3.0及更高版本的更新。在python编辑器中尝试以下操作:

locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en
LC_ALL=en_US.UTF-8

这会将系统的默认语言环境编码设置为UTF-8格式。

有关更多信息,请参见PEP 538-将传统C语言环境强制为基于UTF-8的语言环境

Update for python 3.0 and later. Try the following in the python editor:

locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en
LC_ALL=en_US.UTF-8

This sets the system`s default locale encoding to the UTF-8 format.

More can be read here at PEP 538 — Coercing the legacy C locale to a UTF-8 based locale.


回答 24

我遇到了尝试将Unicode字符输出到stdout,但使用sys.stdout.write而不是print的问题(这样我也可以支持将输出输出到其他文件)。

从BeautifulSoup自己的文档中,我使用编解码器库解决了此问题:

import sys
import codecs

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters
    fOut.write(unicode(soup))

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)

I had this issue trying to output Unicode characters to stdout, but with sys.stdout.write, rather than print (so that I could support output to a different file as well).

From BeautifulSoup’s own documentation, I solved this with the codecs library:

import sys
import codecs

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters
    fOut.write(unicode(soup))

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)

回答 25

当使用Apache部署django项目时,经常会发生此问题。因为Apache在/ etc / sysconfig / httpd中设置环境变量LANG = C。只需打开文件并注释(或更改为您的样式)此设置即可。或使用WSGIDaemonProcess命令的lang选项,在这种情况下,您将能够为不同的虚拟主机设置不同的LANG环境变量。

This problem often happens when a django project deploys using Apache. Because Apache sets environment variable LANG=C in /etc/sysconfig/httpd. Just open the file and comment (or change to your flavior) this setting. Or use the lang option of the WSGIDaemonProcess command, in this case you will be able to set different LANG environment variable to different virtualhosts.


回答 26

推荐的解决方案对我不起作用,我可以忍受所有非ascii字符的转储,因此

s = s.encode('ascii',errors='ignore')

这给我留下了不会抛出错误的东西。

The recommended solution did not work for me, and I could live with dumping all non ascii characters, so

s = s.encode('ascii',errors='ignore')

which left me with something stripped that doesn’t throw errors.


回答 27

这将起作用:

 >>>print(unicodedata.normalize('NFD', re.sub("[\(\[].*?[\)\]]", "", "bats\xc3\xa0")).encode('ascii', 'ignore'))

输出:

>>>bats

This will work:

 >>>print(unicodedata.normalize('NFD', re.sub("[\(\[].*?[\)\]]", "", "bats\xc3\xa0")).encode('ascii', 'ignore'))

Output:

>>>bats